This replaces existing fixed field for etcd encryption with a completely
flexible configuration which exactly matches upstream kube-apiserver
configuration.
The default machine configuration generated still retains previous
defaults.
New configuration allows:
* rotating etcd encryption secrets
* implementing any encryption policies (e.g. encrypting configmaps).
Fixes#10899
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Two fixes:
* for EventsWatch, use adaptive timeout and abort early (I would rather
kill this API & the test)
* for apid test, add one more error which sometimes pops up in the test
run to be ignored.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add NVIDIA arm64 test matrix.
Also ensure we have a known baseline for nvidia cdi files,
so if upstream adds more files and we don't install to right location
the test would fail.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Allow to set build NAME on build, propagate it down to more consumers.
Expose name in `Version` resource, and use that in the dashboard
next to Talos version.
Fix some places where `Name` was hardcoded.
Propagate Name down to UKI build.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
It seems that depending on timing, we might get one or another Talos in
gRPC client.
Fixes#13016
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
As one of the integration tests was overriding TrustedRoots config, it
erased the required settings leading to a random failure (depending on
the nodes picked for subsequent tests).
Fixes#13013
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
The gpu-operator device plugin generates CDI specs with hooks pointing
to /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook (hardcoded defaults
in NVIDIA/k8s-device-plugin and NVIDIA/nvidia-container-toolkit). Talos
extensions install these binaries under /usr/local/bin/, so pods
requesting nvidia.com/gpu resource limits fail with "no such file".
Add /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook to the rootfs as
symlinks.
Fixes: #13021
Fixes: https://github.com/siderolabs/extensions/issues/1017
Signed-off-by: David Orman <ormandj@corenode.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
We should use the endpoint(s) from the original talosconfig instead of
using node IPs, as they might be private/behind the LB.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Drop maintenance service and all the code supporting it directly.
Instead, move all network API termination into the `apid` service, which
now can work now in more modes to support maintenance operations as
well.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add RoutingRuleConfig multi-doc config type for management of routing rules.
KubeSpan now uses COSI resources instead of direct kernel management.
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
This was yet another socket with implicit auth - remove it completely
by reworking the only usecase for it - cluster-side health checks.
Now these health checks build a "regular" network Talos API client (as
they anyways work only controlplane nodes).
Refactor the check for controlplane nodes to use resources instead of
machine config directly (as machine config might not be always present).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add symlinks that are expected by nvidia-gpu-operator.
These symlinks point to empty files when nvidia-container-toolkit extension is not added.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Add support for whole machine-wide image verification configuration.
Configuration is a set of rules applied top-down to the image reference,
each specifying a specific cosign-based identity or static public key
claim.
Talos provides a machined API to verify an image reference, resolving it
to the digest on the way as needed.
Talos itself hooks up in the image verification process, while
containerd CRI plugin accesses same API via the machined socket.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Environment suite tests fail often, especially on AWS/GCP.
This change makes the tests more robust.
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
Via tools/pkgs, also pulling in Clang-built Linux
Update go.mod dependencies
Fix linter errors with new golangci-lint, modernize, use new()
Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Also allow the system containerd to execute igzip, which is essential
for pulling images
Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
This implements a way to run a debug container with a provided image on
the node.
The container runs with privileged profile, allowing to issue debugging
commands (e.g. using some advanced network tools) to troubleshoot a
machine.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add DisableAccessTime and Secure mount options for existing volumes.
DisableAccessTime adds noatime parameter to disable access time updates.
Secure adds nosuid and nodev parameters for security (defaults to true).
Add integration tests for both options.
Signed-off-by: Pranav Patil <pranavppatil767@gmail.com>
These new APIs only support one2one proxying, so they don't have any
hacks, and look as regular gRPC APIs.
Old APIs are deprecated, but still supported.
Implement client-side multiplexing in `talosctl`, provide fallback to
old APIs for legacy Talos versions.
New APIs include removing an image, importing an image.
Extracted from #12392
Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This commit introduces ProbeConfig, a new network configuration document type
that allows users to configure TCP connectivity probes to monitor network
endpoints.
Features:
- ProbeConfig document type with TCP probe support
- ProbeSpec and ProbeStatus resources for probe management
- ProbeConfigController to translate ProbeConfig into ProbeSpec
- ProbeController to execute probes and update ProbeStatus
- Configurable probe interval, timeout, and failure threshold
- Integration tests for API functionality
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add support for negative max size values in volume configuration.
Negative max size represents the amount of space to be left free on the device, rather than the size the volume should consume.
For example, a max size of "-10GiB" means the volume can grow to the device size minus 10GiB.
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
Open the blockdevice in `O_EXCL` mode when wiping to ensure that we
don't wipe a mounted device.
This issue was discovered via #12620, when we wipe a blockdevice which
is still mounted ending up in a wrong state.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes#12491
In (almost) all places we previously used `FastWipe`, use instead a
helper which will try to discover filesystem/partition signatures, and
wipe them.
This fixes the issue when a partition re-created in the same place might
already hit a scenario when the "old" filesystem is discovered in the
same place.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Migrate KubeSpan configuration to support multi-document format.
Add version-aware support for talosctl cluster create and gen config.
Uses multi-doc format for Talos 1.13+, legacy format for 1.12 and earlier.
Signed-off-by: Pranav Patil <pranavppatil767@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
The interactive installer has been deprecated since v1.12 cycle,
now removed completely including the API method.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Extracted from #12115
The idea is that kernel log can be delivered/persisted along with any
other service logs.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
When set to `disk`, a full block device is used for the volume.
When `volumeType = "disk"`:
- Size specific settings are not allowed in the provisioning block (`minSize`, `maxSize`, `grow`).
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
Previously, system volumes (`META`, `STATE`, etc.) were created by
`VolumeConfigController` and user volumes were created by
`UserVolumeConfigController`. This resulted in these controllers
racing to create volumes, which could cause partitions to be created in
an incorrect order.
This patch fixes this potential race by merging these two controllers
into a single controller, and refactoring a lot of the similar code
paths into one single pipeline for volume config handling.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
In certain situations, Talos's shutdown/reboot sequence hangs while
waiting for services/mounts to be gracefully stopped (see:
https://github.com/siderolabs/talos/issues/11775).
This patch adds a forceful mode to the reboot sequence (`talosctl reboot
--mode force`) that bypasses graceful userspace teardown and hard
reboots the machine.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>