372 Commits

Author SHA1 Message Date
Noel Georgi
4e5ff8fa21
fix(ci): zfs test
Expect the zvol file to be present eventually.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-05-05 16:58:53 +05:30
Andrey Smirnov
e1f759af80
chore: fix lint issues automatically
Mostly reformatting to use consistent newlines in function calls.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-05-05 14:06:44 +04:00
Andrey Smirnov
4f11f021de
feat: implement etcd encryption config (kube-apiserver)
This replaces the existing fixed field for etcd encryption with a completely
flexible configuration which exactly matches the upstream kube-apiserver
configuration.

The generated default machine configuration still retains the previous
defaults.

New configuration allows:

* rotating etcd encryption secrets
* implementing any encryption policies (e.g. encrypting configmaps).

Fixes #10899

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-05-04 19:35:23 +04:00
Mateusz Urbanek
876f836430
feat: add support for HTTP Probes
- Add HTTPProbeSpec to ProbeSpecSpec (URL + timeout)
- Implement probeHTTP() to send GET requests, treat 2xx/3xx as success
- Support machine proxy config via httpdefaults.PatchTransport
- Add HTTPProbeConfig v1alpha1 document and controller integration
- Add unit and integration tests for HTTP probe lifecycle
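
The success rule described above (GET request, 2xx/3xx counts as success) can be sketched as follows. This is a minimal illustration, not the Talos code: `statusOK`, the function signature, and the endpoint URL are assumptions, and the proxy handling via httpdefaults.PatchTransport is left out. The client is configured not to follow redirects so that a 3xx answer is judged as-is, which is also an assumption.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// statusOK reports whether an HTTP status code counts as a successful probe:
// any 2xx or 3xx response.
func statusOK(code int) bool {
	return code >= 200 && code < 400
}

// probeHTTP is an illustrative stand-in, not the Talos implementation.
// It sends a single GET request bounded by timeout.
func probeHTTP(ctx context.Context, client *http.Client, url string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return fmt.Errorf("building request: %w", err)
	}

	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("sending request: %w", err)
	}
	defer resp.Body.Close()

	if !statusOK(resp.StatusCode) {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}

	return nil
}

func main() {
	// Assumption: redirects are not followed, so a 3xx response is evaluated directly.
	client := &http.Client{
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}

	// Hypothetical endpoint; replace with a reachable URL.
	err := probeHTTP(context.Background(), client, "http://127.0.0.1:8080/healthz", 2*time.Second)
	fmt.Println("probe result:", err)
}
```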

Signed-off-by: Pranav Patil <pranavppatil767@gmail.com>
Co-authored-by: Pranav Patil <pranavppatil767@gmail.com>
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-05-04 15:22:51 +02:00
Mateusz Urbanek
462015bcd9
release(v1.14.0-alpha.0): prepare release
This is the official v1.14.0-alpha.0 release.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-04-29 14:24:29 +02:00
Andrey Smirnov
8a037a56ed
test: fix flaky tests
Two fixes:

* for EventsWatch, use adaptive timeout and abort early (I would rather
  kill this API & the test)
* for apid test, add one more error which sometimes pops up in the test
  run to be ignored.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-29 13:39:58 +04:00
Noel Georgi
bba0b4aeef
chore(ci): nvidia update helm values
See #13159; the newer GPU operator v26.3.1 has better detection.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-20 23:36:33 +05:30
Andrey Smirnov
3399ff4de0
fix: propagate route table down to the resource
Fixes #13153

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-20 20:26:50 +04:00
Noel Georgi
ed9545d0db
chore(ci): bump gpu operator version
Bump GPU operator version.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-20 18:43:02 +05:30
Noel Georgi
509cd97339
fix: boot entry detection
Fixes: #13080

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-18 12:22:21 +05:30
Noel Georgi
7fa4d39197
fix: zfs extensions test
Make sure the check commands also run on the same node where the pool was created.

Fixes: #13014

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-17 23:28:55 +05:30
Noel Georgi
6a3ab87c54
feat(ci): add nvidia arm64 matrix
Add NVIDIA arm64 test matrix.

Also ensure we have a known baseline for NVIDIA CDI files,
so that if upstream adds more files and we don't install them to the right location,
the test fails.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-12 19:32:40 +05:30
Andrey Smirnov
968ec1e0ca
refactor: propagate NAME properly, allow to set on build
Allow setting NAME at build time, and propagate it down to more consumers.

Expose name in `Version` resource, and use that in the dashboard
next to Talos version.

Fix some places where `Name` was hardcoded.

Propagate Name down to UKI build.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-08 17:57:43 +04:00
Andrey Smirnov
4227921b39
test: fix the PKI mismatch test flake
It seems that depending on timing, we might get one or another Talos in
gRPC client.

Fixes #13016

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-26 13:00:34 +04:00
Andrey Smirnov
70cefab6af
test: fix the flakes in tests with trusted roots
As one of the integration tests was overriding TrustedRoots config, it
erased the required settings leading to a random failure (depending on
the nodes picked for subsequent tests).

Fixes #13013

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-24 21:25:42 +04:00
David Orman
9597714f62
fix: add symlinks nvidia-ctk and nvidia-cdi-hook in /usr/bin
The gpu-operator device plugin generates CDI specs with hooks pointing
to /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook (hardcoded defaults
in NVIDIA/k8s-device-plugin and NVIDIA/nvidia-container-toolkit). Talos
extensions install these binaries under /usr/local/bin/, so pods
requesting nvidia.com/gpu resource limits fail with "no such file".

Add /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook to the rootfs as
symlinks.

Fixes: #13021
Fixes: https://github.com/siderolabs/extensions/issues/1017

Signed-off-by: David Orman <ormandj@corenode.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-23 21:22:48 +04:00
Andrey Smirnov
8e1c8a7a90
test: fix the apid test against AWS/GCP
We should use the endpoint(s) from the original talosconfig instead of
using node IPs, as they might be private/behind the LB.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-18 12:57:08 +04:00
Andrey Smirnov
b7d70cf625
feat: unify maintenance and regular APIs
Drop maintenance service and all the code supporting it directly.

Instead, move all network API termination into the `apid` service, which
can now work in more modes to support maintenance operations as
well.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-17 17:00:35 +04:00
Mateusz Urbanek
6bb5cf57a2
feat: implement routing rules support
Add RoutingRuleConfig multi-doc config type for management of routing rules.
KubeSpan now uses COSI resources instead of direct kernel management.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-03-13 15:17:49 +01:00
Andrey Smirnov
ad3c59aada
fix: prevent stale discovered volumes reads
This pulls in a fix https://github.com/siderolabs/go-blockdevice/pull/148

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 12:34:08 +04:00
Noel Georgi
c14179e78d
chore(ci): update nvidia test to use gpu-operator
Update NVIDIA tests to use GPU Operator.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-03-11 05:25:15 +05:30
Andrey Smirnov
da70cedfd2
refactor: drop apid file socket
This was yet another socket with implicit auth. Remove it completely
by reworking its only use case, cluster-side health checks.
These health checks now build a "regular" network Talos API client (as
they only work on controlplane nodes anyway).

Refactor the check for controlplane nodes to use resources instead of
machine config directly (as machine config might not be always present).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-10 21:52:03 +04:00
Noel Georgi
2fb6f6a16d
feat: add symlinks needed by gpu-operator
Add symlinks that are expected by nvidia-gpu-operator.
These symlinks point to empty files when nvidia-container-toolkit extension is not added.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-03-10 05:02:21 +05:30
Laura Brehm
7f2eb48561
feat: add image verification endpoint
Add support for machine-wide image verification configuration.
Configuration is a set of rules applied top-down to the image reference,
each specifying a specific cosign-based identity or static public key
claim.

Talos provides a machined API to verify an image reference, resolving it
to the digest on the way as needed.

Talos itself hooks into the image verification process, while the
containerd CRI plugin accesses the same API via the machined socket.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-06 20:06:07 +04:00
Mateusz Urbanek
95287d2dbe
fix: environment suite failures
Environment suite tests fail often, especially on AWS/GCP.
This change makes the tests more robust.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-03-05 11:12:45 +01:00
Andrey Smirnov
000c18d538
feat: implement blackhole route config
This is the useful part of #12608.

Closes #12608

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-27 14:15:43 +04:00
pythoner6
1da2b63ab5
feat: multi-doc support for configuring vrfs
Fixes https://github.com/siderolabs/talos/issues/11960

This adds a new network config document type, network.VRFConfig, that can
be used to configure VRFs (https://docs.kernel.org/networking/vrf.html).

Signed-off-by: pythoner6 <pythoner6@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-25 15:10:35 +04:00
Dmitrii Sharshakov
9758bd4fe0
feat: update Go to 1.26
Via tools/pkgs, also pulling in Clang-built Linux

Update go.mod dependencies

Fix linter errors with new golangci-lint, modernize, use new()

Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-19 22:15:19 +01:00
Mateusz Urbanek
8a0e797744
refactor: split locate and provision
Splitting locate and provision and adding extra tests.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-02-19 14:22:15 +01:00
Dmitrii Sharshakov
daf18abf41
fix: fix talosctl debug in enforcing mode
Also allow the system containerd to execute igzip, which is essential
for pulling images

Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
2026-02-11 18:07:48 +01:00
Andrey Smirnov
4d531884e9
chore: update dependencies
Update Go modules, various test dependencies.

Brings in:

* CoreDNS 1.14.1
* Flannel 0.28.1

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-10 21:17:23 +04:00
Laura Brehm
d43a01ccbd
feat: implement talosctl debug
This implements a way to run a debug container with a provided image on
the node.

The container runs with a privileged profile, which allows issuing debugging
commands (e.g. using some advanced network tools) to troubleshoot a
machine.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-04 21:26:09 +04:00
Pranav Patil
34a31c9797
feat: add mount options support for existing volumes
Add DisableAccessTime and Secure mount options for existing volumes.
DisableAccessTime adds noatime parameter to disable access time updates.
Secure adds nosuid and nodev parameters for security (defaults to true).
Add integration tests for both options.
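
How the two options map to mount parameters can be sketched like this. The struct and function names are hypothetical and only mirror the description above; modelling `Secure` as a pointer so that an unset value defaults to true is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// mountSpec is a hypothetical stand-in for the volume mount settings.
type mountSpec struct {
	DisableAccessTime bool
	Secure            *bool // nil means "not set", which defaults to true
}

// mountOptionString translates the settings into a comma-separated mount option string.
func mountOptionString(spec mountSpec) string {
	var opts []string

	if spec.DisableAccessTime {
		opts = append(opts, "noatime")
	}

	if spec.Secure == nil || *spec.Secure {
		opts = append(opts, "nosuid", "nodev")
	}

	return strings.Join(opts, ",")
}

func main() {
	insecure := false

	fmt.Println(mountOptionString(mountSpec{DisableAccessTime: true}))
	fmt.Println(mountOptionString(mountSpec{Secure: &insecure}))
}
```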

Signed-off-by: Pranav Patil <pranavppatil767@gmail.com>
2026-02-04 09:13:05 +01:00
Andrey Smirnov
8b245b8f26
feat: implement new image service APIs
These new APIs only support one2one proxying, so they don't need any
hacks and look like regular gRPC APIs.

Old APIs are deprecated, but still supported.

Implement client-side multiplexing in `talosctl`, provide fallback to
old APIs for legacy Talos versions.

New APIs include removing an image, importing an image.

Extracted from #12392

Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-02 15:55:56 +04:00
Mickaël Canévet
b5c760f707
feat: add ProbeConfig for network connectivity probes
This commit introduces ProbeConfig, a new network configuration document type
that allows users to configure TCP connectivity probes to monitor network
endpoints.

Features:
- ProbeConfig document type with TCP probe support
- ProbeSpec and ProbeStatus resources for probe management
- ProbeConfigController to translate ProbeConfig into ProbeSpec
- ProbeController to execute probes and update ProbeStatus
- Configurable probe interval, timeout, and failure threshold
- Integration tests for API functionality
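
The interplay of timeout and failure threshold listed above can be sketched as follows. This is an illustration only: `probeTCP` and `failureTracker` are hypothetical names, the endpoint is a placeholder, and treating the threshold as the number of consecutive failures at which the probe turns unhealthy is an assumption about the semantics.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probeTCP attempts one TCP connection to endpoint within timeout.
func probeTCP(endpoint string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", endpoint, timeout)
	if err != nil {
		return fmt.Errorf("probing %s: %w", endpoint, err)
	}

	return conn.Close()
}

// failureTracker is a hypothetical helper that turns individual probe
// results into a health status.
type failureTracker struct {
	threshold int // consecutive failures at which the probe is reported unhealthy
	failures  int // current run of consecutive failures
}

// observe records one probe outcome and returns the resulting health.
func (t *failureTracker) observe(ok bool) bool {
	if ok {
		t.failures = 0

		return true
	}

	t.failures++

	return t.failures < t.threshold
}

func main() {
	tracker := &failureTracker{threshold: 3}

	// Hypothetical endpoint and interval; three attempts for demonstration.
	for i := 0; i < 3; i++ {
		err := probeTCP("127.0.0.1:9", 500*time.Millisecond)
		fmt.Printf("attempt %d: healthy=%v err=%v\n", i+1, tracker.observe(err == nil), err)
		time.Sleep(100 * time.Millisecond)
	}
}
```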

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-21 22:17:38 +04:00
Mateusz Urbanek
8c7b8f5b7d
feat: add support for negative max size
Add support for negative max size values in volume configuration.
Negative max size represents the amount of space to be left free on the device, rather than the size the volume should consume.
For example, a max size of "-10GiB" means the volume can grow to the device size minus 10GiB.
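
The arithmetic can be sketched as follows, assuming sizes are already parsed into bytes. `resolveMaxSize` is a hypothetical name, and rejecting a reserve that is as large as the device is an assumption about error handling.

```go
package main

import "fmt"

const gib int64 = 1 << 30

// resolveMaxSize turns a possibly negative max size into an absolute size.
// A negative value is the space to leave free on the device.
func resolveMaxSize(deviceSize, maxSize int64) (int64, error) {
	if maxSize >= 0 {
		return maxSize, nil
	}

	reserve := -maxSize
	if reserve >= deviceSize {
		return 0, fmt.Errorf("reserve of %d bytes leaves no space on a %d byte device", reserve, deviceSize)
	}

	return deviceSize - reserve, nil
}

func main() {
	// A 100GiB device with max size "-10GiB" may grow to 90GiB.
	size, err := resolveMaxSize(100*gib, -10*gib)
	fmt.Println(size/gib, err)
}
```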

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-01-21 12:11:31 +01:00
Andrey Smirnov
f7072c050e
fix: check if the device is not mounted when wiping
Open the blockdevice in `O_EXCL` mode when wiping to ensure that we
don't wipe a mounted device.
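
A minimal sketch of the idea, assuming Linux, where opening a block device with `O_EXCL` (without `O_CREAT`) fails with `EBUSY` while the device is in use, for example mounted. `openForWipe` is a hypothetical name and not the helper used in Talos.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// openForWipe opens a block device exclusively so that a mounted device
// is rejected before any data is overwritten.
func openForWipe(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_EXCL, 0)
	if errors.Is(err, syscall.EBUSY) {
		return nil, fmt.Errorf("%s is in use, refusing to wipe: %w", path, err)
	}

	if err != nil {
		return nil, fmt.Errorf("opening %s: %w", path, err)
	}

	return f, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: openforwipe <block-device>")

		return
	}

	f, err := openForWipe(os.Args[1])
	if err != nil {
		fmt.Println(err)

		return
	}
	defer f.Close()

	fmt.Println("device opened exclusively; safe to proceed")
}
```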

This issue was discovered via #12620, where we wiped a blockdevice which
was still mounted and ended up in a wrong state.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-20 17:07:19 +04:00
Andrey Smirnov
5127ef7c28
fix: wipe disk by signatures
Fixes #12491

In (almost) all places where we previously used `FastWipe`, use a
helper instead which tries to discover filesystem/partition signatures and
wipes them.

This fixes the issue where, after a partition is re-created in the same
place, the "old" filesystem might still be discovered there.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-14 19:15:37 +04:00
Pranav Patil
8184927316
feat: implement KubeSpan multi-document configuration
Migrate KubeSpan configuration to support multi-document format.
Add version-aware support for talosctl cluster create and gen config.
Uses multi-doc format for Talos 1.13+, legacy format for 1.12 and earlier.
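
The version switch can be sketched as a simple comparison on the major and minor version. `useMultiDocKubeSpan` is a hypothetical name and the `vMAJOR.MINOR...` version string format is an assumption; the actual code presumably relies on existing version contract helpers.

```go
package main

import "fmt"

// useMultiDocKubeSpan reports whether the multi-document KubeSpan format
// applies to the given Talos version (1.13 and later).
func useMultiDocKubeSpan(version string) (bool, error) {
	var major, minor int

	// Only the major and minor components matter; the rest of the string is ignored.
	if _, err := fmt.Sscanf(version, "v%d.%d", &major, &minor); err != nil {
		return false, fmt.Errorf("parsing version %q: %w", version, err)
	}

	return major > 1 || (major == 1 && minor >= 13), nil
}

func main() {
	for _, v := range []string{"v1.12.4", "v1.13.0", "v1.14.0-alpha.0"} {
		multiDoc, err := useMultiDocKubeSpan(v)
		fmt.Println(v, multiDoc, err)
	}
}
```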

Signed-off-by: Pranav Patil <pranavppatil767@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-13 16:08:11 +04:00
Mateusz Urbanek
c3176adcf9
feat: add EnvironmentConfig document
Add a new EnvironmentConfig document for configuring environment variables.
Deprecate .Machine.Env.

Closes #12439

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-01-12 15:10:20 +01:00
Andrey Smirnov
c57701d659
fix: remove interactive installer
The interactive installer has been deprecated since the v1.12 cycle;
it is now removed completely, including the API method.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-12-25 15:01:10 +04:00
Mateusz Urbanek
681f3e84c8
test: run virtiofs tests only when virtiofsd is running
Detect if virtiofsd is created, and then run or skip virtiofs volumes tests.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2025-12-18 10:26:06 +01:00
Mateusz Urbanek
694f45413f
feat: external volumes
Add a new volume type for managing external volume mounts: virtiofs volumes.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2025-12-15 14:35:52 +01:00
Dmitrii Sharshakov
13df943884
fix: adapt SELinuxSuite.TestNoPtrace to new strace version
Alpine updated strace which changed its error messages

Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
2025-12-04 14:54:43 +01:00
Andrey Smirnov
a0cfc35274
feat: implement logs persistence
Implement a log persistence controller, rotate logs and buffer writes.

Fixes #11461

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-authored-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
2025-12-02 12:51:12 +01:00
Andrey Smirnov
e715f38713
feat: present kernel log as talosctl logs kernel
Extracted from #12115

The idea is that kernel log can be delivered/persisted along with any
other service logs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-11-27 20:56:16 +04:00
Mateusz Urbanek
83f2bdb9ce
feat: support relative volume size
Include percent-based maxSize, e.g. use 50% of available space.
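
Resolving such a value against the available space can be sketched as follows. `percentOf` is a hypothetical name, and restricting the value to whole percentages between 0 and 100 is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// percentOf resolves a percentage spec such as "50%" against the available space.
func percentOf(available uint64, spec string) (uint64, error) {
	raw, ok := strings.CutSuffix(spec, "%")
	if !ok {
		return 0, fmt.Errorf("%q is not a percentage", spec)
	}

	pct, err := strconv.ParseUint(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", spec, err)
	}

	if pct > 100 {
		return 0, fmt.Errorf("%q exceeds 100%%", spec)
	}

	// Split the multiplication to avoid overflow on very large devices.
	return available/100*pct + available%100*pct/100, nil
}

func main() {
	size, err := percentOf(200, "50%")
	fmt.Println(size, err)
}
```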

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2025-11-14 14:56:22 +01:00
Mateusz Urbanek
308c6bc414
feat: add full disk volumes
When set to `disk`, a full block device is used for the volume.

When `volumeType = "disk"`:
- Size specific settings are not allowed in the provisioning block (`minSize`, `maxSize`, `grow`).
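
The restriction above can be sketched as a validation step. The struct, the function name, and the `partition` value used in the example are hypothetical; only `disk`, `minSize`, `maxSize`, and `grow` come from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// provisioning is a hypothetical stand-in for the provisioning block.
type provisioning struct {
	MinSize string
	MaxSize string
	Grow    *bool
}

// validateProvisioning rejects size-specific settings for full disk volumes.
func validateProvisioning(volumeType string, p provisioning) error {
	if volumeType != "disk" {
		return nil
	}

	var set []string

	if p.MinSize != "" {
		set = append(set, "minSize")
	}

	if p.MaxSize != "" {
		set = append(set, "maxSize")
	}

	if p.Grow != nil {
		set = append(set, "grow")
	}

	if len(set) > 0 {
		return fmt.Errorf("%s not allowed when volumeType is %q", strings.Join(set, ", "), volumeType)
	}

	return nil
}

func main() {
	fmt.Println(validateProvisioning("disk", provisioning{MaxSize: "10GiB"}))
	fmt.Println(validateProvisioning("disk", provisioning{}))
}
```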

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2025-11-12 14:50:56 +01:00
Laura Brehm
43f4e317f1
fix: race between VolumeConfigController and UserVolumeConfigController
Previously, system volumes (`META`, `STATE`, etc.) were created by
`VolumeConfigController` and user volumes were created by
`UserVolumeConfigController`. This resulted in these controllers
racing to create volumes, which could cause partitions to be created in
an incorrect order.

This patch fixes this potential race by merging these two controllers
into a single controller, and refactoring a lot of the similar code
paths into one single pipeline for volume config handling.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
2025-11-12 12:11:17 +04:00
Laura Brehm
957770f65a
feat(machined): add panic/force mode reboot
In certain situations, Talos's shutdown/reboot sequence hangs while
waiting for services/mounts to be gracefully stopped (see:
https://github.com/siderolabs/talos/issues/11775).

This patch adds a forceful mode to the reboot sequence (`talosctl reboot
--mode force`) that bypasses graceful userspace teardown and hard
reboots the machine.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
2025-11-11 12:08:34 +01:00