6315 Commits

Author SHA1 Message Date
Andrey Smirnov
a349dac036
fix: stale discovered volume children
Fix a race in which a disappearing parent did not clean up its children
(partitions), depending on the order of two reconciliation events: a
timer-based one and a disk notification.

Treat a disappearing (failing) parent as a signal to clean up all
children as well.
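
A minimal sketch of the cleanup rule described above, assuming a flat map of discovered volumes keyed by ID. The `discoveredVolume` type and `pruneMissingParent` helper are hypothetical names for illustration and are not taken from the Talos code:

```go
package main

// discoveredVolume is a hypothetical, trimmed-down record of a discovered
// block device; Parent is empty for whole disks.
type discoveredVolume struct {
	ID     string
	Parent string
}

// pruneMissingParent removes the parent entry and every child (partition)
// that points at it, so no stale children survive a disappearing parent.
func pruneMissingParent(volumes map[string]discoveredVolume, parentID string) {
	delete(volumes, parentID)

	// Deleting map entries while ranging over the map is safe in Go.
	for id, v := range volumes {
		if v.Parent == parentID {
			delete(volumes, id)
		}
	}
}
```

The point of the sketch is that the cleanup is driven by the parent alone, so it does not matter which reconciliation event arrives first.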

Fixes #13259

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-05-01 21:29:02 +04:00
Andrey Smirnov
13ce018795
fix: re-enable kexec on arm64
The upstream kernel bug should be fixed now.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-05-01 20:42:11 +04:00
Andrey Smirnov
32539d4ac4
fix: deadlock in the makefs ext4 with populated source
Close the pipe on error/abort.

Fixes #13256

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-05-01 19:48:56 +04:00
Andrey Smirnov
0f3e1966af
fix: panic in Kubernetes manifest sync
Fixes #13254

This also bumps k8s.io/api to 0.36.0 to match our Kubernetes version (it
was made possible via fluxcd update).

The actual fix is https://github.com/siderolabs/go-kubernetes/pull/57

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-05-01 17:18:34 +04:00
Andrey Smirnov
3bae01ac11
fix: do not pick up a system disk from a loop device
If someone mounts a Talos disk image over a loop device, it will appear
in DiscoveredVolumes correctly, but we should not match it as a system
disk. A system disk can't be a loop device.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-05-01 15:51:18 +04:00
Mateusz Urbanek
dedb7a96c1
fix(talosctl): protect k8sNames map writes with mutex
Concurrent goroutines wrote to k8sNames without synchronization,
causing a data race during drain. The comment claiming "no mutex
needed" was wrong - each goroutine writes a different key, but
the Go map implementation is not safe for concurrent writes.
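
A small sketch of the pattern, assuming one goroutine per node. The `nameRegistry` type and the `resolveAll` helper are hypothetical and only illustrate guarding map writes with a `sync.Mutex`:

```go
package main

import "sync"

// nameRegistry is a hypothetical stand-in for a map shared by goroutines.
type nameRegistry struct {
	mu    sync.Mutex
	names map[string]string
}

func newNameRegistry() *nameRegistry {
	return &nameRegistry{names: map[string]string{}}
}

// set stores a value under the lock; even writes to distinct keys must be
// serialized, because Go maps are not safe for concurrent writes.
func (r *nameRegistry) set(node, name string) {
	r.mu.Lock()
	defer r.mu.Unlock()

	r.names[node] = name
}

// resolveAll writes one entry per node, each from its own goroutine.
func resolveAll(nodes []string) *nameRegistry {
	reg := newNameRegistry()

	var wg sync.WaitGroup

	for _, node := range nodes {
		wg.Add(1)

		go func(node string) {
			defer wg.Done()

			reg.set(node, "k8s-"+node)
		}(node)
	}

	wg.Wait()

	return reg
}
```

Running a variant without the lock under `go run -race` is one way to see the data race reported.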

Fixes #13247

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-04-30 14:36:43 +02:00
Andrey Smirnov
cc2be213a8
fix: drop explicit platform matcher
The containerd code path for the image pull via CRI doesn't set one
explicitly, and relies on the implicit default of the `client.Pull` API
to pull in the default platform matcher.

See also #13222

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-30 15:33:00 +04:00
Mateusz Urbanek
1dffebaf2a
fix: mount throws EPERM on virtiofs with SELinux
Fixes #13245

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-04-30 11:03:06 +02:00
Andrey Smirnov
48a481c29f
fix: replace Canal manifest with a more recent one
Fixes #13221

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-29 21:40:34 +04:00
Andrey Smirnov
6a445406e0
fix: make lacp active nilable
This fixes a set of inconsistencies; one example is #13203.

Make the field in the spec nilable, and treat `nil` as `on` (this is
Linux default if the field is not set explicitly).

When using machine config, the setting can be flipped to `off`
explicitly (if needed).

This brings back compatibility with raw metal network resource specs
pre-1.12 which don't contain any explicit `adLacpActive: on` at all.
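
The nil handling can be sketched as follows. The `bondSpec` and `lacpMode` types are hypothetical simplifications, not the actual Talos resource definitions:

```go
package main

// lacpMode is a hypothetical enum mirroring the `on`/`off` values above.
type lacpMode string

const (
	lacpOn  lacpMode = "on"
	lacpOff lacpMode = "off"
)

// bondSpec is an illustrative, trimmed-down spec.
type bondSpec struct {
	// ADLACPActive is nil when the field was never set explicitly,
	// as in specs written before the field existed.
	ADLACPActive *lacpMode
}

// effectiveLACPActive resolves the nilable field: nil is treated as `on`,
// matching the Linux default for an unset value.
func effectiveLACPActive(spec bondSpec) lacpMode {
	if spec.ADLACPActive == nil {
		return lacpOn
	}

	return *spec.ADLACPActive
}
```

A pointer field keeps "not set" distinguishable from an explicit `off`, which a plain value field cannot express.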

Fixes #13203

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-29 21:14:20 +04:00
Andrey Smirnov
0d1d95c7da
fix: bump go-kmsg to fix the timestamp drift
See https://github.com/siderolabs/go-kmsg/pull/15

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-29 20:46:50 +04:00
Andrey Smirnov
bd344fd53f
fix: reset the ticker when the KubeSpan is disabled/enabled
Fixes #13226

The issue was that the ticker was stopped, but never set to `nil`,
so it was never re-created when KubeSpan was enabled again.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-29 19:36:12 +04:00
Mateusz Urbanek
462015bcd9
release(v1.14.0-alpha.0): prepare release
This is the official v1.14.0-alpha.0 release.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
v1.14.0-alpha.0 pkg/machinery/v1.14.0-alpha.0
2026-04-29 14:24:29 +02:00
Andrey Smirnov
8a037a56ed
test: fix flaky tests
Two fixes:

* for EventsWatch, use adaptive timeout and abort early (I would rather
  kill this API & the test)
* for apid test, add one more error which sometimes pops up in the test
  run to be ignored.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-29 13:39:58 +04:00
Noel Georgi
08c81d8380
feat: bump kernel to 6.18.25
Bump kernel to 6.18.25.
This should pass all grype scans.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-28 17:38:05 +05:30
Noel Georgi
fe40b6e588
fix(ci): fetch empty pr labels
If labels are empty return `[]`.
Brought in via rekres.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-27 23:40:25 +05:30
Andrey Smirnov
837a9ed077
feat: move host DNS config into ResolverConfig
This deprecates more `.machine.features`, allows host DNS to be enabled
in maintenance mode.

Fixes #12438

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-27 20:11:01 +04:00
Mateusz Urbanek
96a8ecd1ee
feat: default to factory installer image
Defaults to installer image from factory.talos.dev. Default images now
use schematic hash naming (metal-installer/<hash>) instead of
registry-based naming.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-04-27 15:46:13 +02:00
Mateusz Urbanek
f19eef78b9
fix: revert add extraArgs from service-account-issuer
This reverts commit d1954278a1ba3470b2e5ccae90762078c18d69e9.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-04-27 10:36:34 +02:00
Mateusz Urbanek
6821225b64
fix: revert use append instead of prepend in service-account-issuer
This reverts commit 01a3678913de0fa4d309a361428c117d24ce0d1e.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-04-27 10:36:22 +02:00
Edward Sammut Alessi
b43c3a124f
feat: add quirk for talosctl factory downloads
Add a SupportsFactoryTalosctlDownload quirk to mark the minimum version that supports talosctl downloads from factory

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-04-24 13:30:37 +02:00
Andrey Smirnov
df0b9a8da1
refactor: make all controller unit-test follow modern patterns
Kill old-style "manual" tests, use `ctest` consistently now.

This should be no-op refactoring.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-23 22:52:09 +04:00
Andrey Smirnov
c2948cef23
feat: support auth for Image Factory in cluster create
Allow authenticating to Image Factory (if Image Factory is configured
for auth); this applies to HTTP downloads (e.g. ISO) and also injects
registry auth into Talos.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-23 22:21:00 +04:00
Andrey Smirnov
560bcf0cae
feat: enforce TLS 1.3 minimum version for Kubernetes components
Fixes #13120

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-23 18:32:51 +04:00
Mateusz Urbanek
3db14309e0
fix(talosctl): ensure uncordon runs after reboot/upgrade errors
Use defer blocks and error joining to guarantee uncordon cleanup
runs regardless of reboot/upgrade success or failure. Prevents nodes
from staying cordoned when operations fail.

Also added gRPC keepalive params to prevent timeout issues during
long operations.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-04-23 12:35:43 +02:00
Andrey Smirnov
ecf2fa855b
feat: update Kubernetes to v1.36.0
The final Kubernetes version for Talos v1.13.0.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-23 13:42:59 +04:00
Noel Georgi
71557eadda
fix(ci): skip misc jobs not on pull request
Skip the misc jobs not on pull request.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-23 12:58:04 +05:30
buckaroo
026313b7cc
docs: rename security-insights.yml to lowercase for LFX detection
The OSSF Security Insights spec and its reference tooling
(ossf/si-tooling) require the lowercase filename security-insights.yml.
LFX Insights scans case-sensitively, so the previously-uppercase file
was not detected and OSPS-QA-04.01 continued to fail on the dashboard.

Also bump schema-version to 2.2.0 (current) and refresh the review
dates.

Closes #13189

Signed-off-by: buckaroo <jeff.behl@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-23 10:49:20 +04:00
Noel Georgi
dc4ffd490d
fix(ci): fix jobs not interpolating matrix due to condition
Rekres to bring in fixes for jobs not interpolating matrix.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-23 00:34:11 +05:30
Andrey Smirnov
25e2f37e2b
chore: generate comments for fields in resource proto
Update structprotogen to put comments from Go structs into generated
.proto files.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-22 18:02:27 +04:00
Andrey Smirnov
149592fa59
fix: watch kubelet's kubeconfig and time out for cache sync
Fixes #13169

Also fixes a number of other issues with controller being stuck
"watching" over stale data.

The major part of the change is to watch contents of kubelet's
kubeconfig and restart the watch when it changes.

The internals of the watch process don't always bubble up errors
properly, or we don't watch for errors.

With this change, not only does the initial sync have a timeout and a way
to abort the sync process, but Talos can now also restart the sync on
kubeconfig change, making it more transparent.

This might become irrelevant if we start managing kubeconfig via Talos
controlplane for workers, but for now this seems to be the way to fix
issues.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-22 17:44:46 +04:00
Andrey Smirnov
1f315e6e90
feat: update Linux to 6.18.23
Sync tools/pkgs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-22 17:32:30 +04:00
Erwan Leboucher
0198eedc2b
feat: add NTS (Network Time Security) support for NTP time sync
Implement optional NTS (RFC 8915) support for authenticated and encrypted
time synchronization, using the beevik/nts library.

Signed-off-by: Erwan Leboucher <erwanleboucher@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-22 15:27:50 +04:00
Noel Georgi
6830a8b97d
fix(ci): matrix jobs cleanups
* Have proper matrix job names
* Run all aws-nvidia tests in parallel
* Make misc-0 a matrix in flattened jobs too, so we can re-trigger just the failed one

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-22 12:56:01 +05:30
Andrey Smirnov
71aeb347f9
test: fix OOM test flake
While the OOM pressure is high, we might observe "extra kills" as there
are no other victims to kill anymore (as `stress-ng` is already gone).
Tolerate those kills, but log them in case we see this getting out of
hand.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-21 21:56:48 +04:00
Andrey Smirnov
9b9542cc55
test: fix a flake in the manifest sync test
A sample failure:

```
manifests.go:133:
        	Error Trace:	/src/internal/integration/k8s/manifests.go:133
        	Error:      	[]string{"/usr/local/bin/kube-proxy", "--cluster-cidr=10.244.0.0/16", "--conntrack-max-per-core=0", "--hostname-override=$(NODE_NAME)", "--kubeconfig=/etc/kubernetes/kubeconfig", "--proxy-mode=nftables"} does not contain "--nodeport-addresses=0.0.0.0/0"
        	Test:       	TestIntegration/k8s.ManifestsSuite/TestSync
    manifests.go:137: disabling kube-proxy
```

My running theory is that `List()` picks up a stale pod, so this change
tries to filter it out and logs it in full if we hit it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-21 21:12:26 +04:00
Andrey Smirnov
863d882b6c
test: add image verification for factory.talos.dev
This modifies a test patch, it will not have much effect in the CI (as
it doesn't pull from factory.talos.dev), but good for local testing.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-21 18:52:02 +04:00
Noel Georgi
bba0b4aeef
chore(ci): nvidia update helm values
See #13159, newer GPU operator v26.3.1 has better detection.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-20 23:36:33 +05:30
Andrey Smirnov
3399ff4de0
fix: propagate route table down to the resource
Fixes #13153

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-20 20:26:50 +04:00
Andrey Smirnov
c684ec60ea
chore: prepare for Talos 1.14 release
Add compatibility, bump versions in upgrade & Image Factory tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-20 19:33:01 +04:00
Noel Georgi
ed9545d0db
chore(ci): bump gpu operator version
Bump GPU operator version.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-20 18:43:02 +05:30
Noel Georgi
4de3e4393e
fix(ci): cron triggered workflows
The triggered jobs were missing the token to download artifacts.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-20 18:07:31 +05:30
Andrey Smirnov
212182e6f6
chore: bump container registry library
They re-enabled support for absolute symlinks, but symlinks which target
paths with `../` are still dropped.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-20 14:49:50 +04:00
Utku Ozdemir
c028db0b8d
fix: do not flip machine stage to rebooting during shutdown
At the end of every sequence that intentionally terminates the machine (reboot, shutdown, upgrade, etc.), a fatal event is published to signal expected termination. The machine status controller was unconditionally flipping the stage to "rebooting" on this event, which was correct for sequences that end in a reboot but incorrect for the shutdown sequence whose expected termination is a power-off.

The stage tracker now skips this transition when the current sequence is shutdown, so the machine stays in "shutting down" until it actually powers off.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2026-04-20 10:35:22 +02:00
Noel Georgi
6ce62d9e8e
fix(ci): workflow runs with workflow_run
Fix workflows having trigger as `workflow_run`

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-18 17:11:45 +05:30
Noel Georgi
509cd97339
fix: boot entry detection
Fixes: #13080

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-18 12:22:21 +05:30
Noel Georgi
5e3f301887
feat(ci): rework to schedule daily runs after a cron
This prevents us from building and pushing artifacts and replacing them for each run.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-17 23:43:10 +05:30
Noel Georgi
7fa4d39197
fix: zfs extensions test
Make sure we run the check commands also on the same node where we created the pool.

Fixes: #13014

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-04-17 23:28:55 +05:30
Andrey Smirnov
1ef8e630ab
test: allow more tests to run in FIPS strict mode
Remove the skip statements/rework the code to allow
FIPS builds to do Wireguard by wrapping Wireguard operations
into `fips140.WithoutEnforcement` blocks.

Using Wireguard (or not using it) is still a user's choice, but this
allows tests to run in strict mode.

There might be more fixes required for FIPS strict mode; right now this
is blocked by a Go issue with X25519, the fix for which is going to be
backported to Go 1.26.3.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-17 19:56:18 +04:00
Andrey Smirnov
bdcc9321b6
fix: reduce dashboard memory usage
Many small changes; the memory reduction was measured at around 20 MiB.

Reduce cgroup memory limit.

Changes:

* limit updates to 2fps
* batch log updates
* reuse maps/slices to reduce allocations

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-04-17 19:27:24 +04:00