6217 Commits

Author SHA1 Message Date
Andrey Smirnov
7a94673068
test: fix cron failures for provision-1 & provision-2
Build missing assets in cron schedule.

Fixes #13017

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit aa5946dd385a2b99d572f9318e4eeeeee441b51b)
2026-03-26 16:04:57 +04:00
Andrey Smirnov
7978152094
fix: allow blockdevice wipe in maintenance mode
This is a regression compared to Talos 1.12: allow blockdevice wipe in
maintenance mode (with `os:reader` role).

Also improve the test for maintenance via SideroLink - add a test on
install, META write and reboot preserving META value.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 1dd701efa8119b6515a62ff68c430c99a96f2b68)
2026-03-26 16:03:43 +04:00
Andrey Smirnov
efc76f0bfe
test: fix the flakes in tests with trusted roots
As one of the integration tests was overriding TrustedRoots config, it
erased the required settings leading to a random failure (depending on
the nodes picked for subsequent tests).

Fixes #13013

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 70cefab6af3dacdc80921b55ca8dbf5644501c6c)
2026-03-26 16:03:09 +04:00
Andrey Smirnov
7fa16b4978
test: bump memory for Flannel netpolicy tests
Fixes #13015

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit aacff17f4c8890d6cada8efc6e715f69750f79cd)
2026-03-26 16:02:50 +04:00
Kevin Tijssen
576c269484
feat: add --platform=all support to image cache-create
Add support for caching all platforms in a multi-platform image index
by passing --platform=all to the images cache-create command.

When all is specified, the index manifest is fetched without platform
resolution, and each platform-specific image is downloaded individually.
Attestation manifests (unknown/unknown) are included.

Include the platform in the fetch log line so each pull is identifiable,
e.g. fetching image "..." (linux/amd64).

Signed-off-by: Kevin Tijssen <kevin.tijssen@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 786bf00abb309955616e440cd06fd0718b1b77ab)
2026-03-26 16:01:56 +04:00
Andrey Smirnov
ceec42f2a5
feat: update Linux to 6.18.19, CNI to 1.9.1
Also clean up some imports in go.mod, reduce replaced modules.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 9c34591144f1e2fc759fdc6d56694541eb9f241a)
2026-03-26 16:01:35 +04:00
Andrey Smirnov
902c78a17e
test: improve maintenance API provision tests
Add a test that covers all maintenance APIs in general.

Add a test for transition from SideroLink.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit ad72c73006abc3b51e5371496c61d8637b2222f0)
2026-03-26 16:01:16 +04:00
Orzelius
a4b0cbc491
feat: validate luks headers for tampering
Pull in a new version of go-blockdevice, which adds support for validating LUKS headers for tampering.

Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
(cherry picked from commit e1f645e3cbeee5306dc0075deb8942793eb80a81)
2026-03-26 16:00:27 +04:00
Orzelius
281584b88c
chore: update go-kubernetes library
New retry logic and CRDs.

Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
(cherry picked from commit e2b2dd3ea7eed8bc139cd0bd812253baee0dd95c)
2026-03-26 16:00:06 +04:00
David Orman
b863607905
fix: add symlinks nvidia-ctk and nvidia-cdi-hook in /usr/bin
The gpu-operator device plugin generates CDI specs with hooks pointing
to /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook (hardcoded defaults
in NVIDIA/k8s-device-plugin and NVIDIA/nvidia-container-toolkit). Talos
extensions install these binaries under /usr/local/bin/, so pods
requesting nvidia.com/gpu resource limits fail with "no such file".

Add /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook to the rootfs as
symlinks.

Fixes: #13021
Fixes: https://github.com/siderolabs/extensions/issues/1017

Signed-off-by: David Orman <ormandj@corenode.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 9597714f625ac07bf74de32a24c3e6dad5abdc91)
2026-03-26 15:59:44 +04:00
Andrey Smirnov
d82fada75b
fix: unset rlimits for extension services
See https://github.com/siderolabs/talos/discussions/13012

containerd's default OCI spec sets the NOFILE rlimit to 1024;
unset it to simply let the machined defaults take over.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 8ac47d677703624ec6568294d94dcad7e533e6c4)
2026-03-26 15:59:25 +04:00
Andrey Smirnov
76931f4092
feat: enforce PID check on connections to services over file sockets
Whitelist services which can access the file socket, refuse other
connections.

Fixes #12701

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 038cb87354eea1c1ff4612bdd13d1e77e595955a)
2026-03-26 15:58:41 +04:00
Andrey Smirnov
df4e0e7f58
feat: update etcd to 3.6.9
Resolves:

* https://github.com/etcd-io/etcd/security/advisories/GHSA-q8m4-xhhv-38mg
* https://github.com/etcd-io/etcd/security/advisories/GHSA-rfx7-8w68-q57q

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 362fdc9ece81e805a5a6a4e0303bdf78a6b2c35d)
2026-03-26 15:58:20 +04:00
Andrey Smirnov
08ba425e6c
feat: update Kubernetes to 1.36.0-beta.0
Update to the latest available release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit b1a02f3681c7e361ee6a3ef3d230b47480b48408)
2026-03-26 15:58:02 +04:00
Andrey Smirnov
1cb2a8b302
fix: update diff library to v1.0.1
Our fixes got merged, and more fixes in the library as well.

Bump the grpc library (due to a reported CVE which does not affect
us).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 86344639fcb76d9430ac1e975c98db4488701e43)
2026-03-26 15:57:44 +04:00
Andrey Smirnov
5e171a3de1
test: fix the apid test against AWS/GCP
We should use the endpoint(s) from the original talosconfig instead of
using node IPs, as they might be private/behind the LB.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 8e1c8a7a90fb039fd8a639a1218c169bc683d141)
2026-03-26 15:57:24 +04:00
Andrey Smirnov
f98e76f8d8
fix: panics in diff algorithms
The fix PR https://github.com/neticdk/go-stdlib/pull/44

Replace the library for now.

Add a fuzzing test, keep the panic-causing vectors.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit eff89d1ed46e5f3c709305a8cb134dabae925420)
2026-03-26 15:57:01 +04:00
Mateusz Urbanek
a544aea844
release(v1.13.0-beta.0): prepare release
This is the official v1.13.0-beta.0 release.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
v1.13.0-beta.0 pkg/machinery/v1.13.0-beta.0
2026-03-18 12:41:00 +01:00
Mateusz Urbanek
f36f6ef54d
chore: update pkgs and tools
Update dependencies:
```
pkgs: v1.13.0-beta.0
tools: v1.13.0-beta.0
```

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-03-17 14:32:17 +01:00
Andrey Smirnov
b7d70cf625
feat: unify maintenance and regular APIs
Drop maintenance service and all the code supporting it directly.

Instead, move all network API termination into the `apid` service, which
can now work in more modes to support maintenance operations as
well.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-17 17:00:35 +04:00
Andrey Smirnov
13d6b4a03c
fix: trim down cosign dependencies
Trade some imports, bump some modules, net result is killing lots of
transitive dependencies which were getting into the build.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-16 22:53:50 +04:00
Andrey Smirnov
5c39a85814
fix: drop aws & azure KMS APIs from the machined build
Replace imports of `pkg/imager` which are reachable from machined.

See #12980

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-16 21:44:26 +04:00
Andrey Smirnov
3d059754c2
fix: accept image cache volume encryption config
Fixes #12945

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-16 19:01:49 +04:00
Andrey Smirnov
d2661d2531
fix: apparmor parser config files
Bring in apparmor fix from https://github.com/siderolabs/pkgs/pull/1489

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-16 17:38:11 +04:00
Mateusz Urbanek
13ef0cfc9b
fix: unmount pseudo-late recursively
Pseudo late mount points (`/system`, `/run` and `/system`) were consistently failing to unmount.
While reaching this unmount sequence, we should already have unmounted any children.
However, if those are not unmounted, we should log what we are unmounting and unmount them recursively.

Fixes #12974

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-03-16 14:14:01 +01:00
Andrey Smirnov
e9d45671a8
fix: panic in hardware.SystemInfoController
The panic:

```
2026/03/16 13:39:56 172.20.0.3: {"component":"controller-runtime","controller":"hardware.SystemInfoController","error":"controller \"hardware.SystemInfoController\" panicked: output tracking already enabled\n\ngoroutine 613 [running]:\nruntime/debug.Stack()\n\t/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).runOnce.func2()\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/run.go:67 +0x4c\npanic({0x2a43dc0?, 0x350ff30?})\n\t/go/src/runtime/panic.go:860 +0x13a\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).StartTrackingOutputs(0x38246abe1c98?)\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/output_tracker.go:25 +0x94\ngithub.com/siderolabs/talos/internal/app/machined/pkg/controllers/hardware.(*SystemInfoController).Run(0x38246a3fe280, {0x3549b50, 0x38246a96dbd0}, {0x358b070, 0x38246adaf0e0}, 0x38246adba000)\n\t/src/internal/app/machined/pkg/controllers/hardware/system.go:93 +0x127\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).runOnce(0x38246adaf0e0, {0x3549b50, 0x38246a96dbd0}, 0x38246adba000)\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/run.go:73 +0xfa\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).Run(0x38246adaf0e0, {0x3549b50, 0x38246a96dbd0})\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/run.go:25 +0x16b\ngithub.com/cosi-project/runtime/pkg/controller/runtime.(*Runtime).Run.func1.2()\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/runtime.go:201 +0x2e\ngithub.com/cosi-project/runtime/pkg/controller/runtime.(*Runtime).Run.func1.goFunc.3()\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/runtime.go:473 
+0x13\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/.cache/mod/golang.org/x/sync@v0.20.0/errgroup/errgroup.go:93 +0x50\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 146\n\t/.cache/mod/golang.org/x/sync@v0.20.0/errgroup/errgroup.go:78 +0x95\n","msg":"2026-03-16T09:39:56.457Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-03-16T09:39:56.718594712Z"}
```

This is more of a cosmetic issue, but still: move tracking outputs below
the `continue` statement, otherwise it might be called twice in a single
run.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-16 13:52:46 +04:00
Dominik Pitz
a728bbd897
fix: validate missing apiVersion in config document decoder
Add ErrMissingAPIVersion check in the config document decoder, parallel
to the existing ErrMissingKind. Previously, a typo in the apiVersion key
(e.g. 'apiVerstion') would result in a misleading 'not registered' error
instead of clearly indicating the missing field.

Signed-off-by: Dominik Pitz <pitzdominik@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-16 13:36:33 +04:00
Andrey Smirnov
c8a674afa6
fix: pull in a fix for dmesg timestamps
See https://github.com/siderolabs/go-kmsg/pull/13

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-16 13:17:38 +04:00
Noel Georgi
e7e21fe8ee
feat: bump dependencies
Bump dependencies.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-03-15 20:53:59 +05:30
Mateusz Urbanek
6bb5cf57a2
feat: implement routing rules support
Add RoutingRuleConfig multi-doc config type for management of routing rules.
KubeSpan now uses COSI resources instead of direct kernel management.

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2026-03-13 15:17:49 +01:00
Zadkiel AHARONIAN
a0b9d6e777
feat: bump kernel with uhci_hcd driver
See https://github.com/siderolabs/pkgs/pull/1483

Signed-off-by: Zadkiel AHARONIAN <hello@zadkiel.fr>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-12 17:19:37 +04:00
Andrey Smirnov
1f0d2da396
feat: update containerd to 2.2.2
Pull in via pkgs, bump containerd module (our fork).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 21:54:01 +04:00
Mickaël Canévet
cff0f57825
fix(machined): support USERDATA legacy fallback in OpenNebula driver
The reference does USER_DATA="${USER_DATA:-${USERDATA}}". Talos only read
USER_DATA, silently returning ErrNoConfigSource when a VM used the legacy
USERDATA variable name.
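
The fallback can be sketched as follows. This is an illustration only: the `userData` helper and the map-based context are assumptions, not the driver's real signature.

```go
package main

import "fmt"

// userData mirrors USER_DATA="${USER_DATA:-${USERDATA}}": the legacy
// USERDATA variable is used when USER_DATA is unset or empty.
// The helper name and map-based context are hypothetical.
func userData(ctx map[string]string) (string, bool) {
	if v := ctx["USER_DATA"]; v != "" {
		return v, true
	}

	if v := ctx["USERDATA"]; v != "" {
		return v, true
	}

	return "", false
}

func main() {
	v, ok := userData(map[string]string{"USERDATA": "legacy-config"})

	fmt.Println(v, ok)
}
```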

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:15:24 +04:00
Mickaël Canévet
5d3a326c80
feat(machined): add ONEGATE proxy route and deterministic interface iteration for OpenNebula
When ONEGATE_ENDPOINT contains a link-local IPv4 address (169.254.x.x),
emit a /32 scope-link host route via the first static interface, matching
the reference add_onegate_proxy_route behavior. Without this route, VMs
using link-local OneGate endpoints cannot reach the metadata service.

Interface names are now collected and sorted before processing, matching
the reference env | grep ... | sort behavior (ETH0, ETH1, ...). This
makes DNS server ordering and ONEGATE route attachment deterministic
regardless of Go map iteration order.

The interface loop is extracted into processInterfaces to keep ParseMetadata
within cyclomatic complexity limits.
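
A rough sketch of the two ideas above, under stated assumptions: `ONEGATE_ENDPOINT` is taken to be a URL, and the helper names (`onegateHostRoute`, `sortedInterfaces`) are hypothetical rather than the driver's actual functions.

```go
package main

import (
	"fmt"
	"net/netip"
	"net/url"
	"sort"
)

// onegateHostRoute returns a /32 host route destination when the endpoint
// host is a link-local IPv4 address, and false otherwise.
func onegateHostRoute(endpoint string) (netip.Prefix, bool) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return netip.Prefix{}, false
	}

	addr, err := netip.ParseAddr(u.Hostname())
	if err != nil || !addr.Is4() || !addr.IsLinkLocalUnicast() {
		return netip.Prefix{}, false
	}

	return netip.PrefixFrom(addr, 32), true
}

// sortedInterfaces collects map keys and sorts them, so that processing
// order does not depend on Go map iteration order.
func sortedInterfaces(ifaces map[string]string) []string {
	names := make([]string, 0, len(ifaces))
	for name := range ifaces {
		names = append(names, name)
	}

	sort.Strings(names)

	return names
}

func main() {
	route, ok := onegateHostRoute("http://169.254.16.9:5030")
	fmt.Println(route, ok)

	fmt.Println(sortedInterfaces(map[string]string{"ETH1": "mac-b", "ETH0": "mac-a"}))
}
```

Sorting is lexicographic, like the shell `sort` it imitates, so `ETH10` would sort before `ETH2`.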

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:15:15 +04:00
Mickaël Canévet
3bec5cc7ba
feat(machined): inherit IP6_METHOD from METHOD in OpenNebula driver
When ETH*_IP6_METHOD is unset, fall back to the value of ETH*_METHOD,
matching the reference [ -z "$ip6_method" ] && ip6_method="${method}"
logic in setup_iface_vars. This means a DHCP interface now also gets a
DHCPv6 operator, a static interface stays static, and a skip interface
remains fully skipped. Update golden testdata to include the DHCPv6
operator that ETH1_METHOD=dhcp now emits.
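
The inheritance rule amounts to a one-line fallback. A minimal sketch, assuming a map-based context and a hypothetical `ip6Method` helper:

```go
package main

import "fmt"

// ip6Method returns ETHn_IP6_METHOD, falling back to ETHn_METHOD when the
// IPv6-specific variable is unset or empty. The helper is hypothetical.
func ip6Method(ctx map[string]string, iface string) string {
	if m := ctx[iface+"_IP6_METHOD"]; m != "" {
		return m
	}

	return ctx[iface+"_METHOD"]
}

func main() {
	ctx := map[string]string{"ETH1_METHOD": "dhcp"}

	// ETH1 has no IP6_METHOD, so it inherits "dhcp".
	fmt.Println(ip6Method(ctx, "ETH1"))
}
```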

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:15:06 +04:00
Mickaël Canévet
4f4ec98060
fix(machined): align OpenNebula hostname precedence with reference
Use SET_HOSTNAME exclusively, matching the reference net-15-hostname
script. The previous implementation fell back to HOSTNAME (not used by
OpenNebula) and NAME (the VM name, not a hostname source in the
reference). DNS_HOSTNAME is a server-side flag that triggers a reverse
DNS lookup — a live network operation that cannot be performed inside
ParseMetadata.

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:14:58 +04:00
Mickaël Canévet
4d0244ddf7
feat(machined): add IPv6 alias address support for OpenNebula (ETH*_ALIAS*_IP6)
Extends parseAliases to read ETH*_ALIAS*_IP6 (legacy: ETH*_ALIAS*_IPV6)
and ETH*_ALIAS*_IP6_PREFIX_LENGTH (default 64), emitting an IPv6
AddressSpecSpec subject to the same EXTERNAL/DETACH skip logic as IPv4
aliases.

Error tests for IPv4/IPv6 addresses, aliases, and gateway are consolidated
into a single TestParseErrors function to avoid duplication.

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:14:51 +04:00
Mickaël Canévet
5bb896230e
feat(machined): support ETH*_IP6_METHOD (static/dhcp/auto/disable) for OpenNebula
Dispatches on ETH*_IP6_METHOD before the static IPv6 path:
- disable: skip all IPv6 config for the interface
- auto: emit nothing; Talos accepts Router Advertisements by default so
  SLAAC address auto-configuration works without any explicit operator
- dhcp: emit OperatorDHCP6 with RouteMetric from ETH*_IP6_METRIC (default 1)
- static / empty: fall through to the existing static address path
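
The dispatch above might look roughly like the following. The `ip6Plan` type and `planIPv6` helper are hypothetical, and treating unknown values like `static` is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// ip6Plan describes what to emit for one interface (hypothetical type).
type ip6Plan struct {
	Skip   bool   // no IPv6 configuration at all
	DHCP6  bool   // emit a DHCPv6 operator
	Metric uint32 // route metric for the DHCPv6 operator
	Static bool   // continue with the static address path
}

// planIPv6 dispatches on ETHn_IP6_METHOD.
func planIPv6(ctx map[string]string, iface string) (ip6Plan, error) {
	switch ctx[iface+"_IP6_METHOD"] {
	case "disable":
		return ip6Plan{Skip: true}, nil
	case "auto":
		// Nothing to emit: SLAAC relies on Router Advertisements.
		return ip6Plan{}, nil
	case "dhcp":
		metric := uint32(1) // default when ETHn_IP6_METRIC is absent

		if raw := ctx[iface+"_IP6_METRIC"]; raw != "" {
			v, err := strconv.ParseUint(raw, 10, 32)
			if err != nil {
				return ip6Plan{}, fmt.Errorf("%s_IP6_METRIC: %w", iface, err)
			}

			metric = uint32(v)
		}

		return ip6Plan{DHCP6: true, Metric: metric}, nil
	default:
		// "static", empty, or (by assumption) any unknown value.
		return ip6Plan{Static: true}, nil
	}
}

func main() {
	plan, err := planIPv6(map[string]string{"ETH0_IP6_METHOD": "dhcp"}, "ETH0")

	fmt.Printf("%+v %v\n", plan, err)
}
```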

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:14:43 +04:00
Mickaël Canévet
469db18d39
refactor(machined): extract per-interface IPv4 helper in OpenNebula driver
Move the per-interface IPv4 logic from ParseMetadata into a dedicated
parseInterfaceIPv4 helper, and add an empty parseInterfaceIPv6 stub.
ParseMetadata now delegates all per-interface work to those two helpers
plus the existing parseAliases, keeping its own body small.

No behaviour change; all existing tests pass.

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:14:35 +04:00
Mickaël Canévet
ae61f5a5e5
fix(machined): use ParseFQDN for hostname parsing in OpenNebula
Two bugs are fixed:

1. DNS_HOSTNAME was wrongly used as Domainname. DNS_HOSTNAME is a boolean
   flag (YES/NO) that tells the OpenNebula daemon to perform a reverse
   DNS lookup; it is not a domain name string. Using it as Domainname
   produced invalid FQDNs like "myhost.YES".

2. No FQDN splitting: if the hostname source contained a dot (e.g.
   NAME="myhost.example.com"), the full string was used as Hostname
   instead of splitting on the first dot.

Both bugs are fixed by switching to ParseFQDN(), consistent with how all
other Talos platform implementations handle hostname parsing.
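
To illustrate the splitting behavior only (this is not the real `ParseFQDN` implementation, whose signature is not shown here), a hypothetical `splitFQDN` helper could split on the first dot like this:

```go
package main

import (
	"fmt"
	"strings"
)

// splitFQDN is a hypothetical stand-in that splits a name on the first dot
// into a hostname and a domain name. The domain is empty when the input
// contains no dot.
func splitFQDN(fqdn string) (hostname, domainname string) {
	hostname, domainname, _ = strings.Cut(fqdn, ".")

	return hostname, domainname
}

func main() {
	host, domain := splitFQDN("myhost.example.com")

	fmt.Printf("hostname=%q domainname=%q\n", host, domain)
}
```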

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:14:23 +04:00
Mickaël Canévet
7adbbd2f84
feat(machined): support per-interface route metric for OpenNebula (ETH*_METRIC)
Parse ETHn_METRIC context variables and apply the value as the route
priority for static default gateway routes and the DHCP4 operator's
RouteMetric. When absent, the existing default of 1024 is preserved,
matching the reference netcfg-networkd behavior.
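
A small sketch of the metric lookup with its default. The `routeMetric` helper and the map-based context are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultRouteMetric is the default named in the commit message.
const defaultRouteMetric = 1024

// routeMetric parses ETHn_METRIC and falls back to the default when the
// variable is absent or empty. The helper is hypothetical.
func routeMetric(ctx map[string]string, iface string) (uint32, error) {
	raw := ctx[iface+"_METRIC"]
	if raw == "" {
		return defaultRouteMetric, nil
	}

	v, err := strconv.ParseUint(raw, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("%s_METRIC: %w", iface, err)
	}

	return uint32(v), nil
}

func main() {
	ctx := map[string]string{"ETH0_METRIC": "200"}

	fmt.Println(routeMetric(ctx, "ETH0"))
	fmt.Println(routeMetric(ctx, "ETH1"))
}
```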

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:14:16 +04:00
Mickaël Canévet
196658c41c
feat(machined): add network alias support for OpenNebula (ETH*_ALIAS*)
Parse ETHn_ALIASm_* context variables and add secondary IPv4 addresses
to the parent interface as additional AddressSpecSpec entries. Aliases
are skipped when DETACH is non-empty or EXTERNAL=YES, matching the
reference netcfg-networkd behavior.

Also guard the ETHn_MAC interface loop to only process top-level
interface keys (ETH<digits>_MAC), preventing alias MAC keys such as
ETH0_ALIAS0_MAC from being mistakenly treated as interfaces.
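
Both guards can be sketched as below. The regular expression and the `skipAlias` helper are assumptions that follow the description above, not code taken from the driver.

```go
package main

import (
	"fmt"
	"regexp"
)

// topLevelMAC matches only top-level interface keys such as ETH0_MAC,
// and rejects alias keys such as ETH0_ALIAS0_MAC.
var topLevelMAC = regexp.MustCompile(`^ETH\d+_MAC$`)

// skipAlias reports whether an alias (for example "ETH0_ALIAS0") should be
// ignored: DETACH is non-empty or EXTERNAL is YES.
func skipAlias(ctx map[string]string, alias string) bool {
	return ctx[alias+"_DETACH"] != "" || ctx[alias+"_EXTERNAL"] == "YES"
}

func main() {
	fmt.Println(topLevelMAC.MatchString("ETH0_MAC"))
	fmt.Println(topLevelMAC.MatchString("ETH0_ALIAS0_MAC"))

	ctx := map[string]string{"ETH0_ALIAS0_EXTERNAL": "YES"}
	fmt.Println(skipAlias(ctx, "ETH0_ALIAS0"))
}
```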

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:14:05 +04:00
Mickaël Canévet
e96766e810
feat(machined): merge global and per-interface DNS for OpenNebula
Accumulate DNS servers and search domains from both global context
variables (DNS, SEARCH_DOMAIN) and per-interface variables
(ETH*_DNS, ETH*_SEARCH_DOMAIN) into a single merged ResolverSpecSpec,
matching the reference one-apps context-linux get_nameservers() /
get_searchdomains() behavior that writes one /etc/resolv.conf.

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:13:58 +04:00
Mickaël Canévet
23c99a3cb4
feat(machined): add static routes support via ETH*_ROUTES for OpenNebula
Parse the ETH*_ROUTES context variable in the OpenNebula platform and
install per-interface static routes into the platform network config.
Both legacy format ("DEST MASK GW [METRIC]") and CIDR format
("DEST/PREFIX GW [METRIC]") are supported, matching the reference
one-apps context-linux implementation.
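
A sketch of parsing a single route entry in either format, under stated assumptions: the `route` type and `parseRoute` helper are hypothetical, and how multiple entries are separated inside `ETH*_ROUTES` is deliberately left out.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strconv"
	"strings"
)

// route is a hypothetical parsed route; Metric is 0 when omitted.
type route struct {
	Dest    netip.Prefix
	Gateway netip.Addr
	Metric  uint32
}

// parseRoute accepts "DEST MASK GW [METRIC]" or "DEST/PREFIX GW [METRIC]".
func parseRoute(entry string) (route, error) {
	fields := strings.Fields(entry)

	var (
		r    route
		rest []string
		err  error
	)

	switch {
	case len(fields) >= 2 && strings.Contains(fields[0], "/"):
		r.Dest, err = netip.ParsePrefix(fields[0])
		rest = fields[1:]
	case len(fields) >= 3:
		r.Dest, err = prefixFromMask(fields[0], fields[1])
		rest = fields[2:]
	default:
		return route{}, fmt.Errorf("malformed route %q", entry)
	}

	if err != nil {
		return route{}, err
	}

	if len(rest) > 2 {
		return route{}, fmt.Errorf("too many fields in route %q", entry)
	}

	if r.Gateway, err = netip.ParseAddr(rest[0]); err != nil {
		return route{}, err
	}

	if len(rest) == 2 {
		m, err := strconv.ParseUint(rest[1], 10, 32)
		if err != nil {
			return route{}, err
		}

		r.Metric = uint32(m)
	}

	return r, nil
}

// prefixFromMask converts a destination plus dotted IPv4 netmask to a prefix.
func prefixFromMask(dest, mask string) (netip.Prefix, error) {
	addr, err := netip.ParseAddr(dest)
	if err != nil {
		return netip.Prefix{}, err
	}

	m := net.ParseIP(mask).To4()
	if m == nil {
		return netip.Prefix{}, fmt.Errorf("invalid netmask %q", mask)
	}

	ones, bits := net.IPMask(m).Size()
	if bits == 0 {
		return netip.Prefix{}, fmt.Errorf("non-contiguous netmask %q", mask)
	}

	return netip.PrefixFrom(addr, ones), nil
}

func main() {
	fmt.Println(parseRoute("10.0.0.0 255.0.0.0 192.168.1.1 100"))
	fmt.Println(parseRoute("172.16.0.0/12 192.168.1.254"))
}
```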

Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 14:13:29 +04:00
Andrey Smirnov
ad3c59aada
fix: prevent stale discovered volumes reads
This pulls in a fix https://github.com/siderolabs/go-blockdevice/pull/148

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 12:34:08 +04:00
Andrey Smirnov
fc9749b9eb
feat: pull in kernel with preemptible kernel
Also sync tools, now the kernel is built with LLVM 22.1.

See https://github.com/siderolabs/pkgs/issues/1479 for the context.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-11 12:12:23 +04:00
Noel Georgi
c14179e78d
chore(ci): update nvidia test to use gpu-operator
Update NVIDIA tests to use GPU Operator.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2026-03-11 05:25:15 +05:30
Andrey Smirnov
da70cedfd2
refactor: drop apid file socket
This was yet another socket with implicit auth - remove it completely
by reworking the only use case for it - cluster-side health checks.
Now these health checks build a "regular" network Talos API client (as
they only work on controlplane nodes anyway).

Refactor the check for controlplane nodes to use resources instead of
machine config directly (as machine config might not be always present).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-10 21:52:03 +04:00
Andrey Smirnov
ee53a18c8b
fix: stop pulling wrong platform for images
Attempt to fix intermittent issue with images being pulled with the
wrong platform for multi-platform images.

Claude did the analysis, and I think the root cause is that the
`DefaultSpec()` we used causes the match to include `variant` which is
e.g. `v8` for arm64, while if the image doesn't declare the exact
variant, it might skip filtering and pick up the first layer which is
amd64.

It is still not clear why exactly it is intermittent this way.

But this change aligns it more closely with the way containerd pulls, so
should be good to go.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-10 20:26:26 +04:00
Andrey Smirnov
17335107be
fix: use non-sensitive resource for health check precondition
A fixup for #12896

The health check might be running as a reduced privilege role client, so
don't pull the machine config, but instead read a field from a
non-sensitive resource.

As this field doesn't exist in older versions of Talos, the check should
still run by default (as it will be empty).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-10 18:37:55 +04:00