This is a regression compared to Talos 1.12: allow blockdevice wipe in
maintenance mode (with `os:reader` role).
Also improve the test for maintenance via SideroLink - add a test on
install, META write and reboot preserving META value.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 1dd701efa8119b6515a62ff68c430c99a96f2b68)
As one of the integration tests was overriding TrustedRoots config, it
erased the required settings leading to a random failure (depending on
the nodes picked for subsequent tests).
Fixes #13013
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 70cefab6af3dacdc80921b55ca8dbf5644501c6c)
Add support for caching all platforms in a multi-platform image index
by passing --platform=all to the images cache-create command.
When all is specified, the index manifest is fetched without platform
resolution, and each platform-specific image is downloaded individually.
Attestation manifests (unknown/unknown) are included.
Include the platform in the fetch log line so each pull is identifiable,
e.g. fetching image "..." (linux/amd64).
Signed-off-by: Kevin Tijssen <kevin.tijssen@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 786bf00abb309955616e440cd06fd0718b1b77ab)
Also clean up some imports in go.mod, reduce replaced modules.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 9c34591144f1e2fc759fdc6d56694541eb9f241a)
Add a test that covers all maintenance APIs in general.
Add a test for transition from SideroLink.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit ad72c73006abc3b51e5371496c61d8637b2222f0)
Pull in a new version of go-blockdevice, which adds support for validating LUKS headers for tampering.
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
(cherry picked from commit e1f645e3cbeee5306dc0075deb8942793eb80a81)
new retry logic and CDRs
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
(cherry picked from commit e2b2dd3ea7eed8bc139cd0bd812253baee0dd95c)
The gpu-operator device plugin generates CDI specs with hooks pointing
to /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook (hardcoded defaults
in NVIDIA/k8s-device-plugin and NVIDIA/nvidia-container-toolkit). Talos
extensions install these binaries under /usr/local/bin/, so pods
requesting nvidia.com/gpu resource limits fail with "no such file".
Add /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook to the rootfs as
symlinks.
Fixes: #13021
Fixes: https://github.com/siderolabs/extensions/issues/1017
Signed-off-by: David Orman <ormandj@corenode.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 9597714f625ac07bf74de32a24c3e6dad5abdc91)
See https://github.com/siderolabs/talos/discussions/13012
containerd's default OCI spec sets the NOFILE rlimit to 1024;
unset it so that the machined defaults take over.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 8ac47d677703624ec6568294d94dcad7e533e6c4)
Whitelist services which can access the file socket, refuse other
connections.
Fixes #12701
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 038cb87354eea1c1ff4612bdd13d1e77e595955a)
Update to the latest available release.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit b1a02f3681c7e361ee6a3ef3d230b47480b48408)
Our fixes got merged, and more fixes in the library as well.
Bump the grpc library (due to a reported CVE, which does not affect
us).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 86344639fcb76d9430ac1e975c98db4488701e43)
We should use the endpoint(s) from the original talosconfig instead of
using node IPs, as they might be private/behind the LB.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 8e1c8a7a90fb039fd8a639a1218c169bc683d141)
Drop maintenance service and all the code supporting it directly.
Instead, move all network API termination into the `apid` service, which
can now work in more modes to support maintenance operations as
well.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Trade some imports, bump some modules, net result is killing lots of
transitive dependencies which were getting into the build.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Pseudo late mount points (`/system`, `/run` and `/system`) were consistently failing to unmount.
While reaching this unmount sequence, we should already have unmounted any children.
However, if those are not unmounted, we should log what we are unmounting and unmount them recursively.
Fixes #12974
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
The panic:
```
2026/03/16 13:39:56 172.20.0.3: {"component":"controller-runtime","controller":"hardware.SystemInfoController","error":"controller \"hardware.SystemInfoController\" panicked: output tracking already enabled\n\ngoroutine 613 [running]:\nruntime/debug.Stack()\n\t/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).runOnce.func2()\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/run.go:67 +0x4c\npanic({0x2a43dc0?, 0x350ff30?})\n\t/go/src/runtime/panic.go:860 +0x13a\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).StartTrackingOutputs(0x38246abe1c98?)\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/output_tracker.go:25 +0x94\ngithub.com/siderolabs/talos/internal/app/machined/pkg/controllers/hardware.(*SystemInfoController).Run(0x38246a3fe280, {0x3549b50, 0x38246a96dbd0}, {0x358b070, 0x38246adaf0e0}, 0x38246adba000)\n\t/src/internal/app/machined/pkg/controllers/hardware/system.go:93 +0x127\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).runOnce(0x38246adaf0e0, {0x3549b50, 0x38246a96dbd0}, 0x38246adba000)\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/run.go:73 +0xfa\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).Run(0x38246adaf0e0, {0x3549b50, 0x38246a96dbd0})\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/run.go:25 +0x16b\ngithub.com/cosi-project/runtime/pkg/controller/runtime.(*Runtime).Run.func1.2()\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/runtime.go:201 +0x2e\ngithub.com/cosi-project/runtime/pkg/controller/runtime.(*Runtime).Run.func1.goFunc.3()\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/runtime.go:473 
+0x13\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/.cache/mod/golang.org/x/sync@v0.20.0/errgroup/errgroup.go:93 +0x50\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 146\n\t/.cache/mod/golang.org/x/sync@v0.20.0/errgroup/errgroup.go:78 +0x95\n","msg":"2026-03-16T09:39:56.457Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-03-16T09:39:56.718594712Z"}
```
This is more of a cosmetic issue, but still - move tracking outputs below
the `continue` statement, otherwise it might be called twice in a single
run.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add ErrMissingAPIVersion check in the config document decoder, parallel
to the existing ErrMissingKind. Previously, a typo in the apiVersion key
(e.g. 'apiVerstion') would result in a misleading 'not registered' error
instead of clearly indicating the missing field.
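
A minimal sketch of the header check described above, not the actual
decoder code. The sentinel error values, the `checkHeader` helper and
the `ExampleConfig` kind are hypothetical; only the error names come
from this message.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the names mirror the ones in this commit,
// the messages are made up for the sketch.
var (
	ErrMissingKind       = errors.New("missing kind")
	ErrMissingAPIVersion = errors.New("missing apiVersion")
)

// checkHeader inspects an already-unmarshaled document and reports which
// required header key is absent before any registry lookup happens.
func checkHeader(doc map[string]any) error {
	if kind, _ := doc["kind"].(string); kind == "" {
		return ErrMissingKind
	}

	if version, _ := doc["apiVersion"].(string); version == "" {
		return ErrMissingAPIVersion
	}

	return nil
}

func main() {
	// The typo'd key means "apiVersion" is absent from the document.
	doc := map[string]any{"kind": "ExampleConfig", "apiVerstion": "v1alpha1"}

	fmt.Println(checkHeader(doc))
}
```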
Signed-off-by: Dominik Pitz <pitzdominik@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add RoutingRuleConfig multi-doc config type for management of routing rules.
KubeSpan now uses COSI resources instead of direct kernel management.
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
The reference does USER_DATA="${USER_DATA:-${USERDATA}}". Talos only read
USER_DATA, silently returning ErrNoConfigSource when a VM used the legacy
USERDATA variable name.
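
The fallback can be sketched roughly as follows. The `userData` helper
and the map-based context are assumptions for illustration, not the
platform code itself.

```go
package main

import "fmt"

// userData returns USER_DATA when it is set and non-empty, otherwise the
// legacy USERDATA value, mirroring the shell expansion
// ${USER_DATA:-${USERDATA}}. The boolean reports whether either was found.
func userData(ctx map[string]string) (string, bool) {
	if v := ctx["USER_DATA"]; v != "" {
		return v, true
	}

	if v := ctx["USERDATA"]; v != "" {
		return v, true
	}

	return "", false
}

func main() {
	v, ok := userData(map[string]string{"USERDATA": "legacy-config"})
	fmt.Println(v, ok)
}
```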
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
When ONEGATE_ENDPOINT contains a link-local IPv4 address (169.254.x.x),
emit a /32 scope-link host route via the first static interface, matching
the reference add_onegate_proxy_route behavior. Without this route, VMs
using link-local OneGate endpoints cannot reach the metadata service.
Interface names are now collected and sorted before processing, matching
the reference env | grep ... | sort behavior (ETH0, ETH1, ...). This
makes DNS server ordering and ONEGATE route attachment deterministic
regardless of Go map iteration order.
The interface loop is extracted into processInterfaces to keep ParseMetadata
within cyclomatic complexity limits.
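
A rough sketch of the two pieces, assuming ONEGATE_ENDPOINT is a URL
with an IP host. `onegateHostRoute` and `sortedInterfaceNames` are
hypothetical names and do not match the real helpers.

```go
package main

import (
	"fmt"
	"net/netip"
	"net/url"
	"sort"
)

// onegateHostRoute returns a /32 destination for the endpoint host when it
// is a link-local IPv4 address (169.254.0.0/16), and false otherwise.
func onegateHostRoute(endpoint string) (netip.Prefix, bool) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return netip.Prefix{}, false
	}

	addr, err := netip.ParseAddr(u.Hostname())
	if err != nil || !addr.Is4() || !addr.IsLinkLocalUnicast() {
		return netip.Prefix{}, false
	}

	return netip.PrefixFrom(addr, 32), true
}

// sortedInterfaceNames returns the map keys in lexical order, so that
// processing does not depend on Go map iteration order.
func sortedInterfaceNames(ifaces map[string]struct{}) []string {
	names := make([]string, 0, len(ifaces))
	for name := range ifaces {
		names = append(names, name)
	}

	sort.Strings(names)

	return names
}

func main() {
	if dst, ok := onegateHostRoute("http://169.254.16.9:5030"); ok {
		fmt.Println("host route:", dst)
	}

	fmt.Println(sortedInterfaceNames(map[string]struct{}{"ETH1": {}, "ETH0": {}}))
}
```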
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
When ETH*_IP6_METHOD is unset, fall back to the value of ETH*_METHOD,
matching the reference [ -z "$ip6_method" ] && ip6_method="${method}"
logic in setup_iface_vars. This means a DHCP interface now also gets a
DHCPv6 operator, a static interface stays static, and a skip interface
remains fully skipped. Update golden testdata to include the DHCPv6
operator that ETH1_METHOD=dhcp now emits.
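
The fallback amounts to something like the sketch below, where
`effectiveIPv6Method` and the map-based context are illustrative
assumptions.

```go
package main

import "fmt"

// effectiveIPv6Method returns ETHn_IP6_METHOD when set, and otherwise
// falls back to ETHn_METHOD, as the reference setup_iface_vars does.
func effectiveIPv6Method(ctx map[string]string, iface string) string {
	if method := ctx[iface+"_IP6_METHOD"]; method != "" {
		return method
	}

	return ctx[iface+"_METHOD"]
}

func main() {
	ctx := map[string]string{"ETH1_METHOD": "dhcp"}

	fmt.Println(effectiveIPv6Method(ctx, "ETH1"))
}
```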
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Use SET_HOSTNAME exclusively, matching the reference net-15-hostname
script. The previous implementation fell back to HOSTNAME (not used by
OpenNebula) and NAME (the VM name, not a hostname source in the
reference). DNS_HOSTNAME is a server-side flag that triggers a reverse
DNS lookup — a live network operation that cannot be performed inside
ParseMetadata.
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Extends parseAliases to read ETH*_ALIAS*_IP6 (legacy: ETH*_ALIAS*_IPV6)
and ETH*_ALIAS*_IP6_PREFIX_LENGTH (default 64), emitting an IPv6
AddressSpecSpec subject to the same EXTERNAL/DETACH skip logic as IPv4
aliases.
Error tests for IPv4/IPv6 addresses, aliases, and gateway are consolidated
into a single TestParseErrors function to avoid duplication.
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Dispatches on ETH*_IP6_METHOD before the static IPv6 path:
- disable: skip all IPv6 config for the interface
- auto: emit nothing; Talos accepts Router Advertisements by default so
SLAAC address auto-configuration works without any explicit operator
- dhcp: emit OperatorDHCP6 with RouteMetric from ETH*_IP6_METRIC (default 1)
- static / empty: fall through to the existing static address path
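
The dispatch above could be sketched as follows. The `ipv6Plan` type and
`planIPv6` function are hypothetical; rejecting unknown methods and
matching case-insensitively are assumptions, not behavior stated here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ipv6Plan describes what to emit for one interface (illustrative only).
type ipv6Plan struct {
	Skip   bool   // disable: emit no IPv6 config at all
	DHCP6  bool   // dhcp: emit a DHCPv6 operator
	Metric uint32 // route metric for the DHCPv6 operator
	Static bool   // static or empty: use the static address path
}

// planIPv6 maps an ETHn_IP6_METHOD value (and optional ETHn_IP6_METRIC)
// to a plan. "auto" yields the zero plan: nothing is emitted.
func planIPv6(method, metric string) (ipv6Plan, error) {
	switch strings.ToLower(method) {
	case "disable":
		return ipv6Plan{Skip: true}, nil
	case "auto":
		return ipv6Plan{}, nil
	case "dhcp":
		m := uint64(1) // default metric when ETHn_IP6_METRIC is absent

		if metric != "" {
			var err error

			if m, err = strconv.ParseUint(metric, 10, 32); err != nil {
				return ipv6Plan{}, fmt.Errorf("invalid IPv6 metric %q: %w", metric, err)
			}
		}

		return ipv6Plan{DHCP6: true, Metric: uint32(m)}, nil
	case "static", "":
		return ipv6Plan{Static: true}, nil
	default:
		return ipv6Plan{}, fmt.Errorf("unknown IPv6 method %q", method)
	}
}

func main() {
	plan, err := planIPv6("dhcp", "")
	fmt.Println(plan, err)
}
```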
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Move the per-interface IPv4 logic from ParseMetadata into a dedicated
parseInterfaceIPv4 helper, and add an empty parseInterfaceIPv6 stub.
ParseMetadata now delegates all per-interface work to those two helpers
plus the existing parseAliases, keeping its own body small.
No behaviour change; all existing tests pass.
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Two bugs are fixed:
1. DNS_HOSTNAME was wrongly used as Domainname. DNS_HOSTNAME is a boolean
flag (YES/NO) that tells the OpenNebula daemon to perform a reverse
DNS lookup; it is not a domain name string. Using it as Domainname
produced invalid FQDNs like "myhost.YES".
2. No FQDN splitting: if the hostname source contained a dot (e.g.
NAME="myhost.example.com"), the full string was used as Hostname
instead of splitting on the first dot.
Both bugs are fixed by switching to ParseFQDN(), consistent with how all
other Talos platform implementations handle hostname parsing.
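
For illustration only, the split on the first dot looks like this.
`splitFQDN` is a stand-in written for this sketch; it is not the
ParseFQDN() implementation, whose signature is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFQDN splits a name on the first dot into a hostname and a domain
// name. A name without a dot yields an empty domain.
func splitFQDN(name string) (hostname, domainname string) {
	hostname, domainname, _ = strings.Cut(name, ".")

	return hostname, domainname
}

func main() {
	host, domain := splitFQDN("myhost.example.com")
	fmt.Printf("hostname=%q domainname=%q\n", host, domain)
}
```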
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Parse ETHn_METRIC context variables and apply the value as the route
priority for static default gateway routes and the DHCP4 operator's
RouteMetric. When absent, the existing default of 1024 is preserved,
matching the reference netcfg-networkd behavior.
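
A small sketch of the metric lookup with its default. The `routeMetric`
helper and the constant name are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultRouteMetric is the value used when ETHn_METRIC is absent.
const defaultRouteMetric = 1024

// routeMetric parses ETHn_METRIC for the given interface, falling back to
// the default when the variable is missing or empty.
func routeMetric(ctx map[string]string, iface string) (uint32, error) {
	raw := ctx[iface+"_METRIC"]
	if raw == "" {
		return defaultRouteMetric, nil
	}

	v, err := strconv.ParseUint(raw, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("parsing %s_METRIC: %w", iface, err)
	}

	return uint32(v), nil
}

func main() {
	ctx := map[string]string{"ETH0_METRIC": "200"}

	fmt.Println(routeMetric(ctx, "ETH0"))
	fmt.Println(routeMetric(ctx, "ETH1"))
}
```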
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Parse ETHn_ALIASm_* context variables and add secondary IPv4 addresses
to the parent interface as additional AddressSpecSpec entries. Aliases
are skipped when DETACH is non-empty or EXTERNAL=YES, matching the
reference netcfg-networkd behavior.
Also guard the ETHn_MAC interface loop to only process top-level
interface keys (ETH<digits>_MAC), preventing alias MAC keys such as
ETH0_ALIAS0_MAC from being mistakenly treated as interfaces.
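
The guard and the skip rule can be sketched as below. The regular
expression and the `skipAlias` helper are illustrative assumptions; the
exact comparison against "YES" follows the wording above.

```go
package main

import (
	"fmt"
	"regexp"
)

// ifaceMACKey matches only top-level interface keys such as ETH0_MAC,
// not alias keys such as ETH0_ALIAS0_MAC.
var ifaceMACKey = regexp.MustCompile(`^ETH\d+_MAC$`)

// skipAlias reports whether an alias should be ignored: either it is being
// detached (DETACH non-empty) or it is marked EXTERNAL=YES.
func skipAlias(detach, external string) bool {
	return detach != "" || external == "YES"
}

func main() {
	for _, key := range []string{"ETH0_MAC", "ETH0_ALIAS0_MAC", "ETH10_MAC"} {
		fmt.Println(key, ifaceMACKey.MatchString(key))
	}

	fmt.Println(skipAlias("", "YES"))
}
```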
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Accumulate DNS servers and search domains from both global context
variables (DNS, SEARCH_DOMAIN) and per-interface variables
(ETH*_DNS, ETH*_SEARCH_DOMAIN) into a single merged ResolverSpecSpec,
matching the reference one-apps context-linux get_nameservers() /
get_searchdomains() behavior that writes one /etc/resolv.conf.
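
A simplified sketch of the accumulation, assuming values are
whitespace-separated and interface names are already in processing
order. `mergeResolvers` is a hypothetical helper and does no
de-duplication.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeResolvers collects DNS servers and search domains from the global
// variables first, then from each interface in the given order.
func mergeResolvers(ctx map[string]string, ifaces []string) (dns, search []string) {
	dns = append(dns, strings.Fields(ctx["DNS"])...)
	search = append(search, strings.Fields(ctx["SEARCH_DOMAIN"])...)

	for _, name := range ifaces {
		dns = append(dns, strings.Fields(ctx[name+"_DNS"])...)
		search = append(search, strings.Fields(ctx[name+"_SEARCH_DOMAIN"])...)
	}

	return dns, search
}

func main() {
	ctx := map[string]string{
		"DNS":                "192.0.2.1",
		"ETH0_DNS":           "192.0.2.2 192.0.2.3",
		"ETH0_SEARCH_DOMAIN": "example.com",
	}

	dns, search := mergeResolvers(ctx, []string{"ETH0"})
	fmt.Println(dns, search)
}
```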
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Parse the ETH*_ROUTES context variable in the OpenNebula platform and
install per-interface static routes into the platform network config.
Both legacy format ("DEST MASK GW [METRIC]") and CIDR format
("DEST/PREFIX GW [METRIC]") are supported, matching the reference
one-apps context-linux implementation.
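
A sketch of parsing a single route entry in either format. The
`staticRoute` type and `parseRoute` function are illustrative, the
legacy branch is limited to IPv4, and a zero metric stands for "not
specified"; how multiple entries are separated is not covered.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strconv"
	"strings"
)

// staticRoute is an illustrative container for one parsed entry.
type staticRoute struct {
	Dest    netip.Prefix
	Gateway netip.Addr
	Metric  uint32 // 0 means no metric was given
}

// parseRoute accepts "DEST MASK GW [METRIC]" or "DEST/PREFIX GW [METRIC]".
func parseRoute(entry string) (staticRoute, error) {
	var r staticRoute

	fields := strings.Fields(entry)
	if len(fields) == 0 {
		return r, fmt.Errorf("empty route entry")
	}

	var rest []string

	if strings.Contains(fields[0], "/") { // CIDR format
		dest, err := netip.ParsePrefix(fields[0])
		if err != nil {
			return r, fmt.Errorf("invalid destination %q: %w", fields[0], err)
		}

		r.Dest, rest = dest.Masked(), fields[1:]
	} else { // legacy format with a dotted netmask
		if len(fields) < 3 {
			return r, fmt.Errorf("route %q: want DEST MASK GW [METRIC]", entry)
		}

		dest, err := netip.ParseAddr(fields[0])
		if err != nil || !dest.Is4() {
			return r, fmt.Errorf("invalid IPv4 destination %q", fields[0])
		}

		mask := net.ParseIP(fields[1]).To4()
		if mask == nil {
			return r, fmt.Errorf("invalid netmask %q", fields[1])
		}

		// Size returns 0, 0 for masks that are not contiguous ones.
		ones, bits := net.IPMask(mask).Size()
		if bits == 0 {
			return r, fmt.Errorf("non-contiguous netmask %q", fields[1])
		}

		r.Dest, rest = netip.PrefixFrom(dest, ones).Masked(), fields[2:]
	}

	if len(rest) < 1 || len(rest) > 2 {
		return r, fmt.Errorf("route %q: want a gateway and an optional metric", entry)
	}

	gw, err := netip.ParseAddr(rest[0])
	if err != nil {
		return r, fmt.Errorf("invalid gateway %q: %w", rest[0], err)
	}

	r.Gateway = gw

	if len(rest) == 2 {
		m, err := strconv.ParseUint(rest[1], 10, 32)
		if err != nil {
			return r, fmt.Errorf("invalid metric %q: %w", rest[1], err)
		}

		r.Metric = uint32(m)
	}

	return r, nil
}

func main() {
	fmt.Println(parseRoute("10.0.0.0 255.255.255.0 192.168.1.1 100"))
	fmt.Println(parseRoute("10.1.0.0/16 192.168.1.1"))
}
```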
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Also sync tools, now the kernel is built with LLVM 22.1.
See https://github.com/siderolabs/pkgs/issues/1479 for the context.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This was yet another socket with implicit auth - remove it completely
by reworking the only use case for it - cluster-side health checks.
Now these health checks build a "regular" network Talos API client (as
they only work on controlplane nodes anyway).
Refactor the check for controlplane nodes to use resources instead of
machine config directly (as machine config might not be always present).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Attempt to fix intermittent issue with images being pulled with the
wrong platform for multi-platform images.
Claude did the analysis, and I think the root cause is that the
`DefaultSpec()` we used causes the match to include `variant` which is
e.g. `v8` for arm64, while if the image doesn't declare the exact
variant, it might skip filtering and pick up the first layer which is
amd64.
It is still not clear why exactly it is intermittent this way.
But this change aligns it more closely with the way containerd pulls, so
should be good to go.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
A fixup for #12896
The health check might be running as a reduced privilege role client, so
don't pull the machine config, but instead read a field from a
non-sensitive resource.
As this field doesn't exist in older versions of Talos, the check should
still run by default (as it will be empty).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>