Kill old-style "manual" tests, use `ctest` consistently now.
This should be no-op refactoring.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit df0b9a8da1423842d830261e5ddc5dc8f5a234c1)
Fixes #13169
Also fixes a number of other issues with controller being stuck
"watching" over stale data.
The major part of the change is to watch contents of kubelet's
kubeconfig and restart the watch when it changes.
The internals of the watch process don't always bubble up errors
properly, or we don't watch for errors.
With this change, the initial sync not only has a timeout and a way to
abort the sync process, but Talos can now also restart the sync on
kubeconfig change, which makes it more transparent.
This might become irrelevant if we start managing kubeconfig via Talos
controlplane for workers, but for now this seems to be the way to fix
issues.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 149592fa59d20c5aa29e4c0af9a3760585f378ce)
At the end of every sequence that intentionally terminates the machine (reboot, shutdown, upgrade, etc.), a fatal event is published to signal expected termination. The machine status controller was unconditionally flipping the stage to "rebooting" on this event, which was correct for sequences that end in a reboot but incorrect for the shutdown sequence whose expected termination is a power-off.
The stage tracker now skips this transition when the current sequence is shutdown, so the machine stays in "shutting down" until it actually powers off.
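A minimal sketch of the skipped transition, assuming hypothetical `Sequence` and `Stage` types; these names are illustrative and are not the actual machine status controller code:

```go
package main

import "fmt"

// Sequence and Stage are hypothetical stand-ins for the real types.
type Sequence int

const (
	SequenceReboot Sequence = iota
	SequenceShutdown
	SequenceUpgrade
)

type Stage string

const (
	StageRunning      Stage = "running"
	StageShuttingDown Stage = "shutting down"
	StageRebooting    Stage = "rebooting"
)

// stageOnExpectedTermination returns the stage to report once the
// "expected termination" event has been observed.
func stageOnExpectedTermination(current Stage, seq Sequence) Stage {
	if seq == SequenceShutdown {
		// Shutdown ends in a power-off, so keep the current stage.
		return current
	}

	return StageRebooting
}

func main() {
	fmt.Println(stageOnExpectedTermination(StageShuttingDown, SequenceShutdown))
	fmt.Println(stageOnExpectedTermination(StageRunning, SequenceReboot))
}
```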
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
(cherry picked from commit c028db0b8d25e85a4b580e10252d964785320291)
This check was in maintenance Upgrade API for Talos <= 1.12,
so keep it in the "normal" API as well.
It always makes sense - the upgrade would fail if Talos is not
installed, but that failure in legacy Upgrade API is async and not
reported properly back.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 0d8362119e4415182caa9349e0ddfb27ea290d90)
There are no security issues fixed.
Drop username/password creds - they were not used.
Improve security of token interceptor.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 9fbb7c95df2b1dcd68fafa23865412bbd8300f4b)
Remove the skip statements/rework the code to allow
FIPS builds to do Wireguard by wrapping Wireguard operations
into `fips140.WithoutEnforcement` blocks.
Using Wireguard (or not using it) is still a user's choice, but this
allows tests to run in strict mode.
There might be more fixes required for FIPS strict mode; right now it is
blocked by a Go issue with X25519, the fix for which is going to be
backported to Go 1.26.3.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 1ef8e630ab77b3c849e7da6d1ff83e7c6795f070)
Reset ARPIPTargets and NSIP6Targets at the start of BondMasterSpec.Decode.
Without this, repeated decode calls on the same struct can retain old target
entries after config removes them, which makes link status drift from
current bond configuration.
Add a regression test that decodes a payload with targets, then decodes a
payload without target attributes into the same struct and asserts both
slices are empty.
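The reset-before-decode pattern can be sketched roughly as follows. The `bondTargets` type, the attribute map, and its key names are assumptions made for illustration; the real `BondMasterSpec.Decode` reads netlink attributes:

```go
package main

import "fmt"

// bondTargets is a hypothetical stand-in for the target fields of BondMasterSpec.
type bondTargets struct {
	ARPIPTargets []string
	NSIP6Targets []string
}

// Decode fills the struct from a hypothetical attribute map.
// Both slices are reset first so that a payload without target
// attributes cannot leave entries from an earlier call behind.
func (b *bondTargets) Decode(attrs map[string][]string) {
	b.ARPIPTargets = nil
	b.NSIP6Targets = nil

	if v, ok := attrs["arp_ip_target"]; ok {
		b.ARPIPTargets = append(b.ARPIPTargets, v...)
	}

	if v, ok := attrs["ns_ip6_target"]; ok {
		b.NSIP6Targets = append(b.NSIP6Targets, v...)
	}
}

func main() {
	var spec bondTargets

	spec.Decode(map[string][]string{
		"arp_ip_target": {"10.0.0.1"},
		"ns_ip6_target": {"fd00::1"},
	})
	spec.Decode(map[string][]string{}) // targets removed from config

	fmt.Println(len(spec.ARPIPTargets), len(spec.NSIP6Targets))
}
```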
Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 0a47f40b3cdf304a079c6b3fa964e9f82e91ec63)
Add an integration test and fix legacy upgrade API in maintenance mode.
There were several assumptions which do not hold true in maintenance as
we have no machine configuration.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit c464c7e88a3f058cb2bbc36af1910d69d903cd07)
Also fix one more place where version.Name wasn't used properly.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 4ba11156fd164a0d94538508f5c028f249deed50)
For IPv4, they should be attached to no interfaces.
Discovered while doing some manual testing for the documentation.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 0bfdf7f7035fefe804ec4b568709cd6a09195293)
Allow setting the build NAME at build time, and propagate it down to more consumers.
Expose name in `Version` resource, and use that in the dashboard
next to Talos version.
Fix some places where `Name` was hardcoded.
Propagate Name down to UKI build.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 968ec1e0ca26eb1f0de0836e0a55df09dea7dafe)
When the dashboard runs within Talos, it previously used the `os:admin`
role, which allows anything.
With the changes in 1.13, I dropped the role to `os:reader`, which is a
much tighter scope from the security perspective, but that broke the
network config tab: it tries to write to META, which is not allowed under
the `os:reader` role. This change fixes the dashboard while still keeping
the RBAC tight.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 649ab7fe4234de1a947071926603377e00910cb9)
Fixes #12933
There are many use cases for this:
* exploring resources and state of the system, learning available
resources
* when a Talos machine is booted up in an environment without network
access, learning all available network interfaces, all disks
available, etc.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 5e24d5265bde9adee92c02e675140de87ee126bf)
Previously, there was no way to grow virtual disks attached to VMs,
even though resizing them was possible (e.g. by changing the size of the
disk in the hypervisor). This change forces a UserVolume of type=disk to
always grow to the full size of the disk.
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
(cherry picked from commit e2df0f6ce8c47b0dc3e93bf257afb8a1ae9243fb)
The runtime capabilities lookup did not include an entry for the metal-agent mode, causing an index out of range panic when any capability check was performed in that mode. This broke MetaWrite calls from Omni to machines running in metal-agent mode through the new unified apid, preventing them from appearing as pending machines.
Also fix the incorrect comments on the existing entries to match the actual iota order.
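One way to guard against this class of bug is to size the lookup table by a sentinel constant, so that adding a mode without a table entry yields a zero value instead of a panic. This is only a sketch: the mode names, the `capabilities` struct, and the values in the table are illustrative assumptions, not the actual Talos definitions.

```go
package main

import "fmt"

// Mode is a hypothetical runtime mode enumeration.
type Mode int

const (
	ModeCloud Mode = iota
	ModeContainer
	ModeMetal
	ModeMetalAgent
	modeCount // number of modes; must stay last
)

// capabilities is a hypothetical per-mode capability set.
type capabilities struct{ metaWrite bool }

// The array length is tied to modeCount, so every mode has a slot.
// The values below are placeholders for illustration only.
var modeCapabilities = [modeCount]capabilities{
	ModeCloud:      {metaWrite: true},
	ModeContainer:  {metaWrite: false},
	ModeMetal:      {metaWrite: true},
	ModeMetalAgent: {metaWrite: true},
}

// supportsMetaWrite never indexes out of range, even for unknown modes.
func supportsMetaWrite(m Mode) bool {
	if m < 0 || m >= modeCount {
		return false
	}

	return modeCapabilities[m].metaWrite
}

func main() {
	fmt.Println(supportsMetaWrite(ModeMetalAgent))
	fmt.Println(supportsMetaWrite(Mode(42)))
}
```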
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
(cherry picked from commit 783a35851ed1bac4ddd0f1fed583fc1b6477614d)
When processing on-link routes, the source address was incorrectly set to the first address of the interface.
This caused issues when the interface had multiple addresses, as the source address may not have been valid for the route.
The source address is now set to an empty string, which allows the kernel to automatically select the appropriate source address for the route.
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
(cherry picked from commit 3400059ccf4811140a4326397d972f68693c708c)
This is a regression compared to Talos 1.12: allow blockdevice wipe in
maintenance mode (with `os:reader` role).
Also improve the test for maintenance via SideroLink - add a test on
install, META write and reboot preserving META value.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 1dd701efa8119b6515a62ff68c430c99a96f2b68)
Add a test that covers all maintenance APIs in general.
Add a test for transition from SideroLink.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit ad72c73006abc3b51e5371496c61d8637b2222f0)
See https://github.com/siderolabs/talos/discussions/13012
containerd's default OCI spec sets the NOFILE rlimit to 1024;
unset it to simply let the machined defaults take over.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 8ac47d677703624ec6568294d94dcad7e533e6c4)
Whitelist services which can access the file socket, refuse other
connections.
Fixes #12701
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 038cb87354eea1c1ff4612bdd13d1e77e595955a)
Drop maintenance service and all the code supporting it directly.
Instead, move all network API termination into the `apid` service, which
can now work in more modes to support maintenance operations as
well.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Pseudo late mount points (`/system`, `/run` and `/system`) were consistently failing to unmount.
While reaching this unmount sequence, we should already have unmounted any children.
However, if those are not unmounted, we should log what we are unmounting and unmount them recursively.
Fixes #12974
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
The panic:
```
2026/03/16 13:39:56 172.20.0.3: {"component":"controller-runtime","controller":"hardware.SystemInfoController","error":"controller \"hardware.SystemInfoController\" panicked: output tracking already enabled\n\ngoroutine 613 [running]:\nruntime/debug.Stack()\n\t/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).runOnce.func2()\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/run.go:67 +0x4c\npanic({0x2a43dc0?, 0x350ff30?})\n\t/go/src/runtime/panic.go:860 +0x13a\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).StartTrackingOutputs(0x38246abe1c98?)\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/output_tracker.go:25 +0x94\ngithub.com/siderolabs/talos/internal/app/machined/pkg/controllers/hardware.(*SystemInfoController).Run(0x38246a3fe280, {0x3549b50, 0x38246a96dbd0}, {0x358b070, 0x38246adaf0e0}, 0x38246adba000)\n\t/src/internal/app/machined/pkg/controllers/hardware/system.go:93 +0x127\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).runOnce(0x38246adaf0e0, {0x3549b50, 0x38246a96dbd0}, 0x38246adba000)\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/run.go:73 +0xfa\ngithub.com/cosi-project/runtime/pkg/controller/runtime/internal/rruntime.(*Adapter).Run(0x38246adaf0e0, {0x3549b50, 0x38246a96dbd0})\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/internal/rruntime/run.go:25 +0x16b\ngithub.com/cosi-project/runtime/pkg/controller/runtime.(*Runtime).Run.func1.2()\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/runtime.go:201 +0x2e\ngithub.com/cosi-project/runtime/pkg/controller/runtime.(*Runtime).Run.func1.goFunc.3()\n\t/.cache/mod/github.com/cosi-project/runtime@v1.14.0/pkg/controller/runtime/runtime.go:473 
+0x13\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/.cache/mod/golang.org/x/sync@v0.20.0/errgroup/errgroup.go:93 +0x50\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 146\n\t/.cache/mod/golang.org/x/sync@v0.20.0/errgroup/errgroup.go:78 +0x95\n","msg":"2026-03-16T09:39:56.457Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-03-16T09:39:56.718594712Z"}
```
This is more of a cosmetic issue, but still - move tracking outputs below
the `continue` statement, otherwise it might be called twice in a single
run.
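A rough illustration of the ordering, using a hypothetical `tracker` type that panics on double start in the same way the runtime adapter does; it is not the COSI runtime API:

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for output tracking state.
type tracker struct{ active bool }

// Start panics when tracking is already enabled.
func (t *tracker) Start() {
	if t.active {
		panic("output tracking already enabled")
	}

	t.active = true
}

// Finish ends the current tracking round.
func (t *tracker) Finish() { t.active = false }

// run processes events; an event that is not ready is skipped
// with `continue` before tracking is started.
func run(events []bool, t *tracker) int {
	processed := 0

	for _, ready := range events {
		if !ready {
			continue // tracking has not been started yet
		}

		t.Start()
		processed++
		t.Finish()
	}

	return processed
}

func main() {
	fmt.Println(run([]bool{false, false, true}, &tracker{}))
}
```

If `t.Start()` were placed above the `continue`, two consecutive skipped events would start tracking twice and trigger the panic.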
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add RoutingRuleConfig multi-doc config type for management of routing rules.
KubeSpan now uses COSI resources instead of direct kernel management.
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
The reference does USER_DATA="${USER_DATA:-${USERDATA}}". Talos only read
USER_DATA, silently returning ErrNoConfigSource when a VM used the legacy
USERDATA variable name.
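The fallback can be sketched as below, assuming the context is available as a plain `map[string]string`; the helper name `userData` is hypothetical:

```go
package main

import "fmt"

// userData mirrors the shell expansion ${USER_DATA:-${USERDATA}}:
// USER_DATA wins when set and non-empty, otherwise the legacy
// USERDATA variable is used.
func userData(ctx map[string]string) (string, bool) {
	if v := ctx["USER_DATA"]; v != "" {
		return v, true
	}

	if v := ctx["USERDATA"]; v != "" {
		return v, true
	}

	return "", false
}

func main() {
	v, ok := userData(map[string]string{"USERDATA": "legacy-config"})
	fmt.Println(v, ok)
}
```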
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
When ONEGATE_ENDPOINT contains a link-local IPv4 address (169.254.x.x),
emit a /32 scope-link host route via the first static interface, matching
the reference add_onegate_proxy_route behavior. Without this route, VMs
using link-local OneGate endpoints cannot reach the metadata service.
Interface names are now collected and sorted before processing, matching
the reference env | grep ... | sort behavior (ETH0, ETH1, ...). This
makes DNS server ordering and ONEGATE route attachment deterministic
regardless of Go map iteration order.
The interface loop is extracted into processInterfaces to keep ParseMetadata
within cyclomatic complexity limits.
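A sketch of the two building blocks, under the assumption that ONEGATE_ENDPOINT is a URL and that interfaces are held in a map keyed by name; the helper names `oneGateLinkLocalHost` and `sortedNames` are hypothetical:

```go
package main

import (
	"fmt"
	"net/netip"
	"net/url"
	"sort"
)

// oneGateLinkLocalHost extracts the host from the endpoint URL and
// reports whether it is a link-local IPv4 address (169.254.0.0/16).
func oneGateLinkLocalHost(endpoint string) (netip.Addr, bool) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return netip.Addr{}, false
	}

	addr, err := netip.ParseAddr(u.Hostname())
	if err != nil || !addr.Is4() || !addr.IsLinkLocalUnicast() {
		return netip.Addr{}, false
	}

	return addr, true
}

// sortedNames returns interface names in lexical order, so the result
// does not depend on Go map iteration order.
func sortedNames(ifaces map[string]string) []string {
	names := make([]string, 0, len(ifaces))
	for name := range ifaces {
		names = append(names, name)
	}

	sort.Strings(names)

	return names
}

func main() {
	if addr, ok := oneGateLinkLocalHost("http://169.254.16.9:5030"); ok {
		// A /32 host route towards the metadata service.
		fmt.Println(netip.PrefixFrom(addr, 32))
	}

	fmt.Println(sortedNames(map[string]string{"ETH1": "mac-b", "ETH0": "mac-a"}))
}
```

Note that plain lexical sorting places ETH10 before ETH2, which is also what the shell `sort` does by default.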
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
When ETH*_IP6_METHOD is unset, fall back to the value of ETH*_METHOD,
matching the reference [ -z "$ip6_method" ] && ip6_method="${method}"
logic in setup_iface_vars. This means a DHCP interface now also gets a
DHCPv6 operator, a static interface stays static, and a skip interface
remains fully skipped. Update golden testdata to include the DHCPv6
operator that ETH1_METHOD=dhcp now emits.
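The fallback itself is small; a sketch assuming the context is a `map[string]string` and a hypothetical helper name `ip6Method`:

```go
package main

import "fmt"

// ip6Method returns ETH*_IP6_METHOD when set, and otherwise falls
// back to ETH*_METHOD for the same interface.
func ip6Method(ctx map[string]string, iface string) string {
	if m := ctx[iface+"_IP6_METHOD"]; m != "" {
		return m
	}

	return ctx[iface+"_METHOD"]
}

func main() {
	ctx := map[string]string{"ETH1_METHOD": "dhcp"}

	fmt.Println(ip6Method(ctx, "ETH1"))
}
```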
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Use SET_HOSTNAME exclusively, matching the reference net-15-hostname
script. The previous implementation fell back to HOSTNAME (not used by
OpenNebula) and NAME (the VM name, not a hostname source in the
reference). DNS_HOSTNAME is a server-side flag that triggers a reverse
DNS lookup — a live network operation that cannot be performed inside
ParseMetadata.
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Extends parseAliases to read ETH*_ALIAS*_IP6 (legacy: ETH*_ALIAS*_IPV6)
and ETH*_ALIAS*_IP6_PREFIX_LENGTH (default 64), emitting an IPv6
AddressSpecSpec subject to the same EXTERNAL/DETACH skip logic as IPv4
aliases.
Error tests for IPv4/IPv6 addresses, aliases, and gateway are consolidated
into a single TestParseErrors function to avoid duplication.
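Reading the alias IPv6 address could look roughly like this. The helper name `aliasIP6Prefix` and the map-based context are assumptions, and the EXTERNAL/DETACH skip logic is left out:

```go
package main

import (
	"fmt"
	"net/netip"
	"strconv"
)

// aliasIP6Prefix reads <alias>_IP6 (legacy: <alias>_IPV6) and
// <alias>_IP6_PREFIX_LENGTH, defaulting the prefix length to 64.
// The boolean result is false when the alias has no IPv6 address.
func aliasIP6Prefix(ctx map[string]string, alias string) (netip.Prefix, bool, error) {
	ip := ctx[alias+"_IP6"]
	if ip == "" {
		ip = ctx[alias+"_IPV6"]
	}

	if ip == "" {
		return netip.Prefix{}, false, nil
	}

	bits := 64

	if s := ctx[alias+"_IP6_PREFIX_LENGTH"]; s != "" {
		n, err := strconv.Atoi(s)
		if err != nil {
			return netip.Prefix{}, false, fmt.Errorf("invalid prefix length %q: %w", s, err)
		}

		bits = n
	}

	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return netip.Prefix{}, false, err
	}

	prefix := netip.PrefixFrom(addr, bits)
	if !prefix.IsValid() {
		return netip.Prefix{}, false, fmt.Errorf("prefix length %d out of range", bits)
	}

	return prefix, true, nil
}

func main() {
	p, ok, err := aliasIP6Prefix(map[string]string{"ETH0_ALIAS0_IPV6": "fd00::10"}, "ETH0_ALIAS0")
	fmt.Println(p, ok, err)
}
```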
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Dispatches on ETH*_IP6_METHOD before the static IPv6 path:
- disable: skip all IPv6 config for the interface
- auto: emit nothing; Talos accepts Router Advertisements by default so
SLAAC address auto-configuration works without any explicit operator
- dhcp: emit OperatorDHCP6 with RouteMetric from ETH*_IP6_METRIC (default 1)
- static / empty: fall through to the existing static address path
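The dispatch above can be sketched as a single switch. The `ip6Plan` type and `planIP6` helper are hypothetical, and rejecting unknown method values is an assumption of this sketch rather than documented behavior:

```go
package main

import (
	"fmt"
	"strconv"
)

// ip6Plan is a hypothetical description of what to emit for an interface.
type ip6Plan struct {
	Skip        bool   // no IPv6 configuration at all
	DHCP6       bool   // emit a DHCPv6 operator
	Static      bool   // continue to the static address path
	RouteMetric uint32 // only meaningful when DHCP6 is true
}

// planIP6 dispatches on the ETH*_IP6_METHOD value; metric is the raw
// ETH*_IP6_METRIC value and may be empty.
func planIP6(method, metric string) (ip6Plan, error) {
	switch method {
	case "disable":
		return ip6Plan{Skip: true}, nil
	case "auto":
		// Nothing to emit: Router Advertisements are accepted by default.
		return ip6Plan{}, nil
	case "dhcp":
		plan := ip6Plan{DHCP6: true, RouteMetric: 1}

		if metric != "" {
			n, err := strconv.ParseUint(metric, 10, 32)
			if err != nil {
				return ip6Plan{}, fmt.Errorf("invalid IPv6 metric %q: %w", metric, err)
			}

			plan.RouteMetric = uint32(n)
		}

		return plan, nil
	case "static", "":
		return ip6Plan{Static: true}, nil
	default:
		return ip6Plan{}, fmt.Errorf("unsupported IPv6 method %q", method)
	}
}

func main() {
	plan, err := planIP6("dhcp", "")
	fmt.Printf("%+v %v\n", plan, err)
}
```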
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Move the per-interface IPv4 logic from ParseMetadata into a dedicated
parseInterfaceIPv4 helper, and add an empty parseInterfaceIPv6 stub.
ParseMetadata now delegates all per-interface work to those two helpers
plus the existing parseAliases, keeping its own body small.
No behaviour change; all existing tests pass.
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Two bugs are fixed:
1. DNS_HOSTNAME was wrongly used as Domainname. DNS_HOSTNAME is a boolean
flag (YES/NO) that tells the OpenNebula daemon to perform a reverse
DNS lookup; it is not a domain name string. Using it as Domainname
produced invalid FQDNs like "myhost.YES".
2. No FQDN splitting: if the hostname source contained a dot (e.g.
NAME="myhost.example.com"), the full string was used as Hostname
instead of splitting on the first dot.
Both bugs are fixed by switching to ParseFQDN(), consistent with how all
other Talos platform implementations handle hostname parsing.
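The split on the first dot can be illustrated with a standalone helper. `splitFQDN` is a hypothetical name used only for this sketch; it is not the actual `ParseFQDN()` implementation, which may also validate its input:

```go
package main

import (
	"fmt"
	"strings"
)

// splitFQDN splits a name on the first dot into hostname and domain.
// A name without a dot is returned as the hostname with an empty domain.
func splitFQDN(fqdn string) (hostname, domain string) {
	hostname, domain, _ = strings.Cut(fqdn, ".")

	return hostname, domain
}

func main() {
	host, domain := splitFQDN("myhost.example.com")
	fmt.Println(host, domain)
}
```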
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Parse ETHn_METRIC context variables and apply the value as the route
priority for static default gateway routes and the DHCP4 operator's
RouteMetric. When absent, the existing default of 1024 is preserved,
matching the reference netcfg-networkd behavior.
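A sketch of the metric lookup with its default, assuming a map-based context; `routeMetric` and `defaultRouteMetric` are hypothetical names:

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultRouteMetric is used when ETHn_METRIC is absent.
const defaultRouteMetric = 1024

// routeMetric parses <iface>_METRIC, keeping the default when the
// variable is unset or empty.
func routeMetric(ctx map[string]string, iface string) (uint32, error) {
	s := ctx[iface+"_METRIC"]
	if s == "" {
		return defaultRouteMetric, nil
	}

	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid %s_METRIC %q: %w", iface, s, err)
	}

	return uint32(n), nil
}

func main() {
	m, err := routeMetric(map[string]string{"ETH0_METRIC": "100"}, "ETH0")
	fmt.Println(m, err)
}
```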
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Parse ETHn_ALIASm_* context variables and add secondary IPv4 addresses
to the parent interface as additional AddressSpecSpec entries. Aliases
are skipped when DETACH is non-empty or EXTERNAL=YES, matching the
reference netcfg-networkd behavior.
Also guard the ETHn_MAC interface loop to only process top-level
interface keys (ETH<digits>_MAC), preventing alias MAC keys such as
ETH0_ALIAS0_MAC from being mistakenly treated as interfaces.
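Both checks can be sketched as follows. The regular expression, the helper names, and the exact `<alias>_DETACH` / `<alias>_EXTERNAL` key spellings are assumptions for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// ifaceMACKey matches only top-level interface keys such as ETH0_MAC,
// and rejects alias keys such as ETH0_ALIAS0_MAC.
var ifaceMACKey = regexp.MustCompile(`^(ETH[0-9]+)_MAC$`)

// interfaceName returns the interface name for a top-level MAC key.
func interfaceName(key string) (string, bool) {
	m := ifaceMACKey.FindStringSubmatch(key)
	if m == nil {
		return "", false
	}

	return m[1], true
}

// skipAlias reports whether an alias should be ignored: DETACH is
// non-empty or EXTERNAL is set to YES.
func skipAlias(ctx map[string]string, alias string) bool {
	return ctx[alias+"_DETACH"] != "" || ctx[alias+"_EXTERNAL"] == "YES"
}

func main() {
	fmt.Println(interfaceName("ETH0_MAC"))
	fmt.Println(interfaceName("ETH0_ALIAS0_MAC"))
	fmt.Println(skipAlias(map[string]string{"ETH0_ALIAS0_EXTERNAL": "YES"}, "ETH0_ALIAS0"))
}
```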
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Accumulate DNS servers and search domains from both global context
variables (DNS, SEARCH_DOMAIN) and per-interface variables
(ETH*_DNS, ETH*_SEARCH_DOMAIN) into a single merged ResolverSpecSpec,
matching the reference one-apps context-linux get_nameservers() /
get_searchdomains() behavior that writes one /etc/resolv.conf.
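The accumulation can be sketched as below. It assumes that values are whitespace-separated and that global entries come before per-interface ones; neither detail is confirmed by the text above, and `mergeResolvers` is a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// mergeResolvers collects DNS servers and search domains from the
// global variables first, then from each interface in the given order.
func mergeResolvers(ctx map[string]string, ifaces []string) (servers, search []string) {
	servers = append(servers, strings.Fields(ctx["DNS"])...)
	search = append(search, strings.Fields(ctx["SEARCH_DOMAIN"])...)

	for _, name := range ifaces {
		servers = append(servers, strings.Fields(ctx[name+"_DNS"])...)
		search = append(search, strings.Fields(ctx[name+"_SEARCH_DOMAIN"])...)
	}

	return servers, search
}

func main() {
	ctx := map[string]string{
		"DNS":                "1.1.1.1",
		"ETH0_DNS":           "10.0.0.53 10.0.0.54",
		"ETH0_SEARCH_DOMAIN": "example.com",
	}

	servers, search := mergeResolvers(ctx, []string{"ETH0"})
	fmt.Println(servers, search)
}
```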
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Parse the ETH*_ROUTES context variable in the OpenNebula platform and
install per-interface static routes into the platform network config.
Both legacy format ("DEST MASK GW [METRIC]") and CIDR format
("DEST/PREFIX GW [METRIC]") are supported, matching the reference
one-apps context-linux implementation.
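Parsing a single route entry in either format might look like this. The `route` type and `parseRoute` helper are hypothetical; how multiple entries are separated inside ETH*_ROUTES and which metric applies when none is given are not covered here:

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strconv"
	"strings"
)

// route is a hypothetical parsed static route.
type route struct {
	Dest    netip.Prefix
	Gateway netip.Addr
	Metric  uint32 // zero when the entry carries no metric
}

// parseRoute accepts "DEST MASK GW [METRIC]" and "DEST/PREFIX GW [METRIC]".
func parseRoute(entry string) (route, error) {
	var (
		r    route
		rest []string
	)

	f := strings.Fields(entry)

	switch {
	case len(f) >= 2 && strings.Contains(f[0], "/"): // CIDR format
		p, err := netip.ParsePrefix(f[0])
		if err != nil {
			return r, err
		}

		r.Dest, rest = p, f[1:]
	case len(f) >= 3: // legacy format with a dotted netmask
		addr, err := netip.ParseAddr(f[0])
		if err != nil {
			return r, err
		}

		mask := net.ParseIP(f[1]).To4()
		if mask == nil {
			return r, fmt.Errorf("invalid netmask %q", f[1])
		}

		ones, bits := net.IPMask(mask).Size()
		if bits == 0 {
			return r, fmt.Errorf("non-contiguous netmask %q", f[1])
		}

		r.Dest, rest = netip.PrefixFrom(addr, ones), f[2:]
	default:
		return r, fmt.Errorf("malformed route %q", entry)
	}

	gw, err := netip.ParseAddr(rest[0])
	if err != nil {
		return r, err
	}

	r.Gateway = gw

	if len(rest) > 1 {
		m, err := strconv.ParseUint(rest[1], 10, 32)
		if err != nil {
			return r, err
		}

		r.Metric = uint32(m)
	}

	return r, nil
}

func main() {
	fmt.Println(parseRoute("10.0.0.0 255.0.0.0 192.168.0.1 5"))
	fmt.Println(parseRoute("172.16.0.0/12 192.168.0.1"))
}
```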
Signed-off-by: Mickaël Canévet <mickael.canevet@proton.ch>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This was yet another socket with implicit auth - remove it completely
by reworking the only use case for it - cluster-side health checks.
Now these health checks build a "regular" network Talos API client (as
they only work on controlplane nodes anyway).
Refactor the check for controlplane nodes to use resources instead of
machine config directly (as machine config might not be always present).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
A fixup for #12896
The health check might be running as a reduced privilege role client, so
don't pull the machine config, but instead read a field from a
non-sensitive resource.
As this field doesn't exist in older versions of Talos, the check should
still run by default (as it will be empty).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>