1. Explicitly enable DHCPv4 on v4 instances.
2. Run DHCPv6 if IPv6 is connected.
3. Support v6-only environments.
4. Add DNS for v6.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This replaces the existing fixed field for etcd encryption with a fully
flexible configuration which exactly matches the upstream kube-apiserver
configuration.
The default generated machine configuration still retains the previous
defaults.
The new configuration allows:
* rotating etcd encryption secrets
* implementing any encryption policies (e.g. encrypting configmaps).
Fixes #10899
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fix a race where a disappearing parent doesn't clean up its children
(partitions), depending on the order of two reconciliation events: a
timer-based one and a notification about the disk.
Treat a disappearing (failing) parent as a signal to clean up all
children as well.
Fixes #13259
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #13254
This also bumps k8s.io/api to 0.36.0 to match our Kubernetes version (it
was made possible via fluxcd update).
The actual fix is https://github.com/siderolabs/go-kubernetes/pull/57
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
If someone mounts a Talos disk image over a loop device, it will appear
in DiscoveredVolumes correctly, but we should not match it as a system
disk. A system disk can't be a loop device.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Concurrent goroutines wrote to k8sNames without synchronization,
causing a data race during drain. The comment claiming "no mutex
needed" was wrong - each goroutine writes a different key, but
the Go map implementation is not safe for concurrent writes.
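A minimal sketch of the kind of fix described above, assuming a mutex is used to guard the map. The function `drainNames` and the `"k8s-"` lookup are hypothetical stand-ins, not the actual Talos code; only the pattern (distinct keys still need synchronization) is taken from the text.

```go
package main

import (
	"fmt"
	"sync"
)

// drainNames is a hypothetical stand-in for the drain step: every goroutine
// writes a different key into the same map, which is still a data race
// unless the writes are synchronized.
func drainNames(nodes []string) map[string]string {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		names = make(map[string]string, len(nodes))
	)

	for _, node := range nodes {
		wg.Add(1)

		go func(node string) {
			defer wg.Done()

			resolved := "k8s-" + node // placeholder for the real name lookup

			// Guard the write: Go maps are not safe for concurrent writes,
			// even when each writer touches a distinct key.
			mu.Lock()
			names[node] = resolved
			mu.Unlock()
		}(node)
	}

	wg.Wait()

	return names
}

func main() {
	fmt.Println(drainNames([]string{"node-a", "node-b", "node-c"}))
}
```

Running the unguarded variant under `go run -race` is one way to reproduce this class of bug.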
Fixes #13247
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
The containerd code path for the image pull via CRI doesn't set one
explicitly, and relies on the implicit default of the `client.Pull` API
to pull in the default platform matcher.
See also #13222
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This fixes a set of inconsistencies; one example is #13203.
Make the field in the spec nilable, and treat `nil` as `on` (this is
the Linux default if the field is not set explicitly).
When using machine config, the setting can be flipped to `off`
explicitly (if needed).
This restores compatibility with pre-1.12 raw metal network resource
specs, which don't contain any explicit `adLacpActive: on` at all.
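The nilable-field semantics can be sketched as follows. The type `bondSpec`, its field name, and the `lacpActive` helper are hypothetical and only illustrate the "`nil` means `on`" rule; the real spec type may look different.

```go
package main

import "fmt"

// bondSpec is a hypothetical, trimmed-down spec with a nilable field.
type bondSpec struct {
	// ADLACPActive is nil when the setting was never specified.
	ADLACPActive *bool
}

// lacpActive resolves the effective value: an unset field is treated as
// "on", and only an explicit false turns the setting off.
func (s bondSpec) lacpActive() bool {
	if s.ADLACPActive == nil {
		return true
	}

	return *s.ADLACPActive
}

func main() {
	off := false

	unset := bondSpec{}
	explicitOff := bondSpec{ADLACPActive: &off}

	fmt.Println(unset.lacpActive(), explicitOff.lacpActive())
}
```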
Fixes #13203
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #13226
The issue was that the ticker was stopped, but never set to `nil`,
so it was never re-created when KubeSpan was enabled again.
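A minimal sketch of this bug class, assuming the ticker is created lazily behind a `nil` check. The `peerManager` type and its methods are hypothetical and are not the KubeSpan controller code.

```go
package main

import (
	"fmt"
	"time"
)

// peerManager is a hypothetical holder of a lazily created ticker.
type peerManager struct {
	ticker *time.Ticker
}

// enable creates the ticker only when none exists, so a stale non-nil
// pointer to a stopped ticker would prevent re-creation.
func (m *peerManager) enable() {
	if m.ticker == nil {
		m.ticker = time.NewTicker(time.Minute)
	}
}

// disable stops the ticker and also resets the field to nil; without the
// reset, enable would keep the stopped ticker forever.
func (m *peerManager) disable() {
	if m.ticker != nil {
		m.ticker.Stop()
		m.ticker = nil
	}
}

func main() {
	m := &peerManager{}

	m.enable()
	first := m.ticker

	m.disable()
	m.enable()

	// After re-enabling, a fresh ticker must be in place.
	fmt.Println(m.ticker != nil, m.ticker != first)

	m.disable()
}
```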
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Two fixes:
* for EventsWatch, use adaptive timeout and abort early (I would rather
kill this API & the test)
* for the apid test, ignore one more error which sometimes pops up in the
test run.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This deprecates more `.machine.features` and allows host DNS to be
enabled in maintenance mode.
Fixes #12438
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Defaults to the installer image from factory.talos.dev. Default images now
use schematic hash naming (metal-installer/<hash>) instead of
registry-based naming.
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
Add a SupportsFactoryTalosctlDownload quirk to mark the minimum version that supports talosctl downloads from factory.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Kill old-style "manual" tests, use `ctest` consistently now.
This should be a no-op refactoring.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Allows authenticating to Image Factory (if Image Factory is configured
for auth); this applies to HTTP downloads (e.g. ISO) and also injects
registry auth into Talos.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Use defer blocks and error joining to guarantee uncordon cleanup
runs regardless of reboot/upgrade success or failure. Prevents nodes
from staying cordoned when operations fail.
Also add gRPC keepalive params to prevent timeout issues during
long operations.
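The defer-plus-error-joining cleanup described above can be sketched like this. The `rebootNode` function and its callback parameters are hypothetical; the sketch only assumes `errors.Join` from the standard library (Go 1.20 or newer) and a named return value.

```go
package main

import (
	"errors"
	"fmt"
)

// errRebootFailed is a hypothetical error used for the demonstration.
var errRebootFailed = errors.New("reboot failed")

// rebootNode cordons a node, runs the operation, and always uncordons.
// The three callbacks are hypothetical stand-ins for the real steps.
func rebootNode(cordon, reboot, uncordon func() error) (err error) {
	if err = cordon(); err != nil {
		return fmt.Errorf("cordon: %w", err)
	}

	// The deferred cleanup runs on both the success and the failure path;
	// its error is joined with the operation error instead of replacing it.
	defer func() {
		if uerr := uncordon(); uerr != nil {
			err = errors.Join(err, fmt.Errorf("uncordon: %w", uerr))
		}
	}()

	if err = reboot(); err != nil {
		return fmt.Errorf("reboot: %w", err)
	}

	return nil
}

func main() {
	ok := func() error { return nil }
	fail := func() error { return errRebootFailed }

	err := rebootNode(ok, fail, func() error {
		fmt.Println("uncordon ran")

		return nil
	})

	fmt.Println("result:", err)
}
```

Because the result is a named return value, the deferred function can still amend it after the `return` statement has set it.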
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
The OSSF Security Insights spec and its reference tooling
(ossf/si-tooling) require the lowercase filename security-insights.yml.
LFX Insights scans case-sensitively, so the previously-uppercase file
was not detected and OSPS-QA-04.01 continued to fail on the dashboard.
Also bump schema-version to 2.2.0 (current) and refresh the review
dates.
Closes #13189
Signed-off-by: buckaroo <jeff.behl@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #13169
Also fixes a number of other issues with the controller being stuck
"watching" over stale data.
The major part of the change is to watch the contents of kubelet's
kubeconfig and restart the watch when it changes.
The internals of the watch process don't always bubble up errors
properly, or we don't watch for errors.
With this change, the initial sync has a timeout and a way to abort
the sync process, and Talos can now also restart the sync on kubeconfig
change, which makes it more transparent.
This might become irrelevant if we start managing kubeconfig via Talos
controlplane for workers, but for now this seems to be the way to fix
issues.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Implement optional NTS (RFC 8915) support for authenticated and encrypted
time synchronization, using the beevik/nts library.
Signed-off-by: Erwan Leboucher <erwanleboucher@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
* Have proper matrix job names
* Run all aws-nvidia tests in parallel
* Make misc-0 a matrix in flattened jobs too, so we can re-trigger just the failed one
Signed-off-by: Noel Georgi <git@frezbo.dev>
While the OOM pressure is high, we might observe "extra kills" as there
are no other victims to kill anymore (as `stress-ng` is already gone).
Tolerate those kills, but log them in case we see this getting out of
hand.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
A sample failure:
```
manifests.go:133:
Error Trace: /src/internal/integration/k8s/manifests.go:133
Error: []string{"/usr/local/bin/kube-proxy", "--cluster-cidr=10.244.0.0/16", "--conntrack-max-per-core=0", "--hostname-override=$(NODE_NAME)", "--kubeconfig=/etc/kubernetes/kubeconfig", "--proxy-mode=nftables"} does not contain "--nodeport-addresses=0.0.0.0/0"
Test: TestIntegration/k8s.ManifestsSuite/TestSync
manifests.go:137: disabling kube-proxy
```
My running theory is that `List()` picks up a stale pod, so try to
filter it out and log it in full if we hit it.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This modifies a test patch; it will not have much effect in the CI (as
it doesn't pull from factory.talos.dev), but it is good for local testing.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
They re-enabled support for absolute symlinks, but symlinks which target
paths with `../` are still dropped.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
At the end of every sequence that intentionally terminates the machine (reboot, shutdown, upgrade, etc.), a fatal event is published to signal expected termination. The machine status controller was unconditionally flipping the stage to "rebooting" on this event, which was correct for sequences that end in a reboot but incorrect for the shutdown sequence whose expected termination is a power-off.
The stage tracker now skips this transition when the current sequence is shutdown, so the machine stays in "shutting down" until it actually powers off.
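A rough sketch of the transition rule described above. The `sequence` and `stage` types, their constants, and `stageOnExpectedTermination` are hypothetical names chosen for illustration and do not mirror the machine status controller's real types.

```go
package main

import "fmt"

// sequence and stage are hypothetical simplifications of the real types.
type sequence string

type stage string

const (
	seqReboot   sequence = "reboot"
	seqUpgrade  sequence = "upgrade"
	seqShutdown sequence = "shutdown"

	stageRunning      stage = "running"
	stageShuttingDown stage = "shutting down"
	stageRebooting    stage = "rebooting"
)

// stageOnExpectedTermination returns the stage to report when the
// "expected termination" event arrives at the end of a sequence.
func stageOnExpectedTermination(current stage, seq sequence) stage {
	// A shutdown ends in a power-off, so keep the current stage rather
	// than flipping it to "rebooting".
	if seq == seqShutdown {
		return current
	}

	return stageRebooting
}

func main() {
	fmt.Println(stageOnExpectedTermination(stageShuttingDown, seqShutdown))
	fmt.Println(stageOnExpectedTermination(stageRunning, seqReboot))
}
```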
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>