Run a health check before the test, as the test depends on CoreDNS being
healthy, and previous tests might disturb the cluster.
Also refactor by using watch instead of retries, make pods terminate
fast.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Provide a trace for each step of the reset sequence taken, so if one of
those fails, integration test produces a meaningful message instead of
proceeding and failing somewhere else.
More cleanup/refactor, should be functionally equivalent.
Fixes#8635
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
New package arrived in Go 1.22 which provides better rand primitives and functions.
Use it instead of the old one.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
More specifically, pick up `/etc/resolv.conf` contents by default when
in container mode, and use that as a base resolver for the host DNS.
Fixes#8303
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This fixes an issue with a single controlplane cluster.
Properly present all accepted CAs to the apiserver, in the test let the
cluster fully recovery between two CA rotations performed.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Dynamically map Kubernetes and Talos API ports to an available port on
the host, so every cluster gets its own unique set of parts.
As part of the changes, refactor the provision library and interfaces,
dropping old weird interfaces replacing with (hopefully) much more
descriprive names.
Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes#8361
Talos requires v2 (circa 2008), but VMs are often configured to limit
the exposed features to the baseline (v1).
```
[ 0.779218] [talos] [initramfs] booting Talos v1.7.0-alpha.1-35-gef5bbe728-dirty
[ 0.779806] [talos] [initramfs] CPU: QEMU Virtual CPU version 2.5+, 4 core(s), 1 thread(s) per core
[ 0.780529] [talos] [initramfs] x86_64 microarchitecture level: 1
[ 0.781018] [talos] [initramfs] it might be that the VM is configured with an older CPU model, please check the VM configuration
[ 0.782346] [talos] [initramfs] x86_64 microarchitecture level 2 or higher is required, halting
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This allows to roll all nodes to use a new CA, to refresh it, or e.g.
when the `talosconfig` was exposed accidentally.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This controller combines kobject events, and scan of `/sys/block` to
build a consistent list of available block devices, updating resources
as the blockdevice changes.
Based on these resources the next step can run probe on the blockdevices
as they change to present a consistent view of filesystems/partitions.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
The current code was stipping non-`v1alpha1.Config` documents. Provide a
proper method in the config provider, and update places using it.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
To be used in the `go-talos-support` module without importing the whole
Talos repo.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Let's add a very basic test for the Kata Containers extension, mimicing
what's already in place for gVisor.
This depends on the work being done in:
https://github.com/siderolabs/extensions/pull/279
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
Drop the Kubernetes manifests as static files clean up (this is only
needed for upgrades from 1.2.x).
Fix Talos handling of cgroup hierarchy: if started in container in a
non-root cgroup hiearachy, use that to handle proper cgroup paths.
Add a test for a simple TinK mode (Talos-in-Kubernetes).
Update the docs.
Fixes#8274
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Also:
* Linux 6.6.14 + XDP enablement
* etcd 3.5.12
Various other bumps for the tools, utilities, and Go modules.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes#8057
I went back and forth on the way to fix it exactly, and ended up with a
pretty simple version of a fix.
The problem was that discovery service was removing the member at the
initial phase of reset, which actually still requires KubeSpan to be up:
* leaving `etcd` (need to talk to other members)
* stopping pods (might need to talk to Kubernetes API with some CNIs)
Now leaving discovery service happens way later, when network
interactions are no longer required.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes#4421
See documentation for details on how to use the feature.
With `talosctl cluster create`, firewall can be easily test with
`--with-firewall=accept|block` (default mode).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This PR adds support for custom node taints. Refer to `nodeTaints` in the `configuration` for more information.
Closes#7581
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Fixes#7873
Some services which perform mounts inside the container which require
mounts to propagate back to the host (e.g. `stargz-snapshotter`) require
this configuration setting.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
As Talos doesn't consume `.machine.install` if already installed, there
is no point in validating it once already installed.
This fixes a problem users often run into: after a reboot/upgrade the
system disk blockdevice name changes, due to the kernel upgrade, or just
unpredictable behavior of device discovery. Talos fails to boot as it
can't validate the machine config, while it's already installed, so
actual blockdevice name doesn't matter.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Before we started a reboot/shutdown/reset/upgrade action with the action tracker (`--wait`), we were setting a flag to prevent cobra from printing the returned error from the command.
This was to prevent the error from being printed twice, as the reporter of the action tracker already prints any errors occurred during the action execution.
But if the error happens too early - i.e. before we even started the status printer goroutine, then that error wouldn't be printed at all, as we have suppressed the errors.
This PR moves the suppression flag to be set after the status printer is started - so we still do not double-print the errors, but neither do we suppress any early-stage error from being printed.
Closessiderolabs/talos#7900.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This allows to "recover" secrets if the machine config was generated
first without explicitly saving secrets bundle.
Fixes#7895
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
There were weird hacks put into the tests, while each test already runs
in a temporary directory as 'working directory', so no hacks are needed.
Moreover, using fixed `/tmp/...` paths leads to test failures, as CI
runs docker & QEMU tests in parallel conflicting with each other.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
drop `UpdateEndpointSuite` suite since KubePrism is enabled by default
starting Talos 1.6 and the test never passes since K8s node is always
ready since it can connect to api server over KubePrism.
Signed-off-by: Noel Georgi <git@frezbo.dev>