Provide a trace for each step taken in the reset sequence, so that if one of
them fails, the integration test produces a meaningful message instead of
proceeding and failing somewhere else.
More cleanup/refactor, should be functionally equivalent.
Fixes #8635
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Go 1.22 introduced a new package, `math/rand/v2`, which provides better random number primitives and functions.
Use it instead of the old `math/rand`.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Dynamically map Kubernetes and Talos API ports to an available port on
the host, so every cluster gets its own unique set of ports.
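One common way to find an available host port in Go is to bind to port `0` and let the kernel pick one. A sketch under that assumption; `freePort` is a hypothetical helper, and the provisioner may allocate ports differently:

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the kernel for an unused TCP port by binding to port 0,
// then closes the listener and returns the port number.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()

	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	apiPort, err := freePort()
	if err != nil {
		panic(err)
	}

	fmt.Println("Talos API port:", apiPort)
}
```

The check is inherently racy: the port is only known to be free when the listener is closed, so another process could take it before it is used.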
As part of the changes, refactor the provision library and interfaces,
dropping old weird interfaces and replacing them with (hopefully) much more
descriptive names.
Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #8361
Talos requires x86_64 microarchitecture level v2 (CPUs circa 2008), but VMs
are often configured to limit the exposed CPU features to the baseline (v1).
```
[ 0.779218] [talos] [initramfs] booting Talos v1.7.0-alpha.1-35-gef5bbe728-dirty
[ 0.779806] [talos] [initramfs] CPU: QEMU Virtual CPU version 2.5+, 4 core(s), 1 thread(s) per core
[ 0.780529] [talos] [initramfs] x86_64 microarchitecture level: 1
[ 0.781018] [talos] [initramfs] it might be that the VM is configured with an older CPU model, please check the VM configuration
[ 0.782346] [talos] [initramfs] x86_64 microarchitecture level 2 or higher is required, halting
```
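The following is an illustrative sketch, not the Talos implementation: it decides between level 1 and level 2 from the `flags` line of `/proc/cpuinfo`, using the Linux flag names for the x86-64-v2 features (CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSSE3, SSE4.1, SSE4.2). Higher levels (v3, v4) are ignored here.

```go
package main

import (
	"fmt"
	"strings"
)

// v2Flags lists the /proc/cpuinfo flag names for the x86-64-v2 features:
// CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3 ("pni"), SSSE3, SSE4.1, SSE4.2.
var v2Flags = []string{"cx16", "lahf_lm", "popcnt", "pni", "ssse3", "sse4_1", "sse4_2"}

// microarchLevel returns 2 if every v2 flag is present in the
// space-separated flags string, and 1 (the baseline) otherwise.
func microarchLevel(flags string) int {
	have := map[string]bool{}
	for _, f := range strings.Fields(flags) {
		have[f] = true
	}

	for _, f := range v2Flags {
		if !have[f] {
			return 1
		}
	}

	return 2
}

func main() {
	// Illustrative flags for a CPU model exposing only baseline features.
	baseline := "fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm"
	fmt.Println(microarchLevel(baseline)) // 1

	fmt.Println(microarchLevel(baseline + " pni ssse3 cx16 sse4_1 sse4_2 popcnt lahf_lm")) // 2
}
```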
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This controller combines kobject events and a scan of `/sys/block` to
build a consistent list of available block devices, updating resources
as block devices change.
Based on these resources, the next step can probe the block devices
as they change, presenting a consistent view of filesystems and partitions.
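A simplified sketch of how a scan and an event stream can be merged into one consistent view. The `event` and `blockDevices` types are hypothetical; the sketch assumes events are subscribed to before `/sys/block` is scanned, so no device is missed and an event duplicating a scan result is harmless:

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical, simplified kobject uevent for a block device.
type event struct {
	Action string // "add", "change" or "remove"
	Device string // e.g. "sda", "nvme0n1"
}

// blockDevices maps a device name to a generation counter that is bumped
// whenever the device is (re)announced, so consumers can re-probe it.
type blockDevices map[string]int

// seed records devices found by the initial scan (e.g. os.ReadDir("/sys/block")).
func (b blockDevices) seed(names []string) {
	for _, n := range names {
		if _, ok := b[n]; !ok {
			b[n] = 1
		}
	}
}

// apply updates the set according to a single event.
func (b blockDevices) apply(ev event) {
	switch ev.Action {
	case "add", "change":
		b[ev.Device]++ // a new device starts at 1, a known one gets a new generation
	case "remove":
		delete(b, ev.Device)
	}
}

// names returns the current device names in sorted order.
func (b blockDevices) names() []string {
	out := make([]string, 0, len(b))
	for n := range b {
		out = append(out, n)
	}
	sort.Strings(out)

	return out
}

func main() {
	devs := blockDevices{}
	devs.seed([]string{"loop0", "sda"})
	devs.apply(event{"add", "sda"}) // duplicates the scan result: harmless
	devs.apply(event{"add", "vdb"})
	devs.apply(event{"remove", "loop0"})

	fmt.Println(devs.names()) // [sda vdb]
}
```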
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
The current code was stripping non-`v1alpha1.Config` documents. Provide a
proper method in the config provider, and update places using it.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Drop the cleanup of Kubernetes manifests stored as static files (it was only
needed for upgrades from 1.2.x).
Fix Talos handling of the cgroup hierarchy: if started in a container in a
non-root cgroup hierarchy, use that hierarchy to build proper cgroup paths.
Add a test for a simple TinK mode (Talos-in-Kubernetes).
Update the docs.
Fixes #8274
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #8057
I went back and forth on the way to fix it exactly, and ended up with a
pretty simple version of a fix.
The problem was that the discovery service was removing the member during the
initial phase of reset, which still requires KubeSpan to be up for:
* leaving `etcd` (needs to talk to other members)
* stopping pods (might need to talk to the Kubernetes API with some CNIs)
Now leaving the discovery service happens much later, when network
interactions are no longer required.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #4421
See documentation for details on how to use the feature.
With `talosctl cluster create`, the firewall can be easily tested with
`--with-firewall=accept|block` (default mode).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
`config.Container` implements a multi-doc container which implements
both the `Container` interface (encoding, validation, etc.) and the `Config`
interface (accessing parts of the config).
Refactor the `generate` and `bundle` packages to support multi-doc and
provide backwards compatibility.
Implement a first (mostly example) machine config document for the
SideroLink API URL.
Many places don't properly support multi-doc yet (e.g. config patches).
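A toy model of the container idea, using hypothetical types rather than the real `config` package. The container keeps every document, so encoding preserves them all, while offering typed access to individual documents:

```go
package main

import "fmt"

// Document is a hypothetical minimal interface for one config document.
type Document interface {
	Kind() string
}

// V1Alpha1 stands in for the legacy v1alpha1 machine config document.
type V1Alpha1 struct{ ClusterName string }

func (V1Alpha1) Kind() string { return "v1alpha1" }

// SideroLinkConfig stands in for a separate SideroLink API URL document.
type SideroLinkConfig struct{ APIURL string }

func (SideroLinkConfig) Kind() string { return "SideroLinkConfig" }

// Container holds all documents, so none are dropped on re-encoding.
type Container struct {
	docs []Document
}

// Documents returns all documents in their original order.
func (c *Container) Documents() []Document { return c.docs }

// findDoc returns the first document of type T in the container.
func findDoc[T Document](c *Container) (T, bool) {
	for _, d := range c.docs {
		if v, ok := d.(T); ok {
			return v, true
		}
	}

	var zero T

	return zero, false
}

func main() {
	c := &Container{docs: []Document{
		V1Alpha1{ClusterName: "demo"},
		SideroLinkConfig{APIURL: "https://siderolink.example.com"},
	}}

	if sl, ok := findDoc[SideroLinkConfig](c); ok {
		fmt.Println(sl.APIURL) // https://siderolink.example.com
	}

	fmt.Println(len(c.Documents())) // 2
}
```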
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Bump golangci-lint and fix up new warnings. Ignore the check for
unused function parameters, as it's kind of noisy and makes interface
implementations confusing to read.
Signed-off-by: Noel Georgi <git@frezbo.dev>
This allows safely recovering from space quota issues and performing
defragmentation as needed.
The `talosctl etcd status` command provides lots of information about
cluster health.
See docs for more details.
Fixes #4889
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
There's a cyclic dependency on the siderolink library, which imports Talos
machinery back. We will fix that after Talos is pushed under a new
name.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.
All updates contain no functional changes, just refactorings to adapt to
the new path structure.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Don't allow worker nodes to act as apid routers:
* don't try to issue a client certificate for apid on worker nodes
* if a worker node receives an incoming connection with `--nodes` set to
one of the node's local addresses, it routes the request to
itself without proxying
The second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Overview: deprecate existing Talos resource API, and introduce new COSI
API.
Consequences:
* COSI API can only go via one-to-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We add a new CRD, `serviceaccounts.talos.dev` (with `tsa` as short name), and its controller which allows users to get a `Secret` containing a short-lived Talosconfig in their namespaces with the roles they need. Additionally, we introduce the `talosctl inject serviceaccount` command to accept a YAML file with Kubernetes manifests and inject them with Talos service accounts so that they can be directly applied to Kubernetes afterwards. If the Talos API access feature is enabled on the Talos side, the injected workloads will be able to talk to the Talos API.
Closes siderolabs/talos#4422.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fix the `Talos` sequencer to run only a single sequence at a time.
Sequence priorities were updated to match the table:
| requested (rows) / running (columns) | boot | reboot | reset | upgrade |
|----------------------------------------------------|------|--------|-------|---------|
| reboot | Y | Y | Y | N |
| reset | Y | N | N | N |
| upgrade | Y | N | N | N |
With one small addition: `WithTakeover` is still there, and if it is set,
priority is ignored. This is mainly used when invoking the `Shutdown`
sequence and when applying config with reboot enabled.
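The table above can be encoded as a lookup, reading `Y` as "the requested sequence may replace the running one". This is a hypothetical sketch (`Sequence`, `canPreempt` and `allowed` are made-up names, not the sequencer API), with the `takeover` flag standing in for `WithTakeover`:

```go
package main

import "fmt"

// Sequence names as used in the table (illustrative type).
type Sequence string

const (
	Boot    Sequence = "boot"
	Reboot  Sequence = "reboot"
	Reset   Sequence = "reset"
	Upgrade Sequence = "upgrade"
)

// canPreempt[requested][running] is true when the requested sequence may
// replace the running one; missing entries mean "N".
var canPreempt = map[Sequence]map[Sequence]bool{
	Reboot:  {Boot: true, Reboot: true, Reset: true, Upgrade: false},
	Reset:   {Boot: true},
	Upgrade: {Boot: true},
}

// allowed reports whether requested may start while running is in
// progress; takeover bypasses the priority table entirely.
func allowed(requested, running Sequence, takeover bool) bool {
	if takeover {
		return true
	}

	return canPreempt[requested][running]
}

func main() {
	fmt.Println(allowed(Reset, Boot, false))    // true
	fmt.Println(allowed(Upgrade, Reset, false)) // false
	fmt.Println(allowed(Upgrade, Reset, true))  // true
}
```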
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Clear the kubelet certificates and kubeconfig when the hostname changes, so that on the next start, kubelet goes through the bootstrap process, new certificates are generated, and the node joins the cluster with the new name.
Fixes siderolabs/talos#5834.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
When integration tests run without data from the Talos provisioner (e.g.
against AWS/GCP), they should work with only `talosconfig` as input.
This specific flow was not filling out `infoWrapper` properly.
Clean up things a bit by reducing code duplication.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Query the discovery service to fetch the node list and use the results in health checks. Closes siderolabs#5554.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Introduce `cluster.NodeInfo` to represent the basic info about a node which can be used in the health checks. This information, where possible, will be populated by the discovery service in following PRs. Part of siderolabs#5554.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
As `talosctl time` relies on the default time server set in the config, and
our nodes start with `pool.ntp.org`, requests to the time server sometimes
fail, which fails the tests.
Retry such errors in the tests to avoid spurious failures.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This verifies that members match the cluster state and that both cluster
registries work in sync, producing the same discovery data.
Fixes #4191
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>