Fixes#8057
I went back and forth on the way to fix it exactly, and ended up with a
pretty simple version of a fix.
The problem was that discovery service was removing the member at the
initial phase of reset, which actually still requires KubeSpan to be up:
* leaving `etcd` (need to talk to other members)
* stopping pods (might need to talk to Kubernetes API with some CNIs)
Now leaving discovery service happens way later, when network
interactions are no longer required.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes#4421
See documentation for details on how to use the feature.
With `talosctl cluster create`, firewall can be easily test with
`--with-firewall=accept|block` (default mode).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This PR adds support for custom node taints. Refer to `nodeTaints` in the `configuration` for more information.
Closes#7581
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Fixes#7873
Some services which perform mounts inside the container which require
mounts to propagate back to the host (e.g. `stargz-snapshotter`) require
this configuration setting.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
As Talos doesn't consume `.machine.install` if already installed, there
is no point in validating it once already installed.
This fixes a problem users often run into: after a reboot/upgrade the
system disk blockdevice name changes, due to the kernel upgrade, or just
unpredictable behavior of device discovery. Talos fails to boot as it
can't validate the machine config, while it's already installed, so
actual blockdevice name doesn't matter.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
drop `UpdateEndpointSuite` suite since KubePrism is enabled by default
starting Talos 1.6 and the test never passes since K8s node is always
ready since it can connect to api server over KubePrism.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Move drone extensions integration to a function. This allows us to
re-use the code and just depend on a single step rather than explicitly
defining all dependencies.
Signed-off-by: Noel Georgi <git@frezbo.dev>
This is required for https://github.com/siderolabs/sidero/pull/1070, as
we need to allow DHCP traffic from Sidero controller running in a VM
through the bridge to other VMs.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Talos now supports new type of encryption keys which rely on Sealing/Unsealing randomly generated bytes with a KMS server:
```
systemDiskEncryption:
ephemeral:
keys:
- kms:
endpoint: https://1.2.3.4:443
slot: 0
```
gRPC API definitions and a simple reference implementation of the KMS server can be found in this
[repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go).
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
See #7233
The controlplane label is simply injected into existing controller-based
node label flow.
For controlplane taint default NoScheduleTaint, additional controller &
resource was implemented to handle node taints.
This also fixes a problem with `allowSchedulingOnControlPlanes` not
being reactive to config changes - now it is.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This PR adds support for creating a list of API endpoints (each is pair of host and port).
It gets them from
- Machine config cluster endpoint.
- Localhost with LocalAPIServerPort if machine is control panel.
- netip.Addr[0] and port from affiliates if they are control panels.
For #7191
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
`config.Container` implements a multi-doc container which implements
both `Container` interface (encoding, validation, etc.), and `Conifg`
interface (accessing parts of the config).
Refactor `generate` and `bundle` packages to support multi-doc, and
provide backwards compatibility.
Implement a first (mostly example) machine config document for
SideroLink API URL.
Many places don't properly support multi-doc yet (e.g. config patches).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
See #7230
Refactor more config interfaces, move config accessor interfaces
to different package to break the dependency loop.
Make `.RawV1Alpha1()` method typed to avoid type assertions everywhere.
No functional changes.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
See #7230
This is a step towards preparing for multi-doc config.
Split the `config.Provider` interface into parts which have different
implementation:
* `config.Config` accesses the config itself, it might be implemented by
`v1alpha1.Config` for example
* `config.Container` will be a set of config documents, which implement
validation, encoding, etc.
`Version()` method dropped, as it makes little sense and it was almost
not used.
`Raw()` method renamed to `RawV1Alpha1()` to support legacy direct
access to `v1alpha1.Config`, next PR will refactor more to make it
return proper type.
There will be many more changes coming up.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#7159
The change looks big, but it's actually pretty simple inside: the static
pods had an annotation which tracks a version of the secrets which
forced control plane pods to reload on a change. At the same time
`kube-apiserver` can reload certificate inputs automatically from files
without restart.
So the inputs were split: the dynamic (for kube-apiserver) inputs don't
need to be reloaded, so its version is not tracked in static pod
annotation, so they don't cause a reload. The previous non-dynamic
resource still causes a reload, but it doesn't get updated when e.g.
node addresses change.
There might be many more refactoring done, the resource chain is a bit
of a mess there, but I wanted to keep number of changes minimal to keep
this backportable.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Bump golangci-lint and fixup new warnings. Ignore check that checks for
used function parameters, it's kind of noisy and makes it confusing to
read interface implementations.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Fixes: https://github.com/siderolabs/talos/issues/6815
Additionally, make it possible to run reset in maintenance mode: to
enable a way for resetting system disk and remove all traces of Talos
from it.
The new reset flow works in a separate sequence, changed disk probe
lookup to check the boot partition instead of the ephemeral one.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
As with #6724, controlplane node kubelet doesn't use control plane
endpoint anymore, run the test on the worker node instead of cp node.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We add the `nodeLabels` key to the machine config to allow users to add
node labels to the kubernetes Node object. A controller
reads the nodeLabels from the machine config and applies them via the
kubernetes API.
Older versions of talosctl will throw an unknown keys error if `edit mc`
is called on a node with this change.
Fixes#6301
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Use boot kernel arg `talos.unified_cgroup_hierarchy=0` to force Talos to
use cgroups v1. Talos still defaults to cgroupsv2.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We add a controller that provides the etcd member id as a resource
and change the etcd related commands to support member ids next to
hostnames.
Fixes: #6223
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.
All updates contain no functional changes, just refactorings to adapt to
the new path structure.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Don't allow worker nodes to act as apid routers:
* don't try to issue client certificate for apid on worker nodes
* if worker nodes receives incoming connections with `--nodes` set to
one of the local addresses of the nodd, it routes the request to
itself without proxying
Second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#6119
With new stable default hostname feature, any default hostname is
disabled until the machine config is available.
Talos enters maintenance mode when the default config source is empty,
so it doesn't have any machine config available at the moment
maintenance service is started.
Hostname might be set via different sources, e.g. kernel args or via
DHCP before the machine config is available, but if all these sources
are not available, hostname won't be set at all.
This stops waiting for the hostname, and skips setting any DNS names in
the maintenance mode certificate SANs if the hostname is not available.
Also adds a regression test via new `--disable-dhcp-hostname` flag to
`talosctl cluster create`.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>