Fixes: https://github.com/siderolabs/talos/issues/6045
`talosctl apply-config` now supports `--config-patch` flag that takes
machine config patches as the input.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
We add a new CRD, `serviceaccounts.talos.dev` (with `tsa` as short name), and its controller which allows users to get a `Secret` containing a short-lived Talosconfig in their namespaces with the roles they need. Additionally, we introduce the `talosctl inject serviceaccount` command to accept a YAML file with Kubernetes manifests and inject them with Talos service accounts so that they can be directly applied to Kubernetes afterwards. If Talos API access feature is enabled on Talos side, the injected workloads will be able to talk to Talos API.
Closessiderolabs/talos#4422.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fix `Talos` sequencer to run only a single sequence at the same time.
Sequences priority was updated. To match the table:
| what is running (columns) what is requested (rows) | boot | reboot | reset | upgrade |
|----------------------------------------------------|------|--------|-------|---------|
| reboot | Y | Y | Y | N |
| reset | Y | N | N | N |
| upgrade | Y | N | N | N |
With a small addition that `WithTakeover` is still there.
If set, priority is ignored.
This is mainly used for `Shutdown` sequence invokation.
And if doing apply config with reboot enabled.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Clear the kubelet certificates and kubeconfig when hostname changes so that on next start, kubelet goes through the bootstrap process and new certificates are generated and the node is joined to the cluster with the new name.
Fixessiderolabs/talos#5834.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This uses the `go-packet` library with native bindings for the packet
capture (without `libpcap`). This is not the most performant way, but it
allows us to avoid CGo.
There is a problem with converting network filter expressions (like
`tcp port 3222`) into BPF instructions, it's only available in C
libraries, but there's a workaround with `tcpdump`.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Overwrite cluster's server URL in the kubeconfig file used by kubelet when the cluster control plane endpoint is changed in machineconfig, so that kubelet doesn't lose connectivity to kube-apiserver.
Closessiderolabs/talos#4470.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This PR allows the ability to generate `secrets.yaml` (`talosctl gen secrets`) using a Kubernetes PKI directory path (e.g. `/etc/kubernetes/pki`) as input. Also introduces the flag `--kubernetes-bootstrap-token` to be able to set a static Kubernetes bootstrap token to the generated `secrets.yaml` file instead of a randomly-generated one. Closessiderolabs/talos#5894.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
The end result is that every Talos CLI accepts both JSON and strategic
patches to patch machine configuration.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Adds a new command `talosctl gen secrets` to generate a `secrets.yaml` file with Talos and Kubenetes secrets. This file can later be used like `talosctl gen config ... --with-secrets secrets` to generate a config with these pre-generated secrets. Closessiderolabs/talos#5861.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
There should be no functional change with this PR.
The primary driver is supporting strategic merge configuration patches.
For such type of patches machine config should be loaded from incomplete
fragments, so it becomes critically important to distinguish between a
field having zero value vs. field being set in YAML.
E.g. with following struct:
```go
struct { AEnabled *bool `yaml:"a"` }
```
It's possible to distinguish between:
```yaml
a: false
```
and no metion of `a` in YAML.
Merging process trewats zero values as "not set" (skips them when
merging), so it's important to allow overriding value to explicit
`false`.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
When integration tests run without data from Talos provisioner (e.g.
against AWS/GCP), it should work only with `talosconfig` as an input.
This specific flow was missing filling out `infoWrapper` properly.
Clean up things a bit by reducing code duplication.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We skip the client-side health endpoint test that relies on the discovery service if the discovery service is not enabled for the cluster. Related to siderolabs#5554.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Query the discovery service to fetch the node list and use the results in health checks. Closes siderolabs#5554.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Introduce `cluster.NodeInfo` to represent the basic info about a node which can be used in the health checks. This information, where possible, will be populated by the discovery service in following PRs. Part of siderolabs#5554.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
The new mode allows changing the config for a period of time, which
allows trying the configuration and automatically rolling it back in case
if it doesn't work for example.
The mode can only be used with changes that can be applied without a
reboot.
When changed it doesn't write the configuration to disk, only changes it
in memory.
`--timeout` parameter can be used to customize the rollback delay.
The default timeout is 1 minute.
Any consequent configuration change will abort try mode and the last
applied configuration will be used.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Dry run prints out config diff, selected application mode without
changing the configuration.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
This is the follow-up fix to the PR #5129.
1. Correctly catch only expected errors in the tests.
2. Rewind the snapshot each time the upload is retried.
3. Correctly unwrap errors in the `EtcdRecovery` client.
4. Update the `grpc-proxy` library to pass through the EOF error.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Some failures can be fixed by updating the machine configuration.
Now `userDisks` and `userFiles` do not make Talos to enter into reboot
loop but pause for 35 minutes.
Additionally, `apid` and `machined` are now started right after
containerd is up and running.
That makes it possible for the operator to connect to the node using
talosctl and fix the config.
Fixes: https://github.com/talos-systems/talos/issues/4669
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Fixes#4947
It turns out there's something related to boot process in BIOS mode
which leads to initramfs corruption on later `kexec`.
Booting via GRUB is always successful.
Problem with kexec was confirmed with:
* direct boot via QEMU
* QEMU boot via iPXE (bundled with QEMU)
The root cause is not known, but the only visible difference is the
placement of RAMDISK with UEFI and BIOS boots:
```
[ 0.005508] RAMDISK: [mem 0x312dd000-0x34965fff]
```
or:
```
[ 0.003821] RAMDISK: [mem 0x711aa000-0x747a7fff]
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Talos shouldn't try to re-encode the machine config it was provided
with.
So add a `ReadonlyWrapper` around `*v1alpha1.Config` which makes sure
that raw config object is not available anymore (it's a private field),
but config accessors are available for read-only access.
Another thing that `ReadonlyWrapper` does is that it preserves the
original `[]byte` encoding of the config keeping it exactly same way as
it was loaded from file or read over the network.
Improved `talosctl edit mc` to preserve the config as it was submitted,
and preserve the edits on error from Talos (previously edits were lost).
`ReadonlyWrapper` is not used on config generation path though - config
there is represented by `*v1alpha.Config` and can be freely modified.
Why almost? Some parts of Talos (platform code) patch the machine
configuration with new data. We need to fix platforms to provide
networking configuration in a different way, but this will come with
other PRs later.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Sometimes pushing/pulling to Kubernetes registry is delayed due to
backoff on failed attempts to talk to the API server when the cluster is
still bootstrapping. Workaround that by adding retries.
Also disable kernel module controller in container mode, as it will keep
always failing.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
As `talosctl time` relies on default time server set in the config, and
our nodes start with `pool.ntp.org`, sometimes request to the timeserver
fails failing the tests.
Retry such errors in the tests to avoid spurious failures.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#4656
As now changes to kubelet configuration can be applied without a reboot,
`talosctl upgrade-k8s` can handle the kubelet upgrades as well.
The gist is simply modifying machine config and waiting for `Node`
version to be updated, rest of the code is required for reliability of
the process.
Also fixed a bug in the API while watching deleted items with
tombstones.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes: https://github.com/talos-systems/talos/issues/4385
Now sysctls defined in the config can override kernel args defined by
defaults controller.
In that case controller shows the warning that tells which param was
overridden and the new value and tells that it is not recommended.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>