258 Commits

Author SHA1 Message Date
Artem Chernyshev
13499fc302
feat: support patching the machine config in the apply-config cmd
Fixes: https://github.com/siderolabs/talos/issues/6045

`talosctl apply-config` now supports `--config-patch` flag that takes
machine config patches as the input.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-11 13:56:23 +03:00
Andrey Smirnov
5dd1b40020
feat: disable Kubernetes discovery backend by default
Fixes #5827

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-10 22:25:31 +04:00
Noel Georgi
b62b18a972
feat: bump k8s to v1.25.0-beta.0
Bump k8s to v1.25.0-beta.0

Update most kubernetes `master` references to `controlplane`

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-10 22:17:53 +05:30
Utku Ozdemir
84e712a9f1
feat: introduce Talos API access from Kubernetes
We add a new CRD, `serviceaccounts.talos.dev` (with `tsa` as short name), and its controller which allows users to get a `Secret` containing a short-lived Talosconfig in their namespaces with the roles they need. Additionally, we introduce the `talosctl inject serviceaccount` command to accept a YAML file with Kubernetes manifests and inject them with Talos service accounts so that they can be directly applied to Kubernetes afterwards. If Talos API access feature is enabled on Talos side, the injected workloads will be able to talk to Talos API.

Closes siderolabs/talos#4422.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-08 18:27:26 +02:00
Andrey Smirnov
a6b010a8b4
chore: update Go to 1.19, Linux to 5.15.58
See https://go.dev/doc/go1.19

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 17:03:58 +04:00
Andrey Smirnov
fe2ee3b100
feat: implement MachineStatus resource
Fixes #5789

Example:

```yaml
spec:
    stage: running
    status:
        ready: false
        unmetConditions:
            - name: staticPods
              reason: kube-system/kube-controller-manager-talos-default-master-1 not ready, kube-system/kube-scheduler-talos-default-master-1 not ready
```

As events (CLI doesn't show full contents):

```
172.20.0.2   cbhf2l6f9lrs738hehfg   talos/runtime/machine.MachineStatusEvent   BOOTING   ready: false, unmet conditions: [time network services]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 18:36:10 +04:00
Artem Chernyshev
e5994ff7a7
fix: skip ResetDuringBoot test if the Cluster config is unknown
And improve retry logic in the test.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-28 15:57:58 +03:00
Artem Chernyshev
8028e10749
fix: wait for boot done when rebooting a node in the integration tests
We shouldn't start cluster healthcheck until boot sequence is done.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-27 23:58:43 +03:00
Artem Chernyshev
ae1bec59e9
feat: allow running only one sequence at a time
Fix `Talos` sequencer to run only a single sequence at the same time.
Sequences priority was updated. To match the table:

| what is running (columns) what is requested (rows) | boot | reboot | reset | upgrade |
|----------------------------------------------------|------|--------|-------|---------|
| reboot                                             | Y    | Y      | Y     | N       |
| reset                                              | Y    | N      | N     | N       |
| upgrade                                            | Y    | N      | N     | N       |

With a small addition that `WithTakeover` is still there.
If set, priority is ignored.

This is mainly used for `Shutdown` sequence invokation.
And if doing apply config with reboot enabled.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-27 17:21:36 +03:00
Dmitriy Matrenichev
30f7851d2a
chore: bump golangci-lint from 1.45.2 to 1.47.2
Minor linter upgrade.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-07-22 17:49:44 +03:00
Utku Ozdemir
bb4abc0961
fix: regenerate kubelet certs when hostname changes
Clear the kubelet certificates and kubeconfig when hostname changes so that on next start, kubelet goes through the bootstrap process and new certificates are generated and the node is joined to the cluster with the new name.

Fixes siderolabs/talos#5834.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-21 01:54:15 +02:00
Andrey Smirnov
065b59276c
feat: implement packet capture API
This uses the `go-packet` library with native bindings for the packet
capture (without `libpcap`). This is not the most performant way, but it
allows us to avoid CGo.

There is a problem with converting network filter expressions (like
`tcp port 3222`) into BPF instructions, it's only available in C
libraries, but there's a workaround with `tcpdump`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-19 01:23:09 +04:00
Utku Ozdemir
87ea1d9611
fix: update kubelet kubeconfig when cluster control plane endpoint changes
Overwrite cluster's server URL in the kubeconfig file used by kubelet when the cluster control plane endpoint is changed in machineconfig, so that kubelet doesn't lose connectivity to kube-apiserver.

Closes siderolabs/talos#4470.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-16 14:19:25 +02:00
Utku Ozdemir
a75fe7600d
feat: gen secrets from kubernetes pki dir
This PR allows the ability to generate `secrets.yaml` (`talosctl gen secrets`) using a Kubernetes PKI directory path (e.g. `/etc/kubernetes/pki`) as input. Also introduces the flag `--kubernetes-bootstrap-token` to be able to set a static Kubernetes bootstrap token to the generated `secrets.yaml` file instead of a randomly-generated one. Closes siderolabs/talos#5894.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-16 13:06:32 +02:00
Andrey Smirnov
641f6a1e4e
feat: expose strategic merge config patches
The end result is that every Talos CLI accepts both JSON and strategic
patches to patch machine configuration.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-12 15:38:01 +04:00
Utku Ozdemir
d924901b79
feat: add cli subcommand to generate secrets
Adds a new command `talosctl gen secrets` to generate a `secrets.yaml` file with Talos and Kubenetes secrets. This file can later be used like `talosctl gen config ... --with-secrets secrets` to generate a config with these pre-generated secrets. Closes siderolabs/talos#5861.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-06 20:00:35 +02:00
Andrey Smirnov
86a0a7bdf7
refactor: use pointer types more in machine config structs
There should be no functional change with this PR.

The primary driver is supporting strategic merge configuration patches.
For such type of patches machine config should be loaded from incomplete
fragments, so it becomes critically important to distinguish between a
field having zero value vs. field being set in YAML.

E.g. with following struct:

```go
struct { AEnabled *bool `yaml:"a"` }
```

It's possible to distinguish between:

```yaml
a: false
```

and no metion of `a` in YAML.

Merging process trewats zero values as "not set" (skips them when
merging), so it's important to allow overriding value to explicit
`false`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-01 17:27:11 +04:00
Andrey Smirnov
52cd12951c
test: bump Talos versions in upgrade tests
We should keep the latest stable up to date.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-30 21:57:44 +04:00
Andrey Smirnov
a167a54021
test: fix CLI nodes discovery without provisioner data
When integration tests run without data from Talos provisioner (e.g.
against AWS/GCP), it should work only with `talosconfig` as an input.

This specific flow was missing filling out `infoWrapper` properly.

Clean up things a bit by reducing code duplication.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-21 18:42:26 +04:00
Utku Ozdemir
80090a3eda
test: fix health endpoint cli test when discovery is disabled
We skip the client-side health endpoint test that relies on the discovery service if the discovery service is not enabled for the cluster. Related to siderolabs#5554.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-06-20 21:45:42 +02:00
Utku Ozdemir
6759fcd4ae
feat: use discovery service on cluster health checks
Query the discovery service to fetch the node list and use the results in health checks. Closes siderolabs#5554.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-06-15 16:01:38 +02:00
Utku Ozdemir
8d2be5e315
feat: extend node definition used in health checks
Introduce `cluster.NodeInfo` to represent the basic info about a node which can be used in the health checks. This information, where possible, will be populated by the discovery service in following PRs. Part of siderolabs#5554.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-06-13 14:13:42 +02:00
Dmitriy Matrenichev
4dbbf4ac50
chore: add generic methods and use them part #2
Use things from #5702.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-06-09 23:10:02 +08:00
Andrey Smirnov
f2997c0f22
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-06 23:27:17 +04:00
Andrey Smirnov
2ae0e3a569
test: add a test for version of Go Talos was built with
This is to ensure that in fact Talos is built with Go version we expect.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-05-11 21:51:12 +03:00
Artem Chernyshev
2b03057b91
feat: implement a new mode try in the config manipulation commands
The new mode allows changing the config for a period of time, which
allows trying the configuration and automatically rolling it back in case
if it doesn't work for example.

The mode can only be used with changes that can be applied without a
reboot.

When changed it doesn't write the configuration to disk, only changes it
in memory.
`--timeout` parameter can be used to customize the rollback delay.
The default timeout is 1 minute.

Any consequent configuration change will abort try mode and the last
applied configuration will be used.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-21 20:31:45 +03:00
Artem Chernyshev
2b9722d1f5
feat: add dry-run flag in apply-config and edit commands
Dry run prints out config diff, selected application mode without
changing the configuration.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-14 19:12:57 +03:00
Andrey Smirnov
fc23c7a595
test: bump versions for upgrade tests
Use 0.14 -> 1.0 -> master.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-30 18:59:48 +03:00
Dmitriy Matrenichev
e06e1473b0
feat: update golangci-lint to 1.45.0 and gofumpt to 0.3.0
- Update golangci-lint to 1.45.0
- Update gofumpt to 0.3.0
- Fix gofumpt errors
- Add goimports and format imports since gofumports is removed
- Update Dockerfile
- Fix .golangci.yml configuration
- Fix linting errors

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-24 08:14:04 +04:00
Andrey Smirnov
883d401f9f
chore: rename github organization to siderolabs
Go module import paths still use talos-systems, packages use new
siderolabs name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-23 21:07:46 +03:00
Andrey Smirnov
f477507262
fix: the etcd recovery client and tests
This is the follow-up fix to the PR #5129.

1. Correctly catch only expected errors in the tests.
2. Rewind the snapshot each time the upload is retried.
3. Correctly unwrap errors in the `EtcdRecovery` client.
4. Update the `grpc-proxy` library to pass through the EOF error.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-22 16:51:36 +03:00
Artem Chernyshev
27af5d41c6
feat: pause the boot process on some failures instead of rebooting
Some failures can be fixed by updating the machine configuration.
Now `userDisks` and `userFiles` do not make Talos to enter into reboot
loop but pause for 35 minutes.

Additionally, `apid` and `machined` are now started right after
containerd is up and running.

That makes it possible for the operator to connect to the node using
talosctl and fix the config.

Fixes: https://github.com/talos-systems/talos/issues/4669
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-03-21 17:39:45 +03:00
Artem Chernyshev
a50747a64a
fix: align list and diskusage command flags with their Linux analogs
Fixes: https://github.com/talos-systems/talos/issues/3018

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-03-02 22:27:56 +03:00
Andrey Smirnov
09efa62f68
chore: re-enable kexec and default to UEFI booting in tests
Fixes #4947

It turns out there's something related to boot process in BIOS mode
which leads to initramfs corruption on later `kexec`.

Booting via GRUB is always successful.

Problem with kexec was confirmed with:

* direct boot via QEMU
* QEMU boot via iPXE (bundled with QEMU)

The root cause is not known, but the only visible difference is the
placement of RAMDISK with UEFI and BIOS boots:

```
[    0.005508] RAMDISK: [mem 0x312dd000-0x34965fff]
```

or:

```
[    0.003821] RAMDISK: [mem 0x711aa000-0x747a7fff]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-02 21:52:18 +03:00
Andrey Smirnov
0da370dfef
test: unlock CABPT/CACPPT provider versions
We should always test latest versions of our providers.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-02-10 00:14:15 +03:00
Andrey Smirnov
85782faa24
feat: update Kubernetes to 1.23.3
Also bumps some dependencies and updates Talos version we use in the
upgrade tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-01-26 17:59:21 +03:00
Artem Chernyshev
2f2bdb26aa
feat: replace flags with --mode in apply, edit and patch commands
Fixes: https://github.com/talos-systems/talos/issues/4588

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-01-13 16:09:53 +03:00
Andrey Smirnov
2f4b9d8d6d
feat: make machine configuration read-only in Talos (almost)
Talos shouldn't try to re-encode the machine config it was provided
with.

So add a `ReadonlyWrapper` around `*v1alpha1.Config` which makes sure
that raw config object is not available anymore (it's a private field),
but config accessors are available for read-only access.

Another thing that `ReadonlyWrapper` does is that it preserves the
original `[]byte` encoding of the config keeping it exactly same way as
it was loaded from file or read over the network.

Improved `talosctl edit mc` to preserve the config as it was submitted,
and preserve the edits on error from Talos (previously edits were lost).

`ReadonlyWrapper` is not used on config generation path though - config
there is represented by `*v1alpha.Config` and can be freely modified.

Why almost? Some parts of Talos (platform code) patch the machine
configuration with new data. We need to fix platforms to provide
networking configuration in a different way, but this will come with
other PRs later.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-28 20:12:55 +03:00
Andrey Smirnov
d2a7e082c2
test: retry in discovery tests
Sometimes pushing/pulling to Kubernetes registry is delayed due to
backoff on failed attempts to talk to the API server when the cluster is
still bootstrapping. Workaround that by adding retries.

Also disable kernel module controller in container mode, as it will keep
always failing.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-28 16:55:41 +03:00
Andrey Smirnov
c297d66a13
test: attempt number on two on proper retries in CLI time tests
See #4702

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-22 18:29:34 +03:00
Andrey Smirnov
17c1474881
test: retry talosctl time call in the tests
As `talosctl time` relies on default time server set in the config, and
our nodes start with `pool.ntp.org`, sometimes request to the timeserver
fails failing the tests.

Retry such errors in the tests to avoid spurious failures.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-17 20:55:06 +03:00
Noel Georgi
4c96e936ed
docs: add cilium guide
- Add Cilium CNI install guide
- Use Canal CNI for default examples

Fixes #4477

Signed-off-by: Noel Georgi <git@frezbo.dev>
2021-12-16 20:37:03 +05:30
Andrey Smirnov
ec641f7296
fix: use default time servers in time API if none are configured
This fixes simple bug:

```
$ talosctl -n 172.20.0.2 time
error fetching time: 1 error occurred:
	* 172.20.0.2: rpc error: code = Unknown desc = no time servers configured
```

After the change:

```
$ talosctl -n 172.20.0.2 time
NODE         NTP-SERVER     NODE-TIME                                 NTP-SERVER-TIME
172.20.0.2   pool.ntp.org   2021-12-10 14:25:38.871656717 +0000 UTC   2021-12-10 14:25:38.92119139 +0000 UTC
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-10 17:39:36 +03:00
Andrey Smirnov
97ffa7a645
feat: upgrade kubelet version in talosctl upgrade-k8s
Fixes #4656

As now changes to kubelet configuration can be applied without a reboot,
`talosctl upgrade-k8s` can handle the kubelet upgrades as well.

The gist is simply modifying machine config and waiting for `Node`
version to be updated, rest of the code is required for reliability of
the process.

Also fixed a bug in the API while watching deleted items with
tombstones.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-08 21:12:17 +03:00
Andrey Smirnov
64a4f6e77c
test: bump Talos versions in upgrade tests
In preparation for going 0.14-beta.0, bump versions in upgrade tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-06 18:07:24 +03:00
Artem Chernyshev
4f5d9da922
feat: allow overriding KSPP kernel parameters
Fixes: https://github.com/talos-systems/talos/issues/4385

Now sysctls defined in the config can override kernel args defined by
defaults controller.
In that case controller shows the warning that tells which param was
overridden and the new value and tells that it is not recommended.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-12-03 18:50:21 +03:00
Rohit Dandamudi
7f9922296a
feat: add powercycle mode in reboot
- Fixes #4569
- Updated reboot process sequence
- Updted api.descriptors to avoid proto type change linting error https://github.com/talos-systems/talos/pull/4612#discussion_r758599242
Signed-off-by: Rohit Dandamudi <rohit.dandamudi@siderolabs.com>

Signed-off-by: Rohit Dandamudi <rohit.dandamudi@siderolabs.com>
2021-12-02 22:40:04 +05:30
Nico Berlee
852bf4a7de
feat: talosctl fish completion support
Generate talosctl completion for fish

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-11-23 16:45:16 +03:00
Andrey Smirnov
753a82188f
refactor: move pkg/resources to machinery
Fixes #4420

No functional changes, just moving packages around.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-11-15 19:50:35 +03:00
Alexey Palazhchenko
7462733bcb
chore: update golangci-lint
Fix context propagation.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-11-15 14:55:25 +00:00