273 Commits

Author SHA1 Message Date
Philipp Sauter
f17cdee167
feat: jsonpath filter for talosctl get outputs
We add a filter to the `talosctl get` command that allows users to
specify a jsonpath filter. Now they can reduce the information that is
printed to only the parts they are interested in.

Fixes #6109

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-09-27 20:47:11 +02:00
Dmitriy Matrenichev
fc48849d00
chore: move maps/slices/ordered to gen module
Use github.com/siderolabs/gen

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-21 20:22:43 +03:00
Noel Georgi
357b770cb5
fix: cryptsetup delete slot
Fix cryptsetup delete slot.

Fixes: #6298

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-21 16:37:54 +05:30
Andrey Smirnov
2dadcd6695
fix: stop worker nodes from acting as apid routers
Don't allow worker nodes to act as apid routers:

* don't try to issue client certificate for apid on worker nodes
* if worker nodes receives incoming connections with `--nodes` set to
  one of the local addresses of the nodd, it routes the request to
  itself without proxying

Second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-13 15:07:31 +04:00
Andrey Smirnov
9df8f1ff1a
fix: list COSI APIs for the apid authenticator
As APIs were not listed explicitly, access with `os:reader` was denied
by default, while it should have been checked down in the access filter.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-08 21:05:36 +04:00
Utku Ozdemir
d283aba3a3
test: fix cli reboot test
Fix the assertions on the reboot cli test to correctly assert the event messages in lowercase.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-30 13:05:11 +02:00
Utku Ozdemir
0b339a9dc5
feat: track progress of action API calls
Track the progress of the long-running actions `reboot`, `reset`, `upgrade` and `shutdown` on the client side by default, unless `--no-wait=true` is specified.

Use the events API to follow the events using the actor ID of the action and display it using an stderr reporter with a spinner.

Closes siderolabs/talos#5499.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-29 22:54:40 +02:00
Andrey Smirnov
0723498125
fix: update COSI to the version with gRPC Wait fix
See https://github.com/cosi-project/runtime/pull/140

Also update for changes in https://github.com/cosi-project/runtime/pull/134

Fixes #6169

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-29 23:09:35 +04:00
Dmitriy Matrenichev
b59ca5810e
chore: move from inet.af/netaddr to net/netip and go4.org/netipx
Closes #6007

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-25 17:51:32 +03:00
Andrey Smirnov
11edb2c6f8
test: re-enable upgrade tests
Now final upgrade version is COSI API compatible.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-24 22:23:49 +04:00
Dmitriy Matrenichev
29bd632401
chore: remove old build tags syntax
This commit removes lines contains old build tag syntax.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-24 17:27:01 +03:00
Andrey Smirnov
2f2d97b6b5
fix: don't wait for the hostname in maintenance mode
Fixes #6119

With new stable default hostname feature, any default hostname is
disabled until the machine config is available.

Talos enters maintenance mode when the default config source is empty,
so it doesn't have any machine config available at the moment
maintenance service is started.

Hostname might be set via different sources, e.g. kernel args or via
DHCP before the machine config is available, but if all these sources
are not available, hostname won't be set at all.

This stops waiting for the hostname, and skips setting any DNS names in
the maintenance mode certificate SANs if the hostname is not available.

Also adds a regression test via new `--disable-dhcp-hostname` flag to
`talosctl cluster create`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-23 17:52:20 +04:00
Dmitriy Matrenichev
0fe4492e72
chore: bump golangci-lint from 1.47.2 to 1.48.0
Patch version linter upgrade.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-15 18:11:30 +03:00
Andrey Smirnov
9baca49662
refactor: implement COSI resource API for Talos
Overview: deprecate existing Talos resource API, and introduce new COSI
API.

Consequences:

* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 22:31:54 +04:00
Artem Chernyshev
5c6648e3d2
fix: make talosctl command return nonzero error codes if it had errors
Multinode requests were printing out the errors for each node to stderr,
but they didn't set the global error.

Refactor the code a bit to use a single function for handling that logic
to avoid rewriting it in many other places.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-12 14:19:45 +03:00
Artem Chernyshev
13499fc302
feat: support patching the machine config in the apply-config cmd
Fixes: https://github.com/siderolabs/talos/issues/6045

`talosctl apply-config` now supports `--config-patch` flag that takes
machine config patches as the input.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-11 13:56:23 +03:00
Andrey Smirnov
5dd1b40020
feat: disable Kubernetes discovery backend by default
Fixes #5827

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-10 22:25:31 +04:00
Noel Georgi
b62b18a972
feat: bump k8s to v1.25.0-beta.0
Bump k8s to v1.25.0-beta.0

Update most kubernetes `master` references to `controlplane`

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-10 22:17:53 +05:30
Utku Ozdemir
84e712a9f1
feat: introduce Talos API access from Kubernetes
We add a new CRD, `serviceaccounts.talos.dev` (with `tsa` as short name), and its controller which allows users to get a `Secret` containing a short-lived Talosconfig in their namespaces with the roles they need. Additionally, we introduce the `talosctl inject serviceaccount` command to accept a YAML file with Kubernetes manifests and inject them with Talos service accounts so that they can be directly applied to Kubernetes afterwards. If Talos API access feature is enabled on Talos side, the injected workloads will be able to talk to Talos API.

Closes siderolabs/talos#4422.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-08 18:27:26 +02:00
Andrey Smirnov
a6b010a8b4
chore: update Go to 1.19, Linux to 5.15.58
See https://go.dev/doc/go1.19

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 17:03:58 +04:00
Andrey Smirnov
fe2ee3b100
feat: implement MachineStatus resource
Fixes #5789

Example:

```yaml
spec:
    stage: running
    status:
        ready: false
        unmetConditions:
            - name: staticPods
              reason: kube-system/kube-controller-manager-talos-default-master-1 not ready, kube-system/kube-scheduler-talos-default-master-1 not ready
```

As events (CLI doesn't show full contents):

```
172.20.0.2   cbhf2l6f9lrs738hehfg   talos/runtime/machine.MachineStatusEvent   BOOTING   ready: false, unmet conditions: [time network services]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 18:36:10 +04:00
Artem Chernyshev
e5994ff7a7
fix: skip ResetDuringBoot test if the Cluster config is unknown
And improve retry logic in the test.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-28 15:57:58 +03:00
Artem Chernyshev
8028e10749
fix: wait for boot done when rebooting a node in the integration tests
We shouldn't start cluster healthcheck until boot sequence is done.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-27 23:58:43 +03:00
Artem Chernyshev
ae1bec59e9
feat: allow running only one sequence at a time
Fix `Talos` sequencer to run only a single sequence at the same time.
Sequences priority was updated. To match the table:

| what is running (columns) what is requested (rows) | boot | reboot | reset | upgrade |
|----------------------------------------------------|------|--------|-------|---------|
| reboot                                             | Y    | Y      | Y     | N       |
| reset                                              | Y    | N      | N     | N       |
| upgrade                                            | Y    | N      | N     | N       |

With a small addition that `WithTakeover` is still there.
If set, priority is ignored.

This is mainly used for `Shutdown` sequence invokation.
And if doing apply config with reboot enabled.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-27 17:21:36 +03:00
Dmitriy Matrenichev
30f7851d2a
chore: bump golangci-lint from 1.45.2 to 1.47.2
Minor linter upgrade.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-07-22 17:49:44 +03:00
Utku Ozdemir
bb4abc0961
fix: regenerate kubelet certs when hostname changes
Clear the kubelet certificates and kubeconfig when hostname changes so that on next start, kubelet goes through the bootstrap process and new certificates are generated and the node is joined to the cluster with the new name.

Fixes siderolabs/talos#5834.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-21 01:54:15 +02:00
Andrey Smirnov
065b59276c
feat: implement packet capture API
This uses the `go-packet` library with native bindings for the packet
capture (without `libpcap`). This is not the most performant way, but it
allows us to avoid CGo.

There is a problem with converting network filter expressions (like
`tcp port 3222`) into BPF instructions, it's only available in C
libraries, but there's a workaround with `tcpdump`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-19 01:23:09 +04:00
Utku Ozdemir
87ea1d9611
fix: update kubelet kubeconfig when cluster control plane endpoint changes
Overwrite cluster's server URL in the kubeconfig file used by kubelet when the cluster control plane endpoint is changed in machineconfig, so that kubelet doesn't lose connectivity to kube-apiserver.

Closes siderolabs/talos#4470.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-16 14:19:25 +02:00
Utku Ozdemir
a75fe7600d
feat: gen secrets from kubernetes pki dir
This PR allows the ability to generate `secrets.yaml` (`talosctl gen secrets`) using a Kubernetes PKI directory path (e.g. `/etc/kubernetes/pki`) as input. Also introduces the flag `--kubernetes-bootstrap-token` to be able to set a static Kubernetes bootstrap token to the generated `secrets.yaml` file instead of a randomly-generated one. Closes siderolabs/talos#5894.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-16 13:06:32 +02:00
Andrey Smirnov
641f6a1e4e
feat: expose strategic merge config patches
The end result is that every Talos CLI accepts both JSON and strategic
patches to patch machine configuration.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-12 15:38:01 +04:00
Utku Ozdemir
d924901b79
feat: add cli subcommand to generate secrets
Adds a new command `talosctl gen secrets` to generate a `secrets.yaml` file with Talos and Kubenetes secrets. This file can later be used like `talosctl gen config ... --with-secrets secrets` to generate a config with these pre-generated secrets. Closes siderolabs/talos#5861.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-06 20:00:35 +02:00
Andrey Smirnov
86a0a7bdf7
refactor: use pointer types more in machine config structs
There should be no functional change with this PR.

The primary driver is supporting strategic merge configuration patches.
For such type of patches machine config should be loaded from incomplete
fragments, so it becomes critically important to distinguish between a
field having zero value vs. field being set in YAML.

E.g. with following struct:

```go
struct { AEnabled *bool `yaml:"a"` }
```

It's possible to distinguish between:

```yaml
a: false
```

and no metion of `a` in YAML.

Merging process trewats zero values as "not set" (skips them when
merging), so it's important to allow overriding value to explicit
`false`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-01 17:27:11 +04:00
Andrey Smirnov
52cd12951c
test: bump Talos versions in upgrade tests
We should keep the latest stable up to date.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-30 21:57:44 +04:00
Andrey Smirnov
a167a54021
test: fix CLI nodes discovery without provisioner data
When integration tests run without data from Talos provisioner (e.g.
against AWS/GCP), it should work only with `talosconfig` as an input.

This specific flow was missing filling out `infoWrapper` properly.

Clean up things a bit by reducing code duplication.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-21 18:42:26 +04:00
Utku Ozdemir
80090a3eda
test: fix health endpoint cli test when discovery is disabled
We skip the client-side health endpoint test that relies on the discovery service if the discovery service is not enabled for the cluster. Related to siderolabs#5554.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-06-20 21:45:42 +02:00
Utku Ozdemir
6759fcd4ae
feat: use discovery service on cluster health checks
Query the discovery service to fetch the node list and use the results in health checks. Closes siderolabs#5554.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-06-15 16:01:38 +02:00
Utku Ozdemir
8d2be5e315
feat: extend node definition used in health checks
Introduce `cluster.NodeInfo` to represent the basic info about a node which can be used in the health checks. This information, where possible, will be populated by the discovery service in following PRs. Part of siderolabs#5554.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-06-13 14:13:42 +02:00
Dmitriy Matrenichev
4dbbf4ac50
chore: add generic methods and use them part #2
Use things from #5702.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-06-09 23:10:02 +08:00
Andrey Smirnov
f2997c0f22
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-06 23:27:17 +04:00
Andrey Smirnov
2ae0e3a569
test: add a test for version of Go Talos was built with
This is to ensure that in fact Talos is built with Go version we expect.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-05-11 21:51:12 +03:00
Artem Chernyshev
2b03057b91
feat: implement a new mode try in the config manipulation commands
The new mode allows changing the config for a period of time, which
allows trying the configuration and automatically rolling it back in case
if it doesn't work for example.

The mode can only be used with changes that can be applied without a
reboot.

When changed it doesn't write the configuration to disk, only changes it
in memory.
`--timeout` parameter can be used to customize the rollback delay.
The default timeout is 1 minute.

Any consequent configuration change will abort try mode and the last
applied configuration will be used.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-21 20:31:45 +03:00
Artem Chernyshev
2b9722d1f5
feat: add dry-run flag in apply-config and edit commands
Dry run prints out config diff, selected application mode without
changing the configuration.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-14 19:12:57 +03:00
Andrey Smirnov
fc23c7a595
test: bump versions for upgrade tests
Use 0.14 -> 1.0 -> master.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-30 18:59:48 +03:00
Dmitriy Matrenichev
e06e1473b0
feat: update golangci-lint to 1.45.0 and gofumpt to 0.3.0
- Update golangci-lint to 1.45.0
- Update gofumpt to 0.3.0
- Fix gofumpt errors
- Add goimports and format imports since gofumports is removed
- Update Dockerfile
- Fix .golangci.yml configuration
- Fix linting errors

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-24 08:14:04 +04:00
Andrey Smirnov
883d401f9f
chore: rename github organization to siderolabs
Go module import paths still use talos-systems, packages use new
siderolabs name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-23 21:07:46 +03:00
Andrey Smirnov
f477507262
fix: the etcd recovery client and tests
This is the follow-up fix to the PR #5129.

1. Correctly catch only expected errors in the tests.
2. Rewind the snapshot each time the upload is retried.
3. Correctly unwrap errors in the `EtcdRecovery` client.
4. Update the `grpc-proxy` library to pass through the EOF error.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-22 16:51:36 +03:00
Artem Chernyshev
27af5d41c6
feat: pause the boot process on some failures instead of rebooting
Some failures can be fixed by updating the machine configuration.
Now `userDisks` and `userFiles` do not make Talos to enter into reboot
loop but pause for 35 minutes.

Additionally, `apid` and `machined` are now started right after
containerd is up and running.

That makes it possible for the operator to connect to the node using
talosctl and fix the config.

Fixes: https://github.com/talos-systems/talos/issues/4669
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-03-21 17:39:45 +03:00
Artem Chernyshev
a50747a64a
fix: align list and diskusage command flags with their Linux analogs
Fixes: https://github.com/talos-systems/talos/issues/3018

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-03-02 22:27:56 +03:00
Andrey Smirnov
09efa62f68
chore: re-enable kexec and default to UEFI booting in tests
Fixes #4947

It turns out there's something related to boot process in BIOS mode
which leads to initramfs corruption on later `kexec`.

Booting via GRUB is always successful.

Problem with kexec was confirmed with:

* direct boot via QEMU
* QEMU boot via iPXE (bundled with QEMU)

The root cause is not known, but the only visible difference is the
placement of RAMDISK with UEFI and BIOS boots:

```
[    0.005508] RAMDISK: [mem 0x312dd000-0x34965fff]
```

or:

```
[    0.003821] RAMDISK: [mem 0x711aa000-0x747a7fff]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-02 21:52:18 +03:00
Andrey Smirnov
0da370dfef
test: unlock CABPT/CACPPT provider versions
We should always test latest versions of our providers.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-02-10 00:14:15 +03:00