3932 Commits

Author SHA1 Message Date
Andrey Smirnov
985b0c2e79
chore: remove go.work.sum
This file receives many updates, and we don't want to handle them.

Everyone can have it on their local machine.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-15 18:43:35 +04:00
Andrey Smirnov
69124f1026
feat: update etcd to v3.5.5
See https://github.com/etcd-io/etcd/releases/tag/v3.5.5

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-15 17:09:02 +04:00
Pau Campana
1985a796c0
docs: update docs for pod security
Add a new section on how to disable admission control in the control
plane.

Signed-off-by: Pau Campana <pau.campanya.soler@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-15 14:41:26 +04:00
Andrey Smirnov
94b088f02f
fix: set etcd options consistently
This fixes an issue introduced in #5879: options should be set the same
way for both `init` and `controlplane` cases.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-14 22:56:26 +04:00
Dmitriy Matrenichev
92ae7ef4b1
fix: fix protoenc encoding for enums and types with custom encoders
This commit bumps protoenc to v0.2.0 and also adds tests to ensure that encoding fixes are working correctly.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-14 17:47:37 +03:00
Noel Georgi
93809017c5
docs: cpu scaling governor knowledgebase
Add docs on setting cpu scaling governor across all CPUs.

Thanks to @nberlee for the [suggestion](https://github.com/siderolabs/talos/issues/4508#issuecomment-1245633679)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-14 13:20:28 +05:30
Andrey Smirnov
7b270ff33d
test: fix api controller test
Fixing the test to match the implementation.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-13 15:26:32 +04:00
Andrey Smirnov
2dadcd6695
fix: stop worker nodes from acting as apid routers
Don't allow worker nodes to act as apid routers:

* don't try to issue client certificate for apid on worker nodes
* if a worker node receives an incoming connection with `--nodes` set to
  one of the local addresses of the node, it routes the request to
  itself without proxying

The second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-13 15:07:31 +04:00
Andrey Smirnov
9eaf33f3f2
fix: never sign client certificate requests in trustd
Talos worker nodes use `trustd` API on control plane nodes to issue
certificates for `apid` service. Access to the API is protected with the
Talos join token specified in the machine configuration.

There was no validation of what kind of certificate was requested, so
`trustd` could issue a certificate which is valid for client
authentication with any set of Talos API RBAC roles, including
`os:admin` role allowing full access to the Talos API on control plane
nodes.

See: GHSA-7hgc-php5-77qq
CVE: CVE-2022-36103

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-13 15:06:09 +04:00
Noel Georgi
4367491247
feat: environment vars for extension service
This allows setting environment variables for the extension service.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-13 14:06:55 +05:30
Andrey Smirnov
0c0cb671ea
chore: mark machine configuration validation failure as InvalidArgument
This makes it easier to distinguish between retriable and fatal
failures.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-12 22:04:54 +04:00
Andrey Smirnov
f424e53404
fix: stop containers more thoroughly
Don't skip pods which are not ready; still try to stop containers inside
not-ready pod sandboxes.

Re-enable the test with Canal CNI (upstream Calico got fixed).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-12 20:16:40 +04:00
Dmitriy Matrenichev
12827b861c
chore: move "implements" checks to compile time
There is no need to use `assert.Implements` since we can express this check during compile time. Go will eliminate `_` variables and any accompanying allocations during dead-code elimination phase.

This commit also removes the following code, since it doesn't check anything (`v1alpha1.ClusterConfig.Token()` already returns a `config.Token` interface):

    tok := new(v1alpha1.ClusterConfig).Token()
    assert.Implements(t, (*config.Token)(nil), tok)

Also - run `go work sync` and `go mod tidy`.
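
A minimal sketch of the compile-time check described above. The `Token` interface and `clusterToken` type are hypothetical stand-ins for illustration, not the actual Talos types:

```go
package main

import "fmt"

// Token is a hypothetical interface used only for this illustration.
type Token interface {
	ID() string
}

// clusterToken is a hypothetical implementation of Token.
type clusterToken struct{ id string }

func (c *clusterToken) ID() string { return c.id }

// Compile-time assertion: the build fails if *clusterToken ever stops
// implementing Token. The blank identifier means no named variable is kept.
var _ Token = (*clusterToken)(nil)

func main() {
	var tok Token = &clusterToken{id: "abc"}
	fmt.Println(tok.ID())
}
```

If a method is removed from `clusterToken`, the package stops compiling, so no test run is needed to catch the regression.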

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-12 16:57:24 +03:00
Andrey Smirnov
3a67c42cbf
fix: kill the task processes when cleaning up stale task
The bug was triggered by `containerd` crash (restart), in this case
runner receives an error as if the process exited.
Runner tries to restart the container, but as the container is still
running, attempt to delete the task would fail.

With this change Talos always tries to kill the running container and
waits for the container to terminate.

The error message when the bug was triggered looks like:

```
service[kubelet](Waiting): Error running Containerd(kubelet), going to restart forever: failed to clean up task "kubelet": task must be stopped before deletion: running: failed precondition
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-12 17:05:13 +04:00
Andrey Smirnov
14a79e325b
chore: bump dependencies
dependabot

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-12 16:38:21 +04:00
Andrey Smirnov
9beee92e71
docs: fix double vv in Kubernetes version
Fixes #6242

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-12 15:36:26 +04:00
Andrey Smirnov
6882725157
fix: use different username for Talos Kubernetes API access
Fixes #6156

Now access from Talos itself goes with `talos:admin` username in the
Kubernetes API server audit log, while access with admin kubeconfig goes
with `admin` username as before.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-09 19:30:36 +04:00
Andrey Smirnov
161a52a9ef
feat: check apid client certificate extended key usage
This is enabled via a machine config feature/version contract, as
`talosconfig` certificate generated previously didn't have proper key
usage set, so we need to keep backwards compatibility on upgrades.

New v1.3+ clusters will include this check.

This check prevents even potential mis-use of server certificates as a
client certificate.
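
As an illustration only (this is not the apid implementation), an extended key usage check can be sketched with the standard library. The function name and the sample certificates are hypothetical, and only the relevant field is populated:

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// allowsClientAuth reports whether the certificate explicitly lists
// TLS client authentication in its extended key usage.
func allowsClientAuth(cert *x509.Certificate) bool {
	for _, usage := range cert.ExtKeyUsage {
		if usage == x509.ExtKeyUsageClientAuth {
			return true
		}
	}

	return false
}

// Hypothetical certificates for the example.
var (
	serverCert = &x509.Certificate{ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}}
	clientCert = &x509.Certificate{ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}}
	legacyCert = &x509.Certificate{} // no extended key usage set at all
)

func main() {
	fmt.Println(allowsClientAuth(serverCert), allowsClientAuth(clientCert), allowsClientAuth(legacyCert))
	// prints: false true false
}
```

A strict check like this rejects a certificate that carries no client usage, which is why such a check has to be gated for clusters whose certificates were generated earlier.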

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-09 16:37:21 +04:00
Andrey Smirnov
9dadc4a599
fix: include all node addresses into etcd cert SANs
It was a mistake to use only 'routed' addresses, as they do not
include e.g. SideroLink.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-09 15:24:58 +04:00
Andrey Smirnov
71bfd3e43c
feat: update CoreDNS to 1.9.4
See https://github.com/coredns/coredns/blob/master/notes/coredns-1.9.4.md

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-08 21:59:58 +04:00
Andrey Smirnov
9df8f1ff1a
fix: list COSI APIs for the apid authenticator
As APIs were not listed explicitly, access with `os:reader` was denied
by default, while it should have been checked down in the access filter.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-08 21:05:36 +04:00
Dmitriy Matrenichev
31462450f1
fix: pass a pointer to specs.Mount into protoenc.Marshal
Encoder function `protoenc.Marshal` expects a pointer.

Fixes #6233

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-08 15:04:04 +03:00
Andrey Smirnov
e626540dfb
chore: avoid double API request logging in trustd
There's a common logger for API calls already working, so no need to log
in the token authenticator.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-08 14:18:39 +04:00
Andrey Smirnov
f62d17125b
chore: update crypto to use new import path siderolabs/crypto
No functional changes in this PR, just updating import paths.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-07 23:02:50 +04:00
Andrey Smirnov
ef27dd8553
chore: bump dependencies
dependabot

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-06 14:48:50 +04:00
Andrey Smirnov
6472ae00b2
fix: automatically discard VIPs for etcd advertised addresses
Fixes #6210

Refactored the code a bit to support excludes and default configuration.

Etcd should never advertise VIPs, as VIPs are managed by etcd elections.
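
A rough sketch of the exclusion idea, assuming the VIPs are known as a plain list of addresses. The helper names `withoutVIPs` and `mustAddrs` are hypothetical and not taken from the Talos code:

```go
package main

import (
	"fmt"
	"net/netip"
)

// withoutVIPs returns addrs with every address listed in vips removed,
// preserving the original order.
func withoutVIPs(addrs, vips []netip.Addr) []netip.Addr {
	excluded := make(map[netip.Addr]struct{}, len(vips))
	for _, v := range vips {
		excluded[v] = struct{}{}
	}

	var out []netip.Addr

	for _, a := range addrs {
		if _, skip := excluded[a]; !skip {
			out = append(out, a)
		}
	}

	return out
}

// mustAddrs parses literal addresses for the example and panics on bad input.
func mustAddrs(ss ...string) []netip.Addr {
	out := make([]netip.Addr, 0, len(ss))
	for _, s := range ss {
		out = append(out, netip.MustParseAddr(s))
	}

	return out
}

func main() {
	node := mustAddrs("10.0.0.2", "10.0.0.100")
	vips := mustAddrs("10.0.0.100")

	fmt.Println(withoutVIPs(node, vips)) // prints: [10.0.0.2]
}
```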

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-06 14:22:12 +04:00
Noel Georgi
5e21cca52d
feat: support setting kernel parameters
Support setting kernel parameters via machine config.

Fixes: #6206

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-05 23:45:51 +05:30
Dmitriy Matrenichev
bd56621cdf
feat: add structprotogen tool
This commit adds structprotogen tool which is used to generate proto file from Go structs.

Closes #6078.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-05 16:54:00 +03:00
Marvin Drees
cdb6bb2cc7
feat: add Nano Pi R4S support
This commit adds initial support for the Nano Pi
R4S from Friendlyelec. This device is a networking focused
rk3399 based SBC with two 1G ethernet interfaces,
making it perfect for edge or SOHO deployments.

Signed-off-by: Marvin Drees <marvin.drees@9elements.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-02 23:37:07 +05:30
Andrey Smirnov
36c1f1d6e6
fix: flip the client-server version check
It should have been the opposite: it's a problem if the server version
is _older_ than the client version.

E.g. using talosctl 1.2.0 against Talos 1.1.2 is a problem, not vice
versa.
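
The direction of the check can be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, and comparing only the major and minor components is a choice made for the example, not necessarily what talosctl does:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts major and minor from a version such as
// "1.2.0" or "v1.2.0".
func parseMajorMinor(v string) (int, int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", v)
	}

	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major in %q: %w", v, err)
	}

	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid minor in %q: %w", v, err)
	}

	return major, minor, nil
}

// serverOlderThanClient reports the problematic case: the server is
// older than the client.
func serverOlderThanClient(client, server string) (bool, error) {
	cMaj, cMin, err := parseMajorMinor(client)
	if err != nil {
		return false, err
	}

	sMaj, sMin, err := parseMajorMinor(server)
	if err != nil {
		return false, err
	}

	if sMaj != cMaj {
		return sMaj < cMaj, nil
	}

	return sMin < cMin, nil
}

func main() {
	older, err := serverOlderThanClient("1.2.0", "1.1.2")
	fmt.Println(older, err) // prints: true <nil>
}
```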

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-02 19:14:44 +04:00
Andrey Smirnov
cd6c53a979
docs: fork docs for v1.3
Now master docs are generated for v1.3.0.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-02 18:35:05 +04:00
Utku Ozdemir
0847400f72
fix: prevent panic on health check if a member has no IPs
If a member has no IP addresses, prevent cluster health checks from failing with a panic by checking for the length of member IPs and not assuming there's always at least 1 IP.
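
A minimal sketch of the guard, using a hypothetical `member` type with string addresses rather than the real cluster member structure:

```go
package main

import (
	"errors"
	"fmt"
)

// member is a hypothetical stand-in for a cluster member record.
type member struct {
	Name string
	IPs  []string
}

var errNoIPs = errors.New("member has no IP addresses")

// primaryIP returns the first address of a member, or an error instead
// of indexing into an empty slice (which would panic).
func primaryIP(m member) (string, error) {
	if len(m.IPs) == 0 {
		return "", fmt.Errorf("%s: %w", m.Name, errNoIPs)
	}

	return m.IPs[0], nil
}

func main() {
	members := []member{
		{Name: "cp-1", IPs: []string{"10.0.0.2"}},
		{Name: "worker-1"}, // no addresses reported yet
	}

	for _, m := range members {
		ip, err := primaryIP(m)
		if err != nil {
			fmt.Println("skipping:", err)

			continue
		}

		fmt.Println(m.Name, ip)
	}
}
```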

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-09-02 15:16:59 +02:00
Andrey Smirnov
7471d7f017
feat: update Flannel to v0.19.2
See https://github.com/flannel-io/flannel/releases/tag/v0.19.2

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-02 16:12:07 +04:00
Steve Francis
148c75cfb9
docs: consolidate the control-plane documentation
Also fix some typos.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-02 00:35:55 +04:00
Andrey Smirnov
353154281a
fix: drop kube-system SA default binding
This is not needed anymore, it's a leftover from bootkube times.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-01 21:38:01 +04:00
Noel Georgi
4f37b668be
chore: remove capi hacks
Remove hacks used for CAPI tests

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-01 18:57:27 +05:30
Andrey Smirnov
1369afea85
docs: make 1.2.0 docs default ones
Update latest release to 1.2.0.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-01 15:26:53 +04:00
Andrey Smirnov
7627cb0e30
docs: add new talosctl gen secrets
I forgot to mention that in the docs update.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-01 00:51:04 +04:00
Noel Georgi
8aa60a37a6
chore: bump kernel to 5.15.64
Bump kernel to [5.15.64](https://github.com/siderolabs/pkgs/pull/576)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-01 00:01:03 +05:30
Andrey Smirnov
a798dbd5d2
docs: update docs for upcoming 1.2.0 release
Update what's new, upgrading docs.

Fix up instances of `master` leftover in the docs.

Fix the formatting of kernel params reference.

Fixes #6150

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-31 21:54:45 +04:00
Artem Chernyshev
b2fec3c975
fix: properly handle configContext being nil in Talos client
Client crashes if you try using it in the unixSocket mode.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-31 17:31:20 +03:00
Artem Chernyshev
1c0977b3af
fix: change the type of returned gRPC connection object from the client
`client.Conn()` now returns `*grpc.ClientConn` instead of
`grpc.ClientConnInterface`.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-30 16:55:13 +03:00
Artem Chernyshev
41848e4214
fix: expose Talos client gRPC connection via the function Conn
Previously we had a public `GetConn` function; we need something similar.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-30 15:11:37 +03:00
Andrey Smirnov
2e9be4af8b
chore: bump dependencies
go-mod-outdated + dependabot

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-30 15:26:22 +04:00
Utku Ozdemir
d283aba3a3
test: fix cli reboot test
Fix the assertions on the reboot cli test to correctly assert the event messages in lowercase.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-30 13:05:11 +02:00
Utku Ozdemir
0b339a9dc5
feat: track progress of action API calls
Track the progress of the long-running actions `reboot`, `reset`, `upgrade` and `shutdown` on the client side by default, unless `--no-wait=true` is specified.

Use the events API to follow the events using the actor ID of the action and display it using an stderr reporter with a spinner.

Closes siderolabs/talos#5499.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-29 22:54:40 +02:00
Andrey Smirnov
0723498125
fix: update COSI to the version with gRPC Wait fix
See https://github.com/cosi-project/runtime/pull/140

Also update for changes in https://github.com/cosi-project/runtime/pull/134

Fixes #6169

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-29 23:09:35 +04:00
Andrey Smirnov
89d57aa816
fix: always abort the maintenance service
I hit this bug when one of the API calls hung: submitting the
machine config with `apply-config` never took the node out of
maintenance mode, as `.GracefulStop()` may hang forever waiting for all
the calls to finish.

This way we always abort at some timeout and stop the server forcefully.
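
The pattern can be sketched without importing gRPC, assuming a server type that offers both `GracefulStop()` and `Stop()`. The `stopper` interface, `stopWithTimeout`, and the `hangingServer` fake are hypothetical names for this example:

```go
package main

import (
	"fmt"
	"time"
)

// stopper is the subset of server behaviour used here.
type stopper interface {
	GracefulStop()
	Stop()
}

// stopWithTimeout tries a graceful stop first and falls back to a forced
// stop if it has not finished within timeout. It reports whether the
// graceful path completed in time.
func stopWithTimeout(s stopper, timeout time.Duration) bool {
	done := make(chan struct{})

	go func() {
		defer close(done)
		s.GracefulStop()
	}()

	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-done:
		return true
	case <-timer.C:
		s.Stop()

		return false
	}
}

// hangingServer is a fake whose GracefulStop blocks until Stop is called,
// imitating an API call that never finishes.
type hangingServer struct{ stopped chan struct{} }

func newHangingServer() *hangingServer { return &hangingServer{stopped: make(chan struct{})} }
func (h *hangingServer) GracefulStop() { <-h.stopped }
func (h *hangingServer) Stop()         { close(h.stopped) }

func main() {
	graceful := stopWithTimeout(newHangingServer(), 50*time.Millisecond)
	fmt.Println("graceful:", graceful) // prints: graceful: false
}
```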

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-29 22:48:06 +04:00
Andrey Smirnov
f6fa746193
fix: limit apid backoff max delay
This fixes a case when a node is rebooted, and apid at another
endpoint "caches" a connection error even when the node is up.

E.g. this command:

```
talosctl -e IP1 -n IP2 version
```

If node `IP2` is rebooted, `apid` at `IP1` might enter a long backoff loop
and still return an error when `IP2` is actually up.
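
To illustrate why a cap matters, here is a generic exponential backoff with a maximum delay. It is a sketch of the general technique, not the apid code; the function name and the chosen durations are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns the delay before the given retry attempt (0-based).
// The delay doubles from base on each attempt and never exceeds maxDelay.
func backoffDelay(attempt int, base, maxDelay time.Duration) time.Duration {
	d := base

	for i := 0; i < attempt; i++ {
		// Stop doubling once the next step would reach or pass the cap;
		// this also avoids integer overflow for large attempt counts.
		if d >= maxDelay/2 {
			return maxDelay
		}

		d *= 2
	}

	if d > maxDelay {
		return maxDelay
	}

	return d
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(attempt, backoffDelay(attempt, time.Second, 15*time.Second))
	}
}
```

Without the cap the wait keeps doubling, so a node that has just come back up could stay unreachable through the proxy for a long time.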

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-29 21:59:46 +04:00
Andrey Smirnov
d7ef346db8
fix: get command in the case 'nodes' are not set in the context
For maintenance mode (`talosctl get --insecure`), there's no 'nodes'
set, so we run the loop for the single "current" node the client is
connected to.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-29 18:48:59 +04:00