The problem is that there's no official way to close the Kubernetes client's
underlying TCP/HTTP connections. So each time Talos initializes a
connection to the control plane endpoint, a new client is built, but this
client is never closed, so the connection stays active on the load
balancers, at the API server level, etc. It also consumes resources in
Talos itself.
We add a way to close the underlying connections by using a helper from the
Kubernetes client libraries to force-close all TCP connections, which
should shut down all HTTP/2 connections as well (see the sketch below).
An alternative approach might be to cache a client for some time, but many
of the clients are created with temporary PKI, so even a cached client
still needs to be closed once it gets stale, and it's not clear how to
recreate a client in case the existing one is broken for one reason or
another (and we need to force a re-connection).
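A minimal sketch of the idea, assuming the client keeps its `*rest.Config` around; the helper name and the way the transport is obtained here are illustrative, not the actual Talos code:
```go
import (
	"k8s.io/client-go/rest"
)

// closeIdleConnections force-closes the TCP connections kept alive by the
// client's transport, which also shuts down HTTP/2 connections.
func closeIdleConnections(config *rest.Config) error {
	// rest.TransportFor returns the round tripper client-go would use for
	// this config (client-go caches transports per config).
	rt, err := rest.TransportFor(config)
	if err != nil {
		return err
	}

	// The concrete round tripper is usually an *http.Transport (possibly
	// wrapped); anything exposing CloseIdleConnections can be closed this way.
	if closer, ok := rt.(interface{ CloseIdleConnections() }); ok {
		closer.CloseIdleConnections()
	}

	return nil
}
```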
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
When forfeiting etcd leadership, the node might still report leadership
status while no longer being the leader by the time the actual API call is
made. We should ignore such an error, as the node is not a leader.
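A sketch of the error handling, assuming etcd's clientv3 API and the error constants from `go.etcd.io/etcd/api/v3/v3rpc/rpctypes`; the surrounding helper is illustrative:
```go
import (
	"context"
	"errors"

	"go.etcd.io/etcd/api/v3/v3rpc/rpctypes"
	clientv3 "go.etcd.io/etcd/client/v3"
)

// forfeitLeadership moves etcd leadership to another member, treating
// "not a leader" as success, since there is nothing left to forfeit.
func forfeitLeadership(ctx context.Context, client *clientv3.Client, transfereeID uint64) error {
	_, err := client.MoveLeader(ctx, transfereeID)
	if err != nil && errors.Is(err, rpctypes.ErrNotLeader) {
		// the node is not (or no longer) the leader, nothing to forfeit
		return nil
	}

	return err
}
```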
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This should fix an error like:
```
failed to create etcd client: error getting kubernetes endpoints: Unauthorized
```
The problem is that the generated cert was used immediately, so even a
slight time sync issue across nodes might render the cert not (yet)
usable. The cert is generated on one node, but might be used on any other
node (as the request goes via the LB).
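A sketch of the usual mitigation with the standard library: backdate `NotBefore` so slight clock skew between the issuing and consuming nodes doesn't make the cert unusable. The skew value and template fields here are illustrative:
```go
import (
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

func newClientCertTemplate(cn string) *x509.Certificate {
	return &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		// backdate NotBefore to tolerate small time sync differences
		// between the node generating the cert and the node using it
		NotBefore:   time.Now().Add(-time.Minute),
		NotAfter:    time.Now().Add(10 * time.Minute),
		KeyUsage:    x509.KeyUsageDigitalSignature,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
}
```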
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR bumps pkgs to v0.7.0-alpha.0, so that we gain a fix for
hotplugging of NVMe drives.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This removes the `retrying error` messages while waiting for the API server
pod state to reflect changes from the updated static pod definition.
More lines are logged to show progress.
`kube-proxy` is skipped if not found, as we allow it to be disabled (see the sketch after the output below).
```
$ talosctl upgrade-k8s -n 172.20.0.2 --from 1.21.0 --to 1.21.2
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
updating "kube-apiserver" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating "kube-controller-manager" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating "kube-scheduler" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating daemonset "kube-proxy" to version "1.21.2"
kube-proxy skipped as DaemonSet was not found
```
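A sketch of the skip logic with client-go; the surrounding helper is illustrative, not the actual upgrade code:
```go
import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func updateKubeProxy(ctx context.Context, clientset *kubernetes.Clientset, version string) error {
	ds, err := clientset.AppsV1().DaemonSets("kube-system").Get(ctx, "kube-proxy", metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		// kube-proxy is allowed to be disabled, so a missing DaemonSet is not an error
		fmt.Println("kube-proxy skipped as DaemonSet was not found")

		return nil
	}

	if err != nil {
		return err
	}

	// ... patch ds.Spec.Template with the new image version (elided) ...
	_ = ds
	_ = version

	return nil
}
```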
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #3861
What this change effectively does is turn an immediate reconcile request
into an error return, so that the controller is restarted with a backoff
(sketched below).
More details:
* the root cause of the update/teardown conflict is that a finalizer is
still pending on the tearing-down resource
* the finalizer might not be removed immediately, e.g. if the controller
which put the finalizer is itself in a crash loop
* if the merge controller queues a reconcile immediately, it restarts
itself, but the finalizer is still there, so it once again goes into the
reconcile loop, and that goes on forever until the finalizer is removed;
instead, if the controller fails, it is restarted with exponential
backoff, lowering the load on the system
The change is validated with unit tests reproducing the conflict.
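A sketch of the pattern, assuming a COSI-style controller run loop; the `MergeController` type, the `reconcile` helper, and the exact `Run` signature are approximations, not the actual Talos code:
```go
import (
	"context"
	"fmt"

	"github.com/cosi-project/runtime/pkg/controller"
	"go.uber.org/zap"
)

type MergeController struct{}

// reconcile merges source resources into the final representation (elided).
func (ctrl *MergeController) reconcile(ctx context.Context, r controller.Runtime) error {
	return nil
}

func (ctrl *MergeController) Run(ctx context.Context, r controller.Runtime, logger *zap.Logger) error {
	for {
		select {
		case <-ctx.Done():
			return nil
		case <-r.EventCh():
		}

		if err := ctrl.reconcile(ctx, r); err != nil {
			// previously the controller queued an immediate reconcile and kept
			// spinning while the finalizer was still pending; returning the
			// error instead makes the runtime restart the controller with
			// exponential backoff, lowering the load on the system
			return fmt.Errorf("error reconciling resources: %w", err)
		}
	}
}
```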
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
With the recent changes, the bootstrap API might wait for the time to be in
sync (as apid is launched before time is synced). We set the timeout to
500ms for the bootstrap API call, so there's a chance that a call might
time out, and we should ignore it.
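A sketch of the client-side handling; the `bootstrapNode` callback stands in for the actual bootstrap API call and is purely illustrative:
```go
import (
	"context"
	"errors"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func tryBootstrap(ctx context.Context, bootstrapNode func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
	defer cancel()

	err := bootstrapNode(ctx)
	if errors.Is(err, context.DeadlineExceeded) || status.Code(err) == codes.DeadlineExceeded {
		// apid may still be waiting for time sync, so a timeout here is not fatal
		return nil
	}

	return err
}
```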
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes an endless block in the RemoteGenerator.Close method by
rewriting the RemoteGenerator using the retry package.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This commit also introduces a hidden `--json` flag for the `talosctl version`
command that is not supported and should be reworked in #907.
Refs #3852.
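A sketch of how such a flag can be hidden with cobra; the helper and variable names are illustrative:
```go
import "github.com/spf13/cobra"

func addJSONFlag(versionCmd *cobra.Command, jsonOutput *bool) {
	versionCmd.Flags().BoolVar(jsonOutput, "json", false, "output version information in JSON format")

	// the flag works, but stays hidden from --help until it is properly reworked
	versionCmd.Flags().MarkHidden("json") //nolint:errcheck
}
```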
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
Basically, all delay options are interlocked with `miimon`: if `miimon`
is zero, all delays are set to zero, and the kernel complains even if a zero
delay attribute is sent while `miimon` is zero.
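A sketch of the resulting normalization, using a hypothetical options map rather than the actual netlink attributes; the point is that delay attributes are only sent when `miimon` is non-zero:
```go
// buildBondOptions is a hypothetical helper showing the interlock:
// delay attributes are dropped entirely when miimon is disabled.
func buildBondOptions(miimon, upDelay, downDelay uint32) map[string]uint32 {
	options := map[string]uint32{}

	if miimon == 0 {
		// the kernel rejects delay attributes (even explicit zeroes)
		// while miimon is zero, so don't send them at all
		return options
	}

	options["miimon"] = miimon
	options["updelay"] = upDelay
	options["downdelay"] = downDelay

	return options
}
```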
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The sequence of events to reproduce the problem:
* some resource is merged into its final representation with ID `x`
* the underlying source resource gets destroyed
* the merge controller marks the final resource `x` for teardown and waits
for the finalizers to be empty
* another source resource appears which gets merged into the same final `x`
* as `x` is in the teardown phase, the spec controller ignores it
* the merge controller doesn't see the problem either, as the spec of `x` is
correct, but the phase is wrong (which the merge controller ignores)
This pulls in a COSI fix to return an error if a resource in the teardown
phase is modified. This way the merge controller knows that the resource `x`
is in the teardown phase, so it should first be fully torn down, and
then the new representation should be re-created as a new resource with the
same ID `x`.
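A sketch of the check the merge controller can now act on, assuming COSI resource metadata; the handling itself is simplified to an error return:
```go
import (
	"fmt"

	"github.com/cosi-project/runtime/pkg/resource"
)

func checkPhase(final resource.Resource) error {
	if final.Metadata().Phase() == resource.PhaseTearingDown {
		// `x` must be fully torn down (finalizers removed, resource destroyed)
		// before the new representation is re-created under the same ID;
		// modifying it in place would be silently ignored by the spec controller
		return fmt.Errorf("resource %s is being torn down", final.Metadata().ID())
	}

	return nil
}
```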
Regression unit tests are included (they don't always reproduce the
sequence of events reliably, but they do with 10% probability).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR makes sure we pin to a known CAPI version because, with the new
v0.4.x released, we'll fail until we support the v1alpha4 APIs.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
* `talosctl config new` now sets endpoints in the generated config.
* Avoid duplication of roles in metadata.
* Remove method name prefix handling. All methods should be set explicitly.
* Add tests.
Closes #3421.
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
This makes sure that apid can't access any resources other than the ones it
actually needs. This improves security in case of a container breach.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This change is for Theila, which is going to use a gRPC proxy to forward
requests from the TS frontend right to the node's apid.
The gRPC proxy operates on top of `grpc.ClientConn` objects, so getting
this connection from the clients which are already being created is the
easiest path.
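A sketch of how the exposed connection can be consumed by a gRPC proxy director; the accessor and director wiring here are illustrative, not the actual Theila code:
```go
import (
	"context"

	"google.golang.org/grpc"
)

// director picks the backend connection for each proxied call; here it simply
// reuses the connection already established by the Talos client.
func director(talosConn *grpc.ClientConn) func(ctx context.Context, fullMethodName string) (context.Context, *grpc.ClientConn, error) {
	return func(ctx context.Context, fullMethodName string) (context.Context, *grpc.ClientConn, error) {
		return ctx, talosConn, nil
	}
}
```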
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Bump dependencies, clean up go.mod files, and update for netaddr changes
(mostly around `netaddr.IPPrefix` now being an opaque struct with private fields).
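A sketch of the kind of change this requires, assuming the `inet.af/netaddr` API of that time: struct literals are replaced by constructors and accessors.
```go
import "inet.af/netaddr"

func example() {
	// before: netaddr.IPPrefix{IP: ip, Bits: 24} (fields are no longer exported)
	prefix := netaddr.IPPrefixFrom(netaddr.MustParseIP("172.20.0.1"), 24)

	// accessors replace direct field reads
	ip, bits := prefix.IP(), prefix.Bits()

	_ = ip
	_ = bits
}
```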
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This will allow keeping track of when the resource was created and
updated.
The update timestamp is tied to the version bump.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This PR updates our CI so that when we release Talos, a JSON file
containing our cloud images for AWS will be published as a release
asset.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR can be split into two parts:
* controllers
* apid binding into COSI world
Controllers
-----------
* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required on control plane nodes for now)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a
slightly different way for workers and control plane nodes: control plane
nodes generate them directly, while workers reach out to `trustd` on control
plane nodes via the `k8s.Endpoint` resource
apid Binding
------------
The `secrets.API` resource provides binding to protobuf by converting
itself back and forth to the protobuf spec.
apid no longer receives the machine configuration; instead it receives a
gRPC-backed socket to access the Resource API. apid watches the `secrets.API`
resource, fetches the certs and CA from it, and uses them in its TLS
configuration.
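A sketch of the apid side, assuming a watcher (the `certProvider` interface here is illustrative) that keeps the latest certificate fetched from the `secrets.API` resource:
```go
import "crypto/tls"

func serverTLSConfig(certProvider interface {
	Current() (*tls.Certificate, error)
}) *tls.Config {
	return &tls.Config{
		// always serve the most recently watched certificate, so a cert
		// rotation in the secrets.API resource doesn't require restarting apid
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			return certProvider.Current()
		},
	}
}
```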
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This isn't strictly required, but it should be backwards compatible with
Talos 0.10 (networkd).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
We need to be able to run an install with `docker run`. This checks if
we are running from Docker and skips overlay mount checks if we are, as
Docker creates a handful of overlay mounts by default that we can't work
around (not easily, at least).
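A sketch of one common way to detect running under Docker; the actual check in the installer may differ:
```go
import "os"

// runningInDocker reports whether the process appears to run inside a Docker
// container, based on the conventional /.dockerenv marker file.
func runningInDocker() bool {
	_, err := os.Stat("/.dockerenv")

	return err == nil
}
```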
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>