Add a new health check which verifies that the etcd members match the control plane nodes. Closes siderolabs#5553.
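A minimal sketch of the idea behind the check (the names and matching key are illustrative, not the actual Talos health-check code; it assumes both sides are identified by the same node names):

```
package check

import (
	"errors"
	"fmt"
	"strings"
)

// checkEtcdMembersMatchControlPlane reports a mismatch between the etcd
// member list and the set of control plane nodes, in either direction.
func checkEtcdMembersMatchControlPlane(members, controlPlaneNodes []string) error {
	memberSet := toSet(members)
	nodeSet := toSet(controlPlaneNodes)

	var problems []string

	for member := range memberSet {
		if _, ok := nodeSet[member]; !ok {
			problems = append(problems, fmt.Sprintf("etcd member %q has no matching control plane node", member))
		}
	}

	for node := range nodeSet {
		if _, ok := memberSet[node]; !ok {
			problems = append(problems, fmt.Sprintf("control plane node %q is not an etcd member", node))
		}
	}

	if len(problems) > 0 {
		return errors.New(strings.Join(problems, "; "))
	}

	return nil
}

func toSet(items []string) map[string]struct{} {
	set := make(map[string]struct{}, len(items))

	for _, item := range items {
		set[item] = struct{}{}
	}

	return set
}
```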
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This fixes an error where an integration test becomes stuck with a message
like:
```
waiting for coredns to report ready: some pods are not ready: [coredns-868c687b7-g2z64]
```
After some random sequence of node restarts, one of the pods might become
"stuck" in the `Completed` state (as shown in `kubectl get pods`),
blocking the check, as the pod will never become ready.
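A hedged sketch of the fix's idea (illustrative, not the exact check code): pods that have run to completion are skipped before the readiness check, since they will never report ready again.

```
package check

import corev1 "k8s.io/api/core/v1"

// notReadyPods returns the names of pods that should block the check;
// pods in the Succeeded phase (shown as "Completed" by kubectl) are
// skipped because they will never become ready again.
func notReadyPods(pods []corev1.Pod) []string {
	var notReady []string

	for _, pod := range pods {
		if pod.Status.Phase == corev1.PodSucceeded {
			continue
		}

		ready := false

		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
				ready = true

				break
			}
		}

		if !ready {
			notReady = append(notReady, pod.Name)
		}
	}

	return notReady
}
```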
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This fixes an issue where `talosctl upgrade-k8s` fails with an unhelpful
message if the version is specified as `v1.23.5` instead of `1.23.5`.
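A minimal sketch of the normalization (the helper name is hypothetical):

```
import "strings"

// normalizeVersion accepts both "v1.23.5" and "1.23.5" by stripping an
// optional leading "v" before the version is validated further.
func normalizeVersion(version string) string {
	return strings.TrimPrefix(version, "v")
}
```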
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Having polymorphic (spec type depends on ID) resources is not a good
idea, and it's not compatible with protobuf encoding.
Introduce new resources for each polymorphic sub-spec using the new Go 1.18
generic `typed.Resource` to reduce the boilerplate code.
(This still needs proper deepcopy-gen, but I'm skipping it for now, as
`K8sControlPlane` also had a broken deep copy.)
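An illustrative sketch of how generics remove that boilerplate; the type and method names below are hypothetical and do not reproduce the actual COSI `typed.Resource` API:

```
// Metadata is a stand-in for resource metadata (namespace, type, ID).
type Metadata struct {
	Namespace string
	Type      string
	ID        string
}

// Resource is a generic wrapper: one implementation of the Metadata/Spec
// accessors serves every concrete spec type.
type Resource[T any] struct {
	md   Metadata
	spec T
}

func (r *Resource[T]) Metadata() *Metadata { return &r.md }
func (r *Resource[T]) Spec() *T            { return &r.spec }

// ExampleConfigSpec stands in for one of the former polymorphic sub-specs;
// each sub-spec becomes its own strongly-typed resource.
type ExampleConfigSpec struct {
	Enabled bool
}

type ExampleConfig = Resource[ExampleConfigSpec]
```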
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The containerd CRI plugin was merged into the main repo, but we were using
the old import path, so the constants coming from that module were outdated.
This fixes the image version for the pause container.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Use the last `:` in the image reference.
Handle the case when no version was discovered.
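A hedged sketch of the parsing logic (the helper name is illustrative):

```
import "strings"

// imageVersion splits an image reference on the last ':' so that a registry
// port (e.g. "registry:5000/pause" without a tag) is not mistaken for a
// version; the boolean reports whether a version was discovered at all.
func imageVersion(imageRef string) (string, bool) {
	idx := strings.LastIndex(imageRef, ":")
	if idx < 0 || strings.Contains(imageRef[idx+1:], "/") {
		return "", false
	}

	return imageRef[idx+1:], true
}
```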
See https://github.com/siderolabs/theila/issues/138
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This showed up frequently in recent integration-provision tests
(it might be related to the Kubernetes upgrade), but in any case such errors
should be retried.
Refactored the function to extract the retryable part.
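A generic sketch of the shape of that refactoring (the real code uses the project's retry helpers):

```
import "time"

// withRetries wraps the extracted retryable part in a simple retry loop.
func withRetries(attempts int, delay time.Duration, retryable func() error) error {
	var err error

	for i := 0; i < attempts; i++ {
		if err = retryable(); err == nil {
			return nil
		}

		time.Sleep(delay)
	}

	return err
}
```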
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Some failures can be fixed by updating the machine configuration.
Now `userDisks` and `userFiles` failures no longer make Talos enter a reboot
loop; instead, Talos pauses for 35 minutes.
Additionally, `apid` and `machined` are now started right after
containerd is up and running.
That makes it possible for the operator to connect to the node using
talosctl and fix the config.
Fixes: https://github.com/talos-systems/talos/issues/4669
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
With graceful kubelet shutdown (#5108), after a graceful node restart pods
on the restarted node might stay in the `Terminated` status, which breaks
the pod readiness check.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Allow cluster nodes to have multiple internal IP addresses when checking
for all Kubernetes nodes.
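A minimal sketch of the relaxed check using client-go node objects (the helper is illustrative, not the exact check code): collect every `InternalIP` address instead of assuming a single one.

```
import corev1 "k8s.io/api/core/v1"

// internalIPs returns all internal IP addresses reported by a node.
func internalIPs(node *corev1.Node) []string {
	var ips []string

	for _, addr := range node.Status.Addresses {
		if addr.Type == corev1.NodeInternalIP {
			ips = append(ips, addr.Address)
		}
	}

	return ips
}
```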
Fixes #4807
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Talos shouldn't try to re-encode the machine config it was provided
with.
So add a `ReadonlyWrapper` around `*v1alpha1.Config` which makes sure
that the raw config object is no longer available (it's a private field),
while config accessors remain available for read-only access.
`ReadonlyWrapper` also preserves the original `[]byte` encoding of the
config, keeping it exactly as it was loaded from a file or read over the
network.
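A generic sketch of the wrapper idea (the field and method names here are illustrative, not the actual Talos types):

```
// rawConfig stands in for the parsed machine configuration.
type rawConfig struct {
	MachineType string
}

// ReadonlyWrapper hides the raw config object behind read-only accessors
// while preserving the exact bytes the config was loaded from.
type ReadonlyWrapper struct {
	cfg rawConfig // private: callers can no longer mutate or re-encode it
	src []byte    // original encoding, byte-for-byte as read from disk or the network
}

func NewReadonlyWrapper(src []byte, cfg rawConfig) *ReadonlyWrapper {
	return &ReadonlyWrapper{cfg: cfg, src: src}
}

// Bytes returns the configuration exactly as it was provided to Talos.
func (w *ReadonlyWrapper) Bytes() []byte {
	return append([]byte(nil), w.src...)
}

// MachineType is an example read-only accessor.
func (w *ReadonlyWrapper) MachineType() string {
	return w.cfg.MachineType
}
```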
Improved `talosctl edit mc` to preserve the config as it was submitted,
and preserve the edits on error from Talos (previously edits were lost).
`ReadonlyWrapper` is not used on the config generation path though: there
the config is represented by `*v1alpha1.Config` and can be freely modified.
With this change, the config provided to Talos is preserved almost exactly.
Why almost? Some parts of Talos (platform code) patch the machine
configuration with new data. We need to fix the platforms to provide
networking configuration in a different way, but this will come in
other PRs later.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4656
As changes to the kubelet configuration can now be applied without a reboot,
`talosctl upgrade-k8s` can handle kubelet upgrades as well.
The gist is simply modifying the machine config and waiting for the `Node`
version to be updated; the rest of the code is required for the reliability
of the process.
Also fixed a bug in the API while watching deleted items with
tombstones.
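A hedged sketch of the "wait for the `Node` version" step using client-go (the function name and polling interval are illustrative, not the exact Talos code):

```
package upgrade

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForKubeletVersion polls the Node object until the kubelet reports the
// expected version; transient API errors are simply retried.
func waitForKubeletVersion(ctx context.Context, clientset kubernetes.Interface, nodeName, version string) error {
	return wait.PollImmediateUntil(5*time.Second, func() (bool, error) {
		node, err := clientset.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
		if err != nil {
			return false, nil // transient API errors are retried
		}

		return node.Status.NodeInfo.KubeletVersion == "v"+version, nil
	}, ctx.Done())
}
```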
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes: https://github.com/talos-systems/talos/issues/4065
Get all Talos-generated manifests and apply them, then wait for the
deployments to be updated and become ready.
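A small sketch of the readiness condition used for such a wait (illustrative, not the exact check):

```
import appsv1 "k8s.io/api/apps/v1"

// deploymentReady reports whether a Deployment has observed its latest
// generation and all replicas are updated and ready.
func deploymentReady(d *appsv1.Deployment) bool {
	if d.Spec.Replicas == nil {
		return d.Status.ReadyReplicas > 0
	}

	return d.Status.ObservedGeneration >= d.Generation &&
		d.Status.UpdatedReplicas == *d.Spec.Replicas &&
		d.Status.ReadyReplicas == *d.Spec.Replicas
}
```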
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Looks like we bumped the sonobuoy library, and it silently changed a lot of
things in the way it works with results.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #3951
Bootkube support was removed in Talos 0.9. Talos versions 0.9-0.11
support conversion of the self-hosted bootkube-based control plane to the
new-style control plane running as static pods managed by Talos.
This commit removes all backwards compatibility and the conversion
code.
For the k8s controllers, `BootstrapStatus` is removed and a dependency
on `etcd` service status is added (as it was implicitly there via
`BootstrapStatus`).
Remove the control plane conversion code.
In the k8s upgrade code, remove the self-hosted part.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Scan all pods in `kube-system`, find the `kube-proxy`, `kube-scheduler`,
`kube-controller-manager` and `kube-apiserver` ones, then pick the
lowest version amongst them.
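A hedged sketch of the detection logic (container names and the version comparison are assumptions for illustration; the real code may differ):

```
package upgrade

import (
	"context"
	"fmt"
	"strings"

	"golang.org/x/mod/semver"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// lowestControlPlaneVersion finds the control plane component pods in
// kube-system and returns the lowest version among their image tags.
func lowestControlPlaneVersion(ctx context.Context, clientset kubernetes.Interface) (string, error) {
	components := map[string]struct{}{
		"kube-apiserver":          {},
		"kube-controller-manager": {},
		"kube-scheduler":          {},
		"kube-proxy":              {},
	}

	pods, err := clientset.CoreV1().Pods("kube-system").List(ctx, metav1.ListOptions{})
	if err != nil {
		return "", err
	}

	lowest := ""

	for _, pod := range pods.Items {
		for _, container := range pod.Spec.Containers {
			if _, ok := components[container.Name]; !ok {
				continue
			}

			idx := strings.LastIndex(container.Image, ":")
			if idx < 0 {
				continue
			}

			version := strings.TrimPrefix(container.Image[idx+1:], "v")

			if lowest == "" || semver.Compare("v"+version, "v"+lowest) < 0 {
				lowest = version
			}
		}
	}

	if lowest == "" {
		return "", fmt.Errorf("no control plane pods found in kube-system")
	}

	return lowest, nil
}
```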
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This is going to be useful in third-party code which uses the
upgrade modules, to collect output logs instead of printing them to
stdout.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
The problem was that the gRPC method `status.Code(err)` doesn't unwrap
errors, while the Talos client returns errors wrapped with
`multierror.Error` and `fmt.Errorf`, so `status.Code` doesn't return the
error code correctly.
Fix that by introducing our own client method which correctly goes over
the chain of wrapped errors.
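A hedged sketch of such a helper (the real client method may handle multi-errors more thoroughly): walk the chain of wrapped errors until one of them carries a gRPC status.

```
package client

import (
	"errors"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// StatusCode unwraps the error chain until a gRPC status is found,
// falling back to codes.Unknown.
func StatusCode(err error) codes.Code {
	if err == nil {
		return codes.OK
	}

	for err != nil {
		if s, ok := status.FromError(err); ok {
			return s.Code()
		}

		err = errors.Unwrap(err)
	}

	return codes.Unknown
}
```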
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The problem is that there's no official way to close the Kubernetes client's
underlying TCP/HTTP connections. So each time Talos initializes a
connection to the control plane endpoint, a new client is built, but this
client is never closed, so the connection stays active on the load
balancers, at the API server level, etc. It also consumes some resources on
the Talos side.
We add a way to close the underlying connections by using a helper from the
Kubernetes client libraries to force-close all TCP connections, which
should shut down all HTTP/2 connections as well.
An alternative approach might be to cache a client for some time, but many
of the clients are created with temporary PKI, so even a cached client
still needs to be closed once it gets stale, and it's not clear how to
recreate a client in case the existing one is broken for one reason or
another (and we need to force a re-connection).
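A hedged sketch of the underlying idea (the actual helper from the Kubernetes client libraries may differ): ask the client's transport to close its idle connections via the standard `CloseIdleConnections` hook.

```
import "net/http"

// closeIdleConnections force-closes idle connections held by the client's
// transport; *http.Transport implements this hook, which tears down idle
// HTTP/1.1 and HTTP/2 connections kept open to the control plane endpoint.
func closeIdleConnections(rt http.RoundTripper) {
	type closeIdler interface {
		CloseIdleConnections()
	}

	if t, ok := rt.(closeIdler); ok {
		t.CloseIdleConnections()
	}
}
```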
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This removes `retrying error` messages while waiting for the API server
pod state to reflect changes from the updated static pod definition.
Log more lines to notify about the progress.
Skip `kube-proxy` if not found (as we allow it to be disabled).
```
$ talosctl upgrade-k8s -n 172.20.0.2 --from 1.21.0 --to 1.21.2
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
updating "kube-apiserver" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating "kube-controller-manager" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating "kube-scheduler" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating daemonset "kube-proxy" to version "1.21.2"
kube-proxy skipped as DaemonSet was not found
```
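A minimal sketch of the kube-proxy skip shown above (illustrative, not the exact upgrade code):

```
import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func updateKubeProxy(ctx context.Context, clientset kubernetes.Interface, version string) error {
	ds, err := clientset.AppsV1().DaemonSets("kube-system").Get(ctx, "kube-proxy", metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		// kube-proxy may be disabled, so a missing DaemonSet is not an error.
		fmt.Println("kube-proxy skipped as DaemonSet was not found")

		return nil
	}

	if err != nil {
		return err
	}

	// ...otherwise patch ds.Spec.Template with the new image version.
	_ = ds

	return nil
}
```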
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
With the recent changes, the bootstrap API might wait for the time to be in
sync (as apid is launched before time is in sync). We set a 500ms timeout
for the bootstrap API call, so there's a chance that a call might
time out, and we should ignore it.
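An illustrative sketch of the tolerated-timeout handling (not the exact test code): the bootstrap call is wrapped in a 500ms timeout and a deadline-exceeded result is treated as non-fatal.

```
import (
	"context"
	"errors"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// tryBootstrap ignores timeouts of the bootstrap call, since apid may
// still be waiting for time sync; the call is retried later.
func tryBootstrap(ctx context.Context, bootstrap func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
	defer cancel()

	err := bootstrap(ctx)
	if err == nil {
		return nil
	}

	if errors.Is(err, context.DeadlineExceeded) || status.Code(err) == codes.DeadlineExceeded {
		return nil
	}

	return err
}
```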
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR can be split into two parts:
* controllers
* apid binding into COSI world
Controllers
-----------
* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a slightly
different way for workers and control plane nodes: control plane nodes
generate them directly, while workers reach out to `trustd` on the control
plane nodes via the `k8s.Endpoint` resource
apid Binding
------------
The `secrets.API` resource provides the binding to protobuf by converting
itself back and forth to the protobuf spec.
apid no longer receives the machine configuration; instead it receives a
gRPC-backed socket to access the Resource API. apid watches the `secrets.API`
resource, fetches the certs and CA from it, and uses them in its TLS
configuration.
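A hedged sketch of the serving-side pattern (not the actual apid code): the latest certificate obtained from the watched `secrets.API` resource is kept in an atomic value and handed out via `tls.Config.GetCertificate`, so certificate rotation needs no restart.

```
import (
	"crypto/tls"
	"errors"
	"sync/atomic"
)

// certProvider holds the most recent certificate delivered by the watch.
type certProvider struct {
	cert atomic.Value // stores *tls.Certificate
}

// Update is called whenever the watched resource delivers new certs.
func (p *certProvider) Update(certPEM, keyPEM []byte) error {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return err
	}

	p.cert.Store(&cert)

	return nil
}

// TLSConfig serves whatever certificate was stored last.
func (p *certProvider) TLSConfig() *tls.Config {
	return &tls.Config{
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			cert, ok := p.cert.Load().(*tls.Certificate)
			if !ok {
				return nil, errors.New("certificate not received yet")
			}

			return cert, nil
		},
	}
}
```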
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
We're no longer testing against Talos <= 0.8, so there's no reason to
run this check (even if it's a no-op).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>