talos

mirror of https://github.com/siderolabs/talos.git synced 2025-12-19 08:21:13 +01:00

Author	SHA1	Message	Date
Seán C McCord	6af83afd5a	fix: handle multiple-IP cluster nodes Allow cluster nodes to have multiple internal IP addresses when checking for all Kubernetes nodes. Fixes #4807 Signed-off-by: Seán C McCord <ulexus@gmail.com>	2022-01-17 11:41:54 -05:00
Artem Chernyshev	2f2bdb26aa	feat: replace flags with --mode in `apply`, `edit` and `patch` commands Fixes: https://github.com/talos-systems/talos/issues/4588 Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-01-13 16:09:53 +03:00
Andrey Smirnov	2f4b9d8d6d	feat: make machine configuration read-only in Talos (almost) Talos shouldn't try to re-encode the machine config it was provided with. So add a `ReadonlyWrapper` around `v1alpha1.Config` which makes sure that the raw config object is not available anymore (it's a private field), but config accessors are available for read-only access. Another thing that `ReadonlyWrapper` does is that it preserves the original `[]byte` encoding of the config, keeping it exactly the same way as it was loaded from file or read over the network. Improved `talosctl edit mc` to preserve the config as it was submitted, and preserve the edits on error from Talos (previously edits were lost). `ReadonlyWrapper` is not used on the config generation path though - config there is represented by `v1alpha1.Config` and can be freely modified. Why almost? Some parts of Talos (platform code) patch the machine configuration with new data. We need to fix platforms to provide networking configuration in a different way, but this will come with other PRs later. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-28 20:12:55 +03:00
Andrey Smirnov	f49f40a336	fix: pass path to conformance retrieve results Sonobuoy once again changed the API in a way that breaks our tool. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-22 17:28:05 +03:00
Andrey Smirnov	dc9a0cfe94	chore: bump Go dependencies Bump all dependencies, update `grpc.WithInsecure()` which is deprecated now. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-20 23:05:32 +03:00
Andrey Smirnov	97ffa7a645	feat: upgrade kubelet version in `talosctl upgrade-k8s` Fixes #4656 As now changes to kubelet configuration can be applied without a reboot, `talosctl upgrade-k8s` can handle the kubelet upgrades as well. The gist is simply modifying machine config and waiting for `Node` version to be updated, rest of the code is required for reliability of the process. Also fixed a bug in the API while watching deleted items with tombstones. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-08 21:12:17 +03:00
Andrey Smirnov	753a82188f	refactor: move pkg/resources to machinery Fixes #4420 No functional changes, just moving packages around. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-11-15 19:50:35 +03:00
Alexey Palazhchenko	95105071de	chore: fix simple issues found by golangci-lint Avoid slice mutation with append. Simplify code. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-11-12 15:20:28 +00:00
Alexey Palazhchenko	8e8687d759	fix: use temporary sonobuoy version `replace` should be removed when v0.55.1+ is released. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-11-12 11:34:09 +00:00
Alexey Palazhchenko	d6147eb17d	chore: update sonobuoy See https://github.com/vmware-tanzu/sonobuoy/issues/1520. Closes #4516. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-11-11 14:53:54 +00:00
Artem Chernyshev	261c497c71	feat: implement `talosctl support` command Fixes: https://github.com/talos-systems/talos/issues/4406 Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2021-11-08 16:20:50 +03:00
Andrey Smirnov	ae5af9d3fa	feat: update Kubernetes to 1.23.0-alpha.3 See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1230-alpha3 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-10-22 14:59:41 +03:00
Artem Chernyshev	e3e2113adc	feat: upgrade CoreDNS during `upgrade-k8s` call Fixes: https://github.com/talos-systems/talos/issues/4065 Get all Talos generated manifests and apply them, wait for deployments to be updated and to become ready. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2021-10-13 15:47:06 +03:00
Andrey Smirnov	a1c9d64907	fix: update the way results are retrieved for certified conformance Looks like we bumped sonobuoy library, and it silently changed a lot of things in the way it works with the results. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-09-13 23:32:59 +03:00
Alexey Palazhchenko	d53e9e8963	chore: use named constants Just for consistency. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-09-07 12:13:48 +00:00
Alexey Palazhchenko	032e7c6b86	chore: import yaml.v3 consistently Do not use yaml.v2. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-08-26 11:36:50 +00:00
Artem Chernyshev	2b614e430e	feat: check if cluster has deprecated resources versions Fixes: https://github.com/talos-systems/talos/issues/4026 Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2021-08-18 23:26:36 +03:00
Alexey Palazhchenko	09d70b7eaf	feat: update Kubernetes to v1.22.0 Closes #3967. Closes #3997. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-08-06 09:06:32 -07:00
Andrey Smirnov	539f42090e	chore: bump dependencies via dependabot Fixes #3993 Fixes #3994 Fixes #3995 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-08-03 10:25:17 -07:00
Andrey Smirnov	0c7ce1cd81	feat: remove remnants of bootkube support Fixes #3951 Bootkube support was removed in Talos 0.9. Talos versions 0.9-0.11 support conversion of self-hosted bootkube-based control plane to the new style control plane running as static pods managed by Talos. This commit removes all backwards compatibility and removes conversion code. For the k8s controllers, `BootstrapStatus` is removed and a dependency on `etcd` service status is added (as it was implicitly there via `BootstrapStatus`). Remove control plane conversion code. In k8s upgrade code, remove self-hosted part. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-08-03 07:55:42 -07:00
Artem Chernyshev	70d2505b7c	fix: do not require ToVersion to be set when detecting version We do not know the upgrade version when checking components versions in Theila. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-07-21 08:51:26 -07:00
Artem Chernyshev	f8f1c83a75	feat: detect the lowest Kubernetes version in upgrade-k8s CLI command Scan all pods in `kube-system` and find `kube-proxy`, `kube-scheduler`, `kube-controller-manager` and `kube-apiserver` ones, then check the lowest version amongst them. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-07-19 08:24:04 -07:00
Artem Chernyshev	2e463348b2	fix: pass all logs through the options.Log method Looks like I've missed some 🤦 Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-07-15 08:32:48 -07:00
Artem Chernyshev	bf61c2cc4a	fix: write upgrade logs only to the LogOutput if it's defined No need to print them to stdout in that case. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-07-15 07:02:45 -07:00
Artem Chernyshev	23ef1d40af	chore: add ability to redirect talos upgrade module logs to io.Writer This is going to be useful in the third party code which is using upgrade modules, to collect output logs instead of printing them to the stdout. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-07-13 08:12:06 -07:00
Andrey Smirnov	10c28758a4	fix: ignore DeadlineExceeded error correctly on bootstrap The problem was that gRPC method `status.Code(err)` doesn't unwrap errors, while Talos client returns errors wrapped with `multierror.Error` and `fmt.Errorf`, so `status.Code` doesn't return error code correctly. Fix that by introducing our own client method which correctly goes over the chain of wrapped errors. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-07-07 12:02:26 -07:00
Andrey Smirnov	6d13d2cf92	fix: close Kubernetes API client The problem is that there's no official way to close the Kubernetes client's underlying TCP/HTTP connections. So each time Talos initializes a connection to the control plane endpoint, a new client is built, but this client is never closed, so the connection stays active on the load balancers, on the API server level, etc. It also eats some resources out of Talos itself. We add a way to close underlying connections by using a helper from the Kubernetes client libraries to force close all TCP connections, which should shut down all HTTP/2 connections as well. An alternative approach might be to cache a client for some time, but many of the clients are created with temporary PKI, so even a cached client still needs to be closed once it gets stale, and it's not clear how to recreate a client in case the existing one is broken for one reason or another (and we need to force a re-connection). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-07-05 14:25:26 -07:00
Andrey Smirnov	e883c12b31	fix: make output of `upgrade-k8s` command less scary This removes `retrying error` messages while waiting for the API server pod state to reflect changes from the updated static pod definition. Log more lines to notify about the progress. Skip `kube-proxy` if not found (as we allow it to be disabled).
```
$ talosctl upgrade-k8s -n 172.20.0.2 --from 1.21.0 --to 1.21.2
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
updating "kube-apiserver" to version "1.21.2"
 > "172.20.0.2": starting update
 > "172.20.0.2": machine configuration patched
 > "172.20.0.2": waiting for API server state pod update
 < "172.20.0.2": successfully updated
 > "172.20.0.3": starting update
 > "172.20.0.3": machine configuration patched
 > "172.20.0.3": waiting for API server state pod update
 < "172.20.0.3": successfully updated
 > "172.20.0.4": starting update
 > "172.20.0.4": machine configuration patched
 > "172.20.0.4": waiting for API server state pod update
 < "172.20.0.4": successfully updated
updating "kube-controller-manager" to version "1.21.2"
 > "172.20.0.2": starting update
 > "172.20.0.2": machine configuration patched
 > "172.20.0.2": waiting for API server state pod update
 < "172.20.0.2": successfully updated
 > "172.20.0.3": starting update
 > "172.20.0.3": machine configuration patched
 > "172.20.0.3": waiting for API server state pod update
 < "172.20.0.3": successfully updated
 > "172.20.0.4": starting update
 > "172.20.0.4": machine configuration patched
 > "172.20.0.4": waiting for API server state pod update
 < "172.20.0.4": successfully updated
updating "kube-scheduler" to version "1.21.2"
 > "172.20.0.2": starting update
 > "172.20.0.2": machine configuration patched
 > "172.20.0.2": waiting for API server state pod update
 < "172.20.0.2": successfully updated
 > "172.20.0.3": starting update
 > "172.20.0.3": machine configuration patched
 > "172.20.0.3": waiting for API server state pod update
 < "172.20.0.3": successfully updated
 > "172.20.0.4": starting update
 > "172.20.0.4": machine configuration patched
 > "172.20.0.4": waiting for API server state pod update
 < "172.20.0.4": successfully updated
updating daemonset "kube-proxy" to version "1.21.2"
kube-proxy skipped as DaemonSet was not found
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-07-01 06:54:36 -07:00
Andrey Smirnov	60d7360944	fix: ignore deadline exceeded errors on bootstrap With the recent changes, bootstrap API might wait for the time to be in sync (as the apid is launched before time is in sync). We set timeout to 500ms for the bootstrap API call, so there's a chance that a call might time out, and we should ignore it. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-06-30 06:59:36 -07:00
Andrey Smirnov	d8c2bca1b5	feat: reimplement apid certificate generation on top of COSI This PR can be split into two parts: * controllers * apid binding into COSI world Controllers ----------- * `k8s.EndpointController` provides control plane endpoints on worker nodes (it isn't required for now on control plane nodes) * `secrets.RootController` now provides OS top-level secrets (CA cert) and secret configuration * `secrets.APIController` generates API secrets (certificates) in a bit different way for workers and control plane nodes: controlplane nodes generate directly, while workers reach out to `trustd` on control plane nodes via `k8s.Endpoint` resource apid Binding ------------ Resource `secrets.API` provides binding to protobuf by converting itself back and forth to protobuf spec. apid no longer receives machine configuration, instead it receives gRPC-backed socket to access Resource API. apid watches `secrets.API` resource, fetches certs and CA from it and uses that in its TLS configuration. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-06-23 13:07:00 -07:00
Alexey Palazhchenko	06209bba28	chore: update RBAC rules, remove old APIs Refs #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-18 09:54:49 -07:00
Andrey Smirnov	9f24b519dc	chore: remove bootkube check from cluster health check We're no longer testing against Talos <= 0.8, so no reason to run this check (even if it's no-op). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-06-17 10:04:32 -07:00
Alexey Palazhchenko	f63ab9dd9b	feat: implement `talosctl config new` command Refs #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-17 09:06:43 -07:00
Andrey Smirnov	5811f4dda1	feat: implement link (interface) controllers The structure of the controllers is really similar to addresses and routes: * `LinkSpec` resource describes desired link state * `LinkConfig` controller generates `LinkSpecs` based on machine configuration and kernel cmdline * `LinkMerge` controller merges multiple configuration sources into a single `LinkSpec` paying attention to the config layer priority * `LinkSpec` controller applies the specs to the kernel state Controller `LinkStatus` (which was implemented before) watches the kernel state and publishes current link status. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-06-01 09:36:25 -07:00
Andrey Smirnov	0acb04ad7a	feat: implement route network controllers Route handling is very similar to addresses: * `RouteStatus` describes kernel routing table state, `RouteStatusController` reflects kernel state into resources * `RouteSpec` defines routes to be configured * `RouteConfigController` creates `RouteSpec`s based on cmdline and machine configuration * `RouteMergeController` merges different configuration layers into the final representation * `RouteSpecController` applies the specs to the kernel routing table Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-05-25 11:09:21 -07:00
Andrey Smirnov	e7a9164b1e	test: implement `talosctl conformance` command to run e2e tests Command implements two modes: * `fast`: conformance suite is run at maximum speed * `certified`: conformance suite is run in serial mode, results are captured to produce artifacts ready for CNCF submission process Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-16 09:17:51 -07:00
Andrey Smirnov	d24df8f844	chore: re-import talos-systems/os-runtime as cosi-project/runtime No changes, just import path change (as project got moved). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-12 07:44:24 -07:00
Andrey Smirnov	e0650218a6	feat: support etcd recovery from snapshot on bootstrap When Talos `controlplane` node is waiting for a bootstrap, `etcd` contents can be recovered from a snapshot created with `talosctl etcd snapshot` on a healthy cluster. Bootstrap process goes the same way as before, but the etcd data directory is recovered from the snapshot. This flow enables disaster recovery for the control plane: given that periodic backups are available, destroy control plane nodes, re-create them with the same config, and bootstrap one node with the saved snapshot to recover etcd state at the time of the snapshot. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-08 10:15:37 -07:00
Andrey Smirnov	a1e6415403	fix: retry Kubernetes API errors on cordon/uncordon/etc This extracts function which was used in upgrade/convert flows to retry transient errors to the main `kubernetes` package, expands it to ignore timeout errors, and it is now used to retry errors where applicable in `pkg/kubernetes`. Fixes #3403 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-02 03:51:40 -07:00
Andrey Smirnov	e039172eda	fix: ignore EOF errors from Kubernetes API when converting control plane During the conversion process, API server goes down, so we can see lots of network errors including EOF. Fixes #3404 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-01 10:52:44 -07:00
Andrey Smirnov	672c970739	fix: allow `convert-k8s --remove-initialized-keys` when K8s cp is down The command `--remove-initialized-key` is the last resort to convert control plane when control plane is down for whatever reason, so it should work when control plane is not available. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-25 14:06:08 -07:00
Alexey Palazhchenko	fb605a0fc5	chore: tweak nolintlint settings Copy from kres manually for now. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-25 13:56:16 -07:00
Alexey Palazhchenko	1f5a0c4065	fix: resolve the issue with Kubernetes upgrade Add missing cases, refactoring. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-25 12:48:28 -07:00
Andrey Smirnov	125b86f4ef	fix: upgrade-k8s bug with empty config values and provision script First, if the config for some component image (e.g. `apiServer`) is empty, Talos pushes a default image which is unknown to the script, so verify that the change is not a no-op, as otherwise the script will hang forever waiting for the k8s control plane config update. Second, with bootkube bootstrap it was fine to omit an explicit Kubernetes version in the upgrade test, but with a Talos-managed control plane that means that after a Talos upgrade Kubernetes gets upgraded as well (as Talos config doesn't contain the K8s version, and defaults are used). This is not what we want to test actually. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-19 12:05:31 -07:00
Alexey Palazhchenko	7662d033bf	fix: talosctl health should not check kube-proxy when it is disabled Fixes #3299. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-16 13:21:36 -07:00
Artem Chernyshev	22f375300c	chore: update golangci-lint to 1.38.0 Fix all discovered issues. Detected a couple of bugs, fixed them as well. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-03-12 06:50:02 -08:00
Andrey Smirnov	6f7df3da1e	fix: update output of `convert-k8s` command This includes Sean's comments from #3278 and introduces a new flag which is referenced in manual conversion process document. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-12 02:21:01 -08:00
Andrey Smirnov	81acadf345	fix: ignore connection refused errors when updating/converting cp Without loadbalancer, when api-server goes down, there will be connection refused errors which should be retried. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-05 06:59:06 -08:00
Alexey Palazhchenko	df52c13581	chore: fix //nolint directives That's the recommended syntax: https://golangci-lint.run/usage/false-positives/ Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-05 05:58:33 -08:00
Andrey Smirnov	60aa011c7a	feat: rename namespaces, resources, types etc See https://github.com/talos-systems/os-runtime/pull/12 for new naming conventions. No functional changes. Additionally implements printing extra columns in `talosctl get xyz`. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-02 13:34:15 -08:00


79 Commits