talos

mirror of https://github.com/siderolabs/talos.git synced 2025-10-17 02:21:13 +02:00

Author	SHA1	Message	Date
Andrey Smirnov	0acb04ad7a	feat: implement route network controllers Route handling is very similar to addresses: * `RouteStatus` describes kernel routing table state, `RouteStatusController` reflects kernel state into resources * `RouteSpec` defines routes to be configured * `RouteConfigController` creates `RouteSpec`s based on cmdline and machine configuration * `RouteMergeController` merges different configuration layers into the final representation * `RouteSpecController` applies the specs to the kernel routing table Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-05-25 11:09:21 -07:00
Alexey Palazhchenko	4fe6912143	test: better `talosctl ls` tests Refs #3018. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-05-20 03:29:21 -07:00
Andrey Smirnov	76e38b7b82	feat: update Kubernetes to 1.21.1 See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-05-13 08:05:08 -07:00
Andrey Smirnov	0f49722d0f	feat: add `--config-patch` flag by node type The problem is that some patches can't be applied to join config, as some nodes don't even exist in the config, for example `/cluster/apiServer` node, and applying such patches doesn't make any sense. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-27 11:55:03 -07:00
Andrey Smirnov	daf2208749	test: update upgrade tests to 0.10 release In preparation for going 0.10 beta, start testing upgrades to 0.10, drop 0.8 and self-hosted control plane handling in the tests. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-09 12:57:04 -07:00
Alexey Palazhchenko	1fcf38f9d6	feat: add support for "none" CNI type Closes #3411. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-04-09 12:53:00 -07:00
Alexey Palazhchenko	37a5edf04a	feat: update Kubernetes to 1.21.0 release See CHANGELOG: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md Closes #3329. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-04-09 20:08:20 +03:00
Alexey Palazhchenko	29da22d063	feat: add config validation warnings Closes #3412. Refs #3413. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-04-08 13:49:58 -07:00
Andrey Smirnov	e0650218a6	feat: support etcd recovery from snapshot on bootstrap When Talos `controlplane` node is waiting for a bootstrap, `etcd` contents can be recovered from a snapshot created with `talosctl etcd snapshot` on a healthy cluster. Bootstrap process goes same way as before, but the etcd data directory is recovered from the snapshot. This flow enables disaster recovery for the control plane: given that periodic backups are available, destroy control plane nodes, re-create them with the same config, and bootstrap one node with the saved snapshot to recover etcd state at the time of the snapshot. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-08 10:15:37 -07:00
Artem Chernyshev	39c6dbcc7a	feat: add --config-patch parameter to talosctl gen config Fixes: https://github.com/talos-systems/talos/issues/3410 Same as in `talosctl cluster create`. Will apply RFC6902 json patch during the config generation if specified. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-04-02 10:56:41 -07:00
Andrey Smirnov	e664362cec	feat: add API and command to save etcd snapshot (backup) This adds a simple API and `talosctl etcd snapshot` command to stream snapshot of etcd from one of the control plane nodes to the local file. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-02 09:20:16 -07:00
Andrey Smirnov	abc2e17ebb	test: update 0.9.x version in upgrade tests to 0.9.1 Version 0.9.1 contains a fix for concurrent map write on unmount which was frequently breaking our upgrade tests. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-02 03:59:36 -07:00
Andrey Smirnov	7d91258475	test: fix data race in apply config tests Variable `chanErr` was read before waiting for the goroutine to finish. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-31 10:46:50 -07:00
Andrey Smirnov	204caf8eb9	test: fix apply-config integration test, bump clusterctl version Tests for ApplyConfig API were relying on not really supported behavior of modifying config via the `Provider` interface (and it was "fixed" in another PR which cleans up such access to the configuration). Cluster version bumped to try to workaround strange CAPI bootstrap failures in e2e-capi. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-31 09:55:53 -07:00
Alexey Palazhchenko	a9451f5712	feat: update Kubernetes to 1.21.0-beta.1 See CHANGELOG: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md Refs #3329. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-30 03:07:03 -07:00
Andrey Smirnov	2ea20f598a	feat: replace timed with time sync controller This is a complete rewrite of time sync process. Now the time sync process starts early at boot time, and it adapts to configuration changes: * before config is available, `pool.ntp.org` is used * once config is available, configured time servers are used Controller updates same time sync resource as other controllers had dependency on, so they have a chance to wait for the time sync event. Talos services which depend on time now wait on same resource instead of waiting on timed health. New features: * time sync now sticks to the particular time server unless there's an error from that server, and server is changed in that case, this improves time sync accuracy * time sync acts on config changes immediately, so it's possible to reconfigure time sync at any time * there's a new 'epoch' field in time sync resources which allows time-dependent controllers to regenerate certs when there's a big enough jump in time Features to implement later: * apid shouldn't depend on timed, it should be started early and it should regenerate certs on time jump * trustd should be updated in same way Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-29 09:29:43 -07:00
Alexey Palazhchenko	d7e9f6d6a8	chore: build integration tests with -race Refs https://github.com/talos-systems/talos/issues/3378. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-26 10:08:12 -07:00
Alexey Palazhchenko	ed272e604e	feat: update Kubernetes to 1.21.0-beta.0 See CHANGELOG: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md Refs #3329. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-24 07:36:54 -07:00
Andrey Smirnov	b0209fd29d	refactor: move networkd, timed APIs to machined, remove routerd This moves implementation of the user-facing APIs to the machined, and as now all the APIs are implemented by machined, remove routerd and adjust apid to proxy to machined. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-24 00:00:28 -07:00
Artem Chernyshev	6ffabe5169	feat: add ability to find disk by disk properties Fixes: https://github.com/talos-systems/talos/issues/3323 Not exactly matching with udevd generated `by-<id>` symlinks, but should provide sufficient amount of property selectors to be able to pick specific disks for any kind of disk: sd card, hdd, ssd, nvme. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-03-23 14:23:02 -07:00
Andrey Smirnov	ac8764702f	refactor: move apid, routerd, timed and trustd to single executable This removes container images for the aforementioned services, they are now built into `machined` executable which launches one or another service based on `argv[0]`. Containers are started with rootfs directory which contains only a single executable file for the service. This creates rootfs on squashfs for each container in `/opt/<container>`. Service `networkd` is not touched as it's handled in #3350. This removes all the image imports, snapshots and other things which were associated with the existing way to run containers. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-23 09:48:11 -07:00
Andrey Smirnov	125b86f4ef	fix: upgrade-k8s bug with empty config values and provision script First, if the config for some component image (e.g. `apiServer`) is empty, Talos pushes default image which is unknown to the script, so verify that change is not no-op, as otherwise script will hang forever waiting for k8s control plane config update. Second, with bootkube bootstrap it was fine to omit explicit kubernetes version in upgrade test, but with Talos-managed that means that after Talos upgrade Kubernetes gets upgraded as well (as Talos config doesn't contain K8s version, and defaults are used). This is not what we want to test actually. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-19 12:05:31 -07:00
Andrey Smirnov	f0512dfce9	feat: update Kubernetes to 1.20.5 See CHANGELOG: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1204 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-19 03:14:46 -07:00
Andrey Smirnov	ca8a5596c7	chore: fix provision tests after changes to build-container CNI was removed from build-container which works fine for `talosctl cluster create` clusters as it installs its own CNI, but fails for upgrade tests as they were never updated for the CNI bundle. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-12 09:59:15 -08:00
Artem Chernyshev	22f375300c	chore: update golangci-lint to 1.38.0 Fix all discovered issues. Detected a couple of bugs, fixed them as well. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-03-12 06:50:02 -08:00
Alexey Palazhchenko	df52c13581	chore: fix //nolint directives That's the recommended syntax: https://golangci-lint.run/usage/false-positives/ Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-05 05:58:33 -08:00
Andrey Smirnov	7e8f13652c	chore: fix upgrade tests by bumping 0.9 to alpha.5 Resources/types were renamed after alpha.4, so we need Talos API to match expectations of the upgrade test built against master. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-03 13:53:06 -08:00
Andrey Smirnov	60aa011c7a	feat: rename namespaces, resources, types etc See https://github.com/talos-systems/os-runtime/pull/12 for new naming conventions. No functional changes. Additionally implements printing extra columns in `talosctl get xyz`. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-02 13:34:15 -08:00
Andrey Smirnov	1d8ed9b5cd	chore: update provision/upgrade tests to 0.9.0-alpha.3 This drops support for 0.7.x in upgrade tests, and bumps tests to use version 0.9.0-alpha.3 as the next stable (it will eventually graduate to 0.9.0). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-02 07:11:16 -08:00
Andrey Smirnov	31e56e63db	fix: update in-cluster kubeconfig validity to match other certs Talos generates in-cluster kubeconfig for the kube-scheduler and kube-controller-manager to authenticate to kube-apiserver. Bug was that validity of that kubeconfig was set to 24h by mistake. Fix that by bumping validity to default for other Kubernetes certs (1 year). Add a certificate refresh at 50% of the validity. Fix bugs with copying secret resources which was leading to updates not being propagated correctly. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-01 11:16:04 -08:00
Andrey Smirnov	c7ee239087	fix: show stopped/exited containers via CRI inspector This fixes output of `talosctl containers` to show failed/exited containers so that it's possible to see e.g. `kube-apiserver` container when it fails to start. This also enables using ID from the container list to see logs of failing containers, so it's easy to debug issues when control plane pods don't start because of wrong configuration. Also remove option to use either CRI or containerd inspector, default to containerd for system namespace and to CRI for kubernetes namespace. The only side effect is that we can't see `kubelet` container in the output of `talosctl containers -k`, but `kubelet` itself is available in `talosctl services` and `talosctl logs kubelet`. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-26 14:45:13 -08:00
Artem Chernyshev	041620c852	feat: implement talosctl edit and patch config commands Fixes: https://github.com/talos-systems/talos/issues/3209 Using parts of `kubectl` package to run the editor. Also using the same approach as in `kubectl edit` command: - add commented section to the top of the file with the description. - if the config has errors, display validation errors in the commented section at the top of the file. - retry apply config until it succeeds. - abort if no changes were detected or if the edited file is empty. Patch currently supports jsonpatch only and can read it either from the file or from the inline argument. https://asciinema.org/a/wPawpctjoCFbJZKo2z2ATDXeC Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-02-26 02:00:20 +03:00
Artem Chernyshev	7108bb3f5b	test: upgrade master to master tests Verify upgrade flow using the same version of the installer. Run that with disk encryption enabled. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-02-24 07:56:44 -08:00
Andrey Smirnov	e2f1fbcfdb	feat: support control plane upgrades with Talos managed control plane Upgrade is performed by updating node configuration (node by node, service by service), watching internal resource state to get new configuration version and verifying that pod with matching version successfully propagated to the API server state and pod is ready. Process is similar to the rolling update of the DaemonSet. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-20 11:57:32 -08:00
Artem Chernyshev	06b8c09484	test: enable disk encryption key rotation test Verify that disk encryption sync operations work properly. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-02-20 06:17:55 -08:00
Andrey Smirnov	e9fc54f6e3	feat: update Kubernetes to 1.20.3 https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1202 Also update pkgs for: * talos-systems/pkgs#238 (raspberrypi-firmware update) * talos-systems/pkgs#242 (Linux 5.10.17 + init_on_free=0) Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-19 05:22:34 -08:00
Andrey Smirnov	32d2588528	test: update integration tests to use wrapped client for etcd APIs This continues the fix from #3167. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-18 08:08:48 -08:00
Andrey Smirnov	7751920dba	feat: add a tool and package to convert self-hosted CP to static pods This is required to upgrade from Talos 0.8.x to 0.9.x. After the cluster is fully upgraded, control plane is still self-hosted (as it was bootstrapped with bootkube). Tool `talosctl convert-k8s` (and library behind it) performs the upgrade to self-hosted version. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-17 23:26:57 -08:00
Artem Chernyshev	58ff2c9808	feat: implement ephemeral partition encryption This PR introduces the first part of disk encryption support. New config section `systemDiskEncryption` was added into MachineConfig. For now it contains only Ephemeral partition encryption. Encryption itself supports two kinds of keys for now: - node id deterministic key. - static key which is hardcoded in the config and mainly used for test purposes. Talosctl cluster create can now be told to encrypt ephemeral partition by using `--encrypt-ephemeral` flag. Additionally: - updated pkgs library version. - changed Dockerfile to copy cryptsetup deps from pkgs. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-02-17 13:39:04 -08:00
Andrey Smirnov	cc83b83808	feat: rename apply-config --no-reboot to --on-reboot This explains the intention better: config is applied on reboot, and allows to easily distinguish it from `apply-config --immediate` which applies config immediately without a reboot (that is coming in a different PR). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-17 12:49:47 -08:00
Andrey Smirnov	254e0e91e1	fix: correctly unwrap responses for etcd commands This uses wrappers which helps to unwrap errors from proxied apid responses. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-17 11:33:54 -08:00
Artem Chernyshev	02b3719df9	feat: skip filesystem for state and ephemeral partitions in the installer Filesystem creation step is moved on the later stage: when Talos mounts the partition for the first time. Now it checks if the partition doesn't have any filesystem and formats it right before mounting. Additionally refactored mount options a bit: - replaced separate options with a set of binary flags. - implemented pre-mount and post-unmount hooks. And fixed typos in couple of places and increased timeout for `apid ready`. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-02-17 09:37:21 -08:00
Artem Chernyshev	f96548e165	refactor: extract go-cmd into a separate library To be used in the `go-blockdevice` library. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-02-16 10:31:20 -08:00
Andrey Smirnov	d99a016af2	fix: correct response structure for GenerateConfig API Also fix recovery grpc handler to print panic stacktrace to the log. Any API should follow the structure compatible with apid proxying injection of errors/nodes. Explicitly fail GenerateConfig API on worker nodes, as it panics on worker nodes (missing certificates in node config). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-11 06:34:10 -08:00
Andrey Smirnov	daea9d3811	feat: support version contract for Talos config generation This allows generating current version Talos configs (by default) or backwards compatible configuration (e.g. for Talos 0.8). `talosctl gen config` defaults to current version, but explicit version can be passed to the command via flags. `talosctl cluster create` defaults to install/container image version, but that can be overridden. This makes `talosctl cluster create` now compatible with 0.8.1 images out of the box. Upgrade tests use contract based on source version in the test. When used as a library, `VersionContract` can be omitted (defaults to current version) or passed explicitly. `VersionContract` can be conveniently parsed from Talos version string or specified as one of the constants. Fixes #3130 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-10 13:02:52 -08:00
Andrey Smirnov	7f3dca8e4c	test: add support for IPv6 in talosctl cluster create Modify provision library to support multiple IPs, CIDRs, gateways, which can be IPv4/IPv6. Based on IP types, enable services in the cluster to run DHCPv4/DHCPv6 in the test environment. There's outstanding bug left with routes not being properly set up in the cluster so, IPs are not properly routable, but DHCPv6 works and IPs are allocated (validates DHCPv6 client). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-09 13:28:53 -08:00
Andrey Smirnov	edf5777222	feat: add an option to force upgrade without checks Our upgrades are safe by default - we check etcd health, take locks, etc. But sometimes upgrades might be a way to recover broken (or semi-broken) cluster, in that case we need upgrade to run even if the checks are not passing. This is not a safe way to do upgrades, but it might be a way to recover a cluster. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-09 10:20:03 -08:00
Andrey Smirnov	2277ce8abe	feat: move to ECDSA keys for all Kubernetes/etcd certs and keys ECDSA keys are smaller which decreases Talos config size, they are more efficient in terms of key generation, signing, etc., so it makes boot performance better (and config generation as well). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-02 13:25:00 -08:00
Andrey Smirnov	87ccf0eb21	test: clear connection refused errors after reset After node reboot (and gRPC API unavailability), gRPC stack might cache connection refused errors for up to backoff timeout. Explicitly clear such errors in reset tests before trying to read data from the node to verify reset success. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-01 08:11:27 -08:00
Andrey Smirnov	e0a0f58801	feat: use multi-arch images for k8s and Flannel CNI Flannel got updated to 0.13 version which has multi-arch image. Kubernetes images are multi-arch. Fixes #3049 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-01-28 08:26:02 -08:00
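
Commit 7d91258475 in the log above fixes a data race where `chanErr` was read before the goroutine writing it had finished. The following is a minimal sketch of that general pattern, not the actual test code: `runAndWait`, `errApply`, and the `done` channel are hypothetical names introduced for illustration. Only the variable name `chanErr` comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// errApply is a hypothetical sentinel error used only for this illustration.
var errApply = errors.New("apply failed")

// runAndWait runs work in a goroutine and returns its error.
// The error variable is read only after the goroutine signals completion.
func runAndWait(work func() error) error {
	var chanErr error

	done := make(chan struct{})

	go func() {
		// close(done) runs after the assignment below, so the write to
		// chanErr happens before the receive on done completes.
		defer close(done)

		chanErr = work()
	}()

	// Reading chanErr before this receive would race with the goroutine.
	<-done

	return chanErr
}

func main() {
	err := runAndWait(func() error { return errApply })
	fmt.Println(err) // prints "apply failed"
}
```

Running a variant that reads `chanErr` before `<-done` under `go test -race` (see commit d7e9f6d6a8) is the kind of check that would be expected to flag the original bug.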
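
Commit ac8764702f in the log above describes a single `machined` executable that launches one service or another based on `argv[0]`. The sketch below illustrates that dispatch idea under stated assumptions: the function `serviceFor`, the fallback to `machined`, and the example paths are hypothetical, and only the service names are taken from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// serviceFor picks a service name from the program name the process was
// started with. The mapping is an assumption made for illustration.
func serviceFor(argv0 string) string {
	switch name := filepath.Base(argv0); name {
	case "apid", "routerd", "timed", "trustd":
		return name
	default:
		// Anything else is treated as the main machined entrypoint.
		return "machined"
	}
}

func main() {
	// A real multi-call binary would start the selected service here;
	// this sketch only reports the choice.
	fmt.Println("would start:", serviceFor(os.Args[0]))
}
```

One executable can then be copied or linked under several names, which fits the commit's note that each container rootfs contains only a single executable file.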
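
Commit 02b3719df9 in the log above mentions replacing separate mount options with a set of binary flags. A minimal sketch of what such a flag set can look like in Go follows. The type `Flags`, the constants `ReadOnly`, `Shared`, and `SkipIfMounted`, and the method `Has` are hypothetical names chosen for illustration and are not taken from the Talos source.

```go
package main

import "fmt"

// Flags is a bit set of mount options (hypothetical type).
type Flags uint

const (
	// Each constant occupies its own bit, so values combine with |.
	ReadOnly Flags = 1 << iota
	Shared
	SkipIfMounted
)

// Has reports whether every bit in flag is set in f.
func (f Flags) Has(flag Flags) bool {
	return f&flag == flag
}

func main() {
	opts := ReadOnly | SkipIfMounted

	fmt.Println(opts.Has(ReadOnly), opts.Has(Shared)) // prints "true false"
}
```

Compared with one boolean field per option, a bit set keeps function signatures stable when a new option is added.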


181 Commits