This allows generating current-version Talos configs (by default) or
backwards-compatible configuration (e.g. for Talos 0.8).
`talosctl gen config` defaults to the current version, but an explicit
version can be passed to the command via flags.
`talosctl cluster create` defaults to the install/container image
version, but that can be overridden. This makes `talosctl cluster create`
compatible with 0.8.1 images out of the box.
Upgrade tests use a contract based on the source version in the test.
When used as a library, `VersionContract` can be omitted (defaults to the
current version) or passed explicitly. `VersionContract` can be
conveniently parsed from a Talos version string or specified as one of
the constants, as in the sketch below.
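A minimal sketch of picking a `VersionContract` when using the config
generation library; the helper names shown (`config.ParseContractFromVersion`,
`config.TalosVersion0_8`, `generate.WithVersionContract`, `generate.NewInput`)
are assumptions based on the description above and may differ between
releases:

```go
package main

import (
	"log"

	"github.com/talos-systems/talos/pkg/machinery/config"
	"github.com/talos-systems/talos/pkg/machinery/config/types/v1alpha1/generate"
)

func main() {
	// Parse a contract from a Talos version string...
	contract, err := config.ParseContractFromVersion("v0.8.1") // assumed helper
	if err != nil {
		log.Fatal(err)
	}

	// ...or pick one of the predefined constants.
	contract = config.TalosVersion0_8 // assumed constant

	secrets, err := generate.NewSecretsBundle(generate.NewClock())
	if err != nil {
		log.Fatal(err)
	}

	// Passing the contract produces a backwards-compatible config;
	// omitting the option defaults to the current version.
	input, err := generate.NewInput("my-cluster", "https://10.5.0.2:6443", "1.20.1",
		secrets, generate.WithVersionContract(contract)) // assumed option
	if err != nil {
		log.Fatal(err)
	}

	_ = input
}
```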
Fixes #3130
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Modify the provision library to support multiple IPs, CIDRs, and
gateways, which can be IPv4/IPv6. Based on the IP types, enable services
in the cluster to run DHCPv4/DHCPv6 in the test environment.
There's an outstanding bug with routes not being properly set up in
the cluster, so IPs are not properly routable, but DHCPv6 works and IPs
are allocated (which validates the DHCPv6 client).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Our upgrades are safe by default: we check etcd health, take locks,
etc. But sometimes an upgrade might be a way to recover a broken (or
semi-broken) cluster, and in that case we need the upgrade to run even if
the checks are not passing. This is not a safe way to do upgrades, but it
might be a way to recover a cluster.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
ECDSA keys are smaller, which decreases the Talos config size, and they
are more efficient in terms of key generation, signing, etc., which
improves boot performance (and config generation as well).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Flannel was updated to version 0.13, which has a multi-arch image.
Kubernetes images are multi-arch.
Fixes #3049
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Control plane components now run as static pods managed by the
kubelets.
The whole subsystem is managed via resources/controllers from os-runtime.
Many supporting changes and refactorings were needed to enable the new
code paths.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
For single-node clusters, the control plane is unstable after a reboot;
run the health check several times to let it settle down and avoid
failures in subsequent checks.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The regular upgrade path takes just one reboot, but it requires all
processes to be stopped on the node before the upgrade can proceed. Under
some circumstances, and with potential Talos bugs, this might not work,
rendering Talos upgrades almost impossible.
Staged upgrades build upon the regular install flow to run the upgrade on
node reboot. Such upgrades require two reboots of the node and two pulls
of the installer image, but they should be much less susceptible to
failure. Once the upgrade is staged, the node can be rebooted in any
possible way, including a hard reset, and the upgrade is performed on the
next boot.
A new ADV format was implemented as well to allow storing the install
image ref/options across reboots. The new format allows for bigger values
and takes 50% of the `META` partition. The old ADV is still kept for
compatibility reasons.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Talos 0.8 is going to ship with K8s 1.20.x.
Changes to support the new `control-plane` label;
`upgrade-k8s` supports automated fixups for 1.20.
See also: https://github.com/talos-systems/bootkube-plugin/pull/22
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is the initial commit of the installer.
What's done:
- verify node availability before starting any operations.
- gather information about disks on the machine.
- allow setting: install disk, hostname, machine type, installer image,
Kubernetes version, DNS domain, cluster name.
- dump/merge the talosconfig to a file after applying the configuration.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Bump to 0.7.0 as we have a new release.
Clean up the tests we run: 0.6.3 is the previous release, 0.7.0 is the
stable release, and the current version (0.8.x) is the "next" release.
We test the following:
* 0.6.3 -> 0.7.0
* 0.7.0 -> 0.8-current
* 0.7.0 -> 0.8-current (single node)
This always tests upgrades between two releases.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes the reverse Go dependency from `pkg/machinery` to the `talos`
package.
Add a check to the `Dockerfile` to prevent `pkg/machinery/go.mod` from
getting out of sync; this should prevent problems in the future.
Fix a potential security issue in the `token` authorizer to deny requests
without gRPC metadata.
In the provisioner, add support for launching nodes without a config
(the config is not delivered to the provisioned nodes).
Breaking change in `pkg/provision`: `NodeRequest.Type` should now be set
to the node type (as the config can be missing now); see the sketch
below.
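A rough sketch of what the breaking change means for library users; apart
from `Type`, the field names and values shown are illustrative
assumptions:

```go
package main

import (
	"github.com/talos-systems/talos/pkg/machinery/config/types/v1alpha1/machine"
	"github.com/talos-systems/talos/pkg/provision"
)

// exampleNodeRequest builds a request for a node launched without a config.
func exampleNodeRequest() provision.NodeRequest {
	return provision.NodeRequest{
		Name: "control-plane-1",        // hypothetical node name
		Type: machine.TypeControlPlane, // must now be set explicitly
		// Config is left unset: the node boots without a machine config
		// and enters maintenance mode.
	}
}

func main() {
	_ = exampleNodeRequest()
}
```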
In `talosctl cluster create`, add a flag to skip providing the config to
the nodes so that they enter maintenance mode, while the generated
configs are written to disk (so they can be tweaked and applied easily).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
User disks are supported by the QEMU and Firecracker providers.
They can be defined using the following parameter:
```
--user-disk /mount/path:1GB
```
More than one user disk can be specified.
The same set of user disks will be created for all master and worker
nodes.
Additionally, enable user disks in the QEMU e2e test.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
For the 0.6 -> 0.7 upgrade, config.yaml is preserved in any case and
moved from `/boot` to `/system/state`.
For a single-node upgrade, the `EPHEMERAL` partition is not touched and
other partitions are re-created as needed.
Bump provision tests to 0.6/0.7 upgrades as we get closer to the new
release.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This enables golangci-lint via build tags for integration tests (this
should have been done long ago!) and fixes the linting errors.
Two tests were updated to reduce flakiness:
* apply config: wait for the nodes to issue the "boot done" sequence
event before proceeding
* recover: kill pods even if they appear after the initial set gets
killed (potential race condition with the previous test).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
By default, a build outside of Drone works the same as before: it builds
only the amd64 version, loads images back into dockerd, etc.
If multiple platforms are used, multi-arch images are built; these can't
be exported to Docker or to a `.tar` image, so they're always pushed to
the registry (even for PR builds, to our internal CI registry).
File artifacts (initramfs, kernel) now have an `-arch` suffix:
`vmlinuz-amd64`, `initramfs-amd64.xz`. A "magic" script normalizes output
paths depending on whether a single platform or multiple platforms were
given.
VM provisioners accept the magic `${ARCH}` token in initramfs/kernel
paths, which gets replaced by the cluster architecture.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Add a Kubernetes upgrade as part of the provisioning (upgrade) tests:
first the K8s control plane is upgraded, then Talos is upgraded (with the
kubelet), and the e2e test is run last.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Add sonobuoy runner code with log fetching on failure. Use a hand-picked
set of e2e tests to run: verify basic pod functionality and verify
service connectivity.
Add the `--run-e2e` option to `talosctl health` to run a quick e2e test
to verify cluster health.
Add an option to run provision tests with a custom CNI, and run one track
of provision tests with Cilium.
Bump Cilium to 1.8.2.
Talos 0.6 won't uncordon a node automatically after an upgrade from 0.5,
as 0.5 doesn't set the annotation. Work around that in the upgrade tests.
Bump the upgrade test version to the 0.6.0 release.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves to using GRUB instead of syslinux.
BREAKING CHANGE: Single-node upgrades will fail with this change. This
will also break the A/B fallback setup, since this version introduces
an entirely new partition scheme that any fallback will not know about.
We plan on addressing these issues in a follow-up change.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This moves `pkg/config`, `pkg/client`, and `pkg/constants`
under the `pkg/machinery` umbrella.
`pkg/machinery` is published as a Go module inside the Talos repository.
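With `pkg/machinery` as its own Go module, downstream projects can depend
on just the machinery packages; a small sketch (the specific constant is
only an example):

```go
package main

import (
	"fmt"

	"github.com/talos-systems/talos/pkg/machinery/constants"
)

func main() {
	// constants moved from pkg/constants to pkg/machinery/constants
	fmt.Println(constants.ApidPort)
}
```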
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This change only moves packages and updates import paths.
Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other
projects to import the Talos provisioning library (see the import sketch
after the package lists below).
As cluster checks are almost always required as part of the provisioning
process, the package `internal/pkg/cluster` was also made public as
`pkg/cluster`.
The other changes were direct dependencies discovered by `importvet`,
which were updated.
Public packages (useful, general-purpose packages with a stable API):
* `internal/pkg/conditions` -> `pkg/conditions`
* `internal/pkg/tail` -> `pkg/tail`
Private packages (used only by the provisioning library internally):
* `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp`
* `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz`
* `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils`
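A compile-only sketch of the new public import paths (blank imports just
to show the moved locations):

```go
package main

import (
	_ "github.com/talos-systems/talos/pkg/cluster"
	_ "github.com/talos-systems/talos/pkg/conditions"
	_ "github.com/talos-systems/talos/pkg/provision"
	_ "github.com/talos-systems/talos/pkg/tail"
)

func main() {}
```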
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #2363 #2364 #2370 #2371
Several changes packed together:
* use compressed `vmlinuz` everywhere; the firecracker provisioner
uncompresses it before first use; drop `vmlinux`
* handle reboots in the qemu launcher to support the reset API case;
update the empty disk check to handle reset behavior (erasing the
partition table)
* make bootloader support the default in provisioners, with a flag to
disable it
* early support for a target architecture in the qemu provisioner
This should allow us to use `qemu` in CI/CD (not included in this PR):
the integration test passes with qemu.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This makes `pkg/config` directly importable from other projects.
There should be no functional changes.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This bumps the next version to the latest 0.6 alpha and the latest 0.5.
This also enables the single-node preserve test.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Handling of multiple endpoints was already implemented in #2094.
This PR enables the round-robin policy so that gRPC picks a new endpoint
for each call (and does not send each request to the first control plane
node).
The endpoint list is randomized to handle cases when only one request is
going to be sent, so that it doesn't always go to the first node in the
list.
gRPC handles dead/unresponsive nodes automatically for us.
`talosctl cluster create` and the provision tests were switched to use
the client-side load balancer for the Talos API.
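A minimal sketch of creating a Talos API client with several endpoints so
that gRPC round-robins between them; the import path and the option names
(`client.WithDefaultConfig`, `client.WithEndpoints`) are assumptions and
may differ by release:

```go
package main

import (
	"context"
	"log"

	"github.com/talos-systems/talos/pkg/machinery/client"
)

func main() {
	ctx := context.Background()

	// Several control plane endpoints; gRPC spreads calls across them.
	c, err := client.New(ctx,
		client.WithDefaultConfig(), // load talosconfig for TLS credentials
		client.WithEndpoints("10.5.0.2", "10.5.0.3", "10.5.0.4"),
	)
	if err != nil {
		log.Fatal(err)
	}

	defer c.Close() //nolint:errcheck

	// Each call may land on a different endpoint thanks to round-robin.
	if _, err := c.Version(ctx); err != nil {
		log.Fatal(err)
	}
}
```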
Additional improvements we got:
* `talosctl` now reports the correct node IP when using commands without
`-n`, not the load balancer IP (if using multiple endpoints, of course)
* the load balancer can't provide reliable handling of errors when the
upstream server is unresponsive or there are no upstreams available;
gRPC returns much more helpful errors
Fixes #1641
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.
A few highlights are:
- no more event bus
- functional approach to tasks (no more types defined for each task)
- the task function definition now offers a lot more context, like
access to raw API requests, the current sequence, a logger, the new
state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time so that it does not have to be
downloaded again at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
install force option
Additionally, this pulls various packages in under machined to make the
code easier to navigate.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This extracts the health & crashdump features, which were specific to the
provisioning code, into a separate package that can be used standalone.
Everything else is just new glue.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>