talos

mirror of https://github.com/siderolabs/talos.git synced 2025-10-13 00:21:12 +02:00

Author	SHA1	Message	Date
Andrey Smirnov	a8dd2ff30d	fix: checkpoint controller-manager and scheduler Default manifests created by bootkube so far were only enabling pod-checkpointer for kube-apiserver. This seems to have issues with single-node control plane scenario, when without scheduler and controller-manager node might fall into `NodeAffinity` state. See https://github.com/talos-systems/bootkube-plugin/pull/23 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-28 11:53:17 -08:00
Andrey Smirnov	f2c029a07d	chore: update upgrade test version used Now with official 0.8.0 release. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-24 18:49:29 +03:00
Artem Chernyshev	a83e8758db	feat: add commands to manage/query etcd cluster Used already existing protobufs for that. Commands: `talosctl etcd members -n <node>` `talosctl etcd leave -n <node>` `talosctl etcd forfeit-leadership -n <node>` Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-12-22 11:49:10 -08:00
Andrey Smirnov	b1d4814308	feat: update Kubernetes to 1.20.1 See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-21 23:52:29 +03:00
Andrey Smirnov	3dae6df27b	test: stabilize upgrade test by running health check several times For single node clusters, control plane is unstable after reboot, run health check several times to let it settle down to avoid failures in subsequent checks. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-11 08:31:01 -08:00
Andrey Smirnov	54ed80e244	feat: reset with system disk wipe spec Idea is to add an option to perform "selective" reset: default reset operation is to wipe all partitions (triggering reinstall), while the spec allows wiping only some of the partitions. Other operations are performed exactly in the same way for any reset flow. Possible use case: reset only `EPHEMERAL` partition. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-10 11:31:07 -08:00
Artem Chernyshev	68dd5b9add	feat: add talosctl merge config command Allows merging two Talos configs into one. Merges the config in whatever is set by TALOSCONFIG or ~/.talos/config. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-12-09 13:07:45 -08:00
Artem Chernyshev	d7ce831465	feat: add talosctl config contexts Bonus to `talosctl config merge`. Got that idea after using talosctl for a weekend. I feel that can be a good addition to have a command that can list existing contexts in a table view, which is similar to what `kubectl config get-contexts` does. To avoid going through the file which has all the certs and such. Called it just `contexts` to align with whatever we have now (to switch context you need to use `talosctl config context`). Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-12-09 12:19:10 -08:00
Andrey Smirnov	872e792dbc	feat: update Kubernetes to 1.20.0 Official K8s release matching Talos 0.8.0. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-09 06:11:48 -08:00
Andrey Smirnov	350280eb59	feat: implement "staged" (failsafe/backup) upgrades Regular upgrade path takes just one reboot, but it requires all the processes to be stopped on the node before upgrade might proceed. Under some circumstances and with potential Talos bugs it might not work, rendering Talos upgrades almost impossible. Staged upgrades build upon regular install flow to run the upgrade on the node reboot. Such upgrades require two reboots of the node and two pulls of the installer image, but they should be much less susceptible to failure. Once the upgrade is staged, node can be rebooted in any possible way, including hard reset, and upgrade is performed on the next boot. New ADV format was implemented as well to allow storing install image ref/options across reboots. New format allows for bigger values and takes 50% of the `META` partition. Old ADV is still kept for compatibility reasons. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-08 08:34:26 -08:00
Andrey Smirnov	1cf6b98fb8	test: bump Talos release version for upgrade test to 0.7.1 We should always use latest releases. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-08 18:41:28 +03:00
Andrey Smirnov	11c2b8f80c	test: bump defaults for provision tests resources Our defaults are too low now. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-07 07:01:41 -08:00
Andrey Smirnov	e4ebc4ab95	feat: suggest fixed control plane endpoints in talosctl gen config Ex.: ``` $ talosctl gen config foo 192.168.0.1 no scheme and port specified for the cluster endpoint URL try: "https://192.168.0.1:6443" ``` Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-02 13:16:30 -08:00
Andrey Smirnov	621968977e	feat: update kubernetes to 1.20.0-rc.0 Talos 0.8 is going to ship with K8s 1.20. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-02 10:50:58 -08:00
Artem Chernyshev	8aad711f18	feat: implement network interfaces list API To be used in the interactive installer to configure networking. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-27 10:48:45 -08:00
Artem Chernyshev	f96cffd2b2	feat: add ability to choose CNI config Initial version which only allows setting CNI using preset, no custom CNI urls are supported at the moment. Still need to figure out what kind of UI can be used for that. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-26 06:49:54 -08:00
Andrey Smirnov	28ba6e416e	feat: update Kubernetes to v1.20.0-beta.2 Talos 0.8 is going to ship with K8s 1.20.x. Changes to support new `control-plane` label, upgrade-k8s supports automated fixups for 1.20. See also: https://github.com/talos-systems/bootkube-plugin/pull/22 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-25 06:39:14 -08:00
Andrey Smirnov	9a32e34cb1	feat: implement apply configuration without reboot This allows config to be written to disk without being applied immediately. Small refactoring to extract common code paths. At first, I tried to implement this via the sequencer, but looks like it's too hard to get it right, as sequencer lacks context and config to be written is not applied to the runtime. Fixes #2828 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-23 12:42:44 -08:00
Artem Chernyshev	2588e2960b	feat: make GenerateConfiguration API reuse current node auth Fixes: https://github.com/talos-systems/talos/issues/2819 Only if requested config type is not `TypeInit`. This functionality will help implementing TUI installer cluster extension workflow. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-23 12:12:15 -08:00
Artem Chernyshev	b6874ee82a	feat: add TUI based talos interactive installer This is initial commit of the installer. What's done: - verifying node availability before starting any operations. - gathering information about disks on the machine. - allows setting: install disk, hostname, machine type, installer image, kubernetes version, dns domain, cluster-name. - dumps/merges talosconfig to a file after applying configuration. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-18 12:34:15 -08:00
Andrey Smirnov	07cbf4be3f	test: update integration test versions, clean up names Bump to 0.7.0 as we have a new release. Clean up the tests we do: 0.6.3 is a previous release, 0.7.0 is a stable release, current version (0.8.x) is the "next" release. We test the following: * 0.6.3 -> 0.7.0 * 0.7.0 -> 0.8-current * 0.7.0 -> 0.8-current (single node) This tests upgrades always between two releases. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-18 16:39:40 +03:00
Artem Chernyshev	8513123d22	feat: return client config as the second value in GenerateConfiguration To be used in interactive installer to output the node client configuration to a file. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-17 07:20:05 -08:00
Artem Chernyshev	0f924b5122	feat: add generate config gRPC API Fixes: https://github.com/talos-systems/talos/issues/2766 This API is implemented in Maintenance and Machine services. Can be used to generate configuration on the node, instead of using talosctl to generate it locally. To be used in interactive installer and talosctl gen config. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-13 08:07:32 -08:00
Andrey Smirnov	df6ad3fa80	feat: upgrade Kubernetes default version to 1.19.4 k8s.io modules don't have 1.19.4 tag yet :( Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-12 08:51:04 -08:00
Andrey Smirnov	b2b86a622e	fix: remove 'token creds' from maintenance service This fixes the reverse Go dependency from `pkg/machinery` to `talos` package. Add a check to `Dockerfile` to prevent `pkg/machinery/go.mod` getting out of sync, this should prevent problems in the future. Fix potential security issue in `token` authorizer to deny requests without grpc metadata. In provisioner, add support for launching nodes without the config (config is not delivered to the provisioned nodes). Breaking change in `pkg/provision`: now `NodeRequest.Type` should be set to the node type (as config can be missing now). In `talosctl cluster create` add a flag to skip providing config to the nodes so that they enter maintenance mode, while the generated configs are written down to disk (so they can be tweaked and applied easily). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-09 14:10:32 -08:00
Andrey Smirnov	8560fb9662	chore: enable nlreturn linter Most of the fixes were automatically applied. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-09 06:48:07 -08:00
Artem Chernyshev	061b296530	feat: allow specifying user-disks in talosctl cluster create User-disks are supported by QEMU and Firecracker providers. Can be defined by using the following parameters: ``` --user-disk /mount/path:1GB ``` Can get more than 1 user disk. Same set of user disks will be created for all master and worker nodes. Additionally enable user-disks in qemu e2e test. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-10-30 08:44:08 -07:00
Andrey Smirnov	66829b14d5	test: bump Talos version for upgrade tests, bump Cilium version Use 0.6.3 as upgrade source version, use latest Cilium release. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-29 22:22:21 +03:00
Andrey Smirnov	bc9e0c0dba	fix: re-implement upgrade (install) with preserve For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved from `/boot` to `/system/state`. For single node upgrade, `EPHEMERAL` partition is not touched and other partitions are re-created as needed. Bump provision tests to 0.6/0.7 upgrades as we get closer to the new release. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-28 07:25:26 -07:00
Andrey Smirnov	ff4d702f77	fix: implement preserving contents of partition on install This fixes A/B upgrades and rollback API. Installer manifest supports now an option to preserve partition contents while disk is being re-partitioned and partitions are re-formatted. Mount `/boot` partition as needed (to find current label before starting the installation and in the rollback API). Fix upgrade API for non-master nodes. Contents of `/boot`, `/system/state` and META partitions are preserved in memory while the disk is re-partitioned. Remove `--save` flag from the installer as it's not being used. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-22 23:56:39 +03:00
Andrey Smirnov	56f1ee37fd	feat: upgrade Kubernetes to 1.19.3 Just minor release bump. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-20 05:12:32 -07:00
Andrey Smirnov	773912833e	test: clean up integration test code, fix flakes This enables golangci-lint via build tags for integration tests (this should have been done long ago!), and fixes the linting errors. Two tests were updated to reduce flakiness: * apply config: wait for nodes to issue "boot done" sequence event before proceeding * recover: kill pods even if they appear after the initial set gets killed (potential race condition with previous test). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-19 15:44:14 -07:00
Artem Chernyshev	e7e99cf1b3	feat: support disk usage command in talosctl Usage example: ```bash talosctl du --nodes 10.5.0.2 /var -H -d 2 NODE NAME 10.5.0.2 8.4 kB etc 10.5.0.2 1.3 GB lib 10.5.0.2 16 MB log 10.5.0.2 25 kB run 10.5.0.2 4.1 kB tmp 10.5.0.2 1.3 GB . ``` Supported flags: - `-a` writes counts for all files, not just directories. - `-d` recursion depth. - `-H` humanize size outputs. - `-t` size threshold (skip files if < size or > size). Fixes: https://github.com/talos-systems/talos/issues/2504 Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-10-13 09:30:31 -07:00
Andrey Smirnov	dc6ea74c35	fix: random failures in cluster health checks The problem was that some of the health checks sort the list of the nodes in place (via `sort.Strings()`). If cluster info provider returns original slice, it might be mutated in such a way that it gets corrupted. We never noticed it before CAPI clusters, as in our tests IPs are assigned sequentially, and sort operation is a no-op. Specifically, the problem was with the `Nodes()` function: it returns `append(controlPlaneNodes, workerNodes...)` slice, which by definition might share memory with `controlPlaneNodes` slice. For example, if control plane nodes were `4, 5, 6` and worker nodes were `3`, the returned slice will be `4, 5, 6, 3`, and it shares memory with `controlPlaneNodes` slice (first three items). If we apply `sort` to the returned slice, it re-orders it as `3, 4, 5, 6`, but as it is done in-place, the `controlPlaneNodes` slice is now `3, 4, 5`, which is obviously wrong. Fix that by always returning a copy of the slice from the functions implementing `ClusterInfo` interface. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-08 07:13:24 -07:00
Andrew Rynhard	4eeef28e90	feat: add etcd API This adds RPCs for basic etcd management tasks. Signed-off-by: Andrew Rynhard <andrew@rynhard.io>	2020-10-06 11:30:04 -07:00
Andrey Smirnov	d7f5de62c3	feat: colorize output of cluster health checks It only gets enabled if output is a terminal. Failures which resolve themselves are removed from the final output. Small spinner to indicate progress. While I was at it, I fixed client-side `talosctl health` when init node is missing. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-06 07:59:30 -07:00
Andrey Smirnov	16eb47a1a3	feat: use kubeconfig merge in `talosctl kubeconfig` by default Kubeconfig merge was completely rewritten to be "smarter": * automatically apply renames done at previous stages to avoid asking over and over again (in general should ask just once) * skip checks if parts of the config match exactly * allow overwrite as an option * flexible way to control the output * activating context in the end * custom merged context name Fixes #2578 Fixes #2587 Fixes #2577 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-03 05:36:15 -07:00
Andrey Smirnov	b9bfe00b88	feat: support custom filename for talosctl kubeconfig This also refactors much of the CLI code for the `talosctl kubeconfig`: 1. Do all the checks before fetching kubeconfig from the server: as kubeconfig generation takes a few seconds, it doesn't make sense to generate it if it's not going to be used. 2. Unify most of merge & write directly features. 3. Don't use ExtractTarGz method to be more flexible. 4. Allow custom paths for kubeconfig, whether it is a directory or full path to the file to be created. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-30 12:05:50 -07:00
Seán C McCord	ff92d2a14b	feat: add ApplyConfiguration API Adds the ability to apply (replace) an existing node configuration with a new one via the Machine API. Fixes #2345 Signed-off-by: Seán C McCord <ulexus@gmail.com>	2020-09-29 14:44:06 -07:00
Andrey Smirnov	ff0d4b305a	feat: build Talos images/artifacts for amd64/arm64 By default, build outside of Drone works the same and builds only amd64 version, loads images back into dockerd, etc. If multiple platforms are used, multi-arch images are built which can't be exported to docker or to `.tar` image, they're always pushed to the registry (even for PR builds to our internal CI registry). Artifacts as files (initramfs, kernel) now have `-arch` suffix: `vmlinuz-amd64`, `initramfs-amd64.xz`. "Magic" script normalizes output paths depending on whether single platform or multiple platforms were given. VM provisioners accept magic `${ARCH}` in initramfs/kernel paths which gets replaced by cluster architecture. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-27 10:32:07 -07:00
Andrey Smirnov	0f54574d89	fix: update one more place which had a stale reference to constants s/constants/images/ Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-25 10:51:35 -07:00
Andrew Rynhard	27c7bc0788	fix: use images package in integration tests This fixes an incorrect import path. Signed-off-by: Andrew Rynhard <andrew@rynhard.io>	2020-09-25 08:11:27 -07:00
Andrew Rynhard	c693e556d2	feat: add images command This adds a command that lists all of the images used by Talos. This is useful in the case of airgap installs, so that users will know which images to pull. Signed-off-by: Andrew Rynhard <andrew@rynhard.io>	2020-09-18 12:55:08 -07:00
Andrey Smirnov	15181aeade	feat: use architecture-specific image for core k8s components This is one step towards running Talos on non-amd64 architectures (e.g. arm64). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-16 01:11:40 -07:00
Andrey Smirnov	f6e075ea55	test: verify kubernetes control plane upgrade in provision tests Add Kubernetes upgrade as part of the provisioning (upgrade tests): first K8s control plane is upgraded, then Talos is upgraded (with kubelet), and e2e test is run last. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-11 10:53:33 -07:00
Andrey Smirnov	788cd15c29	test: add e2e test to the provision (upgrade) tests Add sonobuoy runner code with log fetching on failure. Use hand-picked set of e2e tests to run: verify basic pod functionality, verify service connectivity. Add option `--run-e2e` to the `talosctl health` to run quick e2e test to verify cluster health. Add option to run provision tests with custom CNI, run one track of provision tests with Cilium. Bump Cilium to 1.8.2. Talos 0.6 won't uncordon node automatically after upgrade from 0.5, as 0.5 doesn't put annotation. Workaround that in upgrade tests. Bump upgrade test version to 0.6.0 release. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-08 13:26:31 -07:00
Andrey Smirnov	f6ecf000c9	refactor: extract packages loadbalancer and retry This removes in-tree packages in favor of: * github.com/talos-systems/go-retry * github.com/talos-systems/go-loadbalancer Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-02 13:46:22 -07:00
Andrew Rynhard	1a4059a553	feat: add grub bootloader This moves to using grub instead of syslinux. BREAKING CHANGE: Single node upgrades will fail in this change. This will also break the A/B fallback setup since this version introduces an entirely new partition scheme, that any fallback will not know about. We plan on addressing these issues in a follow up change. Signed-off-by: Andrew Rynhard <andrew@rynhard.io>	2020-09-01 12:06:43 -07:00
Marco De Luca	1fbb171fd0	test: determine reboots using boot id Changed the RebootSuite to use /proc/sys/kernel/random/boot_id rather than /proc/uptime Signed-off-by: Marco De Luca <marcodl404@gmail.com>	2020-08-26 06:09:02 -07:00
Andrew Rynhard	83aa3bd3ab	chore: bump next version to v0.6.0-beta.2 This updates the "next" version in our integration tests. Signed-off-by: Andrew Rynhard <andrew@rynhard.io>	2020-08-21 01:44:26 -07:00
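
Note on e4ebc4ab95 (suggest fixed control plane endpoints): one way such a suggestion could be computed is sketched below. This is an illustration only and not the `talosctl` implementation; `suggestEndpoint` and `defaultPort` are hypothetical names, and the `https` scheme and port 6443 are taken from the example in the commit message.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// defaultPort is the port used in the commit message example (assumption).
const defaultPort = "6443"

// suggestEndpoint returns raw with an https scheme and a port filled in
// when either is missing. Inputs that already carry both are returned as is.
func suggestEndpoint(raw string) string {
	if strings.Contains(raw, "://") {
		u, err := url.Parse(raw)
		if err != nil || u.Port() != "" {
			return raw
		}
		u.Host = net.JoinHostPort(u.Hostname(), defaultPort)
		return u.String()
	}

	// No scheme: accept "host" or "host:port", including bare IPv6 literals.
	host, port, err := net.SplitHostPort(raw)
	if err != nil {
		host, port = strings.Trim(raw, "[]"), defaultPort
	}
	return "https://" + net.JoinHostPort(host, port)
}

func main() {
	fmt.Printf("try: %q\n", suggestEndpoint("192.168.0.1"))
}
```

The sketch avoids calling `url.Parse` on scheme-less input, since Go rejects strings such as `192.168.0.1:6443` there with a "first path segment in URL cannot contain colon" error.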
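
Note on 1fbb171fd0 (determine reboots using boot id): on Linux, `/proc/sys/kernel/random/boot_id` holds a UUID that is regenerated on every boot, so two samples that differ indicate a reboot in between. The sketch below reads the file locally; the actual RebootSuite presumably reads it from the node under test, and the helper names `readBootID` and `rebooted` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// bootIDPath is the procfs file named in the commit message.
const bootIDPath = "/proc/sys/kernel/random/boot_id"

// readBootID returns the trimmed contents of the boot ID file at path.
func readBootID(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(data)), nil
}

// rebooted reports whether two boot ID samples come from different boots.
func rebooted(before, after string) bool {
	return strings.TrimSpace(before) != strings.TrimSpace(after)
}

func main() {
	before, err := readBootID(bootIDPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read boot ID:", err)
		return
	}

	// A real test would trigger a reboot here and wait for the node to return.

	after, err := readBootID(bootIDPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read boot ID:", err)
		return
	}
	fmt.Println("rebooted:", rebooted(before, after))
}
```

Comparing identifiers is a plain equality check, whereas an uptime-based check has to reason about elapsed time between the two samples.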


127 Commits