talos

mirror of https://github.com/siderolabs/talos.git synced 2025-10-29 15:31:12 +01:00

Author	SHA1	Message	Date
Artem Chernyshev	f96548e165	refactor: extract go-cmd into a separate library To be used in the `go-blockdevice` library. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-02-16 10:31:20 -08:00
Andrey Smirnov	d99a016af2	fix: correct response structure for GenerateConfig API Also fix recovery grpc handler to print panic stacktrace to the log. Any API should follow the structure compatible with apid proxying injection of errors/nodes. Explicitly fail GenerateConfig API on worker nodes, as it panics on worker nodes (missing certificates in node config). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-11 06:34:10 -08:00
Andrey Smirnov	daea9d3811	feat: support version contract for Talos config generation This allows generating current version Talos configs (by default) or backwards compatible configuration (e.g. for Talos 0.8). `talosctl gen config` defaults to current version, but explicit version can be passed to the command via flags. `talosctl cluster create` defaults to install/container image version, but that can be overridden. This makes `talosctl cluster create` now compatible with 0.8.1 images out of the box. Upgrade tests use contract based on source version in the test. When used as a library, `VersionContract` can be omitted (defaults to current version) or passed explicitly. `VersionContract` can be conveniently parsed from Talos version string or specified as one of the constants. Fixes #3130 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-10 13:02:52 -08:00
Andrey Smirnov	7f3dca8e4c	test: add support for IPv6 in talosctl cluster create Modify provision library to support multiple IPs, CIDRs, gateways, which can be IPv4/IPv6. Based on IP types, enable services in the cluster to run DHCPv4/DHCPv6 in the test environment. There's outstanding bug left with routes not being properly set up in the cluster so, IPs are not properly routable, but DHCPv6 works and IPs are allocated (validates DHCPv6 client). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-09 13:28:53 -08:00
Andrey Smirnov	edf5777222	feat: add an option to force upgrade without checks Our upgrades are safe by default - we check etcd health, take locks, etc. But sometimes upgrades might be a way to recover broken (or semi-broken) cluster, in that case we need upgrade to run even if the checks are not passing. This is not a safe way to do upgrades, but it might be a way to recover a cluster. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-09 10:20:03 -08:00
Andrey Smirnov	2277ce8abe	feat: move to ECDSA keys for all Kubernetes/etcd certs and keys ECDSA keys are smaller which decreases Talos config size, they are more efficient in terms of key generation, signing, etc., so it makes boot performance better (and config generation as well). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-02 13:25:00 -08:00
Andrey Smirnov	87ccf0eb21	test: clear connection refused errors after reset After node reboot (and gRPC API unavailability), gRPC stack might cache connection refused errors for up to backoff timeout. Explicitly clear such errors in reset tests before trying to read data from the node to verify reset success. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-01 08:11:27 -08:00
Andrey Smirnov	e0a0f58801	feat: use multi-arch images for k8s and Flannel CNI Flannel got updated to 0.13 version which has multi-arch image. Kubernetes images are multi-arch. Fixes #3049 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-01-28 08:26:02 -08:00
Andrey Smirnov	0aaf8fa968	feat: replace bootkube with Talos-managed control plane Control plane components are running as static pods managed by the kubelets. Whole subsystem is managed via resources/controllers from os-runtime. Many supporting changes/refactoring to enable new code paths. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-01-26 14:22:35 -08:00
Andrey Smirnov	d71ac4c4ff	feat: update Kubernetes to 1.20.2 Minor point release, official changelog: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-01-15 09:06:18 -08:00
Artem Chernyshev	d515613bb7	fix: list command unlimited recursion default behavior Revert back to old behavior. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-01-15 05:06:41 -08:00
Andrey Smirnov	47fb5720cf	test: skip etcd tests on non-HA clusters We can't test much of the flow on single-node clusters. Fixes #3013 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-01-08 07:39:36 -08:00
Andrey Smirnov	a8dd2ff30d	fix: checkpoint controller-manager and scheduler Default manifests created by bootkube so far were only enabling pod-checkpointer for kube-apiserver. This seems to have issues with single-node control plane scenario, when without scheduler and controller-manager node might fall into `NodeAffinity` state. See https://github.com/talos-systems/bootkube-plugin/pull/23 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-28 11:53:17 -08:00
Andrey Smirnov	f2c029a07d	chore: update upgrade test version used Now with official 0.8.0 release. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-24 18:49:29 +03:00
Artem Chernyshev	a83e8758db	feat: add commands to manage/query etcd cluster Used already existing protobufs for that. Commands: `talosctl etcd members -n <node>` `talosctl etcd leave -n <node>` `talosctl etcd forfeit-leadership -n <node>` Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-12-22 11:49:10 -08:00
Andrey Smirnov	b1d4814308	feat: update Kubernetes to 1.20.1 See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-21 23:52:29 +03:00
Andrey Smirnov	3dae6df27b	test: stabilize upgrade test by running health check several times For single node clusters, control plane is unstable after reboot, run health check several times to let it settle down to avoid failures in subsequent checks. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-11 08:31:01 -08:00
Andrey Smirnov	54ed80e244	feat: reset with system disk wipe spec Idea is to add an option to perform "selective" reset: default reset operation is to wipe all partitions (triggering reinstall), while spec allows only to wipe some of the operations. Other operations are performed exactly in the same way for any reset flow. Possible use case: reset only `EPHEMERAL` partition. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-10 11:31:07 -08:00
Artem Chernyshev	68dd5b9add	feat: add talosctl merge config command Allows merging two Talos configs into one. Merges the config in whatever is set by TALOSCONFIG or ~/.talos/config. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-12-09 13:07:45 -08:00
Artem Chernyshev	d7ce831465	feat: add talosctl config contexts Bonus to `talosctl config merge`. Got that idea after using talosctl for a weekend. I feel that can be a good addition to have a command that can list existing contexts in a table view, which is similar to what `kubectl config get-contexts` does. To avoid going through the file which has all the certs and such. Called it just `contexts` to align with whatever we have now (to switch context you need to use `talosctl config context`). Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-12-09 12:19:10 -08:00
Andrey Smirnov	872e792dbc	feat: update Kubernetes to 1.20.0 Official K8s release matching Talos 0.8.0. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-09 06:11:48 -08:00
Andrey Smirnov	350280eb59	feat: implement "staged" (failsafe/backup) upgrades Regular upgrade path takes just one reboot, but it requires all the processes to be stopped on the node before upgrade might proceed. Under some circumstances and with potential Talos bugs it might not work rendering Talos upgrades almost impossible. Staged upgrades build upon regular install flow to run the upgrade on the node reboot. Such upgrades require two reboots of the node, and it requires two pulls of the installer image, but they should be much less susceptible to failure. Once the upgrade is staged, node can be rebooted in any possible way, including hard reset and upgrade is performed on the next boot. New ADV format was implemented as well to allow to store install image ref/options across reboots. New format allows for bigger values and takes 50% of the `META` partition. Old ADV is still kept for compatibility reasons. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-08 08:34:26 -08:00
Andrey Smirnov	1cf6b98fb8	test: bump Talos release version for upgrade test to 0.7.1 We should always use latest releases. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-08 18:41:28 +03:00
Andrey Smirnov	11c2b8f80c	test: bump defaults for provision tests resources Our defaults are too low now. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-07 07:01:41 -08:00
Andrey Smirnov	e4ebc4ab95	feat: suggest fixed control plane endpoints in talosctl gen config Ex.: ``` $ talosctl gen config foo 192.168.0.1 no scheme and port specified for the cluster endpoint URL try: "https://192.168.0.1:6443" ``` Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-02 13:16:30 -08:00
Andrey Smirnov	621968977e	feat: update kubernetes to 1.20.0-rc.0 Talos 0.8 is going to ship with K8s 1.20. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-02 10:50:58 -08:00
Artem Chernyshev	8aad711f18	feat: implement network interfaces list API To be used in the interactive installer to configure networking. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-27 10:48:45 -08:00
Artem Chernyshev	f96cffd2b2	feat: add ability to choose CNI config Initial version which only allows setting CNI using preset, no custom CNI urls are supported at the moment. Still need to figure out what kind of UI can be used for that. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-26 06:49:54 -08:00
Andrey Smirnov	28ba6e416e	feat: update Kubernetes to v1.20.0-beta.2 Talos 0.8 is going to ship with K8s 1.20.x. Changes to support new `control-plane` label, upgrade-k8s supports automated fixups for 1.20. See also: https://github.com/talos-systems/bootkube-plugin/pull/22 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-25 06:39:14 -08:00
Andrey Smirnov	9a32e34cb1	feat: implement apply configuration without reboot This allows config to be written to disk without being applied immediately. Small refactoring to extract common code paths. At first, I tried to implement this via the sequencer, but looks like it's too hard to get it right, as sequencer lacks context and config to be written is not applied to the runtime. Fixes #2828 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-23 12:42:44 -08:00
Artem Chernyshev	2588e2960b	feat: make GenerateConfiguration API reuse current node auth Fixes: https://github.com/talos-systems/talos/issues/2819 Only if requested config type is not `TypeInit`. This functionality will help implementing TUI installer cluster extension workflow. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-23 12:12:15 -08:00
Artem Chernyshev	b6874ee82a	feat: add TUI based talos interactive installer This is initial commit of the installer. What's done: - verifying node availability before starting any operations. - gathering information about disks on the machine. - allows setting: install disk, hostname, machine type, installer image, kubernetes version, dns domain, cluster-name. - dumps/merges talosconfig to a file after applying configuration. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-18 12:34:15 -08:00
Andrey Smirnov	07cbf4be3f	test: update integration test versions, clean up names Bump to 0.7.0 as we have a new release. Clean up the tests we do: 0.6.3 is a previous release, 0.7.0 is a stable release, current version (0.8.x) is the "next" release. We test the following: * 0.6.3 -> 0.7.0 * 0.7.0 -> 0.8-current * 0.7.0 -> 0.8-current (single node) This tests upgrades always between two releases. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-18 16:39:40 +03:00
Artem Chernyshev	8513123d22	feat: return client config as the second value in GenerateConfiguration To be used in interactive installer to output the node client configuration to a file. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-17 07:20:05 -08:00
Artem Chernyshev	0f924b5122	feat: add generate config gRPC API Fixes: https://github.com/talos-systems/talos/issues/2766 This API is implemented in Maintenance and Machine services. Can be used to generate configuration on the node, instead of using talosctl to generate it locally. To be used in interactive installer and talosctl gen config. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-13 08:07:32 -08:00
Andrey Smirnov	df6ad3fa80	feat: upgrade Kubernetes default version to 1.19.4 k8s.io modules don't have 1.19.4 tag yet :( Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-12 08:51:04 -08:00
Andrey Smirnov	b2b86a622e	fix: remove 'token creds' from maintenance service This fixes the reverse Go dependency from `pkg/machinery` to `talos` package. Add a check to `Dockerfile` to prevent `pkg/machinery/go.mod` getting out of sync, this should prevent problems in the future. Fix potential security issue in `token` authorizer to deny requests without grpc metadata. In provisioner, add support for launching nodes without the config (config is not delivered to the provisioned nodes). Breaking change in `pkg/provision`: now `NodeRequest.Type` should be set to the node type (as config can be missing now). In `talosctl cluster create` add a flag to skip providing config to the nodes so that they enter maintenance mode, while the generated configs are written down to disk (so they can be tweaked and applied easily). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-09 14:10:32 -08:00
Andrey Smirnov	8560fb9662	chore: enable nlreturn linter Most of the fixes were automatically applied. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-09 06:48:07 -08:00
Artem Chernyshev	061b296530	feat: allow specifying user-disks in talosctl cluster create User-disks are supported by QEMU and Firecracker providers. Can be defined by using the following parameters: ``` --user-disk /mount/path:1GB ``` Can get more than 1 user disk. Same set of user disks will be created for all master and worker nodes. Additionally enable user-disks in qemu e2e test. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-10-30 08:44:08 -07:00
Andrey Smirnov	66829b14d5	test: bump Talos version for upgrade tests, bump Cilium version Use 0.6.3 as upgrade source version, use latest Cilium release. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-29 22:22:21 +03:00
Andrey Smirnov	bc9e0c0dba	fix: re-implement upgrade (install) with preserve For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved from `/boot` to `/system/state`. For single node upgrade, `EPHEMERAL` partition is not touched and other partitions are re-created as needed. Bump provision tests to 0.6/0.7 upgrades as we get closer to the new release. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-28 07:25:26 -07:00
Andrey Smirnov	ff4d702f77	fix: implement preserving contents of partition on install This fixes A/B upgrades and rollback API. Installer manifest supports now an option to preserve partition contents while disk is being re-partitioned and partitions are re-formatted. Mount `/boot` partition as needed (to find current label before starting the installation and in the rollback API). Fix upgrade API for non-master nodes. Contents of `/boot`, `/system/state` and META partitions are preserved in memory while the disk is re-partitioned. Remove `--save` flag from the installer as it's not being used. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-22 23:56:39 +03:00
Andrey Smirnov	56f1ee37fd	feat: upgrade Kubernetes to 1.19.3 Just minor release bump. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-20 05:12:32 -07:00
Andrey Smirnov	773912833e	test: clean up integration test code, fix flakes This enables golangci-lint via build tags for integration tests (this should have been done long ago!), and fixes the linting errors. Two tests were updated to reduce flakiness: * apply config: wait for nodes to issue "boot done" sequence event before proceeding * recover: kill pods even if they appear after the initial set gets killed (potential race condition with previous test). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-19 15:44:14 -07:00
Artem Chernyshev	e7e99cf1b3	feat: support disk usage command in talosctl Usage example: ```bash talosctl du --nodes 10.5.0.2 /var -H -d 2 NODE NAME 10.5.0.2 8.4 kB etc 10.5.0.2 1.3 GB lib 10.5.0.2 16 MB log 10.5.0.2 25 kB run 10.5.0.2 4.1 kB tmp 10.5.0.2 1.3 GB . ``` Supported flags: - `-a` writes counts for all files, not just directories. - `-d` recursion depth - '-H' humanize size outputs. - '-t' size threshold (skip files if < size or > size). Fixes: https://github.com/talos-systems/talos/issues/2504 Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-10-13 09:30:31 -07:00
Andrey Smirnov	dc6ea74c35	fix: random failures in cluster health checks The problem was that some of the health checks sort the list of the nodes in place (via `sort.Strings()`). If cluster info provider returns original slice, it might be mutated in such a way that it gets corrupted. We never noticed it before CAPI clusters, as in our tests IPs are assigned sequentially, and sort operation is a no-op. Specifically, the problem was with the `Nodes()` function, it returns `append(controlPlaneNodes, workerNodes...)` slice, which by definition might share memory with `controlPlaneNodes` slice. For example, if control plane nodes were `4, 5, 6` and worker nodes were `3`, the returned slice will be `4, 5, 6, 3`, and it shares memory with `controlPlaneNodes` slice (first three items). If we apply `sort` to the returned slice, it re-orders it as `3, 4, 5, 6`, but as it is done in-place, the `controlPlaneNodes` slice is now `3, 4, 5`, which is obviously wrong. Fix that by always returning a copy of the slice from the functions implementing `ClusterInfo` interface. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-08 07:13:24 -07:00
Andrew Rynhard	4eeef28e90	feat: add etcd API This adds RPCs for basic etcd management tasks. Signed-off-by: Andrew Rynhard <andrew@rynhard.io>	2020-10-06 11:30:04 -07:00
Andrey Smirnov	d7f5de62c3	feat: colorize output of cluster health checks It only gets enabled if output is a terminal. Failures which resolve themselves are removed from the final output. Small spinner to indicate progress. While I was at it, I fixed client-side `talosctl health` when init node is missing. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-06 07:59:30 -07:00
Andrey Smirnov	16eb47a1a3	feat: use kubeconfig merge in `talosctl kubeconfig` by default Kubeconfig merge was completely rewritten to be "smarter": * automatically apply renames done at previous stages to avoid asking over and over again (in general should ask just once) * skip checks if parts of the config match exactly * allow overwrite as an option * flexible way to control the output * activating context in the end * custom merged context name Fixes #2578 Fixes #2587 Fixes #2577 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-03 05:36:15 -07:00
Andrey Smirnov	b9bfe00b88	feat: support custom filename for talosctl kubeconfig This also refactors much of the CLI code for the `talosctl kubeconfig`: 1. Do all the checks before fetching kubeconfig from the server: as kubeconfig generation takes a few seconds, it doesn't make sense to generate it if it's not going to be used. 2. Unify most of merge & write directly features. 3. Don't use ExtractTarGz method to be more flexible. 4. Allow custom paths for kubeconfig, whether it is a directory or full path to the file to be created. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-30 12:05:50 -07:00


339 Commits