talos

mirror of https://github.com/siderolabs/talos.git synced 2025-10-10 07:01:12 +02:00

Author	SHA1	Message	Date
Dmitriy Matrenichev	fc48849d00	chore: move maps/slices/ordered to gen module Use github.com/siderolabs/gen Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-09-21 20:22:43 +03:00
Andrey Smirnov	2dadcd6695	fix: stop worker nodes from acting as apid routers Don't allow worker nodes to act as apid routers: * don't try to issue client certificate for apid on worker nodes * if worker nodes receives incoming connections with `--nodes` set to one of the local addresses of the nodd, it routes the request to itself without proxying Second point allows using `talosctl -e worker -n worker` to connect directly to the worker if the connection from the control plane is not available for some reason. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-13 15:07:31 +04:00
Dmitriy Matrenichev	b59ca5810e	chore: move from inet.af/netaddr to net/netip and go4.org/netipx Closes #6007 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-08-25 17:51:32 +03:00
Dmitriy Matrenichev	29bd632401	chore: remove old build tags syntax This commit removes lines contains old build tag syntax. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-08-24 17:27:01 +03:00
Andrey Smirnov	9baca49662	refactor: implement COSI resource API for Talos Overview: deprecate existing Talos resource API, and introduce new COSI API. Consequences: * COSI API can only go via one-2-one proxy (`client.WithNode`) * client-side API access is way easier with `state.State` wrappers * lots of small changes on the client side to use new APIs Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-08-12 22:31:54 +04:00
Noel Georgi	b62b18a972	feat: bump k8s to v1.25.0-beta.0 Bump k8s to v1.25.0-beta.0 Update most kubernetes `master` references to `controlplane` Signed-off-by: Noel Georgi <git@frezbo.dev>	2022-08-10 22:17:53 +05:30
Utku Ozdemir	84e712a9f1	feat: introduce Talos API access from Kubernetes We add a new CRD, `serviceaccounts.talos.dev` (with `tsa` as short name), and its controller which allows users to get a `Secret` containing a short-lived Talosconfig in their namespaces with the roles they need. Additionally, we introduce the `talosctl inject serviceaccount` command to accept a YAML file with Kubernetes manifests and inject them with Talos service accounts so that they can be directly applied to Kubernetes afterwards. If Talos API access feature is enabled on Talos side, the injected workloads will be able to talk to Talos API. Closes siderolabs/talos#4422. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2022-08-08 18:27:26 +02:00
Andrey Smirnov	a6b010a8b4	chore: update Go to 1.19, Linux to 5.15.58 See https://go.dev/doc/go1.19 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-08-03 17:03:58 +04:00
Artem Chernyshev	8028e10749	fix: wait for boot done when rebooting a node in the integration tests We shouldn't start cluster healthcheck until boot sequence is done. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-07-27 23:58:43 +03:00
Artem Chernyshev	ae1bec59e9	feat: allow running only one sequence at a time Fix `Talos` sequencer to run only a single sequence at the same time. Sequences priority was updated. To match the table: \| what is running (columns) what is requested (rows) \| boot \| reboot \| reset \| upgrade \| \|----------------------------------------------------\|------\|--------\|-------\|---------\| \| reboot \| Y \| Y \| Y \| N \| \| reset \| Y \| N \| N \| N \| \| upgrade \| Y \| N \| N \| N \| With a small addition that `WithTakeover` is still there. If set, priority is ignored. This is mainly used for `Shutdown` sequence invokation. And if doing apply config with reboot enabled. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-07-27 17:21:36 +03:00
Dmitriy Matrenichev	30f7851d2a	chore: bump golangci-lint from 1.45.2 to 1.47.2 Minor linter upgrade. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-07-22 17:49:44 +03:00
Utku Ozdemir	bb4abc0961	fix: regenerate kubelet certs when hostname changes Clear the kubelet certificates and kubeconfig when hostname changes so that on next start, kubelet goes through the bootstrap process and new certificates are generated and the node is joined to the cluster with the new name. Fixes siderolabs/talos#5834. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2022-07-21 01:54:15 +02:00
Andrey Smirnov	a167a54021	test: fix CLI nodes discovery without provisioner data When integration tests run without data from Talos provisioner (e.g. against AWS/GCP), it should work only with `talosconfig` as an input. This specific flow was missing filling out `infoWrapper` properly. Clean up things a bit by reducing code duplication. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-06-21 18:42:26 +04:00
Utku Ozdemir	6759fcd4ae	feat: use discovery service on cluster health checks Query the discovery service to fetch the node list and use the results in health checks. Closes siderolabs#5554. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2022-06-15 16:01:38 +02:00
Utku Ozdemir	8d2be5e315	feat: extend node definition used in health checks Introduce `cluster.NodeInfo` to represent the basic info about a node which can be used in the health checks. This information, where possible, will be populated by the discovery service in following PRs. Part of siderolabs#5554. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2022-06-13 14:13:42 +02:00
Andrey Smirnov	2ae0e3a569	test: add a test for version of Go Talos was built with This is to ensure that in fact Talos is built with Go version we expect. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-05-11 21:51:12 +03:00
Andrey Smirnov	c297d66a13	test: attempt number on two on proper retries in CLI time tests See #4702 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-22 18:29:34 +03:00
Andrey Smirnov	17c1474881	test: retry `talosctl time` call in the tests As `talosctl time` relies on default time server set in the config, and our nodes start with `pool.ntp.org`, sometimes request to the timeserver fails failing the tests. Retry such errors in the tests to avoid spurious failures. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-17 20:55:06 +03:00
Alexey Palazhchenko	7462733bcb	chore: update golangci-lint Fix context propagation. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-11-15 14:55:25 +00:00
Andrey Smirnov	b6b78e7fef	test: add cluster discovery integration tests This verifies that members match cluster state and that both cluster registries work in sync producing same discovery data. Fixes #4191 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-10-25 21:03:29 +03:00
Andrey Smirnov	a059454045	chore: build using Go 1.17 `initramfs` size for amd64 shrinks by 1.3 MiB. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-09-13 22:33:47 +03:00
Alexey Palazhchenko	eea750de2c	chore: rename "join" type to "worker" Closes #3413. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-07-09 07:10:45 -07:00
Alexey Palazhchenko	ad047a7dee	chore: small RBAC improvements * `talosctl config new` now sets endpoints in the generated config. * Avoid duplication of roles in metadata. * Remove method name prefix handling. All methods should be set explicitly. * Add tests. Closes #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-25 05:50:38 -07:00
Alexey Palazhchenko	3c1b32199d	chore: refactor CLI tests Use testing.T.TempDir. Add support for `talosctl --endpoints`. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-23 05:49:00 -07:00
Alexey Palazhchenko	f63ab9dd9b	feat: implement `talosctl config new` command Refs #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-17 09:06:43 -07:00
Andrey Smirnov	5811f4dda1	feat: implement link (interface) controllers The structure of the controllers is really similar to addresses and routes: * `LinkSpec` resource describes desired link state * `LinkConfig` controller generates `LinkSpecs` based on machine configuration and kernel cmdline * `LinkMerge` controller merges multiple configuration sources into a single `LinkSpec` paying attention to the config layer priority * `LinkSpec` controller applies the specs to the kernel state Controller `LinkStatus` (which was implemented before) watches the kernel state and publishes current link status. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-06-01 09:36:25 -07:00
Alexey Palazhchenko	4fe6912143	test: better `talosctl ls` tests Refs #3018. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-05-20 03:29:21 -07:00
Andrey Smirnov	e0650218a6	feat: support etcd recovery from snapshot on bootstrap When Talos `controlplane` node is waiting for a bootstrap, `etcd` contents can be recovered from a snapshot created with `talosctl etcd snapshot` on a healthy cluster. Bootstrap process goes same way as before, but the etcd data directory is recovered from the snapshot. This flow enables disaster recovery for the control plane: given that periodic backups are available, destroy control plane nodes, re-create them with the same config, and bootstrap one node with the saved snapshot to recover etcd state at the time of the snapshot. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-08 10:15:37 -07:00
Alexey Palazhchenko	df52c13581	chore: fix //nolint directives That's the recommended syntax: https://golangci-lint.run/usage/false-positives/ Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-05 05:58:33 -08:00
Andrey Smirnov	e9fc54f6e3	feat: update Kubernetes to 1.20.3 https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1202 Also updater pkgs for: * talos-systems/pkgs#238 (raspberrypi-firmware update) * talos-systems/pkgs#242 (Linux 5.10.17 + init_on_free=0) Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-19 05:22:34 -08:00
Artem Chernyshev	f96548e165	refactor: extract go-cmd into a separate library To be used in the `go-blockdevice` library. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-02-16 10:31:20 -08:00
Andrey Smirnov	87ccf0eb21	test: clear connection refused errors after reset After node reboot (and gRPC API unavailability), gRPC stack might cache connection refused errors for up to backoff timeout. Explicitly clear such errors in reset tests before trying to read data from the node to verify reset success. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-01 08:11:27 -08:00
Andrey Smirnov	3dae6df27b	test: stabilize upgrade test by running health check several times For single node clusters, control plane is unstable after reboot, run health check several times to let it settle down to avoid failures in subsequent checks. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-11 08:31:01 -08:00
Andrey Smirnov	8560fb9662	chore: enable nlreturn linter Most of the fixes were automatically applied. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-09 06:48:07 -08:00
Andrey Smirnov	ff4d702f77	fix: implement preserving contents of partition on install This fixes A/B upgrades and rollback API. Installer manifest supports now an option to preserve partition contents while disk is being re-partitioned and partitions are re-formatted. Mount `/boot` partition as needed (to find current label before starting the installation and in the rollback API). Fix upgrade API for non-master nodes. Contents of `/boot`, `/system/state` and META partitions are preserved in memory while the disk is re-partitioned. Remove `--save` flag from the installer as it's not being used. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-22 23:56:39 +03:00
Andrey Smirnov	56f1ee37fd	feat: upgrade Kubernetes to 1.19.3 Just minor release bump. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-20 05:12:32 -07:00
Andrey Smirnov	773912833e	test: clean up integration test code, fix flakes This enables golangci-lint via build tags for integration tests (this should have been done long ago!), and fixes the linting errors. Two tests were updated to reduce flakiness: * apply config: wait for nodes to issue "boot done" sequence event before proceeding * recover: kill pods even if they appear after the initial set gets killed (potential race condition with previous test). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-19 15:44:14 -07:00
Andrey Smirnov	dc6ea74c35	fix: random failures in cluster health checks The problem was that some of the health checks sort the list of the nodes in place (via `sort.Strings()`). If cluster info provider returns original slice, it might be mutated in such a way that it gets corrupted. We never noticed it before CAPI clusters, as in our tests IPs are assigned sequentially, and sort operation is a no-op. Specifically, the problem was with the `Nodes()` function, it returns `append(controlPlaneNodes, workerNodes...)` slice, which by definition might share memory with `controlPlaneNodes` slice. For example, if control plane nodes were `4, 5, 6` and worker nodes were `3`, the returned slice will be `4, 5, 6, 3`, and it shares memory with `controlPlaneNodes` slice (firs three items). If we apply `sort` to the returned slice, it re-orders it as `3, 4, 5, 6`, but as it is done in-place, the `controlPlaneNodes` slice is now `3, 4, 5`, which is obviously wrong. Fix that by always returning a copy of the slice from the functions implementing `ClusterInfo` interface. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-08 07:13:24 -07:00
Andrey Smirnov	f6ecf000c9	refactor: extract packages loadbalancer and retry This removes in-tree packages in favor of: * github.com/talos-systems/go-retry * github.com/talos-systems/go-loadbalancer Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-02 13:46:22 -07:00
Marco De Luca	1fbb171fd0	test: determine reboots using boot id Changed the RebootSuite to use /proc/sys/kernel/random/boot_id rather than /proc/uptime Signed-off-by: Marco De Luca <marcodl404@gmail.com>	2020-08-26 06:09:02 -07:00
Andrey Smirnov	bddd4f1bf6	refactor: move external API packages into `machinery/` This moves `pkg/config`, `pkg/client` and `pkg/constants` under `pkg/machinery` umbrella. And `pkg/machinery` is published as Go module inside Talos repository. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-08-17 09:56:14 -07:00
Andrey Smirnov	9379cf9ee1	refactor: expose `provision` as public package This change is only moving packages and updating import paths. Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other projects to import Talos provisioning library. As cluster checks are almost always required as part of provisioning process, package `internal/pkg/cluster` was also made public as `pkg/cluster`. Other changes were direct dependencies discovered by `importvet` which were updated. Public packages (useful, general purpose packages with stable API): * `internal/pkg/conditions` -> `pkg/conditions` * `internal/pkg/tail` -> `pkg/tail` Private packages (used only on provisioning library internally): * `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp` * `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz` * `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils` Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-08-12 05:12:05 -07:00
Andrey Smirnov	3926442704	feat: taint master nodes with `NoSchedule` taint Fixes #2350 This also brings in a fix for `coredns` tolerations from https://github.com/talos-systems/bootkube-plugin/pull/19. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-07-29 14:02:41 -07:00
Andrey Smirnov	47608fb874	refactor: make `pkg/config` not rely on `machined/../internal/runtime` This makes `pkg/config` directly importable from other projects. There should be no functional changes. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-07-29 12:40:12 -07:00
Andrey Smirnov	6a81f30941	test: provide node discovery for cli tests via kubectl Fixes #2330 CLI tests require node discovery as `--nodes` flag is enforced for most of the `talosctl commands`. For clusters created via `talosctl cluster create`, cluster provisioner state provides all the necessary information, but clusters created via CAPI don't have the state attached. API tests rely on Talos and Kubernetes APIs to fetch kubeconfig and access Nodes K8s API. CLI tests should rely only on CLI tools, so we use `kubectl get nodes` + `talosctl kubeconfig` to fetch list of master and worker nodes. This discovery method relies on "bootstrap" node being set in `talosconfig` (to fetch `kubeconfig`). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-07-28 11:35:47 -07:00
Andrey Smirnov	3d8418a689	feat: force nodes to be set in `talosctl` commands using the API With load-balancing enabled by default running `talosctl` without `--nodes` is risky, as it might hit any control plane by default without `--nodes`. Only two commands do not enforce this check, as they do their own node contexts: `crashdump` and `health` (client-side). Integration tests were updated to always supply `--nodes` cli argument, while doing that I refactored the storage for discovered nodes to use existing `cluster.Info` interface. The downside is that with e2e CAPI tests CLI tests will be mostly skipped as we don't support discovery in CLI tests at the momemnt. This can be fixed by using `talosctl kubeconfig` + `kubectl get nodes` for node discovery. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-07-21 12:17:43 -07:00
Andrey Smirnov	a4a2a3c83a	feat: uncordon nodes automatically on boot Talos will mark node as schedulable if it was previously cordoned by Talos (for upgrade, reset, etc.) If user marked node as not schedulable, Talos won't change it on boot. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-07-09 15:32:36 -07:00
Andrey Smirnov	81d1c2bfe7	chore: enable godot linter Issues were fixed automatically. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-06-30 10:39:56 -07:00
Andrey Smirnov	6fb55229a2	test: fix and improve reboot/reset tests These tests rely on node uptime checks. These checks are quite flaky. Following fixes were applied: * code was refactored as common method shared between reset/reboot tests (reboot all nodes does checks in a different way, so it wasn't updated) * each request to read uptime times out in 5 seconds, so that checks don't wait forever when node is down (or connection is aborted) * to account for node availability vs. lower uptime in the beginning of test, add extra elapsed time to the check condition Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-06-29 13:56:48 -07:00
Andrew Rynhard	d0d2ac3c74	test: default to using the bootstrap API This moves our test scripts to using the bootstrap API. Some automation around invoking the bootstrap API was also added to give the same ease of use when creating clusters with the CLI. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2020-06-24 08:46:10 -07:00

1 2

76 Commits