talos

mirror of https://github.com/siderolabs/talos.git synced 2025-08-12 01:27:07 +02:00

Author	SHA1	Message	Date
Dmitriy Matrenichev	fa3b933705	chore: replace fmt.Errorf with errors.New where possible This time, use the `eg` tool from the `x/tools` repo to do this. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-02-14 17:39:30 +03:00
Dmitriy Matrenichev	5324d39167	chore: bump stuff Also fix .golangci.yml file. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-02-09 19:19:25 +03:00
Andrey Smirnov	10c59a6b90	fix: leave discovery service later in the reset sequence Fixes #8057 I went back and forth on the way to fix it exactly, and ended up with a pretty simple version of a fix. The problem was that discovery service was removing the member at the initial phase of reset, which actually still requires KubeSpan to be up: * leaving `etcd` (need to talk to other members) * stopping pods (might need to talk to Kubernetes API with some CNIs) Now leaving discovery service happens way later, when network interactions are no longer required. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-13 19:16:12 +04:00
Andrey Smirnov	36c8ddb5e1	feat: implement ingress firewall rules Fixes #4421 See documentation for details on how to use the feature. With `talosctl cluster create`, the firewall can be easily tested with `--with-firewall=accept|block` (default mode). Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-11-30 22:58:16 +04:00
Noel Georgi	f041b26299	chore: add tests for mdadm extension Add tests for mdadm extension. See: https://github.com/siderolabs/extensions/pull/271 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-11-27 23:18:35 +05:30
Andrey Smirnov	3c9f7a7de6	chore: re-enable nolintlint and typecheck linters Drop startup/rand.go, as since Go 1.20 `rand.Seed` is done automatically. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-25 01:05:41 +04:00
Noel Georgi	6778ded29d	feat: add e2e-aws for nvidia extensions Add e2e tests for nvidia Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-08-24 17:43:36 +05:30
Noel Georgi	833895940b	chore: add tests for zfs extension Add tests for ZFS and btrfs extensions. Also fix the e2e-aws cron pipeline. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-08-23 11:16:25 +05:30
Noel Georgi	6b0373ebef	chore: move bash tests to integration move extensions and secureboot tests to integration. Makes it easier to test. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-08-17 19:58:35 +05:30
Dmitriy Matrenichev	c4a1ca8d61	chore: remove <-errCh where possible in grpc methods Simplify code by passing error directly into the pipe closer. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-08-07 22:28:58 +03:00
Noel Georgi	e3f3f5794d	feat: implement revert for sd-boot Implement revert for sd-boot. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-06-22 20:20:31 +05:30
Andrey Smirnov	badbc51e63	refactor: rewrite code to include preliminary support for multi-doc `config.Container` implements a multi-doc container which implements both `Container` interface (encoding, validation, etc.), and `Config` interface (accessing parts of the config). Refactor `generate` and `bundle` packages to support multi-doc, and provide backwards compatibility. Implement a first (mostly example) machine config document for SideroLink API URL. Many places don't properly support multi-doc yet (e.g. config patches). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-31 18:38:05 +04:00
Noel Georgi	d1a61fd343	chore: bump golangci-lint Bump golangci-lint and fixup new warnings. Ignore check that checks for used function parameters, it's kind of noisy and makes it confusing to read interface implementations. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-03-22 19:55:38 +05:30
Andrey Smirnov	96aa9638f7	chore: rename talos-systems/talos to siderolabs/talos There's a cyclic dependency on siderolink library which imports talos machinery back. We will fix that after we get talos pushed under a new name. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-03 16:50:32 +04:00
Andrey Smirnov	343c55762e	chore: replace talos-systems Go modules with siderolabs This is the first step towards replacing all import paths to be based on `siderolabs/` instead of `talos-systems/`. All updates contain no functional changes, just refactorings to adapt to the new path structure. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-01 12:55:40 +04:00
Andrey Smirnov	2dadcd6695	fix: stop worker nodes from acting as apid routers Don't allow worker nodes to act as apid routers: * don't try to issue client certificate for apid on worker nodes * if a worker node receives an incoming connection with `--nodes` set to one of the node's local addresses, it routes the request to itself without proxying The second point allows using `talosctl -e worker -n worker` to connect directly to the worker if the connection from the control plane is not available for some reason. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-13 15:07:31 +04:00
Dmitriy Matrenichev	29bd632401	chore: remove old build tags syntax This commit removes lines containing old build tag syntax. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-08-24 17:27:01 +03:00
Andrey Smirnov	a6b010a8b4	chore: update Go to 1.19, Linux to 5.15.58 See https://go.dev/doc/go1.19 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-08-03 17:03:58 +04:00
Artem Chernyshev	8028e10749	fix: wait for boot done when rebooting a node in the integration tests We shouldn't start cluster healthcheck until boot sequence is done. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-07-27 23:58:43 +03:00
Artem Chernyshev	ae1bec59e9	feat: allow running only one sequence at a time Fix `Talos` sequencer to run only a single sequence at the same time. Sequence priority was updated to match the table (rows: requested sequence; columns: running sequence, in the order boot / reboot / reset / upgrade): reboot Y / Y / Y / N; reset Y / N / N / N; upgrade Y / N / N / N. With a small addition that `WithTakeover` is still there: if set, priority is ignored. This is mainly used for `Shutdown` sequence invocation, and when doing apply config with reboot enabled. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-07-27 17:21:36 +03:00
Utku Ozdemir	8d2be5e315	feat: extend node definition used in health checks Introduce `cluster.NodeInfo` to represent the basic info about a node which can be used in the health checks. This information, where possible, will be populated by the discovery service in following PRs. Part of siderolabs#5554. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2022-06-13 14:13:42 +02:00
Alexey Palazhchenko	7462733bcb	chore: update golangci-lint Fix context propagation. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-11-15 14:55:25 +00:00
Andrey Smirnov	b6b78e7fef	test: add cluster discovery integration tests This verifies that members match cluster state and that both cluster registries work in sync producing same discovery data. Fixes #4191 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-10-25 21:03:29 +03:00
Andrey Smirnov	a059454045	chore: build using Go 1.17 `initramfs` size for amd64 shrinks by 1.3 MiB. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-09-13 22:33:47 +03:00
Alexey Palazhchenko	f63ab9dd9b	feat: implement `talosctl config new` command Refs #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-17 09:06:43 -07:00
Andrey Smirnov	5811f4dda1	feat: implement link (interface) controllers The structure of the controllers is really similar to addresses and routes: * `LinkSpec` resource describes desired link state * `LinkConfig` controller generates `LinkSpecs` based on machine configuration and kernel cmdline * `LinkMerge` controller merges multiple configuration sources into a single `LinkSpec` paying attention to the config layer priority * `LinkSpec` controller applies the specs to the kernel state Controller `LinkStatus` (which was implemented before) watches the kernel state and publishes current link status. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-06-01 09:36:25 -07:00
Andrey Smirnov	e0650218a6	feat: support etcd recovery from snapshot on bootstrap When Talos `controlplane` node is waiting for a bootstrap, `etcd` contents can be recovered from a snapshot created with `talosctl etcd snapshot` on a healthy cluster. The bootstrap process goes the same way as before, but the etcd data directory is recovered from the snapshot. This flow enables disaster recovery for the control plane: given that periodic backups are available, destroy control plane nodes, re-create them with the same config, and bootstrap one node with the saved snapshot to recover etcd state at the time of the snapshot. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-08 10:15:37 -07:00
Alexey Palazhchenko	df52c13581	chore: fix //nolint directives That's the recommended syntax: https://golangci-lint.run/usage/false-positives/ Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-05 05:58:33 -08:00
Andrey Smirnov	87ccf0eb21	test: clear connection refused errors after reset After node reboot (and gRPC API unavailability), gRPC stack might cache connection refused errors for up to backoff timeout. Explicitly clear such errors in reset tests before trying to read data from the node to verify reset success. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-01 08:11:27 -08:00
Andrey Smirnov	ff4d702f77	fix: implement preserving contents of partition on install This fixes A/B upgrades and rollback API. The installer manifest now supports an option to preserve partition contents while the disk is being re-partitioned and partitions are re-formatted. Mount `/boot` partition as needed (to find current label before starting the installation and in the rollback API). Fix upgrade API for non-master nodes. Contents of `/boot`, `/system/state` and META partitions are preserved in memory while the disk is re-partitioned. Remove `--save` flag from the installer as it's not being used. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-22 23:56:39 +03:00
Andrey Smirnov	56f1ee37fd	feat: upgrade Kubernetes to 1.19.3 Just minor release bump. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-20 05:12:32 -07:00
Andrey Smirnov	773912833e	test: clean up integration test code, fix flakes This enables golangci-lint via build tags for integration tests (this should have been done long ago!), and fixes the linting errors. Two tests were updated to reduce flakiness: * apply config: wait for nodes to issue "boot done" sequence event before proceeding * recover: kill pods even if they appear after the initial set gets killed (potential race condition with previous test). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-10-19 15:44:14 -07:00
Andrey Smirnov	f6ecf000c9	refactor: extract packages loadbalancer and retry This removes in-tree packages in favor of: * github.com/talos-systems/go-retry * github.com/talos-systems/go-loadbalancer Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-02 13:46:22 -07:00
Marco De Luca	1fbb171fd0	test: determine reboots using boot id Changed the RebootSuite to use /proc/sys/kernel/random/boot_id rather than /proc/uptime Signed-off-by: Marco De Luca <marcodl404@gmail.com>	2020-08-26 06:09:02 -07:00
Andrey Smirnov	bddd4f1bf6	refactor: move external API packages into `machinery/` This moves `pkg/config`, `pkg/client` and `pkg/constants` under `pkg/machinery` umbrella. And `pkg/machinery` is published as Go module inside Talos repository. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-08-17 09:56:14 -07:00
Andrey Smirnov	9379cf9ee1	refactor: expose `provision` as public package This change is only moving packages and updating import paths. Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other projects to import Talos provisioning library. As cluster checks are almost always required as part of provisioning process, package `internal/pkg/cluster` was also made public as `pkg/cluster`. Other changes were direct dependencies discovered by `importvet` which were updated. Public packages (useful, general purpose packages with stable API): * `internal/pkg/conditions` -> `pkg/conditions` * `internal/pkg/tail` -> `pkg/tail` Private packages (used only internally by the provisioning library): * `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp` * `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz` * `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils` Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-08-12 05:12:05 -07:00
Andrey Smirnov	47608fb874	refactor: make `pkg/config` not rely on `machined/../internal/runtime` This makes `pkg/config` directly importable from other projects. There should be no functional changes. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-07-29 12:40:12 -07:00
Andrey Smirnov	3d8418a689	feat: force nodes to be set in `talosctl` commands using the API With load balancing enabled by default, running `talosctl` without `--nodes` is risky, as the request might hit any control plane node. Only two commands do not enforce this check, as they do their own node contexts: `crashdump` and `health` (client-side). Integration tests were updated to always supply `--nodes` cli argument, while doing that I refactored the storage for discovered nodes to use existing `cluster.Info` interface. The downside is that with e2e CAPI tests CLI tests will be mostly skipped as we don't support discovery in CLI tests at the moment. This can be fixed by using `talosctl kubeconfig` + `kubectl get nodes` for node discovery. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-07-21 12:17:43 -07:00
Andrey Smirnov	a4a2a3c83a	feat: uncordon nodes automatically on boot Talos will mark node as schedulable if it was previously cordoned by Talos (for upgrade, reset, etc.) If user marked node as not schedulable, Talos won't change it on boot. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-07-09 15:32:36 -07:00
Andrey Smirnov	81d1c2bfe7	chore: enable godot linter Issues were fixed automatically. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-06-30 10:39:56 -07:00
Andrey Smirnov	6fb55229a2	test: fix and improve reboot/reset tests These tests rely on node uptime checks. These checks are quite flaky. The following fixes were applied: * code was refactored as common method shared between reset/reboot tests (reboot all nodes does checks in a different way, so it wasn't updated) * each request to read uptime times out in 5 seconds, so that checks don't wait forever when node is down (or connection is aborted) * to account for node availability vs. lower uptime in the beginning of test, add extra elapsed time to the check condition Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-06-29 13:56:48 -07:00
Andrey Smirnov	795a10b681	test: improve reboot/reset test resiliency against request timeouts After a node reboot, the test code tries endlessly to read the uptime until it goes down. During the actual reboot the API won't be responsive, and this call might time out only when the parent context is canceled; by that time the retry timeout is already exhausted, so no more attempts are made (even though the node successfully booted after the reboot). Example output: `uptime didn't go down: before 219.730000, after 267.020000 uptime didn't go down: before 219.730000, after 268.030000 EOF rpc error: code = DeadlineExceeded desc = context deadline exceeded timeout` Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-05-22 12:31:06 -07:00
Seán C McCord	3e0e01e2c3	fix: refactor client creation API Create a new `client.New` to make external API systems easier to construct. A new type `client.OptionFunc` allows the client to be extended with specific configuration. This also makes a first pass at supporting multiple endpoints properly by creating a custom grpc resolver. (Proper load balancing support is still a TODO.) Fixes #2093 Signed-off-by: Seán C McCord <ulexus@gmail.com>	2020-05-11 10:21:07 -07:00
Andrew Rynhard	56d7bf19fe	feat: add recovery API This adds an API for recovering the self-hosted control plane. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2020-05-04 19:38:30 -07:00
Andrew Rynhard	49307d554d	refactor: improve machined This is a rewrite of machined. It addresses some of the limitations and complexity in the implementation. This introduces the idea of a controller. A controller is responsible for managing the runtime, the sequencer, and a new state type introduced in this PR. A few highlights are: - no more event bus - functional approach to tasks (no more types defined for each task) - the task function definition now offers a lot more context, like access to raw API requests, the current sequence, a logger, the new state interface, and the runtime interface. - no more panics to handle reboots - additional initialize and reboot sequences - graceful gRPC server shutdown on critical errors - config is now stored at install time to avoid having to download it at install time and at boot time - upgrades now use the local config instead of downloading it - the upgrade API's preserve option takes precedence over the config's install force option Additionally, this pulls various packages in under machined to make the code easier to navigate. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2020-04-28 08:20:55 -07:00
Andrey Smirnov	55dcbbc8d0	feat: add commands talosctl health/crashdump This extracts health & crashdump features which were specific to provisioning code into separate package which can be used standalone. Everything else is just new glue. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-04-27 20:43:10 -07:00
Andrey Smirnov	682dd433ba	refactor: move Talos client package to `pkg/` As this implements Go client for Talos API, it makes sense to publish it on the top level. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-04-01 23:45:58 +03:00
Andrew Rynhard	5dbc26c7a3	feat: rename osctl to talosctl This is a rename of the osctl binary. We decided that talosctl is a better name for the Talos CLI. This does not break any APIs, but does make older documentation only accurate for previous versions of Talos. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2020-03-20 19:07:39 -07:00
Andrey Smirnov	d5f80858dd	test: add 'reset' integration test for Reset() API Every node is reset, rebooted and it comes back up again except for the init node due to known issues with init node bootstrapping etcd cluster from scratch when metadata is missing (as node was wiped). Planned workaround is to prohibit resetting init node (should be coming next). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-03-06 23:05:46 +03:00
Andrey Smirnov	afa8a48174	chore: implement reboot test Reboot test does node-by-node reboots followed by cluster health checks (same as done by provisioner). Fixed bug with `Read()` returning `Reader` instead of `ReadCloser` (minor). Allowed `bootkube` to be `Skipped` (for rebooted node). Added support for doing checks via provided client instance. Implemented generic capabilities to skip tests based on cluster platform. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-02-03 11:02:43 -08:00
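The sequence priority table from commit ae1bec59e9 can be expressed as a lookup keyed by the requested and the running sequence. This is a sketch only. The `Sequence` type and `canRun` are hypothetical, and a boolean `takeover` flag stands in for `WithTakeover`. None of these are the machined API.

```go
package main

import "fmt"

// Sequence names a machined sequence. These identifiers are hypothetical and
// only mirror the table in commit ae1bec59e9; they are not the Talos API.
type Sequence string

const (
	Boot    Sequence = "boot"
	Reboot  Sequence = "reboot"
	Reset   Sequence = "reset"
	Upgrade Sequence = "upgrade"
)

// allowed[requested][running] reports whether a requested sequence is
// accepted while another one is running, following the commit's table.
// Requests not listed (e.g. boot) are rejected in this sketch.
var allowed = map[Sequence]map[Sequence]bool{
	Reboot:  {Boot: true, Reboot: true, Reset: true, Upgrade: false},
	Reset:   {Boot: true, Reboot: false, Reset: false, Upgrade: false},
	Upgrade: {Boot: true, Reboot: false, Reset: false, Upgrade: false},
}

// canRun decides whether requested may start while running is in progress.
// takeover models the WithTakeover option, which ignores priority.
func canRun(running, requested Sequence, takeover bool) bool {
	if takeover {
		return true
	}
	return allowed[requested][running]
}

func main() {
	fmt.Println(canRun(Boot, Upgrade, false))   // true
	fmt.Println(canRun(Upgrade, Reboot, false)) // false
	fmt.Println(canRun(Upgrade, Reboot, true))  // true
}
```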
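Commit 1fbb171fd0 switches reboot detection from `/proc/uptime` to `/proc/sys/kernel/random/boot_id`. The Linux kernel generates a new random boot ID on every boot, so two differing samples indicate a reboot without any uptime heuristics. This sketch reads the file locally, whereas the integration tests read it from a remote node through the Talos API. `readBootID` and `rebooted` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// bootIDPath holds a random UUID that the Linux kernel regenerates on every boot.
const bootIDPath = "/proc/sys/kernel/random/boot_id"

// readBootID returns the current boot ID with the trailing newline trimmed.
func readBootID() (string, error) {
	b, err := os.ReadFile(bootIDPath)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

// rebooted reports whether a reboot happened between two samples:
// a different, non-empty boot ID means the node booted again.
func rebooted(before, after string) bool {
	return before != "" && after != "" && before != after
}

func main() {
	id, err := readBootID()
	if err != nil {
		fmt.Println("boot_id not available:", err)
		return
	}
	fmt.Println("current boot ID:", id)
	fmt.Println(rebooted(id, id)) // false: same boot
}
```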


60 Commits