talos

mirror of https://github.com/siderolabs/talos.git synced 2025-11-07 11:51:49 +01:00

Author	SHA1	Message	Date
Andrey Smirnov	bb02dd263c	chore: drop deprecated stuff for Talos 1.5 * drop old resources API, which was deprecated long time ago * use bootstrapped event in `talosctl get --watch` to better align columns in the table output Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-18 19:46:37 +04:00
Andrey Smirnov	442cb9c1b0	feat: implement APIs to write to META This allows to put keys to META partition. META contents can be viewed with `talosctl get metakeys`. There is no real use case for it yet, but the next PRs will introduce two special keys which can be written: * platform network config for `metal` * `${code}` variable Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-15 22:17:52 +04:00
Nico Berlee	97048f7c37	feat: netstat in API and client Implements netstat in Talos API and client (talosctl). Signed-off-by: Nico Berlee <nico.berlee@on2it.net> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-09 15:48:30 +04:00
Andrey Smirnov	96629d5ba6	feat: implement etcd maintenance commands This allows to safely recover out of space quota issues, and perform defragmentation as needed. `talosctl etcd status` command provides lots of information about the cluster health. See docs for more details. Fixes #4889 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-03 23:25:28 +04:00
Philipp Sauter	4e114ca120	feat: use the etcd member id for etcd operations instead of hostname We add a controller that provides the etcd member id as a resource and change the etcd related commands to support member ids next to hostnames. Fixes: #6223 Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>	2022-11-10 19:17:56 +04:00
Andrey Smirnov	96aa9638f7	chore: rename talos-systems/talos to siderolabs/talos There's a cyclic dependency on siderolink library which imports talos machinery back. We will fix that after we get talos pushed under a new name. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-03 16:50:32 +04:00
Artem Chernyshev	b2fec3c975	fix: properly handle `configContext` being `nil` in Talos client Client crashes if you try using it in the unixSocket mode. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-08-31 17:31:20 +03:00
Philipp Sauter	f37da96ef3	feat: enable talos client to connect to Talos through an auth proxy Talos client can connect to Talos API via a proxy with basic auth. Additionally it is now optional to specify a TLS CA,key or crt. Optionally Developers can build talosctl with WITH_DEBUG=1 to allow insecure connections when http:// endpoints are specified. Fixes #5980 Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>	2022-08-15 18:05:26 +02:00
Andrey Smirnov	9baca49662	refactor: implement COSI resource API for Talos Overview: deprecate existing Talos resource API, and introduce new COSI API. Consequences: * COSI API can only go via one-2-one proxy (`client.WithNode`) * client-side API access is way easier with `state.State` wrappers * lots of small changes on the client side to use new APIs Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-08-12 22:31:54 +04:00
Utku Ozdemir	d04211f85f	feat: add new event watch fn and return action responses on API Add a new function EventsWatchV2 that blocks until receiving the first event then switches to non-blocking mode. Also add new API functions to return responses of the lifecycle actions `reboot`, `reset` and `shutdown`. Required for the client-side part of siderolabs/talos#5499. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2022-08-12 19:07:02 +02:00
Philipp Sauter	6b23deddcf	feat: support custom ports for connecting to apid from talosctl Users can now add a port suffix to the endpoints used by talosctl. Either in the CLI flag or the ~/.talos/config. The default port is still 50000. Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>	2022-08-11 16:52:46 +02:00
Andrey Smirnov	ec05aee040	fix: correctly unwrap errors when streaming When message is sent via the proxy, `metadata.error` carries only string representation which can't be unmarshalled back into an `error` which we can match against. A similar fix was already done for "unary" responses, but we missed the streaming case. This fixes a spurious failure in integration tests when calling `talosctl pcap --duration 1s`. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-07-26 23:52:37 +04:00
Andrey Smirnov	065b59276c	feat: implement packet capture API This uses the `go-packet` library with native bindings for the packet capture (without `libpcap`). This is not the most performant way, but it allows us to avoid CGo. There is a problem with converting network filter expressions (like `tcp port 3222`) into BPF instructions, it's only available in C libraries, but there's a workaround with `tcpdump`. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-07-19 01:23:09 +04:00
Dmitriy Matrenichev	9b9191c5e7	fix: increase initial window and connection window sizes For #4950 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-03-25 01:38:59 +04:00
Andrey Smirnov	f477507262	fix: the etcd recovery client and tests This is the follow-up fix to the PR #5129. 1. Correctly catch only expected errors in the tests. 2. Rewind the snapshot each time the upload is retried. 3. Correctly unwrap errors in the `EtcdRecovery` client. 4. Update the `grpc-proxy` library to pass through the EOF error. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-03-22 16:51:36 +03:00
Tim Jones	fe40e7b1b3	feat: drain node on shutdown Cordon & drain a node when the Shutdown message is received. Also adds a '--force' option to the shutdown command in case the control plane is unresponsive. Signed-off-by: Tim Jones <timniverse@gmail.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-02-01 00:06:32 +03:00
Rohit Dandamudi	7f9922296a	feat: add powercycle mode in reboot - Fixes #4569 - Updated reboot process sequence - Updated api.descriptors to avoid proto type change linting error https://github.com/talos-systems/talos/pull/4612#discussion_r758599242 Signed-off-by: Rohit Dandamudi <rohit.dandamudi@siderolabs.com>	2021-12-02 22:40:04 +05:30
Andrey Smirnov	c97becdd95	chore: remove interfaces and routes APIs Fixes #4279 These APIs were deprecated in 0.13, now it's time to drop them for 0.14. They were not used anywhere in Talos, so no changes on Talos side. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-10-27 15:34:17 +03:00
Andrey Smirnov	b450b7cef0	chore: deprecate Interfaces and Routes APIs Fixes #4094 Deprecate old networkd APIs, `talosctl interfaces` and `talosctl routes` now suggest different commands to be used to achieve same task. TUI installer was updated to stop using Interfaces API. Those APIs will be completely removed in 0.14. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-09-27 15:21:02 +03:00
Andrey Smirnov	b969e7720e	chore: update references to old protobuf package This simply uses new protobuf package instead of old one. Old protobuf package is still in use by Talos dependencies. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-07-08 05:34:12 -07:00
Andrey Smirnov	10c28758a4	fix: ignore DeadlineExceeded error correctly on bootstrap The problem was that gRPC method `status.Code(err)` doesn't unwrap errors, while Talos client returns errors wrapped with `multierror.Error` and `fmt.Errorf`, so `status.Code` doesn't return error code correctly. Fix that by introducing our own client method which correctly goes over the chain of wrapped errors. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-07-07 12:02:26 -07:00
Alexey Palazhchenko	ad047a7dee	chore: small RBAC improvements * `talosctl config new` now sets endpoints in the generated config. * Avoid duplication of roles in metadata. * Remove method name prefix handling. All methods should be set explicitly. * Add tests. Closes #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-25 05:50:38 -07:00
Artem Chernyshev	7672435e16	feat: add a method to get gRPC connection from the client This change is for Theila which is going to use gRPC proxy to forward requests from TS frontend right to the node's apid. `gRPC` proxy operates on top of `grpc.ClientConn` objects, so getting this connection from the clients which are already being created is the easiest path. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-06-24 23:02:12 +03:00
Alexey Palazhchenko	06209bba28	chore: update RBAC rules, remove old APIs Refs #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-18 09:54:49 -07:00
Alexey Palazhchenko	f63ab9dd9b	feat: implement `talosctl config new` command Refs #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-17 09:06:43 -07:00
Andrey Smirnov	e0650218a6	feat: support etcd recovery from snapshot on bootstrap When Talos `controlplane` node is waiting for a bootstrap, `etcd` contents can be recovered from a snapshot created with `talosctl etcd snapshot` on a healthy cluster. Bootstrap process goes same way as before, but the etcd data directory is recovered from the snapshot. This flow enables disaster recovery for the control plane: given that periodic backups are available, destroy control plane nodes, re-create them with the same config, and bootstrap one node with the saved snapshot to recover etcd state at the time of the snapshot. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-08 10:15:37 -07:00
Andrey Smirnov	e664362cec	feat: add API and command to save etcd snapshot (backup) This adds a simple API and `talosctl etcd snapshot` command to stream snapshot of etcd from one of the control plane nodes to the local file. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-02 09:20:16 -07:00
Alexey Palazhchenko	df52c13581	chore: fix //nolint directives That's the recommended syntax: https://golangci-lint.run/usage/false-positives/ Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-05 05:58:33 -08:00
Artem Chernyshev	376fdcf6cb	feat: implement etcd remove-member cli command Fixes: https://github.com/talos-systems/talos/issues/3219 We already have `etcd leave`, which makes the node exclude itself from etcd members. But in case if the node can't remove itself because it doesn't have connection to etcd we need this etcd remove-member cli, which basically removes a node from a different node. No unit tests for that as it's going to destroy the test cluster. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-03-01 07:55:08 -08:00
Andrey Smirnov	32d2588528	test: update integration tests to use wrapped client for etcd APIs This continues the fix from #3167. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-18 08:08:48 -08:00
Andrey Smirnov	254e0e91e1	fix: correctly unwrap responses for etcd commands This uses wrappers which helps to unwrap errors from proxied apid responses. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-17 11:33:54 -08:00
Andrey Smirnov	d99a016af2	fix: correct response structure for GenerateConfig API Also fix recovery grpc handler to print panic stacktrace to the log. Any API should follow the structure compatible with apid proxying injection of errors/nodes. Explicitly fail GenerateConfig API on worker nodes, as it panics on worker nodes (missing certificates in node config). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-11 06:34:10 -08:00
Andrey Smirnov	df0099036c	fix: correctly extract wrapped error messages In our client API, as the request goes through `apid` proxying, actual error might be wrapped into the response (to support multi-node requests), so it should always be correctly unwrapped. This has UX issue: `talosctl apply-config` silently doesn't work (server API fails, but no error is shown in `talosctl`). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-11 04:54:49 -08:00
Andrey Smirnov	edf5777222	feat: add an option to force upgrade without checks Our upgrades are safe by default - we check etcd health, take locks, etc. But sometimes upgrades might be a way to recover broken (or semi-broken) cluster, in that case we need upgrade to run even if the checks are not passing. This is not a safe way to do upgrades, but it might be a way to recover a cluster. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-09 10:20:03 -08:00
Andrey Smirnov	0aaf8fa968	feat: replace bootkube with Talos-managed control plane Control plane components are running as static pods managed by the kubelets. Whole subsystem is managed via resources/controllers from os-runtime. Many supporting changes/refactoring to enable new code paths. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-01-26 14:22:35 -08:00
Andrey Smirnov	11863dd74d	feat: implement resource API in Talos This brings in `os-runtime` package and exposes resources with first iteration of read-only API. Two Talos resources (and one controller) are implemented: * legacy.Service resource tracks Talos 'service' `RUNNING` state * config.V1Alpha1 stores current runtime config Glue point between existing runtime and new os-runtime based runtime is in `v1alpha2` implementation and `V1Alpha2()` sub-interfaces of existing `Runtime`, `State`, `Controller` interfaces. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-01-19 11:45:46 -08:00
Andrey Smirnov	54ed80e244	feat: reset with system disk wipe spec Idea is to add an option to perform "selective" reset: default reset operation is to wipe all partitions (triggering reinstall), while spec allows only to wipe some of the partitions. Other operations are performed exactly in the same way for any reset flow. Possible use case: reset only `EPHEMERAL` partition. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-10 11:31:07 -08:00
Andrey Smirnov	350280eb59	feat: implement "staged" (failsafe/backup) upgrades Regular upgrade path takes just one reboot, but it requires all the processes to be stopped on the node before upgrade might proceed. Under some circumstances and with potential Talos bugs it might not work rendering Talos upgrades almost impossible. Staged upgrades build upon regular install flow to run the upgrade on the node reboot. Such upgrades require two reboots of the node, and it requires two pulls of the installer image, but they should be much less susceptible to failure. Once the upgrade is staged, node can be rebooted in any possible way, including hard reset and upgrade is performed on the next boot. New ADV format was implemented as well to allow to store install image ref/options across reboots. New format allows for bigger values and takes 50% of the `META` partition. Old ADV is still kept for compatibility reasons. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-08 08:34:26 -08:00
Andrey Smirnov	666e4feb73	fix: defer resolving config context in client code Now config context is resolved only when it is about to be used, so that client can operate in config-less mode if config is not required (e.g. when doing `apply-config --insecure`). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-03 07:16:22 -08:00
Artem Chernyshev	b6874ee82a	feat: add TUI based talos interactive installer This is initial commit of the installer. What's done: - verifying node availability before starting any operations. - gathering information about disks on the machine. - allows setting: install disk, hostname, machine type, installer image, kubernetes version, dns domain, cluster-name. - dumps/merges talosconfig to a file after applying configuration. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-18 12:34:15 -08:00
Artem Chernyshev	0f924b5122	feat: add generate config gRPC API Fixes: https://github.com/talos-systems/talos/issues/2766 This API is implemented in Maintenance and Machine services. Can be used to generate configuration on the node, instead of using talosctl to generate it locally. To be used in interactive installer and talosctl gen config. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-13 08:07:32 -08:00
Artem Chernyshev	93e30a1738	chore: remove maintenance service interface and use machine service Now maintenance service implements `MachineService` interface, stubbing all not implemented methods. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-11-11 12:33:44 -08:00
Andrey Smirnov	b2b86a622e	fix: remove 'token creds' from maintenance service This fixes the reverse Go dependency from `pkg/machinery` to `talos` package. Add a check to `Dockerfile` to prevent `pkg/machinery/go.mod` getting out of sync, this should prevent problems in the future. Fix potential security issue in `token` authorizer to deny requests without grpc metadata. In provisioner, add support for launching nodes without the config (config is not delivered to the provisioned nodes). Breaking change in `pkg/provision`: now `NodeRequest.Type` should be set to the node type (as config can be missing now). In `talosctl cluster create` add a flag to skip providing config to the nodes so that they enter maintenance mode, while the generated configs are written down to disk (so they can be tweaked and applied easily). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-09 14:10:32 -08:00
Andrey Smirnov	a2efa44663	chore: enable gci linter Fixes were applied automatically. Import ordering might be questionable, but it's strict: * stdlib * other packages * same package imports Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-11-09 08:09:48 -08:00
Andrew Rynhard	562f816526	refactor: use gRPC for interactive installation Instead of hosting a web service, we decided to implement a gRPC service that exposes APIs that can be used in a client-side interactive installer. Signed-off-by: Andrew Rynhard <andrew@rynhard.io>	2020-11-03 08:36:44 -08:00
Artem Chernyshev	04e267a550	feat: handle unsupported commands being called for docker Return proper message back to the client in case if called method is not supported by mode any particular node runs in. Fixes: https://github.com/talos-systems/talos/issues/2629 Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-10-14 13:44:38 -07:00
Artem Chernyshev	e7e99cf1b3	feat: support disk usage command in talosctl Usage example: ```bash talosctl du --nodes 10.5.0.2 /var -H -d 2 NODE NAME 10.5.0.2 8.4 kB etc 10.5.0.2 1.3 GB lib 10.5.0.2 16 MB log 10.5.0.2 25 kB run 10.5.0.2 4.1 kB tmp 10.5.0.2 1.3 GB . ``` Supported flags: - `-a` writes counts for all files, not just directories. - `-d` recursion depth - '-H' humanize size outputs. - '-t' size threshold (skip files if < size or > size). Fixes: https://github.com/talos-systems/talos/issues/2504 Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-10-13 09:30:31 -07:00
Seán C McCord	ff92d2a14b	feat: add ApplyConfiguration API Adds the ability to apply (replace) an existing node configuration with a new one via the Machine API. Fixes #2345 Signed-off-by: Seán C McCord <ulexus@gmail.com>	2020-09-29 14:44:06 -07:00
Andrey Smirnov	2b7d8e7343	fix: improve error message on empty config Fixes #2096 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-09-17 22:17:21 +03:00
Andrey Smirnov	bddd4f1bf6	refactor: move external API packages into `machinery/` This moves `pkg/config`, `pkg/client` and `pkg/constants` under `pkg/machinery` umbrella. And `pkg/machinery` is published as Go module inside Talos repository. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-08-17 09:56:14 -07:00

50 Commits