talos

mirror of https://github.com/siderolabs/talos.git synced 2025-12-03 16:41:17 +01:00

Author	SHA1	Message	Date
Nico Berlee	0af8fe2fb5	feat: netstat pod support talosctl netstat -k show all host and non-hostnetwork pods sockets/connections. talosctl netstat namespace/pod shows sockets/connections of a specific pod + autocompletes in the shell. Signed-off-by: Nico Berlee <nico.berlee@on2it.net> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-30 23:39:38 +04:00
Andrey Smirnov	442cb9c1b0	feat: implement APIs to write to META This allows putting keys into the META partition. META contents can be viewed with `talosctl get metakeys`. There is no real use case for it yet, but the next PRs will introduce two special keys which can be written: * platform network config for `metal` * `${code}` variable Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-15 22:17:52 +04:00
Nico Berlee	97048f7c37	feat: netstat in API and client Implements netstat in Talos API and client (talosctl). Signed-off-by: Nico Berlee <nico.berlee@on2it.net> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-09 15:48:30 +04:00
Artem Chernyshev	b520710810	feat: introduce new flag in reset API that makes Talos reset user disks Fixes: https://github.com/siderolabs/talos/issues/6815 Additionally, make it possible to run reset in maintenance mode: to enable a way for resetting system disk and remove all traces of Talos from it. The new reset flow works in a separate sequence, changed disk probe lookup to check the boot partition instead of the ephemeral one. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2023-02-28 15:10:41 +03:00
Andrey Smirnov	96629d5ba6	feat: implement etcd maintenance commands This allows safe recovery from out-of-space quota issues and defragmentation as needed. `talosctl etcd status` command provides lots of information about the cluster health. See docs for more details. Fixes #4889 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-03 23:25:28 +04:00
Andrey Smirnov	89dbb0ecf0	release(v1.4.0-alpha.0): prepare release This is the official v1.4.0-alpha.0 release. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-23 22:32:09 +04:00
Philipp Sauter	4e114ca120	feat: use the etcd member id for etcd operations instead of hostname We add a controller that provides the etcd member id as a resource and change the etcd related commands to support member ids next to hostnames. Fixes: #6223 Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>	2022-11-10 19:17:56 +04:00
Andrey Smirnov	96aa9638f7	chore: rename talos-systems/talos to siderolabs/talos There's a cyclic dependency on siderolink library which imports talos machinery back. We will fix that after we get talos pushed under a new name. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-03 16:50:32 +04:00
Utku Ozdemir	b5da686a7b	feat: add actor ID to events & emit an initial empty event Add a new field `actorID` to the events and populate it with a UUID for the lifecycle actions `reboot`, `reset`, `upgrade` and `shutdown`. This actor ID will be present on all events emitted by this triggered action. We can use this ID later on the client side to be able to track triggered actions. We also emit an event with an empty payload on the events streaming GRPC endpoint when a client connects. The purpose of this event is to signal to the client that the event streaming has actually started. Server-side part of siderolabs/talos#5499. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2022-08-11 15:14:11 +02:00
Noel Georgi	b62b18a972	feat: bump k8s to v1.25.0-beta.0 Bump k8s to v1.25.0-beta.0 Update most kubernetes `master` references to `controlplane` Signed-off-by: Noel Georgi <git@frezbo.dev>	2022-08-10 22:17:53 +05:30
Andrey Smirnov	fe2ee3b100	feat: implement MachineStatus resource Fixes #5789 Example: ```yaml spec: stage: running status: ready: false unmetConditions: - name: staticPods reason: kube-system/kube-controller-manager-talos-default-master-1 not ready, kube-system/kube-scheduler-talos-default-master-1 not ready ``` As events (CLI doesn't show full contents): ``` 172.20.0.2 cbhf2l6f9lrs738hehfg talos/runtime/machine.MachineStatusEvent BOOTING ready: false, unmet conditions: [time network services] ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-08-01 18:36:10 +04:00
Andrey Smirnov	065b59276c	feat: implement packet capture API This uses the `go-packet` library with native bindings for the packet capture (without `libpcap`). This is not the most performant way, but it allows us to avoid CGo. There is a problem with converting network filter expressions (like `tcp port 3222`) into BPF instructions, it's only available in C libraries, but there's a workaround with `tcpdump`. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-07-19 01:23:09 +04:00
Andrey Smirnov	022581d809	release(v1.2.0-alpha.0): prepare release This is the official v1.2.0-alpha.0 release. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-06-30 19:01:07 +04:00
Artem Chernyshev	2b03057b91	feat: implement a new mode `try` in the config manipulation commands The new mode allows changing the config for a period of time, which allows trying the configuration and automatically rolling it back in case if it doesn't work for example. The mode can only be used with changes that can be applied without a reboot. When changed it doesn't write the configuration to disk, only changes it in memory. `--timeout` parameter can be used to customize the rollback delay. The default timeout is 1 minute. Any consequent configuration change will abort try mode and the last applied configuration will be used. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-04-21 20:31:45 +03:00
Artem Chernyshev	2b9722d1f5	feat: add `dry-run` flag in `apply-config` and `edit` commands Dry run prints out config diff, selected application mode without changing the configuration. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-04-14 19:12:57 +03:00
Andrey Smirnov	25d19131d3	release(v1.1.0-alpha.0): prepare release This is the official v1.1.0-alpha.0 release. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-04-01 18:23:19 +03:00
Andrey Smirnov	59681b8c9a	fix: backport fixes from release-1.0 branch They were discovered as we tagged 1.0.0 version: * wrong deprecated version * incompatibility in extension compatibility checks Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-03-04 23:28:06 +03:00
Tim Jones	fe40e7b1b3	feat: drain node on shutdown Cordon & drain a node when the Shutdown message is received. Also adds a '--force' option to the shutdown command in case the control plane is unresponsive. Signed-off-by: Tim Jones <timniverse@gmail.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-02-01 00:06:32 +03:00
Artem Chernyshev	2f2bdb26aa	feat: replace flags with --mode in `apply`, `edit` and `patch` commands Fixes: https://github.com/talos-systems/talos/issues/4588 Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-01-13 16:09:53 +03:00
Andrey Smirnov	cb548a368a	release(v0.15.0-alpha.0): prepare release This is the official v0.15.0-alpha.0 release. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-30 16:27:19 +03:00
Rohit Dandamudi	7f9922296a	feat: add powercycle mode in reboot - Fixes #4569 - Updated reboot process sequence - Updated api.descriptors to avoid proto type change linting error https://github.com/talos-systems/talos/pull/4612#discussion_r758599242 Signed-off-by: Rohit Dandamudi <rohit.dandamudi@siderolabs.com> Signed-off-by: Rohit Dandamudi <rohit.dandamudi@siderolabs.com>	2021-12-02 22:40:04 +05:30
Alexey Palazhchenko	0f169bf9b1	chore: add API deprecations mechanism Refs #4576. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-11-30 06:31:55 +00:00
Alexey Palazhchenko	20d39c0b48	chore: format .proto files Refs #2722. Co-authored-by: Andrey Smirnov <andrey.smirnov@talos-systems.com> Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-11-23 15:05:25 +00:00
Artem Chernyshev	f730252579	feat: add new event types Add config load + validation errors and address + hostnames events. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2021-11-18 18:48:35 +03:00
Andrey Smirnov	dadaa65d54	feat: print uid/gid for the files in `ls -l` This adds information about file ownership in the long listing which is crucial sometimes. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-08-13 00:10:49 +03:00
Andrey Smirnov	eefe1c21c3	feat: add new etcd members in learner mode Fixes #3714 This provides more safe way to join new members to the etcd cluster. See https://etcd.io/docs/v3.4/learning/design-learner/ With learner mode join there are few differences: * new nodes are joined one by one, because etcd enforces a single learner member in the cluster * learner members are not counted in quorum calculations, so while learner catches up with the master node, quorum is not affected and cluster is still operational Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-08-12 17:56:57 +03:00
Andrey Smirnov	0c7ce1cd81	feat: remove remnants of bootkube support Fixes #3951 Bootkube support was removed in Talos 0.9. Talos versions 0.9-0.11 support conversion of self-hosted bootkube-based control plane to the new style control plane running as static pods managed by Talos. This commit removes all backwards compatibility and removes conversion code. For the k8s controllers, `BootstrapStatus` is removed and a dependency on `etcd` service status is added (as it was implicitly there via `BootstrapStatus`). Remove control plane conversion code. In k8s upgrade code, remove self-hosted part. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-08-03 07:55:42 -07:00
Alexey Palazhchenko	eea750de2c	chore: rename "join" type to "worker" Closes #3413. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-07-09 07:10:45 -07:00
Alexey Palazhchenko	bbf1c091d4	feat: add RBAC to `talosctl version` output Refs #3852. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-28 07:10:25 -07:00
Alexey Palazhchenko	06209bba28	chore: update RBAC rules, remove old APIs Refs #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-18 09:54:49 -07:00
Alexey Palazhchenko	f63ab9dd9b	feat: implement `talosctl config new` command Refs #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-17 09:06:43 -07:00
Artem Chernyshev	9a91142a38	feat: print complete member info in etcd members Fixes: https://github.com/talos-systems/talos/issues/3487 Example output: ``` NODE ID HOSTNAME PEERS CLIENTS 10.5.0.2 c3d3020cf75b8728 talos-default-master-1 https://10.5.0.2:2380 https://10.5.0.2:2379 ``` Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-04-17 11:07:59 -07:00
Andrey Smirnov	0bd8b0e800	feat: provide an option to recover etcd from data directory copy Sometimes `talosctl etcd snapshot` might not be available, for example when etcd is not healthy. In that case it's possible to copy raw etcd data directory with `talosctl cp /var/lib/etcd .` and use `member/snap/db` to recover the cluster. But such copy won't pass integrity checks, so they should be disabled explicitly. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-14 08:25:32 -07:00
Alexey Palazhchenko	29da22d063	feat: add config validation warnings Closes #3412. Refs #3413. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-04-08 13:49:58 -07:00
Andrey Smirnov	e0650218a6	feat: support etcd recovery from snapshot on bootstrap When Talos `controlplane` node is waiting for a bootstrap, `etcd` contents can be recovered from a snapshot created with `talosctl etcd snapshot` on a healthy cluster. Bootstrap process goes same way as before, but the etcd data directory is recovered from the snapshot. This flow enables disaster recovery for the control plane: given that periodic backups are available, destroy control plane nodes, re-create them with the same config, and bootstrap one node with the saved snapshot to recover etcd state at the time of the snapshot. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-08 10:15:37 -07:00
Andrey Smirnov	e664362cec	feat: add API and command to save etcd snapshot (backup) This adds a simple API and `talosctl etcd snapshot` command to stream snapshot of etcd from one of the control plane nodes to the local file. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-02 09:20:16 -07:00
Artem Chernyshev	376fdcf6cb	feat: implement etcd remove-member cli command Fixes: https://github.com/talos-systems/talos/issues/3219 We already have `etcd leave`, which makes the node exclude itself from etcd members. But in case if the node can't remove itself because it doesn't have connection to etcd we need this etcd remove-member cli, which basically removes a node from a different node. No unit tests for that as it's going to destroy the test cluster. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-03-01 07:55:08 -08:00
Andrey Smirnov	7751920dba	feat: add a tool and package to convert self-hosted CP to static pods This is required to upgrade from Talos 0.8.x to 0.9.x. After the cluster is fully upgraded, control plane is still self-hosted (as it was bootstrapped with bootkube). Tool `talosctl convert-k8s` (and library behind it) performs the upgrade to self-hosted version. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-17 23:26:57 -08:00
Andrey Smirnov	cc83b83808	feat: rename apply-config --no-reboot to --on-reboot This explains the intention better: config is applied on reboot, and allows to easily distinguish it from `apply-config --immediate` which applies config immediately without a reboot (that is coming in a different PR). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-17 12:49:47 -08:00
Andrey Smirnov	d99a016af2	fix: correct response structure for GenerateConfig API Also fix recovery grpc handler to print panic stacktrace to the log. Any API should follow the structure compatible with apid proxying injection of errors/nodes. Explicitly fail GenerateConfig API on worker nodes, as it panics on worker nodes (missing certificates in node config). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-11 06:34:10 -08:00
Andrey Smirnov	edf5777222	feat: add an option to force upgrade without checks Our upgrades are safe by default - we check etcd health, take locks, etc. But sometimes upgrades might be a way to recover broken (or semi-broken) cluster, in that case we need upgrade to run even if the checks are not passing. This is not a safe way to do upgrades, but it might be a way to recover a cluster. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-02-09 10:20:03 -08:00
Andrey Smirnov	76a6794436	fix: kill all processes and umount all disk on reboot/shutdown There are several ways Talos node might be restarted or shut down: * error in sequence (initiated from machined) * panic in main goroutine (machined recovers panics) * error in sequence (initiated via API, event caught by machined) * reboot/shutdown via Talos API Before this change, paths (1) and (2) were handled in machined, and no disks were unmounted and processes killed, so technically all the processes are running and potentially writing to the filesystems. Paths (3) and (4) try to stop services (but not pods) and unmount explicitly mounted filesystems, followed by reboot directly from sequencer (bypassing machined handler). There was a bug that user disks were never explicitly unmounted (but they might have been unmounted if mounted on top `/var`). This refactors all the reboot/shutdown paths to flow through machined's main function: on paths (4) event is sent via event API from the sequencer back to the machined and machined initiates proper shutdown sequence. Refactoring in machined leads to all the paths (1)-(4) flowing through the same function `handle(error)`. Added two additional checks before flushing buffers: * kill all non-system processes, this also kills all mount namespaces * unmount any filesystem backed by `/dev/*` This ensures all filesystems are unmounted before buffers are flushed. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-01-29 06:14:07 -08:00
Andrey Smirnov	0aaf8fa968	feat: replace bootkube with Talos-managed control plane Control plane components are running as static pods managed by the kubelets. Whole subsystem is managed via resources/controllers from os-runtime. Many supporting changes/refactoring to enable new code paths. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-01-26 14:22:35 -08:00
Alexey Palazhchenko	f3465b8e3e	feat: support type filter in list API and CLI Closes #2068. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2020-12-24 06:34:02 -08:00
Artem Chernyshev	a83e8758db	feat: add commands to manage/query etcd cluster Used already existing protobufs for that. Commands: `talosctl etcd members -n <node>` `talosctl etcd leave -n <node>` `talosctl etcd forfeit-leadership -n <node>` Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-12-22 11:49:10 -08:00
Andrey Smirnov	54ed80e244	feat: reset with system disk wipe spec Idea is to add an option to perform "selective" reset: default reset operation is to wipe all partitions (triggering reinstall), while spec allows only to wipe some of the operations. Other operations are performed exactly in the same way for any reset flow. Possible use case: reset only `EPHEMERAL` partition. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-10 11:31:07 -08:00
Andrey Smirnov	350280eb59	feat: implement "staged" (failsafe/backup) upgrades Regular upgrade path takes just one reboot, but it requires all the processes to be stopped on the node before upgrade might proceed. Under some circumstances and with potential Talos bugs it might not work rendering Talos upgrades almost impossible. Staged upgrades build upon regular install flow to run the upgrade on the node reboot. Such upgrades require two reboots of the node, and it requires two pulls of the installer image, but they should be much less susceptible to failure. Once the upgrade is staged, node can be rebooted in any possible way, including hard reset and upgrade is performed on the next boot. New ADV format was implemented as well to allow to store install image ref/options across reboots. New format allows for bigger values and takes 50% of the `META` partition. Old ADV is still kept for compatibility reasons. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-12-08 08:34:26 -08:00
Artem Chernyshev	5d48bd5f6a	feat: allow disabling NoSchedule taint on masters using TUI installer I think this should come handy for setting up single node SBC clusters. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-12-07 07:31:54 -08:00
Artem Chernyshev	63e0d02aa9	feat: add TUI for configuring network interfaces settings Allows configuring: - cidr. - dhcp enable/disable. - MTU. - Ignore. - Dhcp metric. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-12-03 11:05:55 -08:00
Artem Chernyshev	c7062e3f4d	feat: make GenerateConfiguration accept current time as a parameter If the node time is out of sync, it can generate incorrect configuration. And maintenance mode does not allow us starting ntp, because there is no containerd. By providing current UTC time of the machine where talosctl client is running, it is possible to force GenerateConfiguration use correct time. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2020-12-03 08:28:11 -08:00


92 Commits