talos

mirror of https://github.com/siderolabs/talos.git synced 2025-10-09 22:51:12 +02:00

Author	SHA1	Message	Date
Dmitriy Matrenichev	e06e1473b0	feat: update golangci-lint to 1.45.0 and gofumpt to 0.3.0 - Update golangci-lint to 1.45.0 - Update gofumpt to 0.3.0 - Fix gofumpt errors - Add goimports and format imports since gofumports is removed - Update Dockerfile - Fix .golangci.yml configuration - Fix linting errors Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-03-24 08:14:04 +04:00
Andrey Smirnov	883d401f9f	chore: rename github organization to siderolabs Go module import paths still use talos-systems, packages use new siderolabs name. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-03-23 21:07:46 +03:00
Andrey Smirnov	f477507262	fix: the etcd recovery client and tests This is the follow-up fix to the PR #5129. 1. Correctly catch only expected errors in the tests. 2. Rewind the snapshot each time the upload is retried. 3. Correctly unwrap errors in the `EtcdRecovery` client. 4. Update the `grpc-proxy` library to pass through the EOF error. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-03-22 16:51:36 +03:00
Artem Chernyshev	27af5d41c6	feat: pause the boot process on some failures instead of rebooting Some failures can be fixed by updating the machine configuration. Now failures in `userDisks` and `userFiles` no longer put Talos into a reboot loop; instead, the boot process pauses for 35 minutes. Additionally, `apid` and `machined` are now started right after containerd is up and running. That makes it possible for the operator to connect to the node using talosctl and fix the config. Fixes: https://github.com/talos-systems/talos/issues/4669 Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-03-21 17:39:45 +03:00
Artem Chernyshev	a50747a64a	fix: align list and diskusage command flags with their Linux analogs Fixes: https://github.com/talos-systems/talos/issues/3018 Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-03-02 22:27:56 +03:00
Andrey Smirnov	09efa62f68	chore: re-enable kexec and default to UEFI booting in tests Fixes #4947 It turns out there's something related to boot process in BIOS mode which leads to initramfs corruption on later `kexec`. Booting via GRUB is always successful. Problem with kexec was confirmed with: * direct boot via QEMU * QEMU boot via iPXE (bundled with QEMU) The root cause is not known, but the only visible difference is the placement of RAMDISK with UEFI and BIOS boots: ``` [ 0.005508] RAMDISK: [mem 0x312dd000-0x34965fff] ``` or: ``` [ 0.003821] RAMDISK: [mem 0x711aa000-0x747a7fff] ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-03-02 21:52:18 +03:00
Andrey Smirnov	0da370dfef	test: unlock CABPT/CACPPT provider versions We should always test latest versions of our providers. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-02-10 00:14:15 +03:00
Andrey Smirnov	85782faa24	feat: update Kubernetes to 1.23.3 Also bumps some dependencies and updates Talos version we use in the upgrade tests. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-01-26 17:59:21 +03:00
Artem Chernyshev	2f2bdb26aa	feat: replace flags with --mode in `apply`, `edit` and `patch` commands Fixes: https://github.com/talos-systems/talos/issues/4588 Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2022-01-13 16:09:53 +03:00
Andrey Smirnov	2f4b9d8d6d	feat: make machine configuration read-only in Talos (almost) Talos shouldn't try to re-encode the machine config it was provided with. So add a `ReadonlyWrapper` around `v1alpha1.Config` which makes sure that the raw config object is not available anymore (it's a private field), but config accessors are available for read-only access. Another thing that `ReadonlyWrapper` does is that it preserves the original `[]byte` encoding of the config, keeping it exactly the same way as it was loaded from file or read over the network. Improved `talosctl edit mc` to preserve the config as it was submitted, and preserve the edits on error from Talos (previously edits were lost). `ReadonlyWrapper` is not used on the config generation path though - config there is represented by `v1alpha1.Config` and can be freely modified. Why almost? Some parts of Talos (platform code) patch the machine configuration with new data. We need to fix platforms to provide networking configuration in a different way, but this will come with other PRs later. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-28 20:12:55 +03:00
Andrey Smirnov	d2a7e082c2	test: retry in discovery tests Sometimes pushing/pulling to Kubernetes registry is delayed due to backoff on failed attempts to talk to the API server when the cluster is still bootstrapping. Work around that by adding retries. Also disable the kernel module controller in container mode, as it always fails there. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-28 16:55:41 +03:00
Andrey Smirnov	c297d66a13	test: attempt number two on proper retries in CLI time tests See #4702 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-22 18:29:34 +03:00
Andrey Smirnov	17c1474881	test: retry `talosctl time` call in the tests As `talosctl time` relies on the default time server set in the config, and our nodes start with `pool.ntp.org`, the request to the time server sometimes fails, which fails the tests. Retry such errors in the tests to avoid spurious failures. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-17 20:55:06 +03:00
Noel Georgi	4c96e936ed	docs: add cilium guide - Add Cilium CNI install guide - Use Canal CNI for default examples Fixes #4477 Signed-off-by: Noel Georgi <git@frezbo.dev>	2021-12-16 20:37:03 +05:30
Andrey Smirnov	ec641f7296	fix: use default time servers in time API if none are configured This fixes a simple bug: ``` $ talosctl -n 172.20.0.2 time error fetching time: 1 error occurred: * 172.20.0.2: rpc error: code = Unknown desc = no time servers configured ``` After the change: ``` $ talosctl -n 172.20.0.2 time NODE NTP-SERVER NODE-TIME NTP-SERVER-TIME 172.20.0.2 pool.ntp.org 2021-12-10 14:25:38.871656717 +0000 UTC 2021-12-10 14:25:38.92119139 +0000 UTC ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-10 17:39:36 +03:00
Andrey Smirnov	97ffa7a645	feat: upgrade kubelet version in `talosctl upgrade-k8s` Fixes #4656 As now changes to kubelet configuration can be applied without a reboot, `talosctl upgrade-k8s` can handle the kubelet upgrades as well. The gist is simply modifying machine config and waiting for `Node` version to be updated, rest of the code is required for reliability of the process. Also fixed a bug in the API while watching deleted items with tombstones. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-08 21:12:17 +03:00
Andrey Smirnov	64a4f6e77c	test: bump Talos versions in upgrade tests In preparation for going 0.14-beta.0, bump versions in upgrade tests. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-12-06 18:07:24 +03:00
Artem Chernyshev	4f5d9da922	feat: allow overriding KSPP kernel parameters Fixes: https://github.com/talos-systems/talos/issues/4385 Now sysctls defined in the config can override kernel args defined by defaults controller. In that case controller shows the warning that tells which param was overridden and the new value and tells that it is not recommended. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2021-12-03 18:50:21 +03:00
Rohit Dandamudi	7f9922296a	feat: add powercycle mode in reboot - Fixes #4569 - Updated reboot process sequence - Updated api.descriptors to avoid proto type change linting error https://github.com/talos-systems/talos/pull/4612#discussion_r758599242 Signed-off-by: Rohit Dandamudi <rohit.dandamudi@siderolabs.com>	2021-12-02 22:40:04 +05:30
Nico Berlee	852bf4a7de	feat: talosctl fish completion support Generate talosctl completion for fish Signed-off-by: Nico Berlee <nico.berlee@on2it.net> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-11-23 16:45:16 +03:00
Andrey Smirnov	753a82188f	refactor: move pkg/resources to machinery Fixes #4420 No functional changes, just moving packages around. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-11-15 19:50:35 +03:00
Alexey Palazhchenko	7462733bcb	chore: update golangci-lint Fix context propagation. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-11-15 14:55:25 +00:00
Andrey Smirnov	a76f6d69db	feat: allow kubelet to be restarted and provide negative nodeIP subnets Fixes #4407 fixes #4489 This PR started by enabling simple restart of the `kubelet` service via services API, but it turned out there's a problem: When kubelet restarts, CNI is already up, so there's an interface on the host with CNI node IP, the code which picks kubelet node IP finds it and tries to add it to the list of kubelet node IPs which completely breaks kubelet. Solution was easy: allow node IPs to be filtered out - e.g. we never want kubelet node IP to be from the pod CIDR. But this filtering feature is also useful in other cases, so I added that as well. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-11-15 15:43:34 +03:00
Andrey Smirnov	d4b0ca21a1	test: retry upgrade mutex lock failures With recent changes and kexec, Talos upgrades much faster in the tests and mutex is not released properly (#4525). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-11-12 17:49:46 +03:00
Artem Chernyshev	efbae7857d	fix: use etc folder for du cli tests Fixes: https://github.com/talos-systems/talos/issues/4382 Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2021-11-10 20:10:40 +03:00
Artem Chernyshev	261c497c71	feat: implement `talosctl support` command Fixes: https://github.com/talos-systems/talos/issues/4406 Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2021-11-08 16:20:50 +03:00
Andrey Smirnov	8329d21114	chore: split polymorphic RootSecret resource into specific types Fixes #4418 Only one resource (one of the very first ones) was polymorphic: its actual spec type depends on its ID. This was a bad idea, and it doesn't work with protobuf specs (as type <> protobuf relationship can't be established). Refactor this by splitting into three separate resource types: `OSRoot` (OS-level root secrets), `EtcdRoot` (for etcd), `KubernetesRoot` (for Kubernetes). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-10-27 19:56:04 +03:00
Andrey Smirnov	b6b78e7fef	test: add cluster discovery integration tests This verifies that members match cluster state and that both cluster registries work in sync producing same discovery data. Fixes #4191 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-10-25 21:03:29 +03:00
Andrey Smirnov	38516a5499	test: update Talos versions in upgrade tests Now 0.13.0 is the past release and 0.12.3 is the one before it. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-10-21 17:36:30 +03:00
Andrey Smirnov	3e100aa977	test: workaround EventsWatch test flakiness This test sometimes fails with a message like: ``` === RUN TestIntegration/api.EventsSuite/TestEventsWatch assertion_compare.go:323: Error Trace: events.go:88 Error: "0" is not greater than or equal to "14" Test: TestIntegration/api.EventsSuite/TestEventsWatch Messages: [] ``` I believe the root cause is that the initial (first event) delivery might take more than 100ms, so instead of waiting for 100ms for each event, block for 500ms for all events to arrive. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-10-15 12:51:56 +03:00
Andrey Smirnov	b450b7cef0	chore: deprecate Interfaces and Routes APIs Fixes #4094 Deprecate old networkd APIs, `talosctl interfaces` and `talosctl routes` now suggest different commands to be used to achieve same task. TUI installer was updated to stop using Interfaces API. Those APIs will be completely removed in 0.14. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-09-27 15:21:02 +03:00
Andrey Smirnov	d943bb0e28	feat: update Kubernetes to 1.22.2 See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-09-16 13:59:51 +03:00
Andrey Smirnov	a059454045	chore: build using Go 1.17 `initramfs` size for amd64 shrinks by 1.3 MiB. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-09-13 22:33:47 +03:00
Andrey Smirnov	950f122c95	chore: update versions in upgrade tests In preparation for 0.13, start testing upgrades to 0.12. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-08-25 18:02:47 +03:00
Andrey Smirnov	dadaa65d54	feat: print uid/gid for the files in `ls -l` This adds information about file ownership in the long listing which is crucial sometimes. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2021-08-13 00:10:49 +03:00
Alexey Palazhchenko	09d70b7eaf	feat: update Kubernetes to v1.22.0 Closes #3967. Closes #3997. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>	2021-08-06 09:06:32 -07:00
Alexey Palazhchenko	eea750de2c	chore: rename "join" type to "worker" Closes #3413. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-07-09 07:10:45 -07:00
Andrey Smirnov	b969e7720e	chore: update references to old protobuf package This simply uses new protobuf package instead of old one. Old protobuf package is still in use by Talos dependencies. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-07-08 05:34:12 -07:00
Andrey Smirnov	10c28758a4	fix: ignore DeadlineExceeded error correctly on bootstrap The problem was that the gRPC function `status.Code(err)` doesn't unwrap errors, while the Talos client returns errors wrapped with `multierror.Error` and `fmt.Errorf`, so `status.Code` doesn't return the error code correctly. Fix that by introducing our own client method which correctly goes over the chain of wrapped errors. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-07-07 12:02:26 -07:00
Andrey Smirnov	84817f7334	chore: bump Talos version in upgrade tests Preparing for 0.11 to be stable release soon. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-06-29 07:24:48 -07:00
Alexey Palazhchenko	2fa54107b2	chore: fix tests for disabled RBAC This commit also introduces a hidden `--json` flag for `talosctl version` command that is not supported and should be re-worked at #907. Refs #3852. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-28 13:56:40 -07:00
Alexey Palazhchenko	bbf1c091d4	feat: add RBAC to `talosctl version` output Refs #3852. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-28 07:10:25 -07:00
Alexey Palazhchenko	ad047a7dee	chore: small RBAC improvements * `talosctl config new` now sets endpoints in the generated config. * Avoid duplication of roles in metadata. * Remove method name prefix handling. All methods should be set explicitly. * Add tests. Closes #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-25 05:50:38 -07:00
Alexey Palazhchenko	3c1b32199d	chore: refactor CLI tests Use testing.T.TempDir. Add support for `talosctl --endpoints`. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-23 05:49:00 -07:00
Alexey Palazhchenko	42c16f67f4	chore: bump dependencies Update k8s to 1.21.2. See #3787 #3788 #3789 #3790 #3791 #3792 #3793 #3794 #3795 #3796 #3798. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-21 07:05:41 -07:00
Alexey Palazhchenko	f63ab9dd9b	feat: implement `talosctl config new` command Refs #3421. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-06-17 09:06:43 -07:00
Andrey Smirnov	62c702c4fd	fix: remove conflicting etcd member on rejoin with empty data directory This fixes a scenario where a control plane node loses the contents of `/var` without leaving etcd first: on reboot the etcd data directory is empty, but the member is already present in the etcd member list, so etcd can't join because its raft log is empty. The fix is to remove the member with a matching hostname, if one is found in the etcd member list, and then add a new member. The risk here is removing another member which has the same hostname as the joining node, but duplicate hostnames for control plane nodes are a problem anyway. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-06-03 15:11:44 -07:00
Andrew Rynhard	a71053fcd8	feat: default to bootstrap workflow Changes `gen config` to output `controlplane` and `join` machine config types only. Users can manually set the `type` to `init` if they need to. Signed-off-by: Andrew Rynhard <andrew@rynhard.io>	2021-06-03 11:29:56 -07:00
Andrey Smirnov	5811f4dda1	feat: implement link (interface) controllers The structure of the controllers is really similar to addresses and routes: * `LinkSpec` resource describes desired link state * `LinkConfig` controller generates `LinkSpecs` based on machine configuration and kernel cmdline * `LinkMerge` controller merges multiple configuration sources into a single `LinkSpec` paying attention to the config layer priority * `LinkSpec` controller applies the specs to the kernel state Controller `LinkStatus` (which was implemented before) watches the kernel state and publishes current link status. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-06-01 09:36:25 -07:00
Andrey Smirnov	0acb04ad7a	feat: implement route network controllers Route handling is very similar to addresses: * `RouteStatus` describes kernel routing table state, `RouteStatusController` reflects kernel state into resources * `RouteSpec` defines routes to be configured * `RouteConfigController` creates `RouteSpec`s based on cmdline and machine configuration * `RouteMergeController` merges different configuration layers into the final representation * `RouteSpecController` applies the specs to the kernel routing table Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-05-25 11:09:21 -07:00


230 Commits