56 Commits

Author SHA1 Message Date
Dmitriy Matrenichev
fc48849d00
chore: move maps/slices/ordered to gen module
Use github.com/siderolabs/gen

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-21 20:22:43 +03:00
Andrey Smirnov
8b09bd4b04
feat: update Kubernetes to v1.26.0-alpha.1
Talos 1.3.0 will ship with Kubernetes 1.26.0.

See https://github.com/kubernetes/kubernetes/releases/tag/v1.26.0-alpha.1

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-21 18:42:31 +04:00
Andrey Smirnov
9baca49662
refactor: implement COSI resource API for Talos
Overview: deprecate existing Talos resource API, and introduce new COSI
API.

Consequences:

* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 22:31:54 +04:00
Noel Georgi
b62b18a972
feat: bump k8s to v1.25.0-beta.0
Bump k8s to v1.25.0-beta.0

Update most kubernetes `master` references to `controlplane`

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-10 22:17:53 +05:30
Andrey Smirnov
a6b010a8b4
chore: update Go to 1.19, Linux to 5.15.58
See https://go.dev/doc/go1.19

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 17:03:58 +04:00
Andrey Smirnov
0cdf222431
fix: retry Conflict errors when upgrading k8s manifests
Fixes #5985

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 13:20:04 +04:00
Dmitriy Matrenichev
4dbbf4ac50
chore: add generic methods and use them part #2
Use things from #5702.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-06-09 23:10:02 +08:00
Andrey Smirnov
f1f43131f8
fix: strip 'v' prefix from versions on Kubernetes upgrade
This fixes an issue when `talosctl upgrade-k8s` fails with unhelpful
message if the version is specified as `v1.23.5` vs. `1.23.5`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-22 14:59:12 +03:00
Andrey Smirnov
4eb9f45cc8
refactor: split polymorphic K8sControlPlane into typed resources
Having polymorphic (spec type depends on ID) resources is not a good
idea, and it's not compatible with protobuf encoding.

Introduce new resources for each polymorphic sub-spec using new Go 1.18
generic typed.Resource to reduce the boilerplate code.

(Still needs proper deepcopy-gen, but I'm skipping it for now, as
K8sControlPlane had also broken deep copy).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-19 16:53:09 +03:00
Andrey Smirnov
0cb84e8c1a
fix: correctly parse tags out of images
Use the last `:` in the image reference.

Handle the case when no version was discovered.

See https://github.com/siderolabs/theila/issues/138

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-07 19:32:12 +03:00
Andrey Smirnov
2ca5279e56
fix: retry manifest updates in upgrade-k8s
This showed up recently frequently in integration-provision tests
(might be related to Kubernetes upgrade), but anyways errors should be
retried.

Refactored the function to extract the retryable part.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-01 16:20:25 +03:00
Andrey Smirnov
ca8b9c0a3a
feat: update Kubernetes to 1.24.0-alpha.4
See https://github.com/kubernetes/kubernetes/releases/tag/v1.24.0-alpha.4

Fix some incompatibilities around dropped flags/API versions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-30 22:59:07 +03:00
Dmitriy Matrenichev
e06e1473b0
feat: update golangci-lint to 1.45.0 and gofumpt to 0.3.0
- Update golangci-lint to 1.45.0
- Update gofumpt to 0.3.0
- Fix gofumpt errors
- Add goimports and format imports since gofumports is removed
- Update Dockerfile
- Fix .golangci.yml configuration
- Fix linting errors

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-24 08:14:04 +04:00
Noel Georgi
dcde2c4f68
chore: update k8s upgrade message
Update k8s upgrade message

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-01-31 16:49:25 +05:30
Artem Chernyshev
831f65a07f
fix: close client provider instead of Talos client in the upgrade module
Otherwise it breaks Theila, which never closes Talos clients during
operation.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-01-27 15:07:28 +03:00
Artem Chernyshev
2f2bdb26aa
feat: replace flags with --mode in apply, edit and patch commands
Fixes: https://github.com/talos-systems/talos/issues/4588

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-01-13 16:09:53 +03:00
Andrey Smirnov
2f4b9d8d6d
feat: make machine configuration read-only in Talos (almost)
Talos shouldn't try to re-encode the machine config it was provided
with.

So add a `ReadonlyWrapper` around `*v1alpha1.Config` which makes sure
that raw config object is not available anymore (it's a private field),
but config accessors are available for read-only access.

Another thing that `ReadonlyWrapper` does is that it preserves the
original `[]byte` encoding of the config keeping it exactly same way as
it was loaded from file or read over the network.

Improved `talosctl edit mc` to preserve the config as it was submitted,
and preserve the edits on error from Talos (previously edits were lost).

`ReadonlyWrapper` is not used on config generation path though - config
there is represented by `*v1alpha.Config` and can be freely modified.

Why almost? Some parts of Talos (platform code) patch the machine
configuration with new data. We need to fix platforms to provide
networking configuration in a different way, but this will come with
other PRs later.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-28 20:12:55 +03:00
Andrey Smirnov
97ffa7a645
feat: upgrade kubelet version in talosctl upgrade-k8s
Fixes #4656

As now changes to kubelet configuration can be applied without a reboot,
`talosctl upgrade-k8s` can handle the kubelet upgrades as well.

The gist is simply modifying machine config and waiting for `Node`
version to be updated, rest of the code is required for reliability of
the process.

Also fixed a bug in the API while watching deleted items with
tombstones.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-08 21:12:17 +03:00
Andrey Smirnov
753a82188f
refactor: move pkg/resources to machinery
Fixes #4420

No functional changes, just moving packages around.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-11-15 19:50:35 +03:00
Alexey Palazhchenko
95105071de
chore: fix simple issues found by golangci-lint
Avoid slice mutation with append.
Simplify code.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-11-12 15:20:28 +00:00
Andrey Smirnov
ae5af9d3fa
feat: update Kubernetes to 1.23.0-alpha.3
See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1230-alpha3

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-22 14:59:41 +03:00
Artem Chernyshev
e3e2113adc
feat: upgrade CoreDNS during upgrade-k8s call
Fixes: https://github.com/talos-systems/talos/issues/4065

Get all Talos generated manifests and apply them, wait for deployments to be
updated and to become ready.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-10-13 15:47:06 +03:00
Alexey Palazhchenko
d53e9e8963
chore: use named constants
Just for consistency.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-09-07 12:13:48 +00:00
Alexey Palazhchenko
032e7c6b86
chore: import yaml.v3 consistently
Do not use yaml.v2.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-08-26 11:36:50 +00:00
Artem Chernyshev
2b614e430e
feat: check if cluster has deprecated resources versions
Fixes: https://github.com/talos-systems/talos/issues/4026

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-08-18 23:26:36 +03:00
Alexey Palazhchenko
09d70b7eaf feat: update Kubernetes to v1.22.0
Closes #3967.
Closes #3997.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-08-06 09:06:32 -07:00
Andrey Smirnov
0c7ce1cd81 feat: remove remnants of bootkube support
Fixes #3951

Bootkube support was removed in Talos 0.9. Talos versions 0.9-0.11
support conversion of self-hosted bootkube-based control plane to the
new style control plane running as static pods managed by Talos.

This commit removes all backwards compatibility and removes conversion
code.

For the k8s controllers, `BootstrapStatus` is removed and a dependency
on `etcd` service status is added (as it was implicitly there via
`BootstrapStatus`).

Remove control plane conversion code.

In k8s upgrade code, remove self-hosted part.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-08-03 07:55:42 -07:00
Artem Chernyshev
70d2505b7c fix: do not require ToVersion to be set when detecting version
We do not know the upgrade version when checking components versions in
Theila.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-07-21 08:51:26 -07:00
Artem Chernyshev
f8f1c83a75 feat: detect the lowest Kubernetes version in upgrade-k8s CLI command
Scan all pods in `kube-system` and find `kube-proxy`, `kube-scheduler`,
`kube-controller-manager` and `kube-apiserver` ones, then check the
lowest version amongst them.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-07-19 08:24:04 -07:00
Artem Chernyshev
2e463348b2 fix: pass all logs through the options.Log method
Looks like I've missed some 🤦

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-07-15 08:32:48 -07:00
Artem Chernyshev
bf61c2cc4a fix: write upgrade logs only to the LogOutput if it's defined
No need to print them to stdout in that case.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-07-15 07:02:45 -07:00
Artem Chernyshev
23ef1d40af chore: add ability to redirect talos upgrade module logs to io.Writer
This is going to be useful in the third party code which is using
upgrade modules, to collect output logs instead of printing them to the
stdout.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-07-13 08:12:06 -07:00
Andrey Smirnov
e883c12b31 fix: make output of upgrade-k8s command less scary
This removes `retrying error` messages while waiting for the API server
pod state to reflect changes from the updated static pod definition.

Log more lines to notify about the progress.

Skip `kube-proxy` if not found (as we allow it to be disabled).

```
$ talosctl upgrade-k8s -n 172.20.0.2 --from 1.21.0 --to 1.21.2
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
updating "kube-apiserver" to version "1.21.2"
 > "172.20.0.2": starting update
 > "172.20.0.2": machine configuration patched
 > "172.20.0.2": waiting for API server state pod update
 < "172.20.0.2": successfully updated
 > "172.20.0.3": starting update
 > "172.20.0.3": machine configuration patched
 > "172.20.0.3": waiting for API server state pod update
 < "172.20.0.3": successfully updated
 > "172.20.0.4": starting update
 > "172.20.0.4": machine configuration patched
 > "172.20.0.4": waiting for API server state pod update
 < "172.20.0.4": successfully updated
updating "kube-controller-manager" to version "1.21.2"
 > "172.20.0.2": starting update
 > "172.20.0.2": machine configuration patched
 > "172.20.0.2": waiting for API server state pod update
 < "172.20.0.2": successfully updated
 > "172.20.0.3": starting update
 > "172.20.0.3": machine configuration patched
 > "172.20.0.3": waiting for API server state pod update
 < "172.20.0.3": successfully updated
 > "172.20.0.4": starting update
 > "172.20.0.4": machine configuration patched
 > "172.20.0.4": waiting for API server state pod update
 < "172.20.0.4": successfully updated
updating "kube-scheduler" to version "1.21.2"
 > "172.20.0.2": starting update
 > "172.20.0.2": machine configuration patched
 > "172.20.0.2": waiting for API server state pod update
 < "172.20.0.2": successfully updated
 > "172.20.0.3": starting update
 > "172.20.0.3": machine configuration patched
 > "172.20.0.3": waiting for API server state pod update
 < "172.20.0.3": successfully updated
 > "172.20.0.4": starting update
 > "172.20.0.4": machine configuration patched
 > "172.20.0.4": waiting for API server state pod update
 < "172.20.0.4": successfully updated
updating daemonset "kube-proxy" to version "1.21.2"
kube-proxy skipped as DaemonSet was not found
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-07-01 06:54:36 -07:00
Alexey Palazhchenko
06209bba28 chore: update RBAC rules, remove old APIs
Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-18 09:54:49 -07:00
Andrey Smirnov
5811f4dda1 feat: implement link (interface) controllers
The structure of the controllers is really similar to addresses and
routes:

* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state

Controller `LinkStatus` (which was implemented before) watches the
kernel state and publishes current link status.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-01 09:36:25 -07:00
Andrey Smirnov
d24df8f844 chore: re-import talos-systems/os-runtime as cosi-project/runtime
No changes, just import path change (as project got moved).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-12 07:44:24 -07:00
Andrey Smirnov
a1e6415403 fix: retry Kubernetes API errors on cordon/uncordon/etc
This extracts function which was used in upgrade/convert flows to retry
transient errors to the main `kubernetes` package, expands it to ignore
timeout errors, and it is now used to retry errors where applicable in
`pkg/kubernetes`.

Fixes #3403

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-02 03:51:40 -07:00
Andrey Smirnov
e039172eda fix: ignore EOF errors from Kubernetes API when converting control plane
During the conversion process, API server goes down, so we can see lots
of network errors including EOF.

Fixes #3404

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-01 10:52:44 -07:00
Andrey Smirnov
672c970739 fix: allow convert-k8s --remove-initialized-keys with K8s cp is down
The command `--remove-initialized-key` is the last resort to convert
control plane when control plane is down for whatever reason, so it
should work when control plane is not available.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-25 14:06:08 -07:00
Alexey Palazhchenko
fb605a0fc5 chore: tweak nolintlint settings
Copy from kres manually for now.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-25 13:56:16 -07:00
Alexey Palazhchenko
1f5a0c4065 fix: resolve the issue with Kubernetes upgrade
Add missing cases, refactoring.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-25 12:48:28 -07:00
Andrey Smirnov
125b86f4ef fix: upgrade-k8s bug with empty config values and provision script
First, if the config for some component image (e.g. `apiServer`) is empty,
Talos pushes default image which is unknown to the script, so verify
that change is not no-op, as otherwise script will hang forvever waiting
for k8s control plane config update.

Second, with bootkube bootstrap it was fine to omit explicit kubernetes
version in upgrade test, but with Talos-managed that means that after
Talos upgrade Kubernetes gets upgraded as well (as Talos config doesn't
contain K8s version, and defaults are used). This is not what we want to
test actually.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-19 12:05:31 -07:00
Artem Chernyshev
22f375300c chore: update golanci-lint to 1.38.0
Fix all discovered issues.
Detected couple bugs, fixed them as well.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-12 06:50:02 -08:00
Andrey Smirnov
6f7df3da1e fix: update output of convert-k8s command
This includes Sean's comments from #3278 and introduces a new flag which
is referenced in manual conversion process document.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-12 02:21:01 -08:00
Andrey Smirnov
81acadf345 fix: ignore connection refused errors when updating/converting cp
Without loadbalancer, when api-server goes down, there will be
connection refused errors which should be retried.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-05 06:59:06 -08:00
Alexey Palazhchenko
df52c13581 chore: fix //nolint directives
That's the recommended syntax:
https://golangci-lint.run/usage/false-positives/

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-05 05:58:33 -08:00
Andrey Smirnov
60aa011c7a feat: rename namespaces, resources, types etc
See https://github.com/talos-systems/os-runtime/pull/12 for new mnaming
conventions.

No functional changes.

Additionally implements printing extra columns in `talosctl get xyz`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-02 13:34:15 -08:00
Andrey Smirnov
e2f1fbcfdb feat: support control plane upgrades with Talos managed control plane
Upgrade is performed by updating node configuration (node by node, service
by service), watching internal resource state to get new configuration
version and verifying that pod with matching version successfully
propagated to the API server state and pod is ready.

Process is similar to the rolling update of the DaemonSet.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-20 11:57:32 -08:00
Andrey Smirnov
7751920dba feat: add a tool and package to convert self-hosted CP to static pods
This is required to upgrade from Talos 0.8.x to 0.9.x. After the cluster
is fully upgraded, control plane is still self-hosted (as it was
bootstrapped with bootkube).

Tool `talosctl convert-k8s` (and library behind it) performs the upgrade
to self-hosted version.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-17 23:26:57 -08:00
Andrey Smirnov
e0a0f58801 feat: use multi-arch images for k8s and Flannel CNI
Flannel got updated to 0.13 version which has multi-arch image.

Kubernetes images are multi-arch.

Fixes #3049

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-28 08:26:02 -08:00