Overview: deprecate existing Talos resource API, and introduce new COSI
API.
Consequences:
* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This fixes an issue when `talosctl upgrade-k8s` fails with unhelpful
message if the version is specified as `v1.23.5` vs. `1.23.5`.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Having polymorphic (spec type depends on ID) resources is not a good
idea, and it's not compatible with protobuf encoding.
Introduce new resources for each polymorphic sub-spec using new Go 1.18
generic typed.Resource to reduce the boilerplate code.
(Still needs proper deepcopy-gen, but I'm skipping it for now, as
K8sControlPlane had also broken deep copy).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Use the last `:` in the image reference.
Handle the case when no version was discovered.
See https://github.com/siderolabs/theila/issues/138
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This showed up recently frequently in integration-provision tests
(might be related to Kubernetes upgrade), but anyways errors should be
retried.
Refactored the function to extract the retryable part.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Talos shouldn't try to re-encode the machine config it was provided
with.
So add a `ReadonlyWrapper` around `*v1alpha1.Config` which makes sure
that raw config object is not available anymore (it's a private field),
but config accessors are available for read-only access.
Another thing that `ReadonlyWrapper` does is that it preserves the
original `[]byte` encoding of the config keeping it exactly same way as
it was loaded from file or read over the network.
Improved `talosctl edit mc` to preserve the config as it was submitted,
and preserve the edits on error from Talos (previously edits were lost).
`ReadonlyWrapper` is not used on config generation path though - config
there is represented by `*v1alpha.Config` and can be freely modified.
Why almost? Some parts of Talos (platform code) patch the machine
configuration with new data. We need to fix platforms to provide
networking configuration in a different way, but this will come with
other PRs later.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#4656
As now changes to kubelet configuration can be applied without a reboot,
`talosctl upgrade-k8s` can handle the kubelet upgrades as well.
The gist is simply modifying machine config and waiting for `Node`
version to be updated, rest of the code is required for reliability of
the process.
Also fixed a bug in the API while watching deleted items with
tombstones.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes: https://github.com/talos-systems/talos/issues/4065
Get all Talos generated manifests and apply them, wait for deployments to be
updated and to become ready.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Fixes#3951
Bootkube support was removed in Talos 0.9. Talos versions 0.9-0.11
support conversion of self-hosted bootkube-based control plane to the
new style control plane running as static pods managed by Talos.
This commit removes all backwards compatibility and removes conversion
code.
For the k8s controllers, `BootstrapStatus` is removed and a dependency
on `etcd` service status is added (as it was implicitly there via
`BootstrapStatus`).
Remove control plane conversion code.
In k8s upgrade code, remove self-hosted part.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Scan all pods in `kube-system` and find `kube-proxy`, `kube-scheduler`,
`kube-controller-manager` and `kube-apiserver` ones, then check the
lowest version amongst them.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This is going to be useful in the third party code which is using
upgrade modules, to collect output logs instead of printing them to the
stdout.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This removes `retrying error` messages while waiting for the API server
pod state to reflect changes from the updated static pod definition.
Log more lines to notify about the progress.
Skip `kube-proxy` if not found (as we allow it to be disabled).
```
$ talosctl upgrade-k8s -n 172.20.0.2 --from 1.21.0 --to 1.21.2
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
updating "kube-apiserver" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating "kube-controller-manager" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating "kube-scheduler" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating daemonset "kube-proxy" to version "1.21.2"
kube-proxy skipped as DaemonSet was not found
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The structure of the controllers is really similar to addresses and
routes:
* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state
Controller `LinkStatus` (which was implemented before) watches the
kernel state and publishes current link status.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This extracts function which was used in upgrade/convert flows to retry
transient errors to the main `kubernetes` package, expands it to ignore
timeout errors, and it is now used to retry errors where applicable in
`pkg/kubernetes`.
Fixes#3403
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
During the conversion process, API server goes down, so we can see lots
of network errors including EOF.
Fixes#3404
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The command `--remove-initialized-key` is the last resort to convert
control plane when control plane is down for whatever reason, so it
should work when control plane is not available.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
First, if the config for some component image (e.g. `apiServer`) is empty,
Talos pushes default image which is unknown to the script, so verify
that change is not no-op, as otherwise script will hang forvever waiting
for k8s control plane config update.
Second, with bootkube bootstrap it was fine to omit explicit kubernetes
version in upgrade test, but with Talos-managed that means that after
Talos upgrade Kubernetes gets upgraded as well (as Talos config doesn't
contain K8s version, and defaults are used). This is not what we want to
test actually.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This includes Sean's comments from #3278 and introduces a new flag which
is referenced in manual conversion process document.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Without loadbalancer, when api-server goes down, there will be
connection refused errors which should be retried.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
See https://github.com/talos-systems/os-runtime/pull/12 for new mnaming
conventions.
No functional changes.
Additionally implements printing extra columns in `talosctl get xyz`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Upgrade is performed by updating node configuration (node by node, service
by service), watching internal resource state to get new configuration
version and verifying that pod with matching version successfully
propagated to the API server state and pod is ready.
Process is similar to the rolling update of the DaemonSet.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is required to upgrade from Talos 0.8.x to 0.9.x. After the cluster
is fully upgraded, control plane is still self-hosted (as it was
bootstrapped with bootkube).
Tool `talosctl convert-k8s` (and library behind it) performs the upgrade
to self-hosted version.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Flannel got updated to 0.13 version which has multi-arch image.
Kubernetes images are multi-arch.
Fixes#3049
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>