---
title: "Converting Control Plane"
description: "How to convert a Talos self-hosted Kubernetes control plane (pre-0.9) to one based on static pods."
---

Talos version 0.9 runs the Kubernetes control plane in a new way: as static pods managed by Talos.
Talos versions 0.8 and below run a self-hosted control plane.
After upgrading Talos to version 0.9, the Kubernetes control plane should be converted to run as static pods.

This guide describes the automated conversion script and also walks through the manual conversion process in detail.

## Video Walkthrough

For a live demo of this guide, watch the video below:

<iframe width="560" height="315" src="https://www.youtube.com/embed/nUuFYLEp7wQ" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Automated Conversion

First, make sure all nodes are updated to Talos 0.9:

```bash
$ kubectl get nodes -o wide
NAME                     STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
talos-default-master-1   Ready    control-plane,master   58m   v1.20.4   172.20.0.2    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
talos-default-master-2   Ready    control-plane,master   58m   v1.20.4   172.20.0.3    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
talos-default-master-3   Ready    control-plane,master   58m   v1.20.4   172.20.0.4    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
talos-default-worker-1   Ready    <none>                 58m   v1.20.4   172.20.0.5    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
```

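On larger clusters, scanning this table by eye is error-prone. The sketch below is one possible way to list nodes that do not yet report a Talos 0.9 image. The helper name `nodes_not_on_09` is hypothetical, and it assumes the OS image string has the form `Talos (v0.9.x)` as in the sample output above. Empty output means every node passed the check.

```bash
# Print the names of nodes whose OS image is not Talos v0.9.x.
# Input (stdin): one "<node name><TAB><OS image>" line per node.
# (Hypothetical helper; the "Talos (v0.9." prefix is an assumption
# based on the OS-IMAGE column in the sample output.)
nodes_not_on_09() {
  awk -F '\t' '$2 !~ /^Talos \(v0\.9\./ { print $1 }'
}

# Intended usage (requires a working kubeconfig):
#   kubectl get nodes \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.osImage}{"\n"}{end}' \
#     | nodes_not_on_09
```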
Start the conversion script:

```bash
$ talosctl -n <IP> convert-k8s
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
current self-hosted status: true
gathering control plane configuration
aggregator CA key can't be recovered from bootkube-boostrapped control plane, generating new CA
patching master node "172.20.0.2" configuration
patching master node "172.20.0.3" configuration
patching master node "172.20.0.4" configuration
waiting for static pod definitions to be generated
waiting for manifests to be generated
Talos generated control plane static pod definitions and bootstrap manifests, please verify them with commands:
    talosctl -n <master node IP> get StaticPods.kubernetes.talos.dev
    talosctl -n <master node IP> get Manifests.kubernetes.talos.dev

in order to remove self-hosted control plane, pod-checkpointer component needs to be disabled
once pod-checkpointer is disabled, the cluster shouldn't be rebooted until the entire conversion process is complete
confirm disabling pod-checkpointer to proceed with control plane update [yes/no]:
```

The script stops at this point and waits for confirmation.
Talos still runs the self-hosted control plane, and the static pods have not been rendered yet.

As instructed by the script, verify that the static pod definitions are correct:

```bash
$ talosctl -n <IP> get staticpods -o yaml
node: 172.20.0.2
metadata:
    namespace: controlplane
    type: StaticPods.kubernetes.talos.dev
    id: kube-apiserver
    version: 1
    phase: running
spec:
    apiVersion: v1
    kind: Pod
    metadata:
        annotations:
            talos.dev/config-version: "2"
            talos.dev/secrets-version: "1"
        creationTimestamp: null
        labels:
            k8s-app: kube-apiserver
            tier: control-plane
        name: kube-apiserver
        namespace: kube-system
    spec:
        containers:
            - command:
...
```

Static pod definitions are generated from the machine configuration and should match the pod templates generated by Talos when the self-hosted control plane was bootstrapped, unless the daemonset specs were changed manually after bootstrap.
Talos patches the machine configuration with the container image versions scraped from the daemonset definitions and fetches the service account key from Kubernetes secrets.

The aggregator CA can't be recovered from the self-hosted control plane, so a new CA is generated.
This is generally harmless and not visible from outside the cluster.
The aggregator CA is _not_ the same CA as the one used by Talos or the standard Kubernetes API.
It is a special PKI used for aggregating API extension services inside your cluster.
If you have non-standard apiserver aggregations (fairly rare, and you should know if you do), then you may need to restart these services after the new CA is in place.

Verify that the bootstrap manifests are correct:

```bash
$ talosctl -n <IP> get manifests
NODE         NAMESPACE      TYPE       ID                                                        VERSION
172.20.0.2   controlplane   Manifest   00-kubelet-bootstrapping-token                            1
172.20.0.2   controlplane   Manifest   01-csr-approver-role-binding                              1
172.20.0.2   controlplane   Manifest   01-csr-node-bootstrap                                     1
172.20.0.2   controlplane   Manifest   01-csr-renewal-role-binding                               1
172.20.0.2   controlplane   Manifest   02-kube-system-sa-role-binding                            1
172.20.0.2   controlplane   Manifest   03-default-pod-security-policy                            1
172.20.0.2   controlplane   Manifest   05-https://docs.projectcalico.org/manifests/calico.yaml   1
172.20.0.2   controlplane   Manifest   10-kube-proxy                                             1
172.20.0.2   controlplane   Manifest   11-core-dns                                               1
172.20.0.2   controlplane   Manifest   11-core-dns-svc                                           1
172.20.0.2   controlplane   Manifest   11-kube-config-in-cluster                                 1
```

Make sure that manifests and static pods are correct across all control plane nodes, as each node reconciles control plane state on its own.
For example, the CNI configuration in the machine config should be in sync across all the nodes.
Talos nodes try to create any missing Kubernetes resources from the manifests, but they never update or delete existing resources.

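As a rough aid for comparing nodes, the following sketch reports manifest IDs that are not present on every node. The helper name `manifests_missing_somewhere` is hypothetical. It assumes that passing several node IPs to `talosctl -n` yields a single table with the columns shown above, and it compares only manifest IDs, not manifest contents, so a clean result does not prove the manifests are identical.

```bash
# Print "<manifest ID> <nodes having it>/<total nodes>" for every manifest ID
# that is missing on at least one node. Prints nothing when all IDs match.
# Input (stdin): table with columns NODE NAMESPACE TYPE ID VERSION.
# (Hypothetical helper; the column layout is assumed from the sample output.)
manifests_missing_somewhere() {
  awk '
    NR == 1 && $1 == "NODE" { next }            # skip the header row
    NF >= 4 {
      if (!($1 in nodes)) { nodes[$1] = 1; total++ }              # count distinct nodes
      if (!(($1, $4) in seen)) { seen[$1, $4] = 1; count[$4]++ }  # count nodes per ID
    }
    END {
      for (id in count)
        if (count[id] != total) print id, count[id] "/" total
    }
  ' | sort
}

# Intended usage (multi-node table output is an assumption):
#   talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... get manifests \
#     | manifests_missing_somewhere
```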
If something looks wrong, the script can be aborted and the machine configuration updated to fix the problem.
Once the configuration is updated, the script can be restarted.

If the static pod definitions and manifests look good, confirm the next step to disable `pod-checkpointer`:

```bash
$ talosctl -n <IP> convert-k8s
...
confirm disabling pod-checkpointer to proceed with control plane update [yes/no]: yes
disabling pod-checkpointer
deleting daemonset "pod-checkpointer"
checking for active pod checkpoints
2021/03/09 23:37:25 retrying error: found 3 active pod checkpoints: [pod-checkpointer-655gc-talos-default-master-3 pod-checkpointer-pw6mv-talos-default-master-1 pod-checkpointer-zdw9z-talos-default-master-2]
2021/03/09 23:42:25 retrying error: found 1 active pod checkpoints: [pod-checkpointer-pw6mv-talos-default-master-1]
confirm applying static pod definitions and manifests [yes/no]:
```

The self-hosted control plane runs `pod-checkpointer` to work around issues with control plane availability.
It should be disabled before the conversion starts so that the self-hosted control plane can be removed.
It takes around 5 minutes for `pod-checkpointer` to be fully disabled.
The script verifies that all checkpoints are removed before proceeding.

This last confirmation comes at the point after which the self-hosted control plane can no longer be kept running:
static pods are released, bootstrap manifests are applied, and the self-hosted control plane is removed.

```bash
$ talosctl -n <IP> convert-k8s
...
confirm applying static pod definitions and manifests [yes/no]: yes
removing self-hosted initialized key
waiting for static pods for "kube-apiserver" to be present in the API server state
waiting for static pods for "kube-controller-manager" to be present in the API server state
waiting for static pods for "kube-scheduler" to be present in the API server state
deleting daemonset "kube-apiserver"
waiting for static pods for "kube-apiserver" to be present in the API server state
deleting daemonset "kube-controller-manager"
waiting for static pods for "kube-controller-manager" to be present in the API server state
deleting daemonset "kube-scheduler"
waiting for static pods for "kube-scheduler" to be present in the API server state
conversion process completed successfully
```

As soon as the control plane static pods are rendered, the kubelet starts them.
It is expected that the pods for `kube-apiserver` will crash initially.
Only one `kube-apiserver` can be bound to the host `Node`'s port 6443 at a time.
Eventually, the old `kube-apiserver` will be killed, and the new one will be able to start.
This is all handled automatically.
The script will continue by removing each self-hosted daemonset and verifying that static pods are ready and healthy.

## Manual Conversion

Check that Talos runs the self-hosted control plane:

```bash
$ talosctl -n <CONTROL_PLANE_IP> get bs
NODE         NAMESPACE   TYPE              ID              VERSION   SELF HOSTED
172.20.0.2   runtime     BootstrapStatus   control-plane   2         true
```

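For scripted checks, the same information can be read from the table. This is a minimal sketch, assuming the column layout shown above; `is_self_hosted` is a hypothetical helper name, not part of `talosctl`.

```bash
# Succeed (exit 0) if a BootstrapStatus row is present and its last
# column (SELF HOSTED) is "true"; fail otherwise.
# Input (stdin): output of `talosctl -n <CONTROL_PLANE_IP> get bs`.
# (Hypothetical helper; the column layout is assumed from the sample output.)
is_self_hosted() {
  awk '
    $3 == "BootstrapStatus" { found = 1; if ($NF != "true") bad = 1 }
    END { if (found && !bad) exit 0; exit 1 }
  '
}

# Intended usage:
#   talosctl -n <CONTROL_PLANE_IP> get bs | is_self_hosted && echo "self-hosted"
```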
The Talos machine configuration needs to be updated to the 0.9 format; there are two new required machine configuration settings:

* `.cluster.serviceAccount` is the service account PEM-encoded private key.
* `.cluster.aggregatorCA` is the aggregator CA for `kube-apiserver` (certificate and private key).

The current service account key can be fetched from the Kubernetes secrets:

```bash
$ kubectl -n kube-system get secrets kube-controller-manager -o jsonpath='{.data.service\-account\.key}'
LS0tLS1CRUdJTiBSU0EgUFJJVkFURS...
```

All control plane node machine configurations should be patched with the service account key:

```bash
$ talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... patch mc --immediate -p '[{"op": "add", "path": "/cluster/serviceAccount", "value": {"key": "LS0tLS1CRUdJTiBSU0EgUFJJVkFURS..."}}]'
patched mc at the node 172.20.0.2
```

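To avoid pasting the key by hand, the patch document can be assembled from the fetched value. The sketch below only builds the JSON string; `service_account_patch` is a hypothetical helper name, and it assumes the key is plain base64 (no quotes or backslashes), as returned by the `kubectl` command above.

```bash
# Print a JSON patch that adds /cluster/serviceAccount with the given key.
# $1: base64-encoded service account key, inserted verbatim.
# (Hypothetical helper; no JSON escaping is done, which is only safe
# because base64 text contains no characters that need escaping.)
service_account_patch() {
  printf '[{"op": "add", "path": "/cluster/serviceAccount", "value": {"key": "%s"}}]\n' "$1"
}

# Intended usage, combining the two commands shown above:
#   KEY="$(kubectl -n kube-system get secrets kube-controller-manager -o jsonpath='{.data.service\-account\.key}')"
#   talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... patch mc --immediate -p "$(service_account_patch "$KEY")"
```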
The aggregator CA can be generated using OpenSSL or any other certificate generation tool: an RSA or ECDSA certificate with CN `front-proxy`, valid for 10 years.
The PEM-encoded CA certificate and key should be base64-encoded and patched into the machine config at path `/cluster/aggregatorCA`:

```bash
$ talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... patch mc --immediate -p '[{"op": "add", "path": "/cluster/aggregatorCA", "value": {"crt": "LS0tLS1CRUdJTiBDRVJUSUZJQ...", "key": "LS0tLS1CRUdJTiBFQy..."}}]'
patched mc at the node 172.20.0.2
```

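One possible way to produce the `crt` and `key` values with OpenSSL is sketched below. The function name `generate_aggregator_ca`, the file names, and the choice of the `prime256v1` curve are assumptions made for illustration. Whether the resulting certificate carries CA basic constraints depends on the local OpenSSL configuration, so inspect it with `openssl x509 -noout -text` before use.

```bash
# Generate an ECDSA key and a self-signed certificate with CN "front-proxy",
# valid for 3650 days (about 10 years), inside directory $1.
# Prints two lines: "crt=<base64 PEM certificate>" and "key=<base64 PEM key>".
# (Hypothetical helper; file names and curve are illustrative choices.)
generate_aggregator_ca() {
  dir="$1"
  openssl ecparam -name prime256v1 -genkey -noout -out "$dir/aggregator-ca.key" &&
  openssl req -x509 -new -key "$dir/aggregator-ca.key" -sha256 -days 3650 \
    -subj "/CN=front-proxy" -out "$dir/aggregator-ca.crt" &&
  # base64 may wrap long lines, so strip newlines to get single-line values
  printf 'crt=%s\n' "$(base64 < "$dir/aggregator-ca.crt" | tr -d '\n')" &&
  printf 'key=%s\n' "$(base64 < "$dir/aggregator-ca.key" | tr -d '\n')"
}

# Intended usage:
#   generate_aggregator_ca "$(mktemp -d)"
```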
At this point static pod definitions and bootstrap manifests should be rendered; see "Automated Conversion" above for how to verify the generated objects.
Feel free to continue to refine your machine configuration until the generated static pod definitions and bootstrap manifests look good.

If static pod definitions are not generated, check logs with `talosctl -n <IP> logs controller-runtime`.

Disable `pod-checkpointer` with:

```bash
$ kubectl -n kube-system delete ds pod-checkpointer
daemonset.apps "pod-checkpointer" deleted
```

Wait for all pod checkpoints to be removed:

```bash
$ kubectl -n kube-system get pods
NAME                                            READY   STATUS    RESTARTS   AGE
...
pod-checkpointer-8q2lh-talos-default-master-2   1/1     Running   0          3m34s
pod-checkpointer-nnm5w-talos-default-master-3   1/1     Running   0          3m24s
pod-checkpointer-qnmdt-talos-default-master-1   1/1     Running   0          2m21s
```

Pod checkpoints have the annotation `checkpointer.alpha.coreos.com/checkpoint-of`.

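That annotation can be used to list the remaining checkpoints instead of relying on pod names. The following is a sketch only: `active_checkpoints` is a hypothetical helper name, and the JSONPath expression in the usage comment has not been verified against a live cluster. Empty output would indicate that no checkpoints remain.

```bash
# Print the names of pods that carry a non-empty checkpoint-of annotation.
# Input (stdin): one "<pod name><TAB><annotation value>" line per pod;
# pods without the annotation have an empty second field.
# (Hypothetical helper.)
active_checkpoints() {
  awk -F '\t' '$2 != "" { print $1 }'
}

# Intended usage (the JSONPath expression is an untested assumption):
#   kubectl -n kube-system get pods \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.checkpointer\.alpha\.coreos\.com/checkpoint\-of}{"\n"}{end}' \
#     | active_checkpoints
```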
Once all the pod checkpoints are removed (this takes around 5 minutes), proceed by removing the self-hosted initialized key:

```bash
$ talosctl -n <CONTROL_PLANE_IP> convert-k8s --remove-initialized-key
```

Talos controllers will now render static pod definitions, and the kubelet will launch any resulting static pods.

Once static pods are visible in `kubectl get pods -n kube-system` output, proceed by removing each of the self-hosted daemonsets:

```bash
$ kubectl -n kube-system delete daemonset kube-apiserver
daemonset.apps "kube-apiserver" deleted
```

Make sure the static pods for `kube-apiserver` started successfully and that the pods are running and ready.

Proceed by deleting the `kube-controller-manager` and `kube-scheduler` daemonsets, verifying that static pods are running between each step:

```bash
$ kubectl -n kube-system delete daemonset kube-controller-manager
daemonset.apps "kube-controller-manager" deleted
```

```bash
$ kubectl -n kube-system delete daemonset kube-scheduler
daemonset.apps "kube-scheduler" deleted
```

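The readiness check between deletions can be scripted along these lines. This is a sketch under stated assumptions: `pods_ready` is a hypothetical helper name, and it matches pods by name prefix, so while a self-hosted daemonset still exists its pods match the same prefix and are checked as well.

```bash
# Succeed (exit 0) only if at least one pod on stdin has a name starting
# with prefix $1 and every such pod is Running with all containers ready.
# Input (stdin): rows in `kubectl get pods --no-headers` format:
#   NAME READY STATUS RESTARTS AGE
# (Hypothetical helper; matching by name prefix is a simplification.)
pods_ready() {
  awk -v prefix="$1" '
    index($1, prefix) == 1 {
      seen++
      split($2, r, "/")                          # READY column, e.g. "1/1"
      if ($3 != "Running" || r[1] != r[2]) bad++
    }
    END { if (seen > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Intended usage:
#   kubectl -n kube-system get pods --no-headers | pods_ready kube-apiserver-
```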