---
title: "Converting Control Plane"
description: "How to convert a Talos self-hosted Kubernetes control plane (pre-0.9) to a static-pod-based one."
---

Talos version 0.9 runs the Kubernetes control plane in a new way: as static pods managed by Talos.
Talos version 0.8 and below runs a self-hosted control plane.
After the Talos OS upgrade to version 0.9, the Kubernetes control plane should be converted to run as static pods.

This guide describes the automated conversion script and also shows the detailed manual conversion process.

## Video Walkthrough

To see a live demo of this writeup, see the video below:

## Automated Conversion

First, make sure all nodes are updated to Talos 0.9:

```bash
$ kubectl get nodes -o wide
NAME                     STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
talos-default-master-1   Ready    control-plane,master   58m   v1.20.4   172.20.0.2                  Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
talos-default-master-2   Ready    control-plane,master   58m   v1.20.4   172.20.0.3                  Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
talos-default-master-3   Ready    control-plane,master   58m   v1.20.4   172.20.0.4                  Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
talos-default-worker-1   Ready                           58m   v1.20.4   172.20.0.5                  Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
```

Start the conversion script:

```bash
$ talosctl -n convert-k8s
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
current self-hosted status: true
gathering control plane configuration
aggregator CA key can't be recovered from bootkube-boostrapped control plane, generating new CA
patching master node "172.20.0.2" configuration
patching master node "172.20.0.3" configuration
patching master node "172.20.0.4" configuration
waiting for static pod definitions to be generated
waiting for manifests to be generated
Talos generated control plane static pod definitions and bootstrap manifests, please verify them with commands:
    talosctl -n get StaticPods.kubernetes.talos.dev
    talosctl -n get Manifests.kubernetes.talos.dev

in order to remove self-hosted control plane, pod-checkpointer component needs to be disabled
once pod-checkpointer is disabled, the cluster shouldn't be rebooted until the entire conversion process is complete
confirm disabling pod-checkpointer to proceed with control plane update [yes/no]:
```

The script stops at this point and waits for confirmation.
Talos still runs the self-hosted control plane, and static pods have not been rendered yet.

As instructed by the script, verify that the static pod definitions are correct:

```bash
$ talosctl -n get staticpods -o yaml
node: 172.20.0.2
metadata:
    namespace: controlplane
    type: StaticPods.kubernetes.talos.dev
    id: kube-apiserver
    version: 1
    phase: running
spec:
    apiVersion: v1
    kind: Pod
    metadata:
        annotations:
            talos.dev/config-version: "2"
            talos.dev/secrets-version: "1"
        creationTimestamp: null
        labels:
            k8s-app: kube-apiserver
            tier: control-plane
        name: kube-apiserver
        namespace: kube-system
    spec:
        containers:
            - command:
...
```

Static pod definitions are generated from the machine configuration and should match the pod template generated by Talos when the self-hosted control plane was bootstrapped, unless manual changes were applied to the daemonset specs after bootstrap.
Talos patches the machine configuration with the container image versions scraped from the daemonset definitions and fetches the service account key from Kubernetes secrets.

The aggregator CA can't be recovered from the self-hosted control plane, so a new CA is generated.
This is generally harmless and not visible from outside the cluster.
The aggregator CA is _not_ the same CA that is used by Talos or the standard Kubernetes API.
It is a special PKI used for aggregating API extension services inside your cluster.
If you have non-standard apiserver aggregations (fairly rare, and you should know if you do), then you may need to restart these services after the new CA is in place.
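
To check whether a cluster has any apiserver aggregations that might be affected by the new aggregator CA, one option is to look at the `APIService` objects. The following is a minimal sketch, not part of the conversion script: the helper name `aggregated_apiservices` is hypothetical, and it assumes the usual `kubectl get apiservices` column layout, where the `SERVICE` column reads `Local` for API groups served by `kube-apiserver` itself.

```shell
#!/bin/sh
# Print the names of APIService objects that are backed by an in-cluster
# service (aggregated APIs) rather than served by kube-apiserver itself.
# Reads `kubectl get apiservices --no-headers` output on stdin; the second
# column is assumed to be "Local" for built-in API groups.
aggregated_apiservices() {
  awk 'NF >= 2 && $2 != "Local" { print $1 }'
}

# Example (requires cluster access):
#   kubectl get apiservices --no-headers | aggregated_apiservices
```

An empty result suggests there are no aggregated API services to restart.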

Verify that the bootstrap manifests are correct:

```bash
$ talosctl -n get manifests
NODE         NAMESPACE      TYPE       ID                                                         VERSION
172.20.0.2   controlplane   Manifest   00-kubelet-bootstrapping-token                             1
172.20.0.2   controlplane   Manifest   01-csr-approver-role-binding                               1
172.20.0.2   controlplane   Manifest   01-csr-node-bootstrap                                      1
172.20.0.2   controlplane   Manifest   01-csr-renewal-role-binding                                1
172.20.0.2   controlplane   Manifest   02-kube-system-sa-role-binding                             1
172.20.0.2   controlplane   Manifest   03-default-pod-security-policy                             1
172.20.0.2   controlplane   Manifest   05-https://docs.projectcalico.org/manifests/calico.yaml   1
172.20.0.2   controlplane   Manifest   10-kube-proxy                                              1
172.20.0.2   controlplane   Manifest   11-core-dns                                                1
172.20.0.2   controlplane   Manifest   11-core-dns-svc                                            1
172.20.0.2   controlplane   Manifest   11-kube-config-in-cluster                                  1
```

Make sure that the manifests and static pods are correct across all control plane nodes, as each node reconciles the control plane state on its own.
For example, the CNI configuration in the machine config should be in sync across all the nodes.
Talos nodes try to create any missing Kubernetes resources from the manifests, but they never update or delete existing resources.

If something looks wrong, the script can be aborted and the machine configuration updated to fix the problem.
Once the configuration is updated, the script can be restarted.

If the static pod definitions and manifests look good, confirm the next step to disable `pod-checkpointer`:

```bash
$ talosctl -n convert-k8s
...
confirm disabling pod-checkpointer to proceed with control plane update [yes/no]: yes
disabling pod-checkpointer
deleting daemonset "pod-checkpointer"
checking for active pod checkpoints
2021/03/09 23:37:25 retrying error: found 3 active pod checkpoints: [pod-checkpointer-655gc-talos-default-master-3 pod-checkpointer-pw6mv-talos-default-master-1 pod-checkpointer-zdw9z-talos-default-master-2]
2021/03/09 23:42:25 retrying error: found 1 active pod checkpoints: [pod-checkpointer-pw6mv-talos-default-master-1]
confirm applying static pod definitions and manifests [yes/no]:
```

The self-hosted control plane runs `pod-checkpointer` to work around issues with control plane availability.
It should be disabled before the conversion starts so that the self-hosted control plane can be removed.
It takes around 5 minutes for `pod-checkpointer` to be fully disabled.
The script verifies that all checkpoints are removed before proceeding.

This last confirmation marks the point of no return: once it is given, the self-hosted control plane can no longer be kept running.
Static pods are released, bootstrap manifests are applied, and the self-hosted control plane is removed.

```bash
$ talosctl -n convert-k8s
...
confirm applying static pod definitions and manifests [yes/no]: yes
removing self-hosted initialized key
waiting for static pods for "kube-apiserver" to be present in the API server state
waiting for static pods for "kube-controller-manager" to be present in the API server state
waiting for static pods for "kube-scheduler" to be present in the API server state
deleting daemonset "kube-apiserver"
waiting for static pods for "kube-apiserver" to be present in the API server state
deleting daemonset "kube-controller-manager"
waiting for static pods for "kube-controller-manager" to be present in the API server state
deleting daemonset "kube-scheduler"
waiting for static pods for "kube-scheduler" to be present in the API server state
conversion process completed successfully
```

As soon as the control plane static pods are rendered, the kubelet starts them.
It is expected that the pods for `kube-apiserver` will crash initially.
Only one `kube-apiserver` can be bound to the host `Node`'s port 6443 at a time.
Eventually, the old `kube-apiserver` will be killed, and the new one will be able to start.
This is all handled automatically.
The script continues by removing each self-hosted daemonset and verifying that the static pods are ready and healthy.

## Manual Conversion

Check that Talos runs the self-hosted control plane:

```bash
$ talosctl -n get bs
NODE         NAMESPACE   TYPE              ID              VERSION   SELF HOSTED
172.20.0.2   runtime     BootstrapStatus   control-plane   2         true
```

The Talos machine configuration needs to be updated to the 0.9 format; there are two new required machine configuration settings:

* `.cluster.serviceAccount` is the service account PEM-encoded private key.
* `.cluster.aggregatorCA` is the aggregator CA for `kube-apiserver` (certificate and private key).
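
For the second setting, one possible way to produce a certificate and key with OpenSSL is sketched here, using the parameters this guide gives for the aggregator CA (CN `front-proxy`, valid for 10 years). This is a sketch under assumptions, not the method Talos itself uses: the function names and file names are hypothetical, the curve `prime256v1` is an arbitrary choice, and the sketch relies on the stock `openssl.cnf`, in which `req -x509` is expected to apply the CA extensions.

```shell
#!/bin/sh
# Generate an ECDSA CA key and a self-signed certificate with CN "front-proxy",
# valid for 3650 days (about 10 years), into the directory given as $1.
# File names (aggregator-ca.key, aggregator-ca.crt) are hypothetical.
generate_aggregator_ca() {
  dir="$1"
  openssl ecparam -name prime256v1 -genkey -noout -out "$dir/aggregator-ca.key" &&
  openssl req -x509 -new -key "$dir/aggregator-ca.key" -sha256 -days 3650 \
    -subj "/CN=front-proxy" -out "$dir/aggregator-ca.crt"
}

# Base64-encode a PEM file as a single line, suitable for pasting into a
# machine config patch. Stripping newlines keeps this portable across
# base64 implementations that wrap their output.
b64_oneline() {
  base64 < "$1" | tr -d '\n'
}

# Example:
#   generate_aggregator_ca .
#   b64_oneline aggregator-ca.crt
#   b64_oneline aggregator-ca.key
```

The two base64 strings would then be used as the `crt` and `key` values of `.cluster.aggregatorCA`.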

The current service account key can be fetched from the Kubernetes secrets:

```bash
$ kubectl -n kube-system get secrets kube-controller-manager -o jsonpath='{.data.service\-account\.key}'
LS0tLS1CRUdJTiBSU0EgUFJJVkFURS...
```

All control plane node machine configurations should be patched with the service account key:

```bash
$ talosctl -n ,,... patch mc --immediate -p '[{"op": "add", "path": "/cluster/serviceAccount", "value": {"key": "LS0tLS1CRUdJTiBSU0EgUFJJVkFURS..."}}]'
patched mc at the node 172.20.0.2
```

The aggregator CA can be generated using OpenSSL or any other certificate generation tool: an RSA or ECDSA certificate with CN `front-proxy`, valid for 10 years.
The PEM-encoded CA certificate and key should be base64-encoded and patched into the machine config at path `/cluster/aggregatorCA`:

```bash
$ talosctl -n ,,... patch mc --immediate -p '[{"op": "add", "path": "/cluster/aggregatorCA", "value": {"crt": "S0tLS1CRUdJTiBDRVJUSUZJQ...", "key": "LS0tLS1CRUdJTiBFQy..."}}]'
patched mc at the node 172.20.0.2
```

At this point static pod definitions and bootstrap manifests should be rendered; see "Automated Conversion" for how to verify the generated objects.
Feel free to continue refining your machine configuration until the generated static pod definitions and bootstrap manifests look good.

If static pod definitions are not generated, check the logs with `talosctl -n logs controller-runtime`.

Disable `pod-checkpointer` with:

```bash
$ kubectl -n kube-system delete ds pod-checkpointer
daemonset.apps "pod-checkpointer" deleted
```

Wait for all pod checkpoints to be removed:

```bash
$ kubectl -n kube-system get pods
NAME                                            READY   STATUS    RESTARTS   AGE
...
pod-checkpointer-8q2lh-talos-default-master-2   1/1     Running   0          3m34s
pod-checkpointer-nnm5w-talos-default-master-3   1/1     Running   0          3m24s
pod-checkpointer-qnmdt-talos-default-master-1   1/1     Running   0          2m21s
```

Pod checkpoints have the annotation `checkpointer.alpha.coreos.com/checkpoint-of`.
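
Since checkpoints are identified by that annotation, the wait can be scripted instead of watched by eye. The following is a rough sketch only: the helper name `active_checkpoints` and the 10-second polling interval are assumptions, and the JSONPath expression in the usage comment has not been verified against a live self-hosted cluster.

```shell
#!/bin/sh
# Print the names of pods that carry a non-empty checkpoint annotation.
# Reads tab-separated "<pod name><TAB><annotation value>" lines on stdin;
# pods without the annotation have an empty second field and are skipped.
active_checkpoints() {
  awk -F '\t' '$2 != "" { print $1 }'
}

# Example polling loop (requires cluster access):
#   jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.checkpointer\.alpha\.coreos\.com/checkpoint-of}{"\n"}{end}'
#   while [ -n "$(kubectl -n kube-system get pods -o jsonpath="$jsonpath" | active_checkpoints)" ]; do
#     sleep 10
#   done
```

The loop exits once no pod in `kube-system` reports the annotation.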

Once all the pod checkpoints are removed (this takes around 5 minutes), proceed by removing the self-hosted initialized key:

```bash
talosctl -n convert-k8s --remove-initialized-key
```

Talos controllers will now render static pod definitions, and the kubelet will launch any resulting static pods.

Once the static pods are visible in the `kubectl get pods -n kube-system` output, proceed by removing each of the self-hosted daemonsets:

```bash
$ kubectl -n kube-system delete daemonset kube-apiserver
daemonset.apps "kube-apiserver" deleted
```

Make sure the static pods for `kube-apiserver` started successfully and that the pods are running and ready.

Proceed by deleting the `kube-controller-manager` and `kube-scheduler` daemonsets, verifying that the static pods are running between each step:

```bash
$ kubectl -n kube-system delete daemonset kube-controller-manager
daemonset.apps "kube-controller-manager" deleted
```

```bash
$ kubectl -n kube-system delete daemonset kube-scheduler
daemonset.apps "kube-scheduler" deleted
```
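
The "running and ready" check between daemonset deletions can be done by reading the `kubectl get pods` output, or with a small filter such as the sketch below. The helper name `unready_pods` is hypothetical, and the sketch assumes the default column layout (`NAME READY STATUS RESTARTS AGE`) and that pods for a component are named with the component name as a prefix.

```shell
#!/bin/sh
# Print the names of pods for one control plane component that are not yet
# Running with all containers ready.
# $1:    component name, e.g. kube-apiserver
# stdin: `kubectl -n kube-system get pods --no-headers` output
#        (assumed columns: NAME READY STATUS RESTARTS AGE)
unready_pods() {
  awk -v prefix="$1-" '
    index($1, prefix) == 1 {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}

# Example (requires cluster access):
#   kubectl -n kube-system get pods --no-headers | unready_pods kube-apiserver
```

An empty result for a component suggests it is safe to move on to the next daemonset. Because `kube-apiserver` pods are expected to crash initially, the check may need to be repeated for a while before it comes back empty.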