mirror of
				https://github.com/siderolabs/talos.git
				synced 2025-10-26 05:51:17 +01:00 
			
		
		
		
	This is mostly refactoring to adapt to the new APIs. There are some small changes which are not user-visible immediately (but visible when using `talosctl get` to inspect low-level details): * `extras` namespace is removed, it was a hack to distinguish extra and system manifests * `Manifests` are managed by two controllers as shared outputs, stored in the `controlplane` namespace now * `talosctl inspect dependencies` output got slightly changed * resources now have `md.owner` set to the controller name which manages the resource Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
		
			
				
	
	
		
			258 lines
		
	
	
		
			12 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
| ---
 | |
| title: "Converting Control Plane"
 | |
| description: "How to convert a Talos self-hosted Kubernetes control plane (pre-0.9) to one based on static pods."
 | |
| ---
 | |
| 
 | |
| Talos version 0.9 runs the Kubernetes control plane in a new way: as static pods managed by Talos.
 | |
| Talos versions 0.8 and below run a self-hosted control plane.
 | |
| After upgrading Talos OS to version 0.9, the Kubernetes control plane should be converted to run as static pods.
 | |
| 
 | |
| This guide describes the automated conversion script and also walks through the detailed manual conversion process.
 | |
| 
 | |
| ## Video Walkthrough
 | |
| 
 | |
| For a live demo of this writeup, watch the video below:
 | |
| 
 | |
| <iframe width="560" height="315" src="https://www.youtube.com/embed/nUuFYLEp7wQ" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
 | |
| 
 | |
| ## Automated Conversion
 | |
| 
 | |
| First, make sure all nodes are updated to Talos 0.9:
 | |
| 
 | |
| ```bash
 | |
| $ kubectl get nodes -o wide
 | |
| NAME                     STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
 | |
| talos-default-master-1   Ready    control-plane,master   58m   v1.20.4   172.20.0.2    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
 | |
| talos-default-master-2   Ready    control-plane,master   58m   v1.20.4   172.20.0.3    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
 | |
| talos-default-master-3   Ready    control-plane,master   58m   v1.20.4   172.20.0.4    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
 | |
| talos-default-worker-1   Ready    <none>                 58m   v1.20.4   172.20.0.5    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
 | |
| ```
 | |
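The same check can be scripted. The sketch below is only an illustration: `nodes_not_on_talos_09` is a hypothetical helper name, and it assumes the node OS image string starts with `Talos (v0.9.`, as in the output above.

```bash
# Hypothetical helper: reads "<node name><TAB><OS image>" lines on stdin and
# prints the names of nodes whose OS image is not a Talos v0.9.x release.
nodes_not_on_talos_09() {
  awk -F '\t' 'index($2, "Talos (v0.9.") != 1 { print $1 }'
}

# Usage against a live cluster (not run here):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.osImage}{"\n"}{end}' \
#     | nodes_not_on_talos_09
```

An empty result means every node reports a 0.9 release; any printed name is a node that still needs the OS upgrade.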
| 
 | |
| Start the conversion script:
 | |
| 
 | |
| ```bash
 | |
| $ talosctl -n <IP> convert-k8s
 | |
| discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
 | |
| current self-hosted status: true
 | |
| gathering control plane configuration
 | |
| aggregator CA key can't be recovered from bootkube-boostrapped control plane, generating new CA
 | |
| patching master node "172.20.0.2" configuration
 | |
| patching master node "172.20.0.3" configuration
 | |
| patching master node "172.20.0.4" configuration
 | |
| waiting for static pod definitions to be generated
 | |
| waiting for manifests to be generated
 | |
| Talos generated control plane static pod definitions and bootstrap manifests, please verify them with commands:
 | |
|     talosctl -n <master node IP> get StaticPods.kubernetes.talos.dev
 | |
|     talosctl -n <master node IP> get Manifests.kubernetes.talos.dev
 | |
| 
 | |
| in order to remove self-hosted control plane, pod-checkpointer component needs to be disabled
 | |
| once pod-checkpointer is disabled, the cluster shouldn't be rebooted until the entire conversion process is complete
 | |
| confirm disabling pod-checkpointer to proceed with control plane update [yes/no]:
 | |
| ```
 | |
| 
 | |
| The script stops at this point and waits for confirmation.
 | |
| Talos still runs the self-hosted control plane, and the static pods have not been rendered yet.
 | |
| 
 | |
| As instructed by the script, please verify that static pod definitions are correct:
 | |
| 
 | |
| ```bash
 | |
| $ talosctl -n <IP> get staticpods -o yaml
 | |
| node: 172.20.0.2
 | |
| metadata:
 | |
|     namespace: controlplane
 | |
|     type: StaticPods.kubernetes.talos.dev
 | |
|     id: kube-apiserver
 | |
|     version: 1
 | |
|     phase: running
 | |
| spec:
 | |
|     apiVersion: v1
 | |
|     kind: Pod
 | |
|     metadata:
 | |
|         annotations:
 | |
|             talos.dev/config-version: "2"
 | |
|             talos.dev/secrets-version: "1"
 | |
|         creationTimestamp: null
 | |
|         labels:
 | |
|             k8s-app: kube-apiserver
 | |
|             tier: control-plane
 | |
|         name: kube-apiserver
 | |
|         namespace: kube-system
 | |
|     spec:
 | |
|         containers:
 | |
|             - command:
 | |
| ...
 | |
| ```
 | |
| 
 | |
| Static pod definitions are generated from the machine configuration and should match the pod templates generated by Talos when the self-hosted control plane was bootstrapped, unless manual changes were applied to the daemonset specs after bootstrap.
 | |
| Talos patches the machine configuration with the container image versions scraped from the daemonset definitions and fetches the service account key from Kubernetes secrets.
 | |
| 
 | |
| The aggregator CA can't be recovered from the self-hosted control plane, so a new CA is generated.
 | |
| This is generally harmless and not visible from outside the cluster.
 | |
| The aggregator CA is _not_ the same CA as the one used by Talos or the standard Kubernetes API.
 | |
| It is a special PKI used for aggregating API extension services inside your cluster.
 | |
| If you have non-standard apiserver aggregations (fairly rare, and you should know if you do), then you may need to restart these services after the new CA is in place.
 | |
| 
 | |
| Verify that bootstrap manifests are correct:
 | |
| 
 | |
| ```bash
 | |
| $ talosctl -n <IP> get manifests
 | |
| NODE         NAMESPACE      TYPE       ID                               VERSION
 | |
| 172.20.0.2   controlplane   Manifest   00-kubelet-bootstrapping-token   1
 | |
| 172.20.0.2   controlplane   Manifest   01-csr-approver-role-binding     1
 | |
| 172.20.0.2   controlplane   Manifest   01-csr-node-bootstrap            1
 | |
| 172.20.0.2   controlplane   Manifest   01-csr-renewal-role-binding      1
 | |
| 172.20.0.2   controlplane   Manifest   02-kube-system-sa-role-binding   1
 | |
| 172.20.0.2   controlplane   Manifest   03-default-pod-security-policy   1
 | |
| 172.20.0.2   controlplane   Manifest   05-https://docs.projectcalico.org/manifests/calico.yaml   1
 | |
| 172.20.0.2   controlplane   Manifest   10-kube-proxy                    1
 | |
| 172.20.0.2   controlplane   Manifest   11-core-dns                      1
 | |
| 172.20.0.2   controlplane   Manifest   11-core-dns-svc                  1
 | |
| 172.20.0.2   controlplane   Manifest   11-kube-config-in-cluster        1
 | |
| ```
 | |
| 
 | |
| Make sure that manifests and static pods are correct across all control plane nodes, as each node reconciles
 | |
| control plane state on its own.
 | |
| For example, the CNI configuration in the machine config should be in sync across all the nodes.
 | |
| Talos nodes try to create any missing Kubernetes resources from the manifests, but they never
 | |
| update or delete existing resources.
 | |
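One way to spot differences between nodes is to compare the manifest IDs each node reports. This is a minimal sketch, not part of Talos: `manifest_ids` is a hypothetical helper, the file names are arbitrary, and it assumes the table layout shown above, with the ID in the fourth column.

```bash
# Hypothetical helper: reads `talosctl get manifests` table output on stdin
# and prints the sorted manifest IDs (4th column), skipping the header row.
manifest_ids() {
  awk 'NR > 1 && NF >= 4 { print $4 }' | sort
}

# Usage against two control plane nodes (not run here):
#   talosctl -n 172.20.0.2 get manifests | manifest_ids > node-2.ids
#   talosctl -n 172.20.0.3 get manifests | manifest_ids > node-3.ids
#   diff node-2.ids node-3.ids && echo "manifest IDs match"
```

This compares only the set of IDs, not the manifest contents, so a matching result is a necessary check rather than a sufficient one.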
| 
 | |
| If something looks wrong, abort the script and update the machine configuration to fix the problem.
 | |
| Once the configuration is updated, the script can be restarted.
 | |
| 
 | |
| If the static pod definitions and manifests look good, confirm the next step to disable `pod-checkpointer`:
 | |
| 
 | |
| ```bash
 | |
| $ talosctl -n <IP> convert-k8s
 | |
| ...
 | |
| confirm disabling pod-checkpointer to proceed with control plane update [yes/no]: yes
 | |
| disabling pod-checkpointer
 | |
| deleting daemonset "pod-checkpointer"
 | |
| checking for active pod checkpoints
 | |
| 2021/03/09 23:37:25 retrying error: found 3 active pod checkpoints: [pod-checkpointer-655gc-talos-default-master-3 pod-checkpointer-pw6mv-talos-default-master-1 pod-checkpointer-zdw9z-talos-default-master-2]
 | |
| 2021/03/09 23:42:25 retrying error: found 1 active pod checkpoints: [pod-checkpointer-pw6mv-talos-default-master-1]
 | |
| confirm applying static pod definitions and manifests [yes/no]:
 | |
| ```
 | |
| 
 | |
| The self-hosted control plane runs `pod-checkpointer` to work around issues with control plane availability.
 | |
| It must be disabled before the conversion starts so that the self-hosted control plane can be removed.
 | |
| It takes around 5 minutes for the `pod-checkpointer` to be fully disabled.
 | |
| The script verifies that all checkpoints are removed before proceeding.
 | |
| 
 | |
| This final confirmation marks the point of no return, after which the self-hosted control plane cannot be kept running:
 | |
| static pods are released, bootstrap manifests are applied, and the self-hosted control plane is removed.
 | |
| 
 | |
| ```bash
 | |
| $ talosctl -n <IP> convert-k8s
 | |
| ...
 | |
| confirm applying static pod definitions and manifests [yes/no]: yes
 | |
| removing self-hosted initialized key
 | |
| waiting for static pods for "kube-apiserver" to be present in the API server state
 | |
| waiting for static pods for "kube-controller-manager" to be present in the API server state
 | |
| waiting for static pods for "kube-scheduler" to be present in the API server state
 | |
| deleting daemonset "kube-apiserver"
 | |
| waiting for static pods for "kube-apiserver" to be present in the API server state
 | |
| deleting daemonset "kube-controller-manager"
 | |
| waiting for static pods for "kube-controller-manager" to be present in the API server state
 | |
| deleting daemonset "kube-scheduler"
 | |
| waiting for static pods for "kube-scheduler" to be present in the API server state
 | |
| conversion process completed successfully
 | |
| ```
 | |
| 
 | |
| As soon as the control plane static pods are rendered, the kubelet starts them.
 | |
| It is expected that the pods for `kube-apiserver` will crash initially.
 | |
| Only one `kube-apiserver` can be bound to the host `Node`'s port 6443 at a time.
 | |
| Eventually, the old `kube-apiserver` will be killed, and the new one will be able to start.
 | |
| This is all handled automatically.
 | |
| The script will continue by removing each self-hosted daemonset and verifying that static pods are ready and healthy.
 | |
| 
 | |
| ## Manual Conversion
 | |
| 
 | |
| Check that Talos runs the self-hosted control plane:
 | |
| 
 | |
| ```bash
 | |
| $ talosctl -n <CONTROL_PLANE_IP> get bs
 | |
| NODE         NAMESPACE   TYPE              ID              VERSION   SELF HOSTED
 | |
| 172.20.0.2   runtime     BootstrapStatus   control-plane   2         true
 | |
| ```
 | |
| 
 | |
| The Talos machine configuration needs to be updated to the 0.9 format; there are two new required machine configuration settings:
 | |
| 
 | |
| * `.cluster.serviceAccount` is the service account PEM-encoded private key.
 | |
| * `.cluster.aggregatorCA` is the aggregator CA for `kube-apiserver` (certificate and private key).
 | |
| 
 | |
| The current service account key can be fetched from the Kubernetes secrets:
 | |
| 
 | |
| ```bash
 | |
| $ kubectl -n kube-system get secrets kube-controller-manager -o jsonpath='{.data.service\-account\.key}'
 | |
| LS0tLS1CRUdJTiBSU0EgUFJJVkFURS...
 | |
| ```
 | |
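Before patching, it can be worth confirming that the fetched value really is a base64-encoded PEM private key. The following is a sketch under assumptions: `is_pem_private_key` is a hypothetical helper, and it relies on a `base64` tool that accepts `-d` for decoding (as GNU coreutils does).

```bash
# Hypothetical helper: reads a base64 string on stdin and succeeds only if it
# decodes to text whose first line is a PEM private-key header.
is_pem_private_key() {
  base64 -d 2>/dev/null | head -n 1 | grep -Eq '^-----BEGIN ([A-Z]+ )?PRIVATE KEY-----$'
}

# Usage against a live cluster (not run here):
#   kubectl -n kube-system get secrets kube-controller-manager -o jsonpath='{.data.service\-account\.key}' \
#     | is_pem_private_key && echo "looks like a PEM private key"
```

The check only inspects the header line; it does not validate the key material itself.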
| 
 | |
| All control plane node machine configurations should be patched with the service account key:
 | |
| 
 | |
| ```bash
 | |
| $ talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... patch mc --immediate -p '[{"op": "add", "path": "/cluster/serviceAccount", "value": {"key": "LS0tLS1CRUdJTiBSU0EgUFJJVkFURS..."}}]'
 | |
| patched mc at the node 172.20.0.2
 | |
| ```
 | |
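To avoid copying the key by hand, the patch document can be assembled from the fetched value. This is an illustrative sketch: `service_account_patch` is a hypothetical helper that only formats the same JSON patch shown above.

```bash
# Hypothetical helper: prints the JSON patch for /cluster/serviceAccount,
# given the base64-encoded key as the first argument.
service_account_patch() {
  printf '[{"op": "add", "path": "/cluster/serviceAccount", "value": {"key": "%s"}}]\n' "$1"
}

# Usage (not run here):
#   key=$(kubectl -n kube-system get secrets kube-controller-manager -o jsonpath='{.data.service\-account\.key}')
#   talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... patch mc --immediate -p "$(service_account_patch "$key")"
```

Base64 output contains only letters, digits, `+`, `/`, and `=`, so the value needs no JSON escaping.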
| 
 | |
| The aggregator CA can be generated using OpenSSL or any other certificate generation tool: it should be an RSA or ECDSA certificate with CN `front-proxy`, valid for 10 years.
 | |
| The PEM-encoded CA certificate and key should be base64-encoded and patched into the machine config at path `/cluster/aggregatorCA`:
 | |
| 
 | |
| ```bash
 | |
| $ talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... patch mc --immediate -p '[{"op": "add", "path": "/cluster/aggregatorCA", "value": {"crt": "LS0tLS1CRUdJTiBDRVJUSUZJQ...", "key": "LS0tLS1CRUdJTiBFQy..."}}]'
 | |
| patched mc at the node 172.20.0.2
 | |
| ```
 | |
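One possible way to produce the `crt` and `key` values with OpenSSL is sketched below. The function names and file names are arbitrary choices for illustration, 10 years is approximated as 3650 days, and whether the resulting certificate carries CA basic constraints depends on the local OpenSSL configuration, so inspect it with `openssl x509 -text` before use.

```bash
# Sketch: generate an ECDSA key and a self-signed certificate with
# CN front-proxy, valid for 3650 days, into the directory given as $1.
generate_aggregator_ca() {
  dir=$1
  openssl ecparam -name prime256v1 -genkey -noout -out "$dir/aggregator-ca.key" &&
  openssl req -x509 -new -key "$dir/aggregator-ca.key" -sha256 -days 3650 \
    -subj "/CN=front-proxy" -out "$dir/aggregator-ca.crt"
}

# Prints the file given as $1 as a single-line base64 string, which is the
# form expected for the "crt" and "key" patch values.
base64_one_line() {
  base64 < "$1" | tr -d '\n'
}

# Usage (not run here):
#   generate_aggregator_ca .
#   base64_one_line aggregator-ca.crt   # value for "crt"
#   base64_one_line aggregator-ca.key   # value for "key"
```

The same CA certificate and key must be patched into every control plane node, so generate them once and reuse the values.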
| 
 | |
| At this point, static pod definitions and bootstrap manifests should be rendered; see "Automated Conversion" for how to verify the generated objects.
 | |
| Feel free to continue to refine your machine configuration until the generated static pod definitions and bootstrap manifests look good.
 | |
| 
 | |
| If static pod definitions are not generated, check the logs with `talosctl -n <IP> logs controller-runtime`.
 | |
| 
 | |
| Disable `pod-checkpointer` with:
 | |
| 
 | |
| ```bash
 | |
| $ kubectl -n kube-system delete ds pod-checkpointer
 | |
| daemonset.apps "pod-checkpointer" deleted
 | |
| ```
 | |
| 
 | |
| Wait for all pod checkpoints to be removed:
 | |
| 
 | |
| ```bash
 | |
| $ kubectl -n kube-system get pods
 | |
| NAME                                            READY   STATUS    RESTARTS   AGE
 | |
| ...
 | |
| pod-checkpointer-8q2lh-talos-default-master-2   1/1     Running   0          3m34s
 | |
| pod-checkpointer-nnm5w-talos-default-master-3   1/1     Running   0          3m24s
 | |
| pod-checkpointer-qnmdt-talos-default-master-1   1/1     Running   0          2m21s
 | |
| ```
 | |
| 
 | |
| Pod checkpoints have the annotation `checkpointer.alpha.coreos.com/checkpoint-of`.
 | |
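The wait can be automated by counting pods that carry this annotation. This is a minimal sketch: `count_pod_checkpoints` is a hypothetical helper, the 15-second polling interval is arbitrary, and the usage assumes `kubectl` JSONPath accepts the escaped-dot form of the annotation key.

```bash
# Hypothetical helper: reads "<pod name><TAB><checkpoint-of annotation>" lines
# on stdin and prints how many pods carry a non-empty annotation value.
count_pod_checkpoints() {
  awk -F '\t' '$2 != "" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (not run here): poll until no checkpoints remain.
#   while :; do
#     n=$(kubectl -n kube-system get pods \
#           -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.checkpointer\.alpha\.coreos\.com/checkpoint-of}{"\n"}{end}' \
#         | count_pod_checkpoints)
#     [ "$n" -eq 0 ] && break
#     echo "still $n active pod checkpoints"
#     sleep 15
#   done
```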
| 
 | |
| Once all the pod checkpoints are removed (this takes around 5 minutes), proceed by removing the self-hosted initialized key:
 | |
| 
 | |
| ```bash
 | |
| $ talosctl -n <CONTROL_PLANE_IP> convert-k8s --remove-initialized-key
 | |
| ```
 | |
| 
 | |
| Talos controllers will now render static pod definitions, and the kubelet will launch any resulting static pods.
 | |
| 
 | |
| Once static pods are visible in `kubectl get pods -n kube-system` output, proceed by removing each of the self-hosted daemonsets:
 | |
| 
 | |
| ```bash
 | |
| $ kubectl -n kube-system delete daemonset kube-apiserver
 | |
| daemonset.apps "kube-apiserver" deleted
 | |
| ```
 | |
| 
 | |
| Make sure the static pods for `kube-apiserver` started successfully and are running and ready.
 | |
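Readiness can be checked from the `Ready` condition of each pod. The sketch below is illustrative: `all_pods_ready` is a hypothetical helper, and the usage assumes the static pods carry the `k8s-app=kube-apiserver` label shown in the static pod definition earlier. If the self-hosted daemonset pods use the same label, the selector will match them as well until they are gone.

```bash
# Hypothetical helper: reads "<pod name><TAB><Ready condition status>" lines on
# stdin; succeeds only if there is at least one pod and all report "True".
all_pods_ready() {
  awk -F '\t' '$2 != "True" { bad++ } END { exit ((NR > 0 && bad == 0) ? 0 : 1) }'
}

# Usage against a live cluster (not run here):
#   kubectl -n kube-system get pods -l k8s-app=kube-apiserver \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
#     | all_pods_ready && echo "kube-apiserver pods are ready"
```

The same check applies to `kube-controller-manager` and `kube-scheduler` by changing the label value, assuming their static pods are labeled the same way.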
| 
 | |
| Proceed by deleting the `kube-controller-manager` and `kube-scheduler` daemonsets, verifying after each deletion that the corresponding static pods are running:
 | |
| 
 | |
| ```bash
 | |
| $ kubectl -n kube-system delete daemonset kube-controller-manager
 | |
| daemonset.apps "kube-controller-manager" deleted
 | |
| ```
 | |
| 
 | |
| ```bash
 | |
| $ kubectl -n kube-system delete daemonset kube-scheduler
 | |
| daemonset.apps "kube-scheduler" deleted
 | |
| ```
 |