--- title: Upgrading Kubernetes --- This guide covers Kubernetes control plane upgrade for clusters running Talos-managed control plane. If the cluster is still running self-hosted control plane (after upgrade from Talos 0.8), please refer to 0.8 docs. ## Video Walkthrough To see a live demo of this writeup, see the video below: ## Automated Kubernetes Upgrade To check what is going to be upgraded you can run `talosctl upgrade-k8s` with `--dry-run` flag: ```bash $ talosctl --nodes upgrade-k8s --from 1.21.3 --to 1.22.0 --dry-run checking for resource APIs to be deprecated in version 1.22.0 WARNING: found resources which are going to be deprecated/migrated in the version 1.22.0 RESOURCE COUNT validatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io 4 mutatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io 3 customresourcedefinitions.v1beta1.apiextensions.k8s.io 25 apiservices.v1beta1.apiregistration.k8s.io 54 leases.v1beta1.coordination.k8s.io 4 discovered master nodes ["10.5.0.2"] updating "kube-apiserver" to version "1.22.0" > "10.5.0.2": starting update > update kube-apiserver: v1.21.3 -> 1.22.0 > skipped in dry-run updating "kube-controller-manager" to version "1.22.0" > "10.5.0.2": starting update > update kube-controller-manager: v1.21.3 -> 1.22.0 > skipped in dry-run updating "kube-scheduler" to version "1.22.0" > "10.5.0.2": starting update > update kube-scheduler: v1.21.3 -> 1.22.0 > skipped in dry-run updating daemonset "kube-proxy" to version "1.22.0" skipped in dry-run ``` To upgrade Kubernetes from v1.21.3 to v1.22.0 run: ```bash $ talosctl --nodes upgrade-k8s --from 1.21.3 --to 1.22.0 checking for resource APIs to be deprecated in version 1.22.0 discovered master nodes ["10.5.0.2"] updating "kube-apiserver" to version "1.22.0" > "10.5.0.2": starting update > update kube-apiserver: v1.21.3 -> 1.22.0 > "10.5.0.2": machine configuration patched > "10.5.0.2": waiting for API server state pod update < "10.5.0.2": successfully updated updating "kube-controller-manager" to version "1.22.0" > "10.5.0.2": starting update > update kube-controller-manager: v1.21.3 -> 1.22.0 > "10.5.0.2": machine configuration patched > "10.5.0.2": waiting for API server state pod update < "10.5.0.2": successfully updated updating "kube-scheduler" to version "1.22.0" > "10.5.0.2": starting update > update kube-scheduler: v1.21.3 -> 1.22.0 > "10.5.0.2": machine configuration patched > "10.5.0.2": waiting for API server state pod update < "10.5.0.2": successfully updated updating daemonset "kube-proxy" to version "1.22.0" ``` Script runs in two phases: 1. In the first phase every control plane node machine configuration is patched with new image version for each control plane component. Talos renders new static pod definition on configuration update which is picked up by the kubelet. Script waits for the change to propagate to the API server state. Messages `config version mismatch` indicate that script is waiting for the updated container to be registered in the API server. 2. In the second phase script updates `kube-proxy` daemonset with the new image version. If script fails for any reason, it can be safely restarted to continue upgrade process. ## Manual Kubernetes Upgrade Kubernetes can be upgraded manually as well by following the steps outlined below. They are equivalent to the steps performed by the `talosctl upgrade-k8s` command. ### Kubeconfig In order to edit the control plane, we will need a working `kubectl` config. If you don't already have one, you can get one by running: ```bash talosctl --nodes kubeconfig ``` ### API Server Patch machine configuration using `talosctl patch` command: ```bash $ talosctl -n patch mc --immediate -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "k8s.gcr.io/kube-apiserver:v1.20.4"}]' patched mc at the node 172.20.0.2 ``` JSON patch might need to be adjusted if current machine configuration is missing `.cluster.apiServer.image` key. Also machine configuration can be edited manually with `talosctl -n edit mc --immediate`. Capture new version of `kube-apiserver` config with: ```bash $ talosctl -n get kcpc kube-apiserver -o yaml node: 172.20.0.2 metadata: namespace: config type: KubernetesControlPlaneConfigs.config.talos.dev id: kube-apiserver version: 5 phase: running spec: image: k8s.gcr.io/kube-apiserver:v1.20.4 cloudProvider: "" controlPlaneEndpoint: https://172.20.0.1:6443 etcdServers: - https://127.0.0.1:2379 localPort: 6443 serviceCIDR: 10.96.0.0/12 extraArgs: {} extraVolumes: [] ``` In this example, new version is `5`. Wait for the new pod definition to propagate to the API server state (replace `talos-default-master-1` with the node name): ```bash $ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}' 5 ``` Check that the pod is running: ```bash $ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-master-1 NAME READY STATUS RESTARTS AGE kube-apiserver-talos-default-master-1 1/1 Running 0 16m ``` Repeat this process for every control plane node, verifying that state got propagated successfully between each node update. ### Controller Manager Patch machine configuration using `talosctl patch` command: ```bash $ talosctl -n patch mc --immediate -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "k8s.gcr.io/kube-controller-manager:v1.20.4"}]' patched mc at the node 172.20.0.2 ``` JSON patch might need be adjusted if current machine configuration is missing `.cluster.controllerManager.image` key. Capture new version of `kube-controller-manager` config with: ```bash $ talosctl -n get kcpc kube-controller-manager -o yaml node: 172.20.0.2 metadata: namespace: config type: KubernetesControlPlaneConfigs.config.talos.dev id: kube-controller-manager version: 3 phase: running spec: image: k8s.gcr.io/kube-controller-manager:v1.20.4 cloudProvider: "" podCIDR: 10.244.0.0/16 serviceCIDR: 10.96.0.0/12 extraArgs: {} extraVolumes: [] ``` In this example, new version is `3`. Wait for the new pod definition to propagate to the API server state (replace `talos-default-master-1` with the node name): ```bash $ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}' 3 ``` Check that the pod is running: ```bash $ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-master-1 NAME READY STATUS RESTARTS AGE kube-controller-manager-talos-default-master-1 1/1 Running 0 35m ``` Repeat this process for every control plane node, verifying that state got propagated successfully between each node update. ### Scheduler Patch machine configuration using `talosctl patch` command: ```bash $ talosctl -n patch mc --immediate -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "k8s.gcr.io/kube-scheduler:v1.20.4"}]' patched mc at the node 172.20.0.2 ``` JSON patch might need be adjusted if current machine configuration is missing `.cluster.scheduler.image` key. Capture new version of `kube-scheduler` config with: ```bash $ talosctl -n get kcpc kube-scheduler -o yaml node: 172.20.0.2 metadata: namespace: config type: KubernetesControlPlaneConfigs.config.talos.dev id: kube-scheduler version: 3 phase: running spec: image: k8s.gcr.io/kube-scheduler:v1.20.4 extraArgs: {} extraVolumes: [] ``` In this example, new version is `3`. Wait for the new pod definition to propagate to the API server state (replace `talos-default-master-1` with the node name): ```bash $ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}' 3 ``` Check that the pod is running: ```bash $ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-master-1 NAME READY STATUS RESTARTS AGE kube-scheduler-talos-default-master-1 1/1 Running 0 39m ``` Repeat this process for every control plane node, verifying that state got propagated successfully between each node update. ### Proxy In the proxy's `DaemonSet`, change: ```yaml kind: DaemonSet ... spec: ... template: ... spec: containers: - name: kube-proxy image: k8s.gcr.io/kube-proxy:v1.20.1 tolerations: - ... ``` to: ```yaml kind: DaemonSet ... spec: ... template: ... spec: containers: - name: kube-proxy image: k8s.gcr.io/kube-proxy:v1.20.4 tolerations: - ... - key: node-role.kubernetes.io/control-plane operator: Exists effect: NoSchedule ``` To edit the `DaemonSet`, run: ```bash kubectl edit daemonsets -n kube-system kube-proxy ``` ## Kubelet Upgrading Kubelet version requires Talos node reboot after machine configuration change. For every node, patch machine configuration with new kubelet version, wait for the node to reboot: ```bash $ talosctl -n patch mc -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/talos-systems/kubelet:v1.20.4"}]' patched mc at the node 172.20.0.2 ``` Once node boots with the new configuration, confirm upgrade with `kubectl get nodes `: ```bash $ kubectl get nodes talos-default-master-1 NAME STATUS ROLES AGE VERSION talos-default-master-1 Ready control-plane,master 123m v1.20.4 ```