diff --git a/website/content/docs/v0.15/Guides/upgrading-kubernetes.md b/website/content/docs/v0.15/Guides/upgrading-kubernetes.md
index 0259ee7a2..a0f04683a 100644
--- a/website/content/docs/v0.15/Guides/upgrading-kubernetes.md
+++ b/website/content/docs/v0.15/Guides/upgrading-kubernetes.md
@@ -2,19 +2,28 @@ title: Upgrading Kubernetes
 ---

-This guide covers Kubernetes control plane upgrade for clusters running Talos-managed control plane.
-If the cluster is still running self-hosted control plane (after upgrade from Talos 0.8), please
-refer to 0.8 docs.
+This guide covers upgrading Kubernetes on Talos Linux clusters.
+For upgrading the Talos Linux operating system, see [Upgrading Talos](../upgrading-talos/).

 ## Video Walkthrough

-To see a live demo of this writeup, see the video below:
+To see a demo of this process, watch this video:

 ## Automated Kubernetes Upgrade

-To check what is going to be upgraded you can run `talosctl upgrade-k8s` with `--dry-run` flag:
+The recommended method to upgrade Kubernetes is to use the `talosctl upgrade-k8s` command.
+This will automatically update the components needed to upgrade Kubernetes safely.
+Upgrading Kubernetes is non-disruptive to the cluster workloads.
+
+To trigger a Kubernetes upgrade, issue a command specifying the version of Kubernetes to upgrade to, such as:
+
+`talosctl --nodes <node> upgrade-k8s --to 1.23.0`
+
+Note that the `--nodes` parameter specifies the control plane node to send the API call to, but all members of the cluster will be upgraded.
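+
+Before upgrading, one way to record the component versions currently running (the `VERSION` column reports each node's kubelet version) is:
+
+```bash
+kubectl get nodes -o wide
+```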
+
+To check what will be upgraded, you can run `talosctl upgrade-k8s` with the `--dry-run` flag:

 ```bash
 $ talosctl --nodes <node> upgrade-k8s --to 1.23.0 --dry-run
@@ -44,84 +53,15 @@ updating "kube-controller-manager" to version "1.23.0"
  > update kube-controller-manager: v1.22.4 -> 1.23.0
  > skipped in dry-run
  > "172.20.0.3": starting update
- > update kube-controller-manager: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.4": starting update
- > update kube-controller-manager: v1.22.4 -> 1.23.0
- > skipped in dry-run
-updating "kube-scheduler" to version "1.23.0"
- > "172.20.0.2": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.3": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.4": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > skipped in dry-run
-updating daemonset "kube-proxy" to version "1.23.0"
-skipped in dry-run
-updating kubelet to version "1.23.0"
- > "172.20.0.2": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.3": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.4": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.5": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.6": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > skipped in dry-run
+
+
+
 updating manifests
  > apply manifest Secret bootstrap-token-3lb63t
  > apply skipped in dry run
  > apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr
  > apply skipped in dry run
- > apply manifest ClusterRoleBinding system-bootstrap-node-bootstrapper
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding system-bootstrap-node-renewal
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding system:default-sa
- > apply skipped in dry run
- > apply manifest ClusterRole psp:privileged
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding psp:privileged
- > apply skipped in dry run
- > apply manifest PodSecurityPolicy privileged
- > apply skipped in dry run
- > apply manifest ClusterRole flannel
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding flannel
- > apply skipped in dry run
- > apply manifest ServiceAccount flannel
- > apply skipped in dry run
- > apply manifest ConfigMap kube-flannel-cfg
- > apply skipped in dry run
- > apply manifest DaemonSet kube-flannel
- > apply skipped in dry run
- > apply manifest ServiceAccount kube-proxy
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding kube-proxy
- > apply skipped in dry run
- > apply manifest ServiceAccount coredns
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding system:coredns
- > apply skipped in dry run
- > apply manifest ClusterRole system:coredns
- > apply skipped in dry run
- > apply manifest ConfigMap coredns
- > apply skipped in dry run
- > apply manifest Deployment coredns
- > apply skipped in dry run
- > apply manifest Service kube-dns
- > apply skipped in dry run
- > apply manifest ConfigMap kubeconfig-in-cluster
- > apply skipped in dry run
+
 ```

 To upgrade Kubernetes from v1.22.4 to v1.23.0 run:
@@ -140,148 +80,32 @@ updating "kube-apiserver" to version "1.23.0"
  < "172.20.0.2": successfully updated
  > "172.20.0.3": starting update
  > update kube-apiserver: v1.22.4 -> 1.23.0
- > "172.20.0.3": machine configuration patched
- > "172.20.0.3": waiting for API server state pod update
- < "172.20.0.3": successfully updated
- > "172.20.0.4": starting update
- > update kube-apiserver: v1.22.4 -> 1.23.0
- > "172.20.0.4": machine configuration patched
- > "172.20.0.4": waiting for API server state pod update
- < "172.20.0.4": successfully updated
-updating "kube-controller-manager" to version "1.23.0"
- > "172.20.0.2": starting update
- > update kube-controller-manager: v1.22.4 -> 1.23.0
- > "172.20.0.2": machine configuration patched
- > "172.20.0.2": waiting for API server state pod update
- < "172.20.0.2": successfully updated
- > "172.20.0.3": starting update
- > update kube-controller-manager: v1.22.4 -> 1.23.0
- > "172.20.0.3": machine configuration patched
- > "172.20.0.3": waiting for API server state pod update
- < "172.20.0.3": successfully updated
- > "172.20.0.4": starting update
- > update kube-controller-manager: v1.22.4 -> 1.23.0
- > "172.20.0.4": machine configuration patched
- > "172.20.0.4": waiting for API server state pod update
- < "172.20.0.4": successfully updated
-updating "kube-scheduler" to version "1.23.0"
- > "172.20.0.2": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > "172.20.0.2": machine configuration patched
- > "172.20.0.2": waiting for API server state pod update
- < "172.20.0.2": successfully updated
- > "172.20.0.3": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > "172.20.0.3": machine configuration patched
- > "172.20.0.3": waiting for API server state pod update
- < "172.20.0.3": successfully updated
- > "172.20.0.4": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > "172.20.0.4": machine configuration patched
- > "172.20.0.4": waiting for API server state pod update
- < "172.20.0.4": successfully updated
-updating daemonset "kube-proxy" to version "1.23.0"
-updating kubelet to version "1.23.0"
- > "172.20.0.2": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > "172.20.0.2": machine configuration patched
- > "172.20.0.2": waiting for kubelet restart
- > "172.20.0.2": waiting for node update
- < "172.20.0.2": successfully updated
- > "172.20.0.3": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > "172.20.0.3": machine configuration patched
- > "172.20.0.3": waiting for kubelet restart
- > "172.20.0.3": waiting for node update
- < "172.20.0.3": successfully updated
- > "172.20.0.4": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > "172.20.0.4": machine configuration patched
- > "172.20.0.4": waiting for kubelet restart
- > "172.20.0.4": waiting for node update
- < "172.20.0.4": successfully updated
- > "172.20.0.5": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > "172.20.0.5": machine configuration patched
- > "172.20.0.5": waiting for kubelet restart
- > "172.20.0.5": waiting for node update
- < "172.20.0.5": successfully updated
- > "172.20.0.6": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > "172.20.0.6": machine configuration patched
- > "172.20.0.6": waiting for kubelet restart
- > "172.20.0.6": waiting for node update
- < "172.20.0.6": successfully updated
-updating manifests
- > apply manifest Secret bootstrap-token-3lb63t
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding system-bootstrap-node-bootstrapper
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding system-bootstrap-node-renewal
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding system:default-sa
- > apply skipped: nothing to update
- > apply manifest ClusterRole psp:privileged
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding psp:privileged
- > apply skipped: nothing to update
- > apply manifest PodSecurityPolicy privileged
- > apply skipped: nothing to update
- > apply manifest ClusterRole flannel
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding flannel
- > apply skipped: nothing to update
- > apply manifest ServiceAccount flannel
- > apply skipped: nothing to update
- > apply manifest ConfigMap kube-flannel-cfg
- > apply skipped: nothing to update
- > apply manifest DaemonSet kube-flannel
- > apply skipped: nothing to update
- > apply manifest ServiceAccount kube-proxy
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding kube-proxy
- > apply skipped: nothing to update
- > apply manifest ServiceAccount coredns
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding system:coredns
- > apply skipped: nothing to update
- > apply manifest ClusterRole system:coredns
- > apply skipped: nothing to update
- > apply manifest ConfigMap coredns
- > apply skipped: nothing to update
- > apply manifest Deployment coredns
- > apply skipped: nothing to update
- > apply manifest Service kube-dns
- > apply skipped: nothing to update
- > apply manifest ConfigMap kubeconfig-in-cluster
- > apply skipped: nothing to update
+
 ```

-Script runs in several phases:
+This command runs in several phases:

-1. Every control plane node machine configuration is patched with new image version for each control plane component.
-   Talos renders new static pod definition on configuration update which is picked up by the kubelet.
-   Script waits for the change to propagate to the API server state.
-2. The script updates `kube-proxy` daemonset with the new image version.
-3. On every node in the cluster, `kubelet` version is updated.
-   The script waits for the `kubelet` service to be restarted, become healthy.
-   Update is verified with the `Node` resource state.
+1. Every control plane node machine configuration is patched with the new image version for each control plane component.
+   Talos renders new static pod definitions on the configuration update, which are picked up by the kubelet.
+   The command waits for the change to propagate to the API server state.
+2. The command updates the `kube-proxy` daemonset with the new image version.
+3. On every node in the cluster, the `kubelet` version is updated.
+   The command then waits for the `kubelet` service to be restarted and become healthy.
+   The update is verified by checking the `Node` resource state.
 4. Kubernetes bootstrap manifests are re-applied to the cluster.
-   The script never deletes any resources from the cluster, they should be deleted manually.
-   Updated bootstrap manifests might come with new Talos version (e.g. CoreDNS version update), or might be result of machine configuration change.
+   Updated bootstrap manifests might come with a new Talos version (e.g. CoreDNS version update), or might be the result of a machine configuration change.
+   Note: the `upgrade-k8s` command never deletes any resources from the cluster; they should be deleted manually.

-If the script fails for any reason, it can be safely restarted to continue upgrade process from the moment of the failure.
+If the command fails for any reason, it can be safely restarted to continue the upgrade process from the moment of the failure.
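+
+After the upgrade completes, one quick way to verify the result is to check that every node reports the new kubelet version and that the control plane pods are running:
+
+```bash
+kubectl get nodes
+kubectl get pods -n kube-system
+```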
< "172.20.0.2": successfully updated - > "172.20.0.3": starting update - > update kubelet: v1.22.4 -> 1.23.0 - > "172.20.0.3": machine configuration patched - > "172.20.0.3": waiting for kubelet restart - > "172.20.0.3": waiting for node update - < "172.20.0.3": successfully updated - > "172.20.0.4": starting update - > update kubelet: v1.22.4 -> 1.23.0 - > "172.20.0.4": machine configuration patched - > "172.20.0.4": waiting for kubelet restart - > "172.20.0.4": waiting for node update - < "172.20.0.4": successfully updated - > "172.20.0.5": starting update - > update kubelet: v1.22.4 -> 1.23.0 - > "172.20.0.5": machine configuration patched - > "172.20.0.5": waiting for kubelet restart - > "172.20.0.5": waiting for node update - < "172.20.0.5": successfully updated - > "172.20.0.6": starting update - > update kubelet: v1.22.4 -> 1.23.0 - > "172.20.0.6": machine configuration patched - > "172.20.0.6": waiting for kubelet restart - > "172.20.0.6": waiting for node update - < "172.20.0.6": successfully updated -updating manifests - > apply manifest Secret bootstrap-token-3lb63t - > apply skipped: nothing to update - > apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr - > apply skipped: nothing to update - > apply manifest ClusterRoleBinding system-bootstrap-node-bootstrapper - > apply skipped: nothing to update - > apply manifest ClusterRoleBinding system-bootstrap-node-renewal - > apply skipped: nothing to update - > apply manifest ClusterRoleBinding system:default-sa - > apply skipped: nothing to update - > apply manifest ClusterRole psp:privileged - > apply skipped: nothing to update - > apply manifest ClusterRoleBinding psp:privileged - > apply skipped: nothing to update - > apply manifest PodSecurityPolicy privileged - > apply skipped: nothing to update - > apply manifest ClusterRole flannel - > apply skipped: nothing to update - > apply manifest ClusterRoleBinding flannel - > apply skipped: nothing to update - > apply manifest ServiceAccount flannel - > apply skipped: nothing to update - > apply manifest ConfigMap kube-flannel-cfg - > apply skipped: nothing to update - > apply manifest DaemonSet kube-flannel - > apply skipped: nothing to update - > apply manifest ServiceAccount kube-proxy - > apply skipped: nothing to update - > apply manifest ClusterRoleBinding kube-proxy - > apply skipped: nothing to update - > apply manifest ServiceAccount coredns - > apply skipped: nothing to update - > apply manifest ClusterRoleBinding system:coredns - > apply skipped: nothing to update - > apply manifest ClusterRole system:coredns - > apply skipped: nothing to update - > apply manifest ConfigMap coredns - > apply skipped: nothing to update - > apply manifest Deployment coredns - > apply skipped: nothing to update - > apply manifest Service kube-dns - > apply skipped: nothing to update - > apply manifest ConfigMap kubeconfig-in-cluster - > apply skipped: nothing to update + ``` -Script runs in several phases: +This command runs in several phases: -1. Every control plane node machine configuration is patched with new image version for each control plane component. - Talos renders new static pod definition on configuration update which is picked up by the kubelet. - Script waits for the change to propagate to the API server state. -2. The script updates `kube-proxy` daemonset with the new image version. -3. On every node in the cluster, `kubelet` version is updated. - The script waits for the `kubelet` service to be restarted, become healthy. 

-Capture new version of `kube-apiserver` config with:
+Capture the new version of the `kube-apiserver` config with:

 ```bash
 $ talosctl -n <node> get kcpc kube-apiserver -o yaml
@@ -324,7 +148,7 @@ spec:
   extraVolumes: []
 ```

-In this example, new version is `5`.
+In this example, the new version is `5`.
 Wait for the new pod definition to propagate to the API server state (replace `talos-default-master-1` with the node name):

 ```bash
@@ -351,7 +175,7 @@ $ talosctl -n <node> patch mc --mode=no-reboot -p '[{"op": "replac
 patched mc at the node 172.20.0.2
 ```

-JSON patch might need be adjusted if current machine configuration is missing `.cluster.controllerManager.image` key.
+The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.controllerManager.image` key.

 Capture new version of `kube-controller-manager` config with:

@@ -389,7 +213,7 @@ NAME                                             READY   STATUS    RESTARTS   AG
 kube-controller-manager-talos-default-master-1   1/1     Running   0          35m
 ```

-Repeat this process for every control plane node, verifying that state got propagated successfully between each node update.
+Repeat this process for every control plane node, verifying that state propagated successfully between each node update.

 ### Scheduler

diff --git a/website/content/docs/v0.15/Guides/upgrading-talos.md b/website/content/docs/v0.15/Guides/upgrading-talos.md
index 2eae86a46..01e4d1f42 100644
--- a/website/content/docs/v0.15/Guides/upgrading-talos.md
+++ b/website/content/docs/v0.15/Guides/upgrading-talos.md
@@ -1,14 +1,28 @@
 ---
-title: Upgrading Talos
+title: Upgrading Talos Linux
 ---

-Talos upgrades are effected by an API call.
-The `talosctl` CLI utility will facilitate this.
-
+OS upgrades, like other operations on Talos Linux, are effected by an API call, which can be sent via the `talosctl` CLI utility.
+Because Talos Linux is image-based, an upgrade is almost the same as installing Talos, with the difference that the system has already been initialized with a configuration.
+
+The upgrade API call passes the node the installer image to use to perform the upgrade.
+Each Talos version has a corresponding installer.
+
+Upgrades use an A-B image scheme in order to facilitate rollbacks.
+This scheme retains the previous Talos kernel and OS image following each upgrade.
+If an upgrade fails to boot, Talos will roll back to the previous version.
+Likewise, Talos may be manually rolled back via API (or `talosctl rollback`).
+This will simply update the boot reference and reboot.
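+
+For example, to manually roll a node (assumed here to be at `10.20.30.40`) back to the previously installed version:
+
+```bash
+talosctl rollback --nodes 10.20.30.40
+```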
+
+Unless explicitly told to `preserve` data, an upgrade will cause the node to wipe the ephemeral partition, remove itself from the etcd cluster (if it is a control plane node), and generally make itself as pristine as is possible.
+(This is generally the desired behavior, except in specialized use cases such as single-node clusters.)
+
+*Note* that unless the Kubernetes version has been specified in the machine config, an upgrade of the Talos Linux OS will also apply an upgrade of the Kubernetes version.
+Each release of Talos Linux includes the latest stable Kubernetes version by default.

 ## Video Walkthrough

-To see a live demo of this writeup, see the video below:
+To see a live demo of an upgrade of Talos Linux, see the video below:
@@ -16,10 +30,10 @@ To see a live demo of this writeup, see the video below:

 TBD

-## `talosctl` Upgrade
+## `talosctl upgrade`

-To manually upgrade a Talos node, you will specify the node's IP address and the
-installer container image for the version of Talos to which you wish to upgrade.
+To upgrade a Talos node, specify the node's IP address and the
+installer container image for the version of Talos to upgrade to.

 For instance, if your Talos node has the IP address `10.20.30.40` and you want
 to install the official version `v0.15.0`, you would enter a command such
@@ -30,12 +44,18 @@ as:
       --image ghcr.io/talos-systems/installer:v0.15.0
 ```

-There is an option to this command: `--preserve`, which can be used to explicitly tell Talos to either keep intact its ephemeral data or not.
-In most cases, it is correct to just let Talos perform its default action.
+There is an option to this command: `--preserve`, which will explicitly tell Talos to keep ephemeral data intact.
+In most cases, it is correct to let Talos perform its default action of erasing the ephemeral data.
 However, if you are running a single-node control-plane, you will want to make sure that `--preserve=true`.
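+
+For example, an upgrade of a single-node cluster that preserves the ephemeral data might look like:
+
+```bash
+talosctl upgrade --nodes 10.20.30.40 \
+      --image ghcr.io/talos-systems/installer:v0.15.0 --preserve=true
+```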

-If Talos fails to run the upgrade, the `--stage` flag may be used to perform the upgrade after a reboot
-which is followed by another reboot to upgraded version.
+Rarely, an upgrade command will fail to run due to a process holding a file open on disk, or you may wish to set a node to upgrade, but delay the actual reboot as long as possible.
+In these cases, you can use the `--stage` flag.
+This puts the upgrade artifacts on disk, and adds some metadata to a disk partition that gets checked very early in the boot process.
+The node is *not* rebooted by the `upgrade --stage` process.
+However, whenever the system does next reboot, Talos sees that it needs to apply an upgrade, and will do so immediately.
+Because this occurs in a just-rebooted system, there will be no conflict with any files being held open.
+After the upgrade is applied, the node will reboot again, in order to boot into the new version.
+Note that because Talos Linux now reboots via the kexec syscall, the extra reboot adds very little time.
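+
+For example, to stage an upgrade that will be applied on the next reboot:
+
+```bash
+talosctl upgrade --nodes 10.20.30.40 \
+      --image ghcr.io/talos-systems/installer:v0.15.0 --stage
+```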