diff --git a/website/content/docs/v0.15/Guides/upgrading-kubernetes.md b/website/content/docs/v0.15/Guides/upgrading-kubernetes.md
index 0259ee7a2..a0f04683a 100644
--- a/website/content/docs/v0.15/Guides/upgrading-kubernetes.md
+++ b/website/content/docs/v0.15/Guides/upgrading-kubernetes.md
@@ -2,19 +2,28 @@
title: Upgrading Kubernetes
---
-This guide covers Kubernetes control plane upgrade for clusters running Talos-managed control plane.
-If the cluster is still running self-hosted control plane (after upgrade from Talos 0.8), please
-refer to 0.8 docs.
+This guide covers upgrading Kubernetes on Talos Linux clusters.
+For upgrading the Talos Linux operating system, see [Upgrading Talos](../upgrading-talos/).
## Video Walkthrough
-To see a live demo of this writeup, see the video below:
+To see a demo of this process, watch this video:
## Automated Kubernetes Upgrade
-To check what is going to be upgraded you can run `talosctl upgrade-k8s` with `--dry-run` flag:
+The recommended method to upgrade Kubernetes is to use the `talosctl upgrade-k8s` command.
+This will automatically update the components needed to upgrade Kubernetes safely.
+Upgrading Kubernetes is non-disruptive to the cluster workloads.
+
+To trigger a Kubernetes upgrade, issue a command specifying the version of Kubernetes to upgrade to, such as:
+
+```bash
+talosctl --nodes <master node> upgrade-k8s --to 1.23.0
+```
+
+Note that the `--nodes` parameter specifies the control plane node to send the API call to, but all members of the cluster will be upgraded.
+
+To check what will be upgraded you can run `talosctl upgrade-k8s` with the `--dry-run` flag:
```bash
$ talosctl --nodes <master node> upgrade-k8s --to 1.23.0 --dry-run
@@ -44,84 +53,15 @@ updating "kube-controller-manager" to version "1.23.0"
> update kube-controller-manager: v1.22.4 -> 1.23.0
> skipped in dry-run
> "172.20.0.3": starting update
- > update kube-controller-manager: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.4": starting update
- > update kube-controller-manager: v1.22.4 -> 1.23.0
- > skipped in dry-run
-updating "kube-scheduler" to version "1.23.0"
- > "172.20.0.2": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.3": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.4": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > skipped in dry-run
-updating daemonset "kube-proxy" to version "1.23.0"
-skipped in dry-run
-updating kubelet to version "1.23.0"
- > "172.20.0.2": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.3": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.4": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.5": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > skipped in dry-run
- > "172.20.0.6": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > skipped in dry-run
+
+<snip>
+
updating manifests
> apply manifest Secret bootstrap-token-3lb63t
> apply skipped in dry run
> apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr
> apply skipped in dry run
- > apply manifest ClusterRoleBinding system-bootstrap-node-bootstrapper
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding system-bootstrap-node-renewal
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding system:default-sa
- > apply skipped in dry run
- > apply manifest ClusterRole psp:privileged
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding psp:privileged
- > apply skipped in dry run
- > apply manifest PodSecurityPolicy privileged
- > apply skipped in dry run
- > apply manifest ClusterRole flannel
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding flannel
- > apply skipped in dry run
- > apply manifest ServiceAccount flannel
- > apply skipped in dry run
- > apply manifest ConfigMap kube-flannel-cfg
- > apply skipped in dry run
- > apply manifest DaemonSet kube-flannel
- > apply skipped in dry run
- > apply manifest ServiceAccount kube-proxy
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding kube-proxy
- > apply skipped in dry run
- > apply manifest ServiceAccount coredns
- > apply skipped in dry run
- > apply manifest ClusterRoleBinding system:coredns
- > apply skipped in dry run
- > apply manifest ClusterRole system:coredns
- > apply skipped in dry run
- > apply manifest ConfigMap coredns
- > apply skipped in dry run
- > apply manifest Deployment coredns
- > apply skipped in dry run
- > apply manifest Service kube-dns
- > apply skipped in dry run
- > apply manifest ConfigMap kubeconfig-in-cluster
- > apply skipped in dry run
+<snip>
```
To upgrade Kubernetes from v1.22.4 to v1.23.0 run:
@@ -140,148 +80,32 @@ updating "kube-apiserver" to version "1.23.0"
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> update kube-apiserver: v1.22.4 -> 1.23.0
- > "172.20.0.3": machine configuration patched
- > "172.20.0.3": waiting for API server state pod update
- < "172.20.0.3": successfully updated
- > "172.20.0.4": starting update
- > update kube-apiserver: v1.22.4 -> 1.23.0
- > "172.20.0.4": machine configuration patched
- > "172.20.0.4": waiting for API server state pod update
- < "172.20.0.4": successfully updated
-updating "kube-controller-manager" to version "1.23.0"
- > "172.20.0.2": starting update
- > update kube-controller-manager: v1.22.4 -> 1.23.0
- > "172.20.0.2": machine configuration patched
- > "172.20.0.2": waiting for API server state pod update
- < "172.20.0.2": successfully updated
- > "172.20.0.3": starting update
- > update kube-controller-manager: v1.22.4 -> 1.23.0
- > "172.20.0.3": machine configuration patched
- > "172.20.0.3": waiting for API server state pod update
- < "172.20.0.3": successfully updated
- > "172.20.0.4": starting update
- > update kube-controller-manager: v1.22.4 -> 1.23.0
- > "172.20.0.4": machine configuration patched
- > "172.20.0.4": waiting for API server state pod update
- < "172.20.0.4": successfully updated
-updating "kube-scheduler" to version "1.23.0"
- > "172.20.0.2": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > "172.20.0.2": machine configuration patched
- > "172.20.0.2": waiting for API server state pod update
- < "172.20.0.2": successfully updated
- > "172.20.0.3": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > "172.20.0.3": machine configuration patched
- > "172.20.0.3": waiting for API server state pod update
- < "172.20.0.3": successfully updated
- > "172.20.0.4": starting update
- > update kube-scheduler: v1.22.4 -> 1.23.0
- > "172.20.0.4": machine configuration patched
- > "172.20.0.4": waiting for API server state pod update
- < "172.20.0.4": successfully updated
-updating daemonset "kube-proxy" to version "1.23.0"
-updating kubelet to version "1.23.0"
- > "172.20.0.2": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > "172.20.0.2": machine configuration patched
- > "172.20.0.2": waiting for kubelet restart
- > "172.20.0.2": waiting for node update
- < "172.20.0.2": successfully updated
- > "172.20.0.3": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > "172.20.0.3": machine configuration patched
- > "172.20.0.3": waiting for kubelet restart
- > "172.20.0.3": waiting for node update
- < "172.20.0.3": successfully updated
- > "172.20.0.4": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > "172.20.0.4": machine configuration patched
- > "172.20.0.4": waiting for kubelet restart
- > "172.20.0.4": waiting for node update
- < "172.20.0.4": successfully updated
- > "172.20.0.5": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > "172.20.0.5": machine configuration patched
- > "172.20.0.5": waiting for kubelet restart
- > "172.20.0.5": waiting for node update
- < "172.20.0.5": successfully updated
- > "172.20.0.6": starting update
- > update kubelet: v1.22.4 -> 1.23.0
- > "172.20.0.6": machine configuration patched
- > "172.20.0.6": waiting for kubelet restart
- > "172.20.0.6": waiting for node update
- < "172.20.0.6": successfully updated
-updating manifests
- > apply manifest Secret bootstrap-token-3lb63t
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding system-bootstrap-node-bootstrapper
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding system-bootstrap-node-renewal
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding system:default-sa
- > apply skipped: nothing to update
- > apply manifest ClusterRole psp:privileged
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding psp:privileged
- > apply skipped: nothing to update
- > apply manifest PodSecurityPolicy privileged
- > apply skipped: nothing to update
- > apply manifest ClusterRole flannel
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding flannel
- > apply skipped: nothing to update
- > apply manifest ServiceAccount flannel
- > apply skipped: nothing to update
- > apply manifest ConfigMap kube-flannel-cfg
- > apply skipped: nothing to update
- > apply manifest DaemonSet kube-flannel
- > apply skipped: nothing to update
- > apply manifest ServiceAccount kube-proxy
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding kube-proxy
- > apply skipped: nothing to update
- > apply manifest ServiceAccount coredns
- > apply skipped: nothing to update
- > apply manifest ClusterRoleBinding system:coredns
- > apply skipped: nothing to update
- > apply manifest ClusterRole system:coredns
- > apply skipped: nothing to update
- > apply manifest ConfigMap coredns
- > apply skipped: nothing to update
- > apply manifest Deployment coredns
- > apply skipped: nothing to update
- > apply manifest Service kube-dns
- > apply skipped: nothing to update
- > apply manifest ConfigMap kubeconfig-in-cluster
- > apply skipped: nothing to update
+<snip>
```
-Script runs in several phases:
+This command runs in several phases:
-1. Every control plane node machine configuration is patched with new image version for each control plane component.
- Talos renders new static pod definition on configuration update which is picked up by the kubelet.
- Script waits for the change to propagate to the API server state.
-2. The script updates `kube-proxy` daemonset with the new image version.
-3. On every node in the cluster, `kubelet` version is updated.
- The script waits for the `kubelet` service to be restarted, become healthy.
- Update is verified with the `Node` resource state.
+1. Every control plane node machine configuration is patched with the new image version for each control plane component.
+   Talos renders new static pod definitions on configuration update, which are picked up by the kubelet.
+ The command waits for the change to propagate to the API server state.
+2. The command updates the `kube-proxy` daemonset with the new image version.
+3. On every node in the cluster, the `kubelet` version is updated.
+ The command then waits for the `kubelet` service to be restarted and become healthy.
+ The update is verified by checking the `Node` resource state.
4. Kubernetes bootstrap manifests are re-applied to the cluster.
- The script never deletes any resources from the cluster, they should be deleted manually.
- Updated bootstrap manifests might come with new Talos version (e.g. CoreDNS version update), or might be result of machine configuration change.
+   Updated bootstrap manifests might come with a new Talos version (e.g. a CoreDNS version update), or might be the result of a machine configuration change.
+   Note: The `upgrade-k8s` command never deletes any resources from the cluster; they should be deleted manually.
-If the script fails for any reason, it can be safely restarted to continue upgrade process from the moment of the failure.
+If the command fails for any reason, it can be safely restarted to continue the upgrade process from the moment of the failure.
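+
+After the upgrade completes, one quick sanity check is to confirm that every node reports the new kubelet version (a minimal example; the node names in the output will differ):
+
+```bash
+# Each node's VERSION column should show the upgraded kubelet version, e.g. v1.23.0.
+kubectl get nodes -o wide
+```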
## Manual Kubernetes Upgrade
-Kubernetes can be upgraded manually as well by following the steps outlined below.
+Kubernetes can be upgraded manually by following the steps outlined below.
They are equivalent to the steps performed by the `talosctl upgrade-k8s` command.
### Kubeconfig
-In order to edit the control plane, we will need a working `kubectl` config.
+In order to edit the control plane, you need a working `kubectl` config.
If you don't already have one, you can get one by running:
```bash
@@ -297,11 +121,11 @@ $ talosctl -n <node ip> patch mc --mode=no-reboot -p '[{"op": "replac
patched mc at the node 172.20.0.2
```
-JSON patch might need to be adjusted if current machine configuration is missing `.cluster.apiServer.image` key.
+The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.apiServer.image` key.
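+
+If the key is absent, a `replace` operation will fail, since JSON Patch requires the target of `replace` to exist; a sketch of the equivalent patch using `add` instead (node IP placeholder as above):
+
+```bash
+talosctl -n <node ip> patch mc --mode=no-reboot \
+  -p '[{"op": "add", "path": "/cluster/apiServer/image", "value": "k8s.gcr.io/kube-apiserver:v1.23.0"}]'
+```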
-Also machine configuration can be edited manually with `talosctl -n edit mc --mode=no-reboot`.
+The machine configuration can also be edited manually with `talosctl -n <node ip> edit mc --mode=no-reboot`.
-Capture new version of `kube-apiserver` config with:
+Capture the new version of the `kube-apiserver` config with:
```bash
$ talosctl -n <node ip> get kcpc kube-apiserver -o yaml
@@ -324,7 +148,7 @@ spec:
extraVolumes: []
```
-In this example, new version is `5`.
+In this example, the new version is `5`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-master-1` with the node name):
```bash
@@ -351,7 +175,7 @@ $ talosctl -n <node ip> patch mc --mode=no-reboot -p '[{"op": "replac
patched mc at the node 172.20.0.2
```
-JSON patch might need be adjusted if current machine configuration is missing `.cluster.controllerManager.image` key.
+The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.controllerManager.image` key.
Capture new version of `kube-controller-manager` config with:
@@ -389,7 +213,7 @@ NAME READY STATUS RESTARTS AG
kube-controller-manager-talos-default-master-1 1/1 Running 0 35m
```
-Repeat this process for every control plane node, verifying that state got propagated successfully between each node update.
+Repeat this process for every control plane node, verifying that the state propagated successfully between each node update.
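+
+One way to verify, assuming `kubectl` access (a minimal sketch):
+
+```bash
+# Confirm the controller manager pod on each control plane node is Running;
+# a recent AGE indicates the new static pod definition was picked up.
+kubectl get pods -n kube-system -o wide | grep kube-controller-manager
+```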
### Scheduler
diff --git a/website/content/docs/v0.15/Guides/upgrading-talos.md b/website/content/docs/v0.15/Guides/upgrading-talos.md
index 2eae86a46..01e4d1f42 100644
--- a/website/content/docs/v0.15/Guides/upgrading-talos.md
+++ b/website/content/docs/v0.15/Guides/upgrading-talos.md
@@ -1,14 +1,28 @@
---
-title: Upgrading Talos
+title: Upgrading Talos Linux
---
-Talos upgrades are effected by an API call.
-The `talosctl` CLI utility will facilitate this.
-
+OS upgrades, like other operations on Talos Linux, are effected by an API call, which can be sent via the `talosctl` CLI utility.
+Because Talos Linux is image based, an upgrade is almost the same as installing Talos, with the difference that the system has already been initialized with a configuration.
+
+The upgrade API call supplies the node with the installer image to use for the upgrade.
+Each Talos version has a corresponding installer.
+
+Upgrades use an A-B image scheme in order to facilitate rollbacks.
+This scheme retains the previous Talos kernel and OS image following each upgrade.
+If an upgrade fails to boot, Talos will roll back to the previous version.
+Likewise, Talos may be manually rolled back via API (or `talosctl rollback`).
+This will simply update the boot reference and reboot.
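+
+For example, to manually roll a node back to the previously installed version (the IP is illustrative):
+
+```bash
+talosctl rollback --nodes 10.20.30.40
+```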
+
+Unless explicitly told to `preserve` data, an upgrade will cause the node to wipe the ephemeral partition, remove itself from the etcd cluster (if it is a control plane node), and generally make itself as pristine as is possible.
+(This is generally the desired behavior, except in specialized use cases such as single-node clusters.)
+
+*Note* that unless the Kubernetes version has been specified in the machine config, an upgrade of the Talos Linux OS will also apply an upgrade of the Kubernetes version.
+Each release of Talos Linux includes the latest stable Kubernetes version by default.
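+
+For example, pinning the kubelet image is one way to hold the Kubernetes node version in the machine config (a sketch; the tag and node IP are illustrative, and the control plane component images can be pinned similarly):
+
+```bash
+talosctl -n 10.20.30.40 patch mc --mode=no-reboot \
+  -p '[{"op": "add", "path": "/machine/kubelet/image", "value": "ghcr.io/talos-systems/kubelet:v1.22.4"}]'
+```
+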
## Video Walkthrough
-To see a live demo of this writeup, see the video below:
+To see a live demo of an upgrade of Talos Linux, see the video below:
@@ -16,10 +30,10 @@ To see a live demo of this writeup, see the video below:
TBD
-## `talosctl` Upgrade
+## `talosctl upgrade`
-To manually upgrade a Talos node, you will specify the node's IP address and the
-installer container image for the version of Talos to which you wish to upgrade.
+To upgrade a Talos node, specify the node's IP address and the
+installer container image for the version of Talos to upgrade to.
For instance, if your Talos node has the IP address `10.20.30.40` and you want
to install the official version `v0.15.0`, you would enter a command such
@@ -30,12 +44,18 @@ as:
--image ghcr.io/talos-systems/installer:v0.15.0
```
-There is an option to this command: `--preserve`, which can be used to explicitly tell Talos to either keep intact its ephemeral data or not.
-In most cases, it is correct to just let Talos perform its default action.
+There is an option to this command: `--preserve`, which will explicitly tell Talos to keep ephemeral data intact.
+In most cases, it is correct to let Talos perform its default action of erasing the ephemeral data.
However, if you are running a single-node control-plane, you will want to make sure that `--preserve=true`.
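+
+For example (a sketch; the IP and version are as in the example above):
+
+```bash
+talosctl upgrade --nodes 10.20.30.40 \
+  --image ghcr.io/talos-systems/installer:v0.15.0 --preserve=true
+```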
-If Talos fails to run the upgrade, the `--stage` flag may be used to perform the upgrade after a reboot
-which is followed by another reboot to upgraded version.
+Rarely, an upgrade command will fail due to a process holding a file open on disk, or you may wish to set a node to upgrade but delay the actual reboot as long as possible.
+In these cases, you can use the `--stage` flag.
+This puts the upgrade artifacts on disk, and adds some metadata to a disk partition that gets checked very early in the boot process.
+The node is *not* rebooted by the `upgrade --stage` process.
+However, whenever the system does next reboot, Talos sees that it needs to apply an upgrade, and will do so immediately.
+Because this occurs in a freshly rebooted system, there will be no conflict with any files being held open.
+After the upgrade is applied, the node will reboot again, in order to boot into the new version.
+Note that because Talos Linux now reboots via the kexec syscall, the extra reboot adds very little time.
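+
+For example, to stage an upgrade without rebooting immediately (a sketch; the IP and version are illustrative):
+
+```bash
+talosctl upgrade --nodes 10.20.30.40 \
+  --image ghcr.io/talos-systems/installer:v0.15.0 --stage
+```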