docs: consolidate the control-plane documentation
Also fix some typos.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>

commit 148c75cfb9 (parent 353154281a)
@@ -5,14 +5,12 @@ aliases:
- ../guides/disaster-recovery
---

`etcd` database backs Kubernetes control plane state, so if the `etcd` service is unavailable
Kubernetes control plane goes down, and the cluster is not recoverable until `etcd` is recovered with contents.
The `etcd` consistency model builds around the consensus protocol Raft, so for highly-available control plane clusters,
loss of one control plane node doesn't impact cluster health.
In general, `etcd` stays up as long as a sufficient number of nodes to maintain quorum are up.
`etcd` database backs Kubernetes control plane state, so if the `etcd` service is unavailable,
the Kubernetes control plane goes down, and the cluster is not recoverable until `etcd` is recovered.
`etcd` builds around the consensus protocol Raft, so highly-available control plane clusters can tolerate the loss of nodes so long as more than half of the members are running and reachable.
For a three control plane node Talos cluster, this means that the cluster tolerates a failure of any single node,
but losing more than one node at the same time leads to complete loss of service.
Because of that, it is important to take routine backups of `etcd` state to have a snapshot to recover cluster from
Because of that, it is important to take routine backups of `etcd` state to have a snapshot to recover the cluster from
in case of catastrophic failure.
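
As a minimal sketch of such a routine backup (the node address and snapshot file name are illustrative placeholders):

```bash
# Ask etcd on a control plane node for a consistent snapshot of its database.
# 172.20.0.2 and db.snapshot are example values; substitute your own.
talosctl -n 172.20.0.2 etcd snapshot db.snapshot
```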

## Backup

@@ -35,7 +33,7 @@ It is recommended to configure `etcd` snapshots to be created on some schedule t

### Disaster Database Snapshot

If `etcd` cluster is not healthy, the `talosctl etcd snapshot` command might fail.
If the `etcd` cluster is not healthy (for example, if quorum has already been lost), the `talosctl etcd snapshot` command might fail.
In that case, copy the database snapshot directly from the control plane node:

```bash
@@ -81,17 +79,18 @@ NODE NAMESPACE TYPE ID VERSION TYPE
172.20.0.3 config MachineType machine-type 2 controlplane
```

Nodes with `init` type are incompatible with `etcd` recovery procedure.
Init node type is deprecated, and is incompatible with the `etcd` recovery procedure.
`init` node can be converted to `controlplane` type with `talosctl edit mc --mode=staged` command followed
by node reboot with `talosctl reboot` command.
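
A minimal sketch of that conversion, assuming `172.20.0.2` is the address of the `init` node:

```bash
# Edit the machine config and change machine.type from init to controlplane;
# --mode=staged stores the change so it is applied on the next boot.
talosctl -n 172.20.0.2 edit mc --mode=staged

# Reboot the node so the staged configuration takes effect.
talosctl -n 172.20.0.2 reboot
```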

### Preparing Control Plane Nodes

If some control plane nodes experienced hardware failure, replace them with new nodes.

Use machine configuration backup to re-create the nodes with the same secret material and control plane settings
to allow workers to join the recovered control plane.

If a control plane node is healthy but `etcd` isn't, wipe the node's [EPHEMERAL]({{< relref "../learn-more/architecture/#file-system-partitions" >}}) partition to remove the `etcd`
If a control plane node is up but `etcd` isn't, wipe the node's [EPHEMERAL]({{< relref "../learn-more/architecture/#file-system-partitions" >}}) partition to remove the `etcd`
data directory (make sure a database snapshot is taken before doing this):

```bash
@@ -100,8 +99,8 @@ talosctl -n <IP> reset --graceful=false --reboot --system-labels-to-wipe=EPHEMER

At this point, all control plane nodes should boot up, and `etcd` service should be in the `Preparing` state.

Kubernetes control plane endpoint should be pointed to the new control plane nodes if there were
any changes to the node addresses.
The Kubernetes control plane endpoint should be pointed to the new control plane nodes if there were
changes to the node addresses.

### Recovering from the Backup

@@ -7,72 +7,30 @@ aliases:

<!-- markdownlint-disable MD026 -->

This guide is written as a series of topics and detailed answers for each topic.
It starts with the basics of the control plane and goes into Talos specifics.

In this guide we assume that the Talos client config and Talos API access are available.
Kubernetes client configuration can be pulled from control plane nodes with `talosctl -n <IP> kubeconfig`
(this command works before Kubernetes is fully booted).
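
For example (the address below is a placeholder for one of your control plane nodes):

```bash
# Pull the Kubernetes client configuration; this works before Kubernetes is fully up.
talosctl -n 172.20.0.2 kubeconfig
```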

### What is a control plane node?

A control plane node is a node which:

- runs etcd, the Kubernetes database
- runs the Kubernetes control plane
  - kube-apiserver
  - kube-controller-manager
  - kube-scheduler
- serves as an administrative proxy to the worker nodes

These nodes are critical to the operation of your cluster.
Without control plane nodes, Kubernetes will not respond to changes in the
system, and certain central services may not be available.

Talos nodes which have `.machine.type` of `controlplane` are control plane nodes.

Control plane nodes are tainted by default to prevent workloads from being scheduled to control plane nodes.

### How many control plane nodes should be deployed?

Because control plane nodes are so important, it is important that they be
deployed with redundancy to ensure consistent, reliable operation of the cluster
during upgrades, reboots, hardware failures, and other such events.
This is also known as high-availability or just HA.
Non-HA clusters are sometimes used as test clusters, CI clusters, or in specific scenarios
which warrant the loss of redundancy, but they should almost never be used in production.

Maintaining the proper count of control plane nodes is also critical.
The etcd database operates on the principles of membership and quorum, so
membership should always be an odd number, and there is exponentially-increasing
overhead for each additional member.
Therefore, the number of control plane nodes should almost always be 3.
In some particularly large or distributed clusters, the count may be 5, but this
is very rare.

See [this document]({{< relref "../learn-more/concepts#control-planes-are-not-linear-replicas" >}}) on the topic for more information.

### What is the control plane endpoint?

The Kubernetes control plane endpoint is the single canonical URL by which the
Kubernetes API is accessed.
Especially with high-availability (HA) control planes, it is common that this endpoint may not point to the Kubernetes API server
directly, but may be instead point to a load balancer or a DNS name which may
Especially with high-availability (HA) control planes, this endpoint may point to a load balancer or a DNS name which may
have multiple `A` and `AAAA` records.

Like Talos' own API, the Kubernetes API is constructed with mutual TLS, client
Like Talos' own API, the Kubernetes API uses mutual TLS, client
certs, and a common Certificate Authority (CA).
Unlike general-purpose websites, there is no need for an upstream CA, so tools
such as cert-manager, services such as Let's Encrypt, or purchased products such
such as cert-manager, Let's Encrypt, or products such
as validated TLS certificates are not required.
Encryption, however, _is_, and hence the URL scheme will always be `https://`.

By default, the Kubernetes API server in Talos runs on port 6443.
As such, the control plane endpoint URLs for Talos will almost always be of the form
`https://endpoint:6443`, noting that the port, since it is not the `https`
default of `443` is _required_.
`https://endpoint:6443`.
(The port, since it is not the `https` default of `443`, is required.)
The `endpoint` above may be a DNS name or IP address, but it should be
ultimately be directed to the _set_ of all controlplane nodes, as opposed to a
directed to the _set_ of all controlplane nodes, as opposed to a
single one.
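
For illustration, this endpoint is what gets baked into the machine configuration when it is generated; the cluster name and DNS name below are placeholders:

```bash
# Generate machine configs that use the canonical control plane endpoint.
talosctl gen config my-cluster https://kube.example.com:6443
```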

As mentioned above, this can be achieved by a number of strategies, including:

@@ -82,10 +40,9 @@ As mentioned above, this can be achieved by a number of strategies, including:

- Talos-builtin shared IP ([VIP]({{< relref "../talos-guides/network/vip" >}}))
- BGP peering of a shared IP (such as with [kube-vip](https://kube-vip.io))

Using a DNS name here is usually a good idea, it being the most flexible
option, since it allows the combination with any _other_ option, while offering
Using a DNS name here is a good idea, since it allows any other option, while offering
a layer of abstraction.
It allows the underlying IP addresses to change over time without impacting the
It allows the underlying IP addresses to change without impacting the
canonical URL.

Unlike most services in Kubernetes, the API server runs with host networking,

@@ -94,11 +51,10 @@ This means you can use the IP address(es) of the host to refer to the Kubernetes
API server.

For availability of the API, it is important that any load balancer be aware of
the health of the backend API servers.
This makes a load balancer-based system valuable to minimize disruptions during
common node lifecycle operations like reboots and upgrades.
the health of the backend API servers, to minimize disruptions during
common node operations like reboots and upgrades.

It is critical that control plane endpoint works correctly during cluster bootstrap phase, as nodes discover
It is critical that the control plane endpoint works correctly during the cluster bootstrap phase, as nodes discover
each other using the control plane endpoint.

### kubelet is not running on control plane node

@@ -329,7 +329,7 @@ Apply the manifests:
kubectl apply -f manifests.yaml
```

> Note: if some boostrap resources were removed, they have to be removed from the cluster manually.
> Note: if some bootstrap resources were removed, they have to be removed from the cluster manually.

### kubelet
@@ -56,88 +56,3 @@ let a replacement grow.

Rebuilds of Talos are remarkably fast, whether they be new machines, upgrades,
or reinstalls.
Never get hung up on an individual machine.

## Control Planes are not linear replicas

People familiar with traditional relational database replication often
overlook a critical design concept of the Kubernetes (and Talos) database:
`etcd`.
Unlike linear replicas, which have dedicated masters and slaves/replicas, `etcd`
is highly dynamic.
The `master` in an `etcd` cluster is entirely temporal.
This means fail-overs are handled easily, and usually without any notice
of operators.
This _also_ means that the operational architecture is fundamentally different.

Properly managed (which Talos Linux does), `etcd` should never have split brain
and should never encounter noticeable down time.
In order to do this, though, `etcd` maintains the concept of "membership" and of
"quorum".
In order to perform _any_ operation, read _or_ write, the database requires
quorum to be sustained.
That is, a _strict_ majority must agree on the current leader, and absenteeism
counts as a negative.
In other words, if there are three registered members (voters), at least two out
of the three must be actively asserting that the current master _is_ the master.
If any two disagree or even fail to answer, the `etcd` database will lock itself
until quorum is again achieved in order to protect itself and the integrity of
the data.
This is fantastically important for handling distributed systems and the various
types of contention which may arise.

This design means, however, that having an incorrect number of members can be
devastating.
Having only two controlplane nodes, for instance, is mostly _worse_ than having
only one, because if _either_ goes down, your entire database will lock.
You would be better off just making periodic snapshots of the data and restoring
it when necessary.

Another common situation occurs when replacing controlplane nodes.
If you have three controlplane nodes and replace one, you will not have three
members, you will have four, and one of those will never be available again.
Thus, if _any_ of your three remaining nodes goes down, your database will lock,
because only two out of the four members will be available: four nodes is
_worse_ than three nodes!
So it is critical that controlplane members which are replaced be removed.
Luckily, the Talos API makes this easy.

## Bootstrap once

In the old days, Talos Linux had the idea of an `init` node.
The `init` node was a "special" controlplane node which was designated as the
founder of the cluster.
It was the first, was guaranteed to be the elector, and was authorised to create
a cluster...
even if one already existed.
This made the formation of a cluster really easy, but it had a lot of
down sides.
Mostly, these related to rebuilding or replacing that `init` node:
you could easily end up with a split-brain scenario in which you had two different clusters:
a single node one and a two-node one.
Needless to say, this was an unhappy arrangement.

Fortunately, `init` nodes are gone, but that means that the critical operation
of forming a cluster is a manual process.
It's an _easy_ process, consisting of a single API call, but it can be a
confusing one, until you understand what it does.

Every new cluster must be bootstrapped exactly and only once.
This means you do NOT bootstrap each node in a cluster, not even each
controlplane node.
You bootstrap only a _single_ controlplane node, because you are bootstrapping the
_cluster_, not the node.

It doesn't matter _which_ controlplane node is told to bootstrap, but it must be
a controlplane node, and it must be only one.

Bootstrapping is _fast_ and sure.
Even if your Kubernetes cluster fails to form for other reasons (say, a bad
configuration option or unavailable container repository), if the bootstrap API
call returns successfully, you do NOT need to bootstrap again:
just fix the config or let Kubernetes retry.

Bootstrapping itself does not do anything with Kubernetes.
Bootstrapping only tells `etcd` to form a cluster, so don't judge the success of
a bootstrap by the failure of Kubernetes to start.
Kubernetes relies on `etcd`, so bootstrapping is _required_, but it is not
_sufficient_ for Kubernetes to start.
@@ -4,7 +4,81 @@ weight: 50
description: "Understand the Kubernetes Control Plane."
---

This guide provides details on how Talos runs and bootstraps the Kubernetes control plane.
This guide provides information about the Kubernetes control plane, and details on how Talos runs and bootstraps the Kubernetes control plane.

<!-- markdownlint-disable MD026 -->

## What is a control plane node?

A control plane node is a node which:

- runs etcd, the Kubernetes database
- runs the Kubernetes control plane
  - kube-apiserver
  - kube-controller-manager
  - kube-scheduler
- serves as an administrative proxy to the worker nodes

These nodes are critical to the operation of your cluster.
Without control plane nodes, Kubernetes will not respond to changes in the
system, and certain central services may not be available.

Talos nodes which have `.machine.type` of `controlplane` are control plane nodes.
(check via `talosctl get member`)
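
For example (the node address is a placeholder):

```bash
# List cluster members, including each node's machine type.
talosctl -n 172.20.0.2 get member
```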

Control plane nodes are tainted by default to prevent workloads from being scheduled onto them.
This is both to protect the control plane from workloads consuming resources and starving the control plane processes, and also to reduce the risk of a vulnerability exposing the control plane's credentials to a workload.

## The Control Plane and Etcd

A critical design concept of Kubernetes (and Talos) is the `etcd` database.

Properly managed (which Talos Linux does), `etcd` should never have split brain or noticeable down time.
In order to do this, `etcd` maintains the concept of "membership" and of
"quorum".
To perform any operation, read or write, the database requires
quorum.
That is, a majority of members must agree on the current leader, and absenteeism (members that are down, or not reachable)
counts as a negative.
For example, if there are three members, at least two out
of the three must agree on the current leader.
If two disagree or fail to answer, the `etcd` database will lock itself
until quorum is achieved in order to protect the integrity of
the data.

This design means that having two controlplane nodes is _worse_ than having only one, because if _either_ goes down, your database will lock (and the chance of one of two nodes going down is greater than the chance of just a single node going down).
Similarly, a 4 node etcd cluster is worse than a 3 node etcd cluster - a 4 node cluster requires 3 nodes to be up to achieve quorum (in order to have a majority), while the 3 node cluster requires 2 nodes:
i.e. both can support a single node failure and keep running - but the chance of a node failing in a 4 node cluster is higher than that in a 3 node cluster.
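
The quorum arithmetic above can be checked with a small sketch (illustrative only):

```bash
# Quorum size and tolerated failures for an n-member etcd cluster:
# quorum = floor(n/2) + 1, tolerated failures = n - quorum.
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "members=${n} quorum=${quorum} tolerated_failures=$(( n - quorum ))"
done
```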

Another note about etcd: due to the need to replicate data amongst members, performance of etcd _decreases_ as the cluster scales.
A 5 node cluster can commit about 5% fewer writes per second than a 3 node cluster running on the same hardware.

## Recommendations for your control plane

- Run your clusters with three or five control plane nodes.
  Three is enough for most use cases.
  Five will give you better availability (in that it can tolerate two node failures simultaneously), but cost you more both in the number of nodes required, and also as each node may require more hardware resources to offset the performance degradation seen in larger clusters.
- Implement good monitoring and put processes in place to deal with a failed node in a timely manner (and test them!)
- Even with robust monitoring and procedures for replacing failed nodes in place, backup etcd and your control plane node configuration to guard against unforeseen disasters.
- Monitor the performance of your etcd clusters.
  If etcd performance is slow, vertically scale the nodes, not the number of nodes.
- If a control plane node fails, remove it first, then add the replacement node.
  (This ensures that the failed node does not "vote" when adding in the new node, minimizing the chances of a quorum violation; see the sketch after this list.)
- If replacing a node that has not failed, add the new one, then remove the old.
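
A sketch of the "remove the failed node first" recommendation above; the addresses and member hostname are placeholders, and `talosctl etcd remove-member` must be run against a healthy control plane node:

```bash
# Remove the failed member from etcd, via a surviving control plane node.
talosctl -n 172.20.0.2 etcd remove-member cp-3

# Remove the corresponding Kubernetes Node object.
kubectl delete node cp-3

# Only then bring up the replacement control plane node and let it join.
```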

## Bootstrapping the Control Plane

Every new cluster must be bootstrapped only once, which is achieved by telling a single control plane node to initiate the bootstrap.
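
For example (the node address is a placeholder):

```bash
# Run once per cluster, against exactly one control plane node.
talosctl -n 172.20.0.2 bootstrap
```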

Bootstrapping itself does not do anything with Kubernetes.
Bootstrapping only tells `etcd` to form a cluster, so don't judge the success of
a bootstrap by the failure of Kubernetes to start.
Kubernetes relies on `etcd`, so bootstrapping is _required_, but it is not
_sufficient_ for Kubernetes to start.
If your Kubernetes cluster fails to form for other reasons (say, a bad
configuration option or unavailable container repository), but the bootstrap API
call returned successfully, you do NOT need to bootstrap again:
just fix the config or let Kubernetes retry.

### High-level Overview

@@ -26,7 +100,6 @@ The `kubelet` tries to contact the control plane endpoint, but as it is not up y

One of the control plane nodes is chosen as the bootstrap node, and promoted using the bootstrap API (`talosctl bootstrap`).
The bootstrap node initiates the `etcd` bootstrap process by initializing `etcd` as the first member of the cluster.

> Note: there should be only one bootstrap node for the cluster lifetime.
> Once `etcd` is bootstrapped, the bootstrap node has no special role and acts the same way as other control plane nodes.

The `etcd` services on non-bootstrap nodes try to get the `Endpoints` resource via the control plane endpoint, but that request fails as the control plane endpoint is not up yet.

@@ -58,9 +131,14 @@ control plane endpoint, joins the `etcd` cluster, and the control plane componen

Scaling down the control plane involves removing a node from the cluster.
The most critical part is making sure that the node which is being removed leaves the etcd cluster.
When using `talosctl reset` command, the targeted control plane node leaves the `etcd` cluster as part of the reset sequence.
The recommended way to do this is to use:

### Upgrading Control Plane Nodes

- `talosctl -n IP.of.node.to.remove reset`
- `kubectl delete node`

When using the `talosctl reset` command, the targeted control plane node leaves the `etcd` cluster as part of the reset sequence, and its disks are erased.
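
A sketch of that sequence (the address and node name are placeholders):

```bash
# Have the node leave etcd and wipe itself.
talosctl -n 172.20.0.4 reset

# Remove the corresponding Kubernetes Node object.
kubectl delete node <node-name>
```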

### Upgrading Talos on Control Plane Nodes

When a control plane node is upgraded, Talos leaves `etcd`, wipes the system disk, installs a new version of itself, and reboots.
The upgraded node then joins the `etcd` cluster on reboot.
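
For example (the node address and installer image tag are illustrative; use the installer image and version appropriate for your cluster):

```bash
# Upgrade Talos on a single control plane node; repeat one node at a time.
talosctl -n 172.20.0.2 upgrade --image ghcr.io/siderolabs/installer:v1.4.0
```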

@@ -10,7 +10,7 @@ Talos implements concepts of *resources* and *controllers* to facilitate interna

Talos resources and controllers are very similar to Kubernetes resources and controllers, but there are some differences.
The content of this document is not required to operate Talos, but it is useful for troubleshooting.

Starting with Talos 0.9, most of the Kubernetes control plane boostrapping and operations is implemented via controllers and resources which allows Talos to be reactive to configuration changes, environment changes (e.g. time sync).
Starting with Talos 0.9, most of the Kubernetes control plane bootstrapping and operations is implemented via controllers and resources which allows Talos to be reactive to configuration changes, environment changes (e.g. time sync).

## Resources