docs: document SideroLink and other 0.5.0 new features

Includes bumps to pkgs/tools for Go 1.17.7 and a new kernel version.

Added the clusterctl move label to the `siderolink` root secret.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Andrey Smirnov 2022-02-11 23:14:14 +03:00
parent 416cc51e0a
commit 36ebc2a056
GPG Key ID: 7B26396447AB6DFD
15 changed files with 225 additions and 36 deletions

View File

@ -13,8 +13,8 @@ TALOS_RELEASE ?= v0.14.1
PREVIOUS_TALOS_RELEASE ?= v0.13.4
DEFAULT_K8S_VERSION ?= v1.22.3
TOOLS ?= ghcr.io/talos-systems/tools:v0.9.0
PKGS ?= v0.10.0-alpha.0-21-g2b8cd88
TOOLS ?= ghcr.io/talos-systems/tools:v0.10.0-alpha.0-3-g4c9e7a4
PKGS ?= v0.10.0-alpha.0-24-g6019223
SFYRA_CLUSTERCTL_CONFIG ?= $(HOME)/.cluster-api/clusterctl.sfyra.yaml

View File

@ -14,6 +14,7 @@ import (
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
clusterctl "sigs.k8s.io/cluster-api/cmd/clusterctl/api/v1alpha3"
runtimeclient "sigs.k8s.io/controller-runtime/pkg/client"
"github.com/talos-systems/siderolink/pkg/wireguard"
@ -96,6 +97,9 @@ func (cfg *Config) save(ctx context.Context, metalClient runtimeclient.Client) e
ObjectMeta: metav1.ObjectMeta{
Namespace: corev1.NamespaceDefault,
Name: SecretName,
Labels: map[string]string{
clusterctl.ClusterctlMoveLabelName: "",
},
},
Data: map[string][]byte{
secretInstallationID: []byte(cfg.InstallationID),

View File

@ -18,7 +18,7 @@ preface = """\
[notes.capi]
title = "CAPI v1beta1"
description = """\
This release of CACPPT brings compatibility with CAPI v1beta1.
This release of Sidero brings compatibility with CAPI v1beta1.
"""
[notes.ipmi-pxe-method]
@ -28,21 +28,21 @@ IPMI PXE method (UEFI, BIOS) can now be configured with `SIDERO_CONTROLLER_MANAG
"""
[notes.siderolink]
title = "Siderolink"
title = "SideroLink"
description = """\
Sidero now connects to all servers using Siderolink.
This enables streaming of all dmesg logs and events back to sidero.
Sidero now connects to all servers using SideroLink (available only with Talos >= 0.14).
This enables streaming of kernel logs and events back to Sidero.
All server logs can now be viewed by getting the logs of one of the containers of the `sidero-controller-manager`:
```
kubectl logs -f -n sidero-system deployment/sidero-controller-manager serverlogs
kubectl logs -f -n sidero-system deployment/sidero-controller-manager -c serverlogs
```
Events:
```
kubectl logs -f -n sidero-system deployment/sidero-controller-manager serverevents
kubectl logs -f -n sidero-system deployment/sidero-controller-manager -c serverevents
```
"""
@ -64,6 +64,8 @@ New set of conditions is now available which can simplify cluster troubleshootin
- `TalosConfigValidated` is set to false when the config validation
fails on the node.
- `TalosInstalled` is set to true/false when talos installer finishes.
Requires Talos >= v0.14.
"""
[notes.pxeboot]
@ -77,4 +79,14 @@ Now the node will be PXE booted until Talos installation succeeds.
title = "iPXE Boot From Disk Method"
description = """\
iPXE boot from disk method can now be set not only on the global level, but also in the Server and ServerClass specs.
"""
[notes.clustertemplate]
title = "Cluster Template"
description = """\
Sidero ships with a new cluster template without `init` nodes.
This template is only compatible with Talos >= 0.14 (it requires SideroLink feature which was introduced in Talos 0.14).
On upgrade, Sidero supports clusters running Talos < 0.14 if they were created before the upgrade.
Use [legacy template](https://github.com/talos-systems/sidero/blob/release-0.4/templates/cluster-template.yaml) to deploy clusters with Talos < 0.14.
"""

View File

@ -39,12 +39,13 @@ The main article for installing `clusterctl` can be found
```bash
sudo curl -Lo /usr/local/bin/clusterctl \
"https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.4.4/clusterctl-$(uname -s | tr '[:upper:]' '[:lower:]')-amd64" \
"https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.4.7/clusterctl-$(uname -s | tr '[:upper:]' '[:lower:]')-amd64"
sudo chmod +x /usr/local/bin/clusterctl
```
> Note: This version of Sidero is only compatible with CAPI v1alpha4,
> so versions of `clusterctl` above v0.4.x will not work.
> Please use the latest v0.4.x version of `clusterctl` from the [release page](https://github.com/kubernetes-sigs/cluster-api/releases/).
## Install `talosctl`

View File

@ -94,15 +94,16 @@ they are ready.
In order to interact with the new machines (outside of Kubernetes), you will
need to obtain the `talosctl` client configuration, or `talosconfig`.
You can do this by retrieving the resource of the same type from the Sidero
You can do this by retrieving the secret from the Sidero
management cluster:
```bash
kubectl --context=sidero-demo \
get talosconfig \
-l cluster.x-k8s.io/cluster-name=cluster-0 \
-o jsonpath='{.items[0].status.talosConfig}' \
> cluster-0-talosconfig.yaml
get secret \
cluster-0-talosconfig \
-o jsonpath='{.data.talosconfig}' \
| base64 -d \
> cluster-0-talosconfig
```
## Retrieve the Kubeconfig
@ -110,7 +111,7 @@ kubectl --context=sidero-demo \
With the talosconfig obtained, the workload cluster's kubeconfig can be retrieved in the normal Talos way:
```bash
talosctl --talosconfig cluster-0.yaml kubeconfig
talosctl --talosconfig cluster-0-talosconfig --nodes <CONTROL_PLANE_IP> kubeconfig
```
## Check access

View File

@ -6,7 +6,7 @@ title: "Expose Sidero Services"
> If you built your cluster as specified in the [Prerequisite: Kubernetes] section in this tutorial, your services are already exposed and you can skip this section.
There are two external Services which Sidero serves and which much be made
There are three external Services which Sidero serves and which must be made
reachable by the servers which it will be driving.
For most servers, TFTP (port 69/udp) will be needed.
@ -20,9 +20,14 @@ The kernel, initrd, and all configuration assets are served from the HTTP servic
It is needed for all servers, but since it is HTTP-based, it
can be easily proxied, load balanced, or run through an ingress controller.
The overlay Wireguard SideroLink network requires UDP port 51821 to be open.
As with TFTP, many load balancers do not support the Wireguard UDP protocol; use MetalLB instead.
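A minimal sketch of exposing the Wireguard port through MetalLB (assumes MetalLB is already installed with an address pool configured; the Service name and the `control-plane: sidero-controller-manager` selector label are illustrative assumptions to be matched against your actual deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: siderolink-wireguard   # hypothetical name
  namespace: sidero-system
spec:
  type: LoadBalancer
  ports:
    - name: siderolink
      protocol: UDP
      port: 51821
      targetPort: 51821
  selector:
    # assumption: adjust to your sidero-controller-manager pod labels
    control-plane: sidero-controller-manager
```

The external IP assigned by MetalLB is then the one to use as `SIDERO_CONTROLLER_MANAGER_SIDEROLINK_ENDPOINT`.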
The main thing to keep in mind is that the services **MUST** match the IP or
hostname specified by the `SIDERO_CONTROLLER_MANAGER_API_ENDPOINT` environment
variable (or configuration parameter) when you installed Sidero.
hostname specified by the `SIDERO_CONTROLLER_MANAGER_API_ENDPOINT` and
`SIDERO_CONTROLLER_MANAGER_SIDEROLINK_ENDPOINT` environment
variables (or configuration parameters) when you installed Sidero.
It is a good idea to verify that the services are exposed as you think they
should be.

View File

@ -25,6 +25,7 @@ options.
```bash
export SIDERO_CONTROLLER_MANAGER_HOST_NETWORK=true
export SIDERO_CONTROLLER_MANAGER_API_ENDPOINT=192.168.1.150
export SIDERO_CONTROLLER_MANAGER_SIDEROLINK_ENDPOINT=192.168.1.150
clusterctl init -b talos -c talos -i sidero
```

View File

@ -19,6 +19,7 @@ the _new_ management cluster first.
```bash
export SIDERO_CONTROLLER_MANAGER_API_ENDPOINT=sidero.mydomain.com
export SIDERO_CONTROLLER_MANAGER_SIDEROLINK_ENDPOINT=sidero.mydomain.com
clusterctl init \
--kubeconfig-context=management

View File

@ -39,12 +39,12 @@ The main article for installing `clusterctl` can be found
```bash
sudo curl -Lo /usr/local/bin/clusterctl \
"https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.4.4/clusterctl-$(uname -s | tr '[:upper:]' '[:lower:]')-amd64" \
"https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.0.4/clusterctl-$(uname -s | tr '[:upper:]' '[:lower:]')-amd64"
sudo chmod +x /usr/local/bin/clusterctl
```
> Note: This version of Sidero is only compatible with CAPI v1alpha4,
> so versions of `clusterctl` above v0.4.x will not work.
> Note: This version of Sidero is only compatible with CAPI v1beta1,
> so please install the latest version of `clusterctl` v1.x.
## Install `talosctl`

View File

@ -20,12 +20,14 @@ We need only set up our DHCP server to point to it.
The tricky bit is that at different phases, we need to serve different assets,
but they all use the same DHCP metadata key.
In fact, for each architecture, we have as many as four different client types:
In fact, we have as many as six different client types:
- Legacy BIOS-based PXE boot (undionly.kpxe via TFTP)
- UEFI-based PXE boot (ipxe.efi via TFTP)
- UEFI HTTP boot (ipxe.efi via HTTP URL)
- iPXE (boot.ipxe via HTTP URL)
- UEFI-based PXE arm64 boot (ipxe-arm64.efi via TFTP)
- UEFI HTTP boot on arm64 (ipxe-arm64.efi via HTTP URL)
## Common client types
@ -39,6 +41,8 @@ options:
- UEFI-based PXE boot: `ipxe.efi`
- UEFI HTTP boot: `http://sidero-server-url/tftp/ipxe.efi`
- iPXE boot: `http://sidero-server-url/boot.ipxe`
- arm64 UEFI PXE boot: `ipxe-arm64.efi`
- arm64 UEFI HTTP boot: `http://sidero-server-url/tftp/ipxe-arm64.efi`
In the ISC DHCP server, these options look like:

View File

@ -15,8 +15,8 @@ It can be, for example:
Two important things are needed in this cluster:
- Kubernetes `v1.18` or later
- Ability to expose tcp and udp Services to the workload cluster machines
- Kubernetes `v1.19` or later
- Ability to expose TCP and UDP Services to the workload cluster machines
For the purposes of this tutorial, we will create this cluster in Docker on a
workstation, perhaps a laptop.
@ -35,7 +35,7 @@ export HOST_IP="192.168.1.150"
talosctl cluster create \
--name sidero-demo \
-p 69:69/udp,8081:8081/tcp \
-p 69:69/udp,8081:8081/tcp,51821:51821/udp \
--workers 0 \
--config-patch '[{"op": "add", "path": "/cluster/allowSchedulingOnMasters", "value": true}]' \
--endpoint $HOST_IP
@ -46,11 +46,12 @@ host.
This is _not_ the Docker bridge IP but the standard IP address of the
workstation.
Note that there are two ports mentioned in the command above.
Note that there are three ports mentioned in the command above.
The first (69) is
for TFTP.
The second (8081) is for the web server (which serves netboot
artifacts and configuration).
The third (51821) is for the SideroLink Wireguard network.
Exposing them here allows us to access the services that will get deployed on this node.
In turn, we will be running our Sidero services with `hostNetwork: true`,

View File

@ -122,7 +122,7 @@ Issue the following to create a single-node cluster:
```bash
talosctl cluster create \
--kubernetes-version 1.22.2 \
-p 69:69/udp,8081:8081/tcp \
-p 69:69/udp,8081:8081/tcp,51821:51821/udp \
--workers 0 \
--endpoint $PUBLIC_IP
```
@ -143,8 +143,6 @@ kubectl taint node talos-default-master-1 node-role.kubernetes.io/master:NoSched
## Install Sidero
As of Cluster API version 0.3.9, Sidero is included as a default infrastructure provider in clusterctl.
To install Sidero and the other Talos providers, simply issue:
```bash

View File

@ -4,8 +4,6 @@ weight: 2
title: Installation
---
As of Cluster API version 0.3.9, Sidero is included as a default infrastructure provider in `clusterctl`.
To install Sidero and the other Talos providers, simply issue:
```bash
@ -16,21 +14,24 @@ Sidero supports several variables to configure the installation, these variables
variables or as variables in the `clusterctl` configuration:
- `SIDERO_CONTROLLER_MANAGER_HOST_NETWORK` (`false`): run `sidero-controller-manager` on host network
- `SIDERO_CONTROLLER_MANAGER_API_ENDPOINT` (empty): specifies the IP address controller manager can be reached on, defaults to the node IP
- `SIDERO_CONTROLLER_MANAGER_API_ENDPOINT` (empty): specifies the IP address controller manager API service can be reached on, defaults to the node IP (TCP)
- `SIDERO_CONTROLLER_MANAGER_API_PORT` (8081): specifies the port controller manager can be reached on
- `SIDERO_CONTROLLER_MANAGER_CONTAINER_API_PORT` (8081): specifies the controller manager internal container port
- `SIDERO_CONTROLLER_MANAGER_SIDEROLINK_ENDPOINT` (empty): specifies the IP address SideroLink Wireguard service can be reached on, defaults to the node IP (UDP)
- `SIDERO_CONTROLLER_MANAGER_SIDEROLINK_PORT` (51821): specifies the port SideroLink Wireguard service can be reached on
- `SIDERO_CONTROLLER_MANAGER_EXTRA_AGENT_KERNEL_ARGS` (empty): specifies additional Linux kernel arguments for the Sidero agent (for example, different console settings)
- `SIDERO_CONTROLLER_MANAGER_AUTO_ACCEPT_SERVERS` (`false`): automatically accept discovered servers, by default `.spec.accepted` should be changed to `true` to accept the server
- `SIDERO_CONTROLLER_MANAGER_AUTO_BMC_SETUP` (`true`): automatically attempt to configure the BMC with a `sidero` user that will be used for all IPMI tasks.
- `SIDERO_CONTROLLER_MANAGER_INSECURE_WIPE` (`true`): wipe only the first megabyte of each disk on the server, otherwise wipe the full disk
- `SIDERO_CONTROLLER_MANAGER_SERVER_REBOOT_TIMEOUT` (`20m`): timeout for the server reboot (how long it might take for the server to be rebooted before Sidero retries an IPMI reboot operation)
- `SIDERO_CONTROLLER_MANAGER_IPMI_PXE_METHOD` (`uefi`): IPMI boot from PXE method: `uefi` for UEFI boot or `bios` for BIOS boot
- `SIDERO_CONTROLLER_MANAGER_BOOT_FROM_DISK_METHOD` (`ipxe-exit`): configures the way Sidero forces server to boot from disk when server hits iPXE server after initial install: `ipxe-exit` returns iPXE script with `exit` command, `http-404` returns HTTP 404 Not Found error, `ipxe-sanboot` uses iPXE `sanboot` command to boot from the first hard disk
- `SIDERO_CONTROLLER_MANAGER_BOOT_FROM_DISK_METHOD` (`ipxe-exit`): configures the way Sidero forces server to boot from disk when server hits iPXE server after initial install: `ipxe-exit` returns iPXE script with `exit` command, `http-404` returns HTTP 404 Not Found error, `ipxe-sanboot` uses iPXE `sanboot` command to boot from the first hard disk (can also be configured at the `ServerClass`/`Server` level)
Sidero provides two endpoints which should be made available to the infrastructure:
Sidero provides three endpoints which should be made available to the infrastructure:
- TCP port 8081 which provides combined iPXE, metadata and gRPC service (external endpoint should be passed to Sidero as `SIDERO_CONTROLLER_MANAGER_API_ENDPOINT` and `SIDERO_CONTROLLER_MANAGER_API_PORT`)
- TCP port 8081 which provides combined iPXE, metadata and gRPC service (external endpoint should be specified as `SIDERO_CONTROLLER_MANAGER_API_ENDPOINT` and `SIDERO_CONTROLLER_MANAGER_API_PORT`)
- UDP port 69 for the TFTP service (DHCP server should point the nodes to PXE boot from that IP)
- UDP port 51821 for the SideroLink Wireguard service (external endpoint should be specified as `SIDERO_CONTROLLER_MANAGER_SIDEROLINK_ENDPOINT` and `SIDERO_CONTROLLER_MANAGER_SIDEROLINK_PORT`)
These endpoints could be exposed to the infrastructure using different strategies:

View File

@ -1,6 +1,6 @@
---
description: ""
weight: 4
weight: 5
title: Resources
---
@ -73,6 +73,42 @@ This resource corresponds to the `infrastructureRef` section of Cluster API's `C
A `MetalMachine` is Sidero's view of a machine.
It references either a single server or a server class from which a physical server will be picked for bootstrapping.
`MetalMachine` provides a set of statuses describing the state (available with SideroLink, requires Talos >= 0.14):
```yaml
status:
addresses:
- address: 172.25.0.5
type: InternalIP
- address: pxe-2
type: Hostname
conditions:
- lastTransitionTime: "2022-02-11T14:20:42Z"
message: 'Get ... connection refused'
reason: ProviderUpdateFailed
severity: Warning
status: "False"
type: ProviderSet
- lastTransitionTime: "2022-02-11T12:48:35Z"
status: "True"
type: TalosConfigLoaded
- lastTransitionTime: "2022-02-11T12:48:35Z"
status: "True"
type: TalosConfigValidated
- lastTransitionTime: "2022-02-11T12:48:35Z"
status: "True"
type: TalosInstalled
```
Statuses:
- `addresses` lists the current IP addresses and hostname of the node; they are updated whenever the node addresses change
- `conditions`:
- `ProviderSet`: captures the moment the infrastructure provider ID is set in the `Node` specification; depends on workload cluster control plane availability
- `TalosConfigLoaded`: Talos successfully loaded machine configuration from Sidero; if this condition indicates a failure, check `sidero-controller-manager` logs
- `TalosConfigValidated`: Talos successfully validated machine configuration; a failure in this condition indicates that the machine config is malformed
- `TalosInstalled`: Talos was successfully installed to disk
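The conditions above can also be filtered mechanically; a minimal sketch (the sample snippet mirrors the structure shown above, while on a live cluster the input would come from `kubectl get metalmachine <name> -o yaml`):

```bash
# Save a MetalMachine status snippet (structure as in the example above);
# on a real cluster: kubectl get metalmachine <name> -o yaml > metalmachine.yaml
cat > metalmachine.yaml <<'EOF'
conditions:
- status: "False"
  type: ProviderSet
- status: "True"
  type: TalosConfigLoaded
- status: "True"
  type: TalosInstalled
EOF
# Print the type of every condition whose status is not "True":
awk '/status:/ {s=$NF} /^  type:/ {if (s != "\"True\"") print $NF}' metalmachine.yaml
# prints: ProviderSet
```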
#### `MetalMachineTemplates`
A `MetalMachineTemplate` is similar to a `MetalMachine` above, but serves as a template that is reused for resources like `MachineDeployments` or `TalosControlPlanes` that allocate multiple `Machines` at once.

View File

@ -0,0 +1,124 @@
---
description: ""
weight: 4
title: SideroLink
---
SideroLink provides an overlay Wireguard point-to-point connection from every Talos machine to Sidero.
Sidero provisions each machine with a unique IPv6 address and Wireguard key for the SideroLink connection.
> Note: SideroLink is only supported with Talos >= 0.14.
>
> SideroLink doesn't provide a way for workload machines to communicate with each other; each connection is strictly
> point-to-point.
The SideroLink connection is both encrypted and authenticated, so Sidero uses it to map data streams coming from the machines
to a specific `ServerBinding`, `MetalMachine`, `Machine` and `Cluster`.
Each Talos node sends two streams over the SideroLink connection: kernel logs (`dmesg`) and the Talos event stream.
SideroLink is enabled automatically by Sidero when booting Talos.
## Kernel Logs
Kernel logs (`dmesg`) are streamed in real time from the Talos nodes to the `sidero-controller-manager` over the SideroLink connection.
Log streaming starts when the kernel passes control to the `init` process, so early boot logs only become available once control
is passed to userland.
Logs can be viewed via the `serverlogs` container of the `sidero-controller-manager` pod:
```bash
$ kubectl -n sidero-system logs deployment/sidero-controller-manager -c serverlogs -f
{"clock":8576583,"cluster":"management-cluster","facility":"user","machine":"management-cluster-cp-ddgsw","metal_machine":"management-cluster-cp-vrff4","msg":"[talos] phase mountState (6/13): 1 tasks(s)\n","namespace":"default","priority":"warning","seq":665,"server_uuid":"6b121f82-24a8-4611-9d23-fa1a5ba564f0","talos-level":"warn","talos-time":"2022-02-11T12:42:02.74807823Z"}
...
```
The format of the message is the following:
```json
{
"clock": 8576583,
"cluster": "management-cluster",
"facility": "user",
"machine": "management-cluster-cp-ddgsw",
"metal_machine": "management-cluster-cp-vrff4",
"msg": "[talos] phase mountState (6/13): 1 tasks(s)\n",
"namespace": "default",
"priority": "warning",
"seq": 665,
"server_uuid": "6b121f82-24a8-4611-9d23-fa1a5ba564f0",
"talos-level": "warn",
"talos-time": "2022-02-11T12:42:02.74807823Z"
}
```
Kernel fields (see [Linux documentation](https://www.kernel.org/doc/Documentation/ABI/testing/dev-kmsg) for details):
- `clock` is the kernel timestamp relative to the boot time
- `facility` of the message
- `msg` is the actual log message
- `seq` is the kernel log sequence
- `priority` is the message priority
Talos-added fields:
- `talos-level` is the translated `priority` into standard logging levels
- `talos-time` is the timestamp of the log message (accuracy of the timestamp depends on time sync)
Sidero-added fields:
- `server_uuid` is the `name` of the matching `Server` and `ServerBinding` resources
- `namespace` is the namespace of the `Cluster`, `MetalMachine` and `Machine`
- `cluster`, `metal_machine` and `machine` are the names of the matching `Cluster`, `MetalMachine` and `Machine` resources
It might be a good idea to send container logs to some log aggregation system and filter the logs for a cluster or a machine.
Quick filtering for a specific server:
```bash
kubectl -n sidero-system logs deployment/sidero-controller-manager -c serverlogs | jq -R 'fromjson? | select(.server_uuid == "b4e677d9-b59b-4c1c-925a-f9d9ce049d79")'
```
## Talos Events
Talos delivers system events over the SideroLink connection to the `sidero-controller-manager` pod.
These events can be accessed with the `talosctl events` command.
Events are mostly used to update `ServerBinding`/`MetalMachine` statuses, but they can also be seen in the logs of the `serverevents` container:
```bash
$ kubectl -n sidero-system logs deployment/sidero-controller-manager -c serverevents -f
{"level":"info","ts":1644853714.2700942,"caller":"events-manager/adapter.go:153","msg":"incoming event","component":"sink","node":"[fdae:2859:5bb1:7a03:3ae3:be30:7ec4:4c09]:44530","id":"c857jkm1jjcc7393cbs0","type":"type.googleapis.com/machine.AddressEvent","server_uuid":"b4e677d9-b59b-4c1c-925a-f9d9ce049d79","cluster":"management-cluster","namespace":"default","metal_machine":"management-cluster-cp-47lll","machine":"management-cluster-cp-7mpsh","hostname":"pxe-2","addresses":"172.25.0.5"}
```
## MetalMachine Conditions
Sidero updates the statuses of `ServerBinding`/`MetalMachine` resources based on the events received from the Talos nodes:
- current addresses of the node
- statuses of machine configuration loading, validation, and installation
See [Resources](../resources/) for details.
## SideroLink State
The state of each SideroLink connection is kept in the `ServerBinding` resource:
```yaml
spec:
siderolink:
address: fdae:2859:5bb1:7a03:3ae3:be30:7ec4:4c09/64
publicKey: XIBT49g9xCoBvyb/x36J+ASlQ4qaxXMG20ZgKbBbfE8=
```
Installation-wide SideroLink state is kept in the `siderolink` `Secret` resource:
```bash
$ kubectl get secrets siderolink -o yaml
apiVersion: v1
data:
installation-id: QUtmZGFmVGJtUGVFcWp0RGMzT1BHSzlGcmlHTzdDQ0JCSU9aRzRSamdtWT0=
private-key: ME05bHhBd3JwV0hDczhNbm1aR3RDL1ZjK0ZSUFM5UzQwd25IU00wQ3dHOD0=
...
```
The `installation-id` key is used to generate unique SideroLink IPv6 addresses, and `private-key` is Sidero's Wireguard private key.
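As with any Kubernetes `Secret`, the data fields shown above are base64-encoded; decoding them yields the actual key material. A quick sketch using the `installation-id` value from the output above:

```bash
# Decode the installation-id from the secret output above; on a live cluster:
#   kubectl get secret siderolink -o jsonpath='{.data.installation-id}' | base64 -d
echo 'QUtmZGFmVGJtUGVFcWp0RGMzT1BHSzlGcmlHTzdDQ0JCSU9aRzRSamdtWT0=' | base64 -d
# prints: AKfdafTbmPeEqjtDc3OPGK9FriGO7CCBBIOZG4RjgmY=
```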