Fixes #4138
When KubeSpan is enabled, Talos automatically generates or loads a
KubeSpan identity, which consists of a WireGuard key pair. The ULA
address is calculated from the ClusterID and the MAC address of the
first NIC.
Some code was borrowed from #3577.
Example:
```
$ talosctl -n 172.20.0.2 get ksi
NODE NAMESPACE TYPE ID VERSION ADDRESS PUBLICKEY
172.20.0.2 kubespan KubeSpanIdentity local 1 fd71:6e1d:86be:6302:e871:1bff:feb2:ccee/128 Oak2fBEWngBhwslBxDVgnRNHXs88OAp4kjroSX0uqUE=
```
Additional changes:
* `--with-kubespan` flag for `talosctl cluster create` for quick testing
* validate that cluster discovery (and KubeSpan) requires ClusterID and
ClusterSecret.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Co-authored-by: Seán C McCord <ulexus@gmail.com>
Fixes #4137
Node identity is established when `STATE` partition is mounted, and
cached there. Node identity will be used for the cluster discovery
process to identify each node of the cluster.
Random 32 bytes encoded via base62 are used as node identity.
`base62` uses only URL-safe characters which might save us some trouble
later.
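As a rough illustration of the encoding, the sketch below generates 32
random bytes and renders them in base 62 using only the standard
library. The helper name `newNodeIdentity` is hypothetical, and the use
of `math/big` is an assumption; the commit itself may rely on a
dedicated base62 package with a different alphabet order.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// newNodeIdentity is a hypothetical helper: it reads 32 random bytes and
// returns them as a base62 string.
func newNodeIdentity() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("reading random bytes: %w", err)
	}

	// big.Int.Text supports bases up to 62 and uses the digits
	// 0-9, a-z, A-Z, all of which are URL-safe.
	return new(big.Int).SetBytes(buf).Text(62), nil
}

func main() {
	id, err := newNodeIdentity()
	if err != nil {
		panic(err)
	}

	fmt.Println(id)
}
```

A 256-bit value needs at most 43 base62 digits, so the identity stays
short enough to embed in URLs and file names.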
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixed: https://github.com/talos-systems/talos/issues/3686
Replaced sequencer tasks for KSPP and Kubernetes required sysctl props
by the ones set by controllers.
The KernelParam flow consists of 3 controllers and 2 resources:
- `KernelParamConfigController` - handles user sysctls coming from v1alpha1
config.
- `KernelParamDefaultsController` - handles our built-in KSPP and K8s
required sysctls.
- `KernelParamSpecController` - consumes `KernelParamSpec`s created by the
previous two controllers, applies them and updates the corresponding
`KernelParamStatus`.
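Applying a sysctl ultimately means writing a value to a file under
`/proc/sys`. The sketch below shows only the key-to-path mapping; the
`kernelParam` type and `sysctlPath` helper are hypothetical names, and
the sample value is illustrative rather than a statement of the
built-in defaults.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// kernelParam is a hypothetical stand-in for the data a KernelParamSpec
// carries: a dotted sysctl key and the desired value.
type kernelParam struct {
	Key   string
	Value string
}

// sysctlPath maps a dotted sysctl key to its file under /proc/sys.
func sysctlPath(key string) string {
	return path.Join("/proc/sys", strings.ReplaceAll(key, ".", "/"))
}

func main() {
	p := kernelParam{Key: "net.ipv4.ip_forward", Value: "1"}

	// A controller would write p.Value to this path and record the
	// outcome in the corresponding status resource.
	fmt.Printf("%s <- %s\n", sysctlPath(p.Key), p.Value)
}
```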
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Fixes #3951
Bootkube support was removed in Talos 0.9. Talos versions 0.9-0.11
support conversion of self-hosted bootkube-based control plane to the
new style control plane running as static pods managed by Talos.
This commit removes all backwards compatibility and removes conversion
code.
For the k8s controllers, `BootstrapStatus` is removed and a dependency
on `etcd` service status is added (as it was implicitly there via
`BootstrapStatus`).
Remove control plane conversion code.
In k8s upgrade code, remove self-hosted part.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR can be split into two parts:
* controllers
* apid binding into COSI world
Controllers
-----------
* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a bit
different way for workers and control plane nodes: controlplane nodes
generate directly, while workers reach out to `trustd` on control plane
nodes via `k8s.Endpoint` resource
apid Binding
------------
Resource `secrets.API` provides binding to protobuf by converting
itself back and forth to protobuf spec.
apid no longer receives machine configuration, instead it receives
gRPC-backed socket to access Resource API. apid watches `secrets.API`
resource, fetches certs and CA from it and uses that in its TLS
configuration.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This changes the way Kubernetes nodename is computed: it is set by the
controller based on the hostname and machine configuration, and pulled
from the resource when needed.
Kubelet client now also uses nodename to fix the certificate mismatch
issue on AWS.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR enables debug logging of controller-runtime to the server
console if the machine configuration .debug field is set to true (log
verbosity can be also changed on the fly).
For control plane nodes, don't send kubelet logs to the console (as they
tend to flood the console with messages); this leaves a reasonable
amount of logging for early boot troubleshooting.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This removes networkd, updates the network ready condition, and enables
all the controllers which were previously disabled.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This controller produces configuration for the operators based on the
machine configuration, cmdline, and defaults: which operators should
run, what their parameters are, etc.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This resource holds aggregated network status which can be easily used
in various places to wait for the network to reach some desired state.
The state checks are simple right now, we might improve the logic to
make sure all the configured network subsystems reached defined state,
but this might come later as we refine the logic (e.g. to make sure that
all static configuration got applied, etc.)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This implements two controllers: one which generates templates for
`/etc/hosts` and `/etc/resolv.conf`, and another generic controller
which renders files to `/etc` via shadow bind mounts.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is the next part of the networkd rewrite.
This implements three new resource types coupled with controllers which
process the default configuration, merge it, and apply changes.
TimeSync was set up to watch the time servers resource. This is a no-op
for now, but once DHCP is implemented, this would enable time server
configuration coming from DHCP.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Now the latest value for CPU and Memory is also represented as COSI
resources.
Was going back and forth in the implementation but in the end decided to
use dedicated yaml structures for both CPU and Memory stats because:
- JSON tags are ignored by `go-yaml`, so the output is not really great.
- protobuf Talos definition contains fields which we don't really need
in the YAML output of `talosctl get`.
- current state of Talos resource service does not support protobuf
encoding for resources.
So the plan for Theila is to just use the structure as a dynamic object
without relying on protobufs. At least for now.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This controller provides three important aggregated resources to be
consumed by different interested parties:
* "default" node IP
* "current" addresses (node can be reached on these at the moment)
* "accumulative" addresses (for certSANs)
Example:
```
$ talosctl get nodeaddresses -n 172.20.0.2
NODE NAMESPACE TYPE ID VERSION ADDRESSES
172.20.0.2 network NodeAddress accumulative 4 ["10.244.0.0","10.244.0.1","172.20.0.2"]
172.20.0.2 network NodeAddress current 6 ["10.244.0.0","10.244.0.1","172.20.0.2"]
172.20.0.2 network NodeAddress default 1 ["172.20.0.2"]
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The structure of the controllers is really similar to addresses and
routes:
* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state
Controller `LinkStatus` (which was implemented before) watches the
kernel state and publishes current link status.
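The merge step can be pictured with the sketch below: for each link
name, the spec from the highest-priority layer wins. The layer names
and their ordering (defaults, then cmdline, then machine configuration)
are assumptions, as are the `linkSpec` fields; the actual controller may
define more layers and merge differently.

```go
package main

import (
	"fmt"
	"sort"
)

// configLayer is a hypothetical priority; higher values override lower ones.
type configLayer int

const (
	layerDefault configLayer = iota
	layerCmdline
	layerMachineConfig
)

// linkSpec is a reduced, hypothetical view of a desired link state.
type linkSpec struct {
	Name  string
	Up    bool
	MTU   uint32
	Layer configLayer
}

// mergeLinkSpecs keeps, for each link name, the spec from the highest layer.
func mergeLinkSpecs(specs []linkSpec) []linkSpec {
	winners := make(map[string]linkSpec)

	for _, s := range specs {
		cur, seen := winners[s.Name]
		if !seen || s.Layer > cur.Layer {
			winners[s.Name] = s
		}
	}

	out := make([]linkSpec, 0, len(winners))
	for _, s := range winners {
		out = append(out, s)
	}

	// Sort by name so the result does not depend on map iteration order.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })

	return out
}

func main() {
	merged := mergeLinkSpecs([]linkSpec{
		{Name: "lo", Up: true, MTU: 65536, Layer: layerDefault},
		{Name: "eth0", Up: true, MTU: 1500, Layer: layerCmdline},
		{Name: "eth0", Up: true, MTU: 9000, Layer: layerMachineConfig},
	})

	for _, s := range merged {
		fmt.Printf("%s mtu=%d layer=%d\n", s.Name, s.MTU, s.Layer)
	}
}
```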
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Route handling is very similar to addresses:
* `RouteStatus` describes kernel routing table state,
`RouteStatusController` reflects kernel state into resources
* `RouteSpec` defines routes to be configured
* `RouteConfigController` creates `RouteSpec`s based on cmdline and
machine configuration
* `RouteMergeController` merges different configuration layers into the
final representation
* `RouteSpecController` applies the specs to the kernel routing table
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Enable logging using default development config with some fine tuning.
Additionally, now `info` and below logs go to kmsg.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This includes multiple controllers responsible for different stages of
`AddressSpec` conversion:
* `AddressConfigController` produces initial unmerged configuration from
multiple sources (more sources coming later, e.g. DHCP)
* `AddressMergeController` merges address configuration into final
representation
* `AddressSpecController` syncs resources with kernel state
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This controller queries addresses of all the interfaces in the system
and presents them as resources. The idea is that this can be a source for
many decisions - e.g. whether network is ready (physical interface has
scope global address assigned).
This is also good for debugging purposes.
Examples:
```
$ talosctl -n 172.20.0.2 get addresses
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 network AddressStatus cni0/10.244.0.1/24 1
172.20.0.2 network AddressStatus cni0/fe80::9c87:cdff:fe8e:5fdc/64 2
172.20.0.2 network AddressStatus eth0/172.20.0.2/24 1
172.20.0.2 network AddressStatus eth0/fe80::ac1b:9cff:fe19:6b47/64 2
172.20.0.2 network AddressStatus flannel.1/10.244.0.0/32 1
172.20.0.2 network AddressStatus flannel.1/fe80::440b:67ff:fe99:c18f/64 2
172.20.0.2 network AddressStatus lo/127.0.0.1/8 1
172.20.0.2 network AddressStatus lo/::1/128 1
172.20.0.2 network AddressStatus veth178e9b31/fe80::6040:1dff:fe5b:ae1a/64 2
172.20.0.2 network AddressStatus vethb0b96a94/fe80::2473:86ff:fece:1954/64 2
```
```
$ talosctl -n 172.20.0.2 get addresses -o yaml eth0/172.20.0.2/24
node: 172.20.0.2
metadata:
namespace: network
type: AddressStatuses.net.talos.dev
id: eth0/172.20.0.2/24
version: 1
owner: network.AddressStatusController
phase: running
spec:
address: 172.20.0.2/24
local: 172.20.0.2
broadcast: 172.20.0.255
linkIndex: 4
linkName: eth0
family: inet4
scope: global
flags: permanent
```
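The readiness decision mentioned above could look roughly like this
sketch, which checks whether any physical link has an address with
`scope: global`. The `addressStatus` type mirrors only a few fields of
the YAML output, and the way physical links are identified (a plain set
of names) is an assumption.

```go
package main

import "fmt"

// addressStatus is a reduced, hypothetical view of an AddressStatus spec.
type addressStatus struct {
	LinkName string
	Address  string
	Scope    string
}

// hasGlobalAddress reports whether at least one physical link carries a
// global-scope address.
func hasGlobalAddress(addrs []addressStatus, physical map[string]bool) bool {
	for _, a := range addrs {
		if physical[a.LinkName] && a.Scope == "global" {
			return true
		}
	}

	return false
}

func main() {
	addrs := []addressStatus{
		{LinkName: "lo", Address: "127.0.0.1/8", Scope: "host"},
		{LinkName: "eth0", Address: "fe80::ac1b:9cff:fe19:6b47/64", Scope: "link"},
		{LinkName: "eth0", Address: "172.20.0.2/24", Scope: "global"},
	}

	// Assumption: the caller already knows which links are physical.
	physical := map[string]bool{"eth0": true}

	fmt.Println("network ready:", hasGlobalAddress(addrs, physical))
}
```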
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is the first PR of many which implement new COSI network
configuration. This controller provides low-level status of the network
interfaces (links) not touching on the addresses of the interface.
The information gathered resembles output of `ip link show` command.
Examples:
```
$ talosctl -n 172.20.0.2 get links
NODE NAMESPACE TYPE ID VERSION TYPE KIND HW ADDR OPER STATE LINK STATE
172.20.0.2 net LinkStatus bond0 1 ether bond fe:c4:d6:4c:04:05 down false
172.20.0.2 net LinkStatus cni0 5 ether bridge 22:cc:25:7e:64:19 up true
172.20.0.2 net LinkStatus dummy0 1 ether dummy 0e:f6:f3:ef:53:29 down false
172.20.0.2 net LinkStatus eth0 4 ether ae:1b:9c:19:6b:47 up true
172.20.0.2 net LinkStatus flannel.1 2 ether vxlan be:c5:4f:eb:da:5c unknown true
172.20.0.2 net LinkStatus ip6tnl0 1 tunnel6 ip6tnl 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 down false
172.20.0.2 net LinkStatus lo 4 loopback 00:00:00:00:00:00 unknown true
172.20.0.2 net LinkStatus sit0 1 sit sit 00:00:00:00 down false
172.20.0.2 net LinkStatus teql0 1 void down false
172.20.0.2 net LinkStatus tunl0 1 ipip ipip 00:00:00:00 down false
172.20.0.2 net LinkStatus veth1c1422df 2 ether veth 6a:2d:68:be:8e:8f up true
172.20.0.2 net LinkStatus veth2ce7ce8d 1 ether veth 52:fc:98:82:f7:29 up true
```
```
$ talosctl -n 172.20.0.2 get links eth0 -o yaml
node: 172.20.0.2
metadata:
namespace: net
type: LinkStatuses.net.talos.dev
id: eth0
version: 4
owner: network.LinkStatusController
phase: running
spec:
index: 4
type: ether
linkIndex: 0
flags: UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP
hardwareAddr: ae:1b:9c:19:6b:47
broadcastAddr: ff:ff:ff:ff:ff:ff
mtu: 1500
queueDisc: pfifo_fast
operationalState: up
kind: ""
slaveKind: ""
linkState: true
speedMbit: 4294967295
port: Other
duplex: Unknown
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is a complete rewrite of time sync process.
Now the time sync process starts early at boot time, and it adapts to
configuration changes:
* before config is available, `pool.ntp.org` is used
* once config is available, configured time servers are used
The controller updates the same time sync resource that other
controllers already depend on, so they have a chance to wait for the
time sync event.
Talos services which depend on time now wait on the same resource
instead of waiting on timed health.
New features:
* time sync now sticks to a particular time server unless that server
returns an error, in which case the server is changed; this improves
time sync accuracy
* time sync acts on config changes immediately, so it's possible to
reconfigure time sync at any time
* there's a new 'epoch' field in time sync resources which allows
time-dependent controllers to regenerate certs when there's a big enough
jump in time
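The "sticky server" behaviour from the list above can be sketched as a
small picker that keeps returning the same server until an error is
reported. The `serverPicker` type is hypothetical, and the round-robin
fallback order and the second server name are assumptions; the commit
does not describe how the next server is chosen.

```go
package main

import (
	"errors"
	"fmt"
)

// serverPicker is a hypothetical helper that sticks to one time server
// until that server fails.
type serverPicker struct {
	servers []string
	current int
}

// newServerPicker copies the server list and starts with the first entry.
func newServerPicker(servers []string) (*serverPicker, error) {
	if len(servers) == 0 {
		return nil, errors.New("no time servers configured")
	}

	return &serverPicker{servers: append([]string(nil), servers...)}, nil
}

// pick returns the server to query; it does not change between calls.
func (p *serverPicker) pick() string {
	return p.servers[p.current]
}

// reportError moves on to the next server (assumed round-robin order).
func (p *serverPicker) reportError() {
	p.current = (p.current + 1) % len(p.servers)
}

func main() {
	p, err := newServerPicker([]string{"pool.ntp.org", "time.example.org"})
	if err != nil {
		panic(err)
	}

	fmt.Println(p.pick()) // same server on every successful query
	fmt.Println(p.pick())

	p.reportError() // the query failed, so switch servers
	fmt.Println(p.pick())
}
```

Staying on one server avoids mixing offsets measured against different
clocks, which is the accuracy benefit the list refers to.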
Features to implement later:
* apid shouldn't depend on timed, it should be started early and it
should regenerate certs on time jump
* trustd should be updated in same way
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
See https://github.com/talos-systems/os-runtime/pull/12 for new naming
conventions.
No functional changes.
Additionally implements printing extra columns in `talosctl get xyz`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #3062
There's no user-visible change in this PR.
It carefully separates generated secrets (e.g. certs) from source
secrets from the config (e.g. CAs), so that certs are generated on
config changes which actually affect cert input.
And same way separates etcd and Kubernetes PKI, so if etcd CA got
changed, only etcd certs will be regenerated.
This should have noticeable impact with RSA-based PKI as it reduces
number of times PKI gets generated.
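The idea of regenerating certs only when their inputs change can be
sketched with an input fingerprint, as below. The `certInputs` and
`certCache` types are hypothetical, and comparing SHA-256 fingerprints
is an assumption made for illustration; in the controller runtime the
same effect would more likely come from watching separate resources.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// certInputs is a hypothetical set of values that affect a generated cert.
type certInputs struct {
	CA   []byte
	SANs []string
}

// fingerprint hashes the inputs; a zero byte separates the fields.
func fingerprint(in certInputs) [32]byte {
	h := sha256.New()
	h.Write(in.CA)

	for _, san := range in.SANs {
		h.Write([]byte{0})
		h.Write([]byte(san))
	}

	var out [32]byte
	copy(out[:], h.Sum(nil))

	return out
}

// certCache remembers the fingerprint of the last generation.
type certCache struct {
	last      [32]byte
	valid     bool
	generated int
}

// reconcile regenerates only when the inputs changed; it reports whether
// a regeneration happened.
func (c *certCache) reconcile(in certInputs) bool {
	fp := fingerprint(in)
	if c.valid && fp == c.last {
		return false
	}

	c.last, c.valid = fp, true
	c.generated++ // the expensive key and cert generation would go here

	return true
}

func main() {
	var etcd certCache

	in := certInputs{CA: []byte("etcd-ca-v1"), SANs: []string{"172.20.0.2"}}

	fmt.Println(etcd.reconcile(in)) // first run generates
	fmt.Println(etcd.reconcile(in)) // unrelated config change: nothing to do

	in.CA = []byte("etcd-ca-v2")
	fmt.Println(etcd.reconcile(in)) // the CA changed, so regenerate
}
```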
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Certificate generation depends on the current time. This bug is visible
on the RPi, which doesn't have an RTC: controllers can generate certs
before `timed` does its initial sync, creating certs which are not
usable.
The fix introduces a new intermediate resource `TimeSync` which tracks
time sync status (it aggregates `timed` service status and whether
`timed` is enabled or disabled in the config).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Control plane components are running as static pods managed by the
kubelets.
Whole subsystem is managed via resources/controllers from os-runtime.
Many supporting changes/refactoring to enable new code paths.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This brings in `os-runtime` package and exposes resources with first
iteration of read-only API.
Two Talos resources (and one controller) are implemented:
* legacy.Service resource tracks Talos 'service' `RUNNING` state
* config.V1Alpha1 stores current runtime config
Glue point between existing runtime and new os-runtime based runtime is
in `v1alpha2` implementation and `V1Alpha2()` sub-interfaces of existing
`Runtime`, `State`, `Controller` interfaces.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>