28 Commits

Author SHA1 Message Date
Andrey Smirnov
caee24bf61
feat: implement KubeSpan identity controller
Fixes #4138

When KubeSpan is enabled, Talos automatically generates or loads
KubeSpan identity which consists of Wireguard key pair. ULA address is
calculated based on ClusterID and first NIC MAC address.

Some code was borrowed from #3577.

Example:

```
$ talosctl -n 172.20.0.2 get ksi
NODE         NAMESPACE   TYPE               ID      VERSION   ADDRESS                                       PUBLICKEY
172.20.0.2   kubespan    KubeSpanIdentity   local   1         fd71:6e1d:86be:6302:e871:1bff:feb2:ccee/128   Oak2fBEWngBhwslBxDVgnRNHXs88OAp4kjroSX0uqUE=
```

Additional changes:

* `--with-kubespan` flag for `talosctl cluster create` for quick testing
* validate that cluster discovery (and KubeSpan) requires ClusterID and
ClusterSecret.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Co-authored-by: Seán C McCord <ulexus@gmail.com>
2021-08-27 18:49:15 +03:00
Andrey Smirnov
7f22879af0
feat: provide random node identity
Fixes #4137

Node identity is established when `STATE` partition is mounted, and
cached there. Node identity will be used for the cluster discovery
process to identify each node of the cluster.

Random 32 bytes encoded via base62 are used as node identity.

`base62` uses only URL-safe characters which might save us some trouble
later.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-08-26 16:16:47 +03:00
Andrey Smirnov
83fdb7721f
feat: provide first NIC hardware addr as a resource
This will be used to derive e.g. KubeSpan address.

Fixes #4132

Example:

```
$ talosctl -n 172.20.0.2 get hardwareaddresses -o yaml
node: 172.20.0.2
metadata:
    namespace: network
    type: HardwareAddresses.net.talos.dev
    id: first
    version: 1
    owner: network.HardwareAddrController
    phase: running
    created: 2021-08-24T20:30:43Z
    updated: 2021-08-24T20:30:43Z
spec:
    name: eth0
    hardwareAddr: 6a:2b:bd:b2:fc:e0
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-08-25 17:11:24 +03:00
Artem Chernyshev
e08b4f8f9e feat: implement sysctl controllers
Fixed: https://github.com/talos-systems/talos/issues/3686

Replaced sequencer tasks for KSPP and Kubernetes required sysctl props
by the ones set by controllers.

KernelParam flow includes of 3 controllers and 2 resources:
- `KernelParamConfigController` - handles user sysctls coming from v1alpha1
config.
- `KernelParamDefaultsController` - handles our built-in KSPP and K8s
required sysctls.
- `KernelParamSpecController` - consumes `KernelParamSpec`s created by the
previous two controllers, applies them and updates the corresponding
`KernelParamStatus`.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-08-10 13:21:49 -07:00
Andrey Smirnov
0c7ce1cd81 feat: remove remnants of bootkube support
Fixes #3951

Bootkube support was removed in Talos 0.9. Talos versions 0.9-0.11
support conversion of self-hosted bootkube-based control plane to the
new style control plane running as static pods managed by Talos.

This commit removes all backwards compatibility and removes conversion
code.

For the k8s controllers, `BootstrapStatus` is removed and a dependency
on `etcd` service status is added (as it was implicitly there via
`BootstrapStatus`).

Remove control plane conversion code.

In k8s upgrade code, remove self-hosted part.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-08-03 07:55:42 -07:00
Andrey Smirnov
d8c2bca1b5 feat: reimplement apid certificate generation on top of COSI
This PR can be split into two parts:

* controllers
* apid binding into COSI world

Controllers
-----------

* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a bit
different way for workers and control plane nodes: controlplane nodes
generate directly, while workers reach out to `trustd` on control plane
nodes via `k8s.Endpoint` resource

apid Binding
------------

Resource `secrets.API` provides binding to protobuf by converting
itself back and forth to protobuf spec.

apid no longer receives machine configuration, instead it receives
gRPC-backed socket to access Resource API. apid watches `secrets.API`
resource, fetches certs and CA from it and uses that in its TLS
configuration.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-23 13:07:00 -07:00
Andrey Smirnov
3aae94e530 feat: provide Kubernetes nodename as a COSI resource
This changes the way Kubernetes nodename is computed: it is set by the
controller based on the hostname and machine configuration, and pulled
from the resource when needed.

Kubelet client now also uses nodename to fix the certifcate mismatch
issue on AWS.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-18 19:58:19 +03:00
Andrey Smirnov
96f89071c3 feat: update controller-runtime logs to console level on config.debug
This PR enables debug logging of controller-runtime to the server
console if the machine configuration .debug field is set to true (log
verbosity can be also changed on the fly).

For control plane nodes, don't send kubelet logs to the console (as they
tend to flood the console with messages), this leaves reasonable amount
of logging for early boot troubleshooting.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-16 08:41:58 -07:00
Andrey Smirnov
f2ae9cd0c1 feat: replace networkd with new network implementation
This removes networkd, updates network ready condition, enables all the
controllers which were previously disabled.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-15 17:37:28 -07:00
Andrey Smirnov
11e258b150 feat: implement operator configuration controller
This controller based on machine configuration, cmdline, defaults,
produces configuration for the operators - what operators should run,
what are the parameters for the operators, etc.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-10 11:17:41 -07:00
Andrey Smirnov
02bd657b25 feat: implement network.Status resource and controller
This resource holds aggregated network status which can be easily used
in various places to wait for the network to reach some desired state.

The state checks are simple right now, we might improve the logic to
make sure all the configured network subsystems reached defined state,
but this might come later as we refine the logic (e.g. to make sure that
all static configuration got applied, etc.)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-09 11:10:08 -07:00
Andrey Smirnov
e74f789b01 feat: implement EtcFileController to render files in /etc
This implements two controllers: one which generates templates for
`/etc/hosts` and `/etc/resolv.config`, and another generic controller
which renders files to `/etc` via shadow bind mounts.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-07 13:42:44 -07:00
Andrey Smirnov
8b8de11d9f feat: implement new controllers for hostname, resolvers and time servers
This is next part of networkd rewrite.

This implements three new resource types coupled with controllers which
process the default configuration, merges and applying changes.

TimeSync was set up to watch the time servers resource. This is a no-op
for now, but once DHCP is implemented, this would enable time server
configuration coming from DHCP.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-05 12:41:14 -07:00
Artem Chernyshev
76aac4bb25 feat: implement CPU and Memory stats controller
Now the latest value for CPU and Memory is also represented as COSI
resources.

Was going back and forth in the implementation but in the end decided to
use dedicated yaml structures for both CPU and Memory stats because:

- JSON tags are ignored by `go-yaml`, so the output is not really great.
- protobuf Talos definition contains fields which we don't really need
in the YAML output of `talosctl get`.
- current state of Talos resource service does not support protobuf
encoding for resources.

So the plan for Theila is to just use the structure as a dynamic object
without relying on protobufs. At least for now.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-06-03 10:19:27 -07:00
Andrey Smirnov
ed10e139c1 feat: implement NodeAddress controller
This controller provides three important aggregated resources to be
consumed by different interested parties:

* "default" node IP
* "current" addresses (node can be reached on these at the moment)
* "accumulative" addresses (for certSANs)

Example:

```
$ talosctl get nodeaddresses -n 172.20.0.2
NODE         NAMESPACE   TYPE          ID             VERSION   ADDRESSES
172.20.0.2   network     NodeAddress   accumulative   4         ["10.244.0.0","10.244.0.1","172.20.0.2"]
172.20.0.2   network     NodeAddress   current        6         ["10.244.0.0","10.244.0.1","172.20.0.2"]
172.20.0.2   network     NodeAddress   default        1         ["172.20.0.2"]
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-02 14:03:59 -07:00
Andrey Smirnov
5811f4dda1 feat: implement link (interface) controllers
The structure of the controllers is really similar to addresses and
routes:

* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state

Controller `LinkStatus` (which was implemented before) watches the
kernel state and publishes current link status.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-01 09:36:25 -07:00
Andrey Smirnov
0acb04ad7a feat: implement route network controllers
Route handling is very similar to addresses:

* `RouteStatus` describes kernel routing table state,
`RouteStatusController` reflects kernel state into resources
* `RouteSpec` defines routes to be configured
* `RouteConfigController` creates `RouteSpec`s based on cmdline and
machine configuration
* `RouteMergeController` merges different configuration layers into the
final representation
* `RouteSpecController` applies the specs to the kernel routing table

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-25 11:09:21 -07:00
Artem Chernyshev
1db301edf6 feat: switch controller-runtime to zap.Logger
Enable logging using default development config with some fine tuning.
Additionally, now `info` and below logs go to kmsg.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-05-25 02:15:31 -07:00
Andrey Smirnov
071f044562 feat: implement AddressSpec handling
This includes multiple controllers responsible for different stages of
`AddressSpec` conversion:

* `AddressConfigController` produces initial unmerged configuration from
multiple sources (more sources coming later, e.g. DHCP)
* `AddressMergeController` merges address configuration into final
representation
* `AddressSpecController` syncs resources with kernel state

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-14 02:27:12 -07:00
Andrey Smirnov
db9c35b570 feat: implement AddressStatusController
This controller queries addresses of all the interfaces in the system
and presents them as resources. The idea is that can be a source for
many decisions - e.g. whether network is ready (physical interface has
scope global address assigned).

This is also good for debugging purposes.

Examples:

```
$ talosctl -n 172.20.0.2 get addresses
NODE         NAMESPACE   TYPE            ID                                          VERSION
172.20.0.2   network     AddressStatus   cni0/10.244.0.1/24                          1
172.20.0.2   network     AddressStatus   cni0/fe80::9c87:cdff:fe8e:5fdc/64           2
172.20.0.2   network     AddressStatus   eth0/172.20.0.2/24                          1
172.20.0.2   network     AddressStatus   eth0/fe80::ac1b:9cff:fe19:6b47/64           2
172.20.0.2   network     AddressStatus   flannel.1/10.244.0.0/32                     1
172.20.0.2   network     AddressStatus   flannel.1/fe80::440b:67ff:fe99:c18f/64      2
172.20.0.2   network     AddressStatus   lo/127.0.0.1/8                              1
172.20.0.2   network     AddressStatus   lo/::1/128                                  1
172.20.0.2   network     AddressStatus   veth178e9b31/fe80::6040:1dff:fe5b:ae1a/64   2
172.20.0.2   network     AddressStatus   vethb0b96a94/fe80::2473:86ff:fece:1954/64   2
```

```
$ talosctl -n 172.20.0.2 get addresses -o yaml eth0/172.20.0.2/24
node: 172.20.0.2
metadata:
    namespace: network
    type: AddressStatuses.net.talos.dev
    id: eth0/172.20.0.2/24
    version: 1
    owner: network.AddressStatusController
    phase: running
spec:
    address: 172.20.0.2/24
    local: 172.20.0.2
    broadcast: 172.20.0.255
    linkIndex: 4
    linkName: eth0
    family: inet4
    scope: global
    flags: permanent
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-11 13:32:17 -07:00
Andrey Smirnov
3c1213596c feat: implement LinkStatusController
This is the first PR of many which implement new COSI network
configuration. This controller provides low-level status of the network
interfaces (links) not touching on the addresses of the interface.

The information gathered resembles output of `ip link show` command.

Examples:

```
$ talosctl -n 172.20.0.2 get links
NODE         NAMESPACE   TYPE         ID             VERSION   TYPE       KIND     HW ADDR                                           OPER STATE   LINK STATE
172.20.0.2   net         LinkStatus   bond0          1         ether      bond     fe:c4:d6:4c:04:05                                 down         false
172.20.0.2   net         LinkStatus   cni0           5         ether      bridge   22:cc:25:7e:64:19                                 up           true
172.20.0.2   net         LinkStatus   dummy0         1         ether      dummy    0e:f6:f3:ef:53:29                                 down         false
172.20.0.2   net         LinkStatus   eth0           4         ether               ae:1b:9c:19:6b:47                                 up           true
172.20.0.2   net         LinkStatus   flannel.1      2         ether      vxlan    be:c5:4f:eb:da:5c                                 unknown      true
172.20.0.2   net         LinkStatus   ip6tnl0        1         tunnel6    ip6tnl   00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00   down         false
172.20.0.2   net         LinkStatus   lo             4         loopback            00:00:00:00:00:00                                 unknown      true
172.20.0.2   net         LinkStatus   sit0           1         sit        sit      00:00:00:00                                       down         false
172.20.0.2   net         LinkStatus   teql0          1         void                                                                  down         false
172.20.0.2   net         LinkStatus   tunl0          1         ipip       ipip     00:00:00:00                                       down         false
172.20.0.2   net         LinkStatus   veth1c1422df   2         ether      veth     6a:2d:68:be:8e:8f                                 up           true
172.20.0.2   net         LinkStatus   veth2ce7ce8d   1         ether      veth     52:fc:98:82:f7:29                                 up           true
```

```
$ talosctl -n 172.20.0.2 get links eth0 -o yaml
node: 172.20.0.2
metadata:
    namespace: net
    type: LinkStatuses.net.talos.dev
    id: eth0
    version: 4
    owner: network.LinkStatusController
    phase: running
spec:
    index: 4
    type: ether
    linkIndex: 0
    flags: UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP
    hardwareAddr: ae:1b:9c:19:6b:47
    broadcastAddr: ff:ff:ff:ff:ff:ff
    mtu: 1500
    queueDisc: pfifo_fast
    operationalState: up
    kind: ""
    slaveKind: ""
    linkState: true
    speedMbit: 4294967295
    port: Other
    duplex: Unknown
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-07 10:08:21 -07:00
Andrey Smirnov
d24df8f844 chore: re-import talos-systems/os-runtime as cosi-project/runtime
No changes, just import path change (as project got moved).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-12 07:44:24 -07:00
Andrey Smirnov
2ea20f598a feat: replace timed with time sync controller
This is a complete rewrite of time sync process.

Now the time sync process starts early at boot time, and it adapts to
configuration changes:

* before config is available, `pool.ntp.org` is used
* once config is available, configured time servers are used

Controller updates same time sync resource as other controllers had
dependency on, so they have a chance to wait for the time sync event.

Talos services which depend on time now wait on same resource instead of
waiting on timed health.

New features:

* time sync now sticks to the particular time server unless there's an
error from that server, and server is changed in that case, this
improves time sync accuracy

* time sync acts on config changes immediately, so it's possible to
reconfigure time sync at any time

* there's a new 'epoch' field in time sync resources which allows
time-dependent controllers to regenerate certs when there's a big enough
jump in time

Features to implement later:

* apid shouldn't depend on timed, it should be started early and it
should regenerate certs on time jump

* trustd should be updated in same way

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-29 09:29:43 -07:00
Andrey Smirnov
60aa011c7a feat: rename namespaces, resources, types etc
See https://github.com/talos-systems/os-runtime/pull/12 for new mnaming
conventions.

No functional changes.

Additionally implements printing extra columns in `talosctl get xyz`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-02 13:34:15 -08:00
Andrey Smirnov
b914398154 refactor: split kubernetes/etcd resource generation into subresources
Fixes #3062

There's no user-visible change in this PR.

It carefully separates generated secrets (e.g. certs) from source
secrets from the config (e.g. CAs), so that certs are generated on
config changes which actually affect cert input.

And same way separates etcd and Kubernetes PKI, so if etcd CA got
changed, only etcd certs will be regenerated.

This should have noticeable impact with RSA-based PKI as it reduces
number of times PKI gets generated.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-18 22:01:28 -08:00
Andrey Smirnov
85ae9f75e9 fix: wait for time sync before generating Kubernetes certificates
Certificate generation depends on current time, and this bug is visible
on RPi which doesn't have RTC clock - controllers can generate certs
before `timed` does its initial sync creating certs which are not
usable.

Fix generates new intermediate resource `TimeSync` which tracks time
sync status (aggregates `timed` service status and `timed`
enabled/disabled in the config).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-09 10:01:19 -08:00
Andrey Smirnov
0aaf8fa968 feat: replace bootkube with Talos-managed control plane
Control plane components are running as static pods managed by the
kubelets.

Whole subsystem is managed via resources/controllers from os-runtime.

Many supporting changes/refactoring to enable new code paths.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-26 14:22:35 -08:00
Andrey Smirnov
11863dd74d feat: implement resource API in Talos
This brings in `os-runtime` package and exposes resources with first
iteration of read-only API.

Two Talos resources (and one controller) are implemented:

* legacy.Service resource tracks Talos 'service' `RUNNING` state
* config.V1Alpha1 stores current runtime config

Glue point between existing runtime and new os-runtime based runtime is
in `v1alpha2` implementation and `V1Alpha2()` sub-interfaces of existing
`Runtime`, `State`, `Controller` interfaces.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-19 11:45:46 -08:00