24 Commits

Author SHA1 Message Date
Andrey Smirnov
8eacc4ba80
feat: support rotation of Talos API CA
This allows to roll all nodes to use a new CA, to refresh it, or e.g.
when the `talosconfig` was exposed accidentally.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-22 12:16:47 +04:00
Dmitriy Matrenichev
fa3b933705
chore: replace fmt.Errorf with errors.New where possible
This time use `eg` from `x/tools` repo tool to do this.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-14 17:39:30 +03:00
Andrey Smirnov
c3e4182000
refactor: use COSI runtime with new controller runtime DB
See https://github.com/cosi-project/runtime/pull/336

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 19:44:44 +04:00
Andrey Smirnov
badbc51e63
refactor: rewrite code to include preliminary support for multi-doc
`config.Container` implements a multi-doc container which implements
both `Container` interface (encoding, validation, etc.), and `Conifg`
interface (accessing parts of the config).

Refactor `generate` and `bundle` packages to support multi-doc, and
provide backwards compatibility.

Implement a first (mostly example) machine config document for
SideroLink API URL.

Many places don't properly support multi-doc yet (e.g. config patches).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-31 18:38:05 +04:00
Andrey Smirnov
80fed31940
feat: include Kubernetes controlplane endpoint as one of the endpoints
These endpoints are used for workers to find the addresses of the
controlplane nodes to connect to `trustd` to issue certificates of
`apid`.

These endpoints today come from two sources:

* discovery service data
* Kubernetes API server endpoints

This PR adds to the list static entry based on the Kubernetes control
plane endpoint in the machine config.

E.g. if the loadbalancer is used for the controlplane endpoint, and that
loadbalancer also proxies requests for port 50001 (trustd), this static
endpoint will provide workers with connectivity to trustd even if the
discovery service is disabled, and Kubernetes API is not up.

If this endpoint doesn't provide any trustd API, Talos will still try
other endpoints.

Talos does server certificate validation when calling trustd,
so including malicious endpoints doesn't cause any harm, as malicious
endpoint can't provider proper server certificate.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-01-03 21:33:18 +04:00
Andrey Smirnov
a505b8909a
fix: update COSI and reset restart backoff on success
See https://github.com/cosi-project/runtime/pull/191

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-12-06 17:43:26 +04:00
Andrey Smirnov
96aa9638f7
chore: rename talos-systems/talos to siderolabs/talos
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-03 16:50:32 +04:00
Andrey Smirnov
2dadcd6695
fix: stop worker nodes from acting as apid routers
Don't allow worker nodes to act as apid routers:

* don't try to issue client certificate for apid on worker nodes
* if worker nodes receives incoming connections with `--nodes` set to
  one of the local addresses of the nodd, it routes the request to
  itself without proxying

Second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-13 15:07:31 +04:00
Andrey Smirnov
f62d17125b
chore: update crypto to use new import path siderolabs/crypto
No functional changes in this PR, just updating import paths.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-07 23:02:50 +04:00
Andrey Smirnov
92314e47bf
refactor: use controllers/resources to feed trustd with data
This is mostly same as the way `apid` consumes certificates generated by
`machined` via COSI API connection.

Service `trustd` consumes two resources:

* `secrets.Trustd` which contains `trustd` server TLS certificates and
  it gets refreshed as e.g. node IP changes
* `secrets.OSRoot` which contains Talos API CA and join token

This PR fixes an issue with `trustd` certs not always including all IPs
of the node, as previously `trustd` certs will only capture addresses of
the node at the moment of `trustd` startup.

Another thing is that refactoring allows to dynamically change API CA
and join token. This needs more work, but `trustd` should now pick up
changes without any additional changes.

Fixes #5863

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-04 23:45:34 +04:00
Dmitriy Matrenichev
6351928611
chore: redo pointer with github.com/siderolabs/go-pointer module
With the advent of generics, redo pointer functionality and remove github.com/AlekSi/pointer dependency.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-05-02 02:17:13 +04:00
Dmitriy Matrenichev
e06e1473b0
feat: update golangci-lint to 1.45.0 and gofumpt to 0.3.0
- Update golangci-lint to 1.45.0
- Update gofumpt to 0.3.0
- Fix gofumpt errors
- Add goimports and format imports since gofumports is removed
- Update Dockerfile
- Fix .golangci.yml configuration
- Fix linting errors

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-24 08:14:04 +04:00
Andrey Smirnov
753a82188f
refactor: move pkg/resources to machinery
Fixes #4420

No functional changes, just moving packages around.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-11-15 19:50:35 +03:00
Andrey Smirnov
8329d21114
chore: split polymorphic RootSecret resource into specific types
Fixes #4418

Only one resource (one of the very first ones) was polymorphic: its
actual spec type depends on its ID. This was a bad idea, and it doesn't
work with protobuf specs (as type <> protobuf relationship can't be
established).

Refactor this by splitting into three separate resource types:
`OSRoot` (OS-level root secrets), `EtcdRoot` (for etcd),
`KubernetesRoot` (for Kubernetes).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-27 19:56:04 +03:00
Andrey Smirnov
12f7888b75
feat: feed control plane endpoints on workers from cluster discovery
Fixes #4231

This allows much faster worker join on `apid` level without waiting for
Kubernetes control plane to be up (and even if the Kubernetes control
plane doesn't go up or the kubelet isn't up).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-18 18:49:29 +03:00
Andrey Smirnov
0f60ef6d38
fix: reset inputs back to initial state in secrets.APIController
This fixes a bug when after an error generating certificates controller
gets into a state of not being able to read its expected inputs
(NetworkStatus specifically).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-29 19:57:51 +03:00
Andrey Smirnov
62acd62516
fix: check trustd API CA on worker nodes
This distributes API CA (just the certificate, not the key) to the
worker nodes on config generation, and if the CA cert is present on the
worker node, it verifies TLS connection to the trustd with the CA
certificate.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 15:14:23 +03:00
Andrey Smirnov
c3b2429ce9
fix: suppress spurious Kubernetes API server cert updates
With the last changes, `kube-apiserver` certificates are generated based
on the assigned `NodeAdresses`, machine configuration, etc. Whenver the
certificate is regenerated, `kube-apiserver` is reloaded to pick up the
new cert.

With Virtual IP enabled, Virtual IP address is included into the
certificate from the beginning as it is specified in the machine
configuration, but as virtual IP moves between the nodes this causes
`NodeAddresses` update, which triggers the controller, generates new
certs and reloads `kube-apiserver` at bad time (right after VIP got
moved). Even though the cert generated is identical to the previous one,
the API server reload makes it unavailable for 30-90 seconds.

This change extracts `CertSANs` as a separate resource so that its
updates are suppressed if the CertSANs sources change, but the final
list stays the same, and in turn prevents final certificate from being
updated.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-09 00:31:54 +03:00
Andrey Smirnov
2c66e1b3c5
feat: provide building of local Affiliate structure (for the node)
Fixes #4139

This builds the local (for the node) `Affiliate` structure which
describes node for the cluster discovery. Dependending on the
configuration, KubeSpan information might be included as well.

`NodeAddresses` were updated to hold CIDRs instead of simple IPs.

The `Affiliate` will be pushed to the registries, while `Affiliate`s for
other nodes will be fetched back from the registries.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-03 16:44:19 +03:00
Andrey Smirnov
0b347570a7
feat: use dynamic NodeAddresses/HostnameStatus in Kubernetes certs
This is a PR on a path towards removing `ApplyDynamicConfig`.

This fixes Kubernetes API server certificate generation to use dynamic
data to generate cert with proper SANs for IPs of the node.

As part of that refactored a bit apid certificate generation (without
any changes).

Added two unit-tests for apid and Kubernetes certificate generation.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-09-01 20:56:53 +03:00
Andrey Smirnov
6956edd0bf
feat: add node address filters, filter out k8s addresses for Talos API
This implements abstract `NodeAddressFilter` which can be attached to
build additional `NodeAddress` resources filtering out some entries from
the complete list.

Kubernetes creates two filters to get all node IPs without Kubernetes
CIDRs and vice versa, only Kubernetes CIDRs.

API certificate generation now considers all addresses minus K8s CIDRs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-08-27 23:46:39 +03:00
Alexey Palazhchenko
eea750de2c chore: rename "join" type to "worker"
Closes #3413.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-07-09 07:10:45 -07:00
Andrey Smirnov
07fb61e5d2 fix: issue worker apid certs properly on renewal
This fixes endless block on RemoteGenerator.Close method rewriting the
RemoteGenerator using the retry package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-29 09:02:35 -07:00
Andrey Smirnov
d8c2bca1b5 feat: reimplement apid certificate generation on top of COSI
This PR can be split into two parts:

* controllers
* apid binding into COSI world

Controllers
-----------

* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a bit
different way for workers and control plane nodes: controlplane nodes
generate directly, while workers reach out to `trustd` on control plane
nodes via `k8s.Endpoint` resource

apid Binding
------------

Resource `secrets.API` provides binding to protobuf by converting
itself back and forth to protobuf spec.

apid no longer receives machine configuration, instead it receives
gRPC-backed socket to access Resource API. apid watches `secrets.API`
resource, fetches certs and CA from it and uses that in its TLS
configuration.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-23 13:07:00 -07:00