This allows rolling all nodes to a new CA, e.g. to refresh it or when the
`talosconfig` was accidentally exposed.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
`config.Container` is a multi-doc container which implements both the
`Container` interface (encoding, validation, etc.) and the `Config`
interface (accessing parts of the config).
Refactor `generate` and `bundle` packages to support multi-doc, and
provide backwards compatibility.
Implement a first (mostly an example) machine config document, for the
SideroLink API URL.
Many places don't properly support multi-doc yet (e.g. config patches).
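A rough sketch of the multi-doc idea (not the actual `config.Container` code; the exact `SideroLinkConfig` document shape and the `gopkg.in/yaml.v3` dependency are assumptions for illustration):

```go
package main

import (
    "fmt"
    "strings"

    "gopkg.in/yaml.v3" // assumed YAML dependency for this sketch
)

// multiDoc is a multi-document machine config: the legacy v1alpha1 document
// followed by a new-style document selected by its `kind`.
const multiDoc = `version: v1alpha1
machine:
  type: worker
---
apiVersion: v1alpha1
kind: SideroLinkConfig
apiUrl: https://siderolink.example.org/
`

func main() {
    dec := yaml.NewDecoder(strings.NewReader(multiDoc))

    for {
        // Only look at the common header fields; everything else is ignored here.
        var doc struct {
            APIVersion string `yaml:"apiVersion"`
            Kind       string `yaml:"kind"`
        }

        if err := dec.Decode(&doc); err != nil {
            break // io.EOF terminates the stream
        }

        if doc.Kind == "" {
            fmt.Println("legacy v1alpha1 document (no kind)")
        } else {
            fmt.Println("document kind:", doc.Kind)
        }
    }
}
```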
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
These endpoints are used by workers to find the addresses of the
control plane nodes, so that they can connect to `trustd` to get
certificates issued for `apid`.
These endpoints today come from two sources:
* discovery service data
* Kubernetes API server endpoints
This PR adds a static entry to the list, based on the Kubernetes control
plane endpoint in the machine config.
E.g. if a load balancer is used for the control plane endpoint, and that
load balancer also proxies requests for port 50001 (trustd), this static
endpoint will provide workers with connectivity to trustd even if the
discovery service is disabled and the Kubernetes API is not up.
If this endpoint doesn't provide any trustd API, Talos will still try
other endpoints.
Talos does server certificate validation when calling trustd,
so including malicious endpoints doesn't cause any harm, as a malicious
endpoint can't provide a proper server certificate.
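A minimal sketch of deriving that static entry, assuming the control plane endpoint is a URL such as `https://lb.example.org:6443` (the helper name is hypothetical, not the actual controller code):

```go
package main

import (
    "fmt"
    "net"
    "net/url"
)

// trustdPort is the port trustd listens on (50001, as mentioned above).
const trustdPort = "50001"

// trustdEndpointFromControlPlaneEndpoint derives a static trustd endpoint
// from the Kubernetes control plane endpoint in the machine config.
func trustdEndpointFromControlPlaneEndpoint(controlPlaneEndpoint string) (string, error) {
    u, err := url.Parse(controlPlaneEndpoint)
    if err != nil {
        return "", err
    }

    // Keep the host (e.g. the load balancer) and swap the port for trustd's.
    return net.JoinHostPort(u.Hostname(), trustdPort), nil
}

func main() {
    endpoint, _ := trustdEndpointFromControlPlaneEndpoint("https://lb.example.org:6443")
    fmt.Println(endpoint) // lb.example.org:50001
}
```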
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
There's a cyclic dependency on the siderolink library, which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Don't allow worker nodes to act as apid routers:
* don't try to issue a client certificate for apid on worker nodes
* if a worker node receives an incoming connection with `--nodes` set to
one of the local addresses of the node, it routes the request to
itself without proxying
The second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.
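A rough sketch of the "route to itself" check (a hypothetical helper, not the actual apid code): the requested `--nodes` value is compared against the node's own addresses.

```go
package main

import (
    "fmt"
    "net"
)

// isLocalAddress reports whether target is one of the node's own addresses,
// in which case apid can serve the request itself instead of proxying it.
func isLocalAddress(target string) (bool, error) {
    ip := net.ParseIP(target)
    if ip == nil {
        return false, fmt.Errorf("not an IP address: %s", target)
    }

    addrs, err := net.InterfaceAddrs()
    if err != nil {
        return false, err
    }

    for _, addr := range addrs {
        ipNet, ok := addr.(*net.IPNet)
        if !ok {
            continue
        }

        if ipNet.IP.Equal(ip) {
            return true, nil
        }
    }

    return false, nil
}

func main() {
    local, _ := isLocalAddress("10.5.0.3")
    fmt.Println("route to self:", local)
}
```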
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is mostly the same as the way `apid` consumes certificates generated by
`machined` via COSI API connection.
Service `trustd` consumes two resources:
* `secrets.Trustd` which contains the `trustd` server TLS certificates and
gets refreshed as, e.g., node IPs change
* `secrets.OSRoot` which contains Talos API CA and join token
This PR fixes an issue with `trustd` certs not always including all IPs
of the node, as previously `trustd` certs would only capture the addresses
of the node at the moment of `trustd` startup.
Another thing is that the refactoring allows dynamically changing the API CA
and join token. This needs more work, but `trustd` should now pick up
such changes without any further modifications.
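A sketch of how a service can pick up refreshed certificates without restarting, assuming the cert/key PEMs come from the watched resource (the provider type here is illustrative, not the actual `trustd` code):

```go
package main

import (
    "crypto/tls"
    "sync"
)

// certProvider holds the latest server certificate; UpdateFromPEM would be
// called whenever the watched secrets resource changes (e.g. node IP change).
type certProvider struct {
    mu   sync.RWMutex
    cert *tls.Certificate
}

// UpdateFromPEM swaps in a freshly issued certificate.
func (p *certProvider) UpdateFromPEM(certPEM, keyPEM []byte) error {
    cert, err := tls.X509KeyPair(certPEM, keyPEM)
    if err != nil {
        return err
    }

    p.mu.Lock()
    p.cert = &cert
    p.mu.Unlock()

    return nil
}

// TLSConfig returns a config which always serves the most recent certificate.
func (p *certProvider) TLSConfig() *tls.Config {
    return &tls.Config{
        GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
            p.mu.RLock()
            defer p.mu.RUnlock()

            return p.cert, nil
        },
    }
}

func main() {
    p := &certProvider{}
    _ = p.TLSConfig() // would be passed to the TLS/gRPC server
}
```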
Fixes #5863
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
With the advent of generics, redo pointer functionality and remove github.com/AlekSi/pointer dependency.
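The generic replacement looks roughly like this (function names are illustrative):

```go
package main

import "fmt"

// To returns a pointer to the given value, replacing the per-type helpers
// from github.com/AlekSi/pointer with a single generic function.
func To[T any](v T) *T {
    return &v
}

// SafeDeref dereferences the pointer, returning the zero value for nil.
func SafeDeref[T any](p *T) T {
    if p == nil {
        var zero T

        return zero
    }

    return *p
}

func main() {
    replicas := To(3) // *int without a temporary variable

    fmt.Println(*replicas, SafeDeref[string](nil)) // 3 ""
}
```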
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Fixes #4418
Only one resource (one of the very first ones) was polymorphic: its
actual spec type depended on its ID. This was a bad idea, and it doesn't
work with protobuf specs (as the type <-> protobuf relationship can't be
established).
Refactor this by splitting into three separate resource types:
`OSRoot` (OS-level root secrets), `EtcdRoot` (for etcd),
`KubernetesRoot` (for Kubernetes).
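Roughly, each resource now has its own concrete spec type instead of one ID-dependent polymorphic spec, which makes a one-to-one type <-> protobuf mapping possible (field names below are illustrative, based on the descriptions above):

```go
package secrets

// OSRootSpec holds OS-level root secrets (Talos API CA and join token).
type OSRootSpec struct {
    CACert    []byte
    CAKey     []byte
    JoinToken string
}

// EtcdRootSpec holds etcd root secrets.
type EtcdRootSpec struct {
    EtcdCACert []byte
    EtcdCAKey  []byte
}

// KubernetesRootSpec holds Kubernetes root secrets.
type KubernetesRootSpec struct {
    KubernetesCACert []byte
    KubernetesCAKey  []byte
    Endpoint         string
}
```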
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4231
This allows much faster worker join at the `apid` level without waiting for
the Kubernetes control plane to be up (and even if the Kubernetes control
plane never comes up or the kubelet isn't up).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This fixes a bug where, after an error generating certificates, the controller
got into a state of not being able to read its expected inputs
(`NetworkStatus` specifically).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This distributes the API CA (just the certificate, not the key) to the
worker nodes on config generation, and if the CA cert is present on the
worker node, it is used to verify the TLS connection to trustd.
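A sketch of the verification side, assuming the distributed CA certificate is available as PEM on the worker (the path and helper name are hypothetical):

```go
package main

import (
    "crypto/tls"
    "crypto/x509"
    "fmt"
    "os"
)

// trustdClientTLSConfig builds a TLS config which verifies the trustd server
// certificate against the distributed API CA certificate (no CA key needed).
func trustdClientTLSConfig(caCertPEM []byte) (*tls.Config, error) {
    pool := x509.NewCertPool()

    if !pool.AppendCertsFromPEM(caCertPEM) {
        return nil, fmt.Errorf("failed to parse CA certificate")
    }

    return &tls.Config{
        RootCAs: pool, // server cert must chain to the API CA
    }, nil
}

func main() {
    caPEM, err := os.ReadFile("/path/to/api-ca.crt") // illustrative path
    if err != nil {
        fmt.Println("no CA cert on this worker, skipping verification setup")

        return
    }

    cfg, err := trustdClientTLSConfig(caPEM)
    if err != nil {
        panic(err)
    }

    _ = cfg // would be used when dialing trustd on port 50001
}
```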
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
With the last changes, `kube-apiserver` certificates are generated based
on the assigned `NodeAddresses`, machine configuration, etc. Whenever the
certificate is regenerated, `kube-apiserver` is reloaded to pick up the
new cert.
With Virtual IP enabled, the Virtual IP address is included in the
certificate from the beginning, as it is specified in the machine
configuration; but as the virtual IP moves between the nodes, this causes
a `NodeAddresses` update, which triggers the controller, generates new
certs and reloads `kube-apiserver` at a bad time (right after the VIP got
moved). Even though the generated cert is identical to the previous one,
the API server reload makes it unavailable for 30-90 seconds.
This change extracts `CertSANs` as a separate resource so that its
updates are suppressed when the CertSANs sources change but the final
list stays the same, which in turn prevents the final certificate from being
updated.
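The suppression idea in a self-contained sketch (not the actual controller code): recompute the SAN list from its sources, but only publish a change when the normalized list actually differs.

```go
package main

import (
    "fmt"
    "sort"
)

// normalize sorts and deduplicates the SAN list so that two equivalent
// inputs compare equal regardless of ordering.
func normalize(sans []string) []string {
    seen := map[string]struct{}{}
    out := []string{}

    for _, s := range sans {
        if _, ok := seen[s]; !ok {
            seen[s] = struct{}{}
            out = append(out, s)
        }
    }

    sort.Strings(out)

    return out
}

func equal(a, b []string) bool {
    if len(a) != len(b) {
        return false
    }

    for i := range a {
        if a[i] != b[i] {
            return false
        }
    }

    return true
}

func main() {
    current := normalize([]string{"10.0.0.5", "10.0.0.100", "cluster.local"})

    // The VIP moved: the address sources changed order, but the final SAN list
    // is the same, so the certificate (and kube-apiserver) is left alone.
    next := normalize([]string{"10.0.0.100", "cluster.local", "10.0.0.5"})

    if equal(current, next) {
        fmt.Println("SANs unchanged: suppress update")
    } else {
        fmt.Println("SANs changed: regenerate certificate and reload kube-apiserver")
    }
}
```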
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4139
This builds the local (for the node) `Affiliate` structure which
describes the node for cluster discovery. Depending on the
configuration, KubeSpan information might be included as well.
`NodeAddresses` were updated to hold CIDRs instead of simple IPs.
The `Affiliate` will be pushed to the registries, while `Affiliate`s for
other nodes will be fetched back from the registries.
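A rough illustration of the shape of the data, using the standard library `net/netip` and hypothetical field names (the real spec lives in the cluster discovery resources):

```go
package main

import (
    "fmt"
    "net/netip"
)

// Affiliate describes this node for cluster discovery (hypothetical shape).
type Affiliate struct {
    NodeID    string
    Hostname  string
    Addresses []netip.Prefix // NodeAddresses now carry CIDRs rather than plain IPs
    KubeSpan  *KubeSpanInfo  // present only when KubeSpan is enabled
}

// KubeSpanInfo is the optional KubeSpan part of the affiliate.
type KubeSpanInfo struct {
    PublicKey string
    Endpoints []netip.AddrPort
}

func main() {
    local := Affiliate{
        NodeID:    "example-node-id",
        Hostname:  "worker-1",
        Addresses: []netip.Prefix{netip.MustParsePrefix("10.5.0.3/24")},
    }

    fmt.Printf("affiliate %s: %v\n", local.Hostname, local.Addresses)
}
```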
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is a PR on the path towards removing `ApplyDynamicConfig`.
This fixes Kubernetes API server certificate generation to use dynamic
data to generate a cert with proper SANs for the IPs of the node.
As part of that, apid certificate generation was refactored a bit (without
any behavior changes).
Added two unit tests, for apid and Kubernetes certificate generation.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This implements an abstract `NodeAddressFilter` which can be attached to
build additional `NodeAddress` resources filtering out some entries from
the complete list.
Kubernetes creates two filters: one with all node IPs excluding the
Kubernetes CIDRs, and vice versa, one with only the Kubernetes CIDRs.
API certificate generation now considers all addresses minus K8s CIDRs.
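A self-contained sketch of that filtering step, using `net/netip` for illustration (the actual filter is wired into the `NodeAddress` controllers):

```go
package main

import (
    "fmt"
    "net/netip"
)

// excludePrefixes drops every address contained in any of the given CIDRs,
// e.g. node IPs minus the Kubernetes pod/service CIDRs for API cert SANs.
func excludePrefixes(addrs []netip.Addr, cidrs []netip.Prefix) []netip.Addr {
    out := []netip.Addr{}

outer:
    for _, addr := range addrs {
        for _, cidr := range cidrs {
            if cidr.Contains(addr) {
                continue outer
            }
        }

        out = append(out, addr)
    }

    return out
}

func main() {
    addrs := []netip.Addr{
        netip.MustParseAddr("10.5.0.3"),   // node IP
        netip.MustParseAddr("10.244.1.1"), // address inside the pod CIDR
    }

    k8sCIDRs := []netip.Prefix{netip.MustParsePrefix("10.244.0.0/16")}

    fmt.Println(excludePrefixes(addrs, k8sCIDRs)) // [10.5.0.3]
}
```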
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This fixes an endless block in the `RemoteGenerator.Close` method by rewriting
the `RemoteGenerator` using the retry package.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR can be split into two parts:
* controllers
* apid binding into COSI world
Controllers
-----------
* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a slightly
different way for workers and control plane nodes: control plane nodes
generate them directly, while workers reach out to `trustd` on the control
plane nodes via the `k8s.Endpoint` resource
apid Binding
------------
The `secrets.API` resource provides the binding to protobuf by converting
itself back and forth to the protobuf spec.
apid no longer receives the machine configuration; instead, it receives a
gRPC-backed socket to access the Resource API. apid watches the `secrets.API`
resource, fetches the certs and CA from it, and uses them in its TLS
configuration.
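A sketch of the consuming side, under the assumption that the watched resource yields CA and server cert/key PEMs (the struct and field names are hypothetical, not the actual apid code):

```go
package main

import (
    "crypto/tls"
    "crypto/x509"
    "fmt"
)

// apiSecrets is the assumed shape of what apid gets from the watched resource.
type apiSecrets struct {
    CACertPEM     []byte
    ServerCertPEM []byte
    ServerKeyPEM  []byte
}

// tlsConfigFromSecrets builds a server TLS config from the watched secrets:
// the server cert/key for apid's own identity and the CA for verifying clients.
func tlsConfigFromSecrets(s apiSecrets) (*tls.Config, error) {
    cert, err := tls.X509KeyPair(s.ServerCertPEM, s.ServerKeyPEM)
    if err != nil {
        return nil, err
    }

    pool := x509.NewCertPool()
    if !pool.AppendCertsFromPEM(s.CACertPEM) {
        return nil, fmt.Errorf("failed to parse CA certificate")
    }

    return &tls.Config{
        Certificates: []tls.Certificate{cert},
        ClientCAs:    pool,
        ClientAuth:   tls.RequireAndVerifyClientCert,
    }, nil
}

func main() {
    // In the real flow this would be driven by a watch on the resource over the
    // gRPC-backed socket; here the secrets are just empty placeholders.
    _, err := tlsConfigFromSecrets(apiSecrets{})

    fmt.Println("expected to fail on empty placeholder secrets:", err != nil)
}
```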
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>