With the advent of generics, redo pointer functionality and remove github.com/AlekSi/pointer dependency.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Talos historically relied on `kubernetes` `Endpoints` resource (which
specifies `kube-apiserver` endpoints) to find other controlplane members
of the cluster to connect to the `etcd` nodes for the cluster (when node
local etcd instance is not up, for example). This method works great,
but it relies on Kubernetes endpoint being up. If the Kubernetes API is
down for whatever reason, or if the loadbalancer malfunctions, endpoints
are not available and join/leave operations don't work.
This PR replaces the endpoints lookup to use the `Endpoints` COSI
resource which is filled in using two methods:
* from the discovery data (if discovery is enabled, default to enabled)
* from the Kubernetes `Endpoints` resource
If the discovery is disabled (or not available), this change does almost
nothing: still Kubernetes is used to discover control plane endpoints,
but as the data persists in memory, even if the Kubernetes control plane
endpoint went down, cached copy will be used to connect to the endpoint.
If the discovery is enabled, Talos can join the etcd cluster immediately
on boot without waiting for Kubernetes to be up on the bootstrap node
which means that Talos cluster initial bootstrap runs in parallel on all
control plane nodes, while previously nodes were waiting for the first
node to finish bootstrap enough to fill in the endpoints data.
As the `etcd` communication is anyways protected with mutual TLS,
there's no risk even if the discovery data is stale or poisoned, as etcd
operations would fail on TLS mismatch.
Most of the changes in this PR actually enable populating Talos
`Endpoints` resource based on the `Kubernetes` `endpoints` resource
using the watch API.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#4231
This allows much faster worker join on `apid` level without waiting for
Kubernetes control plane to be up (and even if the Kubernetes control
plane doesn't go up or the kubelet isn't up).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The problem is that the kubelet kubeconfig gets created early, but the
actual client key and cert files are not written, so controllers spam
with scary errors that the config is not valid. This PR removes those
scary messages as we wait for the kubeconfig to be usable.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The problem is that there's no official way to close Kuberentes client
underlying TCP/HTTP connections. So each time Talos initializes
connection to the control plane endpoint, new client is built, but this
client is never closed, so the connection stays active on the load
balancers, on the API server level, etc. It also eats some resources out
of Talos itself.
We add a way to close underlying connections by using helper from the
Kubernetes client libraries to force close all TCP connections which
should shut down all HTTP/2 connections as well.
Alternative approach might be to cache a client for some time, but many
of the clients are created with temporary PKI, so even cached client
still needs to be closed once it gets stale, and it's not clear how to
recreate a client in case existing one is broken for one reason or
another (and we need to force a re-connection).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR can be split into two parts:
* controllers
* apid binding into COSI world
Controllers
-----------
* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a bit
different way for workers and control plane nodes: controlplane nodes
generate directly, while workers reach out to `trustd` on control plane
nodes via `k8s.Endpoint` resource
apid Binding
------------
Resource `secrets.API` provides binding to protobuf by converting
itself back and forth to protobuf spec.
apid no longer receives machine configuration, instead it receives
gRPC-backed socket to access Resource API. apid watches `secrets.API`
resource, fetches certs and CA from it and uses that in its TLS
configuration.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>