The shared code is being moved out to the
github.com/siderolabs/go-kubernetes library.
The code will be used in Talos and other projects that use the same features.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This switches the last usage of the Kubernetes controlplane endpoint to use
`localhost` (itself) for controlplane nodes.
Worker nodes still use the cluster-wide controlplane endpoint.
This allows controlplane nodes to boot fully even if the controlplane
endpoint (e.g. a loadbalancer) doesn't function.
The process of joining etcd still requires either a discovery service or
a properly functioning controlplane endpoint.
With this fix, Talos controlplane nodes can boot successfully without a
loadbalancer being up, while worker nodes obviously won't join.
This improves Talos behavior in single-node clusters when the controlplane
endpoint is not available: the node will still boot just fine and
function properly.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We add the `nodeLabels` key to the machine config to allow users to add
node labels to the Kubernetes Node object. A controller
reads the nodeLabels from the machine config and applies them via the
Kubernetes API.
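A minimal sketch of how such a controller could apply the labels to the Node object via client-go (the helper name and patch style are illustrative, not the actual Talos controller code):
```go
package nodelabels

import (
	"context"
	"encoding/json"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// applyNodeLabels patches only metadata.labels of the Node, leaving the rest
// of the object untouched.
func applyNodeLabels(ctx context.Context, client kubernetes.Interface, nodeName string, labels map[string]string) error {
	patch, err := json.Marshal(map[string]any{
		"metadata": map[string]any{
			"labels": labels,
		},
	})
	if err != nil {
		return err
	}

	_, err = client.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})

	return err
}
```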
Older versions of talosctl will throw an unknown keys error if `edit mc`
is called on a node with this change.
Fixes #6301
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
There's a cyclic dependency on the siderolink library, which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.
All updates contain no functional changes, just refactorings to adapt to
the new path structure.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
It got broken with the changes to the kubelet now sourcing static pods
from an internal HTTP server.
As we don't want it to be broken, and to make health checks better, add
a new check to make sure the kubelet reports control plane static pods as
running. This, coupled with the API server check, should make it more
thorough.
Also add logging when static pod definitions are updated (such logs were
previously present in the file-based implementation). These logs are very
helpful for troubleshooting.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We add a new CRD, `serviceaccounts.talos.dev` (with `tsa` as its short name), and its controller, which allows users to get a `Secret` containing a short-lived Talosconfig in their namespaces with the roles they need. Additionally, we introduce the `talosctl inject serviceaccount` command to accept a YAML file with Kubernetes manifests and inject them with Talos service accounts so that they can be directly applied to Kubernetes afterwards. If the Talos API access feature is enabled on the Talos side, the injected workloads will be able to talk to the Talos API.
Closes siderolabs/talos#4422.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
When we query the kubelet API to populate the StaticPodStatuses, instead of checking that ownerReferences is empty, we check the value of the "kubernetes.io/config.source" annotation, so we avoid including standalone pods (regular pods that are not part of a ReplicaSet).
We also optimize their fetching by avoiding unmarshaling the fields we do not need.
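A hedged sketch of the annotation-based filter (the accepted values mirror the kubelet's config source names; the helper name is illustrative):
```go
package kubelet

import corev1 "k8s.io/api/core/v1"

// isStaticPod reports whether the pod was sourced from a static pod
// definition (file or HTTP) rather than created via the API server.
func isStaticPod(pod *corev1.Pod) bool {
	source, ok := pod.Annotations["kubernetes.io/config.source"]
	if !ok {
		return false
	}

	// "api" marks regular pods; anything else (e.g. "file", "http") is a
	// static pod managed directly by the kubelet.
	return source != "api"
}
```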
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Talos historically relied on the Kubernetes `Endpoints` resource (which
specifies `kube-apiserver` endpoints) to find other controlplane members
of the cluster in order to connect to the cluster's `etcd` nodes (when the
node-local etcd instance is not up, for example). This method works great,
but it relies on the Kubernetes endpoint being up. If the Kubernetes API is
down for whatever reason, or if the loadbalancer malfunctions, endpoints
are not available and join/leave operations don't work.
This PR replaces the endpoints lookup with the `Endpoints` COSI
resource, which is filled in using two methods:
* from the discovery data (if discovery is enabled; it defaults to enabled)
* from the Kubernetes `Endpoints` resource
If discovery is disabled (or not available), this change does almost
nothing: Kubernetes is still used to discover control plane endpoints,
but as the data persists in memory, the cached copy will be used to
connect to the endpoint even if the Kubernetes control plane endpoint
goes down.
If discovery is enabled, Talos can join the etcd cluster immediately
on boot without waiting for Kubernetes to be up on the bootstrap node,
which means that the initial Talos cluster bootstrap runs in parallel on
all control plane nodes, while previously nodes were waiting for the first
node to finish bootstrapping enough to fill in the endpoints data.
As the `etcd` communication is protected with mutual TLS anyway,
there's no risk even if the discovery data is stale or poisoned, as etcd
operations would fail on a TLS mismatch.
Most of the changes in this PR actually enable populating the Talos
`Endpoints` resource based on the Kubernetes `Endpoints` resource
using the watch API.
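Roughly, the watch side looks like the following sketch (a client-go watch on the built-in `kubernetes` Endpoints object; illustrative, not the actual controller code):
```go
package endpoints

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// watchAPIServerEndpoints watches the "kubernetes" Endpoints object in the
// "default" namespace and reports the discovered API server addresses on
// every update.
func watchAPIServerEndpoints(ctx context.Context, client kubernetes.Interface) error {
	w, err := client.CoreV1().Endpoints("default").Watch(ctx, metav1.ListOptions{
		FieldSelector: "metadata.name=kubernetes",
	})
	if err != nil {
		return err
	}

	defer w.Stop()

	for ev := range w.ResultChan() {
		ep, ok := ev.Object.(*corev1.Endpoints)
		if !ok {
			continue
		}

		for _, subset := range ep.Subsets {
			for _, addr := range subset.Addresses {
				fmt.Println("control plane endpoint:", addr.IP)
			}
		}
	}

	return nil
}
```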
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This fixes `talosctl upgrade-k8s`:
```
Get "https://172.21.0.1:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app+%3D+kube-apiserver": read tcp 172.21.0.1:51416->172.21.0.1:6443: read: connection reset by peer
```
The error happens when the `kube-apiserver` is restarted during the
control plane upgrade, and it should be ignored as a transient error.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4557
When running `reset` for a node which was already deleted from
Kubernetes, we should ignore the failure to cordon and proceed with other
actions.
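A sketch of the intended behavior, assuming a client-go clientset (the helper is illustrative, not the exact Talos code):
```go
package reset

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cordonIfPresent marks the node unschedulable, treating a missing Node
// object (already deleted from Kubernetes) as success.
func cordonIfPresent(ctx context.Context, client kubernetes.Interface, name string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return nil // nothing to cordon, proceed with the rest of reset
	}

	if err != nil {
		return err
	}

	node.Spec.Unschedulable = true

	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})

	return err
}
```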
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes: https://github.com/talos-systems/talos/issues/4065
Get all Talos-generated manifests and apply them, then wait for deployments
to be updated and become ready.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
This is a PR on a path towards removing `ApplyDynamicConfig`.
This fixes Kubernetes API server certificate generation to use dynamic
data to generate the cert with proper SANs for the node's IPs.
As part of that, apid certificate generation was refactored a bit (without
any functional changes).
Added two unit tests for apid and Kubernetes certificate generation.
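For illustration, the SAN handling amounts to something like the template below (the DNS names are the standard kube-apiserver service names; the exact template Talos builds may differ):
```go
package certs

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"net"
	"time"
)

// apiServerCertTemplate builds a serving certificate template that includes
// the node's dynamically discovered IPs as SANs.
func apiServerCertTemplate(nodeIPs []net.IP) *x509.Certificate {
	return &x509.Certificate{
		SerialNumber: big.NewInt(1), // illustrative; real code uses a random serial
		Subject:      pkix.Name{CommonName: "kube-apiserver"},
		DNSNames: []string{
			"kubernetes",
			"kubernetes.default",
			"kubernetes.default.svc",
			"localhost",
		},
		IPAddresses: nodeIPs,
		NotBefore:   time.Now(),
		NotAfter:    time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:    x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
}
```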
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The problem is that there's no official way to close the Kubernetes client's
underlying TCP/HTTP connections. So each time Talos initializes a
connection to the control plane endpoint, a new client is built, but this
client is never closed, so the connection stays active on the load
balancers, at the API server level, etc. It also eats some resources out
of Talos itself.
We add a way to close the underlying connections by using a helper from the
Kubernetes client libraries to force-close all TCP connections, which
should shut down all HTTP/2 connections as well.
An alternative approach might be to cache a client for some time, but many
of the clients are created with temporary PKI, so even a cached client
still needs to be closed once it gets stale, and it's not clear how to
recreate a client in case the existing one is broken for one reason or
another (and we need to force a re-connection).
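A sketch of the connection-closing approach, assuming the client was built from a `rest.Config` (illustrative; the actual helper used may differ):
```go
package kubernetes

import "k8s.io/client-go/rest"

// closeIdleConnections obtains the transport built from the config and
// force-closes its idle connections; for HTTP/2 this tears down the
// connections kept open to the control plane endpoint.
func closeIdleConnections(config *rest.Config) error {
	rt, err := rest.TransportFor(config)
	if err != nil {
		return err
	}

	if closer, ok := rt.(interface{ CloseIdleConnections() }); ok {
		closer.CloseIdleConnections()
	}

	return nil
}
```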
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This should fix an error like:
```
failed to create etcd client: error getting kubernetes endpoints: Unauthorized
```
The problem is that the generated cert was used immediately, so even a
slight time sync issue across nodes might render the cert not (yet)
usable. The cert is generated on one node, but might be used on any other
node (as the request goes via the LB).
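The usual mitigation for this kind of skew is to issue the certificate with `NotBefore` slightly in the past; a sketch (illustrative values, not necessarily what the actual fix does):
```go
package certs

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

// adminCertTemplate backdates NotBefore so that a cert generated on one node
// is immediately valid on another node with a slightly lagging clock.
func adminCertTemplate() *x509.Certificate {
	now := time.Now()

	return &x509.Certificate{
		SerialNumber: big.NewInt(1), // illustrative
		Subject:      pkix.Name{CommonName: "admin"},
		NotBefore:    now.Add(-5 * time.Minute), // tolerate clock skew between nodes
		NotAfter:     now.Add(10 * time.Minute), // short-lived credential
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
}
```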
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This changes the way the Kubernetes nodename is computed: it is set by the
controller based on the hostname and machine configuration, and pulled
from the resource when needed.
The kubelet client now also uses the nodename to fix the certificate
mismatch issue on AWS.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The structure of the controllers is really similar to addresses and
routes:
* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state
The `LinkStatus` controller (which was implemented before) watches the
kernel state and publishes the current link status.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The kubelet might be running either a self-signed cert (by default) or an
API-server-issued cert (signed by the CA). The user might switch between the
two methods, so instead of guessing based on filesystem contents, accept
both the Kubernetes CA and the self-signed cert (if available).
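A sketch of accepting both (the pool-building helper is illustrative; the cert path is the kubelet's default mentioned elsewhere in these notes):
```go
package kubelet

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"os"
)

// kubeletTLSConfig trusts the Kubernetes CA and, if present on disk, the
// kubelet's default self-signed serving certificate, so either setup works.
func kubeletTLSConfig(kubernetesCA []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(kubernetesCA) {
		return nil, errors.New("invalid Kubernetes CA")
	}

	if selfSigned, err := os.ReadFile("/var/lib/kubelet/pki/kubelet.crt"); err == nil {
		pool.AppendCertsFromPEM(selfSigned)
	}

	return &tls.Config{RootCAs: pool}, nil
}
```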
Spotted by @aceat64
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Looks like TLS errors implement the interface, but they are not derived
from `*net.OpError`, so this check should catch more errors.
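A sketch of such a broader check using the interface instead of the concrete type:
```go
package retry

import (
	"errors"
	"net"
)

// isNetworkError matches anything implementing net.Error (including TLS
// errors), not just *net.OpError.
func isNetworkError(err error) bool {
	var netErr net.Error

	return errors.As(err, &netErr)
}
```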
Fixes #3457
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This extracts the function which was used in upgrade/convert flows to retry
transient errors into the main `kubernetes` package, expands it to ignore
timeout errors, and it is now used to retry errors where applicable in
`pkg/kubernetes`.
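A hedged sketch of such a retry predicate (the function name and the exact set of conditions are illustrative, not the real helper):
```go
package kubernetes

import (
	"errors"
	"io"
	"net"
	"syscall"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// retryableError reports whether an error is transient enough to retry:
// API server timeouts, generic network timeouts, and connection resets.
func retryableError(err error) bool {
	if err == nil {
		return false
	}

	if apierrors.IsTimeout(err) || apierrors.IsServerTimeout(err) {
		return true
	}

	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return true
	}

	return errors.Is(err, syscall.ECONNRESET) || errors.Is(err, io.EOF)
}
```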
Fixes #3403
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is required to correctly handle ACPI reboots or forceful reboots
during a sequence that locks the controller.
Additionally, fix the `NoSchedule` untaint when the configuration is changed.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
The critical bug (I believe) was that the drain code entered the loop to
evict the pod after the wait for pod deletion returned success, effectively
evicting the pod once again once it got rescheduled to a different node.
Add a global timeout to prevent the draining code from running forever.
Filter out more pod types which should never be drained.
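A sketch of the two ideas (not the exact drain code): a global deadline for the whole drain, and skipping pod types that should never be evicted (mirror/static pods and DaemonSet-owned pods):
```go
package drain

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// drainNode applies a global deadline and skips pods that must not be evicted.
func drainNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Minute) // illustrative global timeout
	defer cancel()

	pods, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}

	for i := range pods.Items {
		pod := &pods.Items[i]

		if _, isMirror := pod.Annotations[corev1.MirrorPodAnnotationKey]; isMirror {
			continue // static (mirror) pod, managed directly by the kubelet
		}

		if owner := metav1.GetControllerOf(pod); owner != nil && owner.Kind == "DaemonSet" {
			continue // DaemonSet pods are immediately recreated, no point evicting
		}

		// evict the pod and wait for its deletion exactly once here,
		// without re-entering the eviction loop afterwards.
	}

	return nil
}
```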
Fixes #3124
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
If the kubelet is configured to have its certificates issued by the control
plane, the `/var/lib/kubelet/pki/kubelet.crt` file is never created, and the
cluster CA can be used to verify the TLS connection.
Use the k8s `RESTClient` instead of a custom client; this also results in
much more descriptive error messages if an API call fails.
Fix a problem in apid on worker nodes with issued serving certificates:
`/var/lib/kubelet/pki` doesn't exist by the time `apid` starts.
First write the static pods, then try to build the kubelet client: for issued
kubelet serving certificates, the control plane should be up first.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Labels shouldn't be used, as this is not supposed to be used for
filtering pods. Use a proper annotation with a Talos-private prefix.
Add a config-version annotation to track how the static pod propagates up to
the API server (it will be used in control plane upgrades).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Health checks verify that the node list in Kubernetes matches expectations,
but the initial set of nodes for server-side health checks was driven by the
`MasterIPs` function, which returns the list of master endpoints, and that is
not exactly the same as the list of master nodes: the endpoints also include
some healthchecks.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
ECDSA keys are smaller, which decreases the Talos config size; they are more
efficient in terms of key generation, signing, etc., so boot
performance gets better (and config generation as well).
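For illustration, generating a P-256 key is fast and produces a much smaller PEM than RSA:
```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"encoding/pem"
	"fmt"
)

func main() {
	// P-256 key generation takes milliseconds; the marshaled key is a fraction
	// of the size of an RSA key, which keeps the generated config small.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}

	der, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		panic(err)
	}

	fmt.Println(string(pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: der})))
}
```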
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Control plane components now run as static pods managed by the
kubelets.
The whole subsystem is managed via resources/controllers from os-runtime.
Many supporting changes/refactorings were made to enable the new code paths.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
When the bootkube service fails, it can clean up the manifests after itself,
but that only happens if we give it a chance to shut down cleanly. If the
boot sequence times out, `machined` does an emergency reboot and doesn't let
`bootkube` do the cleanup.
So this fix has two parts:
* synchronize boot/bootstrap sequence timeouts with the bootkube asset
timeout;
* clean up bootkube-generated manifests on bootkube service startup.
Also log errors in initial phases like `labelNodeAsMaster` to provide
some feedback on why boot is stuck.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
As the code was looking for an existing taint with `value == true`, it failed
to find the existing taint and tried to add another one, which never
succeeds.
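A sketch of the corrected check (illustrative, not the exact code): match an existing taint by key and effect only, ignoring its value:
```go
package taints

import corev1 "k8s.io/api/core/v1"

// hasTaint matches an existing taint by key and effect only, so an
// already-present taint is found regardless of its value.
func hasTaint(node *corev1.Node, key string, effect corev1.TaintEffect) bool {
	for _, taint := range node.Spec.Taints {
		if taint.Key == key && taint.Effect == effect {
			return true
		}
	}

	return false
}
```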
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This seems to be the preferred way and fixes compatibility with
deployments which don't use `operator: Exists` in tolerations.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Talos 0.8 is going to ship with K8s 1.20.x.
Changes to support the new `control-plane` label;
`upgrade-k8s` supports automated fixups for 1.20.
See also: https://github.com/talos-systems/bootkube-plugin/pull/22
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes were applied automatically.
Import ordering might be questionable, but it's strict:
* stdlib
* other packages
* same package imports
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Add sonobuoy runner code with log fetching on failure. Use a hand-picked
set of e2e tests to run: verify basic pod functionality, verify service
connectivity.
Add the `--run-e2e` option to `talosctl health` to run a quick e2e test to
verify cluster health.
Add an option to run provision tests with a custom CNI, and run one track of
provision tests with Cilium.
Bump Cilium to 1.8.2.
Talos 0.6 won't uncordon a node automatically after an upgrade from 0.5, as
0.5 doesn't set the annotation. Work around that in the upgrade tests.
Bump the upgrade test version to the 0.6.0 release.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under the `pkg/machinery` umbrella.
`pkg/machinery` is published as a Go module inside the Talos repository.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
While waiting for the node ready condition, the API server endpoint might
return networking errors (e.g. if the endpoint is a round-robin DNS record).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This implements the existing server-side health checks as defined in
`internal/pkg/cluster/checks` in the Talos API.
Summary of changes:
* new `cluster` API
* `apid` now listens without auth on a local file socket
* the `cluster` API is for now implemented in `machined`, but we can move it
to a new service if we find it more appropriate
* `talosctl health` by default now does a server-side health check
UX: `talosctl health` without arguments does a health check for the
cluster if it has a healthy K8s cluster to return master/worker nodes. If
needed, the node list can be overridden with flags.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
1. Increase retry timeout.
2. Use timeout per attempt.
3. Check for node readiness as a gate to succeed (a sketch of the readiness check follows).
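A sketch of the readiness gate using the standard Ready condition:
```go
package check

import corev1 "k8s.io/api/core/v1"

// nodeReady reports whether the Node carries a Ready condition with
// status True.
func nodeReady(node *corev1.Node) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status == corev1.ConditionTrue
		}
	}

	return false
}
```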
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Talos will mark a node as schedulable if it was previously cordoned by
Talos (for upgrade, reset, etc.).
If the user marked the node as not schedulable, Talos won't change it on boot.
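A hedged sketch of the decision: only uncordon nodes that carry a Talos-owned marker; the annotation key below is hypothetical and used purely for illustration:
```go
package cordon

import corev1 "k8s.io/api/core/v1"

// talosCordonedAnnotation is a hypothetical key used here for illustration;
// the real marker Talos uses may be named differently.
const talosCordonedAnnotation = "talos.dev/cordoned"

// shouldUncordon returns true only for nodes that Talos itself cordoned,
// leaving user-cordoned nodes untouched.
func shouldUncordon(node *corev1.Node) bool {
	if !node.Spec.Unschedulable {
		return false
	}

	_, cordonedByTalos := node.Annotations[talosCordonedAnnotation]

	return cordonedByTalos
}
```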
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>