Fixes #4557
When running `reset` for a node which was already deleted from
Kubernetes, we should ignore failure to cordon and proceed with other
actions.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit c6a67b8662bb3c6efbe912b19699ace19e70dd3f)
Fixes #4407, fixes #4489
This PR started by enabling simple restart of the `kubelet` service via
services API, but it turned out there's a problem:
When kubelet restarts, CNI is already up, so there is an interface on the
host with a CNI node IP. The code which picks the kubelet node IP finds it
and tries to add it to the list of kubelet node IPs, which completely breaks
kubelet.
The solution was easy: allow node IPs to be filtered out, e.g. we never
want the kubelet node IP to be from the pod CIDR.
But this filtering feature is also useful in other cases, so I added
that as well.
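The filtering idea can be sketched roughly as below. This is a minimal illustration, not the actual Talos controller code: `filterNodeIPs` and the example data are hypothetical names, and only the `net/netip` calls are real Go APIs.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Example data (hypothetical): one host address and one CNI interface
// address that falls inside the pod CIDR.
var (
	exampleAddrs = []netip.Addr{
		netip.MustParseAddr("172.20.0.2"),
		netip.MustParseAddr("10.244.0.1"),
	}
	exampleExcluded = []netip.Prefix{netip.MustParsePrefix("10.244.0.0/16")}
)

// filterNodeIPs returns the addresses which are not contained in any of the
// excluded prefixes. The helper name is an assumption made for this sketch.
func filterNodeIPs(addrs []netip.Addr, excluded []netip.Prefix) []netip.Addr {
	var result []netip.Addr

	for _, addr := range addrs {
		skip := false

		for _, prefix := range excluded {
			if prefix.Contains(addr) {
				skip = true

				break
			}
		}

		if !skip {
			result = append(result, addr)
		}
	}

	return result
}

func main() {
	// Only the host address should remain as a kubelet node IP candidate.
	fmt.Println(filterNodeIPs(exampleAddrs, exampleExcluded))
}
```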
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit a76f6d69dbfdf34e4383dd5d2ee9f8cca4661e87)
We use `unknown` in the machine state file for PXE booted VMs in the
provisioning library.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit e77d81fff31d68f762da3741846f95a6d2303903)
It is needed for advanced use cases like Docker-in-Docker, our CI, etc.
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
(cherry picked from commit 423861cf9f99eaf034a4f0cb243d73d1275c3f38)
Due to the way our crypto library is implemented, it can't generate a
key from CA with ECDSA-SHA256 on older versions of Talos.
Talos >= 0.13: ECDSA-SHA256 with P-256
Talos < 0.13: ECDSA-SHA512 with P-256
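The version split above could be expressed as a small selection helper. This is only a sketch: `signatureAlgorithmFor` and its `major`/`minor` parameters are hypothetical, and only the `crypto/x509` constants are real Go APIs.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// signatureAlgorithmFor picks the signature algorithm for a given Talos
// version. The helper and its parameters are hypothetical.
func signatureAlgorithmFor(major, minor int) x509.SignatureAlgorithm {
	if major > 0 || minor >= 13 {
		// Talos >= 0.13: ECDSA-SHA256 with P-256.
		return x509.ECDSAWithSHA256
	}

	// Talos < 0.13: ECDSA-SHA512 with P-256.
	return x509.ECDSAWithSHA512
}

func main() {
	fmt.Println(signatureAlgorithmFor(0, 12))
	fmt.Println(signatureAlgorithmFor(0, 13))
}
```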
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit 997873b6d3116b59ebb46df66b8aa1cee06df92f)
Previously Talos used ECDSA-SHA512 with a P-256 EC key, which is not a
widely supported combination. Use ECDSA-SHA256 instead.
There's no security benefit to using ECDSA-SHA512 with a P-256 key, and
ECDSA-SHA256 with P-256 is the combination officially supported by the
TLS 1.3 standard.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit 657f7a56b10089e0dc551e178bc85b28d8003243)
This fixes the instability of some of the internal resources, as they
get regenerated as a result of machine config changes. As map iteration
order is not stable, this might cause unexpected static pod definition
regeneration where the only difference is the flag order.
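One common way to get a stable order out of a Go map is to sort the keys before rendering. The sketch below assumes that approach; it is not taken from the commit itself, and `sortedArgs` and the example flags are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// exampleFlags is hypothetical data standing in for static pod arguments.
var exampleFlags = map[string]string{
	"v":            "2",
	"bind-address": "0.0.0.0",
	"cluster-cidr": "10.244.0.0/16",
}

// sortedArgs renders a flag map as "--key=value" arguments in sorted key
// order, so the result does not depend on map iteration order.
func sortedArgs(flags map[string]string) []string {
	keys := make([]string, 0, len(flags))

	for key := range flags {
		keys = append(keys, key)
	}

	sort.Strings(keys)

	args := make([]string, 0, len(keys))

	for _, key := range keys {
		args = append(args, fmt.Sprintf("--%s=%s", key, flags[key]))
	}

	return args
}

func main() {
	fmt.Println(sortedArgs(exampleFlags))
}
```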
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit 91a858b53704ede86392fe3c155ce9ab3c2d406f)
Update component versions, Go module versions.
Add platform tiers to the support matrix.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4243
The idea is to make sure kubelet picks node IP based on filtering by
CIDRs of the node's addresses. The flow is simple - every address is
filtered by subnet and picked if it matches the subnet.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
If Talos is built with `sidero.debug` build tag (`make WITH_DEBUG=1`),
the machine configuration is allowed to use insecure HTTP for the discovery service.
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
Fixes #4250
Each KubeSpan peer observes the endpoint from which every other KubeSpan
peer connected to it. If the peer is behind NAT, the discovered endpoint is
different from the endpoints the node knows about itself (as it punched a
hole in NAT). This discovered endpoint is pushed to the discovery
service so that every other peer can now use that punched hole to talk
to the peer.
If the endpoint observed is actually in the list of the endpoints
reported by the peer itself, discovery service will take care of
deduplicating them and suppressing updates.
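The deduplication behaviour can be pictured with the sketch below. It is an illustration only and not the discovery service's actual code: `mergeEndpoint` and the example endpoints are hypothetical.

```go
package main

import "fmt"

// exampleKnown is hypothetical data: endpoints the peer reported itself.
var exampleKnown = []string{"172.20.0.3:51820"}

// mergeEndpoint adds an observed endpoint to the known list and reports
// whether the list changed. An endpoint that is already known leaves the
// list untouched, so no update needs to be sent out.
func mergeEndpoint(known []string, observed string) ([]string, bool) {
	for _, endpoint := range known {
		if endpoint == observed {
			return known, false
		}
	}

	// Copy before appending so the caller's slice is never modified.
	merged := append(append([]string(nil), known...), observed)

	return merged, true
}

func main() {
	// A NAT-translated endpoint is new, so it gets added.
	merged, changed := mergeEndpoint(exampleKnown, "203.0.113.7:40123")
	fmt.Println(merged, changed)

	// Observing an already reported endpoint changes nothing.
	merged, changed = mergeEndpoint(merged, "172.20.0.3:51820")
	fmt.Println(merged, changed)
}
```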
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Use `argsbuilder` the same way as it's used in services.
Rewrite `kubeProxy` generation code to override default args.
As a consequence of this change, flags no longer have a deterministic
order, as they all come from a single merged map.
Introduced a merge policy in the `ArgsBuilder` to deny overrides for some
arguments and do an additive merge of others.
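A merge policy of this kind might look roughly like the sketch below. The types and names (`mergePolicy`, `mergeArgs`, the example arguments) are hypothetical and do not reflect the real `ArgsBuilder` API; joining additive values with a comma is also an assumption.

```go
package main

import "fmt"

// mergePolicy describes how a user-supplied value is combined with a default.
type mergePolicy int

const (
	policyOverwrite mergePolicy = iota // default: the new value replaces the old one
	policyDenied                       // overriding the argument is an error
	policyAdditive                     // the new value is appended to the old one
)

// Hypothetical defaults and policies used for the example.
var (
	exampleBase = map[string]string{
		"bind-address":  "127.0.0.1",
		"feature-gates": "A=true",
		"v":             "1",
	}
	examplePolicies = map[string]mergePolicy{
		"bind-address":  policyDenied,
		"feature-gates": policyAdditive,
	}
)

// mergeArgs merges extra into a copy of base according to the per-key policy.
func mergeArgs(base, extra map[string]string, policies map[string]mergePolicy) (map[string]string, error) {
	merged := make(map[string]string, len(base)+len(extra))

	for key, value := range base {
		merged[key] = value
	}

	for key, value := range extra {
		switch policies[key] {
		case policyDenied:
			return nil, fmt.Errorf("overriding %q is not allowed", key)
		case policyAdditive:
			if existing, ok := merged[key]; ok && existing != "" {
				merged[key] = existing + "," + value
			} else {
				merged[key] = value
			}
		default:
			merged[key] = value
		}
	}

	return merged, nil
}

func main() {
	merged, err := mergeArgs(exampleBase, map[string]string{"feature-gates": "B=true", "v": "4"}, examplePolicies)
	fmt.Println(merged, err)

	_, err = mergeArgs(exampleBase, map[string]string{"bind-address": "0.0.0.0"}, examplePolicies)
	fmt.Println(err)
}
```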
Fixes: https://github.com/talos-systems/talos/issues/4238
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
This provides integration layer with discovery service to provide
cluster discovery (and transitively KubeSpan peer discovery).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This distributes API CA (just the certificate, not the key) to the
worker nodes on config generation, and if the CA cert is present on the
worker node, it verifies TLS connection to the trustd with the CA
certificate.
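The conditional verification could be sketched as follows. This is not the actual Talos client code: `trustdTLSConfig` is a hypothetical helper, and leaving the connection unverified when no CA is present is an assumption inferred from the description above.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// trustdTLSConfig builds a client TLS config. If a CA certificate (PEM) is
// available, the server certificate is verified against it.
func trustdTLSConfig(caPEM []byte) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}

	if len(caPEM) == 0 {
		// Assumption: without a distributed CA (e.g. an older config),
		// the server certificate is not verified.
		cfg.InsecureSkipVerify = true

		return cfg, nil
	}

	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("failed to parse CA certificate")
	}

	cfg.RootCAs = pool

	return cfg, nil
}

func main() {
	cfg, err := trustdTLSConfig(nil)
	fmt.Println(cfg.InsecureSkipVerify, err)
}
```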
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Talos supports automatic virtual IP for the control plane with pure
layer 2 connectivity. Hetzner Cloud API supports assigning Floating IPs
to the nodes. This PR combines existing virtual IP functionality with calls
to HCloud API to move the IP address on HCloud side to the leader node.
The only thing which should be supplied in the machine configuration is
the Hetzner Cloud API token; every other setting is automatically
discovered by Talos.
Talos supports two types of floating IPs:
* external Floating IP for external network
* server alias IP for local networks
The controlplane can have only one alias on the local network interface.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
That PR contains an example of how fuzz tests can be written with Go 1.18.
It also fixes a few panics with invalid configs.
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
Fixes #4094
Deprecate old networkd APIs, `talosctl interfaces` and `talosctl routes`
now suggest different commands to be used to achieve same task.
TUI installer was updated to stop using Interfaces API.
Those APIs will be completely removed in 0.14.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This basically provides `talosctl get --insecure` in maintenance mode.
Only non-sensitive resources are available (equivalent to having
`os:reader` role in the Talos client certificate).
Changes:
* refactored insecure/maintenance client setup in talosctl
* `LinkStatus` is no longer sensitive as it shows only Wireguard public
key, `LinkSpec` still contains private key for obvious reasons
* maintenance mode injects `os:reader` role implicitly
The motivation behind this PR is to deprecate networkd-era interfaces &
routes APIs which are being used in TUI installer, and we need a
replacement.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This field is not marshalable, as it's technically an interface.
This will be used to save/load SecretsBundle as a whole in the CABPT.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Note: Talos can still be run under `Firecracker`, support for
Firecracker was only removed for `talosctl cluster create`.
Reason:
* code is untested/unmaintained, and probably doesn't work correctly
* firecracker Go SDK pulls lots of dependencies and it blocks CNI Go
module update
Bonus: `talosctl-linux-amd64` shrinks by 2 MiB.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4232
The result:
```
talosctl -n 172.20.0.2 get members
NODE NAMESPACE TYPE ID VERSION HOSTNAME MACHINE TYPE OS ADDRESSES
172.20.0.2 cluster Member talos-default-master-1 2 talos-default-master-1 controlplane Talos (v0.13.0-alpha.0-13-gfdd80a12-dirty) ["172.20.0.2","fdd1:f54:2697:3902:44f8:92ff:fe2e:1aea"]
172.20.0.2 cluster Member talos-default-worker-1 1 talos-default-worker-1 worker Talos (v0.13.0-alpha.0-13-gfdd80a12-dirty) ["172.20.0.3","fdd1:f54:2697:3902:d4ba:55ff:fe8a:f551"]
172.20.0.2 cluster Member talos-default-worker-2 1 talos-default-worker-2 worker Talos (v0.13.0-alpha.0-13-gfdd80a12-dirty) ["172.20.0.4","fdd1:f54:2697:3902:e00d:f4ff:fecf:51c8"]
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The problem is that the kubelet kubeconfig gets created early, but the
actual client key and cert files are not written yet, so controllers spam
the logs with scary errors that the config is not valid. This PR removes
those scary messages while we wait for the kubeconfig to become usable.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This concludes basic KubeSpan implementation.
Most of the code is from #3577 with some fixes and refactoring.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Co-authored-by: Seán C McCord <ulexus@gmail.com>
KubeSpan manager uses list of KubeSpan peers prepared from the discovery
and local KubeSpan identity to set up and update configuration of the
Wireguard interface.
As new peers are getting added or deleted, manager takes care of
updating the Wireguard config.
Manager also keeps track of all peers and their state coming from the
Wireguard link status: whether the connection is up or not, some stats,
last actually used endpoint, etc.
Manager cycles through the available peer endpoints until it finds the
one which works.
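The endpoint cycling can be pictured with the sketch below. It is a simplified illustration, not the manager's real implementation: `endpointCycler` is a hypothetical type, and the `isUp` callback stands in for checking the Wireguard link status of the peer.

```go
package main

import "fmt"

// exampleEndpoints is hypothetical data: candidate endpoints for one peer.
var exampleEndpoints = []string{"10.0.0.5:51820", "172.20.0.6:51820", "192.168.1.7:51820"}

// endpointCycler remembers which endpoint of a peer is currently in use.
type endpointCycler struct {
	endpoints []string
	current   int
}

// pickWorking tries each endpoint at most once, starting from the current
// one, and stops at the first endpoint for which isUp reports true.
func (c *endpointCycler) pickWorking(isUp func(endpoint string) bool) (string, bool) {
	for range c.endpoints {
		endpoint := c.endpoints[c.current]
		if isUp(endpoint) {
			return endpoint, true
		}

		// Move on to the next candidate, wrapping around at the end.
		c.current = (c.current + 1) % len(c.endpoints)
	}

	return "", false
}

func main() {
	cycler := &endpointCycler{endpoints: exampleEndpoints}

	endpoint, ok := cycler.pickWorking(func(endpoint string) bool {
		return endpoint == "172.20.0.6:51820"
	})

	fmt.Println(endpoint, ok)
}
```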
Manager exposes peer status as `PeerStatus` resources.
Example:
```
$ talosctl -n 172.20.0.2 get kubespanpeerstatuses
NODE NAMESPACE TYPE ID VERSION LABEL ENDPOINT STATE RX TX
172.20.0.2 kubespan KubeSpanPeerStatus GpO3gs5n09WpoiVANbzRL5nwrkRi+9Q19qoeC8RTkQ4= 30 talos-default-worker-2 172.20.0.6:51820 up 640 1920
172.20.0.2 kubespan KubeSpanPeerStatus j4CRlKByMcTWOBS2ifZcPzcUr3lXdBOc/I4AxGmhXxI= 30 talos-default-worker-1 172.20.0.5:51820 up 672 1888
172.20.0.2 kubespan KubeSpanPeerStatus o5EPScFrD895A5EpVyKU8hFR+vi25D0CJMYsoaXN3Qk= 28 talos-default-master-3 172.20.0.4:51820 up 640 1920
172.20.0.2 kubespan KubeSpanPeerStatus rBp5wyHdxqZkq5CWher2DcPcGgwHrFOwB6fP/ReFRlE= 16 talos-default-master-2 172.20.0.3:51820 up 432 2088
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Co-authored-by: Seán C McCord <ulexus@gmail.com>
Looks like we bumped sonobuoy library, and it silently changed a lot of
things in the way it works with the results.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
* calculate covering IPPrefixes for the KubeSpan peer `AllowedIPs`,
check for overlap
* don't use KubeSpan IP as potential node endpoint (inception!)
* allow Wireguard config to be applied which doesn't change peer
endpoint
* support for pre-shared Wireguard peer keys
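The overlap check from the first item can be sketched with `net/netip`. This is only an illustration under assumptions: `peerPrefixes`, `firstOverlap` and the example data are hypothetical, and the sketch does not compute covering prefixes, it only detects overlapping ones.

```go
package main

import (
	"fmt"
	"net/netip"
)

// peerPrefixes is a hypothetical pairing of a peer label with its AllowedIPs.
type peerPrefixes struct {
	label      string
	allowedIPs []netip.Prefix
}

// examplePeers is hypothetical data; the third peer overlaps with the first.
var examplePeers = []peerPrefixes{
	{label: "worker-1", allowedIPs: []netip.Prefix{netip.MustParsePrefix("10.244.1.0/24")}},
	{label: "worker-2", allowedIPs: []netip.Prefix{netip.MustParsePrefix("10.244.2.0/24")}},
	{label: "worker-3", allowedIPs: []netip.Prefix{netip.MustParsePrefix("10.244.1.128/25")}},
}

// firstOverlap returns the labels of the first pair of peers whose AllowedIPs
// share at least one address.
func firstOverlap(peers []peerPrefixes) (string, string, bool) {
	for i := range peers {
		for j := i + 1; j < len(peers); j++ {
			for _, p := range peers[i].allowedIPs {
				for _, q := range peers[j].allowedIPs {
					if p.Overlaps(q) {
						return peers[i].label, peers[j].label, true
					}
				}
			}
		}
	}

	return "", "", false
}

func main() {
	fmt.Println(firstOverlap(examplePeers))
}
```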
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Co-authored-by: Seán C McCord <ulexus@gmail.com>
With the last changes, `kube-apiserver` certificates are generated based
on the assigned `NodeAddresses`, machine configuration, etc. Whenever the
certificate is regenerated, `kube-apiserver` is reloaded to pick up the
new cert.
With Virtual IP enabled, Virtual IP address is included into the
certificate from the beginning as it is specified in the machine
configuration, but as virtual IP moves between the nodes this causes
`NodeAddresses` update, which triggers the controller, generates new
certs and reloads `kube-apiserver` at a bad time (right after the VIP got
moved). Even though the cert generated is identical to the previous one,
the API server reload makes it unavailable for 30-90 seconds.
This change extracts `CertSANs` as a separate resource so that its
updates are suppressed if the CertSANs sources change but the final
list stays the same, which in turn prevents the final certificate from
being regenerated.
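The suppression logic amounts to comparing the final SAN list rather than its inputs. A minimal sketch, assuming the list is normalized by sorting and de-duplicating (`normalizeSANs` and `sansChanged` are hypothetical helpers, not Talos code):

```go
package main

import (
	"fmt"
	"sort"
)

// normalizeSANs returns a sorted, de-duplicated copy of the SAN list.
func normalizeSANs(sans []string) []string {
	sorted := append([]string(nil), sans...)
	sort.Strings(sorted)

	result := make([]string, 0, len(sorted))

	for _, san := range sorted {
		if len(result) == 0 || result[len(result)-1] != san {
			result = append(result, san)
		}
	}

	return result
}

// sansChanged reports whether the final SAN list differs, ignoring order
// and duplicates. Only a real change should trigger cert regeneration.
func sansChanged(previous, next []string) bool {
	a, b := normalizeSANs(previous), normalizeSANs(next)

	if len(a) != len(b) {
		return true
	}

	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}

	return false
}

func main() {
	previous := []string{"10.5.0.2", "192.168.0.10"}

	// The VIP (192.168.0.10) shows up again via NodeAddresses: same final list.
	fmt.Println(sansChanged(previous, []string{"192.168.0.10", "10.5.0.2", "192.168.0.10"}))

	// A genuinely new address changes the list.
	fmt.Println(sansChanged(previous, []string{"10.5.0.2", "10.5.0.3"}))
}
```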
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Controller watches cluster Affiliates and generates KubeSpanPeerSpecs
from those which are not local node Affiliates and have KubeSpan
configuration attached.
Example:
```
$ talosctl -n 172.20.0.2 get kubespanpeerspecs
NODE NAMESPACE TYPE ID VERSION LABEL ENDPOINTS
172.20.0.2 kubespan KubeSpanPeerSpec 27E8I+ekrqT21cq2iW6+fDe+H7WBw6q9J7vqLCeswiM= 1 talos-default-worker-1 ["172.20.0.3:51820"]
```
```
$ talosctl -n 172.20.0.3 get kubespanpeerspecs -o yaml
node: 172.20.0.3
metadata:
    namespace: kubespan
    type: KubeSpanPeerSpecs.kubespan.talos.dev
    id: mB6WlFOR66Jx5rtPMIpxJ3s4XHyer9NCzqWPP7idGRo=
    version: 1
    owner: kubespan.PeerSpecController
    phase: running
    created: 2021-09-07T19:26:35Z
    updated: 2021-09-07T19:26:35Z
spec:
    address: fdc8:8aee:4e2d:1202:f073:9cff:fe6c:4d67
    additionalAddresses:
        - 10.244.1.0/32
    endpoints:
        - 172.20.0.2:51820
    label: talos-default-master-1
```
KubeSpan peers will be used to drive configuration of the KubeSpan
networking components.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This changes machinery API for the configuration to make it more
obvious that the returned value is a list of CIDRs and adjusts usage
accordingly.
For the K8s Address Filter controller, fix the actual bug by parsing
CIDRs as a list of values.
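Parsing every entry of the list, instead of treating the value as a single CIDR, might look like the sketch below. `parseCIDRs` and the example subnets are hypothetical; only the `net/netip` calls are real Go APIs.

```go
package main

import (
	"fmt"
	"net/netip"
)

// examplePodSubnets is hypothetical dual-stack configuration data.
var examplePodSubnets = []string{"10.244.0.0/16", "fd00:10:244::/56"}

// parseCIDRs parses each configured subnet separately and fails on the
// first entry which is not a valid CIDR.
func parseCIDRs(values []string) ([]netip.Prefix, error) {
	prefixes := make([]netip.Prefix, 0, len(values))

	for _, value := range values {
		prefix, err := netip.ParsePrefix(value)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", value, err)
		}

		prefixes = append(prefixes, prefix)
	}

	return prefixes, nil
}

func main() {
	prefixes, err := parseCIDRs(examplePodSubnets)
	fmt.Println(prefixes, err)

	// Passing a joined value as if it were a single CIDR is rejected.
	_, err = parseCIDRs([]string{"10.244.0.0/16,fd00:10:244::/56"})
	fmt.Println(err != nil)
}
```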
Fixes #4192
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This implements pushing to and pulling from Kubernetes cluster discovery
registry which is simply using extra Talos annotations on the Node
resources.
Note: cluster discovery is still disabled by default.
This means that each Talos node is going to push data from its own local
`Affiliate` structure to the `Node` resource, and also watch the other
`Node`s to scrape data and build an `Affiliate` for every other cluster
member.
Further down the pipeline, `Affiliate` is converted to a cluster
`Member` which is an easy way to see the cluster membership.
In its current form, `talosctl get members` is mostly equivalent to
`kubectl get nodes`, but as we add more registries, it will become more
powerful.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4139
This builds the local (for the node) `Affiliate` structure which
describes the node for the cluster discovery. Depending on the
configuration, KubeSpan information might be included as well.
`NodeAddresses` were updated to hold CIDRs instead of simple IPs.
The `Affiliate` will be pushed to the registries, while `Affiliate`s for
other nodes will be fetched back from the registries.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is a PR on a path towards removing `ApplyDynamicConfig`.
This fixes Kubernetes API server certificate generation to use dynamic
data to generate cert with proper SANs for IPs of the node.
As part of that, apid certificate generation was refactored a bit (without
any functional changes).
Added two unit-tests for apid and Kubernetes certificate generation.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This adds the ability to specify the subnet that `etcd`'s listen address
should be in. This allows users to ensure that `etcd` is on a private
subnet.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>