945 Commits

Author SHA1 Message Date
Alexey Palazhchenko
0dad5f4d78
chore: small cleanup
Remove empty tests.
Remove unused parameter.
Remove extra parameter.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-10-14 08:54:24 +00:00
Andrey Smirnov
31b6e39e58
fix: delete expired affiliates from the discovery service
See https://github.com/talos-systems/discovery-service/pull/20

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-12 14:48:57 +03:00
Artem Chernyshev
7137166d1d
fix: allow overriding audit-policy-file in kube-apiserver static pod
Otherwise we lock it with our default config.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-10-11 11:27:36 +03:00
Andrey Smirnov
022c7335f3
fix: add interface route if DHCP4 router is not directly routeable
Fixes #4320

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-07 22:33:05 +03:00
Andrey Smirnov
66a1579ea7
fix: don't enable 'no new privs' on the system level
This breaks some pods which specifically drop everything but gain
capabilities back via file capabilities (e.g. `nginx-ingress`).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-06 21:43:14 +03:00
Alexey Palazhchenko
423861cf9f
feat: don't drop capabilities if kexec is disabled
It is needed for advanced use cases like Docker-in-Docker, our CI, etc.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-10-06 08:37:25 +00:00
Andrey Smirnov
5e41dd4a65
feat: add an option to configure kubelet node IP based on subnets
Fixes #4243

The idea is to make sure kubelet picks node IP based on filtering by
CIDRs of the node's addresses. The flow is simple - every address is
filtered by subnet and picked if it matches the subnet.
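
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `filterBySubnets` and its string-based signature are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/netip"
)

// filterBySubnets returns the addresses contained in at least one subnet,
// preserving input order. The name and signature are hypothetical.
func filterBySubnets(addrs, subnets []string) ([]string, error) {
	prefixes := make([]netip.Prefix, 0, len(subnets))

	for _, s := range subnets {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			return nil, fmt.Errorf("invalid subnet %q: %w", s, err)
		}

		prefixes = append(prefixes, p)
	}

	var picked []string

	for _, a := range addrs {
		ip, err := netip.ParseAddr(a)
		if err != nil {
			return nil, fmt.Errorf("invalid address %q: %w", a, err)
		}

		// Pick the address as soon as one subnet matches.
		for _, p := range prefixes {
			if p.Contains(ip) {
				picked = append(picked, a)

				break
			}
		}
	}

	return picked, nil
}

func main() {
	picked, err := filterBySubnets(
		[]string{"10.0.0.5", "192.168.1.7", "fd00::1"},
		[]string{"192.168.0.0/16"},
	)
	if err != nil {
		panic(err)
	}

	fmt.Println(picked)
}
```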

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-01 15:28:09 +03:00
Alexey Palazhchenko
72e49029e7
chore: allow insecure discovery in debug builds
If Talos is built with `sidero.debug` build tag (`make WITH_DEBUG=1`),
the machine configuration is allowed to use insecure HTTP for the discovery service.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-09-30 17:34:25 +00:00
Andrey Smirnov
d52befd1ac
fix: ignore 404 for AWS external IPs
Also ignore expected errors for other platforms to keep controller from
failing over and over again.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-30 16:50:14 +03:00
Andrey Smirnov
4044372e12
feat: harvest discovered endpoints and push them via discovery svc
Fixes #4250

Each KubeSpan peer sees the endpoint of every other KubeSpan peer once
they are connected. If a peer is behind NAT, the discovered endpoint is
different from the endpoints the node knows about itself (as it punched
a hole in NAT). This discovered endpoint is pushed to the discovery
service so that every other peer can now use that punched hole to talk
to the peer.

If the endpoint observed is actually in the list of the endpoints
reported by the peer itself, discovery service will take care of
deduplicating them and suppressing updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-29 23:39:35 +03:00
Andrey Smirnov
9a51aa8358
feat: add an option to skip downed peers in KubeSpan
Fixes #4248

This resolves the balance between security and connectivity.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-29 23:06:14 +03:00
Andrey Smirnov
cbbd7c6821
feat: publish node's ExternalIPs as node addresses
This means that ExternalIPs (as presented by the platform) will be
published as `AddressStatus` resource, and transitively as
`NodeAddresses` (which includes cert generation) and as KubeSpan
endpoints (for KubeSpan connectivity in the cloud).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-29 21:51:36 +03:00
Andrey Smirnov
0f60ef6d38
fix: reset inputs back to initial state in secrets.APIController
This fixes a bug where, after an error generating certificates, the
controller gets into a state of not being able to read its expected
inputs (NetworkStatus specifically).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-29 19:57:51 +03:00
Artem Chernyshev
64cb873ec4
feat: override static pods default args by extra Args
Use `argsbuilder` same way as it's used in services.
Rewrite `kubeProxy` generation code to override default args.

As a consequence of this change, flags no longer have a deterministic
order, as they all come from a single merged map.

Introduced merge policy in the `ArgsBuilder` to deny overrides for some
arguments and do additive merge of others.
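
A merge policy of this kind might look roughly like the sketch below. The type `MergePolicy`, its constants, the function `mergeArgs`, and the argument names are all hypothetical and are not taken from the actual `argsbuilder` package. The comma-joined additive merge is likewise an assumption.

```go
package main

import "fmt"

// MergePolicy is a hypothetical per-argument policy.
type MergePolicy int

const (
	MergeOverride MergePolicy = iota // extra value replaces the default
	MergeAdditive                    // extra value is appended to the default
	MergeDenied                      // argument may not be set by the user
)

// mergeArgs merges extra args into defaults according to the policies.
// Arguments without an explicit policy use MergeOverride.
func mergeArgs(defaults, extra map[string]string, policies map[string]MergePolicy) (map[string]string, error) {
	merged := make(map[string]string, len(defaults)+len(extra))

	for k, v := range defaults {
		merged[k] = v
	}

	for k, v := range extra {
		current, exists := merged[k]

		switch policies[k] {
		case MergeDenied:
			return nil, fmt.Errorf("overriding %q is not allowed", k)
		case MergeAdditive:
			if exists && current != "" {
				v = current + "," + v
			}
		case MergeOverride:
			// Plain replacement, handled by the assignment below.
		}

		merged[k] = v
	}

	return merged, nil
}

func main() {
	merged, err := mergeArgs(
		map[string]string{"bind-address": "0.0.0.0", "feature-gates": "A=true"},
		map[string]string{"feature-gates": "B=false", "v": "4"},
		map[string]MergePolicy{"bind-address": MergeDenied, "feature-gates": MergeAdditive},
	)
	if err != nil {
		panic(err)
	}

	// Map iteration order is random, which is why flag order is not stable.
	for k, v := range merged {
		fmt.Printf("--%s=%s\n", k, v)
	}
}
```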

Fixes: https://github.com/talos-systems/talos/issues/4238
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-09-29 11:50:40 +03:00
Andrey Smirnov
ecdd7757fb
test: workaround race in the tests with zaptest package
Looks like `zaptest` package when used from the goroutine (like in gRPC
server) results in a potential data race on test tear down.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 23:59:00 +03:00
Andrey Smirnov
30ae714243
feat: implement integration with Discovery Service
This provides integration layer with discovery service to provide
cluster discovery (and transitively KubeSpan peer discovery).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 20:24:08 +03:00
Serge Logvinov
353d632ae5
feat: add nocloud platform support
* fetch cdrom/net nocloud config
* apply simple network configuration

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 16:32:12 +03:00
Andrey Smirnov
62acd62516
fix: check trustd API CA on worker nodes
This distributes API CA (just the certificate, not the key) to the
worker nodes on config generation, and if the CA cert is present on the
worker node, it verifies TLS connection to the trustd with the CA
certificate.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 15:14:23 +03:00
Serge Logvinov
ba27bc366f
feat: implement Hetzner Cloud support for virtual (shared) IP
Talos supports automatic virtual IP for the control plane with pure
layer 2 connectivity. Hetzner Cloud API supports assigning Floating IPs
to the nodes, this PR combines existing virtual IP functionality with calls
to HCloud API to move the IP address on HCloud side to the leader node.

The only thing which should be supplied in the machine configuration is
the Hetzner Cloud API token, every other setting is automatically
discovered by Talos.

Talos supports two types of floating IPs:
* external Floating IP for external network
* server alias IP for local networks

The controlplane can have only one alias on the local network interface.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-27 23:45:46 +03:00
Andrey Smirnov
b450b7cef0
chore: deprecate Interfaces and Routes APIs
Fixes #4094

Deprecate old networkd APIs; `talosctl interfaces` and `talosctl routes`
now suggest replacement commands that achieve the same task.

TUI installer was updated to stop using Interfaces API.

Those APIs will be completely removed in 0.14.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-27 15:21:02 +03:00
Seán C McCord
b1b6d61365
fix: check for existence of dhcp6 FQDN first
Check that dhcpv6.Options.FQDN() is not nil before trying to use it.

This fixes DHCPv6 on GCP.

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2021-09-24 12:58:04 -07:00
Artem Chernyshev
519999b846
fix: use readonly mode when probing devices with All lookup
Update `go-blockdevice` library.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-09-23 14:47:52 +03:00
Andrey Smirnov
2b5204200a
feat: enable resource API in the maintenance mode
This basically provides `talosctl get --insecure` in maintenance mode.
Only non-sensitive resources are available (equivalent to having
`os:reader` role in the Talos client certificate).

Changes:

* refactored insecure/maintenance client setup in talosctl
* `LinkStatus` is no longer sensitive as it shows only Wireguard public
key, `LinkSpec` still contains private key for obvious reasons
* maintenance mode injects `os:reader` role implicitly

The motivation behind this PR is to deprecate networkd-era interfaces &
routes APIs which are being used in TUI installer, and we need a
replacement.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-22 21:36:34 +03:00
Serge Logvinov
d9eb18bfdd
fix: containerd log symlink
Kubelet creates symlinks from /var/log/containers/<pod>.log to the log file /var/log/pod/<pod-folder>/0.log.
Log senders (like fluentd) usually watch /var/log/containers/*.log,
so kubelet needs to share the containers folder.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-21 15:09:02 +03:00
Andrey Smirnov
50a2410482
feat: add operating system version field to discovery
Fixes #4232

The result:

```
talosctl -n 172.20.0.2 get members
NODE         NAMESPACE   TYPE     ID                       VERSION   HOSTNAME                 MACHINE TYPE   OS                                           ADDRESSES
172.20.0.2   cluster     Member   talos-default-master-1   2         talos-default-master-1   controlplane   Talos (v0.13.0-alpha.0-13-gfdd80a12-dirty)   ["172.20.0.2","fdd1:f54:2697:3902:44f8:92ff:fe2e:1aea"]
172.20.0.2   cluster     Member   talos-default-worker-1   1         talos-default-worker-1   worker         Talos (v0.13.0-alpha.0-13-gfdd80a12-dirty)   ["172.20.0.3","fdd1:f54:2697:3902:d4ba:55ff:fe8a:f551"]
172.20.0.2   cluster     Member   talos-default-worker-2   1         talos-default-worker-2   worker         Talos (v0.13.0-alpha.0-13-gfdd80a12-dirty)   ["172.20.0.4","fdd1:f54:2697:3902:e00d:f4ff:fecf:51c8"]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-17 15:44:33 +03:00
Andrey Smirnov
085c61b2ec
chore: add a special condition to check for kubeconfig readiness
The problem is that the kubelet kubeconfig gets created early, but the
actual client key and cert files are not written, so controllers spam
with scary errors that the config is not valid. This PR removes those
scary messages as we wait for the kubeconfig to be usable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-17 00:07:38 +03:00
Andrey Smirnov
21cdd85403
fix: add node address to the list of allowed IPs (kubespan)
This fixes the bug with host networking pods not being able to reach out
to the Kubernetes services.

This also moves any node-to-node networking over to KubeSpan link as
well.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-16 23:29:42 +03:00
Andrey Smirnov
fdd80a1234
feat: add an option to continue booting on NTP timeout
Fixes #4224

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-16 21:34:17 +03:00
Andrey Smirnov
ef36849899
feat: add routes, routing rules and nftables rules for KubeSpan
This concludes basic KubeSpan implementation.

Most of the code is from #3577 with some fixes and refactoring.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Co-authored-by: Seán C McCord <ulexus@gmail.com>
2021-09-16 20:01:39 +03:00
Andrey Smirnov
d0585fb6b3
feat: reboot via kexec
This should save a lot of time on BIOS/POST time with bare metal
hardware.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-15 22:14:19 +03:00
Serge Logvinov
3de505c894
fix: skip bad cloud-config in OpenStack platform
Sometimes user-data cannot be redefined in OpenStack.
We need to catch this and allow the user to apply the configuration
through the API.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-15 21:22:50 +03:00
Andrey Smirnov
a394d1e20b
fix: tear down control plane static pods when etcd is stopped
When `etcd` is stopped, control plane can't function anymore as API
server connects to the local etcd instance.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-15 21:00:39 +03:00
Andrey Smirnov
1c05089bb2
feat: implement KubeSpan manager for Wireguard peer state
KubeSpan manager uses list of KubeSpan peers prepared from the discovery
and local KubeSpan identity to set up and update configuration of the
Wireguard interface.

As new peers are getting added or deleted, manager takes care of
updating the Wireguard config.

Manager also keeps track of all peers and their state coming from the
Wireguard link status: whether the connection is up or not, some stats,
last actually used endpoint, etc.

Manager cycles through the available peer endpoints until it finds the
one which works.

Manager exposes peer status as `PeerStatus` resources.

Example:

```
$ talosctl -n 172.20.0.2 get kubespanpeerstatuses
NODE         NAMESPACE   TYPE                 ID                                             VERSION   LABEL                    ENDPOINT           STATE   RX    TX
172.20.0.2   kubespan    KubeSpanPeerStatus   GpO3gs5n09WpoiVANbzRL5nwrkRi+9Q19qoeC8RTkQ4=   30        talos-default-worker-2   172.20.0.6:51820   up      640   1920
172.20.0.2   kubespan    KubeSpanPeerStatus   j4CRlKByMcTWOBS2ifZcPzcUr3lXdBOc/I4AxGmhXxI=   30        talos-default-worker-1   172.20.0.5:51820   up      672   1888
172.20.0.2   kubespan    KubeSpanPeerStatus   o5EPScFrD895A5EpVyKU8hFR+vi25D0CJMYsoaXN3Qk=   28        talos-default-master-3   172.20.0.4:51820   up      640   1920
172.20.0.2   kubespan    KubeSpanPeerStatus   rBp5wyHdxqZkq5CWher2DcPcGgwHrFOwB6fP/ReFRlE=   16        talos-default-master-2   172.20.0.3:51820   up      432   2088
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Co-authored-by: Seán C McCord <ulexus@gmail.com>
2021-09-15 16:09:38 +03:00
Serge Logvinov
19a8ae97c6
feat: add vultr.com cloud support
* cloud-init for vultr.com
* ipv4/v6 support
* set static IPs for private interface

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-14 22:58:30 +03:00
Lennard Klein
0ff4c7cdb2
fix: write KubernetesCACert chmodded 0400 instead of 0500
Looks like this was an error made long ago, fixed similarly for etcd
in b52b20666.

Signed-off-by: Lennard Klein <lennard.klein@eu.equinix.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-14 15:57:08 +03:00
Andrey Smirnov
a059454045
chore: build using Go 1.17
`initramfs` size for amd64 shrinks by 1.3 MiB.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-13 22:33:47 +03:00
Andrey Smirnov
ee2dce6c1a
chore: bump dependencies
PRs:

* #4215
* #4216
* #4217
* #4218
* #4219
* #4220
* #4221

+ go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-13 16:39:17 +03:00
Andrey Smirnov
5ca1fb8221
fix: multiple fixes for KubeSpan and Wireguard implementation
* calculate covering IPPrefixes for the KubeSpan peer `AllowedIPs`,
check for overlap
* don't use KubeSpan IP as potential node endpoint (inception!)
* allow Wireguard config to be applied which doesn't change peer
endpoint
* support for pre-shared Wireguard peer keys

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Co-authored-by: Seán C McCord <ulexus@gmail.com>
2021-09-10 16:23:27 +03:00
Serge Logvinov
3b5f4038de
feat: add scaleway.com cloud support
* cloud-init for scaleway
* set ipv6 to the interface

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-09 23:01:50 +03:00
Serge Logvinov
f156ab1847
feat: add upcloud.com cloud support
* cloud-init for upcloud.com
* ipv4/v6 support

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
2021-09-09 17:00:05 +03:00
Andrey Smirnov
c3b2429ce9
fix: suppress spurious Kubernetes API server cert updates
With the last changes, `kube-apiserver` certificates are generated based
on the assigned `NodeAddresses`, machine configuration, etc. Whenever the
certificate is regenerated, `kube-apiserver` is reloaded to pick up the
new cert.

With Virtual IP enabled, Virtual IP address is included into the
certificate from the beginning as it is specified in the machine
configuration, but as virtual IP moves between the nodes this causes
`NodeAddresses` update, which triggers the controller, generates new
certs and reloads `kube-apiserver` at a bad time (right after VIP got
moved). Even though the cert generated is identical to the previous one,
the API server reload makes it unavailable for 30-90 seconds.

This change extracts `CertSANs` as a separate resource so that its
updates are suppressed if the CertSANs sources change, but the final
list stays the same, and in turn prevents final certificate from being
updated.
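
The suppression idea can be illustrated with a small sketch: compare the normalized final list against the current one and only report a change when they differ. The helper names `normalize` and `needsUpdate` are hypothetical. The actual change relies on a separate `CertSANs` resource rather than on a helper like this.

```go
package main

import (
	"fmt"
	"sort"
)

// normalize returns a sorted, de-duplicated copy of the SAN list.
func normalize(sans []string) []string {
	seen := make(map[string]struct{}, len(sans))
	out := make([]string, 0, len(sans))

	for _, s := range sans {
		if _, ok := seen[s]; ok {
			continue
		}

		seen[s] = struct{}{}
		out = append(out, s)
	}

	sort.Strings(out)

	return out
}

// needsUpdate reports whether the candidate list differs from the current
// one after normalization, i.e. whether a new certificate is warranted.
func needsUpdate(current, candidate []string) bool {
	a, b := normalize(current), normalize(candidate)

	if len(a) != len(b) {
		return true
	}

	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}

	return false
}

func main() {
	current := []string{"172.20.0.2", "172.20.0.1"}

	// The VIP moved, so the sources changed, but the final list is the same.
	afterVIPMove := []string{"172.20.0.1", "172.20.0.2", "172.20.0.1"}

	fmt.Println(needsUpdate(current, afterVIPMove))
	fmt.Println(needsUpdate(current, []string{"172.20.0.2", "172.20.0.9"}))
}
```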

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-09 00:31:54 +03:00
Andrey Smirnov
ff90b5751e
feat: implement KubeSpan peer generation controller
Controller watches cluster Affiliates and generates KubeSpanPeerSpecs
from those which are not local node Affiliates and have KubeSpan
configuration attached.

Example:

```
$ talosctl -n 172.20.0.2 get kubespanpeerspecs
NODE         NAMESPACE   TYPE               ID                                             VERSION   LABEL                    ENDPOINTS
172.20.0.2   kubespan    KubeSpanPeerSpec   27E8I+ekrqT21cq2iW6+fDe+H7WBw6q9J7vqLCeswiM=   1         talos-default-worker-1   ["172.20.0.3:51820"]
```

```
$ talosctl -n 172.20.0.3 get kubespanpeerspecs -o yaml
node: 172.20.0.3
metadata:
    namespace: kubespan
    type: KubeSpanPeerSpecs.kubespan.talos.dev
    id: mB6WlFOR66Jx5rtPMIpxJ3s4XHyer9NCzqWPP7idGRo=
    version: 1
    owner: kubespan.PeerSpecController
    phase: running
    created: 2021-09-07T19:26:35Z
    updated: 2021-09-07T19:26:35Z
spec:
    address: fdc8:8aee:4e2d:1202:f073:9cff:fe6c:4d67
    additionalAddresses:
        - 10.244.1.0/32
    endpoints:
        - 172.20.0.2:51820
    label: talos-default-master-1
```

KubeSpan peers will be used to drive configuration of the KubeSpan
networking components.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-08 15:58:09 +03:00
Andrey Smirnov
14c69df506
fix: correctly parse multiple pod/service CIDRs
This changes machinery API for the configuration to make it more
obvious that the returned value is a list of CIDRs and adjusts usage
accordingly.

For the K8s Address Filter controller, fix the actual bug by parsing
CIDRs as a list of values.

Fixes #4192

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-08 15:13:28 +03:00
Andrey Smirnov
69897dbba4
feat: drop some capabilities to be never available
This PR makes sure that some capabilities (SYS_BOOT and SYS_MODULES) can
never be gained by any process running on Talos except for `machined`
itself.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-08 14:46:36 +03:00
Serge Logvinov
812d59c700
feat: add hetzner.com cloud support
* cloud-init for hcloud
* set ipv6 to the interface

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-07 21:33:15 +03:00
Andrey Smirnov
af6622109f
feat: implement Kubernetes cluster discovery registry
This implements pushing to and pulling from Kubernetes cluster discovery
registry which is simply using extra Talos annotations on the Node
resources.

Note: cluster discovery is still disabled by default.

This means that each Talos node is going to push data from its own local
`Affiliate` structure to the `Node` resource, and also watches the other
`Node`s to scrape data to build `Affiliate`s from each other cluster
member.

Further down the pipeline, `Affiliate` is converted to a cluster
`Member` which is an easy way to see the cluster membership.

In its current form, `talosctl get members` is mostly equivalent to
`kubectl get nodes`, but as we add more registries, it will become more
powerful.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-03 22:09:26 +03:00
Andrey Smirnov
2c66e1b3c5
feat: provide building of local Affiliate structure (for the node)
Fixes #4139

This builds the local (for the node) `Affiliate` structure which
describes the node for the cluster discovery. Depending on the
configuration, KubeSpan information might be included as well.

`NodeAddresses` were updated to hold CIDRs instead of simple IPs.

The `Affiliate` will be pushed to the registries, while `Affiliate`s for
other nodes will be fetched back from the registries.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-03 16:44:19 +03:00
Andrey Smirnov
0b347570a7
feat: use dynamic NodeAddresses/HostnameStatus in Kubernetes certs
This is a PR on a path towards removing `ApplyDynamicConfig`.

This fixes Kubernetes API server certificate generation to use dynamic
data to generate cert with proper SANs for IPs of the node.

As part of that, apid certificate generation was refactored a bit
(without any changes).

Added two unit-tests for apid and Kubernetes certificate generation.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-09-01 20:56:53 +03:00
Andrew Rynhard
668627d5b8
feat: add subnet filter for etcd address
This adds the ability to specify the subnet that `etcd`'s listen address
should be in. This allows users to ensure that `etcd` is on a private
subnet.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2021-08-30 19:49:24 +00:00
Andrey Smirnov
6956edd0bf
feat: add node address filters, filter out k8s addresses for Talos API
This implements abstract `NodeAddressFilter` which can be attached to
build additional `NodeAddress` resources filtering out some entries from
the complete list.

Kubernetes creates two filters to get all node IPs without Kubernetes
CIDRs and vice versa, only Kubernetes CIDRs.
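
The two complementary filters can be pictured as a single partition step, sketched below. The function `splitByCIDRs` and its string-based signature are assumptions for illustration only and do not reflect the `NodeAddressFilter` resource API.

```go
package main

import (
	"fmt"
	"net/netip"
)

// splitByCIDRs partitions addrs into those outside all given CIDRs and
// those inside at least one of them. Name and signature are hypothetical.
func splitByCIDRs(addrs, cidrs []string) (outside, inside []string, err error) {
	prefixes := make([]netip.Prefix, 0, len(cidrs))

	for _, c := range cidrs {
		p, perr := netip.ParsePrefix(c)
		if perr != nil {
			return nil, nil, fmt.Errorf("invalid CIDR %q: %w", c, perr)
		}

		prefixes = append(prefixes, p)
	}

	for _, a := range addrs {
		ip, perr := netip.ParseAddr(a)
		if perr != nil {
			return nil, nil, fmt.Errorf("invalid address %q: %w", a, perr)
		}

		matched := false

		for _, p := range prefixes {
			if p.Contains(ip) {
				matched = true

				break
			}
		}

		if matched {
			inside = append(inside, a)
		} else {
			outside = append(outside, a)
		}
	}

	return outside, inside, nil
}

func main() {
	// Example values only: a node IP plus an address from a pod CIDR.
	nodeOnly, k8sOnly, err := splitByCIDRs(
		[]string{"172.20.0.2", "10.244.1.0"},
		[]string{"10.244.0.0/16", "10.96.0.0/12"},
	)
	if err != nil {
		panic(err)
	}

	fmt.Println("without K8s CIDRs:", nodeOnly)
	fmt.Println("only K8s CIDRs:", k8sOnly)
}
```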

API certificate generation now considers all addresses minus K8s CIDRs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-08-27 23:46:39 +03:00