Commit Graph

151 Commits

Author SHA1 Message Date
Noel Georgi
69d8054c9e
chore: drop UpdateEndpointSuite
drop `UpdateEndpointSuite` suite since KubePrism is enabled by default
starting Talos 1.6 and the test never passes since K8s node is always
ready since it can connect to api server over KubePrism.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-04 00:26:59 +05:30
Noel Georgi
9b5cfdd0bc
chore: add tests for iscsi
Add tests for iscsi to make sure it works.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-02 22:12:42 +05:30
Andrey Smirnov
a52d3cda3b
chore: update gen and COSI runtime
No actual changes, adapting to use new APIs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-22 12:13:13 +04:00
Noel Georgi
9c2ba7c6fa
chore: add tests for chelsio drivers
Add tests for Chelsio drivers and firmware.

Ref: https://github.com/siderolabs/extensions/pull/232

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-20 20:07:25 +05:30
Noel Georgi
3fbed806c4
chore: add tests for util-linux extensions
Add tests for utils-linux extensions.

Ref: https://github.com/siderolabs/extensions/pull/216

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-05 19:29:50 +05:30
Noel Georgi
d03dc7a8af
chore: validate new system extensions
Validate the amdgpu and intel-ice firmware extensions.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-25 17:18:19 +05:30
Andrey Smirnov
3c9f7a7de6
chore: re-enable nolintlint and typecheck linters
Drop startup/rand.go, as since Go 1.20 `rand.Seed` is done
automatically.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-25 01:05:41 +04:00
Noel Georgi
6778ded29d
feat: add e2e-aws for nvidia extensions
Add e2e tests for nvidia

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-24 17:43:36 +05:30
Noel Georgi
d6b2719e2e
chore: drone: move extensions step to a function
Move drone extensions integration to a function. This allows us to
re-use the code and just depend on a single step rather than explicitly
defining all dependencies.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-23 20:56:43 +05:30
Andrey Smirnov
9608ef56dc
chore: allow bridge traffic with DHCP broadcast traffic
This is required for https://github.com/siderolabs/sidero/pull/1070, as
we need to allow DHCP traffic from Sidero controller running in a VM
through the bridge to other VMs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-23 18:37:37 +04:00
Noel Georgi
833895940b
chore: add tests for zfs extension
Add tests for ZFS and btrfs extensions.
Also fix the e2e-aws cron pipeline.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-23 11:16:25 +05:30
Noel Georgi
6b0373ebef
chore: move bash tests to integration
move extensions and secureboot tests to integration.
Makes it easier to test.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-17 19:58:35 +05:30
Dmitriy Matrenichev
c4a1ca8d61
chore: remove <-errCh where possible in grpc methods
Simplify code by passing error directly into the pipe closer.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-08-07 22:28:58 +03:00
Artem Chernyshev
ce63abb219
feat: add KMS assisted encryption key handler
Talos now supports new type of encryption keys which rely on Sealing/Unsealing randomly generated bytes with a KMS server:

```
systemDiskEncryption:
  ephemeral:
    keys:
      - kms:
          endpoint: https://1.2.3.4:443
        slot: 0
```
gRPC API definitions and a simple reference implementation of the KMS server can be found in this
[repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go).

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-07 19:02:39 +03:00
Noel Georgi
e3f3f5794d
feat: implement revert for sd-boot
Implement revert for sd-boot.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-22 20:20:31 +05:30
Andrey Smirnov
dbaf5c6997
refactor: task labelControlPlane into controllers
See #7233

The controlplane label is simply injected into existing controller-based
node label flow.

For controlplane taint default NoScheduleTaint, additional controller &
resource was implemented to handle node taints.

This also fixes a problem with `allowSchedulingOnControlPlanes` not
being reactive to config changes - now it is.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-12 15:25:13 +04:00
Dmitriy Matrenichev
8a02ecd4cb
chore: add endpoints balancer controller
This PR adds support for creating a list of API endpoints (each is pair of host and port).

It gets them from
- Machine config cluster endpoint.
- Localhost with LocalAPIServerPort if machine is control panel.
- netip.Addr[0] and port from affiliates if they are control panels.

For #7191

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-05 20:47:52 -04:00
Andrey Smirnov
badbc51e63
refactor: rewrite code to include preliminary support for multi-doc
`config.Container` implements a multi-doc container which implements
both `Container` interface (encoding, validation, etc.), and `Conifg`
interface (accessing parts of the config).

Refactor `generate` and `bundle` packages to support multi-doc, and
provide backwards compatibility.

Implement a first (mostly example) machine config document for
SideroLink API URL.

Many places don't properly support multi-doc yet (e.g. config patches).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-31 18:38:05 +04:00
Andrey Smirnov
dc6764871c
refactor: move around config interfaces, make RawV1Alpha1 typed
See #7230

Refactor more config interfaces, move config accessor interfaces
to different package to break the dependency loop.

Make `.RawV1Alpha1()` method typed to avoid type assertions everywhere.

No functional changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 22:08:58 +04:00
Andrey Smirnov
0bb7e8a5cf
refactor: split config.Provider into Config & Container
See #7230

This is a step towards preparing for multi-doc config.

Split the `config.Provider` interface into parts which have different
implementation:

* `config.Config` accesses the config itself, it might be implemented by
  `v1alpha1.Config` for example
* `config.Container` will be a set of config documents, which implement
  validation, encoding, etc.

`Version()` method dropped, as it makes little sense and it was almost
not used.

`Raw()` method renamed to `RawV1Alpha1()` to support legacy direct
access to `v1alpha1.Config`, next PR will refactor more to make it
return proper type.

There will be many more changes coming up.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 16:05:16 +04:00
Dmitriy Matrenichev
45e6e27af7
chore: bump runtime
Use new functions and methods from runtime module.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-05-11 17:18:08 -04:00
Andrey Smirnov
860002c735
fix: don't reload control plane pods on cert SANs changes
Fixes #7159

The change looks big, but it's actually pretty simple inside: the static
pods had an annotation which tracks a version of the secrets which
forced control plane pods to reload on a change. At the same time
`kube-apiserver` can reload certificate inputs automatically from files
without restart.

So the inputs were split: the dynamic (for kube-apiserver) inputs don't
need to be reloaded, so its version is not tracked in static pod
annotation, so they don't cause a reload. The previous non-dynamic
resource still causes a reload, but it doesn't get updated when e.g.
node addresses change.

There might be many more refactoring done, the resource chain is a bit
of a mess there, but I wanted to keep number of changes minimal to keep
this backportable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-05 16:59:09 +04:00
Noel Georgi
d1a61fd343
chore: bump golangci-lint
Bump golangci-lint and fixup new warnings. Ignore check that checks for
used function parameters, it's kind of noisy and makes it confusing to
read interface implementations.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-22 19:55:38 +05:30
Nico Berlee
97048f7c37
feat: netstat in API and client
Implements netstat in Talos API and client (talosctl).

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-09 15:48:30 +04:00
Artem Chernyshev
b520710810
feat: introduce new flag in reset API that makes Talos reset user disks
Fixes: https://github.com/siderolabs/talos/issues/6815

Additionally, make it possible to run reset in maintenance mode: to
enable a way for resetting system disk and remove all traces of Talos
from it.

The new reset flow works in a separate sequence, changed disk probe
lookup to check the boot partition instead of the ephemeral one.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-02-28 15:10:41 +03:00
Andrey Smirnov
062c7d754b
test: fix integration test on cp endpoint update
As with #6724, controlplane node kubelet doesn't use control plane
endpoint anymore, run the test on the worker node instead of cp node.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-01-12 15:23:14 +04:00
Philipp Sauter
e1e340bdd9
feat: expose Talos node labels as a machine configuration field
We add the `nodeLabels` key to the machine config to allow users to add
node labels to the kubernetes Node object. A controller
reads the nodeLabels from the machine config and applies them via the
kubernetes API.
Older versions of talosctl will throw an unknown keys error if `edit mc`
 is called on a node with this change.

Fixes #6301

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-15 21:25:40 +04:00
Andrey Smirnov
1cfb6188bc
feat: implement support for cgroupsv1
Use boot kernel arg `talos.unified_cgroup_hierarchy=0` to force Talos to
use cgroups v1. Talos still defaults to cgroupsv2.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-11 15:49:25 +04:00
Philipp Sauter
4e114ca120
feat: use the etcd member id for etcd operations instead of hostname
We add a controller that provides the etcd member id as a resource
and change the etcd related commands to support member ids next to
hostnames.

Fixes: #6223

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-11-10 19:17:56 +04:00
Andrey Smirnov
96aa9638f7
chore: rename talos-systems/talos to siderolabs/talos
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-03 16:50:32 +04:00
Andrey Smirnov
30bbf6463a
refactor: use siderolabs/net version with netip.Addr
Replace most of `net.IP` usage in Talos with `netip.Addr`, refactor code
accordingly.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-02 14:21:03 +04:00
Andrey Smirnov
343c55762e
chore: replace talos-systems Go modules with siderolabs
This the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.

All updates contain no functional changes, just refactorings to adapt to
the new path structure.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-01 12:55:40 +04:00
Andrey Smirnov
d7070f5e74
release(v1.3.0-alpha.1): prepare release
This is the official v1.3.0-alpha.1 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-10-31 16:43:11 +04:00
Dmitriy Matrenichev
fc48849d00
chore: move maps/slices/ordered to gen module
Use github.com/siderolabs/gen

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-21 20:22:43 +03:00
Andrey Smirnov
2dadcd6695
fix: stop worker nodes from acting as apid routers
Don't allow worker nodes to act as apid routers:

* don't try to issue client certificate for apid on worker nodes
* if worker nodes receives incoming connections with `--nodes` set to
  one of the local addresses of the nodd, it routes the request to
  itself without proxying

Second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-13 15:07:31 +04:00
Andrey Smirnov
0723498125
fix: update COSI to the version with gRPC Wait fix
See https://github.com/cosi-project/runtime/pull/140

Also update for changes in https://github.com/cosi-project/runtime/pull/134

Fixes #6169

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-29 23:09:35 +04:00
Dmitriy Matrenichev
b59ca5810e
chore: move from inet.af/netaddr to net/netip and go4.org/netipx
Closes #6007

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-25 17:51:32 +03:00
Dmitriy Matrenichev
29bd632401
chore: remove old build tags syntax
This commit removes lines contains old build tag syntax.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-24 17:27:01 +03:00
Andrey Smirnov
2f2d97b6b5
fix: don't wait for the hostname in maintenance mode
Fixes #6119

With new stable default hostname feature, any default hostname is
disabled until the machine config is available.

Talos enters maintenance mode when the default config source is empty,
so it doesn't have any machine config available at the moment
maintenance service is started.

Hostname might be set via different sources, e.g. kernel args or via
DHCP before the machine config is available, but if all these sources
are not available, hostname won't be set at all.

This stops waiting for the hostname, and skips setting any DNS names in
the maintenance mode certificate SANs if the hostname is not available.

Also adds a regression test via new `--disable-dhcp-hostname` flag to
`talosctl cluster create`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-23 17:52:20 +04:00
Dmitriy Matrenichev
0fe4492e72
chore: bump golangci-lint from 1.47.2 to 1.48.0
Patch version linter upgrade.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-15 18:11:30 +03:00
Andrey Smirnov
9baca49662
refactor: implement COSI resource API for Talos
Overview: deprecate existing Talos resource API, and introduce new COSI
API.

Consequences:

* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 22:31:54 +04:00
Andrey Smirnov
5dd1b40020
feat: disable Kubernetes discovery backend by default
Fixes #5827

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-10 22:25:31 +04:00
Utku Ozdemir
84e712a9f1
feat: introduce Talos API access from Kubernetes
We add a new CRD, `serviceaccounts.talos.dev` (with `tsa` as short name), and its controller which allows users to get a `Secret` containing a short-lived Talosconfig in their namespaces with the roles they need. Additionally, we introduce the `talosctl inject serviceaccount` command to accept a YAML file with Kubernetes manifests and inject them with Talos service accounts so that they can be directly applied to Kubernetes afterwards. If Talos API access feature is enabled on Talos side, the injected workloads will be able to talk to Talos API.

Closes siderolabs/talos#4422.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-08 18:27:26 +02:00
Andrey Smirnov
a6b010a8b4
chore: update Go to 1.19, Linux to 5.15.58
See https://go.dev/doc/go1.19

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 17:03:58 +04:00
Andrey Smirnov
fe2ee3b100
feat: implement MachineStatus resource
Fixes #5789

Example:

```yaml
spec:
    stage: running
    status:
        ready: false
        unmetConditions:
            - name: staticPods
              reason: kube-system/kube-controller-manager-talos-default-master-1 not ready, kube-system/kube-scheduler-talos-default-master-1 not ready
```

As events (CLI doesn't show full contents):

```
172.20.0.2   cbhf2l6f9lrs738hehfg   talos/runtime/machine.MachineStatusEvent   BOOTING   ready: false, unmet conditions: [time network services]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 18:36:10 +04:00
Artem Chernyshev
e5994ff7a7
fix: skip ResetDuringBoot test if the Cluster config is unknown
And improve retry logic in the test.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-28 15:57:58 +03:00
Artem Chernyshev
ae1bec59e9
feat: allow running only one sequence at a time
Fix `Talos` sequencer to run only a single sequence at the same time.
Sequences priority was updated. To match the table:

| what is running (columns) what is requested (rows) | boot | reboot | reset | upgrade |
|----------------------------------------------------|------|--------|-------|---------|
| reboot                                             | Y    | Y      | Y     | N       |
| reset                                              | Y    | N      | N     | N       |
| upgrade                                            | Y    | N      | N     | N       |

With a small addition that `WithTakeover` is still there.
If set, priority is ignored.

This is mainly used for `Shutdown` sequence invokation.
And if doing apply config with reboot enabled.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-27 17:21:36 +03:00
Dmitriy Matrenichev
30f7851d2a
chore: bump golangci-lint from 1.45.2 to 1.47.2
Minor linter upgrade.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-07-22 17:49:44 +03:00
Utku Ozdemir
bb4abc0961
fix: regenerate kubelet certs when hostname changes
Clear the kubelet certificates and kubeconfig when hostname changes so that on next start, kubelet goes through the bootstrap process and new certificates are generated and the node is joined to the cluster with the new name.

Fixes siderolabs/talos#5834.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-21 01:54:15 +02:00
Utku Ozdemir
87ea1d9611
fix: update kubelet kubeconfig when cluster control plane endpoint changes
Overwrite cluster's server URL in the kubeconfig file used by kubelet when the cluster control plane endpoint is changed in machineconfig, so that kubelet doesn't lose connectivity to kube-apiserver.

Closes siderolabs/talos#4470.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-16 14:19:25 +02:00