337 Commits

Author SHA1 Message Date
Andrey Smirnov
f9639fb531
test: fix 'talosctl gen' tests
There were weird hacks put into the tests, while each test already runs
in a temporary directory as 'working directory', so no hacks are needed.

Moreover, using fixed `/tmp/...` paths leads to test failures, as CI
runs docker & QEMU tests in parallel conflicting with each other.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 16:24:02 +04:00
Noel Georgi
69d8054c9e
chore: drop UpdateEndpointSuite
drop `UpdateEndpointSuite` suite since KubePrism is enabled by default
starting Talos 1.6 and the test never passes since K8s node is always
ready since it can connect to api server over KubePrism.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-04 00:26:59 +05:30
Noel Georgi
9b5cfdd0bc
chore: add tests for iscsi
Add tests for iscsi to make sure it works.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-02 22:12:42 +05:30
Andrey Smirnov
e7575ecaae
feat: support n-5 latest Kubernetes versions
For Talos 1.6 this means 1.24-1.29 Kubernetes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-29 13:41:56 +04:00
Andrey Smirnov
a52d3cda3b
chore: update gen and COSI runtime
No actual changes, adapting to use new APIs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-22 12:13:13 +04:00
Noel Georgi
9c2ba7c6fa
chore: add tests for chelsio drivers
Add tests for Chelsio drivers and firmware.

Ref: https://github.com/siderolabs/extensions/pull/232

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-20 20:07:25 +05:30
Andrey Smirnov
96f2a62eaf
test: update upgrade tests versions
Use a 1.4/1.5 releases.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-19 14:53:25 +04:00
Serge Logvinov
3f52320752
feat: upgrade-k8s without comments
This feature allows us to remove any comments from the machineconfig after
upgrading Kubernetes.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-12 14:50:56 +04:00
Noel Georgi
1eebbce357
chore: add output flag for talosctl config info
Add output flag for `talosctl config info`.

This allows to programatically gather endpoints for CI tests.

Eg:

```bash
_out/talosctl-linux-amd64 config info --output json | jq '.Contexts[].Endpoints[0]'
```

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-05 21:25:21 +04:00
Noel Georgi
3fbed806c4
chore: add tests for util-linux extensions
Add tests for utils-linux extensions.

Ref: https://github.com/siderolabs/extensions/pull/216

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-05 19:29:50 +05:30
Andrey Smirnov
c918c0855d
fix: set correct (1 year) talosconfig expiration
Fixes #7698

Also fix `talosctl config info` for `talosconfig` without a client
certificate (e.g. Omni-generated one).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-04 14:46:28 +04:00
Noel Georgi
d03dc7a8af
chore: validate new system extensions
Validate the amdgpu and intel-ice firmware extensions.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-25 17:18:19 +05:30
Andrey Smirnov
3c9f7a7de6
chore: re-enable nolintlint and typecheck linters
Drop startup/rand.go, as since Go 1.20 `rand.Seed` is done
automatically.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-25 01:05:41 +04:00
Andrey Smirnov
8670450d28
release(v1.6.0-alpha.0): prepare release
This is the official v1.6.0-alpha.0 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-24 17:09:34 +04:00
Noel Georgi
6778ded29d
feat: add e2e-aws for nvidia extensions
Add e2e tests for nvidia

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-24 17:43:36 +05:30
Noel Georgi
d6b2719e2e
chore: drone: move extensions step to a function
Move drone extensions integration to a function. This allows us to
re-use the code and just depend on a single step rather than explicitly
defining all dependencies.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-23 20:56:43 +05:30
Andrey Smirnov
9608ef56dc
chore: allow bridge traffic with DHCP broadcast traffic
This is required for https://github.com/siderolabs/sidero/pull/1070, as
we need to allow DHCP traffic from Sidero controller running in a VM
through the bridge to other VMs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-23 18:37:37 +04:00
Noel Georgi
833895940b
chore: add tests for zfs extension
Add tests for ZFS and btrfs extensions.
Also fix the e2e-aws cron pipeline.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-23 11:16:25 +05:30
Noel Georgi
6b0373ebef
chore: move bash tests to integration
move extensions and secureboot tests to integration.
Makes it easier to test.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-17 19:58:35 +05:30
Dmitriy Matrenichev
c4a1ca8d61
chore: remove <-errCh where possible in grpc methods
Simplify code by passing error directly into the pipe closer.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-08-07 22:28:58 +03:00
Andrey Smirnov
06369e8195
fix: retry CRI pod removal, fix upgrade flow in the tests
It seems that CRI has a bit of eventual consistency, and it might fail
to remove a stopped pod failing that it's still running.

Rewrite the upgrade API call in the upgrade test to actually wait for
the upgrade to be successful, and fail immediately if it's not
successful. This should improve the test stability and it should make
it easier to find issues immediately.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-12 16:20:10 +04:00
Andrey Smirnov
8017afb107
feat: implement CRI image management and pre-pull on K8s upgrade
Fixes #6391

Implement a set of APIs and commands to manage images in the CRI, and
pre-pull images on Kubernetes upgrades.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-11 19:25:10 +04:00
Artem Chernyshev
ce63abb219
feat: add KMS assisted encryption key handler
Talos now supports new type of encryption keys which rely on Sealing/Unsealing randomly generated bytes with a KMS server:

```
systemDiskEncryption:
  ephemeral:
    keys:
      - kms:
          endpoint: https://1.2.3.4:443
        slot: 0
```
gRPC API definitions and a simple reference implementation of the KMS server can be found in this
[repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go).

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-07 19:02:39 +03:00
Noel Georgi
8daf432b29
chore: bump deps
Bump deps.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-22 22:41:08 +05:30
Noel Georgi
e3f3f5794d
feat: implement revert for sd-boot
Implement revert for sd-boot.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-22 20:20:31 +05:30
Dmitriy Matrenichev
665702ddd3
chore: fix cilium e2e tests
`WITH_CONFIG_PATCH_WORKER` check result was overriding any value set in `CONFIG_PATCH_FLAG` variable.
Move it to the different variable.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-14 15:08:31 +04:00
Andrey Smirnov
e9dbc9311b
test: bump versions for upgrade tests
As we're getting to 1.5.0, bump versions for upgrade tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 23:17:22 +04:00
Andrey Smirnov
0a99965efb
refactor: replace uncordonNode with controllers
Fixes #7233

Waiting for node readiness now happens in the `MachineStatus` controller
which won't mark the node as ready until Kubernetes `Node` is ready.

Handling cordoning/uncordining happens with help of additional resource
in `NodeApplyController`.

New controller provides reactive `NodeStatus` resource to see current
status of Kubernetes `Node`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 21:48:42 +04:00
Andrey Smirnov
dbaf5c6997
refactor: task labelControlPlane into controllers
See #7233

The controlplane label is simply injected into existing controller-based
node label flow.

For controlplane taint default NoScheduleTaint, additional controller &
resource was implemented to handle node taints.

This also fixes a problem with `allowSchedulingOnControlPlanes` not
being reactive to config changes - now it is.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-12 15:25:13 +04:00
Dmitriy Matrenichev
8a02ecd4cb
chore: add endpoints balancer controller
This PR adds support for creating a list of API endpoints (each is pair of host and port).

It gets them from
- Machine config cluster endpoint.
- Localhost with LocalAPIServerPort if machine is control panel.
- netip.Addr[0] and port from affiliates if they are control panels.

For #7191

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-05 20:47:52 -04:00
Andrey Smirnov
badbc51e63
refactor: rewrite code to include preliminary support for multi-doc
`config.Container` implements a multi-doc container which implements
both `Container` interface (encoding, validation, etc.), and `Conifg`
interface (accessing parts of the config).

Refactor `generate` and `bundle` packages to support multi-doc, and
provide backwards compatibility.

Implement a first (mostly example) machine config document for
SideroLink API URL.

Many places don't properly support multi-doc yet (e.g. config patches).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-31 18:38:05 +04:00
Andrey Smirnov
dc6764871c
refactor: move around config interfaces, make RawV1Alpha1 typed
See #7230

Refactor more config interfaces, move config accessor interfaces
to different package to break the dependency loop.

Make `.RawV1Alpha1()` method typed to avoid type assertions everywhere.

No functional changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 22:08:58 +04:00
Andrey Smirnov
0bb7e8a5cf
refactor: split config.Provider into Config & Container
See #7230

This is a step towards preparing for multi-doc config.

Split the `config.Provider` interface into parts which have different
implementation:

* `config.Config` accesses the config itself, it might be implemented by
  `v1alpha1.Config` for example
* `config.Container` will be a set of config documents, which implement
  validation, encoding, etc.

`Version()` method dropped, as it makes little sense and it was almost
not used.

`Raw()` method renamed to `RawV1Alpha1()` to support legacy direct
access to `v1alpha1.Config`, next PR will refactor more to make it
return proper type.

There will be many more changes coming up.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 16:05:16 +04:00
Dmitriy Matrenichev
45e6e27af7
chore: bump runtime
Use new functions and methods from runtime module.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-05-11 17:18:08 -04:00
Andrey Smirnov
860002c735
fix: don't reload control plane pods on cert SANs changes
Fixes #7159

The change looks big, but it's actually pretty simple inside: the static
pods had an annotation which tracks a version of the secrets which
forced control plane pods to reload on a change. At the same time
`kube-apiserver` can reload certificate inputs automatically from files
without restart.

So the inputs were split: the dynamic (for kube-apiserver) inputs don't
need to be reloaded, so its version is not tracked in static pod
annotation, so they don't cause a reload. The previous non-dynamic
resource still causes a reload, but it doesn't get updated when e.g.
node addresses change.

There might be many more refactoring done, the resource chain is a bit
of a mess there, but I wanted to keep number of changes minimal to keep
this backportable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-05 16:59:09 +04:00
Noel Georgi
cad43f0ad3
chore: remove k8s master label
Since talos now defaults to k8s 1.27, remove the handling
of `master` label for controlplane nodes.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-04-25 20:48:05 +05:30
Nico Berlee
0af8fe2fb5
feat: netstat pod support
talosctl netstat -k show all host and non-hostnetwork pods sockets/connections.
talosctl netstat namespace/pod shows sockets/connections of a specific pod +
autocompletes in the shell.

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-30 23:39:38 +04:00
Noel Georgi
d1a61fd343
chore: bump golangci-lint
Bump golangci-lint and fixup new warnings. Ignore check that checks for
used function parameters, it's kind of noisy and makes it confusing to
read interface implementations.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-22 19:55:38 +05:30
Noel Georgi
c63cf90e32
feat: update k8s to v1.27.0-beta.0
Update k8s to v1.27.0-beta.0

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-21 23:59:17 +05:30
Noel Georgi
cf101e56fb
fix: add --force flag for talosctl gen
Error out if file(s) already exists and warn user to use
`--force` to overwrite.

Fixes: #6963

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-17 15:07:12 +05:30
Andrey Smirnov
442cb9c1b0
feat: implement APIs to write to META
This allows to put keys to META partition.

META contents can be viewed with `talosctl get metakeys`.

There is not real usecase for it yet, but the next PRs will introduce
two special keys which can be written:

* platform network config for `metal`
* `${code}` variable

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-15 22:17:52 +04:00
Nico Berlee
97048f7c37
feat: netstat in API and client
Implements netstat in Talos API and client (talosctl).

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-09 15:48:30 +04:00
Andrey Smirnov
8ea4bfad8f
refactor: improve the kubernetes upgrade flow
Use new version of go-kubernetes, and move the `kube-proxy` DaemonSet
update to follow common logic of bootstrap manifests update.

This fixes a confusing behavior when after `k8s-upgrade` the version of
`kube-proxy` is not updated in the machine config.

See https://github.com/siderolabs/go-kubernetes/pull/3

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-06 15:01:29 +04:00
Andrey Smirnov
337aaba7a7
feat: add 'os:operator' role
This introduces a new role for Talos API which fills the gap between
`os:reader` and `os:admin` roles.

Fixes #6898

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-01 16:12:25 +04:00
Artem Chernyshev
b520710810
feat: introduce new flag in reset API that makes Talos reset user disks
Fixes: https://github.com/siderolabs/talos/issues/6815

Additionally, make it possible to run reset in maintenance mode: to
enable a way for resetting system disk and remove all traces of Talos
from it.

The new reset flow works in a separate sequence, changed disk probe
lookup to check the boot partition instead of the ephemeral one.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-02-28 15:10:41 +03:00
Andrey Smirnov
230e46e567
refactor: extract parts of kubernetes libraries
The shared code is going out to the
github.com/siderolabs/go-kubernetes library.

The code will be used in Talos and other projects using same features.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-22 14:56:49 +04:00
Dmitriy Matrenichev
3d55bd80f4
fix: add --force flag to talosctl gen config
Only overwrite existing files if explicitly demanded.

Closes #6847

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-02-20 23:44:00 +03:00
Andrey Smirnov
062c7d754b
test: fix integration test on cp endpoint update
As with #6724, controlplane node kubelet doesn't use control plane
endpoint anymore, run the test on the worker node instead of cp node.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-01-12 15:23:14 +04:00
Andrey Smirnov
96629d5ba6
feat: implement etcd maintenance commands
This allows to safely recover out of space quota issues, and perform
degragmentation as needed.

`talosctl etcd status` command provides lots of information about the
cluster health.

See docs for more details.

Fixes #4889

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-01-03 23:25:28 +04:00
Andrey Smirnov
31fb905358
feat: update Linux 6.1.1, containerd 1.6.14
Bumps tools/pkgs/extras to the latest.

Bumps Go modules.

Enables adaptive capacity for COSI state.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-12-23 20:30:09 +04:00