349 Commits

Author SHA1 Message Date
Andrey Smirnov
2ff81c06bc
feat: update runc 1.1.12, containerd 1.7.13
Also:

* Linux 6.6.14 + XDP enablement
* etcd 3.5.12

Various other bumps for the tools, utilities, and Go modules.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-01 17:01:04 +04:00
Andrey Smirnov
b44551ccdb
feat: update Linux to 6.6.13
See https://github.com/siderolabs/pkgs/pull/873

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-29 16:50:33 +04:00
Noel Georgi
0b94550c42
chore: fix the gvisor test
The gvisor test was not using the correct runtimeclass and would have
always passed the regardless.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-12-15 20:48:44 +05:30
Andrey Smirnov
10c59a6b90
fix: leave discovery service later in the reset sequence
Fixes #8057

I went back and forth on the way to fix it exactly, and ended up with a
pretty simple version of a fix.

The problem was that discovery service was removing the member at the
initial phase of reset, which actually still requires KubeSpan to be up:

* leaving `etcd` (need to talk to other members)
* stopping pods (might need to talk to Kubernetes API with some CNIs)

Now leaving discovery service happens way later, when network
interactions are no longer required.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-13 19:16:12 +04:00
Andrey Smirnov
36c8ddb5e1
feat: implement ingress firewall rules
Fixes #4421

See documentation for details on how to use the feature.

With `talosctl cluster create`, firewall can be easily test with
`--with-firewall=accept|block` (default mode).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-30 22:58:16 +04:00
Noel Georgi
f041b26299
chore: add tests for mdadm extension
Add tests for mdadm extension.

See: https://github.com/siderolabs/extensions/pull/271

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-27 23:18:35 +05:30
Dmitriy Matrenichev
dd45dd06cf
chore: add custom node taints
This PR adds support for custom node taints. Refer to `nodeTaints` in the `configuration` for more information.

Closes #7581

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-25 18:33:18 +03:00
Andrey Smirnov
06941b7e5c
fix: allow rootfs propagation configuration for extension services
Fixes #7873

Some services which perform mounts inside the container which require
mounts to propagate back to the host (e.g. `stargz-snapshotter`) require
this configuration setting.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-13 21:58:22 +04:00
Andrey Smirnov
e22ab440d7
feat: update Linux 6.1.61, containerd 1.7.8, runc 1.1.10
Bump tools/pkgs/extras.

Update Go dependencies.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-09 20:17:28 +04:00
Andrey Smirnov
813442dd7a
fix: don't validate machine.install if installed
As Talos doesn't consume `.machine.install` if already installed, there
is no point in validating it once already installed.

This fixes a problem users often run into: after a reboot/upgrade the
system disk blockdevice name changes, due to the kernel upgrade, or just
unpredictable behavior of device discovery. Talos fails to boot as it
can't validate the machine config, while it's already installed, so
actual blockdevice name doesn't matter.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-03 15:08:42 +04:00
Utku Ozdemir
5dff164f1c
fix: fix error output of cli action tracker
Before we started a reboot/shutdown/reset/upgrade action with the action tracker (`--wait`), we were setting a flag to prevent cobra from printing the returned error from the command.

This was to prevent the error from being printed twice, as the reporter of the action tracker already prints any errors occurred during the action execution.

But if the error happens too early - i.e. before we even started the status printer goroutine, then that error wouldn't be printed at all, as we have suppressed the errors.

This PR moves the suppression flag to be set after the status printer is started - so we still do not double-print the errors, but neither do we suppress any early-stage error from being printed.

Closes siderolabs/talos#7900.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-10-27 21:16:54 +02:00
Andrey Smirnov
8eba4c5999
feat: generate secrets bundle from the machine config
This allows to "recover" secrets if the machine config was generated
first without explicitly saving secrets bundle.

Fixes #7895

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-25 13:44:14 +04:00
Andrey Smirnov
f9639fb531
test: fix 'talosctl gen' tests
There were weird hacks put into the tests, while each test already runs
in a temporary directory as 'working directory', so no hacks are needed.

Moreover, using fixed `/tmp/...` paths leads to test failures, as CI
runs docker & QEMU tests in parallel conflicting with each other.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 16:24:02 +04:00
Noel Georgi
69d8054c9e
chore: drop UpdateEndpointSuite
drop `UpdateEndpointSuite` suite since KubePrism is enabled by default
starting Talos 1.6 and the test never passes since K8s node is always
ready since it can connect to api server over KubePrism.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-04 00:26:59 +05:30
Noel Georgi
9b5cfdd0bc
chore: add tests for iscsi
Add tests for iscsi to make sure it works.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-02 22:12:42 +05:30
Andrey Smirnov
e7575ecaae
feat: support n-5 latest Kubernetes versions
For Talos 1.6 this means 1.24-1.29 Kubernetes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-29 13:41:56 +04:00
Andrey Smirnov
a52d3cda3b
chore: update gen and COSI runtime
No actual changes, adapting to use new APIs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-22 12:13:13 +04:00
Noel Georgi
9c2ba7c6fa
chore: add tests for chelsio drivers
Add tests for Chelsio drivers and firmware.

Ref: https://github.com/siderolabs/extensions/pull/232

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-20 20:07:25 +05:30
Andrey Smirnov
96f2a62eaf
test: update upgrade tests versions
Use a 1.4/1.5 releases.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-19 14:53:25 +04:00
Serge Logvinov
3f52320752
feat: upgrade-k8s without comments
This feature allows us to remove any comments from the machineconfig after
upgrading Kubernetes.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-12 14:50:56 +04:00
Noel Georgi
1eebbce357
chore: add output flag for talosctl config info
Add output flag for `talosctl config info`.

This allows to programatically gather endpoints for CI tests.

Eg:

```bash
_out/talosctl-linux-amd64 config info --output json | jq '.Contexts[].Endpoints[0]'
```

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-05 21:25:21 +04:00
Noel Georgi
3fbed806c4
chore: add tests for util-linux extensions
Add tests for utils-linux extensions.

Ref: https://github.com/siderolabs/extensions/pull/216

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-05 19:29:50 +05:30
Andrey Smirnov
c918c0855d
fix: set correct (1 year) talosconfig expiration
Fixes #7698

Also fix `talosctl config info` for `talosconfig` without a client
certificate (e.g. Omni-generated one).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-04 14:46:28 +04:00
Noel Georgi
d03dc7a8af
chore: validate new system extensions
Validate the amdgpu and intel-ice firmware extensions.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-25 17:18:19 +05:30
Andrey Smirnov
3c9f7a7de6
chore: re-enable nolintlint and typecheck linters
Drop startup/rand.go, as since Go 1.20 `rand.Seed` is done
automatically.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-25 01:05:41 +04:00
Andrey Smirnov
8670450d28
release(v1.6.0-alpha.0): prepare release
This is the official v1.6.0-alpha.0 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-24 17:09:34 +04:00
Noel Georgi
6778ded29d
feat: add e2e-aws for nvidia extensions
Add e2e tests for nvidia

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-24 17:43:36 +05:30
Noel Georgi
d6b2719e2e
chore: drone: move extensions step to a function
Move drone extensions integration to a function. This allows us to
re-use the code and just depend on a single step rather than explicitly
defining all dependencies.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-23 20:56:43 +05:30
Andrey Smirnov
9608ef56dc
chore: allow bridge traffic with DHCP broadcast traffic
This is required for https://github.com/siderolabs/sidero/pull/1070, as
we need to allow DHCP traffic from Sidero controller running in a VM
through the bridge to other VMs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-23 18:37:37 +04:00
Noel Georgi
833895940b
chore: add tests for zfs extension
Add tests for ZFS and btrfs extensions.
Also fix the e2e-aws cron pipeline.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-23 11:16:25 +05:30
Noel Georgi
6b0373ebef
chore: move bash tests to integration
move extensions and secureboot tests to integration.
Makes it easier to test.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-17 19:58:35 +05:30
Dmitriy Matrenichev
c4a1ca8d61
chore: remove <-errCh where possible in grpc methods
Simplify code by passing error directly into the pipe closer.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-08-07 22:28:58 +03:00
Andrey Smirnov
06369e8195
fix: retry CRI pod removal, fix upgrade flow in the tests
It seems that CRI has a bit of eventual consistency, and it might fail
to remove a stopped pod failing that it's still running.

Rewrite the upgrade API call in the upgrade test to actually wait for
the upgrade to be successful, and fail immediately if it's not
successful. This should improve the test stability and it should make
it easier to find issues immediately.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-12 16:20:10 +04:00
Andrey Smirnov
8017afb107
feat: implement CRI image management and pre-pull on K8s upgrade
Fixes #6391

Implement a set of APIs and commands to manage images in the CRI, and
pre-pull images on Kubernetes upgrades.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-11 19:25:10 +04:00
Artem Chernyshev
ce63abb219
feat: add KMS assisted encryption key handler
Talos now supports new type of encryption keys which rely on Sealing/Unsealing randomly generated bytes with a KMS server:

```
systemDiskEncryption:
  ephemeral:
    keys:
      - kms:
          endpoint: https://1.2.3.4:443
        slot: 0
```
gRPC API definitions and a simple reference implementation of the KMS server can be found in this
[repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go).

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-07 19:02:39 +03:00
Noel Georgi
8daf432b29
chore: bump deps
Bump deps.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-22 22:41:08 +05:30
Noel Georgi
e3f3f5794d
feat: implement revert for sd-boot
Implement revert for sd-boot.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-22 20:20:31 +05:30
Dmitriy Matrenichev
665702ddd3
chore: fix cilium e2e tests
`WITH_CONFIG_PATCH_WORKER` check result was overriding any value set in `CONFIG_PATCH_FLAG` variable.
Move it to the different variable.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-14 15:08:31 +04:00
Andrey Smirnov
e9dbc9311b
test: bump versions for upgrade tests
As we're getting to 1.5.0, bump versions for upgrade tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 23:17:22 +04:00
Andrey Smirnov
0a99965efb
refactor: replace uncordonNode with controllers
Fixes #7233

Waiting for node readiness now happens in the `MachineStatus` controller
which won't mark the node as ready until Kubernetes `Node` is ready.

Handling cordoning/uncordining happens with help of additional resource
in `NodeApplyController`.

New controller provides reactive `NodeStatus` resource to see current
status of Kubernetes `Node`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 21:48:42 +04:00
Andrey Smirnov
dbaf5c6997
refactor: task labelControlPlane into controllers
See #7233

The controlplane label is simply injected into existing controller-based
node label flow.

For controlplane taint default NoScheduleTaint, additional controller &
resource was implemented to handle node taints.

This also fixes a problem with `allowSchedulingOnControlPlanes` not
being reactive to config changes - now it is.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-12 15:25:13 +04:00
Dmitriy Matrenichev
8a02ecd4cb
chore: add endpoints balancer controller
This PR adds support for creating a list of API endpoints (each is pair of host and port).

It gets them from
- Machine config cluster endpoint.
- Localhost with LocalAPIServerPort if machine is control panel.
- netip.Addr[0] and port from affiliates if they are control panels.

For #7191

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-05 20:47:52 -04:00
Andrey Smirnov
badbc51e63
refactor: rewrite code to include preliminary support for multi-doc
`config.Container` implements a multi-doc container which implements
both `Container` interface (encoding, validation, etc.), and `Conifg`
interface (accessing parts of the config).

Refactor `generate` and `bundle` packages to support multi-doc, and
provide backwards compatibility.

Implement a first (mostly example) machine config document for
SideroLink API URL.

Many places don't properly support multi-doc yet (e.g. config patches).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-31 18:38:05 +04:00
Andrey Smirnov
dc6764871c
refactor: move around config interfaces, make RawV1Alpha1 typed
See #7230

Refactor more config interfaces, move config accessor interfaces
to different package to break the dependency loop.

Make `.RawV1Alpha1()` method typed to avoid type assertions everywhere.

No functional changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 22:08:58 +04:00
Andrey Smirnov
0bb7e8a5cf
refactor: split config.Provider into Config & Container
See #7230

This is a step towards preparing for multi-doc config.

Split the `config.Provider` interface into parts which have different
implementation:

* `config.Config` accesses the config itself, it might be implemented by
  `v1alpha1.Config` for example
* `config.Container` will be a set of config documents, which implement
  validation, encoding, etc.

`Version()` method dropped, as it makes little sense and it was almost
not used.

`Raw()` method renamed to `RawV1Alpha1()` to support legacy direct
access to `v1alpha1.Config`, next PR will refactor more to make it
return proper type.

There will be many more changes coming up.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 16:05:16 +04:00
Dmitriy Matrenichev
45e6e27af7
chore: bump runtime
Use new functions and methods from runtime module.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-05-11 17:18:08 -04:00
Andrey Smirnov
860002c735
fix: don't reload control plane pods on cert SANs changes
Fixes #7159

The change looks big, but it's actually pretty simple inside: the static
pods had an annotation which tracks a version of the secrets which
forced control plane pods to reload on a change. At the same time
`kube-apiserver` can reload certificate inputs automatically from files
without restart.

So the inputs were split: the dynamic (for kube-apiserver) inputs don't
need to be reloaded, so its version is not tracked in static pod
annotation, so they don't cause a reload. The previous non-dynamic
resource still causes a reload, but it doesn't get updated when e.g.
node addresses change.

There might be many more refactoring done, the resource chain is a bit
of a mess there, but I wanted to keep number of changes minimal to keep
this backportable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-05 16:59:09 +04:00
Noel Georgi
cad43f0ad3
chore: remove k8s master label
Since talos now defaults to k8s 1.27, remove the handling
of `master` label for controlplane nodes.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-04-25 20:48:05 +05:30
Nico Berlee
0af8fe2fb5
feat: netstat pod support
talosctl netstat -k show all host and non-hostnetwork pods sockets/connections.
talosctl netstat namespace/pod shows sockets/connections of a specific pod +
autocompletes in the shell.

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-30 23:39:38 +04:00
Noel Georgi
d1a61fd343
chore: bump golangci-lint
Bump golangci-lint and fixup new warnings. Ignore check that checks for
used function parameters, it's kind of noisy and makes it confusing to
read interface implementations.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-22 19:55:38 +05:30