201 Commits

Author SHA1 Message Date
Noel Georgi
64914b086c
chore: add test for crun extension
Add a test to verify the `crun` runtimeclass container-runtime extension
works as expected.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-02 20:15:01 +05:30
Andrey Smirnov
7a1c62b8bc
feat: publish installed extensions as node labels/annotations
Extensions are posted the following way:

`extensions.talos.dev/<name>=<version>`

The name should be valid as a label (annotation) key.

If the value is valid as a label value, use labels, otherwise use
annotations.

Also implements node annotations in the machine config as a side-effect.

Fixes #9089

Fixes #8971

See #9070

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-01 17:32:09 +04:00
Noel Georgi
117628aa60
chore: add test for gvisor extension with platform kvm
Add test for Gvisor extensions when kvm platform is used.

The test is marked as skipped until pod termination issue is resolved.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-25 19:15:27 +05:30
Andrey Smirnov
b07338f547
feat: provide machine config document to update trusted CA roots
Fixes #8867

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-12 19:28:31 +04:00
Dmitriy Matrenichev
dad9c40c73
chore: simplify code
- replace `interface{}` with `any` using `gofmt -r 'interface{} -> any -w'`
- replace `a = []T{}` with `var a []T` where possible.
- replace `a = []T{}` with `a = make([]T, 0, len(b))` where possible.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-08 18:14:00 +03:00
Andrey Smirnov
2512ef435f
test: fix the integrtion tests for apply-config
They got broken after refactoring.

Also use this PR to test things before the release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-08 14:06:45 +04:00
Dmitriy Matrenichev
2d054ad355
chore: handle documents diff in apply-config dry run
Before this PR diff generator only diffed the v1alpha1 config and nothing else. With this PR it also takes
separate docs into the account.

```shell
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
+	&runtime.KmsgLogV1Alpha1{
+		Meta:       meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"},
+		MetaName:   "omni-kmsg",
+		KmsgLogURL: s"tcp://[fdae:41e4:649b:9303::1]:8092",
+	},
}
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml
Applied configuration without a reboot
~ >
~ >
~ >
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
	&runtime.KmsgLogV1Alpha1{Meta: {MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"}, MetaName: "omni-kmsg", KmsgLogURL: {URL: &{Scheme: "tcp", Host: "[fdae:41e4:649b:9303::1]:8092"}}},
+	&network.DefaultActionConfigV1Alpha1{
+		Meta:    meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "NetworkDefaultActionConfig"},
+		Ingress: s"block",
+	},
}
```

Closes #8885

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-27 21:15:36 +03:00
Dmitriy Matrenichev
c603d2bf95
chore: output more info when ExecuteCommandInPod fails
This should make investigating things like [this](https://github.com/siderolabs/talos/actions/runs/9411253542/job/25924192027)
easier.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-24 20:15:45 +03:00
Noel Georgi
86a3222aee
chore: use new disks api for iscsi tests
The iscsi test broke when the new disks api was introduced making the
test pass always, now filter other only `iscsi` disk types using the new
disks API.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-18 18:38:21 +05:30
Andrey Smirnov
d1a0c1f983
test: fix the integration test for no META name
When META has never been written (e.g. booted from a disk image), it
won't be detected as `talosmeta`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 15:17:52 +04:00
Andrey Smirnov
7cbdce73f7
fix: detect CD devices, fix user disks wipe test
Detect CD devices, and set size to 0 for CD without media.

In user disk wipe tests, skip device mapper devices and CD-ROM.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:00:06 +04:00
Andrey Smirnov
f07b79f4a8
feat: provide disk detection based on new blockdevices
Uses go-siderolabs/go-blockdevice/v2 for all the hard parts,
provides new resource `Disk` which describes all disks in the system.

Additional resource `SystemDisk` always point to the system disk (based
on the location of `META` partition).

The `Disks` API (and `talosctl disks`) provides a view now into the
`talosctl get disks` to keep backwards compatibility.

QEMU provisioner can now create extra disks of various types: IDE, AHCI,
SCSI, NVME, this allows to test detection properly.

The new resource will be the foundation for volume provisioning (to pick
up the disk to provision the volume on).

Example:

```
talosctl -n 172.20.0.5 get disks
NODE         NAMESPACE   TYPE   ID        VERSION   SIZE          READ ONLY   TRANSPORT   ROTATIONAL   WWID                                                               MODEL            SERIAL
172.20.0.5   runtime     Disk   loop0     1         65568768      true
172.20.0.5   runtime     Disk   nvme0n1   1         10485760000   false       nvme                     nvme.1b36-6465616462656566-51454d55204e564d65204374726c-00000001   QEMU NVMe Ctrl   deadbeef
172.20.0.5   runtime     Disk   sda       1         10485760000   false       virtio      true                                                                            QEMU HARDDISK
172.20.0.5   runtime     Disk   sdb       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00013        QEMU HARDDISK
172.20.0.5   runtime     Disk   sdc       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00001        QEMU HARDDISK
172.20.0.5   runtime     Disk   vda       1         12884901888   false       virtio      true
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-07 20:18:32 +04:00
Andrey Smirnov
7c9a14383e
fix: volume discovery improvements
Use shared locks, discover more partitions, some other small changes.

Re-enable the flaky test.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-06 19:45:40 +04:00
Andrey Smirnov
30860210cc
test: fix hardware test not to require PCI devices
On e.g. Azure VMs there are non reported.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-03 17:20:42 +04:00
Andrey Smirnov
4dd0aa7120
feat: implement PCI device bus enumeration
Fixes #8826

From the QEMU VM:

```shell
$ talosctl -n 172.20.0.5 get pcidevice
NODE         NAMESPACE   TYPE        ID             VERSION   CLASS                       SUBCLASS                    VENDOR              PRODUCT
172.20.0.5   hardware    PCIDevice   0000:00:00.0   1         Bridge                      Host bridge                 Intel Corporation   82G33/G31/P35/P31 Express DRAM Controller
172.20.0.5   hardware    PCIDevice   0000:00:01.0   1         Display controller          VGA compatible controller
172.20.0.5   hardware    PCIDevice   0000:00:02.0   1         Network controller          Ethernet controller         Red Hat, Inc.       Virtio network device
172.20.0.5   hardware    PCIDevice   0000:00:03.0   1         Unclassified device                                     Red Hat, Inc.       Virtio RNG
172.20.0.5   hardware    PCIDevice   0000:00:04.0   1         Unclassified device                                     Red Hat, Inc.       Virtio memory balloon
172.20.0.5   hardware    PCIDevice   0000:00:05.0   1         Communication controller    Communication controller    Red Hat, Inc.       Virtio console
172.20.0.5   hardware    PCIDevice   0000:00:06.0   1         Generic system peripheral   System peripheral           Intel Corporation   6300ESB Watchdog Timer
172.20.0.5   hardware    PCIDevice   0000:00:07.0   1         Mass storage controller     SCSI storage controller     Red Hat, Inc.       Virtio block device
172.20.0.5   hardware    PCIDevice   0000:00:1f.0   1         Bridge                      ISA bridge                  Intel Corporation   82801IB (ICH9) LPC Interface Controller
172.20.0.5   hardware    PCIDevice   0000:00:1f.2   1         Mass storage controller     SATA controller             Intel Corporation   82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
172.20.0.5   hardware    PCIDevice   0000:00:1f.3   1         Serial bus controller       SMBus                       Intel Corporation   82801I (ICH9 Family) SMBus Controller
```

```yaml
node: 172.20.0.5
metadata:
    namespace: hardware
    type: PCIDevices.hardware.talos.dev
    id: 0000:00:1f.3
    version: 1
    owner: hardware.PCIDevicesController
    phase: running
    created: 2024-05-30T12:09:05Z
    updated: 2024-05-30T12:09:05Z
spec:
    class: Serial bus controller
    subclass: SMBus
    vendor: Intel Corporation
    product: 82801I (ICH9 Family) SMBus Controller
    class_id: "0x0c"
    subclass_id: "0x05"
    vendor_id: "0x8086"
    product_id: "0x2930"
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-31 20:56:16 +04:00
Dmitriy Matrenichev
893e64fcb1
fix: replace nslookup with dig in integration tests
This should be more reliable on `integration-aws-*` and others.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-30 01:37:01 +03:00
Dmitry Sharshakov
da8305ffb4
test: add a test for watchdog timers
Try to activate/deactivate watchdogs, change timeout, run only on QEMU.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
2024-05-28 16:46:04 +04:00
Dmitriy Matrenichev
a9cf9b7892
fix: correctly handle dns messages in our dns implementation
- By default, github.com/miekg/dns uses `dns.MinMsgSize` for UDP messages, which is 512 bytes. This is too small for some
DNS request/responses, and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
4096 bytes, which is the maximum size of a dns packet payload per RFC 6891.
- We also retry the request if the response is truncated or previous connection was closed.
- And finally we properly handle the case where the response is larger than the client buffer size,
and we return a truncated correct response.

Closes #8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-24 21:41:00 +03:00
Andrey Smirnov
c2b19dcb97
chore: move to containerd 2.0 API
Lots of module moves/renames.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-24 21:48:55 +04:00
Andrey Smirnov
2e64e9e4e0
fix: require accepted CAs on worker nodes
Note: this issue never happens with default Talos worker configuration
(generated by Omni, `talosctl gen config` or CABPT).

Before change https://github.com/siderolabs/talos/pull/4294 3 years ago,
worker nodes connected to trustd in "insecure" mode (without validating
the trustd server certificate). The change kept backwards compatibility,
so it still allowed insecure mode on upgrades.

Now it's time to break this compatibility promise, and require
accepted CAs to be always present. Adds validation for machine
configuration, so if upgrade is attempeted, it would not validate the
machine config without accepted CAs.

Now lack of accepted CAs would lead to failure to connect to trustd.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-23 17:48:16 +04:00
Andrey Smirnov
b7afe2669b
feat: update Linux 6.6.30
Update tools/pkgs to the latest version, brings in all updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-13 17:14:03 +04:00
Andrey Smirnov
b690ffeb89
test: improve DNS resolver test stability
Run a health check before the test, as the test depends on CoreDNS being
healthy, and previous tests might disturb the cluster.

Also refactor by using watch instead of retries, make pods terminate
fast.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-29 19:31:34 +04:00
Andrey Smirnov
05fd042bb3
test: improve the reset integration tests
Provide a trace for each step of the reset sequence taken, so if one of
those fails, integration test produces a meaningful message instead of
proceeding and failing somewhere else.

More cleanup/refactor, should be functionally equivalent.

Fixes #8635

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-24 18:35:39 +04:00
Andrey Smirnov
3433fa13bf
feat: use container DNS when in container mode
More specifically, pick up `/etc/resolv.conf` contents by default when
in container mode, and use that as a base resolver for the host DNS.

Fixes #8303

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-16 17:01:36 +04:00
Andrey Smirnov
c8f674bd3d
test: add a test for 'spin' container runtime
See https://github.com/siderolabs/extensions/pull/355

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-10 20:42:16 +04:00
Andrey Smirnov
9aa1e1b79b
fix: present all accepted CAs to the kube-apiserver
This fixes an issue with a single controlplane cluster.

Properly present all accepted CAs to the apiserver, in the test let the
cluster fully recovery between two CA rotations performed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-08 23:33:22 +04:00
Dmitry Sharshakov
653f838b09
feat: support multiple Docker cluster in talosctl cluster create
Dynamically map Kubernetes and Talos API ports to an available port on
the host, so every cluster gets its own unique set of parts.

As part of the changes, refactor the provision library and interfaces,
dropping old weird interfaces replacing with (hopefully) much more
descriprive names.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-04 21:21:39 +04:00
Andrey Smirnov
78b9bd9273
fix: report unsupported x86_64 microarchitecture level
Fixes #8361

Talos requires v2 (circa 2008), but VMs are often configured to limit
the exposed features to the baseline (v1).

```
[    0.779218] [talos] [initramfs] booting Talos v1.7.0-alpha.1-35-gef5bbe728-dirty
[    0.779806] [talos] [initramfs] CPU: QEMU Virtual CPU version 2.5+, 4 core(s), 1 thread(s) per core
[    0.780529] [talos] [initramfs] x86_64 microarchitecture level: 1
[    0.781018] [talos] [initramfs] it might be that the VM is configured with an older CPU model, please check the VM configuration
[    0.782346] [talos] [initramfs] x86_64 microarchitecture level 2 or higher is required, halting
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-03 16:09:57 +04:00
Noel Georgi
f515741b52
chore: add equinix e2e-tests
Add equinix e2e-tests.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-04-02 17:16:59 +05:30
Andrey Smirnov
7a68504b6b
feat: support rotating Kubernetes CA
Fixes #8440

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-01 22:08:02 +04:00
Dmitriy Matrenichev
8dc4910c48
chore: enable "WG over GRPC" testing in siderolink agent tests
Fixes https://github.com/siderolabs/talos/issues/8514
For https://github.com/siderolabs/talos/issues/8392

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-01 18:24:57 +03:00
Andrey Smirnov
8eacc4ba80
feat: support rotation of Talos API CA
This allows to roll all nodes to use a new CA, to refresh it, or e.g.
when the `talosconfig` was exposed accidentally.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-22 12:16:47 +04:00
Dmitriy Matrenichev
19f15a840c
chore: bump golangci-lint to 1.57.0
Fix all discovered issues.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-21 01:06:53 +03:00
Andrey Smirnov
ead37abf09
test: disable volume tests
They're flaky, disable until the root cause is known.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-19 16:40:42 +04:00
Andrey Smirnov
15beb14780
feat: implement blockdevice watch controller
This controller combines kobject events, and scan of `/sys/block` to
build a consistent list of available block devices, updating resources
as the blockdevice changes.

Based on these resources the next step can run probe on the blockdevices
as they change to present a consistent view of filesystems/partitions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-18 18:28:40 +04:00
Andrey Smirnov
9afa70baf3
fix: patch correctly config in talosctl upgrade-k8s
The current code was stipping non-`v1alpha1.Config` documents. Provide a
proper method in the config provider, and update places using it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-15 20:42:44 +04:00
Andrey Smirnov
3130caf954
chore: re-enable DRBD extension
See https://github.com/siderolabs/extensions/pull/343

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-15 15:55:18 +04:00
Andrey Smirnov
bbed07e03a
feat: update Linux to 6.6.18
ZFS extension got re-enabled for 1.7.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-29 20:08:59 +04:00
Fabiano Fidêncio
64e9703f86
chore: add tests for the Kata Containers extension
Let's add a very basic test for the Kata Containers extension, mimicing
what's already in place for gVisor.

This depends on the work being done in:
https://github.com/siderolabs/extensions/pull/279

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-02-20 18:49:47 +05:30
Dmitriy Matrenichev
fa3b933705
chore: replace fmt.Errorf with errors.New where possible
This time use `eg` from `x/tools` repo tool to do this.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-14 17:39:30 +03:00
Dmitriy Matrenichev
5324d39167
chore: bump stuff
Also fix .golangci.yml file.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-09 19:19:25 +03:00
Saiyam Pathak
4184e617ab
chore: add test for wasmedge runtime extension
Add tests for WasmEdge container runtime system extension.

Signed-off-by: Saiyam Pathak <saiyam911@gmail.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-02-05 18:18:13 +05:30
Andrey Smirnov
b44551ccdb
feat: update Linux to 6.6.13
See https://github.com/siderolabs/pkgs/pull/873

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-29 16:50:33 +04:00
Noel Georgi
0b94550c42
chore: fix the gvisor test
The gvisor test was not using the correct runtimeclass and would have
always passed the regardless.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-12-15 20:48:44 +05:30
Andrey Smirnov
10c59a6b90
fix: leave discovery service later in the reset sequence
Fixes #8057

I went back and forth on the way to fix it exactly, and ended up with a
pretty simple version of a fix.

The problem was that discovery service was removing the member at the
initial phase of reset, which actually still requires KubeSpan to be up:

* leaving `etcd` (need to talk to other members)
* stopping pods (might need to talk to Kubernetes API with some CNIs)

Now leaving discovery service happens way later, when network
interactions are no longer required.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-13 19:16:12 +04:00
Andrey Smirnov
36c8ddb5e1
feat: implement ingress firewall rules
Fixes #4421

See documentation for details on how to use the feature.

With `talosctl cluster create`, firewall can be easily test with
`--with-firewall=accept|block` (default mode).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-30 22:58:16 +04:00
Noel Georgi
f041b26299
chore: add tests for mdadm extension
Add tests for mdadm extension.

See: https://github.com/siderolabs/extensions/pull/271

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-27 23:18:35 +05:30
Dmitriy Matrenichev
dd45dd06cf
chore: add custom node taints
This PR adds support for custom node taints. Refer to `nodeTaints` in the `configuration` for more information.

Closes #7581

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-25 18:33:18 +03:00
Andrey Smirnov
06941b7e5c
fix: allow rootfs propagation configuration for extension services
Fixes #7873

Some services which perform mounts inside the container which require
mounts to propagate back to the host (e.g. `stargz-snapshotter`) require
this configuration setting.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-13 21:58:22 +04:00
Andrey Smirnov
813442dd7a
fix: don't validate machine.install if installed
As Talos doesn't consume `.machine.install` if already installed, there
is no point in validating it once already installed.

This fixes a problem users often run into: after a reboot/upgrade the
system disk blockdevice name changes, due to the kernel upgrade, or just
unpredictable behavior of device discovery. Talos fails to boot as it
can't validate the machine config, while it's already installed, so
actual blockdevice name doesn't matter.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-03 15:08:42 +04:00