399 Commits

Author SHA1 Message Date
Andrey Smirnov
736c1485e2
fix: change the UEFI firmware search path order
Ensure that SecureBoot enabled images come before regular ones.

With Ubuntu 24.04 `ovmf` package, due to the ordering of the search
paths `talosctl` might pick up a wrong image and disable SecureBoot.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-11 21:56:33 +04:00
Dmitriy Matrenichev
dad9c40c73
chore: simplify code
- replace `interface{}` with `any` using `gofmt -r 'interface{} -> any' -w`
- replace `a = []T{}` with `var a []T` where possible.
- replace `a = []T{}` with `a = make([]T, 0, len(b))` where possible.
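
A minimal sketch of the three rewrites listed above. The function names (`describe`, `evens`, `double`) are hypothetical and do not come from the Talos code base.

```go
package main

import "fmt"

// describe takes `any`, the alias for `interface{}` available since Go 1.18.
func describe(v any) string {
	return fmt.Sprintf("%T", v)
}

// evens starts from a nil slice (`var out []int`) instead of `[]int{}`.
// append works on a nil slice, so nothing is allocated until it is needed.
func evens(in []int) []int {
	var out []int

	for _, v := range in {
		if v%2 == 0 {
			out = append(out, v)
		}
	}

	return out
}

// double knows the final length up front, so it preallocates the capacity.
func double(in []int) []int {
	out := make([]int, 0, len(in))

	for _, v := range in {
		out = append(out, v*2)
	}

	return out
}

func main() {
	fmt.Println(describe(42), evens([]int{1, 2, 3, 4}), double([]int{1, 2}))
}
```

The "where possible" caveat matters: a nil slice and an empty slice behave the same under `len`, `range`, and `append`, but `encoding/json` marshals them differently (`null` versus `[]`).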

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-08 18:14:00 +03:00
Andrey Smirnov
2512ef435f
test: fix the integration tests for apply-config
They got broken after refactoring.

Also use this PR to test things before the release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-08 14:06:45 +04:00
Dmitriy Matrenichev
2d054ad355
chore: handle documents diff in apply-config dry run
Before this PR, the diff generator only diffed the v1alpha1 config and nothing else. With this PR, it also takes
separate documents into account.

```shell
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
+	&runtime.KmsgLogV1Alpha1{
+		Meta:       meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"},
+		MetaName:   "omni-kmsg",
+		KmsgLogURL: s"tcp://[fdae:41e4:649b:9303::1]:8092",
+	},
}
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml
Applied configuration without a reboot
~ >
~ >
~ >
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
	&runtime.KmsgLogV1Alpha1{Meta: {MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"}, MetaName: "omni-kmsg", KmsgLogURL: {URL: &{Scheme: "tcp", Host: "[fdae:41e4:649b:9303::1]:8092"}}},
+	&network.DefaultActionConfigV1Alpha1{
+		Meta:    meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "NetworkDefaultActionConfig"},
+		Ingress: s"block",
+	},
}
```

Closes #8885

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-27 21:15:36 +03:00
Dmitriy Matrenichev
c603d2bf95
chore: output more info when ExecuteCommandInPod fails
This should make investigating things like [this](https://github.com/siderolabs/talos/actions/runs/9411253542/job/25924192027)
easier.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-24 20:15:45 +03:00
Noel Georgi
86a3222aee
chore: use new disks api for iscsi tests
The iscsi test broke when the new disks API was introduced, making the
test always pass. Now filter for only `iscsi` disk types using the new
disks API.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-18 18:38:21 +05:30
Andrey Smirnov
7fcb521a6a
feat: use hydrophone instead of sonobuoy
Fixes #8790

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 16:51:45 +04:00
Andrey Smirnov
d1a0c1f983
test: fix the integration test for no META name
When META has never been written (e.g. booted from a disk image), it
won't be detected as `talosmeta`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 15:17:52 +04:00
Andrey Smirnov
e8ced2c2dd
chore: drop k8s timeout in the default kubeconfig
(This is not user-facing, but rather internal use of the kubeconfig in
the tests/inside the machine).

This was added 4 years ago as a workaround, but instead of a global
timeout we should rather use contexts with timeouts/deadlines (and we
do!).

Setting a global timeout breaks streaming Kubernetes pod logs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:29:50 +04:00
Andrey Smirnov
7cbdce73f7
fix: detect CD devices, fix user disks wipe test
Detect CD devices, and set size to 0 for CD without media.

In user disk wipe tests, skip device mapper devices and CD-ROM.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:00:06 +04:00
Andrey Smirnov
f07b79f4a8
feat: provide disk detection based on new blockdevices
Uses go-siderolabs/go-blockdevice/v2 for all the hard parts,
provides new resource `Disk` which describes all disks in the system.

Additional resource `SystemDisk` always points to the system disk (based
on the location of `META` partition).

The `Disks` API (and `talosctl disks`) now provides a view into
`talosctl get disks` to keep backwards compatibility.

The QEMU provisioner can now create extra disks of various types (IDE, AHCI,
SCSI, NVMe), which allows testing detection properly.

The new resource will be the foundation for volume provisioning (to pick
up the disk to provision the volume on).

Example:

```
talosctl -n 172.20.0.5 get disks
NODE         NAMESPACE   TYPE   ID        VERSION   SIZE          READ ONLY   TRANSPORT   ROTATIONAL   WWID                                                               MODEL            SERIAL
172.20.0.5   runtime     Disk   loop0     1         65568768      true
172.20.0.5   runtime     Disk   nvme0n1   1         10485760000   false       nvme                     nvme.1b36-6465616462656566-51454d55204e564d65204374726c-00000001   QEMU NVMe Ctrl   deadbeef
172.20.0.5   runtime     Disk   sda       1         10485760000   false       virtio      true                                                                            QEMU HARDDISK
172.20.0.5   runtime     Disk   sdb       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00013        QEMU HARDDISK
172.20.0.5   runtime     Disk   sdc       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00001        QEMU HARDDISK
172.20.0.5   runtime     Disk   vda       1         12884901888   false       virtio      true
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-07 20:18:32 +04:00
Andrey Smirnov
7c9a14383e
fix: volume discovery improvements
Use shared locks, discover more partitions, some other small changes.

Re-enable the flaky test.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-06 19:45:40 +04:00
Andrey Smirnov
30860210cc
test: fix hardware test not to require PCI devices
On e.g. Azure VMs there are none reported.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-03 17:20:42 +04:00
Andrey Smirnov
4dd0aa7120
feat: implement PCI device bus enumeration
Fixes #8826

From the QEMU VM:

```shell
$ talosctl -n 172.20.0.5 get pcidevice
NODE         NAMESPACE   TYPE        ID             VERSION   CLASS                       SUBCLASS                    VENDOR              PRODUCT
172.20.0.5   hardware    PCIDevice   0000:00:00.0   1         Bridge                      Host bridge                 Intel Corporation   82G33/G31/P35/P31 Express DRAM Controller
172.20.0.5   hardware    PCIDevice   0000:00:01.0   1         Display controller          VGA compatible controller
172.20.0.5   hardware    PCIDevice   0000:00:02.0   1         Network controller          Ethernet controller         Red Hat, Inc.       Virtio network device
172.20.0.5   hardware    PCIDevice   0000:00:03.0   1         Unclassified device                                     Red Hat, Inc.       Virtio RNG
172.20.0.5   hardware    PCIDevice   0000:00:04.0   1         Unclassified device                                     Red Hat, Inc.       Virtio memory balloon
172.20.0.5   hardware    PCIDevice   0000:00:05.0   1         Communication controller    Communication controller    Red Hat, Inc.       Virtio console
172.20.0.5   hardware    PCIDevice   0000:00:06.0   1         Generic system peripheral   System peripheral           Intel Corporation   6300ESB Watchdog Timer
172.20.0.5   hardware    PCIDevice   0000:00:07.0   1         Mass storage controller     SCSI storage controller     Red Hat, Inc.       Virtio block device
172.20.0.5   hardware    PCIDevice   0000:00:1f.0   1         Bridge                      ISA bridge                  Intel Corporation   82801IB (ICH9) LPC Interface Controller
172.20.0.5   hardware    PCIDevice   0000:00:1f.2   1         Mass storage controller     SATA controller             Intel Corporation   82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
172.20.0.5   hardware    PCIDevice   0000:00:1f.3   1         Serial bus controller       SMBus                       Intel Corporation   82801I (ICH9 Family) SMBus Controller
```

```yaml
node: 172.20.0.5
metadata:
    namespace: hardware
    type: PCIDevices.hardware.talos.dev
    id: 0000:00:1f.3
    version: 1
    owner: hardware.PCIDevicesController
    phase: running
    created: 2024-05-30T12:09:05Z
    updated: 2024-05-30T12:09:05Z
spec:
    class: Serial bus controller
    subclass: SMBus
    vendor: Intel Corporation
    product: 82801I (ICH9 Family) SMBus Controller
    class_id: "0x0c"
    subclass_id: "0x05"
    vendor_id: "0x8086"
    product_id: "0x2930"
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-31 20:56:16 +04:00
Dmitriy Matrenichev
893e64fcb1
fix: replace nslookup with dig in integration tests
This should be more reliable on `integration-aws-*` and others.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-30 01:37:01 +03:00
Dmitry Sharshakov
da8305ffb4
test: add a test for watchdog timers
Try to activate/deactivate watchdogs, change timeout, run only on QEMU.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
2024-05-28 16:46:04 +04:00
Dmitriy Matrenichev
a9cf9b7892
fix: correctly handle dns messages in our dns implementation
- By default, github.com/miekg/dns uses `dns.MinMsgSize` for UDP messages, which is 512 bytes. This is too small for some
DNS requests/responses, and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
(4096 bytes), which is the maximum size of a DNS packet payload per RFC 6891.
- We also retry the request if the response is truncated or the previous connection was closed.
- Finally, we properly handle the case where the response is larger than the client buffer size
and return a valid truncated response.
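
The retry-on-truncation idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from this commit: `exchangeFunc` and `resolve` are hypothetical names, and the commit does not say which transport the retry uses. The only protocol fact relied on is that the TC (truncated) flag is bit `0x02` of the third header byte (RFC 1035, section 4.1.1).

```go
package main

import "fmt"

// isTruncated reports whether the TC bit is set in a wire-format DNS message.
// The DNS header is 12 bytes long; TC is bit 0x02 of the byte at index 2.
func isTruncated(msg []byte) bool {
	return len(msg) >= 12 && msg[2]&0x02 != 0
}

// exchangeFunc is a hypothetical transport: it sends a query and returns
// the raw response.
type exchangeFunc func(query []byte) ([]byte, error)

// resolve sends the query once and retries with the second transport when
// the first response comes back truncated.
func resolve(query []byte, first, retry exchangeFunc) ([]byte, error) {
	resp, err := first(query)
	if err != nil {
		return nil, err
	}

	if isTruncated(resp) {
		return retry(query)
	}

	return resp, nil
}

func main() {
	truncated := make([]byte, 12)
	truncated[2] |= 0x02 // set the TC bit

	complete := make([]byte, 12)

	resp, err := resolve(nil,
		func([]byte) ([]byte, error) { return truncated, nil },
		func([]byte) ([]byte, error) { return complete, nil },
	)
	fmt.Println(isTruncated(resp), err)
}
```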

Closes #8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-24 21:41:00 +03:00
Andrey Smirnov
c2b19dcb97
chore: move to containerd 2.0 API
Lots of module moves/renames.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-24 21:48:55 +04:00
Andrey Smirnov
2e64e9e4e0
fix: require accepted CAs on worker nodes
Note: this issue never happens with default Talos worker configuration
(generated by Omni, `talosctl gen config` or CABPT).

Before change https://github.com/siderolabs/talos/pull/4294 3 years ago,
worker nodes connected to trustd in "insecure" mode (without validating
the trustd server certificate). The change kept backwards compatibility,
so it still allowed insecure mode on upgrades.

Now it's time to break this compatibility promise, and require
accepted CAs to be always present. Adds validation for machine
configuration, so if an upgrade is attempted, it would not validate the
machine config without accepted CAs.

Now lack of accepted CAs would lead to failure to connect to trustd.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-23 17:48:16 +04:00
Andrey Smirnov
b7afe2669b
feat: update Linux 6.6.30
Update tools/pkgs to the latest version, brings in all updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-13 17:14:03 +04:00
Andrey Smirnov
763dae2508
fix: add cluster name to the worker machine config
This is 1.8+ only.

Fixes #8694

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-07 20:11:23 +04:00
Andrey Smirnov
b690ffeb89
test: improve DNS resolver test stability
Run a health check before the test, as the test depends on CoreDNS being
healthy, and previous tests might disturb the cluster.

Also refactor by using watch instead of retries, make pods terminate
fast.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-29 19:31:34 +04:00
Andrey Smirnov
05fd042bb3
test: improve the reset integration tests
Provide a trace for each step of the reset sequence taken, so if one of
those fails, integration test produces a meaningful message instead of
proceeding and failing somewhere else.

More cleanup/refactor, should be functionally equivalent.

Fixes #8635

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-24 18:35:39 +04:00
Andrey Smirnov
bac1d00c35
chore: prepare for Talos 1.8
Fork docs, introduce version contract for 1.8.

Clean up old version contracts 0.8-0.14.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-19 18:19:36 +04:00
Dmitriy Matrenichev
ec69d7a785
chore: replace math/rand with math/rand/v2
New package arrived in Go 1.22 which provides better rand primitives and functions.
Use it instead of the old one.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-18 13:20:59 +03:00
Andrey Smirnov
3433fa13bf
feat: use container DNS when in container mode
More specifically, pick up `/etc/resolv.conf` contents by default when
in container mode, and use that as a base resolver for the host DNS.
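
To make the idea concrete, the sketch below extracts `nameserver` entries from resolv.conf-style text. It is an assumption-laden illustration, not the Talos implementation: `parseNameservers` is a hypothetical helper, and the sample content is made up.

```go
package main

import (
	"bufio"
	"fmt"
	"net/netip"
	"strings"
)

// parseNameservers returns the addresses of all `nameserver` lines in
// resolv.conf-style text, skipping comments, other directives, and
// entries that do not parse as an IP address.
func parseNameservers(conf string) []netip.Addr {
	var servers []netip.Addr

	scanner := bufio.NewScanner(strings.NewReader(conf))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 || fields[0] != "nameserver" {
			continue
		}

		if addr, err := netip.ParseAddr(fields[1]); err == nil {
			servers = append(servers, addr)
		}
	}

	return servers
}

func main() {
	// Made-up sample content; a real caller would read /etc/resolv.conf.
	conf := "# sample\nnameserver 127.0.0.11\noptions ndots:0\n"

	fmt.Println(parseNameservers(conf))
}
```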

Fixes #8303

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-16 17:01:36 +04:00
Andrey Smirnov
c8f674bd3d
test: add a test for 'spin' container runtime
See https://github.com/siderolabs/extensions/pull/355

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-10 20:42:16 +04:00
Andrey Smirnov
9aa1e1b79b
fix: present all accepted CAs to the kube-apiserver
This fixes an issue with a single controlplane cluster.

Properly present all accepted CAs to the apiserver, in the test let the
cluster fully recover between the two CA rotations performed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-08 23:33:22 +04:00
Dmitry Sharshakov
653f838b09
feat: support multiple Docker clusters in talosctl cluster create
Dynamically map Kubernetes and Talos API ports to an available port on
the host, so every cluster gets its own unique set of ports.
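
One common way to find an available host port is to listen on port 0 and let the kernel choose. The sketch below shows that technique only; whether the provisioner does this itself or delegates the choice to Docker is not stated in the commit, and `freePort` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the kernel for an unused TCP port on the loopback
// interface by listening on port 0, then releases the listener.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()

	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		fmt.Println("failed to find a free port:", err)

		return
	}

	fmt.Println("free port:", port)
}
```

The port is released before the caller binds it, so another process could take it in between; callers that need a guarantee should keep the listener open or retry on failure.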

As part of the changes, refactor the provision library and interfaces,
dropping old weird interfaces and replacing them with (hopefully) much more
descriptive names.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-04 21:21:39 +04:00
Andrey Smirnov
78b9bd9273
fix: report unsupported x86_64 microarchitecture level
Fixes #8361

Talos requires v2 (circa 2008), but VMs are often configured to limit
the exposed features to the baseline (v1).

```
[    0.779218] [talos] [initramfs] booting Talos v1.7.0-alpha.1-35-gef5bbe728-dirty
[    0.779806] [talos] [initramfs] CPU: QEMU Virtual CPU version 2.5+, 4 core(s), 1 thread(s) per core
[    0.780529] [talos] [initramfs] x86_64 microarchitecture level: 1
[    0.781018] [talos] [initramfs] it might be that the VM is configured with an older CPU model, please check the VM configuration
[    0.782346] [talos] [initramfs] x86_64 microarchitecture level 2 or higher is required, halting
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-03 16:09:57 +04:00
Noel Georgi
f515741b52
chore: add equinix e2e-tests
Add equinix e2e-tests.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-04-02 17:16:59 +05:30
Andrey Smirnov
7a68504b6b
feat: support rotating Kubernetes CA
Fixes #8440

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-01 22:08:02 +04:00
Dmitriy Matrenichev
8dc4910c48
chore: enable "WG over GRPC" testing in siderolink agent tests
Fixes https://github.com/siderolabs/talos/issues/8514
For https://github.com/siderolabs/talos/issues/8392

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-01 18:24:57 +03:00
Andrey Smirnov
8eacc4ba80
feat: support rotation of Talos API CA
This allows rolling all nodes to use a new CA, to refresh it, or e.g.
when the `talosconfig` was exposed accidentally.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-22 12:16:47 +04:00
Dmitriy Matrenichev
19f15a840c
chore: bump golangci-lint to 1.57.0
Fix all discovered issues.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-21 01:06:53 +03:00
Artem Chernyshev
113fb646ec
chore: use go-talos-support library
The code for collecting Talos `support.zip` was extracted there.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-03-19 18:28:46 +03:00
Andrey Smirnov
ead37abf09
test: disable volume tests
They're flaky, disable until the root cause is known.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-19 16:40:42 +04:00
Andrey Smirnov
15beb14780
feat: implement blockdevice watch controller
This controller combines kobject events, and scan of `/sys/block` to
build a consistent list of available block devices, updating resources
as the blockdevice changes.

Based on these resources the next step can run probe on the blockdevices
as they change to present a consistent view of filesystems/partitions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-18 18:28:40 +04:00
Andrey Smirnov
9afa70baf3
fix: patch correctly config in talosctl upgrade-k8s
The current code was stripping non-`v1alpha1.Config` documents. Provide a
proper method in the config provider, and update places using it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-15 20:42:44 +04:00
Andrey Smirnov
3130caf954
chore: re-enable DRBD extension
See https://github.com/siderolabs/extensions/pull/343

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-15 15:55:18 +04:00
Artem Chernyshev
3c8f51d707
chore: move cli formatters and version modules to machinery
To be used in the `go-talos-support` module without importing the whole
Talos repo.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-03-07 16:29:15 +03:00
Andrey Smirnov
bbed07e03a
feat: update Linux to 6.6.18
ZFS extension got re-enabled for 1.7.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-29 20:08:59 +04:00
Andrey Smirnov
0b9b4da12a
feat: update Kubernetes to 1.30.0-alpha.3
See https://github.com/kubernetes/kubernetes/releases/tag/v1.30.0-alpha.3

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-29 14:36:09 +04:00
Fabiano Fidêncio
64e9703f86
chore: add tests for the Kata Containers extension
Let's add a very basic test for the Kata Containers extension, mimicking
what's already in place for gVisor.

This depends on the work being done in:
https://github.com/siderolabs/extensions/pull/279

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-02-20 18:49:47 +05:30
Andrey Smirnov
66f3ffdd4a
fix: ensure that Talos runs in a pod (container)
Drop the cleanup of Kubernetes manifests as static files (this is only
needed for upgrades from 1.2.x).

Fix Talos handling of cgroup hierarchy: if started in container in a
non-root cgroup hierarchy, use that to handle proper cgroup paths.

Add a test for a simple TinK mode (Talos-in-Kubernetes).

Update the docs.

Fixes #8274

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-20 15:06:48 +04:00
Matthieu S
3fe82ec461
feat: custom image settings for k8s upgrade
Allows using custom registries/images.

Fixes: #8275

Co-authored-by:  @g3offrey
Signed-off-by: Matthieu STROHL <mstrohl@dive-in-it.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-15 17:54:01 +04:00
Dmitriy Matrenichev
fa3b933705
chore: replace fmt.Errorf with errors.New where possible
This time use the `eg` tool from the `x/tools` repo to do this.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-14 17:39:30 +03:00
Dmitriy Matrenichev
5324d39167
chore: bump stuff
Also fix .golangci.yml file.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-09 19:19:25 +03:00
Henno Schooljan
a04cc80154
fix: pass TTL when generating client certificate
Pass the TTL to the talosconfig generation function.

Signed-off-by: Henno Schooljan <github@sfynx.nl>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-05 18:54:16 +04:00
Saiyam Pathak
4184e617ab
chore: add test for wasmedge runtime extension
Add tests for WasmEdge container runtime system extension.

Signed-off-by: Saiyam Pathak <saiyam911@gmail.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-02-05 18:18:13 +05:30