277 Commits

Author SHA1 Message Date
Andrey Smirnov
33ecbaec6d
test: update apply config tests
Make the setup phase of the test a bit more consistent - wait for the
machine to be ready, connection refused to be cleared (after reboots).

This doesn't change anything in the tests themselves, but should
hopefully reduce the number of flakes like: https://github.com/siderolabs/talos/actions/runs/15895820994/job/44827039818

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-07-04 19:04:15 +04:00
Andrey Smirnov
3801413309
feat: expose kernel cmdline as a resource
Fixes #11279

Co-authored-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-07-02 15:31:10 +02:00
Andrey Smirnov
c880835c80
feat: implement zswap support
Zswap allows compressing pages in memory before they hit the actual swap
device.

Both swap and zswap (or either one of these) can be enabled.

Fixes #10675

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-06-30 20:17:27 +04:00
Andrey Smirnov
7f0300f108
feat: update dependencies, Kubernetes 1.34.0-alpha.2
Bump all dependencies, many small changes due to new golangci-lint
version.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-06-30 19:05:22 +04:00
Andrey Smirnov
d32ccfa598
feat: implement swap support
Fixes #10674

Provide a way to see current swap status, configure additional swap
devices (block) and de-configure them on the fly.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-06-11 14:25:38 +04:00
Andrey Smirnov
c7d4191e78
fix: rework the way CRI config generation is waited for
Instead of relying on the fact that CRI patch should modify the
generated final CRI config, rely on the specific checksum of the CRI
patch to be included into the generated CRI config.

This also resolves Talos hanging on boot when a CRI patch is a no-op
(it doesn't change the generated config).
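
A minimal sketch of the checksum-based wait, for illustration only and not the actual Talos code. The `# patch-checksum:` marker and all function names are hypothetical; the point is that readiness depends on the patch checksum being present rather than on the config body having changed.

```python
import hashlib


def patch_checksum(patch: bytes) -> str:
    """Hex SHA-256 of the CRI patch contents."""
    return hashlib.sha256(patch).hexdigest()


def render_config(base: str, patch: bytes) -> str:
    """Hypothetical generator: embeds the patch checksum as a comment line."""
    return f"# patch-checksum: {patch_checksum(patch)}\n{base}"


def config_is_ready(generated: str, patch: bytes) -> bool:
    """Ready when the generated config carries the checksum of the current
    patch, whether or not the patch changed the config body."""
    marker = f"# patch-checksum: {patch_checksum(patch)}"
    return marker in generated.splitlines()
```

With this shape, a no-op patch still produces a config that the waiter recognizes, so the boot sequence has something definite to wait for.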

Fixes #11132

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-06-03 14:56:52 +04:00
Andrey Smirnov
0b99631a0b
fix: bump apid memory limit
Fixes #11046

Test up to the maximum gRPC message size we support.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-05-20 22:54:56 +04:00
Andrey Smirnov
da67952666
fix: disable automatic MAC assignment to bridge interfaces
Linux kernel has the following policy:

* initial bridge MAC is random
* if the bridge MAC is not set explicitly by userspace,
  bridge MAC is the smallest MAC address of all ports

But systemd-udevd which we use started to assign "stable" MACs to bridge
interfaces (when they are created), which Linux kernel treats as
userspace explicitly set, so the bridge no longer gets an automatic MAC
of the ports.

This is a breaking change, so we need to revert it.
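
The kernel policy described above can be sketched as follows. This is an illustration of the selection rule only, not kernel or Talos code, and the function names are hypothetical.

```python
from typing import Optional


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer for numeric comparison."""
    return int(mac.replace(":", ""), 16)


def effective_bridge_mac(ports: list[str], explicit: Optional[str], initial: str) -> str:
    """Return the MAC the bridge ends up with under the policy above."""
    if explicit is not None:
        # An address set by userspace (e.g. a "stable" MAC from udev) sticks.
        return explicit
    if not ports:
        # No ports yet: the bridge keeps its random initial address.
        return initial
    # Otherwise the bridge inherits the smallest MAC among its ports.
    return min(ports, key=mac_to_int)
```

Passing a non-`None` `explicit` value models the udev behaviour: the port-derived address is never used again.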

Fixes #10884

Fixes #11011

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-05-15 18:45:16 +04:00
Andrey Smirnov
c6824c2114
fix: deny apply config requests without v1alpha1 in "normal" mode
In maintenance mode, we still accept any config.

Fixes #10897

As "normal" mode requires v1alpha1 config today, it should be an easy
fix to always require it as part of the applied config.
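
The rule can be sketched as a small predicate. This is a hedged illustration: the mode strings and the way document kinds are represented are assumptions, not the actual Talos API.

```python
def accept_apply_config(mode: str, document_kinds: list[str]) -> bool:
    """Maintenance mode accepts any config; normal mode requires that the
    applied config contains a v1alpha1 document."""
    if mode == "maintenance":
        return True
    return "v1alpha1" in document_kinds
```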

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-05-05 20:06:37 +04:00
Noel Georgi
1299aaa45d
chore(ci): add extensions test for Youki runtime
Add extensions test for Youki container runtime.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-05-01 08:28:09 +05:30
Andrey Smirnov
8013aa06cd
test: replace platform metadata test
It seems that the integration test introduced in
https://github.com/siderolabs/talos/pull/10792 is causing some
unintended side-effects in kube-apiserver -> kubelet communication (most
probably around the TLS certificate).

Instead of assigning dummy external IP, create a dummy link.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-30 16:56:12 +04:00
Andrey Smirnov
f7c5b86be7
fix: sync PCR extension with volume provisioning lifecycle
Ensure volumes are not locked to the wrong value of PCR.

Fixes #10665

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-25 17:41:29 +04:00
Andrey Smirnov
8db34624c6
fix: handle correctly changing platform network config
The bug was an incorrect condition: if `activeNetworkConfig` was ever
set to non-nil value, it was stuck with this value forever, despite new
network config being available with `networkConfig`.

In the `talosctl dashboard` case, Talos `metal` platform always reports
initial data (before META is available) which doesn't have any network
config, but later on sends updates (if something updates META), so this
bug leads to Talos being stuck with initial empty network config.
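
A minimal sketch of the condition before and after, assuming the fix simply prefers a newly available config. The real controller logic may differ, and both function names are hypothetical.

```python
from typing import Optional


def pick_active_buggy(active: Optional[dict], incoming: Optional[dict]) -> Optional[dict]:
    """Buggy condition: once `active` is set, newer configs are ignored."""
    if active is None:
        return incoming
    return active


def pick_active_fixed(active: Optional[dict], incoming: Optional[dict]) -> Optional[dict]:
    """Fixed condition: a newly available config replaces the active one."""
    if incoming is not None:
        return incoming
    return active
```

With an initial empty config followed by an update, the buggy variant keeps returning the empty config, which matches the stuck state described above.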

Fixes #10787

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-24 20:04:46 +04:00
Dmitrii Sharshakov
be3f0c018c
fix: fix Gvisor tests with containerd patch
Fixes #10681

Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
2025-04-23 13:42:24 +02:00
Andrey Smirnov
8cd3c8dc77
test: fix NVIDIA OSS tests
Add more logging output.

Force non-UEFI boot.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-15 21:15:36 +04:00
Andrey Smirnov
664fa36973
feat: implement user volumes
User volumes are identified by a short name which serves both
as a `/var/mnt` mount point and a partition label.

User volumes can be added and removed on the fly, and they are
automatically propagated into the `kubelet` mount namespace.

Also deprecate `.machine.disks`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-11 20:27:39 +04:00
Dmitrii Sharshakov
c1bec3cd0d
test: add negative tests for SELinux
Make sure a privileged pod cannot violate some of the important security rules enforced by SELinux.

Fixes #10615

Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
2025-04-11 14:15:20 +02:00
Noel Georgi
5e4c24758e
feat: add a version resource
```text
❯ talosctl -n 10.5.0.3 get version
NODE       NAMESPACE   TYPE      ID        VERSION   VERSION
10.5.0.3   runtime     Version   version   1         v1.10.0-alpha.3-33-g84f69f043-dirty
```

Fixes: #10574

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-04-10 16:44:57 +05:30
Andrey Smirnov
6eee57b167
feat: add support for GCP instance tags
Same change as AWS.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-08 13:49:37 +04:00
Andrey Smirnov
60448b516e
feat: add support for instance tags on AWS
We can add on other platforms as well as we go.

See https://github.com/siderolabs/omni/issues/1059

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-08 12:57:56 +04:00
Noel Georgi
c4136c27da
fix: uki boot detection
Fix UKI boot detection

Also fix a bug introduced by #10640, which imported the unix package and
broke non-unix builds of talosctl.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-04-08 09:02:02 +05:30
Noel Georgi
1996610375
feat: expose if system is booted with UKI
Expose if system is booted with a UKI in `securitystate` resource.

Fixes: #10620

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-04-04 16:14:11 +02:00
Andrey Smirnov
c83611ddd7
test: more extension modules
Update with the fix https://github.com/siderolabs/pkgs/pull/1200, load
explicitly `xdma` and `ena` drivers.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-03 17:54:28 +04:00
Andrey Smirnov
203e02df49
refactor: implement directory and overlay mounts
This complements the previous PRs to implement more volume features:
directory volumes control their permissions, SELinux labels, etc.

Overlay mounts support additional parent relationship.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-01 20:51:02 +04:00
Dmitrii Sharshakov
087a85f409
feat: support running with SELinux enforcing
Add more rules alongside supporting code.

Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
2025-03-22 14:39:48 +01:00
Andrey Smirnov
d4aacb0d85
refactor: mount operation for STATE and user disks
Use new controller for user disk and STATE mounts, drop
old code in the sequencer.

Also support mounts with parent (when e.g. `/var/lib` is mounted on top
of `/var`).
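
To illustrate why the parent relationship matters, here is a path-based sketch that orders mounts so a parent is always mounted before its children. It is an assumption-laden illustration: the actual controller may track parents explicitly rather than deriving them from paths, and the function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def mount_order(targets: list[str]) -> list[str]:
    """Order mount targets so that every parent path precedes its children."""
    return sorted(targets, key=lambda t: (len(PurePosixPath(t).parts), t))


def parent_mount(target: str, targets: list[str]) -> Optional[str]:
    """Return the closest other mount target that is an ancestor of `target`."""
    path = PurePosixPath(target)
    ancestors = [
        t for t in targets
        if t != target and PurePosixPath(t) in path.parents
    ]
    # The deepest ancestor is the direct parent mount.
    return max(ancestors, key=lambda t: len(PurePosixPath(t).parts), default=None)
```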

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-03-20 20:46:57 +04:00
Andrey Smirnov
44f3c72489
fix: kata extension
Fixes #10575

See https://github.com/siderolabs/extensions/pull/651

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-03-20 19:58:44 +04:00
Andrey Smirnov
f9b14e7848
fix: reconnect on SideroLink tunnel on/off change
The issue is not so easy to fix, as GRPC tunnel on/off change requires
two different flows for the link (interface):

* no tunnel -> Talos link controller should create in-kernel `wireguard`
  link and no userspace components
* tunnel on -> Talos link controller should never create the link, and
  only adjust WG settings via UAPI, while the actual link is created by
  the userspace implementation (it's a `tun` device)

Transition between those two links is impossible for the link controller
to distinguish, as it doesn't know that it has to drop old link and skip
creating new one based on the information available.

So, instead, use different names for the link in two states:
`siderolink` for the kernel flow, and `siderolinktun` for the userspace
flow. This fixes the issue of proper link cleanup/re-creation.
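
The naming scheme can be sketched like this. Only the two link names come from the description above; the helper functions are hypothetical and do not mirror the real controller code.

```python
KERNEL_LINK = "siderolink"        # in-kernel `wireguard` link, no tunnel
USERSPACE_LINK = "siderolinktun"  # `tun` device created by the userspace flow


def link_name(tunnel_enabled: bool) -> str:
    """Pick the link name for the current tunnel mode."""
    return USERSPACE_LINK if tunnel_enabled else KERNEL_LINK


def stale_links(existing: set[str], tunnel_enabled: bool) -> set[str]:
    """SideroLink links left over from the other mode; because the names
    differ, they can be identified and removed by name alone."""
    return (existing & {KERNEL_LINK, USERSPACE_LINK}) - {link_name(tunnel_enabled)}
```

Because each mode owns a distinct name, a mode change shows up as one link to remove and another to bring up, instead of an ambiguous in-place transition.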

Add integration tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-03-13 15:08:09 +04:00
Noel Georgi
29f7b3bf37
test(ci): use k8s websocket executor for tests
Use k8s websocket executor over SPDY.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-03-12 18:54:48 +05:30
Andrey Smirnov
a3f88d2ef5
fix: block NodePort services with ingress firewall
The previous fix #10354 was incomplete.

The problem lies in the fact that `kube-proxy` creates a rule like:

```
chain nat-prerouting {
	type nat hook prerouting priority dstnat; policy accept;
	jump services
}
```

This chain has a prerouting hook, which gets executed before Talos's
input hook, and rewrites (does DNAT) for NodePort services before Talos
has a chance to block the packet, but rewritten packet hits the input
chain with DNAT address, or might be forwarded to another host and never
hit the firewall again.
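
The ordering problem can be reproduced with a toy model. This is a simulation under stated assumptions, not nftables and not the fix applied in this commit: the addresses, the NodePort mapping, and all function names are made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    dst_port: int


NODE_IP = "10.0.0.1"
# Hypothetical NodePort mapping installed by kube-proxy: node port -> endpoint.
NODEPORTS = {30080: ("10.244.0.5", 8080)}
# Ports the ingress firewall is meant to block.
BLOCKED_PORTS = {30080}


def prerouting_dnat(pkt: Packet) -> Packet:
    """Model of the prerouting hook: rewrite NodePort destinations (DNAT)."""
    if pkt.dst_ip == NODE_IP and pkt.dst_port in NODEPORTS:
        ip, port = NODEPORTS[pkt.dst_port]
        return replace(pkt, dst_ip=ip, dst_port=port)
    return pkt


def firewall_matches(pkt: Packet) -> bool:
    """A rule that matches on the node address and a blocked port."""
    return pkt.dst_ip == NODE_IP and pkt.dst_port in BLOCKED_PORTS


def blocked_at_input_only(pkt: Packet) -> bool:
    """Rule evaluated at the input hook, after DNAT has rewritten the packet."""
    return firewall_matches(prerouting_dnat(pkt))


def blocked_on_original(pkt: Packet) -> bool:
    """Same rule evaluated on the original packet, ahead of the rewrite."""
    return firewall_matches(pkt)
```

For a packet sent to `10.0.0.1:30080`, the input-only check no longer matches because the destination was already rewritten, while the same rule applied to the original packet does match.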

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-02-28 19:56:52 +04:00
Andrey Smirnov
47f377b21f
feat: implement the last ethtool feature - channels
Fixes #9173

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-02-11 15:34:57 +04:00
Andrey Smirnov
0419f5d8ba
feat: implement features in ethtool-like support
Support showing current feature state, and changing features on the fly.

The output and interface should be similar to `ethtool`.

We don't support legacy feature names.

```
node: 172.20.0.5
metadata:
    namespace: network
    type: EthernetStatuses.net.talos.dev
    id: enp0s2
    version: 2
    owner: network.EthernetStatusController
    phase: running
    created: 2025-02-10T11:40:32Z
    updated: 2025-02-10T11:40:32Z
spec:
    linkState: true
    port: Other
    duplex: Unknown
    rings:
        rx-max: 256
        tx-max: 256
        rx: 256
        tx: 256
        tx-push: false
        rx-push: false
    features:
        tx-scatter-gather: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        highdma: on [fixed]
        tx-scatter-gather-fraglist: off [fixed]
        tx-vlan-hw-insert: off [fixed]
        rx-vlan-hw-parse: off [fixed]
        rx-vlan-filter: on [fixed]
        vlan-challenged: off [fixed]
        tx-generic-segmentation: on
        rx-gro: on
        rx-lro: off [fixed]
        tx-tcp-segmentation: on
        tx-gso-robust: on [fixed]
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
        tx-fcoe-segmentation: off [fixed]
        tx-gre-segmentation: off [fixed]
        tx-gre-csum-segmentation: off [fixed]
        tx-ipxip4-segmentation: off [fixed]
        tx-ipxip6-segmentation: off [fixed]
        tx-udp_tnl-segmentation: off [fixed]
        tx-udp_tnl-csum-segmentation: off [fixed]
        tx-gso-partial: off [fixed]
        tx-tunnel-remcsum-segmentation: off [fixed]
        tx-sctp-segmentation: off [fixed]
        tx-esp-segmentation: off [fixed]
        tx-udp-segmentation: off
        tx-gso-list: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
        rx-ntuple-filter: off [fixed]
        rx-hashing: off [fixed]
        rx-checksum: on [fixed]
        tx-nocache-copy: off
        loopback: off [fixed]
        rx-fcs: off [fixed]
        rx-all: off [fixed]
        tx-vlan-stag-hw-insert: off [fixed]
        rx-vlan-stag-hw-parse: off [fixed]
        rx-vlan-stag-filter: off [fixed]
        l2-fwd-offload: off [fixed]
        hw-tc-offload: off [fixed]
        esp-hw-offload: off [fixed]
        esp-tx-csum-hw-offload: off [fixed]
        rx-udp_tunnel-port-offload: off [fixed]
        tls-hw-tx-offload: off [fixed]
        tls-hw-rx-offload: off [fixed]
        rx-gro-hw: on
        tls-hw-record: off [fixed]
        rx-gro-list: off
        macsec-hw-offload: off [fixed]
        rx-udp-gro-forwarding: off
        hsr-tag-ins-offload: off [fixed]
        hsr-tag-rm-offload: off [fixed]
        hsr-fwd-offload: off [fixed]
        hsr-dup-offload: off [fixed]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-02-10 16:05:49 +04:00
Andrey Smirnov
716f700da7
feat: provide initial support for ethtool configuration
See https://github.com/siderolabs/ethtool - our fork.

This PR covers only configuring rings, follow-up PRs will address other
pieces: channels and features.

Example:

```
node: 172.20.0.5
metadata:
    namespace: network
    type: EthernetStatuses.net.talos.dev
    id: enp0s2
    version: 4
    owner: network.EthernetStatusController
    phase: running
    created: 2025-02-04T16:03:14Z
    updated: 2025-02-04T16:04:12Z
spec:
    linkState: true
    port: Other
    duplex: Unknown
    rings:
        rx-max: 256
        tx-max: 256
        rx: 128
        tx: 128
        tx-push: false
        rx-push: false
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-02-05 21:28:42 +04:00
Andrey Smirnov
93b4a3740b
test: bump timeout on rotate CA test
When using VIP, recovery of Kubernetes controlplane takes more time
(plus given the fact that the test rotates PKI twice).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-01-28 18:42:06 +04:00
Andrey Smirnov
75673b6a38
feat: provide stable symlinks in disk resources
This allows grabbing various `/dev/disk` symlinks,
including in maintenance mode when `talosctl ls` is not allowed.

Sample output:

```yaml
node: 172.20.0.5
metadata:
    namespace: runtime
    type: Disks.block.talos.dev
    id: nvme0n2
    version: 2
    owner: block.DisksController
    phase: running
    created: 2025-01-23T12:57:08Z
    updated: 2025-01-23T12:57:09Z
spec:
    dev_path: /dev/nvme0n2
    size: 5368709120
    pretty_size: 5.4 GB
    io_size: 512
    sector_size: 512
    readonly: false
    cdrom: false
    model: QEMU NVMe Ctrl
    serial: deadbeef
    wwid: nvme.1b36-6465616462656566-51454d55204e564d65204374726c-00000002
    bus_path: /pci0000:00/0000:00:08.0/nvme
    sub_system: /sys/class/block
    transport: nvme
    symlinks:
        - /dev/disk/by-diskseq/11
        - /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_deadbeef_2
        - /dev/disk/by-id/nvme-nvme.1b36-6465616462656566-51454d55204e564d65204374726c-00000002
        - /dev/disk/by-path/pci-0000:00:08.0-nvme-2
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-01-24 18:46:56 +04:00
Noel Georgi
bde516fde6
chore(ci): rework iscsi-tools extensions test
Rework `iscsi-tools` extensions test based on https://github.com/siderolabs/extensions/pull/577

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-01-20 23:27:10 +05:30
Noel Georgi
01c86832cb
chore(ci): add test for OpenEBS MayaStor
Add a test in CI for OpenEBS MayaStor.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-01-16 09:47:17 +05:30
Noel Georgi
83d84a8318
chore(ci): better zfs checks
Part of: https://github.com/siderolabs/extensions/issues/572

Signed-off-by: Noel Georgi <git@frezbo.dev>
2025-01-02 21:12:31 +05:30
Andrey Smirnov
27233cf0fc
test: use node informer instead of raw watch
This should improve watch reliability, as it was failing on channel
being closed.

Fixes #10039

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-12-25 18:52:07 +04:00
Noel Georgi
a5660ed778
feat: pcirebind controller
Add a controller to support rebinding drivers for PCI devices.

Fixes: https://github.com/siderolabs/extensions/pull/488

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-12-20 17:35:37 +05:30
Andrey Smirnov
7d39b9ec2b
feat: remove cgroupsv1 in non-container mode
Following up on deprecation in Talos 1.9, remove it completely for Talos
1.10.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-12-18 18:48:11 +04:00
Andrey Smirnov
9470e842fc
test: cleanup failed Kubernetes pods
See #9870

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-12-16 16:48:30 +04:00
Andrey Smirnov
e33d2f581f
feat: support overriding base OCI spec for CRI
Fixes #9827

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-11-29 16:22:25 +04:00
Andrey Smirnov
fc3b31575c
fix: multiple issues with opening encrypted volumes
Fixes #9820

This only affects volumes with multiple key slots configured.

Make sync issues non-fatal, so that if some keys fail to sync, proceed
with normal boot, but record an error in the `VolumeStatus` resource.

When opening, correctly try all key slots.
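
A rough sketch of the two behaviours described above: trying every key slot on open, and treating sync failures as non-fatal. The callbacks and function names are hypothetical stand-ins for the real key handling.

```python
from typing import Callable, Iterable, Optional


def open_volume(slots: Iterable[int], try_slot: Callable[[int], bool]) -> Optional[int]:
    """Try every configured key slot in order; return the first that unlocks."""
    for slot in slots:
        if try_slot(slot):
            return slot
    return None


def sync_keys(slots: Iterable[int], sync_slot: Callable[[int], None]) -> list[str]:
    """Sync each key slot, collecting failures instead of aborting the boot.
    The returned messages stand in for what would be recorded in the status."""
    errors = []
    for slot in slots:
        try:
            sync_slot(slot)
        except Exception as exc:  # non-fatal: keep going with remaining slots
            errors.append(f"slot {slot}: {exc}")
    return errors
```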

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-11-28 21:34:41 +04:00
Andrey Smirnov
ef69c9d39b
feat: update Linux to 6.12.1
No other changes, just update default bundled module list.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-11-27 23:08:14 +04:00
Dmitry Sharshakov
a13f82c594
feat: udev: label device nodes
Use udev rules to assign basic device file labels based on their subsystem.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
2024-11-22 12:42:22 +01:00
Dmitry Sharshakov
e899fb37fd
feat: label created files in /etc
Implement SELinux labeling support in EtcFileController, label both squashfs and runtime-created files in /etc and /system/etc.

Add corresponding test cases.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
2024-11-22 09:16:13 +01:00
Noel Georgi
77cf84fb57
feat: support generating iso with imagecache
Support generating iso with imagecache.

Part-of: #9616

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-11-21 20:40:05 +05:30
Dmitry Sharshakov
1a8cc5f8b2
feat: add SELinux labels to volumes
Label mounted filesystems like ephemeral, overlay mounts, as well as data directories (going to become volumes later).

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
2024-11-21 14:23:43 +01:00
Andrey Smirnov
cc768037f8
feat: implement block device wipe
Fixes #9731

The wipe doesn't require a reboot, but it requires the blockdevice not
to be used as a volume.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-11-20 15:46:37 +04:00