Include percent-based maxSize, e.g. use 50% of available space.
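
A minimal sketch of how a percent-based `maxSize` could be resolved to bytes. The helper `resolveMaxSize` and its parsing rules are hypothetical and only illustrate the idea; they are not the Talos implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveMaxSize is a hypothetical helper, not the Talos implementation.
// A spec ending in "%" is taken relative to the available space;
// anything else is parsed as a plain byte count.
func resolveMaxSize(spec string, available uint64) (uint64, error) {
	if pct, ok := strings.CutSuffix(spec, "%"); ok {
		p, err := strconv.ParseFloat(pct, 64)
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %w", spec, err)
		}

		if p < 0 || p > 100 {
			return 0, fmt.Errorf("percentage %q is out of range 0-100", spec)
		}

		return uint64(float64(available) * p / 100), nil
	}

	return strconv.ParseUint(spec, 10, 64)
}

func main() {
	// Use 50% of 1000 available bytes.
	size, err := resolveMaxSize("50%", 1000)
	fmt.Println(size, err)
}
```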
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
(cherry picked from commit 83f2bdb9ce6c9466716a6ac9c94dc2222e569ee8)
Don't guess based on the volume type, but use explicit fields for
different locators.
IMAGECACHE-ISO is a disk volume, but uses full volume locator (by
filesystem type, etc.)
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
When resetting+wiping system partitions (`talosctl reset
--system-labels-to-wipe ...`), also drop partitions. This enables
use cases such as relocating EPHEMERAL, etc. with a new machine
config.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
This previously returned immediately on first error, preventing
the "STATE was wiped but META wasn't" codepath from running.
This patch instead collects errors, checking whether META/STATE were
successfully wiped along the way, and unconditionally runs the "delete
state encryption info from META" step if STATE was wiped and META wasn't.
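
The collect-then-cleanup pattern described above might look roughly like this. All names here (`wipeAll`, `wipe`, `dropStateKeys`, `errWipeFailed`) are hypothetical stand-ins, not the actual Talos code.

```go
package main

import (
	"errors"
	"fmt"
)

// errWipeFailed is a demo sentinel used to simulate a failing wipe.
var errWipeFailed = errors.New("wipe failed")

// wipeAll is a hypothetical sketch: wipe stands in for whatever wipes a
// single partition, dropStateKeys for the META cleanup step.
func wipeAll(labels []string, wipe func(string) error, dropStateKeys func() error) error {
	var errs []error

	wiped := map[string]bool{}

	for _, label := range labels {
		if err := wipe(label); err != nil {
			// Record the error and keep going instead of returning early.
			errs = append(errs, fmt.Errorf("wiping %s: %w", label, err))

			continue
		}

		wiped[label] = true
	}

	// Runs regardless of whether other partitions failed to wipe.
	if wiped["STATE"] && !wiped["META"] {
		if err := dropStateKeys(); err != nil {
			errs = append(errs, err)
		}
	}

	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	err := wipeAll(
		[]string{"META", "STATE"},
		func(label string) error {
			if label == "META" {
				return errWipeFailed
			}

			return nil
		},
		func() error {
			fmt.Println("dropping STATE encryption info from META")

			return nil
		},
	)
	fmt.Println("result:", err)
}
```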
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Move `system_volumes.go` and `user_volumes.go` (and extras) from
`internal/app/machined/pkg/controllers/block/` to
`internal/app/machined/pkg/controllers/block/internal`. Add plenty of
unit tests.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
When set to `disk`, a full block device is used for the volume.
When `volumeType = "disk"`:
- Size-specific settings are not allowed in the provisioning block (`minSize`, `maxSize`, `grow`).
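
As an illustration of the rule above, a validation check could be sketched as follows. The `provisioning` struct and `validate` function are hypothetical simplifications, and `"partition"` is used only as an example of a non-disk volume type.

```go
package main

import (
	"errors"
	"fmt"
)

// provisioning is a hypothetical, trimmed-down stand-in for the
// provisioning block; empty strings and a nil pointer mean "not set".
type provisioning struct {
	MinSize string
	MaxSize string
	Grow    *bool
}

// validate rejects size-specific settings when a full disk is used.
func validate(volumeType string, p provisioning) error {
	if volumeType != "disk" {
		return nil
	}

	var errs []error

	if p.MinSize != "" {
		errs = append(errs, errors.New("minSize is not allowed for disk volumes"))
	}

	if p.MaxSize != "" {
		errs = append(errs, errors.New("maxSize is not allowed for disk volumes"))
	}

	if p.Grow != nil {
		errs = append(errs, errors.New("grow is not allowed for disk volumes"))
	}

	return errors.Join(errs...)
}

func main() {
	fmt.Println(validate("disk", provisioning{MaxSize: "50%"}))
	fmt.Println(validate("partition", provisioning{MaxSize: "50%"}))
}
```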
Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
`client.ErrEventNotSupported` was a simple sentinel with no information.
Replaced it with `client.EventNotSupportedError`, a struct implementing
error with the offending TypeURL included.
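
A rough sketch of what such an error type enables on the caller side. The field name, message text, and the helpers `decode` and `unsupportedTypeURL` are assumptions for illustration and may not match the real `client` package.

```go
package main

import (
	"errors"
	"fmt"
)

// EventNotSupportedError mirrors the shape described above; the exact
// field name and message are assumptions.
type EventNotSupportedError struct {
	TypeURL string
}

func (e EventNotSupportedError) Error() string {
	return fmt.Sprintf("event type %q is not supported", e.TypeURL)
}

// decode is a hypothetical stand-in for event decoding that fails.
func decode(typeURL string) error {
	return fmt.Errorf("decoding event: %w", EventNotSupportedError{TypeURL: typeURL})
}

// unsupportedTypeURL extracts the offending TypeURL from a wrapped error,
// which a bare sentinel error could not provide.
func unsupportedTypeURL(err error) (string, bool) {
	var target EventNotSupportedError

	if errors.As(err, &target) {
		return target.TypeURL, true
	}

	return "", false
}

func main() {
	err := decode("example.com/UnknownEvent")

	if url, ok := unsupportedTypeURL(err); ok {
		fmt.Println("skipping unsupported event:", url)
	}
}
```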
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Previously, system volumes (`META`, `STATE`, etc.) were created by
`VolumeConfigController` and user volumes were created by
`UserVolumeConfigController`. This resulted in these controllers
racing to create volumes, which could cause partitions to be created in
an incorrect order.
This patch fixes this potential race by merging these two controllers
into a single controller, and refactoring a lot of the similar code
paths into one single pipeline for volume config handling.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
In certain situations, Talos's shutdown/reboot sequence hangs while
waiting for services/mounts to be gracefully stopped (see:
https://github.com/siderolabs/talos/issues/11775).
This patch adds a forceful mode to the reboot sequence (`talosctl reboot
--mode force`) that bypasses graceful userspace teardown and hard
reboots the machine.
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Fixes #10963
Also hides/deprecates `.machine.network.interfaces`, as every piece of
it is now available as proper multi-doc.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
It only applies to Talos pulling images, not CRI-initiated pulls.
This is more of an experiment to fight a random issue when a wrong platform
image is pulled (specifically on arm64 platform accidentally pulling
amd64 image).
Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Also expand internal bond configuration to cover missing fields.
They are not going to be exposed in legacy configuration.
Fixes #10960
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This bug showed up as a random deadlock on kubelet restart (might be any
other service though).
With a chain of mount requests, like `/var/log` ->
`/var/log/containers`, there was a chance that a new generation of mount
requests might try to pick up a tearing down parent of the previous
generation leading to a deadlock when the mount can't proceed for the
parent.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
- Add d_* PSI derivative values to the trigger expression context
- Only trigger OOM action while PSI is rising
- Make OOM test fail if controller kills a cgroup without stress-ng
- Wait for stress-mem to terminate before proceeding with the next tests
- Skip OOM test when running with race detector
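
The "only trigger while PSI is rising" idea can be sketched as below. The function names and the simple finite-difference derivative are assumptions; the actual `d_*` values and trigger expression in Talos may be computed differently.

```go
package main

import "fmt"

// psiRate approximates the derivative of a PSI value between two samples
// taken `seconds` apart (hypothetical helper).
func psiRate(prev, cur, seconds float64) float64 {
	return (cur - prev) / seconds
}

// shouldTrigger fires only when pressure is above the threshold and
// still rising; falling pressure suggests the system is recovering.
func shouldTrigger(cur, threshold, rate float64) bool {
	return cur > threshold && rate > 0
}

func main() {
	rising := psiRate(10, 20, 2)
	falling := psiRate(30, 20, 2)

	fmt.Println(shouldTrigger(20, 15, rising))
	fmt.Println(shouldTrigger(20, 15, falling))
}
```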
Signed-off-by: Dmitrii Sharshakov <dmitry.sharshakov@siderolabs.com>
Update COSI and stop using a fork of `gopkg.in/yaml.v3`; we now use the
new supported fork of this library.
Drop `MarshalYAMLBytes` for the machine config, as we actually marshal
config as a string, and we don't need this at all.
Make `talosctl` stop doing hacks on machine config for newer Talos, keep
hacks for backwards compatibility.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
`KubeletConfig` itself is only `v1beta1`, while `CredentialProviderConfig`
has been `v1` for quite some time, including in Kubernetes 1.30, the
minimum version supported with Talos 1.12.
Fixes #12112
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Prevents needing to use --cluster and stays consistent with omnictl.
Fixes #12127
Signed-off-by: Justin Garrison <justin.garrison@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
When the buffer Writer was requested, the code unconditionally started all
senders (in our case, this was always JSON network senders).
This resulted in log duplication on service restart: each time the service
was started, the senders goroutine was recreated.
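
One possible way to guard against starting senders more than once is `sync.Once`, sketched below. The `buffer` type and its fields are hypothetical, and the actual fix may take a different approach.

```go
package main

import (
	"fmt"
	"sync"
)

// buffer is a hypothetical log buffer; starts counts how many times the
// senders were launched, purely for illustration.
type buffer struct {
	once   sync.Once
	starts int
}

// Writer hands out the writer and starts the senders only on first use,
// so repeated service restarts do not spawn duplicate senders.
func (b *buffer) Writer() *buffer {
	b.once.Do(func() {
		b.starts++
		// A real implementation would launch the senders goroutine here.
	})

	return b
}

func main() {
	b := &buffer{}

	// Simulate three service restarts requesting the writer.
	for i := 0; i < 3; i++ {
		b.Writer()
	}

	fmt.Println("senders started:", b.starts)
}
```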
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Allows for NVIDIA kernel modules to load on arm arch
Signed-off-by: Justin Garrison <justin.garrison@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This is a fix for the incorrect fix in #11204, which was wrong in two ways:
* the ldflags -X override had a wrong variable name, so it had no effect
* even if the override had worked, it only covered the "management" part of
things, while `wgctrl-go`, which configures things, still has a
hardcoded location of `/var/run/`.
So the fix has two parts:
* replace the location where the socket is created properly
* use an updated forked wgctrl-go which looks in both locations
This keeps all fixes of #11204 - `talosctl cluster create` siderolink
agent works properly with `wg` on the host, and Talos uses proper
location.
Before the fix the location was actually `/var/run` and it randomly
failed depending on the race condition of Talos booting up and managing
`/var`.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Use the `image-signer` CLI, since we cannot pass Docker login credentials saved in the keychain to the `docker` container.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Rework the assertion to be more specific.
The root cause is that LVM now marks device mapper devices in a different
way, and we see just two of them.
Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>