Talos validates machine configuration at boot time, and refuses to boot
if machine configuration is invalid.
As machine configuration validation rules might change over time, we
need to prevent a scenario in which machine configuration becomes
invalid after an upgrade, as there's no way to roll back properly.
Machine configuration is submitted over stdin to the installer
container, and the installer container validates it using the new
version of Talos (the one which is going to be installed).
If the config is not sent over stdin, the installer assumes it was
started by an old version of Talos and proceeds.
This should be backported to 0.9 to allow config validation on upgrade
to 0.10.
Fixes #3419
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Overlay mounts in `mountinfo` don't show up as mounts for any particular
block device, so the existing check doesn't catch them.
This was discovered because our current master can't upgrade due to the
overlay mount for `/opt` and the `apid` image in `/opt/apid` (which will
be fixed in a separate PR).
Without the check, the installer fails while resetting the partition
table for the disk (`device or resource busy` error), effectively wiping
the node.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes the upgrade from 0.9.0-alpha.4 to 0.9.0-beta.0. With proper
partition alignment introduced and a physical block size != 512,
partitions before `EPHEMERAL` will be moved around a bit (due to the
alignment), and the `STATE` partition size might change a bit.
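To illustrate why offsets shift, here is the usual round-up-to-alignment
arithmetic. The offset and block sizes are example values; the exact
alignment rule used by the partitioning code is not stated here:

```go
package main

import "fmt"

// alignUp rounds offset up to the next multiple of alignment (alignment > 0).
func alignUp(offset, alignment uint64) uint64 {
	return (offset + alignment - 1) / alignment * alignment
}

func main() {
	const start uint64 = 17408 // illustrative byte offset, a multiple of 512

	fmt.Println(alignUp(start, 512))  // 17408: already aligned, stays in place
	fmt.Println(alignUp(start, 4096)) // 20480: moved forward to a 4096 boundary
}
```

Every partition placed after a shifted one inherits the shift, which is
how the start and size of later partitions can change slightly.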
If encryption is enabled, contents are preserved as raw bytes, so the
partition size must be exactly the same during restore.
Drop code (mostly tests) which handled 0.6 to 0.7 upgrades.
On upgrade with preserve, don't touch any partitions: at least for
0.8 -> 0.9, the layout hasn't changed.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR introduces the first part of disk encryption support.
A new config section `systemDiskEncryption` was added to MachineConfig.
For now it contains only Ephemeral partition encryption.
Encryption itself supports two kinds of keys for now:
- node ID deterministic key.
- static key which is hardcoded in the config and mainly used for test
purposes.
`talosctl cluster create` can now be told to encrypt the ephemeral
partition by using the `--encrypt-ephemeral` flag.
Additionally:
- updated pkgs library version.
- changed Dockerfile to copy cryptsetup deps from pkgs.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
The filesystem creation step is moved to a later stage: when Talos
mounts the partition for the first time.
Now it checks whether the partition has a filesystem and, if it has
none, formats it right before mounting.
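The format-on-first-mount flow can be sketched roughly as below. The
`partition` type and its methods are hypothetical placeholders for the
real probing, `mkfs` and mount calls:

```go
package main

import (
	"errors"
	"fmt"
)

// partition is a hypothetical stand-in for a real partition handle.
type partition struct {
	Label      string
	FileSystem string // empty means "no filesystem detected"
	Mounted    bool
}

// format pretends to create a filesystem of the given type.
func (p *partition) format(fsType string) error {
	if fsType == "" {
		return errors.New("no filesystem type given")
	}

	p.FileSystem = fsType

	return nil
}

// mountWithFormat formats the partition only if it has no filesystem yet,
// then mounts it. The bool reports whether formatting happened.
func mountWithFormat(p *partition, fsType string) (bool, error) {
	formatted := false

	if p.FileSystem == "" {
		if err := p.format(fsType); err != nil {
			return false, fmt.Errorf("formatting %s: %w", p.Label, err)
		}

		formatted = true
	}

	p.Mounted = true

	return formatted, nil
}

func main() {
	p := &partition{Label: "EPHEMERAL"}

	for i := 0; i < 2; i++ {
		formatted, err := mountWithFormat(p, "xfs")
		fmt.Printf("boot %d: formatted=%v err=%v\n", i+1, formatted, err)
	}
}
```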
Additionally, mount options were refactored a bit:
- replaced separate options with a set of binary flags.
- implemented pre-mount and post-unmount hooks.
Also fixed typos in a couple of places and increased the timeout for
`apid ready`.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This fixes spurious race conditions when user disks are partitioned
and formatted in `mountUserDisks` task. While this task runs, `udevd` is
running to allow various `/dev/` symlinks to be used for user disks.
At the same time `udevd` might trigger the `BLKRRPART` ioctl at any
time, concurrently with Talos, which leads to a race on the kernel side
when Talos tries to update the kernel partition table while the kernel
does it on its own as a result of the `udevd` call.
As part of the fix, `RereadPartitionTable()` calls were removed (they
trigger `BLKRRPART` and they're not needed as Talos updates partition
table on its own).
Some cleanups make sure the blockdevice is opened and closed in matching
pairs (no lingering open blockdevice instances). This is important for
`WithExclusiveLock()` calls, as they would deadlock if a previous
blockdevice instance was not closed.
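The effect can be reproduced with plain `flock(2)` on a temporary file,
assuming the exclusive lock is a flock-style advisory lock (an
assumption, the internals of `WithExclusiveLock()` are not shown here).
The sketch is Unix-only and uses a non-blocking lock so that it reports
the conflict instead of hanging:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// tryExclusiveLock takes a non-blocking exclusive flock on f.
func tryExclusiveLock(f *os.File) error {
	return syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
}

// demo opens the same file twice. It reports whether the second handle was
// refused while the first was still open, and whether it got the lock once
// the first handle was closed.
func demo() (bool, bool, error) {
	first, err := os.CreateTemp("", "blockdevice-*")
	if err != nil {
		return false, false, err
	}
	defer os.Remove(first.Name())

	if err := tryExclusiveLock(first); err != nil {
		first.Close()

		return false, false, err
	}

	second, err := os.Open(first.Name())
	if err != nil {
		first.Close()

		return false, false, err
	}
	defer second.Close()

	// A blocking lock would wait here forever; the non-blocking one fails.
	blocked := errors.Is(tryExclusiveLock(second), syscall.EWOULDBLOCK)

	// Closing the lingering handle releases its lock.
	if err := first.Close(); err != nil {
		return blocked, false, err
	}

	locked := tryExclusiveLock(second) == nil

	return blocked, locked, nil
}

func main() {
	blocked, locked, err := demo()
	fmt.Printf("blocked while first open=%v, locked after close=%v, err=%v\n",
		blocked, locked, err)
}
```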
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This refactoring is required to simplify the work to be done to support
disk encryption.
Tried to minimize the number of queries done by `blockdevice` `probe`
methods.
Instead, where we have `runtime.Runtime`, we get all required
blockdevices from the blockdevice cache stored in
`State().Machine().Disk()`.
This opens a way to store encryption settings in the `Partition`
objects.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Fixes #3011
See also https://github.com/talos-systems/go-procfs/pull/8
We don't want to allow all the kernel args to be overridden, as this
might compromise KSPP, but we would rather allow some args to be
overridden explicitly.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
That change should make Talos updates more straightforward in any
projects that depend on Talos.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
SBC should always overwrite default kernel params.
Otherwise we will always get duplicate values for some of them.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
The idea is to add an option to perform a "selective" reset: the default
reset operation is to wipe all partitions (triggering a reinstall),
while the spec allows wiping only some of the partitions.
Other operations are performed in exactly the same way for any reset
flow.
Possible use case: reset only `EPHEMERAL` partition.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The regular upgrade path takes just one reboot, but it requires all the
processes on the node to be stopped before the upgrade can proceed.
Under some circumstances, and with potential Talos bugs, this might not
work, rendering Talos upgrades almost impossible.
Staged upgrades build upon the regular install flow to run the upgrade
on node reboot. Such upgrades require two reboots of the node and two
pulls of the installer image, but they should be much less susceptible
to failure. Once the upgrade is staged, the node can be rebooted in any
possible way, including a hard reset, and the upgrade is performed on
the next boot.
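The flow can be sketched as follows. The key name, the map-backed store
and the point at which the staged entry is cleared are all assumptions
for illustration, not the actual implementation:

```go
package main

import "fmt"

// stagedImageKey is a hypothetical key under which the image ref is kept.
const stagedImageKey = "staged-upgrade-image"

// meta stands in for a small key/value store that survives reboots.
type meta map[string]string

// stageUpgrade records the installer image; nothing else happens until reboot.
func stageUpgrade(m meta, image string) {
	m[stagedImageKey] = image
}

// onBoot runs the install if an upgrade is staged. It returns true when the
// install ran and a second reboot into the new version is needed.
func onBoot(m meta, install func(image string) error) (bool, error) {
	image, ok := m[stagedImageKey]
	if !ok {
		return false, nil
	}

	if err := install(image); err != nil {
		return false, fmt.Errorf("staged upgrade with %s failed: %w", image, err)
	}

	delete(m, stagedImageKey)

	return true, nil
}

func main() {
	store := meta{}
	stageUpgrade(store, "example.org/installer:next") // placeholder image ref

	install := func(image string) error {
		fmt.Println("installing from", image)

		return nil
	}

	for boot := 1; boot <= 2; boot++ {
		reboot, err := onBoot(store, install)
		fmt.Printf("boot %d: reboot needed=%v err=%v\n", boot, reboot, err)
	}
}
```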
A new ADV format was implemented as well to allow storing the install
image ref/options across reboots. The new format allows for bigger
values and takes 50% of the `META` partition. The old ADV is still kept
for compatibility reasons.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This makes sure Talos won't pick up any potential leftover data on a
fresh install. On upgrade, contents of the `META` partition are
preserved anyway.
Fixes #2919
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
By publishing SBC images as compressed raw images, tools like etcher can flash SD cards
by using URLs to the release asset. It is also common in this community to publish compressed
images instead of tarballs.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This allows boards to provide kernel args at install time. We need this so that
we can set the console.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This PR adds the ability for us to deploy Talos in OpenStack. Tested in
a local devstack with a supplied userdata file. It also adds support to
the Makefile for building the OpenStack image so it'll be published with
the next release.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This changes the installer image/iso output to be tar via stdout
(optionally), so that we can copy artifacts back from a remote docker
daemon.
Fixes #2776
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes were applied automatically.
Import ordering might be questionable, but it's strict:
* stdlib
* other packages
* same package imports
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Instead of hosting a web service, we decided to implement a gRPC service
that exposes APIs that can be used in a client-side interactive installer.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This will make it more obvious when the installer has started, and when
it starts to wipe a disk (which might take some time).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved
from `/boot` to `/system/state`.
For single node upgrade, `EPHEMERAL` partition is not touched and other
partitions are re-created as needed.
Bump provision tests to 0.6/0.7 upgrades as we get closer to the new
release.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes A/B upgrades and rollback API.
The installer manifest now supports an option to preserve partition
contents while the disk is being re-partitioned and partitions are
re-formatted.
Mount `/boot` partition as needed (to find current label before starting
the installation and in the rollback API).
Fix upgrade API for non-master nodes.
Contents of `/boot`, `/system/state` and META partitions are preserved
in memory while the disk is re-partitioned.
Remove `--save` flag from the installer as it's not being used.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This unifies more code paths under the control of `install.Manifest` vs.
being split across the installer and manifest code.
There should be no functional changes now.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Library `blockdevice` was extracted as `talos-systems/go-blockdevice`;
this PR finalizes the move by removing the Talos copy of it.
Some functions around `mkfs`/`growfs` were extracted as `makefs`
package, as they depend on `cmd` package.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Multiple fixes from local testing:
* `.ova` file shouldn't contain `./` entries
* fix error message (`err` is `nil` at that point)
* drop `efi` boot key (BIOS mode works fine)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves to using grub instead of syslinux.
BREAKING CHANGE: Single node upgrades will fail in this change. This
will also break the A/B fallback setup since this version introduces
an entirely new partition scheme, that any fallback will not know about.
We plan on addressing these issues in a follow up change.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.
And `pkg/machinery` is published as Go module inside Talos repository.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #2272
`gofumpt` is now included in `golangci-lint`, but `gofumports` is not,
so we keep using it as a separate binary, with its version kept in sync
with `golangci-lint`.
This contains fixes from:
* `gofumpt` (automated, mostly around octal constants)
* `exhaustive` in `switch` statements
* `noctx` (adding context with default timeout to http requests)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This linter makes sure tests exercise only the public package API.
I fixed all the tests which touch only the public API of the packages.
For other test packages I added the proper `//nolint` directive.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>