38 Commits

Author SHA1 Message Date
Andrey Smirnov
96aa9638f7
chore: rename talos-systems/talos to siderolabs/talos
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-03 16:50:32 +04:00
Andrey Smirnov
343c55762e
chore: replace talos-systems Go modules with siderolabs
This the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.

All updates contain no functional changes, just refactorings to adapt to
the new path structure.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-01 12:55:40 +04:00
Dmitriy Matrenichev
93e55b85f2
chore: bump golangci-lint to v1.50.0
I had to do several things:
- contextcheck now supports Go 1.18 generics, but I had to disable it because of this https://github.com/kkHAIKE/contextcheck/issues/9
- dupword produces to many false positives, so it's also disabled
- revive found all packages which didn't have a documentation comment before. And tehre is A LOT of them. I updated some of them, but gave up at some point and just added them to exclude rules for now.
- change lint-vulncheck to use `base` stage as base

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-10-20 18:33:19 +03:00
Noel Georgi
357b770cb5
fix: cryptsetup delete slot
Fix cryptsetup delete slot.

Fixes: #6298

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-21 16:37:54 +05:30
Andrey Smirnov
23984efcdf
fix: detect lingering mounts in the installer correctly
Not sure how and when it got broken, but we're looking for mounts for
the blockdevice (like `/dev/vda`), while the actual mount info contains
the partition device (like `/dev/vda6`).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-19 21:18:40 +03:00
Andrey Smirnov
dee6305170 fix: align partitions with minimal I/O size
Also print discovered blockdevice properties before partitioning the
device.

See https://github.com/talos-systems/go-blockdevice/pull/40

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-08-04 11:51:00 -07:00
Andrew Rynhard
821f469a1d feat: skip overlay mount checks with docker
We need to be able to run an install with `docker run`. This checks if
we are running from docker and skips overlay mount checks if we are, as
docker creates a handful of overlay mounts by default that we can't
workaround (not easily at least).

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2021-06-21 15:51:39 -07:00
Andrey Smirnov
5811f4dda1 feat: implement link (interface) controllers
The structure of the controllers is really similar to addresses and
routes:

* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state

Controller `LinkStatus` (which was implemented before) watches the
kernel state and publishes current link status.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-01 09:36:25 -07:00
Artem Chernyshev
76dbfb3699 feat: add ability to mark MBR partition bootable
Fixes: https://github.com/talos-systems/talos/issues/3532

Machine install section now has `markMBRBootable` option.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-05-27 12:44:50 -07:00
Andrey Smirnov
5fb38d3e5f chore: refactor Dockerfile for cross-compilation
This has two big visible changes:

* `installer` image now contains assets for both `amd64` and `arm64`, so
it can be used to generate any Talos image (including RPi on amd64 host)
* Talos is using cross-compilation instead of emulation to build
non-native architectures: on amd64, Go amd64 compiler produces binaries
for both arm64 and amd64
(before this change: Go arm64 compiler via QEMU produces arm64 binaries on amd64)

CI implications: we no longer require arm64 nodes.

Changes walkthrough:

* `installer` container now keeps assets under `/usr/install/<arch>`
* Dockerfile build starts forcing toolchain/base image to use the build
host native architecture, not target architecture
* lots of duplication for amd64/arm64 as we want to combine assets for
both arches in a single image (e.g. we have multi-arch amd64/arm64
installer image, each arch has native installer binary, but both arches
contain full set of amd64/arm64 assets)
* fixed a small bug preventing arm64 on amd64 talosctl cluster create

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-20 13:11:38 -07:00
Andrey Smirnov
bd5ae1e0b5 fix: add a check for overlay mounts in installer pre-flight checks
Overlay mount in `mountinfo` don't show up as mounts for any particular
block device, so the existing check doesn't catch them.

This was discovered as our current master can't upgrade because of
overlay mount for `/opt` and `apid` image in `/opt/apid` (which will be
fixed in a separate PR).

Without the check, installer fails on resetting partition table for the
disk effectively wiping the node (`device or resource busy` error).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-05 14:29:46 -07:00
Andrey Smirnov
3c5bfbb473 fix: don't touch any partitions on upgrade with --preserve
This fixes a case of upgrade from 0.9.0-alpha.4 to 0.9.0-beta.0. With
introduced proper partition alignment and physical block size != 512,
partitions before ephemeral will be moved around a bit (due to the
alignment), and `STATE` partition size might change a bit.

If encryption is enabled, contents are preserved as raw bytes, so
partition size should be exactly same during restore.

Drop code (mostly tests) which handled 0.6 to 0.7 upgrades.

On upgrade with preserve don't touch any partitions, at least for 0.8 ->
0.9 layout hasn't changed.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-15 12:25:53 -07:00
Artem Chernyshev
22f375300c chore: update golanci-lint to 1.38.0
Fix all discovered issues.
Detected couple bugs, fixed them as well.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-12 06:50:02 -08:00
Alexey Palazhchenko
df52c13581 chore: fix //nolint directives
That's the recommended syntax:
https://golangci-lint.run/usage/false-positives/

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-05 05:58:33 -08:00
Artem Chernyshev
58ff2c9808 feat: implement ephemeral partition encryption
This PR introduces the first part of disk encryption support.
New config section `systemDiskEncryption` was added into MachineConfig.
For now it contains only Ephemeral partition encryption.

Encryption itself supports two kinds of keys for now:
- node id deterministic key.
- static key which is hardcoded in the config and mainly used for test
purposes.

Talosctl cluster create can now be told to encrypt ephemeral partition
by using `--encrypt-ephemeral` flag.

Additionally:
- updated pkgs library version.
- changed Dockefile to copy cryptsetup deps from pkgs.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-17 13:39:04 -08:00
Artem Chernyshev
02b3719df9 feat: skip filesystem for state and ephemeral partitions in the installer
Filesystem creation step is moved on the later stage: when Talos mounts
the partition for the first time.
Now it checks if the partition doesn't have any filesystem and formats
it right before mounting.

Additionally refactored mount options a bit:
- replaced separate options with a set of binary flags.
- implemented pre-mount and post-unmount hooks.

And fixed typos in couple of places and increased timeout for `apid ready`.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-17 09:37:21 -08:00
Andrey Smirnov
18db20dbc2 fix: open blockdevices with exclusive flock for partitioning
This fixes spurious race conditions when user disks are partitioned
and formatted in `mountUserDisks` task. While this task runs, `udevd` is
running to allow various `/dev/` symlinks to be used for user disks.
At the same time `udevd` might trigger syscall `BLKRRPART` at any time
concurrently with Talos which leads to a race on kernel side when Talos
tries to update kernel partition table while kernel does it on its own
as a result of `udevd` call.

As part of the fix, `RereadPartitionTable()` calls were removed (they
trigger `BLKRRPART` and they're not needed as Talos updates partition
table on its own).

Some cleanups to make sure blockdevice is open/closed just in matching
pairs (no lingering open blockdevice instances). This is import for
`WithExclusiveLock()` calls, as it would lead to a deadlock if previous
blockdevice instance is not closed.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-28 09:11:39 -08:00
Andrey Smirnov
54ed80e244 feat: reset with system disk wipe spec
Idea is to add an option to perform "selective" reset: default reset
operation is to wipe all partitions (triggering reinstall), while spec
allows only to wipe some of the operations.

Other operations are performed exactly in the same way for any reset
flow.

Possible use case: reset only `EPHEMERAL` partition.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-10 11:31:07 -08:00
Andrey Smirnov
29fb7ef07b fix: zero out partitions without filesystems on install
This makes sure Talos won't pick up any potential leftover data on fresh
install. On upgrade contents of META partitions are preserved anyways.

Fixes #2919

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-08 04:41:06 -08:00
Andrey Smirnov
1eac88e470 feat: add support for installing to SBCs
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-26 07:18:25 -08:00
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00
Andrey Smirnov
18e847fa8b fix: bump type for DiskSize to be 64-bit
Otherwise we're bound with 4GiB partititions.

Discovered by @Unix4ever.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-29 22:00:35 +03:00
Andrey Smirnov
e461320bcb chore: output more logs from the installer
This will make it more obvious when installer got started, and when it
starts to wipe a disk (which might take some time).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-29 01:48:45 +03:00
Andrey Smirnov
bc9e0c0dba fix: re-implement upgrade (install) with preserve
For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved
from `/boot` to `/system/state`.

For single node upgrade, `EPHEMERAL` partition is not touched and other
partitions are re-created as needed.

Bump provision tests to 0.6/0.7 upgrades as we get closer to the new
release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-28 07:25:26 -07:00
Andrey Smirnov
ff4d702f77 fix: implement preserving contents of partition on install
This fixes A/B upgrades and rollback API.

Installer manifest supports now an option to preserve partition contents
while disk is being re-partitioned and partitions are re-formatted.

Mount `/boot` partition as needed (to find current label before starting
the installation and in the rollback API).

Fix upgrade API for non-master nodes.

Contents of `/boot`, `/system/state` and META partitions are preserved
in memory while the disk is re-partitioned.

Remove `--save` flag from the installer as it's not being used.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-22 23:56:39 +03:00
Andrey Smirnov
41f92b5d35 feat: wipe disks faster in the installer
See https://github.com/talos-systems/talos/pull/2663 for details.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-22 08:32:29 -07:00
Andrey Smirnov
4adb613f66 refactor: bring more control to install.Manifest execution
This unifies more code paths under the control of `install.Manifest` vs.
being split across the installer and manifest code.

There should be no functional changes now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 01:08:14 -07:00
Andrey Smirnov
a12eb76734 test: add unit-test for the installer manifest
This test only works on local machine (see notes in the file).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-15 13:31:42 -07:00
Andrey Smirnov
018086d1fa refactor: extract blockdevice library
Library `blockdevice` was extracted as `talos-systems/go-blockdevice`,
this PR finalizes the move by removing Talos copy of it.

Some functions around `mkfs`/`growfs` were extracted as `makefs`
package, as they depend on `cmd` package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-05 11:18:43 -07:00
Andrey Smirnov
f6ecf000c9 refactor: extract packages loadbalancer and retry
This removes in-tree packages in favor of:

* github.com/talos-systems/go-retry
* github.com/talos-systems/go-loadbalancer

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 13:46:22 -07:00
Andrew Rynhard
1a4059a553 feat: add grub bootloader
This moves to using grub instead of syslinux.

BREAKING CHANGE: Single node upgrades will fail in this change. This
will also break the A/B fallback setup since this version introduces
an entirely new partition scheme, that any fallback will not know about.
We plan on addressing these issues in a follow up change.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-01 12:06:43 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
70a65cbb01 feat: make partitions on additional disk without size occupy full disk
Fixes #2214

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-21 07:33:07 -07:00
Andrey Smirnov
197369bdc1 fix: make installer re-read partition table before formatting
This hopefully should fix errors like:

```
2020/06/25 18:23:22 attaching loopback device
2020/06/25 18:23:22 partitioning /dev/loop2 - ESP
2020/06/25 18:23:22 partitioning /dev/loop2 - EPHEMERAL
2020/06/25 18:23:22 formatting partition "/dev/loop2p1" as "fat" with label "ESP"
2020/06/25 18:23:22 detaching loopback device
2020/06/25 18:23:22 failed to format device: exit status 1: mkfs.vfat: can't open '/dev/loop2p1': No such file or directory
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-26 12:22:00 -07:00
Andrew Rynhard
6ea313fa7d fix: detect if partition table is missing
This adds a sentinel error for a missing partition table. This error
is used to detect if a partition table already exists when setting
up user defined disks.

In addition to the fix, this removes a legacy parameter from the
`PartitionTable` method that indicated that the partition table
should be read. It is safer to just read it every time. Also, I
can't think of a case when the block device partition table is nil
and we want to read.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-06-16 18:26:59 -07:00
Andrew Rynhard
9cf76ca5ec fix: allow for using /dev/disk/* symlinks
This change ensures that any symlinks under `/dev/disks` are created
by udevd early enough so that users can specify disks identified
by these symlinks in `machine.disks` entries. This change also moves
the udevd-trigger from a service to a post-startup task for udevd.
This is a one time command that doesn't need to be a service. Some
other minor changes were made around parsing the partition name that
makes use of the symlinks created by udevd. This also ensures that
target mount points in `machine.disks` entries are created, fixing a
preexisting bug.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-06-16 13:01:26 -07:00
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packes in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00