51 Commits

Author SHA1 Message Date
Andrey Smirnov
1eac88e470 feat: add support for installing to SBCs
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-26 07:18:25 -08:00
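The "board" idea from the commit above could be sketched roughly as follows. The `Board` interface, its method names, and the kernel argument values are assumptions made for illustration; the commit message does not show the actual Talos definitions.

```go
package main

import "fmt"

// Board is a hypothetical interface; the method set is an assumption
// made for this sketch, not the Talos definition.
type Board interface {
	// Name identifies the SBC.
	Name() string
	// KernelArgs returns extra kernel arguments the board needs.
	KernelArgs() []string
}

// libretechAllH3CCH5 is an illustrative implementation for the board
// named in the commit message.
type libretechAllH3CCH5 struct{}

func (libretechAllH3CCH5) Name() string { return "libretech_all_h3_cc_h5" }

func (libretechAllH3CCH5) KernelArgs() []string {
	return []string{"console=ttyS0,115200"} // illustrative value only
}

// installArgs appends board-specific arguments to a copy of the base set.
func installArgs(base []string, b Board) []string {
	out := append([]string{}, base...)
	if b != nil {
		out = append(out, b.KernelArgs()...)
	}
	return out
}

func main() {
	var b Board = libretechAllH3CCH5{}
	fmt.Println(b.Name(), installArgs([]string{"example.arg=1"}, b))
}
```

Keeping the board behind an interface lets the installer stay generic: it only asks the board what to change.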
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00
Andrey Smirnov
d9b74f0cc6 feat: skip resizing ephemeral partition if not required
This skips writing the partition table if the partition doesn't have to be
resized (already resized or at max size from the beginning).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-30 14:31:10 -07:00
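A minimal sketch of the check described in the commit above, assuming a simplified partition model. The `partition` type and `needsResize` function are hypothetical names, not the Talos or `go-blockdevice` API.

```go
package main

import "fmt"

// partition is a simplified, hypothetical stand-in for a partition
// table entry, described by its sector range.
type partition struct {
	FirstLBA, LastLBA uint64
}

// needsResize reports whether the partition can still grow, that is,
// whether it ends before the last usable sector of the disk.
func needsResize(p partition, lastUsableLBA uint64) bool {
	return p.LastLBA < lastUsableLBA
}

func main() {
	const lastUsable = 1_000_000

	small := partition{FirstLBA: 2048, LastLBA: 500_000}
	full := partition{FirstLBA: 2048, LastLBA: lastUsable}

	// Only the first partition would trigger a partition table write.
	fmt.Println(needsResize(small, lastUsable), needsResize(full, lastUsable))
}
```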
Andrey Smirnov
bc9e0c0dba fix: re-implement upgrade (install) with preserve
For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved
from `/boot` to `/system/state`.

For single node upgrade, `EPHEMERAL` partition is not touched and other
partitions are re-created as needed.

Bump provision tests to 0.6/0.7 upgrades as we get closer to the new
release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-28 07:25:26 -07:00
Andrey Smirnov
775010ddba fix: stop ignoring EINVAL on mount
This fixes a bug introduced in #1982. The intention was to ignore
`EINVAL` on `unmount` when the partition is no longer mounted, but the
change was wrong because it affected both the `mount` and `unmount` code paths.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-23 07:19:26 -07:00
Andrey Smirnov
ff4d702f77 fix: implement preserving contents of partition on install
This fixes A/B upgrades and rollback API.

The installer manifest now supports an option to preserve partition contents
while the disk is being re-partitioned and partitions are re-formatted.

Mount `/boot` partition as needed (to find current label before starting
the installation and in the rollback API).

Fix upgrade API for non-master nodes.

Contents of `/boot`, `/system/state` and META partitions are preserved
in memory while the disk is re-partitioned.

Remove `--save` flag from the installer as it's not being used.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-22 23:56:39 +03:00
Andrey Smirnov
4adb613f66 refactor: bring more control to install.Manifest execution
This unifies more code paths under the control of `install.Manifest` vs.
being split across the installer and manifest code.

There should be no functional changes now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 01:08:14 -07:00
Andrey Smirnov
018086d1fa refactor: extract blockdevice library
Library `blockdevice` was extracted as `talos-systems/go-blockdevice`,
this PR finalizes the move by removing Talos copy of it.

Some functions around `mkfs`/`growfs` were extracted as `makefs`
package, as they depend on `cmd` package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-05 11:18:43 -07:00
Seán C McCord
ff92d2a14b feat: add ApplyConfiguration API
Adds the ability to apply (replace) an existing node configuration with
a new one via the Machine API.

Fixes #2345

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-09-29 14:44:06 -07:00
Andrey Smirnov
f6ecf000c9 refactor: extract packages loadbalancer and retry
This removes in-tree packages in favor of:

* github.com/talos-systems/go-retry
* github.com/talos-systems/go-loadbalancer

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 13:46:22 -07:00
Andrew Rynhard
1a4059a553 feat: add grub bootloader
This moves to using grub instead of syslinux.

BREAKING CHANGE: Single node upgrades will fail in this change. This
will also break the A/B fallback setup since this version introduces
an entirely new partition scheme, that any fallback will not know about.
We plan on addressing these issues in a follow up change.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-01 12:06:43 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
41d5f7859a chore: update golangci-lint to 1.28.3
Fixes #2272

`gofumpt` is now included in `golangci-lint`, but `gofumports` is not,
so we keep using it as a separate binary and keep its version in sync
with `golangci-lint`.

This contains fixes from:

* `gofumpt` (automated, mostly around octal constants)
* `exhaustive` in `switch` statements
* `noctx` (adding context with default timeout to http requests)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 08:05:42 -07:00
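The `noctx` fix mentioned in the commit above (adding a context with a default timeout to HTTP requests) might look like the sketch below. The helper names and the 30-second value are assumptions; only `http.NewRequestWithContext` and `context.WithTimeout` are standard library calls.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// defaultTimeout is an assumed value chosen for this sketch.
const defaultTimeout = 30 * time.Second

// newGetRequest builds a GET request bound to a context with a default
// timeout. The caller must call the returned cancel function.
func newGetRequest(url string) (*http.Request, context.CancelFunc, error) {
	ctx, cancel := context.WithTimeout(context.Background(), defaultTimeout)

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		cancel()

		return nil, nil, err
	}

	return req, cancel, nil
}

// hasDeadline reports whether the request carries a context deadline.
func hasDeadline(req *http.Request) bool {
	_, ok := req.Context().Deadline()

	return ok
}

func main() {
	req, cancel, err := newGetRequest("https://example.com/")
	if err != nil {
		fmt.Println("error:", err)

		return
	}
	defer cancel()

	// No network traffic happens here; the request is only constructed.
	fmt.Println(req.Method, hasDeadline(req))
}
```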
Andrew Rynhard
888c8b948a feat: add /system directory
This adds the `/system` directory to provide a dedicated
directory for all system related runtime files.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-01 09:51:56 -07:00
Andrey Smirnov
81d1c2bfe7 chore: enable godot linter
Issues were fixed automatically.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 10:39:56 -07:00
Andrew Rynhard
6ea313fa7d fix: detect if partition table is missing
This adds a sentinel error for a missing partition table. This error
is used to detect if a partition table already exists when setting
up user defined disks.

In addition to the fix, this removes a legacy parameter from the
`PartitionTable` method that indicated that the partition table
should be read. It is safer to just read it every time. Also, I
can't think of a case when the block device partition table is nil
and we want to read.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-06-16 18:26:59 -07:00
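The sentinel error pattern from the commit above can be sketched as follows. `ErrMissingPartitionTable`, `readPartitionTable`, and `shouldCreateTable` are hypothetical names; the commit message does not give the real identifiers.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMissingPartitionTable is a hypothetical sentinel error; the real
// name in the code base may differ.
var ErrMissingPartitionTable = errors.New("missing partition table")

// readPartitionTable is a stand-in for reading a table from a block
// device. An empty header is treated as "no table" for this sketch.
func readPartitionTable(header []byte) error {
	if len(header) == 0 {
		return fmt.Errorf("read partition table: %w", ErrMissingPartitionTable)
	}

	return nil
}

// shouldCreateTable reports whether err means that no partition table
// exists yet, as opposed to some other read failure.
func shouldCreateTable(err error) bool {
	return errors.Is(err, ErrMissingPartitionTable)
}

func main() {
	err := readPartitionTable(nil)
	fmt.Println(err, shouldCreateTable(err))
}
```

Because the sentinel is matched with `errors.Is`, callers can still wrap it with context without breaking detection.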
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packages in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrew Rynhard
4ccd4d5364 fix: set ephemeral partition to max size
This sets the size of the ephemeral partition to the maximum
allowed size at installation time. We have reports of `xfs_growfs` causing
extremely slow boot times when the disk is 1TB or more. In our research
we found evidence that `xfs_growfs` is an expensive operation when
growing to a size of 10 times or more of the base. Instead, users should
create the disk close to the max disk size at install time. The
difference is that `mkfs.xfs` handles larger disks better.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-17 07:08:04 -07:00
Andrey Smirnov
f18b5737d8 fix: ignore EINVAL on unmounting when mount point isn't mounted
This fixes reboot errors like:

```
[    8.963544] [talos] [phase]: unmount system disk submounts error running task: unmount: 1 error(s) occurred:
[    8.964428] 	invalid argument
[    8.964711] [talos] [phase]: unmount system disk submounts done, 1.836089ms
[    8.965354] [talos] error running phase "unmount system disk submounts": 1 error occurred:
[    8.966077] 	* unmount: 1 error(s) occurred:
[    8.966469] 	invalid argument
[    8.966765]
[    8.966919]
[    8.967102] [talos] recovered from: shutdown failed: error running phase "unmount system disk submounts": 1 error occurred:
[    8.968119] 	* unmount: 1 error(s) occurred:
[    8.968530] 	invalid argument
[    8.968815]
[    8.968966]
[    8.969177] [talos] rebooting in 10 seconds
```

Fixing this on the safe side: unmount is always performed, and `EINVAL`
is ignored only if the mountpoint is not in `/proc/mounts`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-27 15:41:06 -07:00
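The rule stated in the commit above (ignore `EINVAL` from unmount only when the mountpoint is absent from `/proc/mounts`) can be sketched like this. The function names are hypothetical, and the mount table is passed in as text so the example runs without touching the system. Real `/proc/mounts` entries escape spaces in paths, which this sketch does not handle.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
	"syscall"
)

// isMounted scans /proc/mounts-formatted data and reports whether
// target appears as a mount point (the second field of a line).
func isMounted(mounts io.Reader, target string) (bool, error) {
	scanner := bufio.NewScanner(mounts)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[1] == target {
			return true, nil
		}
	}

	return false, scanner.Err()
}

// filterUnmountErr drops EINVAL only when the target is not mounted;
// every other error, and EINVAL on a mounted target, is kept.
func filterUnmountErr(err error, mounted bool) error {
	if errors.Is(err, syscall.EINVAL) && !mounted {
		return nil
	}

	return err
}

// demo simulates an unmount that failed with EINVAL for target.
func demo(mountsData, target string) error {
	mounted, err := isMounted(strings.NewReader(mountsData), target)
	if err != nil {
		return err
	}

	return filterUnmountErr(syscall.EINVAL, mounted)
}

func main() {
	const mounts = "/dev/sda6 /var xfs rw 0 0\n"

	fmt.Println(demo(mounts, "/var"))  // still mounted: error is kept
	fmt.Println(demo(mounts, "/boot")) // not mounted: error is ignored
}
```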
Spencer Smith
12bfd8dd94 feat: allow for persistence of config data
This PR will allow users to set the `persist: true` value in their
config data to tell Talos not to re-pull the config data at each reboot.
The default will still remain a "pull every time" methodology in order
to encourage immutability by default.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-06 11:42:00 -05:00
Andrew Rynhard
1a68840eb4 feat: add function for mounting a specific system disk partition
We need the ability to manage the boot partition and ephemeral partition
independently. This adds the required functions to allow for that.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-28 11:27:31 -08:00
Andrew Rynhard
f8c2f14119 fix: mount as rshared
This adds unix.MS_REC to the shared mounts. We haven't seen any reports
of bugs yet, but in some testing I found that `Bidirectional` mounts don't
work unless the mount point is rshared.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-10 08:07:25 -08:00
Andrew Rynhard
4efccd96ea refactor: rename virtual package to pseudo
This aligns the nomenclature for filesystems like /dev and /proc with
what is used in the kernel code.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-26 22:32:48 -08:00
Andrew Rynhard
031c65be47 feat: add IMA policy
This creates an IMA policy at boot. It uses the default TCB policy with
a dont_measure rule for XFS.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-26 16:49:48 -08:00
Andrew Rynhard
3f49a15c06 feat: enable IMA measurement and appraisal
This updates the kernel to make use of a version that has IMA
measurement and appraisal enabled. It is not yet enforced. Additionally,
this adds the securityfs mount at /sys/kernel/security.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-26 03:15:58 -08:00
Andrew Rynhard
f411491484 fix: stop leaking file descriptors
This ensures that probed block devices are closed.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-03 17:15:54 -08:00
Andrey Smirnov
d3d011c8d2 chore: replace /* */ comments with // comments in license header
This fixes issues with `// +build` directives not being recognized in
source files.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 14:15:17 -07:00
Andrew Rynhard
d430a37e46 refactor: use go 1.13 error wrapping
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-15 22:20:50 -07:00
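For readers unfamiliar with the Go 1.13 wrapping referenced in the commit above, here is a small sketch using only the standard library. The `openConfig` and `isNotExist` helpers and the file path are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// openConfig wraps the underlying error with %w so that callers can
// still inspect the cause with errors.Is or errors.As.
func openConfig(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("open config %q: %w", path, err)
	}

	return f.Close()
}

// isNotExist reports whether err, or anything it wraps, means that a
// file does not exist.
func isNotExist(err error) bool {
	return errors.Is(err, os.ErrNotExist)
}

func main() {
	err := openConfig("/nonexistent-dir/config.yaml")

	fmt.Println(err)
	fmt.Println(isNotExist(err))
}
```

With `github.com/pkg/errors` the same wrapping would typically be written with `errors.Wrap`; `%w` plus `errors.Is` covers that use without the extra dependency.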
Andrew Rynhard
92de30715e feat: add retry package
This package provides a consistent way for us to retry arbitrary logic.
It provides the following backoff algorithms:

- exponential
- linear
- constant

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-10 13:11:02 -07:00
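The three backoff strategies listed in the commit above can be illustrated with a short sketch. This is not the API of the Talos retry package; `backoff`, `retry`, and the constructor names are assumptions chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// backoff maps a 1-based attempt number to the delay before the next try.
type backoff func(attempt int) time.Duration

// constant always waits d.
func constant(d time.Duration) backoff {
	return func(int) time.Duration { return d }
}

// linear waits d, 2d, 3d, ...
func linear(d time.Duration) backoff {
	return func(n int) time.Duration { return time.Duration(n) * d }
}

// exponential waits d, 2d, 4d, ...
func exponential(d time.Duration) backoff {
	return func(n int) time.Duration { return d << uint(n-1) }
}

var errNotReady = errors.New("not ready")

// retry calls fn up to maxAttempts times, sleeping between failed
// attempts according to b, and wraps the last error on failure.
func retry(maxAttempts int, b backoff, fn func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}

	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(b(attempt))
		}
	}

	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	err := retry(4, exponential(time.Millisecond), func() error {
		calls++
		if calls < 3 {
			return errNotReady
		}

		return nil
	})

	fmt.Println(calls, err)
}
```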
Andrey Smirnov
c2cb0f9778 chore: enable 'wsl' linter and fix all the issues
I wish there were fewer of them :)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-10 01:16:29 +03:00
Andrew Rynhard
5ee554128e chore: move from gofumpt to gofumports
The gofumports does everything that gofumpt does with the addition of
formatting imports. This change proposes the use of the `-local` flag so
that we can have imports separated in the following order:

- standard library
- third party
- Talos specific

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-12 07:49:12 -07:00
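The three-group import order from the commit above can be expressed as a small classifier. This is a sketch of the ordering rule, not the formatter's implementation. It uses the heuristic that a standard library import path has no dot in its first element, and the local prefix and sample paths are placeholders.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const (
	groupStdlib = iota
	groupThirdParty
	groupLocal
)

// classify assigns an import path to one of the three groups.
func classify(path, localPrefix string) int {
	switch {
	case localPrefix != "" && strings.HasPrefix(path, localPrefix):
		return groupLocal
	case !strings.Contains(strings.SplitN(path, "/", 2)[0], "."):
		return groupStdlib
	default:
		return groupThirdParty
	}
}

// groupImports returns a copy of paths ordered by group, and
// alphabetically within each group.
func groupImports(paths []string, localPrefix string) []string {
	out := append([]string{}, paths...)

	sort.SliceStable(out, func(i, j int) bool {
		gi, gj := classify(out[i], localPrefix), classify(out[j], localPrefix)
		if gi != gj {
			return gi < gj
		}

		return out[i] < out[j]
	})

	return out
}

func main() {
	imports := []string{
		"github.com/talos-systems/talos/pkg/example", // placeholder path
		"golang.org/x/sys/unix",
		"os",
		"fmt",
	}

	for _, p := range groupImports(imports, "github.com/talos-systems/talos") {
		fmt.Println(p)
	}
}
```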
Andrew Rynhard
2955428850 chore: format code with gofumpt
The gofumpt linter is a stricter drop-in replacement for gofmt. The
rules are ones that I strongly agree with and I think it would be better
if we added this linter instead of nit picking every PR.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-11 11:03:29 -07:00
Andrew Rynhard
0bdaff1a90 feat: perform upgrades via container
This moves to performing upgrades via a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 09:44:50 -07:00
Andrew Rynhard
be8f58c15d feat: add overlay task
This adds a well defined task for handling all overlay mount points that
are required by the system.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-25 10:47:54 -07:00
Andrew Rynhard
1eb02875c2 feat: use BLKPG ioctl for partition events
This moves to using BLKPG ioctl instead of BLKRRPART. BLKRRPART is older
and more sensitive to EBUSY errors. BLKPG has the potential to minimize
the chances of encountering an EBUSY error when manipulating partition
tables.

In looking at a comparison between BLKPG and BLKRRPART, it seems that
both have their pros and cons. Eventually a combination of the two may
serve us better, but for now I think BLKPG will get us further.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-25 07:55:24 -07:00
Andrew Rynhard
2e65cff3ce feat: mount /sys/fs/bpf
The BPF filesystem is required to pin BPF objects.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-18 07:37:08 -07:00
Andrew Rynhard
a116145c1b feat: rename DATA partition to EPHEMERAL
This changes the data partition name to something more appropriate. We
chose ephemeral to make it very clear that the disk should not be used
for application data.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-15 08:00:22 -07:00
Brad Beam
da1f73249f fix(machined): Clean up installation process
This also includes a fix for #955, which had the unintended side effect
of breaking image creation (since it would always attempt to grow the
filesystem).

The refactor standardizes around looking for the DATA and ESP labels to
discover any existing installations/filesystems. If none are found, an
installation will proceed -- for both image creation and bare metal.
During bootup, the DATA partition will always attempt to expand/grow.

This also introduces a new phase to verify the installation through the
existence of /boot/installed (migrated from the install stage).

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-08 22:10:14 -05:00
Brad Beam
53b1330c44 fix(initramfs): Allow data partition to grow
This fix ensures that we always grow the data partition during an installation.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-07 09:11:02 -05:00
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00
Andrew Rynhard
a9c4a95a4b fix: mount the owned partitions in cloud platforms
This adds the logic for mounting the owned block device and resizing the
ephemeral partition for cloud platforms.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 21:48:23 -07:00
Andrew Rynhard
ca35b85300 refactor: improve installation reliability
This change aims to make installations more unified and reliable. It
introduces the concept of a mountpoint manager that is capable of
mounting, unmounting, and moving a set of mountpoints in the correct
order.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 11:44:40 -07:00
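The commit above does not spell out what "the correct order" is. One plausible reading, assumed here, is that parents are mounted before children and unmounted after them. The sketch below orders mount targets by path depth; all names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// depth counts the non-empty path segments: "/" is 0, "/var" is 1.
func depth(path string) int {
	return len(strings.FieldsFunc(path, func(r rune) bool { return r == '/' }))
}

// mountOrder returns a copy of targets with parents before children.
func mountOrder(targets []string) []string {
	out := append([]string{}, targets...)

	sort.SliceStable(out, func(i, j int) bool {
		return depth(out[i]) < depth(out[j])
	})

	return out
}

// unmountOrder is the reverse of mountOrder: children before parents.
func unmountOrder(targets []string) []string {
	out := mountOrder(targets)

	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}

	return out
}

func main() {
	targets := []string{"/var/lib", "/", "/var"}

	fmt.Println(mountOrder(targets))
	fmt.Println(unmountOrder(targets))
}
```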
Andrew Rynhard
e63c882b89 refactor: split machined into phases
This change aims to standardize the boot process. It introduces the
concept of a phase, which is composed of tasks. Phases are run serially, and
the tasks that make up a phase are run concurrently.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 12:40:03 -07:00
Andrew Rynhard
0ec17e4169 feat: run rootfs from squashfs
This change moves the rootfs to a squashfs image.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-25 08:38:31 -07:00
Andrew Rynhard
5a68b8b371 fix: mount cgroups properly
This change mounts cgroups properly.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-24 22:10:15 -07:00
Andrew Rynhard
d4a59b7c14 fix(init): mount root partition as read-only
This uses the correct mount flag for read-only.
We mistakenly had the flag for opening a file as read-only.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-12 21:34:59 -07:00
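The commit above does not name the constants involved. Assuming the mix-up was between the file-open flag `os.O_RDONLY` and the Linux mount flag `MS_RDONLY`, the sketch below shows why the wrong one silently has no effect. `msRdonly` is redefined locally so the example compiles on any platform; real code would use `unix.MS_RDONLY` from `golang.org/x/sys/unix`.

```go
package main

import (
	"fmt"
	"os"
)

// msRdonly mirrors the Linux MS_RDONLY mount flag, which has the
// value 1. It is redefined here only to keep the sketch portable.
const msRdonly uintptr = 0x1

// isReadOnlyMount reports whether a mount flag set requests a
// read-only mount.
func isReadOnlyMount(flags uintptr) bool {
	return flags&msRdonly != 0
}

func main() {
	// os.O_RDONLY is a flag for opening files and has the value 0,
	// so passing it as a mount flag sets no bits at all.
	wrong := uintptr(os.O_RDONLY)
	right := msRdonly

	fmt.Println(isReadOnlyMount(wrong), isReadOnlyMount(right))
}
```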
Andrey Smirnov
7da7c8c2ff refactor: add stub unit-tests to non-trivial Go packages (#556)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-04-17 13:25:22 -07:00
Andrew Rynhard
7688de6a3a chore: upgrade golangci-lint to v1.16.0 (#515)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-09 21:53:35 -07:00
Andrew Rynhard
ee226dddac chore: enforce commit and license policies (#304)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-13 16:10:49 -08:00