91 Commits

Author SHA1 Message Date
Andrey Smirnov
b3c3ef29bd
feat: install system extensions
Fixes #4815

This implements the following steps:

* machine configuration updates
* pulling and unpacking system extension images
* validating, listing system extensions
* re-packing system extensions
* preserving installed extensions in `/etc/extensions.yaml`

Once an extension is enabled, its raw information can be queried with:

```
$ talosctl -n 172.20.0.2 cat /etc/extensions.yaml
layers:
    - image: 000.ghcr.io-smira-gvisor-c927b54-dirty.sqsh
      metadata:
        name: gvisor
        version: 20220117.0-v1.0.0
        author: Andrew Rynhard
        description: |
            This system extension provides gVisor using containerd's runtime handler.
        compatibility:
            talos:
                version: '> v0.15.0-alpha.1'
```

This was tested with the `gvisor` system extension.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-01-26 16:24:28 +03:00
Serge Logvinov
2869b5eeac
feat: add oraclecloud.com platform support
* cloud-init for oraclecloud (IMDSv2)
* amd64/arm64 arch
* set DHCPv6 on if IPv6 subnet allocated

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-01-14 16:56:37 +03:00
Serge Logvinov
353d632ae5
feat: add nocloud platform support
* fetch cdrom/net nocloud config
* apply simple network configuration

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 16:32:12 +03:00
Artem Chernyshev
519999b846
fix: use readonly mode when probing devices with All lookup
Update `go-blockdevice` library.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-09-23 14:47:52 +03:00
Serge Logvinov
19a8ae97c6
feat: add vultr.com cloud support
* cloud-init for vultr.com
* ipv4/v6 support
* set static IPs for private interface

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-14 22:58:30 +03:00
Serge Logvinov
3b5f4038de
feat: add scaleway.com cloud support
* cloud-init for scaleway
* set ipv6 to the interface

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-09 23:01:50 +03:00
Serge Logvinov
f156ab1847
feat: add upcloud.com cloud support
* cloud-init for upcloud.com
* ipv4/v6 support

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
2021-09-09 17:00:05 +03:00
Serge Logvinov
812d59c700
feat: add hetzner.com cloud support
* cloud-init for hcloud
* set ipv6 to the interface

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-07 21:33:15 +03:00
Andrey Smirnov
faecae44fd feat: make ISO builds reproducible
This relies on changes in GRUB and other utilities to respect
`SOURCE_DATE_EPOCH`.

Variable `SOURCE_DATE_EPOCH` is set to the timestamp of the last git
commit which makes it deterministic, but still changes for each
release/commit.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-08-11 09:20:07 -07:00
Alexey Palazhchenko
fdf6b2433c chore: revert "improve artifacts generation reproducibility"
GCP does not consider generated .tar file to be valid.

This reverts commit b2507b41d250b989b9c13ad23e16202cd53a18d2.
Refs #4023.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-08-10 11:50:42 -07:00
Andrey Smirnov
b2507b41d2 chore: improve artifacts generation reproducibility
Sparse file generation replaced with Go native calls.

Final artifact `.tar` reproducible with new tar flags and using GNU tar
instead of busybox one, but as the image itself is not reproducible,
this only helps a bit.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-08-09 06:54:47 -07:00
Andrey Smirnov
6d6ed1170f chore: use parallel xz with higher compression level
Preset `-0` for xz means fast compression but low compression level.
Changing this to `-6` (default) means that result is 10% smaller (tested
with RPi4 image).

Enable parallel compression with number of threads equal to number of
CPUs to make it compress even faster than with `-0`:

* `-0`: 15s
* `-6`: 60s
* `-6 -T 0`: 10s (on my machine, depends on number of cores)
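
For illustration only, the invocation measured above could be assembled from Go roughly as follows. The helper `xzCommand` and the file name `image.raw` are hypothetical; only the `xz` flags `-6` and `-T 0` come from the commit message.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// xzCommand builds an xz invocation with the given preset and thread count.
// A thread count of 0 asks xz to use as many threads as there are CPU cores.
func xzCommand(path string, preset, threads int) *exec.Cmd {
	return exec.Command("xz", "-"+strconv.Itoa(preset), "-T", strconv.Itoa(threads), path)
}

func main() {
	cmd := xzCommand("image.raw", 6, 0)

	// Only the argument list is printed here; calling cmd.Run() would compress the file.
	fmt.Println(cmd.Args)
}
```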

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-08-06 12:51:31 -07:00
Andrey Smirnov
dee6305170 fix: align partitions with minimal I/O size
Also print discovered blockdevice properties before partitioning the
device.

See https://github.com/talos-systems/go-blockdevice/pull/40
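
The core of such an alignment is rounding a partition start offset up to the next multiple of the device's minimal I/O size. The sketch below shows that arithmetic only. The function name `alignUp` and the 4096-byte value are assumptions for illustration and are not taken from `go-blockdevice`.

```go
package main

import "fmt"

// alignUp rounds offset up to the next multiple of alignment.
// An alignment of 0 leaves the offset unchanged.
func alignUp(offset, alignment uint64) uint64 {
	if alignment == 0 {
		return offset
	}

	return (offset + alignment - 1) / alignment * alignment
}

func main() {
	const minIOSize = 4096 // hypothetical minimal I/O size reported by the device

	// An offset one byte past 1 MiB is pushed to the next 4096-byte boundary.
	fmt.Println(alignUp(1048577, minIOSize))
}
```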

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-08-04 11:51:00 -07:00
Andrew Rynhard
821f469a1d feat: skip overlay mount checks with docker
We need to be able to run an install with `docker run`. This checks if
we are running from docker and skips overlay mount checks if we are, as
docker creates a handful of overlay mounts by default that we can't
work around (not easily at least).

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2021-06-21 15:51:39 -07:00
Andrey Smirnov
5811f4dda1 feat: implement link (interface) controllers
The structure of the controllers is really similar to addresses and
routes:

* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state

Controller `LinkStatus` (which was implemented before) watches the
kernel state and publishes current link status.
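
The merge step can be pictured with a small sketch: for each link name, the spec coming from the highest-priority configuration layer wins. The types, field names, and layer ordering below are simplified assumptions and do not reproduce the actual Talos resources.

```go
package main

import (
	"fmt"
	"sort"
)

// ConfigLayer is a hypothetical priority; higher values override lower ones.
type ConfigLayer int

const (
	LayerDefault ConfigLayer = iota
	LayerCmdline
	LayerMachineConfig
)

// LinkSpec is a simplified stand-in for the desired link state.
type LinkSpec struct {
	Name  string
	Up    bool
	MTU   uint32
	Layer ConfigLayer
}

// mergeLinkSpecs keeps, for every link name, the spec from the highest layer
// and returns the result sorted by name for stable output.
func mergeLinkSpecs(specs []LinkSpec) []LinkSpec {
	best := map[string]LinkSpec{}

	for _, s := range specs {
		if cur, ok := best[s.Name]; !ok || s.Layer > cur.Layer {
			best[s.Name] = s
		}
	}

	merged := make([]LinkSpec, 0, len(best))
	for _, s := range best {
		merged = append(merged, s)
	}

	sort.Slice(merged, func(i, j int) bool { return merged[i].Name < merged[j].Name })

	return merged
}

func main() {
	merged := mergeLinkSpecs([]LinkSpec{
		{Name: "eth0", Up: true, MTU: 1500, Layer: LayerCmdline},
		{Name: "eth0", Up: true, MTU: 9000, Layer: LayerMachineConfig},
		{Name: "eth1", Up: false, MTU: 1500, Layer: LayerDefault},
	})

	for _, s := range merged {
		fmt.Printf("%s up=%v mtu=%d\n", s.Name, s.Up, s.MTU)
	}
}
```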

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-01 09:36:25 -07:00
Artem Chernyshev
76dbfb3699 feat: add ability to mark MBR partition bootable
Fixes: https://github.com/talos-systems/talos/issues/3532

Machine install section now has `markMBRBootable` option.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-05-27 12:44:50 -07:00
Joost Coelingh
f7cf64d42e fix: add talos.config to the vApp Properties in VMware OVA
VMware vSphere doesn't allow talos.config to be set when deploying the OVA
due to missing vApp properties. This adds talos.config to the OVF template
to fix this.

Fixes talos-systems#3669

Signed-off-by: Joost Coelingh <joost.coelingh@eu.equinix.com>
2021-05-24 14:31:53 -07:00
Lennard Klein
7f468d350a fix: update osType in OVA to other3xLinux64Guest
VMware vSphere considers the OVA invalid, seemingly because it considers
VirtualSCSI incompatible with osType otherLinux64Guest. Updating the osType
to other3xLinux64Guest fixes this.

Fixes talos-systems#3515

Signed-off-by: Lennard Klein <lennard.klein@eu.equinix.com>
2021-04-21 05:46:31 -07:00
Andrey Smirnov
5fb38d3e5f chore: refactor Dockerfile for cross-compilation
This has two big visible changes:

* `installer` image now contains assets for both `amd64` and `arm64`, so
it can be used to generate any Talos image (including RPi on amd64 host)
* Talos is using cross-compilation instead of emulation to build
non-native architectures: on amd64, Go amd64 compiler produces binaries
for both arm64 and amd64
(before this change: Go arm64 compiler via QEMU produces arm64 binaries on amd64)

CI implications: we no longer require arm64 nodes.

Changes walkthrough:

* `installer` container now keeps assets under `/usr/install/<arch>`
* Dockerfile build starts forcing toolchain/base image to use the build
host native architecture, not target architecture
* lots of duplication for amd64/arm64 as we want to combine assets for
both arches in a single image (e.g. we have multi-arch amd64/arm64
installer image, each arch has native installer binary, but both arches
contain full set of amd64/arm64 assets)
* fixed a small bug preventing arm64 on amd64 talosctl cluster create

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-20 13:11:38 -07:00
Jorik Jonker
8b8542e3b5 feat: add support for reading OVF data on VMWare
The OVF environment is a way to supply guestinfo to guests. It is
a datastructure (XML) put in `extraConfig` (commonly referred to as
`guestinfo`) under the key `ovfenv`.

This OVF env is said to be the proper way to supply customization data
to guests (ie, not through `extraConfig`), and on some platforms (eg,
vCD), it is even the only option.

This change also enables the actual OVF transport in the OVA.

Signed-off-by: Jorik Jonker <jorik.jonker@eu.equinix.com>
2021-04-13 16:16:44 +03:00
Andrey Smirnov
d5e2a45db3 feat: validate the machine configuration in the installer container
Talos validates machine configuration at boot time, and refuses to boot
if machine configuration is invalid.

As machine configuration validation rules might change over time, we
need to prevent a scenario when after an upgrade machine configuration
becomes invalid, as there's no way to roll back properly.

Machine configuration is submitted over stdin to the installer
container, and installer container validates it using the new version of
Talos (which is going to be installed).

If the config is not sent over stdin, installer assumes old version of
Talos and proceeds.

This should be backported to 0.9 to allow config validation on upgrade
to 0.10.

Fixes #3419

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-12 06:47:28 -07:00
Andrey Smirnov
bd5ae1e0b5 fix: add a check for overlay mounts in installer pre-flight checks
Overlay mounts in `mountinfo` don't show up as mounts for any particular
block device, so the existing check doesn't catch them.

This was discovered as our current master can't upgrade because of
overlay mount for `/opt` and `apid` image in `/opt/apid` (which will be
fixed in a separate PR).

Without the check, installer fails on resetting partition table for the
disk effectively wiping the node (`device or resource busy` error).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-05 14:29:46 -07:00
Andrey Smirnov
3c5bfbb473 fix: don't touch any partitions on upgrade with --preserve
This fixes a case of upgrade from 0.9.0-alpha.4 to 0.9.0-beta.0. With
introduced proper partition alignment and physical block size != 512,
partitions before ephemeral will be moved around a bit (due to the
alignment), and `STATE` partition size might change a bit.

If encryption is enabled, contents are preserved as raw bytes, so
partition size should be exactly same during restore.

Drop code (mostly tests) which handled 0.6 to 0.7 upgrades.

On upgrade with preserve, don't touch any partitions; at least for 0.8 ->
0.9 the layout hasn't changed.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-15 12:25:53 -07:00
Artem Chernyshev
22f375300c chore: update golangci-lint to 1.38.0
Fix all discovered issues.
Detected a couple of bugs, fixed them as well.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-12 06:50:02 -08:00
Alexey Palazhchenko
df52c13581 chore: fix //nolint directives
That's the recommended syntax:
https://golangci-lint.run/usage/false-positives/

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-05 05:58:33 -08:00
Andrey Smirnov
e9fc54f6e3 feat: update Kubernetes to 1.20.3
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1202

Also update pkgs for:

* talos-systems/pkgs#238 (raspberrypi-firmware update)
* talos-systems/pkgs#242 (Linux 5.10.17 + init_on_free=0)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-19 05:22:34 -08:00
Artem Chernyshev
58ff2c9808 feat: implement ephemeral partition encryption
This PR introduces the first part of disk encryption support.
New config section `systemDiskEncryption` was added into MachineConfig.
For now it contains only Ephemeral partition encryption.

Encryption itself supports two kinds of keys for now:
- node id deterministic key.
- static key which is hardcoded in the config and mainly used for test
purposes.

Talosctl cluster create can now be told to encrypt ephemeral partition
by using `--encrypt-ephemeral` flag.

Additionally:
- updated pkgs library version.
- changed Dockerfile to copy cryptsetup deps from pkgs.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-17 13:39:04 -08:00
Artem Chernyshev
02b3719df9 feat: skip filesystem for state and ephemeral partitions in the installer
The filesystem creation step is moved to a later stage: when Talos mounts
the partition for the first time.
It now checks whether the partition has a filesystem and, if not, formats
it right before mounting.

Additionally refactored mount options a bit:
- replaced separate options with a set of binary flags.
- implemented pre-mount and post-unmount hooks.

Also fixed typos in a couple of places and increased the timeout for `apid ready`.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-17 09:37:21 -08:00
Artem Chernyshev
f96548e165 refactor: extract go-cmd into a separate library
To be used in the `go-blockdevice` library.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-16 10:31:20 -08:00
Andrey Smirnov
6791036cfa fix: add 3 seconds grub boot timeout
This allows dropping into the GRUB menu to edit the boot configuration.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-12 11:13:06 -08:00
Andrey Smirnov
18db20dbc2 fix: open blockdevices with exclusive flock for partitioning
This fixes spurious race conditions when user disks are partitioned
and formatted in `mountUserDisks` task. While this task runs, `udevd` is
running to allow various `/dev/` symlinks to be used for user disks.
At the same time `udevd` might trigger syscall `BLKRRPART` at any time
concurrently with Talos which leads to a race on kernel side when Talos
tries to update kernel partition table while kernel does it on its own
as a result of `udevd` call.

As part of the fix, `RereadPartitionTable()` calls were removed (they
trigger `BLKRRPART` and they're not needed as Talos updates partition
table on its own).

Some cleanups to make sure blockdevice is open/closed just in matching
pairs (no lingering open blockdevice instances). This is important for
`WithExclusiveLock()` calls, as it would lead to a deadlock if previous
blockdevice instance is not closed.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-28 09:11:39 -08:00
Artem Chernyshev
a83af03730 refactor: update go-blockdevice and restructure disk interaction code
This refactoring is required to simplify the work to be done to support
disk encryption.

Tried to minimize amount of queries done by `blockdevice` `probe`
methods.
Instead, where we have `runtime.Runtime` we get all required blockdevices
there from blockdevice cache stored in `State().Machine().Disk()`.
This opens a way to store encryption settings in the `Partition`
objects.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-01-28 17:42:09 +03:00
Andrey Smirnov
d19486afaa fix: allow 'console' argument in kernel args to be always overridden
Fixes #3011

See also https://github.com/talos-systems/go-procfs/pull/8

We don't want to allow all the kernel args to be overridden, as this
might compromise KSPP, but we would rather allow some args to be
overridden explicitly.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-08 08:08:34 -08:00
Artem Chernyshev
7b6c4bcb1f refactor: define default kernel flags in machinery instead of procfs
That change should make Talos updates more straightforward in any
projects that depend on Talos.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-24 06:50:53 -08:00
Artem Chernyshev
47fb7d26e0 fix: use SetAll instead of AppendAll when building kernel args
SBC should always overwrite default kernel params.
Otherwise we will always get duplicate values for some of them.
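
The difference between appending and setting can be shown with a toy kernel command line. The `Cmdline` type below is an illustrative assumption and does not reproduce the library's API, and the parameter values are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// Cmdline is a toy ordered list of kernel parameters.
type Cmdline struct {
	params []string
}

// paramKey returns the part of a parameter before the first "=".
func paramKey(p string) string {
	return strings.SplitN(p, "=", 2)[0]
}

// Append adds the parameter even if its key is already present.
func (c *Cmdline) Append(param string) {
	c.params = append(c.params, param)
}

// Set drops every existing parameter with the same key, then adds the new one.
func (c *Cmdline) Set(param string) {
	kept := make([]string, 0, len(c.params)+1)

	for _, p := range c.params {
		if paramKey(p) != paramKey(param) {
			kept = append(kept, p)
		}
	}

	c.params = append(kept, param)
}

// String renders the command line as a space-separated string.
func (c *Cmdline) String() string {
	return strings.Join(c.params, " ")
}

func main() {
	appended := &Cmdline{params: []string{"console=tty0", "panic=0"}}
	appended.Append("console=ttyS2,115200")
	fmt.Println(appended) // the console key now appears twice

	set := &Cmdline{params: []string{"console=tty0", "panic=0"}}
	set.Set("console=ttyS2,115200")
	fmt.Println(set) // the default console value is replaced
}
```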

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-23 09:09:13 -08:00
Andrey Smirnov
17830b9152 fix: disable kmsg throttling for iso mode
Just adding the kernel arg.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-15 07:08:52 -08:00
Andrey Smirnov
80184393bc feat: update kernel to 5.9.13, new KSPP requirements
Pulls in following changes:

* https://github.com/talos-systems/toolchain/pull/20
* https://github.com/talos-systems/tools/pull/116
* https://github.com/talos-systems/pkgs/pull/214
* https://github.com/talos-systems/pkgs/pull/215
* https://github.com/talos-systems/pkgs/pull/216
* https://github.com/talos-systems/pkgs/pull/217
* https://github.com/talos-systems/go-procfs/pull/4

New empty amd64 images for u-boot & rpi-firmware reduce the size of
amd64 installer image.

For backwards compatibility QEMU provisioner still injects "legacy" KSPP
kernel args into initial boot environment.

Installer correctly upgrades KSPP options when moving from one version
of Talos to another.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-10 12:41:58 -08:00
Andrey Smirnov
54ed80e244 feat: reset with system disk wipe spec
The idea is to add an option to perform a "selective" reset: the default reset
operation is to wipe all partitions (triggering reinstall), while the spec
allows wiping only some of the partitions.

Other operations are performed exactly in the same way for any reset
flow.

Possible use case: reset only `EPHEMERAL` partition.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-10 11:31:07 -08:00
Andrey Smirnov
350280eb59 feat: implement "staged" (failsafe/backup) upgrades
Regular upgrade path takes just one reboot, but it requires all the
processes to be stopped on the node before upgrade might proceed. Under
some circumstances and with potential Talos bugs it might not work
rendering Talos upgrades almost impossible.

Staged upgrades build upon regular install flow to run the upgrade on
the node reboot. Such upgrades require two reboots of the node, and it
requires two pulls of the installer image, but they should be much less
susceptible to failure. Once the upgrade is staged, the node can be
rebooted in any possible way, including hard reset and upgrade is
performed on the next boot.

New ADV format was implemented as well to allow storing the install image
ref/options across reboots. New format allows for bigger values and
takes 50% of the `META` partition. Old ADV is still kept for
compatibility reasons.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-08 08:34:26 -08:00
Andrey Smirnov
29fb7ef07b fix: zero out partitions without filesystems on install
This makes sure Talos won't pick up any potential leftover data on fresh
install. On upgrade contents of META partitions are preserved anyways.

Fixes #2919

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-08 04:41:06 -08:00
Andrew Rynhard
03094861c2 chore: output SBC images as compressed raw images
By publishing SBC images as compressed raw images, tools like etcher can flash SD cards
by using URLs to the release asset. It is also common in this community to publish compressed
images instead of tarballs.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-12-01 20:49:51 -08:00
Andrew Rynhard
5fe41ba32b feat: allow boards to set kernel args
This allows boards to provide kernel args at install time. We need this so that
we can set the console.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-12-01 07:08:20 -08:00
Andrew Rynhard
99aa3cdba5 feat: add support for the Raspberry Pi 4 Model B
This adds support for the Raspberry Pi 4 Model B.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-30 09:29:48 -08:00
Andrew Rynhard
733f068be1 chore: fix metal image name
We shouldn't include "none" in the default metal image.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-30 06:38:28 -08:00
Andrey Smirnov
d51018a76a fix: update generated .ova manifest for raw disk size
Use constant instead of hardcoded value.

Fixes #2845

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-27 06:19:32 -08:00
Andrey Smirnov
1eac88e470 feat: add support for installing to SBCs
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-26 07:18:25 -08:00
Spencer Smith
79057f93c5 feat: support openstack platform
This PR adds the ability for us to deploy Talos in openstack. Tested in
local devstack with a supplied userdata file. It also adds support to
the Makefile for building the openstack image so it'll be published with
next release.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-11-25 07:12:57 -08:00
Andrey Smirnov
61facf700a chore: build arm64 images in CI
This changes installer image/iso output to be tar via stdout
(optionally), so that we can copy artifacts back from the remote docker
daemon.

Fixes #2776

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-13 12:34:48 -08:00
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00