52 Commits

Author SHA1 Message Date
Andrey Smirnov
29fb7ef07b fix: zero out partitions without filesystems on install
This makes sure Talos won't pick up any potential leftover data on fresh
install. On upgrade contents of META partitions are preserved anyways.

Fixes #2919

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-08 04:41:06 -08:00
Andrew Rynhard
03094861c2 chore: output SBC images as compressed raw images
By publishing SBC images as compressed raw images, tools like etcher can flash SD cards
by using URLs to the release asset. It is also common in this community to publish compressed
images instead of tarballs.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-12-01 20:49:51 -08:00
Andrew Rynhard
5fe41ba32b feat: allow boards to set kernel args
This allows boards to provide kernel args at install time. We need this so that
we can set the console.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-12-01 07:08:20 -08:00
Andrew Rynhard
99aa3cdba5 feat: add support for the Raspberry Pi 4 Model B
This adds support for the Raspberry Pi 4 Model B.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-30 09:29:48 -08:00
Andrew Rynhard
733f068be1 chore: fix metal image name
We shouldn't include "none" in the default metal image.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-30 06:38:28 -08:00
Andrey Smirnov
d51018a76a fix: update generated .ova manifest for raw disk size
Use constant instead of hardcoded value.

Fixes #2845

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-27 06:19:32 -08:00
Andrey Smirnov
1eac88e470 feat: add support for installing to SBCs
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-26 07:18:25 -08:00
Spencer Smith
79057f93c5 feat: support openstack platform
This PR adds the ability for us to deploy Talos in openstack. Tested in
local devstack with a supplied userdata file. It also adds support to
the Makefile for building the openstack image so it'll be published with
next release.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-11-25 07:12:57 -08:00
Andrey Smirnov
61facf700a chore: build arm64 images in CI
This changes installer image/iso output to be tar via stdout
(optionally), so that we can copy back artifacts back from remote docker
daemon.

Fixes #2776

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-13 12:34:48 -08:00
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00
Andrew Rynhard
562f816526 refactor: use gRPC for interactive installation
Instead of hosting a web service, we decided to implement a gRPC service
that exposes APIs that can be used in a client-side interactive installer.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-03 08:36:44 -08:00
Andrew Rynhard
562ab1d572 chore: update golangci-lint
Brings in the latest version of golangci-lint and addresses errors.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-02 20:34:05 -08:00
Andrew Rynhard
1ca61ddce7 feat: add ISO support
This reverts commit 3515f4e0f8c11352539ed0d430e1f44f73c8229f.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-02 10:21:40 -08:00
Andrey Smirnov
18e847fa8b fix: bump type for DiskSize to be 64-bit
Otherwise we're bound with 4GiB partititions.

Discovered by @Unix4ever.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-29 22:00:35 +03:00
Andrey Smirnov
e461320bcb chore: output more logs from the installer
This will make it more obvious when installer got started, and when it
starts to wipe a disk (which might take some time).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-29 01:48:45 +03:00
Andrey Smirnov
bc9e0c0dba fix: re-implement upgrade (install) with preserve
For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved
from `/boot` to `/system/state`.

For single node upgrade, `EPHEMERAL` partition is not touched and other
partitions are re-created as needed.

Bump provision tests to 0.6/0.7 upgrades as we get closer to the new
release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-28 07:25:26 -07:00
Andrey Smirnov
ff4d702f77 fix: implement preserving contents of partition on install
This fixes A/B upgrades and rollback API.

Installer manifest supports now an option to preserve partition contents
while disk is being re-partitioned and partitions are re-formatted.

Mount `/boot` partition as needed (to find current label before starting
the installation and in the rollback API).

Fix upgrade API for non-master nodes.

Contents of `/boot`, `/system/state` and META partitions are preserved
in memory while the disk is re-partitioned.

Remove `--save` flag from the installer as it's not being used.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-22 23:56:39 +03:00
Andrey Smirnov
41f92b5d35 feat: wipe disks faster in the installer
See https://github.com/talos-systems/talos/pull/2663 for details.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-22 08:32:29 -07:00
Andrey Smirnov
4adb613f66 refactor: bring more control to install.Manifest execution
This unifies more code paths under the control of `install.Manifest` vs.
being split across the installer and manifest code.

There should be no functional changes now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 01:08:14 -07:00
Andrey Smirnov
a12eb76734 test: add unit-test for the installer manifest
This test only works on local machine (see notes in the file).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-15 13:31:42 -07:00
Andrey Smirnov
018086d1fa refactor: extract blockdevice library
Library `blockdevice` was extracted as `talos-systems/go-blockdevice`,
this PR finalizes the move by removing Talos copy of it.

Some functions around `mkfs`/`growfs` were extracted as `makefs`
package, as they depend on `cmd` package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-05 11:18:43 -07:00
Andrey Smirnov
dd614c4f35 fix: update vmware image and platform
Multiple fixes from local testing:

* `.ova` file shouldn't contain `./` entries
* fix error message (`err` is `nil` at that point)
* drop `efi` boot key (BIOS mode works fine)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-08 02:05:33 -07:00
Andrew Rynhard
3515f4e0f8 feat: remove ISO support
This feature has long been broken. It is time to remove it.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-04 01:00:40 -07:00
Andrey Smirnov
f6ecf000c9 refactor: extract packages loadbalancer and retry
This removes in-tree packages in favor of:

* github.com/talos-systems/go-retry
* github.com/talos-systems/go-loadbalancer

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 13:46:22 -07:00
Andrew Rynhard
1a4059a553 feat: add grub bootloader
This moves to using grub instead of syslinux.

BREAKING CHANGE: Single node upgrades will fail in this change. This
will also break the A/B fallback setup since this version introduces
an entirely new partition scheme, that any fallback will not know about.
We plan on addressing these issues in a follow up change.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-01 12:06:43 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
70a65cbb01 feat: make partitions on additional disk without size occupy full disk
Fixes #2214

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-21 07:33:07 -07:00
Andrey Smirnov
41d5f7859a chore: update golangci-lint to 1.28.3
Fixes #2272

`gofumpt` is now included into `golangci-lint`, but not the
`gofumports`, so we keep it using it as separate binary, but we keep
versions in sync with `golangci-lint`.

This contains fixes from:

* `gofumpt` (automated, mostly around octal constants)
* `exhaustive` in `switch` statements
* `noctx` (adding context with default timeout to http requests)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 08:05:42 -07:00
Andrey Smirnov
686dcc6743 chore: enable 'testpackage' linter
This linter makes sure tests are excercising only public package API.

I fixed all the tests which touch only public API of the packages. For
other test packages I added proper `//nolint` directive.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 16:42:28 -07:00
Andrey Smirnov
81d1c2bfe7 chore: enable godot linter
Issues were fixed automatically.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 10:39:56 -07:00
Andrey Smirnov
4ad4511b38 chore: enable nolintlint linter
It makes sure our `//nolint:` directives are not redundant.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 07:39:19 -07:00
Andrey Smirnov
197369bdc1 fix: make installer re-read partition table before formatting
This hopefully should fix errors like:

```
2020/06/25 18:23:22 attaching loopback device
2020/06/25 18:23:22 partitioning /dev/loop2 - ESP
2020/06/25 18:23:22 partitioning /dev/loop2 - EPHEMERAL
2020/06/25 18:23:22 formatting partition "/dev/loop2p1" as "fat" with label "ESP"
2020/06/25 18:23:22 detaching loopback device
2020/06/25 18:23:22 failed to format device: exit status 1: mkfs.vfat: can't open '/dev/loop2p1': No such file or directory
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-26 12:22:00 -07:00
Andrew Rynhard
6ea313fa7d fix: detect if partition table is missing
This adds a sentinel error for a missing partition table. This error
is used to detect if a partition table already exists when setting
up user defined disks.

In addition to the fix, this removes a legacy parameter from the
`PartitionTable` method that indicated that the partition table
should be read. It is safer to just read it every time. Also, I
can't think of a case when the block device partition table is nil
and we want to read.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-06-16 18:26:59 -07:00
Andrew Rynhard
9cf76ca5ec fix: allow for using /dev/disk/* symlinks
This change ensures that any symlinks under `/dev/disks` are created
by udevd early enough so that users can specify disks identified
by these symlinks in `machine.disks` entries. This change also moves
the udevd-trigger from a service to a post-startup task for udevd.
This is a one time command that doesn't need to be a service. Some
other minor changes were made around parsing the partition name that
makes use of the symlinks created by udevd. This also ensures that
target mount points in `machine.disks` entries are created, fixing a
preexisting bug.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-06-16 13:01:26 -07:00
Andrew Rynhard
3c793361e9 feat: add support for file scheme
This adds support for reading the config from a local file.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-05-13 11:36:19 -07:00
Andrew Rynhard
7645169152 refactor: remove warning about missing boot partition
There is no need to really warn about this. We handle this sceanrio just
fine now. The warning just adds confusion.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-29 07:50:43 -07:00
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packes in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrew Rynhard
d34e9f0984 fix: pass dev path to mkfs
This fixes a bug caused by a missing device argument to `mkfs.xfs`.
Without a device, `mkfs.xfs` will error out. Additionally, this ensures
that the installer container is started with the `kmsg` writer that
ensures logs are formatted correctly for `/dev/kmsg`. Without this we
lose a lot of the logs output by the container, one of them being any
error from `mkfs.xfs`

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-21 16:30:35 -07:00
Andrew Rynhard
f77c353e65 fix: prevent formatting the ephemeral partition twice
This adds the requirement of the existence of the boot partition in the
if statement responsible for handling v0.4 style upgrades. Without this,
when upgrading from v0.3 to v0.4, we will attempt to format the
ephemeral partition twice. This will cause upgrades to fail.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-21 11:10:26 -07:00
Andrew Rynhard
4ccd4d5364 fix: set ephemeral partition to max size
This sets the size of the ephemeral partition to the maximum
allowed size at installation time. We have reports of `xfs_growfs` causing
extremely slow boot times when the disk is 1TB or more. In our research
we found evidence that `xfs_growfs` is an expensive operation when
growing to a size of 10 times or more of the base. Instead, users should
create the disk close to the max disk size at install time. The
difference being that `mkfs.xfs` will handle larger disks better.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-17 07:08:04 -07:00
Spencer Smith
8d2f8d6127 chore: remove random.trust_cpu references
This PR removes the references to adding in the random CPU trust to the
kernel for all v0.4 docs, as well as in the iso command in the
installer. This is no longer needed with the newer linux kernel.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-04-14 17:10:56 -07:00
Andrey Smirnov
5255883034 fix: make sure Close() is called on every path
For some places `.Close()` was clearly missing, for some of them I wanted
to be 200% sure it gets called on every code path.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-04-03 19:16:01 -04:00
Andrew Rynhard
47327eca09 fix: move empty label check
We should always set the fallback tag on an upgrade, and only revert if
the tag value is not an empty string.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-30 13:42:08 -07:00
Andrew Rynhard
6fe5fed6f9 fix: make upgrades work with UEFI
Since the `--once` option of `extlinux` seems to only work with BIOS, we
needed to change to remove any reliance on this option. Instead of
booting the upgraded version once, and then making it the default after
a successful boot, we now make it the default, and then revert on any
boot error.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-26 13:34:00 -07:00
Andrew Rynhard
69fa63a7b2 refactor: perform upgrade upon reboot
This PR introduces a new strategy for upgrades. Instead of attempting to
zap the partition table, create a new one, and then format the
partitions, this change will only update the `vmlinuz`, and
`initramfs.xz` being used to boot. It introduces an A/B style upgrade
process, which will allow for easy rollbacks. One deviation from our
original intention with upgrades is that this change does not completely
reset a node. It falls just short of that and does not reset the
partition table. This forces us to keep the current partition scheme in
mind as we make changes in the future, because an upgrade assumes a
specific partition scheme. We can improve upgrades further in the
future, but this will at least make them more dependable. Finally, one
more feature in this PR is the ability to keep state. This enables
single node clusters to upgrade since we keep the etcd data around.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-20 17:32:18 -07:00
Spencer Smith
12bfd8dd94 feat: allow for persistence of config data
This PR will allow users to set the `persist: true` value in their
config data to tell talos not to re-pull the config data at each reboot.
The default will still remain as a "pull every time" methodolgy in order
to encourage immutability by default.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-06 11:42:00 -05:00
Andrey Smirnov
bbe2c53d29 feat: generate kubeconfig on the fly on request
This extracts admin kubeconfig generation out of bootkube, now based on
Talos x509 library. On each API request for `kubeconfig`, config is
generated on the fly and sent back on the wire.

This fixes two issues:

* any master node can now generate `kubeconfig` (worker nodes can do
that too, but that should probably change in the future)
* after upgrade-and-wipe the disk scenario, `osctl kubeconfig` still
works

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-28 21:00:52 +03:00
Andrew Rynhard
64b5b32732 refactor: use go-procfs
This makes use of the external procfs pacakge that is based on the
pacakge we are removing here.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 15:58:57 -08:00
Andrew Rynhard
815aa99cc4 fix: set the correct kernel args for VMware
This enusres that we default to using `guestinfo` for VMware's config
source, and that we use tty0 instead of ttyS0 for the console.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-01 08:21:09 -08:00