654 Commits

Author SHA1 Message Date
Andrey Smirnov
79bed5e610 fix: remove kmsg ratelimiting on startup
If a Talos node is booted without `devkmsg_printk` set to `on` (which
disables ratelimiting), logs are severely ratelimited and close to
impossible to read.

If all the regular kernel args are missing (including the KSPP ones), Talos
reboots but the actual error message is not printed.

This fix at least disables ratelimiting on kmsg writes so that all
the logs are visible anyway.

Fixes #2908

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-08 06:34:56 -08:00
Andrey Smirnov
2d5faf3b86 fix: stabilize serial console on RPi4, add video console
This adds `tty0` for all the boards in case HDMI output actually works.

For RPi4, disable BT to enable PL011 instead of mini-UART for serial
console, as PL011 is much more stable (fixes garbage on serial output).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-04 11:39:35 -08:00
Andrew Rynhard
0c254e79d6 feat: add support for the Pine64 Rock64
This adds support for the Rock64.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-12-04 09:44:20 -08:00
Andrey Smirnov
360d887967 fix: prevent endless loop with DHCP requests in networkd
There were two problems:

* `configureInterfaces` was always failing if interface is already set
up, as the routes already exist

* `renew` was halving the renew interval each time `configureInterface`
fails, which starts at (LeaseTime/2) and goes effectively to zero

This was leading to high networkd CPU usage and a storm of DHCP requests on
the network.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-01 08:12:12 -08:00
Andrey Smirnov
71063c4a6e fix: skip board argument to the installer if it's not set
This allows older installer images to be used with a new Talos environment
(as older installers don't support the `--board` argument).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-01 07:44:57 -08:00
Andrew Rynhard
5fe41ba32b feat: allow boards to set kernel args
This allows boards to provide kernel args at install time. We need this so that
we can set the console.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-12-01 07:08:20 -08:00
Andrew Rynhard
10db642b2f feat: add support for the Banana Pi M64
This adds the Banana Pi M64 to the list of supported boards.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-30 18:17:37 -08:00
Andrew Rynhard
88f15b1254 fix: use the dtb from kernel pkg for libretech_all_h3_cc_h5
This adds sun50i-h5-libretech-all-h3-cc.dtb to the EFI partition.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-30 17:54:07 -08:00
Andrew Rynhard
99aa3cdba5 feat: add support for the Raspberry Pi 4 Model B
This adds support for the Raspberry Pi 4 Model B.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-30 09:29:48 -08:00
Artem Chernyshev
8aad711f18 feat: implement network interfaces list API
To be used in the interactive installer to configure networking.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-27 10:48:45 -08:00
Andrey Smirnov
1eac88e470 feat: add support for installing to SBCs
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-26 07:18:25 -08:00
Spencer Smith
79057f93c5 feat: support openstack platform
This PR adds the ability for us to deploy Talos in openstack. Tested in
local devstack with a supplied userdata file. It also adds support to
the Makefile for building the openstack image so it'll be published with
next release.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-11-25 07:12:57 -08:00
Andrey Smirnov
9a32e34cb1 feat: implement apply configuration without reboot
This allows config to be written to disk without being applied
immediately.

Small refactoring to extract common code paths.

At first, I tried to implement this via the sequencer, but it looks like
it's too hard to get right, as the sequencer lacks context and the config to
be written is not applied to the runtime.

Fixes #2828

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-23 12:42:44 -08:00
Andrey Smirnov
d9d7c27d0c feat: sync time before installer runs
Under normal boot, Talos runs `timed` before starting almost anything
else, but in the installer phase `timed` doesn't run. If the RTC is
missing or totally off, it might lead to image pull failure for the
installer.

This adds a special task to run `timed` without blocking on time sync, as it
might not be available in some environments.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-20 04:21:58 -08:00
Andrey Smirnov
7767a41d4a feat: set interface MTU in DHCP mode even if DHCP is not successful
Fixes #2789

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-19 10:59:21 -08:00
Artem Chernyshev
816e8af261 feat: print hint about using interactive installer in maintenance mode
Similar to what we have for config upload.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-19 07:15:42 -08:00
Artem Chernyshev
b6874ee82a feat: add TUI based talos interactive installer
This is the initial commit of the installer.
What's done:
- verifies node availability before starting any operations.
- gathers information about disks on the machine.
- allows setting: install disk, hostname, machine type, installer image,
  kubernetes version, dns domain, cluster-name.
- dumps/merges talosconfig to a file after applying configuration.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-18 12:34:15 -08:00
Seán C McCord
5d4d179cd8 feat: support ipv6 routes
While IPv6 was mostly supported already, there was a single segment in
the interface setup which forced everything into an IPv4 route.
This limitation has been removed.

In so doing, route metrics have been cleaned up a small amount.
This change allows the specification of the route metric from the
config.
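
As a rough illustration of handling both address families, the Go sketch below derives the family from the route destination instead of assuming IPv4. `routeFamily` is a hypothetical helper, not a function from the Talos code base.

```go
package main

import (
	"fmt"
	"net/netip"
)

// routeFamily parses a route destination in CIDR notation and reports which
// address family it belongs to.
func routeFamily(dest string) (string, error) {
	prefix, err := netip.ParsePrefix(dest)
	if err != nil {
		return "", err
	}

	if prefix.Addr().Is4() {
		return "ipv4", nil
	}

	return "ipv6", nil
}

func main() {
	for _, dest := range []string{"0.0.0.0/0", "::/0", "2001:db8::/32"} {
		family, err := routeFamily(dest)
		if err != nil {
			fmt.Println("invalid destination:", err)

			continue
		}

		fmt.Printf("%s -> %s\n", dest, family)
	}
}
```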

Fixes #2772

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-11-17 13:11:26 -08:00
Spencer Smith
dfa3ad485e fix: return hostname from packet platform
This PR fixes a bug where kubelets weren't getting registered with the
actual hostname of the server in packet.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-11-17 11:28:12 -05:00
Andrey Smirnov
fc5f53bf51 fix: make fingerprint clearly optional in a boot hint
Plus fix the logging on docker/Talos to avoid logs in docker mode going
to the host kernel message buffer.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-16 11:46:15 -08:00
Andrey Smirnov
83bb1afcb6 feat: drop to maintenance mode in cloud platforms if userdata is missing
On first boot of Talos, if userdata is missing, Talos is going to drop
into maintenance mode, which allows config to be uploaded to the server via
the `talosctl apply-config` command.

See also: https://github.com/talos-systems/go-retry/pull/4

Fixes #2780

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-16 11:03:26 -08:00
Andrey Smirnov
39e644c924 feat: read config from extra guestinfo key (vmware)
This provides compatibility with VMWare CAPI provider which stores the
bootstrap secret in `guestinfo.userdata`.

Fixes #2795

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-16 07:59:21 -08:00
Spencer Smith
ac88dbe516 fix: ensure packet nics get all IPs
This PR fixes a bug where the nics in packet were only receiving the
public mgmt IP, instead of all addresses supplied in the network
metadata.

It also moves to using the bond instead of eth0 directly. This is more
in line with other packet OS images. We will need to explore why eth1
seems to always show as not having a carrier, as well as tolerate that
condition when setting up bonding in a follow-up PR.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-11-13 08:56:29 -08:00
Artem Chernyshev
0f924b5122 feat: add generate config gRPC API
Fixes: https://github.com/talos-systems/talos/issues/2766

This API is implemented in Maintenance and Machine services.
Can be used to generate configuration on the node, instead of using
talosctl to generate it locally.

To be used in interactive installer and talosctl gen config.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-13 08:07:32 -08:00
Andrey Smirnov
58df555580 feat: add example command in maintenance, enforce cert fingerprint
Server in maintenance mode now prints the certificate fingerprint and
provides a sample talosctl command to upload config to the node.

`talosctl` can optionally enforce server certificate fingerprint.
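
A minimal sketch of optional fingerprint enforcement follows. It assumes a hex-encoded SHA-256 digest over the certificate bytes. The commit does not state which digest, input, or encoding Talos uses, and `fingerprint` and `verify` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint returns the hex-encoded SHA-256 digest of the given bytes
// (assumed here to be a DER-encoded certificate).
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)

	return hex.EncodeToString(sum[:])
}

// verify enforces the fingerprint only when one was supplied, which keeps the
// check optional.
func verify(der []byte, expected string) error {
	if expected == "" {
		return nil
	}

	if got := fingerprint(der); got != expected {
		return fmt.Errorf("certificate fingerprint mismatch: got %s, want %s", got, expected)
	}

	return nil
}

func main() {
	cert := []byte("placeholder certificate bytes")

	fmt.Println(verify(cert, ""))                // no fingerprint given, check skipped
	fmt.Println(verify(cert, fingerprint(cert))) // matching fingerprint
	fmt.Println(verify(cert, "deadbeef") != nil) // mismatch is rejected
}
```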

See also https://github.com/talos-systems/crypto/pull/4

Fixes #2753

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-12 07:36:18 -08:00
Artem Chernyshev
93e30a1738 chore: remove maintenance service interface and use machine service
The maintenance service now implements the `MachineService` interface, stubbing
all unimplemented methods.
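
The stubbing pattern can be sketched in Go with struct embedding. The two-method `MachineService` interface below is a deliberately tiny, hypothetical stand-in for the real (generated) interface, and the type names are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnimplemented = errors.New("method is not implemented in maintenance mode")

// MachineService is a hypothetical, much smaller stand-in for the real API.
type MachineService interface {
	ApplyConfiguration(cfg string) error
	Reboot() error
}

// unimplemented stubs every method of the interface.
type unimplemented struct{}

func (unimplemented) ApplyConfiguration(string) error { return errUnimplemented }
func (unimplemented) Reboot() error                   { return errUnimplemented }

// maintenance embeds the stubs and overrides only what it supports.
type maintenance struct {
	unimplemented

	applied string
}

func (m *maintenance) ApplyConfiguration(cfg string) error {
	m.applied = cfg

	return nil
}

// Compile-time check that maintenance satisfies the interface.
var _ MachineService = (*maintenance)(nil)

func main() {
	m := &maintenance{}

	fmt.Println(m.ApplyConfiguration("machine: {}"))
	fmt.Println(m.Reboot())
}
```

Embedding keeps the service compiling when the interface grows, as long as the stub type is extended alongside it.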

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-11 12:33:44 -08:00
Andrew Rynhard
71321214a1 feat: add storage API
This is the initial implementation of a storage API.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-11 10:12:25 -08:00
Andrey Smirnov
026244097a refactor: drop osd compatibility layer
Fixes #2761

Service `osd` was merged into machined on July 13th, before the 0.6 release.

It's time to drop the backwards compatibility with clients before 0.6.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-11 09:38:19 -08:00
Andrey Smirnov
b2b86a622e fix: remove 'token creds' from maintenance service
This fixes the reverse Go dependency from `pkg/machinery` to `talos`
package.

Add a check to `Dockerfile` to prevent `pkg/machinery/go.mod` getting
out of sync, this should prevent problems in the future.

Fix potential security issue in `token` authorizer to deny requests
without grpc metadata.

In provisioner, add support for launching nodes without the config
(config is not delivered to the provisioned nodes).

Breaking change in `pkg/provision`: now `NodeRequest.Type` should be set
to the node type (as config can be missing now).

In `talosctl cluster create` add a flag to skip providing config to the
nodes so that they enter maintenance mode, while the generated configs
are written down to disk (so they can be tweaked and applied easily).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 14:10:32 -08:00
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports
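
The three groups can be sketched as a small Go classifier. This is not gci's implementation: the "no dot in the first path element means stdlib" rule is a common heuristic, and the module path used in `main` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// importGroup returns 0 for stdlib, 1 for other packages, and 2 for imports
// that belong to the given module, mirroring the ordering listed above.
func importGroup(path, module string) int {
	if path == module || strings.HasPrefix(path, module+"/") {
		return 2
	}

	// Heuristic: stdlib import paths have no dot in their first element.
	first, _, _ := strings.Cut(path, "/")
	if !strings.Contains(first, ".") {
		return 0
	}

	return 1
}

func main() {
	// The module path is assumed for the example.
	const module = "github.com/talos-systems/talos"

	for _, path := range []string{
		"net/http",
		"google.golang.org/grpc",
		"github.com/talos-systems/talos/pkg/machinery",
	} {
		fmt.Println(importGroup(path, module), path)
	}
}
```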

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00
Andrew Rynhard
a38410ead6 fix: remove log.Fatal from maintenance service
Errors should be returned, otherwise we will get a kernel panic.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-03 09:04:01 -08:00
Andrew Rynhard
562f816526 refactor: use gRPC for interactive installation
Instead of hosting a web service, we decided to implement a gRPC service
that exposes APIs that can be used in a client-side interactive installer.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-03 08:36:44 -08:00
Andrew Rynhard
562ab1d572 chore: update golangci-lint
Brings in the latest version of golangci-lint and addresses errors.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-02 20:34:05 -08:00
Andrew Rynhard
1ca61ddce7 feat: add ISO support
This reverts commit 3515f4e0f8.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-02 10:21:40 -08:00
Andrew Rynhard
2f95d03b41 fix: prevent blind mode boot
Adds options to the grub.cfg that prevents booting in blind mode.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-01 11:51:50 -08:00
Seán C McCord
63daa56dd0 feat: add webconfig service
If no configuration source is provided (on baremetal only for now), Talos will set up a simple web service to receive its configuration via an HTTP POST request, on port 80 or 443.
It also serves a simple web form for interactive submission from the browser.

Fixes #2593

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-10-31 13:46:37 -07:00
Andrey Smirnov
aeb4883970 fix: properly initialize manifest in user disks creation
This is 🤦, I somehow missed that installer Manifest is used
here and missed the fact I need to update it.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-29 21:39:42 +03:00
Andrey Smirnov
75817a58ef fix: stop CRI pods on upgrade with preserve
Talos always stops and removes CRI pods before stopping CRI containerd
when upgrading with wipe (force), but on "preserve" code paths pods were
never stopped (we can't remove them to keep preserve guarantees). This
PR makes sure pods are stopped on upgrade in any case.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-26 16:05:48 -07:00
Andrey Smirnov
e7f6344d97 fix: stop etcd on any path on upgrade
The problem was that etcd stop was only happening in `LeaveEtcd`, thus
upgrade with preserve was never stopping etcd leaving ephemeral
partition still busy.

Refactored the code that stops a service and shuts down all the
services to provide the interface we need:

* stop a service without considering reverse dependencies (force);
* stop a service (services) waiting for reverse dependencies;
* shutdown all the services waiting for reverse dependencies.
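
"Waiting for reverse dependencies" amounts to stopping a service's dependents before the service itself. The Go sketch below computes such an order on a hypothetical dependency graph. The service names and the `stopOrder` helper are illustrative and do not come from the Talos code.

```go
package main

import "fmt"

// stopOrder returns the order in which services must stop so that every
// service depending on target (directly or transitively) stops before it.
// deps maps a service to the services it depends on; the graph is assumed
// to be acyclic.
func stopOrder(deps map[string][]string, target string) []string {
	// Build reverse edges: service -> services that depend on it.
	reverse := map[string][]string{}

	for svc, requires := range deps {
		for _, r := range requires {
			reverse[r] = append(reverse[r], svc)
		}
	}

	var order []string

	seen := map[string]bool{}

	var visit func(string)

	visit = func(svc string) {
		if seen[svc] {
			return
		}

		seen[svc] = true

		// Post-order: dependents are appended before the service itself.
		for _, dependent := range reverse[svc] {
			visit(dependent)
		}

		order = append(order, svc)
	}

	visit(target)

	return order
}

func main() {
	// Hypothetical chain: kubelet needs cri, cri needs containerd.
	deps := map[string][]string{
		"kubelet": {"cri"},
		"cri":     {"containerd"},
	}

	fmt.Println(stopOrder(deps, "containerd"))
}
```

A forced stop, by contrast, would skip the traversal and stop only the target.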

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-26 12:18:53 -07:00
Seán C McCord
a7a27e7edd feat: extend etcd health check on upgrade
When an etcd node is upgraded, we now perform additional quorum checks.
This is necessary because when etcd nodes are upgraded, they are removed
from membership.  If, for instance, two etcd nodes were to upgrade
simultaneously, quorum may be lost.  This, of course, does not apply to
single-node etcd clusters.
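
The quorum arithmetic behind this can be sketched as follows. The check is a conservative illustration that counts against the pre-upgrade cluster size. The exact conditions Talos verifies are not given in the commit, and `safeToLeave` is a hypothetical name.

```go
package main

import "fmt"

// quorum is the number of members that must agree in a cluster of n members.
func quorum(n int) int {
	return n/2 + 1
}

// safeToLeave reports whether one healthy member can go away for an upgrade
// while the remaining healthy members still form a quorum. Single-node
// clusters are exempt, as the commit notes.
func safeToLeave(members, healthy int) bool {
	if members <= 1 {
		return true
	}

	return healthy-1 >= quorum(members)
}

func main() {
	fmt.Println(safeToLeave(3, 3)) // one node upgrading: 2 of 3 remain
	fmt.Println(safeToLeave(3, 2)) // a second node at the same time: 1 of 3 remain
	fmt.Println(safeToLeave(1, 1)) // single-node cluster
}
```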

Fixes #1422

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-10-23 15:49:55 -07:00
Andrey Smirnov
ff4d702f77 fix: implement preserving contents of partition on install
This fixes A/B upgrades and rollback API.

The installer manifest now supports an option to preserve partition contents
while the disk is being re-partitioned and partitions are re-formatted.

Mount `/boot` partition as needed (to find current label before starting
the installation and in the rollback API).

Fix upgrade API for non-master nodes.

Contents of `/boot`, `/system/state` and META partitions are preserved
in memory while the disk is re-partitioned.

Remove `--save` flag from the installer as it's not being used.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-22 23:56:39 +03:00
Spencer Smith
8b5406c889 chore: move to newer release of rtnetlink with fn args
This PR makes use of a new merge into the upstream rtnetlink library
that introduces functional args for adding routes.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-10-22 06:56:22 -07:00
Spencer Smith
cfb2c50dd7 fix: update handling of ntp disable
This PR changes the bool for disabling ntp to `disabled` instead of the
previous `enabled`. We need to do this because customers were seeing
failures in cases where they were defining time servers only, which
results in `enabled: false` when configs get unmarshalled. Users wishing
to disable ntp altogether should now use `disabled: true`.
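
The underlying issue is Go's zero value for `bool`: a field that is absent from the input stays `false`. The sketch below shows the effect with hypothetical config structs. It uses `encoding/json` only to stay within the standard library. Talos configs are YAML, but an omitted field is left at its zero value in the same way.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldTimeConfig and newTimeConfig are hypothetical stand-ins for the config
// before and after the change.
type oldTimeConfig struct {
	Enabled bool     `json:"enabled"`
	Servers []string `json:"servers"`
}

type newTimeConfig struct {
	Disabled bool     `json:"disabled"`
	Servers  []string `json:"servers"`
}

// ntpRunsOld reports whether NTP would run under the old `enabled` flag.
func ntpRunsOld(data []byte) (bool, error) {
	var cfg oldTimeConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return false, err
	}

	return cfg.Enabled, nil
}

// ntpRunsNew reports whether NTP would run under the new `disabled` flag.
func ntpRunsNew(data []byte) (bool, error) {
	var cfg newTimeConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return false, err
	}

	return !cfg.Disabled, nil
}

func main() {
	// Only time servers are defined; neither flag is present.
	input := []byte(`{"servers": ["time.example.org"]}`)

	oldRuns, _ := ntpRunsOld(input)
	newRuns, _ := ntpRunsNew(input)

	fmt.Println("old flag, ntp runs:", oldRuns)
	fmt.Println("new flag, ntp runs:", newRuns)
}
```

With a negative flag, the zero value matches the desired default, so a config that only lists servers keeps NTP running.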

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-10-20 08:58:55 -07:00
Andrey Smirnov
16b6d344de chore: bump module dependencies in go.mod
This covers most of the packages except for those we have to keep on
hold (etcd and grpc because of etcd).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 08:09:42 -07:00
Andrey Smirnov
4adb613f66 refactor: bring more control to install.Manifest execution
This unifies more code paths under the control of `install.Manifest` vs.
being split across the installer and manifest code.

There should be no functional changes now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 01:08:14 -07:00
Spencer Smith
4c47fa259c feat: support MTU and route changes for DHCP
This PR updates the behavior of our machine configs with respect to
DHCP-enabled interfaces. Now, if MTU is specified by the user, that
value will take precedence over any setting provided by the DHCP server.

Additionally, any routes specified will be appended to routes specified
by the DHCP server.
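
The merge rules described here can be sketched in a few lines of Go. The `settings` type and `merge` function are hypothetical, and an MTU of zero is assumed to mean "not specified by the user".

```go
package main

import "fmt"

// settings is a hypothetical, simplified view of per-interface configuration.
type settings struct {
	MTU    int
	Routes []string
}

// merge combines DHCP-provided settings with user-provided ones: a
// user-specified MTU wins, and user routes are appended to the DHCP routes.
func merge(dhcp, user settings) settings {
	out := settings{MTU: dhcp.MTU}

	if user.MTU != 0 {
		out.MTU = user.MTU
	}

	// Copy into a fresh slice so neither input is modified.
	out.Routes = append(append([]string{}, dhcp.Routes...), user.Routes...)

	return out
}

func main() {
	dhcp := settings{MTU: 1500, Routes: []string{"0.0.0.0/0"}}
	user := settings{MTU: 9000, Routes: []string{"10.0.0.0/8"}}

	fmt.Printf("%+v\n", merge(dhcp, user))
}
```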

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-10-16 17:22:47 -07:00
Spencer Smith
7bc3fcf77d feat: support metric values for DHCP
This PR adds a "DHCPOptions" field to the config. This field contains a
single subfield currently, "RouteMetric". Setting this will ensure that
any routes provided from the DHCP server are given this metric upon
injection into the routing table.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-10-16 08:29:04 -07:00
Artem Chernyshev
04e267a550 feat: handle unsupported commands being called for docker
Return a proper message to the client if the called method is not
supported by the mode the node is running in.

Fixes: https://github.com/talos-systems/talos/issues/2629

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-10-14 13:44:38 -07:00
Artem Chernyshev
e7e99cf1b3 feat: support disk usage command in talosctl
Usage example:

```bash
talosctl du --nodes 10.5.0.2 /var -H -d 2
NODE       NAME
10.5.0.2   8.4 kB   etc
10.5.0.2   1.3 GB   lib
10.5.0.2   16 MB    log
10.5.0.2   25 kB    run
10.5.0.2   4.1 kB   tmp
10.5.0.2   1.3 GB   .
```

Supported flags:
- `-a` writes counts for all files, not just directories.
- `-d` recursion depth
- `-H` humanizes size outputs.
- `-t` size threshold (skip files if < size or > size).
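
The "< size or > size" wording for `-t` suggests a signed threshold. The Go sketch below assumes the convention used by GNU `du --threshold`, where a positive value skips smaller entries and a negative value skips larger ones. Whether `talosctl du` follows exactly this convention is an assumption, and `skipByThreshold` is a hypothetical helper.

```go
package main

import "fmt"

// skipByThreshold reports whether an entry of the given size should be
// skipped. A positive threshold skips entries smaller than it, a negative
// threshold skips entries larger than its absolute value, and zero disables
// filtering.
func skipByThreshold(size, threshold int64) bool {
	switch {
	case threshold > 0:
		return size < threshold
	case threshold < 0:
		return size > -threshold
	default:
		return false
	}
}

func main() {
	fmt.Println(skipByThreshold(4_100, 1_000_000))       // small entry, positive threshold
	fmt.Println(skipByThreshold(16_000_000, 1_000_000))  // large entry, positive threshold
	fmt.Println(skipByThreshold(16_000_000, -1_000_000)) // large entry, negative threshold
}
```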

Fixes: https://github.com/talos-systems/talos/issues/2504

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-10-13 09:30:31 -07:00