If a Talos node is booted without `devkmsg_printk` set to `on` (which
disables ratelimiting), logs are severely ratelimited and close to
impossible to read.
If all the regular kernel args are missing (including KSPP ones), Talos
reboots but the actual error message is not printed.
This fix at least disables ratelimiting on kmsg writes so that all
the logs are visible anyway.
Fixes #2908
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This adds `tty0` for all the boards in case HDMI output actually works.
For RPi4, disable BT to enable PL011 instead of mini-UART for serial
console, as PL011 is much more stable (fixes garbage on serial output).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
There were two problems:
* `configureInterfaces` was always failing if the interface was already
set up, as the routes already existed
* `renew` was halving the renew interval each time `configureInterface`
failed; the interval starts at (LeaseTime/2) and effectively goes to zero
This was leading to high networkd CPU usage and a storm of DHCP requests
on the network.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This allows older installer images to be used with a new Talos
environment (older installers don't support the `--board` argument).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This allows boards to provide kernel args at install time. We need this so that
we can set the console.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This PR adds the ability for us to deploy Talos in OpenStack. Tested in
a local DevStack with a supplied userdata file. It also adds support to
the Makefile for building the OpenStack image so it'll be published with
the next release.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This allows config to be written to disk without being applied
immediately.
Small refactoring to extract common code paths.
At first, I tried to implement this via the sequencer, but it looks like
it's too hard to get right, as the sequencer lacks context and the config
to be written is not applied to the runtime.
Fixes #2828
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Under normal boot, Talos runs `timed` before starting almost anything
else, but in the installer phase `timed` doesn't run. If the RTC is
missing or totally off, this might lead to an image pull failure for the
installer.
This adds a special task to run `timed` without blocking on time sync, as
time sync might not be available in some environments.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is the initial commit of the installer.
What's done:
- verifies node availability before starting any operations.
- gathers information about disks on the machine.
- allows setting: install disk, hostname, machine type, installer image,
Kubernetes version, DNS domain, cluster name.
- dumps/merges talosconfig to a file after applying configuration.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
While IPv6 was mostly supported already, there was a single segment in
the interface setup which forced everything into an IPv4 route.
This limitation has been removed.
In so doing, route metrics have been cleaned up a small amount.
This change allows the specification of the route metric from the
config.
Fixes #2772
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This PR fixes a bug where kubelets weren't getting registered with the
actual hostname of the server on Packet.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Plus fix the logging on docker/Talos to avoid logs in docker mode going
to the host kernel message buffer.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
On first boot of Talos, if userdata is missing, Talos is going to drop
into maintenance mode, which allows config to be uploaded to the server
via the `talosctl apply-config` command.
See also: https://github.com/talos-systems/go-retry/pull/4
Fixes #2780
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This provides compatibility with the VMware CAPI provider, which stores
the bootstrap secret in `guestinfo.userdata`.
Fixes #2795
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR fixes a bug where the NICs on Packet were only receiving the
public management IP, instead of all addresses supplied in the network
metadata.
It also moves to using the bond instead of eth0 directly. This is more
in line with other Packet OS images. We will need to explore why eth1
seems to always show as not having a carrier, as well as tolerate that
condition when setting up bonding, in a follow-up PR.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/2766
This API is implemented in the Maintenance and Machine services.
It can be used to generate configuration on the node, instead of using
talosctl to generate it locally.
It is intended for use in the interactive installer and
`talosctl gen config`.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
A server in maintenance mode now prints its certificate fingerprint and
provides a sample talosctl command to upload config to the node.
`talosctl` can optionally enforce the server certificate fingerprint.
See also https://github.com/talos-systems/crypto/pull/4
Fixes #2753
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The maintenance service now implements the `MachineService` interface,
stubbing all unimplemented methods.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Fixes #2761
Service `osd` was merged into machined on July 13th, before the 0.6
release. It's time to drop the backwards compatibility with clients
older than 0.6.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes the reverse Go dependency from `pkg/machinery` to `talos`
package.
Add a check to `Dockerfile` to prevent `pkg/machinery/go.mod` getting
out of sync, this should prevent problems in the future.
Fix a potential security issue in the `token` authorizer: requests
without gRPC metadata are now denied.
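A minimal sketch of the deny-by-default behaviour, assuming a plain `map[string][]string` as a stand-in for gRPC metadata so the example stays within the standard library. The function name `authorize`, the `token` key, and the error value are hypothetical and do not reflect the real authorizer's code.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

var errUnauthenticated = errors.New("unauthenticated")

// authorize rejects a request unless it carries exactly one matching token.
// md stands in for gRPC metadata; a nil map models a request that arrived
// with no metadata at all, which must be denied rather than let through.
func authorize(md map[string][]string, expected string) error {
	if md == nil {
		return errUnauthenticated
	}

	values := md["token"]
	if len(values) != 1 {
		return errUnauthenticated
	}

	// Constant-time comparison avoids leaking the token through timing.
	if subtle.ConstantTimeCompare([]byte(values[0]), []byte(expected)) != 1 {
		return errUnauthenticated
	}

	return nil
}

func main() {
	fmt.Println(authorize(nil, "s3cret"))
	fmt.Println(authorize(map[string][]string{"token": {"s3cret"}}, "s3cret"))
}
```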
In provisioner, add support for launching nodes without the config
(config is not delivered to the provisioned nodes).
Breaking change in `pkg/provision`: now `NodeRequest.Type` should be set
to the node type (as config can be missing now).
In `talosctl cluster create` add a flag to skip providing config to the
nodes so that they enter maintenance mode, while the generated configs
are written down to disk (so they can be tweaked and applied easily).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes were applied automatically.
Import ordering might be questionable, but it's strict:
* stdlib
* other packages
* same package imports
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Instead of hosting a web service, we decided to implement a gRPC service
that exposes APIs that can be used in a client-side interactive installer.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
If no configuration source is provided (on baremetal only for now), Talos will set up a simple web service to receive its configuration via an HTTP POST request, on port 80 or 443.
It also serves a simple web form for interactive submission from the browser.
Fixes #2593
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This is 🤦, I somehow missed that installer Manifest is used
here and missed the fact I need to update it.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Talos always stops and removes CRI pods before stopping CRI containerd
when upgrading with wipe (force), but on "preserve" code paths pods were
never stopped (we can't remove them to keep preserve guarantees). This
PR makes sure pods are stopped on upgrade in any case.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The problem was that etcd stop was only happening in `LeaveEtcd`, thus
upgrade with preserve was never stopping etcd, leaving the ephemeral
partition still busy.
Refactored the code that stops a service and shuts down all the
services to provide the interface we need:
* stop a service without considering reverse dependencies (force);
* stop a service (services) waiting for reverse dependencies;
* shutdown all the services waiting for reverse dependencies.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
When an etcd node is upgraded, we now perform additional quorum checks.
This is necessary because when etcd nodes are upgraded, they are removed
from membership. If, for instance, two etcd nodes were to upgrade
simultaneously, quorum may be lost. This, of course, does not apply to
single-node etcd clusters.
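The arithmetic behind such a check can be sketched in Go. This is a simplified model under stated assumptions: the node being upgraded is treated as unavailable while membership size stays the same, and the names `quorum` and `safeToTakeDown` are hypothetical rather than taken from the Talos code.

```go
package main

import "fmt"

// quorum returns the number of members that must be available for a majority.
func quorum(members int) int {
	return members/2 + 1
}

// safeToTakeDown reports whether one more healthy member can become
// unavailable without losing quorum. Single-node clusters are exempt,
// since the check does not apply to them.
func safeToTakeDown(total, healthy int) bool {
	if total <= 1 {
		return true
	}

	return healthy-1 >= quorum(total)
}

func main() {
	fmt.Println(safeToTakeDown(3, 3)) // true: two of three remain
	fmt.Println(safeToTakeDown(3, 2)) // false: a second simultaneous upgrade would lose quorum
	fmt.Println(safeToTakeDown(1, 1)) // true: single-node cluster
}
```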
Fixes #1422
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This fixes A/B upgrades and rollback API.
The installer manifest now supports an option to preserve partition
contents while the disk is being re-partitioned and partitions are
re-formatted.
Mount `/boot` partition as needed (to find current label before starting
the installation and in the rollback API).
Fix upgrade API for non-master nodes.
Contents of `/boot`, `/system/state` and META partitions are preserved
in memory while the disk is re-partitioned.
Remove `--save` flag from the installer as it's not being used.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR makes use of a new merge into the upstream rtnetlink library
that introduces functional args for adding routes.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR changes the bool for disabling NTP to `disabled` instead of the
previous `enabled`. We need to do this because customers were seeing
failures in cases where they were defining time servers only, which
results in `enabled: false` when configs get unmarshalled. Users wishing
to disable NTP altogether should now use `disabled: true`.
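The zero-value problem can be reproduced with a small Go sketch. It uses `encoding/json` from the standard library instead of YAML to stay self-contained, and the struct and function names are hypothetical stand-ins for the real config types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldTimeConfig models the previous shape: NTP runs only if Enabled is true.
type oldTimeConfig struct {
	Enabled bool     `json:"enabled"`
	Servers []string `json:"servers"`
}

// newTimeConfig models the new shape: NTP runs unless Disabled is true.
type newTimeConfig struct {
	Disabled bool     `json:"disabled"`
	Servers  []string `json:"servers"`
}

// ntpRunsOld reports whether NTP would run under the old field semantics.
func ntpRunsOld(doc []byte) (bool, error) {
	var c oldTimeConfig
	if err := json.Unmarshal(doc, &c); err != nil {
		return false, err
	}

	return c.Enabled, nil
}

// ntpRunsNew reports whether NTP would run under the new field semantics.
func ntpRunsNew(doc []byte) (bool, error) {
	var c newTimeConfig
	if err := json.Unmarshal(doc, &c); err != nil {
		return false, err
	}

	return !c.Disabled, nil
}

func main() {
	// The user only lists servers; the boolean is omitted and takes its zero value.
	doc := []byte(`{"servers": ["time.example.org"]}`)

	oldRuns, _ := ntpRunsOld(doc)
	newRuns, _ := ntpRunsNew(doc)

	fmt.Println(oldRuns, newRuns) // false true
}
```

With a negative-sense field, the zero value of the bool matches the desired default (NTP on), so omitting the field is harmless.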
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This covers most of the packages except for those we have to keep on
hold (etcd and grpc because of etcd).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This unifies more code paths under the control of `install.Manifest` vs.
being split across the installer and manifest code.
There should be no functional changes now.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR updates the behavior of our machine configs with respect to
DHCP-enabled interfaces. Now, if MTU is specified by the user, that
value will take precedence over any setting provided by the DHCP server.
Additionally, any routes specified will be appended to routes specified
by the DHCP server.
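The precedence rules above can be sketched in Go as follows. The `route` type and both helper names are hypothetical, and treating an MTU of 0 as "not specified" is an assumption of this sketch rather than something stated in the change.

```go
package main

import "fmt"

// route is a simplified stand-in for a routing table entry.
type route struct {
	Network string
	Gateway string
}

// effectiveMTU prefers the user-supplied MTU over the DHCP-provided one.
// A userMTU of 0 is assumed to mean "not specified".
func effectiveMTU(userMTU, dhcpMTU uint32) uint32 {
	if userMTU != 0 {
		return userMTU
	}

	return dhcpMTU
}

// effectiveRoutes appends user routes to the routes from the DHCP server,
// returning a new slice so neither input is modified.
func effectiveRoutes(dhcpRoutes, userRoutes []route) []route {
	out := make([]route, 0, len(dhcpRoutes)+len(userRoutes))
	out = append(out, dhcpRoutes...)

	return append(out, userRoutes...)
}

func main() {
	fmt.Println(effectiveMTU(9000, 1500)) // 9000: user value wins
	fmt.Println(effectiveMTU(0, 1500))    // 1500: falls back to DHCP

	routes := effectiveRoutes(
		[]route{{"0.0.0.0/0", "10.0.0.1"}},
		[]route{{"192.168.50.0/24", "10.0.0.254"}},
	)
	fmt.Println(len(routes)) // 2
}
```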
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR adds a "DHCPOptions" field to the config. This field currently
contains a single subfield, "RouteMetric". Setting this will ensure that
any routes provided by the DHCP server are given this metric upon
injection into the routing table.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Return a proper message to the client if the called method is not
supported in the mode the node is running in.
Fixes: https://github.com/talos-systems/talos/issues/2629
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>