This renames the existing `DHCP` implementation to `DHCP4`; the new client is
`DHCP6`.
For now, `DHCP6` is disabled by default and must be explicitly enabled
in the config.
The QEMU testbed for IPv6 will be pushed as a separate PR.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR fixes a problem we had with AWS clusters. The kubelet can now
register using the full FQDN instead of just the hostname.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
There are several ways a Talos node might be restarted or shut down:
1. error in sequence (initiated from machined)
2. panic in main goroutine (machined recovers panics)
3. error in sequence (initiated via API, event caught by machined)
4. reboot/shutdown via Talos API
Before this change, paths (1) and (2) were handled in machined without
unmounting any disks or killing any processes, so technically all
processes were still running and potentially writing to the filesystems.
Paths (3) and (4) tried to stop services (but not pods) and unmount
explicitly mounted filesystems, followed by a reboot directly from the
sequencer (bypassing the machined handler).
There was also a bug where user disks were never explicitly unmounted
(although they might have been unmounted if mounted on top of `/var`).
This refactors all reboot/shutdown paths to flow through machined's
main function: on path (4), an event is sent via the event API from the
sequencer back to machined, and machined initiates the proper shutdown
sequence.
With this refactoring, all paths (1)-(4) flow through the same machined
function, `handle(error)`.
Two additional steps were added before flushing buffers:
* kill all non-system processes, which also kills all mount namespaces
* unmount any filesystem backed by `/dev/*`
This ensures all filesystems are unmounted before buffers are flushed.
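A minimal Go sketch of the single-handler idea, under a simplified model: `run`, `errRebootRequested` and the step strings are hypothetical, and `handle` here only borrows the name `handle(error)` from the text above; it returns the shutdown steps in the described order instead of performing them. Each path either sends an error on a channel or panics, and both outcomes end up in the same `handle` call.

```go
package main

import (
	"errors"
	"fmt"
)

// errRebootRequested is a hypothetical sentinel for path (4): the sequencer
// reports an API-initiated reboot back to machined instead of rebooting itself.
var errRebootRequested = errors.New("reboot requested via API")

// handle is a hypothetical stand-in for machined's handle(error). It returns
// the ordered shutdown steps rather than executing them.
func handle(err error) []string {
	steps := []string{"log: " + err.Error()}

	return append(steps,
		"stop services",
		"kill non-system processes", // their mount namespaces go away with them
		"unmount filesystems backed by /dev/*",
		"flush buffers",
		"reboot or shut down",
	)
}

// run models machined's main function: whether the trigger sends an error
// or panics, the outcome reaches the same handle call.
func run(trigger func(errCh chan<- error)) (steps []string) {
	errCh := make(chan error, 1)

	defer func() {
		if r := recover(); r != nil {
			steps = handle(fmt.Errorf("panic: %v", r))
		}
	}()

	trigger(errCh)

	return handle(<-errCh)
}

func main() {
	paths := []struct {
		name    string
		trigger func(chan<- error)
	}{
		{"(1)/(3) sequence error", func(ch chan<- error) { ch <- errors.New("sequence failed") }},
		{"(2) panic", func(chan<- error) { panic("boom") }},
		{"(4) API reboot", func(ch chan<- error) { ch <- errRebootRequested }},
	}

	for _, p := range paths {
		fmt.Println(p.name, "->", run(p.trigger))
	}
}
```

Funnelling everything through one function means the process-kill and unmount steps cannot be skipped by any individual path, which was the root of the bugs described above.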
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Flannel was updated to version 0.13, which has a multi-arch image.
Kubernetes images are multi-arch as well.
Fixes #3049
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Control plane components run as static pods managed by the
kubelet.
The whole subsystem is managed via resources/controllers from os-runtime.
This includes many supporting changes and refactorings to enable the new code paths.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This brings in the `os-runtime` package and exposes resources with the first
iteration of a read-only API.
Two Talos resources (and one controller) are implemented:
* legacy.Service resource tracks the `RUNNING` state of Talos services
* config.V1Alpha1 stores the current runtime config
The glue point between the existing runtime and the new os-runtime-based
runtime is in the `v1alpha2` implementation and the `V1Alpha2()`
sub-interfaces of the existing `Runtime`, `State`, and `Controller` interfaces.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is the first iteration of WireGuard network support.
What was done:
- updated the kernel to enable the WireGuard kernel module.
- changed networkd to support creating the WireGuard device type.
- used wgctrl to configure WireGuard.
- updated `talosctl cluster create` to generate WireGuard network
configuration automatically from just the network CIDR.
- added docs on WireGuard support and how to use it.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Before these changes, errors were always sent as strings, so if the original
error was a gRPC error (which is almost always the case for apid), it was
formatted as a string and its original fields (like the code) were lost in
the formatted string.
With this change, apid sends errors as the official `grpc.Status` protobuf
structure, and the client decodes it into a Go error based on `grpc.Status`.
This change is backwards and forwards compatible.
This should fix more of the cases where integration tests were unable to
ignore gRPC `transport is closing` errors when they were sent as strings
from the apid endpoint.
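The difference between the two encodings can be sketched in Go without external dependencies. This is an illustration, not apid's code: `Status` is a local stand-in mirroring the `code` and `message` fields of the `google.rpc.Status` message (its `details` field is omitted), and JSON stands in for protobuf on the wire. With the real grpc-go library, the server side would typically use `status.Convert(err).Proto()` and the client `status.ErrorProto(...)`; whether apid does exactly that is not stated here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// codeUnavailable is the numeric gRPC UNAVAILABLE code (14), the code that
// typically accompanies "transport is closing".
const codeUnavailable int32 = 14

// Status is a local stand-in for google.rpc.Status (details omitted).
type Status struct {
	Code    int32  `json:"code"`
	Message string `json:"message"`
}

// Error makes *Status usable as a Go error.
func (s *Status) Error() string {
	return fmt.Sprintf("rpc error: code = %d desc = %s", s.Code, s.Message)
}

// encodeAsString models the old behavior: the code survives only as text.
func encodeAsString(err error) string { return err.Error() }

// encodeAsStatus models the new behavior: the structured status is sent.
func encodeAsStatus(st *Status) ([]byte, error) { return json.Marshal(st) }

// decodeStatus rebuilds the structured error on the client side.
func decodeStatus(data []byte) (*Status, error) {
	var st Status
	if err := json.Unmarshal(data, &st); err != nil {
		return nil, err
	}

	return &st, nil
}

func main() {
	orig := &Status{Code: codeUnavailable, Message: "transport is closing"}

	// Old path: the client only has a string and must pattern-match on it.
	fmt.Println("as string:", encodeAsString(orig))

	// New path: the code survives the round trip and can be compared directly.
	data, err := encodeAsStatus(orig)
	if err != nil {
		panic(err)
	}

	got, err := decodeStatus(data)
	if err != nil {
		panic(err)
	}

	fmt.Println("decoded code:", got.Code, "unavailable:", got.Code == codeUnavailable)
}
```

A test that wants to ignore `transport is closing` can then check the code instead of matching substrings of a formatted message.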
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/2973
A disk image can now be supplied using the `--disk-image-path` flag.
It may be necessary to enable `--with-apply-config` to bootstrap
nodes properly.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Enabling debug on slow serial consoles degrades Talos bootstrap
performance so much that Talos nodes never get configured properly.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
The idea is to add an option to perform a "selective" reset: the default reset
operation wipes all partitions (triggering a reinstall), while the spec
allows wiping only some of the partitions.
All other operations are performed exactly the same way in either reset
flow.
Possible use case: reset only the `EPHEMERAL` partition.
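The selection rule can be sketched as a small Go function. `partitionsToWipe` is a hypothetical helper, and the partition list is an assumed layout for illustration (only `EPHEMERAL` and `META` are named in these notes). An empty spec keeps the default "wipe everything" behavior; otherwise only the listed partitions are wiped.

```go
package main

import "fmt"

// allPartitions is an assumed, illustrative partition layout.
var allPartitions = []string{"EFI", "BIOS", "BOOT", "META", "STATE", "EPHEMERAL"}

// partitionsToWipe returns the partitions a reset would wipe. An empty spec
// means the default reset: wipe all partitions (triggering a reinstall).
// Unknown labels are rejected rather than silently ignored.
func partitionsToWipe(spec []string) ([]string, error) {
	if len(spec) == 0 {
		return append([]string(nil), allPartitions...), nil
	}

	known := make(map[string]bool, len(allPartitions))
	for _, p := range allPartitions {
		known[p] = true
	}

	var out []string

	for _, p := range spec {
		if !known[p] {
			return nil, fmt.Errorf("unknown partition %q", p)
		}

		out = append(out, p)
	}

	return out, nil
}

func main() {
	full, _ := partitionsToWipe(nil)
	fmt.Println("default reset wipes:", full)

	selective, _ := partitionsToWipe([]string{"EPHEMERAL"})
	fmt.Println("selective reset wipes:", selective)
}
```

Since every other reset step is shared, only this list needs to differ between the two flows.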
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
If the disk is empty and an ISO path is given, the QEMU provisioner mounts
the ISO on the first boot.
To drop into maintenance mode:
```
talosctl cluster create --provisioner=qemu --iso-path=./_out/talos-amd64.iso --skip-injecting-config --wait=false
```
Then inject the config, bootstrap the node, and wait for it to come up (via
`talosctl cluster health`).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Allows merging two Talos configs into one. The config is merged into
whichever file is set by `TALOSCONFIG`, or into `~/.talos/config` otherwise.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
A follow-up to `talosctl config merge`.
I got the idea after using talosctl for a weekend.
I think a command that lists existing contexts in a table view, similar to
what `kubectl config get-contexts` does, is a good addition: it avoids
digging through the config file, which holds all the certs and such.
I named it just `contexts` to align with the existing commands (to switch
contexts you use `talosctl config context`).
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
The regular upgrade path takes just one reboot, but it requires all
processes on the node to be stopped before the upgrade can proceed. Under
some circumstances, and with potential Talos bugs, this might not work,
rendering Talos upgrades almost impossible.
Staged upgrades build upon the regular install flow to run the upgrade on
node reboot. Such upgrades require two reboots of the node and two pulls
of the installer image, but they should be much less susceptible to
failure. Once the upgrade is staged, the node can be rebooted in any way,
including a hard reset, and the upgrade is performed on the next boot.
A new ADV format was implemented as well to store the install image
ref/options across reboots. The new format allows bigger values and
takes 50% of the `META` partition. The old ADV is still kept for
compatibility reasons.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR adds a guide on how to deploy on OpenStack with our new image.
It also fixes a small typo I noticed in the GCP docs.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
If the node time is out of sync, the node can generate an incorrect
configuration, and maintenance mode does not allow starting NTP
because containerd is not available.
By providing the current UTC time of the machine where the talosctl client
is running, it is possible to force GenerateConfiguration to use the correct time.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Publishing SBC images as compressed raw images lets tools like Etcher flash SD cards
directly from the release asset URLs. It is also common in this community to publish
compressed images instead of tarballs.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This builds SBC images and outputs them to the artifacts directory. These images
will be published with releases.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>