This makes sure the source directory exists before performing the mount
operation.
This also adds the ability to patch the configs in the config bundle
with a JSON patch, exposed in `talosctl cluster create` as `--config-patch`;
this made it easy to test the fix:
```
talosctl cluster create ... --config-patch='[{"op": "add", "path": "/machine/kubelet/extraMounts", "value": [{"destination": "/var/log/containers", "type": "bind", "source": "/var/log/containers", "options": ["rshared", "rbind", "rw"]}]}]'
```
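For illustration, a minimal Go sketch of applying RFC 6902 `add` operations like the one above to a JSON-decoded config. It handles object members only (no array indices or `-`), and `applyAdd` is a hypothetical helper; a full JSON patch library would normally be used instead, and the YAML/JSON conversion of the machine config is left out:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// op is one RFC 6902 operation; only "add" is handled in this sketch.
type op struct {
	Op    string          `json:"op"`
	Path  string          `json:"path"`
	Value json.RawMessage `json:"value"`
}

// unescape decodes a JSON Pointer reference token (RFC 6901).
func unescape(tok string) string {
	return strings.ReplaceAll(strings.ReplaceAll(tok, "~1", "/"), "~0", "~")
}

// applyAdd applies "add" operations to doc, which must be a JSON object.
// The parent of each target must already exist and be an object.
func applyAdd(doc []byte, patch string) ([]byte, error) {
	var root map[string]interface{}
	if err := json.Unmarshal(doc, &root); err != nil {
		return nil, err
	}

	var ops []op
	if err := json.Unmarshal([]byte(patch), &ops); err != nil {
		return nil, err
	}

	for _, o := range ops {
		if o.Op != "add" {
			return nil, fmt.Errorf("unsupported op %q", o.Op)
		}

		if !strings.HasPrefix(o.Path, "/") {
			return nil, fmt.Errorf("invalid path %q", o.Path)
		}
		tokens := strings.Split(o.Path[1:], "/")

		// Walk down to the parent object of the target member.
		parent := root
		for _, tok := range tokens[:len(tokens)-1] {
			next, ok := parent[unescape(tok)].(map[string]interface{})
			if !ok {
				return nil, fmt.Errorf("path %q: parent %q is not an object", o.Path, tok)
			}
			parent = next
		}

		var value interface{}
		if err := json.Unmarshal(o.Value, &value); err != nil {
			return nil, err
		}
		parent[unescape(tokens[len(tokens)-1])] = value
	}

	return json.Marshal(root)
}

func main() {
	doc := []byte(`{"machine": {"kubelet": {}}}`)
	patch := `[{"op": "add", "path": "/machine/kubelet/extraMounts", "value": [{"destination": "/var/log/containers", "type": "bind", "source": "/var/log/containers", "options": ["rshared", "rbind", "rw"]}]}]`

	out, err := applyAdd(doc, patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```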
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This allows applying the config even if the sequencer is locked, so that
configuration mistakes can be recovered from.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
See https://github.com/talos-systems/os-runtime/pull/12 for the new naming
conventions.
No functional changes.
This also implements printing extra columns in `talosctl get xyz`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/3219
We already have `etcd leave`, which makes the node remove itself from
the etcd members.
However, if the node can't remove itself because it has no connection
to etcd, we need this `etcd remove-member` CLI command, which removes the
node from the cluster by running on a different node.
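A sketch of the lookup such a command needs (hypothetical Go): etcd removes members by numeric ID, so the target node's name has to be resolved to an ID from the member list first. The `Member` type and `findMemberID` are illustrative stand-ins for the etcd client types, and the actual removal call is omitted:

```go
package main

import (
	"errors"
	"fmt"
)

// Member is a hypothetical, trimmed-down view of an etcd member.
type Member struct {
	ID   uint64
	Name string
}

// findMemberID returns the ID of the member whose name matches hostname;
// that ID is what the member removal request would carry.
func findMemberID(members []Member, hostname string) (uint64, error) {
	for _, m := range members {
		if m.Name == hostname {
			return m.ID, nil
		}
	}

	return 0, errors.New("member not found: " + hostname)
}

func main() {
	members := []Member{{ID: 0x1a, Name: "cp-1"}, {ID: 0x2b, Name: "cp-2"}}

	id, err := findMemberID(members, "cp-2")
	if err != nil {
		panic(err)
	}
	fmt.Printf("removing member %x\n", id)
}
```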
There are no unit tests for this, as running it would destroy the test cluster.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This fixes the output of `talosctl containers` to show failed/exited
containers, so that it's possible to see, e.g., the `kube-apiserver`
container when it fails to start. This also enables using the ID from the
container list to view the logs of failing containers, which makes it easy
to debug control plane pods that don't start because of a wrong
configuration.
Also removes the option to choose between the CRI and containerd
inspectors: containerd is now always used for the system namespace and CRI
for the Kubernetes namespace.
The only side effect is that the `kubelet` container no longer shows up in
the output of `talosctl containers -k`, but `kubelet` itself is still
available via `talosctl services` and `talosctl logs kubelet`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This adds a VIP (virtual IP) option to the network configuration of an
interface, which will allow a set of nodes to share a floating IP
address among them. For now, this is restricted to control plane use
and only a single shared IP is supported.
Fixes #3111
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/3209
This uses parts of the `kubectl` package to run the editor, and follows
the same approach as the `kubectl edit` command:
- add a commented section with the description at the top of the file.
- if the config has errors, display the validation errors in the commented
section at the top of the file.
- retry applying the config until it succeeds.
- abort if no changes were detected or if the edited file is empty.
Patch currently supports only JSON patch, which can be read either from a
file or from an inline argument.
https://asciinema.org/a/wPawpctjoCFbJZKo2z2ATDXeC
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This allows mounting extra volumes into the Talos-managed control plane
static pods. Combined with additional options such as extra files, any
additional content/configuration can be mounted.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This option drops the kube-proxy manifest from the list of bootstrap
manifests. It can be used with CNIs that don't need `kube-proxy`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
State partition encryption support adds a new section to the machine
config and a new step to the sequencer flow, which saves the encryption
configuration object as a JSON-serialized value in the META partition.
Everything else works the same as for the ephemeral partition.
State partition encryption is also enabled in the disk encryption
integration tests.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This is required to upgrade from Talos 0.8.x to 0.9.x. Even after the
cluster is fully upgraded, the control plane is still self-hosted (as it
was bootstrapped with bootkube).
The `talosctl convert-k8s` tool (and the library behind it) converts the
self-hosted control plane to Talos-managed static pods.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR introduces the first part of disk encryption support.
A new config section, `systemDiskEncryption`, was added to MachineConfig.
For now it covers only ephemeral partition encryption.
Encryption itself supports two kinds of keys for now:
- deterministic key derived from the node ID.
- static key which is hardcoded in the config and mainly used for test
purposes.
`talosctl cluster create` can now be told to encrypt the ephemeral
partition with the `--encrypt-ephemeral` flag.
Additionally:
- updated pkgs library version.
- changed the Dockerfile to copy cryptsetup deps from pkgs.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This uses the API in `os-runtime` to pull the initial list of resources
plus subsequent updates for a given resource type.
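The "initial list + updates" pattern can be sketched with a toy in-memory store in Go (`Store`, `Watch` and `Event` are hypothetical; this is not the `os-runtime` API). The key point is taking the snapshot and registering the watcher under the same lock, so updates are neither lost nor delivered before the initial list:

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical resource event: part of the initial list, or an update.
type Event struct {
	Kind    string
	ID      string
	Initial bool
}

// Store is a toy in-memory resource store keyed by kind, then ID.
type Store struct {
	mu       sync.Mutex
	items    map[string]map[string]bool
	watchers map[string][]chan Event
}

func NewStore() *Store {
	return &Store{items: map[string]map[string]bool{}, watchers: map[string][]chan Event{}}
}

// Watch first yields every existing resource of kind, then each later update.
// Snapshot and registration happen under one lock, so no update is lost or
// delivered before the initial list.
func (s *Store) Watch(kind string) <-chan Event {
	s.mu.Lock()
	defer s.mu.Unlock()

	ch := make(chan Event, len(s.items[kind])+16)
	for id := range s.items[kind] {
		ch <- Event{Kind: kind, ID: id, Initial: true}
	}
	s.watchers[kind] = append(s.watchers[kind], ch)

	return ch
}

// Put stores a resource and notifies watchers of its kind. A watcher whose
// buffer is full blocks Put; acceptable for a sketch, not for production.
func (s *Store) Put(kind, id string) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.items[kind] == nil {
		s.items[kind] = map[string]bool{}
	}
	s.items[kind][id] = true

	for _, ch := range s.watchers[kind] {
		ch <- Event{Kind: kind, ID: id}
	}
}

func main() {
	s := NewStore()
	s.Put("Service", "etcd")

	ch := s.Watch("Service")
	s.Put("Service", "kubelet")

	fmt.Println(<-ch) // initial list entry for etcd
	fmt.Println(<-ch) // update for kubelet
}
```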
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This explains the intention better: the config is applied on reboot. It
also makes it easy to distinguish from `apply-config --immediate`, which
applies the config immediately without a reboot (that is coming in a
different PR).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Also fixes the gRPC recovery handler to print the panic stack trace to the log.
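A minimal Go sketch of the recovery pattern, independent of gRPC: `withRecovery` (a hypothetical name) turns a panic into an error and logs the stack trace via `runtime/debug.Stack`. In a gRPC server, the same logic would typically live inside an interceptor:

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// withRecovery runs handler and converts a panic into an error, logging the
// stack trace so that the panic location isn't lost.
func withRecovery(handler func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("panic in handler: %v\n%s", r, debug.Stack())
			err = fmt.Errorf("internal error: %v", r)
		}
	}()

	return handler()
}

func main() {
	err := withRecovery(func() error {
		var m map[string]int
		m["boom"] = 1 // panics: assignment to entry in nil map
		return nil
	})
	fmt.Println(err)
}
```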
Every API should follow a structure compatible with apid proxying
(injection of errors/nodes).
The GenerateConfig API now explicitly fails on worker nodes, as it used to
panic there (worker node configs lack the required certificates).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This allows generating configs for the current Talos version (by default)
or backwards-compatible configuration (e.g. for Talos 0.8).
`talosctl gen config` defaults to current version, but explicit version
can be passed to the command via flags.
`talosctl cluster create` defaults to the install/container image version,
but that can be overridden. This makes `talosctl cluster create`
compatible with 0.8.1 images out of the box.
Upgrade tests use a contract based on the source version in the test.
When used as a library, `VersionContract` can be omitted (defaults to the
current version) or passed explicitly. `VersionContract` can be
conveniently parsed from a Talos version string or specified as one of the
constants.
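A sketch of parsing a contract from a version string, modelled loosely on the description above. The `VersionContract` fields, `ParseContract`, `Greater` and the `Talos08`/`Talos09` constants here are assumptions, not the actual library API:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// VersionContract is modelled loosely on the description above; the field
// names and parsing rules here are assumptions, not the Talos library API.
type VersionContract struct {
	Major, Minor int
}

// Hypothetical constants for known contracts.
var (
	Talos08 = VersionContract{Major: 0, Minor: 8}
	Talos09 = VersionContract{Major: 0, Minor: 9}
)

var versionRe = regexp.MustCompile(`^v?(\d+)\.(\d+)`)

// ParseContract extracts major.minor from a Talos version string such as
// "v0.8.1" or "v0.9.0-alpha.2"; patch and pre-release parts are ignored.
func ParseContract(version string) (VersionContract, error) {
	m := versionRe.FindStringSubmatch(version)
	if m == nil {
		return VersionContract{}, fmt.Errorf("cannot parse version %q", version)
	}

	major, err := strconv.Atoi(m[1])
	if err != nil {
		return VersionContract{}, err
	}

	minor, err := strconv.Atoi(m[2])
	if err != nil {
		return VersionContract{}, err
	}

	return VersionContract{Major: major, Minor: minor}, nil
}

// Greater reports whether c is strictly newer than other.
func (c VersionContract) Greater(other VersionContract) bool {
	if c.Major != other.Major {
		return c.Major > other.Major
	}
	return c.Minor > other.Minor
}

func main() {
	contract, err := ParseContract("v0.8.1")
	if err != nil {
		panic(err)
	}

	// Gate newer-only config features on the contract.
	if contract.Greater(Talos08) {
		fmt.Println("generate 0.9+ config")
	} else {
		fmt.Println("generate backwards compatible config for", contract)
	}
}
```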
Fixes #3130
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This modifies the provision library to support multiple IPs, CIDRs, and
gateways, each of which can be IPv4 or IPv6. Based on the IP types,
services are enabled to run DHCPv4/DHCPv6 for the cluster in the test environment.
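Choosing services by IP type could look like this (hypothetical Go; `dhcpServices` is an illustrative name): DHCPv4 is enabled when any CIDR is IPv4, DHCPv6 when any is IPv6.

```go
package main

import (
	"fmt"
	"net"
)

// dhcpServices reports which DHCP servers the test environment should run:
// DHCPv4 if any CIDR is IPv4, DHCPv6 if any CIDR is IPv6.
func dhcpServices(cidrs []string) (bool, bool, error) {
	var v4, v6 bool

	for _, cidr := range cidrs {
		ip, _, err := net.ParseCIDR(cidr)
		if err != nil {
			return false, false, err
		}

		if ip.To4() != nil {
			v4 = true
		} else {
			v6 = true
		}
	}

	return v4, v6, nil
}

func main() {
	v4, v6, err := dhcpServices([]string{"10.5.0.0/24", "fd00:5::/64"})
	if err != nil {
		panic(err)
	}
	fmt.Println("DHCPv4:", v4, "DHCPv6:", v6)
}
```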
There's an outstanding bug with routes not being set up properly in the
cluster, so the IPs are not routable yet, but DHCPv6 works and IPs are
allocated (which validates the DHCPv6 client).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Our upgrades are safe by default: we check etcd health, take locks,
etc. But sometimes an upgrade might be a way to recover a broken (or
semi-broken) cluster; in that case we need the upgrade to run even if the
checks are not passing. This is not a safe way to do upgrades, but it
might be a way to recover a cluster.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This renames the existing `DHCP` implementation to `DHCP4`; the new
client is `DHCP6`.
For now, `DHCP6` is disabled by default and must be explicitly enabled
in the config.
The QEMU testbed for IPv6 will be pushed as a separate PR.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR fixes a problem we had with AWS clusters. We now allow the
kubelet to register using the full FQDN instead of just the hostname.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
There are several ways a Talos node might be restarted or shut down:
1. error in sequence (initiated from machined)
2. panic in main goroutine (machined recovers panics)
3. error in sequence (initiated via API, event caught by machined)
4. reboot/shutdown via Talos API
Before this change, paths (1) and (2) were handled in machined without
unmounting any disks or killing any processes, so technically all the
processes kept running and potentially writing to the filesystems.
Paths (3) and (4) tried to stop services (but not pods) and unmount
explicitly mounted filesystems, followed by a reboot directly from the
sequencer (bypassing the machined handler).
There was a bug where user disks were never explicitly unmounted (though
they might have been unmounted if mounted on top of `/var`).
This refactors all the reboot/shutdown paths to flow through machined's
main function: on path (4), an event is sent via the event API from the
sequencer back to machined, and machined initiates the proper shutdown
sequence.
The refactoring in machined makes all paths (1)-(4) flow through the
same function, `handle(error)`.
Two additional steps were added before flushing buffers:
* kill all non-system processes, which also tears down all their mount namespaces
* unmount any filesystem backed by `/dev/*`
This ensures all filesystems are unmounted before buffers are flushed.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Flannel was updated to version 0.13, which has a multi-arch image.
Kubernetes images are multi-arch.
Fixes #3049
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Control plane components run as static pods managed by the
kubelets.
The whole subsystem is managed via resources/controllers from os-runtime.
Many supporting changes/refactorings enable the new code paths.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This brings in the `os-runtime` package and exposes resources via the
first iteration of a read-only API.
Two Talos resources (and one controller) are implemented:
* legacy.Service resource tracks Talos 'service' `RUNNING` state
* config.V1Alpha1 stores current runtime config
The glue between the existing runtime and the new os-runtime-based runtime
is the `v1alpha2` implementation and the `V1Alpha2()` sub-interfaces of
the existing `Runtime`, `State`, and `Controller` interfaces.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is the first iteration of Wireguard network support.
What was done:
- kernel was updated to enable Wireguard kernel module.
- changed networkd to support creating the Wireguard device type.
- used wgctrl to configure Wireguard.
- updated `talosctl cluster create` to support generating the Wireguard
network configuration automatically by just specifying the network CIDR.
- added docs about Wireguard support and how to use it.
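Automatic address generation from a single CIDR could look like this (hypothetical Go using `net/netip`, Go 1.18+). Handing out consecutive addresses after the network address is an assumed scheme, not necessarily what `talosctl cluster create` does; key generation is omitted:

```go
package main

import (
	"fmt"
	"net/netip"
)

// allocate assigns one address per node from cidr, skipping the network
// address. This allocation scheme is an assumption for illustration only.
func allocate(cidr string, nodes int) ([]netip.Prefix, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	prefix = prefix.Masked()

	addrs := make([]netip.Prefix, 0, nodes)
	addr := prefix.Addr().Next() // skip the network address
	for i := 0; i < nodes; i++ {
		if !addr.IsValid() || !prefix.Contains(addr) {
			return nil, fmt.Errorf("%s has room for only %d nodes", cidr, i)
		}
		addrs = append(addrs, netip.PrefixFrom(addr, prefix.Bits()))
		addr = addr.Next()
	}

	return addrs, nil
}

func main() {
	addrs, err := allocate("172.20.0.0/24", 3)
	if err != nil {
		panic(err)
	}
	for i, a := range addrs {
		fmt.Printf("node %d: %s\n", i, a)
	}
}
```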
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>