Tests for ApplyConfig API were relying on not really supported behavior
of modifying config via the `Provider` interface (and it was "fixed" in
another PR which cleans up such access to the configuration).
Cluster version bumped to try to workaround strange CAPI bootstrap
failures in e2e-capi.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is a complete rewrite of time sync process.
Now the time sync process starts early at boot time, and it adapts to
configuration changes:
* before config is available, `pool.ntp.org` is used
* once config is available, configured time servers are used
Controller updates same time sync resource as other controllers had
dependency on, so they have a chance to wait for the time sync event.
Talos services which depend on time now wait on same resource instead of
waiting on timed health.
New features:
* time sync now sticks to the particular time server unless there's an
error from that server, and server is changed in that case, this
improves time sync accuracy
* time sync acts on config changes immediately, so it's possible to
reconfigure time sync at any time
* there's a new 'epoch' field in time sync resources which allows
time-dependent controllers to regenerate certs when there's a big enough
jump in time
Features to implement later:
* apid shouldn't depend on timed, it should be started early and it
should regenerate certs on time jump
* trustd should be updated in same way
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves implementation of the user-facing APIs to the machined, and
as now all the APIs are implemented by machined, remove routerd and
adjust apid to proxy to machined.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/3323
Not exactly matching with udevd generated `by-<id>` symlinks, but should
provide sufficient amount of property selectors to be able to pick
specific disks for any kind of disk: sd card, hdd, ssd, nvme.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This removes container images for the aforementioned services, they are
now built into `machined` executable which launches one or another
service based on `argv[0]`.
Containers are started with rootfs directory which contains only a
single executable file for the service.
This creates rootfs on squashfs for each container in
`/opt/<container>`.
Service `networkd` is not touched as it's handled in #3350.
This removes all the image imports, snapshots and other things which
were associated with the existing way to run containers.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
First, if the config for some component image (e.g. `apiServer`) is empty,
Talos pushes default image which is unknown to the script, so verify
that change is not no-op, as otherwise script will hang forvever waiting
for k8s control plane config update.
Second, with bootkube bootstrap it was fine to omit explicit kubernetes
version in upgrade test, but with Talos-managed that means that after
Talos upgrade Kubernetes gets upgraded as well (as Talos config doesn't
contain K8s version, and defaults are used). This is not what we want to
test actually.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
CNI was removed from build-container which works fine for
`talosctl cluster create` clusters as it installs its own CNI, but fails
for upgrade tests as they were never updated for the CNI bundle.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Resources/types were renamed after alpha.4, so we need Talos API to
match expectations of the upgrade test built against master.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
See https://github.com/talos-systems/os-runtime/pull/12 for new mnaming
conventions.
No functional changes.
Additionally implements printing extra columns in `talosctl get xyz`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This drops support for 0.7.x in upgrade tests, and bumps tests to use
version 0.9.0-alpha.3 as the next stable (it will eventually graduate to
0.9.0).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Talos generates in-cluster kubeconfig for the kube-scheduler and
kube-controller-manager to authenticate to kube-apiserver. Bug was that
validity of that kubeconfig was set to 24h by mistake. Fix that by
bumping validity to default for other Kubernetes certs (1 year).
Add a certificate refresh at 50% of the validity.
Fix bugs with copying secret resources which was leading to updates not
being propagated correctly.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes output of `talosctl containers` to show failed/exited
containers so that it's possible to see e.g. `kube-apiserver` container
when it fails to start. This also enables using ID from the container
list to see logs of failing containers, so it's easy to debug issues
when control plane pods don't start because of wrong configuration.
Also remove option to use either CRI or containerd inspector, default to
containerd for system namespace and to CRI for kubernetes namespace.
The only side effect is that we can't see `kubelet` container in the
output of `talosctl containers -k`, but `kubelet` itself is available in
`talosctl services` and `talosctl logs kubelet`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/3209
Using parts of `kubectl` package to run the editor.
Also using the same approach as in `kubectl edit` command:
- add commented section to the top of the file with the description.
- if the config has errors, display validation errors in the commented
section at the top of the file.
- retry apply config until it succeeds.
- abort if no changes were detected or if the edited file is empty.
Patch currently supports jsonpatch only and can read it either from the
file or from the inline argument.
https://asciinema.org/a/wPawpctjoCFbJZKo2z2ATDXeC
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Verify upgrade flow using the same version of the installer.
Run that with disk encryption enabled.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Upgrade is performed by updating node configuration (node by node, service
by service), watching internal resource state to get new configuration
version and verifying that pod with matching version successfully
propagated to the API server state and pod is ready.
Process is similar to the rolling update of the DaemonSet.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is required to upgrade from Talos 0.8.x to 0.9.x. After the cluster
is fully upgraded, control plane is still self-hosted (as it was
bootstrapped with bootkube).
Tool `talosctl convert-k8s` (and library behind it) performs the upgrade
to self-hosted version.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR introduces the first part of disk encryption support.
New config section `systemDiskEncryption` was added into MachineConfig.
For now it contains only Ephemeral partition encryption.
Encryption itself supports two kinds of keys for now:
- node id deterministic key.
- static key which is hardcoded in the config and mainly used for test
purposes.
Talosctl cluster create can now be told to encrypt ephemeral partition
by using `--encrypt-ephemeral` flag.
Additionally:
- updated pkgs library version.
- changed Dockefile to copy cryptsetup deps from pkgs.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This explains the intetion better: config is applied on reboot, and
allows to easily distinguish it from `apply-config --immediate` which
applies config immediately without a reboot (that is coming in a
different PR).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Filesystem creation step is moved on the later stage: when Talos mounts
the partition for the first time.
Now it checks if the partition doesn't have any filesystem and formats
it right before mounting.
Additionally refactored mount options a bit:
- replaced separate options with a set of binary flags.
- implemented pre-mount and post-unmount hooks.
And fixed typos in couple of places and increased timeout for `apid ready`.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Also fix recovery grpc handler to print panic stacktrace to the log.
Any API should follow the structure compatible with apid proxying
injection of errors/nodes.
Explicitly fail GenerateConfig API on worker nodes, as it panics on
worker nodes (missing certificates in node config).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This allows to generating current version Talos configs (by default) or
backwards compatible configuration (e.g. for Talos 0.8).
`talosctl gen config` defaults to current version, but explicit version
can be passed to the command via flags.
`talosctl cluster create` defaults to install/container image version,
but that can be overridden. This makes `talosctl cluster create` now
compatible with 0.8.1 images out of the box.
Upgrade tests use contract based on source version in the test.
When used as a library, `VersionContract` can be omitted (defaults to
current version) or passed explicitly. `VersionContract` can be
convienietly parsed from Talos version string or specified as one of the
constants.
Fixes#3130
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Modify provision library to support multiple IPs, CIDRs, gateways, which
can be IPv4/IPv6. Based on IP types, enable services in the cluster to
run DHCPv4/DHCPv6 in the test environment.
There's outstanding bug left with routes not being properly set up in
the cluster so, IPs are not properly routable, but DHCPv6 works and IPs
are allocated (validates DHCPv6 client).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Our upgrades are safe by default - we check etcd health, take locks,
etc. But sometimes upgrades might be a way to recover broken (or
semi-broken) cluster, in that case we need upgrade to run even if the
checks are not passing. This is not a safe way to do upgrades, but it
might be a way to recover a cluster.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
ECDSA keys are smaller which decreases Talos config size, they are more
efficient in terms of key generation, signing, etc., so it makes boot
performance better (and config generation as well).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
After node reboot (and gRPC API unavailability), gRPC stack might cache
connection refused errors for up to backoff timeout. Explicitly clear
such errors in reset tests before trying to read data from the node to
verify reset success.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Flannel got updated to 0.13 version which has multi-arch image.
Kubernetes images are multi-arch.
Fixes#3049
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Control plane components are running as static pods managed by the
kubelets.
Whole subsystem is managed via resources/controllers from os-runtime.
Many supporting changes/refactoring to enable new code paths.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Default manifests created by bootkube so far were only enabling
pod-checkpointer for kube-apiserver. This seems to have issues with
single-node control plane scenario, when without scheduler and
controller-manager node might fall into `NodeAffinity` state.
See https://github.com/talos-systems/bootkube-plugin/pull/23
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
For single node clusters, control plane is unstable after reboot, run
health check several times to let it settle down to avoid failures in
subsequent checks.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Idea is to add an option to perform "selective" reset: default reset
operation is to wipe all partitions (triggering reinstall), while spec
allows only to wipe some of the operations.
Other operations are performed exactly in the same way for any reset
flow.
Possible use case: reset only `EPHEMERAL` partition.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Allows merging two Talos configs into one. Merges the config in whatever
is set by TALOSCONFIG or ~/.talos/config.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Bonus to `talosctl config merge`.
Got that idea after using talosctl for a weekend.
I feel that can be a good addition to have a command that can list existing
contexts in a table view, which is similar to what `kubectl config get-contexts`
does. To avoid going through the file which has all the certs and such.
Called it just `contexts` to align with whatever we have now (to switch
context you need to use `talosctl config context`).
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>