If a member has no IP addresses, prevent cluster health checks from failing with a panic by checking for the length of member IPs and not assuming there's always at least 1 IP.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Track the progress of the long-running actions `reboot`, `reset`, `upgrade` and `shutdown` on the client side by default, unless `--no-wait=true` is specified.
Use the events API to follow the events using the actor ID of the action and display it using an stderr reporter with a spinner.
Closessiderolabs/talos#5499.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
When integration tests run without data from Talos provisioner (e.g.
against AWS/GCP), it should work only with `talosconfig` as an input.
This specific flow was missing filling out `infoWrapper` properly.
Clean up things a bit by reducing code duplication.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Query the discovery service to fetch the node list and use the results in health checks. Closes siderolabs#5554.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Introduce `cluster.NodeInfo` to represent the basic info about a node which can be used in the health checks. This information, where possible, will be populated by the discovery service in following PRs. Part of siderolabs#5554.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Add new health check which checks if the etcd members match the control plane nodes. Closes siderolabs#5553.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This fixes an error when integration test become stuck with the message
like:
```
waiting for coredns to report ready: some pods are not ready: [coredns-868c687b7-g2z64]
```
After some random sequence of node restarts one of the pods might become
"stuck" in `Completed` state (as it is shown in `kubectl get pods`)
blocking the check, as the pod will never become ready.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
With graceful kubelet shutdown (#5108), after graceful node restart pods
on the restarted node might stay in the status `Terminated` which breaks
the check on pod readiness.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Allow cluster nodes to have multiple internal IP addresses when checking
for all Kubernetes nodes.
Fixes#4807
Signed-off-by: Seán C McCord <ulexus@gmail.com>
We're no longer testing against Talos <= 0.8, so no reason to
run this check (even if it's no-op).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Verify upgrade flow using the same version of the installer.
Run that with disk encryption enabled.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Filesystem creation step is moved on the later stage: when Talos mounts
the partition for the first time.
Now it checks if the partition doesn't have any filesystem and formats
it right before mounting.
Additionally refactored mount options a bit:
- replaced separate options with a set of binary flags.
- implemented pre-mount and post-unmount hooks.
And fixed typos in couple of places and increased timeout for `apid ready`.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Control plane components are running as static pods managed by the
kubelets.
Whole subsystem is managed via resources/controllers from os-runtime.
Many supporting changes/refactoring to enable new code paths.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Pods which are spun up as checkpoints shouldn't be counted towards the
normal pods spun up as part of the daemon set.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved
from `/boot` to `/system/state`.
For single node upgrade, `EPHEMERAL` partition is not touched and other
partitions are re-created as needed.
Bump provision tests to 0.6/0.7 upgrades as we get closer to the new
release.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This was bugging me for quite some time, as (randomly) output was
scrolling endless repeated line when terminal width was small.
Root cause was that `\t` on output takes random amount of spaces
(columns). Fixed small issues counting correctly UTF-8 runes (important
for the spinner and if we ever do i18n), and make sure spinner itself is
included into the calculations.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
It only gets enabled if output is a terminal. Failures which resolve
themselves are removed from the final output. Small spinner to indicate
progress.
While I was at it, I fixed client-side `talosctl health` when init node
is missing.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.
And `pkg/machinery` is published as Go module inside Talos repository.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This change is only moving packages and updating import paths.
Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other
projects to import Talos provisioning library.
As cluster checks are almost always required as part of provisioning
process, package `internal/pkg/cluster` was also made public as
`pkg/cluster`.
Other changes were direct dependencies discovered by `importvet` which
were updated.
Public packages (useful, general purpose packages with stable API):
* `internal/pkg/conditions` -> `pkg/conditions`
* `internal/pkg/tail` -> `pkg/tail`
Private packages (used only on provisioning library internally):
* `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp`
* `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz`
* `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils`
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>