Add cilium e2e tests. The existing cilium check was very old, update to
latest cilium version and also add a test for KPR strict mode.
Signed-off-by: Noel Georgi <git@frezbo.dev>
This PR adds two additional checks which are performed during boot sequence and in `talosctl health`. They ensure that nodes have enough memory and disk.
- Boot check will print a warning if memory / disk size is not sufficient.
- Health check will fail if memory / disk size is not sufficient.
Closes#6467
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
It got broken with the changes to the kubelet now sourcing static pods
from a HTTP internal server.
As we don't want it to be broken, and to make health checks better, add
a new check to make sure kubelet reports control plane static pods as
running. This coupled with API server check should make it more
thorough.
Also add logging when static pod definitions are updated (they were
previously there for file-based implementation). These logs are very
helpful for troubleshooting.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
If a member has no IP addresses, prevent cluster health checks from failing with a panic by checking for the length of member IPs and not assuming there's always at least 1 IP.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Track the progress of the long-running actions `reboot`, `reset`, `upgrade` and `shutdown` on the client side by default, unless `--no-wait=true` is specified.
Use the events API to follow the events using the actor ID of the action and display it using an stderr reporter with a spinner.
Closessiderolabs/talos#5499.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
When integration tests run without data from Talos provisioner (e.g.
against AWS/GCP), it should work only with `talosconfig` as an input.
This specific flow was missing filling out `infoWrapper` properly.
Clean up things a bit by reducing code duplication.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Query the discovery service to fetch the node list and use the results in health checks. Closes siderolabs#5554.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Introduce `cluster.NodeInfo` to represent the basic info about a node which can be used in the health checks. This information, where possible, will be populated by the discovery service in following PRs. Part of siderolabs#5554.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Add new health check which checks if the etcd members match the control plane nodes. Closes siderolabs#5553.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This fixes an error when integration test become stuck with the message
like:
```
waiting for coredns to report ready: some pods are not ready: [coredns-868c687b7-g2z64]
```
After some random sequence of node restarts one of the pods might become
"stuck" in `Completed` state (as it is shown in `kubectl get pods`)
blocking the check, as the pod will never become ready.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
With graceful kubelet shutdown (#5108), after graceful node restart pods
on the restarted node might stay in the status `Terminated` which breaks
the check on pod readiness.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Allow cluster nodes to have multiple internal IP addresses when checking
for all Kubernetes nodes.
Fixes#4807
Signed-off-by: Seán C McCord <ulexus@gmail.com>
We're no longer testing against Talos <= 0.8, so no reason to
run this check (even if it's no-op).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Verify upgrade flow using the same version of the installer.
Run that with disk encryption enabled.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Filesystem creation step is moved on the later stage: when Talos mounts
the partition for the first time.
Now it checks if the partition doesn't have any filesystem and formats
it right before mounting.
Additionally refactored mount options a bit:
- replaced separate options with a set of binary flags.
- implemented pre-mount and post-unmount hooks.
And fixed typos in couple of places and increased timeout for `apid ready`.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Control plane components are running as static pods managed by the
kubelets.
Whole subsystem is managed via resources/controllers from os-runtime.
Many supporting changes/refactoring to enable new code paths.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Pods which are spun up as checkpoints shouldn't be counted towards the
normal pods spun up as part of the daemon set.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved
from `/boot` to `/system/state`.
For single node upgrade, `EPHEMERAL` partition is not touched and other
partitions are re-created as needed.
Bump provision tests to 0.6/0.7 upgrades as we get closer to the new
release.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This was bugging me for quite some time, as (randomly) output was
scrolling endless repeated line when terminal width was small.
Root cause was that `\t` on output takes random amount of spaces
(columns). Fixed small issues counting correctly UTF-8 runes (important
for the spinner and if we ever do i18n), and make sure spinner itself is
included into the calculations.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
It only gets enabled if output is a terminal. Failures which resolve
themselves are removed from the final output. Small spinner to indicate
progress.
While I was at it, I fixed client-side `talosctl health` when init node
is missing.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.
And `pkg/machinery` is published as Go module inside Talos repository.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This change is only moving packages and updating import paths.
Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other
projects to import Talos provisioning library.
As cluster checks are almost always required as part of provisioning
process, package `internal/pkg/cluster` was also made public as
`pkg/cluster`.
Other changes were direct dependencies discovered by `importvet` which
were updated.
Public packages (useful, general purpose packages with stable API):
* `internal/pkg/conditions` -> `pkg/conditions`
* `internal/pkg/tail` -> `pkg/tail`
Private packages (used only on provisioning library internally):
* `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp`
* `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz`
* `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils`
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>