Drop maintenance service and all the code supporting it directly.
Instead, move all network API termination into the `apid` service, which
now can work now in more modes to support maintenance operations as
well.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This was yet another socket with implicit auth - remove it completely
by reworking the only usecase for it - cluster-side health checks.
Now these health checks build a "regular" network Talos API client (as
they anyways work only controlplane nodes).
Refactor the check for controlplane nodes to use resources instead of
machine config directly (as machine config might not be always present).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
A fixup for #12896
The health check might be running as a reduced privilege role client, so
don't pull the machine config, but instead read a field from a
non-sensitive resource.
As this field doesn't exist in older versions of Talos, the check should
still run by default (as it will be empty).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
The `talosctl upgrade-k8s` doesn't support pinning to image digests, but
it should ignore any image digests if they already exist in the
machine configuration.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
drop the old cli-utils based manifest apply logic and replace it with the new fluxcd/pkg/ssa based implementation
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
Re-generate, fix new linting issues.
Update containerd library to the latest 2.2.1 to address the new cgroups
package import (via tools update).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
These new APIs only support one2one proxying, so they don't have any
hacks, and look as regular gRPC APIs.
Old APIs are deprecated, but still supported.
Implement client-side multiplexing in `talosctl`, provide fallback to
old APIs for legacy Talos versions.
New APIs include removing an image, importing an image.
Extracted from #12392
Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Removes the 5 minute timeouts in the cluster health checks and relies
only on the global timeout set by the --wait-timeout flag
Signed-off-by: Olav Thoresen <Olav.Sortland.Thoresen@spk.no>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
add the following flags to the upgrade-k8s command:
* `--force-conflicts` overwrite the fields when applying even if the field manager differs
* `--inventory-policy` string kubernetes SSA inventory policy (one of 'MustMatch', 'AdoptIfNoInventory' or 'AdoptAll') (default "AdoptIfNoInventory")
* `--no-prune` whether pruning of previously applied objects should happen after apply
* `--prune-timeout` int how long to wait for resources to be pruned in secunds (set to zero to disable waiting for resources to be fully deleted) (default 180)
* `--reconcile-timeout` int how long to wait for resources to be prfully reconciled in secunds (set to zero to disable waiting for resources to be fully reoondiled) (default 180)
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
* add SSA via the new go-kubernetes library implementation to talosctl `upgrade-k8s` command
* add SSA via direct ResourceInterface call into talos (machined) with a manual inventory update
* add an integration test for ssa functionality
Co-authored-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Update COSI, and stop using a fork of `gopkg.in/yaml.v3`, now we use new
supported for of this library.
Drop `MarshalYAMLBytes` for the machine config, as we actually marshal
config as a string, and we don't need this at all.
Make `talosctl` stop doing hacks on machine config for newer Talos, keep
hacks for backwards compatibility.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fix issue introduced in #11532 (`main` only) with versionContract
parsing: wrong variable was returned (overwritten).
Also some small cleanups/nits (with Albert).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Make sure kubelet is not touched for both controlplanes and workers.
Signed-off-by: Alp Celik <celikal18@itu.edu.tr>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes#11198
We should enforce in following places:
* before starting `upgrade-k8s`, check that all Talos machines would end
up with a valid version
* validate in Talos machine configuration, this will cover both
upgrades, new installs, and any machine configuration manual edits
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Use `VolumeStatus` resource, drop old code which was really hard to
understand and follow.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
See #10855
Also refactor conformance tests to increase parallelism and speed up
fast conformance tests.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Brings in Linux 6.12.21, go 1.24.2.
Also updates Go dependencies, golangci-lint, etc.
The configuration was migrated, fix new linting errors.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Use new controller for user disk and STATE mounts, drop
old code in the sequencer.
Also support mounts with parent (when e.g. `/var/lib` is mounted on top
of `/var`).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This commit adds validation for Docker image references in the UpgradeOptions
struct. The validation ensures that all image fields (kubelet, apiserver,
controller-manager, scheduler, proxy) are valid Docker image references before
proceeding with the upgrade process.
The implementation:
- Uses the distribution/reference package for robust Docker image validation
- Validates all image fields in UpgradeOptions struct
- Performs validation before starting the upgrade process
- Provides clear error messages indicating which image field is invalid
- Includes comprehensive tests covering various image reference formats
This change helps prevent potential issues during the upgrade process by
catching invalid image references early with clear error messages.
Signed-off-by: Mikhail Petrov <azalio@azalio.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
1. Don't set max cgroups limit if race mode is enabled (only in test
mode). When e.g. apid/trustd are built with race detector on, they
consume 10x the memory.
2. Fix a data race in `talosctl support` when showing UI progress.
3. Fix an issue pulling `kubeconfig` in `talosctl support` - pull from
endpoints (controlplanes) without setting any nodes.
Fixes#10036
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Save `support.zip` always, also use a different folder for saving logs,
so we can save artifacts of multi cluster tests.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Bring in new tools, pkgs, update Go dependencies and others.
In preparation for Talos 1.9.0-alpha.0.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
The fix in #9233 wasn't correct, as it was looking for number of
replicas in a "random" ReplicaSet. If the deployment has multiple
replica sets, it leads to unexpected results.
Instead, read the Deployment resource directly.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Explicitly enable access to host DNS from pod/service IPs.
Also fix the Kubernetes health checks to assert number of ready pods to
match expectation, otherwise the check might skip a pod (e.g.
`kube-proxy` one) which is not ready, allowing the test to proceed too
early.
Update DNS test to print more logs on error.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Re-structure k8s components health checks so that K8s health can be
independently checked without auxiliary components being up.
Signed-off-by: Noel Georgi <git@frezbo.dev>
- replace `interface{}` with `any` using `gofmt -r 'interface{} -> any -w'`
- replace `a = []T{}` with `var a []T` where possible.
- replace `a = []T{}` with `a = make([]T, 0, len(b))` where possible.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
(This is not user-facing, but rather internal use of the kubeconfig in
the tests/inside the machine).
This was added 4 years ago as a workaround, but instead of a global
timeout we should rather use contexts with timeouts/deadlines (and we
do!).
Setting a global timeout breaks streaming Kubernetes pod logs.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Talos diagnostics analyzes current system state and comes up with detailed
warnings on the system misconfiguration which might be tricky to figure
out other way.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Sort the pod names, so the check output doesn't re-print itself on no
change to the list of pods.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Dynamically map Kubernetes and Talos API ports to an available port on
the host, so every cluster gets its own unique set of parts.
As part of the changes, refactor the provision library and interfaces,
dropping old weird interfaces replacing with (hopefully) much more
descriprive names.
Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This PR ensures that we can test our siderolink communication using embedded siderolink-agent.
If `--with-siderolink` provided during `talos cluster create` talosctl will embed proper kernel string and setup `siderolink-agent` as a separate process. It should be used with combination of `--skip-injecting-config` and `--with-apply-config` (the latter will use newly generated IPv6 siderolink addresses which talosctl passes to the agent as a "pre-bind").
Fixes#8392
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>