202 Commits

Author SHA1 Message Date
Andrey Smirnov
b7d70cf625
feat: unify maintenance and regular APIs
Drop maintenance service and all the code supporting it directly.

Instead, move all network API termination into the `apid` service, which
can now work in more modes to support maintenance operations as
well.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-17 17:00:35 +04:00
Andrey Smirnov
da70cedfd2
refactor: drop apid file socket
This was yet another socket with implicit auth - remove it completely
by reworking the only usecase for it - cluster-side health checks.
Now these health checks build a "regular" network Talos API client (as
they only work on controlplane nodes anyway).

Refactor the check for controlplane nodes to use resources instead of
machine config directly (as machine config might not always be present).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-10 21:52:03 +04:00
Andrey Smirnov
17335107be
fix: use non-sensitive resource for health check precondition
A fixup for #12896

The health check might be running as a reduced privilege role client, so
don't pull the machine config, but instead read a field from a
non-sensitive resource.

As this field doesn't exist in older versions of Talos, the check should
still run by default (as it will be empty).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-10 18:37:55 +04:00
Orzelius
57599fb877
fix: skip some readiness checks when the CNI is disabled
* skip node readiness check
* skip coredns readiness check

Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
2026-03-09 22:10:57 +09:00
Andrey Smirnov
0ab84c2a15
fix: ignore image digest when doing upgrade-k8s
The `talosctl upgrade-k8s` command doesn't support pinning to image digests, but
it should ignore any image digests if they already exist in the
machine configuration.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-03-02 17:26:01 +04:00
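To illustrate what ignoring a digest means for 0ab84c2a15, here is a minimal sketch that drops a trailing `@<digest>` from an image reference before the tag is compared or replaced. The `stripDigest` helper and the sample references are hypothetical and are not taken from the `talosctl` source.

```go
package main

import (
	"fmt"
	"strings"
)

// stripDigest removes a trailing "@<algorithm>:<hex>" digest from an image
// reference and leaves the repository and tag untouched.
// Hypothetical helper for illustration only.
func stripDigest(ref string) string {
	if i := strings.Index(ref, "@"); i >= 0 {
		return ref[:i]
	}

	return ref
}

func main() {
	// A reference pinned to both a tag and a digest loses only the digest.
	fmt.Println(stripDigest("registry.k8s.io/kube-apiserver:v1.34.0@sha256:abc123"))
	// A reference without a digest is returned unchanged.
	fmt.Println(stripDigest("ghcr.io/siderolabs/kubelet:v1.34.0"))
}
```

A real implementation would more likely parse the reference with a dedicated library rather than split on `@`; the string handling here only shows the intent.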
Orzelius
d417d68e0d
feat: bring in new ssa logic
drop the old cli-utils based manifest apply logic and replace it with the new fluxcd/pkg/ssa based implementation

Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
2026-03-02 19:37:31 +09:00
Sébastien Masset
87615f5511
feat: implement network policies with Flannel CNI
Align flannel ClusterRole with upstream chart template (cf.
https://github.com/flannel-io/flannel/blob/master/chart/kube-flannel/templates/rbac.yaml)

Add boolean in cluster flannel CNI config to deploy extra resources to
handle network policies. Inspired by flannel Helm chart handling of
netpol.enabled value (cf. https://github.com/flannel-io/flannel/blob/master/Documentation/netpol.md)

Signed-off-by: Sébastien Masset <86793256+smasset-orange@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-10 15:47:49 +04:00
Andrey Smirnov
9690dbad02
chore: bump tools (including linter)
Re-generate, fix new linting issues.

Update containerd library to the latest 2.2.1 to address the new cgroups
package import (via tools update).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-09 13:07:35 +04:00
Andrey Smirnov
8b245b8f26
feat: implement new image service APIs
These new APIs only support one2one proxying, so they don't have any
hacks, and look like regular gRPC APIs.

Old APIs are deprecated, but still supported.

Implement client-side multiplexing in `talosctl`, provide fallback to
old APIs for legacy Talos versions.

New APIs include removing an image, importing an image.

Extracted from #12392

Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-02-02 15:55:56 +04:00
Orzelius
b1b703dbe2
chore: move sync logging code to go-kubernetes package
so it can be reused in Omni

Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
2026-01-27 22:53:17 +09:00
Olav Thoresen
e5aca71cd0
fix: fix healthcheck timeout
Removes the 5 minute timeouts in the cluster health checks and relies
only on the global timeout set by the --wait-timeout flag

Signed-off-by: Olav Thoresen <Olav.Sortland.Thoresen@spk.no>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2026-01-14 15:01:57 +04:00
Orzelius
c839b38809
feat: expose more SSA options in the upgrade-k8s command
add the following flags to the upgrade-k8s command:
* `--force-conflicts`            overwrite the fields when applying even if the field manager differs
* `--inventory-policy` string    kubernetes SSA inventory policy (one of 'MustMatch', 'AdoptIfNoInventory' or 'AdoptAll') (default "AdoptIfNoInventory")
* `--no-prune`                   whether pruning of previously applied objects should happen after apply
* `--prune-timeout` int          how long to wait for resources to be pruned in seconds (set to zero to disable waiting for resources to be fully deleted) (default 180)
* `--reconcile-timeout` int      how long to wait for resources to be fully reconciled in seconds (set to zero to disable waiting for resources to be fully reconciled) (default 180)

Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
2026-01-12 21:17:43 +09:00
Orzelius
c4f3f6d3e5
feat: implement kubernetes server-side apply
* add SSA via the new go-kubernetes library implementation to talosctl `upgrade-k8s` command
* add SSA via direct ResourceInterface call into talos (machined) with a manual inventory update
* add an integration test for ssa functionality

Co-authored-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-12-26 12:08:16 +04:00
Andrey Smirnov
92eeaa4826
fix: update YAML library
Update COSI, and stop using a fork of `gopkg.in/yaml.v3`; we now use the new
supported fork of this library.

Drop `MarshalYAMLBytes` for the machine config, as we actually marshal
config as a string, and we don't need this at all.

Make `talosctl` stop doing hacks on machine config for newer Talos, keep
hacks for backwards compatibility.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-11-04 15:21:57 +04:00
Andrey Smirnov
7f048e962e
feat: update dependencies
Bump PKGS (Linux 6.16.9), tools, other go.mod dependencies.

Fix the linting issues.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-09-30 20:55:29 +04:00
Andrey Smirnov
9c97ed886b
fix: version contract parsing in encryption keys handling
Fix issue introduced in #11532 (`main` only) with versionContract
parsing: wrong variable was returned (overwritten).

Also some small cleanups/nits (with Albert).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-08-18 15:12:54 +04:00
Mateusz Urbanek
1fc670a08d
fix: dial with proxy
Fixes #11536

Signed-off-by: Mateusz Urbanek <mateusz.urbanek@siderolabs.com>
2025-08-18 09:53:14 +02:00
Alp Celik
7a52d7489c
fix: kubernetes upgrade options for kubelet
Make sure kubelet is not touched for both controlplanes and workers.

Signed-off-by: Alp Celik <celikal18@itu.edu.tr>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-08-15 15:01:14 +04:00
Andrey Smirnov
7e6052e63a
feat: increase boot partition to 2 GiB
See https://github.com/siderolabs/talos/discussions/10994

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-07-22 14:48:00 +04:00
Andrey Smirnov
7f0300f108
feat: update dependencies, Kubernetes 1.34.0-alpha.2
Bump all dependencies, many small changes due to new golangci-lint
version.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-06-30 19:05:22 +04:00
Andrey Smirnov
4da2dd537d
feat: enforce Kubernetes version compatibility
Fixes #11198

We should enforce this in the following places:

* before starting `upgrade-k8s`, check that all Talos machines would end
  up with a valid version
* validate in Talos machine configuration, this will cover both
  upgrades, new installs, and any machine configuration manual edits

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-06-27 20:03:23 +04:00
Andrey Smirnov
0cb137ad73
fix: make disk size check work on old Talos
For old Talos without `VolumeStatus` resource skip the check.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-05-21 17:41:39 +04:00
Andrey Smirnov
918b94d9a0
refactor: rewrite disk size check
Use `VolumeStatus` resource, drop old code which was really hard to
understand and follow.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-05-07 17:45:36 +04:00
Andrey Smirnov
b3b20eff3a
fix: containerd crashing with sigsegv
See #10855

Also refactor conformance tests to increase parallelism and speed up
fast conformance tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-05-02 21:20:06 +04:00
Andrey Smirnov
ace44ea616
test: update hydrophone to 0.7.0
This is the hydrophone version matching Kubernetes 1.33.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-05-01 15:29:17 +04:00
Andrey Smirnov
1e677587c0
fix: preserve kubelet image suffix
Preserve the `-fat`, `-slim` kubelet image suffixes while upgrading
`kubelet` images.

Fixes #10488

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-29 08:27:31 +04:00
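The suffix handling described in 1e677587c0 can be sketched as follows. This is a simplified illustration that works on tags only; the `upgradeKubeletTag` function is a hypothetical name, and the actual upgrade code may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// upgradeKubeletTag returns the tag for newVersion, carrying over a "-fat" or
// "-slim" suffix if the old tag had one.
// Hypothetical helper for illustration only.
func upgradeKubeletTag(oldTag, newVersion string) string {
	for _, suffix := range []string{"-fat", "-slim"} {
		if strings.HasSuffix(oldTag, suffix) {
			return newVersion + suffix
		}
	}

	return newVersion
}

func main() {
	fmt.Println(upgradeKubeletTag("v1.32.3-slim", "v1.33.0")) // suffix kept
	fmt.Println(upgradeKubeletTag("v1.32.3", "v1.33.0"))      // no suffix to keep
}
```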
Andrey Smirnov
efd918eeb5
feat: update dependencies
Brings in Linux 6.12.21, go 1.24.2.

Also updates Go dependencies, golangci-lint, etc.

The configuration was migrated, fix new linting errors.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-04-02 21:18:25 +04:00
Andrey Smirnov
d4aacb0d85
refactor: mount operation for STATE and user disks
Use new controller for user disk and STATE mounts, drop
old code in the sequencer.

Also support mounts with parent (when e.g. `/var/lib` is mounted on top
of `/var`).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-03-20 20:46:57 +04:00
Andrey Smirnov
61f1a32d24
test: allocate more resources for conformance runs
There was a wrong variable, but also always use more resources.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-03-13 22:53:03 +04:00
Mikhail Petrov
7af8f6b2fa
feat: validate docker image references in upgrade options
This commit adds validation for Docker image references in the UpgradeOptions
struct. The validation ensures that all image fields (kubelet, apiserver,
controller-manager, scheduler, proxy) are valid Docker image references before
proceeding with the upgrade process.

The implementation:
- Uses the distribution/reference package for robust Docker image validation
- Validates all image fields in UpgradeOptions struct
- Performs validation before starting the upgrade process
- Provides clear error messages indicating which image field is invalid
- Includes comprehensive tests covering various image reference formats

This change helps prevent potential issues during the upgrade process by
catching invalid image references early with clear error messages.

Signed-off-by: Mikhail Petrov <azalio@azalio.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-02-25 16:22:39 +04:00
Andrey Smirnov
716f700da7
feat: provide initial support for ethtool configuration
See https://github.com/siderolabs/ethtool - our fork.

This PR covers only configuring rings, follow-up PRs will address other
pieces: channels and features.

Example:

```
node: 172.20.0.5
metadata:
    namespace: network
    type: EthernetStatuses.net.talos.dev
    id: enp0s2
    version: 4
    owner: network.EthernetStatusController
    phase: running
    created: 2025-02-04T16:03:14Z
    updated: 2025-02-04T16:04:12Z
spec:
    linkState: true
    port: Other
    duplex: Unknown
    rings:
        rx-max: 256
        tx-max: 256
        rx: 128
        tx: 128
        tx-push: false
        rx-push: false
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2025-02-05 21:28:42 +04:00
Andrey Smirnov
b72bda0a42
fix: talosctl support and race tests
1. Don't set max cgroups limit if race mode is enabled (only in test
   mode). When e.g. apid/trustd are built with race detector on, they
   consume 10x the memory.
2. Fix a data race in `talosctl support` when showing UI progress.
3. Fix an issue pulling `kubeconfig` in `talosctl support` - pull from
   endpoints (controlplanes) without setting any nodes.

Fixes #10036

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-12-25 21:05:27 +04:00
Noel Georgi
347b758465
chore: support saving cluster logs on destroy
Support saving cluster logs on destroy

Fixes: #9808

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-11-29 14:39:15 +05:30
Dmitriy Matrenichev
e26d0043e0
chore: code cleanup
More usage of slices package, less usage of package sort.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-11-14 12:25:56 +03:00
Noel Georgi
2001167058
chore(ci): save support zip always after tests
Save `support.zip` always, also use a different folder for saving logs,
so we can save artifacts of multi cluster tests.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-11-08 19:18:06 +05:30
Noel Georgi
5112547d6b
chore: generate support zip for crashdump
Generate support zip on crashdump.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-11-06 15:54:21 +05:30
Andrey Smirnov
e0434d77d7
feat: update dependencies
Bring in new tools, pkgs, update Go dependencies and others.

In preparation for Talos 1.9.0-alpha.0.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-10-17 22:12:50 +04:00
Andrey Smirnov
780a1f198a
fix: update CoreDNS health check
The fix in #9233 wasn't correct, as it was looking for number of
replicas in a "random" ReplicaSet. If the deployment has multiple
replica sets, it leads to unexpected results.

Instead, read the Deployment resource directly.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-09-12 16:10:40 +04:00
Andrey Smirnov
a9551b7caa
fix: host DNS access with firewall enabled
Explicitly enable access to host DNS from pod/service IPs.

Also fix the Kubernetes health checks to assert number of ready pods to
match expectation, otherwise the check might skip a pod (e.g.
`kube-proxy` one) which is not ready, allowing the test to proceed too
early.

Update DNS test to print more logs on error.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-27 15:44:14 +04:00
Noel Georgi
c312a46f69
chore: restructure k8s component health checks
Re-structure k8s components health checks so that K8s health can be
independently checked without auxiliary components being up.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-19 14:32:07 +05:30
Dmitriy Matrenichev
dad9c40c73
chore: simplify code
- replace `interface{}` with `any` using `gofmt -r 'interface{} -> any' -w`
- replace `a = []T{}` with `var a []T` where possible.
- replace `a = []T{}` with `a = make([]T, 0, len(b))` where possible.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-08 18:14:00 +03:00
Andrey Smirnov
3d35e54683
chore: update hydrophone library
My PR https://github.com/kubernetes-sigs/hydrophone/pull/198 got merged
upstream, so drop local workaround.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-05 14:42:47 +04:00
Andrey Smirnov
7fcb521a6a
feat: use hydrophone instead of sonobuoy
Fixes #8790

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 16:51:45 +04:00
Andrey Smirnov
e8ced2c2dd
chore: drop k8s timeout in the default kubeconfig
(This is not user-facing, but rather internal use of the kubeconfig in
the tests/inside the machine).

This was added 4 years ago as a workaround, but instead of a global
timeout we should rather use contexts with timeouts/deadlines (and we
do!).

Setting a global timeout breaks streaming Kubernetes pod logs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:29:50 +04:00
Andrey Smirnov
8dbe2128a9
feat: implement Talos diagnostics
Talos diagnostics analyzes the current system state and produces detailed
warnings about system misconfiguration which might be tricky to figure
out otherwise.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-05 22:28:15 +04:00
Andrey Smirnov
b0fdc3c8ca
fix: make static pods check output consistent
Sort the pod names, so the check output doesn't re-print itself on no
change to the list of pods.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-10 15:30:24 +04:00
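As a sketch of the idea in b0fdc3c8ca: if the check message is built from a sorted copy of the pod names, the same set of pods always renders to the same string, so a caller that prints only on change stays quiet. The `describePods` function and the message text are hypothetical.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// describePods renders a deterministic status line regardless of the order in
// which the pod names were listed. The input slice is not modified.
// Hypothetical helper for illustration only.
func describePods(names []string) string {
	sorted := slices.Clone(names)
	slices.Sort(sorted)

	return "waiting for static pods: " + strings.Join(sorted, ", ")
}

func main() {
	first := describePods([]string{"kube-scheduler", "kube-apiserver"})
	second := describePods([]string{"kube-apiserver", "kube-scheduler"})

	fmt.Println(first)
	fmt.Println("unchanged:", first == second)
}
```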
Dmitry Sharshakov
653f838b09
feat: support multiple Docker clusters in talosctl cluster create
Dynamically map Kubernetes and Talos API ports to an available port on
the host, so every cluster gets its own unique set of ports.

As part of the changes, refactor the provision library and interfaces,
dropping old weird interfaces and replacing them with (hopefully) much more
descriptive names.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-04 21:21:39 +04:00
Dmitriy Matrenichev
949ad11a2d
chore: import siderolink as siderolink-launch subcommand
This PR ensures that we can test our siderolink communication using embedded siderolink-agent.
If `--with-siderolink` is provided during `talosctl cluster create`, talosctl will embed the proper kernel string and set up `siderolink-agent` as a separate process. It should be used in combination with `--skip-injecting-config` and `--with-apply-config` (the latter will use newly generated IPv6 siderolink addresses which talosctl passes to the agent as a "pre-bind").

Fixes #8392

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-23 16:08:56 +03:00
Dmitriy Matrenichev
19f15a840c
chore: bump golangci-lint to 1.57.0
Fix all discovered issues.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-21 01:06:53 +03:00
Artem Chernyshev
113fb646ec
chore: use go-talos-support library
The code for collecting Talos `support.zip` was extracted there.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-03-19 18:28:46 +03:00