Commit Graph

6 Commits

Author SHA1 Message Date
Utku Ozdemir
6f014b1ea1
fix: fix node resolution cache for nodes in maintenance mode
The node resolution (a.k.a. DNS) cache handled machines in maintenance mode incorrectly.
When a machine is in maintenance mode, it has a corresponding `MachineStatus` resource but no `ClusterMachineIdentity`.
Both of these types trigger updates in the node resolution cache.
When a machine was never part of a cluster, the only source is `MachineStatus`, and the cache updates triggered by it did not populate the machine ID in the cache.
This caused the gRPC router to pick the wrong destination.

Furthermore, we did not remove the cluster and node name information from the cache when a machine was removed from a cluster. The cache therefore kept obsolete cluster information, and the Talos gRPC proxy did not proxy requests correctly after a machine was removed from a cluster.
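
The two fixes described in this message can be illustrated with a minimal sketch. It is not Omni's actual code: the `nodeInfo` and `resolutionCache` types and the `onMachineStatus`, `onClusterIdentity`, and `onClusterIdentityRemoved` handlers are hypothetical names chosen for illustration. The sketch assumes a cache keyed by machine ID that is fed by two sources, where every update path populates the machine ID and removal from a cluster clears the cluster and node name.

```go
package main

import "fmt"

// nodeInfo is a hypothetical cache entry; the fields are illustrative
// and do not mirror Omni's real types.
type nodeInfo struct {
	MachineID string
	Address   string
	Cluster   string
	NodeName  string
}

// resolutionCache maps machine IDs to what is currently known about them.
type resolutionCache struct {
	byID map[string]*nodeInfo
}

func newResolutionCache() *resolutionCache {
	return &resolutionCache{byID: map[string]*nodeInfo{}}
}

// entry returns the record for id, creating it with the machine ID set.
// Every update path goes through here, so the ID is populated even when
// the machine-level source is the only one that ever fires.
func (c *resolutionCache) entry(id string) *nodeInfo {
	info, ok := c.byID[id]
	if !ok {
		info = &nodeInfo{MachineID: id}
		c.byID[id] = info
	}
	return info
}

// onMachineStatus handles an update from the machine-level source
// (the only source for a machine in maintenance mode).
func (c *resolutionCache) onMachineStatus(id, address string) {
	c.entry(id).Address = address
}

// onClusterIdentity handles an update from the cluster-level source.
func (c *resolutionCache) onClusterIdentity(id, cluster, nodeName string) {
	info := c.entry(id)
	info.Cluster = cluster
	info.NodeName = nodeName
}

// onClusterIdentityRemoved clears cluster-specific data so that no
// obsolete cluster information is left behind.
func (c *resolutionCache) onClusterIdentityRemoved(id string) {
	if info, ok := c.byID[id]; ok {
		info.Cluster = ""
		info.NodeName = ""
	}
}

// resolve returns a copy of the cached record for id, if any.
func (c *resolutionCache) resolve(id string) (nodeInfo, bool) {
	info, ok := c.byID[id]
	if !ok {
		return nodeInfo{}, false
	}
	return *info, true
}

func main() {
	cache := newResolutionCache()

	// A machine in maintenance mode is known only through the machine-level source.
	cache.onMachineStatus("machine-1", "10.0.0.5")

	// The machine joins a cluster and later leaves it.
	cache.onClusterIdentity("machine-1", "prod", "worker-1")
	cache.onClusterIdentityRemoved("machine-1")

	info, _ := cache.resolve("machine-1")
	fmt.Printf("%+v\n", info)
}
```

Routing every update through a single `entry` helper is one way to guarantee that no source can create a record without a machine ID.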

Co-authored-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2025-01-31 08:38:34 +01:00
Artem Chernyshev
ed946b30a6
feat: display OMNI_ENDPOINT in the service account creation UI
Fixes: https://github.com/siderolabs/omni/issues/858

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-01-29 15:27:36 +03:00
Andrey Smirnov
3ba096a06d
fix: bring in new versions of COSI runtime and state-etcd
This brings in watch restarts for controller-runtime.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-12-27 17:32:32 +04:00
Artem Chernyshev
58159e419c
feat: automatically resolve cluster in talosctl calls
The `talosctl --cluster` flag is now optional: Omni automatically
resolves the cluster if the machine is part of one.
Fixes: https://github.com/siderolabs/omni/issues/620
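
As a rough illustration of the behavior described above, the sketch below falls back to a membership lookup when no cluster is given. The `resolveCluster` function and the `membership` map are hypothetical and not part of Omni's API; the precedence given to an explicit cluster name is an assumption.

```go
package main

import "fmt"

// resolveCluster returns the cluster to use for a request targeting machineID.
// requested is the value of an explicit cluster flag ("" when omitted).
// membership is a hypothetical lookup from machine ID to cluster name.
// The boolean reports whether a cluster could be determined.
func resolveCluster(requested, machineID string, membership map[string]string) (string, bool) {
	// Assumption: an explicitly requested cluster always wins.
	if requested != "" {
		return requested, true
	}

	// Otherwise resolve the cluster from the machine's membership, if any.
	cluster, ok := membership[machineID]

	return cluster, ok
}

func main() {
	membership := map[string]string{"machine-1": "prod"}

	// No cluster given: resolved automatically from membership.
	fmt.Println(resolveCluster("", "machine-1", membership))

	// Machine is not part of any cluster: nothing to resolve.
	fmt.Println(resolveCluster("", "machine-2", membership))
}
```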

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-10-30 15:21:49 +03:00
Artem Chernyshev
b47acf2e0f
feat: support insecure access to the nodes running in maintenance
All insecure `talosctl` commands now work with the Omni per-instance
`talosconfig`.
A user needs at least the `Operator` Omni role to use the
insecure `talosctl` mode.

The DNS resolver was updated to react to `MachineStatus` resource
creation, not only to the `ClusterMachineConfigStatus` resource.
This makes the DNS record for a machine's UUID appear as soon as the
machine joins Omni, not when it is allocated to a cluster.

The machines list now has a button for updating the Talos version of
machines in maintenance mode.
The UI issues `talosctl upgrade` when another Talos version is
picked.

The `MachineStatus` controller was also updated: the version poller wasn't
marked as dirty after maintenance upgrades. It is now marked as dirty
every time a Talos `MachineStatus` resource update arrives.

Also fixed UI issues here and there.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-06-26 15:37:59 +03:00
Andrey Smirnov
dfcbaae7d0
chore: initial commit
Omni is source-available under BUSL.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-Authored-By: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Co-Authored-By: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Co-Authored-By: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Co-Authored-By: Philipp Sauter <philipp.sauter@siderolabs.com>
Co-Authored-By: Noel Georgi <git@frezbo.dev>
Co-Authored-By: evgeniybryzh <evgeniybryzh@gmail.com>
Co-Authored-By: Tim Jones <tim.jones@siderolabs.com>
Co-Authored-By: Andrew Rynhard <andrew@rynhard.io>
Co-Authored-By: Spencer Smith <spencer.smith@talos-systems.com>
Co-Authored-By: Christian Rolland <christian.rolland@siderolabs.com>
Co-Authored-By: Gerard de Leeuw <gdeleeuw@leeuwit.nl>
Co-Authored-By: Steve Francis <67986293+steverfrancis@users.noreply.github.com>
Co-Authored-By: Volodymyr Mazurets <volodymyrmazureets@gmail.com>
2024-02-29 17:19:57 +04:00