179 Commits

Author SHA1 Message Date
Utku Ozdemir
2fe716d2c9
chore: enable go linting for build tags, fix linting errors
Include the build tags we use, `integration` and `tools`, in the linting/formatting done by golangci-lint.

Rename the build tag `tools` to `sidero.tools` to avoid colliding with the identically named build tag in the `github.com/johannesboyne/gofakes3` package; otherwise the dependency failed to compile because of multiple package names in the same package.

Fix all the linting errors surfaced by this enablement.

Also, temporarily re-enabled `nolintlint` to find the nolint directives which were no longer necessary and removed them.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2026-04-29 21:18:45 +02:00
Edward Sammut Alessi
d3592671ec
feat: download talosctl directly from factory
Download talosctl binaries from factory instead of Github

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-04-29 17:06:25 +02:00
Utku Ozdemir
f9dd849153
feat: introduce powered off machine state and power on support
Machines that were shutting down and then disconnect are now shown as "Powered Off" in the UI instead of being stuck in "Shutting Down" with a greyed-out unreachable state.

For machines managed by a static infra provider, shutting down a machine now prevents the provider from automatically powering it back on due to cluster allocation. The provider honors the shutdown request until the machine goes through a deallocation cycle, at which point the request is considered stale.

Intentionally powered-off machines are also excluded from the "disconnected machines" list on the frontend when destroying a cluster, to avoid them being force-destroyed.

The shutdown modal in the frontend now calls a new management API endpoint instead of the Talos API directly. The CLI gains `omnictl machine shutdown` and `omnictl machine power-on` commands.

Closes siderolabs/omni#1634.
Part of siderolabs/omni-infra-provider-bare-metal#103.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2026-04-24 13:57:12 +02:00
Artem Chernyshev
725f41d4ee
fix: properly display service account expiration time in the UI
The old code was incorrectly picking the public key.

Fixes: https://github.com/siderolabs/omni/issues/2717

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2026-04-23 21:19:54 +03:00
Edward Sammut Alessi
c5a4310570
feat(frontend): add support modal to omni
Add a support modal to Omni, providing links to github issues, support, docs, community links, and office hours.

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-04-23 15:46:42 +02:00
Edward Sammut Alessi
be67f710f8
feat: allow reader access to join token
Explicitly allow readers to read join tokens

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-04-21 16:28:32 +02:00
Oguz Kilcan
475e3660d7
feat: add Talos version end-of-support notifications and metrics
* Track machines running Talos versions approaching or past end of support relative to MinTalosVersion.
* Replace the config-driven non-ImageFactory deprecation notification with hardcoded constants and add two new notifications (approaching end of support, end of support reached) with corresponding Prometheus metrics.
* Add startup validation hooks (currently disabled) that will refuse to start when unsupported machines are detected.
* Fix frontend notification namespace from Default to Ephemeral.

Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
2026-04-20 17:11:49 +02:00
Edward Sammut Alessi
488b020b2e
feat: add more filters to audit logs
Add multiple new filters to audit logs. Through the UI, there will be a generic search box and the ability to sort columns. Through the CLI, there will be support for the same plus also direct filters for event_type, resource_type, resource_id, cluster_id, and actor.

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-04-15 11:03:54 +02:00
Utku Ozdemir
590ea2e370
feat: add per-key creation and last-active tracking for service accounts
Add creation timestamps and per-key last-active tracking to service account key listings. The `omnictl serviceaccount list` command now shows KEY CREATED and KEY LAST ACTIVE columns for each public key, alongside the existing SA-level LAST ACTIVE.

A new PublicKeyLastActive resource tracks per-key usage. The activity interceptor now extracts the signing key fingerprint from the auth context and records last-used timestamps per key, with independent debouncing. The ServiceAccountStatusController aggregates this data into the service account status for display.

A cleanup controller removes PublicKeyLastActive resources when their corresponding public key is torn down.

Closes: siderolabs/omni#2661
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2026-04-14 21:12:30 +02:00
Edward Sammut Alessi
cad3713552
feat: implement eula guard for omni
Implement a guard for Omni to prevent usage until users accept an EULA through the UI or a startup flag.

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-04-13 16:49:51 +02:00
Artem Chernyshev
e4760526f2
feat: support omnictl edit command
Works the same way as `talosctl edit` and `kubectl edit`.

Fixes: https://github.com/siderolabs/omni/issues/905

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2026-04-07 15:47:31 +03:00
Artem Chernyshev
43be52c7b1
chore: bump sqlite metrics collector timeout and interval
Timeout: 10s -> 60s
Interval: 60s -> 120s

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2026-04-02 14:14:25 +03:00
Artem Chernyshev
6efb0f2f0a
feat: support Kubernetes manifests in the cluster templates
Fixes: https://github.com/siderolabs/omni/issues/2172

Leverage kubernetes manifest resources and expose them through cluster
templates.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2026-03-26 14:10:14 +03:00
Artem Chernyshev
ada0360837
feat: add a way to sync Kubernetes manifests in Omni
Manifests support two modes:
- `FULL` - Omni will keep the manifest in sync always.
- `ONE_TIME` - Omni will apply the manifest only if it doesn't exist. If the manifest is removed by hand and then changed in Omni it will be applied too.

Manifests are applied using server-side apply. Omni now has three inventories: `omni-internal-inventory`, `omni-user-inventory` and `omny-sync-one-time`:

- The user inventory is used for user-managed manifests.
- The internal inventory is used for the manifests which are created by Omni controllers (workloadproxy, advanced healthcheck service).
- The one-time inventory is used with NoPrune enabled. Once the manifest is
  applied, it is simply removed from the list of applied manifests, which
  ensures that manifest changes are not going to happen.

Manifests also support setting a namespace on all namespaced resources. This can be useful for huge manifest files which are supplied without a namespace (similar to `kubectl apply -n namespace -f manifest.yaml`).

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2026-03-23 15:29:49 +03:00
Oguz Kilcan
b9cabbd95c
feat: add deprecation notification for non-ImageFactory machines
Warn users when machines are provisioned without ImageFactory by creating a notification resource when invalid schematics are detected. The notification is gated behind a configurable flag under notifications.nonImageFactoryDeprecation with customizable title/body.

Also adds omni_machines_invalid_schematic Prometheus metric, exposes the count in MachineStatusMetricsSpec, and adds a Machine Status section to the Grafana dashboard.

Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
2026-03-20 15:47:53 +01:00
Oguz Kilcan
cf7d752453
feat: enforce configurable machine registration limit
Add `account.maxRegisteredMachines` config option to cap the number of registered machines. The provision handler atomically checks the limit under a mutex before creating new Link resources, returning ResourceExhausted when the cap is reached.

Introduce a Notification resource type (ephemeral namespace) so controllers can surface warnings to users. `omnictl` displays all active notifications on every command invocation. The frontend part of showing notifications will be implemented in a different PR.

MachineStatusMetricsController creates a warning notification when the registration limit is reached and tears it down when it's not.

Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
2026-03-16 12:48:47 +01:00
Utku Ozdemir
1e9b733cb0
chore: bump deps, rekres
Bump all dependencies.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2026-03-10 18:31:38 +01:00
Artem Chernyshev
543cf70b5b
chore: force SSA manifests sync mode for Talos >= 1.13
Backend now automatically switches between legacy and SSA modes for
different Talos versions.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2026-03-05 20:30:13 +03:00
Oguz Kilcan
a9f2937ced
feat: add OIDC token cache isolation for generated kubeconfigs
Support isolated OIDC token cache directories in generated `kubeconfig`s to prevent token conflicts when switching between users/clusters. Configurable via server flags and the omnictl flags `--oidc-cache-base-dir` and `--oidc-cache-isolation`.

Also upgrade exec credential API to v1 and add interactiveMode field.

Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
2026-03-04 09:53:05 +01:00
Artem Chernyshev
f8a42eeb04
chore: move graceful upgrades to the lowest level
Rewrite the `TalosUpgradeStatus` controller to use a completely different
flow:
- update all `ClusterMachineTalosVersion` resources immediately.
- to control quotas and rollout sequence use `UpgradeRollout` resource,
  it has a single field which is a map of MachineSetName -> Current
  Quota:
  - if control plane is updating it sets quota 0 on all other machine
    sets.
  - the number of not running/unhealthy machines is subtracted from the
    quota.
  - quota is now copied from the new `UpgradeStrategy`, so it's possible
    to have more than one machine updated in parallel.
- `ClusterMachineConfigStatus` controller now adds a new finalizer for
  upgrades on all `ClusterMachines` which are currently being updated to
  acquire/release locks and reads quotas from the `UpgradeRollout`.
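
The quota rules in the list above can be sketched as a small pure function. This is an illustration only: the `machineSet` struct and its fields are hypothetical, the real `UpgradeRollout` resource is not modeled, and clamping the result at zero is an assumption.

```go
package main

import "fmt"

// machineSet is a hypothetical, flattened view of what the rollout needs to know.
type machineSet struct {
	name         string
	controlPlane bool
	updating     bool // the set still has machines being upgraded
	maxParallel  int  // quota copied from the upgrade strategy
	unavailable  int  // machines that are not running or unhealthy
}

// rolloutQuotas returns a map of machine set name to current quota.
func rolloutQuotas(sets []machineSet) map[string]int {
	controlPlaneUpdating := false

	for _, s := range sets {
		if s.controlPlane && s.updating {
			controlPlaneUpdating = true
		}
	}

	quotas := make(map[string]int, len(sets))

	for _, s := range sets {
		// Unavailable machines already consume part of the quota.
		quota := s.maxParallel - s.unavailable
		if quota < 0 {
			quota = 0 // assumption: never report a negative quota
		}

		// While the control plane is updating, all other sets wait.
		if controlPlaneUpdating && !s.controlPlane {
			quota = 0
		}

		quotas[s.name] = quota
	}

	return quotas
}

func main() {
	sets := []machineSet{
		{name: "control-planes", controlPlane: true, updating: true, maxParallel: 1},
		{name: "workers", maxParallel: 3, unavailable: 1},
	}

	fmt.Println(rolloutQuotas(sets))
}
```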

Fixes: https://github.com/siderolabs/omni/issues/2393

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2026-03-03 20:02:59 +03:00
Oguz Kilcan
6d03fc7cdb
feat: track user and service account last activity
* Add `IdentityLastActive` resource to record the last time each identity (`User`/`ServiceAccount`) made a gRPC call.
* Add `IdentityStatusController` to aggregate identity, user role, and last-active data into an ephemeral `IdentityStatus` resource.
* Expose last_active in ListUsers/ListServiceAccounts gRPC responses, omnictl CLI output, and the frontend Users/ServiceAccounts views.
* Add `UserMetricsController` exposing `omni_users` (total) and `omni_active_users` (7d/30d windows) Prometheus gauges.

Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
2026-03-03 13:53:29 +01:00
Edward Sammut Alessi
5fccd82b6e
feat: add talos_version and kubernetes_version to clusterstatus
Add talos_version and kubernetes_version to ClusterStatusSpec, so as to not need to also query ClusterSpec.

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-02-26 15:58:39 +01:00
Oguz Kilcan
da60807d48
feat: add ManagementService gRPC endpoints for user operations
Migrate user create, list, update, and destroy operations from direct resource manipulation to dedicated ManagementService gRPC endpoints, matching the existing service account pattern.
Direct Identity/User resource mutations are now restricted, and the CLI, frontend, and client library are updated to use the new endpoints.

Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
2026-02-26 09:33:27 +01:00
Artem Chernyshev
47fb4dd792
feat: allow resetting node unique tokens
This allows token rotation and disaster recovery if the token gets
rejected by Omni.

Introduced the new CLI command for that:

```
omnictl configure machine <id> --reset-node-unique-token
```

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2026-02-25 13:42:12 +03:00
Artem Chernyshev
69c2759b8b
fix: break the dep loop in the cluster machine config status controller
Extract the fields required by the `MachineConfigStatusController` to a
separate resource.
Otherwise there's a circular loop: `MachinePendingUpdates` ->
`MachineSetStatus` -> `MachineConfigStatus` -> `MachinePendingUpdates`...

Also change the way pending machine updates are calculated: do not delete
the pending machine updates resource if the Talos version/schematic is not
in sync.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2026-02-17 00:28:32 +03:00
Oguz Kilcan
afdf123e29
feat: add support for Kubernetes CA rotation
Add support for Kubernetes CA rotation

Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
2026-02-14 11:32:00 +01:00
Utku Ozdemir
d1c869a9d8
chore: bump deps, rekres
Bump all dependencies.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2026-02-12 20:43:45 +01:00
Utku Ozdemir
0906bcc23c
fix: prevent unwanted upgrades of non-image-factory machines
The schematic comparison logic had an edge case: if a machine predates the image factory, it is installed via a `ghcr.io` installer image (or a custom one). Those machines do not have the schematic meta extension on them, and Omni creates a synthetic schematic ID and properties for those. These properties do not have the "actual" kernel args of the machine, but rather, Omni sets them as what it thinks they should be (the "correct" siderolink args from the Omni perspective).

Later, if Omni's advertised siderolink API URL is updated, it wrongly treats those synthetic kernel args as the "new ones (with the new URL)", so the desired vs. actual schematic comparison returns a mismatch, and Omni performs an unnecessary upgrade of that machine.

Fix this by using the "current (non-protected) args of the machine" as the synthetic args in such cases. Those "current" args will be synthetic themselves (since we cannot read them from the machine, as it does not have schematic info on it), but, it will prevent changes when the advertised URL changes.

Additionally, we have two checks to detect a schematic mismatch in the `ClusterMachineConfigStatus` controller - make them check the mismatch in the same way, to be more consistent.

Unrelated to this bug, also fix the `SchematicReady` check (introduced in 1.5) to treat invalid schematics as valid, as otherwise we cannot create clusters from non-factory images.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2026-02-05 13:56:49 +01:00
Utku Ozdemir
c319d7bcf2
fix: fix schematic generation for machines in agent mode
We had an issue with the bare metal provider where two different schematic IDs would fight each other, causing the machine to get installed with a wrong schematic ID, only to be upgraded to the correct one immediately, and in some cases, to go into an upgrade loop between a correct and an incorrect schematic.

The cause: Omni treated the schematics it observed when a machine in agent mode dialed in as valid, and stored the information it received (like kernel args and initial schematic info). This was wrong, as agent mode information is essentially meaningless.

Fix this by changing the simple check of "was the schematic info for machine X ever observed" to "is the schematic info for machine X ready". The readiness check requires the schematic to be populated and the machine not to be in agent mode.

This change meant that the `SchematicConfiguration` resource was no longer generated before the machine left agent mode, which had a side effect: `InfraMachineController` would not receive the Talos version from it and would not populate it on the `InfraMachine` resource. As a result, the BM provider would never get notified that the machine was allocated to a cluster, and would not power it on (to PXE boot it to "regular" Talos, for it to receive the "install" call to Omni).

Change that controller to get the Talos version info directly from the Cluster resource.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2026-02-03 11:46:15 +01:00
Oguz Kilcan
c6cc25c73c
feat: add support for Talos CA rotation
Add support for Talos CA rotation

Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
2026-01-30 09:59:25 +01:00
Edward Sammut Alessi
7376edafc1
fix(installation-media): fix bug when setting arch to amd64
Fix a bug when setting arch to AMD64: it is enum value 0, so it was being omitted from responses to the frontend.
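
The general pitfall, a zero-valued enum disappearing from a serialized response, can be reproduced with the standard library alone. The sketch below uses `encoding/json` with `omitempty` purely as an illustration; the `Arch` and `media` types are hypothetical, and this is not necessarily the serialization path Omni uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Arch is a hypothetical enum in which AMD64 happens to be the zero value.
type Arch int

const (
	AMD64 Arch = iota
	ARM64
)

// media is a hypothetical response type; omitempty drops zero-valued fields.
type media struct {
	Name string `json:"name"`
	Arch Arch   `json:"arch,omitempty"`
}

// encode marshals a media value and returns the JSON as a string.
func encode(m media) string {
	out, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}

	return string(out)
}

func main() {
	fmt.Println(encode(media{Name: "metal", Arch: AMD64})) // {"name":"metal"}
	fmt.Println(encode(media{Name: "metal", Arch: ARM64})) // {"name":"metal","arch":1}
}
```

A consumer that reads the first object cannot tell "AMD64" apart from "field not set" unless it treats a missing field as the zero value.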

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-01-29 11:30:40 +01:00
Utku Ozdemir
98ef83ee42
fix: fix config patches encryption when encryption is disabled
When the resource compression was disabled in the Omni config, we were not generating the ClusterMachineConfigPatches correctly.

The issue was: it was attempting to "force-compress" the ClusterMachineConfigPatches when any of the patches' size was above the threshold. But when it was trying to do that, it did not override the global setting of false.

The default setting for resource compression is `true`, but when a config file is used to configure Omni, and it was not specified in the config YAML, it was getting overwritten to be `false` due to the boolean merging behavior, which was fixed in https://github.com/siderolabs/omni/pull/2150.
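
The boolean merging problem mentioned above is a common one: with a plain `bool`, "not specified in the file" and "explicitly false" look identical. The sketch below shows the pitfall and one common remedy (a pointer that can be nil). The function names are hypothetical, and this is not a description of how the linked PR fixed it.

```go
package main

import "fmt"

// mergeNaive models a merge where the file value is a plain bool:
// an omitted key decodes to false and silently overrides the default.
func mergeNaive(defaultValue, fromFile bool) bool {
	_ = defaultValue // the default is lost because the file value always wins

	return fromFile
}

// mergeOptional models a merge where the file value is optional:
// nil means "not specified", so the default survives.
func mergeOptional(defaultValue bool, fromFile *bool) bool {
	if fromFile == nil {
		return defaultValue
	}

	return *fromFile
}

func main() {
	const defaultCompression = true

	var omitted bool // zero value of a key missing from the YAML

	fmt.Println(mergeNaive(defaultCompression, omitted)) // default lost
	fmt.Println(mergeOptional(defaultCompression, nil))  // default kept

	explicit := false
	fmt.Println(mergeOptional(defaultCompression, &explicit)) // explicit false respected
}
```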

Also: fix the compression kicking in even when it is disabled in the config but the size is above the threshold.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2026-01-26 19:04:21 +01:00
Artem Chernyshev
41506f72f8
chore: move graceful config rollout logic to the lowest controller level
Now graceful config rollout is handled by the
`ClusterMachineConfigStatusController`.
It calculates the available update quota by adding finalizers on the
`ClusterMachine` resources. By counting the resources with the
finalizers it tracks the remaining quota.
It now also calculates the pending changes which are not yet applied to
the machine in the `MachinePendingUpdates`.

Pending changes are not yet shown in the UI anywhere.

Fixes: https://github.com/siderolabs/omni/issues/1929

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2026-01-19 16:30:28 +03:00
Edward Sammut Alessi
fb08dcaa2d
feat(frontend): add extra information to userpilot
Add extra information to userpilot as per [RFD-28](https://www.notion.so/siderolabs/RFD-28-Userpilot-Implementation-and-Omni-Usage-Analytics-6024961dd8cc43eea0ef7797841be51b?d=2dfb1211badf809fbd3f001c79056040#118b1211badf804b9de5edde0f22da96)

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-01-12 11:52:01 +01:00
Edward Sammut Alessi
66e243a233
refactor(installation-media): add metal id const and use gets where possible
Add PlatformMetalID constant to frontend and use it where relevant as an ID. Also update some places in backend with the same idea. There were some lingering uses of List requests in places where Get requests were more suited and those have been replaced too.

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-01-09 17:22:26 +01:00
Oguz Kilcan
ef2d931aac
chore: rekres and bump deps
* Rekres
* Bump deps
* Update default versions for talos and kubernetes

Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
2026-01-09 11:34:03 +01:00
Edward Sammut Alessi
950ca1b0a3
refactor(installation-media): extract schematic generation and download links
Extract schematic generation and download links from the confirmation step of the installation media wizard to allow for re-use inside the download modal of the list view.

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2026-01-08 18:17:11 +01:00
Utku Ozdemir
535d733ea6
chore: drop migrations older than v1.1.0
Drop old migrations and deprecated types which were kept only for the migrations.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2026-01-06 14:50:11 +01:00
Edward Sammut Alessi
5c98d44bdf
chore: implement InstallationMediaConfig resource
This resource is going to be used to store the saved installation media
presets generated by the UI wizard.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-12-29 17:41:45 +01:00
Edward Sammut Alessi
7ffe5a4db8
feat(installation-media): allow submitting bootloader to schematic request
Allow submitting the bootloader option when creating a schematic.

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2025-12-16 15:40:54 +01:00
Artem Chernyshev
d3e4884ba7
chore: add new fields to the CreateSchematic Omni API
Now it's possible to pass the `overlay` ID directly to the request.
`MediaId` is also still supported, but is there only for the backward
compatibility.
`InstallationMedia` resources will be used only in the `omnictl download`.

Updated the Wizard UI to no longer use `InstallationMedia` resources.
Dropped `pxe_url` from the `CreateSchematic` response, as all required
arguments are now on the client side (if not using `InstallationMedia`
resources).

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-12-16 15:06:08 +03:00
Artem Chernyshev
aa6acff632
chore: support resource list based filtering in the DependencyGraph
This will allow seeing which controllers are using the defined resources.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-12-15 21:00:01 +03:00
Artem Chernyshev
ee926cd9eb
feat: add a way to switch gRPC tunnel mode for the connected machines
Fixes: https://github.com/siderolabs/omni/issues/1816

Introduce a new command:

```
omnictl configure machine <id> --siderolink-connection=[udp|http-tunnel|auto]
```

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-12-12 22:59:33 +03:00
Spencer Smith
914c8c0ba1
feat: add min-commit flag for omni
This PR adds a new flag for minimum committed machines. This data comes from Stripe, and if the user's Omni environment has fewer machines than the committed number, we just report the minimum specified.
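
The reporting rule described above amounts to taking the larger of two numbers. A minimal sketch, with a hypothetical function name and no claim about where Omni actually applies it:

```go
package main

import "fmt"

// reportedMachines returns the machine count to report: the actual count,
// but never less than the minimum committed number (hypothetical helper).
func reportedMachines(actual, minCommit int) int {
	if actual < minCommit {
		return minCommit
	}

	return actual
}

func main() {
	fmt.Println(reportedMachines(3, 10))  // below the commitment: 10
	fmt.Println(reportedMachines(25, 10)) // above the commitment: 25
}
```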

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2025-12-12 09:25:42 -05:00
Edward Sammut Alessi
4d11b75e03
feat: return schematic yml when creating installation media
Return the final schematic YML to display in the frontend when creating installation media

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2025-12-11 17:43:14 +01:00
Tim Jones
d68562f595
feat: add labels to talos version metric
Add labels for the assigned cluster and connection status to the
`omni_machines_version` metric.

Closes #1967

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2025-12-10 13:26:36 +01:00
Oguz Kilcan
bc2a5a9986
chore: prepare omni with talos v1.12.0-beta.1
Prepare omni for upcoming talos version 1.12.0-beta.1.

Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
2025-12-06 16:55:35 +01:00
Edward Sammut Alessi
24ed384afb
fix(installation-media): only list architectures supported by providers
Only list the architectures supported by the providers as defined in the API.

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2025-12-04 13:44:19 +01:00
Edward Sammut Alessi
9826116e85
fix(installation-media): adjust secureboot support check
Adjust the secure boot support check in the machine arch step to match how it works in factory.

Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
2025-12-03 21:19:55 +01:00
Artem Chernyshev
8b5c29b303
feat: support locks,node delete and restore when using machine classes
Make the `MachineSetNodeController` create `MachineSetNode` resources
without an owner.

Fixes: https://github.com/siderolabs/omni/issues/1450

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-11-21 20:44:17 +03:00