Include the build tags we were using, `integration` and `tools`, in golangci-lint's linting/formatting.
Rename the `tools` build tag to `sidero.tools` to avoid colliding with the build tag of the same name in the `github.com/johannesboyne/gofakes3` package - otherwise the dependency failed to compile due to multiple package names ending up in the same package.
Fix all the linting errors surfaced by this change.
Additionally, temporarily re-enable `nolintlint` to find and remove the `nolint` directives that were no longer necessary.
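golangci-lint reads build tags from its run configuration; a minimal sketch of the relevant `.golangci.yml` fragment (assuming the standard `run.build-tags` key; the exact surrounding config is not shown in this change):

```yaml
run:
  build-tags:
    - integration
    - sidero.tools
```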
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Machines that were shutting down and then disconnected are now shown as "Powered Off" in the UI instead of being stuck in "Shutting Down" with a greyed-out unreachable state.
For machines managed by a static infra provider, shutting down a machine now prevents the provider from automatically powering it back on due to cluster allocation. The provider honors the shutdown request until the machine goes through a deallocation cycle, at which point the request is considered stale.
Intentionally powered-off machines are also excluded from the "disconnected machines" list on the frontend when destroying a cluster, to avoid them being force-destroyed.
The shutdown modal in the frontend now calls a new management API endpoint instead of the Talos API directly. The CLI gains `omnictl machine shutdown` and `omnictl machine power-on` commands.
Closes siderolabs/omni#1634.
Part of siderolabs/omni-infra-provider-bare-metal#103.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Add a support modal to Omni, providing links to GitHub issues, support, documentation, community, and office hours.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
* Track machines running Talos versions approaching or past end of support relative to MinTalosVersion.
* Replace the config-driven non-ImageFactory deprecation notification with hardcoded constants and add two new notifications (approaching end of support, end of support reached) with corresponding Prometheus metrics.
* Add startup validation hooks (currently disabled) that will refuse to start when unsupported machines are detected.
* Fix frontend notification namespace from Default to Ephemeral.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Add multiple new filters to audit logs. The UI gains a generic search box and sortable columns. The CLI supports the same, plus direct filters for event_type, resource_type, resource_id, cluster_id, and actor.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Add creation timestamps and per-key last-active tracking to service account key listings. The `omnictl serviceaccount list` command now shows KEY CREATED and KEY LAST ACTIVE columns for each public key, alongside the existing SA-level LAST ACTIVE.
A new PublicKeyLastActive resource tracks per-key usage. The activity interceptor now extracts the signing key fingerprint from the auth context and records last-used timestamps per key, with independent debouncing. The ServiceAccountStatusController aggregates this data into the service account status for display.
A cleanup controller removes PublicKeyLastActive resources when their corresponding public key is torn down.
Closes: siderolabs/omni#2661
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Implement a guard for Omni to prevent usage until users accept an EULA through the UI or a startup flag.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Manifests support two modes:
- `FULL` - Omni always keeps the manifest in sync.
- `ONE_TIME` - Omni applies the manifest only if it doesn't exist. If the manifest is removed by hand and then changed in Omni, it will be re-applied.
Manifests are applied using server-side apply. Omni now has three inventories: `omni-internal-inventory`, `omni-user-inventory` and `omni-sync-one-time`:
- The user inventory is used for user-managed manifests.
- The internal one is used for manifests created by Omni controllers (workloadproxy, advanced healthcheck service).
- The one-time inventory is used with NoPrune enabled. Once a manifest is applied, it is simply removed from the list of applied manifests, which ensures no further changes to it will happen.
Manifests also support setting the namespace on all namespaced resources. This is useful for huge manifest files supplied without a namespace (similar to `kubectl apply -n namespace -f manifest.yaml`).
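The FULL/ONE_TIME decision above can be sketched as a minimal illustration; `SyncMode` and `shouldApply` are hypothetical names, not Omni's actual API:

```go
package main

import "fmt"

// SyncMode mirrors the two manifest modes described above (illustrative only).
type SyncMode int

const (
	Full    SyncMode = iota // keep the manifest in sync always
	OneTime                 // apply only when the manifest does not exist yet
)

// shouldApply sketches the decision: FULL manifests are always (re)applied,
// while ONE_TIME manifests are applied only when absent from the cluster.
func shouldApply(mode SyncMode, existsInCluster bool) bool {
	switch mode {
	case Full:
		return true
	case OneTime:
		return !existsInCluster
	}

	return false
}

func main() {
	fmt.Println(shouldApply(Full, true))     // true: always kept in sync
	fmt.Println(shouldApply(OneTime, true))  // false: already present
	fmt.Println(shouldApply(OneTime, false)) // true: missing, apply once
}
```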
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Warn users when machines are provisioned without ImageFactory by creating a notification resource when invalid schematics are detected. The notification is gated behind a configurable flag under `notifications.nonImageFactoryDeprecation` with a customizable title/body.
Also add the `omni_machines_invalid_schematic` Prometheus metric, expose the count in `MachineStatusMetricsSpec`, and add a Machine Status section to the Grafana dashboard.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Add `account.maxRegisteredMachines` config option to cap the number of registered machines. The provision handler atomically checks the limit under a mutex before creating new Link resources, returning ResourceExhausted when the cap is reached.
Introduce a Notification resource type (ephemeral namespace) so controllers can surface warnings to users. `omnictl` displays all active notifications on every command invocation. The frontend part of showing notifications will be implemented in a separate PR.
`MachineStatusMetricsController` creates a warning notification when the registration limit is reached and tears it down once it no longer is.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Backend now automatically switches between legacy and SSA modes for
different Talos versions.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Support isolated OIDC token cache directories in generated `kubeconfig`s to prevent token conflicts when switching between users/clusters. Configurable via server flags and the omnictl flags `--oidc-cache-base-dir` and `--oidc-cache-isolation`.
Also upgrade exec credential API to v1 and add interactiveMode field.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Rewrite the `TalosUpgradeStatus` controller to use a completely different flow:
- Update all `ClusterMachineTalosVersion` resources immediately.
- Control quotas and the rollout sequence using the `UpgradeRollout` resource, which has a single field: a map of MachineSetName -> Current Quota:
  - If the control plane is updating, it sets a quota of 0 on all other machine sets.
  - The number of not-running/unhealthy machines is subtracted from the quota.
  - The quota is now copied from the new `UpgradeStrategy`, so it's possible to have more than one machine updated in parallel.
- The `ClusterMachineConfigStatus` controller now adds a new finalizer for upgrades on all `ClusterMachines` currently being updated to acquire/release locks, and reads quotas from the `UpgradeRollout`.
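The quota rules above can be sketched in one function. This is a minimal illustration, not the controller's actual code; `effectiveQuota` and its parameters are hypothetical names:

```go
package main

import "fmt"

// effectiveQuota sketches the rollout-quota rules: the quota for a machine set
// starts from the strategy's configured parallelism, drops to zero for all
// other sets while the control plane is updating, and is reduced by the number
// of machines that are not running or are unhealthy (clamped at zero).
func effectiveQuota(strategyQuota, unhealthy int, isControlPlane, controlPlaneUpdating bool) int {
	if controlPlaneUpdating && !isControlPlane {
		return 0 // freeze worker machine sets while the control plane updates
	}

	if q := strategyQuota - unhealthy; q > 0 {
		return q
	}

	return 0
}

func main() {
	fmt.Println(effectiveQuota(3, 1, false, false)) // 2: one unhealthy machine subtracted
	fmt.Println(effectiveQuota(3, 0, false, true))  // 0: control plane is updating
	fmt.Println(effectiveQuota(2, 5, false, false)) // 0: clamped at zero
}
```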
Fixes: https://github.com/siderolabs/omni/issues/2393
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
* Add an `IdentityLastActive` resource to record the last time each identity (`User`/`ServiceAccount`) made a gRPC call.
* Add `IdentityStatusController` to aggregate identity, user role, and last-active data into an ephemeral `IdentityStatus` resource.
* Expose last_active in ListUsers/ListServiceAccounts gRPC responses, omnictl CLI output, and the frontend Users/ServiceAccounts views.
* Add `UserMetricsController` exposing `omni_users` (total) and `omni_active_users` (7d/30d windows) Prometheus gauges.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Add `talos_version` and `kubernetes_version` to `ClusterStatusSpec`, so that `ClusterSpec` no longer needs to be queried as well.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Migrate user create, list, update, and destroy operations from direct resource manipulation to dedicated ManagementService gRPC endpoints, matching the existing service account pattern.
Direct Identity/User resource mutations are now restricted, and the CLI, frontend, and client library are updated to use the new endpoints.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
This allows token rotation and disaster recovery if the token gets rejected by Omni.
Introduce a new CLI command for this:
```
omnictl configure machine <id> --reset-node-unique-token
```
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Extract the fields required by the `MachineConfigStatusController` into a separate resource.
Otherwise there's a circular dependency: `MachinePendingUpdates` -> `MachineSetStatus` -> `MachineConfigStatus` -> `MachinePendingUpdates`...
Also change the way pending machine updates are calculated: do not delete the pending machine updates resource if the Talos version/schematic is not in sync.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
The schematic comparison logic had an edge case: if a machine predates the image factory, it is installed via a `ghcr.io` installer image (or a custom one). Those machines do not have the schematic meta extension on them, and Omni creates a synthetic schematic ID and properties for those. These properties do not have the "actual" kernel args of the machine, but rather, Omni sets them as what it thinks they should be (the "correct" siderolink args from the Omni perspective).
Later, if Omni's siderolink API advertised URL gets updated, it wrongly detects those synthetic kernel args as the "new ones" (with the new URL); hence the desired-vs-actual schematic comparison returns a mismatch, and Omni performs an unnecessary upgrade on that machine.
Fix this by using the "current (non-protected) args of the machine" as the synthetic args in such cases. Those "current" args will be synthetic themselves (since we cannot read them from the machine, as it has no schematic info), but this prevents spurious changes when the advertised URL changes.
Additionally, we have two checks to detect a schematic mismatch in the `ClusterMachineConfigStatus` controller - make them check the mismatch in the same way, to be more consistent.
Unrelated to this bug, also fix the `SchematicReady` check (introduced in 1.5) to treat invalid schematics as valid, as otherwise we cannot create clusters from non-factory images.
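The core of the fix can be sketched as follows. This is a minimal illustration under stated assumptions, not Omni's real comparison logic; `needsUpgrade` and its parameters are hypothetical:

```go
package main

import (
	"fmt"
	"slices"
)

// needsUpgrade sketches the fix: when a machine has no schematic meta
// (pre-image-factory install), the synthetic "desired" kernel args are derived
// from the machine's current non-protected args instead of from Omni's own
// desired siderolink args, so a change of the advertised URL no longer looks
// like a schematic mismatch.
func needsUpgrade(hasSchematicMeta bool, currentArgs, omniDesiredArgs []string) bool {
	desired := omniDesiredArgs

	if !hasSchematicMeta {
		// Synthesize the desired args from the machine's current args, keeping
		// the comparison stable even when Omni's advertised URL changes.
		desired = currentArgs
	}

	return !slices.Equal(currentArgs, desired)
}

func main() {
	current := []string{"siderolink.api=https://old.example.org"}
	updated := []string{"siderolink.api=https://new.example.org"}

	fmt.Println(needsUpgrade(true, current, updated))  // true: real mismatch
	fmt.Println(needsUpgrade(false, current, updated)) // false: synthetic args stay stable
}
```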
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
We had an issue with the bare metal provider where two different schematic IDs would fight each other, causing the machine to get installed with a wrong schematic ID, only to be upgraded to the correct one immediately, and in some cases, go into an upgrade loop between a correct and an incorrect schematic.
The cause: Omni trusted the schematics it observed when a machine dialed in while in agent mode, and stored the information it received (like kernel args and initial schematic info). This was wrong, as agent-mode information is essentially meaningless.
Fix this by changing the simple check of "was the schematic info for machine X ever observed" to "is the schematic info for machine X ready". The readiness check requires the schematic to be populated and the machine to not be in agent mode.
This change caused the `SchematicConfiguration` resource to not be generated before the machine leaves agent mode, with a side effect: `InfraMachineController` would not receive the Talos version from it and would not populate it on the `InfraMachine` resource. The BM provider would then never get notified that the machine is allocated to a cluster, and would not power it on (to PXE boot it into "regular" Talos, for it to receive the "install" call).
Change that controller to get the Talos version info directly from the Cluster resource.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fix a bug when setting the arch to AMD64: it is enum value 0 and was being omitted in responses to the frontend.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
When resource compression was disabled in the Omni config, we were not generating the `ClusterMachineConfigPatches` correctly.
The issue: Omni attempted to "force-compress" the `ClusterMachineConfigPatches` when any of the patches' sizes was above the threshold, but in doing so it did not override the global setting of false.
The default setting for resource compression is `true`, but when a config file was used to configure Omni and the setting was not specified in the config YAML, it was getting overwritten to `false` due to the boolean merging behavior, which was fixed in https://github.com/siderolabs/omni/pull/2150.
Also fix force-compression so that it kicks in even when compression is disabled in the config but a patch is above the threshold.
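The corrected decision can be sketched in one function. This is a minimal illustration of the described behavior as I read it (oversized patches are always force-compressed, otherwise the config setting decides); `shouldCompress` is a hypothetical name, not Omni's actual code:

```go
package main

import "fmt"

// shouldCompress decides whether a config patch gets compressed: patches above
// the size threshold are force-compressed even when compression is disabled in
// the config; otherwise the global config setting wins.
func shouldCompress(enabledInConfig bool, size, threshold int) bool {
	return enabledInConfig || size > threshold
}

func main() {
	fmt.Println(shouldCompress(false, 4096, 1024)) // true: oversized, force-compressed
	fmt.Println(shouldCompress(false, 512, 1024))  // false: disabled and under threshold
	fmt.Println(shouldCompress(true, 512, 1024))   // true: enabled in config
}
```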
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Graceful config rollout is now handled by the `ClusterMachineConfigStatusController`.
It calculates the available update quota by adding finalizers on the `ClusterMachine` resources: by counting the resources with the finalizers, it tracks the remaining quota.
It also calculates the pending changes which are not yet applied to the machine, in the `MachinePendingUpdates` resource.
Pending changes are not yet shown anywhere in the UI.
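The finalizer-based accounting can be sketched as follows. This is a minimal illustration, not the controller's actual code; `remainingQuota` is a hypothetical name:

```go
package main

import "fmt"

// remainingQuota sketches the accounting: the controller adds a finalizer to
// each ClusterMachine it is currently updating, and the remaining quota is the
// configured limit minus the number of resources still carrying the finalizer,
// clamped at zero.
func remainingQuota(limit int, machinesWithFinalizer []string) int {
	if q := limit - len(machinesWithFinalizer); q > 0 {
		return q
	}

	return 0
}

func main() {
	fmt.Println(remainingQuota(2, []string{"machine-a"}))              // 1
	fmt.Println(remainingQuota(2, []string{"machine-a", "machine-b"})) // 0: quota exhausted
}
```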
Fixes: https://github.com/siderolabs/omni/issues/1929
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Add a `PlatformMetalID` constant to the frontend and use it where relevant as an ID. Also update some places in the backend with the same idea. There were some lingering uses of List requests where Get requests were better suited; those have been replaced too.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Extract schematic generation and download links from the confirmation step of the installation media wizard to allow for re-use inside the download modal of the list view.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
This resource is going to be used to store the saved installation media
presets generated by the UI wizard.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
It's now possible to pass the `overlay` ID directly in the request.
`MediaId` is still supported, but is kept only for backward compatibility.
`InstallationMedia` resources will be used only in `omnictl download`.
Updated the Wizard UI to no longer use `InstallationMedia` resources.
Dropped `pxe_url` from the `CreateSchematic` response, as all required arguments are now on the client side (if not using `InstallationMedia` resources).
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Add a new flag for minimum committed machines. This data comes from Stripe; if the user's Omni environment has fewer than the committed machines, we report the specified minimum instead.
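The reporting rule can be sketched in one function. This is a minimal illustration, not the actual billing code; `reportedMachines` is a hypothetical name:

```go
package main

import "fmt"

// reportedMachines returns the machine count to report: when the actual count
// is below the committed minimum, the committed minimum is reported instead.
func reportedMachines(actual, committedMinimum int) int {
	if actual < committedMinimum {
		return committedMinimum
	}

	return actual
}

func main() {
	fmt.Println(reportedMachines(3, 10))  // 10: below the commitment
	fmt.Println(reportedMachines(25, 10)) // 25: above the commitment
}
```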
Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
Return the final schematic YAML to display in the frontend when creating installation media.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Add labels for the assigned cluster and connection status to the
`omni_machines_version` metric.
Closes #1967
Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
Adjust the secure boot support check in the machine arch step to match how it works in factory.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Make the `MachineSetNodeController` create `MachineSetNode` resources without an owner.
Fixes: https://github.com/siderolabs/omni/issues/1450
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>