The schematic comparison logic had an edge case: if a machine predates the image factory, it is installed via a `ghcr.io` installer image (or a custom one). Those machines do not have the schematic meta extension on them, and Omni creates a synthetic schematic ID and properties for those. These properties do not have the "actual" kernel args of the machine, but rather, Omni sets them as what it thinks they should be (the "correct" siderolink args from the Omni perspective).
Later, if Omni's SideroLink API advertised URL gets updated, it wrongly treats those synthetic kernel args as the new ones (with the new URL); hence, the desired vs. actual schematic comparison reports a mismatch, and Omni performs an unnecessary upgrade on that machine.
Fix this by using the "current (non-protected) args of the machine" as the synthetic args in such cases. Those "current" args will be synthetic themselves (since we cannot read them from the machine, as it has no schematic info on it), but they will prevent spurious changes when the advertised URL changes.
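As a rough sketch of the comparison described above (the `siderolink.` protected-arg prefix and all function names are illustrative assumptions, not Omni's actual code):

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// nonProtected filters out the protected SideroLink args. The "siderolink."
// prefix is an illustrative assumption, not Omni's actual rule.
func nonProtected(args []string) []string {
	var out []string

	for _, a := range args {
		if !strings.HasPrefix(a, "siderolink.") {
			out = append(out, a)
		}
	}

	return out
}

// needsUpgrade compares only the non-protected args: for machines without
// schematic meta info, the synthetic "actual" args are the machine's current
// non-protected args, so a new advertised URL (a protected arg) alone does
// not produce a mismatch.
func needsUpgrade(currentArgs, desiredArgs []string) bool {
	return !slices.Equal(nonProtected(currentArgs), nonProtected(desiredArgs))
}

func main() {
	current := []string{"siderolink.api=https://old.example.org", "console=ttyS0"}
	newURL := []string{"siderolink.api=https://new.example.org", "console=ttyS0"}

	fmt.Println(needsUpgrade(current, newURL)) // false: only the protected arg changed
	fmt.Println(needsUpgrade(current, []string{"console=tty0"}))
}
```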
Additionally, we have two checks to detect a schematic mismatch in the `ClusterMachineConfigStatus` controller - make them check for the mismatch in the same way, for consistency.
Unrelated to this bug, also fix the `SchematicReady` check (introduced in 1.5) to treat invalid schematics as valid, as otherwise we cannot create clusters from non-factory images.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
(cherry picked from commit 0906bcc23c5d2e56b52fe4c3c7826afbea73dada)
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This allows token rotation and disaster recovery if the token gets
rejected by Omni.
Introduced a new CLI command for that:
```
omnictl configure machine <id> --reset-node-unique-token
```
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
(cherry picked from commit 47fb4dd792b7201f1cbb2a3710da03c3b11d5972)
Extract the fields required by the `MachineConfigStatusController` to a
separate resource.
Otherwise there's a circular loop: `MachinePendingUpdates` ->
`MachineSetStatus` -> `MachineConfigStatus` -> `MachinePendingUpdates`...
Also change the way pending machine updates are calculated: do not
delete the pending machine updates resource if the Talos
version/schematic is not in sync.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
(cherry picked from commit 69c2759b8bc5fe39f731c126921cf9ced4e7d468)
We had an issue with the bare metal provider where two different schematic IDs would fight each other, causing a machine to be installed with the wrong schematic ID, only to be upgraded to the correct one immediately, and in some cases to enter an upgrade loop between the correct and an incorrect schematic.
The cause: Omni trusted the schematic information it observed when a machine dialed in while in agent mode, and stored what it received (such as kernel args and the initial schematic info). This was wrong, as agent-mode information is essentially meaningless.
Fix this by changing the simple check of "was the schematic info for machine X ever observed" to "is the schematic info for machine X ready". The readiness check requires the schematic info to be populated and the machine not to be in agent mode.
This change caused the `SchematicConfiguration` resource to not be generated before the machine leaves agent mode, with a side effect: `InfraMachineController` would not receive the Talos version from it and would not populate it on the `InfraMachine` resource. As a result, the BM provider would never be notified that the machine was allocated to a cluster and would not power it on (to PXE boot it into "regular" Talos, so that it can receive the "install" call from Omni).
Change that controller to get the Talos version directly from the `Cluster` resource.
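The readiness check described above reduces to a predicate like the following sketch (names are illustrative, not Omni's actual code):

```go
package main

import "fmt"

// schematicInfoReady replaces the old "was schematic info ever observed"
// check: schematic info reported while the machine is in agent mode is
// ignored, since it is essentially meaningless.
func schematicInfoReady(schematicPopulated, inAgentMode bool) bool {
	return schematicPopulated && !inAgentMode
}

func main() {
	fmt.Println(schematicInfoReady(true, true))  // agent mode: not ready
	fmt.Println(schematicInfoReady(true, false)) // ready
}
```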
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
(cherry picked from commit c319d7bcf27966494dde99f62aa3c6eb636640cd)
Fix a bug when setting the arch to AMD64, which is enum value 0 and was therefore being omitted in responses to the frontend.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
When resource compression was disabled in the Omni config, we were not generating the ClusterMachineConfigPatches resources correctly.
The issue: the code attempted to "force-compress" the ClusterMachineConfigPatches when any of the patches' sizes was above the threshold, but in doing so it did not override the global setting of `false`.
The default setting for resource compression is `true`, but when a config file is used to configure Omni, and it was not specified in the config YAML, it was getting overwritten to be `false` due to the boolean merging behavior, which was fixed in https://github.com/siderolabs/omni/pull/2150.
Also: fix compression kicking in even when it is disabled in the config but the size is above the threshold.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Now graceful config rollout is handled by the
`ClusterMachineConfigStatusController`.
It calculates the available update quota by adding finalizers on the
`ClusterMachine` resources. By counting the resources with the
finalizers it tracks the remaining quota.
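The finalizer-based quota tracking could be sketched like this (resource and finalizer names are hypothetical, not Omni's actual API):

```go
package main

import "fmt"

// remainingQuota derives the available update quota by counting the
// ClusterMachine resources that still hold the controller's finalizer,
// i.e. the updates currently in progress.
func remainingQuota(maxParallelUpdates int, machineFinalizers map[string][]string, finalizer string) int {
	inProgress := 0

	for _, fs := range machineFinalizers {
		for _, f := range fs {
			if f == finalizer {
				inProgress++
			}
		}
	}

	if inProgress >= maxParallelUpdates {
		return 0
	}

	return maxParallelUpdates - inProgress
}

func main() {
	finalizers := map[string][]string{
		"machine-a": {"config-status-controller"},
		"machine-b": nil,
		"machine-c": {"config-status-controller"},
	}

	fmt.Println(remainingQuota(3, finalizers, "config-status-controller")) // prints 1
}
```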
It now also calculates, in `MachinePendingUpdates`, the pending changes
which are not yet applied to the machine.
Pending changes are not yet shown in the UI anywhere.
Fixes: https://github.com/siderolabs/omni/issues/1929
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Add a PlatformMetalID constant to the frontend and use it where relevant as an ID. Also update some places in the backend with the same idea. There were also some lingering uses of List requests in places where Get requests were better suited; those have been replaced too.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Extract schematic generation and download links from the confirmation step of the installation media wizard to allow for re-use inside the download modal of the list view.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
This resource is going to be used to store the saved installation media
presets generated by the UI wizard.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Now it's possible to pass the `overlay` ID directly to the request.
`MediaId` is also still supported, but only for backward compatibility.
`InstallationMedia` resources will be used only in the `omnictl download`.
Updated the Wizard UI to no longer use `InstallationMedia` resources.
Dropped `pxe_url` from the `CreateSchematic` response, as all required
arguments are now on the client side (if not using `InstallationMedia`
resources).
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
This PR adds a new flag for minimum committed machines. This data comes from Stripe; if the user's Omni environment has fewer than the committed machines, we just report the specified minimum.
Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
Return the final schematic YAML to display in the frontend when creating installation media.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Add labels for the assigned cluster and connection status to the
`omni_machines_version` metric.
Closes #1967
Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
Adjust the secure boot support check in the machine arch step to match how it works in factory.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Make `MachineSetNode` resources be created without an owner by the
`MachineSetNodeController`.
Fixes: https://github.com/siderolabs/omni/issues/1450
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Present all of that as three kinds of virtual resources:
- `MetalPlatformConfig`
- `CloudPlatformConfig`
- `SBCConfig`
These virtual resources support `Get` and `List` operations.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
We compute the schematic ID for a machine in two different places: in the `SchematicConfigurationController` for allocated machines, and in the `MachineUpgradeStatusController` for maintenance mode machines.
Centralize this computation to be done only in `SchematicConfigurationController`.
Change the lifecycle of the `SchematicConfiguration` resource to be bound to a machine, not to a cluster.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Added new `update_on_each_login` field to the `SAMLLabelRule` spec.
Also renamed `assign_role_on_registration` to `assign_role`, as the old
name no longer reflects the actual meaning.
The old field is kept for backward compatibility.
Fixes: https://github.com/siderolabs/omni/issues/1201
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Omni now supports ECDSA P-256 keys for signing requests.
The plain key should be PEM-encoded when it is submitted to the
`RegisterPublicKey` method.
Signatures should be encoded using the RFC 4754 method (`r||s`).
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
(Re)implement the kernel args support functionality in the following way:
- Only support UKI or UKI-like (>=1.12 with `GrubUseUKICmdline`) systems.
- In `MachineStatusController`:
  - When we see a machine for the first time, do a one-time extraction of the extra kernel args from it and store them in the newly introduced `KernelArgs` resource. This resource is user-owned from that point on.
- Mark the `MachineStatus` with an annotation as "its kernel args are initialized".
  - Start storing the raw schematic.
  - Take a one-time snapshot of the extensions on the machine and set them as "initial extensions". They might not be the actual initial set, i.e., the extensions present when we first saw the machine, but we do this on a best-effort basis. We need this because we can no longer simply revert to the initial schematic ID when all extensions are removed - kernel args are also included in the schematic.
- Start collecting the kernel cmdline from Talos machines as well.
- Adapt the `SchematicConfiguration` controller to never revert to the initial schematic ID - it now always computes the needed schematic; when it wants to revert to the initial set of extensions, it uses the new field on `MachineStatus`.
- Introduce the resource `MachineUpgradeStatus` and its controller `MachineUpgradeStatusController`, which handles the maintenance mode upgrades when kernel args are updated. The controller is named this way, since our long-term plan is to centralize all upgrade calls to be done from this controller. Currently, it does not change Talos version or the set of extensions. It works only in maintenance mode, only for kernel args changes (when supported).
- Introduce the resource `KernelArgsStatus` and its controller `KernelArgsStatusController`, which provides information about the kernel args updates. Its status is reliable in both maintenance and non-maintenance modes.
- Build a UI to update these args (with @Unix4ever's help).
Co-authored-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Adds a link to stripe in the omni settings sidebar if stripe is enabled in omni
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Add metrics for enabled cluster features and for various machine properties:
```text
# HELP omni_cluster_features Number of clusters with specific features enabled.
# TYPE omni_cluster_features gauge
omni_cluster_features{feature="disk_encryption"} 1
omni_cluster_features{feature="embedded_discovery_service"} 0
omni_cluster_features{feature="workload_proxy"} 1
# HELP omni_machine_platforms Number of machines in the instance by platform.
# TYPE omni_machine_platforms gauge
omni_machine_platforms{platform="akamai"} 0
omni_machine_platforms{platform="aws"} 0
omni_machine_platforms{platform="azure"} 0
omni_machine_platforms{platform="cloudstack"} 0
omni_machine_platforms{platform="digital-ocean"} 0
omni_machine_platforms{platform="equinixMetal"} 0
omni_machine_platforms{platform="exoscale"} 0
omni_machine_platforms{platform="gcp"} 0
omni_machine_platforms{platform="hcloud"} 0
omni_machine_platforms{platform="metal"} 10
omni_machine_platforms{platform="nocloud"} 0
omni_machine_platforms{platform="opennebula"} 0
omni_machine_platforms{platform="openstack"} 0
omni_machine_platforms{platform="oracle"} 0
omni_machine_platforms{platform="scaleway"} 0
omni_machine_platforms{platform="upcloud"} 0
omni_machine_platforms{platform="vmware"} 0
omni_machine_platforms{platform="vultr"} 0
# HELP omni_machine_secure_boot_status Number of machines in the instance by secure boot status.
# TYPE omni_machine_secure_boot_status gauge
omni_machine_secure_boot_status{enabled="false"} 10
omni_machine_secure_boot_status{enabled="true"} 0
omni_machine_secure_boot_status{enabled="unknown"} 0
# HELP omni_machine_uki_status Number of machines in the instance by UKI (Unified Kernel Image) status.
# TYPE omni_machine_uki_status gauge
omni_machine_uki_status{booted_with_uki="false"} 0
omni_machine_uki_status{booted_with_uki="true"} 10
omni_machine_uki_status{booted_with_uki="unknown"} 0
```
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
If both conditions are true, make Omni crash with an error explaining
that using Talos 1.6 with the `strict` join tokens mode is not possible.
Fixes: https://github.com/siderolabs/omni/issues/1588
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Do not include comments in the generated ClusterMachineConfig, to prevent triggering no-op machine updates caused by changes to machine config comments.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Added a new omnictl command for aborting the cluster import process and removing the created resources (e.g. `Cluster`, `MachineSet`s, `MachineSetNode`s) without resetting the machines.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Fixes: https://github.com/siderolabs/omni/issues/92
Now Omni can be configured to use an OIDC provider (tested against the
Google provider).
New flags introduced:
```
--auth-oidc-enabled true
--auth-oidc-provider-url https://accounts.google.com
--auth-oidc-client-id REDACTED
--auth-oidc-client-secret REDACTED
--auth-oidc-scopes openid
--auth-oidc-scopes profile
--auth-oidc-scopes email
```
Initial users are created the same way as with the Auth0 provider; the
same flag should be used.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Omni now has a `--user-pilot-app-token` flag. When set, it enables the
Userpilot integration in the UI.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
* Rekres, which brings Go 1.25.0. Also update it in go.mod files.
* Fix linter errors coming from new linters.
* Bump deps
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Several changes:
- If the identity doesn't look like a valid email address, ignore the
attribute.
- If an identity was already detected in an attribute, ignore the other
attributes.
Also allow setting extra attribute mappings via command line flags.
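The attribute-selection rules above can be sketched as follows (using `net/mail` for email validation; the function name is ours):

```go
package main

import (
	"fmt"
	"net/mail"
)

// pickIdentity scans SAML attribute values in order and returns the first
// one that parses as a valid email address; once an identity is found,
// the remaining attributes are ignored.
func pickIdentity(attributeValues []string) (string, bool) {
	for _, v := range attributeValues {
		if _, err := mail.ParseAddress(v); err == nil {
			return v, true
		}
	}

	return "", false
}

func main() {
	id, ok := pickIdentity([]string{"CN=Some User", "user@example.com", "other@example.com"})
	fmt.Println(id, ok) // the second attribute wins; the third is ignored
}
```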
Fixes: https://github.com/siderolabs/omni/issues/1376
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
The UI now shows a warning if a token revoke/delete operation will
leave some machines unable to connect.
The same was done for the CLI commands.
Fixes: https://github.com/siderolabs/omni/issues/1380
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
The node token status can be used to check whether the machine has a unique
token generated. It also shows the exact token state:
- `NONE` - the token is supported, but not yet generated.
- `UNSUPPORTED` - Talos is < 1.6.x, so the token won't be generated.
- `EPHEMERAL` - the token is generated, but is not persistent, so a join
token rotation followed by a machine reboot will cause the node to
disconnect.
- `PERSISTENT` - the token is generated and is persisted to the `META`
partition.
If the node unique token status is `NONE`, the same controller will try to
generate a node unique token.
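The four states map to a classification like the following sketch (the decision logic is illustrative, not Omni's actual implementation):

```go
package main

import "fmt"

// TokenState mirrors the states listed above.
type TokenState string

const (
	StateNone        TokenState = "NONE"
	StateUnsupported TokenState = "UNSUPPORTED"
	StateEphemeral   TokenState = "EPHEMERAL"
	StatePersistent  TokenState = "PERSISTENT"
)

// classifyToken derives the token state from three observations about the
// machine; the parameter names are hypothetical.
func classifyToken(talosSupportsUniqueTokens, generated, persistedToMeta bool) TokenState {
	switch {
	case !talosSupportsUniqueTokens: // Talos < 1.6.x
		return StateUnsupported
	case !generated:
		return StateNone
	case !persistedToMeta:
		return StateEphemeral
	default:
		return StatePersistent
	}
}

func main() {
	fmt.Println(classifyToken(false, false, false)) // UNSUPPORTED
	fmt.Println(classifyToken(true, true, true))    // PERSISTENT
}
```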
Fixes: https://github.com/siderolabs/omni/issues/1348
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>