Include the build tags we were using, `integration` and `tools`, in golangci-lint's linting/formatting.
Rename the build tag `tools` to `sidero.tools` to avoid colliding with the build tag of the same name in the `github.com/johannesboyne/gofakes3` package - otherwise the dependency failed to compile due to multiple package names ending up in the same package.
Fix all the linting errors surfaced by this enablement.
Also, temporarily re-enable `nolintlint` to find the `nolint` directives which were no longer necessary, and remove them.
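For context, a build-tagged tools file with the renamed tag would look roughly like the following (a minimal sketch; the actual file and tool list in Omni differ):

```go
//go:build sidero.tools

// Package tools pins tool dependencies so that `go mod tidy` keeps them in go.mod.
package tools

import (
	_ "golang.org/x/tools/cmd/goimports" // example tool dependency, not Omni's actual list
)
```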
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Add a support modal to Omni, providing links to GitHub issues, support, docs, community channels, and office hours.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
By default, only allow including files from the directory where the template file lives.
This is to prevent malicious cluster templates that include something
like `/etc/passwd`.
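A minimal sketch of such a check (names are illustrative, not Omni's actual implementation):

```go
package template

import "path/filepath"

// allowInclude reports whether include refers to a file directly inside
// templateDir, rejecting absolute paths and ".." traversal such as
// "/etc/passwd" or "../../etc/passwd".
func allowInclude(templateDir, include string) bool {
	if filepath.IsAbs(include) {
		return false
	}

	resolved := filepath.Join(templateDir, include) // Join also cleans the path

	return filepath.Dir(resolved) == filepath.Clean(templateDir)
}
```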
Fixes: https://github.com/siderolabs/omni/issues/2590
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Add multiple new filters to audit logs. Through the UI, there will be a generic search box and the ability to sort columns. Through the CLI, there will be support for the same, plus direct filters for event_type, resource_type, resource_id, cluster_id, and actor.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Add creation timestamps and per-key last-active tracking to service account key listings. The `omnictl serviceaccount list` command now shows KEY CREATED and KEY LAST ACTIVE columns for each public key, alongside the existing SA-level LAST ACTIVE.
A new PublicKeyLastActive resource tracks per-key usage. The activity interceptor now extracts the signing key fingerprint from the auth context and records last-used timestamps per key, with independent debouncing. The ServiceAccountStatusController aggregates this data into the service account status for display.
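A rough sketch of the per-key debouncing (hypothetical types and names; the real interceptor pulls the fingerprint from the auth context and persists timestamps via the state):

```go
package auth

import (
	"sync"
	"time"
)

// lastActiveTracker debounces "last active" updates so that frequent API calls
// don't turn into a state write per request.
type lastActiveTracker struct {
	mu       sync.Mutex
	debounce time.Duration
	lastSeen map[string]time.Time // keyed by public key fingerprint
}

func newLastActiveTracker(debounce time.Duration) *lastActiveTracker {
	return &lastActiveTracker{
		debounce: debounce,
		lastSeen: map[string]time.Time{},
	}
}

// Touch reports whether the timestamp for the given key fingerprint should be
// persisted now, i.e. the previously recorded value is older than the debounce window.
func (t *lastActiveTracker) Touch(fingerprint string, now time.Time) bool {
	t.mu.Lock()
	defer t.mu.Unlock()

	if last, ok := t.lastSeen[fingerprint]; ok && now.Sub(last) < t.debounce {
		return false
	}

	t.lastSeen[fingerprint] = now

	return true
}
```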
A cleanup controller removes PublicKeyLastActive resources when their corresponding public key is torn down.
Closes: siderolabs/omni#2661
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Implement a guard for Omni to prevent usage until users accept an EULA, either through the UI or via a startup flag.
Signed-off-by: Edward Sammut Alessi <edward.sammutalessi@siderolabs.com>
Manifests support two modes:
- `FULL` - Omni will always keep the manifest in sync.
- `ONE_TIME` - Omni will apply the manifest only if it doesn't exist. If the manifest is removed by hand and then changed in Omni, it will be applied again.
Manifests are applied using server-side apply. Omni now has three inventories: `omni-internal-inventory`, `omni-user-inventory` and `omni-sync-one-time`:
- The user inventory is used for user-managed manifests.
- The internal one is used for manifests created by Omni controllers (workloadproxy, advanced healthcheck service).
- The one-time inventory is used with NoPrune enabled. Once a manifest is applied, it is simply removed from the list of applied manifests: that ensures no further changes to it will happen.
Manifests also support setting a namespace on all namespaced resources. This can be useful for huge manifest files that are supplied without a namespace (similar to `kubectl apply -n namespace -f manifest.yaml`).
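A simplified sketch of the namespace defaulting, assuming the manifests are decoded into `unstructured.Unstructured` objects (the real implementation also has to decide whether a resource kind is namespaced, e.g. via a REST mapper, so that callback is left abstract here):

```go
package manifests

import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// setDefaultNamespace fills in the namespace on namespaced objects that don't
// specify one, mirroring the behavior of `kubectl apply -n <namespace>`.
func setDefaultNamespace(
	objects []*unstructured.Unstructured,
	namespace string,
	isNamespaced func(*unstructured.Unstructured) bool, // hypothetical scope check
) {
	for _, obj := range objects {
		if obj.GetNamespace() == "" && isNamespaced(obj) {
			obj.SetNamespace(namespace)
		}
	}
}
```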
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Allow setting the workload proxy subdomain to an empty string when useOmniSubdomain is true. This exposes services directly as subdomains of Omni (e.g., grafana.omni.example.com), which is the simplest possible setup for on-prem deployments needing only a wildcard DNS and cert on the Omni domain.
Continuation of https://github.com/siderolabs/omni/pull/2538.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
During ScaleUpAndDown, machines being removed still have ClusterMachineIdentity resources when the version check starts. The test collected IPs once upfront, then spent 2 minutes trying to reach a machine whose TLS identity was already invalidated, causing x509 errors until the timeout.
Re-fetch ClusterMachineIdentity on each retry iteration so that destroyed machines drop out of the IP list naturally.
Also fix clearConnectionRefused: replace the manual ctx.Done() check with RetryWithContext. The old code returned a plain fmt.Errorf on timeout, which fell through as a non-retryable error due to a race between the context deadline and the retry loop's own timeout.
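A sketch of the new retry shape, using `github.com/siderolabs/go-retry` (which provides `RetryWithContext`); the fetch/check callbacks are stand-ins for the test's real logic:

```go
package tests

import (
	"context"
	"time"

	"github.com/siderolabs/go-retry/retry"
)

// waitForVersions retries until every currently known machine reports the expected
// version. The member list is re-read on every attempt, so machines destroyed
// mid-test drop out of the IP list naturally instead of causing x509 errors.
func waitForVersions(
	ctx context.Context,
	fetchMemberIPs func(context.Context) ([]string, error), // re-reads ClusterMachineIdentity resources
	checkVersion func(context.Context, string) error,
) error {
	return retry.Constant(2*time.Minute, retry.WithUnits(time.Second)).RetryWithContext(ctx,
		func(ctx context.Context) error {
			ips, err := fetchMemberIPs(ctx)
			if err != nil {
				return retry.ExpectedError(err)
			}

			for _, ip := range ips {
				if err = checkVersion(ctx, ip); err != nil {
					return retry.ExpectedError(err)
				}
			}

			return nil
		})
}
```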
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Add `account.maxRegisteredMachines` config option to cap the number of registered machines. The provision handler atomically checks the limit under a mutex before creating new Link resources, returning ResourceExhausted when the cap is reached.
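A simplified sketch of the atomic limit check (hypothetical names; the real handler counts `Link` resources in the state):

```go
package provision

import (
	"context"
	"sync"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// guard serializes registrations so that the limit check and the Link creation
// are atomic with respect to each other.
type guard struct {
	mu  sync.Mutex
	max int // 0 means unlimited
}

// register counts existing links and creates a new one under the mutex,
// rejecting the request with ResourceExhausted once the cap is reached.
// countLinks and createLink are hypothetical callbacks.
func (g *guard) register(
	ctx context.Context,
	countLinks func(context.Context) (int, error),
	createLink func(context.Context) error,
) error {
	g.mu.Lock()
	defer g.mu.Unlock()

	count, err := countLinks(ctx)
	if err != nil {
		return err
	}

	if g.max > 0 && count >= g.max {
		return status.Errorf(codes.ResourceExhausted, "registered machine limit of %d reached", g.max)
	}

	return createLink(ctx)
}
```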
Introduce a Notification resource type (ephemeral namespace) so controllers can surface warnings to users. `omnictl` displays all active notifications on every command invocation. The frontend part of showing notifications will be implemented in a separate PR.
MachineStatusMetricsController creates a warning notification when the registration limit is reached and tears it down when it no longer is.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
The code there was also incorrect: it skipped setting `LastError` on the `ClusterMachineConfigStatus` resource.
Also add an integration test to verify that invalid config errors are properly
reported.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Rewrite the `TalosUpgradeStatus` controller to use a completely different flow:
- Update all `ClusterMachineTalosVersion` resources immediately.
- Control quotas and the rollout sequence using the new `UpgradeRollout` resource. It has a single field, a map of MachineSetName -> current quota (see the sketch below the list):
  - If the control plane is updating, the quota is set to 0 on all other machine sets.
  - The number of not running/unhealthy machines is subtracted from the quota.
  - The quota is now copied from the new `UpgradeStrategy`, so it's possible to have more than one machine updated in parallel.
- The `ClusterMachineConfigStatus` controller now adds a new finalizer for upgrades on all `ClusterMachines` which are currently being updated to acquire/release locks, and it reads quotas from the `UpgradeRollout`.
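Roughly, the per-machine-set quota derivation looks like this (an illustrative sketch, not the controller's actual code):

```go
package upgrade

// machineSetState is a hypothetical summary of a machine set, limited to what
// the quota calculation needs.
type machineSetState struct {
	Name           string
	IsControlPlane bool
	Updating       bool
	UnhealthyCount int // machines that are not running or unhealthy
}

// rolloutQuotas builds the MachineSetName -> current quota map stored in UpgradeRollout.
func rolloutQuotas(sets []machineSetState, strategyQuota int) map[string]int {
	quotas := map[string]int{}

	controlPlaneUpdating := false

	for _, set := range sets {
		if set.IsControlPlane && set.Updating {
			controlPlaneUpdating = true
		}
	}

	for _, set := range sets {
		if controlPlaneUpdating && !set.IsControlPlane {
			quotas[set.Name] = 0 // while the control plane updates, all other machine sets wait

			continue
		}

		quota := strategyQuota - set.UnhealthyCount // unhealthy machines eat into the quota
		if quota < 0 {
			quota = 0
		}

		quotas[set.Name] = quota
	}

	return quotas
}
```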
Fixes: https://github.com/siderolabs/omni/issues/2393
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
* Add `IdentityLastActive` resource to record the last time each identity (`User`/`ServiceAccount`) made a gRPC call.
* Add `IdentityStatusController` to aggregate identity, user role, and last-active data into an ephemeral `IdentityStatus` resource.
* Expose last_active in ListUsers/ListServiceAccounts gRPC responses, omnictl CLI output, and the frontend Users/ServiceAccounts views.
* Add `UserMetricsController` exposing `omni_users` (total) and `omni_active_users` (7d/30d windows) Prometheus gauges.
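The metrics themselves are plain Prometheus gauges, roughly along these lines (metric names from the description above; the registration and label name are simplified assumptions):

```go
package usermetrics

import "github.com/prometheus/client_golang/prometheus"

var (
	usersTotal = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "omni_users",
		Help: "Total number of users.",
	})

	activeUsers = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "omni_active_users",
		Help: "Number of users active within the given time window.",
	}, []string{"window"}) // e.g. "7d", "30d"
)

func init() {
	prometheus.MustRegister(usersTotal, activeUsers)
}
```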
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Add state validation that rejects identity creation when the configured maximum number of users or service accounts is reached. The gRPC resource and management servers now use the validated state so these limits are enforced for all creation paths (CLI, UI, API). Identity is created before the user resource so the validation fires before any side effects.
Also add create validation for the join token name, e2e Playwright tests covering the UI, and an AccountLimits integration test covering the API and CLI for limit enforcement.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Migrate user create, list, update, and destroy operations from direct resource manipulation to dedicated ManagementService gRPC endpoints, matching the existing service account pattern.
Direct Identity/User resource mutations are now restricted, and the CLI, frontend, and client library are updated to use the new endpoints.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Extract the fields required by the `MachineConfigStatusController` to a
separate resource.
Otherwise there's a circular dependency: `MachinePendingUpdates` ->
`MachineSetStatus` -> `MachineConfigStatus` -> `MachinePendingUpdates` -> ...
Also change the way pending machine updates are calculated: do not delete the
pending machine updates resource if the Talos version/schematic is not in sync.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Add helm unit tests (via helm-unittest) covering services, ingresses, HTTPRoutes, secrets, PrometheusRules and ServiceAccounts. Add a helm-based e2e test workflow that deploys Omni on a Talos cluster with Traefik and etcd, runs integration tests including workload proxy, and verifies the full stack end-to-end. Add a configurable TestOptions struct to the workload proxy test to allow running with smaller scale in helm e2e.
Signed-off-by: Kevin Tijssen <kevin.tijssen@siderolabs.com>
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Update the Go version in go.mod to keep it consistent with the value in the Makefile (the actual Go version the project is built with).
This enables some new linters and changes the behavior of existing ones. Reformat and fix all the resulting linting issues.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
In the integration tests, we were accessing the API of Talos machines in maintenance mode by directly hitting their SideroLink management endpoint.
This worked only because the test was running on the same host as Omni itself (as we spawned Omni as a process). This approach breaks when we install Omni via its helm chart on a Kubernetes cluster.
Fix this by routing those connections through Omni as well.
Additionally, centralize Talos client creation in the tests, bump Talos machinery, and pass the service account key explicitly to the Talos client when creating it, instead of relying on it being picked up from env vars.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Instead of doing the fake user auth flow in the integration tests via the `clientconfig` package, use the automation service account directly. Remove all other usages of that package as well, and drop it completely.
The package predates the initial service account token feature of Omni, its purpose was to authenticate to the Omni API in the integration tests. We have the automation key now, so we don't need that anymore.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
The wipe ID on `InfraMachine` resources is empty only when a machine **never needed to be wiped**, i.e., was never allocated to and then de-allocated from a cluster.
This is not always the case in the bare metal infra provider tests, as they run both the `ConfigPatching` and `StaticInfraProvider` integration tests at the same time. Sometimes the latter test picked machines which were released by the former, and those machines had already been wiped at least once.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
We had an issue with the bare metal provider where two different schematic IDs would fight each other, causing a machine to get installed with the wrong schematic ID, only to be upgraded to the correct one immediately, and in some cases going into an upgrade loop between the correct and the incorrect schematic.
The cause: Omni trusted the schematic information it observed when a machine dialed in while in agent mode, and stored what it received (like kernel args and the initial schematic info). This was wrong, as the information reported in agent mode is essentially meaningless.
Fix this by changing the simple check of "was the schematic info for machine X ever observed" to "is the schematic info for machine X ready". The readiness check requires the schematic info to be populated and the machine to not be in agent mode.
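The readiness check boils down to something like this (hypothetical types; the actual resources in Omni differ):

```go
package omni

// schematicInfo is a hypothetical stand-in for the schematic data Omni keeps per machine.
type schematicInfo struct {
	ID          string
	InAgentMode bool
}

// schematicInfoReady reports whether the observed schematic info can be trusted:
// it must be populated, and it must not come from a machine running in agent mode.
func schematicInfoReady(info *schematicInfo) bool {
	return info != nil && info.ID != "" && !info.InAgentMode
}
```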
This change caused the `SchematicConfiguration` resource to not be generated before the machine leaves agent mode, which had a side effect: `InfraMachineController` would not receive the Talos version from it and would not populate it on the `InfraMachine` resource. That, in turn, meant the BM provider would never get notified that the machine is allocated to a cluster and would not power it on (to PXE boot it into "regular" Talos so that the "install" call to Omni can happen).
Change that controller to get the Talos version info directly from the Cluster resource.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Make all leaf fields nillable, so that we can distinguish unset from explicit empty, and merging of CLI args and YAML configs works correctly.
Generate nil-safe accessors (getters/setters) for these nillable fields and use them in the code.
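For illustration, a nillable leaf field with generated-style nil-safe accessors looks roughly like this (the field name is made up):

```go
package config

// Params shows the pattern: leaf fields are pointers, so an unset value is
// distinguishable from an explicitly empty one when merging CLI args and YAML.
type Params struct {
	BindAddress *string `yaml:"bindAddress,omitempty"`
}

// GetBindAddress is a nil-safe getter.
func (p *Params) GetBindAddress() string {
	if p == nil || p.BindAddress == nil {
		return ""
	}

	return *p.BindAddress
}

// SetBindAddress is the matching setter.
func (p *Params) SetBindAddress(value string) {
	p.BindAddress = &value
}
```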
Wrap the cobra command line parser to support nillable flags.
Move all validations into the JSON schema and drop go-validator usage and its annotations.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Graceful config rollout is now handled by the `ClusterMachineConfigStatusController`.
It tracks the available update quota by adding finalizers to the `ClusterMachine` resources: counting the resources that carry the finalizer gives the remaining quota.
It also calculates the pending changes which are not yet applied to the machine and stores them in `MachinePendingUpdates`.
Pending changes are not yet shown anywhere in the UI.
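Conceptually, the finalizer-based accounting works like this (a simplified sketch, not the controller's actual code):

```go
package rollout

// clusterMachine is a hypothetical view of a ClusterMachine resource, limited
// to what the quota accounting needs.
type clusterMachine struct {
	Finalizers []string
}

// rolloutFinalizer is an illustrative finalizer name.
const rolloutFinalizer = "config-rollout"

// remainingQuota counts machines that currently carry the rollout finalizer
// (i.e. have an update in flight) and subtracts them from the configured quota.
func remainingQuota(machines []clusterMachine, quota int) int {
	inFlight := 0

	for _, machine := range machines {
		for _, finalizer := range machine.Finalizers {
			if finalizer == rolloutFinalizer {
				inFlight++

				break
			}
		}
	}

	if remaining := quota - inFlight; remaining > 0 {
		return remaining
	}

	return 0
}
```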
Fixes: https://github.com/siderolabs/omni/issues/1929
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Simplify the code and make it less error-prone.
Signed-off-by: Pranav Patil <pranavppatil767@gmail.com>
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
With Talos 1.12, `.machine.install.extraKernelArgs` is not the right way of setting kernel args. Remove that from the infra machines (bare metal infra provider) tests.
Remove the `disableKexec` bool argument from the function, as it was always set to true.
Set the kernel arg to disable kexec in the correct format, `sysctl.kernel.kexec_load_disabled=1`, not `kexec_load_disabled=1` (which was effectively a no-op).
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fix a few things in tests:
- Add the forgotten `claimMachines` calls to a few integration tests
- When picking unallocated machines in integration tests, ensure that they are unallocated by checking that there is no corresponding `MachineSetNode` resource. The previous check on the `Available` label on the `MachineStatus` resource was inherently racy, as that label is set by a controller asynchronously after a machine is "picked".
- Fix the flake in the TalosUpgradeStatus unit test: it was skipping reconciliation because the `SchematicConfiguration` resource was missing the cluster label, but at the same time it was not failing reliably, as it was not asserting the completion of one upgrade before starting the next one. Fix both issues.
- Fix a crash in TalosUpgradeStatusController - it was failing to read back the `ClusterMachineTalosVersion` resource it just created because it was not yet available in the controller runtime cache. Instead of reading it back after writing, simply return the created resource reference.
Co-authored-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This resource is going to be used to store the saved installation media
presets generated by the UI wizard.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Now that `MachineSetNodes` are no longer owned by the `MachineSetNodeController` and are instead marked with the `managed-by-machine-set-node-controller` label, CLI tools should properly handle that and ignore such `MachineSetNodes` during export and cluster sync.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Remove the flags for turning on SQLite storage for:
- Discovery service state
- Audit logs
- Machine logs
Instead, migrate them unconditionally to SQLite on the next startup.
Remove many flags which are no longer meaningful. Only keep the ones which are required for the migrations.
Additionally, make the `--sqlite-storage-path` flag (or its config counterpart `.storage.sqlite.path`) required with no default value, as a default does not make sense for it in most cases.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>