Commit Graph

30 Commits

Author SHA1 Message Date
Artem Chernyshev
c9c4c8e10d
test: use go test to build and run Omni integration tests
All test modules were moved under the `integration` tag and now live in
the `internal/integration` folder: there is no more `cmd/integration-test`
executable.

The new Kres version can build the same executable from the tests
directory instead.

All Omni related flags were renamed, for example `--endpoint` ->
`--omni.endpoint`.

Two more functional changes:

- Enabled `--test.failfast` for all test runs.
- Removed the finalizers that ran when a test failed.

Both of these changes should make it easier to understand the test
failure: Talos node logs won't be cluttered with the finalizer tearing
down the cluster.

Fixes: https://github.com/siderolabs/omni/issues/1171

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-06-03 15:07:00 +03:00
Artem Chernyshev
d88bb1df06
test: use latest Talemu infra provider version in the integration tests
The latest Talemu was updated to support the new infra provider schema.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-05-22 23:41:12 +03:00
Utku Ozdemir
7c17ed6cf8
fix: use the correct schematic ID for maintenance upgrades
Maintenance upgrades triggered from the UI were using the wrong schematic ID, causing the machines which use UKI to lose siderolink kernel args and disconnect.

Since the logic to build the correct install image, including the schematic, is complex, move it to a central place.

Add a new management endpoint for the maintenance upgrades. UI now calls this endpoint instead of calling the Talos API directly.
The new endpoint builds the install image correctly using the common logic and issues the upgrade.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2025-05-22 20:10:37 +02:00
Artem Chernyshev
0020ee3d79
feat: allow managing infra providers in the UI
No longer show the infra provider service accounts on the service
accounts page.
Instead, move the infra provider service account creation to the infra
providers page.
They are now always created together with `infra.Provider` resources.

If the `infra.Provider` resource doesn't exist, the provider is not allowed
to connect.
The `infra.Provider` resource also controls the lifecycle of the
`infra.ProviderStatus`, `infra.ProviderHealthStatus` and the service
account created for the provider: when the `infra.Provider` is deleted,
everything is cleaned up.

`infra.ProviderStatus` and `infra.ProviderHealthStatus` are now combined
into `infra.ProviderCombinedStatus`, which represents the current
connection state of the provider: whether it is running and whether it
has any errors.

`infra.ProviderCombinedStatus` resources are now shown on the new page
in the Omni UI.

`omnictl` got new commands for managing and viewing infra providers in
the system: `omnictl infraprovider [create|delete|renewkey|list]`.

Fixes: https://github.com/siderolabs/omni/issues/894

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-05-22 17:11:32 +03:00
Utku Ozdemir
dc753f4e75
test: bump Talos version used in integration tests to v1.10
Test 1.10 before release.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2025-05-14 20:58:46 +03:00
Artem Chernyshev
029ec05600
test: bump Talemu tests timeout
The internal timeout for the provision test is 6 minutes.
With a 4-minute timeout for the whole test run, we miss the actual error
report.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-05-06 13:53:07 +03:00
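The timeout mismatch described in the commit above can be illustrated with a minimal sketch. It is not taken from the Omni test suite: `provision`, `run_suite`, and `ProvisionError` are hypothetical names, and the durations are scaled down from minutes to fractions of a second. It only shows why an outer deadline that is shorter than the inner one hides the inner, more descriptive error.

```python
import asyncio


class ProvisionError(Exception):
    """Descriptive error raised by the (hypothetical) provision step."""


async def provision(internal_timeout: float) -> None:
    # Simulates a provision step that never completes on its own and
    # reports a descriptive error once its internal timeout expires.
    try:
        await asyncio.wait_for(asyncio.sleep(3600), timeout=internal_timeout)
    except asyncio.TimeoutError:
        raise ProvisionError("machines were not provisioned in time") from None


async def run_suite(suite_timeout: float, internal_timeout: float) -> str:
    # The whole run is bounded by suite_timeout. If that fires first, the
    # provision step is cancelled before it can report anything useful.
    try:
        await asyncio.wait_for(provision(internal_timeout), timeout=suite_timeout)
    except asyncio.TimeoutError:
        return "suite timed out (no details)"
    except ProvisionError as err:
        return f"provision failed: {err}"
    return "ok"


if __name__ == "__main__":
    # Outer timeout shorter than the inner one: the real error is lost.
    print(asyncio.run(run_suite(suite_timeout=0.05, internal_timeout=1.0)))
    # Outer timeout longer than the inner one: the real error is reported.
    print(asyncio.run(run_suite(suite_timeout=1.0, internal_timeout=0.05)))
```

Raising the outer timeout above the inner one, as the commit does, corresponds to the second call.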
Utku Ozdemir
b6563c2d21
chore: bump default Talos version to 1.9.5, Kubernetes version to 1.32.3
Bump to latest versions.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2025-03-20 11:22:46 +01:00
Utku Ozdemir
6a807c12ef
feat: push a partial config to machines in maintenance mode
Our goal is to ensure that the machines connected to Omni always have a correct machine config.

To do that, push a partial config with an `EventSinkConfig` and a `KmsgLogConfig` to the machines in maintenance mode.

If they have an existing partial config, merge the new documents with it.

Re-apply this partial config each time the machine disconnects and reconnects (siderolink public key changes), as this might mean the machine was rebooted and lost its in-memory partial config.

Note: we do not push a `SideroLinkConfig`, because a machine that is already connected to Omni has its SideroLink configured one way or the other. Additionally, we do not want to interfere with or overwrite the join token the machine uses to connect.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2025-02-20 23:21:59 +01:00
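The re-apply rule from the commit above can be sketched as follows. This is an illustrative model only, not Omni's controller code: `PartialConfigPusher`, `reconcile`, and the recorded `pushes` list are hypothetical, and the actual push to the machine is replaced by appending to a list. The idea is to remember the SideroLink public key the config was last applied for and to push again whenever that key changes.

```python
from dataclasses import dataclass, field


@dataclass
class PartialConfigPusher:
    # Public key the partial config was last applied for, per machine ID.
    applied_for_key: dict = field(default_factory=dict)
    # Record of (machine_id, public_key) pushes, standing in for real API calls.
    pushes: list = field(default_factory=list)

    def reconcile(self, machine_id: str, public_key: str, in_maintenance: bool) -> bool:
        """Push the partial config if needed; return True when a push happened."""
        if not in_maintenance:
            # Only machines in maintenance mode receive the partial config.
            return False
        if self.applied_for_key.get(machine_id) == public_key:
            # Same key as last time: the machine has not reconnected.
            return False
        # New or changed key: the machine may have rebooted and lost the
        # in-memory config, so apply it again.
        self.pushes.append((machine_id, public_key))
        self.applied_for_key[machine_id] = public_key
        return True


if __name__ == "__main__":
    pusher = PartialConfigPusher()
    print(pusher.reconcile("machine-1", "key-a", in_maintenance=True))
    print(pusher.reconcile("machine-1", "key-a", in_maintenance=True))
    print(pusher.reconcile("machine-1", "key-b", in_maintenance=True))
```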
Artem Chernyshev
ed946b30a6
feat: display OMNI_ENDPOINT in the service account creation UI
Fixes: https://github.com/siderolabs/omni/issues/858

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-01-29 15:27:36 +03:00
Artem Chernyshev
2a2c648141
feat: bump default Talos version to 1.9.1, Kubernetes to 1.32.0
Also bump Talos machinery version to 1.9.1.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-01-16 20:45:53 +03:00
Artem Chernyshev
900987bf51
test: disable secure boot in e2e tests
Enabled secureboot in Talemu.
Split e2e-scaling tests: extract forced removal flows from it.
Changed the test flags to support more complicated machine provision
flows: the tests can now read the config from a YAML file.

Run secureboot tests on cron only.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-10-31 02:11:38 +03:00
Artem Chernyshev
b3dc48ad33
chore: bump dependencies
Bump Go and JS dependencies and container images.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-10-22 20:07:26 +03:00
Utku Ozdemir
423f729400
chore: bump default versions: Talos 1.7.6, Kubernetes 1.30.5
Bump them to the latest and greatest versions.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-09-16 12:01:07 +02:00
Artem Chernyshev
81e08eb38b
test: run infra integration tests against Talemu provider
Talemu-based tests now set up a `MachineRequestSet` with 30 machines
before the tests and tear it down after.

New blocks validate links and machine request creation and deletion.

Fixes: https://github.com/siderolabs/omni/issues/366

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-09-12 15:04:25 +03:00
Dmitriy Matrenichev
76ba670121
chore: allow users with admin role to download audit log from UI
Currently, there is no way to write to a file using a stream-like API, so we have to concatenate the full log in memory
before we save it to the file system. If this proves to be problematic in practice, we can always redo the logic
using a standard HTTP file download.

Closes #586

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-28 18:10:20 +03:00
Andrey Smirnov
dcd123d333
fix: workload service reconciler
Most of the code is simplifying/refactoring, but there are a few fixes:

* increase LB upstream healthcheck interval to 1 minute
* pass a logger to the LB (as otherwise it creates its own)
* shut down the LB by waiting for it to shut down
* close the LB even when it fails to start to avoid leaking health check goroutines

Additionally, add an integration test for workload proxying.

Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-08-19 00:40:32 +02:00
Artem Chernyshev
60355b61be
test: run prometheus in tests and check metrics after talemu tests
Make the tests fail if the metrics do not meet the expected thresholds.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-07-29 17:45:34 +03:00
Artem Chernyshev
d76f8bdf59
test: enable Talemu tests
Fix several bugs to make the tests pass:

- fix races in `go-api-signature` library and in the TalosAPIKeyPrepare
  function.
- fix `MachineSetRequiredMachines` updates when it enters tearing down
  phase: ignore phase conflict errors.
- destroy the correct cluster in the `ScaleUpAndDownMachineClasses`
  finalizer.
- support changing tests max parallelism.
- reset restart backoff in `MachineSetNode` and `KubernetesStatus`
  controllers.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-07-24 18:16:18 +03:00
Artem Chernyshev
22e3acf2ea
chore: bump default Talos version to 1.7.4
Also update it in the integration tests.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-05-31 12:18:54 +03:00
Utku Ozdemir
55afa59033
feat: add secure boot support
Correctly handle the retrieval and updates of schematics when a Talos node has secure boot enabled.

When secure boot is enabled, we now
- Use the correct installer image, `installer-secureboot` instead of `installer`
- Preserve the kernel args in the schematic on install/upgrade instead of stripping them away.

For non-secureboot, we keep everything as-is, to avoid triggering an upgrade of existing nodes.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-05-27 16:02:44 +02:00
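The two secure boot rules from the commit above can be sketched as follows. This is a simplified illustration, not Omni's implementation: the function names, the dictionary keys `extensions` and `kernel_args`, and the `<registry>/<installer name>/<schematic ID>:<version>` image layout with the `factory.talos.dev` default are assumptions made for the example.

```python
def installer_image(schematic_id: str, talos_version: str, secure_boot: bool,
                    registry: str = "factory.talos.dev") -> str:
    # Secure boot machines use the `installer-secureboot` image,
    # all other machines keep using `installer`.
    name = "installer-secureboot" if secure_boot else "installer"
    return f"{registry}/{name}/{schematic_id}:{talos_version}"


def schematic_for_install(schematic: dict, secure_boot: bool) -> dict:
    # With secure boot, the kernel args stay in the schematic on
    # install/upgrade. Without it, behavior is unchanged: they are stripped.
    result = {"extensions": list(schematic.get("extensions", []))}
    if secure_boot:
        result["kernel_args"] = list(schematic.get("kernel_args", []))
    return result


if __name__ == "__main__":
    # "abc123" and the extension name are placeholder values.
    schematic = {"extensions": ["example-extension"], "kernel_args": ["siderolink.api=..."]}
    print(installer_image("abc123", "v1.7.4", secure_boot=True))
    print(schematic_for_install(schematic, secure_boot=False))
```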
Artem Chernyshev
0aa16dbd83
chore: update Talos to 1.7.2 in the tests
Bump machinery to 1.7.2.
Enable partial machine config tests.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-05-22 12:15:20 +03:00
Artem Chernyshev
041a4364c1
feat: unify admin settings under Settings page
Use two tabs on that page: `Users`, `Backups`.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-05-08 19:35:12 +03:00
Artem Chernyshev
f40c55293d
chore: use new Auth0 app for CI
Store client ID and domain encrypted.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-05-06 12:57:23 +03:00
Utku Ozdemir
fbe196e6e9
test: use Talos nodes with partial config in integration tests
Make half of the Talos nodes used in the integration tests use a partial machine config to join Omni instead of kernel args. This allows us to test more real use cases and catch more issues.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-04-28 02:17:30 +02:00
Artem Chernyshev
340d078571
fix: use correct labels struct in the download installation media cmd
Additionally, add validation for the labels meta arguments in
the `CreateSchematic` API.
Implement an integration test that uses `omnictl` to download the images with labels added.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-04-18 20:35:50 +03:00
Utku Ozdemir
176f9d9f57
feat: compute schematic id only from the extensions
When determining the schematic ID of a machine, instead of relying on the schematic ID meta-extension, compute the ID by gathering the extensions on the machine. This way, the schematic ID will not contain the META values, labels, or kernel args.

This ID is actually the ID we need, as when we compare the desired schematic with the actual one during a Talos upgrade, we are only interested in the changes in the list of extensions.

This does not cause the kernel args, labels, etc. to disappear, as they are used at installation time and preserved afterward (e.g., during upgrades).

Additionally:
- Remove the list of extensions from the `Schematic` resource, as it relied upon the schematics always being created through Omni. This is not always the case, e.g., when a partial join config is used. Therefore, instead of relying on it, we read the list of extensions directly from the machine and store it on the `MachineStatus` resource.
- Skip setting the schematic META section at all if there are no labels set on the Download Installation Media screen.

Closes siderolabs/omni#55.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-03-22 14:58:19 +03:00
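The idea in the commit above, an ID that depends only on the list of extensions, can be sketched as follows. This is not the actual ID derivation used by Omni or the Image Factory: hashing a sorted JSON list with SHA-256 is an assumption chosen for illustration, and both function names are hypothetical. The point is that kernel args, labels, and META values never enter the hash, so only a change in extensions produces a different ID.

```python
import hashlib
import json


def extensions_schematic_id(extensions) -> str:
    # Canonicalize: de-duplicate and sort so the order in which extensions
    # are reported by the machine does not affect the ID.
    canonical = json.dumps(sorted(set(extensions)), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_extension_update(desired_extensions, machine_extensions) -> bool:
    # During an upgrade, only a difference in the extension lists matters.
    return extensions_schematic_id(desired_extensions) != extensions_schematic_id(machine_extensions)


if __name__ == "__main__":
    # Placeholder extension names.
    on_machine = ["ext-b", "ext-a"]
    desired = ["ext-a", "ext-b"]
    print(needs_extension_update(desired, on_machine))
    print(needs_extension_update(desired + ["ext-c"], on_machine))
```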
Utku Ozdemir
8173377c12
feat: preserve maintenance machine configs
When a machine has a partial machine config (e.g., `SideroLinkConfig` or `KmsgLogConfig`) while being in maintenance mode, detect and store it on Omni side.

When this machine has a config applied to it (i.e., when it joins a cluster) or when its config is updated, always preserve these detected configs in addition to the main (`v1alpha1`) config.

This allows machines to join Omni by using partial configs (for example, by using cloud user data) instead of having kernel args for the SideroLink and kernel log sync configuration.

We do not fail in the cases where we are not authorized to access the COSI resource API, as the full access to the resources over SideroLink is only available in Talos v1.6.5 and above.

Modify the Talos upgrade integration tests to apply a partial config during maintenance mode, and later assert that it was preserved through the upgrade/revert flow.

Additionally, bump the default Talos version and the version used in the integration tests to 1.6.6.

Closes siderolabs/omni#13.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-03-13 09:05:17 +01:00
Noel Georgi
0960100f11
chore: drop integration binary from releases
Drop integration binary from releases.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-03-08 15:01:44 +05:30
Andrey Smirnov
69dba26ece
fix: redirect omni feedback to omni repo
Now that `omni` repo is public we can use it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-29 18:28:58 +04:00
Andrey Smirnov
dfcbaae7d0
chore: initial commit
Omni is source-available under BUSL.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-Authored-By: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Co-Authored-By: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Co-Authored-By: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Co-Authored-By: Philipp Sauter <philipp.sauter@siderolabs.com>
Co-Authored-By: Noel Georgi <git@frezbo.dev>
Co-Authored-By: evgeniybryzh <evgeniybryzh@gmail.com>
Co-Authored-By: Tim Jones <tim.jones@siderolabs.com>
Co-Authored-By: Andrew Rynhard <andrew@rynhard.io>
Co-Authored-By: Spencer Smith <spencer.smith@talos-systems.com>
Co-Authored-By: Christian Rolland <christian.rolland@siderolabs.com>
Co-Authored-By: Gerard de Leeuw <gdeleeuw@leeuwit.nl>
Co-Authored-By: Steve Francis <67986293+steverfrancis@users.noreply.github.com>
Co-Authored-By: Volodymyr Mazurets <volodymyrmazureets@gmail.com>
2024-02-29 17:19:57 +04:00