All test modules were moved under the `integration` tag and now live in
the `internal/integration` folder: there is no more
`cmd/integration-test` executable.
The new Kres version is able to build the same executable from the tests
directory instead.
All Omni-related flags were renamed, for example `--endpoint` ->
`--omni.endpoint`.
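A minimal sketch of how a prefixed flag can be registered with the standard `flag` package. Only the flag name `omni.endpoint` comes from this change; the `parseOmniFlags` helper, the flag set name, the empty default and the help text are assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseOmniFlags is a hypothetical helper: it registers the namespaced flag
// on its own FlagSet and parses the given arguments.
func parseOmniFlags(args []string) (string, error) {
	fs := flag.NewFlagSet("integration", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output quiet in this sketch

	// Flag name taken from the commit message; default and help text are assumed.
	endpoint := fs.String("omni.endpoint", "", "Omni API endpoint")

	if err := fs.Parse(args); err != nil {
		return "", err
	}

	return *endpoint, nil
}

func main() {
	endpoint, err := parseOmniFlags([]string{"--omni.endpoint=https://omni.example.org"})
	if err != nil {
		panic(err)
	}

	fmt.Println(endpoint)

	// The old, unprefixed name is no longer defined and is rejected.
	_, err = parseOmniFlags([]string{"--endpoint=https://omni.example.org"})
	fmt.Println(err != nil)
}
```

The prefix keeps these flags visually separate from the `test.*` flags that the `testing` package registers in a test binary.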
Two more functional changes:
- Enabled `--test.failfast` for all test runs.
- Removed finalizers, which used to run when a test failed.
Both of these changes should make it easier to understand a test
failure: Talos node logs won't be cluttered with the finalizer tearing
down the cluster.
Fixes: https://github.com/siderolabs/omni/issues/1171
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Maintenance upgrades triggered from the UI were using the wrong schematic ID, causing machines that use UKI to lose their siderolink kernel args and disconnect.
Since the logic to build the correct install image, including the schematic, is complex, move it to a central place.
Add a new management endpoint for maintenance upgrades. The UI now calls this endpoint instead of calling the Talos API directly.
The new endpoint builds the install image correctly using the common logic and issues the upgrade.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
No longer show the infra provider service accounts on the service
accounts page.
Instead, move the infra provider service account creation to the infra
providers page.
They are now always created in a pair with `infra.Provider` resources.
If the `infra.Provider` resource doesn't exist, the provider is not
allowed to connect.
The `infra.Provider` resource also controls the lifecycle of the
`infra.ProviderStatus`, `infra.ProviderHealthStatus` and the service
account created for the provider: when the `infra.Provider` is deleted,
everything is cleaned up.
`infra.ProviderStatus` and `infra.ProviderHealthStatus` are now combined
into `infra.ProviderCombinedStatus`, which represents the current
connection state of the provider: whether it is running and whether it
has any errors.
`infra.ProviderCombinedStatus` resources are now shown on the new page
in the Omni UI.
`omnictl` gained new commands for managing and viewing infra providers in
the system: `omnictl infraprovider [create|delete|renewkey|list]`.
Fixes: https://github.com/siderolabs/omni/issues/894
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
The internal timeout for the provision test is 6 minutes.
With a 4-minute timeout for the whole test run, the run is aborted before
the internal timeout fires, so we miss the actual error report.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Our goal is to ensure that the machines connected to Omni always have a correct machine config.
To do that, push a partial config with an `EventSinkConfig` and a `KmsgLogConfig` to the machines in maintenance mode.
If they have an existing partial config, merge the new documents with it.
Re-apply this partial config each time the machine disconnects and reconnects (siderolink public key changes), as this might mean the machine was rebooted and lost its in-memory partial config.
Note: we do not push a `SideroLinkConfig`, because a machine that is already connected to Omni has its SideroLink configured one way or another. Additionally, we do not want to interfere with or overwrite the join token the machine uses to connect.
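The merge and re-apply steps described above could look roughly like the sketch below. Everything here is hypothetical: the `document` type, merging by document kind, and comparing public keys as strings are assumptions made for illustration, not the actual Omni implementation.

```go
package main

import "fmt"

// document is a hypothetical stand-in for one machine config document.
type document struct {
	Kind string
	Body string
}

// mergeDocuments keeps existing documents, replaces those whose kind is also
// present in incoming, and appends incoming kinds that were missing.
// Merging by kind is an assumption of this sketch.
func mergeDocuments(existing, incoming []document) []document {
	byKind := make(map[string]document, len(incoming))
	for _, doc := range incoming {
		byKind[doc.Kind] = doc
	}

	merged := make([]document, 0, len(existing)+len(incoming))
	seen := make(map[string]bool, len(existing)+len(incoming))

	for _, doc := range existing {
		if replacement, ok := byKind[doc.Kind]; ok {
			doc = replacement
		}

		merged = append(merged, doc)
		seen[doc.Kind] = true
	}

	for _, doc := range incoming {
		if !seen[doc.Kind] {
			merged = append(merged, doc)
			seen[doc.Kind] = true
		}
	}

	return merged
}

// needsReapply reports whether the partial config should be pushed again:
// a changed SideroLink public key suggests the machine may have rebooted.
func needsReapply(lastSeenKey, currentKey string) bool {
	return lastSeenKey != currentKey
}

func main() {
	existing := []document{
		{Kind: "SideroLinkConfig", Body: "kept as-is"},
		{Kind: "EventSinkConfig", Body: "old"},
	}
	incoming := []document{
		{Kind: "EventSinkConfig", Body: "new"},
		{Kind: "KmsgLogConfig", Body: "new"},
	}

	for _, doc := range mergeDocuments(existing, incoming) {
		fmt.Println(doc.Kind, doc.Body)
	}

	fmt.Println(needsReapply("key-a", "key-b"))
}
```

Note that the existing `SideroLinkConfig` document passes through untouched, since it is never part of the pushed documents.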
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Enabled secureboot in Talemu.
Split the e2e-scaling tests: extract the forced removal flows from them.
Changed the test flags to support more complicated machine provision
flows: the config can now be read from a YAML file.
Run secureboot tests on cron only.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Talemu-based tests now set up a `MachineRequestSet` with 30 machines
before the tests and tear it down after.
New blocks validate the creation and deletion of links and machine requests.
Fixes: https://github.com/siderolabs/omni/issues/366
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Currently, there is no way to write to a file using a stream-like API, so we have to concatenate the full log in memory
before we save it to the file system. If this proves to be problematic in practice, we can always redo the logic
using a standard HTTP file download.
Closes #586
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Most of the changes are simplification/refactoring, but there are a few fixes:
* increase LB upstream healthcheck interval to 1 minute
* pass a logger to the LB (as otherwise it creates its own)
* shut down the LB by waiting for it to shut down
* close the LB even when it fails to start to avoid leaking health check goroutines
Additionally, add an integration test for workload proxying.
Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fix several bugs to make the tests pass:
- fix races in the `go-api-signature` library and in the `TalosAPIKeyPrepare`
function.
- fix `MachineSetRequiredMachines` updates when it enters the tearing down
phase: ignore phase conflict errors.
- destroy the correct cluster in the `ScaleUpAndDownMachineClasses`
finalizer.
- support changing tests max parallelism.
- reset restart backoff in `MachineSetNode` and `KubernetesStatus`
controllers.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Correctly handle the retrieval and updates of schematics when a Talos node has secure boot enabled.
When secure boot is enabled, we now:
- Use the correct installer image, `installer-secureboot` instead of `installer`
- Preserve the kernel args in the schematic on install/upgrade instead of stripping them away.
For non-secureboot, we keep everything as-is, to avoid triggering an upgrade of existing nodes.
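A minimal sketch of the branching described above. The two installer names come from this change; the helper functions and the sample kernel arg value are hypothetical and only illustrate the rule.

```go
package main

import "fmt"

// installerName returns the installer image name for a machine.
// Both names are taken from the commit message.
func installerName(secureBoot bool) string {
	if secureBoot {
		return "installer-secureboot"
	}

	return "installer"
}

// schematicKernelArgs returns the kernel args to keep in the schematic.
// With secure boot they are preserved; otherwise they are stripped as before,
// so existing nodes do not see a schematic change.
func schematicKernelArgs(secureBoot bool, args []string) []string {
	if secureBoot {
		return args
	}

	return nil
}

func main() {
	// Hypothetical kernel arg value, for illustration only.
	args := []string{"siderolink.api=grpc://omni.example.org"}

	for _, secureBoot := range []bool{true, false} {
		fmt.Println(installerName(secureBoot), len(schematicKernelArgs(secureBoot, args)))
	}
}
```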
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Make half of the Talos nodes used in the integration tests join Omni with a partial machine config instead of kernel args. This allows us to test more real use cases and catch more issues.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Additionally, add validation for the labels meta arguments in
the `CreateSchematic` API.
Implement an integration test that uses `omnictl` to download the images with labels added.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
When determining the schematic ID of a machine, instead of relying on the schematic ID meta-extension, compute the ID by gathering the extensions on the machine. This way, the resulting ID will not contain the META values, labels or the kernel args.
This ID is actually the ID we need, as when we compare the desired schematic with the actual one during a Talos upgrade, we are only interested in the changes in the list of extensions.
This does not cause the kernel args, labels, etc. to disappear, as they are used at installation time and preserved afterward (e.g., during upgrades).
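The idea of an ID that depends only on the extension list can be sketched as follows. The hashing scheme (SHA-256 over the sorted extension names) and the `machineInfo` type are assumptions chosen for illustration; they are not the actual schematic ID algorithm.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// machineInfo is a hypothetical view of what is known about a machine.
type machineInfo struct {
	Extensions []string
	KernelArgs []string
	Labels     map[string]string
}

// extensionsID derives an ID from the extension list only. Kernel args and
// labels are deliberately ignored, so they cannot affect the comparison.
// The hashing scheme is an assumption of this sketch.
func extensionsID(m machineInfo) string {
	exts := append([]string(nil), m.Extensions...)
	sort.Strings(exts) // make the ID independent of extension order

	sum := sha256.Sum256([]byte(strings.Join(exts, "\n")))

	return hex.EncodeToString(sum[:])
}

func main() {
	a := machineInfo{
		Extensions: []string{"siderolabs/iscsi-tools", "siderolabs/qemu-guest-agent"},
		KernelArgs: []string{"example.arg=1"},
	}
	b := machineInfo{
		Extensions: []string{"siderolabs/qemu-guest-agent", "siderolabs/iscsi-tools"},
		Labels:     map[string]string{"env": "test"},
	}

	// Same extensions, different kernel args and labels: the IDs match.
	fmt.Println(extensionsID(a) == extensionsID(b))
}
```

With such an ID, comparing the desired and actual schematics during an upgrade reduces to comparing extension lists.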
Additionally:
- Remove the list of extensions from the `Schematic` resource, as it relied upon the schematics always being created through Omni. This is not always the case, e.g., when a partial join config is used. Therefore, instead of relying on it, we read the list of extensions directly from the machine and store it on the `MachineStatus` resource.
- Skip setting the schematic META section entirely if there are no labels set on the Download Installation Media screen.
Closes siderolabs/omni#55.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
When a machine has a partial machine config (e.g., `SideroLinkConfig` or `KmsgLogConfig`) while in maintenance mode, detect it and store it on the Omni side.
When this machine has a config applied to it (i.e., when it joins a cluster) or when its config is updated, always preserve these detected configs in addition to the main (`v1alpha1`) config.
This allows machines to join Omni by using partial configs (for example, by using cloud user data) instead of having kernel args for the SideroLink and kernel log sync configuration.
We do not fail in cases where we are not authorized to access the COSI resource API, as full access to the resources over SideroLink is only available in Talos v1.6.5 and above.
Modify the Talos upgrade integration tests to apply a partial config during maintenance mode, and later assert that it was preserved through the upgrade/revert flow.
Additionally, bump the default Talos version and the version used in the integration tests to 1.6.6.
Closes siderolabs/omni#13.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Omni is source-available under BUSL.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-Authored-By: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Co-Authored-By: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Co-Authored-By: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Co-Authored-By: Philipp Sauter <philipp.sauter@siderolabs.com>
Co-Authored-By: Noel Georgi <git@frezbo.dev>
Co-Authored-By: evgeniybryzh <evgeniybryzh@gmail.com>
Co-Authored-By: Tim Jones <tim.jones@siderolabs.com>
Co-Authored-By: Andrew Rynhard <andrew@rynhard.io>
Co-Authored-By: Spencer Smith <spencer.smith@talos-systems.com>
Co-Authored-By: Christian Rolland <christian.rolland@siderolabs.com>
Co-Authored-By: Gerard de Leeuw <gdeleeuw@leeuwit.nl>
Co-Authored-By: Steve Francis <67986293+steverfrancis@users.noreply.github.com>
Co-Authored-By: Volodymyr Mazurets <volodymyrmazureets@gmail.com>