- Bump some deps, namely cosi-runtime and Talos machinery.
- Update `auditState` to implement the new methods in COSI's `state.State`.
- Bump default Talos and Kubernetes versions to their latest.
- Rekres, which brings Go 1.24.5. Also update it in go.mod files.
- Fix linter errors coming from new linters.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Our unit test logs are excessively verbose because:
- COSI runtime writes a lot of `reconcile succeeded` logs
- migration tests were too spammy
This caused the logs to get clipped by docker/buildx in the CI:
```
[output clipped, log limit 2MiB reached]
```
and resulted in mysterious test failures without the failure reason being printed.
With these changes we pull our unit test logs a bit below the 2MiB limit (1.7MiB).
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This enables test coverage and builds Omni with the race detector.
Also rework the COSI state creation flow: no more callbacks.
The state is now an object with a `Stop` method, which should be
called when the app stops.
Essentially, all defers were moved into the `Stop` method.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Rework the discovery service affiliate deletion by doing the following changes:
1. Add support for arbitrary discovery services (e.g., self-hosted or third party):
- Read the discovery service used by a machine from the machine itself
- Implement a cache for discovery service clients
- Use this discovery service client to remove the affiliate on node removal.
2. Make the discovery affiliate deletion asynchronous:
- Introduce `DiscoveryAffiliateDeleteTask` resource
- When a node is removed from a cluster, a resource for this node ID is created
- A controller repeatedly attempts to remove the affiliate until it succeeds or until the affiliate expires in the discovery service itself (after 30 minutes)
- In either case, the controller then removes the `DiscoveryAffiliateDeleteTask` resource
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
- `NewClusterBootstrapStatusController.TransformExtraOutputFunc` gets `*client.Client` but forgets to close it.
This leads to `*grpc.ClientConn` leakage.
- `MachineTeardownController.resetMachine` gets `*client.Client` but forgets to close it.
This leads to `*grpc.ClientConn` leakage.
- `runWithState` gets `*discovery.Client` but forgets to close it. This leads to `*grpc.ClientConn` leakage.
- `WithClient` gets `*client.Client` but forgets to close it. This leads to `*grpc.ClientConn` leakage.
While at it, shorten the code in some places.
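All four fixes share the same shape: the acquired client must be closed before the function returns, even on error paths. A minimal sketch of that pattern with a stand-in client type (the real code closes `*client.Client` / `*discovery.Client`):

```go
package main

// grpcClient stands in for any client type holding a *grpc.ClientConn
// that must be released.
type grpcClient struct{ closed bool }

func (c *grpcClient) Close() error {
	c.closed = true
	return nil
}

// withClient acquires a client, guarantees Close runs even when the
// callback fails, and surfaces a Close error only if the callback
// itself succeeded.
func withClient(f func(*grpcClient) error) (err error) {
	c := &grpcClient{}

	defer func() {
		if closeErr := c.Close(); closeErr != nil && err == nil {
			err = closeErr
		}
	}()

	return f(c)
}
```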
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This change reworks how the bare-metal infra provider tracks whether Talos is installed on the disk.
Previously, it worked like this:
- Omni, when observing some specific type of events on SideroLink, set the `installed` flag on the dedicated `MachineState` resource to true.
- The provider, after wiping disks of a machine, set that flag to false.
This method went against the "single owner per resource" principle and was not leveraging COSI runtime and controller-based logic. Furthermore, it made the contract between Omni and the provider more complex since it was yet another resource.
Instead, now, we do the following:
- Every time we observe those specific types of events on SideroLink, we increment a counter field on the `infra.Machine` resource.
- When the provider wipes a machine, it persists this counter value at the time of wipe internally.
- To detect whether Talos is installed, the provider compares the internally stored counter value with the value on the `infra.Machine`. Talos is considered "installed" only if the counter on the `infra.Machine` is greater than the stored one, meaning an installation was observed after the last wipe.
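The comparison boils down to a single predicate; a sketch with illustrative names (the actual field and function names in Omni differ):

```go
package main

// talosInstalled reports whether Talos is considered installed: the event
// counter on the infra.Machine must have advanced past the value the
// provider stored when it last wiped the machine.
func talosInstalled(machineCounter, counterAtLastWipe uint64) bool {
	return machineCounter > counterAtLastWipe
}
```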
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Improve exposed service reliability by load-balancing TCP connections across the nodes exposing the service.
Rework the exposed service proxy registry into a COSI controller to simplify the logic and improve reliability and testability.
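The commit does not spell out the balancing strategy; as an illustration only, a round-robin upstream picker for such a TCP load balancer could look like this (names are hypothetical):

```go
package main

import "sync/atomic"

// roundRobin picks upstream nodes for the exposed-service TCP load
// balancer; a minimal sketch, not the actual implementation.
type roundRobin struct {
	next  atomic.Uint64
	nodes []string
}

// pick returns the next node in rotation, or "" when no nodes are known.
func (r *roundRobin) pick() string {
	if len(r.nodes) == 0 {
		return ""
	}

	return r.nodes[int(r.next.Add(1)-1)%len(r.nodes)]
}
```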
Closes siderolabs/omni#396.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fixes: https://github.com/siderolabs/omni/issues/33
It is now possible to get full-access `kubeconfig` and `talosconfig`
(with the operator role) if the Omni instance has the
`enable-break-glass-configs` flag enabled.
They can be downloaded using the CLI commands:
`omnictl kubeconfig --admin --cluster <name>`
`omnictl talosconfig --admin --cluster <name>`
After you download a config, the cluster is marked with the
`omni.sidero.dev/tainted` annotation as a reminder that this cluster
has weaker security and might need secrets rotation in the
future.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Run a discovery service instance inside Omni (enabled by default).
It listens only on the SideroLink interface on port 8093.
Clusters can opt in to use this embedded discovery service instead of `discovery.talos.dev`. It is added as a new cluster feature both on the frontend and in cluster templates.
Closes siderolabs/omni#20.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Make the controller run tasks that collect the machine status from each
machine.
Instead of changing the `MachineStatusSnapshot` directly in the
SideroLink events handler, pass these events to the controller through
a channel, so that all events are handled in the same place.
Whether an event comes from SideroLink or the task runner collects the
machine status, the controller updates the `MachineStatusSnapshot` resource.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
- run rekres and fix nolint directives
- bump deps (keep gen to 0.4.8 for now) for server, client and tests
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
If a `ClusterMachine` is removed, always attempt to remove it from the discovery service.
Use a single discovery service client instead of recreating it every time.
Use the gRPC dial options exposed from siderolabs/discovery-client when connecting to the discovery service.
Closes siderolabs/omni#19.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fixes: https://github.com/siderolabs/omni/issues/143
This is crucial if we want to support SBCs in Omni.
Automatically detect which overlay we need to install when any SBC type
is selected on the backend.
Move some of the filename generation to the backend, as it is now
Talos-version dependent.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
When determining the schematic ID of a machine, compute the ID by gathering the extensions present on the machine instead of reading it from the schematic ID meta-extension. This way, the computed ID will not include the META values, labels, or kernel args.
This is actually the ID we need: when we compare the desired schematic with the actual one during a Talos upgrade, we are only interested in changes to the list of extensions.
This does not cause the kernel args, labels, etc. to disappear, as they are used at installation time and preserved afterward (e.g., during upgrades).
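The idea above — an ID derived only from the extension set, insensitive to ordering and to META values — can be sketched as a hash over the sorted extension list. The real schematic ID computation lives in Talos/image-factory; this is only an illustration:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"strings"
)

// extensionsSchematicID derives an ID from the extension list alone, so
// META values, labels, and kernel args do not affect the comparison of
// desired vs. actual schematics during upgrades.
func extensionsSchematicID(extensions []string) string {
	sorted := append([]string(nil), extensions...)
	sort.Strings(sorted) // order of discovery must not change the ID

	sum := sha256.Sum256([]byte(strings.Join(sorted, "\n")))

	return hex.EncodeToString(sum[:])
}
```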
Additionally:
- Remove the list of extensions from the `Schematic` resource, as it relied on the schematics always being created through Omni. This is not always the case, e.g., when a partial join config is used. Instead, we now read the list of extensions directly from the machine and store it on the `MachineStatus` resource.
- Skip setting the schematic META section entirely if no labels are set on the Download Installation Media screen.
Closes siderolabs/omni#55.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Omni is source-available under BUSL.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-Authored-By: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Co-Authored-By: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Co-Authored-By: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Co-Authored-By: Philipp Sauter <philipp.sauter@siderolabs.com>
Co-Authored-By: Noel Georgi <git@frezbo.dev>
Co-Authored-By: evgeniybryzh <evgeniybryzh@gmail.com>
Co-Authored-By: Tim Jones <tim.jones@siderolabs.com>
Co-Authored-By: Andrew Rynhard <andrew@rynhard.io>
Co-Authored-By: Spencer Smith <spencer.smith@talos-systems.com>
Co-Authored-By: Christian Rolland <christian.rolland@siderolabs.com>
Co-Authored-By: Gerard de Leeuw <gdeleeuw@leeuwit.nl>
Co-Authored-By: Steve Francis <67986293+steverfrancis@users.noreply.github.com>
Co-Authored-By: Volodymyr Mazurets <volodymyrmazureets@gmail.com>