Commit Graph

12 Commits

Author SHA1 Message Date
Artem Chernyshev
3810ccb03f
fix: properly clean up stale Talos gRPC backends
Fixes: https://github.com/siderolabs/omni/issues/432

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-07-01 17:09:36 +03:00
Utku Ozdemir
e9bca13f8f
feat: use tcp loadbalancer for exposed services
Improve the exposed service reliability by using a TCP loadbalancer between the nodes exposing the service.

Rework the exposed service proxy registry as a COSI controller to simplify the logic and improve reliability and testability.

Closes siderolabs/omni#396.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-25 17:28:21 +02:00
Dmitriy Matrenichev
271bb70b12
chore: migrate to oidc v3
Update to latest oidc implementation.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-20 22:55:54 +03:00
Utku Ozdemir
6dcfd4c979
feat: handle all goroutine panics gracefully
Convert goroutine panics to errors or error logs.

Disallow usage of the `golang.org/x/sync/errgroup` package in the backend using the `depguard` linter. This linter configuration depends on: https://github.com/siderolabs/kres/pull/417

Rekres the project to include the feature (also bump Go to 1.22.4), but revert `PROTOBUF_GO_VERSION` and `GRPC_GATEWAY_VERSION` manually to not break the frontend.

Disallowing the named `go` statement is not possible at the moment using existing linters, so an issue was raised in `forbidigo` for it: https://github.com/ashanbrown/forbidigo/issues/47

Closes siderolabs/omni#373.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-20 21:28:12 +02:00
Utku Ozdemir
a2b7b530c9
feat: use the new domain scheme for exposed services
Rework the workload service proxying feature to support the new domain format in addition to the existing one.

Old format:
```
p-g3a4ana-demo.omni.siderolabs.io
```

New format:
```
g3a4ana-demo.proxy-us.omni.siderolabs.io
```

The old format required a new DNS record to be added for each new workload service, causing issues with the resolution on clients. The new format addresses it by leveraging wildcard records.
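
A rough sketch of how a proxy could accept both host formats. The function `serviceKey` is hypothetical, and the detection rules (a `p-` prefix on the first label for the old format, a second label starting with `proxy-` for the new one) are assumptions inferred from the two example host names above, not the actual Omni parsing logic.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceKey is a hypothetical helper that extracts the service-identifying
// part of the first DNS label from an exposed service host name.
// It returns false if the host matches neither assumed format.
func serviceKey(host string) (string, bool) {
	labels := strings.Split(host, ".")
	if len(labels) < 2 || labels[0] == "" {
		return "", false
	}

	first := labels[0]

	// Assumed new format: <key>.proxy-<region>.<base domain>
	if strings.HasPrefix(labels[1], "proxy-") {
		return first, true
	}

	// Assumed old format: p-<key>.<base domain>
	if strings.HasPrefix(first, "p-") && len(first) > len("p-") {
		return strings.TrimPrefix(first, "p-"), true
	}

	return "", false
}

func main() {
	for _, host := range []string{
		"p-g3a4ana-demo.omni.siderolabs.io",
		"g3a4ana-demo.proxy-us.omni.siderolabs.io",
		"omni.siderolabs.io",
	} {
		key, ok := serviceKey(host)
		fmt.Printf("%s -> key=%q ok=%v\n", host, key, ok)
	}
}
```

With these assumptions, both example host names resolve to the same key, which is what lets the old and new formats coexist.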

Additionally, build the full exposed service URL on the backend and make it a field on `ExposedService` resource, so they can be accessed using `omnictl get exposedservice`.

Part of siderolabs/omni#17.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-14 17:29:27 +02:00
Dmitriy Matrenichev
3f75f91608
fix: change Transport.Address field to Transport.Address method
The new gRPC (both the gateway and the modules) uses the `grpc.NewClient` call to create clients.
It no longer supports custom addresses without a `passthrough:` prefix. The previous fix didn't
account for that in some places, so this one changes the structure of `Transport` to always
return the address in the proper form for external users.
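
The idea of replacing a field with a method can be sketched as follows. This `Transport` type is a simplified stand-in, not the real Omni struct: the unexported `address` field and the choice of the `passthrough:///` form are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Transport is a simplified, hypothetical stand-in for the real type.
// The raw address is kept private so callers cannot use it unprefixed.
type Transport struct {
	address string
}

// Address returns the address in a form that selects the passthrough
// resolver, adding the prefix only when it is not already present.
func (t Transport) Address() string {
	if strings.HasPrefix(t.address, "passthrough:") {
		return t.address
	}

	return "passthrough:///" + t.address
}

func main() {
	fmt.Println(Transport{address: "127.0.0.1:8080"}.Address())
	fmt.Println(Transport{address: "passthrough:///127.0.0.1:8080"}.Address())
}
```

Exposing a method instead of a field means every external caller gets the prefixed form, so no call site can forget to add it.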

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-07 19:16:26 +03:00
Utku Ozdemir
331fc31984
feat: run embedded discovery service in Omni
Run a discovery service instance inside Omni (enabled by default).

It listens only on the SideroLink interface on port 8093.

Clusters can opt in to use this embedded discovery service instead of `discovery.talos.dev`. It is added as a new cluster feature both on the frontend and in cluster templates.

Closes siderolabs/omni#20.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-06 01:11:17 +02:00
Artem Chernyshev
ed26122ce0
fix: implement the controller for handling machine status snapshot
Make the controller run tasks that collect the machine status from each
machine.
Instead of changing the `MachineStatusSnapshot` directly in the
siderolink events handler, pass these events to the controller through
a channel, so that all events are handled in the same place.

Whenever an event comes from siderolink or the task runner gets the
machine status, the controller updates the `MachineStatusSnapshot` resource.
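
A simplified sketch of the "single place of handling" idea. All type and field names here are hypothetical, an in-memory map stands in for the `MachineStatusSnapshot` resource, and a fixed event count replaces the context-driven loop a real controller would use.

```go
package main

import "fmt"

// snapshotEvent is a hypothetical event carrying a machine status.
type snapshotEvent struct {
	machineID string
	stage     string
	source    string // "siderolink" or "task"
}

// controller is the only component that writes snapshots. The events
// handler and the task runner just send events on their channels.
type controller struct {
	siderolinkCh <-chan snapshotEvent
	taskCh       <-chan snapshotEvent
	snapshots    map[string]snapshotEvent // stand-in for the resource store
}

// run handles exactly n events and then returns.
func (c *controller) run(n int) {
	for i := 0; i < n; i++ {
		select {
		case ev := <-c.siderolinkCh:
			c.snapshots[ev.machineID] = ev
		case ev := <-c.taskCh:
			c.snapshots[ev.machineID] = ev
		}
	}
}

func main() {
	siderolinkCh := make(chan snapshotEvent, 1)
	taskCh := make(chan snapshotEvent, 1)

	c := &controller{
		siderolinkCh: siderolinkCh,
		taskCh:       taskCh,
		snapshots:    map[string]snapshotEvent{},
	}

	siderolinkCh <- snapshotEvent{machineID: "m1", stage: "booting", source: "siderolink"}
	taskCh <- snapshotEvent{machineID: "m2", stage: "running", source: "task"}

	c.run(2)

	fmt.Println("snapshots stored:", len(c.snapshots))
}
```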

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-06-04 13:59:47 +03:00
Utku Ozdemir
95197e2b07
feat: improve reliability of machine status snapshots
Use the Talos resource API as well as the siderolink event sink to determine the status of a machine.

Follow the agreed decision tree of:
- if the update came over the same channel as before, use it
- if the update came over a different channel than before, and the timestamp is newer than the previous update, use it
- otherwise, drop it

Closes siderolabs/omni#41.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-04-30 17:32:20 +02:00
Utku Ozdemir
176f9d9f57
feat: compute schematic id only from the extensions
When determining the schematic ID of a machine, instead of relying on the schematic ID meta-extension, compute the ID by gathering the extensions on the machine. This way, the computed ID will not contain the META values, labels or the kernel args.

This ID is actually the ID we need, as when we compare the desired schematic with the actual one during a Talos upgrade, we are only interested in the changes in the list of extensions.

This does not cause the kernel args, labels, etc. to disappear, as they are used at installation time and preserved afterward (e.g., during upgrades).
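
To illustrate why an ID derived only from the extension list is stable across changes to labels or kernel args, here is a hedged sketch. The function `extensionsID` and its hashing scheme (SHA-256 over the sorted, newline-joined names) are assumptions for illustration and do not reproduce how Omni or the Image Factory actually compute schematic IDs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// extensionsID is a hypothetical helper that derives a deterministic ID
// from a list of extension names only. Kernel args, labels and META
// values are never part of the input, so they cannot affect the result.
func extensionsID(extensions []string) string {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]string(nil), extensions...)
	sort.Strings(sorted)

	sum := sha256.Sum256([]byte(strings.Join(sorted, "\n")))

	return hex.EncodeToString(sum[:])
}

func main() {
	a := extensionsID([]string{"siderolabs/qemu-guest-agent", "siderolabs/iscsi-tools"})
	b := extensionsID([]string{"siderolabs/iscsi-tools", "siderolabs/qemu-guest-agent"})

	// The order in which extensions are reported does not matter.
	fmt.Println("same ID regardless of order:", a == b)
}
```

Comparing the desired and actual IDs then reduces to a string comparison that changes only when the set of extensions changes.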

Additionally:
- Remove the list of extensions from the `Schematic` resource, as it relied upon the schematics always being created through Omni. This is not always the case, e.g., when a partial join config is used. Therefore, instead of relying on it, we read the list of extensions directly from the machine and store it on the `MachineStatus` resource.
- Skip setting the schematic META section at all if there are no labels set on the Download Installation Media screen.

Closes siderolabs/omni#55.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-03-22 14:58:19 +03:00
Artem Chernyshev
1e4e303c09
feat: implement omnictl support command
Works the same way as `talosctl support` but also grabs some relevant
Omni resources to help with the diagnostics.

Uses `go-talos-support` common module to collect Talos data.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-03-19 14:20:46 +03:00
Andrey Smirnov
dfcbaae7d0
chore: initial commit
Omni is source-available under BUSL.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-Authored-By: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Co-Authored-By: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Co-Authored-By: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Co-Authored-By: Philipp Sauter <philipp.sauter@siderolabs.com>
Co-Authored-By: Noel Georgi <git@frezbo.dev>
Co-Authored-By: evgeniybryzh <evgeniybryzh@gmail.com>
Co-Authored-By: Tim Jones <tim.jones@siderolabs.com>
Co-Authored-By: Andrew Rynhard <andrew@rynhard.io>
Co-Authored-By: Spencer Smith <spencer.smith@talos-systems.com>
Co-Authored-By: Christian Rolland <christian.rolland@siderolabs.com>
Co-Authored-By: Gerard de Leeuw <gdeleeuw@leeuwit.nl>
Co-Authored-By: Steve Francis <67986293+steverfrancis@users.noreply.github.com>
Co-Authored-By: Volodymyr Mazurets <volodymyrmazureets@gmail.com>
2024-02-29 17:19:57 +04:00