Commit Graph

34 Commits

Author SHA1 Message Date
Artem Chernyshev
b0f7634310
feat: implement the API for reading resources and their dependency graph
Another tool can then represent it in a readable way.
This will be a good educational tool to start with, and we can add
monitoring there later.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-07-21 20:50:47 +03:00
Artem Chernyshev
c097b5f14d
fix: do not try running debug server in the prod builds
Otherwise the debug server stops, returning `nil`, and that stops the
entire Omni instance.
Also refactor the fns list in `server.go` to make it clearer which
subsystem stopped first. Without that, it is hard to understand
what caused Omni to stop.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-06-19 15:17:58 +03:00
Artem Chernyshev
122b79605f
test: run Omni as part of integration tests
This enables test coverage and builds Omni with the race detector.

Also rework the COSI state creation flow: no more callbacks.
The state is now an object with a `Stop` method that should be
called when the app stops.
Essentially, all defers were moved into the `Stop` method.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-06-18 16:20:11 +03:00
Artem Chernyshev
493d00ca54
fix: properly support --config-path argument
Changes:
- Change how the config is merged: first read the config from
  the file if it is set, merge it with the defaults, then init the rest
  of the command line flags using the merged defaults + config file as the
  base.
- Make some config fields pointers to change how they are merged.
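
The merge order above can be sketched roughly as follows. This is an illustration under stated assumptions, not Omni's config code: the `Config` type, its two fields, the `merge` and `parseFlags` helpers, and the flag names are all hypothetical. Pointer fields let the merge tell "not set in the file" apart from an explicit zero value such as `false`.

```go
package main

import (
	"flag"
	"fmt"
)

// Config is a hypothetical two-field config. A nil pointer means
// "not set", which a plain bool or string cannot express.
type Config struct {
	Name  *string
	Debug *bool
}

func ptr[T any](v T) *T { return &v }

// merge overlays the fields that are set in file on top of defaults.
func merge(defaults, file Config) Config {
	out := defaults

	if file.Name != nil {
		out.Name = file.Name
	}

	if file.Debug != nil {
		out.Debug = file.Debug
	}

	return out
}

// parseFlags registers flags whose default values come from the merged
// config (base must have every field set). A flag given on the command
// line wins; an absent flag keeps the merged value.
func parseFlags(base Config, args []string) (Config, error) {
	fs := flag.NewFlagSet("config-sketch", flag.ContinueOnError)
	name := fs.String("name", *base.Name, "instance name")
	debug := fs.Bool("debug", *base.Debug, "enable debug mode")

	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}

	return Config{Name: name, Debug: debug}, nil
}

func main() {
	defaults := Config{Name: ptr("default"), Debug: ptr(true)}
	file := Config{Debug: ptr(false)} // the file sets only Debug, explicitly to false

	cfg, err := parseFlags(merge(defaults, file), []string{"--name=cli"})
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Println(*cfg.Name, *cfg.Debug)
}
```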

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-06-17 17:14:04 +03:00
Artem Chernyshev
ccd55cc8fb
feat: rewrite Omni config management
Omni can now be configured via a config file instead of the command line
flags.
The `--config-path` flag now reads a config provided in YAML
format.
The config structure was completely changed. It was not public before,
so it's fine to ignore backward compatibility.
The command line flags were not changed.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-06-09 14:44:29 +03:00
Artem Chernyshev
3c55a0b0bf
fix: do not allow http[s] urls in the redirect query
This mainly affects the exposed services flow.

Instead of putting the raw URL there, sign the redirect URL on the backend.
When the frontend needs to follow the redirect, it calls the newly implemented API on the server,
which then redirects to the correct URL if the signature is valid.
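
The commit does not say how the URL is signed. As a sketch of one possible approach, assuming an HMAC-SHA256 signature with a server-side key, signing and verification could look like this. The `sign` and `verify` helpers, the key, and the URLs are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns a hex-encoded HMAC-SHA256 of the redirect URL.
// Using HMAC here is an assumption; the actual scheme is not stated.
func sign(key []byte, redirectURL string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(redirectURL))

	return hex.EncodeToString(mac.Sum(nil))
}

// verify reports whether signature matches redirectURL under key,
// using a constant-time comparison.
func verify(key []byte, redirectURL, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}

	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(redirectURL))

	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	key := []byte("server-side-secret") // placeholder key
	target := "https://service.example.com/"
	sig := sign(key, target)

	fmt.Println(verify(key, target, sig))                      // signature matches
	fmt.Println(verify(key, "https://evil.example.com/", sig)) // tampered URL
}
```

Only the backend holds the key, so the frontend cannot be tricked into following a redirect URL that the backend did not issue.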

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-03-19 17:27:30 +03:00
Artem Chernyshev
ed946b30a6
feat: display OMNI_ENDPOINT in the service account creation UI
Fixes: https://github.com/siderolabs/omni/issues/858

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-01-29 15:27:36 +03:00
Utku Ozdemir
fd888ab190
refactor: track infra machine install status via a counter
With this change, we change the way we track whether Talos is installed on the disk or not in the bare-metal infra provider.

Previously, it worked like the following:
- Omni, when observing some specific type of events on SideroLink, set the `installed` flag on the dedicated `MachineState` resource to true.
- The provider, after wiping disks of a machine, set that flag to false.

This method went against the "single owner per resource" principle and was not leveraging COSI runtime and controller-based logic. Furthermore, it made the contract between Omni and the provider more complex since it was yet another resource.

Instead, now, we do the following:
- Every time we observe those specific types of events on SideroLink, we increment a counter field on the `infra.Machine` resource.
- When the provider wipes a machine, it persists this counter value at the time of wipe internally.
- To detect whether Talos is installed or not, the provider compares the internally stored counter value vs the value on the `infra.Machine`. It is "installed" only if the counter value on the `infra.Machine` is bigger than the internally stored one (it means we observed an installation after the last wipe).
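
The comparison described above can be illustrated with a small sketch. The `installed` function and the variable names are hypothetical; only the "greater than the stored value" rule comes from the description.

```go
package main

import "fmt"

// installed reports whether an installation was observed after the last
// wipe: the counter on the machine resource must be strictly greater than
// the value the provider stored when it last wiped the machine.
func installed(machineCounter, counterAtLastWipe uint64) bool {
	return machineCounter > counterAtLastWipe
}

func main() {
	var onMachine, atWipe uint64

	onMachine++ // an install event is observed on SideroLink
	fmt.Println(installed(onMachine, atWipe))

	atWipe = onMachine // the provider wipes the machine and stores the counter
	fmt.Println(installed(onMachine, atWipe))

	onMachine++ // a new installation is observed after the wipe
	fmt.Println(installed(onMachine, atWipe))
}
```

Each side writes only its own value, which keeps a single owner per resource.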

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2025-01-27 05:11:30 +01:00
Utku Ozdemir
5a26d4c7ac
feat: add resources and controllers for bare metal infra provider
Introduce the concept of "static" infra providers, e.g., bare-metal infra provider, which manage a static set of machines contrary to the "regular" infra providers.

Add the following resources:

- `infra.Machine`: similar to `MachineRequest`, lives in the `infra-provider` namespace, serving as the input of the owning static provider. It is created in the `MachineController` if there is a SideroLink connection with the static provider ID. Regular flow of `Machine` creation is blocked, until this `infra.Machine` is accepted.

- `infra.MachineStatus`: similar to `MachineRequestStatus`, lives in the `infra-provider` namespace, serving as the output of the owning static provider. Its lifecycle must be bound to the corresponding `infra.Machine`.

- `infra.MachineState`: a resource that is supposed to be shared by Omni and bare-metal provider bi-directionally - they both can read from and write to it. It is currently used to mark the machine as installed when we observe an installation (through `SequenceEvent`s in the event sink), and to mark it as non-installed after we wipe it in the provider.

- `omni.InfraMachineConfig`: a user-managed resource to mark the `infra.Machine`s as accepted or set their desired power state. The acceptance information is then propagated to the `infra.Machine` resource. A machine which was already accepted cannot be unaccepted (checked by a validation), and this resource can only be removed when the `siderolink.Link` for the matching infra machine is removed.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-11-27 00:00:20 +01:00
Artem Chernyshev
c754cdc0d7
feat: support insecure localhost infra provider access mode
Also:
- support generating the initial service account and dumping its key
  somewhere.
- support running omni integration tests against the production build of
  Omni using a service account.
- enable `omni-integration-test` image.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-10-18 17:29:57 +03:00
Dmitriy Matrenichev
4084b6e9d7
fix: get proper IP from peer metadata
Currently, if the gateway encounters an existing X-Forwarded-For header, it appends the peer address that it sees to the existing value, separated by a comma.
Our IP extraction function didn't account for that, so it failed to parse the IP and fell back to the original `peer.address`, which is
set deep down in the gRPC middleware.

This commit ensures that we try to split the string value using `,`.
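
A rough sketch of the splitting step, not the actual Omni function: it parses every comma-separated entry and leaves the choice of entry to the caller, since the commit does not say which one is used. The `splitForwardedFor` name and the sample addresses are hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// splitForwardedFor splits a possibly comma-separated X-Forwarded-For
// value and returns the entries that parse as IP addresses, in order.
// Entries that do not parse are skipped.
func splitForwardedFor(value string) []netip.Addr {
	var addrs []netip.Addr

	for _, part := range strings.Split(value, ",") {
		addr, err := netip.ParseAddr(strings.TrimSpace(part))
		if err != nil {
			continue
		}

		addrs = append(addrs, addr)
	}

	return addrs
}

func main() {
	// Documentation-range addresses used as placeholders.
	fmt.Println(splitForwardedFor("203.0.113.7, 198.51.100.23"))
}
```

Which entry to trust depends on the proxy chain in front of the server.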

Closes #668

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-10-09 13:27:51 +03:00
Artem Chernyshev
d547889b7b
fix: filter requests in the infra provision controller
Make each controller process only resources labeled with its provider
ID.
Allow overriding gRPC tunnel options for the machine classes/request
sets.
Expose join configs to the infra providers.

Also publish Omni integration tests as part of releases.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-10-08 18:41:02 +03:00
Dmitriy Matrenichev
23a4092af5
chore: refactor code
- Redo `backend.Server.Run` so it's easier to reason about.
- Upgrade `math/rand` to `math/rand/v2`
- Remove `resettable` package.
- Add `xcontext` package.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-10-08 14:01:38 +03:00
Dmitriy Matrenichev
bfe036e136
chore: allow to specify start and end time for audit-log
This commit allows us to specify the `start` and `end` time for the `audit-log` command. If not specified,
Omni will use current time minus thirty days to get audit logs.

Example:

```bash
omnictl audit-log 2024-08-26 2024-08-27
{"event_type":"create","resource_type":"PublicKeys.omni.sidero.dev","event_ts":1724767441119,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36","ip_address":"188.186.141.156","user_id":"3b470fcd-4170-420e-94f8-0ea03180ec35","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"b07755c2aaf099923182014e05634d017649a42d","public_key_expiration":1724795641}}}
{"event_type":"update","resource_type":"PublicKeys.omni.sidero.dev","event_ts":1724767441762,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36","ip_address":"188.186.141.156","user_id":"3b470fcd-4170-420e-94f8-0ea03180ec35","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"b07755c2aaf099923182014e05634d017649a42d","confirmation_type":"auth0","public_key_expiration":1724795641}}}
{"event_type":"destroy","resource_type":"PublicKeys.omni.sidero.dev","event_ts":1724796226583,"event_data":{"session":{"user_agent":"Omni-Internal-Agent","fingerprint":"b07755c2aaf099923182014e05634d017649a42d"}}}
```

The command passes time directly to the server to avoid any timezone issues.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-28 22:41:05 +03:00
Artem Chernyshev
a32a6fa44b
feat: reload TLS certs without restart
Omni now watches gRPC/HTTP TLS cert files for changes and loads them
without restart.

Fixes: https://github.com/siderolabs/omni/issues/508

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-08-27 14:56:55 +03:00
Dmitriy Matrenichev
bf188e4ac1
chore: implement audit log reader
Implement audit log reader functionality and `audit-log` command in `omnictl`.

Closes #578

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-23 00:48:17 +03:00
Dmitriy Matrenichev
5d48547c7f
chore: use range-over-func iterators for resource iteration
Bump to Go 1.23 and use new iterator mechanism. Also fix new linter issues.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-22 01:20:55 +03:00
Dmitriy Matrenichev
cbfe7c9d9f
chore: add periodic cleanup of old log files
Delete files older than 30 days. Also bump minimal Go version to 1.23 and fix lint issues.

Closes #546

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-20 13:55:22 +03:00
Dmitriy Matrenichev
99f93179bd
chore: implement audit log for several types
This commit implements session tracking and audit logging for the following types:
- [x] auth.PublicKey
- [x] auth.AccessPolicy
- [x] auth.User
- [x] auth.Identity
- [x] omni.Machine
- [x] omni.MachineLabels
- [x] omni.Cluster
- [x] omni.MachineSet (only empty owners for update, log create and delete in all cases)
- [x] omni.MachineSetNode (only empty owners for update, log create and delete in all cases)
- [x] omni.ConfigPatch
- [x] Talos API Access
- [x] Kubernetes API access

Output example:

```
{"event_type":"update","resource_type":"Machines.omni.sidero.dev","event_ts":1723137771180,"event_data":{"session":{"user_agent":"Omni-Internal-Agent"},"machine":{"id":"18cec051-d975-483d-8d43-10ac6421648a","is_connected":true,"management_address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd","labels":{"omni.sidero.dev/address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd"}}}}
{"event_type":"update","resource_type":"Machines.omni.sidero.dev","event_ts":1723137771180,"event_data":{"session":{"user_agent":"Omni-Internal-Agent"},"machine":{"id":"18cec051-d975-483d-8d43-10ac6421648a","is_connected":true,"management_address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd","labels":{"omni.sidero.dev/address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd"}}}}
{"event_type":"update","resource_type":"Machines.omni.sidero.dev","event_ts":1723137771181,"event_data":{"session":{"user_agent":"Omni-Internal-Agent"},"machine":{"id":"18cec051-d975-483d-8d43-10ac6421648a","is_connected":true,"management_address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd","labels":{"omni.sidero.dev/address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd"}}}}
{"event_type":"create","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137787549,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":""}}}}
{"event_type":"update","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137787553,"event_data":{"session":{"user_agent":"Omni-Internal-Agent"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":""}}}}
{"event_type":"update","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137811532,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":"","333":""}}}}
{"event_type":"update","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137811610,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":"","333":""}}}}
{"event_type":"update","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137811611,"event_data":{"session":{"user_agent":"Omni-Internal-Agent"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":"","333":""}}}}
{"event_type":"destroy","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137811621,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":"","333":""}}}}
{"event_type":"create","resource_type":"Users.omni.sidero.dev","event_ts":1723141793888,"event_data":{"new_user":{"role":"Admin","id":"7903a72c-87af-43b8-94dc-82bd961ab768"},"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"}}}
{"event_type":"create","resource_type":"Identities.omni.sidero.dev","event_ts":1723141793981,"event_data":{"new_user":{"id":"7903a72c-87af-43b8-94dc-82bd961ab768","email":"some-user-email@email.com"},"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"}}}
```

Closes #37

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-12 15:36:55 +03:00
Dmitriy Matrenichev
d194d59be8
feat: implement audit log
This PR implements audit logs. To enable them, set the `--audit-log-dir` flag
to a directory where the audit logs will be stored. The audit logs are stored in JSON format.

Example:
```json
{"event_type":"update","resource_type":"PublicKeys.omni.sidero.dev","event_ts":1722537710182,"event_data":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"a19a7a38-1793-4262-a9ef-97bc00c7a155","role":"Admin","email":"useremail@userdomain.com","confirmation_type":"auth0","fingerprint":"15acb974f769bdccd38a4b28f282b78736b80bc7","public_key_expiration":1722565909}}
```

Keep in mind that `event_ts` is in milliseconds, not seconds.
The `event_data` field contains all relevant information about the event.
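
For readers consuming these lines, a small sketch of decoding one entry and converting `event_ts` from milliseconds. The `auditEvent` struct and `parseEvent` are hypothetical and model only the top-level fields visible in the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// auditEvent models only the top-level fields visible in the example line.
type auditEvent struct {
	EventType    string          `json:"event_type"`
	ResourceType string          `json:"resource_type"`
	EventTS      int64           `json:"event_ts"` // Unix time in milliseconds
	EventData    json.RawMessage `json:"event_data"`
}

// parseEvent decodes one JSON line and converts event_ts to a time.Time.
func parseEvent(line []byte) (auditEvent, time.Time, error) {
	var ev auditEvent

	if err := json.Unmarshal(line, &ev); err != nil {
		return auditEvent{}, time.Time{}, err
	}

	// time.UnixMilli, not time.Unix: the value is in milliseconds.
	return ev, time.UnixMilli(ev.EventTS).UTC(), nil
}

func main() {
	line := []byte(`{"event_type":"update","resource_type":"PublicKeys.omni.sidero.dev","event_ts":1722537710182,"event_data":{}}`)

	ev, ts, err := parseEvent(line)
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Println(ev.EventType, ev.ResourceType, ts.Format(time.RFC3339Nano))
}
```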

To enable it in the development environment, add the
`--audit-log-dir /tmp/omni-data/audit-logs` line to `docker-compose.override.yml`
or run `generate-certs` again.

For #37

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-02 03:15:31 +03:00
Dmitriy Matrenichev
5dd52593ee
chore: add rotating log for audit data
Add a rotating audit log writer, plus minor improvements.

For #37

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-25 19:52:09 +03:00
Andrey Smirnov
b93ac8179f
fix: provide cached access to the state via Omni API
This affects Get/List operations, but not Watch and
Create/Update/Destroy.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-08 18:23:03 +04:00
Artem Chernyshev
3810ccb03f
fix: properly clean up stale Talos gRPC backends
Fixes: https://github.com/siderolabs/omni/issues/432

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-07-01 17:09:36 +03:00
Utku Ozdemir
e9bca13f8f
feat: use tcp loadbalancer for exposed services
Improve the exposed service reliability by using a TCP loadbalancer between the nodes exposing the service.

Rework the exposed service proxy registry to be a COSI controller instead to simplify the logic, improve reliability and testability.

Closes siderolabs/omni#396.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-25 17:28:21 +02:00
Dmitriy Matrenichev
271bb70b12
chore: migrate to oidc v3
Update to latest oidc implementation.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-20 22:55:54 +03:00
Utku Ozdemir
6dcfd4c979
feat: handle all goroutine panics gracefully
Convert goroutine panics to errors or error logs.

Disallow usage of the `golang.org/x/sync/errgroup` package in the backend via the `depguard` linter. This linter configuration depends on: https://github.com/siderolabs/kres/pull/417

Rekres the project to include the feature (also bump Go to 1.22.4), but revert `PROTOBUF_GO_VERSION` and `GRPC_GATEWAY_VERSION` manually to not break the frontend.

Disallowing the `go` statement is not currently possible with existing linters; an issue was raised in `forbidigo` for it: https://github.com/ashanbrown/forbidigo/issues/47

Closes siderolabs/omni#373.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-20 21:28:12 +02:00
Utku Ozdemir
a2b7b530c9
feat: use the new domain scheme for exposed services
Rework the workload service proxying feature to support the new domain format in addition to the existing one.

Old format:
```
p-g3a4ana-demo.omni.siderolabs.io
```

New format:
```
g3a4ana-demo.proxy-us.omni.siderolabs.io
```

The old format required a new DNS record to be added for each new workload service, causing issues with the resolution on clients. The new format addresses it by leveraging wildcard records.

Additionally, build the full exposed service URL on the backend and make it a field on `ExposedService` resource, so they can be accessed using `omnictl get exposedservice`.

Part of siderolabs/omni#17.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-14 17:29:27 +02:00
Dmitriy Matrenichev
3f75f91608
fix: change Transport.Address field to Transport.Address method
The new gRPC (both gateway and modules) uses the `grpc.NewClient` call to create clients.
It no longer supports custom addresses without a `passthrough:` prefix. The previous fix didn't
account for that in some places, so this one changes the structure of `Transport` to always
return the address in the proper form for external users.
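
The idea of replacing an exported field with a method can be sketched as follows. This `Transport` type is a hypothetical stand-in, not the real one; the only detail taken from the commit is the `passthrough:` prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// Transport is a hypothetical stand-in for the type mentioned above.
// The raw address is unexported, so callers must go through Address.
type Transport struct {
	address string
}

// Address returns the address with a "passthrough:" prefix, adding it
// only when it is not already present.
func (t Transport) Address() string {
	const prefix = "passthrough:"

	if strings.HasPrefix(t.address, prefix) {
		return t.address
	}

	return prefix + t.address
}

func main() {
	fmt.Println(Transport{address: "unix-socket-or-custom-address"}.Address())
	fmt.Println(Transport{address: "passthrough:already-prefixed"}.Address())
}
```

With a method in place of a field, external callers cannot read an address without the prefix.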

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-07 19:16:26 +03:00
Utku Ozdemir
331fc31984
feat: run embedded discovery service in Omni
Run a discovery service instance inside Omni (enabled by default).

It listens only on the SideroLink interface on port 8093.

Clusters can opt in to use this embedded discovery service instead of `discovery.talos.dev`. It is added as a new cluster feature both in the frontend and in cluster templates.

Closes siderolabs/omni#20.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-06 01:11:17 +02:00
Artem Chernyshev
ed26122ce0
fix: implement the controller for handling machine status snapshot
Make the controller run tasks that collect the machine status from each
machine.
Instead of changing the `MachineStatusSnapshot` directly in the
siderolink events handler, pass these events to the controller through
a channel, so that all events are handled in the same place.

When an event arrives from siderolink, or when the task runner gets the
machine status, the controller updates the `MachineStatusSnapshot` resource.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-06-04 13:59:47 +03:00
Utku Ozdemir
95197e2b07
feat: improve reliability of machine status snapshots
Use the Talos resource API as well as the siderolink event sink to determine the status of a machine.

Follow the agreed decision tree of:
- if the update came over the same channel as before, use it
- if the update came over a different channel than before, and the timestamp is newer than the previous update, use it
- otherwise, drop it

Closes siderolabs/omni#41.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-04-30 17:32:20 +02:00
Utku Ozdemir
176f9d9f57
feat: compute schematic id only from the extensions
When determining the schematic ID of a machine, instead of relying on the schematic ID meta-extension, compute the ID by gathering the extensions on the machine. This way, the computed ID will not contain the META values, labels or the kernel args.

This ID is actually the ID we need, as when we compare the desired schematic with the actual one during a Talos upgrade, we are only interested in the changes in the list of extensions.

This does not cause the kernel args, labels, etc. to disappear, as they are used at installation time and preserved afterward (e.g., during upgrades).

Additionally:
- Remove the list of extensions from the `Schematic` resource, as it relied upon the schematics always being created through Omni. This is not always the case - i.e., when a partial join config is used. Therefore, instead of relying on it, we store the list of extensions by directly reading them from the machine and storing them on the `MachineStatus` resource.
- Skip setting the schematic META section at all if there are no labels set on the Download Installation Media screen.

Closes siderolabs/omni#55.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-03-22 14:58:19 +03:00
Artem Chernyshev
1e4e303c09
feat: implement omnictl support command
Works the same way as `talosctl support` but also grabs some relevant
Omni resources to help with the diagnostics.

Uses `go-talos-support` common module to collect Talos data.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-03-19 14:20:46 +03:00
Andrey Smirnov
dfcbaae7d0
chore: initial commit
Omni is source-available under BUSL.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-Authored-By: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Co-Authored-By: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Co-Authored-By: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Co-Authored-By: Philipp Sauter <philipp.sauter@siderolabs.com>
Co-Authored-By: Noel Georgi <git@frezbo.dev>
Co-Authored-By: evgeniybryzh <evgeniybryzh@gmail.com>
Co-Authored-By: Tim Jones <tim.jones@siderolabs.com>
Co-Authored-By: Andrew Rynhard <andrew@rynhard.io>
Co-Authored-By: Spencer Smith <spencer.smith@talos-systems.com>
Co-Authored-By: Christian Rolland <christian.rolland@siderolabs.com>
Co-Authored-By: Gerard de Leeuw <gdeleeuw@leeuwit.nl>
Co-Authored-By: Steve Francis <67986293+steverfrancis@users.noreply.github.com>
Co-Authored-By: Volodymyr Mazurets <volodymyrmazureets@gmail.com>
2024-02-29 17:19:57 +04:00