Another tool can then present it in a readable way.
This is a good educational tool to start with, and monitoring can be
added there later.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Otherwise the debug server stops and returns `nil`, which stops the
entire Omni instance.
Also refactor the fns list in `server.go` to make it clearer which
subsystem stopped first. Without that, it's really hard to understand
what caused Omni to stop.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
This enables test coverage and builds Omni with the race detector.
Also rework the COSI state creation flow: no more callbacks.
The state is now an object with a `Stop` method, which should be
called when the app stops.
Essentially, all defers were moved into the `Stop` method.
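As a rough illustration of the "defers moved into `Stop`" idea, here is a minimal sketch. The `State` type, the `onStop` helper and the cleanup steps are hypothetical and do not reflect the actual Omni code; the sketch only shows cleanup steps being registered on an object and executed in reverse order when `Stop` is called.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical stand-in for the state object described above.
type State struct {
	closers []func() error
}

// onStop registers a cleanup step that previously would have been a `defer`.
func (s *State) onStop(fn func() error) {
	s.closers = append(s.closers, fn)
}

// Stop runs the registered cleanup steps in reverse registration order
// (the same order deferred calls would run in) and joins their errors.
func (s *State) Stop() error {
	var errs []error

	for i := len(s.closers) - 1; i >= 0; i-- {
		if err := s.closers[i](); err != nil {
			errs = append(errs, err)
		}
	}

	s.closers = nil

	return errors.Join(errs...)
}

func main() {
	state := &State{}

	state.onStop(func() error { fmt.Println("close storage"); return nil })
	state.onStop(func() error { fmt.Println("stop watchers"); return nil })

	// Called once, when the app stops.
	if err := state.Stop(); err != nil {
		fmt.Println("stop failed:", err)
	}
}
```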
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Changes:
- Change how the config is merged: first read the config from the file
if it is set, merge it with the defaults, then init the rest of the
command line flags using the merged defaults + config file as the
base.
- Make some config fields pointers to change how they are merged.
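One common reason to use pointer fields in a merge like this is to tell "not set" apart from an explicit zero value. The sketch below assumes that motivation; the `Config` type and its fields are hypothetical and are not the real Omni config structure.

```go
package main

import "fmt"

// Config is a hypothetical config with one plain field and one pointer field.
type Config struct {
	Name         string
	DebugEnabled *bool // nil means "not set in this layer"
}

// merge overlays override on top of base. A nil pointer leaves the base value
// untouched, while an explicit `false` still overrides a `true` default.
func merge(base, override Config) Config {
	if override.Name != "" {
		base.Name = override.Name
	}

	if override.DebugEnabled != nil {
		base.DebugEnabled = override.DebugEnabled
	}

	return base
}

// ptr returns a pointer to v.
func ptr[T any](v T) *T { return &v }

func main() {
	defaults := Config{Name: "omni", DebugEnabled: ptr(true)}
	fromFile := Config{DebugEnabled: ptr(false)} // the file only sets this field

	merged := merge(defaults, fromFile)

	fmt.Println(merged.Name, *merged.DebugEnabled)
}
```

With a plain `bool` field, the `false` from the file would be indistinguishable from an unset value and the default would win.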
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Omni can now be configured via a config file instead of command line
flags.
The `--config-path` flag now reads the config provided in YAML
format.
The config structure was completely changed. It was not public before,
so it's fine to ignore backward compatibility.
The command line flags were not changed.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
This mainly affects the exposed services flow.
Instead of putting the raw URL there, sign the redirect URL on the backend. When the frontend needs
to follow the redirect, it calls the newly implemented API on the server,
and this API redirects to the correct URL if the signature is valid.
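The commit does not say which signature scheme is used, so the following is only a sketch of the general sign-then-verify flow, assuming an HMAC-SHA256 signature over the URL with a server-side secret. The function names and the key are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns a hex-encoded HMAC-SHA256 signature of the redirect URL.
// HMAC is an assumption here; the real backend may sign differently.
func sign(key []byte, redirectURL string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(redirectURL))

	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and compares it in constant time.
func verify(key []byte, redirectURL, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}

	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(redirectURL))

	return hmac.Equal(mac.Sum(nil), got)
}

func main() {
	key := []byte("server-side-secret") // hypothetical key
	target := "https://service.example.org/"

	sig := sign(key, target)

	fmt.Println("valid:", verify(key, target, sig))
	fmt.Println("tampered:", verify(key, "https://evil.example.org/", sig))
}
```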
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
With this change, we change the way we track whether Talos is installed on the disk or not in the bare-metal infra provider.
Previously, it worked like the following:
- Omni, when observing some specific type of events on SideroLink, set the `installed` flag on the dedicated `MachineState` resource to true.
- The provider, after wiping disks of a machine, set that flag to false.
This method went against the "single owner per resource" principle and was not leveraging COSI runtime and controller-based logic. Furthermore, it made the contract between Omni and the provider more complex since it was yet another resource.
Instead, now, we do the following:
- Every time we observe those specific types of events on SideroLink, we increment a counter field on the `infra.Machine` resource.
- When the provider wipes a machine, it persists this counter value at the time of wipe internally.
- To detect whether Talos is installed or not, the provider compares the internally stored counter value vs the value on the `infra.Machine`. It is "installed" only if the counter value on the `infra.Machine` is bigger than the internally stored one (it means we observed an installation after the last wipe).
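The comparison in the last step can be sketched as follows. The type and field names are hypothetical stand-ins for the `infra.Machine` resource and the provider's internal state; only the counter logic is taken from the description above.

```go
package main

import "fmt"

// infraMachine is a hypothetical stand-in for the `infra.Machine` resource.
type infraMachine struct {
	// InstallCounter is incremented by Omni on each observed installation event.
	InstallCounter uint64
}

// providerRecord is a hypothetical stand-in for the provider's internal state.
type providerRecord struct {
	// CounterAtLastWipe is the counter value persisted at the time of the last wipe.
	CounterAtLastWipe uint64
}

// markWiped persists the current counter value when the provider wipes the machine.
func (r *providerRecord) markWiped(m infraMachine) {
	r.CounterAtLastWipe = m.InstallCounter
}

// isInstalled reports whether an installation was observed after the last wipe.
func isInstalled(m infraMachine, r providerRecord) bool {
	return m.InstallCounter > r.CounterAtLastWipe
}

func main() {
	machine := infraMachine{InstallCounter: 1}
	record := providerRecord{}

	fmt.Println("after install:", isInstalled(machine, record))

	record.markWiped(machine)
	fmt.Println("after wipe:", isInstalled(machine, record))

	machine.InstallCounter++ // Omni observes a new installation event
	fmt.Println("after reinstall:", isInstalled(machine, record))
}
```

Each field has a single writer (Omni increments the counter, the provider records the wipe value), which is what keeps the "single owner per resource" principle intact.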
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Introduce the concept of "static" infra providers, e.g., bare-metal infra provider, which manage a static set of machines contrary to the "regular" infra providers.
Add the following resources:
- `infra.Machine`: similar to `MachineRequest`, lives in the `infra-provider` namespace, serving as the input of the owning static provider. It is created in the `MachineController` if there is a SideroLink connection with the static provider ID. Regular flow of `Machine` creation is blocked, until this `infra.Machine` is accepted.
- `infra.MachineStatus`: similar to `MachineRequestStatus`, lives in the `infra-provider` namespace, serving as the output of the owning static provider. Its lifecycle must be bound to the corresponding `infra.Machine`.
- `infra.MachineState`: a resource that is supposed to be shared by Omni and bare-metal provider bi-directionally - they both can read from and write to it. It is currently used to mark the machine as installed when we observe an installation (through `SequenceEvent`s in the event sink), and to mark it as non-installed after we wipe it in the provider.
- `omni.InfraMachineConfig`: a user-managed resource to mark the `infra.Machine`s as accepted or set their desired power state. The acceptance information is then propagated to the `infra.Machine` resource. A machine which was already accepted cannot be unaccepted (checked by a validation), and this resource can only be removed when the `siderolink.Link` for the matching infra machine is removed.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Also:
- support generating the initial service account and dumping its key
somewhere.
- support running omni integration tests against the production build of
Omni using a service account.
- enable `omni-integration-test` image.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Currently, if the gateway encounters an existing `X-Forwarded-For` header, it appends the peer address it sees to the existing value, separated by a comma.
Our IP extraction function didn't account for that, so it failed to parse the IP and fell back to the original `peer.address`, which is
set deep in the gRPC middleware.
This commit ensures that we try to split the string value using `,`.
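A minimal sketch of such a split-based extraction is shown below. The function name is hypothetical, and picking the first parseable entry is an assumption; the commit only states that the value is split on `,`.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// extractIP splits an X-Forwarded-For value on commas and returns the first
// entry that parses as an IP address. Taking the first entry is an assumption
// made for this sketch.
func extractIP(headerValue string) (netip.Addr, bool) {
	for _, part := range strings.Split(headerValue, ",") {
		addr, err := netip.ParseAddr(strings.TrimSpace(part))
		if err == nil {
			return addr, true
		}
	}

	return netip.Addr{}, false
}

func main() {
	for _, value := range []string{"203.0.113.5", "203.0.113.5, 10.0.0.1", "garbage"} {
		addr, ok := extractIP(value)
		fmt.Printf("%q -> %v (parsed: %v)\n", value, addr, ok)
	}
}
```

When nothing parses, the caller can still fall back to the peer address, as before.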
Closes #668
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Make each controller process only resources labeled with its provider
ID.
Allow overriding gRPC tunnel options for the machine classes/request
sets.
Expose join configs to the infra providers.
Also publish Omni integration tests as part of releases.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
This commit allows us to specify the `start` and `end` time for the `audit-log` command. If not specified,
Omni will use current time minus thirty days to get audit logs.
Example:
```bash
omnictl audit-log 2024-08-26 2024-08-27
{"event_type":"create","resource_type":"PublicKeys.omni.sidero.dev","event_ts":1724767441119,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36","ip_address":"188.186.141.156","user_id":"3b470fcd-4170-420e-94f8-0ea03180ec35","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"b07755c2aaf099923182014e05634d017649a42d","public_key_expiration":1724795641}}}
{"event_type":"update","resource_type":"PublicKeys.omni.sidero.dev","event_ts":1724767441762,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36","ip_address":"188.186.141.156","user_id":"3b470fcd-4170-420e-94f8-0ea03180ec35","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"b07755c2aaf099923182014e05634d017649a42d","confirmation_type":"auth0","public_key_expiration":1724795641}}}
{"event_type":"destroy","resource_type":"PublicKeys.omni.sidero.dev","event_ts":1724796226583,"event_data":{"session":{"user_agent":"Omni-Internal-Agent","fingerprint":"b07755c2aaf099923182014e05634d017649a42d"}}}
```
The command passes time directly to the server to avoid any timezone issues.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Omni now watches gRPC/HTTP TLS cert files for changes and loads them
without restart.
Fixes: https://github.com/siderolabs/omni/issues/508
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Delete files older than 30 days. Also bump minimal Go version to 1.23 and fix lint issues.
Closes #546
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This commit implements session tracking and audit logging for these types:
- [x] auth.PublicKey
- [x] auth.AccessPolicy
- [x] auth.User
- [x] auth.Identity
- [x] omni.Machine
- [x] omni.MachineLabels
- [x] omni.Cluster
- [x] omni.MachineSet (only empty owners for update, log create and delete in all cases)
- [x] omni.MachineSetNode (only empty owners for update, log create and delete in all cases)
- [x] omni.ConfigPatch
- [x] Talos API Access
- [x] Kubernetes API access
Output example:
```
{"event_type":"update","resource_type":"Machines.omni.sidero.dev","event_ts":1723137771180,"event_data":{"session":{"user_agent":"Omni-Internal-Agent"},"machine":{"id":"18cec051-d975-483d-8d43-10ac6421648a","is_connected":true,"management_address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd","labels":{"omni.sidero.dev/address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd"}}}}
{"event_type":"update","resource_type":"Machines.omni.sidero.dev","event_ts":1723137771180,"event_data":{"session":{"user_agent":"Omni-Internal-Agent"},"machine":{"id":"18cec051-d975-483d-8d43-10ac6421648a","is_connected":true,"management_address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd","labels":{"omni.sidero.dev/address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd"}}}}
{"event_type":"update","resource_type":"Machines.omni.sidero.dev","event_ts":1723137771181,"event_data":{"session":{"user_agent":"Omni-Internal-Agent"},"machine":{"id":"18cec051-d975-483d-8d43-10ac6421648a","is_connected":true,"management_address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd","labels":{"omni.sidero.dev/address":"fdae:41e4:649b:9303:da9b:1ed:a725:c3dd"}}}}
{"event_type":"create","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137787549,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":""}}}}
{"event_type":"update","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137787553,"event_data":{"session":{"user_agent":"Omni-Internal-Agent"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":""}}}}
{"event_type":"update","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137811532,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":"","333":""}}}}
{"event_type":"update","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137811610,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":"","333":""}}}}
{"event_type":"update","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137811611,"event_data":{"session":{"user_agent":"Omni-Internal-Agent"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":"","333":""}}}}
{"event_type":"destroy","resource_type":"MachineLabels.omni.sidero.dev","event_ts":1723137811621,"event_data":{"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"},"machine_labels":{"id":"18cec051-d975-483d-8d43-10ac6421648a","labels":{"222":"","333":""}}}}
{"event_type":"create","resource_type":"Users.omni.sidero.dev","event_ts":1723141793888,"event_data":{"new_user":{"role":"Admin","id":"7903a72c-87af-43b8-94dc-82bd961ab768"},"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"}}}
{"event_type":"create","resource_type":"Identities.omni.sidero.dev","event_ts":1723141793981,"event_data":{"new_user":{"id":"7903a72c-87af-43b8-94dc-82bd961ab768","email":"some-user-email@email.com"},"session":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"ea002172-b9da-423f-bd1d-b443b8a7b43c","role":"Admin","email":"dmitry.matrenichev@siderolabs.com","fingerprint":"da7b997eb68449a12bebc6a3bf4f59beaf167209"}}}
```
Closes #37
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This PR implements audit logs. To enable it you have to set the `--audit-log-dir` flag
to a directory where the audit logs will be stored. The audit logs are stored in a JSON format.
Example:
```json
{"event_type":"update","resource_type":"PublicKeys.omni.sidero.dev","event_ts":1722537710182,"event_data":{"user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36","ip_address":"<snip>","user_id":"a19a7a38-1793-4262-a9ef-97bc00c7a155","role":"Admin","email":"useremail@userdomain.com","confirmation_type":"auth0","fingerprint":"15acb974f769bdccd38a4b28f282b78736b80bc7","public_key_expiration":1722565909}}
```
Keep in mind that `event_ts` is in milliseconds, not seconds.
Field `event_data` contains all relevant information about the event.
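For consumers of these logs, here is a small sketch of decoding one line and converting `event_ts` correctly. The struct and function names are hypothetical; the JSON field names come from the example above, and `event_data` is kept as raw JSON because its shape depends on the event.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// auditEvent is a hypothetical struct mirroring the top-level fields of one audit log line.
type auditEvent struct {
	EventType    string          `json:"event_type"`
	ResourceType string          `json:"resource_type"`
	EventTS      int64           `json:"event_ts"` // milliseconds since the Unix epoch
	EventData    json.RawMessage `json:"event_data"`
}

// Time converts the millisecond timestamp to a time.Time in UTC.
func (e auditEvent) Time() time.Time {
	return time.UnixMilli(e.EventTS).UTC()
}

// parseAuditLine decodes a single JSON line of the audit log.
func parseAuditLine(line []byte) (auditEvent, error) {
	var event auditEvent

	err := json.Unmarshal(line, &event)

	return event, err
}

func main() {
	line := []byte(`{"event_type":"update","resource_type":"PublicKeys.omni.sidero.dev","event_ts":1722537710182,"event_data":{"role":"Admin"}}`)

	event, err := parseAuditLine(line)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}

	fmt.Println(event.EventType, event.ResourceType, event.Time().Format(time.RFC3339Nano))
}
```

Using `time.Unix` on the raw value instead of `time.UnixMilli` would produce a date tens of thousands of years in the future.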
To enable it in the development environment, add the
`--audit-log-dir /tmp/omni-data/audit-logs` line to `docker-compose.override.yml`
or run `generate-certs` again.
For #37
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Improve the exposed service reliability by using a TCP loadbalancer between the nodes exposing the service.
Rework the exposed service proxy registry to be a COSI controller instead to simplify the logic, improve reliability and testability.
Closes siderolabs/omni#396.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Convert goroutine panics to errors or error logs.
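The general technique of turning a goroutine panic into an error looks roughly like this. The `safeGo` helper is hypothetical and is not the wrapper used in Omni; it only shows the `recover` pattern.

```go
package main

import (
	"errors"
	"fmt"
)

// safeGo runs fn in a goroutine and delivers its result on the returned channel.
// A panic inside fn is recovered and reported as an error instead of crashing
// the whole process.
func safeGo(fn func() error) <-chan error {
	result := make(chan error, 1)

	go func() {
		defer func() {
			if r := recover(); r != nil {
				result <- fmt.Errorf("goroutine panicked: %v", r)
			}
		}()

		// Not reached past the call if fn panics; the deferred function sends instead.
		result <- fn()
	}()

	return result
}

func main() {
	fmt.Println(<-safeGo(func() error { return errors.New("plain error") }))
	fmt.Println(<-safeGo(func() error { panic("boom") }))
}
```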
Disallow usage of `golang.org/x/sync/errgroup` package in the backend by `depguard` linter. This linter configuration depends on: https://github.com/siderolabs/kres/pull/417
Rekres the project to include the feature (also bump Go to 1.22.4), but revert `PROTOBUF_GO_VERSION` and `GRPC_GATEWAY_VERSION` manually to not break the frontend.
Disallowing the named `go` statement was not possible at the moment using existing linters, so an issue was raised in `forbidigo` for it: https://github.com/ashanbrown/forbidigo/issues/47
Closes siderolabs/omni#373.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Rework the workload service proxying feature to support the new domain format in addition to the existing one.
Old format:
```
p-g3a4ana-demo.omni.siderolabs.io
```
New format:
```
g3a4ana-demo.proxy-us.omni.siderolabs.io
```
The old format required a new DNS record to be added for each new workload service, causing resolution issues on clients. The new format addresses this by leveraging wildcard records.
Additionally, build the full exposed service URL on the backend and make it a field on `ExposedService` resource, so they can be accessed using `omnictl get exposedservice`.
Part of siderolabs/omni#17.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
The new gRPC (both gateway and modules) uses the `grpc.NewClient` call to create clients.
It no longer supports custom addresses without a `passthrough:` prefix. The previous fix didn't
account for that in some places, so this one changes the structure of `Transport` to always
return the address in the proper form for external users.
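A tiny sketch of normalizing an address this way is shown below. The helper name is hypothetical and this is not the `Transport` code itself; the `passthrough:///` target form is the standard gRPC way of selecting the passthrough resolver.

```go
package main

import (
	"fmt"
	"strings"
)

// withPassthrough returns addr as a gRPC target that uses the passthrough
// resolver, leaving already-prefixed targets untouched.
func withPassthrough(addr string) string {
	if strings.HasPrefix(addr, "passthrough:") {
		return addr
	}

	return "passthrough:///" + addr
}

func main() {
	fmt.Println(withPassthrough("custom-transport-address"))
	fmt.Println(withPassthrough("passthrough:///custom-transport-address"))
}
```

The resulting string would then be passed as the target to `grpc.NewClient`, which otherwise resolves targets through DNS by default.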
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Run a discovery service instance inside Omni (enabled by default).
It listens only on the SideroLink interface on port 8093.
Clusters can opt in to use this embedded discovery service instead of the `discovery.talos.dev`. It is added as a new cluster feature both on frontend and in cluster templates.
Closes siderolabs/omni#20.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Make the controller run tasks that can collect machine status from each
machine.
Instead of changing the `MachineStatusSnapshot` directly in the
siderolink events handler, pass these events to the controller through
a channel, so that all events are handled in the same place.
When an event arrives from siderolink, or when the task runner gets the
machine status, the controller updates the `MachineStatusSnapshot` resource.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Use the Talos resource API as well as the siderolink event sink to determine the status of a machine.
Follow the agreed decision tree of:
- if the update came over the same channel as before, use it
- if the update came over a different channel than before, and the timestamp is newer than the previous update, use it
- otherwise, drop it
Closes siderolabs/omni#41.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
When determining the schematic ID of a machine, instead of relying on the schematic ID meta-extension, compute the ID by gathering the extensions on the machine. This way, the resulting ID will not contain the META values, labels or the kernel args.
This ID is actually the ID we need, as when we compare the desired schematic with the actual one during a Talos upgrade, we are only interested in the changes in the list of extensions.
This does not cause the kernel args, labels, etc. to disappear, as they are used at installation time and preserved afterward (e.g., during upgrades).
Additionally:
- Remove the list of extensions from the `Schematic` resource, as it relied upon the schematics always being created through Omni. This is not always the case - i.e., when a partial join config is used. Therefore, instead of relying on it, we store the list of extensions by directly reading them from the machine and storing them on the `MachineStatus` resource.
- Skip setting the schematic META section at all if there are no labels set on the Download Installation Media screen.
Closes siderolabs/omni#55.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Works the same way as `talosctl support` but also grabs some relevant
Omni resources to help with the diagnostics.
Uses `go-talos-support` common module to collect Talos data.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Omni is source-available under BUSL.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-Authored-By: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Co-Authored-By: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Co-Authored-By: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Co-Authored-By: Philipp Sauter <philipp.sauter@siderolabs.com>
Co-Authored-By: Noel Georgi <git@frezbo.dev>
Co-Authored-By: evgeniybryzh <evgeniybryzh@gmail.com>
Co-Authored-By: Tim Jones <tim.jones@siderolabs.com>
Co-Authored-By: Andrew Rynhard <andrew@rynhard.io>
Co-Authored-By: Spencer Smith <spencer.smith@talos-systems.com>
Co-Authored-By: Christian Rolland <christian.rolland@siderolabs.com>
Co-Authored-By: Gerard de Leeuw <gdeleeuw@leeuwit.nl>
Co-Authored-By: Steve Francis <67986293+steverfrancis@users.noreply.github.com>
Co-Authored-By: Volodymyr Mazurets <volodymyrmazureets@gmail.com>