- Bump some deps, namely cosi-runtime and Talos machinery.
- Update `auditState` to implement the new methods in COSI's `state.State`.
- Bump default Talos and Kubernetes versions to their latest.
- Rekres, which brings Go 1.24.5. Also update it in go.mod files.
- Fix linter errors coming from new linters.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Now the machine join config is always generated when there's a `machine`
resource. It will automatically populate the correct parameters for the
machine API URL, logs and events.
If the machine is managed by an infra provider it will populate its
request ID too.
The default provider join config is also generated, but it is not used
in the common infra provider library, because it's easier to just
generate the config at the moment it's going to be used.
The code for the siderolink join config generation was unified in all
the places, and is now in `client/pkg/siderolink`.
A new management API method, `GetMachineJoinConfig`, was introduced for
downloading the join config in the UI.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Refactor the siderolink manager to improve context handling for Wireguard and its associated services.
This ensures proper shutdown and cleanup of resources.
Add a linger option to TCP connections in the log receiver to forcefully close the connection with RST.
Signed-off-by: Oguz Kilcan <oguz.kilcan@siderolabs.com>
Omni can now be configured via a config file instead of the command line
flags.
The flag `--config-path` now reads the config provided in YAML
format.
The config structure was completely changed. It was not public before,
so it's fine to ignore backward compatibility.
The command line flags were not changed.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Followup fix after https://github.com/siderolabs/omni/pull/976.
Change the way the global flag `--siderolink-use-grpc-tunnel` works: it is now used in the SideroLink provision API to decide whether to ignore what the machine requests in the provision request and "force" the tunnel mode.
Additionally, now it is ignored in the generated `SideroLinkConfig` document which is pushed to the machines when they are allocated - instead, we simply check if the machine was already connected (provisioned) using a GRPC tunnel, and preserve that option. By doing that, we avoid switching the mode after the provisioning, avoiding the bug which is fixed in https://github.com/siderolabs/talos/pull/10517.
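The two decisions described above can be sketched as follows. This is an illustration under assumptions, not the actual Omni code: the `link` type, its field, and both function names are hypothetical.

```go
package main

import "fmt"

// link is a hypothetical stand-in for the stored SideroLink connection state.
type link struct {
	// ProvisionedViaTunnel records whether the machine originally connected
	// through the gRPC tunnel.
	ProvisionedViaTunnel bool
}

// provisionTunnel decides the tunnel mode in the provision API.
// forceTunnel mirrors the global --siderolink-use-grpc-tunnel flag and
// requested is what the machine asked for in its provision request.
func provisionTunnel(forceTunnel, requested bool) bool {
	return forceTunnel || requested
}

// configTunnel decides the tunnel mode written into the generated config
// document. The global flag is ignored here: the mode used at provisioning
// time is preserved so that it never switches afterwards.
func configTunnel(l link) bool {
	return l.ProvisionedViaTunnel
}

func main() {
	// The flag forces the tunnel although the machine did not request it.
	fmt.Println(provisionTunnel(true, false))

	// The generated config keeps whatever mode the machine connected with.
	fmt.Println(configTunnel(link{ProvisionedViaTunnel: false}))
}
```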
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Updated SideroLink module to add the support for it and configure it
on the Omni side.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Follow up to https://github.com/siderolabs/omni/pull/959
Reduce the amount of unimportant logs coming from
the SideroLink GRPC tunnel handler.
Signed-off-by: Orzelius <33936483+Orzelius@users.noreply.github.com>
It seems SideroLink logs are too verbose in debug mode when the GRPC tunnel is used.
Reduce their level to info.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
With the feature flag it is now possible to use the old flow.
The new secure join flow is not stable yet, so the default mode is
legacy, which disables node unique token generation for all
machines, not only the ones running Talos versions that are too old.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Fixes: https://github.com/siderolabs/omni/issues/840
This PR changes the Talos machine join flow drastically:
- a newly joined machine is first put into a limbo state where Omni creates a
  temporary Wireguard connection to it.
- the controller picks it up and tries to write a unique machine token to
  the newly joined machine; in the meantime it also resolves UUID
  conflicts automatically and writes a UUID override to the META
  partition.
- the machine re-joins Omni, now with the unique token.
- the unique token is saved in the `siderolink.Link` resource and any
  subsequent join checks that the `siderolink.Link` has a matching unique
  token.
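The token check in the last step could look roughly like the sketch below. It is an assumption-laden illustration: the `link` struct, the `NodeUniqueToken` field name, and `verifyJoin` are hypothetical, and the real check lives in the provision handler.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

var errTokenMismatch = errors.New("node unique token does not match the stored one")

// link is a hypothetical stand-in for the `siderolink.Link` resource.
type link struct {
	NodeUniqueToken string
}

// verifyJoin checks a join attempt against the token stored on the link.
// A link without a stored token belongs to a machine that is still in the
// limbo state, so the join is let through to receive its token.
func verifyJoin(l link, presented string) error {
	if l.NodeUniqueToken == "" {
		return nil
	}

	// Constant-time comparison avoids leaking the token through timing.
	if subtle.ConstantTimeCompare([]byte(l.NodeUniqueToken), []byte(presented)) != 1 {
		return errTokenMismatch
	}

	return nil
}

func main() {
	l := link{NodeUniqueToken: "token-a"}

	fmt.Println(verifyJoin(l, "token-a"))
	fmt.Println(verifyJoin(l, "token-b"))
}
```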
The Siderolink manager was a huge, monolithic, poorly testable chunk,
so it was refactored and split into:
- LinkStatus controller, which creates/removes wireguard peers.
- PendingMachineStatus controller, which ensures all joined machines
  have unique node tokens.
- Provision handler, which implements the gRPC server and now has all the
  logic related to machine acceptance.
- PeersPool, which is used by the LinkStatus controllers and deduplicates
  peer creation, reusing peers when possible.
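One common way to deduplicate peer creation is reference counting. The sketch below shows that idea only; it is an assumption that the real PeersPool works this way, and every name in it is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// peersPool is a hypothetical reference-counted pool keyed by public key.
type peersPool struct {
	mu       sync.Mutex
	refs     map[string]int
	onAdd    func(publicKey string)
	onRemove func(publicKey string)
}

func newPeersPool(onAdd, onRemove func(string)) *peersPool {
	return &peersPool{
		refs:     map[string]int{},
		onAdd:    onAdd,
		onRemove: onRemove,
	}
}

// Acquire registers a user of the peer. The peer is created only for the
// first user; later callers reuse the existing one.
func (p *peersPool) Acquire(publicKey string) {
	p.mu.Lock()
	defer p.mu.Unlock()

	p.refs[publicKey]++

	if p.refs[publicKey] == 1 {
		p.onAdd(publicKey)
	}
}

// Release drops one user and removes the peer once nobody uses it anymore.
func (p *peersPool) Release(publicKey string) {
	p.mu.Lock()
	defer p.mu.Unlock()

	count, ok := p.refs[publicKey]
	if !ok {
		return
	}

	if count == 1 {
		delete(p.refs, publicKey)
		p.onRemove(publicKey)

		return
	}

	p.refs[publicKey] = count - 1
}

func main() {
	pool := newPeersPool(
		func(key string) { fmt.Println("create peer", key) },
		func(key string) { fmt.Println("remove peer", key) },
	)

	pool.Acquire("peer-a")
	pool.Acquire("peer-a") // reused, no second creation
	pool.Release("peer-a")
	pool.Release("peer-a") // last user gone, peer removed
}
```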
Additionally updated the siderolink loghandler to not accept logger
connections from machines which do not have corresponding log buffers.
Nodes which do not support secure flow are still able to join by
default.
Secure join flow can be forced by setting `--disable-legacy-join-tokens`
flag.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Introduce the concept of "static" infra providers, e.g., bare-metal infra provider, which manage a static set of machines contrary to the "regular" infra providers.
Add the following resources:
- `infra.Machine`: similar to `MachineRequest`, lives in the `infra-provider` namespace, serving as the input of the owning static provider. It is created in the `MachineController` if there is a SideroLink connection with the static provider ID. Regular flow of `Machine` creation is blocked, until this `infra.Machine` is accepted.
- `infra.MachineStatus`: similar to `MachineRequestStatus`, lives in the `infra-provider` namespace, serving as the output of the owning static provider. Its lifecycle must be bound to the corresponding `infra.Machine`.
- `infra.MachineState`: a resource that is supposed to be shared by Omni and bare-metal provider bi-directionally - they both can read from and write to it. It is currently used to mark the machine as installed when we observe an installation (through `SequenceEvent`s in the event sink), and to mark it as non-installed after we wipe it in the provider.
- `omni.InfraMachineConfig`: a user-managed resource to mark the `infra.Machine`s as accepted or set their desired power state. The acceptance information is then propagated to the `infra.Machine` resource. A machine which was already accepted cannot be unaccepted (checked by a validation), and this resource can only be removed when the `siderolink.Link` for the matching infra machine is removed.
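The "accepted cannot be unaccepted" rule can be sketched as an update validation. The struct and function below are hypothetical simplifications; the real resource carries more fields and is validated through the resource validation machinery.

```go
package main

import (
	"errors"
	"fmt"
)

var errCannotUnaccept = errors.New("an accepted machine cannot be unaccepted")

// infraMachineConfig is a hypothetical, reduced view of `omni.InfraMachineConfig`.
type infraMachineConfig struct {
	Accepted bool
}

// validateUpdate rejects the only forbidden transition: accepted -> not accepted.
func validateUpdate(old, updated infraMachineConfig) error {
	if old.Accepted && !updated.Accepted {
		return errCannotUnaccept
	}

	return nil
}

func main() {
	accepted := infraMachineConfig{Accepted: true}
	pending := infraMachineConfig{}

	fmt.Println(validateUpdate(pending, accepted))
	fmt.Println(validateUpdate(accepted, pending))
}
```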
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Currently, if the gateway encounters an existing X-Forwarded-For header, it appends the peer address it sees to the existing value, separated by a comma.
Our IP extraction function didn't account for that, so it failed to parse the IP and fell back to the original `peer.address`, which is
set deep in the gRPC middleware.
This commit ensures that we try to split the string value using `,`.
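A minimal sketch of the splitting approach. Which entry should win is an assumption here: this version walks the list from the right, so the address appended by the gateway is tried first. The function name `clientIP` is hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// clientIP extracts an address from an X-Forwarded-For value that may hold
// several comma-separated entries. It tries the entries from right to left
// and returns the first one that parses as an IP address.
func clientIP(headerValue string) (netip.Addr, bool) {
	parts := strings.Split(headerValue, ",")

	for i := len(parts) - 1; i >= 0; i-- {
		addr, err := netip.ParseAddr(strings.TrimSpace(parts[i]))
		if err == nil {
			return addr, true
		}
	}

	// Nothing parsed: the caller should fall back to the peer address.
	return netip.Addr{}, false
}

func main() {
	addr, ok := clientIP("203.0.113.7, 10.0.0.1")
	fmt.Println(addr, ok)
}
```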
Closes #668
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Make each controller process only resources labeled with its provider
ID.
Allow overriding gRPC tunnel options for the machine classes/request
sets.
Expose join configs to the infra providers.
Also publish Omni integration tests as part of releases.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
By default, generated Talos installation media uses `grpc_tunnel` for SideroLink based on the Omni instance configuration, namely via the `--siderolink-use-grpc-tunnel` flag.
Allow overriding this setting in `omnictl download` and on the Download Installation Media screen on the web.
On the Download Installation Media screen, the default value of the checkbox is based on the instance default.
Closes siderolabs/omni#388.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This extra data is used in the infra provider to add the annotation to the
`siderolink.Link` as early as possible.
Then the `Machine` controller is changed to skip the `Links` that have
annotation `omni.sidero.dev/infra-provider` and do not have the label
`omni.sidero.dev/machine-request`.
This change makes the system ignore inconsistent `Links`
until they are fully populated.
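The skip condition can be sketched as a small predicate. The annotation and label keys come from the text above; the `link` struct and `shouldSkip` are hypothetical stand-ins for the real resource metadata and controller code.

```go
package main

import "fmt"

const (
	annotationInfraProvider = "omni.sidero.dev/infra-provider"
	labelMachineRequest     = "omni.sidero.dev/machine-request"
)

// link is a hypothetical, reduced view of the metadata of a `siderolink.Link`.
type link struct {
	Labels      map[string]string
	Annotations map[string]string
}

// shouldSkip reports whether the link was created for an infra provider but
// is not yet fully populated with its machine request label.
func shouldSkip(l link) bool {
	_, fromProvider := l.Annotations[annotationInfraProvider]
	_, hasRequest := l.Labels[labelMachineRequest]

	return fromProvider && !hasRequest
}

func main() {
	incomplete := link{
		Annotations: map[string]string{annotationInfraProvider: "some-provider"},
	}

	fmt.Println(shouldSkip(incomplete))
}
```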
Also changed the infra provider interface to take siderolink connection
params as string instead of the resource.
Fixes: https://github.com/siderolabs/omni/issues/603
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
The code is already there: Talos will simply fail to connect and will try again by rotating the IP.
We simply add support for specifying multiple IPs in the `siderolink-wireguard-advertised-addr` flag, separated by commas.
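Parsing such a flag value might look like the sketch below. It assumes each entry has the `ip:port` form, which the message above does not state, and `parseAdvertisedAddrs` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"strings"
)

// parseAdvertisedAddrs splits a comma-separated flag value into endpoints.
// Assumption: every entry is an "ip:port" pair.
func parseAdvertisedAddrs(value string) ([]netip.AddrPort, error) {
	var addrs []netip.AddrPort

	for _, part := range strings.Split(value, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}

		addrPort, err := netip.ParseAddrPort(part)
		if err != nil {
			return nil, fmt.Errorf("invalid advertised address %q: %w", part, err)
		}

		addrs = append(addrs, addrPort)
	}

	if len(addrs) == 0 {
		return nil, errors.New("no advertised addresses given")
	}

	return addrs, nil
}

func main() {
	addrs, err := parseAdvertisedAddrs("192.0.2.1:50180, 192.0.2.2:50180")
	if err != nil {
		panic(err)
	}

	for _, addr := range addrs {
		fmt.Println(addr)
	}
}
```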
Fixes #495
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
And exclude port from the saved address.
Additionally fix the Talos backends cache to not
react to the `MachineType` `Create` and `Update` events.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
And store them in the `link` resources.
This might help determine the real IP of the node which is coming
to Omni in case `MachineStatus` is not populated.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Convert goroutine panics to errors or error logs.
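The general technique can be sketched with a small wrapper that recovers inside the goroutine and reports the panic as an error. `goSafe` is a hypothetical helper for illustration, not the one used in the backend.

```go
package main

import (
	"fmt"
)

// goSafe runs fn in a new goroutine and delivers its result on the returned
// channel. A panic inside fn is recovered and converted into an error
// instead of crashing the whole process.
func goSafe(fn func() error) <-chan error {
	errCh := make(chan error, 1)

	go func() {
		defer func() {
			if r := recover(); r != nil {
				errCh <- fmt.Errorf("goroutine panicked: %v", r)
			}
		}()

		errCh <- fn()
	}()

	return errCh
}

func main() {
	err := <-goSafe(func() error {
		panic("boom")
	})

	fmt.Println(err)
}
```

The channel has a buffer of one, so the goroutine can finish even if nobody reads the result.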
Disallow usage of `golang.org/x/sync/errgroup` package in the backend by `depguard` linter. This linter configuration depends on: https://github.com/siderolabs/kres/pull/417
Rekres the project to include the feature (also bump Go to 1.22.4), but revert `PROTOBUF_GO_VERSION` and `GRPC_GATEWAY_VERSION` manually to not break the frontend.
Disallowing the named `go` statement was not possible at the moment using existing linters, raised an issue in `forbidigo` for it: https://github.com/ashanbrown/forbidigo/issues/47
Closes siderolabs/omni#373.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Even if they already have the kernel arguments.
It will generate the config only for Talos >= 1.5.0.
Added migration to avoid triggering config updates for all machines, as
they don't have this partial config right now.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Enabled ALPN negotiation for the machine API endpoint.
Signed-off-by: Petr Krutov <kjubybot@proton.me>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This lets the operator define URL params for the API endpoint, for example https://<endpoint>/?grpc_tunnel=true. Instead of only appending the join token, we now parse the URL and add the token using `Query.Set`.
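A minimal sketch of that approach with the standard `net/url` package. The function name `withJoinToken` and the example host are hypothetical, and the `jointoken` parameter name is assumed from the wording above.

```go
package main

import (
	"fmt"
	"net/url"
)

// withJoinToken adds the join token to the endpoint URL while keeping any
// query parameters the operator already configured.
func withJoinToken(endpoint, token string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("parse endpoint: %w", err)
	}

	query := u.Query()
	query.Set("jointoken", token)
	u.RawQuery = query.Encode()

	return u.String(), nil
}

func main() {
	result, err := withJoinToken("https://omni.example.org/?grpc_tunnel=true", "abc")
	if err != nil {
		panic(err)
	}

	fmt.Println(result)
}
```

`Encode` sorts the parameters by key, so the existing `grpc_tunnel` parameter is preserved next to the token.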
Signed-off-by: Simon-Boyer <si.boyer@hotmail.ca>
Co-authored-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Use the Talos resource API as well as the siderolink event sink to determine the status of a machine.
Follow the agreed decision tree of:
- if the update came over the same channel as before, use it
- if the update came over a different channel than before, and the timestamp is newer than the previous update, use it
- otherwise, drop it
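The decision tree above can be sketched as a single predicate. All type and function names are hypothetical, and accepting the very first update unconditionally is an assumption that the list does not cover.

```go
package main

import (
	"fmt"
	"time"
)

// channel identifies where a status update came from.
type channel int

const (
	channelEventSink channel = iota
	channelResourceAPI
)

// update is a hypothetical machine status update.
type update struct {
	Source    channel
	Timestamp time.Time
}

// at is a small helper that builds a timestamp from Unix seconds.
func at(seconds int64) time.Time {
	return time.Unix(seconds, 0)
}

// shouldApply implements the decision tree: same channel wins, a different
// channel wins only with a newer timestamp, everything else is dropped.
func shouldApply(previous *update, next update) bool {
	if previous == nil {
		// Assumption: the first update is always accepted.
		return true
	}

	if next.Source == previous.Source {
		return true
	}

	return next.Timestamp.After(previous.Timestamp)
}

func main() {
	previous := &update{Source: channelEventSink, Timestamp: at(100)}

	fmt.Println(shouldApply(previous, update{Source: channelEventSink, Timestamp: at(50)}))
	fmt.Println(shouldApply(previous, update{Source: channelResourceAPI, Timestamp: at(150)}))
	fmt.Println(shouldApply(previous, update{Source: channelResourceAPI, Timestamp: at(50)}))
}
```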
Closes siderolabs/omni#41.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Introduce a buffer for `PeerEvents` channel, to not block adding new
machines when SideroLink is processing new peers.
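The effect of the buffer can be illustrated with a plain buffered channel: senders do not block until the buffer is full, even if the consumer is busy. The `peerEvent` type, `newPeerEvents`, and the buffer size are hypothetical.

```go
package main

import "fmt"

// peerEvent is a hypothetical stand-in for a SideroLink peer event.
type peerEvent struct {
	PublicKey string
}

// newPeerEvents creates the events channel with room for size pending
// events, so producers are decoupled from a slow consumer.
func newPeerEvents(size int) chan peerEvent {
	return make(chan peerEvent, size)
}

func main() {
	events := newPeerEvents(4)

	// No consumer is running yet, but these sends do not block.
	for i := 0; i < 4; i++ {
		events <- peerEvent{PublicKey: fmt.Sprintf("peer-%d", i)}
	}

	close(events)

	for event := range events {
		fmt.Println("processing", event.PublicKey)
	}
}
```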
Fixes: https://github.com/siderolabs/omni/issues/120
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
This PR adds support for WG over GRPC. The new field `VirtualAddrport`
in `SiderolinkSpec` should allow for both
setting the virtual addr and loading it after an Omni restart.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Omni is source-available under BUSL.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-Authored-By: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Co-Authored-By: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Co-Authored-By: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Co-Authored-By: Philipp Sauter <philipp.sauter@siderolabs.com>
Co-Authored-By: Noel Georgi <git@frezbo.dev>
Co-Authored-By: evgeniybryzh <evgeniybryzh@gmail.com>
Co-Authored-By: Tim Jones <tim.jones@siderolabs.com>
Co-Authored-By: Andrew Rynhard <andrew@rynhard.io>
Co-Authored-By: Spencer Smith <spencer.smith@talos-systems.com>
Co-Authored-By: Christian Rolland <christian.rolland@siderolabs.com>
Co-Authored-By: Gerard de Leeuw <gdeleeuw@leeuwit.nl>
Co-Authored-By: Steve Francis <67986293+steverfrancis@users.noreply.github.com>
Co-Authored-By: Volodymyr Mazurets <volodymyrmazureets@gmail.com>