Commit Graph

6 Commits

Author SHA1 Message Date
Artem Chernyshev
3e07a88a5d
fix: revert workload proxy LB refactoring
The new code has several instabilities that need to be addressed.

Fixes: https://github.com/siderolabs/omni/issues/1074

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-04-10 21:46:10 +03:00
Dmitriy Matrenichev
e751022e8a
chore: rework Reconciler to use proper http.Transport
- [x] Ensure `Reconciler` is internally consistent on all variations of `Reconcile` call (including parallel). Track aliases and clusters side by side.
- [x] Add tests for the above.
- [x] Replace HealthCheck logic with the actual tcp probing.
- [x] Replace probing port on alias removal. That is, if we lose an alias to the probing port, find a new one and use it.
- [x] Expose metrics. Specifically for `connectionLatency`, `requestStartLatency`, `responseStartLatency`, `inFlightRequests` and `workingProbes`. Register those in Prometheus.
- [x] Add happy path test for the http.Handler.

For #886

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2025-02-28 15:49:00 +03:00
Artem Chernyshev
ed946b30a6
feat: display OMNI_ENDPOINT in the service account creation UI
Fixes: https://github.com/siderolabs/omni/issues/858

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-01-29 15:27:36 +03:00
Andrey Smirnov
dcd123d333
fix: workload service reconciler
Most of the code is simplifying/refactoring, but there are a few fixes:

* increase LB upstream healthcheck interval to 1 minute
* pass a logger to the LB (as otherwise it creates its own)
* shut down the LB by waiting for it to shut down
* close the LB even when it fails to start to avoid leaking health check goroutines
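
The last two fixes follow a general pattern that can be sketched in Go. This is a hedged illustration, not the code from this commit: `balancer`, `newBalancer` and `startBalancer` are hypothetical names, and the sketch assumes the constructor already spawns a health check goroutine, so a failed `Start` must still be followed by `Close`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// balancer is a hypothetical stand-in for a load balancer whose constructor
// spawns a health check goroutine.
type balancer struct {
	cancel    context.CancelFunc
	wg        sync.WaitGroup
	failStart bool // simulates a start failure for the example
}

func newBalancer(interval time.Duration, failStart bool) *balancer {
	ctx, cancel := context.WithCancel(context.Background())
	b := &balancer{cancel: cancel, failStart: failStart}

	b.wg.Add(1)

	go func() {
		defer b.wg.Done()

		ticker := time.NewTicker(interval)
		defer ticker.Stop()

		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				// A real implementation would probe the upstreams here.
			}
		}
	}()

	return b
}

func (b *balancer) Start() error {
	if b.failStart {
		return errors.New("start failed")
	}

	return nil
}

// Close stops the health check goroutine and blocks until it has exited.
func (b *balancer) Close() {
	b.cancel()
	b.wg.Wait()
}

func startBalancer(failStart bool) (*balancer, error) {
	b := newBalancer(time.Minute, failStart)

	if err := b.Start(); err != nil {
		b.Close() // close even on failure so the goroutine does not leak

		return nil, fmt.Errorf("starting balancer: %w", err)
	}

	return b, nil
}

func main() {
	if _, err := startBalancer(true); err != nil {
		fmt.Println("error:", err)
	}

	b, err := startBalancer(false)
	if err != nil {
		panic(err)
	}

	b.Close()
	fmt.Println("balancer shut down cleanly")
}
```

Because `Close` waits on the `sync.WaitGroup`, the caller knows the background goroutine has exited by the time it returns, which is the "waiting for it to shut down" behavior described above.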

Additionally, add an integration test for workload proxying.

Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-08-19 00:40:32 +02:00
Utku Ozdemir
5e35cbe572
fix: fix nil pointer dereference in workload proxy reconciler
Add the missing `nil` check to the workload proxy TCP load balancer upstreams reconciler.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-08-08 07:59:18 +02:00
Utku Ozdemir
e9bca13f8f
feat: use tcp loadbalancer for exposed services
Improve the exposed service reliability by using a TCP loadbalancer between the nodes exposing the service.

Rework the exposed service proxy registry into a COSI controller to simplify the logic and improve reliability and testability.

Closes siderolabs/omni#396.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-25 17:28:21 +02:00