Artem Chernyshev b264a412c2
fix: properly support the PXE and ISO machines in the secure tokens flow
The unique token flow was reworked to support machines running from PXE
and ISO.

Because they do not support META persistence, Omni doesn't enforce secure
tokens for them.
To still distinguish machines and make UUID conflict resolution work,
Omni now calculates a node fingerprint from the MAC addresses of
all physical interfaces on the node.
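
A minimal sketch of how such a fingerprint could be derived. The function
name, the normalization, and the choice of SHA-256 are assumptions made for
illustration; the commit message does not state which hash or encoding Omni
actually uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// fingerprint is a hypothetical helper: it derives a stable identifier from
// the MAC addresses of a node's physical interfaces. Lowercasing and sorting
// make the result independent of formatting and enumeration order.
// SHA-256 is an assumption, not Omni's confirmed algorithm.
func fingerprint(macs []string) string {
	normalized := make([]string, 0, len(macs))
	for _, mac := range macs {
		normalized = append(normalized, strings.ToLower(strings.TrimSpace(mac)))
	}

	sort.Strings(normalized)

	sum := sha256.Sum256([]byte(strings.Join(normalized, ",")))

	return hex.EncodeToString(sum[:])
}

func main() {
	a := fingerprint([]string{"AA:BB:CC:00:00:01", "aa:bb:cc:00:00:02"})
	b := fingerprint([]string{"aa:bb:cc:00:00:02", "aa:bb:cc:00:00:01"})

	// Same interfaces, different order and case: the fingerprints agree.
	fmt.Println(a == b)
}
```

Sorting before hashing matters because interface enumeration order is not
guaranteed to be stable across boots.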

So now each unique token consists of two parts:

- a fingerprint.
- a random string.
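
The two-part token can be pictured as follows. This is only a sketch: the
type name, the 16-byte random part, and the `fingerprint.random` wire format
are all hypothetical, since the commit message does not describe the actual
encoding.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// uniqueToken is a hypothetical model of the two-part token.
type uniqueToken struct {
	Fingerprint string // derived from the node's MAC addresses
	Random      string // generated once, identifies this particular link
}

// newUniqueToken pairs a fingerprint with a fresh random part.
// The 16-byte length is an assumption.
func newUniqueToken(fingerprint string) (uniqueToken, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return uniqueToken{}, err
	}

	return uniqueToken{Fingerprint: fingerprint, Random: hex.EncodeToString(buf)}, nil
}

// String encodes the token as "fingerprint.random" (assumed format).
func (t uniqueToken) String() string {
	return t.Fingerprint + "." + t.Random
}

// parseUniqueToken splits an encoded token back into its two parts.
func parseUniqueToken(s string) (uniqueToken, error) {
	fp, random, ok := strings.Cut(s, ".")
	if !ok || fp == "" || random == "" {
		return uniqueToken{}, errors.New("malformed unique token")
	}

	return uniqueToken{Fingerprint: fp, Random: random}, nil
}

func main() {
	tok, err := newUniqueToken("abc123")
	if err != nil {
		panic(err)
	}

	parsed, err := parseUniqueToken(tok.String())
	if err != nil {
		panic(err)
	}

	fmt.Println(parsed == tok) // the token survives an encode/parse round trip
}
```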

Omni detects Talos installation on the machine in the following way:

- check whether the pending machine status exists and has detected the
  system disk.
- override the previous check if the existing link is labeled as having
  Talos installed.
- lastly, if the `MachineStatus` exists, override all checks with the
  installed label from it (this covers the bare-metal provider workflow,
  where a machine goes from installed to not installed and is then PXE
  booted).
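
The precedence of these three checks can be sketched as below. The struct
and field names are hypothetical stand-ins for the real resources, and the
second step reflects one reading of the text (an installed label on the link
forces the result to true).

```go
package main

import "fmt"

// installInputs is a hypothetical flattening of the resources consulted.
type installInputs struct {
	PendingStatusFound     bool // pending machine status exists
	SystemDiskDetected     bool // it reported a system disk
	LinkFound              bool // an existing link exists
	LinkInstalled          bool // the link carries the installed label
	MachineStatusFound     bool // MachineStatus exists
	MachineStatusInstalled bool // its installed label
}

// talosInstalled applies the checks in order; later checks override earlier ones.
func talosInstalled(in installInputs) bool {
	installed := false

	if in.PendingStatusFound {
		installed = in.SystemDiskDetected
	}

	if in.LinkFound && in.LinkInstalled {
		installed = true
	}

	// MachineStatus has the final word, in either direction.
	if in.MachineStatusFound {
		installed = in.MachineStatusInstalled
	}

	return installed
}

func main() {
	// Bare-metal provider case: the link still says installed, but the
	// machine was wiped and PXE booted, so MachineStatus says not installed.
	fmt.Println(talosInstalled(installInputs{
		LinkFound:              true,
		LinkInstalled:          true,
		MachineStatusFound:     true,
		MachineStatusInstalled: false,
	}))
}
```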

Then, when a machine joins Omni with a token, Omni checks whether the
random part matches the token of the existing link. If it does, the
machine is immediately accepted.

If the random part is different and the fingerprint matches:
- if Talos is installed - reject the machine and log a warning.
- if Talos is not installed - replace the existing link with the new one
  (only if the request has a valid join token).

If nothing matches, UUID conflict resolution kicks in: the provisioner
creates a `PendingMachine` marked with the UUID conflict label, and the
`PendingMachineStatus` controller generates a random UUID for the node.
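
The join checks described above can be summarized as a small decision
function. This is a sketch, not the provisioner's code: all names are
hypothetical, and rejecting a not-installed machine that lacks a valid join
token is an assumption, as the text only says when the link is replaced.

```go
package main

import "fmt"

// token is a hypothetical two-part unique token.
type token struct {
	Fingerprint string
	Random      string
}

type decision string

const (
	accept          decision = "accept"
	reject          decision = "reject"
	replaceLink     decision = "replace-link"
	resolveConflict decision = "resolve-uuid-conflict"
)

// decide compares the token of the existing link with the incoming one.
func decide(existing, incoming token, installed, validJoinToken bool) decision {
	switch {
	case existing.Random == incoming.Random:
		// Same random part: this is the machine Omni already knows.
		return accept
	case existing.Fingerprint == incoming.Fingerprint:
		if installed {
			// Same hardware, Talos installed, different token: reject
			// (the real flow also logs a warning).
			return reject
		}

		if validJoinToken {
			return replaceLink
		}

		// Assumption: without a valid join token the request is refused.
		return reject
	default:
		// A different machine reusing the UUID: a PendingMachine is created
		// and the node receives a random UUID.
		return resolveConflict
	}
}

func main() {
	existing := token{Fingerprint: "fp-a", Random: "r1"}

	fmt.Println(decide(existing, token{Fingerprint: "fp-a", Random: "r1"}, true, true))
	fmt.Println(decide(existing, token{Fingerprint: "fp-a", Random: "r2"}, true, true))
	fmt.Println(decide(existing, token{Fingerprint: "fp-a", Random: "r2"}, false, true))
	fmt.Println(decide(existing, token{Fingerprint: "fp-b", Random: "r3"}, false, true))
}
```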

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2025-03-03 19:54:58 +03:00