The issue shows up in our tests as:
```
=== RUN TestIntegration/api.DiscoverySuite/TestRegistries
discovery.go:210: waiting for cluster affiliates to be discovered: 4 expected, 6 found
discovery.go:210: waiting for cluster affiliates to be discovered: 4 expected, 6 found
discovery.go:210: waiting for cluster affiliates to be discovered: 4 expected, 6 found
discovery.go:210: waiting for cluster affiliates to be discovered: 4 expected, 6 found
discovery.go:210: waiting for cluster affiliates to be discovered: 4 expected, 6 found
discovery.go:210: waiting for cluster affiliates to be discovered: 4 expected, 6 found
discovery.go:210: waiting for cluster affiliates to be discovered: 4 expected, 6 found
discovery.go:210: waiting for cluster affiliates to be discovered: 4 expected, 6 found
```
The issue should be minor for non-KubeSpan'ed clusters (members get
correctly de-duplicated), but it might cause connectivity problems for
KubeSpan'ed clusters.
The issue comes from the short-lived mount in the sequencer around the
`loadConfig` step: because the mount window is short, it triggers a
race in the node identity controller. The controller tries to read the
existing identity from `/system/state`, but if the partition has
already been unmounted by the time the read happens, the controller
assumes there is no identity and establishes a new one. Eventually it
writes the new identity back to disk, but since that identity differs
from the previous one, the node creates another entry for itself in
the discovery service.
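For illustration, here is a minimal Go sketch of the racy read path;
all names (`loadOrEstablishIdentity`, `identityPath`) are hypothetical
and not the actual Talos controller code:

```go
package identity

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"os"
)

// identityPath is an illustrative location on the STATE partition.
const identityPath = "/system/state/node-identity"

// loadOrEstablishIdentity returns the identity stored on the STATE
// partition, or establishes a fresh one if none can be read.
//
// The race: if the STATE partition is only mounted briefly, the read
// below can run after the unmount. os.ReadFile then reports "not
// exist", which is treated as "no identity yet", and a brand-new
// identity is generated even though one already exists on disk.
func loadOrEstablishIdentity() (string, error) {
	data, err := os.ReadFile(identityPath)
	if err == nil {
		return string(data), nil
	}

	if !errors.Is(err, os.ErrNotExist) {
		return "", err
	}

	// No identity found (or the partition wasn't mounted): create one.
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}

	newIdentity := hex.EncodeToString(buf)

	// Written back to disk eventually; on the racy path this diverges
	// from the identity that already existed before the unmount.
	return newIdentity, os.WriteFile(identityPath, []byte(newIdentity), 0o600)
}
```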
A proper solution is a volume mount controller, but a temporary
band-aid is to avoid broadcasting the mount notification for this
short `STATE` mount via resources, so that the controller isn't
triggered.
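A sketch of the band-aid, again with hypothetical helper names
(`mountState`, `systemMount`, `publishMountStatus` stand in for the
real mount and resource machinery):

```go
package mount

import "fmt"

// mountState mounts the STATE partition; broadcast controls whether
// the mount is announced as a resource.
func mountState(broadcast bool) error {
	if err := systemMount("/system/state"); err != nil {
		return err
	}

	if !broadcast {
		// Band-aid: the short mount around loadConfig is not announced
		// via resources, so the node identity controller is never
		// triggered against a partition about to be unmounted again.
		return nil
	}

	return publishMountStatus("/system/state")
}

// systemMount and publishMountStatus are stubs for illustration only.
func systemMount(target string) error {
	fmt.Println("mounting", target)
	return nil
}

func publishMountStatus(target string) error {
	fmt.Println("publishing mount status for", target)
	return nil
}
```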
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>