tailscale/slow-flakes.md
Avery Pennarun 748fc029cc docs: add flaky test fix reports (to be reverted)
Add documentation of the flaky test investigation and fixes:

- fixed-tests-report.md: detailed breakdown of all fixes by tier
- more-fixes-plan.md: root cause analysis and verification steps

These files document the work done but should be reverted before
merging to main.

Change-Id: Ib0a2a787aaba2ef8b47475667db4677639c09645
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13 17:28:22 +02:00


Slow Tests That Appear Flaky Under Load

These tests pass consistently when run individually but show intermittent failures when run in parallel with many other tests (e.g., via cmd/deflake with -parallel=12).

The root cause is timeout-based flakiness, not logic bugs. Under heavy parallel load with the race detector enabled, resource contention causes these tests to exceed their expected timeouts.
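The failure mode described above can be reproduced in miniature. The following is a minimal sketch, not code from the affected tests: `waitUntil` and `runWithBudget` are hypothetical helpers, and the durations are arbitrary. It shows the same "work" either succeeding or returning a deadline error depending only on the time budget it is given.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitUntil polls cond every interval until it reports true or ctx expires.
// (Hypothetical helper, for illustration only.)
func waitUntil(ctx context.Context, interval time.Duration, cond func() bool) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		// Check the deadline first so a late wakeup is still reported as a timeout.
		if err := ctx.Err(); err != nil {
			return err
		}
		if cond() {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// runWithBudget simulates a test whose result becomes observable after
// `work` has elapsed and that has `budget` of wall-clock time to see it.
func runWithBudget(budget, work time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	start := time.Now()
	return waitUntil(ctx, time.Millisecond, func() bool {
		return time.Since(start) >= work
	})
}

func main() {
	// Identical logic and identical work; only the time budget differs.
	const work = 50 * time.Millisecond
	fmt.Println("tight budget:   ", runWithBudget(10*time.Millisecond, work))
	fmt.Println("generous budget:", runWithBudget(2*time.Second, work))
}
```

Under contention the effective `work` duration grows while the budget stays fixed, which is how a correct test can start failing without any change to its logic.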

Affected Tests

tailscale.com/tstest/integration

| Test | Baseline (ms) | With race detector | Pass rate (parallel) | Notes |
|------|---------------|--------------------|----------------------|-------|
| TestAutoUpdateDefaults | 1020 | ~25s | 5/6 | 3 subtests, each ~8-9s |
| TestAutoUpdateDefaults_cap | 370 | ~25s | 5/6 | Similar to above |
| TestNATPing | 2000 | ~70s+ | 3/6 | Very slow with race detector |
| TestPeerRelayPing | 1790 | ~20-25s | 5/6 | 3-node relay test |
| TestTailnetLock | 1290 | ~12s | 2/6 | Flaky even without race |

tailscale.com/tsconsensus

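| Test | Baseline (ms) | With race detector | Pass rate (parallel) | Notes |
|------|---------------|--------------------|----------------------|-------|
| TestRejoin | ~4000 | ~15s+ | 4/5 | Raft consensus test |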
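To put the two timing columns on a common scale, the slowdown factor for each test can be estimated as the race-detector time divided by the baseline. This is a rough sketch: it takes the lower bound of each approximate "with race" figure from the tables above, and the `timing` type and function names are illustrative only.

```go
package main

import "fmt"

// timing holds one table row, with both durations in milliseconds.
type timing struct {
	name       string
	baselineMs float64 // "Baseline (ms)" column
	raceMs     float64 // lower bound of the race-detector column, converted to ms
}

// slowdown reports how many times slower the race-detector run is than the baseline.
func slowdown(t timing) float64 {
	return t.raceMs / t.baselineMs
}

// tableRows returns the figures from the tables above.
func tableRows() []timing {
	return []timing{
		{"TestAutoUpdateDefaults", 1020, 25000},
		{"TestAutoUpdateDefaults_cap", 370, 25000},
		{"TestNATPing", 2000, 70000},
		{"TestPeerRelayPing", 1790, 20000},
		{"TestTailnetLock", 1290, 12000},
		{"TestRejoin", 4000, 15000},
	}
}

func main() {
	for _, r := range tableRows() {
		fmt.Printf("%-28s %6.1fx\n", r.name, slowdown(r))
	}
}
```

Because the inputs are approximations, the ratios are only order-of-magnitude guidance. They do suggest that a timeout sized for the baseline would need a large multiplier to hold up in a race-enabled parallel run.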

Conditions Observed

  • Machine: Linux 6.12.41+deb13-amd64
  • Parallelism: 12 packages tested concurrently (-parallel=12)
  • Race detector: Enabled (-race=true)
  • Iterations: 3-5 per test (-count=3 or -count=5)

Commands to Run Tests Successfully

Individual test (passes reliably)

# Without race detector
./tool/go test -v -count=5 -run '^TestAutoUpdateDefaults$' ./tstest/integration

# With race detector (slower but still passes)
./tool/go test -v -race -count=5 -run '^TestAutoUpdateDefaults$' ./tstest/integration

All integration tests (reduced parallelism)

# Reduce parallelism to avoid resource contention
./tool/go test -v -race -count=3 -parallel=4 ./tstest/integration

Using deflake with reduced parallelism

# Build deflake
./tool/go build ./cmd/deflake

# Run with reduced parallelism
./deflake -packages=tailscale.com/tstest/integration -count=3 -parallel=4

Recommendations

  1. For CI: Run integration tests with reduced parallelism or longer timeouts
  2. For local development: Run individual tests rather than full suites
  3. For deflake tool: Consider adding a -slow flag that uses more conservative timeouts for integration tests
  4. Long-term: Some tests could be optimized to run faster (e.g., TestNATPing takes 70s+ with race detector)
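Recommendation 3 could take roughly the following shape. This is a sketch under stated assumptions, not deflake's actual code: the `-slow` flag does not exist yet, the `-timeout` flag and its 2-minute default are hypothetical, and the 5x multiplier is an arbitrary choice. Only the parallelism values 12 and 4 come from this report.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

// config is a hypothetical subset of settings a deflake-like tool might use.
type config struct {
	timeout  time.Duration
	parallel int
}

// parseArgs parses the flags and applies the conservative "slow" profile if requested.
func parseArgs(args []string) (config, error) {
	fs := flag.NewFlagSet("deflake-sketch", flag.ContinueOnError)
	timeout := fs.Duration("timeout", 2*time.Minute, "per-package timeout (hypothetical flag and default)")
	parallel := fs.Int("parallel", 12, "packages to test concurrently")
	slow := fs.Bool("slow", false, "use conservative settings for slow integration tests (hypothetical)")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}

	cfg := config{timeout: *timeout, parallel: *parallel}
	if *slow {
		cfg.timeout *= 5 // assumption: arbitrary multiplier, not a measured value
		if cfg.parallel > 4 {
			cfg.parallel = 4 // matches the reduced parallelism used in this report
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("timeout=%s parallel=%d\n", cfg.timeout, cfg.parallel)
}
```

Bundling both adjustments behind one flag keeps the default fast path unchanged while giving slow suites a single opt-in. Whether to scale the timeout, cap parallelism, or do both is a design choice this sketch does not settle.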

Not Actual Flakes

These tests do NOT fail because of logic bugs; the underlying test logic is correct. The failures come solely from timeouts being exceeded under heavy load.

Evidence: When run individually with ./tool/go test -v -race -count=5, all tests pass 5/5 times.