mirror of
https://github.com/tailscale/tailscale.git
synced 2026-05-05 04:06:35 +02:00
Add documentation of the flaky test investigation and fixes: - fixed-tests-report.md: detailed breakdown of all fixes by tier - more-fixes-plan.md: root cause analysis and verification steps These files document the work done but should be reverted before merging to main. Change-Id: Ib0a2a787aaba2ef8b47475667db4677639c09645 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2.7 KiB
2.7 KiB
Slow Tests That Appear Flaky Under Load
These tests pass consistently when run individually but show intermittent failures when run in parallel with many other tests (e.g., via cmd/deflake with -parallel=12).
The root cause is timeout-based flakiness, not logic bugs. Under heavy parallel load with the race detector enabled, resource contention causes these tests to exceed their expected timeouts.
Affected Tests
tailscale.com/tstest/integration
| Test | Baseline (ms) | With Race | Pass Rate (parallel) | Notes |
|---|---|---|---|---|
| TestAutoUpdateDefaults | 1020 | ~25s | 5/6 | 3 subtests, each ~8-9s |
| TestAutoUpdateDefaults_cap | 370 | ~25s | 5/6 | Similar to above |
| TestNATPing | 2000 | ~70s+ | 3/6 | Very slow with race detector |
| TestPeerRelayPing | 1790 | ~20-25s | 5/6 | 3-node relay test |
| TestTailnetLock | 1290 | ~12s | 2/6 | Flaky even without race |
tailscale.com/tsconsensus
| Test | Baseline (ms) | With Race | Pass Rate (parallel) | Notes |
|---|---|---|---|---|
| TestRejoin | ~4000 | ~15s+ | 4/5 | Raft consensus test |
Conditions Observed
- Machine: Linux 6.12.41+deb13-amd64
- Parallelism: 12 packages tested concurrently (
-parallel=12) - Race detector: Enabled (
-race=true) - Iterations: 3-5 per test (
-count=3or-count=5)
Commands to Run Tests Successfully
Individual test (passes reliably)
# Without race detector
./tool/go test -v -count=5 -run '^TestAutoUpdateDefaults$' ./tstest/integration
# With race detector (slower but still passes)
./tool/go test -v -race -count=5 -run '^TestAutoUpdateDefaults$' ./tstest/integration
All integration tests (reduced parallelism)
# Reduce parallelism to avoid resource contention
./tool/go test -v -race -count=3 -parallel=4 ./tstest/integration
Using deflake with longer timeouts
# Build deflake
go build ./cmd/deflake
# Run with reduced parallelism
./deflake -packages=tailscale.com/tstest/integration -count=3 -parallel=4
Recommendations
- For CI: Run integration tests with reduced parallelism or longer timeouts
- For local development: Run individual tests rather than full suites
- For deflake tool: Consider adding a
-slowflag that uses more conservative timeouts for integration tests - Long-term: Some tests could be optimized to run faster (e.g., TestNATPing takes 70s+ with race detector)
Not Actual Flakes
These tests do NOT have logic bugs causing flakiness. The underlying test logic is correct. The "failures" are purely due to timeouts being exceeded under heavy load.
Evidence: When run individually with ./tool/go test -v -race -count=5, all tests pass 5/5 times.