Force-disconnect leaves stale routes in the container's network
namespace: libnetwork removes the host-side veth but the
namespace-internal route survives. The next ConnectNetwork on the
same network then fails with "cannot program address X/16 in sandbox
interface because it conflicts with existing route", and the route
never resolves on its own. Bounded retry around ConnectNetwork
exhausts MaxElapsedTime instead of recovering.
Without Force, libnetwork drains the namespace routes synchronously
during disconnect and ConnectNetwork sees a clean slate. Cable-pull
semantic is preserved: docker still tears down the endpoint at the
namespace level, leaving in-flight TCP half-open inside the
container's view, verified via paired probe-timeout pairs in HA
prober logs while both routers are physically disconnected.
Fixes#3234
Libnetwork endpoint cleanup is eventually consistent. A back-to-back
disconnect+connect on the same network can race teardown and return a
transient error. Wrap the daemon calls in bounded exponential backoff
so TestHASubnetRouterFailoverDockerDisconnect no longer flakes on
phase 4c reconnect.
Fixes#3234
Add DisconnectFromNetwork/ReconnectToNetwork on TailscaleClient
backed by pool.Client.DisconnectNetwork.
Exercise single-router fail+recover either side, sequential dual
failure, and simultaneous dual failure. The dual-failure legs
assert no flap to a known-bad primary; the single-router-return
legs check traffic only because docker network disconnect
transiently fails probes on sibling routers.
Fails on parent; passes after the fix.
Updates #3203
Three regression tests for the user scenario: an in-process
Disconnect/Reconnect, a tailscale-down/up integration test, and
an iptables -j DROP cable-pull integration test.
Updates #3203
TestEnablingExitRoutes runs without an ACL, so tailcfg.FilterAllowAll
hides any policy-path regression. Add a sibling that applies the literal
#3212 policy via hsic.WithACLPolicy after registration and approval,
then asserts each peer carries 0.0.0.0/0 + ::/0 in AllowedIPs and
ExitNodeOption is true — the daemon-derived bool that drives
`tailscale exit-node list`.
Updates #3212
Add an integration test that runs the tailscale-rs axum example
against headscale end-to-end. The test provisions one headscale
instance, one Go tailscale probe client (tsic), and one
tailscale-rs node (tsric) on the same tailnet, then verifies:
- the tailscale-rs node registers and is assigned both an IPv4
and an IPv6 address
- the Go probe sees the tailscale-rs node as a peer in its status
- GET /index.html and GET /assets/index.css from the axum server
return the expected content over the tailnet
- three sequential POST /count calls return distinct, incrementing
counter values, proving netstack state is maintained across
multiple TCP connections
This is the first integration test that exercises a non-Go
Tailscale client against headscale, giving end-to-end coverage of
the control protocol for alternate implementations.
Add TailscaleRustInContainer (tsric), a Rust-client counterpart to
integration/tsic. It runs the axum example from tailscale-rs in a
Docker container and exposes the same lifecycle hooks as tsic
(Shutdown, SaveLog, Execute, WriteFile) so integration tests can
treat it as any other Tailscale node.
Dockerfile.tailscale-rs clones tailscale-rs at build time, so no
local source checkout is required. The repo URL and ref are Docker
build arguments (TAILSCALE_RS_REPO, TAILSCALE_RS_REF) exposed as
tsric.WithRepo / tsric.WithRef options. The HEADSCALE_INTEGRATION_
TAILSCALE_RS_IMAGE environment variable provides an escape hatch
for using a pre-built image instead of building from source.
The Dockerfile patches the cloned root Cargo.toml to expose
ts_control's insecure-keyfetch feature through the tailscale crate
so the axum example can fetch the control key over plain HTTP.
The integration harness serves the control plane without TLS,
which is the only mode tailscale-rs can register against until it
grows a way to inject a custom CA bundle.
Ingest (registration and MapRequest updates) now calls
dnsname.SanitizeHostname directly and lets NodeStore auto-bump on
collision. Admin rename uses dnsname.ValidLabel + SetGivenName so
conflicts are surfaced to the caller instead of silently mutated.
Three duplicate invalidDNSRegex definitions, the old NormaliseHostname
and ValidateHostname helpers, EnsureHostname, InvalidString,
ApplyHostnameFromHostInfo, GivenNameHasBeenChanged, generateGivenName
and EnsureUniqueGivenName are removed along with their tests.
ValidateHostname's username half is retained as ValidateUsername for
users.go.
The SaaS-matching collision rule replaces the random "invalid-xxxxxx"
fallback and the 8-character hash suffix; the empty-input fallback is
the literal "node". TestUpdateHostnameFromClient now exercises the
rewrite end-to-end with awkward macOS/Windows names.
Fixes#3188Fixes#2926Fixes#2343Fixes#2762Fixes#2449
Updates #2177
Updates #2121
Updates #363
Subnet routers that advertise routes must not accept peer routes.
With co-router visibility the HA primary's subnet appears in co-routers'
AllowedIPs, and --accept-routes installs a system route that conflicts
with local subnet forwarding.
Updates #3157
Block ping callbacks via iptables while keeping the Noise session alive
to simulate a zombie-connected router. Verify the prober detects it,
fails over, and does not flap on recovery.
Updates #2129
Updates #2902
Render an interstitial showing device hostname, OS, and machine-key
fingerprint before finalising OIDC registration. The user must POST
to /register/confirm/{auth_id} with a CSRF double-submit cookie.
Removes the TODO at oidc.go:201.
Address-based aliases (Prefix, Host) now resolve to exactly the literal
prefix and do not expand to include the matching node's other IP
addresses. This means an IPv4-only host definition only produces IPv4
filter rules, and an IPv6-only definition only produces IPv6 rules.
Update TestACLDevice1CanAccessDevice2 and TestACLNamedHostsCanReach to
track which addresses each test case covers via test1Addr/test2Addr/
test3Addr fields and only assert connectivity for that family.
Previously the tests assumed all address families would work regardless
of how the policy aliases were defined, which was true only when
address-based aliases auto-expanded to include all of a node's IPs.
The group test case (identity-based) keeps using IPv4 since tags, users,
groups, autogroups and the wildcard still resolve to both IPv4 and IPv6.
Updates #2180
When WithPingUntilDirect(false) is set, the Ping helper should accept
any indirect path, but the substring check only matched "via DERP" and
"via relay". Tailscale peer relay pings output
pong from ... via peer-relay(ip:port:vni:N) in Nms
which does not contain the "via relay" substring and was therefore
rejected as errTailscalePingNotDERP. TestGrantCapRelay Phase 4 never
passed because of this: even when the data plane was healthy the
helper returned an error.
Commit abe1a3e7 attempted to fix this by adding "via relay" alongside
"via DERP" but missed the "peer-" prefix used by peer relay output.
Add an explicit "via peer-relay" substring check so peer relay pongs
are accepted alongside DERP and plain relay pongs.
Updates #2180
Use the node.expiry config to apply a default expiry to non-tagged
nodes when the client does not request a specific expiry. This covers
all registration paths: new node creation, re-authentication, and
pre-auth key re-registration.
Tagged nodes remain exempt and never expire.
Fixes#1711
Remove TestGrantViaExitNodeSteering and TestGrantViaMixedSteering.
Exit node traffic forwarding through via grants cannot be validated
with curl/traceroute in Docker containers because Tailscale exit nodes
strip locally-connected subnets from their forwarding filter.
The correctness of via exit steering is validated by:
- Golden MapResponse comparison (TestViaGrantMapCompat with GRANT-V31
and GRANT-V36) comparing full netmap output against Tailscale SaaS
- Filter rule compatibility (TestGrantsCompat with GRANT-V14 through
GRANT-V36) comparing per-node PacketFilter rules against Tailscale SaaS
- TestGrantViaSubnetSteering (kept) validates via subnet steering with
actual curl/traceroute through Docker, which works for subnet routes
Updates #2180
Add NetworkSpec struct with optional Subnet field to ScenarioSpec.Networks.
When Subnet is set, the Docker network is created with that specific CIDR
instead of Docker's auto-assigned RFC1918 range.
Fix all exit node integration tests to use curl + traceroute. Tailscale
exit nodes strip locally-connected subnets from their forwarding filter
(shrinkDefaultRoute + localInterfaceRoutes), so exit nodes cannot
forward to IPs on their Docker network via the default route alone.
This is by design: exit nodes provide internet access, not LAN access.
To also get LAN access, the subnet must be explicitly advertised as a
route — matching real-world Tailscale deployment requirements.
- TestSubnetRouterMultiNetworkExitNode: advertise usernet1 subnet
alongside exit route, upgraded from ping to curl + traceroute
- TestGrantViaExitNodeSteering: usernet1 subnet in via grants and
auto-approvers alongside autogroup:internet
- TestGrantViaMixedSteering: externet subnet in auto-approvers and
route advertisement for exit traffic
Updates #2180
Action.UnmarshalJSON produces the format
'action="unknown-action" is not supported: invalid ACL action',
not the reversed format the test expected.
Updates #2180
Add NodeAttrsTaildriveShare and NodeAttrsTaildriveAccess to the node capability map, enabling Taildrive file sharing when granted via policy. Add integration test verifying the full cap/drive grant lifecycle.
Updates #2180
Add ConnectToNetwork to the TailscaleClient interface for multi-network test scenarios and implement peer relay ping support. Use these to test that cap/relay grants correctly enable peer-to-peer relay connections between tagged nodes.
Updates #2180
Add integration tests validating that via grants correctly steer
routes to designated nodes per client group:
- TestGrantViaSubnetSteering: two routers advertise the same
subnet, via grants steer each client group to a specific router.
Verifies per-client route visibility, curl reachability, and
traceroute path.
- TestGrantViaExitNodeSteering: two exit nodes, via grants steer
each client group to a designated exit node. Verifies exit
routes are withdrawn from non-designated nodes and the client
rejects setting a non-designated exit node.
- TestGrantViaMixedSteering: cross-steering where subnet routes
and exit routes go to different servers per client group.
Verifies subnet traffic uses the subnet-designated server while
exit traffic uses the exit-designated server.
Also add autogroupp helper for constructing AutoGroup aliases in
grant policy configurations.
Updates #2180
Apply CI-aware scaling to all remaining hardcoded timeouts:
- requireAllClientsOfflineStaged: scale the three internal stage
timeouts (15s/20s/60s) with ScaledTimeout.
- validateReloginComplete: scale requireAllClientsOnline (120s)
and requireAllClientsNetInfoAndDERP (3min) calls.
- WaitForTailscaleSyncPerUser callers in acl_test.go (3 sites, 60s).
- WaitForRunning callers in tags_test.go (10 sites): switch to
PeerSyncTimeout() to match convention.
- WaitForRunning/WaitForPeers direct callers in route_test.go.
- requireAllClientsOnline callers in general_test.go and
auth_key_test.go.
Replace pingAllHelper with assertPingAll/assertPingAllWithCollect:
- Wraps pings in EventuallyWithT so transient docker exec timeouts
are retried instead of immediately failing the test.
- Timeout scales with the ping matrix size (2s per ping budget for
2 full sweeps) so large tests get proportionally more time.
- Uses CollectT correctly, fixing the broken EventuallyWithT usage
in TestEphemeral where the old t.Errorf bypassed CollectT.
- Follows the established assert*/assertWithCollect naming.
Updates #3125
The default docker execute timeout (10s) is the root cause of
"dockertest command timed out" errors across many integration tests
on CI. On congested GitHub Actions runners, docker exec latency
alone can consume 2-5 seconds of this budget before the command
even starts inside the container.
Replace the hardcoded 10s constant with a function that returns
20s on CI, doubling the budget for all container commands
(tailscale status, headscale CLI, curl, etc.).
Similarly, scale the default tailscale ping timeout from 200ms to
400ms on CI. This doubles the per-attempt budget and the docker
exec timeout for pings (from 200ms*5=1s to 400ms*5=2s), giving
more headroom for docker exec overhead.
Updates #3125
TestACLTagPropagationPortSpecific failed twice on CI because it jumped
from SetNodeTags directly to checking curl, without first verifying the
tag change was applied on the server. This races against server-side
processing.
Add a tag verification step (matching TestACLTagPropagation's pattern)
and bump the Step 4 timeout from 60s to 90s since port-specific filter
changes require both endpoints to process the new PacketFilter from
the MapResponse while the WireGuard tunnel stays up.
Updates #3125
Wrap all 329 hardcoded EventuallyWithT timeouts across 12 test files
with integrationutil.ScaledTimeout(), which applies a 2x multiplier
on CI runners. This addresses the systemic issue where hardcoded
timeouts that work locally are insufficient under CI resource
contention.
Variable-based timeouts (propagationTime, assertTimeout in
route_test.go and totalWaitTime in auth_oidc_test.go) are wrapped
at their definition site so all downstream usages benefit.
The retry intervals (second duration parameter) are intentionally
NOT scaled, as they control polling frequency, not total wait time.
Updates #3125
Replace Curl() with CurlFailFast() in all negative curl assertions
(where the test expects the connection to fail). CurlFailFast uses
1 retry and 2s max time instead of 3 retries and 5s max, which
avoids wasting time on unnecessary retries when we expect the
connection to be blocked.
This affects 21 call sites across 7 test functions:
- TestACLAllowUser80Dst
- TestACLDenyAllPort80
- TestACLAllowUserDst
- TestACLAllowStarDst
- TestACLNamedHostsCanReach
- TestACLDevice1CanAccessDevice2
- TestPolicyUpdateWhileRunningWithCLIInDatabase
- TestACLAutogroupSelf
- TestACLPolicyPropagationOverTime
Where possible, the inline Curl+Error pattern is replaced with the
assertCurlFailWithCollect helper introduced in the previous commit.
Updates #3125
Add ScaledTimeout to scale EventuallyWithT timeouts by 2x on CI,
consistent with the existing PeerSyncTimeout (60s/120s) and
dockertestMaxWait (300s/600s) conventions.
Add assertCurlSuccessWithCollect and assertCurlFailWithCollect helpers
following the existing *WithCollect naming convention.
assertCurlFailWithCollect uses CurlFailFast internally for aggressive
timeouts, avoiding wasted retries when expecting blocked connections.
Apply these to the three flakiest ACL tests:
- TestACLTagPropagation: swap NetMap and curl verification order so
the fast NetMap check (confirms MapResponse arrived) runs before
the slower curl check. Use curl helpers and scaled timeouts.
- TestACLTagPropagationPortSpecific: use curl helpers and scaled
timeouts.
- TestACLHostsInNetMapTable: scale the 10s EventuallyWithT timeout.
Updates #3125
Replace inline WithDockerEntrypoint shell scripts in
TestACLTagPropagation and TestACLTagPropagationPortSpecific with
the standard WithPackages and WithWebserver options.
The custom entrypoints used fragile fixed sleeps and lacked the
robust network/cert readiness waits that buildEntrypoint provides.
Updates #3139
Make embedded DERP server and TLS the default configuration for all
integration tests, replacing the per-test opt-in model that led to
inconsistent and flaky test behavior.
Infrastructure changes:
- DefaultConfigEnv() includes embedded DERP server settings
- New() auto-generates a proper CA + server TLS certificate pair
- CA cert is installed into container trust stores and returned by
GetCert() so clients and internal tools (curl) trust the server
- CreateCertificate() now returns (caCert, cert, key) instead of
discarding the CA certificate
- Add WithPublicDERP() and WithoutTLS() opt-out options
- Remove WithTLS(), WithEmbeddedDERPServerOnly(), and WithDERPAsIP()
since all their behavior is now the default or unnecessary
Test cleanup:
- Remove all redundant WithTLS/WithEmbeddedDERPServerOnly/WithDERPAsIP
calls from test files
- Give every test a unique WithTestName by parameterizing aclScenario,
sshScenario, and derpServerScenario helpers
- Add WithTestName to tests that were missing it
- Document all non-standard options with inline comments explaining
why each is needed
Updates #3139
Add embedded DERP server, TLS, and netfilter=off to match the
infrastructure configuration used by all other ACL integration tests.
Without these options, the test fails intermittently because traffic
routes through external DERP relays and iptables initialization fails
in Docker containers.
Updates #3139
Add end-to-end integration test that validates localpart:*@domain
SSH user mapping with real Tailscale clients. The test sets up an
SSH policy with localpart entries and verifies that users can SSH
into tagged servers using their email local-part as the username.
Updates #3049
Add ReadLog method to headscale integration container for log
inspection. Split SSH check mode tests into CLI and OIDC variants
and add comprehensive test coverage:
- TestSSHOneUserToOneCheckModeCLI: basic check mode with CLI approval
- TestSSHOneUserToOneCheckModeOIDC: check mode with OIDC approval
- TestSSHCheckModeUnapprovedTimeout: rejection on cache expiry
- TestSSHCheckModeCheckPeriodCLI: session expiry and re-auth
- TestSSHCheckModeAutoApprove: auto-approval within check period
- TestSSHCheckModeNegativeCLI: explicit rejection via CLI
Update existing integration tests to use headscale auth register.
Updates #1850
Implement the SSH "check" action which requires additional
verification before allowing SSH access. The policy compiler generates
a HoldAndDelegate URL that the Tailscale client calls back to
headscale. The SSHActionHandler creates an auth session and waits for
approval via the generalised auth flow.
Sort check (HoldAndDelegate) rules before accept rules to match
Tailscale's first-match-wins evaluation order.
Updates #1850
Generalise the registration pipeline to a more general auth pipeline
supporting both node registrations and SSH check auth requests.
Rename RegistrationID to AuthID, unexport AuthRequest fields, and
introduce AuthVerdict to unify the auth finish API.
Add the urlParam generic helper for extracting typed URL parameters
from chi routes, used by the new auth request handler.
Updates #1850
Tagged nodes no longer have user_id set, so ListNodes(user) cannot
find them. Update integration tests to use ListNodes() (all nodes)
when looking up tagged nodes.
Add a findNode helper to locate nodes by predicate from an
unfiltered list, used in ACL tests that have multiple nodes per
scenario.
Updates #3077
Add --disable flag to "headscale nodes expire" CLI command and
disable_expiry field handling in the gRPC API to allow disabling
key expiry for nodes. When disabled, the node's expiry is set to
NULL and IsExpired() returns false.
The CLI follows the new grpcRunE/RunE/printOutput patterns
introduced in the recent CLI refactor.
Also fix NodeSetExpiry to persist directly to the database instead
of going through persistNodeToDB which omits the expiry field.
Fixes#2681
Co-authored-by: Marco Santos <me@marcopsantos.com>
Fix issues found by the upgraded golangci-lint:
- wsl_v5: add required whitespace in CLI files
- staticcheck SA4006: replace new(var.Field) with &localVar
pattern since staticcheck does not recognize Go 1.26
new(value) as a use of the variable
- staticcheck SA5011: use t.Fatal instead of t.Error for
nil guard checks so execution stops
- unused: remove dead ptrTo helper function
This commit upgrades the codebase from Go 1.25.5 to Go 1.26rc2 and
adopts new language features.
Toolchain updates:
- go.mod: go 1.25.5 → go 1.26rc2
- flake.nix: buildGo125Module → buildGo126Module, go_1_25 → go_1_26
- flake.nix: build golangci-lint from source with Go 1.26
- Dockerfile.integration: golang:1.25-trixie → golang:1.26rc2-trixie
- Dockerfile.tailscale-HEAD: golang:1.25-alpine → golang:1.26rc2-alpine
- Dockerfile.derper: golang:alpine → golang:1.26rc2-alpine
- .goreleaser.yml: go mod tidy -compat=1.25 → -compat=1.26
- cmd/hi/run.go: fallback Go version 1.25 → 1.26rc2
- .pre-commit-config.yaml: simplify golangci-lint hook entry
Code modernization using Go 1.26 features:
- Replace tsaddr.SortPrefixes with slices.SortFunc + netip.Prefix.Compare
- Replace ptr.To(x) with new(x) syntax
- Replace errors.As with errors.AsType[T]
Lint rule updates:
- Add forbidigo rules to prevent regression to old patterns
Errors should not start capitalised and they should not contain the word error
or state that they "failed" as we already know it is an error
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
Update integration test expectations to match current policy behavior:
1. IPProto defaults include all four protocols (TCP, UDP, ICMPv4,
ICMPv6) for port-range ACL rules, not just TCP and UDP.
2. Filter rules with identical SrcIPs and IPProto are now merged
into a single rule with combined DstPorts, so the subnet router
receives one filter rule instead of two.
Updates #3036
Add TestTagsAuthKeyConvertToUserViaCLIRegister that reproduces the
exact panic from #3038: register a node with a tags-only PreAuthKey
(no user), force reauth with empty tags, then register via CLI with
a user. The mapper panics on node.Owner().Model().ID when User is nil.
The critical detail is using a tags-only PreAuthKey (User: nil). When
the key is created under a user, the node inherits the User pointer
from createAndSaveNewNode and the bug is masked.
Also add Owner() validity assertions to the existing unit test
TestTaggedNodeWithoutUserToDifferentUser to catch the nil pointer
at the unit test level.
Updates #3038
Update integration tests to use valid SSH patterns:
- TestSSHOneUserToAll: use autogroup:member and autogroup:tagged
instead of wildcard destination
- TestSSHMultipleUsersAllToAll: use autogroup:self instead of
username destinations for group-to-user SSH access
Updates #3009
Updates #3010
Extend TestApiKeyCommand to test the new --id flag for expire and
delete commands, verifying that API keys can be managed by their
database ID in addition to the existing --prefix method.
Updates #2986
Make DeleteUser call updatePolicyManagerUsers() to refresh the policy
manager's cached user list after user deletion. This ensures consistency
with CreateUser, UpdateUser, and RenameUser which all update the policy
manager.
Previously, DeleteUser only removed the user from the database without
updating the policy manager. This could leave stale user references in
the cached user list, potentially causing issues when policy is
re-evaluated.
The gRPC handler now uses the change returned from DeleteUser instead of
manually constructing change.UserRemoved().
Fixes#2967
Add DeleteUser method to ControlServer interface and implement it in
HeadscaleInContainer to enable testing user deletion scenarios.
Add two integration tests for issue #2967:
- TestACLGroupWithUnknownUser: tests that valid users can communicate
when a group references a non-existent user
- TestACLGroupAfterUserDeletion: tests connectivity after deleting a
user that was referenced in an ACL group
These tests currently pass but don't fully reproduce the reported issue
where deleted users break connectivity for the entire group.
Updates #2967