2031 Commits

Author SHA1 Message Date
Alex Chan
94643ba572 Check that we need to compute missing AUMs before updating local
Change-Id: If2d2ff03fc650f8d50bb45c1c7ff8bf622715b89
2026-04-30 15:10:57 +01:00
Alex Chan
8749d19029 tka,ipn: reduce boilerplate in Tailnet Lock tests
The `CreateStateForTest` helper reduces boilerplate in cases where the test
only cares about the trusted keys and not the disablement values (and makes
it more obvious where the disablement values are meaningful).

The `setupChonkStorage` helper reduces the boilerplate when creating on-disk
TKA storage in tests.

The `fakeLocalBackend` helper reduces the boilerplate when setting up a
`LocalBackend` instance in the IPN tests.

Updates #cleanup

Change-Id: Iacfba1be5f7fab208eec11e4369d63c7d7519da5
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-30 12:43:07 +01:00
Brad Fitzpatrick
f343b496c3 wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers
Replace the UAPI text protocol-based wireguard configuration with
wireguard-go's new direct callback API (SetPeerLookupFunc,
SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey).

Instead of computing a trimmed wireguard config ahead of time upon
control plane updates and pushing it via UAPI, install callbacks so
wireguard-go creates peers on demand when packets arrive. This removes
all the LazyWG trimming machinery: idle peer tracking, activity maps,
noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the
ts_omit_lazywg build tag.

For incoming packets, PeerLookupFunc answers wireguard-go's questions
about unknown public keys by looking up the peer in the full config.
For outgoing packets, PeerByIPPacketFunc (installed from
LocalBackend.lookupPeerByIP) maps destination IPs to node public keys
using the existing nodeByAddr index.

Updates tailscale/corp#12345

Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 19:46:19 -07:00
Claus Lensbøl
978b6a81b2
ipn/ipnlocal: always ReSTUN when starting up without a cache (#19586)
78627c1 introduced starting up and preserving the DERP server from
cache, but also changed it so the initial ReSTUN would not fire when
setting the DERPMap.

Change this so when not working from a cache, the ReSTUN will always
fire during startup.

Updates #19585

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-04-29 18:56:57 -04:00
Brad Fitzpatrick
15cba0a3f6 tstest/natlab/vmtest: add TestDiscoKeyChange
Add a vmtest that brings up two gokrazy nodes A and B behind two
One2OneNAT networks (so direct UDP works in both directions and any
slowness can't be blamed on NAT traversal), establishes a WireGuard
tunnel A → B with TSMP, then rotates B's disco key four times and
asserts that the data plane recovers in both directions after each
rotation. All pings are TSMP (the data-plane ping; disco pings would
not exercise the WireGuard tunnel itself).

The five pings:

  1. A → B  (initial; brings up the tunnel; 30s budget)
  2. B → A  after rotate (LocalAPI rotate-disco-key debug action)
  3. A → B  after rotate (LocalAPI)
  4. B → A  after restart (SIGKILL; gokrazy supervisor respawns)
  5. A → B  after restart (SIGKILL)

Each post-rotation ping gets a 15-second budget. Two unavoidable
multi-second waits dominate today:

  - The rotate-then-a→b phase takes ~10s on main because of LazyWG.
    After B's WantRunning bounce, B's wgengine resets its
    sentActivityAt/recvActivityAt maps and trims A out of the
    wireguard-go config as an "idle peer"; B only re-adds A on
    inbound activity, by which point A's first few TSMP packets
    have been silently dropped at B's tundev. The
    bradfitz/rm_lazy_wg branch removes that trimming entirely
    (verified locally: this phase drops to <100ms there).

  - The restart phases take ~5s for wireguard-go's RekeyTimeout
    handshake retry. After SIGKILL+respawn the first WG handshake
    init from the restarted node sometimes goes into the void
    (likely the brief peer-removed window in the receiver's
    two-step maybeReconfigWireguardLocked reconfig during which
    the peer is absent from wireguard-go), and wg-go's 5s+jitter
    retransmit timer is the next opportunity to retry. That retry
    succeeds and the staged TSMP packet flushes. Intrinsic to the
    protocol's retransmit policy.

Once LazyWG is removed and the first-handshake-after-reconfig race
is fixed, the budget should drop to 5s.

Supporting changes:

  ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and
  back on after rotating the disco key. magicsock.Conn.RotateDiscoKey
  only resets local disco state; without also dropping wireguard-go
  session keys, peers keep encrypting with their stale per-peer
  session against us until their rekey timer fires (WireGuard has no
  data-plane signaling to invalidate sessions). Bouncing WantRunning
  runs the engine through Reconfig(empty) → authReconfig, which
  drops every peer's WG session so the next packet either way
  triggers a fresh handshake.

  ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys"
  LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns
  a map[NodePublic]DiscoPublic from the current netmap. Tests reach
  it via [local.Client.DebugResultJSON]. We do not surface disco
  keys via [ipnstate.PeerStatus] because adding a non-comparable
  [key.DiscoPublic] field there breaks reflect-based test helpers
  (e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and
  general LocalAPI clients have no need for disco keys. Since the
  debug LocalAPI is gated behind the ts_omit_debug build tag, this
  endpoint is automatically stripped from small binaries.

  cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk)
  to drive the SIGKILL phase. On gokrazy the supervisor respawns
  tailscaled within a second.

  tstest/integration/testcontrol: add Server.AllOnline. When set,
  every peer entry in MapResponses is marked Online=true. Several
  disco-key handling fast paths in controlclient and wgengine
  (removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFull
  NetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire
  for online peers; without this flag, tests exercising disco-key
  rotation only hit the offline-peer code paths, which mask issues
  and are several seconds slower in this scenario. Finer-grained
  per-node online tracking can be added later.

  tstest/natlab/vmtest: add Env.RotateDiscoKey,
  Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an
  [AllOnline] EnvOption that plumbs through to
  testcontrol.Server.AllOnline, and an exported
  Env.Ping(from, to, type, timeout). Ping replaces the unexported
  helper so callers can specify both a ping type (PingDisco for
  warming peer state, PingTSMP for asserting end-to-end
  connectivity) and a deadline. PeerDiscoKey returns its LocalAPI
  error so callers inside tstest.WaitFor can retry transient
  failures rather than fataling the test.

Updates #12639
Updates #13038

Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 12:58:00 -07:00
Brad Fitzpatrick
22ff402da9 wgengine/magicsock: restore SetDERPMap signature, add SetDERPMapWithoutReSTUN
Commit 78627c132f changed the signature of magicsock.Conn.SetDERPMap to
take an additional bool doReStun parameter. Avoid both the boolean
parameter and the API signature change by restoring SetDERPMap to its
original single-argument form and adding a new SetDERPMapWithoutReSTUN
method for the cache-loading caller that wants to skip the post-set
ReSTUN.

Updates #19490

Change-Id: I97d9e82156bfc546ccf59756d1ea52f039b5de06
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 12:46:15 -07:00
Claus Lensbøl
78627c132f
wgengine/magicsock,ipn/ipnlocal: store and load homeDERP from cache (#19491)
With netmap caching, the home DERP of the self node was neither saved to
the cache or loaded from it, making nodes not stick to a DERP when
starting without a connection to control.

Instead, make sure that when a cache is available, load that cache,
before looking for DERP servers. This is implemented by allowing a skip
of ReSTUN in setting the DERP map (we must have a DERP map before
setting the home DERP), so the DERP from cache will set itself and be
sticky until a connection to control is established.

Making DERP only change when connected to control is handled by existing
code from f072d017bd8241675aa946a27fc1827f570435cb.

Updates #19490

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-04-29 10:24:09 -04:00
Alex Chan
bb91bb842c all: remove everything related to non-seamless key renewal
Seamless key renewal has been the default in all clients since 1.90.
We retained the ability to disable it from the control plane as a
precaution, but we haven't seen any issues that require us to disable it.

We're now removing all the code for non-seamless key renewal, because we
don't expect to turn it on again, and indeed it's been untested in the
field for three releases so might contain latent bugs!

Updates tailscale/corp#33042

Change-Id: I4b80bf07a3a50298d1c303743484169accc8844b
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-29 10:03:26 +01:00
Brad Fitzpatrick
ad5436af0d tstest/largetailnet, tstest/integration/testcontrol: add in-process large-tailnet benchmark
Add a Go benchmark that exercises a single tailnet client (a [tsnet.Server]
running in the test process) against a synthetic large initial netmap and
a stream of caller-driven peer add/remove deltas, all in-process.

The harness is split in two parts:

  - tstest/largetailnet, a reusable package containing a [Streamer]
    that hijacks the map long-poll on a [testcontrol.Server] via the new
    AltMapStream hook, sends one initial MapResponse with N synthetic
    peers, and forwards caller-supplied delta MapResponses on the same
    stream. Helpers like MakePeer / AllocPeer build synthetic peers with
    unique IDs and addresses derived from the Tailscale ULA range.

  - tstest/largetailnet/largetailnet_test.go, BenchmarkGiantTailnet
    (headless tailscaled workload, no IPN bus subscriber) and
    BenchmarkGiantTailnetBusWatcher (GUI-client workload with one
    Notify subscriber attached). Both are gated on
    --actually-test-giant-tailnet (skipped by default), stand up an
    in-process testcontrol + tsnet.Server, let Up block until the
    initial N-peer netmap has been processed, then ResetTimer and run
    add+remove pairs via b.Loop. Per-delta sync is via a test-only
    [ipnlocal.LocalBackend.AwaitNodeKeyForTest] channel that closes
    once the just-added peer key appears in the netmap (no-watcher
    variant) or via bus-Notify drain (bus-watcher variant).

To support the hijack, [testcontrol.Server] grows an AltMapStream hook
and a small MapStreamWriter interface for benchmarks/stress tests that
need to drive a controlled MapResponse sequence; the normal serveMap
path is untouched when AltMapStream is nil. The streamer answers
non-streaming "lite" map polls (which controlclient issues before the
streaming long-poll to push HostInfo) with an empty MapResponse and
returns immediately, so the streaming poll that follows is the one
that gets the initial netmap.

The benchmark is intended for before/after comparisons of netmap- and
delta-handling changes targeted at large tailnets. CPU profiles on
unmodified main show the expected O(N) hotspots:
setControlClientStatusLocked / authReconfigLocked /
userspaceEngine.Reconfig / setNetMapLocked, plus JSON encoding of the
full Notify.NetMap to bus watchers (which dominates the BusWatcher
variant).

Median ms/op over 10 runs on unmodified main, by tailnet size N:

       N      no-watcher   bus-watcher
   10000          32          166
   50000         222          865
  100000         504         1765
  250000        1551         4696

Recommended invocation:

	go test ./tstest/largetailnet/ -run=^$ \
	    -bench='BenchmarkGiantTailnet(BusWatcher)?$' \
	    -benchtime=2000x -timeout=10m \
	    --actually-test-giant-tailnet \
	    --giant-tailnet-n=250000 \
	    -cpuprofile=/tmp/giant.cpu.pprof

Updates #12542

Change-Id: I4f5b2bb271a36ba853d5a0ffe82054ef2b15c585
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-27 11:47:12 -07:00
Brad Fitzpatrick
0e10a3f580 net/tsdial, ipn/localapi, client/local: let clients dial non-Tailscale addresses directly
Add a tsdial.Dialer.UserDialPlan method that resolves an address and
reports whether the dialer would route it via Tailscale. The LocalAPI
/dial handler now uses this to skip proxying for addresses that aren't
Tailscale routes (e.g. localhost), returning a Dial-Self response with
the resolved address so the client can dial it directly. This avoids
an unnecessary round-trip through the daemon for local connections.

The client's UserDial handles the new response by dialing the resolved
address itself, and the server passes the pre-resolved IP:port for
Tailscale dials to avoid redundant DNS lookups.

Thanks to giacomo and Moyao for pointing this out!

Updates tailscale/corp#39702

Change-Id: I78d640f11ccd92f43ddd505cbb0db8fee19f43a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-27 09:33:27 -07:00
Evan Lowry
3a05c450ce
posture: add HealthTracker for serial number retrieval (#19181)
Device posture checking can fail while enabled if tailscaled does not
have access to smbios. Previously, this was only observable by looking
in the tailscaled logs.

Fixes tailscale/corp#39314

Signed-off-by: Evan Lowry <evan@tailscale.com>
2026-04-25 15:42:47 -03:00
kari-ts
aa740cb393
ipnlocal/drive: reduce noisey per-peer remote logs (#19493)
This drops the per peer "appending remote" log while constructing the remote list, which can get noisy on big tailnets, and keeps logs around remote availability checks, including whether a peer is missing, offline, lacks PeerAPI reachability, lacks sharing permission, or is available.

Updates tailscale/corp#40580

Signed-off-by: kari-ts <kari@tailscale.com>
2026-04-24 08:26:33 -07:00
James 'zofrex' Sanderson
36f094ea3b
ipn/ipnlocal: deflake TestStateMachine{,Seamless} (#19475)
Remove the remaining known sources of flakiness in TestStateMachine and
TestStateMachineSeamless.

Updates tailscale/corp#36230
Updates #19377

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2026-04-22 10:22:47 +01:00
James 'zofrex' Sanderson
ffae275d4d
ipn/ipnlocal,tailcfg: add /debug/tka c2n endpoint (#19198)
Updates tailscale/corp#35015

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2026-04-20 16:00:03 +01:00
James 'zofrex' Sanderson
ec86f0ff93
ipn/ipnlocal: make TestStateMachine less flaky (#19434)
TestStateMachine & TestStateMachineSeamless both flake a lot asserting the
"Shutdown" call on cc after a Logout. This is because Shutdown is called on
a goroutine to avoid a deadlock if it's called while holding the
LocalBackend lock (#18052).

This fixes that cause of flakes by waiting for LocalBackend's goroutine
tracker to have no goroutines running (so the goroutine that calls Shutdown
must have finished).

This does not make TestStateMachine non-flaky because it can flake later in
the test, too: the assertion on "unpause" after clearing the netmap between
"Start4" and "Start4 -> netmap" sometimes fails.

Updates tailscale/corp#36230
Updates #19377
Updates #18052

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2026-04-20 15:58:21 +01:00
Alex Chan
cf76202aa3 ipn/ipnlocal: log the local and remote TKA HEADs during sync
Update this log message to show both the local and remote TKA HEAD; this
is useful for debugging issues on nodes that have fallen behind the
remote TKA HEAD.

Updates tailscale/corp#39455

Change-Id: Ia62ce15756180d2fbac4a898fb94d6143df08b54
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-19 16:52:48 +01:00
Scott Graham
cb5a53c424 ipn/ipnlocal: preserve b.loginFlags in auto-login cc.Login calls
LocalBackend stores loginFlags at construction so that per-instance
properties (e.g. LoginEphemeral set by tsnet.Server.Ephemeral) persist
for the session. StartLoginInteractiveAs already merges b.loginFlags
into its cc.Login call, but the two auto-login call sites pass bare
controlclient.LoginDefault, silently dropping any stored flags.

Merge b.loginFlags at both auto-login call sites to match the existing
StartLoginInteractiveAs pattern. LoginDefault is zero so this is a
no-op when loginFlags is empty, and restores the documented behavior
when it isn't.

Fixes #15852

Signed-off-by: Scott Graham <scott.github@h4ck3r.net>
2026-04-17 23:31:18 -05:00
Michael Ben-Ami
1dc08f4d41 appc,feature/conn25: prevent clients from forwarding DNS requests and
modifying DNS responses for domains they are also connectors for

For Connectors 2025, determine if a client is configured as a
connector and what domains it is a connector for. When acting as a
client, don't install Split DNS routes to other connectors for those
domains, and don't alter DNS responses for those domains. The responses
are forwarded back to the original client, which in turn does the alteration,
swapping the real IP for a Magic IP.

A client is also a connector for a domain if it has tags that overlap
with tags in the configured policy, and --advertise-connector=true
in the prefs (not in the self-node Hostinfo from the netmap). We use the prefs
as the source of truth because control only gets a copy from the prefs, and
may drift. And the AppConnector field is currently zeroed out in the
self-node Hostinfo from control.

The extension adds a ProfileStateChange hook to process prefs changes,
and the config type is split into prefs and nodeview sub-configs.

Fixes tailscale/corp#39317

Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
2026-04-16 09:41:54 -04:00
Alex Chan
4f47c3c93d ipn/ipnlocal: log AUM hash on startup as base32, not hex
Before:

    tka initialized at head 325557575a59525354484e4a534f494b4c4e56575435583737564b5036584c4d4c335534554255344c344c36484c5a444a323341

After:

    tka initialized at head 2UWWZYRSTHNJSOIKLNVWT5X77VKP6XLML3U4UBU4L4L6HLZDJ23A

Printing the AUM hash as hex makes it difficult to compare to other AUM
hashes; stringifying it will make it consistent with other printing.

Updates #cleanup

Change-Id: Ic1e23a9ce6a71a53cff7d2190f9fa06eb838ab89
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-16 13:45:29 +01:00
M. J. Fromberger
1e4934659b
ipn/ipnlocal: discard cached netmaps upon panic during SetNetworkMap (#19414)
For debugging purposes, unstable builds will sometimes intentionally panic for
unexpected behaviours. We observed such a panic after loading a cached netmap,
but because we had a valid cached map, the client was unable to recover on its
own and the operator had to manually reset the cache.

As a defensive hedge, when netmap caching is enabled, check for a panic during
installation of a net network map: If one occurs, discard any cached netmaps
before letting the panic unwind, so that we do not lose the panic itself, but
reduce the need for manual intervention.

Updates #12639
Updates tailscale/corp#27300

Change-Id: I0436889c6bdc2fa728c9cb83630cd7b00a72ce68
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-04-15 11:07:42 -07:00
Naman Sood
6301a6ce4b
util/linuxfw,wgengine/router: allow incoming CGNAT range traffic with nodeattr
Clients with the newly added node attribute
`"disable-linux-cgnat-drop-rule"` will not automatically drop inbound
traffic on non-Tailscale network interfaces with the source IP in the
CGNAT IP range. This is an initial proof-of-concept for enabling
connectivity with off-Tailnet CGNAT endpoints.

Fixes tailscale/corp#36270.

Signed-off-by: Naman Sood <mail@nsood.in>
2026-04-14 16:45:06 -04:00
Brad Fitzpatrick
9fbe4b3ed2 all: fix six tests that failed with -count=2
Avery found a bunch of tests that fail with -count=2.

Updates tailscale/corp#40176 (tracks making our CI detect them)

Change-Id: Ie3e4398070dd92e4fe0146badddf1254749cca20
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Co-authored-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13 18:52:57 -07:00
Brad Fitzpatrick
5a7ef4a533 ipn/ipnlocal: mark TestStateMachineSeamless as flaky
Updates #19377

Change-Id: I7dbf5b954effbfa821339e79d02d8a6e46d2862a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-13 15:19:42 -07:00
Adriano Sela Aviles
21880457eb ipn/localapi,client/local: add services over localapi
Updates tailscale/corp#40052

Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
2026-04-13 11:47:23 -07:00
Alex Chan
1ff369a261 tka: keep the CompactionDefaults alongside the other limits
Updates #cleanup

Change-Id: Ib5e481d5a9c7ec7ac3e6b3913909ab1bf21d7a4d
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-10 16:06:23 +01:00
Jonathan Nobels
03c3551ee5
ipn/ipnlocal: add netmap mutations to the ipn bus (#19120)
ipn/local: add netmap mutations to the ipn bus

updates tailscale/tailscale#1909

This adds a new new NotifyWatchOpt that allows watchers to
receive PeerChange events (derived from node mutations)
on the IPN bus in lieu of a complete netmap.  We'll continue
to send the full netmap for any map response that includes it,
but for  mutations, sending PeerChange events gives the client
the option to manage it's own models more selectively and cuts
way down on json serialization overhead.

On chatty tailnets, this will vastly reduce the amount of
chatter on the bus.

This change should be backwards compatible, it is
purely additive.  Clients that subscribe to NotifyNetmap will
get the full netmap for every delta.  New clients can
omit that and instead opt into NotifyPeerChanges.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2026-04-09 15:45:41 -04:00
Brad Fitzpatrick
a182b864ac tsd, all: add Sys.ExtraRootCAs, plumb through TLS dial paths
Add ExtraRootCAs *x509.CertPool to tsd.System and plumb it through
the control client, noise transport, DERP, and wgengine layers so
that platforms like Android can inject user-installed CA certificates
into Go's TLS verification.

tlsdial.Config now honors base.RootCAs as additional trusted roots,
tried after system roots and before the baked-in LetsEncrypt fallback.
SetConfigExpectedCert gets the same treatment for domain-fronted DERP.

The Android client will set sys.ExtraRootCAs with a pool built from
x509.SystemCertPool + user-installed certs obtained via the Android
KeyStore API, replacing the current SSL_CERT_DIR environment variable
approach.

Updates #8085

Change-Id: Iecce0fd140cd5aa0331b124e55a7045e24d8e0c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-07 18:10:54 -07:00
Nick Khyl
1f84729908 ipn/desktop: use runtime.Pinner to force heap-allocation of msg
GetMessage can call back into Go, triggering stack growth and causing the stack
to be copied to a new memory region, which invalidates the original stack pointer
passed to the syscall. Since GetMessage uses that pointer to write the message
before returning, this leads to memory corruption.

In this PR, we fix this by using runtime.Pinner, which requires the pointer to refer
to heap-allocated memory.

Fixes #19263
Fixes #17832

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2026-04-07 12:55:11 -05:00
Brad Fitzpatrick
1b5b43787c ipn/localapi, cli, clientmetric: add ipnbus feature tag; fix omit.go stub
Add a new "ipnbus" build feature tag so the watch-ipn-bus LocalAPI
endpoint can be independently controlled, rather than being gated
behind HasDebug || HasServe. Minimal/embedded builds that omit both
debug and serve were getting 404s on watch-ipn-bus, breaking
"tailscale up --authkey=..." and other CLI flows that depend on
WatchIPNBus.

In the CLI, check buildfeatures.HasIPNBus before attempting to watch
the IPN bus in "tailscale up"/"tailscale login", and exit early with
an informational message when the feature is omitted.

Also add the missing NewCounterFunc stub to clientmetric/omit.go,
which caused compilation errors when building with
ts_omit_clientmetrics and netstack enabled.

Fixes #19240

Change-Id: I2e3c69a72fc50fa02542b91b8a54859618a463d1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-07 10:22:37 -07:00
James Tucker
21695cdbf8 ipn/ipnlocal,net/netmon: make frequent darkwake more efficient
Investigating battery costs on a busy tailnet I noticed a large number
of nodes regularly reconnecting to control and DERP. In one case I was
able to analyze closely `pmset` reported the every-minute wake-ups being
triggered by bluetooth. The node was by side effect reconnecting to
control constantly, and this was at times visible to peers as well.

Three changes here improve the situation:
- Short time jumps (less than 10 minutes) no longer produce "major
  network change" events, and so do not trigger full rebind/reconnect.
- Many "incidental" fields on interfaces are ignored, like MTU, flags
  and so on - if the route is still good, the rest should be manageable.
- Additional log output will provide more detail about the cause of
  major network change events.

Updates #3363

Signed-off-by: James Tucker <james@tailscale.com>
2026-04-06 15:46:51 -07:00
Brad Fitzpatrick
5a899e406d ipn/ipnlocal: add health.Tracker to tests where it was warning in CI
To denoise log output, to make it easier to find real failures.

Updates #19252

Change-Id: Iae64a9278c70de24a236c39e3d181a509a512a0b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-05 20:20:07 -07:00
Brad Fitzpatrick
5ef3713c9f cmd/vet: add subtestnames analyzer; fix all existing violations
Add a new vet analyzer that checks t.Run subtest names don't contain
characters requiring quoting when re-running via "go test -run". This
enforces the style guide rule: don't use spaces or punctuation in
subtest names.

The analyzer flags:
- Direct t.Run calls with string literal names containing spaces,
  regex metacharacters, quotes, or other problematic characters
- Table-driven t.Run(tt.name, ...) calls where tt ranges over a
  slice/map literal with bad name field values

Also fix all 978 existing violations across 81 test files, replacing
spaces with hyphens and shortening long sentence-like names to concise
hyphenated forms.

Updates #19242

Change-Id: Ib0ad96a111bd8e764582d1d4902fe2599454ab65
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-05 15:52:51 -07:00
Harry Harpham
7ddbd84171 ipn/ipnlocal: ensure TestServeUnixSocket actually serves a Unix socket
The test sets up an HTTP-over-Unix server and a reverse proxy pointed at
this server, but prior to this change did not round-trip anything to the
backing server. This change ensures that we test code paths which proxy
Unix sockets for serve.

Fixes #19232
Signed-off-by: Harry Harpham <harry@tailscale.com>
2026-04-03 14:15:15 -06:00
M. J. Fromberger
eaa5d9df4b
client,cmd/tailscale,ipn/{ipnlocal,localapi}: add debug CLI command to clear netmap caches (#19213)
This is a follow-up to #19117, adding a debug CLI command allowing the operator
to explicitly discard cached netmap data, as a safety and recovery measure.

Updates #12639

Change-Id: I5c3c47c0204754b9c8e526a4ff8f69d6974db6d0
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-04-02 12:06:39 -07:00
M. J. Fromberger
211ef67222
tailcfg,ipn/ipnlocal: regulate netmap caching via a node attribute (#19117)
Add a new tailcfg.NodeCapability (NodeAttrCacheNetworkMaps) to control whether
a node with support for caching network maps will attempt to do so. Update the
capability version to reflect this change (mainly as a safety measure, as the
control plane does not currently need to know about it).

Use the presence (or absence) of the node attribute to decide whether to create
and update a netmap cache for each profile. If caching is disabled, discard the
cached data; this allows us to use the presence of a cached netmap as an
indicator it should be used (unless explicitly overridden). Add a test that
verifies the attribute is respected. Reverse the sense of the environment knob
to be true by default, with an override to disable caching at the client
regardless what the node attribute says.

Move the creation/update of the netmap cache (when enabled) until after
successfully applying the network map, to reduce the possibility that we will
cache (and thus reuse after a restart) a network map that fails to correctly
configure the client.

Updates #12639

Change-Id: I1df4dd791fdb485c6472a9f741037db6ed20c47e
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-04-01 15:02:53 -07:00
Alex Chan
5b62f98894 ipn, cmd/tailscale/cli: allow setting FQDN sans dot as an exit node
In #10057, @seigel pointed out an inconsistency in the help text for
`exit-node list` and `set --exit-node`:

1.  Use `tailscale exit-node list`, which has a column titled "hostname"
    and tells you that you can use a hostname with `set --exit-node`:

    ```console
    $ tailscale exit-node list
     IP                  HOSTNAME                               COUNTRY            CITY                   STATUS
     100.98.193.6        linode-vps.tailfa84dd.ts.net           -                  -                      -
    […]
     100.93.242.75       ua-iev-wg-001.mullvad.ts.net           Ukraine            Kyiv                   -

    # To view the complete list of exit nodes for a country, use `tailscale exit-node list --filter=` followed by the country name.
    # To use an exit node, use `tailscale set --exit-node=` followed by the hostname or IP.
    # To have Tailscale suggest an exit node, use `tailscale exit-node suggest`.
    ```

    (This is the same format hostnames are presented in the admin
    console.)

2.  Try copy/pasting a hostname into `set --exit-node`:

    ```console
    $ tailscale set --exit-node=linode-vps.tailfa84dd.ts.net
    invalid value "linode-vps.tailfa84dd.ts.net" for --exit-node; must be IP or unique node name
    ```

3.  Note that the command allows some hostnames, if they're from nodes
    in a different tailnet:

    ```console
    $ tailscale set --exit-node= ua-iev-wg-001.mullvad.ts.net
    $ echo $?
    0
    ```

This patch addresses the inconsistency in two ways:

1.  Allow using `tailscale set --exit-node=` with an FQDN that's missing
    the trailing dot, matching the formatting used in `exit-node list`
    and the admin console.

2.  Make the description of valid exit nodes consistent across commands
    ("hostname or IP").

Updates #10057

Change-Id: If5d74f950cc1a9cc4b0ebc0c2f2d70689ffe4d73
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-01 20:42:35 +01:00
Alex Chan
4ffb92d7f6 tka: refer consistently to "DisablementValues"
This avoids putting "DisablementSecrets" in the JSON output from
`tailscale lock log`, which is potentially scary to somebody who doesn't
understand the distinction.

AUMs are stored and transmitted in CBOR-encoded format, which uses an
integer rather than a string key, so this doesn't break already-created
TKAs.

Fixes #19189

Change-Id: I15b4e81a7cef724a450bafcfa0b938da223c78c9
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-01 19:09:22 +01:00
Alex Chan
88e7330ff1 ipn,tka: improve Tailnet Lock logs
* Refer to "tailnet-lock" instead of "network-lock" in log messages
* Log keys as `tlpub:<hex>` rather than as Go structs

Updates tailscale/corp#39455
Updates tailscale/corp#37904

Change-Id: I644407d1eda029ee11027bcc949897aa4ba52787
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-01 17:08:12 +01:00
Harry Harpham
61ac021c5d wgengine/magicsock: assume network up for tests
Without this, any test relying on underlying use of magicsock will fail
without network connectivity, even when the test logic has no need for a
network connection. Tests currently in this bucket include many in
tstest/integration and in tsnet.

Further explanation:

ipn only becomes Running when it sees at least one live peer or DERP
connection:
0cc1b2ff76/ipn/ipnlocal/local.go (L5861-L5866)

When tests only use a single node, they will never see a peer, so the
node has to wait to see a DERP server.

magicsock sets the preferred DERP server in updateNetInfo(), but this
function returns early if the network is down.
0cc1b2ff76/wgengine/magicsock/magicsock.go (L1053-L1106)

Because we're checking the real network, this prevents ipn from entering
"Running" and causes the test to fail or hang.

In tests, we can assume the network is up unless we're explicitly testing
the behaviour of tailscaled when the network is down. We do something similar
in magicsock/derp.go, where we assume we're connected to control unless
explicitly testing otherwise:
7d2101f352/wgengine/magicsock/derp.go (L166-L177)

This is the template for the changes to `networkDown()`.

Fixes #17122

Co-authored-by: Alex Chan <alexc@tailscale.com>
Signed-off-by: Harry Harpham <harry@tailscale.com>
2026-03-31 09:57:14 -06:00
Claus Lensbøl
bf467727fc
control/controlclient,ipn/ipnlocal,wgengine: avoid restarting wireguard when key is learned via tsmp (#19142)
When disco keys are learned on a node that is connected to control and
has a mapSession, wgengine will see the key as having changed, and
assume that any existing connections will need to be reset.

For keys learned via TSMP, the connection should not be reset as that
key is learned via an active wireguard connection. If wgengine resets
that connetion, a 15s timeout will occur.

This change adds a map to track new keys coming in via TSMP, and removes
them from the list of keys that needs to trigger wireguard resets. This
is done with an interface chain from controlclient down via localBackend
to userspaceEngine via the watchdog.

Once a key has been actively used for preventing a wireguard reset, the
key is removed from the map.

If mapSession becomes a long lived process instead of being dependent on
having a connection to control. This interface chain can be removed, and
the event sequence from wrap->controlClient->userspaceEngine, can be
changed to wrap->userspaceEngine->controlClient as we know the map will
not be gunked up with stale TSMP entries.

Updates #12639

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-03-30 14:26:08 -04:00
KevinLiang10
45f989f52a
ipn/ipnlocal: warn incompatibility between no-snat-routes and exitnode (#19023)
* ipn/ipnlocal: warn incompatibility between no-snat-routes and exitnode

This commit adds a warning to health check when the --snat-subnet-routes=false flag for subnet router is
set alone side --advertise-exit-node=true. These two would conflict with each other and result internet-bound
traffic from peers using this exit node no masqueraded to the node's source IP and fail to route return
packets back. The described combination is not valid until we figure out a way to separate exitnode masquerade rule and skip it for subnet routes.

Updates #18725

Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>

* use date instead of for now to clarify effectivness

Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>

---------

Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
2026-03-26 12:36:31 -04:00
Fran Bull
2d5962f524 feature/conn25,ipn/ipnext,ipn/ipnlocal: add ExtraRouterConfigRoutes hook
conn25 needs to add routes to the operating system to direct handling
of the addresses in the magic IP range to the tailscale0 TUN and
tailscaled.

The way we do this for exit nodes and VIP services is that we add routes
to the Routes field of router.Config, and then the config is passed to
the WireGuard engine Reconfig.

conn25 is implemented as an ipnext.Extension and so this commit adds a
hook to ipnext.Hooks to allow any extension to provide routes to the
config. The hook if provided is called in routerConfigLocked, similarly
to exit nodes and VIP services.

Fixes tailscale/corp#38123

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-03-25 19:28:33 -07:00
Michael Ben-Ami
a57c6457c9 ipn/ipnlocal: debounce extra enqueues in ExtensionHost.AuthReconfigAsync
Fixes tailscale/corp#39065

Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
2026-03-25 09:11:15 -04:00
rtgnx
c026be18cc
ipn/ipnserver: use peercreds for actor.Username on freebsd (for Taildrive)
Signed-off-by: Adrian Cybulski <adrian@cybulski.cc>
2026-03-24 20:35:56 -07:00
kari-ts
9992b7c817
ipn,ipn/local: broadcast ClientVersion if AutoUpdate.Check (#19107)
If AutoUpdate.Check is false, the client has opted out of checking for updates, so we shouldn't broadcast ClientVersion. If the client has opted in, it should be included in the initial Notify.

Updates tailscale/corp#32629

Signed-off-by: kari-ts <kari@tailscale.com>
2026-03-24 15:06:20 -07:00
KevinLiang10
1e51d57cdd
ipn: fix the typo causing NoSNAT always set to true (#19110)
Fixes #19109

Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
2026-03-24 16:41:58 -04:00
Alex Chan
302e49dc4e cmd/tailscale/cli: add a debug command to print the statedir
Example:

```console
$ tailscale debug statedir
/tmp/ts/node1
```

Updates #18019

Change-Id: I7c93c94179bd7b56d0fa8fe57a9129df05c2c1df
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-03-24 15:16:43 +00:00
Amal Bansode
04ef9d80b5
ipn/ipnlocal: add a map for node public key to node ID lookups (#19051)
This path is currently only used by DERP servers that have also
enabled `verify-clients` to ensure that only authorized clients
within a Tailnet are allowed to use said DERP server.

The previous naive linear scan in NodeByKey would almost
certainly lead to bad outcomes with a large enough netmap, so
address an existing todo by building a map of node key -> node ID.

Updates #19042

Signed-off-by: Amal Bansode <amal@tailscale.com>
2026-03-23 10:23:28 -07:00
Nahum Shalman
1d6ecb1e51
safesocket, ipn/ipnserver: use PeerCreds on solaris and illumos
Updates tailscale/peercred#10

Signed-off-by: Nahum Shalman <nahamu@gmail.com>
2026-03-23 07:45:35 -07:00
Michael Ben-Ami
ea7040eea2 ipn/{ipnext,ipnlocal}: expose authReconfig in ipnext.Host as AuthReconfigAsync
Also implement a limit of one on the number of goroutines that can be
waiting to do a reconfig via AuthReconfig, to prevent extensions from
calling too fast and taxing resources.

Even with the protection, the new method should only be used in
experimental or proof-of-concept contexts. The current intended use is
for an extension to be able force a reconfiguration of WireGuard, and
have the reconfiguration call back into the extension for extra Allowed
IPs.

If in the future if WireGuard is able to reconfigure individual peers more
dynamically, an extension might be able to hook into that process, and
this method on ipnext.Host may be deprecated.

Fixes tailscale/corp#38120
Updates tailscale/corp#38124
Updates tailscale/corp#38125

Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
2026-03-20 17:29:11 -04:00