10 Commits

Author SHA1 Message Date
Brad Fitzpatrick
159cf8707a ipn/ipnlocal, all: split LocalBackend.NetMap into NetMapNoPeers / NetMapWithPeers
Add two narrower accessors alongside the existing
[LocalBackend.NetMap], with docs that distinguish their semantics:

  - NetMapNoPeers: cheap (returns the cached *netmap.NetworkMap with
    a possibly-stale Peers slice). For callers that only read non-Peers
    fields like SelfNode, DNS, PacketFilter, capabilities.
  - NetMapWithPeers: documented as returning an up-to-date Peers slice.
    For callers that genuinely need to iterate Peers or call
    PeerByXxx.

Mark the existing NetMap deprecated and point readers at the two new
accessors. NetMap, NetMapNoPeers, and NetMapWithPeers all currently
return the same value (b.currentNode().NetMap()): this commit is a
no-op behaviorally, just a renaming and migration of in-tree callers.
A subsequent change in the same series will switch
NetMapWithPeers to actually rebuild the Peers slice from the live
per-node-backend peers map (O(N) per call), at which point the
distinction between the two new accessors becomes load-bearing.

Migrate in-tree callers to the appropriate accessor based on what
fields they read:

  - NetMapNoPeers (most common): localapi handlers, peerapi accept,
    GetCertPEMWithValidity, web client noise request, doctor DNS
    resolver check, tsnet CertDomains/TailscaleIPs, ssh/tailssh
    SSH-policy/cap reads, several LocalBackend internals
    (isLocalIP, allowExitNodeDNSProxyToServeName, pauseForNetwork
    nil-check, serve config).
  - NetMapWithPeers: writeNetmapToDiskLocked (persist full netmap to
    disk for fast restart), PeerByTailscaleIP lookup.

Tests still call the legacy NetMap; they'll see the deprecation
warning but otherwise behave identically.

Also add two pieces of plumbing the next change in this series will
need, but which are already useful on their own:

  - [client/local.GetDebugResultJSON]: a generic [Client.DebugResultJSON]
    that decodes directly into a target type T, avoiding the
    marshal/unmarshal roundtrip callers otherwise need.
  - localapi "current-netmap" debug action: returns the current
    netmap (with peers) as JSON. Documented as debug-only — the
    netmap.NetworkMap shape is internal and may change without notice.

This commit is part of a series breaking up a larger change for
review; on its own it is a no-op refactor.

Updates #12542

Change-Id: Idbb30707414f8da3149c44ca0273262708375b02
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-30 11:14:06 -07:00
Brad Fitzpatrick
15cba0a3f6 tstest/natlab/vmtest: add TestDiscoKeyChange
Add a vmtest that brings up two gokrazy nodes A and B behind two
One2OneNAT networks (so direct UDP works in both directions and any
slowness can't be blamed on NAT traversal), establishes a WireGuard
tunnel A → B with TSMP, then rotates B's disco key four times and
asserts that the data plane recovers in both directions after each
rotation. All pings are TSMP (the data-plane ping; disco pings would
not exercise the WireGuard tunnel itself).

The five pings:

  1. A → B  (initial; brings up the tunnel; 30s budget)
  2. B → A  after rotate (LocalAPI rotate-disco-key debug action)
  3. A → B  after rotate (LocalAPI)
  4. B → A  after restart (SIGKILL; gokrazy supervisor respawns)
  5. A → B  after restart (SIGKILL)

Each post-rotation ping gets a 15-second budget. Two unavoidable
multi-second waits dominate today:

  - The rotate-then-a→b phase takes ~10s on main because of LazyWG.
    After B's WantRunning bounce, B's wgengine resets its
    sentActivityAt/recvActivityAt maps and trims A out of the
    wireguard-go config as an "idle peer"; B only re-adds A on
    inbound activity, by which point A's first few TSMP packets
    have been silently dropped at B's tundev. The
    bradfitz/rm_lazy_wg branch removes that trimming entirely
    (verified locally: this phase drops to <100ms there).

  - The restart phases take ~5s for wireguard-go's RekeyTimeout
    handshake retry. After SIGKILL+respawn the first WG handshake
    init from the restarted node sometimes goes into the void
    (likely the brief peer-removed window in the receiver's
    two-step maybeReconfigWireguardLocked reconfig during which
    the peer is absent from wireguard-go), and wg-go's 5s+jitter
    retransmit timer is the next opportunity to retry. That retry
    succeeds and the staged TSMP packet flushes. Intrinsic to the
    protocol's retransmit policy.

Once LazyWG is removed and the first-handshake-after-reconfig race
is fixed, the budget should drop to 5s.

Supporting changes:

  ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and
  back on after rotating the disco key. magicsock.Conn.RotateDiscoKey
  only resets local disco state; without also dropping wireguard-go
  session keys, peers keep encrypting with their stale per-peer
  session against us until their rekey timer fires (WireGuard has no
  data-plane signaling to invalidate sessions). Bouncing WantRunning
  runs the engine through Reconfig(empty) → authReconfig, which
  drops every peer's WG session so the next packet either way
  triggers a fresh handshake.

  ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys"
  LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns
  a map[NodePublic]DiscoPublic from the current netmap. Tests reach
  it via [local.Client.DebugResultJSON]. We do not surface disco
  keys via [ipnstate.PeerStatus] because adding a non-comparable
  [key.DiscoPublic] field there breaks reflect-based test helpers
  (e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and
  general LocalAPI clients have no need for disco keys. Since the
  debug LocalAPI is gated behind the ts_omit_debug build tag, this
  endpoint is automatically stripped from small binaries.

  cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk)
  to drive the SIGKILL phase. On gokrazy the supervisor respawns
  tailscaled within a second.

  tstest/integration/testcontrol: add Server.AllOnline. When set,
  every peer entry in MapResponses is marked Online=true. Several
  disco-key handling fast paths in controlclient and wgengine
  (removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFull
  NetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire
  for online peers; without this flag, tests exercising disco-key
  rotation only hit the offline-peer code paths, which mask issues
  and are several seconds slower in this scenario. Finer-grained
  per-node online tracking can be added later.

  tstest/natlab/vmtest: add Env.RotateDiscoKey,
  Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an
  [AllOnline] EnvOption that plumbs through to
  testcontrol.Server.AllOnline, and an exported
  Env.Ping(from, to, type, timeout). Ping replaces the unexported
  helper so callers can specify both a ping type (PingDisco for
  warming peer state, PingTSMP for asserting end-to-end
  connectivity) and a deadline. PeerDiscoKey returns its LocalAPI
  error so callers inside tstest.WaitFor can retry transient
  failures rather than fataling the test.

Updates #12639
Updates #13038

Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 12:58:00 -07:00
M. J. Fromberger
eaa5d9df4b
client,cmd/tailscale,ipn/{ipnlocal,localapi}: add debug CLI command to clear netmap caches (#19213)
This is a follow-up to #19117, adding a debug CLI command allowing the operator
to explicitly discard cached netmap data, as a safety and recovery measure.

Updates #12639

Change-Id: I5c3c47c0204754b9c8e526a4ff8f69d6974db6d0
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-04-02 12:06:39 -07:00
Alex Chan
302e49dc4e cmd/tailscale/cli: add a debug command to print the statedir
Example:

```console
$ tailscale debug statedir
/tmp/ts/node1
```

Updates #18019

Change-Id: I7c93c94179bd7b56d0fa8fe57a9129df05c2c1df
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-03-24 15:16:43 +00:00
Brad Fitzpatrick
bd2a2d53d3 all: use Go 1.26 things, run most gofix modernizers
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.

Some of it's older "for x := range 123".

Also: errors.AsType, any, fmt.Appendf, etc.

Updates #18682

Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-06 13:32:03 -08:00
James Tucker
fe69b7f0e5 cmd/tailscale: add event bus queue depth debugging
Under extremely high load it appears we may have some retention issues
as a result of queue depth build up, but there is currently no direct
way to observe this. The scenario does not trigger the slow subscriber
log message, and the event stream debugging endpoint produces a
saturating volume of information.

Updates tailscale/corp#36904

Signed-off-by: James Tucker <james@tailscale.com>
2026-02-06 10:46:29 -08:00
Will Norris
3ec5be3f51 all: remove AUTHORS file and references to it
This file was never truly necessary and has never actually been used in
the history of Tailscale's open source releases.

A Brief History of AUTHORS files
---

The AUTHORS file was a pattern developed at Google, originally for
Chromium, then adopted by Go and a bunch of other projects. The problem
was that Chromium originally had a copyright line only recognizing
Google as the copyright holder. Because Google (and most open source
projects) do not require copyright assignemnt for contributions, each
contributor maintains their copyright. Some large corporate contributors
then tried to add their own name to the copyright line in the LICENSE
file or in file headers. This quickly becomes unwieldy, and puts a
tremendous burden on anyone building on top of Chromium, since the
license requires that they keep all copyright lines intact.

The compromise was to create an AUTHORS file that would list all of the
copyright holders. The LICENSE file and source file headers would then
include that list by reference, listing the copyright holder as "The
Chromium Authors".

This also become cumbersome to simply keep the file up to date with a
high rate of new contributors. Plus it's not always obvious who the
copyright holder is. Sometimes it is the individual making the
contribution, but many times it may be their employer. There is no way
for the proejct maintainer to know.

Eventually, Google changed their policy to no longer recommend trying to
keep the AUTHORS file up to date proactively, and instead to only add to
it when requested: https://opensource.google/docs/releasing/authors.
They are also clear that:

> Adding contributors to the AUTHORS file is entirely within the
> project's discretion and has no implications for copyright ownership.

It was primarily added to appease a small number of large contributors
that insisted that they be recognized as copyright holders (which was
entirely their right to do). But it's not truly necessary, and not even
the most accurate way of identifying contributors and/or copyright
holders.

In practice, we've never added anyone to our AUTHORS file. It only lists
Tailscale, so it's not really serving any purpose. It also causes
confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header
in other open source repos which don't actually have an AUTHORS file, so
it's ambiguous what that means.

Instead, we just acknowledge that the contributors to Tailscale (whoever
they are) are copyright holders for their individual contributions. We
also have the benefit of using the DCO (developercertificate.org) which
provides some additional certification of their right to make the
contribution.

The source file changes were purely mechanical with:

    git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g'

Updates #cleanup

Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
2026-01-23 15:49:45 -08:00
James Tucker
c09c95ef67 types/key,wgengine/magicsock,control/controlclient,ipn: add debug disco key rotation
Adds the ability to rotate discovery keys on running clients, needed for
testing upcoming disco key distribution changes.

Introduces key.DiscoKey, an atomic container for a disco private key,
public key, and the public key's ShortString, replacing the prior
separate atomic fields.

magicsock.Conn has a new RotateDiscoKey method, and access to this is
provided via localapi and a CLI debug command.

Note that this implementation is primarily for testing as it stands, and
regular use should likely introduce an additional mechanism that allows
the old key to be used for some time, to provide a seamless key rotation
rather than one that invalidates all sessions.

Updates tailscale/corp#34037

Signed-off-by: James Tucker <james@tailscale.com>
2025-11-18 12:16:15 -08:00
Brad Fitzpatrick
edb11e0e60 wgengine/magicsock: fix js/wasm crash regression loading non-existent portmapper
Thanks for the report, @Need-an-AwP!

Fixes #17681
Updates #9394

Change-Id: I2e0b722ef9b460bd7e79499192d1a315504ca84c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-28 08:59:00 -07:00
Brad Fitzpatrick
ee034d48fc feature/featuretags: add a catch-all "Debug" feature flag
Saves 168 KB.

Updates #12614

Change-Id: Iaab3ae3efc6ddc7da39629ef13e5ec44976952ba
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-30 11:32:33 -07:00