Add a narrow LocalAPI accessor and matching client/LocalBackend method
to look up a single peer's current full [tailcfg.Node] by NodeID, in
O(1) time on the daemon side, without fetching the entire netmap.
Useful for callers that need the latest state of a single peer (e.g.
in response to a peer-mutation event on the IPN bus) without paying
for a full netmap fetch.
Updates #12542
Change-Id: I1cb2d350e6ad846a5dabc1f5368dfc8121387f7c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add two narrow LocalAPI accessors so callers don't have to subscribe to
the IPN bus and pull a full *netmap.NetworkMap just to read DNS-shaped
fields:
- GET /localapi/v0/cert-domains returns DNS.CertDomains.
- GET /localapi/v0/dns-config returns the full tailcfg.DNSConfig.
Migrate in-tree callers off the netmap-on-the-bus pattern:
- kube/certs.waitForCertDomain still wakes on the IPN bus but now
queries CertDomains via LocalClient.CertDomains rather than
reading n.NetMap.DNS.CertDomains. The kube LocalClient interface
and FakeLocalClient gain a CertDomains method.
- cmd/tailscale dns status calls LocalClient.DNSConfig directly
instead of opening a NotifyInitialNetMap watcher.
- cmd/tailscale configure kubeconfig switches from a netmap watcher
+ serviceDNSRecordFromNetMap to LocalClient.DNSConfig +
serviceDNSRecordFromDNSConfig.
This is part of a series moving callers away from depending on the
netmap traveling on the IPN bus, so the bus payload can shrink in a
later change.
Updates #12542
Change-Id: Ie10204e141d085fbac183b4cfe497226b670ad6c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add two narrower accessors alongside the existing
[LocalBackend.NetMap], with docs that distinguish their semantics:
- NetMapNoPeers: cheap (returns the cached *netmap.NetworkMap with
a possibly-stale Peers slice). For callers that only read non-Peers
fields like SelfNode, DNS, PacketFilter, capabilities.
- NetMapWithPeers: documented as returning an up-to-date Peers slice.
For callers that genuinely need to iterate Peers or call
PeerByXxx.
Mark the existing NetMap deprecated and point readers at the two new
accessors. NetMap, NetMapNoPeers, and NetMapWithPeers all currently
return the same value (b.currentNode().NetMap()): this commit is a
no-op behaviorally, just a renaming and migration of in-tree callers.
A subsequent change in the same series will switch
NetMapWithPeers to actually rebuild the Peers slice from the live
per-node-backend peers map (O(N) per call), at which point the
distinction between the two new accessors becomes load-bearing.
Migrate in-tree callers to the appropriate accessor based on what
fields they read:
- NetMapNoPeers (most common): localapi handlers, peerapi accept,
GetCertPEMWithValidity, web client noise request, doctor DNS
resolver check, tsnet CertDomains/TailscaleIPs, ssh/tailssh
SSH-policy/cap reads, several LocalBackend internals
(isLocalIP, allowExitNodeDNSProxyToServeName, pauseForNetwork
nil-check, serve config).
- NetMapWithPeers: writeNetmapToDiskLocked (persist full netmap to
disk for fast restart), PeerByTailscaleIP lookup.
Tests still call the legacy NetMap; they'll see the deprecation
warning but otherwise behave identically.
Also add two pieces of plumbing the next change in this series will
need, but which are already useful on their own:
- [client/local.GetDebugResultJSON]: a generic [Client.DebugResultJSON]
that decodes directly into a target type T, avoiding the
marshal/unmarshal roundtrip callers otherwise need.
- localapi "current-netmap" debug action: returns the current
netmap (with peers) as JSON. Documented as debug-only — the
netmap.NetworkMap shape is internal and may change without notice.
This commit is part of a series breaking up a larger change for
review; on its own it is a no-op refactor.
Updates #12542
Change-Id: Idbb30707414f8da3149c44ca0273262708375b02
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a vmtest that brings up two gokrazy nodes A and B behind two
One2OneNAT networks (so direct UDP works in both directions and any
slowness can't be blamed on NAT traversal), establishes a WireGuard
tunnel A → B with TSMP, then rotates B's disco key four times and
asserts that the data plane recovers in both directions after each
rotation. All pings are TSMP (the data-plane ping; disco pings would
not exercise the WireGuard tunnel itself).
The five pings:
1. A → B (initial; brings up the tunnel; 30s budget)
2. B → A after rotate (LocalAPI rotate-disco-key debug action)
3. A → B after rotate (LocalAPI)
4. B → A after restart (SIGKILL; gokrazy supervisor respawns)
5. A → B after restart (SIGKILL)
Each post-rotation ping gets a 15-second budget. Two unavoidable
multi-second waits dominate today:
- The rotate-then-a→b phase takes ~10s on main because of LazyWG.
After B's WantRunning bounce, B's wgengine resets its
sentActivityAt/recvActivityAt maps and trims A out of the
wireguard-go config as an "idle peer"; B only re-adds A on
inbound activity, by which point A's first few TSMP packets
have been silently dropped at B's tundev. The
bradfitz/rm_lazy_wg branch removes that trimming entirely
(verified locally: this phase drops to <100ms there).
- The restart phases take ~5s for wireguard-go's RekeyTimeout
handshake retry. After SIGKILL+respawn the first WG handshake
init from the restarted node sometimes goes into the void
(likely the brief peer-removed window in the receiver's
two-step maybeReconfigWireguardLocked reconfig during which
the peer is absent from wireguard-go), and wg-go's 5s+jitter
retransmit timer is the next opportunity to retry. That retry
succeeds and the staged TSMP packet flushes. Intrinsic to the
protocol's retransmit policy.
Once LazyWG is removed and the first-handshake-after-reconfig race
is fixed, the budget should drop to 5s.
Supporting changes:
ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and
back on after rotating the disco key. magicsock.Conn.RotateDiscoKey
only resets local disco state; without also dropping wireguard-go
session keys, peers keep encrypting with their stale per-peer
session against us until their rekey timer fires (WireGuard has no
data-plane signaling to invalidate sessions). Bouncing WantRunning
runs the engine through Reconfig(empty) → authReconfig, which
drops every peer's WG session so the next packet either way
triggers a fresh handshake.
ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys"
LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns
a map[NodePublic]DiscoPublic from the current netmap. Tests reach
it via [local.Client.DebugResultJSON]. We do not surface disco
keys via [ipnstate.PeerStatus] because adding a non-comparable
[key.DiscoPublic] field there breaks reflect-based test helpers
(e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and
general LocalAPI clients have no need for disco keys. Since the
debug LocalAPI is gated behind the ts_omit_debug build tag, this
endpoint is automatically stripped from small binaries.
cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk)
to drive the SIGKILL phase. On gokrazy the supervisor respawns
tailscaled within a second.
tstest/integration/testcontrol: add Server.AllOnline. When set,
every peer entry in MapResponses is marked Online=true. Several
disco-key handling fast paths in controlclient and wgengine
(removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFull
NetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire
for online peers; without this flag, tests exercising disco-key
rotation only hit the offline-peer code paths, which mask issues
and are several seconds slower in this scenario. Finer-grained
per-node online tracking can be added later.
tstest/natlab/vmtest: add Env.RotateDiscoKey,
Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an
[AllOnline] EnvOption that plumbs through to
testcontrol.Server.AllOnline, and an exported
Env.Ping(from, to, type, timeout). Ping replaces the unexported
helper so callers can specify both a ping type (PingDisco for
warming peer state, PingTSMP for asserting end-to-end
connectivity) and a deadline. PeerDiscoKey returns its LocalAPI
error so callers inside tstest.WaitFor can retry transient
failures rather than fataling the test.
Updates #12639
Updates #13038
Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a tsdial.Dialer.UserDialPlan method that resolves an address and
reports whether the dialer would route it via Tailscale. The LocalAPI
/dial handler now uses this to skip proxying for addresses that aren't
Tailscale routes (e.g. localhost), returning a Dial-Self response with
the resolved address so the client can dial it directly. This avoids
an unnecessary round-trip through the daemon for local connections.
The client's UserDial handles the new response by dialing the resolved
address itself, and the server passes the pre-resolved IP:port for
Tailscale dials to avoid redundant DNS lookups.
Thanks to giacomo and Moyao for pointing this out!
Updates tailscale/corp#39702
Change-Id: I78d640f11ccd92f43ddd505cbb0db8fee19f43a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a new "ipnbus" build feature tag so the watch-ipn-bus LocalAPI
endpoint can be independently controlled, rather than being gated
behind HasDebug || HasServe. Minimal/embedded builds that omit both
debug and serve were getting 404s on watch-ipn-bus, breaking
"tailscale up --authkey=..." and other CLI flows that depend on
WatchIPNBus.
In the CLI, check buildfeatures.HasIPNBus before attempting to watch
the IPN bus in "tailscale up"/"tailscale login", and exit early with
an informational message when the feature is omitted.
Also add the missing NewCounterFunc stub to clientmetric/omit.go,
which caused compilation errors when building with
ts_omit_clientmetrics and netstack enabled.
Fixes#19240
Change-Id: I2e3c69a72fc50fa02542b91b8a54859618a463d1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is a follow-up to #19117, adding a debug CLI command allowing the operator
to explicitly discard cached netmap data, as a safety and recovery measure.
Updates #12639
Change-Id: I5c3c47c0204754b9c8e526a4ff8f69d6974db6d0
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
* Refer to "tailnet-lock" instead of "network-lock" in log messages
* Log keys as `tlpub:<hex>` rather than as Go structs
Updates tailscale/corp#39455
Updates tailscale/corp#37904
Change-Id: I644407d1eda029ee11027bcc949897aa4ba52787
Signed-off-by: Alex Chan <alexc@tailscale.com>
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.
Some of it's older "for x := range 123".
Also: errors.AsType, any, fmt.Appendf, etc.
Updates #18682
Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
PR #18860 adds firewall rules in the mangle table to save outbound packet
marks to conntrack and restore them on reply packets before the routing
decision. When reply packets have their marks restored, the kernel uses
the correct routing table (based on the mark) and the packets pass the
rp_filter check.
This makes the risk check and reverse path filtering warnings unnecessary.
Updates #3310Fixestailscale/corp#37846
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Under extremely high load it appears we may have some retention issues
as a result of queue depth build up, but there is currently no direct
way to observe this. The scenario does not trigger the slow subscriber
log message, and the event stream debugging endpoint produces a
saturating volume of information.
Updates tailscale/corp#36904
Signed-off-by: James Tucker <james@tailscale.com>
The Tailscale CLI has some methods to watch the IPN bus for
messages, say, the current netmap (`tailscale debug netmap`).
The Tailscale daemon supports this using a streaming HTTP
response. Sometimes, the client can close its connection
abruptly -- due to an interruption, or in the case of `debug netmap`,
intentionally after consuming one message.
If the server daemon is writing a response as the client closes
its end of the socket, the daemon typically encounters a "broken pipe"
error. The "Watch IPN Bus" handler currently logs such errors after
they're propagated by a JSON encoding/writer helper.
Since the Tailscale CLI nominally closes its socket with the daemon
in this slightly ungraceful way (viz. `debug netmap`), stop logging
these broken pipe errors as far as possible. This will help avoid
confounding users when they scan backend logs.
Updates #18477
Signed-off-by: Amal Bansode <amal@tailscale.com>
This file was never truly necessary and has never actually been used in
the history of Tailscale's open source releases.
A Brief History of AUTHORS files
---
The AUTHORS file was a pattern developed at Google, originally for
Chromium, then adopted by Go and a bunch of other projects. The problem
was that Chromium originally had a copyright line only recognizing
Google as the copyright holder. Because Google (and most open source
projects) do not require copyright assignemnt for contributions, each
contributor maintains their copyright. Some large corporate contributors
then tried to add their own name to the copyright line in the LICENSE
file or in file headers. This quickly becomes unwieldy, and puts a
tremendous burden on anyone building on top of Chromium, since the
license requires that they keep all copyright lines intact.
The compromise was to create an AUTHORS file that would list all of the
copyright holders. The LICENSE file and source file headers would then
include that list by reference, listing the copyright holder as "The
Chromium Authors".
This also become cumbersome to simply keep the file up to date with a
high rate of new contributors. Plus it's not always obvious who the
copyright holder is. Sometimes it is the individual making the
contribution, but many times it may be their employer. There is no way
for the proejct maintainer to know.
Eventually, Google changed their policy to no longer recommend trying to
keep the AUTHORS file up to date proactively, and instead to only add to
it when requested: https://opensource.google/docs/releasing/authors.
They are also clear that:
> Adding contributors to the AUTHORS file is entirely within the
> project's discretion and has no implications for copyright ownership.
It was primarily added to appease a small number of large contributors
that insisted that they be recognized as copyright holders (which was
entirely their right to do). But it's not truly necessary, and not even
the most accurate way of identifying contributors and/or copyright
holders.
In practice, we've never added anyone to our AUTHORS file. It only lists
Tailscale, so it's not really serving any purpose. It also causes
confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header
in other open source repos which don't actually have an AUTHORS file, so
it's ambiguous what that means.
Instead, we just acknowledge that the contributors to Tailscale (whoever
they are) are copyright holders for their individual contributions. We
also have the benefit of using the DCO (developercertificate.org) which
provides some additional certification of their right to make the
contribution.
The source file changes were purely mechanical with:
git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g'
Updates #cleanup
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
This change adds API to ipn.LocalBackend to retrieve the ETag when
querying for the current serve config. This allows consumers of
ipn.LocalBackend.SetServeConfig to utilize the concurrency control
offered by ETags. Previous to this change, utilizing serve config ETags
required copying the local backend's internal ETag calcuation.
The local API server was previously copying the local backend's ETag
calculation as described above. With this change, the local API server
now uses the new ETag retrieval function instead. Serve config ETags are
therefore now opaque to clients, in line with best practices.
Fixestailscale/corp#35857
Signed-off-by: Harry Harpham <harry@tailscale.com>
The existing client metric methods only support incrementing (or
decrementing) a delta value. This new method allows setting the metric
to a specific value.
Updates tailscale/corp#35327
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
With the introduction of node sealing, store.New fails in some cases due
to the TPM device being reset or unavailable. Currently it results in
tailscaled crashing at startup, which is not obvious to the user until
they check the logs.
Instead of crashing tailscaled at startup, start with an in-memory store
with a health warning about state initialization and a link to (future)
docs on what to do. When this health message is set, also block any
login attempts to avoid masking the problem with an ephemeral node
registration.
Updates #15830
Updates #17654
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
This commit enables user to set service backend to remote destinations, that can be a partial
URL or a full URL. The commit also prevents user to set remote destinations on linux system
when socket mark is not working. For user on any version of mac extension they can't serve a
service either. The socket mark usability is determined by a new local api.
Fixestailscale/corp#24783
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
Adds the ability to rotate discovery keys on running clients, needed for
testing upcoming disco key distribution changes.
Introduces key.DiscoKey, an atomic container for a disco private key,
public key, and the public key's ShortString, replacing the prior
separate atomic fields.
magicsock.Conn has a new RotateDiscoKey method, and access to this is
provided via localapi and a CLI debug command.
Note that this implementation is primarily for testing as it stands, and
regular use should likely introduce an additional mechanism that allows
the old key to be used for some time, to provide a seamless key rotation
rather than one that invalidates all sessions.
Updates tailscale/corp#34037
Signed-off-by: James Tucker <james@tailscale.com>
It's an unnecessary nuisance having it. We go out of our way to redact
it in so many places when we don't even need it there anyway.
Updates #12639
Change-Id: I5fc72e19e9cf36caeb42cf80ba430873f67167c3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Saves ~94 KB from the min build.
Updates #12614
Change-Id: I3b0b8a47f80b9fd3b1038c2834b60afa55bf02c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Saves ~102 KB from the min build.
Updates #12614
Change-Id: Ie1d4f439321267b9f98046593cb289ee3c4d6249
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Saves 262 KB so far. I'm sure I missed some places, but shotizam says
these were the low hanging fruit.
Updates #12614
Change-Id: Ia31c01b454f627e6d0470229aae4e19d615e45e3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add and wire up event publishers for these two event types in the AppConnector.
Nothing currently subscribes to them, so this is harmless. Subscribers for
these events will be added in a near-future commit.
As part of this, move the appc.RouteInfo type to the types/appctype package.
It does not contain any package-specific details from appc. Beside it, add
appctype.RouteUpdate to carry route update event state, likewise not specific
to appc. Update all usage of the appc.* types throughout to use appctype.*
instead, and update depaware files to reflect these changes.
Add a Close method to the AppConnector to make sure the client gets cleaned up
when the connector is dropped (we re-create connectors).
Update the unit tests in the appc package to also check the events published
alongside calls to the RouteAdvertiser.
For now the tests still rely on the RouteAdvertiser for correctness; this is OK
for now as the two methods are always performed together. In the near future,
we need to rework the tests so not require that, but that will require building
some more test fixtures that we can handle separately.
Updates #15160
Updates #17192
Change-Id: I184670ba2fb920e0d2cb2be7c6816259bca77afe
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Saves 442 KB. Lock it with a new min test.
Updates #12614
Change-Id: Ia7bf6f797b6cbf08ea65419ade2f359d390f8e91
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Saves 328 KB (2.5%) off the minimal binary.
For IoT devices that don't need MagicDNS (e.g. they don't make
outbound connections), this provides a knob to disable all the DNS
functionality.
Rather than a massive refactor today, this uses constant false values
as a deadcode sledgehammer, guided by shotizam to find the largest DNS
functions which survived deadcode.
A future refactor could make it so that the net/dns/resolver and
publicdns packages don't even show up in the import graph (along with
their imports) but really it's already pretty good looking with just
these consts, so it's not at the top of my list to refactor it more
soon.
Also do the same in a few places with the ACME (cert) functionality,
as I saw those while searching for DNS stuff.
Updates #12614
Change-Id: I8e459f595c2fde68ca16503ff61c8ab339871f97
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Allow the user to access information about routes an app connector has
learned, such as how many routes for each domain.
Fixestailscale/corp#32624
Signed-off-by: Fran Bull <fran@tailscale.com>
Removes 434 KB from the minimal Linux binary, or ~3%.
Primarily this comes from not linking in the zstd encoding code.
Fixes#17323
Change-Id: I0a90de307dfa1ad7422db7aa8b1b46c782bfaaf7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The dnstype package is used by tailcfg, which tries to be light and
leafy. But it brings in dnstype. So dnstype shouldn't bring in
x/net/dns/dnsmessage.
Updates #12614
Change-Id: I043637a7ce7fed097e648001f13ca1927a781def
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A customer wants to allow their employees to restart tailscaled at will, when access rights and MDM policy allow it,
as a way to fully reset client state and re-create the tunnel in case of connectivity issues.
On Windows, the main tailscaled process runs as a child of a service process. The service restarts the child
when it exits (or crashes) until the service itself is stopped. Regular (non-admin) users can't stop the service,
and allowing them to do so isn't ideal, especially in managed or multi-user environments.
In this PR, we add a LocalAPI endpoint that instructs ipnserver.Server, and by extension the tailscaled process,
to shut down. The service then restarts the child tailscaled. Shutting down tailscaled requires LocalAPI write access
and an enabled policy setting.
Updates tailscale/corp#32674
Updates tailscale/corp#32675
Signed-off-by: Nick Khyl <nickk@tailscale.com>
I'd started to do this in the earlier ts_omit_server PR but
decided to split it into this separate PR.
Updates #17128
Change-Id: Ief8823a78d1f7bbb79e64a5cab30a7d0a5d6ff4b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The Tracker was using direct callbacks to ipnlocal. This PR moves those
to be triggered via the eventbus.
Additionally, the eventbus is now closed on exit from tailscaled
explicitly, and health is now a SubSystem in tsd.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This is a small introduction of the eventbus into controlclient that
communicates with mainly ipnlocal. While ipnlocal is a complicated part
of the codebase, the subscribers here are from the perspective of
ipnlocal already called async.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
As of this commit (per the issue), the Taildrive code remains where it
was, but in new files that are protected by the new ts_omit_drive
build tag. Future commits will move it.
Updates #17058
Change-Id: Idf0a51db59e41ae8da6ea2b11d238aefc48b219e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>