It appears (*controlclient.Auto).Shutdown() can still deadlock when called with b.mu held, and therefore the changes in #18127 are unsafe.
This reverts #18127 until we figure out what causes it.
This reverts commit d199ecac80083e64d32baf3b473c67b11a6e6936.
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Previously we only set this when it updated, which was fine for the first
call to Start(), but after that point future updates would be skipped if
nothing had changed. If Start() was called again, it would wipe the peer API
endpoints and they wouldn't get added back again, breaking exit nodes (and
anything else requiring peer API to be advertised).
Updates tailscale/corp#27173
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Based on PR #16700 by @lox, adapted to current codebase.
Adds support for proxying HTTP requests to Unix domain sockets via
tailscale serve unix:/path/to/socket, enabling exposure of services
like Docker, containerd, PHP-FPM over Tailscale without TCP bridging.
The implementation includes reasonable protections against exposure of
tailscaled's own socket.
Adaptations from original PR:
- Use net.Dialer.DialContext instead of net.Dial for context propagation
- Use http.Transport with Protocols API (current h2c approach, not http2.Transport)
- Resolve conflicts with hasScheme variable in ExpandProxyTargetValue
Updates #9771
Signed-off-by: Peter A. <ink.splatters@pm.me>
Co-authored-by: Lachlan Donald <lachlan@ljd.cc>
If a packet arrives while WireGuard is being reconfigured with b.mu held, such as during a profile switch,
calling back into (*LocalBackend).GetPeerAPIPort from (*Wrapper).filterPacketInboundFromWireGuard
may deadlock when it tries to acquire b.mu.
This occurs because a peer cannot be removed while an inbound packet is being processed.
The reconfig and profile switch wait for (*Peer).RoutineSequentialReceiver to return, but it never finishes
because GetPeerAPIPort needs b.mu, which the waiting goroutine already holds.
In this PR, we make peerAPIPorts a new syncs.AtomicValue field that is written with b.mu held
but can be read by GetPeerAPIPort without holding the mutex, which fixes the deadlock.
There might be other long-term ways to address the issue, such as moving peer API listeners
from LocalBackend to nodeBackend so they can be accessed without holding b.mu,
but these changes are too large and risky at this stage in the v1.92 release cycle.
Updates #18124
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Previously, callers of (*LocalBackend).resetControlClientLocked were supposed
to call Shutdown on the returned controlclient.Client after releasing b.mu.
In #17804, we started calling Shutdown while holding b.mu, which caused
deadlocks during profile switches due to the (*ExecQueue).RunSync implementation.
We first patched this in #18053 by calling Shutdown in a new goroutine,
which avoided the deadlocks but made TestStateMachine flaky because
the shutdown order was no longer guaranteed.
In #18070, we updated (*ExecQueue).RunSync to allow shutting down
the queue without waiting for RunSync to return. With that change,
shutting down the control client while holding b.mu became safe.
Therefore, this PR updates (*LocalBackend).resetControlClientLocked
to shut down the old client synchronously during the reset, instead of
returning it and shifting that responsibility to the callers.
This fixes the flaky tests and simplifies the code.
Fixes#18052
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This patch adds an integration test for Tailnet Lock, checking that a node can't
talk to peers in the tailnet until it becomes signed.
This patch also introduces a new package `tstest/tkatest`, which has some helpers
for constructing a mock control server that responds to TKA requests. This allows
us to reduce boilerplate in the IPN tests.
Updates tailscale/corp#33599
Signed-off-by: Alex Chan <alexc@tailscale.com>
In preparation for exposing its configuration via ipn.ConfigVAlpha,
change {Masked}Prefs.RelayServerPort from *int to *uint16. This takes a
defensive stance against invalid inputs at JSON decode time.
'tailscale set --relay-server-port' is currently the only input to this
pref, and has always sanitized input to fit within a uint16.
Updates tailscale/corp#34591
Signed-off-by: Jordan Whited <jordan@tailscale.com>
In suggestExitNodeLocked, if no exit node candidates have a home DERP or
valid location info, `bestCandidates` is an empty slice. This slice is
passed to `selectNode` (`randomNode` in prod):
```go func randomNode(nodes views.Slice[tailcfg.NodeView], …) tailcfg.NodeView {
…
return nodes.At(rand.IntN(nodes.Len()))
}
```
An empty slice becomes a call to `rand.IntN(0)`, which panics.
This patch changes the behaviour, so if we've filtered out all the
candidates before calling `selectNode`, reset the list and then pick
from any of the available candidates.
This patch also updates our tests to give us more coverage of `randomNode`,
so we can spot other potential issues.
Updates #17661
Change-Id: I63eb5e4494d45a1df5b1f4b1b5c6d5576322aa72
Signed-off-by: Alex Chan <alexc@tailscale.com>
In PR tailscale/corp#34401, the `traffic-steering` feature flag does
not automatically enable traffic steering for all nodes. Instead, an
admin must add the `traffic-steering` node attribute to each client
node that they want opted-in.
For backwards compatibility with older clients, tailscale/corp#34401
strips out the `traffic-steering` node attribute if the feature flag
is not enabled, even if it is set in the policy file. This lets us
safely disable the feature flag.
This PR adds a missing test case for suggested exit nodes that have no
priority.
Updates tailscale/corp#34399
Signed-off-by: Simon Law <sfllaw@tailscale.com>
When the underlying transport returns a network error, the RoundTrip
method returns (nil, error). The defer was attempting to access resp
without checking if it was nil first, causing a panic. Fix this by
checking for nil in the defer.
Also changes driveTransport.tr from *http.Transport to http.RoundTripper
and adds a test.
Fixes#17306
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
Change-Id: Icf38a020b45aaa9cfbc1415d55fd8b70b978f54c
With the introduction of node sealing, store.New fails in some cases due
to the TPM device being reset or unavailable. Currently it results in
tailscaled crashing at startup, which is not obvious to the user until
they check the logs.
Instead of crashing tailscaled at startup, start with an in-memory store
with a health warning about state initialization and a link to (future)
docs on what to do. When this health message is set, also block any
login attempts to avoid masking the problem with an ephemeral node
registration.
Updates #15830
Updates #17654
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
These validations were previously performed in the CLI frontend. There
are two motivations for moving these to the local backend:
1. The backend controls synchronization around the relevant state, so
only the backend can guarantee many of these validations.
2. Doing these validations in the back-end avoids the need to repeat
them across every frontend (e.g. the CLI and tsnet).
Updates tailscale/corp#27200
Signed-off-by: Harry Harpham <harry@tailscale.com>
This commit enables user to set service backend to remote destinations, that can be a partial
URL or a full URL. The commit also prevents user to set remote destinations on linux system
when socket mark is not working. For user on any version of mac extension they can't serve a
service either. The socket mark usability is determined by a new local api.
Fixestailscale/corp#24783
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
Now that we support using an in-memory backend for TKA state (#17946),
this function always returns `nil` – we can always support Network Lock.
We don't need it any more.
Plus, clean up a couple of errant TODOs from that PR.
Updates tailscale/corp#33599
Change-Id: Ief93bb9adebb82b9ad1b3e406d1ae9d2fa234877
Signed-off-by: Alex Chan <alexc@tailscale.com>
Previously a TKA compaction would only run when a node starts, which means a long-running node could use unbounded storage as it accumulates ever-increasing amounts of TKA state. This patch changes TKA so it runs a compaction after every sync.
Updates https://github.com/tailscale/corp/issues/33537
Change-Id: I91df887ea0c5a5b00cb6caced85aeffa2a4b24ee
Signed-off-by: Alex Chan <alexc@tailscale.com>
Adds the ability to rotate discovery keys on running clients, needed for
testing upcoming disco key distribution changes.
Introduces key.DiscoKey, an atomic container for a disco private key,
public key, and the public key's ShortString, replacing the prior
separate atomic fields.
magicsock.Conn has a new RotateDiscoKey method, and access to this is
provided via localapi and a CLI debug command.
Note that this implementation is primarily for testing as it stands, and
regular use should likely introduce an additional mechanism that allows
the old key to be used for some time, to provide a seamless key rotation
rather than one that invalidates all sessions.
Updates tailscale/corp#34037
Signed-off-by: James Tucker <james@tailscale.com>
For manual (human) testing, this lets the user disable control plane
map polls with "tailscale set --sync=false" (which survives restarts)
and "tailscale set --sync" to restore.
A high severity health warning is shown while this is active.
Updates #12639
Updates #17945
Change-Id: I83668fa5de3b5e5e25444df0815ec2a859153a6d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This requires making the internals of LocalBackend a bit more generic,
and implementing the `tka.CompactableChonk` interface for `tka.Mem`.
Signed-off-by: Alex Chan <alexc@tailscale.com>
Updates https://github.com/tailscale/corp/issues/33599
Pick up a fix for https://pkg.go.dev/vuln/GO-2025-4116 (even though
we're not affected).
Updates #cleanup
Change-Id: I9f2571b17c1f14db58ece8a5a34785805217d9dd
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Includes adding StartPaused, which will be used in a future change to
enable netmap caching testing.
Updates #12639
Change-Id: Iec39915d33b8d75e9b8315b281b1af2f5d13a44a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds the --proxy-protocol flag to 'tailscale serve' and
'tailscale funnel', which tells the Tailscale client to prepend a PROXY
protocol[1] header when making connections to the proxied-to backend.
I've verified that this works with our existing funnel servers without
additional work, since they pass along source address information via
PeerAPI already.
Updates #7747
[1]: https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
Change-Id: I647c24d319375c1b33e995555a541b7615d2d203
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
It's an unnecessary nuisance having it. We go out of our way to redact
it in so many places when we don't even need it there anyway.
Updates #12639
Change-Id: I5fc72e19e9cf36caeb42cf80ba430873f67167c3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Remove the State enum (StateNew, StateNotAuthenticated, etc.) from
controlclient and replace it with two explicit boolean fields:
- LoginFinished: indicates successful authentication
- Synced: indicates we've received at least one netmap
This makes the state more composable and easier to reason about, as
multiple conditions can be true independently rather than being
encoded in a single enum value.
The State enum was originally intended as the state machine for the
whole client, but that abstraction moved to ipn.Backend long ago.
This change continues moving away from the legacy state machine by
representing state as a combination of independent facts.
Also adds test helpers in ipnlocal that check independent, observable
facts (hasValidNetMap, needsLogin, etc.) rather than relying on
derived state enums, making tests more robust.
Updates #12639
Signed-off-by: James Tucker <james@tailscale.com>
As a baby step towards eventbus-ifying controlclient, make the
Observer optional.
This also means callers that don't care (like this network lock test,
and some tests in other repos) can omit it, rather than passing in a
no-op one.
Updates #12639
Change-Id: Ibd776b45b4425c08db19405bc3172b238e87da4e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit replaces usage of local.Client in net/udprelay with DERPMap
plumbing over the eventbus. This has been a longstanding TODO. This work
was also accelerated by a memory leak in net/http when using
local.Client over long periods of time. So, this commit also addresses
said leak.
Updates #17801
Signed-off-by: Jordan Whited <jordan@tailscale.com>
They distracted me in some refactoring. They're set but never used.
Updates #17858
Change-Id: I6ec7d6841ab684a55bccca7b7cbf7da9c782694f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
updates tailscale/corp#31571
It appears that on the latest macOS, iOS and tVOS versions, the work
that netns is doing to bind outgoing connections to the default interface (and all
of the trimmings and workarounds in netmon et al that make that work) are
not needed. The kernel is extension-aware and doing nothing, is the right
thing. This is, however, not the case for tailscaled (which is not a
special process).
To allow us to test this assertion (and where it might break things), we add a
new node cap that turns this behaviour off only for network-extension equipped clients,
making it possible to turn this off tailnet-wide, without breaking any tailscaled
macos nodes.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
Unfortunately I closed the tab and lost it in my sea of CI failures
I'm currently fighting.
Updates #cleanup
Change-Id: I4e3a652d57d52b75238f25d104fc1987add64191
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When systemd notification support was omitted from the build, or on
non-Linux systems, we were unnecessarily emitting code and generating
garbage stringifying addresses upon transition to the Running state.
Updates #12614
Change-Id: If713f47351c7922bb70e9da85bf92725b25954b9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* lock released early just to call `b.send` when it can call
`b.sendToLocked` instead
* `UnlockEarly` called to release the lock before trivially fast
operations, we can wait for a defer there
Updates #11649
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
It was disabled in May 2024 in #12205 (9eb72bb51).
This removes the unused symbols.
Updates #188
Updates tailscale/corp#19106
Updates tailscale/corp#19116
Change-Id: I5208b7b750b18226ed703532ed58c4ea17195a8e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Adds a new Redirect field to HTTPHandler for serving HTTP redirects
from the Tailscale serve config. The redirect URL supports template
variables ${HOST} and ${REQUEST_URI} that are resolved per request.
By default, it redirects using HTTP Status 302 (Found). For another
redirect status, like 301 - Moved Permanently, pass the HTTP status
code followed by ':' on Redirect, like: "301:https://tailscale.com"
Updates #11252
Updates #11330
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>