tailscale

mirror of https://github.com/tailscale/tailscale.git synced 2026-05-04 11:51:17 +02:00

Author	SHA1	Message	Date
Brad Fitzpatrick	f343b496c3	wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers Replace the UAPI text protocol-based wireguard configuration with wireguard-go's new direct callback API (SetPeerLookupFunc, SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey). Instead of computing a trimmed wireguard config ahead of time upon control plane updates and pushing it via UAPI, install callbacks so wireguard-go creates peers on demand when packets arrive. This removes all the LazyWG trimming machinery: idle peer tracking, activity maps, noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the ts_omit_lazywg build tag. For incoming packets, PeerLookupFunc answers wireguard-go's questions about unknown public keys by looking up the peer in the full config. For outgoing packets, PeerByIPPacketFunc (installed from LocalBackend.lookupPeerByIP) maps destination IPs to node public keys using the existing nodeByAddr index. Updates tailscale/corp#12345 Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 19:46:19 -07:00
Brad Fitzpatrick	22ff402da9	wgengine/magicsock: restore SetDERPMap signature, add SetDERPMapWithoutReSTUN Commit 78627c132f changed the signature of magicsock.Conn.SetDERPMap to take an additional bool doReStun parameter. Avoid both the boolean parameter and the API signature change by restoring SetDERPMap to its original single-argument form and adding a new SetDERPMapWithoutReSTUN method for the cache-loading caller that wants to skip the post-set ReSTUN. Updates #19490 Change-Id: I97d9e82156bfc546ccf59756d1ea52f039b5de06 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 12:46:15 -07:00
Claus Lensbøl	be7cce74ba	wgengine/userspace: do not fall back to old key on tsmpLearned mismatch (#19575 ) The mismatch behaviour of falling back to a previous key could end up breaking connections when the netmap update took longer than the 2 seconds allowed in controlClient.auto for netmap updates, or if the controlClient context was canceled. This could end up breaking legitimate updates to the netmap for disco keys coming from control. Instead, log the event, and let the connection be reset to that of the key as that is safer. Issue found by @bradfitz. Updates #19574 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-29 13:23:04 -04:00
Claus Lensbøl	78627c132f	wgengine/magicsock,ipn/ipnlocal: store and load homeDERP from cache (#19491 ) With netmap caching, the home DERP of the self node was neither saved to the cache or loaded from it, making nodes not stick to a DERP when starting without a connection to control. Instead, make sure that when a cache is available, load that cache, before looking for DERP servers. This is implemented by allowing a skip of ReSTUN in setting the DERP map (we must have a DERP map before setting the home DERP), so the DERP from cache will set itself and be sticky until a connection to control is established. Making DERP only change when connected to control is handled by existing code from f072d017bd8241675aa946a27fc1827f570435cb. Updates #19490 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-29 10:24:09 -04:00
Mike O'Driscoll	33342aec32	The connmark save/restore rules in mangle/PREROUTING restore the Tailscale bypass fwmark (0x80000) onto reply packets so that rp_filter's reverse-path check routes through the main table instead of table 52. However, the kernel only uses the packet's fwmark during the rp_filter lookup when net.ipv4.conf.all.src_valid_mark=1. (#19537 ) On systems where this sysctl defaults to 0 (including GCP VMs), rp_filter performs its lookup with fwmark=0, hits rule 5270 then table 52 and routes to 0.0.0.0/0 dev tailscale0, and drops every reply packet arriving on the physical interface as a martian. This breaks all connectivity when using an exit node: DERP, DNS, control plane, and even the cloud metadata service. Set src_valid_mark=1 when enabling the connmark rules so the rp_filter workaround actually works in these cases. Updates #3310 Updates tailscale/corp#37846 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>	2026-04-27 13:52:45 -04:00
James Tucker	1b40911611	wgengine/netstack: absorb all quad-100 traffic locally, never leak to peers Previously, handleLocalPackets intercepted traffic to the Tailscale service IP (100.100.100.100 / fd7a:115c:a1e0::53) only for an allow-list of ports: TCP 53/80/8080 and UDP 53. Any other port returned filter.Accept, letting the packet fall through to the ACL filter and wireguard-go, which would attempt a peer lookup. No peer owns the quad-100 AllowedIP, so after ~5s pendopen.go would log: open-conn-track: timeout opening ...; no associated peer node This is the common "conntrack error no peer found for 100.100.100.100:853" log spam seen in the wild (e.g. from systemd-resolved or another resolver speculatively trying DoT on quad-100). It also leaks quad-100 packets onto the tailnet. Remove the port allow-list so handleLocalPackets absorbs every quad-100 packet into netstack regardless of IP protocol or port. Traffic never reaches the conntrack / peer-routing layers. With the allow-list gone, acceptTCP needs a corresponding guard: on a quad-100 TCP port we don't serve, execution used to fall through to the isTailscaleIP case (quad-100 is in the tailscale IP range), which rewrote the dial target to 127.0.0.1:<port> and forwardTCP'd the connection to whatever happened to be listening on the host's loopback at that port. Add a hittingServiceIP case that RSTs cleanly instead, placed before the isTailscaleIP fallthrough. TestQuad100UnservedTCPPortDoesNotForward is a new integration test that injects a TCP SYN to 100.100.100.100:853 via handleLocalPackets, stubs forwardDialFunc, and asserts the dialer is not invoked; it catches regressions of the acceptTCP recursion/loopback-redirection case. Fixes #15796 Fixes #19421 Updates #3261 Updates #11305 Signed-off-by: James Tucker <james@tailscale.com>	2026-04-24 12:42:16 -07:00
Claus Lensbøl	ee76a7d3f8	wgengine/magicsock: do not send TSMP disco when connected (#19497 ) When there is an active connection between devices, do not send new disco keys via TSMP. Updates #12639 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-23 12:23:57 -04:00
Brad Fitzpatrick	311dd3839d	wgengine/magicsock: replace peers slice with peersByID map; add Upsert/RemovePeer Replace Conn.peers (sorted views.Slice) with peersByID, a map[tailcfg.NodeID]tailcfg.NodeView. The only caller that needed the sorted slice (the disco message receive path's binary search) becomes a single map lookup. Drop nodesEqual. Add Conn.UpsertPeer / Conn.RemovePeer for O(1) single-peer endpoint work. RemovePeer also performs a targeted single-disco-key cleanup (previously that scan was O(discoInfo)). Extract the shared per-peer upsert body as upsertPeerLocked; still used by SetNetworkMap's bulk path. SetNetworkMap is documented as the bulk / initial / self-change path; UpsertPeer and RemovePeer are preferred for single-peer changes. Make the relay server set update O(1) per peer: add serverUpsertCh / serverRemoveCh to relayManager with matching run-loop handlers. UpsertPeer / RemovePeer evaluate the per-peer relay predicate locally and dispatch upsert or remove. The full-rebuild updateRelayServersSet stays for the initial netmap, filter changes, and fallback. Move the hasPeerRelayServers atomic from Conn onto relayManager, next to the serversByNodeKey map it summarizes. The run loop is now the single writer and needs no back-pointer to Conn; endpoint's two hot-path readers take one extra hop to de.c.relayManager.hasPeerRelayServers but the cost is the same atomic load. No callers use UpsertPeer/RemovePeer yet; a subsequent change will plumb per-peer add/remove through the incremental map update path. Updates #12542 Change-Id: If6a3442fe29ccbd77890ea61b754a4d1ad6ef225 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-22 15:07:11 -07:00
Alex Valiushko	d3ba1480f5	magicsock: invalidate endpoint on trust timeout (#19415 ) Endpoint's best address was cleared on trustBestAddrUntil expiry only if it was a udprelay connection. This generalizes invalidation to also cover direct UDP. Trust deadline is checked in two cases: On disco ping timeout from the endpoint's best address. Traffic goes DERP-only, heartbeats to the old address stop. The discovery pings are still in flight, handled by the following. On disco ping success from an alternative. BestAddr switches to the working path, trust refreshed, eager discovery stops. The still in flight pongs are handled by betterAddr(). Updates #19407 Change-Id: Ic41ed18edb4a6e4350a2d49271ba01566a6a6964 Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>	2026-04-15 19:22:07 -07:00
Avery Pennarun	effbe67fe3	wgengine/magicsock: remove pickPort, use port 0 to avoid TOCTOU race pickPort would bind a UDP socket on :0 to get a free port, close the socket, then hope to rebind to the same port in NewConn. This is a TOCTOU race that can cause flaky test failures when another process grabs the port in between. Instead, pass Port: 0 to NewConn and let the OS assign the port atomically, then read back the assigned port via conn.LocalPort(). Fixes #19409 Change-Id: Ie44b599fb93c361e29a05f2171ad747c46f82b7a Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com> Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>	2026-04-14 18:08:47 -07:00
Naman Sood	6301a6ce4b	util/linuxfw,wgengine/router: allow incoming CGNAT range traffic with nodeattr Clients with the newly added node attribute `"disable-linux-cgnat-drop-rule"` will not automatically drop inbound traffic on non-Tailscale network interfaces with the source IP in the CGNAT IP range. This is an initial proof-of-concept for enabling connectivity with off-Tailnet CGNAT endpoints. Fixes tailscale/corp#36270. Signed-off-by: Naman Sood <mail@nsood.in>	2026-04-14 16:45:06 -04:00
Fernando Serboncini	5834058269	wgengine: replace reflect.DeepEqual with typed Equal for maybeReconfigInputs (#19365 ) reflect.DeepEqual is expensive and allocates heavily. Replace it with a field-by-field comparison that does zero allocations. Adds tests and benchmarks for the new Equal method. Fixes #19363 Signed-off-by: Fernando Serboncini <fserb@tailscale.com>	2026-04-14 13:16:21 -04:00
Brad Fitzpatrick	6aa10576c9	wgengine/magicsock: deflake TestTwoDevicePing compare-metrics-stats The compare-metrics-stats subtest reset two independent counting systems (physical connection counters and expvar.Int user metrics) non-atomically. Background WireGuard keepalives arriving between the resets could increment one system but not the other, causing off-by-one packet/byte mismatches in either direction. Replace the reset-then-compare pattern with snapshot-and-delta: snapshot both systems before pings, snapshot again after, and compare the deltas. This eliminates the non-atomic reset window entirely. As a belt-and-suspenders safety net, tolerate a difference of exactly one packet (and corresponding bytes) from a stray keepalive that could still arrive in the narrow window between the two snapshots. flakestress passes with ~5900 runs (~2800 without -race, ~3100 with -race) but it also passed previously too. This is an annoying one to repro. Fixes #11762 Change-Id: I3447ad67e71c8146e85eed38b7a665033ef9e284 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-14 06:57:24 -07:00
Brad Fitzpatrick	9fbe4b3ed2	all: fix six tests that failed with -count=2 Avery found a bunch of tests that fail with -count=2. Updates tailscale/corp#40176 (tracks making our CI detect them) Change-Id: Ie3e4398070dd92e4fe0146badddf1254749cca20 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Co-authored-by: Avery Pennarun <apenwarr@tailscale.com>	2026-04-13 18:52:57 -07:00
Brad Fitzpatrick	50b8cfbde2	wgengine/netstack: fix data race on in-flight connection test globals The maxInFlightConnectionAttemptsForTest and maxInFlightConnectionAttemptsPerClientForTest globals were plain ints read by background gVisor TCP handler goroutines (via wrapTCPProtocolHandler) and written by tstest.Replace cleanup in TestTCPForwardLimits_PerClient. When a gVisor goroutine outlived the test cleanup window, the race detector caught the unsynchronized access. The race-prone code was introduced in c5abbcd4b4d8 (2024-02-26, "wgengine/netstack: add a per-client limit for in-flight TCP forwards") which added both the plain int globals and the TestTCPForwardLimits_PerClient test that writes them via tstest.Replace. It is not obvious why this has only recently started being detected as a data race; likely some combination of gVisor version bumps, Go toolchain scheduler changes, and additional TCP-injecting subtests (e.g. 03461ea7f, 2026-01-30) increased goroutine churn enough to hit the window. Change both globals to atomic.Int32 and replace tstest.Replace (which does non-atomic *target = old on cleanup) with explicit Store/Cleanup pairs. Fixes #19118 Change-Id: Id26ba6fbfb2e4ade319976db80af8e16c7c8778e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-13 15:24:35 -07:00
Amal Bansode	b4c0d67f8b	wgengine/router/osrouter: fix privileged tests missing fake netfilter runner These test failures were never caught by CI because the package in question was missing from our privileged tests list. tailscale/corp#40007 covers improving our process around this. Fixes #19316 Signed-off-by: Amal Bansode <amal@tailscale.com>	2026-04-10 14:51:55 -07:00
Brad Fitzpatrick	5e81840b57	tstest: add RequireRoot helper Start using a common helper for tests to declare that they require root. This is step 1. A later step will then make this helper track which tests were skipped so a subsequent pass will run these test as root. Updates tailscale/corp#40007 Change-Id: I4979e1def0fa3691d38c83f48c89aaa443e7f62e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-10 10:48:50 -07:00
Tom Meadows	5341b26328	wgengine/netstack: allow UDP listeners to receive traffic on Service VIP addresses (#18972 ) Fixes UDP listeners on VIP Service addresses not receiving inbound traffic. - Modified shouldProcessInbound to check for registered UDP transport endpoints when processing packets to service VIPs - Uses FindTransportEndpoint to determine if a UDP listener exists for the destination VIP/port - Supports both IPv4 and IPv6 The aim was to mirror the existing TCP logic, providing feature parity for UDP-based services on VIP Services. Fixes #18971 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>	2026-04-08 10:53:50 +01:00
Brad Fitzpatrick	a182b864ac	tsd, all: add Sys.ExtraRootCAs, plumb through TLS dial paths Add ExtraRootCAs *x509.CertPool to tsd.System and plumb it through the control client, noise transport, DERP, and wgengine layers so that platforms like Android can inject user-installed CA certificates into Go's TLS verification. tlsdial.Config now honors base.RootCAs as additional trusted roots, tried after system roots and before the baked-in LetsEncrypt fallback. SetConfigExpectedCert gets the same treatment for domain-fronted DERP. The Android client will set sys.ExtraRootCAs with a pool built from x509.SystemCertPool + user-installed certs obtained via the Android KeyStore API, replacing the current SSL_CERT_DIR environment variable approach. Updates #8085 Change-Id: Iecce0fd140cd5aa0331b124e55a7045e24d8e0c2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-07 18:10:54 -07:00
Claus Lensbøl	9a7f143903	wgengine/userspace: add extra check for tsmp learned keys in engine (#19223 ) If an entry in the tsmpLearnedDisco does not match the disco key of the key currently being processed, overwrite the key, and leave the entry in the map for later processing. In reality, this should not happen, but is put in as a safety measure with logging of the situation so we can replicate the behaviour and correct it should it happen. Updates #12639 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-07 09:11:11 -04:00
Brad Fitzpatrick	5ef3713c9f	cmd/vet: add subtestnames analyzer; fix all existing violations Add a new vet analyzer that checks t.Run subtest names don't contain characters requiring quoting when re-running via "go test -run". This enforces the style guide rule: don't use spaces or punctuation in subtest names. The analyzer flags: - Direct t.Run calls with string literal names containing spaces, regex metacharacters, quotes, or other problematic characters - Table-driven t.Run(tt.name, ...) calls where tt ranges over a slice/map literal with bad name field values Also fix all 978 existing violations across 81 test files, replacing spaces with hyphens and shortening long sentence-like names to concise hyphenated forms. Updates #19242 Change-Id: Ib0ad96a111bd8e764582d1d4902fe2599454ab65 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-05 15:52:51 -07:00
M. J. Fromberger	211ef67222	tailcfg,ipn/ipnlocal: regulate netmap caching via a node attribute (#19117 ) Add a new tailcfg.NodeCapability (NodeAttrCacheNetworkMaps) to control whether a node with support for caching network maps will attempt to do so. Update the capability version to reflect this change (mainly as a safety measure, as the control plane does not currently need to know about it). Use the presence (or absence) of the node attribute to decide whether to create and update a netmap cache for each profile. If caching is disabled, discard the cached data; this allows us to use the presence of a cached netmap as an indicator it should be used (unless explicitly overridden). Add a test that verifies the attribute is respected. Reverse the sense of the environment knob to be true by default, with an override to disable caching at the client regardless what the node attribute says. Move the creation/update of the netmap cache (when enabled) until after successfully applying the network map, to reduce the possibility that we will cache (and thus reuse after a restart) a network map that fails to correctly configure the client. Updates #12639 Change-Id: I1df4dd791fdb485c6472a9f741037db6ed20c47e Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-04-01 15:02:53 -07:00
Claus Lensbøl	c76113ac75	wgengine/magicsock: send out disco keys over TSMP periodically (#19212 ) Instead of sending out disco keys via TSMP once, send them out in intervals of 60+ seconds. The trigger is still callmemaaybe and the keys will not be send if no direct connection needs to be established. This fixes a case where a node can have stale keys but have communicated with the other peer before, leading to an infinite DERP state. Updates #12639 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-01 17:20:03 -04:00
Harry Harpham	61ac021c5d	wgengine/magicsock: assume network up for tests Without this, any test relying on underlying use of magicsock will fail without network connectivity, even when the test logic has no need for a network connection. Tests currently in this bucket include many in tstest/integration and in tsnet. Further explanation: ipn only becomes Running when it sees at least one live peer or DERP connection: `0cc1b2ff76/ipn/ipnlocal/local.go (L5861-L5866)` When tests only use a single node, they will never see a peer, so the node has to wait to see a DERP server. magicsock sets the preferred DERP server in updateNetInfo(), but this function returns early if the network is down. `0cc1b2ff76/wgengine/magicsock/magicsock.go (L1053-L1106)` Because we're checking the real network, this prevents ipn from entering "Running" and causes the test to fail or hang. In tests, we can assume the network is up unless we're explicitly testing the behaviour of tailscaled when the network is down. We do something similar in magicsock/derp.go, where we assume we're connected to control unless explicitly testing otherwise: `7d2101f352/wgengine/magicsock/derp.go (L166-L177)` This is the template for the changes to `networkDown()`. Fixes #17122 Co-authored-by: Alex Chan <alexc@tailscale.com> Signed-off-by: Harry Harpham <harry@tailscale.com>	2026-03-31 09:57:14 -06:00
Claus Lensbøl	bf467727fc	control/controlclient,ipn/ipnlocal,wgengine: avoid restarting wireguard when key is learned via tsmp (#19142 ) When disco keys are learned on a node that is connected to control and has a mapSession, wgengine will see the key as having changed, and assume that any existing connections will need to be reset. For keys learned via TSMP, the connection should not be reset as that key is learned via an active wireguard connection. If wgengine resets that connetion, a 15s timeout will occur. This change adds a map to track new keys coming in via TSMP, and removes them from the list of keys that needs to trigger wireguard resets. This is done with an interface chain from controlclient down via localBackend to userspaceEngine via the watchdog. Once a key has been actively used for preventing a wireguard reset, the key is removed from the map. If mapSession becomes a long lived process instead of being dependent on having a connection to control. This interface chain can be removed, and the event sequence from wrap->controlClient->userspaceEngine, can be changed to wrap->userspaceEngine->controlClient as we know the map will not be gunked up with stale TSMP entries. Updates #12639 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-03-30 14:26:08 -04:00
Alex Chan	1d0fde6fc2	all: use `bart.Lite` instead of `bart.Table` where appropriate When we don't care about the payload value and are just checking whether a set contains an IP/prefix, we can use `bart.Lite` for the same lookup times but a lower memory footprint. Fixes #19075 Change-Id: Ia709e8b718666cc61ea56eac1066467ae0b6e86c Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-03-24 14:45:23 +00:00
Claus Lensbøl	85bb5f84a5	wgengine/magicsock,control/controlclient: do not overwrite discokey with old key (#18606 ) When a client starts up without being able to connect to control, it sends its discoKey to other nodes it wants to communicate with over TSMP. This disco key will be a newer key than the one control knows about. If the client that can connect to control gets a full netmap, ensure that the disco key for the node not connected to control is not overwritten with the stale key control knows about. This is implemented through keeping track of mapSession and use that for the discokey injection if it is available. This ensures that we are not constantly resetting the wireguard connection when getting the wrong keys from control. This is implemented as: - If the key is received via TSMP: - Set lastSeen for the peer to now() - Set online for the peer to false - When processing new keys, only accept keys where either: - Peer is online - lastSeen is newer than existing last seen If mapSession is not available, as in we are not yet connected to control, punt down the disco key injection to magicsock. Ideally, we will want to have mapSession be long lived at some point in the near future so we only need to inject keys in one location and then also use that for testing and loading the cache, but that is a yak for another PR. Updates #12639 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-03-20 08:56:27 -04:00
Josef Bacik	b0e63cbeb9	wgengine/netstack: add TS_NETSTACK_KEEPALIVE_{IDLE,INTERVAL} envknobs Adds envknobs to override the netstack default TCP keepalive idle time (~2h) and probe interval (75s) for forwarded connections. When a tailnet peer goes away without closing its connections (pod deleted, peer removed from the netmap, silent network partition), the forwardTCP io.Copy goroutines block until keepalive fires: the gvisor-side Read waits on a peer that will never send again, and the backend-side Read waits on a backend that is alive and idle. With the netstack default of 7200s idle + 9×75s probes, dead-peer detection takes a little over two hours. Under high-churn forwarding — many short-lived peers, or peers holding thousands of proxied connections that drop at once — stuck goroutines accumulate faster than they clear. The existing SetKeepAlive(true) at this site enables keepalive without setting the timers; the TODO above it noted "a shorter default might be better" and "might be a useful user-tunable". This makes both timers tunable without changing the defaults: unset preserves the ~2h behavior, which is the right trade-off for battery-powered peers. The two knobs are independent — setting one leaves the other at the netstack default. The options are set before SetKeepAlive(true) so the timer arms with the configured values rather than the defaults — matches the order in ipnlocal/local.go for SSH keepalive. Updates #4522 Signed-off-by: Josef Bacik <josefbacik@anthropic.com>	2026-03-17 13:44:11 -07:00
Brad Fitzpatrick	54606a0a89	wgengine/netstack: don't register subnet/4via6 TCP flows with proxymap Fixes #18991 Change-Id: I29a609dcd401854026aef4a5ad8d5806c3249ea6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-03-13 19:41:30 -07:00
Jordan Whited	96dde53b43	net/{batching,udprelay},wgengine/magicsock: add SO_RXQ_OVFL clientmetrics For the purpose of improved observability of UDP socket receive buffer overflows on Linux. Updates tailscale/corp#37679 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-03-13 14:27:03 -07:00
Brad Fitzpatrick	073a9a8c9e	wgengine{,/magicsock}: add DERP hooks for filtering+sending packets Add two small APIs to support out-of-tree projects to exchange custom signaling messages over DERP without requiring disco protocol extensions: - OnDERPRecv callback on magicsock.Options / wgengine.Config: called for every non-disco DERP packet before the peer map lookup, allowing callers to intercept packets from unknown peers that would otherwise be dropped. - SendDERPPacketTo method on magicsock.Conn: sends arbitrary bytes to a node key via a DERP region, creating the connection if needed. Thin wrapper around the existing internal sendAddr. Also allow netstack.Start to accept a nil LocalBackend for use cases that wire up TCP/UDP handlers directly without a full LocalBackend. Updates tailscale/corp#24454 Change-Id: I99a523ef281625b8c0024a963f5f5bf5d8792c17 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-03-11 16:37:19 -07:00
kari-ts	dd1da0b389	wgengine: search randomly for unused port instead of in contiguous range (#18974 ) In TestUserspaceEnginePortReconfig, when selecting a port, use a random offset rather than searching in a continguous range in case there is a range that is blocked Updates tailscale/tailscale#2855 Signed-off-by: kari-ts <kari@tailscale.com>	2026-03-11 12:21:50 -07:00
Brad Fitzpatrick	70de111394	wgengine/magicsock: fix three race conditions in TestTwoDevicePing Fix three independent flake sources, at least as debugged by Claude, though empirically no longer flaking as it was before: 1. Poll for connection counter data instead of reading immediately. The conncount callback fires asynchronously on received WireGuard traffic, so after counts.Reset() there is no guarantee the counter has been repopulated before checkStats reads it. Use tstest.WaitFor with a 5s timeout to retry until a matching connection appears. 2. Replace the *2 symmetry assumption in global metric assertions. metricSendUDP and friends are AggregateCounters that sum per-conn expvars from both magicsock instances. The old assertion assumed both instances had identical packet counts, which breaks under asymmetric background WireGuard activity (handshake retries, etc). The new assertGlobalMetricsMatchPerConn computes the actual sum of both conns' expvars and compares against the AggregateCounter value. 3. Tolerate physical stats being 0 when user metrics are non-zero. A rebind event replaces the socket mid-measurement, resetting the physical connection counter while user metrics still reflect packets processed before the rebind. Log instead of failing in this case. Also move counts.Reset() after metric reads and reorder the reset sequence (counts before metrics) to minimize the race window. Fixes tailscale/tailscale#13420 Change-Id: I7b090a4dc229a862c1a52161b3f2547ec1d1f23f Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-03-11 07:34:52 -07:00
Brad Fitzpatrick	16fa81e804	wgengine: add API to force a disco key for experiments, testing Updates #12639 Updates tailscale/corp#24454 Change-Id: I2361206aec197a7eecbdf29d87b1b75335ee8eec Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-03-10 20:42:29 -07:00
Brad Fitzpatrick	bd2a2d53d3	all: use Go 1.26 things, run most gofix modernizers I omitted a lot of the min/max modernizers because they didn't result in more clear code. Some of it's older "for x := range 123". Also: errors.AsType, any, fmt.Appendf, etc. Updates #18682 Change-Id: I83a451577f33877f962766a5b65ce86f7696471c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-03-06 13:32:03 -08:00
Brad Fitzpatrick	2a64c03c95	types/ptr: deprecate ptr.To, use Go 1.26 new Updates #18682 Change-Id: I62f6aa0de2a15ef8c1435032c6aa74a181c25f8f Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-03-05 20:13:18 -08:00
Brad Fitzpatrick	2810f0c6f1	all: fix typos in comments Fix its/it's, who's/whose, wether/whether, missing apostrophes in contractions, and other misspellings across the codebase. Updates #cleanup Change-Id: I20453b81a7aceaa14ea2a551abba08a2e7f0a1d8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-03-05 13:52:01 -08:00
Claus Lensbøl	9657a93217	tstest/natlab: add test for no control and rotated disco key (#18261 ) Updates #12639 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-03-05 16:00:36 -05:00
Mike O'Driscoll	26ef46bf81	util/linuxfw,wgengine/router: add connmark rules for rp_filter workaround (#18860 ) When a Linux system acts as an exit node or subnet router with strict reverse path filtering (rp_filter=1), reply packets may be dropped because they fail the RPF check. Reply packets arrive on the WAN interface but the routing table indicates they should have arrived on the Tailscale interface, causing the kernel to drop them. This adds firewall rules in the mangle table to save outbound packet marks to conntrack and restore them on reply packets before the routing decision. When reply packets have their marks restored, the kernel uses the correct routing table (based on the mark) and the packets pass the rp_filter check. Implementation adds two rules per address family (IPv4/IPv6): - mangle/OUTPUT: Save packet marks to conntrack for NEW connections with non-zero marks in the Tailscale fwmark range (0xff0000) - mangle/PREROUTING: Restore marks from conntrack to packets for ESTABLISHED,RELATED connections before routing decision and rp_filter check The workaround is automatically enabled when UseConnmarkForRPFilter is set in the router configuration, which happens when subnet routes are advertised on Linux systems. Both iptables and nftables implementations are provided, with automatic backend detection. Fixes #3310 Fixes #14409 Fixes #12022 Fixes #15815 Fixes #9612 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>	2026-03-04 14:09:11 -05:00
Claus Lensbøl	2d21dd46cd	wgengine/magicsoc,net/tstun: put disco key advertisement behind a nob (#18857 ) To be less spammy in stable, add a nob that disables the creation and processing of TSMPDiscoKeyAdvertisements until we have a proper rollout mechanism. Updates #12639 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-03-03 09:04:37 -05:00
Alex Chan	0cca3bd417	wgengine/magicsock: improve error message for moving Mullvad node keys The "public key moved" panic has caused confusion on multiple occasions, and is a known issue for Mullvad. Add a loose heuristic to detect Mullvad nodes, and trigger distinct panics for Mullvad and non-Mullvad instances, with a link to the associated bug. When this occurs again with Mullvad, it'll be easier for somebody to find the existing bug. If it occurs again with something other than Mullvad, it'll be more obvious that it's a distinct issue. Updates tailscale/corp#27300 Change-Id: Ie47271f45f2ff28f767578fcca5e6b21731d08a1 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-03-03 09:13:48 +00:00
James Tucker	0fb207c3d0	wgengine/netstack: deliver self-addressed packets via loopback When a tsnet.Server dials its own Tailscale IP, TCP SYN packets are silently dropped. In inject(), outbound packets with dst=self fail the shouldSendToHost check and fall through to WireGuard, which has no peer for the node's own address. Fix this by detecting self-addressed packets in inject() using isLocalIP and delivering them back into gVisor's network stack as inbound packets via a new DeliverLoopback method on linkEndpoint. The outbound packet must be re-serialized into a new PacketBuffer because outbound packets have their headers parsed into separate views, but DeliverNetworkPacket expects raw unparsed data. Updates #18829 Signed-off-by: James Tucker <james@tailscale.com>	2026-02-27 14:30:41 -08:00
Brad Fitzpatrick	a98036b41d	go.mod: bump gvisor Updates #8043 Change-Id: Ia229ad4f28f2ff20e0bdecb99ca9e1bd0356ad8e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-02-26 18:29:36 -08:00
Fernando Serboncini	da90ea664d	wgengine/magicsock: only run derpActiveFunc after connecting to DERP (#18814 ) derpActiveFunc was being called immediately as a bare goroutine, before startGate was resolved. For the firstDerp case, startGate is c.derpStarted which only closes after dc.Connect() completes, so derpActiveFunc was firing before the DERP connection existed. We now block it with the same logic used by runDerpReader and by runDerpWriter. Updates: #18810 Signed-off-by: Fernando Serboncini <fserb@tailscale.com>	2026-02-26 12:36:26 -05:00
joshua stein	518d241700	netns,wgengine: add OpenBSD support to netns via an rtable When an exit node has been set and a new default route is added, create a new rtable in the default rdomain and add the current default route via its physical interface. When control() is requesting a connection not go through the exit-node default route, we can use the SO_RTABLE socket option to force it through the new rtable we created. Updates #17321 Signed-off-by: joshua stein <jcs@jcs.org>	2026-02-25 12:44:32 -08:00
Michael Ben-Ami	811fe7d18e	ipnext,ipnlocal,wgengine/filter: add extension hooks for custom filter matchers Add PacketMatch hooks to the packet filter, allowing extensions to customize filtering decisions: - IngressAllowHooks: checked in RunIn after pre() but before the standard runIn4/runIn6 match rules. Hooks can accept packets to destinations outside the local IP set. First match wins; the returned why string is used for logging. - LinkLocalAllowHooks: checked inside pre() for both ingress and egress, providing exceptions to the default policy of dropping link-local unicast packets. First match wins. The GCP DNS address (169.254.169.254) is always allowed regardless of hooks. PacketMatch returns (match bool, why string) to provide a log reason consistent with the existing filter functions. Hooks are registered via the new FilterHooks struct in ipnext.Hooks and wired through to filter.Filter in LocalBackend.updateFilterLocked. Fixes tailscale/corp#35989 Fixes tailscale/corp#37207 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>	2026-02-24 10:54:56 -05:00
Jonathan Nobels	be4449f6e0	util/clientmetric, wgengine/watchdog: report watchdog errors in user/client metrics (#18591 ) fixes tailscale/corp#36708 Sets up a set of metrics to report watchdog timeouts for wgengine and reports an event for any watchdog timeout. Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>	2026-02-13 13:30:48 -05:00
Brad Fitzpatrick	dc1d811d48	magicsock, ipnlocal: revert eventbus-based node/filter updates, remove Synchronize hack Restore synchronous method calls from LocalBackend to magicsock.Conn for node views, filter, and delta mutations. The eventbus delivery introduced in 8e6f63cf1 was invalid for these updates because subsequent operations in the same call chain depend on magicsock already having the current state. The Synchronize/settleEventBus workaround was fragile and kept requiring more workarounds and introducing new mystery bugs. Since eventbus was added, we've since learned more about when to use eventbus, and this wasn't one of the cases. We can take another swing at using eventbus for netmap changes in a future change. Fixes #16369 Updates #18575 (likely fixes) Change-Id: I79057cc9259993368bb1e350ff0e073adf6b9a8f Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-02-10 07:32:05 -08:00
Michael Ben-Ami	5a5572e48a	tstun,wgengine: add new datapath hooks for intercepting Connectors 2025 app connector packets We introduce the Conn25PacketHooks interface to be used as a nil-able field in userspaceEngine. The engine then plumbs through the functions to the corresponding tstun.Wrapper intercepts. The new intercepts run pre-filter when egressing toward WireGuard, and post-filter when ingressing from WireGuard. This is preserve the design invariant that the filter recognizes the traffic as interesting app connector traffic. This commit does not plumb through implementation of the interface, so should be a functional no-op. Fixes tailscale/corp#35985 Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>	2026-02-09 17:06:27 -05:00
KevinLiang10	03461ea7fb	wgengine/netstack: add local tailscale service IPs to route and terminate locally (#18461 ) * wgengine/netstack: add local tailscale service IPs to route and terminate locally This commit adds the tailscales service IPs served locally to OS routes, and make interception to packets so that the traffic terminates locally without making affects to the HA traffics. Fixes tailscale/corp#34048 Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com> * fix test Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com> * add ready field to avoid accessing lb before netstack starts Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com> * wgengine/netstack: store values from lb to avoid acquiring a lock Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com> * add active services to netstack on starts with stored prefs. Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com> * fix comments Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com> * update comments Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com> --------- Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>	2026-01-30 16:46:03 -05:00

1 2 3 4 5 ...

1879 Commits