tailscale

mirror of https://github.com/tailscale/tailscale.git synced 2026-05-05 20:26:47 +02:00

Author	SHA1	Message	Date
Alex Chan	94643ba572	Check that we need to compute missing AUMs before updating local Change-Id: If2d2ff03fc650f8d50bb45c1c7ff8bf622715b89	2026-04-30 15:10:57 +01:00
Alex Chan	8749d19029	tka,ipn: reduce boilerplate in Tailnet Lock tests The `CreateStateForTest` helper reduces boilerplate in cases where the test only cares about the trusted keys and not the disablement values (and makes it more obvious where the disablement values are meaningful). The `setupChonkStorage` helper reduces the boilerplate when creating on-disk TKA storage in tests. The `fakeLocalBackend` helper reduces the boilerplate when setting up a `LocalBackend` instance in the IPN tests. Updates #cleanup Change-Id: Iacfba1be5f7fab208eec11e4369d63c7d7519da5 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-30 12:43:07 +01:00
Brad Fitzpatrick	f343b496c3	wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers Replace the UAPI text protocol-based wireguard configuration with wireguard-go's new direct callback API (SetPeerLookupFunc, SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey). Instead of computing a trimmed wireguard config ahead of time upon control plane updates and pushing it via UAPI, install callbacks so wireguard-go creates peers on demand when packets arrive. This removes all the LazyWG trimming machinery: idle peer tracking, activity maps, noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the ts_omit_lazywg build tag. For incoming packets, PeerLookupFunc answers wireguard-go's questions about unknown public keys by looking up the peer in the full config. For outgoing packets, PeerByIPPacketFunc (installed from LocalBackend.lookupPeerByIP) maps destination IPs to node public keys using the existing nodeByAddr index. Updates tailscale/corp#12345 Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 19:46:19 -07:00
Brad Fitzpatrick	b313bffbe7	control/tsp, tstest/integration/testcontrol: deflake TestMapAgainstTestControl The test was flaky under stress with "AddRawMapResponse N: node not connected" failures. The root cause was in testcontrol's addDebugMessage: it conflated "no streaming poll registered" with "wake-up channel buffer momentarily full". The single-slot updatesCh is just a lossy wake-up signal, but the streaming serveMap loop has fast paths (takeRawMapMessage and the hasPendingRawMapMessage continue) that don't drain it. A stale notification could remain buffered, causing the next sendUpdate to fail even though msgToSend had been queued and the streaming poll would still pick it up. Detect the real failure case (no streaming poll) by checking s.updates[nodeID] directly, and treat sendUpdate's buffer-full result as benign — the message is in msgToSend, which is the source of truth. Also plumb an optional health.Tracker through tsp.ClientOpts to the underlying ts2021.Client and supply one in the tests, eliminating the "## WARNING: (non-fatal) nil health.Tracker (being strict in CI)" stack dumps emitted by controlhttp.(Dialer).forceNoise443 under CI. Fixes #19583 Change-Id: Ib2334376585e8d6562f000a0b71dea0117acb0ff Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 16:11:00 -07:00
Claus Lensbøl	978b6a81b2	ipn/ipnlocal: always ReSTUN when starting up without a cache (#19586) 78627c1 introduced starting up and preserving the DERP server from cache, but also changed it so the initial ReSTUN would not fire when setting the DERPMap. Change this so when not working from a cache, the ReSTUN will always fire during startup. Updates #19585 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-29 18:56:57 -04:00
Jordan Whited	c0a9728fe2	derp/derpserver: fix Server.UpdateRateLimits docs As of 0e9f9e2bd it is possible to have an infinity per-client limit, with finite global. Updates tailscale/corp#40962 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-04-29 14:43:12 -07:00
Jordan Whited	0e9f9e2bd8	derp/derpserver: support global rate limiting independent of per-client This commit enables the operator to set a global rate limit without any per-client. Updates tailscale/corp#40962 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-04-29 14:15:53 -07:00
Brad Fitzpatrick	15cba0a3f6	tstest/natlab/vmtest: add TestDiscoKeyChange Add a vmtest that brings up two gokrazy nodes A and B behind two One2OneNAT networks (so direct UDP works in both directions and any slowness can't be blamed on NAT traversal), establishes a WireGuard tunnel A → B with TSMP, then rotates B's disco key four times and asserts that the data plane recovers in both directions after each rotation. All pings are TSMP (the data-plane ping; disco pings would not exercise the WireGuard tunnel itself). The five pings: 1. A → B (initial; brings up the tunnel; 30s budget) 2. B → A after rotate (LocalAPI rotate-disco-key debug action) 3. A → B after rotate (LocalAPI) 4. B → A after restart (SIGKILL; gokrazy supervisor respawns) 5. A → B after restart (SIGKILL) Each post-rotation ping gets a 15-second budget. Two unavoidable multi-second waits dominate today: - The rotate-then-a→b phase takes ~10s on main because of LazyWG. After B's WantRunning bounce, B's wgengine resets its sentActivityAt/recvActivityAt maps and trims A out of the wireguard-go config as an "idle peer"; B only re-adds A on inbound activity, by which point A's first few TSMP packets have been silently dropped at B's tundev. The bradfitz/rm_lazy_wg branch removes that trimming entirely (verified locally: this phase drops to <100ms there). - The restart phases take ~5s for wireguard-go's RekeyTimeout handshake retry. After SIGKILL+respawn the first WG handshake init from the restarted node sometimes goes into the void (likely the brief peer-removed window in the receiver's two-step maybeReconfigWireguardLocked reconfig during which the peer is absent from wireguard-go), and wg-go's 5s+jitter retransmit timer is the next opportunity to retry. That retry succeeds and the staged TSMP packet flushes. Intrinsic to the protocol's retransmit policy. Once LazyWG is removed and the first-handshake-after-reconfig race is fixed, the budget should drop to 5s. 
Supporting changes: ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and back on after rotating the disco key. magicsock.Conn.RotateDiscoKey only resets local disco state; without also dropping wireguard-go session keys, peers keep encrypting with their stale per-peer session against us until their rekey timer fires (WireGuard has no data-plane signaling to invalidate sessions). Bouncing WantRunning runs the engine through Reconfig(empty) → authReconfig, which drops every peer's WG session so the next packet either way triggers a fresh handshake. ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys" LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns a map[NodePublic]DiscoPublic from the current netmap. Tests reach it via [local.Client.DebugResultJSON]. We do not surface disco keys via [ipnstate.PeerStatus] because adding a non-comparable [key.DiscoPublic] field there breaks reflect-based test helpers (e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and general LocalAPI clients have no need for disco keys. Since the debug LocalAPI is gated behind the ts_omit_debug build tag, this endpoint is automatically stripped from small binaries. cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk) to drive the SIGKILL phase. On gokrazy the supervisor respawns tailscaled within a second. tstest/integration/testcontrol: add Server.AllOnline. When set, every peer entry in MapResponses is marked Online=true. Several disco-key handling fast paths in controlclient and wgengine (removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFullNetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire for online peers; without this flag, tests exercising disco-key rotation only hit the offline-peer code paths, which mask issues and are several seconds slower in this scenario. Finer-grained per-node online tracking can be added later.
tstest/natlab/vmtest: add Env.RotateDiscoKey, Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an [AllOnline] EnvOption that plumbs through to testcontrol.Server.AllOnline, and an exported Env.Ping(from, to, type, timeout). Ping replaces the unexported helper so callers can specify both a ping type (PingDisco for warming peer state, PingTSMP for asserting end-to-end connectivity) and a deadline. PeerDiscoKey returns its LocalAPI error so callers inside tstest.WaitFor can retry transient failures rather than fataling the test. Updates #12639 Updates #13038 Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 12:58:00 -07:00
Brad Fitzpatrick	22ff402da9	wgengine/magicsock: restore SetDERPMap signature, add SetDERPMapWithoutReSTUN Commit 78627c132f changed the signature of magicsock.Conn.SetDERPMap to take an additional bool doReStun parameter. Avoid both the boolean parameter and the API signature change by restoring SetDERPMap to its original single-argument form and adding a new SetDERPMapWithoutReSTUN method for the cache-loading caller that wants to skip the post-set ReSTUN. Updates #19490 Change-Id: I97d9e82156bfc546ccf59756d1ea52f039b5de06 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 12:46:15 -07:00
Adriano Sela Aviles	1cd8bcc827	tailcfg: extend services model for client application actions Updates: tailscale/corp#40648 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>	2026-04-29 11:33:13 -07:00
Brad Fitzpatrick	70f0b261b6	go.mod, gokrazy: bump to fork of gokrazy/gokrazy init process for syslog change When we switched to monogok in 371d6369cd25afb, we lost our gokrazy fork's change to let the syslog be configured from the Linux cmdline. That's sent upstream in gokrazy/gokrazy#275 but still in review. Meanwhile, revert to a fork, while still keeping monogok. Monogok was updated to support an alternate init package, which is now hosted temporarily at https://github.com/tailscale/ts-gokrazy This means we can rip the log polling loop out of pending PR #19568 and go back to using syslog. Updates #13038 Change-Id: I36931ee8eecc40d6165ad036c6181dfb07b86ba2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 11:27:41 -07:00
Alex Valiushko	01d0bdd253	cmd/derper,derp: add metrics for rate limit hits (#19560) Expvars track count of rate limiters exceeding their threshold. Covers (1) global rate limiter and (2) total of local rate limiters. Also publish optional rate-limit metrics during ExpVar() call if -rate-config is specified. Fixes current rate-limit metrics being published outside of "derp" in /debug/vars. Updates tailscale/corp#38509 Change-Id: Ic7f5a1e890d0d7d3d7b679daa4b5f8926a6a6964 Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>	2026-04-29 10:29:09 -07:00
Claus Lensbøl	be7cce74ba	wgengine/userspace: do not fall back to old key on tsmpLearned mismatch (#19575) The mismatch behaviour of falling back to a previous key could end up breaking connections when the netmap update took longer than the 2 seconds allowed in controlClient.auto for netmap updates, or if the controlClient context was canceled. This could end up breaking legitimate updates to the netmap for disco keys coming from control. Instead, log the event, and let the connection be reset to that of the key as that is safer. Issue found by @bradfitz. Updates #19574 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-29 13:23:04 -04:00
Brad Fitzpatrick	fd6ae2fad4	tstest/natlab/vmtest: serialize per-platform setup with sync.Once Two cloud-platform nodes (e.g. sr-a and sr-b in TestSiteToSite) boot in parallel via errgroup and both call ensureCompiled and the inline image preparation block, racing to Begin() the same shared Step (which is deduped by name in Env.Step). The second goroutine panics: panic: Step "Compile linux_amd64 binaries": Begin called in state running panic: Step "Prepare ubuntu-24.04 image": Begin called in state done ensureCompiled had a TOCTOU dedup attempt (released compileMu before doing the work, only added to the compiled set at the end), and image preparation had no dedup at all. Replace the compiled set with a per-key map[string]sync.Once for each of compile and image preparation, so concurrent callers serialize on the Once and only the first executes Begin/work/End. Fixes commit 02ffe5baa8ccb2b81c4cfba3b59653e2cff10e01. Updates #13038 Change-Id: If710bcc9e0aafebf0ad5b61553bae11458d976d7 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 09:54:58 -07:00
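The per-key Once pattern described in fd6ae2fad4 can be sketched roughly as below. This is a minimal illustration, not the vmtest code: the `onceMap` type, its `Do` method, and `countRuns` are hypothetical names, and the map stores `*sync.Once` pointers (an assumption made here because a `sync.Once` must not be copied after first use).

```go
package main

import (
	"fmt"
	"sync"
)

// onceMap runs a function at most once per key, even with concurrent callers.
// Hypothetical helper for illustration only.
type onceMap struct {
	mu sync.Mutex
	m  map[string]*sync.Once
}

// Do runs f unless an earlier call with the same key already ran it.
// Concurrent callers with the same key block until the first f returns.
func (o *onceMap) Do(key string, f func()) {
	o.mu.Lock()
	if o.m == nil {
		o.m = make(map[string]*sync.Once)
	}
	once, ok := o.m[key]
	if !ok {
		once = new(sync.Once)
		o.m[key] = once
	}
	o.mu.Unlock()
	// The map lock is released before the work starts, but the Once itself
	// keeps same-key callers serialized, so there is no check-then-act gap.
	once.Do(f)
}

// countRuns starts n goroutines that all request the same key and reports
// how many times the work function actually executed.
func countRuns(n int) int {
	var om onceMap
	var wg sync.WaitGroup
	runs := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			om.Do("compile linux_amd64", func() { runs++ })
		}()
	}
	wg.Wait()
	return runs
}

func main() {
	fmt.Println("work executions:", countRuns(8))
}
```

Unlike a "compiled" set that is only updated after the work finishes, the Once is registered under the lock before any work begins, so a second caller can never start the same step.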
Brad Fitzpatrick	02ffe5baa8	tstest/natlab/vmtest: add macOS VM snapshot caching for fast test starts Cache a pre-booted macOS VM snapshot on disk so subsequent test runs restore from the snapshot instead of cold-booting. The snapshot is keyed by the Tart base image digest and a code version constant (macOSSnapshotCodeVersion); bumping either invalidates the cache. Snapshot preparation (one-time): - Boot the Tart base image with a NAT NIC (--nat-nic flag) - Wait for SSH, compile and install cmd/tta as a LaunchDaemon - TTA polls the host via AF_VSOCK for an IP assignment; during prep the host replies "wait" - Disconnect NIC, save VM state via SIGINT Test fast path (cached, ~7s to agent connected): - APFS clone the snapshot, write test-specific config.json - Launch Host.app with --disconnected-nic --attach-network --assign-ip - VZ restores from SaveFile.vzvmsave (~5s with 4GB RAM) - TTA's vsock poll gets the IP config, sets static IP via ifconfig (bypasses DHCP entirely), switches driver addr to the IP directly (bypasses DNS), and resets the dial context so the reverse-dial reconnects immediately - TTA agent connects to test driver within ~2s of IP assignment Key optimizations: - 4GB RAM instead of 8GB: halves SaveFile.vzvmsave (1.4GB vs 2.4GB), halves restore time (5.5s vs 11s) - AF_VSOCK IP assignment: bypasses macOS DHCP (~5-7s saved) - Direct IP dial: bypasses DNS resolution for test-driver.tailscale - Dial context reset: cancels stale in-flight dials from snapshot - Kill instead of SIGINT for test VM cleanup (no state save needed) - Parallel VM launches Also: - Add TestDriverIPv4/TestDriverPort constants to vnet - Add --nat-nic and --assign-ip flags to Host.app - Fix SIGINT handler: retain DispatchSource globally, use dispatchMain() - Add vsock listener (port 51011) to Host.app for IP config protocol - Add disconnectNetwork() to VMController for clean snapshot state - Fix Makefile: set -o pipefail so xcodebuild failures aren't swallowed Updates #13038 Change-Id: Icbab73b57af7df3ae96136fb49cda2536310f31b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 08:17:13 -07:00
M. J. Fromberger	7b53550fe6	control/controlclient: fix a nil-indirection bug in DERP key pruning (#19565) Upon deciding to update the LastSeen timestamp, we weren't checking that the field we are replacing into was non-nil. Rather than add an additional check, just allocate a fresh pointer for the updated time. Updates #19564 Change-Id: I589ebe65175fc7677c04a31dd6c4670e2531ee62 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-04-29 07:57:38 -07:00
David Bond	a29e42135b	cmd/k8s-operator: add nodeSelector to `DNSConfig` resource (#19429) This commit modifies the `DNSConfig` resource to allow customisation of the `spec.nodeSelector` field in the nameserver pods. Closes: https://github.com/tailscale/tailscale/issues/19419 Signed-off-by: David Bond <davidsbond93@gmail.com>	2026-04-29 15:56:33 +01:00
Brad Fitzpatrick	4cec06b8f2	tstest/natlab/vmtest: add macOS VM screenshot streaming to web UI When --vmtest-web is set, Host.app is launched with --screenshot-port 0 to start a localhost HTTP server that captures the VZVirtualMachineView display. The Go test harness parses the SCREENSHOT_PORT=<port> line from stdout, then polls every 2 seconds for JPEG thumbnails and pushes them over WebSocket to the web dashboard. Clicking a screenshot thumbnail opens a full-resolution image proxied through the web UI's /screenshot/{node} endpoint. Screenshot events are excluded from the EventBus history (they're large and only the latest matters, stored in NodeStatus.Screenshot). Updates #13038 Change-Id: I9bc67ddd1cc72948b33c555d4be3d8db06a41f6d Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 07:48:26 -07:00
Claus Lensbøl	78627c132f	wgengine/magicsock,ipn/ipnlocal: store and load homeDERP from cache (#19491) With netmap caching, the home DERP of the self node was neither saved to the cache nor loaded from it, making nodes not stick to a DERP when starting without a connection to control. Instead, make sure that when a cache is available, load that cache, before looking for DERP servers. This is implemented by allowing a skip of ReSTUN in setting the DERP map (we must have a DERP map before setting the home DERP), so the DERP from cache will set itself and be sticky until a connection to control is established. Making DERP only change when connected to control is handled by existing code from f072d017bd8241675aa946a27fc1827f570435cb. Updates #19490 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-29 10:24:09 -04:00
Alex Chan	1841a93ab2	ssh/tailssh: mark TestSSHRecordingCancelsSessionsOnUploadFailure as flaky (again) This test is still flaking on macOS, so mark it as such so we can track and investigate further. Updates #7707 Change-Id: I640da3c1068a90a9815caab2df9431bceb01f846 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-29 14:22:09 +01:00
Alex Chan	bb91bb842c	all: remove everything related to non-seamless key renewal Seamless key renewal has been the default in all clients since 1.90. We retained the ability to disable it from the control plane as a precaution, but we haven't seen any issues that require us to disable it. We're now removing all the code for non-seamless key renewal, because we don't expect to turn it on again, and indeed it's been untested in the field for three releases so might contain latent bugs! Updates tailscale/corp#33042 Change-Id: I4b80bf07a3a50298d1c303743484169accc8844b Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-29 10:03:26 +01:00
Noel O'Brien	40088602c9	cmd/hello: remove hello.ipn.dev (#19567) Fixes #19566 Signed-off-by: Noel O'Brien <noel@tailscale.com>	2026-04-28 17:54:29 -07:00
Brad Fitzpatrick	b2d4ba04b6	tstest/natlab/vmtest: add macOS VM support using Tart base images Add macOS VM support to the vmtest framework using Tart's pre-built macOS images (ghcr.io/cirruslabs/macos-tahoe-base) instead of building from IPSW. The Tart image has SIP disabled and SSH enabled. At test time, the Tart base image's disk, NVRAM, and hardware identity are APFS-cloned into a tailmac-compatible directory layout, and the VM is booted headlessly via tailmac's Host.app (Virtualization.framework) with its NIC connected to vnet's dgram socket. New features: - tailmac.go: ensureTartImage (auto-pull), cloneTartToTailmac (format conversion), startTailMacVM (launch + cleanup) - NoAgent() node option for VMs without TTA installed - LANPing() for ICMP reachability testing via TTA's /ping endpoint - IsMacOS field on OSImage, with GOOS/GOARCH support - Dgram socket listener in Start() for macOS VMs - Fix ReadFromUnix error spam on dgram socket close in vnet TestMacOSAndLinuxCanPing verifies a macOS Tart VM and a gokrazy Linux VM can ping each other on the same vnet LAN. Updates #13038 Change-Id: I5e73a27878abf009f780fdf11a346fc857711cff Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 12:51:40 -07:00
Brad Fitzpatrick	ec7b11d986	tstest/natlab/vmtest, cmd/tta: add TestTaildrop Add a vmtest that brings up two Ubuntu nodes, each behind its own EasyNAT, joined to the tailnet. The sender pushes a small file via "tailscale file cp" and the receiver fetches it via "tailscale file get --wait", asserting that the filename and contents round-trip unchanged. To make Taildrop work in vmtest, three small pieces were needed: The Linux/FreeBSD cloud-init now starts tailscaled with --statedir as well as --state=mem:, so the daemon has a VarRoot to host Taildrop's incoming-files directory. State itself remains in-memory (so nothing persists across reboots); only the var-root scratch space is on disk. vmtest.New grows a variadic EnvOption parameter and a SameTailnetUser helper. When the option is passed, Start sets AllNodesSameUser=true on the embedded testcontrol.Server. Cross-node Taildrop requires the sender and receiver to share a Tailnet user (or have an explicit PeerCapabilityFileSharingTarget granted between them, which we don't plumb here), so TestTaildrop opts in. Existing tests don't. cmd/tta gains /taildrop-send and /taildrop-recv handlers that wrap "tailscale file cp" and "tailscale file get --wait", plus Env.SendTaildropFile and Env.RecvTaildropFile helpers in vmtest that drive them. Updates #13038 Change-Id: I8f5f70f88106e6e2ee07780dd46fe00f8efcfdf1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 12:27:55 -07:00
Brad Fitzpatrick	4b8e0ede6d	tstest/natlab/{vmtest,vnet}, cmd/tta: add TestMullvadExitNode Add a vmtest that brings up a Tailscale client, an Ubuntu VM acting as a Mullvad-style plain-WireGuard exit node, and a non-Tailscale webserver, each on its own NAT'd vnet network with a distinct WAN IP. The test exercises Tailscale's IsWireGuardOnly peer code path: the way the control plane wires Mullvad exit nodes into a client's netmap, including the per-client SelfNodeV4MasqAddrForThisPeer source-IP rewrite that lets a Tailscale CGNAT IP egress through a plain-WireGuard tunnel that has no idea what Tailscale is. The mullvad VM doesn't run wireguard-tools or kernel WireGuard; instead, a new TTA endpoint /wg-server-up creates a real Linux TUN named wg0, drives it with wireguard-go (already vendored), and configures the kernel side (ip addr/up, ip_forward, iptables NAT MASQUERADE) so decrypted traffic from the peer egresses with the mullvad VM's WAN IP. Userspace vs kernel WireGuard makes no difference on the wire — what's being tested is Tailscale's plain-WireGuard exit-node code path, not the kernel module — and this lets the test avoid downloading and installing .deb packages inside the VM. Adds Env.BringUpMullvadWGServer (calls /wg-server-up, returns the generated WG public key as a key.NodePublic), Env.SetExitNodeIP (EditPrefs ExitNodeIP directly, for exit nodes whose IPs aren't discoverable via TTA), Env.ControlServer (exposes the underlying testcontrol.Server so tests can UpdateNode / SetMasqueradeAddresses to inject custom peers), and Env.Status (fetches a node's tailscale status, used to read the client's pubkey so we can pin it as the WG server's only allowed peer). The test verifies that the webserver's echoed source IP is the client's WAN with no exit node selected, the mullvad VM's WAN with the WG-only peer selected as exit, and the client's WAN again after clearing. 
Updates #13038 Change-Id: I5bac4e0d832f05929f12cb77fa9946d7f5fb5ef1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 11:31:48 -07:00
Andrew Lytvynov	da0a277565	client/web: fail /api/routes requests with empty flags (#19548) If both ExitNode and AdvertiseRoutes flags are empty, then the request is invalid and should fail. Previously it would wipe out any existing values configured for these prefs because of the assumption in the handler that exactly one of them is set. Updates https://github.com/tailscale/corp/issues/40851 Signed-off-by: Andrew Lytvynov <awly@tailscale.com>	2026-04-28 11:16:47 -07:00
Brad Fitzpatrick	f7f8b0a0a5	cmd/tailscale/cli: drive "file cp" progress and offline warning from peerAPI The Online bit in PeerStatus comes from control's last-known state and can lag reality, so gating "tailscale file cp" on it is both unreliable and pushes correctness onto the server. Just try the push directly. In runCp, when the target's PeerStatus says it's offline, no longer fail upfront; getTargetStableID returns the StableID anyway. Replace the static "is offline" warning with a 3-second timer armed for the first file: if the timer fires before peerAPI bytes have flowed, we print a warning to stderr. The wording depends on whether control reported the peer offline ("is reportedly offline; trying anyway") or online ("is not replying; trying anyway"). The warning is printed with a leading vt100 clear-line and a trailing newline so it doesn't get painted over by the progress redraw and so the next progress redraw lands on a fresh line below it. Both the timer disarm and the progress display now read from tailscaled's OutgoingFile.Sent (subscribed via WatchIPNBus) instead of the local-body counter. That's the difference between bytes-acked-by-local-tailscaled (what countingReader.n was measuring; useless for detecting an unreachable peer because for small files net/http buffers the entire body into the unix-socket conn before the peerAPI dial has even started) and bytes-pulled-toward-peerAPI (what tailscaled is actually doing, reflected in OutgoingFile.Sent). The previous code reported 100% within milliseconds for a 3 KiB file even when the peer was unreachable. Add --update-interval (default 250ms) to control the progress repaint cadence; zero or negative disables the progress display entirely. The printer now also stops repainting once it observes Sent at full size with a near-zero rate for >2s, so a stuck transfer doesn't keep clobbering whatever the rest of runCp is trying to print.
Updates #18740 Change-Id: I189bd1c2cd8e094d372c4fee23114b1d2f8024b4 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 11:03:58 -07:00
Brad Fitzpatrick	88cb6f58f8	tool/updateflakes, cmd/nardump: replace update-flake.sh with Go tool Consolidate go.mod.sri and go.toolchain.rev.sri into a single flakehashes.json file at the repo root, owned by a new Go program at tool/updateflakes. The JSON is consumed by flake.nix via builtins.fromJSON and by any future Go code via the FlakeHashes struct that defines its schema. Each block records its input fingerprint alongside the SRI it produced: the goModSum (a sha256 over go.mod and go.sum) for the vendor block, and the literal rev string from go.toolchain.rev for the toolchain block. updateflakes regenerates a block only when its recorded fingerprint disagrees with the current input. Doing the gating by content rather than file mtimes avoids the usual mtime hazards across git checkouts, clones, and merges. It also means re-runs with no input changes are essentially free, and a re-run that touches only one input pays only for that one block. The two blocks have no shared state -- vendor invokes go mod vendor into one tempdir, toolchain fetches and extracts a tarball into another -- so they run concurrently via errgroup. Cold time is bounded by the slower of the two rather than their sum. Also takes the opportunity to fold the toolchain fetch into a single curl|tar pipeline (no intermediate .tar.gz on disk). Split cmd/nardump into a thin package main and a new package nardump library at cmd/nardump/nardump that holds the NAR encoder and SRI helper. tool/updateflakes imports the library directly rather than building and exec'ing the nardump binary at runtime. The library uses fs.ReadLink (Go 1.25+) instead of os.Readlink, so it no longer requires the caller to chdir into the FS root for symlink targets to resolve. WriteNAR now wraps its writer in a bufio.Writer internally (unless the caller already passed one) and flushes on return, so callers don't pay for tiny writes against slow underlying writers.
The cache-busting line in flake.nix and shell.nix is known to live at end of file, so updateCacheBust walks the lines in reverse. make tidy timings on this machine, before: ~14s every run. After: warm (no input changes): 0.05s vendor block stale only: 1.4s toolchain block stale only: 5.0s cold (no flakehashes.json): 5.0s Updates #6845 Change-Id: I0340608798f1614abf147a491bf7c68a198a0db4 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 10:18:32 -07:00
Andrew Dunham	33714211c8	net/dns: use os.Root to prevent path traversal in darwin resolver The darwinConfigurator writes split DNS resolver files to /etc/resolver/$SUFFIX using os.WriteFile with string concatenation. A crafted MatchDomain value containing path traversal sequences (e.g. "../evil") could write files outside the resolver directory. Use os.OpenRoot to confine all file operations in SetDNS and removeResolverFiles to the resolver directory. os.Root rejects any path component that escapes the root, returning an error instead of following the traversal. Also parametrize the resolver directory path on the struct to enable testing with t.TempDir(), and add tests. As far as I can tell, this would require a malicious controlplane to exploit, but still worth fixing. Updates tailscale/corp#39751 Signed-off-by: Andrew Dunham <andrew@tailscale.com>	2026-04-28 11:08:22 -04:00
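The confinement described in 33714211c8 can be illustrated with a small sketch, assuming Go 1.24 or later (where `os.OpenRoot` exists). The `writeConfined` and `tryWrites` helpers, the temporary directory, and the file contents are hypothetical and are not taken from the darwin configurator.

```go
package main

import (
	"fmt"
	"os"
)

// writeConfined writes data to name inside dir. Every file operation goes
// through an os.Root, so a name that escapes dir is rejected with an error.
// Hypothetical helper for illustration only.
func writeConfined(dir, name string, data []byte) error {
	root, err := os.OpenRoot(dir)
	if err != nil {
		return err
	}
	defer root.Close()

	f, err := root.Create(name)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// tryWrites attempts one ordinary write and one traversal attempt against a
// fresh temporary directory and returns both results.
func tryWrites() (plainErr, traversalErr error) {
	dir, err := os.MkdirTemp("", "resolver-sketch")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)

	plainErr = writeConfined(dir, "example.com", []byte("nameserver 192.0.2.1\n"))
	traversalErr = writeConfined(dir, "../evil", []byte("should never be written\n"))
	return plainErr, traversalErr
}

func main() {
	plainErr, traversalErr := tryWrites()
	fmt.Println("plain name error:", plainErr)
	fmt.Println("traversal error: ", traversalErr)
}
```

Taking the directory as a parameter rather than hard-coding it is also what makes this kind of code testable against a temporary directory, which is the approach the commit message describes.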
Brad Fitzpatrick	b9eac14ef9	tstest/natlab/vmtest: add web UI for watching VM tests live Add an optional --vmtest-web flag that starts an HTTP server showing a live dashboard for vmtest runs. The dashboard includes: - Step progress tracker showing all test phases (compile, image prep, QEMU launch, agent connect, tailscale up, test-specific steps) with status icons and elapsed times - Per-VM "virtual monitor" cards showing serial console output streamed in realtime via WebSocket - Per-NIC DHCP status (supporting multi-homed VMs like subnet routers) - Per-node Tailscale status (hidden for non-tailnet VMs) - Test status badge (Running/Passed/Failed) with live elapsed timer - Event log showing all lifecycle events chronologically Architecture follows the existing util/eventbus HTMX+WebSocket pattern: the server pushes HTML fragments with hx-swap-oob attributes over a WebSocket, and HTMX routes them to the correct DOM elements by ID. Key components: - vmstatus.go: Step tracker (Begin/End lifecycle), EventBus (pub/sub with history for late joiners), VMEvent types, NodeStatus tracking - web.go: HTTP server, WebSocket handler, template loading, ANSI-to-HTML conversion via robert-nix/ansihtml, deterministic port selection - assets/: HTML templates, CSS, HTMX library (copied from eventbus) - vnet/vnet.go: DHCP event callback on Server for observing DHCP lifecycle - qemu.go: Console log file tailing with manual offset-based reading Usage: go test ./tstest/natlab/vmtest/ --run-vm-tests --vmtest-web=:0 -v When using :0, a deterministic port based on the test name is tried first so re-runs get the same URL, falling back to OS-assigned on conflict. Updates #13038 Change-Id: I45281347b3d7af78ed9f4ff896033984f84dcb4d Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 07:46:04 -07:00
Alex Chan	0ac09721df	tka: reduce boilerplate code in the tests Updates #cleanup Change-Id: Id69d509f5e470fb5fb50b5c5c4ca61f000389c53 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-28 16:42:48 +02:00
Brad Fitzpatrick	cb239808a6	tstest/natlab/vmtest: add --test-version flag Add a --test-version flag to run the natlab VM tests against released tailscale/tailscaled binaries downloaded from pkgs.tailscale.com instead of building from the source tree. The value can be a concrete release like "1.97.255", or "stable" / "unstable" which resolve to the latest TarballsVersion on that track via pkgs.tailscale.com/<track>/?mode=json. The track for a concrete version is derived from its minor (even=stable, odd=unstable). The host architecture (amd64 or arm64) selects the tarball. Tarballs are cached + extracted under ~/.cache/tailscale-vmtest/builds/<version>_<arch>/ so they are not re-fetched per test. tta is still always built from the local tree. Cloud VMs (Ubuntu, Debian) pick up the downloaded binaries via the existing files.tailscale file server. Non-Linux GOOS (FreeBSD) falls back to building from source since pkgs.tailscale.com only ships Linux tarballs. Gokrazy nodes continue to use binaries baked into the gokrazy image; --test-version is a no-op for them. Updates #13038 Change-Id: I213ef7db362dd17bf69d2685cbf2ab0ec5a3fee1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 06:59:26 -07:00
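The track derivation mentioned in cb239808a6 (even minor is stable, odd minor is unstable) might look roughly like the following. This is a sketch only: `trackForVersion` is a hypothetical name, and the strict major.minor.patch parsing is an assumption rather than the vmtest implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// trackForVersion maps a concrete release such as "1.97.255" to its release
// track using the parity of the minor version: even is stable, odd is
// unstable. Hypothetical helper for illustration only.
func trackForVersion(v string) (string, error) {
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("version %q is not of the form major.minor.patch", v)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil || minor < 0 {
		return "", fmt.Errorf("version %q has an invalid minor component", v)
	}
	if minor%2 == 0 {
		return "stable", nil
	}
	return "unstable", nil
}

func main() {
	for _, v := range []string{"1.96.0", "1.97.255", "stable"} {
		track, err := trackForVersion(v)
		fmt.Println(v, track, err)
	}
}
```

The literal values "stable" and "unstable" are track names rather than versions, so a caller would check for them before calling a parser like this one.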
Daniel Pañeda	7735b15de3	cmd/k8s-operator: truncate long label values in metrics resources (#18895 ) * cmd/k8s-operator: truncate long label values in metrics resources Kubernetes label values have a 63-character limit, but resource names can be up to 253 characters. When a Service or Ingress with a long name is exposed via Tailscale, the operator fails to reconcile because it uses the parent resource name directly as label values on metrics Services. Truncate label values that may exceed the limit by keeping the first 54 characters and appending a SHA256-based hash suffix to preserve uniqueness. Fixes #18894 Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com> Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/k8s-operator: move TruncateLabelValue to shared k8s-operator package Move the label truncation helper to k8s-operator/utils.go so it can be reused by other components that need to produce valid Kubernetes labels. Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com> Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/k8s-operator: truncate long domain label values in cert resources Applies TruncateLabelValue to certResourceLabels in order to prevent API server validation failures. This covers both the HA Ingress and kube-apiserver proxy reconcilers, as both flow through certResourceLabels. Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/k8s-operator: remove empty metrics_resources_test.go, use hyphens in test names to satisfy go vet Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> --------- Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com> Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>	2026-04-28 14:11:59 +01:00
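The truncation scheme in the commit above (keep the first 54 characters and append a SHA256-based suffix so the result fits the 63-character label limit) might look roughly like the sketch below. It is not the operator's actual `TruncateLabelValue`: the lowercase function name, and the suffix layout of a hyphen plus 8 hex characters, are assumptions chosen so that 54 + 9 = 63.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const (
	maxLabelLen = 63 // Kubernetes label value limit
	prefixLen   = 54 // characters kept from the original value, per the commit message
)

// truncateLabelValue returns v unchanged if it fits in a label value;
// otherwise it keeps the first 54 bytes and appends a hash-derived suffix.
// The suffix format ("-" + 8 hex chars) is an assumption for this sketch.
func truncateLabelValue(v string) string {
	if len(v) <= maxLabelLen {
		return v
	}
	sum := sha256.Sum256([]byte(v))
	return v[:prefixLen] + "-" + hex.EncodeToString(sum[:])[:8]
}

func main() {
	long := strings.Repeat("a", 100)
	out := truncateLabelValue(long)
	fmt.Println(len(out), out)
}
```

Hashing the full original value keeps two long names that share their first 54 characters distinct after truncation, short of a hash collision.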
Kristoffer Dalby	384b7fb561	release/dist/qnap: preserve .codesigning files as build artifacts Stop deleting .qpkg.codesigning files in build-qpkg.sh and include them in the returned artifact list from buildQPKG. These files contain the last 32 characters of the base64-encoded CMS signature produced by QDK code signing. They are consumed by pkgserve to populate <signature> entries in the QNAP repository XML, matching the format used by myqnap.org and qnapclub.eu. Updates corp#33203 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2026-04-28 12:29:56 +01:00
Will Norris	2d85f37f39	client/systray: support several different color themes Currently we only have a dark theme icon with white and grey dots over a black background. For some desktops, a logo with black and grey dots over a white background might be preferable. And for desktops where the bar is almost black or white, but not quite, an option to render the logo with dots only and no background can look really nice. Add a new -theme flag to the systray command with the default staying the same as it is today. Updates #18303 Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d Signed-off-by: Will Norris <will@tailscale.com>	2026-04-27 18:54:14 -07:00
License Updater	325f52c654	licenses: update license notices Signed-off-by: License Updater <noreply+license-updater@tailscale.com>	2026-04-27 18:38:06 -07:00
Brad Fitzpatrick	d0ae993334	tstest/natlab/vmtest: add more subnet router tests Add two tests building on TestExitNode's framework: TestSubnetRouterPublicIP brings up a client, a subnet router, and a webserver, each on its own NAT'd network with distinct WAN IPs. The subnet router advertises the webserver's network as a route. The test toggles the client's --accept-routes preference and asserts that the webserver's echoed source IP switches between the client's own WAN (direct dial) and the subnet router's WAN (forwarded through the router and SNAT'd). TestSubnetRouterAndExitNode adds a fourth node, an exit node that advertises 0.0.0.0/0 + ::/0, and uses a table-driven layout with subtests to cover the four combinations of (exit on/off, subnet on/off). The case where both are on confirms longest-prefix match wins: the subnet router's /24 takes precedence over the exit node's /0. The exit node itself is configured with --accept-routes=off so that, in the exit-only case, it forwards directly to the simulated internet rather than re-routing the forwarded traffic via the subnet router (which would otherwise mask the exit node's WAN as the observed source). Adds an Env.SetAcceptRoutes helper for toggling the RouteAll pref via EditPrefs, used by both tests. Updates #13038 Change-Id: Ifc2726db1df2f039c477c222484f535bebc40445 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-27 17:06:17 -07:00
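The longest-prefix-match behaviour that TestSubnetRouterAndExitNode confirms (the subnet router's /24 beats the exit node's /0) can be illustrated in isolation with the standard `net/netip` package. The route table, the 10.0.5.0/24 prefix, and the `lookup` helper are invented for this sketch and do not come from the test.

```go
package main

import (
	"fmt"
	"net/netip"
)

// route pairs a destination prefix with the node that traffic is sent via.
type route struct {
	prefix netip.Prefix
	via    string
}

// exampleRoutes is a hypothetical table: an exit node advertising the IPv4
// default route and a subnet router advertising one /24.
var exampleRoutes = []route{
	{netip.MustParsePrefix("0.0.0.0/0"), "exit-node"},
	{netip.MustParsePrefix("10.0.5.0/24"), "subnet-router"},
}

// lookup returns the via of the most specific route containing dst.
func lookup(routes []route, dst string) (string, bool) {
	addr, err := netip.ParseAddr(dst)
	if err != nil {
		return "", false
	}
	best := -1
	via := ""
	for _, r := range routes {
		if r.prefix.Contains(addr) && r.prefix.Bits() > best {
			best = r.prefix.Bits()
			via = r.via
		}
	}
	return via, best >= 0
}

func main() {
	for _, dst := range []string{"10.0.5.7", "8.8.8.8"} {
		via, ok := lookup(exampleRoutes, dst)
		fmt.Println(dst, via, ok)
	}
}
```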
Brad Fitzpatrick	c0e6ffed0d	tstest/tailmac: add NIC hot-swap, disconnected NIC, and screenshot server Add NIC attachment hot-swap support to Host.app: VZNetworkDevice.attachment is writable at runtime, so --disconnected-nic creates a NIC with no attachment, and --attach-network hot-swaps it to a vnet dgram socket after boot/restore. macOS detects link-up and does DHCP. Refactor TailMacConfigHelper: extract createDgramAttachment() and createDisconnectedNetworkDeviceConfiguration() from the monolithic createSocketNetworkDeviceConfiguration(). Add --screenshot-port flag for headless mode. Host.app serves GET /screenshot as JPEG via a localhost HTTP server, capturing the VZVirtualMachineView via CGWindowListCreateImage. The Go test harness polls these to push live thumbnails to the web dashboard. Also: SIGINT handler in headless mode for clean VM state save. Updates #13038 Change-Id: I42fba0ecd760371b4ec5b26a0557e3dd0ba9ecae Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-27 17:03:09 -07:00
Brad Fitzpatrick	5c1738fd56	tstest/natlab/{vmtest,vnet}, cmd/tta: add TestExitNode Add a vmtest TestExitNode that brings up a client, two exit nodes, and a non-Tailscale webserver, each on its own NAT'd vnet network with a distinct WAN IP. The test cycles the client's exit node setting between off, exit1, and exit2 and asserts that the webserver echoes the expected post-NAT source IP for each. Three pieces were needed to make this work: vnet now forwards TCP between simulated networks at the packet level, mirroring the existing UDP path. When a guest VM sends TCP to another simulated network's WAN IP, the source network's gateway rewrites src via doNATOut and routeTCPPacket hands the packet off to the destination network, which rewrites dst via doNATIn and writes the rewritten frame onto the destination LAN. The TCP stacks of the two guest VM kernels talk end-to-end; vnet just NATs the IP/port headers in flight, so all TCP semantics (handshakes, options, sequence numbers, payload) are preserved without a gvisor TCP termination in the middle. Adds a focused TestInterNetworkTCP that exercises this path without any Tailscale machinery. cmd/tta binds its outbound dial to the default route's interface using SO_BINDTODEVICE. Without that, the moment tailscaled installs 0.0.0.0/0 → tailscale0 in response to setting an exit node, TTA's existing TCP connection to test-driver gets rerouted through the exit node. From the test driver's perspective the connection's packets then arrive with the exit node's WAN IP as the source rather than the client's, so they don't match the existing flow and the connection is dead — manifesting in the test as a hang on EditPrefs (which had actually completed in milliseconds on the daemon side, but whose response never made it back). Pinning the socket to the underlying NIC keeps TTA's agent connection on a real interface regardless of any policy routing tailscaled installs later. 
We bind rather than carry the Tailscale bypass fwmark because the fwmark approach is conditional on tailscaled having configured SO_MARK-based policy routing, while binding is unconditional. vmtest grows an Env.SetExitNode helper that sets ExitNodeIP via EditPrefs through the agent, used by the new test. Updates #13038 Change-Id: I9fc8f91848b7aa2297ef3eaf71fed9d96056a024 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-27 16:54:20 -07:00
Alex Chan	10b63f27ce	tstest/clock: explain what happens if you don't set a Start time While working on #19444, I assumed that omitting `Start` would return a clock that started at January 1, year 1, because that's the zero value for a `time.Time`, but actually it uses the current UTC time instead. This behaviour is non-obvious, so document it. Updates #cleanup Change-Id: Id91400778578655953ff3e1671ce470db97cfe91 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-28 00:15:46 +02:00
Brad Fitzpatrick	ad5436af0d	tstest/largetailnet, tstest/integration/testcontrol: add in-process large-tailnet benchmark Add a Go benchmark that exercises a single tailnet client (a [tsnet.Server] running in the test process) against a synthetic large initial netmap and a stream of caller-driven peer add/remove deltas, all in-process. The harness is split in two parts: - tstest/largetailnet, a reusable package containing a [Streamer] that hijacks the map long-poll on a [testcontrol.Server] via the new AltMapStream hook, sends one initial MapResponse with N synthetic peers, and forwards caller-supplied delta MapResponses on the same stream. Helpers like MakePeer / AllocPeer build synthetic peers with unique IDs and addresses derived from the Tailscale ULA range. - tstest/largetailnet/largetailnet_test.go, BenchmarkGiantTailnet (headless tailscaled workload, no IPN bus subscriber) and BenchmarkGiantTailnetBusWatcher (GUI-client workload with one Notify subscriber attached). Both are gated on --actually-test-giant-tailnet (skipped by default), stand up an in-process testcontrol + tsnet.Server, let Up block until the initial N-peer netmap has been processed, then ResetTimer and run add+remove pairs via b.Loop. Per-delta sync is via a test-only [ipnlocal.LocalBackend.AwaitNodeKeyForTest] channel that closes once the just-added peer key appears in the netmap (no-watcher variant) or via bus-Notify drain (bus-watcher variant). To support the hijack, [testcontrol.Server] grows an AltMapStream hook and a small MapStreamWriter interface for benchmarks/stress tests that need to drive a controlled MapResponse sequence; the normal serveMap path is untouched when AltMapStream is nil. The streamer answers non-streaming "lite" map polls (which controlclient issues before the streaming long-poll to push HostInfo) with an empty MapResponse and returns immediately, so the streaming poll that follows is the one that gets the initial netmap. 
The benchmark is intended for before/after comparisons of netmap- and delta-handling changes targeted at large tailnets. CPU profiles on unmodified main show the expected O(N) hotspots: setControlClientStatusLocked / authReconfigLocked / userspaceEngine.Reconfig / setNetMapLocked, plus JSON encoding of the full Notify.NetMap to bus watchers (which dominates the BusWatcher variant). Median ms/op over 10 runs on unmodified main, by tailnet size N: N no-watcher bus-watcher 10000 32 166 50000 222 865 100000 504 1765 250000 1551 4696 Recommended invocation: go test ./tstest/largetailnet/ -run=^$ \ -bench='BenchmarkGiantTailnet(BusWatcher)?$' \ -benchtime=2000x -timeout=10m \ --actually-test-giant-tailnet \ --giant-tailnet-n=250000 \ -cpuprofile=/tmp/giant.cpu.pprof Updates #12542 Change-Id: I4f5b2bb271a36ba853d5a0ffe82054ef2b15c585 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-27 11:47:12 -07:00
Mike O'Driscoll	33342aec32	The connmark save/restore rules in mangle/PREROUTING restore the Tailscale bypass fwmark (0x80000) onto reply packets so that rp_filter's reverse-path check routes through the main table instead of table 52. However, the kernel only uses the packet's fwmark during the rp_filter lookup when net.ipv4.conf.all.src_valid_mark=1. (#19537) On systems where this sysctl defaults to 0 (including GCP VMs), rp_filter performs its lookup with fwmark=0, hits rule 5270 then table 52 and routes to 0.0.0.0/0 dev tailscale0, and drops every reply packet arriving on the physical interface as a martian. This breaks all connectivity when using an exit node: DERP, DNS, control plane, and even the cloud metadata service. Set src_valid_mark=1 when enabling the connmark rules so the rp_filter workaround actually works in these cases. Updates #3310 Updates tailscale/corp#37846 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>	2026-04-27 13:52:45 -04:00
Brad Fitzpatrick	0e10a3f580	net/tsdial, ipn/localapi, client/local: let clients dial non-Tailscale addresses directly Add a tsdial.Dialer.UserDialPlan method that resolves an address and reports whether the dialer would route it via Tailscale. The LocalAPI /dial handler now uses this to skip proxying for addresses that aren't Tailscale routes (e.g. localhost), returning a Dial-Self response with the resolved address so the client can dial it directly. This avoids an unnecessary round-trip through the daemon for local connections. The client's UserDial handles the new response by dialing the resolved address itself, and the server passes the pre-resolved IP:port for Tailscale dials to avoid redundant DNS lookups. Thanks to giacomo and Moyao for pointing this out! Updates tailscale/corp#39702 Change-Id: I78d640f11ccd92f43ddd505cbb0db8fee19f43a6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-27 09:33:27 -07:00
Andrew Lytvynov	649781df84	util/pidowner: remove unused package (#19521) Added in 2020, this appears to be unused. Updates #cleanup Signed-off-by: Andrew Lytvynov <awly@tailscale.com>	2026-04-27 09:25:46 -07:00
Andrew Lytvynov	a70629eae3	util/topk: remove unused package (#19524) Added in 2024 and appears unused. Updates #cleanup Signed-off-by: Andrew Lytvynov <awly@tailscale.com>	2026-04-27 09:13:40 -07:00
Andrew Lytvynov	346d6bb04c	util/sysresources: remove unused package (#19523) Added a few years ago and appears to be unused. Updates #cleanup Signed-off-by: Andrew Lytvynov <awly@tailscale.com>	2026-04-27 09:13:30 -07:00
Andrew Lytvynov	64bb40b45b	util/pool: remove unused package (#19522) Added in 2024 and appears to be unused. Updates #cleanup Signed-off-by: Andrew Lytvynov <awly@tailscale.com>	2026-04-27 09:13:14 -07:00
BeckyPauley	7477a6ee47	cmd/k8s-operator: use dynamic resource names in e2e ingress tests (#19536) Replace hardcoded resource names with dynamically generated names in k8s-operator-e2e ingress tests to avoid collisions with stale resources. Updates #tailscale/corp#40612 Signed-off-by: Becky Pauley <becky@tailscale.com>	2026-04-27 13:40:46 +01:00
Evan Lowry	3a05c450ce	posture: add HealthTracker for serial number retrieval (#19181) Device posture checking can fail while enabled if tailscaled does not have access to smbios. Previously, this was only observable by looking in the tailscaled logs. Fixes tailscale/corp#39314 Signed-off-by: Evan Lowry <evan@tailscale.com>	2026-04-25 15:42:47 -03:00
Brad Fitzpatrick	f3b2f9b0ef	all: fix duplicate package docs and tighten TestPackageDocs TestPackageDocs walked into directories starting with "." (such as .claude worktrees) and only logged warnings on duplicate package docs across files in a directory. Skip dot-directories (which covers the old .git but also .claude), ignore files with "//go:build ignore" so command files don't falsely trip the duplicate check, and promote the duplicate-doc warning to a t.Errorf. While here, deduplicate the package docs that were previously only logged: drop the redundant comment from client/systray/startup-creator.go, move the comprehensive taildrop doc into feature/taildrop/doc.go, and remove a leftover doc fragment from feature/condlite/expvar/omit.go. The tstest/integration/vms allowlist is no longer needed since the //go:build ignore filter now handles its dns_tester.go and udp_tester.go files generically. Fixes #19526 Change-Id: Id794d96bd728826a1883a054e4a244f90fa05d3d Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-24 19:01:43 -07:00


10561 Commits