tailscale

mirror of https://github.com/tailscale/tailscale.git synced 2026-05-06 12:46:20 +02:00

Author	SHA1	Message	Date
Brad Fitzpatrick	15cba0a3f6	tstest/natlab/vmtest: add TestDiscoKeyChange Add a vmtest that brings up two gokrazy nodes A and B behind two One2OneNAT networks (so direct UDP works in both directions and any slowness can't be blamed on NAT traversal), establishes a WireGuard tunnel A → B with TSMP, then rotates B's disco key four times and asserts that the data plane recovers in both directions after each rotation. All pings are TSMP (the data-plane ping; disco pings would not exercise the WireGuard tunnel itself). The five pings: 1. A → B (initial; brings up the tunnel; 30s budget) 2. B → A after rotate (LocalAPI rotate-disco-key debug action) 3. A → B after rotate (LocalAPI) 4. B → A after restart (SIGKILL; gokrazy supervisor respawns) 5. A → B after restart (SIGKILL) Each post-rotation ping gets a 15-second budget. Two unavoidable multi-second waits dominate today: - The rotate-then-a→b phase takes ~10s on main because of LazyWG. After B's WantRunning bounce, B's wgengine resets its sentActivityAt/recvActivityAt maps and trims A out of the wireguard-go config as an "idle peer"; B only re-adds A on inbound activity, by which point A's first few TSMP packets have been silently dropped at B's tundev. The bradfitz/rm_lazy_wg branch removes that trimming entirely (verified locally: this phase drops to <100ms there). - The restart phases take ~5s for wireguard-go's RekeyTimeout handshake retry. After SIGKILL+respawn the first WG handshake init from the restarted node sometimes goes into the void (likely the brief peer-removed window in the receiver's two-step maybeReconfigWireguardLocked reconfig during which the peer is absent from wireguard-go), and wg-go's 5s+jitter retransmit timer is the next opportunity to retry. That retry succeeds and the staged TSMP packet flushes. Intrinsic to the protocol's retransmit policy. Once LazyWG is removed and the first-handshake-after-reconfig race is fixed, the budget should drop to 5s. 
Supporting changes: ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and back on after rotating the disco key. magicsock.Conn.RotateDiscoKey only resets local disco state; without also dropping wireguard-go session keys, peers keep encrypting with their stale per-peer session against us until their rekey timer fires (WireGuard has no data-plane signaling to invalidate sessions). Bouncing WantRunning runs the engine through Reconfig(empty) → authReconfig, which drops every peer's WG session so the next packet either way triggers a fresh handshake. ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys" LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns a map[NodePublic]DiscoPublic from the current netmap. Tests reach it via [local.Client.DebugResultJSON]. We do not surface disco keys via [ipnstate.PeerStatus] because adding a non-comparable [key.DiscoPublic] field there breaks reflect-based test helpers (e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and general LocalAPI clients have no need for disco keys. Since the debug LocalAPI is gated behind the ts_omit_debug build tag, this endpoint is automatically stripped from small binaries. cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk) to drive the SIGKILL phase. On gokrazy the supervisor respawns tailscaled within a second. tstest/integration/testcontrol: add Server.AllOnline. When set, every peer entry in MapResponses is marked Online=true. Several disco-key handling fast paths in controlclient and wgengine (removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFullNetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire for online peers; without this flag, tests exercising disco-key rotation only hit the offline-peer code paths, which mask issues and are several seconds slower in this scenario. Finer-grained per-node online tracking can be added later. 
tstest/natlab/vmtest: add Env.RotateDiscoKey, Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an [AllOnline] EnvOption that plumbs through to testcontrol.Server.AllOnline, and an exported Env.Ping(from, to, type, timeout). Ping replaces the unexported helper so callers can specify both a ping type (PingDisco for warming peer state, PingTSMP for asserting end-to-end connectivity) and a deadline. PeerDiscoKey returns its LocalAPI error so callers inside tstest.WaitFor can retry transient failures rather than fataling the test. Updates #12639 Updates #13038 Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 12:58:00 -07:00
Brad Fitzpatrick	02ffe5baa8	tstest/natlab/vmtest: add macOS VM snapshot caching for fast test starts Cache a pre-booted macOS VM snapshot on disk so subsequent test runs restore from the snapshot instead of cold-booting. The snapshot is keyed by the Tart base image digest and a code version constant (macOSSnapshotCodeVersion); bumping either invalidates the cache. Snapshot preparation (one-time): - Boot the Tart base image with a NAT NIC (--nat-nic flag) - Wait for SSH, compile and install cmd/tta as a LaunchDaemon - TTA polls the host via AF_VSOCK for an IP assignment; during prep the host replies "wait" - Disconnect NIC, save VM state via SIGINT Test fast path (cached, ~7s to agent connected): - APFS clone the snapshot, write test-specific config.json - Launch Host.app with --disconnected-nic --attach-network --assign-ip - VZ restores from SaveFile.vzvmsave (~5s with 4GB RAM) - TTA's vsock poll gets the IP config, sets static IP via ifconfig (bypasses DHCP entirely), switches driver addr to the IP directly (bypasses DNS), and resets the dial context so the reverse-dial reconnects immediately - TTA agent connects to test driver within ~2s of IP assignment Key optimizations: - 4GB RAM instead of 8GB: halves SaveFile.vzvmsave (1.4GB vs 2.4GB), halves restore time (5.5s vs 11s) - AF_VSOCK IP assignment: bypasses macOS DHCP (~5-7s saved) - Direct IP dial: bypasses DNS resolution for test-driver.tailscale - Dial context reset: cancels stale in-flight dials from snapshot - Kill instead of SIGINT for test VM cleanup (no state save needed) - Parallel VM launches Also: - Add TestDriverIPv4/TestDriverPort constants to vnet - Add --nat-nic and --assign-ip flags to Host.app - Fix SIGINT handler: retain DispatchSource globally, use dispatchMain() - Add vsock listener (port 51011) to Host.app for IP config protocol - Add disconnectNetwork() to VMController for clean snapshot state - Fix Makefile: set -o pipefail so xcodebuild failures aren't swallowed Updates #13038 Change-Id: Icbab73b57af7df3ae96136fb49cda2536310f31b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 08:17:13 -07:00
Brad Fitzpatrick	ec7b11d986	tstest/natlab/vmtest, cmd/tta: add TestTaildrop Add a vmtest that brings up two Ubuntu nodes, each behind its own EasyNAT, joined to the tailnet. The sender pushes a small file via "tailscale file cp" and the receiver fetches it via "tailscale file get --wait", asserting that the filename and contents round-trip unchanged. To make Taildrop work in vmtest, three small pieces were needed: The Linux/FreeBSD cloud-init now starts tailscaled with --statedir as well as --state=mem:, so the daemon has a VarRoot to host Taildrop's incoming-files directory. State itself remains in-memory (so nothing persists across reboots); only the var-root scratch space is on disk. vmtest.New grows a variadic EnvOption parameter and a SameTailnetUser helper. When the option is passed, Start sets AllNodesSameUser=true on the embedded testcontrol.Server. Cross-node Taildrop requires the sender and receiver to share a Tailnet user (or have an explicit PeerCapabilityFileSharingTarget granted between them, which we don't plumb here), so TestTaildrop opts in. Existing tests don't. cmd/tta gains /taildrop-send and /taildrop-recv handlers that wrap "tailscale file cp" and "tailscale file get --wait", plus Env.SendTaildropFile and Env.RecvTaildropFile helpers in vmtest that drive them. Updates #13038 Change-Id: I8f5f70f88106e6e2ee07780dd46fe00f8efcfdf1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 12:27:55 -07:00
Brad Fitzpatrick	4b8e0ede6d	tstest/natlab/{vmtest,vnet}, cmd/tta: add TestMullvadExitNode Add a vmtest that brings up a Tailscale client, an Ubuntu VM acting as a Mullvad-style plain-WireGuard exit node, and a non-Tailscale webserver, each on its own NAT'd vnet network with a distinct WAN IP. The test exercises Tailscale's IsWireGuardOnly peer code path: the way the control plane wires Mullvad exit nodes into a client's netmap, including the per-client SelfNodeV4MasqAddrForThisPeer source-IP rewrite that lets a Tailscale CGNAT IP egress through a plain-WireGuard tunnel that has no idea what Tailscale is. The mullvad VM doesn't run wireguard-tools or kernel WireGuard; instead, a new TTA endpoint /wg-server-up creates a real Linux TUN named wg0, drives it with wireguard-go (already vendored), and configures the kernel side (ip addr/up, ip_forward, iptables NAT MASQUERADE) so decrypted traffic from the peer egresses with the mullvad VM's WAN IP. Userspace vs kernel WireGuard makes no difference on the wire — what's being tested is Tailscale's plain-WireGuard exit-node code path, not the kernel module — and this lets the test avoid downloading and installing .deb packages inside the VM. Adds Env.BringUpMullvadWGServer (calls /wg-server-up, returns the generated WG public key as a key.NodePublic), Env.SetExitNodeIP (EditPrefs ExitNodeIP directly, for exit nodes whose IPs aren't discoverable via TTA), Env.ControlServer (exposes the underlying testcontrol.Server so tests can UpdateNode / SetMasqueradeAddresses to inject custom peers), and Env.Status (fetches a node's tailscale status, used to read the client's pubkey so we can pin it as the WG server's only allowed peer). The test verifies that the webserver's echoed source IP is the client's WAN with no exit node selected, the mullvad VM's WAN with the WG-only peer selected as exit, and the client's WAN again after clearing. 
Updates #13038 Change-Id: I5bac4e0d832f05929f12cb77fa9946d7f5fb5ef1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 11:31:48 -07:00
Brad Fitzpatrick	5c1738fd56	tstest/natlab/{vmtest,vnet}, cmd/tta: add TestExitNode Add a vmtest TestExitNode that brings up a client, two exit nodes, and a non-Tailscale webserver, each on its own NAT'd vnet network with a distinct WAN IP. The test cycles the client's exit node setting between off, exit1, and exit2 and asserts that the webserver echoes the expected post-NAT source IP for each. Three pieces were needed to make this work: vnet now forwards TCP between simulated networks at the packet level, mirroring the existing UDP path. When a guest VM sends TCP to another simulated network's WAN IP, the source network's gateway rewrites src via doNATOut and routeTCPPacket hands the packet off to the destination network, which rewrites dst via doNATIn and writes the rewritten frame onto the destination LAN. The TCP stacks of the two guest VM kernels talk end-to-end; vnet just NATs the IP/port headers in flight, so all TCP semantics (handshakes, options, sequence numbers, payload) are preserved without a gvisor TCP termination in the middle. Adds a focused TestInterNetworkTCP that exercises this path without any Tailscale machinery. cmd/tta binds its outbound dial to the default route's interface using SO_BINDTODEVICE. Without that, the moment tailscaled installs 0.0.0.0/0 → tailscale0 in response to setting an exit node, TTA's existing TCP connection to test-driver gets rerouted through the exit node. From the test driver's perspective the connection's packets then arrive with the exit node's WAN IP as the source rather than the client's, so they don't match the existing flow and the connection is dead — manifesting in the test as a hang on EditPrefs (which had actually completed in milliseconds on the daemon side, but whose response never made it back). Pinning the socket to the underlying NIC keeps TTA's agent connection on a real interface regardless of any policy routing tailscaled installs later. 
We bind rather than carry the Tailscale bypass fwmark because the fwmark approach is conditional on tailscaled having configured SO_MARK-based policy routing, while binding is unconditional. vmtest grows an Env.SetExitNode helper that sets ExitNodeIP via EditPrefs through the agent, used by the new test. Updates #13038 Change-Id: I9fc8f91848b7aa2297ef3eaf71fed9d96056a024 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-27 16:54:20 -07:00
Brad Fitzpatrick	f289f7e77c	tstest/natlab/vmtest,cmd/tta: add TestSiteToSite Verifies that site-to-site Tailscale subnet routing with --snat-subnet-routes=false preserves the original source IP end-to-end. Topology: two sites, each with a Linux subnet router on a NATted WAN plus an internal LAN, and a non-Tailscale backend on each LAN. Backends are given static routes pointing to their local subnet router for the remote site's prefix; an HTTP GET from backend-a to backend-b over Tailscale returns a body containing backend-a's LAN IP. Adds the supporting vmtest.SNATSubnetRoutes NodeOption and plumbs snat-subnet-routes through TTA's /up handler. The webserver started by vmtest.WebServer now also echoes the remote IP, for the preservation assertion. Adds a /add-route TTA endpoint (Linux-only for now) and a vmtest Env.AddRoute helper so the test can install the backend static routes through TTA rather than needing a host SSH key and debug NIC. ensureGokrazy now always rebuilds the natlab qcow2 (once per test process, via sync.Once) so the test picks up the new TTA and webserver behavior. This is pulled out of a larger pending change that adds FreeBSD site-to-site subnet routing support; figured we should have at least the Linux test covering what works today. Updates #5573 Change-Id: I881c55b0f118ac9094546b5fbe68dddf179bb042 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-22 12:11:30 -07:00
Brad Fitzpatrick	ec0b23a21f	vmtest: add VM-based integration test framework Add tstest/natlab/vmtest, a high-level framework for running multi-VM integration tests with mixed OS types (gokrazy + Ubuntu/Debian cloud images) connected via natlab's vnet virtual network. The vmtest package provides: - Env type that orchestrates vnet, QEMU processes, and agent connections - OS image support (Gokrazy, Ubuntu2404, Debian12) with download/cache - QEMU launch per OS type (microvm for gokrazy, q35+KVM for cloud) - Cloud-init seed ISO generation with network-config for multi-NIC - Cross-compilation of test binaries for cloud VMs - Debug SSH NIC on cloud VMs for interactive debugging - Test helpers: ApproveRoutes, HTTPGet, TailscalePing, DumpStatus, WaitForPeerRoute, SSHExec TTA enhancements (cmd/tta): - Parameterize /up (accept-routes, advertise-routes, snat-subnet-routes) - Add /set, /start-webserver, /http-get endpoints - /http-get uses local.Client.UserDial for Tailscale-routed requests - Fix /ping for non-gokrazy systems TestSubnetRouter exercises a 3-VM subnet router scenario: client (gokrazy) → subnet-router (Ubuntu, dual-NIC) → backend (gokrazy) Verifies HTTP access to the backend webserver through the Tailscale subnet route. Passes in ~30 seconds. Updates tailscale/tailscale#13038 Change-Id: I165b64af241d37f5f5870e796a52502fc56146fa Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-08 17:24:18 -07:00
Naman Sood	d6b626f5bb	tstest: add test for connectivity to off-tailnet CGNAT endpoints This test is currently known-broken, but work is underway to fix it. tailscale/corp#36270 tracks this work. Updates tailscale/corp#36270 Fixes tailscale/corp#36272 Signed-off-by: Naman Sood <mail@nsood.in>	2026-04-02 14:44:40 -04:00
Brad Fitzpatrick	bd2a2d53d3	all: use Go 1.26 things, run most gofix modernizers I omitted a lot of the min/max modernizers because they didn't result in more clear code. Some of it's older "for x := range 123". Also: errors.AsType, any, fmt.Appendf, etc. Updates #18682 Change-Id: I83a451577f33877f962766a5b65ce86f7696471c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-03-06 13:32:03 -08:00
Brad Fitzpatrick	2a64c03c95	types/ptr: deprecate ptr.To, use Go 1.26 new Updates #18682 Change-Id: I62f6aa0de2a15ef8c1435032c6aa74a181c25f8f Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-03-05 20:13:18 -08:00
Will Norris	3ec5be3f51	all: remove AUTHORS file and references to it This file was never truly necessary and has never actually been used in the history of Tailscale's open source releases. A Brief History of AUTHORS files --- The AUTHORS file was a pattern developed at Google, originally for Chromium, then adopted by Go and a bunch of other projects. The problem was that Chromium originally had a copyright line only recognizing Google as the copyright holder. Because Google (and most open source projects) do not require copyright assignment for contributions, each contributor maintains their copyright. Some large corporate contributors then tried to add their own name to the copyright line in the LICENSE file or in file headers. This quickly becomes unwieldy, and puts a tremendous burden on anyone building on top of Chromium, since the license requires that they keep all copyright lines intact. The compromise was to create an AUTHORS file that would list all of the copyright holders. The LICENSE file and source file headers would then include that list by reference, listing the copyright holder as "The Chromium Authors". This also became cumbersome to simply keep the file up to date with a high rate of new contributors. Plus it's not always obvious who the copyright holder is. Sometimes it is the individual making the contribution, but many times it may be their employer. There is no way for the project maintainer to know. Eventually, Google changed their policy to no longer recommend trying to keep the AUTHORS file up to date proactively, and instead to only add to it when requested: https://opensource.google/docs/releasing/authors. They are also clear that: > Adding contributors to the AUTHORS file is entirely within the > project's discretion and has no implications for copyright ownership. It was primarily added to appease a small number of large contributors that insisted that they be recognized as copyright holders (which was entirely their right to do). 
But it's not truly necessary, and not even the most accurate way of identifying contributors and/or copyright holders. In practice, we've never added anyone to our AUTHORS file. It only lists Tailscale, so it's not really serving any purpose. It also causes confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header in other open source repos which don't actually have an AUTHORS file, so it's ambiguous what that means. Instead, we just acknowledge that the contributors to Tailscale (whoever they are) are copyright holders for their individual contributions. We also have the benefit of using the DCO (developercertificate.org) which provides some additional certification of their right to make the contribution. The source file changes were purely mechanical with: git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g' Updates #cleanup Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d Signed-off-by: Will Norris <will@tailscale.com>	2026-01-23 15:49:45 -08:00
Brad Fitzpatrick	05ac21ebe4	all: use new LocalAPI client package location It was moved in f57fa3cbc30e. Updates tailscale/corp#22748 Change-Id: I19f965e6bded1d4c919310aa5b864f2de0cd6220 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2025-02-05 14:41:42 -08:00
Brad Fitzpatrick	2636a83d0e	cmd/tta: pull out test driver dialing into a type, fix bugs There were a few places it could get wedged (notably the dial without a timeout). And add a knob for verbose debug logs. And keep two idle connections always. Updates #13038 Change-Id: I952ad182d7111481d97a83c12aa2ff4bfdc55fe8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2024-08-26 15:36:30 -07:00
Brad Fitzpatrick	b78df4d48a	tstest/natlab/vnet: add start of IPv6 support Updates #13038 Change-Id: Ic3d095f167daf6c7129463e881b18f2e0d5693f5 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2024-08-24 18:02:38 -07:00
Brad Fitzpatrick	3904e4d175	cmd/tta, tstest/natlab/vnet: remove unneeded port 124 log hack, add log buffer The natlab Test Agent (tta) still had its old log streaming hack in place where it dialed out to anything on TCP port 124 and those logs were streamed to the host running the tests. But we'd since added gokrazy syslog streaming support, which made that redundant. So remove all the port 124 stuff. And then make sure we log to stderr so gokrazy logs it to syslog. Also, keep the first 1MB of logs in memory in tta too, exported via localhost:8034/logs for interactive debugging. That was very useful during debugging when I added IPv6 support. (which is coming in future PRs) Updates #13038 Change-Id: Ieed904a704410b9031d5fd5f014a73412348fa7f Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2024-08-23 12:10:19 -07:00
Brad Fitzpatrick	a61825c7b8	cmd/tta, vnet: add host firewall, env var support, more tests In particular, tests showing that #3824 works. But that test doesn't actually work yet; it only gets a DERP connection. (why?) Updates #13038 Change-Id: Ie1fd1b6a38d4e90fae7e72a0b9a142a95f0b2e8f Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2024-08-12 15:32:12 -07:00
Maisem Ali	d0e8375b53	cmd/{tta,vnet}: proxy to gokrazy UI Updates #13038 Change-Id: I1cacb1b0f8c3d0e4c36b7890155f7b1ad0d23575 Signed-off-by: Maisem Ali <maisem@tailscale.com>	2024-08-09 09:06:54 -07:00
Brad Fitzpatrick	f47a5fe52b	vnet: reduce some log spam Updates #13038 Change-Id: I76038a90dfde10a82063988a5b54190074d4b5c5 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2024-08-09 09:06:54 -07:00
Maisem Ali	f8d23b3582	tstest/integration/nat: stream daemon logs directly Updates #13038 Signed-off-by: Maisem Ali <maisem@tailscale.com> Change-Id: I5da5706149c082c27d74c8b894bf53dd9b259e84	2024-08-09 09:06:54 -07:00
Maisem Ali	12764e9db4	natlab: add NodeAgentClient This adds a new NodeAgentClient type that can be used to invoke the LocalAPI using the LocalClient instead of handcrafted URLs. However, there are certain cases where it does make sense for the node agent to provide more functionality than what's possible with just the LocalClient, as such it also exposes an http.Client to make requests directly. Signed-off-by: Maisem Ali <maisem@tailscale.com>	2024-08-09 09:06:54 -07:00
Brad Fitzpatrick	1016aa045f	hostinfo: add hostinfo.IsNATLabGuestVM And don't make guests under vnet/natlab upload to logcatcher, as there won't be a valid cert anyway. Updates #13038 Change-Id: Ie1ce0139788036b8ecc1804549a9b5d326c5fef5 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2024-08-09 09:06:54 -07:00
Brad Fitzpatrick	8594292aa4	vnet: add control/derps to test, stateful firewall Updates #13038 Change-Id: Icd65b34c5f03498b5a7109785bb44692bce8911a Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2024-08-09 09:06:54 -07:00
Brad Fitzpatrick	1ed958fe23	tstest/natlab/vnet: add start of virtual network-based NAT Lab Updates #13038 Change-Id: I3c74120d73149c1329288621f6474bbbcaa7e1a6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2024-08-07 09:37:15 -07:00

23 Commits