tailscale

mirror of https://github.com/tailscale/tailscale.git synced 2026-05-06 04:36:15 +02:00

Author	SHA1	Message	Date
Brad Fitzpatrick	9f343fdc0c	client/local, ipn/localapi, all: add CertDomains and DNSConfig accessors Add two narrow LocalAPI accessors so callers don't have to subscribe to the IPN bus and pull a full *netmap.NetworkMap just to read DNS-shaped fields: - GET /localapi/v0/cert-domains returns DNS.CertDomains. - GET /localapi/v0/dns-config returns the full tailcfg.DNSConfig. Migrate in-tree callers off the netmap-on-the-bus pattern: - kube/certs.waitForCertDomain still wakes on the IPN bus but now queries CertDomains via LocalClient.CertDomains rather than reading n.NetMap.DNS.CertDomains. The kube LocalClient interface and FakeLocalClient gain a CertDomains method. - cmd/tailscale dns status calls LocalClient.DNSConfig directly instead of opening a NotifyInitialNetMap watcher. - cmd/tailscale configure kubeconfig switches from a netmap watcher + serviceDNSRecordFromNetMap to LocalClient.DNSConfig + serviceDNSRecordFromDNSConfig. This is part of a series moving callers away from depending on the netmap traveling on the IPN bus, so the bus payload can shrink in a later change. Updates #12542 Change-Id: Ie10204e141d085fbac183b4cfe497226b670ad6c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 13:50:46 -07:00
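A minimal sketch of what a narrow accessor such as cert-domains might look like on the server side. It assumes a simplified dnsConfig type in place of tailcfg.DNSConfig. The names dnsConfig, certDomainsJSON, and certDomainsHandler are hypothetical. The commit also does not say how the real LocalAPI handler encodes an empty list (for example, [] or null).

```go
package main

import (
	"encoding/json"
	"net/http"
)

// dnsConfig is a hypothetical, simplified stand-in for tailcfg.DNSConfig;
// only the field this sketch serves is included.
type dnsConfig struct {
	CertDomains []string
}

// certDomainsJSON encodes just the CertDomains slice. This sketch chooses to
// encode a nil slice as [] rather than null.
func certDomainsJSON(cfg dnsConfig) ([]byte, error) {
	domains := cfg.CertDomains
	if domains == nil {
		domains = []string{}
	}
	return json.Marshal(domains)
}

// certDomainsHandler serves only the cert domains, so callers need not pull
// a full netmap. get is an assumed accessor for the current DNS config.
func certDomainsHandler(get func() dnsConfig) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "GET required", http.StatusMethodNotAllowed)
			return
		}
		b, err := certDomainsJSON(get())
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(b)
	}
}
```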
Brad Fitzpatrick	92179b1fc7	cmd/hello: split server into helloserver package Move the template, request handler, and HTTP/HTTPS server wiring out of package main and into a new cmd/hello/helloserver package so the server can be embedded in other binaries. The main package now only constructs a helloserver.Server with the production addresses and calls Run. While here, drop the -http, -https, and -test-ip flags along with the dev-mode template and fake-data fallbacks they enabled; the binary is only run in production. Updates tailscale/corp#32398 Change-Id: Id1d38b981733334cafc596021130f36e1c1eed67 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 08:40:55 -07:00
David Bond	644c3224e9	cmd/{containerboot,k8s-operator}: don't return pointers to maps (#19593) This commit modifies the usage of the `egressservices.Configs` type within containerboot and the k8s operator. Originally it was being passed around as a pointer, which is not required because maps are already pointers under the hood. Signed-off-by: David Bond <davidsbond93@gmail.com>	2026-04-30 16:11:00 +01:00
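A small sketch shows why a pointer to a map is usually unnecessary. Here configs is a hypothetical stand-in for egressservices.Configs, which the commit implies is a map type.

```go
package main

// configs is a hypothetical stand-in for egressservices.Configs.
type configs map[string]string

// addConfig mutates the caller's map even though cfgs is passed by value:
// a map value refers to shared underlying storage.
func addConfig(cfgs configs, name, target string) {
	cfgs[name] = target
}

// loadConfigs returns the map directly; returning *configs would add a
// layer of indirection without letting callers do anything new.
func loadConfigs() configs {
	return configs{"svc-a": "100.64.0.1"}
}
```

One caveat for this sketch is that writing to a nil map panics. A callee that must allocate or replace the map itself therefore still needs to return the new map (or take a pointer).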
Brad Fitzpatrick	815bb291c9	cmd/tailscale/cli: allow tag without "tag:" prefix in 'tailscale up' If a user passes --advertise-tags=foo,bar (with no colons in any segment), automatically prepend "tag:" client-side so it goes on the wire as "tag:foo,tag:bar". Segments that already contain a colon are left untouched and must be fully-qualified ("tag:foo"), which keeps the door open for future colon-bearing syntax. This was originally added in cd07437ad (2020-10-28) and then reverted in 1be01ddc6 (2020-11-10) over forward-compatibility concerns. But on 2026-04-29 it was realized that this was always safe for future extensibility (tags can't contain colons: tag:foo:bar is already invalid per the 2020 CheckTag restrictions). So if we ever wanted to support some hypothetical --advertise-tags=tagset:setfoo or "group:foo", we'd still have room in the syntax for it, as it can't conflict with tag:group:foo. Avery signed off on this on Slack: "Ok, I withdraw my objection to auto-qualifying tag names in advertise-tags and I hope I won't regret it :)" Updates #861 Change-Id: I06935b0d3ae909894c95c9c2e185b7d6a219ff32 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 07:13:48 -07:00
Brad Fitzpatrick	15cba0a3f6	tstest/natlab/vmtest: add TestDiscoKeyChange Add a vmtest that brings up two gokrazy nodes A and B behind two One2OneNAT networks (so direct UDP works in both directions and any slowness can't be blamed on NAT traversal), establishes a WireGuard tunnel A → B with TSMP, then rotates B's disco key four times and asserts that the data plane recovers in both directions after each rotation. All pings are TSMP (the data-plane ping; disco pings would not exercise the WireGuard tunnel itself). The five pings: 1. A → B (initial; brings up the tunnel; 30s budget) 2. B → A after rotate (LocalAPI rotate-disco-key debug action) 3. A → B after rotate (LocalAPI) 4. B → A after restart (SIGKILL; gokrazy supervisor respawns) 5. A → B after restart (SIGKILL) Each post-rotation ping gets a 15-second budget. Two unavoidable multi-second waits dominate today: - The rotate-then-a→b phase takes ~10s on main because of LazyWG. After B's WantRunning bounce, B's wgengine resets its sentActivityAt/recvActivityAt maps and trims A out of the wireguard-go config as an "idle peer"; B only re-adds A on inbound activity, by which point A's first few TSMP packets have been silently dropped at B's tundev. The bradfitz/rm_lazy_wg branch removes that trimming entirely (verified locally: this phase drops to <100ms there). - The restart phases take ~5s for wireguard-go's RekeyTimeout handshake retry. After SIGKILL+respawn the first WG handshake init from the restarted node sometimes goes into the void (likely the brief peer-removed window in the receiver's two-step maybeReconfigWireguardLocked reconfig during which the peer is absent from wireguard-go), and wg-go's 5s+jitter retransmit timer is the next opportunity to retry. That retry succeeds and the staged TSMP packet flushes. Intrinsic to the protocol's retransmit policy. Once LazyWG is removed and the first-handshake-after-reconfig race is fixed, the budget should drop to 5s. 
Supporting changes: ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and back on after rotating the disco key. magicsock.Conn.RotateDiscoKey only resets local disco state; without also dropping wireguard-go session keys, peers keep encrypting with their stale per-peer session against us until their rekey timer fires (WireGuard has no data-plane signaling to invalidate sessions). Bouncing WantRunning runs the engine through Reconfig(empty) → authReconfig, which drops every peer's WG session so the next packet either way triggers a fresh handshake. ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys" LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns a map[NodePublic]DiscoPublic from the current netmap. Tests reach it via [local.Client.DebugResultJSON]. We do not surface disco keys via [ipnstate.PeerStatus] because adding a non-comparable [key.DiscoPublic] field there breaks reflect-based test helpers (e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and general LocalAPI clients have no need for disco keys. Since the debug LocalAPI is gated behind the ts_omit_debug build tag, this endpoint is automatically stripped from small binaries. cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk) to drive the SIGKILL phase. On gokrazy the supervisor respawns tailscaled within a second. tstest/integration/testcontrol: add Server.AllOnline. When set, every peer entry in MapResponses is marked Online=true. Several disco-key handling fast paths in controlclient and wgengine (removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFull NetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire for online peers; without this flag, tests exercising disco-key rotation only hit the offline-peer code paths, which mask issues and are several seconds slower in this scenario. Finer-grained per-node online tracking can be added later. 
tstest/natlab/vmtest: add Env.RotateDiscoKey, Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an [AllOnline] EnvOption that plumbs through to testcontrol.Server.AllOnline, and an exported Env.Ping(from, to, type, timeout). Ping replaces the unexported helper so callers can specify both a ping type (PingDisco for warming peer state, PingTSMP for asserting end-to-end connectivity) and a deadline. PeerDiscoKey returns its LocalAPI error so callers inside tstest.WaitFor can retry transient failures rather than fataling the test. Updates #12639 Updates #13038 Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 12:58:00 -07:00
Alex Valiushko	01d0bdd253	cmd/derper,derp: add metrics for rate limit hits (#19560) Expvars track the count of rate limiters exceeding their threshold. Covers (1) the global rate limiter and (2) the total of local rate limiters. Also publish optional rate-limit metrics during the ExpVar() call if -rate-config is specified. Fixes current rate-limit metrics being published outside of "derp" in /debug/vars. Updates tailscale/corp#38509 Change-Id: Ic7f5a1e890d0d7d3d7b679daa4b5f8926a6a6964 Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>	2026-04-29 10:29:09 -07:00
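A rough sketch of grouping rate-limit hit counters under a single expvar map so they appear together in /debug/vars. The "derp" map key follows the commit's description. The counter names and the noteRateLimited helper are hypothetical.

```go
package main

import "expvar"

// derpVars groups DERP metrics under one "derp" key in /debug/vars.
var derpVars = expvar.NewMap("derp")

var (
	globalRateLimitHits expvar.Int // times the global limiter's threshold was exceeded
	localRateLimitHits  expvar.Int // total across all per-client limiters
)

func init() {
	derpVars.Set("rate_limit_global_hits", &globalRateLimitHits)
	derpVars.Set("rate_limit_local_hits", &localRateLimitHits)
}

// noteRateLimited records one limiter hit, global or per-client.
func noteRateLimited(global bool) {
	if global {
		globalRateLimitHits.Add(1)
	} else {
		localRateLimitHits.Add(1)
	}
}
```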
Brad Fitzpatrick	02ffe5baa8	tstest/natlab/vmtest: add macOS VM snapshot caching for fast test starts Cache a pre-booted macOS VM snapshot on disk so subsequent test runs restore from the snapshot instead of cold-booting. The snapshot is keyed by the Tart base image digest and a code version constant (macOSSnapshotCodeVersion); bumping either invalidates the cache. Snapshot preparation (one-time): - Boot the Tart base image with a NAT NIC (--nat-nic flag) - Wait for SSH, compile and install cmd/tta as a LaunchDaemon - TTA polls the host via AF_VSOCK for an IP assignment; during prep the host replies "wait" - Disconnect NIC, save VM state via SIGINT Test fast path (cached, ~7s to agent connected): - APFS clone the snapshot, write test-specific config.json - Launch Host.app with --disconnected-nic --attach-network --assign-ip - VZ restores from SaveFile.vzvmsave (~5s with 4GB RAM) - TTA's vsock poll gets the IP config, sets static IP via ifconfig (bypasses DHCP entirely), switches driver addr to the IP directly (bypasses DNS), and resets the dial context so the reverse-dial reconnects immediately - TTA agent connects to test driver within ~2s of IP assignment Key optimizations: - 4GB RAM instead of 8GB: nearly halves SaveFile.vzvmsave (1.4GB vs 2.4GB), halves restore time (5.5s vs 11s) - AF_VSOCK IP assignment: bypasses macOS DHCP (~5-7s saved) - Direct IP dial: bypasses DNS resolution for test-driver.tailscale - Dial context reset: cancels stale in-flight dials from snapshot - Kill instead of SIGINT for test VM cleanup (no state save needed) - Parallel VM launches Also: - Add TestDriverIPv4/TestDriverPort constants to vnet - Add --nat-nic and --assign-ip flags to Host.app - Fix SIGINT handler: retain DispatchSource globally, use dispatchMain() - Add vsock listener (port 51011) to Host.app for IP config protocol - Add disconnectNetwork() to VMController for clean snapshot state - Fix Makefile: set -o pipefail so xcodebuild failures aren't swallowed Updates #13038 Change-Id: Icbab73b57af7df3ae96136fb49cda2536310f31b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 08:17:13 -07:00
David Bond	a29e42135b	cmd/k8s-operator: add nodeSelector to `DNSConfig` resource (#19429) This commit modifies the `DNSConfig` resource to allow customisation of the `spec.nodeSelector` field in the nameserver pods. Closes: https://github.com/tailscale/tailscale/issues/19419 Signed-off-by: David Bond <davidsbond93@gmail.com>	2026-04-29 15:56:33 +01:00
Noel O'Brien	40088602c9	cmd/hello: remove hello.ipn.dev (#19567) Fixes #19566 Signed-off-by: Noel O'Brien <noel@tailscale.com>	2026-04-28 17:54:29 -07:00
Brad Fitzpatrick	ec7b11d986	tstest/natlab/vmtest, cmd/tta: add TestTaildrop Add a vmtest that brings up two Ubuntu nodes, each behind its own EasyNAT, joined to the tailnet. The sender pushes a small file via "tailscale file cp" and the receiver fetches it via "tailscale file get --wait", asserting that the filename and contents round-trip unchanged. To make Taildrop work in vmtest, three small pieces were needed: The Linux/FreeBSD cloud-init now starts tailscaled with --statedir as well as --state=mem:, so the daemon has a VarRoot to host Taildrop's incoming-files directory. State itself remains in-memory (so nothing persists across reboots); only the var-root scratch space is on disk. vmtest.New grows a variadic EnvOption parameter and a SameTailnetUser helper. When the option is passed, Start sets AllNodesSameUser=true on the embedded testcontrol.Server. Cross-node Taildrop requires the sender and receiver to share a Tailnet user (or have an explicit PeerCapabilityFileSharingTarget granted between them, which we don't plumb here), so TestTaildrop opts in. Existing tests don't. cmd/tta gains /taildrop-send and /taildrop-recv handlers that wrap "tailscale file cp" and "tailscale file get --wait", plus Env.SendTaildropFile and Env.RecvTaildropFile helpers in vmtest that drive them. Updates #13038 Change-Id: I8f5f70f88106e6e2ee07780dd46fe00f8efcfdf1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 12:27:55 -07:00
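The variadic-option plumbing described above follows Go's common functional-options pattern. In this minimal sketch, envConfig, newEnv, and controlServer are hypothetical stand-ins. Only the SameTailnetUser and AllNodesSameUser names come from the commit, and modeling SameTailnetUser as a function that returns an option is an assumption.

```go
package main

// envConfig collects settings that options may change (hypothetical).
type envConfig struct {
	sameTailnetUser bool
}

// EnvOption mutates an envConfig before the environment starts.
type EnvOption func(*envConfig)

// SameTailnetUser requests that all nodes share one tailnet user.
func SameTailnetUser() EnvOption {
	return func(c *envConfig) { c.sameTailnetUser = true }
}

// controlServer stands in for testcontrol.Server.
type controlServer struct {
	AllNodesSameUser bool
}

// newEnv applies options in order. Callers that pass no options keep the
// default (false), so existing tests are unaffected.
func newEnv(opts ...EnvOption) *controlServer {
	var cfg envConfig
	for _, o := range opts {
		o(&cfg)
	}
	return &controlServer{AllNodesSameUser: cfg.sameTailnetUser}
}
```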
Brad Fitzpatrick	4b8e0ede6d	tstest/natlab/{vmtest,vnet}, cmd/tta: add TestMullvadExitNode Add a vmtest that brings up a Tailscale client, an Ubuntu VM acting as a Mullvad-style plain-WireGuard exit node, and a non-Tailscale webserver, each on its own NAT'd vnet network with a distinct WAN IP. The test exercises Tailscale's IsWireGuardOnly peer code path: the way the control plane wires Mullvad exit nodes into a client's netmap, including the per-client SelfNodeV4MasqAddrForThisPeer source-IP rewrite that lets a Tailscale CGNAT IP egress through a plain-WireGuard tunnel that has no idea what Tailscale is. The mullvad VM doesn't run wireguard-tools or kernel WireGuard; instead, a new TTA endpoint /wg-server-up creates a real Linux TUN named wg0, drives it with wireguard-go (already vendored), and configures the kernel side (ip addr/up, ip_forward, iptables NAT MASQUERADE) so decrypted traffic from the peer egresses with the mullvad VM's WAN IP. Userspace vs kernel WireGuard makes no difference on the wire — what's being tested is Tailscale's plain-WireGuard exit-node code path, not the kernel module — and this lets the test avoid downloading and installing .deb packages inside the VM. Adds Env.BringUpMullvadWGServer (calls /wg-server-up, returns the generated WG public key as a key.NodePublic), Env.SetExitNodeIP (EditPrefs ExitNodeIP directly, for exit nodes whose IPs aren't discoverable via TTA), Env.ControlServer (exposes the underlying testcontrol.Server so tests can UpdateNode / SetMasqueradeAddresses to inject custom peers), and Env.Status (fetches a node's tailscale status, used to read the client's pubkey so we can pin it as the WG server's only allowed peer). The test verifies that the webserver's echoed source IP is the client's WAN with no exit node selected, the mullvad VM's WAN with the WG-only peer selected as exit, and the client's WAN again after clearing. 
Updates #13038 Change-Id: I5bac4e0d832f05929f12cb77fa9946d7f5fb5ef1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 11:31:48 -07:00
Brad Fitzpatrick	f7f8b0a0a5	cmd/tailscale/cli: drive "file cp" progress and offline warning from peerAPI The Online bit in PeerStatus comes from control's last-known state and can lag reality, so gating "tailscale file cp" on it is both unreliable and pushes correctness onto the server. Just try the push directly. In runCp, when the target's PeerStatus says it's offline, no longer fail upfront; getTargetStableID returns the StableID anyway. Replace the static "is offline" warning with a 3-second timer armed for the first file: if the timer fires before peerAPI bytes have flowed, we print a warning to stderr. The wording depends on whether control reported the peer offline ("is reportedly offline; trying anyway") or online ("is not replying; trying anyway"). The warning is printed with a leading vt100 clear-line and a trailing newline so it doesn't get painted over by the progress redraw and so the next progress redraw lands on a fresh line below it. Both the timer disarm and the progress display now read from tailscaled's OutgoingFile.Sent (subscribed via WatchIPNBus) instead of the local-body counter. That's the difference between bytes-acked-by-local-tailscaled (what countingReader.n was measuring; useless for detecting an unreachable peer because for small files net/http buffers the entire body into the unix-socket conn before the peerAPI dial has even started) and bytes-pulled-toward-peerAPI (what tailscaled is actually doing, reflected in OutgoingFile.Sent). The previous code reported 100% within milliseconds for a 3 KiB file even when the peer was unreachable. Add --update-interval (default 250ms) to control the progress repaint cadence; zero or negative disables the progress display entirely. The printer now also stops repainting once it observes Sent at full size with a near-zero rate for >2s, so a stuck transfer doesn't keep clobbering whatever the rest of runCp is trying to print.
Updates #18740 Change-Id: I189bd1c2cd8e094d372c4fee23114b1d2f8024b4 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 11:03:58 -07:00
Brad Fitzpatrick	88cb6f58f8	tool/updateflakes, cmd/nardump: replace update-flake.sh with Go tool Consolidate go.mod.sri and go.toolchain.rev.sri into a single flakehashes.json file at the repo root, owned by a new Go program at tool/updateflakes. The JSON is consumed by flake.nix via builtins.fromJSON and by any future Go code via the FlakeHashes struct that defines its schema. Each block records its input fingerprint alongside the SRI it produced: the goModSum (a sha256 over go.mod and go.sum) for the vendor block, and the literal rev string from go.toolchain.rev for the toolchain block. updateflakes regenerates a block only when its recorded fingerprint disagrees with the current input. Doing the gating by content rather than file mtimes avoids the usual mtime hazards across git checkouts, clones, and merges. It also means re-runs with no input changes are essentially free, and a re-run that touches only one input pays only for that one block. The two blocks have no shared state -- vendor invokes go mod vendor into one tempdir, toolchain fetches and extracts a tarball into another -- so they run concurrently via errgroup. Cold time is bounded by the slower of the two rather than their sum. Also takes the opportunity to fold the toolchain fetch into a single curl|tar pipeline (no intermediate .tar.gz on disk). Split cmd/nardump into a thin package main and a new package nardump library at cmd/nardump/nardump that holds the NAR encoder and SRI helper. tool/updateflakes imports the library directly rather than building and exec'ing the nardump binary at runtime. The library uses fs.ReadLink (Go 1.25+) instead of os.Readlink, so it no longer requires the caller to chdir into the FS root for symlink targets to resolve. WriteNAR now wraps its writer in a bufio.Writer internally (unless the caller already passed one) and flushes on return, so callers don't pay for tiny writes against slow underlying writers.
The cache-busting line in flake.nix and shell.nix is known to live at the end of the file, so updateCacheBust walks the lines in reverse. make tidy timings on this machine: before, ~14s every run; after, warm (no input changes): 0.05s; vendor block stale only: 1.4s; toolchain block stale only: 5.0s; cold (no flakehashes.json): 5.0s. Updates #6845 Change-Id: I0340608798f1614abf147a491bf7c68a198a0db4 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-28 10:18:32 -07:00
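Content-fingerprint gating, as described for the vendor block, can be sketched in three steps: hash the inputs, compare the hash against the recorded fingerprint, and run the expensive step only on a mismatch. vendorBlock, goModSum, and refreshVendor are hypothetical. The length-prefixed hashing is this sketch's choice and not necessarily how the real goModSum is computed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// vendorBlock pairs an SRI hash with the fingerprint of the inputs it was
// computed from. Field names are assumptions, not the FlakeHashes schema.
type vendorBlock struct {
	GoModSum string `json:"goModSum"`
	SRI      string `json:"sri"`
}

// goModSum fingerprints go.mod and go.sum. Length-prefixing each input keeps
// different splits of the same bytes from producing the same hash.
func goModSum(goMod, goSum []byte) string {
	h := sha256.New()
	for _, b := range [][]byte{goMod, goSum} {
		fmt.Fprintf(h, "%d:", len(b))
		h.Write(b)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// refreshVendor recomputes the block only when its fingerprint is stale.
// compute stands in for the expensive vendor-and-hash step. It reports
// whether recomputation happened.
func refreshVendor(b *vendorBlock, goMod, goSum []byte, compute func() (string, error)) (bool, error) {
	sum := goModSum(goMod, goSum)
	if b.GoModSum == sum && b.SRI != "" {
		return false, nil // inputs unchanged: nothing to do
	}
	sri, err := compute()
	if err != nil {
		return false, err
	}
	b.GoModSum, b.SRI = sum, sri
	return true, nil
}
```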
Daniel Pañeda	7735b15de3	cmd/k8s-operator: truncate long label values in metrics resources (#18895) * cmd/k8s-operator: truncate long label values in metrics resources Kubernetes label values have a 63-character limit, but resource names can be up to 253 characters. When a Service or Ingress with a long name is exposed via Tailscale, the operator fails to reconcile because it uses the parent resource name directly as label values on metrics Services. Truncate label values that may exceed the limit by keeping the first 54 characters and appending a SHA256-based hash suffix to preserve uniqueness. Fixes #18894 Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com> Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/k8s-operator: move TruncateLabelValue to shared k8s-operator package Move the label truncation helper to k8s-operator/utils.go so it can be reused by other components that need to produce valid Kubernetes labels. Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com> Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/k8s-operator: truncate long domain label values in cert resources Applies TruncateLabelValue to certResourceLabels in order to prevent API server validation failures. This covers both the HA Ingress and kube-apiserver proxy reconcilers, as both flow through certResourceLabels. Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/k8s-operator: remove empty metrics_resources_test.go, use hyphens in test names to satisfy go vet Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> --------- Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com> Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>	2026-04-28 14:11:59 +01:00
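A sketch of the truncation rule. Fifty-four kept characters plus a separator and 8 hex digits of SHA-256 exactly fill the 63-character limit. The separator and suffix length are assumptions, since the commit only specifies 54 characters and a SHA256-based suffix. truncateLabelValue is a hypothetical stand-in for TruncateLabelValue.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

const maxLabelValueLen = 63 // Kubernetes label value limit

// truncateLabelValue returns v unchanged if it fits. Otherwise it keeps the
// first 54 bytes and appends "-" plus the first 8 hex digits of SHA-256(v),
// so distinct long values that share a prefix stay distinct.
func truncateLabelValue(v string) string {
	if len(v) <= maxLabelValueLen {
		return v
	}
	sum := sha256.Sum256([]byte(v))
	return v[:54] + "-" + hex.EncodeToString(sum[:])[:8]
}
```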
Will Norris	2d85f37f39	client/systray: support several different color themes Currently we only have a dark theme icon with white and grey dots over a black background. For some desktops, a logo with black and grey dots over a white background might be preferable. And for desktops where the bar is almost black or white, but not quite, an option to render the logo with dots only and no background can look really nice. Add a new -theme flag to the systray command with the default staying the same as it is today. Updates #18303 Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d Signed-off-by: Will Norris <will@tailscale.com>	2026-04-27 18:54:14 -07:00
Brad Fitzpatrick	5c1738fd56	tstest/natlab/{vmtest,vnet}, cmd/tta: add TestExitNode Add a vmtest TestExitNode that brings up a client, two exit nodes, and a non-Tailscale webserver, each on its own NAT'd vnet network with a distinct WAN IP. The test cycles the client's exit node setting between off, exit1, and exit2 and asserts that the webserver echoes the expected post-NAT source IP for each. Three pieces were needed to make this work: vnet now forwards TCP between simulated networks at the packet level, mirroring the existing UDP path. When a guest VM sends TCP to another simulated network's WAN IP, the source network's gateway rewrites src via doNATOut and routeTCPPacket hands the packet off to the destination network, which rewrites dst via doNATIn and writes the rewritten frame onto the destination LAN. The TCP stacks of the two guest VM kernels talk end-to-end; vnet just NATs the IP/port headers in flight, so all TCP semantics (handshakes, options, sequence numbers, payload) are preserved without a gvisor TCP termination in the middle. Adds a focused TestInterNetworkTCP that exercises this path without any Tailscale machinery. cmd/tta binds its outbound dial to the default route's interface using SO_BINDTODEVICE. Without that, the moment tailscaled installs 0.0.0.0/0 → tailscale0 in response to setting an exit node, TTA's existing TCP connection to test-driver gets rerouted through the exit node. From the test driver's perspective the connection's packets then arrive with the exit node's WAN IP as the source rather than the client's, so they don't match the existing flow and the connection is dead — manifesting in the test as a hang on EditPrefs (which had actually completed in milliseconds on the daemon side, but whose response never made it back). Pinning the socket to the underlying NIC keeps TTA's agent connection on a real interface regardless of any policy routing tailscaled installs later. 
We bind rather than carry the Tailscale bypass fwmark because the fwmark approach is conditional on tailscaled having configured SO_MARK-based policy routing, while binding is unconditional. vmtest grows an Env.SetExitNode helper that sets ExitNodeIP via EditPrefs through the agent, used by the new test. Updates #13038 Change-Id: I9fc8f91848b7aa2297ef3eaf71fed9d96056a024 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-27 16:54:20 -07:00
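A Linux-only sketch of pinning an outbound dial to the default route's interface with SO_BINDTODEVICE. The commit does not say how TTA discovers that interface. Parsing /proc/net/route here is an assumption, and defaultRouteIface and boundDialer are hypothetical names. Setting SO_BINDTODEVICE may require elevated privileges on older kernels.

```go
//go:build linux

package main

import (
	"bufio"
	"net"
	"strings"
	"syscall"
)

// defaultRouteIface returns the interface of the IPv4 default route, given
// the contents of /proc/net/route (Destination and Mask both 00000000).
func defaultRouteIface(procNetRoute string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(procNetRoute))
	sc.Scan() // skip the header line
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) >= 8 && f[1] == "00000000" && f[7] == "00000000" {
			return f[0], true
		}
	}
	return "", false
}

// boundDialer returns a Dialer whose sockets are bound to ifname, so a later
// 0.0.0.0/0 route via another interface does not capture the connection.
func boundDialer(ifname string) *net.Dialer {
	return &net.Dialer{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptString(int(fd), syscall.SOL_SOCKET, syscall.SO_BINDTODEVICE, ifname)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
}
```

Typical use would be to read /proc/net/route once at startup, then call boundDialer(ifc).Dial("tcp", addr) for the agent connection.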
BeckyPauley	7477a6ee47	cmd/k8s-operator: use dynamic resource names in e2e ingress tests (#19536) Replace hardcoded resource names with dynamically generated names in k8s-operator-e2e ingress tests to avoid collisions with stale resources. Updates tailscale/corp#40612 Signed-off-by: Becky Pauley <becky@tailscale.com>	2026-04-27 13:40:46 +01:00
Andrew Lytvynov	ad9e6c1925	go.mod: bump github.com/google/go-containerregistry (#19500) This drops an indirect dependency on the old github.com/docker/docker (which was replaced with github.com/moby/moby) and fixes a couple of recent CVEs. Updates #cleanup Signed-off-by: Andrew Lytvynov <awly@tailscale.com>	2026-04-23 10:39:27 -07:00
Brad Fitzpatrick	f289f7e77c	tstest/natlab/vmtest,cmd/tta: add TestSiteToSite Verifies that site-to-site Tailscale subnet routing with --snat-subnet-routes=false preserves the original source IP end-to-end. Topology: two sites, each with a Linux subnet router on a NATted WAN plus an internal LAN, and a non-Tailscale backend on each LAN. Backends are given static routes pointing to their local subnet router for the remote site's prefix; an HTTP GET from backend-a to backend-b over Tailscale returns a body containing backend-a's LAN IP. Adds the supporting vmtest.SNATSubnetRoutes NodeOption and plumbs snat-subnet-routes through TTA's /up handler. The webserver started by vmtest.WebServer now also echoes the remote IP, for the preservation assertion. Adds a /add-route TTA endpoint (Linux-only for now) and a vmtest Env.AddRoute helper so the test can install the backend static routes through TTA rather than needing a host SSH key and debug NIC. ensureGokrazy now always rebuilds the natlab qcow2 (once per test process, via sync.Once) so the test picks up the new TTA and webserver behavior. This is pulled out of a larger pending change that adds FreeBSD site-to-site subnet routing support; figured we should have at least the Linux test covering what works today. Updates #5573 Change-Id: I881c55b0f118ac9094546b5fbe68dddf179bb042 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-22 12:11:30 -07:00
Fernando Serboncini	81fbcc1ac8	cmd/tsnet-proxy: add tsnet-based port proxy tool (#19468) Exposes a local port on the tailnet under a chosen hostname. Raw TCP by default; --http or --https reverse-proxy with Tailscale-User-* identity headers from WhoIs, matching tailscaled's serve header conventions. Useful as a one-shot to put a dev server on the tailnet. Fixes #19467 Change-Id: I79f63cfbbedf7e40cf0f1f51cbae8df86ae90cdf Signed-off-by: Fernando Serboncini <fserb@tailscale.com>	2026-04-22 13:34:18 -04:00
James 'zofrex' Sanderson	ffae275d4d	ipn/ipnlocal,tailcfg: add /debug/tka c2n endpoint (#19198) Updates tailscale/corp#35015 Signed-off-by: James Sanderson <jsanderson@tailscale.com>	2026-04-20 16:00:03 +01:00
BeckyPauley	b239e92eb6	cmd/k8s-operator: add e2e test setup and l7 ingress test for multi-tailnet (#19426) This change adds setup for a second tailnet to enable multi-tailnet e2e tests. When running against devcontrol, a second tailnet is created via the API. Otherwise, credentials are read from SECOND_TS_API_CLIENT_SECRET. Also adds an l7 HA Ingress test for multi-tailnet. Fixes tailscale/corp#37498 Signed-off-by: Becky Pauley <becky@tailscale.com>	2026-04-17 17:03:25 +01:00
Andrew Dunham	d52ae45e9b	cmd/cloner: deep-clone pointer elements in map-of-slice values The cloner's codegen for map[K][]V fields was doing a shallow append (copying pointer values) instead of cloning each element. This meant that cloned structs aliased the original's pointed-to values through the map's slice entries. Mirror the existing standalone-slice logic that checks ContainsPointers(sliceType.Elem()) and generates per-element cloning for pointer, interface, and struct types. Regenerate net/dns and tailcfg which both had affected map[...][]dnstype.Resolver fields. Fixes #19284 Signed-off-by: Andrew Dunham <andrew@tailscale.com>	2026-04-17 11:36:05 -04:00
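The shallow-versus-deep difference the fix addresses can be shown with a hand-written equivalent of the generated clone. Resolver and cloneRoutes are simplified, hypothetical stand-ins for dnstype.Resolver and the cloner's generated code.

```go
package main

// Resolver is a simplified stand-in for dnstype.Resolver.
type Resolver struct {
	Addr string
}

// Clone returns a copy of r (nil-safe).
func (r *Resolver) Clone() *Resolver {
	if r == nil {
		return nil
	}
	c := *r
	return &c
}

// cloneRoutes deep-clones a map[string][]*Resolver. It allocates fresh
// slices and clones each pointed-to element instead of append-copying the
// pointers, which would alias the original. Nil map and nil slices are kept.
func cloneRoutes(src map[string][]*Resolver) map[string][]*Resolver {
	if src == nil {
		return nil
	}
	dst := make(map[string][]*Resolver, len(src))
	for k, rs := range src {
		if rs == nil {
			dst[k] = nil
			continue
		}
		cp := make([]*Resolver, len(rs))
		for i, r := range rs {
			cp[i] = r.Clone()
		}
		dst[k] = cp
	}
	return dst
}
```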
Bjorn Stange	47ecbe5845	cmd/k8s-operator: add priorityClassName support to helm chart (#19236) Expose priorityClassName in the operator Helm chart values so that users can configure the operator deployment with a Kubernetes PriorityClass. This prevents the operator pods from being preempted by lower-priority workloads. Fixes #19235 Signed-off-by: Bjorn Stange <bjorn.stange@expel.io> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 12:57:12 +01:00
Brad Fitzpatrick	50d7176333	control/tsp, cmd/tsp: add low-level Tailscale protocol client and tool Add a new control/tsp package providing a client for speaking the Tailscale protocol to a coordination server over Noise, along with a cmd/tsp binary exposing it as a low-level composable tool for generating keys, registering nodes, and issuing map requests. Previously developed out-of-tree at github.com/bradfitz/tsp; imported here without git history. Updates #12542 Change-Id: I6ad21143c4aefe8939d4a46ae65b2184173bf69f Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-16 20:00:25 -07:00
David Bond	eea39eaf52	cmd/k8s-operator: add affinity rules to DNSConfig (#19360) This commit modifies the `DNSConfig` custom resource to allow the user to specify affinity rules on the nameserver pods. Updates: https://github.com/tailscale/tailscale/issues/18556 Signed-off-by: David Bond <davidsbond93@gmail.com>	2026-04-15 22:39:04 +01:00
Jonathan Nobels	acc43356c6	control/controlclient: enable request signatures on macOS (#19317) fixes tailscale/corp#39422 Updates tailscale/certstore for proper macOS support and builds the request signing support into macOS builds. iOS and builds that do not use cgo are omitted. Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>	2026-04-15 14:11:14 -04:00
Tom Meadows	5eb0b4be31	cmd/containerboot,cmd/k8s-proxy,kube: add authkey renewal to k8s-proxy (#19221) * kube/authkey,cmd/containerboot: extract shared auth key reissue package Move auth key reissue logic (set marker, wait for new key, clear marker, read config) into a shared kube/authkey package and update containerboot to use it. No behaviour change. Updates #14080 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * kube/authkey,kube/state,cmd/containerboot: preserve device_id across restarts Stop clearing device_id, device_fqdn, and device_ips from state on startup. These keys are now preserved across restarts so the operator can track device identity. Expand ClearReissueAuthKey to clear device state and tailscaled profile data when performing a full auth key reissue. Updates #14080 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/containerboot: use root context for auth key reissue wait Pass the root context instead of bootCtx to setAndWaitForAuthKeyReissue. The 60-second bootCtx timeout was cancelling the reissue wait before the operator had time to respond, causing the pod to crash-loop. Updates #14080 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> * cmd/k8s-proxy: add auth key renewal support Add auth key reissue handling to k8s-proxy, mirroring containerboot. When the proxy detects an auth failure (login-state health warning or NeedsLogin state), it disconnects from control, signals the operator via the state Secret, waits for a new key, clears stale state, and exits so Kubernetes restarts the pod with the new key. A health watcher goroutine runs alongside ts.Up() to short-circuit the startup timeout on terminal auth failures. Updates #14080 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk> --------- Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>	2026-04-15 16:13:46 +01:00
Brad Fitzpatrick	9fbe4b3ed2	all: fix six tests that failed with -count=2 Avery found a bunch of tests that fail with -count=2. Updates tailscale/corp#40176 (tracks making our CI detect them) Change-Id: Ie3e4398070dd92e4fe0146badddf1254749cca20 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Co-authored-by: Avery Pennarun <apenwarr@tailscale.com>	2026-04-13 18:52:57 -07:00
Brad Fitzpatrick	a97850f7e2	cmd/derper: fix TestLookupMetric to pass when run alone TestLookupMetric was added in e8d140654 (2023-08-17) without initializing the dnsCache and dnsCacheBytes globals. When run in isolation, handleBootstrapDNS writes a nil body (from the uninitialized dnsCacheBytes), causing getBootstrapDNS to fail decoding an empty response with EOF. Add a setDNSCache test helper that stores the dnsEntryMap, marshals dnsCacheBytes, and registers a t.Cleanup to nil both out, so tests that forget to call it will hit the dnsCache-nil fatal in getBootstrapDNS rather than silently depending on prior test state. Also add AssertNotParallel and a dnsCache-nil fatal check to getBootstrapDNS, the central helper all bootstrap DNS tests flow through, to prevent future tests from running in parallel (they all mutate package-level DNS caches and metrics) and to give a clear error if a test forgets to initialize the DNS caches. Fixes #19388 Change-Id: I8ad454ec6026c71f13ecfa14d25925df5478b908 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Co-authored-by: Avery Pennarun <apenwarr@tailscale.com>	2026-04-13 17:20:43 -07:00
Brad Fitzpatrick	6500d3c3f8	cmd/containerboot: mark TestContainerBoot as flaky Updates #19380 Change-Id: Ib1be53836e37224265d10abd0c2213644ea54d64 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-13 15:21:42 -07:00
Jordan Whited	929ad51be0	cmd/derper: mark rate-config flag as experimental and unstable Updates tailscale/corp#38509 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-04-13 12:24:59 -07:00
Mike O'Driscoll	ca5db865b4	cmd/derper,derp: add --rate-config file with SIGHUP reload (#19314) Add a --rate-config flag pointing to a JSON file for per-client receive rate limits (bytes/sec and burst bytes). The config is reloaded on SIGHUP, updating all existing client connections live. The --per-client-rate-limit and --per-client-rate-burst flags are removed in favor of the config file. In derpserver, rate limiting uses an atomic.Pointer[xrate.Limiter] per client: nil when unlimited or mesh (zero overhead), non-nil when rate-limited. Document that clientSet.activeClient Store operations require Server.mu. Updates tailscale/corp#38509 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>	2026-04-10 18:37:54 -04:00
Fernando Serboncini	6b7caaf7ee	cmd/k8s-operator: set PreferDualStack on ProxyGroup egress services (#19194) On dual-stack clusters defaulting to IPv6, the ProxyGroup egress service only got an IPv6 address, which causes request failures. Individual egress proxies already set PreferDualStack correctly. Fixes: #18768 Signed-off-by: Fernando Serboncini <fserb@tailscale.com>	2026-04-09 13:33:39 -04:00
David Bond	85d6ba9473	cmd/k8s-operator: migrate to tailscale-client-go-v2 (#19010) This commit modifies the kubernetes operator to use the `tailscale-client-go-v2` package instead of the internal tailscale client it was previously using. This now gives us the ability to expand out custom resources and features as they become available via the API module. The tailnet reconciler has also been modified to manage clients as tailnets are created and removed, providing each subsequent reconciler with a single `ClientProvider` that obtains a tailscale client for the respective tailnet by name, or the operator's default when presented with a blank string. Fixes: https://github.com/tailscale/corp/issues/38418 Signed-off-by: David Bond <davidsbond93@gmail.com>	2026-04-09 14:39:46 +01:00
Brad Fitzpatrick	ec0b23a21f	vmtest: add VM-based integration test framework Add tstest/natlab/vmtest, a high-level framework for running multi-VM integration tests with mixed OS types (gokrazy + Ubuntu/Debian cloud images) connected via natlab's vnet virtual network. The vmtest package provides: - Env type that orchestrates vnet, QEMU processes, and agent connections - OS image support (Gokrazy, Ubuntu2404, Debian12) with download/cache - QEMU launch per OS type (microvm for gokrazy, q35+KVM for cloud) - Cloud-init seed ISO generation with network-config for multi-NIC - Cross-compilation of test binaries for cloud VMs - Debug SSH NIC on cloud VMs for interactive debugging - Test helpers: ApproveRoutes, HTTPGet, TailscalePing, DumpStatus, WaitForPeerRoute, SSHExec TTA enhancements (cmd/tta): - Parameterize /up (accept-routes, advertise-routes, snat-subnet-routes) - Add /set, /start-webserver, /http-get endpoints - /http-get uses local.Client.UserDial for Tailscale-routed requests - Fix /ping for non-gokrazy systems TestSubnetRouter exercises a 3-VM subnet router scenario: client (gokrazy) → subnet-router (Ubuntu, dual-NIC) → backend (gokrazy) Verifies HTTP access to the backend webserver through the Tailscale subnet route. Passes in ~30 seconds. Updates tailscale/tailscale#13038 Change-Id: I165b64af241d37f5f5870e796a52502fc56146fa Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-08 17:24:18 -07:00
Brad Fitzpatrick	a182b864ac	tsd, all: add Sys.ExtraRootCAs, plumb through TLS dial paths Add ExtraRootCAs *x509.CertPool to tsd.System and plumb it through the control client, noise transport, DERP, and wgengine layers so that platforms like Android can inject user-installed CA certificates into Go's TLS verification. tlsdial.Config now honors base.RootCAs as additional trusted roots, tried after system roots and before the baked-in LetsEncrypt fallback. SetConfigExpectedCert gets the same treatment for domain-fronted DERP. The Android client will set sys.ExtraRootCAs with a pool built from x509.SystemCertPool + user-installed certs obtained via the Android KeyStore API, replacing the current SSL_CERT_DIR environment variable approach. Updates #8085 Change-Id: Iecce0fd140cd5aa0331b124e55a7045e24d8e0c2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-07 18:10:54 -07:00
Doug Bryant	8df8e9cb6e	cmd/containerboot: rate-limit IPN bus netmap notifications CPU profiling a containerboot subnet router on a large tailnet showed roughly 40% of CPU spent in serveWatchIPNBus JSON-encoding the full netmap on every update. containerboot only reads SelfNode fields from those notifications (and does a peer lookup when TailnetTargetFQDN is set), so it does not need every intermediate netmap delta. Set ipn.NotifyRateLimit on all three WatchIPNBus calls so netmap notifications are coalesced to one per 3s. Initial-state delivery is unaffected since the rateLimitingBusSender flushes the first send immediately. Updates #cleanup Signed-off-by: Doug Bryant <dougbryant@anthropic.com>	2026-04-07 16:03:15 -07:00
Mike O'Driscoll	e689283ebd	derp/derpserver: add per-connection receive rate limiting (#19222) Add server-side per-client bandwidth enforcement using TCP backpressure. When configured, the server calls WaitN after reading each DERP frame, which delays the next read, fills the TCP receive buffer, shrinks the TCP window, and naturally throttles the sender — no packets are dropped. - Rate limiting is on the receive (inbound) side, which is what an abusive client controls - Mesh peers are exempt since they are trusted infrastructure - The burst size is at least MaxPacketSize (64KB) to ensure a single max-size frame can always be processed Also refactors sclient to store a context.Context directly instead of a done channel, which simplifies the rate limiter's WaitN call. Flags added to cmd/derper: --per-client-rate-limit (bytes/sec, default 0 = unlimited) --per-client-rate-burst (bytes, default 0 = 2x rate limit) Example for 10Mbps: --per-client-rate-limit=1250000 Updates #38509 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>	2026-04-07 18:40:41 -04:00
Brad Fitzpatrick	8a7e160a6e	ipn/desktop: move behind feature/condregister Move the ipn/desktop blank import from cmd/tailscaled/tailscaled_windows.go into feature/condregister/maybe_desktop_sessions.go, consistent with how all other modular features are registered. tailscaled already imports feature/condregister, so it still gets ipn/desktop on Windows. Updates #12614 Change-Id: I92418c4bf0e67f0ab40542e47584762ac0ffa2b2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-07 11:37:47 -07:00
Brad Fitzpatrick	1b5b43787c	ipn/localapi, cli, clientmetric: add ipnbus feature tag; fix omit.go stub Add a new "ipnbus" build feature tag so the watch-ipn-bus LocalAPI endpoint can be independently controlled, rather than being gated behind HasDebug || HasServe. Minimal/embedded builds that omit both debug and serve were getting 404s on watch-ipn-bus, breaking "tailscale up --authkey=..." and other CLI flows that depend on WatchIPNBus. In the CLI, check buildfeatures.HasIPNBus before attempting to watch the IPN bus in "tailscale up"/"tailscale login", and exit early with an informational message when the feature is omitted. Also add the missing NewCounterFunc stub to clientmetric/omit.go, which caused compilation errors when building with ts_omit_clientmetrics and netstack enabled. Fixes #19240 Change-Id: I2e3c69a72fc50fa02542b91b8a54859618a463d1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-07 10:22:37 -07:00
Kristoffer Dalby	dd3b613787	ssh: replace tempfork with tailscale/gliderssh Brings in a newer version of Gliderlabs SSH with added socket forwarding support. Fixes #12409 Fixes #5295 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2026-04-07 11:59:38 +01:00
Brad Fitzpatrick	86f42ea87b	cmd/cloner, cmd/viewer: handle named map/slice types with Clone/View methods The cloner and viewer code generators didn't handle named types with basic underlying types (map/slice) that have their own Clone or View methods. For example, a type like: type Map map[string]any func (m Map) Clone() Map { ... } func (m Map) View() MapView { ... } When used as a struct field, the cloner would descend into the underlying map[string]any and fail because it can't clone the any (interface{}) value type. Similarly, the viewer would try to create a MapFnOf view and fail. Fix the cloner to check for a Clone method on the named type before falling through to the underlying type handling. Fix the viewer to check for a View method on named map/slice types, so the type author can provide a purpose-built safe view that doesn't leak raw any values. Named map/slice types without a View method fall through to normal handling, which correctly rejects types like map[string]any as unsupported. Updates tailscale/corp#39502 (needed by tailscale/corp#39594) Change-Id: Iaef0192a221e02b4b8e409c99ef8398090327744 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-05 20:20:32 -07:00
Brad Fitzpatrick	5ef3713c9f	cmd/vet: add subtestnames analyzer; fix all existing violations Add a new vet analyzer that checks t.Run subtest names don't contain characters requiring quoting when re-running via "go test -run". This enforces the style guide rule: don't use spaces or punctuation in subtest names. The analyzer flags: - Direct t.Run calls with string literal names containing spaces, regex metacharacters, quotes, or other problematic characters - Table-driven t.Run(tt.name, ...) calls where tt ranges over a slice/map literal with bad name field values Also fix all 978 existing violations across 81 test files, replacing spaces with hyphens and shortening long sentence-like names to concise hyphenated forms. Updates #19242 Change-Id: Ib0ad96a111bd8e764582d1d4902fe2599454ab65 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-05 15:52:51 -07:00
M. J. Fromberger	eaa5d9df4b	client,cmd/tailscale,ipn/{ipnlocal,localapi}: add debug CLI command to clear netmap caches (#19213) This is a follow-up to #19117, adding a debug CLI command allowing the operator to explicitly discard cached netmap data, as a safety and recovery measure. Updates #12639 Change-Id: I5c3c47c0204754b9c8e526a4ff8f69d6974db6d0 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-04-02 12:06:39 -07:00
Naman Sood	d6b626f5bb	tstest: add test for connectivity to off-tailnet CGNAT endpoints This test is currently known-broken, but work is underway to fix it. tailscale/corp#36270 tracks this work. Updates tailscale/corp#36270 Fixes tailscale/corp#36272 Signed-off-by: Naman Sood <mail@nsood.in>	2026-04-02 14:44:40 -04:00
BeckyPauley	e82ffe03ad	cmd/k8s-operator: add further E2E tests for Ingress (#19219) * cmd/k8s-operator/e2e: add L7 HA ingress test Change-Id: Ic017e4a7e3affbc3e2a87b9b6b9c38afd65f32ed Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com> * cmd/k8s-operator: add further E2E tests for Ingress (#34833) This change adds E2E tests for L3 HA Ingress and L7 Ingress (Standalone and HA). Updates the existing L3 Ingress test to use the Service's Magic DNS name to test connectivity. Also refactors test setup to set TS_DEBUG_ACME_DIRECTORY_URL only for tests running against devcontrol, and updates the Kind node image from v1.30.0 to v1.35.0. Fixes tailscale/corp#34833 Signed-off-by: Becky Pauley <becky@tailscale.com> --------- Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com> Signed-off-by: Becky Pauley <becky@tailscale.com> Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>	2026-04-02 15:49:40 +01:00
Alex Chan	5b62f98894	ipn, cmd/tailscale/cli: allow setting FQDN sans dot as an exit node In #10057, @seigel pointed out an inconsistency in the help text for `exit-node list` and `set --exit-node`: 1. Use `tailscale exit-node list`, which has a column titled "hostname" and tells you that you can use a hostname with `set --exit-node`: ```console $ tailscale exit-node list IP HOSTNAME COUNTRY CITY STATUS 100.98.193.6 linode-vps.tailfa84dd.ts.net - - - […] 100.93.242.75 ua-iev-wg-001.mullvad.ts.net Ukraine Kyiv - # To view the complete list of exit nodes for a country, use `tailscale exit-node list --filter=` followed by the country name. # To use an exit node, use `tailscale set --exit-node=` followed by the hostname or IP. # To have Tailscale suggest an exit node, use `tailscale exit-node suggest`. ``` (This is the same format hostnames are presented in the admin console.) 2. Try copy/pasting a hostname into `set --exit-node`: ```console $ tailscale set --exit-node=linode-vps.tailfa84dd.ts.net invalid value "linode-vps.tailfa84dd.ts.net" for --exit-node; must be IP or unique node name ``` 3. Note that the command allows some hostnames, if they're from nodes in a different tailnet: ```console $ tailscale set --exit-node=ua-iev-wg-001.mullvad.ts.net $ echo $? 0 ``` This patch addresses the inconsistency in two ways: 1. Allow using `tailscale set --exit-node=` with an FQDN that's missing the trailing dot, matching the formatting used in `exit-node list` and the admin console. 2. Make the description of valid exit nodes consistent across commands ("hostname or IP"). Updates #10057 Change-Id: If5d74f950cc1a9cc4b0ebc0c2f2d70689ffe4d73 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-01 20:42:35 +01:00
Alex Chan	4ffb92d7f6	tka: refer consistently to "DisablementValues" This avoids putting "DisablementSecrets" in the JSON output from `tailscale lock log`, which is potentially scary to somebody who doesn't understand the distinction. AUMs are stored and transmitted in CBOR-encoded format, which uses an integer rather than a string key, so this doesn't break already-created TKAs. Fixes #19189 Change-Id: I15b4e81a7cef724a450bafcfa0b938da223c78c9 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-01 19:09:22 +01:00
Alex Chan	edb2be1a01	cmd/tailscale: improve `tailscale lock` error message if no keys Previously, running `add/remove/revoke-keys` without passing any keys would fail with an unhelpful error: ```console $ tailscale lock revoke-keys generation of recovery AUM failed: sending generate-recovery-aum: 500 Internal Server Error: no provided key is currently trusted ``` or ```console $ tailscale lock revoke-keys generation of recovery AUM failed: sending generate-recovery-aum: 500 Internal Server Error: network-lock is not active ``` Now they fail with a more useful error: ```console $ tailscale lock revoke-keys missing argument, expected one or more tailnet lock keys ``` Fixes #19130 Change-Id: I9d81fe2f5b92a335854e71cbc6928e7e77e537e3 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-03-29 09:28:52 +01:00
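
The subtestnames analyzer added in 5ef3713c9f flags t.Run names that would need quoting when re-run via "go test -run". Below is a minimal sketch of that kind of name check, not the analyzer itself: it assumes, as a simplification, that only ASCII letters, digits, hyphens, and underscores are safe. The names isAllowed and firstBadRune are hypothetical, and the real analyzer's character set may differ.

```go
package main

import "fmt"

// isAllowed reports whether r is in this sketch's assumed "safe" set:
// ASCII letters, digits, '-' and '_'. This is an illustrative
// approximation, not the rule set used by cmd/vet's subtestnames analyzer.
func isAllowed(r rune) bool {
	return (r >= 'a' && r <= 'z') ||
		(r >= 'A' && r <= 'Z') ||
		(r >= '0' && r <= '9') ||
		r == '-' || r == '_'
}

// firstBadRune returns the first character in name that falls outside the
// safe set, and true if such a character exists. An empty name is reported
// as bad with the zero rune.
func firstBadRune(name string) (rune, bool) {
	if name == "" {
		return 0, true
	}
	for _, r := range name {
		if !isAllowed(r) {
			return r, true
		}
	}
	return 0, false
}

func main() {
	for _, name := range []string{"ipv4-only", "handles IPv6", "match(.*)", "with_underscore"} {
		if r, bad := firstBadRune(name); bad {
			fmt.Printf("%q: bad character %q\n", name, r)
		} else {
			fmt.Printf("%q: ok\n", name)
		}
	}
}
```

Replacing spaces with hyphens, as the commit did across the existing tests, turns a name like "handles IPv6" into "handles-IPv6", which passes this check.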

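Commit 5b62f98894 lets `tailscale set --exit-node=` accept an FQDN without its trailing dot. The following is a hedged sketch of that comparison, assuming peer DNS names are held in absolute form with a trailing dot. sameFQDN is a hypothetical helper written for illustration, not Tailscale's code.

```go
package main

import (
	"fmt"
	"strings"
)

// sameFQDN (hypothetical) compares a user-supplied hostname with a peer's
// DNS name, treating the trailing root dot as optional and ignoring case,
// since DNS names are case-insensitive.
func sameFQDN(arg, peerDNSName string) bool {
	arg = strings.TrimSuffix(arg, ".")
	peer := strings.TrimSuffix(peerDNSName, ".")
	return arg != "" && strings.EqualFold(arg, peer)
}

func main() {
	// Assumed absolute form of the peer name, with a trailing dot.
	const peer = "linode-vps.tailfa84dd.ts.net."
	fmt.Println(sameFQDN("linode-vps.tailfa84dd.ts.net", peer))  // true
	fmt.Println(sameFQDN("linode-vps.tailfa84dd.ts.net.", peer)) // true
	fmt.Println(sameFQDN("linode-vps", peer))                    // false
}
```

This sketch covers only FQDN matching. Resolving IP addresses and short node names is a separate path that it does not model.
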
2705 Commits