530 Commits

Author SHA1 Message Date
Brad Fitzpatrick
a4f623ad4b tstest/build-macos-base-vm: cache IPSW in ~/.cache with freshness checks
The IPSW restore image (~15GB) is now cached in
~/.cache/tailscale/vmtest/macos-ipsw/ and only re-downloaded when
Apple publishes a new version. Freshness is checked via HTTP ETag
headers. At most one IPSW is kept in the cache directory.

The Go program now handles the download itself (with progress
reporting) rather than delegating to the Swift helper, which lets
us do proper HTTP conditional requests. The Swift helper is split
into two modes: "fetch-ipsw-url" (queries VZ framework for the
latest URL) and "install" (installs from a local IPSW file).
2026-04-10 09:02:23 -07:00
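The ETag freshness check described in a4f623ad4b can be sketched as a conditional GET. This is a minimal illustration, not the code from the commit: `fetchIfChanged` and `newFakeImageServer` are hypothetical names, the test server stands in for the real download host, and the body is read into memory for brevity, whereas a ~15GB image would be streamed to a file.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchIfChanged issues a conditional GET. If the server still holds the
// version identified by cachedETag, it answers 304 Not Modified and no body
// is transferred. On 200 it returns the new body and its ETag.
func fetchIfChanged(url, cachedETag string) (body []byte, etag string, changed bool, err error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, "", false, err
	}
	if cachedETag != "" {
		req.Header.Set("If-None-Match", cachedETag)
	}
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, "", false, err
	}
	defer res.Body.Close()
	switch res.StatusCode {
	case http.StatusNotModified:
		return nil, cachedETag, false, nil // cached copy is still fresh
	case http.StatusOK:
		body, err = io.ReadAll(res.Body)
		return body, res.Header.Get("ETag"), err == nil, err
	default:
		return nil, "", false, fmt.Errorf("unexpected status %s", res.Status)
	}
}

// newFakeImageServer is a hypothetical stand-in for the image host. It
// serves a fixed payload with a fixed ETag and honors If-None-Match.
func newFakeImageServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		const tag = `"v1"`
		w.Header().Set("ETag", tag)
		if r.Header.Get("If-None-Match") == tag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		io.WriteString(w, "image-bytes")
	}))
}

func main() {
	srv := newFakeImageServer()
	defer srv.Close()

	_, etag, changed, err := fetchIfChanged(srv.URL, "")
	fmt.Println("first fetch: changed =", changed, "err =", err)

	_, _, changed, err = fetchIfChanged(srv.URL, etag)
	fmt.Println("second fetch: changed =", changed, "err =", err)
}
```

Storing the returned ETag next to the cached file is one way to decide on the next run whether a download is needed at all.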
Brad Fitzpatrick
647bab9acd tstest: move macOS VM storage to ~/.cache/tailscale/vmtest/macos/
Replace ~/VM.bundle (not a macOS convention, just an arbitrary choice
from the original tailmac code) with ~/.cache/tailscale/vmtest/macos/.

Default VM name is now "macos-base" instead of "llmacstation" (which
was an unrelated project). Remove all llmacstation references from
vmtest code.
2026-04-10 08:53:36 -07:00
Brad Fitzpatrick
3e518d50ae tstest/build-macos-base-vm: fix waitFor, add --rebuild, fix MkdirAll
- waitFor now runs the try func at least once before checking the
  deadline, instead of potentially returning nil on a short timeout
- Add --rebuild flag to delete and recreate an existing VM
- Exit early with a message (not an error) when the VM already exists
- Create intermediate directories before writing .AppleSetupDone
  (freshly installed VMs may not have /private/var/db yet)
2026-04-10 08:02:10 -07:00
Brad Fitzpatrick
a981211011 tstest/build-macos-base-vm: add tool to create macOS base VM for vmtest
New command that creates a macOS VM image from scratch:

    go run ./tstest/build-macos-base-vm

It downloads the latest macOS IPSW restore image (~15GB, cached), installs
macOS via Virtualization.framework (~3.5 min), and applies the
.AppleSetupDone fixup so the VM boots without the interactive Setup
Assistant.

The vmtest skip message now points to this command instead of requiring
an external tool. The --macos-vm-id flag defaults to "llmacstation" so
no flags are needed in the common case.
2026-04-10 07:41:50 -07:00
Brad Fitzpatrick
7fe5a954b4 tstest/tailmac: remove unnecessary socketpair relay for VZ network device
Give VZ the dgram socket file descriptor directly, matching what
llmacstation does. The socketpair relay was added during debugging
but is unnecessary — the direct approach works fine.
2026-04-10 07:22:04 -07:00
Brad Fitzpatrick
ef69d2041b tstest/natlab/vmtest: add macOS VM support and TestMacOSAndLinuxCanPing
Add macOS VM support to the vmtest integration test framework using
tailmac (Apple Virtualization.framework). The new TestMacOSAndLinuxCanPing
creates a LAN with a Gokrazy arm64 Linux VM (QEMU/HVF) and a macOS VM
(tailmac) and verifies they can ping each other over vnet.

Key changes:

vmtest framework (tstest/natlab/vmtest/):
- Add MacOS OSImage type with IsMacOS field
- Add NoAgent node option for VMs without TTA
- Add LANPing method using TTA's /ping endpoint with retries
- Add --macos-vm-id flag for specifying the base macOS VM
- Set up Unix datagram socket for macOS VMs (ProtocolUnixDGRAM)
- Add tailmac.go for macOS VM clone/configure/launch lifecycle
- Support arm64 QEMU with HVF on macOS hosts (virt machine type)
- Use /tmp for socket paths to avoid 104-byte sun_path limit

tailmac (tstest/tailmac/):
- Add --headless flag to Host.app for GUI-less VM operation
- Use RunLoop.main.run() instead of dispatchMain() (VZ framework
  requires the RunLoop for start/restore callbacks)
- Single-NIC mode in headless (matches llmacstation VM config)
- Add socketpair relay between VZ and the vnet dgram socket
- Fix dispatchMain() bug in tailmac CLI's create command too

gokrazy arm64:
- Add natlab-arm64 Makefile target
- Add gokrazydeps.go for github.com/gokrazy/kernel.arm64
- Add kernel.arm64 dependency to go.mod

The test requires: macOS arm64 host, qemu-system-aarch64, a pre-built
macOS VM (--macos-vm-id flag), and tailmac Host.app built.
2026-04-10 06:43:21 -07:00
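On the sun_path point in ef69d2041b: a Unix socket address on macOS holds at most 104 bytes including the terminating NUL, which is why deep temporary directories can fail where a short /tmp path works. A minimal length check, with `fitsSunPath` and the example path being hypothetical names, might look like this:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// maxSunPath is the size of sockaddr_un.sun_path on macOS, including the
// terminating NUL byte. (Linux uses 108.)
const maxSunPath = 104

// fitsSunPath reports whether path is short enough to be used as a Unix
// socket address on macOS.
func fitsSunPath(path string) bool {
	return len(path) < maxSunPath
}

func main() {
	// Hypothetical socket location under /tmp, chosen only for illustration.
	short := filepath.Join("/tmp", "vmtest-1234", "node0.sock")
	fmt.Println(short, len(short), fitsSunPath(short))
}
```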
Brad Fitzpatrick
dca1d8eea1 tstest/natlab: add TestSubnetRouterFreeBSD with FreeBSD cloud image support
As a warm-up to making natlab support multiple operating systems,
start with an easy one (in that it's also Unixy and open source like
Linux) and add FreeBSD 15.0 as a VM OS option for the vmtest
integration test framework, and add TestSubnetRouterFreeBSD which
tests subnet routing through a FreeBSD VM (Gokrazy → FreeBSD →
Gokrazy).

Key changes:
- Add FreeBSD150 OSImage using the official FreeBSD 15.0
  BASIC-CLOUDINIT cloud image (xz-compressed qcow2)
- Add GOOS()/IsFreeBSD() methods to OSImage for cross-compilation
  and OS-specific behavior
- Handle xz-compressed image downloads in ensureImage
- Refactor compileBinaries into compileBinariesForOS to support
  multiple GOOS targets (linux, freebsd), with binaries registered
  at <goos>/<name> paths on the file server VIP
- Add FreeBSD-specific cloud-init (nuageinit) user-data generation:
  string-form runcmd (nuageinit doesn't support YAML arrays),
  fetch(1) instead of curl, FreeBSD sysctl names for IP forwarding,
  mkdir /usr/local/bin, PATH setup for tta
- Skip network-config in cidata ISO for FreeBSD (DHCP via rc.conf)

Updates tailscale/tailscale#13038

Change-Id: Ibeb4f7d02659d5cd8e3a7c3a66ee7b1a92a0110d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-09 07:49:07 -07:00
Brad Fitzpatrick
ec0b23a21f vmtest: add VM-based integration test framework
Add tstest/natlab/vmtest, a high-level framework for running multi-VM
integration tests with mixed OS types (gokrazy + Ubuntu/Debian cloud
images) connected via natlab's vnet virtual network.

The vmtest package provides:
  - Env type that orchestrates vnet, QEMU processes, and agent connections
  - OS image support (Gokrazy, Ubuntu2404, Debian12) with download/cache
  - QEMU launch per OS type (microvm for gokrazy, q35+KVM for cloud)
  - Cloud-init seed ISO generation with network-config for multi-NIC
  - Cross-compilation of test binaries for cloud VMs
  - Debug SSH NIC on cloud VMs for interactive debugging
  - Test helpers: ApproveRoutes, HTTPGet, TailscalePing, DumpStatus,
    WaitForPeerRoute, SSHExec

TTA enhancements (cmd/tta):
  - Parameterize /up (accept-routes, advertise-routes, snat-subnet-routes)
  - Add /set, /start-webserver, /http-get endpoints
  - /http-get uses local.Client.UserDial for Tailscale-routed requests
  - Fix /ping for non-gokrazy systems

TestSubnetRouter exercises a 3-VM subnet router scenario:
  client (gokrazy) → subnet-router (Ubuntu, dual-NIC) → backend (gokrazy)
  Verifies HTTP access to the backend webserver through the Tailscale
  subnet route. Passes in ~30 seconds.

Updates tailscale/tailscale#13038

Change-Id: I165b64af241d37f5f5870e796a52502fc56146fa
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-08 17:24:18 -07:00
Brad Fitzpatrick
814161303f tstest/natlab/vnet: add multi-NIC node support, DHCP fixes, and VIPs
Multi-NIC support:
  - Add nodeNIC type and node.extraNICs for secondary network interfaces
  - Add netForMAC/macForNet to route packets to the correct network by MAC
  - Update initFromConfig to allocate a MAC + LAN IP per network
  - Fix handleEthernetFrameFromVM, ServeUnixConn to use netForMAC
  - Fix MACOfIP, writeEth, WriteUDPPacketNoNAT, gVisor write path, and
    createARPResponse to use macForNet (return the MAC actually on that
    network, not the node's primary MAC)
  - Fix createDHCPResponse for multi-NIC (correct client IP and subnet)
  - Add nodeNICMac for secondary NIC MAC generation
  - Add Node accessors: NumNICs, NICMac, Networks, LanIP

DHCP fixes:
  - Include LeaseTime, SubnetMask, Router, DNS in DHCP Offer (not just
    Ack). systemd-networkd requires these to accept an Offer.
  - Fix DHCP response source IP: use gateway IP instead of echoing
    the request's destination (which was 255.255.255.255 for discovers)

New VIPs:
  - cloud-init.tailscale: serves per-node cloud-init meta-data, user-data,
    and network-config for VMs booting with nocloud datasource
  - files.tailscale: serves binary files (tta, tailscale, tailscaled)
    registered via RegisterFile for cloud VM provisioning
  - Add ControlServer() accessor for test control server

This is necessary for a three-VM natlab subnet router
integration test, coming later.

Updates #13038

Change-Id: I59f9f356bae9b5509c117265237983972dfdd5af
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-08 11:04:26 -07:00
Brad Fitzpatrick
ccef06b968 tstest/integration/testcontrol: notify peers when subnet routes change
When SetSubnetRoutes is called, also send updatePeerChanged to all
other connected nodes so they re-fetch their MapResponse and learn
about the updated AllowedIPs. Without this, peers never see new
subnet routes until they happen to reconnect to the control server.

Discovered while working on a three-VM natlab subnet router
integration test, coming later.

Updates #13038

Change-Id: I20e7a2fda994a8ab0e7a24240e6eae536f4f5f15
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-08 10:39:09 -07:00
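The notification behavior described in ccef06b968 (and the earlier self-notification from a20cdb5c93) can be modeled with a toy server. Only the names updatePeerChanged and updateSelfChanged come from the commit messages; the `server` type, its fields, and `setSubnetRoutes` are hypothetical stand-ins, not the testcontrol implementation.

```go
package main

import "fmt"

type updateKind int

const (
	updateSelfChanged updateKind = iota + 1
	updatePeerChanged
)

// server is a toy stand-in for a control server. Each connected node has a
// one-slot channel on which it learns that it should re-fetch its netmap.
type server struct {
	updates map[string]chan updateKind // keyed by node name
	routes  map[string][]string
}

func newServer(nodes ...string) *server {
	s := &server{
		updates: make(map[string]chan updateKind),
		routes:  make(map[string][]string),
	}
	for _, n := range nodes {
		s.updates[n] = make(chan updateKind, 1)
	}
	return s
}

// setSubnetRoutes records the routes for node and notifies every connected
// node: the node itself gets updateSelfChanged, all others updatePeerChanged.
func (s *server) setSubnetRoutes(node string, routes []string) {
	s.routes[node] = routes
	for name, ch := range s.updates {
		kind := updatePeerChanged
		if name == node {
			kind = updateSelfChanged
		}
		select {
		case ch <- kind:
		default: // an update is already pending; the node will re-fetch anyway
		}
	}
}

func main() {
	s := newServer("client", "router")
	s.setSubnetRoutes("router", []string{"10.0.0.0/24"})
	fmt.Println("client notified as peer:", <-s.updates["client"] == updatePeerChanged)
	fmt.Println("router notified as self:", <-s.updates["router"] == updateSelfChanged)
}
```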
Brad Fitzpatrick
8a7e160a6e ipn/desktop: move behind feature/condregister
Move the ipn/desktop blank import from cmd/tailscaled/tailscaled_windows.go
into feature/condregister/maybe_desktop_sessions.go, consistent with how
all other modular features are registered. tailscaled already imports
feature/condregister, so it still gets ipn/desktop on Windows.

Updates #12614

Change-Id: I92418c4bf0e67f0ab40542e47584762ac0ffa2b2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-07 11:37:47 -07:00
Brad Fitzpatrick
5ef3713c9f cmd/vet: add subtestnames analyzer; fix all existing violations
Add a new vet analyzer that checks t.Run subtest names don't contain
characters requiring quoting when re-running via "go test -run". This
enforces the style guide rule: don't use spaces or punctuation in
subtest names.

The analyzer flags:
- Direct t.Run calls with string literal names containing spaces,
  regex metacharacters, quotes, or other problematic characters
- Table-driven t.Run(tt.name, ...) calls where tt ranges over a
  slice/map literal with bad name field values

Also fix all 978 existing violations across 81 test files, replacing
spaces with hyphens and shortening long sentence-like names to concise
hyphenated forms.

Updates #19242

Change-Id: Ib0ad96a111bd8e764582d1d4902fe2599454ab65
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-05 15:52:51 -07:00
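The core check behind the subtestnames analyzer in 5ef3713c9f can be approximated by a single predicate. The character set below is an assumption chosen for illustration (whitespace, quotes, and common regexp metacharacters); the analyzer's actual rules may differ, and `needsQuoting` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// needsQuoting reports whether a subtest name contains a character that
// would need escaping or quoting when selecting the subtest by name with
// "go test -run". The set is illustrative, not the analyzer's real list.
func needsQuoting(name string) bool {
	return strings.ContainsAny(name, " \t\"'`\\.+*?()[]{}|^$")
}

func main() {
	for _, name := range []string{"empty-input", "empty input", "v1.2", "retries(3)"} {
		fmt.Printf("%-14q flagged=%v\n", name, needsQuoting(name))
	}
}
```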
Naman Sood
d6b626f5bb tstest: add test for connectivity to off-tailnet CGNAT endpoints
This test is currently known-broken, but work is underway to fix it.
tailscale/corp#36270 tracks this work.

Updates tailscale/corp#36270
Fixes tailscale/corp#36272

Signed-off-by: Naman Sood <mail@nsood.in>
2026-04-02 14:44:40 -04:00
Alex Chan
edb2be1a01 cmd/tailscale: improve tailscale lock error message if no keys
Previously, running `add/remove/revoke-keys` without passing any keys
would fail with an unhelpful error:

```console
$ tailscale lock revoke-keys
generation of recovery AUM failed: sending generate-recovery-aum: 500 Internal Server Error: no provided key is currently trusted
```

or

```console
$ tailscale lock revoke-keys
generation of recovery AUM failed: sending generate-recovery-aum: 500 Internal Server Error: network-lock is not active
```

Now they fail with a more useful error:

```console
$ tailscale lock revoke-keys
missing argument, expected one or more tailnet lock keys
```

Fixes #19130

Change-Id: I9d81fe2f5b92a335854e71cbc6928e7e77e537e3
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-03-29 09:28:52 +01:00
Brad Fitzpatrick
4c91f90776 tstest/integration: add userspace-networking + proxymap WhoIs integration test
Before sending a fix for #18991, this adds an integration test that
locks in that the proxymap WhoIs code works with two nodes running as
different users, with the second node running a localhost service and
able to use its local tailscaled to identify a Tailscale connection
from the other tailscaled.

Updates #18991

Change-Id: I6fbb0810204d77d2ac558f0cc786b73e3248d031
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-13 15:01:31 -07:00
Brad Fitzpatrick
f905871fb1 ipn/ipnlocal, feature/ssh: move SSH code out of LocalBackend to feature
This makes tsnet apps not depend on x/crypto/ssh and locks that in with a test.

It also paves the way for tsnet apps to opt in to SSH support via a
blank feature import in the future.

Updates #12614

Change-Id: Ica85628f89c8f015413b074f5001b82b27c953a9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-10 17:27:17 -07:00
Brad Fitzpatrick
99bde5a406 tstest/integration: deflake TestCollectPanic
Two issues caused TestCollectPanic to flake:

1. ETXTBSY: The test exec'd the tailscaled binary directly without
   going through StartDaemon/awaitTailscaledRunnable, so it lacked
   the retry loop that other tests use to work around a mysterious
   ETXTBSY on GitHub Actions.

2. Shared filch files: The test didn't pass --statedir or TS_LOGS_DIR,
   so all parallel test instances wrote panic logs to the shared system
   state directory (~/.local/share/tailscale). Concurrent runs would
   clobber each other's filch log files, causing the second run to not
   find the panic data from the first.

Fix both by adding awaitTailscaledRunnable before the first exec, and
passing --statedir and TS_LOGS_DIR to isolate each test's log files,
matching what StartDaemon does.

It now passes x/tools/cmd/stress.

Fixes #15865

Change-Id: If18b9acf8dbe9a986446a42c5d98de7ad8aae098
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-10 14:16:47 -07:00
Brad Fitzpatrick
bd2a2d53d3 all: use Go 1.26 things, run most gofix modernizers
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.

Some of it's older "for x := range 123".

Also: errors.AsType, any, fmt.Appendf, etc.

Updates #18682

Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-06 13:32:03 -08:00
Brad Fitzpatrick
2a64c03c95 types/ptr: deprecate ptr.To, use Go 1.26 new
Updates #18682

Change-Id: I62f6aa0de2a15ef8c1435032c6aa74a181c25f8f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-05 20:13:18 -08:00
Brad Fitzpatrick
2810f0c6f1 all: fix typos in comments
Fix its/it's, who's/whose, wether/whether, missing apostrophes
in contractions, and other misspellings across the codebase.

Updates #cleanup

Change-Id: I20453b81a7aceaa14ea2a551abba08a2e7f0a1d8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-05 13:52:01 -08:00
Claus Lensbøl
9657a93217 tstest/natlab: add test for no control and rotated disco key (#18261)
Updates #12639

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-03-05 16:00:36 -05:00
Brad Fitzpatrick
d784dcc61b go.toolchain.branch: switch to Go 1.26
Updates #18682

Change-Id: I1eadfab950e55d004484af880a5d8df6893e85e8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-04 21:57:05 -08:00
Fernando Serboncini
54de5daae0 tstest/integration/nat: use per-call timeout in natlab ping (#18811)
The test ping() passed the full 60s context to each PingWithOpts call,
so if the first attempt hung (DERP not yet registered), the retry loop
never reached attempt 2. Use a 2s per-call timeout instead.

Updates: #18810

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-02-25 17:41:51 -05:00
Harry Harpham
299f1bf581 testcontrol: ensure Server.UpdateNode triggers netmap updates
Updating a node on a testcontrol server should trigger netmap updates to
all connected streaming clients. This was not the case before this
change, which caused race conditions in tests. It was possible
for a test to call UpdateNode and for connected nodes to never see the
update propagate.

Updates #16340
Fixes #18703

Signed-off-by: Harry Harpham <harry@tailscale.com>
2026-02-18 09:08:12 -07:00
Brad Fitzpatrick
371d6369cd gokrazy: use monorepo for gokrazy appliance builds (monogok)
This switches our gokrazy builds to use a new variant of cmd/gok called
monogok that is opinionated about using monorepos: https://github.com/bradfitz/monogok

And with that, we can get rid of all the go.mod files and builddir forests
under gokrazy/**.

Updates #13038
Updates gokrazy/gokrazy#361

Change-Id: I9f18fbe59b8792286abc1e563d686ea9472c622d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-02-13 16:19:14 -08:00
Harry Harpham
84ee5b640b testcontrol: send updates for new DNS records or app capabilities
Two methods were recently added to the testcontrol.Server type:
AddDNSRecords and SetGlobalAppCaps. These two methods should trigger
netmap updates for all nodes connected to the Server instance, the way
that other state-change methods do (see SetNodeCapMap, for example).

This will also allow us to get rid of Server.ForceNetmapUpdate, which
was a band-aid fix to force the netmap updates which should have been
triggered by the aforementioned methods.

Fixes tailscale/corp#37102

Signed-off-by: Harry Harpham <harry@tailscale.com>
2026-02-11 11:49:15 -07:00
Fernando Serboncini
73d09316e2 tstest: update clock to always use UTC (#18663)
Instead of relying on the local timezone, which may cause
non-deterministic behavior in some CIs, we force timezone
to be UTC on default created clocks.

Fixes: tailscale/corp#37005

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-02-11 13:47:48 -05:00
James Tucker
1183f7a191 tstest/integration/testcontrol: fix unguarded read of DNS config
Fixes #18498

Signed-off-by: James Tucker <james@tailscale.com>
2026-01-24 14:38:48 -08:00
Nick Khyl
2a69f48541 wf: allow limited broadcast to/from permitted interfaces when using an exit node on Windows
Similarly to allowing link-local multicast in #13661, we should also allow broadcast traffic
on permitted interfaces when the killswitch is enabled due to exit node usage on Windows.
This always includes internal interfaces, such as Hyper-V/WSL2, and also the LAN when
"Allow local network access" is enabled in the client.

Updates #18504

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2026-01-23 18:30:38 -06:00
Will Norris
3ec5be3f51 all: remove AUTHORS file and references to it
This file was never truly necessary and has never actually been used in
the history of Tailscale's open source releases.

A Brief History of AUTHORS files
---

The AUTHORS file was a pattern developed at Google, originally for
Chromium, then adopted by Go and a bunch of other projects. The problem
was that Chromium originally had a copyright line only recognizing
Google as the copyright holder. Because Google (and most open source
projects) do not require copyright assignment for contributions, each
contributor maintains their copyright. Some large corporate contributors
then tried to add their own name to the copyright line in the LICENSE
file or in file headers. This quickly becomes unwieldy, and puts a
tremendous burden on anyone building on top of Chromium, since the
license requires that they keep all copyright lines intact.

The compromise was to create an AUTHORS file that would list all of the
copyright holders. The LICENSE file and source file headers would then
include that list by reference, listing the copyright holder as "The
Chromium Authors".

It also became cumbersome simply to keep the file up to date with a
high rate of new contributors. Plus it's not always obvious who the
copyright holder is. Sometimes it is the individual making the
contribution, but many times it may be their employer. There is no way
for the project maintainer to know.

Eventually, Google changed their policy to no longer recommend trying to
keep the AUTHORS file up to date proactively, and instead to only add to
it when requested: https://opensource.google/docs/releasing/authors.
They are also clear that:

> Adding contributors to the AUTHORS file is entirely within the
> project's discretion and has no implications for copyright ownership.

It was primarily added to appease a small number of large contributors
that insisted that they be recognized as copyright holders (which was
entirely their right to do). But it's not truly necessary, and not even
the most accurate way of identifying contributors and/or copyright
holders.

In practice, we've never added anyone to our AUTHORS file. It only lists
Tailscale, so it's not really serving any purpose. It also causes
confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header
in other open source repos which don't actually have an AUTHORS file, so
it's ambiguous what that means.

Instead, we just acknowledge that the contributors to Tailscale (whoever
they are) are copyright holders for their individual contributions. We
also have the benefit of using the DCO (developercertificate.org) which
provides some additional certification of their right to make the
contribution.

The source file changes were purely mechanical with:

    git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g'

Updates #cleanup

Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
2026-01-23 15:49:45 -08:00
Harry Harpham
3840183be9 tsnet: add support for Services
This change allows tsnet nodes to act as Service hosts by adding a new
function, tsnet.Server.ListenService. Invoking this function will
advertise the node as a host for the Service and create a listener to
receive traffic for the Service.

Fixes #17697
Fixes tailscale/corp#27200
Signed-off-by: Harry Harpham <harry@tailscale.com>
2026-01-16 15:28:31 -07:00
Andrew Dunham
6aac87a84c net/portmapper, go.mod: unfork our goupnp dependency
Updates #7436

Signed-off-by: Andrew Dunham <andrew@tailscale.com>
2026-01-08 11:42:36 -05:00
Alex Chan
b7658a4ad2 tstest/integration: add integration test for Tailnet Lock
This patch adds an integration test for Tailnet Lock, checking that a node can't
talk to peers in the tailnet until it becomes signed.

This patch also introduces a new package `tstest/tkatest`, which has some helpers
for constructing a mock control server that responds to TKA requests. This allows
us to reduce boilerplate in the IPN tests.

Updates tailscale/corp#33599

Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-11-26 11:54:48 +00:00
Brad Fitzpatrick
ac0b15356d tailcfg, control/controlclient: start moving MapResponse.DefaultAutoUpdate to a nodeattr
And fix up the TestAutoUpdateDefaults integration tests as they
weren't testing reality: the DefaultAutoUpdate is supposed to only be
relevant on the first MapResponse in the stream, but the tests weren't
testing that. They were instead injecting a 2nd+ MapResponse.

This changes the test control server to add a hook to modify the first
map response, and then makes the test control when the node goes up
and down to make new map responses.

Also, the test now runs on macOS where the auto-update feature being
disabled would've previously t.Skipped the whole test.

Updates #11502

Change-Id: If2319bd1f71e108b57d79fe500b2acedbc76e1a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-11-25 10:45:34 -08:00
Andrew Dunham
a20cdb5c93 tstest/integration/testcontrol: de-flake TestUserMetricsRouteGauges
SetSubnetRoutes was not sending update notifications to nodes when their
approved routes changed, causing nodes to not fetch updated netmaps with
PrimaryRoutes populated. This resulted in TestUserMetricsRouteGauges
flaking because it waited for PrimaryRoutes to be set, which only happened
if the node happened to poll for other reasons.

Now send updateSelfChanged notification to affected nodes so they fetch
an updated netmap immediately.

Fixes #17962

Signed-off-by: Andrew Dunham <andrew@tailscale.com>
2025-11-23 21:13:23 -05:00
Andrew Dunham
16587746ed portlist,tstest: skip tests on kernels with /proc/net/tcp regression
Linux kernel versions 6.6.102-104 and 6.12.42-45 have a regression
in /proc/net/tcp that causes seek operations to fail with "illegal seek".
This breaks portlist tests on these kernels.

Add kernel version detection for Linux systems and a SkipOnKernelVersions
helper to tstest. Use it to skip affected portlist tests on the broken
kernel versions.

Thanks to philiptaron for the list of kernels with the issue and fix.

Updates #16966

Signed-off-by: Andrew Dunham <andrew@tailscale.com>
2025-11-21 22:33:57 -05:00
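The kernel-version check described in 16587746ed can be sketched as a range lookup. The version ranges are the ones named in the commit message; `parseKver`, `hasProcNetTCPRegression`, and the table layout are hypothetical, and the real SkipOnKernelVersions helper may work differently.

```go
package main

import "fmt"

type kver struct{ major, minor, patch int }

// parseKver parses the leading "major.minor.patch" of a kernel release
// string such as "6.6.103-generic"; any suffix is ignored.
func parseKver(s string) (kver, error) {
	var v kver
	if _, err := fmt.Sscanf(s, "%d.%d.%d", &v.major, &v.minor, &v.patch); err != nil {
		return kver{}, fmt.Errorf("parsing kernel version %q: %w", s, err)
	}
	return v, nil
}

// affected lists the inclusive patch ranges named in the commit message:
// 6.6.102-104 and 6.12.42-45.
var affected = []struct{ major, minor, lo, hi int }{
	{6, 6, 102, 104},
	{6, 12, 42, 45},
}

// hasProcNetTCPRegression reports whether release falls in an affected
// range. Unparseable versions are treated as unaffected.
func hasProcNetTCPRegression(release string) bool {
	v, err := parseKver(release)
	if err != nil {
		return false
	}
	for _, r := range affected {
		if v.major == r.major && v.minor == r.minor && v.patch >= r.lo && v.patch <= r.hi {
			return true
		}
	}
	return false
}

func main() {
	for _, rel := range []string{"6.6.101", "6.6.103-generic", "6.12.45", "6.12.46"} {
		fmt.Printf("%-16s skip=%v\n", rel, hasProcNetTCPRegression(rel))
	}
}
```

A test would call such a predicate with the running kernel's release string and call t.Skip when it returns true.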
Andrew Lytvynov
c679aaba32 cmd/tailscaled,ipn: show a health warning when state store fails to open (#17883)
With the introduction of node sealing, store.New fails in some cases due
to the TPM device being reset or unavailable. Currently it results in
tailscaled crashing at startup, which is not obvious to the user until
they check the logs.

Instead of crashing tailscaled at startup, start with an in-memory store
with a health warning about state initialization and a link to (future)
docs on what to do. When this health message is set, also block any
login attempts to avoid masking the problem with an ephemeral node
registration.

Updates #15830
Updates #17654

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2025-11-20 15:52:58 -06:00
Alex Chan
e1dd9222d4 ipn/ipnlocal, tka: compact TKA state after every sync
Previously a TKA compaction would only run when a node starts, which means a long-running node could use unbounded storage as it accumulates ever-increasing amounts of TKA state. This patch changes TKA so it runs a compaction after every sync.

Updates https://github.com/tailscale/corp/issues/33537

Change-Id: I91df887ea0c5a5b00cb6caced85aeffa2a4b24ee
Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-11-19 12:27:04 +00:00
Alex Chan
85373ef822 tka: move RemoveAll() to CompactableChonk
I added a RemoveAll() method on tka.Chonk in #17946, but it's only used
in the node to purge local AUMs. We don't need it in the SQLite storage,
which currently implements tka.Chonk, so move it to CompactableChonk
instead.

Also add some automated tests, as a safety net.

Updates tailscale/corp#33599

Change-Id: I54de9ccf1d6a3d29b36a94eccb0ebd235acd4ebc
Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-11-18 12:53:52 +00:00
Alex Chan
c2e474e729 all: rename variables with lowercase-l/uppercase-I
See http://go/no-ell

Signed-off-by: Alex Chan <alexc@tailscale.com>

Updates #cleanup

Change-Id: I8c976b51ce7a60f06315048b1920516129cc1d5d
2025-11-18 09:12:34 +00:00
Alex Chan
1723cb83ed ipn/ipnlocal: use an in-memory TKA store if FS is unavailable
This requires making the internals of LocalBackend a bit more generic,
and implementing the `tka.CompactableChonk` interface for `tka.Mem`.

Signed-off-by: Alex Chan <alexc@tailscale.com>

Updates https://github.com/tailscale/corp/issues/33599
2025-11-17 18:12:33 +00:00
Brad Fitzpatrick
653d0738f9 types/netmap: remove PrivateKey from NetworkMap
It's an unnecessary nuisance having it. We go out of our way to redact
it in so many places when we don't even need it there anyway.

Updates #12639

Change-Id: I5fc72e19e9cf36caeb42cf80ba430873f67167c3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-11-16 15:32:51 -08:00
Brad Fitzpatrick
edb11e0e60 wgengine/magicsock: fix js/wasm crash regression loading non-existent portmapper
Thanks for the report, @Need-an-AwP!

Fixes #17681
Updates #9394

Change-Id: I2e0b722ef9b460bd7e79499192d1a315504ca84c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-28 08:59:00 -07:00
Alex Chan
4673992b96 tka: created a shared testing library for Chonk
This patch creates a set of tests that should be true for all implementations of Chonk and CompactableChonk, which we can share with the SQLite implementation in corp.

It includes all the existing tests, plus a test for LastActiveAncestor which was in corp but not in oss.

Updates https://github.com/tailscale/corp/issues/33465

Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-10-20 13:13:14 +01:00
Alex Chan
c961d58091 cmd/tailscale: improve the error message for lock log with no lock
Previously, running `tailscale lock log` in a tailnet without Tailnet
Lock enabled would return a potentially confusing error:

    $ tailscale lock log
    2025/10/20 11:07:09 failed to connect to local Tailscale service; is Tailscale running?

It would return this error even if Tailscale was running.

This patch fixes the error to be:

    $ tailscale lock log
    Tailnet Lock is not enabled

Fixes #17586

Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-10-20 12:15:57 +01:00
Alex Chan
b7fe1cea9f cmd/tailscale/cli: only print authURLs and device approval URLs once
This patch fixes several issues related to printing login and device
approval URLs, especially when `tailscale up` is interrupted:

1.  Only print a login URL that will cause `tailscale up` to complete.
    Don't print expired URLs or URLs from previous login attempts.

2.  Print the device approval URL if you run `tailscale up` after
    previously completing a login, but before approving the device.

3.  Use the correct control URL for device approval if you run a bare
    `tailscale up` after previously completing a login, but before
    approving the device.

4.  Don't print the device approval URL more than once (or at least,
    not consecutively).

Updates tailscale/corp#31476
Updates #17361

## How these fixes work

This patch went through a lot of trial and error, and there may still
be bugs! These notes capture the different scenarios and considerations
as we wrote it, which are also captured by integration tests.

1.  We were getting stale login URLs from the initial IPN state
    notification.

    When the IPN watcher was moved to before Start() in c011369, we
    mistakenly continued to request the initial state. This is only
    necessary if you start watching after you call Start(), because
    you may have missed some notifications.

    By getting the initial state before calling Start(), we'd get
    a stale login URL. If you clicked that URL, you could complete
    the login in the control server (if it wasn't expired), but your
    instance of `tailscale up` would hang, because it's listening for
    login updates from a different login URL.

    In this patch, we no longer request the initial state, and so we
    don't print a stale URL.

2.  Once you skip the initial state from IPN, the following sequence:

    *   Run `tailscale up`
    *   Log into a tailnet with device approval
    *   ^C after the device approval URL is printed, but without approving
    *   Run `tailscale up` again

    means that nothing would ever be printed.

    `tailscale up` would send tailscaled the pref `WantRunning: true`,
    but that was already the case so nothing changes. You never get any
    IPN notifications, and in particular you never get a state change to
    `NeedsMachineAuth`. This means we'd never print the device approval URL.

    In this patch, we add a hard-coded rule that if you're doing a simple up
    (which won't trigger any other IPN notifications) and you start in the
    `NeedsMachineAuth` state, we print the device approval message without
    waiting for an IPN notification.

3.  Consider the following sequence:

    *   Run `tailscale up --login-server=<custom server>`
    *   Log into a tailnet with device approval
    *   ^C after the device approval URL is printed, but without approving
    *   Run `tailscale up` again

    We'd print the device approval URL for the default control server,
    rather than the real control server, because we were using the `prefs`
    from the CLI arguments (which are all the defaults) rather than the
    `curPrefs` (which contain the custom login server).

    In this patch, we use the `prefs` if the user has specified any settings
    (and other code will ensure this is a complete set of settings) or
    `curPrefs` if it's a simple `tailscale up`.

4.  Consider the following sequence: you've logged in, but not completed
    device approval, and you run `down` and `up` in quick succession.

    *   `up`: sees state=NeedsMachineAuth
    *   `up`: sends `{wantRunning: true}`, prints out the device approval URL
    *   `down`: changes state to Stopped
    *   `up`: changes state to Starting
    *   tailscaled: changes state to NeedsMachineAuth
    *   `up`: gets an IPN notification with the state change, and prints
        a second device approval URL

    Either URL works, but this is annoying for the user.

    In this patch, we track whether the last printed URL was the device
    approval URL, and if so, we skip printing it a second time.

Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-10-08 18:00:29 +01:00
Alex Chan
bb6bd46570 tstest/integration: log all the output printed by tailscale up
Updates tailscale/corp#31476
Updates #17361

Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-10-08 18:00:29 +01:00
Alex Chan
06f12186d9 tstest/integration: test tailscale up when device approval is required
This patch extends the integration tests for `tailscale up` to include tailnets
where new devices need to be approved. It doesn't change the CLI, because it's
mostly working correctly already -- these tests are just to prevent future
regressions.

I've added support for `MachineAuthorized` to mock control, and I've refactored
`TestOneNodeUpAuth` to be more flexible. It now takes a sequence of steps to
run and asserts whether we got a login URL and/or machine approval URL after
each step.

Updates tailscale/corp#31476
Updates #17361

Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-10-08 18:00:29 +01:00
Alex Chan
6db8957744 tstest/integration: mark TestPeerRelayPing as flaky
Updates #17251

Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-10-06 17:01:02 +01:00
Alex Chan
59a39841c3 tstest/integration: mark TestClientSideJailing as flaky
Updates #17419

Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-10-03 17:58:08 +01:00