643 Commits

Author SHA1 Message Date
Brad Fitzpatrick
206d98e84b control/controlclient: restore aggressive Direct.Close teardown
In the earlier http2 package migration (1d93bdce20ddd2, #17394) I had
removed Direct.Close's tracking of the connPool, thinking it wasn't
necessary.

Some tests (in another repo) are strict and like it to tear down the
world and wait, to check for leaked goroutines. And they caught this
letting some goroutines idle past Close, even if they'd eventually
close down on their own.

This restores the connPool accounting and the aggressive close.

Updates #17305
Updates #17394

Change-Id: I5fed283a179ff7c3e2be104836bbe58b05130cc7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-02 20:50:28 -07:00
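
The teardown behavior described in the commit above can be illustrated with a minimal sketch. It is not the controlclient code: the `pool` and `conn` types and their methods are hypothetical names invented here. It assumes each connection owns one background goroutine, and shows Close stopping every tracked connection and then waiting, so a strict leak-checking test would find nothing still running.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a pooled connection that owns one
// background goroutine.
type conn struct {
	done chan struct{}
}

// pool tracks every live conn so that Close can tear all of them down.
type pool struct {
	mu     sync.Mutex
	conns  map[*conn]bool
	closed bool
	wg     sync.WaitGroup
}

func newPool() *pool { return &pool{conns: make(map[*conn]bool)} }

// open registers a conn and starts its goroutine, which idles until stopped.
func (p *pool) open() (*conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		return nil, errors.New("pool closed")
	}
	c := &conn{done: make(chan struct{})}
	p.conns[c] = true
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		<-c.done // idle until told to stop
	}()
	return c, nil
}

// Close stops every tracked conn and waits for the goroutines to exit,
// so nothing started by the pool is still running when it returns.
func (p *pool) Close() {
	p.mu.Lock()
	p.closed = true
	for c := range p.conns {
		close(c.done)
		delete(p.conns, c)
	}
	p.mu.Unlock()
	p.wg.Wait()
}

// live reports how many conns are still tracked.
func (p *pool) live() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.conns)
}

func main() {
	p := newPool()
	p.open()
	p.open()
	p.Close()
	fmt.Println("live conns after Close:", p.live())
}
```

Without the tracking map, Close would have no way to reach idle connections, which would then linger until they shut down on their own.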
Brad Fitzpatrick
24e38eb729 control/controlclient,health,ipn/ipnlocal: fix deadlock by deleting health reporting
A recent change (009d702adfa0fc) introduced a deadlock where the
/machine/update-health network request to report the client's health
status update to the control plane was moved to being synchronous
within the eventbus's pump machinery.

I started to instead make the health reporting be async, but then we
realized in the three years since we added that, it's barely been used
and doesn't pay for itself, for how many HTTP requests it makes.

Instead, delete it all and replace it with a c2n handler, which
provides much more helpful information.

Fixes tailscale/corp#32952

Change-Id: I9e8a5458269ebfdda1c752d7bbb8af2780d71b04
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-02 12:48:22 -07:00
Brad Fitzpatrick
a208cb9fd5 feature/featuretags: add features for c2n, peerapi, advertise/use routes/exit nodes
Saves 262 KB so far. I'm sure I missed some places, but shotizam says
these were the low hanging fruit.

Updates #12614

Change-Id: Ia31c01b454f627e6d0470229aae4e19d615e45e3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-02 12:48:12 -07:00
Brad Fitzpatrick
2cd518a8b6 control/controlclient: optimize zstd decode of KeepAlive messages
Maybe it matters? At least globally across all nodes?

Fixes #17343

Change-Id: I3f61758ea37de527e16602ec1a6e453d913b3195
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-02 10:51:30 -07:00
Brad Fitzpatrick
1d93bdce20 control/controlclient: remove x/net/http2, use net/http
Saves 352 KB, removing one of our two HTTP/2 implementations linked
into the binary.

Fixes #17305
Updates #15015

Change-Id: I53a04b1f2687dca73c8541949465038b69aa6ade
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-02 08:25:14 -07:00
Brad Fitzpatrick
78af49dd1a control/ts2021: rename from internal/noiseconn in prep for controlclient split
A following change will split the controlclient.NoiseClient type
out, away from the rest of the controlclient package which is
relatively dependency heavy.

A question was where to move it, and whether to make a new (a fifth!)
package in the ts2021 dependency chain.

@creachadair and I brainstormed and decided to merge
internal/noiseconn and controlclient.NoiseClient into one package,
with names ts2021.Conn and ts2021.Client.

For ease of reviewing the subsequent PR, this is the first step that
just renames the internal/noiseconn package to control/ts2021.

Updates #17305

Change-Id: Ib5ea162dc1d336c1d805bdd9548d1702dd6e1468
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-01 15:34:57 -07:00
Claus Lensbøl
ce752b8a88 net/netmon: remove usage of direct callbacks from netmon (#17292)
The callback itself is not removed as it is used in other repos, making
it simpler for those to slowly transition to the eventbus.

Updates #15160

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2025-10-01 14:59:38 -04:00
Brad Fitzpatrick
05a4c8e839 tsnet: remove AuthenticatedAPITransport (API-over-noise) support
It never launched and I've lost hope of it launching and it's in my
way now, so I guess it's time to say goodbye.

Updates tailscale/corp#4383
Updates #17305

Change-Id: I2eb551d49f2fb062979cc307f284df4b3dfa5956
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-01 08:13:24 -07:00
Brad Fitzpatrick
c2f37c891c all: use Go 1.20's errors.Join instead of our multierr package
Updates #7123

Change-Id: Ie9be6814831f661ad5636afcd51d063a0d7a907d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-10-01 08:10:59 -07:00
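
For readers unfamiliar with the standard-library function named in the commit above, here is a small sketch of `errors.Join` (Go 1.20 or later). The `closeAll` helper and `errFlush` value are hypothetical names for illustration and do not come from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// errFlush is a hypothetical sentinel error used by the demo.
var errFlush = errors.New("flush failed")

// closeAll runs each function and combines the failures with errors.Join.
// errors.Join drops nil values and returns nil when every value is nil.
func closeAll(closers ...func() error) error {
	var errs []error
	for _, c := range closers {
		errs = append(errs, c())
	}
	return errors.Join(errs...)
}

func main() {
	err := closeAll(
		func() error { return nil },
		func() error { return errFlush },
		func() error { return errors.New("close failed") },
	)
	fmt.Println(err)
	// errors.Is looks inside a joined error for a matching member.
	fmt.Println(errors.Is(err, errFlush))
}
```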
Brad Fitzpatrick
ee034d48fc feature/featuretags: add a catch-all "Debug" feature flag
Saves 168 KB.

Updates #12614

Change-Id: Iaab3ae3efc6ddc7da39629ef13e5ec44976952ba
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-30 11:32:33 -07:00
Brad Fitzpatrick
442a3a779d feature, net/tshttpproxy: pull out support for using proxies as a feature
Saves 139 KB.

Also Synology support, which I saw had its own large-ish proxy parsing
support on Linux, but support for proxies without Synology proxy
support is reasonable, so I pulled that out as its own thing.

Updates #12614

Change-Id: I22de285a3def7be77fdcf23e2bec7c83c9655593
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-30 10:25:56 -07:00
Brad Fitzpatrick
bcd79b161a feature/featuretags: add option to turn off DNS
Saves 328 KB (2.5%) off the minimal binary.

For IoT devices that don't need MagicDNS (e.g. they don't make
outbound connections), this provides a knob to disable all the DNS
functionality.

Rather than a massive refactor today, this uses constant false values
as a deadcode sledgehammer, guided by shotizam to find the largest DNS
functions which survived deadcode.

A future refactor could make it so that the net/dns/resolver and
publicdns packages don't even show up in the import graph (along with
their imports) but really it's already pretty good looking with just
these consts, so it's not at the top of my list to refactor it more
soon.

Also do the same in a few places with the ACME (cert) functionality,
as I saw those while searching for DNS stuff.

Updates #12614

Change-Id: I8e459f595c2fde68ca16503ff61c8ab339871f97
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-30 08:25:24 -07:00
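
The "constant false values as a deadcode sledgehammer" approach in the commit above can be sketched as follows. The names `hasDNS`, `resolve`, and `lookup` are hypothetical; in a real build the constant would presumably be selected by a build tag rather than hard-coded.

```go
package main

import "fmt"

// hasDNS is a hypothetical build-time constant. It is hard-coded here; a
// real build would likely define it in one of two tag-selected files.
const hasDNS = false

// resolve stands in for a large DNS code path.
func resolve(name string) string {
	return "resolved:" + name
}

// lookup returns early when DNS is compiled out. Because hasDNS is a
// constant, the compiler can drop the unreachable call to resolve, and the
// linker can then discard resolve if nothing else refers to it.
func lookup(name string) (string, bool) {
	if !hasDNS {
		return "", false
	}
	return resolve(name), true
}

func main() {
	addr, ok := lookup("example.ts.net")
	fmt.Printf("addr=%q ok=%v\n", addr, ok)
}
```

The packages still appear in the import graph with this technique, which matches the commit's note that a deeper refactor would be needed to remove them entirely.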
Brad Fitzpatrick
976389c0f7 feature/sdnotify: move util/systemd to a modular feature
Updates #12614

Change-Id: I08e714c83b455df7f538cc99cafe940db936b480
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-29 13:08:36 -07:00
Brad Fitzpatrick
01e645fae1 util/backoff: rename logtail/backoff package to util/backoff
It has nothing to do with logtail and is confusingly named like that.

Updates #cleanup
Updates #17323

Change-Id: Idd34587ba186a2416725f72ffc4c5778b0b9db4a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-28 11:55:07 -07:00
Claus Lensbøl
f67ad67c6f control/controlclient: switch ID to be incrementing instead of random (#17230)
Also cleans up a few comments.

Updates #15160

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2025-09-22 13:14:55 -04:00
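
One common way to hand out incrementing IDs in Go is an atomic counter, sketched below. This is an assumption about the general technique, not a copy of the change; `nextID` and `newID` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// nextID is a process-wide counter. atomic.Int64 needs Go 1.19 or later.
var nextID atomic.Int64

// newID returns the next ID in sequence and is safe for concurrent use.
func newID() int64 {
	return nextID.Add(1)
}

func main() {
	a, b := newID(), newID()
	fmt.Println(a, b)
}
```

Sequential IDs are easier to correlate in logs than random ones, since their order reflects creation order.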
Claus Lensbøl
6e128498a7 controlclient/auto: switch eventbus to using a monitor (#17205)
Only changes how the goroutine consuming the events starts and stops,
not what it does.

Updates #15160

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2025-09-22 09:16:13 -04:00
Kristoffer Dalby
986b4d1b0b control/controlclient: fix tka godoc
Updates #cleanup

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-22 08:32:42 +02:00
Brad Fitzpatrick
db048e905d control/controlhttp: simplify, fix race dialing, remove priority concept
controlhttp has the responsibility of dialing a set of candidate control
endpoints in a way that minimizes user facing latency. If one control
endpoint is unavailable we promptly dial another, racing across the
dimensions of: IPv6, IPv4, port 80, and port 443, over multiple server
endpoints.

In the case that the top priority endpoint was not available, the prior
implementation would hang waiting for other results, so as to try to
return the highest priority successful connection to the rest of the
client code. With a large dialplan and enough client-to-endpoint
latency, this hang could last long enough for the server to time out
the connection due to inactivity in the intermediate state.

Instead of trying to prioritize non-ideal candidate connections, the
first successful connection is now used unconditionally, improving user
facing latency and avoiding any delays that would encroach on the
server-side timeout.

The tests are converted to memnet and synctest, running on all
platforms.

Fixes #8442
Fixes tailscale/corp#32534

Co-authored-by: James Tucker <james@tailscale.com>
Change-Id: I4eb57f046d8b40403220e40eb67a31c41adb3a38
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
2025-09-20 20:37:14 -07:00
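
The "first successful connection wins" policy described in the commit above can be sketched like this. It is a simplified illustration, not the controlhttp implementation: `dialFunc`, `raceDial`, `fakeDial`, and `runDemo` are hypothetical names, connections are represented by strings, and all candidates start at once rather than being staggered.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// dialFunc is a hypothetical dialer that returns a connection label.
type dialFunc func(ctx context.Context) (string, error)

// raceDial starts every candidate at once and returns the first success.
// Cancelling the context on return tells the remaining dials to give up.
func raceDial(ctx context.Context, candidates []dialFunc) (string, error) {
	if len(candidates) == 0 {
		return "", errors.New("no candidates")
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	type result struct {
		conn string
		err  error
	}
	// Buffered so that late finishers never block after we return.
	results := make(chan result, len(candidates))
	for _, dial := range candidates {
		go func(dial dialFunc) {
			conn, err := dial(ctx)
			results <- result{conn, err}
		}(dial)
	}

	var errs []error
	for range candidates {
		r := <-results
		if r.err == nil {
			return r.conn, nil // no waiting for a "better" candidate
		}
		errs = append(errs, r.err)
	}
	return "", errors.Join(errs...)
}

// fakeDial simulates an endpoint with the given latency and outcome.
func fakeDial(name string, delay time.Duration, fail bool) dialFunc {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(delay):
		case <-ctx.Done():
			return "", ctx.Err()
		}
		if fail {
			return "", errors.New(name + ": unreachable")
		}
		return name, nil
	}
}

// runDemo races three simulated endpoints.
func runDemo() (string, error) {
	return raceDial(context.Background(), []dialFunc{
		fakeDial("ipv6:443", 10*time.Millisecond, true),
		fakeDial("ipv4:443", 30*time.Millisecond, false),
		fakeDial("ipv4:80", 200*time.Millisecond, false),
	})
}

func main() {
	fmt.Println(runDemo())
}
```

A real implementation would also need to close any connection that succeeds after the winner has been chosen; the sketch omits that because its connections are plain strings.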
Claus Lensbøl
009d702adf health: remove direct callback and replace with eventbus (#17199)
Pulls out the last callback logic and ensures timers are still running.

The eventbustest package is updated to support the absence of events.

Updates #15160

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2025-09-19 14:58:37 -04:00
Brad Fitzpatrick
ecfdd86fc9 net/ace, control/controlhttp: start adding ACE dialing support
Updates tailscale/corp#32227

Change-Id: I38afc668f99eb1d6f7632e82554b82922f3ebb9f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-19 09:52:29 -07:00
Anton Tolchanov
4a04161828 ipn/ipnlocal: add a C2N endpoint for fetching a netmap
For debugging purposes, add a new C2N endpoint returning the current
netmap. Optionally, the coordination server can send a new "candidate"
map response, which the client will generate a separate netmap for.
The coordination server can later compare the two netmaps, detecting
unexpected changes to the client state.

Updates tailscale/corp#32095

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2025-09-19 17:28:49 +01:00
Alex Chan
cd153aa644 control, ipn, tailcfg: enable seamless key renewal by default
Previously, seamless key renewal was an opt-in feature.  Customers had
to set a `seamless-key-renewal` node attribute in their policy file.

This patch enables seamless key renewal by default for all clients.

It includes a `disable-seamless-key-renewal` node attribute we can set
in Control, so we can manage the rollout and disable the feature for
clients with known bugs.  This new attribute makes the feature opt-out.

Updates tailscale/corp#31479

Signed-off-by: Alex Chan <alexc@tailscale.com>
2025-09-18 09:59:46 +01:00
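
The switch from opt-in to opt-out in the commit above comes down to which node attribute is consulted. The sketch below uses the two attribute names from the commit message; the function names and the map-based attribute set are hypothetical simplifications.

```go
package main

import "fmt"

const (
	attrEnable  = "seamless-key-renewal"         // old opt-in attribute
	attrDisable = "disable-seamless-key-renewal" // new opt-out attribute
)

// enabledOptIn models the previous behavior: off unless explicitly enabled.
func enabledOptIn(attrs map[string]bool) bool {
	return attrs[attrEnable]
}

// enabledOptOut models the new behavior: on unless explicitly disabled.
func enabledOptOut(attrs map[string]bool) bool {
	return !attrs[attrDisable]
}

func main() {
	none := map[string]bool{}
	disabled := map[string]bool{attrDisable: true}
	fmt.Println(enabledOptIn(none), enabledOptOut(none), enabledOptOut(disabled))
}
```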
Claus Lensbøl
2015ce4081 health,ipn/ipnlocal: introduce eventbus in health.Tracker (#17085)
The Tracker was using direct callbacks to ipnlocal. This PR moves those
to be triggered via the eventbus.

Additionally, the eventbus is now closed on exit from tailscaled
explicitly, and health is now a SubSystem in tsd.

Updates #15160

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2025-09-16 11:25:29 -04:00
Claus Lensbøl
b816fd7117 control/controlclient: introduce eventbus messages instead of callbacks (#16956)
This is a small introduction of the eventbus into controlclient that
communicates with mainly ipnlocal. While ipnlocal is a complicated part
of the codebase, the subscribers here are from the perspective of
ipnlocal already called async.

Updates #15160

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2025-09-15 10:36:17 -04:00
Brad Fitzpatrick
d05e6dc09e util/syspolicy/policyclient: add policyclient.Client interface, start plumbing
This is step 2 of ~4, breaking up #14720 into reviewable chunks, with
the aim to make syspolicy be a build-time configurable feature.

Step 1 was #16984.

In this second step, the util/syspolicy/policyclient package is added
with the policyclient.Client interface.  This is the interface that's
always present (regardless of build tags), and is what code around the
tree uses to ask syspolicy/MDM questions.

There are two implementations of policyclient.Client for now:

1) NoPolicyClient, which only returns default values.
2) the unexported, temporary 'globalSyspolicy', which is implemented
   in terms of the global functions we wish to later eliminate.

This then starts to plumb around the policyclient.Client to most callers.

Future changes will plumb it more. When the last of the global func
callers are gone, then we can unexport the global functions and make a
proper policyclient.Client type and constructor in the syspolicy
package, removing the globalSyspolicy impl out of tsd.

The final change will sprinkle build tags in a few more places and
lock it in with dependency tests to make sure the dependencies don't
later creep back in.

Updates #16998
Updates #12614

Change-Id: Ib2c93d15c15c1f2b981464099177cd492d50391c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-09-01 09:34:29 -07:00
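
The plumbing pattern in the commit above (an always-present interface with a default-only implementation) can be sketched as follows. The single-method `Client` interface, the `mapClient` test double, the `controlURL` caller, and the key name are all hypothetical and much smaller than the real policyclient package.

```go
package main

import "fmt"

// Client is a hypothetical, cut-down policy interface. The method name and
// signature are illustrative only.
type Client interface {
	GetString(key, defaultValue string) (string, error)
}

// NoPolicyClient only returns default values, as the commit describes.
type NoPolicyClient struct{}

func (NoPolicyClient) GetString(key, defaultValue string) (string, error) {
	return defaultValue, nil
}

// mapClient is a hypothetical implementation backed by a map, standing in
// for a client that has real policy settings.
type mapClient map[string]string

func (m mapClient) GetString(key, defaultValue string) (string, error) {
	if v, ok := m[key]; ok {
		return v, nil
	}
	return defaultValue, nil
}

const defaultURL = "https://default.example"

// controlURL asks whichever Client it was given instead of calling a
// package-level function, so it carries no dependency on a policy backend.
func controlURL(pc Client) string {
	v, err := pc.GetString("ControlURL", defaultURL)
	if err != nil {
		return defaultURL
	}
	return v
}

func main() {
	fmt.Println(controlURL(NoPolicyClient{}))
	fmt.Println(controlURL(mapClient{"ControlURL": "https://managed.example"}))
}
```

Passing the interface explicitly is what lets a build without syspolicy substitute the default-only client without touching callers.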
Brad Fitzpatrick
cc532efc20 util/syspolicy/*: move syspolicy keys to new const leaf "pkey" package
This is step 1 of ~3, breaking up #14720 into reviewable chunks, with
the aim to make syspolicy be a build-time configurable feature.

In this first (very noisy) step, all the syspolicy string key
constants move to a new constant-only (code-free) package. This will
make future steps more reviewable, without this movement noise.

There are no code or behavior changes here.

The future steps of this series can be seen in #14720: removing global
funcs from syspolicy resolution and using an interface that's plumbed
around instead. Then adding build tags.

Updates #12614

Change-Id: If73bf2c28b9c9b1a408fe868b0b6a25b03eeabd1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-08-31 17:09:24 -07:00
Andrew Lytvynov
0f7facfeee control/controlclient: fix data race on tkaHead (#16855)
Grab a copy under mutex in sendMapRequest.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2025-08-13 13:49:27 -07:00
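
"Grab a copy under mutex" in the commit above is a standard Go pattern, sketched here with hypothetical names (`client`, `setHead`, `buildRequest`) rather than the real controlclient types. The field is copied while the lock is held, and the rest of the function uses only the local copy.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical type whose head field is written by one
// goroutine and read by another.
type client struct {
	mu   sync.Mutex
	head string // guarded by mu
}

func (c *client) setHead(h string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.head = h
}

// buildRequest copies head while holding mu, then releases the lock and
// works only with the local copy.
func (c *client) buildRequest() string {
	c.mu.Lock()
	head := c.head
	c.mu.Unlock()

	return "MapRequest{TKAHead: " + head + "}"
}

func main() {
	c := &client{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.setHead(fmt.Sprintf("head-%d", i))
			_ = c.buildRequest()
		}(i)
	}
	wg.Wait()
	fmt.Println(c.buildRequest())
}
```

Running a program like this with `go run -race` is a quick way to confirm that the access pattern is clean.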
Jordan Whited
d122f0350e control/controlknobs,tailcfg,wgengine/magicsock: deprecate NodeAttrDisableMagicSockCryptoRouting (#16818)
Peer Relay is dependent on crypto routing, therefore crypto routing is
now mandatory.

Updates tailscale/corp#20732
Updates tailscale/corp#31083

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2025-08-11 09:04:03 -07:00
James Sanderson
5731869565 health: add an ETag to UnhealthyState for change detection
Updates tailscale/corp#30596

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2025-07-28 11:50:18 +01:00
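
The commit above does not say how the ETag is computed. One plausible approach, shown purely as an assumption, is to hash a stable encoding of the state so that any change in content changes the tag. The `unhealthyState` struct and `etag` function below are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// unhealthyState is a hypothetical, simplified warning record.
type unhealthyState struct {
	Code     string
	Severity string
	Text     string
}

// etag hashes the JSON encoding of the state. Equal states give equal
// tags, and a change to any field gives a different tag.
func etag(s unhealthyState) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // not expected for a struct of plain strings
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := unhealthyState{Code: "no-derp", Severity: "medium", Text: "no relay"}
	b := a
	b.Text = "no relay connection"
	fmt.Println(etag(a) == etag(a), etag(a) == etag(b))
}
```

A consumer can then compare tags instead of comparing every field to decide whether a state needs to be redisplayed.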
Brad Fitzpatrick
a64ca7a5b4 tstest/tlstest: simplify, don't even bake in any keys
I earlier thought this saved a second of CPU even on a fast machine,
but I think when I was previously measuring, I still had a 4096 bit
RSA key being generated in the code I was measuring.

Measuring again for this, it's plenty fast.

Prep for using this package more, for derp, etc.

Updates #16315

Change-Id: I4c9008efa9aa88a3d65409d6ffd7b3807f4d75e9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-06-19 16:12:32 -07:00
Brad Fitzpatrick
e92eb6b17b net/tlsdial: fix TLS cert validation of HTTPS proxies
If you had HTTPS_PROXY=https://some-valid-cert.example.com running a
CONNECT proxy, we should've been able to do a TLS CONNECT request to
e.g. controlplane.tailscale.com:443 through that, and I'm pretty sure
it used to work, but refactorings and lack of integration tests made
it regress.

It probably regressed when we added the baked-in LetsEncrypt root cert
validation fallback code, which was testing against the wrong hostname
(the ultimate one, not the one which we were being asked to validate)

Fixes #16222

Change-Id: If014e395f830e2f87f056f588edacad5c15e91bc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-06-18 14:20:39 -07:00
James Sanderson
5716d0977d health: prefix Warnables received from the control plane
Updates tailscale/corp#27759

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2025-06-09 10:35:22 +01:00
James Sanderson
11e83f9da5 controlclient,health,ipnlocal,tailcfg: add DisplayMessage support
Updates tailscale/corp#27759

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2025-05-30 14:48:11 +01:00
James 'zofrex' Sanderson
aa8bc23c49 control/controlclient,health,tailcfg: refactor control health messages (#15839)

Updates tailscale/corp#27759

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Signed-off-by: Paul Scott <408401+icio@users.noreply.github.com>
Co-authored-by: Paul Scott <408401+icio@users.noreply.github.com>
2025-05-22 13:40:32 +01:00
Brian Palmer
f5cc657e13 control/controlclient: send optional ConnectionHandleForTest with map requests (#15904)
This handle can be used in tests and debugging to identify the specific
client connection.

Updates tailscale/corp#28368

Change-Id: I48cc573fc0bcf018c66a18e67ad6c4f248fb760c

Signed-off-by: Brian Palmer <brianp@tailscale.com>
2025-05-07 12:57:56 -06:00
James Sanderson
1f1c323eeb control/controlclient,health: add tests for control health tracking
Updates tailscale/corp#27759

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2025-04-29 12:36:38 +01:00
David Anderson
6d6f69e735 derp/derphttp: remove ban on websockets dependency
The event bus's debug page uses websockets.

Updates #15160

Signed-off-by: David Anderson <dave@tailscale.com>
2025-04-16 10:10:45 -07:00
Brad Fitzpatrick
fb96137d79 net/{netx,memnet},all: add netx.DialFunc, move memnet Network impl
This adds netx.DialFunc, unifying a type we have in a bazillion other
places and giving it a nice short name that's clickable in
editors, etc.

That highlighted that my earlier move (03b47a55c7956) of stuff from
nettest into netx moved too much: it also dragged along the memnet
impl, meaning all users of netx.DialFunc who just wanted netx for the
type definition were instead also pulling in all of memnet.

So move the memnet implementation netx.Network into memnet, a package
we already had.

Then use netx.DialFunc in a bunch of places. I'm sure I missed some.
And plenty remain in other repos, to be updated later.

Updates tailscale/corp#27636

Change-Id: I7296cd4591218e8624e214f8c70dab05fb884e95
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-04-08 10:07:47 -07:00
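
To show why a named dial function type is convenient, here is a sketch that assumes `DialFunc` has the same shape as `(*net.Dialer).DialContext`. The type is redeclared locally, and `pipeDialer`, `fetch`, and `greet` are hypothetical helpers. The in-memory dialer uses the standard library's `net.Pipe` instead of the memnet package, so no real sockets or ports are involved.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
)

// DialFunc is assumed to match the signature of (*net.Dialer).DialContext.
type DialFunc func(ctx context.Context, network, address string) (net.Conn, error)

// pipeDialer returns a DialFunc backed by net.Pipe. handle receives the
// server side of each connection in its own goroutine.
func pipeDialer(handle func(net.Conn)) DialFunc {
	return func(ctx context.Context, network, address string) (net.Conn, error) {
		client, server := net.Pipe()
		go handle(server)
		return client, nil
	}
}

// fetch dials with the supplied DialFunc and reads everything the peer
// sends. It does not care whether the dialer is real or in-memory.
func fetch(ctx context.Context, dial DialFunc, addr string) (string, error) {
	c, err := dial(ctx, "tcp", addr)
	if err != nil {
		return "", err
	}
	defer c.Close()
	b, err := io.ReadAll(c)
	return string(b), err
}

// greet wires fetch to an in-memory server that sends one message.
func greet() (string, error) {
	dial := pipeDialer(func(c net.Conn) {
		defer c.Close()
		io.WriteString(c, "hello from memory")
	})
	return fetch(context.Background(), dial, "control.example:443")
}

func main() {
	fmt.Println(greet())
}
```

Production code could pass `(&net.Dialer{}).DialContext` to `fetch` unchanged, which is the kind of substitution a shared function type makes easy.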
Brad Fitzpatrick
c76d075472 nettest, *: add option to run HTTP tests with in-memory network
To avoid ephemeral port / TIME_WAIT exhaustion with high --count
values, and to eventually detect leaked connections in tests. (Later
the memory network will register a Cleanup on the TB to verify that
everything's been shut down)

Updates tailscale/corp#27636

Change-Id: Id06f1ae750d8719c5a75d871654574a8226d2733
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-04-07 11:11:45 -07:00
Brad Fitzpatrick
65c7a37bc6 all: use network less when running in v86 emulator
Updates #5794

Change-Id: I1d8b005a1696835c9062545f87b7bab643cfc44d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-04-02 07:36:04 -07:00
Brad Fitzpatrick
29c2bb1db6 control/controlhttp: reduce some log spam on context cancel
Change-Id: I3ac00ddb29c16e9791ab2be19f454dabd721e4c3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-04-02 07:36:04 -07:00
Brad Fitzpatrick
4c9b37fa2e control/controlhttp: set forceNoise443 on Plan 9
Updates #5794

Change-Id: Idc67082f5d367e03540e1a5310db5b466ee03666
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-04-02 07:36:04 -07:00
Brad Fitzpatrick
5aa1c27aad control/controlhttp: quiet "forcing port 443" log spam
Minimal mitigation that doesn't do the full refactor that's probably
warranted.

Updates #15402

Change-Id: I79fd91de0e0661d25398f7d95563982ed1d11561
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-03-25 14:26:24 -07:00
Jonathan Nobels
52710945f5 control/controlclient, ipn: add client audit logging (#14950)
updates tailscale/corp#26435

Adds client support for sending audit logs to control via /machine/audit-log.
Specifically implements audit logging for user initiated disconnections.

This will require further work to optimize the persistent storage and exclusion
via build tags for mobile:
tailscale/corp#27011
tailscale/corp#27012

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2025-03-12 10:37:03 -04:00
Brad Fitzpatrick
75a03fc719 wgengine/magicsock: use learned DERP route as send path of last resort
If we get a packet in over some DERP and don't otherwise know how to
reply (no known DERP home or UDP endpoint), this makes us use the
DERP connection on which we received the packet to reply. This will
almost always be our own home DERP region.

This is particularly useful for large one-way nodes (such as
hello.ts.net) that don't actively reach out to other nodes, so don't
need to be told the DERP home of peers. They can instead learn the
DERP home upon getting the first connection.

This can also help nodes from a slow or misbehaving control plane.

Updates tailscale/corp#26438

Change-Id: I6241ec92828bf45982e0eb83ad5c7404df5968bc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-03-07 05:37:24 -08:00
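
The fallback order described in the commit above can be sketched with a small lookup table. This is not the magicsock code: the `router` type, its fields, and the path strings are hypothetical, and preferring UDP over a known DERP home is an assumption made for the example. The point is that the learned region is consulted only when nothing else is known.

```go
package main

import "fmt"

// router is a hypothetical, simplified send-path table keyed by peer name.
type router struct {
	udpEndpoint map[string]string // known direct UDP endpoints
	derpHome    map[string]int    // DERP home regions reported by control
	learnedDERP map[string]int    // region on which we last heard from a peer
}

func newRouter() *router {
	return &router{
		udpEndpoint: map[string]string{},
		derpHome:    map[string]int{},
		learnedDERP: map[string]int{},
	}
}

// noteReceived records the DERP region a peer's packet arrived on.
func (r *router) noteReceived(peer string, region int) {
	r.learnedDERP[peer] = region
}

// sendPath tries the known paths first and uses the learned region only
// as a last resort.
func (r *router) sendPath(peer string) (string, bool) {
	if ep, ok := r.udpEndpoint[peer]; ok {
		return "udp:" + ep, true
	}
	if region, ok := r.derpHome[peer]; ok {
		return fmt.Sprintf("derp-%d", region), true
	}
	if region, ok := r.learnedDERP[peer]; ok {
		return fmt.Sprintf("derp-%d (learned)", region), true
	}
	return "", false
}

func main() {
	r := newRouter()
	fmt.Println(r.sendPath("peer1"))
	r.noteReceived("peer1", 2)
	fmt.Println(r.sendPath("peer1"))
}
```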
James Sanderson
45f29a208a control/controlclient,tailcfg,types: remove MaxKeyDuration from NetMap
This reverts most of 124dc10261ea (#10401).

Removing in favour of adding this in CapMaps instead (#14829).

Updates tailscale/corp#16016

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2025-02-14 18:06:23 +00:00
Brad Fitzpatrick
b7f508fccf Revert "control/controlclient: delete unreferenced mapSession UserProfiles"
This reverts commit 413fb5b93311972e3a8d724bb696607ef3afe6f2.

See long story in #14992

Updates #14992
Updates tailscale/corp#26058

Change-Id: I3de7d080443efe47cbf281ea20887a3caf202488
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-02-11 14:53:04 -08:00
Brad Fitzpatrick
9706c9f4ff types/netmap,*: pass around UserProfiles as views (pointers) instead
Smaller.

Updates tailscale/corp#26058 (@andrew-d noticed during this)

Change-Id: Id33cddd171aaf8f042073b6d3c183b0a746e9931
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-02-11 07:12:54 -08:00
Brad Fitzpatrick
0ed4aa028f control/controlclient: flesh out a recently added comment
Updates tailscale/corp#26058

Change-Id: Ib46161fbb2e79c080f886083665961f02cbf5949
2025-01-30 08:48:52 +00:00
Brad Fitzpatrick
ed8bb3b564 control/controlclient: add missing word in comment
Found by review.ai.

Updates #cleanup

Change-Id: Ib9126de7327527b8b3818d92cc774bb1c7b6f974
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2025-01-30 08:48:52 +00:00