Adds a CI check to keep opted-in directories' README.md files in sync
with their package godoc. For now tsnet (and its sub-packages under
tsnet/example) is the only opted-in tree. The list of directories
lives in misc/genreadme/genreadme.go as defaultRoots, so CI and humans
both just run `./tool/go run ./misc/genreadme` with no arguments.
The check piggybacks on the existing go_generate job in test.yml and
fails if any README.md is out of date, pointing the user at the same
command.
Along the way:
- tempfork/pkgdoc now emits Markdown instead of plain text: headings
become level-2 with no {#hdr-...} anchors, and [Symbol] doc links
resolve to pkg.go.dev URLs, including for symbols in the current
package (which the default Printer would otherwise emit as bare
#Name fragments with no backing anchor in a README). Parsing no
longer uses parser.ImportsOnly, so doc.Package knows the package's
symbols and can resolve [Symbol] links at all.
- genreadme also emits a pkg.go.dev Go Reference badge at the top of
a library package's README; suppressed for package main.
- tsnet/tsnet.go's package godoc is expanded in idiomatic godoc
syntax — [Type], [Type.Method], reference-style [link]: URL
definitions — rather than Markdown-flavored [text](url) or
backtick-quoted identifiers, so that both pkg.go.dev and the
generated README.md render cleanly from a single source.
Fixes #19431
Fixes #19483
Fixes #19470
Change-Id: I8ca37e9e7b3bd446b8bfa7a91ac548f142688cb1
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Walter Poupore <walterp@tailscale.com>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Replace Conn.peers (sorted views.Slice) with peersByID, a
map[tailcfg.NodeID]tailcfg.NodeView. The only caller that needed
the sorted slice (the disco message receive path's binary search)
becomes a single map lookup. Drop nodesEqual.
Add Conn.UpsertPeer / Conn.RemovePeer for O(1) single-peer endpoint
work. RemovePeer also performs a targeted single-disco-key cleanup
(previously that scan was O(discoInfo)).
Extract the shared per-peer upsert body as upsertPeerLocked; still
used by SetNetworkMap's bulk path. SetNetworkMap is documented as
the bulk / initial / self-change path; UpsertPeer and RemovePeer
are preferred for single-peer changes.
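A minimal sketch of the per-peer bookkeeping, assuming simplified stand-in types (NodeID, DiscoKey, Node) and a hypothetical per-disco-key reference count; the real Conn tracks much more state and its disco cleanup may differ. It shows why map-keyed upsert/remove is O(1) and why removal only has to touch the removed peer's disco key.

```go
package main

import "fmt"

// NodeID, DiscoKey and Node are simplified stand-ins for tailcfg.NodeID,
// key.DiscoPublic and tailcfg.NodeView.
type NodeID int64
type DiscoKey string

type Node struct {
	ID    NodeID
	Disco DiscoKey
}

// peerSet is a hypothetical reduction of the Conn state described above.
type peerSet struct {
	peersByID map[NodeID]Node
	discoRefs map[DiscoKey]int // how many peers reference each disco key
}

func newPeerSet() *peerSet {
	return &peerSet{peersByID: map[NodeID]Node{}, discoRefs: map[DiscoKey]int{}}
}

// upsertPeer adds or replaces one peer in O(1).
func (s *peerSet) upsertPeer(n Node) {
	if old, ok := s.peersByID[n.ID]; ok {
		if old.Disco == n.Disco {
			s.peersByID[n.ID] = n
			return
		}
		s.releaseDisco(old.Disco) // disco key rotated
	}
	s.peersByID[n.ID] = n
	s.discoRefs[n.Disco]++
}

// removePeer deletes one peer and cleans up only that peer's disco key,
// instead of scanning every disco entry.
func (s *peerSet) removePeer(id NodeID) {
	old, ok := s.peersByID[id]
	if !ok {
		return
	}
	delete(s.peersByID, id)
	s.releaseDisco(old.Disco)
}

func (s *peerSet) releaseDisco(k DiscoKey) {
	s.discoRefs[k]--
	if s.discoRefs[k] <= 0 {
		delete(s.discoRefs, k)
	}
}

func main() {
	s := newPeerSet()
	s.upsertPeer(Node{ID: 1, Disco: "a"})
	s.upsertPeer(Node{ID: 2, Disco: "b"})
	s.removePeer(1)
	_, ok := s.peersByID[2] // lookup by node ID is a single map index
	fmt.Println(len(s.peersByID), len(s.discoRefs), ok) // 1 1 true
}
```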
Make the relay server set update O(1) per peer: add serverUpsertCh
/ serverRemoveCh to relayManager with matching run-loop handlers.
UpsertPeer / RemovePeer evaluate the per-peer relay predicate
locally and dispatch upsert or remove. The full-rebuild
updateRelayServersSet stays for the initial netmap, filter
changes, and fallback.
Move the hasPeerRelayServers atomic from Conn onto relayManager,
next to the serversByNodeKey map it summarizes. The run loop is
now the single writer and needs no back-pointer to Conn;
endpoint's two hot-path readers take one extra hop to
de.c.relayManager.hasPeerRelayServers but the cost is the same
atomic load.
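A sketch of the single-writer pattern, assuming string keys in place of key.NodePublic and a hypothetical countCh hook for observing the loop; it is not the real relayManager. Only the run loop mutates the map and stores the atomic, so hot-path readers need nothing more than an atomic load.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// nodeKey is a stand-in for key.NodePublic.
type nodeKey string

type relayManager struct {
	serverUpsertCh chan nodeKey
	serverRemoveCh chan nodeKey
	countCh        chan chan int // hypothetical hook: ask the loop for the map size
	stopCh         chan struct{}

	hasPeerRelayServers atomic.Bool // written only by run; read lock-free elsewhere
}

func newRelayManager() *relayManager {
	r := &relayManager{
		serverUpsertCh: make(chan nodeKey),
		serverRemoveCh: make(chan nodeKey),
		countCh:        make(chan chan int),
		stopCh:         make(chan struct{}),
	}
	go r.run()
	return r
}

func (r *relayManager) run() {
	serversByNodeKey := map[nodeKey]struct{}{} // owned by this goroutine only
	for {
		select {
		case k := <-r.serverUpsertCh:
			serversByNodeKey[k] = struct{}{}
		case k := <-r.serverRemoveCh:
			delete(serversByNodeKey, k)
		case reply := <-r.countCh:
			reply <- len(serversByNodeKey)
			continue
		case <-r.stopCh:
			return
		}
		// The summary bit is updated right after every map change.
		r.hasPeerRelayServers.Store(len(serversByNodeKey) > 0)
	}
}

func (r *relayManager) count() int {
	reply := make(chan int)
	r.countCh <- reply
	return <-reply
}

func main() {
	r := newRelayManager()
	defer close(r.stopCh)
	r.serverUpsertCh <- "relay1"
	fmt.Println(r.count(), r.hasPeerRelayServers.Load()) // 1 true
	r.serverRemoveCh <- "relay1"
	fmt.Println(r.count(), r.hasPeerRelayServers.Load()) // 0 false
}
```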
No callers use UpsertPeer/RemovePeer yet; a subsequent change will
plumb per-peer add/remove through the incremental map update path.
Updates #12542
Change-Id: If6a3442fe29ccbd77890ea61b754a4d1ad6ef225
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Verifies that site-to-site Tailscale subnet routing with
--snat-subnet-routes=false preserves the original source IP
end-to-end.
Topology: two sites, each with a Linux subnet router on a NATted WAN
plus an internal LAN, and a non-Tailscale backend on each LAN. Backends
are given static routes pointing to their local subnet router for the
remote site's prefix; an HTTP GET from backend-a to backend-b over
Tailscale returns a body containing backend-a's LAN IP.
Adds the supporting vmtest.SNATSubnetRoutes NodeOption and plumbs
snat-subnet-routes through TTA's /up handler. The webserver started by
vmtest.WebServer now also echoes the remote IP, for the preservation
assertion.
Adds a /add-route TTA endpoint (Linux-only for now) and a vmtest
Env.AddRoute helper so the test can install the backend static routes
through TTA rather than needing a host SSH key and debug NIC.
ensureGokrazy now always rebuilds the natlab qcow2 (once per test
process, via sync.Once) so the test picks up the new TTA and webserver
behavior.
This is pulled out of a larger pending change that adds FreeBSD
site-to-site subnet routing support; figured we should have at least
the Linux test covering what works today.
Updates #5573
Change-Id: I881c55b0f118ac9094546b5fbe68dddf179bb042
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Exposes a local port on the tailnet under a chosen hostname. Raw TCP by
default; --http or --https reverse-proxy with Tailscale-User-* identity
headers from WhoIs, matching tailscaled's serve header conventions.
Useful as a one-shot to put a dev server on the tailnet.
Fixes #19467
Change-Id: I79f63cfbbedf7e40cf0f1f51cbae8df86ae90cdf
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Remove the remaining known sources of flakiness in TestStateMachine and
TestStateMachineSeamless.
Updates tailscale/corp#36230
Updates #19377
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
For use in parallelizing go:generate up-to-date checks.
Updates tailscale/corp#28679
Change-Id: Ifc31c56de4225ba2e0fc048b0f18974dc2f2fc82
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And use it to allow overwrites of old address assignments in the conn25 client.
The magic and transit address pools from which the addresses come are limited
resources and we want to reuse them. This commit is a small part of that bigger
need.
We expect to follow soon:
* Extending expiry if assignments are still in use.
* Returning expired addresses back to the pools so they can be reallocated.
Updates tailscale/corp#39975
Signed-off-by: Fran Bull <fran@tailscale.com>
addrAssignments is a table of addrs with lookup indices, representing
the assignments of magic+destination+transit IP addresses the client has
made due to the domain being routed because of an app.
byConnKey is a map of node public key to prefixes of transit IPs, so it
is associated with that data but is not the data itself, and can be its
own thing.
Updates tailscale/corp#39975
Signed-off-by: Fran Bull <fran@tailscale.com>
also port pkgdoc into the tempfork folder
git rev from corp at the time this copy was made:
- e909fc93595414c90ff1339cece7c84500ab3c36
Updates #19470
Change-Id: I3d98d82020a2b336647b795210dcb7065dfa44d7
Change-Id: Ie63141860b76dd2d5ae3ff52f8a4bcdf6106421e
Signed-off-by: Walter Poupore <walterp@tailscale.com>
When the repo is checked out as a nested worktree, a go.work in the
outer tree hijacks module resolution, which makes the rebuild fail
with "main module does not contain package." Set GOWORK=off for the
build since the hook is self-contained.
Bumps HOOK_VERSION so existing installs pick up the fix.
Updates #cleanup
Change-Id: Ibd14849efc26e4e1893c5b8e300caa71573f54bd
Signed-off-by: Fernando Serboncini <fserb@fserb.com.br>
TestEncodeAndUploadMessages waited on the default 2s FlushDelay,
making the logtail package the slowest non-integration test in
the tree (~2s real time). Switch the shared harness from an
httptest.Server-on-loopback to a memnet.Listener-backed *http.Server
and run the tests inside synctest.Test, so fake time advances the
flush timer instantly.
Drops the net/http/httptest dependency from these tests. Combined
with the TestMain non-localhost dial guard added in the previous
commit, no test in this package can accidentally reach the real
log.tailscale.com server. Whole package now runs in ~7ms.
Updates tailscale/corp#28679
Change-Id: Ie0e7a6a79641384ed0eecb99d767e17cda8bb944
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
NewLogger unconditionally writes a "logtail started" banner before
it returns, which callers that later call Logger.SetEnabled(false)
have no way to suppress: the banner is already buffered for upload
by the time the caller gets the logger back.
Add Config.Disabled so callers that know up front they want the
logger to start disabled (e.g. Android's remote-logging opt-out)
can seed the state before NewLogger's internal Write. The process-
wide Disable kill switch still takes precedence; SetEnabled can
still flip the state at runtime.
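A minimal sketch of the intended semantics with hypothetical reduced types (not the real logtail API): Config.Disabled seeds the per-logger flag before the banner is written, SetEnabled flips it later, and the process-wide switch always wins.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// disableAll models the process-wide kill switch: once set, it stays set.
var disableAll atomic.Bool

// Config and Logger are hypothetical reductions of the logtail types.
type Config struct {
	Disabled bool // start with uploads off, before the banner is written
}

type Logger struct {
	disabled atomic.Bool
	pending  []string // stand-in for the upload buffer (not goroutine-safe here)
}

func NewLogger(cfg Config) *Logger {
	l := &Logger{}
	l.disabled.Store(cfg.Disabled) // seed state before the first Write
	l.Write("logtail started")
	return l
}

// SetEnabled flips this logger at runtime; it cannot override disableAll.
func (l *Logger) SetEnabled(on bool) { l.disabled.Store(!on) }

// Write drops entries, without buffering them, while either switch is off.
func (l *Logger) Write(s string) {
	if disableAll.Load() || l.disabled.Load() {
		return
	}
	l.pending = append(l.pending, s)
}

func main() {
	l := NewLogger(Config{Disabled: true})
	fmt.Println(len(l.pending)) // 0: the banner was never buffered
	l.SetEnabled(true)
	l.Write("hello")
	fmt.Println(l.pending) // [hello]
}
```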
Updates #13174
Updates tailscale/tailscale-android#695
Change-Id: Icc4fa88c198447cf0faa707264dac84e359fe52c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Revert to the previous format (including
the "svc:" prefix in the map's keys).
Note that the /services endpoint in localapi, along
with any software that relies on it, is unreleased,
so this does not break any clients.
Updates tailscale/corp#40052
Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
TestStateMachine & TestStateMachineSeamless both flake a lot asserting the
"Shutdown" call on cc after a Logout. This is because Shutdown is called on
a goroutine to avoid a deadlock if it's called while holding the
LocalBackend lock (#18052).
This fixes that cause of flakes by waiting for LocalBackend's goroutine
tracker to have no goroutines running (so the goroutine that calls Shutdown
must have finished).
This does not make TestStateMachine non-flaky because it can flake later in
the test, too: the assertion on "unpause" after clearing the netmap between
"Start4" and "Start4 -> netmap" sometimes fails.
Updates tailscale/corp#36230
Updates #19377
Updates #18052
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Update this log message to show both the local and remote TKA HEAD; this
is useful for debugging issues on nodes that have fallen behind the
remote TKA HEAD.
Updates tailscale/corp#39455
Change-Id: Ia62ce15756180d2fbac4a898fb94d6143df08b54
Signed-off-by: Alex Chan <alexc@tailscale.com>
LocalBackend stores loginFlags at construction so that per-instance
properties (e.g. LoginEphemeral set by tsnet.Server.Ephemeral) persist
for the session. StartLoginInteractiveAs already merges b.loginFlags
into its cc.Login call, but the two auto-login call sites pass bare
controlclient.LoginDefault, silently dropping any stored flags.
Merge b.loginFlags at both auto-login call sites to match the existing
StartLoginInteractiveAs pattern. LoginDefault is zero so this is a
no-op when loginFlags is empty, and restores the documented behavior
when it isn't.
Fixes #15852
Signed-off-by: Scott Graham <scott.github@h4ck3r.net>
Updates the format of the service map that is served over
the local api to be keyed without the "svc:" prefix. This
change is backwards incompatible; this is OK because there
is only one tailnet with the services-in-nodecapmap feature
flag enabled, and the client side changes that start showing
services over local api have not been released. (These were
added in 4fcce6000d3d3f79d1ac1fca571a50efb059cbf2).
Updates tailscale/corp#40052
Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
Pull the hook logic into a reusable githook library package so
tailscale/corp can share it via a thin wrapper main instead of
keeping a forked copy in sync.
The install flow also changes: wrapper scripts now build the
binary and reinstall the git hooks. Pulling new shared code no
longer requires re-running the installer.
Updates tailscale/corp#39860
Change-Id: I4d606d11c8c883015c190c54e3387a7f9fe4dd32
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Callers that need to turn logtail uploads on and off in response to
user preference or policy changes previously had no choice: the
package-level Disable is a one-way kill switch intended for the
controlplane DisableLogTail debug message, and requires a process
restart to undo.
Add a per-Logger disabled flag, toggled via SetEnabled, that drops
incoming entries without buffering while disabled. The process-wide
Disable still takes precedence, so a controlplane-issued kill switch
cannot be overridden by a client setting it back on.
To simplify https://github.com/tailscale/tailscale-android/pull/695
Updates #13174
Change-Id: I06e75bd719c851f5f837ca5b2d1e17f7c68355f0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Currently, clientupdate.NewUpdater().Update() is called directly inside tailscaled, which fatals. There is also a failure that doesn't return, causing a panic.
This fix allows us to use the same approach as startAutoUpdate, which is to find tailscale.exe and run tailscale.exe --update, though since it's calling the updater library directly, we get progress messages.
Fixes tailscale/corp#40430
Signed-off-by: kari-ts <kari@tailscale.com>
This change adds setup for a second tailnet to enable multi-tailnet e2e
tests. When running against devcontrol, a second tailnet is created via the
API. Otherwise, credentials are read from SECOND_TS_API_CLIENT_SECRET.
Also adds an l7 HA Ingress test for multi-tailnet.
Fixes tailscale/corp#37498
Signed-off-by: Becky Pauley <becky@tailscale.com>
The cloner's codegen for map[K][]*V fields was doing a shallow
append (copying pointer values) instead of cloning each element.
This meant that cloned structs aliased the original's pointed-to
values through the map's slice entries.
Mirror the existing standalone-slice logic that checks
ContainsPointers(sliceType.Elem()) and generates per-element
cloning for pointer, interface, and struct types.
Regenerate net/dns and tailcfg which both had affected
map[...][]*dnstype.Resolver fields.
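A sketch of the difference for a map[string][]*Resolver field, using a local Resolver stand-in with a hand-written Clone method; actual cloner output may be structured differently. A shallow copy (roughly dst[k] = append([]*Resolver(nil), v...)) copies each slice's pointers, so both maps share the same Resolver values; the deep version below clones each element.

```go
package main

import "fmt"

// Resolver is a stand-in for dnstype.Resolver.
type Resolver struct {
	Addr string
}

// Clone copies the pointee; nil stays nil.
func (r *Resolver) Clone() *Resolver {
	if r == nil {
		return nil
	}
	c := *r
	return &c
}

// cloneRoutes deep-copies the map, its slices, and each pointed-to Resolver.
func cloneRoutes(src map[string][]*Resolver) map[string][]*Resolver {
	if src == nil {
		return nil
	}
	dst := make(map[string][]*Resolver, len(src))
	for k, v := range src {
		if v == nil {
			dst[k] = nil
			continue
		}
		s := make([]*Resolver, len(v))
		for i, r := range v {
			s[i] = r.Clone() // clone each element, not just the pointer
		}
		dst[k] = s
	}
	return dst
}

func main() {
	src := map[string][]*Resolver{"example.com.": {{Addr: "1.1.1.1"}}}
	dst := cloneRoutes(src)
	dst["example.com."][0].Addr = "8.8.8.8"
	fmt.Println(src["example.com."][0].Addr) // 1.1.1.1
}
```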
Fixes #19284
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
Expose priorityClassName in the operator Helm chart values so that
users can configure the operator deployment with a Kubernetes
PriorityClass. This prevents the operator pods from being preempted
by lower-priority workloads.
Fixes #19235
Signed-off-by: Bjorn Stange <bjorn.stange@expel.io>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The package updates started getting really slow yesterday. We can do
better, but attempt a band-aid fix for now, as the test is failing about
a third of the time on PR CI.
Updates tailscale/corp#40465
Change-Id: Icf53292ba83dd1ed76b9bdf9fb94a8f6fb448c07
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Add a new control/tsp package providing a client for speaking the
Tailscale protocol to a coordination server over Noise, along with a
cmd/tsp binary exposing it as a low-level composable tool for
generating keys, registering nodes, and issuing map requests.
Previously developed out-of-tree at github.com/bradfitz/tsp; imported
here without git history.
Updates #12542
Change-Id: I6ad21143c4aefe8939d4a46ae65b2184173bf69f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
modifying DNS responses for domains they are also connectors for
For Connectors 2025, determine if a client is configured as a
connector and what domains it is a connector for. When acting as a
client, don't install Split DNS routes to other connectors for those
domains, and don't alter DNS responses for those domains. The responses
are forwarded back to the original client, which in turn does the alteration,
swapping the real IP for a Magic IP.
A client is also a connector for a domain if it has tags that overlap
with tags in the configured policy, and --advertise-connector=true
in the prefs (not in the self-node Hostinfo from the netmap). We use the prefs
as the source of truth because control only gets a copy from the prefs, and
may drift. And the AppConnector field is currently zeroed out in the
self-node Hostinfo from control.
The extension adds a ProfileStateChange hook to process prefs changes,
and the config type is split into prefs and nodeview sub-configs.
Fixes tailscale/corp#39317
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
Before:
tka initialized at head 325557575a59525354484e4a534f494b4c4e56575435583737564b5036584c4d4c335534554255344c344c36484c5a444a323341
After:
tka initialized at head 2UWWZYRSTHNJSOIKLNVWT5X77VKP6XLML3U4UBU4L4L6HLZDJ23A
Printing the AUM hash as hex makes it difficult to compare to other AUM
hashes; stringifying it will make it consistent with other printing.
Updates #cleanup
Change-Id: Ic1e23a9ce6a71a53cff7d2190f9fa06eb838ab89
Signed-off-by: Alex Chan <alexc@tailscale.com>
Endpoint's best address was cleared on trustBestAddrUntil expiry
only if it was a udprelay connection. This generalizes invalidation
to also cover direct UDP.
The trust deadline is checked in two cases:
1. On disco ping timeout from the endpoint's best address: traffic
goes DERP-only and heartbeats to the old address stop. The discovery
pings are still in flight and are handled by case 2.
2. On disco ping success from an alternative path: bestAddr switches
to the working path, trust is refreshed, and eager discovery stops.
The still-in-flight pongs are handled by betterAddr().
Updates #19407
Change-Id: Ic41ed18edb4a6e4350a2d49271ba01566a6a6964
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
TestUsedConsistently shells out to git grep to find forbidden
http.Method* uses across the repo. Since the test itself doesn't
open any repo files, Go's test cache considers it unchanged
between commits and serves stale passing results even when new
violations are introduced.
Fix by opening .git/index, which makes Go's test cache track it
as an input. The index file changes on git reset, checkout, pull,
etc., so the cache is properly invalidated when moving between
commits.
Updates tailscale/corp#40359
Change-Id: If1497b992a545351bdd68cff279d60f5591fe70b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit modifies the `DNSConfig` custom resource to allow the
user to specify affinity rules on the nameserver pods.
Updates: https://github.com/tailscale/tailscale/issues/18556
Signed-off-by: David Bond <davidsbond93@gmail.com>
Fixes tailscale/corp#39422
Updates tailscale/certstore for proper macOS support and
builds the request signing support into macOS builds. iOS and builds
that do not use cgo are omitted.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
For debugging purposes, unstable builds will sometimes intentionally panic for
unexpected behaviours. We observed such a panic after loading a cached netmap,
but because we had a valid cached map, the client was unable to recover on its
own and the operator had to manually reset the cache.
As a defensive hedge, when netmap caching is enabled, check for a panic during
installation of a new network map: If one occurs, discard any cached netmaps
before letting the panic unwind, so that we do not lose the panic itself, but
reduce the need for manual intervention.
Updates #12639
Updates tailscale/corp#27300
Change-Id: I0436889c6bdc2fa728c9cb83630cd7b00a72ce68
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
If we get a 429 response during node registration, use the `Retry-After`
header for backoff instead of the regular exponential backoff.
The rate limiter error is propagated to the user, just like other
registration errors are, e.g.
```
$ tailscale up
backend error: node registration rate limited; will retry after 57s
exit status 1
```
Updates tailscale/corp#39533
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
By adding a server-global parent bucket. Per-client rate limiting is
subject to the parent bucket if global rate limiting is enabled.
This implementation is experimental, and all related APIs should be
considered unstable.
Updates tailscale/corp#40291
Signed-off-by: Jordan Whited <jordan@tailscale.com>
* kube/authkey,cmd/containerboot: extract shared auth key reissue package
Move auth key reissue logic (set marker, wait for new key, clear marker,
read config) into a shared kube/authkey package and update containerboot
to use it. No behaviour change.
Updates #14080
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
* kube/authkey,kube/state,cmd/containerboot: preserve device_id across restarts
Stop clearing device_id, device_fqdn, and device_ips from state on startup.
These keys are now preserved across restarts so the operator can track
device identity. Expand ClearReissueAuthKey to clear device state and
tailscaled profile data when performing a full auth key reissue.
Updates #14080
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
* cmd/containerboot: use root context for auth key reissue wait
Pass the root context instead of bootCtx to setAndWaitForAuthKeyReissue.
The 60-second bootCtx timeout was cancelling the reissue wait before the
operator had time to respond, causing the pod to crash-loop.
Updates #14080
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
* cmd/k8s-proxy: add auth key renewal support
Add auth key reissue handling to k8s-proxy, mirroring containerboot.
When the proxy detects an auth failure (login-state health warning or
NeedsLogin state), it disconnects from control, signals the operator
via the state Secret, waits for a new key, clears stale state, and
exits so Kubernetes restarts the pod with the new key.
A health watcher goroutine runs alongside ts.Up() to short-circuit
the startup timeout on terminal auth failures.
Updates #14080
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
---------
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Add an opt-in metrics.LabelMap tracking why patchifyPeer fails to
convert a PeersChanged entry into a PeersChangedPatch. The stats are
gated behind the TS_DEBUG_PATCHIFY_PEER_MISS envknob so there is zero
overhead in normal operation.
peerChangeDiff now takes an optional onFalse callback that is called
with the field name on every non-patchable return path. When the
envknob is off, nil is passed and replaced with a no-op at the top of
peerChangeDiff.
The resulting metric renders as:
counter_patchify_miss{why="Hostinfo"} 2
counter_patchify_miss{why="peer_not_found"} 1170
Updates tailscale/corp#40088
Change-Id: I2d4b9074bf42ec03ab296c0629a54106bafa873e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
On some nodes (found via natlab), the existing node's last-seen time
could be unset. In these cases, we want to accept the key and write a
last-seen time. This was breaking the cached netmap natlab tests.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
pickPort would bind a UDP socket on :0 to get a free port, close
the socket, then hope to rebind to the same port in NewConn. This
is a TOCTOU race that can cause flaky test failures when another
process grabs the port in between.
Instead, pass Port: 0 to NewConn and let the OS assign the port
atomically, then read back the assigned port via conn.LocalPort().
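The same idea with the standard library: bind to port 0 and read the kernel-assigned port back from the live socket, so there is no window in which another process can take it. This uses net.ListenUDP directly rather than magicsock's NewConn.

```go
package main

import (
	"fmt"
	"net"
)

// listenEphemeralUDP binds to port 0 and returns the port the OS chose.
func listenEphemeralUDP() (*net.UDPConn, int, error) {
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1), Port: 0})
	if err != nil {
		return nil, 0, err
	}
	return conn, conn.LocalAddr().(*net.UDPAddr).Port, nil
}

func main() {
	conn, port, err := listenEphemeralUDP()
	if err != nil {
		panic(err)
	}
	defer conn.Close() // keep the socket open for as long as the port is needed
	fmt.Println("bound to port", port)
}
```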
Fixes #19409
Change-Id: Ie44b599fb93c361e29a05f2171ad747c46f82b7a
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Clients with the newly added node attribute
`"disable-linux-cgnat-drop-rule"` will not automatically drop inbound
traffic on non-Tailscale network interfaces with the source IP in the
CGNAT IP range. This is an initial proof-of-concept for enabling
connectivity with off-Tailnet CGNAT endpoints.
Fixes tailscale/corp#36270.
Signed-off-by: Naman Sood <mail@nsood.in>
reflect.DeepEqual is expensive and allocates heavily. Replace it with
a field-by-field comparison that does zero allocations.
Adds tests and benchmarks for the new Equal method.
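A sketch of the technique on a hypothetical struct, since the commit doesn't name the type. One behavioral difference worth checking in the real method: slices.Equal treats nil and empty slices as equal, whereas reflect.DeepEqual does not.

```go
package main

import (
	"fmt"
	"slices"
	"testing"
)

// exampleConfig is hypothetical; the commit doesn't name the type that gained Equal.
type exampleConfig struct {
	Name    string
	Port    uint16
	Enabled bool
	Addrs   []string
}

// Equal compares field by field without allocating.
func (a *exampleConfig) Equal(b *exampleConfig) bool {
	if a == nil || b == nil {
		return a == b
	}
	return a.Name == b.Name &&
		a.Port == b.Port &&
		a.Enabled == b.Enabled &&
		slices.Equal(a.Addrs, b.Addrs)
}

func main() {
	a := &exampleConfig{Name: "x", Port: 443, Addrs: []string{"10.0.0.1"}}
	b := &exampleConfig{Name: "x", Port: 443, Addrs: []string{"10.0.0.1"}}
	allocs := testing.AllocsPerRun(1000, func() { _ = a.Equal(b) })
	fmt.Println(a.Equal(b), allocs) // true 0
}
```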
Fixes #19363
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Fix a panic in getOrCreateChain when the kernel lacks nftables support
(CONFIG_NF_TABLES). When the nftables netlink connection fails, chain
objects returned by getChainFromTable can have nil Hooknum and Priority
fields. Dereferencing these caused tailscaled to SIGSEGV during router
configuration, which manifested as tailscaled silently crashing ~13
seconds after "tailscale up" on arm64 gokrazy (whose kernel.arm64
build doesn't include nftables).
Updates #13038
Change-Id: I14433616da5ed57895cad37038921fb4f79c3534
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Use linkat via /proc/self/fd with AT_SYMLINK_FOLLOW to create a
hardlink of the test binary instead of copying it. This avoids
copying ~50MB+ binaries into each test's temp directory, making
test setup faster and reducing disk I/O.
The simpler os.Link(b.Path, ret.Path) can't be used here because
the source binary lives in the first test's TempDir, which may be
cleaned up before later tests call CopyTo. The open FD keeps the
inode alive after the path is deleted, but os.Link needs a valid
path. (See also b9f468240f which tried os.Link but is racy for
this reason.)
The /proc/self/fd approach works without elevated privileges,
unlike AT_EMPTY_PATH which requires CAP_DAC_READ_SEARCH. If the
linkat fails for any reason (e.g. cross-filesystem temp dirs), it
falls back to the existing full-copy path.
Fixes #19397
Change-Id: I4b1f97f7e63a9ae9e09dce36dfbdd1f6cff92320
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The kernel version parser used strings.Cut with "-" to handle versions
like "5.4.0-76-generic", but Debian uses "+" in versions like
"6.12.41+deb13-amd64".
Use strings.IndexAny to find the first "-" or "+" and truncate there.
Fixes TestKernelVersion on Debian systems.
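A sketch of the parsing approach, using a hypothetical parseKernelVersion helper rather than the actual function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelVersion extracts major.minor[.patch] from a uname release
// string, cutting at the first '-' or '+' so both Ubuntu-style
// ("5.4.0-76-generic") and Debian-style ("6.12.41+deb13-amd64") suffixes
// are ignored.
func parseKernelVersion(release string) (major, minor, patch int, ok bool) {
	if i := strings.IndexAny(release, "-+"); i >= 0 {
		release = release[:i]
	}
	parts := strings.Split(release, ".")
	if len(parts) < 2 || len(parts) > 3 {
		return 0, 0, 0, false
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return 0, 0, 0, false
		}
		nums[i] = n
	}
	return nums[0], nums[1], nums[2], true
}

func main() {
	for _, r := range []string{"5.4.0-76-generic", "6.12.41+deb13-amd64"} {
		major, minor, patch, ok := parseKernelVersion(r)
		fmt.Println(r, major, minor, patch, ok)
	}
}
```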
Fixes #19395
Change-Id: I70e5f95682d54baf908e51f9f4b51c130b00aaaa
Co-Authored-By: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
The compare-metrics-stats subtest reset two independent counting
systems (physical connection counters and expvar.Int user metrics)
non-atomically. Background WireGuard keepalives arriving between the
resets could increment one system but not the other, causing
off-by-one packet/byte mismatches in either direction.
Replace the reset-then-compare pattern with snapshot-and-delta:
snapshot both systems before pings, snapshot again after, and compare
the deltas. This eliminates the non-atomic reset window entirely.
As a belt-and-suspenders safety net, tolerate a difference of exactly
one packet (and corresponding bytes) from a stray keepalive that
could still arrive in the narrow window between the two snapshots.
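A sketch of the comparison, assuming each system exposes cumulative packet and byte totals and that the caller supplies an upper bound on one stray packet's size. maxPktBytes is an assumed parameter; the commit only says the bytes must correspond.

```go
package main

import "fmt"

// counters is one system's cumulative packet and byte totals.
type counters struct{ Packets, Bytes int64 }

func (c counters) sub(o counters) counters {
	return counters{Packets: c.Packets - o.Packets, Bytes: c.Bytes - o.Bytes}
}

// deltasMatch compares how much two independent counting systems advanced
// over the same interval. Nothing is reset; each side is snapshotted before
// and after. A difference of exactly one packet is tolerated if the byte
// difference points the same way and is at most maxPktBytes.
func deltasMatch(physBefore, physAfter, userBefore, userAfter counters, maxPktBytes int64) bool {
	phys := physAfter.sub(physBefore)
	user := userAfter.sub(userBefore)
	dp := phys.Packets - user.Packets
	db := phys.Bytes - user.Bytes
	switch dp {
	case 0:
		return db == 0
	case 1, -1:
		// db*dp > 0 means the bytes moved in the same direction as the packet.
		return db*dp > 0 && db*dp <= maxPktBytes
	default:
		return false
	}
}

func main() {
	z := counters{}
	// One stray keepalive seen only by the physical counters.
	fmt.Println(deltasMatch(z, counters{Packets: 11, Bytes: 1032}, z, counters{Packets: 10, Bytes: 1000}, 100)) // true
}
```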
flakestress passes with ~5900 runs (~2800 without -race, ~3100 with
-race), but it passed before this change too. This is an annoying one
to repro.
Fixes #11762
Change-Id: I3447ad67e71c8146e85eed38b7a665033ef9e284
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The test had two problems:
1. runFileWatcher passed hardcoded "/etc/" to the inotify watcher,
but the test filesystem uses a temp directory prefix. The watcher
was watching the real /etc/, never seeing the test's file writes.
2. The test's watchFile used gonotify.NewDirWatcher which creates
goroutines that block on real inotify syscalls. These don't work
inside synctest's fake-time bubble. The test only passed standalone
by accident: gonotify walks /etc/ on startup producing fake events
that happened to trigger trample detection at the right time.
Fix the path issue by adding ActualPath to the wholeFileFS interface,
which translates logical paths (like "/etc/resolv.conf") to real
filesystem paths (respecting any test prefix). Use it in
runFileWatcher so the inotify watch targets the correct directory.
Replace gonotify in the test with a one-shot timer that synctest can
advance through fake time, reliably triggering the trample check.
Fixes #19400
Change-Id: Idb252881ec24d0ab3b3c1d154dbdaf532db837d4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The previous filters allowed a handful of subtle issues, such as
updating the last-seen date when neither the key nor the online status
had changed, and making online keys unconditionally trigger an engine
update. These have been fixed, alongside making no-change updates from
TSMP a no-op for the engine so we don't have to reconfigure.
A bunch of additional testing has been added as well.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>