This improves our test coverage of the Bootstrap() method, especially
around catching AUMs that shouldn't pass validation.
Updates #cleanup
Change-Id: Idc61fcbc6daaa98c36d20ec61e45ce48771b85de
Signed-off-by: Alex Chan <alexc@tailscale.com>
Previously, if users attempted to expose a headless Service to the tailnet,
this silently did not work.
This PR makes the operator emit a warning event and update the Service's
status with an error message.
Updates #18139
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The event queue receives deleted events, which means that sometimes
the object that should be reconciled no longer exists.
Don't log user-facing errors in that case.
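Not the operator's actual code, but a generic controller-runtime sketch of the pattern (the reconciler type and object kind here are placeholder assumptions):
```go
package operator

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

type reconciler struct {
	client.Client
}

func (r *reconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	var svc corev1.Service
	if err := r.Get(ctx, req.NamespacedName, &svc); err != nil {
		if apierrors.IsNotFound(err) {
			// The object was deleted after the event was queued;
			// this is expected, so don't log a user-facing error.
			return reconcile.Result{}, nil
		}
		return reconcile.Result{}, err
	}
	// ... reconcile the Service as usual ...
	return reconcile.Result{}, nil
}
```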
Updates #18141
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The service was starting after systemd itself, and while this
surprisingly worked in some situations, it broke in others.
Change it to start after a GUI has been initialized.
Updates #17656
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Previously we only set this when it updated, which was fine for the first
call to Start(), but after that point future updates would be skipped if
nothing had changed. If Start() was called again, it would wipe the peer API
endpoints and they wouldn't get added back again, breaking exit nodes (and
anything else requiring peer API to be advertised).
Updates tailscale/corp#27173
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Based on PR #16700 by @lox, adapted to current codebase.
Adds support for proxying HTTP requests to Unix domain sockets via
tailscale serve unix:/path/to/socket, enabling exposure of services
like Docker, containerd, PHP-FPM over Tailscale without TCP bridging.
The implementation includes reasonable protections against exposure of
tailscaled's own socket.
Adaptations from original PR:
- Use net.Dialer.DialContext instead of net.Dial for context propagation
- Use http.Transport with Protocols API (current h2c approach, not http2.Transport)
- Resolve conflicts with hasScheme variable in ExpandProxyTargetValue
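As a rough standalone sketch of the approach (the socket path and listen address below are placeholders, not the actual serve implementation):
```go
package main

import (
	"context"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Dial the backend over a Unix socket, ignoring the host/port derived
	// from the request URL; DialContext propagates cancellation.
	tr := &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", "/path/to/socket")
		},
	}
	// Go 1.24+ Protocols API: allow HTTP/1.1 and unencrypted HTTP/2 (h2c)
	// without wiring up http2.Transport directly.
	var protos http.Protocols
	protos.SetHTTP1(true)
	protos.SetUnencryptedHTTP2(true)
	tr.Protocols = &protos

	proxy := &httputil.ReverseProxy{
		Rewrite: func(r *httputil.ProxyRequest) {
			// The host is a dummy; DialContext above decides the target.
			r.SetURL(&url.URL{Scheme: "http", Host: "unix"})
		},
		Transport: tr,
	}
	http.ListenAndServe("127.0.0.1:8080", proxy)
}
```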
Updates #9771
Signed-off-by: Peter A. <ink.splatters@pm.me>
Co-authored-by: Lachlan Donald <lachlan@ljd.cc>
If a packet arrives while WireGuard is being reconfigured with b.mu held, such as during a profile switch,
calling back into (*LocalBackend).GetPeerAPIPort from (*Wrapper).filterPacketInboundFromWireGuard
may deadlock when it tries to acquire b.mu.
This occurs because a peer cannot be removed while an inbound packet is being processed.
The reconfig and profile switch wait for (*Peer).RoutineSequentialReceiver to return, but it never finishes
because GetPeerAPIPort needs b.mu, which the waiting goroutine already holds.
In this PR, we make peerAPIPorts a new syncs.AtomicValue field that is written with b.mu held
but can be read by GetPeerAPIPort without holding the mutex, which fixes the deadlock.
There might be other long-term ways to address the issue, such as moving peer API listeners
from LocalBackend to nodeBackend so they can be accessed without holding b.mu,
but these changes are too large and risky at this stage in the v1.92 release cycle.
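The shape of the fix, sketched with the standard library's generic atomic pointer standing in for syncs.AtomicValue (field and method names simplified, not the real LocalBackend):
```go
package main

import (
	"sync"
	"sync/atomic"
)

type backend struct {
	mu sync.Mutex // serializes writers

	// peerAPIPorts is written with mu held but read lock-free, so a
	// packet-path callback running during a reconfig cannot deadlock on mu.
	peerAPIPorts atomic.Pointer[[]uint16]
}

func (b *backend) setPeerAPIPorts(ports []uint16) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.peerAPIPorts.Store(&ports)
}

func (b *backend) getPeerAPIPorts() []uint16 {
	if p := b.peerAPIPorts.Load(); p != nil {
		return *p
	}
	return nil
}
```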
Updates #18124
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Previously, callers of (*LocalBackend).resetControlClientLocked were supposed
to call Shutdown on the returned controlclient.Client after releasing b.mu.
In #17804, we started calling Shutdown while holding b.mu, which caused
deadlocks during profile switches due to the (*ExecQueue).RunSync implementation.
We first patched this in #18053 by calling Shutdown in a new goroutine,
which avoided the deadlocks but made TestStateMachine flaky because
the shutdown order was no longer guaranteed.
In #18070, we updated (*ExecQueue).RunSync to allow shutting down
the queue without waiting for RunSync to return. With that change,
shutting down the control client while holding b.mu became safe.
Therefore, this PR updates (*LocalBackend).resetControlClientLocked
to shut down the old client synchronously during the reset, instead of
returning it and shifting that responsibility to the callers.
This fixes the flaky tests and simplifies the code.
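Roughly, the change has this shape (types simplified, not the actual signatures):
```go
package main

import "sync"

type controlClient interface{ Shutdown() }

type localBackend struct {
	mu sync.Mutex
	cc controlClient
}

// resetControlClientLocked now shuts the old client down synchronously
// while b.mu is held, rather than returning it for the caller to shut down.
func (b *localBackend) resetControlClientLocked() {
	if b.cc != nil {
		b.cc.Shutdown() // safe under b.mu after the #18070 RunSync change
		b.cc = nil
	}
}
```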
Fixes #18052
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This commit uses SO_REUSEPORT (when supported) to bind multiple sockets
per address family. Increasing the number of sockets can increase
aggregate throughput when serving many peer relay client flows.
Benchmarks show 3x improvement in max aggregate bitrate in some
environments.
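A minimal sketch of the binding technique, assuming SO_REUSEPORT via golang.org/x/sys/unix (not the actual relay server code):
```go
package main

import (
	"context"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// listenReusePort binds a UDP socket with SO_REUSEPORT set, so several
// sockets can share one address and the kernel spreads flows across them.
func listenReusePort(ctx context.Context, addr string) (net.PacketConn, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.ListenPacket(ctx, "udp", addr)
}
```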
Updates tailscale/corp#34745
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Add support for pinning specific Tailscale versions during installation
via the TAILSCALE_VERSION environment variable.
Example usage:
curl -fsSL https://tailscale.com/install.sh | TAILSCALE_VERSION=1.88.4 sh
Fixes #17776
Signed-off-by: Raj Singh <raj@tailscale.com>
wasm-opt 111 is 3 years old, and there have been a lot of speed improvements
since then. We run wasm-opt twice as part of the CI wasm job, and it
currently takes about 3 minutes each time. With 125, it takes ~40
seconds, a 4.5x speed-up.
Updates #cleanup
Change-Id: I671ae6cefa3997a23cdcab6871896b6b03e83a4f
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Implements a new disk put function for cigocacher that does not cause
locking issues on Windows when there are multiple processes reading and
writing the same files concurrently. Integrates cigocacher into test.yml
for Windows, where we run on larger runners that can connect to the
private Azure vnet resources hosting cigocached.
Updates tailscale/corp#10808
Change-Id: I0d0e9b670e49e0f9abf01ff3d605cd660dd85ebb
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
The cache artifacts from a full run of test.yml are 14GB. Only save
artifacts from the main branch to ensure we don't thrash too much. Most
branches should get decent performance with a hit from recent main.
Fixes tailscale/corp#34739
Change-Id: Ia83269d878e4781e3ddf33f1db2f21d06ea2130f
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Thanks to seamless key renewal, you can now do a force-reauth without
losing your connection in all circumstances. We softened the interactive
warning (see #17262) so let's soften the help text as well.
Updates https://github.com/tailscale/corp/issues/32429
Signed-off-by: Alex Chan <alexc@tailscale.com>
* cmd/k8s-operator: add support for tailscale.com/http-redirect
The k8s-operator now supports a tailscale.com/http-redirect annotation
on Ingress resources. When enabled, this automatically creates port 80
handlers that redirect to the equivalent HTTPS location.
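For illustration only, a port 80 handler of that kind might look like the following (the exact redirect status code is an assumption):
```go
package main

import "net/http"

func main() {
	// Forward every plain-HTTP request to the equivalent HTTPS location.
	redirect := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u := *r.URL
		u.Scheme = "https"
		u.Host = r.Host
		http.Redirect(w, r, u.String(), http.StatusPermanentRedirect)
	})
	http.ListenAndServe(":80", redirect)
}
```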
Fixes #11252
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
* Fix for permanent redirect
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
* lint
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
* warn for redirect+endpoint
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
* tests
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
---------
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Restrict running the golangci-lint workflow to when the workflow file
itself, a .go file, go.mod, or go.sum has actually been modified.
Updates #cleanup
Signed-off-by: Mario Minardi <mario@tailscale.com>
Skip the "request review" workflows for PRs that are in draft, to reduce
noise and avoid adding reviewers to PRs that are intentionally marked as
not ready for review.
Updates #cleanup
Signed-off-by: Mario Minardi <mario@tailscale.com>
Adds an observation point that may identify potentially abusive traffic
patterns at outlier values.
Updates tailscale/corp#24681
Signed-off-by: James Tucker <james@tailscale.com>
We don't hold q.mu while running normal ExecQueue.Add funcs, so we
shouldn't in RunSync either. Otherwise code it calls can't shut down
the queue, as seen in #18052.
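The pattern, sketched with simplified types rather than the real ExecQueue:
```go
package main

import "sync"

type execQueue struct {
	mu    sync.Mutex
	queue []func()
}

// runNext pops under the lock but runs the func with q.mu released, so the
// func can safely call back into the queue (e.g. to shut it down).
func (q *execQueue) runNext() {
	q.mu.Lock()
	if len(q.queue) == 0 {
		q.mu.Unlock()
		return
	}
	f := q.queue[0]
	q.queue = q.queue[1:]
	q.mu.Unlock()
	f()
}
```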
Updates #18052
Co-authored-by: Nick Khyl <nickk@tailscale.com>
Change-Id: Ic5e53440411eca5e9fabac7f4a68a9f6ef026de1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This patch adds an integration test for Tailnet Lock, checking that a node can't
talk to peers in the tailnet until it is signed.
This patch also introduces a new package `tstest/tkatest`, which has some helpers
for constructing a mock control server that responds to TKA requests. This allows
us to reduce boilerplate in the IPN tests.
Updates tailscale/corp#33599
Signed-off-by: Alex Chan <alexc@tailscale.com>
In preparation for exposing its configuration via ipn.ConfigVAlpha,
change {Masked}Prefs.RelayServerPort from *int to *uint16. This takes a
defensive stance against invalid inputs at JSON decode time.
'tailscale set --relay-server-port' is currently the only input to this
pref, and has always sanitized input to fit within a uint16.
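A small sketch of the decode-time effect (the struct here is simplified to the one field):
```go
package main

import (
	"encoding/json"
	"fmt"
)

type prefs struct {
	RelayServerPort *uint16 // previously *int
}

func main() {
	var p prefs
	// An out-of-range port now fails when the JSON is decoded, instead of
	// being accepted and needing later validation.
	err := json.Unmarshal([]byte(`{"RelayServerPort": 70000}`), &p)
	fmt.Println(err) // "... cannot unmarshal number 70000 ... of type uint16"
}
```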
Updates tailscale/corp#34591
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Adds a new type of TSMP message for advertising disco keys
to/from a peer, and implements the advertising triggered by a TSMP ping.
Needed as part of the effort to cache the netmap and still let clients
connect without control being reachable.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Co-authored-by: James Tucker <james@tailscale.com>
In suggestExitNodeLocked, if no exit node candidates have a home DERP or
valid location info, `bestCandidates` is an empty slice. This slice is
passed to `selectNode` (`randomNode` in prod):
```go
func randomNode(nodes views.Slice[tailcfg.NodeView], …) tailcfg.NodeView {
	…
	return nodes.At(rand.IntN(nodes.Len()))
}
```
An empty slice becomes a call to `rand.IntN(0)`, which panics.
This patch changes the behaviour so that if we've filtered out all the
candidates before calling `selectNode`, we reset the list and then pick
from any of the available candidates.
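Sketched generically (hypothetical helper; assumes the full candidate list is non-empty, since the no-candidates case is handled earlier):
```go
package main

import "math/rand/v2"

// pickCandidate falls back to the full list when filtering removed every
// candidate, so we never call rand.IntN(0), which panics.
func pickCandidate[T any](all, best []T) T {
	if len(best) == 0 {
		best = all
	}
	return best[rand.IntN(len(best))]
}
```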
This patch also updates our tests to give us more coverage of `randomNode`,
so we can spot other potential issues.
Updates #17661
Change-Id: I63eb5e4494d45a1df5b1f4b1b5c6d5576322aa72
Signed-off-by: Alex Chan <alexc@tailscale.com>
And fix up the TestAutoUpdateDefaults integration tests as they
weren't testing reality: the DefaultAutoUpdate is supposed to only be
relevant on the first MapResponse in the stream, but the tests weren't
testing that. They were instead injecting a 2nd+ MapResponse.
This changes the test control server to add a hook to modify the first
map response, and then lets the test control when the node goes up
and down, so new map responses are generated.
Also, the test now runs on macOS, where the auto-update feature being
disabled would previously have t.Skipped the whole test.
Updates #11502
Change-Id: If2319bd1f71e108b57d79fe500b2acedbc76e1a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In PR tailscale/corp#34401, the `traffic-steering` feature flag does
not automatically enable traffic steering for all nodes. Instead, an
admin must add the `traffic-steering` node attribute to each client
node that they want opted in.
For backwards compatibility with older clients, tailscale/corp#34401
strips out the `traffic-steering` node attribute if the feature flag
is not enabled, even if it is set in the policy file. This lets us
safely disable the feature flag.
This PR adds a missing test case for suggested exit nodes that have no
priority.
Updates tailscale/corp#34399
Signed-off-by: Simon Law <sfllaw@tailscale.com>
This commit fixes a bug in our HA ingress reconciler where Ingress resources would
get stuck in a deleting state if their associated VIP service was deleted within
control.
The reconciliation loop checked for the existence of the VIP service and, if it was
not found, performed no additional cleanup steps. The code has been modified to
continue even if the VIP service is not found.
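In shape (hypothetical helper names, not the reconciler's real API), the fix looks like:
```go
package main

import (
	"context"
	"errors"
)

var errVIPServiceNotFound = errors.New("vip service not found") // hypothetical sentinel

type vipClient interface {
	GetVIPService(ctx context.Context, name string) error // simplified
	FinishCleanup(ctx context.Context, name string) error // simplified
}

// cleanup now continues even when the VIP service is already gone from
// control, so the Ingress no longer sticks in a deleting state.
func cleanup(ctx context.Context, c vipClient, name string) error {
	if err := c.GetVIPService(ctx, name); err != nil && !errors.Is(err, errVIPServiceNotFound) {
		return err
	}
	// The VIP service may already be deleted; run the remaining steps anyway.
	return c.FinishCleanup(ctx, name)
}
```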
Fixes: https://github.com/tailscale/tailscale/issues/18049
Signed-off-by: David Bond <davidsbond93@gmail.com>