This change reintroduces UserProfile.Groups, a slice that contains
the ACL-defined and synced groups that a user is a member of.
The slice will only be non-nil for clients with the node attribute
see-groups, and will only contain groups that the client is allowed
to see as per the app payload of the see-groups node attribute.
For example:
```
"nodeAttrs": [
{
"target": ["tag:dev"],
"app": {
"tailscale.com/see-groups": [{"groups": ["group:dev"]}]
}
},
[...]
]
```
UserProfile.Groups will also be gated by a feature flag for the time
being.
Updates tailscale/corp#31529
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
This commit is based on ff0978ab, and extends #18497 to connect network map
caching to the LocalBackend. As implemented, only "whole" netmap values are
stored, and we do not yet handle incremental updates. As-written, the feature must
be explicitly enabled via the TS_USE_CACHED_NETMAP envknob, and must be
considered experimental.
Updates #12639
Co-Authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I48a1e92facfbf7fb3a8e67cff7f2c9ab4ed62c83
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
TestOnlyTaggedPeersCanBeDialed has a race condition:
- The test untags ps[2] and waits until ps[0] sees this tag dropped from
ps[2] in the netmap.
- Later the test tries to dial ps[2] from ps[0] and expects the dial to
fail as authorization to dial relies on the presence of the tag, now
removed from ps[2].
- However, the authorization layer caches the status used to consult peer
tags. When the dial happens before the cache times out, the test fails.
- Due to a bug in testcontrol.Server.UpdateNode, which the test uses to
remove the tag, netmap updates are not immediately triggered. The test
has to wait for the next natural set of netmap updates, which on my
machine takes about 22 seconds. As a result, the cache in the
authorization layer times out and the test passes.
- If one fixes the bug in UpdateNode, then netmap updates happen
immediately, the cache is no longer timed out when the dial occurs, and
the test fails.
Fixes#18720
Updates #18703
Signed-off-by: Harry Harpham <harry@tailscale.com>
This adds a new ControlKnob to make MagicDNS IPv6 registration
(telling systemd/etc) opt-out rather than opt-in.
Updates #15404
Change-Id: If008e1cb046b792c6aff7bb1d7c58638f7d650b1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
OnPolicyChange can observe a conn in activeConns before authentication
completes. The previous `c.info == nil` guard was itself a data race
against clientAuth writing c.info, and even when c.info appeared
non-nil, c.localUser could still be nil, causing a nil pointer
dereference at c.localUser.Username.
Add an authCompleted atomic.Bool to conn, stored true after all auth
fields are written in clientAuth. OnPolicyChange checks this atomic
instead of c.info, which provides the memory barrier guaranteeing all
prior writes are visible to the concurrent reader.
Updates tailscale/corp#36268 (fixes, but we might want to cherry-pick)
Co-authored-by: Gesa Stupperich <gesa@tailscale.com>
Change-Id: I4c69843541f5f9f04add9bf431e320c65a203a39
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The gocross-wrapper.sh bash script already checks TS_GO_NEXT (as of
a374cc344e48) to select go.toolchain.next.rev over go.toolchain.rev,
but when TS_USE_GOCROSS=1 the Go binary itself was hardcoded to read
go.toolchain.rev. This makes gocross also respect the TS_GO_NEXT=1
environment variable.
Updates tailscale/corp#36382
Change-Id: I04bef25a34e7ed3ccb1bfdb33a3a1f896236c6ee
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In PR #18681, we started logging which exit nodes were being
suggested. However, we did not log if there were errors encountered.
This patch corrects this oversight.
Updates: tailscale/corp#29964
Updates: tailscale/corp#36446
Signed-off-by: Simon Law <sfllaw@tailscale.com>
This switches our gokrazy builds to use a new variant of cmd/gok called
opinionated about using monorepos: https://github.com/bradfitz/monogok
And with that, we can get rid of all the go.mod files and builddir forests
under gokrazy/**.
Updates #13038
Updates gokrazy/gokrazy#361
Change-Id: I9f18fbe59b8792286abc1e563d686ea9472c622d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
(*health.Tracker).CurrentState() returns an empty state when there are no client-side
warnables, even when there are control-health messages, which is incorrect.
This fixes it.
Updates tailscale/corp#37275
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Store packet filter rules in the cache. The match expressions are derived
from the filter rules, so these do not need to be stored explicitly, but ensure
they are properly reconstructed when the cache is read back.
Update the tests to include these fields, and provide representative values.
Updates #12639
Change-Id: I9bdb972a86d2c6387177d393ada1f54805a2448b
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
fixestailscale/corp#36708
Sets up a set of metrics to report watchdog timeouts for wgengine and
reports an event for any watchdog timeout.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
In the absence of a better mechanism, writing unqualified hostnames to the hosts file may be required
for MagicDNS to work on some Windows environments, such as domain-joined machines. It can also
improve MagicDNS performance on non-domain joined devices when we are not the device's primary
DNS resolver.
At the same time, updating the hosts file can be slow and expensive, especially when it already contains
many entries, as was previously reported in #14327. It may also have negative side effects, such as interfering
with the system's DNS resolution policies.
Additionally, to fix#18712, we had to extend hosts file usage to domain-joined machines when we are not
the primary DNS resolver. For the reasons above, this change may introduce risk.
To allow customers to disable hosts file updates remotely without disabling MagicDNS entirely, whether on
domain-joined machines or not, this PR introduces the `disable-hosts-file-updates` node attribute.
Updates #18712
Updates #14327
Signed-off-by: Nick Khyl <nickk@tailscale.com>
On domain-joined Windows devices the primary search domain (the one the device is joined to)
always takes precedence over other search domains. This breaks MagicDNS when we are the primary
resolver on the device (see #18712). To work around this Windows behavior, we should write MagicDNS
host names the hosts file just as we do when we're not the primary resolver.
This commit does exactly that.
Fixes#18712
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This commit adds a new custom resource definition to the kubernetes
operator named `ProxyGroupPolicy`. This resource is namespace scoped
and is used as an allow list for which `ProxyGroup` resources can be
used within its namespace.
The `spec` contains two fields, `ingress` and `egress`. These should
contain the names of `ProxyGroup` resources to denote which can be
used as values in the `tailscale.com/proxy-group` annotation within
`Service` and `Ingress` resources.
The intention is for these policies to be merged within a namespace and
produce a `ValidatingAdmissionPolicy` and `ValidatingAdmissionPolicyBinding`
for both ingress and egress that prevents users from using names of
`ProxyGroup` resources in those annotations.
Closes: https://github.com/tailscale/corp/issues/36829
Signed-off-by: David Bond <davidsbond93@gmail.com>
Adds a new track for release candidates. Supports querying by track in
version and updating to RCs in update for supported platforms.
updates #18193
Signed-off-by: Will Hannah <willh@tailscale.com>
Two methods were recently added to the testcontrol.Server type:
AddDNSRecords and SetGlobalAppCaps. These two methods should trigger
netmap updates for all nodes connected to the Server instance, the way
that other state-change methods do (see SetNodeCapMap, for example).
This will also allow us to get rid of Server.ForceNetmapUpdate, which
was a band-aid fix to force the netmap updates which should have been
triggered by the aforementioned methods.
Fixestailscale/corp#37102
Signed-off-by: Harry Harpham <harry@tailscale.com>
Instead of relying on the local timezone, which may cause
non-deterministic behavior in some CIs, we force timezone
to be UTC on default created clocks.
Fixes: tailscale/corp#37005
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
This updates the TS_GO_NEXT=1 (testing) toolchain to Go 1.26.0
The default one is still Go 1.25.x.
Updates #18682
Change-Id: I99747798c166ce162ee9eee74baa9ff6744a62f6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When traffic steering is enabled, some users are suggested an exit
node that is inappropriately far from their location. This seems to
happen right when the client connects to the control plane and the
client eventually fixes itself. But whenever an affected client
reconnects, its suggested exit node flaps, and this happens often
enough to be noticeable because connections drop whenever the exit
node is switched. This should not happen, since the map response that
contains the list of suggested exit nodes that the client picks from,
also contains the scores for those nodes.
Since our current logging and diagnostic tools don’t give us enough
insight into what is happening, this PR adds additional logging when:
- traffic steering scores are used to suggest an exit node
- an exit node is suggested, no matter how it was determined
Updates: tailscale/corp#29964
Updates: tailscale/corp#36446
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Updates rotateLocked so that we hold the activeStderrWriteForTest write
lock around the dup2Stderr call, rather than acquiring it only after
dup2 was already compelete. This ensures no stderrWriteForTest calls
can race with the dup2 syscall. The now unused waitIdleStderrForTest has
been removed.
On macOS, dup2 and write on the same file descriptor are not atomic with
respect to each other, when rotateLocked called dup2Stderr to redirect
the stderr fd to a new file, concurrent goroutines calling
stderrWriteForTest could observe the fd in a transiently invalid state,
resulting in the bad file descripter.
Fixestailscale/corp#36953
Signed-off-by: James Scott <jim@tailscale.com>
Restore synchronous method calls from LocalBackend to magicsock.Conn
for node views, filter, and delta mutations. The eventbus delivery
introduced in 8e6f63cf1 was invalid for these updates because
subsequent operations in the same call chain depend on magicsock
already having the current state. The Synchronize/settleEventBus
workaround was fragile and kept requiring more workarounds and
introducing new mystery bugs.
Since eventbus was added, we've since learned more about when to use
eventbus, and this wasn't one of the cases.
We can take another swing at using eventbus for netmap changes in a
future change.
Fixes#16369
Updates #18575 (likely fixes)
Change-Id: I79057cc9259993368bb1e350ff0e073adf6b9a8f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
fixestailscale/tailscale#18436
Queries can still make their way to the forwarder when accept-dns is disabled.
Since we have not configured the forwarder if --accept-dns is false, this errors out
(correctly) but it also generates a persistent health warning. This forwards the
Pref setting all the way through the stack to the forwarder so that we can be more
judicious about when we decide that the forward path is unintentionally missing, vs
simply not configured.
Testing:
tailscale set --accept-dns=false. (or from the GUI)
dig @100.100.100.100 example.com
tailscale status
No dns related health warnings should be surfaced.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
-Wait does not just wait for the created process; it waits for the
entire process tree rooted at that process! This can cause the shell
to wait indefinitely if something in that tree fired up any background
processes.
Instead we call WaitForExit on the returned process.
Updates https://github.com/tailscale/corp/issues/29940
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
app connector packets
We introduce the Conn25PacketHooks interface to be used as a nil-able
field in userspaceEngine. The engine then plumbs through the functions
to the corresponding tstun.Wrapper intercepts.
The new intercepts run pre-filter when egressing toward WireGuard,
and post-filter when ingressing from WireGuard. This is preserve the
design invariant that the filter recognizes the traffic as interesting
app connector traffic.
This commit does not plumb through implementation of the interface, so
should be a functional no-op.
Fixestailscale/corp#35985
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
bart has gained a bunch of purported performance and usability
improvements since the current version we are using (0.18.0,
from 1y ago)
Updates tailscale/corp#36982
Signed-off-by: Amal Bansode <amal@tailscale.com>
This updates the URL shown by systemd to the new URL used by the docs
after the recent migration.
Fixes#18646
Signed-off-by: Tim Walters <tim@tailscale.com>
Add new "webbrowser" and "colorable" feature tags so that the
github.com/toqueteos/webbrowser and mattn/go-colorable packages
can be excluded from minbox builds.
Updates #12614
Change-Id: Iabd38b242f5a56aa10ef2050113785283f4e1fe8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit adds a bool named PeerRelay to Hostinfo, to identify the host's status of acting as a peer relay.
Considering the RelayServerPort number can be 0, I just made this a bool in stead of a port number. If the port
info is needed in future this would also help indicating if the port was set to 0 (meaning any port in peer relay
context).
Updates tailscale/corp#35862
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
Under extremely high load it appears we may have some retention issues
as a result of queue depth build up, but there is currently no direct
way to observe this. The scenario does not trigger the slow subscriber
log message, and the event stream debugging endpoint produces a
saturating volume of information.
Updates tailscale/corp#36904
Signed-off-by: James Tucker <james@tailscale.com>
Currently the expvar exporter attempts to write expvar.String, which
breaks the Prometheus metric page.
Updates tailscale/corp#36552
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
concurrent netmaps that if the first is logged in, it is never skipped.
This should have been covered be the skip test case, but that case
wasn't updated to include level set state.
Updates #12639
Updates #17869
Signed-off-by: James Tucker <james@tailscale.com>
If any profiles exist and an Authkey is provided via syspolicy, the
AuthKey is ignored on backend start, preventing re-auth attempts. This
is useful for one-time device provisioning scenarios, skipping authKey
use after initial setup when the authKey may no longer be valid.
updates #18618
Signed-off-by: Will Hannah <willh@tailscale.com>
Use the parsed and validated advertise tags value from prefs instead of
doing a strings.Split on the raw tags value as an input to the OAuth and
identity federation auth key generation methods.
The previous strings.Split method would return an array with a single
empty string element which would pass downstream length checks on the
tags argument before eventually failing with a confusing message when
hitting the API.
Fixes https://github.com/tailscale/tailscale/issues/18617
Signed-off-by: Mario Minardi <mario@tailscale.com>
Package feature/conn25 is excludeable from a build via the featuretag.
Test it is excluded for minimal builds.
Updates #12614
Signed-off-by: Fran Bull <fran@tailscale.com>
We already had a featuretag for clientupdate, but the CLI wasn't using
it, making the "minbox" build (minimal combined tailscaled + CLI
build) larger than necessary.
Updates #12614
Change-Id: Idd7546c67dece7078f25b8f2ae9886f58d599002
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>