Some natc instances have been observed with excessive memory growth,
dominant in gvisor buffers. It is likely that the connection buffers are
sticking around for too long due to the default long segment time, and
uptuned buffer size applied by default in wgengine/netstack. Apply
configurations in natc specifically which are a better match for the
natc use case, most notably a 5s maximum segment lifetime.
Updates tailscale/corp#25169
Signed-off-by: James Tucker <james@tailscale.com>
Remove "unexpected" labelling of PeerGoneReasonNotHere.
A peer being no longer connected to a DERP server
is not an unexpected case and causes confusion in looking at logs.
Fixestailscale/corp#25609
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
This commit intend to provide support for TCP and Web VIP services and also allow user to use Tun
for VIP services if they want to.
The commit includes:
1.Setting TCP intercept function for VIP Services.
2.Update netstack to send packet written from WG to netStack handler for VIP service.
3.Return correct TCP hander for VIP services when netstack acceptTCP.
This commit also includes unit tests for if the local backend setServeConfig would set correct TCP intercept
function and test if a hander gets returned when getting TCPHandlerForDst. The shouldProcessInbound
check is not unit tested since the test result just depends on mocked functions. There should be an integration
test to cover shouldProcessInbound and if the returned TCP handler actually does what the serveConfig says.
Updates tailscale/corp#24604
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
This deprecates the old "DERP string" packing a DERP region ID into an
IP:port of 127.3.3.40:$REGION_ID and just uses an integer, like
PeerChange.DERPRegion does.
We still support servers sending the old form; they're converted to
the new form internally right when they're read off the network.
Updates #14636
Change-Id: I9427ec071f02a2c6d75ccb0fcbf0ecff9f19f26f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In this PR, we add a generic views.ValuePointer type that can be used as a view for pointers
to basic types and struct types that do not require deep cloning and do not have corresponding
view types. Its Get/GetOk methods return stack-allocated shallow copies of the underlying value.
We then update the cmd/viewer codegen to produce getters that return either concrete views
when available or ValuePointer views when not, for pointer fields in generated view types.
This allows us to avoid unnecessary allocations compared to returning pointers to newly
allocated shallow copies.
Updates #14570
Signed-off-by: Nick Khyl <nickk@tailscale.com>
sync.OnceValue and slices.Compact were both added in Go 1.21.
cmp.Or was added in Go 1.22.
Updates #8632
Updates #11058
Change-Id: I89ba4c404f40188e1f8a9566c8aaa049be377754
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Remove the platform specificity, it is unnecessary complexity.
Deduplicate repeated code as a result of reduced complexity.
Split out error identification code.
Update call-sites and tests.
Updates #14551
Updates tailscale/corp#25648
Signed-off-by: James Tucker <james@tailscale.com>
Observed in the wild some macOS machines gain broken sockets coming out
of sleep (we observe "time jumped", followed by EPIPE on sendto). The
cause of this in the platform is unclear, but the fix is clear: always
rebind if the socket is broken. This can also be created artificially on
Linux via `ss -K`, and other conditions or software on a system could
also lead to the same outcomes.
Updates tailscale/corp#25648
Signed-off-by: James Tucker <james@tailscale.com>
Fixes#14492
-----
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
Change-Id: I6dc1068d34bbfa7477e7b7a56a4325b3868c92e1
Signed-off-by: Marc Paquette <marcphilippaquette@gmail.com>
Importing the ~deprecated golang.org/x/exp/maps as "xmaps" to not
shadow the std "maps" was getting ugly.
And using slices.Collect on an iterator is verbose & allocates more.
So copy (x)maps.Keys+Values into our slicesx package instead.
Updates #cleanup
Updates #12912
Updates #14514 (pulled out of that change)
Change-Id: I5e68d12729934de93cf4a9cd87c367645f86123a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This provides an interface for a user to force a preferred DERP outcome
for all future netchecks that will take precedence unless the forced
region is unreachable.
The option does not persist and will be lost when the daemon restarts.
Updates tailscale/corp#18997
Updates tailscale/corp#24755
Signed-off-by: James Tucker <james@tailscale.com>
We have several places where LocalBackend instances are created for testing, but they are rarely shut down
when the tests that created them exit.
In this PR, we update newTestLocalBackend and similar functions to use testing.TB.Cleanup(lb.Shutdown)
to ensure LocalBackend instances are properly shut down during test cleanup.
Updates #12687
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Initial support for SrcCaps was added in 5ec01bf but it was not actually
working without this.
Updates #12542
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
This gets close to all of the remaining ones.
Updates #12912
Change-Id: I9c672bbed2654a6c5cab31e0cbece6c107d8c6fa
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A filesystem was plumbed into netstack in 993acf4475
but hasn't been used since 2d5d6f5403. Remove it.
Noticed while rebasing a Tailscale fork elsewhere.
Updates tailscale/corp#16827
Change-Id: Ib76deeda205ffe912b77a59b9d22853ebff42813
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This required sharing the dropped packet metric between two packages
(tstun and magicsock), so I've moved its definition to util/usermetric.
Updates tailscale/corp#22075
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
The user-facing metrics are intended to track data transmitted at
the overlay network level.
Updates tailscale/corp#22075
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
This adds additional logging on DERP home changes to allow
better troubleshooting.
Updates tailscale/corp#18095
Signed-off-by: Tim Walters <tim@tailscale.com>
While looking at deflaking TestTwoDevicePing/ping_1.0.0.2_via_SendPacket,
there were a bunch of distracting:
WARNING: (non-fatal) nil health.Tracker (being strict in CI): ...
This pacifies those so it's easier to work on actually deflaking the test.
Updates #11762
Updates #11874
Change-Id: I08dcb44511d4996b68d5f1ce5a2619b555a2a773
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
connstats currently increments the packet counter whenever it is called
to store a length of data, however when udp batch sending was introduced
we pass the length for a series of packages, and it is only incremented
ones, making it count wrongly if we are on a platform supporting udp
batches.
Updates tailscale/corp#22075
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Like we do for the ones on iOS.
As a bonus, this removes a caller of tsaddr.IsTailscaleIP which we
want to revamp/remove soonish.
Updates #13687
Change-Id: Iab576a0c48e9005c7844ab52a0aba5ba343b750e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The AddSNATRuleForDst rule was adding a new rule each time it was called including:
- if a rule already existed
- if a rule matching the destination, but with different desired source already existed
This was causing issues especially for the in-progress egress HA proxies work,
where the rules are now refreshed more frequently, so more redundant rules
were being created.
This change:
- only creates the rule if it doesn't already exist
- if a rule for the same dst, but different source is found, delete it
- also ensures that egress proxies refresh firewall rules
if the node's tailnet IP changes
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The new logging in 2dd71e64ac is spammy at shutdown:
Receive func ReceiveIPv6 exiting with error: *net.OpError, read udp [::]:38869: raw-read udp6 [::]:38869: use of closed network connection
Receive func ReceiveIPv4 exiting with error: *net.OpError, read udp 0.0.0.0:36123: raw-read udp4 0.0.0.0:36123: use of closed network connection
Skip it if we're in the process of shutting down.
Updates #10976
Change-Id: I4f6d1c68465557eb9ffe335d43d740e499ba9786
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* cmd/containerboot,kube,util/linuxfw: configure kube egress proxies to route to 1+ tailnet targets
This commit is first part of the work to allow running multiple
replicas of the Kubernetes operator egress proxies per tailnet service +
to allow exposing multiple tailnet services via each proxy replica.
This expands the existing iptables/nftables-based proxy configuration
mechanism.
A proxy can now be configured to route to one or more tailnet targets
via a (mounted) config file that, for each tailnet target, specifies:
- the target's tailnet IP or FQDN
- mappings of container ports to which cluster workloads will send traffic to
tailnet target ports where the traffic should be forwarded.
Example configfile contents:
{
"some-svc": {"tailnetTarget":{"fqdn":"foo.tailnetxyz.ts.net","ports"{"tcp:4006:80":{"protocol":"tcp","matchPort":4006,"targetPort":80},"tcp:4007:443":{"protocol":"tcp","matchPort":4007,"targetPort":443}}}}
}
A proxy that is configured with this config file will configure firewall rules
to route cluster traffic to the tailnet targets. It will then watch the config file
for updates as well as monitor relevant netmap updates and reconfigure firewall
as needed.
This adds a bunch of new iptables/nftables functionality to make it easier to dynamically update
the firewall rules without needing to restart the proxy Pod as well as to make
it easier to debug/understand the rules:
- for iptables, each portmapping is a DNAT rule with a comment pointing
at the 'service',i.e:
-A PREROUTING ! -i tailscale0 -p tcp -m tcp --dport 4006 -m comment --comment "some-svc:tcp:4006 -> tcp:80" -j DNAT --to-destination 100.64.1.18:80
Additionally there is a SNAT rule for each tailnet target, to mask the source address.
- for nftables, a separate prerouting chain is created for each tailnet target
and all the portmapping rules are placed in that chain. This makes it easier
to look up rules and delete services when no longer needed.
(nftables allows hooking a custom chain to a prerouting hook, so no extra work
is needed to ensure that the rules in the service chains are evaluated).
The next steps will be to get the Kubernetes Operator to generate
the configfile and ensure it is mounted to the relevant proxy nodes.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Like Linux, macOS will reply to sendto(2) with EPERM if the firewall is
currently blocking writes, though this behavior is like Linux
undocumented. This is often caused by a faulting network extension or
content filter from EDR software.
Updates #11710
Updates #12891
Updates #13511
Signed-off-by: James Tucker <james@tailscale.com>
When querying for an exit node suggestion, occasionally it triggers a
new report concurrently with an existing report in progress. Generally,
there should always be a recent report or one in progress, so it is
redundant to start one there, and it causes concurrency issues.
Fixes#12643
Change-Id: I66ab9003972f673e5d4416f40eccd7c6676272a5
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
this commit changes usermetrics to be non-global, this is a building
block for correct metrics if a go process runs multiple tsnets or
in tests.
Updates #13420
Updates tailscale/corp#22075
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
netcheck.Client.GetReport() applies its own deadlines. This 2s deadline
was causing GetReport() to never fall back to HTTPS/ICMP measurements
as it was shorter than netcheck.stunProbeTimeout, leaving no time
for fallbacks.
Updates #13394
Updates #6187
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Disable TCP & UDP GRO if the probe fails.
torvalds/linux@e269d79c7d broke virtio_net
TCP & UDP GRO causing GRO writes to return EINVAL. The bug was then
resolved later in
torvalds/linux@89add40066. The offending
commit was pulled into various LTS releases.
Updates #13041
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Previously, despite what the commit said, we were using a raw IP socket
that was *not* an AF_PACKET socket, and thus was subject to the host
firewall rules. Switch to using a real AF_PACKET socket to actually get
the functionality we want.
Updates #13140
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If657daeeda9ab8d967e75a4f049c66e2bca54b78
When the TS_DEBUG_NETSTACK_LOOPBACK_PORT environment variable is set,
netstack will loop back (dnat to addressFamilyLoopback:loopbackPort)
TCP & UDP flows originally destined to localServicesIP:loopbackPort.
localServicesIP is quad-100 or the IPv6 equivalent.
Updates tailscale/corp#22713
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This was previously disabled in 8e42510 due to missing GSO-awareness in
tstun, which was resolved in d097096.
Updates tailscale/corp#22511
Signed-off-by: Jordan Whited <jordan@tailscale.com>
After the upstream PR is merged, we can point directly at github.com/vishvananda/netlink
and retire github.com/tailscale/netlink.
See https://github.com/vishvananda/netlink/pull/1006
Updates #12298
Signed-off-by: Percy Wegmann <percy@tailscale.com>
net/tstun.Wrapper.InjectInboundPacketBuffer is not GSO-aware, which can
break quad-100 TCP streams as a result. Linux is the only platform where
gVisor GSO was previously enabled.
Updates tailscale/corp#22511
Updates #13211
Signed-off-by: Jordan Whited <jordan@tailscale.com>
In df6014f1d7 we removed build tag
gating preventing importation, which tripped a NetworkExtension limit
test in corp. This was a reversal of
25f0a3fc8f which actually made the
situation worse, hence the simplification.
This commit goes back to the strategy in
25f0a3fc8f, and gets us back under the
limit in my local testing. Admittedly, we don't fully understand
the effects of importing or excluding importation of this package,
and have seen mixed results, but this commit allows us to move forward
again.
Updates tailscale/corp#22125
Signed-off-by: Jordan Whited <jordan@tailscale.com>
In 2f27319baf we disabled GRO due to a
data race around concurrent calls to tstun.Wrapper.Write(). This commit
refactors GRO to be thread-safe, and re-enables it on Linux.
This refactor now carries a GRO type across tstun and netstack APIs
with a lifetime that is scoped to a single tstun.Wrapper.Write() call.
In 25f0a3fc8f we used build tags to
prevent importation of gVisor's GRO package on iOS as at the time we
believed it was contributing to additional memory usage on that
platform. It wasn't, so this commit simplifies and removes those
build tags.
Updates tailscale/corp#22353
Updates tailscale/corp#22125
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
In a93dc6cdb1 tryUpgradeToBatchingConn()
moved to build tag gated files, but the runtime.GOOS condition excluding
Android was removed unintentionally from batching_conn_linux.go. Add it
back.
Updates tailscale/corp#22348
Signed-off-by: Jordan Whited <jordan@tailscale.com>
By default, Windows sets the SIO_UDP_CONNRESET and SIO_UDP_NETRESET
options on created UDP sockets. These behaviours make the UDP socket
ICMP-aware; when the system gets an ICMP message (e.g. an "ICMP Port
Unreachable" message, in the case of SIO_UDP_CONNRESET), it will cause
the underlying UDP socket to throw an error. Confusingly, this can occur
even on reads, if the same UDP socket is used to write a packet that
triggers this response.
The Go runtime disabled the SIO_UDP_CONNRESET behavior in 3114bd6, but
did not change SIO_UDP_NETRESET–probably because that socket option
isn't documented particularly well.
Various other networking code seem to disable this behaviour, such as
the Godot game engine (godotengine/godot#22332) and the Eclipse TCF
agent (link below). Others appear to work around this by ignoring the
error returned (anacrolix/dht#16, among others).
For now, until it's clear whether this ends up in the upstream Go
implementation or not, let's also disable the SIO_UDP_NETRESET in a
similar manner to SIO_UDP_CONNRESET.
Eclipse TCF agent: https://gitlab.eclipse.org/eclipse/tcf/tcf.agent/-/blob/master/agent/tcf/framework/mdep.c
Updates #10976
Updates golang/go#68614
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I70a2f19855f8dec1bfb82e63f6d14fc4a22ed5c3
A SIGSEGV was observed around packet merging logic in gVisor's GRO
package.
Updates tailscale/corp#22353
Signed-off-by: Jordan Whited <jordan@tailscale.com>
In particular, tests showing that #3824 works. But that test doesn't
actually work yet; it only gets a DERP connection. (why?)
Updates #13038
Change-Id: Ie1fd1b6a38d4e90fae7e72a0b9a142a95f0b2e8f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit adds a batchingConn interface, and renames batchingUDPConn
to linuxBatchingConn. tryUpgradeToBatchingConn() may return a platform-
specific implementation of batchingConn. So far only a Linux
implementation of this interface exists, but this refactor is being
done in anticipation of a Windows implementation.
Updates tailscale/corp#21874
Signed-off-by: Jordan Whited <jordan@tailscale.com>
It was returning a nil `*iptablesRunner` instead of a
nil `NetfilterRunner` interface which would then fail
checks later.
Fixes#13012
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This commit increases gVisor's TCP max send (4->6MiB) and receive
(4->8MiB) buffer sizes on all platforms except iOS. These values are
biased towards higher throughput on high bandwidth-delay product paths.
The iperf3 results below demonstrate the effect of this commit between
two Linux computers with i5-12400 CPUs. 100ms of RTT latency is
introduced via Linux's traffic control network emulator queue
discipline.
The first set of results are from commit f0230ce prior to TCP buffer
resizing.
gVisor write direction:
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 180 MBytes 151 Mbits/sec 0 sender
[ 5] 0.00-10.10 sec 179 MBytes 149 Mbits/sec receiver
gVisor read direction:
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.10 sec 337 MBytes 280 Mbits/sec 20 sender
[ 5] 0.00-10.00 sec 323 MBytes 271 Mbits/sec receiver
The second set of results are from this commit with increased TCP
buffer sizes.
gVisor write direction:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 297 MBytes 249 Mbits/sec 0 sender
[ 5] 0.00-10.10 sec 297 MBytes 247 Mbits/sec receiver
gVisor read direction:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.10 sec 501 MBytes 416 Mbits/sec 17 sender
[ 5] 0.00-10.00 sec 485 MBytes 407 Mbits/sec receiver
Updates #9707
Updates tailscale/corp#22119
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit implements TCP GRO for packets being written to gVisor on
Linux. Windows support will follow later. The wireguard-go dependency is
updated in order to make use of newly exported IP checksum functions.
gVisor is updated in order to make use of newly exported
stack.PacketBuffer GRO logic.
TCP throughput towards gVisor, i.e. TUN write direction, is dramatically
improved as a result of this commit. Benchmarks show substantial
improvement, sometimes as high as 2x. High bandwidth-delay product
paths remain receive window limited, bottlenecked by gVisor's default
TCP receive socket buffer size. This will be addressed in a follow-on
commit.
The iperf3 results below demonstrate the effect of this commit between
two Linux computers with i5-12400 CPUs. There is roughly ~13us of round
trip latency between them.
The first result is from commit 57856fc without TCP GRO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 4.77 GBytes 4.10 Gbits/sec 20 sender
[ 5] 0.00-10.00 sec 4.77 GBytes 4.10 Gbits/sec receiver
The second result is from this commit with TCP GRO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.6 GBytes 9.14 Gbits/sec 20 sender
[ 5] 0.00-10.00 sec 10.6 GBytes 9.14 Gbits/sec receiver
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit implements TCP GSO for packets being read from gVisor on
Linux. Windows support will follow later. The wireguard-go dependency is
updated in order to make use of newly exported GSO logic from its tun
package.
A new gVisor stack.LinkEndpoint implementation has been established
(linkEndpoint) that is loosely modeled after its predecessor
(channel.Endpoint). This new implementation supports GSO of monster TCP
segments up to 64K in size, whereas channel.Endpoint only supports up to
32K. linkEndpoint will also be required for GRO, which will be
implemented in a follow-on commit.
TCP throughput from gVisor, i.e. TUN read direction, is dramatically
improved as a result of this commit. Benchmarks show substantial
improvement through a wide range of RTT and loss conditions, sometimes
as high as 5x.
The iperf3 results below demonstrate the effect of this commit between
two Linux computers with i5-12400 CPUs. There is roughly ~13us of round
trip latency between them.
The first result is from commit 57856fc without TCP GSO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.51 GBytes 2.15 Gbits/sec 154 sender
[ 5] 0.00-10.00 sec 2.49 GBytes 2.14 Gbits/sec receiver
The second result is from this commit with TCP GSO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 12.6 GBytes 10.8 Gbits/sec 6 sender
[ 5] 0.00-10.00 sec 12.6 GBytes 10.8 Gbits/sec receiver
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
wgengine/magicsock,ipn: allow setting static node endpoints via tailscaled config file.
Adds a new StaticEndpoints field to tailscaled config
that can be used to statically configure the endpoints
that the node advertizes. This field will replace
TS_DEBUG_PRETENDPOINTS env var that can be used to achieve the same.
Additionally adds some functionality that ensures that endpoints
are updated when configfile is reloaded.
Also, refactor configuring/reconfiguring components to use the
same functionality when configfile is parsed the first time or
subsequent times (after reload). Previously a configfile reload
did not result in resetting of prefs. Now it does- but does not yet
tell the relevant components to consume the new prefs. This is to
be done in a follow-up.
Updates tailscale/tailscale#12578
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Windows requires routes to have a nexthop. Routes created using the interface's local IP address or an unspecified IP address ("0.0.0.0" or "::") as the nexthop are considered on-link routes. Notably, Windows treats on-link subnet routes differently, reserving the last IP in the range as the broadcast IP and therefore prohibiting TCP connections to it, resulting in WSA error 10049: "The requested address is not valid in its context. This does not happen with single-host routes, such as routes to Tailscale IP addresses, but becomes a problem with advertised subnets when all IPs in the range should be reachable.
Before Windows 8, only routes created with an unspecified IP address were considered on-link, so our previous approach of using the interface's own IP as the nexthop likely worked on Windows 7.
This PR updates configureInterface to use the TailscaleServiceIP (100.100.100.100) and its IPv6 counterpart as the nexthop for subnet routes.
Fixestailscale/support-escalations#57
Signed-off-by: Nick Khyl <nickk@tailscale.com>
If we get an non-disco presumably-wireguard-encrypted UDP packet from
an IP:port we don't recognize, rather than drop the packet, give it to
WireGuard anyway and let WireGuard try to figure out who it's from and
tell us.
This uses the new hook added in https://github.com/tailscale/wireguard-go/pull/27
Updates tailscale/corp#20732
Change-Id: I5c61a40143810592f9efac6c12808a87f924ecf2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Load Balancers often have more than one ingress IP, so allowing us to
add multiple means we can offer multiple options.
Updates #12578
Change-Id: I4aa49a698d457627d2f7011796d665c67d4c7952
Signed-off-by: Lee Briggs <lee@leebriggs.co.uk>
For testing. Lee wants to play with 'AWS Global Accelerator Custom
Routing with Amazon Elastic Kubernetes Service'. If this works well
enough, we can promote it.
Updates #12578
Change-Id: I5018347ed46c15c9709910717d27305d0aedf8f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The DERP Return Path Optimization (DRPO) is over four years old (and
on by default for over two) and we haven't had problems, so time to
remove the emergency shutoff code (controlknob) which we've never
used. The controlknobs are only meant for new features, to mitigate
risk. But we don't want to keep them forever, as they kinda pollute
the code.
Updates #150
Change-Id: If021bc8fd1b51006d8bddd1ffab639bb1abb0ad1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And fix up a bogus comment and flesh out some other comments.
Updates #cleanup
Change-Id: Ia60a1c04b0f5e44e8d9587914af819df8e8f442a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The logic we added in #11378 would prevent selecting a home DERP if we
have no control connection.
Updates tailscale/corp#18095
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I44bb6ac4393989444e4961b8cfa27dc149a33c6e
Changes "Accept" TCP logs to display in verbose logs only,
and removes lines from default logging behavior.
Updates #12158
Signed-off-by: Keli Velazquez <keli@tailscale.com>
The control plane hasn't sent it to clients in ages.
Updates tailscale/corp#20965
Change-Id: I1d71a4b6dd3f75010a05c544ee39827837c30772
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I noticed we were allocating these every time when they could just
share the same memory. Rather than document ownership, just lock it
down with a view.
I was considering doing all of the fields but decided to just do this
one first as test to see how infectious it became. Conclusion: not
very.
Updates #cleanup (while working towards tailscale/corp#20514)
Change-Id: I8ce08519de0c9a53f20292adfbecd970fe362de0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously, we were registering TCP and UDP connections in the same map,
which could result in erroneously removing a mapping if one of the two
connections completes while the other one is still active.
Add a "proto string" argument to these functions to avoid this.
Additionally, take the "proto" argument in LocalAPI, and plumb that
through from the CLI and add a new LocalClient method.
Updates tailscale/corp#20600
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I35d5efaefdfbf4721e315b8ca123f0c8af9125fb
This moves NewContainsIPFunc from tsaddr to new ipset package.
And wgengine/filter types gets split into wgengine/filter/filtertype,
so netmap (and thus the CLI, etc) doesn't need to bring in ipset,
bart, etc.
Then add a test making sure the CLI deps don't regress.
Updates #1278
Change-Id: Ia246d6d9502bbefbdeacc4aef1bed9c8b24f54d5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I noticed the not-local-v6 numbers were nowhere near the v4 numbers
(they should be identical) and then saw this. It meant the
Addr().Next() wasn't picking an IP that was no longer local, as
assumed.
Updates #12486
Change-Id: I18dfb641f00c74c6252666bc41bd2248df15fadd
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
NewContainsIPFunc returns a contains matcher optimized for its
input. Use that instead of what this did before, always doing a test
over each of a list of netip.Prefixes.
goos: darwin
goarch: arm64
pkg: tailscale.com/wgengine/filter
│ before │ after │
│ sec/op │ sec/op vs base │
FilterMatch/file1-8 32.60n ± 1% 18.87n ± 1% -42.12% (p=0.000 n=10)
Updates #12486
Change-Id: I8f902bc064effb431e5b46751115942104ff6531
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates tailscale/tailscale#4136
This PR is the first round of work to move from encoding health warnings as strings and use structured data instead. The current health package revolves around the idea of Subsystems. Each subsystem can have (or not have) a Go error associated with it. The overall health of the backend is given by the concatenation of all these errors.
This PR polishes the concept of Warnable introduced by @bradfitz a few weeks ago. Each Warnable is a component of the backend (for instance, things like 'dns' or 'magicsock' are Warnables). Each Warnable has a unique identifying code. A Warnable is an entity we can warn the user about, by setting (or unsetting) a WarningState for it. Warnables have:
- an identifying Code, so that the GUI can track them as their WarningStates come and go
- a Title, which the GUIs can use to tell the user what component of the backend is broken
- a Text, which is a function that is called with a set of Args to generate a more detailed error message to explain the unhappy state
Additionally, this PR also begins to send Warnables and their WarningStates through LocalAPI to the clients, using ipn.Notify messages. An ipn.Notify is only issued when a warning is added or removed from the Tracker.
In a next PR, we'll get rid of subsystems entirely, and we'll start using structured warnings for all errors affecting the backend functionality.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
This refactors the logic for determining whether a packet should be sent
to the host or not into a function, and then adds tests for it.
Updates #11304
Updates #12448
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ief9afa98eaffae00e21ceb7db073c61b170355e5
Fix a bug where, for a subnet router that advertizes
4via6 route, all packets with a source IP matching
the 4via6 address were being sent to the host itself.
Instead, only send to host packets whose destination
address is host's local address.
Fixestailscale/tailscale#12448
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Co-authored-by: Andrew Dunham <andrew@du.nham.ca>
When we're starting child processes on Windows that are CLI programs that
don't need to output to a console, we should pass in DETACHED_PROCESS as a
CreationFlag on SysProcAttr. This prevents the OS from even creating a console
for the child (and paying the associated time/space penalty for new conhost
processes). This is more efficient than letting the OS create the console
window and then subsequently trying to hide it, which we were doing at a few
callsites.
Fixes#12270
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This adds a new ListenPacket function on tsnet.Server
which acts mostly like `net.ListenPacket`.
Unlike `Server.Listen`, this requires listening on a
specific IP and does not automatically listen on both
V4 and V6 addresses of the Server when the IP is unspecified.
To test this, it also adds UDP support to tsdial.Dialer.UserDial
and plumbs it through the localapi. Then an associated test
to make sure the UDP functionality works from both sides.
Updates #12182
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This busybox fwmaskWorks check was added before we moved away from
using the "ip" command to using netlink directly.
So it's now just wasted work (and log spam on Gokrazy) to check the
"ip" command capabilities if we're never going to use it.
Do it lazily instead.
Updates #12277
Change-Id: I8ab9acf64f9c0d8240ce068cb9ec8c0f6b1ecee7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Palo Alto reported interpreting hairpin probes as LAND attacks, and the
firewalls may be responding to this by shutting down otherwise in use NAT sessions
prematurely. We don't currently make use of the outcome of the hairpin
probes, and they contribute to other user confusion with e.g. the
AirPort Extreme hairpin session workaround. We decided in response to
remove the whole probe feature as a result.
Updates #188
Updates tailscale/corp#19106
Updates tailscale/corp#19116
Signed-off-by: James Tucker <james@tailscale.com>
After some analysis, stateful filtering is only necessary in tailnets
that use `autogroup:danger-all` in `src` in ACLs. And in those cases
users explicitly specify that hosts outside of the tailnet should be
able to reach their nodes. To fix local DNS breakage in containers, we
disable stateful filtering by default.
Updates #12108
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
It was requested by the first customer 4-5 years ago and only used
for a brief moment of time. We later added netmap visibility trimming
which removes the need for this.
It's been hidden by the CLI for quite some time and never documented
anywhere else.
This keeps the CLI flag, though, out of caution. It just returns an
error if it's set to anything but true (its default).
Fixes#12058
Change-Id: I7514ba572e7b82519b04ed603ff9f3bdbaecfda7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Palo Alto firewalls have a typically hard NAT, but also have a mode
called Persistent DIPP that is supposed to provide consistent port
mapping suitable for STUN resolution of public ports. Persistent DIPP
works initially on most Palo Alto firewalls, but some models/software
versions have a bug which this works around.
The bug symptom presents as follows:
- STUN sessions resolve a consistent public IP:port to start with
- Much later netchecks report the same IP:Port for a subset of
sessions, most often the users active DERP, and/or the port related
to sustained traffic.
- The broader set of DERPs in a full netcheck will now consistently
observe a new IP:Port.
- After this point of observation, new inbound connections will only
succeed to the new IP:Port observed, and existing/old sessions will
only work to the old binding.
In this patch we now advertise the lowest latency global endpoint
discovered as we always have, but in addition any global endpoints that
are observed more than once in a single netcheck report. This should
provide viable endpoints for potential connection establishment across
a NAT with this behavior.
Updates tailscale/corp#19106
Signed-off-by: James Tucker <james@tailscale.com>
Fixestailscale/tailscale#10393Fixestailscale/corp#15412Fixestailscale/corp#19808
On Apple platforms, exit nodes and subnet routers have been unable to relay pings from Tailscale devices to non-Tailscale devices due to sandbox restrictions imposed on our network extensions by Apple. The sandbox prevented the code in netstack.go from spawning the `ping` process which we were using.
Replace that exec call with logic to send an ICMP echo request directly, which appears to work in userspace, and not trigger a sandbox violation in the syslog.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
When Docker is detected on the host and stateful filtering is enabled,
Docker containers may be unable to reach Tailscale nodes (depending on
the network settings of a container). Detect Docker when stateful
filtering is enabled and print a health warning to aid users in noticing
this issue.
We avoid printing the warning if the current node isn't advertising any
subnet routes and isn't an exit node, since without one of those being
true, the node wouldn't have the correct AllowedIPs in WireGuard to
allow a Docker container to connect to another Tailscale node anyway.
Updates #12070
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Idef538695f4d101b0ef6f3fb398c0eaafc3ae281
Previously, a node that was advertising a 4via6 route wouldn't be able
to make use of that same route; the packet would be delivered to
Tailscale, but since we weren't accepting it in handleLocalPackets, the
packet wouldn't be delivered to netstack and would never hit the 4via6
logic. Let's add that support so that usage of 4via6 is consistent
regardless of where the connection is initiated from.
Updates #11304
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ic28dc2e58080d76100d73b93360f4698605af7cb
This adds a new bool that can be sent down from control
to do jailing on the client side. Previously this would
only be done from control by modifying the packet filter
we sent down to clients. This would result in a lot of
additional work/CPU on control, we could instead just
do this on the client. This has always been a TODO which
we keep putting off, might as well do it now.
Updates tailscale/corp#19623
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This plumbs a packet filter for jailed nodes through to the
tstun.Wrapper; the filter for a jailed node is equivalent to a "shields
up" filter. Currently a no-op as there is no way for control to
tell the client whether a peer is jailed.
Updates tailscale/corp#19623
Co-authored-by: Andrew Dunham <andrew@du.nham.ca>
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Change-Id: I5ccc5f00e197fde15dd567485b2a99d8254391ad
In prep for it being required in more places.
Updates #11874
Change-Id: Ib743205fc2a6c6ff3d2c4ed3a2b28cac79156539
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In prep for most of the package funcs in net/interfaces to become
methods in a long-lived netmon.Monitor that can cache things. (Many
of the funcs are very heavy to call regularly, whereas the long-lived
netmon.Monitor can subscribe to things from the OS and remember
answers to questions it's asked regularly later)
Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299
Change-Id: Ie4e8dedb70136af2d611b990b865a822cd1797e5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached. But first (this change and others)
we need to make sure the one netmon.Monitor is plumbed everywhere.
Some notable bits:
* tsdial.NewDialer is added, taking a now-required netmon
* because a tsdial.Dialer always has a netmon, anything taking both
a Dialer and a NetMon is now redundant; take only the Dialer and
get the NetMon from that if/when needed.
* netmon.NewStatic is added, primarily for tests
Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299
Change-Id: I877f9cb87618c4eb037cee098241d18da9c01691
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This has been a TODO for ages. Time to do it.
The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached.
Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299
Change-Id: I60fc6508cd2d8d079260bda371fc08b6318bcaf1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds a health.Tracker to tsd.System, accessible via
a new tsd.System.HealthTracker method.
In the future, that new method will return a tsd.System-specific
HealthTracker, so multiple tsnet.Servers in the same process are
isolated. For now, though, it just always returns the temporary
health.Global value. That permits incremental plumbing over a number
of changes. When the second to last health.Global reference is gone,
then the tsd.System.HealthTracker implementation can return a private
Tracker.
The primary plumbing this does is adding it to LocalBackend and its
dozen and change health calls. A few misc other callers are also
plumbed. Subsequent changes will flesh out other parts of the tree
(magicsock, controlclient, etc).
Updates #11874
Updates #4136
Change-Id: Id51e73cfc8a39110425b6dc19d18b3975eac75ce
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously it was both metadata about the class of warnable item as
well as the value.
Now it's only metadata and the value is per-Tracker.
Updates #11874
Updates #4136
Change-Id: Ia1ed1b6c95d34bc5aae36cffdb04279e6ba77015
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This moves most of the health package global variables to a new
`health.Tracker` type.
But then rather than plumbing the Tracker in tsd.System everywhere,
this only goes halfway and makes one new global Tracker
(`health.Global`) that all the existing callers now use.
A future change will eliminate that global.
Updates #11874
Updates #4136
Change-Id: I6ee27e0b2e35f68cb38fecdb3b2dc4c3f2e09d68
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This helps reduce memory pressure on tailnets with large numbers
of routes.
Updates tailscale/corp#19332
Signed-off-by: Percy Wegmann <percy@tailscale.com>
* cmd/containerboot,util/linuxfw: support proxy backends specified by DNS name
Adds support for optionally configuring containerboot to proxy
traffic to backends configured by passing TS_EXPERIMENTAL_DEST_DNS_NAME env var
to containerboot.
Containerboot will periodically (every 10 minutes) attempt to resolve
the DNS name and ensure that all traffic sent to the node's
tailnet IP gets forwarded to the resolved backend IP addresses.
Currently:
- if the firewall mode is iptables, traffic will be load balanced
accross the backend IP addresses using round robin. There are
no health checks for whether the IPs are reachable.
- if the firewall mode is nftables traffic will only be forwarded
to the first IP address in the list. This is to be improved.
* cmd/k8s-operator: support ExternalName Services
Adds support for exposing endpoints, accessible from within
a cluster to the tailnet via DNS names using ExternalName Services.
This can be done by annotating the ExternalName Service with
tailscale.com/expose: "true" annotation.
The operator will deploy a proxy configured to route tailnet
traffic to the backend IPs that service.spec.externalName
resolves to. The backend IPs must be reachable from the operator's
namespace.
Updates tailscale/tailscale#10606
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The Network Location Awareness service identifies networks authenticated against
an Active Directory domain and categorizes them as "Domain Authenticated".
This includes the Tailscale network if a Domain Controller is reachable through it.
If a network is categories as NLM_NETWORK_CATEGORY_DOMAIN_AUTHENTICATED,
it is not possible to override its category, and we shouldn't attempt to do so.
Additionally, our Windows Firewall rules should be compatible with both private
and domain networks.
This fixes both issues.
Fixes#11813
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Most of the magicsock tests fake the network, simulating packets going
out and coming in. There's no reason to actually hit your router to do
UPnP/NAT-PMP/PCP during in tests. But while debugging thousands of
iterations of tests to deflake some things, I saw it slamming my
router. This stops that.
Updates #11762
Change-Id: I59b9f48f8f5aff1fa16b4935753d786342e87744
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Seems to deflake tstest/integration tests. I can't reproduce it
anymore on one of my VMs that was consistently flaking after a dozen
runs before. Now I can run hundreds of times.
Updates #11649Fixes#7036
Change-Id: I2f7d4ae97500d507bdd78af9e92cd1242e8e44b8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I'm on a mission to simplify LocalBackend.Start and its locking
and deflake some tests.
I noticed this hasn't been used since March 2023 when it was removed
from the Windows client in corp 66be796d33c.
So, delete.
Updates #11649
Change-Id: I40f2cb75fb3f43baf23558007655f65a8ec5e1b2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We have seen in macOS client logs that the "operation not permitted", a
syscall.EPERM error, is being returned when traffic is attempted to be
sent. This may be caused by security software on the client.
This change will perform a rebind and restun if we receive a
syscall.EPERM error on clients running darwin. Rebinds will only be
called if we haven't performed one specifically for an EPERM error in
the past 5 seconds.
Updates #11710
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
Trying to run iptables/nftables on Synology pauses for minutes with
lots of errors and ultimately does nothing as it's not used and we
lack permissions.
This fixes a regression from db760d0bac (#11601) that landed
between Synology testing on unstable 1.63.110 and 1.64.0 being cut.
Fixes#11737
Change-Id: Iaf9563363b8e45319a9b6fe94c8d5ffaecc9ccef
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Just because we don't have known endpoints for a peer does not mean that
the peer should become unreachable. If we know the peers key, it should
be able to call us, then we can talk back via whatever path it called us
on. First step - don't drop the packet in this context.
Updates tailscale/corp#19106
Signed-off-by: James Tucker <james@tailscale.com>
This removes a potentially increased boot delay for certain boot
topologies where they block on ExecStartPre that may have socket
activation dependencies on other system services (such as
systemd-resolved and NetworkManager).
Also rename cleanup to clean up in affected/immediately nearby places
per code review commentary.
Fixes#11599
Signed-off-by: James Tucker <james@tailscale.com>
It was used when we only supported subnet routers on linux
and would nil out the SubnetRoutes slice as no other router
worked with it, but now we support subnet routers on ~all platforms.
The field it was setting to nil is now only used for network logging
and nowhere else, so keep the field but drop the SubnetRouterWrapper
as it's not useful.
Updates #cleanup
Change-Id: Id03f9b6ec33e47ad643e7b66e07911945f25db79
Signed-off-by: Maisem Ali <maisem@tailscale.com>
The netcheck package and the magicksock package coordinate via the
health package, but both sides have time based heuristics through
indirect dependencies. These were misaligned, so the implemented
heuristic aimed at reducing DERP moves while there is active traffic
were non-operational about 3/5ths of the time.
It is problematic to setup a good test for this integration presently,
so instead I added comment breadcrumbs along with the initial fix.
Updates #8603
Signed-off-by: James Tucker <james@tailscale.com>
Only on Gokrazy, set sysctls to enable IP forwarding so subnet routing
and advertised exit node works.
Fixes#11405
Signed-off-by: Joonas Kuorilehto <joneskoo@derbian.fi>
This allows clients to avoid establishing their VPN multiple times when
both routes and DNS are changing in rapid succession.
Updates tailscale/corp#18928
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This change updates all tailfs functions and the majority of the tailfs
variables to use the new drive naming.
Updates tailscale/corp#16827
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
This change updates the tailfs file and package names to their new
naming convention.
Updates #tailscale/corp#16827
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>