IPv6 connectivity might start off (e.g. network not fully up when
haproxy starts), so for features like resolvers, it would be nice to
periodically recheck.
With this change, instead of having the resolvers code rely on a variable
indicating connectivity, it will now call a function that will check for
how long a connectivity check hasn't been run, and will perform a new one
if needed. The age was set to 30s which seems reasonable considering that
the DNS will cache results anyway. There's no saving in spacing it more
since the syscall is very check (just a connect() without any packet being
emitted).
The variables remain exported so that we could present them in show info
or anywhere else.
This way, "dns-accept-family auto" will now stay up to date. Warning
though, it does perform some caching so even with a refreshed IPv6
connectivity, an older record may be returned anyway.
In order to ease dual-stack deployments, we could at least try to
check if ipv6 seems to be reachable. For this we're adding a test
based on a UDP connect (no traffic) on port 53 to the base of
public addresses (2001::) and see if the connect() is permitted,
indicating that the routing table knows how to reach it, or fails.
Based on this result we're setting a global variable that other
subsystems might use to preset their defaults.
Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension
that enables a TCP connection to use different paths.
Multipath TCP has been used for several use cases. On smartphones, MPTCP
enables seamless handovers between cellular and Wi-Fi networks while
preserving established connections. This use-case is what pushed Apple
to use MPTCP since 2013 in multiple applications [2]. On dual-stack
hosts, Multipath TCP enables the TCP connection to automatically use the
best performing path, either IPv4 or IPv6. If one path fails, MPTCP
automatically uses the other path.
To benefit from MPTCP, both the client and the server have to support
it. Multipath TCP is a backward-compatible TCP extension that is enabled
by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...).
Multipath TCP is included in the Linux kernel since version 5.6 [3]. To
use it on Linux, an application must explicitly enable it when creating
the socket. No need to change anything else in the application.
This attached patch adds MPTCP per address support, to be used with:
mptcp{,4,6}@<address>[:port1[-port2]]
MPTCP v4 and v6 protocols have been added: they are mainly a copy of the
TCP ones, with small differences: names, proto, and receivers lists.
These protocols are stored in __protocol_by_family, as an alternative to
TCP, similar to what has been done with QUIC. By doing that, the size of
__protocol_by_family has not been increased, and it behaves like TCP.
MPTCP is both supported for the frontend and backend sides.
Also added an example of configuration using mptcp along with a backend
allowing to experiment with it.
Note that this is a re-implementation of Björn's work from 3 years ago
[4], when haproxy's internals were probably less ready to deal with
this, causing his work to be left pending for a while.
Currently, the TCP_MAXSEG socket option doesn't seem to be supported
with MPTCP [5]. This results in a warning when trying to set the MSS of
sockets in proto_tcp:tcp_bind_listener.
This can be resolved by adding two new variables:
sock_inet(6)_mptcp_maxseg_default that will hold the default
value of the TCP_MAXSEG option. Note that for the moment, this
will always be -1 as the option isn't supported. However, in the
future, when the support for this option will be added, it should
contain the correct value for the MSS, allowing to correctly
set the TCP_MAXSEG option.
Link: https://www.rfc-editor.org/rfc/rfc8684.html [1]
Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2]
Link: https://www.mptcp.dev [3]
Link: https://github.com/haproxy/haproxy/issues/1028 [4]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/515 [5]
Co-authored-by: Dorian Craps <dorian.craps@student.vinci.be>
Co-authored-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
At various places we need to set a port on an IPv4 or IPv6 address, and
it requires casts that are easy to get wrong. Let's add a new set_port()
helper to the address family to assist in this. It will be directly
accessible from the protocol and will make the operation seamless.
Right now this is only implemented for sock_inet as other families do
not need a port.
We don't need to specify the handler anymore since it's set in the
receiver. Let's remove this argument from the function and clean up
the remains of code that were still setting it.
We need to specially handle protocol families which regroup common
functions used for a given address family. These functions include
bind(), addrcmp(), get_src() and get_dst() for now. Some fields are
also added about the address family, socket domain (protocol family
passed to the socket() syscall), and address length.
These protocol families are referenced from the protocols but not yet
used.
This function collects all the receiver-specific code from both
tcp_bind_listener() and udp_bind_listener() in order to provide a more
generic AF_INET/AF_INET6 socket binding function. For now the API is
not very elegant because some info are still missing from the receiver
while there's no ideal place to fill them except when calling ->listen()
at the protocol level. It looks like some polishing code is needed in
check_config_validity() or somewhere around this in order to finalize
the receivers' setup. The main issue is that listeners and receivers
are created *before* bind_conf options are parsed and that there's no
finishing step to resolve some of them.
The function currently sets up a receiver and subscribes it to the
poller. In an ideal world we wouldn't subscribe it but let the caller
do it after having finished to configure the L4 stuff. The problem is
that the caller would then need to perform an fd_insert() call and to
possibly set the exported flag on the FD while it's not its job. Maybe
an improvement could be to have a separate sock_start_receiver() call
in sock.c.
For now the function is not used but it will soon be. It's already
referenced as tcp and udp's ->bind().
This code was highly redundant, existing for TCP clients, TCP servers
and UDP servers. Let's move it to sock_inet where it belongs. The new
functions are sock_inet4_make_foreign() and sock_inet6_make_foreign().
Let's determine it at boot time instead of doing it on first use. It
also saves us from having to keep it thread local. It's been moved to
the new sock_inet_prepare() function, and the variables were renamed
to sock_inet_tcp_maxseg_default and sock_inet6_tcp_maxseg_default.
The v6only_default variable is not specific to TCP but to AF_INET6, so
let's move it to the right file. It's now immediately filled on startup
during the PREPARE stage so that it doesn't have to be tested each time.
The variable's name was changed to sock_inet6_v6only_default.
The function now makes it clear that it's independent on the socket
type and solely relies on the address family. Note that it supports
both IPv4 and IPv6 as we don't seem to need it per-family.
This one is common to the TCPv4 and UDPv4 code, it retrieves the
destination address of a socket, taking care of the possiblity that for
an incoming connection the traffic was possibly redirected. The TCP and
UDP definitions were updated to rely on it and remove duplicated code.
These files will regroup everything specific to AF_INET, AF_INET6 and
AF_UNIX socket definitions and address management. Some code there might
be agnostic to the socket type and could later move to af_xxxx.c but for
now we only support regular sockets so no need to go too far.
The files are quite poor at this step, they only contain the address
comparison function for each address family.