haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-06 15:17:01 +02:00

Author	SHA1	Message	Date
Willy Tarreau	8a96216847	MEDIUM: sock-inet: re-check IPv6 connectivity every 30s IPv6 connectivity might start off (e.g. network not fully up when haproxy starts), so for features like resolvers, it would be nice to periodically recheck. With this change, instead of having the resolvers code rely on a variable indicating connectivity, it will now call a function that will check for how long a connectivity check hasn't been run, and will perform a new one if needed. The age was set to 30s which seems reasonable considering that the DNS will cache results anyway. There's no saving in spacing it more since the syscall is very cheap (just a connect() without any packet being emitted). The variables remain exported so that we could present them in show info or anywhere else. This way, "dns-accept-family auto" will now stay up to date. Warning though, it does perform some caching so even with a refreshed IPv6 connectivity, an older record may be returned anyway.	2025-05-09 15:45:44 +02:00
Willy Tarreau	5d41d476f3	MINOR: sock-inet: detect apparent IPv6 connectivity In order to ease dual-stack deployments, we could at least try to check if ipv6 seems to be reachable. For this we're adding a test based on a UDP connect (no traffic) on port 53 to the base of public addresses (2001::) and see if the connect() is permitted, indicating that the routing table knows how to reach it, or fails. Based on this result we're setting a global variable that other subsystems might use to preset their defaults.	2025-04-24 17:52:28 +02:00
Aperence	20efb856e1	MEDIUM: protocol: add MPTCP per address support Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension that enables a TCP connection to use different paths. Multipath TCP has been used for several use cases. On smartphones, MPTCP enables seamless handovers between cellular and Wi-Fi networks while preserving established connections. This use-case is what pushed Apple to use MPTCP since 2013 in multiple applications [2]. On dual-stack hosts, Multipath TCP enables the TCP connection to automatically use the best performing path, either IPv4 or IPv6. If one path fails, MPTCP automatically uses the other path. To benefit from MPTCP, both the client and the server have to support it. Multipath TCP is a backward-compatible TCP extension that is enabled by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...). Multipath TCP is included in the Linux kernel since version 5.6 [3]. To use it on Linux, an application must explicitly enable it when creating the socket. No need to change anything else in the application. This attached patch adds MPTCP per address support, to be used with: mptcp{,4,6}@<address>[:port1[-port2]] MPTCP v4 and v6 protocols have been added: they are mainly a copy of the TCP ones, with small differences: names, proto, and receivers lists. These protocols are stored in __protocol_by_family, as an alternative to TCP, similar to what has been done with QUIC. By doing that, the size of __protocol_by_family has not been increased, and it behaves like TCP. MPTCP is both supported for the frontend and backend sides. Also added an example of configuration using mptcp along with a backend allowing to experiment with it. Note that this is a re-implementation of Björn's work from 3 years ago [4], when haproxy's internals were probably less ready to deal with this, causing his work to be left pending for a while. Currently, the TCP_MAXSEG socket option doesn't seem to be supported with MPTCP [5]. 
This results in a warning when trying to set the MSS of sockets in proto_tcp:tcp_bind_listener. This can be resolved by adding two new variables: sock_inet(6)_mptcp_maxseg_default that will hold the default value of the TCP_MAXSEG option. Note that for the moment, this will always be -1 as the option isn't supported. However, in the future, when the support for this option will be added, it should contain the correct value for the MSS, allowing to correctly set the TCP_MAXSEG option. Link: https://www.rfc-editor.org/rfc/rfc8684.html [1] Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2] Link: https://www.mptcp.dev [3] Link: https://github.com/haproxy/haproxy/issues/1028 [4] Link: https://github.com/multipath-tcp/mptcp_net-next/issues/515 [5] Co-authored-by: Dorian Craps <dorian.craps@student.vinci.be> Co-authored-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>	2024-08-30 18:53:49 +02:00
Willy Tarreau	2a799b64b0	MINOR: protocol: add the real address family to the protocol For custom families, there's sometimes an underlying real address and it would be nice to be able to directly use the real family in calls to bind() and connect() without having to add explicit checks for exceptions everywhere. Let's add a .real_family field to struct proto_fam for this. For now it's always equal to the family except for non-transferable ones such as rhttp where it's equal to the custom one (anything else could fit).	2024-08-21 17:37:46 +02:00
Willy Tarreau	785b89f551	MINOR: protocol: move the global reuseport flag to the protocols Some protocols support SO_REUSEPORT and others do not. Some have such a limitation in the kernel, and others in haproxy itself (e.g. sock_unix cannot support multiple bindings since each one will unbind the previous one). Also it's really protocol-dependent and not just family-dependent because on Linux for some time it was supported for TCP and not UDP. Let's move the definition to the protocols instead. Now it's preset in tcp/udp/quic when SO_REUSEPORT is defined, and is otherwise left unset. The enabled() config condition test validates IPv4 (generally sufficient), and -dR / noreuseport affect all protocols at once.	2023-04-23 09:46:15 +02:00
Willy Tarreau	9f53b7b41a	BUG/MINOR: sock_inet: use SO_REUSEPORT_LB where available On FreeBSD 13.1 I noticed that thread balancing using shards was not always working. Sometimes several threads would work, but most of the time a single one was taking all the traffic. This is related to how SO_REUSEPORT works on FreeBSD since version 12, as it seems there is no guarantee that multiple sockets will receive the traffic. However there is SO_REUSEPORT_LB that is designed exactly for this, so we'd rather use it when available. This patch may possibly be backported, but nobody complained and it's not sure that many users rely on shards. So better wait for some feedback before backporting this.	2023-04-23 09:46:15 +02:00
Willy Tarreau	0e1aaf4e78	MEDIUM: proto: duplicate receivers marked RX_F_MUST_DUP The different protocol's ->bind() function will now check the receiver's RX_F_MUST_DUP flag to decide whether to bind a fresh new listener from scratch or reuse an existing one and just duplicate it. It turns out that the existing code already supports reusing FDs since that was done as part of the FD passing and inheriting mechanism. Here it's not much different, we pass the FD of the reference receiver, it gets duplicated and becomes the new receiver's FD. These FDs are also marked RX_F_INHERITED so that they are not exported and avoid being touched directly (only the reference should be touched).	2023-04-21 17:41:26 +02:00
Willy Tarreau	145b17fd2f	BUG/MEDIUM: listener: duplicate inherited FDs if needed Since commit `36d9097cf` ("MINOR: fd: Add BUG_ON checks on fd_insert()"), there is currently a test in fd_insert() to detect that we're not trying to reinsert an FD that had already been inserted. This test catches the following anomalies: frontend fail1 bind fd@0 bind fd@0 and: frontend fail2 bind fd@0 shards 2 What happens is that clone_listener() is called on a listener already having an FD, and when sock_{inet,unix}_bind_receiver() are called, the same FD will be registered multiple times and rightfully crash in the sanity check. It wouldn't be correct to block shards though (e.g. they could be used in a default-bind line). What looks like a safer and more future-proof approach simply is to dup() the FD so that each listener has one copy. This is also the only solution that might allow later to support more than 64 threads on an inherited FD. This needs to be backported as far as 2.4. Better wait for at least one extra -dev version before backporting though, as the bug should not be triggered often anyway.	2023-01-11 11:27:20 +01:00
Amaury Denoyelle	487d04f6d7	BUG/MINOR: quic: set IP_PKTINFO socket option for QUIC receivers only Move code which activates IP_PKTINFO socket option (or affiliated options) from sock_inet_bind_receiver() to quic_bind_listener() function. This change is useful for two reasons: * first, and the most important one: this activates IP_PKTINFO only for QUIC receivers. The previous version impacted all datagram receivers, used for example by log-forwarder. This should reduce memory usage for these datagram sockets which do not need this option. * second, USE_QUIC preprocessor statements are removed from src/sock_inet.c which cleans up the code. IP_PKTINFO was introduced recently by the following patch: `97ecc7a8ea` (quic-dev/qns) MEDIUM: quic: retrieve frontend destination address For the moment, this does not impact any stable release. However, as the previous patch is scheduled for 2.6 backporting, the current change must also be backported to the same versions.	2022-10-11 16:46:04 +02:00
Amaury Denoyelle	97ecc7a8ea	MEDIUM: quic: retrieve frontend destination address Retrieve the frontend destination address for a QUIC connection. This address is retrieved from the first received datagram and then stored in the associated quic-conn. This feature relies on IP_PKTINFO or affiliated flags support on the socket. This flag is set for each QUIC listener in sock_inet_bind_receiver(). To retrieve the destination address, recvfrom() has been replaced by the recvmsg() syscall. This operation and the parsing of the msghdr structure have been extracted into a wrapper quic_recv(). This change is useful to finalize the implementation of the 'dst' sample fetch. As such, quic_sock_get_dst() has been edited to return the local address from the quic-conn. As a best effort, if the local address is not available due to kernel non-support of IP_PKTINFO, the address of the listener is returned instead. This should be backported up to 2.6.	2022-10-10 11:48:27 +02:00
Willy Tarreau	9464bb1f05	MEDIUM: fd: add the tgid to the fd and pass it to fd_insert() The file descriptors will need to know the thread group ID in addition to the mask. This extends fd_insert() to take the tgid, and will store it into the FD. In the FD, the tgid is stored as a combination of tgid on the lower 16 bits and a refcount on the higher 16 bits. This allows to know when it's really possible to trust the tgid and the running mask. If a refcount is higher than 1 it indeed indicates that another thread might be in the process of updating these values. Since a closed FD must necessarily have a zero refcount, a test was added to fd_insert() to make sure that it is the case.	2022-07-15 19:58:06 +02:00
Willy Tarreau	512dd2dc1c	MINOR: fd: make fd_insert() apply the thread mask itself It's a bit ugly to see that half of the callers of fd_insert() have to apply all_threads_mask themselves to the bit field they're passing, because usually it comes from a listener that may have other bits set. Let's make the function apply the mask itself.	2022-07-15 19:58:06 +02:00
Willy Tarreau	82e378aa8a	MINOR: fd/thread: get rid of thread_mask() Since commit `d2494e048` ("BUG/MEDIUM: peers/config: properly set the thread mask") there must not remain any single case of a receiver that is bound nowhere, so there's no need anymore for thread_mask(). We're adding a test in fd_insert() to make sure this doesn't happen by accident though, but the function was removed and its rare uses were replaced with the original value of the bind_thread mask.	2022-07-15 19:43:10 +02:00
Willy Tarreau	382474348c	CLEANUP: tree-wide: use fd_set_nonblock() and fd_set_cloexec() This gets rid of most open-coded fcntl() calls, some of which were passed through DISGUISE() to avoid a useless test. The FD_CLOEXEC was most often set without preserving previous flags, which could become a problem once new flags are created. Now this will not happen anymore.	2022-04-26 10:59:48 +02:00
Willy Tarreau	01cac3f721	MEDIUM: listeners: split the thread mask between receiver and bind_conf With groups at some point we'll have to have distinct masks/groups in the receiver and the bind_conf, because a single bind_conf might require to instantiate multiple receivers (one per group). Let's split the thread mask and group to have one for the bind_conf and another one for the receiver while it remains easy to do. This will later allow to use different storage for the bind_conf if needed (e.g. support multiple groups).	2021-10-14 21:27:48 +02:00
Willy Tarreau	6823a3acee	MINOR: protocol: uniformize protocol errors Some protocols fail with "error blah [ip:port]" and other fail with "[ip:port] error blah". All this already appears in a "starting" or "binding" context after a proxy name. Let's choose a more universal approach like below where the ip:port remains at the end of the line prefixed with "for". [WARNING] (18632) : Binding [binderr.cfg:10] for proxy http: cannot bind receiver to device 'eth2' (No such device) for [0.0.0.0:1080] [WARNING] (18632) : Starting [binderr.cfg:10] for proxy http: cannot set MSS to 12 for [0.0.0.0:1080]	2021-10-14 21:22:52 +02:00
Willy Tarreau	f78b52eb7d	MINOR: inet: report the faulty interface name in "bind" errors When a "bind ... interface foo" statement fails, let's report the interface name in the error message to help locating it in the file.	2021-10-14 21:22:52 +02:00
Willy Tarreau	9063a660cc	MINOR: fd: move .exported into fdtab[].state No need to keep this flag apart any more, let's merge it into the global state.	2021-04-07 18:10:36 +02:00
Willy Tarreau	4bfc6630ba	CLEANUP: socket: replace SOL_IP/IPV6/TCP with IPPROTO_IP/IPV6/TCP Historically we've used SOL_IP/SOL_IPV6/SOL_TCP everywhere as the socket level value in getsockopt() and setsockopt() but as we've seen over time it regularly broke the build and required to have them defined to their IPPROTO_* equivalent. The Linux ip(7) man page says: Using the SOL_IP socket options level isn't portable; BSD-based stacks use the IPPROTO_IP level. And it indeed looks like a pure linuxism inherited from old examples and documentation. strace also reports SOL_* instead of IPPROTO_, which does not help... A check to linux/in.h shows they have the same values. Only SOL_SOCKET and other non-IP values make sense since there is no IPPROTO equivalent. Let's get rid of this annoying confusion by removing all redefinitions of SOL_IP/IPV6/TCP and using IPPROTO_ instead, just like any other operating system. This also removes duplicated tests for the same value. Note that this should not result in exposing syscalls to other OSes as the only ones that were still conditioned to SOL_IPV6 were for IPV6_UNICAST_HOPS which already had an IPPROTO_IPV6 equivalent, and IPV6_TRANSPARENT which is Linux-specific.	2021-03-31 08:59:34 +02:00
Willy Tarreau	73bed9ff13	MINOR: protocol: add a ->set_port() helper to address families At various places we need to set a port on an IPv4 or IPv6 address, and it requires casts that are easy to get wrong. Let's add a new set_port() helper to the address family to assist in this. It will be directly accessible from the protocol and will make the operation seamless. Right now this is only implemented for sock_inet as other families do not need a port.	2020-12-04 15:08:00 +01:00
Willy Tarreau	233ad288cd	CLEANUP: protocol: remove the now unused <handler> field of proto_fam->bind() We don't need to specify the handler anymore since it's set in the receiver. Let's remove this argument from the function and clean up the remains of code that were still setting it.	2020-10-15 21:47:56 +02:00
Willy Tarreau	f2cda10b1d	BUILD: sock_inet: include errno.h I was careful to have it for sock_unix.c but missed it for sock_inet which broke with commit `36722d227` ("MINOR: sock_inet: report the errno string in binding errors") depending on the build options. No backport is needed.	2020-09-17 14:02:01 +02:00
Willy Tarreau	36722d2274	MINOR: sock_inet: report the errno string in binding errors With the socket binding code cleanup it becomes easy to add more info to error messages. One missing thing used to be the error string, which is now added after the generic one, for example: [ALERT] 260/082852 (12974) : Starting frontend f: cannot bind socket (Permission denied) [0.0.0.0:4] [ALERT] 260/083053 (13292) : Starting frontend f: cannot bind socket (Address already in use) [0.0.0.0:4444] [ALERT] 260/083104 (13298) : Starting frontend f: cannot bind socket (Cannot assign requested address) [1.1.1.1:4444]	2020-09-17 08:32:17 +02:00
Willy Tarreau	f1f660978c	MINOR: protocol: retrieve the family-specific fields from the family We now take care of retrieving sock_family, l3_addrlen, bind(), addrcmp(), get_src() and get_dst() from the protocol family and not just the protocol itself. There are very few places, this was only seldom used. Interestingly, sock_inet.c used to rely on ->sock_family instead of ->sock_domain, and sock_unix.c used to hard-code PF_UNIX instead of using ->sock_domain. Also it appears obvious we have something wrong in the protocol selection algorithm because sock_domain is the one set to the custom protocols while it ought to be sock_family instead, which would avoid having to hard-code some conversions for UDP namely.	2020-09-16 22:08:07 +02:00
Willy Tarreau	b0254cb361	MINOR: protocol: add a new proto_fam structure for protocol families We need to specially handle protocol families which regroup common functions used for a given address family. These functions include bind(), addrcmp(), get_src() and get_dst() for now. Some fields are also added about the address family, socket domain (protocol family passed to the socket() syscall), and address length. These protocol families are referenced from the protocols but not yet used.	2020-09-16 22:08:07 +02:00
Willy Tarreau	d69ce1ffbc	MEDIUM: sock_inet: implement sock_inet_bind_receiver() This function collects all the receiver-specific code from both tcp_bind_listener() and udp_bind_listener() in order to provide a more generic AF_INET/AF_INET6 socket binding function. For now the API is not very elegant because some info are still missing from the receiver while there's no ideal place to fill them except when calling ->listen() at the protocol level. It looks like some polishing code is needed in check_config_validity() or somewhere around this in order to finalize the receivers' setup. The main issue is that listeners and receivers are created before bind_conf options are parsed and that there's no finishing step to resolve some of them. The function currently sets up a receiver and subscribes it to the poller. In an ideal world we wouldn't subscribe it but let the caller do it after having finished to configure the L4 stuff. The problem is that the caller would then need to perform an fd_insert() call and to possibly set the exported flag on the FD while it's not its job. Maybe an improvement could be to have a separate sock_start_receiver() call in sock.c. For now the function is not used but it will soon be. It's already referenced as tcp and udp's ->bind().	2020-09-16 22:08:07 +02:00
Willy Tarreau	3fd3bdc836	MINOR: receiver: move the FOREIGN and V6ONLY options from listener to settings The new RX_O_FOREIGN, RX_O_V6ONLY and RX_O_V4V6 options are now set into the rx_settings part during the parsing, so that we don't need to adjust them in each and every listener anymore. We have to keep both v4v6 and v6only due to the precedence from v6only over v4v6.	2020-09-16 22:08:07 +02:00
Willy Tarreau	37bafdcbb1	MINOR: sock_inet: move the IPv4/v6 transparent mode code to sock_inet This code was highly redundant, existing for TCP clients, TCP servers and UDP servers. Let's move it to sock_inet where it belongs. The new functions are sock_inet4_make_foreign() and sock_inet6_make_foreign().	2020-08-28 18:51:36 +02:00
Willy Tarreau	e5bdc51bb5	REORG: sock_inet: move default_tcp_maxseg from proto_tcp.c Let's determine it at boot time instead of doing it on first use. It also saves us from having to keep it thread local. It's been moved to the new sock_inet_prepare() function, and the variables were renamed to sock_inet_tcp_maxseg_default and sock_inet6_tcp_maxseg_default.	2020-08-28 18:51:36 +02:00
Willy Tarreau	d88e8c06ac	REORG: sock_inet: move v6only_default from proto_tcp.c to sock_inet.c The v6only_default variable is not specific to TCP but to AF_INET6, so let's move it to the right file. It's now immediately filled on startup during the PREPARE stage so that it doesn't have to be tested each time. The variable's name was changed to sock_inet6_v6only_default.	2020-08-28 18:51:36 +02:00
Willy Tarreau	25140cc573	REORG: inet: replace tcp_is_foreign() with sock_inet_is_foreign() The function now makes it clear that it's independent on the socket type and solely relies on the address family. Note that it supports both IPv4 and IPv6 as we don't seem to need it per-family.	2020-08-28 18:51:36 +02:00
Willy Tarreau	c5a94c936b	MINOR: sock_inet: implement sock_inet_get_dst() This one is common to the TCPv4 and UDPv4 code, it retrieves the destination address of a socket, taking care of the possibility that for an incoming connection the traffic was possibly redirected. The TCP and UDP definitions were updated to rely on it and remove duplicated code.	2020-08-28 18:51:36 +02:00
Willy Tarreau	0d06df6448	MINOR: sock: introduce sock_inet and sock_unix These files will regroup everything specific to AF_INET, AF_INET6 and AF_UNIX socket definitions and address management. Some code there might be agnostic to the socket type and could later move to af_xxxx.c but for now we only support regular sockets so no need to go too far. The files are quite poor at this step, they only contain the address comparison function for each address family.	2020-08-28 18:51:36 +02:00

33 Commits
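
The IPv6 connectivity check described in commits `5d41d476f3` and `8a96216847` can be illustrated with a minimal sketch. It assumes only what the commit messages state: a UDP connect() to port 53 of 2001:: (no packet is emitted), and a result that is refreshed once it is 30 seconds old. All names below (`probe_ipv6`, `ipv6_is_reachable`, `V6_RECHECK_AGE`, the `v6_*` variables) are hypothetical and are not taken from haproxy's source; the current time is passed in by the caller purely to keep the sketch easy to exercise.

```c
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical names throughout: this is an illustration, not haproxy code. */
#define V6_RECHECK_AGE 30        /* seconds between probes, per the commit message */

static int v6_reachable = 0;     /* last probe result: 1 = connect() was permitted */
static int v6_checked = 0;       /* 0 until the first probe has run */
static time_t v6_last_check = 0; /* time of the last probe */
static int v6_probe_count = 0;   /* number of probes attempted (for observation only) */

/* Run one probe: connect() a UDP socket to [2001::]:53. Connecting a datagram
 * socket sends nothing; it only succeeds if the routing table knows how to
 * reach the destination. Returns 1 on success, 0 on any failure.
 */
static int probe_ipv6(void)
{
    struct sockaddr_in6 dst;
    int fd, ok;

    v6_probe_count++;

    memset(&dst, 0, sizeof(dst));
    dst.sin6_family = AF_INET6;
    dst.sin6_port = htons(53);
    if (inet_pton(AF_INET6, "2001::", &dst.sin6_addr) != 1)
        return 0;

    fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP);
    if (fd < 0)
        return 0; /* e.g. IPv6 not supported at all on this host */

    ok = (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0);
    close(fd);
    return ok;
}

/* Return the cached result, re-probing first if no probe has run yet or if
 * the last one is at least V6_RECHECK_AGE seconds old.
 */
static int ipv6_is_reachable(time_t now)
{
    if (!v6_checked || now - v6_last_check >= V6_RECHECK_AGE) {
        v6_reachable = probe_ipv6();
        v6_last_check = now;
        v6_checked = 1;
    }
    return v6_reachable;
}
```

A caller would typically invoke `ipv6_is_reachable(time(NULL))` each time it needs the answer; calls made less than 30 seconds after the previous probe return the stored value without issuing any syscall.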