haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-10-28 07:01:00 +01:00

Author	SHA1	Message	Date
Frederic Lecaille	47bb15ca84	MINOR: quic: get rid of ->target quic_conn struct member The ->li (struct listener *) member of quic_conn struct was replaced by a ->target (struct obj_type *) member by this commit: MINOR: quic-be: get rid of ->li quic_conn member to abstract the connection type (front or back) when implementing QUIC for the backends. In these cases, ->target was a pointer to the obj_type of a server struct. This could not work with the dynamic servers contrary to the listeners which are not dynamic. This patch almost reverts the one mentioned above. ->target pointer to obj_type member is replaced by ->li pointer to listener struct member. As the listeners are not dynamic, this is easy to do. All one has to do is to replace the objt_listener(qc->target) statement by qc->li where applicable. For the backend connection, when needed, this is always qc->conn->target which is used only when qc->conn is initialized. The only "problematic" case is for quic_dgram_parse() which takes a pointer to an obj_type as third argument. But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function it is used to access the proxy counters of the connection thanks to qc_counters(). So, this obj_type argument may be null from now on with this patch. This is the reason why qc_counters() is modified to take this into consideration.	2025-09-11 09:51:28 +02:00
Frederic Lecaille	1c33756f78	BUG/MINOR: quic: Wrong source address use on FreeBSD The bug is a listener only one, and only occurred on FreeBSD. The FreeBSD issue has been reported here: https://forums.freebsd.org/threads/quic-http-3-with-haproxy.98443/ where QUIC traces could reveal that sendmsg() calls lead to EINVAL syscall errnos. A similar issue could be reproduced from a FreeBSD 14-2 VM with reg-tests/quic/retry.vtc as reg test. As noted by Olivier, this issue could be fixed within the VM by binding the listener socket to INADDR_ANY. That said, the symptoms are not exactly the same as the ones reported by the user. What could be observed from such a VM is that if the first recvmsg() call returns the datagram destination address, and if the listener listening address is bound to a specific address, the calls to sendmsg() fail because of the IP_SENDSRCADDR ip option value set by cmsg_set_saddr(). According to the ip(4) freebsd manual such an IP option must be used if the listening socket is bound to a specific address. It is to be noted that in a VM the first call to recvmsg() of the first connection does not return the datagram destination address. This leads the first quic_conn to be initialized without ->local_addr value. It is this value which is used by the IP_SENDSRCADDR ip option. In this case, the sendmsg() calls (without IP_SENDSRCADDR) never fail. The issue appears at the second condition. This patch replaces the conditions to use IP_SENDSRCADDR by a call to qc_may_use_saddr(). This latter also checks that the listener listening address is not INADDR_ANY to allow the use of the source address. It is generalized to all the OSes. Indeed, there is no reason to set the source address when the listener is bound to a specific address. Must be backported as far as 2.8.	2025-07-16 10:17:54 +02:00
Amaury Denoyelle	46cee07931	BUG/MINOR: quic: don't restrict reception on backend privileged ports When QUIC is used on the frontend side, communication is restricted with clients using privileged port. This is a simple protection against DNS/NTP spoofing. This feature should not be activated on the backend side, as in this case it is quite frequent to exchange with server running on privileged ports. As such, a new parameter is added to quic_recv() so that it is only active on the frontend side. Without this patch, it is impossible to communicate with QUIC servers running on privileged ports, as incoming datagrams would be silently dropped. No need to backport.	2025-06-13 16:40:21 +02:00
Frederic Lecaille	b9703cf711	MINOR: quic-be: get rid of ->li quic_conn member Replace ->li quic_conn pointer to struct listener member by ->target which is an object type enum and adapt the code. Use __objt_(listener|server)() where the object type is known. Typically this is where the code is specific to one connection type (frontend/backend). Remove <server> parameter passed to qc_new_conn(). It is redundant with the <target> parameter. GSO is not supported at this time for QUIC backend. qc_prep_pkts() is modified to prevent it from building more than an MTU. As a consequence, this prevents qc_send_ppkts() from using GSO. ssl_clienthello.c code is run only by listeners. This is why __objt_listener() is used in place of ->li.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	43d88a44f1	MINOR: quic-be: Datagrams and packet parsing support Modify quic_dgram_parse() to stop passing it a listener as third parameter. Instead, the object type address of the connection socket owner is passed to support the haproxy servers with QUIC as transport protocol. qc_owner_obj_type() is implemented to return this address. qc_counters() is also implemented to return the QUIC specific counters of the proxy owning the connection. quic_rx_pkt_parse() called by quic_dgram_parse() is also modified to use the object type address used by this latter as last parameter. It is also modified to send Retry packets only from listeners. A QUIC client (connection to haproxy QUIC servers) must drop the Initial packets with non null token length. It is also not supposed to receive 0-RTT packets, which are dropped.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	266b10b8a4	MINOR: quic-be: Do not redispatch the datagrams The QUIC datagram redispatch is there to counter the race condition which exists only for QUIC connections to listener where datagrams may arrive on the wrong socket between the bind() and connect() calls. Run this code part only for listeners.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	89d5a59933	MINOR: quic-be: add field for max_udp_payload_size into quic_conn Add ->max_udp_payload_size new member to quic_conn struct. Initialize it from qc_new_conn(). Adapt qc_snd_buf() to use it.	2025-06-11 18:37:34 +02:00
Olivier Houchard	09f5501bb9	MEDIUM: quic: Make sure we return the tasklet from quic_accept_run In quic_accept_run, return the tasklet to tell the scheduler the tasklet is still alive, it is not yet needed, but will be soon.	2025-04-25 16:14:26 +02:00
Willy Tarreau	dd900aead8	BUILD: quic_sock: address a strict-aliasing build warning with gcc 5 and 6 The UDP GSO code emits a build warning with older toolchains (gcc 5 and 6): src/quic_sock.c: In function 'cmsg_set_gso': src/quic_sock.c:683:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] *((uint16_t *)CMSG_DATA(c)) = gso_size; ^ Let's just use the write_u16() function that's made for this purpose. It was verified that for all versions from 5 to 13, gcc produces the exact same code with the fix (and without the warning). It arrived in 3.1 with commit 448d3d388a ("MINOR: quic: add GSO parameter on quic_sock send API") so this can be backported there.	2025-04-02 16:07:31 +02:00
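The row above replaces a type-punned store with a dedicated 16-bit write helper. A minimal sketch of the idea follows. The names `demo_write_u16()` and `demo_read_u16()` are hypothetical, and the `memcpy()` approach is an assumption chosen for illustration, not necessarily how haproxy's write_u16() is implemented.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a write_u16()-style helper: store a 16-bit
 * value at an arbitrary, possibly unaligned address without casting the
 * destination to uint16_t *, so no strict-aliasing rule is violated.
 */
static inline void demo_write_u16(void *dst, uint16_t val)
{
	memcpy(dst, &val, sizeof(val));
}

/* Read the value back the same way (host byte order). */
static inline uint16_t demo_read_u16(const void *src)
{
	uint16_t val;

	memcpy(&val, src, sizeof(val));
	return val;
}
```

Compilers typically turn such a fixed-size `memcpy()` into a single store, which is consistent with the commit's remark that the generated code did not change.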
Willy Tarreau	7760e3a374	CLEANUP: quic: replace ALREADY_CHECKED() with ASSUME_NONNULL() at a few places There were 4 instances of ALREADY_CHECKED() used to tell the compiler that the argument couldn't be NULL by design. Let's change them to the cleaner ASSUME_NONNULL(). Functions like qc_snd_buf() were slightly reduced in size (-24 bytes). Apparently gcc-13 sees a potential case that others don't see, and it's likely a bug since depending what is masked, it will completely change the output warnings to the point of contradicting itself. After many attempts, it appears that just checking that CMSG_FIRSTHDR(msg) is not null suffices to calm it down, so the strange warnings might have been the result of an overoptimization based on a supposed UB in the first place. At least now all versions up to 13.2 as well as clang are happy.	2024-12-17 17:47:57 +01:00
Willy Tarreau	2bc513dd31	BUILD: quic: fix build errors on FreeBSD since recent GSO changes The following commits broke the build on FreeBSD when QUIC is enabled: 35470d518 ("MINOR: quic: activate UDP GSO for QUIC if supported") 448d3d388 ("MINOR: quic: add GSO parameter on quic_sock send API") Indeed, it turns out that netinet/udp.h requires sys/types.h to be included before. Let's just change the includes order to fix the build. No backport is needed.	2024-08-30 18:53:49 +02:00
Amaury Denoyelle	85131f91bf	BUG/MEDIUM: quic: fix invalid conn reject with CONNECTION_REFUSED quic-initial rules were implemented just recently. For some actions, a new flags field was added in quic_dgram structure. This is used to report the result of the rules execution. However, this flags field was left uninitialized. Depending on its value, it may cause the connection to be wrongly rejected via CONNECTION_REFUSED. Fix this by properly setting flags value to 0. No need to backport.	2024-07-26 15:24:35 +02:00
Amaury Denoyelle	f91be2657e	MINOR: quic: pass quic_dgram as obj_type for quic-initial rules To extend quic-initial rules, pass quic_dgram instance to argument for the various actions. As such, quic_dgram is now supported as an obj_type and can be used in session origin field.	2024-07-25 15:39:39 +02:00
Amaury Denoyelle	d0ea173e35	MEDIUM: quic: implement GSO fallback mechanism UDP GSO on Linux is not implemented in every network device. For example, this is not available for veth devices frequently used in container environments. In such a case, EIO is reported on send() invocation. It is impossible to test at startup for proper GSO support in this case as a listener may be bound on multiple network interfaces. Furthermore, network interfaces may change during haproxy lifetime. As such, the only option is to react on send syscall error when GSO is used. The purpose of this patch is to implement a fallback when encountering such conditions. Emission can be retried immediately by trying to send each prepared datagram individually. To support this, qc_send_ppkts() is able to iterate over each datagram in a so-called non-GSO fallback mode. Between each emission, a datagram header is rewritten in front of the buffer which allows the sending loop to proceed until the last datagram is emitted. To complement this, the quic_conn listener is flagged on first GSO send error with value LI_F_UDP_GSO_NOTSUPP. This completely disables GSO for all future emissions with QUIC connections using this listener. For the moment, non-GSO fallback mode is activated when EIO is reported after GSO has been set. This is the error reported for the veth usage described above.	2024-07-11 11:02:44 +02:00
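The fallback described in the row above can be sketched roughly as follows. Everything here is hypothetical and simplified: `demo_send_dgrams()`, `demo_send_fn`, `struct demo_listener` and `demo_fake_send()` are illustration-only names, the emission syscall is abstracted behind a callback, and the per-datagram header rewriting mentioned in the commit is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical emission callback: sends <len> bytes, split at <gso_size>
 * boundaries when <gso_size> is non-zero. Returns the number of bytes
 * sent, or -1 with errno set.
 */
typedef ssize_t (*demo_send_fn)(const unsigned char *buf, size_t len,
                                size_t gso_size, void *ctx);

struct demo_listener {
	int gso_notsupp; /* stands in for a LI_F_UDP_GSO_NOTSUPP-style flag */
};

/* Send <len> bytes made of back-to-back datagrams of <dgram_size> bytes
 * (the last one may be shorter). GSO is tried first unless it was already
 * found unsupported. On EIO the listener is flagged and each datagram is
 * sent individually. Returns the number of bytes sent, or -1 on error.
 */
static ssize_t demo_send_dgrams(struct demo_listener *li, demo_send_fn snd,
                                void *ctx, const unsigned char *buf,
                                size_t len, size_t dgram_size)
{
	size_t off = 0;
	ssize_t ret;

	if (!dgram_size)
		return -1;

	if (!li->gso_notsupp && len > dgram_size) {
		ret = snd(buf, len, dgram_size, ctx);
		if (ret >= 0 || errno != EIO)
			return ret;
		/* GSO rejected by the device: never try it again here */
		li->gso_notsupp = 1;
	}

	/* non-GSO fallback: one emission per datagram */
	while (off < len) {
		size_t chunk = len - off < dgram_size ? len - off : dgram_size;

		ret = snd(buf + off, chunk, 0, ctx);
		if (ret < 0)
			return -1;
		off += chunk;
	}
	return (ssize_t)off;
}

/* Fake sender for experimentation: rejects any GSO attempt with EIO and
 * counts plain emissions in the int pointed to by <ctx>.
 */
static ssize_t demo_fake_send(const unsigned char *buf, size_t len,
                              size_t gso_size, void *ctx)
{
	(void)buf;
	if (gso_size) {
		errno = EIO;
		return -1;
	}
	(*(int *)ctx)++;
	return (ssize_t)len;
}
```

With the fake sender, a 3000-byte buffer and 1200-byte datagrams lead to one failed GSO attempt followed by three individual emissions, and the listener flag stays set for later calls.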
Amaury Denoyelle	448d3d388a	MINOR: quic: add GSO parameter on quic_sock send API Add <gso_size> parameter to qc_snd_buf(). When non-null, this specifies the value for socket option SOL_UDP/UDP_SEGMENT. This allows to send several datagrams in a single call by splitting data multiple times at <gso_size> boundary. For now, <gso_size> remains set to 0 by caller, as such there should not be any functional change.	2024-07-11 11:02:44 +02:00
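As an illustration of the SOL_UDP/UDP_SEGMENT ancillary data mentioned in the row above, here is a minimal sketch. `demo_set_gso()` is a hypothetical helper, not haproxy's function, and the fallback `#define` values are assumptions taken from Linux headers for systems whose headers lack them.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stdint.h>
#include <string.h>

#ifndef SOL_UDP
#define SOL_UDP 17        /* assumed: same value as IPPROTO_UDP */
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103   /* assumed: value from Linux <linux/udp.h> */
#endif

/* Hypothetical helper: attach a UDP_SEGMENT control message carrying
 * <gso_size> to <msg>. <ctrl> must be suitably aligned for a struct
 * cmsghdr and hold at least CMSG_SPACE(sizeof(uint16_t)) bytes. A zero
 * <gso_size> leaves <msg> without ancillary data, which corresponds to a
 * plain single-datagram emission.
 */
static void demo_set_gso(struct msghdr *msg, void *ctrl, size_t ctrl_len,
                         uint16_t gso_size)
{
	struct cmsghdr *c;

	msg->msg_control = NULL;
	msg->msg_controllen = 0;
	if (!gso_size || ctrl_len < CMSG_SPACE(sizeof(gso_size)))
		return;

	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(gso_size));

	c = CMSG_FIRSTHDR(msg);
	c->cmsg_level = SOL_UDP;
	c->cmsg_type = UDP_SEGMENT;
	c->cmsg_len = CMSG_LEN(sizeof(gso_size));
	/* memcpy() avoids a type-punned store into the cmsg payload */
	memcpy(CMSG_DATA(c), &gso_size, sizeof(gso_size));
}
```

A caller would then pass `msg` to sendmsg(). The kernel is expected to split the payload at `gso_size` boundaries only on systems and devices that support UDP GSO.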
Willy Tarreau	4e65fc66f6	MAJOR: import: update mt_list to support exponential back-off (try #2) This is the second attempt at importing the updated mt_list code (commit 59459ea3). The previous one was attempted with commit c618ed5ff4 ("MAJOR: import: update mt_list to support exponential back-off") but revealed problems with QUIC connections and was reverted. The problem that was faced was that elements deleted inside an iterator were no longer reset, and that if they were to be recycled in this form, they could appear as busy to the next user. This was trivially reproduced with this: $ cat quic-repro.cfg global stats socket /tmp/sock1 level admin stats timeout 1h limited-quic frontend stats mode http bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3 timeout client 5s stats uri / $ ./haproxy -db -f quic-repro.cfg & $ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/ => hang This was purely an API issue caused by the simplified usage of the macros for the iterator. The original version had two backups (one full element and one pointer) that the user had to take care of, while the new one only uses one that is transparent for the user. But during removal, the element still has to be unlocked if it's going to be reused. All of this sparked discussions with Fred and Aurélien regarding the still unclear state of locking. It was found that the lock API does too much at once and is lacking granularity. The new version offers a much more fine-grained control allowing to selectively lock/unlock an element, a link, the rest of the list etc. It was also found that plenty of places just want to free the current element, or delete it to do anything with it, hence don't need to reset its pointers (e.g. event_hdl). 
Finally it appeared obvious that the root cause of the problem was the unclear usage of the list iterators themselves because one does not necessarily expect the element to be presented locked when not needed, which makes the unlock easy to overlook during reviews. The updated version of the list presents explicit lock status in the macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED suffix, the caller is expected to unlock the element if it intends to reuse it. At least the status is advertised. The _UNLOCKED variant, instead, always unlocks it before starting the loop block. This means it's not necessary to think about unlocking it, though it's obviously not usable with everything. A few _UNLOCKED were used at obvious places (i.e. where the element is deleted and freed without any prior check). Interestingly, the tests performed last year on QUIC forwarding, that resulted in limited traffic for the original version and higher bit rate for the new one couldn't be reproduced because since then the QUIC stack has gained in efficiency, and the 100 Gbps barrier is now reached with or without the mt_list update. However the unit tests definitely show a huge difference, particularly on EPYC platforms where the EBO provides tremendous CPU savings. Overall, the following changes are visible from the application code: - mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr => MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED() + 1 back elem - MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED() => just manually set iterator to NULL however. For MT_LIST_FOR_EACH_ENTRY_LOCKED() => mt_list_unlock_self() (if element going to be reused) + NULL - MT_LIST_LOCK_ELT => mt_list_lock_full() - MT_LIST_UNLOCK_ELT => mt_list_unlock_full() - l = MT_LIST_APPEND_LOCKED(h, e); MT_LIST_UNLOCK_ELT(); => l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)	2024-07-09 16:46:38 +02:00
Amaury Denoyelle	b9f67a46a2	MINOR: quic: clarify doc for quic_recv() Just highlight the fact that quic_recv() only receive a single datagram.	2024-05-24 14:36:31 +02:00
Amaury Denoyelle	45f40bac4c	MEDIUM: config: prevent communication with privileged ports This commit introduces a new global setting named harden.reject_privileged_ports.{tcp|quic}. When active, communications with clients which use privileged source ports are forbidden. Such behavior is considered suspicious as it can be used for spoofing or DNS/NTP amplification attacks. The value is configured per transport protocol. For TCP and QUIC, distinct code locations are impacted by this setting. The first one is in sock_accept_conn() which acts as a filter for all TCP based communications just after accept() returns a new connection. The second one is dedicated to QUIC communication in quic_recv(). In both cases, if a privileged source port is used and the setting is enabled, the received message is silently dropped. By default, protections are disabled for both protocols. This is to be able to backport it without breaking changes on stable releases. This should be backported as it is an interesting security feature yet relatively simple to implement.	2024-05-24 14:36:31 +02:00
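The filtering rule from the row above can be sketched as a small predicate. `demo_reject_privileged_port()` is a hypothetical name. The 1024 threshold is the conventional boundary for privileged ports and is an assumption here, as the commit message does not state the exact test used.

```c
#include <stdint.h>

/* Hypothetical predicate: return 1 if a message received from source
 * port <src_port> (host byte order) must be silently dropped, 0
 * otherwise. <hardening_enabled> stands for the per-protocol
 * harden.reject_privileged_ports setting, which is off by default.
 */
static int demo_reject_privileged_port(uint16_t src_port, int hardening_enabled)
{
	/* assumption: privileged ports are those below 1024 */
	return hardening_enabled && src_port < 1024;
}
```

For instance, a datagram coming from source port 53 or 123 would be dropped once the setting is enabled, while one from an ephemeral port such as 50000 is accepted either way.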
Willy Tarreau	72d0dcda8e	MINOR: dynbuf: pass a criticality argument to b_alloc() The goal is to indicate how critical the allocation is, between the least one (growing an existing buffer ring) and the topmost one (boot time allocation for the life of the process). The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function to try to allocate otherwise subscribe. There's currently no distinction of direction nor part that tries to allocate, and this should be revisited to improve this situation, particularly when we consider that mux-h2 can reduce its Tx allocations if needed. For now, 4 main levels are planned, to translate how the data travels inside haproxy from a producer to a consumer: - MUX_RX: buffer used to receive data from the OS - SE_RX: buffer used to place a transformation of the RX data for a mux, or to produce a response for an applet - CHANNEL: the channel buffer for sync recv - MUX_TX: buffer used to transfer data from the channel to the outside, generally a mux but there can be a few specificities (e.g. http client's response buffer passed to the application, which also gets a transformation of the channel data). The other levels are a bit different in that they don't strictly need to allocate for the first two ones, or they're permanent for the last one (used by compression).	2024-05-10 17:18:13 +02:00
Amaury Denoyelle	d8f1ff8648	BUG/MEDIUM: quic: fix connection freeze on post handshake After handshake completion, QUIC server is responsible to emit HANDSHAKE_DONE frame. Some clients wait for it to begin STREAM transfers. Previously, there was no explicit tasklet_wakeup() after handshake completion, which is necessary to emit post-handshake frames. In most cases, this was undetected as most clients continue emission which will reschedule the tasklet. However, as there is no tasklet_wakeup(), this is not a consistent behavior. If this bug occurs, it causes a connection freeze, preventing the client from emitting any request. The connection is finally closed on idle timeout. To fix this, add an explicit tasklet_wakeup() after handshake completion. It sounds simple enough but in fact it's difficult to find the correct location for tasklet_wakeup() invocation, as post-handshake is directly linked to connection accept, with different orderings. Notably, if 0-RTT is used, connection can be accepted prior to handshake completion. Another major point is that along with the HANDSHAKE_DONE frame, a series of NEW_CONNECTION_ID frames are emitted. However, these new CIDs allocation must occur after connection is migrated to its new thread as these CIDs are tied to it. A BUG_ON() is present to check this in qc_set_tid_affinity(). With all this in mind, 2 locations were selected for the necessary tasklet_wakeup() : * on qc_xprt_start() : this is useful for standard case without 0-RTT. This ensures that this is done only after connection thread migration. * on qc_ssl_provide_all_quic_data() : this is done on handshake completion with 0-RTT used. In this case only, connection is already accepted and migrated, so tasklet_wakeup() is safe. Note that as a side-change, quic_accept_push_qc() API has evolved to better reflect differences between standard and 0-RTT usages. It is now forbidden to call it multiple times on a single quic_conn instance. A BUG_ON() has been added. 
This issue is labelled as medium even though it seems pretty rare. It was only reproducible using QUIC interop runner, with haproxy compiled with LibreSSL with quic-go as client. However, affected code parts are pretty sensitive, which justifies the chosen severity. This should fix github issue #2418. It should be backported up to 2.6, after a brief period of observation. Note that the extra comment added in qc_set_tid_affinity() can be removed in 2.6 as thread migration is not implemented for this version. Other parts should apply without conflict.	2024-03-06 10:39:57 +01:00
Amaury Denoyelle	a17eaf7763	BUG/MINOR: quic: initialize msg_flags before sendmsg Previously, msghdr struct used for sendmsg was memset to 0. This was updated for performance reasons with each member individually defined. This is done by the following commit : commit 107d6d75465419a09d90c790edb617091a04468a OPTIM: quic: improve slightly qc_snd_buf() internal msg_flags is the only member unset, as sendmsg manual page reports that it is unused. However, this caused a coverity report. In the end, it is better to explicitly set it to 0 to avoid any future questions, compiler warnings or even portability issues. This should fix coverity report from github issue #2455. No need to backport unless above patch is.	2024-02-21 10:13:53 +01:00
Amaury Denoyelle	8b950f40fa	MINOR: quic: only use sendmsg() syscall variant This patch is the direct followup of the previous one : MINOR: quic: remove sendto() usage variant This finalizes qc_snd_buf() simplification by removing send() syscall usage for quic-conn owned socket. Syscall invocation is merged in a single code location to the sendmsg() variant. The only difference for owned socket is that destination address for sendmsg() is set to NULL. This usage is documented in man 2 sendmsg as valid for connected sockets. This allows maximum performance by avoiding unnecessary lookups on kernel socket address tables. As the previous patch, no functional change should happen here. However, it will be simpler to extend qc_snd_buf() for GSO usage.	2024-02-20 16:42:05 +01:00
Amaury Denoyelle	8de9f8f193	MINOR: quic: remove sendto() usage variant qc_snd_buf() is a wrapper around emission syscalls. Given QUIC configuration, a different variant is used. When using connection socket, send() is the only used. For listener sockets, sendmsg() and sendto() are possible. The first one is used only if local address has been retrieved prior. This allows to fix it on sending to guarantee the source address selection. Finally, sendto() is used for systems which do not support local address retrieval. All of these variants render the code too complex. As such, this patch simplifies this by removing sendto() alternative. Now, sendmsg() is always used for listener sockets. Source address is then specified only if supported by the system. This patch should not exhibit functional behavior changes. It will be useful when implementing GSO as the code is now simpler.	2024-02-20 16:42:05 +01:00
Amaury Denoyelle	ea90c39302	MINOR: quic: move IP_PKTINFO on send on a dedicated function When using listener socket, source address for emission is explicitly set using ancillary data for sendmsg(). This is useful to guarantee the correct address is used when binding on a non-explicit address. This code was implemented directly under qc_snd_buf(). However, it is quite complex due to portability issues. For IPv4, two parallel implementations coexist, defined under IP_PKTINFO or IP_RECVDSTADDR. For IPv6, another option is defined under IPV6_RECVPKTINFO. Each variant uses its distinct name which increases the code complexity. Extract ancillary data filling in a dedicated function named cmsg_set_saddr(). This greatly reduces the body of qc_snd_buf(). Such functions can be replicated when other ancillary data types are implemented. This will notably be useful for GSO implementation.	2024-02-20 16:42:05 +01:00
Amaury Denoyelle	107d6d7546	OPTIM: quic: improve slightly qc_snd_buf() internal qc_snd_buf() is a wrapper for sendmsg() syscall (or its derivatives) used for all QUIC emissions. This patch aims at removing several non-optimal code sections : * fd_send_ready() for connected sockets is only checked on the function preamble instead of inside the emission loop * zero-ing msghdr structure for unconnected sockets is removed. This is unnecessary as all fields are properly initialized then. * extra memcpy/memset invocations when using IP_PKTINFO/IPV6_RECVPKTINFO are removed by setting directly the address value into cmsg buffer	2024-02-20 16:42:05 +01:00
Amaury Denoyelle	5b31989a3f	BUG/MEDIUM: quic: fix transient send error with listener socket Transient send errors are handled differently if using connection or listener socket for QUIC transfers. In the first case, proper poller subscription is used via fd_cant_send()/fd_want_send(). For the listener socket case, error is ignored by qc_snd_buf() caller and retransmission mechanism will allow to reemit the data. For listener socket, transient error code handling is buggy. It blindly uses fd_cant_send() with <qc.fd> member which is set to -1 for listener socket usage. This results in an invalid fdtab access, with a possible crash or a modification of a totally unrelated FD. This bug is simply fixed by using qc_test_fd() before using fd_cant_send()/fd_want_send(). This ensures <qc.fd> is used only if initialized which is only the case when using connection socket. No crash was reported yet for this bug. However, it is reproducible by using ASAN compilation and the following strace sendmsg() errno command injection : # strace -qq -yy -p $(pgrep haproxy) -f -e trace=%network \ -e inject=sendto,sendmsg:error=EAGAIN:when=20+20 This must be backported up to 2.7.	2024-02-19 17:56:51 +01:00
Frédéric Lécaille	f74d882ef0	REORG: quic: Move the QUIC DCID parser to quic_sock.c Move quic_get_dgram_dcid() from quic_conn.c to quic_sock.c because only used in this file and define it as static.	2023-11-28 15:37:50 +01:00
Frédéric Lécaille	0fc0d45745	REORG: quic: Add a new module to handle QUIC connection IDs Move quic_cid and quic_connection_id from quic_conn-t.h to new quic_cid-t.h header. Move definitions of quic_stateless_reset_token_init(), quic_derive_cid(), new_quic_cid(), quic_get_cid_tid() and retrieve_qc_conn_from_cid() to quic_cid.c new C file.	2023-11-28 15:37:22 +01:00
Ilya Shipitsin	80813cdd2a	CLEANUP: assorted typo fixes in the code and comments This is 37th iteration of typo fixes	2023-11-23 16:23:14 +01:00
Amaury Denoyelle	bb28215d9b	MEDIUM: quic: define an accept queue limit QUIC connections are pushed manually into a dedicated listener queue when they are ready to be accepted. This happens after handshake finalization or on 0-RTT packet reception. Listener is then woken up to dequeue them with listener_accept(). This patch accounts for the number of connections currently stored in the accept queue. If reaching a certain limit, INITIAL packets are dropped on reception to prevent further QUIC connections allocation. This should help to preserve system resources. This limit is automatically derived from the listener backlog. Half of its value is reserved for handshakes and the other half for accept queues. By default, backlog is equal to maxconn which guarantees that there cannot be more than maxconn connections in handshake or waiting to be accepted.	2023-11-09 16:24:00 +01:00
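The backlog split described in the row above can be sketched as below. The structure, field names and `demo_may_accept_initial()` are hypothetical, and the integer division used for odd backlog values is an assumption, since the commit message only says that each half of the backlog is reserved for one purpose.

```c
/* Hypothetical per-listener counters, for illustration only. */
struct demo_quic_limits {
	unsigned int backlog;      /* listener backlog, defaults to maxconn */
	unsigned int handshakes;   /* connections currently in handshake */
	unsigned int accept_queue; /* connections waiting to be accepted */
};

/* Return 1 if an INITIAL packet may allocate a new connection, 0 if it
 * must be dropped. Half of the backlog bounds the handshakes, the other
 * half bounds the accept queue.
 */
static int demo_may_accept_initial(const struct demo_quic_limits *l)
{
	unsigned int half = l->backlog / 2; /* assumption: rounded down */

	return l->handshakes < half && l->accept_queue < half;
}
```

With a backlog of 100, new connections are refused as soon as either 50 handshakes are in progress or 50 connections are waiting in the accept queue, so the total never exceeds the backlog.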
Amaury Denoyelle	3df6a60113	MEDIUM: quic: limit handshake per listener Implement a limit per listener for concurrent number of QUIC connections. When reached, INITIAL packets for new connections are automatically dropped until the number of handshakes is reduced. The limit value is automatically based on listener backlog, which itself defaults to maxconn. This feature is important to ensure CPU and memory resources are not consumed if too many handshake attempts are started in parallel. Special care is taken if a connection is released before handshake completion. In this case, counter must be decremented. This forces to ensure that member <qc.state> is set early in qc_new_conn() before any quic_conn_release() invocation.	2023-11-09 16:23:52 +01:00
Amaury Denoyelle	f59f8326f9	REORG: quic: cleanup traces definition Move all QUIC trace definitions from quic_conn.h to quic_trace-t.h. Also remove the multiple definitions of the trace_quic macro in favor of a single one in quic_trace.h. This forces all QUIC source files which rely on traces to include it while reducing the size of quic_conn.h.	2023-10-11 14:15:31 +02:00
Amaury Denoyelle	2ac5d9a657	MINOR: quic: handle perm error on bind during runtime Improve the handling of EACCES permission errors encountered when using QUIC connection socket at runtime : * First occurrence of the error on the process will generate a log warning. This should prevent users from using a privileged port without mandatory access rights. * Socket mode will automatically fallback to listener socket for the receiver instance. This requires to duplicate the settings from the bind_conf to the receiver instance to support configurations with multiple addresses on the same bind line.	2023-10-03 16:52:02 +02:00
Willy Tarreau	6cbb5a057b	Revert "MAJOR: import: update mt_list to support exponential back-off" This reverts commit c618ed5ff41ce29454e784c610b23bad0ea21f4f. The list iterator is broken. As found by Fred, running QUIC single- threaded shows that only the first connection is accepted because the accepter relies on the element being initialized once detached (which is expected and matches what MT_LIST_DELETE_SAFE() used to do before). However while doing this in the quic_sock code seems to work, doing it inside the macro show total breakage and the unit test doesn't work anymore (random crashes). Thus it looks like the fix is not trivial, let's roll this back for the time it will take to fix the loop.	2023-09-15 17:13:43 +02:00
Willy Tarreau	c618ed5ff4	MAJOR: import: update mt_list to support exponential back-off The new mt_list code supports exponential back-off on conflict, which is important for use cases where there is contention on a large number of threads. The API evolved a little bit and required some updates: - mt_list_for_each_entry_safe() is now in upper case to explicitly show that it is a macro, and only uses the back element, doesn't require a secondary pointer for deletes anymore. - MT_LIST_DELETE_SAFE() doesn't exist anymore, instead one just has to set the list iterator to NULL so that it is not re-inserted into the list and the list is spliced there. One must be careful because it was usually performed before freeing the element. Now instead the element must be nulled before the continue/break. - MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been unclear. They were replaced by mt_list_cut_around() and mt_list_connect_elem() which more explicitly detach the element and reconnect it into the list. - MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is in list.h. It may however possibly benefit from being upstreamed. This required tiny adaptations to event_hdl.c and quic_sock.c. The test case was updated and the API doc added. Note that in order to keep include files small, the struct mt_list definition remains in list-t.h (part of the internal API) and was ifdef'd out in mt_list.h. A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with 80 threads shows a drastic reduction of CPU usage thanks to this and the refined memory barriers. Please note that the CPU usage on OpenSSL 3.0.9 is significantly higher due to the excessive use of atomic ops by openssl, but 3.1 is only slightly above 1.1.1 though: - before: 35 Gbps, 3.5 Mpps, 7800% CPU - after: 41 Gbps, 4.2 Mpps, 2900% CPU	2023-09-13 11:50:33 +02:00
Amaury Denoyelle	7f80d51812	BUG/MEDIUM: quic: fix tasklet_wakeup loop on connection closing It is possible to trigger a loop of tasklets calls if a QUIC connection is interrupted abruptly by the client. This is caused by the following interaction : * FD iocb is woken up for read. This causes a wakeup on quic_conn tasklet. * quic_conn_io_cb is run and tries to read but fails as the connection socket is closed (typically with a ECONNREFUSED). FD read is subscribed to the poller via qc_rcv_buf() which will cause the loop. The looping will stop automatically once the idle-timeout is expired and the connection instance is finally released. To fix this, ensure FD read is subscribed only for transient error cases (EAGAIN or similar). All other cases are considered as fatal and thus all future read operations will fail. Note that for the moment, nothing is reported on the quic_conn which may not skip future reception. This should be improved in a future commit to accelerate connection closing. This bug can be reproduced frequently by interrupting the following command. Quic traces should be activated on haproxy side to detect the loop : $ ngtcp2-client --tp-file=/tmp/ngtcp2-tp.txt \ --session-file=/tmp/ngtcp2-session.txt \ -r 0.3 -t 0.3 --exit-on-all-streams-close 127.0.0.1 20443 \ "http://127.0.0.1:20443/?s=1024" This must be backported up to 2.7.	2023-08-11 17:04:20 +02:00
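The distinction between transient and fatal read errors made in the row above can be sketched as follows. `demo_recv_err_is_transient()` is a hypothetical helper, and the exact errno set that counts as "EAGAIN or similar" is an assumption: only EAGAIN and EWOULDBLOCK are treated as transient here.

```c
#include <errno.h>

/* Hypothetical classifier: return 1 if <err> (an errno value reported by
 * a failed receive call) is transient, in which case re-subscribing the
 * FD for reading makes sense. Return 0 for anything else, which is
 * treated as fatal so the poller is not asked to wake the tasklet again.
 */
static int demo_recv_err_is_transient(int err)
{
	return err == EAGAIN || err == EWOULDBLOCK;
}
```

With this rule, ECONNREFUSED on a closed connection socket no longer leads to a new read subscription, which is what breaks the wakeup loop in the scenario above.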
Frédéric Lécaille	5d602f4f33	MINOR: quic: Add a trace for QUIC conn fd ready for receive Add a trace as this is done for the "send ready" fd state.	2023-08-11 08:57:47 +02:00
Amaury Denoyelle	f40a72a7ff	BUILD: quic: fix wrong potential NULL dereference GCC warns about a possible NULL dereference when requeuing a datagram on the connection socket. This happens due to a MT_LIST_POP to retrieve a rxbuf instance. In fact, this can never be NULL as there are enough rxbuf instances allocated for each thread. Once a thread has finished working with it, it must always reappend it. This issue was introduced with the following patch : commit b34d353968db7f646e83871cb6b21a246af84ddc BUG/MEDIUM: quic: consume contig space on requeue datagram As such, it must be backported in every version with the above commit. This should fix the github CI compilation error.	2023-08-04 15:42:34 +02:00
Amaury Denoyelle	f59635c495	BUG/MINOR: quic: reappend rxbuf buffer on fake dgram alloc error A thread must always reappend the rxbuf instance after finishing datagram reception treatment. This was not the case on one error code path : when fake datagram allocation fails on datagram requeuing. This issue was introduced with the following patch : commit b34d353968db7f646e83871cb6b21a246af84ddc BUG/MEDIUM: quic: consume contig space on requeue datagram As such, it must be backported in every version with the above commit.	2023-08-04 15:42:30 +02:00
Amaury Denoyelle	b34d353968	BUG/MEDIUM: quic: consume contig space on requeue datagram When handling UDP datagram reception, it is possible to receive a QUIC packet for one connection to the socket attached to another connection. To protect against this, an explicit comparison is done against the packet DCID and the quic-conn CID. On no match, the datagram is requeued and dispatched via rxbuf and will be treated as if it arrived on the listener socket. One reason for this wrong reception is explained by the small race condition that exists between bind() and connect() syscalls during connection socket initialization. However, one other reason which was not thought initially is when clients reuse the same IP:PORT for different connections. In this case the current FD attribution is not optimal and this can cause a substantial number of requeuing. This situation has revealed a bug during requeuing. If rxbuf contig space is not big enough for the datagram, the incoming datagram was dropped, even if there is space at buffer origin. This can cause several datagrams to be dropped in a series until eventually buffer head is moved when passing through the listener FD. To fix this, allocate a fake datagram to consume contig space. This is similar to the handling of datagrams on the listener FD. This allows then to store the datagram to requeue on buffer head and continue. This can be reproduced by starting a lot of connections. To increase the phenomena, POST are used to increase the number of datagram dropping : $ while true; do curl -F "a=@~/50k" -k --http3-only -o /dev/null https://127.0.0.1:20443/; done	2023-08-04 14:27:40 +02:00
Frédéric Lécaille	444c1a4113	MINOR: quic: Split QUIC connection code into three parts Move the TX part of the code to quic_tx.c. Add quic_tx-t.h and quic_tx.h headers for this TX part code. The definition of quic_tx_packet struct has been moved from quic_conn-t.h to quic_tx-t.h. Same thing for the RX part: Move the RX part of the code to quic_rx.c. Add quic_rx-t.h and quic_rx.h headers for this RX part code. The definition of quic_rx_packet struct has been moved from quic_conn-t.h to quic_rx-t.h.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	bdd64fd71d	MINOR: quic: Add some counters at QUIC connection level Add some statistical counters to quic_conn struct from quic_counters struct which are used at listener level to handle them at QUIC connection level. This avoids calling atomic functions. Furthermore this will be useful soon when a counter will be added for the total number of packets which have been sent which will be very often incremented. Some counters were not added, especially those which count the number of QUIC errors by QUIC error types. Indeed such counters would be incremented most of the time only one time at QUIC connection level. Implement quic_conn_prx_cntrs_update() which accumulates the QUIC connection level statistical counters to the listener level statistical counters. Must be backported to 2.7.	2023-05-24 16:30:11 +02:00
Frédéric Lécaille	76d502588d	BUG/MINOR: quic: Wrong redispatch for external data on connection socket It is possible to receive a datagram from another connection on a dedicated quic-conn socket. This is due to a race condition between bind() and connect() system calls. To handle this, an explicit check is done on each datagram. If the DCID is not associated to the connection which owns the socket, the datagram is redispatched as if it arrived on the listener socket. This redispatch step was not properly done because the source address specified for the redispatch function was incorrect. Instead of using the datagram source address, we used the address of the socket quic-conn which received the datagram due to the above race condition. Fix this simply by using the address from the recvmsg() system call. The impact of this bug is minor as redispatch on connection socket should be really rare. However, when it happens it can lead to several kinds of problems, like for example a connection initialized with an incorrect peer address. It can also break the Retry token check as this relies on the peer address. In fact, Retry token check failure was the reason this bug was found. When using h2load with thousands of clients, the counter of Retry token failure was unusually high. With this patch, no failure is reported anymore for Retry. Must be backported to 2.7.	2023-05-12 14:48:30 +02:00
Amaury Denoyelle	1bcb695a05	MINOR: quic: use real sending rate measurement Before this patch, global sending rate was measured on the QUIC lower layer just after sendto(). This meant that all QUIC frames were accounted for, including non STREAM frames and also retransmission. To have a better reflection of the application data transferred, move the incrementation into the MUX layer. This allows to account only for STREAM frames payload on their first emission. This should be backported up to 2.6.	2023-04-28 16:52:26 +02:00
Frédéric Lécaille	7d23e8d1a6	CLEANUP: quic: Rename several <buf> variables into quic_sock.c Rename some variables which are not struct buffer variables. Should be backported to 2.7.	2023-04-24 15:53:27 +02:00
Tim Duesterhus	c18e244515	CLEANUP: Stop checking the pointer before calling `pool_free()` Changes performed with this Coccinelle patch: @@ expression e; expression p; @@ - if (e != NULL) { pool_free(p, e); - } @@ expression e; expression p; @@ - if (e) { pool_free(p, e); - } @@ expression e; expression p; @@ - if (e) pool_free(p, e); @@ expression e; expression p; @@ - if (e != NULL) pool_free(p, e);	2023-04-23 00:28:25 +02:00
Willy Tarreau	6a4d48b736	MINOR: quic_sock: index li->per_thr[] on local thread id, not global one There's a li_per_thread array in each listener for use with QUIC listeners. Since thread groups were introduced, this array can be allocated too large because global.nbthread is allocated for each listener, while only no more than MIN(nbthread,MAX_THREADS_PER_GROUP) may be used by a single listener. This was because the global thread ID is used as the index instead of the local ID (since a listener may only be used by a single group). Let's just switch to local ID and reduce the allocated size.	2023-04-21 17:41:26 +02:00
Amaury Denoyelle	739de3f119	MINOR: quic: properly finalize thread rebinding When a quic_conn instance is rebound on a new thread its tasks and tasklet are destroyed and new ones created. Its socket is also migrated to a new thread which stops reception on it. To properly reactivate a quic_conn after rebind, wake up its tasks and tasklet if they were active before thread rebind. Also reactivate reading on the socket FD. These operations are implemented in a new function qc_finalize_affinity_rebind(). This should be backported up to 2.7 after a period of observation.	2023-04-18 17:09:02 +02:00
Amaury Denoyelle	987812b190	MINOR: quic: do not proceed to accept for closing conn Each quic_conn is inserted in an accept queue to allocate the upper layers. This is done through a listener tasklet in quic_sock_accept_conn(). This patch interrupts the accept process for a quic_conn in closing/draining state. Indeed, this connection will soon be closed so it's unnecessary to allocate a complete stack for it. This patch will become necessary when thread migration is implemented. Indeed, it won't be allowed to proceed to thread migration for a closing quic_conn. This should be backported up to 2.7 after a period of observation.	2023-04-18 16:54:48 +02:00
Amaury Denoyelle	f16ec344d5	MEDIUM: quic: handle conn bootstrap/handshake on a random thread TID encoding in CID was removed by a recent change. It is now possible to access the <tid> member stored in quic_connection_id instance. For unknown CID, a quick solution was to redispatch to the thread corresponding to the first CID byte. This ensures that an identical CID will always be handled by the same thread to avoid creating the same connection multiple times. However, this forces an uneven load repartition which can be critical for QUIC handshake operation. To improve this, remove the above constraint. An unknown CID is now handled by its receiving thread. However, this means that if multiple packets are received with the same unknown CID, several threads will try to allocate the same connection. To prevent this race condition, CID insertion in global tree is now conducted first before creating the connection. This is a thread-safe operation which can only be executed by a single thread. The thread which has inserted the CID will then proceed to quic_conn allocation. Other threads won't be able to insert the same CID : this will stop the treatment of the current packet which is redispatched to the now owning thread. This should be backported up to 2.7 after a period of observation.	2023-04-18 16:54:44 +02:00
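
The "insert the CID first, then allocate the connection" scheme described in commit f16ec344d5 above can be illustrated with a minimal sketch. This is not haproxy's code: the real implementation inserts into a global CID tree, whereas this sketch assumes a single slot claimed with a C11 compare-and-swap as a stand-in. The names `cid_slot`, `cid_slot_init()`, `cid_claim()` and `CID_NO_OWNER` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical marker meaning "no thread has inserted this CID yet". */
#define CID_NO_OWNER (-1)

/* Hypothetical stand-in for one entry of the global CID tree: it only
 * records the ID of the thread which inserted the CID first.
 */
struct cid_slot {
    atomic_int owner_tid;
};

/* Mark the slot as not yet owned by any thread. */
void cid_slot_init(struct cid_slot *slot)
{
    atomic_init(&slot->owner_tid, CID_NO_OWNER);
}

/* Try to claim the CID for thread <tid> before any connection is allocated.
 * Returns the ID of the owning thread: <tid> itself if this call won the
 * race (or already owned the slot), otherwise the ID of the thread which
 * inserted the CID first. In the latter case the caller would stop handling
 * the packet and redispatch it to the returned thread.
 */
int cid_claim(struct cid_slot *slot, int tid)
{
    int expected = CID_NO_OWNER;

    assert(tid >= 0);
    /* Only one thread can swap CID_NO_OWNER for its own ID. */
    if (atomic_compare_exchange_strong(&slot->owner_tid, &expected, tid))
        return tid;
    /* On failure, <expected> now holds the current owner's ID. */
    return expected;
}
```

A receiving thread would call `cid_claim()` with its own ID and only allocate the connection when the returned value equals that ID. Because the claim is a single atomic operation, two threads receiving packets with the same unknown CID cannot both proceed to allocation.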

144 Commits