haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-09-01 03:51:28 +02:00

Author	SHA1	Message	Date
Willy Tarreau	dd900aead8	BUILD: quic_sock: address a strict-aliasing build warning with gcc 5 and 6 The UDP GSO code emits a build warning with older toolchains (gcc 5 and 6): src/quic_sock.c: In function 'cmsg_set_gso': src/quic_sock.c:683:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] *((uint16_t *)CMSG_DATA(c)) = gso_size; ^ Let's just use the write_u16() function that's made for this purpose. It was verified that for all versions from 5 to 13, gcc produces the exact same code with the fix (and without the warning). It arrived in 3.1 with commit 448d3d388a ("MINOR: quic: add GSO parameter on quic_sock send API") so this can be backported there.	2025-04-02 16:07:31 +02:00
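The warning above comes from storing through a cast pointer. A minimal sketch of the aliasing-safe alternative follows, assuming a memcpy()-based helper; store_u16() and load_u16() are hypothetical names used for illustration and are not haproxy's write_u16() implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores a 16-bit value at an arbitrary address
 * without dereferencing a type-punned pointer. Compilers typically turn
 * this fixed-size memcpy() into a single store instruction.
 */
static inline void store_u16(void *p, uint16_t v)
{
	assert(p != NULL);
	memcpy(p, &v, sizeof(v));
}

/* Hypothetical counterpart: reads the value back the same way. */
static inline uint16_t load_u16(const void *p)
{
	uint16_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}
```

Because the copy goes through memcpy(), it is also valid when the destination is not 2-byte aligned, which a direct cast does not guarantee.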
Willy Tarreau	7760e3a374	CLEANUP: quic: replace ALREADY_CHECKED() with ASSUME_NONNULL() at a few places There were 4 instances of ALREADY_CHECKED() used to tell the compiler that the argument couldn't be NULL by design. Let's change them to the cleaner ASSUME_NONNULL(). Functions like qc_snd_buf() were slightly reduced in size (-24 bytes). Apparently gcc-13 sees a potential case that others don't see, and it's likely a bug since depending what is masked, it will completely change the output warnings to the point of contradicting itself. After many attempts, it appears that just checking that CMSG_FIRSTHDR(msg) is not null suffices to calm it down, so the strange warnings might have been the result of an overoptimization based on a supposed UB in the first place. At least now all versions up to 13.2 as well as clang are happy.	2024-12-17 17:47:57 +01:00
Willy Tarreau	2bc513dd31	BUILD: quic: fix build errors on FreeBSD since recent GSO changes The following commits broke the build on FreeBSD when QUIC is enabled: 35470d518 ("MINOR: quic: activate UDP GSO for QUIC if supported") 448d3d388 ("MINOR: quic: add GSO parameter on quic_sock send API") Indeed, it turns out that netinet/udp.h requires sys/types.h to be included before. Let's just change the includes order to fix the build. No backport is needed.	2024-08-30 18:53:49 +02:00
Amaury Denoyelle	85131f91bf	BUG/MEDIUM: quic: fix invalid conn reject with CONNECTION_REFUSED quic-initial rules were implemented just recently. For some actions, a new flags field was added in quic_dgram structure. This is used to report the result of the rules execution. However, this flags field was left uninitialized. Depending on its value, it may cause the connection to be wrongly rejected via CONNECTION_REFUSED. Fix this by properly setting the flags value to 0. No need to backport.	2024-07-26 15:24:35 +02:00
Amaury Denoyelle	f91be2657e	MINOR: quic: pass quic_dgram as obj_type for quic-initial rules To extend quic-initial rules, pass quic_dgram instance to argument for the various actions. As such, quic_dgram is now supported as an obj_type and can be used in session origin field.	2024-07-25 15:39:39 +02:00
Amaury Denoyelle	d0ea173e35	MEDIUM: quic: implement GSO fallback mechanism UDP GSO on Linux is not implemented in every network device. For example, it is not available for veth devices, which are frequently used in container environments. In such a case, EIO is reported on send() invocation. It is impossible to test at startup for proper GSO support in this case as a listener may be bound on multiple network interfaces. Furthermore, network interfaces may change during haproxy lifetime. As such, the only option is to react on send syscall error when GSO is used. The purpose of this patch is to implement a fallback when encountering such conditions. Emission can be retried immediately by trying to send each prepared datagram individually. To support this, qc_send_ppkts() is able to iterate over each datagram in a so-called non-GSO fallback mode. Between each emission, a datagram header is rewritten in front of the buffer which allows the sending loop to proceed until the last datagram is emitted. To complement this, the quic_conn listener is flagged on first GSO send error with value LI_F_UDP_GSO_NOTSUPP. This completely disables GSO for all future emissions with QUIC connections using this listener. For the moment, non-GSO fallback mode is activated when EIO is reported after GSO has been set. This is the error reported for the veth usage described above.	2024-07-11 11:02:44 +02:00
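The fallback described above can be sketched as follows. This is a simplified illustration, not the qc_send_ppkts() code: the send callback, struct tx_state, struct fake_dev and send_dgrams() are all hypothetical names, the buffer is assumed to hold equally sized datagrams back to back, and the per-datagram header rewriting mentioned in the commit is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical send callback: <gso_size> is 0 for a plain send, otherwise
 * the segment size requested from the kernel. Returns 0 on success or -1
 * with errno set.
 */
typedef int (*send_fn)(const unsigned char *buf, size_t len,
                       size_t gso_size, void *ctx);

/* Hypothetical state playing the role of a "GSO not supported" flag. */
struct tx_state {
	int gso_unsupported;
};

/* Test double imitating a device without GSO support (e.g. veth). */
struct fake_dev {
	int gso_attempts;
	int plain_sends;
};

static int fake_send(const unsigned char *buf, size_t len,
                     size_t gso_size, void *ctx)
{
	struct fake_dev *dev = ctx;

	(void)buf;
	(void)len;
	if (gso_size) {
		dev->gso_attempts++;
		errno = EIO; /* what the commit reports for veth */
		return -1;
	}
	dev->plain_sends++;
	return 0;
}

/* Tries one GSO send first; on EIO, remembers that GSO is unusable and
 * retries immediately by sending each datagram individually.
 */
static int send_dgrams(struct tx_state *st, const unsigned char *buf,
                       size_t len, size_t dgram_size, send_fn snd, void *ctx)
{
	assert(dgram_size > 0);

	if (!st->gso_unsupported && len > dgram_size) {
		if (snd(buf, len, dgram_size, ctx) == 0)
			return 0;
		if (errno != EIO)
			return -1;
		st->gso_unsupported = 1; /* never try GSO again */
	}

	while (len) {
		size_t chunk = len < dgram_size ? len : dgram_size;

		if (snd(buf, chunk, 0, ctx) != 0)
			return -1;
		buf += chunk;
		len -= chunk;
	}
	return 0;
}
```

With the fake device, a first call attempts GSO once and then falls back to individual sends; later calls skip the GSO attempt entirely because the flag is already set.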
Amaury Denoyelle	448d3d388a	MINOR: quic: add GSO parameter on quic_sock send API Add <gso_size> parameter to qc_snd_buf(). When non-null, this specifies the value for socket option SOL_UDP/UDP_SEGMENT. This allows to send several datagrams in a single call by splitting data multiple times at <gso_size> boundary. For now, <gso_size> remains set to 0 by caller, as such there should not be any functional change.	2024-07-11 11:02:44 +02:00
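As an illustration of the SOL_UDP/UDP_SEGMENT option mentioned above, here is a minimal sketch of filling the ancillary data of a msghdr on Linux. set_gso_cmsg() is a hypothetical name, and the fallback #define values are the Linux ones, provided only in case the system headers lack them; this is not the cmsg_set_gso() function from quic_sock.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>   /* kept before netinet/udp.h for portability */
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>

#ifndef SOL_UDP
#define SOL_UDP 17       /* Linux value, assumed when headers lack it */
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103  /* Linux value, assumed when headers lack it */
#endif

/* Writes a UDP_SEGMENT control message carrying <gso_size> as the first
 * cmsg of <msg>. The caller must have set msg_control to a suitably
 * aligned buffer of at least CMSG_SPACE(sizeof(uint16_t)) bytes and
 * msg_controllen to its size. Returns 0 on success, -1 if the control
 * buffer is missing or too small for a header.
 */
static int set_gso_cmsg(struct msghdr *msg, uint16_t gso_size)
{
	struct cmsghdr *c;

	assert(msg != NULL);

	c = CMSG_FIRSTHDR(msg);
	if (!c)
		return -1;

	c->cmsg_level = SOL_UDP;
	c->cmsg_type = UDP_SEGMENT;
	c->cmsg_len = CMSG_LEN(sizeof(gso_size));
	/* memcpy() avoids a type-punned store into the cmsg payload */
	memcpy(CMSG_DATA(c), &gso_size, sizeof(gso_size));
	return 0;
}
```

A caller would then pass the msghdr to sendmsg() with a payload made of several datagrams of gso_size bytes each, the last one possibly shorter.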
Willy Tarreau	4e65fc66f6	MAJOR: import: update mt_list to support exponential back-off (try #2) This is the second attempt at importing the updated mt_list code (commit 59459ea3). The previous one was attempted with commit c618ed5ff4 ("MAJOR: import: update mt_list to support exponential back-off") but revealed problems with QUIC connections and was reverted. The problem that was faced was that elements deleted inside an iterator were no longer reset, and that if they were to be recycled in this form, they could appear as busy to the next user. This was trivially reproduced with this: $ cat quic-repro.cfg global stats socket /tmp/sock1 level admin stats timeout 1h limited-quic frontend stats mode http bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3 timeout client 5s stats uri / $ ./haproxy -db -f quic-repro.cfg & $ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/ => hang This was purely an API issue caused by the simplified usage of the macros for the iterator. The original version had two backups (one full element and one pointer) that the user had to take care of, while the new one only uses one that is transparent for the user. But during removal, the element still has to be unlocked if it's going to be reused. All of this sparked discussions with Fred and Aurélien regarding the still unclear state of locking. It was found that the lock API does too much at once and is lacking granularity. The new version offers a much more fine-grained control allowing to selectively lock/unlock an element, a link, the rest of the list etc. It was also found that plenty of places just want to free the current element, or delete it to do anything with it, hence don't need to reset its pointers (e.g. event_hdl). 
Finally it appeared obvious that the root cause of the problem was the unclear usage of the list iterators themselves because one does not necessarily expect the element to be presented locked when not needed, which makes the unlock easy to overlook during reviews. The updated version of the list presents explicit lock status in the macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED suffix, the caller is expected to unlock the element if it intends to reuse it. At least the status is advertised. The _UNLOCKED variant, instead, always unlocks it before starting the loop block. This means it's not necessary to think about unlocking it, though it's obviously not usable with everything. A few _UNLOCKED were used at obvious places (i.e. where the element is deleted and freed without any prior check). Interestingly, the tests performed last year on QUIC forwarding, that resulted in limited traffic for the original version and higher bit rate for the new one couldn't be reproduced because since then the QUIC stack has gained in efficiency, and the 100 Gbps barrier is now reached with or without the mt_list update. However the unit tests definitely show a huge difference, particularly on EPYC platforms where the EBO provides tremendous CPU savings. Overall, the following changes are visible from the application code: - mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr => MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED() + 1 back elem - MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED() => just manually set iterator to NULL however. For MT_LIST_FOR_EACH_ENTRY_LOCKED() => mt_list_unlock_self() (if element going to be reused) + NULL - MT_LIST_LOCK_ELT => mt_list_lock_full() - MT_LIST_UNLOCK_ELT => mt_list_unlock_full() - l = MT_LIST_APPEND_LOCKED(h, e); MT_LIST_UNLOCK_ELT(); => l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)	2024-07-09 16:46:38 +02:00
Amaury Denoyelle	b9f67a46a2	MINOR: quic: clarify doc for quic_recv() Just highlight the fact that quic_recv() only receives a single datagram.	2024-05-24 14:36:31 +02:00
Amaury Denoyelle	45f40bac4c	MEDIUM: config: prevent communication with privileged ports This commit introduces a new global setting named harden.reject_privileged_ports.{tcp\|quic}. When active, communications with clients which use privileged source ports are forbidden. Such behavior is considered suspicious as it can be used for spoofing or for DNS/NTP amplification attacks. The value is configured per transport protocol. For TCP and QUIC, distinct code locations are impacted by this setting. The first one is in sock_accept_conn() which acts as a filter for all TCP based communications just after accept() returns a new connection. The second one is dedicated to QUIC communication in quic_recv(). In both cases, if a privileged source port is used and the protection is active, the received message is silently dropped. By default, protections are disabled for both protocols. This is to be able to backport it without breaking changes on stable releases. This should be backported as it is an interesting security feature yet relatively simple to implement.	2024-05-24 14:36:31 +02:00
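The source port filter described above can be sketched as a small predicate. This is an illustrative sketch only: is_privileged_source() is a hypothetical name, and the 1024 boundary is the conventional limit for privileged ports, assumed here rather than taken from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Returns 1 if the peer address <ss> uses a source port below 1024,
 * 0 otherwise (including for address families other than IPv4/IPv6).
 */
static int is_privileged_source(const struct sockaddr_storage *ss)
{
	uint16_t port;

	assert(ss != NULL);

	switch (ss->ss_family) {
	case AF_INET:
		port = ntohs(((const struct sockaddr_in *)ss)->sin_port);
		break;
	case AF_INET6:
		port = ntohs(((const struct sockaddr_in6 *)ss)->sin6_port);
		break;
	default:
		return 0;
	}
	return port < 1024;
}
```

A receive path would call this on the address filled by accept() or recvmsg() and silently drop the connection or datagram when it returns 1 and the protection is active for that protocol.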
Willy Tarreau	72d0dcda8e	MINOR: dynbuf: pass a criticality argument to b_alloc() The goal is to indicate how critical the allocation is, between the least one (growing an existing buffer ring) and the topmost one (boot time allocation for the life of the process). The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function to try to allocate otherwise subscribe. There's currently no distinction of direction nor part that tries to allocate, and this should be revisited to improve this situation, particularly when we consider that mux-h2 can reduce its Tx allocations if needed. For now, 4 main levels are planned, to translate how the data travels inside haproxy from a producer to a consumer: - MUX_RX: buffer used to receive data from the OS - SE_RX: buffer used to place a transformation of the RX data for a mux, or to produce a response for an applet - CHANNEL: the channel buffer for sync recv - MUX_TX: buffer used to transfer data from the channel to the outside, generally a mux but there can be a few specificities (e.g. http client's response buffer passed to the application, which also gets a transformation of the channel data). The other levels are a bit different in that they don't strictly need to allocate for the first two ones, or they're permanent for the last one (used by compression).	2024-05-10 17:18:13 +02:00
Amaury Denoyelle	d8f1ff8648	BUG/MEDIUM: quic: fix connection freeze on post handshake After handshake completion, the QUIC server is responsible for emitting the HANDSHAKE_DONE frame. Some clients wait for it to begin STREAM transfers. Previously, there was no explicit tasklet_wakeup() after handshake completion, which is necessary to emit post-handshake frames. In most cases, this was undetected as most clients continue emission which will reschedule the tasklet. However, as there is no tasklet_wakeup(), this is not a consistent behavior. If this bug occurs, it causes a connection freeze, preventing the client from emitting any request. The connection is finally closed on idle timeout. To fix this, add an explicit tasklet_wakeup() after handshake completion. It sounds simple enough but in fact it's difficult to find the correct location for tasklet_wakeup() invocation, as post-handshake is directly linked to connection accept, with different orderings. Notably, if 0-RTT is used, the connection can be accepted prior to handshake completion. Another major point is that along with the HANDSHAKE_DONE frame, a series of NEW_CONNECTION_ID frames are emitted. However, these new CIDs allocation must occur after the connection is migrated to its new thread as these CIDs are tied to it. A BUG_ON() is present to check this in qc_set_tid_affinity(). With all this in mind, 2 locations were selected for the necessary tasklet_wakeup() : * on qc_xprt_start() : this is useful for standard case without 0-RTT. This ensures that this is done only after connection thread migration. * on qc_ssl_provide_all_quic_data() : this is done on handshake completion with 0-RTT used. In this case only, connection is already accepted and migrated, so tasklet_wakeup() is safe. Note that as a side-change, quic_accept_push_qc() API has evolved to better reflect differences between standard and 0-RTT usages. It is now forbidden to call it multiple times on a single quic_conn instance. A BUG_ON() has been added. 
This issue is labelled as medium even though it seems pretty rare. It was only reproducible using QUIC interop runner, with haproxy compiled with LibreSSL with quic-go as client. However, affected code parts are pretty sensitive, which justifies the chosen severity. This should fix github issue #2418. It should be backported up to 2.6, after a brief period of observation. Note that the extra comment added in qc_set_tid_affinity() can be removed in 2.6 as thread migration is not implemented for this version. Other parts should apply without conflict.	2024-03-06 10:39:57 +01:00
Amaury Denoyelle	a17eaf7763	BUG/MINOR: quic: initialize msg_flags before sendmsg Previously, msghdr struct used for sendmsg was memset to 0. This was updated for performance reasons with each member individually defined. This is done by the following commit : commit 107d6d75465419a09d90c790edb617091a04468a OPTIM: quic: improve slightly qc_snd_buf() internal msg_flags is the only member unset, as sendmsg manual page reports that it is unused. However, this caused a coverity report. In the end, it is better to explicitly set it to 0 to avoid any future interrogations, compiler warning or even portability issues. This should fix coverity report from github issue #2455. No need to backport unless above patch is.	2024-02-21 10:13:53 +01:00
Amaury Denoyelle	8b950f40fa	MINOR: quic: only use sendmsg() syscall variant This patch is the direct followup of the previous one : MINOR: quic: remove sendto() usage variant This finalizes qc_snd_buf() simplification by removing send() syscall usage for quic-conn owned socket. Syscall invocation is merged in a single code location to the sendmsg() variant. The only difference for owned socket is that destination address for sendmsg() is set to NULL. This usage is documented in man 2 sendmsg as valid for connected sockets. This allows maximum performance by avoiding unnecessary lookups on kernel socket address tables. As the previous patch, no functional change should happen here. However, it will be simpler to extend qc_snd_buf() for GSO usage.	2024-02-20 16:42:05 +01:00
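To illustrate the sendmsg() usage on a connected socket, here is a minimal sketch with a NULL destination address and every msghdr member set individually, msg_flags included. send_connected() is a hypothetical helper and not the qc_snd_buf() code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Sends <len> bytes from <buf> on the connected socket <fd>. Returns the
 * sendmsg() result: the number of bytes sent, or -1 with errno set.
 */
static ssize_t send_connected(int fd, const void *buf, size_t len)
{
	struct iovec vec;
	struct msghdr msg;

	assert(buf != NULL);

	vec.iov_base = (void *)buf; /* sendmsg() does not modify the payload */
	vec.iov_len = len;

	msg.msg_name = NULL;        /* connected socket: no destination */
	msg.msg_namelen = 0;
	msg.msg_iov = &vec;
	msg.msg_iovlen = 1;
	msg.msg_control = NULL;     /* no ancillary data in this sketch */
	msg.msg_controllen = 0;
	msg.msg_flags = 0;          /* unused on send, set for tidiness */

	return sendmsg(fd, &msg, 0);
}
```

For an unconnected listener socket, the same structure would instead carry the peer address in msg_name and msg_namelen.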
Amaury Denoyelle	8de9f8f193	MINOR: quic: remove sendto() usage variant qc_snd_buf() is a wrapper around emission syscalls. Given QUIC configuration, a different variant is used. When using connection socket, send() is the only one used. For listener sockets, sendmsg() and sendto() are possible. The first one is used only if the local address has been retrieved prior. This allows to fix it on sending to guarantee the source address selection. Finally, sendto() is used for systems which do not support local address retrieval. All of these variants render the code too complex. As such, this patch simplifies this by removing the sendto() alternative. Now, sendmsg() is always used for listener sockets. Source address is then specified only if supported by the system. This patch should not exhibit functional behavior changes. It will be useful when implementing GSO as the code is now simpler.	2024-02-20 16:42:05 +01:00
Amaury Denoyelle	ea90c39302	MINOR: quic: move IP_PKTINFO on send on a dedicated function When using listener socket, source address for emission is explicitly set using ancillary data for sendmsg(). This is useful to guarantee the correct address is used when binding on a non-explicit address. This code was implemented directly under qc_snd_buf(). However, it is quite complex due to portability issues. For IPv4, two parallel implementations coexist, defined under IP_PKTINFO or IP_RECVDSTADDR. For IPv6, another option is defined under IPV6_RECVPKTINFO. Each variant uses its distinct name which increases the code complexity. Extract ancillary data filling in a dedicated function named cmsg_set_saddr(). This greatly reduces the body of qc_snd_buf(). Such functions can be replicated when other ancillary data types are implemented. This will notably be useful for GSO implementation.	2024-02-20 16:42:05 +01:00
Amaury Denoyelle	107d6d7546	OPTIM: quic: improve slightly qc_snd_buf() internal qc_snd_buf() is a wrapper for sendmsg() syscall (or its derivatives) used for all QUIC emissions. This patch aims at removing several non-optimal code sections : * fd_send_ready() for connected sockets is only checked in the function preamble instead of inside the emission loop * zero-ing msghdr structure for unconnected sockets is removed. This is unnecessary as all fields are properly initialized then. * extra memcpy/memset invocations when using IP_PKTINFO/IPV6_RECVPKTINFO are removed by setting directly the address value into cmsg buffer	2024-02-20 16:42:05 +01:00
Amaury Denoyelle	5b31989a3f	BUG/MEDIUM: quic: fix transient send error with listener socket Transient send errors are handled differently if using connection or listener socket for QUIC transfers. In the first case, proper poller subscription is used via fd_cant_send()/fd_want_send(). For the listener socket case, error is ignored by qc_snd_buf() caller and retransmission mechanism will allow to reemit the data. For listener socket, transient error code handling is buggy. It blindly uses fd_cant_send() with <qc.fd> member which is set to -1 for listener socket usage. This results in an invalid fdtab access, with a possible crash or a modification of a totally unrelated FD. This bug is simply fixed by using qc_test_fd() before using fd_cant_send()/fd_want_send(). This ensures <qc.fd> is used only if initialized which is only the case when using connection socket. No crash was reported yet for this bug. However, it is reproducible by using ASAN compilation and the following strace sendmsg() errno command injection : # strace -qq -yy -p $(pgrep haproxy) -f -e trace=%network \ -e inject=sendto,sendmsg:error=EAGAIN:when=20+20 This must be backported up to 2.7.	2024-02-19 17:56:51 +01:00
Frédéric Lécaille	f74d882ef0	REORG: quic: Move the QUIC DCID parser to quic_sock.c Move quic_get_dgram_dcid() from quic_conn.c to quic_sock.c because only used in this file and define it as static.	2023-11-28 15:37:50 +01:00
Frédéric Lécaille	0fc0d45745	REORG: quic: Add a new module to handle QUIC connection IDs Move quic_cid and quic_connection_id from quic_conn-t.h to new quic_cid-t.h header. Move definitions of quic_stateless_reset_token_init(), quic_derive_cid(), new_quic_cid(), quic_get_cid_tid() and retrieve_qc_conn_from_cid() to quic_cid.c new C file.	2023-11-28 15:37:22 +01:00
Ilya Shipitsin	80813cdd2a	CLEANUP: assorted typo fixes in the code and comments This is 37th iteration of typo fixes	2023-11-23 16:23:14 +01:00
Amaury Denoyelle	bb28215d9b	MEDIUM: quic: define an accept queue limit QUIC connections are pushed manually into a dedicated listener queue when they are ready to be accepted. This happens after handshake finalization or on 0-RTT packet reception. Listener is then woken up to dequeue them with listener_accept(). This patch accounts for the number of connections currently stored in the accept queue. If reaching a certain limit, INITIAL packets are dropped on reception to prevent further QUIC connections allocation. This should help to preserve system resources. This limit is automatically derived from the listener backlog. Half of its value is reserved for handshakes and the other half for accept queues. By default, backlog is equal to maxconn which guarantees that there cannot be more than maxconn connections in handshake or waiting to be accepted.	2023-11-09 16:24:00 +01:00
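The backlog split described above can be sketched as a simple admission check. The structure and function names below are hypothetical, and the exact comparison used by haproxy may differ; the sketch only assumes that each counter is compared against half of the backlog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-listener counters. */
struct listener_quota {
	unsigned int backlog;       /* defaults to maxconn per the commit */
	unsigned int handshakes;    /* connections still in handshake */
	unsigned int accept_queued; /* connections waiting to be accepted */
};

/* Returns 1 if a new INITIAL packet may create a connection, 0 if it
 * should be dropped because one of the two halves is exhausted.
 */
static int quic_may_start_conn(const struct listener_quota *q)
{
	unsigned int half;

	assert(q != NULL);

	half = q->backlog / 2;
	if (q->handshakes >= half)
		return 0;
	if (q->accept_queued >= half)
		return 0;
	return 1;
}
```

Under this assumption, the sum of both counters can never exceed the backlog, which matches the guarantee stated in the commit when backlog equals maxconn.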
Amaury Denoyelle	3df6a60113	MEDIUM: quic: limit handshake per listener Implement a limit per listener for the number of concurrent QUIC handshakes. When reached, INITIAL packets for new connections are automatically dropped until the number of handshakes is reduced. The limit value is automatically based on listener backlog, which itself defaults to maxconn. This feature is important to ensure CPU and memory resources are not consumed if too many handshake attempts are started in parallel. Special care is taken if a connection is released before handshake completion. In this case, counter must be decremented. This forces to ensure that member <qc.state> is set early in qc_new_conn() before any quic_conn_release() invocation.	2023-11-09 16:23:52 +01:00
Amaury Denoyelle	f59f8326f9	REORG: quic: cleanup traces definition Move all QUIC trace definitions from quic_conn.h to quic_trace-t.h. Also merge the multiple definitions of the trace_quic macro into a single one in quic_trace.h. This forces all QUIC source files which rely on traces to include it while reducing the size of quic_conn.h.	2023-10-11 14:15:31 +02:00
Amaury Denoyelle	2ac5d9a657	MINOR: quic: handle perm error on bind during runtime Improve handling of EACCES permission errors encountered when using QUIC connection socket at runtime : * First occurrence of the error on the process will generate a log warning. This should prevent users from using a privileged port without mandatory access rights. * Socket mode will automatically fallback to listener socket for the receiver instance. This requires to duplicate the settings from the bind_conf to the receiver instance to support configurations with multiple addresses on the same bind line.	2023-10-03 16:52:02 +02:00
Willy Tarreau	6cbb5a057b	Revert "MAJOR: import: update mt_list to support exponential back-off" This reverts commit c618ed5ff41ce29454e784c610b23bad0ea21f4f. The list iterator is broken. As found by Fred, running QUIC single-threaded shows that only the first connection is accepted because the accepter relies on the element being initialized once detached (which is expected and matches what MT_LIST_DELETE_SAFE() used to do before). However while doing this in the quic_sock code seems to work, doing it inside the macro shows total breakage and the unit test doesn't work anymore (random crashes). Thus it looks like the fix is not trivial, let's roll this back for the time it will take to fix the loop.	2023-09-15 17:13:43 +02:00
Willy Tarreau	c618ed5ff4	MAJOR: import: update mt_list to support exponential back-off The new mt_list code supports exponential back-off on conflict, which is important for use cases where there is contention on a large number of threads. The API evolved a little bit and required some updates: - mt_list_for_each_entry_safe() is now in upper case to explicitly show that it is a macro, and only uses the back element, doesn't require a secondary pointer for deletes anymore. - MT_LIST_DELETE_SAFE() doesn't exist anymore, instead one just has to set the list iterator to NULL so that it is not re-inserted into the list and the list is spliced there. One must be careful because it was usually performed before freeing the element. Now instead the element must be nulled before the continue/break. - MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been unclear. They were replaced by mt_list_cut_around() and mt_list_connect_elem() which more explicitly detach the element and reconnect it into the list. - MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is in list.h. It may however possibly benefit from being upstreamed. This required tiny adaptations to event_hdl.c and quic_sock.c. The test case was updated and the API doc added. Note that in order to keep include files small, the struct mt_list definition remains in list-t.h (part of the internal API) and was ifdef'd out in mt_list.h. A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with 80 threads shows a drastic reduction of CPU usage thanks to this and the refined memory barriers. Please note that the CPU usage on OpenSSL 3.0.9 is significantly higher due to the excessive use of atomic ops by openssl, but 3.1 is only slightly above 1.1.1 though: - before: 35 Gbps, 3.5 Mpps, 7800% CPU - after: 41 Gbps, 4.2 Mpps, 2900% CPU	2023-09-13 11:50:33 +02:00
Amaury Denoyelle	7f80d51812	BUG/MEDIUM: quic: fix tasklet_wakeup loop on connection closing It is possible to trigger a loop of tasklet calls if a QUIC connection is interrupted abruptly by the client. This is caused by the following interaction : * FD iocb is woken up for read. This causes a wakeup on quic_conn tasklet. * quic_conn_io_cb is run and tries to read but fails as the connection socket is closed (typically with an ECONNREFUSED). FD read is subscribed to the poller via qc_rcv_buf() which will cause the loop. The looping will stop automatically once the idle-timeout is expired and the connection instance is finally released. To fix this, ensure FD read is subscribed only for transient error cases (EAGAIN or similar). All other cases are considered as fatal and thus all future read operations will fail. Note that for the moment, nothing is reported on the quic_conn which may not skip future reception. This should be improved in a future commit to accelerate connection closing. This bug can be reproduced frequently by interrupting the following command. Quic traces should be activated on haproxy side to detect the loop : $ ngtcp2-client --tp-file=/tmp/ngtcp2-tp.txt \ --session-file=/tmp/ngtcp2-session.txt \ -r 0.3 -t 0.3 --exit-on-all-streams-close 127.0.0.1 20443 \ "http://127.0.0.1:20443/?s=1024" This must be backported up to 2.7.	2023-08-11 17:04:20 +02:00
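The distinction between transient and fatal read errors can be sketched as a small predicate. rx_error_is_transient() is a hypothetical name, and the set of errno values treated as transient is an assumption made for this sketch, since the commit only says "EAGAIN or similar".

```c
#include <assert.h>
#include <errno.h>

/* Returns 1 if <err>, taken from errno after a failed receive call, only
 * means "try again later", in which case re-subscribing the FD for reads
 * makes sense. Returns 0 for any other error, which is treated as fatal
 * so that the poller is not re-armed in a loop.
 */
static int rx_error_is_transient(int err)
{
	assert(err != 0); /* only meaningful after a failed call */

	return err == EAGAIN || err == EWOULDBLOCK || err == EINTR;
}
```

With such a check, an ECONNREFUSED reported on a closed connection socket no longer leads to a new read subscription.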
Frédéric Lécaille	5d602f4f33	MINOR: quic: Add a trace for QUIC conn fd ready for receive Add a trace as this is done for the "send ready" fd state.	2023-08-11 08:57:47 +02:00
Amaury Denoyelle	f40a72a7ff	BUILD: quic: fix wrong potential NULL dereference GCC warns about a possible NULL dereference when requeuing a datagram on the connection socket. This happens due to a MT_LIST_POP to retrieve a rxbuf instance. In fact, this can never be NULL as there are enough rxbuf instances allocated for each thread. Once a thread has finished working with it, it must always reappend it. This issue was introduced with the following patch : commit b34d353968db7f646e83871cb6b21a246af84ddc BUG/MEDIUM: quic: consume contig space on requeue datagram As such, it must be backported in every version with the above commit. This should fix the github CI compilation error.	2023-08-04 15:42:34 +02:00
Amaury Denoyelle	f59635c495	BUG/MINOR: quic: reappend rxbuf buffer on fake dgram alloc error A thread must always reappend the rxbuf instance after finishing datagram reception treatment. This was not the case on one error code path : when fake datagram allocation fails on datagram requeuing. This issue was introduced with the following patch : commit b34d353968db7f646e83871cb6b21a246af84ddc BUG/MEDIUM: quic: consume contig space on requeue datagram As such, it must be backported in every version with the above commit.	2023-08-04 15:42:30 +02:00
Amaury Denoyelle	b34d353968	BUG/MEDIUM: quic: consume contig space on requeue datagram When handling UDP datagram reception, it is possible to receive a QUIC packet for one connection on the socket attached to another connection. To protect against this, an explicit comparison is done against the packet DCID and the quic-conn CID. On no match, the datagram is requeued and dispatched via rxbuf and will be treated as if it arrived on the listener socket. One reason for this wrong reception is explained by the small race condition that exists between bind() and connect() syscalls during connection socket initialization. However, one other reason which was not initially considered is when clients reuse the same IP:PORT for different connections. In this case the current FD attribution is not optimal and this can cause a substantial number of requeuing operations. This situation has revealed a bug during requeuing. If rxbuf contig space is not big enough for the datagram, the incoming datagram was dropped, even if there is space at buffer origin. This can cause several datagrams to be dropped in a series until eventually buffer head is moved when passing through the listener FD. To fix this, allocate a fake datagram to consume contig space. This is similar to the handling of datagrams on the listener FD. This then allows to store the datagram to requeue on buffer head and continue. This can be reproduced by starting a lot of connections. To amplify the phenomenon, POST requests are used to increase the number of dropped datagrams : $ while true; do curl -F "a=@~/50k" -k --http3-only -o /dev/null https://127.0.0.1:20443/; done	2023-08-04 14:27:40 +02:00
Frédéric Lécaille	444c1a4113	MINOR: quic: Split QUIC connection code into three parts Move the TX part of the code to quic_tx.c. Add quic_tx-t.h and quic_tx.h headers for this TX part code. The definition of quic_tx_packet struct has been moved from quic_conn-t.h to quic_tx-t.h. Same thing for the RX part: Move the RX part of the code to quic_rx.c. Add quic_rx-t.h and quic_rx.h headers for this RX part code. The definition of quic_rx_packet struct has been moved from quic_conn-t.h to quic_rx-t.h.	2023-07-27 10:51:03 +02:00
Frédéric Lécaille	bdd64fd71d	MINOR: quic: Add some counters at QUIC connection level Add some statistical counters to quic_conn struct from quic_counters struct which are used at listener level to handle them at QUIC connection level. This avoids calling atomic functions. Furthermore this will be useful soon when a counter will be added for the total number of packets which have been sent which will be very often incremented. Some counters were not added, especially those which count the number of QUIC errors by QUIC error types. Indeed such counters would be incremented most of the time only one time at QUIC connection level. Implement quic_conn_prx_cntrs_update() which accumulates the QUIC connection level statistical counters to the listener level statistical counters. Must be backported to 2.7.	2023-05-24 16:30:11 +02:00
Frédéric Lécaille	76d502588d	BUG/MINOR: quic: Wrong redispatch for external data on connection socket It is possible to receive a datagram from another connection on a dedicated quic-conn socket. This is due to a race condition between bind() and connect() system calls. To handle this, an explicit check is done on each datagram. If the DCID is not associated to the connection which owns the socket, the datagram is redispatched as if it arrived on the listener socket. This redispatch step was not properly done because the source address specified for the redispatch function was incorrect. Instead of using the datagram source address, we used the address of the socket quic-conn which received the datagram due to the above race condition. Fix this simply by using the address from the recvmsg() system call. The impact of this bug is minor as redispatch on connection socket should be really rare. However, when it happens it can lead to several kinds of problems, like for example a connection initialized with an incorrect peer address. It can also break the Retry token check as this relies on the peer address. In fact, Retry token check failure was the reason this bug was found. When using h2load with thousands of clients, the counter of Retry token failure was unusually high. With this patch, no failure is reported anymore for Retry. Must be backported to 2.7.	2023-05-12 14:48:30 +02:00
Amaury Denoyelle	1bcb695a05	MINOR: quic: use real sending rate measurement Before this patch, global sending rate was measured on the QUIC lower layer just after sendto(). This meant that all QUIC frames were accounted for, including non STREAM frames and also retransmission. To have a better reflection of the application data transferred, move the incrementation into the MUX layer. This allows to account only for STREAM frames payload on their first emission. This should be backported up to 2.6.	2023-04-28 16:52:26 +02:00
Frédéric Lécaille	7d23e8d1a6	CLEANUP: quic: Rename several <buf> variables into quic_sock.c Rename some variables which are not struct buffer variables. Should be backported to 2.7.	2023-04-24 15:53:27 +02:00
Tim Duesterhus	c18e244515	CLEANUP: Stop checking the pointer before calling `pool_free()` Changes performed with this Coccinelle patch: @@ expression e; expression p; @@ - if (e != NULL) { pool_free(p, e); - } @@ expression e; expression p; @@ - if (e) { pool_free(p, e); - } @@ expression e; expression p; @@ - if (e) pool_free(p, e); @@ expression e; expression p; @@ - if (e != NULL) pool_free(p, e);	2023-04-23 00:28:25 +02:00
Willy Tarreau	6a4d48b736	MINOR: quic_sock: index li->per_thr[] on local thread id, not global one There's a li_per_thread array in each listener for use with QUIC listeners. Since thread groups were introduced, this array can be allocated too large because global.nbthread is allocated for each listener, while only no more than MIN(nbthread,MAX_THREADS_PER_GROUP) may be used by a single listener. This was because the global thread ID is used as the index instead of the local ID (since a listener may only be used by a single group). Let's just switch to local ID and reduce the allocated size.	2023-04-21 17:41:26 +02:00
Amaury Denoyelle	739de3f119	MINOR: quic: properly finalize thread rebinding When a quic_conn instance is rebound to a new thread, its tasks and tasklet are destroyed and new ones are created. Its socket is also migrated to the new thread, which stops reception on it. To properly reactivate a quic_conn after rebind, wake up its tasks and tasklet if they were active before thread rebind. Also reactivate reading on the socket FD. These operations are implemented in a new function qc_finalize_affinity_rebind(). This should be backported up to 2.7 after a period of observation.	2023-04-18 17:09:02 +02:00
Amaury Denoyelle	987812b190	MINOR: quic: do not proceed to accept for closing conn Each quic_conn is inserted in an accept queue to allocate the upper layers. This is done through a listener tasklet in quic_sock_accept_conn(). This patch interrupts the accept process for a quic_conn in closing/draining state. Indeed, this connection will soon be closed so it's unnecessary to allocate a complete stack for it. This patch will become necessary when thread migration is implemented. Indeed, it won't be allowed to proceed to thread migration for a closing quic_conn. This should be backported up to 2.7 after a period of observation.	2023-04-18 16:54:48 +02:00
Amaury Denoyelle	f16ec344d5	MEDIUM: quic: handle conn bootstrap/handshake on a random thread TID encoding in CID was removed by a recent change. It is now possible to access the <tid> member stored in quic_connection_id instance. For unknown CID, a quick solution was to redispatch to the thread corresponding to the first CID byte. This ensures that an identical CID will always be handled by the same thread to avoid creating the same connection multiple times. However, this forces an uneven load repartition which can be critical for QUIC handshake operation. To improve this, remove the above constraint. An unknown CID is now handled by its receiving thread. However, this means that if multiple packets are received with the same unknown CID, several threads will try to allocate the same connection. To prevent this race condition, CID insertion in global tree is now conducted first before creating the connection. This is a thread-safe operation which can only be executed by a single thread. The thread which has inserted the CID will then proceed to quic_conn allocation. Other threads won't be able to insert the same CID: this will stop the treatment of the current packet, which is redispatched to the now owning thread. This should be backported up to 2.7 after a period of observation.	2023-04-18 16:54:44 +02:00
Amaury Denoyelle	e83f937cc1	MEDIUM: quic: use a global CID trees list Previously, quic_connection_id were stored in a per-thread tree list. Datagrams were first dispatched to the correct thread using the encoded TID before a tree lookup was done. Remove these trees and replace them with a global trees list of 256 entries. A CID is using the list index corresponding to its first byte. On datagram dispatch, the CID is looked up in its tree and TID is retrieved using new member quic_connection_id.tid. As such, a read-write lock protects each list instance. With 256 entries, it is expected that contention should be reduced. A new structure quic_cid_tree serves as a tree container associated with its read-write lock. An API is implemented to ensure lock safety for insert/lookup/delete operation. This patch is a step forward to be able to break the affinity between a CID and a TID encoded thread. This is required to be able to migrate a quic_conn after accept to select thread based on their load. This should be backported up to 2.7 after a period of observation.	2023-04-18 16:54:17 +02:00
Amaury Denoyelle	66947283ba	MINOR: quic: remove TID ref from quic_conn Remove <tid> member in quic_conn. This is moved to quic_connection_id instance. For the moment, this change has no impact. Indeed, qc.tid reference could easily be replaced by tid as all of this work was already done on the connection thread. However, it is planned to support quic_conn thread migration in the future, so removal of qc.tid will simplify this. This should be backported up to 2.7.	2023-04-18 16:20:47 +02:00
Willy Tarreau	8f6da64641	MINOR: quic_sock: un-statify quic_conn_sock_fd_iocb() This one is printed as the iocb in the "show fd" output, and arguably this wasn't very convenient as-is: 293 : st=0x000123(cl heopI W:sRa R:sRA) ref=0 gid=1 tmask=0x8 umask=0x0 prmsk=0x8 pwmsk=0x0 owner=0x7f488487afe0 iocb=0x50a2c0(main+0x60f90) Let's unstatify it and export it so that the symbol can now be resolved from the various points that need it.	2023-03-10 14:30:01 +01:00
Frédéric Lécaille	4377dbd756	BUG/MINOR: quic: Missing listener accept queue tasklet wakeups This bug was revealed by h2load tests run as follows: h2load -t 4 --npn-list h3 -c 64 -m 16 -n 16384 -v https://127.0.0.1:4443/ This opens (-c) 64 QUIC connections for (-n) 16384 h3 requests from (-t) 4 threads, i.e. 256 requests by connection. Such tests could not always pass and often ended with such results displayed by h2load: finished in 53.74s, 38.11 req/s, 493.78KB/s requests: 16384 total, 2944 started, 2048 done, 2048 succeeded, 14336 failed, 14336 errored, 0 timeout status codes: 2048 2xx, 0 3xx, 0 4xx, 0 5xx traffic: 25.92MB (27174537) total, 102.00KB (104448) headers (space savings 1.92%), 25.80MB (27053569) data UDP datagram: 3883 sent, 24330 received min max mean sd ± sd time for request: 48.75ms 502.86ms 134.12ms 75.80ms 92.68% time for connect: 20.94ms 331.24ms 189.59ms 84.81ms 59.38% time to 1st byte: 394.36ms 417.01ms 406.72ms 9.14ms 75.00% req/s : 0.00 115.45 14.30 38.13 87.50% The number of successful requests was always a multiple of 256. Activating the traces also showed that some connections were blocked after having successfully completed their handshakes because the mux was not started. The mux is started upon the acceptation of the connection. Under heavy load, some connections were never accepted. From the moment where more than 4 (MAXACCEPT) connections were enqueued before a listener could be woken up to accept at most 4 connections, the remaining connections were not accepted, or only later at the second listener tasklet wakeup. Add a call to tasklet_wakeup() to the accept list tasklet of the listeners to wake it up if there are remaining connections to accept after having called listener_accept(). In this case the listener must not be removed from this accept list, otherwise at the next call it will not accept anything more. Must be backported to 2.7 and 2.6.	2023-03-10 14:05:24 +01:00
Amaury Denoyelle	caa16549b8	MINOR: quic: notify on send ready This patch completes the previous one with poller subscribe of quic-conn owned socket on sendto() error. This ensures that mux-quic is notified if waiting on sending when a transient sendto() error is cleared. As such, qc_notify_send() is called directly inside socket I/O callback. qc_notify_send() internal conditions have thus been completed. This will prevent notifying the upper layer until all sending conditions are fulfilled: room in congestion window and no transient error on socket FD. This should be backported up to 2.7.	2023-03-01 14:32:37 +01:00
Amaury Denoyelle	e1a0ee3cf6	MEDIUM: quic: implement poller subscribe on sendto error On sendto() transient error, prior to this patch sending was simulated and we relied on retransmission to retry sending. This could significantly hurt performance. Thanks to quic-conn owned socket support, it is now possible to improve this. On transient error, sending is interrupted and quic-conn socket FD is subscribed on the poller for sending. When send is possible, quic_conn_sock_fd_iocb() will be in charge of restarting sending. A consequence of this change is on the return value of qc_send_ppkts(). This function will now return 0 on transient error if quic-conn has its owned socket. This is used to interrupt sending in the calling function. The flag QUIC_FL_CONN_TO_KILL must be checked to differentiate a fatal error from a transient one. This should be backported up to 2.7.	2023-03-01 14:32:37 +01:00
Amaury Denoyelle	4bdd069637	MINOR: quic: consider EBADF as critical on send() EBADF on sendto() is considered as a fatal error. As such, it is removed from the list of the transient errors. The connection will be killed when encountered. For the record, EBADF can be encountered on process termination with the listener socket. This should be backported up to 2.7.	2023-02-28 10:51:25 +01:00
Amaury Denoyelle	1febc2d316	MEDIUM: quic: improve fatal error handling on send Send is conducted through qc_send_ppkts() for a QUIC connection. There are two types of errors which can be encountered on sendto() or affiliated syscalls : * transient error. In this case, sending is simulated with the remaining data and retransmission process is used to have the opportunity to retry emission * fatal error. If this happens, the connection should be closed as soon as possible. This is done via qc_kill_conn() function. Until this patch, only ECONNREFUSED errno was considered as fatal. Modify the QUIC send API to be able to differentiate transient and fatal errors more easily. This is done by fixing the return value of the sendto() wrapper qc_snd_buf() : * on fatal error, a negative error code is returned. This is now the case for every errno except EAGAIN, EWOULDBLOCK, ENOTCONN, EINPROGRESS and EBADF. * on a transient error, 0 is returned. This is the case for the listed errno values above and also if a partial send has been conducted by the kernel. * on success, the return value of sendto() syscall is returned. This commit will be useful to be able to handle transient error with a quic-conn owned socket. In this case, the socket should be subscribed to the poller and no simulated send will be conducted. This commit allows errno management to be confined in the quic-sock module which is a nice cleanup. On a final note, EBADF should be considered as fatal. This will be the subject of a next commit. This should be backported up to 2.7.	2023-02-28 10:51:25 +01:00
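
The return-value convention described in 1febc2d316, as amended by 4bdd069637 (EBADF becomes fatal), can be sketched roughly as follows. This is only an illustration under stated assumptions, not HAProxy's qc_snd_buf(): the names classify_send_result() and is_transient_send_errno() are hypothetical, and the caller is assumed to pass in the sendto() return value, the requested length and the saved errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: errno values treated as transient once EBADF has
 * been removed from the list. An if-chain is used rather than a switch
 * because EAGAIN and EWOULDBLOCK share the same value on many systems.
 */
static int is_transient_send_errno(int err)
{
	return err == EAGAIN || err == EWOULDBLOCK ||
	       err == ENOTCONN || err == EINPROGRESS;
}

/* Hypothetical helper mapping a sendto()-like result to the convention:
 *   < 0 : fatal error (negative errno), the connection should be killed
 *   = 0 : transient error or partial send, the caller may retry later
 *   > 0 : success, number of bytes sent
 */
static ssize_t classify_send_result(ssize_t ret, size_t len, int err)
{
	if (ret < 0) {
		/* errno must be positive so that -err is a valid error code */
		assert(err > 0);
		return is_transient_send_errno(err) ? 0 : -(ssize_t)err;
	}

	/* a partial send by the kernel is reported as transient */
	if ((size_t)ret != len)
		return 0;

	return ret;
}
```

With this shape, a caller only needs to compare the result against zero to choose between killing the connection, interrupting the send, or accounting for the bytes sent.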

136 Commits