haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-19 05:31:26 +02:00

Author	SHA1	Message	Date
Willy Tarreau	c264ea1679	MEDIUM: tree-wide: replace most DECLARE_POOL with DECLARE_TYPED_POOL This will make the pools' size and alignment automatically inherit the type declaration. It was done like this: sed -i -e 's:DECLARE_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_POOL src addons) sed -i -e 's:DECLARE_STATIC_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_STATIC_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_STATIC_POOL src addons) 81 replacements were made. The only remaining ones are those which set their own size without depending on a structure. The few ones with an extra size were handled manually. It also means that the requested alignments are now checked against the type's. Given that none is specified for now, no issue is reported. It was verified with "show pools detailed" that the definitions are exactly the same, and that the binaries are similar.	2025-08-11 19:55:30 +02:00
Amaury Denoyelle	731b52ded9	MINOR: quic: prefer qc_is_back() usage over qc->target Previously, the quic_conn <target> member was used to determine whether a quic_conn was used on the frontend (as server) or backend side (as client). A new helper function can now be used to directly check the QUIC_FL_CONN_IS_BACK flag. This reduces the dependency between quic_conn and their associated listener/server instances.	2025-08-07 16:59:59 +02:00
Amaury Denoyelle	e064e5d461	MINOR: quic: duplicate GSO unsupp status from listener to conn QUIC emission can use GSO to emit multiple datagrams with a single syscall invocation. However, this feature relies on several kernel parameters which are checked on haproxy process startup. Even if these checks report no issue, GSO may still be unusable due to the underlying network adapter. Thus, if an EIO occurred on sendmsg() with GSO, the listener is flagged to mark GSO as unsupported. This allows all other QUIC connections to share the status and avoid using GSO when using this listener. Previously, the listener flag was checked for every QUIC emission. This was done using an atomic operation to prevent races. Improve this by duplicating the GSO unsupported status at the connection level. This is done in qc_new_conn() and also on thread rebinding if a new listener instance is used. The main benefit of this patch is to reduce the dependency between quic_conn and listener instances.	2025-08-07 16:36:26 +02:00
Frederic Lecaille	14d0f74052	MINOR: quic: Remove pool_head_quic_be_cc_buf pool This patch impacts the QUIC frontends. It reverts this patch MINOR: quic-be: add a "CC connection" backend TX buffer pool which added the new <pool_head_quic_be_cc_buf> pool to allocate CC (connection closed state) TX buffers with a bigger object size than the one for <pool_head_quic_cc_buf>. Indeed, the QUIC backends must be able to send Initial packets of at least 1200 bytes. From now on, both the QUIC frontends and backends use the same pool with MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU) (1252 bytes) as object size.	2025-07-17 19:33:21 +02:00
Frederic Lecaille	838024e07e	MINOR: quic: Get rid of qc_is_listener() Replace all calls to qc_is_listener() (resp. !qc_is_listener()) by calls to objt_listener() (resp. objt_server()). Remove the qc_is_listener() implementation and QUIC_FL_CONN_LISTENER, the flag it relied on.	2025-07-16 16:42:21 +02:00
Ilia Shipitsin	0ee3d739b8	CLEANUP: assorted typo fixes in the code, commits and doc Corrected various spelling and phrasing errors to improve clarity and consistency.	2025-07-10 19:49:48 +02:00
Frederic Lecaille	87ada46f38	BUG/MINOR: quic-be: Malformed coalesced Initial packets This bug fix completes this patch which was not sufficient: MINOR: quic-be: Allow sending 1200 bytes Initial datagrams This patch did not allow building well-formed Initial packets coalesced with other (Handshake) packets. Indeed, the <padding> parameter passed to qc_build_pkt() is derived from a first <padding> value and must be set to 1 for the last encryption level. As a client, the last encryption level is always the Handshake encryption level. But <padding> was always set to 1 for a QUIC client, leading the first Initial packet to be malformed because it was considered as the second one in the same datagram. So, this patch sets the <padding> value passed to qc_build_pkt() to 1 only when there is no last encryption level at all, to allow building Initial-only packets (not coalesced), or when it has frames to send (coalesced packets). No need to backport.	2025-07-07 14:13:02 +02:00
Frederic Lecaille	194e3bc2d5	MINOR: quic-be: address validation support implementation (RETRY) - Add the new ->retry_token and ->retry_token_len quic_conn struct members to store the retry tokens. These objects are allocated by quic_rx_packet_parse() and released by quic_conn_release(). - Add the new <pool_head_quic_retry_token> pool for these tokens. - Implement quic_retry_packet_check() to check the integrity tag of these tokens upon RETRY packet receipt. quic_tls_generate_retry_integrity_tag() is called by this new function. It has been modified to pass the address where the tag must be generated. - Add the new <resend> parameter to quic_pktns_discard(). This function is called to discard the packet number spaces to which the already sent TX packets and frames are attached. <resend> allows the caller to prevent this function from releasing the in-flight TX packets/frames. The frames are requeued to be resent. - Modify quic_rx_pkt_parse() to handle the RETRY packets. What must be done upon receipt of such packets is: - store the retry token, - store the new peer SCID as the DCID of the connection. Note that the peer will modify its SCID again. This is why this SCID is also stored as the ODCID which must be matched with the peer retry_source_connection_id transport parameter, - discard the Initial packet number space without flagging it as discarded and prevent retransmissions by calling qc_set_timer(), - modify the TLS cryptographic cipher contexts (RX/TX), - wake up the I/O handler to send new Initial packets asap. - Modify quic_transport_param_decode() to handle the retry_source_connection_id transport parameter as a QUIC client. Then its caller is modified to check that this transport parameter matches the SCID sent by the peer with the RETRY packet.	2025-06-26 09:48:00 +02:00
Frederic Lecaille	8a25fcd36e	MINOR: quic-be: Allow sending 1200 bytes Initial datagrams This easy-to-understand patch is not intrusive at all and cannot break the QUIC listeners. The QUIC client MUST always pad the datagrams carrying its Initial packets to at least 1200 bytes. A "!l" (not a listener) test OR'ed with the existing ones is added to satisfy the condition allowing such datagrams to be built.	2025-06-26 09:48:00 +02:00
Frederic Lecaille	c898b29e64	MINOR: quic: Useless TX buffer size reduction in closing state There is no need to limit the size of the TX buffer to QUIC_MIN_CC_PKTSIZE bytes when the connection is in closing state. Once this useless test is removed, an existing test still limits the number of bytes used from this TX buffer to the size of the TX buffer itself: if (end > (unsigned char *)b_wrap(buf)) end = (unsigned char *)b_wrap(buf); This is exactly what is needed when the connection is in closing state. Indeed, the size of the TX buffers is limited to reduce memory usage. The connection only needs to send short datagrams with at most 2 packets carrying a CONNECTION_CLOSE* frame. They are built only once and backed up into a small TX buffer allocated from a dedicated pool. The size of this TX buffer is QUIC_MAX_CC_BUFSIZE, which depends on QUIC_MIN_CC_PKTSIZE: #define QUIC_MIN_CC_PKTSIZE 128 #define QUIC_MAX_CC_BUFSIZE (2 * (QUIC_MIN_CC_PKTSIZE + QUIC_DGRAM_HEADLEN)) This size is smaller than an MTU. This patch should be backported as far as 2.9 to ease further backports to come.	2025-06-26 09:48:00 +02:00
Frederic Lecaille	9cb2acd2f2	MINOR: quic-be: add a "CC connection" backend TX buffer pool A QUIC client must be able to close a connection by sending Initial packets. But QUIC client Initial packets must always be at least 1200 bytes long. To reduce the memory use of TX buffers of a connection in "closing" state, a pool was dedicated for this purpose, but with a TX buffer size (QUIC_MAX_CC_BUFSIZE) too small for this. This patch adds a "closing state connection" TX buffer pool with the same role for QUIC backends.	2025-06-26 09:48:00 +02:00
Frederic Lecaille	1e6d8f199c	BUG/MINOR: quic: wrong QUIC_FT_CONNECTION_CLOSE(0x1c) frame encoding This is an old bug which has been there since this commit: MINOR: quic: Avoid zeroing frame structures It seems QUIC_FT_CONNECTION_CLOSE was confused with QUIC_FT_CONNECTION_CLOSE_APP, which does not include a "frame type" field. This field was not initialized (so it had a random value), which prevented the packet from being built because the packet builder assumes that packets with such frames are very short. Must be backported as far as 2.6.	2025-06-26 09:48:00 +02:00
Frederic Lecaille	b9703cf711	MINOR: quic-be: get rid of ->li quic_conn member Replace the ->li quic_conn pointer to struct listener member by ->target, which is an object type enum, and adapt the code. Use __objt_(listener|server)() where the object type is known. Typically, this is where the code is specific to one connection type (frontend/backend). Remove the <server> parameter passed to qc_new_conn(). It is redundant with the <target> parameter. GSO is not supported at this time for QUIC backends. qc_prep_pkts() is modified to prevent it from building more than an MTU. As a consequence, qc_send_ppkts() cannot use GSO. ssl_clienthello.c code is run only by listeners. This is why __objt_listener() is used in place of ->li.	2025-06-11 18:37:34 +02:00
Ilia Shipitsin	78b849b839	CLEANUP: assorted typo fixes in the code and comments code, comments and doc actually.	2025-04-02 11:12:20 +02:00
Amaury Denoyelle	a71007c088	MINOR: quic: move global tune options into quic_tune A new structure quic_tune has recently been defined. Its purpose is to store global options related to QUIC. Previously, only the tunable to toggle pacing was stored in it. This commit moves several QUIC-related tunables from global to the quic_tune structure. This better centralizes QUIC configuration options and gives room for future generic options.	2025-03-24 10:01:46 +01:00
Amaury Denoyelle	e2744d23be	MINOR: quic: refactor CRYPTO encoding and splitting This patch is the direct follow-up of the previous one, which refactored STREAM frame encoding. Reuse the newly defined quic_strm_frm_fillbuf() and quic_strm_frm_split() functions for CRYPTO frame encoding. The code for CRYPTO and STREAM frame encoding should now be clearer as it is mostly identical.	2025-02-12 15:10:54 +01:00
Amaury Denoyelle	f96af8e463	MINOR: quic: refactor STREAM encoding and splitting CRYPTO and STREAM frame encoding is similar. If the payload is too large, the frame will be split and only the first payload part will be written in the output QUIC packet. This process is complicated by the presence of a variable-length integer Length field prior to the payload. This commit aims at refactoring these operations. Define two functions to simplify the code: * quic_strm_frm_fillbuf() which is used to calculate the optimal frame length of a STREAM/CRYPTO frame with its payload in a buffer * quic_strm_frm_split() which is used to split the frame payload if the buffer is too small With this patch, both functions are now implemented for STREAM encoding.	2025-02-12 15:10:03 +01:00
Amaury Denoyelle	731340afbd	MINOR: quic: simplify length calculation for STREAM/CRYPTO frames STREAM and CRYPTO frames have a similar encoding format. In particular, both of them have a variable-length integer Length field just before the frame payload. It is complex to determine the optimal Length value before copying the payload data in the remaining buffer space. As such, helper functions were implemented to calculate this. However, CRYPTO and STREAM frames encoding implementation were not completely aligned, which renders the code harder to follow. The purpose of this commit is to simplify CRYPTO and STREAM frames encoding. First, a new helper quic_int_cap_length() is defined which is useful to determine the optimal buffer room available if prefixed by a variable-length integer as Length field. Then, processing of both CRYPTO and STREAM frames is now nearly identical, based on this new helper function. Functions max_available_room() and max_stream_data_size() are now unused and are removed.	2025-02-12 11:51:09 +01:00
Amaury Denoyelle	e6a223542a	BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding Function max_stream_data_size() is used to determine the payload length of a CRYPTO frame. It takes into account that the CRYPTO length field is a variable-length integer. The implemented calculation was incorrect as it reserved too much space for the frame header. This error is mostly due to max_stream_data_size() reusing max_available_room(), which also reserves space for a variable-length integer. This results in CRYPTO frames 1 to 2 bytes shorter than the maximum achievable value, which in the end produces datagrams shorter than the MTU. Fix the max_stream_data_size() implementation. It is now merely a wrapper on max_available_room(). This ensures that CRYPTO frame encoding is now properly optimized to use the available MTU. This should be backported up to 2.6.	2025-02-12 11:51:09 +01:00
Amaury Denoyelle	63747452a3	BUG/MINOR: quic: reserve length field for long header encoding Long header packets have a mandatory Length field, which contains the size of the Packet Number and payload, encoded as a variable-length integer. Its value can thus only be determined after the payload size is known, which depends on the remaining buffer space after this variable-length field. Packet payloads are encoded in two steps. First, a list of input frames is processed until the packet buffer is full. CRYPTO and STREAM frame payloads can be split if needed to fill the buffer. Real encoding is then performed as a second stage operation, first with the Length field, then with the selected frames themselves. Before this patch, no space was reserved in the buffer for the Length field when attaching the frames to the packet. This could result in an error as the packet payload would be too large for the remaining space. In practice, this issue was rarely encountered, mostly as a side effect of another issue linked to CRYPTO frame encoding. Indeed, a wrong calculation is performed on CRYPTO splitting, which results in a frame payload shorter by a few bytes than expected. This however ensured there would always be enough room for the Length field and payload during encoding. As CRYPTO frames are the only content big enough that is emitted with a Long header packet, this renders the current issue mostly non-reproducible. Fix the original issue by reserving some space for the Length field prior to frame payload calculation, using a maximum value based on the remaining space. Packet length is then reduced if needed when encoding is performed, which ensures there is always enough room for the selected frames. Note that the other issue impacting CRYPTO frame encoding is not yet fixed. This could result in datagrams with Long header packets not completely extended to the full MTU. The issue will be addressed in another patch. This should be backported up to 2.6.	2025-02-12 11:51:09 +01:00
Amaury Denoyelle	4489a61585	MEDIUM: quic: implement credit based pacing Implement a new method for QUIC pacing emission based on credit. This represents the number of packets which can be emitted in a single burst. After emission, the number of emitted packets is decremented from the credit. Several emissions can be conducted in the same sequence until the credit is completely consumed. When a new emission sequence is initiated (i.e. under a new QMUX tasklet invocation), credit is refilled according to the delay which occurred between the last and current emission context. The main advantage of this new mechanism is that it allows several emissions to be conducted in the same task context without having to wait between each invocation. Wait is only forced if pacing is expired, which is now equivalent to having a null credit. Furthermore, if the delay between two emission sequences is smaller than expected, credit is only partially refilled. This allows emission to restart without having to wait for the whole credit to be available. On the implementation side, a new field <credit> is available in the quic_pacer structure. It is automatically decremented on quic_pacing_sent_done() invocation. Also, a new function quic_pacing_reload() must be used by the QUIC MUX when a new emission sequence is initiated to refill credit. The <next> field from quic_pacer has been removed. For the moment, credit is based on the burst configured via the quic-cc-algo keyword, or directly reported by BBR. This should be backported up to 3.1.	2025-01-23 17:40:20 +01:00
Frederic Lecaille	f8b697c19b	BUG/MINOR: improve BBR throughput on very fast links This patch fixes the loss of information when computing the delivery rate (quic_cc_drs.c) on links with very low latency, due to the usage of 32-bit variables with millisecond precision. Initialize the quic_conn task with the TASK_F_WANTS_TIME flag to ask the scheduler to update the call date of this task. This allows this task to get a nanosecond resolution on the call date by calling task_mono_time(). This is enabled only for congestion control algorithms with delivery rate estimation support (BBR only at this time). Store the send date of each TX packet with nanosecond precision in the new ->time_sent_ns quic_tx_packet struct member, thanks to task_mono_time(). Make use of this new timestamp in the delivery rate estimation algorithm (quic_cc_drs.c). Rename the current ->time_sent member of the quic_tx_packet struct to ->time_sent_ms to distinguish the unit used by this variable (millisecond) and update the code which uses this variable. The logic found in quic_loss.c is not modified at all. Must be backported to 3.1.	2024-11-28 21:39:05 +01:00
Amaury Denoyelle	2fffd85b97	BUG/MEDIUM: quic: prevent EMSGSIZE with GSO for larger bufsize A UDP datagram cannot be greater than 65535 bytes, as the UDP length header field is encoded on 2 bytes. As such, sendmsg() will reject a bigger input with error EMSGSIZE. By default, this does not cause any issue as QUIC datagrams are limited to 1252 bytes and sent individually. However, with GSO support, values bigger than 1252 bytes are specified on sendmsg(). If using a bufsize equal to or greater than 65535, the syscall could reject the input buffer with EMSGSIZE. As this value is not expected, the connection is immediately closed by haproxy and the transfer is interrupted. This bug can easily be reproduced by requesting a large object on the loopback interface and using a bufsize of 65535 bytes. In fact, the limit is slightly less than 65535, as extra room is also needed for IP + UDP headers. Fix this by reducing the count of datagrams encoded in a single GSO invocation via qc_prep_pkts(). Previously, it was set to 64 as specified by man 7 udp. However, with 1252-byte datagrams, this is still too many. Reduce it to a value of 52. Input to sendmsg() will thus be restricted to at most 65104 bytes if the last datagram is full. If there is still data available for encoding in qc_prep_pkts(), it will be written in a separate batch of datagrams. qc_send_ppkts() will then loop over the whole QUIC Tx buffer and call sendmsg() for each series of at most 52 datagrams. This does not need to be backported.	2024-11-26 11:49:30 +01:00
Frederic Lecaille	96b2641fc8	BUG/MAJOR: quic: fix wrong packet building due to already acked frames If a packet build was asked to probe the peer with frames which had just been acked, the frame building run by qc_build_frms() could be cancelled by qc_stream_frm_is_acked(), whose aim is to check that the current frames to be built have not already been acknowledged. In this case, the packet build run by qc_do_build_pkt() is not interrupted, leading to the build of an empty packet which should be ack-eliciting. This bug is detected by the BUG_ON() statement in qc_do_build_pkt(): BUG_ON(qel->pktns->tx.pto_probe && !(pkt->flags & QUIC_FL_TX_PACKET_ACK_ELICITING)); Thank you to @Tristan971 for having reported this issue in GH #2709 This is an old bug which must be backported as far as 2.6.	2024-11-25 18:55:45 +01:00
Amaury Denoyelle	044452546e	BUG/MEDIUM: quic: fix sending performance due to qc_prep_pkts() return qc_prep_pkts() is a QUIC transport level function which encodes one or several datagrams in a buffer before sending them. It returns the number of encoded datagrams. This is especially important when pacing is used to limit packet bursts. This datagram accounting was not trivial as qc_prep_pkts() used several code paths depending on the condition of the current encoded packet. Thus, there were several places where the local variable dgram_cnt could have been incremented. This was implemented by the following commit: commit 5cb8f8a6224db96f4386277c41ddae4a29a4130d MINOR: quic: support a max number of built packet per send iteration However, there is a bug due to a missing increment when all frames from the current QEL have been encoded. In this case, the encoding continues in the same datagram to coalesce a future packet. However, if this is the last QEL, the encoding loop will then break. As first_pkt is not NULL, qc_txb_store() is called outside but dgram_cnt is not incremented. In particular, this causes qc_prep_pkts() to return 0 when there are only small STREAM frames to emit for the application QEL. In qc_send(), this is interpreted as a value which prevents further emission for the current invocation. Thus, it may hurt performance, both without and with pacing. To fix this, the multiple dgram_cnt increments are removed. Now, it is modified only in a single place, which should cover every case and render the code easier to validate. The most notable case where the bug is visible is when using cubic with pacing without any burst, with quic-cc-algo cubic(,1). First, the average transfer bandwidth was suboptimal, with significant variation. Worse, it could sometimes fall dramatically for a particular stream without recovering before returning to an expected level on the next one. No need to backport.	2024-11-25 11:21:28 +01:00
Frederic Lecaille	01fcbd6c08	BUG/MINOR: quic: Missing application limitations tracking for BBR The aim of the ->app_limited member of the delivery rate struct (quic_cc_drs) is to store the index of the last transmitted byte marked as application-limited so as to track the application-limited phases. During these phases, BBR must ignore delivery rate samples to properly estimate the delivery rate. Without such a patch, the Startup phase could be exited very quickly with a very low estimated bottleneck bandwidth. This had a very bad impact on small objects with download times smaller than the expected Startup phase duration. For such objects, with enough bandwidth, BBR should stay in the Startup state. No need to be backported, as BBR is only implemented in the current development version.	2024-11-21 19:23:53 +01:00
Frederic Lecaille	e778b9a2b6	MINOR: quic: TX part modifications to support BBR. Very few modifications: call the ->on_transmit() and ->drs_on_transmit() congestion control algorithm (quic_cc) callbacks from qc_send_ppkts() just after having sent some packets.	2024-11-20 17:34:22 +01:00
Amaury Denoyelle	886a7c475c	MINOR: quic/pacing: add burst support qc_send_mux() has been extended previously to support pacing emission. This ensures that no more than one datagram is emitted during each invocation. However, to achieve better performance, it may be necessary to emit a batch of several datagrams in one turn. A so-called burst value can be specified by the user in the configuration. However, some congestion control algorithms may define their own dynamic value. As such, a new CC callback pacing_burst is defined. quic_cc_default_pacing_burst() can be used for algorithms without pacing interaction, such as cubic. It returns a static value based on the user-selected configuration.	2024-11-19 16:16:48 +01:00
Amaury Denoyelle	8039fe43e6	MINOR: quic/pacing: support pacing emission on quic_conn layer Pacing will be implemented for STREAM frame emission. As such, the qc_send_mux() API has been extended with an argument pointing to a quic_pacer engine. If non-NULL, the engine will be used to pace emission. In short, no more than one datagram will be emitted for each qc_send_mux() invocation. The pacer is then notified about the emission and a timer for a future emission is calculated. qc_send_mux() will return the PACING error value, to inform the QUIC MUX layer that it will be responsible for retrying emission after some delay.	2024-11-19 16:16:48 +01:00
Amaury Denoyelle	7fd48a5723	MINOR: quic: extend qc_send_mux() return type with a dedicated enum This commit is part of an adjustment of the QUIC transport send API to support pacing. Here, the qc_send_mux() return type has been changed to use a new enum quic_tx_err. This is useful to explain different causes of emission failure. For now, only two values have been defined: NONE and FATAL. When pacing is implemented, a new value will be added to specify that emission was interrupted by pacing. This won't be a fatal error as it allows emission to be retried, but not immediately.	2024-11-19 16:16:48 +01:00
Amaury Denoyelle	5cb8f8a622	MINOR: quic: support a max number of built packet per send iteration Extend the QUIC transport emission function to support a maximum datagram argument. The purpose is to ensure that qc_send() won't emit more than the specified value, unless it is 0, which is considered as unlimited. In qc_prep_pkts(), a counter of built datagrams has been added to support this. The packet building loop is interrupted if it reaches the specified maximum value. Also, its return value has been changed to the number of prepared datagrams. This is reused by qc_send() to interrupt its work if the specified max datagram argument value is reached over one or several iterations of prepared/sent datagrams. This change is necessary to support pacing emission. Note that ideally, the total length in bytes of emitted datagrams should be taken into account instead of the raw number of datagrams. However, for a first implementation, it was deemed easier to implement it with the latter.	2024-11-19 16:16:48 +01:00
Amaury Denoyelle	a554d82131	MINOR: quic: simplify qc_prep_pkts() exit path To prepare pacing support, the qc_prep_pkts() exit path has been rewritten to be easily modified. This is purely refactoring which should not have any functional change: * a dedicated error path has been added * ensure qc_txb_store() is always called to finalize the datagram on the normal exit path if first_pkt is not NULL. Needed to support breaking from the packet building loop in an easier way.	2024-11-19 16:16:48 +01:00
Frederic Lecaille	217e467e89	BUG/MINOR: quic: fix malformed probing packet building This bug arrived with this commit: cdfceb10a MINOR: quic: refactor qc_prep_pkts() loop which prevents haproxy from sending PING-only packets/datagrams (packets/datagrams whose only ack-eliciting frame is a PING frame). Such packets/datagrams are useful in rare cases during retransmissions when one wants to probe the peer without exceeding the anti-amplification limit. Modify the condition passed to qc_build_pkt() to add padding to the current datagram. One does not want to do that when probing the peer without ack-eliciting frames passed as the <frms> parameter. Indeed, qc_build_pkt() calls qc_do_build_pkt() which supports this case: if <probe> is true (probing required), qc_do_build_pkt() handles the case where some padding must be added to a PING-only packet/datagram. This is the case when probing with an empty <frms> list of ack-eliciting frames without exceeding the anti-amplification limit from qc_dgrams_retransmit(). Add some comments to qc_build_pkt() and qc_do_build_pkt() to clarify this, as this code is easy to break! Thank you to @Tristan971 for having reported this issue in GH #2709. Must be backported to 3.0.	2024-11-05 20:17:35 +01:00
Frederic Lecaille	444a19ea38	MINOR: quic: Help diagnosing malformed probing packets Add a BUG_ON() to detect some malformed packets which are supposed to probe the peer without being ack-eliciting: the peer would not acknowledge such packets.	2024-11-05 20:17:35 +01:00
Amaury Denoyelle	a8738f4156	MINOR: quic: complete trace in qc_may_build_pkt() Log the encryption level in qc_may_build_pkt(). This is necessary to fully understand the sending conditions of the QUIC stack.	2024-10-31 15:35:31 +01:00
Amaury Denoyelle	e7578084b0	MINOR: quic: implement dedicated type for out-of-order stream ACK The QUIC streamdesc layer is responsible for handling reception of ACKs for streams. It removes stream data from the underlying buffers on ACK reception. The streamdesc layer treats ACKs in order at the stream level. Out-of-order ACKs are buffered in a tree until they can be handled upon acknowledgement of older data. Previously, the qf_stream instance which comes from the quic_tx_packet was used as the tree node to buffer such ranges. Introduce a new type dedicated to representing out-of-order stream ACK data ranges. This type is named qc_stream_ack. It contains only minimal info relative to the acknowledged stream data range. This allows reducing the size of the frequently used quic_frame with the removal of the tree node from qf_stream. Another side effect of this change is that quic_frame instances are now always released immediately on ACK reception, both in-order and out-of-order. This also allows the quic_tx_packet instance to be released, which should reduce memory consumption. The drawback of this change is that a qc_stream_ack instance must be allocated on out-of-order ACK reception. As such, qc_stream_desc_ack() may fail if an error happens on allocation. For the moment, such an error is silently recovered up to qc_treat_rx_pkts() by dropping the received packet containing the ACK frame. In the future, it may be useful to close the connection, as this error may only happen in low-memory conditions.	2024-10-04 17:56:45 +02:00
Amaury Denoyelle	d7f4e5abf0	MEDIUM: quic: strengthen MUX send notification The previous commit implemented a refactor of MUX send notification from the quic_conn layer. With this new architecture, a proper callback is defined for each qc_stream_desc instance. This architecture change allows notification from the quic_conn layer to be simplified. First, ensure the MUX callback properly ignores retransmission of an already emitted frame. Luckily, this can be handled easily by comparing offsets and FIN status. Also, each QCS instance can now be unregistered from send notification just prior to qc_stream_desc release. This ensures a QCS is never manipulated from quic_conn after its emission has ended. Both these changes render the send notification more robust. As a nice side effect, the QUIC_FL_CONN_TX_MUX_CONTEXT flag can be removed as it is now unneeded.	2024-10-01 16:19:25 +02:00
Amaury Denoyelle	6ad99af0a9	MINOR: quic: refactor MUX send notification For STREAM emission, the QUIC MUX generates one or several frames and emits them via qc_send_mux(). The lower layer may use them as-is, or split them into smaller chunks to fit in a QUIC packet. It is then responsible for notifying the MUX to report the amount of data sent. Previously, this was done via a direct call from quic_conn to the MUX using qcc_streams_sent_done(). Modify this to have better isolation across layers. Define a send callback handled by the qc_stream_desc instance. This allows the MUX to register each QCS instance individually to the renamed qmux_ctrl_send(), which replaces qcc_streams_sent_done(). At the quic_conn layer, qc_stream_desc_send() can now be used. This is a wrapper to the qc_stream_desc layer to invoke the send callback if registered. This mechanism of qc_stream_desc callbacks should be extended later to implement other notifications across the QUIC stack.	2024-10-01 16:19:25 +02:00
Amaury Denoyelle	714009b7bc	MINOR: quic: implement function to check if STREAM is fully acked When a STREAM frame is retransmitted, a check is performed to remove ranges of data already acked from it. This is useful when STREAM frames are duplicated and split to cover different data ranges. The newly retransmitted frame contains only unacked data. This process is performed similarly in qc_dup_pkt_frms() and qc_build_frms(). Refactor the code into a new function named qc_stream_frm_is_acked(). It returns true if frame data are already fully acked and retransmission can be avoided. If only a partial range of data is acknowledged, frame content is updated to only cover the unacked data. This patch does not have any functional change. However, it simplifies retransmission for STREAM frames. Also, it will be reused to fix retransmission for empty STREAM frames with FIN set from the following patch: BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM As such, it must be backported prior to it.	2024-08-07 10:57:10 +02:00
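The check-and-trim behavior described above can be sketched as shown below. The sketch assumes a hypothetical frame struct and a hypothetical `<fin_acked>` flag, so this is not the signature of `qc_stream_frm_is_acked()`. It also keeps an empty FIN-only frame when the data is acked but FIN is not, which is the standalone FIN case named by the follow-up fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical STREAM frame: data range [off, off + len) plus FIN bit. */
struct stream_frm {
    uint64_t off;
    uint64_t len;
    bool fin;
};

/* <ack_off>: every byte below it is acknowledged; <fin_acked>: FIN is too.
 * Returns true if the frame needs no retransmission. Otherwise the frame
 * is trimmed in place so that it only covers unacked data. */
static bool stream_frm_is_acked(struct stream_frm *frm, uint64_t ack_off,
                                bool fin_acked)
{
    uint64_t end = frm->off + frm->len;

    if (end <= ack_off) {
        if (!frm->fin || fin_acked)
            return true;
        /* All data acked but FIN is not: keep an empty FIN-only frame. */
        frm->off = end;
        frm->len = 0;
        return false;
    }

    if (frm->off < ack_off) {
        /* Partially acked: drop the acknowledged head of the frame. */
        frm->len = end - ack_off;
        frm->off = ack_off;
    }
    return false;
}
```

For a frame covering `[100, 150)` with data acked up to 120, the frame is reduced to `[120, 150)` and must still be retransmitted.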
Frederic Lecaille	eb1a097a66	BUG/MINOR: quic: Too short datagram during packet building failures (aws-lc only) This issue was reported by Ilya (@Chipitsine) when building haproxy against aws-lc in GH #2663 where handshakeloss and handshakecorruption interop tests could lead haproxy to crash after having built too short datagrams: FATAL: bug condition "first_pkt->type == QUIC_PACKET_TYPE_INITIAL && (first_pkt->flags & (1UL << 0)) && length < 1200" matched at src/quic_tx.c:163 call trace(13): | 0x55f4ee4dcc02 [ba d9 00 00 00 48 8d 35]: main-0x195bf2 | 0x55f4ee4e3112 [83 3d 2f 16 35 00 00 0f]: qc_send+0x11f3/0x1b5d | 0x55f4ee4e9ab4 [85 c0 0f 85 00 f6 ff ff]: quic_conn_io_cb+0xab1/0xf1c | 0x55f4ee6efa82 [48 c7 c0 f8 55 ff ff 64]: run_tasks_from_lists+0x173/0x9c2 | 0x55f4ee6f05d3 [8b 7d a0 29 c7 85 ff 0f]: process_runnable_tasks+0x302/0x6e6 | 0x55f4ee671bb7 [83 3d 86 72 44 00 01 0f]: run_poll_loop+0x6e/0x57b | 0x55f4ee672367 [48 8b 1d 22 d4 1d 00 48]: main-0x48d | 0x55f4ee6755e0 [b8 00 00 00 00 e8 08 61]: main+0x2dec/0x335d This could happen after Handshake packet building failures following a successful Initial packet in the same datagram. In this case, the datagram could be emitted with a too short length (<1200 bytes). To fix this, store the datagram only if the first packet is not an Initial packet or if its length is big enough (>=1200 bytes). Must be backported as far as 2.6.	2024-08-05 13:40:51 +02:00
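The storage rule stated in the fix reduces to a single predicate, sketched below. The packet type enum and function name are hypothetical. The 1200-byte constant is the minimum size QUIC requires for datagrams carrying Initial packets (RFC 9000, section 14.1).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet types; only the Initial case matters here. */
enum pkt_type { PKT_INITIAL, PKT_0RTT, PKT_HANDSHAKE, PKT_1RTT };

/* Minimum size for datagrams carrying Initial packets (RFC 9000, 14.1). */
#define MIN_INITIAL_DGRAM_LEN 1200

/* True if the datagram may be stored for emission: its first packet is
 * not an Initial packet, or the datagram is already long enough. */
static bool dgram_can_be_stored(enum pkt_type first_pkt_type, size_t dgram_len)
{
    return first_pkt_type != PKT_INITIAL ||
           dgram_len >= MIN_INITIAL_DGRAM_LEN;
}
```

In the failure scenario above, an Initial packet followed by a failed Handshake packet leaves a datagram shorter than 1200 bytes. With this check, such a datagram is not stored.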
William Lallemand	177c84808c	MEDIUM: quic: add key argument to header protection crypto functions In order to prepare the code for using Chacha20 with the EVP_AEAD API, both quic_tls_hp_decrypt() and quic_tls_hp_encrypt() need an extra key argument. Indeed, Chacha20 does not exist as an EVP_CIPHER in AWS-LC, so the key won't be embedded into the EVP_CIPHER_CTX, and an extra parameter is needed to pass it.	2024-07-25 13:45:39 +02:00
William Lallemand	d55a297b85	MINOR: quic: rename confusing wording aes to hp Some of the crypto functions used for header protection in QUIC are named with an "aes" name even though they are not used for AES encryption only. This patch renames these "aes" to "hp" so it is clearer.	2024-07-25 13:45:38 +02:00
Amaury Denoyelle	b0990b38f8	MINOR: quic: add counters of sent bytes with and without GSO Add a sent bytes counter for each quic_conn instance. A secondary field accounts only for bytes sent via GSO, which is useful to check whether GSO is activated. For the moment, these counters are reported on "show quic" but not aggregated on proxy quic module stats.	2024-07-11 11:02:44 +02:00
Amaury Denoyelle	d0ea173e35	MEDIUM: quic: implement GSO fallback mechanism UDP GSO on Linux is not implemented in every network device. For example, this is not available for veth devices frequently used in container environments. In such a case, EIO is reported on send() invocation. It is impossible to test for proper GSO support at startup in this case, as a listener may be bound on multiple network interfaces. Furthermore, network interfaces may change during haproxy lifetime. As such, the only option is to react on send syscall error when GSO is used. The purpose of this patch is to implement a fallback when encountering such conditions. Emission can be retried immediately by trying to send each prepared datagram individually. To support this, qc_send_ppkts() is able to iterate over each datagram in a so-called non-GSO fallback mode. Between each emission, a datagram header is rewritten in front of the buffer, which allows the sending loop to proceed until the last datagram is emitted. To complement this, the quic_conn listener is flagged on first GSO send error with value LI_F_UDP_GSO_NOTSUPP. This completely disables GSO for all future emissions with QUIC connections using this listener. For the moment, non-GSO fallback mode is activated when EIO is reported after GSO has been set. This is the error reported for the veth usage described above.	2024-07-11 11:02:44 +02:00
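The fallback logic can be sketched as follows. Try one GSO send. On EIO, flag the listener and immediately resend each datagram individually. The sender type, `listener_state` (standing in for the `LI_F_UDP_GSO_NOTSUPP` flag) and the fake device are hypothetical. For simplicity, the sketch assumes equally sized datagrams and does not model the header rewriting haproxy performs between emissions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sender: returns 0 or an errno value. A non-zero <gso_size>
 * asks for <len> bytes to be split into datagrams of that size. */
typedef int (*send_fn)(const char *buf, size_t len, size_t gso_size, void *ctx);

/* Stand-in for the listener flag LI_F_UDP_GSO_NOTSUPP. */
struct listener_state {
    bool gso_unsupported;
};

/* Sends <count> datagrams of <dgram_len> bytes stored back to back in <buf>. */
static int send_dgrams(struct listener_state *li, send_fn snd, void *ctx,
                       const char *buf, size_t dgram_len, size_t count)
{
    if (!li->gso_unsupported && count > 1) {
        int err = snd(buf, dgram_len * count, dgram_len, ctx);

        if (err != EIO)
            return err; /* success, or an error unrelated to GSO */
        /* EIO with GSO: disable it for every later emission. */
        li->gso_unsupported = true;
    }

    /* Non-GSO path, also used as the immediate fallback. */
    for (size_t i = 0; i < count; i++) {
        int err = snd(buf + i * dgram_len, dgram_len, 0, ctx);

        if (err)
            return err;
    }
    return 0;
}

/* Fake device rejecting GSO, like the veth case described above. */
struct fake_dev {
    bool gso_ok;
    int calls;
};

static int fake_send(const char *buf, size_t len, size_t gso_size, void *ctx)
{
    struct fake_dev *dev = ctx;

    (void)buf;
    (void)len;
    dev->calls++;
    return (gso_size && !dev->gso_ok) ? EIO : 0;
}
```

On the fake device, the first emission of three datagrams costs one failed GSO attempt plus three individual sends. Later emissions skip GSO entirely because the listener is flagged.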
Amaury Denoyelle	af22792a43	MAJOR: quic: support GSO when encoding datagrams QUIC datagrams are encoded during emission via the function qc_prep_pkts(). By default, if GSO is not used, each datagram is prefixed by a metadata header which specifies its length and the address of its first quic_tx_packet instance. If GSO is activated, the metadata header won't be inserted for datagrams following the first one sent in a single syscall. The length field will contain the total size of these datagrams. This allows supporting both GSO and non-GSO prepared datagrams in the same Tx buffer. qc_send_ppkts() is invoked just after datagram encoding. It iterates over each metadata header in the Tx buffer to send each datagram individually. If the length field is bigger than the network MTU, GSO usage is assumed and the qc_snd_buf() GSO parameter will be set. Another important point to note regarding the GSO implementation is that during datagram encoding, packets from the same datagram instance are attached together. However, if using GSO, consecutive packets from different datagrams are also linked, but without the QUIC_FL_TX_PACKET_COALESCED flag. This allows properly updating quic_conn status with all sent packets in qc_send_ppkts(). Packets from different datagrams are then unlinked to treat them separately when receiving the corresponding ACK frames.	2024-07-11 11:02:44 +02:00
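The Tx buffer layout described above can be sketched as follows. Each datagram, or each whole GSO batch, is preceded by a metadata header. The sending side walks these headers and treats a length above the MTU as a GSO batch. The `dgram_hdr` struct, the `uint32_t` stand-in for the first `quic_tx_packet` pointer, and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata header placed before each datagram, or before a
 * whole GSO batch, in which case <len> is the total size of the batch. */
struct dgram_hdr {
    uint16_t len;
    uint32_t first_pkt; /* stand-in for the first quic_tx_packet pointer */
};

/* Appends a header and <len> payload bytes at *<pos>; false if no room. */
static bool txbuf_put(unsigned char *buf, size_t size, size_t *pos,
                      uint32_t first_pkt, const unsigned char *data,
                      uint16_t len)
{
    struct dgram_hdr hdr = { len, first_pkt };

    if (size - *pos < sizeof(hdr) + len)
        return false;
    memcpy(buf + *pos, &hdr, sizeof(hdr));
    memcpy(buf + *pos + sizeof(hdr), data, len);
    *pos += sizeof(hdr) + len;
    return true;
}

/* Walks the headers up to <end>: one send per header, and a header whose
 * length exceeds <mtu> is treated as a GSO batch. */
static void txbuf_walk(const unsigned char *buf, size_t end, size_t mtu,
                       int *nsend, int *ngso)
{
    size_t pos = 0;

    *nsend = 0;
    *ngso = 0;
    while (pos < end) {
        struct dgram_hdr hdr;

        memcpy(&hdr, buf + pos, sizeof(hdr));
        (*nsend)++;
        if (hdr.len > mtu)
            (*ngso)++; /* would set the GSO parameter on this send */
        pos += sizeof(hdr) + hdr.len;
    }
}
```

Copying the header with `memcpy()` avoids unaligned access when payload sizes are odd. A single 1000-byte datagram followed by a 2400-byte batch of two datagrams yields two sends, only the second one using GSO.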
Amaury Denoyelle	448d3d388a	MINOR: quic: add GSO parameter on quic_sock send API Add <gso_size> parameter to qc_snd_buf(). When non-zero, this specifies the value for socket option SOL_UDP/UDP_SEGMENT. This allows sending several datagrams in a single call by splitting data at <gso_size> boundaries. For now, <gso_size> remains set to 0 by the caller; as such, there should not be any functional change.	2024-07-11 11:02:44 +02:00
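The sketch below shows the `UDP_SEGMENT` option and the resulting split. `udp_set_gso()` sets the option through the standard `setsockopt()` call, using `IPPROTO_UDP`, which has the same value as `SOL_UDP` on Linux. The option is guarded by `#ifdef` because older C libraries and non-Linux systems do not define it. `gso_segments()` computes how many datagrams a payload produces when split at `<gso_size>` boundaries, where only the last datagram may be shorter. Both helper names are hypothetical, and haproxy's `qc_snd_buf()` may pass the value differently (for example per call).

```c
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <sys/socket.h>

/* Enables kernel UDP segmentation on <fd> (Linux only); 0 disables it. */
static int udp_set_gso(int fd, int gso_size)
{
#ifdef UDP_SEGMENT
    return setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT, &gso_size,
                      sizeof(gso_size));
#else
    (void)fd;
    (void)gso_size;
    errno = ENOPROTOOPT;
    return -1;
#endif
}

/* Number of datagrams emitted for <len> bytes split at <gso_size>
 * boundaries; the size of the last, possibly shorter, one goes in
 * <last_len>. A <gso_size> of 0 means a single datagram. */
static size_t gso_segments(size_t len, size_t gso_size, size_t *last_len)
{
    size_t n;

    if (!gso_size || len <= gso_size) {
        *last_len = len;
        return 1;
    }
    n = (len + gso_size - 1) / gso_size;
    *last_len = len - (n - 1) * gso_size;
    return n;
}
```

For example, 3000 bytes with a `<gso_size>` of 1200 leave as three datagrams of 1200, 1200 and 600 bytes.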
Amaury Denoyelle	cac47d19bd	CLEANUP: quic: remove obsolete comment on send Remove comment on send which is now obsolete since the introduction of per-connection socket.	2024-07-11 11:02:44 +02:00
Frederic Lecaille	6d943b8db6	BUG/MINOR: quic: Wrong datagram building when probing. This issue was revealed by chacha20 interop test which very often fails with ngtcp2 as client. This was due to the fact that 2 application level packets could be coalesced into the same datagram as revealed by such a capture: Frame 380: 255 bytes on wire (2040 bits), 255 bytes captured (2040 bits) Point-to-Point Protocol Internet Protocol Version 4, Src: 193.167.100.100, Dst: 193.167.0.100 User Datagram Protocol QUIC IETF QUIC Connection information [Connection Number: 0] [Packet Length: 187] QUIC Short Header DCID=ec523fe99840f9c17c868a88d649147814 PKN=333 0... .... = Header Form: Short Header (0) .1.. .... = Fixed Bit: True ..0. .... = Spin Bit: False [...0 0... = Reserved: 0] [.... .0.. = Key Phase Bit: False] [.... ..00 = Packet Number Length: 1 bytes (0)] Destination Connection ID: ec523fe99840f9c17c868a88d649147814 [Packet Number: 333] Protected Payload […]: 43537d43a3c83e47db6891bd6a4fd7d7fa31941badcb87a540e843341d6a5e493ed4c3f6e6bbff094804ee0ab06830dc1a1bbf52ace4323d2e4f6e0bd4eea73df0721d2949d05a058d3afb974e814494ebf44d1375b0e7f1fd5bcf634cf32ef9a9b4018758a49d39a24c40 STREAM id=0 fin=0 off=294768 len=144 dir=Bidirectional origin=Client-initiated Frame Type: STREAM (0x000000000000000e) .... ...0 = Fin: False .... ..1. = Len(gth): True .... .1.. = Off(set): True Stream ID: 0 .... .... .... .... .... .... .... .... .... .... .... .... .... .... .... ...0 = Stream initiator: Client-initiated (0) .... .... .... .... .... .... .... .... .... .... .... .... .... .... .... ..0. = Stream direction: Bidirectional (0) Offset: 294768 Length: 144 Stream Data […]: 63eef6ccee0d2ab602db3682d0e7cc09b72db6adc307d7699a211144b4b6c029cbed9beae1491c10a5fe0678d815a5303843d33c0593fedc9b64068fd0207e280d05aac2c0054fe9ab30857bc3669ee51d34756cfd2e098eb1ab31a03911f6a103f0a16f8f984d9861efdcf4433c QUIC IETF [Packet Length: 38] QUIC Short Header DCID=ec523fe99840f9c17c868a88d649147814 PKN=334 0... .... 
= Header Form: Short Header (0) .1.. .... = Fixed Bit: True ..0. .... = Spin Bit: False [...0 0... = Reserved: 0] [.... .0.. = Key Phase Bit: False] [.... ..00 = Packet Number Length: 1 bytes (0)] Destination Connection ID: ec523fe99840f9c17c868a88d649147814 [Packet Number: 334] Protected Payload: b9c0e6dc3fc523574f8164c31b6cd156496212 PING Frame Type: PING (0x0000000000000001) PADDING Length: 2 Frame Type: PADDING (0x0000000000000000) [Padding Length: 2] On the peer side, these two packets are considered as a unique one, because there may be only one packet per datagram at the application encryption level, and this is reported as a STREAM frame encoding error: I00000332 0xec523fe99840f9c17c868a88d649147814 con recv packet len=225 mask=b2c69c7827 sample=43a3c83e47db6891bd6a4fd7d7fa3194 I00000332 0xec523fe99840f9c17c868a88d649147814 pkt rx pkn=333 dcid=0xec523fe99840f9c17c868a88d649147814 type=1RTT k=0 I00000332 0xec523fe99840f9c17c868a88d649147814 frm rx 333 1RTT STREAM(0x0e) id=0x0 fin=0 offset=294768 len=144 uni=0 ngtcp2_conn_read_pkt: ERR_FRAME_ENCODING I00000332 0xec523fe99840f9c17c868a88d649147814 pkt tx pkn=1531039643 dcid=0xae79dfc99d6c65d6 type=1RTT k=0 I00000332 0xec523fe99840f9c17c868a88d649147814 frm tx 1531039643 1RTT CONNECTION_CLOSE(0x1c) error_code=FRAME_ENCODING_ERROR(0x7) frame_type=0 reason_len=0 reason=[] I00000332 0xec523fe99840f9c17c868a88d649147814 frm tx 1531039643 1RTT PADDING(0x00) len=9 Note here that the sum of the two packet sizes (from the capture) is the same as the packet length reported by ngtcp2: 187+38 = 225. It also seems that wireshark tries to parse as many packets as possible from the same datagram, regardless of the QUIC protocol rules. Haproxy traces revealed that this could happen at least when probing the peer. The aim of the recent low-level packet building modifications was to build as many datagrams as possible into the same buffer. But it seems that the treatment of the probing packet case has been broken. That said, I have not identified the impacted commit. 
This issue could be reproduced inside the interop test environment (no git bisection possible). To fix this, rely on the <probe> variable value to identify whether the last packet built by qc_prep_pkts() was a probing one, then try to coalesce other packets into the same datagram only if this was not the case. Of course, the test on the <probe> value has to be done before setting it for the next packet. Must be backported to 3.0.	2024-07-01 09:29:09 +02:00
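The coalescing rules involved can be sketched as follows. A short-header (1-RTT) packet has no Length field, so it must be the last packet of its datagram (RFC 9000, section 12.2). Following the fix described above, nothing is coalesced after a probing packet either. The decision is taken from the packet just built, before moving on to the next one. The types and functions are hypothetical and do not reproduce `qc_prep_pkts()`.

```c
#include <stdbool.h>

/* Hypothetical encryption levels of the packets built into a datagram. */
enum enc_level { EL_INITIAL, EL_HANDSHAKE, EL_APP };

struct pkt_desc {
    enum enc_level level;
    bool probe; /* packet was built for probing */
};

/* May another packet follow <last> in the same datagram? */
static bool can_coalesce_after(const struct pkt_desc *last)
{
    /* Short-header (1-RTT) packets carry no Length field (RFC 9000, 12.2),
     * so nothing can be appended after them. */
    if (last->level == EL_APP)
        return false;
    /* Rule from the fix above: no coalescing after a probing packet. */
    return !last->probe;
}

/* Counts datagrams for a sequence of built packets. The decision for the
 * next packet uses the state of the packet just built. */
static int count_dgrams(const struct pkt_desc *pkts, int n)
{
    int dgrams = 0;
    bool new_dgram = true;

    for (int i = 0; i < n; i++) {
        if (new_dgram)
            dgrams++;
        new_dgram = !can_coalesce_after(&pkts[i]);
    }
    return dgrams;
}
```

Initial, Handshake and 1-RTT packets can share one datagram, whereas two 1-RTT packets, as in the capture above, always need two datagrams.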
Amaury Denoyelle	d5376b7a87	BUG/MINOR: quic: fix BUG_ON() on Tx pkt alloc failure On quic_tx_packet allocation failure, it is possible to trigger a BUG_ON() crash on INITIAL packet building. This statement is responsible for ensuring INITIAL packets are padded to 1200 bytes as required. If allocation of a packet on a higher encryption level fails, the PADDING frame cannot be properly encoded, despite the INITIAL packet being properly built. This crash happens due to qc_txb_store() invocation after quic_tx_packet allocation failure to validate already built packets. However, this statement is unneeded as qc_purge_tx_buf() is called just after. Simply remove qc_txb_store() to fix this issue. This was detected using -dMfail. This should be backported up to 2.6.	2024-06-24 14:40:38 +02:00
Amaury Denoyelle	937324d493	BUG/MAJOR: quic: do not loop on emission on closing/draining state To emit a CONNECTION_CLOSE frame, a special buffer is allocated via qc_txb_store(). This is due to QUIC_FL_CONN_IMMEDIATE_CLOSE flag. However, this flag is reset after qc_send_ppkts() invocation to prevent reemission of the CONNECTION_CLOSE frame. qc_send() can invoke a series of qc_prep_pkts() + qc_send_ppkts() multiple times to emit several datagrams. However, this may cause a crash if a CONNECTION_CLOSE is emitted on the first loop. On the next loop iteration, QUIC_FL_CONN_IMMEDIATE_CLOSE is reset, thus qc_prep_pkts() will use the wrong buffer size as end delimiter. In some cases, this may cause a BUG_ON() crash due to b_add() outside of the buffer. This bug can be reproduced by running ngtcp2-client in a while loop and interrupting it randomly via Ctrl+C. Here is the patch which introduced this regression: cdfceb10ae136b02e51f9bb346321cf0045d58e0 MINOR: quic: refactor qc_prep_pkts() loop	2024-06-19 15:15:59 +02:00
