120 Commits

Author SHA1 Message Date
Amaury Denoyelle
a5801e542d MINOR: quic: split global CID tree between FE and BE sides
QUIC CIDs are stored in a global tree. Prior to this patch, CIDs used on
both frontend and backend sides were mixed together.

This patch implement CID storage separation between FE and BE sides. The
original tre quic_cid_trees is splitted as
quic_fe_cid_trees/quic_be_cid_trees.

This patch should reduce contention between frontend and backend usages.
Also, it should reduce the risk of random CID collision.
2025-11-25 14:30:18 +01:00
Frederic Lecaille
91f479604e BUG/MEDIUM: quic-be: quic_conn_closed buffer overflow
This bug impacts only the backends.

Recent commits have modified quic_rx_pkt_parse() for the QUIC backend to handle the
retry token, and version negotiation. This function is called for the quic_conn
even when is closing state (so for the quic_conn_closed struct). The quic_conn
struct and quic_conn_closed struct share some members thank to the leading
QUIC_CONN_COMMON struct. The recent modification impacts some members which do not
exist for the quic_connn_closed struct, leading to buffer overflows if modified.

For the backends only this patch:
  1- silently drops the Retry packet (received/parsed only by backends)
  2- silently drops the Initial packets received in closing state

This is safe for the Initial packets because in closing state the datagrams
are entirely skipped thanks to qc_rx_check_closing() in quic_dgram_parse().

No backport needed because the backend support arrived with the current dev.
2025-11-21 10:49:44 +01:00
Amaury Denoyelle
c67a614e45 MINOR: quic: remove <ipv4> arg from qc_new_conn()
Remove <ipv4> argument from qc_new_conn(). This parameter is unnecessary
as it can be derived from the family type of the addresses also passed
as argument.
2025-11-17 10:20:54 +01:00
Amaury Denoyelle
133f100467 MINOR: quic: refactor qc_new_conn() prototype
The objective of this patch is to streamline qc_new_conn() usage so that
it is similar for frontend and backend sides.

Previously, several parameters were set only for frontend connections.
These arguments are replaced by a single quic_rx_packet argument, which
represents the INITIAL packet triggering the connection allocation on
the server side. For a QUIC client endpoint, it remains NULL. This usage
is consider more explicit.

As a minor change, <target> is moved as the first argument of the
function. This is considered useful as this argument determines whether
the connection is a frontend or backend entry.

Along with these changes, qc_new_conn() documentation has been reworded
so that it is now up-to-date with the newest usage.
2025-11-17 10:13:40 +01:00
Amaury Denoyelle
8720130cc7 MINOR: quic: do not use quic_newcid_from_hash64 on BE side
quic_newcid_from_hash64 is an external callback. If defined, it serves
as a CID method generation, as an alternative to the default random
implementation.

This mechanism was not correctly implemented on the backend side.
Indeed, <hash64> quic_conn member is only setted for frontend
connections. The simplest solution would be to properly define it also
for backend ones. However, quic_newcid_from_hash64 derivation is really
only useful for the frontend side for now. Thus, this patch disables
using it on the backend side in favor of the default random generator.

To implement this, quic_cid_generate() is splitted in two functions, for
both methods of CIDs generation. This is the responsibility of the
caller to select the proper method. On backend side, only random
implementation is now used.
2025-11-17 10:11:04 +01:00
Frederic Lecaille
f0c52f7160 BUG/MINOR: quic-be: missing version negotiation
This bug impacts only the QUIC clients (or backends). The version negotiation
was not supported at all for them. This is an oversight.

Contrary to the QUIC server which choose the negotiated version after having
received the transport parameters (into ClientHello message) the client selects
the negotiated version from the first Initial packet version field. Indeed, the
server transport parameters are inside the ServerHello messages ciphered
into Handshake packets.

This non intrusive patch does not impact the QUIC server implementation.
It only selects the negotiated version from the first Initial packet
received from the server and consequently initializes the TLS cipher context.

Thank you to @InputOutputZ for having reporte this issue in GH #3178.

No need to backport because the QUIC backends support arrives with 3.3.
2025-11-14 17:37:34 +01:00
Frederic Lecaille
80070fe51c MEDIUM: quic-be: Parse, store and reuse tokens provided by NEW_TOKEN
Add a per thread ist struct to srv_per_thread struct to store the QUIC token to
be reused for subsequent sessions.

Parse at packet level (from qc_parse_ptk_frms()) these tokens and store
them calling qc_try_store_new_token() newly implemented function. This is
this new function which does its best (may fail) to update the tokens.

Modify qc_do_build_pkt() to resend these tokens calling quic_enc_token()
implemented by this patch.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
64e32a0767 BUG/MEDIUM: quic-be: do not launch the connection migration process
At this time the connection migration is not supported by QUIC backends.
This patch prevents this process to be launched for connections to QUIC backends.

Furthermore, the connection migration process could be started systematically
when connecting a backend to INADDR_ANY, leading to crashes into qc_handle_conn_migration()
(when referencing qc->li).

Thank you to @InputOutputZ for having reported this issue in GH #3178.

This patch simply checks the connection type (listener or not) before checking if
a connection migration must be started.

No need to backport because support for QUIC backends is available from 3.3.
2025-11-13 13:52:40 +01:00
Amaury Denoyelle
5a8728d03a MEDIUM/OPTIM: quic: alloc quic_conn after CID collision check
On Initial packet parsing, a new quic_conn instance is allocated via
qc_new_conn(). Then a CID is allocated with its value derivated from
client ODCID. On CID tree insert, a collision can occur if another
thread was already parsing an Initial packet from the same client. In
this case, the connection is released and the packet will be requeued to
the other thread.

Originally, CID collision check was performed prior to quic_conn
allocation. This was changed by the commit below, as this could cause
issue on quic_conn alloc failure.

  commit 4ae29be18c5b212dd2a1a8e9fa0ee2fcb9dbb4b3
  BUG/MINOR: quic: Possible endless loop in quic_lstnr_dghdlr()

However, this procedure is less optimal. Indeed, qc_new_conn() performs
many steps, thus it could be better to skip it on Initial CID collision,
which can happen frequently. This patch restores the older order of
operations, with CID collision check prior to quic_conn allocation.

To ensure this does not cause again the same bug, the CID is removed in
case of quic_conn alloc failure. This should prevent any loop as it
ensures that a CID found in the global tree does not point to a NULL
quic_conn, unless if CID is attach to a foreign thread. When this thread
will parse a re-enqueued packet, either the quic_conn is already
allocated or the CID has been removed, triggering a fresh CID and
quic_conn allocation procedure.
2025-11-10 12:10:14 +01:00
Amaury Denoyelle
a9d11ab7f3 MINOR: quic: extend traces on CID allocation
Add new traces to detect the CID generation method and also when an
Initial packet is requeued due to CID collision.
2025-11-10 12:10:14 +01:00
Amaury Denoyelle
2623e0a0b7 BUG/MEDIUM: quic: handle collision on CID generation
CIDs are provided by haproxy so that the peer can use them as DCID of
its packets. Their value is set via a random generator. It happens on
several occasions during connection lifetime:
* via ODCID derivation if haproxy is the server
* on quic_conn init if haproxy is the client
* during post-handshake if haproxy is the server
* on RETIRE_CONNECTION_ID frame parsing

CIDs are stored in a global tree. On ODCID derivation, a check is
performed to ensure the CID is not a duplicate value. This is mandatory
to properly handle multiple INITIAL packets from the same client on
different thread.

However, for the other cases, no check is performed for CID collision.
As _quic_cid_insert() is silent, the issue is not detected at all. This
results in a CID advertized to the peer but not stored in the global
one. In the end, this may cause two issues. The first one is that
packets from the client which use the new CID will be rejected by
haproxy, most probably with a STATELESS_RESET. The second issue is that
it can cause a crash during quic_conn release. Indeed, the CID is stored
in the quic_conn local tree and thus eb_delete() for the global tree
will be performed. As <leaf_p> member is uninit, this results in a
segfault.

Note that this issue is pretty rare. It can only be observed if running
with a high number of concurrent connections in parallel, so that the
random generator will provide duplicate values. Patch is still labelled
as MEDIUM as this modifies code paths used frequently.

To fix this, _quic_cid_insert() unsafe function is completely removed.
Instead, quic_cid_insert() can be used, which reports an error code if a
collision happens. CID are then stored in the quic_conn tree only after
global tree insert success. Here is the solution for each steps if a
collision occurs :
* on init as client: the connection is completely released
* post-handshake: the CID is immediately released. The connection is
  kept, but it will miss an extra CID.
* on RETIRE_CONNECTION_ID parsing: a loop is implemented to retry random
  generation. It it fails several times, the connection is closed in
  error.

A small convenience change is made to quic_cid_insert(). Output
parameter <new_tid> can now be NULL, which is useful as most of the
times caller do not care about it.

This must be backported up to 2.6.
2025-11-10 12:10:14 +01:00
Amaury Denoyelle
419e5509d8 MINOR: quic: split CID alloc/generation function
Split new_quic_cid() function into multiple ones. This patch should not
introduce any visible change. The objective is to render CID allocation
and generation more modular.

The first advantage of this patch is to bring code simplication. In
particular, conn CID sequence number increment and insertion into
connection tree is simpler than before. Another improvment is also that
errors could now be handled easier at each different steps of the CID
init.

This patch is a prerequisite for the fix on CID collision, thus it must
be backported prior to it to every affected version.
2025-11-10 12:10:14 +01:00
Amaury Denoyelle
73621adb23 BUG/MINOR: quic: close connection on CID alloc failure
During RETIRE_CONNECTION_ID frame parsing, a new connection ID is
immediately reallocated after the release of the previous one. This is
done to ensure that the peer will never run out of DCID.

Prior to this patch, a CID allocation failure was be silently ignored.
This prevent the emission of a new CID, which could prevent the peer to
emit packets if it had no other CIDs available for use. Now, such error
is considered fatal to the connection. This is the safest solution as
it's better to close connections when memory is running low.

It must be backported up to 2.8.
2025-11-10 12:10:14 +01:00
Amaury Denoyelle
b9809fe0d0 MINOR: quic: remove <mux_state> field
This patch removes <mux_state> field from quic_conn structure. The
purpose of this field was to indicate if MUX layer above quic_conn is
not yet initialized, active, or already released.

It became tedious to properly set it as initialization order of the
various quic_conn/conn/MUX layers now differ between the frontend and
backend sides, and also depending if 0-RTT is used or not. Recently, a
new change introduced in connect_server() will allow to initialize QUIC
MUX earlier if ALPN is cached on the server structure. This had another
level of complexity.

Thus, this patch removes <mux_state> field completely. Instead, a new
flag QUIC_FL_CONN_XPRT_CLOSED is defined. It is set at a single place
only on close XPRT callback invokation. It can be mixed with the new
utility functions qc_wait_for_conn()/qc_is_conn_ready() to determine the
status of conn/MUX layers now without an extra quic_conn field.
2025-11-05 14:03:34 +01:00
Amaury Denoyelle
efe60745b3 MINOR: quic: remove connection arg from qc_new_conn()
This patch is similar to the previous one, this time dealing with
qc_new_conn(). This function was asymetric on frontend and backend side,
as connection argument was set only in the latter case.

This was required prior due to qc_alloc_ssl_sock_ctx() signature. This
has changed with the previous patch, thus qc_new_conn() can also be
realigned on both FE and BE sides. <conn> member of quic_conn instance
is always set outside it, in qc_xprt_start() on the backend case.
2025-11-04 17:47:42 +01:00
Amaury Denoyelle
a14c6cee17 MINOR: quic: rename retry-threshold setting
A QUIC global tune setting is defined to be able to force Retry emission
prior to handshake. By definition, this ability is only supported by
QUIC servers, hence it is a frontend option only.

Rename the option to use "fe" prefix. The old option name is deprecated
and will be removed in 3.5
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
33a8cb87a9 MINOR: quic: split congestion controler options for FE/BE usage
Various settings can be configured related to QUIC congestion controler.
This patch duplicates them to be able to set independent values on
frontend and backend sides.

As with previous patch, option are renamed to use "fe/be" unified
prefixes. This is part of the current serie of commits which unify QUIC
settings. Older options are deprecated and will be removed on 3.5
release.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
f50425c021 MINOR: quic: remove received CRYPTO temporary tree storage
The previous commit switch from ncbuf to ncbmbuf as storage for received
CRYPTO frames. The latter ensures that buffering of such frames cannot
fail anymore due to gaps size.

Previously, extra mechanism were implemented on QUIC frames parsing
function to overcome the limitation of ncbuf on gaps size. Before
insertion, CRYPTO frames were stored in a temporary tree to order their
insertion. As this is not necessary anymore, this commit removes the
temporary tree insertion.

This commit is closely associated to the previous bug fix. As it
provides a neat optimization and code simplication, it can be backported
with it, but not in the next immediate release to spot potential
regression.
2025-10-22 15:24:02 +02:00
Amaury Denoyelle
4c11206395 BUG/MAJOR: quic: use ncbmbuf for CRYPTO handling
In QUIC, TLS handshake messages such as ClientHello are encapsulated in
CRYPTO frames. Each QUIC implementation can split the content in several
frames of random sizes. In fact, this feature is now used by several
clients, based on chrome so-called "Chaos protection" mechanism :

https://quiche.googlesource.com/quiche/+/cb6b51054274cb2c939264faf34a1776e0a5bab7

To support this, haproxy uses a ncbuf storage to store received CRYPTO
frames before passing it to the SSL library. However, this storage
suffers from a limitation as gaps between two filled blocks cannot be
smaller than 8 bytes. Thus, depending on the size of received CRYPTO
frames and their order, ncbuf may not be sufficient. Over time, several
mechanisms were implemented in haproxy QUIC frames parsing to overcome
the ncbuf limitation.

However, reports recently highlight that with some clients haproxy is
not able to deal with CRYPTO frames reception. In particular, this is
the case with the latest ngtcp2 release, which implements a similar
chaos protection mechanism via the following patch. It also seems that
this impacts haproxy interaction with firefox.

commit 89c29fd8611d5e6d2f6b1f475c5e3494c376028c
Author: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Date:   Mon Aug 4 22:48:06 2025 +0900

    Crumble Client Initial CRYPTO (aka chaos protection)

To fix haproxy CRYPTO frames buffering once and for all, an alternative
non-contiguous buffer named ncbmbuf has been recently implemented. This
type does not suffer from gaps size limitation, albeit at the cost of a
small reduction in the size available for data storage.

Thus, the purpose of this current patch is to replace ncbuf with the
newer ncbmbuf for QUIC CRYPTO frames parsing. Now, ncbmb_add() is used
to buffer received frames which is guaranteed to suceed. The only
remaining case of error is if a received frame offset and length exceed
the ncbmbuf data storage, which would result in a CRYPTO_BUFFER_EXCEEDED
error code.

A notable behavior change when switching to ncbmbuf implementation is
that NCB_ADD_COMPARE mode cannot be used anymore during add. Instead,
crypto frame content received at a similar offset will be overwritten.

A final note regarding STREAM frames parsing. For now, it is considered
unnecessary to switch from ncbuf in this case. Indeed, QUIC clients does
not perform aggressive fragmentation for them. Keeping ncbuf ensure that
the data storage size is bigger than the equivalent ncbmbuf area.

This should fix github issue #3141.

This patch must be backported up to 2.6. It is first necessary to pick
the relevant commits for ncbmbuf implementation prior to it.
2025-10-22 15:04:41 +02:00
Frederic Lecaille
47bb15ca84 MINOR: quic: get rid of ->target quic_conn struct member
The ->li (struct listener *) member of quic_conn struct was replaced by a
->target (struct obj_type *) member by this commit:

    MINOR: quic-be: get rid of ->li quic_conn member

to abstract the connection type (front or back) when implementing QUIC for the
backends. In these cases, ->target was a pointer to the ojb_type of a server
struct. This could not work with the dynamic servers contrary to the listeners
which are not dynamic.

This patch almost reverts the one mentioned above. ->target pointer to obj_type member
is replaced by ->li pointer to listener struct member. As the listener are not
dynamic, this is easy to do this. All one has to do is to replace the
objt_listener(qc->target) statement by qc->li where applicable.

For the backend connection, when needed, this is always qc->conn->target which is
used only when qc->conn is initialized. The only "problematic" case is for
quic_dgram_parse() which takes a pointer to an obj_type as third argument.
But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function
it is used to access the proxy counters of the connection thanks to qc_counters().
So, this obj_type argument may be null for now on with this patch. This is the
reason why qc_counters() is modified to take this into consideration.
2025-09-11 09:51:28 +02:00
Frederic Lecaille
58b153b882 MINOR: quic: Add more information about RX packets
This patch is very useful to debug issues at RX packet processing level.

Should be easily backported as far as 2.6 (for debug purposes).
2025-09-03 09:41:38 +02:00
Frederic Lecaille
fba80c7fe8 BUG/MINOR: quic: ignore AGAIN ncbuf err when parsing CRYPTO frames
This fix follows this previous one:

    BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

which is not sufficient when a client fragments and mixes its CRYPTO frames AND
leaveswith holes by packets. ngtcp2 (and perhaps chrome) splits theire CRYPTO
frames but without hole by packet. In such a case, the CRYPTO parsing leads to
QUIC_RX_RET_FRM_AGAIN errors which cannot be fixed when the peer resends its packets.
Indeed, even if the peer resends its frames in a different order, this does not
help because since the previous commit, the CRYPTO frames are ordered on haproxy side.

This issue was detected thanks to the interopt tests with quic-go as client. This
client fragments its CRYPTO frames, mixes them, and generate holes, and most of
the times with the retry test.

To fix this, when a QUIC_RX_RET_FRM_AGAIN error is encountered, the CRYPTO frames
parsing is not stop. This leaves chances to the next CRYPTO frames to be parsed.

Must be backported as far as 2.6 as the commit mentioned above.
2025-09-02 08:13:58 +02:00
Frederic Lecaille
800ba73a9c BUG/MEDIUM: quic: CRYPTO frame freeing without eb_delete()
Since this commit:

	BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

when they are parsed, the CRYPTO frames are ordered by their offsets into an ebtree.
Then their data are provided to the ncbufs.

But in case of error, when qc_handle_crypto_frm() returns QUIC_RX_RET_FRM_FATAL
or QUIC_RX_RET_FRM_AGAIN), they remain attached to their tree. Then
from <err> label, they are deteleted and deleted (with a while(node) { eb_delete();
qc_frm_free();} loop). But before this loop, these statements directly
free the frame without deleting it from its tree, if this is a CRYPTO frame,
leading to a use after free when running the loop:

     if (frm)
	    qc_frm_free(qc, &frm);

This issue was detected by the interop tests, with quic-go as client. Weirdly, this
client sends CRYPTO frames by packet with holes.
Must be backported as far as 2.6 as the commit mentioned above.
2025-09-01 10:39:00 +02:00
Frederic Lecaille
90126ec9b7 CLEANUP: quic: remove a useless CRYPTO frame variable assignment
This modification should have arrived with this commit:

	MINOR: quic: remove ->offset qf_crypto struct field

Since this commit, the CRYPTO offset node key assignment is done at parsing time
when calling qc_parse_frm() from qc_parse_pkt_frms().

This useless assigment has been reported in GH #3095 by coverity.

This patch should be easily backported as far as 2.6 as the one mentioned above
to ease any further backport to come.
2025-09-01 09:31:04 +02:00
Frederic Lecaille
31c17ad837 MINOR: quic: remove ->offset qf_crypto struct field
This patch follows this previous bug fix:

    BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

where a ebtree node has been added to qf_crypto struct. It has the same
meaning and type as ->offset_node.key field with ->offset_node an eb64tree node.
This patch simply removes ->offset which is no more useful.

This patch should be easily backported as far as 2.6 as the one mentioned above
to ease any further backport to come.
2025-08-28 08:19:34 +02:00
Frederic Lecaille
d753f24096 BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets
This issue impacts the QUIC listeners. It is the same as the one fixed by this
commit:

	BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

As chrome, ngtcp2 client decided to fragment its CRYPTO frames but in a much
more agressive way. This could be fixed with a list local to qc_parse_pkt_frms()
to please chrome thanks to the commit above. But this is not sufficient for
ngtcp2 which often splits its ClientHello message into more than 10 fragments
with very small ones. This leads the packet parser to interrupt the CRYPTO frames
parsing due to the ncbuf gap size limit.

To fix this, this patch approximatively proceeds the same way but with an
ebtree to reorder the CRYPTO by their offsets. These frames are directly
inserted into a local ebtree. Then this ebtree is reused to provide the
reordered CRYPTO data to the underlying ncbuf (non contiguous buffer). This way
there are very few less chances for the ncbufs used to store CRYPTO data
to reach a too much fragmented state.

Must be backported as far as 2.6.
2025-08-27 16:14:19 +02:00
Willy Tarreau
c264ea1679 MEDIUM: tree-wide: replace most DECLARE_POOL with DECLARE_TYPED_POOL
This will make the pools size and alignment automatically inherit
the type declaration. It was done like this:

   sed -i -e 's:DECLARE_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_POOL src addons)
   sed -i -e 's:DECLARE_STATIC_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_STATIC_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_STATIC_POOL src addons)

81 replacements were made. The only remaining ones are those which set
their own size without depending on a structure. The few ones with an
extra size were manually handled.

It also means that the requested alignments are now checked against the
type's. Given that none is specified for now, no issue is reported.

It was verified with "show pools detailed" that the definitions are
exactly the same, and that the binaries are similar.
2025-08-11 19:55:30 +02:00
Amaury Denoyelle
731b52ded9 MINOR: quic: prefer qc_is_back() usage over qc->target
Previously quic_conn <target> member was used to determine if quic_conn
was used on the frontend (as server) or backend side (as client). A new
helper function can now be used to directly check flag
QUIC_FL_CONN_IS_BACK.

This reduces the dependency between quic_conn and their relative
listener/server instances.
2025-08-07 16:59:59 +02:00
Frederic Lecaille
838024e07e MINOR: quic: Get rid of qc_is_listener()
Replace all calls to qc_is_listener() (resp. !qc_is_listener()) by calls to
objt_listener() (resp. objt_server()).
Remove qc_is_listener() implement and QUIC_FL_CONN_LISTENER the flag it
relied on.
2025-07-16 16:42:21 +02:00
Frederic Lecaille
194e3bc2d5 MINOR: quic-be: address validation support implementation (RETRY)
- Add ->retry_token and ->retry_token_len new quic_conn struct members to store
  the retry tokens. These objects are allocated by quic_rx_packet_parse() and
  released by quic_conn_release().
- Add <pool_head_quic_retry_token> new pool for these tokens.
- Implement quic_retry_packet_check() to check the integrity tag of these tokens
  upon RETRY packets receipt. quic_tls_generate_retry_integrity_tag() is called
  by this new function. It has been modified to pass the address where the tag
  must be generated
- Add <resend> new parameter to quic_pktns_discard(). This function is called
  to discard the packet number spaces where the already TX packets and frames are
  attached to. <resend> allows the caller to prevent this function to release
  the in flight TX packets/frames. The frames are requeued to be resent.
- Modify quic_rx_pkt_parse() to handle the RETRY packets. What must be done upon
  such packets receipt is:
  - store the retry token,
  - store the new peer SCID as the DCID of the connection. Note that the peer will
    modify again its SCID. This is why this SCID is also stored as the ODCID
    which must be matched with the peer retry_source_connection_id transport parameter,
  - discard the Initial packet number space without flagging it as discarded and
    prevent retransmissions calling qc_set_timer(),
  - modify the TLS cryptographic cipher contexts (RX/TX),
  - wakeup the I/O handler to send new Initial packets asap.
- Modify quic_transport_param_decode() to handle the retry_source_connection_id
  transport parameter as a QUIC client. Then its caller is modified to
  check this transport parameter matches with the SCID sent by the peer with
  the RETRY packet.
2025-06-26 09:48:00 +02:00
Amaury Denoyelle
06cab99a0e MINOR: mux-quic: support max bidi streams value set by the peer
Implement support for MAX_STREAMS frame. On frontend, this was mostly
useless as haproxy would never initiate new bidirectional streams.
However, this becomes necessary to control stream flow-control when
using QUIC as a client on the backend side.

Parsing of MAX_STREAMS is implemented via new qcc_recv_max_streams().
This allows to update <ms_uni>/<ms_bidi> QCC fields.

This patch is necessary to achieve QUIC backend connection reuse.
2025-06-18 17:25:27 +02:00
Amaury Denoyelle
577fa44691 BUG/MINOR: quic: work around NEW_TOKEN parsing error on backend side
NEW_TOKEN frame is never emitted by a client, hence parsing was not
tested on frontend side.

On backend side, an issue can occur, as expected token length is static,
based on the token length used internally by haproxy. This is not
sufficient for most server implementation which uses larger token. This
causes a parsing error, which may cause skipping of following frames in
the same packet. This issue was detected using ngtcp2 as server.

As for now tokens are unused by haproxy, simply discard test on token
length during NEW_TOKEN frame parsing. The token itself is merely
skipped without being stored. This is sufficient for now to continue on
experimenting with QUIC backend implementation.

This does not need to be backported.
2025-06-12 17:47:15 +02:00
Frederic Lecaille
b9703cf711 MINOR: quic-be: get rid of ->li quic_conn member
Replace ->li quic_conn pointer to struct listener member by  ->target which is
an object type enum and adapt the code.
Use __objt_(listener|server)() where the object type is known. Typically
this is were the code which is specific to one connection type (frontend/backend).
Remove <server> parameter passed to qc_new_conn(). It is redundant with the
<target> parameter.
GSO is not supported at this time for QUIC backend. qc_prep_pkts() is modified
to prevent it from building more than an MTU. This has as consequence to prevent
qc_send_ppkts() to use GSO.
ssl_clienthello.c code is run only by listeners. This is why __objt_listener()
is used in place of ->li.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
2d076178c6 MINOR: quic-be: Store asap the DCID
Store the peer connection ID (SCID) as the connection DCID as soon as an Initial
packet is received.
Stop comparing the packet to QUIC_PACKET_TYPE_0RTT is already match as
QUIC_PACKET_TYPE_INITIAL.
A QUIC server must not send too short datagram with ack-eliciting packets inside.
This cannot be done from quic_rx_pkt_parse() because one does not know if
there is ack-eliciting frame into the Initial packets. If the packet must be
dropped, this is after having parsed it!
2025-06-11 18:37:34 +02:00
Frederic Lecaille
43d88a44f1 MINOR: quic-be: Datagrams and packet parsing support
Modify quic_dgram_parse() to stop passing it a listener as third parameter.
In place the object type address of the connection socket owner is passed
to support the haproxy servers with QUIC as transport protocol.
qc_owner_obj_type() is implemented to return this address.
qc_counters() is also implemented to return the QUIC specific counters of
the proxy of owner of the connection.
quic_rx_pkt_parse() called by quic_dgram_parse() is also modify to use
the object type address used by this latter as last parameter. It is
also modified to send Retry packet only from listeners. A QUIC client
(connection to haproxy QUIC servers) must drop the Initial packets with
non null token length. It is also not supposed to receive O-RTT packets
which are dropped.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
f49bbd36b9 MINOR: quic-be: SSL sessions initializations
Modify qc_alloc_ssl_sock_ctx() to pass the connection object as parameter. It is
NULL for a QUIC listener, not NULL for a QUIC server. This connection object is
set as value for ->conn quic_conn struct member. Initialise the SSL session object from
this function for QUIC servers.
qc_ssl_set_quic_transport_params() is also modified to pass the SSL object as parameter.
This is the unique parameter this function needs. <qc> parameter is used only for
the trace.
SSL_do_handshake() must be calle as soon as the SSL object is initialized for
the QUIC backend connection. This triggers the TLS CRYPTO data delivery.
tasklet_wakeup() is also called to send asap these CRYPTO data.
Modify the QUIC_EV_CONN_NEW event trace to dump the potential errors returned by
SSL_do_handshake().
2025-06-11 18:37:34 +02:00
Amaury Denoyelle
f286288471 MINOR: quic: refactor handling of streams after MUX release
quic-conn layer has to handle itself STREAM frames after MUX release. If
the stream was already seen, it is probably only a retransmitted frame
which can be safely ignored. For other streams, an active closure may be
needed.

Thus it's necessary that quic-conn layer knows the highest stream ID
already handled by the MUX after its release. Previously, this was done
via <nb_streams> member array in quic-conn structure.

Refactor this by replacing <nb_streams> by two members called
<stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for
quic-conn layer to monitor locally opened uni streams, as the peer
cannot by definition emit a STREAM frame on it. Also, bidirectional
streams are always opened by the remote side.

Previously, <nb_streams> were set by quic-stream layer. Now,
<stream_max_uni>/<stream_max_bidi> members are only set one time, just
prior to QUIC MUX release. This is sufficient as quic-conn do not use
them if the MUX is available.

Note that previously, IDs were used relatively to their type, thus
incremented by 1, after shifting the original value. For simplification,
use the plain stream ID, which is incremented by 4.
2025-05-21 14:26:45 +02:00
Amaury Denoyelle
07d41a043c MINOR: quic: move function to check stream type in utils
Move general function to check if a stream is uni or bidirectional from
QUIC MUX to quic_utils module. This should prevent unnecessary include
of QUIC MUX header file in other sources.
2025-05-21 14:17:41 +02:00
Amaury Denoyelle
3fdb039a99 BUG/MEDIUM: quic: free stream_desc on all data acked
The following patch simplifies qc_stream_desc_ack(). The qc_stream_desc
instance is not freed anymore, even if all data were acknowledged. As
implies by the commit message, the caller is responsible to perform this
cleaning operation.
  f4a83fbb14bdd14ed94752a2280a2f40c1b690d2
  MINOR: quic: do not remove qc_stream_desc automatically on ACK handling

However, despite the commit instruction, qc_stream_desc_free()
invokation was not moved in the caller. This commit fixes this by adding
it after stream ACK handling. This is performed only when a transfer is
completed : all data is acknowledged and qc_stream_desc has been
released by its MUX stream instance counterpart.

This bug may cause a significant increase in memory usage when dealing
with long running connection. However, there is no memory leak, as every
qc_stream_desc attached to a connection are finally freed when quic_conn
instance is released.

This must be backported up to 3.1.
2025-05-09 09:25:47 +02:00
Ilia Shipitsin
27a6353ceb CLEANUP: assorted typo fixes in the code, commits and doc 2025-04-03 11:37:25 +02:00
Ilia Shipitsin
78b849b839 CLEANUP: assorted typo fixes in the code and comments
code, comments and doc actually.
2025-04-02 11:12:20 +02:00
Amaury Denoyelle
bbaa7aef7b BUG/MINOR: quic: do not increase congestion window if app limited
Previously, congestion window was increased any time each time a new
acknowledge was received. However, it did not take into account the
window filling level. In a network condition with negligible loss, this
will cause the window to be incremented until the maximum value (by
default 480k), even though the application does not have enough data to
fill it.

In most cases, this issue is not noticeable. However, it may lead to
excessive memory consumption when a QUIC connection is suddendly
interrupted, as in this case haproxy will fill the window with
retransmission. It even has caused OOM crash when thousands of clients
were interrupted at once on a local network benchmark.

Fix this by first checking window level prior to every incrementation
via a new helper function quic_cwnd_may_increase(). It was arbitrarily
decided that the window must be at least 50% full when the ACK is
handled prior to increment it. This value is a good compromise to keep
window in check while still allowing fast increment when needed.

Note that this patch only concerns cubic and newreno algorithm. BBR has
already its notion of application limited which ensures the window is
only incremented when necessary.

This should be backported up to 2.6.
2025-01-23 14:49:35 +01:00
Amaury Denoyelle
c3a4a4d166 BUG/MAJOR: quic: reject too large CRYPTO frames
Received CRYPTO frames are inserted in a ncbuf to handle out-of-order
reception via ncb_add(). They are stored on the position relative to the
frame offset, minus a base offset which corresponds to the in-order data
length already handled.

Previouly, no check was implemented on the frame offset value prior to
ncb_add(), which could easily trigger a crash if relative offset was too
large. Fix this by ensuring first that the frame can be stored in the
buffer before ncb_add() invokation. If this is not the case, connection
is closed with error CRYPTO_BUFFER_EXCEEDED, as required by QUIC
specification.

This should fix github issue #2842.

This must be backported up to 2.6.
2025-01-20 11:43:23 +01:00
Amaury Denoyelle
4a5d82a97d BUG/MINOR: quic: reject NEW_TOKEN frames from clients
As specified by RFC 9000, reject NEW_TOKEN frames emitted by clients.
Close the connection with error code PROTOCOL_VIOLATION.

This must be backported up to 2.6.
2025-01-10 14:50:59 +01:00
Frederic Lecaille
7868dc9c45 BUILD: quic: fix a build error about an non initialized timestamp
This is to please a non identified compilers which complains about an hypothetic
<time_ns> variable which would be not initialized even if this is the case only
when it is not used.

This build issue arrived with this commit:
	BUG/MINOR: improve BBR throughput on very fast links

Should be backported to 3.1 with this previous commit.
2024-11-29 14:48:37 +01:00
Frederic Lecaille
f8b697c19b BUG/MINOR: improve BBR throughput on very fast links
This patch fixes the loss of information when computing the delivery rate
(quic_cc_drs.c) on links with very low latency due to usage of 32bits
variables with the millisecond as precision.

Initialize the quic_conn task with TASK_F_WANTS_TIME flag ask it to ask
the scheduler to update the call date of this task. This allows this task to get
a nanosecond resolution on the call date calling task_mono_time(). This is enabled
only for congestion control algorithms with delivery rate estimation support
(BBR only at this time).

Store the send date with nanosecond precision of each TX packet into
->time_sent_ns new quic_tx_packet struct member to store the date a packet was
sent in nanoseconds thanks to task_mono_time().

Make use of this new timestamp by the delivery rate estimation algorithm (quic_cc_drs.c).

Rename current ->time_sent member from quic_tx_packet struct to ->time_sent_ms to
distinguish the unit used by this variable (millisecond) and update the code which
uses this variable. The logic found in quic_loss.c is not modified at all.

Must be backported to 3.1.
2024-11-28 21:39:05 +01:00
Frederic Lecaille
44af88d856 MINOR: quic: RX part modifications to support BBR
qc_notify_cc_of_newly_acked_pkts() aim is to notify the congestion algorithm
of all the packet acknowledgements. It must call quic_cc_drs_update_rate_sample()
to update the delivery rate sampling information. It must also call
quic_cc_drs_on_ack_recv() to update the state of the delivery rate sampling part
used by BBR.
Finally, ->on_ack_rcvd() is called with the total number of bytes delivered
by the sender from the newly acknowledged packets with <bytes_delivered> as
parameter to do so. <pkt_delivered> store the per-packet number of bytes
delivered by the newly sent acknowledged packet (the packet with the highest
packet number). <bytes_lost> is also used and has been set by
qc_packet_loss_lookup() before calling qc_notify_cc_of_newly_acked_pkts().
2024-11-20 17:34:22 +01:00
Amaury Denoyelle
2975e8805d BUG/MEDIUM: quic: prevent crash due to CRYPTO parsing error
A packet which contains several splitted and out of order CRYPTO frames
may be parsed multiple times to ensure it can be handled via ncbuf. Only
3 iterations can be performed to prevent excessive CPU usage.

There is a risk of crash if packet parsing is interrupted after maximum
iterations is reached, or no progress can be made on the ncbuf. This is
because <frm> may be dangling after list_for_each_entry_safe()

The crash occurs on qc_frm_free() invokation, on error path of
qc_parse_pkt_frms(). To fix it, always reset frm to NULL after
list_for_each_entry_safe() to ensure it is not dangling.

This should fix new report on github isue #2776. This regression has
been triggered by the following patch :
  1767196d5b2d8d1e557f7b3911a940000166ecda
  BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

As such, it must be backported up to 2.6, after the above patch.
2024-11-08 15:19:57 +01:00
Amaury Denoyelle
3b851a326b BUG/MEDIUM: quic: do not consider ACK on released stream as error
When an ACK is received by haproxy, a lookup is performed to retrieve
the related emitted frames. For STREAM type frames, a lookup is
performed under quic_conn stream_desc tree. Indeed, the corresponding
stream instance could be already released if multiple ACK were received
refering to the same stream offset, which can happen notably if
retransmission occured.

qc_handle_newly_acked_frm() implements this logic. If the case with an
already released stream is encounted, an error is returned. In the end,
this error is propagated via qc_parse_pkt_frms() into
qc_treat_rx_pkts(), despite being in fact a perfectly valid case. Fix
this by adjusting ACK handling function to return a success value for
the particular case of released stream instead.

The impact of this bug is unknown, but it can have several consequences.
* if the packet with the ACK contains other frames after it, their
  content will be skipped
* the packet won't be acknowledged by haproxy, even if it contains other
  frames and is ack-eliciting. This may cause unneeded retransmission by
  the client.
* RTT sampling information related to this ACK is ignored by haproxy

Finally, it also caused the increment of the quic_conn counter
dropped_parsing (droppars in "show quic" output) which should be
reserved only for real error cases.

This regression is present since the following patch :
  e7578084b0536e3e5988be7f09091c85beb8fa9d
  MINOR: quic: implement dedicated type for out-of-order stream ACK

Before, qc_handle_newly_acked_frm() return type was always ignored. As
such, no backport is needed.
2024-11-06 17:37:44 +01:00
Amaury Denoyelle
1767196d5b BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO
A ClientHello may be splitted accross several different CRYPTO frames,
then mixed in a single QUIC packet. This is used notably by clients such
as chrome to render the first Initial packet opaque to middleboxes.

Each packet frame is handled sequentially. Out-of-order CRYPTO frames
are buffered in a ncbuf, until gaps are filled and data is transferred
to the SSL stack. If CRYPTO frames are heavily splitted with small
fragments, buffering may fail as ncbuf does not support small gaps. This
causes the whole packet to be rejected and unacknowledged. It could be
solved if the client reemits its ClientHello after remixing its CRYPTO
frames.

This patch is written to improve CRYPTO frame parsing. Each CRYPTO
frames which cannot be buffered due to ncbuf limitation are now stored
in a temporary list. Packet parsing is completed until all frames have
been handled. If temporary list is not empty, reparsing is done on the
stored frames. With the newly buffered CRYPTO frames, ncbuf insert
operation may this time succeeds if the frame now covers a whole gap.
Reparsing will loop until either no progress can be made or it has been
done at least 3 times, to prevent CPU utilization.

This patch should fix github issue #2776.

This should be backported up to 2.6, after a period of observation. Note
that it relies on the following refactor patches :
  MINOR: quic: extend return value of CRYPTO parsing
  MINOR: quic: use dynamically allocated frame on parsing
  MINOR: quic: simplify qc_parse_pkt_frms() return path
2024-11-06 14:29:14 +01:00