107 Commits

Author SHA1 Message Date
Amaury Denoyelle
b9809fe0d0 MINOR: quic: remove <mux_state> field
This patch removes <mux_state> field from quic_conn structure. The
purpose of this field was to indicate if MUX layer above quic_conn is
not yet initialized, active, or already released.

It became tedious to properly set it as initialization order of the
various quic_conn/conn/MUX layers now differ between the frontend and
backend sides, and also depending if 0-RTT is used or not. Recently, a
new change introduced in connect_server() will allow to initialize QUIC
MUX earlier if ALPN is cached on the server structure. This had another
level of complexity.

Thus, this patch removes <mux_state> field completely. Instead, a new
flag QUIC_FL_CONN_XPRT_CLOSED is defined. It is set at a single place
only on close XPRT callback invokation. It can be mixed with the new
utility functions qc_wait_for_conn()/qc_is_conn_ready() to determine the
status of conn/MUX layers now without an extra quic_conn field.
2025-11-05 14:03:34 +01:00
Amaury Denoyelle
efe60745b3 MINOR: quic: remove connection arg from qc_new_conn()
This patch is similar to the previous one, this time dealing with
qc_new_conn(). This function was asymetric on frontend and backend side,
as connection argument was set only in the latter case.

This was required prior due to qc_alloc_ssl_sock_ctx() signature. This
has changed with the previous patch, thus qc_new_conn() can also be
realigned on both FE and BE sides. <conn> member of quic_conn instance
is always set outside it, in qc_xprt_start() on the backend case.
2025-11-04 17:47:42 +01:00
Amaury Denoyelle
a14c6cee17 MINOR: quic: rename retry-threshold setting
A QUIC global tune setting is defined to be able to force Retry emission
prior to handshake. By definition, this ability is only supported by
QUIC servers, hence it is a frontend option only.

Rename the option to use "fe" prefix. The old option name is deprecated
and will be removed in 3.5
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
33a8cb87a9 MINOR: quic: split congestion controler options for FE/BE usage
Various settings can be configured related to QUIC congestion controler.
This patch duplicates them to be able to set independent values on
frontend and backend sides.

As with previous patch, option are renamed to use "fe/be" unified
prefixes. This is part of the current serie of commits which unify QUIC
settings. Older options are deprecated and will be removed on 3.5
release.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
f50425c021 MINOR: quic: remove received CRYPTO temporary tree storage
The previous commit switch from ncbuf to ncbmbuf as storage for received
CRYPTO frames. The latter ensures that buffering of such frames cannot
fail anymore due to gaps size.

Previously, extra mechanism were implemented on QUIC frames parsing
function to overcome the limitation of ncbuf on gaps size. Before
insertion, CRYPTO frames were stored in a temporary tree to order their
insertion. As this is not necessary anymore, this commit removes the
temporary tree insertion.

This commit is closely associated to the previous bug fix. As it
provides a neat optimization and code simplication, it can be backported
with it, but not in the next immediate release to spot potential
regression.
2025-10-22 15:24:02 +02:00
Amaury Denoyelle
4c11206395 BUG/MAJOR: quic: use ncbmbuf for CRYPTO handling
In QUIC, TLS handshake messages such as ClientHello are encapsulated in
CRYPTO frames. Each QUIC implementation can split the content in several
frames of random sizes. In fact, this feature is now used by several
clients, based on chrome so-called "Chaos protection" mechanism :

https://quiche.googlesource.com/quiche/+/cb6b51054274cb2c939264faf34a1776e0a5bab7

To support this, haproxy uses a ncbuf storage to store received CRYPTO
frames before passing it to the SSL library. However, this storage
suffers from a limitation as gaps between two filled blocks cannot be
smaller than 8 bytes. Thus, depending on the size of received CRYPTO
frames and their order, ncbuf may not be sufficient. Over time, several
mechanisms were implemented in haproxy QUIC frames parsing to overcome
the ncbuf limitation.

However, reports recently highlight that with some clients haproxy is
not able to deal with CRYPTO frames reception. In particular, this is
the case with the latest ngtcp2 release, which implements a similar
chaos protection mechanism via the following patch. It also seems that
this impacts haproxy interaction with firefox.

commit 89c29fd8611d5e6d2f6b1f475c5e3494c376028c
Author: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Date:   Mon Aug 4 22:48:06 2025 +0900

    Crumble Client Initial CRYPTO (aka chaos protection)

To fix haproxy CRYPTO frames buffering once and for all, an alternative
non-contiguous buffer named ncbmbuf has been recently implemented. This
type does not suffer from gaps size limitation, albeit at the cost of a
small reduction in the size available for data storage.

Thus, the purpose of this current patch is to replace ncbuf with the
newer ncbmbuf for QUIC CRYPTO frames parsing. Now, ncbmb_add() is used
to buffer received frames which is guaranteed to suceed. The only
remaining case of error is if a received frame offset and length exceed
the ncbmbuf data storage, which would result in a CRYPTO_BUFFER_EXCEEDED
error code.

A notable behavior change when switching to ncbmbuf implementation is
that NCB_ADD_COMPARE mode cannot be used anymore during add. Instead,
crypto frame content received at a similar offset will be overwritten.

A final note regarding STREAM frames parsing. For now, it is considered
unnecessary to switch from ncbuf in this case. Indeed, QUIC clients does
not perform aggressive fragmentation for them. Keeping ncbuf ensure that
the data storage size is bigger than the equivalent ncbmbuf area.

This should fix github issue #3141.

This patch must be backported up to 2.6. It is first necessary to pick
the relevant commits for ncbmbuf implementation prior to it.
2025-10-22 15:04:41 +02:00
Frederic Lecaille
47bb15ca84 MINOR: quic: get rid of ->target quic_conn struct member
The ->li (struct listener *) member of quic_conn struct was replaced by a
->target (struct obj_type *) member by this commit:

    MINOR: quic-be: get rid of ->li quic_conn member

to abstract the connection type (front or back) when implementing QUIC for the
backends. In these cases, ->target was a pointer to the ojb_type of a server
struct. This could not work with the dynamic servers contrary to the listeners
which are not dynamic.

This patch almost reverts the one mentioned above. ->target pointer to obj_type member
is replaced by ->li pointer to listener struct member. As the listener are not
dynamic, this is easy to do this. All one has to do is to replace the
objt_listener(qc->target) statement by qc->li where applicable.

For the backend connection, when needed, this is always qc->conn->target which is
used only when qc->conn is initialized. The only "problematic" case is for
quic_dgram_parse() which takes a pointer to an obj_type as third argument.
But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function
it is used to access the proxy counters of the connection thanks to qc_counters().
So, this obj_type argument may be null for now on with this patch. This is the
reason why qc_counters() is modified to take this into consideration.
2025-09-11 09:51:28 +02:00
Frederic Lecaille
58b153b882 MINOR: quic: Add more information about RX packets
This patch is very useful to debug issues at RX packet processing level.

Should be easily backported as far as 2.6 (for debug purposes).
2025-09-03 09:41:38 +02:00
Frederic Lecaille
fba80c7fe8 BUG/MINOR: quic: ignore AGAIN ncbuf err when parsing CRYPTO frames
This fix follows this previous one:

    BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

which is not sufficient when a client fragments and mixes its CRYPTO frames AND
leaveswith holes by packets. ngtcp2 (and perhaps chrome) splits theire CRYPTO
frames but without hole by packet. In such a case, the CRYPTO parsing leads to
QUIC_RX_RET_FRM_AGAIN errors which cannot be fixed when the peer resends its packets.
Indeed, even if the peer resends its frames in a different order, this does not
help because since the previous commit, the CRYPTO frames are ordered on haproxy side.

This issue was detected thanks to the interopt tests with quic-go as client. This
client fragments its CRYPTO frames, mixes them, and generate holes, and most of
the times with the retry test.

To fix this, when a QUIC_RX_RET_FRM_AGAIN error is encountered, the CRYPTO frames
parsing is not stop. This leaves chances to the next CRYPTO frames to be parsed.

Must be backported as far as 2.6 as the commit mentioned above.
2025-09-02 08:13:58 +02:00
Frederic Lecaille
800ba73a9c BUG/MEDIUM: quic: CRYPTO frame freeing without eb_delete()
Since this commit:

	BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

when they are parsed, the CRYPTO frames are ordered by their offsets into an ebtree.
Then their data are provided to the ncbufs.

But in case of error, when qc_handle_crypto_frm() returns QUIC_RX_RET_FRM_FATAL
or QUIC_RX_RET_FRM_AGAIN), they remain attached to their tree. Then
from <err> label, they are deteleted and deleted (with a while(node) { eb_delete();
qc_frm_free();} loop). But before this loop, these statements directly
free the frame without deleting it from its tree, if this is a CRYPTO frame,
leading to a use after free when running the loop:

     if (frm)
	    qc_frm_free(qc, &frm);

This issue was detected by the interop tests, with quic-go as client. Weirdly, this
client sends CRYPTO frames by packet with holes.
Must be backported as far as 2.6 as the commit mentioned above.
2025-09-01 10:39:00 +02:00
Frederic Lecaille
90126ec9b7 CLEANUP: quic: remove a useless CRYPTO frame variable assignment
This modification should have arrived with this commit:

	MINOR: quic: remove ->offset qf_crypto struct field

Since this commit, the CRYPTO offset node key assignment is done at parsing time
when calling qc_parse_frm() from qc_parse_pkt_frms().

This useless assigment has been reported in GH #3095 by coverity.

This patch should be easily backported as far as 2.6 as the one mentioned above
to ease any further backport to come.
2025-09-01 09:31:04 +02:00
Frederic Lecaille
31c17ad837 MINOR: quic: remove ->offset qf_crypto struct field
This patch follows this previous bug fix:

    BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

where a ebtree node has been added to qf_crypto struct. It has the same
meaning and type as ->offset_node.key field with ->offset_node an eb64tree node.
This patch simply removes ->offset which is no more useful.

This patch should be easily backported as far as 2.6 as the one mentioned above
to ease any further backport to come.
2025-08-28 08:19:34 +02:00
Frederic Lecaille
d753f24096 BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets
This issue impacts the QUIC listeners. It is the same as the one fixed by this
commit:

	BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

As chrome, ngtcp2 client decided to fragment its CRYPTO frames but in a much
more agressive way. This could be fixed with a list local to qc_parse_pkt_frms()
to please chrome thanks to the commit above. But this is not sufficient for
ngtcp2 which often splits its ClientHello message into more than 10 fragments
with very small ones. This leads the packet parser to interrupt the CRYPTO frames
parsing due to the ncbuf gap size limit.

To fix this, this patch approximatively proceeds the same way but with an
ebtree to reorder the CRYPTO by their offsets. These frames are directly
inserted into a local ebtree. Then this ebtree is reused to provide the
reordered CRYPTO data to the underlying ncbuf (non contiguous buffer). This way
there are very few less chances for the ncbufs used to store CRYPTO data
to reach a too much fragmented state.

Must be backported as far as 2.6.
2025-08-27 16:14:19 +02:00
Willy Tarreau
c264ea1679 MEDIUM: tree-wide: replace most DECLARE_POOL with DECLARE_TYPED_POOL
This will make the pools size and alignment automatically inherit
the type declaration. It was done like this:

   sed -i -e 's:DECLARE_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_POOL src addons)
   sed -i -e 's:DECLARE_STATIC_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_STATIC_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_STATIC_POOL src addons)

81 replacements were made. The only remaining ones are those which set
their own size without depending on a structure. The few ones with an
extra size were manually handled.

It also means that the requested alignments are now checked against the
type's. Given that none is specified for now, no issue is reported.

It was verified with "show pools detailed" that the definitions are
exactly the same, and that the binaries are similar.
2025-08-11 19:55:30 +02:00
Amaury Denoyelle
731b52ded9 MINOR: quic: prefer qc_is_back() usage over qc->target
Previously quic_conn <target> member was used to determine if quic_conn
was used on the frontend (as server) or backend side (as client). A new
helper function can now be used to directly check flag
QUIC_FL_CONN_IS_BACK.

This reduces the dependency between quic_conn and their relative
listener/server instances.
2025-08-07 16:59:59 +02:00
Frederic Lecaille
838024e07e MINOR: quic: Get rid of qc_is_listener()
Replace all calls to qc_is_listener() (resp. !qc_is_listener()) by calls to
objt_listener() (resp. objt_server()).
Remove qc_is_listener() implement and QUIC_FL_CONN_LISTENER the flag it
relied on.
2025-07-16 16:42:21 +02:00
Frederic Lecaille
194e3bc2d5 MINOR: quic-be: address validation support implementation (RETRY)
- Add ->retry_token and ->retry_token_len new quic_conn struct members to store
  the retry tokens. These objects are allocated by quic_rx_packet_parse() and
  released by quic_conn_release().
- Add <pool_head_quic_retry_token> new pool for these tokens.
- Implement quic_retry_packet_check() to check the integrity tag of these tokens
  upon RETRY packets receipt. quic_tls_generate_retry_integrity_tag() is called
  by this new function. It has been modified to pass the address where the tag
  must be generated
- Add <resend> new parameter to quic_pktns_discard(). This function is called
  to discard the packet number spaces where the already TX packets and frames are
  attached to. <resend> allows the caller to prevent this function to release
  the in flight TX packets/frames. The frames are requeued to be resent.
- Modify quic_rx_pkt_parse() to handle the RETRY packets. What must be done upon
  such packets receipt is:
  - store the retry token,
  - store the new peer SCID as the DCID of the connection. Note that the peer will
    modify again its SCID. This is why this SCID is also stored as the ODCID
    which must be matched with the peer retry_source_connection_id transport parameter,
  - discard the Initial packet number space without flagging it as discarded and
    prevent retransmissions calling qc_set_timer(),
  - modify the TLS cryptographic cipher contexts (RX/TX),
  - wakeup the I/O handler to send new Initial packets asap.
- Modify quic_transport_param_decode() to handle the retry_source_connection_id
  transport parameter as a QUIC client. Then its caller is modified to
  check this transport parameter matches with the SCID sent by the peer with
  the RETRY packet.
2025-06-26 09:48:00 +02:00
Amaury Denoyelle
06cab99a0e MINOR: mux-quic: support max bidi streams value set by the peer
Implement support for MAX_STREAMS frame. On frontend, this was mostly
useless as haproxy would never initiate new bidirectional streams.
However, this becomes necessary to control stream flow-control when
using QUIC as a client on the backend side.

Parsing of MAX_STREAMS is implemented via new qcc_recv_max_streams().
This allows to update <ms_uni>/<ms_bidi> QCC fields.

This patch is necessary to achieve QUIC backend connection reuse.
2025-06-18 17:25:27 +02:00
Amaury Denoyelle
577fa44691 BUG/MINOR: quic: work around NEW_TOKEN parsing error on backend side
NEW_TOKEN frame is never emitted by a client, hence parsing was not
tested on frontend side.

On backend side, an issue can occur, as expected token length is static,
based on the token length used internally by haproxy. This is not
sufficient for most server implementation which uses larger token. This
causes a parsing error, which may cause skipping of following frames in
the same packet. This issue was detected using ngtcp2 as server.

As for now tokens are unused by haproxy, simply discard test on token
length during NEW_TOKEN frame parsing. The token itself is merely
skipped without being stored. This is sufficient for now to continue on
experimenting with QUIC backend implementation.

This does not need to be backported.
2025-06-12 17:47:15 +02:00
Frederic Lecaille
b9703cf711 MINOR: quic-be: get rid of ->li quic_conn member
Replace ->li quic_conn pointer to struct listener member by  ->target which is
an object type enum and adapt the code.
Use __objt_(listener|server)() where the object type is known. Typically
this is were the code which is specific to one connection type (frontend/backend).
Remove <server> parameter passed to qc_new_conn(). It is redundant with the
<target> parameter.
GSO is not supported at this time for QUIC backend. qc_prep_pkts() is modified
to prevent it from building more than an MTU. This has as consequence to prevent
qc_send_ppkts() to use GSO.
ssl_clienthello.c code is run only by listeners. This is why __objt_listener()
is used in place of ->li.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
2d076178c6 MINOR: quic-be: Store asap the DCID
Store the peer connection ID (SCID) as the connection DCID as soon as an Initial
packet is received.
Stop comparing the packet to QUIC_PACKET_TYPE_0RTT is already match as
QUIC_PACKET_TYPE_INITIAL.
A QUIC server must not send too short datagram with ack-eliciting packets inside.
This cannot be done from quic_rx_pkt_parse() because one does not know if
there is ack-eliciting frame into the Initial packets. If the packet must be
dropped, this is after having parsed it!
2025-06-11 18:37:34 +02:00
Frederic Lecaille
43d88a44f1 MINOR: quic-be: Datagrams and packet parsing support
Modify quic_dgram_parse() to stop passing it a listener as third parameter.
In place the object type address of the connection socket owner is passed
to support the haproxy servers with QUIC as transport protocol.
qc_owner_obj_type() is implemented to return this address.
qc_counters() is also implemented to return the QUIC specific counters of
the proxy of owner of the connection.
quic_rx_pkt_parse() called by quic_dgram_parse() is also modify to use
the object type address used by this latter as last parameter. It is
also modified to send Retry packet only from listeners. A QUIC client
(connection to haproxy QUIC servers) must drop the Initial packets with
non null token length. It is also not supposed to receive O-RTT packets
which are dropped.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
f49bbd36b9 MINOR: quic-be: SSL sessions initializations
Modify qc_alloc_ssl_sock_ctx() to pass the connection object as parameter. It is
NULL for a QUIC listener, not NULL for a QUIC server. This connection object is
set as value for ->conn quic_conn struct member. Initialise the SSL session object from
this function for QUIC servers.
qc_ssl_set_quic_transport_params() is also modified to pass the SSL object as parameter.
This is the unique parameter this function needs. <qc> parameter is used only for
the trace.
SSL_do_handshake() must be calle as soon as the SSL object is initialized for
the QUIC backend connection. This triggers the TLS CRYPTO data delivery.
tasklet_wakeup() is also called to send asap these CRYPTO data.
Modify the QUIC_EV_CONN_NEW event trace to dump the potential errors returned by
SSL_do_handshake().
2025-06-11 18:37:34 +02:00
Amaury Denoyelle
f286288471 MINOR: quic: refactor handling of streams after MUX release
quic-conn layer has to handle itself STREAM frames after MUX release. If
the stream was already seen, it is probably only a retransmitted frame
which can be safely ignored. For other streams, an active closure may be
needed.

Thus it's necessary that quic-conn layer knows the highest stream ID
already handled by the MUX after its release. Previously, this was done
via <nb_streams> member array in quic-conn structure.

Refactor this by replacing <nb_streams> by two members called
<stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for
quic-conn layer to monitor locally opened uni streams, as the peer
cannot by definition emit a STREAM frame on it. Also, bidirectional
streams are always opened by the remote side.

Previously, <nb_streams> were set by quic-stream layer. Now,
<stream_max_uni>/<stream_max_bidi> members are only set one time, just
prior to QUIC MUX release. This is sufficient as quic-conn do not use
them if the MUX is available.

Note that previously, IDs were used relatively to their type, thus
incremented by 1, after shifting the original value. For simplification,
use the plain stream ID, which is incremented by 4.
2025-05-21 14:26:45 +02:00
Amaury Denoyelle
07d41a043c MINOR: quic: move function to check stream type in utils
Move general function to check if a stream is uni or bidirectional from
QUIC MUX to quic_utils module. This should prevent unnecessary include
of QUIC MUX header file in other sources.
2025-05-21 14:17:41 +02:00
Amaury Denoyelle
3fdb039a99 BUG/MEDIUM: quic: free stream_desc on all data acked
The following patch simplifies qc_stream_desc_ack(). The qc_stream_desc
instance is not freed anymore, even if all data were acknowledged. As
implies by the commit message, the caller is responsible to perform this
cleaning operation.
  f4a83fbb14bdd14ed94752a2280a2f40c1b690d2
  MINOR: quic: do not remove qc_stream_desc automatically on ACK handling

However, despite the commit instruction, qc_stream_desc_free()
invokation was not moved in the caller. This commit fixes this by adding
it after stream ACK handling. This is performed only when a transfer is
completed : all data is acknowledged and qc_stream_desc has been
released by its MUX stream instance counterpart.

This bug may cause a significant increase in memory usage when dealing
with long running connection. However, there is no memory leak, as every
qc_stream_desc attached to a connection are finally freed when quic_conn
instance is released.

This must be backported up to 3.1.
2025-05-09 09:25:47 +02:00
Ilia Shipitsin
27a6353ceb CLEANUP: assorted typo fixes in the code, commits and doc 2025-04-03 11:37:25 +02:00
Ilia Shipitsin
78b849b839 CLEANUP: assorted typo fixes in the code and comments
code, comments and doc actually.
2025-04-02 11:12:20 +02:00
Amaury Denoyelle
bbaa7aef7b BUG/MINOR: quic: do not increase congestion window if app limited
Previously, congestion window was increased any time each time a new
acknowledge was received. However, it did not take into account the
window filling level. In a network condition with negligible loss, this
will cause the window to be incremented until the maximum value (by
default 480k), even though the application does not have enough data to
fill it.

In most cases, this issue is not noticeable. However, it may lead to
excessive memory consumption when a QUIC connection is suddendly
interrupted, as in this case haproxy will fill the window with
retransmission. It even has caused OOM crash when thousands of clients
were interrupted at once on a local network benchmark.

Fix this by first checking window level prior to every incrementation
via a new helper function quic_cwnd_may_increase(). It was arbitrarily
decided that the window must be at least 50% full when the ACK is
handled prior to increment it. This value is a good compromise to keep
window in check while still allowing fast increment when needed.

Note that this patch only concerns cubic and newreno algorithm. BBR has
already its notion of application limited which ensures the window is
only incremented when necessary.

This should be backported up to 2.6.
2025-01-23 14:49:35 +01:00
Amaury Denoyelle
c3a4a4d166 BUG/MAJOR: quic: reject too large CRYPTO frames
Received CRYPTO frames are inserted in a ncbuf to handle out-of-order
reception via ncb_add(). They are stored on the position relative to the
frame offset, minus a base offset which corresponds to the in-order data
length already handled.

Previouly, no check was implemented on the frame offset value prior to
ncb_add(), which could easily trigger a crash if relative offset was too
large. Fix this by ensuring first that the frame can be stored in the
buffer before ncb_add() invokation. If this is not the case, connection
is closed with error CRYPTO_BUFFER_EXCEEDED, as required by QUIC
specification.

This should fix github issue #2842.

This must be backported up to 2.6.
2025-01-20 11:43:23 +01:00
Amaury Denoyelle
4a5d82a97d BUG/MINOR: quic: reject NEW_TOKEN frames from clients
As specified by RFC 9000, reject NEW_TOKEN frames emitted by clients.
Close the connection with error code PROTOCOL_VIOLATION.

This must be backported up to 2.6.
2025-01-10 14:50:59 +01:00
Frederic Lecaille
7868dc9c45 BUILD: quic: fix a build error about an non initialized timestamp
This is to please a non identified compilers which complains about an hypothetic
<time_ns> variable which would be not initialized even if this is the case only
when it is not used.

This build issue arrived with this commit:
	BUG/MINOR: improve BBR throughput on very fast links

Should be backported to 3.1 with this previous commit.
2024-11-29 14:48:37 +01:00
Frederic Lecaille
f8b697c19b BUG/MINOR: improve BBR throughput on very fast links
This patch fixes the loss of information when computing the delivery rate
(quic_cc_drs.c) on links with very low latency due to usage of 32bits
variables with the millisecond as precision.

Initialize the quic_conn task with TASK_F_WANTS_TIME flag ask it to ask
the scheduler to update the call date of this task. This allows this task to get
a nanosecond resolution on the call date calling task_mono_time(). This is enabled
only for congestion control algorithms with delivery rate estimation support
(BBR only at this time).

Store the send date with nanosecond precision of each TX packet into
->time_sent_ns new quic_tx_packet struct member to store the date a packet was
sent in nanoseconds thanks to task_mono_time().

Make use of this new timestamp by the delivery rate estimation algorithm (quic_cc_drs.c).

Rename current ->time_sent member from quic_tx_packet struct to ->time_sent_ms to
distinguish the unit used by this variable (millisecond) and update the code which
uses this variable. The logic found in quic_loss.c is not modified at all.

Must be backported to 3.1.
2024-11-28 21:39:05 +01:00
Frederic Lecaille
44af88d856 MINOR: quic: RX part modifications to support BBR
qc_notify_cc_of_newly_acked_pkts() aim is to notify the congestion algorithm
of all the packet acknowledgements. It must call quic_cc_drs_update_rate_sample()
to update the delivery rate sampling information. It must also call
quic_cc_drs_on_ack_recv() to update the state of the delivery rate sampling part
used by BBR.
Finally, ->on_ack_rcvd() is called with the total number of bytes delivered
by the sender from the newly acknowledged packets with <bytes_delivered> as
parameter to do so. <pkt_delivered> store the per-packet number of bytes
delivered by the newly sent acknowledged packet (the packet with the highest
packet number). <bytes_lost> is also used and has been set by
qc_packet_loss_lookup() before calling qc_notify_cc_of_newly_acked_pkts().
2024-11-20 17:34:22 +01:00
Amaury Denoyelle
2975e8805d BUG/MEDIUM: quic: prevent crash due to CRYPTO parsing error
A packet which contains several splitted and out of order CRYPTO frames
may be parsed multiple times to ensure it can be handled via ncbuf. Only
3 iterations can be performed to prevent excessive CPU usage.

There is a risk of crash if packet parsing is interrupted after maximum
iterations is reached, or no progress can be made on the ncbuf. This is
because <frm> may be dangling after list_for_each_entry_safe()

The crash occurs on qc_frm_free() invokation, on error path of
qc_parse_pkt_frms(). To fix it, always reset frm to NULL after
list_for_each_entry_safe() to ensure it is not dangling.

This should fix new report on github isue #2776. This regression has
been triggered by the following patch :
  1767196d5b2d8d1e557f7b3911a940000166ecda
  BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

As such, it must be backported up to 2.6, after the above patch.
2024-11-08 15:19:57 +01:00
Amaury Denoyelle
3b851a326b BUG/MEDIUM: quic: do not consider ACK on released stream as error
When an ACK is received by haproxy, a lookup is performed to retrieve
the related emitted frames. For STREAM type frames, a lookup is
performed under quic_conn stream_desc tree. Indeed, the corresponding
stream instance could be already released if multiple ACK were received
refering to the same stream offset, which can happen notably if
retransmission occured.

qc_handle_newly_acked_frm() implements this logic. If the case with an
already released stream is encounted, an error is returned. In the end,
this error is propagated via qc_parse_pkt_frms() into
qc_treat_rx_pkts(), despite being in fact a perfectly valid case. Fix
this by adjusting ACK handling function to return a success value for
the particular case of released stream instead.

The impact of this bug is unknown, but it can have several consequences.
* if the packet with the ACK contains other frames after it, their
  content will be skipped
* the packet won't be acknowledged by haproxy, even if it contains other
  frames and is ack-eliciting. This may cause unneeded retransmission by
  the client.
* RTT sampling information related to this ACK is ignored by haproxy

Finally, it also caused the increment of the quic_conn counter
dropped_parsing (droppars in "show quic" output) which should be
reserved only for real error cases.

This regression is present since the following patch :
  e7578084b0536e3e5988be7f09091c85beb8fa9d
  MINOR: quic: implement dedicated type for out-of-order stream ACK

Before, qc_handle_newly_acked_frm() return type was always ignored. As
such, no backport is needed.
2024-11-06 17:37:44 +01:00
Amaury Denoyelle
1767196d5b BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO
A ClientHello may be splitted accross several different CRYPTO frames,
then mixed in a single QUIC packet. This is used notably by clients such
as chrome to render the first Initial packet opaque to middleboxes.

Each packet frame is handled sequentially. Out-of-order CRYPTO frames
are buffered in a ncbuf, until gaps are filled and data is transferred
to the SSL stack. If CRYPTO frames are heavily splitted with small
fragments, buffering may fail as ncbuf does not support small gaps. This
causes the whole packet to be rejected and unacknowledged. It could be
solved if the client reemits its ClientHello after remixing its CRYPTO
frames.

This patch is written to improve CRYPTO frame parsing. Each CRYPTO
frames which cannot be buffered due to ncbuf limitation are now stored
in a temporary list. Packet parsing is completed until all frames have
been handled. If temporary list is not empty, reparsing is done on the
stored frames. With the newly buffered CRYPTO frames, ncbuf insert
operation may this time succeeds if the frame now covers a whole gap.
Reparsing will loop until either no progress can be made or it has been
done at least 3 times, to prevent CPU utilization.

This patch should fix github issue #2776.

This should be backported up to 2.6, after a period of observation. Note
that it relies on the following refactor patches :
  MINOR: quic: extend return value of CRYPTO parsing
  MINOR: quic: use dynamically allocated frame on parsing
  MINOR: quic: simplify qc_parse_pkt_frms() return path
2024-11-06 14:29:14 +01:00
Amaury Denoyelle
d65e782c8c MINOR: quic: extend return value of CRYPTO parsing
qc_handle_crypto_frm() is the function used to handled a newly received
CRYPTO frame. Change its API to use a newly dedicated return type. This
allows to report if the frame was properly handled, ignored if already
parsed previously or rejected after a fatal error.

This commit does not have any functional changes. However, it allows to
simplify qc_handle_crypto_frm() API by removing <fast_retrans> as output
parameter. Also, this patch will be necessary to support multiple
iteration of packet parsing for CRYPTO frames.
2024-11-06 14:28:14 +01:00
Amaury Denoyelle
190fc97606 MINOR: quic: use dynamically allocated frame on parsing
qc_parse_pkt_frms() is the function responsible to parse a received QUIC
packet. Payload is decoded and splitted into individual frames which are
then handled individually. Previously, frame was used as locally stack
allocated. Change this to work on a dynamically allocated frame.

This commit does bring any functional changes. However, it will be
useful to extend packet parsing. In particular, it will be necessary to
save some frames during parsing to reparse them after the others.
2024-11-06 14:28:14 +01:00
Amaury Denoyelle
498a99a849 MINOR: quic: simplify qc_parse_pkt_frms() return path
Change qc_parse_pkt_frms() return path for normal and error cases. Most
notably, it allows to remove local variable ret as now return value is
hardcoded on normal and err label. This also allows to define a
different trace for error leaving code.
2024-11-06 14:28:14 +01:00
Amaury Denoyelle
e7578084b0 MINOR: quic: implement dedicated type for out-of-order stream ACK
QUIC streamdesc layer is responsible to handle reception of ACK for
streams. It removes stream data from the underlying buffers on ACK
reception.

Streamdesc layer treats ACK in order at the stream level. Out of order
ACKs are buffered in a tree until they can be handled on older data
acknowledgement reception. Previously, qf_stream instance which comes
from the quic_tx_packet was used as tree node to buffer such ranges.

Introduce a new type dedicated to represent out of order stream ack data
range. This type is named qc_stream_ack. It contains minimal infos only
relative to the acknowledged stream data range.

This allows to reduce size of frequently used quic_frame with the
removal of tree node from qf_stream. Another side effect of this change
is that now quic_frame are always released immediately on ACK reception,
both in-order and out-of-order. This allows to also release the
quic_tx_packet instance which should reduce memory consumption.

The drawback of this change is that qc_stream_ack instance must be
allocated on out-of-order ACK reception. As such, qc_stream_desc_ack()
may fail if an error happens on allocation. For the moment, such error
is silenly recovered up to qc_treat_rx_pkts() with the dropping of the
received packet containing the ACK frame. In the future, it may be
useful to close the connection as this error may only happens on low
memory usage.
2024-10-04 17:56:45 +02:00
Amaury Denoyelle
8d68717a41 MEDIUM: quic: refactor buffered STREAM ACK consuming
For the moment, streamdesc layer can only deal with in-order ACK at the
stream level. Received out-of-order ACKs are buffered in a tree attached
to a streambuf instance.

Previously, caller of qc_stream_desc_ack() was responsible to implement
consumption of these buffered ACKs. Refactor this by implementing it
directly at the streamdesc layer within qc_stream_desc_ack(). This
simplifies quic_rx ACK handling and ensure buffered ACKs are consumed as
soon as possible.
2024-10-01 16:22:23 +02:00
Amaury Denoyelle
cc4384aeb7 MEDIUM: quic: handle out-of-order ACK at streamdesc layer
qc_stream_desc_ack() is the entrypoint for streamdesc layer to handle a
new acknowledgement of previously emitted STREAM data.

Previously, it was only able to deal with in-order ACK offset. The
caller was responsible to buffer out-of-order ACKs. Change this by
dealing with the latter case directly in qc_stream_desc_ack(). This
notably simplify ACK handling in quic_rx module.
2024-10-01 16:22:20 +02:00
Amaury Denoyelle
62558a9285 MINOR: quic: move buffered ACK to streambuf
QUIC streamdesc layer is used to manage QUIC MUX stream txbuf data
storage until acknowledgment. Currently, it only supports in-order
acknowledgment at the stream level. This requires to be able to buffer
out-of-order ACKs until they can be handled.

Previously, these ACKs were stored in a tree to the streamdesc instance.
Move this indexed storage at the streambuf instance.

This commit is purely an architecture change. However, it will allow to
extend ACK management in future patches, such as the ability to merge
overlapping out-of-order ACKs.
2024-10-01 16:19:42 +02:00
Amaury Denoyelle
f4a83fbb14 MINOR: quic: do not remove qc_stream_desc automatically on ACK handling
qc_stream_desc_ack() is used to handle ACK received for STREAM frame. It
removes acknowledged data from their underlying buffer.

If all data were removed after ACK handling, qc_stream_desc instance
would automatically be freed at the end of qc_stream_desc_ack().
However, this renders the function complicated to use. Simplify this by
removing this automatic removal. Now, caller is responsible to check
after ACK handling if qc_stream_desc instance can be removed. This is
easily done using qc_stream_desc_done() helper.
2024-10-01 16:19:25 +02:00
Frederic Lecaille
f627b9272b BUG/MEDIUM: quic: always validate sender address on 0-RTT
It has been reported by Wedl Michael, a student at the University of Applied
Sciences St. Poelten, a potential vulnerability into haproxy as described below.

An attacker could have obtained a TLS session ticket after having established
a connection to an haproxy QUIC listener, using its real IP address. The
attacker has not even to send a application level request (HTTP3). Then
the attacker could open a 0-RTT session with a spoofed IP address
trusted by the QUIC listen to bypass IP allow/block list and send HTTP3 requests.

To mitigate this vulnerability, one decided to use a token which can be provided
to the client each time it successfully managed to connect to haproxy. These
tokens may be reused for future connections to validate the address/path of the
remote peer as this is done with the Retry token which is used for the current
connection, not the next one. Such tokens are transported by NEW_TOKEN frames
which was not used at this time by haproxy.

So, each time a client connect to an haproxy QUIC listener with 0-RTT
enabled, it is provided with such a token which can be reused for the
next 0-RTT session. If no such a token is presented by the client,
haproxy checks if the session is a 0-RTT one, so with early-data presented
by the client. Contrary to the Retry token, the decision to refuse the
connection is made only when the TLS stack has been provided with
enough early-data from the Initial ClientHello TLS message and when
these data have been accepted. Hopefully, this event arrives fast enough
to allow haproxy to kill the connection if some early-data have been accepted
without token presented by the client.

quic_build_post_handshake_frames() has been modified to build a NEW_TOKEN
frame with this newly implemented token to be transported inside.

quic_tls_derive_retry_token_secret() was renamed to quic_do_tls_derive_token_secre()
and modified to be reused and derive the secret for the new token implementation.

quic_token_validate() has been implemented to validate both the Retry and
the new token implemented by this patch. When this is a non-retry token
which could not be validated, the datagram received is marked as requiring
a Retry packet to be sent, and no connection is created.

When the Initial packet does not embed any non-retry token and if 0-RTT is enabled
the connection is marked with this new flag: QUIC_FL_CONN_NO_TOKEN_RCVD. As soon
as the TLS stack detects that some early-data have been provided and accepted by
the client, the connection is marked to be killed (QUIC_FL_CONN_TO_KILL) from
ha_quic_add_handshake_data(). This is done calling qc_ssl_eary_data_accepted()
new function. The secret TLS handshake is interrupted as soon as possible returnin
0 from ha_quic_add_handshake_data(). The connection is also marked as
requiring a Retry packet to be sent (QUIC_FL_CONN_SEND_RETRY) from
ha_quic_add_handshake_data(). The the handshake I/O handler (quic_conn_io_cb())
knows how to behave: kill the connection after having sent a Retry packet.

About TLS stack compatibility, this patch is supported by aws-lc. It is
disabled for wolfssl which does not support 0-RTT at this time thanks
to HAVE_SSL_0RTT_QUIC.

This patch depends on these commits:

     MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
     MINOR: quic: Implement qc_ssl_eary_data_accepted().
     MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
     BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
     MINOR: quic: Token for future connections implementation.
     MINOR: quic: Implement quic_tls_derive_token_secret().
     MINOR: tools: Implement ipaddrcpy().

Must be backported as far as 2.6.
2024-08-30 17:04:09 +02:00
Amaury Denoyelle
e177cf341c BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
STREAM frames have dedicated handling on retransmission. A special check
is done to remove data already acked in case of duplicated frames, thus
only unacked data are retransmitted.

This handling is faulty in case of an empty STREAM frame with FIN set.
On retransmission, this frame does not cover any unacked range as it is
empty and is thus discarded. This may cause the transfer to freeze with
the client waiting indefinitely for the FIN notification.

To handle retransmission of empty FIN STREAM frame, qc_stream_desc layer
have been extended. A new flag QC_SD_FL_WAIT_FOR_FIN is set by MUX QUIC
when FIN has been transmitted. If set, it prevents qc_stream_desc to be
freed until FIN is acknowledged. On retransmission side,
qc_stream_frm_is_acked() has been updated. It now reports false if
FIN bit is set on the frame and qc_stream_desc has QC_SD_FL_WAIT_FOR_FIN
set.

This must be backported up to 2.6. However, this modifies heavily
critical section for ACK handling and retransmission. As such, it must
be backported only after a period of observation.

This issue can be reproduced by using the following socat command as
server to add delay between the response and connection closure :
  $ socat TCP-LISTEN:<port>,fork,reuseaddr,crlf SYSTEM:'echo "HTTP/1.1 200 OK"; echo ""; sleep 1;'

On the client side, ngtcp2 can be used to simulate packet drop. Without
this patch, connection will be interrupted on QUIC idle timeout or
haproxy client timeout with ERR_DRAINING on ngtcp2 :
  $ ngtcp2-client --exit-on-all-streams-close -r 0.3 <host> <port> "http://<host>:<port>/?s=32o"

Alternatively to ngtcp2 random loss, an extra haproxy patch can also be
used to force skipping the emission of the empty STREAM frame :

diff --git a/include/haproxy/quic_tx-t.h b/include/haproxy/quic_tx-t.h
index efbdfe687..1ff899acd 100644
--- a/include/haproxy/quic_tx-t.h
+++ b/include/haproxy/quic_tx-t.h
@@ -26,6 +26,8 @@ extern struct pool_head *pool_head_quic_cc_buf;
 /* Flag a sent packet as being probing with old data */
 #define QUIC_FL_TX_PACKET_PROBE_WITH_OLD_DATA (1UL << 5)

+#define QUIC_FL_TX_PACKET_SKIP_SENDTO (1UL << 6)
+
 /* Structure to store enough information about TX QUIC packets. */
 struct quic_tx_packet {
 	/* List entry point. */
diff --git a/src/quic_tx.c b/src/quic_tx.c
index 2f199ac3c..2702fc9b9 100644
--- a/src/quic_tx.c
+++ b/src/quic_tx.c
@@ -318,7 +318,7 @@ static int qc_send_ppkts(struct buffer *buf, struct ssl_sock_ctx *ctx)
 		tmpbuf.size = tmpbuf.data = dglen;

 		TRACE_PROTO("TX dgram", QUIC_EV_CONN_SPPKTS, qc);
-		if (!skip_sendto) {
+		if (!skip_sendto && !(first_pkt->flags & QUIC_FL_TX_PACKET_SKIP_SENDTO)) {
 			int ret = qc_snd_buf(qc, &tmpbuf, tmpbuf.data, 0, gso);
 			if (ret < 0) {
 				if (gso && ret == -EIO) {
@@ -354,6 +354,7 @@ static int qc_send_ppkts(struct buffer *buf, struct ssl_sock_ctx *ctx)
 					qc->cntrs.sent_bytes_gso += ret;
 			}
 		}
+		first_pkt->flags &= ~QUIC_FL_TX_PACKET_SKIP_SENDTO;

 		b_del(buf, dglen + QUIC_DGRAM_HEADLEN);
 		qc->bytes.tx += tmpbuf.data;
@@ -2066,6 +2067,17 @@ static int qc_do_build_pkt(unsigned char *pos, const unsigned char *end,
 				continue;
 			}

+			switch (cf->type) {
+			case QUIC_FT_STREAM_8 ... QUIC_FT_STREAM_F:
+				if (!cf->stream.len && (qc->flags & QUIC_FL_CONN_TX_MUX_CONTEXT)) {
+					TRACE_USER("artificially drop packet with empty STREAM frame", QUIC_EV_CONN_TXPKT, qc);
+					pkt->flags |= QUIC_FL_TX_PACKET_SKIP_SENDTO;
+				}
+				break;
+			default:
+				break;
+			}
+
 			quic_tx_packet_refinc(pkt);
 			cf->pkt = pkt;
 		}
2024-08-07 11:03:32 +02:00
Amaury Denoyelle
714009b7bc MINOR: quic: implement function to check if STREAM is fully acked
When a STREAM frame is retransmitted, a check is performed to remove
range of data already acked from it. This is useful when STREAM frames
are duplicated and splitted to cover different data ranges. The newly
retransmitted frame contains only unacked data.

This process is performed similarly in qc_dup_pkt_frms() and
qc_build_frms(). Refactor the code into a new function named
qc_stream_frm_is_acked(). It returns true if frame data are already
fully acked and retransmission can be avoided. If only a partial range
of data is acknowledged, frame content is updated to only cover the
unacked data.

This patch does not have any functional change. However, it simplifies
retransmission for STREAM frames. Also, it will be reused to fix
retransmission for empty STREAM frames with FIN set from the following
patch :
  BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM

As such, it must be backported prior to it.
2024-08-07 10:57:10 +02:00
Amaury Denoyelle
bba6baff30 BUG/MEDIUM: quic: prevent conn freeze on 0RTT undeciphered content
Received QUIC packets are stored in quic_conn Rx buffer after header
protection removal in qc_rx_pkt_handle(). These packets are then removed
after quic_conn IO handler via qc_treat_rx_pkts().

If HP cannot be removed, packets are still copied into quic_conn Rx
buffer. This can happen if encryption level TLS keys are not yet
available. The packet remains in the buffer until HP can be removed and
its content processed.

An issue occurs if client emits a 0-RTT packet but haproxy does not have
the shared secret, for example after a haproxy process restart. In this
case, the packet is copied in quic_conn Rx buffer but its HP won't ever
be removed. This prevents the buffer to be purged. After some time, if
the client has emitted enough packets, Rx buffer won't have any space
left and received packets are dropped. This will cause the connection to
freeze.

To fix this, remove any 0-RTT buffered packets on handshake completion.
At this stage, 0-RTT packets are unnecessary anymore. The client is
expected to reemit its content in 1-RTT packet which are properly
deciphered.

This can easily reproduce with HTTP/3 POST requests or retrieving a big
enough object, which will fill the Rx buffer with ACK frames. Here is a
picoquic command to provoke the issue on haproxy startup :

$ picoquicdemo -Q -v 00000001 -a h3 <hostname> 20443 "/?s=1g"

Note that allow-0rtt must be present on the bind line to trigger the
issue. Else haproxy will reject any 0-RTT packets.

This must be backported up to 2.6.

This could be one of the reason for github issue #2549 but it's unsure
for now.
2024-07-31 10:24:53 +02:00
Amaury Denoyelle
08515af9df MINOR: quic: implement send-retry quic-initial rules
Define a new quic-initial "send-retry" rule. This allows to force the
emission of a Retry packet on an initial without token instead of
instantiating a new QUIC connection.
2024-07-25 15:39:39 +02:00