126 Commits

Author SHA1 Message Date
Frederic Lecaille
3f60891360 MEDIUM: quic-be: qc_send_mux() adaptation for 0-RTT
When entering this function, a selection is done about the encryption level
to be used to send data. For a client, the early data encryption level
is used to send 0-RTT if this encryption level is initialized.

The Initial encryption is also registered to the send list for clients if there
is Initial crypto data to send. This allow Initial and 0-RTT packets to
be coalesced by datagrams.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
a4bbbc75db MINOR: quic-be: Send post handshake frames from list of frames (0-RTT)
This patch is required to make 0-RTT work. It modifies the prototype of
quic_build_post_handshake_frames() to send post handshake frames from a
list of frames in place of the application encryption level (used
as <qc->ael> local variable).

This patch does not modify at all the current QUIC stack behavior (even for
QUIC frontends). It must be considered as a preparation for the code
to come about 0-RTT support for QUIC backends.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
ac1d3eba88 MINOR: quic-be: allow the preparation of 0-RTT packets
A QUIC server never sends 0-RTT packets contrary to the client.

This very simple modification allow the the preparation of 0-RTT packets
with early data as encryption level (->eel).
2025-11-13 14:04:31 +01:00
Frederic Lecaille
80070fe51c MEDIUM: quic-be: Parse, store and reuse tokens provided by NEW_TOKEN
Add a per thread ist struct to srv_per_thread struct to store the QUIC token to
be reused for subsequent sessions.

Parse at packet level (from qc_parse_ptk_frms()) these tokens and store
them calling qc_try_store_new_token() newly implemented function. This is
this new function which does its best (may fail) to update the tokens.

Modify qc_do_build_pkt() to resend these tokens calling quic_enc_token()
implemented by this patch.
2025-11-13 14:04:31 +01:00
Amaury Denoyelle
b9809fe0d0 MINOR: quic: remove <mux_state> field
This patch removes <mux_state> field from quic_conn structure. The
purpose of this field was to indicate if MUX layer above quic_conn is
not yet initialized, active, or already released.

It became tedious to properly set it as initialization order of the
various quic_conn/conn/MUX layers now differ between the frontend and
backend sides, and also depending if 0-RTT is used or not. Recently, a
new change introduced in connect_server() will allow to initialize QUIC
MUX earlier if ALPN is cached on the server structure. This had another
level of complexity.

Thus, this patch removes <mux_state> field completely. Instead, a new
flag QUIC_FL_CONN_XPRT_CLOSED is defined. It is set at a single place
only on close XPRT callback invokation. It can be mixed with the new
utility functions qc_wait_for_conn()/qc_is_conn_ready() to determine the
status of conn/MUX layers now without an extra quic_conn field.
2025-11-05 14:03:34 +01:00
Amaury Denoyelle
9bfe9b9e21 MINOR: quic: split Tx options for FE/BE usage
This patch is similar to the previous one, except that it is focused on
Tx QUIC settings. It is now possible to toggle GSO and pacing on
frontend and backend sides independently.

As with previous patch, option are renamed to use "fe/be" unified
prefixes. This is part of the current serie of commits which unify QUI
settings. Older options are deprecated and will be removed on 3.5
release.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
33a8cb87a9 MINOR: quic: split congestion controler options for FE/BE usage
Various settings can be configured related to QUIC congestion controler.
This patch duplicates them to be able to set independent values on
frontend and backend sides.

As with previous patch, option are renamed to use "fe/be" unified
prefixes. This is part of the current serie of commits which unify QUIC
settings. Older options are deprecated and will be removed on 3.5
release.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
5b04a85bc7 TESTS: quic: fix uninit of quic_cc_path const member
Fix quic_tx unittest module by adding an explicit define for <mtu> const
member of quic_cc_path.

This should fix coverity report from github issue #3162.

This can be backported up to 3.2.
2025-10-17 09:29:01 +02:00
Frederic Lecaille
47bb15ca84 MINOR: quic: get rid of ->target quic_conn struct member
The ->li (struct listener *) member of quic_conn struct was replaced by a
->target (struct obj_type *) member by this commit:

    MINOR: quic-be: get rid of ->li quic_conn member

to abstract the connection type (front or back) when implementing QUIC for the
backends. In these cases, ->target was a pointer to the ojb_type of a server
struct. This could not work with the dynamic servers contrary to the listeners
which are not dynamic.

This patch almost reverts the one mentioned above. ->target pointer to obj_type member
is replaced by ->li pointer to listener struct member. As the listener are not
dynamic, this is easy to do this. All one has to do is to replace the
objt_listener(qc->target) statement by qc->li where applicable.

For the backend connection, when needed, this is always qc->conn->target which is
used only when qc->conn is initialized. The only "problematic" case is for
quic_dgram_parse() which takes a pointer to an obj_type as third argument.
But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function
it is used to access the proxy counters of the connection thanks to qc_counters().
So, this obj_type argument may be null for now on with this patch. This is the
reason why qc_counters() is modified to take this into consideration.
2025-09-11 09:51:28 +02:00
Amaury Denoyelle
0b6908385e BUG/MINOR: quic: properly support GSO on backend side
Previously, GSO emission was explicitely disabled on backend side. This
is not true since the following patch, thus GSO can be used, for example
when transfering large POST requests to a HTTP/3 backend.

  commit e064e5d46171d32097a84b8f84ccc510a5c211db
  MINOR: quic: duplicate GSO unsupp status from listener to conn

However, GSO on the backend side may cause crash when handling EIO. In
this case, GSO must be completely disabled. Previously, this was
performed by flagging listener instance. In backend side, this would
cause a crash as listener is NULL.

This patch fixes it by supporting GSO disable flag for servers. Thus, in
qc_send_ppkts(), EIO can be converted either to a listener or server
flag depending on the quic_conn proxy side. On backend side, server
instance is retrieved via <qc.conn.target>. This is enough to guarantee
that server is not deleted.

This does not need to be backported.
2025-09-08 16:18:05 +02:00
Amaury Denoyelle
f645cd3c74 MINOR: quic: restore QUIC_HP_SAMPLE_LEN constant
The below patch fixes padding emission for small packets, which is
required to ensure that header protection removal can be performed by
the recipient.

  commit d7dea408c64c327cab6aebf4ccad93405b675565
  BUG/MINOR: quic: too short PADDING frame for too short packets

In addition to the proper fix, constant QUIC_HP_SAMPLE_LEN was removed
and replaced by QUIC_TLS_TAG_LEN. However, it still makes sense to have
a dedicated constant which represent the size of the sample used for
header protection. Thus, this patch restores it.

Special instructions for backport : above patch mentions that no
backport is needed. However, this is incorrect, as bug is introduced by
another patch scheduled for backport up to 2.6. Thus, it is first
mandatory to schedule d7dea408c64c327cab6aebf4ccad93405b675565 after it.
Then, this patch can also be used for the sake of code clarity.
2025-09-08 14:49:03 +02:00
Amaury Denoyelle
c20c71a079 TESTS: quic: add unit-tests for QUIC TX part
Define a new "quic_tx" unit-test which is used to test QUIC TX module.
For the moment, a single test is performed on qc_do_build_pkt(). It
checks that PADDING is correctly added for HP sampling in case of a
small packet.
2025-09-08 14:49:03 +02:00
Amaury Denoyelle
fb8c6e2030 CLEANUP: quic: fix typo in quic_tx trace
Fix trace in qc_may_build_pkt().

This can be backported up to 3.0.
2025-09-08 14:49:03 +02:00
Frederic Lecaille
d7dea408c6 BUG/MINOR: quic: too short PADDING frame for too short packets
This bug arrvived with this commit:

    MINOR: quic: centralize padding for HP sampling on packet building

What was missed is the fact that at the centralization point for the
PADDING frame to add for too short packet, <len> payload length  already includes
<*pn_len> the packet number field length value.

So when computing the length of the PADDING frame, the packet field length must
not be considered and added to the payload length (<len>).

This bug leaded too short PADDING frame to too short packets. This was the case,
most of times with Application level packets with a 1-byte packet number field
followed by a 1-byte PING frame. A 1-byte PADDING frame was added in this case
in place of a correct 2-bytes PADDINF frame. The header packet protection of
such packet could not be removed by the clients as for instance for ngtcp2 with
such traces:

    I00001828 0x5a135c81e803f092c74bac64a85513b657 pkt could not decrypt packet number

As the header protection could no be removed, the header keyupdate bit could also
not be read by packet analyzers such as pyshark used during the keyupdate tests.

No need to backport.
2025-09-05 16:17:11 +02:00
Frederic Lecaille
71336bdd08 MINOR: quic: add useful trace about padding params values
When adding a PADDING frame for too short packets, add a trace about variable
values whose this PADDING frame length depends on.
2025-09-05 16:17:11 +02:00
Amaury Denoyelle
36d28bfca3 MEDIUM: quic: strengthen BUG_ON() for unpad Initial packet on client
To avoid anti-amplification limit, it is required that Initial packet
are padded to be at least 1.200 bytes long. On server side, this only
applies to ack-eliciting packets. However, for client side, this is
mandatory for every packets.

This patch adjusts qc_txb_store() BUG_ON statement used to catch too
small Initial packets. On QUIC client side, ack-eliciting flag is now
ignored, thus every packets are checked.

This is labelled as MEDIUM as this BUG_ON() is known to be easily
triggered, as QUIC datagrams encoding function are complex. However,
it's important that a QUIC endpoint respects it, else the peer will drop
the invalid packet and could immediately close the connection.
2025-09-02 10:41:49 +02:00
Amaury Denoyelle
209a54d539 BUG/MINOR: quic: pad Initial pkt with CONNECTION_CLOSE on client
Currently, when connection is closing, only CONNECTION_CLOSE frame is
emitted via qc_prep_pkts()/qc_do_build_pkt(). Also, only the first
registered encryption level is considered while the others are
dismissed. This results in a single packet datagram.

This can cause issues for QUIC client support, as padding is required
for every Initial packet, contrary to server side where only
ack-eliciting packets are eligible. Thus a client must add padding to a
CONNECTION_CLOSE frame on Initial level.

This patch adjusts qc_prep_pkts() to ensure such packet will be
correctly padded on client side. It sets <final_packet> variable which
instructs that if padding is necessary it must be apply immediately on
the current encryption level instead of the last one.

It could appear as unnecessary to pad a CONNECTION_CLOSE packet, as the
peer will enter in draining state when processing it. However, RFC
mandates that a client Initial packet too small must be dropped by the
server, so there is a risk that the CONNECTION_CLOSE is simply discarded
prior to its processing if stored in a too small datagram.

No need to backport as this is a QUIC backend issue only.
2025-09-02 10:34:12 +02:00
Amaury Denoyelle
e9b78e3fb1 BUG/MINOR: quic: fix padding issue on INITIAL retransmit
On loss detection timer expiration, qc_dgrams_retransmit() is used to
reemit lost packets. Different code paths are present depending on the
active encryption level.

If Initial level is still initialized, retransmit is performed both for
Initial and Handshake spaces, by first retrieving the list of lost
frames for each of them.

Prior to this patch, Handshake level was always registered for emission
after Initial, even if it dit not have any frame to reemit. In this
case, most of the time it would result in a datagram containing Initial
reemitted frames packet coalesced with a Handshake packet consisting
only of a PADDING frame. This is because padding is only added for the
last registered QEL.

For QUIC backend support, this may cause issues. This is because
contrary to QUIC server side, Initial and Handshake levels keys are not
derived simultaneously for a QUIC client. Thus, if the latter keys are
unavailable, Handshake packet cannot be encoded in sending, leaving a
single Initial packet. However, this is now too late to add PADDING.
Thus the resulting datagram is invalid : this triggers the BUG_ON()
assert failure located on qc_txb_store().

This patch fixes this by amending qc_dgrams_retransmit(). Now, Handshake
level is only registered for emission if there is frame to retransmit,
which implies that Handshake keys are already available. Thus, PADDING
will now either be added at Initial or Handshake level as expected.

Note that this issue should not be present on QUIC frontend, due to
Initial and Handshake keys derivation almost simultaneously. However,
this should still be backported up to 3.0.
2025-09-02 10:31:32 +02:00
Amaury Denoyelle
34d5bfd23c BUG/MINOR: quic: fix room check if padding requested
qc_prep_pkts() activates padding when building an Initial packet. This
ensures that resulting datagram will always be at least 1.200 bytes,
which is mandatory to prevent deadlock over anti-amplication.

Prior to padding activation, a check is performed to ensure that output
buffer is big enough for a padded datagram. However, this did not take
into account previously built packets which would be coalesced in the
same datagram. Thus this patch fixes this comparison check.

In theory, prior to this patch, in some cases Initial packets could not
be built despite a datagram of the proper size. Currently, this probably
never happens as Initial packet is always the first encoded in a
datagram, thus there is no coalesced packet prior to it. However, there
is no hard requirement on this, so it's better to reflect this in the
code.

This should be backported up to 2.6.
2025-09-02 10:29:11 +02:00
Amaury Denoyelle
1529ec1a25 MINOR: quic: centralize padding for HP sampling on packet building
The below patch has simplified INITIAL padding on emission. Now,
qc_prep_pkts() is responsible to activate padding for this case, and
there is no more special case in qc_do_build_pkt() needed.

  commit 8bc339a6ad4702f2c39b2a78aaaff665d85c762b
  BUG/MAJOR: quic: fix INITIAL padding with probing packet only

However, qc_do_build_pkt() may still activate padding on its own, to
ensure that a packet is big enough so that header protection decryption
can be performed by the peer. HP decryption is performed by extracting a
sample from the ciphered packet, starting 4 bytes after PN offset.
Sample length is 16 bytes as defined by TLS algos used by QUIC. Thus, a
QUIC sender must ensures that length of packet number plus payload
fields to be at least 4 bytes long. This is enough given that each
packet is completed by a 16 bytes AEAD tag which can be part of the HP
sample.

This patch simplifies qc_do_build_pkt() by centralizing padding for this
case in a single location. This is performed at the end of the function
after payload is completed. The code is thus simpler.

This is not a bug. However, it may be interesting to backport this patch
up to 2.6, as qc_do_build_pkt() is a tedious function, in particular
when dealing with padding generation, thus it may benefit greatly from
simplification.
2025-08-25 08:48:24 +02:00
Amaury Denoyelle
7d554ca629 BUG/MINOR: quic: don't coalesce probing and ACK packet of same type
Haproxy QUIC stack suffers from a limitation : it's not possible to emit
a packet which contains probing data and a ACK frame in it. Thus, in
case qc_do_build_pkt() is invoked which both values as true, probing has
the priority and ACK is ignored.

However, this has the undesired side-effect of possibly generating two
coalesced packets of the same type in the same datagram : the first one
with the probing data and the second with an ACK frame. This is caused
by qc_prep_pkts() loop which may call qc_do_build_pkt() multiple times
with the same QEL instance. This case is normally use when a full
datagram has been built but there is still content to emit on the
current encryption level.

To fix this, alter qc_prep_pkts() loop : if both probing and ACK is
requested, force the datagram to be written after packet encoding. This
will result in a datagram containing the packet with probing data as
final entry. A new datagram is started for the next packet which will
can contain the ACK frame.

This also has some impact on INITIAL padding. Indeed, if packet must be
the last due to probing emission, qc_prep_pkts() will also activate
padding to ensure final datagram is at least 1.200 bytes long.

Note that coalescing two packets of the same type is not invalid
according to QUIC RFC. However it could cause issue with some shaky
implementations, so it is considered as a bug.

This must be backported up to 2.6.
2025-08-22 18:20:42 +02:00
Amaury Denoyelle
8bc339a6ad BUG/MAJOR: quic: fix INITIAL padding with probing packet only
A QUIC datagram that contains an INITIAL packet must be padded to 1.200
bytes to prevent any deadlock due to anti-amplification protection. This
is implemented by encoding a PADDING frame on the last packet of the
datagram if necessary.

Previously, qc_prep_pkts() was responsible to activate padding when
calling qc_do_build_pkt(), as it knows which packet is the last to
encode. However, this has the side-effect of preventing PING emission
for probing with no data as this case was handled in an else-if branch
after padding. This was fixed by the below commit

  217e467e89d15f3c22e11fe144458afbf718c8a8
  BUG/MINOR: quic: fix malformed probing packet building

Above logic was altered to fix the PING case : padding was set to false
explicitely in qc_prep_pkts(). Padding was then added in a specific
block dedicated to the PING case in qc_do_build_pkt() itself for INITIAL
packets.

However, the fix is incorrect if the last QEL used to built a packet is
not the initial one and probing is used with PING frame only. In this
case, specific block in qc_do_build_pkt() does not add padding. This
causes a BUG_ON() crash in qc_txb_store() which catches these packets as
irregularly formed.

To fix this while also properly handling PING emission, revert to the
original padding logic : qc_prep_pkts() is responsible to activate
INITIAL padding. To not interfere with PING emission, qc_do_build_pkt()
body is adjusted so that PING block is moved up in the function and
detached from the padding condition.

The main benefit from this patch is that INITIAL padding decision in
qc_prep_pkts() is clearer now.

Note that padding can also be activated by qc_do_build_pkt(), as packets
should be big enough for header protection decipher. However, this case
is different from INITIAL padding, so it is not covered by this patch.

This should be backported up to 2.6.
2025-08-22 18:12:32 +02:00
Amaury Denoyelle
0376e66112 BUG/MINOR: quic: do not emit probe data if CONNECTION_CLOSE requested
If connection closing is activated, qc_prep_pkts() can only built a
datagram with a single packet. This is because we consider that only a
single CONNECTION_CLOSE frame is relevant at this stage.

This is handled both by qc_prep_pkts() which ensure that only a single
packet datagram is built and also qc_do_build_pkt() which prevents the
invokation of qc_build_frms() if <cc> is set.

However, there is an incoherency for probing. First, qc_prep_pkts()
deactivates it if connection closing is requested. But qc_do_build_pkt()
may still emit probing frame as it does not check its <probe> argument
but rather <pto_probe> QEL field directly. This can results in a packet
mixing a PING and a CONNECTION close frames, which is useless.

Fix this by adjusting qc_do_build_pkt() : closing argument is also
checked on PING probing emission. Note that there is still shaky code
here as qc_do_build_pkt() should rely only on <probe> argument to ensure
this.

This should be backported up to 2.6.
2025-08-22 18:06:43 +02:00
Amaury Denoyelle
fc3ad50788 BUG/MEDIUM: quic: reset padding when building GSO datagrams
qc_prep_pkts() encodes input data into QUIC packets in a loop into one
or several datagrams. It supports GSO which requires to built a serie of
multiple datagrams of the same length.

Each packet encoding is performed via a call to qc_do_build_pkt(). This
function has an argument to specify if output packet must be completed
with a PADDING frame. This option is activated when qc_prep_pkts()
encodes the last packet of a datagram with at least one INITIAL packet
in it.

Padding is resetted each time a new datagram is started. However, this
was not performed if GSO is used to built the next datagram. This patch
fixes it by properly resetting padding in this case also.

The impact of this bug is unknown. It may have several effectfs, one of
the most obvious being the insertion of unnecessary padding in packets.
It could also potentially trigger an infinite loop in qc_prep_pkts(),
although this has never been encountered so far.

This must be backported up to 3.1.
2025-08-22 16:22:01 +02:00
Frederic Lecaille
9a22770ac5 BUG/MINOR: quic-be: missing Initial packet number space discarding
A QUIC client must discard the Initial packet number space as soon as it first
sends a Handshake packet.

This patch implements this packet number space which was missing.
2025-08-21 14:24:31 +02:00
Willy Tarreau
c264ea1679 MEDIUM: tree-wide: replace most DECLARE_POOL with DECLARE_TYPED_POOL
This will make the pools size and alignment automatically inherit
the type declaration. It was done like this:

   sed -i -e 's:DECLARE_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_POOL src addons)
   sed -i -e 's:DECLARE_STATIC_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_STATIC_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_STATIC_POOL src addons)

81 replacements were made. The only remaining ones are those which set
their own size without depending on a structure. The few ones with an
extra size were manually handled.

It also means that the requested alignments are now checked against the
type's. Given that none is specified for now, no issue is reported.

It was verified with "show pools detailed" that the definitions are
exactly the same, and that the binaries are similar.
2025-08-11 19:55:30 +02:00
Amaury Denoyelle
731b52ded9 MINOR: quic: prefer qc_is_back() usage over qc->target
Previously quic_conn <target> member was used to determine if quic_conn
was used on the frontend (as server) or backend side (as client). A new
helper function can now be used to directly check flag
QUIC_FL_CONN_IS_BACK.

This reduces the dependency between quic_conn and their relative
listener/server instances.
2025-08-07 16:59:59 +02:00
Amaury Denoyelle
e064e5d461 MINOR: quic: duplicate GSO unsupp status from listener to conn
QUIC emission can use GSO to emit multiple datagrams with a single
syscall invokation. However, this feature relies on several kernel
parameters which are checked on haproxy process startup.

Even if these checks report no issue, GSO may still be unable due to the
underlying network adapter underneath. Thus, if a EIO occured on
sendmsg() with GSO, listener is flagged to mark GSO as unsupported. This
allows every other QUIC connections to share the status and avoid using
GSO when using this listener.

Previously, listener flag was checked for every QUIC emission. This was
done using an atomic operation to prevent races. Improve this by
duplicating GSO unsupported status as the connection level. This is done
on qc_new_conn() and also on thread rebinding if a new listener instance
is used.

The main benefit from this patch is to reduce the dependency between
quic_conn and listener instances.
2025-08-07 16:36:26 +02:00
Frederic Lecaille
14d0f74052 MINOR: quic: Remove pool_head_quic_be_cc_buf pool
This patch impacts the QUIC frontends. It reverts this patch

    MINOR: quic-be: add a "CC connection" backend TX buffer pool

which adds <pool_head_quic_be_cc_buf> new pool to allocate CC (connection closed state)
TX buffers with bigger object size than the one for <pool_head_quic_cc_buf>.
Indeed the QUIC backends must be able to send at least 1200 bytes Initial packets.

For now on, both the QUIC frontends and backend use the same pool with
MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU)(1252 bytes) as object size.
2025-07-17 19:33:21 +02:00
Frederic Lecaille
838024e07e MINOR: quic: Get rid of qc_is_listener()
Replace all calls to qc_is_listener() (resp. !qc_is_listener()) by calls to
objt_listener() (resp. objt_server()).
Remove qc_is_listener() implement and QUIC_FL_CONN_LISTENER the flag it
relied on.
2025-07-16 16:42:21 +02:00
Ilia Shipitsin
0ee3d739b8 CLEANUP: assorted typo fixes in the code, commits and doc
Corrected various spelling and phrasing errors to improve clarity and consistency.
2025-07-10 19:49:48 +02:00
Frederic Lecaille
87ada46f38 BUG/MINOR: quic-be: Malformed coalesced Initial packets
This bug fix completes this patch which was not sufficient:

   MINOR: quic-be: Allow sending 1200 bytes Initial datagrams

This patch could not allow the build of well formed Initial packets coalesced to
others (Handshake) packets. Indeed, the <padding> parameter passed to qc_build_pkt()
is deduced from a first value: <padding> value and must be set to 1 for
the last encryption level. As a client, the last encryption level is always
the Handshake encryption level. But <padding> was always set to 1 for a QUIC
client, leading the first Initial packet to be malformed because considered
as the second one into the same datagram.

So, this patch sets <padding> value passed to qc_build_pkt() to 1 only when there
is no last encryption level at all, to allow the build of Initial only packets
(not coalesced) or when it frames to send (coalesced packets).

No need to backport.
2025-07-07 14:13:02 +02:00
Frederic Lecaille
194e3bc2d5 MINOR: quic-be: address validation support implementation (RETRY)
- Add ->retry_token and ->retry_token_len new quic_conn struct members to store
  the retry tokens. These objects are allocated by quic_rx_packet_parse() and
  released by quic_conn_release().
- Add <pool_head_quic_retry_token> new pool for these tokens.
- Implement quic_retry_packet_check() to check the integrity tag of these tokens
  upon RETRY packets receipt. quic_tls_generate_retry_integrity_tag() is called
  by this new function. It has been modified to pass the address where the tag
  must be generated
- Add <resend> new parameter to quic_pktns_discard(). This function is called
  to discard the packet number spaces where the already TX packets and frames are
  attached to. <resend> allows the caller to prevent this function to release
  the in flight TX packets/frames. The frames are requeued to be resent.
- Modify quic_rx_pkt_parse() to handle the RETRY packets. What must be done upon
  such packets receipt is:
  - store the retry token,
  - store the new peer SCID as the DCID of the connection. Note that the peer will
    modify again its SCID. This is why this SCID is also stored as the ODCID
    which must be matched with the peer retry_source_connection_id transport parameter,
  - discard the Initial packet number space without flagging it as discarded and
    prevent retransmissions calling qc_set_timer(),
  - modify the TLS cryptographic cipher contexts (RX/TX),
  - wakeup the I/O handler to send new Initial packets asap.
- Modify quic_transport_param_decode() to handle the retry_source_connection_id
  transport parameter as a QUIC client. Then its caller is modified to
  check this transport parameter matches with the SCID sent by the peer with
  the RETRY packet.
2025-06-26 09:48:00 +02:00
Frederic Lecaille
8a25fcd36e MINOR: quic-be: Allow sending 1200 bytes Initial datagrams
This easy to understand patch is not intrusive at all and cannot break the QUIC
listeners.

The QUIC client MUST always pad its datagrams with Initial packets. A "!l" (not
a listener) test OR'ed with the existing ones is added to satisfy the condition
to allow the build of such datagrams.
2025-06-26 09:48:00 +02:00
Frederic Lecaille
c898b29e64 MINOR: quic: Useless TX buffer size reduction in closing state
There is no need to limit the size of the TX buffer to QUIC_MIN_CC_PKTSIZE bytes
when the connection is in closing state. There is already a test which limits the
number of bytes to be used from this TX buffer after this useless test removed.
It limits this number of bytes to the size of the TX buffer itself:

    if (end > (unsigned char *)b_wrap(buf))
	    end = (unsigned char *)b_wrap(buf);

This is exactly what is needed when the connection is in closing state. Indeed,
the size of the TX buffers are limited to reduce the memory usage. The connection
only needs to send short datagrams with at most 2 packets with a CONNECTION_CLOSE*
frames. They are built only one time and backed up into small TX buffer allocated
from a dedicated pool.
The size of this TX buffer is QUIC_MAX_CC_BUFSIZE which depends on QUIC_MIN_CC_PKTSIZE:

 #define QUIC_MIN_CC_PKTSIZE  128
 #define QUIC_MAX_CC_BUFSIZE (2 * (QUIC_MIN_CC_PKTSIZE + QUIC_DGRAM_HEADLEN))

This size is smaller than an MTU.

This patch should be backported as far as 2.9 to ease further backports to come.
2025-06-26 09:48:00 +02:00
Frederic Lecaille
9cb2acd2f2 MINOR: quic-be: add a "CC connection" backend TX buffer pool
A QUIC client must be able to close a connection sending Initial packets. But
QUIC client Initial packets must always be at least 1200 bytes long. To reduce
the memory use of TX buffers of a connection when in "closing" state, a pool
was dedicated for this purpose but with a too much reduced TX buffer size
(QUIC_MAX_CC_BUFSIZE).

This patch adds a "closing state connection" TX buffer pool with the same role
for QUIC backends.
2025-06-26 09:48:00 +02:00
Frederic Lecaille
1e6d8f199c BUG/MINOR: quic: wrong QUIC_FT_CONNECTION_CLOSE(0x1c) frame encoding
This is an old bug which was there since this commit:

     MINOR: quic: Avoid zeroing frame structures

It seems QUIC_FT_CONNECTION_CLOSE was confused with QUIC_FT_CONNECTION_CLOSE_APP
which does not include a "frame type" field. This field was not initialized
(so with a random value) which prevent the packet to be built because the
packet builder supposes the packet with such frames are very short.

Must be backported as far as 2.6.
2025-06-26 09:48:00 +02:00
Frederic Lecaille
b9703cf711 MINOR: quic-be: get rid of ->li quic_conn member
Replace ->li quic_conn pointer to struct listener member by  ->target which is
an object type enum and adapt the code.
Use __objt_(listener|server)() where the object type is known. Typically
this is were the code which is specific to one connection type (frontend/backend).
Remove <server> parameter passed to qc_new_conn(). It is redundant with the
<target> parameter.
GSO is not supported at this time for QUIC backend. qc_prep_pkts() is modified
to prevent it from building more than an MTU. This has as consequence to prevent
qc_send_ppkts() to use GSO.
ssl_clienthello.c code is run only by listeners. This is why __objt_listener()
is used in place of ->li.
2025-06-11 18:37:34 +02:00
Ilia Shipitsin
78b849b839 CLEANUP: assorted typo fixes in the code and comments
code, comments and doc actually.
2025-04-02 11:12:20 +02:00
Amaury Denoyelle
a71007c088 MINOR: quic: move global tune options into quic_tune
A new structure quic_tune has recently been defined. Its purpose is to
store global options related to QUIC. Previously, only the tunable to
toggle pacing was stored in it.

This commit moves several QUIC related tunable from global to quic_tune
structure. This better centralizes QUIC configuration option and gives
room for future generic options.
2025-03-24 10:01:46 +01:00
Amaury Denoyelle
e2744d23be MINOR: quic: refactor CRYPTO encoding and splitting
This patch is the direct follow-up of the previous one which refactor
STREAM frame encoding. Reuse the newly defined quic_strm_frm_fillbuf()
and quic_strm_frm_split() functions for CRYPTO frame encoding.

The code for CRYPTO and STREAM frames encoding should now be clearer as
it is mostly identical.
2025-02-12 15:10:54 +01:00
Amaury Denoyelle
f96af8e463 MINOR: quic: refactor STREAM encoding and splitting
CRYPTO and STREAM frames encoding is similar. If payload is too large,
frame will be splitted and only the first payload part will be written
in the output QUIC packet. This process is complexified by the presence
of a variable-length integer Length field prior to the payload.

This commit aims at refactor these operations. Define two functions to
simplify the code :
* quic_strm_frm_fillbuf() which is used to calculate the optimal frame
  length of a STREAM/CRYPTO frame with its payload in a buffer
* quic_strm_frm_split() which is used to split the frame payload if
  buffer is too small

With this patch, both functions are now implemented for STREAM encoding.
2025-02-12 15:10:03 +01:00
Amaury Denoyelle
731340afbd MINOR: quic: simplify length calculation for STREAM/CRYPTO frames
STREAM and CRYPTO frames have a similar encoding format. In particular,
both of them have a variable-length integer Length field just before the
frame payload.

It is complex to determine the optimal Length value before copying the
payload data in the remaining buffer space. As such, helper functions
were implemented to calculate this. However, CRYPTO and STREAM frames
encoding implementation were not completely aligned, which renders the
code harder to follow.

The purpose of this commit is to simplify CRYPTO and STREAM frames
encoding. First, a new helper quic_int_cap_length() is defined which is
useful to determine the optimal buffer room available if prefixed by a
variable-length integer as Length field. Then, processing of both CRYPTO
and STREAM frames is now nearly identical, based on this new helper
function. Functions max_available_room() and max_stream_data_size() are
now unused and are removed.
2025-02-12 11:51:09 +01:00
Amaury Denoyelle
e6a223542a BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding
Function max_stream_data_size() is used to determine the payload length
of a CRYPTO frame. It takes into account that the CRYPTO length field is
a variable length integer.

Implemented calcul was incorrect as it reserved too much space as a
frame header. This error is mostly due because max_stream_data_size()
reuses max_available_room() which also reserve space for a variable
length integer. This results in CRYPTO frames shorter of 1 to 2 bytes
than the maximum achievable value, which produces in the end datagram
shorter than the MTU.

Fix max_stream_data_size() implementation. It is now merely a wrapper on
max_available_room(). This ensures that CRYPTO frame encoding is now
properly optimized to use the MTU available.

This should be backported up to 2.6.
2025-02-12 11:51:09 +01:00
Amaury Denoyelle
63747452a3 BUG/MINOR: quic: reserve length field for long header encoding
Long header packets have a mandatory Length field, which contains the
size of Packet number and payload, encoded as a variable-length integer.
Its value can thus only be determined after the payload size is known,
which depends on the remaining buffer space after this variable-length
field.

Packet payload are encoded in two steps. First, a list of input frames
is processed until the packet buffer is full. CRYPTO and STREAM frames
payload can be splitted if need to fill the buffer. Real encoding is
then performed as a second stage operation, first with Length field,
then with the selected frames themselves.

Before this patch, no space was reserved in the buffer for Length field
when attaching the frames to the packet. This could result in a error as
the packet payload would be too large for the remaining space.

In practice, this issue was rarely encounted, mostly as a side-effect
from another issue linked to CRYPTO frame encoding. Indeed, a wrong
calculation is performed on CRYPTO splitting, which results in frame
payload shorter by a few bytes than expected. This however ensured there
would be always enough room for the Length field and payload during
encoding. As CRYPTO frames are the only big enough content emitted with
a Long header packet, this renders the current issue mostly non
reproducible.

Fix the original issue by reserving some space for Length field prior to
frame payload calculation, using a maximum value based on the remaining
room space. Packet length is then reduced if needed when encoding is
performed, which ensures there is always enough room for the selected
frames.

Note that the other issue impacting CRYPTO frame encoding is not yet
fixed. This could result in datagrams with Long header packets not
completely extended to the full MTU. The issue will be addressed in
another patch.

This should be backported up to 2.6.
2025-02-12 11:51:09 +01:00
Amaury Denoyelle
4489a61585 MEDIUM: quic: implement credit based pacing
Implement a new method for QUIC pacing emission based on credit. This
represents the number of packets which can be emitted in a single burst.
After emission, decrement from the credit the number of emitted packets.
Several emission can be conducted in the same sequence until the credit
is completely decremented.

When a new emission sequence is initiated (i.e. under a new QMUX tasklet
invokation), credit is refilled according to the delay which occured
between the last and current emission context.

This new mechanism main advantage is that it allows to conduct several
emission in the same task context without having to wait between each
invokation. Wait is only forced if pacing is expired, which is now
equivalent to having a null credit.

Furthermore, if delay between two emissions sequence would have been
smaller than expected, credit is only partially refilled. This allows to
restart emission without having to wait for the whole credit to be
available.

On the implementation side, a new field <credit> is avaiable in
quic_pacer structure. It is automatically decremented on
quic_pacing_sent_done() invokation. Also, a new function
quic_pacing_reload() must be used by QUIC MUX when a new emission
sequence is initiated to refill credit. <next> field from quic_pacer has
been removed.

For the moment, credit is based on the burst configured via quic-cc-algo
keyword, or directly reported by BBR.

This should be backported up to 3.1.
2025-01-23 17:40:20 +01:00
Frederic Lecaille
f8b697c19b BUG/MINOR: improve BBR throughput on very fast links
This patch fixes the loss of information when computing the delivery rate
(quic_cc_drs.c) on links with very low latency due to usage of 32bits
variables with the millisecond as precision.

Initialize the quic_conn task with TASK_F_WANTS_TIME flag ask it to ask
the scheduler to update the call date of this task. This allows this task to get
a nanosecond resolution on the call date calling task_mono_time(). This is enabled
only for congestion control algorithms with delivery rate estimation support
(BBR only at this time).

Store the send date with nanosecond precision of each TX packet into
->time_sent_ns new quic_tx_packet struct member to store the date a packet was
sent in nanoseconds thanks to task_mono_time().

Make use of this new timestamp by the delivery rate estimation algorithm (quic_cc_drs.c).

Rename current ->time_sent member from quic_tx_packet struct to ->time_sent_ms to
distinguish the unit used by this variable (millisecond) and update the code which
uses this variable. The logic found in quic_loss.c is not modified at all.

Must be backported to 3.1.
2024-11-28 21:39:05 +01:00
Amaury Denoyelle
2fffd85b97 BUG/MEDIUM: quic: prevent EMSGSIZE with GSO for larger bufsize
A UDP datagram cannot be greater than 65535 bytes, as UDP length header
field is encoded on 2 bytes. As such, sendmsg() will reject a bigger
input with error EMSGSIZE. By default, this does not cause any issue as
QUIC datagrams are limited to 1.252 bytes and sent individually.

However, with GSO support, value bigger than 1.252 bytes are specified
on sendmsg(). If using a bufsize equal to or greater than 65535, syscall
could reject the input buffer with EMSGSIZE. As this value is not
expected, the connection is immediately closed by haproxy and the
transfer is interrupted.

This bug can easily reproduced by requesting a large object on loopback
interface and using a bufsize of 65535 bytes. In fact, the limit is
slightly less than 65535, as extra room is also needed for IP + UDP
headers.

Fix this by reducing the count of datagrams encoded in a single GSO
invokation via qc_prep_pkts(). Previously, it was set to 64 as specified
by man 7 udp. However, with 1252 datagrams, this is still too many.
Reduce it to a value of 52. Input to sendmsg will thus be restricted to
at most 65.104 bytes if last datagram is full.

If there is still data available for encoding in qc_prep_pkts(), they
will be written in a separate batch of datagrams. qc_send_ppkts() will
then loop over the whole QUIC Tx buffer and call sendmsg() for each
series of at most 52 datagrams.

This does not need to be backported.
2024-11-26 11:49:30 +01:00
Frederic Lecaille
96b2641fc8 BUG/MAJOR: quic: fix wrong packet building due to already acked frames
If a packet build was asked to probe the peer with frames which have just
been acked, the frames build run by qc_build_frms() could be cancelled  by
qc_stream_frm_is_acked() whose aim is to check that current frames to
be built have not been already acknowledged. In this case the packet build run
by qc_do_build_pkt() is not interrupted, leading to the build of an empty packet
which should be ack-eliciting.

This is a bug detected by the BUG_ON() statement in qc_do_build_pk():

	    BUG_ON(qel->pktns->tx.pto_probe &&
           !(pkt->flags & QUIC_FL_TX_PACKET_ACK_ELICITING));

Thank you to @Tristan971 for having reported this issue in GH #2709

This is an old bug which must be backported as far as 2.6.
2024-11-25 18:55:45 +01:00
Amaury Denoyelle
044452546e BUG/MEDIUM: quic: fix sending performance due to qc_prep_pkts() return
qc_prep_pkts() is a QUIC transport level function which encodes one or
several datagrams in a buffer before sending them. It returns the number
of encoded datagram. This is especially important when pacing is used to
limit packet bursts.

This datagram accounting was not trivial as qc_prep_pkts() used several
code paths depending on the condition of the current encoded packet.
Thus, there were several places were the local variable dgram_cnt could
have been incremented. This was implemented by the following commit :

  commit 5cb8f8a6224db96f4386277c41ddae4a29a4130d
  MINOR: quic: support a max number of built packet per send iteration

However, there is a bug due to a missing increment when all frames from
the current QEL have been encoded. In this case, the encoding continue
in the same datagram to coalesce a futur packet. However, if this is the
last QEL, encoding loop will then break. As first_pkt is not NULL,
qc_txb_store() is called outside but dgram_cnt is yet not incremented.

In particular, this causes qc_prep_pkts() to return 0 when there is only
small STREAM frames to emit for application QEL. In qc_send(), this is
interpreted as a value which prevents further emission for the current
invokation. Thus, it may hurts performance, both without and with
pacing.

To fix this, removing multiple dgram_cnt increment. Now, it is modified
only in a single place which should cover every case, and render the
code easier to validate.

The most notable case where the bug is visible is when using cubic with
pacing without any burst, with quic-cc-algo cubic(,1). First, transfer
bandwidth in average was suboptimal, with significant variation. Worst,
it could sometimes fall dramatically for a particular stream without
recovering before returning to an expected level on the next one.

No need to backport.
2024-11-25 11:21:28 +01:00