126 Commits

Author SHA1 Message Date
Frederic Lecaille
01fcbd6c08 BUG/MINOR: quic: Missing application limitations tracking for BBR
The ->app_limited member of the delivery rate struct (quic_cc_drs) aim is to
store the index of the last transmitted byte marked as application-limited
so that to track the application-limited phases. During these phases,
BBR must ignore delivery rate samples to properly estimate the delivery rate.

Without such a patch, the Startup phase could be exited very quickly with
a very low estimated bottleneck bandwidth. This had a very bad impact
on little objects with download times smaller than the expected Startup phase
duration. For such objects, with enough bandwith, BBR should stay in the Startup
state.

No need to be backported, as BBR is implemented in the current developement version.
2024-11-21 19:23:53 +01:00
Frederic Lecaille
e778b9a2b6 MINOR: quic: TX part modifications to support BBR.
Very few modifications: call ->on_transmit() and ->drs_on_transmit() congestion
control algorithm (quic_cc) callbacks from qc_send_ppkts() just after having
sents some packets.
2024-11-20 17:34:22 +01:00
Amaury Denoyelle
886a7c475c MINOR: quic/pacing: add burst support
qc_send_mux() has been extended previously to support pacing emission.
This will ensure that no more than one datagram will be emitted during
each invokation. However, to achieve better performance, it may be
necessary to emit a batch of several datagrams one one turn.

A so-called burst value can be specified by the user in the
configuration. However, some congestion control algos may defined their
owned dynamic value. As such, a new CC callback pacing_burst is defined.

quic_cc_default_pacing_burst() can be used for algo without pacing
interaction, such as cubic. It will returns a static value based on user
selected configuration.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
8039fe43e6 MINOR: quic/pacing: support pacing emission on quic_conn layer
Pacing will be implemented for STREAM frames emission. As such,
qc_send_mux() API has been extended to add an argument to a quic_pacer
engine.

If non NULL, engine will be used to pace emission. In short, no more
than one datagram will be emitted for each qc_send_mux() invokation.
Pacer is then notified about the emission and a timer for a future
emission is calculated. qc_send_mux() will return PACING error value, to
inform QUIC MUX layer that it will be responsible to retry emission
after some delay.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
7fd48a5723 MINOR: quic: extend qc_send_mux() return type with a dedicated enum
This commit is part of a adjustment on QUIC transport send API to
support pacing. Here, qc_send_mux() return type has been changed to use
a new enum quic_tx_err.

This is useful to explain different failure causes of emission. For now,
only two values have been defined : NONE and FATAL. When pacing will be
implemented, a new value would be added to specify that emission was
interrupted on pacing. This won't be a fatal error as this allows to
retry emission but not immediately.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
5cb8f8a622 MINOR: quic: support a max number of built packet per send iteration
Extend QUIC transport emission function to support a maximum datagram
argument. The purpose is to ensure that qc_send() won't emit more than
the specified value, unless it is 0 which is considered as unlimited.

In qc_prep_pkts(), a counter of built datagram has been added to support
this. The packet building loop is interrupted if it reaches a specified
maximum value. Also, its return value has been changed to the number of
prepared datagrams. This is reused by qc_send() to interrupt its work if
a specified max datagram argument value is reached over one or several
iteration of prepared/sent datagrams.

This change is necessary to support pacing emission. Note that ideally,
the total length in bytes of emitted datagrams should be taken into
account instead of the raw number of datagrams. However, for a first
implementation, it was deemed easier to implement it with the latter.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
a554d82131 MINOR: quic: simplify qc_prep_pkts() exit path
To prepare pacing support, qc_prep_pkts() exit path have been rewritten
to be easily modified. This is purely refactoring which should not have
any functional change :
* a dedicated error path has been added
* ensure qc_txb_store() is always called to finalize datagram on normal
  exit path if first_pkt is not NULL. Needed to support breaking from
  packet building loop in a easier way.
2024-11-19 16:16:48 +01:00
Frederic Lecaille
217e467e89 BUG/MINOR: quic: fix malformed probing packet building
This bug arrived with this commit:

   cdfceb10a MINOR: quic: refactor qc_prep_pkts() loop

which prevents haproxy from sending PING only packets/datagrams (some
packets/datagrams with only PING frame as ack-eliciting frames inside).
Such packets/datagrams are useful in rare cases during retransmissions
when one wants to probe the peer without exceeding the anti-amplification
limit.

Modify the condition passed to qc_build_pkt() to add padding to the current
datagram. One does not want to do that when probing the peer without ack-eliciting
frames passed as <frms> parameter. Indeed qc_build_pkt() calls qc_do_build_pkt()
which supports this case: if <probe> is true (probing required), qc_do_build_pkt()
handles the case where some padding must be added to a PING only packet/datagram.
This is the case when probing with an empty <frms> frame list of ack-eliciting
frames without exceeding the anti-amplification limit from qc_dgrams_retransmit().

Add some comments to qc_build_pkt() and qc_do_build_pkt() to clarify this
as this code is easy to break!

Thank you for @Tristan971 for having reported this issue in GH #2709.

Must be backported to 3.0.
2024-11-05 20:17:35 +01:00
Frederic Lecaille
444a19ea38 MINOR: quic: Help diagnosing malformed probing packets
Add a BUG_ON() to detect some malformed packets which are supposed to probe the
peer without being ack-eliciting: the peer would not acknowledged such packets.
2024-11-05 20:17:35 +01:00
Amaury Denoyelle
a8738f4156 MINOR: quic: complete trace in qc_may_build_pkt()
Log the encryption level in qc_may_build_pkt(). This is necessary to
fully understand the sending conditions of the QUIC stack.
2024-10-31 15:35:31 +01:00
Amaury Denoyelle
e7578084b0 MINOR: quic: implement dedicated type for out-of-order stream ACK
QUIC streamdesc layer is responsible to handle reception of ACK for
streams. It removes stream data from the underlying buffers on ACK
reception.

Streamdesc layer treats ACK in order at the stream level. Out of order
ACKs are buffered in a tree until they can be handled on older data
acknowledgement reception. Previously, qf_stream instance which comes
from the quic_tx_packet was used as tree node to buffer such ranges.

Introduce a new type dedicated to represent out of order stream ack data
range. This type is named qc_stream_ack. It contains minimal infos only
relative to the acknowledged stream data range.

This allows to reduce size of frequently used quic_frame with the
removal of tree node from qf_stream. Another side effect of this change
is that now quic_frame are always released immediately on ACK reception,
both in-order and out-of-order. This allows to also release the
quic_tx_packet instance which should reduce memory consumption.

The drawback of this change is that qc_stream_ack instance must be
allocated on out-of-order ACK reception. As such, qc_stream_desc_ack()
may fail if an error happens on allocation. For the moment, such error
is silenly recovered up to qc_treat_rx_pkts() with the dropping of the
received packet containing the ACK frame. In the future, it may be
useful to close the connection as this error may only happens on low
memory usage.
2024-10-04 17:56:45 +02:00
Amaury Denoyelle
d7f4e5abf0 MEDIUM: quic: strengthen MUX send notification
Previous commit implement a refactor of MUX send notification from
quic_conn layer. With this new architecture, a proper callback is
defined for each qc_stream_desc instance.

This architecture change allows to simplify notification from quic_conn
layer. First, ensure the MUX callback to properly ignore retransmission
of an already emitted frame. Luckily, this can be handled easily by
comparing offsets and FIN status. Also, each QCS instance can now be
unregistered from send notification just prior qc_stream_desc releasing.
This ensures a QCS is never manipulated from quic_conn after its
emission ending. Both these changes render the send notification more
robust. As a nice effect, flag QUIC_FL_CONN_TX_MUX_CONTEXT can be
removed as it is now unneeded.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
6ad99af0a9 MINOR: quic: refactor MUX send notification
For STREAM emission, MUX QUIC generates one or several frames and emit
them via qc_send_mux(). Lower layer may use them as-is, or split them to
lower chunk to fit in a QUIC packet. It is then responsible to notify
the MUX to report the amount of data sent.

Previously, this was done via a direct call from quic_conn to MUX using
qcc_streams_sent_done(). Modify this to have a better isolation accross
layers. Define a send callback handled by the qc_stream_desc instance.
This allows the MUX to register each QCS instance individually to the
renamved qmux_ctrl_send() which replaces qcc_streams_sent_done().

At quic_conn layer, qc_stream_desc_send() can be used now. This is a
wrapper to qc_stream_desc layer to invoke the send callback if
registered.

This mechanism of qc_stream_desc callback should be extended later to
implement other notifications accross the QUIC stack.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
714009b7bc MINOR: quic: implement function to check if STREAM is fully acked
When a STREAM frame is retransmitted, a check is performed to remove
range of data already acked from it. This is useful when STREAM frames
are duplicated and splitted to cover different data ranges. The newly
retransmitted frame contains only unacked data.

This process is performed similarly in qc_dup_pkt_frms() and
qc_build_frms(). Refactor the code into a new function named
qc_stream_frm_is_acked(). It returns true if frame data are already
fully acked and retransmission can be avoided. If only a partial range
of data is acknowledged, frame content is updated to only cover the
unacked data.

This patch does not have any functional change. However, it simplifies
retransmission for STREAM frames. Also, it will be reused to fix
retransmission for empty STREAM frames with FIN set from the following
patch :
  BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM

As such, it must be backported prior to it.
2024-08-07 10:57:10 +02:00
Frederic Lecaille
eb1a097a66 BUG/MINOR: quic: Too short datagram during packet building failures (aws-lc only)
This issue was reported by Ilya (@Chipitsine) when building haproxy against
aws-lc in GH #2663 where handshakeloss and handshakecorruption interop tests could
lead haproxy to crash after having built too short datagrams:

FATAL: bug condition "first_pkt->type == QUIC_PACKET_TYPE_INITIAL && (first_pkt->flags & (1UL << 0)) && length < 1200" matched at src/quic_tx.c:163
call trace(13):
| 0x55f4ee4dcc02 [ba d9 00 00 00 48 8d 35]: main-0x195bf2
| 0x55f4ee4e3112 [83 3d 2f 16 35 00 00 0f]: qc_send+0x11f3/0x1b5d
| 0x55f4ee4e9ab4 [85 c0 0f 85 00 f6 ff ff]: quic_conn_io_cb+0xab1/0xf1c
| 0x55f4ee6efa82 [48 c7 c0 f8 55 ff ff 64]: run_tasks_from_lists+0x173/0x9c2
| 0x55f4ee6f05d3 [8b 7d a0 29 c7 85 ff 0f]: process_runnable_tasks+0x302/0x6e6
| 0x55f4ee671bb7 [83 3d 86 72 44 00 01 0f]: run_poll_loop+0x6e/0x57b
| 0x55f4ee672367 [48 8b 1d 22 d4 1d 00 48]: main-0x48d
| 0x55f4ee6755e0 [b8 00 00 00 00 e8 08 61]: main+0x2dec/0x335d

This could happen after Handshake packet building failures which follow a successful
Initial packet into the same datagram. In this case, the datagram could be emitted
with a too short length (<1200 bytes).

To fix this, store the datagram only if the first packet is not an Initial packet
or if its length is big enough (>=1200 bytes).

Must be backported as far as 2.6.
2024-08-05 13:40:51 +02:00
William Lallemand
177c84808c MEDIUM: quic: add key argument to header protection crypto functions
In order to prepare the code for using Chacha20 with the EVP_AEAD API,
both quic_tls_hp_decrypt() and quic_tls_hp_encrypt() need an extra key
argument.

Indeed Chacha20 does not exists as an EVP_CIPHER in AWS-LC, so the key
won't be embedded into the EVP_CIPHER_CTX, so we need an extra parameter
to use it.
2024-07-25 13:45:39 +02:00
William Lallemand
d55a297b85 MINOR: quic: rename confusing wording aes to hp
Some of the crypto functions used for headers protection in QUIC are
named with an "aes" name even thought they are not used for AES
encryption only.

This patch renames these "aes" to "hp" so it is clearer.
2024-07-25 13:45:38 +02:00
Amaury Denoyelle
b0990b38f8 MINOR: quic: add counters of sent bytes with and without GSO
Add a sent bytes counter for each quic_conn instance. A secondary field
which only account bytes sent via GSO which is useful to ensure if this
is activated.

For the moment, these counters are reported on "show quic" but not
aggregated on proxy quic module stats.
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
d0ea173e35 MEDIUM: quic: implement GSO fallback mechanism
UDP GSO on Linux is not implemented in every network devices. For
example, this is not available for veth devices frequently used in
container environment. In such case, EIO is reported on send()
invocation.

It is impossible to test at startup for proper GSO support in this case
as a listener may be bound on multiple network interfaces. Furthermore,
network interfaces may change during haproxy lifetime.

As such, the only option is to react on send syscall error when GSO is
used. The purpose of this patch is to implement a fallback when
encountering such conditions. Emission can be retried immediately by
trying to send each prepared datagrams individually.

To support this, qc_send_ppkts() is able to iterate over each datagram
in a so-called non-GSO fallback mode. Between each emission, a datagram
header is rewritten in front of the buffer which allows the sending loop
to proceed until last datagram is emitted.

To complement this, quic_conn listener is flagged on first GSO send
error with value LI_F_UDP_GSO_NOTSUPP. This completely disables GSO for
all future emission with QUIC connections using this listener.

For the moment, non-GSO fallback mode is activated when EIO is reported
after GSO has been set. This is the error reported for the veth usage
described above.
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
af22792a43 MAJOR: quic: support GSO when encoding datagrams
QUIC datagrams are encoded during emission via the function
qc_prep_pkts(). By default, if GSO is not used, each datagram is
prefixed by a metadata header which specify its length and address of
its first quic_tx_packet instance.

If GSO is activated, metadata header won't be inserted for datagrams
following the first one sent in a single syscall. Length field will
contain the total size of these datagrams. This allows to support both
GSO and non-GSO prepared datagram in the same Tx buffer.

qc_send_ppkts() is invoked just after datagrams encoding. It iterates
over each metadata header in Tx buffer to sent each datagram
individually. If length field is bigger than network MTU, GSO usage is
assumed and qc_snd_buf() GSO parameter will be set.

Another important point to note regarding GSO implementation is that
during datagram encoding, packets from the same datagram instance are
attached together. However, if using GSO, consecutive packets from
different datagrams are also linked, but without
QUIC_FL_TX_PACKET_COALESCED flag. This allows to properly update
quic_conn status with all sent packets in qc_send_ppkts(). Packets from
different datagrams are then unlinked to treat them separately when
receiving corresponding ACK frames.
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
448d3d388a MINOR: quic: add GSO parameter on quic_sock send API
Add <gso_size> parameter to qc_snd_buf(). When non-null, this specifies
the value for socket option SOL_UDP/UDP_SEGMENT. This allows to send
several datagrams in a single call by splitting data multiple times at
<gso_size> boundary.

For now, <gso_size> remains set to 0 by caller, as such there should not
be any functional change.
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
cac47d19bd CLEANUP: quic: remove obsolete comment on send
Remove comment on send which is now obsolete since the introduction of
per-connection socket.
2024-07-11 11:02:44 +02:00
Frederic Lecaille
6d943b8db6 BUG/MINOR: quic: Wrong datagram building when probing.
This issue was revealed by chacha20 interop test which very often fails with
ngtcp2 as client. This was due to the fact that 2 application level packets could
be coalesced into the same datagram as revealed by such a capture:

Frame 380: 255 bytes on wire (2040 bits), 255 bytes captured (2040 bits)
Point-to-Point Protocol
Internet Protocol Version 4, Src: 193.167.100.100, Dst: 193.167.0.100
User Datagram Protocol
QUIC IETF
    QUIC Connection information
        [Connection Number: 0]
    [Packet Length: 187]
    QUIC Short Header DCID=ec523fe99840f9c17c868a88d649147814 PKN=333
        0... .... = Header Form: Short Header (0)
        .1.. .... = Fixed Bit: True
        ..0. .... = Spin Bit: False
        [...0 0... = Reserved: 0]
        [.... .0.. = Key Phase Bit: False]
        [.... ..00 = Packet Number Length: 1 bytes (0)]
        Destination Connection ID: ec523fe99840f9c17c868a88d649147814
        [Packet Number: 333]
        Protected Payload […]: 43537d43a3c83e47db6891bd6a4fd7d7fa31941badcb87a540e843341d6a5e493ed4c3f6e6bbff094804ee0ab06830dc1a1bbf52ace4323d2e4f6e0bd4eea73df0721d2949d05a058d3afb974e814494ebf44d1375b0e7f1fd5bcf634cf32ef9a9b4018758a49d39a24c40
    STREAM id=0 fin=0 off=294768 len=144 dir=Bidirectional origin=Client-initiated
        Frame Type: STREAM (0x000000000000000e)
            .... ...0 = Fin: False
            .... ..1. = Len(gth): True
            .... .1.. = Off(set): True
        Stream ID: 0
            .... .... .... .... .... .... .... .... .... .... .... .... .... .... .... ...0 = Stream initiator: Client-initiated (0)
            .... .... .... .... .... .... .... .... .... .... .... .... .... .... .... ..0. = Stream direction: Bidirectional (0)
        Offset: 294768
        Length: 144
        Stream Data […]: 63eef6ccee0d2ab602db3682d0e7cc09b72db6adc307d7699a211144b4b6c029cbed9beae1491c10a5fe0678d815a5303843d33c0593fedc9b64068fd0207e280d05aac2c0054fe9ab30857bc3669ee51d34756cfd2e098eb1ab31a03911f6a103f0a16f8f984d9861efdcf4433c
QUIC IETF
    [Packet Length: 38]
    QUIC Short Header DCID=ec523fe99840f9c17c868a88d649147814 PKN=334
        0... .... = Header Form: Short Header (0)
        .1.. .... = Fixed Bit: True
        ..0. .... = Spin Bit: False
        [...0 0... = Reserved: 0]
        [.... .0.. = Key Phase Bit: False]
        [.... ..00 = Packet Number Length: 1 bytes (0)]
        Destination Connection ID: ec523fe99840f9c17c868a88d649147814
        [Packet Number: 334]
        Protected Payload: b9c0e6dc3fc523574f8164c31b6cd156496212
    PING
        Frame Type: PING (0x0000000000000001)
    PADDING Length: 2
        Frame Type: PADDING (0x0000000000000000)
        [Padding Length: 2]

On the peer side these two packet are considered as a unique one
because there may be only one packet by datagram at application encryption
level and reported as a STREAM frame encoding error:

I00000332 0xec523fe99840f9c17c868a88d649147814 con recv packet len=225
mask=b2c69c7827 sample=43a3c83e47db6891bd6a4fd7d7fa3194
I00000332 0xec523fe99840f9c17c868a88d649147814 pkt rx pkn=333 dcid=0xec523fe99840f9c17c868a88d649147814 type=1RTT k=0
I00000332 0xec523fe99840f9c17c868a88d649147814 frm rx 333 1RTT STREAM(0x0e) id=0x0 fin=0 offset=294768 len=144 uni=0
ngtcp2_conn_read_pkt: ERR_FRAME_ENCODING
I00000332 0xec523fe99840f9c17c868a88d649147814 pkt tx pkn=1531039643 dcid=0xae79dfc99d6c65d6 type=1RTT k=0
I00000332 0xec523fe99840f9c17c868a88d649147814 frm tx 1531039643 1RTT CONNECTION_CLOSE(0x1c) error_code=FRAME_ENCODING_ERROR(0x7) frame_type=0 reason_len=0 reason=[]
I00000332 0xec523fe99840f9c17c868a88d649147814 frm tx 1531039643 1RTT PADDING(0x00) len=9

Note here that the sum of the two packet sizes (from capture) is the same as the
packet length reporte by ngtcp2: 187+38 = 225. It also seems that wireshark tries
to parse as much as packet into the same datagram, regardless of the QUIC protocol
rules.

Haproxy traces revealed that this could happen at least when probing the peer.
The recent low level packet building modifications aim was to build
as much as datagrams into the same buffer. But it seems that the
probing packet case treatment has been broken. That said, I have not
identified impacted commit. This issue could be reproduced inside
interop test environment (no possible git bisection).

To fix this, rely on the <probe> variable value to identify if the last
packet built by qc_prep_pkts() was a probing one, then try to
coalesce some others packet into the same datagram if this was not the case.
Of course the test on <probe> value has to be done before setting it
for the next packet.

Must be backported to 3.0.
2024-07-01 09:29:09 +02:00
Amaury Denoyelle
d5376b7a87 BUG/MINOR: quic: fix BUG_ON() on Tx pkt alloc failure
On quic_tx_packet allocation failure, it is possible to trigger BUG_ON()
crash on INITIAL packet building. This statement is responsible to
ensure INITIAL packets are padded to 1.200 bytes as required. If a
packet on higher encryption level allocation fails, PADDING frame cannot
properly encoded, despite the INITIAL packet properly built.

This crash happens due to qc_txb_store() invokation after quic_tx_packet
allocation failure to validate already built packets. However, this
statement is unneeded as qc_purge_tx_buf() is called just after. Simply
remove qc_txb_store() to fix this issue.

This was detected using -dMfail.

This should be backported up to 2.6.
2024-06-24 14:40:38 +02:00
Amaury Denoyelle
937324d493 BUG/MAJOR: quic: do not loop on emission on closing/draining state
To emit CONNECTION_CLOSE frame, a special buffer is allocated via
qc_txb_store(). This is due to QUIC_FL_CONN_IMMEDIATE_CLOSE flag.
However this flag is reset after qc_send_ppkts() invocation to prevent
reemission of CONNECTION_CLOSE frame.

qc_send() can invoke multiple times a series of qc_prep_pkts() +
qc_send_ppkts() to emit several datagrams. However, this may cause a
crash if on first loop a CONNECTION_CLOSE is emitted. On the next loop
iteration, QUIC_FL_CONN_IMMEDIATE_CLOSE is resetted, thus qc_prep_pkts()
will use the wrong buffer size as end delimiter. In some cases, this may
cause a BUG_ON() crash due to b_add() outside of buffer.

This bug can be reproduced by using a while loop of ngtcp2-client and
interrupting them randomly via Ctrl+C.

Here is the patch which introduce this regression :
  cdfceb10ae136b02e51f9bb346321cf0045d58e0
  MINOR: quic: refactor qc_prep_pkts() loop
2024-06-19 15:15:59 +02:00
Amaury Denoyelle
c714b6bb55 BUG/MAJOR: quic: fix padding with short packets
QUIC sending functions were extended to be more flexible. Of all the
changes, they support now iterating over a variable instance of QEL
instance of only 2 previously. This change has rendered PADDING emission
less previsible, which was adjusted via the following patch :

  a60609f1aa3e5f61d2a2286fdb40ebf6936a80ee
  BUG/MINOR: quic: fix padding of INITIAL packets

Its main purpose was to ensure PADDING would only be generated for the
last iterated QEL instance, to avoid unnecessary padding. In parallel, a
BUG_ON() statement ensure that built INITIAL packets are always padded
to 1.200 bytes as necessary before emitted them.

This BUG_ON() statement caused crash in one particular occurence : when
building datagrams that mixed Initial long packets and 1-RTT short
packets. This last occurence type does not have a length field in its
header, contrary to Long packets. This caused a miscalculation for the
necessary padding size, with INITIAL packets not padded enough to reach
the necessary 1.200 bytes size.

This issue was detected on 3.0.2. It can be reproduced by using 0-RTT
combined with latency. Here are the used commands :

  $ ngtcp2-client --tp-file=/tmp/ngtcp2-tp.txt \
    --session-file=/tmp/ngtcp2-session.txt --exit-on-all-streams-close \
    127.0.0.1 20443 "https://[::]/?s=32o"
  $ sudo tc qdisc add dev lo root netem latency 500ms

Note that this issue cannot be reproduced on current dev version.
Indeed, it seems that the following patch introduce a slight change in
packet building ordering :

  cdfceb10ae136b02e51f9bb346321cf0045d58e0
  MINOR: quic: refactor qc_prep_pkts() loop

This must be backported to 3.0.

This should fix github issue #2609.
2024-06-19 11:11:57 +02:00
Amaury Denoyelle
cdfceb10ae MINOR: quic: refactor qc_prep_pkts() loop
qc_prep_pkts() is built around a double loop iteration. First, it
iterates over every QEL instance register on sending. The inner loop is
used to repeatdly called qc_build_pkt() with a QEL instance. If the QEL
instance has no more data to sent, the next QEL entry is selected. It
can also be interrupted earlier if there is not enough room on the sent
buffer.

Clarify the inner loop by using qc_may_build_pkt() directly into it
besides the check on buffer room left. This function is used to test if
the QEL instance has something to send.

This should simplify send evolution, in particular GSO implementation.
2024-06-12 18:05:40 +02:00
Amaury Denoyelle
ba00431625 MINOR: quic: use global datagram headlen definition
Each emitted QUIC datagram is prefixed by an out-of-band header. This
header specify the datagram length and the pointer to the first QUIC
packet instance. This header length is defined via QUIC_DGRAM_HEADLEN.

Replace every occurences of manually calculated header length with
globally defined QUIC_DGRAM_HEADLEN. This should ease code maintenance
and simplify GSO implementation.
2024-06-12 18:05:40 +02:00
Amaury Denoyelle
88681681cc MINOR: quic: refactor qc_build_pkt() error handling
qc_build_pkt() error handling was difficult due to multiple error code
possible. Improve this by defining a proper enum to describe the various
error code. Also clean up ending labels inside qc_build_pkt().
2024-06-12 18:05:40 +02:00
Amaury Denoyelle
ab37b86921 OPTIM: quic: fill whole Tx buffer if needed
Previously, packets encoding was stopped as soon as buffer room left is
less than UDP MTU. This is suboptimal if the next packet would be
smaller than that.

To improve this, only check if there is at least enough room for the
mandatory packet header. qc_build_pkt() would ensure there is thus
responsible to return QC_BUILD_PKT_ERR_BUFROOM as soon as buffer left is
insufficient to stop packets encoding. An extra check is added to ensure
end pointer would never exceed buffer end.

This should not have any significant impact on the performance. However,
this renders the code intention clearer.
2024-06-12 18:05:40 +02:00
Amaury Denoyelle
a60609f1aa BUG/MINOR: quic: fix padding of INITIAL packets
API for sending has been extended to support emission on more than 2 QEL
instances. However, this has rendered the PADDING emission for INITIAL
packets less previsible. Indeed, if qc_send() is used with empty QEL
instances, a padding frame may be generated before handling the last QEL
registered, which could cause unnecessary padding to be emitted.

This commit simplify PADDING by only activating it for the last QEL
registered. This ensures that no superfluous padding is generated as if
the minimal INITIAL datagram length is reached, padding is resetted
before handling last QEL instance.

This bug is labelled as minor as haproxy already emit big enough INITIAL
packets coalesced with HANDSHAKE one without needing padding. This
however render the padding code difficult to test. Thus, it may be
useful to force emission on INITIAL qel only without coalescing
HANDSHAKE packet. Here is a sample to reproduce it :

--- a/src/quic_conn.c
+++ b/src/quic_conn.c
@@ -794,6 +794,14 @@ struct task *quic_conn_io_cb(struct task *t, void *context, unsigned int state)
                }
        }

+       if (qc->iel && qel_need_sending(qc->iel, qc)) {
+               struct list empty = LIST_HEAD_INIT(empty);
+               qel_register_send(&send_list, qc->iel, &qc->iel->pktns->tx.frms);
+               if (qc->hel)
+                       qel_register_send(&send_list, qc->hel, &empty);
+               qc_send(qc, 0, &send_list);
+       }
+
        /* Insert each QEL into sending list if needed. */
        list_for_each_entry(qel, &qc->qel_list, list) {
                if (qel_need_sending(qel, qc))

This should be backported up to 3.0.
2024-06-12 18:05:40 +02:00
Amaury Denoyelle
0ef94e2dff BUG/MINOR: quic: ensure Tx buf is always purged
quic_conn API for sending was recently refactored. The main objective
was to regroup the different functions present for both handshake and
application emission.

After this refactoring, an optimization was introduced to avoid calling
qc_send() if there was nothing new to emit. However, this prevent the Tx
buffer to be purged if previous sending was interrupted, until new
frames are finally available.

To fix this, simply remove the optimization. qc_send() is thus now
always called in quic_conn IO handlers.

The impact of this bug should be minimal as it happens only on sending
temporary error. However in this case, this could cause extra latency or
even a complete sending freeze in the worst scenario.

This must be backported up to 3.0.
2024-06-10 10:29:28 +02:00
Amaury Denoyelle
50470a5181 BUG/MINOR: quic: fix computed length of emitted STREAM frames
qc_build_frms() is responsible to encode multiple frames in a single
QUIC packet. It accounts for room left in the buffer packet for each
newly encded frame.

An incorrect computation was performed when encoding a STREAM frame in a
single packet. Frame length was accounted twice which would reduce in
excess the buffer packet room. This caused the remaining built frames to
be reduced with the resulting packet not able to fill the whole MTU.

The impact of this bug should be minimal. It is only present when
multiple frames are encoded in a single packet after a STREAM. However
in this case datagrams built are smaller than expecting, which is
suboptimal for bandwith.

This should be backported up to 2.6.
2024-06-10 10:24:02 +02:00
Amaury Denoyelle
5764bc50b5 BUG/MINOR: quic: adjust restriction for stateless reset emission
Review RFC 9000 and ensure restriction on Stateless reset are properly
enforced. After careful examination, several changes are introduced.

First, redefine minimal Stateless Reset emitted packet length to 21
bytes (5 random bytes + a token). This is the new default length used in
every case, unless received packet which triggered it is 43 bytes or
smaller.

Ensure every Stateless Reset packets emitted are at 1 byte shorter than
the received packet which triggered it. No Stateless reset will be
emitted if this falls under the above limit of 21 bytes. Thus this
should prevent looping issues.

This should be backported up to 2.6.
2024-05-24 14:36:31 +02:00
Willy Tarreau
72d0dcda8e MINOR: dynbuf: pass a criticality argument to b_alloc()
The goal is to indicate how critical the allocation is, between the
least one (growing an existing buffer ring) and the topmost one (boot
time allocation for the life of the process).

The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function
to try to allocate otherwise subscribe. There's currently no distinction
of direction nor part that tries to allocate, and this should be revisited
to improve this situation, particularly when we consider that mux-h2 can
reduce its Tx allocations if needed.

For now, 4 main levels are planned, to translate how the data travels
inside haproxy from a producer to a consumer:
  - MUX_RX:   buffer used to receive data from the OS
  - SE_RX:    buffer used to place a transformation of the RX data for
              a mux, or to produce a response for an applet
  - CHANNEL:  the channel buffer for sync recv
  - MUX_TX:   buffer used to transfer data from the channel to the outside,
              generally a mux but there can be a few specificities (e.g.
              http client's response buffer passed to the application,
              which also gets a transformation of the channel data).

The other levels are a bit different in that they don't strictly need to
allocate for the first two ones, or they're permanent for the last one
(used by compression).
2024-05-10 17:18:13 +02:00
Ilia Shipitsin
a65c6d3574 CLEANUP: assorted typo fixes in the code and comments
This is 42nd iteration of typo fixes
2024-05-03 09:01:36 +02:00
Amaury Denoyelle
c6e3d60fc1 OPTIM: quic: do not call qc_prep_pkts() if everything sent
qc_send() is implemented as a loop to repeatedly invoke
qc_prep_pkts()/qc_send_ppkts(). This ensures that all data are emitted
even if bigger that a single Tx buffer instance. This is useful if
congestion window is empty but big enough for application data.

Looping is interrupted if qc_prep_pkts() returns a negative error
code, for example due to no space left in congestion window. It can also
returns 0 if no input data to sent, which also interrupt the loop.

To limit this last case, removed quic_enc_level from send_list each time
everything already send via qc_prep_pkts(). Loop can then be interrupted
as soon as send_list is empty, avoiding an extra superfluous call to
qc_prep_pkts().
2024-04-10 11:18:01 +02:00
Amaury Denoyelle
34b31d85cb OPTIM: quic: do not call qc_send() if nothing to emit
qc_send() was systematically called by quic_conn IO handlers with all
instantiated quic_enc_level. Change this to only register quic_enc_level
for send if needed. Do not call at all qc_send() if no qel registered.

A new function qel_need_sending() is defined to detect if sending is
required. First, it checks if quic_enc_level has prepared frames or
probing is set. It can also returns true if ACK required either on
quic_enc_level itself or because of quic_conn ack timer fired. Finally,
a CONNECTION_CLOSE emission for quic_conn is also a valid case.

This should reduce the number of invocations of qc_send(). This could
improve slightly performance, as well as simplify traces debugging.
2024-04-10 11:17:21 +02:00
Amaury Denoyelle
7fc1ce5bc8 MEDIUM: quic: remove duplicate hdshk/app send functions
A series of previous patches have clean up sending function for
handshake case. Their new exposed API is now flexible enough to convert
app case to use the same functions.

As such, qc_send_hdshk_pkts() is renamed qc_send() and become the single
entry point for QUIC emission. It is used during application packets
emission in quic_conn_app_io_cb(), qc_send_mux(). Also the internal
function qc_prep_hpkts() is renamed qc_prep_pkts().

Remove the new unneeded qc_send_app_pkts() and qc_prep_app_pkts().

Also removed qc_send_app_probing(). It was a simple wrapper over other
application send functions. Now, default qc_send() can be reuse for such
cases with <old_data> argument set to true.

An adjustment was needed when converting qc_send_hdshk_pkts() to the
general qc_send() version. Previously, only a single packets
encoding/emission cycle was performed. This was enough as handshake
packets are always smaller than Tx buffer. However, it may be possible
to emit more application data. As such, a loop is necessary to perform
multiple encoding/emission cycles, as this was already the case in
qc_send_app_pkts().

No functional difference should happen with this commit. However, as
these are critcal functions with a lot of changes, this patch is
labelled as medium.
2024-04-10 11:07:35 +02:00
Amaury Denoyelle
4e4127a66d MINOR: quic: use qc_send_hdshk_pkts() in handshake IO cb
quic_conn_io_cb() manually implements emission by using lower level
functions qc_prep_pkts() and qc_send_ppkts(). Replace this by using the
higher level function qc_send_hdshk_pkts() which notably handle buffer
allocation and purging.

This allows to clean up send API by flagging qc_prep_pkts() and
qc_send_ppkts() as static. They are now used in a single location inside
qc_send_hdshk_pkts().
2024-04-10 11:07:19 +02:00
Amaury Denoyelle
3a8f4761e7 MINOR: quic: improve sending API on retransmit
qc_send_hdshk_pkts() is a wrapper for qc_prep_hpkts() used on
retransmission. It was restricted to use two quic_enc_level pointers as
distinct arguments. Adapt it to directly use the same list of
quic_enc_level which is passed then to qc_prep_hpkts().

Now for retransmission quic_enc_level send list is built directly into
qc_dgrams_retransmit() which calls qc_send_hdshk_pkts().

Along this change, a new utility function qel_register_send() is
defined. It is an helper to build the quic_enc_level send list. It
enfores that each quic_enc_level instance is only registered in a single
list to prevent memory issues. It is both used in qc_dgrams_retransmit()
and quic_conn_io_cb().
2024-04-10 11:06:55 +02:00
Amaury Denoyelle
93f5b4c8ae MINOR: quic: uniformize sending methods for handshake
Emission of packets during handshakes was implemented via an API which
uses two alternative ways to specify the list of frames.

The first one uses a NULL list of quic_enc_level as argument for
qc_prep_hpkts(). This was an implicit method to iterate on all qels
stored in quic_conn instance, with frames already inserted in their
corresponding quic_pktns.

The second method was used for retransmission. It uses a custom local
quic_enc_level list specified by the caller as input to qc_prep_hpkts().
Frames were accessible through <retransmit> list pointers of each
quic_enc_level used in an implicit mechanism.

This commit clarifies the API by using a single common method. Now
quic_enc_level list must always be specified by the caller. As for
frames list, each qels must set its new field <send_frms> pointer to the
list of frames to send. Callers of qc_prep_hpkts() are responsible to
always clear qels send list. This prevent a single instance of
quic_enc_level to be inserted while being attached to another list.

This allows notably to clean up some unnecessary code. First,
<retransmit> list of quic_enc_level is removed as it is replaced by new
<send_frms>. Also, it's now possible to use proper list_for_each_entry()
inside qc_prep_hpkts() to loop over each qels. Internal functions for
quic_enc_level selection is now removed.
2024-04-10 11:06:41 +02:00
Amaury Denoyelle
44eec848e8 MINOR: quic: simplify qc_send_hdshk_pkts() return
Clean up trailer of qc_send_hdshk_pkts() by removing label "leave". Only
"out" label is now used. This operation is safe as LIST_DEL_INIT() is
idempotent. Caller of qc_send_hdshk_pkts() also ensures input frame
lists are freed, so it's better to always reset quic_enc_level
<retrans_frms> member.

Also take the opportunity to reset QUIC_FL_CONN_RETRANS_OLD_DATA only if
already set. This is considered more robust and will also remove
unneeded trace occurences.

No functional change. The main objective of this commit is to clean up
code in preparation of a refactoring on send functions.
2024-04-10 10:14:36 +02:00
Willy Tarreau
c499cd15c7 BUG/MEDIUM: quic: don't blindly rely on unaligned accesses
There are several places where the QUIC low-level code performs unaligned
accesses by casting unaligned char* pointers to uint32_t, but this is
totally forbidden as it only works on machines that support unaligned
accesses, and either crashes on other ones (SPARC, MIPS), can result in
reading garbage (ARMv5) or be very slow due to the access being emulated
(RISC-V). We do have functions for this, such as read_u32() and write_u32()
that rely on the compiler's knowledge of the machine's capabilities to
either perform an unaligned access or do it one byte at a time.

This must be backported at least as far as 2.6. Some of the code moved a
few times since, so in order to figure the points that need to be fixed,
one may look for a forced pointer cast without having verified that either
the machine is compatible or that the pointer is aligned using this:

  $ git grep 'uint[36][24]_t \*)'

Or build and run the code on a MIPS or SPARC and perform requests using
curl to see if they work or crash with a bus error. All the places fixed
in this commit were found thanks to an immediate crash on the first
request.

This was tagged medium because the affected archs are not the most common
ones where QUIC will be found these days.
2024-04-06 00:07:49 +02:00
Frederic Lecaille
a305bb92b9 MINOR: quic: HyStart++ implementation (RFC 9406)
This is a simple algorithm to replace the classic slow start phase of the
congestion control algorithms. It should reduce the high packet loss during
this step.

Implemented only for Cubic.
2024-04-02 18:47:19 +02:00
Frédéric Lécaille
95e9033fd2 REORG: quic: Add a new module for retransmissions
Move several functions in relation with the retransmissions from TX part
(quic_tx.c) to quic_retransmit.c new C file.
2023-11-28 15:47:18 +01:00
Frédéric Lécaille
714d1096bc REORG: quic: Move qc_notify_send() to quic_conn
Move qc_notify_send() from quic_tx.c to quic_conn.c. Note that it was already
exported from both quic_conn.h and quic_tx.h. Modify this latter header
to fix the duplication.
2023-11-28 15:47:18 +01:00
Frédéric Lécaille
b5970967ca REORG: quic: Add a new module for QUIC retry
Add quic_retry.c new C file for the QUIC retry feature:
   quic_saddr_cpy() moved from quic_tx.c,
   quic_generate_retry_token_aad() moved from
   quic_generate_retry_token() moved from
   parse_retry_token() moved from
   quic_retry_token_check() moved from
   quic_retry_token_check() moved from
2023-11-28 15:47:18 +01:00
Frédéric Lécaille
0b872e24cd REORG: quic: Move qc_may_probe_ipktns() to quic_tls.h
This function is in relation with the Initial packet number space which is
more linked to the QUIC TLS specifications. Let's move it to quic_tls.h
to be inlined.
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
c93ebcc59b REORG: quic: Move quic_build_post_handshake_frames() to quic_conn module
Move quic_build_post_handshake_frames() from quic_rx.c to quic_conn.c. This
is a function which is also called from the TX part (quic_tx.c).
2023-11-28 15:37:50 +01:00