quic-conn layer has to handle itself STREAM frames after MUX release. If
the stream was already seen, it is probably only a retransmitted frame
which can be safely ignored. For other streams, an active closure may be
needed.
Thus it's necessary that quic-conn layer knows the highest stream ID
already handled by the MUX after its release. Previously, this was done
via <nb_streams> member array in quic-conn structure.
Refactor this by replacing <nb_streams> by two members called
<stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for
quic-conn layer to monitor locally opened uni streams, as the peer
cannot by definition emit a STREAM frame on it. Also, bidirectional
streams are always opened by the remote side.
Previously, <nb_streams> were set by quic-stream layer. Now,
<stream_max_uni>/<stream_max_bidi> members are only set one time, just
prior to QUIC MUX release. This is sufficient as quic-conn do not use
them if the MUX is available.
Note that previously, IDs were used relatively to their type, thus
incremented by 1, after shifting the original value. For simplification,
use the plain stream ID, which is incremented by 4.
Add accounting at qc_stream_desc level to be able to report the number
of allocated Tx buffers and the sum of their data. This represents data
ready for emission or already emitted and waiting on ACK.
To simplify this accounting, a new counter type bdata_ctr is defined in
quic_utils.h. This regroups both buffers and data counter, plus a
maximum on the buffer value.
These values are now displayed on QCS info used both on logline and
traces, and also on "show quic" output.
Out-of-order STREAM ACK are buffered in its related streambuf tree. On
insertion, overlapping or contiguous ranges are merged together. The
total size of buffered ack range is stored in <room> streambuf member
and reported to QUIC MUX layer on streambuf release. The objective is to
ensure QUIC MUX layer can allocate Tx buffers conveniently to preserve a
good transfer throughput.
Streamdesc is the overall container of many streambufs. It may also been
released when its upper QCS instance is freed, after all stream data
have been emitted. In this case, the active streambuf is also released
via custom code. However, in this code path, <room> was not reported to
the QUIC MUX layer.
This bug caused wrong estimation for the QUIC MUX txbuf window, with
bytes reamining even after all ACK reception. This may cause transfer
freeze on other connection streams, with RESET_STREAM emission on
timeout client.
To fix this, reuse the existing qc_stream_buf_release() function on
streamdesc release. This ensures that notify_room is correctly used.
No need to backport.
To properly decount out-of-order acked data range, contiguous or
overlapping ranges are first merged before their insertion in a tree.
The first step ensure that a newly reported range is not completely
covered by the existing tree ranges. However, one of the condition was
incorrect. Fix this to ensure that the final range tree does not contain
duplicated entry.
The impact of this bug is unknown. However, it may have allowed the
insertion of overlapping ranges, which could in turn cause an error in
QUIC MUX txbuf window, with a possible transfer freeze.
No need to backport.
This commit is the last one of a serie whose objective is to restore
QUIC transfer throughput performance to the state prior to the recent
QUIC MUX buffer allocator rework.
This gain is obtained by reporting received out-of-order ACK data range
to the QUIC MUX which can then decount room in its txbuf window. This is
implemented in QUIC streamdesc layer by adding a new invokation of
notify_room callback. This is done into qc_stream_buf_store_ack() which
handle out-of-order ACK data range.
Previous commit has introduced merging of overlapping ACK data range. As
such, it's easy to only report the newly acknowledged data range.
As with in-order ACKs, this new notification is only performed on
released streambuf. As such, when a streambuf instance is released,
notify_room notification now also reports the total length of
out-of-order ACK data range currently stored. This value is stored in a
new streambuf member <room> to avoid unnecessary tree lookup.
This <room> member also serves on in-order ACK notification to reduce
the notified room. This prevents to report invalid values when overlap
ranges are treated first out-of-order and then in-order, which would
cause an invalid QUIC MUX txbuf window value.
After this change has been implemented, performance has been
significantly improved, both with ngtcp2-client rate usage and on
interop goodput test. These values are now similar to the rate observed
on older haproxy version before QUIC MUX buffer allocator rework.
Transfer throughput was deteriorated since recent rework of QUIC MUX
txbuf allocator. This was partially restorated with the commit to
decount individual in-order ACK from the MUX buffer window.
To fully retrieve the old performance level, all ACKs must be decounted
when handled by QUIC streamdesc layer, event out-of-order ranges.
However, this is not easily implemented as several ranges may exist in
parallel with overlap on the underlying data. It would cause
miscalculation for QUIC MUX buffer window if such ranges were blindly
reported.
The proper solution is to first implement merge of contiguous or
overlapping ACK data ranges to reduce the number of stored ranges to the
minimal. This is the purpose of this patch. This is implemented in a new
static function named qc_stream_buf_store_ack() into streamdesc layer.
The merge algorithm is simple enough. First, it ensures the newly added
range is not already fully covered by a preexisting entry. Then, it
checks if there is contiguity/overlap with one or several ranges
starting at the same of a greater offset. If true, the newly added entry
is extended to cover them all, and all contiguous/overlapped ranges are
removed. Finally, if there is contiguity or overlap with an entry
starting at a smaller offset, no new range is instantiated and instead
the smaller offset is extended.
Now that contiguous or overlapped ranges cannot exits anymore, ACK data
ranges tree instiatiation can used EB_ROOT_UNIQUE.
Outside of the longer term objective which is to decount out-of-order
ACKs from MUX txbuf window, this commit could also improve some
performance and/or memory usage for connections where stream data
fragmentation and packet reording is high.
QUIC streamdesc layer is responsible to handle reception of ACK for
streams. It removes stream data from the underlying buffers on ACK
reception.
Streamdesc layer treats ACK in order at the stream level. Out of order
ACKs are buffered in a tree until they can be handled on older data
acknowledgement reception. Previously, qf_stream instance which comes
from the quic_tx_packet was used as tree node to buffer such ranges.
Introduce a new type dedicated to represent out of order stream ack data
range. This type is named qc_stream_ack. It contains minimal infos only
relative to the acknowledged stream data range.
This allows to reduce size of frequently used quic_frame with the
removal of tree node from qf_stream. Another side effect of this change
is that now quic_frame are always released immediately on ACK reception,
both in-order and out-of-order. This allows to also release the
quic_tx_packet instance which should reduce memory consumption.
The drawback of this change is that qc_stream_ack instance must be
allocated on out-of-order ACK reception. As such, qc_stream_desc_ack()
may fail if an error happens on allocation. For the moment, such error
is silenly recovered up to qc_treat_rx_pkts() with the dropping of the
received packet containing the ACK frame. In the future, it may be
useful to close the connection as this error may only happens on low
memory usage.
Recently, a new allocation mechanism was implemented for Tx buffers used
by QUIC MUX. Now, underlying congestion window size is used to determine
if it is still possible or not to allocate a new buffer when necessary.
This mechanism has render the QUIC stack more flexible. However, it also
has brought some performance degradation, with transfer time longer in
certain environment. It was first discovered on the measurement results
of the interop. It can also easily be reproduced using the following
ngtcp2-client example which forces a very small congestion window due to
frequent loss :
$ ngtcp2-client -q --no-quic-dump --no-http-dump --exit-on-all-streams-close -r 0.1 127.0.0.1 20443 "https://[::]:20443/?s=10m"
This performance decrease is caused by the allocator which is now too
strict. It may cause buffer underrun frequently at the MUX layer when
the congestion window is too small, as new buffers cannot be allocated
until the current one is fully acknowledged. This resuls in transfers
with very bad throughput utilisation. The objective of this new serie of
patches is to relax some restrictions to permit QUIC MUX to allocate new
buffers more quickly, while preserving the initial limitation based on
congestion window size.
An interesting method for this is to notify QUIC MUX about newly
available room on individual ACK reception, without waiting for the full
bffer acknowledgement. This is easily implemented by adding a new
notify_room invokation in QUIC streamdesc layer on ACK reception.
However, ACK reception are handled in-order at the stream level. Out of
order ACKs are buffered and are not decounted for now. This will be
implemented in a future commit.
Note that for a single buffer instance, data can in parallel be written
by QUIC MUX and removed on ACK reception. This could cause room
notification to QUIC MUX layer to report invalid values. As such, ACK
reception are only accounted for released buffers. This ensures that
such buffers won't received any new data. In the same time, buffer room
is notified on release operation as it does not need acknowledgement.
This commit has permit to improve performance for the ngtcp2-client
scenario above. However, it is not yet sufficient enough for interop
goodput test.
Fix NULL argument pass to qc_release_frm(). This allows to give more
context on the traces inside it. Note that no crash occured as QUIC
traces always check validity on first arg before derefencing it.
No backport needed.
For the moment, streamdesc layer can only deal with in-order ACK at the
stream level. Received out-of-order ACKs are buffered in a tree attached
to a streambuf instance.
Previously, caller of qc_stream_desc_ack() was responsible to implement
consumption of these buffered ACKs. Refactor this by implementing it
directly at the streamdesc layer within qc_stream_desc_ack(). This
simplifies quic_rx ACK handling and ensure buffered ACKs are consumed as
soon as possible.
qc_stream_desc_ack() is the entrypoint for streamdesc layer to handle a
new acknowledgement of previously emitted STREAM data.
Previously, it was only able to deal with in-order ACK offset. The
caller was responsible to buffer out-of-order ACKs. Change this by
dealing with the latter case directly in qc_stream_desc_ack(). This
notably simplify ACK handling in quic_rx module.
QUIC streamdesc layer is used to manage QUIC MUX stream txbuf data
storage until acknowledgment. Currently, it only supports in-order
acknowledgment at the stream level. This requires to be able to buffer
out-of-order ACKs until they can be handled.
Previously, these ACKs were stored in a tree to the streamdesc instance.
Move this indexed storage at the streambuf instance.
This commit is purely an architecture change. However, it will allow to
extend ACK management in future patches, such as the ability to merge
overlapping out-of-order ACKs.
qc_stream_desc layer is used by QUIC MUX to store emitted STREAM data
until their acknowledgement. Each stream with Tx capability can allocate
its own qc_stream_desc. In turn, each stream desc can have one or
multiple data buffers. This is useful when a MUX stream releases a
buffer and allocate a new one, to preserve bandwith without waiting to
receive all acknowledgement of the previous buffer.
Each buffer is encapsulated in a qc_stream_buf structure. Previously, it
was stored as a list into qc_stream_desc. Change this storage to use a
tree instead. Each buffer is indexed by their offset.
This commit does not introduce functional changes. However, this
rearchitecture will be necessary for future commit to extend ACK
management which require fetching individual buffer instance, not just
the first or last element of a streamdesc, by their offset.
qc_stream_desc_ack() is used to handle ACK received for STREAM frame. It
removes acknowledged data from their underlying buffer.
If all data were removed after ACK handling, qc_stream_desc instance
would automatically be freed at the end of qc_stream_desc_ack().
However, this renders the function complicated to use. Simplify this by
removing this automatic removal. Now, caller is responsible to check
after ACK handling if qc_stream_desc instance can be removed. This is
easily done using qc_stream_desc_done() helper.
qc_stream_desc is an intermediary layer between QUIC MUX and quic_conn.
It is a facility which permits to store data to emit and keep them for
retransmission until acknowledgment. This layer is responsible to notify
QUIC MUX each time a buffer is freed. This is necessary as MUX buffer
allocation is limited by the underlying congestion window size.
Refactor this to use a mechanism similar to send notification. A new
callback notify_room can now be registered to qc_stream_desc instance.
This is set by QUIC MUX to qmux_ctrl_room(). On MUX QUIC free, special
care is now taken to reset notify_room callback to NULL.
Thanks to this refactoring, further adjustment have been made to refine
the architecture. One of them is the removal of qc_stream_desc
QC_SD_FL_OOB_BUF, which is now converted to a MUX layer flag
QC_SF_TXBUF_OOB.
Previous commit implement a refactor of MUX send notification from
quic_conn layer. With this new architecture, a proper callback is
defined for each qc_stream_desc instance.
This architecture change allows to simplify notification from quic_conn
layer. First, ensure the MUX callback to properly ignore retransmission
of an already emitted frame. Luckily, this can be handled easily by
comparing offsets and FIN status. Also, each QCS instance can now be
unregistered from send notification just prior qc_stream_desc releasing.
This ensures a QCS is never manipulated from quic_conn after its
emission ending. Both these changes render the send notification more
robust. As a nice effect, flag QUIC_FL_CONN_TX_MUX_CONTEXT can be
removed as it is now unneeded.
For STREAM emission, MUX QUIC generates one or several frames and emit
them via qc_send_mux(). Lower layer may use them as-is, or split them to
lower chunk to fit in a QUIC packet. It is then responsible to notify
the MUX to report the amount of data sent.
Previously, this was done via a direct call from quic_conn to MUX using
qcc_streams_sent_done(). Modify this to have a better isolation accross
layers. Define a send callback handled by the qc_stream_desc instance.
This allows the MUX to register each QCS instance individually to the
renamved qmux_ctrl_send() which replaces qcc_streams_sent_done().
At quic_conn layer, qc_stream_desc_send() can be used now. This is a
wrapper to qc_stream_desc layer to invoke the send callback if
registered.
This mechanism of qc_stream_desc callback should be extended later to
implement other notifications accross the QUIC stack.
When a stream buffer is freed, qc_stream_desc notify MUX. This is useful
if MUX is waiting for Tx buffer allocation.
Remove this notification in qc_stream_desc(). This is because the
function is called when all stream data have been acknowledged and thus
notified. This function can also be called with some data
unacknowledged, but in this case this is only true just before
connection closure. As such, it is useful to notify the MUX in this
condition.
QUIC application protocol layer has the ability to either allocate a
standard buffer or a smaller one. The latter is useful when only small
data are transferred to prevent consuming too much of the QUIC MUX
buffer window.
This operation is performed using qc_stream_buf_realloc(). Add a new
BUG_ON() in it to ensure no data is present in the buffer. Indeed, this
would cause to data loss, or even crash when trying to acknowledge data.
Note that for the moment qc_stream_buf_realloc() is only use for HTTP/3
headers transmission, and this usage is conform to the new BUG_ON. This
commit is thus not a bug fix, but only to strengthen the API.
Previous commit switch to small buffers for HTTP/3 HEADERS emission.
This ensures that several parallel streams can allocate their own buffer
without hitting the connection buffer limit based now on the congestion
window size.
However, this prevents the transmission of responses with uncommonly
large headers. Indeed, if all headers cannot be encoded in a single
buffer, an error is reported which cause the whole connection closure.
Adjust this by implementing a realloc API exposed by QUIC MUX. This
allows application layer to switch from a small to a default buffer and
restart its processing. This guarantees that again headers not longer
than bufsize can be properly transferred.
This patch extends qc_stream_desc API to be able to allocate small
buffers. QUIC MUX API is similarly updated as ultimatly each application
protocol is responsible to choose between a default or a smaller buffer.
Internally, the type of allocated buffer is remembered via qc_stream_buf
instance. This is mandatory to ensure that the buffer is released in the
correct pool, in particular as small and standard buffers can be
configured with the same size.
This commit is purely an API change. For the moment, small buffers are
not used. This will changed in a dedicated patch.
Define a new buffer pool reserved to allocate smaller memory area. For
the moment, its usage will be restricted to QUIC, as such it is declared
in quic_stream module.
Add a new config option "tune.bufsize.small" to specify the size of the
allocated objects. A special check ensures that it is not greater than
the default bufsize to avoid unexpected effects.
Each QUIC MUX may allocate buffers for MUX stream emission. These
buffers are then shared with quic_conn to handle ACK reception and
retransmission. A limit on the number of concurrent buffers used per
connection has been defined statically and can be updated via a
configuration option. This commit replaces the limit to instead use the
current underlying congestion window size.
The purpose of this change is to remove the artificial static buffer
count limit, which may be difficult to choose. Indeed, if a connection
performs with minimal loss rate, the buffer count would limit severely
its throughput. It could be increase to fix this, but it also impacts
others connections, even with less optimal performance, causing too many
extra data buffering on the MUX layer. By using the dynamic congestion
window size, haproxy ensures that MUX buffering corresponds roughly to
the network conditions.
Using QCC <buf_in_flight>, a new buffer can be allocated if it is less
than the current window size. If not, QCS emission is interrupted and
haproxy stream layer will subscribe until a new buffer is ready.
One of the criticals parts is to ensure that MUX layer previously
blocked on buffer allocation is properly woken up when sending can be
retried. This occurs on two occasions :
* after an already used Tx buffer is cleared on ACK reception. This case
is already handled by qcc_notify_buf() via quic_stream layer.
* on congestion window increase. A new qcc_notify_buf() invokation is
added into qc_notify_send().
Finally, remove <avail_bufs> QCC field which is now unused.
This commit is labelled MAJOR as it may have unexpected effect and could
cause significant behavior change. For example, in previous
implementation QUIC MUX would be able to buffer more data even if the
congestion window is small. With this patch, data cannot be transferred
from the stream layer which may cause more streams to be shut down on
client timeout. Another effect may be more CPU consumption as the
connection limit would be hit more often, causing more streams to be
interrupted and woken up in cycle.
Define a new QCC counter named <buf_in_flight>. Its purpose is to
account the current sum of all allocated stream buffer size used on
emission.
For this moment, this counter is updated and buffer allocation and
deallocation. It will be used to replace <avail_bufs> once congestion
window is used as limit for buffer allocation in a future commit.
Define a new qc_stream_desc flag QC_SD_FL_OOB_BUF. This is to mark
streams which are not subject to the connection limit on allocated MUX
stream buffer.
The purpose is to simplify handling of QUIC MUX streams which do not
transfer data and as such are not driven by haproxy layer, for example
HTTP/3 control stream. These streams interacts synchronously with QUIC
MUX and cannot retry emission in case of temporary failure.
This commit will be useful once connection buffer allocation limit is
reimplemented to directly rely on the congestion window size. This will
probably cause the buffer limit to be reached more frequently, maybe
even on QUIC MUX initialization. As such, it will be possible to mark
control streams and prevent them to be subject to the buffer limit.
QUIC MUX expose a new function qcs_send_metadata(). It can be used by an
application protocol to specify which streams are used for control
exchanges. For the moment, no such stream use this mechanism.
A limit per connection is put on the number of buffers allocated by QUIC
MUX for emission accross all its streams. This ensures memory
consumption remains under control. This limit is simply explained as a
count of buffers which can be concurrently allocated for each
connection.
As such, quic_conn structure was used to account currently allocated
buffers. However, a quic_conn nevers allocates new stream buffers. This
is only done at QUIC MUX layer. As such, this commit moves buffer
accounting inside QCC structure. This simplifies the API, most notably
qc_stream_buf_alloc() usage.
Note that this commit inverts the accounting. Previously, it was
initially set to 0 and increment for each allocated buffer. Now, it is
set to the maximum value and decrement for each buf usage. This is
considered as clearer to use.
This commit simply adjusts QUIC stream buffer allocation. This operation
is conducted by QUIC MUX using qc_stream_desc layer. Previously,
qc_stream_buf_alloc() would return a qc_stream_buf instance and QUIC MUX
would finalized the buffer area allocation. Change this to perform the
buffer allocation directly into qc_stream_buf_alloc().
This patch clarifies the interaction between QUIC MUX and
qc_stream_desc. It is cleaner to allocate the buffer via qc_stream_desc
as it is already responsible to free the buffer.
It also ensures that connection buffer accounting is only done after the
whole qc_stream_buf and its buffer are allocated. Previously, the
increment operation was performed between the two steps. This was not an
issue, as this kind of error triggers the whole connection closure.
However, if in the future this is handled as a stream closure instead,
this commit ensures that the buffer remains valid in all cases.
A connection freeze may occur if a QCS is released before transmitting
any data. This can happen when an error is detected early by the stream,
for example during HTTP response headers encoding, forcing the whole
connection closure.
In this case, a connection error is registered by the QUIC MUX to the
lower layer. MUX is then release and xprt layer is notified to prepare
CONNECTION_CLOSE emission. However, this is prevented because quic_conn
streams tree is not empty as it contains the qc_stream_desc previously
attached to the failed QCS instance. The connection will freeze until
QUIC idle timeout.
This situation is caused by an omission during qc_stream_desc release
operation. In the described situation, qc_stream_desc current buffer is
empty and can thus by removed, which is the purpose of this patch. This
unblocks this previously failed situation, with qc_stream_desc removal
from quic_conn tree.
This issue can be reproduced by modifying H3/QPACK code to return an
early error during HEADERS response processing.
This must be backported up to 2.6, after a period of observation.
Add a new BUG_ON() in qc-stream_desc_ack(). It ensures that
acknowledgement are always notify in-order. This is because out-of-order
ACKs cannot be handled by qc_stream_desc layer which does not support
gap in STREAM sent data.
Prior to this fix, out-of-order ACKs are simply ignored without any
error. This currently cannot happen thanks to careful
qc_stream_desc_ack() invokation. If this assumption is broken in the
future by inatteion, this would cause loss of ACK notification which
will prevent qc_stream_desc release.
STREAM frames have dedicated handling on retransmission. A special check
is done to remove data already acked in case of duplicated frames, thus
only unacked data are retransmitted.
This handling is faulty in case of an empty STREAM frame with FIN set.
On retransmission, this frame does not cover any unacked range as it is
empty and is thus discarded. This may cause the transfer to freeze with
the client waiting indefinitely for the FIN notification.
To handle retransmission of empty FIN STREAM frame, qc_stream_desc layer
have been extended. A new flag QC_SD_FL_WAIT_FOR_FIN is set by MUX QUIC
when FIN has been transmitted. If set, it prevents qc_stream_desc to be
freed until FIN is acknowledged. On retransmission side,
qc_stream_frm_is_acked() has been updated. It now reports false if
FIN bit is set on the frame and qc_stream_desc has QC_SD_FL_WAIT_FOR_FIN
set.
This must be backported up to 2.6. However, this modifies heavily
critical section for ACK handling and retransmission. As such, it must
be backported only after a period of observation.
This issue can be reproduced by using the following socat command as
server to add delay between the response and connection closure :
$ socat TCP-LISTEN:<port>,fork,reuseaddr,crlf SYSTEM:'echo "HTTP/1.1 200 OK"; echo ""; sleep 1;'
On the client side, ngtcp2 can be used to simulate packet drop. Without
this patch, connection will be interrupted on QUIC idle timeout or
haproxy client timeout with ERR_DRAINING on ngtcp2 :
$ ngtcp2-client --exit-on-all-streams-close -r 0.3 <host> <port> "http://<host>:<port>/?s=32o"
Alternatively to ngtcp2 random loss, an extra haproxy patch can also be
used to force skipping the emission of the empty STREAM frame :
diff --git a/include/haproxy/quic_tx-t.h b/include/haproxy/quic_tx-t.h
index efbdfe687..1ff899acd 100644
--- a/include/haproxy/quic_tx-t.h
+++ b/include/haproxy/quic_tx-t.h
@@ -26,6 +26,8 @@ extern struct pool_head *pool_head_quic_cc_buf;
/* Flag a sent packet as being probing with old data */
#define QUIC_FL_TX_PACKET_PROBE_WITH_OLD_DATA (1UL << 5)
+#define QUIC_FL_TX_PACKET_SKIP_SENDTO (1UL << 6)
+
/* Structure to store enough information about TX QUIC packets. */
struct quic_tx_packet {
/* List entry point. */
diff --git a/src/quic_tx.c b/src/quic_tx.c
index 2f199ac3c..2702fc9b9 100644
--- a/src/quic_tx.c
+++ b/src/quic_tx.c
@@ -318,7 +318,7 @@ static int qc_send_ppkts(struct buffer *buf, struct ssl_sock_ctx *ctx)
tmpbuf.size = tmpbuf.data = dglen;
TRACE_PROTO("TX dgram", QUIC_EV_CONN_SPPKTS, qc);
- if (!skip_sendto) {
+ if (!skip_sendto && !(first_pkt->flags & QUIC_FL_TX_PACKET_SKIP_SENDTO)) {
int ret = qc_snd_buf(qc, &tmpbuf, tmpbuf.data, 0, gso);
if (ret < 0) {
if (gso && ret == -EIO) {
@@ -354,6 +354,7 @@ static int qc_send_ppkts(struct buffer *buf, struct ssl_sock_ctx *ctx)
qc->cntrs.sent_bytes_gso += ret;
}
}
+ first_pkt->flags &= ~QUIC_FL_TX_PACKET_SKIP_SENDTO;
b_del(buf, dglen + QUIC_DGRAM_HEADLEN);
qc->bytes.tx += tmpbuf.data;
@@ -2066,6 +2067,17 @@ static int qc_do_build_pkt(unsigned char *pos, const unsigned char *end,
continue;
}
+ switch (cf->type) {
+ case QUIC_FT_STREAM_8 ... QUIC_FT_STREAM_F:
+ if (!cf->stream.len && (qc->flags & QUIC_FL_CONN_TX_MUX_CONTEXT)) {
+ TRACE_USER("artificially drop packet with empty STREAM frame", QUIC_EV_CONN_TXPKT, qc);
+ pkt->flags |= QUIC_FL_TX_PACKET_SKIP_SENDTO;
+ }
+ break;
+ default:
+ break;
+ }
+
quic_tx_packet_refinc(pkt);
cf->pkt = pkt;
}
qc_stream_desc had a field <release> used as a boolean. Convert it with
a new <flags> field and QC_SD_FL_RELEASE value as equivalent.
The purpose of this patch is to be able to extend qc_stream_desc by
adding newer flags values. This patch is required for the following
patch
BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
As such, it must be backported prior to it.
This commit is a direct follow-up on the major rearchitecture of send
buffering. This patch implements the proper handling of connection pool
buffer temporary exhaustion.
The first step is to be able to differentiate a fatal allocation error
from a temporary pool exhaustion. This is done via a new output argument
on qcc_get_stream_txbuf(). For a fatal error, application protocol layer
will schedule the immediate connection closing. For a pool exhaustion,
QCC is flagged with QC_CF_CONN_FULL and stream sending process is
interrupted. QCS instance is also registered in a new list
<qcc.buf_wait_list>.
A new connection buffer can become available when all ACKs are received
for an older buffer. This process is taken in charge by quic-conn layer.
It uses qcc_notify_buf() function to clear QC_CF_CONN_FULL and to wake
up every streams registered on buf_wait_list to resume sending process.
Previously, QUIC MUX sending was implemented with data transfered along
two different buffer instances per stream.
The first QCS buffer was used for HTX blocks conversion into H3 (or
other application protocol) during snd_buf stream callback. QCS instance
is then registered for sending via qcc_io_cb().
For each sending QCS, data memcpy is performed from the first to a
secondary buffer. A STREAM frame is produced for each QCS based on the
content of their secondary buffer.
This model is useful for QUIC MUX which has a major difference with
other muxes : data must be preserved longer, even after sent to the
lower layer. Data references is shared with quic-conn layer which
implements retransmission and data deletion on ACK reception.
This double buffering stages was the first model implemented and remains
active until today. One of its major drawbacks is that it requires
memcpy invocation for every data transferred between the two buffers.
Another important drawback is that the first buffer was is allocated by
each QCS individually without restriction. On the other hand, secondary
buffers are accounted for the connection. A bottleneck can appear if
secondary buffer pool is exhausted, causing unnecessary haproxy
buffering.
The purpose of this commit is to completely break this model. The first
buffer instance is removed. Now, application protocols will directly
allocate buffer from qc_stream_desc layer. This removes completely the
memcpy invocation.
This commit has a lot of code modifications. The most obvious one is the
removal of <qcs.tx.buf> field. Now, qcc_get_stream_txbuf() returns a
buffer instance from qc_stream_desc layer. qcs_xfer_data() which was
responsible for the memcpy between the two buffers is also completely
removed. Offset fields of QCS and QCC are now incremented directly by
qcc_send_stream(). These values are used as boundary with flow control
real offset to delimit the STREAM frames built.
As this change has a big impact on the code, this commit is only the
first part to fully support single buffer emission. For the moment, some
limitations are reintroduced and will be fixed in the next patches :
* on snd_buf if QCS sent buffer in used has room but not enough for the
application protocol to store its content
* on snd_buf if QCS sent buffer is NULL and allocation cannot succeeds
due to connection pool exhaustion
One final important aspect is that extra care is necessary now in
snd_buf callback. The same buffer instance is referenced by both the
stream and quic-conn layer. As such, some operation such as realign
cannot be done anymore freely.
A recent fix was introduced to ensure unsent data are deleted when a
QUIC MUX stream releases its qc_stream_desc instance. This is necessary
to ensure all used buffers will be liberated once all ACKs are received.
This is implemented by the following patch :
commit ad6b13d317 (quic-dev/qns)
BUG/MEDIUM: quic: remove unsent data from qc_stream_desc buf
Before this patch, buffer removal was done only on ACK reception. ACK
handling is only done in order from the oldest one. A BUG_ON() statement
is present to ensure this assertion remains valid.
This is however not true anymore since the above patch. Indeed, after
unsent data removal, the current buffer may be empty if it did not
contain yet any sent data. In this case, it is not the oldest buffer,
thus the BUG_ON() statement will be triggered.
To fix this, simply remove this BUG_ON() statement. It should not have
any impact as it is safe to remove buffers in any order.
Note that several conditions must be met to trigger this BUG_ON crash :
* a QUIC MUX stream is destroyed before transmitting all of its data
* several buffers must have been previously allocated for this stream so
it happens only for transfers bigger than bufsize
* latency should be high enough to delay ACK reception
This must be backported wherever the above patch is (currently targetted
to 2.6).
QCS instances use qc_stream_desc for data buffering on emission. On
stream reset, its Tx channel is closed earlier than expected. This may
leave unsent data into qc_stream_desc.
Before this patch, these unsent data would remain after QCS freeing.
This prevents the buffer to be released as no ACK reception will remove
them. The buffer is only freed when the whole connection is closed. As
qc_stream_desc buffer is limited per connection, this reduces the buffer
pool for other streams of the same connection. In the worst case if
several streams are resetted, this may completely freeze the transfer of
the remaining connection streams.
This bug was reproduced by reducing the connection buffer pool to a
single buffer instance by using the following global statement :
tune.quic.frontend.conn-tx-buffers.limit 1.
Then a QUIC client is used which opens a stream for a large enough
object to ensure data are buffered. The client them emits a STOP_SENDING
before reading all data, which forces the corresponding QCS instance to
be resetted. The client then opens a new request but the transfer is
freezed due to this bug.
To fix this, adjust qc_stream_desc API. Add a new argument <final_size>
on qc_stream_desc_release() function. Its value is compared to the
currently buffered offset in latest qc_stream_desc buffer. If
<final_size> is inferior, it means unsent data are present in the
buffer. As such, qc_stream_desc_release() removes them to ensure the
buffer will finally be freed when all ACKs are received. It is also
possible that no data remains immediately, indicating that ACK were
already received. As such, buffer instance is immediately removed by
qc_stream_buf_free().
This must be backported up to 2.6. As this code section is known to
regression, a period of observation could be reserved before
distributing it on LTS releases.
On ACK reception, data are removed from buffer via qc_stream_desc_ack().
The buffer can be freed if no more data are left. A new slot is also
accounted in buffer connection pool. Extract this operation in a
dedicated private function qc_stream_buf_free().
This change should have no functional change. However it will be useful
for the next patch which needs to remove a buffer from another function.
This patch is necessary for the following bugfix. As such, it must be
backported with it up to 2.6.
qc_stream_buf_alloc() can fail for two reasons :
* limit of Tx buffer per connection reached
* allocation failure
The first case is properly treated. A flag QC_CF_CONN_FULL is set on the
connection to interrupt emission. It is cleared when a buffer became
available after in order ACK reception and the MUX tasklet is woken up.
The allocation failure was handled with the same mechanism which in this
case is not appropriate and could lead to a connection transfer freeze.
Instead, prefer to close the connection with a QUIC internal error code.
To differentiate the two causes, qc_stream_buf_alloc() API was changed
to return the number of available buffers to the caller.
This must be backported up to 2.6.
The total number of buffer per connection for sending is limited by a
configuration value. To ensure this, <stream_buf_count> quic_conn field
is incremented on qc_stream_buf_alloc().
qc_stream_buf_alloc() may fail if the buffer cannot be allocated. In
this case, <stream_buf_count> should not be incremented. To fix this,
simply move increment operation after buffer allocation.
The impact of this bug is low. However, if a connection suffers from
several buffer allocation failure, it may cause the <stream_buf_count>
to be incremented over the limit without being able to go back down.
This must be backported up to 2.6.
Rename all frame variables with the suffix _frm. This helps to
differentiate frame instances from other internal objects.
This should be backported up to 2.7.
Each frame type used in quic_frame union has been renamed with the
following prefix "qf_". This helps to differentiate frame instances from
other internal objects.
This should be backported up to 2.7.
Add new quic_cstream struct definition to implement the CRYPTO data stream.
This is a simplication of the qcs object (QUIC streams) for the CRYPTO data
without any information about the flow control. They are not attached to any
tree, but to a QUIC encryption level, one by encryption level except for
the early data encryption level (for 0RTT). A stream descriptor is also allocated
for each CRYPTO data stream.
Must be backported to 2.6
xprt_quic module was too large and did not reflect the true architecture
by contrast to the other protocols in haproxy.
Extract code related to XPRT layer and keep it under xprt_quic module.
This code should only contains a simple API to communicate between QUIC
lower layer and connection/MUX.
The vast majority of the code has been moved into a new module named
quic_conn. This module is responsible to the implementation of QUIC
lower layer. Conceptually, it overlaps with TCP kernel implementation
when comparing QUIC and HTTP1/2 stacks of haproxy.
This should be backported up to 2.6.
Clean up quic sources by adjusting headers list included depending
on the actual dependency of each source file.
On some occasion, xprt_quic.h was removed from included list. This is
useful to help reducing the dependency on this single file and cleaning
up QUIC haproxy architecture.
This should be backported up to 2.6.
Some clients send CONNECTION_CLOSE frame without acknowledging the STREAM
data haproxy has sent. In this case, when closing the connection if
there were remaining data in QUIC stream buffers, they were not released.
Add a <closing> boolean option to qc_stream_desc_free() to force the
stream buffer memory releasing upon closing connection.
Thank you to Tristan for having reported such a memory leak issue in GH #1801.
Must be backported to 2.6.
QUIC was the last user of entities with "conn_stream" in their names,
though there's no more reason for this given that the pool names were
already pretty straightforward. The renaming does this:
qc_stream_desc: pool_head_quic_conn_stream -> pool_head_quic_stream_desc
qc_stream_buf: pool_head_quic_conn_stream_buf -> pool_head_quic_stream_buf
This is required when the retransmitted frame types when the mux is released.
We add a counter for the number of streams which were opened or closed by the mux.
After the mux has been released, we can rely on this counter to know if the STREAM
frames are retransmitted ones or not.