Reuse the idle timeout task to delay the acknowledgments. The time of the
idle timer expiration is from now on stored in ->idle_expire. The one to
trigger the acknowledgements is stored in ->ack_expire.
Add QUIC_FL_CONN_ACK_TIMER_FIRED new connection flag to mark a connection
whose acknowledgement timer has been triggered.
Modify qc_may_build_pkt() to prevent the sending of "ack only" packets and
allow the connection to send packets when the ack timer has fired.
It is possible that acks are sent before the ack timer has triggered. In
this case it is cancelled only if ACK frames are really sent.
The idle timer expiration must be set again when the ack timer has been
triggered or when it is cancelled.
Must be backported to 2.7.
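As a rough illustration of how a single task can serve both timers, the sketch below arms the task with the earlier of the two deadlines and falls back to the idle deadline once the ack timer is cancelled. It is a simplified model and not haproxy's code: struct conn_timers, TICK_UNSET, next_task_expire() and cancel_ack_timer() are hypothetical names, and plain millisecond counters stand in for haproxy's tick API (no wrap-around handling).

```c
#include <stdint.h>

/* Hypothetical marker for an unset timer (assumption, not haproxy's API). */
#define TICK_UNSET 0u

/* Simplified model of the two deadlines sharing one task. */
struct conn_timers {
	uint32_t idle_expire; /* when the idle timeout fires */
	uint32_t ack_expire;  /* when delayed ACKs must be sent, or TICK_UNSET */
};

/* Return the deadline the shared task should be armed with: the ack
 * deadline if it is set and earlier than the idle one, else the idle one.
 */
static uint32_t next_task_expire(const struct conn_timers *t)
{
	if (t->ack_expire != TICK_UNSET && t->ack_expire < t->idle_expire)
		return t->ack_expire;
	return t->idle_expire;
}

/* Cancel the ack timer (for instance because ACK frames were really sent)
 * and return the deadline to re-arm the task with, which is the idle
 * expiration again.
 */
static uint32_t cancel_ack_timer(struct conn_timers *t)
{
	t->ack_expire = TICK_UNSET;
	return next_task_expire(t);
}
```

With idle_expire at 30000 and ack_expire at 25, the task would be armed at 25, then at 30000 again once the ack timer has been cancelled.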
Dump the variables formerly displayed by TRACE_ENTER() or TRACE_LEAVE() through calls
to TRACE_PROTO(). The two former macros no longer display any variable. From now on,
this information is accessible from the proto level.
Add new calls to TRACE_PROTO() at important locations in relation with the QUIC transport
protocol.
When relevant, try to prefix such traces with TX or RX keyword to identify the
concerned subpart (transmission or reception) of the protocol.
Must be backported to 2.7.
This bug was revealed by handshakeloss interop tests (often with quiceh) where one
could see haproxy receive an Initial packet without TLS ClientHello message (only a
padded PING frame). In this case, as the ->max_idle_timeout was not initialized, the
connection was closed about three seconds later, and haproxy opened a new one with
a new source connection ID upon receipt of the next client Initial packet. As the
interop runner counts the number of source connection IDs used by the server to check
that exactly 50 such IDs were used, it considered the test as failed.
So, the ->max_idle_timeout of the connection must be at least initialized
to the local "max_idle_timeout" transport parameter value to avoid such
a situation (closing connections too soon) until it is "negotiated" with the
client when receiving its TLS ClientHello message.
Must be backported to 2.7 and 2.6.
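For reference, the negotiation rule of RFC 9000 section 10.1 can be sketched as below: the effective idle timeout is the minimum of the two advertised values, a value of 0 meaning that the endpoint did not advertise one. The helper name effective_idle_timeout() is hypothetical. Passing 0 for the peer models the period before the ClientHello has been received, during which the local value applies.

```c
#include <stdint.h>

/* Effective idle timeout in the spirit of RFC 9000 section 10.1.
 * <local> and <peer> are the advertised max_idle_timeout values in
 * milliseconds; 0 means "not advertised" (or not received yet).
 * Hypothetical helper, not haproxy's implementation.
 */
static uint64_t effective_idle_timeout(uint64_t local, uint64_t peer)
{
	if (!local)
		return peer;
	if (!peer)
		return local;
	return local < peer ? local : peer;
}
```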
Add ->srtt, ->rtt_var, ->rtt_min and ->pto_count values from ->path->loss
struct to "show quic". Same thing for ->cwnd from ->path struct.
Also take the opportunity of this patch to dump the packet number
space information directly from ->pktns[] array in place of ->els[]
array. Indeed, ->els[QUIC_TLS_ENC_LEVEL_EARLY_DATA] and ->els[QUIC_TLS_ENC_LEVEL_APP]
have the same packet number space.
Must be backported to 2.7 where "show quic" implementation has already been
backported.
This bug arrived with this commit:
MINOR: quic: Send PING frames when probing Initial packet number space
This may happen when haproxy needs to probe the peer with very short packets
(only one PING frame). In this case, the packet must be padded. There was clearly
a case which was removed by the commit mentioned above. That said, there was
an extra byte which was added to the PADDING frame before the commit mentioned
above. This is no longer the case with this patch.
Thank you to @tatsuhiro-t (ngtcp2 manager) for having reported this issue which
was revealed by the keyupdate test (on client side).
Must be backported to 2.7 and 2.6.
This patch follows this commit which was not sufficient:
BUG/MINOR: quic: Missing STREAM frame data pointer updates
Indeed, after updating the ->offset field, the bit which informs the
frame builder of its presence must be systematically set.
This bug was revealed by the following BUG_ON() from
quic_build_stream_frame() :
bug condition "!!(frm->type & 0x04) != !!stream->offset.key" matched at src/quic_frame.c:515
This should fix the last crash that occurred in github issue #2074.
Must be backported to 2.6 and 2.7.
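The invariant checked by the BUG_ON() above can be illustrated with the following sketch, which moves a STREAM frame forward and keeps the offset, length, data pointer and OFF bit consistent. The bit values come from RFC 9000 section 19.8. struct stream_frm and stream_frm_advance() are hypothetical and assume a linear buffer, which is a simplification of the real stream buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* STREAM frame type bits from RFC 9000 section 19.8. */
#define STREAM_FRM_BASE    0x08
#define STREAM_FRM_OFF_BIT 0x04

/* Hypothetical, simplified STREAM frame over a linear buffer. */
struct stream_frm {
	uint8_t type;
	uint64_t offset;
	uint64_t len;
	const unsigned char *data;
};

/* Skip the first <skip> bytes of <frm>, e.g. because they were already
 * acknowledged. All the fields describing the payload move together, and
 * the OFF bit is set as soon as the offset is non-zero so that the frame
 * builder encodes the offset.
 */
static void stream_frm_advance(struct stream_frm *frm, uint64_t skip)
{
	if (skip > frm->len)
		skip = frm->len;
	frm->offset += skip;
	frm->len -= skip;
	frm->data += skip;
	if (frm->offset)
		frm->type |= STREAM_FRM_OFF_BIT;
}
```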
qc_notify_send() is used to wake up the MUX layer for sending. This
function first ensures that all sending conditions are met to avoid
waking up the MUX unnecessarily.
One of these conditions is to check if there is room in the congestion
window. However, when probe packets must be sent due to a PTO
expiration, RFC 9002 explicitly mentions that the congestion window
must be ignored, which was not the case prior to this patch.
This commit fixes this by first setting <pto_probe> of 01RTT packet
space before invoking qc_notify_send(). This ensures that congestion
window won't be checked anymore to wake up the MUX layer until probing
packets are sent.
This commit replaces the following one which was not sufficient :
commit e25fce03eb
BUG/MINOR: quic: Dysfunctional 01RTT packet number space probing
This should be backported up to 2.7.
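The intended ordering of the checks can be sketched as follows. This is only a model of the decision described above: struct send_state and may_notify_send() are hypothetical names, and the real qc_notify_send() operates on haproxy's own connection structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of the state consulted before waking the MUX. */
struct send_state {
	bool mux_subscribed;    /* upper layer waits on SUB_RETRY_SEND */
	unsigned int pto_probe; /* probe packets still to send for 01RTT */
	size_t cwnd;            /* congestion window, in bytes */
	size_t in_flight;       /* bytes currently in flight */
};

/* Return true if the upper layer may be woken up for sending. */
static bool may_notify_send(const struct send_state *s)
{
	if (!s->mux_subscribed)
		return false;
	/* Probing after a PTO is not limited by the congestion window. */
	if (s->pto_probe)
		return true;
	return s->in_flight < s->cwnd;
}
```

Setting pto_probe before the check is what lets a connection with a full congestion window still wake the MUX to emit probe packets.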
On PTO probe timeout expiration, a probe packet must be emitted.
quic_pto_pktns() is used to determine for which packet space the timer
has expired. However, if MUX is already subscribed for sending, it is
woken up without checking first if this happened for the 01RTT packet
space.
It is not certain that this is really a bug as in most cases, MUX is
established only after Initial and Handshake packet spaces are removed.
However, the situation is not so clear when 0-RTT is used. For this
reason, adjust the code to explicitly check for the 01RTT packet space
before waking up the MUX layer.
This should be backported up to 2.6. Note that qc_notify_send() does not
exist in 2.6 so it should be replaced by the explicit block checking
(qc->subs && qc->subs->events & SUB_RETRY_SEND).
This bug arrived with this commit:
"MINOR: quic: implement qc_notify_send()".
The ->tx.pto_probe variable was no longer set by qc_process_timer(), the timer
task of the connection responsible for detecting packet loss and probing upon
PTO expiration, leading to interrupted stream transfers. This was revealed by
failed blackhole interop tests where one could see that qc_process_timer()
was woken up without traces such as the following one in the log file:
"needs to probe 01RTT packet number space"
Must be backported to 2.7 and to 2.6 if the commit mentioned above
is backported to 2.6 in the meantime.
The packets of an ACK frame range were handled from the largest to the smallest
packet number, leading to a large number of ebtree insertions when the packets are
handled in the inverse order they were sent. This was detected a long time ago
but left in the code to stress our implementation. It is time to be more
efficient and process the packets so as to avoid useless ebtree insertions.
Modify qc_ackrng_pkts(), which is responsible for handling the packets
acknowledged by an ACK frame range.
Must be backported to 2.7.
This patch follows this one which was not sufficient:
"BUG/MINOR: quic: Missing STREAM frame length updates"
Indeed, it is not sufficient to update the ->len and ->offset member
of a STREAM frame to move it forward. The data pointer must also be updated.
This is not done by the STREAM frame builder.
Must be backported to 2.6 and 2.7.
Some STREAM frame lengths were not updated before being duplicated, built
or requeued, contrary to their ack offsets. This leads haproxy to crash when
receiving acknowledgements for such frames with this thread #1 backtrace:
Thread 1 (Thread 0x7211b6ffd640 (LWP 986141)):
#0 ha_crash_now () at include/haproxy/bug.h:52
No locals.
#1 b_del (b=<optimized out>, del=<optimized out>) at include/haproxy/buf.h:436
No locals.
#2 qc_stream_desc_ack (stream=stream@entry=0x7211b6fd9bc8, offset=offset@entry=53176, len=len@entry=1122) at src/quic_stream.c:111
Thank you to @Tristan971 for having provided such traces which reveal this issue:
[04|quic|5|c_conn.c:1865] qc_requeue_nacked_pkt_tx_frms(): entering : qc@0x72119c22cfe0
[04|quic|5|_frame.c:1179] qc_frm_unref(): entering : qc@0x72119c22cfe0
[04|quic|5|_frame.c:1186] qc_frm_unref(): remove frame reference : qc@0x72119c22cfe0 frm@0x72118863d260 STREAM_F uni=0 fin=1 id=460 off=52957 len=1122 3244
[04|quic|5|_frame.c:1194] qc_frm_unref(): leaving : qc@0x72119c22cfe0
[04|quic|5|c_conn.c:1902] qc_requeue_nacked_pkt_tx_frms(): updated partially acked frame : qc@0x72119c22cfe0 frm@0x72119c472290 STREAM_F uni=0 fin=1 id=460 off=53176 len=1122
Note that haproxy is much more likely to crash if this frame is the last one
(fin bit set). But another condition must be fulfilled to update the ack offset.
A previous STREAM frame from the same stream with the same offset but with less
data must be acknowledged by the peer. This is the condition to update the ack offset.
For other frames without fin bit in the same conditions, I guess the stream may be
truncated because too much data is removed from the stream when they are
acknowledged.
Must be backported to 2.6 and 2.7.
This issue was revealed by "Multiple streams" QUIC tracker test which very often
fails (locally) with a file of about 1Mbytes (x4 streams). The log of QUIC tracker
revealed that from its point of view, the 4 files were never all received entirely:
"results" : {
"stream_0_rec_closed" : true,
"stream_0_rec_offset" : 1024250,
"stream_0_snd_closed" : true,
"stream_0_snd_offset" : 15,
"stream_12_rec_closed" : false,
"stream_12_rec_offset" : 72689,
"stream_12_snd_closed" : true,
"stream_12_snd_offset" : 15,
"stream_4_rec_closed" : true,
"stream_4_rec_offset" : 1024250,
"stream_4_snd_closed" : true,
"stream_4_snd_offset" : 15,
"stream_8_rec_closed" : true,
"stream_8_rec_offset" : 1024250,
"stream_8_snd_closed" : true,
"stream_8_snd_offset" : 15
},
But this is in contradiction with other QUIC tracker logs which confirm that haproxy
has really (re)sent the stream at the suspected offset (stream_12_rec_offset):
1152085,
"transport",
"packet_received",
{
"frames" : [
{
"frame_type" : "stream",
"length" : "155",
"offset" : "72689",
"stream_id" : "12"
}
],
"header" : {
"dcid" : "a14479169ebb9dba",
"dcil" : "8",
"packet_number" : "466",
"packet_size" : 190
},
"packet_type" : "1RTT"
}
When detected as lost, the packets are enlisted, then their frames are
requeued in their packet number space by qc_requeue_nacked_pkt_tx_frms().
This was done using a local list which was spliced to the packet number
space frame list. This had the bad effect of retransmitting the frames in the
inverse order they had been sent. This is something the QUIC tracker go client
does not like at all!
Removing the frame splicing fixes this issue and allows haproxy to pass the
"Multiple streams" test.
Must be backported to 2.7.
This bug arrived with this commit:
b5a8020e9 MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX)
and was revealed by h3 interop tests with clients like s2n-quic and quic-go
as noticed by Amaury.
Indeed, one must check that the CID matching the sequence number provided by a received
RETIRE_CONNECTION_ID frame does not match the DCID of the packet.
Remove useless ->curr_cid_seq_num member from quic_conn struct.
The sequence number lookup must be done in qc_handle_retire_connection_id_frm()
to check the validity of the RETIRE_CONNECTION_ID frame. It returns the CID to be
retired into <cid_to_retire> variable passed as parameter to this function if
the frame is valid and if the CID was not already retired.
Must be backported to 2.7.
A new global quic-conn list has been added by the previous patch. It will
contain every quic-conn in closing or draining state.
Thus, it is now easier to include or skip them on a "show quic" output :
when the default list on the current thread has been browsed entirely,
either we skip to the next thread or we look at the closing list on the
current thread.
This should be backported up to 2.7.
When a CONNECTION_CLOSE is emitted or received, a QUIC connection enters
respectively the closing or draining state. These states are a loose
equivalent of TCP TIME_WAIT. No data can be exchanged anymore but the
connection is maintained for a certain time to handle packet
reordering or loss.
A new global list has been defined for QUIC connections in
closing/draining state inside thread_ctx structure. Each time a
connection enters one of these states, it will be moved from the
default global list to the new closing list.
The objective of this patch is to quickly filter connections on
closing/draining. Most notably, this will be used to wake up these
connections and avoid that haproxy process stopping is delayed by them.
A dedicated function qc_detach_th_ctx_list() has been implemented to
transfer a quic-conn from one list instance to the other. This takes
care of back-references attached to a quic-conn instance in case of a
running "show quic".
This should be backported up to 2.7.
Modify quic_transport_params_dump() and other functions related to the
transport parameter value dumps from TRACE() to make their output more
compact.
Add call to quic_transport_params_dump() to dump the transport parameters
from "show quic" CLI command.
Must be backported to 2.7.
Add QUIC_FL_RX_PACKET_SPIN_BIT new RX packet flag to mark an RX packet as having
the spin bit set. Idem for the connection with QUIC_FL_CONN_SPIN_BIT flag.
Implement qc_handle_spin_bit() to set/unset QUIC_FL_CONN_SPIN_BIT for the connection
as soon as a packet number could be deciphered.
Modify quic_build_packet_short_header() to set the spin bit when building
a short packet header.
Validated by quic-tracker spin bit test.
Must be backported to 2.7.
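For context, the server-side rule of RFC 9000 section 17.4 can be sketched as below: the spin value is only updated by a packet which increases the largest packet number seen, and it is copied as-is from that packet. struct spin_state and server_handle_spin_bit() are hypothetical names. How qc_handle_spin_bit() is actually structured is not described above, so this is only an assumption about the general shape.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection spin bit state (server side). */
struct spin_state {
	bool has_pn;         /* at least one packet number was deciphered */
	uint64_t largest_pn; /* largest packet number seen so far */
	bool spin;           /* value to set in outgoing short headers */
};

/* Server rule: reflect the spin bit of the packet carrying the largest
 * packet number received so far, ignore reordered or duplicated packets.
 */
static void server_handle_spin_bit(struct spin_state *s, uint64_t pn, bool rx_spin)
{
	if (s->has_pn && pn <= s->largest_pn)
		return;
	s->has_pn = true;
	s->largest_pn = pn;
	s->spin = rx_spin;
}
```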
Add ->curr_cid_seq_num new quic_conn struct member to store the connection
ID sequence number currently used by the connection.
Implement qc_handle_retire_connection_id_frm() to handle this RX frame.
Implement qc_retire_connection_seq_num() to remove a connection ID from its
sequence number.
Implement qc_build_new_connection_id_frm() to allocate a new NEW_CONNECTION_ID
frame from a CID.
Modify qc_parse_pkt_frms() which parses the frames of an RX packet to handle
the case of the RETIRE_CONNECTION_ID frame.
Must be backported to 2.7.
Add ->next_cid_seq_num new member to quic_conn struct to store the next
sequence number to be used to allocate a connection ID.
It is initialized to 0 from qc_new_conn() which initializes a connection.
Modify new_quic_cid() to use this variable each time it is called without
giving the possibility to the caller to pass the sequence number for the
connection to be allocated.
Modify quic_build_post_handshake_frames() to use ->next_cid_seq_num
when building NEW_CONNECTION_ID frames after the handshake has been completed.
Limit the number of connection IDs provided to the peer to the minimum
between 4 and the value it sent with active_connection_id_limit transport
parameter. This includes the connection ID used by the connection to send
these new connection IDs.
Must be backported to 2.7.
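The limit described above amounts to the following arithmetic. The helper name new_cid_frames_to_send() and the MAX_LOCAL_CIDS macro are hypothetical; only the cap of 4 and the fact that the CID already in use counts toward the limit come from the text.

```c
#include <stdint.h>

/* Cap on the connection IDs provided to the peer, as stated above. */
#define MAX_LOCAL_CIDS 4

/* Number of NEW_CONNECTION_ID frames to send after the handshake, given
 * the peer's active_connection_id_limit. The CID already used by the
 * connection counts toward the limit, hence the "- 1".
 */
static uint64_t new_cid_frames_to_send(uint64_t peer_active_cid_limit)
{
	uint64_t max = peer_active_cid_limit < MAX_LOCAL_CIDS ?
	               peer_active_cid_limit : MAX_LOCAL_CIDS;

	return max ? max - 1 : 0;
}
```

For a peer advertising a limit of 2, a single NEW_CONNECTION_ID frame would be sent; for a limit of 8, three frames.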
The MUX instance is released before its quic-conn counterpart. On
termination, a H3 GOAWAY is emitted to prevent the client from opening new
streams for this connection.
The quic-conn instance will stay alive until all opened streams data are
acknowledged. If the client tries to open a new stream during this
interval despite the GOAWAY, quic-conn is responsible for requesting its
immediate closure with a STOP_SENDING + RESET_STREAM.
This behavior was already implemented but the received packet with the
new STREAM was never acknowledged. This was fixed with the following
commit :
commit 156a89aef8
BUG/MINOR: quic: acknowledge STREAM frame even if MUX is released
However, this patch introduced a regression as it did not skip the call
to qc_handle_strm_frm() despite the MUX instance being released. This
can cause a segfault when using qcc_get_qcs() on a released MUX
instance. To fix this, add a missing break statement which will skip
qc_handle_strm_frm() when the MUX instance is not initialized.
This crash was reproduced using a short timeout client and sending
several requests with delay between them by using a modified aioquic. It
produces the following backtrace :
#0 0x000055555594d261 in __eb64_lookup (x=4, root=0x7ffff4091f60) at include/import/eb64tree.h:132
#1 eb64_lookup (root=0x7ffff4091f60, x=4) at src/eb64tree.c:37
#2 0x000055555563fc66 in qcc_get_qcs (qcc=0x7ffff4091dc0, id=4, receive_only=1, send_only=0, out=0x7ffff780ca70) at src/mux_quic.c:668
#3 0x0000555555641e1a in qcc_recv (qcc=0x7ffff4091dc0, id=4, len=40, offset=0, fin=1 '\001', data=0x7ffff40c4fef "\001&") at src/mux_quic.c:974
#4 0x0000555555619d28 in qc_handle_strm_frm (pkt=0x7ffff4088e60, strm_frm=0x7ffff780cf50, qc=0x7ffff7cef000, fin=1 '\001') at src/quic_conn.c:2515
#5 0x000055555561d677 in qc_parse_pkt_frms (qc=0x7ffff7cef000, pkt=0x7ffff4088e60, qel=0x7ffff7cef6c0) at src/quic_conn.c:3050
#6 0x00005555556230aa in qc_treat_rx_pkts (qc=0x7ffff7cef000, cur_el=0x7ffff7cef6c0, next_el=0x0) at src/quic_conn.c:4214
#7 0x0000555555625fee in quic_conn_app_io_cb (t=0x7ffff40c1fa0, context=0x7ffff7cef000, state=32848) at src/quic_conn.c:4640
#8 0x00005555558a676d in run_tasks_from_lists (budgets=0x7ffff780d470) at src/task.c:596
#9 0x00005555558a725b in process_runnable_tasks () at src/task.c:876
#10 0x00005555558522ba in run_poll_loop () at src/haproxy.c:2945
#11 0x00005555558529ac in run_thread_poll_loop (data=0x555555d14440 <ha_thread_info+64>) at src/haproxy.c:3141
#12 0x00007ffff789ebb5 in ?? () from /usr/lib/libc.so.6
#13 0x00007ffff7920d90 in ?? () from /usr/lib/libc.so.6
This should fix github issue #2067.
This must be backported up to 2.6.
In very rare cases, it is possible that the Initial packet number space
must be probed even if there are no more in flight CRYPTO frames.
In such cases, a PING frame is sent into an Initial packet. As this
packet is ack-eliciting, it must be padded by the server. qc_do_build_pkt()
is modified to do so.
Take the opportunity of this patch to modify the trace for TX frames to
easily distinguish them from other frame related traces.
Must be backported to 2.7.
Mark the connection as limited by the anti-amplification limit when trying to
probe the peer.
Wake up the connection PTO/loss detection timer as soon as a datagram is
received. This was done only when the datagram was dropped.
This fixes deadlock issues revealed by some interop runner tests.
Must be backported to 2.7 and 2.6.
Some frames are marked as already acknowledged from duplicated packets
whose original packet has been acknowledged. There is no need
to resend such packets or frames.
Implement qc_pkt_with_only_acked_frms() to detect packets with only
already acknowledged frames inside and use it from qc_prep_fast_retrans()
which selects the packet to be retransmitted.
Must be backported to 2.6 and 2.7.
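The kind of check performed by such a function can be sketched as below. The structures and the FRM_FL_ACKED flag are hypothetical stand-ins for the real TX packet and frame types, so this only illustrates the idea of scanning a packet's frames for one that still needs to be retransmitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag marking a frame as already acknowledged. */
#define FRM_FL_ACKED 0x01

/* Hypothetical, simplified TX frame and packet types. */
struct tx_frm {
	unsigned int flags;
	struct tx_frm *next;
};

struct tx_pkt {
	struct tx_frm *frms; /* singly linked list of frames */
};

/* Return true if every frame of <pkt> is already acknowledged, in which
 * case retransmitting the packet would be useless.
 */
static bool pkt_with_only_acked_frms(const struct tx_pkt *pkt)
{
	const struct tx_frm *frm;

	for (frm = pkt->frms; frm; frm = frm->next) {
		if (!(frm->flags & FRM_FL_ACKED))
			return false;
	}
	return true;
}
```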
Even if there is a check in callers of qc_prep_hdshk_fast_retrans() and
qc_prep_fast_retrans() to prevent retransmissions of packets with no ack-eliciting
frames, these two functions should take care not to do that, especially if
someone decides to modify their implementations in the future.
Must be backported to 2.6 and 2.7.
This is an old bug which arrived in this commit due to a misinterpretation
of the RFC I guess where the desired effect was to acknowledge all the
handshake packets:
77ac6f566 BUG/MINOR: quic: Missing acknowledgments for trailing packets
This had the bad effect of acknowledging all the handshake packets, even the
ones which are not ack-eliciting.
Must be backported to 2.7 and 2.6.
Dump the secret used to derive the next one during a key update initiated by the
client and dump the resulting new secret and the new key and iv to be used to
decrypt Application level packets.
Also add a trace when the key update is supposed to be initiated on haproxy side.
This has already helped in diagnosing an issue revealed by the key update interop
test with xquic as client.
Must be backported to 2.7.
v2 interop runner test revealed this bug as follows:
[01|quic|4|c_conn.c:4087] new packet : qc@0x7f62ec026e30 pkt@0x7f62ec056390 el=I pn=491940080 rel=H
[01|quic|5|c_conn.c:1509] qc_pkt_decrypt(): entering : qc@0x7f62ec026e30
[01|quic|0|c_conn.c:1553] quic_tls_decrypt() failed : qc@0x7f62ec026e30
[01|quic|5|c_conn.c:1575] qc_pkt_decrypt(): leaving : qc@0x7f62ec026e30
[01|quic|0|c_conn.c:4091] packet decryption failed -> dropped : qc@0x7f62ec026e30 pkt@0x7f62ec056390 el=I pn=491940080
Only the decryption of v2 Initial packets received from clients was impacted. There
is no issue encrypting v2 Initial packets. This is due to the fact that when v2 is
negotiated the client may send two versions of Initial packets (currently v1,
then v2). The selection was done on the TX path but not on the RX path.
Implement qc_select_tls_ctx() to select the correct TLS cipher context for all
types of packets and call this function before removing the header protection
and before deciphering the packet.
Must be backported to 2.7.
When retransmitting datagrams with two coalesced packets inside, the second
packet was not taken into consideration when checking that there is enough room
to send the datagram, especially when limited by the anti-amplification limit.
Must be backported to 2.6 and 2.7.
Before building a packet into a datagram, ensure there is sufficient space for at
least 1200 bytes. Also pad datagrams with only one ack-eliciting Initial packet
inside.
Must be backported to 2.7 and 2.6.
When a STREAM frame is re-emitted, it will point to the same stream
buffer as the original one. If an ACK is received for either one of
these frames, the underlying buffer may be freed. Thus, if the second
frame is declared as lost and scheduled for retransmission, we must
ensure that the underlying buffer is still allocated or interrupt the
retransmission.
Stream buffer is stored as an eb_tree indexed by the stream ID. To avoid
a tree lookup each time a STREAM frame is re-emitted, a lost
STREAM frame is flagged as QUIC_FL_TX_FRAME_LOST.
In most cases, this code is functional. However, there are several
potential issues which may cause a segfault :
- when explicitly probing with a STREAM frame, the frame won't be
flagged as lost
- when splitting a STREAM frame during retransmission, the flag is not
copied
To fix both these cases, QUIC_FL_TX_FRAME_LOST flag has been converted
to a <dup> field in quic_stream structure. This field is now properly
copied when splitting a STREAM frame. Also, as this is now an inner
quic_frame field, it will be copied automatically on qc_frm_dup()
invocation thus ensuring that it will be set on probing.
This issue was encountered randomly with the following backtrace :
#0 __memmove_avx512_unaligned_erms ()
#1 0x000055f4d5a48c01 in memcpy (__len=18446698486215405173, __src=<optimized out>,
#2 quic_build_stream_frame (buf=0x7f6ac3fcb400, end=<optimized out>, frm=0x7f6a00556620,
#3 0x000055f4d5a4a147 in qc_build_frm (buf=buf@entry=0x7f6ac3fcb5d8,
#4 0x000055f4d5a23300 in qc_do_build_pkt (pos=<optimized out>, end=<optimized out>,
#5 0x000055f4d5a25976 in qc_build_pkt (pos=0x7f6ac3fcba10,
#6 0x000055f4d5a30c7e in qc_prep_app_pkts (frms=0x7f6a0032bc50, buf=0x7f6a0032bf30,
#7 qc_send_app_pkts (qc=0x7f6a0032b310, frms=0x7f6a0032bc50) at src/quic_conn.c:4184
#8 0x000055f4d5a35f42 in quic_conn_app_io_cb (t=0x7f6a0009c660, context=0x7f6a0032b310,
This should fix github issue #2051.
This should be backported up to 2.6.
This patch completes the previous one with poller subscription of the quic-conn
owned socket on sendto() error. This ensures that mux-quic is notified
if waiting on sending when a transient sendto() error is cleared. As
such, qc_notify_send() is called directly inside socket I/O callback.
qc_notify_send() internal conditions have thus been completed. This will
prevent notifying the upper layer until all sending conditions are fulfilled:
room in congestion window and no transient error on socket FD.
This should be backported up to 2.7.
On sendto() transient error, prior to this patch sending was simulated
and we relied on retransmission to retry sending. This could hurt
significantly the performance.
Thanks to quic-conn owned socket support, it is now possible to improve
this. On transient error, sending is interrupted and quic-conn socket FD
is subscribed on the poller for sending. When send is possible,
quic_conn_sock_fd_iocb() will be in charge of restarting sending.
A consequence of this change is on the return value of qc_send_ppkts().
This function will now return 0 on transient error if quic-conn has its
own socket. This is used to interrupt sending in the calling function.
The flag QUIC_FL_CONN_TO_KILL must be checked to differentiate a fatal
error from a transient one.
This should be backported up to 2.7.
Sending is implemented in two parts on quic-conn module. First, QUIC
packets are prepared in a buffer and then sendto() is called with this
buffer as input.
qc.tx.buf is used as the input buffer. It must always be empty before
starting to prepare new packets in it. Currently, this is guaranteed by
the fact that either sendto() is completed, a fatal error is encountered
which prevents future sends, or a transient error is encountered and we
rely on retransmission to send the remaining data.
This will change when poller subscription of socket FD on sendto()
transient error is implemented. In this case, qc.tx.buf will not be
emptied to resume sending when the transient error is cleared. To allow
the current sending process to work as expected, a new function
qc_purge_txbuf() is implemented. It will try to send remaining data
before preparing new packets for sending. If successful, txbuf will be
emptied and sending can continue. If not, sending will be interrupted.
This should be backported up to 2.7.
Implement qc_notify_send(). This function is responsible for notifying the
upper layer subscribed on SUB_RETRY_SEND if sending conditions are back
to normal.
For the moment, this patch has no functional change as only congestion
window room is checked before notifying the upper layer. However, this
will be extended when poller subscription of socket on sendto() error is
implemented. qc_notify_send() will thus be responsible for ensuring that
all conditions are met before waking up the upper layer.
This should be backported up to 2.7.
This patch simply cleans up return paths used in various send functions of
quic-conn module. This will simplify the implementation of poller
subscribing on sendto() error which adds another error handling path.
This should be backported up to 2.7.
Send is conducted through qc_send_ppkts() for a QUIC connection. There
are two types of errors which can be encountered on sendto() or affiliated
syscalls :
* transient error. In this case, sending is simulated with the remaining
data and retransmission process is used to have the opportunity to
retry emission
* fatal error. If this happens, the connection should be closed as soon
as possible. This is done via qc_kill_conn() function. Until this
patch, only ECONNREFUSED errno was considered as fatal.
Modify the QUIC send API to be able to differentiate transient and fatal
errors more easily. This is done by fixing the return value of the
sendto() wrapper qc_snd_buf() :
* on fatal error, a negative error code is returned. This is now the
case for every errno except EAGAIN, EWOULDBLOCK, ENOTCONN, EINPROGRESS
and EBADF.
* on a transient error, 0 is returned. This is the case for the listed
errno values above and also if a partial send has been conducted by
the kernel.
* on success, the return value of sendto() syscall is returned.
This commit will be useful to be able to handle transient error with a
quic-conn owned socket. In this case, the socket should be subscribed to
the poller and no simulated send will be conducted.
This commit allows errno management to be confined in the quic-sock
module which is a nice cleanup.
On a final note, EBADF should be considered as fatal. This will be the
subject of a next commit.
This should be backported up to 2.7.
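The return convention described above for the sendto() wrapper can be sketched as follows. This is not the code of qc_snd_buf(): snd_errno_is_transient() and snd_buf_result() are hypothetical helpers, and returning -errno for fatal errors is an assumption, since the text only states that a negative error code is returned.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Errno values listed above as transient. EBADF is in the list for now,
 * although the text notes that it should eventually be considered fatal.
 */
static bool snd_errno_is_transient(int err)
{
	return err == EAGAIN || err == EWOULDBLOCK || err == ENOTCONN ||
	       err == EINPROGRESS || err == EBADF;
}

/* Map the outcome of a sendto()-like call to the convention above.
 * <ret> is the syscall return value, <err> the errno value if <ret> is
 * negative and <count> the number of bytes which had to be sent.
 * Returns a negative value on fatal error (-errno, an assumption), 0 on
 * transient error or partial send, else the number of bytes sent.
 */
static int snd_buf_result(ssize_t ret, int err, size_t count)
{
	if (ret < 0)
		return snd_errno_is_transient(err) ? 0 : -err;
	if ((size_t)ret != count)
		return 0; /* partial send handled as a transient error */
	return (int)ret;
}
```

A caller then only has to test the sign of the result, which keeps errno handling confined to one place.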
This issue arrived with this commit:
1dbeb35f8 MINOR: quic: Add new traces about by connection RX buffer handling
and revealed by the GH CI as follows:
src/quic_conn.c: In function ‘quic_rx_pkts_del’:
include/haproxy/trace.h:134:65: error: format ‘%zu’ expects argument of type ‘size_t’,
but argument 6 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=]
_msg_len = snprintf(_msg, sizeof(_msg), (fmt), ##args);
Replace all %zu printf integer formats with %llu.
Must be backported to 2.7 where the previous commit is supposed to be backported.
If the TX buffer (->tx.buf) attached to the connection is not drained, there
are chances that this will be detected by qc_txb_release() which triggers
a BUG_ON_HOT() when this is the case as follows
[00|quic|2|c_conn.c:3477] UDP port unreachable : qc@0x5584f18d6d50 pto_count=0 cwnd=6816 ppif=1046 pif=1046
[00|quic|5|ic_conn.c:749] qc_kill_conn(): entering : qc@0x5584f18d6d50
[00|quic|5|ic_conn.c:752] qc_kill_conn(): leaving : qc@0x5584f18d6d50
[00|quic|5|c_conn.c:3532] qc_send_ppkts(): leaving : qc@0x5584f18d6d50 pto_count=0 cwnd=6816 ppif=1046 pif=1046
FATAL: bug condition "buf && b_data(buf)" matched at src/quic_conn.c:3098
Consume the remaining data in the TX buffer calling b_del().
This bug arrived with this commit:
a2c62c314 MINOR: quic: Kill the connections on ICMP (port unreachable) packet receipt
Also take the opportunity of this patch to modify the comments for qc_send_ppkts()
which should have arrived with a2c62c314 commit.
Must be backported to 2.7 where this latter commit is supposed to be backported.
With previous commit, quic-conn are now handled as jobs to prevent the
termination of haproxy process. This ensures that QUIC connections are
closed when all data are acknowledged by the client and there are no more
active streams.
The quic-conn layer emits a CONNECTION_CLOSE once the MUX has been
released and all streams are acknowledged. Then, the timer is scheduled
to definitely free the connection after the idle timeout period. This
allows late-arriving packets to be handled.
Adjust this procedure to deactivate this timer when process stopping is
in progress. In this case, quic-conn timer is set to expire immediately
to free the quic-conn instance as soon as possible. This allows the
haproxy process to be closed quickly.
This should be backported up to 2.7.
To prevent data loss for QUIC connections, haproxy global variable jobs
is incremented each time a quic-conn socket is allocated. This allows
the QUIC connection to terminate all its transfer operations during proxy
soft-stop. Without this patch, the process will be terminated without
waiting for QUIC connections.
Note that this is done in qc_alloc_fd(). This means only QUIC connections
with their own socket will properly support soft-stop. In the other
case, the connection will be interrupted abruptly as before. Similarly,
jobs decrement is conducted in qc_release_fd().
This should be backported up to 2.7.
When the MUX is freed, the quic-conn layer may stay active until all
stream acknowledgments are processed. In this interval, if a new stream
is opened by the client, the quic-conn is thus now responsible to handle
it. This is done by the emission of a STOP_SENDING + RESET_STREAM.
Prior to this patch, the received packet was not acknowledged. This is
undesirable if the quic-conn is able to properly reject the request as
this can lead to unneeded retransmission from the client.
This must be backported up to 2.6.
When the MUX is freed, the quic-conn layer may stay active until all
stream acknowledgments are processed. In this interval, if a new stream
is opened by the client, the quic-conn is thus now responsible to handle
it. This is done by the emission of a STOP_SENDING.
This process has been completed to also emit a RESET_STREAM with the
same error code H3_REQUEST_REJECTED. This is done to conform with the H3
specification to invite the client to retry its request on a new
connection.
This should be backported up to 2.6.
When the MUX is freed, the quic-conn layer may stay active until all
stream acknowledgments are processed. In this interval, if a new stream
is opened by the client, the quic-conn is thus now responsible to handle
it. This is done by the emission of a STOP_SENDING.
This process is closely related to HTTP/3 protocol despite being handled
by the quic-conn layer. This highlights a flaw in our QUIC architecture
which should be adjusted. To reflect this situation, the function
qc_stop_sending_frm_enqueue() is renamed qc_h3_request_reject(). Also,
internal H3 treatment such as uni-directional bypass has been moved
inside the function.
This commit is only a refactor. However, bug fix on next patches will
rely on it so it should be backported up to 2.6.
This was revealed by Amaury when setting tune.quic.frontend.max-streams-bidi to 8
and asking a client to open 12 streams. haproxy then has to send short packets
carrying small MAX_STREAMS frames encoded on 2 bytes, in addition to a packet
number encoded on only one byte. In this case <len_frms>, the length of the
encoded frames to be added to the packet plus the length of the packet number,
is smaller than QUIC_PACKET_PN_MAXLEN.
Ensure the length of the packet is at least QUIC_PACKET_PN_MAXLEN by adding a PADDING
frame with (QUIC_PACKET_PN_MAXLEN - <len_frms>) as size. For instance with
a two-byte MAX_STREAMS frame and a one-byte packet number, this adds
one byte of padding.
See https://datatracker.ietf.org/doc/html/rfc9001#name-header-protection-sample.
Must be backported to 2.7 and 2.6.
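The padding rule above can be sketched as follows. This is only an
illustration: the helper name hp_padding_len() is hypothetical, and
QUIC_PACKET_PN_MAXLEN is assumed here to be 4, the maximum encoded length of a
QUIC packet number in bytes.

```c
#include <stddef.h>

/* Assumed value: a QUIC packet number is encoded on at most 4 bytes. */
#define QUIC_PACKET_PN_MAXLEN 4

/* Hypothetical helper. <len_frms> is the length of the encoded frames plus
 * the length of the packet number. Returns the number of PADDING bytes
 * needed for this length to reach QUIC_PACKET_PN_MAXLEN, or 0 if it is
 * already large enough.
 */
static size_t hp_padding_len(size_t len_frms)
{
	if (len_frms >= QUIC_PACKET_PN_MAXLEN)
		return 0;
	return QUIC_PACKET_PN_MAXLEN - len_frms;
}
```

With a two-byte MAX_STREAMS frame and a one-byte packet number, <len_frms> is 3
and the helper asks for one byte of padding.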
When receiving an Initial packet, a peer must drop it if the datagram is smaller
than 1200 bytes. Before this patch, the entire datagram was dropped.
In such a case, drop only the packet, after having parsed its length.
Must be backported to 2.6 and 2.7.
This bug arrived with this commit:
982896961 MINOR: quic: split and rename qc_lstnr_pkt_rcv()
The first block of code consists in possibly setting this variable to true.
But it was already initialized to true before entering this code section.
It should be initialized to false.
Also take the opportunity to remove an unused "err" label.
Must be backported to 2.6 and 2.7.
Before probing the Initial packet number space, verify that we can send at
least 1200 bytes per datagram. This may not be the case due to the amplification limit.
Must be backported to 2.6 and 2.7.
This should help in diagnosing issues revealed by the interop runner which counts
the number of handshakes from the number of Initial packets sent by the server.
Must be backported to 2.7.
The aim of this function is to rearm the idle timer. The ->expire
field of the timer task was updated without being requeued.
Some connection could be unexpectedly terminated.
Must be backported to 2.6 and 2.7.
This is very helpful during retransmission when receiving ICMP port unreachable
errors after the peer has left. This is the only case at present where
qc_send_hdshk_pkts() or qc_send_app_probing() may fail (when they call
qc_send_ppkts() which fails with ECONNREFUSED as errno).
Also make the callers of qc_dgrams_retransmit() stop their packet processing. This
is the case of quic_conn_app_io_cb() and quic_conn_io_cb().
This modification definitively stops any packet processing when receiving
ICMP port unreachable errors.
Must be backported to 2.7.
The send*() syscalls which are responsible for the reception of such ICMP
packets fail with ECONNREFUSED as errno.
man(7) udp
ECONNREFUSED
No receiver was associated with the destination address.
This might be caused by a previous packet sent over the socket.
We must kill asap the underlying connection.
Must be backported to 2.7.
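A minimal sketch of the errno check described above. The helper name
send_errno_is_fatal() is hypothetical and only covers the ECONNREFUSED case
discussed here; how other errors are handled is left out on purpose.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper. <err> is the errno value left by a failed send*()
 * call on a UDP socket. Returns true when it reveals that the peer is gone
 * (ICMP port unreachable received), in which case the connection should be
 * killed rather than retried.
 */
static bool send_errno_is_fatal(int err)
{
	switch (err) {
	case ECONNREFUSED:
		return true;
	default:
		/* other errors are out of the scope of this sketch */
		return false;
	}
}
```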
This code was there because the timer task was not running on the same thread
as the one which parses the QUIC packets. Now that this is no longer the case,
we can wake up this task directly.
Must be backported to 2.7.
Move quic_rx_pkts_del() out of quic_conn.h to make it benefit from the TRACE API.
Add traces which already helped in diagnosing an issue encountered with
ngtcp2 which sent too many 1-RTT packets before the handshake completion. This
has been fixed here after having discussed with Tasuhiro on QUIC dev slack:
https://github.com/ngtcp2/ngtcp2/pull/663
Must be backported to 2.7.
There was a parenthesis placed in the wrong place for a memcmp().
As a consequence, clients could not reuse a UDP address for a new connection.
Must be backported to 2.7.
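As a generic illustration of this class of bug (not the exact code fixed by
the patch), the sketch below shows where the closing parenthesis of a memcmp()
based comparison belongs. The helper name same_addr() is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when <a> and <b> hold the same <len> bytes.
 *
 * Note the position of the closing parenthesis. Writing
 *     memcmp(a, b, len == 0)
 * still compiles, but passes the result of (len == 0) as the size, so
 * memcmp() compares 0 bytes whenever <len> is non-zero and the test no
 * longer depends on the addresses at all.
 */
static bool same_addr(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) == 0;
}
```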
This has been detected by libasan as follows:
=================================================================
==3170559==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55cf77faad08 at pc 0x55cf77a87370 bp 0x7ffc01bdba70 sp 0x7ffc01bdba68
READ of size 8 at 0x55cf77faad08 thread T0
#0 0x55cf77a8736f in cli_find_kw src/cli.c:335
#1 0x55cf77a8a9bb in cli_parse_request src/cli.c:792
#2 0x55cf77a8c385 in cli_io_handler src/cli.c:1024
#3 0x55cf77d19ca1 in task_run_applet src/applet.c:245
#4 0x55cf77c0b6ba in run_tasks_from_lists src/task.c:634
#5 0x55cf77c0cf16 in process_runnable_tasks src/task.c:861
#6 0x55cf77b48425 in run_poll_loop src/haproxy.c:2934
#7 0x55cf77b491cf in run_thread_poll_loop src/haproxy.c:3127
#8 0x55cf77b4bef2 in main src/haproxy.c:3783
#9 0x7fb8b0693d09 in __libc_start_main ../csu/libc-start.c:308
#10 0x55cf7764f4c9 in _start (/home/flecaille/src/haproxy-untouched/haproxy+0x1914c9)
0x55cf77faad08 is located 0 bytes to the right of global variable 'cli_kws' defined in 'src/quic_conn.c:7834:27' (0x55cf77faaca0) of size 104
SUMMARY: AddressSanitizer: global-buffer-overflow src/cli.c:335 in cli_find_kw
Shadow bytes around the buggy address:
According to cli_find_kw() code and cli_kw_list struct definition, the second
member of this structure ->kw[] must be a null-terminated array.
Add a last element with default initializers to <cli_kws> global variable which
is impacted by this bug.
This bug arrived with this commit:
15c74702d MINOR: quic: implement a basic "show quic" CLI handler
Must be backported to 2.7 where this previous commit has been already
backported.
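The need for the terminating element can be sketched with simplified,
hypothetical stand-ins for the CLI keyword structures (struct kw, kws and
find_kw() below are not the haproxy definitions):

```c
#include <stddef.h>
#include <string.h>

/* Simplified keyword entry. A NULL <name> marks the end of the array. */
struct kw {
	const char *name;
	int id;
};

static const struct kw kws[] = {
	{ "show", 1 },
	{ "quic", 2 },
	{ NULL, 0 }, /* terminator: without it the loop below reads past the array */
};

/* Walks <list> up to its terminating entry. Returns the id of the keyword
 * named <name>, or -1 if it is not found.
 */
static int find_kw(const struct kw *list, const char *name)
{
	for (; list->name; list++) {
		if (strcmp(list->name, name) == 0)
			return list->id;
	}
	return -1;
}
```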
Incorrect printf format specifier "%lu" was used on "show quic" handler
for uint64_t. This breaks the build on 32-bit architectures. To fix this
portability issue, force an explicit cast to unsigned long long with
"%llu" specifier.
This must be backported up to 2.7.
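A minimal sketch of the portable form, using a hypothetical fmt_u64() helper:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats <v> into <buf> of size <size>. The explicit cast makes the
 * argument match "%llu" whatever the width of unsigned long on the target.
 * Returns the snprintf() result.
 */
static int fmt_u64(char *buf, size_t size, uint64_t v)
{
	return snprintf(buf, size, "%llu", (unsigned long long)v);
}
```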
Filtering of closing/draining connections on "show quic" was not
properly implemented. This causes the extra argument "all" to display
all connections to be without effect. This patch fixes this and restores
the output of all connections.
This must be backported up to 2.7.
Reduce default "show quic" output by masking connections in
closing/draining state due to a CONNECTION_CLOSE emission/reception. These
connections can still be displayed using the special argument "all".
This should be backported up to 2.7.
Complete "show quic" handler by displaying information about
quic_stream_desc entries. These structures are used to emit stream data
and store them until acknowledgment is received.
This should be backported up to 2.7.
Complete "show quic" handler by displaying various information related
to each encryption level and packet number space. Most notably, ack
ranges and bytes in flight are present to help debug retransmission
issues.
This should be backported up to 2.7.
Complete "show quic" handler by displaying information related to the
quic_conn owned socket. First, the FD is printed, followed by the
address of the local and remote endpoint.
This should be backported up to 2.7.
Complete "show quic" handler. Source and destination CIDs are printed
for every connection. This is completed by state information reflecting
whether the handshake is completed, whether a CONNECTION_CLOSE has been emitted
or received, and the allocation status of the attached MUX. Finally the idle
timer expiration is also printed.
This should be backported up to 2.7.
Implement a basic "show quic" CLI handler. This command will be useful
to display various information on all the active QUIC frontend
connections.
This work is heavily inspired by "show sess". Most notably, a global
list of quic_conn has been introduced to be able to loop over them. This
list is stored per thread in ha_thread_ctx.
Also add three CLI handlers for "show quic" in order to allocate and
free the command context. The dump handler runs on thread isolation.
Each quic_conn is referenced using a back-ref to handle deletion during
handler yielding.
For the moment, only a list of raw quic_conn pointers is displayed. The
handler will be completed over time with more information as needed.
This should be backported up to 2.7.
When building STREAM frames in a packet buffer, if a frame is too large
it will be split in two. A shortened version will be used and the
original frame will be modified to represent the remaining data.
To ensure there is enough space to store the frame data length encoded
as a QUIC integer, we use the function max_available_room(). This
function can return 0 if there is only a small space left which is
insufficient for the frame header and the shortened data. Prior to this
patch, this wasn't checked and an empty unneeded STREAM frame was built
and sent for nothing.
Change this by checking the value returned by max_available_room(). If 0,
do not try to split this frame and continue to the next ones in the
packet.
On 2.6, this patch serves as an optimization which will prevent the building
of unneeded empty STREAM frames.
On 2.7, this behavior has the side-effect of triggering a BUG_ON()
statement in quic_build_stream_frame(). This BUG_ON() ensures that we do
not use a quic_frame with the OFF bit set if its offset is 0. This can happen
if the condition defined above is reproduced for a STREAM frame at
offset 0. An empty unneeded frame is built as described. The problem is
that the original frame is modified with its OFF bit set even if the
offset is still 0.
This must be backported up to 2.6.
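The room computation can be sketched as below. This is not the haproxy
max_available_room() implementation: max_data_room() and varint_len() are
hypothetical names, and the sketch only accounts for the length field, not for
the rest of the frame header.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes used to encode <v> as a QUIC variable-length integer
 * (RFC 9000, section 16).
 */
static size_t varint_len(uint64_t v)
{
	if (v < (1ULL << 6))
		return 1;
	if (v < (1ULL << 14))
		return 2;
	if (v < (1ULL << 30))
		return 4;
	return 8;
}

/* Returns the largest data length which fits in <room> bytes together with
 * its own encoded length field, or 0 if no data fits at all.
 */
static size_t max_data_room(size_t room)
{
	size_t data = room;

	/* a length field takes at most 8 bytes, so this loop is short */
	while (data && varint_len(data) + data > room)
		data--;
	return data;
}
```

A caller would skip the split when 0 is returned instead of building an empty
STREAM frame.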
The SCID (source connection ID) used by a peer (client or server) is sent into the
long header of a QUIC packet in clear. But it is also sent into the transport
parameters (initial_source_connection_id). As these latter are encrypted into the
packet, one must check that these two pieces of information do not differ
due to a packet header corruption. Furthermore as such a connection is unusable
it must be killed and must stop processing RX/TX packets as soon as possible.
Implement qc_kill_con() to flag a connection as unusable and to kill it asap
by waking up the idle timer task to release the connection.
Add a check to quic_transport_params_store() to detect that the SCIDs do not
match and make it call qc_kill_con().
Add several tests about connection to be killed at several critical locations,
especially in the TLS stack callbacks to receive CRYPTO data or derive secrets,
and before preparing packets after having received others.
Must be backported to 2.6 and 2.7.
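The SCID consistency check can be sketched as follows. struct cid and
scid_matches() are simplified, hypothetical stand-ins, not the haproxy types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CID_MAXLEN 20 /* maximum connection ID length in QUIC version 1 */

/* Simplified connection ID representation. */
struct cid {
	unsigned char data[CID_MAXLEN];
	size_t len;
};

/* Returns true when the SCID found in the long header (<hdr_scid>) is the
 * same as the initial_source_connection_id transport parameter
 * (<tp_iscid>). On mismatch, the caller is expected to kill the connection.
 */
static bool scid_matches(const struct cid *hdr_scid, const struct cid *tp_iscid)
{
	return hdr_scid->len == tp_iscid->len &&
	       memcmp(hdr_scid->data, tp_iscid->data, hdr_scid->len) == 0;
}
```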
It is a bad idea to make the TLS ClientHello callback call qc_conn_finalize().
If the latter fails, this would generate a TLS alert and make the connection
send packets whereas it is not functional. But the job of qc_conn_finalize() was to
install the transport parameters sent by the QUIC listener. This installation
cannot be done at any time. It must be done after having possibly negotiated
the QUIC version and before sending the first Handshake packets. It seems
the best moment to do that is when the Handshake TX secrets are derived. This
has been found by inspecting the ngtcp2 code. Calling SSL_set_quic_transport_params()
too late would make the ServerHello be sent without the transport parameters.
The code for the connection update which was done from qc_conn_finalize() has
been moved to quic_transport_params_store(). So, this update is done as soon as
possible.
Add QUIC_FL_CONN_TX_TP_RECEIVED to flag the connection as having received the
peer transport parameters. Indeed this is required when the ClientHello message
is split between packets.
Add QUIC_FL_CONN_FINALIZED to protect the connection from calling qc_conn_finalize()
more than once. The latter is called only when the connection has received
the transport parameters and after returning from SSL_do_handshake(), which is the
function which triggers the TLS ClientHello callback call.
Remove the calls to qc_conn_finalize() from the TLS ClientHello callbacks.
Must be backported to 2.6 and 2.7.
This bug was revealed by some C1 interop tests (heavy handshake packet
corruption) when receiving 1-RTT packets with a key phase update.
This led the packet to be decrypted with the next key phase secrets.
But this latter is initialized only after the handshake is complete.
In fact, 1-RTT must never be processed before the handshake is complete.
Relying on the "qc->mux_state == QC_MUX_NULL" condition to check the
handshake is complete is wrong during 0-RTT sessions when the mux
is initialized before the handshake is complete.
Must be backported to 2.7 and 2.6.
This is not really a bug fix but an improvement. When the Handshake packet number
space has been detected as needed to be probed, we should also try to probe the
Initial packet number space if there are still packets in flight. Furthermore
we should also try to send up to two datagrams.
Must be backported to 2.6 and 2.7.
This function is called only when probing only one packet number space
(Handshake) or two times the same one (Application). So, there is no risk
of preparing the same frame twice when unneeded because we wanted to
probe two packet number spaces. The condition "ignore the packets which
have been coalesced to another one" is not necessary. More importantly,
the bug occurs when we want to prepare an Application packet which has
been coalesced to a Handshake packet. This is always the case when
the first Application packet is sent. It is always coalesced to
a Handshake packet with an ACK frame. So, when lost, this first
Application packet was never resent. It contains the HANDSHAKE_DONE
frame to confirm the completion of the handshake to the client.
Must be backported to 2.6 and 2.7.
During the handshake and when the handshake has not been confirmed
the acknowledgement delays reported by the peer may be larger
than max_ack_delay. max_ack_delay SHOULD be ignored before the
handshake is confirmed when computing the PTO. But the current code considered
the wrong condition "before the handshake is completed".
Replace the enum value QUIC_HS_ST_COMPLETED by QUIC_HS_ST_CONFIRMED to
fix this issue. In quic_loss.c, the parameter passed to quic_pto_pktns()
is renamed to avoid any possible confusion.
Must be backported to 2.7 and 2.6.
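The effect on the PTO computation can be sketched with the formula from
RFC 9002 (smoothed_rtt + max(4 * rttvar, kGranularity) + max_ack_delay). The
pto_ms() helper and its parameters are hypothetical and are not the
quic_pto_pktns() interface.

```c
#include <stdbool.h>

#define GRANULARITY_MS 1 /* kGranularity value recommended by RFC 9002 */

/* Computes a probe timeout in milliseconds. <max_ack_delay> is only taken
 * into account once the handshake is confirmed (<hs_confirmed>), not merely
 * completed.
 */
static unsigned int pto_ms(unsigned int srtt, unsigned int rttvar,
                           unsigned int max_ack_delay, bool hs_confirmed)
{
	unsigned int var = 4 * rttvar;
	unsigned int pto = srtt + (var > GRANULARITY_MS ? var : GRANULARITY_MS);

	if (hs_confirmed)
		pto += max_ack_delay;
	return pto;
}
```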
This may happen during retransmission of frames which can be split
(CRYPTO or STREAM frames). One may have to split a frame to be
retransmitted due to the QUIC protocol properties (packet size limitation
and packet field encoding sizes). The remaining part of a frame which
cannot be retransmitted must be detached from the original frame it is
copied from. If not, when the part actually sent is acknowledged,
the remaining part will be acknowledged too even though it was never sent!
Must be backported to 2.7 and 2.6.
Define a new configuration option "tune.quic.max-frame-loss". This is
used to specify the limit for which a single frame instance can be
detected as lost. If exceeded, the connection is closed.
This should be backported up to 2.7.
Add a new <loss_count> field in quic_frame structure. This field is set
to 0 and incremented each time a sent packet is declared lost. If
<loss_count> reaches a hard-coded limit, the connection is deemed as
failing and is closed immediately with a CONNECTION_CLOSE using
INTERNAL_ERROR.
By default, limit is set to 10. This should ensure that overall memory
usage is limited if a peer behaves incorrectly.
This should be backported up to 2.7.
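A minimal sketch of the per-frame loss counter. struct frame and
frame_mark_lost() are hypothetical, and whether the limit is hit when the
counter reaches or exceeds 10 is an assumption of this sketch.

```c
#include <stdbool.h>

#define MAX_FRAME_LOSS 10 /* default limit mentioned above */

/* Simplified frame: only the loss counter is kept. */
struct frame {
	unsigned int loss_count;
};

/* Called each time a packet carrying <frm> is declared lost. Returns true
 * when the limit is reached, in which case the caller should close the
 * connection.
 */
static bool frame_mark_lost(struct frame *frm)
{
	return ++frm->loss_count >= MAX_FRAME_LOSS;
}
```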
Define a new function qc_frm_free() to handle frame deallocation. New
BUG_ON() statements ensure that the deallocated frame is not referenced
by other frame. To support this, all LIST_DELETE() have been replaced by
LIST_DEL_INIT(). This should enforce that frame deallocation is robust.
As a complement, qc_frm_unref() has been moved into quic_frame module.
It is justified as this is a utility function related to frame
deallocation. It allows to use it in quic_pktns_tx_pkts_release() before
calling qc_frm_free().
This should be backported up to 2.7.
Define two utility functions for quic_frame allocation :
* qc_frm_alloc() is used to allocate a new frame
* qc_frm_dup() is used to allocate a new frame by duplicating an
existing one
These functions are useful to centralize quic_frame initialization.
Note that pool_zalloc() is replaced by a proper pool_alloc() + explicit
initialization code.
This commit will simplify implementation of the per frame retransmission
limitation. Indeed, a new counter will be added in quic_frame structure
which must be initialized to 0.
This should be backported up to 2.7.
Care must be taken when reading/writing offset for STREAM frames. A
special OFF bit is set in the frame type to indicate that the field is
present. If not set, it is assumed that offset is 0.
To represent this, offset field of quic_stream structure must always be
initialized with a valid value with regard to its frame type OFF bit.
The previous code has no bug in part because pool_zalloc() is used to
allocate quic_frame instances. To be able to use pool_alloc(), offset is
always explicitly set to 0. If a non-null value is used, OFF bit is set
at the same time. A new BUG_ON() statement is added on frame builder
to ensure that the caller has set OFF bit if offset is non null.
This should be backported up to 2.7.
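The relation between the offset and the OFF bit can be sketched as below. The
bit values come from RFC 9000 (STREAM frame types 0x08 to 0x0f, OFF bit 0x04).
The structure and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define STREAM_FRAME_TYPE         0x08 /* base STREAM frame type */
#define STREAM_FRAME_TYPE_OFF_BIT 0x04 /* offset field is present */

/* Simplified STREAM frame. */
struct stream_frm {
	uint8_t type;
	uint64_t offset;
};

/* Sets the offset and keeps the OFF bit consistent with it. */
static void stream_frm_set_offset(struct stream_frm *frm, uint64_t offset)
{
	frm->offset = offset;
	if (offset)
		frm->type |= STREAM_FRAME_TYPE_OFF_BIT;
	else
		frm->type &= (uint8_t)~STREAM_FRAME_TYPE_OFF_BIT;
}

/* Check a frame builder could enforce: a non-null offset requires the OFF
 * bit to be set.
 */
static bool stream_frm_off_ok(const struct stream_frm *frm)
{
	return !frm->offset || (frm->type & STREAM_FRAME_TYPE_OFF_BIT);
}
```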
A dedicated <fin> field was used in quic_stream structure. However, this
info is already encoded in the frame type field as specified by QUIC
protocol.
In fact, only code for packet reception used the <fin> field. On the
sending side, we only checked for the FIN bit. To align both sides,
remove the <fin> field and only use the FIN bit.
This should be backported up to 2.7.
It is forbidden to request h3 clients to close their Control and QPACK unidirectional
streams. Otherwise, the client closes the connection with H3_CLOSED_CRITICAL_STREAM(0x104).
Perhaps this could prevent some clients such as Chrome from coming back for a while.
But at quic_conn level there is no means to identify the streams for which we cannot
send STOP_SENDING frames. Such a possibility is not even mentioned in RFC 9000.
At this time there is no choice other than to stop sending STOP_SENDING frames for
all the h3 unidirectional streams, inspecting the ->app_ops quic_conn value.
Must be backported to 2.7 and 2.6.
Set "disable_active_migration" transport parameter to inform the peer that
haproxy listeners do not support the connection migration feature.
Also drop all received datagrams with a modified source address.
Must be backported to 2.7.
Implement RESET_STREAM reception by mux-quic. On reception, the qcs instance
will be marked as remotely closed and its Rx buffer released. The stream
layer will be flagged on error if still attached.
This commit is part of implementing H3 errors at the stream level.
Indeed, on H3 stream errors, STOP_SENDING + RESET_STREAM should be
emitted. The STOP_SENDING will in turn generate a RESET_STREAM by the
remote peer which will be handled thanks to this patch.
This should be backported up to 2.7.
Shards were completely forgotten in commit f5a0c8abf ("MEDIUM: quic:
respect the threads assigned to a bind line"). The thread mask is
taken from the bind_conf, but since shards were introduced in 2.5,
the per-listener mask is held by the receiver and can be smaller
than the bind_conf's mask.
The effect here is that the traffic is not distributed to the
appropriate thread. At first glance it's not dramatic since it remains
one of the threads eligible by the bind_conf, but it still means that
in some contexts such as "shards by-thread", some concurrency may
persist on listeners while they're expected to be alone. One identified
impact is that it requires more rxbufs than necessary, but there may
possibly be other not yet identified side effects.
This must be backported to 2.7 and everywhere the commit above is
backported.
There is a possible segfault when accessing qc->timer_task in
quic_conn_io_cb() without testing it. It seems however very rare as it
requires several conditions to be met.
* quic_conn must be in CLOSING state after having sent a
CONNECTION_CLOSE which frees the qc.timer_task
* quic_conn handshake must still be in progress : in fact, qc.timer_task
is accessed on this path because of the anti-amplification limit
lifted.
I was unable thus far to trigger it but benchmarking tests seem to have
triggered it with the following backtrace as a result :
#0 _task_wakeup (f=4096, caller=0x5620ed004a40 <_.46868>, t=0x0) at include/haproxy/task.h:195
195 state = _HA_ATOMIC_OR_FETCH(&t->state, f);
[Current thread is 1 (Thread 0x7fc714ff1700 (LWP 14305))]
(gdb) bt
#0 _task_wakeup (f=4096, caller=0x5620ed004a40 <_.46868>, t=0x0) at include/haproxy/task.h:195
#1 quic_conn_io_cb (t=0x7fc5d0e07060, context=0x7fc5d0df49c0, state=<optimized out>) at src/quic_conn.c:4393
#2 0x00005620ecedab6e in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:596
#3 0x00005620ecedb63c in process_runnable_tasks () at src/task.c:861
#4 0x00005620ecea971a in run_poll_loop () at src/haproxy.c:2913
#5 0x00005620ecea9cf9 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#6 0x00007fc773c3f609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007fc77372d133 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) up
#1 quic_conn_io_cb (t=0x7fc5d0e07060, context=0x7fc5d0df49c0, state=<optimized out>) at src/quic_conn.c:4393
4393 task_wakeup(qc->timer_task, TASK_WOKEN_MSG);
(gdb) p qc
$1 = (struct quic_conn *) 0x7fc5d0df49c0
(gdb) p qc->timer_task
$2 = (struct task *) 0x0
This fix should be backported up to 2.6.
This patch is the follow up of previous fix :
BUG/MINOR: quic: properly handle alloc failure in qc_new_conn()
quic_conn owned socket FD is initialized as soon as possible in
qc_new_conn(). This guarantees that we can safely call
quic_conn_release() on allocation failure. This function uses internally
qc_release_fd() to free the socket FD unless it has been initialized to
an invalid FD value.
Without this patch, a segfault will occur if one inner allocation of
qc_new_conn() fails before qc.fd is initialized.
This change is linked to quic-conn owned socket implementation.
This should be backported up to 2.7.
qc_new_conn() is used to allocate a quic_conn instance and its various
internal members. If one allocation fails, quic_conn_release() is used
to cleanup things.
For the moment, pool_zalloc() is used which ensures that all content is
null. However, some members must be initialized to special values
to be able to use quic_conn_release() safely. This is the case for
quic_conn lists and its tasklet.
Also, some quic_conn internal allocation functions were doing their own
cleanup on failure without resetting the pointers to NULL. This caused an
issue with quic_conn_release() which also frees these members. To fix this, these
functions now only return an error without cleanup. It is the caller's
responsibility to free the allocated content, which is done via
quic_conn_release().
Without this patch, allocation failure in qc_new_conn() would often
result in segfault. This was reproduced easily using fail-alloc at 10%.
This should be backported up to 2.6.
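The pattern described above (preset every member to a safe value, then rely on
a single release function on any failure) can be sketched as follows. struct
conn, conn_new() and conn_release() are hypothetical, and <fail_step> only
exists to simulate an allocation failure.

```c
#include <stdlib.h>

struct conn {
	char *rxbuf;
	char *txbuf;
	int fd; /* -1 means "no socket" */
};

/* Safe at any point once the members have been preset by conn_new():
 * free(NULL) is a no-op and a negative fd is simply skipped.
 */
static void conn_release(struct conn *c)
{
	if (!c)
		return;
	free(c->rxbuf);
	free(c->txbuf);
	/* a real implementation would close c->fd here when c->fd >= 0 */
	free(c);
}

/* <fail_step> simulates a failure: 1 = rxbuf, 2 = txbuf, 0 = none. */
static struct conn *conn_new(int fail_step)
{
	struct conn *c = malloc(sizeof(*c));

	if (!c)
		return NULL;

	/* preset every member before anything else can fail */
	c->rxbuf = NULL;
	c->txbuf = NULL;
	c->fd = -1;

	c->rxbuf = (fail_step == 1) ? NULL : malloc(64);
	if (!c->rxbuf)
		goto err;
	c->txbuf = (fail_step == 2) ? NULL : malloc(64);
	if (!c->txbuf)
		goto err;
	return c;

 err:
	/* inner steps do no cleanup of their own: one release path only */
	conn_release(c);
	return NULL;
}
```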
UDP addresses may change over time for a QUIC connection. When using
quic-conn owned socket, we have to detect address change to break the
bind/connect association on the socket.
For the moment, on change detected, QUIC connection socket is closed and
a new one is opened. In the future, we may improve this by trying to
keep the original socket and reexecute only bind/connect syscalls.
This change is part of quic-conn owned socket implementation.
It may be backported to 2.7 after a period of observation.
This change is the second part for reception on QUIC connection socket.
All operations inside the FD handler have been deferred to the quic-conn
tasklet via the new function qc_rcv_buf().
With this change, buffer management on reception has been simplified. It
is now possible to use a local buffer inside qc_rcv_buf() instead of
quic_receiver_buf().
This change is part of quic-conn owned socket implementation.
It may be backported to 2.7 after a period of observation.
Try to use the quic-conn socket for reception if it is allocated. For
this, the socket is inserted in the fdtab. This will call the new
handler quic_conn_io_cb() which is responsible to process the recv()
system call. It will reuse datagram dispatch for simplicity. However,
this is guaranteed to be called on the quic-conn thread, so it will be
more efficient to use a dedicated buffer. This will be implemented in
another commit.
This patch should improve performance by reducing contention on the
receiver socket. However, more gain can be obtained when the datagram
dispatch operation will be skipped.
Older quic_sock_fd_iocb() is renamed to quic_lstnr_sock_fd_iocb() to
emphasize its usage for the receiver socket.
This change is part of quic-conn owned socket implementation.
It may be backported to 2.7 after a period of observation.
Allocate quic-conn owned socket if possible. This requires that this is
activated in haproxy configuration. Also, this is done only if local
address is known so it depends on the support of IP_PKTINFO.
For the moment this socket is not used. This causes QUIC support to be
broken as received datagram are not read. This commit will be completed
by a following patch to support recv operation on the newly allocated
socket.
This change is part of quic-conn owned socket implementation.
It may be backported to 2.7 after a period of observation.
The QUIC protocol supports connection migration, which allows maintaining the
connection even if the client has changed its network address.
RFC 9000 stipulates that address migration is forbidden before the handshake
has been completed. Add a check for this : silently drop every datagram
if the client network address has changed until handshake completion.
This commit is one of the first steps towards QUIC connection migration
support.
This should be backported up to 2.7.
Detect connection migration attempted by the client. This is done by
comparing addresses stored in quic-conn with src/dest addresses of the
UDP datagram.
A new function qc_handle_conn_migration() has been added. For the
moment, no operation is conducted and the function will be completed
during connection migration implementation. The only notable thing is
the increment of a new counter "quic_conn_migration_done".
This should be backported up to 2.7.
Extract individual datagram parsing code outside of datagrams list loop
in quic_lstnr_dghdlr(). This is moved in a new function named
quic_dgram_parse().
To complete this change, quic_lstnr_dghdlr() has been moved into
quic_sock source file : it belongs to QUIC socket lower layer and is
directly called by quic_sock_fd_iocb().
This commit will ease implementation of quic-conn owned socket.
New function quic_dgram_parse() will be easily usable after a receive
operation done on quic-conn IO-cb.
This should be backported up to 2.7.
quic_rx_packet struct had a reference to the quic_conn instance. This is
useless as qc instance is always passed through function argument. In
fact, pkt.qc is used only in qc_pkt_decrypt() on key update, even though
qc is also passed as argument.
Simplify this by removing qc field from quic_rx_packet structure
definition. Also clean up qc_pkt_decrypt() documentation and interface
to align it with other quic-conn related functions.
This should be backported up to 2.7.
qc_dgrams_retransmit() could reuse the same local list and could splice it twice
to the packet number space list of frames to be sent/resent. This creates a
loop in this list and makes qc_build_frms() possibly loop endlessly when trying
to build frames from the packet number space list of frames. Then haproxy aborts.
This issue could be easily reproduced by patching the qc_build_frms() function to
set the <dlen> variable value to 0 after having built at least 10 CRYPTO frames
and using ngtcp2 as client with 30% packet loss in both directions.
Thank you to @gabrieltz for having reported this issue in GH #1903.
Must be backported to 2.6.
Gcc 6.5 is now well known for triggering plenty of false "may be used
uninitialized", particularly at -O1, and two of them happen in quic,
quic_tp and quic_conn. Both of them were reviewed and easily confirmed
as wrong (gcc seems to ignore the control flow after the function
returns and believes error conditions are not met). Let's just preset
the variables that bother it. In quic_tp the initialization was moved
out of the loop since there's no point inflating the code just to
silence a stupid warning.
This previous patch was not sufficient to prevent haproxy from
crashing when some Handshake packets had to be inspected before being possibly
retransmitted:
"BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets"
This patch introduced another issue: access to packets which have been
released while still attached to others (in the same datagram). This was
the case for instance when discarding the Initial packet number space before
inspecting a Handshake packet in the same datagram through its ->prev or
->next member (->prev in our case).
This patch implements quic_tx_packet_dgram_detach() which detaches a packet
from the adjacent ones in the same datagram, to be called when acknowledging
a packet (as done in the previous commit) and when releasing its memory. This
way, we are sure the released packets will not be accessed during retransmissions.
Thank you to @gabrieltz for having reported this issue in GH #1903.
Must be backported to 2.6.
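Detaching a packet from its neighbours in a datagram can be sketched as below.
This is not the haproxy implementation: struct tx_pkt is a hypothetical
stand-in, and relinking the two neighbours to each other is an assumption of
this sketch.

```c
#include <stddef.h>

/* Simplified TX packet: only the links to the adjacent packets of the
 * same datagram are kept.
 */
struct tx_pkt {
	struct tx_pkt *prev;
	struct tx_pkt *next;
};

/* Detaches <pkt> from the adjacent packets so that none of them keeps a
 * pointer to it once it is acknowledged or released.
 */
static void tx_pkt_dgram_detach(struct tx_pkt *pkt)
{
	struct tx_pkt *prev = pkt->prev;
	struct tx_pkt *next = pkt->next;

	if (prev)
		prev->next = next;
	if (next)
		next->prev = prev;
	pkt->prev = NULL;
	pkt->next = NULL;
}
```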
As revealed by some traces provided by @gabrieltz in GH #1903 issue,
there are clients (chrome I guess) which acknowledge only one packet among others
in the same datagram. This is the case for the first datagram sent by a QUIC haproxy
listener, made of an Initial packet followed by a Handshake one. In this identified
case, only the Handshake packet is acknowledged. But if the
client is able to respond with a Handshake packet (ACK frame) this is because
it has successfully parsed the Initial packet. So, why not also acknowledge it?
AFAIK, this is mandatory. On our side, when retransmitting this datagram, the
Handshake packet was accessed from the Initial packet after having been released.
Anyway, there is an issue on our side. Obviously, we must not expect an
implementation to respect the RFC especially when it wants to build an attack ;)
With this simple patch for each TX packet we send, we also set the previous one
in addition to the next one. When a packet is acknowledged, we detach the previous one
and the next one in the same datagram from this packet, so that it cannot be
resent when resending these packets (the previous one, in our case).
Thank you to @gabrieltz for having reported this issue.
Must be backported to 2.6.
Add more traces to follow CRYPTO data buffering in ncbuf. Offset for
quic_enc_level is now reported for event QUIC_EV_CONN_PRHPKTS. Also
ncb_advance() must never fail so a BUG_ON() statement is here to
guarantee it.
This was useful to track handshake failure reported too often. This is
related to github issue #1903.
This should be backported up to 2.6.
Liberate quic_enc_level ncbuf in quic_stream_free(). In most cases, this
will already be done when handshake is completed via
qc_treat_rx_crypto_frms(). However, if a connection is released before
handshake completion, a leak was present without this patch.
Under normal situation, this leak should have been limited due to the
majority of QUIC connection success on handshake. However, another bug
caused handshakes to fail too frequently, especially with chrome client.
This had the side-effect to dramatically increase this memory leak.
This should fix in part github issue #1903.
QUIC handshakes were frequently in error due to haproxy misuse of
ncbuf. This resulted in one of the following scenarios :
- handshake rejected with CONNECTION_CLOSE due to overlapping data
rejected
- CRYPTO data fully received by haproxy but handshake completion signal
not reported causing the client to emit PING repeatedly before timeout
This was produced because ncb_advance() result was not checked after
providing crypto data to the SSL stack in qc_provide_cdata(). However,
this operation can fail if a too small gap is formed. In the meantime,
quic_enc_level offset was always incremented. In some cases, this caused
next ncb_add() to report rejected overlapping data. In other cases, no
error was reported but SSL stack never received the end of CRYPTO data.
Change slightly the handling of new CRYPTO frames to avoid this bug :
when receiving a CRYPTO frame for the current offset, handle them
directly as previously done only if quic_enc_level ncbuf is empty. In
the other case, copy them to the buffer before treating contiguous data
via qc_treat_rx_crypto_frms().
This change ensures that ncb_advance() operation is now conducted only
in a data block : thus this is guaranteed to never fail.
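The new handling can be sketched as follows. This is a toy model only: struct
crypto_rx, its fixed-size flat buffer and every function name below are
assumptions made for illustration and do not reflect the ncbuf API.

```c
#include <stddef.h>
#include <string.h>

#define CBUF_SZ 64

/* Hypothetical stand-in for the per-level CRYPTO reception state. */
struct crypto_rx {
	unsigned long long off;       /* next offset expected by the TLS stack */
	unsigned char data[CBUF_SZ];  /* bytes stored relative to <off> */
	unsigned char set[CBUF_SZ];   /* 1 if the matching byte is present */
	size_t buffered;              /* number of bytes currently buffered */
	unsigned char tls[256];       /* what the TLS stack received, in order */
	size_t tls_len;
};

/* Hand <len> in-order bytes to the (simulated) TLS stack. */
static void deliver(struct crypto_rx *rx, const unsigned char *p, size_t len)
{
	size_t room = sizeof(rx->tls) - rx->tls_len;
	size_t cpy = len < room ? len : room;

	memcpy(rx->tls + rx->tls_len, p, cpy);
	rx->tls_len += cpy;
	rx->off += len;
}

/* Deliver the contiguous block at the head of the buffer, if any, then
 * advance the buffer. The advance always happens inside a data block.
 */
static void treat_buffered(struct crypto_rx *rx)
{
	size_t n = 0;

	while (n < CBUF_SZ && rx->set[n])
		n++;
	if (!n)
		return;
	deliver(rx, rx->data, n);
	memmove(rx->data, rx->data + n, CBUF_SZ - n);
	memmove(rx->set, rx->set + n, CBUF_SZ - n);
	memset(rx->set + CBUF_SZ - n, 0, n);
	rx->buffered -= n;
}

/* Returns 0 on success, -1 if the data do not fit in the buffer. */
static int handle_crypto_frm(struct crypto_rx *rx, unsigned long long off,
                             const void *buf, size_t len)
{
	const unsigned char *p = buf;
	unsigned long long rel;
	size_t i;

	if (off < rx->off) {
		/* trim the part which was already delivered */
		unsigned long long skip = rx->off - off;

		if (skip >= len)
			return 0;
		p += skip;
		len -= (size_t)skip;
		off = rx->off;
	}

	/* direct handling only if nothing is buffered */
	if (off == rx->off && !rx->buffered) {
		deliver(rx, p, len);
		return 0;
	}

	/* otherwise copy first, then treat contiguous data */
	rel = off - rx->off;
	if (rel >= CBUF_SZ || len > CBUF_SZ - rel)
		return -1;
	for (i = 0; i < len; i++) {
		if (!rx->set[rel + i]) {
			rx->data[rel + i] = p[i];
			rx->set[rel + i] = 1;
			rx->buffered++;
		}
	}
	treat_buffered(rx);
	return 0;
}
```

In this model a frame at the current offset takes the copy path as soon as
some out-of-order data are pending, which matches the first drawback listed
below.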
This bug was easily reproduced with chromium as it fragments CRYPTO
frames randomly in several frames out of order.
This commit has two drawbacks :
- it is slightly worse for performance because sometimes even
data at the current offset will be copied with memcpy
- if a client uses too many fragmented CRYPTO frames, this can cause
repeated ncb_add() error on gap size. This can be reproduced with
chrome, albeit with a slightly less frequent rate than the fixed issue.
This change should fix in part github issue #1903.
This must be backported up to 2.6.
With GCC 12.2.0 and O2 optimization activated, compiler reports the
following warning for qc_release_lost_pkts().
In function ‘quic_tx_packet_refdec’,
inlined from ‘qc_release_lost_pkts.constprop’ at src/quic_conn.c:2056:3:
include/haproxy/atomic.h:320:41: error: ‘__atomic_sub_fetch_4’ writing 4 bytes into a region of size 0 overflows the destination [-Werror=stringop-overflow=]
320 | #define HA_ATOMIC_SUB_FETCH(val, i) __atomic_sub_fetch(val, i, __ATOMIC_SEQ_CST)
| ^~~~~~~~~~~~~~~~~~
include/haproxy/quic_conn.h:499:14: note: in expansion of macro ‘HA_ATOMIC_SUB_FETCH’
499 | if (!HA_ATOMIC_SUB_FETCH(&pkt->refcnt, 1)) {
| ^~~~~~~~~~~~~~~~~~~
GCC thinks that quic_tx_packet_refdec() can be called with a NULL
argument from qc_release_lost_pkts() with <oldest_lost> as arg.
This warning is a false positive as <oldest_lost> cannot be NULL in
qc_release_lost_pkts() at this stage. This is due to the previous check
to ensure that <pkts> list is not empty.
This warning is silenced by using ALREADY_CHECKED() macro.
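As an illustration of the technique, here is a sketch for GCC/Clang using a
locally defined macro. SKETCH_ALREADY_CHECKED, struct pkt and release_last()
are hypothetical names; whether a warning is emitted without the macro depends
on the compiler version and optimization level.

```c
#include <stddef.h>

/* Hypothetical local macro (GCC/Clang only): an empty asm statement which
 * pretends to rewrite <p>, so that the optimizer can no longer reason about
 * its previous value. No instruction is emitted.
 */
#define SKETCH_ALREADY_CHECKED(p) do { __asm__("" : "=rm"(p) : "0"(p)); } while (0)

struct pkt {
	struct pkt *next;
	int refcnt;
};

/* Decrement the reference counter of the last packet of the <head> list and
 * return its new value, or -1 if the list is empty.
 */
static int release_last(struct pkt *head)
{
	struct pkt *last = NULL, *p;

	if (!head)
		return -1; /* emptiness is checked up front */

	for (p = head; p; p = p->next)
		last = p;

	/* <last> cannot be NULL here: the list is not empty */
	SKETCH_ALREADY_CHECKED(last);
	return --last->refcnt;
}
```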
This should be backported up to 2.6.
This should fix github issue #1852.
Subscribing was not properly designed between quic-conn and quic MUX
layers. Align this with other haproxy components : <subs> field is
moved from the MUX to the quic-conn structure. All mentions of qcc MUX are
cleaned up in quic_conn_subscribe()/quic_conn_unsubscribe().
Thanks to this change, ACK reception notification has been simplified.
It's now unnecessary to check for the MUX existence before waking it.
Instead, if <subs> quic-conn field is set, just wake-up the upper layer
tasklet without mentioning MUX. This should probably be extended to
other parts in quic-conn code.
This should be backported up to 2.6.
On Initial packet reception, token is checked for validity through
quic_retry_token_check() function. However, some related parts were left
in the parent function quic_rx_pkt_retrieve_conn(). Move this code
directly into quic_retry_token_check() to facilitate its call in various
contexts.
The API of quic_retry_token_check() has also been refactored. Instead of
working on a plain char* buffer, it now uses a quic_rx_packet instance.
This helps to reduce the number of parameters.
This change will allow checking the Retry token even if data were received
with a FD-owned quic-conn socket. Indeed, in this case,
quic_rx_pkt_retrieve_conn() call will probably be skipped.
This should be backported up to 2.6.
Sometimes, a packet is dropped on reception. Several goto statements are
used, mostly to increment a proxy drop counter or drop silently the
packet. However, these labels are interleaved. Rearrange goto labels to
simplify this process :
* drop label is used to drop a packet with counter incrementation. This
is the default method.
* drop_silent is the next label which does the same thing but skips the
counter incrementation. This is useful when we do not need to report
the packet dropping operation.
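The resulting label layout can be sketched like this (rx_pkt(), struct
rx_stats and the RX_* conditions are hypothetical, only the two label names
come from this patch):

```c
/* Hypothetical drop counter and packet conditions, for illustration only. */
struct rx_stats {
	unsigned int dropped;
};

enum { RX_OK = 0, RX_MALFORMED, RX_DUPLICATE };

/* Returns 1 if the packet is accepted, 0 if it is dropped. */
static int rx_pkt(int kind, struct rx_stats *st)
{
	if (kind == RX_MALFORMED)
		goto drop;         /* default method: counted */
	if (kind == RX_DUPLICATE)
		goto drop_silent;  /* nothing to report */
	return 1;

 drop:
	st->dropped++;
	/* fall through */
 drop_silent:
	return 0;
}
```

Since drop falls through into drop_silent, the common cleanup code is written
only once.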
This should be backported up to 2.6.
This change follows up on the qc_lstnr_pkt_rcv() refactoring. This
function has finally been split into several ones.
The first half is renamed quic_rx_pkt_parse(). This function is
responsible for parsing a QUIC packet header and calculating the packet
length.
QUIC connection retrieval has been extracted and is now called directly
by quic_lstnr_dghdlr().
The second half of qc_lstnr_pkt_rcv() is renamed to qc_rx_pkt_handle().
This function is responsible for copying a QUIC packet content to a
quic-conn receive buffer.
A third function named qc_rx_check_closing() is responsible for detecting if
the connection is already in closing state. As this requires dropping the
whole datagram, it seems justified to be in a separate function.
This change has no functional impact. It is part of a refactoring series
on qc_lstnr_pkt_rcv(). The objective is to facilitate the integration of
FD-owned quic-conn socket patches.
This should be backported up to 2.6.
Simplify qc_lstnr_pkt_rcv() by extracting code responsible to retrieve
the quic-conn instance. This code is put in a dedicated function named
quic_rx_pkt_retrieve_conn(). This new function could be skipped if a
FD-owned quic-conn socket is used.
The first traces of qc_lstnr_pkt_rcv() have been cleaned up as qc instance
is always NULL here : thus qc parameter can be removed without any
change.
This change has no functional impact. It is a part of a refactoring
series on qc_lstnr_pkt_rcv(). The objective is to facilitate integration of
FD-owned socket patches.
This should be backported up to 2.6.
Received packets treatment has some difference regarding if this is the
first one or not of the encapsulating datagram. Previously, this was set
via a function argument. Simplify this by defining a new Rx packet flag
named QUIC_FL_RX_PACKET_DGRAM_FIRST.
This change does not have functional impact. It will simplify API when
qc_lstnr_pkt_rcv() is broken into several functions : their number of
arguments will be reduced thanks to this patch.
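A small sketch of the idea, assuming a reduced packet structure. The bit value
given to QUIC_FL_RX_PACKET_DGRAM_FIRST and the two helpers are assumptions
made for the example.

```c
#include <stddef.h>

/* The flag name comes from this patch, its value here is an assumption. */
#define QUIC_FL_RX_PACKET_DGRAM_FIRST (1U << 0)

/* Hypothetical, reduced Rx packet. */
struct rx_pkt {
	unsigned int flags;
};

/* Mark the packets parsed from one datagram: only the first one gets the
 * flag, so later functions do not need an extra "first" argument.
 */
static void dgram_mark_pkts(struct rx_pkt *pkts, size_t nb)
{
	size_t i;

	for (i = 0; i < nb; i++) {
		pkts[i].flags &= ~QUIC_FL_RX_PACKET_DGRAM_FIRST;
		if (i == 0)
			pkts[i].flags |= QUIC_FL_RX_PACKET_DGRAM_FIRST;
	}
}

/* Any function receiving the packet can query its position itself. */
static int rx_pkt_is_dgram_first(const struct rx_pkt *pkt)
{
	return !!(pkt->flags & QUIC_FL_RX_PACKET_DGRAM_FIRST);
}
```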
This should be backported up to 2.6.
pn_offset field was only set if header protection could not be removed.
Extend the usage of this field : it is now set every time on packet
parsing in qc_lstnr_pkt_rcv().
This change helps to clean up API of Rx functions by removing
unnecessary variables and function arguments.
This change has no functional impact. It is a part of a refactoring
series on qc_lstnr_pkt_rcv(). The objective is to facilitate integration of
FD-owned socket patches.
This should be backported up to 2.6.
Add a new field version on quic_rx_packet structure. This is set on
header parsing in qc_lstnr_pkt_rcv() function.
This change has no functional impact. It is a part of a refactoring
series on qc_lstnr_pkt_rcv(). The objective is to facilitate integration of
FD-owned socket patches.
This should be backported up to 2.6.
When generating a Retry token, client CID is used as encryption input.
The client must reuse the same CID when emitting the token in a new
Initial packet.
A memory overflow can occur on quic_generate_retry_token() depending on
the size of client CID. This is because space reserved for <aad> only
accounted for QUIC_HAP_CID_LEN (size of haproxy owned generated CID).
However, the client CID size only depends on client parameter and is
instead limited to QUIC_CID_MAXLEN as specified in RFC9000.
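To illustrate the sizing issue, here is a sketch with a deliberately reduced
AAD layout (version followed by the client CID). This layout, build_aad() and
the value used for QUIC_HAP_CID_LEN are assumptions; only the 20-byte limit
comes from RFC 9000.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUIC_HAP_CID_LEN  8  /* assumed size of haproxy generated CIDs */
#define QUIC_CID_MAXLEN  20  /* maximum CID length for QUIC v1 (RFC 9000) */

/* Hypothetical AAD layout: the buffer must be sized for the largest CID a
 * client may choose, not for the size of locally generated CIDs.
 */
#define SKETCH_AAD_MAXLEN (sizeof(uint32_t) + QUIC_CID_MAXLEN)

/* Build the AAD into <aad> of size <aadsz>. Returns the number of bytes
 * written, or 0 if the CID is invalid or does not fit.
 */
static size_t build_aad(unsigned char *aad, size_t aadsz, uint32_t version,
                        const unsigned char *cid, size_t cidlen)
{
	if (cidlen > QUIC_CID_MAXLEN || sizeof(version) + cidlen > aadsz)
		return 0;

	memcpy(aad, &version, sizeof(version));
	memcpy(aad + sizeof(version), cid, cidlen);
	return sizeof(version) + cidlen;
}
```

A 17-byte client CID fits in a buffer of SKETCH_AAD_MAXLEN bytes but not in
one sized with QUIC_HAP_CID_LEN, which is the kind of write reported by ASAN
below.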
This was reproduced with ngtcp2 and haproxy built with ASAN. Here is the error
log :
==14964==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffee228cee at pc 0x7ffff785f427 bp 0x7fffee2289e0 sp 0x7fffee228188
WRITE of size 17 at 0x7fffee228cee thread T5
#0 0x7ffff785f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
#1 0x555555906ea7 in quic_generate_retry_token_aad src/quic_conn.c:5452
#2 0x555555907e72 in quic_retry_token_check src/quic_conn.c:5577
#3 0x55555590d01e in qc_lstnr_pkt_rcv src/quic_conn.c:6103
#4 0x5555559190fa in quic_lstnr_dghdlr src/quic_conn.c:7179
#5 0x555555eb0abf in run_tasks_from_lists src/task.c:590
#6 0x555555eb285f in process_runnable_tasks src/task.c:855
#7 0x555555d9118f in run_poll_loop src/haproxy.c:2853
#8 0x555555d91f88 in run_thread_poll_loop src/haproxy.c:3042
#9 0x7ffff709f8fc (/usr/lib/libc.so.6+0x868fc)
#10 0x7ffff7121a5f (/usr/lib/libc.so.6+0x108a5f)
This must be backported up to 2.6.
Right now the QUIC thread mapping derives the thread ID from the CID
by dividing by global.nbthread. This is a problem because this makes
QUIC work on all threads and ignores the "thread" directive on the
bind lines. In addition, only 8 bits are used, which is no longer
compatible with the up to 4096 threads we may have in a configuration.
Let's modify it this way:
- the CID now dedicates 12 bits to the thread ID
- on output we continue to place the TID directly there.
- on input, the value is extracted. If it corresponds to a valid
thread number of the bind_conf, it's used as-is.
- otherwise it's used as a rank within the current bind_conf's
thread mask so that in the end we still get a valid thread ID
for this bind_conf.
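The input side of this mapping could look like the sketch below. It is
simplified on purpose: the thread mask is a single 64-bit word without thread
groups, the 12-bit value is assumed to be already extracted from the CID, and
reducing the rank modulo the number of allowed threads is an assumption of
this example. All names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_CID_TID_BITS 12
#define SKETCH_CID_TID_MASK ((1U << SKETCH_CID_TID_BITS) - 1)

/* Number of bits set in <mask>. */
static unsigned int mask_count(uint64_t mask)
{
	unsigned int nb = 0;

	for (; mask; mask &= mask - 1)
		nb++;
	return nb;
}

/* Index of the <rank>-th bit set in <mask> (rank 0 is the lowest one), or
 * 64 if there is no such bit.
 */
static unsigned int mask_nth_bit(uint64_t mask, unsigned int rank)
{
	unsigned int bit;

	for (bit = 0; bit < 64; bit++) {
		if (!(mask & (1ULL << bit)))
			continue;
		if (!rank)
			return bit;
		rank--;
	}
	return 64;
}

/* Map the value carried by a CID to a thread allowed by <thread_mask>.
 * Returns 0 if the mask is empty.
 */
static unsigned int cid_val_to_tid(unsigned int cid_val, uint64_t thread_mask)
{
	unsigned int tid = cid_val & SKETCH_CID_TID_MASK;

	if (!thread_mask)
		return 0;

	/* valid thread for this bind_conf: used as-is */
	if (tid < 64 && (thread_mask & (1ULL << tid)))
		return tid;

	/* otherwise used as a rank within the thread mask */
	return mask_nth_bit(thread_mask, tid % mask_count(thread_mask));
}
```

For example, with threads 2, 5 and 7 allowed, value 5 maps to thread 5 while
value 3 is taken as rank 0 and maps to thread 2.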
The extraction function now requires a bind_conf in order to get the
group and thread mask. It was better to use bind_confs now as the goal
is to make them support multiple listeners sooner or later.
QUIC datagrams are read from a random thread. They are then redispatched
to the connection thread according to the first packet DCID. These
operations are implemented through a special buffer designed to avoid
locking.
Refactor this code with the following changes :
* <rxbuf> type is renamed <quic_receiver_buf>. Its list element is also
renamed to highlight its attach point to a receiver.
* <quic_dgram> and <quic_receiver_buf> definitions are moved to
quic_sock-t.h. This helps to reduce the size of quic_conn-t.h.
* <quic_dgram> list elements are renamed to highlight their attach point
into a <quic_receiver_buf> and a <quic_dghdlr>.
This should be backported up to 2.6.
Implement quic_tls_secrets_keys_alloc()/quic_tls_secrets_keys_free() to allocate
the memory for only one direction (RX or TX).
Modify ha_quic_set_encryption_secrets() to call these functions for one of these
directions (or both). So, from now on we can rely on the value of the secret keys
to know if they were derived.
Remove QUIC_FL_TLS_SECRETS_SET flag which is no longer useful.
Consequently, the secrets are dumped by the traces only if derived.
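A sketch of the per-direction allocation, assuming reduced structures (struct
tls_secrets, struct tls_ctx and the three helpers are hypothetical and only
mimic the functions named above):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical secrets for one direction. */
struct tls_secrets {
	unsigned char *key;
	unsigned char *iv;
};

/* Hypothetical TLS context with one set of secrets per direction. */
struct tls_ctx {
	struct tls_secrets rx;
	struct tls_secrets tx;
};

/* Release the memory of one direction and reset its pointers. */
static void secrets_keys_free(struct tls_secrets *s)
{
	free(s->key);
	free(s->iv);
	s->key = s->iv = NULL;
}

/* Allocate the memory for one direction only. Returns 1 on success, 0 on
 * failure, in which case nothing is left allocated.
 */
static int secrets_keys_alloc(struct tls_secrets *s, size_t keylen, size_t ivlen)
{
	s->key = calloc(1, keylen);
	s->iv = calloc(1, ivlen);
	if (!s->key || !s->iv) {
		secrets_keys_free(s);
		return 0;
	}
	return 1;
}

/* No dedicated flag is needed: the pointer tells if secrets were derived. */
static int secrets_are_set(const struct tls_secrets *s)
{
	return s->key != NULL;
}
```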
Must be backported to 2.6.
This issue was reproduced with -Q picoquic client option to split a big ClientHello
message into two Initial packets and haproxy as server without any knowledge of
any previous 0RTT session (restarted after a first 0RTT session). The 0RTT received
packets were removed from their queue when the second Initial packet was parsed,
and the QUIC handshake state never progressed and remained at Initial state.
To avoid such situations, after having treated some Initial packets we always
check if there are 0RTT packets to parse and we never remove them from their
queue. This will be done after the handshake is completed or upon idle timeout
expiration.
Also add more traces to be able to analyze the handshake progression.
Tested with ngtcp2 and picoquic
Must be backported to 2.6.
Implement quic_get_ncbuf() to dynamically allocate a new ncbuf to be attached to
any quic_cstream struct which needs such a buffer. Note that there is no quic_cstream
for 0RTT encryption level. quic_free_ncbuf() is added to release the memory
allocated for a non-contiguous buffer.
Modify qc_handle_crypto_frm() to call this function and allocate an ncbuf for
crypto data which are not received in order. The crypto data which are received in
order are not buffered but provided to the TLS stack (calling qc_provide_cdata()).
Modify qc_treat_rx_crypto_frms() which is called after having provided the
in order received crypto data to the TLS stack to provide again the remaining
crypto data which has been buffered, if possible (if they are in order). Each time
buffered CRYPTO data were consumed, we try to release the memory allocated for
the non-contiguous buffer (ncbuf).
Also move rx.crypto.offset quic_enc_level struct member to rx.offset quic_cstream
struct member.
Must be backported to 2.6.
Add new quic_cstream struct definition to implement the CRYPTO data stream.
This is a simplification of the qcs object (QUIC streams) for the CRYPTO data
without any information about the flow control. They are not attached to any
tree, but to a QUIC encryption level, one by encryption level except for
the early data encryption level (for 0RTT). A stream descriptor is also allocated
for each CRYPTO data stream.
Must be backported to 2.6
Retrieve the frontend destination address for a QUIC connection. This
address is retrieved from the first received datagram and then stored in
the associated quic-conn.
This feature relies on IP_PKTINFO or affiliated flags support on the
socket. This flag is set for each QUIC listener in
sock_inet_bind_receiver(). To retrieve the destination address,
recvfrom() has been replaced by recvmsg() syscall. This operation and
parsing of msghdr structure has been extracted in a wrapper quic_recv().
This change is useful to finalize the implementation of 'dst' sample
fetch. As such, quic_sock_get_dst() has been edited to return local
address from the quic-conn. As a best effort, if local address is not
available due to kernel non-support of IP_PKTINFO, address of the
listener is returned instead.
This should be backported up to 2.6.
Continue on the cleanup of QUIC stack and components.
quic_conn uses internally a ssl_sock_ctx to handle mandatory TLS QUIC
integration. However, this is merely as a convenience, and it is not
equivalent to stackable ssl xprt layer in the context of HTTP1 or 2.
To better emphasize this, ssl_sock_ctx usage in quic_conn has been
removed wherever it is not necessary : namely in functions not related
to TLS. quic_conn struct now contains its own wait_event for tasklet
quic_conn_io_cb().
This should be backported up to 2.6.
xprt_quic module was too large and did not reflect the true architecture
by contrast to the other protocols in haproxy.
Extract code related to XPRT layer and keep it under xprt_quic module.
This code should only contain a simple API to communicate between QUIC
lower layer and connection/MUX.
The vast majority of the code has been moved into a new module named
quic_conn. This module is responsible for the implementation of QUIC
lower layer. Conceptually, it overlaps with TCP kernel implementation
when comparing QUIC and HTTP1/2 stacks of haproxy.
This should be backported up to 2.6.