The purpose of this patch is only a one-to-one replacement, as far as
possible.
CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connecter. CF_ prefix is replaced by SC_FL_ one. Of course, it is not
so simple because at many places, we were testing if a channel was shut for
reads and writes in same time. To do the same, shut for reads must be tested
on one side on the SC and shut for writes on the other side on the opposite
SC. A special care was taken with process_stream(). flags of SCs must be
saved to be able to detect changes, just like for the channels.
This this commit, this is ->idle_expire of quic_conn struct which must
be taken into an account to display the idel timer task expiration value:
MEDIUM: quic: Ack delay implementation
Furthermore, this value was always zero until now_ms has wrapped (20 s after
the start time) due to this commit:
MEDIUM: clock: force internal time to wrap early after boot
Do not rely on the value of now_ms compared to ->idle_expire to display the
difference but use ticks_remain() to compute it.
Must be backported to 2.7 where "show quic" has already been backported.
This bug arrived with this commit:
MEDIUM: quic: Ack delay implementation
It is possible that the idle timer task was already in the run queue when its
->expire field was updated calling qc_idle_timer_do_rearm(). To prevent this
task from running in this condition, one must check its ->expire field value
with this condition to run the task if its timer has really expired:
!tick_is_expired(t->expire, now_ms)
Furthermore, as this task may be directly woken up with a call to task_wakeup()
all, for instance by qc_kill_conn() to kill the connection, one must check this
task has really been woken up when it was in the wait queue and not by a direct
call to task_wakeup() thanks to this test:
(state & TASK_WOKEN_ANY) == TASK_WOKEN_TIMER
Again, when this condition is not fulfilled, the task must be run.
Must be backported where the commit mentionned above was backported.
Reuse the idle timeout task to delay the acknowledgments. The time of the
idle timer expiration is for now on stored in ->idle_expire. The one to
trigger the acknowledgements is stored in ->ack_expire.
Add QUIC_FL_CONN_ACK_TIMER_FIRED new connection flag to mark a connection
as having its acknowledgement timer been triggered.
Modify qc_may_build_pkt() to prevent the sending of "ack only" packets and
allows the connection to send packet when the ack timer has fired.
It is possible that acks are sent before the ack timer has triggered. In
this case it is cancelled only if ACK frames are really sent.
The idle timer expiration must be set again when the ack timer has been
triggered or when it is cancelled.
Must be backported to 2.7.
Dump variables displayed by TRACE_ENTER() or TRACE_LEAVE() by calls to TRACE_PROTO().
No more variables are displayed by the two former macros. For now on, these information
are accessible from proto level.
Add new calls to TRACE_PROTO() at important locations in relation whith QUIC transport
protocol.
When relevant, try to prefix such traces with TX or RX keyword to identify the
concerned subpart (transmission or reception) of the protocol.
Must be backported to 2.7.
This bug was revealed by handshakeloss interop tests (often with quiceh) where one
could see haproxy an Initial packet without TLS ClientHello message (only a padded
PING frame). In this case, as the ->max_idle_timeout was not initialized, the
connection was closed about three seconds later, and haproxy opened a new one with
a new source connection ID upon receipt of the next client Initial packet. As the
interop runner count the number of source connection ID used by the server to check
there were exactly 50 such IDs used by the server, it considered the test as failed.
So, the ->max_idle_timeout of the connection must be at least initialized
to the local "max_idle_timeout" transport parameter value to avoid such
a situation (closing connections too soon) until it is "negotiated" with the
client when receiving its TLS ClientHello message.
Must be backported to 2.7 and 2.6.
Add ->srtt, ->rtt_var, ->rtt_min and ->pto_count values from ->path->loss
struct to "show quic". Same thing for ->cwnd from ->path struct.
Also take the opportunity of this patch to dump the packet number
space information directly from ->pktns[] array in place of ->els[]
array. Indeed, ->els[QUIC_TLS_ENC_LEVEL_EARLY_DATA] and ->els[QUIC_TLS_ENC_LEVEL_APP]
have the same packet number space.
Must be backported to 2.7 where "show quic" implementation has alredy been
backported.
This bug arrived with this commit:
MINOR: quic: Send PING frames when probing Initial packet number space
This may happen when haproxy needs to probe the peer with very short packets
(only one PING frame). In this case, the packet must be padded. There was clearly
a case which was removed by the mentionned commit above. That said, there was
an extra byte which was added to the PADDING frame before the mentionned commit
above. This is no more the case with this patch.
Thank you to @tatsuhiro-t (ngtcp2 manager) for having reported this issue which
was revealed by the keyupdate test (on client side).
Must be backported to 2.7 and 2.6.
This patch follows this commit which was not sufficient:
BUG/MINOR: quic: Missing STREAM frame data pointer updates
Indeed, after updating the ->offset field, the bit which informs the
frame builder of its presence must be systematically set.
This bug was revealed by the following BUG_ON() from
quic_build_stream_frame() :
bug condition "!!(frm->type & 0x04) != !!stream->offset.key" matched at src/quic_frame.c:515
This should fix the last crash occured on github issue #2074.
Must be backported to 2.6 and 2.7.
qc_notify_send() is used to wake up the MUX layer for sending. This
function first ensures that all sending condition are met to avoid to
wake up the MUX for unnecessarily.
One of this condition is to check if there is room in the congestion
window. However, when probe packets must be sent due to a PTO
expiration, RFC 9002 explicitely mentions that the congestion window
must be ignored which was not the case prior to this patch.
This commit fixes this by first setting <pto_probe> of 01RTT packet
space before invoking qc_notify_send(). This ensures that congestion
window won't be checked anymore to wake up the MUX layer until probing
packets are sent.
This commit replaces the following one which was not sufficient :
commit e25fce03ebe3307bc104d1f81356108e271d2bc3
BUG/MINOR: quic: Dysfunctional 01RTT packet number space probing
This should be backported up to 2.7.
On PTO probe timeout expiration, a probe packet must be emitted.
quic_pto_pktns() is used to determine for which packet space the timer
has expired. However, if MUX is already subscribed for sending, it is
woken up without checking first if this happened for the 01RTT packet
space.
It is unsure that this is really a bug as in most cases, MUX is
established only after Initial and Handshake packet spaces are removed.
However, the situation is not se clear when 0-RTT is used. For this
reason, adjust the code to explicitely check for the 01RTT packet space
before waking up the MUX layer.
This should be backported up to 2.6. Note that qc_notify_send() does not
exists in 2.6 so it should be replaced by the explicit block checking
(qc->subs && qc->subs->events & SUB_RETRY_SEND).
This bug arrived with this commit:
"MINOR: quic: implement qc_notify_send()".
The ->tx.pto_probe variable was no more set when qc_processt_timer() the timer
task for the connection responsible of detecting packet loss and probing upon
PTO expiration leading to interrupted stream transfers. This was revealed by
blackhole interop failed tests where one could see that qc_process_timer()
was wakeup without traces as follows in the log file:
"needs to probe 01RTT packet number space"
Must be backported to 2.7 and to 2.6 if the commit mentionned above
is backported to 2.6 in the meantime.
The ACK frame range of packets were handled from the largest to the smallest
packet number, leading to big number of ebtree insertions when the packet are
handled in the inverse way they are sent. This was detected a long time ago
but left in the code to stress our implementation. It is time to be more
efficient and process the packet so that to avoid useless ebtree insertions.
Modify qc_ackrng_pkts() responsible of handling the acknowledged packets from an
ACK frame range of acknowledged packets.
Must be backported to 2.7.
This patch follows this one which was not sufficient:
"BUG/MINOR: quic: Missing STREAM frame length updates"
Indeed, it is not sufficient to update the ->len and ->offset member
of a STREAM frame to move it forward. The data pointer must also be updated.
This is not done by the STREAM frame builder.
Must be backported to 2.6 and 2.7.
Some STREAM frame lengths were not updated before being duplicated, built
of requeued contrary to their ack offsets. This leads haproxy to crash when
receiving acknowledgements for such frames with this thread #1 backtrace:
Thread 1 (Thread 0x7211b6ffd640 (LWP 986141)):
#0 ha_crash_now () at include/haproxy/bug.h:52
No locals.
#1 b_del (b=<optimized out>, del=<optimized out>) at include/haproxy/buf.h:436
No locals.
#2 qc_stream_desc_ack (stream=stream@entry=0x7211b6fd9bc8, offset=offset@entry=53176, len=len@entry=1122) at src/quic_stream.c:111
Thank you to @Tristan971 for having provided such traces which reveal this issue:
[04|quic|5|c_conn.c:1865] qc_requeue_nacked_pkt_tx_frms(): entering : qc@0x72119c22cfe0
[04|quic|5|_frame.c:1179] qc_frm_unref(): entering : qc@0x72119c22cfe0
[04|quic|5|_frame.c:1186] qc_frm_unref(): remove frame reference : qc@0x72119c22cfe0 frm@0x72118863d260 STREAM_F uni=0 fin=1 id=460 off=52957 len=1122 3244
[04|quic|5|_frame.c:1194] qc_frm_unref(): leaving : qc@0x72119c22cfe0
[04|quic|5|c_conn.c:1902] qc_requeue_nacked_pkt_tx_frms(): updated partially acked frame : qc@0x72119c22cfe0 frm@0x72119c472290 STREAM_F uni=0 fin=1 id=460 off=53176 len=1122
Note that haproxy has much more chance to crash if this frame is the last one
(fin bit set). But another condition must be fullfilled to update the ack offset.
A previous STREAM frame from the same stream with the same offset but with less
data must be acknowledged by the peer. This is the condition to update the ack offset.
For others frames without fin bit in the same conditions, I guess the stream may be
truncated because too much data are removed from the stream when they are
acknowledged.
Must be backported to 2.6 and 2.7.
This issue was revealed by "Multiple streams" QUIC tracker test which very often
fails (locally) with a file of about 1Mbytes (x4 streams). The log of QUIC tracker
revealed that from its point of view, the 4 files were never all received entirely:
"results" : {
"stream_0_rec_closed" : true,
"stream_0_rec_offset" : 1024250,
"stream_0_snd_closed" : true,
"stream_0_snd_offset" : 15,
"stream_12_rec_closed" : false,
"stream_12_rec_offset" : 72689,
"stream_12_snd_closed" : true,
"stream_12_snd_offset" : 15,
"stream_4_rec_closed" : true,
"stream_4_rec_offset" : 1024250,
"stream_4_snd_closed" : true,
"stream_4_snd_offset" : 15,
"stream_8_rec_closed" : true,
"stream_8_rec_offset" : 1024250,
"stream_8_snd_closed" : true,
"stream_8_snd_offset" : 15
},
But this in contradiction with others QUIC tracker logs which confirms that haproxy
has really (re)sent the stream at the suspected offset(stream_12_rec_offset):
1152085,
"transport",
"packet_received",
{
"frames" : [
{
"frame_type" : "stream",
"length" : "155",
"offset" : "72689",
"stream_id" : "12"
}
],
"header" : {
"dcid" : "a14479169ebb9dba",
"dcil" : "8",
"packet_number" : "466",
"packet_size" : 190
},
"packet_type" : "1RTT"
}
When detected as losts, the packets are enlisted, then their frames are
requeued in their packet number space by qc_requeue_nacked_pkt_tx_frms().
This was done using a local list which was spliced to the packet number
frame list. This had as bad effect to retransmit the frames in the inverse
order they have been sent. This is something the QUIC tracker go client
does not like at all!
Removing the frame splicing fixes this issue and allows haproxy to pass the
"Multiple streams" test.
Must be backported to 2.7.
This bug arrived with this commit:
b5a8020e9 MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX)
and was revealed by h3 interop tests with clients like s2n-quic and quic-go
as noticed by Amaury.
Indeed, one must check that the CID matching the sequence number provided by a received
RETIRE_CONNECTION_ID frame does not match the DCID of the packet.
Remove useless ->curr_cid_seq_num member from quic_conn struct.
The sequence number lookup must be done in qc_handle_retire_connection_id_frm()
to check the validity of the RETIRE_CONNECTION_ID frame, it returns the CID to be
retired into <cid_to_retire> variable passed as parameter to this function if
the frame is valid and if the CID was not already retired
Must be backported to 2.7.
A new global quic-conn list has been added by the previous patch. It will
contain every quic-conn in closing or draining state.
Thus, it is now easier to include or skip them on a "show quic" output :
when the default list on the current thread has been browsed entirely,
either we skip to the next thread or we look at the closing list on the
current thread.
This should be backported up to 2.7.
When a CONNECTION_CLOSE is emitted or received, a QUIC connection enters
respectively in draining or closing state. These states are a loose
equivalent of TCP TIME_WAIT. No data can be exchanged anymore but the
connection is maintained during a certain timer to handle packet
reordering or loss.
A new global list has been defined for QUIC connections in
closing/draining state inside thread_ctx structure. Each time a
connection enters in one of this state, it will be moved from the
default global list to the new closing list.
The objective of this patch is to quickly filter connections on
closing/draining. Most notably, this will be used to wake up these
connections and avoid that haproxy process stopping is delayed by them.
A dedicated function qc_detach_th_ctx_list() has been implemented to
transfer a quic-conn from one list instance to the other. This takes
care of back-references attach to a quic-conn instance in case of a
running "show quic".
This should be backported up to 2.7.
Modify quic_transport_params_dump() and others function relative to the
transport parameters value dump from TRACE() to make their output more
compact.
Add call to quic_transport_params_dump() to dump the transport parameters
from "show quic" CLI command.
Must be backported to 2.7.
Add QUIC_FL_RX_PACKET_SPIN_BIT new RX packet flag to mark an RX packet as having
the spin bit set. Idem for the connection with QUIC_FL_CONN_SPIN_BIT flag.
Implement qc_handle_spin_bit() to set/unset QUIC_FL_CONN_SPIN_BIT for the connection
as soon as a packet number could be deciphered.
Modify quic_build_packet_short_header() to set the spin bit when building
a short packet header.
Validated by quic-tracker spin bit test.
Must be backported to 2.7.
Add ->curr_cid_seq_num new quic_conn struct frame to store the connection
ID sequence number currently used by the connection.
Implement qc_handle_retire_connection_id_frm() to handle this RX frame.
Implement qc_retire_connection_seq_num() to remove a connection ID from its
sequence number.
Implement qc_build_new_connection_id_frm to allocate a new NEW_CONNECTION_ID
frame from a CID.
Modify qc_parse_pkt_frms() which parses the frames of an RX packet to handle
the case of the RETIRE_CONNECTION_ID frame.
Must be backported to 2.7.
Add ->next_cid_seq_num new member to quic_conn struct to store the next
connection ID to be used to alloacated a connection ID.
It is initialized to 0 from qc_new_conn() which initializes a connection.
Modify new_quic_cid() to use this variable each time it is called without
giving the possibility to the caller to pass the sequence number for the
connection to be allocated.
Modify quic_build_post_handshake_frames() to use ->next_cid_seq_num
when building NEW_CONNECTION_ID frames after the hanshake has been completed.
Limit the number of connection IDs provided to the peer to the minimum
between 4 and the value it sent with active_connection_id_limit transport
parameter. This includes the connection ID used by the connection to send
this new connection IDs.
Must be backported to 2.7.
The MUX instance is released before its quic-conn counterpart. On
termination, a H3 GOAWAY is emitted to prevent the client to open new
streams for this connection.
The quic-conn instance will stay alive until all opened streams data are
acknowledged. If the client tries to open a new stream during this
interval despite the GOAWAY, quic-conn is responsible to request its
immediate closure with a STOP_SENDING + RESET_STREAM.
This behavior was already implemented but the received packet with the
new STREAM was never acknowledged. This was fixed with the following
commit :
commit 156a89aef8c63910502b266251dc34f648a99fae
BUG/MINOR: quic: acknowledge STREAM frame even if MUX is released
However, this patch introduces a regression as it did not skip the call
to qc_handle_strm_frm() despite the MUX instance being released. This
can cause a segfault when using qcc_get_qcs() on a released MUX
instance. To fix this, add a missing break statement which will skip
qc_handle_strm_frm() when the MUX instance is not initialized.
This commit was reproduced using a short timeout client and sending
several requests with delay between them by using a modified aioquic. It
produces a crash with the following backtrace :
#0 0x000055555594d261 in __eb64_lookup (x=4, root=0x7ffff4091f60) at include/import/eb64tree.h:132
#1 eb64_lookup (root=0x7ffff4091f60, x=4) at src/eb64tree.c:37
#2 0x000055555563fc66 in qcc_get_qcs (qcc=0x7ffff4091dc0, id=4, receive_only=1, send_only=0, out=0x7ffff780ca70) at src/mux_quic.c:668
#3 0x0000555555641e1a in qcc_recv (qcc=0x7ffff4091dc0, id=4, len=40, offset=0, fin=1 '\001', data=0x7ffff40c4fef "\001&") at src/mux_quic.c:974
#4 0x0000555555619d28 in qc_handle_strm_frm (pkt=0x7ffff4088e60, strm_frm=0x7ffff780cf50, qc=0x7ffff7cef000, fin=1 '\001') at src/quic_conn.c:2515
#5 0x000055555561d677 in qc_parse_pkt_frms (qc=0x7ffff7cef000, pkt=0x7ffff4088e60, qel=0x7ffff7cef6c0) at src/quic_conn.c:3050
#6 0x00005555556230aa in qc_treat_rx_pkts (qc=0x7ffff7cef000, cur_el=0x7ffff7cef6c0, next_el=0x0) at src/quic_conn.c:4214
#7 0x0000555555625fee in quic_conn_app_io_cb (t=0x7ffff40c1fa0, context=0x7ffff7cef000, state=32848) at src/quic_conn.c:4640
#8 0x00005555558a676d in run_tasks_from_lists (budgets=0x7ffff780d470) at src/task.c:596
#9 0x00005555558a725b in process_runnable_tasks () at src/task.c:876
#10 0x00005555558522ba in run_poll_loop () at src/haproxy.c:2945
#11 0x00005555558529ac in run_thread_poll_loop (data=0x555555d14440 <ha_thread_info+64>) at src/haproxy.c:3141
#12 0x00007ffff789ebb5 in ?? () from /usr/lib/libc.so.6
#13 0x00007ffff7920d90 in ?? () from /usr/lib/libc.so.6
This should fix github issue #2067.
This must be backported up to 2.6.
In very very rare cases, it is possible the Initial packet number space
must be probed even if it there is no more in flight CRYPTO frames.
In such cases, a PING frame is sent into an Initial packet. As this
packet is ack-eliciting, it must be padded by the server. qc_do_build_pkt()
is modified to do so.
Take the opportunity of this patch to modify the trace for TX frames to
easily distinguished them from other frame relative traces.
Must be backported to 2.7.
Mark the connection as limited by the anti-amplification limit when trying to
probe the peer.
Wakeup the connection PTO/dectection loss timer as soon as a datagram is
received. This was done only when the datagram was dropped.
This fixes deadlock issues revealed by some interop runner tests.
Must be backported to 2.7 and 2.6.
Some frames are marked as already acknowledged from duplicated packets
whose the original packet has been acknowledged. There is no need
to resend such packets or frames.
Implement qc_pkt_with_only_acked_frms() to detect packet with only
already acknowledged frames inside and use it from qc_prep_fast_retrans()
which selects the packet to be retransmitted.
Must be backported to 2.6 and 2.7.
Even if there is a check in callers of qc_prep_hdshk_fast_retrans() and
qc_prep_fast_retrans() to prevent retransmissions of packets with no ack-eliciting
frames, these two functions should pay attention not do to that especially if
someone decides to modify their implementations in the future.
Must be backported to 2.6 and 2.7.
This is an old bug which arrived in this commit due to a misinterpretation
of the RFC I guess where the desired effect was to acknowledge all the
handshake packets:
77ac6f566 BUG/MINOR: quic: Missing acknowledgments for trailing packets
This had as bad effect to acknowledge all the handshake packets even the
ones which are not ack-eliciting.
Must be backported to 2.7 and 2.6.
Dump the secret used to derive the next one during a key update initiated by the
client and dump the resulted new secret and the new key and iv to be used to
decryption Application level packets.
Also add a trace when the key update is supposed to be initiated on haproxy side.
This has already helped in diagnosing an issue evealed by the key update interop
test with xquic as client.
Must be backported to 2.7.
v2 interop runner test revealed this bug as follows:
[01|quic|4|c_conn.c:4087] new packet : qc@0x7f62ec026e30 pkt@0x7f62ec056390 el=I pn=491940080 rel=H
[01|quic|5|c_conn.c:1509] qc_pkt_decrypt(): entering : qc@0x7f62ec026e30
[01|quic|0|c_conn.c:1553] quic_tls_decrypt() failed : qc@0x7f62ec026e30
[01|quic|5|c_conn.c:1575] qc_pkt_decrypt(): leaving : qc@0x7f62ec026e30
[01|quic|0|c_conn.c:4091] packet decryption failed -> dropped : qc@0x7f62ec026e30 pkt@0x7f62ec056390 el=I pn=491940080
Only v2 Initial packets decryption received by the clients were impacted. There
is no issue to encrypt v2 Initial packets. This is due to the fact that when
negotiated the client may send two versions of Initial packets (currently v1,
then v2). The selection was done for the TX path but not on the RX path.
Implement qc_select_tls_ctx() to select the correct TLS cipher context for all
types of packets and call this function before removing the header protection
and before deciphering the packet.
Must be backported to 2.7.
When retransmitting datagrams with two coalesced packets inside, the second
packet was not taken into consideration when checking there is enough space
into the network for the datagram, especially when limited by the anti-amplification.
Must be backported to 2.6 and 2.7.
Before building a packet into a datagram, ensure there is sufficient space for at
least 1200 bytes. Also pad datagrams with only one ack-eliciting Initial packet
inside.
Must be backported to 2.7 and 2.6.
When a STREAM frame is re-emitted, it will point to the same stream
buffer as the original one. If an ACK is received for either one of
these frame, the underlying buffer may be freed. Thus, if the second
frame is declared as lost and schedule for retransmission, we must
ensure that the underlying buffer is still allocated or interrupt the
retransmission.
Stream buffer is stored as an eb_tree indexed by the stream ID. To avoid
to lookup over a tree each time a STREAM frame is re-emitted, a lost
STREAM frame is flagged as QUIC_FL_TX_FRAME_LOST.
In most cases, this code is functional. However, there is several
potential issues which may cause a segfault :
- when explicitely probing with a STREAM frame, the frame won't be
flagged as lost
- when splitting a STREAM frame during retransmission, the flag is not
copied
To fix both these cases, QUIC_FL_TX_FRAME_LOST flag has been converted
to a <dup> field in quic_stream structure. This field is now properly
copied when splitting a STREAM frame. Also, as this is now an inner
quic_frame field, it will be copied automatically on qc_frm_dup()
invocation thus ensuring that it will be set on probing.
This issue was encounted randomly with the following backtrace :
#0 __memmove_avx512_unaligned_erms ()
#1 0x000055f4d5a48c01 in memcpy (__len=18446698486215405173, __src=<optimized out>,
#2 quic_build_stream_frame (buf=0x7f6ac3fcb400, end=<optimized out>, frm=0x7f6a00556620,
#3 0x000055f4d5a4a147 in qc_build_frm (buf=buf@entry=0x7f6ac3fcb5d8,
#4 0x000055f4d5a23300 in qc_do_build_pkt (pos=<optimized out>, end=<optimized out>,
#5 0x000055f4d5a25976 in qc_build_pkt (pos=0x7f6ac3fcba10,
#6 0x000055f4d5a30c7e in qc_prep_app_pkts (frms=0x7f6a0032bc50, buf=0x7f6a0032bf30,
#7 qc_send_app_pkts (qc=0x7f6a0032b310, frms=0x7f6a0032bc50) at src/quic_conn.c:4184
#8 0x000055f4d5a35f42 in quic_conn_app_io_cb (t=0x7f6a0009c660, context=0x7f6a0032b310,
This should fix github issue #2051.
This should be backported up to 2.6.
This patch completes the previous one with poller subscribe of quic-conn
owned socket on sendto() error. This ensures that mux-quic is notified
if waiting on sending when a transient sendto() error is cleared. As
such, qc_notify_send() is called directly inside socket I/O callback.
qc_notify_send() internal condition have been thus completed. This will
prevent to notify upper layer until all sending condition are fulfilled:
room in congestion window and no transient error on socket FD.
This should be backported up to 2.7.
On sendto() transient error, prior to this patch sending was simulated
and we relied on retransmission to retry sending. This could hurt
significantly the performance.
Thanks to quic-conn owned socket support, it is now possible to improve
this. On transient error, sending is interrupted and quic-conn socket FD
is subscribed on the poller for sending. When send is possible,
quic_conn_sock_fd_iocb() will be in charge of restart sending.
A consequence of this change is on the return value of qc_send_ppkts().
This function will now return 0 on transient error if quic-conn has its
owned socket. This is used to interrupt sending in the calling function.
The flag QUIC_FL_CONN_TO_KILL must be checked to differentiate a fatal
error from a transient one.
This should be backported up to 2.7.
Sending is implemented in two parts on quic-conn module. First, QUIC
packets are prepared in a buffer and then sendto() is called with this
buffer as input.
qc.tx.buf is used as the input buffer. It must always be empty before
starting to prepare new packets in it. Currently, this is guarantee by
the fact that either sendto() is completed, a fatal error is encountered
which prevent future send, or a transient error is encountered and we
rely on retransmission to send the remaining data.
This will change when poller subscribe of socket FD on sendto()
transient error will be implemented. In this case, qc.tx.buf will not be
emptied to resume sending when the transient error is cleared. To allow
the current sending process to work as expected, a new function
qc_purge_txbuf() is implemented. It will try to send remaining data
before preparing new packets for sending. If successful, txbuf will be
emptied and sending can continue. If not, sending will be interrupted.
This should be backported up to 2.7.
Implement qc_notify_send(). This function is responsible to notify the
upper layer subscribed on SUB_RETRY_SEND if sending condition are back
to normal.
For the moment, this patch has no functional change as only congestion
window room is checked before notifying the upper layer. However, this
will be extended when poller subscribe of socket on sendto() error will
be implemented. qc_notify_send() will thus be responsible to ensure that
all condition are met before wake up the upper layer.
This should be backported up to 2.7.
This patch simply clean up return paths used in various send function of
quic-conn module. This will simplify the implementation of poller
subscribing on sendto() error which add another error handling path.
This should be backported up to 2.7.
Send is conducted through qc_send_ppkts() for a QUIC connection. There
is two types of error which can be encountered on sendto() or affiliated
syscalls :
* transient error. In this case, sending is simulated with the remaining
data and retransmission process is used to have the opportunity to
retry emission
* fatal error. If this happens, the connection should be closed as soon
as possible. This is done via qc_kill_conn() function. Until this
patch, only ECONNREFUSED errno was considered as fatal.
Modify the QUIC send API to be able to differentiate transient and fatal
errors more easily. This is done by fixing the return value of the
sendto() wrapper qc_snd_buf() :
* on fatal error, a negative error code is returned. This is now the
case for every errno except EAGAIN, EWOULDBLOCK, ENOTCONN, EINPROGRESS
and EBADF.
* on a transient error, 0 is returned. This is the case for the listed
errno values above and also if a partial send has been conducted by
the kernel.
* on success, the return value of sendto() syscall is returned.
This commit will be useful to be able to handle transient error with a
quic-conn owned socket. In this case, the socket should be subscribed to
the poller and no simulated send will be conducted.
This commit allows errno management to be confined in the quic-sock
module which is a nice cleanup.
On a final note, EBADF should be considered as fatal. This will be the
subject of a next commit.
This should be backported up to 2.7.
This issue arrived with this commit:
1dbeb35f8 MINOR: quic: Add new traces about by connection RX buffer handling
and revealed by the GH CI as follows:
src/quic_conn.c: In function ‘quic_rx_pkts_del’:
include/haproxy/trace.h:134:65: error: format ‘%zu’ expects argument of type ‘size_t’,
but argument 6 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=]
_msg_len = snprintf(_msg, sizeof(_msg), (fmt), ##args);
Replace all %zu printf integer format by %llu.
Must be backported to 2.7 where the previous is supposed to be backported.
If the TX buffer (->tx.buf) attached to the connection is not drained, there
are chances that this will be detected by qc_txb_release() which triggers
a BUG_ON_HOT() when this is the case as follows
[00|quic|2|c_conn.c:3477] UDP port unreachable : qc@0x5584f18d6d50 pto_count=0 cwnd=6816 ppif=1046 pif=1046
[00|quic|5|ic_conn.c:749] qc_kill_conn(): entering : qc@0x5584f18d6d50
[00|quic|5|ic_conn.c:752] qc_kill_conn(): leaving : qc@0x5584f18d6d50
[00|quic|5|c_conn.c:3532] qc_send_ppkts(): leaving : qc@0x5584f18d6d50 pto_count=0 cwnd=6816 ppif=1046 pif=1046
FATAL: bug condition "buf && b_data(buf)" matched at src/quic_conn.c:3098
Consume the remaining data in the TX buffer calling b_del().
This bug arrived with this commit:
a2c62c314 MINOR: quic: Kill the connections on ICMP (port unreachable) packet receipt
Takes also the opportunity of this patch to modify the comments for qc_send_ppkts()
which should have arrived with a2c62c314 commit.
Must be backported to 2.7 where this latter commit is supposed to be backported.
With previous commit, quic-conn are now handled as jobs to prevent the
termination of haproxy process. This ensures that QUIC connections are
closed when all data are acknowledged by the client and there is no more
active streams.
The quic-conn layer emits a CONNECTION_CLOSE once the MUX has been
released and all streams are acknowledged. Then, the timer is scheduled
to definitely free the connection after the idle timeout period. This
allows to treat late-arriving packets.
Adjust this procedure to deactivate this timer when process stopping is
in progress. In this case, quic-conn timer is set to expire immediately
to free the quic-conn instance as soon as possible. This allows to
quickly close haproxy process.
This should be backported up to 2.7.
To prevent data loss for QUIC connections, haproxy global variable jobs
is incremented each time a quic-conn socket is allocated. This allows
the QUIC connection to terminate all its transfer operation during proxy
soft-stop. Without this patch, the process will be terminated without
waiting for QUIC connections.
Note that this is done in qc_alloc_fd(). This means only QUIC connection
with their owned socket will properly support soft-stop. In the other
case, the connection will be interrupted abruptly as before. Similarly,
jobs decrement is conducted in qc_release_fd().
This should be backported up to 2.7.
When the MUX is freed, the quic-conn layer may stay active until all
streams acknowledgment are processed. In this interval, if a new stream
is opened by the client, the quic-conn is thus now responsible to handle
it. This is done by the emission of a STOP_SENDING + RESET_STREAM.
Prior to this patch, the received packet was not acknowledged. This is
undesirable if the quic-conn is able to properly reject the request as
this can lead to unneeded retransmission from the client.
This must be backported up to 2.6.