Add ->ctx new member field to quic_tls_secrets struct to store the cipher context
for each QUIC TLS context TX/RX parts.
Add quic_tls_rx_ctx_init() and quic_tls_tx_ctx_init() functions to initialize
these cipher contexts for RX and TX parts respectively.
Make qc_new_isecs() call these two functions to initialize the cipher contexts
of the Initial secrets. Same thing for ha_quic_set_encryption_secrets() to
initialize the cipher contexts of the subsequent derived secrets (0RTT, Handshake,
1RTT).
Modify quic_tls_decrypt() and quic_tls_encrypt() to always use the same cipher
context without allocating it each time they are called.
Define a new API to notify the MUX from the quic-conn when the
connection is about to be closed. This happens in the following cases :
- on idle timeout
- on CONNECTION_CLOSE emission or reception
The MUX wake callback is called on these conditions. The quic-conn
QUIC_FL_NOTIFY_CLOSE is set to only report once. On the MUX side,
connection flags CO_FL_SOCK_RD_SH|CO_FL_SOCK_WR_SH are set to interrupt
future emission/reception.
This patch is the counterpart to
"MEDIUM: mux-quic: report CO_FL_ERROR on send".
Now the quic-conn is able to report its closing, which may be translated
by the MUX into a CO_FL_ERROR on the connection for the upper layer.
This allows the MUX to properly react to the QUIC closing mechanism for
both idle-timeout and closing/draining states.
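The notify-once behaviour described above can be modelled as follows. This is only a minimal Python sketch, not the C implementation: the class names, the notify_close() helper and the flag bit values are assumptions made for illustration; only the flag names come from the text above.

```python
# Illustrative bit values only; the real values live in the C headers.
QUIC_FL_NOTIFY_CLOSE = 0x1
CO_FL_SOCK_RD_SH = 0x2
CO_FL_SOCK_WR_SH = 0x4


class Mux:
    def __init__(self):
        self.conn_flags = 0
        self.wake_calls = 0

    def wake(self):
        # On wake-up, shut both directions to interrupt future
        # emission/reception.
        self.wake_calls += 1
        self.conn_flags |= CO_FL_SOCK_RD_SH | CO_FL_SOCK_WR_SH


class QuicConn:
    def __init__(self, mux):
        self.flags = 0
        self.mux = mux

    def notify_close(self):
        # Hypothetical helper: report the closing only once, guarded by
        # QUIC_FL_NOTIFY_CLOSE.
        if self.flags & QUIC_FL_NOTIFY_CLOSE:
            return False
        self.flags |= QUIC_FL_NOTIFY_CLOSE
        if self.mux is not None:
            self.mux.wake()
        return True
```

Calling notify_close() on idle timeout and again on CONNECTION_CLOSE reception wakes the MUX a single time in this model.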
When freeing a quic-conn, the streams resources attached to it must be
cleared. This code is already implemented but the streams buffer was not
deallocated.
Fix this by using the function qc_stream_desc_free. This existing
function centralizes all operations to properly free all stream
elements, attached both to the MUX and the quic-conn.
This fixes a memory leak which can happen for each released connection.
This flag was used to notify the MUX about a CONNECTION_CLOSE frame
reception. It is now unused on the MUX side and can be removed. A new
mechanism to detect quic-conn closing will soon be implemented.
Rationalize the lifetime of the quic-conn with regard to the MUX. The
quic-conn must not be freed if the MUX is still allocated.
This simplifies the MUX code when accessing the quic-conn and removes
possible segfaults.
To implement this, if the quic-conn timer expired, the quic-conn is
released only if the MUX is not allocated. Else, the quic-conn is
flagged with QUIC_FL_CONN_EXP_TIMER. The MUX is then responsible
to call quic_close() which will free the flagged quic-conn.
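The release rule above can be pictured with a small Python model. It is a sketch under assumptions: the QuicConn class, the on_timer_expired() name, the "released" attribute and the flag bit value are illustrative, and the actual C code frees memory instead of setting a boolean.

```python
QUIC_FL_CONN_EXP_TIMER = 0x1  # illustrative bit value, not the real one


class QuicConn:
    def __init__(self, mux=None):
        self.flags = 0
        self.mux = mux          # None when no MUX is allocated
        self.released = False   # stands in for the actual free


def on_timer_expired(qc):
    """Release the quic-conn only if no MUX is still allocated."""
    if qc.mux is None:
        qc.released = True
    else:
        # Defer the release: the MUX will trigger it via quic_close().
        qc.flags |= QUIC_FL_CONN_EXP_TIMER


def quic_close(qc):
    """Called by the MUX; frees a quic-conn flagged as expired."""
    qc.mux = None
    if qc.flags & QUIC_FL_CONN_EXP_TIMER:
        qc.released = True
```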
New packets received after sending a CONNECTION_CLOSE frame trigger a new
CONNECTION_CLOSE frame to be sent. Each time such a frame is sent we
increase the number of packets required to send another CONNECTION_CLOSE
frame.
Rearm the idle timer only once when sending a CONNECTION_CLOSE frame.
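A possible shape for this rate limiting, as a Python sketch. The text only says that the required number of packets increases; the doubling used here, as well as the class and field names, are assumptions for illustration.

```python
class ClosingState:
    def __init__(self):
        self.nb_pkt_since_cc = 0         # packets received since last CONNECTION_CLOSE
        self.nb_pkt_for_cc = 1           # packets required before sending another one
        self.idle_timer_rearmed = False

    def on_packet_received(self):
        """Return True if a new CONNECTION_CLOSE frame should be sent."""
        self.nb_pkt_since_cc += 1
        if self.nb_pkt_since_cc < self.nb_pkt_for_cc:
            return False
        self.nb_pkt_since_cc = 0
        self.nb_pkt_for_cc *= 2          # assumption: the threshold doubles
        return True

    def on_connection_close_sent(self):
        """Return True only the first time, when the idle timer is rearmed."""
        if self.idle_timer_rearmed:
            return False
        self.idle_timer_rearmed = True
        return True
```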
This should be useful to get an idea of the list of frames which could be built
compared to the list of available frames when building packets.
Same thing for the frames which could not be built because of a lack of room
in the TX buffer.
During a handshake, after having prepared a probe upon a PTO expiration from
process_timer(), we wake up the I/O handler to make it send probing packets.
This handler first treats incoming packets, which triggers a fast retransmission
leading to too many probing (duplicated) packets being sent. In this case we cancel
the fast retransmission.
When discarding a packet number space, we at least reset the PTO backoff counter.
Doing this several times has an impact on the PTO duration calculation.
We must not discard a packet number space several times (this is already the case
for the handshake packet number space).
Before looking at the next encryption level to build packets when there are
no more ack-eliciting frames to send, we must check that we no longer have to
probe from the current encryption level. Without this check, we only send one
datagram instead of two, giving less chance to recover from packet loss.
Due to an erroneous interpretation of RFC 9000 (quic-transport), ACK frames
were always sent only after having received two ack-eliciting packets.
This could trigger useless retransmissions for tail packets on the peer side.
From now on, we send ACK frames as soon as we have ACKs to send,
in the same packets as the ack-eliciting frames, and we also send ACK
frames after having received 2 ack-eliciting packets since the last time we sent
an ACK frame with other ack-eliciting frames.
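The resulting ACK policy can be summarised by the Python sketch below. The AckState class and its method names are hypothetical. As a simplification, the counter is reset whenever any ACK frame is sent, whereas the text above counts from the last ACK sent together with ack-eliciting frames.

```python
class AckState:
    def __init__(self):
        self.ack_pending = False
        self.rx_ack_eliciting = 0   # ack-eliciting packets received since last ACK

    def on_rx_packet(self, ack_eliciting):
        if ack_eliciting:
            self.ack_pending = True
            self.rx_ack_eliciting += 1

    def must_send_ack(self, have_ack_eliciting_frames):
        if not self.ack_pending:
            return False
        if have_ack_eliciting_frames:
            # Piggyback the ACK on a packet we are sending anyway.
            return True
        # Otherwise an ACK-only packet is sent after 2 ack-eliciting packets.
        return self.rx_ack_eliciting >= 2

    def on_ack_sent(self):
        self.ack_pending = False
        self.rx_ack_eliciting = 0
```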
As such variables are handled by the QUIC connection I/O handler which always
runs on the same thread, there is no need to continue to use such atomic operations.
This bug has come with this commit:
1fc5e16c4 MINOR: quic: More accurate immediately close
As mentioned in this commit we do not want to derive any more secrets when in closing
state. But the flag which denotes that secrets were derived was set. Add a label at
the correct place to skip the secrets derivation without setting this flag.
The new qc_stream_desc type has a tree node for storage. Thus, we can
remove the node in the qcs structure.
When initializing a new stream, it is stored into the qcc streams_by_id
tree. When the MUX releases it, it will be freed as soon as its buffer is
emptied. Before this, the quic-conn is responsible to store it inside
its own streams_by_id tree.
Move the xprt-buf and ack related fields from qcs to the qc_stream_desc
structure. In exchange, qcs has a pointer to the low-level stream. For
each new qcs, a qc_stream_desc is automatically allocated.
This simplifies the transport layer by removing qcs/mux manipulation
during ACK frame parsing. An additional check is done to not notify the
MUX on sending if the stream is already released : this case may now
happen on retransmission.
To complete this change, the quic_stream frame now references the
quic_stream instance instead of a qcs.
Currently, the mux qcs streams manage the Tx buffering, even after
sending it to the transport layer. Buffers are emptied when
acknowledgements are processed by the transport layer. This complicates the
MUX liberation and we may lose some data after the MUX free.
Change this paradigm by moving the buffering on the transport layer. For
this goal, a new type is implemented as low-level stream at the
transport layer, as a counterpart of qcs mux instances. This structure
is called qc_stream_desc. This will allow to free the qcs/qcc instances
without having to wait for acknowledgement reception.
For the moment, the quic-conn is responsible to store the qc_stream_desc
in a new tree named streams_by_id. This will slightly change in the next
commits to remove the qcs node which has a similar purpose :
qc_stream_desc instances will be shared between the qcc MUX and the
quic-conn.
This patch only introduces the new type definition and the function to
manipulate it. The following commit will bring the rearchitecture in the
qcs structure.
The quic_stream frame stores the qcs instance. On ACK parsing, qcs is
accessed to clear the stream buffer. This can cause a segfault if the
MUX or the qcs is already released.
Consider the following scenario :
1. a STREAM frame is generated by the MUX
   transport layer emits the frame with PKN=1
   upper layer has finished the transfer so related qcs is detached
2. transport layer reemits the frame with PKN=2 because ACK was not
   received
3. ACK for PKN=1 is received, stream buffer is cleared
   at this stage, qcs may be freed by the MUX as it is detached
4. ACK for PKN=2 is received
   qcs for STREAM frame is dereferenced which will lead to a crash
To prevent this, qcs is never accessed from the quic_stream during ACK
parsing. Instead, a lookup is done on the MUX streams tree. If the MUX
is already released, no lookup is done. These checks prevent a possible
segfault.
This change may have an impact on the perf as now we are forced to use a
tree lookup operation. If this is the case, an alternative solution may
be to implement a refcount on qcs instances.
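The lookup-based approach can be replayed against the scenario above with a short Python model. A dict stands in for the MUX streams tree, and the function name on_stream_frame_acked() is hypothetical.

```python
class Qcs:
    def __init__(self, stream_id):
        self.id = stream_id
        self.acked = 0


class Qcc:
    def __init__(self):
        self.streams_by_id = {}   # stands in for the MUX streams tree


def on_stream_frame_acked(qcc, stream_id, length):
    """Handle an ACK for a STREAM frame without keeping a qcs pointer."""
    if qcc is None:
        return False              # MUX already released: no lookup
    qcs = qcc.streams_by_id.get(stream_id)
    if qcs is None:
        return False              # qcs already freed: nothing to do
    qcs.acked += length
    return True
```

With a stale pointer stored in the frame, the second ACK would touch freed memory; here it simply finds no stream and returns.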
After having consumed <i> bytes from <buf>, the remaining available room to be
passed to generate_retry_token() is sizeof(buf) - i.
This bug could be easily reproduced with quic-go as client which chooses a random
value as ODCID length.
This commit is similar to the previous one but with MAX_DATA frames.
This allows to increase the connection level flow-control limit. If the
connection was blocked due to QC_CF_BLK_MFCTL flag, the flag is reset.
Implement a MUX method to parse MAX_STREAM_DATA. If the limit is greater
than the previous one and the stream was blocked, the flag
QC_SF_BLK_SFCTL is removed.
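Both flow-control updates follow the same pattern, sketched below in Python. The Conn and Stream classes, the function names and the flag bit values are assumptions for illustration; only the flag names come from the text.

```python
QC_CF_BLK_MFCTL = 0x1   # illustrative bit value (connection level)
QC_SF_BLK_SFCTL = 0x1   # illustrative bit value (stream level)


class Conn:
    def __init__(self, max_data):
        self.max_data = max_data
        self.flags = 0


class Stream:
    def __init__(self, max_stream_data):
        self.max_stream_data = max_stream_data
        self.flags = 0


def recv_max_data(conn, limit):
    # A limit that does not increase the current one is ignored.
    if limit > conn.max_data:
        conn.max_data = limit
        conn.flags &= ~QC_CF_BLK_MFCTL


def recv_max_stream_data(stream, limit):
    if limit > stream.max_stream_data:
        stream.max_stream_data = limit
        stream.flags &= ~QC_SF_BLK_SFCTL
```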
We must consider the peer address as validated as soon as we received a
handshake packet. Requiring an ACK frame in a handshake packet was too restrictive.
Rename the concerned flag to reflect this situation.
We must be able to handle 1RTT packets after the mux has terminated its job
(qc->mux_state == QC_MUX_RELEASED). So the condition (qc->mux_state != QC_MUX_READY)
in qc_qel_may_rm_hp() is not correct when we want to wait for the mux to be started.
Add a check in qc_parse_pkt_frms() to ensure the mux is started before calling it. All
the STREAM frames will be ignored once the mux is released.
The most important one is the ->flags member which leads to an erratic xprt behavior.
For instance a non ack-eliciting packet could be seen as ack-eliciting leading the
xprt to try to retransmit a packet which is not ack-eliciting. In this case, the
xprt does nothing and remains indefinitely in a blocking state.
There are rare, not yet identified cases where qc_build_frms() does not manage
to size frames to be encoded in a packet, leading qc_build_frm() to fail to add
such frames to the packet to be built. In such cases we must move such
frames back to their origin frame list passed as parameter to qc_build_frms(): <frms>,
because they were added to the packet frame list (but not built). If
this packet is not retransmitted, the frame is lost forever! Furthermore we must
not modify the buffer.
The TX packet refcounting had come with the multithreading support but not only.
It is very useful to ease the management of the memory allocated for TX packets
with TX frames attached to them. At some locations of the code we have to move TX
frames from a packet to a new one during retransmission when the packet has been
deemed as lost or not. When deemed lost the memory allocated for the packet must
be released contrary to when its frames are retransmitted when probing (PTO).
From now on, thanks to this patch we handle the TX packets memory this way. We
increment the packet refcount when:
- we insert it in its packet number space tree,
- we attach an ack-eliciting frame to it.
And reciprocally we decrement this refcount when:
- we remove an ack-eliciting frame from the packet,
- we delete the packet from its packet number space tree.
Note that an optimization WOULD NOT be to fully reuse (without releasing its
memory) a TX packet to retransmit its contents (its ack-eliciting frames). Its
information (timestamp, in flight length) is still to be processed by packet loss
detection and the congestion control.
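The four refcount rules listed above can be modelled in a few lines of Python. This is a sketch only: the helper names are hypothetical, a dict stands in for the packet number space tree, and a "freed" boolean stands in for releasing the packet memory.

```python
class TxPacket:
    def __init__(self, pn):
        self.pn = pn
        self.refcnt = 0
        self.frames = []
        self.freed = False

    def incr(self):
        self.refcnt += 1

    def decr(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True   # stands in for releasing the memory


def pktns_insert(tree, pkt):
    tree[pkt.pn] = pkt
    pkt.incr()


def pktns_delete(tree, pkt):
    del tree[pkt.pn]
    pkt.decr()


def attach_frame(pkt, frm):
    pkt.frames.append(frm)
    pkt.incr()


def detach_frame(pkt, frm):
    pkt.frames.remove(frm)
    pkt.decr()
```

When frames are moved to a new packet for probing, the old packet keeps one reference from the tree and survives; it is only freed once it is also deleted from the tree.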
When building a packet with an ACK frame, we store the largest acknowledged
packet number sent in this frame in the packet (quic_tx_packet struct).
When receiving an ack for such a packet we can purge the tree of acknowledged
packet number ranges from the range sent before this largest acknowledged
packet number.
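As an illustration of the purge, here is a Python sketch working on a sorted list of (first, last) ranges instead of a tree. The function name and the choice to trim a range that straddles the limit are assumptions.

```python
def purge_ack_ranges(ranges, largest_acked):
    """Drop everything up to and including largest_acked.

    ranges is a list of inclusive (first, last) packet number ranges
    sorted in ascending order.
    """
    kept = []
    for first, last in ranges:
        if last <= largest_acked:
            continue                      # entirely covered: purge it
        # Trim a range which straddles the limit.
        kept.append((max(first, largest_acked + 1), last))
    return kept
```

For instance, purging [(0, 3), (5, 9), (12, 15)] with a largest acknowledged packet number of 7 leaves [(8, 9), (12, 15)].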
This struct member stores the largest acked packet number which was received. It
is used to build (TX) packets. But it is confusing to store it in the tx packet
of the packet number space structure even if it is used to build and transmit
packets.
Add qc_may_reuse_cbuf() function used by qc_prep_pkts() and qc_prep_app_pkts().
Simplification of the factorized section code: there is no need to check there
is enough room to mark the end of the data in the TX buf. This is done by
the callers (qc_prep_pkts() and qc_prep_app_pkts()). Add a diagram to explain
the conditions which must be verified to be able to reuse a cbuf struct.
This should improve the QUIC stack implementation maintainability.
This commit reverts this one:
"d5066dd9d BUG/MEDIUM: quic: qc_prep_app_pkts() retries on qc_build_pkt() failures"
After having filled the congestion control window, qc_build_pkt() always fails.
Then depending on the relative position of the writer and reader indexes for the
TX buffer, this could lead this function to try to reuse the buffer even if not full.
In such case, we do not always mark the end of the data in this TX buffer. This
is something the reader cannot understand: it reads a false datagram length,
then a wrong packet address from the TX buffer, leading to an invalid pointer
dereferencing.
Implement a new MUX function qcc_notify_send. This function must be
called by the transport layer to confirm the sending of STREAM data to
the MUX.
For the moment, the function has no real purpose. However, it will be
useful to solve limitations on push frame and implement the flow
control.
The aim of the idle timeout is to silently close the connection after a period
of inactivity depending on the "max_idle_timeout" transport parameters advertised
by the endpoints. We add a new task to implement this timer. Its expiry is
updated each time we receive an ack-eliciting packet, and each time we send
an ack-eliciting packet if no other such packet was sent since we received
the last ack-eliciting packet. Such conditions may be implemented thanks
to QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ new flag.
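The two update conditions can be expressed as the following Python sketch. The IdleTimer class, its method names, the millisecond clock and the flag bit value are assumptions; a single already-computed timeout is passed in, so the way it is derived from the transport parameters is left out.

```python
QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ = 0x1  # illustrative bit value


class IdleTimer:
    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.flags = 0
        self.expiry = None

    def on_rx_ack_eliciting(self, now_ms):
        # Receiving an ack-eliciting packet always restarts the timer.
        self.expiry = now_ms + self.timeout_ms
        self.flags &= ~QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ

    def on_tx_ack_eliciting(self, now_ms):
        # Only the first ack-eliciting packet sent after a read restarts it.
        if self.flags & QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ:
            return
        self.flags |= QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ
        self.expiry = now_ms + self.timeout_ms
```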
There is no need to use such a reference counter anymore since the QUIC
connections are always handled by the same thread.
quic_conn_drop() is removed. Its code is merged into quic_conn_release().
Change the return value to success in qc_handle_bidi_strm_frm for two
specific cases :
* if STREAM frame is an already received offset
* if application decoding failed
This ensures that the packet is not dropped and properly acknowledged.
Previous to this fix, the return code was set to error which prevented
the ACK to be generated.
The impact of the bug might be noticeable in environments with packet
loss and retransmission. Due to haproxy not generating ACK for packets
containing STREAM frames with already received offset, the client will
probably retransmit them again, which will worsen the network
transmission.
Since the persistent congestion detection is done out of the congestion
controllers, there is no need to pass them information through quic_cc_event struct.
We remove its useless members. Also remove qc_cc_loss_event() which is no longer used.
We establish the persistent congestion out of any congestion controller
to improve the algorithms genericity. This path characteristic detection may
be implemented regardless of the underlying congestion control algorithm.
Send congestion (loss) event using directly quic_cc_event(), so without
qc_cc_loss_event() wrapper function around quic_cc_event().
Take the opportunity of this patch to shorten "newest_time_sent" member field
of quic_cc_event to "time_sent".
QUIC connection path in flight bytes is a variable which should not be manipulated
by the congestion controller. The latter's aim is to compute the congestion window.
So, we pass it as few parameters as possible to do so.
Since QUIC accept handling has been improved, the MUX is initialized
after the handshake completion. Thus it's safe to access transport
parameters in qc_init via the quic_conn.
Remove quic_mux_transport_params_update which was called by the
transport for the MUX. This improves the architecture by removing a
direct call from the transport to the MUX.
The deleted function body is not transferred to qc_init because this part
will change heavily in the near future when implementing the
flow-control.
We want to be able to build ack-eliciting frames to be embedded into QUIC packets
from a prebuilt list of ack-eliciting frames. This will be helpful for the mux
which would like to send STREAM frames asap after having built its own prebuilt
list.
To do so, we only add a parameter as struct list to this function to handle
such a prebuilt list.
We want to be able to send ack-eliciting packets from a list of ack-eliciting
frames. So, this patch adds such a parameter to the function responsible for
building 1RTT packets. The entry point function is qc_send_app_pkts() which
is used with the underlying packet number space TX frame list as parameter.
We want to get rid of the code used during the handshake step. The aim of
qc_prep_app_pkts() is to build short packets which are also datagrams.
Make quic_conn_app_io_cb() call this new function to prepare short packets.
As reported by Tim in issue #1428, our sources are clean, there are
just a few files with a few rare non-ASCII chars for the paragraph
symbol, a few typos, or in Fred's name. Given that Fred already uses
the non-accentuated form at other places like on the public list,
let's uniformize all this and make sure the code displays equally
everywhere.
A segfault happens when receiving a CONNECTION_CLOSE during handshake.
This is because the mux is not initialized at this stage but the
transport layer dereferences it.
Fix this by ensuring that the MUX is initialized before. Thanks to Willy
for his help on this one. Welcome in the QUIC-men team !
Do not distinguish the direction (TX/RX) when setting TLS secrets flags.
There is no such distinction in RFC 9001.
Assemble them at the same level: at the upper context level.
This is required since this previous commit:
"MINOR: quic: Post handshake I/O callback switching"
If not, such packets remain endlessly in the RX buffer and cannot be parsed
by the new I/O callback used after the handshake has been confirmed.
Wakeup asap the timer task when setting its timer in the past.
Take also the opportunity of this patch to simplify quic_pto_pktns():
calling tick_first() is useless here to compare <lpto> with <tmp_pto>.
Reorganize the Rx path for STREAM frames on bidirectional streams. A new
function qcc_recv is implemented on the MUX. It will handle the STREAM
frames copy and offset calculation from transport to MUX.
Another function named qcc_decode_qcs from the MUX can be called by
transport each time new STREAM data has been copied.
The architecture is now cleaner with the MUX layer in charge of parsing
the STREAM frames offsets. This is required to be able to implement the
flow-control on the MUX layer.
Note that as a convenience, a STREAM frame is not partially copied to
the MUX buffer. This simplifies the implementation for the moment but it
may change in the future to optimize the STREAM frames handling.
For the moment, only bidirectional streams benefit from this change. In
the future, it may be extended to unidirectional streams to unify the
STREAM frames processing.
FIN flag on a STREAM frame was not detected if the frame was previously
buffered on qcs.rx.frms before being handled.
To fix this, copy the fin field from the quic_stream instance to
quic_rx_strm_frm. This is required to properly notify the FIN flag on
qc_treat_rx_strm_frms for the MUX layer.
Without this fix, the request channel might be left open after the
last STREAM frame reception if there are out-of-order frames on the Rx
path.
This flag is set when the STREAM frame with FIN set has been received on
a qcs instance. For now, this is only used as a BUG_ON guard to prevent
against multiple frames with FIN set. It will also be useful when
reorganizing the RX path and moving some of its code into the mux.
Adjust the function to handle buffered STREAM frames. If the offset of
the frame was already fully received, discard the frame. If only
partially received, compute the difference and copy only the newly
received data.
Before this change, a buffered frame representing a fully or partially
received offset caused the loop to be interrupted. The frame was
preserved, thus preventing frames with greater offset to be handled.
This may fix some occurrences of stalled transfer on the request channel
if there are out-of-order STREAM frames on the Rx path.
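The adjusted loop can be sketched in Python as follows. This is a model under assumptions: frames are (offset, data) tuples kept in a list rather than a tree, and the function name is hypothetical.

```python
def consume_buffered_frames(rx_offset, frames):
    """Consume buffered STREAM frames starting at rx_offset.

    Returns the new offset, the bytes copied and the frames still
    buffered because of a gap.
    """
    copied = bytearray()
    pending = sorted(frames, key=lambda f: f[0])
    while pending:
        offset, data = pending[0]
        if offset > rx_offset:
            break                     # gap: wait for the missing data
        pending.pop(0)
        end = offset + len(data)
        if end <= rx_offset:
            continue                  # fully received already: discard
        # Partially received: copy only the part beyond rx_offset.
        copied += data[rx_offset - offset:]
        rx_offset = end
    return rx_offset, bytes(copied), pending
```

The important point is that an already received frame no longer interrupts the loop, so frames with a greater offset behind it still get handled.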
qc_strm_cpy can be simplified by simply using b_putblk which already
handles wrapping of the destination buffer. The function is kept to
update the frame length and offset fields.
The quic_frame instance containing the quic_stream must be freed when
the corresponding ACK has been received. However when implementing this
on qcs_try_to_consume, some data transfers are interrupted and cannot
complete (DC test from interop test suite).
The sending buffer of each stream is cleared when processing ACKs
corresponding to STREAM emitted frames. If the buffer is empty, free it
and offer it as with other dynamic buffers usage.
This should reduce memory consumption, as previously an open stream
confiscated a buffer during its whole lifetime even if there was no more
data to transmit.
Simplify the data manipulation of STREAM frames on TX. Only stream data
and len field are used to generate valid STREAM frames from the
buffer. Do not use the offset field, which required that a single buffer
instance should be shared for every frame on a single stream.
Adjust the handling of ACK for STREAM frames. When receiving an ACK, the
corresponding frames from the acknowledged packet are retrieved. If a
frame is of type STREAM, we compare the frame STREAM offset with the
last offset known of the qcs instance.
The comparison was incomplete as it did not treat an acked offset smaller
than the known offset. Previously, the acked frame was incorrectly
buffered in the qcs.tx.acked_frms. On reception of future ACKs, when
trying to process the buffered acks via qcs_try_to_consume, the loop is
interrupted on the smallest offset different from the qcs known offset :
in this case it will be the previous smaller range. This is a real bug
as it prevents all buffered ACKs from being processed, eventually filling the
qcs sending buffer and causing the transfer to stall.
Fix this by properly handling smaller acked offsets. First check
if the offset length is greater than the qcs offset and mark as
acknowledged the difference on the qcs. If not, the frame is not
buffered and simply ignored.
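A Python sketch of the three cases follows. It assumes that "offset length" above means the frame offset plus its length; the function name and the list used for buffered frames are hypothetical.

```python
def handle_acked_stream_frame(ack_offset, frm_offset, frm_len, buffered):
    """Return the new acknowledged offset of the stream."""
    end = frm_offset + frm_len        # assumption: offset + length
    if frm_offset > ack_offset:
        # Out-of-order ACK: buffer it until the gap is filled.
        buffered.append((frm_offset, frm_len))
        return ack_offset
    if end > ack_offset:
        # Same or smaller offset reaching beyond the known one:
        # acknowledge only the difference.
        return end
    # Smaller offset entirely below the known one: ignore the frame.
    return ack_offset
```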
The recent change was not complete.
d1c76f24fd
MINOR: quic: do not modify offset node if quic_rx_strm_frm in tree
The frame length and data pointer should be incremented after the data
copy. A BUG_ON statement has been added to detect an incorrect decrement
operation.
qc_rx_strm_frm_cpy is unsafe because it updates the offset field of the
frame. This is not safe as the frame is inserted in the tree when
calling this function and offset serves as the key node.
To fix this, the API is modified so that qc_rx_strm_frm_cpy does not
update the frame parameter. The caller is responsible to update
offset/length in case of a partial copy.
The impact of this bug is not known. It can only happen with received
STREAM frames out-of-order. This might be triggered with large h3 POST
requests.
Remove this server specific code section. It is useless, not tested. Furthermore
this is really not the good place to retrieve the peer transport parameters.
If the last frame is not entirely copied and must be buffered, FIN
must not be signaled to the upper layer.
This might fix a rare bug which could cause the request channel to be
closed too early leading to an incomplete request.
If a CONNECTION_CLOSE is received during handshake or after mux release,
a segfault happens due to invalid dereference of qc->qcc. Check
mux_state first to prevent this.
Move the QUIC datagram handlers outside of the receivers. Use a global
handler per-thread which is allocated on post-config. Implement a free
function on process deinit to avoid a memory leak.
This should fix Coverity CID 375047 in GH #1536 where <buf_area> could leak because
not always freed by quic_conn_drop(), especially when not stored in <qc> variable.
Rename quic_conn_to_buf to qc_snd_buf and remove it from xprt ops. This
is done to reflect the true usage of this function which is only a
wrapper around sendto but cannot be called by the upper layer.
qc_snd_buf is moved in quic-sock to mark its link with
quic_sock_fd_iocb which is the recvfrom counterpart.
Rename a local variable tid to cid_tid. This ensures there is no
confusion with the global tid. It is now more explicit that we are
manipulating a QUIC datagram handler from another thread in
quic_lstnr_dgram_dispatch.
In fact the xprt_ctx of the connection is first stored into quic_conn
struct as soon as it is initialized from qc_conn_alloc_ssl_ctx().
As quic_conn_init_timer() is run after this function, we can associate
the timer context of the timer to the one from the quic_conn struct.
We must move this initialization from xprt_start() callback, which
comes too late (after handshake completion for 1RTT session). This timer must be
usable as soon as we have packets to send/receive. Let's initialize it after
the TLS context is initialized in qc_conn_alloc_ssl_ctx(). This latter function
initializes I/O handler task (quic_conn_io_cb) to send/receive packets.
Do not use an extra DCID parameter on new_quic_cid to be able to
associate a newly generated CID to a thread ID. Simply do the computation
inside the function. The API is cleaner this way.
This also has the effect of improving the apparent randomness of CIDs.
With the previous version the first byte of all CIDs was identical for a
connection which could lead to privacy issues. This version may not be
totally perfect on this aspect but it improves the situation.
The CID trees are no longer attached to the listener receiver but to the
underlying datagram handlers (one per thread) which always run on the same thread.
So, operations on these trees do not require any locking.
We copy the first octet of the original destination connection ID to any CID for
the connection calling new_quic_cid(). So this patch modifies only this function
to take a dcid as passed parameter.
Rename quic_lstnr_dgram_read() to quic_lstnr_dgram_dispatch() to reflect its new role.
After calling this latter, the sock i/o handler must consume the buffer only if
the datagram it received is detected as wrong by quic_lstnr_dgram_dispatch().
The datagram handler task marks the datagram as consumed by atomically setting ->buf
to NULL value. The sock i/o handler is responsible for flushing its RX buffer
before using it. It also keeps a datagram among the consumed ones so as
to pass it to quic_lstnr_dgram_dispatch() and prevent it from allocating a new one.
quic_dgram_read() parses all the QUIC packets from a UDP datagram. It is the best
candidate to be converted into a task, because its unit of processing is the UDP
datagram received by the QUIC sock i/o handler. If correct, this datagram is
added to the context of a task, quic_lstnr_dghdlr(), a conversion of quic_dgram_read()
into a task. This task pops a datagram from an mt_list and passes it to
the packet handler (quic_lstnr_pkt_rcv()).
Modify the quic_dgram struct to play the role of the old quic_dgram_ctx struct when
passed to quic_lstnr_pkt_rcv().
Modify the datagram handlers allocation to set their tasks to quic_lstnr_dghdlr().
Add quic_dgram new structure to store information about datagrams received
by the sock I/O handler (quic_sock_fd_iocb) and its associated pool.
Implement quic_get_dgram_dcid() to retrieve the datagram DCID which must
be the same for all the packets in the datagram.
Modify quic_lstnr_dgram_read() called by the sock I/O handler to allocate
a quic_dgram each time a correct datagram is found and add it to the sock I/O
handler rxbuf dgram list.
This function is no longer used, broken and uses code shared with the
listener packet parser. This is becoming annoying to continue to modify
it without testing each time we modify the code it shares with the
listener packet parser.
This is to be sure xprt functions do not manipulate the buffer struct
passed as parameter to quic_lstnr_dgram_read() from low level datagram
I/O callback in quic_sock.c (quic_sock_fd_iocb()).
Mention that the token is sent only by servers in both server and listener
packet parsers.
Remove a "TO DO" section in listener packet parser because there is nothing
more to do in this function about the token.
This quic_dgram_ctx struct member is used to denote if we are parsing a new
datagram (null value), or a coalesced packet into the current datagram (non null
value). But it was never set.
Do not proceed to direct accept when creating a new quic_conn. Wait for
the QUIC handshake to succeed to insert the quic_conn in the accept
queue. A tasklet is then woken up to call listener_accept to accept the
quic_conn.
The most important effect is that the connection/mux layers are not
instantiated at the same time as the quic_conn. This forces us to delay
some processing to be sure that the mux is allocated :
* initialization of mux transport parameters
* installation of the app-ops
Also, the mux instance is not checked now to wake up the quic_conn
tasklet. This is safe because the xprt-quic code is now ready to handle
the absence of the connection/mux layers.
Note that this commit has a deep impact as it changes significantly the
lower QUIC architecture. Most notably, it breaks the 0-RTT feature.
The connection is allocated after finishing the QUIC handshake. Remove
handshake/L6 flags when initializing the connection as handshake is
finished with success at this stage.
Remove usage of connection in quic_conn_from_buf. As connection and
quic_conn are decorrelated, it is not logical to check connection flags
when using sendto.
This requires storing the L4 peer address in quic_conn to be able to use
sendto.
This change is required to delay allocation of connection.
Add a new function in mux-quic to install app-ops. For now this
function is called during the ALPN negotiation of the QUIC handshake.
This change will be useful when the connection accept queue will be
implemented. It will be thus required to delay the app-ops
initialization because the mux won't be allocated anymore during the
QUIC handshake.
Define a new enum to represent the status of the mux/connection layer
above a quic_conn. This is important to know if it's possible to handle
application data, or if it should be buffered or dropped.
Adjust the function to check if header protection can be removed. It can
now be used both for a single packet in qc_lstnr_pkt_rcv and in the
quic_conn handler to handle buffered packets for a specific encryption
level.
Extract the allocation of ssl_sock_ctx from qc_conn_init to a dedicated
function qc_conn_alloc_ssl_ctx. This function is called just after
allocating a new quic_conn, without waiting for the initialization of
the connection. It allocates the ssl_sock_ctx and the quic_conn tasklet.
This change is now possible because the SSL callbacks are dealing with a
quic_conn instance.
This change is required to be able to delay the connection allocation
and handle handshake packets without it.
Allow to register quic_conn as ex-data in SSL callbacks. A new index is
used to identify it as ssl_qc_app_data_index.
Replace connection by quic_conn as SSL ex-data when initializing the QUIC
SSL session. When using SSL callbacks in QUIC context, the connection is
now NULL. Use quic_conn instead to retrieve the required parameters.
Also clean up
The same changes are conducted inside the QUIC SSL methods of xprt-quic
: connection instance usage is replaced by quic_conn.
Some functions of xprt-quic were still using connection instead of
quic_conn. This must be removed as the two are decorrelated : a
quic_conn can exist without a connection.
It is possible that the listener is in INITIAL state, but has to probe
with Handshake packets. In this case, when entering qc_prep_pkts() there
is nothing to do. We must select the next packet number space (or encryption
level) to be able to probe with such packet type.
Remove the unsafe call to tasklet_free in quic_close. At this stage the
tasklet may already be scheduled by another thread even if the
quic_conn refcount is now null. It will probably cause a crash on the
next tasklet processing.
Use tasklet_kill instead to ensure that the tasklet is freed in a
thread-safe way. Note that quic_conn_io_cb is not protected by the
refcount so only the quic_conn pinned thread must kill the tasklet.
Adjust slightly refcount code decrement on quic_conn close. A new
function named quic_conn_release is implemented. This function is
responsible for removing the quic_conn from the CID trees and decrementing
the refcount to free the quic_conn once all threads have finished working
with it.
For now, quic_close is responsible for calling it so the quic_conn is
scheduled to be freed by upper layers. In the future, it may be useful to
delay it to be able to send remaining data or wait for missing ACKs
for example.
This simplifies quic_conn_drop, which does not require the lock anymore.
Also, this can help to free the connection more quickly in some cases.
quic_conn_drop decrements the refcount and may free the quic_conn if
reaching 0. The quic_conn should not be dereferenced again after it in
any case even for traces.
Again, we fix a remnant of the way we probed before probing by packet.
When we were probing by datagram we inspected <prv_pkt> to know if we were
coalescing several packets. There is no need to do that at all when probing by packet.
Furthermore this could lead to blocking situations where we want to probe but
are limited by the congestion control (<cwnd> path variable). This must not be
the case. When probing we must do it regardless of the congestion control.
If a client resends Initial CRYPTO data, this is because it did not receive all
the server Initial CRYPTO data. With this patch we prepare a fast retransmission
without waiting for the PTO timer expiration, sending old Initial CRYPTO data and
coalescing them with Handshake CRYPTO data if present in the same datagram. Furthermore
we also send a datagram made of previously sent Handshake CRYPTO data if any.
When probing, we must not take the congestion control window into account.
This was not completely correctly implemented: qc_build_frms() could fail
because of this limit when comparing the head of the packet against the
congestion control window. With this patch we make it fail only when
we are not probing.
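The rule above can be sketched as follows. This is a simplified illustration
under stated assumptions: the cc_path structure and the may_send() helper are
hypothetical names and do not reflect the actual qc_build_frms() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a path's congestion state. */
struct cc_path {
	size_t cwnd;      /* congestion window, in bytes */
	size_t in_flight; /* bytes sent and not yet acknowledged */
};

/* Returns true if <len> more bytes may be sent on <path>. */
static bool may_send(const struct cc_path *path, size_t len, bool probing)
{
	assert(path != NULL);

	/* Probe packets are sent regardless of the congestion window. */
	if (probing)
		return true;

	/* Written as a subtraction to avoid overflowing in_flight + len. */
	if (path->in_flight >= path->cwnd)
		return false;
	return len <= path->cwnd - path->in_flight;
}
```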
This is to avoid too many PTO timer expirations for the 01RTT and Handshake packet
number spaces. Furthermore we are not limited by the anti-amplification limit for
the 01RTT packet number space. According to the RFC we can send up to two packets.
This modification should have come with this commit:
"MINOR: quic: Remove nb_pto_dgrams quic_conn struct member"
where the nb_pto_dgrams quic_conn struct member was removed.
When building packets to send, we build frames computing their sizes
to have more chance to be added to new packets. There are rare cases
where this packet could not be built because of the congestion control
which may for instance prevent us from building a packet with padding
(retransmitted Initial packets). In such a case, the pre-built frames
were lost because they were added to the packet frame list but not moved
back to the packet number space they came from.
With this patch we add the frames to the packet only if it could be built
and move them back to the packet number space if not.
There is no need to use an MT_LIST to store frames to send from a packet
number space. This is a remnant of the multi-threading support for the TX part.
If we wakeup the I/O handler before the mux is started, it is possible
it has enough time to parse the ClientHello TLS message and update the
mux transport parameters, leading to a crash.
So, we initialize the ->qcc quic_conn struct member at the very last moment,
when the mux is fully initialized. The condition to wakeup the I/O handler
from lstnr_rcv_pkt() is: xprt context and mux both initialized.
Note that if the xprt context is initialized, it implies its tasklet is
initialized. So, we do not check anymore this latter condition.
Free the ssl_sock_ctx tasklet in quic_close() instead of
quic_conn_drop(). This ensures that the tasklet is destroyed safely by
the same thread.
This has no impact as the free operation was previously conducted with
care and should not be responsible for any crash.
Implement the emission of Retry packets. These packets are emitted in
response to Initial from clients without token. The token from the Retry
packet contains the ODCID from the Initial packet.
By default, Retry packet emission is disabled and the handshake can
continue without address validation. To enable Retry, a new bind option
has been defined named "quic-force-retry". If set, the handshake must be
conducted only after receiving a token in the Initial packet.
Implement the parsing of token from Initial packets. It is expected that
the token contains a CID which is the DCID from the Initial packet
received from the client without token which triggers a Retry packet.
This CID is then used for transport parameters.
Note that at the moment Retry packet emission is not implemented. This
will be achieved in a following commit.
It is expected that quic_dgram_read() returns the total number of bytes
read. Fix the return value when the read has been successful. This bug
has no impact as in the end the return value is not checked by the
caller.
->conn quic_conn struct member is a connection struct object which may be
released from several places. With this patch we do our best to stop dereferencing
this member as much as we can.
This commit was not correct:
"MINOR: quic: Only one CRYPTO frame by encryption level"
Indeed, when receiving CRYPTO data from the TLS stack for a packet number space,
there are rare cases where there are already frames other than CRYPTO data frames
in the packet number space, especially for the 01RTT packet number space. This
happens very often with quant as client.
In fact we must look for the first packet with some ack-eliciting frame
in the packet number space tree to retransmit from. Obviously there
may already be retransmitted packets which are not deemed lost and
are still present in the packet number space tree for TX packets.
When receiving CRYPTO data from the TLS stack, concatenate the CRYPTO data
to the first allocated CRYPTO frame if present. This reduces by one the number
of handshake packets built for a connection with a standard size certificate.
When blocked by the anti-amplification limit, it is the responsibility of the
client to unblock the server by sending new datagrams. On the server side, even
if not well parsed, such datagrams must trigger the PTO timer arming.
Switch back to QUIC_HS_ST_SERVER_HANDSHAKE state after a completed handshake
if acks must be sent.
Also ensure we build post handshake frames only one time without using prev_st
variable and ensure we discard the Handshake packet number space only one time.
We need to be able to decrypt late Handshake packets after the TLS secret
keys have been discarded. If not, the peer keeps sending Handshake packets which have
not been acknowledged. But for such packets, we discard the CRYPTO data.
According to RFC 9002 par. 6.2.3., when receiving duplicate Initial CRYPTO
data, a server may send a packet containing unacknowledged CRYPTO data before
the PTO expiry.
These tests were there to initiate PTO probing but they are not correct.
Furthermore they may break the PTO probing process and lead to useless packet
building.
RFC 9002 5.3. Estimating smoothed_rtt and rttvar:
MUST use the lesser of the acknowledgment delay and the peer's max_ack_delay
after the handshake is confirmed.
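The rule quoted above can be sketched as below, following the RTT sample
adjustment described in RFC 9002 5.3. The function name and its parameters are
hypothetical; all durations are assumed to be expressed in the same unit
(microseconds here), and <min_rtt> is assumed to have already been updated
with the latest sample.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the RTT sample adjusted for the acknowledgment delay.
 * Hypothetical helper, not the actual haproxy function.
 */
static uint64_t quic_adjusted_rtt_us(uint64_t latest_rtt, uint64_t min_rtt,
                                     uint64_t ack_delay, uint64_t max_ack_delay,
                                     bool hs_confirmed)
{
	/* min_rtt is the minimum over all samples, including the latest. */
	assert(min_rtt <= latest_rtt);

	/* After handshake confirmation, cap the delay reported by the peer. */
	if (hs_confirmed && ack_delay > max_ack_delay)
		ack_delay = max_ack_delay;

	/* Only subtract the delay if the result does not go below min_rtt. */
	if (latest_rtt >= min_rtt + ack_delay)
		return latest_rtt - ack_delay;
	return latest_rtt;
}
```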
Properly initialize the ssl_sock_ctx pointer in qc_conn_init. This is
required to avoid setting an undefined pointer in qc.xprt_ctx if argument
*xprt_ctx is NULL.
Implement a refcount on quic_conn instance. By default, the refcount is
0. Two functions are implemented to manipulate it.
* qc_conn_take() which increments the refcount
* qc_conn_drop() which decrements it. If the refcount is 0 *BEFORE*
the subtraction, the instance is freed.
The refcount is incremented on retrieve_qc_conn_from_cid() or when
allocating a new quic_conn in qc_lstnr_pkt_rcv(). It is decremented most
notably by the xprt.close operation and at the end of
qc_lstnr_pkt_rcv(). The increments/decrements should be conducted under
the CID lock to guarantee thread-safety.
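The take/drop semantics described above can be sketched as follows. This is a
single-threaded illustration with hypothetical names: the real code is expected
to run under the CID lock and to actually release memory, whereas this sketch
only records the release in a flag so that the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted quic_conn. */
struct conn_ref {
	unsigned int refcount; /* 0 by default */
	bool freed;            /* set when the instance would be freed */
};

static void conn_ref_take(struct conn_ref *c)
{
	assert(c != NULL && !c->freed);
	c->refcount++;
}

/* Returns true if the instance has been released by this call. */
static bool conn_ref_drop(struct conn_ref *c)
{
	assert(c != NULL && !c->freed);

	/* The instance is released if the refcount is 0 before decrementing. */
	if (c->refcount == 0) {
		c->freed = true;
		return true;
	}
	c->refcount--;
	return false;
}
```

With this convention, a freshly allocated instance that nobody took is released
by the first drop, and N takes require N + 1 drops before the release happens.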
The timer task is attached to the connection-pinned thread. Only this
thread can delete it. With the future refcount implementation of
quic_conn, every thread can be responsible to remove the quic_conn via
quic_conn_free(). Thus, the timer task deletion is moved from the
calling function quic_close().
Big refactoring on xprt-quic. A lot of functions were using the
ssl_sock_ctx as argument to only access the related quic_conn. All these
arguments are replaced by a quic_conn parameter.
As a convention, the quic_conn instance is always the first parameter of
these functions.
This commit is part of the rearchitecture of xprt-quic layers and the
separation between xprt and connection instances.
Remove the shortcut to use the INITIAL encryption level when removing
header protection on first connection packet.
This change is useful for the following change which removes
ssl_sock_ctx in argument lists in favor of the quic_conn instance.
Add a pointer in quic_conn to its related ssl_sock_ctx. This change is
required to avoid using the connection instance to access it.
This commit is part of the rearchitecture of xprt-quic layers and the
separation between xprt and connection instances. It will be notably
useful when the connection allocation will be delayed.
free_quic_conn_cids() was called in quic_build_post_handshake_frames()
if an error occurred. However, the only error is an allocation failure of
the CID which does not require calling it.
This change is required for future refcount implementation. The CID lock
will be moved from free_quic_conn_cids() to the caller.
When a quic_conn is found in the DCID tree, it can be removed from the
first ODCID tree. However, this operation must absolutely be run under a
write-lock to avoid race conditions. To avoid taking the lock too
frequently, node.leaf_p is checked. This value is set to NULL after
ebmb_delete.
Add traces about important frame types to chunk_tx_frm_appendf()
and call this function for any type of frame when parsing a packet.
Move it to quic_frame.c
This is the same treatment for bidi and uni STREAM frames. This is duplicated
code which should be removed by building a single function for both these types of streams.
The connection instance has been replaced by a quic_conn as first
argument to QUIC traces. It is possible to report the quic_conn instance
in the qc_new_conn(), contrary to the connection which is not
initialized at this stage.
Replace the connection instance for first argument of trace callback by
a quic_conn instance. The QUIC trace module is properly initialized with
the first argument referring to a quic_conn.
Replace every connection instance in TRACE_* macro invocations in
xprt-quic by its related quic_conn. In some cases, the connection is
still used to access the quic_conn. It may cause some problems in the
future when the connection is completely separated from the xprt
layer.
This commit is part of the rearchitecture of xprt-quic layers and the
separation between xprt and connection instances.
Add const qualifier on arguments of several dump functions used in the
trace callback. This is required to be able to replace the first trace
argument by a quic_conn instance. The first argument is a const pointer
and so the members accessed through it must also be const.
Add a new member in ssl_sock_ctx structure to reference the quic_conn
instance if used in the QUIC stack. This member is initialized during
qc_conn_init().
This is needed to be able to access to the quic_conn without relying on
the connection instance. This commit is part of the rearchitecture of
xprt-quic layers and the separation between xprt and connection
instances.
Move qcc_get_qcs() function from xprt_quic.c to mux_quic.c. This
function is used to retrieve the qcs instance from a qcc with a stream
id. This clearly belongs to the mux-quic layer.
Use the convention of naming quic_conn instance as qc to not confuse it
with a connection instance. The changes occurred for qc_parse_pkt_frms(),
qc_build_frms() and qc_do_build_pkt().
The QUIC connection I/O handler qc_conn_io_cb() could be called just after
qc_pkt_insert() has inserted a packet in its tree, and before qc_pkt_insert()
has incremented the reference counter to this packet. As qc_conn_io_cb()
decrements this counter, the packet could be released before qc_pkt_insert()
increments the counter, leading to possible crashes when trying to do so.
So, let's make qc_pkt_insert() increment this counter before inserting the packet
in its tree. No need to lock anything for that.
Add a function to process all STREAM frames received and ordered
by their offset (qc_treat_rx_strm_frms()) and modify
qc_handle_bidi_strm_frm() consequently.
With the DCID refactoring, the locking is more centralized. It is
possible to simplify the code for removal of a quic_conn from the ODCID
tree.
This operation can be conducted as soon as the connection has been
retrieved from the DCID tree, meaning that the peer now uses the final
DCID. Remove the bit to flag a connection for removal and just use
ebmb_delete() on each successful lookup on the DCID tree. If the
quic_conn has already been removed, it is just a noop thanks to
eb_delete() implementation.
A new function named qc_retrieve_conn_from_cid() now contains all the
code to retrieve a connection from a DCID. It handles all types of packets
and centralizes the locking on the ODCID/DCID trees.
This simplifies the qc_lstnr_pkt_rcv() function.
If a UDP datagram contains multiple QUIC packets, they must all use the
same DCID. The datagram context is used partly for this.
To ensure this, a comparison was made on the dcid_node of DCID tree. As
this is a comparison based on pointer address, it can be faulty when
nodes are removed/readded on the same pointer address.
Replace this comparison by a proper comparison on the DCID data itself.
To this end, the dgram_ctx structure contains now a quic_cid member.
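A comparison on the DCID data itself, as opposed to a comparison of node
addresses, could look like the sketch below. The ex_cid structure and the
ex_cid_equal() helper are hypothetical and simplified; they are not the actual
quic_cid definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EX_CID_MAXLEN 20 /* maximum connection ID length in QUIC version 1 */

/* Hypothetical, simplified connection ID. */
struct ex_cid {
	unsigned char data[EX_CID_MAXLEN];
	unsigned char len;
};

/* Returns true if both CIDs carry the same bytes, whatever their addresses. */
static bool ex_cid_equal(const struct ex_cid *a, const struct ex_cid *b)
{
	assert(a != NULL && b != NULL);
	assert(a->len <= EX_CID_MAXLEN && b->len <= EX_CID_MAXLEN);

	return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

Two distinct objects holding the same bytes compare equal, and an object reused
at the same address with different bytes compares different, which is what a
pointer comparison cannot guarantee.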
For first Initial packets, the socket source dest address is
concatenated to the DCID. This is used to be able to differentiate
possible collision between several clients which used the same ODCID.
Refactor the code to manage DCID and the concatenation with the address.
Before this, the concatenation was done on the quic_cid struct and its
<len> field incremented. In the code it is difficult to differentiate a
normal DCID with a DCID + address concatenated.
A new field <addrlen> has been added in the quic_cid struct. The <len>
field now only contains the size of the QUIC DCID. The <addrlen> field is
first initialized to 0. If the address is concatenated, it will be
updated with the size of the concatenated address. This now means we
have to explicitly use either cid.len or cid.len + cid.addrlen to
access the DCID or the DCID + the address. The code should be clearer
thanks to this.
The field <odcid_len> in quic_rx_packet struct is now useless and has
been removed. However, a new parameter must be added to the
qc_new_conn() function to specify the size of the ODCID addrlen.
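The separation between <len> and <addrlen> can be sketched as follows. The
structure layout, the buffer sizes and the helper names are assumptions made
for the illustration; only the two field names come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CID_MAXLEN  20  /* maximum connection ID length in QUIC version 1 */
#define ADDR_MAXLEN 128 /* hypothetical room reserved for the address */

/* Hypothetical, simplified CID with an optional concatenated address. */
struct cid_addr {
	unsigned char data[CID_MAXLEN + ADDR_MAXLEN];
	unsigned char len;     /* size of the DCID only */
	unsigned char addrlen; /* size of the concatenated address, 0 if none */
};

static void cid_addr_init(struct cid_addr *cid, const unsigned char *dcid, size_t len)
{
	assert(cid != NULL && dcid != NULL && len <= CID_MAXLEN);
	memcpy(cid->data, dcid, len);
	cid->len = (unsigned char)len;
	cid->addrlen = 0;
}

/* Appends the address after the DCID without touching <len>. */
static void cid_addr_concat(struct cid_addr *cid, const void *addr, size_t addrlen)
{
	assert(cid != NULL && addr != NULL && addrlen <= ADDR_MAXLEN);
	memcpy(cid->data + cid->len, addr, addrlen);
	cid->addrlen = (unsigned char)addrlen;
}

/* Length to use when the key is the DCID plus the address. */
static size_t cid_addr_key_len(const struct cid_addr *cid)
{
	assert(cid != NULL);
	return (size_t)cid->len + cid->addrlen;
}
```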
In the haproxy implementation, generated DCIDs are 8 bytes long, the minimal
value allowed by the specification. Rename the constant representing
this size to inform that this is haproxy specific.
All operation on the ODCID/DCID trees must be conducted under a
read-write lock. Add a missing read-lock on the lookup operation inside
listener handler.
The packet number space flags were mixed with the connection level flags.
This led to ACKs being sent at the connection level without regard to
the underlying packet number space. But we want to be able to acknowledge
packets for a specific packet number space.
A client sends a 0-RTT data packet after an Initial one in the same datagram.
We must be able to parse such packets just after having parsed the Initial packets.
Export the code responsible for setting the ->app_ops structure into the
quic_set_app_ops() function. It must be called by the TLS callback which
selects the application (ssl_sock_advertise_alpn_protos) so as
to be able to build application packets after having received 0-RTT data.
The TLS does not provide us with TX secrets after we have provided it
with 0-RTT data. This is logical: the server does not need to send 0-RTT
data. We must skip the section where such secrets are derived if we do not
want to close the connection with a TLS alert.
Enable 0-RTT at the TLS context level:
RFC 9001 4.6.1. Enabling 0-RTT
Accordingly, the max_early_data_size parameter is repurposed to hold a
sentinel value 0xffffffff to indicate that the server is willing to accept
QUIC 0-RTT data.
At the SSL connection level, we must call SSL_set_quic_early_data_enabled().
This field is no longer useful. Modify the traces consequently.
Also initialize ->pn_node.key value to -1, which is an illegal value
for QUIC packet number, and display it in traces if different from -1.
If not handled by qc_parse_pkt_frms(), the packet which contains it is dropped.
Add only a trace when parsing this frame at this time.
Also modify others to reduce the traces size and have more information about streams.
The xprt layer is responsible for notifying the mux of a CONNECTION_CLOSE
reception. In this case the flag QC_CF_CC_RECV is set on the
qcc and the mux tasklet is woken up.
One of the notable effects of the QC_CF_CC_RECV is that each qcs will be
released even if they have remaining data in their send buffers.
Set the HTX EOM flag on RX in the app layer. This is required to notify
about the end of the request for the stream analyzers, else the request
channel never goes to MSG_DONE state.
Remove qc_eval_pkt() which has come with the multithreading support. It
was there to evaluate the length of a TX packet before building. We could
build from several thread TX packets without consuming a packet number for nothing (when
the building failed). But as the TX packet building functions are always
executed by the same thread, the one attached to the connection, this does
not make sense to continue to use such a function. Furthermore it is buggy
since we had to recently pad the TX packet under certain circumstances.
After the handshake has succeeded, we must delete any remaining
Initial or Handshake packets from the RX buffer. This cannot be
done depending on the state of the connection (->st quic_conn struct
member value) as the packets are not received/treated in order.
Add a null byte to the end of the RX buffer to notify the consumer there is no
more data to treat.
Modify quic_rx_packet_pool_purge() which is the function which removes the
RX packets from the buffer.
Also rename this function to quic_rx_pkts_del().
As the RX packets may be accessed by the QUIC connection handler (quic_conn_io_cb())
the function responsible for decrementing their reference counters must not
access other information than these reference counters! It was a very bad idea
to try to purge the RX buffer asap when executing this function.
Do not leave in the RX buffer packets with CRYPTO data which were
already received. We do this when parsing CRYPTO frame. If already
received we must not consider such frames as if they were not received
in order! This had as side effect to interrupt the transfer of long streams
(ACK frames not parsed).
Implement the subscription in the mux on the qcs instance.
Subscribe is now used by the h3 layer when receiving an incomplete frame
on the H3 control stream. It is also used when attaching the remote
uni-directional streams on the h3 layer.
In the qc_send, the mux wakes up the qcs for each new transfer executed.
This is done via the method qcs_notify_send().
The xprt wakes up the qcs when receiving data on unidirectional streams.
This is done via the method qcs_notify_recv().
Re-implement the QUIC mux. It will reuse the mechanics from the previous
mux without all untested/unsupported features. This should ease the
maintenance.
Note that a lot of features are broken for the moment. They will be
re-implemented on the following commits to have a clean commit history.
The app layer is initialized after the handshake completion by the XPRT
stack. Call the finalize operation just after that.
Remove the erroneous call to finalize by the mux in the TPs callback as
the app layer is not yet initialized at this stage.
This should fix the missing H3 settings currently not emitted by
haproxy.
As soon as the connection ID (the one chosen by the QUIC server) has been used
by the client, we can delete its original destination connection ID from its tree.
This patch modifies ha_quic_set_encryption_secrets() to store the
secrets received by the TLS stack and prepare the information for the
next key update thanks to quic_tls_key_update().
qc_pkt_decrypt() is modified to check if we must use the next or the
previous key phase information to decrypt a short packet.
The information is rotated if the packet could be decrypted with the
next key phase information. Then new secrets, keys and IVs are updated
calling quic_tls_key_update() to prepare the next key phase.
quic_build_packet_short_header() is also modified to handle the key phase
bit from the current key phase information.
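The choice between current, previous and next key phase information can be
sketched as below. This is a simplified illustration and not the logic of
qc_pkt_decrypt(): the kp_rx_state structure, its fields and kp_select() are
hypothetical. The Key Phase bit is bit 0x04 of the first byte of a short
header and can only be read once header protection has been removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Key Phase bit in the first byte of a short header packet (RFC 9000). */
#define SHORT_HDR_KEY_PHASE_BIT 0x04

enum kp_choice { KP_CURRENT, KP_PREVIOUS, KP_NEXT };

/* Hypothetical, simplified RX key phase state. */
struct kp_rx_state {
	bool cur_phase;      /* key phase bit of the current RX keys */
	bool has_prev;       /* previous RX keys are still available */
	uint64_t cur_min_pn; /* lowest packet number seen with the current keys */
};

/* <first_byte> must already have its header protection removed. */
static enum kp_choice kp_select(const struct kp_rx_state *st,
                                unsigned char first_byte, uint64_t pn)
{
	bool phase;

	assert(st != NULL);
	phase = (first_byte & SHORT_HDR_KEY_PHASE_BIT) != 0;

	if (phase == st->cur_phase)
		return KP_CURRENT;
	/* A toggled bit on an old packet number is a late packet. */
	if (st->has_prev && pn < st->cur_min_pn)
		return KP_PREVIOUS;
	/* Otherwise the peer is assumed to have initiated a key update. */
	return KP_NEXT;
}
```

In such a scheme the rotation would only take place once a packet has actually
been decrypted with the next key phase information, as described above.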
This function derives the next RX and TX keys and IVs from secrets
for the next key update key phase. We also implement quic_tls_rotate_keys()
which rotates the key update key phase information to be able to continue
to decrypt old key phase packets. Most of this information consists of pointers
to unsigned char.
When running Key Update process, we must maintain much information
especially when the key phase bit has been toggled by the peer as
it is possible that it is due to late packets. This patch adds the new
quic_tls_kp structure to do so. It is used to store
previous and next secrets, keys and IVs associated to the previous
and next RX key phase. We also need the next TX key phase information
to be able to encrypt packets for the next key phase.
haproxy may crash when running this statement in qc_lstnr_pkt_rcv():
conn_ctx = qc->conn->xprt_ctx;
because qc->conn may not be initialized. With this patch we ensure
qc->conn is correctly initialized before accessing its ->xprt_ctx
members. We zero the xprt_ctx structure (ssl_conn_ctx struct), then
initialize its ->conn member with HA_ATOMIC_STORE. Then, ->conn and
->conn->xprt_ctx members of quic_conn struct can be accessed with HA_ATOMIC_LOAD().
When sending a CONNECTION_CLOSE frame to immediately close the connection,
do not provide CRYPTO data to the TLS stack. Do not build anything else than a
CONNECTION_CLOSE and do not derive any secret when in immediate close state.
Seize the opportunity of this patch to rename ->err quic_conn struct member
to ->error_code.
We set this TLS error when no application protocol could be negotiated
via the TLS callback concerned. It is converted as a QUIC CRYPTO_ERROR
error (0x178).
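The 0x178 value follows from the conversion defined in RFC 9001 4.8, where the
TLS alert description is added to 0x0100. The sketch below shows the
arithmetic; the macro and function names are hypothetical, while the alert
value 120 is the one assigned to no_application_protocol.

```c
#include <assert.h>
#include <stdint.h>

/* Base of the QUIC error code range reserved for CRYPTO_ERROR (RFC 9001). */
#define EX_QUIC_CRYPTO_ERROR_BASE 0x0100u

/* TLS alert description for no_application_protocol. */
#define EX_TLS_ALERT_NO_APPLICATION_PROTOCOL 120u

/* Converts a TLS alert description into a QUIC CRYPTO_ERROR code. */
static uint64_t ex_quic_err_from_tls_alert(unsigned int alert)
{
	/* A TLS alert description is a single byte. */
	assert(alert <= 0xff);
	return EX_QUIC_CRYPTO_ERROR_BASE + alert;
}
```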
Remove the verbosity set to 0 on quic_init_stdout_traces. This will
generate even more verbose traces on stdout with the default verbosity
of 1 when compiling with -DENABLE_QUIC_STDOUT_TRACES.
Implement a function quic_init_stdout_traces called at STG_INIT. If
ENABLE_QUIC_STDOUT_TRACES preprocessor define is set, the QUIC trace
module will be automatically activated to emit traces on stdout on the
developer level.
The main purpose for now is to be able to generate traces on the haproxy
docker image used for QUIC interop testing suite. This should facilitate
test failure analysis.
Change the way the CIDs are organized to attach received packets DCID
to QUIC connection. This is necessary to be able to handle multiple DCID
to one connection.
For this, the quic_connection_id structure has been extended. When
allocated, they are inserted in the receiver CID tree instead of the
quic_conn directly. When receiving a packet, the receiver tree is
inspected to retrieve the quic_connection_id. The quic_connection_id
now contains a reference to the QUIC connection.
The comment is here to warn about a possible thread concurrency issue
when treating INITIAL packets from the same client. The macro unlikely
is added to further highlight this rare occurrence.
It is valid for a QUIC packet to contain a PADDING frame followed by
one or several other frames.
quic_parse_padding_frame() does not require change as it properly detects
the end of the frame with the first non-null byte.
This allows using the quic-go implementation, which uses a PADDING-CRYPTO as
the first handshake packet.
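Detecting the end of a run of PADDING frames with the first non-null byte can
be sketched as below. The skip_padding() helper is a hypothetical name and is
not the actual quic_parse_padding_frame() code; it relies on the PADDING frame
type being 0x00.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of leading PADDING bytes (0x00) found in <buf>. */
static size_t skip_padding(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	assert(buf != NULL || len == 0);

	/* The padding ends at the first non-null byte or at the end of data. */
	while (i < len && buf[i] == 0x00)
		i++;
	return i;
}
```

The byte found right after the padding is then parsed as the type of the next
frame, for example 0x06 for a CRYPTO frame.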