The aim of the idle timeout is to silently close the connection after a period
of inactivity depending on the "max_idle_timeout" transport parameters advertised
by the endpoints. We add a new task to implement this timer. Its expiry is
updated each time we receive an ack-eliciting packet, and each time we send
an ack-eliciting packet if no other such packet was sent since we received
the last ack-eliciting packet. Such conditions may be implemented thanks
to the new QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ flag.
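As a rough illustration of the restart rule described above, here is a minimal
sketch. Only the QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ flag name comes
from this patch. The struct, the flag value, the function names and the
millisecond clock are assumptions made for the example and do not reflect the
actual haproxy code.

```c
#include <stdint.h>

/* Flag name taken from the text above; the value is arbitrary here. */
#define QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ (1U << 0)

/* Hypothetical, reduced connection state. */
struct idle_conn {
	unsigned int flags;
	uint64_t max_idle_timeout_ms; /* negotiated idle timeout */
	uint64_t idle_expiry_ms;      /* absolute expiry date of the timer */
};

void idle_timer_rearm(struct idle_conn *c, uint64_t now_ms)
{
	c->idle_expiry_ms = now_ms + c->max_idle_timeout_ms;
}

/* Receiving an ack-eliciting packet always restarts the timer. */
void idle_on_ack_eliciting_rx(struct idle_conn *c, uint64_t now_ms)
{
	idle_timer_rearm(c, now_ms);
	c->flags &= ~QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ;
}

/* Sending an ack-eliciting packet restarts the timer only if it is the
 * first one sent since the last ack-eliciting packet was received.
 */
void idle_on_ack_eliciting_tx(struct idle_conn *c, uint64_t now_ms)
{
	if (c->flags & QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ)
		return;

	idle_timer_rearm(c, now_ms);
	c->flags |= QUIC_FL_CONN_IDLE_TIMER_RESTARTED_AFTER_READ;
}
```

With this rule, a peer which never answers cannot keep the connection alive
just because we keep sending to it.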
There is no need to use such a reference counter anymore since the QUIC
connections are always handled by the same thread.
quic_conn_drop() is removed. Its code is merged into quic_conn_release().
Change the return value to success in qc_handle_bidi_strm_frm for two
specific cases:
* if the STREAM frame carries an already received offset
* if application decoding failed
This ensures that the packet is not dropped and properly acknowledged.
Prior to this fix, the return code was set to error, which prevented
the ACK from being generated.
The impact of the bug might be noticeable in environments with packet
loss and retransmission. Due to haproxy not generating ACKs for packets
containing STREAM frames with already received offsets, the client will
probably retransmit them, which will worsen the network
transmission.
Since the persistent congestion detection is done out of the congestion
controllers, there is no need to pass them information through quic_cc_event struct.
We remove its useless members. Also remove qc_cc_loss_event() which is no longer used.
We establish the persistent congestion out of any congestion controller
to improve the genericity of the algorithms. This path characteristic detection may
be implemented regardless of the underlying congestion control algorithm.
Send congestion (loss) event using directly quic_cc_event(), so without
qc_cc_loss_event() wrapper function around quic_cc_event().
Take the opportunity of this patch to shorten "newest_time_sent" member field
of quic_cc_event to "time_sent".
QUIC connection path in flight bytes is a variable which should not be manipulated
by the congestion controller. The aim of the latter is to compute the congestion window.
So, we pass it as few parameters as possible to do so.
Since QUIC accept handling has been improved, the MUX is initialized
after the handshake completion. Thus it's safe to access transport
parameters in qc_init via the quic_conn.
Remove quic_mux_transport_params_update which was called by the
transport for the MUX. This improves the architecture by removing a
direct call from the transport to the MUX.
The deleted function body is not transferred to qc_init because this part
will change heavily in the near future when implementing the
flow-control.
We want to be able to build ack-eliciting frames to be embedded into QUIC packets
from a prebuilt list of ack-eliciting frames. This will be helpful for the mux
which would like to send STREAM frames asap after having built its own prebuilt
list.
To do so, we only add a parameter as struct list to this function to handle
such a prebuilt list.
We want to be able to send ack-eliciting packets from a list of ack-eliciting
frames. So, this patch adds such a parameter to the function responsible for
building 1RTT packets. The entry point function is qc_send_app_pkts() which
is used with the underlying packet number space TX frame list as parameter.
We want to get rid of the code used during the handshake step. The aim of
qc_prep_app_pkts() is to build short packets which are also datagrams.
Make quic_conn_app_io_cb() call this new function to prepare short packets.
As reported by Tim in issue #1428, our sources are clean, there are
just a few files with a few rare non-ASCII chars for the paragraph
symbol, a few typos, or in Fred's name. Given that Fred already uses
the unaccented form at other places like on the public list,
let's make all this uniform and make sure the code displays equally
everywhere.
A segfault happens when receiving a CONNECTION_CLOSE during handshake.
This is because the mux is not initialized at this stage but the
transport layer dereferences it.
Fix this by ensuring that the MUX is initialized before. Thanks to Willy
for his help on this one. Welcome to the QUIC-men team!
Do not distinguish the direction (TX/RX) when setting TLS secrets flags.
There is no such distinction in RFC 9001.
Assemble them at the same level: at the upper context level.
This is required since this previous commit:
"MINOR: quic: Post handshake I/O callback switching"
If not, such packets remain endlessly in the RX buffer and cannot be parsed
by the new I/O callback used after the handshake has been confirmed.
Wake up the timer task asap when setting its timer in the past.
Also take the opportunity of this patch to simplify quic_pto_pktns():
calling tick_first() is useless here to compare <lpto> with <tmp_pto>.
Reorganize the Rx path for STREAM frames on bidirectional streams. A new
function qcc_recv is implemented on the MUX. It will handle the STREAM
frames copy and offset calculation from transport to MUX.
Another function named qcc_decode_qcs from the MUX can be called by
transport each time new STREAM data has been copied.
The architecture is now cleaner with the MUX layer in charge of parsing
the STREAM frames offsets. This is required to be able to implement the
flow-control on the MUX layer.
Note that as a convenience, a STREAM frame is not partially copied to
the MUX buffer. This simplifies the implementation for the moment but it
may change in the future to optimize the STREAM frames handling.
For the moment, only bidirectional streams benefit from this change. In
the future, it may be extended to unidirectional streams to unify the
STREAM frames processing.
FIN flag on a STREAM frame was not detected if the frame was previously
buffered on qcs.rx.frms before being handled.
To fix this, copy the fin field from the quic_stream instance to
quic_rx_strm_frm. This is required to properly notify the FIN flag on
qc_treat_rx_strm_frms for the MUX layer.
Without this fix, the request channel might be left open after the
last STREAM frame reception if there are out-of-order frames on the Rx
path.
This flag is set when the STREAM frame with FIN set has been received on
a qcs instance. For now, this is only used as a BUG_ON guard to protect
against multiple frames with FIN set. It will also be useful when
reorganizing the RX path and moving some of its code into the mux.
Adjust the function to handle buffered STREAM frames. If the offset of
the frame was already fully received, discard the frame. If only
partially received, compute the difference and copy only the newly
received data.
Before this change, a buffered frame representing a fully or partially
received offset caused the loop to be interrupted. The frame was
preserved, thus preventing frames with greater offsets from being handled.
This may fix some occurrences of stalled transfers on the request channel
if there are out-of-order STREAM frames on the Rx path.
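The three cases above (gap before the frame, frame fully received, frame
partially received) can be sketched as follows. This is only an illustration:
the struct, the enum and the function name are hypothetical, and empty
FIN-only frames are deliberately left out of scope.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a buffered STREAM frame. */
struct rx_frm {
	uint64_t offset; /* stream offset of the first byte */
	size_t len;      /* number of payload bytes */
};

enum rx_frm_action {
	RX_FRM_WAIT,    /* gap before the frame: keep it buffered */
	RX_FRM_DISCARD, /* everything was already received: drop it */
	RX_FRM_COPY,    /* copy the payload after skipping <*skip> bytes */
};

/* <rx_offset> is the number of contiguous stream bytes already received.
 * The frame itself is left untouched: the caller copies <len - *skip>
 * bytes starting <*skip> bytes into the payload.
 */
enum rx_frm_action rx_frm_check(const struct rx_frm *frm, uint64_t rx_offset,
                                size_t *skip)
{
	*skip = 0;

	if (frm->offset > rx_offset)
		return RX_FRM_WAIT;

	if (frm->offset + frm->len <= rx_offset)
		return RX_FRM_DISCARD;

	/* Partially received: only the tail of the frame is new. */
	*skip = (size_t)(rx_offset - frm->offset);
	return RX_FRM_COPY;
}
```

For instance, a 50-byte frame at offset 100 checked against a receive offset
of 120 yields a copy of the last 30 bytes, and the loop can then move on to
the next buffered frame instead of stopping.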
qc_strm_cpy can be simplified by simply using b_putblk which already
handles wrapping of the destination buffer. The function is kept to
update the frame length and offset fields.
The quic_frame instance containing the quic_stream must be freed when
the corresponding ACK has been received. However when implementing this
on qcs_try_to_consume, some data transfers are interrupted and cannot
complete (DC test from interop test suite).
The sending buffer of each stream is cleared when processing ACKs
corresponding to STREAM emitted frames. If the buffer is empty, free it
and offer it as with other dynamic buffers usage.
This should reduce memory consumption, as previously an open stream
monopolized a buffer during its whole lifetime even if there was no more
data to transmit.
Simplify the data manipulation of STREAM frames on TX. Only stream data
and len field are used to generate a valid STREAM frame from the
buffer. Do not use the offset field, which required that a single buffer
instance be shared by all frames on a single stream.
Adjust the handling of ACK for STREAM frames. When receiving an ACK, the
corresponding frames from the acknowledged packet are retrieved. If a
frame is of type STREAM, we compare the frame STREAM offset with the
last offset known of the qcs instance.
The comparison was incomplete as it did not treat an acked offset smaller
than the known offset. Previously, the acked frame was incorrectly
buffered in the qcs.tx.acked_frms. On reception of future ACKs, when
trying to process the buffered acks via qcs_try_to_consume, the loop was
interrupted on the smallest offset different from the qcs known offset:
in this case it was the previous smaller range. This is a real bug
as it prevents all buffered ACKs from being processed, eventually filling the
qcs sending buffer and causing the transfer to stall.
Fix this by properly handling smaller acked offsets. First check
if the offset plus length is greater than the qcs offset and mark as
acknowledged the difference on the qcs. If not, the frame is not
buffered and simply ignored.
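A minimal sketch of the resulting decision is given below. The function and
enum names are hypothetical, and the acknowledged offset is reduced to a
single integer instead of the real qcs fields.

```c
#include <stdint.h>

enum ack_action {
	ACK_BUFFERED, /* gap before the frame: keep it for a later ACK */
	ACK_IGNORED,  /* range already acknowledged: nothing to do */
	ACK_CONSUMED, /* acknowledged offset advanced */
};

/* <ack_offset> is the first stream offset not acknowledged yet. */
enum ack_action stream_ack(uint64_t *ack_offset, uint64_t frm_offset,
                           uint64_t frm_len)
{
	if (frm_offset > *ack_offset)
		return ACK_BUFFERED;

	/* Smaller or equal offset: only the part beyond <ack_offset>, if
	 * any, is newly acknowledged.
	 */
	if (frm_offset + frm_len <= *ack_offset)
		return ACK_IGNORED;

	*ack_offset = frm_offset + frm_len;
	return ACK_CONSUMED;
}
```

The important point is the ACK_IGNORED case: such a frame never reaches the
list of buffered acks, so it cannot block the frames queued behind it.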
The recent change was not complete:
d1c76f24fd
MINOR: quic: do not modify offset node if quic_rx_strm_frm in tree
The frame length and data pointer should be incremented after the data
copy. A BUG_ON statement has been added to detect an incorrect decrement
operation.
qc_rx_strm_frm_cpy is unsafe because it updates the offset field of the
frame. This is not safe as the frame is inserted in the tree when
calling this function and offset serves as the key node.
To fix this, the API is modified so that qc_rx_strm_frm_cpy does not
update the frame parameter. The caller is responsible for updating
offset/length in case of a partial copy.
The impact of this bug is not known. It can only happen with STREAM
frames received out of order. This might be triggered with large h3 POST
requests.
Remove this server specific code section. It is useless and not tested. Furthermore
this is really not the right place to retrieve the peer transport parameters.
If the last frame is not entirely copied and must be buffered, FIN
must not be signaled to the upper layer.
This might fix a rare bug which could cause the request channel to be
closed too early leading to an incomplete request.
If a CONNECTION_CLOSE is received during handshake or after mux release,
a segfault happens due to an invalid dereference of qc->qcc. Check
mux_state first to prevent this.
Move the QUIC datagram handlers outside of the receivers. Use a global
handler per-thread which is allocated on post-config. Implement a free
function on process deinit to avoid a memory leak.
This should fix Coverity CID 375047 in GH #1536 where <buf_area> could leak because
it was not always freed by quic_conn_drop(), especially when not stored in <qc> variable.
Rename quic_conn_to_buf to qc_snd_buf and remove it from xprt ops. This
is done to reflect the true usage of this function which is only a
wrapper around sendto but cannot be called by the upper layer.
qc_snd_buf is moved in quic-sock to mark its link with
quic_sock_fd_iocb which is the recvfrom counterpart.
Rename a local variable tid to cid_tid. This ensures there is no
confusion with the global tid. It is now more explicit that we are
manipulating QUIC datagram handlers from another thread in
quic_lstnr_dgram_dispatch.
In fact the xprt_ctx of the connection is first stored into quic_conn
struct as soon as it is initialized from qc_conn_alloc_ssl_ctx().
As quic_conn_init_timer() is run after this function, we can associate
the timer context of the timer to the one from the quic_conn struct.
We must move this initialization from xprt_start() callback, which
comes too late (after handshake completion for 1RTT session). This timer must be
usable as soon as we have packets to send/receive. Let's initialize it after
the TLS context is initialized in qc_conn_alloc_ssl_ctx(). This latter function
initializes I/O handler task (quic_conn_io_cb) to send/receive packets.
Do not use an extra DCID parameter on new_quic_cid to be able to
associate a newly generated CID to a thread ID. Simply do the computation
inside the function. The API is cleaner this way.
This also has the effect of improving the apparent randomness of CIDs.
With the previous version the first byte of all CIDs was identical for a
connection, which could lead to a privacy issue. This version may not be
totally perfect on this aspect but it improves the situation.
The CID trees are no longer attached to the listener receiver but to the
underlying datagram handlers (one per thread) which always run on the same thread.
So, operations on these trees do not require any locking.
We copy the first octet of the original destination connection ID to any CID for
the connection calling new_quic_cid(). So this patch modifies only this function
to take a dcid as passed parameter.
Rename quic_lstnr_dgram_read() to quic_lstnr_dgram_dispatch() to reflect its new role.
After calling the latter, the sock i/o handler must consume the buffer only if
the datagram it received is detected as wrong by quic_lstnr_dgram_dispatch().
The datagram handler task marks the datagram as consumed by atomically setting ->buf
to NULL value. The sock i/o handler is responsible for flushing its RX buffer
before using it. It also keeps a datagram among the consumed ones so as
to pass it to quic_lstnr_dgram_dispatch() and prevent it from allocating a new one.
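The "consumed" marker can be pictured with the sketch below. It uses C11
atomics as a stand-in for the haproxy atomic macros, and the struct and
function names are hypothetical. It only shows how a NULL ->buf lets the sock
i/o handler find out how many leading datagrams may be flushed.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical datagram descriptor shared by both sides. */
struct dgram_sketch {
	_Atomic(unsigned char *) buf; /* NULL once the handler is done with it */
	size_t len;
};

/* Datagram handler side: mark the datagram as consumed. */
void dgram_sketch_consume(struct dgram_sketch *dg)
{
	atomic_store(&dg->buf, (unsigned char *)NULL);
}

/* Sock i/o handler side: count the leading consumed datagrams among the
 * <nb> entries of <dgrams>, i.e. those whose RX buffer room may be
 * reclaimed before reading again.
 */
size_t dgram_sketch_flushable(struct dgram_sketch *dgrams, size_t nb)
{
	size_t i;

	for (i = 0; i < nb; i++) {
		if (atomic_load(&dgrams[i].buf) != NULL)
			break;
	}
	return i;
}
```

Only the leading run is counted because the RX buffer is consumed in order:
a consumed datagram located behind a pending one cannot be released yet.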
quic_dgram_read() parses all the QUIC packets from a UDP datagram. It is the best
candidate to be converted into a task, because its processing data unit is the UDP
datagram received by the QUIC sock i/o handler. If correct, this datagram is
added to the context of a task, quic_lstnr_dghdlr(), a conversion of quic_dgram_read()
into a task. This task pops a datagram from an mt_list and passes it to
the packet handler (quic_lstnr_pkt_rcv()).
Modify the quic_dgram struct to play the role of the old quic_dgram_ctx struct when
passed to quic_lstnr_pkt_rcv().
Modify the datagram handlers allocation to set their tasks to quic_lstnr_dghdlr().
Add quic_dgram new structure to store information about datagrams received
by the sock I/O handler (quic_sock_fd_iocb) and its associated pool.
Implement quic_get_dgram_dcid() to retrieve the datagram DCID which must
be the same for all the packets in the datagram.
Modify quic_lstnr_dgram_read() called by the sock I/O handler to allocate
a quic_dgram each time a correct datagram is found and add it to the sock I/O
handler rxbuf dgram list.
This function is no longer used, is broken and uses code shared with the
listener packet parser. It is becoming annoying to continue to modify
it without testing each time we modify the code it shares with the
listener packet parser.
This is to be sure xprt functions do not manipulate the buffer struct
passed as parameter to quic_lstnr_dgram_read() from low level datagram
I/O callback in quic_sock.c (quic_sock_fd_iocb()).
Mention that the token is sent only by servers in both server and listener
packet parsers.
Remove a "TO DO" section in listener packet parser because there is nothing
more to do in this function about the token.
This quic_dgram_ctx struct member is used to denote if we are parsing a new
datagram (null value), or a coalesced packet into the current datagram (non null
value). But it was never set.
Do not proceed to direct accept when creating a new quic_conn. Wait for
the QUIC handshake to succeed to insert the quic_conn in the accept
queue. A tasklet is then woken up to call listener_accept to accept the
quic_conn.
The most important effect is that the connection/mux layers are not
instantiated at the same time as the quic_conn. This forces us to delay
some processing to be sure that the mux is allocated:
* initialization of mux transport parameters
* installation of the app-ops
Also, the mux instance is not checked now to wake up the quic_conn
tasklet. This is safe because the xprt-quic code is now ready to handle
the absence of the connection/mux layers.
Note that this commit has a deep impact as it changes significantly the
lower QUIC architecture. Most notably, it breaks the 0-RTT feature.
The connection is allocated after finishing the QUIC handshake. Remove
handshake/L6 flags when initializing the connection as handshake is
finished with success at this stage.
Remove usage of connection in quic_conn_from_buf. As connection and
quic_conn are decorrelated, it is not logical to check connection flags
when using sendto.
This requires storing the L4 peer address in quic_conn to be able to use
sendto.
This change is required to delay allocation of connection.
Add a new function in mux-quic to install app-ops. For now this
function is called during the ALPN negotiation of the QUIC handshake.
This change will be useful when the connection accept queue is
implemented. It will then be required to delay the app-ops
initialization because the mux won't be allocated anymore during the
QUIC handshake.
Define a new enum to represent the status of the mux/connection layer
above a quic_conn. This is important to know if it's possible to handle
application data, or if it should be buffered or dropped.
Adjust the function to check if header protection can be removed. It can
now be used both for a single packet in qc_lstnr_pkt_rcv and in the
quic_conn handler to handle buffered packets for a specific encryption
level.
Extract the allocation of ssl_sock_ctx from qc_conn_init to a dedicated
function qc_conn_alloc_ssl_ctx. This function is called just after
allocating a new quic_conn, without waiting for the initialization of
the connection. It allocates the ssl_sock_ctx and the quic_conn tasklet.
This change is now possible because the SSL callbacks are dealing with a
quic_conn instance.
This change is required to be able to delay the connection allocation
and handle handshake packets without it.
Allow to register quic_conn as ex-data in SSL callbacks. A new index is
used to identify it as ssl_qc_app_data_index.
Replace connection by quic_conn as SSL ex-data when initializing the QUIC
SSL session. When using SSL callbacks in QUIC context, the connection is
now NULL. Use quic_conn instead to retrieve the required parameters.
Also clean up
The same changes are conducted inside the QUIC SSL methods of xprt-quic:
connection instance usage is replaced by quic_conn.
Some functions of xprt-quic were still using connection instead of
quic_conn. This must be removed as the two are decorrelated: a
quic_conn can exist without a connection.
It is possible that the listener is in INITIAL state, but has to probe
with Handshake packets. In this case, when entering qc_prep_pkts() there
is nothing to do. We must select the next packet number space (or encryption
level) to be able to probe with such packet type.
Remove the unsafe call to tasklet_free in quic_close. At this stage the
tasklet may already be scheduled by another thread even if the
quic_conn refcount is now null. It will probably cause a crash on the
next tasklet processing.
Use tasklet_kill instead to ensure that the tasklet is freed in a
thread-safe way. Note that quic_conn_io_cb is not protected by the
refcount so only the quic_conn pinned thread must kill the tasklet.
Slightly adjust the refcount decrement code on quic_conn close. A new
function named quic_conn_release is implemented. This function is
responsible for removing the quic_conn from CIDs trees and decrementing the
refcount to free the quic_conn once all threads have finished working
with it.
For now, quic_close is responsible for calling it so the quic_conn is
scheduled to be freed by upper layers. In the future, it may be useful to
delay it to be able to send remaining data or wait for missing ACKs
for example.
This simplifies quic_conn_drop which does not require the lock anymore.
Also, this can help to free the connection more quickly in some cases.
quic_conn_drop decrements the refcount and may free the quic_conn if
reaching 0. The quic_conn should not be dereferenced again after it in
any case even for traces.
Again, we fix a remnant of the way we probed before probing by packet.
When we were probing by datagram we inspected <prv_pkt> to know if we were
coalescing several packets. There is no need to do that at all when probing by packet.
Furthermore this could lead to blocking situations where we want to probe but
are limited by the congestion control (<cwnd> path variable). This must not be
the case. When probing we must do it regardless of the congestion control.
If a client resends Initial CRYPTO data, this is because it did not receive all
the server Initial CRYPTO data. With this patch we prepare a fast retransmission
without waiting for the PTO timer expiration, sending old Initial CRYPTO data,
coalescing them with Handshake CRYPTO if present in the same datagram. Furthermore
we also send a datagram made of previously sent Handshake CRYPTO data if any.
When probing, we must not take into account the congestion control window.
This was not completely correctly implemented: qc_build_frms() could fail
because of this limit when comparing the head of the packet against the
congestion control window. With this patch we make it fail only when
we are not probing.
This is to avoid too many PTO timer expirations for 01RTT and Handshake packet
number spaces. Furthermore we are not limited by the anti-amplification for 01RTT
packet number space. According to the RFC we can send up to two packets.
This modification should have come with this commit:
"MINOR: quic: Remove nb_pto_dgrams quic_conn struct member"
where the nb_pto_dgrams quic_conn struct member was removed.
When building packets to send, we build frames computing their sizes
to have more chance to be added to new packets. There are rare cases
where this packet could not be built because of the congestion control
which may for instance prevent us from building a packet with padding
(retransmitted Initial packets). In such a case, the pre-built frames
were lost because they were added to the packet frame list but not moved back
to the packet number space they came from.
With this patch we add the frames to the packet only if it could be built
and move them back to the packet number space if not.
There is no need to use an MT_LIST to store frames to send from a packet
number space. This is a remnant of multi-threading support for the TX part.
If we wake up the I/O handler before the mux is started, it is possible
it has enough time to parse the ClientHello TLS message and update the
mux transport parameters, leading to a crash.
So, we initialize ->qcc quic_conn struct member at the very last time,
when the mux is fully initialized. The condition to wake up the I/O handler
from lstnr_rcv_pkt() is: xprt context and mux both initialized.
Note that if the xprt context is initialized, it implies its tasklet is
initialized. So, we do not check anymore this latter condition.
Free the ssl_sock_ctx tasklet in quic_close() instead of
quic_conn_drop(). This ensures that the tasklet is destroyed safely by
the same thread.
This has no impact as the free operation was previously conducted with
care and should not be responsible for any crash.
Implement the emission of Retry packets. These packets are emitted in
response to Initial from clients without token. The token from the Retry
packet contains the ODCID from the Initial packet.
By default, Retry packet emission is disabled and the handshake can
continue without address validation. To enable Retry, a new bind option
has been defined named "quic-force-retry". If set, the handshake must be
conducted only after receiving a token in the Initial packet.
Implement the parsing of token from Initial packets. It is expected that
the token contains a CID which is the DCID from the Initial packet
received from the client without token which triggers a Retry packet.
This CID is then used for transport parameters.
Note that at the moment Retry packet emission is not implemented. This
will be achieved in a following commit.
It is expected that quic_dgram_read() returns the total number of bytes
read. Fix the return value when the read has been successful. This bug
has no impact as in the end the return value is not checked by the
caller.
->conn quic_conn struct member is a connection struct object which may be
released from several places. With this patch we do our best to stop dereferencing
this member as much as we can.
This commit was not correct:
"MINOR: quic: Only one CRYPTO frame by encryption level"
Indeed, when receiving CRYPTO data from TLS stack for a packet number space,
there are rare cases where there are already other frames than CRYPTO data frames
in the packet number space, especially for 01RTT packet number space. This
happens very often with quant as client.
In fact we must look for the first packet with some ack-eliciting frame
in the packet number space tree to retransmit from. Obviously there
may already be retransmitted packets which are not deemed as lost and
still present in the packet number space tree for TX packets.
When receiving CRYPTO data from the TLS stack, concatenate the CRYPTO data
to the first allocated CRYPTO frame if present. This reduces by one the number
of handshake packets built for a connection with a standard size certificate.
When blocked by the anti-amplification limit, it is the responsibility of the
client to unblock it by sending new datagrams. On the server side, even if not
well parsed, such datagrams must trigger the PTO timer arming.
Switch back to QUIC_HS_ST_SERVER_HANDSHAKE state after a completed handshake
if acks must be sent.
Also ensure we build post handshake frames only one time without using prev_st
variable and ensure we discard the Handshake packet number space only one time.
We need to be able to decrypt late Handshake packets after the TLS secret
keys have been discarded. If not, the peer keeps sending Handshake packets which have
not been acknowledged. But for such packets, we discard the CRYPTO data.
According to RFC 9002 par. 6.2.3., when receiving duplicate Initial CRYPTO
data, a server may send a packet containing unacknowledged CRYPTO data before the PTO
expiry.
These tests were there to initiate PTO probing but they are not correct.
Furthermore they may break the PTO probing process and lead to useless packet
building.
RFC 9002 5.3. Estimating smoothed_rtt and rttvar:
MUST use the lesser of the acknowledgment delay and the peer's max_ack_delay
after the handshake is confirmed.
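The rule quoted above, together with the RTT adjustment of the same RFC
section, can be sketched as follows. The function names and the choice of
microseconds are assumptions made for the example. They are not taken from the
haproxy sources.

```c
#include <stdint.h>

/* Return the acknowledgment delay to use for an RTT sample: once the
 * handshake is confirmed, the delay reported by the peer is capped to
 * its max_ack_delay transport parameter.
 */
uint64_t quic_effective_ack_delay(uint64_t ack_delay_us,
                                  uint64_t max_ack_delay_us,
                                  int hs_confirmed)
{
	if (hs_confirmed && ack_delay_us > max_ack_delay_us)
		return max_ack_delay_us;
	return ack_delay_us;
}

/* RFC 9002 5.3: the delay is subtracted from the latest RTT sample only
 * if the result does not go below the minimum RTT.
 */
uint64_t quic_adjusted_rtt(uint64_t latest_rtt_us, uint64_t min_rtt_us,
                           uint64_t ack_delay_us)
{
	if (latest_rtt_us >= min_rtt_us + ack_delay_us)
		return latest_rtt_us - ack_delay_us;
	return latest_rtt_us;
}
```

Capping the delay prevents a peer from inflating its reported ack delay to
make our RTT estimate look smaller than it really is.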
Properly initialize the ssl_sock_ctx pointer in qc_conn_init. This is
required to avoid setting an undefined pointer in qc.xprt_ctx if argument
*xprt_ctx is NULL.
Implement a refcount on quic_conn instance. By default, the refcount is
0. Two functions are implemented to manipulate it.
* qc_conn_take() which increments the refcount
* qc_conn_drop() which decrements it. If the refcount is 0 *BEFORE*
the subtraction, the instance is freed.
The refcount is incremented on retrieve_qc_conn_from_cid() or when
allocating a new quic_conn in qc_lstnr_pkt_rcv(). It is decremented most
notably by the xprt.close operation and at the end of
qc_lstnr_pkt_rcv(). The increments/decrements should be conducted under
the CID lock to guarantee thread-safety.
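The "freed if 0 before the subtraction" convention is illustrated by the
single-threaded sketch below. The struct and function names are hypothetical
and no locking is shown, whereas the real code is expected to run these
operations under the CID lock.

```c
#include <stdlib.h>

/* Hypothetical, reduced quic_conn: only the refcount is kept. */
struct qc_sketch {
	int refcount; /* 0 by default: the creator holds an implicit reference */
};

void qc_sketch_take(struct qc_sketch *qc)
{
	qc->refcount++;
}

/* Drop a reference. If the refcount is already 0 before the subtraction,
 * this was the last user and the instance is freed. Returns 1 if <qc> was
 * freed, in which case it must not be dereferenced anymore, 0 otherwise.
 */
int qc_sketch_drop(struct qc_sketch *qc)
{
	if (qc->refcount == 0) {
		free(qc);
		return 1;
	}

	qc->refcount--;
	return 0;
}
```

Starting at 0 means that an instance which is never taken is freed by its
first drop, and that N takes must be balanced by N + 1 drops.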
The timer task is attached to the connection-pinned thread. Only this
thread can delete it. With the future refcount implementation of
quic_conn, every thread can be responsible to remove the quic_conn via
quic_conn_free(). Thus, the timer task deletion is moved from the
calling function quic_close().
Big refactoring on xprt-quic. A lot of functions were using the
ssl_sock_ctx as argument to only access the related quic_conn. All these
arguments are replaced by a quic_conn parameter.
As a convention, the quic_conn instance is always the first parameter of
these functions.
This commit is part of the rearchitecture of xprt-quic layers and the
separation between xprt and connection instances.
Remove the shortcut to use the INITIAL encryption level when removing
header protection on first connection packet.
This change is useful for the following change which removes
ssl_sock_ctx in argument lists in favor of the quic_conn instance.
Add a pointer in quic_conn to its related ssl_sock_ctx. This change is
required to avoid using the connection instance to access it.
This commit is part of the rearchitecture of xprt-quic layers and the
separation between xprt and connection instances. It will be notably
useful when the connection allocation is delayed.
free_quic_conn_cids() was called in quic_build_post_handshake_frames()
if an error occurred. However, the only error is an allocation failure of
the CID which does not require calling it.
This change is required for future refcount implementation. The CID lock
will be moved from free_quic_conn_cids() to the caller.
When a quic_conn is found in the DCID tree, it can be removed from the
first ODCID tree. However, this operation must absolutely be run under a
write-lock to avoid race conditions. To avoid using the lock too
frequently, node.leaf_p is checked. This value is set to NULL after
ebmb_delete.
Add traces about important frame types to chunk_tx_frm_appendf()
and call this function for any type of frame when parsing a packet.
Move it to quic_frame.c.
This is the same treatment for bidi and uni STREAM frames. This is duplicated
code which should be removed by building a single function for both these types of streams.
The connection instance has been replaced by a quic_conn as first
argument to QUIC traces. It is possible to report the quic_conn instance
in the qc_new_conn(), contrary to the connection which is not
initialized at this stage.
Replace the connection instance for first argument of trace callback by
a quic_conn instance. The QUIC trace module is properly initialized with
the first argument referring to a quic_conn.
Replace every connection instance in TRACE_* macro invocations in
xprt-quic by its related quic_conn. In some cases, the connection is
still used to access the quic_conn. It may cause some problems in the
future when the connection is completely separated from the xprt
layer.
This commit is part of the rearchitecture of xprt-quic layers and the
separation between xprt and connection instances.
Add const qualifier on arguments of several dump functions used in the
trace callback. This is required to be able to replace the first trace
argument by a quic_conn instance. The first argument is a const pointer
and so the members accessed through it must also be const.
Add a new member in ssl_sock_ctx structure to reference the quic_conn
instance if used in the QUIC stack. This member is initialized during
qc_conn_init().
This is needed to be able to access the quic_conn without relying on
the connection instance. This commit is part of the rearchitecture of
xprt-quic layers and the separation between xprt and connection
instances.
Move qcc_get_qcs() function from xprt_quic.c to mux_quic.c. This
function is used to retrieve the qcs instance from a qcc with a stream
id. This clearly belongs to the mux-quic layer.
Use the convention of naming quic_conn instance as qc to not confuse it
with a connection instance. The changes occurred for qc_parse_pkt_frms(),
qc_build_frms() and qc_do_build_pkt().
The QUIC connection I/O handler qc_conn_io_cb() could be called just after
qc_pkt_insert() has inserted a packet in its tree, and before qc_pkt_insert()
has incremented the reference counter to this packet. As qc_conn_io_cb()
decrements this counter, the packet could be released before qc_pkt_insert()
increments the counter, leading to possible crashes when trying to do so.
So, let's make qc_pkt_insert() increment this counter before inserting the packet
in its tree. No need to lock anything for that.
Add a function to process all STREAM frames received and ordered
by their offset (qc_treat_rx_strm_frms()) and modify
qc_handle_bidi_strm_frm() consequently.
With the DCID refactoring, the locking is more centralized. It is
possible to simplify the code for removal of a quic_conn from the ODCID
tree.
This operation can be conducted as soon as the connection has been
retrieved from the DCID tree, meaning that the peer now uses the final
DCID. Remove the bit to flag a connection for removal and just use
ebmb_delete() on each successful lookup on the DCID tree. If the
quic_conn has already been removed, it is just a noop thanks to
the eb_delete() implementation.
A new function named qc_retrieve_conn_from_cid() now contains all the
code to retrieve a connection from a DCID. It handles all types of packets
and centralizes the locking on the ODCID/DCID trees.
This simplifies the qc_lstnr_pkt_rcv() function.
If a UDP datagram contains multiple QUIC packets, they must all use the
same DCID. The datagram context is used partly for this.
To ensure this, a comparison was made on the dcid_node of DCID tree. As
this is a comparison based on pointer address, it can be faulty when
nodes are removed and re-added at the same address.
Replace this comparison by a proper comparison on the DCID data itself.
To this end, the dgram_ctx structure now contains a quic_cid member.
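A minimal sketch of such a comparison on the DCID data is given below. The
struct cid type and cid_equal() are hypothetical names which only loosely
follow quic_cid.

```c
#include <string.h>

#define CID_MAXLEN 20 /* maximum connection ID length in QUIC version 1 */

/* Hypothetical connection ID holder, loosely modelled on quic_cid. */
struct cid {
    unsigned char data[CID_MAXLEN];
    unsigned char len;
};

/* Return 1 if both CIDs carry the same bytes, 0 otherwise. The result
 * does not depend on where the two objects are stored. */
static int cid_equal(const struct cid *a, const struct cid *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```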
For first Initial packets, the socket source dest address is
concatenated to the DCID. This is used to be able to differentiate
possible collisions between several clients which use the same ODCID.
Refactor the code to manage DCID and the concatenation with the address.
Before this, the concatenation was done on the quic_cid struct and its
<len> field incremented. In the code it is difficult to differentiate a
normal DCID from a DCID + address concatenated.
A new field <addrlen> has been added in the quic_cid struct. The <len>
field now only contains the size of the QUIC DCID. The <addrlen> field is
first initialized to 0. If the address is concatenated, it will be
updated with the size of the concatenated address. This now means we
have to explicitly use either cid.len or cid.len + cid.addrlen to
access the DCID or the DCID + the address. The code should be clearer
thanks to this.
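The <len>/<addrlen> split may be pictured with the sketch below. The struct
layout, the ADDR_MAXLEN value and the two helpers are assumptions made for the
illustration, they are not the haproxy definitions.

```c
#include <string.h>

#define CID_MAXLEN  20 /* maximum connection ID length in QUIC version 1 */
#define ADDR_MAXLEN 18 /* assumption: room for an IPv6 address and a port */

/* Hypothetical layout: <len> covers the DCID only, <addrlen> the address
 * bytes appended right after it (0 when nothing is appended). */
struct cid_with_addr {
    unsigned char data[CID_MAXLEN + ADDR_MAXLEN];
    unsigned char len;
    unsigned char addrlen;
};

/* Append <alen> address bytes after the DCID. Returns 0 on success, -1 if
 * the address does not fit. <len> is left untouched. */
static int cid_concat_addr(struct cid_with_addr *cid, const void *addr,
                           size_t alen)
{
    if (alen > ADDR_MAXLEN || cid->len > CID_MAXLEN)
        return -1;
    memcpy(cid->data + cid->len, addr, alen);
    cid->addrlen = (unsigned char)alen;
    return 0;
}

/* Length of the lookup key made of the DCID followed by the address. */
static size_t cid_odcid_keylen(const struct cid_with_addr *cid)
{
    return (size_t)cid->len + cid->addrlen;
}
```

A lookup on the DCID alone uses cid.len bytes, a lookup on the ODCID uses
cid_odcid_keylen() bytes of the same array.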
The field <odcid_len> in quic_rx_packet struct is now useless and has
been removed. However, a new parameter must be added to the
qc_new_conn() function to specify the size of the ODCID addrlen.
In the haproxy implementation, generated DCIDs are 8 bytes long, the minimal
value allowed by the specification. Rename the constant representing
this size to indicate that it is haproxy specific.
All operations on the ODCID/DCID trees must be conducted under a
read-write lock. Add a missing read-lock on the lookup operation inside
the listener handler.
The packet number space flags were mixed with the connection level flags.
This led to ACKs being sent at the connection level without regard to
the underlying packet number space. But we want to be able to acknowledge
packets for a specific packet number space.
A client sends a 0-RTT data packet after an Initial one in the same datagram.
We must be able to parse such packets just after having parsed the Initial packets.
Export the code which sets the ->app_ops structure into the
quic_set_app_ops() function. It must be called by the TLS callback which
selects the application (ssl_sock_advertise_alpn_protos) so as
to be able to build application packets after having received 0-RTT data.
The TLS does not provide us with TX secrets after we have provided it
with 0-RTT data. This is logical: the server does not need to send 0-RTT
data. We must skip the section where such secrets are derived if we do not
want to close the connection with a TLS alert.
Enable 0-RTT at the TLS context level:
RFC 9001 4.6.1. Enabling 0-RTT
Accordingly, the max_early_data_size parameter is repurposed to hold a
sentinel value 0xffffffff to indicate that the server is willing to accept
QUIC 0-RTT data.
At the SSL connection level, we must call SSL_set_quic_early_data_enabled().
This field is no longer useful. Modify the traces accordingly.
Also initialize ->pn_node.key value to -1, which is an illegal value
for QUIC packet number, and display it in traces if different from -1.
If not handled by qc_parse_pkt_frms(), the packet which contains it is dropped.
Add only a trace when parsing this frame at this time.
Also modify others to reduce the traces size and have more information about streams.
The xprt layer is responsible for notifying the mux of a CONNECTION_CLOSE
reception. In this case the flag QC_CF_CC_RECV is set on the
qcc and the mux tasklet is woken up.
One of the notable effects of QC_CF_CC_RECV is that each qcs will be
released even if it has remaining data in its send buffers.
Set the HTX EOM flag on RX in the app layer. This is required to notify
the stream analyzers of the end of the request, else the request
channel never goes to MSG_DONE state.
Remove qc_eval_pkt() which came with the multithreading support. It
was there to evaluate the length of a TX packet before building it, so that
several threads could build TX packets without consuming a packet number for
nothing (when the building failed). But as the TX packet building functions
are always executed by the same thread, the one attached to the connection, it
no longer makes sense to use such a function. Furthermore it is buggy
since we recently had to pad the TX packets under certain circumstances.
After the handshake has succeeded, we must delete any remaining
Initial or Handshake packets from the RX buffer. This cannot be
done depending on the state of the connection (->st quic_conn struct
member value) as the packets are not received/treated in order.
Add a null byte to the end of the RX buffer to notify the consumer there is no
more data to treat.
Modify quic_rx_packet_pool_purge(), the function which removes the
RX packets from the buffer.
Also rename this function to quic_rx_pkts_del().
As the RX packets may be accessed by the QUIC connection handler (quic_conn_io_cb())
the function responsible for decrementing their reference counters must not
access other information than these reference counters! It was a very bad idea
to try to purge the RX buffer asap when executing this function.
Do not leave in the RX buffer packets with CRYPTO data which were
already received. We do this when parsing CRYPTO frame. If already
received we must not consider such frames as if they were not received
in order! As a side effect, this interrupted the transfer of long streams
(ACK frames not parsed).
Implement the subscription in the mux on the qcs instance.
Subscribe is now used by the h3 layer when receiving an incomplete frame
on the H3 control stream. It is also used when attaching the remote
uni-directional streams on the h3 layer.
In the qc_send, the mux wakes up the qcs for each new transfer executed.
This is done via the method qcs_notify_send().
The xprt wakes up the qcs when receiving data on unidirectional streams.
This is done via the method qcs_notify_recv().
Re-implement the QUIC mux. It will reuse the mechanics from the previous
mux without all untested/unsupported features. This should ease the
maintenance.
Note that a lot of features are broken for the moment. They will be
re-implemented on the following commits to have a clean commit history.
The app layer is initialized after the handshake completion by the XPRT
stack. Call the finalize operation just after that.
Remove the erroneous call to finalize by the mux in the TPs callback as
the app layer is not yet initialized at this stage.
This should fix the missing H3 settings currently not emitted by
haproxy.
As soon as the connection ID (the one chosen by the QUIC server) has been used
by the client, we can delete its original destination connection ID from its tree.
This patch modifies ha_quic_set_encryption_secrets() to store the
secrets received by the TLS stack and prepare the information for the
next key update thanks to quic_tls_key_update().
qc_pkt_decrypt() is modified to check if we must use the next or the
previous key phase information to decrypt a short packet.
The information is rotated if the packet could be decrypted with the
next key phase information. Then new secrets, keys and IVs are updated
by calling quic_tls_key_update() to prepare the next key phase.
quic_build_packet_short_header() is also modified to handle the key phase
bit from the current key phase information.
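The choice between the previous, current and next key phase information can
be sketched as below. This is an assumption-based illustration of the usual
RFC 9001 logic, not the code of qc_pkt_decrypt(): struct kp_state, kp_select()
and the <cur_first_pn> member are hypothetical, and the packet number is
supposed to be already known (header protection removed).

```c
#include <stdint.h>

#define SHORT_HDR_KEY_PHASE_BIT 0x04 /* Key Phase bit, RFC 9000 17.3.1 */

enum key_set { KEYS_PREV, KEYS_CUR, KEYS_NEXT };

/* Hypothetical RX key phase state. */
struct kp_state {
    unsigned char phase;  /* current key phase, 0 or 1 */
    int64_t cur_first_pn; /* lowest packet number decrypted with the
                           * current keys, -1 if none yet */
};

/* Select the key set to try for a short packet whose first byte is
 * <first_byte> and whose packet number is <pn>. */
static enum key_set kp_select(const struct kp_state *st,
                              unsigned char first_byte, int64_t pn)
{
    unsigned char bit = !!(first_byte & SHORT_HDR_KEY_PHASE_BIT);

    if (bit == st->phase)
        return KEYS_CUR;
    /* Toggled bit: a packet older than the current phase is a late packet
     * of the previous phase, otherwise the peer may have updated its keys. */
    if (st->cur_first_pn >= 0 && pn < st->cur_first_pn)
        return KEYS_PREV;
    return KEYS_NEXT;
}
```

Only a successful decryption with the KEYS_NEXT set should trigger the
rotation of the key phase information.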
This function derives the next RX and TX keys and IVs from secrets
for the next key update key phase. We also implement quic_tls_rotate_keys()
which rotates the key update key phase information to be able to continue
to decrypt old key phase packets. Most of this information consists of
pointers to unsigned char.
When running the Key Update process, we must maintain much information,
especially when the key phase bit has been toggled by the peer, as
it is possible that it is due to late packets. This patch adds
the new quic_tls_kp structure to do so. Such structures are used to store
the previous and next secrets, keys and IVs associated with the previous
and next RX key phases. We also need the next TX key phase information
to be able to encrypt packets for the next key phase.
haproxy may crash when running this statement in qc_lstnr_pkt_rcv():
conn_ctx = qc->conn->xprt_ctx;
because qc->conn may not be initialized. With this patch we ensure
qc->conn is correctly initialized before accessing its ->xprt_ctx
member. We zero the xprt_ctx structure (ssl_conn_ctx struct), then
initialize its ->conn member with HA_ATOMIC_STORE(). Then, ->conn and
->conn->xprt_ctx members of quic_conn struct can be accessed with
HA_ATOMIC_LOAD().
When sending a CONNECTION_CLOSE frame to immediately close the connection,
do not provide CRYPTO data to the TLS stack. Do not build anything other than a
CONNECTION_CLOSE and do not derive any secret when in immediate close state.
Seize the opportunity of this patch to rename ->err quic_conn struct member
to ->error_code.
We set this TLS error when no application protocol could be negotiated
via the TLS callback concerned. It is converted to a QUIC CRYPTO_ERROR
error (0x178).
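The 0x178 value follows from the RFC 9001 rule which adds the TLS alert
description to 0x0100. A short sketch, where the macro and function names are
ours and not haproxy ones:

```c
#include <stdint.h>

#define QUIC_CRYPTO_ERROR_BASE            0x0100 /* CRYPTO_ERROR range start */
#define TLS_ALERT_NO_APPLICATION_PROTOCOL 120    /* TLS AlertDescription */

/* Convert a TLS alert description into a QUIC CRYPTO_ERROR code. */
static uint64_t quic_err_from_tls_alert(uint8_t alert)
{
    return QUIC_CRYPTO_ERROR_BASE + alert;
}
```

quic_err_from_tls_alert(TLS_ALERT_NO_APPLICATION_PROTOCOL) gives 0x100 + 0x78,
that is to say 0x178.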
Remove the verbosity set to 0 on quic_init_stdout_traces. This will
generate even more verbose traces on stdout with the default verbosity
of 1 when compiling with -DENABLE_QUIC_STDOUT_TRACES.
Implement a function quic_init_stdout_traces called at STG_INIT. If
ENABLE_QUIC_STDOUT_TRACES preprocessor define is set, the QUIC trace
module will be automatically activated to emit traces on stdout on the
developer level.
The main purpose for now is to be able to generate traces on the haproxy
docker image used for QUIC interop testing suite. This should facilitate
test failure analysis.
Change the way the CIDs are organized to attach the DCIDs of received packets
to QUIC connections. This is necessary to be able to handle multiple DCIDs
for one connection.
For this, the quic_connection_id structure has been extended. When
allocated, they are inserted in the receiver CID tree instead of the
quic_conn directly. When receiving a packet, the receiver tree is
inspected to retrieve the quic_connection_id. The quic_connection_id
now contains a reference to the QUIC connection.
The comment is here to warn about a possible thread concurrency issue
when treating INITIAL packets from the same client. The macro unlikely
is added to further highlight this rare occurrence.
It is valid for a QUIC packet to contain a PADDING frame followed by
one or several other frames.
quic_parse_padding_frame() does not require any change as it properly detects
the end of the frame with the first non-null byte.
This allows interoperating with the quic-go implementation, which uses
PADDING-CRYPTO as the first handshake packet.
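The detection of the end of the padding can be sketched as follows.
skip_padding() is a hypothetical helper written for the illustration, it is
not quic_parse_padding_frame() itself.

```c
#include <stddef.h>

/* Skip a run of PADDING frames (type 0x00). Returns a pointer to the
 * first non-null byte, or <end> if only padding remains. */
static const unsigned char *skip_padding(const unsigned char *pos,
                                         const unsigned char *end)
{
    while (pos < end && *pos == 0x00)
        pos++;
    return pos;
}
```

The frame parsing loop then simply resumes at the returned position, which
is the type byte of the next frame (0x06 for a CRYPTO frame).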
When receiving Initial packets for Version Negotiation, no quic_conn is
instantiated. Thus, on the final trace, the quic_conn dereference
must be tested before using it.
This simple patch adds parsing support for these frames. But nothing is
done at this time about the streams or flow control concerned. This is only to
prevent some QUIC tracker or interop runner tests from failing for a reason
independent of their tested features.
When we have already received ACK frames with the same largest packet
number, this is not an error at all. In this case, we must continue
to parse the current ACK frame.
Add ->err member to quic_conn struct to store the connection errors.
This is the responsibility of the ->send_alert callback of SSL_QUIC_METHOD
struct to handle the TLS alert and consequently update ->err value.
At this time, when entering qc_build_pkt() we build a CONNECTION_CLOSE
frame to close the connection when ->err value is not null.
When adding a range, if no "lower" range was present in the ack range root for
the packet number space concerned, we did not check if the new added range could
overlap the next one. This led haproxy to crash when encoding a negative
integer when building ACK frames.
This bug was revealed thanks to "multi_packet_client_hello" QUIC tracker
test which makes a client send two first Initial packets out of order.
->qc (QUIC connection) member of packet structure was badly initialized
when received as second Initial packet (from picoquic -Q for instance).
This led to corruption of the quic_conn structure with random behaviors
as side effects. This bug came with this commit:
"MINOR: quic: Possible wrong connection identification"
If we want to run quic-tracker against haproxy, we must at least
support the draft version of the TLS extension for the QUIC transport
parameters (0xffa5). quic-tracker QUIC version is draft-29 at this time.
We select this depending on the QUIC version. If draft, we select the
draft TLS extension.
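A possible shape for this selection is shown below. The helper names are
hypothetical. 0x39 is the RFC 9001 codepoint for the transport parameters
extension and 0xff0000NN is the numbering used by the IETF draft versions.

```c
#include <stdint.h>

#define TLS_EXT_QUIC_TP       0x0039 /* RFC 9001 codepoint */
#define TLS_EXT_QUIC_TP_DRAFT 0xffa5 /* draft codepoint */

/* IETF draft versions are numbered 0xff0000NN (draft-29 is 0xff00001d). */
static int quic_version_is_draft(uint32_t version)
{
    return (version & 0xffffff00) == 0xff000000;
}

/* TLS extension type carrying the QUIC transport parameters. */
static uint16_t quic_tp_tls_ext(uint32_t version)
{
    return quic_version_is_draft(version) ? TLS_EXT_QUIC_TP_DRAFT
                                          : TLS_EXT_QUIC_TP;
}
```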
UDP datagrams with Initial packets were padded only for the clients (haproxy
servers). But such packets MUST also be padded for the servers (haproxy
listeners). Furthermore, for servers, only UDP datagrams containing
ack-eliciting Initial packets must be padded.
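The rule can be summarized by the small sketch below, which follows RFC 9000
section 14.1 (datagrams expanded to at least 1200 bytes). The function and its
parameters are hypothetical, <quic_server> being non-null for a haproxy
listener.

```c
#include <stddef.h>

#define QUIC_INITIAL_DGRAM_MINLEN 1200 /* RFC 9000, section 14.1 */

/* Number of padding bytes to add to a datagram of <len> bytes. */
static size_t initial_dgram_padding(int quic_server, int has_initial,
                                    int initial_ack_eliciting, size_t len)
{
    if (!has_initial || len >= QUIC_INITIAL_DGRAM_MINLEN)
        return 0;
    /* A server pads only if the Initial packet is ack-eliciting. */
    if (quic_server && !initial_ack_eliciting)
        return 0;
    return QUIC_INITIAL_DGRAM_MINLEN - len;
}
```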
A client may send several Initial packets. This is the case for picoquic
with -Q option. In this case we must identify the connection of incoming
Initial packets thanks to the original destination connection ID.
If the client announced a QUIC version not supported by haproxy, emit a
Version Negotiation Packet, according to RFC 9000 section 6 (Version
Negotiation).
This is required to be able to use the framework for QUIC interop
testing from https://github.com/marten-seemann/quic-interop-runner. The
simulator checks that the server is available by sending packets to
force the emission of a Version Negotiation Packet.
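For reference, the layout of a Version Negotiation packet (RFC 9000 section
17.2.1) can be sketched as below. build_vn_pkt() is a hypothetical helper, it
is not the function added by this patch. Note that the connection IDs of the
received packet are swapped in the reply.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a Version Negotiation packet into <buf>. <rx_dcid> and <rx_scid>
 * are the connection IDs of the received packet (both non-NULL).
 * Returns the packet length, or 0 if <buf> is too small. */
static size_t build_vn_pkt(unsigned char *buf, size_t buflen,
                           const unsigned char *rx_dcid, uint8_t rx_dcid_len,
                           const unsigned char *rx_scid, uint8_t rx_scid_len,
                           const uint32_t *versions, size_t nb_versions)
{
    size_t need = 1 + 4 + 1 + rx_scid_len + 1 + rx_dcid_len + 4 * nb_versions;
    unsigned char *p = buf;
    size_t i;

    if (need > buflen)
        return 0;
    *p++ = 0x80 | 0x40;               /* long header form, unused bits */
    memset(p, 0, 4);                  /* version 0 identifies the packet */
    p += 4;
    *p++ = rx_scid_len;               /* DCID = SCID of the received packet */
    memcpy(p, rx_scid, rx_scid_len);
    p += rx_scid_len;
    *p++ = rx_dcid_len;               /* SCID = DCID of the received packet */
    memcpy(p, rx_dcid, rx_dcid_len);
    p += rx_dcid_len;
    for (i = 0; i < nb_versions; i++) { /* supported versions, big endian */
        *p++ = (unsigned char)(versions[i] >> 24);
        *p++ = (unsigned char)(versions[i] >> 16);
        *p++ = (unsigned char)(versions[i] >> 8);
        *p++ = (unsigned char)versions[i];
    }
    return (size_t)(p - buf);
}
```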
Implement a new app_ops layer for quic interop. This layer uses HTTP/0.9
on top of QUIC. Implementation is minimal, with the intent to be able to
pass interoperability test suite from
https://github.com/marten-seemann/quic-interop-runner.
It is instantiated if the negotiated ALPN is "hq-interop".
Remove the hardcoded initialization of h3 layer on mux init. Now the
ALPN is looked up just after the SSL handshake. The app layer is then
installed if the ALPN negotiation returned a supported protocol.
This required adding a get_alpn on the ssl_quic layer which is just a
call to ssl_sock_get_alpn() from ssl_sock. This is mandatory to be able
to use conn_get_alpn().
Fix potential allocation failure of HTX start-line during H3 request
decoding. In this case, h3_decode_qcs returns -1 as error code.
This addresses in part github issue #1445.
->frms_rwlock is an old lock supposed to be used when several threads
could handle the same connection. This is no longer the case since this
commit:
"MINOR: quic: Attach the QUIC connection to a thread."
Add a buffer per QUIC connection. At this time the listener which receives
the UDP datagram is responsible for identifying the underlying QUIC connection
and must copy the QUIC packets to its buffer.
->pkt_list member has been added to quic_conn struct to enlist the packets
in the order they have been copied to the connection buffer so as to be
able to consume this buffer when the packets are freed. This list is locked
thanks to a R/W lock to protect it from concurrent accesses.
quic_rx_packet struct does not use a static buffer anymore to store the QUIC
packets contents.
At this time we allocate an RX buffer per thread.
Also take the opportunity offered by this patch to rename TX related variable
names to distinguish them from the RX part.
Some browsers may send Initial packets with sizes greater than 1252 bytes
(QUIC_INITIAL_IPV4_MTU). Let us increase this size limit up to 2048 bytes.
Also use this size for "max_udp_payload_size" transport parameter to limit
the size of the datagrams we want to receive.
On receiving CONNECTION_CLOSE frame, the mux is flagged for immediate
connection close. A stream is closed even if there is data not ACKed
left if CONNECTION_CLOSE has been received.
The mux tx buffers have been rewritten with buffers attached to qcs
instances. qc_buf_available and qc_get_buf functions are updated to
manipulate qcs. All occurrences of the unused qcc ring buffer are
removed to ease the code maintenance.
Defer the shutting of a qcs if there is still data in its tx buffers. In
this case, the conn_stream is closed but the qcs is kept with a new flag
QC_SF_DETACH.
On ACK reception, the xprt wakes up the shut_tl tasklet if the stream is
flagged with QC_SF_DETACH. This tasklet is responsible for freeing the qcs
and possibly the qcc when all bidirectional streams are removed.
Remove the quic_conn from the receiver connection_ids tree on
quic_conn_free. This fixes a crash due to dangling references in the
tree after a quic connection release.
This operation must be conducted under the listener lock. For this
reason, the quic_conn now contains a reference to its attached listener.
It seems it was a bad idea to use the same function as for TCP ssl sockets
to initialize the SSL session objects for QUIC with ssl_bio_and_sess_init().
Indeed, this had the very bad side effect of generating SSL errors due
to the fact that such BIOs initialized for QUIC could not finally be controlled
via the BIO_ctrl*() API, especially the BIO_ctrl() function used by many other
internal OpenSSL functions (BIO_push(), BIO_pop() etc).
Other OpenSSL-based QUIC implementations do not use BIOs at all to configure
QUIC connections. So, we decided to proceed the same way as ngtcp2 for instance:
only initialize an SSL object and call SSL_set_quic_method() to set its
underlying method. Note that calling this function silently disables this option:
SSL_OP_ENABLE_MIDDLEBOX_COMPAT.
We implement qc_ssl_sess_init() to initialize SSL sessions for QUIC connections
this way, with a retry in case of allocation failure as is done by
ssl_bio_and_sess_init(). We also modify the code part for haproxy servers.
We'll need to improve the API to pass other arguments in the future, so
let's start to adapt better to the current use cases. task_new() is used:
- 18 times as task_new(tid_bit)
- 18 times as task_new(MAX_THREADS_MASK)
- 2 times with a single bit (in a loop)
- 1 in the debug code that uses a mask
This patch provides 3 new functions to achieve this:
- task_new_here() to create a task on the calling thread
- task_new_anywhere() to create a task to be run anywhere
- task_new_on() to create a task to run on a specific thread
The change is trivial and will allow us to later concentrate the
required adaptations to these 3 functions only. It's still possible
to call task_new() if needed but a comment was added to encourage the
use of the new ones instead. The debug code was not changed and still
uses it.
When ACKs have been received by the xprt, it must wake up the
mux if the latter has subscribed to SEND events. This is the
role of qcs_try_to_consume() to detect such a situation. This
is the function which consumes the buffer filled by the mux.
It is important to know if the packet number spaces used during the
handshakes have really been discarded. If not, this may have a
significant impact on the packet loss detection.
There were cases where the Initial packet number space was not discarded.
This led the packet loss detection to continue to take it into
consideration during the connection lifetime. Some Application level
packets could not be retransmitted.
The STREAM data to send coming from the upper layer must be stored until
it has been acked by the peer. To do so, we store it in buffer structs,
one per stream (see qcs.tx.buf). Each time a STREAM frame is built by
quic_push_frame(), its offset must match the offset of the first byte added to
the buffer (modulo the size of the buffer) by the frame. As the frames are not
always acknowledged in order, they may be stored in eb_trees ordered by their
offset to be sure to sequentially delete the STREAM data from their buffer, in
the order it has been added to it.
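The principle can be illustrated with the sketch below, where a small sorted
array stands in for the eb_tree. All the names are hypothetical and the sketch
assumes that the acked ranges never overlap nor repeat.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16

struct acked_rng {
    uint64_t off;
    uint64_t len;
};

/* Hypothetical per-stream TX state. */
struct strm_tx {
    uint64_t ack_offset;                /* first byte not yet released */
    struct acked_rng pend[MAX_PENDING]; /* acked ranges sorted by offset */
    size_t nb_pend;
};

/* Record that [off, off + len) was acked. Returns the number of bytes
 * which can now be released from the head of the buffer, or -1 if the
 * table is full. */
static int64_t strm_tx_ack(struct strm_tx *tx, uint64_t off, uint64_t len)
{
    uint64_t released = 0;
    size_t i, n;

    if (tx->nb_pend == MAX_PENDING)
        return -1;
    /* Sorted insertion of the new range. */
    for (i = tx->nb_pend; i > 0 && tx->pend[i - 1].off > off; i--)
        tx->pend[i] = tx->pend[i - 1];
    tx->pend[i].off = off;
    tx->pend[i].len = len;
    tx->nb_pend++;

    /* Release every range which is contiguous with the head. */
    for (n = 0; n < tx->nb_pend && tx->pend[n].off == tx->ack_offset; n++) {
        tx->ack_offset += tx->pend[n].len;
        released += tx->pend[n].len;
    }
    for (i = n; i < tx->nb_pend; i++)
        tx->pend[i - n] = tx->pend[i];
    tx->nb_pend -= n;
    return (int64_t)released;
}
```

An ACK for a frame which is not at the head of the buffer releases nothing,
its data is only deleted once all the preceding data has been acked.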
The peer transport parameter values were not initialized with
the default ones (when absent), especially the
"active_connection_id_limit" parameter with 2 as default value
when absent from received remote transport parameters. This
had the side effect of sending too many NEW_CONNECTION_ID frames.
This was the case for curl which does not announce any
"active_connection_id_limit" parameter.
Also rename ->idle_timeout to ->max_idle_timeout to match RFC 9000.
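As a reminder, the RFC 9000 transport parameters with a non-null default value
could be initialized as in the sketch below before decoding the parameters
received from the peer. struct tp and tp_init_defaults() are hypothetical
names, only the default values come from the RFC.

```c
#include <stdint.h>

/* Subset of the transport parameters with a non-null default value. */
struct tp {
    uint64_t max_udp_payload_size;
    uint64_t ack_delay_exponent;
    uint64_t max_ack_delay;              /* in milliseconds */
    uint64_t active_connection_id_limit;
};

/* Must be called before decoding the received parameters so that the
 * absent ones keep their default value. */
static void tp_init_defaults(struct tp *tp)
{
    tp->max_udp_payload_size = 65527;
    tp->ack_delay_exponent = 3;
    tp->max_ack_delay = 25;
    tp->active_connection_id_limit = 2;
}
```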
These salts are used to derive initial secrets to decrypt the first Initial packet.
We support draft-29 and v1 QUIC version initial salts.
Add parameters to our QUIC-TLS API functions used to derive these secrets from
these salts.
Make our xprt_quic use the correct initial salt depending on the QUIC version
field found in the first packet. Useful to support connections with curl which
uses the draft-29 QUIC version.
Move the "ACK required" bit from the packet number space to the connection level.
Force the "ACK required" option when acknowledging Handshake or Initial packet.
A client may send three packets with a different encryption level for each. So,
this patch modifies qc_treat_rx_pkts() to consider two encryption levels passed
as parameters, in place of only one.
Make qc_conn_io_cb() restart its process after the handshake has succeeded
so as to process any Application level packets which have already been received
in the same datagram as the last CRYPTO frames in Handshake packets.
We must take as much data as possible from STREAM frames to be encapsulated
in QUIC packets, almost as is done for CRYPTO frames whose fields are
variable length fields. The difference is that STREAM frames are only accepted
for short packets without any "Length" field. So it is sufficient to call
max_available_room() for that in place of max_stream_data_size() as
is done for CRYPTO data.
It is possible the TLS stack provides us with 1-RTT TX secrets
at the same time as Handshake secrets are provided. Thanks to this
simple patch we can build Application level packets during the handshake.
Make qc_prep_hdshk_pkts() and quic_conn_io_cb() handle the case
where we enter them with QUIC_HS_ST_COMPLETE or QUIC_HS_ST_CONFIRMED
as connection state, with QUIC_TLS_ENC_LEVEL_APP and QUIC_TLS_ENC_LEVEL_NONE
as the encryption levels to consider to prepare packets.
quic_get_tls_enc_levels() is modified to return QUIC_TLS_ENC_LEVEL_APP
and QUIC_TLS_ENC_LEVEL_NONE as levels to consider when coalescing
packets in the same datagram.
With very few packets received by the listener, it is possible
that its state may move from QUIC_HS_ST_SERVER_INITIAL to
QUIC_HS_ST_COMPLETE without transition to QUIC_HS_ST_SERVER_HANDSHAKE state.
This latter state is not mandatory.
This simply enables us to coalesce Application level packets with
Handshake ones at the end of the handshake. This is highly useful
if we do want to send a short Handshake packet followed by Application
level ones.
We must evaluate the packet lengths in advance to be sure we do not
consume a packet number for nothing. The packet building must always
succeed. This is the role of qc_eval_pkt() implemented by this patch
and called before calling qc_do_build_pkt() which was previously modified to
always succeed.
There were cases where the encoded size of acks was not updated, leading
to ACK frames being built too big compared to the expected size. At this
time, this makes the code "BUG_ON()".
Rename qc_build_hdshk_pkt() to qc_build_pkt() and qc_do_build_hdshk_pkt()
to qc_do_build_pkt().
Update their comments consequently.
Make qc_do_build_hdshk_pkt() BUG_ON() when it does not manage to build
a packet. This is a bug!
Remove the functions which were specific to the Application level.
This is the same function which builds any packet for any encryption
level: quic_prep_hdshk_pkts() directly called from the quic_conn_io_cb().
There is no need to pass a copy of CRYPTO frames to qc_build_frm() from
qc_do_build_hdshk_pkt(). Furthermore, after the previous modifications,
qc_do_build_hdshk_pkt() does not only build CRYPTO frames from the
->pktns.tx.frms MT_LIST but any type of frame.
Atomically increase the "next packet variable" before building a new packet.
Make the code bug on a packet building failure. This should never happen
if we do not want to consume a packet number for nothing. There are remaining
modifications to come to ensure this is the case.
Modify this task which is called at least each time a packet is received by a
listener so as to make it behave almost as qc_do_hdshk(). The latter is no
longer useful and is removed.
This function was responsible for building CRYPTO frames to fill as much as
possible a packet passed as argument. This patch makes it support any frame
except STREAM frames whose lengths are highly variable.
We want to treat all the frames to be built the same way as frames
built during handshake (CRYPTO frames). So, let's store them at the same
place which is an MT_LIST.
These structures are similar. quic_tx_frm was there to try to reduce the
size of such objects which embed a union for all the QUIC frames.
Furthermore this patch fixes the issue where quic_tx_frm objects were freed
from the pool for quic_frame.
Make quic_rx_packet_ref(inc|dec)() functions be thread safe.
Make use of ->rx.crypto.frms_rwlock RW lock when manipulating RX frames
from qc_treat_rx_crypto_frms().
Modify atomically several variables attached to RX part of quic_enc_level struct.
->rx.crypto member of quic_enc_level struct was not initialized as
was done for all other members of this structure. This patch
fixes this.
Also add a RW lock for the frames of this member.
If we let the connection packet handler task (quic_conn_io_cb) process the first
client Initial packet which contains the TLS Client Hello message before the mux
context is initialized, quic_mux_transport_params_update() makes haproxy crash.
->start xprt callback already wakes up this task and is called after all the
connection contexts are initialized. So, this patch does not wake up
quic_conn_io_cb() if the mux context is not initialized (this was already the
case for the connection context (conn_ctx)).
If we add TX packets to their trees before sending them, they may
be detected as lost before being sent. This may make haproxy crash
when it retrieves the prepared packets from TX ring buffers, dereferencing
them after they have been freed.
We use only ring buffers (struct qring) to prepare and send QUIC datagrams.
We can safely remove the old buffering implementation which was not thread safe.
We modify the functions responsible for building packets to put the latter
in ring buffers (qc_build_hdshk_pkt() during the handshake step, and
qc_build_phdshk_apkt() during the post-handshake step). These functions
remove a ring buffer from its list to build as many datagrams as possible.
Each datagram is preceded by two fields: the datagram length and the
first packet in the datagram. We chain the packets belonging to the same datagram
in a singly linked list to reach them from the first one: indeed we must
modify some members of each packet when we really send them from send_ppkts().
This function is also modified to retrieve the datagrams from ring buffers.
We initialize the pointer to the listener TX ring buffer list.
Note that this is not done for QUIC clients as we do not fully support them:
we only have to allocate the list and attach it to server struct I guess.
Before this patch we reserved 16 bytes (QUIC_TLS_TAG_LEN) before building the
handshake packet to be sure to be able to add the tag which comes with
the packet encryption, decreasing the end offset of the building buffer by 16 bytes.
But this tag length was also taken into account when calling qc_build_frms() which
computes and builds crypto frames for the remaining available room thanks to <*len>
parameter which is the length of the already present bytes in the building buffer
before adding CRYPTO frames. This led us to waste the last 16 bytes of the buffer
which were not used.
This makes at least our listeners answer ngtcp2 clients without a
HelloRetryRequest message. It seems the server chooses the first
group in the group list ordered by preference and set by
SSL_CTX_set1_curves_list() which matches the client ones.
Modify the main function of the I/O dgram handler used to parse QUIC packets
to be thread safe. Its role is at least to create new incoming connections
added to two trees protected by the same RW lock. The packets are from now on
fully parsed before possibly creating new connections.
Allocate everything needed for a connection (struct quic_conn) from the same
function.
Rename qc_new_conn_init() to qc_new_conn() to reflect these modifications.
Insert these connection objects in their tree after returning from this function.
Some SSL callbacks may be called with a pointer to ssl_sock_ctx struct as parameter
which does not match the quic_conn_ctx struct type (see ssl_sock_infocb()).
I am not sure we have to keep such callbacks for QUIC but we must ensure
the SSL and QUIC xprts use the same data structure as context.
Move the connection state from quic_conn_ctx struct to quic_conn struct which
is the structure which is used to store the QUIC connection part information.
This structure is initialized by the I/O dgram handler for each new connection
to QUIC listeners. This is needed for the multithread support so as not
to have to depend on the connection context potentially initialized by another
thread.
We must protect from concurrent accesses the tree which stores the QUIC packets received
by the dgram I/O handler, these packets being also parsed by the xprt task.
No need to call free_quic_rx_packet() after calling quic_rx_packet_eb64_delete()
as the latter already calls quic_rx_packet_refdec() also called by
free_quic_rx_packet().
Let's say that we have to insert a range R between two others A and B
with A->first <= R->first <= B->first. We have to remove the ranges
which are overlapped by R during the insertion. This was correctly done when
the intersection between A and R was not empty, but not when the
intersection between R and B was not empty. In this latter case,
after having inserted a new range R, we set the <new> variable as the
node to consider to check the overlapping between R and its following
ranges.
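The expected behavior can be sketched with a sorted array in place of the
tree. The types and rngs_insert() are hypothetical, and the choice to also
merge ranges which are merely adjacent is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RNGS 32

struct rng {
    int64_t first, last;   /* inclusive packet number range */
};

struct rngs {
    struct rng r[MAX_RNGS]; /* sorted, disjoint and non adjacent */
    size_t nb;
};

/* Insert [first, last], merging it with every range it overlaps or
 * touches on either side. Returns 0 on success, -1 if the table is full. */
static int rngs_insert(struct rngs *rs, int64_t first, int64_t last)
{
    size_t i = 0, j, k;

    /* Skip the ranges which are entirely below the new one. */
    while (i < rs->nb && rs->r[i].last + 1 < first)
        i++;
    /* Absorb the overlapped ranges, preceding (A) or following (B) ones. */
    j = i;
    while (j < rs->nb && rs->r[j].first <= last + 1) {
        if (rs->r[j].first < first)
            first = rs->r[j].first;
        if (rs->r[j].last > last)
            last = rs->r[j].last;
        j++;
    }
    if (j == i) {
        /* Nothing merged: open a slot at <i>. */
        if (rs->nb == MAX_RNGS)
            return -1;
        for (k = rs->nb; k > i; k--)
            rs->r[k] = rs->r[k - 1];
        rs->nb++;
    } else {
        /* Ranges i..j-1 collapse into a single one: close the gap. */
        for (k = j; k < rs->nb; k++)
            rs->r[k - (j - i) + 1] = rs->r[k];
        rs->nb -= j - i - 1;
    }
    rs->r[i].first = first;
    rs->r[i].last = last;
    return 0;
}
```

The second loop is the important part: it keeps going after R has been
placed, so that the following ranges overlapped by R are removed as well.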