This counter must be incremented only one time by connection and decremented
as soon as the handshake has failed or succeeded. This is a gauge. Under certain
conditions this counter could be decremented twice. For instance
after having received a TLS alert, then upon SSL_do_handshake() failure.
To stop having to deal to all the current combinations which can lead to such a
situation (and the next to come), add a connection flag to denote if this counter
has been already decremented for a connection. So, this counter must be decremented
only if this flag has not been already set.
Must be backported up to 2.6.
Non constant expressions were used to initialize constant variables leading to
such compilation errors:
src/xprt_quic.c:66:3: error: initializer element is not a constant expression
.key_label_len = strlen(QUIC_HKDF_KEY_LABEL_V1),
Reproduced with CC=gcc-4.9 compilation option.
Fix using macros for each HKDF label.
At this time haproxy supported only incompatible version negotiation feature which
consists in sending a Version Negotiation packet after having received a long packet
without compatible value in its version field. This version value is the version
use to build the current packet. This patch does not modify this behavior.
This patch adds the support for compatible version negotiation feature which
allows endpoints to negotiate during the first flight or packets sent by the
client the QUIC version to use for the connection (or after the first flight).
This is done thanks to "version_information" parameter sent by both endpoints.
To be short, the client offers a list of supported versions by preference order.
The server (or haproxy listener) chooses the first version it also supported as
negotiated version.
This implementation has an impact on the tranport parameters handling (in both
direcetions). Indeed, the server must sent its version information, but only
after received and parsed the client transport parameters). So we cannot
encode these parameters at the same time we instantiated a new connection.
Add QUIC_TP_DRAFT_VERSION_INFORMATION(0xff73db) new transport parameter.
Add tp_version_information new C struct to handle this new parameter.
Implement quic_transport_param_enc_version_info() (resp.
quic_transport_param_dec_version_info()) to encode (resp. decode) this
parameter.
Add qc_conn_finalize() which encodes the transport parameters and configure
the TLS stack to send them.
Add ->negotiated_ictx quic_conn C struct new member to store the Initial
QUIC TLS context for the negotiated version. The Initial secrets derivation
is version dependent.
Rename ->version to ->original_version and add ->negotiated_version to
this C struct to reflect the QUIC-VN RFC denomination.
Modify most of the QUIC TLS API functions to pass a version as parameter.
Export the QUIC version definitions to be reused at least from quic_tp.c
(transport parameters.
Move the token check after the QUIC connection lookup. As this is the original
version which is sent into a Retry packet, and because this original version is
stored into the connection, we must check the token after having retreived this
connection.
Add packet version to traces.
See https://datatracker.ietf.org/doc/html/draft-ietf-quic-version-negotiation-08
for more information about this new feature.
This is not clear at all how to distinguish a QUIC draft version number from a
released one. And among these QUIC draft versions, which one must use the draft
QUIC TLS extension.
According to the QUIC implementations which support v2 draft, the TLS extension
(transport parameters) to be used is the released one
(TLS_EXTENSION_QUIC_TRANSPORT_PARAMETERS).
As the unique QUIC draft version we support is 0xff00001d and as at this time the
unique version with 0xff as most significant byte is this latter which must use
the draft TLS extension, we select the draft TLS extension
(TLS_EXTENSION_QUIC_TRANSPORT_PARAMETERS_DRAFT) only for such versions with 0xff
as most signification byte.
This is becoming difficult to handle the QUIC TLS related definitions
which arrive with a QUIC version (draft or not). So, here we add
quic_version C struct which does not define only the QUIC version number,
but also the QUIC TLS definitions which depend on a QUIC version.
Modify consequently all the QUIC TLS API to reuse these definitions
through new quic_version C struct.
Implement quic_pkt_type() function which return a packet type (0 up to 3)
depending on the QUIC version number.
Stop harding the Retry packet first byte in send_retry(): this is not more
possible because the packet type field depends on the QUIC version.
Also modify quic_build_packet_long_header() for the same reason: the packet
type depends on the QUIC version.
Add a quic_version C struct member to quic_conn C struct.
Modify qc_lstnr_pkt_rcv() to set this member asap.
Remove the version member from quic_rx_packet C struct: a packet is attached
asap to a connection (or dropped) which is the unique object which should
store the QUIC version.
Modify qc_pkt_is_supported_version() to return a supported quic_version C
struct from a version number.
Add Initial salt for QUIC v2 draft (initial_salt_v2_draft).
This is to prepare the support for QUIC v2 version. The packet type
depends on the version. So, we must parse early enough the version
before defining the type of each packet.
The nonce and keys used to cipher the Retry tag depend on the QUIC version.
Add these definitions for 0xff00001d (draft-29) and v2 QUIC version. At least
draft-29 is useful for QUIC tracker tests with "quic-force-retry" enabled
on haproxy side.
Validated with -v 0xff00001d ngtcp2 option.
Could not validate the v2 nonce and key at this time because not supported.
Use the same version as the one received. This is safe because the
version is treated before anything else sending a Version packet.
Must be backported to 2.6.
This simply replaces a call to task_new(1<<thr) with task_new_on(thr)
so that we can later isolate the changes required to add more thread
group stuff.
Building QUIC with gcc-4.4 on el6 shows this error:
src/xprt_quic.c: In function 'qc_release_lost_pkts':
src/xprt_quic.c:1905: error: unknown field 'loss' specified in initializer
compilation terminated due to -Wfatal-errors.
make: *** [src/xprt_quic.o] Error 1
make: *** Waiting for unfinished jobs....
Initializing an anonymous form of union like :
struct quic_cc_event ev = {
(...)
.loss.time_sent = newest_lost->time_sent,
(...)
};
generates an error with gcc-4.4 but not when initializing the
fields outside of the declaration.
The MUX now provides a single API for both uni and bidirectional
streams. It is responsible to reject reception on a local unidirectional
stream with the error STREAM_STATE_ERROR. This is already implemented in
qcc_recv(). As such, remove this duplicated check from xprt_quic.c.
If the connection client timeout has expired, the mux is released.
If the client decides to initiate a new request, we send a STOP_SENDING
frame. Then, the client endessly sends a RESET_STREAM frame.
At this time, we simulate the fact that we support the RESET_STREAM frame
thanks to this ridiculously minimalistic patch.
If the connection client timeout has expired, the mux is released.
If the client decides to initiate a new request, we do not ack its
request. This leads the client to endlessly sent it request.
This patch makes a QUIC listener send a STOP_SENDING frame in such
a situation.
Add ->inc_err_cnt new callback to qcc_app_ops struct which can
be called from xprt to increment the application level error code counters.
It take the application context as first parameter to be generic and support
new QUIC applications to come.
Add h3_stats.c module with counters for all the frame types and error codes.
Add new counters to count the number of dropped packet upon parsing error, lost
sent packets and the number of stateless reset packet sent.
Take the oppportunity of this patch to rename CONN_OPENINGS to QUIC_ST_HALF_OPEN_CONN
(total number of half open connections) and QUIC_ST_HDSHK_FAILS to QUIC_ST_HDSHK_FAIL.
When we select the next encryption level in qc_treat_rx_pkts() we
must reset the local largest_pn variable if we do not want to reuse its
previous value for this encryption. This bug could only happend during
handshake step and had no visible impact because this variable
is only used during the header protection removal step which hopefully
supports the packet reordering.
Add quic_transport_params_dump() static inline function to do so for
a quic_transport_parameters struct as parameter.
We use the trace API do dump these transport parameters both
after they have been initialized (RX/local) or received (TX/remote).
Make the transport parameters be standlone as much as possible as
it consists only in encoding/decoding data into/from buffers.
Reduce the size of xprt_quic.h. Unfortunalety, I think we will
have to continue to include <xprt_quic-t.h> to use the trace API
into this module.
We do not want to count the out of packet padding as being belonging
to an invalid packet, the firt byte of a QUIC packet being never null.
Some browsers like firefox proceeds this way to add PADDING frames
after an Initial packet and increase the size of their Initial packets.
The whole QUIC stack is impacted by this change :
* at quic-conn level, a single function is now used to handle uni and
bidirectional streams. It uses qcc_recv() function from MUX.
* at MUX level, qc_recv() io-handler function does not skip uni streams
* most changes are conducted at app layer. Most notably, all received
data is handle by decode_qcs operation.
Now that decode_qcs is the single app read function, the H3 layer can be
simplified. Uni streams parsing was extracted from h3_attach_ruqs() to
h3_decode_qcs().
h3_decode_qcs() is able to deal with all HTTP/3 frame types. It first
check if the frame is valid for the H3 stream type. Most notably,
SETTINGS parsing was moved from h3_control_recv() into h3_decode_qcs().
This commit has some major benefits besides removing duplicated code.
Mainly, QUIC flow control is now enforced for uni streams as with bidi
streams. Also, an unknown frame received on control stream does not set
an error : it is now silently ignored as required by the specification.
Some cleaning in H3 code is already done with this patch :
h3_control_recv() and h3_attach_ruqs() are removed as they are now
unused. A final patch should clean up the unneeded remaining bit.
Complete quic-conn API for error reporting. A new parameter <app> is
defined in the function quic_set_connection_close(). This will transform
the frame into a CONNECTION_CLOSE_APP type.
This type of frame will be generated by the applicative layer, h3 or
hq-interop for the moment. A new function qcc_emit_cc_app() is exported
by the MUX layer for them.
This reverts commit 118b2cbf84.
This patch was useful mainly for the docker image of QUIC interop to
have traces on stdout.
A better solution has been found by integrating this patch directly in
the qns repository which is used to build the docker image. Thus, this
hack is not require anymore in the main repository.
When we receive a CONNECTION_CLOSE frame, we should decrement this counter
if the handshake state was not successful and if we have not received
a TLS alert from the TLS stack.
Define an API to easily set a CONNECTION_CLOSE. This will mainly be
useful for the MUX when an error is detected which require to close the
whole connection.
On the MUX side, a new flag is added when a CONNECTION_CLOSE has been
prepared. This will disable add future send operations.
We rely on <conn_opening> stats counter and tune.quic.retry_threshold
setting to dynamically start sending Retry packets. We continue to send such packets
when "quic-force-retry" setting is set. The difference is when we receive tokens.
We check them regardless of this setting because the Retry could have been
dynamically started. We must also send Retry packets when we receive Initial
packets without token if the dynamic Retry threshold was reached but only for connection
which are not currently opening or in others words for Initial packets without
connection already instantiated. Indeed, we must not send Retry packets for all
Initial packets without token. For instance a client may have already sent an
Initial packet without receiving Retry packet because the Retry feature was not
started, then the Retry starts on exeeding the threshold value due to others
connections, then finally our client decide to send another Initial packet
(to ACK Initial CRYPTO data for instance). It does this without token. So, for
this already existing connection we must not send a Retry packet.
First commit to handle the QUIC stats counters. There is nothing special to say
except perhaps for ->conn_openings which is a gauge to count the number of
connection openings. It is incremented after having instantiated a quic_conn
struct, then decremented when the handshake was successful (handshake completed
state) or failed or when the connection timed out without reaching the handshake
completed state.
Move the code which finalizes the QUIC connections initialisations after
having called qc_new_conn() into this function to benefit from its
error handling to release the memory allocated for QUIC connections
the initialization of which could not be finalized.
Here is the format of a token:
- format (1 byte)
- ODCID (from 9 up 21 bytes)
- creation timestamp (4 bytes)
- salt (16 bytes)
A format byte is required to distinguish the Retry token from others sent in
NEW_TOKEN frames.
The Retry token is ciphered after having derived a strong secret from the cluster secret
and generated the AEAD AAD, as well as a 16 bytes long salt. This salt is
added to the token. Obviously it is not ciphered. The format byte is not
ciphered too.
The AAD are built by quic_generate_retry_token_aad() which concatenates the version,
the client SCID and the IP address and port. We had to implement quic_saddr_cpy()
to copy the IP address and port to the AAD buffer. Only the Retry SCID is generated
on our side to build a Retry packet, the others fields come from the first packet
received by the client. It must reuse this Retry SCID in response to our Retry packet.
So, we have not to store it on our side. Everything is offloaded to the client (stateless).
quic_generate_retry_token() must be used to generate a Retry packet. It calls
quic_pkt_encrypt() to cipher the token.
quic_generate_retry_check() must be used to check the validity of a Retry token.
It is able to decipher a token which arrives into an Initial packet in response
to a Retry packet. It calls parse_retry_token() after having deciphered the token
to store the ODCID into a local quic_cid struct variable. Finally this ODCID may
be stored into the transport parameter thanks to qc_lstnr_params_init().
The Retry token lifetime is 10 seconds. This lifetime is also checked by
quic_generate_retry_check(). If quic_generate_retry_check() fails, the received
packet is dropped without anymore packet processing at this time.
Slightly change the interface for qcc_recv() between MUX and XPRT. The
MUX is now responsible to call qcc_decode_qcs(). This is cleaner as now
the XPRT does not have to deal with an extra QCS parameter and the MUX
will call qcc_decode_qcs() only if really needed.
This change is possible since there is no extra buffering for
out-of-order STREAM frames and the XPRT does not have to handle buffered
frames.
The quic-conn manages a buffer to store received QUIC packets. When the
buffer wraps, the gap is filled until the end with junk and packets can
be inserted at the start of the buffer.
On the other end, deletion is implemented via quic_rx_pkts_del().
Packets are removed one by one if their refcount is nul. If junk is
found, the buffer is emptied until its wrap.
This seems to work in most cases but a bug was found in a particular
case : on insertion if buffer gap is not at the end of the buffer. In
this case, the gap was filled, which is useless as now the buffer is
full and the packet cannot be inserted. Worst, on deletion, when junk is
removed there is a risk to removed new packets. This can happens in the
following case :
1. buffer contig space is too small, junk is inserted in the middle of
it
2. on quic_rx_pkts_del() invocation, a packet is removed, but not the
next one because its refcount is still positive. When a new packet is
received, it will be stored after the junk.
3. on next quic_rx_pkts_del(), when junk is removed, all contig data is
cleared, with newer packets data too.
This will cause a transfer between a client and haproxy to be stalled.
This can be reproduced with big enough POST requests. I triggered it
with ngtcp2 and 10M of posted data.
Hopefully, the solution of this bug is simple. If contig space is not
big enough to store a packet, but the space is not at the end of the
buffer, no junk is inserted and the packet is dropped as we cannot
buffered it. This ensures that junk is only present at the end of the
buffer and when removed no packets data is purged with it.
quic_rx_strm_frm type was used to buffered STREAM frames received out of
order. Now the MUX is able to deal directly with these frames and
buffered it inside its ncbuf.
This commit is the equivalent for uni-streams of previous commit
MEDIUM: mux-quic/h3/hq-interop: use ncbuf for bidir streams
All unidirectional streams data is now handle in MUX Rx ncbuf. The
obsolete buffer is not unused and will be cleared in the following
patches.
Add a ncbuf for data reception on qcs. Thanks to this, the MUX is able
to buffered all received frame directly into the buffer. Flow control
parameters will be used to ensure there is never an overflow.
This change will simplify Rx path with the future deletion of acked
frames tree previously used for frames out of order.
Add send_stateless_reset() to send a stateless reset packet. It prepares
a packet to build a 1-RTT packet with quic_stateless_reset_token_cpy()
to copy a stateless reset token derived from the cluster secret with
the destination connection ID received as salt.
Also add QUIC_EV_STATELESS_RST new trace event to at least to have a trace
of the connection which are reset.
A server may send the stateless reset token associated to the current
connection from its transport parameters. So, let's copy it from
qc_lstnt_params_init().
The stateless reset token of a connection is generated from qc_new_conn() when
allocating the first connection ID. A QUIC server can copy it into its transport
parameters to allow the peer to reset the associated connection.
This latter is not easily reachable after having returned from qc_new_conn().
We want to be able to initialize the transport parameters from this function which
has an access to all the information to do so.
Extract the code used to initialize the transport parameters from qc_lstnr_pkt_rcv()
and make it callable from qc_new_conn(). qc_lstnr_params_init() is implemented
to accomplish this task for a haproxy listener.
Modify qc_new_conn() to reduce its the number of parameters.
The source address coming from Initial packets is also copied from qc_new_conn().
Add quic_stateless_reset_token_init() wrapper function around
quic_hkdf_extract_and_expand() function to derive the stateless reset tokens
attached to the connection IDs from "cluster-secret" configuration setting
and call it each time we instantiate a QUIC connection ID.
This function will have to call another one from quic_tls.[ch] soon.
As we do not want to include quic_tls.h from xprt_quic.h because
quic_tls.h already includes xprt_quic.h, let's moving it into
xprt_quic.c.
A ->time_received new member is added to quic_rx_packet to store the time the
packet are received. ->largest_time_received is added the the packet number
space structure to store this timestamp for the packet with a new largest
packet number to be acknowledged. QUIC_FL_PKTNS_NEW_LARGEST_PN new flag is
added to mark a packet number space as having to acknowledged a packet wih a
new largest packet number. In this case, the packet number space ack delay
must be recalculated.
Add quic_compute_ack_delay_us() function to compute the ack delay from the value
of the time a packet was received. Used only when a packet with a new largest
packet number.
As we do not have any task to be wake up by the poller after sendto() error,
we add an sendto() error counter to the quic_conn struct.
Dump its values from qc_send_ppkts().
It is possible that we continue to receive retransmitted STREAM frames after
the mux have been released. We rely on the ->rx.streams[].nb_streams counter
to check the stream was closed. If not, at this time we drop the packet.
This is required when the retransmitted frame types when the mux is released.
We add a counter for the number of streams which were opened or closed by the mux.
After the mux has been released, we can rely on this counter to know if the STREAM
frames are retransmitted ones or not.
If the MUX cannot handle immediately nor buffer a STREAM frame, the
packet containing it must not be acknowledge. This is in conformance
with the RFC9000.
qcc_recv() return codes have been adjusted to differentiate an invalid
frame with an already fully received offset which must be acknowledged.
If a packet contains a STREAM frame but the MUX is not allocated, the
frame cannot be enqueued. According to the RFC9000, we must not
acknowledge the packet under this condition.
This may prevents a bug with firefox which keeps trying on refreshing
the web page. This issue has already been detected before closing state
implementation : haproxy wasn't emitted CONNECTION_CLOSE and keeps
acknowledge STREAM frames despite not handle them.
In the future, it might be necessary to respond with a CONNECTION_CLOSE
if the MUX has already been freed.
This function is used to parse the QUIC packets carried by a UDP datagram.
When a correct packet could be found, the ->len RX packet structure value
is set to the packet length value. On the contrary, it is set to the remaining
number of bytes in the UDP datagram if no correct QUIC packet could be found.
So, there is no need to make this function return a status value. It allows
the caller to parse any QUIC packet carried by a UDP datagram without this.
Any client Initial packet carried in a datagram smaller than QUIC_INITIAL_PACKET_MINLEN(200)
bytes must be discarded. This does not mean we must discard the entire datagram.
So we must at least try to parse the packet length before dropping the packet
and return its length from qc_lstnr_pkt_rcv().
A crash is possible under such circumtances:
- The congestion window is drastically reduced to its miniaml value
when a quic listener is experiencing extreme packet loss ;
- we enqueue several STREAM frames to be resent and some of them could not be
transmitted ;
- some of the still in flight are acknowledged and trigger the
stream memory releasing ;
- when we come back to send the remaing STREAM frames, haproxy
crashes when it tries to build them.
To fix this issue, we mark the STREAM frame as lost when detected as lost.
Then a lookup if performed for the stream the STREAM frames are attached
to before building them. They are released if the stream is no more available
or the data range of the frame is consumed.
When we have to probe the peer, we must first try to send new data. This is done
here waking up the mux after having set the number of maximum number of datagrams
to send to QUIC_MAX_NB_PTO_DGRAMS (2). Of course, this is only the case if the
mux was subscribed to SEND events.
This may happen in rare cases with extreme packet loss (30% for both TX and RX)
which leads the congestion window to decrease down to its minimal value (two
datagrams). Under such circumtances, no ack-eliciting frame can be added to
a packet by qc_build_frms(). In this case we must cancel the packet building
process if there is no ACK or probe (PING frame) to send.
This function must return a successful status as soon as it could be build
a frame to be embedded by a packet. This behavior was broken by the last
modifications. This was due to a dangerous "ret = 1" statement inside
a loop. This statement must be reach only if we go out of a switch/case
after a "break" statement.
Add comments to mention this information.
When we are probing, we do not receive packets, furthermore all ACK frames have
already been sent. This is useless to send ACK when probing the peer. This
modification does not reset the flag which marks the connection as requiring an
ACK frame to be sent. If this is the case, this will be taken into an account
by after the probing process.
Make the two I/O handlers quic_conn_io_cb() and quic_conn_app_io_cb() call
qc_dgrams_retransmit() after probing retransmissions need was detected by
the timer task (qc_process_timer()).
We must modify qc_prep_pkts() to support QUIC_TLS_ENC_LEVEL_NONE as <next_tel>
parameter when called from qc_dgrams_retransmit().
When probing retranmissions with old data are needed for the connection we
mark the packets as probing with old data to track them when acknowledged:
we do not resend frames with old data when lost.
Modify qc_send_app_pkt() to distinguish the case where it sends new data
against the case where it sends old data during probing retransmissions.
We add <old_data> boolean parameter to this function to do so. The mux
never directly send old data when probing retransmissions are needed by
the connection.
This function is used to requeue the TX frames from TX packets which have
been detected as lost. The modifications consist in avoiding resending frames from
duplicated frames used to probe the peer. This is useless. Only the original
frames loss must be taken into an account because detected as lost before
the retransmitted frames. If these latter are also detected as lost, other
duplicated frames would have been retransmitted before their loss detection.
qc_prep_fast_retrans() and qc_prep_hdshk_fast_retrans() are modified to
take two list of frames as parameters. Two lists are needed for
qc_prep_hdshk_fast_retrans() to build datagrams with two packets during
handshake. qc_prep_fast_retrans() needs two lists of frames to be used
to send two datagrams with one list by datagram.
We want to be able to resend frames from list of frames during handshakes to
resend datagrams with the same frames as during the first transmissions.
This leads to decrease drasctically the chances of frame fragmentation due to
variable lengths of the packet fields. Furthermore the frames were not duplicated
when retransmitted from a packet to another one. This must be the case only during
packet loss dectection.
qc_dup_pkt_frms() is there to duplicate the frames from an input list to an output
list. A distinction is made between STREAM frames and the other ones because we
can rely on the "acknowledged offset" the aim of which is to track the number
of bytes which were acknowledged from TX STREAM frames.
qc_release_frm() in addition to release the frame passed as parameter, also mark
the duplicate STREAM frames as acknowledeged.
qc_send_hdshk_pkts() is the qc_send_app_pkts() counterpart to send datagrams from
at most two list of frames to be able to coalesced packets from two different
packet number spaces
qc_dgrams_retransmit() is there to probe the peer with datagrams depending on the
need of the packet number spaces which must be flag with QUIC_FL_PKTNS_PROBE_NEEDED
by the PTO timer task (qc_process_timer()).
Add QUIC_FL_CONN_RETRANS_NEEDED connection flag definition to mark a quic_conn
struct as needing a retranmission.
Add QUIC_FL_PKTNS_PROBE_NEEDED to mark a packet number space as needing a
datagram probing.
Set these flags from process_timer() to trigger datagram probings.
Do not initiate anymore datagrams probing from any quic encryption level.
This will be done from the I/O handlers (quic_conn_io_cb() during handshakes and
quic_conn_app_io_cb() after handshakes).
Add QUIC_FL_TX_PACKET_COALESCED flag to mark a TX packet as coalesced with others
to build a datagram.
Ensure we do not directly retransmit frames from such coalesced packets. They must
be retransmitted from their packet number spaces to avoid duplications.
We want to track the frames which have been duplicated during retransmissions so
that to avoid uselessly retransmitting frames which would already have been
acknowledged. ->origin new member is there to store the frame from which a copy
was done, ->reflist is a list to store the frames which are copies.
Also ensure all the frames are zeroed and that their ->reflist list member is
initialized.
Add QUIC_FL_TX_FRAME_ACKED flag definition to mark a TX frame as acknowledged.
Add a loop in the bidi STREAM function. This will call repeatdly
qcc_decode_qcs() and dequeue buffered frames.
This is useful when reception of more data is interrupted because the
MUX buffer was full. qcc_decode_qcs() has probably free some space so it
is useful to immediatly retry reception of buffered frames of the qcs
tree.
This may fix occurences of stalled Rx transfers with large payload.
Note however that there is still room for improvment. The conn-stream
layer is not able at this moment to retrigger demuxing. This is because
the mux io-handler does not treat Rx : this may continue to cause
stalled tranfers.
Improve the reception for STREAM frames. In qcc_recv(), if the frame is
bigger than the remaining space in rx buffer, do not reject it wholly.
Instead, copy as much data as possible. The rest of the data is
buffered.
This is necessary to handle H3 frames bigger than a buffer. The H3 code
does not demux until the frame is complete or the buffer is full.
Without this, the transfer on payload larger than the Rx buffer can
rapidly freeze.
There were plenty of leftovers from old code that were never removed
and that are not needed at all since these files do not use any
definition depending on fcntl.h, let's drop them.
Transport layers (raw_sock, ssl_sock, xprt_handshake and xprt_quic)
were using 4 constructors and 2 destructors. The 4 constructors were
replaced with INITCALL and the destructors with REGISTER_POST_DEINIT()
so that we do not depend on this anymore.
When using qc_stream_desc_ack(), the stream instance may be freed if
there is no more data in its buffers. This also means that all frames
still stored waiting for ACK for this stream are freed via
qc_stream_desc_free().
This is particularly important in quic_stream_try_to_consume() where we
loop over the frames tree of the stream. A use-after-free is present in
cas the stream has been freed in the trace "stream consumed" which
dereference the frame. Fix this by first checking if the stream has been
freed or not.
This bug was detected by using ASAN + quic traces enabled.
It is possible the xprt layer have to process retransmitted STREAM frames after
the mux was released. In this case, there is no need to try to wake it up.
MUX streams can now allocate multiple buffers for sending. quic-conn is
responsible to limit the total count of allowed allocated buffers. A
counter is stored in the new field <stream_buf_count>.
For the moment, the value is hardcoded to 30.
On stream buffer allocation failure, the qcc MUX is flagged with
QC_CF_CONN_FULL. The MUX is then woken up as soon as a buffer is freed,
most notably on ACK reception.
Acknowledge of STREAM has been complexified with the introduction of
stream multi buffers. Two functions are executing roughly the same set
of instructions in xprt_quic.c.
To simplify this, move the code complexity in a new function
qc_stream_desc_ack(). It will handle offset calculation, removal of
data, freeing oldest buffer and freeing stream instance if required.
The qc_stream_desc API is cleaner as qc_stream_desc_free_buf() ambiguous
function has been removed.
Complete the qc_stream_desc type to support multiple buffers on
emission. The main objective is to increase the transfer throughput.
The MUX is now able to transfer more data without having to wait ACKs.
To implement this feature, a new type qc_stream_buf is declared. it
encapsulates a buffer with a list element. New functions are defined to
retrieve the current buffer, release it or allocate a new one. Each
buffer is kept in the qc_stream_desc list until all of its data is
acknowledged.
On the MUX side, a qcs uses the current stream buffer to transfer data.
Once the buffer is full, it is released and a new one will be allocated
on a future qc_send() invocation.
Simplify the model qcs/qc_stream_desc. Each types has now its own tree
node, stored respectively in qcc and quic-conn trees. It is still
necessary to mark the stream as detached by the MUX once all data is
transfered to the lower layer.
This might improve slightly the performance on ACK management as now
only the lookup in quic-conn is necessary. On the other hand, memory
size of qcs structure is increased.
Regroup all type definitions and functions related to qc_stream_desc in
the source file src/quic_stream.c.
qc_stream_desc complexity will be increased with the development of Tx
multi-buffers. Having a dedicated module is useful to mix it with
pure transport/quic-conn code.
A released qc_stream_desc is freed as soon as all its buffer content has
been acknowledged. However, it may still contains other frames waiting
for ACK pointing to deleted buffer content. This can happen on
retransmission.
When freeing a qc_stream_desc, free all its frames in acked_frms tree to
fix memory leak. This may also possibly fix a crash on retransmission.
Now, the frames are properly removed from a packet. This ensure we do
not retransmit a frame whose buffer is deallocated.
Emit a CONNECTION_CLOSE if the app layer cannot be properly initialized
on qc_xprt_start. This force the quic-conn to enter the closing state
before being closed.
Without this, quic-conn normal operations continue, despite the
app-layer reported as not initialized. This behavior is undefined, in
particular when handling STREAM frames.
Fix the return value used in quic-conn start callback for error. The
caller expects a negative value in this case.
Without this patch, the quic-conn and the connection stack are not
closed despite an initialization failure error, which is an undefined
behavior and may cause a crash in the end.
If the client does not sent an ALPN, the SSL ALPN negotiation callback
is not called. However, the handshake is reported as successful. Check
just after SSL_do_handshake if an ALPN was negotiated. If not, emit a
CONNECTION_CLOSE with a TLS alert to close the connection.
This prevent a crash in qcc_install_app_ops() called with null as second
parameter value.
It was supposed to be there, and probably was not placed there due to
historic limitations in listener_accept(), but now there does not seem
to be a remaining valid reason for keeping the quic_conn out of the
handle. In addition in new_quic_cli_conn() the handle->fd was incorrectly
set to the listener's FD.
By being able to return the ssl_sock_ctx, we're now enabling the whole
set of SSL sample fetch methods to work on the current SSL context of
the QUIC connection, as seen in the following test showing a request
forwarded to an HTTP/1 server with plenty of SSL headers filled:
00000001:decrypt.clireq[000f:ffffffff]: GET / HTTP/1.1
00000001:decrypt.clihdr[000f:ffffffff]: host: localhost
00000001:decrypt.clihdr[000f:ffffffff]: user-agent: nghttp3/ngtcp2 client
00000001:decrypt.clihdr[000f:ffffffff]: x-src: 127.0.0.1
00000001:decrypt.clihdr[000f:ffffffff]: x-dst: 127.0.0.4
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_f_serial: D16197E7D3E634E9
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_f_key_alg: rsaEncryption
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_f_sig_alg: RSA-SHA1
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc: 1
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_has_sni: 1
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_sni: blah
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_alpn: h3
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_protocol: TLSv1.3
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_cipher: TLS_AES_256_GCM_SHA384
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_alg_keysize: 256
00000001:decrypt.clihdr[000f:ffffffff]: x-ssl_fc_use_keysize: 256
00000001:decrypt.clihdr[000f:ffffffff]: x-forwarded-for: 127.0.0.1
The code is trivial, but this is marked as medium as there's always
the risk that some of the callable functions do not like being called
on such SSL contexts.
OpenSSL 3.0 warns that ERR_func_error_string() is deprecated. Using
ERR_peek_error_func() solves it instead, and this function was added to
the compat layer by commit 1effd9aa0 ("MINOR: ssl: Remove call to
ERR_func_error_string with OpenSSLv3").
We modify the key update feature implementation to support reusable cipher contexts
as this is done for the other cipher contexts for packet decryption and encryption.
To do so we attach a context to the quic_tls_kp struct and initialize it each time
the underlying secret key is updated. Same thing when we rotate the secrets keys,
we rotate the contexts as the same time.
These settings are potentially cancelled by others setting initialization shared
with SSL sock bindings. This will have to be clarified when we will adapt the
QUIC bindings configuration.
Add ->ctx new member field to quic_tls_secrets struct to store the cipher context
for each QUIC TLS context TX/RX parts.
Add quic_tls_rx_ctx_init() and quic_tls_tx_ctx_init() functions to initialize
these cipher context for RX and TX parts respectively.
Make qc_new_isecs() call these two functions to initialize the cipher contexts
of the Initial secrets. Same thing for ha_quic_set_encryption_secrets() to
initialize the cipher contexts of the subsequent derived secrets (ORTT, Handshake,
1RTT).
Modify quic_tls_decrypt() and quic_tls_encrypt() to always use the same cipher
context without allocating it each time they are called.
Define a new API to notify the MUX from the quic-conn when the
connection is about to be closed. This happens in the following cases :
- on idle timeout
- on CONNECTION_CLOSE emission or reception
The MUX wake callback is called on these conditions. The quic-conn
QUIC_FL_NOTIFY_CLOSE is set to only report once. On the MUX side,
connection flags CO_FL_SOCK_RD_SH|CO_FL_SOCK_WR_SH are set to interrupt
future emission/reception.
This patch is the counterpart to
"MEDIUM: mux-quic: report CO_FL_ERROR on send".
Now the quic-conn is able to report its closing, which may be translated
by the MUX into a CO_FL_ERROR on the connection for the upper layer.
This allows the MUX to properly react to the QUIC closing mechanism for
both idle-timeout and closing/draining states.
When freeing a quic-conn, the streams resources attached to it must be
cleared. This code is already implemented but the streams buffer was not
deallocated.
Fix this by using the function qc_stream_desc_free. This existing
function centralize all operations to properly free all streams
elements, attached both to the MUX and the quic-conn.
This fixes a memory leak which can happen for each released connection.
This flag was used to notify the MUX about a CONNECTION_CLOSE frame
reception. It is now unused on the MUX side and can be removed. A new
mechanism to detect quic-conn closing will be soon implemented.
Rationalize the lifetime of the quic-conn regarding with the MUX. The
quic-conn must not be freed if the MUX is still allocated.
This simplify the MUX code when accessing the quic-conn and removed
possible segfaults.
To implement this, if the quic-conn timer expired, the quic-conn is
released only if the MUX is not allocated. Else, the quic-conn is
flagged with QUIC_FL_CONN_EXP_TIMER. The MUX is then responsible
to call quic_close() which will free the flagged quic-conn.
New received packets after sending CONNECTION_CLOSE frame trigger a new
CONNECTION_CLOSE frame to be sent. Each time such a frame is sent we
increase the number of packet required to send another CONNECTION_CLOSE
frame.
Rearm only one time the idle timer when sending a CONNECTION_CLOSE frame.
This should be useful to have an idea of the list of frames which could be built
towards the list of available frames when building packets.
Same thing about the frames which could not be built because of a lack of room
in the TX buffer.
During a handshake, after having prepared a probe upon a PTO expiration from
process_timer(), we wake up the I/O handler to make it send probing packets.
This handler first treat incoming packets which trigger a fast retransmission
leading to send too much probing (duplicated) packets. In this cas we cancel
the fast retranmission.
When discarding a packet number space, we at least reset the PTO backoff counter.
Doing this several times have an impact on the PTO duration calculation.
We must not discard a packet number space several times (this is already the case
for the handshake packet number space).
Before having a look at the next encryption level to build packets if there is
no more ack-eliciting frames to send we must check we have not to probe from
the current encryption level anymore. If not, we only send one datagram instead
of sending two datagrams giving less chance to recover from packet loss.
Due to a erroneous interpretation of the RFC 9000 (quic-transport), ACKs frames
were always sent only after having received two ack-eliciting packets.
This could trigger useless retransmissions for tail packets on the peer side.
For now on, we send as soon as possible ACK frames as soon as we have ACK to send,
in the same packets as the ack-eliciting frame packets, and we also send ACK
frames after having received 2 ack-eliciting packets since the last time we sent
an ACK frame with other ack-eliciting frames.
As such variables are handled by the QUIC connection I/O handler which runs
always on the thread, there is no need to continue to use such atomic operations
This bug has come with this commit:
1fc5e16c4 MINOR: quic: More accurate immediately close
As mentionned in this commit we do not want to derive anymore secret when in closing
state. But the flag which denote secrets were derived was set. Add a label at
the correct flag to skip the secrets derivation without setting this flag.
The new qc_stream_desc type has a tree node for storage. Thus, we can
remove the node in the qcs structure.
When initializing a new stream, it is stored into the qcc streams_by_id
tree. When the MUX releases it, it will freed as soon as its buffer is
emptied. Before this, the quic-conn is responsible to store it inside
its own streams_by_id tree.
Move the xprt-buf and ack related fields from qcs to the qc_stream_desc
structure. In exchange, qcs has a pointer to the low-level stream. For
each new qcs, a qc_stream_desc is automatically allocated.
This simplify the transport layer by removing qcs/mux manipulation
during ACK frame parsing. An additional check is done to not notify the
MUX on sending if the stream is already released : this case may now
happen on retransmission.
To complete this change, the quic_stream frame now references the
quic_stream instance instead of a qcs.
Currently, the mux qcs streams manage the Tx buffering, even after
sending it to the transport layer. Buffers are emptied when
acknowledgement are treated by the transport layer. This complicates the
MUX liberation and we may loose some data after the MUX free.
Change this paradigm by moving the buffering on the transport layer. For
this goal, a new type is implemented as low-level stream at the
transport layer, as a counterpart of qcs mux instances. This structure
is called qc_stream_desc. This will allow to free the qcs/qcc instances
without having to wait for acknowledge reception.
For the moment, the quic-conn is responsible to store the qc_stream_desc
in a new tree named streams_by_id. This will sligthly change in the next
commits to remove the qcs node which has a similar purpose :
qc_stream_desc instances will be shared between the qcc MUX and the
quic-conn.
This patch only introduces the new type definition and the function to
manipulate it. The following commit will bring the rearchitecture in the
qcs structure.
The quic_stream frame stores the qcs instance. On ACK parsing, qcs is
accessed to clear the stream buffer. This can cause a segfault if the
MUX or the qcs is already released.
Consider the following scenario :
1. a STREAM frame is generated by the MUX
transport layer emits the frame with PKN=1
upper layer has finished the transfer so related qcs is detached
2. transport layer reemits the frame with PKN=2 because ACK was not
received
3. ACK for PKN=1 is received, stream buffer is cleared
at this stage, qcs may be freed by the MUX as it is detached
4. ACK for PKN=2 is received
qcs for STREAM frame is dereferenced which will lead to a crash
To prevent this, qcs is never accessed from the quic_stream during ACK
parsing. Instead, a lookup is done on the MUX streams tree. If the MUX
is already released, no lookup is done. These checks prevents a possible
segfault.
This change may have an impact on the perf as now we are forced to use a
tree lookup operation. If this is the case, an alternative solution may
be to implement a refcount on qcs instances.
After having consumed <i> bytes from <buf>, the remaining available room to be
passed to generate_retry_token() is sizeof(buf) - i.
This bug could be easily reproduced with quic-qo as client which chooses a random
value as ODCID length.
This commit is similar to the previous one but with MAX_DATA frames.
This allows to increase the connection level flow-control limit. If the
connection was blocked due to QC_CF_BLK_MFCTL flag, the flag is reseted.
Implement a MUX method to parse MAX_STREAM_DATA. If the limit is greater
than the previous one and the stream was blocked, the flag
QC_SF_BLK_SFCTL is removed.
We must consider the peer address as validated as soon as we received an
handshake packet. An ACK frame in handshake packet was too restrictive.
Rename the concerned flag to reflect this situation.
We must be able to handle 1RTT packets after the mux has terminated its job
(qc->mux_state == QC_MUX_RELEASED). So the condition (qc->mux_state != QC_MUX_READY)
in qc_qel_may_rm_hp() is not correct when we want to wait for the mux to be started.
Add a check in qc_parse_pkt_frms() to ensure is started before calling it. All
the STREAM frames will be ignored when the mux will be released.
The most important one is the ->flags member which leads to an erratic xprt behavior.
For instance a non ack-eliciting packet could be seen as ack-eliciting leading the
xprt to try to retransmit a packet which are not ack-eliciting. In this case, the
xprt does nothing and remains indefinitively in a blocking state.
There are non already identified rare cases where qc_build_frms() does not manage
to size frames to be encoded in a packet leading qc_build_frm() to fail to add
such frame to the packet to be built. In such cases we must move back such
frames to their origin frame list passed as parameter to qc_build_frms(): <frms>.
because they were added to the packet frame list (but not built). If this
this packet is not retransmitted, the frame is lost for ever! Furthermore we must
not modify the buffer.
The TX packet refcounting had come with the multithreading support but not only.
It is very useful to ease the management of the memory allocated for TX packets
with TX frames attached to. At some locations of the code we have to move TX
frames from a packet to a new one during retranmission when the packet has been
deemed as lost or not. When deemed lost the memory allocated for the paquet must
be released contrary to when its frames are retransmitted when probing (PTO).
For now on, thanks to this patch we handle the TX packets memory this way. We
increment the packet refcount when:
- we insert it in its packet number space tree,
- we attache an ack-eliciting frame to it.
And reciprocally we decrement this refcount when:
- we remove an ack-eliciting frame from the packet,
- we delete the packet from its packet number space tree.
Note that an optimization WOULD NOT be to fully reuse (without releasing its
memorya TX packet to retransmit its contents (its ack-eliciting frames). Its
information (timestamp, in flight length) to be processed by packet loss detection
and the congestion control.
When building a packet with an ACK frame, we store the largest acknowledged
packet number sent in this frame in the packet (quic_tx_packet struc).
When receiving an ack for such a packet we can purge the tree of acknowledged
packet number ranges from the range sent before this largest acknowledged
packet number.
This struct member stores the largest acked packet number which was received. It
is used to build (TX) packet. But this is confusing to store it in the tx packet
of the packet number space structure even if it is used to build and transmit
packets.
Add qc_may_reuse_cbuf() function used by qc_prep_pkts() and qc_prep_app_pkts().
Simplification of the factorized section code: there is no need to check there
is enough room to mark the end of the data in the TX buf. This is done by
the callers (qc_prep_pkts() and qc_prep_app_pkts()). Add a diagram to explain
the conditions which must be verified to be able to reuse a cbuf struct.
This should improve the QUIC stack implementation maintenability.
This commit reverts this one:
"d5066dd9d BUG/MEDIUM: quic: qc_prep_app_pkts() retries on qc_build_pkt() failures"
After having filled the congestion control window, qc_build_pkt() always fails.
Then depending on the relative position of the writer and reader indexes for the
TX buffer, this could lead this function to try to reuse the buffer even if not full.
In such case, we do not always mark the end of the data in this TX buffer. This
is something the reader cannot understand: it reads a false datagram length,
then a wrong packet address from the TX buffer, leading to an invalid pointer
dereferencing.
Implement a new MUX function qcc_notify_send. This function must be
called by the transport layer to confirm the sending of STREAM data to
the MUX.
For the moment, the function has no real purpose. However, it will be
useful to solve limitations on push frame and implement the flow
control.