MAX_STREAM_DATA can be used as the first frame of a stream. In this
case, the stream should be opened, if it respects flow-control limit. To
implement this, simply replace plain lookup in stream tree by
qcc_get_qcs() at the start of the parsing function. This automatically
takes care of opening the stream if not already done.
As specified by RFC 9000, if MAX_STREAM_DATA is receive for a
receive-only stream, a STREAM_STATE_ERROR connection error is emitted.
First we add a loop around recfrom() into the most low level I/O handler
quic_sock_fd_iocb() to collect as most as possible datagrams before during
its tasklet wakeup with a limit: we recvfrom() at most "maxpollevents"
datagrams. Furthermore we add a local task list into the datagram handler
quic_lstnr_dghdlr() which is passed to the first datagrams parser qc_lstnr_pkt_rcv().
This latter parser only identifies the connection associated to the datagrams then
wakeup the highest level packet parser I/O handlers (quic_conn.*io_cb()) after
it is done, thanks to the call to tasklet_wakeup_after() which replaces from now on
the call to tasklet_wakeup(). This should reduce drastically the latency and the
chances to fulfil the RX buffers at the QUIC connections level as reported in
GH #1737 by Tritan.
These modifications depend on this commit:
"MINOR: task: Add tasklet_wakeup_after()"
Must be backported to 2.6 with the previous commit.
Remove the call to qc_list_all_rx_pkts() which print messages on stderr
during RX buffer overruns and add a new counter for the number of dropped packets
because of such events.
Must be backported to 2.6
When the connection RX buffer is full, the received packets are dropped.
Some of them were not taken into an account by the ->dropped_pkt counter.
This very simple patch has no impact at all on the packet handling workflow.
Must be backported to 2.6.
In GH #1760 (which is marked as being a feature), there were compilation
errors on MacOS which could be reproduced in Linux when building 32-bit code
(-m32 gcc option). Most of them were due to variables types mixing in QUIC_MIN macro
or using size_t type in place of uint64_t type.
Must be backported to 2.6.
A curious practise seems to have started long ago and contaminated various
code areas, consisting in appending "_pool" at the end of the name of a
given pool. That makes no sense as the name is only used to name the pool
in diags such as "show pools", and since names are truncated there, this
adds some confusion when analysing the dump outputs. Let's just clean all
of them at once. there were essentially in SSL and QUIC.
This bug was revealed by key_update QUIC tracker test. During this test,
after the handshake has succeeded, the client sends a 1-RTT packet containing
only a PING frame. On our side, we do not acknowledge it, because we have
no ack-eliciting packet to send. This is not correct. We must acknowledge all
the ack-eliciting packets unconditionally. But we must not send too much
ACK frames: every two packets since we have sent an ACK frame. This is the test
(nb_aepkts_since_last_ack >= QUIC_MAX_RX_AEPKTS_SINCE_LAST_ACK) which ensure
this is the case. But this condition must be checked at the very last time,
when we are building a packet and when an acknowledgment is required. This
is not to qc_may_build_pkt() to do that. This boolean function decides if
we have packets to send. It must return "true" if there is no more ack-eliciting
packets to send and if an acknowledgement is required.
We must also add a "force_ack" parameter to qc_build_pkt() to force the
acknowledments during the handshakes (for each packet). If not, they will
be sent every two packets. This parameter must be passed to qc_do_build_pkt()
and be checked alongside the others conditions when building a packet to
decide to add an ACK frame or not to this packet.
Must be backported to 2.6.
Implement quic_tp_version_info_dump() to dump such a transport parameter (only remote).
Call it from quic_transport_params_dump() which dump all the transport parameters.
Can be backported to 2.6 as it's useful for debugging.
All packets received during hanshakes must be acknowledged asap. This was
not the case for Handshake packets received. At this time, this had
no impact because the client has often only one Handshake packet to send
and last handshake to be sent on our side always embeds an HANDSHAKE_DONE
frame which leads the client to consider it has no more handshake packet
to send.
Add <force_ack> to qc_may_build_pkt() to force an ACK frame to be sent.
Set this parameter to 1 when sending packets from Initial or Handshake
packet number spaces, 0 when sending only Application level packet.
Must be backported to 2.6.
Free Rx packet in the datagram handler if the packet was not taken in
charge by a quic_conn instance. This is reflected by the packet refcount
which is null.
A packet can be rejected for a variety of reasons. For example, failed
decryption, no Initial token and Retry emission or for datagram null
padding.
This patch should resolve the Rx packets memory leak observed via "show
pools" with the previous commit
2c31e12936
BUG/MINOR: quic: purge conn Rx packet list on release
This specific memory leak instance was reproduced using quiche client
which uses null datagram padding.
This should partially resolve github issue #1751.
It must be backported up to 2.6.
When releasing a quic_conn instance, free all remaining Rx packets in
quic_conn.rx.pkt_list. This partially fixes a memory leak on Rx packets
which can be observed after several QUIC connections establishment.
This should partially resolve github issue #1751.
It must be backported up to 2.6.
This counter must be incremented only one time by connection and decremented
as soon as the handshake has failed or succeeded. This is a gauge. Under certain
conditions this counter could be decremented twice. For instance
after having received a TLS alert, then upon SSL_do_handshake() failure.
To stop having to deal to all the current combinations which can lead to such a
situation (and the next to come), add a connection flag to denote if this counter
has been already decremented for a connection. So, this counter must be decremented
only if this flag has not been already set.
Must be backported up to 2.6.
Non constant expressions were used to initialize constant variables leading to
such compilation errors:
src/xprt_quic.c:66:3: error: initializer element is not a constant expression
.key_label_len = strlen(QUIC_HKDF_KEY_LABEL_V1),
Reproduced with CC=gcc-4.9 compilation option.
Fix using macros for each HKDF label.
At this time haproxy supported only incompatible version negotiation feature which
consists in sending a Version Negotiation packet after having received a long packet
without compatible value in its version field. This version value is the version
use to build the current packet. This patch does not modify this behavior.
This patch adds the support for compatible version negotiation feature which
allows endpoints to negotiate during the first flight or packets sent by the
client the QUIC version to use for the connection (or after the first flight).
This is done thanks to "version_information" parameter sent by both endpoints.
To be short, the client offers a list of supported versions by preference order.
The server (or haproxy listener) chooses the first version it also supported as
negotiated version.
This implementation has an impact on the tranport parameters handling (in both
direcetions). Indeed, the server must sent its version information, but only
after received and parsed the client transport parameters). So we cannot
encode these parameters at the same time we instantiated a new connection.
Add QUIC_TP_DRAFT_VERSION_INFORMATION(0xff73db) new transport parameter.
Add tp_version_information new C struct to handle this new parameter.
Implement quic_transport_param_enc_version_info() (resp.
quic_transport_param_dec_version_info()) to encode (resp. decode) this
parameter.
Add qc_conn_finalize() which encodes the transport parameters and configure
the TLS stack to send them.
Add ->negotiated_ictx quic_conn C struct new member to store the Initial
QUIC TLS context for the negotiated version. The Initial secrets derivation
is version dependent.
Rename ->version to ->original_version and add ->negotiated_version to
this C struct to reflect the QUIC-VN RFC denomination.
Modify most of the QUIC TLS API functions to pass a version as parameter.
Export the QUIC version definitions to be reused at least from quic_tp.c
(transport parameters.
Move the token check after the QUIC connection lookup. As this is the original
version which is sent into a Retry packet, and because this original version is
stored into the connection, we must check the token after having retreived this
connection.
Add packet version to traces.
See https://datatracker.ietf.org/doc/html/draft-ietf-quic-version-negotiation-08
for more information about this new feature.
This is not clear at all how to distinguish a QUIC draft version number from a
released one. And among these QUIC draft versions, which one must use the draft
QUIC TLS extension.
According to the QUIC implementations which support v2 draft, the TLS extension
(transport parameters) to be used is the released one
(TLS_EXTENSION_QUIC_TRANSPORT_PARAMETERS).
As the unique QUIC draft version we support is 0xff00001d and as at this time the
unique version with 0xff as most significant byte is this latter which must use
the draft TLS extension, we select the draft TLS extension
(TLS_EXTENSION_QUIC_TRANSPORT_PARAMETERS_DRAFT) only for such versions with 0xff
as most signification byte.
This is becoming difficult to handle the QUIC TLS related definitions
which arrive with a QUIC version (draft or not). So, here we add
quic_version C struct which does not define only the QUIC version number,
but also the QUIC TLS definitions which depend on a QUIC version.
Modify consequently all the QUIC TLS API to reuse these definitions
through new quic_version C struct.
Implement quic_pkt_type() function which return a packet type (0 up to 3)
depending on the QUIC version number.
Stop harding the Retry packet first byte in send_retry(): this is not more
possible because the packet type field depends on the QUIC version.
Also modify quic_build_packet_long_header() for the same reason: the packet
type depends on the QUIC version.
Add a quic_version C struct member to quic_conn C struct.
Modify qc_lstnr_pkt_rcv() to set this member asap.
Remove the version member from quic_rx_packet C struct: a packet is attached
asap to a connection (or dropped) which is the unique object which should
store the QUIC version.
Modify qc_pkt_is_supported_version() to return a supported quic_version C
struct from a version number.
Add Initial salt for QUIC v2 draft (initial_salt_v2_draft).
This is to prepare the support for QUIC v2 version. The packet type
depends on the version. So, we must parse early enough the version
before defining the type of each packet.
The nonce and keys used to cipher the Retry tag depend on the QUIC version.
Add these definitions for 0xff00001d (draft-29) and v2 QUIC version. At least
draft-29 is useful for QUIC tracker tests with "quic-force-retry" enabled
on haproxy side.
Validated with -v 0xff00001d ngtcp2 option.
Could not validate the v2 nonce and key at this time because not supported.
Use the same version as the one received. This is safe because the
version is treated before anything else sending a Version packet.
Must be backported to 2.6.
This simply replaces a call to task_new(1<<thr) with task_new_on(thr)
so that we can later isolate the changes required to add more thread
group stuff.
Building QUIC with gcc-4.4 on el6 shows this error:
src/xprt_quic.c: In function 'qc_release_lost_pkts':
src/xprt_quic.c:1905: error: unknown field 'loss' specified in initializer
compilation terminated due to -Wfatal-errors.
make: *** [src/xprt_quic.o] Error 1
make: *** Waiting for unfinished jobs....
Initializing an anonymous form of union like :
struct quic_cc_event ev = {
(...)
.loss.time_sent = newest_lost->time_sent,
(...)
};
generates an error with gcc-4.4 but not when initializing the
fields outside of the declaration.
The MUX now provides a single API for both uni and bidirectional
streams. It is responsible to reject reception on a local unidirectional
stream with the error STREAM_STATE_ERROR. This is already implemented in
qcc_recv(). As such, remove this duplicated check from xprt_quic.c.
If the connection client timeout has expired, the mux is released.
If the client decides to initiate a new request, we send a STOP_SENDING
frame. Then, the client endessly sends a RESET_STREAM frame.
At this time, we simulate the fact that we support the RESET_STREAM frame
thanks to this ridiculously minimalistic patch.
If the connection client timeout has expired, the mux is released.
If the client decides to initiate a new request, we do not ack its
request. This leads the client to endlessly sent it request.
This patch makes a QUIC listener send a STOP_SENDING frame in such
a situation.
Add ->inc_err_cnt new callback to qcc_app_ops struct which can
be called from xprt to increment the application level error code counters.
It take the application context as first parameter to be generic and support
new QUIC applications to come.
Add h3_stats.c module with counters for all the frame types and error codes.
Add new counters to count the number of dropped packet upon parsing error, lost
sent packets and the number of stateless reset packet sent.
Take the oppportunity of this patch to rename CONN_OPENINGS to QUIC_ST_HALF_OPEN_CONN
(total number of half open connections) and QUIC_ST_HDSHK_FAILS to QUIC_ST_HDSHK_FAIL.
When we select the next encryption level in qc_treat_rx_pkts() we
must reset the local largest_pn variable if we do not want to reuse its
previous value for this encryption. This bug could only happend during
handshake step and had no visible impact because this variable
is only used during the header protection removal step which hopefully
supports the packet reordering.
Add quic_transport_params_dump() static inline function to do so for
a quic_transport_parameters struct as parameter.
We use the trace API do dump these transport parameters both
after they have been initialized (RX/local) or received (TX/remote).
Make the transport parameters be standlone as much as possible as
it consists only in encoding/decoding data into/from buffers.
Reduce the size of xprt_quic.h. Unfortunalety, I think we will
have to continue to include <xprt_quic-t.h> to use the trace API
into this module.
We do not want to count the out of packet padding as being belonging
to an invalid packet, the firt byte of a QUIC packet being never null.
Some browsers like firefox proceeds this way to add PADDING frames
after an Initial packet and increase the size of their Initial packets.
The whole QUIC stack is impacted by this change :
* at quic-conn level, a single function is now used to handle uni and
bidirectional streams. It uses qcc_recv() function from MUX.
* at MUX level, qc_recv() io-handler function does not skip uni streams
* most changes are conducted at app layer. Most notably, all received
data is handle by decode_qcs operation.
Now that decode_qcs is the single app read function, the H3 layer can be
simplified. Uni streams parsing was extracted from h3_attach_ruqs() to
h3_decode_qcs().
h3_decode_qcs() is able to deal with all HTTP/3 frame types. It first
check if the frame is valid for the H3 stream type. Most notably,
SETTINGS parsing was moved from h3_control_recv() into h3_decode_qcs().
This commit has some major benefits besides removing duplicated code.
Mainly, QUIC flow control is now enforced for uni streams as with bidi
streams. Also, an unknown frame received on control stream does not set
an error : it is now silently ignored as required by the specification.
Some cleaning in H3 code is already done with this patch :
h3_control_recv() and h3_attach_ruqs() are removed as they are now
unused. A final patch should clean up the unneeded remaining bit.
Complete quic-conn API for error reporting. A new parameter <app> is
defined in the function quic_set_connection_close(). This will transform
the frame into a CONNECTION_CLOSE_APP type.
This type of frame will be generated by the applicative layer, h3 or
hq-interop for the moment. A new function qcc_emit_cc_app() is exported
by the MUX layer for them.
This reverts commit 118b2cbf84.
This patch was useful mainly for the docker image of QUIC interop to
have traces on stdout.
A better solution has been found by integrating this patch directly in
the qns repository which is used to build the docker image. Thus, this
hack is not require anymore in the main repository.
When we receive a CONNECTION_CLOSE frame, we should decrement this counter
if the handshake state was not successful and if we have not received
a TLS alert from the TLS stack.
Define an API to easily set a CONNECTION_CLOSE. This will mainly be
useful for the MUX when an error is detected which require to close the
whole connection.
On the MUX side, a new flag is added when a CONNECTION_CLOSE has been
prepared. This will disable add future send operations.
We rely on <conn_opening> stats counter and tune.quic.retry_threshold
setting to dynamically start sending Retry packets. We continue to send such packets
when "quic-force-retry" setting is set. The difference is when we receive tokens.
We check them regardless of this setting because the Retry could have been
dynamically started. We must also send Retry packets when we receive Initial
packets without token if the dynamic Retry threshold was reached but only for connection
which are not currently opening or in others words for Initial packets without
connection already instantiated. Indeed, we must not send Retry packets for all
Initial packets without token. For instance a client may have already sent an
Initial packet without receiving Retry packet because the Retry feature was not
started, then the Retry starts on exeeding the threshold value due to others
connections, then finally our client decide to send another Initial packet
(to ACK Initial CRYPTO data for instance). It does this without token. So, for
this already existing connection we must not send a Retry packet.
First commit to handle the QUIC stats counters. There is nothing special to say
except perhaps for ->conn_openings which is a gauge to count the number of
connection openings. It is incremented after having instantiated a quic_conn
struct, then decremented when the handshake was successful (handshake completed
state) or failed or when the connection timed out without reaching the handshake
completed state.
Move the code which finalizes the QUIC connections initialisations after
having called qc_new_conn() into this function to benefit from its
error handling to release the memory allocated for QUIC connections
the initialization of which could not be finalized.
Here is the format of a token:
- format (1 byte)
- ODCID (from 9 up 21 bytes)
- creation timestamp (4 bytes)
- salt (16 bytes)
A format byte is required to distinguish the Retry token from others sent in
NEW_TOKEN frames.
The Retry token is ciphered after having derived a strong secret from the cluster secret
and generated the AEAD AAD, as well as a 16 bytes long salt. This salt is
added to the token. Obviously it is not ciphered. The format byte is not
ciphered too.
The AAD are built by quic_generate_retry_token_aad() which concatenates the version,
the client SCID and the IP address and port. We had to implement quic_saddr_cpy()
to copy the IP address and port to the AAD buffer. Only the Retry SCID is generated
on our side to build a Retry packet, the others fields come from the first packet
received by the client. It must reuse this Retry SCID in response to our Retry packet.
So, we have not to store it on our side. Everything is offloaded to the client (stateless).
quic_generate_retry_token() must be used to generate a Retry packet. It calls
quic_pkt_encrypt() to cipher the token.
quic_generate_retry_check() must be used to check the validity of a Retry token.
It is able to decipher a token which arrives into an Initial packet in response
to a Retry packet. It calls parse_retry_token() after having deciphered the token
to store the ODCID into a local quic_cid struct variable. Finally this ODCID may
be stored into the transport parameter thanks to qc_lstnr_params_init().
The Retry token lifetime is 10 seconds. This lifetime is also checked by
quic_generate_retry_check(). If quic_generate_retry_check() fails, the received
packet is dropped without anymore packet processing at this time.
Slightly change the interface for qcc_recv() between MUX and XPRT. The
MUX is now responsible to call qcc_decode_qcs(). This is cleaner as now
the XPRT does not have to deal with an extra QCS parameter and the MUX
will call qcc_decode_qcs() only if really needed.
This change is possible since there is no extra buffering for
out-of-order STREAM frames and the XPRT does not have to handle buffered
frames.
The quic-conn manages a buffer to store received QUIC packets. When the
buffer wraps, the gap is filled until the end with junk and packets can
be inserted at the start of the buffer.
On the other end, deletion is implemented via quic_rx_pkts_del().
Packets are removed one by one if their refcount is nul. If junk is
found, the buffer is emptied until its wrap.
This seems to work in most cases but a bug was found in a particular
case : on insertion if buffer gap is not at the end of the buffer. In
this case, the gap was filled, which is useless as now the buffer is
full and the packet cannot be inserted. Worst, on deletion, when junk is
removed there is a risk to removed new packets. This can happens in the
following case :
1. buffer contig space is too small, junk is inserted in the middle of
it
2. on quic_rx_pkts_del() invocation, a packet is removed, but not the
next one because its refcount is still positive. When a new packet is
received, it will be stored after the junk.
3. on next quic_rx_pkts_del(), when junk is removed, all contig data is
cleared, with newer packets data too.
This will cause a transfer between a client and haproxy to be stalled.
This can be reproduced with big enough POST requests. I triggered it
with ngtcp2 and 10M of posted data.
Hopefully, the solution of this bug is simple. If contig space is not
big enough to store a packet, but the space is not at the end of the
buffer, no junk is inserted and the packet is dropped as we cannot
buffered it. This ensures that junk is only present at the end of the
buffer and when removed no packets data is purged with it.
quic_rx_strm_frm type was used to buffered STREAM frames received out of
order. Now the MUX is able to deal directly with these frames and
buffered it inside its ncbuf.