The conn-stream endpoint is now shared between the conn-stream and the
applet or the multiplexer. If the mux or the applet is created first, it is
responsible to also create the endpoint and share it with the conn-stream.
If the conn-stream is created first, it is the opposite.
When the endpoint is only owned by an applet or a mux, it is called an
orphan endpoint (there is no conn-stream). When it is only owned by a
conn-stream, it is called a detached endpoint (there is no mux/applet).
The last entity that owns an endpoint is responsible to release it. When a
mux or an applet is detached from a conn-stream, the conn-stream
relinquishes the endpoint to recreate a new one. This way, the endpoint
state is never lost for the mux or the applet.
Group the endpoint target of a conn-stream, its context and the associated
flags in a dedicated structure in the conn-stream. It is not inlined in the
conn-stream structure. There is a dedicated pool.
For now, there is no complexity. It is just an indirection to get the
endpoint or its context. But the purpose of this structure is to be able to
share a refcounted context between the mux and the conn-stream. This way, it
will be possible to preserve it when the mux is detached from the
conn-stream.
This change is only significant for the multiplexer part. For the applets,
the context and the endpoint are the same. Thus, there is no much change. For
the multiplexer part, the connection was used to set the conn-stream
endpoint and the mux's stream was the context. But it is a bit strange
because once a mux is installed, it takes over the connection. In a
wonderful world, the connection should be totally hidden behind the mux. The
stream-interface and, in a lesser extent, the stream, still access the
connection because that was inherited from the pre-multiplexer era.
Now, the conn-stream endpoint is the mux's stream (an opaque entity for the
conn-stream) and the connection is the context. Dedicated functions have
been added to attached an applet or a mux to a conn-stream.
It was supposed to be there, and probably was not placed there due to
historic limitations in listener_accept(), but now there does not seem
to be a remaining valid reason for keeping the quic_conn out of the
handle. In addition in new_quic_cli_conn() the handle->fd was incorrectly
set to the listener's FD.
The mux didn't have its flags nor name set, as seen in this output of
"haproxy -vv":
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
quic : mode=HTTP side=FE mux= flags=
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
This might have random impacts at certain points like forcing some
connections to close instead of aborting a stream, or not always
handling certain streams as fully HTX-compliant.
On qc_detach(), the qcs must cleared the conn-stream context and set its
cs pointer to NULL. This prevents the qcs to point to a dangling
reference.
Without this, a SEGFAULT may occurs in qc_wake_some_streams() when
accessing an already detached conn-stream instance through a qcs.
Here is the SEGFAULT observed on haproxy.org.
Program terminated with signal 11, Segmentation fault.
1234 else if (qcs->cs->data_cb->wake) {
(gdb) p qcs.cs.data_cb
$1 = (const struct data_cb *) 0x0
This can happens since the following patch :
commit fe035eca3a24ea0f031fdcdad23809bea5de32e4
MEDIUM: mux-quic: report errors on conn-streams
The stream mux buffering has been reworked since the introduction of the
struct qc_stream_desc. A qcs is now able to quickly release its buffer
to the quic-conn.
Define a new API to notify the MUX from the quic-conn when the
connection is about to be closed. This happens in the following cases :
- on idle timeout
- on CONNECTION_CLOSE emission or reception
The MUX wake callback is called on these conditions. The quic-conn
QUIC_FL_NOTIFY_CLOSE is set to only report once. On the MUX side,
connection flags CO_FL_SOCK_RD_SH|CO_FL_SOCK_WR_SH are set to interrupt
future emission/reception.
This patch is the counterpart to
"MEDIUM: mux-quic: report CO_FL_ERROR on send".
Now the quic-conn is able to report its closing, which may be translated
by the MUX into a CO_FL_ERROR on the connection for the upper layer.
This allows the MUX to properly react to the QUIC closing mechanism for
both idle-timeout and closing/draining states.
Complete the error reporting. For each attached streams, if CO_FL_ERROR
is set, mark them with CS_FL_ERR_PENDING|CS_FL_ERROR. This will notify
the upper layer to trigger streams detach and release of the MUX.
This reporting is implemented in a new function qc_wake_some_streams(),
called by qc_wake(). This ensures that a lower-layer error is quickly
reported to the individual streams.
Mark the connection with CO_FL_ERROR on qc_send() if the socket Tx is
closed. This flag is used by the upper layer to order a close on the
MUX. This requires to check CO_FL_ERROR in qcc_is_dead() to process to
immediate MUX free when set.
The qc_wake() callback has been completed. Most notably, it now calls
qc_send() to report a possible CO_FL_ERROR. This is useful because
qc_wake() is called by the quic-conn on imminent closing.
Note that for the moment the error flag can never be set because the
quic-conn does not report when the Tx socket is closed. This will be
implemented in a following patch.
Regroup all features related to sending in qc_send(). This will be
useful when qc_send() will be called outside of the io-cb.
Currently, flow-control frames generation is now automatically
integrated in qc_send().
Add a new app layer operation is_active. This can be used by the MUX to
check if the connection can be considered as active or not. This is used
inside qcc_is_dead as a first check.
For example on HTTP/3, if there is at least one bidir client stream
opened the connection is active. This explicitly ignore the uni streams
used for control and qpack as they can never be closed during the
connection lifetime.
Improve timeout handling on the MUX. When releasing a stream, first
check if the connection can be considered as dead and should be freed
immediatly. This allows to liberate resources faster when possible.
If the connection is still active, ensure there is no attached
conn-stream before scheduling the timeout. To do this, add a nb_cs field
in the qcc structure.
The new qc_stream_desc type has a tree node for storage. Thus, we can
remove the node in the qcs structure.
When initializing a new stream, it is stored into the qcc streams_by_id
tree. When the MUX releases it, it will freed as soon as its buffer is
emptied. Before this, the quic-conn is responsible to store it inside
its own streams_by_id tree.
Move the xprt-buf and ack related fields from qcs to the qc_stream_desc
structure. In exchange, qcs has a pointer to the low-level stream. For
each new qcs, a qc_stream_desc is automatically allocated.
This simplify the transport layer by removing qcs/mux manipulation
during ACK frame parsing. An additional check is done to not notify the
MUX on sending if the stream is already released : this case may now
happen on retransmission.
To complete this change, the quic_stream frame now references the
quic_stream instance instead of a qcs.
Remove qcs instances left during qcc MUX release. This can happen when
the MUX is closed before the completion of all the transfers, such as on
a timeout or process termination.
This may free some memory leaks on the connection.
Define a new callback release inside qcc_app_ops. It is called when the
qcc MUX is freed via qc_release. This will allows to implement cleaning
on the app layer.
Regroup some cleaning operations inside a new function qcs_free. This
can be used for all streams, both through qcs_destroy and with
uni-directional streams.
This commit is similar to the following one :
commit 118b2cbf8430a9513947c27a8403ff380e1dcaf2
MINOR: quic: activate QUIC traces at compilation
If the macro ENABLE_QUIC_STDOUT_TRACES is defined, qmux traces are
outputted automatically on stdout. This is useful for the haproxy-qns
interop docker image.
Add a new qmux trace event QMUX_EV_QCS_PUSH_FRM. Its only purpose is to
display the meaningful result of a qcs_push_frame invocation.
A dedicated struct qcs_push_frm_trace_arg is defined to pass a series of
extra args for the trace output.
Define a new qmux event QMUX_EV_SEND_FRM. This allows to pass a
quic_frame as an extra argument. Depending on the frame type, a special
format can be used to log the frame content.
Currently this event is only used in qc_send_max_streams. Thus the
handler is only able to handle MAX_STREAMS frames.
Convert all printfs in the mux-quic code with traces.
Note that some meaningul printfs were not converted because they use
extra args in a format-string. This is the case inside qcs_push_frame
and qc_send_max_streams. A dedicated trace event should be implemented
for them to be able to display the extra arguments.
Declare a new trace module for mux-quic named qmux. It will be used to
convert all printf to regular traces. The handler qmux_trace can uses a
connection and a qcs instance as extra arguments.
This commit is similar to the previous one but with MAX_DATA frames.
This allows to increase the connection level flow-control limit. If the
connection was blocked due to QC_CF_BLK_MFCTL flag, the flag is reseted.
Implement a MUX method to parse MAX_STREAM_DATA. If the limit is greater
than the previous one and the stream was blocked, the flag
QC_SF_BLK_SFCTL is removed.
This commit is similar to the previous one, but this time on the
connection level instead of the stream.
When the connection limit is reached, the connection is flagged with
QC_CF_BLK_MFCTL. This flag is checked in qc_send.
qcs_push_frame uses a new parameter which is used to not exceed the
connection flow-limit while calling it repeatdly over multiple streams
instance before transfering data to the transport layer.
Implement the flow-control max-streams-data limit on emission. We ensure
that we never push more than the offset limit set by the peer. When the
limit is reached, the stream is marked as blocked with a new flag
QC_SF_BLK_SFCTL to disable emission.
Currently, this is only implemented for bidirectional streams. It's
required to unify the sending for unidirectional streams via
qcs_push_frame from the H3 layer to respect the flow-control limit for
them.
Rename the fields used for flow-control in the qcc structure. The
objective is to have shorter name for better readability while keeping
their purpose clear. It will be useful when the flow-control will be
extended with new fields.
Add comments on qc_send and qcs_push_frame. Also adjust the return of
qc_send to reflect the total bytes sent. This has no impact as currently
the return value is not checked by the caller.
This could lead to a mux erratic behavior. Sometimes the application layer could
not wakeup the mux I/O handler because it estimated it had already subscribed
to write events (see h3_snd_buf() end of implementation).
This was revealed by libasan when each time qc_send_frames() is run at the first
time:
=================================================================
==84177==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fbaaca2b3c8 at pc 0x560a4fdb7c2e bp 0x7fbaaca2b300 sp 0x7fbaaca2b2f8
READ of size 1 at 0x7fbaaca2b3c8 thread T6
#0 0x560a4fdb7c2d in qc_send_frames src/mux_quic.c:473
#1 0x560a4fdb83be in qc_send src/mux_quic.c:563
#2 0x560a4fdb8a6e in qc_io_cb src/mux_quic.c:638
#3 0x560a502ab574 in run_tasks_from_lists src/task.c:580
#4 0x560a502ad589 in process_runnable_tasks src/task.c:883
#5 0x560a501e3c88 in run_poll_loop src/haproxy.c:2675
#6 0x560a501e4519 in run_thread_poll_loop src/haproxy.c:2846
#7 0x7fbabd120ea6 in start_thread nptl/pthread_create.c:477
#8 0x7fbabcb19dee in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfddee)
Address 0x7fbaaca2b3c8 is located in stack of thread T6 at offset 56 in frame
#0 0x560a4fdb7f00 in qc_send src/mux_quic.c:514
This frame has 1 object(s):
[32, 48) 'frms' (line 515) <== Memory access at offset 56 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
Thread T6 created by T0 here:
#0 0x7fbabd1bd2a2 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:214
#1 0x560a5036f9b8 in setup_extra_threads src/thread.c:221
#2 0x560a501e70fd in main src/haproxy.c:3457
#3 0x7fbabca42d09 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: stack-buffer-overflow src/mux_quic.c:473 in qc_send_frames
STREAM frames which are not acknowledged in order are inserted in ->tx.acked_frms
tree ordered by the STREAM frame offset values. Then, they are consumed in order
by qcs_try_to_consume(). But, when we retransmit frames, we possibly have to
insert the same STREAM frame node (with the same offset) in this tree.
The problem is when they have different lengths. Unfortunately the restransmitted
frames are not inserted because of the tree nature (EB_ROOT_UNIQUE). If the STREAM
frame which has been successfully inserted has a smaller length than the
retransmitted ones, when it is consumed they are tailing bytes in the STREAM
(retransmitted ones) which indefinitively remains in the STREAM TX buffer
which will never properly be consumed, leading to a blocking state.
At this time this may happen because we sometimes build STREAM frames
with null lengths. But this is another issue.
The solution is to use an EB_ROOT tree to support the insertion of STREAM frames
with the same offset but with different lengths. As qcs_try_to_consume() support
the STREAM frames retransmission this modification should not have any impact.
The current implementation of STREAM frames emission has some
limitation. Most notably when we cannot sent all frames in a single
qc_send run.
In this case, frames are left in front of the MUX list. It will be
re-send individually before other frames, possibly another frame from
the same STREAM with new data. An opportunity to merge the frames is
lost here.
This method is now improved. If a frame cannot be send entirely, it is
discarded. On the next qc_send run, we retry to send to this position. A
new field qcs.sent_offset is used to remember this. A new frame list is
used for each qc_send.
The impact of this change is not precisely known. The most notable point
is that it is a more logical method of emission. It might also improve
performance as we do not keep old STREAM frames which might delay other
streams.
Implement a new MUX function qcc_notify_send. This function must be
called by the transport layer to confirm the sending of STREAM data to
the MUX.
For the moment, the function has no real purpose. However, it will be
useful to solve limitations on push frame and implement the flow
control.
For the moment, the transport layer function qc_send_app_pkts lacks
features. Most notably, it only send up to a single Tx buffer and won't
retry even if there is frames left and its Tx buffer is now empty.
To overcome this limitation, the MUX implements an opportunistic retry
sending mechanism. qc_send_app_pkts is repeatedly called until the
transport layer is blocked on an external condition (such as congestion
control or a sendto syscall error).
The blocking was detected by inspecting the frame list before and after
qc_send_app_pkts. If no frame has been poped by the function, we
considered the transport layer to be blocked and we stop to send. The
MUX is subscribed on the lower layer to send the frames left.
However, in case of STREAM frames, qc_send_app_pkts might use only a
portion of the data and update the frame offset. So, for STREAM frames,
a new mechanism is implemented : if the offset field of the first frame
has not been incremented, it means the transport layer is blocked.
This should improve transfers execution. Before this change, there is a
possibility of interrupted transfer if the mux has not sent everything
possible and is waiting on a transport signaling which will never
happen.
In the future, qc_send_app_pkts should be extended to retry sending by
itself. All this code burden will be removed from the MUX.
For the moment, unidirectional streams handling is not identical to
bidirectional ones in MUX/H3 layer, both in Rx and Tx path. As a safety,
skip over uni streams in qc_send.
In fact, this change has no impact because qcs.tx.buf is emptied before
we start using qcs_push_frame, which prevents the call to
qcs_push_frame. However, this condition will soon change to improve
bidir streams emission, so an explicit check on stream type must be
done.
It is planified to unify uni and bidir streams handling in a future
stage. When implemented, the check will be removed.
Implement the locally flow-control streams limit for opened
bidirectional streams. Add a counter which is used to count the total
number of closed streams. If this number is big enough, emit a
MAX_STREAMS frame to increase the limit of remotely opened bidirectional
streams.
This is the first commit to implement QUIC flow-control. A series of
patches should follow to complete this.
This is required to be able to handle more than 100 client requests.
This should help to validate the Multiplexing interop test.
This commit should fix the possible transfer interruption caused by the
previous commit. The MUX always retry to send frames if there is
remaining data after a send call on the transport layer. This is useful
if the transport layer is not blocked on the sending path.
In the future, the transport layer should retry by itself the send
operation if no blocking condition exists. The MUX layer will always
subscribe to retry later if remaining frames are reported which indicate
a blocking on the transport layer.
Modify the STREAM emission in qc_send. Use the new transport function
qc_send_app_pkts to directly send the list of constructed frames. This
allows to remove the tasklet wakeup on the quic_conn and should reduce
the latency.
If not all frames are send after the transport call, subscribe the MUX
on the lower layer to be able to retry. Currently there is a bug because
the transport layer does not retry to send frames in excess after a
successful sendto. This might cause the transfer to be interrupted.
Improve the functions used to detect the stream characteristics :
uni/bidirectional and local/remote initiated.
Most notably, these functions are now designed to work transparently for
a MUX in the frontend or backend side. For this, we use the connection
to determine the current MUX side. This will be useful if QUIC is
implemented on the server side.
Since QUIC accept handling has been improved, the MUX is initialized
after the handshake completion. Thus its safe to access transport
parameters in qc_init via the quic_conn.
Remove quic_mux_transport_params_update which was called by the
transport for the MUX. This improves the architecture by removing a
direct call from the transport to the MUX.
The deleted function body is not transfered to qc_init because this part
will change heavily in the near future when implementing the
flow-control.
Reorganize the Rx path for STREAM frames on bidirectional streams. A new
function qcc_recv is implemented on the MUX. It will handle the STREAM
frames copy and offset calculation from transport to MUX.
Another function named qcc_decode_qcs from the MUX can be called by
transport each time new STREAM data has been copied.
The architecture is now cleaner with the MUX layer in charge of parsing
the STREAM frames offsets. This is required to be able to implement the
flow-control on the MUX layer.
Note that as a convenience, a STREAM frame is not partially copied to
the MUX buffer. This simplify the implementation for the moment but it
may change in the future to optimize the STREAM frames handling.
For the moment, only bidirectional streams benefit from this change. In
the future, it may be extended to unidirectional streams to unify the
STREAM frames processing.
Simplify the data manipulation of STREAM frames on TX. Only stream data
and len field are used to generate a valid STREAM frames from the
buffer. Do not use the offset field, which required that a single buffer
instance should be shared for every frames on a single stream.