Not specifying the alignment will let the linker choose it, and it turns
out that it will not necessarily be the same size as the one chosen for
struct mem_stats, as can be seen if any new fields are added there. Let's
enforce an alignment to void* both for the section and for the structure.
These fake datagrams are only used by the low level I/O handler. They
are not provided to the "by connection" datagram handlers. This
is why they are not MT_LIST_APPEND()ed to the listner RX buffer list
(see &quic_dghdlrs[cid_tid].dgrams in quic_lstnr_dgram_dispatch().
Replace the call to pool_zalloc() to by the lighter call to pool_malloc()
and initialize only the ->buf and ->length members. This is safe because
only these fields are inspected by the low level I/O handler.
After removing the packet header protection, we can check the packet is long
enough to contain a 16 bytes length AEAD TAG (at this end of the packet).
This test was missing.
Must be backported to 2.6.
These traces about the available room into the packet currently built and
its payload length could be displayed for each STREAM frame, even for
those which have no chance to be embedded into a packet leading to
very traces to be displayed from a connection with a lot of stream.
This was revealed by traces provide by Tristan in GH #1808
May be backported to 2.6.
When entering this function, we first check the packet length is not too short.
But this was done against the datagram lenght in place of the packet length.
This could lead to the header protection to be removed using data past
the end of the packet (without buffer overflow).
Use the packet length in place of the datagram length which is at <end>
address passed as parameter to this function. As the packet length
is already stored in ->len packet struct member, this <end> parameter is no
more useful.
Must be backported to 2.6.
Released version 2.7-dev3 with the following main changes :
- BUILD: makefile: Fix install(1) handling for OpenBSD/NetBSD/Solaris/AIX
- BUG/MEDIUM: tools: avoid calling dlsym() in static builds (try 2)
- MINOR: resolvers: resolvers_destroy() deinit and free a resolver
- BUG/MINOR: resolvers: shut off the warning for the default resolvers
- BUG/MINOR: ssl: allow duplicate certificates in ca-file directories
- BUG/MINOR: tools: fix statistical_prng_range()'s output range
- BUG/MINOR: quic: do not send CONNECTION_CLOSE_APP in initial/handshake
- BUILD: debug: Add braces to if statement calling only CHECK_IF()
- BUG/MINOR: fd: Properly init the fd state in fd_insert()
- BUG/MEDIUM: fd/threads: fix incorrect thread selection in wakeup broadcast
- MINOR: init: load OpenSSL error strings
- MINOR: ssl: enhance ca-file error emitting
- BUG/MINOR: mworker/cli: relative pid prefix not validated anymore
- BUG/MAJOR: mux_quic: fix invalid PROTOCOL_VIOLATION on POST data overlap
- BUG/MEDIUM: mworker: proc_self incorrectly set crashes upon reload
- BUILD: add detection for unsupported compiler models
- BUG/MEDIUM: stconn: Only reset connect expiration when processing backend side
- BUG/MINOR: backend: Fallback on RR algo if balance on source is impossible
- BUG/MEDIUM: master: force the thread count earlier
- BUG/MAJOR: poller: drop FD's tgid when masks don't match
- DEBUG: fd: detect possibly invalid tgid in fd_insert()
- BUG/MINOR: sockpair: wrong return value for fd_send_uxst()
- MINOR: sockpair: move send_fd_uxst() error message in caller
- Revert "BUG/MINOR: peers: set the proxy's name to the peers section name"
- DEBUG: fd: split the fd check
- MEDIUM: resolvers: continue startup if network is unavailable
- BUG/MINOR: fd: always remove late updates when freeing fd_updt[]
- MINOR: cli: emit a warning when _getsocks was used more than once
- BUG/MINOR: mworker: PROC_O_LEAVING used but not updated
- Revert "MINOR: cli: emit a warning when _getsocks was used more than once"
- MINOR: cli: warning on _getsocks when socket were closed
- BUG/MEDIUM: mux-quic: fix missing EOI flag to prevent streams leaks
- MINOR: quic: Congestion control architecture refactoring
- MEDIUM: quic: Cubic congestion control algorithm implementation
- MINOR: quic: New "quic-cc-algo" bind keyword
- BUG/MINOR: quic: loss time limit variable computed but not used
- MINOR: quic: Stop looking for packet loss asap
- BUG/MAJOR: quic: Useless resource intensive loop qc_ackrng_pkts()
- MINOR: quic: Send packets as much as possible from qc_send_app_pkts()
- BUG/MEDIUM: queue/threads: limit the number of entries dequeued at once
- MAJOR: threads/plock: update the embedded library
- MINOR: thread: provide an alternative to pthread's rwlock
- DEBUG: tools: provide a tree dump function for ebmbtrees as well
- MINOR: ebtree: add ebmb_lookup_shorter() to pursue lookups
- BUG/MEDIUM: pattern: only visit equivalent nodes when skipping versions
- BUG/MINOR: mux-quic: prevent crash if conn released during IO callback
- CLEANUP: mux-quic: remove useless app_ops is_active callback
- BUG/MINOR: mux-quic: do not free conn if attached streams
- MINOR: mux-quic: save proxy instance into qcc
- MINOR: mux-quic: use timeout server for backend conns
- MEDIUM: mux-quic: adjust timeout refresh
- MINOR: mux-quic: count in-progress requests
- MEDIUM: mux-quic: implement http-keep-alive timeout
- MINOR: peers: Add a warning about incompatible SSL config for the local peer
- MINOR: peers: Use a dedicated reconnect timeout when stopping the local peer
- BUG/MEDIUM: peers: limit reconnect attempts of the old process on reload
- BUG/MINOR: peers: Use right channel flag to consider the peer as connected
- BUG/MEDIUM: dns: Properly initialize new DNS session
- BUG/MINOR: backend: Don't increment conn_retries counter too early
- MINOR: server: Constify source server to copy its settings
- REORG: server: Export srv_settings_cpy() function
- BUG/MEDIUM: proxy: Perform a custom copy for default server settings
- BUG/MINOR: quic: Missing in flight ack eliciting packet counter decrement
- BUG/MEDIUM: quic: Floating point exception in cubic_root()
- MINOR: h3: support HTTP request framing state
- MINOR: mux-quic: refresh timeout on frame decoding
- MINOR: mux-quic: refactor refresh timeout function
- MEDIUM: mux-quic: implement http-request timeout
- BUG/MINOR: quic: Avoid sending truncated datagrams
- BUG/MINOR: ring/cli: fix a race condition between the writer and the reader
- BUG/MEDIUM: sink: Set the sink ref for forwarders created during ring parsing
- BUG/MINOR: sink: fix a race condition between the writer and the reader
- BUG/MINOR: quic: do not reject datagrams matching minimum permitted size
- MINOR: quic: Add two new stats counters for sendto() errors
- BUG/MINOR: quic: Missing Initial packet dropping case
- MINOR: quic: explicitely ignore sendto error
- BUG/MINOR: quic: adjust errno handling on sendto
- BUG/MEDIUM: quic: break out of the loop in quic_lstnr_dghdlr
- MINOR: threads: report the number of thread groups in build options
- MINOR: config: automatically preset MAX_THREADS based on MAX_TGROUPS
- BUILD: SSL: allow to pass additional configure args to QUICTLS
- CI: enable weekly "m32" builds on x86_64
- CLEANUP: assorted typo fixes in the code and comments
- BUG/MEDIUM: fix DH length when EC key is used
- REGTESTS: ssl: adopt tests to OpenSSL-3.0.N
- REGTESTS: ssl: adopt tests to OpenSSL-3.0.N
- REGTESTS: ssl: fix grep invocation to use extended regex in ssl_generate_certificate.vtc
- BUILD: cfgparse: always defined _GNU_SOURCE for sched.h and crypt.h
_GNU_SOURCE used to be defined only when USE_LIBCRYPT was set. It's also
needed for sched_setaffinity() to be exported. As a side effect, when
USE_LIBCRYPT is not set, a warning is emitted, as Ilya found and reported
in issue #1815. Let's just define _GNU_SOURCE regardless of USE_LIBCRYPT,
and also explicitly add sched.h, as right now it appears to be inherited
from one of the other includes.
This should be backported to 2.4.
dh of length 1024 were chosen for EVP_PKEY_EC key type.
let us pick "default_dh_param" instead.
issue was found on Ubuntu 22.04 which is shipped with OpenSSL configured
with SECLEVEL=2 by default. such SECLEVEL value prohibits DH shorter than
2048:
OpenSSL error[0xa00018a] SSL_CTX_set0_tmp_dh_pkey: dh key too small
better strategy for chosing DH still may be considered though.
MAX_THREADS was not changed when setting MAX_TGROUPS, which still limits
some possibilities. Let's preset it to 4 * LONGBITS when MAX_TGROUPS is
larger than 1, or LONGBITS when it's set to 1. This means that the new
default value is 256 threads.
The rationale behind this is that the main use of thread groups is
mostly to address NUMA issues and that we don't necessarily need large
thread counts when using many groups, and 256 threads is already plenty
even on quite large systems.
For now it's important not to go too far because some internal structs
are arrays of MAX_THREADS entries, for example accept_queue_ring, which
is around 8kB per thread. Such structures will need to become dynamic
before defaulting to large thread counts (at 4096 threads max the
accept queues would require 32 MB RAM alone).
The function processes packets sent by other threads in the current
thread's queue. But if, for any reason, other threads write faster
than the current one processes, this can lead to a situation where
the function never returns.
It seems that it might be what's happening in issue #1808, though
unfortunately, this function is one of the rare without traces. But
the amount of calls to functions like qc_lstnr_pkt_rcv() on a single
thread seems to indicate this possibility.
Thanks to Tristan for his efforts in collecting extremely precious
traces!
This likely needs to be backported to 2.6.
qc_snd_buf returned a size_t which means that it was never negative
despite its documentation. Thus the caller who checked for this was
never informed of a sendto error.
Clean this by changing the return value of qc_snd_buf() to an integer.
A 0 is returned on success. Every other values are considered as an
error.
This commit should be backported up to 2.6. Note that to not cause
malfunctions, it must be backported after the previous patch :
906b058954
MINOR: quic: explicitely ignore sendto error
This is to ensure that a sendto error does not cause send to be
interrupted which may cause a stalled transfer without a proper retry
mechanism.
The impact of this bug seems null as caller explicitely ignores sendto
error. However this part of code seems to be subject to strange issues
and it may fix them in part. It may be of interest for github issue #1808.
qc_snd_buf() returns an error if sendto has failed. On standard
conditions, we should check for EAGAIN/EWOULDBLOCK errno and if so,
register the file-descriptor in the poller to retry the operation later.
However, quic_conn uses directly the listener fd which is shared for all
QUIC connections of this listener on several threads. Thus, it's
complicated to implement fd supversion via the poller : there is no
mechanism to easily wakeup quic_conn or MUX after a sendto failure.
A quick and simple solution for the moment is to considered a datagram
as properly emitted even on sendto error. In the end, this will trigger
the quic_conn retransmission timer as data will be considered lost on
the network and the send operation will be retried. This solution will
be replaced when fd management for quic_conn is reworked.
In fact, this quick hack was already in use in the current code, albeit
not voluntarily. This is due to a bug caused by an API mismatch on the
return type of qc_snd_buf() which never emits a negative error code
despite its documentation. Thus, all its invocation were considered as a
success. If this bug was fixed, the sending would would have been
interrupted by a break which could cause the transfer to freeze.
qc_snd_buf() invocation is clean up : the break statement is removed.
Send operation is now always explicitely conducted entirely even on
error and buffer data is purged.
A simple optimization has been added to skip over sendto when looping
over several datagrams at the first sendto error. However, to properly
function, it requires a fix on the return type of qc_snd_buf() which is
provided in another patch.
As the behavior before and after this patch seems identical, it is not
labelled as a BUG. However, it should be backported for cleaning
purpose. It may also have an impact on github issue #1808.
The dgram length check in quic_get_dgram_dcid() rejects datagrams
matching exactly the minimum allowed length, which doesn't seem
correct. I doubt any useful packet would be that small but better
fix this to avoid confusing debugging sessions in the future.
This might be backported to 2.6.
This is the same issue as just fixed in b8e0fb97f ("BUG/MINOR: ring/cli:
fix a race condition between the writer and the reader") but this time
for sinks. They're also sucking the ring and present the same race at
high write loads.
This must be backported to 2.2 as well. See comments in the aforementioned
commit for backport hints if needed.
A reference to the sink was added in every forwarder by the commit 2ae25ea24
("MINOR: sink: Add a ref to sink in the sink_forward_target structure"). But
this commit is incomplete. It is not performed for the forwarders created
during a ring parsing.
This patch must be backported to 2.6.
The ring's CLI reader unlocks the read side of a ring and relocks it for
writing only if it needs to re-subscribe. But during this time, the writer
might have pushed data, see nobody subscribed hence woken nobody, while
the reader would have left marking that the applet had no more data. This
results in a dump that will not make any forward progress: the ring is
clogged by this reader which believes there's no data and the writer
will never wake it up.
The right approach consists in verifying after re-attaching if the writer
had made any progress in between, and to report that another call is
needed. Note that a jump back to the beginning would also work but here
we provide better fairness between readers this way.
This needs to be backported to 2.2. The applet API needed to signal the
availability of new data changed a few times since then.
There is a remaining loop in this ugly qc_snd_buf() function which could
lead haproxy to send truncated UDP datagrams. For now on, we send
a complete UDP datagram or nothing!
Must be backported to 2.6.
Implement http-request timeout for QUIC MUX. It is used when the
connection is opened and is triggered if no HTTP request is received in
time. By HTTP request we mean at least a QUIC stream with a full header
section. Then qcs instance is attached to a sedesc and upper layer is
then responsible to wait for the rest of the request.
This timeout is also used when new QUIC streams are opened during the
connection lifetime to wait for full HTTP request on them. As it's
possible to demux multiple streams in parallel with QUIC, each waiting
stream is registered in a list <opening_list> stored in qcc with <start>
as timestamp in qcs for the stream opening. Once a qcs is attached to a
sedesc, it is removed from <opening_list>. When refreshing MUX timeout,
if <opening_list> is not empty, the first waiting stream is used to set
MUX timeout.
This is efficient as streams are stored in the list in their creation
order so CPU usage is minimal. Also, the size of the list is
automatically restricted by flow control limitation so it should not
grow too much.
Streams are insert in <opening_list> by application protocol layer. This
is because only application protocol can differentiate streams for HTTP
messaging from internal usage. A function qcs_wait_http_req() has been
added to register a request stream by app layer. QUIC MUX can then
remove it from the list in qc_attach_sc().
As a side-note, it was necessary to implement attach qcc_app_ops
callback on hq-interop module to be able to insert a stream in waiting
list. Without this, a BUG_ON statement would be triggered when trying to
remove the stream on sedesc attach. This is to ensure that every
requests streams are registered for http-request timeout.
MUX timeout is explicitely refreshed on MAX_STREAM_DATA and STOP_SENDING
frame parsing to schedule http-request timeout if a new stream has been
instantiated. It was already done on STREAM parsing due to a previous
patch.
Try to reorganize qcc_refresh_timeout() to improve its readability. The
main objective is to reduce the indentation level and if sequences by
using goto statement to the end of the function. Also, backend and
frontend code path should be more explicit with this new version.
Refresh the MUX connection timeout in frame parsing functions. This is
necessary as these Rx operation are completed directly from the
quic-conn layer outside of MUX I/O callback. Thus, the timeout should be
refreshed on this occasion.
Note that however on STREAM parsing refresh is only conducted when
receiving the current consecutive data offset.
Timeouts related function have been moved up in the source file to be
able to use them in qcc_decode_qcs().
This commit will be useful for http-request timeout. Indeed, a new
stream may be opened during qcc_decode_qcs() which should trigger this
timeout until a full header section is received and qcs instance is
attached to sedesc.
Store the current step of HTTP message in h3s stream. This reports if we
are in the parsing of headers, content or trailers section. A new enum
h3s_st_req is defined for this.
This field is stored in h3s struct but only used for request stream. It
is left undefined for other streams (control or QPACK streams).
h3_is_frame_valid() has been extended to take into account this state
information. A connection error H3_FRAME_UNEXPECTED is reported if an
invalid frame according to the current state is received; for example a
DATA frame at the beginning of a stream.
It is illegal to call my_flsl() with 0 as parameter value. It is a UB.
This leaded cubic_root() to divide values by 0 at this line:
x = 2 * x + (uint32_t)(val / ((uint64_t)x * (uint64_t)(x - 1)));
Thank you to Tristan971 for having reported this issue in GH #1808
and Willy for having spotted the root cause of this bug.
Must follow any cubic for QUIC backport (2.6).
The decrement was missing in quic_pktns_tx_pkts_release() called each time a
packet number space is discarded. This is not sure this bug could have an impact
during handshakes. This counter is used to cancel the timer used both for packet
detection and PTO, setting its value to null. So there could be retransmissions
or probing which could be triggered for nothing.
Must be backported to 2.6.
When a proxy is initialized with the settings of the default proxy, instead
of doing a raw copy of the default server settings, a custom copy is now
performed by calling srv_settings_copy(). This way, all settings will be
really duplicated. Without this deep copy, some pointers are shared between
several servers, leading to UAF, double-free or such bugs.
This patch relies on following commits:
* b32cb9b51 REORG: server: Export srv_settings_cpy() function
* 0b365e3cb MINOR: server: Constify source server to copy its settings
This patch should fix the issue #1804. It must be backported as far as 2.0.
The connection retry counter is incremented too early when a connection
fails. In SC_ST_CER state, errors handling must be performed before
incrementing the counter. Otherwise, we may consider the max connection
attempt is reached while a last one is in fact possible.
This patch must be backported to 2.6.
When a new DNS session is created, all its fields are not properly
initialized. For instance, "tx_msg_offset" can have any value after the
allocation. So, to fix the bug, pool_zalloc() is now used to allocate new
DNS session.
This patch should fix the issue #1781. It must be backported as far as 2.4.
When a peer open a new connection to another peer, it is considered as
connected when the hello message is sent. To do so, the peer applet was
relying on CF_WRITE_PARTIAL channel flag. However it is not the right flag
to use. This one is a transient flag. Depending on the scheduling, this flag
may be removed by the stream before the peer has a chance to see
it. Instead, CF_WROTE_DATA flag must be checked.
This patch is related to the issue #1799. It must be backported as far as
2.0.
When peers are configured and HAProxy is reloaded or restarted, a
synchronization is performed between the old process and the new one. To do
so, the old process connects on the new one. If the synchronization fails,
it retries. However, there is no delay and reconnect attempts are not
bounded. Thus, it may loop for a while, consuming all the CPU. Of course, it
is unexpected, but it is possible. For instance, if the local peer is
misconfigured, an infinite loop can be observed if the connection succeeds
but not the synchronization. This prevents the old process to exit, except
if "hard-stop-after" option is set.
To fix the bug, the reconnect is delayed. The local peer already has a
expiration date to delay the reconnects. But it was not used on stopping
mode. So we use it not. Thanks to the previous fix, the reconnect timeout is
shorter in this case (500ms against 5s on running mode). In addition, we
also use the peers resync expiration date to not infinitely retries. It is
accurate because the new process, on its side, use this timeout to switch
from a local resync to a remote resync.
This patch depends on "MINOR: peers: Use a dedicated reconnect timeout when
stopping the local peer". It fixes the issue #1799. It should be backported
as far as 2.0.
When a process is stopped or reload, a dedicated reconnect timeout is now
used. For now, this timeout is not used because the current code retries
immediately to reconnect to perform the local synchronization with the new
local peer, if any.
This patch is required to fix the issue #1799. It should be backported as
far as 2.0 with next fixes.
In peers section, it is possible to enable SSL for the local peer. In this
case, the bind line and the server line should both be configured. A
"default-server" directive may also be used to configure the SSL on the
server side. However there is no test to be sure the SSL is enabled on both
sides. It is an problem because the local resync performed during a reload
will be impossible and it is probably not the expected behavior.
So, it is now checked during the configuration validation. A warning message
is displayed if the SSL is not properly configured for the local peer.
This patch is related to issue #1799. It should probably be backported to 2.6.
Complete QUIC MUX timeout refresh function by using http-keep-alive
timeout. It is used when the connection is idle after having handle at
least one request.
To implement this a new member <idle_start> has been defined in qcc
structure. This is used as timestamp for when the connection became idle
and is used as base time for http keep-alive timeout
Add a new qcc member named <nb_hreq>. Its purpose is close to <nb_sc>
which represents the number of attached stream connectors. Both are
incremented inside qc_attach_sc().
The difference is on the decrement operation. While <nb_cs> is
decremented on sedesc detach callback, <nb_hreq> is decremented when the
qcs is locally closed.
In most cases, <nb_hreq> will be decremented before <nb_cs>. However, it
will be the reverse if a stream must be kept alive after detach callback.
The main purpose of this field is to implement http-keep-alive timeout.
Both <nb_sc> and <nb_hreq> must be null to activate the http-keep-alive
timeout.
Implement a new internal function qcc_refresh_timeout(). Its role will be
to reset QUIC MUX timeout depending if there is requests in progress or
not.
qcc_update_timeout() does not set a timeout if there is still attached
streams as in this case the upper layer is responsible to manage it.
Else it will activate the timeout depending on the connection current
status.
Timeout is refreshed on several locations : on stream detach and in I/O
handler and wake callback.
For the moment, only the default timeout is used (client or server). The
function may be expanded in the future to support more specific ones :
* http-keep-alive if connection is idle
* http-request when waiting for incomplete HTTP requests
* client/server-fin for graceful shutdown
Store a reference to proxy in the qcc structure. This will be useful to
access to proxy members outside of qcc_init().
Most notably, this change is required to implement timeout refreshing by
using the various timeouts configured at the proxy level.
Ensure via qcc_is_dead() that a connection is not released instance
until all of qcs streams are detached by the upper layer, even if an
error has been reported or the timeout has fired.
On the other side, as qc_detach() always check the connection status,
this should ensure that we do not keep a connection if not necessary.
Without this patch, a qcc instance may be freed with some of its qcs
streams not detached. This is an incorrect behavior and will lead to a
BUG_ON fault. Note however that no occurence of this bug has been
produced currently. This patch is mainly a safety against future
occurences.
This should be backported up to 2.6.
Timeout in QUIC MUX has evolved from the simple first implementation. At
the beginning, a connection was considered dead unless bidirectional
streams were opened. This was abstracted through an app callback
is_active().
Now this paradigm has been reversed and a connection is considered alive
by default, unless an error has been reported or a timeout has already
been fired. The callback is_active() is thus not used anymore and can be
safely removed to simplify qcc_is_dead().
This commit should be backported to 2.6.
A qcc instance may be freed in the middle of qc_io_cb() if all streams
were purged. This will lead to a crash as qcc instance is reused after
this step. Jump directly to the end of the function to avoid this.
Note that this bug has not been triggered for the moment. This is a
safety fix to prevent it.
This must be backported up to 2.6.