The QUIC code can only handle IPv4 or IPv6 addresses, so using two
sockaddr_storage structs wastes a lot of space in the quic_dgram struct.
This is a very large overhead since this structure is written in the MPSC
ring buffers before every datagram, while many of those datagrams are only
50 bytes or less. Using a union instead saves 200 bytes per datagram,
significantly increasing the capacity of the buffers.
Using an offset instead of a pointer into the datagram buffer is less
error-prone as we do not have to manually fixup that pointer when the
datagram is moved somewhere else in memory.
Use an MPSC ring buffer to hold data for each datagram handler. Holding
this data in a per-handler buffer avoids the HoL blocking we experienced
when we had per-listener buffers with data from all threads mixed up
in them.
This also gets rid of the mt_list contention we were suffering from
before, which was causing some threads to be stuck for a significant
amount of time, resulting in warnings and even crashes in some cases.
This is to be used in the QUIC code, where the multiple producers are
the listener threads, and the single consumer is the datagram handler
thread. Entries are variable-length with a size header, and are kept
contiguous in the buffer, so padding is inserted at the end when an
entry would otherwise wrap around. The size field is overloaded to also
mark padding (-1) and entries that are still free or not yet ready for
reads (0).
Headers and payloads are aligned on 8 bytes. Aligning on 16 bytes might
be beneficial on some architectures to let memcpy() use 128-bit SIMD
instructions.
The head and tail offsets are 64-bit unsigned integers, making ABA
issues from integer overflow impossible on current or near-future
hardware. Reservation uses a CAS rather than FAA because of the need to
insert padding to keep entries contiguous.
The variable name is passed as (ptr,len), suggesting that some callers
may not always have 0-terminated strings (e.g. when used in arguments
where a closing parenthesis or a comma might follow). The error message
carefully tries to designate the first invalid character, yet by mistake
it prints the whole string. This mistake has been there since commit
4834bc773c ("MEDIUM: vars: adds support of variables") in 1.6. Let's
properly report the offending character as promised. This could be
backported but is totally unimportant.
In 2.5, commit 9a621ae76d ("MEDIUM: vars: add a new "set-var-fmt" action")
introduced the set-var-fmt action. However, the storage call
(sample_store_stream() at the time, now var_set()) was added for this
specific branch without any return, leaving the sample copied again over
the variable via the final call. This means the variable name is looked
up twice and, for the proc scope, the lock is taken twice for each call
to set-var-fmt.
This patch removes the first call. Tests show that proc operations
now jump from 1.1M to 1.67M/s on a 64-core CPU (lower lock contention),
while other scopes only observe a modest improvement with few vars
(10 goes from 43.3M to 44M/s). This could be backported.
In 2.5, variables in the scope "proc" were pre-created with commit
df8eeb1619 ("MEDIUM: vars: pre-create parsed SCOPE_PROC variables as
permanent ones"). However, one test on var_set() was copy-pasted from
vars_check_args() into parse_store(), where the former returns 0 on
error while for the latter 0 is a success. This means that some errors
on variables of scope "proc" (typically an allocation failure) can be
missed at boot time, probably either making that variable invisible or
causing a crash during boot.
Let's return ACT_RET_PRS_ERR instead. This can be backported.
In 3.1 with commit 1bdf6e884a ("MEDIUM: sink: implement sink_find_early()")
sink creation started to support plugging to an existing sink and only
updating the description. However, if the strdup() for the new description
failed, it would go down the error path releasing everything, while leaving
the released entry in the sink list with other users still attached
to it.
Here we take a different approach: we first allocate the new description,
and release the old one on success, otherwise we leave the entry unchanged.
And we only release new sinks, not old ones. This way allocation errors
simply do not change what is already referenced elsewhere, so that error
unrolling remains clean.
This remains of low importance anyway since it's only a case of boot
error aggravation. It does not really need to be backported.
When an ACME server returns a certificate URL directly in the newOrder
response (order already validated), parse it and transition straight to
ACME_CERTIFICATE, bypassing the auth/challenge steps.
This needs to be backported to 3.2.
When an ACME server returns a newOrder response with an empty
authorizations array (certificate already validated), ctx->auths
remains NULL. The state machine then transitions to ACME_AUTH which
immediately dereferences ctx->next_auth, causing a segfault.
Return an error from acme_res_neworder() so the caller retries.
This needs to be backported to 3.2.
When scheme-based normalization is performed, an empty path is
normalized to "/". But as stated in RFC 9110 #4.2.3, this must not be
applied to OPTIONS requests.
So let's fix the issue by adding a test on the method.
Thanks to @zhanhb for the bug report and the analysis.
This patch should fix the issue #3352. It must be backported as far as 3.0.
The function encoding and sending FCGI_PARAM records was reworked to
properly deal with full buffers. An error must be triggered only when the
parameters cannot be encoded while the mbuf is empty (or its free space is
greater than the max record size). Otherwise we must wait and retry later.
Before, an error was triggered on encoding failure if any HTX block was
consumed, regardless of the mbuf state. Now, blocks are removed on success
only, so we can wait for more space.
This patch should fix the issue #3346. It should be backported to all stable
versions.
sample_conv_eth_vlan() reads the VLAN TCI at area[idx + 2] without
ensuring there are enough bytes. The original condition 'idx + 4 < data'
wrongly fails when there is exactly enough room left for the tag
(idx + 4 == data), leading to an incomplete read when trying to decode
a VLAN ID.
This can be backported where this converter was backported.
sample_conv_tcp_options_list() uses 'ofs + 1 <= len' to check bounds
before reading the option length field at area[ofs + 1]. When ofs + 1
equals len, this reads one byte past the valid buffer (valid indices are
0 to len-1).
This is the same bug pattern as the one previously fixed in
tcp_fullhdr_find_opt(), and the impact is also almost nonexistent.
tcp_fullhdr_find_opt() reads smp->data.u.str.area[next + 1] without
checking that next + 1 < len. When the last byte of a TCP header's
options section (at index len - 1) contains an option type that is not
0 (EOL) and not 1 (NOP), the code reads one byte past the valid buffer.
This out-of-bounds read is totally harmless in practice but should be
fixed.
This can be backported where tcp_fullhdr_find_opt() was backported.
In both smp_fetch_distcc_param() and smp_fetch_distcc_body(), the code
does "ofs += body" without checking if body is larger than the remaining
data. If a malicious distcc packet contains a token with a very large
body length (param value up to 0xFFFFFFFF), ofs could overflow and wrap
around to a small value, causing the next iteration's bounds check
"ofs + 12 > ci_data(chn)" to pass incorrectly.
This could lead to out-of-bounds reads or an infinite loop.
Given that this is only used in trusted environments, this is mostly
harmless. It can be backported to all stable versions.
The keyshare extension parsing loop reads dataPointer[readPosition+2]
and dataPointer[readPosition+3] inside the loop body, requiring at least
4 bytes to be safe. However, keyshare_len was only validated as >= 2.
With keyshare_len == 2 or 3, the first loop iteration would read past
the end of the extension data, causing an out-of-bounds read which is
harmless in practice. We also need to make sure that the read position
stops 4 bytes before the end so that the next 4 bytes can be read.
This can be backported to stable versions.
A few typos like "not unhandled" in error messages, and some spelling
mistakes mostly in cfgparse error messages were fixed. One spelling
mistake in a function name was fixed as well (ssl_sock_chose_sni_ctx
renamed to ssl_sock_choose_sni_ctx). Those which apply may be worth
backporting.
Functions emitting chunk sizes use size_t integers to do so. Depending
on the code path, these functions can be called with an unsigned long long
integer (h1m->curr_len for instance). On 64-bit architectures, there is no
issue. But on 32-bit architectures, it is a problem: size_t is a 32-bit
integer, so the 64-bit parameter will be cast to a 32-bit integer. For
chunk sizes exceeding 4GB, the wrong size will be emitted.
To fix the issue, these functions now use true 64-bit
integers. h1s_consume_kop() was also modified accordingly.
In addition, when a size_t is compared to a 64-bit integer, an explicit
cast is used to be sure the right type is used.
This patch must be backported as far as 3.0. It must be backported after
1ef74fc7c ("BUG/MEDIUM: mux_h1: fix stack buffer overflow in
h1_append_chunk_size()").
When an H1 request was parsed, only a light validation was performed on
the URI, mainly because there was no distinction between the different
parts of the URI. So only characters in the range [0x21, 0x7e], excluding
"#", were allowed.
To be consistent with the H2 and H3 parsers, the authority is now
validated using the http_authority_has_forbidden_char() function.
This patch should be backported as far as 2.8. For previous versions,
the http_authority_has_forbidden_char() function does not exist.
This patch is similar to the previous one but for header addition. It is
now possible to skip the authority update. An extra argument was added to
the http_add_header() function for this purpose.
When a header value is replaced, a test is performed to verify whether the
authority must be updated or not. But there are some places where we know
it is useless: first, when the caller knows the header is not the host
header; then, when the host header value is updated because the authority
was changed.
So now, an extra argument was added to the http_replace_header() and
http_replace_header_value() functions to specify whether the authority
update must be performed or not.
During scheme-based normalization, when the authority is normalized, the
host headers are updated accordingly. Only full host header values must be
updated; comma-separated lists are not expected here.
It is important to do so to be consistent with other places where the host
header is updated (when the request URI is changed, for instance).
When a host header value is updated, the authority can also be updated
accordingly. When this is performed, we must not use the new host header
value from the HTX message; instead we must use the data passed as
argument. It is unexpected, but the host header can have several
comma-separated values, and using the full header value can lead to
unexpected results.
Note: having multiple comma-separated values for the host header should
not be supported; the comma should be part of the host value. But it is
quite ambiguous. This will be fixed in another commit.
This patch must be backported to all stable versions.
During scheme-based normalization, the original authority value is used
to replace all host headers. This is an issue, because when the HTX message
is modified the blocks may be reorganized to find free space. For instance,
a defragmentation can be performed, so the address of the authority can
change. To perform such a rewrite, the new value must be stored in a
temporary buffer. It is especially important because the start-line is also
updated, so the original authority could be moved, making it invalid.
Because of this bug, it is possible to mix HTX block values. There is no
overflow, but the start-line can be overwritten with data from the host
header value, making it invalid for the server.
So, to fix the issue, the new host header value is now the one in the trash
chunk used to rewrite the start line. But in that case, the trash chunk
must be allocated to be sure it remains valid when replacing all host
header values.
This patch must be backported as far as 2.6.
When using QUIC, the mux instantiates quic_frame objects but does not free
them; this is performed only when acknowledgments are received.
This is not the case for the QMux protocol, as the transport layer is much
simpler in this case. As such, the mux is responsible for freeing the
frames after emission.
This patch fixes qcc_qstrm_send_frames() by adding the necessary
qc_frm_free() calls as soon as a frame is emitted. This fixes a memory
leak. qc_frm_free() ensures that the freed object is removed from its
parent list, so LIST_DEL_INIT() is not necessary anymore.
No need to backport.
b_alloc() does nothing if a buffer is already allocated. As such, it is
not necessary to call b_size() as a check prior to it. Remove its usage
in qcc_qstrm_recv() so that the code is similar to other b_alloc()
usages.
Remove exports for functions that are not called directly anymore, and
make them static. This involves some reordering to avoid the need for a
forward static declaration.
Also remove the old callback fields from the lbprm struct.
This creates an ops structure for the various callbacks into the LB
algorithms, and defines those ops structures for each algorithm. A new
proxy_init callback is added for the initialization functions called
from proxy_finalize(). Since one of them needs to return an int in case
of an error, we change all the others to also return an int so we have a
uniform type.
No functional changes.
http_wait_for_request() improperly reports term events on the scb instead
of scf, causing some request parsing failures to possibly be reported as
response errors. This was introduced in 3.2 with commit 2dc02f75b1 ("MEDIUM:
tevt/stconn/stream: Add dedicated termination events for stream location")
so it must be backported there.
The char tmp[10] buffer can only hold 8 hex digits plus the CRLF suffix.
If chksz exceeds 4GB (0xFFFFFFFF), the do-while loop writes more than 8
hex digits, overflowing the stack buffer by one or more bytes. In practice
the buffer is filled from the end and is preceded by a 6-byte alignment
hole on 64-bit systems (4 bytes on 32-bit platforms), which leaves enough
room for the overflow to remain harmless and keeps it from touching lower
variables. So it is safe, but just by luck.
Fix by increasing tmp[] to 18 bytes, sufficient for up to 16 hex digits
(2^64 - 1) plus CRLF.
This patch adds termination events on the QUIC MUX at the stream level.
Similarly to the previous patch, error codes should be similar to the
ones already used in H2.
Most of the values are reported via qcc_reset_stream(), which now has an
extra argument for this. This is used by the H3 and hq-interop
application protocols.
The qmux_sctl() callback is extended to support MUX_SCTL_TEVTS to report
the current stream termination event to the upper layer.
Implement termination event logging in the QUIC MUX layer at the
connection level. The objective is to reuse codes similar to those
already used in H2.
Most of the error values are reported via qcc_set_error(), which now has
an extra argument to specify the term event code. This is in particular
used in the H3 application protocol code.
Also, qmux_ctl() now implements MUX_CTL_TEVTS to report the current
connection termination code to the upper layer. The qmux_sctl() callback
displays the connection term event code on MUX_SCTL_DBG_STR.
This cleans up around 12 non-visible typos in h2 and mux-h2, 6 in peers,
3 in tools, and also addresses a leftover after commit 9294e8822f in 2.4
which changed the word fingerprint calculation without updating the
comment about the possible output values. No backport needed.
In update_word_fingerprint_with_len() we convert a character to lower
case, then check it against lower case, upper case and digits. Let's
just drop the upper case check, which cannot match.
The expression to check both peer->local and appctx_is_back() uses a
bitwise '&' instead of a logical '&&'. Fortunately both values are
always either 0 or 1 so there is no impact. This can be backported to
all stable versions.
Several cases in sample_conv_when() (FORWARDED, TOAPPLET, PROCESSED, ACL)
access smp->strm->scb without checking whether strm is NULL. The strm
field may be NULL (e.g. in tcp-request connection rules). Let's add NULL
checks to prevent dereferencing a NULL pointer.
This should be backported to 3.1.
Several error paths in smp_resolve_args() used 'continue', which skipped
LIST_DELETE and free(cur), leaking the arg_list node. Change all of them
to 'break' to ensure proper cleanup on all error paths. This is harmless
since, when such issues are met, the process refuses to start, so no
backport is really needed.
When find_acl_by_name() and find_acl_default() both fail when parsing
converter "when(ACL,foo)", the previously allocated acl_sample struct
is leaked. Free it before returning 0. This can be backported to stable
versions.
When strdup() failed after some entries had already been strdup'd, the
function returned -1 without freeing the previously allocated strings. A
cleanup loop was added to free all previously strdup'd entries and reset
init_env.
This can be backported to 3.1.
When malloc() fails in indent_msg(), the function returned NULL without
freeing the original *out string as it was supposed to. The caller thus
loses the original string (leaked) and gets NULL back. Fix this by
freeing *out and setting it to NULL before returning.