The function drops the lock very early, and the only operation performed
in its entry code is updating the current peer's last_local_table, which
doesn't need to be protected. Thus it's easier to drop the lock before
entering the function, and this further limits its scope.
This has raised the peak RPS from 2050k to 2355k/s with a peers section on
the 80-core machine.
Instead of taking the update's write lock in stktable_touch_with_exp(),
while most of the time under high load there is nothing to update because
the entry was touched before having been synchronized, let's do the check
under a read lock and upgrade it to perform the update only if needed.
These updates are rare and the contention is not expected to be very high,
so at the first failure to upgrade we retry directly with a write lock.
By doing so the performance has almost doubled again, from 1140k to 2050k/s
with a peers section enabled. The contention is now on taking the read
lock itself, so there's little to be gained beyond this in this function.
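For illustration, the check-then-upgrade pattern described above boils down
to something like the pseudo-C sketch below; the lock helpers and
entry_needs_update() are generic placeholders, not the exact haproxy macros:

  /* sketch of the read-then-upgrade pattern (hypothetical helper names) */
  void touch_entry_sketch(struct table *t, struct entry *ts)
  {
      rdlock(&t->lock);
      if (!entry_needs_update(t, ts)) {
          rdunlock(&t->lock);              /* common case: nothing to do */
          return;
      }
      if (!try_upgrade_rd_to_wr(&t->lock)) {
          /* upgrade contended: retry directly with a plain write lock */
          rdunlock(&t->lock);
          wrlock(&t->lock);
      }
      if (entry_needs_update(t, ts))       /* re-check, state may have changed */
          perform_update(t, ts);
      wrunlock(&t->lock);
  }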
Updating an entry in the updates tree is currently performed under the
table's write lock, which causes huge contention with other accesses
such as lookups and frees. Aside from the updates tree, the update,
localupdate and commitupdate variables, nothing is manipulated, so
let's create a distinct lock (updt_lock) to protect these together
and remove this contention. This required adding an extra lock in the
few places where we delete the update (though only if we're really
going to delete it) to protect the tree. This is very convenient
because now peer_send_teachmsgs() only needs to take this read lock,
and there is very little contention left on the stick-table.
With this alone, the performance jumped from 614k to 1140k/s on an
80-thread machine with a peers section! Stick-table updates with
no peers, however, now have to take two locks and slightly regressed
from 4.0-4.1M/s to 3.9-4.0M/s. This is fairly minimal compared to the
significant unlocking of the peers updates and considered totally
acceptable.
This function doesn't need to be write-locked. It performs a lookup
of the next update at its index, atomically updates the ref_cnt on
the stksess, updates some shared_table fields on the local thread,
and updates the table's commitupdate. Now that this update is atomic
we don't need to keep the write lock during that period. In addition
this function's callers do not rely on the write lock to be held
either since it was dropped during peer_send_updatemsg() anyway.
Now, when the function is entered with a write lock, it's downgraded
to a read lock, otherwise a read lock is grabbed. Updates are looked
up under the read lock and the message is sent without the lock. The
commitupdate is still performed under the read lock (so as not to
break the code too much), and the write lock is re-acquired when
leaving if needed. This allows multiple peers to look up updates in
parallel and to avoid stalling stick-table lookups.
This function maintains the write lock for a while. In practice it does
not need to hold it that long, and some parts could be performed under a
read lock. This patch first drops then re-acquires the write lock at the
function's entry. The purpose is simply to break the end-to-end atomicity
to prove that it has no impact in case something needs to be bisected
later. In fact the write lock is already dropped while calling
peer_send_updatemsg().
The ->commitupdate index doesn't need to be kept consistent with other
operations, it only needs to be correct and to reflect the last known
value. Right now it's updated under the stick-table lock, which is
expensive and maintains this lock longer than needed. Let's move it
outside of the lock, and update it using a CAS. This patch simply
replaces the assignment with a CAS and makes sure all reads are atomic.
On failed CAS we use a simple cpu_relax(), no need for more as there
should not be that much contention here (updates are not that fast).
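As an illustration, the resulting update could look like the sketch below,
using GCC atomic builtins; the field and helper names are only indicative:

  /* sketch: publish a new ->commitupdate value without the table lock */
  static inline void cpu_relax_sketch(void)
  {
      __asm__ volatile("" ::: "memory");   /* stands for a pause/yield hint */
  }

  static void set_commitupdate_sketch(unsigned int *commitupdate,
                                      unsigned int new_idx)
  {
      unsigned int old = __atomic_load_n(commitupdate, __ATOMIC_RELAXED);

      /* retry until published; contention is expected to remain low */
      while (!__atomic_compare_exchange_n(commitupdate, &old, new_idx, 0,
                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED))
          cpu_relax_sketch();
  }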
The structure currently mixes R/O and R/W fields, let's organize them
by access type, focusing mainly on splitting the updates from the rest
so that peers activity does not affect the rest. For now it doesn't
bring any benefit but it paves the way for splitting the lock.
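For illustration, the intended layout is along these lines; the field names
are simplified placeholders, not the actual struct stktable members:

  struct table_sketch {
      /* mostly read-only after startup */
      char         *id;
      unsigned int  size;
      unsigned int  expire;

      /* read/write by regular traffic (lookups, expiration) */
      void         *keys __attribute__((aligned(64)));
      void         *exps;

      /* read/write by peers/update activity, kept away from the rest */
      void         *updates __attribute__((aligned(64)));
      unsigned int  update;
      unsigned int  localupdate;
      unsigned int  commitupdate;
  };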
Due to the ts->ref_cnt being manipulated and checked inside wrlocks,
we continue to have it updated under plenty of read locks, which have
an important cost on many-thread machines.
This patch turns them all to atomic ops and carefully moves them outside
of locks every time this is possible (see the sketch after this list):
- the ref_cnt is incremented before write-unlocking on creation otherwise
the element could vanish before we can do it
- the ref_cnt is decremented after write-locking on release
- for all other cases it's updated out of locks since it's guaranteed by
the sequence that it cannot vanish
- checks are done before locking every time it's used to decide
whether we're going to release the element (saves several write locks)
- expiration tests are just done using atomic loads, since there's no
particular ordering constraint there, we just want consistent values.
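The rules above roughly translate to the following sketch (GCC atomic
builtins; the table lock helpers and free_entry() are hypothetical
placeholders, and lookups are assumed to take their reference under the
read lock):

  struct sess_sketch { unsigned int ref_cnt; };

  /* hypothetical placeholders for the table lock and entry release */
  void table_wrlock(void);
  void table_wrunlock(void);
  void free_entry(struct sess_sketch *ts);

  /* creation: take the reference before dropping the write lock so that
   * the new element cannot vanish before the caller uses it */
  static void publish_new_entry(struct sess_sketch *ts)
  {
      __atomic_add_fetch(&ts->ref_cnt, 1, __ATOMIC_RELAXED);
      table_wrunlock();
  }

  /* regular release: dropping a reference needs no lock at all */
  static void put_entry(struct sess_sketch *ts)
  {
      __atomic_sub_fetch(&ts->ref_cnt, 1, __ATOMIC_RELAXED);
  }

  /* expiration: check without the lock first to skip the write lock when
   * the entry is obviously still in use, then re-check under it */
  static void expire_entry(struct sess_sketch *ts)
  {
      if (__atomic_load_n(&ts->ref_cnt, __ATOMIC_RELAXED) != 0)
          return;                         /* saves a write lock */
      table_wrlock();
      if (__atomic_load_n(&ts->ref_cnt, __ATOMIC_RELAXED) == 0)
          free_entry(ts);
      table_wrunlock();
  }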
For Lua, the loop that is used to dump stick-tables could switch to read
locks only, but this was not done.
For peers, the loop that builds updates in peer_send_teachmsgs is extremely
expensive in write locks and it doesn't seem this is really needed since
the only updated variables are last_pushed and commitupdate, the first
one being on the shared table (thus not used by other threads) and the
commitupdate could likely be changed using a CAS. Thus all of this could
theoretically move under a read lock, but that was not done here.
On an 80-thread machine with a peers section enabled, the request rate
increased from 415k to 520k rps.
The write lock in stktable_touch_with_exp() is quite expensive and should
be shortened as much as possible. There's no need for it when calling
task_wakeup() so let's move it out.
On an 80-thread machine with a peers section, the request rate increased
from 397k to 415k rps.
The write lock in stktable_requeue_exp() is quite expensive and should
be shortened as much as possible. There's no need for it when calling
task_queue() so let's move it out.
On an 80-thread machine with a peers section, the request rate increased
from 368k to 397k rps.
This updates the local copy of the plock library to benefit from finer
memory ordering, EBO on more operations (such as when take_w() and stow()
wait for readers to leave), refined EBO especially on common operations
such as attempts to upgrade R to S, and the removal of a counter-productive
prior read in rtos() and take_r().
These changes have shown a 5% increase on regular operations on ARM,
a 33% performance increase on ARM on stick-tables and 2% on x86, and
a 14% and 4% improvements on peers updates respectively on ARM and x86.
The availability of relaxed operations will probably be useful for stats
counters which are still extremely expensive to update.
The following plock commits were included in this update:
9db830b plock: support inlining exponential backoff code
008d3c2 plock: make the rtos upgrade faster
2f76dde atomic: clean up the generic xchg()
3c6919b atomic: make sure that the no-return macros do not return a value
97c2bb7 atomic: make the fallback bts use the pointed type for the shift
f4c1880 atomic: also implement the missing pl_btr()
8329b82 atomic: guard all generic definitions to make it easier to provide specific ones
7c5cb62 atomic: use C11 atomics when available
96afaf9 atomic: prefer the C11 definitions in general
f3ec7a6 atomic: implement load/store/atomic barriers
8bdbd1e atomic: add atomic load/stores
0f604c0 atomic: add more _noret operations
3fe35db atomic: remove the (void) cast from the C11 operations
3b08a7c atomic: allow to define the fallback _noret variants
28deb22 atomic: make x86 arithmetic operations the _noret variants
8061fe2 atomic: handle modern compilers that support returning flags
b8b91b7 atomic: add the fetch-and-<op> operations (pl_ld<op>)
59817ca atomic: add memory order variants for most operations
a40774f plock: explicitly make use of the pl_*_noret operations
6f1861b plock: switch to pl_sub_noret_lax() for cancellation
c013980 plock: use pl_ldadd{_lax,_acq,} instead of pl_xadd()
382eea3 plock: use a release ordering when dropping the lock
60d750d plock: use EBO when waiting for readers to leave in take_w() and stow()
fc01c4f plock: improve EBO a little bit
1ef6390 plock: switch to CAS + XADD for pl_take_r()
Michel Mayen reported that mixing lua actions loaded from 'lua-load'
and 'lua-load-per-thread' directives within a single http/tcp session
yields unexpected results.
When executing an action defined in a different running context from the one
of the previously executed action (from lua-load, then from
lua-load-per-thread or the opposite, the order doesn't matter), it would yield
this kind of error:
"Lua function 'name': [state-id x] runtime error: attempt to call a nil value from ."
He also noted that when loading all actions using the same loading
directive, the issue is gone.
This is due to the fact that for lua actions, fetches and converters, lua
code is being executed from the stream lua context. However, the stream
lua context, which is created on the fly when first executing some lua
code related to the stream, is reused between multiple lua executions.
But the thing is, despite successive executions referring to the same
parent "stream" (which is also assigned to a given thread id), they don't
necessarily depend on the same running context from the lua point of view.
Indeed, since the function which is about to be executed could have been
loaded from either 'lua-load' or 'lua-load-per-thread', the function
declaration and related dependencies are defined in a specific stack ID
which is known by calling fcn_ref_to_stack_id() on the given function.
Thus, in order to make streams capable of chaining lua actions, fetches and
converters loaded in different lua stacks, we add a new detection logic
in hlua_stream_ctx_prepare() to be able to recreate the lua context in the
proper stack space when the existing one conflicts with the expected stack
id.
This must be backported to every stable version.
It depends on:
- "MINOR: hlua: add hlua_stream_prepare helper function"
[for < 2.5, skip the filter part since they didn't exist]
[wt: warning, wait a little bit before backporting too far, we
need to be certain the added BUG_ON() will never trigger]
Stream-dedicated hlua ctx creation and attachment is now performed in
hlua_stream_ctx_prepare() helper function to ease code maintenance.
No functional behavior change should be expected.
Multiple error paths made invalid use of lua_pop():
When the stack is emptied using lua_settop(0), lua_pop() (which is
implemented as a lua_settop() macro) should not be used right after,
because it could lead to invalid reads since the stack is already empty.
Unfortunately, some remnants from the initial lua stack implementation kept
doing so, resulting in haproxy crashes on some lua runtime error paths
from time to time (i.e.: ERRRUN, ERRMEM).
Moreover, the extra lua_pop() instruction, even if it were safe, is totally
pointless in such a case.
Remove such unsafe lua_pop() statements when we know that the stack is
already empty.
This must be backported to every stable version.
It is possible to trigger a loop of tasklets calls if a QUIC connection
is interrupted abruptly by the client. This is caused by the following
interaction :
* FD iocb is woken up for read. This causes a wakeup on quic_conn
tasklet.
* quic_conn_io_cb is run and tries to read but fails as the connection
socket is closed (typically with a ECONNREFUSED). FD read is
subscribed to the poller via qc_rcv_buf() which will cause the loop.
The looping will stop automatically once the idle-timeout is expired and
the connection instance is finally released.
To fix this, ensure FD read is subscribed only for transient error cases
(EAGAIN or similar). All other cases are considered fatal, so all future
read operations will fail. Note that for the moment, nothing is reported
on the quic_conn, so future reception attempts are not skipped. This
should be improved in a future commit to accelerate connection closing.
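The distinction between transient and fatal errors on the receive side is
essentially the following sketch (simplified, not the actual qc_rcv_buf()
code; mark_recv_broken() is a hypothetical placeholder for the improvement
mentioned above):

  #include <errno.h>
  #include <sys/socket.h>

  static void rcv_sketch(int fd, void *qc)
  {
      char buf[2048];
      ssize_t ret = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);

      if (ret < 0) {
          if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) {
              fd_cant_recv(fd);    /* transient: wait for the poller */
              return;
          }
          /* fatal (e.g. ECONNREFUSED): do not subscribe for read again,
           * otherwise the tasklet and the poller keep waking each other */
          mark_recv_broken(qc);
          return;
      }
      /* ... handle <ret> received bytes ... */
  }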
This bug can be reproduced quite frequently by interrupting the
following command. QUIC traces should be activated on the haproxy side to
detect the loop:
$ ngtcp2-client --tp-file=/tmp/ngtcp2-tp.txt \
--session-file=/tmp/ngtcp2-session.txt \
-r 0.3 -t 0.3 --exit-on-all-streams-close 127.0.0.1 20443 \
"http://127.0.0.1:20443/?s=1024"
This must be backported up to 2.7.
The tasklet responsible for handling the remaining QUIC connection object
and its traffic was not released, leading to a memory leak. Furthermore its
callback, quic_cc_conn_io_cb(), should return NULL after this tasklet is
released.
->xprt_ctx (struct ssl_sock_ctx) and ->conn (struct connection) must be kept
by the remaining QUIC connection object (struct quic_cc_conn) after having
released the previous one (struct quic_conn) to allow "show fd/sess" commands
to be functional without causing haproxy crashes.
No need to backport.
Reset the local cc_qc and qc after having released cc_qc. Note that
cc_qc == qc. This is required to prevent haproxy from crashing
when TRACE_LEAVE() is called.
No need to backport.
There are cases where the mux is started before the handshake is completed:
during 0-RTT sessions.
So, it was a bad idea to try to release the quic_conn object from quic_conn_io_cb()
without checking if the mux is started.
No need to backport.
This will iterate over all commits in the range passed in RANGE, or all
those from master to RANGE if no ".." exists in RANGE, and run "make all"
with the exact same variables. This aims to ease the verification that
no build failure exists inside a series. In case of error, it prints the
faulty commit and stops there with the tree checked out. Example:
$ make range RANGE=HEAD
Found 14 commit(s) in range master..HEAD.
Current branch is 20230809-plock+tbl+peers-4
Starting to building now...
[ 1/14 ] 392922bc5 #############################
(...)
Done! 14 commit(s) built successfully for RANGE master..HEAD
Maybe in the future it will automatically use HEAD as a default for RANGE
depending on the feedback.
It's not listed in the help target so as not to encourage users to try it
as it can very quickly become confusing due to the checkouts.
Commit 5201b4abd ("BUG/MEDIUM: mux-h1: do not forget EOH even when no
header is sent") introduced a build warning on clang due to the remaining
two parentheses in the expression. Let's fix this. No backport needed.
Since commit 723c73f8a ("MEDIUM: mux-h1: Split h1_process_mux() to make
code more readable"), outgoing H1 requests with no header at all (i.e.
essentially HTTP/1.0 requests) get delayed by 200ms. Christopher found
that it's due to the fact that we end processing too early and we don't
have the opportunity to send the EOH in this case.
This fix addresses it by verifying if it's required to emit EOH when
returning from h1_make_headers(). But maybe that block could be moved
after the while loop in fact, or the stop condition in the loop be
revisited not to stop on !htx_is_empty(). The current solution gets the
job done at least.
No backport is needed, this was in 2.9-dev.
That's a regression introduced in 2.9-dev by commit 723c73f8a ("MEDIUM:
mux-h1: Split h1_process_mux() to make code more readable") and found
by Christopher. The consequence is uncertain but the test definitely was
not right in that it would catch most existing states (H1_MSG_DONE=30).
At least it would emit too many "H1 request fully xferred".
No backport needed.
Ben Kallus also noticed that we preserve leading zeroes on content-length
values. While this is totally valid, it would be safer to at least trim
them before passing the value, because a bogus server written to parse
using "strtol(value, NULL, 0)" could inadvertently take a leading zero
as a prefix for an octal value. While there is not much that can be done
to protect such servers in general (e.g. lack of check for overflows etc),
at least it's quite cheap to make sure the transmitted value is normalized
and not taken for an octal one.
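A minimal sketch of the kind of normalization this describes (not the
actual haproxy code):

  #include <stddef.h>

  /* strip leading zeroes from a content-length value while keeping at
   * least one digit, so "000123" becomes "123" and "0" stays "0" */
  static void trim_leading_zeroes(const char **value, size_t *len)
  {
      while (*len > 1 &&
             (*value)[0] == '0' &&
             (*value)[1] >= '0' && (*value)[1] <= '9') {
          (*value)++;
          (*len)--;
      }
  }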
This is not really a bug, rather a missed opportunity to sanitize the
input, but is marked as a bug so that we don't forget to backport it to
stable branches.
A combined regtest was added to h1or2_to_h1c which already validates
end-to-end syntax consistency on aggregate headers.
The content-length header parser has its dedicated function, in order
to take extreme care about invalid, unparsable, or conflicting values.
But there's a corner case in it, by which it stops comparing values
when reaching the end of the header. This has for a side effect that
an empty value or a value that ends with a comma does not deserve
further analysis, and it acts as if the header was absent.
While this is not necessarily a problem for the value ending with a
comma, as it will cause header folding and will disappear, it is a
problem for the first isolated empty header because this one will not
be reconstructed when next ones are seen, and will be passed as-is to the
backend server. A vulnerable HTTP/1 server hosted behind haproxy that
would just use this first value as "0" and ignore the valid one would
then not be protected by haproxy and could be attacked this way, taking
the payload for an extra request.
In field the risk depends on the server. Most commonly used servers
already have safe content-length parsers, but users relying on haproxy
to protect a known-vulnerable server might be at risk (and the risk of
a bug even in a reputable server should never be dismissed).
A configuration-based work-around consists in adding the following rule
in the frontend, to explicitly reject requests featuring an empty
content-length header that would not have been folded into an existing
one:
http-request deny if { hdr_len(content-length) 0 }
The real fix consists in adjusting the parser so that it always expects a
value at the beginning of the header or after a comma. It will now reject
requests and responses having empty values anywhere in the C-L header.
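A simplified sketch of the stricter rule (a digit sequence is required at
the start and after every comma; optional whitespace and the equality check
between repeated values present in the real parser are omitted here):

  #include <stddef.h>

  static int cl_value_is_valid_sketch(const char *v, size_t len)
  {
      size_t i = 0;

      while (1) {
          size_t digits = 0;

          while (i < len && v[i] >= '0' && v[i] <= '9') {
              i++;
              digits++;
          }
          if (!digits)
              return 0;        /* empty element: reject the message */
          if (i == len)
              return 1;        /* last element parsed, value is acceptable */
          if (v[i] != ',')
              return 0;        /* unexpected character */
          i++;                 /* skip the comma, another value must follow */
      }
  }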
This needs to be backported to all supported versions. Note that the
modification was made to functions h1_parse_cont_len_header() and
http_parse_cont_len_header(). Prior to 2.8 the latter was in
h2_parse_cont_len_header(). One day the two should be fused back together,
but the former is also used by Lua.
The HTTP messaging reg-tests were completed to test these cases.
Thanks to Ben Kallus of Dartmouth College and Narf Industries for
reporting this! (this is in GH #2237).
We indicate in path/pathq/url that they may contain '#' if the frontend
is configured with "option accept-invalid-http-request", and that option
mentions the fragment as well.
This is the h3 version of this previous fix:
BUG/MINOR: h2: reject more chars from the :path pseudo header
In addition to the current NUL/CR/LF, this will also reject all other
control chars, the space and '#' from the :path pseudo-header, to avoid
taking the '#' for a part of the path. It's still possible to fall back
to the previous behavior using "option accept-invalid-http-request".
Here the :path header value is scanned a second time to look for
forbidden chars because we don't know upfront if we're dealing with a
path header field or another one. This is no big deal anyway for now.
This should be progressively backported to 2.6, along with the
following commits it relies on (the same as for h2):
REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests
REORG: http: move has_forbidden_char() from h2.c to http.h
MINOR: ist: add new function ist_find_range() to find a character range
MINOR: http: add new function http_path_has_forbidden_char()
This is the h2 version of this previous fix:
BUG/MINOR: h1: do not accept '#' as part of the URI component
In addition to the current NUL/CR/LF, this will also reject all other
control chars, the space and '#' from the :path pseudo-header, to avoid
taking the '#' for a part of the path. It's still possible to fall back
to the previous behavior using "option accept-invalid-http-request".
This patch modifies the request parser to change the ":path" pseudo header
validation function with a new one that rejects 0x00-0x1F (control chars),
space and '#'. This way such chars will be dropped early in the chain, and
the search for '#' doesn't incur a second pass over the header's value.
This should be progressively backported to stable versions, along with the
following commits it relies on:
REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests
REORG: http: move has_forbidden_char() from h2.c to http.h
MINOR: ist: add new function ist_find_range() to find a character range
MINOR: http: add new function http_path_has_forbidden_char()
MINOR: h2: pass accept-invalid-http-request down the request parser
Seth Manesse and Paul Plasil reported that the "path" sample fetch
function incorrectly accepts '#' as part of the path component. This
can in some cases lead to misrouted requests for rules that would apply
on the suffix:
use_backend static if { path_end .png .jpg .gif .css .js }
Note that this behavior can be selectively configured using
"normalize-uri fragment-encode" and "normalize-uri fragment-strip".
The problem is that while the RFC says that this '#' must never be
emitted, as often it doesn't suggest how servers should handle it. A
diminishing number of servers still do accept it and trim it silently,
while others are rejecting it, as indicated in the conversation below
with other implementers:
https://lists.w3.org/Archives/Public/ietf-http-wg/2023JulSep/0070.html
Looking at logs from publicly exposed servers, such requests appear at
a rate of roughly 1 per million and only come from attacks or poorly
written web crawlers incorrectly following links found on various pages.
Thus it looks like the best solution to this problem is to simply reject
such ambiguous requests by default, and include this in the list of
controls that can be disabled using "option accept-invalid-http-request".
We're already rejecting URIs containing any control char anyway, so we
should also reject '#'.
In the H1 parser for the H1_MSG_RQURI state, there is an accelerated
parser for bytes 0x21..0x7e that has been tightened to 0x24..0x7e (it
should not impact perf since 0x21..0x23 are not supposed to appear in
a URI anyway). This way '#' falls through the fine-grained filter and
we can add the special case for it, also conditioned by a check on the
proxy's option "accept-invalid-http-request", with no overhead for the
vast majority of valid URIs. Here this information is available through
h1m->err_pos that's set to -2 when the option is here (so we don't need
to change the API to expose the proxy). Example with a trivial GET
through netcat:
[08/Aug/2023:16:16:52.651] frontend layer1 (#2): invalid request
backend <NONE> (#-1), server <NONE> (#-1), event #0, src 127.0.0.1:50812
buffer starts at 0 (including 0 out), 16361 free,
len 23, wraps at 16336, error at position 7
H1 connection flags 0x00000000, H1 stream flags 0x00000810
H1 msg state MSG_RQURI(4), H1 msg flags 0x00001400
H1 chunk len 0 bytes, H1 body len 0 bytes :
00000 GET /aa#bb HTTP/1.0\r\n
00021 \r\n
This should be progressively backported to all stable versions along with
the following patch:
REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests
Similar fixes for h2 and h3 will come in followup patches.
Thanks to Seth Manesse and Paul Plasil for reporting this problem with
detailed explanations.
We're adding a new argument "relaxed" to h2_make_htx_request() so that
we can control its level of acceptance of certain invalid requests at
the proxy level with "option accept-invalid-http-request". The goal
will be to add deactivable checks that are still desirable to have by
default. For now no test is subject to it.
As its name implies, this function checks if a path component has any
forbidden chars starting at the designated location. The goal is to
refine the result of a successful ist_find_range() with a more precise
check. Here we're focusing on 0x00-0x1F, 0x20 and 0x23 to make sure
we're not too strict at this point.
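A byte-by-byte illustration of that check (the real function starts from
the position returned by ist_find_range() and may use a different
signature):

  /* return non-zero if [start, end) contains a control char (0x00-0x1F),
   * a space (0x20) or '#' (0x23) */
  static int path_has_forbidden_char_sketch(const char *start, const char *end)
  {
      for (; start < end; start++) {
          unsigned char c = (unsigned char)*start;

          if (c < 0x21 || c == '#')
              return 1;
      }
      return 0;
  }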
This looks up the character range <min>..<max> in the input string and
returns a pointer to the first one found. It's essentially the equivalent
of ist_find_ctl() in that it searches by 32 or 64 bits at once, but deals
with a range.
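A naive byte-wise equivalent of what is described here could look like the
sketch below; the real implementation processes 32 or 64 bits at a time, and
the "not found" return convention is an assumption:

  #include <stddef.h>

  /* return a pointer to the first char in [s, s+len) whose value is in
   * [min, max], or s+len if none is found (assumed convention) */
  static const char *find_range_sketch(const char *s, size_t len,
                                       unsigned char min, unsigned char max)
  {
      const char *end = s + len;

      for (; s < end; s++) {
          unsigned char c = (unsigned char)*s;

          if (c >= min && c <= max)
              return s;
      }
      return end;
  }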
Historically the parsing error used to apply only to too large headers,
so this is what has been reported in traces. But nowadays we can also
reject invalid characters, and when this happens the trace is a bit
misleading, so let's mention "or invalid".
In practice it's exactly the same for h3 as 54f53ef7c ("BUG/MAJOR: h2:
reject header values containing invalid chars") was for h2: we must
make sure never to accept NUL/CR/LF in any header value because this
may be used to construct split headers on the backend. Hence we
apply the same solution. Here pseudo-headers, headers and trailers are
checked separately, which explains why we have 3 locations instead of
2 for h2 (+1 for response which we don't have here).
This is marked major for consistency and due to the impact if abused,
but the reality is that at the time of writing, this problem is limited
by the scarcity of the tools which would permit building such a request
in the first place. But this may change over time.
This must be backported to 2.6. This depends on the following commit
that exposes the filtering function:
REORG: http: move has_forbidden_char() from h2.c to http.h
This function is not H2 specific but rather generic to HTTP. We'll
need it in H3 soon, so let's move it to HTTP and rename it to
http_header_has_forbidden_char().
If the "limited-quic" globale option wa not set, the QUIC listener bindings were
not bound, this is ok, but silently ignored.
Add a warning in these cases to ask the user to explicitely enable the QUIC
bindings when building QUIC support against a TLS/SSL library without QUIC
support (OpenSSL).
Add a check to the QUIC packet handler running at application level (after the
handshake is complete) to release the quic_conn memory by calling
quic_conn_release(). This is done only if the mux is not started.
When the connection enters the "connection closing" state after having sent
a datagram with CONNECTION_CLOSE frames inside its packets, a lot of memory
may be freed from quic_conn objects (QUIC connection). This is done by
allocating a reduced-size object which keeps enough information to handle the
remaining incoming packets for the connection in "connection closing" state,
and to send again the previous datagram with CONNECTION_CLOSE frames inside
which has already been sent.
Define a new quic_cc_conn struct which represents the connection object after
entering the "connection close" state and after having released the quic_conn
connection object.
Define <pool_head_quic_cc_conn> new pool for these quic_cc_conn struct objects.
Define QUIC_CONN_COMMON structure which is shared between quic_conn struct object
(the connection before entering "connection close" state), and new quic_cc_conn
struct object (the connection after entering "connection close"). So, all the
members inside QUIC_CONN_COMMON may be indifferently dereferenced from a
quic_conn struct or a quic_cc_conn struct pointer.
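The sharing pattern is essentially the following (a simplified sketch with
an illustrative field list, relying on C11 anonymous structs; only xprt_ctx
and conn are taken from the description above, the rest is hypothetical):

  /* members common to the full and the reduced connection objects are
   * grouped so that both structs start with the exact same layout */
  #define QUIC_CONN_COMMON                          \
      struct {                                      \
          unsigned int flags;                       \
          struct ssl_sock_ctx *xprt_ctx;            \
          struct connection *conn;                  \
          /* ... other shared members ... */        \
      }

  struct quic_conn {
      QUIC_CONN_COMMON;
      /* full handshake/tx/rx state needed while the connection is active */
  };

  struct quic_cc_conn {
      QUIC_CONN_COMMON;
      /* minimal state kept after entering the "connection close" state */
  };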
Implement qc_new_cc_conn() function to allocate such connections in
"connection close" state. This function is responsible for copying the
required information from the original connection (quic_conn) to the remaining
connection (quic_cc_conn). Among other initializations, it redefines the
QUIC packet handler task to quic_cc_conn_io_cb() and the idle timer task
to qc_cc_idle_timer_task(). quic_cc_conn_io_cb() drains the received packets
and resends the datagram with the CONNECTION_CLOSE frame which has already
been sent when entering the "connection close" state. qc_cc_idle_timer_task()
only releases the remaining quic_cc_conn struct object.
Modify quic_conn_release() to allocate quic_cc_conn struct objects from the
original connection passed as argument. It does nothing if this original
connection is not in closing state, or if the idle timer has already expired.
Implement quic_release_cc_conn() to release a "connection close" connection.
It is called when its timer expires or if an error occurred when sending
a packet from this connection when the peer is no longer reachable.
Add "quic_cids" new pool to allocate the ->cids trees of quic_conn objects.
Replace ->cids member of quic_conn objects by pointer to "quic_cids" and
adapt the code consequently. Nothing special.
Add a new pool <pool_head_quic_cc_buf> for the buffers used when building
datagrams with CONNECTION_CLOSE frames inside, with QUIC_MIN_CC_PKTSIZE (128)
as minimum size.
Add ->cc_buf_area to quic_conn struct to store such buffers.
Add ->cc_dgram_len to store the size of the "connection close" datagrams
and ->cc_buf a buffer struct to be used with ->cc_buf_area as ->area member
value.
Implement qc_get_txb() to be called in place of qc_txb_alloc() to allocate
a struct "quic_cc_buf" buffer when the connection needs an immediate close
or a buffer struct if not.
Modify qc_prep_hpkts() and qc_prep_app_pkts() to allow them to use such
"quic_cc_buf" buffers when an immediate close is required.
Move the rx.bytes, tx.bytes and tx.prep_bytes quic_conn struct members to
a "bytes" anonymous struct (bytes.rx, bytes.tx and bytes.prep members
respectively). They are moved in preparation for being defined into a "bytes"
anonymous struct common to a future struct to be defined.
Consequently adapt the code.
Add a BUG_ON() to quic_peer_validated_addr() to check that the amplification
limit is respected when it returns false (0), i.e. when the connection is not
validated.
Implement quic_may_send_bytes() which returns the number of bytes which may be
sent when the connection has not already been validated, and call this function
at several places when this is the case (after having called
quic_peer_validated_addr()).
Furthermore, this patch improves the code maintainability. Some patches to
come will have to rename ->[rt]x.bytes quic_conn struct members.
When a "replace-header" action is used, we loop on all headers in the
message to change value of all headers matching a name. The new value is
placed in a trash. However, there is a race here because if the message must
be defragmented, another trash is used. If several defragmentation are
performed because several headers must be updated at same time, the first
trash, used to store the new value, may be crushed. Indeed, there are only 2
pre-allocated trash used in rotation. and the trash to store the new value
is never renewed. As consequece, random data may be inserted into the header
value.
Here, to fix the issue, we must take care to refresh the trash buffer when
we evaluated a new header. This way, a trash used for the new value, and
eventually another way for the htx defragmentation. But that's all.
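The idea of the fix can be sketched as below; get_trash_chunk() is haproxy's
rotating trash chunk helper, while the other helpers and the loop itself are
simplified placeholders, not the actual code of http_transform_header_str():

  /* pick a fresh trash chunk for each matching header so that a possible
   * htx defragmentation (which also consumes a rotating trash) cannot
   * recycle the one holding the new value */
  while ((hdr = next_matching_header(htx, name)) != NULL) {
      struct buffer *replace = get_trash_chunk();   /* refreshed each turn */

      build_new_value(replace, hdr, re, fmt);       /* placeholder */
      if (!set_header_value(htx, hdr, replace))     /* may defragment */
          goto fail;
  }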
Thanks to Christian Ruppert for his detailed report.
This patch must be backported to all stable versions. On the 2.0, the patch
must be applied on src/proto_htx.c and the function is named
htx_transform_header_str().