Commit Graph

16773 Commits

Author SHA1 Message Date
Christopher Faulet
7d7df1cf0a BUG/MEDIUM: mux-h1: Be sure xprt support splicing to use it during fast-forward
The commit d6d4abdc3 ("BUILD: mux-h1: Fix build without kernel splicing
support") introduced a regression. The kernel support for the underlying
XPRT is no longer checked. So it is possible to enable the splicing for SSL
connection. This of course leads to a segfault.

This patch restore the test on the xprt rcv_pipe/snd_pipe functions.

This patch should fix a crash reported by Tristan in #2095
(#issuecomment-1788949014). No backport needed.
2023-11-07 18:23:00 +01:00
Amaury Denoyelle
6f9b65f952 BUG/MEDIUM: quic: fix sslconns on quic_conn alloc failure
QUIC connections are accounted inside global sslconns. As with QUIC
actconn, it suffered from a similar issue if an intermediary allocation
failed inside qc_new_conn().

Fix this similarly by moving increment operation inside qc_new_conn().
Increment and error path are now centralized and much easier to
validate.

The consequences are similar to the actconn fix : on memory allocation
global sslconns may wrap, this time blocking any future QUIC or SSL
connections on the process.

This must be backported up to 2.6.
2023-11-07 14:06:02 +01:00
Amaury Denoyelle
a7ba679fe7 BUG/MEDIUM: quic: fix actconn on quic_conn alloc failure
Since the following commit, quic_conn instances are accounted into
global actconn and compared against maxconn.

  commit 7735cf3854
  MEDIUM: quic: count quic_conn instance for maxconn

Increment is always done prior to real allocation to guarantee minimal
resource consumption. Special care is taken to ensure there will always
be one decrement operation for each increment. To help this, decrement
is centralized in quic_conn_release().

This behaves incorrectly in case of an intermediary allocation failure
inside qc_new_conn(). In this case, quic_conn_release() will decrement
actconn. Then, a NULL qc is returned in quic_rx_pkt_retrieve_conn()
which will also decrement the counter on its own error code path.

To properly fix this, actconn incrementation has been moved directly
inside qc_new_conn(). It is thus easier to cover every cases :
* if alloc failure before or on pool_head_quic_conn, actconn is
  decremented manually at the end of qc_new_conn()
* after this step, actconn will be decremented by quic_conn_release()
  either on intermediary alloc failure or on proper connection release

This bug happens on memory allocation failure so it should be rare.
However, its impact is not negligeable as if actconn counter is wrapped
it will block any future connection allocation for both QUIC and TCP.

One small downside of this change is that a CID is now always allocated
before quic_conn even if maxconn will be reached. However, this is
considered as of minor importance compared to a more robust code.

This must be backported up to 2.6.
2023-11-07 13:50:07 +01:00
Christopher Faulet
e5fe2013a9 CLEANUP: htx: Properly indent htx_reserve_max_data() function
Spaces were used instead of tabs to indent htx_reserve_max_data()
function. Let's reindent the whole function.
2023-11-07 10:41:11 +01:00
Christopher Faulet
c57af8ebcd BUG/MINOR: stconn: Sanitize report for read activity
When a EOS or EOI is detected on the endpoint and when the event is reported
at the SC level, a read activity must be reported. It is not really a big
deal because these flags already inhibit any read timeout. But it is
consistent with the <lra> comment. In addition, no read activity is reported
on abort. It is up-down event and it is not an event unblocking the
reads. So there is no reason to report a read activity.

This patch must be backported to 2.8.
2023-11-07 10:41:11 +01:00
Christopher Faulet
08d7169f42 MINOR: stconn: Don't queue stream task in past in sc_notify()
A task must never be queued in past. However, in sc_notify(), the stream
task, if not woken up, is queued. Thanks to previous fixes, the stream task
expiration date should be correct. But to prevent any issue, a BUG_ON() is
added to be sure it never happens. I guess a good idea could be to remove it
or change it to BUG_ON_HOT() for the final release.
2023-11-07 10:32:25 +01:00
Christopher Faulet
4a2660aa45 BUG/MEDIUM: stconn: Don't report rcv/snd expiration date if SC cannot epxire
When receive or send expiration date of a stream-connector is retrieved, we
now automatically check if it may expire. If not, TICK_ETERNITY is returned.

The expiration dates of the frontend and backend stream-connectors are used
to compute the stream expiration date. This operation is performed at 2
places: at the end of process_stream() and in sc_notify() if the stream is
not woken up.

With this patch, there is no special changes for process_stream() because it
was already handled. It make thing a little simpler. However, it fixes
sc_notify() by avoiding to erroneously compute an expiration date in
past. This highly reduce the stream wakeups when there is contention on the
consumer side.

The bug was introduced with the commit 8073094bf ("NUG/MEDIUM: stconn:
Always update stream's expiration date after I/O"). It was an error to
unconditionnaly set the stream expiration data, without testing blocking
conditions on both SC.

This patch must be backported to 2.8.
2023-11-07 10:30:01 +01:00
Christopher Faulet
141b489291 BUG/MEDIUM: stconn: Report send activity during mux-to-mux fast-forward
When data are directly forwarded from a mux to the opposite one, we must not
forget to report send activity when data are successfully sent or report a
blocked send with data are blocked. It is important because otherwise, if
the transfer is quite long, longer than the client or server timeout, an
error may be triggered because the write timeout is reached.

H1, H2 and PT muxes are concerned. To fix the issue, The done_fastword()
callback now returns the amount of data consummed. This way it is possible
to update/reset the FSB data accordingly.

No backport needed.
2023-11-07 10:30:01 +01:00
Tim Duesterhus
d7eaa0d553 CLEANUP: Re-apply xalloc_size.cocci (3)
This reapplies the xalloc_size.cocci patch across the whole `src/` tree.

see 16cc16dd82
see 63ee0e4c01
see 9fb57e8c17
2023-11-06 20:49:56 +01:00
Willy Tarreau
09eacb8b24 BUG/MINOR: server: remove some incorrect free() calls on null elements
In commit 6f4bfed3a ("MINOR: server: Add parser support for
set-proxy-v2-tlv-fmt") a few free() calls were made to an element on
error path when it was detected it was NULL. It doesn't have any
effect, however there was one case of use-after-free at the end of
srv_settings_cpy() that was caught by gcc due to attempting to free
the element after freeing its holder.

No backport is needed.
2023-11-04 08:56:01 +01:00
Willy Tarreau
e16762f8a8 OPTIM: mux-h2: call h2_send() directly from h2_snd_buf()
This allows to eliminate full buffers very quickly and to recycle them
much faster, resulting in higher transfer rates and lower memory usage
at the same time. We just wake the tasklet up if it succeeded so that
h2_process() and friends are called to finalize what needs to.

For regular buffer sizes, the performance level becomes quite close to
the one obtained with the zero-copy mechanism (zero-copy remains much
faster with non-default buffer sizes). The memory savings are huge with
default buffer size: at 64c * 100 streams on a single thread, we used
to forward 4.4 Gbps of traffic using 10400 buffers. After the change,
the performance reaches 5.9 Gbps with only 22-24 buffers, since they
are quickly recycled. That's asaving of 160 MB of RAM.

A concern was an increase in the number of syscalls but this is not
the case, the numbers remained exactly the same before and after.

Some experimentations were made to try to cork data and not send
incomplete buffers, and that always voided these changes. One
explanation might be that keeping a first buffer with only headers
frames is sufficient to prevent a zero-copy of the data coming in
a next snd_buf() call. This still needs to be studied anyway.
2023-11-04 08:34:23 +01:00
Willy Tarreau
0fa5adee3b MINOR: mux-h2: always use h2_send() in h2_done_ff(), not h2_process()
By calling h2_process(), the code would theoretically make it possible
for a synchronous ->wake() call to provoke an indirect call to h2_snd_buf()
while we're in h2_done_ff(), which could be quite bad. The current
conditions do not permit it right now but this could easily break by
accident. Better use h2_send() and wake the task up if needed. Precise
performance tests showed no change.
2023-11-04 08:12:17 +01:00
Willy Tarreau
58185669d8 BUG/MEDIUM: pattern: don't trim pools under lock in pat_ref_purge_range()
There's a subtle issue that results from pat_ref_purge_range() trying
to release memory. Since commit 0d93a8186 ("MINOR: pools: work around
possibly slow malloc_trim() during gc") that was backported to 2.3,
trim_all_pools() now protects itself against concurrent malloc() and
free() by isolating itself. The problem is that pat_ref_purge_range()
must be called under a lock, which is precisely what's done in
cli_io_handler_clear_map(). Thus during a clearing of a map, if
another thread tries to access or update an entry in the same map, it
will wait for the ref->lock to be released, and trim_all_pools() will
wait for all threads to be harmless, thus causing a deadlock. Note
that disabling memory trimming cannot work around the problem here
because it's tested only under isolation.

The solution here consists in moving the call to trim_all_pools() to
the caller, out of the lock.

This must be backported as far as 2.4.
2023-11-04 07:55:37 +01:00
Alexander Stephan
ce7501de79 MINOR: connection: Send out generic, user-defined server TLVs
To follow-up the implementation of the new set-proxy-v2-tlv-fmt
keyword in the server, the connection is updated to use the previously
allocated TLVs. If no value was specified, we send out an empty TLV.
As the feature is fully working with this commit, documentation and a
test for the server and default-server are added as well.
2023-11-04 04:56:59 +01:00
Alexander Stephan
6f4bfed3a2 MINOR: server: Add parser support for set-proxy-v2-tlv-fmt
This commit introduces a generic server-side parsing of type-value pair
arguments and allocation of a TLV list via a new keyword called
set-proxy-v2-tlv-fmt.

This allows to 1) forward any TLV type with the help of fc_pp_tlv,
2) generally, send out any TLV type and value via a log format expression.
To have this fully working the connection will need to be updated in
a follow-up commit to actually respect the new server TLV list.

default-server support has also been implemented.
2023-11-04 04:56:59 +01:00
Aurelien DARRAGON
5158c0ff69 MEDIUM: stktable/peers: "write-to" local table on peer updates
In this patch, we add the possibility to declare on a table definition
("table" in peer section, or "stick-table" in proxy section) that we
want the remote/peer updates on that table to be pushed on a local
haproxy table in addition to the source table.

Consider this example:

  |peers mypeers
  |        peer local 127.0.0.1:3334
  |        peer clust 127.0.0.1:3333
  |        table t1.local type string size 10m store server_id,server_key expire 30s
  |        table t1.clust type string size 10m store server_id,server_key write-to mypeers/t1.local expire 30s

With this setup, we consider haproxy uses t1.local as cache/local table
for read and write operations, and that t1.clust is a remote table
containing datas processed from t1.local and similar tables from other
haproxy peers in a cluster setup. The t1.clust table will be used to
refresh the local/cache one via the "write-to" statement.

What will happen, is that every time haproxy will see entry updates for
the t1.clust table: it will overwrite t1.local table with fresh data and
will update the entry expiration timer. If t1.local entry doesn't exist
yet (key doesn't exist), it will automatically create it. Note that only
types that cannot be used for arithmetic ops will be handled, and this
to prevent processed values from the remote table from interfering with
computations based on values from the local table. (ie: prevent
cumulative counters from growing indefinitely).

"write-to" will only push supported types if they both exist in the source
and the target table. Be careful with server_id and server_key storage
because they are often declared implicitly when referencing a table in
sticking rules but it is required to declare them explicitly for them to
be pushed between a remote and a local table through "write-to" option.

Also note that the "write-to" target table should have the same type as
the source one, and that the key length should be strictly equal,
otherwise haproxy will raise an error due to the tables being
incompatibles. A table that is already being written to cannot be used
as a source table for a "write-to" target.

Thanks to this patch, it will now be possible to use sticking rules in
peer cluster context by using a local table as a local cache which
will be automatically refreshed by one or multiple remote table(s).

This commit depends on:
 - "MINOR: stktable: stktable_init() sets err_msg on error"
 - "MINOR: stktable: check if a type should be used as-is"
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
db0cb54f81 MINOR: stktable: check if a type should be used as-is
stick table types now have an extra bit named 'as_is' that allows us to
check if such type should be used as-is or if it may be involved in
arithmetic operations such as counters. This can be useful since those
types are not common and may require specific handling.

e.g.: stktable_data_types[data_type].as_is will be set to 1 if the type
cannot be used in arithmetic operations.
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
b8c19f877a MINOR: stktable: stktable_init() sets err_msg on error
stktable_init() now sets err_msg when error occurs so that caller is able
to precisely report the cause of the failure.
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
b6a9eca88d BUG/MINOR: cfgparse/stktable: fix error message on stktable_init() failure
As a result of copy paste error in 1b8e68e ("MEDIUM: stick-table: Stop
handling stick-tables as proxies."), postparsing stktable_init() failures
were reported as such for named peer tables:

   "Proxy 'table_name': failed to initialize stick table."

Now they are correctly reported like this:

   "Parsing [file:line]: failed to initialize 'table_name' stick-table."

This should be backported to every stable versions.
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
6376fe9142 BUG/MINOR: stktable: missing free in parse_stick_table()
When "peers" keyword is encountered within a stick table definition,
peers.name hint gets replaced with a new copy of the provided name using
strdup(). However, there is no detection on whether the name was
previously set or not, so it is currently allowed to reuse the keyword
multiple time to overwrite previous value, but here we forgot to free
previous value for peers.name before assigning it to a new one.

This should be backported to every stable versions.
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
b9c0b039c8 MINOR: proxy/stktable: add resolve_stick_rule helper function
Simplify stick and store sticktable proxy rules postparsing by adding
a sticking rule entry resolve (postparsing) function.

This will ease code maintenance.
2023-11-03 17:30:30 +01:00
Amaury Denoyelle
d82a6d93e2 BUG/MINOR: proto_reverse_connect: support SNI on active connect
SNI may be specify on a server line for connecting to the remote host.
This requires to manually set it on the connection via
ssl_sock_set_servername().

This step was missing when a server line was used for active reverse
HTTP. Fix this by adding the missing ssl_sock_set_servername()
invocation inside new_reverse_conn().

Note that for the moment, no session is instantiated to carry active
reverse connection. A direct consequence of this is that SNI sample
retrieval may crash depending if it depends on session parameters. This
should be fixed by a later commit. In the meantime, this patch is
sufficient to support simple SNI value such as constant expressions.

No need to backport.
2023-11-03 11:11:44 +01:00
Ruei-Bang Chen
7a1ec235cd MINOR: sample: Add fetcher for getting all cookie names
This new fetcher can be used to extract the list of cookie names from
Cookie request header or from Set-Cookie response header depending on
the stream direction. There is an optional argument that can be used
as the delimiter (which is assumed to be the first character of the
argument) between cookie names. The default delimiter is comma (,).

Note that we will treat the Cookie request header as a semi-colon
separated list of cookies and each Set-Cookie response header as
a single cookie and extract the cookie names accordingly.
2023-11-03 09:57:06 +01:00
Christopher Faulet
c72ab1cc6d BUG/MINOR: tcpcheck: Report hexstring instead of binary one on check failure
When an expect rule failed for a tcp-check, information about the expect
rule is dumped in the report. For a check on a binary string, a hexstring is
used in the configuration but the decoded string is dumped. It is an problem
because it can contain special characters. And it is not really handy
because there is no correspondance with the config.

So, now, the hexstring is dumped in the report. This way, we are sure there
is no special characters and it is easy to find it in the configuration.

This patch shoudl solve the issue #2326. It must be backported as far as
2.2.
2023-10-31 08:02:44 +01:00
William Lallemand
e7bae7a0b6 BUG/MEDIUM: ssl: segfault when cipher is NULL
The patch which fixes the certificate selection uses
SSL_CIPHER_get_id() to skip the SCSV ciphers without checking if cipher
is NULL. This patch fixes the issue by skipping any NULL cipher in the
iteration.

Problem was reported in #2329.

Need to be backported where 23093c72f1 was
backported. No release was made with this patch so the severity is
MEDIUM.
2023-10-30 18:08:16 +01:00
Amaury Denoyelle
47ed1181f2 BUG/MINOR: mux-quic: fix early close if unset client timeout
When no client timeout is defined in the configuration, QCC timeout task
is never allocated. However, a NULL timeout task is also used as a
criteria in qcc_is_dead() to consider that the MUX instance should be
released as timeout stroke earlier.

This bug causes every connection to be closed by haproxy side with a
CONNECTION_CLOSE. This is notable when using several streams per
connection with only the first stream completed and the others failed.

To fix this, change timeout task allocation policy. It is now always
allocated. This means that if no timeout is defined, it will never be
run. This is not considered a waste of resource as no timeout in the
configuration is considered as an exception case. However, this has the
advantage to simplify the rest of the code which can now check for the
task instance without having an extra check on the timeout value.

This bug is labelled as minor as it only occurs if no timeout client is
defined which reports warning on startup as it may caused unexpected
behavior.

This bug should be backported up to 2.6.
2023-10-27 17:51:08 +02:00
William Lallemand
23093c72f1 BUG/MINOR: ssl: suboptimal certificate selection with TLSv1.3 and dual ECDSA/RSA
When using TLSv1.3, the signature algorithms extension is used to chose
the right ECDSA or RSA certificate.

However there was an old test for previous version of TLS (< 1.3) which
was testing if the cipher is compatible with ECDSA when an ECDSA
signature algorithm is used. This test was relying on
SSL_CIPHER_get_auth_nid(cipher) == NID_auth_ecdsa to verify if the
cipher is still good.

Problem is, with TLSv1.3, all ciphersuites are compatible with any
authentication algorithm, but SSL_CIPHER_get_auth_nid(cipher) does not
return NID_auth_ecdsa, but NID_auth_any.

Because of this, with TLSv1.3 when both ECDSA and RSA certificates are
available for a domain, the ECDSA one is not chosen in priority.

This patch also introduces a test on the cipher IDs for the signaling
ciphersuites, because they would always return NID_auth_any, and are not
relevent for this selection.

This patch fixes issue #2300.

Must be backported in all stable versions.
2023-10-26 19:17:13 +02:00
Amaury Denoyelle
4a89dba6d5 MEDIUM: quic: count quic_conn for global sslconns
Similar to the previous commit which check for maxconn before allocating
a QUIC connection, this patch checks for maxsslconn at the same step.
This is necessary as a QUIC connection cannot run without a SSL context.

This should be backported up to 2.6. It relies on the following patch :
  "BUG/MINOR: ssl: use a thread-safe sslconns increment"
2023-10-26 15:35:58 +02:00
Amaury Denoyelle
7735cf3854 MEDIUM: quic: count quic_conn instance for maxconn
Increment actconn and check maxconn limit when a quic_conn is
instantiated. This is necessary because prior to this patch, quic_conn
instances where not counted. Global actconn was only incremented after
the handshake has been completed and the connection structure is
allocated.

The increment is done using increment_actconn() on INITIAL packet
parsing if a new connection is about to be created. If the limit is
reached, the allocation is cancelled and the INITIAL packet is dropped.

The decrement is done under quic_conn_release(). This means that
quic_cc_conn instances are not taken into account. This seems safe
enough because quic_cc_conn are only used for minimal usage.

The counterpart of this change is that maxconn must not be checked a
second time when listener_accept() is done over a QUIC connection. For
this, a new bind_conf flag BC_O_XPRT_MAXCONN is set for listeners when
maxconn is already counted by the lower layer. For the moment, it is
positionned only for QUIC listeners.

Without this patch, haproxy process could suffer from heavy memory/CPU
load if the number of concurrent handshake is high.

This patch is not considered a bug fix per-se. However, it has a major
benefit to protect against too many QUIC handshakes. As such, it should
be backported up to 2.6. For this, it relies on the following patch :
  "MINOR: frontend: implement a dedicated actconn increment function"
2023-10-26 15:35:56 +02:00
Amaury Denoyelle
350f8b0c07 BUG/MINOR: ssl: use a thread-safe sslconns increment
Each time a new SSL context is allocated, global.sslconns is
incremented. If global.maxsslconn is reached, the allocation is
cancelled.

This procedure was not entirely thread-safe due to the check and
increment operations conducted at different stage. This could lead to
global.maxsslconn slightly exceeded when several threads allocate SSL
context while sslconns is near the limit.

To fix this, use a CAS operation in a do/while loop. This code is
similar to the actconn/maxconn increment for connection.

A new function increment_sslconn() is defined for this operation. For
the moment, only SSL code is using it. However, it is expected that QUIC
will also use it to count QUIC connections as SSL ones.

This should be backported to all stable releases. Note that prior to the
2.6, sslconns was outside of global struct, so this commit should be
slightly adjusted.
2023-10-26 15:25:07 +02:00
Amaury Denoyelle
fffd435bbd MINOR: frontend: implement a dedicated actconn increment function
When a new frontend connection is instantiated, actconn global counter
is incremented. If global maxconn value is reached, the connection is
cancelled. This ensures that system limit are under control.

Prior to this patch, the atomic check/increment operations were done
directly into listener_accept(). Move them in a dedicated function
increment_actconn() in frontend module. This will be useful when QUIC
connections will be counted in actconn counter.
2023-10-26 15:18:48 +02:00
Amaury Denoyelle
fe29dba872 BUG/MINOR: quic: do not consider idle timeout on CLOSING state
When entering closing state, a QUIC connection is maintained during a
certain delay. The principle is to ensure the other peer has received
the CONNECTION_CLOSE frame. In case of packet duplication/reordering,
CONNECTION_CLOSE is reemitted.

QUIC RFC recommends to use at least 3 times the PTO value. However,
prior to this patch, haproxy used instead the max value between 3 times
the PTO and the connection idle timeout. In the default case, idle
timeout is set to 30s which is in most of the times largely superior to
the PTO. This has the downside of keeping the connection in memory for
too long whereas all resources could be released much earlier.

Fix this behavior by using 3 times the PTO on closing or draining state.
This value is limited up to 1s. This ensures that most of connections
are covered by this. If a connection runs with a very high RTT, it must
not impact the whole process and should be released in a reasonable
delay.

This should be backported up to 2.6.
2023-10-26 15:14:36 +02:00
Willy Tarreau
96bb99a87d DEBUG: pools: detect that malloc_trim() is in progress
Now when calling ha_panic() with a thread still under malloc_trim(),
we'll set a new tainted flag to easily report it, and the output
trace will report that this condition happened and will suggest to
use no-memory-trimming to avoid it in the future.
2023-10-25 15:48:02 +02:00
Willy Tarreau
26a6481f00 DEBUG: lua: add tainted flags for stuck Lua contexts
William suggested that since we can detect the presence of Lua in the
stack, let's combine it with stuck detection to set a new pair of flags
indicating a stuck Lua context and a stuck Lua shared context.

Now, executing an infinite loop in a Lua sample fetch function with
yield disabled crashes with tainted=0xe40 if loaded from a lua-load
statement, or tainted=0x640 from a lua-load-per-thread statement.

In addition, at the end of the panic dump, we can check if Lua was
seen stuck and emit recommendations about lua-load-per-thread and
the choice of dependencies depending on the presence of threads
and/or shared context.
2023-10-25 15:48:02 +02:00
Willy Tarreau
46bbb3a33b DEBUG: add a tainted flag when ha_panic() is called
This will make it easier to know that the panic function was called,
for the occasional case where the dump crashes and/or the stack is
corrupted and not much exploitable. Now at least it will be sufficient
to check the tainted value to know that someone called ha_panic(), and
it will also be usable to condition extra analysis.
2023-10-25 15:48:02 +02:00
Aurelien DARRAGON
1822e8998b MINOR: server: add helper function to detach server from proxy list
Remove some code duplication by introducing a basic helper function
to detach a server from its parent proxy. It is supported to call
the function even if the server is not yet listed in the proxy list.

If the server is not yet listed in the proxy, the function will do
nothing. In delete_server(), we previously performed some BUG_ON()
to ensure that the detach always succeeded given that we were certain
that the server was in the proxy list because it was retrieved through
get_backend_server().

However this test is superfluous, we can safely assume that the operation
will always succeed if get_backend_server() returned != NULL (we're under
full thread isolation), and if it's not the case, then we have a bigger
API issue anyway..
2023-10-25 11:59:27 +02:00
Aurelien DARRAGON
e128fc7ce1 BUG/MEDIUM: server: "proto" not working for dynamic servers
In 304672320e ("MINOR: server: support keyword proto in 'add server' cli")
improper use of conn_get_best_mux_entry() function was made:

First, server's proxy mode was directly passed as "proto_mode" argument
to conn_get_best_mux_entry(), but this is strictly invalid because while
there is some relationship between proto modes and proxy modes, they
don't use the same storage mechanism and cannot be used interchangeably.

Because of this bug, conn_get_best_mux_entry() would not work at all for
TCP because PR_MODE_TCP equals 0, where PROTO_MODE_TCP normally equals 1.

Then another, less sensitive bug, remains:
as its name and description implies, conn_get_best_mux_entry() will try
its best to return something to the user, only using keyword (mux_proto)
input as an hint to return the most relevant mux within the list of
mux that are compatibles with proto_side and proto_mode values.

This means that even if mux_proto cannot be found or is not available
with current proto_side and proto_mode values, conn_get_best_mux_entry()
will most probably fallback to a more generic mux.

However in cli_parse_add_server(), we directly check the result of
conn_get_best_mux_entry() and consider that it will return NULL if the
provided keyword hint for mux_proto cannot be found. This will result in
the function not raising errors as expected, because most of the times if
the expected proto cannot be found, then we'll silently switch to the
fallback one, despite the user providing an explicit proto.

To fix that, we store the result of conn_get_best_mux_entry() to compare
the returned mux proto name with the one we're expecting to get, as it
is originally performed in cfgparse during initial server keyword parsing.

This patch depends on
 - "MINOR: connection: add conn_pr_mode_to_proto_mode() helper func")

It must be backported up to 2.6.
2023-10-25 11:59:27 +02:00
Aurelien DARRAGON
66795bd721 MINOR: connection: add conn_pr_mode_to_proto_mode() helper func
This function allows to safely map proxy mode to corresponding proto_mode

This will allow for easier code maintenance and prevent mixups between
proxy mode and proto mode.
2023-10-25 11:59:27 +02:00
Aurelien DARRAGON
29b76cae47 BUG/MEDIUM: server/log: "mode log" after server keyword causes crash
In 9a74a6c ("MAJOR: log: introduce log backends"), a mistake was made:
it was assumed that the proxy mode was already known during server
keyword parsing in parse_server() function, but this is wrong.

Indeed, "mode log" can be declared late in the proxy section. Due to this,
a simple config like this will cause the process to crash:

   |backend test
   |
   |  server name 127.0.0.1:8080
   |  mode log

In order to fix this, we relax some checks in _srv_parse_init() and store
the address protocol from str2sa_range() in server struct, then we set-up
a postparsing function that is to be called after config parsing to
finish the server checks/initialization that depend on the proxy mode
to be known. We achieve this by checking the PR_CAP_LB capability from
the parent proxy to know if we're in such case where the effective proxy
mode is not yet known (it is assumed that other proxies which are implicit
ones don't provide this possibility and thus don't suffer from this
constraint).

Only then, if the capability is not found, we immediately perform the
server checks that depend on the proxy mode, else the check is postponed
and it will automatically be performed during postparsing thanks to the
REGISTER_POST_SERVER_CHECK() hook.

Note that we remove the SRV_PARSE_IN_LOG_BE flag because it was introduced
in the above commit and it is no longer relevant.

No backport needed unless 9a74a6c gets backported.
2023-10-25 11:59:27 +02:00
Amaury Denoyelle
f76e94d231 MINOR: backend: refactor insertion in avail conns tree
Define a new function srv_add_to_avail_list(). This function is used to
centralize connection insertion in available tree. It reuses a BUG_ON()
statement to ensure the connection is not present in the idle list.
2023-10-25 10:33:06 +02:00
Amaury Denoyelle
394bd4eb39 BUG/MAJOR: backend: fix idle conn crash under low FD
Since the following commit, idle conns are stored in a list as secondary
storage to retrieve them in usage order :
  5afcb686b9
  MAJOR: connection: purge idle conn by last usage

The list usage has been extended wherever connections lookup are done
both on idle and safe trees. This reduced the code size by replacing a
two tree loops by a single list loop.

LIST_ELEM() is used in this context to retrieve the first idle list
element from the server list head. However, macro usage was wrong due to
an extra '&' operator which returns an invalid connection reference.
This will most of the time caused a crash on conn_delete_from_tree() or
affiliated functions.

This bug only occurs if the FD pool is exhausted and some idle
connections are selected to be killed.

It can be reproduced using the following config and h2load command :
$ h2load -t 8 -c 800 -m 10 -n 800 "http://127.0.0.1:21080/?s=10k"

global
	maxconn 100

defaults
	mode http
	timeout connect 20s
	timeout client  20s
	timeout server  20s

listen li
	bind :21080 proto h2
	server nginx 127.99.0.1:30080 proto h1

This bug has been introduced by the above commit. Thus no need to
backport this fix.

Note that LIST_ELEM() macro usage was slightly adjusted also in
srv_migrate_conns_to_remove(). The function used toremove_list instead
of idle_list connection list element. This is not a bug as they are
stored in the same union. However, the new code is clearer as it intends
to move connection from the idle_list only into the toremove_list
mt-list.
2023-10-25 10:30:45 +02:00
Amaury Denoyelle
b9fbbaf2a8 BUG/MINOR: backend: fix wrong BUG_ON for avail conn
Idle connections are both stored in an idle/safe tree and in an idle
list. The list is used as a secondary storage to be able to retrieve
them by usage order.

If a connection is moved into the available tree, it must not be present
in the idle list. A BUG_ON() was written to check this but was placed at
the wrong code section. Fix this by removing the misplaced one and write
new ones for avail_conns tree insertion and lookup.

The impact of this bug is minor as the misplaced BUG_ON() did not seem
to be triggered.

No need to backport.
2023-10-25 10:11:04 +02:00
Tristan
8da0e45382 MINOR: lua: change tune.lua.log.stderr default from 'on' to 'auto'
After making it configurable in previous commit "MINOR: lua: Add flags
to configure logging behaviour", this patch changes the default value
of tune.lua.log.stderr from 'on' (unconditionally forward LUA logs to
stderr) to 'auto' (only forward LUA logs to stderr if logging via a
standard logger is disabled, or none is configured for the current context)

Since this is a change in behaviour, it shouldn't be backported
2023-10-25 07:49:03 +02:00
Tristan
97dacbbb86 MINOR: lua: Add flags to configure logging behaviour
Until now, messages printed from LUA log functions were sent both to
the any logger configured for the current proxy, and additionally to
stderr (in most cases)

This introduces two flags to configure LUA log handling:
- tune.lua.log.loggers to use standard loggers or not
- tune.lua.log.stderr to use stderr, or not, or only conditionally

This addresses github feature request #2316

This can be backported to 2.8 as it doesn't change previous behaviour.
2023-10-25 07:48:48 +02:00
William Lallemand
b12613f0ac BUG/MINOR: ssl: load correctly @system-ca when ca-base is define
The configuration parser still adds the 'ca-base' directory when loading
the @system-ca, preventing it to be loaded correctly.

This patch fixes the problem by not adding the ca-base when a file
starts by '@'.

Fix issue #2313.

Must be backported as far as 2.6.
2023-10-23 22:03:55 +02:00
Willy Tarreau
380f115a4a BUG/MINOR: mux-h2: update tracked counters with req cnt/req err
Originally H2 would transfer everything to H1 and parsing errors were
handled there, so that if there was a track-sc rule in effect, the
counters would be updated as well. As we started to add more and more
HTTP-compliance checks at the H2 layer, then switched to HTX, we
progressively lost this ability. It's a bit annoying because it means
we will not maintain accurate error counters for a given source, for
example.

This patch adds the calls to session_inc_http_req_ctr() and
session_inc_http_err_ctr() when needed (i.e. when failing to parse
an HTTP request since all other cases are handled by the stream),
just like mux-h1 does. The same should be done for mux-h3 by the
way.

This can be backported to recent stable versions. It's not exactly a
bug, rather a missing feature in that we had never updated this counter
for H2 till now, but it does make sense to do it especially based on
what the doc says about its usage.
2023-10-20 21:09:12 +02:00
Willy Tarreau
250b630fb9 BUG/MINOR: mux-h2: commit the current stream ID even on reject
The H2 spec says that a HEADERS frame turns an idle stream to the open
state, and it may then turn to half-closed(remote) on ES, then to close,
all at once, if we respond with RST (e.g. on error). Due to the fact that
we process a complete frame at once since h2_dec_hdrs() may reassemble
CONTINUATION frames until everything is complete, the state was only
committed after the frame was completley valid (otherwise multiple passes
could result in subsequent frames being rejected as the stream ID would
be equal to the highest one).

However this is not correct because it means that a client may retry on
the same ID as a previously failed one, which technically is forbidden
(for example the client couldn't know which of them a WINDOW_UPDATE or
RST_STREAM frame is for).

In practice, due to the error paths, this would only be possible when
failing to decode HPACK while leaving the HPACK stream intact, thus
when the valid decoded HPACK stream cannot be turned into a valid HTTP
representation, e.g. when the resulting headers are too large for example.
The solution to avoid this consists in committing the stream ID on this
error path as well. h2spec continues to be happy.

Thanks to Annika Wickert and Tim Windelschmidt for reporting this issue.

This fix must be backported to all stable versions.
2023-10-20 21:09:12 +02:00
Willy Tarreau
08f3bb5bd5 MINOR: mux-h2/traces: clarify the "rejected H2 request" event
In h2_frt_handle_headers() all failures lead to a generic message saying
"rejected H2 request". It's quite inexpressive while there are a few
distinct tests that are made before jumping there:

  - trailers on closed stream
  - unparsable request
  - refused stream

Let's emit the traces from these call points instead so that we get more
info about what happened. Since these are user-level messages, we take
care of keeping them aligned as much as possible.

For example before it would say:

  [04|h2|1|mux_h2.c:2859] rejected H2 request : h2c=0x7f5d58036fd0(F,FRE)
  [04|h2|5|mux_h2.c:2860] h2c_frt_handle_headers(): leaving on error : h2c=0x7f5d58036fd0(F,FRE) dsi=1 h2s=0x9fdb60(0,CLO)

And now it says:

  [04|h2|1|mux_h2.c:2817] rcvd unparsable H2 request : h2c=0x7f55f8037160(F,FRH) dsi=1 h2s=CLO
  [04|h2|5|mux_h2.c:2875] h2c_frt_handle_headers(): leaving on error : h2c=0x7f55f8037160(F,FRE) dsi=1 h2s=CLO
2023-10-20 21:09:12 +02:00
Willy Tarreau
1deac6f99a MINOR: mux-h2/traces: explicitly show the error/refused stream states
Sometimes it's unclear whether a stream is still open or closed when
certain traces are emitted, for example when the stream was refused,
because the reported pointer and ID in fact correspond to the refused
stream. And for closed streams, no pointer/name is printed, leaving
some confusion about the state. This patch makes the situation easier
to analyse by explicitly reporting "h2s=CLO" on closed/error/refused
streams so that we don't waste time comparing pointers and we instantly
know the stream is closed. Now instead of emitting:

   [03|h2|5|mux_h2.c:2874] h2c_frt_handle_headers(): leaving on error : h2c=0x7fdfa8026820(F,FRE) dsi=201 h2s=0x9fdb60(0,CLO)

It will emit:

   [03|h2|5|mux_h2.c:2874] h2c_frt_handle_headers(): leaving on error : h2c=0x7fdfa8026820(F,FRE) dsi=201 h2s=CLO
2023-10-20 21:09:12 +02:00
Jens Popp
f66b9f6018 MINOR: sample: Added support for Arrays in sample_conv_json_query in sample.c
Method now returns the content of Json Arrays, if it is specified in
Json Path as String. The start and end character is a square bracket. Any
complex object in the array is returned as Json, so that you might get Arrays
of Array or objects. Only recommended for Arrays of simple types (e.g.,
String or int) which will be returned as CSV String. Also updated
documentation and fixed issue with parenthesis and other changes from
comments.

This patch was discussed in issue #2281.

Signed-off-by: William Lallemand <wlallemand@haproxy.com>
2023-10-20 18:42:05 +02:00
Amaury Denoyelle
f70cf28539 MINOR: listener: forbid most keywords for reverse HTTP bind
Reverse HTTP bind is very specific in that in rely on a server to
initiate connection. All connection settings are defined on the server
line and ignored from the bind line.

Before this patch, most of keywords were silently ignored. This could
result in a configuration from doing unexpected things from the user
point of view. To improve this situation, add a new 'rhttp_ok' field in
bind_kw structure. If not set, the keyword is forbidden on a reverse
bind line and will cause a fatal config error.

For the moment, only the following keywords are usable with reverse bind
'id', 'name' and 'nbconn'.

This change is safe as it's already forbidden to mix reverse and
standard addresses on the same bind line.
2023-10-20 17:28:08 +02:00
Amaury Denoyelle
e05edf71df MINOR: cfgparse: rename "rev@" prefix to "rhttp@"
'rev@' was used to specify a bind/server used with reverse HTTP
transport. This notation was deemed not explicit enough. Rename it
'rhttp@' instead.
2023-10-20 14:44:37 +02:00
Amaury Denoyelle
9d4c7c1151 MINOR: server: convert @reverse to rev@ standard format
Remove the recently introduced '@reverse' notation for HTTP reverse
servers. Instead, reuse the 'rev@' prefix already defined for bind
lines.
2023-10-20 14:44:37 +02:00
Amaury Denoyelle
3222047a14 MINOR: listener: add nbconn kw for reverse connect
Previously, maxconn keyword was reused for a specific usage on reverse
HTTP binds to specify the number of active connect to proceed. To avoid
confusion, introduce a new dedicated keyword 'nbconn' which is specific
to reverse HTTP bind.

This new keyword is forbidden for non-reverse listener. A fatal error is
emitted during config parsing if this rule is not respected. It's safe
because it's also forbidden to mix standard and reverse addresses on the
same bind line.

Internally, nbconn value will be reassigned to 'maxconn' member of
bind_conf structure. This ensures that listener layer will automatically
reenable the preconnect task each time a connection is closed.
2023-10-20 14:44:37 +02:00
Amaury Denoyelle
37d7e52cc6 MINOR: cfgparse: forbid mixing reverse and standard listeners
Reverse HTTP listeners are very specific and share only a very limited
subset of keywords with other listeners. As such, it is probable
meaningless to mix standard and reverse addresses on the same bind line.
This patch emits a fatal error during configuration parsing if this is
the case.
2023-10-20 14:44:37 +02:00
Christopher Faulet
60e7116be0 BUG/MEDIUM: peers: Fix synchro for huge number of tables
The number of updates sent at once was limited to not loop too long to emit
updates when the buffer size is huge or when the number of sync tables is
huge. The limit can be configured and is set to 200 by default. However,
this fix introduced a bug. It is impossible to syncrhonize two peers if the
number of tables is higher than this limit. Thus by default, it is not
possible to sync two peers if there are more than 200 tables to sync.

Technically speacking, a teaching process is finished if we loop on all tables
with no new update messages sent. Because we are limited at each call, the loop
is splitted on several calls. However the restart point for the next loop is
always the last table for which we emitted an update message. Thus with more
tables than the limit, the loop never reachs the end point.

Worse, in conjunction with the bug fixed by "BUG/MEDIUM: peers: Be sure to
always refresh recconnect timer in sync task", it is possible to trigger the
watchdog because the applets may be woken up in loop and leave requesting
more room while its buffer is empty.

To fix the issue, restart conditions for a teaching loop were changed. If
the teach process is interrupted, we now save the restart point, called
stop_local_table. It is the last evaluated table on the previous loop. This
restart point is reset when the teach process is finished.

In additionn, the updates_sent variable in peer_send_msgs() was renamed to
updates to avoid ambiguities. Indeed, the variable is incremented, whether
messages were sent or not.

This patch must be backported as far as 2.6.
2023-10-20 14:32:12 +02:00
Christopher Faulet
cebeab3d20 BUG/MEDIUM: peers: Be sure to always refresh recconnect timer in sync task
A sync task used to manage reconnect, sessions creation or shutdown and data
synchronization is responsible to refresh reconnect and heartbeat timers for
each remote peers and trigger applets wakeup. These timers are used to
refresh the sync task timeer itself. Thus it is important to take care to
always properly refresh them.

However, when there are some data to push, the reconnect timer is not
checked. It may be expired and not refreshed. In this case, an expired timer
may be used to the sync task, leading to a storm of wakeups. The sync task
is woken up in loop because its timer is in the past, waking up Peer applets
at each time.

To fix the issue, the peer's reconnect timer is now refresh to the default
reconnect timeout, if necessary, when there are some data to push.

This patch must be backported to all stable versions.
2023-10-19 15:26:43 +02:00
Willy Tarreau
f08322b56c BUG/MINOR: trace: fix trace parser error reporting
Since traces were adapted to support being declared in the global section
in 2.7 with commit c11f1cdf4 ("MINOR: trace: split the CLI "trace" parser
in CLI vs statement"), the method used to return the error message was
unreliable. For example an invalid sink name in the global section would
produce:

  [ALERT]    (26685) : config : parsing [test-trace.cfg:51] : 'trace': No such sink
  [ALERT]    (26685) : config : parsing [test-trace.cfg:51] : (null)
  [ALERT]    (26685) : config : Error(s) found in configuration file : test-trace.cfg
  [ALERT]    (26685) : config : Fatal errors found in configuration.

The reason is that the trace is emitted manually using ha_error() in
cfg_parse_trace() and -1 is returned without setting the message, and
the caller also prints the empty message. That's quite awkward given
that the API originally comes from the CLI which does support dynamic
strings and that config keywords do as well.

This commit modifies both cli_parse_trace() and cfg_parse_trace() to
return a dynamically allocated message instead, and adapts the central
function trace_parse_statement() to do the same, replacing a few direct
assignments with strdup() or memprintf(). This way the alert is no
longer emitted by the parser function, it just passes the message to
the caller.

A few of the static messages switching to memprintf() also took this
opportunity to report the faulty word:

  [ALERT]    (26772) : config : parsing [test-trace.cfg:51] : No such trace sink 'stduot'
  [ALERT]    (26772) : config : Error(s) found in configuration file : test-trace.cfg
  [ALERT]    (26772) : config : Fatal errors found in configuration.

This may be backported to 2.8 and 2.7.
2023-10-19 14:45:07 +02:00
Willy Tarreau
3dd963b35f BUG/MINOR: mux-h2: fix http-request and http-keep-alive timeouts again
Stefan Behte reported that since commit f279a2f14 ("BUG/MINOR: mux-h2:
refresh the idle_timer when the mux is empty"), the http-request and
http-keep-alive timeouts don't work anymore on H2. Before this patch,
and since 3e448b9b64 ("BUG/MEDIUM: mux-h2: make sure control frames do
not refresh the idle timeout"), they would only be refreshed after stream
frames were sent (HEADERS or DATA) but the patch above that adds more
refresh points broke these so they don't expire anymore as long as
there's some activity.

We cannot just revert the fix since it also addressed an isse by which
sometimes the timeout would trigger too early and provoque truncated
responses. The right approach here is in fact to only use refresh the
idle timer when the mux buffer was flushed from any such stream frames.

In order to achieve this, we're now setting a flag on the connection
whenever we write a stream frame, and we consider that flag when deciding
to refresh the buffer after it's emptied. This way we'll only clear that
flag once the buffer is empty and there were stream data in it, not if
there were no such stream data. In theory it remains possible to leave
the flag on if some control data is appended after the buffer and it's
never cleared, but in practice it's not a problem as a buffer will always
get sent in large blocks when the window opens. Even a large buffer should
be emptied once in a while as control frames will not fill it as much as
data frames could.

Given the patch above was backported as far as 2.6, this patch should
also be backported as far as 2.6.
2023-10-18 17:17:58 +02:00
Willy Tarreau
91ed52976c MINOR: dgram: allow to set rcv/sndbuf for dgram sockets as well
tune.rcvbuf.client and tune.rcvbuf.server are not suitable for shared
dgram sockets because they're per connection so their units are not the
same. However, QUIC's listener and log servers are not connected and
take per-thread or per-process traffic where a socket log buffer might
be too small, causing undesirable packet losses and retransmits in the
case of QUIC. This essentially manifests in listener mode with new
connections taking a lot of time to set up under heavy traffic due to
the small queues causing delays. Let's add a few new settings allowing
to set these shared socket sizes on the frontend and backend side (which
reminds that these are per-front/back and not per client/server hence
not per connection).
2023-10-18 17:01:19 +02:00
Christopher Faulet
203211f4cb REORG: stconn/muxes: Rename init step in fast-forwarding
Instead of speaking of an initialisation stage for each data
fast-forwarding, we now use the negociate term. Thus init_ff/init_fastfwd
functions were renamed nego_ff/nego_fastfwd.
2023-10-18 12:46:55 +02:00
Christopher Faulet
d6d4abdc31 BUILD: mux-h1: Fix build without kernel splicing support
Data fast-forwarding does not build without the kernel splicing support
because counters about splicing don't exist. To make the code more readable,
all code about splicing is disabled if kernel splicing is not supported.
2023-10-18 12:43:38 +02:00
Christopher Faulet
023564b685 MINOR: global: Add an option to disable the zero-copy forwarding
The zero-copy forwarding or the mux-to-mux forwarding is a way to
fast-forward data without using the channels buffers. Data are transferred
from a mux to the other one. The kernel splicing is an optimization of the
zero-copy forwarding. But it can also use normal buffers (but not channels
ones). This way, it could be possible to fast-forward data with muxes not
supporting the kernel splicing (H2 and H3 muxes) but also with applets.

However, this mode can introduce regressions or bugs in future (just like
the kernel splicing). Thus, It could be usefull to disable this optim. To do
so, in configuration, the global tune settting
'tune.disable-zero-copy-forwarding' may be set in a global section or the
'-dZ' command line parameter may be used to start HAProxy. Of course, this
also disables the kernel splicing.
2023-10-17 18:51:13 +02:00
Christopher Faulet
ec22d3102d MEDIUM: mux-pt: Add fast-forwarding support
The PT multiplexer now implements callbacks function to produce and consume
fast-forwarded data. Only splicing is support because the mux-pt does not
use its own buffers.
2023-10-17 18:51:13 +02:00
Christopher Faulet
169df3b3a8 CLEAN: mux-h1: Remove useless __maybe_unused attribute on h1_make_chunk()
This attribute was added during the dev stage. But it is useless now the
function is used. So, just remove it.
2023-10-17 18:51:13 +02:00
Christopher Faulet
322d660d08 MINOR: tree-wide: Only rely on co_data() to check channel emptyness
Because channel_is_empty() function does now only check the channel's
buffer, we can remove it and rely on co_data() instead. Of course, all tests
must be inverted.

channel_is_empty() is thus removed.
2023-10-17 18:51:13 +02:00
Christopher Faulet
20c463955d MEDIUM: channel: don't look at iobuf to report an empty channel
It is important to split channels and I/O buffers. When data are pushed in
an I/O buffer, we consider them as forwarded. The channel never sees
them. Fast-forwarded data are now handled in the SE only.
2023-10-17 18:51:13 +02:00
Christopher Faulet
11c05c516a MEDIUM: mux-h2: Add consumer-side fast-forwarding support
The H2 multiplexer now implements callbacks to consume fast-forwarded
data. It is the most usful case: A H2 client getting data from a H1
server. It is also the easiest case to implement. The producer side is
trickier because of multiplexing. It is not obvious this case would be
improved with data fast-forwarding.
2023-10-17 18:51:13 +02:00
Christopher Faulet
eb346074bb MINOR: h2: Set the BODYLESS_RESP flag on the HTX start-line if necessary
When message headers are parsed and an HTX start-line is created, if we
detect the response must not have any payload, a specific flag must be set
on the HTX start-line. It happens for instance for response to HEAD
requests. This flag is useb by the multiplexers to know response payload, if
any, must be silently skipped.

This was not performed when h2 HEADERS frames were decoded. This HTX flag
was specifically added to fix a bug when the splicing is inuse. Thus the H2
multiplexer was not concerned. Because the mux-to-mux fast-forwarding will
be introduced, it is important handle this flag in the H2 multiplexer too.
2023-10-17 18:51:13 +02:00
Christopher Faulet
2d80eb5b7a MEDIUM: mux-h1: Add fast-forwarding support
The H1 multiplexer now implements callbacks function to produce and consume
fast-forwarded data.
2023-10-17 18:51:13 +02:00
Christopher Faulet
2db273a7b5 MEDIUM: mux-h1: Simplify payload formatting based on HTX blocks on sending path
Just like for the zero-copy, this patch tries to simplify the code
responsible to format the message payload before sending it. But here, we
take care to simplify the loop on the HTX blocks. The result should be
less errorrpone.
2023-10-17 18:51:13 +02:00
Christopher Faulet
129787fb00 MEDIUM: mux-h1: Simplify zero-copy on sending path
In h1_make_data(), the function responsible to format the message payload
before sending it, the code dealing with zero-copy was slighly simplified
(at least for me :).

There is no real change but there is a better split between messages with a
content-length and cunked messages.
2023-10-17 18:51:13 +02:00
Christopher Faulet
6dff013fad MINOR: mux-h1: Add function to add size of a chunk to an outgoind message
This function should be used to send the chunk size, before appending the
chunk payload. It also takes care to add a CRLF to finish a previous chunk,
if necessary. This function will be used to fix the splicing for re-chunk
responses with an unknown length.
2023-10-17 18:51:13 +02:00
Christopher Faulet
91f1c5519a MEDIUM: raw-sock: Specifiy amount of data to send via snd_pipe callback
When data were sent using the kernel splicing, we tried to send all data
with no restriction. Most of time it is valid. However, because the payload
representation may differ between the producer and the consumer, it is
important to be able to specify how must data to send via the splicing.

Of course, for performance reason, it is important to maximize amount of
data send via splicing at each call. However, on edge-cases, this now can be
limited.
2023-10-17 18:51:13 +02:00
Christopher Faulet
d57a66d63a MEDIUM: mux-h1: Properly handle state transitions of chunked outgoing messages
On the sending path, there are 3 states for chunked payload in H1:

  * H1_MSG_CHUNK_SIZE: the chunk size must be emitted
  * H1_MSH_CHUNK_CRLF: The end of the chunk must be emitted
  * H1_MSG_DATA: Chunked data must be emitted

However, some shortcuts were used on the sending path to avoid some
transitions. Especially, outgoing messages were never switched in
H1_MSG_CHUNK_SIZE state.

However, it will be necessary to properly handle all transitions on the payload
to implement mux-to-mux forwarding, to be sure to always known when the chunk
size or the end of the chunk must be emitted.
2023-10-17 18:51:13 +02:00
Christopher Faulet
117f9cc017 MINOR: mux-h1: Use HTX extra field only for responses with known length
For now, it is not an issue, but it is safer to explicitly ignore HTX extra
field for responses with unknown length. This will be mandatory to future
fixes, to be able to re-chunk responses with an unknown length..
2023-10-17 18:51:13 +02:00
Christopher Faulet
799518e63f MEDIUM: stconn: Add mux-to-mux fast-forward support
Now the kernel splicing support was removed, we can add mux-to-mux
fast-forward support. Of course, the splicing support will be reintroduced
in the muxes themselves but this will be transparent.

Changes are mainly located into sc_conn_recv() and sc_conn_send().
2023-10-17 18:51:13 +02:00
Christopher Faulet
a500899601 MINOR: mux-h1: Temporarily remove splicing support
Because the kernel splicing support was removed from the stconn, it is
useless to keep it in muxes. In this patch, we remove the kernel splicing
support from the H1 multiplexer. It will be replaced by the mux-to-mux data
fast-forwarding.
2023-10-17 18:51:13 +02:00
Christopher Faulet
02ed7c0d0f MINOR: mux-pt: Temporarily remove splicing support
Because the kernel splicing support was removed from the stconn, it is
useless to keep it in muxes. In this patch, we remove the kernel splicing
support from the passthough multiplexer. It will be replaced by the
mux-to-mux data fast-forwarding.
2023-10-17 18:51:13 +02:00
Christopher Faulet
8b89fe3d8f MINOR: stconn: Temporarily remove kernel splicing support
mux-to-mux fast-forwarding will be added. To avoid mix with the splicing and
simplify the commits, the kernel splicing support is removed from the
stconn. CF_KERN_SPLICING flag is removed and the support is no longer tested
in process_stream().

In the stconn part, rcv_pipe() callback function is no longer called.

Reg-tests scripts testing the kernel splicing are temporarly marked as
broken.
2023-10-17 18:51:13 +02:00
Christopher Faulet
1d68bebb70 MINOR: stconn: Extend iobuf to handle a buffer in addition to a pipe
It is unused for now, but the iobuf structure now owns a pointer to a
buffer. This buffer will be used to perform mux-to-mux fast-forwarding when
splicing is not supported or unusable. This pointer should be filled by an
endpoint to let the opposite one forward data.

Extra fields, in addition to the buffer, are mandatory because the buffer
may already contains some data. the ".offset" field may be used may be used
as the position to start to copy data. Finally, the amount of data copied in
this buffer must be saved in ".data" field.

Some flags are also added to prepare next changes. And helper stconn
fnuctions are updated to also count data in the buffer. For a first
implementation, it is not planned to handle data in the buffer and in the
pipe in same time. But it will be possible to do so.
2023-10-17 18:51:13 +02:00
Christopher Faulet
e52519ac83 MINOR: stconn: Start to introduce mux-to-mux fast-forwarding notion
Instead of talking about kernel splicing at stconn/sedesc level, we now try
to talk about mux-to-mux fast-forwarding. To do so, 2 functions were added
to know if there are fast-forwarded data and to retrieve this amount of
data. Of course, for now, there is only data in a pipe.

In addition, some flags were renamed to reflect this notion. Note the
channel's documentation was not updated yet.
2023-10-17 18:51:13 +02:00
Christopher Faulet
8bee0dcd7d MEDIUM: stconn/channel: Move pipes used for the splicing in the SE descriptors
The pipes used to put data when the kernel splicing is in used are moved in
the SE descriptors. For now, it is just a simple remplacement but there is a
major difference with the pipes in the channel. The data are pushed in the
consumer's pipe while it was pushed in the producer's pipe. So it means the
request data are now pushed in the pipe of the backend SE descriptor and
response data are pushed in the pipe of the frontend SE descriptor.

The idea is to hide the pipe from the channel/SC side and to be able to
handle fast-forwading in pipe but also in buffer. To do so, the pipe is
inside a new entity, called iobuf. This entity will be extended.
2023-10-17 18:51:13 +02:00
Christopher Faulet
1fdfa4f9ba BUG/MEDIUM: mux-h2: Don't report an error on shutr if a shutw is pending
If a shutw is blocked because the mux is full or busy, we must defer the
shutr. In this case, the H2 stream is not in H2_SS_CLOSED state because the
shutw is also deferred. If the shutr is performed, this will lead to a
error.

Concretly, when the mux is unblocked, a RST_STREAM is sent while in some
cases, an empty DATA frame with ES flag set could be sent.

This patch should be backported to all stable versions.
2023-10-17 18:51:13 +02:00
Christopher Faulet
d0b04920d1 BUG/MINOR: htpp-ana/stats: Specify that HTX redirect messages have a C-L header
Redirect responses sent during the HTTP analysis have no payload. However
there is still a "Content-Length" header. It is important to set the
corresponding flag on the HTX start-line to be sure to preserve this header
when the reponse is sent to the client. The same is true with the stats
applet, when it returns a redirect responses.

It is especially important because we no ignore in-fly modifications of
"Content-Length" or "Transfer-Encoding" headers without updating the HTX
start-line flags.

This patch may be backported to all stable versions but it is probably
useless because only the 2.9-dev is affected by the bug.
2023-10-17 18:11:04 +02:00
Christopher Faulet
e9f6e8e7f6 BUG/MEDIUM: mux-h1: do not forget TLR/EOT even when no data is sent
Since commit 723c73f8a ("MEDIUM: mux-h1: Split h1_process_mux() to make code
more readable"), outgoing H1 chunked messages with no data at all get
delayed by 200ms. It is due to the fact that we end processing too early and
we don't have the opportunity to process trailers in this case.

This fix addresses it by verifying if it's required to emit EOT or trailers,
if any, when retruning from h1_make_data()

No backport is needed, this was in 2.9-dev.
2023-10-17 18:11:04 +02:00
Christopher Faulet
2f9db80cc6 CLEANUP: hlua: Remove dead-code on error path in hlua_socket_new()
Since last fixes about the lua cosocket, the appctx is no longer initialized
in hlua_socket_new(). The code to deal with error at this stage can be
removed.

This patch should fix the issue #2308.
2023-10-17 18:11:04 +02:00
Willy Tarreau
4070e4042a BUG/MEDIUM: quic_conn: let the scheduler kill the task when needed
The two timer handlers qc_process_timer() and qc_idle_timer_task() would
inadvertently return NULL when they don't want to be requeued, instead
of just returning the task itself. The effect of returning NULL for the
scheduler is that it considers the task as freed, so it must not touch
it anymore. As such, the TASK_F_RUNNING flag is never removed from these
tasks, and when quic_conn_release() later tries to release these tasks
using task_destroy(), the latter sees the RUNNING flag and just sets
->process to NULL, hoping that the scheduler will kill them on return,
but there's no longer being executed so this never happens and they are
leaked.

Interestingly, this doesn't seem to happen as much when multi-queue is
set to off, but it's likely because the tasks are being replaced and the
first ones have already been woken up and leaked, while the latter might
only trigger on a timeout or timer renewal.

This should address github issue #2310. Thanks to @hpn0t0ad for the
numerous traces that helped understand this sequence.

This must be backported to 2.7 at least, and adapted for 2.6
(qc_idle_timer_task must return t there).
2023-10-17 17:14:06 +02:00
Willy Tarreau
5714aff4a6 DEBUG: pool: store the memprof bin on alloc() and update it on free()
When looking at "show pools", it's often difficult to know which alloc()
corresponds to which free() since it's not often 1:1. But sometimes we
have all elements available to maintain a link between alloc and free.
Indeed, when the caller is recorded in the allocated area, we can store
the pointer to the just created bin instead of the caller address itself,
since the caller address is already in the memprof bin. By doing so, we
permit the pool_free() call to locate the allocator bin and update its
free count when caller tracing is enabled. This for example allows to
produce outputs like this on "show profiling" and a process started with
-dMcaller:

  1391967  1391968  22805987328  22806003712|  0x59f72f process_stream+0x19f/0x3a7a p_alloc(0) [delta=-16384] [pool=buffer]
  1391936  1391937  22805479424  22805495808|  0x6e1476 task_run_applet+0x426/0xea2 p_alloc(0) [delta=-16384] [pool=buffer]
  1391925  1391925  22805299200  22805299200|  0x58435a main+0xdf07a p_alloc(0) [delta=0] [pool=buffer]
        0  2087930            0  34208645120|  0x59b519 stream_release_buffers+0xf9/0x110 p_free(-16384) [pool=buffer]
   695993   695992  11403149312  11403132928|  0x66018f main+0x1baeaf p_alloc(0) [delta=16384] [pool=buffer]
        0  1391957            0  22805823488|  0x59b47c stream_release_buffers+0x5c/0x110 p_free(-16384) [pool=buffer]
   695968   695970  11402739712  11402772480|  0x587b85 h1_io_cb+0x9a5/0xe7c p_alloc(0) [delta=-32768] [pool=buffer]
        0  1391923            0  22805266432|  0x57f388 main+0xda0a8 p_free(-16384) [pool=buffer]
   695959   695960  11402592256  11402608640|  0x586add main+0xe17fd p_alloc(0) [delta=-16384] [pool=buffer]
          0      695978              0    11402903552|         0x59cc58 stream_free+0x178/0x9ea p_free(-16384) [pool=buffer]
(...)

Here it's quickly visible that all of them got properly released.
2023-10-17 17:13:56 +02:00
Willy Tarreau
68d02e5fa9 BUG/MINOR: mux-h2: make up other blocked streams upon removal from list
An interesting issue was met when testing the mux-to-mux forwarding code.

In order to preserve fairness, in h2_snd_buf() if other streams are waiting
in send_list or fctl_list, the stream that is attempting to send also goes
to its list, and will be woken up by h2_process_mux() or h2_send() when
some space is released. But on rare occasions, there are only a few (or
even a single) streams waiting in this list, and these streams are just
quickly removed because of a timeout or a quick h2_detach() that calls
h2s_destroy(). In this case there's no even to wake up the other waiting
stream in its list, and this will possibly resume processing after some
client WINDOW_UPDATE frames or even new streams, so usually it doesn't
last too long and it not much noticeable, reason why it was left that
long. In addition, measures have shown that in heavy network-bound
benchmark, this exact situation happens on less than 1% of the streams
(reached 4% with mux-mux).

The fix here consists in replacing these LIST_DEL_INIT() calls on
h2s->list with a function call that checks if other streams were queued
to the send_list recently, and if so, which also tries to resume them
by calling h2_resume_each_sending_h2s(). The detection of late additions
is made via a new flag on the connection, H2_CF_WAIT_INLIST, which is set
when a stream is queued due to other streams being present, and which is
cleared when this is function is called.

It is particularly difficult to reproduce this case which is particularly
timing-dependent, but in a constrained environment, a test involving 32
conns of 20 streams each, all downloading a 10 MB object previously
showed a limitation of 17 Gbps with lots of idle CPU time, and now
filled the cable at 25 Gbps.

This should be backported to all versions where it applies.
2023-10-17 16:43:44 +02:00
Vladimir Vdovin
70d2d9aefc MINOR: support for http-response set-timeout
Added set-timeout action for http-response. Adapted reg-tests and
documentation.
2023-10-17 08:27:33 +02:00
Christopher Faulet
7629e82c6e BUG/MINOR: mux-h1: Send a 400-bad-request on shutdown before the first request
Except if we must silently ignore empty connections by enabling
http-ignore-probes or dontlognull options, when a client connection is
closed before the first request, a 400-bad-request response must be sent
with the corresponding log message. However, that is broken since the commit
fc473a6453 ("MEDIUM: mux-h1: Rely on the H1C to deal with shutdown for
reads").

The bug is subtle. Parsing errors are no longer reported on connection errors
before the first request while it should be.

This patch must be backported where the above commit is (as far as 2.7).
2023-10-13 17:16:43 +02:00
Christopher Faulet
2a51d5b6ea BUG/MEDIUM: applet: Report a send activity everytime data were sent
In the same way than for stream-connectors (see "BUG/MEDIUM: stconn: Report
a send activity everytime data were sent" for details), we now report a send
activity everytime something was consumed by an applet, even if some output
data remains blocked into the channel's buffer.

This patch must be backported to 2.8.
2023-10-13 10:35:32 +02:00
Christopher Faulet
3083fd90e1 BUG/MEDIUM: stconn: Report a send activity everytime data were sent
When read/write timeouts were refactored in 2.8, we decided to change when a
send activity had to be reported. Before, everytime some data were sent a
send activity were reported. At this time, the channel's wex timer were
updated. During the refactoring, we decided to limit send activity to sends
that ampty te channel's buffer, consuming all outgoing data. Idea behind
this change was to protect haproxy against clients consumming data very
slowly.

However, it is too strict. Some congested muxes but still active can hit the
client or the server timeout. It seems a bit unfair. It is especially
visible with QUIC/H3 but it is probably also possible with H2 if the window
size is small.

The better is to restore the old behavior.

This patch must be backported to 2.8.
2023-10-13 10:35:32 +02:00
Aurelien DARRAGON
94d0f77deb MINOR: server: introduce "log-bufsize" kw
"log-bufsize" may now be used for a log server (in a log backend) to
configure the bufsize of implicit ring associated to the server (which
defaults to BUFSIZE).
2023-10-13 10:05:07 +02:00
Aurelien DARRAGON
b30bd7adba MEDIUM: log/balance: support for the "hash" lb algorithm
hash lb algorithm can be configured with the "log-balance hash <cnv_list>"
directive. With this algorithm, the user specifies a converter list with
<cnv_list>.

The produced log message will be passed as-is to the provided converter
list, and the resulting hash will be used to select the log server that
will receive the log message.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
7251344748 MINOR: sample: add sample_process_cnv() function
split sample_process() in 2 parts in order to be able to only process
the converter part of a sample expression from an existing input sample
struct passed as parameter.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
08767e162d MINOR: lbprm: compute the hash avalanche in gen_hash()
Instead of systematically computing the avalanche hash right after the
gen_hash() call, do it inside the gen_hash() function directly to ensure
avalanche setting is always considered.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
a7563158f7 MINOR: lbprm: support for the "none" hash-type function
Allow the use of the "none" hash-type function so that the key resulting
from the sample expression is directly used as the hash.

This can be useful to do the hashing manually using available hashing
converters, or even custom ones, and then inform haproxy that it can
directly rely on the sample expression result which is explictly handled
as an integer in this case.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
e0b4660015 MINOR: log/balance: support for the "random" lb algorithm
In this patch we add basic support for the random algorithm:

random algorithm picks a random server using the result of the
statistical_prng() function as if it was a hash key to then compute the
related server ID.

There is no support for the <draw> parameter (which is implemented for
tcp/http load-balancing), because we don't have the required metrics to
evaluate server's load in log backends for the moment. Plus it would add
more complexity to the __do_send_log_backend() function so we'll keep it
this way for now but this might be needed in the future.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
26f73dbcbb MINOR: log/balance: support for the "sticky" lb algorithm
sticky algorithm always tries to send log messages to the first server in
the farm. The server will stay in front during queue and dequeue
operations (no other server can steal its place), unless it becomes
unavailable, in which case it will be replaced by another server from
the tree.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
9a74a6cb17 MAJOR: log: introduce log backends
Using "mode log" in a backend section turns the proxy in a log backend
which can be used to log-balance logs between multiple log targets
(udp or tcp servers)

log backends can be used as regular log targets using the log directive
with "backend@be_name" prefix, like so:

  | log backend@mybackend local0

A log backend will distribute log messages to servers according to the
log load-balancing algorithm that can be set using the "log-balance"
option from the log backend section. For now, only the roundrobin
algorithm is supported and set by default.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
e58a9b4baf MINOR: sink: add sink_new_from_srv() function
This helper function can be used to create a new sink from an existing
server struct (and thus existing proxy as well), in order to spare some
resources when possible.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
5c0d1c1a74 MEDIUM: sink: inherit from caller fmt in ring_write() when rings didn't set one
implicit rings were automatically forced to the parent logger format, but
this was done upon ring creation.

This is quite restrictive because we might want to choose the desired
format right before generating the log header (ie: when producing the
log message), depending on the logger (log directive) that is
responsible for the log message, and with current logic this is not
possible. (To this day, we still have dedicated implicit ring per log
directive, but this might change)

In ring_write(), we check if the sink->fmt is specified:
 - defined: we use it since it is the most precise format
   (ie: for named rings)
 - undefined: then we fallback to the format from the logger

With this change, implicit rings' format is now set to UNSPEC upon
creation. This is safe because the log header building function
automatically enforces the "raw" format when UNSPEC is set. And since
logger->format also defaults to "raw", no change of default behavior
should be expected.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
6dad0549a5 MEDIUM: log/sink: simplify log header handling
Introduce log_header struct to easily pass log header data between
functions and use that to simplify the logic around log header
handling.

While at it, some outdated comments were updated as well.

No change in behavior should be expected.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
ab914667da MINOR: log: remove the logger dependency in do_send_log()
do_send_log() now exlusively relies on explicit parameters to remove
logger dependency in low-level log sending chain.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
60c5821867 MINOR: log: support explicit log target as argument in __do_send_log()
__do_send_log() now takes an extra target parameter to pass an explicit
log target instead of getting it from logger->target.

This will allow __do_send_log() to be called multiple times within a
logger entry containing multiple log targets.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
cc3dfe89ed MEDIUM: sink/log: stop relying on AF_UNSPEC for rings
Since a5b325f92 ("MINOR: protocol: add a real family for existing FDs"),
we don't rely anymore on AF_UNSPEC for buffer rings in do_send_log.

But we kept it as a parsing hint to differentiate between implicit and
named rings during ring buffer postparsing.

However it is still a bit confusing and forces us to systematically rely
on target->addr, even for named buffer rings where it doesn't make much
sense anymore.

Now that target->addr was made a pointer in a recent commit, we can
choose not to initialize it when not needed (i.e.: named rings) and use
this as a hint to distinguish implicit rings during init since they rely
on the addr struct to temporarily store the ring's address until the ring
is actually created during postparsing step.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
a9b185f34e MEDIUM: log: introduce log target
log targets were immediately embedded in logger struct (previously
named logsrv) and could not be used outside of this context.

In this patch, we're introducing log_target type with the associated
helper functions so that it becomes possible to declare and use log
targets outside of loggers scope.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
18da35c123 MEDIUM: tree-wide: logsrv struct becomes logger
When 'log' directive was implemented, the internal representation was
named 'struct logsrv', because the 'log' directive would directly point
to the log target, which used to be a (UDP) log server exclusively at
that time, hence the name.

But things have become more complex, since today 'log' directive can point
to ring targets (implicit, or named) for example.

Indeed, a 'log' directive does no longer reference the "final" server to
which the log will be sent, but instead it describes which log API and
parameters to use for transporting the log messages to the proper log
destination.

So now the term 'logsrv' is rather confusing and prevents us from
introducing a new level of abstraction because they would be mixed
with logsrv.

So in order to better designate this 'log' directive, and make it more
generic, we chose the word 'logger' which now replaces logsrv everywhere
it was used in the code (including related comments).

This is internal rewording, so no functional change should be expected
on user-side.
2023-10-13 10:05:06 +02:00
Amaury Denoyelle
89d685f396 BUG/MEDIUM: quic-conn: free unsent frames on retransmit to prevent crash
Since the following patch :
  commit 33c49cec987c1dcd42d216c6d075fb8260058b16
  MINOR: quic: Make qc_dgrams_retransmit() return a status.
retransmission process is interrupted as soon as a fatal send error has
been encounted. However, this may leave frames in local list. This cause
several issues : a memory leak and a potential crash.

The crash happens because leaked frames are duplicated of an origin
frame via qc_dup_pkt_frms(). If an ACK arrives later for the origin
frame, all duplicated frames are also freed. During qc_frm_free(),
LIST_DEL_INIT() operation is invalid as it still references the local
list used inside qc_dgrams_retransmit().

This bug was reproduced using the following injection from another
machine :
  $ h2load --npn-list h3 -t 8 -c 10000 -m 1 -n 2000000000 \
      https://<host>:<port>/?s=4m

Haproxy was compiled using ASAN. The crash resulted in the following
trace :
==332748==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fff82bf9d78 at pc 0x556facd3b95a bp 0x7fff82bf8b20 sp 0x7fff82bf8b10
WRITE of size 8 at 0x7fff82bf9d78 thread T0
    #0 0x556facd3b959 in qc_frm_free include/haproxy/quic_frame.h:273
    #1 0x556facd59501 in qc_release_frm src/quic_conn.c:1724
    #2 0x556facd5a07f in quic_stream_try_to_consume src/quic_conn.c:1803
    #3 0x556facd5abe9 in qc_treat_acked_tx_frm src/quic_conn.c:1866
    #4 0x556facd5b3d8 in qc_ackrng_pkts src/quic_conn.c:1928
    #5 0x556facd60187 in qc_parse_ack_frm src/quic_conn.c:2354
    #6 0x556facd693a1 in qc_parse_pkt_frms src/quic_conn.c:3203
    #7 0x556facd7531a in qc_treat_rx_pkts src/quic_conn.c:4606
    #8 0x556facd7a528 in quic_conn_app_io_cb src/quic_conn.c:5059
    #9 0x556fad3284be in run_tasks_from_lists src/task.c:596
    #10 0x556fad32a3fa in process_runnable_tasks src/task.c:876
    #11 0x556fad24a676 in run_poll_loop src/haproxy.c:2968
    #12 0x556fad24b510 in run_thread_poll_loop src/haproxy.c:3167
    #13 0x556fad24e7ff in main src/haproxy.c:3857
    #14 0x7fae30ddd0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
    #15 0x556facc9375d in _start (/opt/haproxy-quic-2.8/haproxy+0x1ea75d)

Address 0x7fff82bf9d78 is located in stack of thread T0 at offset 40 in frame
    #0 0x556facd74ede in qc_treat_rx_pkts src/quic_conn.c:4580

This must be backported up to 2.7.
2023-10-13 08:57:08 +02:00
Amaury Denoyelle
10dab4af98 BUG/MINOR: mux-quic: fix free on qcs-new fail alloc
qcs_new() allocates several elements in intermediary steps. All elements
must first be properly initialized to be able to free qcs instance in
case of an intermediary failure.

Previously, qc_stream_desc allocation was done in the middle of
qcs_new() before some elements initializations. In case this fails, a
crash can happened as some elements are left uninitialized.

To fix this, move qc_stream_desc allocation at the end of qcs_new().
This ensures that all qcs elements are initialized first.

This should be backported up to 2.6.
2023-10-13 08:52:29 +02:00
Amaury Denoyelle
63a6f26a86 BUG/MINOR: quic: fix free on quic-conn fail alloc
qc_new_conn() allocates several elements in intermediary steps. If one
of the fails, a global free is done on the quic_conn and its elements.
This requires that most elements are first initialized to NULL or
equivalent to ensure freeing operation is done only on proper values.

Once of this element is qc.tx.cc_buf_area. It was initialized too late
which could caused crashes. This is introduced by
  9f7cfb0a56
  MEDIUM: quic: Allow the quic_conn memory to be asap released.

No need to backport.
2023-10-13 08:52:20 +02:00
Willy Tarreau
5798b5bb14 BUG/MAJOR: connection: make sure to always remove a connection from the tree
Since commit 5afcb686b ("MAJOR: connection: purge idle conn by last usage")
in 2.9-dev4, the test on conn->toremove_list added to conn_get_idle_flag()
in 2.8 by commit 3a7b539b1 ("BUG/MEDIUM: connection: Preserve flags when a
conn is removed from an idle list") becomes misleading. Indeed, now both
toremove_list and idle_list are shared by a union since the presence in
these lists is mutually exclusive. However, in conn_get_idle_flag() we
check for the presence in the toremove_list to decide whether or not to
delete the connection from the tree. This test now fails because instead
it sees the presence in the idle or safe list via the union, and concludes
the element must not be removed. Thus the element remains in the tree and
can be found later after the connection is released, causing crashes that
Tristan reported in issue #2292.

The following config is sufficient to reproduce it with 2 threads:

   defaults
        mode http
        timeout client 5s
        timeout server 5s
        timeout connect 1s

   listen front
        bind :8001
        server next 127.0.0.1:8002

   frontend next
        bind :8002
        timeout http-keep-alive 1
        http-request redirect location /

Sending traffic with a few concurrent connections and some short timeouts
suffices to instantly crash it after ~10k reqs:

   $ h2load -t 4 -c 16 -n 10000 -m 1 -w 1 http://0:8001/

With Amaury we analyzed the conditions in which the function is called
in order to figure a better condition for the test and concluded that
->toremove_list is never filled there so we can safely remove that part
from the test and just move the flag retrieval back to what it was prior
to the 2.8 patch above. Note that the patch is not reverted though, as
the parts that would drop the unexpected flags removal are unchanged.

This patch must NOT be backported. The code in 2.8 works correctly, it's
only the change in 2.9 that makes it misbehave.
2023-10-12 14:20:03 +02:00
Willy Tarreau
704f090b05 CLEANUP: connection: drop an uneeded leftover cast
In conn_delete_from_tree() there remains a cast of the toremove_list
to struct list while the introduction of the union precisely was to
avoid this cast. It's a leftover from the first version of patch
5afcb686b ("MAJOR: connection: purge idle conn by last usage") merged
into in 2.9-dev4, let's fix that.

No backport is needed.
2023-10-12 14:16:59 +02:00
Amaury Denoyelle
dc750817c5 BUG/MINOR: h3: strengthen host/authority header parsing
HTTP/3 specification has several requirement when parsing authority or
host header inside a request. However, it was until then only partially
implemented.

This commit fixes this by ensuring the following :
* reject an empty authority/host header
* reject a host header if an authority was found with a different value
* no authority neither host header present

This must be backported up to 2.6.
2023-10-11 14:21:30 +02:00
Amaury Denoyelle
9d905dfd73 BUG/MINOR: mux-quic: support initial 0 max-stream-data
Support stream opening with an initial max-stream-data of 0.

In normal case, QC_SF_BLK_SFCTL is set when a qcs instance cannot
transfer more data due to flow-control. This flag is set when
transfering data from MUX to quic-conn instance.

However, it's possible to define an initial value of 0 for
max-stream-data. In this case, qcs instance is blocked despite
QC_SF_BLK_SFCTL not set. No STREAM frame is prepared for this stream as
it's not possible to emit any byte, so QC_SF_BLK_SFCTL flag is never
set.

This behavior should cause no harm. However, this can cause a BUG_ON()
crash on qcc_io_send(). Indeed, when sending is retried, it ensures that
only qcs instance waiting for a new qc_stream_buf or with
QC_SF_BLK_SFCTL set is present in the send_list.

To fix this, initialize qcs with 0 value for msd and QC_SF_BLK_SFCTL.
The flag is removed only if transport parameter msd value is non null.

This should be backported up to 2.6.
2023-10-11 14:15:31 +02:00
Amaury Denoyelle
d85f9f9d43 BUG/MEDIUM: mux-quic: fix RESET_STREAM on send-only stream
When receiving a RESET_STREAM on a send-only stream, it is mandatory to
close the connection with an error STREAM_STATE error. However, this was
badly implemented as this caused two invocation of qcc_set_error() which
is forbidden by the mux-quic API.

To fix this, rely on qcc_get_qcs() to properly detect the error. Remove
qcc_set_error() usage from qcc_recv_reset_stream() instead.

This must be backported up to 2.7.
2023-10-11 14:15:31 +02:00
Amaury Denoyelle
a4c59f5b9e BUG/MINOR: quic: reject packet with no frame
RFC 9000 indicates that a QUIC packet with no frame must trigger a
connection closure with PROTOCOL_VIOLATION error code. Implement this
via an early return inside qc_parse_pkt_frms().

This should be backported up to 2.6.
2023-10-11 14:15:31 +02:00
Amaury Denoyelle
f59f8326f9 REORG: quic: cleanup traces definition
Move all QUIC trace definitions from quic_conn.h to quic_trace-t.h. Also
remove multiple definition trace_quic macro definition into
quic_trace.h. This forces all QUIC source files who relies on trace to
include it while reducing the size of quic_conn.h.
2023-10-11 14:15:31 +02:00
Frédéric Lécaille
bd83b6effb BUG/MINOR: quic: Avoid crashing with unsupported cryptographic algos
This bug was detected when compiling haproxy against aws-lc TLS stack
during QUIC interop runner tests. Some algorithms could be negotiated by haproxy
through the TLS stack but not fully supported by haproxy QUIC implentation.
This leaded tls_aead() to return NULL (same thing for tls_md(), tls_hp()).
As these functions returned values were never checked, they could triggered
segfaults.

To fix this, one closes the connection as soon as possible with a
handshake_failure(40) TLS alert. Note that as the TLS stack successfully
negotiates an algorithm, it provides haproxy with CRYPTO data before entering
->set_encryption_secrets() callback. This is why this callback
(ha_set_encryption_secrets() on haproxy side) is modified to release all
the CRYPTO frames before triggering a CONNECTION_CLOSE with a TLS alert. This is
done calling qc_release_pktns_frms() for all the packet number spaces.
Modify some quic_tls_keys_hexdump to avoid crashes when the ->aead or ->hp EVP_CIPHER
are NULL.
Modify qc_release_pktns_frms() to do nothing if the packet number space passed
as parameter is not intialized.

This bug does not impact the QUIC TLS compatibily mode (USE_QUIC_OPENSSL_COMPAT).

Thank you to @ilia-shipitsin for having reported this issue in GH #2309.

Must be backported as far as 2.6.
2023-10-11 11:52:22 +02:00
William Lallemand
a62a2d8b48 MINOR: ssl: add an explicit error when 'ciphersuites' are not supported
Add an explicit error when the support for 'ciphersuites' was not enable
into the build because of the SSL library.
2023-10-09 14:46:09 +02:00
Aurelien DARRAGON
31e8a003a5 MINOR: sink: function to add new sink servers
Move the sft creation part out of sink_finalize() function so that it
becomes possible to register sink's servers without forward_px being
set.
2023-10-06 15:34:31 +02:00
Aurelien DARRAGON
205d480d9f MINOR: sink: refine forward_px usage
now forward_px only serves as a hint to know if a proxy was created
specifically for the sink, in which case the sink is responsible for it.

Everywhere forward_px was used in appctx context: get the parent proxy from
the sft->srv instead.

This permits to finally get rid of the double link dependency between sink
and proxy.
2023-10-06 15:34:31 +02:00
Aurelien DARRAGON
405567c125 MINOR: sink: don't rely on forward_px to init sink forwarding
Instead, we check if at least one sft has been registered into the sink,
if it is the case, then we need to init the forwarding for the sink.
2023-10-06 15:34:31 +02:00
Aurelien DARRAGON
3c53f6cb76 MINOR: sink: don't rely on p->parent in sink appctx
Removing unnecessary dependency on proxy->parent pointer in
sink appctx functions by directly using the sink sft from the
applet->svcctx to get back to sink related structs.

Thanks to this, proxy used for a ringbuf does not have to be exclusive
to a single sink anymore.
2023-10-06 15:34:31 +02:00
Aurelien DARRAGON
ec770b7924 MINOR: sink: remove useless check after sink creation
It's useless to check if sink has been created with BUF type after
calling sink_new_buf() since the goal of the function is to create
a new sink of BUF type.
2023-10-06 15:34:31 +02:00
Aurelien DARRAGON
cb01da8d12 MINOR: sink/log: fix some typos around postparsing logic
Fixing some typos that have been overlooked during the recent log/sink
API improvements. Using this patch to make sink_new_from_logsrv() static
since it is not used outside of sink.c
2023-10-06 15:34:31 +02:00
Aurelien DARRAGON
19a1210dcd MINOR: cfgparse-listen: warn when use-server rules is used in wrong mode
haproxy will report a warning when "use-server" keyword is used within a
backend that doesn't support server rules to inform the user that rules
will be ignored.

To this day, only TCP and HTTP backends can make use of it.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
3934901e51 MINOR: proxy: report a warning for max_ka_queue in proxy_cfg_ensure_no_http()
Display a warning when max_ka_queue is set (it is the case when
"max-keep-alive-queue" directive is used within a proxy section) to inform
the user that this directives depends on the "http" mode to work and thus
will safely be ignored.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
65f1124b5d MINOR: cfgparse-listen: "http-reuse" requires TCP or HTTP mode
Prevent the use of the "http-reuse" keyword in proxy section when neither
the TCP nor the HTTP mode is set.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
403fdee6a4 MINOR: proxy: dynamic-cookie CLIs require TCP or HTTP mode
Prevent the use of "dynamic-cookie" related CLI commands if the backend
is not in TCP or HTTP mode.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
0b09727a22 MINOR: cfgparse-listen: "dynamic-cookie-key" requires TCP or HTTP mode
Prevent the use of the "dynamic-cookie-key" keyword in proxy sections
when TCP or HTTP modes are not set.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
d354947365 MINOR: cfgparse-listen: "http-send-name-header" requires TCP or HTTP mode
Prevent the use of the "http-send-name-header" keyword in proxy section
when neither TCP or HTTP mode is set.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
0ba731f50b MINOR: fcgi-app: "use-fcgi-app" requires TCP or HTTP mode
Prevent the use of the "use-fcgi-app" keyword in proxy sections where
neither TCP nor HTTP mode is set.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
b41b77b4cc MINOR: http_htx/errors: prevent the use of some keywords when not in tcp/http mode
Prevent the use of "errorfile", "errorfiles" and various errorloc options
in proxies that are neither in TCP or HTTP mode.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
225526dc16 MINOR: flt_http_comp: "compression" requires TCP or HTTP mode
Prevent the use of "compression" keyword in proxy sections when the proxy
is neither in tcp or http mode.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
1e0093a317 MINOR: backend/balance: "balance" requires TCP or HTTP mode
Prevent the use of "balance" and associated keywords when proxy is neither
in tcp or http mode.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
f9422551cd MINOR: filter: "filter" requires TCP or HTTP mode
Prevent the use of "filter" when proxy is not in TCP or HTTP mode.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
098ae743fd MINOR: stktable: "stick" requires TCP or HTTP mode
Prevent the use of "stick-table" and "stick *" when proxy is neither in
tcp or http mode.
2023-10-06 15:34:30 +02:00
Aurelien DARRAGON
09b15e4163 MINOR: tcp_rules: tcp-{request,response} requires TCP or HTTP mode
Prevent the use of tcp-{request,response} keyword in proxies that are
neither in TCP or HTTP modes.
2023-10-06 15:34:30 +02:00
Willy Tarreau
90fa2eaa15 MINOR: haproxy: permit to register features during boot
The regtests are using the "feature()" predicate but this one can only
rely on build-time options. It would be nice if some runtime-specific
options could be detected at boot time so that regtests could more
flexibly adapt to what is supported (capabilities, splicing, etc).

Similarly, certain features that are currently enabled with USE_XXX
could also be automatically detected at build time using ifdefs and
would simplify the configuration, but then we'd lose the feature
report in the feature list which is convenient for regtests.

This patch makes sure that haproxy -vv shows the variable's contents
and not the macro's contents, and adds a new hap_register_feature()
to allow the code to register a new keyword.
2023-10-06 11:40:02 +02:00
Remi Tricot-Le Breton
a5e96425a2 MEDIUM: cache: Add "Origin" header to secondary cache key
This patch add a hash of the Origin header to the cache's secondary key.
This enables to manage store responses that have a "Vary: Origin" header
in the cache when vary is enabled.
This cannot be considered as a means to manage CORS requests though, it
only processes the Origin header and hashes the presented value without
any form of URI normalization.

This need was expressed by Philipp Hossner in GitHub issue #251.

Co-Authored-by: Philipp Hossner <philipp.hossner@posteo.de>
2023-10-05 10:53:54 +02:00
Amaury Denoyelle
544e320f80 BUG/MINOR: hq-interop: simplify parser requirement
hq-interop should be limited for QUIC testing. As such, its code should
be kept plain simple and not implement too many things.

This patch fixes issues which may cause rare QUIC interop failures :
- remove some unneeded BUG_ON() as parser should not be too strict
- remove support of partial message parsing
- ensure buffer data does not wrap as it was not properly handled. In
  any case, this should never happen as only a single message will be
  stored for each qcs buffer.

This should be backported up to 2.6.
2023-10-04 17:32:23 +02:00
William Lallemand
45174e4fdc BUILD: quic: allow USE_QUIC to work with AWSLC
This patch fixes the build with AWSLC and USE_QUIC=1, this is only meant
to be able to build for now and it's not feature complete.

The set_encryption_secrets callback has been split in set_read_secret
and set_write_secret.

Missing features:

- 0RTT was disabled.
- TLS1_3_CK_CHACHA20_POLY1305_SHA256, TLS1_3_CK_AES_128_CCM_SHA256 were disabled
- clienthello callback is missing, certificate selection could be
  limited (RSA + ECDSA at the same time)
2023-10-04 16:55:19 +02:00
Christopher Faulet
225a4d02e1 MINOR: h1-htx: Declare successful tunnel establishment as bodyless
Successful responses to a CONNECT or to a upgrade request have no payload.
Be explicit on this point by setting HTX_SL_F_BODYLESS_RESP flag on the HTX
start-line.
2023-10-04 15:34:18 +02:00
Christopher Faulet
b6c32f1e04 BUG/MINOR: h1-htx: Keep flags about C-L/T-E during HEAD response parsing
When a response to a HEAD request is parsed, flags to know if the content
length is set or if the payload is chunked must be preserved.. It is
important because of the previous fix. Otherwise, these headers will be
removed from the response sent to the client.

This patch must only backported if "BUG/MEDIUM: mux-h1; Ignore headers
modifications about payload representation" is backported.
2023-10-04 15:34:18 +02:00
Christopher Faulet
f89ba27caa BUG/MEDIUM: mux-h1; Ignore headers modifications about payload representation
We now ignore modifications during the message analysis about the payload
representation if only headers are updated and not meta-data. It means a C-L
header removed to add a T-E one or the opposite via HTTP actions. This kind
of changes are ignored because it is extremly hard to be sure the payload
will be properly formatted.

It is an issue since the HTX was introduced and it was never reported. Thus,
there is no reason to backport this patch for now. It relies on following commits:

  * MINOR: mux-h1: Add flags if outgoing msg contains a header about its payload
  * MINOR: mux-h1: Rely on H1S_F_HAVE_CHNK to add T-E in outgoing messages
  * BUG/MEDIUM: mux-h1: Add C-L header in outgoing message if it was removed
2023-10-04 15:34:18 +02:00
Christopher Faulet
c43742c188 BUG/MEDIUM: mux-h1: Add C-L header in outgoing message if it was removed
If a C-L header was found during parsing of a message but it was removed via
a HTTP action, it is re-added during the message formatting. Indeed, if
headers about the payload are modified, meta-data of the message must also
be updated. Otherwise, it is not possible to guarantee the message will be
properly formatted.

To do so, we rely on the flag H1S_F_HAVE_CLEN.

This patch should not be backported except an issue is explicitly
reported. It relies on "MINOR: mux-h1: Add flags if outgoing msg contains a
header about its payload".
2023-10-04 15:34:18 +02:00
Christopher Faulet
accd3e911c MINOR: mux-h1: Rely on H1S_F_HAVE_CHNK to add T-E in outgoing messages
If a message is declared to have a known length but no C-L or T-E headers
are set, a "Transfer-Encoding; chunked" header is automatically added. It is
useful for H2/H3 messages with no C-L header. There is now a flag to know
this header was found or added. So we use it.
2023-10-04 15:34:18 +02:00
Christopher Faulet
e7964eac2d BUG/MEDIUM: h1: Ignore C-L value in the H1 parser if T-E is also set
In fact, during the parsing there is already a test to remove the
Content-Length header if a Transfer-Encoding one is found. However, in the
parser, the content-length value was still used to set the body length (the
final one and the remaining one). This value is thus also used to set the
extra field in the HTX message and is then used during the sending stage to
announce the chunk size.

So, Content-Length header value must be ignored by the H1 parser to properly
reformat the message when it is sent.

This patch must be backported as far as 2.6. Lower versions don"t handle
this case.
2023-10-04 15:34:18 +02:00
Christopher Faulet
c367957851 BUG/MINOR: mux-h1: Ignore C-L when sending H1 messages if T-E is also set
In fact, it is already done but both flags (H1_MF_CLEN and H1_MF_CHUNK) are
set on the H1 parser. Thus it is errorprone when H1 messages are sent,
especially because most of time, the "Content-length" case is processed
before the "chunked" one. This may lead to compute the wrong chunk size and
to miss the last chunk.

This patch must be backported as far as 2.6. This case is not handled in 2.4
and lower.
2023-10-04 15:34:18 +02:00
Christopher Faulet
331241b084 BUG/MINOR: mux-h1: Handle read0 in rcv_pipe() only when data receipt was tried
In rcv_pipe() callback we must be careful to not report the end of stream
too early because some data may still be present in the input buffer. If we
report a EOS here, this will block the subsequent call to rcv_buf() to
process remaining input data. This only happens when we try a last
rcv_pipe() when the xfer length is unknown and all data was already received
in the input buffer. Concretely this happens with a payload larger than a
buffer but lower than 2 buffers.

This patch must be backported as far as 2.7.
2023-10-04 15:34:18 +02:00
Christopher Faulet
2225cb660c DEBUG: mux-h1: Fix event label from trace messages about payload formatting
The label used for in/out trace messages about payload formatting was not
the right one. Use H1_EV_TX_BODY, instead of H1_EV_TX_HDRS.
2023-10-04 15:34:18 +02:00
Christopher Faulet
751b59c40b BUG/MEDIUM: hlua: Initialize appctx used by a lua socket on connect only
Ths appctx used by a lua socket was synchronously initialized after the
appctx creation. The connect itself is performed later. However it is an
issue because the script may be interrupted beteween the two operation. In
this case, the stream attached to the appctx is woken up before any
destination is set. The stream will try to connect but without destination,
it fails. When the lua script is rescheduled and the connect is performed,
the connection has already failed and an error is returned.

To fix the issue, we must be sure to not woken up the stream before the
connect. To do so, we must defer the appctx initilization. It is now perform
on connect.

This patch relies on the following commits:

  * MINOR: hlua: Test the hlua struct first when the lua socket is connecting
  * MINOR: hlua: Save the lua socket's server in its context
  * MINOR: hlua: Save the lua socket's timeout in its context
  * MINOR: hlua: Don't preform operations on a not connected socket
  * MINOR: hlua: Set context's appctx when the lua socket is created

All the series must be backported as far as 2.6.
2023-10-04 15:34:13 +02:00
Christopher Faulet
66fc9238f0 MINOR: hlua: Test the hlua struct first when the lua socket is connecting
It makes sense to first verify the hlua context is valid. It is probably
better than doing it after updated the appctx.
2023-10-04 15:34:10 +02:00
Christopher Faulet
6f4041c75d MINOR: hlua: Save the lua socket's server in its context
For the same reason than the timeout, the server used by a lua socket is now
saved in its context. This will be mandatory to fix issues with the lua
sockets.
2023-10-04 15:34:06 +02:00
Christopher Faulet
0be1ae2fa2 MINOR: hlua: Save the lua socket's timeout in its context
When the lua socket timeout is set, it is now saved in its context. If there
is already a stream attached to the appctx, the timeout is then immediately
modified. Otherwise, it is modified when the stream is created, thus during
the appctx initialization.

For now, the appctx is initialized when it is created. But this will change
to fix issues with the lua sockets. Thus, this patch is mandatory.
2023-10-04 15:34:03 +02:00
Christopher Faulet
ee687aa18d MINOR: hlua: Don't preform operations on a not connected socket
There is nothing that prevent someone to create a lua socket and try to
receive or to write before the connection was established ot after the
shutdown was performed. The same is true when info about the socket are
retrieved.

It is not an issue because this will fail later. But now, we check the
socket is connected or not earlier. It is more effecient but it will be also
mandatory to fix issue with the lua sockets.
2023-10-04 15:34:00 +02:00
Christopher Faulet
ed9333827a MINOR: hlua: Set context's appctx when the lua socket is created
The lua socket's context referenced the owning appctx. It was set when the
appctx was initialized. It is now performed when the appctx is created. It
is a small change but this will be required to fix several issues with the
lua sockets.
2023-10-04 15:33:57 +02:00
Christopher Faulet
b62d5689d2 BUILD: pool: Fix GCC error about potential null pointer dereference
In pool_gc(), GCC 13.2.1 reports an error about a potential null potential
dereference:

src/pool.c: In function ‘pool_gc’:
src/pool.c:807:64: error: potential null pointer dereference [-Werror=null-dereference]
  807 |                         entry->buckets[bucket].free_list = temp->next;
      |                                                            ~~~~^~~~~~

There is no issue here because "bucket" variable cannot be greater than
CONFIG_HAP_POOL_BUCKETS. But to make GCC happy, we now break the loop if it
is greater or equal to CONFIG_HAP_POOL_BUCKETS.
2023-10-04 08:03:02 +02:00
Amaury Denoyelle
90873dc678 MINOR: proto_reverse_connect: support source address setting
Support backend configuration for explicit source address on
pre-connect. These settings can be specified via "source" backend
keyword or directly on the server line.

Previously, all source parameters triggered a BUG_ON() when binding a
reverse connect listener. This was done because some settings are
incompatible with reverse connect context : this is the case for all
source settings which do not specify a fixed address but rather rely on
a frontend connection. Indeed, in case of preconnect, connection is
initiated on its own without the existence of a previous frontend
connection.

This patch allows to use a source parameter with a fixed address. All
other settings (usesrc client/clientip/hdr_ip) are rejected on listener
binding. On connection init, alloc_bind_address() is used to set the
optional source address.
2023-10-03 17:50:36 +02:00
Amaury Denoyelle
bd001ff346 MINOR: backend: refactor specific source address allocation
Refactor alloc_bind_address() function which is used to allocate a
sockaddr if a connection to a target server relies on a specific source
address setting.

The main objective of this change is to be able to use this function
outside of backend module, namely for preconnections using a reverse
server. As such, this function is now exported globally.

For reverse connect, there is no stream instance. As such, the function
parts which relied on it were reduced to the minimal. Now, stream is
only used if a non-static address is configured which is useful for
usesrc client|clientip|hdr_ip. These options have no sense for reverse
connect so it should be safe to use the same function.
2023-10-03 17:49:12 +02:00
Amaury Denoyelle
2ac5d9a657 MINOR: quic: handle perm error on bind during runtime
Improve EACCES permission errors encounterd when using QUIC connection
socket at runtime :

* First occurence of the error on the process will generate a log
  warning. This should prevent users from using a privileged port
  without mandatory access rights.

* Socket mode will automatically fallback to listener socket for the
  receiver instance. This requires to duplicate the settings from the
  bind_conf to the receiver instance to support configurations with
  multiple addresses on the same bind line.
2023-10-03 16:52:02 +02:00
Amaury Denoyelle
3ef6df7387 MINOR: quic: define quic-socket bind setting
Define a new bind option quic-socket :
  quic-socket [ connection | listener ]

This new setting works in conjunction with the existing configuration
global tune.quic.socket-owner and reuse the same semantics.

The purpose of this setting is to allow to disable connection socket
usage on listener instances individually. This will notably be useful
when needing to deactivating it when encountered a fatal permission
error on bind() at runtime.
2023-10-03 16:49:26 +02:00
Remi Tricot-Le Breton
b019636cd7 DOC: sample: Add a comment in 'check_operator' to explain why 'vars_check_arg' should ignore the 'err' buffer
This extra comment ensure that we do not try to pass an 'err' argument
to 'vars_check_arg' otherwise some warnings will be raised if an
operator is given an integer directly in the configuration file.
2023-10-03 11:13:10 +02:00
Remi Tricot-Le Breton
6fe57303f7 Revert "MEDIUM: sample: Small fix in function check_operator for eror reporting"
This reverts commit d897d7da87.

The "check_operator" function is used for all the operator converters
such as "and", "or", "add"...
With such a converter that accepts a variable name as well as an
integer, the "vars_check_arg" call is expected to fail when an integer
is provided. Passing an "err" variable has the unwanted side effect of
raising a warning during init for a configuration such as the following:

    http-request set-query "s=%[rand,add(20)]"

which raises the following warning:
    [WARNING]  (33040) : config : parsing [hap.cfg:14] : invalid
    variable name '20'. A variable name must be start by its scope. The
    scope can be 'proc', 'sess', 'txn', 'req', 'res' or 'check'.
2023-10-03 11:13:10 +02:00
William Lallemand
c21ec3b735 BUG/MINOR: proto_reverse_connect: fix FD leak upon connect
new_reverse_conn() is creating its own socket with
sock_create_server_socket(). However the connect is done with
conn->ctrl->connect() which is tcp_connect_server().

tcp_connect_server() is also creating its own socket and sets it in the
struct conn, left the previous socket unclosed and leaking at each
attempt.

This patch fixes the issue by letting tcp_connect_server() handling the
socket part, and removes it in new_reverse_conn().
2023-09-30 00:53:43 +02:00
Amaury Denoyelle
c58fd4d1cc MINOR: tcp_act: remove limitation on protocol for attach-srv
This patch allows to specify "tcp-request session attach-srv" without
requiring that each associated bind lines mandates HTTP/2 usage. If a
non supported protocol is targetted by this rule, conn_install_mux_fe()
is responsible to reject it.

This change is mandatory to be able to mix attach-srv and standard
non-reversable connection on the same bind instances. An ACL can be used
to activate attach-srv only on some conditions.
2023-09-29 18:11:10 +02:00
Amaury Denoyelle
337c71423f MINOR: connection: define mux flag for reverse support
Add a new MUX flag MX_FL_REVERSABLE. This value is used to indicate that
MUX instance supports connection reversal. For the moment, only HTTP/2
multiplexer is flagged with it.

This allows to dynamically check if reversal can be completed during MUX
installation. This will allow to relax requirement on config writing for
'tcp-request session attach-srv' which currently cannot be used mixed
with non-http/2 listener instances, even if used conditionnally with an
ACL.
2023-09-29 18:09:08 +02:00
Amaury Denoyelle
ac1164de7c MINOR: connection: define error for reverse connect
Define a new error code for connection CO_ER_REVERSE. This will be used
to report an issue which happens on a connection targetted for reversal
before reverse process is completed.
2023-09-29 18:08:26 +02:00
Amaury Denoyelle
753fe2b9ac BUG/MINOR: tcp_act: fix attach-srv rule ACL parsing
Fix parser for tcp-request session attach-srv rule. Before this commit,
it was impossible to use an anonymous ACL with it. This was caused
because support for optional name argument was badly implemented.

No need to backport this.
2023-09-29 18:07:52 +02:00
Amaury Denoyelle
6118590e95 BUG/MINOR: proto_reverse_connect: fix FD leak on connection error
Listener using "rev@" address is responsible to setup connection and
reverse it using a server instance. If an error occured before reversal
is completed, proper freeing must be taken care of by the listener as no
session exists for this.

Currently, there is two locations where a connection is freed on error
before reversal inside reverse_connect protocol. Both of these were
incomplete as several function must be used to ensure connection is
properly freed. This commit fixes this by reusing the same cleaning
mechanism used inside H2 multiplexer.

One of the biggest drawback before this patch was that connection FD was
not properly removed from fdtab which caused a file-descriptor leak.

No need to backport this.
2023-09-29 18:02:36 +02:00
Willy Tarreau
b3dcd59f8d MINOR: stream: fix output alignment of stuck thread dumps
Since commit c185bc465 ("MEDIUM: stream: now provide full stream dumps
in case of loops"), the stuck threads show the stream's pointer in the
margin since it appears immediately after a line feed. Let's add it after
the prefix and "stream=" to make the output more readable.
2023-09-29 16:43:07 +02:00
Emeric Brun
3c250cb847 Revert "BUG/MEDIUM: quic: missing check of dcid for init pkt including a token"
This reverts commit 072e774939.

Doing h2load with h3 tests we notice this behavior:

Client ---- INIT no token SCID = a , DCID = A ---> Server (1)
Client <--- RETRY+TOKEN DCID = a, SCID = B    ---- Server (2)
Client ---- INIT+TOKEN SCID = a , DCID = B    ---> Server (3)
Client <--- INIT DCID = a, SCID = C           ---- Server (4)
Client ---- INIT+TOKEN SCID = a, DCID = C     ---> Server (5)

With (5) dropped by haproxy due to token validation.

Indeed the previous patch adds SCID of retry packet sent to the aad
of the token ciphering aad. It was useful to validate the next INIT
packets including the token are sent by the client using the new
provided SCID for DCID as mantionned into the RFC 9000.
But this stateless information is lost on received INIT packets
following the first outgoing INIT packet from the server because
the client is also supposed to re-use a second time the lastest
received SCID for its new DCID. This will break the token validation
on those last packets and they will be dropped by haproxy.

It was discussed there:
https://mailarchive.ietf.org/arch/msg/quic/7kXVvzhNCpgPk6FwtyPuIC6tRk0/

To resume: this is not the role of the server to verify the re-use of
retry's SCID for DCID in further client's INIT packets.

The previous patch must be reverted in all versions where it was
backported (supposed until 2.6)
2023-09-29 09:27:22 +02:00
Willy Tarreau
d956db6638 CLEANUP: stream: remove the now unused stream_dump() function
It was superseded by strm_dump_to_buffer() which provides much more
complete information and supports anonymizing.
2023-09-29 09:20:27 +02:00
Willy Tarreau
feff6296a1 MINOR: debug: use the more detailed stream dump in panics
Similarly upon a panic we'd like to have a more detailed dump of a
stream's state, so let's use the full dump function for this now.
2023-09-29 09:20:27 +02:00
Willy Tarreau
c185bc4656 MEDIUM: stream: now provide full stream dumps in case of loops
When a stream is caught looping, we produce some output to help figure
its internal state explaining why it's looping. The problem is that this
debug output is quite old and the info it provides are quite insufficient
to debug a modern process, and since such bugs happen only once or twice
a year the situation doesn't improve.

On the other hand the output of "show sess all" is extremely detailed
and kept up to date with code evolutions since it's a heavily used
debugging tool.

This commit replaces the call to the totally outdated stream_dump() with
a call to strm_dump_to_buffer(), and removes the filters dump since they
are already emitted there, and it now produces much more exploitable
output:

  [ALERT]    (5936) : A bogus STREAM [0x7fa8dc02f660] is spinning at 5653514 calls per second and refuses to die, aborting now! Please report this error to developers:
  0x7fa8dc02f660: [28/Sep/2023:09:53:08.811818] id=2 proto=tcpv4 source=127.0.0.1:58306
     flags=0xc4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x133f220, pend_pos=(nil) waiting=0 epoch=0x1
     frontend=public (id=2 mode=http), listener=? (id=1) addr=127.0.0.1:4080
     backend=public (id=2 mode=http) addr=127.0.0.1:61932
     server=s1 (id=1) addr=127.0.0.1:7443
     task=0x7fa8dc02fa40 (state=0x01 nice=0 calls=5749559 rate=5653514 exp=3s tid=1(1/1) age=1s)
     txn=0x7fa8dc02fbf0 flags=0x3000 meth=1 status=-1 req.st=MSG_DONE rsp.st=MSG_RPBEFORE req.f=0x4c rsp.f=0x00
     scf=0x7fa8dc02f5f0 flags=0x00000482 state=EST endp=CONN,0x7fa8dc02b4b0,0x05004001 sub=1 rex=58s wex=<NEVER>
         h1s=0x7fa8dc02b4b0 h1s.flg=0x100010 .sd.flg=0x5004001 .req.state=MSG_DONE .res.state=MSG_RPBEFORE
          .meth=GET status=0 .sd.flg=0x05004001 .sc.flg=0x00000482 .sc.app=0x7fa8dc02f660
          .subs=0x7fa8dc02f608(ev=1 tl=0x7fa8dc02fae0 tl.calls=0 tl.ctx=0x7fa8dc02f5f0 tl.fct=sc_conn_io_cb)
          h1c=0x7fa8dc0272d0 h1c.flg=0x0 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x7fa8dc0273f0 .exp=<NEVER>
         co0=0x7fa8dc027040 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=LISTENER:0x12840c0
         flags=0x00000300 fd=32 fd.state=20 updt=0 fd.tmask=0x2
     scb=0x7fa8dc02fb30 flags=0x00001411 state=EST endp=CONN,0x7fa8dc0300c0,0x05000001 sub=1 rex=58s wex=<NEVER>
         h1s=0x7fa8dc0300c0 h1s.flg=0x4010 .sd.flg=0x5000001 .req.state=MSG_DONE .res.state=MSG_RPBEFORE
          .meth=GET status=0 .sd.flg=0x05000001 .sc.flg=0x00001411 .sc.app=0x7fa8dc02f660
          .subs=0x7fa8dc02fb48(ev=1 tl=0x7fa8dc02feb0 tl.calls=2 tl.ctx=0x7fa8dc02fb30 tl.fct=sc_conn_io_cb)
          h1c=0x7fa8dc02ff00 h1c.flg=0x80000000 .sub=1 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x7fa8dc030020 .exp=<NEVER>
         co1=0x7fa8dc02fcd0 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x133f220
         flags=0x10000300 fd=33 fd.state=10421 updt=0 fd.tmask=0x2
     req=0x7fa8dc02f680 (f=0x1840000 an=0x8000 pipe=0 tofwd=0 total=79)
         an_exp=<NEVER> buf=0x7fa8dc02f688 data=(nil) o=0 p=0 i=0 size=0
         htx=0xc18f60 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
     res=0x7fa8dc02f6d0 (f=0x80000000 an=0x1400000 pipe=0 tofwd=0 total=0)
         an_exp=<NEVER> buf=0x7fa8dc02f6d8 data=(nil) o=0 p=0 i=0 size=0
         htx=0xc18f60 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
    call trace(10):
    |       0x59f2b7 [0f 0b 0f 1f 80 00 00 00]: stream_dump_and_crash+0x1f7/0x2bf
    |       0x5a0d71 [e9 af e6 ff ff ba 40 00]: process_stream+0x19f1/0x3a56
    |       0x68d7bb [49 89 c7 4d 85 ff 74 77]: run_tasks_from_lists+0x3ab/0x924
    |       0x68e0b4 [29 44 24 14 8b 4c 24 14]: process_runnable_tasks+0x374/0x6d6
    |       0x656f67 [83 3d f2 75 84 00 01 0f]: run_poll_loop+0x127/0x5a8
    |       0x6575d7 [48 8b 1d 42 50 5c 00 48]: main+0x1b22f7
    | 0x7fa8e0f35e45 [64 48 89 04 25 30 06 00]: libpthread:+0x7e45
    | 0x7fa8e0e5a4af [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a

Note that the output is subject to the global anon key so that IPs and
object names can be anonymized if required. It could make sense to
backport this and the few related previous patches next time such an
issue is reported.
2023-09-29 09:20:27 +02:00
Willy Tarreau
b206504f43 MINOR: streams: add support for line prefixes to strm_dump_to_buffer()
Now the function can prepend every new line with a caller-fed prefix
that will later be used for indenting. The caller has to feed the
prefix for the first line itself though, allowing to possibly append
the first line at the end of an existing one.
2023-09-29 09:20:27 +02:00
Willy Tarreau
5743eeea88 MINOR: stream: make stream_dump() always multi-line
There used to be two working modes for this function, a single-line one
and a multi-line one, the difference being made on the "eol" argument
which could contain either a space or an LF (and with the prefix being
adjusted accordingly). Let's get rid of the single-line mode as it's
what limits the output contents because it's difficult to produce
exploitable structured data this way. It was only used in the rare case
of spinning streams and applets and these are the ones lacking info. Now
a spinning stream produces:

[ALERT]    (3511) : A bogus STREAM [0x227e7b0] is spinning at 5581202 calls per second and refuses to die, aborting now! Please report this error to developers:
  strm=0x227e7b0,c4a src=127.0.0.1 fe=public be=public dst=s1
  txn=0x2041650,3000 txn.req=MSG_DONE,4c txn.rsp=MSG_RPBEFORE,0
  rqf=1840000 rqa=8000 rpf=80000000 rpa=1400000
  scf=0x24af280,EST,482 scb=0x24af430,EST,1411
  af=(nil),0 sab=(nil),0
  cof=0x7fdb28026630,300:H1(0x24a6f60)/RAW((nil))/tcpv4(33)
  cob=0x23199f0,10000300:H1(0x24af630)/RAW((nil))/tcpv4(32)
  filters={}
  call trace(11):
  (...)
2023-09-29 09:20:27 +02:00
Willy Tarreau
5ddeba7af3 MINOR: stream: make strm_dump_to_buffer() show the list of filters
That's one of the rare pieces of information that was not present in
the full dump and only in the short one, the list of filters the stream
is subscribed to (however the current filter was present and more
detailed).
2023-09-29 09:20:27 +02:00
Willy Tarreau
3e630a9871 MINOR: stream: make strm_dump_to_buffer() take an arbitrary buffer
We won't always want to dump into the trash, so let's make the function
accept an arbitrary buffer.
2023-09-29 09:20:27 +02:00
Willy Tarreau
6bc07103f8 CLEANUP: stream: make strm_dump_to_buffer() take a const stream
Now that we don't need a variable anymore, let's pass a const stream.
It will void any doubt about what can happen to the stream when the
function is called from inspection points (show sess etc).
2023-09-29 09:20:27 +02:00
Willy Tarreau
1a01ee4740 CLEANUP: stream: use const filters in the dump function
The strm_dump_to_buffer() function requires a variable stream only
for a few functions in it that do not take a const. strm_flt() is
one of them (and for good reasons since most call places want to
update filters). Here we know we won't modify the filter nor the
stream so let's directly access the strm_flt in the stream and assign
it to a const filter. This will also catch any future accidental change.
2023-09-29 09:20:27 +02:00
Willy Tarreau
77ecb3146a MINOR: stream: split stats_dump_full_strm_to_buffer() in two
The function only works with the CLI's appctx and does most of the
convenient work of dumping a stream into a buffer (well, the trash
buffer for now). Let's split it in two so that most of the work is
done in a generic function and that the CLI-specific function relies
on that one.

The diff looks huge due to the changed indent caused by the extraction
of the switch/case statement, but when looked at using diff -b it's
small.
2023-09-29 09:20:27 +02:00
Willy Tarreau
6c2af048d6 CLEANUP: stream: make the dump code not depend on the CLI appctx
The HA_ANON_CLI() helper relies on the CLI appctx and prevents the code
from being made more generic. Let's extract the CLI's anon key separately
and pass it via HA_ANON_STR() instead.
2023-09-29 09:20:27 +02:00
Amaury Denoyelle
7cf9cf705e BUG/MINOR: mux-quic: remove full demux flag on ncbuf release
When rcv_buf stream callback is invoked, mux tasklet is woken up if
demux was previously blocked due to lack of buffer space. A BUG_ON() is
present to ensure there is data in qcs Rx buffer. If this is not the
case, wakeup is unneeded :

  BUG_ON(!ncb_data(&qcs->rx.ncbuf, 0));

This BUG_ON() may be triggered if RESET_STREAM is received after demux
has been blocked. On reset, Rx buffer is purged according to RFC 9000
which allows to discard any data not yet consumed. This will trigger the
BUG_ON() assertion if rcv_buf stream callback is invoked after this.

To prevent BUG_ON() crash, just clear demux block flag each time Rx
buffer is purged. This covers accordingly RESET_STREAM reception.

This should be backported up to 2.7.

This may fix github issue #2293.

This bug relies on several precondition so its occurence is rare. This
was reproduced by using a custom client which post big enough data to
fill the buffer. It then emits a RESET_STREAM in place of a proper FIN.
Moreover, mux code has been edited to artificially stalled stream read
to force demux blocking.

h3_data_to_htx:
-       return htx_sent;
+       return 1;

qcc_recv_reset_stream:
        qcs_free_ncbuf(qcs, &qcs->rx.ncbuf);
+       qcs_notify_recv(qcs);

qmux_strm_rcv_buf:
        char fin = 0;
+       static int i = 0;
+       if (++i < 2)
+               return 0;
        TRACE_ENTER(QMUX_EV_STRM_RECV, qcc->conn, qcs);
2023-09-28 11:44:53 +02:00
Vladimir Vdovin
f8b81f6eb7 MINOR: support for http-request set-timeout client
Added set-timeout for frontend side of session, so it can be used to set
custom per-client timeouts if needed. Added cur_client_timeout to fetch
client timeout samples.
2023-09-28 08:49:22 +02:00
Amaury Denoyelle
b9bb3b932c MINOR: proto_reverse_connect: emit log for preconnect
Add reporting using send_log() for preconnect operation. This is minimal
to ensure we understand the current status of listener in active reverse
connect.

To limit logging quantity, only important transition are considered.
This requires to implement a minimal state machine as a new field in
receiver structure.

Here are the logs produced :
* Initiating : first time preconnect is enabled on a listener
* Error : last preconnect attempt interrupted on a connection error
* Reaching maxconn : all necessary connections were reversed and are
  operational on a listener
2023-09-22 17:21:53 +02:00
Amaury Denoyelle
069ca55e70 MINOR: proto_reverse_connect: remove unneeded wakeup
No need to use task_wakeup() on rev_bind_listener() to bootstrap
preconnect. A similar call is done on rev_enable_listener() which serve
both for bootstrap and also later to reinitiate attemps to maintain
maxconn if connection are freed.
2023-09-22 17:06:18 +02:00
Amaury Denoyelle
1f43fb71be MINOR: proto_reverse_connect: refactor preconnect failure
When a connection is freed during preconnect before reversal, the error
must be notified to the listener to remove any connection reference and
rearm a new preconnect attempt. Currently, this can occur through 2 code
paths :
* conn_free() called directly by H2 mux
* error during conn_create_mux(). For this case, connection is flagged
  with CO_FL_ERROR and reverse_connect task is woken up. The process
  task handler is then responsible to call conn_free() for such
  connection.

Duplicated steps where done both in conn_free() and process task
handler. These are now removed. To facilitate code maintenance,
dedicated operation have been centralized in a new function
rev_notify_preconn_err() which is called by conn_free().
2023-09-22 16:43:36 +02:00
Amaury Denoyelle
a37abee266 BUG/MINOR: proto_reverse_connect: set default maxconn
If maxconn is not set for preconnect, it assumes we want to establish a
single connection. However, this does not work properly in case the
connection is closed after reversal. Listener is not resumed by protocol
layer to attempt a new preconnect.

To fix this, explicitely set maxconn to 1 in the listener instance if
none is defined. This ensures the behavior is consistent. A BUG_ON() has
been added to validate we never try to use a listener with a 0 maxconn.
2023-09-22 16:40:58 +02:00
Emeric Brun
27b2fd2e06 MINOR: quic: handle external extra CIDs generator.
This patch adds the ability to externalize and customize the code
of the computation of extra CIDs after the first one was derived from
the ODCID.

This is to prepare interoperability with extra components such as
different QUIC proxies or routers for instance.

To process the patch defines two function callbacks:
- the first one to compute a hash 64bits from the first generated CID
  (itself continues to be derived from ODCID). Resulting hash is stored
  into the 'quic_conn' and 64bits is chosen large enought to be able to
  store an entire haproxy's CID.
- the second callback re-uses the previoulsy computed hash to derive
  an extra CID using the custom algorithm. If not set haproxy will
  continue to choose a randomized CID value.

Those two functions have also the 'cluster_secret' passed as an argument:
this way, it is usable for obfuscation or ciphering.
2023-09-22 10:32:14 +02:00
Lokesh Jindal
d897d7da87 MEDIUM: sample: Small fix in function check_operator for eror reporting
When function "check_operator" calls function "vars_check_arg" to decode
a variable, it passes in NULL value for pointer to the char array meant
for capturing the error message.  This commit replaces NULL with the
pointer to the real char array.  This should help in correct error
reporting.
2023-09-22 08:48:53 +02:00
Lokesh Jindal
915e48675a MEDIUM: sample: Enhances converter "bytes" to take variable names as arguments
Prior to this commit, converter "bytes" takes only integer values as
arguments.  After this commit, it can take variable names as inputs.
This allows us to dynamically determine the offset/length and capture
them in variables.  These variables can then be used with the converter.
Example use case: parsing a token present in a request header.
2023-09-22 08:48:51 +02:00
Amaury Denoyelle
d3db96f11a MINOR: proto_reverse_connect: prevent transparent server for pre-connect
Prevent using transparent servers for pre-connect on startup by emitting
a fatal error. This is used to ensure we never try to connect to a
target with an unspecified destination address or port.
2023-09-21 16:58:08 +02:00
Amaury Denoyelle
9b6812d781 BUG/MINOR: proto_reverse_connect: fix preconnect with startup name resolution
addr member of server structure is not set consistently depending on the
server address type. When using <IP:PORT> notation, its port is properly
set. However, when using <HOSTNAME:PORT>, only IP address is set after
startup name resolution but its port is left to 0.

This behavior causes preconnect to not be functional when using server
with hostname for startup name resolution. Indeed, only srv.addr is used
as connect argument through function new_reverse_conn(). To fix this,
rely on srv.svc_port : this member is always set for servers using IP or
hostname. This is similar to connect_server() on the backend side.

This does not need to be backported.
2023-09-21 16:57:30 +02:00
Sébastien Gross
6a9ba85322 MINOR: hlua: Add support for the "http-after-res" action
This commit introduces support for the "http-after-res" action in
hlua, enabling the invocation of a Lua function in a
"http-after-response" rule. With this enhancement, a Lua action can be
registered using the "http-after-res" action type:

    core.register_action('myaction', {'http-after-res'}, myaction)

A new "lua.myaction" is created and can be invoked in a
"http-after-response" rule:

    http-after-response lua.myaction

This addition provides greater flexibility and extensibility in
handling post-response actions using Lua.

This commit depends on:
 - 4457783 ("MINOR: http_ana: position the FINAL flag for http_after_res execution")

Signed-off-by: Sébastien Gross <sgross@haproxy.com>
2023-09-21 16:31:20 +02:00
Aurelien DARRAGON
95c4d24825 BUG/MEDIUM: server/cli: don't delete a dynamic server that has streams
In cli_parse_delete_server(), we take care of checking that the server is
in MAINT and that the cur_sess counter is set to 0, in the hope that no
connection/stream ressources continue to point to the server, else we
refuse to delete it.

As shown in GH #2298, this is not sufficient.

Indeed, when the server option "on-marked-down shutdown-sessions" is not
used, server streams are not purged when srv enters maintenance mode.

As such, there could be remaining streams that point to the server. To
detect this, a secondary check on srv->cur_sess counter was performed in
cli_parse_delete_server(). Unfortunately, there are some code paths that
could lead to cur_sess being decremented, and not resulting in a stream
being actually shutdown. As such, if the delete_server cli is handled
right after cur_sess has been decremented with streams still pointing to
the server, we could face some nasty bugs where stream->srv_conn could
point to garbage memory area, as described in the original github report.

To make the check more reliable prior to deleting the server, we don't
rely exclusively on cur_sess and directly check that the server is not
used in any stream through the srv_has_stream() helper function.

Thanks to @capflam which found out the root cause for the bug and greatly
helped to provide the fix.

This should be backported up to 2.6.
2023-09-21 14:57:01 +02:00
Aurelien DARRAGON
0189a4679e MINOR: pattern/ip: simplify pat_match_ip() function
pat_match_ip() has been updated several times over the last decade to
introduce new features, but it was never cleaned up.

The result is that the function is pretty hard to read, and there are
multiple duplicated code blocks so it becomes error-prone to maintain it,
plus it bloats the haproxy binary for nothing.

In this patch, we move the tree search (ip4 / ip6) logic into 2
dedicated helper functions. This allows us to refactor pat_match_ip()
without touching to the original behavior.
2023-09-21 09:50:56 +02:00
Aurelien DARRAGON
f80122db26 MINOR: pattern/ip: offload ip conversion logic to helper functions
Now that v4tov6() and v6tov4() were reworked to match behavior from
pat_match_ip() function in ("MINOR: tools/ip: v4tov6() and v6tov4()
rework"), we can remove code duplication in pat_match_ip() by directly
using those dedicated functions where relevant.
2023-09-21 09:50:55 +02:00
Aurelien DARRAGON
72514a4467 MEDIUM: tools/ip: v4tov6() and v6tov4() rework
v4tov6() and v6tov4() helper function were initially implemented in
4f92d3200 ("[MEDIUM] IPv6 support for stick-tables").

However, since ceb4ac9c3 ("MEDIUM: acl: support IPv6 address matching")
support for legacy ip6 to ip4 conversion formats were added, with the
parsing logic directly performed in acl_match_ip (which later became
pat_match_ip)

The issue is that the original v6tov4() function which is used for sample
expressions handling lacks those additional formats, so we could face
inconsistencies whether we rely on ip4/ip6 conversions from an acl context
or an expression context.

To unify ip4/ip6 automatic mapping behavior, we reworked v4tov6 and v6tov4
functions so that they now behave like in pat_match_ip() function.

Note: '6to4 (RFC3056)' and 'RFC4291 ipv4 compatible address' formats are
still supported for legacy purposes despite being deprecated for a while
now.
2023-09-21 09:50:55 +02:00
Christopher Faulet
d3e379b3ce BUG/MEDIUM: http-ana: Try to handle response before handling server abort
In the request analyser responsible to forward the request, we try to detect
the server abort to stop the request forwarding. However, we must be careful
to not block the response processing, if any. Indeed, it is possible to get
the response and the server abort in same time. In this case, we must try to
forward the response to the client first.

So to fix the issue, in the request analyser we no longer handle the server
abort if the response channel is not empty. In the end, the response
analyser is able to detect the server abort if it is relevant. Otherwise,
the stream will be woken up after the response forwarding and the server
abort should be handled at this stage.

This patch should be backported as far as 2.7 only because the risk of
breakage is high. And it is probably a good idea to wait a bit before
backporting it.
2023-09-21 09:36:37 +02:00
Willy Tarreau
cbbee15462 CLEANUP: ring: rename the ring lock "RING_LOCK" instead of "LOGSRV_LOCK"
The ring lock was initially mostly used for the logs and used to inherit
its name in lock stats. Now that it's exclusively used by rings, let's
rename it accordingly.
2023-09-20 21:38:33 +02:00
Willy Tarreau
cec8b42cb3 MEDIUM: logs: atomically check and update the log sample index
The log server lock is pretty visible in perf top when using log samples
because it's taken for each server in turn while trying to validate and
update the log server's index. Let's change this for a CAS, since we have
the index and the range at hand now. This allow us to remove the logsrv
lock.

The test on 4 servers now shows a 3.7 times improvement thanks to much
lower contention. Without log sampling a test producing 4.4M logs/s
delivers 4.4M logs/s at 21 CPUs used, everything spent in the kernel.
After enabling 4 samples (1:4, 2:4, 3:4 and 4:4), the throughput would
previously drop to 1.13M log/s with 37 CPUs used and 75% spent in
process_send_log(). Now with this change, 4.25M logs/s are emitted,
using 26 CPUs and 22% in process_send_log(). That's a 3.7x throughput
improvement for a 30% global CPU usage reduction, but in practice it
mostly shows that the performance drop caused by having samples is much
less noticeable (each of the 4 servers has its index updated for each
log).

Note that in order to even avoid incrementing an index for each log srv
that is consulted, it would be more convenient to have a single index
per frontend and apply the modulus on each log server in turn to see if
the range has to be updated. It would then only perform one write per
range switch. However the place where this is done doesn't have access
to a frontend, so some changes would need to be performed for this, and
it would require to update the current range independently in each
logsrv, which is not necessarily easier since we don't know yet if we
can commit it.
2023-09-20 21:38:33 +02:00
Willy Tarreau
e00470378b MINOR: logs: use a single index to store the current range and index
By using a single long long to store both the current range and the
next index, we'll make it possible to perform atomic operations instead
of locking. Let's only regroup them for now under a new "curr_rg_idx".
The upper word is the range, the lower is the index.
2023-09-20 21:38:33 +02:00
Willy Tarreau
49ddc0138c CLEANUP: logs: rename a confusing local variable "curr_rg" to "smp_rg"
The variable curr_rg in process_send_log() is misleading because it is
not related to the integer curr_rg that's used to calculate it, instead
it's a pointer to the current smp_log_range from smp_rgs[], so let's call
it "smp_rg" as a singular for this "smp_rgs" and put an end to this
confusion.
2023-09-20 21:38:33 +02:00
Willy Tarreau
3f1284560f MINOR: log: remove the unused curr_idx in struct smp_log_range
This index is useless because it only serves to know when the global
index reached the end, while the global one already knows it. Let's
just drop it and perform the test on the global range.

It was verified with the following config that the first server continues
to take 1/10 of the traffic, the 2nd one 2/10, the 3rd one 3/10 and the
4th one 4/10:

    log 127.0.0.1:10001 sample 1:10 local0
    log 127.0.0.1:10002 sample 2,5:10 local0
    log 127.0.0.1:10003 sample 3,7,9:10 local0
    log 127.0.0.1:10004 sample 4,6,8,10:10 local0
2023-09-20 21:38:33 +02:00
Willy Tarreau
4351364700 MINOR: logs: clarify the check of the log range
The test of the log range is not very clear, in part due to the
reuse of the "curr_idx" name that happens at two levels. The call
to in_smp_log_range() applies to the smp_info's index to which 1 is
added: it verifies that the next index is still within the current
range.

Let's just have a local variable "next_index" in process_send_log()
that gets assigned the next index (current+1) and compare it to the
current range's boundaries. This makes the test much clearer. We can
then simply remove in_smp_log_range() that's no longer needed.
2023-09-20 21:38:33 +02:00
Aurelien DARRAGON
2c9bd3ae80 BUG/MINOR: server: add missing free for server->rdr_pfx
rdr_pfx was not being free during server cleanup, leading to small memory
leak when "redir" argument was used on a server line (HTTP only).

This should be backported to every stable versions.

[For 2.6 and 2.7: the free should be performed in srv_drop() directly.
 For older versions: free in deinit() function near the free for the
 cookie string]
2023-09-15 17:46:49 +02:00
Willy Tarreau
6cbb5a057b Revert "MAJOR: import: update mt_list to support exponential back-off"
This reverts commit c618ed5ff4.

The list iterator is broken. As found by Fred, running QUIC single-
threaded shows that only the first connection is accepted because the
accepter relies on the element being initialized once detached (which
is expected and matches what MT_LIST_DELETE_SAFE() used to do before).
However while doing this in the quic_sock code seems to work, doing it
inside the macro show total breakage and the unit test doesn't work
anymore (random crashes). Thus it looks like the fix is not trivial,
let's roll this back for the time it will take to fix the loop.
2023-09-15 17:13:43 +02:00
William Lallemand
694889ac2d BUILD: quic: fix build on centos 8 and USE_QUIC_OPENSSL_COMPAT
When using USE_QUIC_OPENSSL_COMPAT=1 on centos-8 the build fail this
way:

In file included from src/quic_openssl_compat.c:11:
/usr/include/openssl/kdf.h:33:46: error: unknown type name 'va_list'
 int EVP_KDF_vctrl(EVP_KDF_CTX *ctx, int cmd, va_list args);

This is because of openssl/kdf.h being include before openssl-compat.h
2023-09-14 16:26:58 +02:00
Christopher Faulet
89e20033c7 BUG/MAJOR: mux-h2: Report a protocol error for any DATA frame before headers
If any DATA frame is received before all headers are fully received, a
protocol error must be reported. It is required by the HTTP/2 RFC but it is
also important because the HTTP analyzers expect the first HTX block is a
start-line. It leads to a crash if this statement is not respected.

For instance, it is possible to trigger a crash by sending an interim
message with a DATA frame (It may be an empty DATA frame with the ES
flag). AFAIK, only the server side is affected by this bug.

To fix the issue, an protocol error is reported for the stream.

This patch should fix the issue #2291. It must be backported as far as 2.2
(and probably to 2.0 too).
2023-09-14 11:39:39 +02:00
Frédéric Lécaille
3921bf80c7 BUG/MINOR: quic: Leak of frames to send.
In very rare cases, it is possible that packet are detected as lost, their frames
requeued, then the connection is released without releasing for any reason (to
be killed because of a sendto() fatal failure for instance. Such frames are lost
and never release because the function which release their packet number spaces
does not release the frames which are still enqueued to be send.

Must be backported as far as 2.6.
2023-09-13 15:32:14 +02:00
William Lallemand
c7424a1bac MINOR: samples: implement bytes_in and bytes_out samples
%[bytes_in] and %[bytes_out] are equivalent to %U and %B tags in
log-format.
2023-09-13 14:54:50 +02:00
Willy Tarreau
5abbae2d3d CLEANUP: pools: simplify the pool expression when no pool was matched in dump
When dumping pool information, we make a special case of the condition
where the pool couldn't be identified and we consider that it was the
correct one. In the code arrangements brought by commit efc46dede ("DEBUG:
pools: inspect pools on fatal error and dump information found"), a
ternary expression for testing this depends on the "if" block condition
so this can be simplified and will make Coverity happy. This was reported
in GH #2290.
2023-09-13 13:31:41 +02:00
Willy Tarreau
c618ed5ff4 MAJOR: import: update mt_list to support exponential back-off
The new mt_list code supports exponential back-off on conflict, which
is important for use cases where there is contention on a large number
of threads. The API evolved a little bit and required some updates:

  - mt_list_for_each_entry_safe() is now in upper case to explicitly
    show that it is a macro, and only uses the back element, doesn't
    require a secondary pointer for deletes anymore.

  - MT_LIST_DELETE_SAFE() doesn't exist anymore, instead one just has
    to set the list iterator to NULL so that it is not re-inserted
    into the list and the list is spliced there. One must be careful
    because it was usually performed before freeing the element. Now
    instead the element must be nulled before the continue/break.

  - MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been
    unclear. They were replaced by mt_list_cut_around() and
    mt_list_connect_elem() which more explicitly detach the element
    and reconnect it into the list.

  - MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is
    in list.h. It may however possibly benefit from being upstreamed.

This required tiny adaptations to event_hdl.c and quic_sock.c. The
test case was updated and the API doc added. Note that in order to
keep include files small, the struct mt_list definition remains in
list-t.h (par of the internal API) and was ifdef'd out in mt_list.h.

A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with
80 threads shows a drastic reduction of CPU usage thanks to this and
the refined memory barriers. Please note that the CPU usage on OpenSSL
3.0.9 is significantly higher due to the excessive use of atomic ops
by openssl, but 3.1 is only slightly above 1.1.1 though:

  - before: 35 Gbps, 3.5 Mpps, 7800% CPU
  - after:  41 Gbps, 4.2 Mpps, 2900% CPU
2023-09-13 11:50:33 +02:00
Christopher Faulet
13fb7170be BUG/MEDIUM: master/cli: Pin the master CLI on the first thread of the group 1
There is no reason to start the master CLI on several threads and on several
groups. And in fact, it must not be done otherwise the same FD is inserted
several times in the fdtab, leading to a crash during startup because of a
BUG_ON(). It happens when several groups are configured.

To fix the bug the master CLI is now pinned on the first thread of the first
group.

This patch should fix the issue #2259 and must be backported to 2.8.
2023-09-13 10:26:32 +02:00
Christopher Faulet
665703d456 BUG/MEDIUM: mux-fcgi: Don't swap trash and dbuf when handling STDERR records
trahs chunks are buffers but not allocated from the buffers pool. And the
"trash" chunk is static and thread-local. It is two reason to not swap it
with a regular buffer allocated from the buffers pool.

Unfortunatly, it is exactly what is performed in the FCGI mux when a STDERR
record is handled. b_xfer() is used to copy data from the demux buffer to
the trash to format the error message. A zeor-copy via a swap may be
performed. In this case, this leads to a memory corruption and a crash
because, some time later, the demux buffer is released because it is
empty. And it is in fact the trash chunk.

b_force_xfer() must be used instead. This function forces the copy.

This patch must be backported as far as 2.2. For 2.4 and 2.2, b_force_xfer()
does not exist. For these versions, the following commit must be backported
too:

  * c7860007cc ("MINOR: buf: Add b_force_xfer() function")
2023-09-12 19:50:17 +02:00
Aurelien DARRAGON
1115fc348e BUG/MINOR: hlua/init: coroutine may not resume itself
It's not supported to call lua_resume with <L> and <from> designating
the same lua coroutine. It didn't cause visible bugs so far because
Lua 5.3 used to be more permissive about this, and moreover, yielding
is not involved during the hlua init state.

But this is wrong usage, and the doc clearly specifies that the <from>
argument can be NULL when there is no such coroutine, which is the case
here.

This should be backported in every stable versions.
2023-09-12 19:50:17 +02:00
Aurelien DARRAGON
e7281f3f5d BUG/MEDIUM: hlua: don't pass stale nargs argument to lua_resume()
In hlua_ctx_resume(), we call lua_resume() function like this:

  lua_resume(lua->T, hlua_states[lua->state_id], lua->nargs)

Once the call returns, we may call the function again with the same
hlua context when E_YIELD is returned (the execution was interrupted
and may be resumed through another lua_resume() call).

The 3rd argument to lua_resume(), 'nargs', is a hint passed to Lua to
know how many (optional) arguments were pushed on the stack prior to
resuming the execution (arguments that Lua will then expose to the Lua
script).

But here is the catch: we never reset lua->nargs between successive
lua_resume() calls, meaning that next lua_resume() calls will still
inherit from the initial nargs value that was set in hlua ctx prior
to calling hlua_ctx_resume() (our wrapper function) for the first time.

This is problematic, because despite not being explicitly mentioned in
the Lua documentation, passed arguments (to which `nargs` refer to), are
already consumed once lua_resume() returns.

This means that we cannot keep calling lua_resume() with non-zero nargs
if we don't push new arguments on the stack prior to resuming lua after
the initial call: nargs is proper to a single lua_resume() invocation.

Despite improper use of lua_resume() for a long time, this didn't cause
visible issues in the past with Lua 5.3, but it is particularly sensitive
starting with Lua 5.4.3 due to debugging hooks improvements that led to
some internal changes (see: lua/lua@58aa09a). Not using nargs properly
now exposes us to undefined behavior when resuming after a yield triggered
from a debugging hook, which may cause running scripts to crash
unexpectedly: for instance with Lua raising errors and complaining about
values being NULL where it should not be the case.

For reference, this issue was initially raised on the Lua mailing list:
  http://lua-users.org/lists/lua-l/2023-09/msg00005.html

In this patch, we immediately reset nargs when lua_resume() returns to
prevent any misuse.

It should be backported to every maintained versions.
2023-09-12 19:50:17 +02:00
Willy Tarreau
93c2ea0ec3 MEDIUM: pools: refine pool size rounding
The pools sizes were rounded up a little bit too much with commit
30f931ead ("BUG/MEDIUM: pools: fix the minimum allocation size"). The
goal was in fact to make sure they were always at least large enough to
store 2 list heads, and stuffing this into the alignment calculation
resulted in the size being always rounded up to this size. This is
problematic because it means that the appended tag at the end doesn't
always catch potential overflows since more bytes than needed are
allocated. Moreover, this test was later reinforced by commit b5ba09ed5
("BUG/MEDIUM: pools: ensure items are always large enough for the
pool_cache_item"), proving that the first test was not always sufficient.

This needs to be reworked to proceed correctly:
  - the two lists are needed when the object is in the cache, hence
    when we don't care about the tag, which means that the tag's size,
    if any, can easily cover for the missing bytes to reach that size.
    This is actually what was already being checked for.

  - the rounding should not be performed (beyond the size of a word to
    preserve pointer alignment) when pool tagging is enabled, otherwise
    we don't detect small overflows. It means that there will be less
    merging when proceeding like this. Tests show that we merge 93 pools
    into 36 without tags and 43 with tags enabled.

  - the rounding should not consider the extra size, since it's already
    done when calculating the allocated size later (i.e. don't round up
    twice). The difference is subtle but it's what makes sure the tag
    immediately follows the area instead of starting from the end.

Thanks to this, now when writing one byte too many at the end of a struct
stream, the error is instantly caught.
2023-09-12 18:14:05 +02:00
Willy Tarreau
61575769ac DEBUG: pools: print the contents surrounding the expected tag location
When no tag matches a known pool, we can inspect around to help figure
what could have possibly overwritten memory. The contents are printed
one machine word per line in hex, then using printable characters, and
when they can be resolved to a pointer, either the pool's pointer name
or a resolvable symbol with offset. The goal here is to help recognize
what is easily identifiable in memory.

For example applying the following patch to stream_free():

  -	pool_free(pool_head_stream, s);
  +	pool_free(pool_head_stream, (void*)s+1);

Causes the following dump to be emitted:

  FATAL: pool inconsistency detected in thread 1: tag mismatch on free().
    caller: 0x59e968 (stream_free+0x6d8/0xa0a)
    item: 0x13df5c1
    pool: 0x12782c0 ('stream', size 888, real 904, users 1)
  Tag does not match (0x4f00000000012782). Tag does not match any other pool.
  Contents around address 0x13df5c1+888=0x13df939:
         0x13df918 [00 00 00 00 00 00 00 00] [........]
         0x13df920 [00 00 00 00 00 00 00 00] [........]
         0x13df928 [00 00 00 00 00 00 00 00] [........]
         0x13df930 [00 00 00 00 00 00 00 00] [........]
         0x13df938 [c0 82 27 01 00 00 00 00] [..'.....] [pool:stream]
         0x13df940 [4f c0 59 00 00 00 00 00] [O.Y.....] [stream_new+0x4f/0xbec]
         0x13df948 [49 46 49 43 41 54 45 2d] [IFICATE-]
         0x13df950 [81 02 00 00 00 00 00 00] [........]
         0x13df958 [df 13 00 00 00 00 00 00] [........]
  Other possible callers:
  (...)

We notice that the tag references pool_head_stream with the allocation
point in stream_new. Another benefit is that a caller may be figured
from the tag even if the "caller" feature is not enabled, because upon
a free() we always put the caller's location into the tag. This should
be sufficient to debug most cases that normally require gdb.
2023-09-12 18:14:05 +02:00
Willy Tarreau
0f9a10c7f1 DEBUG: pools: also print the value of the tag when it doesn't match
Sometimes the tag's value may reveal a recognizable pattern, so let's
print it when it doesn't match a known pool.
2023-09-12 18:14:05 +02:00
Willy Tarreau
96c1a24224 DEBUG: pools: also print the item's pointer when crashing
It's important to inspect a core or recognize some values to have the
item pointer, it was not provided.
2023-09-12 18:14:05 +02:00
Willy Tarreau
efc46dede9 DEBUG: pools: inspect pools on fatal error and dump information found
It's a bit frustrating sometimes to see pool checks catch a bug but not
provide exploitable information without a core.

Here we're adding a function "pool_inspect_item()" which is called just
before aborting in pool_check_pattern() and POOL_DEBUG_CHECK_MARK() and
which will display the error type, the pool's pointer and name, and will
try to check if the item's tag matches the pool, and if not, will iterate
over all pools to see if one would be a better candidate, then will try
to figure the last known caller and possibly other likely candidates if
the pool's tag is not sufficiently trusted. This typically helps better
diagnose corruption in use-after-free scenarios, or freeing to a pool
that differs from the one the object was allocated from, and will also
indicate calling points that may help figure where an object was last
released or allocated. The info is printed on stderr just before the
backtrace.

For example, the recent off-by-one test in the PPv2 changes would have
produced the following output in vtest logs:

  ***  h1    debug|FATAL: pool inconsistency detected in thread 1: tag mismatch on free().
  ***  h1    debug|  caller: 0x62bb87 (conn_free+0x147/0x3c5)
  ***  h1    debug|  pool: 0x2211ec0 ('pp_tlv_256', size 304, real 320, users 1)
  ***  h1    debug|Tag does not match. Possible origin pool(s):
  ***  h1    debug|  tag: @0x2565530 = 0x2216740 (pp_tlv_128, size 176, real 192, users 1)
  ***  h1    debug|Recorded caller if pool 'pp_tlv_128':
  ***  h1    debug|  @0x2565538 (+0184) = 0x62c76d (conn_recv_proxy+0x4cd/0xa24)

A mismatch in the allocated/released pool is already visible, and the
callers confirm it once resolved, where the allocator indeed allocates
from pp_tlv_128 and conn_free() releases to pp_tlv_256:

  $ addr2line -spafe ./haproxy <<< $'0x62bb87\n0x62c76d'
  0x000000000062bb87: conn_free at connection.c:568
  0x000000000062c76d: conn_recv_proxy at connection.c:1177
2023-09-11 15:46:14 +02:00
Willy Tarreau
f6bee5a50b DEBUG: pools: make pool_check_pattern() take a pointer to the pool
This will be useful to report detailed bug traces.
2023-09-11 15:19:49 +02:00
Willy Tarreau
e92e96b00f DEBUG: pools: pass the caller pointer to the check functions and macros
In preparation for more detailed pool error reports, let's pass the
caller pointers to the check functions. This will be useful to produce
messages indicating where the issue happened.
2023-09-11 15:19:49 +02:00
Willy Tarreau
baf2070421 DEBUG: pools: always record the caller for uncached allocs as well
When recording the caller of a pool_alloc(), we currently store it only
when the object comes from the cache and never when it comes from the
heap. There's no valid reason for this except that the caller's pointer
was not passed to pool_alloc_nocache(), so it used to set NULL there.
Let's just pass it down the chain.
2023-09-11 15:19:49 +02:00
Frédéric Lécaille
2dedbe76c9 BUG/MINOR: quic: fdtab array underflow access
When using the listener socket as file descriptor, qc->fd value is -1.
In this case one must not access fdtab[qc->fd] element to change its value.

This bug could have been detected by asan with such a backtrace:

=================================================================
==402222==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa8ecf417ex7fa8e915cf90 sp 0x7fa8e915cf88
WRITE of size 8 at 0x7fa8ecf417e8 thread T6
    #0 0x55707a0bf18a in qc_new_cc_conn src/quic_conn.c:838
    #1 0x55707a0c6dc0 in quic_conn_release src/quic_conn.c:1408
    #2 0x55707a10916f in quic_close src/xprt_quic.c:35
    #3 0x55707a0cec77 in conn_xprt_close include/haproxy/connection.h:153
    #4 0x55707a0ceed0 in conn_full_close include/haproxy/connection.h:197
    #5 0x55707a0ec253 in qcc_release src/mux_quic.c:2412
    #6 0x55707a0ec7d0 in qcc_io_cb src/mux_quic.c:2443
    #7 0x55707a63ff2a in run_tasks_from_lists src/task.c:596
    #8 0x55707a641cc9 in process_runnable_tasks src/task.c:876
    #9 0x55707a56f7b2 in run_poll_loop src/haproxy.c:2954
    #10 0x55707a5705fd in run_thread_poll_loop src/haproxy.c:3153
    #11 0x7fa8f9450ea6 in start_thread nptl/pthread_create.c:477
    #12 0x7fa8f936ea2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

0x7fa8ecf417e8 is located 24 bytes to the left of 134217728-byte region [0x7fa8e
allocated by thread T0 here:
    #0 0x7fa8f9a37037 in __interceptor_calloc ../../../../src/libsanitizer/asan/
    #1 0x55707a71a61d in init_pollers src/fd.c:1161
    #2 0x55707a56cdf1 in init src/haproxy.c:2672
    #3 0x55707a5714c2 in main src/haproxy.c:3298
    #4 0x7fa8f9296d09 in __libc_start_main ../csu/libc-start.c:308

Thread T6 created by T0 here:
    #0 0x7fa8f99e22a2 in __interceptor_pthread_create ../../../../src/libsanitizpp:214
    #1 0x55707a748a21 in setup_extra_threads src/thread.c:252
    #2 0x55707a5735c9 in main src/haproxy.c:3844
    #3 0x7fa8f9296d09 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow src/quic_conn.c:838 in qc_new_cc
Shadow bytes around the buggy address:
  0x0ff59d9e02a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0ff59d9e02f0: fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]fa fa
  0x0ff59d9e0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==402222==ABORTING
Aborted

Thank you to @Tristan971 for having reported this bug in GH #2247.

No need to backport.
2023-09-11 15:14:22 +02:00
Willy Tarreau
4a18d9e560 REORG: cpuset: move parse_cpu_set() and parse_cpumap() to cpuset.c
These ones were still in cfgparse.c but they're not specific to the
config at all and may actually be used even when parsing cpu list
entries in /sys. Better move them where they can be reused.
2023-09-08 16:25:19 +02:00
Willy Tarreau
5119109e3f MINOR: cpuset: dynamically allocate cpu_map
cpu_map is 8.2kB/entry and there's one such entry per group, that's
~520kB total. In addition, the init code is still in haproxy.c enclosed
in ifdefs. Let's make this a dynamically allocated array in the cpuset
code and remove that init code.

Later we may even consider reallocating it once the number of threads
and groups is known, in order to shrink it a little bit, as the typical
setup with a single group will only need 8.2kB, thus saving half a MB
of RAM. This would require that the upper bound is placed in a variable
though.
2023-09-08 16:25:19 +02:00
Willy Tarreau
b0f20ed79b MEDIUM: cfgparse: assign NUMA affinity to cpu-maps
Do not force affinity on the process, instead let's just apply it to
cpu-map, it will automatically be used later in the init process. We
can do this because we know that cpu-map was not set when we're using
this detection code.

This is much saner, as we don't need to manipulate the process' affinity
at this point in time, and just update the info that the user omitted to
set by themselves, which guarantees a better long-term consistency with
the documented feature.
2023-09-08 16:25:19 +02:00
Willy Tarreau
809a49da96 MINOR: cfgparse: use read_line_from_trash() to read from /sys
It's easier to use this function now to natively support variable
fields in the file's path. This also removes read_file_from_trash()
that was only used here and was static.
2023-09-08 16:25:19 +02:00
Willy Tarreau
1f2433fb6a MINOR: tools: add function read_line_to_trash() to read a line of a file
This function takes on input a printf format for the file name, making
it particularly suitable for /proc or /sys entries which take a lot of
numbers. It also automatically trims the trailing CR and/or LF chars.
2023-09-08 16:25:19 +02:00
Willy Tarreau
5f10176e2c MEDIUM: init: initialize the trash earlier
More and more utility functions rely on the trash while most of the init
code doesn't have access to it because it's initialized very late (in
PRE_CHECK for the initial one). It's a pool, and it purposely supports
being reallocated, so let's initialize it in STG_POOL so that early
STG_INIT code can at least use it.
2023-09-08 16:25:19 +02:00
Frédéric Lécaille
e3e218b98e CLEANUP: quic: Remove useless free_quic_tx_pkts() function.
This function define but no more used since this commit:
    BUG/MAJOR: quic: Really ignore malformed ACK frames.
2023-09-08 10:17:25 +02:00
Frédéric Lécaille
292dfdd78d BUG/MINOR: quic: Wrong cluster secret initialization
The function generate_random_cluster_secret() which initializes the cluster secret
when not supplied by configuration is buggy. There 1/256 that the cluster secret
string is empty.

To fix this, one stores the cluster as a reduced size first 128 bits of its own
SHA1 (160 bits) digest, if defined by configuration. If this is not the case, it
is initialized with a 128 bits random value. Furthermore, thus the cluster secret
is always initialized.

As the cluster secret is always initialized, there are several tests which
are for now on useless. This patch removes such tests (if(global.cluster_secret))
in the QUIC code part and at parsing time: no need to check that a cluster
secret was initialized with "quic-force-retry" option.

Must be backported as far as 2.6.
2023-09-08 09:50:58 +02:00
William Lallemand
15e591b6e0 MINOR: ssl: add support for 'curves' keyword on server lines
This patch implements the 'curves' keyword on server lines as well as
the 'ssl-default-server-curves' keyword in the global section.

It also add the keyword on the server line in the ssl_curves reg-test.

These keywords allow the configuration of the curves list for a server.
2023-09-07 23:29:10 +02:00
Willy Tarreau
28ff1a5d56 MINOR: tasks/stats: report the number of niced tasks in "show info"
We currently know the number of tasks in the run queue that are niced,
and we don't expose it. It's too bad because it can give a hint about
what share of the load is relevant. For example if one runs a Lua
script that was purposely reniced, or if a stats page or the CLI is
hammered with slow operations, seeing them appear there can help
identify what part of the load is not caused by the traffic, and
improve monitoring systems or autoscalers.
2023-09-06 17:44:44 +02:00
Remi Tricot-Le Breton
e03d060aa3 MINOR: cache: Change hash function in default normalizer used in case of "vary"
When building the secondary signature for cache entries when vary is
enabled, the referer part of the signature was a simple crc32 of the
first referer header.
This patch changes it to a 64bits hash based of xxhash algorithm with a
random seed built during init. This will prevent "malicious" hash
collisions between entries of the cache.
2023-09-06 16:11:31 +02:00
Aurelien DARRAGON
7a71801af6 CLEANUP: log: remove unnecessary trim in __do_send_log
Since both sink_write and fd_write_frag_line take the maxlen parameter
as argument, there is no added value for the trim before passing the
msg parameter to those functions.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
8e6339aa29 MEDIUM: sink: add sink_finalize() function
To further clean the code and remove duplication, some sink postparsing
and sink->sft finalization is now performed in a dedicated function
named sink_finalize().
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
b2879e3502 MEDIUM: sink/ring: introduce high level ring creation helper function
ease code maintenance.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
5a8755681d MINOR: sink: add helper function to deallocate sink struct
In this patch we move sink freeing logic outside of sink_deinit() function
in order to create the sink_free() helper function that could be used
on error paths for example.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
6049a478e4 MEDIUM: spoe-agent: properly postresolve log rings
Now that we have sink_postresolve_logsrvs() function, we make use of it
for spoe-agent log postparsing logic.

This will allow this kind of config to work:
  |spoe-agent test
  |        log tcp@127.0.0.1:514 local0
  |        use-backend xxx

Plus, consistency checks will also be performed as for regular log
directives used from global, log-forward or proxy sections.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
486aa01204 MEDIUM: fcgi-app: properly postresolve logsrvs
Now that we have postresolve_logsrv_list() function, we make use of it
for fcgi-app log postparsing logic.

This will allow this kind of config to work:
  |fcgi-app test
  |        docroot /
  |        log-stderr tcp@127.0.0.1:514 local0

Plus, consistency checks will also be performed as for regular log
directives used from global, log-forward or proxy sections.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
d9b81e5b49 MEDIUM: log/sink: make logsrv postparsing more generic
We previously had postparsing logic but only for logsrv sinks, but now we
need to make this operation on logsrv directly instead of sinks to prepare
for additional postparsing logic that is not sink-specific.

To do this, we migrated post_sink_resolve() and sink_postresolve_logsrvs()
to their postresolve_logsrvs() and postresolve_logsrv_list() equivalents.

Then, we split postresolve_logsrv_list() so that the sink-only logic stays
in sink.c (sink_resolve_logsrv_buffer() function), and the "generic"
target part stays in log.c as resolve_logsrv().

Error messages formatting was preserved as far as possible but some slight
variations are to be expected.
As for the functional aspect, no change should be expected.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
969e212c66 MINOR: log: add dup_logsrv() helper function
ease code maintenance by introducing dup_logsrv() helper function to
properly duplicate an existing logsrv struct.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
7a12e2d369 MEDIUM: httpclient/logs: rely on per-proxy post-check instead of global one
httpclient used to register a global post-check function to iterate over
all known proxies and post-initialize httpclient related ones (mainly
for logs initialization).

But we currently have an issue: post_sink_resolve() function which is
also registered using REGISTER_POST_CHECK() macro conflicts with
httpclient_postcheck() function.

This is because post_sink_resolve() relies on proxy->logsrvs to be
correctly initialized already, and httpclient_postcheck() may create
and insert new logsrvs entries to existing proxies when executed.

So depending on which function runs first, we could run into trouble.

Hopefully, to this day, everything works "by accident" due to
http_client.c file being loaded before sink.c file when compiling source
code.

But as soon as we would move one of the two functions to other files, or
if we rename files or make changes to the Makefile build recipe, we could
break this at any time.

To prevent post_sink_resolve() from randomly failing in the future, we now
make httpclient postcheck rely on per-proxy post-checks by slightly
modifying httpclient_postcheck() function so that it can be registered
using REGISTER_POST_PROXY_CHECK() macro.

As per-proxy post-check functions are executed right after config parsing
for each known proxy (vs global post-check which are executed a bit later
in the init process), we can be certain that functions registered using
global post-check macro, ie: post_sink_resolve(), will always be executed
after httpclient postcheck, effectively resolving the ordering conflict.

This should normally not cause visible behavior changes, and while it
could be considered as a bug, it's probably not worth backporting it
since the only way to trigger the issue is through code refactors,
unless we want to backport it to ease code maintenance of course,
in which case it should easily apply for >= 2.7.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
e187361b52 MINOR: log: move log-forwarders cleanup in log.c
Move the log-forwarded proxies cleanup from global deinit() function into
log dedicated deinit function.

No backport needed.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
32f1db6d0d MEDIUM: sink: don't perform implicit truncations when maxlen is not set
maxlen now defaults ~0 (instead of BUFSIZE) to make sure no implicit
truncation will be performed when the option is not specified, since the
doc doesn't mention any default value for maxlen. As such, if the payload
is too big, it will be dropped (this is the default expected behavior).
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
fdf82d058b MINOR: sink: inform the user when logs will be implicitly truncated
Consider the following example:

 |log ring@test-ring len 2000 local0
 |
 |ring test-ring
 |  maxlen 1000

This would result in emitted logs being silently truncated to 1000 because
test-ring maxlen is smaller than the log directive maxlen.

In this patch we're adding an extra check in post_sink_resolve() to detect
this kind of confusing setups and warn the user about the implicit
truncation when DIAG mode is on.

This commit depends on:
 - "MINOR: sink: simplify post_sink_resolve function"
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
ceaa1ddb06 MINOR: log/sink: detect when log maxlen exceeds sink size
To prevent logs from being silently (and unexpectly droppped) at runtime,
we check that the maxlen parameter from the log directives are
strictly inferior to the targeted ring size.

      |global
      |     tune.bufsize 16384
      |     log tcp@127.0.0.1:514 len 32768
      |     log myring@127.0.0.1:514 len 32768
      |ring myring
      |     # no explicit size

On such configs, a diag warning will be reported.

This commit depends on:
 - "MINOR: sink: simplify post_sink_resolve function"
 - "MINOR: ring: add a function to compute max ring payload"
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
d499485aa9 MINOR: sink: simplify post_sink_resolve function
Simplify post_sink_resolve() function to reduce code duplication and
make it easier to maintain.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
ddd8671b19 BUG/MEDIUM: ring: adjust maxlen consistency check
When user specifies a maxlen parameter that is greater than the size of
a given ring section, a warning is emitted to inform that the max length
exceeds size, and then the maxlen is forced to size.

The logic is good, but imprecise, because it doesn't take into account
the slight overhead from storing payloads into the ring.

In practise, we cannot store a single message which is exactly the same
length than size. Doing so will result in the message being dropped at
runtime.

Thanks to the ring_max_payload() function introduced in "MINOR: ring: add
a function to compute max ring payload", we can now deduce the maximum
value for the maxlen parameter before it could result in messages being
dropped.

When maxlen value is set to an improper value, the warning will be emitted
and maxlen will be forced to the maximum "single" payload len that could
fit in the ring buffer, preventing messages from being dropped
unexpectedly.

This commit depends on:
 - "MINOR: ring: add a function to compute max ring payload"

This may be backported as far as 2.2
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
5b295ff409 MINOR: ring: add a function to compute max ring payload
Add a helper function to the ring API to compute the maximum payload
length that could fit into the ring based on ring size.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
4457783ade MINOR: http_ana: position the FINAL flag for http_after_res execution
Ensure that the ACT_OPT_FINAL flag is always set when executing actions
from http_after_res context.

This will permit lua functions to be executed as http_after_res actions
since hlua_ctx_resume() automatically disables "yielding" when such flag
is set: the hlua handler will only allow 1shot executions at this point
(lua or not, we don't wan't to reschedule http_after_res actions).
2023-09-06 11:42:34 +02:00
Aurelien DARRAGON
967608a432 BUG/MINOR: hlua/action: incorrect message on E_YIELD error
When hlua_action error messages were reworked in d5b073cf1
("MINOR: lua: Improve error message"), an error was made for the
E_YIELD case.

Indeed, everywhere E_YIELD error is handled: "yield is not allowed" or
similar error message is reported to the user. But instead we currently
have: "aborting Lua processing on expired timeout".

It is quite misleading because this error message often refers to the
HLUA_E_ETMOUT case.

Thus, we now report the proper error message thanks to this patch.

This should be backported to all stable versions.
[on 2.0, the patch needs to be slightly adapted]
2023-09-06 11:42:34 +02:00
Frédéric Lécaille
e7240a0ba6 BUG/MINOR: quic: Dereferenced unchecked pointer to Handshke packet number space
This issue was reported by longrtt interop test with quic-go as client
and @chipitsine in GH #2282 when haproxy is compiled against libressl.

Add two checks to prevent a pointer to the Handshake packet number space
to be dereferenced if this packet number space was released.

Thank you to @chipitsine for this report.

No need to backport.
2023-09-06 10:13:40 +02:00
Christopher Faulet
700ca14fc1 BUG/MINOR: ring/cli: Don't expect input data when showing events
The "show events" command may wait for now events if "-w" option is used. In
this case, no timeout must be triggered. So we explicitly state no input
data are expected. This disables the read timeout on the client side.

This patch should be backported to 2.8. It is probably useless to backport
it further. In all cases, it depends on the commit "BUG/MINOR: applet:
Always expect data when CLI is waiting for a new command"
2023-09-06 09:36:29 +02:00
Christopher Faulet
2f1e0a0a46 BUG/MINOR: applet: Always expect data when CLI is waiting for a new command
There is a mechanism for applets to disable the read timeout on the opposite
side if it is now waiting for any data. Of course, there is also a way to
re-activate it. But, it must excplicitly be handle by applets.

For the CLI, some commands may state no input data are expected. So we must
be sure to reset its state when the applet is waiting for a new command. For
now, it is not a bug because no CLI command uses this mechanism.

This patch must be backported to 2.8.
2023-09-06 09:36:19 +02:00
Christopher Faulet
8073094bf1 NUG/MEDIUM: stconn: Always update stream's expiration date after I/O
It is a revert of following patches:

  * d7111e7ac ("MEDIUM: stconn: Don't requeue the stream's task after I/O")
  * 3479d99d5 ("BUG/MEDIUM: stconn: Update stream expiration date on blocked sends")

Because the first one is reverted, the second one is useless and can be reverted
too.

The issue here is that I/O may be performed without stream wakeup. So if no
expiration date was set on the last call to process_stream(), the stream is
never rescheduled and no timeout can be detected. This especially happens on
TCP streams because fast-forward is enabled very early.

Instead of tracking all places where the stream's expiration data must be
updated, it is now centralized in sc_notify(), as it was performed before
the timeout refactoring.

This patch must be backported to 2.8.
2023-09-06 09:29:27 +02:00
Christopher Faulet
b9c87f8082 BUG/MEDIUM: stconn/stream: Forward shutdown on write timeout
The commit 7f59d68fe2 ("BUG/MEDIIM: stconn: Flush output data before
forwarding close to write side") introduced a regression. When a write
timeout is detected, the shutdown is no longer forwarded. Dependig on the
channels state, it may block the processing, waiting the client or the
server leaves.

The commit above tries to avoid to truncate messages on shutdown but on
write timeout, if the channel is not empty, there is nothing more we can do
to send these data. It means the endpoint is unable to send data. In this
case, we must forward the shutdown.

This patch should be backported as far as 2.2.
2023-09-06 09:29:27 +02:00
Christopher Faulet
d18657ae11 BUG/MEDIUM: applet: Report an error if applet request more room on aborted SC
If an abort was performed and the applet still request more room, it means
the applet has not properly handle the error on its own. At least the CLI
applet is concerned. Instead of reviewing all applets, the error is now
handled in task_run_applet() function.

Because of this bug, a session may be blocked infinitly and may also lead to
a wakup loop.

This patch must only be backported to 2.8 for now. And only to lower
versions if a bug is reported because it is a bit sensitive and the code
older versions are very different.
2023-09-06 09:29:27 +02:00
Christopher Faulet
34645a6365 BUG/MEDIUM: stconn: Report read activity when a stream is attached to front SC
It only concerns the front SC. But it is important to report a read activity
when a stream is created and attached to the front SC, especially in TCP. In
HTTP, when this happens, the request was necessarily received. But in TCP,
the client may open a connection without sending anything. We must still
report a first read activity in this case to be able to properly report
client timeout.

This patch must be backported to 2.8.
2023-09-06 09:29:27 +02:00
Christopher Faulet
015fec6a29 BUG/MINOR: stconn: Don't inhibit shutdown on connection on error
In the SC function responsible to perform shutdown, there is a statement
inhibiting the shutdown if an error was encountered on the SC. This
statement is inherited from very old version and should in fact be
removed. The error may be set from the stream. In this case the shutdown
must be performed. In all cases, it is not a big deal if the shutdown is
performed twice because underlying functions already handle multiple calls.

This patch does not fix any bug. Thus there is no reason to backport it.
2023-09-06 09:29:27 +02:00
Frédéric Lécaille
fb4294be55 BUG/MINOR: quic: Wrong RTT computation (srtt and rrt_var)
Due to the fact that several variable values (rtt_var, srtt) were stored as multiple
of their real values, some calculations were less accurate as expected.

Stop storing 4*rtt_var values, and 8*srtt values.
Adjust all the impacted statements.

Must be backported as far as 2.6.
2023-09-05 17:14:51 +02:00
Frédéric Lécaille
cf768f7456 BUG/MINOR: quic: Wrong RTT adjusments
There was a typo in the test statement to check if the rtt must be adjusted
(>= incorectly replaced by >).

Must be backported as far as 2.6.
2023-09-05 17:14:51 +02:00
William Lallemand
6bc00a97da MINOR: httpclient: allow to configure the timeout.connect
When using the httpclient, one could be bothered with it returning
after a very long time when failing. By default the httpclient has a
retries of 3 and a timeout connect of 5s, which can results in pause of
20s upon failure.

This patch allows the user to configure the "timeout connect" of the
httpclient so it could reduce the time to return an error.

This patch helps fixing part of the issue #2269.

Could be backported in 2.7 if needed.
2023-09-05 16:42:27 +02:00
William Lallemand
c52948bd2c MINOR: httpclient: allow to configure the retries
When using the httpclient, one could be bothered with it returning after
a very long time when failing. By default the httpclient has a retries
of 3 and a timeout connect of 5s, which can results in pause of 20s
upon failure.

This patch allows the user to configure the retries of the httpclient so
it could reduce the time to return an error.

This patch helps fixing part of the issue #2269.

Could be backported in 2.7 if needed.
2023-09-05 15:55:04 +02:00
William Lallemand
fcb080d8f9 MEDIUM: mworker: display a more accessible message when a worker crash
Should fix issue #1034.

Display a more accessible message when a worker crash about what to do.

Example:

	$ ./haproxy -W -f haproxy.cfg
	[NOTICE]   (308877) : New worker (308884) forked
	[NOTICE]   (308877) : Loading success.
	[NOTICE]   (308877) : haproxy version is 2.9-dev4-d90d3b-58
	[NOTICE]   (308877) : path to executable is ./haproxy
	[ALERT]    (308877) : Current worker (308884) exited with code 139 (Segmentation fault)
	[WARNING]  (308877) : A worker process unexpectedly died and this can only be explained by a bug in haproxy or its dependencies.
	Please check that you are running an up to date and maintained version of haproxy and open a bug report.
	HAProxy version 2.9-dev4-d90d3b-58 2023/09/05 - https://haproxy.org/
	Status: development branch - not safe for use in production.
	Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
	Running on: Linux 6.2.0-31-generic #31-Ubuntu SMP PREEMPT_DYNAMIC Mon Aug 14 13:42:26 UTC 2023 x86_64
	[ALERT]    (308877) : exit-on-failure: killing every processes with SIGTERM
	[WARNING]  (308877) : All workers exited. Exiting... (139)
2023-09-05 15:31:04 +02:00
William Lallemand
d90d3bf894 MINOR: global: export the display_version() symbol
Export the display_version() function which can be used elsewhere than
in haproxy.c
2023-09-05 15:24:39 +02:00
Frédéric Lécaille
aeb2f28ca7 BUG/MINOR: quic: Unchecked pointer to Handshake packet number space
It is possible that there are still Initial crypto data in flight without
Handshake crypto data in flight. This is very rare but possible.

This issue was reported by handshakeloss interop test with quic-go as client
and @chipitsine in GH #2279.

No need to backport.
2023-09-05 11:38:33 +02:00
Frédéric Lécaille
3afe54ed5b BUILD: quic: Compilation issue on 32-bits systems with quic_may_send_bytes()
quic_may_send_bytes() implementation arrived with this commit:

  MINOR: quic: Amplification limit handling sanitization.

It returns a size_t. So when compared with QUIC_MIN() with qc->path->mtu there is
no need to cast this latted anymore because it is also a size_t.

Detected when compiled with -m32 gcc option.
2023-09-05 10:33:56 +02:00
Willy Tarreau
86854dd032 MEDIUM: threads: detect excessive thread counts vs cpu-map
This detects when there are more threads bound via cpu-map than CPUs
enabled in cpu-map, or when there are more total threads than the total
number of CPUs available at boot (for unbound threads) and configured
for bound threads. In this case, a warning is emitted to explain the
problems it will cause, and explaining how to address the situation.

Note that some configurations will not be detected as faulty because
the algorithmic complexity to resolve all arrangements grows in O(N!).
This means that having 3 threads on 2 CPUs and one thread on 2 CPUs
will not be detected as it's 4 threads for 4 CPUs. But at least configs
such as T0:(1,4) T1:(1,4) T2:(2,4) T3:(3,4) will not trigger a warning
since they're valid.
2023-09-04 19:39:17 +02:00
Willy Tarreau
8357f950cb MEDIUM: threads: detect incomplete CPU bindings
It's very easy to mess up with some cpu-map directives and to leave
some thread unbound. Let's add a test that checks that either all
threads are bound or none are bound, but that we do not face the
intermediary situation where some are pinned and others are left
wandering around, possibly on the same CPUs as bound ones.

Note that this should not be backported, or maybe turned into a
notice only, as it appears that it will easily catch invalid
configs and that may break updates for some users.
2023-09-04 19:39:17 +02:00
Willy Tarreau
e65f54cf96 MINOR: cpuset: centralize a reliable bound cpu detection
Till now the CPUs that were bound were only retrieved in
thread_cpus_enabled() in order to count the number of CPUs allowed,
and it relied on arch-specific code.

Let's slightly arrange this into ha_cpuset_detect_bound() that
reuses the ha_cpuset struct and the accompanying code. This makes
the code much clearer without having to carry along some arch-specific
stuff out of this area.

Note that the macos-specific code used in thread.c to only count
online CPUs but not retrieve a mask, so for now we can't infer
anything from it and can't implement it.

In addition and more importantly, this function is reliable in that
it will only return a value when the detection is accurate, and will
not return incomplete sets on operating systems where we don't have
an exact list, such as online CPUs.
2023-09-04 19:39:17 +02:00
Willy Tarreau
d3ecc67a01 MINOR: cpuset: add ha_cpuset_or() to bitwise-OR two CPU sets
This operation was not implemented and will be needed later.
2023-09-04 19:39:17 +02:00
Willy Tarreau
eb10567254 MINOR: cpuset: add ha_cpuset_isset() to check for the presence of a CPU in a set
This function will be convenient to test for the presence of a given CPU
in a set.
2023-09-04 19:39:17 +02:00
Willy Tarreau
fca3fc0d90 BUILD: checks: shut up yet another stupid gcc warning
gcc has always had hallucinations regarding value ranges, and this one
is interesting, and affects branches 4.7 to 11.3 at least. When building
without threads, the randomly picked new_tid that is reduced to a multiply
by 1 shifted right 32 bits, hence a constant output of 0 shows this
warning:

  src/check.c: In function 'process_chk_conn':
  src/check.c:1150:32: warning: array subscript [-1, 0] is outside array bounds of 'struct thread_ctx[1]' [-Warray-bounds]
  In file included from include/haproxy/thread.h:28,
                   from include/haproxy/list.h:26,
                   from include/haproxy/action.h:28,
                   from src/check.c:31:

or this one when trying to force the test to see that it cannot be zero(!):

  src/check.c: In function 'process_chk_conn':
  src/check.c:1150:54: warning: array subscript [0, 0] is outside array bounds of 'struct thread_ctx[1]' [-Warray-bounds]
   1150 |         uint t2_act  = _HA_ATOMIC_LOAD(&ha_thread_ctx[thr2].active_checks);
        |                                         ~~~~~~~~~~~~~^~~~~~
  include/haproxy/atomic.h:66:40: note: in definition of macro 'HA_ATOMIC_LOAD'
     66 | #define HA_ATOMIC_LOAD(val)          *(val)
        |                                        ^~~
  src/check.c:1150:24: note: in expansion of macro '_HA_ATOMIC_LOAD'
   1150 |         uint t2_act  = _HA_ATOMIC_LOAD(&ha_thread_ctx[thr2].active_checks);
        |                        ^~~~~~~~~~~~~~~

Let's just add an ALREADY_CHECKED() statement there, no other check seems
to get rid of it. No backport is needed.
2023-09-04 19:38:51 +02:00
Andrew Hopkins
b3f94f8b3b BUILD: ssl: Build with new cryptographic library AWS-LC
This adds a new option for the Makefile USE_OPENSSL_AWSLC, and
update the documentation with instructions to use HAProxy with
AWS-LC.

Update the type of the OCSP callback retrieved with
SSL_CTX_get_tlsext_status_cb with the actual type for
libcrypto versions greater than 1.0.2. This doesn't affect
OpenSSL which casts the callback to void* in SSL_CTX_ctrl.
2023-09-04 18:19:18 +02:00
Miroslav Zagorac
3cfc30416c MINOR: properly mark the end of the CLI command in error messages
In several places in the file src/ssl_ckch.c, in the message about the
incorrect use of the CLI command, the end of that CLI command is not
correctly marked with the sign ' .
2023-09-04 18:13:43 +02:00
Willy Tarreau
8547f5cfa2 BUG/MINOR: stream: further protect stream_dump() against incomplete sessions
As found by Coverity in issue #2273, the fix in commit e64bccab2 ("BUG/MINOR:
stream: protect stream_dump() against incomplete streams") was still not
enough, as scf/scb are still dereferenced to dump their flags and states.

This should be backported to 2.8.
2023-09-04 15:32:17 +02:00
Chris Staite
3939e39479 BUG/MEDIUM: h1-htx: Ensure chunked parsing with full output buffer
A previous fix to ensure that there is sufficient space on the output buffer
to place parsed data (#2053) introduced an issue that if the output buffer is
filled on a chunk boundary no data is parsed but the congested flag is not set
due to the state not being H1_MSG_DATA.

The check to ensure that there is sufficient space in the output buffer is
actually already performed in all downstream functions before it is used.
This makes the early optimisation that avoids the state transition to
H1_MSG_DATA needless.  Therefore, in order to allow the chunk parser to
continue in this edge case we can simply remove the early check.  This
ensures that the state can progress and set the congested flag correctly
in the caller.

This patch fixes #2262. The upstream change that caused this logic error was
backported as far as 2.5, therefore it makes sense to backport this fix back
that far also.
2023-09-04 12:15:36 +02:00
Willy Tarreau
135c66f6cb BUG/MEDIUM: connection: fix pool free regression with recent ppv2 TLV patches
In commit fecc573da ("MEDIUM: connection: Generic, list-based allocation
and look-up of PPv2 TLVs") there was a tiny mistake, elements of length
<= 128 are allocated from pool_pp_128 but only those of length < 128 are
released to this pool, other ones go to pool_pp_256. Because of this,
elements of size exactly 128 are allocated from 128 and released to 256.
It can be reproduced a few times by running sample_fetches/tlvs.vtc 1000
times with -DDEBUG_DONT_SHARE_POOLS -DDEBUG_MEMORY_POOLS -DDEBUG_EXPR
-DDEBUG_STRICT=2 -DDEBUG_POOL_INTEGRITY -DDEBUG_POOL_TRACING
-DDEBUG_NO_POOLS. Not sure why it doesn't reproduce more often though.

No backport is needed. This should address github issues #2275 and #2274.
2023-09-04 11:45:37 +02:00
Frédéric Lécaille
d52466726f BUG/MINOR: quic: Unchecked pointer to packet number space dereferenced
It is possible that there are still Initial crypto data in flight without
Handshake crypto data in flight. This is very rare but possible.

This issue was reported by long-rtt interop test with quic-go as client
and @chipitsine in GH #2276.

No need to backport.
2023-09-04 11:29:35 +02:00
Frédéric Lécaille
9077f20251 BUG/MAJOR: quic: Really ignore malformed ACK frames.
If not correctly parsed, an ACK frame must be ignored without any more
treatment. Before this patch an ACK frame could be partially correctly
parsed, then some errors could be detected which leaded newly acknowledged
packets to be released in a wrong way calling free_quic_tx_pkts() called
by qc_parse_ack_frm(). But there is no reason to release such packets because
of a malformed ACK frame.

This patch modifies qc_parse_ack_frm(). The newly acknowledged TX packets is done
in two steps. It first collects the newly acknowledged packet calling
qc_newly_acked_pkts(). Then proceed the same way as before for the treatments of
haproxy TX packets acknowledged by the peer. If the ACK frame could not be fully
parsed, the newly ackowledged packets are replaced back from where they were
detached: the tree of TX packets for their encryption level.

Must be backported as far as 2.6.
2023-09-04 11:29:35 +02:00
Frédéric Lécaille
7dad52bdbd MINOR: quic: Add a trace to quic_release_frm()
Display the address of the frame to be released as soon as entering into
quic_release_frm() whose job is obviously to released the memory allocated
for the frame <frm> passed as parameter.
2023-09-04 11:29:35 +02:00
Frédéric Lécaille
3c90c1ce6b BUG/MINOR: quic: Possible skipped RTT sampling
There are very few chances this bug may occur. Furthermore the consequences
are not dramatic: an RTT sampling may be ignored. I guess this may happen
when the now_ms global value wraps.

Do not rely on the time variable value a packet was sent to decide if it
is a newly acknowledged packet but on its presence or not in the tx packet
ebtree.

Must be backported as far as 2.6.
2023-09-04 11:29:35 +02:00
Christopher Faulet
0b93ff8c87 BUG/MEDIUM: stconn: Wake applets on sending path if there is a pending shutdown
An applet is not woken up on sending path if it is not waiting for data or
if it states it will not consume data. However, it is important to still
wake it up if there is a pending shutdown. Otherwise, the event may be
missed and some data may remain blocked in the channel's buffer.

Because of this bug, it is possible to have a stream stuck if data are also
blocked on the opposite channel. It is for instance possible to hit the buf
with the stats applet and a client not consuming data.

This patch must slowly be backported as far as 2.2. It should partially fix
issue #2249.
2023-09-01 14:18:26 +02:00
Christopher Faulet
9e394d34e0 BUG/MINOR: stconn: Don't report blocked sends during connection establishment
The server timeout must not be handled during the connection establishment
to not superseed the connect timeout. To do so, we must not consider
outgoing data are blocked during this stage. Concretly, it means the fsb
time must not be updated during connection establishment.

It is not an issue with regular clients because the server timeout is only
defined when the connection is estalished. However, it may be an issue for the
HTTP client, when the server timeout is lower than the connect timeout. In this
case, an early 502 may be reported with no connection retries.

This patch must be backported to 2.8.
2023-09-01 14:18:26 +02:00
Christopher Faulet
3479d99d5f BUG/MEDIUM: stconn: Update stream expiration date on blocked sends
When outgoing data are blocked, we must update the stream expiration date
and requeue the task. It is important to be sure to properly handle write
timeout, expecially if the stream cannot expire on reads. This bug was
introduced when handling of channel's timeouts was refactored to be managed
by the stream-connectors.

It is an issue if there is no server timeout and the client does not consume
the response (or the opposite but it is less common). It is also possible to
trigger the same scenario with applets on server side because, most of time,
there is no server timeout.

This patch must be backported to 2.8.
2023-09-01 14:18:26 +02:00
Christopher Faulet
49ed83e948 DEBUG: applet: Properly report opposite SC expiration dates in traces
The wrong label was used in trace to report expiration dates of the opposite
SC. "sc" was used instead of "sco".

This patch should be backported to 2.8.
2023-09-01 14:18:26 +02:00
Willy Tarreau
b0031d9679 MINOR: checks: also consider the thread's queue for rebalancing
Let's also check for other threads when the current one is queueing,
let's not wait for the load to be high. Now this totally eliminates
differences between threads.
2023-09-01 14:00:04 +02:00
Willy Tarreau
844a3bc25b MEDIUM: checks: implement a queue in order to limit concurrent checks
The progressive adoption of OpenSSL 3 and its abysmal handshake
performance has started to reveal situations where it simply isn't
possible anymore to succesfully run health checks on many servers,
because between the moment all the checks are started and the moment
the handshake finally completes, the timeout has expired!

This also has consequences on production traffic which gets
significantly delayed as well, all that for lots of checks. While it's
possible to increase the check delays, it doesn't solve everything as
checks still take a huge amount of time to converge in such conditions.

Here we take a different approach by permitting to enforce the maximum
concurrent checks per thread limitation and implementing an ordered
queue. Thanks to this, if a thread about to start a check has reached
its limit, it will add the check at the end of a queue and it will be
processed once another check is finished. This proves to be extremely
efficient, with all checks completing in a reasonable amount of time
and not being disturbed by the rest of the traffic from other checks.
They're just cycling slower, but at the speed the machine can handle.

One must understand however that if some complex checks perform multiple
exchanges, they will take a check slot for all the required duration.
This is why the limit is not enforced by default.

Tests on SSL show that a limit of 5-50 checks per thread on local
servers gives excellent results already, so that could be a good starting
point.
2023-09-01 14:00:04 +02:00
Willy Tarreau
cfc0bceeb5 MEDIUM: checks: search more aggressively for another thread on overload
When the current check is overloaded (more running checks than the
configured limit), we'll try more aggressively to find another thread.
Instead of just opportunistically looking for one half as loaded, now if
the current thread has more than 1% more active checks than another one,
or has more than a configured limit of concurrent running checks, it will
search for a more suitable thread among 3 other random ones in order to
migrate the check there. The number of migrations remains very low (~1%)
and the checks load very fair across all threads (~1% as well). The new
parameter is called tune.max-checks-per-thread.
2023-09-01 08:26:06 +02:00
Willy Tarreau
016e189ea3 MINOR: check: also consider the random other thread's active checks
When checking if it's worth transferring a sleeping thread to another
random thread, let's also check if that random other thread has less
checks than the current one, which is another reason for transferring
the load there.

This commit adds a function "check_thread_cmp_load()" to compare two
threads' loads in order to simplify the decision taking.

The minimum active check count before starting to consider rebalancing
the load was now raised from 2 to 3, because tests show that at 15k
concurrent checks, at 2, 50% are evaluated for rebalancing and 30%
are rebalanced, while at 3, this is cut in half.
2023-09-01 08:26:06 +02:00
Willy Tarreau
00de9e0804 MINOR: checks: maintain counters of active checks per thread
Let's keep two check counters per thread:
  - one for "active" checks, i.e. checks that are no more sleeping
    and are assigned to the thread. These include sleeping and
    running checks ;

  - one for "running" checks, i.e. those which are currently
    executing on the thread.

By doing so, we'll be able to spread the health checks load a bit better
and refrain from sending too many at once per thread. The counters are
atomic since a migration increments the target thread's active counter.
These numbers are reported in "show activity", which allows to check
per thread and globally how many checks are currently pending and running
on the system.

Ideally, we should only consider checks in the process of establishing
a connection since that's really the expensive part (particularly with
OpenSSL 3.0). But the inner layers are really not suitable to doing
this. However knowing the number of active checks is already a good
enough hint.
2023-09-01 08:26:06 +02:00
Willy Tarreau
3b7942a1c9 MINOR: check/activity: collect some per-thread check activity stats
We now count the number of times a check was started on each thread
and the number of times a check was adopted. This helps understand
better what is observed regarding checks.
2023-09-01 08:26:06 +02:00