25362 Commits

Author SHA1 Message Date
Christopher Faulet
3023e98199 BUG/MINOR: resolvers: Restore round-robin selection on records in DNS answers
Since the commit dcb696cd3 ("MEDIUM: resolvers: hash the records before
inserting them into the tree"), When several records are found in a DNS
answer, the round robin selection over these records is no longer performed.

Indeed, before a list of records was used. To ensure each records was
selected one after the other, at each selection, the first record of the
list was moved at the end. When this list was replaced bu a tree, the same
mechanism was preserved. However, the record is indexed using its key, a
hash of the record. So its position never changes. When it is removed and
reinserted in the tree, its position remains the same. When we walk though
the tree, starting from the root, the records are always evaluated in the
same order. So, even if there are several records in a DNS answer, the same
IP address is always selected.

It is quite easy to trigger the issue with a do-resolv action.

To fix the issue, the node to perform the next selection is now saved. So
instead of restarting from the root each time, we can restart from the next
node of the previous call.

Thanks to Damien Claisse for the issue analysis and for the reproducer.

This patch should fix the issue #3116. It must be backported as far as 2.6.
2025-09-11 15:46:45 +02:00
Christopher Faulet
37abe56b18 BUG/MEDIUM: resolvers: Properly cache do-resolv resolution
As stated by the documentation, when a do-resolv resolution is performed,
the result should be cached for <hold.valid> milliseconds. However, the only
way to cache the result is to always have a requester. When the last
requester is unlink from the resolution, the resolution is released. So, for
a do-resolv resolution, it means it could only work by chance if the same
FQDN is requested enough to always have at least two streams waiting for the
resolution. And because in that case, the cached result is used, it means
the traffic must be quite high.

In fact, a good approach to fix the issue is to keep orphan resolutions to
be able cache the result and only release them after hold.valid milliseconds
after the last real resolution. The resolver's task already releases orphan
resolutions. So we only need to check the expiration date and take care to
not release the resolution when the last stream is unlink from it.

This patch should be backported to all stable versions. We can start to
backport it as far as 3.1 and then wait a bit.
2025-09-11 15:46:45 +02:00
William Lallemand
fb832e1e52 BUILD: ssl: functions defined but not used
Previous patch 50d191b ("MINOR: ssl: set functions as static when no
protypes in the .h") broke the WolfSSL function with unused functions.

This patch add __maybe_unused to ssl_sock_sctl_parse_cbk(),
ssl_sock_sctl_add_cbk() and ssl_sock_msgcbk()
2025-09-11 15:32:59 +02:00
William Lallemand
50d191b8a3 MINOR: ssl: set functions as static when no protypes in the .h
Check with -Wmissing-prototypes what should be static.

src/ssl_sock.c:1572:5: error: no previous prototype for ‘ssl_sock_sctl_add_cbk’ [-Werror=missing-prototypes]
 1572 | int ssl_sock_sctl_add_cbk(SSL *ssl, unsigned ext_type, const unsigned char **out, size_t *outlen, int *al, void *add_arg)
      |     ^~~~~~~~~~~~~~~~~~~~~
src/ssl_sock.c:1582:5: error: no previous prototype for ‘ssl_sock_sctl_parse_cbk’ [-Werror=missing-prototypes]
 1582 | int ssl_sock_sctl_parse_cbk(SSL *s, unsigned int ext_type, const unsigned char *in, size_t inlen, int *al, void *parse_arg)
      |     ^~~~~~~~~~~~~~~~~~~~~~~
src/ssl_sock.c:1604:6: error: no previous prototype for ‘ssl_sock_infocbk’ [-Werror=missing-prototypes]
 1604 | void ssl_sock_infocbk(const SSL *ssl, int where, int ret)
      |      ^~~~~~~~~~~~~~~~
src/ssl_sock.c:2107:6: error: no previous prototype for ‘ssl_sock_msgcbk’ [-Werror=missing-prototypes]
 2107 | void ssl_sock_msgcbk(int write_p, int version, int content_type, const void *buf, size_t len, SSL *ssl, void *arg)
      |      ^~~~~~~~~~~~~~~
src/ssl_sock.c:3936:5: error: no previous prototype for ‘sh_ssl_sess_new_cb’ [-Werror=missing-prototypes]
 3936 | int sh_ssl_sess_new_cb(SSL *ssl, SSL_SESSION *sess)
      |     ^~~~~~~~~~~~~~~~~~
src/ssl_sock.c:3990:14: error: no previous prototype for ‘sh_ssl_sess_get_cb’ [-Werror=missing-prototypes]
 3990 | SSL_SESSION *sh_ssl_sess_get_cb(SSL *ssl, __OPENSSL_110_CONST__ unsigned char *key, int key_len, int *do_copy)
      |              ^~~~~~~~~~~~~~~~~~
src/ssl_sock.c:4043:6: error: no previous prototype for ‘sh_ssl_sess_remove_cb’ [-Werror=missing-prototypes]
 4043 | void sh_ssl_sess_remove_cb(SSL_CTX *ctx, SSL_SESSION *sess)
      |      ^~~~~~~~~~~~~~~~~~~~~
src/ssl_sock.c:4075:6: error: no previous prototype for ‘ssl_set_shctx’ [-Werror=missing-prototypes]
 4075 | void ssl_set_shctx(SSL_CTX *ctx)
      |      ^~~~~~~~~~~~~
src/ssl_sock.c:4103:6: error: no previous prototype for ‘SSL_CTX_keylog’ [-Werror=missing-prototypes]
 4103 | void SSL_CTX_keylog(const SSL *ssl, const char *line)
      |      ^~~~~~~~~~~~~~
src/ssl_sock.c:5167:6: error: no previous prototype for ‘ssl_sock_deinit’ [-Werror=missing-prototypes]
 5167 | void ssl_sock_deinit()
      |      ^~~~~~~~~~~~~~~
src/ssl_sock.c:6976:6: error: no previous prototype for ‘ssl_sock_close’ [-Werror=missing-prototypes]
 6976 | void ssl_sock_close(struct connection *conn, void *xprt_ctx) {
      |      ^~~~~~~~~~~~~~
src/ssl_sock.c:7846:17: error: no previous prototype for ‘ssl_action_wait_for_hs’ [-Werror=missing-prototypes]
 7846 | enum act_return ssl_action_wait_for_hs(struct act_rule *rule, struct proxy *px,
      |                 ^~~~~~~~~~~~~~~~~~~~~~
2025-09-11 15:23:59 +02:00
William Lallemand
19daee6549 MINOR: ocsp: put internal functions as static ones
-Wmissing-prototypes let us check which functions can be made static and
is not used elsewhere.

rc/ssl_ocsp.c:1079:5: error: no previous prototype for ‘ssl_ocsp_update_insert_after_error’ [-Werror=missing-prototypes]
 1079 | int ssl_ocsp_update_insert_after_error(struct certificate_ocsp *ocsp)
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ocsp.c:1116:6: error: no previous prototype for ‘ocsp_update_response_stline_cb’ [-Werror=missing-prototypes]
 1116 | void ocsp_update_response_stline_cb(struct httpclient *hc)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ocsp.c:1127:6: error: no previous prototype for ‘ocsp_update_response_headers_cb’ [-Werror=missing-prototypes]
 1127 | void ocsp_update_response_headers_cb(struct httpclient *hc)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ocsp.c:1138:6: error: no previous prototype for ‘ocsp_update_response_body_cb’ [-Werror=missing-prototypes]
 1138 | void ocsp_update_response_body_cb(struct httpclient *hc)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ocsp.c:1149:6: error: no previous prototype for ‘ocsp_update_response_end_cb’ [-Werror=missing-prototypes]
 1149 | void ocsp_update_response_end_cb(struct httpclient *hc)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ocsp.c:2095:5: error: no previous prototype for ‘ocsp_update_postparser_init’ [-Werror=missing-prototypes]
 2095 | int ocsp_update_postparser_init()
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-09-11 15:18:48 +02:00
William Lallemand
0224d60de6 BUG/MINOR: ocsp: prototype inconsistency
Inconsistencies between the .h and the .c can't be catched because the
.h is not included in the .c.

ocsp_update_init() does not have the right prototype and lacks a const
attribute.

Must be backported in all previous stable versions.
2025-09-11 15:18:10 +02:00
Remi Tricot-Le Breton
e0844a305c BUG/MINOR: ssl: Fix potential NULL deref in trace callback
'conn' might be NULL in the trace callback so the calls to
conn_err_code_str must be covered by a proper check.

This issue was found by Coverity and raised in GitHub #3112.
The patch must be backported to 3.2.
2025-09-11 14:31:32 +02:00
Remi Tricot-Le Breton
a316342ec6 BUG/MINOR: ssl: Potential NULL deref in trace macro
'ctx' might be NULL when we exit 'ssl_sock_handshake', it can't be
dereferenced without check in the trace macro.

This was found by Coverity andraised in GitHub #3113.
This patch should be backported up to 3.2.
2025-09-11 14:31:32 +02:00
William Lallemand
e52e6f66ac BUG/MEDIUM: jws: return size_t in JWS functions
JWS functions are supposed to return 0 upon error or when nothing was
produced. This was done in order to put easily the return value in
trash->data without having to check the return value.

However functions like a2base64url() or snprintf() could return a
negative value, which would be casted in a unsigned int if this happen.

This patch add checks on the JWS functions to ensure that no negative
value can be returned, and change the prototype from int to size_t.

This is also related to issue #3114.

Must be backported to 3.2.
2025-09-11 14:31:32 +02:00
William Lallemand
66a7ebfeef BUG/MINOR: acme: null pointer dereference upon allocation failure
Reported in issue #3115:

     	11. var_compare_op: Comparing task to null implies that task might be null.
681                if (!task) {
682                        ret++;
683                        ha_alert("acme: couldn't start the scheduler!\n");
684                }
CID 1609721: (#1 of 1): Dereference after null check (FORWARD_NULL)
12. var_deref_op: Dereferencing null pointer task.
685                task->nice = 0;
686                task->process = acme_scheduler;
687
688                task_wakeup(task, TASK_WOKEN_INIT);
689        }
690

Task would be dereferenced upon allocation failure instead of falling
back to the end of the function after the error.

Should be backported in 3.2.
2025-09-11 14:31:32 +02:00
Amaury Denoyelle
c15129f7dc DOC: quic: clarifies limited-quic support
This patch extends the documentation for "limited-quic" global keyword.
It mentions first that it relies on USE_QUIC_OPENSSL_COMPAT=1 build
option.

Compatibility with TLS libraries is now clearly exposed. In particular,
it highlights the fact that it is mostly targetted at OpenSSL version
prior to 3.5.2, and that it should be disabled if a recent OpenSSL
release is available. It also states that limited-quic does nothing if
USE_QUIC_OPENSSL_COMPAT is not set during compilation.
2025-09-11 10:11:12 +02:00
Amaury Denoyelle
d293cc62dc MINOR: quic: display build warning for compat layer on recent OpenSSL
Build option USE_QUIC_OPENSSL_COMPAT=1 must be set to activate QUIC
support for OpenSSL prior to version 3.5.2. This compiles an internal
compatibility layer, which must be then activated at runtime with global
option limited-quic.

Starting from OpenSSL version 3.5.2, a proper QUIC TLS API is now
exposed. Thus, the compatibility layer is unneeded. However it can still
be compiled against newer OpenSSL releases and activated at runtime,
mostly for test purpose.

As this compatibility layer has some limitations, (no support for QUIC
0-RTT), it's important that users notice this situation and disable it
if possible. Thus, this patch adds a notice warning when
USE_QUIC_OPENSSL_COMPAT=1 is set when building against OpenSSL 3.5.2 and
above. This should be sufficient for users and packagers to understand
that this option is not necessary anymore.

Note that USE_QUIC_OPENSSL_COMPAT=1 is incompatible with others TLS
library which exposed a QUIC API based on original BoringSSL patches
set. A build error will prevent the compatibility layer to be built.
limited-quic option is thus silently ignored.
2025-09-11 10:11:12 +02:00
Frederic Lecaille
5027ba36a9 MINOR: quic-be: make SSL/QUIC objects use their own indexes (ssl_qc_app_data_index)
This index is used to retrieve the quic_conn object from its SSL object, the same
way the connection is retrieved from its SSL object for SSL/TCP connections.

This patch implements two helper functions to avoid the ugly code with such blocks:

   #ifdef USE_QUIC
   else if (qc) { .. }
   #endif

Implement ssl_sock_get_listener() to return the listener from an SSL object.
Implement ssl_sock_get_conn() to return the connection from an SSL object
and optionally a pointer to the ssl_sock_ctx struct attached to the connections
or the quic_conns.

Use this functions where applicable:
   - ssl_tlsext_ticket_key_cb() calls ssl_sock_get_listener()
   - ssl_sock_infocbk() calls ssl_sock_get_conn()
   - ssl_sock_msgcbk() calls ssl_sock_get_ssl_conn()
   - ssl_sess_new_srv_cb() calls ssl_sock_get_conn()
   - ssl_sock_srv_verifycbk() calls ssl_sock_get_conn()

Also modify qc_ssl_sess_init() to initialize the ssl_qc_app_data_index index for
the QUIC backends.
2025-09-11 09:51:28 +02:00
Frederic Lecaille
47bb15ca84 MINOR: quic: get rid of ->target quic_conn struct member
The ->li (struct listener *) member of quic_conn struct was replaced by a
->target (struct obj_type *) member by this commit:

    MINOR: quic-be: get rid of ->li quic_conn member

to abstract the connection type (front or back) when implementing QUIC for the
backends. In these cases, ->target was a pointer to the ojb_type of a server
struct. This could not work with the dynamic servers contrary to the listeners
which are not dynamic.

This patch almost reverts the one mentioned above. ->target pointer to obj_type member
is replaced by ->li pointer to listener struct member. As the listener are not
dynamic, this is easy to do this. All one has to do is to replace the
objt_listener(qc->target) statement by qc->li where applicable.

For the backend connection, when needed, this is always qc->conn->target which is
used only when qc->conn is initialized. The only "problematic" case is for
quic_dgram_parse() which takes a pointer to an obj_type as third argument.
But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function
it is used to access the proxy counters of the connection thanks to qc_counters().
So, this obj_type argument may be null for now on with this patch. This is the
reason why qc_counters() is modified to take this into consideration.
2025-09-11 09:51:28 +02:00
Christopher Faulet
5354c24c76 BUG/MAJOR: stream: Force channel analysis on successful synchronous send
This patchs reverts commit a498e527b ("BUG/MAJOR: stream: Remove READ/WRITE
events on channels after analysers eval") because of a regression. It was an
attempt to properly detect synchronous sends, even when the stream was woken
up on a write event. However, the fix was wrong because it could mask
shutdowns performed during process_stream() and block the stream.

Indeed, when a shutdown is performed, because an error occurred for
instance, a write event is reported. The commit above could mask this event
while the shutdown prevent any synchronous sends. In such case, the stream
could remain blocked infinitly because an I/O event was missed.

So to properly fix the original issue (#3070), the write event must not be
masked before a synchronous send. Instead, we now force the channel analysis
by setting explicitly CF_WAKE_ONCE flags on the corresponding channel if a
write event is reported after the synchronous send. CF_WRITE_EVENT flag is
remove explicitly just before, so it is quite easy to detect.

This patch must be backport to all stable version in same time of the commit
above.
2025-09-11 09:47:47 +02:00
Willy Tarreau
ded2110ec6 MEDIUM: peers: move process_peer_sync() to a single thread
The remaining half of the task_queue() and task_wakeup() contention
is caused by this function when peers are in use, because just like
process_table_expire(), it's created using task_new_anywhere() and
is woken up for local updates. Let's turn it to single thread by
rotating the assigned threads during initialization so that a table
only runs on one thread at a time.

Here we go backwards to assign the threads, so that on small setups
they don't end up on the same CPUs as the ones used by the stick-tables.
This way this will make an even better use of large machines. The
performance remains the same as with previous patch, even slightly
better (1-3% on avg).

At this point there's almost no multi-threaded task activity anymore
(only srv_cleanup_idle_server once in a while). This should improve
the situation described by Felipe in issues #3084 and #3101.

This should be backported to 3.2 after some extended checks.
2025-09-10 19:14:05 +02:00
Willy Tarreau
e05afda249 MEDIUM: stick-table: move process_table_expire() to a single thread
A big deal of the task_queue() contention is caused by this function
because it's created using task_new_anywhere() and is subject to
heavy updates. Let's turn it to single thread by rotating the assigned
threads during initialization so that a table only runs on one thread
at a time.

However there's a trick: the function used to call task_queue() to
requeue the task if it had advanced its timer (may only happen when
learning an entry from a peer). We can't do that anymore since we can't
queue another thread's task. Thus instead of the task needs to be
scheduled earlier than previously planned, we simply perform a wakeup.
It will likely do nothing and will self-adjust its next wakeup timer.

Doing so halves the number of multi-thread task wakeups. In addition
the request rate at saturation increased by 12% with 16 peers and 40
tables on a 16 8-thread processes. This should improve the situation
described by Felipe in issues #3084 and #3101.

This should be backported to 3.2 after some extended checks.
2025-09-10 19:13:33 +02:00
Willy Tarreau
2831cb104f BUG/MINOR: stick-table: make sure never to miss a process_table_expire update
In stktable_requeue_exp(), there's a tiny race at the beginning during
which we check the task's expiration date to decide whether or not to
wake process_table_expire() up. During this race, the task might just
have finished running on its owner thread and we can miss a task_queue()
opportunity, which probably explains why during testing it seldom happens
that a few entries are left at the end.

Let's perform a CAS to confirm the value is still the same before
leaving. This way we're certain that our value has been seen at least
once.

This should be backported to 3.2.
2025-09-10 18:45:01 +02:00
Willy Tarreau
2ce5e0edcc MEDIUM: resolvers: make the process_resolvers() task single-threaded
This task is sometimes caught triggering the watchdog while waiting for
the infamous resolvers lock, or the scheduler's wait queue lock in
task_queue(). Both are caused by its multi-threaded capability. The
task may indeed start on a thread that's different from the one that
is currently receiving a response and that holds the resolvers lock,
and when being queued back, it requires to lock the wait queue. Both
problems disappear when sticking it to a single thread. But for configs
running multiple resolvers sections, it would be suboptimal to run them
all on the same thread. In order to avoid this, we implement a counter
in the resolvers_finalize_config() section that rotates the thread for
each resolvers section.

This was sufficient to further improve the performance here, making the
CPU usage drop to about 7% (from 11 previously or 38 initially) and not
showing any resolvers lock contention anymore in perf top output.

The change was kept fairly minimal to permit a backport once enough
testing is conducted on it. It could address a significant part of
the trouble reported by Felipe in GH issue #3101.
2025-09-10 16:51:14 +02:00
Willy Tarreau
d624aceaef MEDIUM: dns: bind the nameserver sockets to the initiating thread
There's still a big architectural limitation in the dns/resolvers code
regarding threads: resolvers run as a task that is scheduled to run
anywhere, and each NS dgram socket is bound to any thread of the same
thread group as the initiating thread. This becomes a big problem when
dealing with multiple nameservers because responses arrive on any thread,
start by locking the resolvers section, and other threads dealing with
responses are just stuck waiting for the lock to disappear. This means
that most of the time is exclusively spent causing contention. The
process_resolvers() function also also suffers from this contention
but apparently less often.

It turns out that the nameserver sockets are created during emission
of the first packet, triggered from the resolvers task. The present
patch exploits this to stick all sockets to the calling thread instead
of any thread. This way there is no longer any contention between
multiple nameservers of a same resolvers section. Tests with a section
having 10 name servers showed that the CPU usage dropped from 38 to
about 10%, or almost by a factor of 4.

Note that TCP resolvers do not offer this possibility because the
tasks that manage the applets are created earlier to run anywhere
during config parsing. This might possibly be refined later, e.g.
by changing the task's affinity when it first runs.

The change was kept fairly minimal to permit a backport once enough
testing is conducted on it. It could address a significant part of
the trouble reported by Felipe in GH issue #3101.
2025-09-10 16:48:09 +02:00
Olivier Houchard
07c10ec2f1 BUG/MEDIUM: ssl: Fix a crash if we failed to create the mux
In ssl_sock_io_cb(), if we failed to create the mux, we may have
destroyed the connection, so only attempt to access it to get the ALPN
if conn_create_mux() was successful.
This fixes crashes that may happen when using ssl.
2025-09-10 12:02:53 +02:00
Olivier Houchard
1759c97255 BUG/MEDIUM: ssl: Fix a crash when using QUIC
Commit 5ab9954faa9c815425fa39171ad33e75f4f7d56f introduced a new flag in
ssl_sock_ctx, to know that an ALPN was negociated, however, the way to
get the ssl_sock_ctx was wrong for QUIC. If we're using QUIC, get it
from the quic_conn.
This should fix crashes when attempting to use QUIC.
2025-09-10 11:45:03 +02:00
Willy Tarreau
be86a69fe8 DEBUG: stick-tables: export stktable_add_pend_updates() for better reporting
This function is a tasklet handler used to send peers updates, and it can
happen quite a bit in "show tasks" and "show profiling tasks", so let's
export it so that we don't face a cryptic symbol name:

  $ socat - /tmp/haproxy-n10.stat <<< "show tasks"
  Running tasks: 43 (8 threads)
    function                     places     %    lat_tot   lat_avg  calls_tot  calls_avg calls%
    process_table_expire             16   37.2   1.072m    4.021s      115831       7239   15.4
    task_process_applet              15   34.8   1.072m    4.287s      486299      32419   65.0
    stktable_add_pend_updates         8   18.6      -         -         89725      11215   12.0
    sc_conn_io_cb                     3    6.9      -         -          5007       1669    0.6
    process_peer_sync                 1    2.3   4.293s    4.293s       50765      50765    6.7

This should be backported to 3.2 as it participates to debugging the
table+peers processing overhead.
2025-09-10 11:34:51 +02:00
Willy Tarreau
993c09438b BUG/MEDIUM: stick-tables: don't loop on non-expirable entries
The stick-table expiration of ref-counted entries was insufficiently
addresse by commit 324f0a60ab ("BUG/MINOR: stick-tables: never leave
used entries without expiration"), because now entries are just requeued
where they were, so they're visited over and over for long sessions,
causing process_table_expire() to loop, eating CPU and causing lock
contention.

Here we take care of refreshing their timeer when they are met, so
that we don't meet them more than once per stick-table lifetime. It
should address at least a part of the recent degradation that Felipe
noticed in GH #3084.

Since the fix above was marked for backporting to 3.2, this one should
be backported there as well.
2025-09-10 11:27:27 +02:00
Willy Tarreau
997d217dee MINOR: tools: don't emit "+0" for symbol names which exactly match known ones
resolve_sym_name() knows a number of symbols, but when one exactly matches
(e.g. a task's handler), it systematically displays the offset behind it
("+0"). Let's only show the offset when non-zero. This can be backported
as this is helpful for debugging.
2025-09-10 10:44:33 +02:00
Willy Tarreau
9eb35563a6 MINOR: activity: indicate the number of calls on "show tasks"
The "show tasks" command can be useful to inspect run queues for active
tasks, but currently it's difficult to distinguish an occasional running
task from a heavily active one. Let's collect the number of calls for
each of them, report them average on the number of instances of each task
as well as a percentage of the total used. This way it even becomes
possible to get a hint about how CPU usage is distributed.
2025-09-10 10:44:33 +02:00
Willy Tarreau
17d3392348 BUG/MINOR: activity: fix reporting of task latency
In 2.4, "show tasks" was introduced by commit 7eff06e162 ("MINOR:
activity: add a new "show tasks" command to list currently active tasks")
to expose some info about running tasks. The latency is not correct
because it's a u32 subtracted from a u64. It ought to have been casted
to u32 for the operation, which is what this patch does.

This can be backported to 2.4.
2025-09-10 10:44:33 +02:00
Willy Tarreau
bdff394195 BUILD: ssl: address a recent build warning when QUIC is enabled
Since commit 5ab9954faa ("MINOR: ssl: Add a flag to let it known we have
an ALPN negociated"), when building with QUIC we get this warning:

  src/ssl_sock.c: In function 'ssl_sock_advertise_alpn_protos':
  src/ssl_sock.c:2189:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]

Let's just move the instructions after the optional declaration. No
backport is needed.
2025-09-10 10:44:33 +02:00
Olivier Houchard
d4c51a4f57 MEDIUM: server: Make use of the stored ALPN stored in the server
Now that which ALPN gets negociated for a given server, use that to
decide if we can create the mux right away in connect_server(), and use
it in conn_install_mux_be().
That way, we may create the mux soon enough for early data to be sent,
before the handshake has been completed.
This commit depends on several previous commits, and it has not been
deemed important enough to backport.
2025-09-09 19:01:24 +02:00
Willy Tarreau
6a2b3269f9 CLEANUP: backend: clarify the cases where we want to use early data
The conditions to use early data on output are super tricky and
detected later, so that it's difficult to figure how this works. This
patch splits the condition in two parts, the one that can be performed
early that is based on config/client/etc. It is used to clear a variable
that allows early data to be used in case any condition is not satisfied.
It was purposely split into multiple independent and reviewable tests.

The second part remains where it was at the end, and is used to temporarily
clear the handshake flags to let the data layer use early data. This one
being tricky, a large comment explaining the principle was added.

The logic was not changed at all, only the code was made more readable.
2025-09-09 19:01:24 +02:00
Willy Tarreau
9b9d0720e1 CLEANUP: backend: simplify the complex ifdef related to 0RTT in connect_server()
Since 3.0 we have HAVE_SSL_0RTT precisely to avoid checking horribly
complicated and unmaintainable conditions to detect support for 0RTT.
Let's just drop the complex condition and use the macro instead.
2025-09-09 19:01:24 +02:00
Willy Tarreau
4aaf0bfbce CLEANUP: backend: invert the condition to start the mux in connect_server()
Instead of trying to switch from delayed start to instant start based
on a single condition, let's do the opposite and preset the condition
to instant start and detect what could cause it to be delayed, thus
falling back to the slow mode. The condition remains exactly the
inverted one and better matches the comment about ALPN being the only
cause of such a delay.
2025-09-09 19:01:24 +02:00
Willy Tarreau
7b4a7f92b5 CLEANUP: backend: clarify the role of the init_mux variable in connect_server()
The init_mux variable is currently used in a way that's not super easy
to grasp. It's set a bit too late and requires to know a lot of info at
once. Let's first rename it to "may_start_mux_now" to clarify its role,
as the purpose is not to *force* the mux to be initialized now but to
permit it to do it.
2025-09-09 19:01:24 +02:00
Olivier Houchard
ff47ae60f3 MEDIUM: server: Introduce the concept of path parameters
Add a new field in struct server, path parameters. It will contain
connection informations for the server that are not expected to change.
For now, just store the ALPN negociated with the server. Each time an
handhskae is done, we'll update it, even though it is not supposed to
change. This will be useful when trying to send early data, that way
we'll know which mux to use.
Each time the server goes down or is disabled, those informations are
erased, as we can't be sure those parameters will be the same once the
server will be back up.
2025-09-09 19:01:24 +02:00
Olivier Houchard
9d65f5cd4d MINOR: ssl: Use the new flag to know when the ALPN has been set.
How that we have a flag to let us know the ALPN has been set, we no
longer have to call ssl_sock_get_alpn() to know if the alpn has been
negociated already.
Remove the call to conn_create_mux() from ssl_sock_handshake(), and just
reuse the one already present in ssl_sock_io_cb() if we have received
early data, and if the flag is set.
2025-09-09 19:01:24 +02:00
Olivier Houchard
5ab9954faa MINOR: ssl: Add a flag to let it known we have an ALPN negociated
Add a new flag to the ssl_sock_ctx, to be set as soon as the ALPN has
been negociated.
This happens before the handshake has been completed, and that
information will let us know that, when we receive early data, if the
ALPN has been negociated, then we can immediately create a mux, as the
ALPN will tell us which mux to use.
2025-09-09 19:01:24 +02:00
Olivier Houchard
6b78af837d BUG/MEDIUM: ssl: create the mux immediately on early data
If we received early data, and an ALPN has been negociated, then
immediately try to create a mux if we did not have one already.
Generally, at this point we would not have one, as the mux is decided by
the ALPN, however at this point, even if the handshake is not done yet,
we have enough to determine the ALPN, so we can immediately create the
mux.
Doing so makes up able to treat the request immediately, without waiting
for the handshake to be done.

This should be backported up to 2.8.
2025-09-09 19:01:24 +02:00
Olivier Houchard
aa25ddb773 BUG/MEDIUM: h1: Allow reception if we have early data
In h1_recv_allowed(), do not forbid the reception if we are yet to
complete the connection, if we have received early data on it. That way,
we can deal with them right away, instead of waiting for the handshake
to be done.

This should be backported up to 2.8.
2025-09-09 19:01:24 +02:00
Willy Tarreau
d7696d11e1 MEDIUM: peers: don't even try to process updates under contention
Recent fix 2421c3769a ("BUG/MEDIUM: peers: don't fail twice to grab the
update lock") improved the situation a lot for peers under locking
contention but still not enough for situations with many peers and
many entries to expire fast. It's indeed still possible to trigger
warnings at end of injection sessions for 16 peers at 100k req/s each
doing 10 random track-sc when process_table_expire() runs and holds the
update lock if compiled with a high value of STKTABLE_MAX_UPDATES_AT_ONCE
(1000). Better just not insist in this case and postpone the update.

At this point, under load only ebmb_lookup() consumes CPU, other functions
are in the few percent, indicating reasonable contention, and peers remain
updated.

This should be backported to 3.2 after a bit of testing.
2025-09-09 17:56:37 +02:00
Willy Tarreau
d5e7fba5c0 MEDIUM: stick-tables: don't wait indefinitely in stktable_add_pend_updates()
This one doesn't need to wait forever, if it cannot work it can postpone
it. When building with a high value of STKTABLE_MAX_UPDATES_AT_ONCE (1000),
it's still possible to trigger warnings in this function on the write lock
that is contended by peers and expiration. Changing it for a trylock resolves
the issue.

This should be backported to 3.2 after a bit of testing.
2025-09-09 17:56:37 +02:00
Willy Tarreau
a771b14541 MEDIUM: stick-tables: give up on lock contention in process_table_expire()
process_table_expire() can take quite a lot of time running over all
shards. During this time it will hinder track-sc rules and peers, which
will experience an increased latency to do their work, especially peers
where each message will cause a lock, whose cumulated time can exceed
the watchdog's patience.

Here, we proceed just like in stktable_trash_oldest(), which is that
we're using a trylock to detect contention. The first time it happens,
if we hadn't purged anything, we switch to a regular lock to perform
the operation, and next time it happens we abort. This guarantees that
some entries will be expired and that contention will be reduced with
when detected.

With this change, various tests didn't manage to produce any warning,
including at the end of the load generation session.

This should be backported to 3.2 after a bit more testing.
2025-09-09 17:56:37 +02:00
Willy Tarreau
f87cf8b76e MEDIUM: stick-tables: relax stktable_trash_oldest() to only purge what is needed
stktable_trash_oldest() does insist a lot on purging what was requested,
only limited by STKTABLE_MAX_UPDATES_AT_ONCE. This is called in two
conditions, one to allocate a new stksess, and the other one to purge
entries of a stopping process. The cost of iterating over all shards
is huge, and a shard lock is taken each time before looking up entries.

Moreover, multiple threads can end up doing the same and looking hard for
many entries to purge when only one is needed. Furthermore, all threads
start from the same shard, hence synchronize their locks. All of this
costs a lot to other operations such as access from peers.

This commit simplifies the approach by ignoring the budget, starting
from a random shard number, and using a trylock so as to be able to
give up early in case of contention. The approach chosen here consists
in trying hard to flush at least one entry, but once at least one is
evicted or at least one trylock failed, then a failure on the trylock
will result in finishing.

The function now returns a success as long as one entry was freed.

With this, tests no longer show watchdog warnings during tests, though
a few still remain when stopping the tests (which are not related to
this function but to the contention from process_table_expire()).

With this change, under high contention some entries' purge might be
postponed and the table may occasionally contain slightly more entries
than their size (though this already happens since stksess_new() first
increments ->current before decrementing it).

Measures were made on a 64-core system with 8 peers
of 16 threads each, at CPU saturation (350k req/s each doing 10
track-sc) for 10M req, with 3 different approaches:

  - this one resulted in 1500 failures to find an entry (0.015%
    size overhead), with the lowest contention and the fairest
    peers distibution.

  - leaving only after a success resulted in 229 failures (0.0029%
    size overhead) but doubled the time spent in the function (on
    the write lock precisely).

  - leaving only when both a success and a failed lock were met
    resulted in 31 failures (0.00031% overhead) but the contention
    was high enough again so that peers were not all up to date.

Considering that a saturated machine might exceed its entries by
0.015% is pretty minimal, the mechanism is kept.

This should be backported to 3.2 after a bit more testing as it
resolves some watchdog warnings and panics. It requires precedent
commit "MINOR: stick-table: permit stksess_new() to temporarily
allocate more entries" to over-allocate instead of failing in case
of contention.
2025-09-09 17:56:37 +02:00
Willy Tarreau
b119280f60 MINOR: stick-table: permit stksess_new() to temporarily allocate more entries
stksess_new() calls stktable_trash_oldest() to release some entries.
If it fails however, it will fail to allocate an entry. This is a problem
because it doesn't permit stktable_trash_oldest() to be used in best effort
mode, which forces it to impose high contention. There's no problem with
allocating slightly more in practice. In the worst case if all entries are
in use, it's not shocking to temporarily exceed the number of entries by a
few units.

Let's relax this problematic rule. This patch might need to be backported
to 3.2 after a bit more testing in order to support locking relaxation.
2025-09-09 17:56:37 +02:00
Willy Tarreau
0f33a55171 DEBUG: peers: export functions that use locks
The following functions take locks and are often involved in warnings
but are currently not resolved, so let's export them so that they are
properly decoded:

  peer_prepare_updatemsg(), peer_send_teachmsgs(),
  peer_treat_updatemsg(), peer_send_msgs(), peer_io_handler()

This should be backported to 3.2.
2025-09-09 17:56:14 +02:00
Willy Tarreau
25195ba1e7 MINOR: debug: report the time since last wakeup and call
When task profiling is enabled, the current thread knows when the
currently running task was woken up and called, so we can calculate
how long ago it was woken up and called. This is convenient to figure
whether or not a warning or panic is caused by this task or by a
previous one, so let's report this info in thread outputs when known.

It would be useful to backport this to 3.2.
2025-09-09 17:56:14 +02:00
Willy Tarreau
12bc4f9c44 MINOR: debug: report the number of loops and ctxsw for each thread
When multiple similar warnings are emitted, it can be difficult to know
whether only one task is looping slowly or if many are sharing the CPU.
Let's report the number of context switches and polling loop turns in
thread dumps so that warnings are easier to understand.

This should be backported to 3.2.
2025-09-09 17:56:14 +02:00
Willy Tarreau
c3f94fbd9b DEBUG: stream: count the number of passes in the connect loop
Normally the connect loop cannot loop, but some recent traces can easily
convince one of the opposite. Let's add a counter, including in panic
dumps, in order to avoid the repeated long head scratching sessions
starting with "and what if...". In addition, if it's found to loop, this
time it will be certain and will indicate what to zoom in. This should
be backported to 3.2.
2025-09-09 17:56:14 +02:00
Willy Tarreau
8153cf1e51 MINOR: debug: report the process id in warnings and panics
Warning and panic messages currently do not report the PID. This is
annoying when trying to reproduce problems because warnings do not
allow know which process to attach to in order to debug, and panics
do not permit to know which core dump corresponds to which dump.
Let's add them in both messages. This should probably be backported
at least to 3.2.
2025-09-09 17:56:14 +02:00
Amaury Denoyelle
0678d0a69b MINOR: check: reject invalid check config on a QUIC server
QUIC is now supported on the backend side. The previous commit ensures
that simple checks can be activated on QUIC servers without any issue.

The current patch ensures that check server settings remain compatible
with a QUIC server. Thus, configuration is now invalid if check
specifies an explicit MUX proto other than QUIC, disables SSL or try to
use PROXY protocol.
2025-09-09 16:55:09 +02:00
Amaury Denoyelle
cd3027a7ee BUG/MINOR: check: ensure checks are compatible with QUIC servers
Previously, checks were only performed on TCP. However, QUIC is now
supported on backend. Prior to this patch, check activation for QUIC
servers would result in a crash.

To ensure compatibility between QUIC servers and checks, adjust
protocol_lookup() performed during check connect step. Instead of using
a hardcoded PROTO_TYPE_STREAM, the value is now derived from server
settings.

This does not need to be backported.
2025-09-09 16:55:09 +02:00