25591 Commits

Author SHA1 Message Date
Willy Tarreau
67603162c1 DOC: config: clarify some known limitations of the json_query() converter
Oula Kivalo reported that different JSON libraries may process duplicate
keys differently and that most JSON libraries usually decode the stream
before extracting keys, while the current mjson implementation decodes the
contents during extraction instead. Let's document this point so that
users are aware of the limitations and do not rely on the current behavior
and do not use it for what it's not made for (e.g. content sanitization).

This is also the case for jwt_header_query(), jwt_payload_query() and
jwt_verify(), which already refer to this converter for specificities.
2025-10-02 08:57:39 +02:00
Olivier Houchard
b71bb6c2ae BUG/MEDIUM: fwlc: Handle memory allocation failures.
Properly handle memory allocation failures, by checking the return value
for pool_alloc(), and if it fails, make sure that the caller will take
it into account.
The only use of pool_alloc() in fwlc is to allocate the tree elements in
order to properly queue the server into the ebtree, so if that
allocation fails, just schedule the requeue tasklet, that will try
again, until it hopefully eventually succeeds.

This should be backported to 3.2.
This should fix github issue #3143.
2025-10-01 18:13:33 +02:00
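The fallback described in b71bb6c2ae can be pictured with a minimal sketch. All names below (`struct srv`, `try_queue_server()`, `requeue_pending`, the `alloc_fn` hook) are hypothetical stand-ins for pool_alloc() and the requeue tasklet; this is not the fwlc code, only an illustration of "on allocation failure, ask for a later retry".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not HAProxy's types. */
struct tree_elt { int weight; };

struct srv {
    struct tree_elt *tree_elt;  /* NULL while the server is not queued */
    int requeue_pending;        /* set when a later retry is needed */
};

/* Allocator hook so that a failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

/* Try to queue the server. Returns 1 on success, 0 if the allocation
 * failed, in which case a retry is requested instead of giving up.
 */
int try_queue_server(struct srv *s, alloc_fn alloc)
{
    assert(s != NULL && alloc != NULL);
    if (!s->tree_elt) {
        s->tree_elt = alloc(sizeof(*s->tree_elt));
        if (!s->tree_elt) {
            s->requeue_pending = 1;  /* the "tasklet" will try again */
            return 0;
        }
        s->tree_elt->weight = 0;
    }
    s->requeue_pending = 0;
    return 1;
}

/* Allocator that always fails, to exercise the error path. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

Calling try_queue_server() again with a working allocator models the tasklet running later and eventually succeeding.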
Olivier Houchard
f4a9c6ffae MEDIUM: fwlc: Make it so fwlc_srv_reposition works with unqueued srv
Modify fwlc_srv_reposition() so that it does not assume that the server
was already queued, and so make it so it works even if s->tree_elt is
NULL.
While the server will usually be queued, there is an unlikely
possibility that when the server attempted to get queued when it got up,
it failed due to a memory allocation failure, and it just expects the
server_requeue tasklet to run to take care of that later.

This should be backported to 3.2.
This is part of an attempt to fix github issue #3143
2025-10-01 18:13:33 +02:00
Olivier Houchard
822ee90dc2 MEDIUM: servers: Schedule the server requeue target on creation
On creation, schedule the server requeue once it's been created.
It is possible that when the server went up, it tried to queue itself
into the lb specific code, failed to do so, and expects the tasklet to
run to take care of that.

This should be backported to 3.2.
This is part of an attempt to fix github issue #3143.
2025-10-01 18:13:33 +02:00
Willy Tarreau
7ea80cc5b6 MEDIUM: ssl: don't always process pending handshakes on closed connections
If a client aborts a pending SSL connection for whatever reason (timeout
etc) and the listen queue is large, it may inflict a severe load to a
frontend which will spend the CPU creating new sessions then killing the
connection. This is similar to HTTP requests aborted just after being
sent, except that asymmetric crypto is way more expensive.

Unfortunately "option abortonclose" has no effect on this, because it
only applies at a higher level.

This patch ensures that handshakes being received on a frontend having
"option abortonclose" set will be checked for a pending close, and if
this is the case, then the connection will be aborted before the heavy
calculations. The principle is to use recv(MSG_PEEK) to detect the end,
and to destroy the pending handshake data before returning to the SSL
library so that it cannot start computing, notices the error and stops.
We don't do it without abortonclose though, because this can be used for
health checks from other haproxy nodes or even other components which
just want to see a handshake succeed.

This is in relation with GH issue #3124.
2025-10-01 10:23:04 +02:00
Willy Tarreau
1afaa7b59d MINOR: rawsock: introduce CO_RFL_TRY_HARDER to detect closures on complete reads
Normally, when reading a full buffer, or exactly the requested size, it
is not really possible to know if the peer had closed immediately after,
and usually we don't care. There's a problematic case, though, which is
with SSL: the SSL layer reads in small chunks of a few bytes, and can
consume a client_hello this way, then start computation without knowing
yet that the client has aborted. In order to permit knowing more, we now
introduce a new read flag, CO_RFL_TRY_HARDER, which says that if we've
read up to the permitted limit and the flag is set, then we attempt one
extra byte using MSG_PEEK to detect whether the connection was closed
immediately after that content or not. The first use case will obviously
be related to SSL and client_hello, but it might possibly also make sense
on HTTP responses to detect a pending FIN at the end of a response (e.g.
if a close was already advertised).
2025-10-01 10:23:01 +02:00
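The principle behind CO_RFL_TRY_HARDER can be sketched with plain sockets. The helper names `peer_closed_after_read()` and `read_try_harder()` are hypothetical and the code does not reflect the rawsock implementation; it only shows the one-byte MSG_PEEK probe issued after a read that filled the requested size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Peek one byte without consuming it. Returns 1 if the peer has closed,
 * 0 otherwise (more data pending, nothing to read yet, or an error).
 */
int peer_closed_after_read(int fd)
{
    char c;
    ssize_t ret;

    do {
        ret = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    } while (ret < 0 && errno == EINTR);

    return ret == 0;
}

/* Read up to <count> bytes. When exactly <count> bytes were obtained,
 * probe one extra byte to learn whether a close immediately follows.
 */
ssize_t read_try_harder(int fd, void *buf, size_t count, int *closed)
{
    ssize_t ret;

    assert(closed != NULL);
    *closed = 0;
    ret = recv(fd, buf, count, 0);
    if (ret == 0)
        *closed = 1;
    else if (ret > 0 && (size_t)ret == count)
        *closed = peer_closed_after_read(fd);
    return ret;
}
```

The probe costs one extra system call, which is presumably why the commit makes it opt-in through a flag rather than systematic.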
Willy Tarreau
dae4cfe8c5 MINOR: ssl: add the ssl_bc_sni sample fetch function to retrieve backend SNI
Sometimes in order to debug certain difficult situations it can be useful
to know what SNI was configured on a connection going to a server, for
example to match it against what the server saw or to detect cases where
a server would route on SNI instead of Host. This sample fetch function
simply retrieves the SNI configured on the backend connection, if any.
2025-10-01 10:18:53 +02:00
Willy Tarreau
205f1cbf4c BUG/MEDIUM: wdt: improve stuck task detection accuracy
The fact that the watchdog timer measures the execution time from the
last return from the poller tends to amplify the impact of multiple
bad tasks, and may explain some of the panics reported by Felipe and
Ricardo in GH issues #3084, #3092 and #3101. The problem is that we
check the time if we see that the scheduler appears not to be moving
anymore, but one situation may still arise and catch a bad task:
  - one slow task takes so long a time that it triggers the watchdog
    twice, emitting a warning the second time (~200ms). The scheduler
    is rightfully marked as stuck.
  - then it completes and the scheduler is no longer stuck. Many other
    tasks run in turn, they all take quite some time but not enough to
    trigger a warning. But collectively their cost adds up.
  - then a task takes more than the warning time (100ms), and causes
    the total execution time to cross the second. The watchdog is
    called, sees that we've spent more than 1 second since we left the
    poller, and marks the thread as stuck.
  - the task is not finished, the watchdog is called again, sees more
    than one second with a stuck thread and panics 100ms later.

The total time away from the poller is indeed more than one second,
which is very bad, but no single task caused this individually, and
while the warnings are OK, the watchdog should not panic in this case.

This patch revisits the approach to store the moment the scheduler was
marked as stuck in the wdt context. The idea is that this date will be
used to detect warnings and panics. And by doing so and exploiting the
new is_sched_alive(thr), we can greatly simplify the mechanism so that
the signal handling thread does the strict minimum (mark the scheduler
as possibly stuck and update the stuck_start date), and only bounces to
the reporting thread if the scheduler made no progress since last call.
This means that without even doing computations in the handling thread,
we can continue to avoid all bounces unless a warning is required. Then
when the reporting thread is signaled, it will check the dates from the
last moment the scheduler was marked, and will decide to warn or panic.

The panic decision continues to pass via a TH_FL_STUCK flag to probe the
code so that exceptionally slow code (e.g. live cert generation etc) can
still find a way to avoid the panic if absolutely certain that things
are still moving.

This means that now we have the guarantee that panics will only happen
if a given task spends more than one full second not moving, and that
warnings will be issued for other calls crossing the warn delay boundary.

This was tested using artificially slow operations, and all combinations
which individually took less than a second only resulted in floods of
warnings even if the total reported time in the warning was much higher,
while those above one second provoked the panic.

One improvement could consist in reporting the time since last stuck
in the thread dumps to differentiate the individual task from the whole
set.

This needs to be backported to 3.2 along with the two previous patches:

    MINOR: sched: let's permit to share the local ctx between threads
    MINOR: sched: pass the thread number to is_sched_alive()
2025-10-01 10:18:53 +02:00
Willy Tarreau
25f5f357cc MINOR: sched: pass the thread number to is_sched_alive()
Now it will be possible to query any thread's scheduler state, not
only the current one. This aims at simplifying the watchdog checks
for reported threads. The operation is now a simple atomic xchg.
2025-10-01 10:18:53 +02:00
Willy Tarreau
7c7e17a605 MINOR: sched: let's permit to share the local ctx between threads
The watchdog timer has to go through complex operations due to not being
able to check if another thread's scheduler is still ticking. This is
simply because the scheduler status is marked as thread-local while it
could in fact also be an array. Let's do that (and align the array to
avoid false sharing) so that it's now possible to check any scheduler's
status.
2025-10-01 10:18:53 +02:00
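The two sched commits above (7c7e17a605 and 25f5f357cc) amount to turning a thread-local flag into an aligned per-thread array that any thread may query with an atomic exchange. A minimal C11 sketch, assuming a hypothetical `sched_ctx` array, a 64-byte cache line and a `sched_mark_alive()` helper that does not appear in the messages:

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_THREADS 64   /* hypothetical limit for the sketch */

/* One flag per thread, each on its own 64-byte cache line so that
 * threads do not suffer from false sharing.
 */
struct sched_state {
    _Alignas(64) atomic_int moved;  /* set by the owner when it progresses */
};

struct sched_state sched_ctx[MAX_THREADS];

/* Called by the scheduler of thread <thr> each time it switches tasks. */
void sched_mark_alive(int thr)
{
    assert(thr >= 0 && thr < MAX_THREADS);
    atomic_store(&sched_ctx[thr].moved, 1);
}

/* Returns non-zero if thread <thr> progressed since the previous call,
 * and rearms the flag within the same atomic exchange.
 */
int is_sched_alive(int thr)
{
    assert(thr >= 0 && thr < MAX_THREADS);
    return atomic_exchange(&sched_ctx[thr].moved, 0);
}
```

Because the state is indexed by thread number, a watchdog running on one thread can check another thread's scheduler without bouncing to it first.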
Olivier Houchard
21ae35dd29 BUG/MEDIUM: stick-tables: Make sure not to free a pending entry
There is a race condition, an entry can be free'd by stksess_kill()
between the time stktable_add_pend_updates() gets the entry from the
mt_list, and the time it adds it to the ebtree.
To prevent this, use the newly implemented MT_LIST_POP_LOCKED() to keep
the stksess locked until it is added to the tree. That way,
__stksess_kill() will wait until we're done with it.

This should be backported to 3.2.
2025-09-30 16:25:07 +02:00
Olivier Houchard
cf26745857 MINOR: mt_list: Implement MT_LIST_POP_LOCKED()
Implement MT_LIST_POP_LOCKED(), that behaves as MT_LIST_POP() and
removes the first element from the list, if any, but keeps it locked.

This should be backported to 3.2, as it will be used in a bug fix in the
stick tables that affects 3.2 too.
2025-09-30 16:25:07 +02:00
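The idea of a "pop that keeps the element locked" can be shown with a much simpler structure than mt_list. In this sketch the list, the element and the spinlock helpers are all hypothetical, and one lock per list plus one per element replace mt_list's per-link locking; it only illustrates why a concurrent killer that first takes the element's lock has to wait until the caller is done.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical element and list, not the mt_list layout. */
struct elem {
    struct elem *next;
    atomic_flag lock;
};

struct list {
    struct elem *head;
    atomic_flag lock;
};

void spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set(l))
        ;  /* busy-wait until the flag is released */
}

void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear(l);
}

/* Detach the first element, if any, and return it with its own lock
 * still held, so that nobody can free it before the caller unlocks it.
 */
struct elem *list_pop_locked(struct list *l)
{
    struct elem *e;

    assert(l != NULL);
    spin_lock(&l->lock);
    e = l->head;
    if (e) {
        spin_lock(&e->lock);  /* taken before the list lock is dropped */
        l->head = e->next;
        e->next = NULL;
    }
    spin_unlock(&l->lock);
    return e;
}
```

The caller inserts the element wherever it belongs (the ebtree in the stick-table fix) and only then releases the element's lock.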
William Lallemand
6316f958e3 ADMIN: reload: introduce -vv mode
The -v verbose mode displays the loading messages returned by the master
CLI reload command upon error.

The new -vv mode displays the loading messages even upon success,
showing the content of `show startup-logs` after the reload attempt.
2025-09-29 19:29:10 +02:00
William Lallemand
5d05f343b9 ADMIN: reload: introduce verbose and silent mode
By default haproxy-reload displays the errors that are not emitted by
haproxy, but only emitted by haproxy-reload.

-s silent mode, don't display any error

-v verbose mode, display the loading messages returned by the master CLI
reload command upon error.
2025-09-29 19:29:10 +02:00
William Lallemand
3ce597bfa2 BUG/MEDIUM: acme: free() of i2d_X509_REQ() with AWS-LC
When using AWS-LC, the free() of the data ptr resulting from
i2d_X509_REQ() might crash, because it uses the free() of the libc
instead of OPENSSL_free().

It does not seem to be a problem on openssl builds.

Must be backported in 3.2.
2025-09-29 13:46:51 +02:00
William Lallemand
8635c7d789 ADMIN: reload: add a synchronous reload helper
haproxy-reload is a utility script which reloads synchronously using the
master CLI, instead of asynchronously with kill.
2025-09-28 22:10:40 +02:00
William Lallemand
02f7bff90b ADMIN: dump-certs: use same error format as haproxy
Replace error/notice by [ALERT]/[WARNING]/[NOTICE] like it's done in
haproxy.

ALERT means a failure and the program will exit 1 just after it
WARNING will continue the execution of the program
NOTICE will continue the execution as well
2025-09-28 20:21:07 +02:00
William Lallemand
5c9f28641b ADMIN: dump-certs: fix lack of / in -p
Add a trailing / so -p doesn't fail if it wasn't specified.
2025-09-28 18:21:25 +02:00
William Lallemand
172ac6ad03 ADMIN: dump-certs: create files in a tmpdir
Files dumped from the socket are put in a temporary directory, this
directory is then removed upon exit.

Variables were renamed to be clearer:
- crt_filename -> prev_crt
- key_filename -> prev_key
- ${crt_filename}.${tmp} -> new_crt
- ${key_filename}.${tmp} -> new_key
2025-09-28 18:21:25 +02:00
William Lallemand
8781c65d8a ADMIN: dump-certs: don't update the file if it's up to date
Compare the fingerprint of the leaf certificate to the previous file to
check if it needs to be updated or not

Also skip the check if no file is on the disk.
2025-09-28 18:21:20 +02:00
William Lallemand
3a6ea8b959 ADMIN: haproxy-dump-certs: implement a certificate dumper
haproxy-dump-certs is a bash script that connects to your master socket
or your stat socket in order to dump certificates from haproxy memory to
the corresponding files.
2025-09-28 13:38:48 +02:00
William Lallemand
b70c7f48fa MINOR: acme: implement "reuse-key" option
The new "reuse-key" option in the "acme" section, allows to keep the
private key instead of generating a new one at each renewal.
2025-09-27 21:41:39 +02:00
William Lallemand
a9ccf692e7 BUG/MEDIUM: acme: cfg_postsection_acme() don't init correctly acme sections
The cfg_postsection_acme() redefines its own cur_acme variable, pointing
to the first acme section created. This means that the first section would
be initialized multiple times, and the next sections would never be
initialized.

It could result in crashes at the first use of all sections that are not
the first one.

Must be backported in 3.2
2025-09-27 19:58:44 +02:00
William Lallemand
406fd0ceb1 BUG/MINOR: acme: don't unlink from acme_ctx_destroy()
Unlinking the acme_ctx element from acme_ctx_destroy() requires to have
the element unlocked, because MT_LIST_DELETE() locks the element.

acme_ctx_destroy() frees the data from acme_ctx with the ctx still
linked and unlocked, then lock to unlink. So there's a small risk of
accessing acme_ctx from somewhere else. The only way to do that would be
to use the `acme challenge_ready` CLI command at the same time.

Fix the issue by doing a mt_list_unlock_link() and a
mt_list_unlock_self() to unlink the element under the lock, then destroy
the element.

This must be backported in 3.2.
2025-09-27 18:52:56 +02:00
William Lallemand
6499c0a0d5 CI: github: build halog on the vtest job
halog was not built in the vtest job. Add it to vtest.yml to be able to
track build issues on push.
2025-09-26 16:29:29 +02:00
William Lallemand
f1f5877ce1 BUILD: halog: misleading indentation in halog.c
admin/halog/halog.c: In function 'filter_count_url':
admin/halog/halog.c:1685:9: error: this 'if' clause does not guard... [-Werror=misleading-indentation]
 1685 |         if (unlikely(!ustat))
      |         ^~
admin/halog/halog.c:1687:17: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
 1687 |                 if (unlikely(!ustat)) {
      |                 ^~

This patch fixes the indentation.

Must be backported where fbd0fb20a22 ("BUG/MINOR: halog: Add OOM checks
for calloc() in filter_count_srv_status() and filter_count_url()") was
backported.
2025-09-26 16:01:50 +02:00
Chris Staite
54f53bc875 MINOR: backend: srv_is_up converter
There is currently an srv_queue converter which is capable of taking the
output of a dynamic name and determining the queue length for a given
server.  In addition there is a sample fetcher for whether a server is
currently up.  This simply combines the two such that srv_is_up can be
used as a converter too.

Future work might extend this to other sample fetchers for servers, but
this is probably the most useful for acl routing.
2025-09-26 10:46:48 +02:00
Chris Staite
faba98c85f MINOR: backend: srv_queue helper
In preparation of providing further server converters, split the code
for finding the server from the sample out.

Additionally, update the documentation for srv_queue converter to note
security concerns.
2025-09-26 10:46:48 +02:00
William Lallemand
b3b910cc3f BUILD: acme: fix false positive null pointer dereference
src/acme.c: In function ‘cfg_parse_acme_vars_provider’:
src/acme.c:471:9: error: potential null pointer dereference [-Werror=null-dereference]
  471 |         free(*dst);
      |         ^~~~~~~~~~

gcc13 on ubuntu 24.04 detects a false positive when building
3e72a9f ("MINOR: acme: provider-name for dpapi sink").
Indeed dst can't be NULL. Clarify the code so gcc doesn't complain
anymore.
2025-09-26 10:34:35 +02:00
William Lallemand
3e72a9f618 MINOR: acme: provider-name for dpapi sink
Like "acme-vars", the "provider-name" in the acme section is used in
case of DNS-01 challenge and is sent to the dpapi sink.

This is used to pass the name of a DNS provider in order to choose the
DNS API to use.

This patch implements the cfg_parse_acme_vars_provider() which parses
either acme-vars or provider-name options and escapes their strings.

Example:

     $ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock -  | cat -e
     <0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$
     acme-vars "var1=foobar\"toto\",var2=var2"$
     provider-name "godaddy"$
     {$
       "identifier": {$
         "type": "dns",$
         "value": "example.com"$
       },$
       "status": "pending",$
       "expires": "2025-09-25T14:41:57Z",$
       [...]
2025-09-26 10:23:35 +02:00
William Lallemand
c52d69cc78 BUG/MEDIUM: ssl: ca-file directory mode must read every certificates of a file
The httpclient is configured with @system-ca by default, which uses the
directory returned by X509_get_default_cert_dir().

On debian/ubuntu systems, this directory contains multiple certificate
files that are loaded successfully. However it seems that on other
systems the file in this directory is the direct result of
ca-certificates instead of its source, meaning that you would only have
a bundle file with every certificate in it.

The loading was not done correctly in case of directory loading, and was
only loading the first certificate of each file.

This patch fixes the issue by using X509_STORE_load_locations() on each
file from the scandir instead of trying to load it manually with BIO.

Note that we can't use X509_STORE_load_locations with the `dir` argument,
which would be simpler, because it uses X509_LOOKUP_hash_dir() which
requires a directory in hash form. That wouldn't be suited for this use
case.

Must be backported in every stable branches.

Fix issue #3137.
2025-09-26 09:36:55 +02:00
William Lallemand
230a072102 CI: github: add curl+ech build into openssl-ech job
Build a curl binary with the ECH function linked with our openssl+ech
library.
2025-09-25 17:05:46 +02:00
William Lallemand
44b20e0b01 CI: scripts: build curl with ECH support
Add a script to build curl with ECH support, to specify the path of the
openssl+ECH library, you should set the SSL_LIB variable with the prefix
of the library.

Example:
   SSL_LIB=/opt/openssl-ech CURL_DESTDIR=/opt/curl-ech/ ./build-curl.sh
2025-09-25 17:05:46 +02:00
Christopher Faulet
7aa9f5ec98 BUG/MINOR: pattern: Fix pattern lookup for map with opt@ prefix
When we look for a map file reference, the file@ prefix is removed because
it may be omitted. The same is true with opt@ prefix. However this case was
not properly handled in pat_ref_lookup(). Let's do so.

This patch must be backported as far as 3.0.
2025-09-25 15:28:22 +02:00
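The normalization mentioned in 7aa9f5ec98 can be illustrated with a small helper. The functions `strip_ref_prefix()` and `same_ref()` are hypothetical and are not the pat_ref_lookup() code; they only show both optional prefixes being dropped before names are compared.

```c
#include <assert.h>
#include <string.h>

/* Return <ref> without its optional "file@" or "opt@" prefix, so that
 * "file@/etc/m.map", "opt@/etc/m.map" and "/etc/m.map" compare equal.
 */
const char *strip_ref_prefix(const char *ref)
{
    assert(ref != NULL);
    if (strncmp(ref, "file@", 5) == 0)
        return ref + 5;
    if (strncmp(ref, "opt@", 4) == 0)
        return ref + 4;
    return ref;
}

/* Compare two references by their normalized names. */
int same_ref(const char *a, const char *b)
{
    return strcmp(strip_ref_prefix(a), strip_ref_prefix(b)) == 0;
}
```

Other prefixes such as virt@ are deliberately left untouched in this sketch.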
William Lallemand
c325e34e6d CLEANUP: acme: acme_will_expire() uses acme_schedule_date()
The date computations in acme_will_expire() and acme_schedule_date() are
the same. Call acme_schedule_date() from acme_will_expire() and make the
functions static. The patch also moves the functions into the right
order.
2025-09-25 15:14:31 +02:00
William Lallemand
f256b5fdf3 BUG/MINOR: acme: possible overflow in acme_will_expire()
acme_will_expire() computes the schedule date using notAfter and
notBefore from the certificate. However notBefore could be greater than
notAfter and could result in an overflow.

This is unlikely to happen and would mean an incorrect certificate.

This patch fixes the issue by checking that notAfter > notBefore.

It also replaces the int type by a time_t to avoid overflow on 64-bit
architectures, which is also unlikely to happen with certificates.

`(date.tv_sec + diff > notAfter)` was also replaced by `if (notAfter -
diff <= date.tv_sec)` to avoid an overflow.

Fix issue #3135.

Needs to be backported to 3.2.
2025-09-25 15:12:14 +02:00
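The overflow-safe comparison described in f256b5fdf3 can be sketched as below. The function name, the return convention and the renewal window of 1/12 of the validity period are assumptions made for the example; only the notAfter > notBefore check, the time_t type and the rearranged comparison come from the commit message.

```c
#include <time.h>

/* Returns 1 if the certificate has entered its renewal window at <now>,
 * 0 if not, and -1 if the validity dates are inconsistent.
 */
int cert_will_expire(time_t not_before, time_t not_after, time_t now)
{
    time_t diff;

    if (not_after <= not_before)
        return -1;  /* incorrect certificate, nothing can be computed */

    /* assumed window: the last 1/12 of the validity period */
    diff = (not_after - not_before) / 12;

    /* "now + diff > not_after" may overflow for a large <now>; with
     * non-negative dates, diff <= not_after so this form cannot wrap.
     */
    return (not_after - diff <= now) ? 1 : 0;
}
```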
William Lallemand
68770479ea BUG/MINOR: acme: possible overflow on scheduling computation
acme_schedule_date() computes the schedule date using notAfter and
notBefore from the certificate. However notBefore could be greater than
notAfter and could result in an overflow.

This is unlikely to happen and would mean an incorrect certificate.

This patch fixes the issue by checking that notAfter > notBefore.

It also replaces the int type by a time_t to avoid overflow on 64-bit
architectures, which is also unlikely to happen with certificates.

Fix issue #3136.

Needs to be backported to 3.2.
2025-09-25 15:12:03 +02:00
Christopher Faulet
3be8b06a60 BUG/MINOR: pattern: Properly flag virtual maps as using samples
When a map file is loaded, internally, the pattern reference is flagged as
based on a sample. However it is not performed for virtual maps. This flag
is only used during startup to check the map compatibility when it is used in
different places. At runtime this does not change anything. But errors can
be triggered during configuration parsing. For instance, the following valid
config will trigger an error:

    http-request set-map(virt@test) foo bar if !{ str(foo),map(virt@test) -m found }
    http-request set-var(txn.foo) str(foo),map(virt@test)

The fix is quite obvious. PAT_REF_SMP flag must be set for virtual map as
any other map.

A workaround is to use optional map (opt@...) by checking the map id cannot
reference an existing file.

This patch must be backported as far as 3.0.
2025-09-25 10:16:53 +02:00
Christopher Faulet
23e5d272af BUG/MINOR: compression: Test payload size only if content-length is specified
When a minimum size is defined to perform the compression, the message
payload size is tested. To do so, information from the HTX message is used to
determine the message length. However this is performed regardless of whether
the payload length is fully known or not. Concretely, the test must only be
performed when a content-length value was specified or when the message was
fully received (EOM flag set). Otherwise, we are unable to determine the real
payload length.

Because of this bug, compression may be skipped for a large chunked message
because the first chunks received are too small. But this does not mean the
whole message is small.

This patch must be backported to 3.2.
2025-09-25 10:16:53 +02:00
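The corrected condition from 23e5d272af can be summarized by a small predicate. The structure and function below are hypothetical and do not use the HTX API; they only encode "compare against the minimum size when the length is really known".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of what is known about a message payload. */
struct msg_info {
    int has_content_length;  /* a content-length value was specified */
    int eom;                 /* the message was fully received */
    size_t len;              /* payload bytes known so far */
};

/* Returns 1 if compression must be skipped because the payload is known
 * to be smaller than <minsize>, 0 otherwise.
 */
int skip_for_min_size(const struct msg_info *m, size_t minsize)
{
    assert(m != NULL);
    if (!minsize)
        return 0;  /* no minimum size configured */
    if (!m->has_content_length && !m->eom)
        return 0;  /* real length unknown: do not guess */
    return m->len < minsize;
}
```

With this rule a chunked message whose first chunks are small is no longer excluded from compression.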
Olivier Houchard
71199e394c BUG/MEDIUM: stick-tables: Don't let table_process_entry() handle refcnt
Instead of having table_process_entry() decrement the session's ref
counter, do it outside, from the caller. Some were missed, such as when
an action was invalid, which would lead to the ref counter not being
decremented, and the session not being destroyable.
It makes more sense to do that from the caller, who just obtained the
ref counter, anyway.
This should be backported up to 2.8.
2025-09-22 23:14:19 +02:00
Ilia Shipitsin
8c8e50e09a CI: move VTest preparation & friends to dedicated composite action
reference: https://docs.github.com/en/actions/tutorials/create-actions/create-a-composite-action

preparing coredump limits, installing VTest are now served by dedicated
composite action
2025-09-22 19:18:23 +02:00
William Lallemand
fbffd2e25f BUG/MINOR: acme/cli: wrong description for "acme challenge_ready"
The "acme challenge_ready" command mistakenly uses the description of the
"acme status" command. This patch adds the right description.

Must be backported to 3.2.
2025-09-22 19:14:54 +02:00
William Lallemand
34cdc5e191 MINOR: acme: check acme-vars allocation during escaping
Handle allocation properly during acme-vars parsing.
Check if we have an allocation failure in both the malloc and the
realloc and emit an error if that's the case.
2025-09-19 18:11:50 +02:00
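Escaping double quotes while checking every allocation, as 34cdc5e191 describes for acme-vars, could look like the sketch below. The function `escape_dquotes()` is hypothetical and is not the parser's code; the input and output in the usage note are the ones shown in the acme-vars example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of <in> where each double quote is
 * preceded by a backslash, or NULL if an allocation fails.
 */
char *escape_dquotes(const char *in)
{
    size_t cap = 16, len = 0;
    char *out, *tmp;

    assert(in != NULL);
    out = malloc(cap);
    if (!out)
        return NULL;

    for (; *in; in++) {
        if (len + 3 > cap) {  /* room for '\\', the char and the NUL */
            cap *= 2;
            tmp = realloc(out, cap);
            if (!tmp) {
                free(out);  /* realloc() left the old block intact */
                return NULL;
            }
            out = tmp;
        }
        if (*in == '"')
            out[len++] = '\\';
        out[len++] = *in;
    }
    out[len] = '\0';
    return out;
}
```

Given `var1=foobar"toto",var2=var2` this returns `var1=foobar\"toto\",var2=var2`, and the caller is expected to report an error when NULL is returned.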
William Lallemand
92c31a6fb7 MINOR: acme: acme-vars allow to pass data to the dpapi sink
In the case of the dns-01 challenge, the agent that handles the
challenge might need some extra information which depends on the DNS
provider.

This patch introduces the "acme-vars" option in the acme section, which
allows to pass these data to the dpapi sink. The double quotes will be
escaped when printed in the sink.

Example:

    global
        setenv VAR1 'foobar"toto"'

    acme LE
        directory https://acme-staging-v02.api.letsencrypt.org/directory
        challenge DNS-01
        acme-vars "var1=${VAR1},var2=var2"

Would output:

    $ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock -  | cat -e
    <0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$
    acme-vars "var1=foobar\"toto\",var2=var2"$
    {$
      "identifier": {$
        "type": "dns",$
        "value": "example.com"$
      },$
      "status": "pending",$
      "expires": "2025-09-25T14:41:57Z",$
      [...]
2025-09-19 16:40:53 +02:00
Christopher Faulet
331689d216 BUG/MEDIUM: http-client: Fix the test on the response start-line
The commit 88aa7a780 ("MINOR: http-client: Trigger an error if first
response block isn't a start-line") introduced a bug. From an endpoint, an
applet or a mux, the <first> index must never be used. It is reserved to the
HTTP analyzers. From an endpoint, this value may be undefined or just point to
any block other than the first one. Instead we must always get the head
block.

In that case, to be sure the first HTX block in a response is a start-line,
we must use htx_get_head_type() function instead of htx_get_first_type().
Otherwise, we can trigger an error while the response is in fact properly
formatted.

It is a 3.3-specific issue. No backport needed.
2025-09-19 14:59:28 +02:00
Aurelien DARRAGON
5c299dee5a MEDIUM: stats: consider that shared stats pointers may be NULL
This patch looks huge, but it has a very simple goal: protect all
accesses to shared stats pointers (either reads or writes), because
we now consider that these pointers may be NULL.

The reason behind this is that despite all precautions taken to ensure the
pointers shouldn't be NULL when not expected, there are still corner
cases (ie: frontend stats used on a backend with no FE cap and vice
versa) where we could try to access a memory area which is not
allocated. Willy stumbled on such cases while playing with the rings
servers upon connection error, which eventually led to process crashes
(since 3.3 when shared stats were implemented)

Also, we may decide later that shared stats are optional and should
be disabled on the proxy to save memory and CPU, and this patch is
a step further towards that goal.

So in essence, this patch ensures shared stats pointers are always
initialized (including NULL), and adds necessary guards before shared
stats pointers are de-referenced. Since we already had some checks
for backends and listeners stats, and the pointer address retrieval
should stay in cpu cache, let's hope that this patch doesn't impact
stats performance much.
2025-09-18 16:49:51 +02:00
Aurelien DARRAGON
40eb1dd135 BUG/MEDIUM: sink: fix unexpected double postinit of sink backend
Willy experienced an unexpected behavior with the config below:

    global
        stats socket :1514

    ring buf1
        server srv1 127.0.0.1:1514

Indeed, haproxy would connect to the ring server twice since commit 23e5f18b
("MEDIUM: sink: change the sink mode type to PR_MODE_SYSLOG"), and one of the
connections would report errors.

The reason behind this is that, despite the above commit saying no change of
behavior is expected, with the sink forward_px proxy now being set with
PR_MODE_SYSLOG, postcheck_log_backend() was being automatically executed in
addition to the manual cfg_post_parse_ring() function for each "ring" section.
The consequence is that sink_finalize() was called twice for a given "ring"
section, which means the connection init would be triggered twice, which in
turn resulted in the behavior described above, plus possible unexpected
side-effects.

To fix the issue, when we create the forward_px proxy, we now set the
PR_CAP_INT capability on it to tell haproxy not to automatically manage the
proxy (ie: to skip the automatic log backend postinit), because we are about
to manually manage the proxy from the sink API.

No backport needed, this bug is specific to 3.3
2025-09-18 16:49:29 +02:00
Willy Tarreau
79ef362d9e OPTIM: ring: avoid reloading the tail_ofs value before the CAS in ring_write()
The load followed by the CAS seem to cause two bus cycles, one to
retrieve the cache line in shared state and a second one to get
exclusive ownership of it. Tests show that on x86 it's much better
to just rely on the previous value and preset it to zero before
entering the loop. We just mask the ring lock in case of failure
so as to challenge it on next iteration and that's done.

This little change brings 2.3% extra performance (11.34M msg/s) on
a 64-core AMD.
2025-09-18 15:27:32 +02:00
Willy Tarreau
a727c6eaa5 OPTIM: ring: check the queue's owner using a CAS on x86
In the loop where the queue's leader tries to get the tail lock,
we also need to check if another thread took ownership of the queue
the current thread is currently working for. This is currently done
using an atomic load.

Tests show that on x86, using a CAS for this is much more efficient
because it allows to keep the cache line in exclusive state for a
few more cycles that permit the queue release call after the loop
to be done without having to wait again. The measured gain is +5%
for 128 threads on a 64-core AMD system (11.08M msg/s vs 10.56M).
However, ARM loses about 1% on this, and we cannot afford that on
machines without a fast CAS anyway, so the load is performed using
a CAS only on x86_64. It might not be as efficient on low-end models
but we don't care since they are not the ones dealing with high
contention.
2025-09-18 15:08:12 +02:00
Willy Tarreau
d25099b359 OPTIM: ring: always relax in the ring lock and leader wait loop
Tests have shown that AMD systems really need to use a cpu_relax()
in these two loops. The performance improves from 10.03 to 10.56M
messages per second (+5%) on a 128-thread system, without affecting
Intel nor ARM, so let's do this.
2025-09-18 15:07:56 +02:00