Compare commits

3823 Commits

Author SHA1 Message Date
William Lallemand
90c5618ed5 MEDIUM: systemd: implement directory loading
Red Hat-based systems already use a CFGDIR variable to load configuration
files from a directory, this patch implements the same feature.

It now requires that /etc/haproxy/conf.d exists or the service won't be
able to start.
2026-01-16 09:55:33 +01:00
Egor Shestakov
a3ee35cbfc REORG/MINOR: cfgparse: eliminate code duplication by lshift_args()
The handling of the "no" and "default" prefix keywords contained similar
code. This duplication already caused a bug once.

No backport needed.
2026-01-16 09:09:24 +01:00
Egor Shestakov
447d73dc99 BUG/MINOR: cfgparse: fix "default" prefix parsing
Fix the left shift of args when "default" prefix matches. The cause of the
bug was the absence of zeroing of the right element during the shift. The
same bug for "no" prefix was fixed by commit 0f99e3497, but missed for
"default".

The shift of ("default", "option", "dontlog-normal")
    produced ("option", "dontlog-normal", "dontlog-normal")
  instead of ("option", "dontlog-normal", "")

As an example, a valid config line:
    default option dontlog-normal

caused a parse error:
[ALERT]    (32914) : config : parsing [bug-default-prefix.cfg:22] : 'option dontlog-normal' cannot handle unexpected argument 'dontlog-normal'.

The patch should be backported to all stable versions, since the absence of
zeroing was introduced with "default" keyword.
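
The shift described above can be sketched as follows. This is a minimal illustration, not HAProxy's actual lshift_args(): the function name shift_args_left() and its signature are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical helper, not HAProxy's lshift_args(): drop args[0] and shift
 * the remaining words of the <count>-entry array one slot to the left. The
 * slot freed at the end is reset to "" so that no stale copy of the last
 * word survives.
 */
void shift_args_left(const char **args, size_t count)
{
	size_t i;

	if (count == 0)
		return;
	for (i = 0; i + 1 < count; i++)
		args[i] = args[i + 1];
	args[count - 1] = "";   /* the zeroing step that was missing */
}
```

Without the last assignment, ("default", "option", "dontlog-normal") would keep a duplicate "dontlog-normal" in the final slot, which is the failure shown in the message.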
2026-01-16 09:09:19 +01:00
Remi Tricot-Le Breton
362ff2628f REGTESTS: jwe: Fix tests of algorithms not supported by AWS-LC
Many tests use the A128KW algorithm which is not supported by AWS-LC but
instead of removing those tests we will just have a hardcoded value set
by default in this case.
2026-01-15 10:56:28 +01:00
Remi Tricot-Le Breton
aba18bac71 MINOR: jwe: Some algorithms not supported by AWS-LC
AWS-LC does not have EVP_aes_128_wrap or EVP_aes_192_wrap so the A128KW
and A192KW algorithms will not be supported for JWE token decryption.
2026-01-15 10:56:28 +01:00
Remi Tricot-Le Breton
39da1845fc DOC: jwe: Add doc for jwt_decrypt converters
Add doc for jwt_decrypt_secret and jwt_decrypt_cert converters.
2026-01-15 10:56:28 +01:00
Remi Tricot-Le Breton
4b73a3ed29 REGTESTS: jwe: Add jwt_decrypt_secret and jwt_decrypt_cert tests
Test the new jwt_decrypt converters.
2026-01-15 10:56:27 +01:00
Remi Tricot-Le Breton
e3a782adb5 MINOR: jwe: Add new jwt_decrypt_cert converter
This converter checks the validity and decrypts the content of a JWE
token that has an asymmetric "alg" algorithm (RSA). In such a case, we
must provide a path to an already loaded certificate and private key
that has the "jwt" option set to "on".
2026-01-15 10:56:27 +01:00
Remi Tricot-Le Breton
416b87d5db MINOR: jwe: Add new jwt_decrypt_secret converter
This converter checks the validity and decrypts the content of a JWE
token that has a symmetric "alg" algorithm. In such a case, we only
require a secret as parameter in order to decrypt the token.
2026-01-15 10:56:27 +01:00
Remi Tricot-Le Breton
2b45b7bf4f REGTESTS: ssl: Add tests for new aes cbc converters
This test mimics what was already done for the aes_gcm converters. Some
data is encrypted and directly decrypted and we ensure that the output
was not changed.
2026-01-15 10:56:27 +01:00
Remi Tricot-Le Breton
c431034037 MINOR: ssl: Add new aes_cbc_enc/_dec converters
These converters encrypt or decrypt data with AES in Cipher
Block Chaining mode. They work the same way as the already existing
aes_gcm_enc/_dec ones apart from the AEAD tag notion which is not
supported in CBC mode.
2026-01-15 10:56:27 +01:00
Remi Tricot-Le Breton
f0e64de753 MINOR: ssl: Factorize AES GCM data processing
The parameter parsing and processing and the actual crypto part of the
aes_gcm converter are interleaved. This patch puts the crypto parts in a
dedicated function for better reuse in the upcoming JWE processing.
2026-01-15 10:56:27 +01:00
Amaury Denoyelle
6870551a57 MEDIUM: proxy: force traffic on unpublished/disabled backends
A recent patch has introduced a new state for proxies : unpublished
backends. Such backends won't be eligible for traffic, thus
use_backend/default_backend rules which target them won't match and
content switching rules processing will continue.

This patch defines a new frontend keyword 'force-be-switch'. This
keyword allows ignoring the unpublished or disabled state. Thus,
use_backend/default_backend will match even if the target backend is
unpublished or disabled. This is useful to be able to test a backend
instance before exposing it outside.

This new keyword is converted into a persist rule of new type
PERSIST_TYPE_BE_SWITCH, stored in persist_rules list proxy member. This
is the only persist rule applicable to frontend side. Prior to this
commit, pure frontend proxies persist_rules list were always empty.

This new feature requires an adjustment in process_switching_rules(). Now,
when a use_backend/default_backend rule matches a non-eligible
backend, frontend persist_rules are inspected to detect if a
force-be-switch is present so that the backend may be selected.
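
The selection logic described above could look roughly like the sketch below. Everything in it is hypothetical: the structures, the select_backend() name and the plain boolean <force> (standing in for a matching force-be-switch rule) are simplifications for illustration and do not reflect HAProxy's real types. Returning NULL when the default backend is not eligible is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types: these are not HAProxy's structures. */
struct backend {
	const char *name;
	bool disabled;
	bool unpublished;
};

struct switch_rule {
	bool matches;              /* result of the rule's condition */
	struct backend *target;
};

/* Return the backend of the first matching rule whose target is eligible,
 * or whose ineligibility is overridden by <force>. Fall back to <def>
 * under the same eligibility test, else return NULL (assumed behavior).
 */
struct backend *select_backend(const struct switch_rule *rules, size_t nb,
                               struct backend *def, bool force)
{
	size_t i;

	for (i = 0; i < nb; i++) {
		struct backend *be = rules[i].target;

		if (!rules[i].matches || !be)
			continue;
		if ((be->disabled || be->unpublished) && !force)
			continue;      /* skip it and keep evaluating rules */
		return be;
	}
	if (def && ((!def->disabled && !def->unpublished) || force))
		return def;
	return NULL;
}
```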
2026-01-15 09:08:19 +01:00
Amaury Denoyelle
16f035d555 MINOR: cfgparse: adapt warnif_cond_conflicts() error output
Utility function warnif_cond_conflicts() is used when parsing an ACL.
Previously, the function directly called ha_warning() to report an error.
Change the function so that it now takes the error message as argument.
Caller can then output it as wanted.

This change is necessary to use the function when parsing a keyword
registered as cfg_kw_list. The next patch will reuse it.
2026-01-15 09:08:18 +01:00
Amaury Denoyelle
82907d5621 MINOR: stats: report BE unpublished status
A previous patch defines a new proxy status : unpublished backends. This
patch extends this by changing proxy status reported in stats. If
unpublished is set, an extra "(UNPUB)" is added to the field.

The HTML stats page is also slightly updated. If a backend is up but
unpublished, its status will be reported in orange color.
2026-01-15 09:08:18 +01:00
Amaury Denoyelle
797ec6ede5 MEDIUM: proxy: implement publish/unpublish backend CLI
Define a new set of CLI commands publish/unpublish backend <be>. The
objective is to be able to change the status of a backend to
unpublished. Such a backend is considered ineligible for traffic: this
allows skipping use_backend rules which target it.

Note that contrary to disabled/stopped proxies, an unpublished backend
still has server checks running on it.

Internally, a new proxy flag PR_FL_BE_UNPUBLISHED is defined. CLI
command handlers "publish backend" and "unpublish backend" are executed
under thread isolation. This guarantees that the flag can safely be set
or removed in the CLI handlers, and read during content-switching
processing.
2026-01-15 09:08:18 +01:00
Amaury Denoyelle
21fb0a3f58 MEDIUM: proxy: do not select a backend if disabled
A proxy can be marked as disabled using the keyword with the same name.
The doc mentions that it won't process any traffic. However, this is not
really the case for backends as they may still be selected via switching
rules during stream processing.

In fact, currently access to disabled backends will be conducted up to
assign_server(). However, no eligible server is found at this stage,
resulting in a connection closure or an HTTP 503, which is expected. So
in the end, servers in disabled backends won't receive any traffic. But
this is only because post-parsing steps are not performed on such
backends. Thus, this can be considered as functional but only via
side-effects.

This patch clarifies the handling of disabled backends, so that they are
never selected via switching rules. Now, process_switching_rules() will
ignore disabled backends and continue rules evaluation.

As this is a behavior change, this patch is labelled as medium. The
documentation manual for use_backend is updated accordingly.
2026-01-15 09:08:18 +01:00
Amaury Denoyelle
2d26d353ce REGTESTS: add test on backend switching rules selection
Create a new test to ensure that switching rules selection is fine.
Currently, this checks that dynamic backend switching works as expected.
If a matching rule is resolved to a nonexistent backend, the default
backend is used instead.

This regtest should be useful as switching-rules will be extended in a
future set of patches to add new abilities on backends, linked to
dynamic backend support.
2026-01-15 09:08:18 +01:00
Amaury Denoyelle
12975c5c37 MEDIUM: stream: refactor switching-rules processing
This commit rewrites process_switching_rules() function. The objective
is to simplify backend selection so that a single unified
stream_set_backend() call is kept, both for regular and default backends
case.

This patch will be useful to add new capabilities on backends, in the
context of dynamic backend support implementation.
2026-01-15 09:08:18 +01:00
Amaury Denoyelle
2f6aab9211 BUG/MINOR: proxy: free persist_rules
force-persist proxy keyword is converted into a persist_rule, stored in
proxy persist_rules list member. Each new rule is dynamically allocated
during parsing.

This commit fixes the memory leak on deinit due to a missing free on
persist_rules list entries. This is done via deinit_proxy()
modification. Each rule in the list is freed, along with its associated
ACL condition type.

This can be backported to every stable version.
2026-01-15 09:08:18 +01:00
Olivier Houchard
a209c35f30 MEDIUM: thread: Turn the group mask in thread set into a group counter
If we want to be able to have more than 64 thread groups, we can no
longer use thread group masks as long.
One remaining place where it is done is in struct thread_set. However,
it is not really used as a mask anywhere, all we want is a thread group
counter, so convert that mask to a counter.
2026-01-15 05:24:53 +01:00
Olivier Houchard
6249698840 BUG/MEDIUM: queues: Fix arithmetic when filling non_empty_tgids
Fix the arithmetic when pre-filling non_empty_tgids when we still have
more than 32/64 thread groups left, to get the right index, we of course
have to divide the number of thread groups by the number of bits in a
long.
This bug was introduced by commit
7e1fed4b7a8b862bf7722117f002ee91a836beb5, but hopefully was not hit
because it requires at least as many thread groups as there are
bits in a long, which is impossible on 64bits machines, as MAX_TGROUPS
is still 32.
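
The index arithmetic can be illustrated with a generic bitmap stored in an array of longs. This is only a sketch: bitmap_prefill() and bitmap_test() are hypothetical helpers, not the ha_bit_* functions nor the actual non_empty_tgids code.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Set the <n> lowest bits of the bitmap <map>, which must hold at least
 * (n + BITS_PER_LONG - 1) / BITS_PER_LONG words. Other bits of the last
 * touched word are cleared; further words are left untouched.
 */
void bitmap_prefill(unsigned long *map, size_t n)
{
	size_t done = 0;

	while (n - done >= BITS_PER_LONG) {
		/* word index = bit position / number of bits in a long */
		map[done / BITS_PER_LONG] = ~0UL;
		done += BITS_PER_LONG;
	}
	if (n > done)
		map[done / BITS_PER_LONG] = (1UL << (n - done)) - 1;
}

/* Return 1 if bit <bit> of the bitmap is set, 0 otherwise. */
int bitmap_test(const unsigned long *map, size_t bit)
{
	return (map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL;
}
```

Forgetting the division when computing the word index would write past the intended word as soon as more than one long is needed, which matches the condition under which the bug could be hit.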
2026-01-15 04:28:04 +01:00
Olivier Houchard
1397982599 MINOR: threads: Eliminate all_tgroups_mask.
Now that it is unused, eliminate all_tgroups_mask, as we can't use 64bits
masks to represent thread groups, if we want to be able to have more
than 64 thread groups.
2026-01-15 03:46:57 +01:00
Olivier Houchard
7e1fed4b7a MINOR: queues: Turn non_empty_tgids into a long array.
In order to be able to have more than 64 thread groups, turn
non_empty_tgids into a long array, so that we have enough bits to
represent every thread group, and manipulate it with the ha_bit_*
functions.
2026-01-15 03:46:57 +01:00
Aurelien DARRAGON
2ec387cdc2 BUG/MINOR: http_act: fix deinit performed on uninitialized lf_expr in release_http_map()
As reported by GH user @Lzq-001 on issue #3245, the config below would
cause haproxy to SEGFAULT after having reported an error:

  frontend 0000000
        http-request set-map %[hdr(0000)0_

Root cause is simple, in parse_http_set_map(), we define the release
function (which is responsible to clear lf_expr expressions used by the
action), prior to initializing the expressions, while the release
function assumes the expressions are always initialized.

For all similar actions, we already perform the init prior to setting
the related release function, but this was not the case for
parse_http_set_map(). We fix the bug by initializing the expressions
earlier.
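
The ordering issue can be shown with a reduced example. The types and functions below (struct expr, struct rule, rule_parse(), rule_release()) are hypothetical stand-ins, not HAProxy's act_rule or lf_expr API; the point is only that the release hook is installed after the members it touches are initialized.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types: not HAProxy's act_rule or lf_expr. */
struct expr {
	char *buf;
};

struct rule {
	struct expr key;
	void (*release)(struct rule *r);
};

void expr_init(struct expr *e)
{
	e->buf = NULL;
}

/* Assumes every expression went through expr_init() beforehand. */
void rule_release(struct rule *r)
{
	free(r->key.buf);            /* free(NULL) is a no-op */
	r->key.buf = NULL;
}

/* Returns 0 on success, -1 on error. In both cases the rule is left in a
 * state that rule_release() can handle.
 */
int rule_parse(struct rule *r, const char *arg)
{
	size_t len;

	expr_init(&r->key);          /* 1. initialize first */
	r->release = rule_release;   /* 2. only then expose the release hook */

	if (!arg || !*arg)
		return -1;           /* early error: release is still safe */
	len = strlen(arg) + 1;
	r->key.buf = malloc(len);
	if (!r->key.buf)
		return -1;
	memcpy(r->key.buf, arg, len);
	return 0;
}
```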

Thanks to @Lzq-001 for having reported the issue and provided a simple
reproducer.

It should be backported to all stable versions, note for versions prior to
3.0, lf_expr_init() should be replaced by LIST_INIT(), see
6810c41 ("MEDIUM: tree-wide: add logformat expressions wrapper")
2026-01-14 20:05:39 +01:00
Olivier Houchard
7f4b053b26 MEDIUM: counters: mostly revert da813ae4d7cb77137ed
Contrary to what was previously believed, there are corner cases where
the counters may not be allocated, and we may want to make them optional
at a later date, so we have to check if those counters are there.
However, just checking that shared.tg is non-NULL is enough, we can then
assume that shared.tg[tgid - 1] has properly been allocated too.
Also modify the various COUNTER_SHARED_* macros to make sure they check
for that too.
2026-01-14 12:39:14 +01:00
Amaury Denoyelle
7aa839296d BUG/MEDIUM: quic: fix ACK ECN frame parsing
ACK frames are either of type 0x02 or 0x03. The latter is an indication
that it contains extra ECN related fields. In haproxy QUIC stack, this
is considered as a different frame type, set to QUIC_FT_ACK_ECN, with
its own set of builder/parser functions.

This patch fixes ACK ECN parsing function. Indeed, the latter suffered
from two issues. First, 'first ACK range' and 'ACK ranges' were
inverted. Then, the three remaining ECN fields were simply ignored by
the parsing function.

This issue can cause desynchronization in the frames parsing code, which
may have various results. Most of the time, the connection will be
aborted by haproxy due to an invalid frame content read.

Note that this issue was not detected earlier as most clients do not
enable ECN support if the peer is not able to emit ACK ECN frame first,
which haproxy currently never sends. Nevertheless, this is not the case
for every client implementation, thus proper ACK ECN parsing is
mandatory for a proper QUIC stack support.

Fix this by adjusting quic_parse_ack_ecn_frame() function. The remaining
ECN fields are parsed to ensure correct packet parsing. Currently, they
are not used by the congestion controller.
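
For reference, the field order of a type 0x03 ACK frame as laid out in RFC 9000 can be walked with the sketch below. It is not the code of quic_parse_ack_ecn_frame(): the names quic_varint_dec(), struct ack_ecn and parse_ack_ecn() are made up for the example, and the ranges are skipped rather than processed.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one QUIC variable-length integer (RFC 9000, section 16).
 * Returns the number of bytes consumed, or 0 if the buffer is too short.
 */
size_t quic_varint_dec(const uint8_t *buf, size_t len, uint64_t *out)
{
	size_t n, i;
	uint64_t v;

	if (len == 0)
		return 0;
	n = (size_t)1 << (buf[0] >> 6);   /* 1, 2, 4 or 8 bytes */
	if (len < n)
		return 0;
	v = buf[0] & 0x3f;
	for (i = 1; i < n; i++)
		v = (v << 8) | buf[i];
	*out = v;
	return n;
}

struct ack_ecn {
	uint64_t largest, delay, range_count, first_range;
	uint64_t ect0, ect1, ce;
};

/* Parse the body of a type 0x03 ACK frame (type byte already consumed).
 * Returns the number of bytes consumed, or 0 on truncated input.
 */
size_t parse_ack_ecn(const uint8_t *buf, size_t len, struct ack_ecn *f)
{
	uint64_t *head[] = { &f->largest, &f->delay, &f->range_count, &f->first_range };
	uint64_t *tail[] = { &f->ect0, &f->ect1, &f->ce };
	uint64_t skip, r;
	size_t pos = 0, n, i;

	for (i = 0; i < 4; i++) {
		if (!(n = quic_varint_dec(buf + pos, len - pos, head[i])))
			return 0;
		pos += n;
	}
	/* each additional range is a (Gap, ACK Range Length) pair */
	for (r = 0; r < f->range_count; r++) {
		for (i = 0; i < 2; i++) {
			if (!(n = quic_varint_dec(buf + pos, len - pos, &skip)))
				return 0;
			pos += n;
		}
	}
	/* the three ECN counters must be consumed to stay in sync */
	for (i = 0; i < 3; i++) {
		if (!(n = quic_varint_dec(buf + pos, len - pos, tail[i])))
			return 0;
		pos += n;
	}
	return pos;
}
```

Skipping the three trailing counters, or swapping the range count with the first range, leaves the read position in the wrong place for the next frame, which is the desynchronization described above.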

This must be backported up to 2.6.
2026-01-13 15:08:02 +01:00
Olivier Houchard
82196eb74e BUG/MEDIUM: threads: Fix binding thread on bind.
The code to parse the "thread" keyword on bind lines was changed to
check the thread numbers against the value provided with
max-threads-per-group, if any. However, at the time the thread keywords
are parsed, that value may not have been set yet, which breaks the
feature. Revert to checking against MAX_THREADS_PER_GROUP instead; it
should have no major impact.
2026-01-13 11:45:46 +01:00
Olivier Houchard
da813ae4d7 MEDIUM: counters: Remove some extra tests
Before updating counters, a few tests are made to check if the counters
exist, but those counters should always exist at this point, so just
remove them.
This commit should have no impact, but can easily be reverted with no
functional impact if various crashes appear.
2026-01-13 11:12:34 +01:00
Olivier Houchard
5495c88441 MEDIUM: counters: Dynamically allocate per-thread group counters
Instead of statically allocating the per-thread group counters,
based on the max number of thread groups available, allocate
them dynamically, based on the number of thread groups actually
used. That way we can increase the maximum number of thread
groups without using an unreasonable amount of memory.
2026-01-13 11:12:34 +01:00
Willy Tarreau
37057feb80 BUG/MINOR: net_helper: fix IPv6 header length processing
The IPv6 header contains a payload length that excludes the 40 bytes of
IPv6 packet header, which differs from IPv4's total length which includes
it. As a result, the parser was wrong and would only see the IP part and
not the TCP one unless sufficient options were present to cover it.

This issue came in 3.4-dev2 with recent commit e88e03a6e4 ("MINOR:
net_helper: add ip.fp() to build a simplified fingerprint of a SYN"),
so no backport is needed.
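
The difference between the two length fields can be shown with a small helper. This is an illustrative sketch only: ip_packet_len() is a hypothetical function, not part of net_helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the full length of the IP packet starting at <pkt>, header
 * included, or 0 if the buffer is too short or the version is unknown.
 */
size_t ip_packet_len(const uint8_t *pkt, size_t len)
{
	if (len < 1)
		return 0;
	switch (pkt[0] >> 4) {
	case 4:
		if (len < 20)
			return 0;
		/* bytes 2-3: Total Length, header included */
		return ((size_t)pkt[2] << 8) | pkt[3];
	case 6:
		if (len < 40)
			return 0;
		/* bytes 4-5: Payload Length, fixed 40-byte header excluded */
		return 40 + (((size_t)pkt[4] << 8) | pkt[5]);
	default:
		return 0;
	}
}
```

A 20-byte TCP header carried over IPv6 thus gives a Payload Length of 20 but a packet length of 60, the same total an IPv4 header without options would report directly.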
2026-01-13 08:42:36 +01:00
Aurelien DARRAGON
fcd4d4a7aa BUG/MINOR: hlua_fcn: ensure Patref:add_bulk() is given a table object before using it
As reported by GH user @kanashimia in GH #3241, providing anything else
than a table to Patref:add_bulk() method could cause a segfault because
we were calling lua_next() with the lua object without ensuring it
actually is a table.

Let's add the missing lua_istable() check on the stack object before
calling lua_next() function on it.

It should be backported up to 3.2 with 884dc62 ("MINOR: hlua_fcn:
add Patref:add_bulk()")
2026-01-12 17:30:54 +01:00
Aurelien DARRAGON
04545cb2b7 BUG/MINOR: hlua_fcn: fix broken yield for Patref:add_bulk()
In GH #3241, GH user @kanashimia reported that the Patref:add_bulk()
method would raise a Lua exception when called with more than 101
elements at once.

As identified by @kanashimia there was an error in the way the
add_bulk() method was forced to yield after 101 elements precisely.
The yield is there to ensure Lua doesn't eat too many resources at
once and doesn't impact haproxy's core responsiveness, but the check
for the yield was misplaced resulting in improper stack content upon
resume.

Thanks to user @kanashimia who even provided a reproducer which helped
a lot to troubleshoot the issue.

This fix should be backported up to 3.2 with 884dc62 ("MINOR: hlua_fcn:
add Patref:add_bulk()") where the bug was introduced.
2026-01-12 17:30:52 +01:00
Olivier Houchard
b1cfeeef21 BUG/MINOR: stats-file: Use a 16bits variable when loading tgid
Now that the tgid stored in the stats file has been increased to 16bits
by commit 022cb3ab7fdce74de2cf24bea865ecf7015e5754, don't forget to
increase the variable size when reading it from the file, too.
This should have no impact given the maximum thread group limit is still
32.
2026-01-12 09:48:54 +01:00
Olivier Houchard
022cb3ab7f MINOR: stats: Increase the tgid from 8bits to 16bits
Increase the size of the stored tgid in the stat file from 8bits to
16bits, so that we can have more than 256 thread groups. 65536 should be
enough for some time.

This bumps the stat file minor version, as the structure changes.
2026-01-12 09:39:52 +01:00
Olivier Houchard
c0f64fc36a MINOR: receiver: Dynamically alloc the "members" field of shard_info
Instead of always allocating MAX_TGROUPS members, allocate them
dynamically, using the number of thread groups we'll use, so that
increasing MAX_TGROUPS will not have a huge impact on the structure
size.
2026-01-12 09:32:27 +01:00
Tim Duesterhus
96faf71f87 CLEANUP: connection: Remove outdated note about CO_FL 0x00002000 being unused
This flag is used as of commit dcce9369129f6ca9b8eed6b451c0e20c226af2e3
("MINOR: connections: Add a new CO_FL_SSL_NO_CACHED_INFO flag"). This patch
should be backported to 3.3. Apparently dcce9369129 has been backported
to 3.2 and 3.1 already, with that change already applied, so no need for a
backport there.
2026-01-12 03:22:15 +01:00
Willy Tarreau
2560cce7c5 MINOR: tcp-sample: permit retrieving tcp_info from the connection/session stage
The fc_xxx info that are retrieved over tcp_info could currently not
be accessed before a stream is created due to a test that verified the
existence of a stream. The rationale here was that the function works
both for frontend and backend. Let's always retrieve these info from
the session for the frontend case so that it now becomes possible to
set variables at connection/session time. The doc did not mention this
limitation so this could almost be considered as a bug.
2026-01-11 15:48:20 +01:00
Willy Tarreau
880bbeeda4 MINOR: sample: also support retrieving fc.timer.handshake without a stream
Some timers, like the handshake timer, are stored in the session and are
only copied to the logs struct when a stream is created. But this means
we can't measure it without a stream, nor store it once for all in a
variable at session creation time. Let's extend the sample fetch function
to retrieve it from the session when no stream is present. The doc did not
mention this limitation so this could almost be considered as a bug.
2026-01-11 15:48:19 +01:00
Amaury Denoyelle
875bbaa7fc MINOR: cfgparse: remove duplicate "force-persist" in common kw list
"force-persist" proxy keyword is listed twice in common_kw_list. This
patch removes the duplicated occurrence.

This could be backported up to 2.4.
2026-01-09 16:45:54 +01:00
Willy Tarreau
46088b7ad0 MEDIUM: config: warn if some userlist hashes are too slow
It was reported in GH #2956 and more recently in GH #3235 that some
hashes are way too slow. The former triggers watchdog warnings during
checks, the second sees the config parsing take 20 seconds. This is
always due to the use of hash algorithms that are not suitable for use
in low-latency environments like web. They might be fine for a local
auth though. The difficulty, as explained by Philipp Hossner, is that
developers are not aware of this cost and adopt this without suspecting
any side effect.

The proposal here is to measure the crypt() call time and emit a warning
if it takes more than 10ms (which is already extreme). This was tested
by Philipp and confirmed to catch his case.
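
The idea of timing a call and flagging it past a threshold can be sketched as below. The message does not say which clock HAProxy uses, so measuring CPU time with the standard clock() function is an assumption, and cpu_ms(), is_too_slow() and burn_cpu() are hypothetical names. burn_cpu() merely stands in for an expensive crypt() call.

```c
#include <time.h>

/* Run <fn>(<arg>) once and return the CPU time it consumed, in
 * milliseconds, as measured by the standard clock() function.
 */
double cpu_ms(void (*fn)(void *), void *arg)
{
	clock_t start = clock();

	fn(arg);
	return (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}

/* Return 1 if the call exceeded <limit_ms>, the point at which a
 * configuration parser could emit a warning.
 */
int is_too_slow(void (*fn)(void *), void *arg, double limit_ms)
{
	return cpu_ms(fn, arg) > limit_ms;
}

/* Stand-in for an expensive hash: spin for about <*arg> ms of CPU time. */
void burn_cpu(void *arg)
{
	double want_ms = *(double *)arg;
	clock_t start = clock();

	while ((double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC < want_ms)
		;
}
```

With a 10ms limit, a call that burns 30ms of CPU is reported as too slow while an immediate return is not.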

This is marked medium as it might start to report warnings on configs
that suffered from this problem without it ever being detected until now.
2026-01-09 14:56:18 +01:00
akarl10
a203ce6854 BUG/MINOR: ech/quic: enable ech configuration also for quic listeners
Patch dba4fd24 ("MEDIUM: ssl/ech: config and load keys") introduced
ECH configuration for bind lines, but the QUIC configuration parsers
still suffer from not using the same code as the TCP/TLS one, so the
init for QUIC was missed.

Must be backported in 3.3.
2026-01-08 17:34:28 +01:00
William Lallemand
6e1718ce4b CI: github: remove ERR=1 temporarily from the ECH job
The ECH job still fails to compile since the openssl 4.0 deprecated
functions were not removed yet. Let's remove ERR=1 temporarily.

We do know that there's a regression in OpenSSL 4.0 with these
reg-tests though:

Error: #    top  TEST reg-tests/ssl/set_ssl_crlfile.vtc FAILED (0.219) exit=2
Error: #    top  TEST reg-tests/ssl/set_ssl_cafile.vtc FAILED (0.236) exit=2
Error: #    top  TEST reg-tests/quic/set_ssl_crlfile.vtc FAILED (0.196) exit=2
2026-01-08 17:32:27 +01:00
Christian Ruppert
dbe52cc23e REGTESTS: ssl: Fix reg-tests curve check
OpenSSL changed the output from "Server Temp Key" in prior versions to
"Peer Temp Key" in recent ones.
a39dc27c25
It looks like it affects OpenSSL >=3.5.0
This broke the reg-test for e.g. Debian 13 builds, using OpenSSL 3.5.1

Fixes bug #3238

Could be backported to every branch.

Signed-off-by: Christian Ruppert <idl0r@qasl.de>
2026-01-08 16:14:54 +01:00
William Lallemand
623aa725a2 BUG/MINOR: cli/stick-tables: argument to "show table" is optional
Discussed in issue #3187, the CLI help is confusing for the "show table"
command as it seems that the argument is mandatory.

This patch adds the arguments between square brackets to remove the
confusion.
2026-01-08 11:54:01 +01:00
Willy Tarreau
dbba442740 BUILD: sockpair: fix build issue on macOS related to variable-length arrays
In GH issue #3226, Sergey Fedorov (@barracuda156) reported that since
commit 10c14a1ed0 ("MINOR: proto_sockpair: send_fd_uxst: init iobuf,
cmsghdr, cmsgbuf to zeros"), macOS 10.6.8 with gcc 14.3.0 doesn't build
anymore:

  src/proto_sockpair.c: In function 'send_fd_uxst':
  src/proto_sockpair.c:246:49: error: variable-sized object may not be initialized except with an empty initializer
    246 |         char cmsgbuf[CMSG_SPACE(sizeof(int))] = {0};
        |                                                 ^
  src/proto_sockpair.c:247:45: error: variable-sized object may not be initialized except with an empty initializer
    247 |         char buf[CMSG_SPACE(sizeof(int))] = {0};
        |                                             ^

Upon investigation, it appears that the CMSG_SPACE() macro on this OS
looks too complex for gcc to consider it as a constant, so it takes
these buffers for variable-length arrays and cannot initialize them.

Let's move to a simple memset() instead, which Sergey confirmed fixes
the problem.
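
The memset() approach can be illustrated as follows. This is a sketch under assumptions, not the code of send_fd_uxst(): fd_cmsg_roundtrip() is a hypothetical function that only builds the ancillary data in memory and reads it back, without sending anything.

```c
#include <string.h>
#include <sys/socket.h>

/* Store <fd> as SCM_RIGHTS ancillary data in a local control buffer and
 * read it back. Returns the stored value, or -1 on failure.
 */
int fd_cmsg_roundtrip(int fd)
{
	char cmsgbuf[CMSG_SPACE(sizeof(int))];   /* no "= {0}" initializer */
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int out = -1;

	/* memset() works whether or not the compiler treats the array as
	 * variable-length, unlike an initializer at the declaration.
	 */
	memset(cmsgbuf, 0, sizeof(cmsgbuf));
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = cmsgbuf;
	msg.msg_controllen = sizeof(cmsgbuf);

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg)
		return -1;
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	memcpy(&out, CMSG_DATA(cmsg), sizeof(out));
	return out;
}
```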

This needs to be backported as far as 3.1. Thanks to Sergey for the
report, the bisect and testing the fix.
2026-01-08 09:26:22 +01:00
Hyeonggeun Oh
c17ed69bf3 MINOR: cfgparse: Refactor "userlist" parser to print it in -dKall operation
This patch covers issue https://github.com/haproxy/haproxy/issues/3221.

The parser for the "userlist" section did not use the standard keyword
registration mechanism. Instead, it relied on a series of strcmp()
comparisons to identify keywords such as "group" and "user".

This had two main drawbacks:
1. The keywords were not discoverable by the "-dKall" dump option,
   making it difficult for users to see all available keywords for the
   section.
2. The implementation was inconsistent with the parsers for other
   sections, which have been progressively refactored to use the
   standard cfg_kw_list infrastructure.

This patch refactors the userlist parser to align it with the project's
standard conventions.

The parsing logic for the "group" and "user" keywords has been extracted
from the if/else block in cfg_parse_users() into two new dedicated
functions:
- cfg_parse_users_group()
- cfg_parse_users_user()

These two keywords are now registered via a dedicated cfg_kw_list,
making them visible to the rest of the HAProxy ecosystem, including the
-dKall dump.
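
The move from a strcmp() chain to a registered keyword table can be sketched generically. The types and names below (struct kw_entry, kw_find(), kw_count(), parse_group(), parse_user()) are hypothetical and much simpler than HAProxy's cfg_kw_list; the parsers only return a marker value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified registry: not HAProxy's cfg_kw_list. */
typedef int (*kw_parser)(char **args);

struct kw_entry {
	const char *name;
	kw_parser parse;
};

/* Dummy parsers: return a marker, or -1 if the first argument is missing. */
int parse_group(char **args) { return args[1] ? 1 : -1; }
int parse_user(char **args)  { return args[1] ? 2 : -1; }

const struct kw_entry userlist_kws[] = {
	{ "group", parse_group },
	{ "user",  parse_user  },
	{ NULL,    NULL        },   /* end of table */
};

/* Look <word> up in <tbl>; return its parser or NULL if unknown. */
kw_parser kw_find(const struct kw_entry *tbl, const char *word)
{
	for (; tbl->name; tbl++)
		if (strcmp(tbl->name, word) == 0)
			return tbl->parse;
	return NULL;
}

/* Walk the table, as a keyword dump would do. */
size_t kw_count(const struct kw_entry *tbl)
{
	size_t n = 0;

	while (tbl[n].name)
		n++;
	return n;
}
```

Because the keywords live in data rather than in an if/else chain, the same table serves both dispatching and enumeration.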
2026-01-07 18:25:09 +01:00
William Lallemand
91cff75908 BUG/MINOR: cfgparse: wrong section name upon error
When an unknown keyword was used in the "userlist" section, the error was
mentioning the "users" section, instead of "userlist".

Could be backported to every branch.
2026-01-07 18:13:12 +01:00
William Lallemand
4aff6d1c25 BUILD: tools: memchr definition changed in C23
New gcc and clang versions from fedora rawhide seem to use the C23
standard by default. This version changes the definition of some
string.h functions, which now return a const char * instead of a char *.

src/tools.c: In function ‘fgets_from_mem’:
src/tools.c:7200:17: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
 7200 |         new_pos = memchr(*position, '\n', size);
      |                 ^

Strangely, -Wdiscarded-qualifiers does not seem to catch all the
memchr.
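
A const-correct use of memchr() that builds cleanly both before and after C23 could look like this. find_eol() is a hypothetical helper written for the example, not the actual change made to fgets_from_mem().

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first '\n' in the <size> bytes at <buf>, or NULL.
 * The result is kept in a pointer to const, matching the constness of
 * <buf>, so no qualifier is discarded under C23's memchr() definition.
 */
const char *find_eol(const char *buf, size_t size)
{
	const char *eol = memchr(buf, '\n', size);

	return eol;
}
```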

Should fix issue #3228.

This could be backported in previous versions.
2026-01-07 14:51:26 +01:00
William Lallemand
5322bd3785 BUILD: ssl: strchr definition changed in C23
New gcc and clang versions from fedora rawhide seem to use the C23
standard by default. This version changes the definition of some
string.h functions, which now return a const char * instead of a char *.

src/ssl_sock.c: In function ‘SSL_CTX_keylog’:
src/ssl_sock.c:4475:17: error: assignment discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
 4475 |         lastarg = strrchr(line, ' ');

Strangely, -Wdiscarded-qualifiers does not seem to catch all the
strrchr.

Should fix issue #3228.

This could be backported in previous versions.
2026-01-07 14:51:26 +01:00
Willy Tarreau
71b00a945d [RELEASE] Released version 3.4-dev2
Released version 3.4-dev2 with the following main changes :
    - BUG/MEDIUM: mworker/listener: ambiguous use of RX_F_INHERITED with shards
    - BUG/MEDIUM: http-ana: Properly detect client abort when forwarding response (v2)
    - BUG/MEDIUM: stconn: Don't report abort from SC if read0 was already received
    - BUG/MEDIUM: quic: Don't try to use hystart if not implemented
    - CLEANUP: backend: Remove useless test on server's xprt
    - CLEANUP: tcpcheck: Remove useless test on the xprt used for healthchecks
    - CLEANUP: ssl-sock: Remove useless tests on connection when resuming TLS session
    - REGTESTS: quic: fix a TLS stack usage
    - REGTESTS: list all skipped tests including 'feature cmd' ones
    - CI: github: remove openssl no-deprecated job
    - CI: github: add a job to test the master branch of OpenSSL
    - CI: github: openssl-master.yml misses actions/checkout
    - BUG/MEDIUM: backend: Do not remove CO_FL_SESS_IDLE in assign_server()
    - CI: github: use git prefix for openssl-master.yml
    - BUG/MEDIUM: mux-h2: synchronize all conditions to create a new backend stream
    - REGTESTS: fix error when no test are skipped
    - MINOR: cpu-topo: Turn the cpu policy configuration into a struct
    - MEDIUM: cpu-topo: Add a "threads-per-core" keyword to cpu-policy
    - MEDIUM: cpu-topo: Add a "cpu-affinity" option
    - MEDIUM: cpu-topo: Add a new "max-threads-per-group" global keyword
    - MEDIUM: cpu-topo: Add the "per-thread" cpu_affinity
    - MEDIUM: cpu-topo: Add the "per-ccx" cpu_affinity
    - BUG/MINOR: cpu-topo: fix -Wlogical-not-parentheses build with clang
    - DOC: config: fix number of values for "cpu-affinity"
    - MINOR: tools: add a secure implementation of memset
    - MINOR: mux-h2: add missing glitch count for non-decodable H2 headers
    - MINOR: mux-h2: perform a graceful close at 75% glitches threshold
    - MEDIUM: mux-h1: implement basic glitches support
    - MINOR: mux-h1: perform a graceful close at 75% glitches threshold
    - MEDIUM: cfgparse: acknowledge that proxy ID auto numbering starts at 2
    - MINOR: cfgparse: remove useless checks on no server in backend
    - OPTIM/MINOR: proxy: do not init proxy management task if unused
    - MINOR: patterns: preliminary changes for reorganization
    - MEDIUM: patterns: reorganize pattern reference elements
    - CLEANUP: patterns: remove dead code
    - OPTIM: patterns: cache the current generation
    - MINOR: tcp: add new bind option "tcp-ss" to instruct the kernel to save the SYN
    - MINOR: protocol: support a generic way to call getsockopt() on a connection
    - MINOR: tcp: implement the get_opt() function
    - MINOR: tcp_sample: implement the fc_saved_syn sample fetch function
    - CLEANUP: assorted typo fixes in the code, commits and doc
    - BUG/MEDIUM: cpu-topo: Don't forget to reset visited_ccx.
    - BUG/MAJOR: set the correct generation ID in pat_ref_append().
    - BUG/MINOR: backend: fix the conn_retries check for TFO
    - BUG/MINOR: backend: inspect request not response buffer to check for TFO
    - MINOR: net_helper: add sample converters to decode ethernet frames
    - MINOR: net_helper: add sample converters to decode IP packet headers
    - MINOR: net_helper: add sample converters to decode TCP headers
    - MINOR: net_helper: add ip.fp() to build a simplified fingerprint of a SYN
    - MINOR: net_helper: prepare the ip.fp() converter to support more options
    - MINOR: net_helper: add an option to ip.fp() to append the TTL to the fingerprint
    - MINOR: net_helper: add an option to ip.fp() to append the source address
    - DOC: config: fix the length attribute name for stick tables of type binary / string
    - MINOR: mworker/cli: only keep positive PIDs in proc_list
    - CLEANUP: mworker: remove duplicate list.h include
    - BUG/MINOR: mworker/cli: fix show proc pagination using reload counter
    - MINOR: mworker/cli: extract worker "show proc" row printer
    - MINOR: cpu-topo: Factorize code
    - MINOR: cpu-topo: Rename variables to better fit their usage
    - BUG/MEDIUM: peers: Properly handle shutdown when trying to get a line
    - BUG/MEDIUM: mux-h1: Take care to update <kop> value during zero-copy forwarding
    - MINOR: threads: Avoid using a thread group mask when stopping.
    - MINOR: hlua: Add support for lua 5.5
    - MEDIUM: cpu-topo: Add an optional directive for per-group affinity
    - BUG/MEDIUM: mworker: can't use signals after a failed reload
    - BUG/MEDIUM: stconn: Move data from <kip> to <kop> during zero-copy forwarding
    - DOC: config: fix a few typos and refine cpu-affinity
    - MINOR: receiver: Remove tgroup_mask from struct shard_info
    - BUG/MINOR: quic: fix deprecated warning for window size keyword
2026-01-07 11:02:12 +01:00
Amaury Denoyelle
e061547d9d BUG/MINOR: quic: fix deprecated warning for window size keyword
QUIC configuration was cleaned up in the previous release. Several
global keyword names were changed to unify the configuration. For each
of them the older keyword is marked as deprecated, with a warning to
mention the newer alternative.

This patch fixes the warning for 'tune.quic.frontend.default-max-size'
as the alternative proposed was not correct. The proper value now is
'tune.quic.fe.cc.max-win-size'.

This must be backported up to 3.3.
2026-01-07 09:54:31 +01:00
Olivier Houchard
41cd589645 MINOR: receiver: Remove tgroup_mask from struct shard_info
The only purpose of tgroup_mask seems to be to calculate how many
tgroups share the same shard, but this is information we can compute
differently: we just have to increment a counter when a new receiver
is added to the shard, and decrement it when one is detached from the
shard. Removing thread group masks will allow us to increase the
maximum number of thread groups past 64.
2026-01-07 09:27:12 +01:00
Willy Tarreau
c3fcdfaf5c DOC: config: fix a few typos and refine cpu-affinity
There were two typos in the recently updated parts about per-group.
Also, change the commas to ':' after the options values, as sometimes
it would be confusing. Last, place quotes around keyword names so that
they're explicitly referred to as language keywords. No backport is
needed.
2026-01-07 09:19:25 +01:00
Christopher Faulet
83457b9e38 BUG/MEDIUM: stconn: Move data from <kip> to <kop> during zero-copy forwarding
The <kip> of producer was not forwarded to <kop> of consumer when zero-copy
data forwarding was tried. Because of the issue, the chunking of emitted H1
messages could be invalid.

To fix the bug, sc_ep_fwd_kip() must be called at this stage.

This fix is related to the previous one (529a8dbfb "BUG/MEDIUM: mux-h1: Take
care to update <kop> value during zero-copy forwarding"). Both are required
to fully fix the issue #3230.

This patch must be backported to 3.3.
2026-01-06 15:41:50 +01:00
William Lallemand
97490a7789 BUG/MEDIUM: mworker: can't use signals after a failed reload
In issue #3229 it was reported that the master couldn't reload after a
failed reload following a wrong configuration.

It is still possible to do a reload using the "reload" command of the
master CLI. But all signals are blocked.

The problem was introduced in 709cde6d0 ("BUG/MEDIUM: mworker: signals
inconsistencies during startup and reload") which fixes the blocking of
signals during the reload.

However, the patch missed a case: run_master_in_recovery_mode() is not
called when the worker fails to parse the configuration, it is only
called when the master itself is failing.

To handle this case, the mworker_unblock_signals() function must be
called upon mworker_on_new_child_failure(). But since this is called in
a haproxy signal handler it would mess with the signals.

Instead, the patch adds a task which is started by the signal handler,
and restores the signals outside of it.

This must be backported as far as 3.1.
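
The idea of restoring the signal mask outside of the handler can be
sketched with plain POSIX calls. This is only an illustration, not the
code of the patch: pending_unblock, on_child_failure() and
run_deferred_work() are hypothetical names, a simple flag stands in for
the haproxy task, and SIGUSR2 is used as the example reload signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set from the signal handler, consumed later from the main loop. */
static volatile sig_atomic_t pending_unblock;

/* Hypothetical handler: it only records that some work is needed. */
static void on_child_failure(int sig)
{
	(void)sig;
	pending_unblock = 1;
}

/* Block SIGUSR2 in the caller, as done around a reload. Returns 0 on success. */
static int block_reload_signal(void)
{
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, SIGUSR2);
	return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Called from the main loop: restores the mask outside of any handler. */
static int run_deferred_work(void)
{
	sigset_t set;

	if (!pending_unblock)
		return 0;
	pending_unblock = 0;
	sigemptyset(&set);
	sigaddset(&set, SIGUSR2);
	return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Returns 1 if SIGUSR2 is currently blocked, 0 otherwise. */
static int reload_signal_blocked(void)
{
	sigset_t cur;

	sigprocmask(SIG_BLOCK, NULL, &cur);
	return sigismember(&cur, SIGUSR2);
}
```

The handler never touches the mask itself, so the signal stays blocked
until the main loop runs the deferred work.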
2026-01-06 14:27:53 +01:00
Olivier Houchard
56fd0c1a5c MEDIUM: cpu-topo: Add an optional directive for per-group affinity
When using per-group affinity, add an optional new directive. It accepts
two values: "auto", the new default, where the available CPUs are split
equally across the groups when multiple thread groups are created, and
"loose", the old default, where all groups are bound to all available
CPUs.
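
The difference between the two values can be sketched as follows. This
is a hypothetical illustration, not the cpu-topo code: the names
grp_affinity, cpu_range and group_cpus() are invented, CPUs are assumed
to be numbered contiguously from 0, and giving the leftover CPUs to the
first groups is an arbitrary choice made for the example.

```c
#include <assert.h>

enum grp_affinity { GRP_AFF_AUTO, GRP_AFF_LOOSE };

struct cpu_range {
	int first;  /* first CPU of the slice */
	int count;  /* number of CPUs in the slice */
};

/* Return the CPUs given to thread group <grp> (0-based) out of <ngroups>
 * groups when <ncpus> CPUs are available.
 */
static struct cpu_range group_cpus(enum grp_affinity mode, int ncpus,
                                   int ngroups, int grp)
{
	struct cpu_range r;
	int base, extra;

	assert(ngroups > 0 && grp >= 0 && grp < ngroups);
	if (mode == GRP_AFF_LOOSE) {
		/* every group may run on every CPU */
		r.first = 0;
		r.count = ncpus;
		return r;
	}
	/* auto: near-equal contiguous slices, the first <extra> groups
	 * receive one more CPU than the others.
	 */
	base = ncpus / ngroups;
	extra = ncpus % ngroups;
	r.first = grp * base + (grp < extra ? grp : extra);
	r.count = base + (grp < extra ? 1 : 0);
	return r;
}
```

With 16 CPUs and 4 groups, "auto" gives CPUs 4 to 7 to the second group
in this sketch, while "loose" gives it all 16.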
2026-01-06 11:32:45 +01:00
Mike Lothian
1c0f781994 MINOR: hlua: Add support for lua 5.5
Lua 5.5 adds an extra argument to lua_newstate(). Since there are
already a few other ifdefs in hlua.c checking for the Lua version,
and there's a single call place, let's do the same here. This should
be safe for backporting if needed.

Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
2026-01-06 11:05:02 +01:00
Olivier Houchard
853604f87a MINOR: threads: Avoid using a thread group mask when stopping.
Remove the "stopped_tgroup_mask" variable, that indicated which thread
groups were stopping, and instead just use "stopped_tgroups", a counter
indicating how many thread groups are stopping. We want to remove all
thread group masks, so that we can increase the maximum number of thread
groups past 64.
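
A minimal sketch of why a counter lifts the limit, using hypothetical
names (stop_state, tgroup_stopping()). It ignores atomicity and assumes
each group reports itself only once, both of which real code has to
handle:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* A mask has one bit per thread group, hence at most 64 groups. */
#define MASK_MAX_TGROUPS ((int)(sizeof(uint64_t) * CHAR_BIT))

struct stop_state {
	int total_tgroups;    /* may exceed 64 */
	int stopped_tgroups;  /* how many groups are stopping */
};

/* Record that one more thread group is stopping.
 * Returns 1 once all groups are stopping, 0 otherwise.
 */
static int tgroup_stopping(struct stop_state *st)
{
	assert(st->stopped_tgroups < st->total_tgroups);
	st->stopped_tgroups++;
	return st->stopped_tgroups == st->total_tgroups;
}
```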
2026-01-06 08:30:55 +01:00
Christopher Faulet
529a8dbfba BUG/MEDIUM: mux-h1: Take care to update <kop> value during zero-copy forwarding
Since the extra field was removed from the HTX structure, a regression was
introduced when forwarding chunked messages. The <kop> value was not
decreased as it should be when data were sent via the zero-copy
forwarding. Because of this bug, it was possible to announce a chunk size
larger than the chunk data sent.

To fix the bug, a helper function was added to properly update the <kop>
value when a chunk size is emitted. This function is now called when a new
chunk is announced, including during zero-copy forwarding.

As a workaround, "tune.disable-zero-copy-forwarding" or just
"tune.h1.zero-copy-fwd-send off" can be set in the global section.

This patch should fix the issue #3230. It must be backported to 3.3.
2026-01-06 07:39:05 +01:00
Christopher Faulet
0b29b76a52 BUG/MEDIUM: peers: Properly handle shutdown when trying to get a line
When a shutdown was reported to a peer applet, the event was not properly
handled if it failed to receive data. The function responsible for getting
data was exiting too early if the applet buffer was empty, without testing
the sedesc status. Because of this issue, it was possible to have frozen
peer applets. For instance, it happened on client timeout. With too many
frozen applets, it was possible to reach the maxconn.

This patch should fix the issue #3234. It must be backported to 3.3.
2026-01-05 13:46:57 +01:00
Olivier Houchard
196d16f2b1 MINOR: cpu-topo: Rename variables to better fit their usage
Rename "visited_tsid" and "visited_ccx" to "touse_tsid" and
"touse_ccx". Unlike visited_ccx_set and visited_cl_set, they are
not there to remember which tsid/ccx we already visited, they are
there to know which tsid/ccx we should use, so make that clear.
2026-01-05 09:25:48 +01:00
Olivier Houchard
bbf5c30a87 MINOR: cpu-topo: Factorize code
Factorize the code common to cpu_policy_group_by_ccx() and
cpu_policy_group_by_cluster() into a new function,
cpu_policy_assign_threads().
2026-01-05 09:24:44 +01:00
Alexander Stephan
e241144e70 MINOR: mworker/cli: extract worker "show proc" row printer
Introduce cli_append_worker_row() to centralize formatting of a single
worker row. Also, replace duplicated row-printing code in both current
and old workers loops with the helper. Motivation: Reduces LOC and
improves readability by removing duplication.
2026-01-05 08:59:45 +01:00
Alexander Stephan
4c10d9c70c BUG/MINOR: mworker/cli: fix show proc pagination using reload counter
After commit 594408cd612b5 ("BUG/MINOR: mworker/cli: 'show proc' is limited
by buffer size"), related to ticket #3204, the "show proc" logic
has been fixed to be able to print more than 202 processes. However, this
fix can lead to the omission of entries in case they have the same
timestamp.

To fix this, we use the unique reload counter instead of the timestamp.
On partial flush, set ctx->next_reload = child->reloads.
On resume skip entries with child->reloads >= ctx->next_reload.
Finally, we clear ctx->next_reload at the end of a complete dump so
subsequent show proc starts from the top.

Could be backported in all stable branches.
2026-01-05 08:59:34 +01:00
Alexander Stephan
a5f274de92 CLEANUP: mworker: remove duplicate list.h include
Drop the second #include <haproxy/list.h> from mworker.c.
No functional change; reduces redundancy and keeps includes tidy.
2026-01-05 08:59:34 +01:00
Alexander Stephan
c30eeb2967 MINOR: mworker/cli: only keep positive PIDs in proc_list
Change mworker_env_to_proc_list() to check if (child->pid > 0) before
LIST_APPEND, avoiding invalid PIDs (0/-1) in the process list.
This has no functional impact beyond stricter validation and it aligns
with existing kill safeguards.
2026-01-05 08:59:14 +01:00
Willy Tarreau
6970c8b8b6 DOC: config: fix the length attribute name for stick tables of type binary / string
The stick-table doc was reworked and moved in 3.2 with commit da67a89f3
("DOC: config: move stick-tables and peers to their own section"), however
the optional length attribute for binary/string types was mistakenly
spelled "length" while it's "len".

This must be backported to 3.2.
2026-01-01 10:52:50 +01:00
Willy Tarreau
a206f85f96 MINOR: net_helper: add an option to ip.fp() to append the source address
The new value 4 makes it possible to append the source address to the
fingerprint, making it easier to build rules checking a specific path.
2026-01-01 10:32:16 +01:00
Willy Tarreau
70ffae3614 MINOR: net_helper: add an option to ip.fp() to append the TTL to the fingerprint
With mode value 1, the TTL will be appended immediately after the 7 bytes,
making it an 8-byte fingerprint.
2026-01-01 10:19:48 +01:00
Willy Tarreau
2c317cfed7 MINOR: net_helper: prepare the ip.fp() converter to support more options
It can make sense to support extra components in the fingerprint to ease
configuration, so let's change the 0/1 value to a bit field. We also turn
the current 1 (TCP options list) to 2 so that we'll reuse 1 for the TTL.
2026-01-01 10:19:20 +01:00
Willy Tarreau
e88e03a6e4 MINOR: net_helper: add ip.fp() to build a simplified fingerprint of a SYN
Here we collect all the stuff that depends on the sender's settings,
such as TOS, IP version, TTL range, presence of DF bit or IP options,
presence of DATA in the SYN, CWR+ECE flags, TCP header length, wscale,
initial window, mss, as well as the list of TCP extension kinds. It's
obviously fairly limited but can help avoid blacklisting certain
valid clients sharing the same IP address as a misbehaving one.

It supports both a short and a long mode depending on the argument.
These can be used with the tcp-ss bind option. The doc was updated
accordingly.
2025-12-31 17:17:38 +01:00
Willy Tarreau
6e46d1345b MINOR: net_helper: add sample converters to decode TCP headers
This adds the following converters, used to decode fields
in an incoming TCP header:

   tcp.dst, tcp.flags, tcp.seq, tcp.src, tcp.win,
   tcp.options.mss, tcp.options.tsopt, tcp.options.tsval,
   tcp.options.wscale, tcp.options_list

These can be used with the tcp-ss bind option. The doc was updated
accordingly.
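
For readers unfamiliar with the layout, here is a minimal sketch of how
the fixed part of a TCP header maps to such fields. It is not the
net_helper code: tcp_fields and tcp_decode() are hypothetical names,
options are not parsed, and the format returned by the real converters
may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tcp_fields {
	uint16_t src, dst;  /* ports */
	uint32_t seq;       /* sequence number */
	uint8_t  flags;     /* CWR ECE URG ACK PSH RST SYN FIN */
	uint16_t win;       /* window */
};

/* Read big-endian 16-bit and 32-bit values. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | p[3];
}

/* Decode the 20-byte fixed TCP header. Returns 0, or -1 if too short. */
static int tcp_decode(const uint8_t *hdr, size_t len, struct tcp_fields *out)
{
	assert(hdr != NULL && out != NULL);
	if (len < 20)
		return -1;
	out->src   = rd16(hdr);
	out->dst   = rd16(hdr + 2);
	out->seq   = rd32(hdr + 4);
	out->flags = hdr[13];
	out->win   = rd16(hdr + 14);
	return 0;
}
```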
2025-12-31 17:17:23 +01:00
Willy Tarreau
e0a7a7ca43 MINOR: net_helper: add sample converters to decode IP packet headers
This adds a few converters that help decode parts of IP packets:
  - ip.data : returns the next header (typically TCP)
  - ip.df   : returns the don't-fragment flag
  - ip.dst  : returns the destination IPv4/v6 address
  - ip.hdr  : returns only the IP header
  - ip.proto: returns the upper level protocol (udp/tcp)
  - ip.src  : returns the source IPv4/v6 address
  - ip.tos  : returns the TOS / TC field
  - ip.ttl  : returns the TTL/HL value
  - ip.ver  : returns the IP version (4 or 6)

These can be used with the tcp-ss bind option. The doc was updated
accordingly.
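
As an illustration of where these values live, the sketch below decodes
a few of them from an IPv4 header. It is an assumption-laden example and
not the converters' implementation: ip_fields and ip4_decode() are
hypothetical names, and IPv6 is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ip_fields {
	uint8_t ver;    /* IP version */
	uint8_t tos;    /* TOS byte */
	uint8_t ttl;    /* time to live */
	uint8_t proto;  /* upper level protocol (6 = TCP, 17 = UDP) */
	int df;         /* 1 if the don't-fragment bit is set */
};

/* Decode an IPv4 header. Returns 0, or -1 if too short or not IPv4. */
static int ip4_decode(const uint8_t *hdr, size_t len, struct ip_fields *out)
{
	assert(hdr != NULL && out != NULL);
	if (len < 20 || (hdr[0] >> 4) != 4)
		return -1;
	out->ver   = hdr[0] >> 4;
	out->tos   = hdr[1];
	out->df    = (hdr[6] & 0x40) != 0;
	out->ttl   = hdr[8];
	out->proto = hdr[9];
	return 0;
}
```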
2025-12-31 17:16:29 +01:00
Willy Tarreau
90d2f157f2 MINOR: net_helper: add sample converters to decode ethernet frames
This adds a few converters that help decode parts of ethernet frame
headers:
  - eth.data : returns the next header (typically IP)
  - eth.dst  : returns the destination MAC address
  - eth.hdr  : returns only the ethernet header
  - eth.proto: returns the ethernet proto
  - eth.src  : returns the source MAC address
  - eth.vlan : returns the VLAN ID when present

These can be used with the tcp-ss bind option. The doc was updated
accordingly.
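
The frame layout these converters rely on can be sketched as below.
This is a hypothetical example (eth_fields and eth_decode() are invented
names) that handles a single optional 802.1Q tag, which may be less than
what the real converters support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct eth_fields {
	uint8_t  dst[6], src[6];  /* MAC addresses */
	uint16_t proto;           /* ethertype of the payload */
	int      vlan;            /* VLAN ID, or -1 when absent */
};

/* Decode an ethernet header. Returns 0, or -1 if the frame is too short. */
static int eth_decode(const uint8_t *frm, size_t len, struct eth_fields *out)
{
	assert(frm != NULL && out != NULL);
	if (len < 14)
		return -1;
	memcpy(out->dst, frm, 6);
	memcpy(out->src, frm + 6, 6);
	out->vlan  = -1;
	out->proto = (uint16_t)((frm[12] << 8) | frm[13]);
	if (out->proto == 0x8100) {
		/* 802.1Q tag: 16-bit TCI, then the real ethertype */
		if (len < 18)
			return -1;
		out->vlan  = ((frm[14] << 8) | frm[15]) & 0x0fff;
		out->proto = (uint16_t)((frm[16] << 8) | frm[17]);
	}
	return 0;
}
```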
2025-12-31 17:15:36 +01:00
Willy Tarreau
933cb76461 BUG/MINOR: backend: inspect request not response buffer to check for TFO
In 2.6, do_connect_server() was introduced by commit 0a4dcb65f ("MINOR:
stream-int/backend: Move si_connect() in the backend scope") and changed
the approach to work with a stream instead of a stream-interface. However
si_oc(si) was wrongly turned to &s->res instead of &s->req, which breaks
TFO by always inspecting the response channel to figure out whether
data are pending.

This fix can be backported to all versions till 2.6.
2025-12-31 13:03:53 +01:00
Willy Tarreau
799653d536 BUG/MINOR: backend: fix the conn_retries check for TFO
In 2.6, the retries counter on a stream was changed from retries left
to retries done via commit 731c8e6cf ("MINOR: stream: Simplify retries
counter calculation"). However, the comparison used to detect whether
or not we can use TFO (only on the first attempt) fell through the
cracks, resulting in TFO never working anymore.

This may be backported to all versions till 2.6.
2025-12-31 13:03:53 +01:00
Maxime Henrion
51592f7a09 BUG/MAJOR: set the correct generation ID in pat_ref_append().
This fixes crashes when creating more than one new revision of a map or
acl file and purging the previous version.
2025-12-31 00:29:47 +01:00
Olivier Houchard
54f59e4669 BUG/MEDIUM: cpu-topo: Don't forget to reset visited_ccx.
We want to reset visited_ccx, as introduced by commit
8aef5bec1ef57eac449298823843d6cc08545745, each time we run the loop,
otherwise the chances of its content being correct are very low, and
we will likely end up being bound to the wrong threads.
This was reported in GitHub issue #3224.
2025-12-26 23:55:57 +01:00
Ilia Shipitsin
f8a77ecf62 CLEANUP: assorted typo fixes in the code, commits and doc 2025-12-25 19:45:29 +01:00
Willy Tarreau
6fb521d2f6 MINOR: tcp_sample: implement the fc_saved_syn sample fetch function
This function retrieves the copy of a SYN packet that the system has
kept for us when bind option "tcp-ss" was set to 1 or above. It's
recommended to copy it to a local variable because it will be freed
after being read. It makes it possible to inspect all parts of an
incoming SYN packet, provided that it was preserved (e.g. not possible
with SYN cookies). The doc provides examples of how to use it.
2025-12-24 18:39:37 +01:00
Willy Tarreau
52d60bf9ee MINOR: tcp: implement the get_opt() function
It relies on the generic sock_conn_get_opt() function and will permit
sample fetch functions to retrieve generic TCP-level info.
2025-12-24 18:38:51 +01:00
Willy Tarreau
6d995e59e9 MINOR: protocol: support a generic way to call getsockopt() on a connection
It's regularly needed to call getsockopt() on a connection, but each
time the calling code has to do all the work by itself. This commit adds
a "get_opt()" callback on the protocol struct, that directly calls
getsockopt() on the connection's FD. A generic implementation for
standard sockets is provided, though QUIC would likely require a
different approach, or maybe a mapping. Due to the overlap between
IP/TCP/socket option values, it is necessary for the caller to indicate
both the level and the option. An abstraction of the level could be
done, but the caller would nonetheless have to know the optname, which
is generally defined in the same include files. So for now we'll
consider that this callback is only for very specific use.

The levels and optnames are purposely passed as signed ints so that it
is possible to further extend the API by using negative levels for
internal namespaces.
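
The shape of such a callback can be sketched as follows. The structures
are reduced, hypothetical stand-ins (conn, proto, sock_get_opt() are
invented names), and the return convention chosen here (option length,
or -1 on error) is an assumption and not necessarily the one of the real
API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Reduced stand-ins for a connection and a protocol descriptor. */
struct conn {
	int fd;
};

struct proto {
	/* level and optname are signed so negative levels stay available */
	int (*get_opt)(const struct conn *c, int level, int optname,
	               void *buf, int size);
};

/* Generic implementation for regular sockets.
 * Returns the option length, or -1 on error.
 */
static int sock_get_opt(const struct conn *c, int level, int optname,
                        void *buf, int size)
{
	socklen_t len = (socklen_t)size;

	if (level < 0)
		return -1;  /* reserved for internal namespaces, unused here */
	if (getsockopt(c->fd, level, optname, buf, &len) < 0)
		return -1;
	return (int)len;
}

static const struct proto proto_sock = { .get_opt = sock_get_opt };
```

The caller passes both the level and the option name, for example
SOL_SOCKET and SO_TYPE, exactly as it would with getsockopt().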
2025-12-24 18:38:51 +01:00
Willy Tarreau
44c67a08dd MINOR: tcp: add new bind option "tcp-ss" to instruct the kernel to save the SYN
This option enables TCP_SAVE_SYN on the listening socket, which will
cause the kernel to try to save a copy of the SYN packet header (L2,
IP and TCP are supported). This makes it possible to check the source MAC
address of a client, or find certain TCP options such as a source
address encapsulated using RFC7974. It could also be used as an
alternate approach to retrieving the source and destination addresses
and ports. For now setting the option is enabled, but sample fetch
functions and converters will be needed to extract info.
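
At the socket level this relies on two Linux options, which can be
sketched as below. enable_save_syn() and read_saved_syn() are
hypothetical helpers, not haproxy functions, and passing the bind
option's value straight to the kernel is an assumption made for the
example.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux values, only needed with older libc headers. */
#ifndef TCP_SAVE_SYN
#define TCP_SAVE_SYN 27
#endif
#ifndef TCP_SAVED_SYN
#define TCP_SAVED_SYN 28
#endif

/* Ask the kernel to keep the SYN headers of connections accepted on
 * <listen_fd>. Returns 0 on success, -1 on error.
 */
static int enable_save_syn(int listen_fd, int mode)
{
	return setsockopt(listen_fd, IPPROTO_TCP, TCP_SAVE_SYN,
	                  &mode, sizeof(mode));
}

/* Fetch the saved SYN of an accepted connection into <buf>.
 * Returns its length, or -1 on error.
 */
static int read_saved_syn(int conn_fd, void *buf, int size)
{
	socklen_t len = (socklen_t)size;

	if (getsockopt(conn_fd, IPPROTO_TCP, TCP_SAVED_SYN, buf, &len) < 0)
		return -1;
	return (int)len;
}
```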
2025-12-24 11:35:09 +01:00
Maxime Henrion
1fdccbe8da OPTIM: patterns: cache the current generation
This makes a significant difference when loading large files and during
commit and clear operations, thanks to improved cache locality. In the
measurements below, master refers to the code before any of the changes
to the patterns code, not the code before this one commit.

Timing the replacement of 10M entries from the CLI with this command
which also reports timestamps at start, end of upload and end of clear:

  $ (echo "prompt i"; echo "show activity"; echo "prepare acl #0";
     awk '{print "add acl @1 #0",$0}' < bad-ip.map; echo "show activity";
     echo "commit acl @1 #0"; echo "clear acl @0 #0";echo "show activity") |
    socat -t 10 - /tmp/sock1 | grep ^uptim

master, on a 3.7 GHz EPYC, 3 samples:

  uptime_now: 6.087030
  uptime_now: 25.981777  => 19.9 sec insertion time
  uptime_now: 29.286368  => 3.3 sec commit+clear

  uptime_now: 5.748087
  uptime_now: 25.740675  => 20.0s insertion time
  uptime_now: 29.039023  => 3.3 s commit+clear

  uptime_now: 7.065362
  uptime_now: 26.769596  => 19.7s insertion time
  uptime_now: 30.065044  => 3.3s commit+clear

And after this commit:

  uptime_now: 6.119215
  uptime_now: 25.023019  => 18.9 sec insertion time
  uptime_now: 27.155503  => 2.1 sec commit+clear

  uptime_now: 5.675931
  uptime_now: 24.551035  => 18.9s insertion
  uptime_now: 26.652352  => 2.1s commit+clear

  uptime_now: 6.722256
  uptime_now: 25.593952  => 18.9s insertion
  uptime_now: 27.724153  => 2.1s commit+clear

Now timing the startup time with a 10M entries file (on another machine)
on master, 20 samples:

Standard Deviation, s: 0.061652677408033
Mean:        4.217

And after this commit:

Standard Deviation, s: 0.081821371548669
Mean:        3.78
2025-12-23 21:17:39 +01:00
Maxime Henrion
99e625a41d CLEANUP: patterns: remove dead code
Situations where we are iterating over elements and find one with a
different generation ID cannot arise anymore since the elements are kept
per-generation.
2025-12-23 21:17:39 +01:00
Maxime Henrion
545cf59b6f MEDIUM: patterns: reorganize pattern reference elements
Instead of a global list (and tree) of pattern reference elements, we
now have an intermediate pat_ref_gen structure and store the elements in
those. This simplifies the logic of some operations such as commit and
clear, and improves performance in some cases - numbers to be provided
in a subsequent commit after one important optimization is added.

A lot of the changes are due to adding an extra level of indirection,
changing many cases where we iterate over all elements to an outer loop
iterating over the generation and an inner one iterating over the
elements of the current generation. It is therefore easier to read this
patch using 'git diff -w'.
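
The extra level of indirection can be pictured with the simplified
sketch below. It uses plain singly-linked lists and hypothetical names
(pat_elt, pat_gen, pat_ref_sketch), whereas the real code also relies on
trees, so it only shows the outer/inner loop structure.

```c
#include <assert.h>
#include <stddef.h>

struct pat_elt {
	const char *key;
	struct pat_elt *next;
};

/* Intermediate level: one object per generation, owning its elements. */
struct pat_gen {
	unsigned int gen_id;
	struct pat_elt *elts;
	struct pat_gen *next;
};

struct pat_ref_sketch {
	struct pat_gen *gens;
};

/* Visit all elements: outer loop on generations, inner loop on elements. */
static size_t count_all(const struct pat_ref_sketch *ref)
{
	const struct pat_gen *gen;
	const struct pat_elt *elt;
	size_t n = 0;

	assert(ref != NULL);
	for (gen = ref->gens; gen; gen = gen->next)
		for (elt = gen->elts; elt; elt = elt->next)
			n++;
	return n;
}

/* Visit one generation only: other generations are never walked. */
static size_t count_gen(const struct pat_ref_sketch *ref, unsigned int gen_id)
{
	const struct pat_gen *gen;
	const struct pat_elt *elt;
	size_t n = 0;

	assert(ref != NULL);
	for (gen = ref->gens; gen; gen = gen->next) {
		if (gen->gen_id != gen_id)
			continue;
		for (elt = gen->elts; elt; elt = elt->next)
			n++;
		break;
	}
	return n;
}
```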
2025-12-23 21:17:39 +01:00
Maxime Henrion
5547bedebb MINOR: patterns: preliminary changes for reorganization
Safe and non-functional changes that only add currently unused
structures, field, functions and macros, in preparation of larger
changes that alter the way pattern reference elements are stored.

This includes code to create and lookup generation objects, and
macros to iterate over the generations of a pattern reference.
2025-12-23 21:17:39 +01:00
Amaury Denoyelle
a4a17eb366 OPTIM/MINOR: proxy: do not init proxy management task if unused
Each proxy has its own task for internal purposes. Currently, it is
only used either by frontends or if a stick-table is present.

This commit makes the task allocation optional, so that it is only done
when required. Thus, it is not allocated anymore for backend-only
proxies without stick-table.
2025-12-23 16:35:49 +01:00
Amaury Denoyelle
c397f6fc9a MINOR: cfgparse: remove useless checks on no server in backend
A legacy check could be activated at compile time to reject backends
without servers. In practice this is not used anymore and does not make
much sense with the introduction of dynamic servers.
2025-12-23 16:35:49 +01:00
Amaury Denoyelle
b562602044 MEDIUM: cfgparse: acknowledge that proxy ID auto numbering starts at 2
Each frontend/backend/listen proxy is assigned a unique ID. It can
either be set explicitly via the 'id' keyword, or automatically assigned
on post parsing depending on the available values.

It was expected that the first automatically assigned value would start
at '1'. However, due to a legacy bug this is not the case as this value
is always skipped. Thus, automatically assigned proxies always start at
'2' or more.

To avoid breaking the current existing state, this situation is now
acknowledged with the current patch. The code is rewritten with an
explicit warning to ensure that this won't be fixed without knowing the
current status. A new regtest also ensures this.
2025-12-23 16:35:49 +01:00
Willy Tarreau
5904f8279b MINOR: mux-h1: perform a graceful close at 75% glitches threshold
This avoids hitting the hard wall for connections with non-compliant
peers that are accumulating errors. We recycle the connection early
enough to allow the counter to be reset. Example below with a threshold
set to 100:

Before, 1% errors:
  $ h1load -H "Host : blah" -c 1 -n 10000000 0:4445
  #     time conns tot_conn  tot_req      tot_bytes    err  cps  rps  bps   ttfb
           1     1     1039   103872        6763365   1038 1k03 103k 54M1 9.426u
           2     1     2128   212793       14086140   2127 1k08 108k 58M5 8.963u
           3     1     3215   321465       21392137   3214 1k08 108k 58M3 8.982u
           4     1     4307   430684       28735013   4306 1k09 109k 58M6 8.935u
           5     1     5390   538989       36016294   5389 1k08 108k 58M1 9.021u

After, no more errors:
  $ h1load -H "Host : blah" -c 1 -n 10000000 0:4445
  #     time conns tot_conn  tot_req      tot_bytes    err  cps  rps  bps   ttfb
           1     1     1509   113161        7487809      0 1k50 113k 59M9 8.482u
           2     1     3002   225101       15114659      0 1k49 111k 60M9 8.582u
           3     1     4508   338045       22809911      0 1k50 112k 61M5 8.523u
           4     1     5971   447785       30286861      0 1k46 109k 59M7 8.772u
           5     1     7472   560335       37955271      0 1k49 112k 61M2 8.537u
2025-12-20 19:29:37 +01:00
Willy Tarreau
05b457002b MEDIUM: mux-h1: implement basic glitches support
We now count glitches for each parsing error, including those that
have been accepted via accept-unsafe-violations-*. Front and back
are considered and the connection gets killed on error once the
threshold is reached or passed and the CPU usage is beyond the
configured limit (0 by default). This was tested with:

   curl -ivH "host : blah" 0:4445{,,,,,,,,,}

which sends 10 requests to a configuration having a threshold of 5.
The global keywords are named similarly to H2 and quic:

     tune.h1.be.glitches-threshold xxxx
     tune.h1.fe.glitches-threshold xxxx

The glitches count of each connection is also reported when non-null
in the connection dumps (e.g. "show fd").
2025-12-20 19:29:33 +01:00
Willy Tarreau
0901f60cef MINOR: mux-h2: perform a graceful close at 75% glitches threshold
This avoids hitting the hard wall for connections with non-compliant
peers that would be accumulating errors over long connections. We now
allow the connection to be recycled early enough to reset the connection
counter.

This was tested artificially by adding this to h2c_frt_handle_headers():

  h2c_report_glitch(h2c, 1, "new stream");

or this to h2_detach():

  h2c_report_glitch(h2c, 1, "detaching");

and injecting using h2load -c 1 -n 1000 0:4445 on a config featuring
tune.h2.fe.glitches-threshold 1000:

  finished in 8.74ms, 85802.54 req/s, 686.62MB/s
  requests: 1000 total, 751 started, 751 done, 750 succeeded, 250 failed, 250 errored, 0 timeout
  status codes: 750 2xx, 0 3xx, 0 4xx, 0 5xx
  traffic: 6.00MB (6293303) total, 132.57KB (135750) headers (space savings 29.84%), 5.86MB (6144000) data
                       min         max         mean         sd        +/- sd
  time for request:        9us       178us        10us         6us    99.47%
  time for connect:      139us       139us       139us         0us   100.00%
  time to 1st byte:      339us       339us       339us         0us   100.00%
  req/s           :   87477.70    87477.70    87477.70        0.00   100.00%

The failures are due to h2load not supporting reconnection.
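
The numbers above match a soft limit at 75% of the threshold: with one
glitch per stream and a threshold of 1000, the connection is recycled
around the 750th stream. A minimal sketch of such a decision follows.
glitch_action and glitch_check() are hypothetical names, the exact
rounding and the treatment of a zero threshold as "disabled" are
assumptions, and the CPU usage condition is ignored.

```c
#include <assert.h>

enum glitch_action {
	GLITCH_NONE,            /* keep going */
	GLITCH_GRACEFUL_CLOSE,  /* recycle the connection gracefully */
	GLITCH_KILL             /* hard limit reached */
};

/* Decide what to do with a connection having reported <glitches>. */
static enum glitch_action glitch_check(int glitches, int threshold)
{
	if (threshold <= 0)
		return GLITCH_NONE;
	if (glitches >= threshold)
		return GLITCH_KILL;
	/* 75% of the threshold, computed without overflow */
	if ((long long)glitches * 4 >= (long long)threshold * 3)
		return GLITCH_GRACEFUL_CLOSE;
	return GLITCH_NONE;
}
```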
2025-12-20 19:26:29 +01:00
Willy Tarreau
52adeef7e1 MINOR: mux-h2: add missing glitch count for non-decodable H2 headers
One rare error case, where a protocol error is produced on the stream
because response headers cannot be decoded, wasn't being accounted as a
glitch, so let's fix it.
2025-12-20 19:11:16 +01:00
Maxime Henrion
c8750e4e9d MINOR: tools: add a secure implementation of memset
This guarantees that the compiler will not optimize away the memset()
call if it detects a dead store.

Use this to clear SSL passphrases.

No backport needed.
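
One common way to obtain this guarantee is to call memset() through a
volatile function pointer, so that the compiler cannot prove which
function is called and therefore cannot drop the call. The sketch below
shows that technique only; it is not necessarily the one used by this
commit, and memset_ptr and secure_clear() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The pointer is volatile: it must be reloaded at each call, so the
 * call below cannot be removed as a dead store.
 */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

/* Zero <len> bytes at <buf>, even if <buf> is never read again. */
static void secure_clear(void *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	if (len)
		memset_ptr(buf, 0, len);
}
```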
2025-12-19 17:42:57 +01:00
Willy Tarreau
bd92f34f02 DOC: config: fix number of values for "cpu-affinity"
It said "accepts 2 values" then went on enumerating 5, since more were
added one at a time. Let's fix it by removing the number. No backport
is needed.
2025-12-19 11:21:09 +01:00
William Lallemand
03340748de BUG/MINOR: cpu-topo: fix -Wlogical-not-parentheses build with clang
src/cpu_topo.c:1325:15: warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses]
 1325 |                         } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
      |                                    ^                      ~
src/cpu_topo.c:1325:15: note: add parentheses after the '!' to evaluate the bitwise operator first
 1325 |                         } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
      |                                    ^
      |                                     (                                                     )
src/cpu_topo.c:1325:15: note: add parentheses around left hand side expression to silence this warning
 1325 |                         } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
      |                                    ^
      |                                    (                     )
src/cpu_topo.c:1533:15: warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses]
 1533 |                         } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
      |                                    ^                      ~
src/cpu_topo.c:1533:15: note: add parentheses after the '!' to evaluate the bitwise operator first
 1533 |                         } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
      |                                    ^
      |                                     (                                                     )
src/cpu_topo.c:1533:15: note: add parentheses around left hand side expression to silence this warning
 1533 |                         } else if (!cpu_policy_conf.flags & CPU_POLICY_ONE_THREAD_PER_CORE)
      |                                    ^
      |                                    (                     )

No backport needed.
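
The warning is about operator precedence: '!' binds tighter than '&'.
The sketch below contrasts the two readings with a hypothetical flag
value; the real value of CPU_POLICY_ONE_THREAD_PER_CORE is not shown in
this log, and the second form is presumably what was intended.

```c
#include <assert.h>

#define FLAG_ONE_THREAD_PER_CORE 0x1u  /* hypothetical value */

/* What the compiler sees in "!flags & FLAG": (!flags) & FLAG. */
static int flag_unset_as_parsed(unsigned int flags)
{
	return (int)((unsigned int)(!flags) & FLAG_ONE_THREAD_PER_CORE);
}

/* What is meant: test the bit first, then negate. */
static int flag_unset_intended(unsigned int flags)
{
	return !(flags & FLAG_ONE_THREAD_PER_CORE);
}
```

The two differ as soon as another flag is set while the tested one is
not: with flags equal to 0x2, the first returns 0 and the second 1.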
2025-12-19 10:15:17 +01:00
Olivier Houchard
8aef5bec1e MEDIUM: cpu-topo: Add the "per-ccx" cpu_affinity
Add a new cpu-affinity keyword, "per-ccx".
If used, each thread will be bound to all the hardware threads available
in one CCX of the thread group.
2025-12-18 18:52:52 +01:00
Olivier Houchard
c524b181a2 MEDIUM: cpu-topo: Add the "per-thread" cpu_affinity
Add a new cpu-affinity keyword, "per-thread".
If used, each thread will be bound to only one hardware thread of the
thread group.
If used in conjunction with the "threads-per-core 1" cpu_policy, then
each thread will be bound to a different core.
2025-12-18 18:52:52 +01:00
Olivier Houchard
7e22d9c484 MEDIUM: cpu-topo: Add a new "max-threads-per-group" global keyword
Add a new global keyword, max-threads-per-group. It sets the maximum number of
threads a thread group can contain. Unless the number of thread groups
is fixed with "thread-groups", haproxy will just create more thread
groups as needed.
The default and maximum value is 64.
2025-12-18 18:52:52 +01:00
Olivier Houchard
3865f6c5c6 MEDIUM: cpu-topo: Add a "cpu-affinity" option
Add a new global option, "cpu-affinity", which controls how threads are
bound.
It currently accepts three values: "per-core", which will bind one
thread to each hardware thread of a given core, "per-group", which will
use all the available hardware threads of the thread group, and "auto",
the default, which will use "per-group" unless "threads-per-core 1" has
been specified in cpu_policy, in which case it will use "per-core".
2025-12-18 18:52:52 +01:00
Olivier Houchard
3671652bc9 MEDIUM: cpu-topo: Add a "threads-per-core" keyword to cpu-policy
Add a new, optional keyword to "cpu-policy", "threads-per-core".
It takes one argument, "1" or "auto". If "1" is used, then only one
thread per core will be created, no matter how many hardware threads
each core has. If "auto" is used, then one thread will be created per
hardware thread, as is the case by default.

For example: cpu-policy performance threads-per-core 1
2025-12-18 18:52:52 +01:00
Olivier Houchard
58f04b4615 MINOR: cpu-topo: Turn the cpu policy configuration into a struct
Turn the cpu policy configuration into a struct. Right now it just
contains an int, that represents the policy used, but will get more
information soon.
2025-12-18 18:52:52 +01:00
William Lallemand
876b1e8477 REGTESTS: fix error when no test are skipped
Since commit 1ed2c9d ("REGTESTS: list all skipped tests including
'feature cmd' ones"), the script emits an error when trying to display
the list of skipped tests when there are none.

No backport needed.
2025-12-18 17:26:50 +01:00
Willy Tarreau
9a046fc3ad BUG/MEDIUM: mux-h2: synchronize all conditions to create a new backend stream
In H2 the conditions to create a new stream differ for a client and a
server when a GOAWAY was exchanged. While on the server, any stream
whose ID is lower than or equal to the one advertised in GOAWAY is
valid, for a client it's forbidden to create any stream after receipt
of a GOAWAY, even if its ID is lower than or equal to the last one,
despite the server not being able to tell the difference from the
number of streams in flight.

Unfortunately, the logic in the code did not always reflect this
specificity of the client (the backend code in our case), and most
often considered that it was still permitted to create a new stream
until the max_id was greater than or equal to the advertised last_id.
This is for example what h2c_is_dead() and h2c_streams_left() do. In
other places, such as h2_avail_streams(), the rule is properly taken
into account. Very often the advertised last_id is the same, and this
is also what haproxy does (which explains why it's impossible to
reproduce the issue by chaining two haproxy layers), but a server may
wish to advertise any ID including 2^31-1 as mentioned in the spec,
and in this case the functions would behave differently.

This discrepancy results in a corner case where a GOAWAY received on
an idle connection will cause the next stream creation to be initially
accepted but then rejected via h2_avail_streams(), and the connection
left in a bad state, still attached to the session due to http-reuse
safe, but not reinserted into idle list, since the backend code
currently is not able to properly recover from this situation. Worse,
the idle flags are no longer on it but TASK_F_USR1 still is, and this
makes the recently added BUG_ON() rightfully trigger since this case
is not supposed to happen.

Admittedly more of the backend recovery code needs to be reworked,
however the mux must consistently decide whether or not a connection
may be reused or needs to be released.

This commit fixes the affected logic by introducing a new function
"h2c_reached_last_stream()" which says if a connection has reached its
last stream, regardless of the side, and using this one everywhere
max_id was compared to last_id. This is sufficient to address the
corner case that be_reuse_connection() currently cannot recover from.

This is in relation to GH issue #3215 and it should be sufficient to
fix the issue there. Thanks to Chris Staite for reporting the issue
and kudos to Amaury for spotting the events sequence that can lead
to this situation.

This patch must be backported to 3.3 first, then to older versions
later. It's worth noting that it's much more difficult to observe
the issue before 3.3 because the BUG_ON() is not there, and the
possibly non-released connection might end up being killed for other
reasons (timeouts etc). But one possible visible effect might be the
impossibility to delete a server (which Chris observed in 3.3).
2025-12-18 17:01:32 +01:00
William Lallemand
9c8925ba0d CI: github: use git prefix for openssl-master.yml
Uses the git- prefix in order to get the latest tarball for the master
branch on github.
2025-12-18 16:13:04 +01:00
Olivier Houchard
40d16af7a6 BUG/MEDIUM: backend: Do not remove CO_FL_SESS_IDLE in assign_server()
Back in the mists of time, commit e91a526c8f decided that if we were trying
to stay on the same server as the previous request, and if there were
a connection available in the session, we'd remove its CO_FL_SESS_IDLE.
The reason for doing that has been long lost, probably it fixed a bug at some
point, but it was most probably not the right place to do that. And starting
with 3.3, this triggers a BUG_ON() because that flag is expected later on.
So just revert the commit; if the ancient bug shows up again, it will be
fixed another way.

This should be backported to 3.3. There is little reason to backport it
to previous versions, unless other patches depend on it.
2025-12-18 16:09:34 +01:00
William Lallemand
0c7a4469d2 CI: github: openssl-master.yml misses actions/checkout
The job can't run setup-vtest because the actions/checkout use line is
missing.
2025-12-18 16:03:20 +01:00
William Lallemand
38d3c24931 CI: github: add a job to test the master branch of OpenSSL
vtest.yml only builds the releases of OpenSSL for now, so there's no way to
check if we still have issues with the API before a pre-release version
is released.

This job builds the master branch of OpenSSL.

It runs every day at 3 AM.
2025-12-18 15:43:06 +01:00
William Lallemand
a58f09b63c CI: github: remove openssl no-deprecated job
Remove the openssl no-deprecated job which was used for 1.1.0 API.
It's not useful anymore since it uses the OpenSSL version of the
distributions.

Checking deprecations in the API is still useful when using the newest
version of the library. A job for the OpenSSL master branch would be
more useful than that.
2025-12-18 15:22:27 +01:00
William Lallemand
1ed2c9da2c REGTESTS: list all skipped tests including 'feature cmd' ones
The script for running regression tests is modified to improve the
visibility of skipped tests.

Previously, the reasons for skipping tests were only visible during the
test discovery phase when grepping the vtc (REQUIRE, EXCLUDE, etc).
But reg-tests skipped by vtest with the 'feature cmd' keywords were not
listed.

This change introduces the following:
  - vtest does not remove the logs itself anymore, because it is not
    able to keep the log available when a test is skipped. So the -L
    parameter is now always passed to vtest
  - All skipped tests during the discovery phase are now logged to a
    'skipped.log' file within the test directory
  - The script now parses vtest logs to find tests that were skipped
    due to missing features (via the 'feature cmd' in .vtc files)
    and adds them to the skipped list.
2025-12-17 15:54:15 +01:00
Frederic Lecaille
8523a5cde0 REGTESTS: quic: fix a TLS stack usage
This issue was reported in GH #3214 where the quic/tls13_ssl_crt-list_filters.vtc
QUIC reg test was run without haproxy QUIC support because the OPENSSL_AWSLC
feature was enabled.

This is due to the fact that when ssl/tls13_ssl_crt-list_filters.vtc was
ported to QUIC, feature(OPENSSL) was mistakenly replaced by feature(QUIC), leading
the script to be run even without QUIC support if the OR'ed OPENSSL_AWSLC feature is
enabled.

A good method to port these feature() commands to QUIC would have been
to add a feature(QUIC) command separated from the one used for the supported
TLS stacks identified by the original underlying ssl reg tests (in reg-tests/ssl).
This is what is done by this patch.

Thank you to @idl0r for having reported this issue.
2025-12-15 09:44:42 +01:00
Christopher Faulet
a25394b6c8 CLEANUP: ssl-sock: Remove useless tests on connection when resuming TLS session
In ssl_sock_srv_try_reuse_sess(), the connection is always defined, for TCP
and QUIC connections. There is no reason to test it. Because it is not so obvious for
the QUIC part, a BUG_ON() could be added here. For now, just remove useless
tests.

This patch should fix a Coverity report from #3213.
2025-12-15 08:16:59 +01:00
Christopher Faulet
d6b1d5f6e9 CLEANUP: tcpcheck: Remove useless test on the xprt used for healthchecks
The xprt used to perform a healthcheck is always defined and cannot be NULL.
So there is no reason to test it. It could lead to wrong assumptions later
in the code.

This patch should fix a Coverity report from #3213.
2025-12-15 08:01:21 +01:00
Christopher Faulet
5c5914c32e CLEANUP: backend: Remove useless test on server's xprt
The server's xprt is always defined and cannot be NULL. So there is no
reason to test it. It could lead to wrong assumptions later in the code.

This patch should fix a Coverity report from #3213.
2025-12-15 07:56:53 +01:00
Olivier Houchard
a08bc468d2 BUG/MEDIUM: quic: Don't try to use hystart if not implemented
Not every CC algo implements hystart, so only call the method if it is
actually there. Failure to do so will cause crashes if hystart is on
and the algo doesn't implement it.
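
The guard amounts to a NULL check on an optional callback. Below is a minimal sketch, assuming a hypothetical ops table; the structure and member names are illustrative and are not the real QUIC congestion-control API:

```c
#include <stddef.h>

/* Hypothetical congestion-control ops table. */
struct demo_cc_algo {
	void (*on_ack)(int *state);               /* always provided */
	void (*hystart_start_round)(int *state);  /* optional, may be NULL */
};

/* Calls the optional hook only when the algorithm provides it.
 * Returns 1 if the hook ran, 0 if it was skipped. */
static int demo_cc_hystart_start_round(const struct demo_cc_algo *algo, int *state)
{
	if (!algo->hystart_start_round)
		return 0;
	algo->hystart_start_round(state);
	return 1;
}

/* Trivial callback used to exercise the sketch. */
static void demo_inc(int *state)
{
	(*state)++;
}
```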

This should fix github issue #3218

This should be backported up to 3.0.
2025-12-14 16:46:12 +01:00
Christopher Faulet
54e58103e5 BUG/MEDIUM: stconn: Don't report abort from SC if read0 was already received
SC_FL_ABRT_DONE flag should never be set when SC_FL_EOS was already
set. Both flags were introduced to replace the old CF_SHUTR and to
have a flag for shuts driven by the stream and a flag for the read0 received
by the mux. So both flags must not be seen at the same time on a SC. It is
especially important because some processing is performed when these flags
are set, and wrong decisions may be made.
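
The invariant can be pictured with the small sketch below. The flag values and the helper are hypothetical, and the real stconn code is organized differently; this only shows that an abort is not reported once the read0 was seen:

```c
/* Illustrative bit values, not the real ones. */
#define DEMO_SC_FL_EOS       0x1u  /* read0 received by the mux */
#define DEMO_SC_FL_ABRT_DONE 0x2u  /* abort driven by the stream */

/* Returns the updated flags after the stream asks for an abort. */
static unsigned int demo_sc_abort(unsigned int flags)
{
	if (flags & DEMO_SC_FL_EOS)
		return flags;  /* read0 already seen: do not report an abort */
	return flags | DEMO_SC_FL_ABRT_DONE;
}
```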

This patch must be backported as far as 2.8.
2025-12-12 08:41:08 +01:00
Christopher Faulet
a483450fa2 BUG/MEDIUM: http-ana: Properly detect client abort when forwarding response (v2)
The first attempt to fix this issue (c672b2a29 "BUG/MINOR: http-ana:
Properly detect client abort when forwarding the response") was not fully
correct and could be responsible for false reports of client abort during the
response forwarding. I guess it is possible to truncate the response.

Instead, we must also take care that the client closed on its side, by
checking SC_FL_EOS flag on the front SC. Indeed, if the client has aborted,
this flag should be set.

This patch should be backported as far as 2.8.
2025-12-12 08:41:08 +01:00
William Lallemand
5b19d95850 BUG/MEDIUM: mworker/listener: ambiguous use of RX_F_INHERITED with shards
The RX_F_INHERITED flag was ambiguous, as it was used to mark both
listeners inherited from the parent process and listeners duplicated
from another local receiver. This could lead to incorrect behavior
concerning socket unbinding and suspension.

This commit refactors the handling of inherited listeners by splitting
the RX_F_INHERITED flag into two more specific flags:

- RX_F_INHERITED_FD: Indicates a listener inherited from the parent
  process via its file descriptor. These listeners should not be unbound
  by the master.

- RX_F_INHERITED_SOCK: Indicates a listener that shares a socket with
  another one, either by being inherited from the parent or by being
  duplicated from another local listener. These listeners should not be
  suspended or resumed individually.
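
To make the split concrete, here is a minimal sketch of how the two flags could drive the two decisions. The bit values and helper names are hypothetical; only the flag semantics come from the description above:

```c
/* Illustrative bit values; the real ones are not given here. */
#define DEMO_RX_F_INHERITED_FD   0x1u  /* FD inherited from the parent process */
#define DEMO_RX_F_INHERITED_SOCK 0x2u  /* socket shared with another receiver */

/* The master may unbind a receiver only if its FD was not inherited. */
static int demo_master_may_unbind(unsigned int flags)
{
	return !(flags & DEMO_RX_F_INHERITED_FD);
}

/* A receiver may be suspended or resumed individually only if it does
 * not share its socket with another one. */
static int demo_rx_may_suspend(unsigned int flags)
{
	return !(flags & DEMO_RX_F_INHERITED_SOCK);
}
```

Under these assumptions, a shard duplicated from a local receiver carries only the SOCK flag, so the master can still unbind it, which is what avoids the file descriptor leak.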

Previously, the sharding code was unconditionally using RX_F_INHERITED
when duplicating a file descriptor. In HAProxy versions prior to 3.1,
this led to a file descriptor leak for duplicated unix stats sockets in
the master process. This would eventually cause the master to crash with
a BUG_ON in fd_insert() once the file descriptor limit was reached.

This must be backported as far as 3.0. Branches earlier than 3.0 are
affected but would need a different patch as the logic is different.
2025-12-11 18:09:47 +01:00
Willy Tarreau
aed953088e [RELEASE] Released version 3.4-dev1
Released version 3.4-dev1 with the following main changes :
    - BUG/MINOR: jwt: Missing "case" in switch statement
    - DOC: configuration: ECH support details
    - Revert "MINOR: quic: use dynamic cc_algo on bind_conf"
    - MINOR: quic: define quic_cc_algo as const
    - MINOR: quic: extract cc-algo parsing in a dedicated function
    - MINOR: quic: implement cc-algo server keyword
    - BUG/MINOR: quic-be: Missing keywords array NULL termination
    - REGTESTS: ssl enable tls12_reuse.vtc for AWS-LC
    - REGTESTS: ssl: split tls*_reuse in stateless and stateful resume tests
    - BUG/MEDIUM: connection: fix "bc_settings_streams_limit" typo
    - BUG/MEDIUM: config: ignore empty args in skipped blocks
    - DOC: config: mention clearer that the cache's total-max-size is mandatory
    - DOC: config: reorder the cache section's keywords
    - BUG/MINOR: quic/ssl: crash in ClientHello callback ssl traces
    - BUG/MINOR: quic-be: handshake errors without connection stream closure
    - MINOR: quic: Add useful debugging traces in qc_idle_timer_do_rearm()
    - REGTESTS: ssl: Move all the SSL certificates, keys, crt-lists inside "certs" directory
    - REGTESTS: quic/ssl: ssl/del_ssl_crt-list.vtc supported by QUIC
    - REGTESTS: quic: dynamic_server_ssl.vtc supported by QUIC
    - REGTESTS: quic: issuers_chain_path.vtc supported by QUIC
    - REGTESTS: quic: new_del_ssl_cafile.vtc supported by QUIC
    - REGTESTS: quic: ocsp_auto_update.vtc supported by QUIC
    - REGTESTS: quic: set_ssl_bug_2265.vtc supported by QUIC
    - MINOR: quic: avoid code duplication in TLS alert callback
    - BUG/MINOR: quic-be: missing connection stream closure upon TLS alert to send
    - REGTESTS: quic: set_ssl_cafile.vtc supported by QUIC
    - REGTESTS: quic: set_ssl_cert_noext.vtc supported by QUIC
    - REGTESTS: quic: set_ssl_cert.vtc supported by QUIC
    - REGTESTS: quic: set_ssl_crlfile.vtc supported by QUIC
    - REGTESTS: quic: set_ssl_server_cert.vtc supported by QUIC
    - REGTESTS: quic: show_ssl_ocspresponse.vtc supported by QUIC
    - REGTESTS: quic: ssl_client_auth.vtc supported by QUIC
    - REGTESTS: quic: ssl_client_samples.vtc supported by QUIC
    - REGTESTS: quic: ssl_default_server.vtc supported by QUIC
    - REGTESTS: quic: new_del_ssl_crlfile.vtc supported by QUIC
    - REGTESTS: quic: ssl_frontend_samples.vtc supported by QUIC
    - REGTESTS: quic: ssl_server_samples.vtc supported by QUIC
    - REGTESTS: quic: ssl_simple_crt-list.vtc supported by QUIC
    - REGTESTS: quic: ssl_sni_auto.vtc code provision for QUIC
    - REGTESTS: quic: ssl_curve_name.vtc supported by QUIC
    - REGTESTS: quic: add_ssl_crt-list.vtc supported by QUIC
    - REGTESTS: add ssl_ciphersuites.vtc (TCP & QUIC)
    - BUG/MINOR: quic: do not set first the default QUIC curves
    - REGTESTS: quic/ssl: Add ssl_curves_selection.vtc
    - BUG/MINOR: ssl: Don't allow to set NULL sni
    - MEDIUM: quic: Add connection as argument when qc_new_conn() is called
    - MINOR: ssl: Add a function to hash SNIs
    - MINOR: ssl: Store hash of the SNI for cached TLS sessions
    - MINOR: ssl: Compare hashes instead of SNIs when a session is cached
    - MINOR: connection/ssl: Store the SNI hash value in the connection itself
    - MEDIUM: tcpcheck/backend: Get the connection SNI before initializing SSL ctx
    - BUG/MEDIUM: ssl: Don't reuse TLS session if the connection's SNI differs
    - MEDIUM: ssl/server: No longer store the SNI of cached TLS sessions
    - BUG/MINOR: log: Dump good %B and %U values in logs
    - BUG/MEDIUM: http-ana: Don't close server connection on read0 in TUNNEL mode
    - DOC: config: Fix description of the spop mode
    - DOC: config: Improve spop mode documentation
    - MINOR: ssl: Split ssl_crt-list_filters.vtc in two files by TLS version
    - REGTESTS: quic: tls13_ssl_crt-list_filters.vtc supported by QUIC
    - BUG/MEDIUM: h3: do not access QCS <sd> if not allocated
    - CLEANUP: mworker/cli: remove useless variable
    - BUG/MINOR: mworker/cli: 'show proc' is limited by buffer size
    - BUG/MEDIUM: ssl: Always check the ALPN after handshake
    - MINOR: connections: Add a new CO_FL_SSL_NO_CACHED_INFO flag
    - BUG/MEDIUM: ssl: Don't store the ALPN for check connections
    - BUG/MEDIUM: ssl: Don't resume session for check connections
    - CLEANUP: improvements to the alignment macros
    - CLEANUP: use the automatic alignment feature
    - CLEANUP: more conversions and cleanups for alignment
    - BUG/MEDIUM: h3: fix access to QCS <sd> definitely
    - MINOR: h2/trace: emit a trace of the received RST_STREAM type
2025-12-10 16:52:30 +01:00
Willy Tarreau
3ec5818807 MINOR: h2/trace: emit a trace of the received RST_STREAM type
Right now we don't get any state trace when receiving an RST_STREAM, and
this is not convenient because RST_STREAM(0) is not visible at all, except
in developer level because the function is entered and left.

Let's extract the RST code first and always log it using TRACE_PRINTF()
(along with h2c/h2s) so that it's possible to detect certain codes being
used.
2025-12-10 15:58:56 +01:00
Amaury Denoyelle
5b8e6d6811 BUG/MEDIUM: h3: fix access to QCS <sd> definitely
The previous patch tried to fix access to QCS <sd> member, as the latter
is not always allocated anymore on the frontend side.

  a15f0461a016a664427f5aaad2227adcc622c882
  BUG/MEDIUM: h3: do not access QCS <sd> if not allocated

In particular, access was prevented after HEADERS parsing in case
h3_req_headers_to_htx() returned an error, which indicates that the
stream-endpoint allocation was not performed. However, this still is not
enough when QCS instance is already closed at this step. Indeed, in this
case, h3_req_headers_to_htx() returns OK but stream-endpoint allocation
is skipped as an optimization as no data exchange will be performed.

To definitely fix this kind of problem, add checks on qcs <sd> member
before accessing it in H3 layer. This method is the safest one to ensure
there is no NULL dereference.
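
A minimal sketch of such a check, using hypothetical simplified structures (the "kip" field name and the helper are assumptions made for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stream-endpoint descriptor. */
struct demo_sedesc {
	uint64_t kip;  /* known input payload length */
};

/* Hypothetical QCS with an optionally allocated <sd>. */
struct demo_qcs {
	struct demo_sedesc *sd;  /* NULL until attached to an upper stream */
};

/* Updates the known input length only if <sd> is allocated.
 * Returns 1 if the update was done, 0 if it was skipped. */
static int demo_qcs_add_known_input(struct demo_qcs *qcs, uint64_t len)
{
	if (!qcs->sd)
		return 0;
	qcs->sd->kip += len;
	return 1;
}
```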

This should fix github issue #3211.

This must be backported along with the above mentioned patch.
2025-12-10 12:04:37 +01:00
Maxime Henrion
6eedd0d485 CLEANUP: more conversions and cleanups for alignment
- Convert additional cases to use the automatic alignment feature for
  the THREAD_ALIGN(ED) macros. This includes some cases that are less
  obviously correct where it seems we wanted to align only in the
  USE_THREAD case but were not using the thread specific macros.
- Also move some alignment requirements to the structure definition
  instead of having it on variable declaration.
2025-12-09 17:40:58 +01:00
Maxime Henrion
bc8e14ec23 CLEANUP: use the automatic alignment feature
- Use the automatic alignment feature instead of hardcoding 64 all over
  the code.
- This also converts a few bare __attribute__((aligned(X))) to using the
  ALIGNED macro.
2025-12-09 17:14:58 +01:00
Maxime Henrion
74719dc457 CLEANUP: improvements to the alignment macros
- It is now possible to use the THREAD_ALIGN and THREAD_ALIGNED macros
  without a parameter. In this case, we automatically align on the cache
  line size.
- The cache line size is set to 64 by default to match the current code,
  but it can be overridden on the command line.
- This required moving the DEFVAL/DEFNULL/DEFZERO macros to compiler.h
  instead of tools-t.h, to avoid namespace pollution if we included
  tools-t.h from compiler.h.
2025-12-09 17:05:52 +01:00
Olivier Houchard
420b42df1c BUG/MEDIUM: ssl: Don't resume session for check connections
Don't attempt to use stored sessions when creating new check
connections, as the check SSL parameters might be different from the
server's ones.
This has not been proven to be a problem yet, but it doesn't mean it
can't be, and this should be backported up to 2.8 along with
dcce9369129f6ca9b8eed6b451c0e20c226af2e3 if it is.
2025-12-09 16:45:54 +01:00
Olivier Houchard
be4e1220c2 BUG/MEDIUM: ssl: Don't store the ALPN for check connections
When establishing check connections, do not store the negotiated ALPN
into the server's path_param if the connection is a check connection, as
it may use different SSL parameters than the regular connections. To do
so, only store them if the CO_FL_SSL_NO_CACHED_INFO is not set.
Otherwise, the check ALPN may be stored, and the wrong mux can be used
for regular connections, which will end up generating 502s.
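
The condition can be sketched as follows. The flag value, the structure and the helper name are hypothetical; only the rule "store the ALPN unless CO_FL_SSL_NO_CACHED_INFO is set" comes from the text:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative bit value, not the real one. */
#define DEMO_CO_FL_SSL_NO_CACHED_INFO 0x1u

/* Hypothetical per-server storage for the last negotiated ALPN. */
struct demo_path_param {
	char alpn[16];
};

/* Remembers the negotiated ALPN unless the connection opted out of
 * cached info (as check connections do). Returns 1 if stored. */
static int demo_store_alpn(struct demo_path_param *pp, unsigned int conn_flags,
                           const char *alpn)
{
	size_t len = strlen(alpn);

	if (conn_flags & DEMO_CO_FL_SSL_NO_CACHED_INFO)
		return 0;
	if (len >= sizeof(pp->alpn))
		return 0;  /* too long for this demo buffer */
	memcpy(pp->alpn, alpn, len + 1);
	return 1;
}
```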

This should fix Github issue #3207

This should be backported to 3.3.
2025-12-09 16:43:31 +01:00
Olivier Houchard
dcce936912 MINOR: connections: Add a new CO_FL_SSL_NO_CACHED_INFO flag
Add a new flag to connections, CO_FL_SSL_NO_CACHED_INFO, and set it for
checks.
It lets the ssl layer know that it should not use cached information,
such as the ALPN as stored in the server, or cached sessions.
This will be used for checks, as checks may target different servers, or
use a different SSL configuration, so we can't assume the stored
information is correct.

This should be backported to 3.3, and may be backported up to 2.8 if the
attempts to do session resume by checks are proven to be a problem.
2025-12-09 16:43:31 +01:00
Olivier Houchard
260d64d787 BUG/MEDIUM: ssl: Always check the ALPN after handshake
Move the code that is responsible for checking the ALPN, and updating
the one stored in the server's path_param, from after we created the
mux, to after we did a handshake. Once we did it once, the mux will not
be created by the ssl code anymore, as when we know which mux to use
thanks to the ALPN, it will be done earlier in connect_server(), so in
the unlikely event it changes, we would not detect it anymore, and we'd
keep on creating the wrong mux.
This can be reproduced by doing a first request, and then changing the
ALPN of the server without haproxy noticing (i.e. without haproxy noticing
that the server went down).

This should be backported to 3.3.
2025-12-09 16:43:31 +01:00
William Lallemand
594408cd61 BUG/MINOR: mworker/cli: 'show proc' is limited by buffer size
In ticket #3204, it was reported that "show proc" is not able to display
more than 202 processes. Indeed the bufsize is 16k by default in the
master, and can't be changed anymore since 3.1.

This patch allows 'show proc' to resume dumping when the buffer
is full, based on the timestamp of the last PID it attempted to dump.
Using pointers or counting the number of processes might not be a good idea
since the list can change between calls.

Could be backported to all stable branches.
2025-12-09 16:09:10 +01:00
William Lallemand
dabe8856ad CLEANUP: mworker/cli: remove useless variable
The msg variable is declared and freed but never used, this patch removes it.
2025-12-09 16:09:10 +01:00
Amaury Denoyelle
a15f0461a0 BUG/MEDIUM: h3: do not access QCS <sd> if not allocated
Since the following commit, allocation of QCS stream-endpoint on FE side
has been delayed. The objective is to allocate it only for QCS attached
to an upper stream object. Stream-endpoint allocation is now performed
on qcs_attach_sc() called during HEADERS parsing.

  commit e6064c561684d9b079e3b5725d38dc3b5c1b5cd5
  OPTIM: mux-quic: delay FE sedesc alloc to stream creation

Also, stream-endpoint is accessed through the QCS instance after HEADERS
or DATA frames parsing, to update the known input payload length. The
above patch triggered regressions as in some code paths, <sd> field is
dereferenced while still being NULL.

This patch fixes this by restricting access to <sd> field after newer
conditions.

First, after HEADERS parsing, known input length is only updated if
h3_req_headers_to_htx() previously returned a success value, which
guarantees that qcs_attach_sc() has been executed.

After DATA parsing, <sd> is only accessed after the frame validity
check. This ensures that HEADERS were already parsed, thus guaranteeing
that stream-endpoint is allocated.

This should fix github issue #3211.

This must be backported up to 3.3. This is sufficient, unless above
patch is backported to previous releases, in which case the current one
must be picked with it.
2025-12-09 15:00:23 +01:00
Frederic Lecaille
18625f7ff3 REGTESTS: quic: tls13_ssl_crt-list_filters.vtc supported by QUIC
ssl/tls13_ssl_crt-list_filters.vtc was renamed to ssl/tls13_ssl_crt-list_filters.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then tls13_ssl_crt-list_filters.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-09 07:42:45 +01:00
Frederic Lecaille
c005ed0df8 MINOR: ssl: Split ssl_crt-list_filters.vtc in two files by TLS version
Separate the section from ssl_crt-list_filters.vtc which supports TLS 1.2 and 1.3
versions to produce tls12_ssl_crt-list_filters.vtc and tls13_ssl_crt-list_filters.vtc.
2025-12-09 07:42:45 +01:00
Christopher Faulet
2fa3b4c3a3 DOC: config: Improve spop mode documentation
The spop mode description was a bit confusing. So let's improve it.

Thanks to @NickMRamirez.

This patch should fix issue #3206. It could be backported as far as 3.1.
2025-12-08 15:24:05 +01:00
Christopher Faulet
e16dcab92f DOC: config: Fix description of the spop mode
It was mentioned that the spop mode turned the backend into a "log"
backend. It is obviously wrong. It turns the backend into a spop backend.

This patch should be backported as far as 3.1.
2025-12-08 15:22:01 +01:00
Christopher Faulet
3cf4e7afb9 BUG/MEDIUM: http-ana: Don't close server connection on read0 in TUNNEL mode
It is a very old bug (2012), dating from the introduction of the keep-alive
support to HAProxy. When a request is fully received, the SC on backend side
is switched to NOHALF mode. It means that when the read0 is received from
the server, the server connection is immediately closed. It is expected to
do so at the end of a classical request. However, it must not be performed
if the session is switched to the TUNNEL mode (after an HTTP/1 upgrade or a
CONNECT). The client may still have data to send to the server, and abruptly
closing the server connection this way will be handled as an error on
client side.

This bug is especially visible with an H2 connection on the client side because a
RST_STREAM is emitted and a "SD--" is reported in logs.

Thanks to @chrisstaite

This patch should fix the issue #3205. It must be backported to all stable
versions.
2025-12-08 15:22:01 +01:00
Christopher Faulet
5d74980277 BUG/MINOR: log: Dump good %B and %U values in logs
When per-stream "bytes_in" and "bytes_out" counters were replaced in 3.3,
the wrong counters were used for %B and %U values in logs. In the
configuration manual and the commit message, it was specified that
"bytes_in" was replaced by "req_in" and "bytes_out" by "res_in", but in the
code, wrong counters were used. It is now fixed.

This patch should fix the issue #3208. It must be backported to 3.3.
2025-12-08 15:22:01 +01:00
Christopher Faulet
be998b590e MEDIUM: ssl/server: No longer store the SNI of cached TLS sessions
Thanks to the previous patch, "BUG/MEDIUM: ssl: Don't reuse TLS session
if the connection's SNI differs", it is now useless to store the SNI of
cached TLS sessions. This SNI is no longer tested and new connections
reusing a session must have the same SNI.

The main change here is for the ssl_sock_set_servername() function. It is no
longer possible to compare the SNI of the reused session with the one of the
new connection. So, the SNI is always set, with no other processing. Mainly,
the session is not destroyed when SNIs don't match. It means the commit
119a4084bf ("BUG/MEDIUM: ssl: for a handshake when server-side SNI changes")
is implicitly reverted.

It is worth noting that it is unclear to me when and why the reused session
should be destroyed, because I'm unable to reproduce any issue fixed by the
commit above.

This patch could be backported as far as 3.0 with the commit above.
2025-12-08 15:22:01 +01:00
Christopher Faulet
5702009c8c BUG/MEDIUM: ssl: Don't reuse TLS session if the connection's SNI differs
When a new SSL server connection is created, if no SNI is set, it is
possible to inherit from the one of the reused TLS session. The bug was
introduced by the commit 95ac5fe4a ("MEDIUM: ssl_sock: always use the SSL's
server name, not the one from the tid"). The mixup is possible between
regular connections but also with health-checks connections.

But it is only the visible part of the bug. If the SNI of the cached TLS
session does not match the one of the new connection, no reuse must be
performed at all.

To fix the bug, the hash of the SNI of the reused session is compared with the
one of the new connection. The TLS session is reused only if the hashes are
the same.
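
A minimal sketch of the comparison is shown below. A seeded FNV-1a is used here purely as a stand-in, since the hash actually used by ssl_sock_sni_hash() is not described in this log; the helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Seeded FNV-1a over the SNI string; returns 0 when no SNI is set.
 * This is a stand-in hash chosen for the example only. */
static uint64_t demo_sni_hash(const char *sni, uint64_t seed)
{
	uint64_t h = 0xcbf29ce484222325ULL ^ seed;

	if (!sni)
		return 0;
	for (; *sni; sni++) {
		h ^= (unsigned char)*sni;
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* The cached TLS session is reused only if both hashes are identical. */
static int demo_may_reuse_session(uint64_t cached_hash, uint64_t conn_hash)
{
	return cached_hash == conn_hash;
}
```

Comparing fixed-size hashes rather than strings avoids keeping the SNI itself alongside the cached session, at the cost of a theoretical collision risk.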

This patch should fix the issue #3195. It must be slowly backported as far
as 3.0. It relies on the following series:

  * MEDIUM: tcpcheck/backend: Get the connection SNI before initializing SSL ctx
  * MINOR: connection/ssl: Store the SNI hash value in the connection itself
  * MEDIUM: ssl: Store hash of the SNI for cached TLS sessions
  * MINOR: ssl: Add a function to hash SNIs
  * MEDIUM: quic: Add connection as argument when qc_new_conn() is called
  * BUG/MINOR: ssl: Don't allow to set NULL sni
2025-12-08 15:22:01 +01:00
Christopher Faulet
7e9d921141 MEDIUM: tcpcheck/backend: Get the connection SNI before initializing SSL ctx
The SNI of a new connection is now retrieved earlier, before the
initialization of the SSL context. So, concretely, it is now performed
before calling conn_prepare(). The SNI is then set just after.
2025-12-08 15:22:01 +01:00
Christopher Faulet
28654f3c9b MINOR: connection/ssl: Store the SNI hash value in the connection itself
When a SNI is set on a new connection, its hash is now saved in the
connection itself. To do so, a dedicated field was added into the connection
structure, called sni_hash. For now, this value is only used when the TLS
session is cached.
2025-12-08 15:22:01 +01:00
Christopher Faulet
92f77cb3e6 MINOR: ssl: Compare hashes instead of SNIs when a session is cached
This patch relies on the commit "MINOR: ssl: Store hash of the SNI for
cached TLS sessions". We now use the hash of the SNIs instead of the SNIs
themselves to know if we must update the cached SNI or not.
2025-12-08 15:22:01 +01:00
Christopher Faulet
9794585204 MINOR: ssl: Store hash of the SNI for cached TLS sessions
For cached TLS sessions, in addition to the SNI itself, its hash is now also
saved. No changes are expected here because this hash is not used for now.

This commit relies on:

  * MINOR: ssl: Add a function to hash SNIs
2025-12-08 15:22:00 +01:00
Christopher Faulet
d993e1eeae MINOR: ssl: Add a function to hash SNIs
This patch only adds the function ssl_sock_sni_hash() that can be used to
get the hash value corresponding to an SNI. A global seed, sni_hash_seed, is
used.
2025-12-08 15:22:00 +01:00
Christopher Faulet
a83ed86b78 MEDIUM: quic: Add connection as argument when qc_new_conn() is called
This patch reverts the commit efe60745b ("MINOR: quic: remove connection arg
from qc_new_conn()"). The connection will be mandatory when the QUIC
connection is created on backend side to fix an issue when we try to reuse a
TLS session.

So, the connection is again an argument of qc_new_conn(), the 4th
argument. It is NULL for frontend QUIC connections but there is no special
check on it.
2025-12-08 15:22:00 +01:00
Christopher Faulet
3534efe798 BUG/MINOR: ssl: Don't allow to set NULL sni
ssl_sock_set_servername() function was documented to support NULL sni to
unset it. However, the man page of SSL_get_servername() does not mention
whether it is supported or not. And it is in fact not supported by WolfSSL and leads
to a crash if we do so.

For now, this function is never called with a NULL sni, so it is better and
safer to forbid this case. Now, if the sni is NULL, the function does
nothing.

This patch could be backported to all stable versions.
2025-12-08 15:22:00 +01:00
Frederic Lecaille
7872260525 REGTESTS: quic/ssl: Add ssl_curves_selection.vtc
This reg test ensures the curves may be correctly set for frontend
and backends by "ssl-default-bind-curves" and "ssl-default-server-curves"
as global options or with "curves" options on "bind" and "server" lines.
2025-12-08 10:40:59 +01:00
Frederic Lecaille
90064ac88b BUG/MINOR: quic: do not set first the default QUIC curves
This patch impacts both the QUIC frontends and backends.

Note that "ssl-default-bind-ciphersuites" and "ssl-default-bind-curves"
are not ignored by QUIC on the frontend side. This is also the case for the
backends with "ssl-default-server-ciphersuites" and "ssl-default-server-curves".

These settings are set by ssl_sock_prepare_ctx() for the frontends and
by ssl_sock_prepare_srv_ssl_ctx() for the backends. But ssl_quic_initial_ctx()
first sets the default QUIC ciphersuites and curves (see <quic_ciphers> and <quic_groups>)
before these ssl_sock.c functions are called, leading some TLS stacks to
refuse them if they do not support them. This is the case for some OpenSSL 3.5
stacks with FIPS support, which do not support X25519.

To fix this, set the default QUIC ciphersuites and curves only if not already
set by the settings mentioned above.
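
The selection order can be sketched as a simple fallback chain. The helper and its parameters are hypothetical, and the curve lists used with it are arbitrary examples:

```c
#include <stddef.h>

/* Picks the curve list to apply: the per-line option first, then the
 * global default, and only then the built-in QUIC default. */
static const char *demo_pick_curves(const char *line_curves,
                                    const char *global_curves,
                                    const char *quic_default)
{
	if (line_curves)
		return line_curves;
	if (global_curves)
		return global_curves;
	return quic_default;
}
```

With this ordering the built-in QUIC default is never pushed to the TLS stack when the configuration already provides a list, which is the case that failed on stacks lacking X25519.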

Rename <quic_ciphers> global variable to <default_quic_ciphersuites>
and <quic_groups> to <default_quic_curves> to reflect the OpenSSL API naming.

These options are taken into account by ssl_quic_initial_ctx()
which inspects these four variables before calling SSL_CTX_set_ciphersuites()
with <default_quic_ciphersuites> as parameter and SSL_CTX_set_curves() with
<default_quic_curves> as parameter if needed, that is to say, if no ciphersuites
and curves were set by "ssl-default-bind-ciphersuites", "ssl-default-bind-curves"
as global options or "ciphersuites", "curves" as "bind" line options.
Note that the bind_conf struct is not modified when no "ciphersuites" or
"curves" options are used on "bind" lines.

On backend side, rely on ssl_sock_init_srv() to set the server ciphersuites
and curves. This function is modified to use respectively <default_quic_ciphersuites>
and <default_quic_curves> if no ciphersuites and curves were set by
"ssl-default-server-ciphersuites", "ssl-default-server-curves" as global options
or "ciphersuites", "curves" as "server" line options.

Thanks to @rwagoner for having reported this issue in GH #3194 when using
an OpenSSL 3.5.4 stack with FIPS support.

Must be backported as far as 2.6
2025-12-08 10:40:59 +01:00
Frederic Lecaille
a2d2cda631 REGTESTS: add ssl_ciphersuites.vtc (TCP & QUIC)
This reg test ensures the ciphersuites may be correctly set for frontend
and backends by "ssl-default-bind-ciphersuites" and "ssl-default-server-ciphersuites"
as global options or with "ciphersuites" options on "bind" and "server" lines.
2025-12-08 10:40:59 +01:00
Frederic Lecaille
062a0ed899 REGTESTS: quic: add_ssl_crt-list.vtc supported by QUIC
ssl/add_ssl_crt-list.vtc was renamed to ssl/add_ssl_crt-list.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then add_ssl_crt-list.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
4214c97dd4 REGTESTS: quic: ssl_curve_name.vtc supported by QUIC
ssl/ssl_curve_name.vtc was renamed to ssl/ssl_curve_name.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then ssl_curve_name.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);

Note that this script works by chance for QUIC because the curves
selection matches the default ones used by QUIC.
2025-12-08 10:40:59 +01:00
Frederic Lecaille
c615b14fac REGTESTS: quic: ssl_sni_auto.vtc code provision for QUIC
ssl/ssl_sni_auto.vtc was renamed to ssl/ssl_sni_auto.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then ssl_sni_auto.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);

Mark the test as broken for QUIC
2025-12-08 10:40:59 +01:00
Frederic Lecaille
7bb7b26317 REGTESTS: quic: ssl_simple_crt-list.vtc supported by QUIC
ssl/ssl_simple_crt-list.vtc was renamed to ssl/ssl_simple_crt-list.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then ssl_simple_crt-list.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
b87bee8e04 REGTESTS: quic: ssl_server_samples.vtc supported by QUIC
ssl/ssl_server_samples.vtc was renamed to ssl/ssl_server_samples.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then ssl_server_samples.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
25529dddb6 REGTESTS: quic: ssl_frontend_samples.vtc supported by QUIC
ssl/ssl_frontend_samples.vtc was renamed to ssl/ssl_frontend_samples.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then ssl_frontend_samples.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
5cf5f76a90 REGTESTS: quic: new_del_ssl_crlfile.vtc supported by QUIC
ssl/new_del_ssl_crlfile.vtc was renamed to ssl/new_del_ssl_crlfile.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then new_del_ssl_crlfile.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
fc0c52f2af REGTESTS: quic: ssl_default_server.vtc supported by QUIC
ssl/ssl_default_server.vtc was renamed to ssl/ssl_default_server.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then ssl_default_server.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
4bff826204 REGTESTS: quic: ssl_client_samples.vtc supported by QUIC
ssl/ssl_client_samples.vtc was renamed to ssl/ssl_client_samples.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then ssl_client_samples.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
47889154d2 REGTESTS: quic: ssl_client_auth.vtc supported by QUIC
ssl/ssl_client_auth.vtc was renamed to ssl/ssl_client_auth.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then ssl_client_auth.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
b285f11cd6 REGTESTS: quic: show_ssl_ocspresponse.vtc supported by QUIC
ssl/show_ssl_ocspresponse.vtc was renamed to ssl/show_ssl_ocspresponse.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then show_ssl_ocspresponse.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
c4d066e735 REGTESTS: quic: set_ssl_server_cert.vtc supported by QUIC
ssl/set_ssl_server_cert.vtc was renamed to ssl/set_ssl_server_cert.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then set_ssl_server_cert.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
c1a818c204 REGTESTS: quic: set_ssl_crlfile.vtc supported by QUIC
ssl/set_ssl_crlfile.vtc was renamed to ssl/set_ssl_crlfile.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then set_ssl_crlfile.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
83b3e2876e REGTESTS: quic: set_ssl_cert.vtc supported by QUIC
ssl/set_ssl_cert.vtc was renamed to ssl/set_ssl_cert.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then set_ssl_cert.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
cb1e9e3cd8 REGTESTS: quic: set_ssl_cert_noext.vtc supported by QUIC
ssl/set_ssl_cert_noext.vtc was renamed to ssl/set_ssl_cert_noext.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then set_ssl_cert_noext.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
9c3180160d REGTESTS: quic: set_ssl_cafile.vtc supported by QUIC
ssl/set_ssl_cafile.vtc was renamed to ssl/set_ssl_cafile.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then set_ssl_cafile.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
3f5e73e83f BUG/MINOR: quic-be: missing connection stream closure upon TLS alert to send
This is the same issue as the one fixed by this commit:
   BUG/MINOR: quic-be: handshake errors without connection stream closure
But this time this is when the client has to send an alert to the server.
The fix consists in creating the mux after having set the handshake connection
error flag and error_code.

This bug was revealed by ssl/set_ssl_cafile.vtc reg test.

Depends on this commit:
     MINOR: quic: avoid code duplication in TLS alert callback

Must be backported to 3.3
2025-12-08 10:40:59 +01:00
Frederic Lecaille
e7b06f5e7a MINOR: quic: avoid code duplication in TLS alert callback
The OpenSSL QUIC API TLS alert callback ha_quic_ossl_alert() does exactly
the same thing as the one for the quictls API, even if the parameters have
different types.

Call ha_quic_send_alert() quictls callback from ha_quic_ossl_alert OpenSSL
QUIC API callback to avoid such code duplication.
2025-12-08 10:40:59 +01:00
Frederic Lecaille
ad101dc3d5 REGTESTS: quic: set_ssl_bug_2265.vtc supported by QUIC
ssl/set_ssl_bug_2265.vtc was renamed to ssl/set_ssl_bug_2265.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then set_ssl_bug_2265.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
2e7320d2ee REGTESTS: quic: ocsp_auto_update.vtc supported by QUIC
ssl/ocsp_auto_update.vtc was renamed to ssl/ocsp_auto_update.vtci
to produce a common part runnable both for QUIC and TCP listeners.
Then ocsp_auto_update.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC listeners and "stream" for TCP listeners);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
cdfd9b154a REGTESTS: quic: new_del_ssl_cafile.vtc supported by QUIC
ssl/new_del_ssl_cafile.vtc was renamed to ssl/new_del_ssl_cafile.vtci
to produce a common part runnable both for QUIC and TCP connections.
Then new_del_ssl_cafile.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC connections and "stream" for TCP connections);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
8c48a7798a REGTESTS: quic: issuers_chain_path.vtc supported by QUIC
ssl/issuers_chain_path.vtc was renamed to ssl/issuers_chain_path.vtci
to produce a common part runnable both for QUIC and TCP connections.
Then issuers_chain_path.vtc files were created both under ssl and quic directories
to call this .vtci file with correct VTC_SOCK_TYPE environment values
("quic" for QUIC connections and "stream" for TCP connections);
2025-12-08 10:40:59 +01:00
Frederic Lecaille
94a7e0127b REGTESTS: quic: dynamic_server_ssl.vtc supported by QUIC
ssl/dynamic_server_ssl.vtc was renamed to ssl/dynamic_server_ssl.vtci
to produce a common part runnable both for QUIC and TCP connections.
Then dynamic_server_ssl.vtc files were created both under ssl and quic directories
to call the .vtci file with correct VTC_SOCK_TYPE environment value.

Note that VTC_SOCK_TYPE may be resolved in haproxy -cli { } sections.
2025-12-08 10:40:59 +01:00
Frederic Lecaille
588d0edf99 REGTESTS: quic/ssl: ssl/del_ssl_crt-list.vtc supported by QUIC
Extract from ssl/del_ssl_crt-list.vtc the common part to produce
ssl/del_ssl_crt-list.vtci which may be reused by QUIC and TCP
from respectively quic/del_ssl_crt-list.vtc and ssl/del_ssl_crt-list.vtc
thanks to "include" VTC command and VTC_SOCK_TYPE special vtest environment
variable.
2025-12-08 10:40:59 +01:00
Frederic Lecaille
6e94b69665 REGTESTS: ssl: Move all the SSL certificates, keys, crt-lists inside "certs" directory
Move all these files, and others for OCSP tests found in reg-tests/ssl,
to reg-tests/ssl/certs and adapt all the VTC files which use them.

This patch is needed by other tests which have to include the SSL tests.
Indeed, some VTC commands contain paths to these files which cannot
be customized with environment variables, depending on the location the VTC file
is run from, because VTC does not resolve the environment variables. Only macros
such as ${testdir} can be resolved.

For instance this command run from a VTC file from reg-tests/ssl directory cannot
be reused from another directory, except if we add a symbolic link for each cert,
key, etc.

 haproxy h1 -cli {
   send "del ssl crt-list ${testdir}/localhost.crt-list ${testdir}/common.pem:1"
 }

This is not what we want. We add a symbolic link to reg-tests/ssl/certs to the
directory and modify the command above as follows:

 haproxy h1 -cli {
   send "del ssl crt-list ${testdir}/certs/localhost.crt-list ${testdir}/certs/common.pem:1"
 }
2025-12-08 10:40:59 +01:00
Frederic Lecaille
21293dd6c3 MINOR: quic: Add useful debugging traces in qc_idle_timer_do_rearm()
Traces were missing in this function.
Also add information about the connection struct from qc->conn when
initialized for all the traces.

Should be easily backported as far as 2.6.
2025-12-08 10:40:59 +01:00
Frederic Lecaille
c36e27d10e BUG/MINOR: quic-be: handshake errors without connection stream closure
This bug was revealed on backend side by reg-tests/ssl/del_ssl_crt-list.vtc when
run with QUIC connections. As expected by the test, a TLS alert is generated on
server side. The latter sends a CONNECTION_CLOSE frame with a CRYPTO error
(>= 0x100). In this case the client closes its QUIC connection. But
the stream connection was not informed. This leads the connection to
be closed after the server timeout expiration. It should be closed asap.
This is the reason why reg-tests/ssl/del_ssl_crt-list.vtc could succeed
or fail, but only after a 5 seconds delay.

To fix this, mimic ssl_sock_io_cb() for TCP/SSL connections. Call
the same code, which this patch implements as ssl_sock_handle_hs_error(),
to correctly handle the handshake errors. Note that some SSL counters
were not incremented for both the backends and frontends. After such
errors, ssl_sock_io_cb() starts the mux after the connection has been
flagged in error. This has the side effect of closing the stream
in conn_create_mux().

Must be backported to 3.3 only for backends. It is not yet clear
whether this bug may impact the frontends.
2025-12-08 10:40:59 +01:00
Frederic Lecaille
63273c795f BUG/MINOR: quic/ssl: crash in ClientHello callback ssl traces
Such crashes may occur for QUIC frontends only when the SSL traces are enabled.

ssl_sock_switchctx_cbk() ClientHello callback may be called without any
initialized connection (<conn>) for QUIC connections, leading to crashes when
passing conn->err_code to TRACE_ERROR().

Modify the TRACE_ERROR() statement to pass this parameter only when <conn> is
initialized.
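
The guard described above can be sketched like this. The struct and function
names are hypothetical stand-ins (the real trace macro and connection struct
are not reproduced here), and returning 0 when there is no connection is an
assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical, reduced view of a connection. */
struct conn_sketch {
    int err_code;
};

/* Return the error code to report in a trace, or 0 when <conn> is not
 * initialized yet (as may happen for QUIC in the ClientHello callback).
 */
static int trace_err_code(const struct conn_sketch *conn)
{
    return conn ? conn->err_code : 0;
}
```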

Must be backported as far as 3.2.
2025-12-08 10:40:59 +01:00
Willy Tarreau
d2a1665af0 DOC: config: reorder the cache section's keywords
Probably due to historical accumulation, keywords were in a random
order that doesn't help when looking them up. Let's just reorder them
in alphabetical order like other sections. This can be backported.
2025-12-04 15:44:38 +01:00
Willy Tarreau
4d0a88c746 DOC: config: mention clearer that the cache's total-max-size is mandatory
As reported in GH issue #3201, it's easy to overlook this, so let's make
it clearer by mentioning the keyword. This can be backported to all
versions.
2025-12-04 15:42:09 +01:00
Willy Tarreau
cd959f1321 BUG/MEDIUM: config: ignore empty args in skipped blocks
As reported by Christian Ruppert in GH issue #3203, we're having an
issue with checks for empty args in skipped blocks: the check is
performed after the line is tokenized, without considering the case
where it's disabled due to outer false .if/.else conditions. Because
of this, a test like this one:

    .if defined(SRV1_ADDR)
        server srv1 "$SRV1_ADDR"
    .endif

will fail when SRV1_ADDR is empty or not set, saying that this will
result in an empty arg on the line.

The solution consists in postponing this check after the conditions
evaluation so that disabled lines are already skipped. And for this
to be possible, we need to move "errptr" one level above so that it
remains accessible there.
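
A minimal sketch of the reordering, under simplifying assumptions: the
function name and its <active> parameter are hypothetical, and any empty
token is treated as an error here, whereas the real check is more selective.

```c
#include <stddef.h>

/* Return 1 if the tokenized line must be rejected because of an empty
 * argument, 0 otherwise. <active> is non-zero only when all enclosing
 * .if/.else conditions are true, so a skipped line is never rejected.
 */
static int reject_empty_arg(const char *const *args, int nargs, int active)
{
    int i;

    if (!active)
        return 0;               /* disabled block: skip the check */

    for (i = 0; i < nargs; i++) {
        if (args[i][0] == '\0')
            return 1;           /* empty argument on an active line */
    }
    return 0;
}
```

The point is only the ordering: the condition state is consulted before the
empty-argument check instead of after it.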

This will need to be backported to 3.3 and wherever commit 1968731765
("BUG/MEDIUM: config: solve the empty argument problem again") is
backported. As such it is also related to GH issue #2367.
2025-12-04 15:33:43 +01:00
Willy Tarreau
b29560f610 BUG/MEDIUM: connection: fix "bc_settings_streams_limit" typo
The keyword was correct in the doc but in the code it was spelled
with a missing 's' after 'settings', making it unavailable. Since
there was no other way to find this but reading the code, it's safe
to simply fix it and assume nobody relied on the wrong spelling.

In the worst case for older backports it can also be duplicated.

This must be backported to 3.0.
2025-12-04 15:26:54 +01:00
William Lallemand
85689b072a REGTESTS: ssl: split tls*_reuse in stateless and stateful resume tests
Simplify ssl_reuse.vtci so it can be started with variables:

- SSL_CACHESIZE allows specifying the size of the session cache for
  the frontend
- NO_TLS_TICKETS allows specifying the "no-tls-tickets" option on bind

It introduces these files:

- ssl/tls12_resume_stateful.vtc
- ssl/tls12_resume_stateless.vtc
- ssl/tls13_resume_stateless.vtc
- ssl/tls13_resume_stateful.vtc
- quic/tls13_resume_stateless.vtc
- quic/tls13_resume_stateful.vtc
- quic/tls13_0rtt_stateful.vtc
- quic/tls13_0rtt_stateless.vtc

stateful files have "no-tls-tickets" + tune.tls.cachesize 20000
stateless files have "tls-tickets" + tune.tls.cachesize 0

This allows enabling AWS-LC on TCP TLS1.2 and TCP TLS1.3+tickets.

TLS1.2+stateless does not seem to work on WolfSSL.
2025-12-04 15:05:56 +01:00
William Lallemand
c7b5d2552a REGTESTS: ssl enable tls12_reuse.vtc for AWS-LC
The TLS resume test was never started with AWS-LC because the TLS1.3
part was not working. Since we split the reg-tests with a TLS1.2 part
and a TLS1.3 part, we can enable the tls1.2 part for AWS-LC.
2025-12-04 11:40:04 +01:00
Frederic Lecaille
cdca48b88c BUG/MINOR: quic-be: Missing keywords array NULL termination
This bug arrived with this commit:
     MINOR: quic: implement cc-algo server keyword
which introduced a <srv> keywords list lacking its array NULL termination
to parse the QUIC backend CC algorithms.

Detected by ASAN during ssl/add_ssl_crt-list.vtc execution as follows:

***  h1    debug|==4066081==ERROR: AddressSanitizer: global-buffer-overflow on address 0x5562e31dedb8 at pc 0x5562e298951f bp 0x7ffe9f9f2b40 sp 0x7ffe9f9f2b38
***  h1    debug|READ of size 8 at 0x5562e31dedb8 thread T0
**** dT    0.173
***  h1    debug|    #0 0x5562e298951e in srv_find_kw src/server.c:789
***  h1    debug|    #1 0x5562e2989630 in _srv_parse_kw src/server.c:3847
***  h1    debug|    #2 0x5562e299db1f in parse_server src/server.c:4024
***  h1    debug|    #3 0x5562e2c86ea4 in cfg_parse_listen src/cfgparse-listen.c:593
***  h1    debug|    #4 0x5562e2b0ede9 in parse_cfg src/cfgparse.c:2708
***  h1    debug|    #5 0x5562e2c47d48 in read_cfg src/haproxy.c:1077
***  h1    debug|    #6 0x5562e2682055 in main src/haproxy.c:3366
***  h1    debug|    #7 0x7ff3ff867249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
***  h1    debug|    #8 0x7ff3ff867304 in __libc_start_main_impl ../csu/libc-start.c:360
***  h1    debug|    #9 0x5562e26858d0 in _start (/home/flecaille/src/haproxy/haproxy+0x2638d0)
***  h1    debug|
***  h1    debug|0x5562e31dedb8 is located 40 bytes to the left of global variable 'bind_kws' defined in 'src/cfgparse-quic.c:255:28' (0x5562e31dede0) of size 120
***  h1    debug|0x5562e31dedb8 is located 0 bytes to the right of global variable 'srv_kws' defined in 'src/cfgparse-quic.c:264:27' (0x5562e31ded80) of size 56
***  h1    debug|SUMMARY: AddressSanitizer: global-buffer-overflow src/server.c:789 in srv_find_kw
***  h1    debug|Shadow bytes around the buggy address:
***  h1    debug|  0x0aacdc633d60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
***  h1    debug|  0x0aacdc633d70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
***  h1    debug|  0x0aacdc633d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
***  h1    debug|  0x0aacdc633d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
***  h1    debug|  0x0aacdc633da0: 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9
***  h1    debug|=>0x0aacdc633db0: 00 00 00 00 00 00 00[f9]f9 f9 f9 f9 00 00 00 00
***  h1    debug|  0x0aacdc633dc0: 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9
***  h1    debug|  0x0aacdc633dd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
***  h1    debug|  0x0aacdc633de0: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 f9
***  h1    debug|  0x0aacdc633df0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
***  h1    debug|  0x0aacdc633e00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
***  h1    debug|Shadow byte legend (one shadow byte represents 8 application bytes):

This should be backported where the commit above is supposed to be backported.
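
For illustration, this is the general shape of a NULL-terminated keyword
array and of the lookup that scans it. The types and names below are
hypothetical and much simpler than the real ones; only the "quic-cc-algo"
keyword comes from the commit referenced above.

```c
#include <stddef.h>
#include <string.h>

struct kw_sketch {
    const char *name;   /* NULL marks the end of the array */
    int id;
};

static const struct kw_sketch srv_kws_sketch[] = {
    { "quic-cc-algo", 1 },
    { NULL, 0 },        /* terminator: without it the scan reads past the array */
};

/* Walk the array until the terminating entry is reached. */
static const struct kw_sketch *find_kw(const struct kw_sketch *kws, const char *name)
{
    for (; kws->name; kws++) {
        if (strcmp(kws->name, name) == 0)
            return kws;
    }
    return NULL;
}
```

Looking up a keyword that is not in the list is the case that overruns the
array when the terminator is missing, which matches the global-buffer-overflow
reported above.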
2025-12-03 11:07:47 +01:00
Amaury Denoyelle
47dff5be52 MINOR: quic: implement cc-algo server keyword
Extend QUIC server configuration so that congestion algorithm and
maximum window size can be set on the server line. This can be achieved
using quic-cc-algo keyword with a syntax similar to a bind line.

This should be backported up to 3.3 as this feature is considered as
necessary for full QUIC backend support. Note that this relies on the
series of previous commits which should be picked first.
2025-12-01 15:53:58 +01:00
Amaury Denoyelle
4f43abd731 MINOR: quic: extract cc-algo parsing in a dedicated function
Extract code from bind_parse_quic_cc_algo() related to pure parsing of
quic-cc-algo keyword. The objective is to be able to quickly duplicate
this option on the server line.

This may need to be backported to support QUIC congestion control
algorithm support on the server line in version 3.3.
2025-12-01 15:06:01 +01:00
Amaury Denoyelle
979588227f MINOR: quic: define quic_cc_algo as const
Each QUIC congestion algorithm is defined as a structure with callbacks
in it. Every quic_conn has a member pointing to the configured
algorithm, inherited from the bind-conf keyword or to the default CUBIC
value.

Convert all these definitions to const. This ensures that there never
will be an accidental modification of a globally shared structure. This
also requires to mark quic_cc_algo field in bind_conf and quic_cc as
const.
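
A reduced sketch of the idea, with hypothetical names and a single callback.
The initial window formula is an arbitrary placeholder, not the real CUBIC
code.

```c
#include <stddef.h>

/* Hypothetical, reduced algorithm descriptor: a name and one callback. */
struct cc_algo_sketch {
    const char *name;
    unsigned int (*initial_window)(unsigned int mss);
};

/* Placeholder callback for the sketch. */
static unsigned int cubic_iw_sketch(unsigned int mss)
{
    return 10 * mss;
}

/* Shared definition: const, so it cannot be modified by accident. */
static const struct cc_algo_sketch cc_algo_cubic_sketch = {
    "cubic", cubic_iw_sketch
};

/* Users only keep a pointer to const. */
struct bind_conf_sketch {
    const struct cc_algo_sketch *cc_algo;
};

/* Return the configured algorithm, or the default one when unset. */
static const struct cc_algo_sketch *effective_algo(const struct bind_conf_sketch *bc)
{
    return (bc && bc->cc_algo) ? bc->cc_algo : &cc_algo_cubic_sketch;
}
```

With these qualifiers, an assignment such as
effective_algo(bc)->initial_window = NULL is rejected at compile time.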
2025-12-01 15:05:41 +01:00
Amaury Denoyelle
acbb378136 Revert "MINOR: quic: use dynamic cc_algo on bind_conf"
This reverts commit a6504c9cfb6bb48ae93babb76a2ab10ddb014a79.

Each supported QUIC algo is associated with a set of callbacks defined
in a structure quic_cc_algo. Originally, bind_conf would use a constant
pointer to one of these definitions.

During pacing implementation, this field was transformed into a
dynamically allocated value copied from the original definition. The
idea was to be able to tweak settings at the listener level. However,
this was never used in practice. As such, revert to the original model.

This may need to be backported to support QUIC congestion control
algorithm support on the server line in version 3.3.
2025-12-01 14:18:58 +01:00
William Lallemand
c641ea4f9b DOC: configuration: ECH support details
Specify which OpenSSL branch is supported and that AWS-LC is not
supported.

Must be backported to 3.3.
2025-11-30 09:47:56 +01:00
Remi Tricot-Le Breton
2b3d13a740 BUG/MINOR: jwt: Missing "case" in switch statement
Because of missing "case" keyword in front of the values in a switch
case statement, the values were interpreted as goto tags and the switch
statement became useless.

This patch should fix GitHub issue #3200.
The fix should be backported up to 2.8.
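
The effect can be reproduced with a small sketch. The enum and function names
are hypothetical, not the ones from the JWT code. A compiler may warn about
the unused label in the buggy variant, which is expected here.

```c
/* Hypothetical algorithm identifiers. */
enum alg_sketch { ALG_NONE, ALG_HS256, ALG_RS256 };

/* Buggy form: "ALG_HS256:" without "case" is an ordinary goto label, so
 * the switch has no matching case and every value reaches "default".
 */
static int is_hmac_buggy(enum alg_sketch a)
{
    switch (a) {
    ALG_HS256:
        return 1;
    default:
        return 0;
    }
}

/* Fixed form: the value is a real case label. */
static int is_hmac_fixed(enum alg_sketch a)
{
    switch (a) {
    case ALG_HS256:
        return 1;
    default:
        return 0;
    }
}
```

Label names live in their own namespace in C, which is why the buggy form
compiles even though ALG_HS256 is also an enum constant.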
2025-11-28 16:36:46 +01:00
Willy Tarreau
36133759d3 [RELEASE] Released version 3.4-dev0
Released version 3.4-dev0 with the following main changes :
    - MINOR: version: mention that it's development again
2025-11-26 16:12:45 +01:00
Willy Tarreau
e8d6ffb692 MINOR: version: mention that it's development again
This essentially reverts d8ba9a2a92.
2025-11-26 16:11:47 +01:00
Willy Tarreau
7832fb21fe [RELEASE] Released version 3.3.0
Released version 3.3.0 with the following main changes :
    - BUG/MINOR: acme: better challenge_ready processing
    - BUG/MINOR: acme: warning ‘ctx’ may be used uninitialized
    - MINOR: httpclient: complete the https log
    - BUG/MEDIUM: server: do not use default SNI if manually set
    - BUG/MINOR: freq_ctr: Prevent possible signed overflow in freq_ctr_overshoot_period
    - DOC: ssl: Document the restrictions on 0RTT.
    - DOC: ssl: Note that 0rtt works fork QUIC with QuicTLS too.
    - BUG/MEDIUM: quic: do not prevent sending if no BE token
    - BUG/MINOR: quic/server: free quic_retry_token on srv drop
    - MINOR: quic: split global CID tree between FE and BE sides
    - MINOR: quic: use separate global quic_conns FE/BE lists
    - MINOR: quic: add "clo" filter on show quic
    - MINOR: quic: dump backend connections on show quic
    - MINOR: quic: mark backend conns on show quic
    - BUG/MINOR: quic: fix uninit list on show quic handler
    - BUG/MINOR: quic: release BE quic_conn on connect failure
    - BUG/MINOR: server: fix srv_drop() crash on partially init srv
    - BUG/MINOR: h3: do no crash on forwarding multiple chained response
    - BUG/MINOR: h3: handle properly buf alloc failure on response forwarding
    - BUG/MEDIUM: server/ssl: Unset the SNI for new server connections if none is set
    - BUG/MINOR: acme: fix ha_alert() call
    - Revert "BUG/MEDIUM: server/ssl: Unset the SNI for new server connections if none is set"
    - BUG/MINOR: sock-inet: ignore conntrack for transparent sockets on Linux
    - DEV: patchbot: prepare for new version 3.4-dev
    - DOC: update INSTALL with the range of gcc compilers and openssl versions
    - MINOR: version: mention that 3.3 is stable now
2025-11-26 15:55:57 +01:00
Willy Tarreau
d8ba9a2a92 MINOR: version: mention that 3.3 is stable now
This version will be maintained up to around Q1 2027. The INSTALL file
also mentions it.
2025-11-26 15:54:30 +01:00
Willy Tarreau
09dd6bb4cb DOC: update INSTALL with the range of gcc compilers and openssl versions
Gcc 4.7 to 15 are tested. OpenSSL was tested up to 3.6. QUIC support
requires OpenSSL >= 3.5.2.
2025-11-26 15:50:43 +01:00
Willy Tarreau
22fd296a04 DEV: patchbot: prepare for new version 3.4-dev
The bot will now load the prompt for the upcoming 3.4 version so we have
to rename the files and update their contents to match the current version.
2025-11-26 15:35:22 +01:00
Willy Tarreau
e5658c52d0 BUG/MINOR: sock-inet: ignore conntrack for transparent sockets on Linux
As reported in github issue #3192, in certain situations with transparent
listeners, it is possible to get the incoming connection's destination
wrong via SO_ORIGINAL_DST. Two cases were identified thus far:
  - incorrect conntrack configuration where NOTRACK is used only on
    incoming packets, resulting in reverse connections being created
    from response packets. It's then mostly a matter of timing, i.e.
    whether or not the connection is confirmed before the source is
    retrieved, but in this case the connection's destination address
    as retrieved by SO_ORIGINAL_DST is the client's address.

  - late outgoing retransmit that recreates a just expired conntrack
    entry, in reverse direction as well. It's possible that combinations
    of RST or FIN might play a role here in speeding up conntrack eviction,
    as well as the rollover of source ports on the client whose new
    connection matches an older one and simply refreshes it due to
    nf_conntrack_tcp_loose being set by default.

TPROXY doesn't require conntrack, only REDIRECT, DNAT etc do. However
the system doesn't offer any option to know how a conntrack entry was
created (i.e. normally or via a response packet) to let us know that
it's pointless to check the original destination, nor does it permit
to access the local vs peer addresses in opposition to src/dst which
can be wrong in this case.

One alternate approach could consist in only checking SO_ORIGINAL_DST
for listening sockets not configured with the "transparent" option,
but the problem here is that our low-level API only works with FDs
without knowing their purpose, so it's unknown there that the fd
corresponds to a listener, let alone in transparent mode.

A (slightly more expensive) variant of this approach here consists in
checking on the socket itself that it was accepted in transparent mode
using IP_TRANSPARENT, and skip SO_ORIGINAL_DST if this is the case.
This does the job well enough (no more client addresses appearing in
the dst field) and remains a good compromise. A future improvement of
the API could permit to pass the transparent flag down the stack to
that function.
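
The check can be sketched as below for Linux. want_original_dst() is a
hypothetical name, and treating a getsockopt() failure as "not transparent"
is an assumption of this sketch, not necessarily what the patch does.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Return 1 when SO_ORIGINAL_DST should still be queried for <fd>, 0 when
 * the socket was accepted in transparent mode and the lookup must be
 * skipped.
 */
static int want_original_dst(int fd)
{
#if defined(IP_TRANSPARENT)
    int val = 0;
    socklen_t len = sizeof(val);

    /* A transparent socket reports a non-zero IP_TRANSPARENT value. */
    if (getsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &val, &len) == 0 && val)
        return 0;
#else
    (void)fd;   /* option unknown on this platform: keep the lookup */
#endif
    return 1;
}
```

The extra getsockopt() call per accepted connection is the "slightly more
expensive" part mentioned above.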

This should be backported to stable versions after some observation
in latest -dev.

For reference, here are some links to older conversations on that topic
that Lukas found during this analysis:

  https://lists.openwall.net/netdev/2019/01/12/34
  https://discourse.haproxy.org/t/send-proxy-not-modifying-some-traffic-with-proxy-ip-port-details/3336/9
  https://www.mail-archive.com/haproxy@formilux.org/msg32199.html
  https://lists.openwall.net/netdev/2019/01/23/114
2025-11-26 13:43:58 +01:00
Christopher Faulet
7d9cc28f92 Revert "BUG/MEDIUM: server/ssl: Unset the SNI for new server connections if none is set"
This reverts commit de29000e602bda55d32c266252ef63824e838ac0.

The fix was in fact invalid. First it is not supported by WolfSSL to call
SSL_set_tlsext_host_name with a hostname set to NULL. Then, it is not specified
as supported by other SSL libraries.

But, by reviewing the root cause of this bug, it appears there is an issue
with the reuse of TLS sessions. It must not be performed if the SNI does not
match. A TLS session created with a SNI must not be reused with another
SNI. The side effects are not clear but functionally speaking, it is
invalid.

So, for now, the commit above was reverted because it is invalid and it
crashes with WolfSSL. Then the init of the SSL connection must be reworked
to get the SNI earlier, to be able to reuse or not an existing TLS
session.
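
The reuse rule stated above can be sketched as a simple predicate. The
function name is hypothetical, NULL stands for "no SNI", and the exact,
case-sensitive comparison is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if a TLS session created with <sess_sni> may be reused for a
 * connection that wants <wanted_sni>, 0 otherwise.
 */
static int sni_allows_reuse(const char *sess_sni, const char *wanted_sni)
{
    if (!sess_sni || !wanted_sni)
        return sess_sni == wanted_sni;  /* reusable only if both are unset */
    return strcmp(sess_sni, wanted_sni) == 0;
}
```

Evaluating such a predicate requires knowing the SNI before the session is
picked, which is why the SSL connection init has to be reworked.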
2025-11-26 12:05:43 +01:00
Maxime Henrion
d506c03aa0 BUG/MINOR: acme: fix ha_alert() call
A NULL pointer was passed as the format string, so this alert message
was never written.

Must be backported to 3.2.
2025-11-25 20:20:25 +01:00
Christopher Faulet
de29000e60 BUG/MEDIUM: server/ssl: Unset the SNI for new server connections if none is set
When a new SSL server connection is created, if no SNI is set, it is
possible to inherit from the one of the reused TLS session. The bug was
introduced by the commit 95ac5fe4a ("MEDIUM: ssl_sock: always use the SSL's
server name, not the one from the tid"). The mixup is possible between
regular connections but also with health-checks connections.

To fix the issue, when no SNI is set, for regular server connections and for
health-check connections, the SNI must explicitly be disabled by calling
ssl_sock_set_servername() with the hostname set to NULL.

Many thanks to Lukas for his detailed bug report.

This patch should fix the issue #3195. It must be backported as far as 3.0.
2025-11-25 16:32:46 +01:00
Amaury Denoyelle
a70816da82 BUG/MINOR: h3: handle properly buf alloc failure on response forwarding
Replace BUG_ON() for buffer alloc failure on h3_resp_headers_to_htx() by
proper error handling. An error status is reported which should be
sufficient to initiate connection closure.

No need to backport.
2025-11-25 15:55:08 +01:00
Amaury Denoyelle
ae96defaca BUG/MINOR: h3: do no crash on forwarding multiple chained response
h3_resp_headers_to_htx() is the function used to convert an HTTP/3
response into a HTX message. It was introduced on this release for QUIC
backend support.

A BUG_ON() would occur if multiple responses are forwarded
simultaneously on a stream without rcv_buf in between. Fix this by
removing it. Instead, if QCS HTX buffer is not empty when handling
a new response, prefer to pause demux operation. This is restarted when
the buffer has been read and emptied by the upper stream layer.

No need to backport.
2025-11-25 15:52:37 +01:00
Amaury Denoyelle
a363b536a9 BUG/MINOR: server: fix srv_drop() crash on partially init srv
A recent patch has introduced free operation for QUIC tokens stored in a
server. These values are located in <per_thr> server array.

However, a server instance may be released prior to its full
initialization in case of a failure during "add server" CLI command. The
mentioned patch would cause a srv_drop() crash due to an invalid usage
of NULL <per_thr> member.

Fix this by adding a check on <per_thr> prior to dereferencing it in
srv_drop().
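
A minimal sketch of such a guard, using hypothetical reduced structures
rather than the real server ones:

```c
#include <stdlib.h>

/* Hypothetical per-thread storage holding one token. */
struct per_thr_sketch {
    char *retry_token;
};

/* Hypothetical, reduced server: <per_thr> stays NULL until the server
 * is fully initialized.
 */
struct server_sketch {
    struct per_thr_sketch *per_thr;
    int nbthread;
};

/* Release per-thread tokens, tolerating a partially initialized server. */
static void server_sketch_release(struct server_sketch *srv)
{
    int i;

    if (!srv->per_thr)
        return;                 /* init failed early: nothing to free */

    for (i = 0; i < srv->nbthread; i++)
        free(srv->per_thr[i].retry_token);
    free(srv->per_thr);
    srv->per_thr = NULL;
}
```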

No need to backport.
2025-11-25 15:16:13 +01:00
Amaury Denoyelle
6c08eb7173 BUG/MINOR: quic: release BE quic_conn on connect failure
If quic_connect_server() fails, quic_conn FD will remain unopened as set
to -1. Backend connections do not have a fallback socket for future
exchange, contrary to frontend one which can use the listener FD. As
such, it is better to release these connections early.

This patch adjusts such failure by extending quic_close(). This function
is called by the upper layer immediately after a connect issue. In this
case, release immediately a quic_conn backend instance if the FD is
unset, which means that connect has previously failed.

Also, quic_conn_release() is extended to ensure that such faulty
connections are immediately freed and not converted into a
quic_conn_closed instance.

Prior to this patch, a backend quic_conn without any FD would remain
allocated and possibly active. If its tasklet is executed, this resulted
in a crash due to access to an invalid FD.

No need to backport.
2025-11-25 14:50:23 +01:00
Amaury Denoyelle
346631700d BUG/MINOR: quic: fix uninit list on show quic handler
A recent patch has extended "show quic" capability. It is now possible
to list a specific list of connections, either active frontend, closing
frontend or backend connections.

An issue was introduced as the list is local storage. As this command is
reentrant, show quic context must be extended so that the currently
inspected list is also saved.

This issue was reported via GCC which mentions an uninitialized value
depending on branching conditions.
2025-11-25 14:50:19 +01:00
Amaury Denoyelle
a3f76875f4 MINOR: quic: mark backend conns on show quic
Add an extra "(B)" marker when displaying a backend connection during a
"show quic". This is useful to differentiate them from the frontend side
when displaying all connections.
2025-11-25 14:31:27 +01:00
Amaury Denoyelle
e56fdf6320 MINOR: quic: dump backend connections on show quic
Add a new "be" filter to "show quic". Its purpose is to be able to
display backend connections. These connections can also be listed using
"all" filter.
2025-11-25 14:30:18 +01:00
Amaury Denoyelle
3685681373 MINOR: quic: add "clo" filter on show quic
Add a new filter "clo" for "show quic" command. Its purpose is to filter
output to only list closing frontend connections.
2025-11-25 14:30:18 +01:00
Amaury Denoyelle
49e6fca51b MINOR: quic: use separate global quic_conns FE/BE lists
Each quic_conn instance is stored in a global list. Its purpose is to be
able to loop over all known connections during "show quic".

Split this into two separate lists for frontend and backend usage.
Another change is that closing backend connections do not move into
quic_conns_clo list. They remain instead in their original list. The
objective of this patch is to reduce the contention between the two
sides.

Note that this prevents backend connections from being listed in "show quic"
now. This will be adjusted in a future patch.
2025-11-25 14:30:18 +01:00
Amaury Denoyelle
a5801e542d MINOR: quic: split global CID tree between FE and BE sides
QUIC CIDs are stored in a global tree. Prior to this patch, CIDs used on
both frontend and backend sides were mixed together.

This patch implements CID storage separation between FE and BE sides. The
original tree quic_cid_trees is split into
quic_fe_cid_trees/quic_be_cid_trees.

This patch should reduce contention between frontend and backend usages.
Also, it should reduce the risk of random CID collision.
2025-11-25 14:30:18 +01:00
Amaury Denoyelle
4b596c1ea8 BUG/MINOR: quic/server: free quic_retry_token on srv drop
A recent patch has implemented caching of QUIC token received from a
NEW_TOKEN frame into the server cache. This value is stored per thread
into a <quic_retry_token> field.

This field is an ist, first set to an empty string. Via
qc_try_store_new_token(), it is reallocated to fit the size of the newly
stored token. Prior to this patch, the field was never freed so this
causes a memory leak.

Fix this by using istfree() on <quic_retry_token> field during
srv_drop().

No need to backport.
2025-11-25 14:30:18 +01:00
Amaury Denoyelle
cbfe574d8a BUG/MEDIUM: quic: do not prevent sending if no BE token
For QUIC client support, a token may be emitted along with INITIAL
packets during the handshake. The token is encoded during emission via
qc_enc_token() called by qc_build_pkt().

The token may be provided from different sources. First, it can be
retrieved via <retry_token> quic_conn member when a Retry packet was
received. If not present, a token may be reused from the server cache,
populated from NEW_TOKEN received from a previous connection.

Prior to this patch, the last method may cause an issue. If the upper
connection instance is released prior to the handshake completion, this
prevents access to a possible server token. This is considered an error
by qc_enc_token(). The error is reported up to calling functions,
preventing any emission to be performed. In the end, this prevented
either the full quic_conn release or its downsizing into quic_conn_closed
until the idle timeout completion (30s by default). With abortonclose
set now by default on HTTP frontends, early client shutdowns can easily
cause excessive memory consumption.

To fix this, change qc_enc_token() so that if connection is closed, no
token is encoded but also no error is reported. This allows emission to
continue and permits early connection release.

No need to backport.
2025-11-25 14:30:18 +01:00
Olivier Houchard
e27216b799 DOC: ssl: Note that 0rtt works fork QUIC with QuicTLS too.
Document that one can use 0rtt with QUIC when using QuicTLS too.
2025-11-25 13:17:45 +01:00
Olivier Houchard
f867068dc7 DOC: ssl: Document the restrictions on 0RTT.
Document that with QUIC, 0RTT only works with OpenSSL >= 3.5.2 and
AWS-LC, and for TLS/TCP, it only works with OpenSSL, and frontends
require that an ALPN be sent by the client to use the early data before
the handshake.
2025-11-25 11:46:22 +01:00
Jacques Heunis
91eb9b082b BUG/MINOR: freq_ctr: Prevent possible signed overflow in freq_ctr_overshoot_period
All of the other bandwidth-limiting code stores limits and intermediate
(byte) counters as unsigned integers. The exception here is
freq_ctr_overshoot_period which takes in unsigned values but returns a
signed value. While this has the benefit of letting the caller know how
far away from overshooting they are, this is not currently leveraged
anywhere in the codebase, and it has the downside of halving the positive
range of the result.

More concretely though, returning a signed integer when all intermediate
values are unsigned (and boundaries are not checked) could result in an
overflow, producing values that are at best unexpected. In the case of
flt_bwlim (the only usage of freq_ctr_overshoot_period in the codebase at
the time of writing), an overflow could cause the filter to wait for a
large number of milliseconds when in fact it shouldn't wait at all.

This is a niche possibility, because it requires that a bandwidth limit is
defined in the range [2^31, 2^32). In this case, the raw limit value would
not fit into a signed integer, and close to the end of the period, the
`(elapsed * freq)/period` calculation could produce a value which also
doesn't fit into a signed integer.

If at the same time `curr` (the number of events counted so far in the
current period) is small, then we could get a very large negative value
which overflows. This is undefined behaviour and could produce surprising
results. The most obvious outcome is flt_bwlim sometimes waiting for a
large amount of time in a case where it shouldn't wait at all, thereby
incorrectly slowing down the flow of data.

Converting just the return type from signed to unsigned (and checking for
the overflow) prevents this undefined behaviour. It also makes the range
of valid values consistent between the input and output of
freq_ctr_overshoot_period and with the input and output of other freq_ctr
functions, thereby reducing the potential for surprise in intermediate
calculations: now everything supports the full 0 - 2^32 range.
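
As an illustration only, the unsigned computation can be sketched as below. This is not the HAProxy function: the helper name and parameter list are assumptions, and only the `(elapsed * freq)/period` term and the `curr` counter are taken from the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the HAProxy function itself: how many events
 * exceed the budget accrued so far in the current period.
 *   curr    : events counted so far in the period
 *   elapsed : milliseconds elapsed in the period
 *   period  : period length in milliseconds (non-zero)
 *   freq    : allowed events per period
 */
unsigned int overshoot_sketch(unsigned int curr, unsigned int elapsed,
                              unsigned int period, unsigned int freq)
{
	uint64_t budget;

	assert(period != 0);

	/* 64-bit intermediate: elapsed * freq may exceed 32 bits */
	budget = (uint64_t)elapsed * freq / period;

	/* clamp to zero instead of letting "curr - budget" go negative */
	if ((uint64_t)curr <= budget)
		return 0;
	return (unsigned int)(curr - budget);
}
```

With a limit in the [2^31, 2^32) range, a small `curr` near the end of the period yields 0 here, where a signed subtraction could have overflowed.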
2025-11-24 14:10:13 +01:00
Amaury Denoyelle
2829165f61 BUG/MEDIUM: server: do not use default SNI if manually set
A new server feature "sni-auto" has been introduced recently. The
objective is to automatically set the SNI value to the host header if no
SNI is explicitly set.

  668916c1a2fc2180028ae051aa805bb71c7b690b
  MEDIUM: server/ssl: Base the SNI value to the HTTP host header by default

There is an issue with it: server SNI is currently always overwritten,
even if explicitly set in the configuration file. Adjust
check_config_validity() to ensure the default value is only used if
<sni_expr> is NULL.

This issue was detected because a memory leak on <sni_expr> was reported when
SNI is explicitly set on a server line.

This patch is related to github feature request #3081.

No need to backport, unless the above patch is.
2025-11-24 11:45:18 +01:00
William Lallemand
5dbf06e205 MINOR: httpclient: complete the https log
The httpsclient_log_format variable lacks a few values in the TLS fields
that are now available as fetches.

On the backend side we have:

"%[fc_err]/%[ssl_fc_err,hex]/%[ssl_c_err]/%[ssl_c_ca_err]/%[ssl_fc_is_resumed] %[ssl_fc_sni]/%sslv/%sslc"

We now have enough sample fetches to have this equivalent in the
httpclient:

"%[bc_err]/%[ssl_bc_err,hex]/%[ssl_c_err]/%[ssl_c_ca_err]/%[ssl_bc_is_resumed] %[ssl_bc_sni]/%[ssl_bc_protocol]/%[ssl_bc_cipher]"

Instead of the current:

"%[bc_err]/%[ssl_bc_err,hex]/-/-/%[ssl_bc_is_resumed] -/-/-"
2025-11-22 12:29:33 +01:00
William Lallemand
0cae2f0515 BUG/MINOR: acme: warning ‘ctx’ may be used uninitialized
Please the compiler regarding the maybe-uninitialized warning.

src/acme.c: In function ‘cli_acme_chall_ready_parse’:
include/haproxy/task.h:215:9: error: ‘ctx’ may be used uninitialized [-Werror=maybe-uninitialized]
  215 |         _task_wakeup(t, f, MK_CALLER(WAKEUP_TYPE_TASK_WAKEUP, 0, 0))
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/acme.c:2903:17: note: in expansion of macro ‘task_wakeup’
 2903 |                 task_wakeup(ctx->task, TASK_WOKEN_MSG);
      |                 ^~~~~~~~~~~
src/acme.c:2862:26: note: ‘ctx’ was declared here
 2862 |         struct acme_ctx *ctx;
      |                          ^~~

Backport to 3.2.
2025-11-21 23:04:16 +01:00
William Lallemand
d77d3479ed BUG/MINOR: acme: better challenge_ready processing
Improve the challenge_ready processing:

- do a lookup directly instead of looping in the task tree
- only do a task_wakeup when all challenges are ready to avoid starting
  the task and stopping it just after
- Compute the number of remaining challenges to set up
- Output a message giving the number of remaining challenges to set up
  and if the task started again.

Backport to 3.2.
2025-11-21 22:47:52 +01:00
Willy Tarreau
8418c001ce [RELEASE] Released version 3.3-dev14
Released version 3.3-dev14 with the following main changes :
    - MINOR: stick-tables: Rename stksess shards to use buckets
    - MINOR: quic: do not use quic_newcid_from_hash64 on BE side
    - MINOR: quic: support multiple random CID generation for BE side
    - MINOR: quic: try to clarify quic_conn CIDs fields direction
    - MINOR: quic: refactor qc_new_conn() prototype
    - MINOR: quic: remove <ipv4> arg from qc_new_conn()
    - MEDIUM: mworker: set the mworker-max-reloads to 50
    - BUG/MEDIUM: quic-be: prevent use of MUX for 0-RTT sessions without secrets
    - CLEANUP: startup: move confusing msg variable
    - BUG/MEDIUM: mworker: signals inconsistencies during startup and reload
    - BUG/MINOR: mworker: wrong signals during startup
    - BUG/MINOR: acme: P-256 doesn't work with openssl >= 3.0
    - REGTESTS: ssl: split the SSL reuse test into TLS 1.2/1.3
    - BUILD: Makefile: make install with admin tools
    - CI: github: make install-bin instead of make install
    - BUG/MINOR: ssl: remove dead code in ssl_sock_from_buf()
    - BUG/MINOR: mux-quic: implement max-reuse server parameter
    - MINOR: quic: fix trace on quic_conn_closed release
    - BUG/MINOR: quic: do not decrement jobs for backend conns
    - BUG/MINOR: quic: fix FD usage for quic_conn_closed on backend side
    - BUILD: Makefile: remove halog from install-admin
    - REGTESTS: ssl: add basic 0rtt tests for TLSv1.2, TLSv1.3 and QUIC
    - REGTESTS: ssl: also verify that 0-rtt properly advertises early-data:1
    - MINOR: quic/flags: add missing QUIC flags for flags dev tool.
    - MINOR: quic: uneeded xprt context variable passed as parameter
    - MINOR: limits: keep a copy of the rough estimate of needed FDs in global struct
    - MINOR: limits: explain a bit better what to do when fd limits are exceeded
    - BUG/MEDIUM: quic-be/ssl_sock: TLS callback called without connection
    - BUG/MINOR: acme: alert when the map doesn't exist at startup
    - DOC: acme: add details about the DNS-01 support
    - DOC: acme: explain how to dump the certificates
    - DOC: acme: configuring acme needs a crt file
    - DOC: acme: add details about key pair generation in ACME section
    - BUG/MEDIUM: queues: Don't forget to unlock the queue before exiting
    - MINOR: muxes: Support an optional ALPN string when defining mux protocols
    - MINOR: config: Do proto detection for listeners before checks about ALPN
    - BUG/MEDIUM: config: Use the mux protocol ALPN by default for listeners if forced
    - DOC: config: Add a note about conflict with ALPN/NPN settings and proto keyword
    - MINOR: quic: store source address for backend conns
    - BUG/MINOR: quic: flag conn with CO_FL_FDLESS on backend side
    - ADMIN: dump-certs: let dry-run compare certificates
    - BUG/MEDIUM: connection/ssl: also fix the ssl_sock_io_cb() regarding idle list
    - DOC: http: document 413 response code
    - MINOR: limits: display the computed maxconn using ha_notice()
    - BUG/MEDIUM: applet: Fix conditions to detect spinning loop with the new API
    - BUG/MEDIUM: cli: State the cli have no more data to deliver if it yields
    - MINOR: h3: adjust sedesc update for known input payload len
    - BUG/MINOR: mux-quic: fix sedesc leak on BE side
    - OPTIM: mux-quic: delay FE sedesc alloc to stream creation
    - BUG/MEDIUM: quic-be: quic_conn_closed buffer overflow
    - BUG/MINOR: mux-quic: check access on qcs stream-endpoint
    - BUG/MINOR: acme: handle multiple auth with the same name
    - BUG/MINOR: acme: prevent creating map entries with dns-01
2025-11-21 14:13:44 +01:00
William Lallemand
548e7079cd BUG/MINOR: acme: prevent creating map entries with dns-01
We don't need map entries with dns-01.

The patch must be backported to 3.2.
2025-11-21 12:28:41 +01:00
William Lallemand
26093121a3 BUG/MINOR: acme: handle multiple auth with the same name
In case of the dns-01 challenge, it is possible to have a domain
"example.com" and "*.example.com" in the same request. This will create
2 different auth objects, which need 2 different challenges.

However the associated domain is "example.com" for both auth objects.

When doing a "challenge_ready", the algorithm will break at the first
domain found. But since the same domain can appear multiple times in
this case, breaking at the first one prevents all auth objects from
reaching a ready state.

This patch just removes the break so we can loop on every auth object.

Must be backported to 3.2.
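
The loop change can be pictured with a small sketch. The structure and function names are hypothetical and far simpler than the real ACME code; the remaining-challenge count is added only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified auth object; the real ACME structures differ. */
struct auth_sketch {
	const char *domain;   /* domain the challenge applies to */
	int ready;            /* non-zero once the challenge was set up */
};

/* Mark every auth object matching <domain> as ready, without stopping at
 * the first match, and return how many auth objects are still not ready.
 */
size_t challenge_ready_sketch(struct auth_sketch *auths, size_t n,
                              const char *domain)
{
	size_t i, remaining = 0;

	assert(auths != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (strcmp(auths[i].domain, domain) == 0)
			auths[i].ready = 1;      /* no break: keep scanning */
		if (!auths[i].ready)
			remaining++;
	}
	return remaining;
}
```

With two auth objects for "example.com" (one for the bare domain, one for the wildcard), a single call marks both as ready.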
2025-11-21 12:28:41 +01:00
Amaury Denoyelle
bbd83e3de9 BUG/MINOR: mux-quic: check access on qcs stream-endpoint
Since the following commit, allocation of stream-endpoint has been
delayed. The objective is to allocate it only for QCS attached to an
upper stream object.

  commit e6064c561684d9b079e3b5725d38dc3b5c1b5cd5
  OPTIM: mux-quic: delay FE sedesc alloc to stream creation

However, some MUX functions are unsafe as qcs->sd is dereferenced
without any check on it which will result in a crash. Fix this by
testing that qcs->sd is allocated before using it.

This does not need to be backported, unless the above patch is.
2025-11-21 11:16:07 +01:00
Frederic Lecaille
91f479604e BUG/MEDIUM: quic-be: quic_conn_closed buffer overflow
This bug impacts only the backends.

Recent commits have modified quic_rx_pkt_parse() for the QUIC backend to handle the
retry token, and version negotiation. This function is called for the quic_conn
even when in closing state (so for the quic_conn_closed struct). The quic_conn
struct and quic_conn_closed struct share some members thanks to the leading
QUIC_CONN_COMMON struct. The recent modification impacts some members which do not
exist for the quic_conn_closed struct, leading to buffer overflows if modified.

For the backends only this patch:
  1- silently drops the Retry packet (received/parsed only by backends)
  2- silently drops the Initial packets received in closing state

This is safe for the Initial packets because in closing state the datagrams
are entirely skipped thanks to qc_rx_check_closing() in quic_dgram_parse().

No backport needed because the backend support arrived with the current dev.
2025-11-21 10:49:44 +01:00
Amaury Denoyelle
e6064c5616 OPTIM: mux-quic: delay FE sedesc alloc to stream creation
On frontend side, a stream-endpoint is allocated on every qcs_new()
invocation. However, this is only used for bidirectional request
streams.

This patch delays stream-endpoint allocation to qcs_attach_sc(), just
prior to the instantiation of the upper stream object. This does not bring
any behavior change but is a nice optimization.
2025-11-21 10:34:08 +01:00
Amaury Denoyelle
4fb8908605 BUG/MINOR: mux-quic: fix sedesc leak on BE side
On backend side, streams are instantiated prior to their QCS MUX
counterpart. Thus, QCS can reuse the stream-endpoint already allocated
with the streams, either on qmux_init() or attach operation.

However, a stream-endpoint is also always allocated in every qcs_new()
invocation. For backend QCS, it is thus overwritten on
qmux_init()/attach operation. This causes a memleak.

Fix this by restricting allocation of stream-endpoint only for frontend
connection.

This does not need to be backported.
2025-11-21 10:34:08 +01:00
Amaury Denoyelle
9f16c64a8c MINOR: h3: adjust sedesc update for known input payload len 2025-11-21 10:34:08 +01:00
Christopher Faulet
0629ce8f4b BUG/MEDIUM: cli: State the cli have no more data to deliver if it yields
A regression was introduced in the commit 2d7e3ddd4 ("BUG/MEDIUM: cli: do
not return ACKs one char at a time"). When the CLI is processing a command
line, we no longer send response immediately. It is especially useful for
clients sending a bunch of commands with very short response.

However, in that state, the CLI applet must state it has no more data to
deliver. Otherwise it will be woken up again and again because data are
found in its output buffer with no blocking conditions. In worst cases, if
the command rate is really high, this can trigger the watchdog.

This patch must be backported where the patch above is, so probably as far
as 3.0.
2025-11-21 10:00:15 +01:00
Christopher Faulet
dfdccbd2af BUG/MEDIUM: applet: Fix conditions to detect spinning loop with the new API
There was a mixup between read/send events and ability for an applet to
receive and send. The fix seems obvious by reading it. The call-rate must be
incremented when nothing was received from the applet while it was allowed
and nothing was sent to the applet while it was allowed.

This patch must be backported as far as 3.0.
2025-11-21 09:41:05 +01:00
Willy Tarreau
4cbff2cad9 MINOR: limits: display the computed maxconn using ha_notice()
The computed maxconn was only displayed in verbose or debug modes. This
is too bad because lots of users just don't know what they're starting
with and can be trapped when an environment changes. Let's use ha_notice()
instead of a conditional fprintf() so that it gets displayed right after
the other startup messages, hoping that users will get used to seeing it
and more easily spot anomalies. See github issue #3191 for more context.
2025-11-20 18:38:09 +01:00
Lukas Tribus
a50c074b74 DOC: http: document 413 response code
Considering that we only use a "413 Payload Too Large" response in a single
situation with a specific config toggle (h1-accept-payload-with-any-method),
add some text to make it easier to find.

Should be backported to 2.6.

Link: https://github.com/cbonte/haproxy-dconv/issues/46
Link: https://discourse.haproxy.org/t/haproxy-error-413-paylod-too-large/9831/3
2025-11-20 18:07:01 +01:00
Willy Tarreau
05c409f1be BUG/MEDIUM: connection/ssl: also fix the ssl_sock_io_cb() regarding idle list
The fix in commit 9481cef948 ("BUG/MEDIUM: connection: do not reinsert a
purgeable conn in idle list") is also needed for ssl_sock_io_cb() which
can also release an idle connection and must perform the same checks.
This fix must be backported to all stable versions containing the fix
above.
2025-11-20 17:19:50 +01:00
William Lallemand
6aa236e964 ADMIN: dump-certs: let dry-run compare certificates
Let the --dry-run mode connect to the socket and compare the
certificates. It exits the process just before trying to move
the previous certificate and replace it.

This allows getting the "[NOTICE] (1234) XXX is already up to date" message
with dry-run.
2025-11-20 16:50:20 +01:00
Amaury Denoyelle
b2664d4450 BUG/MINOR: quic: flag conn with CO_FL_FDLESS on backend side
Connection struct defines a handle which can point to either a FD or a
quic_conn. On the latter case, CO_FL_FDLESS must be set. This is already
the case on frontend side.

This patch fixes QUIC backend support. Before setting connection handle
member to a quic_conn instance, ensure that CO_FL_FDLESS flag is set on
the connection.

Prior to this patch, a crash could occur in "show sess all".

No need to backport.
2025-11-20 16:44:03 +01:00
Amaury Denoyelle
cd2962ee64 MINOR: quic: store source address for backend conns
quic_conn has a local_addr member which is used to store the connection
source address. On backend side, this member is initialized to NULL as
the address is not yet known prior to connect. With this patch,
quic_connect_server() is extended so that local_addr is updated after
connect() success.

Also, quic_sock_get_src() is completed for the backend side which now
returns local_addr member. This step is necessary to properly support
fetches bc_src/bc_src_port.
2025-11-20 16:44:03 +01:00
Christopher Faulet
a14b7790ad DOC: config: Add a note about conflict with ALPN/NPN settings and proto keyword
If a mux protocol is forced and incompatible ALPN or NPN settings are
used, connection errors may be experienced. There is no check performed
during HAProxy startup and it is not necessarily obvious. So a note is added
to warn users about this usage.
2025-11-20 16:14:52 +01:00
Christopher Faulet
0a7f3954b5 BUG/MEDIUM: config: Use the mux protocol ALPN by default for listeners if forced
Since the commit 5003ac7fe ("MEDIUM: config: set useful ALPN defaults for
HTTPS and QUIC"), the ALPN is set by default to "h2,http/1.1" for HTTPS
listeners. However, it is in conflict with the forced mux protocol, if
any. Indeed, with "proto" keyword, the mux can be forced. In that case, some
combinations with the default ALPN will trigger connection errors.

For instance, by setting "proto h2", it will not be possible to use the H1
multiplexer. So we must take care to not advertise it in the ALPN. Worse,
since the commit above, most modern HTTP clients will try to use the H2
because it is advertised in the ALPN. Setting "proto h1" on the bind line
will cause all the traffic to be rejected in error.

To fix the issue, and thanks to previous commits, if it is defined, we are
now relying on the ALPN defined by the mux protocol by default. The H1
multiplexer (only the one that can be forced) defines it to "http/1.1" while
the H2 multiplexer defines it to "h2". So by default, if one or another of
these muxes is forced, and if no ALPN is set, the mux ALPN is used.

Other multiplexers are not defining any default ALPN for now, because it is
useless. In addition, only the listeners are concerned because there is no
default ALPN on the server side. Finally, there are no tests performed if the
ALPN is forced on the bind line. It is the user's responsibility to properly
configure his listeners (at least for now).

This patch depends on:
  * MINOR: config: Do proto detection for listeners before checks about ALPN
  * MINOR: muxes: Support an optional ALPN string when defining mux protocols

The series must be backported as far as 2.8.
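
The precedence described above could be sketched as follows. The helper name and its string-based interface are assumptions made for illustration; the real code works on mux protocol descriptors and bind configuration structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the precedence described above.
 *   user_alpn : "alpn" value set on the bind line, or NULL if unset
 *   proto     : mux forced with the "proto" keyword, or NULL
 * Returns the ALPN list to advertise, as a comma-separated string.
 */
const char *default_alpn_sketch(const char *user_alpn, const char *proto)
{
	/* an explicit ALPN, when given, is assumed to be non-empty */
	assert(user_alpn == NULL || *user_alpn != '\0');

	if (user_alpn)
		return user_alpn;            /* explicit setting is left untouched */
	if (proto && strcmp(proto, "h2") == 0)
		return "h2";                 /* ALPN defined by the H2 mux */
	if (proto && strcmp(proto, "h1") == 0)
		return "http/1.1";           /* ALPN defined by the H1 mux */
	return "h2,http/1.1";            /* HTTPS listener default */
}
```

Note that an ALPN forced on the bind line is returned as-is, even if it conflicts with the forced mux, which matches the absence of checks mentioned above.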
2025-11-20 16:14:52 +01:00
Christopher Faulet
2ef8b91a00 MINOR: config: Do proto detection for listeners before checks about ALPN
The verification of any forced mux protocol, via the "proto" keyword, for
listeners is now performed before any tests on the ALPN. It will be
mandatory to be able to force the default ALPN, if not forced on the bind
line.

This patch will be mandatory for the next fix.
2025-11-20 16:14:52 +01:00
Christopher Faulet
8e08a635eb MINOR: muxes: Support an optional ALPN string when defining mux protocols
When a multiplexer protocol is defined, it is now possible to specify the
ALPN it supports, in binary format. This info is optional. For now only the
h2 and the h1 multiplexers define an ALPN because this will be mandatory for
a fix. But this could be used in future for different purpose.

This patch will be mandatory for the next fix.
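
For readers unfamiliar with it, the sketch below assumes that "binary format" refers to the TLS ALPN encoding, where each protocol name is preceded by a one-byte length. The helper is hypothetical and only shows how a list such as "h2,http/1.1" maps to that form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a comma-separated ALPN list such as
 * "h2,http/1.1" into its length-prefixed form.
 * Returns the number of bytes written, or -1 on error (empty or too
 * long protocol name, or output buffer too small).
 */
int alpn_to_wire_sketch(const char *csv, unsigned char *out, size_t outlen)
{
	size_t pos = 0;

	assert(csv != NULL && out != NULL);

	while (1) {
		size_t len = strcspn(csv, ",");   /* length of the current name */

		if (len == 0 || len > 255 || pos + 1 + len > outlen)
			return -1;
		out[pos++] = (unsigned char)len;  /* one-byte length prefix */
		memcpy(out + pos, csv, len);
		pos += len;

		if (csv[len] == '\0')
			break;
		csv += len + 1;                   /* skip the comma */
	}
	return (int)pos;
}
```

Under this assumption, "h2" becomes the 3 bytes `\002h2` and "http/1.1" becomes the 9 bytes `\010http/1.1`.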
2025-11-20 16:14:52 +01:00
Olivier Houchard
e9d34f991e BUG/MEDIUM: queues: Don't forget to unlock the queue before exiting
In assign_server_and_queue(), there's a rare case when the server was
full, so we created a pendconn, another server was considered but in the
meanwhile the pendconn was unqueued already, so we just left the
function. We did so, however, while still holding the queue lock, which
will ultimately lead to a deadlock, and ultimately the watchdog would
kill the process.
To fix that, just unlock the queue before leaving.

This should be backported to 3.2.
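
The pattern behind this fix, releasing the lock on every exit path, can be shown with a toy example. The lock and queue below are hypothetical placeholders and not HAProxy's locking macros or queue structures.

```c
#include <assert.h>

/* Hypothetical toy lock; the real code uses HAProxy's own locking macros. */
struct queue_sketch {
	int locked;          /* 1 while the queue lock is held */
	int pending;         /* 1 if our pendconn is still queued */
};

void queue_lock_sketch(struct queue_sketch *q)
{
	assert(!q->locked);
	q->locked = 1;
}

void queue_unlock_sketch(struct queue_sketch *q)
{
	assert(q->locked);
	q->locked = 0;
}

/* Returns 1 if the pendconn was still queued and got processed, 0 if it
 * had already been dequeued. The lock is released on both paths.
 */
int requeue_sketch(struct queue_sketch *q)
{
	queue_lock_sketch(q);

	if (!q->pending) {
		/* early exit: this unlock is the kind of call that was missing */
		queue_unlock_sketch(q);
		return 0;
	}

	q->pending = 0;      /* normal path: process the entry */
	queue_unlock_sketch(q);
	return 1;
}
```

Whichever branch is taken, the caller gets the queue back unlocked, so a later lock attempt cannot deadlock on it.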
2025-11-20 13:57:06 +01:00
William Lallemand
1b443bdec5 DOC: acme: add details about key pair generation in ACME section
In 3.3 it is possible to generate a key pair without needing an
existing certificate on the disk.
2025-11-20 12:48:22 +01:00
William Lallemand
d6e3e5b3a6 DOC: acme: configuring acme needs a crt file
Configuring acme in 3.2 needs a certificate on the disk.

To be backported to 3.2
2025-11-20 12:44:54 +01:00
William Lallemand
332dcaecba DOC: acme: explain how to dump the certificates
The certificates can be dumped with either the dataplaneapi or the
haproxy-dump-certs scripts.

Must be backported in 3.2 as well as the script.
2025-11-20 12:40:38 +01:00
William Lallemand
5ff4c066e7 DOC: acme: add details about the DNS-01 support
DNS-01 is supported and was backported in 3.2.

Backport to 3.2.
2025-11-20 12:37:48 +01:00
William Lallemand
e0665d4ffe BUG/MINOR: acme: alert when the map doesn't exist at startup
When configuring an acme section with the 'map' keyword, the user must
use an existing map. If the map doesn't exist, a log will be emitted
when trying to add the challenge to the map.

This patch changes the behavior by checking at startup if the map exists,
so haproxy would warn and won't start with a non-existing map.

This must be backported in 3.2.
2025-11-20 12:22:19 +01:00
Frederic Lecaille
fab7da0fd0 BUG/MEDIUM: quic-be/ssl_sock: TLS callback called without connection
Contrary to TCP, QUIC does not SSL_free() its SSL * object when its ->close()
XPRT callback is called. As a side effect, TLS callbacks registered at
configuration parsing time may be called after their connection <conn> has
been released, which triggers some BUG_ON(!conn).

This is the case for instance with ssl_sock_srv_verifycbk() whose role is to
add some checks to the built-in server certificate verification process.

This patch prevents dereferencing the <conn> pointer inside several callbacks
shared between TCP and QUIC.

Thank you to @InputOutputZ for its report in GH #3188.

As the QUIC backend feature arrived with the current 3.3 dev, no need to backport.
2025-11-20 11:36:57 +01:00
Willy Tarreau
8438ca273f MINOR: limits: explain a bit better what to do when fd limits are exceeded
As shown in github issue #3191, the error message shown when FD limits
are exceeded is not very useful as-is, since the current hard limit is
not displayed, and no suggestion is made about what to change in the
config. Let's explain about maxconn/ulimit-n/fd-hard-limit, suggest
dropping them or setting them to a context-based value at roughly 49%
of the current limit minus the known used FDs for listeners and checks.
This allows common "large" hard limits to report mostly round maxconns.
Example:

  [ALERT]    (25330) : [haproxy.main()] Cannot raise FD limit to 4001020,
  current limit is 1024 and hard limit is 4096. You may prefer to let
  HAProxy adjust the limit by itself; for this, please just drop any
  'maxconn' and 'ulimit-n' from the global section, and possibly add
  'fd-hard-limit' lower than this hard limit. You may also force a new
  'maxconn' value that is a bit lower than half of the hard limit minus
  listeners and checks. This results in roughly 1500 here.
2025-11-20 08:44:52 +01:00
Willy Tarreau
91d4f4f618 MINOR: limits: keep a copy of the rough estimate of needed FDs in global struct
It's always a pain to guess the number of FDs that can be needed by
listeners, checks, threads, pollers etc. We have this estimate in
global.maxsock before calling set_global_maxconn(), but we lose it
the line after. Let's copy it into global.est_fd_usage and keep it.
This will be helpful to try to provide more accurate suggestions for
maxconn.
2025-11-20 08:44:52 +01:00
Frederic Lecaille
2c6720a163 MINOR: quic: uneeded xprt context variable passed as parameter
The quic_conn ->xprt_ctx is passed to qc_send_ppkts(). The quic_conn is retrieved
from this context to be used inside this function, and the context itself is not
otherwise used by this function.

This patch simply directly passes the quic_conn to qc_send_ppkts(). This is only
what this function needs.
2025-11-20 08:17:44 +01:00
Frederic Lecaille
a88fdf8669 MINOR: quic/flags: add missing QUIC flags for flags dev tool.
Add missing QUIC_FL_CONN_XPRT_CLOSED quic_conn flags definition.
2025-11-20 08:10:58 +01:00
Willy Tarreau
40687ebc64 REGTESTS: ssl: also verify that 0-rtt properly advertises early-data:1
This patch completes the 0-rtt test to verify that early-data:1 is
properly emitted to the server in the relevant situations. We carefully
compare it with the expected values that are computed based on the TLS
version, the client and listener's support for 0-rtt and the resumption
status. A response header "x-early-data-test" is set to OK on success,
or KO on failure and the client tests this. The previous test is kept
as well. This was tested with quictls-1.1.1 and quictls-3.0.1 for TCP,
as well as aws-lc for QUIC.
2025-11-19 22:30:31 +01:00
Willy Tarreau
2dc4d99cd2 REGTESTS: ssl: add basic 0rtt tests for TLSv1.2, TLSv1.3 and QUIC
These tests try all the combinations of {0,1}rtt <-> {0,1}rtt with
stateless and stateful tickets. They take into consideration the TLS
version to decide whether or not 0rtt should work. Since we cannot
use environment variables in the client, the tests are run in haproxy
itself where the frontends set a "x-early-rcvd-test" response header
that the client checks. At this stage, the test only verifies that
*some* early data were received.

Note that the tests are a bit complex because we need 4 listeners
for the various combinations of 0rtt/tickets, then we have to set
expectations based on the TLS version (1.2 vs 1.3), as well as the
session resumption status.

We have to set alpn on the server lines because currently our frontends
expect it for 0-rtt to work.
2025-11-19 22:30:21 +01:00
William Lallemand
f6373a6ca8 BUILD: Makefile: remove halog from install-admin
The dependency to halog build provokes problems when changing CFLAGS and
LDFLAGS, because you're supposed to have the same flags during the build
and the install if there's still some things to build.

We probably need to store the flags somewhere to reuse them at another
step, but we need to do it cleanly. In the meantime it's better not to
have this dependency.
2025-11-19 16:52:20 +01:00
Amaury Denoyelle
d54d78fe9a BUG/MINOR: quic: fix FD usage for quic_conn_closed on backend side
On the frontend side, QUIC transfer can be performed either via a
connection-owned FD or multiplexed on the listener one. When a quic_conn
is freed and converted to a quic_conn_closed instance, its FD, if open, is
closed and all exchanges are now multiplexed via the listener FD.

This is different for the backend as connections only have the choice to
use their owned FD. Thus, special care must be taken when freeing a
connection and converting it to a quic_conn_closed instance. In this
case, qc_release_fd() is delayed to the quic_conn_closed release.

Furthermore, when the FD is transferred, its iocb and owner fields are
updated to the new quic_conn_closed instance. Without it, a crash will
occur when accessing the freed quic_conn tasklet. A newly dedicated
handler quic_conn_closed_sock_fd_iocb is used to ensure access to
quic_conn_closed members only.
2025-11-19 16:02:22 +01:00
Amaury Denoyelle
46c5c232d7 BUG/MINOR: quic: do not decrement jobs for backend conns
jobs is a global counter which serves to account activity through the
whole process. The soft-stop procedure will wait until this counter is
reset to zero.

jobs is not used for backend connections. Thus, as expected, it is not
incremented when a QUIC backend connection is instantiated. However, the
decrement is performed on all sides during quic_conn_release(). This
causes the counter to wrap.

Fix this by decrementing jobs only for frontend connections. Without
this patch, soft stop procedure will hang indefinitely if QUIC backend
connections were in use.
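
The accounting rule described above can be sketched as follows. This is only an illustration under stated assumptions: the `demo_` names are hypothetical and do not correspond to haproxy's actual structures or to quic_conn_release() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the process-wide activity counter. */
unsigned int jobs;

/* Hypothetical connection: only the side matters for this sketch. */
struct demo_conn {
    bool is_back; /* true for a backend connection */
};

/* Frontend connections are accounted in jobs; backend ones are not. */
void demo_conn_new(struct demo_conn *c, bool is_back)
{
    c->is_back = is_back;
    if (!is_back)
        jobs++;
}

/* Mirror the creation path: only undo what was accounted on creation.
 * Decrementing for a backend connection would wrap the unsigned counter.
 */
void demo_conn_release(struct demo_conn *c)
{
    if (!c->is_back) {
        assert(jobs > 0);
        jobs--;
    }
}
```

With an unconditional decrement, releasing a single backend connection while jobs is zero would wrap the counter to a huge value, which is why a soft stop waiting for zero would never complete.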
2025-11-19 16:02:22 +01:00
Amaury Denoyelle
1a22caa6ed MINOR: quic: fix trace on quic_conn_closed release
Adjust leaving trace of quic_release_cc_conn() so that the end of the
function is properly reported.
2025-11-19 16:02:22 +01:00
Amaury Denoyelle
e55bcf5746 BUG/MINOR: mux-quic: implement max-reuse server parameter
Properly implement support for max-reuse server keyword. This is done by
adding a total count of streams seen for the whole connection. This
value is used in avail_streams callback.
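
A minimal sketch of how a lifetime stream count can cap the number of available streams. The structure, field names and the "max_reuse + 1 streams in total" interpretation are assumptions made for illustration; this is not the mux-quic code.

```c
#include <assert.h>

/* Hypothetical per-connection state; field names are illustrative only. */
struct demo_qcc {
    unsigned int total_streams;  /* streams opened since the connection started */
    unsigned int open_streams;   /* streams currently open */
    unsigned int max_concurrent; /* limit on concurrently open streams */
    int max_reuse;               /* assumed semantics: -1 means unlimited */
};

/* Number of streams that may still be opened on this connection. */
int demo_avail_streams(const struct demo_qcc *qcc)
{
    int avail = 0;

    if (qcc->open_streams < qcc->max_concurrent)
        avail = (int)(qcc->max_concurrent - qcc->open_streams);

    if (qcc->max_reuse >= 0) {
        /* Assumption: N reuses allow N + 1 streams over the whole lifetime. */
        unsigned int limit = (unsigned int)qcc->max_reuse + 1;
        int left = 0;

        if (qcc->total_streams < limit)
            left = (int)(limit - qcc->total_streams);
        if (left < avail)
            avail = left;
    }
    return avail;
}
```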
2025-11-19 16:02:22 +01:00
William Lallemand
c8540f7437 BUG/MINOR: ssl: remove dead code in ssl_sock_from_buf()
When haproxy is compiled in -O0, the SSL_get_max_early_data() symbol is
used in the generated assembly, however -O2 seems to remove this symbol
when optimizing the code.

It happens because `if conn_is_back(conn)` and `if
(objt_listener(conn->target))` are opposed conditions, which means we
never use the branch when objt_listener(conn->target) is true.

This patch removes the dead code. Bonus: SSL_get_max_early_data() is not
implemented in rustls, and that's the only thing preventing us from
starting with it.

This can be backported to all stable branches.
2025-11-19 11:00:05 +01:00
William Lallemand
1f562687e3 CI: github: make install-bin instead of make install
make install now has a dependency on install-admin, which has a
dependency on admin/halog/halog.

halog links haproxy .o together with its own objects, but those objects
when built with ASAN must also be linked with ASAN or it won't be
possible to link the binary.

We don't need an ASAN-ready halog, so let's just do an install-bin
instead that will just install haproxy.
2025-11-18 20:11:23 +01:00
William Lallemand
c3a95ba839 BUILD: Makefile: make install with admin tools
`make install` now installs some admin tools:

- halog in SBINDIR
- haproxy-dump-certs in SBINDIR
- haproxy-reload in SBINDIR
2025-11-18 20:02:24 +01:00
Willy Tarreau
14cb3799df REGTESTS: ssl: split the SSL reuse test into TLS 1.2/1.3
QUIC and TLS don't use the same tests because QUIC only supports
TLS 1.3 while SSL tests both TLS 1.2 and 1.3, which complicates
the test scenarios.

This change extracts the core of the test into a single generic
ssl_reuse.vtci file and creates new high-level tests for TLSv1.2
over TCP, TLSv1.3 over TCP and TLSv1.3 over QUIC, which simply
include this file and set two variables. The test is now cleaner
and simpler.
2025-11-18 16:51:56 +01:00
William Lallemand
177816d2b8 BUG/MINOR: acme: P-256 doesn't work with openssl >= 3.0
When trying to use the P-256 curve in the acme configuration with
OpenSSL 3.x, the generation of the account was failing because OpenSSL
doesn't return a NIST or SECG curve name, but an ANSI X9.62 one.

Since the ANSI X9.62 curve names were not in the list, it couldn't match
anything supported.

This patch fixes the issue by adding both the prime192v1 and prime256v1
names to the struct curve array which is used during curve parsing.

Must be backported to 3.2.
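
As an illustration of the lookup problem, here is a minimal sketch of a name table accepting the NIST, SECG and ANSI X9.62 spellings of the same curve. The table, enum and function are hypothetical and are not the struct curve array from the acme code; only the curve name aliases themselves are standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical curve identifiers for this sketch. */
enum demo_curve { DEMO_CURVE_NONE = 0, DEMO_CURVE_P256, DEMO_CURVE_P384 };

struct demo_curve_name {
    const char *name;
    enum demo_curve id;
};

/* The same curve may be reported under several names depending on the
 * naming scheme used by the TLS library.
 */
const struct demo_curve_name demo_curves[] = {
    { "P-256",      DEMO_CURVE_P256 }, /* NIST name */
    { "secp256r1",  DEMO_CURVE_P256 }, /* SECG name */
    { "prime256v1", DEMO_CURVE_P256 }, /* ANSI X9.62 name */
    { "P-384",      DEMO_CURVE_P384 }, /* NIST name */
    { "secp384r1",  DEMO_CURVE_P384 }, /* SECG name */
};

/* Returns the curve matching <name>, or DEMO_CURVE_NONE if unknown. */
enum demo_curve demo_curve_lookup(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(demo_curves) / sizeof(demo_curves[0]); i++) {
        if (strcmp(demo_curves[i].name, name) == 0)
            return demo_curves[i].id;
    }
    return DEMO_CURVE_NONE;
}
```

Without the "prime256v1" entry, a library reporting the ANSI X9.62 name would fall through to DEMO_CURVE_NONE even though the curve is supported.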
2025-11-18 11:34:28 +01:00
William Lallemand
9bf01a0d29 BUG/MINOR: mworker: wrong signals during startup
Since the new master-worker model in 3.1, signals are registered in
step_init_3(). However, those signals were supposed to be registered
only for the worker or the standalone mode. It would call the wrong
callback in the master even during configuration parsing.

The patch sets the signal handlers to NULL for the master so it does
nothing until they really are registered.

Must be backported as far as 3.1.
2025-11-18 10:27:34 +01:00
William Lallemand
709cde6d08 BUG/MEDIUM: mworker: signals inconsistencies during startup and reload
Since haproxy 3.1, the master-worker mode changed to let the worker
parse the configuration instead of the master.

Previously, signals were blocked during configuration parsing and
unblocked before entering the polling loop of the master. This way it
was impossible to start a reload during the configuration parsing.

But with the new model, the polling loop is started in the master before
the configuration parsing is finished, and the signals are still
unblocked at this step. Meaning that it is possible to start a reload
while the configuration is parsing.

This patch reintroduces the behavior of blocking the signals during
configuration parsing, adapted to the new model:

- Before the exec() of the reload, signals are blocked.
- When entering the polling loop, SIGCHLD is unblocked because it is
  required to get a failure during configuration parsing in the worker.
- Once the configuration is parsed, upon success in _send_status() or
  upon failure in run_master_in_recovery_mode(), all signals are unblocked.

This patch must be backported as far as 3.1.
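
The three steps above map naturally onto POSIX signal masks. The following is a minimal sketch using sigprocmask(); the `demo_` helpers are hypothetical and are not the functions used by the master-worker code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Step 1: block every signal, e.g. right before the exec() of a reload. */
int demo_block_all(void)
{
    sigset_t set;

    sigfillset(&set);
    return sigprocmask(SIG_SETMASK, &set, NULL);
}

/* Step 2: unblock SIGCHLD only, so that a worker failing during
 * configuration parsing can still be noticed.
 */
int demo_unblock_sigchld(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Step 3: unblock everything once parsing has succeeded or failed. */
int demo_unblock_all(void)
{
    sigset_t set;

    sigemptyset(&set);
    return sigprocmask(SIG_SETMASK, &set, NULL);
}

/* Returns 1 if <sig> is currently blocked, 0 if not, -1 on error. */
int demo_is_blocked(int sig)
{
    sigset_t cur;

    /* With a NULL new set, the mask is only queried, not changed. */
    if (sigprocmask(SIG_SETMASK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, sig);
}
```

A blocked signal is not lost: it stays pending and is delivered once unblocked, so a reload request arriving during parsing is deferred rather than dropped.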
2025-11-18 10:05:42 +01:00
William Lallemand
b38405d156 CLEANUP: startup: move confusing msg variable
Move the char *msg variable declared in main() into a sub-block since
there are already multiple msg variables in other sub-blocks in this
function.

Also make it const.
2025-11-18 09:43:25 +01:00
Frederic Lecaille
37d01eea37 BUG/MEDIUM: quic-be: prevent use of MUX for 0-RTT sessions without secrets
The QUIC backend crashes when its peer does not support 0-RTT. In this case,
when the sessions are reused, no early-data level secrets are derived by
the TLS stack. This leads to crashes from qc_send_mux() which does not expect
that both the early-data level (qc->eel) and application level (qc->ael)
encryption levels could be uninitialized.

To fix this:
  - prevent qc_send_mux() from sending data if these two encryption levels are
    not initialized. In this case it returns QUIC_TX_ERR_NONE;
  - avoid waking up the MUX from the XPRT ->start() callback if the MUX is ready
    but there are no early-data level secrets to send data with;
  - ensure the MUX is woken up by qc_ssl_do_handshake() after handshake completion
    if it is ready, by calling qc_notify_send().

Thank you to @InputOutputZ for having reported this issue in GH #3188.

No need to backport because QUIC backend support is a current 3.3 development feature.
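
The first point of the fix can be sketched as an early return when neither encryption level is usable. All types and names below are hypothetical simplifications; the real qc_send_mux() and encryption level structures differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result code mirroring the "no error" case named above. */
enum demo_tx_err { DEMO_TX_ERR_NONE = 0 };

/* Hypothetical encryption level: only key readiness matters here. */
struct demo_enc_level {
    bool tx_keys_ready;
};

struct demo_qc {
    struct demo_enc_level *eel; /* early-data level, may be NULL */
    struct demo_enc_level *ael; /* application level, may be NULL */
    int sent;                   /* frames handed to the sender so far */
};

bool demo_level_ready(const struct demo_enc_level *l)
{
    return l != NULL && l->tx_keys_ready;
}

/* Refuse to send while neither level can encrypt, without reporting
 * an error: the caller is expected to be woken up again later.
 */
enum demo_tx_err demo_send_mux(struct demo_qc *qc, int nframes)
{
    if (!demo_level_ready(qc->eel) && !demo_level_ready(qc->ael))
        return DEMO_TX_ERR_NONE;

    qc->sent += nframes;
    return DEMO_TX_ERR_NONE;
}
```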
2025-11-17 15:40:24 +01:00
William Lallemand
0367227375 MEDIUM: mworker: set the mworker-max-reloads to 50
There was no mworker-max-reloads value by default, it was set to INT_MAX
so this was impossible to reach.

The default value is now 50, which is still high, but no worker should
undergo that many reloads. This means that a worker will be killed with
SIGTERM if it reaches this many reloads.
2025-11-17 11:54:30 +01:00
Amaury Denoyelle
c67a614e45 MINOR: quic: remove <ipv4> arg from qc_new_conn()
Remove <ipv4> argument from qc_new_conn(). This parameter is unnecessary
as it can be derived from the family type of the addresses also passed
as argument.
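
Deriving the information from the address itself can be as simple as the sketch below. The helper name is hypothetical; only the standard sockaddr_storage family field is relied upon.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Returns non-zero if <addr> is an IPv4 address. The family stored in
 * the address makes a separate <ipv4> flag redundant.
 */
int demo_addr_is_ipv4(const struct sockaddr_storage *addr)
{
    return addr->ss_family == AF_INET;
}
```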
2025-11-17 10:20:54 +01:00
Amaury Denoyelle
133f100467 MINOR: quic: refactor qc_new_conn() prototype
The objective of this patch is to streamline qc_new_conn() usage so that
it is similar for frontend and backend sides.

Previously, several parameters were set only for frontend connections.
These arguments are replaced by a single quic_rx_packet argument, which
represents the INITIAL packet triggering the connection allocation on
the server side. For a QUIC client endpoint, it remains NULL. This usage
is considered more explicit.

As a minor change, <target> is moved as the first argument of the
function. This is considered useful as this argument determines whether
the connection is a frontend or backend entry.

Along with these changes, qc_new_conn() documentation has been reworded
so that it is now up-to-date with the newest usage.
2025-11-17 10:13:40 +01:00
Amaury Denoyelle
49edaca513 MINOR: quic: try to clarify quic_conn CIDs fields direction
quic_conn has two fields named <dcid> and <scid>. It may cause confusion
as it is not obvious how these fields are related to the connection
direction. Try to improve this by extending the documentation of these
two fields.
2025-11-17 10:11:04 +01:00
Amaury Denoyelle
035c026220 MINOR: quic: support multiple random CID generation for BE side
When a new backend connection is instantiated, a CID is first randomly
generated. It will serve as the first DCID for incoming packets from the
server. Prior to this patch, if the generated CID collided with an
entry from another connection, an error was reported and the
connection could not be allocated.

This patch improves this procedure by implementing retries when a
collision occurs. Now, at most three attempts will be performed before
giving up. This is the same procedure already performed for CIDs
instantiated after RETIRE_CONNECTION_ID frame parsing.

Along with this functional change, qc_new_conn() is refactored for
backend instantiation. The CID generation is extracted from it and the
value is passed as an argument. This is considered cleaner as the code
is more similar between frontend and backend sides.
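
The bounded retry can be sketched as follows. Everything here is hypothetical: CIDs are reduced to 64-bit values, the set of used CIDs is a small array instead of a tree, and a deterministic generator replaces the random one so the behavior is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CID_RETRIES 3
#define DEMO_CID_MAX     16

/* Hypothetical set of CIDs already in use. */
struct demo_cid_set {
    uint64_t used[DEMO_CID_MAX];
    size_t count;
};

/* Deterministic generator for demonstration: returns 1, 2, 3, ...
 * A real implementation would draw random bytes instead.
 */
uint64_t demo_seq;

uint64_t demo_gen_seq(void)
{
    return ++demo_seq;
}

bool demo_cid_in_use(const struct demo_cid_set *set, uint64_t cid)
{
    size_t i;

    for (i = 0; i < set->count; i++) {
        if (set->used[i] == cid)
            return true;
    }
    return false;
}

/* Try up to DEMO_CID_RETRIES times to generate a CID which does not
 * collide with an existing one. Returns false when giving up.
 */
bool demo_cid_alloc(struct demo_cid_set *set, uint64_t (*gen)(void),
                    uint64_t *out)
{
    int attempt;

    for (attempt = 0; attempt < DEMO_CID_RETRIES; attempt++) {
        uint64_t cid = gen();

        if (demo_cid_in_use(set, cid))
            continue; /* collision: draw another value */
        if (set->count == DEMO_CID_MAX)
            return false; /* no room left in this toy set */
        set->used[set->count++] = cid;
        *out = cid;
        return true;
    }
    return false;
}
```

Passing the generator as an argument loosely mirrors the refactoring described above, where the CID is generated outside of the allocation function and handed to it.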
2025-11-17 10:11:04 +01:00
Amaury Denoyelle
8720130cc7 MINOR: quic: do not use quic_newcid_from_hash64 on BE side
quic_newcid_from_hash64 is an external callback. If defined, it serves
as a CID generation method, as an alternative to the default random
implementation.

This mechanism was not correctly implemented on the backend side.
Indeed, the <hash64> quic_conn member is only set for frontend
connections. The simplest solution would be to properly define it also
for backend ones. However, quic_newcid_from_hash64 derivation is really
only useful for the frontend side for now. Thus, this patch disables
using it on the backend side in favor of the default random generator.

To implement this, quic_cid_generate() is split into two functions, one
for each method of CID generation. It is the responsibility of the
caller to select the proper method. On the backend side, only the random
implementation is now used.
2025-11-17 10:11:04 +01:00
Christopher Faulet
fc6e3e9081 MINOR: stick-tables: Rename stksess shards to use buckets
The shard keyword is already used by the peers and on the server lines, and
it is unrelated to the session keys distribution. So instead of talking
about shard for the session key hashing, we now use the term "bucket".
2025-11-17 07:42:51 +01:00
Willy Tarreau
e5dadb2e8e [RELEASE] Released version 3.3-dev13
Released version 3.3-dev13 with the following main changes :
    - BUG/MEDIUM: config: for word expansion, empty or non-existing are the same
    - BUG/MINOR: quic: close connection on CID alloc failure
    - MINOR: quic: adjust CID conn tree alloc in qc_new_conn()
    - MINOR: quic: split CID alloc/generation function
    - BUG/MEDIUM: quic: handle collision on CID generation
    - MINOR: quic: extend traces on CID allocation
    - MEDIUM/OPTIM: quic: alloc quic_conn after CID collision check
    - MINOR: stats-proxy: ensure future-proof FN_AGE manipulation in me_generate_field()
    - BUG/MEDIUM: stats-file: fix shm-stats-file preload not working anymore
    - BUG/MINOR: do not account backend connections into maxconn
    - BUG/MEDIUM: init: 'devnullfd' not properly closed for master
    - BUG/MINOR: acme: more explicit error when BIO_new_file()
    - BUG/MEDIUM: quic-be: do not launch the connection migration process
    - MINOR: quic-be: Parse the NEW_TOKEN frame
    - MEDIUM: quic-be: Parse, store and reuse tokens provided by NEW_TOKEN
    - MINOR: quic-be: helper functions to save/restore transport params (0-RTT)
    - MINOR: quic-be: helper quic_reuse_srv_params() function to reuse server params (0-RTT)
    - MINOR: quic-be: Save the backend 0-RTT parameters
    - MEDIUM: quic-be: modify ssl_sock_srv_try_reuse_sess() to reuse backend sessions (0-RTT)
    - MINOR: quic-be: allow the preparation of 0-RTT packets
    - MINOR: quic-be: Send post handshake frames from list of frames (0-RTT)
    - MEDIUM: quic-be: qc_send_mux() adaptation for 0-RTT
    - MINOR: quic-be: discard the 0-RTT keys
    - MEDIUM: quic-be: enable the use of 0-RTT
    - MINOR: quic-be: validate the 0-RTT transport parameters
    - MINOR: quic-be: do not create the mux after handshake completion (for 0-RTT)
    - MINOR: quic-be: avoid a useless I/O callback wakeup for 0-RTT sessions
    - BUG/MEDIUM: acme: move from mt_list to a rwlock + ebmbtree
    - BUG/MINOR: acme: can't override the default resolver
    - MINOR: ssl/sample: expose ssl_*c_curve for AWS-LC
    - MINOR: check: delay MUX init when SSL ALPN is used
    - MINOR: cfgdiag: adjust diag on servers
    - BUG/MINOR: check: only try connection reuse for http-check rulesets
    - BUG/MINOR: check: fix reuse-pool if MUX inherited from server
    - MINOR: check: clarify check-reuse-pool interaction with reuse policy
    - DOC: configuration: add missing ssllib_name_startswith()
    - DOC: configuration: add missing openssl_version predicates
    - MINOR: cfgcond: add "awslc_api_atleast" and "awslc_api_before"
    - REGTESTS: ssl: activate ssl_curve_name.vtc for AWS-LC
    - BUILD: ech: fix clang warnings
    - BUG/MEDIUM: stick-tables: Always return the good stksess from stktable_set_entry
    - BUG/MINOR: stick-tables: Fix return value for __stksess_kill()
    - CLEANUP: stick-tables: Don't needlessly compute shard number in stksess_free()
    - MINOR: h1: h1_release() should return if it destroyed the connection
    - BUG/MEDIUM: h1: prevent a crash on HTTP/2 upgrade
    - MINOR: check: use auto SNI for QUIC checks
    - MINOR: check: ensure QUIC checks configuration coherency
    - CLEANUP: peers: remove an unneeded null check
    - Revert "BUG/MEDIUM: connections: permit to permanently remove an idle conn"
    - BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list
    - DEBUG: extend DEBUG_STRESS to ease testing and turn on extra checks
    - DEBUG: add BUG_ON_STRESS(): a BUG_ON() implemented only when DEBUG_STRESS > 0
    - DEBUG: servers: add a few checks for stress-testing idle conns
    - BUG/MINOR: check: fix QUIC check test when QUIC disabled
    - BUG/MINOR: quic-be: missing version negotiation
    - CLEANUP: quic: Missing succesful SSL handshake backend trace (OpenSSL 3.5)
    - BUG/MINOR: quic-be: backend SSL session reuse fix (OpenSSL 3.5)
    - REGTEST: quic: quic/ssl_reuse.vtc supports OpenSSL 3.5 QUIC API
2025-11-14 19:22:46 +01:00
Frederic Lecaille
d8f3ed6c23 REGTEST: quic: quic/ssl_reuse.vtc supports OpenSSL 3.5 QUIC API
This script is supported by OpenSSL 3.5 QUIC API since this previous commit:

   BUG/MINOR: quic: backend SSL session reuse fix (HAVE_OPENSSL_QUIC)

Should be backported where this commit is backported.
2025-11-14 18:06:47 +01:00
Frederic Lecaille
54eeda4b01 BUG/MINOR: quic-be: backend SSL session reuse fix (OpenSSL 3.5)
This bug impacts only the QUIC backends when haproxy is compiled against
OpenSSL 3.5 with QUIC API (HAVE_OPENSSL_QUIC).

The QUIC clients could not reuse their SSL session because the TLS tickets
received from the servers could not be provided to the TLS stack. This should
be done when the stack calls ha_quic_ossl_crypto_recv_rcd()
(OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_RECV_RCD callback).

According to OpenSSL team, an SSL_read() call must be done after the handshake
completion. It seems the correct location is at the same level as for
SSL_process_quic_post_handshake() for quictls.

Thank you to @mattcaswell, @Sashan and @vdukhovni for having helped in solving
this issue.

Must be backported to 3.1
2025-11-14 17:50:49 +01:00
Frederic Lecaille
644bf585c3 CLEANUP: quic: Missing succesful SSL handshake backend trace (OpenSSL 3.5)
This very minor issue impacts only the backend when compiled against OpenSSL 3.5
with QUIC API (HAVE_OPENSSL_QUIC).

The "SSL handshake OK" trace was not dumped by a TRACE() call. This was very
annoying when debugging.

Modify the concerned code section which is a bit ugly and simplify it.
The TRACE() call is done at a unique location from now on.

Should be backported to 3.2 to ease any further backport.
2025-11-14 17:50:49 +01:00
Frederic Lecaille
f0c52f7160 BUG/MINOR: quic-be: missing version negotiation
This bug impacts only the QUIC clients (or backends). The version negotiation
was not supported at all for them. This is an oversight.

Contrary to the QUIC server, which chooses the negotiated version after having
received the transport parameters (in the ClientHello message), the client
selects the negotiated version from the first Initial packet version field.
Indeed, the server transport parameters are inside the ServerHello messages
ciphered into Handshake packets.

This non intrusive patch does not impact the QUIC server implementation.
It only selects the negotiated version from the first Initial packet
received from the server and consequently initializes the TLS cipher context.

Thank you to @InputOutputZ for having reported this issue in GH #3178.

No need to backport because the QUIC backends support arrives with 3.3.
2025-11-14 17:37:34 +01:00
Willy Tarreau
0746aa68b8 BUG/MINOR: check: fix QUIC check test when QUIC disabled
Latest commit ef206d441c ("MINOR: check: ensure QUIC checks configuration
coherency") introduced a regression when QUIC is not compiled in. Indeed,
not specifying a check proto sets mux_proto to NULL, which also happens to
be the value of get_mux_proto("QUIC"), so it complains about QUIC. Let's
add a non-null check in addition to this.

No backport is needed.
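
The regression can be reproduced in miniature: when a lookup returns NULL for "not compiled in", comparing an unset pointer against it yields a false match. The registry and helpers below are hypothetical and only mimic the shape of the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mux descriptor. */
struct demo_mux {
    const char *name;
};

/* Registry of compiled-in muxes. QUIC is deliberately absent to mimic
 * a build without QUIC support.
 */
const struct demo_mux demo_muxes[] = { { "h1" }, { "h2" } };

/* Returns the mux named <name>, or NULL if it is not compiled in. */
const struct demo_mux *demo_get_mux(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(demo_muxes) / sizeof(demo_muxes[0]); i++) {
        if (strcmp(demo_muxes[i].name, name) == 0)
            return &demo_muxes[i];
    }
    return NULL;
}

/* Buggy test: an unset mux (NULL) equals the NULL returned for QUIC. */
bool demo_is_quic_buggy(const struct demo_mux *m)
{
    return m == demo_get_mux("QUIC");
}

/* Fixed test: a NULL mux never matches. */
bool demo_is_quic(const struct demo_mux *m)
{
    return m != NULL && m == demo_get_mux("QUIC");
}
```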
2025-11-14 17:27:53 +01:00
Willy Tarreau
4a6dec7193 DEBUG: servers: add a few checks for stress-testing idle conns
The latest idle conns fix 9481cef948 ("BUG/MEDIUM: connection: do not
reinsert a purgeable conn in idle list") addresses a very hard-to-hit
case which manifests itself when an attempt to reuse a connection fails
because conn->mux is NULL:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x0000655410b8642c in conn_backend_get (reuse_mode=4, srv=srv@entry=0x6554378a7140,
      sess=sess@entry=0x7cfe140948a0, is_safe=is_safe@entry=0,
      hash=hash@entry=910818338996668161) at src/backend.c:1390
  1390     if (conn->mux->takeover && conn->mux->takeover(conn, i, 0) == 0) {

However the condition that leads to this situation can be detected
earlier, by the presence of the connection in the toremove_list, whose
race window is much larger and easier to detect.

This patch adds a few BUG_ON_STRESS() at selected places that can detect
this condition. When built with -DDEBUG_STRESS and run under stress with
two distinct processes communicating over H2 over SSL, under a stress of
400-500k req/s, the front process usually crashes in the first 10-30s
triggering in _srv_add_idle() if the fix above is reverted (and it does
not crash with the fix).

This is mainly included to serve as an illustration of how to instrument
the code for seamless stress testing.
2025-11-14 17:00:17 +01:00
Willy Tarreau
675c86c4aa DEBUG: add BUG_ON_STRESS(): a BUG_ON() implemented only when DEBUG_STRESS > 0
The purpose of this new BUG_ON is beyond BUG_ON_HOT(). While BUG_ON_HOT()
is meant to be light but placed on very hot code paths, BUG_ON_STRESS()
might be heavy and only used under stress-testing, to try to detect early
that something bad is starting to happen. This one is not even type-checked
when not defined because we don't want to risk the compiler emitting the
slightest piece of code there in production mode, so as to give enough
freedom to the developers.
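
A minimal sketch of a check that disappears entirely outside of stress builds. The DEMO_ macros are hypothetical and are not haproxy's definitions; they only illustrate the "not even type-checked when disabled" property.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical build option, disabled unless set on the command line. */
#ifndef DEMO_DEBUG_STRESS
# define DEMO_DEBUG_STRESS 0
#endif

#if DEMO_DEBUG_STRESS > 0
/* Stress builds: evaluate the condition and crash if it is true. */
# define DEMO_BUG_ON_STRESS(cond) do { if (cond) abort(); } while (0)
#else
/* Other builds: the condition is dropped by the preprocessor, so it is
 * neither evaluated nor type-checked and cannot emit any code.
 */
# define DEMO_BUG_ON_STRESS(cond) do { } while (0)
#endif

/* Counts how many times the (possibly expensive) check really ran. */
int demo_checks_run;

int demo_expensive_check(void)
{
    demo_checks_run++;
    return 0; /* 0 means "no bug detected" */
}

void demo_hot_path(void)
{
    DEMO_BUG_ON_STRESS(demo_expensive_check());
}
```

The trade-off is that a condition which no longer compiles goes unnoticed until someone builds with the option enabled.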
2025-11-14 16:42:53 +01:00
Willy Tarreau
3d441e78e5 DEBUG: extend DEBUG_STRESS to ease testing and turn on extra checks
DEBUG_STRESS is currently used only to expose "stress-level". With this
patch, we go a bit further, by automatically forcing DEBUG_STRICT and
DEBUG_STRICT_ACTION to their highest values in order to enable all
BUG_ON levels, and make all of them result in a crash. In addition,
care is taken to always only have 0 or 1 in the macro, so that it can be
tested using "#if DEBUG_STRESS > 0" as well as "if (DEBUG_STRESS) { }"
everywhere.

The goal will be to ease insertion of extra tests for builds dedicated
to stress-testing that enable possibly expensive extra checks on certain
code paths that cannot reasonably be compiled in for production code
right now.
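
Keeping the macro at exactly 0 or 1 is what makes both test forms equivalent. A sketch with a hypothetical DEMO_STRESS option:

```c
#include <assert.h>

/* Normalize the hypothetical DEMO_STRESS build option to exactly 0 or 1,
 * whatever value was passed on the command line.
 */
#ifndef DEMO_STRESS
# define DEMO_STRESS 0
#elif DEMO_STRESS > 0
# undef DEMO_STRESS
# define DEMO_STRESS 1
#else
# undef DEMO_STRESS
# define DEMO_STRESS 0
#endif

/* Returns a bit field showing which test forms saw the option enabled:
 * bit 0 for the preprocessor test, bit 1 for the C expression test.
 */
int demo_extra_checks_enabled(void)
{
    int enabled = 0;

#if DEMO_STRESS > 0
    enabled |= 1;
#endif
    if (DEMO_STRESS)
        enabled |= 2;
    return enabled;
}
```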
2025-11-14 16:38:04 +01:00
Amaury Denoyelle
9481cef948 BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list
A recent patch was introduced to fix a rare race condition in idle
connection code which would result in a crash. The issue occurs when a MUX
IO handler runs on top of a connection moved into the purgeable list. The
connection would be considered as present in the idle list instead, and
reinserted in it at the end of the handler while still in the purge
list.

  096999ee208b8ae306983bc3fd677517d05948d2
  BUG/MEDIUM: connections: permit to permanently remove an idle conn

This patch solves the described issue. However, it introduces another
bug as it may clear connection flags when removing a connection from its
parent list. However, these flags now serve primarily as a status which
indicates that the connection is accounted by the server. When a backend
connection is freed, server idle/used counters are decremented
according to these flags. With the above patch, an incorrect counter
could be adjusted and thus wrapping would occur.

The first impact of this bug is that it may distort the estimated number
of connections needed by servers, which would result either in poor
reuse rate or too many idle connections kept. Another noticeable impact
is that it may prevent server deletion.

The main problem of the original and current issues is that connection
flags are misinterpreted as telling if a connection is present in the
idle list. As already described here, these flags are in fact solely a
status which indicates that the connection is accounted in server
counters. Thus, here are the definitive conclusions that can be drawn:

* (conn->flags & CO_FL_LIST_MASK) != 0:
  the connection is accounted by the server
  it may or may not be present in the idle list

* (conn->flags & CO_FL_LIST_MASK) == 0:
  the connection is not accounted and not present in idle list

The discussion above does not mention session list, but a similar
pattern can be observed when CO_FL_SESS_IDLE flag is set.

To keep the original issue solved and fix the current one, the IO MUX
handlers' prologues are rewritten. Now, flags are no longer checked for
list membership and the LIST_INLIST macro is used instead. This is
definitely clearer with regard to the purpose of conn_in_list here.

On IO MUX handlers end, conn idle flags may be checked if conn_in_list
was true, to reinsert the connection either in idle or safe list. This
is considered safe as no function should modify idle flags when a
connection is not stored in a list, except during conn_free() operation.

This patch must be backported to all stable versions after revert of
the above commit. It should be applicable up to 3.0 without any issue. On
2.8 and below, <idle_list> connection member does not exist. It should
be safe to check <leaf_p> tree node as a replacement.
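
The prologue/epilogue logic can be sketched with a small circular list. This is an illustration under assumptions: the list helpers, flag values and handler are hypothetical, and only the idea of testing list membership instead of flags is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly-linked list; an element pointing to itself
 * is not attached to any list.
 */
struct demo_list {
    struct demo_list *n, *p;
};

/* Illustrative flag values only. */
#define DEMO_FL_SAFE_LIST 0x1u
#define DEMO_FL_IDLE_LIST 0x2u
#define DEMO_FL_LIST_MASK (DEMO_FL_SAFE_LIST | DEMO_FL_IDLE_LIST)

struct demo_conn {
    unsigned int flags;         /* accounting status, not list membership */
    struct demo_list idle_list; /* attachment point to the idle list */
};

void demo_list_init(struct demo_list *l)
{
    l->n = l->p = l;
}

bool demo_list_inlist(const struct demo_list *l)
{
    return l->n != l;
}

void demo_list_append(struct demo_list *head, struct demo_list *el)
{
    el->p = head->p;
    el->n = head;
    head->p->n = el;
    head->p = el;
}

void demo_list_del_init(struct demo_list *el)
{
    el->n->p = el->p;
    el->p->n = el->n;
    demo_list_init(el);
}

/* Prologue: record real list membership and detach the connection.
 * Epilogue: reinsert only if it really was in the list, using the
 * flags merely to decide whether it is still accounted.
 */
bool demo_io_handler(struct demo_list *idle_head, struct demo_conn *conn)
{
    bool conn_in_list = demo_list_inlist(&conn->idle_list);

    if (conn_in_list)
        demo_list_del_init(&conn->idle_list);

    /* ... I/O processing would happen here ... */

    if (conn_in_list && (conn->flags & DEMO_FL_LIST_MASK))
        demo_list_append(idle_head, &conn->idle_list);
    return conn_in_list;
}
```

A connection which is accounted (flags set) but already detached, for instance because it was moved to a purge list, is left alone by the epilogue.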
2025-11-14 16:06:34 +01:00
Amaury Denoyelle
d79295d89b Revert "BUG/MEDIUM: connections: permit to permanently remove an idle conn"
The target patch fixes a rare race condition which happens when a MUX IO
handler is working on a connection already moved into the purge list. In
this case, the handler will incorrectly move the connection back into
the idle list.

To fix this, conn_delete_from_tree() was extended to remove flags along
with the connection from the idle list. This was performed when the
connection is moved into the purge list. However, it introduces another
issue related to the idle server connection accounting. Thus it is
necessary to revert it prior to the incoming newer fix.

This patch must be backported to every version where the original commit
is.
2025-11-14 16:06:34 +01:00
Willy Tarreau
6b9c3d0621 CLEANUP: peers: remove an unneeded null check
Coverity reported in GH #3181 that a NULL test was useless, in
peers_trace(), which is true since the peer always belongs to a
peers section and it was already dereferenced. Let's just remove
the test to avoid the confusion.
2025-11-14 13:47:20 +01:00
Amaury Denoyelle
ef206d441c MINOR: check: ensure QUIC checks configuration coherency
QUIC is now supported on the backend side, thus it is possible to use it
with server checks. However, checks configuration can be quite
extensive, differing greatly from the server settings.

This patch ensures that QUIC checks are always performed under a
controlled context. Objectives are to avoid any crashes and ensure that
there is no surprise for users with respect to the configuration.

The first part of this patch ensures that QUIC checks can only be
activated on QUIC servers. Indeed, QUIC requires dedicated
initialization steps prior to its usage.

The other part of this patch disables QUIC usage when one or multiple
specific check connection settings are specified in the configuration,
diverging from the server settings. This is the simplest solution for
now and ensures that there is no hidden behavior for users. This means
that it's currently impossible to perform QUIC checks on endpoints
other than the server itself. However for now there is no real use-case
for this scenario.

Along with these changes, check-proto documentation is updated to
clarify QUIC checks behavior.
2025-11-14 13:42:08 +01:00
Amaury Denoyelle
ca5a5f37a1 MINOR: check: use auto SNI for QUIC checks
By default, check SNI is set to the Host header when an HTTPS check is
performed. This patch extends this mode so that it is also active when
QUIC checks are executed.

This patch should improve reuse rate with checks. Indeed, SNI is also
already automatically set for normal traffic. The same value must be
used during check so that a connection hash match can be found.
2025-11-14 13:42:08 +01:00
Olivier Houchard
333deef485 BUG/MEDIUM: h1: prevent a crash on HTTP/2 upgrade
Change h1_process() to return -2 when the mux is destroyed but the
connection is not, so that we can differentiate between "both mux and
connection were destroyed" and "only the mux was destroyed".
It can happen that only the mux gets destroyed, and the connection is
still alive, if we did upgrade it to HTTP/2.
In h1_wake(), if the connection is alive, then return 0, as the wake
methods should only return -1 if the connection is dead.
This fixes a bug where the ssl xprt would consider the connection
destroyed, and thus would consider its tasklet should die, and return
NULL, and its TASK_RUNNING flag would never be removed, leading to an
infinite loop later on. This would happen anytime an HTTP/2 upgrade was
successful.

This should be backported up to 2.8. While the bug was exposed by commit
00f43b7c8b136515653bcb2fc014b0832ec32d61, it was only by chance that it
was not triggered before, and it exists in previous releases too.
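
The return value convention described above can be sketched as follows. The state structure and functions are hypothetical simplifications of h1_process() and h1_wake(); only the meaning of 0, -1 and -2 comes from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what happened while processing. */
struct demo_state {
    bool conn_error;     /* fatal error: mux and connection are destroyed */
    bool upgraded_to_h2; /* upgrade: only the h1 mux is destroyed */
};

/* Returns 0 if everything is alive, -1 if both the mux and the
 * connection were destroyed, -2 if only the mux was destroyed.
 */
int demo_process(const struct demo_state *s)
{
    if (s->upgraded_to_h2)
        return -2;
    if (s->conn_error)
        return -1;
    return 0;
}

/* The wake method must return -1 only when the connection is dead, so
 * the "mux only" case is reported as 0 to upper layers.
 */
int demo_wake(const struct demo_state *s)
{
    return demo_process(s) == -1 ? -1 : 0;
}
```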
2025-11-14 12:49:35 +01:00
Olivier Houchard
2f8f09854f MINOR: h1: h1_release() should return if it destroyed the connection
h1_release() is called to destroy everything related to the mux h1,
usually even the connection. However, it handles upgrades to HTTP/2 too,
in which case the h1 mux will be destroyed, but the connection will
still be alive. So make it so it returns 0 if everything is destroyed,
and -1 if the connection is still alive.

This should be backported up to 2.8, as a future bugfix will depend on
it.
2025-11-14 12:49:35 +01:00
Christopher Faulet
14a333c4f4 CLEANUP: stick-tables: Don't needlessly compute shard number in stksess_free()
Since commit 0bda33a3e ("MINOR: stick-tables: remove the uneeded read lock
in stksess_free()"), the lock on the shard is no longer acquired. So it is
useless to still compute the shard number. The result is never used and can
be safely removed.
2025-11-14 11:56:14 +01:00
Christopher Faulet
346d6c3ac7 BUG/MINOR: stick-tables: Fix return value for __stksess_kill()
The commit 9938fb9c7 ("BUG/MEDIUM: stick-tables: Fix race with peers when
killing a sticky session") introduced a regression.

__stksess_kill() must always return 0 if the session cannot be released. But
when the ref_cnt is tested under the update lock, a success is reported if
the session is still in use. 0 must be returned in that case.

This bug is harmless because callers never use the return value of
__stksess_kill() or stksess_kill().

This bug must be backported as far as 3.0.
2025-11-14 11:56:14 +01:00
Christopher Faulet
bd4fff9a76 BUG/MEDIUM: stick-tables: Always return the good stksess from stktable_set_entry
In stktable_set_entry(), the return value of __stktable_store() is not
tested while it is possible to get an existing session with the same key
instead of the one we want to insert. It happens when we fail to upgrade
the read lock on the bucket to a write lock. In that case, we release the
lock for a short time to get a write lock.

So, to fix the bug, we must check the session returned by __stktable_store()
and take care to return this one.

The bug was introduced by the commit e62885237c ("MEDIUM: stick-table: make
stktable_set_entry() look up under a read lock"). It must be backported as
far as 2.8.
2025-11-14 11:56:12 +01:00
William Lallemand
bf639e581d BUILD: ech: fix clang warnings
No impact as the state is either SHOW_ECH_SPECIFIC or SHOW_ECH_ALL but
never anything else.

src/ech.c:240:6: error: variable 'p' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
  240 |         if (ctx->state == SHOW_ECH_ALL) {
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
src/ech.c:275:12: note: uninitialized use occurs here
  275 |         ctx->pp = p;
      |                   ^
src/ech.c:240:2: note: remove the 'if' if its condition is always true
  240 |         if (ctx->state == SHOW_ECH_ALL) {
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ech.c:228:17: note: initialize the variable 'p' to silence this warning
  228 |         struct proxy *p;
      |                        ^
      |                         = NULL
src/ech.c:240:6: error: variable 'bind_conf' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
  240 |         if (ctx->state == SHOW_ECH_ALL) {
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
src/ech.c:276:11: note: uninitialized use occurs here
  276 |         ctx->b = bind_conf;
      |                  ^~~~~~~~~
src/ech.c:240:2: note: remove the 'if' if its condition is always true
  240 |         if (ctx->state == SHOW_ECH_ALL) {
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ech.c:229:29: note: initialize the variable 'bind_conf' to silence this warning
  229 |         struct bind_conf *bind_conf;
      |                                    ^
      |                                     = NULL
2 errors generated.
make: *** [Makefile:1062: src/ech.o] Error 1
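
A reduced illustration of this class of warning and of one common way to
silence it, following clang's own suggestion to initialize the variable. The
names below (toy_state, toy_pick) are hypothetical and the actual change in
src/ech.c may differ.

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_SHOW_SPECIFIC, TOY_SHOW_ALL };

/* Without the "= NULL" initializer, clang reports that <p> may be used
 * uninitialized when neither branch is taken, even if the caller never
 * passes any other state.
 */
static const int *toy_pick(enum toy_state st, const int *all, const int *specific)
{
	const int *p = NULL;

	assert(all != NULL && specific != NULL);

	if (st == TOY_SHOW_ALL)
		p = all;
	else if (st == TOY_SHOW_SPECIFIC)
		p = specific;

	return p;
}
```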
2025-11-14 11:35:38 +01:00
William Lallemand
e17881128b REGTESTS: ssl: activate ssl_curve_name.vtc for AWS-LC
It was difficult to test ssl_curve_name.vtc with AWS-LC without a way to
check the AWS-LC API. Let's add awslc_api_atleast() in the start
conditions.
2025-11-14 11:01:45 +01:00
William Lallemand
3d15c07ed0 MINOR: cfgcond: add "awslc_api_atleast" and "awslc_api_before"
AWS-LC features are not easily tested with just the openssl version
constant. AWS-LC uses its own API versioning stored in the
AWSLC_API_VERSION constant.

This patch adds the two awslc_api_atleast and awslc_api_before predicates
that help to check the AWS-LC API.
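
The semantics of the two predicates can be sketched as plain comparisons
against the library's API version. The functions below are hypothetical, not
the cfgcond implementation. The convention that 0 means "not built with
AWS-LC" is an assumption made for this example.

```c
#include <assert.h>

/* <lib_api> is the API version of the TLS library the program was built
 * against. Assumed convention for this sketch: 0 means "not AWS-LC", in
 * which case both predicates evaluate to false.
 */
static int toy_awslc_api_atleast(int lib_api, int wanted)
{
	assert(wanted >= 0);
	return lib_api != 0 && lib_api >= wanted;
}

static int toy_awslc_api_before(int lib_api, int wanted)
{
	assert(wanted >= 0);
	return lib_api != 0 && lib_api < wanted;
}
```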
2025-11-14 11:01:45 +01:00
William Lallemand
35d21a8bc0 DOC: configuration: add missing openssl_version predicates
Add missing openssl_version_atleast() and openssl_version_before()
predicates.

The predicates exist since 3aeb3f9347 ("MINOR: cfgcond: implements
openssl_version_atleast and openssl_version_before").

Must be backported to every stable version.
2025-11-14 11:01:45 +01:00
William Lallemand
9ad018a3dd DOC: configuration: add missing ssllib_name_startswith()
Add the missing ssllib_name_startswith() predicate in the documentation.

The predicate was introduced with b01179aa9 ("MINOR: ssl: Add
ssllib_name_startswith precondition").

Must be backported as far as 2.6.
2025-11-14 11:01:45 +01:00
Amaury Denoyelle
8415254cea MINOR: check: clarify check-reuse-pool interaction with reuse policy
check-reuse-pool can only perform as expected if reuse policy on the
backend is set to aggressive or higher. Update the documentation to
reflect this and implement a server diag warning.
2025-11-14 10:44:05 +01:00
Amaury Denoyelle
52a7d4ec39 BUG/MINOR: check: fix reuse-pool if MUX inherited from server
Check reuse is only performed if no specific check connect options are
specified on the configuration. This ensures that reuse won't be
performed if intending to use different connection parameters from the
default traffic.

This relies on tcpcheck_use_nondefault_connect() which indicates if the
check has any specific connection parameters. One of them is the check
<mux_proto> field. However, this field may be automatically set during
init_srv_check() in some specific conditions without any explicit
configuration, most notably when using http-check rulesets on an HTTP
backend. Thus, it prevents connection reuse for these checks.

This commit fixes this by adjusting tcpcheck_use_nondefault_connect().
Besides checking the check <mux_proto> field, it also detects if it is
different from the server configuration. This is sufficient to know if
the value is derived from the configuration or automatically calculated
in init_srv_check().
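
The adjusted test can be sketched as below. The structures are hypothetical,
heavily reduced views written for this example (the real ones carry many more
fields), and comparing MUX protocols by pointer is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the real structures. */
struct toy_mux_proto { const char *name; };

struct toy_server { const struct toy_mux_proto *mux_proto; };

struct toy_check {
	const struct toy_server    *server;
	const struct toy_mux_proto *mux_proto; /* NULL when unset */
	int has_explicit_addr;                 /* all other connect options, folded into one flag */
};

/* Returns non-zero if the check needs connection parameters that differ
 * from the ones used for regular traffic, in which case reuse is skipped.
 */
static int toy_check_uses_nondefault_connect(const struct toy_check *chk)
{
	assert(chk != NULL && chk->server != NULL);

	if (chk->has_explicit_addr)
		return 1;

	/* a MUX protocol only counts as specific if it differs from the server's */
	if (chk->mux_proto && chk->mux_proto != chk->server->mux_proto)
		return 1;

	return 0;
}
```

With this logic, a check whose MUX protocol was set automatically to the same
value as the server's no longer blocks reuse.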

Note that this patch introduces a small behavior change. Prior to it,
check reuse was never performed if "check-proto" was explicitly
configured. Now, check reuse will be performed if the configured value
is identical to the server MUX protocol. This is considered as
acceptable as connection reuse is safe when using a similar MUX
protocol.

This must be backported up to 3.2.
2025-11-14 10:44:05 +01:00
Amaury Denoyelle
5d021c028e BUG/MINOR: check: only try connection reuse for http-check rulesets
In 3.2, a new server keyword "check-reuse-pool" has been introduced. It
allows to reuse a connection for a new check, instead of always
initializing a new one. This is only performed if the check does not
rely on specific connection parameters differing from the server.

This patch further restricts reuse for checks only when an HTTP ruleset
is used at the backend level. Indeed, reusing a connection outside of
HTTP is an undefined behavior. The impact of this bug is unknown and
depends on the proxy/server configuration. In the case of an HTTP
backend with non-HTTP checks, check-reuse-pool would probably cause a
drop in reuse rate.

Along this change, implement a new diagnostic warning on servers to
report that check-reuse-pool cannot apply due to an incompatible check
type.

This must be backported up to 3.2.
2025-11-14 10:44:03 +01:00
Amaury Denoyelle
d92f8f84fb MINOR: cfgdiag: adjust diag on servers
Adjust code dealing with diagnostics performed on server. The objective
is to extract the check on duplicate cookies in a dedicated function
outside of the proxies/servers loop.

This does not have any noticeable impact. This patch is merely a code
improvement to easily implement future diagnostics on servers.
2025-11-14 10:00:26 +01:00
Amaury Denoyelle
d12971dfea MINOR: check: delay MUX init when SSL ALPN is used
When instantiating a new connection for check, its MUX may be
initialized early. This was not performed though if SSL ALPN negotiation
will be used, except if check MUX is already fixed.

However, this method of initialization is problematic when QUIC MUX is
used. Indeed, this multiplexer must only be instantiated after the above
application protocol is known, which is derived from the ALPN
negotiation. If this is not the case a crash will occur in qmux_init().

In fact, a similar problem was already encountered for normal traffic.
Thus, a change was performed in connect_server() : MUX early
initialization is now always skipped if SSL ALPN negotiation is active,
even if MUX is already fixed. This patch introduces a similar change for
checks.

Without this patch, it is not possible to perform check on QUIC servers
as expected. Indeed, when http-check ruleset is active a crash would
occur prior to it.
2025-11-14 09:49:04 +01:00
Damien Claisse
1d46c08689 MINOR: ssl/sample: expose ssl_*c_curve for AWS-LC
The underlying SSL_get_negotiated_group function has been backported
into AWS-LC [1], so expose the feature for users of this TLS stack
as well. Note that even though it was actually added in AWS-LC 1.56.0,
we require AWSLC_API_VERSION >= 35 which was released in AWS-LC 1.57.0,
because API version wasn't incremented after this change. As the delta
is one minor version (less than two weeks), I consider this acceptable
to avoid relying on a proxy constant like TLSEXT_nid_unknown which
might be removed at some point.

[1] d6a37244ad
2025-11-13 17:36:43 +01:00
William Lallemand
b9b158ea4c BUG/MINOR: acme: can't override the default resolver
httpclient_acme_init() was called in cfg_parse_acme() which is at
section parsing. httpclient_acme_init() also calls
httpclient_create_proxy() which could create a "default" resolvers
section if it doesn't exist.

If one tries to override the default resolvers section after an ACME
section, the resolvers section parsing will fail because the section was
already created by httpclient_create_proxy().

This patch fixes the issue by moving the initialization of the ACME
proxy to a pre_check callback, which is called just before
check_config_validity().

Must be backported in 3.2.
2025-11-13 17:17:11 +01:00
William Lallemand
2bdf5a7937 BUG/MEDIUM: acme: move from mt_list to a rwlock + ebmbtree
The current ACME scheduler suffers from problems due to the way the
tasks are stored:

- MT_LISTs do not scale when there are a lot of ACME tasks and a
  specific one has to be looked up.
- the acme_task pointer was stored in the ckch_store in order to avoid
  walking through the whole list. But a ckch_store can be updated and
  the pointer lost in the previous one.
- when a task fails, the ptr in the ckch_store was not removed because
  we only work with a copy of the original ckch_store, it would need to
  lock the ckchs_tree and remove this pointer.

This patch fixes the issues by removing the MT_LIST-based architecture,
and replacing it by a simple ebmbtree + rwlock design.

The pointer to the task is not stored anymore in the ckch_store, but
instead it is stored in the acme_tasks tree. Finding a task is done by
doing a lookup on this tree with a RDLOCK.
Instead of checking if store->acme_task is not NULL, a lookup is also
done.

This allows removing the stuck "acme_task" pointer in the store, which
was preventing an ACME task from being restarted when the previous one
failed for this specific certificate.

Must be backported in 3.2.
2025-11-13 15:18:12 +01:00
Frederic Lecaille
c76e072e43 MINOR: quic-be: avoid a useless I/O callback wakeup for 0-RTT sessions
For backends and 0-RTT sessions, this patch modifies the ->start() callback to
wake up the I/O callback only if the connection (and the mux) is not ready. Note that
connect_server() has been modified to call this xprt callback just after having
created the mux and installed the mux. Contrary to 1-RTT session, for 0-RTT sessions,
the connections are always ready before calling this ->start xprt callback.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
92d2ab76e0 MINOR: quic-be: do not create the mux after handshake completion (for 0-RTT)
This is required during connection with 0-RTT support, to prevent two mux creations.
Indeed, for 0-RTT sessions, the QUIC mux is already started very soon from
connect_server() (src/backend.c).
2025-11-13 14:04:31 +01:00
Frederic Lecaille
d84463f9f6 MINOR: quic-be: validate the 0-RTT transport parameters
During 0-RTT sessions, some server transport parameters are reused after having
been saved from previous sessions. The server must not reduce these parameters
when it resends them. The client must check this is the case when some early data
are accepted by the server. This is what is implemented by this patch.

Implement qc_early_tranport_params_validate() which checks the new server parameters
are not reduced.
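
A minimal sketch of such a "not reduced" check is shown below. The structure
and function names are hypothetical. The fields are a subset of the limits
that RFC 9000 requires a client to remember for 0-RTT. The real function may
cover more parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the limits remembered from a previous session. */
struct toy_early_tps {
	uint64_t initial_max_data;
	uint64_t initial_max_stream_data_bidi_local;
	uint64_t initial_max_stream_data_bidi_remote;
	uint64_t initial_max_stream_data_uni;
	uint64_t initial_max_streams_bidi;
	uint64_t initial_max_streams_uni;
};

/* Returns 1 if none of the limits in <fresh> is lower than the value saved
 * in <saved>, 0 otherwise.
 */
static int toy_early_tps_validate(const struct toy_early_tps *saved,
                                  const struct toy_early_tps *fresh)
{
	assert(saved != NULL && fresh != NULL);

	return fresh->initial_max_data >= saved->initial_max_data &&
	       fresh->initial_max_stream_data_bidi_local >= saved->initial_max_stream_data_bidi_local &&
	       fresh->initial_max_stream_data_bidi_remote >= saved->initial_max_stream_data_bidi_remote &&
	       fresh->initial_max_stream_data_uni >= saved->initial_max_stream_data_uni &&
	       fresh->initial_max_streams_bidi >= saved->initial_max_streams_bidi &&
	       fresh->initial_max_streams_uni >= saved->initial_max_streams_uni;
}
```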

Also implement qc_ssl_eary_data_accepted() which was not implemented for TLS
stacks without 0-RTT support (for instance wolfssl). That said, this function
was no longer used, which is why the compilation against wolfssl did not fail.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
6419b9f204 MEDIUM: quic-be: enable the use of 0-RTT
This patch allows the use of 0-RTT feature on QUIC server lines with "allow-0rtt"
option. In fact 0-RTT is really enabled only if ssl_sock_srv_try_reuse_sess()
successfully manages to reuse the SSL session and the chosen application protocol
from previous connections.

Note that, at this time, 0-RTT works only with quictls and aws-lc as TLS stack.

(0-RTT does not work at all (even for QUIC frontends) with libressl).
2025-11-13 14:04:31 +01:00
Frederic Lecaille
46d490f7c2 MINOR: quic-be: discard the 0-RTT keys
This patch allows the discarding of the 0-RTT keys as soon as 1-RTT keys
are available.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
3f60891360 MEDIUM: quic-be: qc_send_mux() adaptation for 0-RTT
When entering this function, a selection is done about the encryption level
to be used to send data. For a client, the early data encryption level
is used to send 0-RTT if this encryption level is initialized.

The Initial encryption is also registered to the send list for clients if there
is Initial crypto data to send. This allows Initial and 0-RTT packets to
be coalesced into the same datagrams.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
a4bbbc75db MINOR: quic-be: Send post handshake frames from list of frames (0-RTT)
This patch is required to make 0-RTT work. It modifies the prototype of
quic_build_post_handshake_frames() to send post handshake frames from a
list of frames in place of the application encryption level (used
as <qc->ael> local variable).

This patch does not modify at all the current QUIC stack behavior (even for
QUIC frontends). It must be considered as a preparation for the code
to come about 0-RTT support for QUIC backends.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
ac1d3eba88 MINOR: quic-be: allow the preparation of 0-RTT packets
A QUIC server never sends 0-RTT packets contrary to the client.

This very simple modification allows the preparation of 0-RTT packets
with early data as encryption level (->eel).
2025-11-13 14:04:31 +01:00
Frederic Lecaille
6e14365a5b MEDIUM: quic-be: modify ssl_sock_srv_try_reuse_sess() to reuse backend sessions (0-RTT)
This function is called for both TCP and QUIC connections to reuse SSL sessions
saved by ssl_sess_new_srv_cb() callback called upon new SSL session creation.

In addition to this, a QUIC SSL session must reuse the ALPN and some specific QUIC
transport parameters. This is what is added by this patch for QUIC 0-RTT sessions.

Note that from now on, ssl_sock_srv_try_reuse_sess() may fail for QUIC connections
if it did not manage to reuse the ALPN. The caller must be informed of such an
issue. It must not enable 0-RTT for the current session in this case. This is
impossible without ALPN which is required to start a mux.

ssl_sock_srv_try_reuse_sess() is modified to always succeed for TCP connections.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
5309dfb56b MINOR: quic-be: Save the backend 0-RTT parameters
For both TCP and QUIC connections, this is ssl_sess_new_srv_cb() callback which
is called when a new SSL session is created. Its role is to save the session to
be reused for the next sessions.

This patch modifies this callback to save the QUIC parameters to be reused
for the next 0-RTT sessions (or during SSL session resumption).

The already existing path_params->nego_alpn member is used to store the ALPN as
this is done for TCP alongside path_params->tps new quic_early_transport_params
struct used to save the QUIC transport parameters to be reused for 0-RTT sessions.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
41e40eb431 MINOR: quic-be: helper quic_reuse_srv_params() function to reuse server params (0-RTT)
Implement quic_reuse_srv_params() whose role is to reuse the ALPN negotiated
during a first connection to a QUIC backend alongside its transport parameters.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
33564ca54c MINOR: quic-be: helper functions to save/restore transport params (0-RTT)
Define quic_early_transport_params new struct for QUIC transport parameters
in relation with 0-RTT. These parameters must be saved during a first session to
be reused for subsequent 0-RTT sessions.

qc_early_transport_params_cpy() copies the 0-RTT transport parameters to be
saved during a first connection to a backend. The copy is made from
a quic_transport_params struct to a quic_early_transport_params struct.

On the contrary, qc_early_transport_params_reuse() copies the transport parameters
to be reused for a 0-RTT session from a previous one. The copy is made
from a quic_early_transport_params struct to a quic_transport_params struct.

Also add QUIC_EV_EARLY_TRANSP_PARAMS trace event to dump such 0-RTT
transport parameters from traces.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
80070fe51c MEDIUM: quic-be: Parse, store and reuse tokens provided by NEW_TOKEN
Add a per thread ist struct to srv_per_thread struct to store the QUIC token to
be reused for subsequent sessions.

Parse at packet level (from qc_parse_ptk_frms()) these tokens and store
them by calling the newly implemented qc_try_store_new_token() function,
which does its best (it may fail) to update the tokens.

Modify qc_do_build_pkt() to resend these tokens calling quic_enc_token()
implemented by this patch.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
8f23d4d287 MINOR: quic-be: Parse the NEW_TOKEN frame
Rename ->data qf_new_token struct field to ->w_data to distinguish it from
->r_data new field used to parse the NEW_TOKEN frame. Indeed to build the
NEW_TOKEN we need to write it to a static buffer into the frame struct. To
parse it we only need to store the address of the token field into the
RX buffer.
2025-11-13 14:04:31 +01:00
Frederic Lecaille
64e32a0767 BUG/MEDIUM: quic-be: do not launch the connection migration process
At this time the connection migration is not supported by QUIC backends.
This patch prevents this process to be launched for connections to QUIC backends.

Furthermore, the connection migration process could be started systematically
when connecting a backend to INADDR_ANY, leading to crashes into qc_handle_conn_migration()
(when referencing qc->li).

Thank you to @InputOutputZ for having reported this issue in GH #3178.

This patch simply checks the connection type (listener or not) before checking if
a connection migration must be started.

No need to backport because support for QUIC backends is available from 3.3.
2025-11-13 13:52:40 +01:00
William Lallemand
071e5063d8 BUG/MINOR: acme: more explicit error when BIO_new_file()
Replace the error message of BIO_new_file() when the account-key cannot
be created on disk by "acme: cannot create the file '%s'". It was
previously "acme: out of memory.", which is unclear.

Must be backported to 3.2.
2025-11-13 11:56:33 +01:00
Remi Tricot-Le Breton
1b19e4ef32 BUG/MEDIUM: init: 'devnullfd' not properly closed for master
Since commit "1ec59d3 MINOR: init: Make devnullfd global and create it
earlier in init" the devnullfd pointing towards /dev/null gets created
early in the init process but it was closed after the call to
"mworker_run_master". The master process never got to the FD closing
code and we had an FD leak.

This patch does not need to be backported.
2025-11-12 16:06:28 +01:00
Amaury Denoyelle
7927ee95f3 BUG/MINOR: do not account backend connections into maxconn
Remove QUIC backend connections from global actconn accounting. Indeed,
this counter is only used on the frontend side. This is required to
ensure maxconn coherence.
2025-11-12 14:45:00 +01:00
Aurelien DARRAGON
3262da84ea BUG/MEDIUM: stats-file: fix shm-stats-file preload not working anymore
Due to recent commit 5c299dee ("MEDIUM: stats: consider that shared stats
pointers may be NULL") shm-stats-file preloading suddenly stopped working.

In fact preloading should be considered as an initializing step so the
counters may be assigned there without checking for NULL first.
Indeed they are supposed to be NULL because preloading occurs before
counters_{fe,be}_shared_prepare() which takes care of setting the pointers
for counters if they weren't set before.

Obviously this corner-case was overlooked during 5c299dee writing and
testing. Thanks to Nick Ramirez for having reported the issue.

No backport needed, this issue is specific to 3.3.
2025-11-11 22:36:17 +01:00
Aurelien DARRAGON
a287841578 MINOR: stats-proxy: ensure future-proof FN_AGE manipulation in me_generate_field()
Commit ad1bdc33 ("BUG/MAJOR: stats-file: fix crash on non-x86 platform
caused by unaligned cast") revealed an ambiguity in me_generate_field()
around FN_AGE manipulation. For now FN_AGE can only be stored as u32 or
s32, but in the future we could also support 64bit FN_AGES, and the
current code assumes 32bits types and performs an explicit unsigned int
cast. Instead we group current 32 bits operations for FF_U32 and FF_S32
formats, and leave room for potential future formats for FN_AGE.

Commit ad1bdc33 also suggested that the fix was temporary and the approach
must change, but after a code review it turns out the current approach
(generic types manipulation under me_generate_field()) is legit. The
introduction of shm-stats-file feature didn't change the logic which
was initially implemented in 3.0. It only extended it and since shared
stats are now spread over thread-groups since 3.3, the use of atomic
operations made typecasting errors more visible, and structure mapping
change from d655ed5f14 ("BUG/MAJOR: stats-file: ensure
shm_stats_file_object struct mapping consistency (2nd attempt)") was in
fact the only change to blame for the crash on non-x86 platforms.

With ambiguities removed in me_generate_field(), let's hope we don't face
similar bugs in the future. Indeed, with generic counters, and more
specifically shared ones (which leverage atomic ops), great care must be
taken when changing their underlying types as me_generate_field() solely
relies on stat_col descriptor to know how to read the stat from a generic
pointer, so any breaking change must be reflected in that function as well.

No backport needed.
2025-11-10 21:32:22 +01:00
Amaury Denoyelle
5a8728d03a MEDIUM/OPTIM: quic: alloc quic_conn after CID collision check
On Initial packet parsing, a new quic_conn instance is allocated via
qc_new_conn(). Then a CID is allocated with its value derived from
client ODCID. On CID tree insert, a collision can occur if another
thread was already parsing an Initial packet from the same client. In
this case, the connection is released and the packet will be requeued to
the other thread.

Originally, CID collision check was performed prior to quic_conn
allocation. This was changed by the commit below, as this could cause
issue on quic_conn alloc failure.

  commit 4ae29be18c5b212dd2a1a8e9fa0ee2fcb9dbb4b3
  BUG/MINOR: quic: Possible endless loop in quic_lstnr_dghdlr()

However, this procedure is less optimal. Indeed, qc_new_conn() performs
many steps, thus it could be better to skip it on Initial CID collision,
which can happen frequently. This patch restores the older order of
operations, with CID collision check prior to quic_conn allocation.

To ensure this does not cause again the same bug, the CID is removed in
case of quic_conn alloc failure. This should prevent any loop as it
ensures that a CID found in the global tree does not point to a NULL
quic_conn, unless the CID is attached to a foreign thread. When this thread
parses a re-enqueued packet, either the quic_conn is already
allocated or the CID has been removed, triggering a fresh CID and
quic_conn allocation procedure.
2025-11-10 12:10:14 +01:00
Amaury Denoyelle
a9d11ab7f3 MINOR: quic: extend traces on CID allocation
Add new traces to detect the CID generation method and also when an
Initial packet is requeued due to CID collision.
2025-11-10 12:10:14 +01:00
Amaury Denoyelle
2623e0a0b7 BUG/MEDIUM: quic: handle collision on CID generation
CIDs are provided by haproxy so that the peer can use them as DCID of
its packets. Their value is set via a random generator. It happens on
several occasions during connection lifetime:
* via ODCID derivation if haproxy is the server
* on quic_conn init if haproxy is the client
* during post-handshake if haproxy is the server
* on RETIRE_CONNECTION_ID frame parsing

CIDs are stored in a global tree. On ODCID derivation, a check is
performed to ensure the CID is not a duplicate value. This is mandatory
to properly handle multiple INITIAL packets from the same client on
different thread.

However, for the other cases, no check is performed for CID collision.
As _quic_cid_insert() is silent, the issue is not detected at all. This
results in a CID advertized to the peer but not stored in the global
tree. In the end, this may cause two issues. The first one is that
packets from the client which use the new CID will be rejected by
haproxy, most probably with a STATELESS_RESET. The second issue is that
it can cause a crash during quic_conn release. Indeed, the CID is stored
in the quic_conn local tree and thus eb_delete() for the global tree
will be performed. As <leaf_p> member is uninit, this results in a
segfault.

Note that this issue is pretty rare. It can only be observed if running
with a high number of concurrent connections in parallel, so that the
random generator will provide duplicate values. Patch is still labelled
as MEDIUM as this modifies code paths used frequently.

To fix this, _quic_cid_insert() unsafe function is completely removed.
Instead, quic_cid_insert() can be used, which reports an error code if a
collision happens. CID are then stored in the quic_conn tree only after
global tree insert success. Here is the solution for each step if a
collision occurs :
* on init as client: the connection is completely released
* post-handshake: the CID is immediately released. The connection is
  kept, but it will miss an extra CID.
* on RETIRE_CONNECTION_ID parsing: a loop is implemented to retry random
  generation. If it fails several times, the connection is closed in
  error.
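
The retry strategy used on RETIRE_CONNECTION_ID parsing can be sketched as
below. Everything here is hypothetical and written for illustration: a flat
array replaces the global tree, the retry limit is arbitrary, and the
<candidates> array stands in for successive outputs of the random generator
so that the behavior is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CIDS 16
#define TOY_RETRIES   3

/* Hypothetical stand-in for the global CID tree. */
struct toy_cid_tree {
	uint64_t cids[TOY_MAX_CIDS];
	size_t   count;
};

/* Returns 0 on success, -1 if <cid> is already present or no room is left.
 * Unlike a silent insert, the caller always learns about a collision.
 */
static int toy_cid_insert(struct toy_cid_tree *t, uint64_t cid)
{
	size_t i;

	assert(t != NULL);

	for (i = 0; i < t->count; i++)
		if (t->cids[i] == cid)
			return -1;

	if (t->count == TOY_MAX_CIDS)
		return -1;

	t->cids[t->count++] = cid;
	return 0;
}

/* Tries up to TOY_RETRIES candidate values. On success the accepted CID is
 * written to <out> and 0 is returned. On failure -1 is returned and the
 * caller is expected to close the connection.
 */
static int toy_cid_alloc_retry(struct toy_cid_tree *global,
                               const uint64_t *candidates, size_t n,
                               uint64_t *out)
{
	size_t attempt;

	for (attempt = 0; attempt < n && attempt < TOY_RETRIES; attempt++) {
		if (toy_cid_insert(global, candidates[attempt]) == 0) {
			*out = candidates[attempt];
			return 0;
		}
	}
	return -1;
}
```

The important property is that a CID only becomes usable by the connection
after the global insert has succeeded.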

A small convenience change is made to quic_cid_insert(). Output
parameter <new_tid> can now be NULL, which is useful as most of the
time callers do not care about it.

This must be backported up to 2.6.
2025-11-10 12:10:14 +01:00
Amaury Denoyelle
419e5509d8 MINOR: quic: split CID alloc/generation function
Split new_quic_cid() function into multiple ones. This patch should not
introduce any visible change. The objective is to render CID allocation
and generation more modular.

The first advantage of this patch is to bring code simplification. In
particular, conn CID sequence number increment and insertion into
connection tree is simpler than before. Another improvement is that
errors can now be handled more easily at each step of the CID
init.

This patch is a prerequisite for the fix on CID collision, thus it must
be backported prior to it to every affected version.
2025-11-10 12:10:14 +01:00
Amaury Denoyelle
0ef473ba6b MINOR: quic: adjust CID conn tree alloc in qc_new_conn()
Change qc_new_conn() so that the connection CID tree is allocated
earlier in the function. This patch does not introduce a behavior
change. Its objective is to facilitate future evolutions on CIDs
handling.

This patch is a prerequisite for the fix on CID collision, thus it must
be backported prior to it to every affected version.
2025-11-10 12:10:14 +01:00
Amaury Denoyelle
73621adb23 BUG/MINOR: quic: close connection on CID alloc failure
During RETIRE_CONNECTION_ID frame parsing, a new connection ID is
immediately reallocated after the release of the previous one. This is
done to ensure that the peer will never run out of DCID.

Prior to this patch, a CID allocation failure was silently ignored.
This prevented the emission of a new CID, which could prevent the peer from
emitting packets if it had no other CIDs available for use. Now, such an error
is considered fatal to the connection. This is the safest solution as
it's better to close connections when memory is running low.

It must be backported up to 2.8.
2025-11-10 12:10:14 +01:00
Willy Tarreau
137d5ba93f BUG/MEDIUM: config: for word expansion, empty or non-existing are the same
Amaury reported a case where "${FOO[*]}" still produces an empty field.
It happens if the variable is defined but does not contain any non-space
characters. The reason is that we special-case word expansion only on
non-existing vars. Let's change the ordering of operations so that word-
expanded vars always pretend the current arg is not an empty quote, so
that we don't make any difference between a non-existing var and an
empty one.

No backport is needed unless commit 1968731765 ("BUG/MEDIUM: config:
solve the empty argument problem again") is.
2025-11-10 11:59:35 +01:00
Willy Tarreau
b26a6d50c6 [RELEASE] Released version 3.3-dev12
Released version 3.3-dev12 with the following main changes :
    - MINOR: quic: enable SSL on QUIC servers automatically
    - MINOR: quic: reject conf with QUIC servers if not compiled
    - OPTIM: quic: adjust automatic ALPN setting for QUIC servers
    - MINOR: sample: optional AAD parameter support to aes_gcm_enc/dec
    - REGTESTS: converters: check USE_OPENSSL in aes_gcm.vtc
    - BUG/MINOR: resolvers: ensure fair round robin iteration
    - BUG/MAJOR: stats-file: fix crash on non-x86 platform caused by unaligned cast
    - OPTIM: backend: skip conn reuse for incompatible proxies
    - SCRIPTS: build-ssl: allow to build a FIPS version without FIPS
    - OPTIM: proxy: move atomically access fields out of the read-only ones
    - SCRIPTS: build-ssl: fix rpath in AWS-LC install for openssl and bssl bin
    - CI: github: update to macos-26
    - BUG/MINOR: quic: fix crash on client handshake abort
    - MINOR: quic: do not set conn member if ssl_sock_ctx
    - MINOR: quic: remove connection arg from qc_new_conn()
    - BUG/MEDIUM: server: Add a rwlock to path parameter
    - BUG/MEDIUM: server: Also call srv_reset_path_parameters() on srv up
    - BUG/MEDIUM: mux-h1: fix 414 / 431 status code reporting
    - BUG/MEDIUM: mux-h2: make sure not to move a dead connection to idle
    - BUG/MEDIUM: connections: permit to permanently remove an idle conn
    - MEDIUM: cfgparse: deprecate 'master-worker' keyword alone
    - MEDIUM: cfgparse: 'daemon' not compatible with -Ws
    - DOC: configuration: deprecate the master-worker keyword
    - MINOR: quic: remove <mux_state> field
    - BUG/MEDIUM: stick-tables: Make sure we handle expiration on all tables
    - MEDIUM: stick-tables: Optimize the expiration process a bit.
    - MEDIUM: ssl/ckch: use ckch_store instead of ckch_data for ckch_conf_kws
    - MINOR: acme: generate a temporary key pair
    - MEDIUM: acme: generate a key pair when no file are available
    - BUILD: ssl/ckch: wrong function name in ckch_conf_kws
    - BUILD: acme: acme_gen_tmp_x509() signedness and unused variables
    - BUG/MINOR: acme: fix initialization issue in acme_gen_tmp_x509()
    - BUILD: ssl/ckch: fix ckch_conf_kws parsing without ACME
    - MINOR: server: move the lock inside srv_add_idle()
    - DOC: acme: crt-store allows you to start without a certificate
    - BUG/MINOR: acme: allow 'key' when generating cert
    - MINOR: stconn: Add counters to SC to know number of bytes received and sent
    - MINOR: stream: Add samples to get number of bytes received or sent on each side
    - MINOR: counters: Add req_in/req_out/res_in/res_out counters for fe/be/srv/li
    - MINOR: stream: Remove bytes_in and bytes_out counters from stream
    - MINOR: counters: Remove bytes_in and bytes_out counter from fe/be/srv/li
    - MINOR: stats: Add stats about request and response bytes received and sent
    - MINOR: applet: Add function to get amount of data in the output buffer
    - MINOR: channel: Remove total field from channels
    - DEBUG: stream: Add bytes_in/bytes_out value for both SC in session dump
    - MEDIUM: stktables: Limit the number of stick counters to 100
    - BUG/MINOR: config: Limit "tune.maxpollevents" parameter to 1000000
    - BUG/MEDIUM: server: close a race around ready_srv when deleting a server
    - BUG/MINOR: config: emit warning for empty args when *not* in discovery mode
    - BUG/MEDIUM: config: solve the empty argument problem again
    - MEDIUM: config: now reject configs with empty arguments
    - MINOR: tools: add support for ist to the word fingerprinting functions
    - MINOR: tools: add env_suggest() to suggest alternate variable names
    - MINOR: tools: have parse_line's error pointer point to unknown variable names
    - MINOR: cfgparse: try to suggest correct variable names on errors
    - IMPORT: cebtree: Replace offset calculation with offsetof to avoid UB
    - BUG/MINOR: acme: wrong dns-01 challenge in the log
    - MEDIUM: backend: Defer conn_xprt_start() after mux creation
    - MINOR: peers: Improve traces for peers
    - MEDIUM: peers: No longer ack updates during a full resync
    - MEDIUM: peers: Remove commitupdate field on stick-tables
    - BUG/MEDIUM: peers: Fix update message parsing during a full resync
    - MINOR: sample/stats: Add "bytes" in req_{in,out} and res_{in,out} names
    - BUG/MEDIUM: stick-tables: Make sure updates are seen as local
    - BUG/MEDIUM: proxy: use aligned allocations for struct proxy
    - BUG/MEDIUM: proxy: use aligned allocations for struct proxy_per_tgroup
    - BUG/MINOR: acme: avoid a possible crash on error paths
2025-11-08 12:12:00 +01:00
Willy Tarreau
5574163073 BUG/MINOR: acme: avoid a possible crash on error paths
In acme_EVP_PKEY_gen(), an error message is printed if *errmsg is set,
however, since commit 546c67d13 ("MINOR: acme: generate a temporary key
pair"), errmsg is passed as NULL in at least one occurrence, leading
the compiler to issue a NULL deref warning at -O3. And indeed, if the
errors are encountered, a crash will occur. No backport is needed.
2025-11-07 22:27:25 +01:00
Willy Tarreau
fb8edd0ce6 BUG/MEDIUM: proxy: use aligned allocations for struct proxy_per_tgroup
In 3.2, commit f879b9a18 ("MINOR: proxies: Add a per-thread group field
to struct proxy") introduced struct proxy_per_tgroup that is declared as
thread_aligned, but is allocated using calloc(). Thus it is at risk of
crashing on machines using instructions requiring 64-byte alignment such
as AVX512. Let's use ha_aligned_zalloc_typed() instead of calloc().

For 3.2, we don't have aligned allocations, so instead the THREAD_ALIGNED()
will have to be removed from the struct definition. Alternately, we could
manually align it as is done for fdtab.
2025-11-07 22:22:55 +01:00
Willy Tarreau
df9eb2e7b6 BUG/MEDIUM: proxy: use aligned allocations for struct proxy
Commit fd012b6c5 ("OPTIM: proxy: move atomically access fields out of
the read-only ones") caused the proxy struct to be 64-byte aligned,
which allows the compiler to use optimizations such as AVX512 to zero
certain fields. However the struct was allocated using calloc() so it
was not necessarily aligned, causing segv on startup on compatible
machines. Let's just use ha_aligned_zalloc_typed() to allocate the
struct.

No backport is needed.
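
The alignment problem described above can be illustrated outside of HAProxy.
The following is only a minimal sketch, assuming a C11 toolchain: it uses the
standard aligned_alloc() as a stand-in for ha_aligned_zalloc_typed(), and the
demo_proxy struct and demo_* helpers are hypothetical names, not HAProxy code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_CACHELINE 64

/* Hypothetical struct whose declared alignment is 64 bytes. */
struct demo_proxy {
    _Alignas(DEMO_CACHELINE) unsigned long counters[8];
    int id;
};

/* aligned_alloc() wants a size that is a multiple of the alignment;
 * sizeof() of an over-aligned struct always satisfies this.
 */
static_assert(sizeof(struct demo_proxy) % DEMO_CACHELINE == 0,
              "struct size must be a multiple of its alignment");

/* Return a zeroed object honouring the struct's alignment, or NULL.
 * calloc() alone only guarantees the alignment of max_align_t.
 */
static struct demo_proxy *demo_proxy_zalloc(void)
{
    struct demo_proxy *px = aligned_alloc(DEMO_CACHELINE, sizeof(*px));

    if (px)
        memset(px, 0, sizeof(*px));
    return px;
}

/* Non-zero if <p> is a multiple of <align>. */
static int demo_is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

An object returned by demo_proxy_zalloc() is released with free(), like any
other aligned_alloc() result.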
2025-11-07 22:22:55 +01:00
Olivier Houchard
c26bcfc1e3 BUG/MEDIUM: stick-tables: Make sure updates are seen as local
In stktable_touch_with_exp, if it is a local update, add it to the
pending update list even if it's already in the tree as a remote update,
otherwise it will never be communicated to other peers.
It used to work before 3.2 because of the ordering of operations, but
it's been broken by adding an extra step with the pending update list,
so we now have to explicitly check for that.

This should be backported to 3.2.
2025-11-07 16:23:21 +01:00
Christopher Faulet
7d1787ba8e MINOR: sample/stats: Add "bytes" in req_{in,out} and res_{in,out} names
The numbers of bytes received or sent by a client or a server are now
saved. The sample fetches and stats fields used to retrieve this information
are renamed to add "bytes" to their names, to avoid any ambiguity with the
number of requests and responses.
2025-11-07 14:09:48 +01:00
Christopher Faulet
f12252c7a5 BUG/MEDIUM: peers: Fix update message parsing during a full resync
The commit 590c5ff2e ("MEDIUM: peers: No longer ack updates during a full
resync") introduced a regression. During a full resync, the ID of an update
message is not parsed at all. Thus, the parsing of the whole message is
desynchronized.

On full resync the update id itself is ignored, to not be acked, but it must
be parsed. It is now fixed.

It is a 3.3-specific bug, no backport needed.
2025-11-07 12:47:34 +01:00
Christopher Faulet
ecc2c3a35d MEDIUM: peers: Remove commitupdate field on stick-tables
This stick-table field was atomically updated with the last update id pushed
and dumped on the CLI but never used otherwise. And all peer sessions share
the same id because it is a stick-table info. So the info in peers dump is
pretty limited.

So, let's remove it.
2025-11-07 12:17:53 +01:00
Christopher Faulet
590c5ff2ed MEDIUM: peers: No longer ack updates during a full resync
ACK messages received by a peer sending updates during a full resync are
ignored. So, on the other side, there is no reason to still send these ACK
messages. Let's skip them.

In addition, the updates received during this stage are not considered as
needing to be acked. It is important to be sure to properly emit ACK messages
once the full sync is finished.
2025-11-07 11:50:13 +01:00
Christopher Faulet
383bf11306 MINOR: peers: Improve traces for peers
Trace messages for peers were only protocol oriented and the information
provided was quite light. With this patch, the traces are improved.
Information about the peer, its applet and the section is dumped. Several
verbosities are now available and messages are dumped at different levels
depending on the context. It should be easier to track issues in the peers.
2025-11-07 11:50:13 +01:00
Olivier Houchard
25559e7055 MEDIUM: backend: Defer conn_xprt_start() after mux creation
In connect_server(), defer the call to conn_xprt_start() until after we
had a chance to create the mux. The xprt can behave differently
depending on if a mux is or is not available at this point, as if it is,
it may want to wait until some data comes from the mux.

This does not need to be backported.
2025-11-07 11:40:52 +01:00
William Lallemand
3bc90d01d1 BUG/MINOR: acme: wrong dns-01 challenge in the log
Since 861fe532046 ("MINOR: acme: add the dns-01-record field to the
sink"), the dns-01 challenge is output in the dns_record trash, instead
of the global trash.

The send_log string was never updated with this change, and dumps some
data from the global trash instead. Since the last data emitted in the
trash seems to be the dns-01 token from the authorization object, it
looks like the response to the challenge.

This must be backported to 3.2.
2025-11-07 09:49:04 +01:00
Ben Kallus
d5ca3bb3b4 IMPORT: cebtree: Replace offset calculation with offsetof to avoid UB
This is the same as the equivalent fix in ebtree:

The C standard specifies that it's undefined behavior to dereference
NULL (even if you use & right after). The hand-rolled offsetof idiom
&(((s*)NULL)->f) is thus technically undefined. This clutters the
output of UBSan and is simple to fix: just use the real offsetof when
it's available.

This is cebtree commit 2d08958858c2b8a1da880061aed941324e20e748.
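
As an illustration of the idiom being replaced, here is a small sketch of a
"container_of"-style lookup built on the standard offsetof() macro rather
than on a NULL-based pointer computation. The demo_node and demo_item types
and the helper name are hypothetical and are not taken from cebtree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node embedded inside a larger item. */
struct demo_node {
    int key;
};

struct demo_item {
    char name[12];
    struct demo_node n;
};

/* Recover the enclosing item from a pointer to its embedded node.
 * offsetof() is provided by <stddef.h> and involves no NULL dereference,
 * unlike the hand-rolled &(((s*)NULL)->f) form.
 */
static struct demo_item *demo_item_from_node(struct demo_node *n)
{
    assert(n != NULL);
    return (struct demo_item *)((char *)n - offsetof(struct demo_item, n));
}
```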
2025-11-07 07:32:58 +01:00
Willy Tarreau
4c3351fd63 MINOR: cfgparse: try to suggest correct variable names on errors
When an empty argument comes from the use of a non-existing variable,
we'll now detect the difference with an empty variable (error pointer
points to the variable's name instead), and submit it to env_suggest()
to see if another variable looks likely to be the right one or not.

This can be quite useful to quickly figure out how to fix misspelled variable
names. Currently only series of letters, digits and underscores are
attempted to be resolved as a name. A typical example is:

   peer "${HAPROXY_LOCAL_PEER}" 127.0.0.1:10000

which produces:

  [ALERT]    (24231) : config : parsing [bug-argv4.cfg:2]: argument number 1 at position 13 is empty and marks the end of the argument list:
    peer "${HAPROXY_LOCAL_PEER}" 127.0.0.1:10000
            ^
  [NOTICE]   (24231) : config : Hint: maybe you meant HAPROXY_LOCALPEER instead ?
2025-11-06 19:57:44 +01:00
Willy Tarreau
49585049b9 MINOR: tools: have parse_line's error pointer point to unknown variable names
When an argument is empty, parse_line() currently returns a pointer to
the empty string itself. This is convenient, but it's only actionable by
the user who will see for example "${HAPROXY_LOCALPEER}" and figure what
is wrong. Here we slightly change the reported pointer so that if an empty
argument results from the evaluation of an empty variable (meaning that
all variables in string are empty and no other char is present), then
instead of pointing to the opening quote, we'll return a pointer to the
first character of the variable's name. This will allow to make a
difference between an empty variable and an unknown variable, and for
the caller to take action based on this.

I.e. before we would get:

    log "${LOG_SERVER_IP}" local0
        ^

if LOG_SERVER_IP is not set, and now instead we'll get this:

    log "${LOG_SERVER_IP}" local0
           ^
2025-11-06 19:57:44 +01:00
Willy Tarreau
14087e48b9 MINOR: tools: add env_suggest() to suggest alternate variable names
The purpose here is to look in the environment for a variable whose
name looks like the provided one. This will be used to try to auto-
correct misspelled environment variables that would silently be turned
to an empty string.
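
To give an idea of what "a name that looks like the provided one" can mean,
here is a rough sketch. Note that it is NOT the word fingerprinting method
mentioned in these commits: it swaps in a plain Levenshtein edit distance,
which is easier to show in a few lines. All demo_* names, the 64-character
limit and the distance threshold are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_NAME 64

static size_t demo_min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;

    return m < c ? m : c;
}

/* Levenshtein distance between two names of at most DEMO_MAX_NAME chars,
 * computed with a single rolling row.
 */
static size_t demo_edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t row[DEMO_MAX_NAME + 1];
    size_t i, j;

    assert(la <= DEMO_MAX_NAME && lb <= DEMO_MAX_NAME);

    for (j = 0; j <= lb; j++)
        row[j] = j;

    for (i = 1; i <= la; i++) {
        size_t diag = row[0];      /* cell (i-1, j-1) */

        row[0] = i;
        for (j = 1; j <= lb; j++) {
            size_t up = row[j];    /* cell (i-1, j) */
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;

            row[j] = demo_min3(up + 1, row[j - 1] + 1, diag + cost);
            diag = up;
        }
    }
    return row[lb];
}

/* Return the known name closest to <wanted>, or NULL if none is within
 * <max_dist> edits.
 */
static const char *demo_suggest(const char *wanted, const char *const *known,
                                size_t count, size_t max_dist)
{
    const char *best = NULL;
    size_t best_dist = max_dist + 1;
    size_t i;

    for (i = 0; i < count; i++) {
        size_t d = demo_edit_distance(wanted, known[i]);

        if (d < best_dist) {
            best_dist = d;
            best = known[i];
        }
    }
    return best;
}
```

With a list of known names containing HAPROXY_LOCALPEER, looking up
HAPROXY_LOCAL_PEER is one deletion away and is therefore suggested, while an
unrelated name yields no suggestion.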
2025-11-06 19:57:44 +01:00
Willy Tarreau
a4d78dd4f5 MINOR: tools: add support for ist to the word fingerprinting functions
The word fingerprinting functions are used to compare similar words to
suggest a correctly spelled one that looks like what the user proposed.
Currently the functions only support const char*, but there's no reason
for this, and it would be convenient to support substrings extracted
from random pieces of configurations. Here we're adding new variants
"_with_len" that take these ISTs and which are in fact a slight change
of the original ones that the old ones now rely on.
2025-11-06 19:57:44 +01:00
Willy Tarreau
d9d0721bc9 MEDIUM: config: now reject configs with empty arguments
As prepared during 3.2, we must error on empty arguments because they
mark the end of the line and cause subsequent arguments to be silently
ignored. It was too late in 3.2 to turn that into an error so it's a
warning, but for 3.3 it needed to be an alert.

This patch does that. It doesn't instantly break, instead it counts
one fatal error per violating line. This allows to emit several errors
at once, which can often be caused by the same variable being missed,
or a group of variables sharing a same misspelled prefix for example.
Tests show that it helps locate them better. It also explains what to
look for in the config manual for help with variables expansion.
2025-11-06 19:57:44 +01:00
Willy Tarreau
1968731765 BUG/MEDIUM: config: solve the empty argument problem again
This mostly reverts commit ff8db5a85 ("BUG/MINOR: config: Stopped parsing
upon unmatched environment variables").

As explained in commit #2367, finally the fix above was incorrect because
it causes other trouble such as this:

     log "192.168.100.${NODE}" "local0"

being resolved to this:

     log 192.168.100.local0

when NODE does not exist due to the loss of the spaces. In fact, while none
of us was well aware of this, when the user had:

     server app 127.0.0.1:80 "${NO_CHECK}" weight 123

in fact they should have written it this way:

     server app 127.0.0.1:80 "${NO_CHECK[*]}" weight 123

so that the variable is expanded to zero, one or multiple words, leaving
no empty arg (like in shell). This is supported since 2.3 with commit
fa41cb6 so the right fix is in the config, let's revert the fix and
properly address the issue.

Some changes are necessary however, since after that patch, the in_arg
checks were added and are now inserting an empty argument even for
proper error reporting. For example, the following statement:

    acl foo path "/a" "${FOO[*]}" "/b"

would complain about an empty arg at FOO due to in_arg=1, while dropping
this in_arg=1 with the following config:

    acl foo path "/a" "${FOO}" "/b"

would silently stop after "/a" instead of complaining about an empty
field. So the approach here consists in noting whether or not something
was written since the quotes were emitted, in order to decide whether
or not to produce an argument. This way, "" continues to be an explicitly
empty arg, just like the same with an unknown variable, while "${FOO[*]}"
is allowed to prevent the creation of an argument if empty.

This should be backported to *some* versions, but the risk that some
configs were altered to rely on the broken fix is not null. At least
recent LTS should be reverted. Note that this requires previous commit:

    BUG/MINOR: config: emit warning for empty args when *not* in discovery mode

otherwise this will break again configs relying on HAPROXY_LOCALPEER and
maybe a few other variables set at the end of discovery.
2025-11-06 19:57:44 +01:00
Willy Tarreau
004e1be48e BUG/MINOR: config: emit warning for empty args when *not* in discovery mode
This actually reverses the condition of commit 5f1fad1690 ("BUG/MINOR:
config: emit warning for empty args only in discovery mode"). Indeed,
some variables are not known in discovery mode (e.g. HAPROXY_LOCALPEER),
and statements like:

   peer "${HAPROXY_LOCALPEER}" 127.0.0.1:10000

are broken during discovery mode. It turns out that the warning is
currently hidden by commit ff8db5a85d ("BUG/MINOR: config: Stopped
parsing upon unmatched environment variables") since it silently drops
empty args which is sufficient to hide the warning, but it also breaks
other configs and needs to be reverted, which will break configs like
above again.

In issue #2995 we were not fully decided about discovery mode or not,
and already suspected some possible issues without being able to guess
which ones. The only downside of not displaying them in discovery mode
is that certain empty fields on the rare keywords specific to master
mode might remain silent until used. Let's just flip the condition to
check for empty args in normal mode only.

This should be backported to 3.2 after some time of observation.
2025-11-06 19:57:44 +01:00
Willy Tarreau
0144426dfb BUG/MEDIUM: server: close a race around ready_srv when deleting a server
When a server is being disabled or deleted, in case it matches the
backend's ready_srv, this one is reset. However it's currently done in
a non-atomic way when the server goes down, and that could occasionally
reset the entry matching another server, but more importantly if in
parallel some requests are dequeued for that server, it may re-appear
there after having been removed, leading to a possible crash once it
is fully removed, as shown in issue #3177.

Let's make sure we reset the pointer when detaching the server from
the proxy, and use a CAS in both cases to only reset this server.

This fix needs to be backported to 3.2. There, srv_detach() is in
server.c instead of server.h. Thanks to Basha Mougamadou for the
detailed report and the useful backtraces.
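
The "only reset this server" logic can be sketched with C11 atomics. This is
merely an illustration under assumptions: the demo_backend and demo_server
types are hypothetical, and the standard atomic_compare_exchange_strong() is
used here in place of HAProxy's own atomic macros.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical minimal server and backend. */
struct demo_server {
    int id;
};

struct demo_backend {
    _Atomic(struct demo_server *) ready_srv;
};

/* Clear be->ready_srv only if it still designates <srv>. A plain store of
 * NULL could erase an entry that another thread just set for a different
 * server; the compare-and-swap leaves such an entry untouched.
 * Returns 1 if the pointer was cleared, 0 otherwise.
 */
static int demo_clear_ready_srv(struct demo_backend *be,
                                struct demo_server *srv)
{
    struct demo_server *expected = srv;

    assert(be != NULL);
    return atomic_compare_exchange_strong(&be->ready_srv, &expected,
                                          (struct demo_server *)NULL);
}
```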
2025-11-06 19:57:44 +01:00
Christopher Faulet
c6f68901cc BUG/MINOR: config: Limit "tune.maxpollevents" parameter to 1000000
"tune.maxpollevents" global parameter was not limited. It was possible to
set any integer value. But this value is used to allocate the array of
events used by epoll. With a huge value, it seems the allocation silently
fails, making haproxy totally unresponsive.

So let's limit its value to 1 million. It is pretty high and it should
not be an issue to forbid greater values. The documentation was updated
accordingly.

This patch could be backported to all stable branches.
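
For illustration, a bounded integer setting of this kind can be validated as
below. This is a sketch only, not the parser used by haproxy: the function
name is hypothetical, and the lower bound of 1 is an assumption since the
message above only states the upper limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_MAX_POLL_EVENTS 1000000

/* Parse <arg> as a decimal integer and store it into <*out> if it lies in
 * [1, DEMO_MAX_POLL_EVENTS]. Returns 0 on success, -1 on any error
 * (not a number, trailing garbage, overflow, or out of range).
 */
static int demo_parse_maxpollevents(const char *arg, int *out)
{
    char *end;
    long v;

    assert(arg != NULL && out != NULL);

    errno = 0;
    v = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;

    /* lower bound is an assumption made for this example */
    if (v < 1 || v > DEMO_MAX_POLL_EVENTS)
        return -1;

    *out = (int)v;
    return 0;
}
```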
2025-11-06 15:56:21 +01:00
Christopher Faulet
80edbad4f9 MEDIUM: stktables: Limit the number of stick counters to 100
"tune.stick-counters" global parameter was accepting any positive integer
value. But the maximum value is incredibly high. Setting a huge value has
a significant impact on memory and CPU usage. To avoid any issue, this value
is now limited to 100. It should be large enough for all usages.

It can be seen as a breaking change.
2025-11-06 15:01:29 +01:00
Christopher Faulet
949199a2f4 DEBUG: stream: Add bytes_in/bytes_out value for both SC in session dump
It could be handy to have these infos in the full session dump. So let's
dump it now.
2025-11-06 15:01:29 +01:00
Christopher Faulet
a1b5325a7a MINOR: channel: Remove total field from channels
The <total> field in the channel structure is now useless, so it can be
removed. The <bytes_in> field from the SC is used instead.

This patch is related to issue #1617.
2025-11-06 15:01:29 +01:00
Christopher Faulet
1effe0fc0a MINOR: applet: Add function to get amount of data in the output buffer
The helper function applet_output_data() returns the amount of data in the
output buffer of an applet. For applets using the new API, it is based on
data present in the outbuf buffer. For legacy applets, it is based on input
data present in the input channel's buffer. The HTX version,
applet_htx_output_data(), is also available.

This patch is related to issue #1617.
2025-11-06 15:01:29 +01:00
Christopher Faulet
4991a51208 MINOR: stats: Add stats about request and response bytes received and sent
In previous patches, these counters were added per frontend, backend, server
and listener. With this patch, these counters are reported on stats,
including promex.

Note that the stats file minor version was incremented by one because the
shm_stats_file_object struct size has changed.

This patch is related to issue #1617.
2025-11-06 15:01:29 +01:00
Christopher Faulet
0084baa6ba MINOR: counters: Remove bytes_in and bytes_out counter from fe/be/srv/li
bytes_in and bytes_out counters per frontend, backend, listener and server
were removed and we now rely on the req_in and res_in counters,
respectively.

This patch is related to issue #1617.
2025-11-06 15:01:29 +01:00
Christopher Faulet
567df50d91 MINOR: stream: Remove bytes_in and bytes_out counters from stream
Per-stream bytes_in and bytes_out counters were removed and replaced by
req.in and res.in. The corresponding samples still exist but rely on the new
counters.

This patch is related to issue #1617.
2025-11-06 15:01:29 +01:00
Christopher Faulet
1c62a6f501 MINOR: counters: Add req_in/req_out/res_in/res_out counters for fe/be/srv/li
Thanks to the previous patch, and based on info available on the stream, it
is now possible to have counters for frontends, backends, servers and
listeners to report number of bytes received and sent on both sides.

This patch is related to issue #1617.
2025-11-06 15:01:29 +01:00
Christopher Faulet
ac9201f929 MINOR: stream: Add samples to get number of bytes received or sent on each side
req.in and req.out samples can now be used to get the number of bytes
received by a client and sent to the server. And res.in and res.out samples
can be used to get the number of bytes received by a server and sent to the
client. This information is stored in the logs structure inside a stream.

This patch is related to issue #1617.
2025-11-06 15:01:28 +01:00
Christopher Faulet
629fbbce19 MINOR: stconn: Add counters to SC to know number of bytes received and sent
<bytes_in> and <bytes_out> counters were added to SC to count, respectively,
the number of bytes received from an endpoint or sent to an endpoint. These
counters are updated for connections and applets.

This patch is related to issue #1617.
2025-11-06 15:01:28 +01:00
William Lallemand
094baa1cc0 BUG/MINOR: acme: allow 'key' when generating cert
Allow to use the 'key' keyword when 'crt' was generated with both a crt
and a key.

No backport needed.
2025-11-06 14:11:43 +01:00
William Lallemand
05036180d9 DOC: acme: crt-store allows you to start without a certificate
If your acme certificate is declared in a crt-store, and the certificate
file does not exist on the disk, HAProxy will start with a temporary key
pair.
2025-11-06 13:40:42 +01:00
Willy Tarreau
5fe4677231 MINOR: server: move the lock inside srv_add_idle()
Almost all callers of _srv_add_idle() lock the list then call the
function. It's not the most efficient and it requires some care from
the caller to take care of that lock. Let's change this a little bit by
having srv_add_idle() that takes the lock and calls _srv_add_idle() that
is now inlined. This way callers don't have to handle the lock themselves
anymore, and the lock is only taken around the sensitive parts, not the
function call+return.

Interestingly, perf tests show a small perf increase from 2.28-2.32M RPS
to 2.32-2.37M RPS on a 128-thread system.
2025-11-06 13:16:24 +01:00
William Lallemand
a8498cde74 BUILD: ssl/ckch: fix ckch_conf_kws parsing without ACME
Without ACME, the tmp_pkey and tmp_x509 functions are not available, the
patch checks HAVE_ACME to use them.
2025-11-06 12:27:27 +01:00
William Lallemand
22f92804d6 BUG/MINOR: acme: fix initialization issue in acme_gen_tmp_x509()
src/acme.c: In function ‘acme_gen_tmp_x509’:
src/acme.c:2685:15: error: ‘digest’ may be used uninitialized [-Werror=maybe-uninitialized]
 2685 |         if (!(X509_sign(newcrt, pkey, digest)))
      |              ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/acme.c:2628:23: note: ‘digest’ was declared here
 2628 |         const EVP_MD *digest;
      |                       ^~~~~~
2025-11-06 12:12:18 +01:00
William Lallemand
0524af034f BUILD: acme: acme_gen_tmp_x509() signedness and unused variables
Fix compilation issues in acme_gen_tmp_x509().

src/acme.c:2665:66: warning: pointer targets in passing argument 4 of ‘X509_NAME_add_entry_by_txt’ differ in signedness [-Wpointer-sign]
 2665 |         if (X509_NAME_add_entry_by_txt(name, "CN", MBSTRING_ASC, "expired",
      |                                                                  ^~~~~~~~~
      |                                                                  |
      |                                                                  char *
In file included from /usr/include/openssl/ssl.h:32,
                 from include/haproxy/openssl-compat.h:19,
                 from include/haproxy/acme-t.h:6,
                 from src/acme.c:16:
/usr/include/openssl/x509.h:1074:53: note: expected ‘const unsigned char *’ but argument is of type ‘char *’
 1074 |                                const unsigned char *bytes, int len, int loc,
      |                                ~~~~~~~~~~~~~~~~~~~~~^~~~~
src/acme.c:2630:23: warning: unused variable ‘i’ [-Wunused-variable]
 2630 |         unsigned int  i;
      |                       ^
src/acme.c:2629:23: warning: unused variable ‘ctx’ [-Wunused-variable]
 2629 |         X509V3_CTX    ctx;
      |                       ^~~
2025-11-06 12:08:04 +01:00
William Lallemand
a15d4f5b19 BUILD: ssl/ckch: wrong function name in ckch_conf_kws
ckch_conf_load_pem does not exist anymore and
ckch_conf_load_pem_or_generate must be used instead
2025-11-06 12:03:29 +01:00
William Lallemand
582a1430b2 MEDIUM: acme: generate a key pair when no file are available
When an acme keyword is associated to a crt and key, and the corresponding
files do not exist, HAProxy would not start.

This patch allows to configure acme without pre-generating a keypair before
starting HAProxy. If the files do not exist, it tries to generate a unique
keypair in memory, that will be used for every ACME certificate that doesn't
have a file on the disk yet.
2025-11-06 11:56:27 +01:00
William Lallemand
546c67d137 MINOR: acme: generate a temporary key pair
This patch provides two functions acme_gen_tmp_pkey() and
acme_gen_tmp_x509().

These functions generate a unique keypair and X509 certificate that
will be stored in tmp_x509 and tmp_pkey. If the key pair or certificate
was already generated they will return the existing one.

The key is an RSA2048 and the X509 is generated with an expiration in the
past. The CN is "expired".

These are just placeholders to be used if we don't have files.
2025-11-06 11:56:27 +01:00
William Lallemand
1df55b441b MEDIUM: ssl/ckch: use ckch_store instead of ckch_data for ckch_conf_kws
This is an API change, instead of passing a ckch_data alone, the
ckch_conf_kws.func() is called with a ckch_store.

This allows the callback to access the whole ckch_store, with the
ckch_conf and the ckch_data. But it requires the ckch_conf to be
actually put in the ckch_store before.
2025-11-06 11:56:27 +01:00
Olivier Houchard
201971ec5f MEDIUM: stick-tables: Optimize the expiration process a bit.
In process_tables_expire(), if the table we're analyzing still has
entries, and thus should be put back into the tree, do not put it in the
mt_list, to have it put back into the tree the next time the task runs.
There is no problem with putting it in the tree right away, as either
the next expiration is in the future, or we handled the maximum number
of expirations per task call and we're about to stop, anyway.

This does not need to be backported.
2025-11-05 19:22:11 +01:00
Olivier Houchard
93f994e8b1 BUG/MEDIUM: stick-tables: Make sure we handle expiration on all tables
In process_tables_expire(), when parsing all the tables with expiration
set, to check if any entry expired, make sure we start from the
oldest one, we can't just rely on eb32_first(), because of sign issues
on the timestamp.
Not doing that may mean some tables are not considered for expiration.

This does not need to be backported.
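
The sign issue mentioned above is typical of wrapping 32-bit timestamps: an
unsigned comparison says that 0xFFFFFFF0 comes after 0x00000010, while in
wrapping time it is 32 ticks earlier. The sketch below shows the usual
signed-difference comparison. The demo_* helpers are hypothetical names for
this example, and the result is only meaningful while the two values are
less than 2^31 ticks apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Non-zero if <a> is strictly before <b> in wrapping 32-bit time. The
 * unsigned subtraction wraps, and the cast to a signed type (modular on
 * usual two's complement targets) exposes the ordering as the sign.
 */
static int demo_tick_is_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Return the oldest of <n> expiration dates, using the wrapping order
 * instead of the plain unsigned order.
 */
static uint32_t demo_oldest_tick(const uint32_t *ticks, size_t n)
{
    uint32_t oldest;
    size_t i;

    assert(ticks != NULL && n > 0);

    oldest = ticks[0];
    for (i = 1; i < n; i++) {
        if (demo_tick_is_before(ticks[i], oldest))
            oldest = ticks[i];
    }
    return oldest;
}
```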
2025-11-05 19:22:11 +01:00
Amaury Denoyelle
b9809fe0d0 MINOR: quic: remove <mux_state> field
This patch removes <mux_state> field from quic_conn structure. The
purpose of this field was to indicate if MUX layer above quic_conn is
not yet initialized, active, or already released.

It became tedious to properly set it as the initialization order of the
various quic_conn/conn/MUX layers now differs between the frontend and
backend sides, and also depending on whether 0-RTT is used or not. Recently,
a new change introduced in connect_server() will allow to initialize QUIC
MUX earlier if ALPN is cached on the server structure. This adds another
level of complexity.

Thus, this patch removes <mux_state> field completely. Instead, a new
flag QUIC_FL_CONN_XPRT_CLOSED is defined. It is set in a single place
only, on invocation of the XPRT close callback. It can be combined with the new
utility functions qc_wait_for_conn()/qc_is_conn_ready() to determine the
status of conn/MUX layers now without an extra quic_conn field.
2025-11-05 14:03:34 +01:00
William Lallemand
99a2454e9d DOC: configuration: deprecate the master-worker keyword
Deprecate the 'master-worker' keyword in the global section.

Split the configuration of the 'no-exit-on-failure' subkeyword into
another section which is not deprecated yet and explains that it's only
meant for debugging purposes.
2025-11-05 12:27:11 +01:00
William Lallemand
4f978325ac MEDIUM: cfgparse: 'daemon' not compatible with -Ws
Emit a warning when the 'daemon' keyword is used in master-worker mode
for systemd (-Ws). This never worked and was always ignored by setting
MODE_FOREGROUND during cmdline parsing.
2025-11-05 11:49:11 +01:00
William Lallemand
631233e9ec MEDIUM: cfgparse: deprecate 'master-worker' keyword alone
Warn when the 'master-worker' keyword is used without
'no-exit-on-failure'.

Warn when the 'master-worker' keyword is used and -W and -Ws already set
the mode.
2025-11-05 11:49:11 +01:00
Willy Tarreau
096999ee20 BUG/MEDIUM: connections: permit to permanently remove an idle conn
There's currently a function conn_delete_from_tree() which is used to
detach an idle connection from the tree it's currently attached to so
that it is no longer found. This function is used in three circumstances:
  - when picking a new connection that no longer has any avail stream
  - when temporarily working on the connection from an I/O handler,
    in which case it's re-added at the end
  - when killing a connection

The 2nd case above is quite specific, as it requires to preserve the
CO_FL_LIST_MASK flags so that the connection can be re-inserted into
the proper tree when leaving the handler. However, there's a catch.
When killing a connection, we want to be certain it will not be
reinserted into the tree. The flags preservation is causing a tiny
race if an I/O happens while the connection is in the kill list,
because in this case the I/O handler will note the connection flags,
do its work, then reinsert the connection where it believed it was,
then the connection gets purged, and another user can find it in the
tree.

The issue is very difficult to reproduce. On a 128-thread machine it
happens in H2 around 500k req/s after around 50M requests. In H1 it
happens after around 1 billion requests.

The fix here consists in passing an extra argument to the function to
indicate if the removal is permanent or not. When it's permanent, the
function will clear the associated flags. The callers were adjusted
so that all those dequeuing a connection in order to kill it do it
permanently and all other ones do it only temporarily.

A slightly different approach could have worked: the function could
always remove all flags, and the callers would need to restore them.
But this would require trickier modifications of the various call
places, compared to only passing 0/1 to indicate the permanent status.

This will need to be backported to all stable versions. The issue was
at least reproduced since 3.1 (not tested before). The patch will need
to be adjusted for 3.2 and older, because a 2nd argument "thr" was
added in 3.3, so the patch will not apply to older versions as-is.
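
The temporary versus permanent removal can be sketched as follows. This is
only a simplified model under assumptions: the demo_conn struct, the
DEMO_FL_* flags and both helpers are hypothetical, there is no real tree nor
locking, and the code merely shows why clearing the list flags prevents a
later re-insertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags telling which idle list a connection belongs to. */
#define DEMO_FL_IDLE_LIST 0x1u
#define DEMO_FL_SAFE_LIST 0x2u
#define DEMO_FL_LIST_MASK (DEMO_FL_IDLE_LIST | DEMO_FL_SAFE_LIST)

struct demo_conn {
    unsigned int flags;
    int in_tree;     /* 1 while the connection can be found by lookups */
};

/* Detach <c> from its tree. When <permanent> is set, also forget which
 * list it belonged to so that nobody can put it back later.
 */
static void demo_delete_from_tree(struct demo_conn *c, int permanent)
{
    assert(c != NULL);

    c->in_tree = 0;
    if (permanent)
        c->flags &= ~DEMO_FL_LIST_MASK;
}

/* What an I/O handler does when leaving: re-insert the connection where
 * its flags say it was. Returns 1 if it was re-inserted, 0 otherwise.
 */
static int demo_reinsert(struct demo_conn *c)
{
    assert(c != NULL);

    if (!(c->flags & DEMO_FL_LIST_MASK))
        return 0;
    c->in_tree = 1;
    return 1;
}
```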
2025-11-05 11:08:25 +01:00
Willy Tarreau
59c599f3f0 BUG/MEDIUM: mux-h2: make sure not to move a dead connection to idle
In h2_detach(), it looks possible to place a dead connection back to
the idle list, and to later call h2_release() on it once detected as
dead. It's not certain that it happens but nothing in the code shows
it is not possible, so better make sure it cannot happen.

This should be preventively backported to all versions.
2025-11-05 11:08:25 +01:00
Maximilian Moehl
0799fd1072 BUG/MEDIUM: mux-h1: fix 414 / 431 status code reporting
The more detailed status code reporting introduced with bc967758a2 is
checking against the error state to determine whether it is a too long
URL or too large headers. The check used always returns true which
results in a 414 as the error state is only set at a later point.

This commit adjusts the check to use the current state instead to return
the intended status code.

This patch must be backported as far as 3.1.
2025-11-05 10:55:18 +01:00
Olivier Houchard
06821dc189 BUG/MEDIUM: server: Also call srv_reset_path_parameters() on srv up
Also call srv_reset_path_parameters() when the server changed states,
and got up. It is not enough to do it when the server goes down, because
there's a small race condition, and a connection could get established
just after we did it, and could have set the path parameters.

This does not need to be backported.
2025-11-04 18:47:34 +01:00
Olivier Houchard
7d4aa7b22b BUG/MEDIUM: server: Add a rwlock to path parameter
Add a rwlock to control the server's path_parameter, to make sure
multiple threads don't set it at the same time, and it can't be seen in
an inconsistent state.
Also don't set the parameter every time, only set them if they have
changed, to prevent needless writes.

This does not need to be backported.
2025-11-04 18:47:34 +01:00
Amaury Denoyelle
efe60745b3 MINOR: quic: remove connection arg from qc_new_conn()
This patch is similar to the previous one, this time dealing with
qc_new_conn(). This function was asymmetric between the frontend and backend
sides, as the connection argument was set only in the latter case.

This was previously required due to the qc_alloc_ssl_sock_ctx() signature.
This has changed with the previous patch, thus qc_new_conn() can also be
realigned on both FE and BE sides. The <conn> member of the quic_conn instance
is always set outside it, in qc_xprt_start() in the backend case.
2025-11-04 17:47:42 +01:00
Amaury Denoyelle
5a17cade4f MINOR: quic: do not set conn member if ssl_sock_ctx
ssl_sock_ctx is a generic object used both on TCP/SSL and QUIC stacks.
Most notably it contains a <conn> member which is a pointer to struct
connection.

On QUIC frontend side, this member is always set to NULL. Indeed,
connection is only created after handshake completion. However, this has
changed for backend side, where the connection is instantiated prior to
its quic_conn counterpart. Thus, ssl_sock_ctx member would be set in
this case as a convenience for use later in qc_ssl_do_hanshake().

However, this method was unsafe as the connection can be released
without resetting the ssl_sock_ctx member. Thus, the previous patch fixes
this by using the <conn> member through the quic_conn instance, which is
the proper way.

Thus, this patch resets ssl_sock_ctx <conn> member to NULL. This is
deemed the cleanest method as it ensures that both frontend and backend
sides must not use it anymore.
2025-11-04 17:38:09 +01:00
Amaury Denoyelle
69de7ec14e BUG/MINOR: quic: fix crash on client handshake abort
On backend side, a connection can be aborted and released prior to
handshake completion. This causes a crash in qc_ssl_do_hanshake() as
<conn> member of ssl_sock_ctx is not reset in this case.

To fix this, use <conn> member of quic_conn instead. This is safe as it
is properly set to NULL when a connection is released.

There is no impact on the frontend side as the <conn> member is not accessed.
Indeed, in this case the connection is most of the time allocated after
handshake completion.

No need to be backported.
2025-11-04 17:33:42 +01:00
William Lallemand
3c578ca31c CI: github: update to macos-26
macOS-15 images seem to have had difficulties running the reg-tests for a
few days, for an unknown reason. Doing a rollback of both VTest2 and
haproxy doesn't seem to fix the problem so this is probably related to a
change in github actions.

This patch switches the image to the new macos-26 images, which seem to
fix the problem.
2025-11-03 16:17:36 +01:00
William Lallemand
0c34502c6d SCRIPTS: build-ssl: fix rpath in AWS-LC install for openssl and bssl bin
AWS-LC binaries were not linked correctly with an rpath, preventing the
binaries from being usable without setting LD_LIBRARY_PATH manually.
2025-11-03 15:04:57 +01:00
Willy Tarreau
fd012b6c59 OPTIM: proxy: move atomically access fields out of the read-only ones
Perf top showed that h1_snd_buf() was having great difficulties accessing
the proxy's server_id_hdr_name field in the middle of the headers loop.
Moving the assignment out of the loop to a local variable moved the
problem there as well:

       |      if (!(h1m->flags & H1_MF_RESP) && isttest(h1c->px->server_id_hdr_n
  0.10 |20b0:   mov        -0x120(%rbp),%rdi
  1.33 |        mov        0x60(%rdi),%r10
  0.01 |        test       %eax,%eax
  0.18 |        jne        2118
 12.87 |        mov        0x350(%r10),%rdi
  0.01 |        test       %rdi,%rdi
  0.05 |        je         2118
       |        mov        0x358(%r10),%r11

It turns out that there are several atomically accessed fields in its
vicinity, causing the cache line to bounce all the time. Let's collect
the few frequently changed fields and place them together at the end
of the structure, and plug the 32-bit hole with another isolated field.
Doing so also reduced a little bit the cost of decrementing be->be_conn
in process_stream(), and overall the HTTP/1 performance increased by
about 1% both on ARM and x86_64.
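
As a rough illustration of the layout idea (not the actual struct proxy),
the sketch below keeps read-mostly fields together and starts the
frequently written ones on their own cache line. The 64-byte line size and
every field other than server_id_hdr_name and be_conn are assumptions made
for the example.

```c
#include <stdalign.h>
#include <stddef.h>

#define MODEL_CACHE_LINE 64   /* assumed cache line size */

struct proxy_model {
    /* read-mostly: set at configuration time, read on every request */
    const char *server_id_hdr_name;
    size_t      server_id_hdr_len;
    int         mode;

    /* written often by many threads: start on a fresh cache line */
    alignas(MODEL_CACHE_LINE) unsigned int be_conn;
    unsigned int       served;
    unsigned long long cum_req;
};
```

With this layout, an atomic update of be_conn no longer invalidates the
line that holds server_id_hdr_name on the other CPUs.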
2025-11-03 13:54:49 +01:00
William Lallemand
12aca978a8 SCRIPTS: build-ssl: allow to build a FIPS version without FIPS
build-ssl.sh always prepends a "v" to the version, which prevents
building a FIPS version without FIPS enabled.

This patch checks whether FIPS is in the version string to choose whether
to add the "v" or not.

Example:

AWS_LC_VERSION=AWS-LC-FIPS-3.0.0 BUILDSSL_DESTDIR=/opt/awslc-3.0.0 ./scripts/build-ssl.sh
2025-11-03 12:03:05 +01:00
Amaury Denoyelle
6bfabfdc77 OPTIM: backend: skip conn reuse for incompatible proxies
When trying to reuse a backend connection, a connection hash is
calculated to match an entry with similar parameters. Previously, this
operation was skipped if the stream content wasn't based on HTTP, as it
would have been incompatible with http-reuse.

With the introduction of SPOP backends, this condition was removed, so
that it can also benefit from connection reuse. However, this means that
the hash calculation is now always performed when connecting to a server, even
for TCP or log backends. This is unnecessary as these proxies cannot
perform connection reuse.

Note also that reuse mode is reset on postparsing for incompatible
backends. This at least guarantees that no tree lookup will be performed
via be_reuse_connection(). However, connection lookup is still performed
in the session via session_get_conn() which is another unnecessary
operation.

Thus, this patch restores the condition so that reuse operations are now
entirely skipped if a backend mode is incompatible. This is implemented
via a new utility function named be_supports_conn_reuse().
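
The effect of such a guard can be sketched as follows. This is a
standalone model, not the real be_supports_conn_reuse(): the enum, the
counter and the function names are hypothetical, and only the rule that
HTTP and SPOP backends can reuse connections while TCP and log backends
cannot is taken from the description above.

```c
/* Simplified model: these names are illustrative, not HAProxy's. */
enum px_mode_model { MODEL_TCP, MODEL_HTTP, MODEL_LOG, MODEL_SPOP };

/* Counts how many times the (modelled) connection hash was computed. */
static unsigned int model_hash_computations;

static int model_supports_conn_reuse(enum px_mode_model mode)
{
    return mode == MODEL_HTTP || mode == MODEL_SPOP;
}

/* Returns 1 if a reuse lookup was attempted, 0 if it was skipped. */
static int model_connect(enum px_mode_model mode)
{
    if (!model_supports_conn_reuse(mode))
        return 0;                  /* no hash, no session or tree lookup */
    model_hash_computations++;     /* stands for the hash calculation */
    return 1;
}
```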

This could be backported up to 3.1, as this commit could be considered
as a performance regression for tcp/log backend modes.
2025-11-03 10:43:50 +01:00
Willy Tarreau
ad1bdc3364 BUG/MAJOR: stats-file: fix crash on non-x86 platform caused by unaligned cast
Since commit d655ed5f14 ("BUG/MAJOR: stats-file: ensure
shm_stats_file_object struct mapping consistency (2nd attempt)"), the
last_state_change field in the counters is a uint (to match how it's
reported). However, it happens that there are explicit casts in function
me_generate_field() to retrieve the value, and which cause crashes on
aarch64 and likely other non-x86 64-bit platforms due to atomically
reading an unaligned 64-bit value, and may even randomly crash other
64-bit platforms when reading past the end of the structure.

The fix for now adapts the cast to match the one used by the accessed
type (i.e. unsigned int), but the approach must change, as there's
nothing there which allows one to figure out whether or not the type is correct
by just reading the code. At a minimum, a typeof() on a named field is
needed, but this requires more invasive changes, hence this temporary
fix.
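
To illustrate why going through a named field is safer than an explicit
pointer cast, here is a small sketch. The struct and macro are
hypothetical and much simpler than the stats-file code; the point is only
that the access width follows the field's declaration.

```c
/* Hypothetical counters layout: the last field is a 32-bit value. */
struct counters_model {
    unsigned long long cum_sess;
    unsigned int       last_state_change;
};

/* Reads a named field with its declared type, then widens the value.
 * A cast such as *(unsigned long long *)&obj->last_state_change would
 * instead read 8 bytes from a 4-byte field, possibly misaligned and
 * past the end of the structure.
 */
#define MODEL_COUNTER_VALUE(obj, field) ((unsigned long long)((obj)->field))
```

Because the macro names the field, changing the field's type in the
struct changes the read width automatically, with no cast left behind to
fall out of sync.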

No backport is needed, as stats-file is only in 3.3.
2025-11-03 07:33:11 +01:00
Damien Claisse
561dc127bd BUG/MINOR: resolvers: ensure fair round robin iteration
Previous fixes restored round robin iteration, but an imbalance remains
when the response tree contains record types other than A or AAAA. Let's
take the following example: the DNS answers two A records and a CNAME.
The response "tree" (which is actually flat, more like a list) may look
as follows, ordered by hash:
- 1st item: first A record with IP 1
- 2nd item: second A record with IP 2
- 3rd item: CNAME record
As a consequence, resolv_get_ip_from_response will iterate as follows,
while the TTL is still valid:
- 1st call: DNS request is done, response tree is created, iteration
  starts at the first item, IP 1 is returned.
- 2nd call: cached response tree is used, iteration starts at the second
  item, IP 2 is returned.
- 3rd call: cached response tree is used, iteration starts at the third
  item, but it's a CNAME, so we continue to the next item, which restarts
  iteration at the first item, and IP 1 is returned.
- 4th call: cached response tree is used and iteration restarts at the
  beginning, returning IP 1 again.
The 1-2-1-1-2-1-1-2 sequence will repeat, so IP 1 will be used twice as
often as IP 2, creating a strong imbalance. Even with more IP addresses,
the first one by hashing order in the tree will always receive twice the
traffic of the others.
To fix this, set the next iteration item to the one following the selected
IP record, if any. This ensures we never use the same IP twice in a row.
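
The iteration described above can be reproduced with a small simulation.
It is a model written for this explanation, not the code of
resolv_get_ip_from_response(): the record array, the <fixed> switch and
the function name are assumptions.

```c
enum rec_type_model { REC_A, REC_CNAME };

struct rec_model {
    enum rec_type_model type;
    int ip;                     /* 1 stands for "IP 1", 2 for "IP 2" */
};

/* Picks the next usable record starting at *next and updates *next.
 * fixed == 0: the next start follows the item where iteration started.
 * fixed == 1: the next start follows the record that was selected.
 * Returns the selected IP, or -1 if no A record exists.
 */
static int model_pick_ip(const struct rec_model *recs, int n, int *next, int fixed)
{
    int start = *next % n;

    for (int i = 0; i < n; i++) {
        int idx = (start + i) % n;

        if (recs[idx].type != REC_A)
            continue;           /* skip the CNAME and keep looking */
        *next = ((fixed ? idx : start) + 1) % n;
        return recs[idx].ip;
    }
    return -1;
}
```

With the three records from the example (A, A, CNAME), eight calls with
fixed == 0 give 1-2-1-1-2-1-1-2, while fixed == 1 gives 1-2-1-2-1-2-1-2.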

This commit should be backported where 3023e9819 ("BUG/MINOR: resolvers:
Restore round-robin selection on records in DNS answers") is, so as far
as 2.6.
2025-11-02 17:28:32 +01:00
William Lallemand
d1d2461197 REGTESTS: converters: check USE_OPENSSL in aes_gcm.vtc
Check USE_OPENSSL as well as the haproxy version for the aes_gcm
reg-test.
2025-10-31 12:43:00 +01:00
William Lallemand
1d859bdaa2 MINOR: sample: optional AAD parameter support to aes_gcm_enc/dec
The aes_gcm_enc() and aes_gcm_dec() sample converters now accept an
optional fifth argument for Additional Authenticated Data (AAD). When
provided, the AAD value is base64-decoded and used during AES-GCM
encryption or decryption. Both string and variable forms are supported.

This enables use cases that require authentication of additional data.
2025-10-31 12:27:38 +01:00
Amaury Denoyelle
73b5d331cc OPTIM: quic: adjust automatic ALPN setting for QUIC servers
If a QUIC server is declared without ALPN, "h3" value is automatically
set during _srv_parse_finalize().

This patch adjusts this operation. Instead of relying on
ssl_sock_parse_alpn(), a plain strdup() is used. This is considered more
efficient as the ALPN string is constant in this case. This method is
already used for listeners on the frontend side.
2025-10-31 11:32:20 +01:00
Amaury Denoyelle
14a6468df5 MINOR: quic: reject conf with QUIC servers if not compiled
Ensure that QUIC support is compiled into haproxy when a QUIC server is
configured. This check is performed during _srv_parse_finalize() so that
it is detected both on configuration parsing and when adding a dynamic
server via the CLI.

Note that this changes the behavior of srv_is_quic() utility function.
Previously, it always returned false when QUIC support wasn't compiled.
With this new check introduced, it is now guaranteed that a QUIC server
won't exist if compilation support is not active. Hence srv_is_quic()
does not rely anymore on USE_QUIC define.
2025-10-31 11:32:20 +01:00
Amaury Denoyelle
1af3caae7d MINOR: quic: enable SSL on QUIC servers automatically
Previously, QUIC servers were rejected if SSL was not explicitly
activated using 'ssl' configuration keyword.

Change this behavior: now SSL is automatically activated for QUIC
servers when the keyword is missing. A warning is displayed as it is
considered better to explicitly note that SSL is in use.
2025-10-31 11:32:14 +01:00
Willy Tarreau
0a14ad11be [RELEASE] Released version 3.3-dev11
Released version 3.3-dev11 with the following main changes :
    - BUG/MEDIUM: mt_list: Make sure not to unlock the element twice
    - BUG/MINOR: quic-be: unchecked connections during handshakes
    - BUG/MEDIUM: cli: also free the trash chunk on the error path
    - MINOR: initcalls: Add a new initcall stage, STG_INIT_2
    - MEDIUM: stick-tables: Use a per-shard expiration task
    - MEDIUM: stick-tables: Remove the table lock
    - MEDIUM: stick-tables: Stop if stktable_trash_oldest() fails.
    - MEDIUM: stick-tables: Stop as soon as stktable_trash_oldest succeeds.
    - BUG/MEDIUM: h1-htx: Don't set HTX_FL_EOM flag on 1xx informational messages
    - BUG/MEDIUM: h3: properly encode response after interim one in same buf
    - BUG/MAJOR: pools: fix default pool alignment
    - MINOR: ncbuf: extract common types
    - MINOR: ncbmbuf: define new ncbmbuf type
    - MINOR: ncbmbuf: implement add
    - MINOR: ncbmbuf: implement iterator bitmap utilities functions
    - MINOR: ncbmbuf: implement ncbmb_data()
    - MINOR: ncbmbuf: implement advance operation
    - MINOR: ncbmbuf: add tests as standalone mode
    - BUG/MAJOR: quic: use ncbmbuf for CRYPTO handling
    - MINOR: quic: remove received CRYPTO temporary tree storage
    - MINOR: stats-file: fix typo in shm-stats-file object struct size detection
    - MINOR: compiler: add FIXED_SIZE(size, type, name) macro
    - MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct
    - BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency
    - BUG/MEDIUM: build: limit excessive and counter-productive gcc-15 vectorization
    - BUG/MEDIUM: stick-tables: Don't loop if there's nothing left
    - MINOR: acme: add the dns-01-record field to the sink
    - MINOR: acme: display the complete challenge_ready command in the logs
    - BUG/MEDIUM: mt_lists: Avoid el->prev = el->next = el
    - MINOR: quic: remove unused conn-tx-buffers limit keyword
    - MINOR: quic: prepare support for options on FE/BE side
    - MINOR: quic: rename "no-quic" to "tune.quic.listen"
    - MINOR: quic: duplicate glitches FE option on BE side
    - MINOR: quic: split congestion controler options for FE/BE usage
    - MINOR: quic: split Tx options for FE/BE usage
    - MINOR: quic: rename max Tx mem setting
    - MINOR: quic: rename retry-threshold setting
    - MINOR: quic: rename frontend sock-per-conn setting
    - BUG/MINOR: quic: split max-idle-timeout option for FE/BE usage
    - BUG/MINOR: quic: split option for congestion max window size
    - BUG/MINOR: quic: rename and duplicate stream settings
    - BUG/MEDIUM: applet: Improve again spinning loops detection with the new API
    - Revert "BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency"
    - Revert "MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct"
    - Revert "MINOR: compiler: add FIXED_SIZE(size, type, name) macro"
    - BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency (2nd attempt)
    - BUG/MINOR: stick-tables: properly index string-type keys
    - BUILD: openssl-compat: fix build failure with OPENSSL=0 and KTLS=1
    - BUG/MEDIUM: mt_list: Use atomic operations to prevent compiler optims
    - MEDIUM: quic: Fix build with openssl-compat
    - MINOR: applet: do not put SE_FL_WANT_ROOM on rcv_buf() if the channel is empty
    - MINOR: cli: create cli_raw_rcv_buf() from the generic applet_raw_rcv_buf()
    - BUG/MEDIUM: cli: do not return ACKs one char at a time
    - BUG/MEDIUM: ssl: Crash because of dangling ckch_store reference in a ckch instance
    - BUG/MINOR: ssl: Remove unreachable code in CLI function
    - BUG/MINOR: acl: warn if "_sub" derivative used with an explicit match
    - DOC: config: fix confusing typo about ACL -m ("now" vs "not")
    - DOC: config: slightly clarify the ssl_fc_has_early() behavior
    - MINOR: ssl-sample: add ssl_fc_early_rcvd() to detect use of early data
    - CI: disable fail-fast on fedora rawhide builds
    - MINOR: http: fix 405,431,501 default errorfile
    - BUG/MINOR: init: Do not close previously created fd in stdio_quiet
    - MINOR: init: Make devnullfd global and create it earlier in init
    - MINOR: init: Use devnullfd in stdio_quiet calls instead of recreating a fd everytime
    - MEDIUM: ssl: Add certificate password callback that calls external command
    - MEDIUM: ssl: Add local passphrase cache
    - MINOR: ssl: Do not dump decrypted privkeys in 'dump ssl cert'
    - BUG/MINOR: resolvers: Apply dns-accept-family setting on additional records
    - MEDIUM: h1: Immediately try to read data for frontend
    - REGTEST: quic: add ssl_reuse.vtc new QUIC test
    - BUG/MINOR: ssl: returns when SSL_CTX_new failed during init
    - MEDIUM: ssl/ech: config and load keys
    - MINOR: ssl/ech: add logging and sample fetches for ECH status and outer SNI
    - MINOR: listener: implement bind_conf_find_by_name()
    - MINOR: ssl/ech: key management via stats socket
    - CI: github: add USE_ECH=1 to haproxy for openssl-ech job
    - DOC: configuration: "ech" for bind lines
    - BUG/MINOR: ech: non destructive parsing in cli_find_ech_specific_ctx()
    - DOC: management: document ECH CLI commands
    - MEDIUM: mux-h2: do not needlessly refrain from sending data early
    - MINOR: mux-h2: extract the code to send preface+settings into its own function
    - BUG/MINOR: mux-h2: send the preface along with the first request if needed
2025-10-31 10:09:57 +01:00
Willy Tarreau
a1f26ca307 BUG/MINOR: mux-h2: send the preface along with the first request if needed
Tests involving 0-RTT and H2 on the backend show that 0-RTT is being
partially used but does not work. The analysis shows that only the
preface and settings are sent using early-data and the request is sent
separately. As explained in the previous patch, this is caused by the
fact that a wakeup of the iocb is needed just to send the preface, then
a new call to process_stream is needed to try sending again.

Here with this patch, we're making h2_snd_buf() able to send the preface
if it was not yet sent. Thanks to this, the preface, settings and first
request can now leave as a single TCP segment. In case of TLS with 0-RTT,
it now allows all the block to leave in early data.

Even in clear-text H2, we're now seeing a 15% lower context-switch count,
and the number of calls to process_stream() per connection dropped from 3
to 2. The connection rate increased by an extra 9.5%. Compared to without
the last 3 patches, this is a 22% reduction of context-switches, 33%
reduction of process_stream() calls, and 15.7% increase in connection
rate. And more importantly, 0-RTT now really works with H2 on the
backend, saving one full RTT on the first request.

This fix is only for a missed optimization and a non-functional 0-RTT
on the backend. It's worth backporting it, but it doesn't cause enough
harm to hurry a backport. Better wait for it to live a little bit in
3.3 (till at least a week or two after the final release) before
backporting it. It's not sure that it's worth going beyond 3.2 in any
case. It depends on these two previous commits:

  MEDIUM: mux-h2: do not needlessly refrain from sending data early
  MINOR: mux-h2: extract the code to send preface+settings into its own function
2025-10-30 18:16:54 +01:00
Willy Tarreau
d5aa3e19cc MINOR: mux-h2: extract the code to send preface+settings into its own function
The code that deals with sending preface + settings and changing the
state currently is in h2_process_mux(), but we'll want to do it as
well from h2_snd_buf(), so let's move it to a dedicated function first.
At this point there is no functional change.
2025-10-30 18:16:54 +01:00
Willy Tarreau
b0e8edaef2 MEDIUM: mux-h2: do not needlessly refrain from sending data early
The mux currently refrains from sending data before H2_CS_FRAME_H, i.e.
before the peer's SETTINGS frame was received. While it makes sense on
the frontend, it's causing harm on the backend because it forces the
first request to be sent in two halves over an extra RTT: first the
preface and settings, second the request once the settings are received.
This is totally contrary to the philosophy of the H2 protocol, consisting
in permitting the client to send as soon as possible.

Actually what happens is the following:
  - process_stream() calls connect_server()
  - connect_server() creates a connection, and if the proto/alpn is guessed
    or known, the mux is instantiated for the current request.
  - the H2 init code wakes the h2 tasklet up and returns
  - process_stream() tries to send the request using h2_snd_buf(), but that
    one sees that we're before H2_CS_FRAME_H, refrains from doing so and
    returns.
  - process_stream() subscribes and quits
  - the h2 tasklet can now execute to send the preface and settings, which
    leave as a first TCP segment. The connection is ready.
  - the iocb is woken again once the server's SETTINGS frame is received,
    turning the connection to the H2_CS_FRAME_H state, and the iocb wakes
    up process_stream().
  - process_stream() executes again and can try to send again.
  - h2_snd_buf() is called and finally sends the request as a second TCP
    segment.

Not only is this inefficient, but it also renders 0-RTT and TFO impossible
on H2 connections. When 0-RTT is used, only the preface and settings leave
as early data (the very first data of that connection), which is totally
pointless.

In order to fix this, we have to go through a few steps:
  - first we need to let data be sent to a server immediately after the
    SETTINGS frame was sent (i.e. in H2_CS_SETTINGS1 state instead of
    H2_CS_FRAME_H). However, some protocol extensions are advertised by
    the server using SETTINGS (e.g. RFC8441) and some requests might need
    to know the existence of such extensions. For this reason we're adding
    a new h2c flag, H2_CF_SETTINGS_NEEDED, which indicates that some
    operations were not done because a server's SETTINGS frame is needed.
    This is set when trying to send a protocol upgrade or extended CONNECT
    during H2_CS_SETTINGS1, indicating that it's needed to wait for
    H2_CS_FRAME_H in this case. The flag is always set on frontend
    connections. This is what is being done in this patch.

  - second, we need to be able to push the preface opportunistically with
    the first h2_snd_buf() so that it's not needed to wake the tasklet up
    just to send that and wake process_stream() again. This will be in a
    separate patch.

By doing the first step, we're at least saving one needless tasklet
wakeup per connection (~9%), which results in ~5% backend connection
rate increase.
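
The resulting send condition can be summarized with a small model. The
enum values and the flag value below are illustrative and do not match
the real H2_CS_* or H2_CF_* definitions; the sketch only encodes the rule
stated in the first step.

```c
/* Model states and flag; values are illustrative, not HAProxy's. */
enum cs_model { CS_MODEL_PREFACE, CS_MODEL_SETTINGS1, CS_MODEL_FRAME_H };

#define CF_MODEL_SETTINGS_NEEDED 0x1u

/* Returns 1 when stream data may be sent in connection state <st>. */
static int model_may_send(enum cs_model st, unsigned int flags)
{
    if (st >= CS_MODEL_FRAME_H)
        return 1;   /* the peer's SETTINGS frame was received */

    /* After our own SETTINGS was sent, data may leave early unless an
     * operation needs the server's SETTINGS first (always the case on
     * frontend connections, where the flag is set from the start).
     */
    return st == CS_MODEL_SETTINGS1 && !(flags & CF_MODEL_SETTINGS_NEEDED);
}
```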
2025-10-30 18:16:54 +01:00
William Lallemand
0436062f48 DOC: management: document ECH CLI commands
Document "show ssl ech", "add ssl ech", "set ssl ech" and "del ssl ech"
2025-10-30 11:59:39 +01:00
William Lallemand
f6503bd7d3 BUG/MINOR: ech: non destructive parsing in cli_find_ech_specific_ctx()
cli_find_ech_specific_ctx() parses the <frontend>/<bind_conf> and sets
a \0 in place of the '/'. But the original string is still used to emit
messages in the CLI, so we only output the frontend part.

This patch does the parsing in a trash buffer instead.
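
A minimal sketch of the non-destructive approach, assuming a
caller-provided scratch buffer in place of HAProxy's trash chunk. The
function name and signature are invented for the example.

```c
#include <string.h>

/* Splits "<frontend>/<bind_conf>" using a scratch copy so that <arg>
 * stays intact for later messages. Returns 0 on success, -1 on error.
 */
static int model_split_front_bind(const char *arg, char *scratch, size_t size,
                                  const char **front, const char **bconf)
{
    char *slash;

    if (strlen(arg) >= size)
        return -1;               /* does not fit in the scratch buffer */
    strcpy(scratch, arg);

    slash = strchr(scratch, '/');
    if (!slash)
        return -1;
    *slash = '\0';               /* only the copy is modified */

    *front = scratch;
    *bconf = slash + 1;
    return 0;
}
```

After the call, <arg> still reads "<frontend>/<bind_conf>" and can be
printed in full in CLI messages.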
2025-10-30 11:59:39 +01:00
William Lallemand
37f76c45fa DOC: configuration: "ech" for bind lines
ECH is an experimental feature which is still a draft, but already exists as a
feature branch in OpenSSL.

This patch explains how to configure "ech" on bind lines.
2025-10-30 10:38:46 +01:00
William Lallemand
ce413f002a CI: github: add USE_ECH=1 to haproxy for openssl-ech job
Add the USE_ECH=1 make option to the haproxy build in order to test the
build of the feature.
2025-10-30 10:38:38 +01:00
sftcd
9aacb684cd MINOR: ssl/ech: key management via stats socket
This patch extends the ECH support by adding runtime CLI commands to
view and modify ECH configurations.

New commands are added to the HAProxy CLI:
- "show ssl ech [<name>]" displays all ECH configurations or a specific
  one.
- "add ssl ech <name> <payload>" adds a new PEM-formatted ECH
  configuration.
- "set ssl ech <name> <payload>" replaces all existing ECH
  configurations.
- "del ssl ech <name> [<age-in-secs>]" removes ECH configurations,
  optionally filtered by age.
2025-10-30 10:38:31 +01:00
William Lallemand
1e2f920be6 MINOR: listener: implement bind_conf_find_by_name()
Returns a pointer to the first bind_conf matching <name> in a frontend
<front>.

When name is prefixed by a @ (@<filename>:<linenum>), it tries to look
for the corresponding filename and line of the configuration file.

NULL is returned if no match is found.
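
For illustration, the "@<filename>:<linenum>" form could be decoded along
these lines. This is a hypothetical helper, not the code of
bind_conf_find_by_name(); it assumes the line number follows the last
colon of the name.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Decodes "@<filename>:<linenum>" into <file> and <line>.
 * Returns 1 on success, 0 if <name> is not in that form.
 */
static int model_parse_bind_ref(const char *name, char *file, size_t size, int *line)
{
    const char *colon;
    char *end;
    size_t len;
    long v;

    if (name[0] != '@')
        return 0;                /* a plain bind_conf name */

    colon = strrchr(name, ':');
    if (!colon || colon == name + 1)
        return 0;                /* no line number or empty filename */

    v = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || v <= 0 || v > INT_MAX)
        return 0;

    len = (size_t)(colon - (name + 1));
    if (len >= size)
        return 0;
    memcpy(file, name + 1, len);
    file[len] = '\0';
    *line = (int)v;
    return 1;
}
```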
2025-10-30 10:37:42 +01:00
sftcd
23f5cbb411 MINOR: ssl/ech: add logging and sample fetches for ECH status and outer SNI
This patch adds functions to expose Encrypted Client Hello (ECH) status
and outer SNI information for logging and sample fetching.

Two new helper functions are introduced in ech.c:
 - conn_get_ech_status() places the ECH processing status string into a
   buffer.
 - conn_get_ech_outer_sni() retrieves the outer SNI value if ECH
   succeeded.

Two new sample fetch keywords are added:
 - "ssl_fc_ech_status" returns the ECH status string.
 - "ssl_fc_ech_outer_sni" returns the outer SNI value seen during ECH.

These allow ECH information to be used in HAProxy logs, ACLs, and
captures.
2025-10-30 10:37:30 +01:00
sftcd
dba4fd248a MEDIUM: ssl/ech: config and load keys
This patch introduces the USE_ECH option in the Makefile to enable
support for Encrypted Client Hello (ECH) with OpenSSL.

A new function, load_echkeys, is added to load ECH keys from a specified
directory. The SSL context initialization process in ssl_sock.c is
updated to load these keys if configured.

A new configuration directive, `ech`, is introduced to allow users to
specify the ECH key directory in the listener configuration.
2025-10-30 10:37:12 +01:00
William Lallemand
83e3cbc262 BUG/MINOR: ssl: returns when SSL_CTX_new failed during init
In ssl_sock_initial_ctx(), returns when SSL_CTX_new() failed instead of
trying to apply anything on the ctx. This may avoid crashing when
there's not enough memory anymore during configuration parsing.

Could be backported to every haproxy version.
2025-10-30 10:36:56 +01:00
Frederic Lecaille
2f621aa52e REGTEST: quic: add ssl_reuse.vtc new QUIC test
Note that this test does not work with OpenSSL 3.5.0 QUIC API because
the callback set by SSL_CTX_sess_set_new_cb() (ssl_sess_new_srv_cb()) is not
called (at least for QUIC clients)

The role of this new QUIC test is to run the same SSL/TCP test as
reg-tests/ssl/ssl_reuse.vtc but with QUIC connections where applicable (only with
TLSv1.3).

To do so, this QUIC test uses the "include" vtc command to run ssl/ssl_reuse.vtc
It also sets the VTC_SOCK_TYPE environment variable with the "setenv" command and
"quic" as value. This will ask vtest2 to use QUIC sockets for all "fd@{...}"
addresses prefixed by "${VTC_SOCK_TYPE}+" socket type if VTC_SOCK_TYPE value is "quic".

The SSL/TCP test is modified to set this environment variable with "setenv -ifunset"
from ssl/ssl_reuse.vtc with "stream" as value, if it is not already set.

vtest2 must be used with this patch to support this new QUIC test:
9aa4d498db

Thanks to this latter patch, vtest2 retrieves the VTC_SOCK_TYPE environment variable
value, then it parses the vtc file to retrieve all the fd addresses prefixed by
"${VTC_SOCK_TYPE}+" and creates a QUIC socket or a TCP socket depending on this
variable value.
2025-10-30 08:33:54 +01:00
Olivier Houchard
b3d6f44af8 MEDIUM: h1: Immediately try to read data for frontend
In h1_init(), if we're a frontend connection, immediately attempt to
read data, if the connection is ready, instead of just subscribing.
There may already be data available, at least if we're using 0RTT.

This may be backported up to 2.8 in a while, after 3.3 is released, so
that if it causes problem, we have a chance to hear about it.
2025-10-29 17:18:26 +01:00
Christopher Faulet
c84c15d393 BUG/MINOR: resolvers: Apply dns-accept-family setting on additional records
dns-accept-family setting was only evaluated for responses to A / AAAA DNS
queries. It was ignored when additional records in SRV responses were
parsed.

With this patch, when an SRV response is parsed, additional records not
matching the dns-accept-family setting are ignored, as expected.

This patch must be backported to 3.2.
2025-10-29 11:20:27 +01:00
Remi Tricot-Le Breton
dc35a3487b MINOR: ssl: Do not dump decrypted privkeys in 'dump ssl cert'
A private key that is password protected and was decoded during init
thanks to the password obtained via 'ssl-passphrase-cmd' should
not be dumped via 'dump ssl cert' CLI command.
2025-10-29 10:54:17 +01:00
Remi Tricot-Le Breton
5a036d223b MEDIUM: ssl: Add local passphrase cache
Instead of calling the external password command for all loaded
encrypted certificates, we will keep a local password cache.
The passwords won't be stored as plain text, they will be stored
obfuscated into the password cache. The obfuscation is simply based on a
XOR'ing with a random number built during init.
After init is performed, the password cache is overwritten and freed so
that no dangling info allowing to dump the passwords remains.
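
The idea can be sketched as below. This is not the code added by the
patch: the 64-bit key, the byte-wise key schedule and the function names
are assumptions chosen to keep the example short.

```c
#include <stddef.h>

/* XORs <len> bytes of <buf> with the bytes of <key>, repeated as needed.
 * Applying the function twice with the same key restores the input.
 */
static void model_xor_buf(unsigned char *buf, size_t len, unsigned long long key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= (unsigned char)(key >> (8 * (i % sizeof(key))));
}

/* Overwrites a buffer before it is freed; the volatile pointer keeps the
 * compiler from dropping the stores as dead writes.
 */
static void model_wipe(void *p, size_t len)
{
    volatile unsigned char *v = p;

    while (len--)
        *v++ = 0;
}
```

A cache entry would be stored after model_xor_buf(), passed through
model_xor_buf() again on lookup, and cleared with model_wipe() once init
is over.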
2025-10-29 10:54:17 +01:00
Remi Tricot-Le Breton
478dd7bad0 MEDIUM: ssl: Add certificate password callback that calls external command
When a certificate is protected by a password, we can provide the
password via the dedicated pem_password_cb param provided to
PEM_read_bio_PrivateKey.
HAProxy will fetch the password automatically during init by calling a
user-defined external command that should dump the right password on its
standard output (see new 'ssl-passphrase-cmd' global option).
2025-10-29 10:54:17 +01:00
Remi Tricot-Le Breton
a011683622 MINOR: init: Use devnullfd in stdio_quiet calls instead of recreating a fd everytime
Since commit "65760d MINOR: init: Make devnullfd global and create it
earlier in init" the devnullfd file descriptor pointing to /dev/null
is created regardless of the process's parameters so we can use it in
all 'stdio_quiet' calls instead of recreating an FD.
2025-10-29 10:54:17 +01:00
Remi Tricot-Le Breton
1ec59d3426 MINOR: init: Make devnullfd global and create it earlier in init
The devnull fd might be needed during configuration parsing, if some
options require to fork/exec for instance. So we now create it much
earlier in the init process and without depending on the '-q' or '-d'
parameters.
2025-10-29 10:54:17 +01:00
Remi Tricot-Le Breton
c606ff45a0 BUG/MINOR: init: Do not close previously created fd in stdio_quiet
During init we were calling 'stdio_quiet' and passing the previously
created 'devnullfd' file descriptor. But 'stdio_quiet' closed that fd, which
was then closed again afterwards, raising an error (EBADF).
If we keep from closing FDs that were opened outside of the
'stdio_quiet' function we will let the caller manage its FD and avoid
double close calls.

This patch can be backported to all stable branches.
2025-10-29 10:54:17 +01:00
Huangbin Zhan
ad9a24ee55 MINOR: http: fix 405,431,501 default errorfile
A few typos were present in the default errorfiles for the status codes
above (missing dot at the end of the sentence, extra closing bracket).
This fixes them. This can be backported.
2025-10-29 08:47:19 +01:00
Ilia Shipitsin
9781d91e4d CI: disable fail-fast on fedora rawhide builds
Previously the builds were dependent, in that if one failed, the others were
stopped. By their nature those builds are independent, so let's not fail
them all together.
2025-10-29 08:15:01 +01:00
Willy Tarreau
18b27bfec9 MINOR: ssl-sample: add ssl_fc_early_rcvd() to detect use of early data
We currently have ssl_fc_has_early() which says that early data are still
unconfirmed by a final handshake, but nothing to see if a client has been
able to use early data at all, which is a problem because such mechanisms
generally depend on multiple factors and it's hard to know when they start
to work. This new sample fetch function will indicate that some early data
were seen over that front connection, i.e. this can be used to confirm
that at some point the client was able to push some. This is essentially
a debugging tool that has no practical use case other than debugging.
2025-10-29 08:13:29 +01:00
Willy Tarreau
765d49b680 DOC: config: slightly clarify the ssl_fc_has_early() behavior
Clarify that it's about handshake *completion*, and also mention that
the action to be used to wait for the handshake is "wait-for-handshake",
which was not mentioned.

This can be backported though it's very minor.
2025-10-29 08:13:29 +01:00
Willy Tarreau
20174ca143 DOC: config: fix confusing typo about ACL -m ("now" vs "not")
A one-letter typo in the doc update coming with commit 6ea50ba462 ("MINOR:
acl; Warn when matching method based on a suffix is overwritten") inverts
the meaning of the sentence. It was "is not allowed" and not
"is now allowed". Needs to be backported only if the commit above ever is
(unlikely).
2025-10-29 08:13:29 +01:00
Amaury Denoyelle
7f2ae10920 BUG/MINOR: acl: warn if "_sub" derivative used with an explicit match
A new warning was recently added, displayed when an ACL derivative match
method is overridden by another '-m' method. This is implemented via the
following patch :

  6ea50ba462692d6dcf301081f23cab3e0f6086e4
  MINOR: acl; Warn when matching method based on a suffix is overwritten

However, this warning was not reported when "_sub" suffix was specified.
Fix this by adding PAT_MATCH_SUB in the warning comparison.

No backport needed except if above commit is.
2025-10-28 11:59:32 +01:00
Remi Tricot-Le Breton
89b43740e3 BUG/MINOR: ssl: Remove unreachable code in CLI function
Remove unreachable code in 'cli_parse_show_jwt' function.

This bug was raised in GitHub #3159.
This patch does not need to be backported.
2025-10-28 10:44:51 +01:00
Remi Tricot-Le Breton
7482b6ebf0 BUG/MEDIUM: ssl: Crash because of dangling ckch_store reference in a ckch instance
When updating CAs via the CLI, we need to create new copies of all the
impacted ckch instances (as in referenced in the ckch_inst_link list of
the updated CA) in order to use them instead of the old ones once the
update is completed. This relies on the ckch_inst_rebuild function that
would set the ckch_store field of the ckch_inst. But we forgot to also
add the newly created instances in the ckch_inst list of the
corresponding ckch_store.

When updating a certificate afterwards, we iterate over all the
instances linked in the ckch_inst list of the ckch_store (which is
missing some instances because of the previous command) and rebuild the
instances before replacing the ckch_store. The previous ckch_store,
still referenced by the dangling ckch instance then gets deleted which
means that the instance keeps a reference to a free'd object.

Then if we were to once again update the CA file, we would iterate over
the ckch instances referenced in the cafile_entry's ckch_inst_link list,
which includes the first mentioned ckch instance with the dead
ckch_store reference. This ends up crashing during the ckch_inst_rebuild
operation.

This bug was raised in GitHub #3165.
This patch should be backported to all stable branches.
2025-10-28 10:43:45 +01:00
Willy Tarreau
2d7e3ddd4a BUG/MEDIUM: cli: do not return ACKs one char at a time
Since 3.0 where the CLI started to use rcv_buf, it appears that some
external tools sending chained commands are randomly experiencing
failures. Each time this happens when the whole command is sent as a
single packet, immediately followed by a close. This is not a correct
way to use the CLI but this has been working for ages for simple
netcat-based scripts, so we should at least try to preserve this.

The cause of the failure is that the first LF that acks a command is
immediately sent back to the client and rejected due to the closed
connection. This in turn forwards the error back to the applet which
aborts its processing.

Before 3.0 the responses would be queued into the buffer, then sent
back to the channel, and would all fail at once. This changed when
snd_buf/rcv_buf were implemented because the applets are much more
responsive and since they yield between each command, they can
deliver one ACK at a time that is immediately forwarded down the
chain.

An easy way to observe the problem is to send 5 map updates, a shutdown,
and immediately close via tcploop, and in parallel run a periodic
"show map" to count the number of elements:

  $ tcploop -U /tmp/sock1 C S:"add map #0 1 1; add map #0 2 2; add map #0 3 3; add map #0 4 4; add map #0 5 5\n" F K

Before 3.0, there would always be 5 elements. Since 3.0 and before
20ec1de214 ("MAJOR: cli: Refacor parsing and execution of pipelined
commands"), almost always 2. And since that commit above in 3.2, almost
always one. Doing the same using socat or netcat shows almost always 5...
It's entirely timing-dependent, and might even vary based on the RTT
between the client and haproxy!

The approach taken here consists in applying the same principle as MSG_MORE
or Nagle but on the response buffer: the applet doesn't need to send a
single ACK for each command when it has already been woken up and is
scheduled to come back to work. It's fine (and even desirable) that
ACKs are grouped in a single packet as much as possible.

For this reason, this patch implements APPCTX_CLI_ST1_YIELD, a new CLI
flag which indicates that the applet left in yielding condition, i.e.
it has not finished its work. This flag is used by .rcv_buf to hold
pending data. This way we won't return partial responses for no reason,
and we can continue to emulate the previous behavior.

One very nice benefit to this is that it saves huge amounts of CPU on
the client. In the test below that tries to update 1M map entries, the
CPU used by socat went from 100% to 0% and the total transfer time
dropped by 28%:

  before:
    $ time awk 'BEGIN{ printf "prompt i\n"; for (i=0;i<1000000;i++) { \
         printf "add map #0 %d %d\n",i,i,i }}' | socat /tmp/sock1 - >/dev/null

    real    0m2.407s
    user    0m1.485s
    sys     0m1.682s

  after:
    $ time awk 'BEGIN{ printf "prompt i\n"; for (i=0;i<1000000;i++) { \
         printf "add map #0 %d %d\n",i,i,i }}' | socat /tmp/sock1 - >/dev/null

    real    0m1.721s
    user    0m0.952s
    sys     0m0.057s

The difference is also quite visible on the number of syscalls during
the test (for 1k updates):

  before:
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.071691           0    100001           sendmsg

  after:
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000011           1         9           sendmsg

This patch will need to be backported to 3.0, and depends on these two
patches to be backported as well:

    MINOR: applet: do not put SE_FL_WANT_ROOM on rcv_buf() if the channel is empty
    MINOR: cli: create cli_raw_rcv_buf() from the generic applet_raw_rcv_buf()
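
The hold-while-yielding idea described above can be sketched as follows.
This is only a toy model under stated assumptions: demo_cli, DEMO_FL_YIELD,
demo_cli_run() and demo_cli_rcv_buf() are hypothetical names that merely
stand in for the applet context, APPCTX_CLI_ST1_YIELD and the .rcv_buf
callback; none of it is the actual haproxy code.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_FL_YIELD 0x1u  /* hypothetical stand-in for APPCTX_CLI_ST1_YIELD */

struct demo_cli {
    unsigned int flags;
    char outbuf[64];        /* ACKs queued but not yet delivered */
    size_t outlen;
};

/* Process up to <budget> of the <pending> commands, queueing one LF per
 * command. The yield flag is set when commands remain, cleared otherwise.
 * Returns the number of commands still pending.
 */
static size_t demo_cli_run(struct demo_cli *c, size_t pending, size_t budget)
{
    while (pending && budget && c->outlen < sizeof(c->outbuf)) {
        c->outbuf[c->outlen++] = '\n';
        pending--;
        budget--;
    }
    if (pending)
        c->flags |= DEMO_FL_YIELD;
    else
        c->flags &= ~DEMO_FL_YIELD;
    return pending;
}

/* Deliver queued ACKs to <dst>, except while the applet is yielding and
 * still has room to queue more: in that case nothing is returned so that
 * the ACKs leave grouped. Returns the number of bytes delivered.
 */
static size_t demo_cli_rcv_buf(struct demo_cli *c, char *dst, size_t room)
{
    size_t n;

    if ((c->flags & DEMO_FL_YIELD) && c->outlen < sizeof(c->outbuf))
        return 0;
    n = c->outlen < room ? c->outlen : room;
    memcpy(dst, c->outbuf, n);
    memmove(c->outbuf, c->outbuf + n, c->outlen - n);
    c->outlen -= n;
    return n;
}
```

With five pending commands and a budget of two per call, the first call to
demo_cli_rcv_buf() delivers nothing, and the five LFs leave together once
the last command has been processed.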
2025-10-27 16:57:07 +01:00
Willy Tarreau
f38ea2731b MINOR: cli: create cli_raw_rcv_buf() from the generic applet_raw_rcv_buf()
This is in preparation for a future fix. For now it's simply a pure
copy of the original function, but dedicated to the CLI. It will
have to be backported to 3.0.
2025-10-27 16:57:07 +01:00
Willy Tarreau
35106d65fb MINOR: applet: do not put SE_FL_WANT_ROOM on rcv_buf() if the channel is empty
appctx_rcv_buf() prepares all the work to schedule the transfers between
the applet and the channel, and it takes care of setting the various flags
that indicate what condition is blocking the transfer from progressing.

There is one limitation though. In case an applet refrains from sending
data (e.g. rate-limited, prefers to aggregate blocks etc), it will leave
a possibly empty channel buffer, and keep some data in its outbuf. The
data in its outbuf will be seen by the function above as an indication
of a channel full condition, so it will place SE_FL_WANT_ROOM. But later,
sc_applet_recv() will see this flag with a possibly empty channel, and
will rightfully trigger a BUG_ON().

appctx_rcv_buf() should be more accurate in fact. It should only set
SE_FL_RCV_MORE when more data are present in the applet, then it should
either set or clear SE_FL_WANT_ROOM depending on whether the channel is
empty or not.

Right now it doesn't seem possible to trigger this condition in the
current state of applets, but this will become possible with a future
bugfix that will have to be backported, so this patch will need to be
backported to 3.0.
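
The intended flag decision can be summarized by the sketch below. It is an
illustration only: the DEMO_* values and demo_rcv_flags() are hypothetical,
and the real logic lives in appctx_rcv_buf() with the actual SE_FL_* flags.

```c
#include <stddef.h>

#define DEMO_RCV_MORE  0x1u  /* hypothetical value standing in for SE_FL_RCV_MORE */
#define DEMO_WANT_ROOM 0x2u  /* hypothetical value standing in for SE_FL_WANT_ROOM */

/* Compute the blocking flags after a transfer attempt.
 * <applet_pending> is the amount of data left in the applet's outbuf,
 * <channel_data> the amount of data currently in the channel buffer.
 */
static unsigned int demo_rcv_flags(unsigned int flags, size_t applet_pending,
                                   size_t channel_data)
{
    flags &= ~(DEMO_RCV_MORE | DEMO_WANT_ROOM);
    if (applet_pending) {
        /* the applet still has something to deliver */
        flags |= DEMO_RCV_MORE;
        /* room can only be missing if the channel holds data */
        if (channel_data)
            flags |= DEMO_WANT_ROOM;
    }
    return flags;
}
```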
2025-10-27 16:57:07 +01:00
Olivier Houchard
259b1e1c18 MEDIUM: quic: Fix build with openssl-compat
As the QUIC options have been split into backend and frontend, there is
no more GTUNE_QUIC_LISTEN_OFF to be found in global.tune.options, look
for QUIC_TUNE_FE_LISTEN_OFF in quic_tune.fe instead.
This should fix the build with USE_QUIC and USE_QUIC_OPENSSL_COMPAT.
2025-10-24 13:51:15 +02:00
Olivier Houchard
837351245a BUG/MEDIUM: mt_list: Use atomic operations to prevent compiler optims
As a follow-up to f40f5401b9f24becc6fdd2e77d4f4578bbecae7f, explicitly
use atomic operations to set the prev and next fields, to make sure the
compiler can't assume anything about them and just performs the stores.

This should be backported after f40f5401b9 up to 2.8.
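
A minimal sketch of the idea, using C11 atomics as a stand-in for haproxy's
own atomic macros. The demo_atomic_el type and demo_atomic_reset() are
hypothetical and do not reflect the real mt_list definitions.

```c
#include <stdatomic.h>

/* Hypothetical list element, for illustration only. */
struct demo_atomic_el {
    _Atomic(struct demo_atomic_el *) next;
    _Atomic(struct demo_atomic_el *) prev;
};

/* Make the element point to itself. Each store is an explicit atomic
 * operation taking its value from <el>, so the compiler cannot derive one
 * field from the other nor merge or drop the stores.
 */
static void demo_atomic_reset(struct demo_atomic_el *el)
{
    atomic_store(&el->next, el);
    atomic_store(&el->prev, el);
}
```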
2025-10-24 13:34:41 +02:00
Willy Tarreau
2ec6df59bf BUILD: openssl-compat: fix build failure with OPENSSL=0 and KTLS=1
The USE_KTLS test is currently being done outside of the USE_OPENSSL
guard so disabling USE_OPENSSL still results in build failures on
libcs built with support for kernels before 4.17, because we enable
KTLS by default on linux. Let's move the KTLS block inside the
USE_OPENSSL guard instead.

No backport is needed since KTLS is only in 3.3.
2025-10-24 10:45:02 +02:00
Willy Tarreau
1824079fca BUG/MINOR: stick-tables: properly index string-type keys
This is one of the rare pleasant surprises of fixing an almost 16-years
old bug that remained unnoticed since the feature was implemented. In
1.4-dev7, commit 3bd697e071 ("[MEDIUM] Add stick table (persistence)
management functions and types") introduced stick-tables with multiple
key types, including strings, IP addresses and integers. Entries are
coded in binary and their binary representation is indexed. A special
case was made for strings in order to index them as zero-terminated
strings. However, there's one subtlety. While strings indeed have a
zero appended, they're still indexed using ebmb_insert(), which means
that all the bytes till the configured size are indexed as well. And
while these bytes generally come from a temporary storage that often
contains zeroes, or that is longer than the configured string length
and will result in truncation, it's not always the case and certain
traffic patterns with certain configurations manage to occasionally
present unpadded strings resulting in apparent duplicate keys appearing
in the dump, as shown in GH issue #3161. It seems to be essentially
reproducible at boot, and not to be particularly affected by mixed
patterns. These keys are in fact not exact duplicates in memory, but
everywhere they're used (including during synchronization), they are
equal.

What's interesting is that when this happens, one key can be presented
to a peer with its own data and will be indexed as the only one, possibly
replacing contents from the previous key, which might replace them again
later once updated in turn. This is visible in the dump of the issue
above, where key "localhost:8001" was split into two entries, one with a
request count of one and the other with a request count of 499999, and
indeed, all peers see only that last value, which overwrote the first
one.

This fix must be backported to all stable branches. Special kudos to
Mark Wort for underlining that one.
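
The effect can be illustrated with the sketch below, which assumes a
hypothetical 16-byte key slot: two buffers that hold the same
zero-terminated string but different trailing bytes compare equal as
strings and different as binary blocks, unless the tail is zeroed first.
DEMO_KEY_SIZE and demo_store_key() are made up for the example and are not
the stick-table code.

```c
#include <string.h>

#define DEMO_KEY_SIZE 16  /* hypothetical configured key length */

/* Copy a string key into a fixed-size slot and zero every byte after it,
 * so that a byte-wise comparison of the whole slot agrees with strcmp().
 */
static void demo_store_key(char dst[DEMO_KEY_SIZE], const char *src)
{
    size_t len = strlen(src);

    if (len > DEMO_KEY_SIZE - 1)
        len = DEMO_KEY_SIZE - 1;            /* truncate overlong keys */
    memcpy(dst, src, len);
    memset(dst + len, 0, DEMO_KEY_SIZE - len);
}
```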
2025-10-24 10:15:11 +02:00
Aurelien DARRAGON
d655ed5f14 BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency (2nd attempt)
This is a second attempt at fixing issues on 32bits systems which would
trigger the following BUG_ON() statement:

 FATAL: bug condition "sizeof(struct shm_stats_file_object) != 544" matched at src/stats-file.c:825 shm_stats_file_object struct size changed, is is part of the exported API: ensure all precautions were taken (ie: shm_stats_file version change) before adjusting this

This is a drop-in replacement for d30b88a6c + 4693ee0ff, as suggested by
Willy.

Indeed, on supported platforms unsigned int can be assumed to be 4 bytes
long, and long can be assumed to be 8 bytes long. As such, the previous
attempt was overkill and added unnecessary maintenance complexity which
could result in bugs if not used properly. Moreover, it would only
partially solve the issue, since on little endian vs big endian
architectures, the provisioned memory areas (originating from the same
shm stats file) could be read differently by the host.

Instead we fix the alignment issues, and this alone helps to ensure
struct memory consistency on 64 vs 32bits platforms. It was tested
on both i386 and i586.

last_change and last_sess counters are now stored as unsigned int, as
it helped to fix the alignment issues and they were found to be used
as 32bits integers anyway.

Thanks to Willy for problem analysis and the patch proposal.

No backport needed.
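
As an illustration of the alignment problem, consider the hypothetical
struct below (it is not the real shm_stats_file_object). Without the
explicit "pad" member, the i386 ABI would place the 64-bit member at offset
4 and make the struct 12 bytes long, while x86_64 would place it at offset
8 for a total of 16 bytes. With explicit padding both layouts agree, and a
compile-time check can guard against regressions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared object, for illustration only. */
struct demo_shared_obj {
    uint32_t last_change;   /* offset 0 */
    uint32_t pad;           /* explicit padding, offset 4 */
    uint64_t bytes_in;      /* offset 8 on both 32-bit and 64-bit x86 */
};

/* The layout is part of what is exported, so fail the build if it moves. */
_Static_assert(sizeof(struct demo_shared_obj) == 16,
               "demo_shared_obj size changed");
_Static_assert(offsetof(struct demo_shared_obj, bytes_in) == 8,
               "demo_shared_obj member offset changed");
```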
2025-10-24 09:35:38 +02:00
Aurelien DARRAGON
a931779dde Revert "MINOR: compiler: add FIXED_SIZE(size, type, name) macro"
This reverts commit 466a603b59ed77e9787398ecf1baf77c46ae57b1.
Due to the last 2 commits, this macro is now unused, and will probably
never be used, so let's get rid of that for now.
2025-10-24 09:35:34 +02:00
Aurelien DARRAGON
8277f891d2 Revert "MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct"
This reverts commit 4693ee0ff7a5fa4a12ff69b1a33adca142e781ac.
As discussed in GH #3168, this works but it is not the proper way to fix
the issue. See following commits.
2025-10-24 09:35:29 +02:00
Aurelien DARRAGON
c0d952ccc1 Revert "BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency"
This reverts commit d30b88a6cc47d662e92b524ad5818be312401d0e.
As discussed in GH #3168, this works but it is not the proper way to fix
the issue. See following commits.
2025-10-24 09:35:25 +02:00
Christopher Faulet
854888497e BUG/MEDIUM: applet: Improve again spinning loops detection with the new API
A first attempt to fix this issue was already pushed (54b7539d6 "BUG/MEDIUM:
apppet: Improve spinning loop detection with the new API"). But it was not
fully accurate. Indeed, we must check if something was received or sent by
the applet before incrementing the call rate. But we must also take care that
the applet is allowed to receive or send data. That is what is performed in this
patch.

This patch must be backported as far as 3.0 with the patch above.
2025-10-24 09:26:10 +02:00
Amaury Denoyelle
7ba4b0ad5f BUG/MINOR: quic: rename and duplicate stream settings
Several settings can be set to control stream multiplexing and
associated receive window. Previously, all of these settings were
configured using prefix "tune.quic.frontend.", despite being applied
blindly on both sides.

Fix this by duplicating these settings specific to frontend and backend
side. Options are also renamed to use the standardized prefix
"tune.quic.[be|fe].stream." notation.

Also, each option is individually renamed to better reflect its purpose
and hide technical details relative to QUIC transport parameter naming :
* max-data-size -> stream.rxbuf
* max-streams-bidi -> stream.max-concurrent
* stream-data-ratio -> stream.data-ratio

No need to backport.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
d5142706f8 BUG/MINOR: quic: split option for congestion max window size
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
33afba0dda BUG/MINOR: quic: split max-idle-timeout option for FE/BE usage
Streamline max-idle-timeout option. Rename it to use the newer cohesive
naming scheme 'tune.quic.fe|be.'.

Two different fields were already defined in global struct. These fields
are moved into quic_tune along with other QUIC settings. However, no
parser was defined for backend option, this commit fixes this.

No need to backport this.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
5bc659a4a2 MINOR: quic: rename frontend sock-per-conn setting
On frontend side, a quic_conn can have a dedicated FD or use the
listener one. These different modes can be activated via a global QUIC
tune setting.

This patch adjusts the option. First, it is renamed to the more
meaningful name 'tune.quic.fe.sock-per-conn'. Also, arguments are now
either 'default-on' or 'force-off'. The objective is to better highlight
the relationship with the 'quic-socket' bind option.

The older option is deprecated and will be removed in 3.5.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
a14c6cee17 MINOR: quic: rename retry-threshold setting
A QUIC global tune setting is defined to be able to force Retry emission
prior to handshake. By definition, this ability is only supported by
QUIC servers, hence it is a frontend option only.

Rename the option to use "fe" prefix. The old option name is deprecated
and will be removed in 3.5
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
d248c5bd21 MINOR: quic: rename max Tx mem setting
QUIC global memory can be limited across the entire process via a global
tune setting. Previously, this setting used the misleading "frontend"
prefix. As this is applied as a sum between all QUIC connections, both
from frontend and backend sides, remove the prefix. The new option name
is "tune.quic.mem.tx-max".

The older option name is deprecated and will be removed in 3.5.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
9bfe9b9e21 MINOR: quic: split Tx options for FE/BE usage
This patch is similar to the previous one, except that it is focused on
Tx QUIC settings. It is now possible to toggle GSO and pacing on
frontend and backend sides independently.

As with the previous patch, options are renamed to use "fe/be" unified
prefixes. This is part of the current series of commits which unify QUIC
settings. Older options are deprecated and will be removed in the 3.5
release.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
33a8cb87a9 MINOR: quic: split congestion controler options for FE/BE usage
Various settings can be configured related to QUIC congestion controller.
This patch duplicates them to be able to set independent values on
frontend and backend sides.

As with the previous patch, options are renamed to use "fe/be" unified
prefixes. This is part of the current series of commits which unify QUIC
settings. Older options are deprecated and will be removed in the 3.5
release.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
7640e9a9ee MINOR: quic: duplicate glitches FE option on BE side
Previously, QUIC glitches support was only implemented for frontend
side. Extend this so that the option can be specified separately both on
frontend and backend sides. Function _qcc_report_glitch() now retrieves
the relevant max value based on connection side.

In addition to this, the option has been renamed to use "fe/be" prefixes.
This is part of the current series of commits which unify QUIC settings.
Older options are deprecated and will be removed in the 3.5 release.
2025-10-23 16:49:20 +02:00
Amaury Denoyelle
b34cd0b506 MINOR: quic: rename "no-quic" to "tune.quic.listen"
Rename the option to quickly enable/disable all QUIC listeners. It now
takes an argument on/off. The documentation is extended to reflect the
fact that QUIC backends are not impacted by this option.

The older keyword is simply removed. Deprecation is considered
unnecessary as this setting is only useful during debugging.
2025-10-23 16:47:58 +02:00
Amaury Denoyelle
42e5ec6519 MINOR: quic: prepare support for options on FE/BE side
A major reorganization of QUIC settings is going to be performed. One of
its objectives is to clearly define options which can be separately
configured on frontend and backend proxy sides.

To implement this, quic_tune structure is extended to support fe and be
options. A set of macros/functions is also defined : it allows to
retrieve an option defined on both sides with unified code, based on
proxy side of a quic_conn/connection instance.
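
A possible shape for such a side-aware accessor is sketched below. All
names (demo_quic_side, demo_quic_tune, DEMO_QUIC_TUNE_GET and the two
option fields) are hypothetical and only illustrate the principle; they are
not the definitions added by this patch.

```c
/* Hypothetical per-side settings. */
struct demo_quic_side {
    unsigned int max_idle_timeout_ms;
    unsigned int glitches_threshold;
};

/* Hypothetical global tuning structure with one set of options per side. */
struct demo_quic_tune {
    struct demo_quic_side fe;   /* frontend side */
    struct demo_quic_side be;   /* backend side */
};

/* Retrieve option <opt> for the relevant side with a single expression:
 * <is_back> is non-zero for a backend connection.
 */
#define DEMO_QUIC_TUNE_GET(tune, is_back, opt) \
    ((is_back) ? (tune)->be.opt : (tune)->fe.opt)
```

Callers then no longer need to test the connection side themselves each
time an option is read.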
2025-10-23 15:06:01 +02:00
Amaury Denoyelle
cf3cf7bdda MINOR: quic: remove unused conn-tx-buffers limit keyword
Remove parsing code for tune.quic.frontend.conn-tx-buffers.limit. This
option was deprecated for some time and in fact was a noop and no longer
mentioned in the documentation.
2025-10-23 15:06:01 +02:00
Olivier Houchard
f40f5401b9 BUG/MEDIUM: mt_lists: Avoid el->prev = el->next = el
Avoid setting both el->prev and el->next on the same line.
The goal is to set both el->prev and el->next to el, but a naive
compiler, such as when we're using -O0, will set el->next first, then
will set el->prev to the value of el->next, but if we're unlucky,
el->next will have been set to something else by another thread.
So explicitly set both to what we want.

This should be backported up to 2.8.
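
The two forms can be compared in the sketch below, which uses a
hypothetical two-pointer element rather than the real mt_list type.

```c
#include <stddef.h>

/* Hypothetical element type, for illustration only. */
struct demo_el {
    struct demo_el *next;
    struct demo_el *prev;
};

/* Chained form: as described above, a compiler may implement this as
 * "el->next = el" followed by "el->prev = el->next", re-reading el->next
 * after another thread had a chance to change it.
 */
static void demo_reset_chained(struct demo_el *el)
{
    el->prev = el->next = el;
}

/* Explicit form: each field is assigned from <el> itself, so neither
 * store depends on the current content of the other field.
 */
static void demo_reset_explicit(struct demo_el *el)
{
    el->next = el;
    el->prev = el;
}
```

In a single thread both functions give the same result; the difference only
matters when another thread may write to el->next between the two stores.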
2025-10-23 14:43:51 +02:00
William Lallemand
d0f9515e5c MINOR: acme: display the complete challenge_ready command in the logs
When using a wildcard DNS domain in the ACME configuration, for example
*.example.com, one might think that it needs to use the challenge_ready
command with this domain. But that's not the case, the challenge_ready
command takes the domain asked by the ACME server, which is stripped of
the wildcard.

In order to be clearer, the log message shows exactly the command the
user should send.
2025-10-23 11:14:07 +02:00
William Lallemand
861fe53204 MINOR: acme: add the dns-01-record field to the sink
The dns-01-record field in the dpapi sink outputs the authentication
token which is needed in the TXT record in order to validate the DNS-01
challenge.
2025-10-23 11:14:07 +02:00
Olivier Houchard
dfe866fa98 BUG/MEDIUM: stick-tables: Don't loop if there's nothing left
Before waking up the expiration task again at the end of it, make sure
the next date is set. If there's nothing left to do, then task_exp will
be TASK_ETERNITY and we then don't want to be woken up again.
2025-10-23 10:51:52 +02:00
Willy Tarreau
871c80505c BUG/MEDIUM: build: limit excessive and counter-productive gcc-15 vectorization
In https://bugs.gentoo.org/964719, Dan Goodliffe reported that using
CFLAGS="-O3 -march=westmere" creates a binary that segfaults on startup
with gcc-15. This could be reproduced here, is isolated to gcc-15 and
-O3, and is caused by gcc emitting "movdqa" instructions to read unaligned
longs taken from chars that were carefully isolated within ifdefs checking
for support for unaligned integers on the platform...

Some experiments showed that changing all casts all over the code using
either typedef-enforced align(1) or using the packed union trick does
the job, but it needs a more in-depth validation since it's obvious that
it doesn't produce the same code at all (at least on more modern
machines).

However, the offending optimization option could be isolated, it's
"-fvect-cost-model=dynamic" which causes this, while -O2 uses
"-fvect-cost-model=very-cheap". Turning it back to very-cheap solves the
issue, reduces the code, and yields an extra 5% performance increase on
the http-request rate (181k vs 172k on a single core)! This could at
least partially explain why it has been observed several times over
the last few years that -O3 yields bigger and slower code than -O2.

It was also verified that the option doesn't change the emitted code
at -O0..-O2,-Os,-Oz, but only at -O3.

This patch detects the presence of this option and turns it on to
address the problem that some distros are facing after an upgrade to
gcc-15. As such it should be backported to recent LTS and stable
branches. Here, 3.1 was used, so it seems legit to at least target
the last two LTS branches (i.e. go as far as 3.0).

Thanks to Dan Goodliffe for sharing a working reproducer, Sam James
for starting the investigations and Christian Ruppert for bringing
the issue to us.
2025-10-23 10:06:52 +02:00
Aurelien DARRAGON
d30b88a6cc BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency
As reported by @tianon on GH #3168, running haproxy on 32bits i386
platform would trigger the following BUG_ON() statement:

 FATAL: bug condition "sizeof(struct shm_stats_file_object) != 544" matched at src/stats-file.c:825
shm_stats_file_object struct size changed, is is part of the exported API: ensure all precautions were taken (ie: shm_stats_file version change) before adjusting this

In fact, some efforts were already taken to ensure shm_stats_file_object
struct size remains consistent on 64 vs 32 bits platforms, since
shm_stats_file_object is part of the public API and directly exposed in
the stats file.

However, some parts were overlooked: some structs that are embedded in
shm_stats_file_object struct itself weren't using fixed-width integers,
and would sometimes be unaligned. The result of this is that it was
up to the compiler (platform-dependent) to choose how to deal with such
ambiguities, which could cause the struct mapping/size to be inconsistent
from one platform to another.

Fortunately this was caught by the BUG_ON() statement and with the precious
help of @tianon.

To fix this, we now use fixed-width integers everywhere for members
(and submembers) of shm_stats_file_object struct, and we use explicit
padding where missing to avoid automatic padding when we don't expect
one. As for the previous commit, we leverage the FIXED_SIZE() and
FIXED_SIZE_ARRAY() macros to set the expected width for each integer
without causing build issues on platforms that don't support larger
integers.

No backport needed, this feature was introduced during 3.3-dev.
2025-10-22 20:52:22 +02:00
Aurelien DARRAGON
4693ee0ff7 MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct
freq-ctr struct is used by the shm_stats_file API, and more precisely,
it is used in the shm_stats_file_object struct for counters.

shm_stats_file_object struct is required to be platform-independent, thus
we switch to using explicit size types (AKA fixed width integer types)
for freq-ctr, in the attempt to make freq-ctr size and memory mapping
consistent from one platform to another.

We cannot simply use fixed-width integer because some of them are
involved in atomic operations, and forcing a given width could
cause build issues on some platforms where atomic ops are not
implemented for large integers. Instead we leverage the FIXED_SIZE
macro to keep handling the integers as before, but forcing them to
be stored using expected number of bytes (unused bytes will simply
be ignored).

No change of behavior should be expected.
2025-10-22 20:52:18 +02:00
Aurelien DARRAGON
466a603b59 MINOR: compiler: add FIXED_SIZE(size, type, name) macro
FIXED_SIZE() macro can be used to instruct the compiler that the struct
member named <name>, handled as <type>, must be stored using <size> bytes,
even if the type used is actually smaller than the expected size.

FIXED_SIZE_ARRAY(), similar to FIXED_SIZE() but for arrays: it takes an
extra argument which is the number of members.

They may be used for portability concerns to ensure a structure mapping
remains consistent between platforms.
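
One way such a macro could be written is sketched below, relying on a C11
anonymous union to reserve the requested number of bytes. This is an
assumption made for illustration: DEMO_FIXED_SIZE and demo_counter are
hypothetical and are not necessarily how the patch defines FIXED_SIZE().

```c
/* Reserve <size> bytes for member <name> while still handling it as
 * <type>. The char array only serves to force the storage size.
 */
#define DEMO_FIXED_SIZE(size, type, name) \
    union { type name; char demo_pad_##name[size]; }

/* Hypothetical counter pair stored on 8 bytes each on every platform. */
struct demo_counter {
    DEMO_FIXED_SIZE(8, unsigned int, curr_ctr);
    DEMO_FIXED_SIZE(8, unsigned int, prev_ctr);
};

_Static_assert(sizeof(struct demo_counter) == 16,
               "demo_counter must occupy 16 bytes");
```

Members are still accessed by their plain name (for example c.curr_ctr),
the unused bytes being simply ignored.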
2025-10-22 20:52:12 +02:00
Aurelien DARRAGON
1e4dbebef2 MINOR: stats-file: fix typo in shm-stats-file object struct size detection
As reported by @TimWolla on GH #3168, there was a typo in shm stats file
BUG_ON to report that the size of shm_stats_file_object changed.

No backport needed.
2025-10-22 20:52:08 +02:00
Amaury Denoyelle
f50425c021 MINOR: quic: remove received CRYPTO temporary tree storage
The previous commit switched from ncbuf to ncbmbuf as storage for received
CRYPTO frames. The latter ensures that buffering of such frames cannot
fail anymore due to gap size.

Previously, extra mechanisms were implemented in the QUIC frames parsing
function to overcome the limitation of ncbuf on gap size. Before
insertion, CRYPTO frames were stored in a temporary tree to order their
insertion. As this is not necessary anymore, this commit removes the
temporary tree insertion.

This commit is closely associated to the previous bug fix. As it
provides a neat optimization and code simplification, it can be backported
with it, but not in the next immediate release, in order to spot potential
regressions.
2025-10-22 15:24:02 +02:00
Amaury Denoyelle
4c11206395 BUG/MAJOR: quic: use ncbmbuf for CRYPTO handling
In QUIC, TLS handshake messages such as ClientHello are encapsulated in
CRYPTO frames. Each QUIC implementation can split the content in several
frames of random sizes. In fact, this feature is now used by several
clients, based on Chrome's so-called "Chaos protection" mechanism :

https://quiche.googlesource.com/quiche/+/cb6b51054274cb2c939264faf34a1776e0a5bab7

To support this, haproxy uses a ncbuf storage to store received CRYPTO
frames before passing it to the SSL library. However, this storage
suffers from a limitation as gaps between two filled blocks cannot be
smaller than 8 bytes. Thus, depending on the size of received CRYPTO
frames and their order, ncbuf may not be sufficient. Over time, several
mechanisms were implemented in haproxy QUIC frames parsing to overcome
the ncbuf limitation.

However, recent reports highlight that with some clients haproxy is
not able to deal with CRYPTO frames reception. In particular, this is
the case with the latest ngtcp2 release, which implements a similar
chaos protection mechanism via the following patch. It also seems that
this impacts haproxy interaction with firefox.

commit 89c29fd8611d5e6d2f6b1f475c5e3494c376028c
Author: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Date:   Mon Aug 4 22:48:06 2025 +0900

    Crumble Client Initial CRYPTO (aka chaos protection)

To fix haproxy CRYPTO frames buffering once and for all, an alternative
non-contiguous buffer named ncbmbuf has been recently implemented. This
type does not suffer from the gap size limitation, albeit at the cost of a
small reduction in the size available for data storage.

Thus, the purpose of this current patch is to replace ncbuf with the
newer ncbmbuf for QUIC CRYPTO frames parsing. Now, ncbmb_add() is used
to buffer received frames which is guaranteed to succeed. The only
remaining case of error is if a received frame offset and length exceed
the ncbmbuf data storage, which would result in a CRYPTO_BUFFER_EXCEEDED
error code.

A notable behavior change when switching to ncbmbuf implementation is
that NCB_ADD_COMPARE mode cannot be used anymore during add. Instead,
crypto frame content received at a similar offset will be overwritten.

A final note regarding STREAM frames parsing. For now, it is considered
unnecessary to switch from ncbuf in this case. Indeed, QUIC clients do
not perform aggressive fragmentation for them. Keeping ncbuf ensures that
the data storage size is bigger than the equivalent ncbmbuf area.

This should fix github issue #3141.

This patch must be backported up to 2.6. It is first necessary to pick
the relevant commits for ncbmbuf implementation prior to it.
2025-10-22 15:04:41 +02:00
Amaury Denoyelle
25e378fa65 MINOR: ncbmbuf: add tests as standalone mode
Write some tests for ncbmbuf buf. These tests should be run each time
ncbmbuf implementation is adjusted. Use the following command :

$ gcc -g -DSTANDALONE -I./include -o ncbmbuf src/ncbmbuf.c && ./ncbmbuf

As the previous patch, this commit must be backported prior to the fix
to come on QUIC CRYPTO frames parsing.
2025-10-22 15:04:24 +02:00
Amaury Denoyelle
8b8ab2824e MINOR: ncbmbuf: implement advance operation
Implement ncbmb_advance() function for the ncbmbuf type. This allows to
remove bytes in front of the buffer, regardless of the existing gaps.
This is implemented by resetting the corresponding bits of the bitmap.

As the previous patch, this commit must be backported prior to the fix
to come on QUIC CRYPTO frames parsing.
2025-10-22 15:04:06 +02:00
Amaury Denoyelle
42c495f3d7 MINOR: ncbmbuf: implement ncbmb_data()
Implement ncbmb_data() function for the ncbmbuf type. Its purpose is
similar to its ncbuf counterpart : it returns the size in bytes of data
starting at a specific offset until the next gap.

As the previous patch, this commit must be backported prior to the fix
to come on QUIC CRYPTO frames parsing.
2025-10-22 15:04:06 +02:00
Amaury Denoyelle
db4a68752d MINOR: ncbmbuf: implement iterator bitmap utilities functions
Extend private API for ncbmbuf type by defining an iterator type for the
buffer bitmap handling. The purpose is to provide a simple method to
iterate over the bitmap one byte at a time, with a proper bitmask set to
hide irrelevant bits.

This internal type is unused for now, but will become useful when
implementing ncbmb_data() and ncbmb_advance() functions.

As the previous patch, this commit must be backported prior to the fix
to come on QUIC CRYPTO frames parsing.
2025-10-22 15:04:06 +02:00
Amaury Denoyelle
1e1a3aa6aa MINOR: ncbmbuf: implement add
This patch implements add operation for ncbmbuf type.

This function is simpler than its ncbuf counterpart. Indeed, for now
only NCB_ADD_OVERWRT mode is supported. This compromise has been chosen
as ncbmbuf will be first used for QUIC CRYPTO frames handling, which
does not mandate to compare existing filled blocks during insertion.

As the previous patch, this commit must be backported prior to the fix
to come on QUIC CRYPTO frames parsing.
2025-10-22 15:04:06 +02:00
Amaury Denoyelle
b9f91ad3ff MINOR: ncbmbuf: define new ncbmbuf type
Define ncbmbuf which is an alternative non-contiguous buffer
implementation. "bm" abbreviation stands for bitmap, which reflects how
gaps and filled blocks are encoded. The main purpose of this
implementation is to get rid of the ncbuf limitation regarding the
minimal size for gaps between two blocks of data.

This commit adds the new module ncbmbuf. Along with it, some utility
functions such as ncbmb_make(), ncbmb_init() and ncbmb_is_empty() are
defined. Public API of ncbmbuf will be extended in the following
patches.

This patch is not considered a bug fix. However, it will be required to
fix an issue encountered on QUIC CRYPTO frames parsing. Thus, it will be
necessary to backport the current patch prior to the fix to come.
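
The principle of a bitmap-tracked non-contiguous buffer can be sketched
with the toy model below. It is not the ncbmbuf implementation: demo_bmbuf,
demo_add() and demo_data() are hypothetical names, the storage is a fixed
64-byte array, and only an overwrite-style add is shown. One bit per data
byte records whether that byte is filled, so a gap can be as small as a
single byte.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_SIZE 64   /* hypothetical storage size in bytes */

struct demo_bmbuf {
    unsigned char data[DEMO_SIZE];
    unsigned char bitmap[DEMO_SIZE / 8];  /* one bit per data byte */
};

/* Returns non-zero if the byte at offset <off> is filled. */
static int demo_is_set(const struct demo_bmbuf *b, size_t off)
{
    return (b->bitmap[off / 8] >> (off % 8)) & 1;
}

/* Store <len> bytes at <off>, overwriting any previous content.
 * Returns 0 on success, -1 if the block does not fit in the storage.
 */
static int demo_add(struct demo_bmbuf *b, size_t off, const void *src, size_t len)
{
    size_t i;

    if (off > DEMO_SIZE || len > DEMO_SIZE - off)
        return -1;
    memcpy(b->data + off, src, len);
    for (i = off; i < off + len; i++)
        b->bitmap[i / 8] |= (unsigned char)(1u << (i % 8));
    return 0;
}

/* Number of contiguous filled bytes starting at <off>, up to the next gap. */
static size_t demo_data(const struct demo_bmbuf *b, size_t off)
{
    size_t n = 0;

    while (off + n < DEMO_SIZE && demo_is_set(b, off + n))
        n++;
    return n;
}
```

The price of this flexibility is the room taken by the bitmap itself, which
is no longer available for data.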
2025-10-22 15:04:06 +02:00
Amaury Denoyelle
59f0bafef2 MINOR: ncbuf: extract common types
ncbuf is a module which provides a non-contiguous buffer type
implementation. This patch extracts some basic types related to it into
a new file ncbuf_common.h.

This patch will be useful to provide a new non-contiguous buffer
alternative implementation based on a bitmap.

This patch is not a bug fix. However, it is necessary for ncbmbuf
implementation which will be required to fix a QUIC issue on CRYPTO
frames parsing. Thus, it will be necessary to backport the current patch
prior to the fix to come.
2025-10-22 11:11:20 +02:00
Willy Tarreau
f936feb3a9 BUG/MAJOR: pools: fix default pool alignment
The doc in commit 977feb5617 ("DOC: api: update the pools API with the
alignment and typed declarations") says that alignment of zero means
the type's alignment. And this is followed by the DECLARE_TYPED_POOL()
macro. Yet this is not what is done in create_pool_from_reg() which
only raises the alignment to a void* if lower, while it should start
from the type's. The effect is haproxy refusing to start on some 32-bit
platforms since that commit, displaying an error such as:

   "BUG in the code: at src/mux_h2.c:454, requested creation of pool
    'h2s' aligned to 4 while type requires alignment of 8! Please
    report to developers. Aborting."

Let's just apply the default type's alignment.

Thanks to @tianon for reporting this in GH issue #3168. No backport is
needed since aligned pools are 3.3-only.
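
The intended rule can be sketched as follows. demo_pool_align() is a
hypothetical helper written for this illustration and is not the code of
create_pool_from_reg().

```c
#include <stddef.h>

/* Compute the effective alignment of a pool.
 * <requested> is the alignment asked for at declaration time, where zero
 * means "use the type's alignment"; <type_align> is the alignment of the
 * pooled type. The result is never lower than that of a void*.
 */
static size_t demo_pool_align(size_t requested, size_t type_align)
{
    size_t align = requested ? requested : type_align;

    if (align < _Alignof(void *))
        align = _Alignof(void *);
    return align;
}
```

On a 32-bit platform where void* is aligned on 4 bytes, a type requiring 8
bytes now yields 8 instead of 4 when no explicit alignment is requested.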
2025-10-22 09:06:20 +02:00
Amaury Denoyelle
bece704128 BUG/MEDIUM: h3: properly encode response after interim one in same buf
Recently, proper support for interim responses forwarding to HTTP/3
client has been implemented. However, there was still an issue if two
responses are both encoded in the same snd_buf() iteration.

The issue is caused by the H3 HEADERS frame encoding method: 5 bytes
are reserved in front of the buffer to encode both H3 frame type and
varint length field. After proper headers encoding, output buffer head
is adjusted so that length can be encoded using the minimal varint size.

However, if the buffer is not empty due to a previous response already
encoded but not yet emitted, messing with the buffer head will corrupt
the entire H3 message. This only happens when encoding of both responses
is done in the same snd_buf() iteration, or at least without emission to
quic_conn layer in between.

The result of this bug is that the HTTP/3 client will be unable to parse
the response, most of the time reporting a formatting error. This can
be reproduced using the following netcat as HTTP/1 server to haproxy :

$ while sleep 0.2; do \
    printf "HTTP/1.1 100 continue\r\n\r\nHTTP/1.1 200 ok\r\nContent-length: 5\r\nConnection: close\r\n\r\nblah\n" | nc -lp8002
  done

To fix this, only adjust buffer head if content is empty. If this is not
the case, the frame length is simply encoded as a 4-byte varint so
that messages are contiguous in the buffer.

This must be backported up to 2.6.
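
As an illustration of the two encodings discussed above, here is a minimal
sketch of QUIC variable-length integer encoding (RFC 9000, section 16). The
function names are hypothetical and this is not the h3 code itself. The two
upper bits of the first byte carry the size (1, 2, 4 or 8 bytes), and a value
may be encoded on more bytes than strictly needed, which is what makes the
fixed 4-byte form possible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the minimal number of bytes (1, 2, 4 or 8) needed to encode <v>
 * as a QUIC varint, or 0 if <v> does not fit in 62 bits.
 */
static size_t varint_min_len(uint64_t v)
{
	if (v < (1ULL << 6))
		return 1;
	if (v < (1ULL << 14))
		return 2;
	if (v < (1ULL << 30))
		return 4;
	if (v < (1ULL << 62))
		return 8;
	return 0;
}

/* Encodes <v> on exactly <len> bytes into <out>, in network byte order.
 * <len> must be 1, 2, 4 or 8 and not smaller than varint_min_len(v).
 */
static void varint_encode(unsigned char *out, uint64_t v, size_t len)
{
	unsigned char prefix = (len == 1) ? 0x00 : (len == 2) ? 0x40 :
	                       (len == 4) ? 0x80 : 0xc0;
	size_t i;

	assert(len == 1 || len == 2 || len == 4 || len == 8);
	assert(varint_min_len(v) && varint_min_len(v) <= len);

	for (i = 0; i < len; i++)
		out[len - 1 - i] = (unsigned char)(v >> (8 * i));
	out[0] |= prefix; /* size indication in the two upper bits */
}

/* Hypothetical policy mirroring the fix: use the minimal size only when the
 * buffer held nothing before this frame (the head may then be moved),
 * otherwise keep the fixed 4-byte form so that messages stay contiguous.
 */
static size_t h3_len_field_size(uint64_t payload_len, int buf_was_empty)
{
	return buf_was_empty ? varint_min_len(payload_len) : 4;
}
```

For a 37-byte payload, the minimal form is the single byte 0x25 while the
4-byte form is 0x80 0x00 0x00 0x25. Both decode to the same length.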
2025-10-21 15:51:48 +02:00
Christopher Faulet
18ece2b424 BUG/MEDIUM: h1-htx: Don't set HTX_FL_EOM flag on 1xx informational messages
1xx informational messages are part of the HTTP response. It is not expected
to have a HTX_FL_EOM flag set after parsing such messages when received from
a server. It is especially important when an informational message is
processed on client side while the final response was not received yet, to
not erroneously detect the end of the message.

The HTTP multiplexers seem to ignore the HTX_FL_EOM flag for informational
messages, but it remains an error from the HTX specification point of
view. So it must be fixed.

While it should theoretically be backported as far as 3.0, it is a good idea
to not do so for now because no bug was reported and regressions may happen.
2025-10-21 14:22:26 +02:00
Olivier Houchard
cd92aeb366 MEDIUM: stick-tables: Stop as soon as stktable_trash_oldest succeeds.
stktable_trash_oldest() goes through all the shards, trying to free a
number of entries. Going through each shard is expensive, as we have to
take the shard lock, so stop as soon as we free'd at least one entry, as
it is only called when we want to make room for one entry.
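
The early exit can be sketched as follows. Everything here is a simplified,
hypothetical model (struct shard, shard_trash(), NB_SHARDS), not the real
stktable_trash_oldest(): each visited shard costs one lock operation, and the
loop stops at the first shard where at least one entry could be freed.

```c
#include <assert.h>
#include <stddef.h>

#define NB_SHARDS 4 /* hypothetical shard count for this sketch */

/* Hypothetical stand-in for a shard: it only tracks how many entries could
 * be freed and how many times its lock was taken.
 */
struct shard {
	int expirable;
	int lock_count;
};

/* Tries to free up to <to_free> entries from <s>, returns the number freed */
static int shard_trash(struct shard *s, int to_free)
{
	int freed;

	s->lock_count++; /* stands for taking the (expensive) shard lock */
	freed = s->expirable < to_free ? s->expirable : to_free;
	s->expirable -= freed;
	return freed;
}

/* Visits the shards in order and stops as soon as one entry was freed */
static int trash_oldest(struct shard *shards, int to_free)
{
	int i, freed = 0;

	assert(to_free > 0);
	for (i = 0; i < NB_SHARDS; i++) {
		freed = shard_trash(&shards[i], to_free);
		if (freed > 0)
			break; /* room was made, no need to lock other shards */
	}
	return freed;
}
```

A caller only needing room for one entry can then treat a return value of
zero as a failure and a non-zero value as success, without caring which
shard provided the room.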
2025-10-20 15:04:47 +02:00
Olivier Houchard
7854331c71 MEDIUM: stick-tables: Stop if stktable_trash_oldest() fails.
In stksess_new(), if the table is full, we call stktable_trash_oldest()
to remove a few entries so that we have some room for a new one.
It is unlikely, but possible, that stktable_trash_oldest() will fail. If
so, just give up and do not add the new entry, instead of adding it
anyway.
Give up if stktable_trash_oldest() fails to free any entry
2025-10-20 15:04:47 +02:00
Olivier Houchard
d5562e31bd MEDIUM: stick-tables: Remove the table lock
Remove the table lock, it was only protecting the per-table expiration
date, and that task is gone.
2025-10-20 15:04:47 +02:00
Olivier Houchard
8bc8a21b25 MEDIUM: stick-tables: Use a per-shard expiration task
Instead of having per-table expiration tasks, just use one per shard.
The task will now go through all the tables to expire entries. When a
table gets an expiration earlier than the one previously known, it will
be put in a mt-list, and the task will be responsible to put it into an
eb32, ordered based on the next expiration.
Each per-shard task will run on a different thread, so it should lead to
a better load distribution than the per-table tasks.
2025-10-20 15:04:47 +02:00
Olivier Houchard
945aa0ea82 MINOR: initcalls: Add a new initcall stage, STG_INIT_2
Add a new initcall stage, STG_INIT_2, for stuff to be called after
step_init_2() is called, so after we know for sure that global.nbthread
will be set.
Modify stick-tables stkt_late_init() to run at STG_INIT_2 instead of
STG_INIT, in anticipation for it to be enhanced and have a need for
global.nbthread.
2025-10-20 15:04:41 +02:00
Willy Tarreau
e63e98f1d8 BUG/MEDIUM: cli: also free the trash chunk on the error path
Since commit 20ec1de214 ("MAJOR: cli: Refacor parsing and execution of
pipelined commands"), commands not returning any response (e.g. "quit")
don't pass through the free_trash_chunk() call, possibly leaking the
cmdline buffer. A typical way to reproduce it is to loop on "quit" on
the CLI, though it very likely affects other specific commands.

Let's make sure in the release handler that we always release that
chunk in any case. This must be backported to 3.2.
2025-10-20 14:58:53 +02:00
Frederic Lecaille
edd21121d2 BUG/MINOR: quic-be: unchecked connections during handshakes
This bug impacts only the backends.

The validity of the ->conn member (pointer to struct connection) of the
ssl_sock_ctx struct was not checked before being dereferenced, leading to
possible crashes in qc_ssl_do_hanshake() during handshake.

This was reported in GH issue #3163.

No need to backport because the QUIC backend support arrived with 3.3.
2025-10-20 14:27:12 +02:00
Olivier Houchard
7a33b90b3c BUG/MEDIUM: mt_list: Make sure not to unlock the element twice
In mt_list_delete(), if the element was not in a list, then n and p will
point to it, and so setting n->prev and n->next will be enough to unlock it.
Don't do it twice, as once it's been done the first time, another thread may
be working with it, and may have added it to a list already, and doing it
a second time can lead to list inconsistencies.

This should be backported up to 2.8.
2025-10-19 23:21:42 +02:00
Willy Tarreau
aa259f5b42 [RELEASE] Released version 3.3-dev10
Released version 3.3-dev10 with the following main changes :
    - BUG/MEDIUM: connections: Only avoid creating a mux if we have one
    - BUG/MINOR: sink: retry attempt for sft server may never occur
    - CLEANUP: mjson: remove MJSON_ENABLE_RPC code
    - CLEANUP: mjson: remove MJSON_ENABLE_PRINT code
    - CLEANUP: mjson: remove MJSON_ENABLE_NEXT code
    - CLEANUP: mjson: remove MJSON_ENABLE_BASE64 code
    - CLEANUP: mjson: remove unused defines and math.h
    - BUG/MINOR: http-ana: Reset analyse_exp date after 'wait-for-body' action
    - CLEANUP: mjson: remove unused defines from mjson.h
    - BUG/MINOR: acme: avoid overflow when diff > notAfter
    - DEV: patchbot: use git reset+checkout instead of pull
    - MINOR: proxy: explicitly permit abortonclose on frontends and clarify the doc
    - REGTESTS: fix h2_desync_attacks to wait for the response
    - REGTESTS: http-messaging: fix the websocket and upgrade tests not to close early
    - MINOR: proxy: only check abortonclose through a dedicated function
    - MAJOR: proxy: enable abortonclose by default on HTTP proxies
    - MINOR: proxy: introduce proxy_abrt_close_def() to pass the desired default
    - MAJOR: proxy: enable abortonclose by default on TLS listeners
    - MINOR: h3/qmux: Set QC_SF_UNKNOWN_PL_LENGTH flag on QCS when headers are sent
    - MINOR: stconn: Add two fields in sedesc to replace the HTX extra value
    - MINOR: h1-htx: Increment body len when parsing a payload with no xfer length
    - MINOR: mux-h1: Set known input payload length during demux
    - MINOR: mux-fcgi: Set known input payload length during demux
    - MINOR: mux-h2: Use <body_len> H2S field for payload without content-length
    - MINOR: mux-h2: Set known input payload length of the sedesc
    - MINOR: h3: Set known input payload length of the sedesc
    - MINOR: stconn: Move data from kip to kop when data are sent to the consumer
    - MINOR: filters: Reset knwon input payload length if a data filter is used
    - MINOR: hlua/http-fetch: Use <kip> instead of HTX extra field to get body size
    - MINOR: cache: Use the <kip> value to check too big objects
    - MINOR: compression: Use the <kip> value to check body size
    - MEDIUM: mux-h1: Stop to use HTX extra value when formatting message
    - MEDIUM: htx: Remove the HTX extra field
    - MEDIUM: acme: don't insert acme account key in ckchs_tree
    - BUG/MINOR: acme: memory leak from the config parser
    - CI: cirrus-ci: bump FreeBSD image to 14-3
    - BUG/MEDIUM: ssl: take care of second client hello
    - BUG/MINOR: ssl: always clear the remains of the first hello for the second one
    - BUG/MEDIUM: stconn: Properly forward kip to the opposite SE descriptor
    - MEDIUM: applet: Forward <kip> to applets
    - DEBUG: mux-h1: Dump <kip> and <kop> values with sedesc info
    - BUG/MINOR: ssl: leak in ssl-f-use
    - BUG/MINOR: ssl: leak crtlist_name in ssl-f-use
    - BUILD: makefile: disable tail calls optimizations with memory profiling
    - BUG/MEDIUM: apppet: Improve spinning loop detection with the new API
    - BUG/MINOR: ssl: Free global_ssl structure contents during deinit
    - BUG/MINOR: ssl: Free key_base from global_ssl structure during deinit
    - MEDIUM: jwt: Remove certificate support in jwt_verify converter
    - MINOR: jwt: Add new jwt_verify_cert converter
    - MINOR: jwt: Do not look into ckch_store for jwt_verify converter
    - MINOR: jwt: Add new "jwt" certificate option
    - MINOR: jwt: Add specific error code for known but unavailable certificate
    - DOC: jwt: Add doc about "jwt_verify_cert" converter
    - MINOR: ssl: Dump options in "show ssl cert"
    - MINOR: jwt: Add new "add/del/show ssl jwt" CLI commands
    - REGTEST: jwt: Test new CLI commands
    - BUG/MINOR: ssl: Potential NULL deref in trace macro
    - MINOR: regex: use a thread-local match pointer for pcre2
    - BUG/MEDIUM: pools: fix bad freeing of aligned pools in UAF mode
    - MEDIUM: pools: detect() when munmap() fails in UAF mode
    - TESTS: quic: useless param for b_quic_dec_int()
    - BUG/MEDIUM: pools: fix crash on filtered "show pools" output
    - BUG/MINOR: pools: don't report "limited to the first X entries" by default
    - BUG/MAJOR: lb-chash: fix key calculation when using default hash-key id
    - BUG/MEDIUM: stick-tables: Don't forget to dec count on failure.
    - BUG/MINOR: quic: check applet_putchk() for 'show quic' first line
    - TESTS: quic: fix uninit of quic_cc_path const member
    - BUILD: ssl: can't build when using -DLISTEN_DEFAULT_CIPHERS
    - BUG/MAJOR: quic: uninitialized quic_conn_closed struct members
    - BUG/MAJOR: quic: do not reset QUIC backends fds in closing state
    - BUG/MINOR: quic: SSL counters not handled
    - DOC: clarify the experimental status for certain features
    - MINOR: config: remove experimental status on tune.disable-fast-forward
    - MINOR: tree-wide: add missing TAINTED flags for some experimental directives
    - MEDIUM: config: warn when expose-experimental-directives is used for no reason
    - BUG/MEDIUM: threads/config: drop absent threads from thread groups
    - REGTESTS: remove experimental from quic/retry.vtc
2025-10-18 11:24:05 +02:00
Willy Tarreau
e8dcd4c9c8 REGTESTS: remove experimental from quic/retry.vtc
Recent commit 8b7a82cd30 ("MEDIUM: config: warn when
expose-experimental-directives is used for no reason") triggered on
this test exactly for the reason it was made for. The tests were just
done without quic on it. Let's drop the unneeded option.
2025-10-17 20:55:43 +02:00
Willy Tarreau
c365e47095 BUG/MEDIUM: threads/config: drop absent threads from thread groups
Thread groups can be assigned arbitrary thread ranges, but if the
mentioned threads do not exist, this causes crashes in listener_accept()
or some connections to be ignored. The reason is that the calculated
mask is derived from the thread group's enabled threads count. Examples:

  global
     nbthread 2
     thread-groups 2
     thread-group 1 1-64
     thread-group 2 65-128

  frontend f-crash
     bind :8001 thread 1/all

  frontend f-freeze
     bind :8002 thread 2/all

This commit removes missing threads, emits a warning when the thread
group just has less threads than requested, and an error when it is
left with no threads at all.

This must be backported to 3.1 since the issue is present there already.
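
A minimal sketch of the clamping described above, assuming a hypothetical
helper tgroup_clamp() (this is not the actual config parser code). Thread
numbers are 1-based and inclusive, as in the "thread-group" lines of the
example.

```c
#include <assert.h>

/* Hypothetical helper: restricts the range <first>..<*last> to the threads
 * that really exist (1..<nbthread>). Returns the number of threads left in
 * the group. A result lower than requested would deserve a warning, and
 * zero would be a configuration error.
 */
static int tgroup_clamp(int nbthread, int first, int *last)
{
	assert(nbthread >= 1 && first >= 1 && *last >= first);

	if (first > nbthread)
		return 0; /* the whole range is made of absent threads */

	if (*last > nbthread)
		*last = nbthread; /* drop the absent threads at the end */

	return *last - first + 1;
}
```

Applied to the example above with "nbthread 2", the range 1-64 is reduced to
1-2 (warning case) and the range 65-128 ends up with no thread at all (error
case).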
2025-10-17 20:36:00 +02:00
Willy Tarreau
8b7a82cd30 MEDIUM: config: warn when expose-experimental-directives is used for no reason
If users start to enable expose-experimental-directives for the purpose
of testing one specific feature, there are chances that the option remains
forever and hides the experimental status of other options.

Let's emit a warning if the option appears and is not used. This will
remind users that they can now drop it, and help keep configs safe for
future upgrades.
2025-10-17 19:00:21 +02:00
Willy Tarreau
80ed9f9dcf MINOR: tree-wide: add missing TAINTED flags for some experimental directives
We normally taint the process when using experimental directives, but
a handful of places were missed so we don't always know that they are
in use. Let's fix these places (hint for future directives, just look
for places checking for "experimental_directives_allowed", and add
"mark_tainted(TAINTED_CONFIG_EXP_KW_DECLARED);").
2025-10-17 19:00:21 +02:00
Willy Tarreau
d3881e61ac MINOR: config: remove experimental status on tune.disable-fast-forward
The option was turned to off by default in 2.8 with commit 2f7c82bfd
("BUG/MINOR: haproxy: Fix option to disable the fast-forward"), however
at the same time it should have dropped its experimental status since
the feature is enabled by default. The only goal of the option is to
debug something, like many other tune.xxx options. The option should
still normally not be used without being invited to do so by developers
looking for something specific though.

This could be backported if desired to simplify debugging, though this
has never been needed for now.
2025-10-17 18:59:47 +02:00
Willy Tarreau
e7c8deb810 DOC: clarify the experimental status for certain features
Certain features require "expose-experimental-directives" to be set in
the global section. Let's clarify that experimental features are only
maintained in best effort mode, may break during the stable cycle, and
are generally not maintained beyond the release of the next LTS branch
since it is extremely challenging, and early adopters are expected to
upgrade to benefit from improvements anyway.
2025-10-17 18:41:13 +02:00
Frederic Lecaille
51eca5cbce BUG/MINOR: quic: SSL counters not handled
The SSL counters were not handled at all for QUIC connections. This patch
implements ssl_sock_update_counters() by extracting the code from ssl_sock.c
and calls this function where applicable both in TLS/TCP and QUIC parts.

Must be backported as far as 2.8.
2025-10-17 12:13:43 +02:00
Frederic Lecaille
8a8417b54a BUG/MAJOR: quic: do not reset QUIC backends fds in closing state
This bug impacts only the backends.

When entering the closing state, a quic_conn_closed is used to replace the
quic_conn. In this state, the ->fd value was reset to -1 by calling
qc_init_fd(). This value is used by qc_may_use_saddr(), which supposes it
cannot be -1 for a backend: when qc_test_fd() is false, ->li is dereferenced,
which is legal only for a listener. This could lead to crashes when
qc_may_use_saddr() is called.

This patch prevents such fd value resets for backends.

No need to backport because the QUIC backends support arrived with 3.3.
2025-10-17 12:13:43 +02:00
Frederic Lecaille
56d15b2a03 BUG/MAJOR: quic: uninitialized quic_conn_closed struct members
A quic_conn_closed struct is initialized to replace the quic_conn when the
connection enters the closing state, to reduce the connection memory
footprint. The ->max_udp_payload member of quic_conn_closed was not
initialized, leading to possible BUG_ON()s in qc_rcv_buf() when comparing
the RX buf size to this payload.

The ->cntrs counters were also not initialized, with the only consequence
being wrong values for these counters.

Must be backported as far as 2.9.
2025-10-17 12:13:43 +02:00
William Lallemand
b74a437e57 BUILD: ssl: can't build when using -DLISTEN_DEFAULT_CIPHERS
Emeric reported that he can't build haproxy anymore since 9bc6a034
("BUG/MINOR: ssl: Free global_ssl structure contents during deinit").

    src/ssl_sock.c:7020:40: error: comparison with string literal results in unspecified behavior [-Werror=address]
     7020 |  if (global_ssl.listen_default_ciphers != LISTEN_DEFAULT_CIPHERS)
          |                                        ^~
    src/ssl_sock.c:7023:41: error: comparison with string literal results in unspecified behavior [-Werror=address]
     7023 |  if (global_ssl.connect_default_ciphers != CONNECT_DEFAULT_CIPHERS)
          |                                         ^~
    src/ssl_sock.c: At top level:

Indeed the mentioned patch checks the pointer in order to only free
something freeable, but that can't work because these constants are
string literals which can be passed from the compiler, not pointers.

Also the test is not useful, because these strings are strdup()'ed in
__ssl_sock_init, so they can be freed directly.

Must be backported to every stable branch along with 9bc6a034.
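
The idea of the fix can be sketched like this. The structure, the function
names and the DEFAULT_CIPHERS value are hypothetical and only mimic the
pattern described above: since the default is always duplicated at init
time, deinit may free the field unconditionally, and no comparison against
the string literal is needed.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_CIPHERS "HIGH:!aNULL" /* hypothetical build-time default */

/* Hypothetical reduced version of a global settings structure */
struct ssl_globals {
	char *default_ciphers;
};

/* Always stores a heap copy of the default. Returns non-zero on success. */
static int ssl_globals_init(struct ssl_globals *g)
{
	g->default_ciphers = strdup(DEFAULT_CIPHERS);
	return g->default_ciphers != NULL;
}

/* Releases the field without any "!= DEFAULT_CIPHERS" test: comparing a
 * pointer with a string literal has an unspecified result, and the field
 * never points to the literal itself anyway.
 */
static void ssl_globals_deinit(struct ssl_globals *g)
{
	assert(g != NULL);
	free(g->default_ciphers);
	g->default_ciphers = NULL;
}
```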
2025-10-17 09:45:26 +02:00
Amaury Denoyelle
5b04a85bc7 TESTS: quic: fix uninit of quic_cc_path const member
Fix quic_tx unittest module by adding an explicit define for <mtu> const
member of quic_cc_path.

This should fix coverity report from github issue #3162.

This can be backported up to 3.2.
2025-10-17 09:29:01 +02:00
Amaury Denoyelle
5067a15870 BUG/MINOR: quic: check applet_putchk() for 'show quic' first line
Ensure applet_putchk() return value is checked when outputting via the
CLI 'show quic' header line.

This is only to align with other usages of the same function, as trash
output buffer should always be large enough for it. As such, the command
is simply aborted if this is not the case.

This should fix coverity report from github issue #3139.

This could be backported up to 2.8.
2025-10-17 09:29:01 +02:00
Olivier Houchard
8d31784c0f BUG/MEDIUM: stick-tables: Don't forget to dec count on failure.
In stksess_new(), if we failed to allocate memory for the new stksess,
don't forget to decrement the table entry count, as nobody else will
do it for us.
An artificially high count could lead to at least purging entries while
there is no need to.

This should be backported up to 2.8.

WIP decrement current on allocation failure
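
The accounting issue can be illustrated with the sketch below. It is a
simplified model under assumptions (struct table, entry_new() and
failing_alloc() are hypothetical names, and a full table simply gives up
here): the entry count is reserved before the allocation, so it has to be
rolled back when the allocation fails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reduced table: only the accounting part is modelled */
struct table {
	atomic_uint current; /* number of entries accounted for */
	unsigned int size;   /* maximum number of entries */
};

/* Allocator that always fails, only there to exercise the error path */
static void *failing_alloc(size_t len)
{
	(void)len;
	return NULL;
}

/* Reserves one slot in <t> then allocates <len> bytes using <alloc>.
 * Whenever NULL is returned, the reservation has been undone.
 */
static void *entry_new(struct table *t, size_t len, void *(*alloc)(size_t))
{
	void *e;

	if (atomic_fetch_add(&t->current, 1) >= t->size) {
		atomic_fetch_sub(&t->current, 1); /* table full, give up */
		return NULL;
	}

	e = alloc(len);
	if (!e)
		atomic_fetch_sub(&t->current, 1); /* nobody else will do it */
	return e;
}
```

Without the second atomic_fetch_sub(), each failed allocation would leave
the counter one unit too high, which matches the artificially high count
mentioned above.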
2025-10-16 23:46:37 +02:00
Willy Tarreau
03e9a5a1e7 BUG/MAJOR: lb-chash: fix key calculation when using default hash-key id
A subtle regression was introduced in 3.0 by commit faa8c3e02 ("MEDIUM:
lb-chash: Deterministic node hashes based on server address"). When keys
are calculated from the server's ID (which is the default), due to the
reorganisation of the code, the key ended up being hashed twice instead
of being multiplied by the scaling range.

While most users will never notice it, it is blocking some large cache
users from upgrading from 2.8 to 3.0 or 3.2 because the keys are
redistributed.

After a check with users on the mailing list [1] it was estimated that
keeping the current situation is the worst choice because those who have
not yet upgraded will face the problem, while by fixing it, those who
already have and for whom it happened smoothly will handle it just
right again.

As such this fix must be backported to 3.0 without waiting (in order
to preserve those who upgrade from two redistributions). Please note
that only configurations featuring "hash-type consistent" and not
having "hash-key" present with a value other than "id" are affected,
others are not (e.g. "hash-key addr" is unaffected).

[1] https://www.mail-archive.com/haproxy@formilux.org/msg46115.html
2025-10-16 10:43:09 +02:00
Willy Tarreau
f263a45ddf BUG/MINOR: pools: don't report "limited to the first X entries" by default
With the fix in commit 982805e6a3 ("BUG/MINOR: pools: Fix the dump of
pools info to deal with buffers limitations"), the max count is now
compared to the number of dumped pools instead of the configured
number, and keeping >= is no longer valid because maxcnt is set by
default to the same value when not set, so this means that since this
patch we're always displaying "limited to the first X entries" where X
is the number of dumped entries even in the absence of any limitation.
Let's just fix the comparison to only show this when the limit is lower.

This must be backported to 3.2 where the patch above already is.
2025-10-16 08:41:32 +02:00
Willy Tarreau
ab0c97139f BUG/MEDIUM: pools: fix crash on filtered "show pools" output
The truncation of pools output that was addressed in commit 982805e6a3
("BUG/MINOR: pools: Fix the dump of pools info to deal with buffers
limitations") required to split the pools filling from dumping. However
there is a problem when a limit is passed that is lower than the number
of pools or if a pool name is specified or if pool caches are disabled,
because in this case the number of filled slots will be lower than the
initially allocated one, and empty entries will be visited either by the
sort functions when filling the entries if "byxxx" is specified, or by
the dump function after the last entry, but none of these functions was
expecting to be passed a NULL entry.

Let's just re-adjust nbpools to match the number of filled entries at
the end. Anyway the totals are calculated on the number of dumped
entries.

This must be backported to 3.2 since the fix above was backported there
as well.
2025-10-16 08:41:32 +02:00
Frederic Lecaille
d5f4872ba6 TESTS: quic: useless param for b_quic_dec_int()
The third parameter passed to b_quic_dec_int() is uninitialized. This is not
a bug, but it disturbs Coverity for an unknown reason, as revealed by GH
issue #3154.

This patch takes the opportunity to pass NULL instead, to avoid using such
an unneeded third parameter.

Should be backported to 3.2 where this unit test was introduced.
2025-10-15 09:58:03 +02:00
Willy Tarreau
17930edecc MEDIUM: pools: detect() when munmap() fails in UAF mode
Better check that munmap() always works, otherwise it means we might
have miscalculated an address, and if it fails silently, it will eat
all the memory extremely quickly. Let's add a BUG_ON() on munmap's
return.
2025-10-13 19:22:31 +02:00
Willy Tarreau
0e6a233217 BUG/MEDIUM: pools: fix bad freeing of aligned pools in UAF mode
As reported by Christopher, in UAF mode memory release of aligned
objects as introduced in commit ef915e672a ("MEDIUM: pools: respect
pool alignment in allocations") does not work. The padding calculation
in the freeing code is no longer correct since it now depends on the
alignment, so munmap() fails on EINVAL. Fortunately we don't care much
about it since we know it's the low bits of the passed address, which
is much simpler to compute, since all mmaps are page-aligned.

There's no need to backport this, as this was introduced in 3.3.
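
A minimal sketch of the "low bits" computation mentioned above, assuming
that the object starts within the first page of its mapping and that the
page size is a power of two. uaf_area_base() is a hypothetical name, not the
function used in the pools code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the page-aligned start of the mapping that
 * holds <obj>. Since mmap() returns page-aligned areas, the padding placed
 * in front of the object is simply the low bits of its address.
 */
static void *uaf_area_base(void *obj, uintptr_t page_size)
{
	uintptr_t addr = (uintptr_t)obj;
	uintptr_t pad;

	/* the mask below only works for a power-of-two page size */
	assert(page_size && (page_size & (page_size - 1)) == 0);

	pad = addr & (page_size - 1);
	return (void *)(addr - pad);
}
```

The returned address is what would be passed to munmap(), regardless of the
alignment that was used to place the object inside the area.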
2025-10-13 19:19:39 +02:00
Willy Tarreau
fda6dc9597 MINOR: regex: use a thread-local match pointer for pcre2
The pcre2 matching requires an array of matches for grouping, that is
allocated when executing the rule by pre-processing it, and that is
immediately freed after use. This is quite inefficient and results in
annoying patterns in "show profiling" that attribute the allocations
to libpcre2 and the releases to haproxy.

A good suggestion from Dragan is to pre-allocate these per thread,
since the entry is not specific to a regex. In addition we're already
limited to MAX_MATCH matches so we don't even have the problem of
having to grow it while parsing nor processing.

The current patch adds a per-thread pair of init/deinit functions to
allocate a thread-local entry for that, and gets rid of the dynamic
allocations. It will result in cleaner memory management patterns and
slightly higher performance (+2.5%) when using pcre2.
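
The per-thread pre-allocation pattern can be sketched as below. This is not
the pcre2 integration itself: struct match is a hypothetical stand-in for
the match data, the function names are made up, and the value given to
MAX_MATCH is assumed for the example.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_MATCH 10 /* assumed value, only for this sketch */

/* Hypothetical stand-in for the per-match data */
struct match {
	size_t start;
	size_t end;
};

/* One array per thread, allocated once and reused for every regex */
static _Thread_local struct match *local_matches;

/* Per-thread init: to be called once when the thread starts.
 * Returns non-zero on success.
 */
static int regex_thread_init(void)
{
	local_matches = calloc(MAX_MATCH, sizeof(*local_matches));
	return local_matches != NULL;
}

/* Per-thread deinit: to be called once when the thread stops */
static void regex_thread_deinit(void)
{
	free(local_matches);
	local_matches = NULL;
}

/* Returns the calling thread's array, with no allocation on the hot path */
static struct match *regex_get_matches(void)
{
	assert(local_matches != NULL);
	return local_matches;
}
```

Since the array is not specific to a regex and its size is bounded, the
matching code no longer needs to allocate nor free anything per execution.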
2025-10-13 16:56:43 +02:00
Remi Tricot-Le Breton
6f4ca37880 BUG/MINOR: ssl: Potential NULL deref in trace macro
'ctx' might be NULL when we exit 'ssl_sock_handshake', it can't be
dereferenced without check in the trace macro.

This was found by Coverity and raised in GitHub #3113.
This patch should be backported up to 3.2.
2025-10-13 15:44:45 +02:00
Remi Tricot-Le Breton
d82019d05c REGTEST: jwt: Test new CLI commands
Test the "add/del ssl jwt" commands and check the new return value in
case of unavailable certificate used in a jwt_verify_cert converter.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
d4bb9983fa MINOR: jwt: Add new "add/del/show ssl jwt" CLI commands
The new "add/del ssl jwt <file>" commands allow to change the "jwt" flag
of an already loaded certificate. It allows to delete certificates used
for JWT validation, which was not yet possible.
The "show ssl jwt" command iterates over all the ckch_stores and dumps
the ones that have the option set.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
daa36adc6e MINOR: ssl: Dump options in "show ssl cert"
Dump the values of the 'ocsp-update' and 'jwt' flags in the output of
'show ssl cert' CLI command.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
0f35b46124 DOC: jwt: Add doc about "jwt_verify_cert" converter
Add information about the new "jwt_verify_cert" converter and update the
existing "jwt_converter" doc to remove mentions of certificates from it.
Add information about the new "jwt" certificate option.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
bf5b912a62 MINOR: jwt: Add specific error code for known but unavailable certificate
A certificate that does not have the 'jwt' flag enabled cannot be used
for JWT validation. We now raise a specific return value so that such a
case can be identified.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
18ff130e9d MINOR: jwt: Add new "jwt" certificate option
This option can be used to enable the use of a given certificate for JWT
verification. It defaults to 'off' so certificates that are declared in
a crt-store and will be used for JWT verification must have a
"jwt on" option in the configuration.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
53957c50c3 MINOR: jwt: Do not look into ckch_store for jwt_verify converter
We must not try to load full-on certificates for 'jwt_verify' converter
anymore. 'jwt_verify_cert' is the only one that accepts a certificate.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
f5632fd481 MINOR: jwt: Add new jwt_verify_cert converter
This converter will be in charge of performing the same operation as the
'jwt_verify' one except that it takes a full-on pem certificate path
instead of a public key path as parameter.
The certificate path can be either provided directly as a string or via
a variable. This allows to use certificates that are not known during
init to perform token validation.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
c3c0597a34 MEDIUM: jwt: Remove certificate support in jwt_verify converter
The jwt_verify converter will not take full-on certificates anymore
in favor of a new soon to come jwt_verify_cert. We might end up with a
new jwt_verify_hmac in the future as well which would allow to deprecate
the jwt_verify converter and remove the need for a specific internal
tree for public keys.
The logic to always look into the internal jwt tree by default and
resolve to locking the ckch tree as little as possible will also be
removed. This allows to get rid of the duplicated reference to
EVP_PKEYs, the one in the jwt tree entry and the one in the ckch_store.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
b706f2d092 BUG/MINOR: ssl: Free key_base from global_ssl structure during deinit
The key_base field of the global_ssl structure is an strdup'ed field
(when set) which was never free'd during deinit.

This patch can be backported up to branch 3.0.
2025-10-10 17:22:48 +02:00
Remi Tricot-Le Breton
9bc6a0349d BUG/MINOR: ssl: Free global_ssl structure contents during deinit
Some fields of the global_ssl structure are strings that are strdup'ed
but never freed. There is only one static global_ssl structure so not
much memory is used but we might as well free it during deinit.

This patch can be backported to all stable branches.
2025-10-10 17:22:48 +02:00
Christopher Faulet
54b7539d64 BUG/MEDIUM: apppet: Improve spinning loop detection with the new API
Conditions to detect the spinning loop for applets based on the new API are
not accurate. We cannot continue to check the channel's buffers state to
know if an applet has made some progress. At least, we must also check the
applet's buffers.

After digging to find the right way to do it, it was clear that the best is to
use something similar to what is performed for the streams, namely, checking
read and write events. And in fact, it is quite easy to do with the new
API. So let's do so.

This patch must be backported as far as 3.0.
2025-10-10 14:41:15 +02:00
Willy Tarreau
dfe7fa9349 BUILD: makefile: disable tail calls optimizations with memory profiling
The purpose of memory profiling precisely is to figure what function
allocates and what function frees for specific objects. It turns out
that a non-negligible number of release callbacks basically do nothing
but a free() or pool_free() call and return, which the compiler happily
turns into a jump, making the caller of that callback appear as the
real one. That's how we can see libcrypto release to pools such as
ssl-capture for example, which also makes the per-DSO calls appear
wrong:

      10000           0       10720000              0|         0x448c8d ssl_async_fd_free+0x3b9d p_alloc(1072) [pool=ssl-capture]
      50000           0        6800000              0|         0x4456b9 ssl_async_fd_free+0x5c9 p_alloc(136) [pool=ssl-keylogf]
      10072           0         644608              0|         0x447f14 ssl_async_fd_free+0x2e24 p_alloc(64) [pool=ssl-keylogf]
          0       10000              0        1360000|         0x445987 ssl_async_fd_free+0x897 p_free(-136) [pool=ssl-keylogf]
          0       10000              0        1360000|         0x4459b8 ssl_async_fd_free+0x8c8 p_free(-136) [pool=ssl-keylogf]
          0       10000              0        1360000|         0x4459e9 ssl_async_fd_free+0x8f9 p_free(-136) [pool=ssl-keylogf]
          0       10000              0        1360000|         0x445a1a ssl_async_fd_free+0x92a p_free(-136) [pool=ssl-keylogf]
          0       10000              0        1360000|         0x445a4b ssl_async_fd_free+0x95b p_free(-136) [pool=ssl-keylogf]
          0       20072              0       11364608|   0x7f5f1397db62 libcrypto:CRYPTO_free_ex_data+0xf2/0x261 p_free(-566) [pool=ssl-keylogf] [locked=72 (0.3 %)]

Worse, as can be seen on the last line above, there can be a single pool
per call place (since we don't release to arbitrary pools), and the stats
are misleading by reporting the first used pool only when a same function
can call multiple release callbacks. This is why the free call totals
10k ssl-capture and 10072 ssl-keylogfile.

Let's just disable tail call optimization when using memory profiling.
The gains are only very marginal and complicate so much the debugging
that it's not worth it. Now the output is correct, and no longer claims
that libcrypto is the caller:

      10000           0       10720000              0|         0x448c9f ssl_async_fd_free+0x3b9f p_alloc(1072) [pool=ssl-capture]
          0       10000              0       10720000|         0x445af0 ssl_async_fd_free+0x9f0 p_free(-1072) [pool=ssl-capture]
      50000           0        6800000              0|         0x4456c9 ssl_async_fd_free+0x5c9 p_alloc(136) [pool=ssl-keylogf]
      10177           0        1221240              0|         0x45543d ssl_async_fd_handler+0xb51d p_alloc(120) [pool=ssl_sock_ct] [locked=165 (1.6 %)]
      10061           0         643904              0|         0x447f1c ssl_async_fd_free+0x2e1c p_alloc(64) [pool=ssl-keylogf]
          0       10000              0        1360000|         0x445987 ssl_async_fd_free+0x887 p_free(-136) [pool=ssl-keylogf]
          0       10000              0        1360000|         0x4459b8 ssl_async_fd_free+0x8b8 p_free(-136) [pool=ssl-keylogf]
          0       10000              0        1360000|         0x4459e9 ssl_async_fd_free+0x8e9 p_free(-136) [pool=ssl-keylogf]
          0       10000              0        1360000|         0x445a1a ssl_async_fd_free+0x91a p_free(-136) [pool=ssl-keylogf]
          0       10000              0        1360000|         0x445a4b ssl_async_fd_free+0x94b p_free(-136) [pool=ssl-keylogf]
          0       10188              0        1222560|         0x44f518 ssl_async_fd_handler+0x55f8 p_free(-120) [pool=ssl_sock_ct] [locked=176 (1.7 %)]
          0       10072              0         644608|         0x445aa6 ssl_async_fd_free+0x9a6 p_free(-64) [pool=ssl-keylogf] [locked=72 (0.7 %)]

An attempt was made to only instrument pool_free() to place a compiler
barrier, but that resulted in much larger code and wouldn't cover
functions ending with a simple "free()" call. "ha_free()" however is
already immune against tail call optimization since it has to write
the NULL when returning from free().
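
The remark about "ha_free()" can be illustrated with a minimal sketch. The
macro and structure below are hypothetical (this is not the actual ha_free()
definition); they only show why a store placed after free() prevents the
free() call from being the caller's last operation:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, loosely modeled on the behavior described above:
 * release the area, then overwrite the caller's pointer with NULL.
 * Since a store still has to happen after free() returns, the compiler
 * cannot replace the call with a jump (tail call).
 */
#define FREE_AND_CLEAR(pp)      \
	do {                    \
		free(*(pp));    \
		*(pp) = NULL;   \
	} while (0)

/* Example owner of a heap-allocated name (illustrative only). */
struct holder {
	char *name;
};

/* Releases the name; h->name is NULL on return. */
static inline void holder_release(struct holder *h)
{
	assert(h != NULL);
	FREE_AND_CLEAR(&h->name);
}
```

A function ending with a plain "free(h->name);" has no such trailing store,
which is the case the barrier-only approach would not cover.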

This should be backported to recent stable releases that are still
regularly being debugged.
2025-10-10 13:45:19 +02:00
William Lallemand
47a93dc750 BUG/MINOR: ssl: leak crtlist_name in ssl-f-use
This patch fixes a leak of the temporary variable "crtlist_name" which
is used in the ssl-f-use parser.

Must be backported in 3.2.
2025-10-10 11:22:37 +02:00
William Lallemand
d9365a88a5 BUG/MINOR: ssl: leak in ssl-f-use
Fix the leak of the filename in the struct cfg_crt_node which is a
temporary structure used for ssl-f-use initialization.

Must be backported to 3.2.
2025-10-10 11:22:37 +02:00
Christopher Faulet
cbe5221182 DEBUG: mux-h1: Dump <kip> and <kop> values with sedesc info
It could be handy to debug issues, especially because these values were
recently introduced.
2025-10-10 11:16:21 +02:00
Christopher Faulet
6a0fe6e460 MEDIUM: applet: Forward <kip> to applets
For now, no applets are using the <kop> value when consuming data. At least,
as far as I know. But it remains a good idea to keep the applet API
compatible. So now, the <kip> of the opposite side is properly forwarded to
applets.
2025-10-10 11:11:44 +02:00
Christopher Faulet
4145a61101 BUG/MEDIUM: stconn: Properly forward kip to the opposite SE descriptor
By refactoring the HTX to remove the extra field, a bug was introduced in
the stream-connector part. The <kip> (known input payload) value of a sedesc
was moved to <kop> (known output payload) using the same sedesc. Of course,
this is totally wrong. <kip> value of a sedesc must be forwarded to the
opposite side.

In addition, the operation is performed in sc_conn_send(). In this function,
we manipulate the stream-connectors. So se_fwd_kip() function was changed to
use the stream-connectors directly.

The function sc_ep_fwd_kip() is now called with both stream-connectors
to properly forward <kip> from one side to the opposite side.

The bug is 3.3-specific. No backport needed.
2025-10-10 11:01:21 +02:00
Willy Tarreau
54f0ab08b8 BUG/MINOR: ssl: always clear the remains of the first hello for the second one
William rightfully pointed out that despite the ssl capture being a
structure, some of its entries are only set for certain contents,
so we need to always zero it before using it so as to clear any
remains of a previous use, otherwise we could possibly report some
entries that were only present in the first hello and not the second
one. No need to clear the data though, since any remains will not be
referenced by the fields.

This must be backported wherever commit 336170007c ("BUG/MEDIUM: ssl:
take care of second client hello") is backported.
2025-10-09 18:50:30 +02:00
Willy Tarreau
336170007c BUG/MEDIUM: ssl: take care of second client hello
For a long time we've been observing some sporadic leaks of ssl-capture
pool entries on haproxy.org without figuring out exactly the root cause. All
that was seen was that fewer calls to the free callback were made than
calls to the hello parsing callback, and these were never reproduced
locally.

It recently turned out to be triggered by the presence of "curves" or
"ecdhe" on the "bind" line. Captures have shown the presence of a second
client hello, called "Change Cipher Client Hello" in wireshark traces,
that calls the client hello callback again. That one wasn't prepared for
being called twice per connection, so it allocates an ssl-capture entry
and assigns it to the ex_data entry, possibly overwriting the previous
one.

In this case, the fix is super simple, just reuse the current ex_data
if it exists, otherwise allocate a new one. This completely solves the
problem.
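
The "reuse if present, otherwise allocate" logic can be sketched as follows.
The types and the function name are hypothetical stand-ins for the
per-connection ex_data slot and the capture entry, not the real HAProxy or
OpenSSL definitions:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical capture entry; the counter only exists for illustration. */
struct capture {
	int hello_count;
};

/* Hypothetical per-connection slot playing the role of the ex_data entry. */
struct conn_slot {
	struct capture *ex_data;
};

/* Called once per client hello. Returns the capture attached to the
 * connection, or NULL on allocation failure.
 */
static inline struct capture *on_client_hello(struct conn_slot *slot)
{
	struct capture *cap;

	assert(slot != NULL);

	cap = slot->ex_data;
	if (!cap) {
		/* no entry yet: allocate a zeroed one and attach it */
		cap = calloc(1, sizeof(*cap));
		if (!cap)
			return NULL;
		slot->ex_data = cap;
	}
	/* from here on, <cap> is the single entry owned by the slot */
	cap->hello_count++;
	return cap;
}
```

With this shape, a second hello on the same connection gets the entry
attached by the first one, so the single free callback releases everything.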

Other callbacks have been audited for the same issue and are not
affected: ssl_ini_keylog() already performs this check and ignores
subsequent calls, and other ones do not allocate data.

This must be backported to all supported versions.
2025-10-09 17:06:49 +02:00
William Lallemand
229eab8fc9 CI: cirrus-ci: bump FreeBSD image to 14-3
FreeBSD CI seems to have been broken for a while, try to upgrade the image to
the latest 14.3 version.
2025-10-09 14:06:48 +02:00
William Lallemand
f35caafa6e BUG/MINOR: acme: memory leak from the config parser
This patch fixes some memory leaks in the configuration parser:

- deinit_acme() was never called
- add ha_free() before every strdup() for section overwrite
- lacked some free() in deinit_acme()
2025-10-09 12:04:22 +02:00
William Lallemand
9344ecaade MEDIUM: acme: don't insert acme account key in ckchs_tree
Don't insert the acme account key in the ckchs_tree anymore. ckch_store
are not made to only include a private key. CLI operations are not
possible with them either. That doesn't make much sense to keep it that
way until we rework the ckch_store.
2025-10-09 11:01:58 +02:00
Christopher Faulet
914538cd39 MEDIUM: htx: Remove the HTX extra field
Thanks to previous changes, it is now possible to remove the <extra> field
from the HTX structure. HTX_FL_ALTERED_PAYLOAD flag is also removed because
it is now unused.
2025-10-08 11:10:42 +02:00
Christopher Faulet
2e2953a3f0 MEDIUM: mux-h1: Stop to use HTX extra value when formatting message
We now rely on the <kop> value to format the message payload before
sending it. It is no longer necessary to use the HTX extra field.
2025-10-08 11:10:42 +02:00
Christopher Faulet
4f40b2de86 MINOR: compression: Use the <kip> value to check body size
When a minimum compression size is defined, we can now use the <kip>
value to skip the compression instead of the HTX extra field.
2025-10-08 11:10:42 +02:00
Christopher Faulet
c0f5b19bc6 MINOR: cache: Use the <kip> value to check too big objects
When an object should be cached, to check if it is too big or not, the
<kip> value is now used instead of the HTX extra field.
2025-10-08 11:10:42 +02:00
Christopher Faulet
f1c659f3ae MINOR: hlua/http-fetch: Use <kip> instead of HTX extra field to get body size
The known input payload length now contains the information. There is no
reason to still rely on the HTX extra field.
2025-10-08 11:10:25 +02:00
Christopher Faulet
be1ce400c4 MINOR: filters: Reset known input payload length if a data filter is used
If a data filter is registered on a channel, the corresponding <kip>
field must be reset because the payload may be altered.
2025-10-08 11:01:37 +02:00
Christopher Faulet
30c50e4f19 MINOR: stconn: Move data from kip to kop when data are sent to the consumer
When data are sent to the consumer, the known output payload length is
updated using the known input payload length value and this last one is then
reset. se_fwd_kip() function is used for this purpose.
2025-10-08 11:01:37 +02:00
Christopher Faulet
f6a4d41dd0 MINOR: h3: Set known input payload length of the sedesc
Set <kip> value when data are transferred to the upper layer, in h3_rcv_buf().
The difference between the known length of the payload before and after a
parsing loop is added to <kip> value. When a content-length is specified in
the message, the h3s <body_len> field is used. Otherwise, it is the h3s
<data_len> field.
2025-10-08 11:01:36 +02:00
Christopher Faulet
bc8c6c42f4 MINOR: mux-h2: Set known input payload length of the sedesc
Set <kip> value when data are transferred to the upper layer, in h2_rcv_buf().
The new <body_len> field of the H2S is used to increment <kip> value and
then it is reset. The patch relies on the previous one ("MINOR: mux-h2: Save
the known length of the payload").
2025-10-08 11:01:36 +02:00
Christopher Faulet
3a6a576e73 MINOR: mux-h2: Use <body_len> H2S field for payload without content-length
Before, the <body_len> H2S field was only used to verify that the announced
content-length value was respected. Now, this field is used for all
messages. Messages with a content-length are still handled the same way.
<body_len> is set to the content-length value and decremented by the size of
each DATA frame. For other messages, the value is initialized to ULLONG_MAX
and still decremented by the size of each DATA frame. This change is
mandatory to properly define the known input payload length value of the
sedesc.
2025-10-08 11:01:36 +02:00
Christopher Faulet
4fdc23e648 MINOR: mux-fcgi: Set known input payload length during demux
Set <kip> value during the response parsing. The difference between the body
length before and after a parsing loop is added. The patch relies on the
previous one ("MINOR: h1-htx: Increment body len when parsing a payload with
no xfer length").
2025-10-08 11:01:36 +02:00
Christopher Faulet
2bf2f68cd8 MINOR: mux-h1: Set known input payload length during demux
Set <kip> value during the message parsing. The difference between the body
length before and after a parsing loop is added. The patch relies on the
previous one ("MINOR: h1-htx: Increment body len when parsing a payload with
no xfer length").
2025-10-08 11:01:36 +02:00
Christopher Faulet
c9bc18c0bf MINOR: h1-htx: Increment body len when parsing a payload with no xfer length
In the H1 parser, the body length was only incremented when the transfer
length was known, that is, when the content-length was specified or when the
transfer-encoding value was set to "chunked".

Now for messages with unknown transfer length, it is also incremented. It is
mandatory to be able to remove the extra field from the HTX message.
2025-10-08 11:01:36 +02:00
Christopher Faulet
c0b6db2830 MINOR: stconn: Add two fields in sedesc to replace the HTX extra value
For now, the HTX extra value is used to specify the known part, in bytes, of
the HTTP payload we will receive. It may concern the full payload if a
content-length is specified or the current chunk for a chunk-encoded
message. The main purpose of this value is to be used on the opposite side
to be able to announce chunks bigger than a buffer. It can also be used to
check the validity of the payload on the sending path, to properly detect
too big or too short payload.

However, setting this information in the HTX message itself is not really
appropriate because the information is lost when the HTX message is consumed
and the underlying buffer released. So the producer must take care to always
add it in all HTX messages. It is especially an issue when the payload is
altered by a filter.

So to fix this design issue, the information will be moved to the sedesc. It
is a persistent area to save the information. In addition, to avoid the
ambiguity between what the producer says and what the consumer sees, the
information will be split into two fields. In this patch, the fields are
added:

 * kip : The known input payload length
 * kop : The known output payload length

The producer will be responsible to set <kip> value. The stream will be
responsible to decrement <kip> and increment <kop> accordingly. And the
consumer will be responsible to remove consumed bytes from <kop>.
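
The three responsibilities can be sketched as below. This is only an
illustration under assumptions: the structure, the function names and the
use of size_t are hypothetical, and the two-descriptor layout follows the
later fix listed in this log ("BUG/MEDIUM: stconn: Properly forward kip to
the opposite SE descriptor"):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of a stream endpoint descriptor. Only the
 * two counters named in the commit message are modeled.
 */
struct sedesc_sketch {
	size_t kip; /* known input payload length */
	size_t kop; /* known output payload length */
};

/* Producer: announce <len> more bytes of known payload. */
static inline void producer_announce(struct sedesc_sketch *in, size_t len)
{
	assert(in != NULL);
	in->kip += len;
}

/* Stream: move the known input length of one side to the known output
 * length of the opposite side, then reset the input counter.
 */
static inline void stream_forward(struct sedesc_sketch *in,
                                  struct sedesc_sketch *out)
{
	assert(in != NULL && out != NULL);
	out->kop += in->kip;
	in->kip = 0;
}

/* Consumer: remove up to <len> consumed bytes from <kop>. */
static inline void consumer_consume(struct sedesc_sketch *out, size_t len)
{
	assert(out != NULL);
	out->kop -= (len < out->kop) ? len : out->kop;
}
```

Keeping the counters in a persistent descriptor rather than in the message
means the value survives the release of the underlying buffer.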
2025-10-08 11:01:36 +02:00
Christopher Faulet
586511c278 MINOR: h3/qmux: Set QC_SF_UNKNOWN_PL_LENGTH flag on QCS when headers are sent
The QC_SF_UNKNOWN_PL_LENGTH flag is set on the qcs to indicate that the payload
of a message has an unknown length, so that no RESET_STREAM is sent on
shutdown. This flag was
based on the HTX extra field value. However, it is not necessary. When
headers are processed, before sending them, it is possible to check the HTX
start-line to know if the length of the payload is known or not.

So let's do so and no longer use the HTX extra field for this purpose.
2025-10-08 11:01:36 +02:00
Willy Tarreau
00b27a993f MAJOR: proxy: enable abortonclose by default on TLS listeners
In the continuity of https://github.com/orgs/haproxy/discussions/3146,
we must also enable abortonclose by default for TLS listeners so as not
to needlessly compute TLS handshakes on dead connections. The change is
very small (just set the default value to 1 in the TLS code when neither
the option nor its opposite were set).

It may possibly cause some TLS handshakes to start failing with 3.3 in
certain legacy environments (e.g. TLS health-checks performed using only
a client hello and closing afterwards), and in this case it is sufficient
to disable the option using "no option abortonclose" in either the
affected frontend or the "defaults" section it derives from.
2025-10-08 10:36:59 +02:00
Willy Tarreau
75103e7701 MINOR: proxy: introduce proxy_abrt_close_def() to pass the desired default
With this function we can now pass the desired default value for the
abortonclose option when neither the option nor its opposite were set.
Let's also take this opportunity for using it directly from the HTTP
analyser since there's no point in re-checking the proxy's mode there.
2025-10-08 10:29:41 +02:00
Willy Tarreau
644b3dc7d8 MAJOR: proxy: enable abortonclose by default on HTTP proxies
As discussed on https://github.com/orgs/haproxy/discussions/3146 and on
the mailing list, there's a marked preference for having abortonclose
enabled by default when relevant. The point being that with today's
internet, the large majority of requests sent with a closed input
channel are aborted requests, and that it's pointless to waste resources
processing them.

This patch now considers both "option abortonclose" and its opposite
"no option abortonclose" to figure out whether abortonclose is enabled or
disabled in a backend. When neither is set (thus not even inherited
from a defaults section), then it considers the proxy's mode, and HTTP
mode implies abortonclose by default.

This may make some legacy services fail starting with 3.3. In this case
it will be sufficient to add "no option abortonclose" in either the
affected backend or the defaults section it derives from. But for
internet-facing proxies it's better to stay with the option enabled.
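
The decision described above can be summarized by a small sketch. The enums
and the function name are hypothetical and do not reflect the actual proxy
flags; only the three-way logic is taken from the text:

```c
#include <assert.h>

/* Hypothetical tri-state for the directive after config parsing,
 * inheritance from a defaults section included.
 */
enum aoc_setting {
	AOC_UNSET = 0, /* neither the option nor its "no" form was set */
	AOC_ON,        /* "option abortonclose" */
	AOC_OFF        /* "no option abortonclose" */
};

/* Hypothetical proxy modes, reduced to what the decision needs. */
enum px_mode_sketch {
	MODE_TCP_SKETCH = 0,
	MODE_HTTP_SKETCH
};

/* Returns 1 if abortonclose is effectively enabled, otherwise 0. */
static inline int abortonclose_enabled(enum aoc_setting set,
                                       enum px_mode_sketch mode)
{
	assert(set == AOC_UNSET || set == AOC_ON || set == AOC_OFF);
	if (set == AOC_ON)
		return 1;
	if (set == AOC_OFF)
		return 0;
	/* nothing configured: HTTP mode implies abortonclose */
	return mode == MODE_HTTP_SKETCH;
}
```

An explicit setting always wins in this sketch; the mode is only consulted
when nothing was configured.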
2025-10-08 10:29:41 +02:00
Willy Tarreau
fe47e8dfc5 MINOR: proxy: only check abortonclose through a dedicated function
In order to prepare for changing the way abortonclose works, let's
replace the direct flag check with a similarly named function
(proxy_abrt_close) which returns the on/off status of the directive
for the proxy. For now it simply reflects the flag's state.
2025-10-08 10:29:41 +02:00
Willy Tarreau
687504344a REGTESTS: http-messaging: fix the websocket and upgrade tests not to close early
By default when building an H2 request, vtest sets the END_STREAM flag
on the HEADERS frame. This is problematic with the websocket and proto
upgrade tests since we're using CONNECT, because it immediately closes
afterwards, which does not correspond to what we're testing. Doing this
in abortonclose mode rightfully produces an error. Let's fix the test
so as not to set the flag on the HEADERS frame. However, doing so means
we'll receive a window update that we must also accept. Now the test
works both with and without abortonclose.
2025-10-08 10:29:41 +02:00
Willy Tarreau
8573c5e2a1 REGTESTS: fix h2_desync_attacks to wait for the response
Tests with abortonclose showed a bug with this test where the client
would close the stream immediately after sending the request, without
waiting for the response, causing some random failures on the server
side.
2025-10-08 10:29:41 +02:00
Willy Tarreau
c42e62d890 MINOR: proxy: explicitly permit abortonclose on frontends and clarify the doc
The "abortonclose" option was recently deprecated in frontends because its
action was essentially limited to the backend part (queuing etc). But in
3.3 we started to support it for TLS on frontends, though it would only
work when placed in a defaults section. Let's officially support it in
frontends, and take this opportunity to clarify the documentation on this
topic, which was incomplete regarding frontend and TLS support. Now the
doc tries to better cover the different use cases.
2025-10-08 10:29:41 +02:00
Willy Tarreau
f657ffc6e7 DEV: patchbot: use git reset+checkout instead of pull
The patchbot stopped on a previous ultra-rare forced push due to wanting
the user's name and e-mail before proceeding. We don't want merges nor
rebases anyway, only to reset the tree to the next one, so let's do that.
2025-10-08 04:38:35 +02:00
William Lallemand
45fba1db27 BUG/MINOR: acme: avoid overflow when diff > notAfter
Avoid an overflow or a negative value if notAfter < diff.

This is unlikely to provoke any problem.

Fixes issue #3138.

Must be backported to 3.2.
2025-10-07 10:54:58 +02:00
William Lallemand
69bd253b23 CLEANUP: mjson: remove unused defines from mjson.h
This patch removes unused defines from mjson.h.
It also removes unused c++ declarations and includes.

string.h is moved to mjson.c
2025-10-06 09:30:07 +02:00
Christopher Faulet
8219fa1842 BUG/MINOR: http-ana: Reset analyse_exp date after 'wait-for-body' action
'wait-for-body' action sets analyse_exp date for the channel to the
configured time. However, when the action is finished, it does not reset
it. It is an issue for some following actions, like 'pause', that also rely
on this date.

To fix the issue, we must take care to reset the analyse_exp date to
TICK_ETERNITY when the 'wait-for-body' action is finished.

This patch should fix the issue #3147. It must be backported to all stable
versions.
2025-10-03 17:09:16 +02:00
William Lallemand
61933a96a6 CLEANUP: mjson: remove unused defines and math.h
Remove unused defines for MSVC which is not used in the case of haproxy,
and remove math.h which is not used as well.
2025-10-03 16:09:51 +02:00
William Lallemand
8ea8aaace2 CLEANUP: mjson: remove MJSON_ENABLE_BASE64 code
Remove the code used under #if MJSON_ENABLE_BASE64, which is not used
within haproxy, to ease the maintenance of mjson.
2025-10-03 16:09:13 +02:00
William Lallemand
4edb05eb12 CLEANUP: mjson: remove MJSON_ENABLE_NEXT code
Remove the code used under #if MJSON_ENABLE_NEXT, which is not used
within haproxy, to ease the maintenance of mjson.
2025-10-03 16:08:17 +02:00
William Lallemand
a4eeeeeb07 CLEANUP: mjson: remove MJSON_ENABLE_PRINT code
Remove the code used under #if MJSON_ENABLE_PRINT, which is not used
within haproxy, to ease the maintenance of mjson.
2025-10-03 16:07:59 +02:00
William Lallemand
d63dfa34a2 CLEANUP: mjson: remove MJSON_ENABLE_RPC code
Remove the code used under #if MJSON_ENABLE_RPC, which is not used
within haproxy, to ease the maintenance of mjson.
2025-10-03 16:06:33 +02:00
Aurelien DARRAGON
c26ac3f5e4 BUG/MINOR: sink: retry attempt for sft server may never occur
Since 9561b9fb6 ("BUG/MINOR: sink: add tempo between 2 connection
attempts for sft servers"), there is a possibility that the tempo we use
to schedule the task expiry may point to TICK_ETERNITY as we add ticks to
tempo with a simple addition that doesn't take care of potential wrapping.

When this happens (although relatively rare, since now_ms only wraps every
49.7 days, but a forced wrap occurs 20 seconds after haproxy is started
so it is more likely to happen there), the process_sink_forward() task
expiry being set to TICK_ETERNITY, it may never be called again, this
is especially true if the ring section only contains a single server.

To fix the issue, we must use tick_add() helper function to set the tempo
value and this way we ensure that the value will never be TICK_ETERNITY.
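
The difference between a plain addition and a wrap-aware one can be
sketched as follows. This is not the real helper: the names suffixed with
"_sketch" are hypothetical, and the sketch assumes that the "eternity" value
is 0 and that ticks are wrapping unsigned counters:

```c
#include <assert.h>
#include <limits.h>

#define TICK_ETERNITY_SKETCH 0u /* assumed: 0 means "never expires" */

/* Adds <timeout> to <now> with wrapping arithmetic and skips the
 * reserved value, so that the result never means "eternity".
 */
static inline unsigned int tick_add_sketch(unsigned int now,
                                           unsigned int timeout)
{
	unsigned int t = now + timeout; /* unsigned wrap is well defined */

	if (t == TICK_ETERNITY_SKETCH)
		t++;
	return t;
}

/* Computes the expiry of the next retry, <delay> ticks after <now>. */
static inline unsigned int next_retry_sketch(unsigned int now,
                                             unsigned int delay)
{
	unsigned int exp = tick_add_sketch(now, delay);

	assert(exp != TICK_ETERNITY_SKETCH); /* the task will run again */
	return exp;
}
```

With a simple "now + delay", the sum may land exactly on the reserved value
when the counter wraps, which is the situation the fix avoids.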

It must be backported everywhere 9561b9fb6 was backported (up to 2.6
it seems).
2025-10-03 14:31:05 +02:00
Olivier Houchard
b01a00acb1 BUG/MEDIUM: connections: Only avoid creating a mux if we have one
In connect_server(), only avoid creating a mux when we're reusing a
connection, if that connection already has one. We can reuse a
connection with no mux, if we made a first attempt at connecting to the
server and it failed before we could create the mux (or during the mux
creation). The connection will then be reused when trying again.
This fixes a bug where a stream could stall if the first connection
attempt failed before the mux creation. It is easy to reproduce by
creating random memory allocation failures with -dmFail.
This was introduced by commit 4aaf0bfbced22d706af08725f977dcce9845d340,
and thus does not need any backport as long as that commit is not
backported.
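
The corrected condition can be sketched as a small predicate. The structure
and function below are hypothetical and much simpler than connect_server();
they only capture the rule stated above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced connection: only the mux pointer matters here. */
struct conn_sketch {
	const void *mux; /* NULL until a mux has been installed */
};

/* Returns 1 if a mux must be created for <conn>, 0 otherwise.
 * <reuse> is non-zero when the connection comes from an earlier attempt.
 */
static inline int must_init_mux(const struct conn_sketch *conn, int reuse)
{
	assert(conn != NULL);
	/* skip the creation only for a reused connection that has a mux */
	return !(reuse && conn->mux != NULL);
}
```

A reused connection whose first attempt failed before the mux creation has
no mux yet, so this predicate asks for one instead of leaving the stream
stalled.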
2025-10-03 13:13:10 +02:00
Christopher Faulet
d0084cb873 [RELEASE] Released version 3.3-dev9
Released version 3.3-dev9 with the following main changes :
    - BUG/MINOR: acl: Fix error message about several '-m' parameters
    - MINOR: server: Parse sni and pool-conn-name expressions in a dedicated function
    - BUG/MEDIUM: server: Use sni as pool connection name for SSL server only
    - BUG/MINOR: server: Update healthcheck when server settings are changed via CLI
    - OPTIM: backend: Don't set SNI for non-ssl connections
    - OPTIM: proto_rhttp: Don't set SNI for non-ssl connections
    - OPTIM: tcpcheck: Don't set SNI and ALPN for non-ssl connections
    - BUG/MINOR: tcpcheck: Don't use sni as pool-conn-name for non-SSL connections
    - MEDIUM: server/ssl: Base the SNI value to the HTTP host header by default
    - MEDIUM: httpcheck/ssl: Base the SNI value on the HTTP host header by default
    - OPTIM: tcpcheck: Reorder tcpchek_connect structure fields to fill holes
    - REGTESTS: ssl: Add a script to test the automatic SNI selection
    - MINOR: quic: add useful trace about padding params values
    - BUG/MINOR: quic: too short PADDING frame for too short packets
    - BUG/MINOR: cpu_topo: work around a small bug in musl's CPU_ISSET()
    - BUG/MEDIUM: ssl: Properly initialize msg_controllen.
    - MINOR: quic: SSL session reuse for QUIC
    - BUG/MEDIUM: proxy: fix crash with stop_proxy() called during init
    - MINOR: stats-file: use explicit unsigned integer bitshift for user slots
    - CLEANUP: quic: fix typo in quic_tx trace
    - TESTS: quic: add unit-tests for QUIC TX part
    - MINOR: quic: restore QUIC_HP_SAMPLE_LEN constant
    - REGTESTS: ssl: Fix the script about automatic SNI selection
    - BUG/MINOR: pools: Fix the dump of pools info to deal with buffers limitations
    - MINOR: pools: Don't dump anymore info about pools when purge is forced
    - BUG/MINOR: quic: properly support GSO on backend side
    - BUG/MEDIUM: mux-h2: Reset MUX blocking flags when a send error is caught
    - BUG/MEDIUM: mux-h2; Don't block receives in H2_CS_ERROR and H2_CS_ERROR2 states
    - BUG/MEDIUM: mux-h2: Restart reading when mbuf ring is no longer full
    - BUG/MINOR: mux-h2: Remove H2_CF_DEM_DFULL flags when the demux buffer is reset
    - BUG/MEDIUM: mux-h2: Report RST/error to app-layer stream during 0-copy fwding
    - BUG/MEDIUM: mux-h2: Reinforce conditions to report an error to app-layer stream
    - BUG/MINOR: hq-interop: adjust parsing/encoding on backend side
    - OPTIM: check: do not delay MUX for ALPN if SSL not active
    - BUG/MEDIUM: checks: fix ALPN inheritance from server
    - BUG/MINOR: check: ensure checks are compatible with QUIC servers
    - MINOR: check: reject invalid check config on a QUIC server
    - MINOR: debug: report the process id in warnings and panics
    - DEBUG: stream: count the number of passes in the connect loop
    - MINOR: debug: report the number of loops and ctxsw for each thread
    - MINOR: debug: report the time since last wakeup and call
    - DEBUG: peers: export functions that use locks
    - MINOR: stick-table: permit stksess_new() to temporarily allocate more entries
    - MEDIUM: stick-tables: relax stktable_trash_oldest() to only purge what is needed
    - MEDIUM: stick-tables: give up on lock contention in process_table_expire()
    - MEDIUM: stick-tables: don't wait indefinitely in stktable_add_pend_updates()
    - MEDIUM: peers: don't even try to process updates under contention
    - BUG/MEDIUM: h1: Allow reception if we have early data
    - BUG/MEDIUM: ssl: create the mux immediately on early data
    - MINOR: ssl: Add a flag to let it known we have an ALPN negotiated
    - MINOR: ssl: Use the new flag to know when the ALPN has been set.
    - MEDIUM: server: Introduce the concept of path parameters
    - CLEANUP: backend: clarify the role of the init_mux variable in connect_server()
    - CLEANUP: backend: invert the condition to start the mux in connect_server()
    - CLEANUP: backend: simplify the complex ifdef related to 0RTT in connect_server()
    - CLEANUP: backend: clarify the cases where we want to use early data
    - MEDIUM: server: Make use of the stored ALPN stored in the server
    - BUILD: ssl: address a recent build warning when QUIC is enabled
    - BUG/MINOR: activity: fix reporting of task latency
    - MINOR: activity: indicate the number of calls on "show tasks"
    - MINOR: tools: don't emit "+0" for symbol names which exactly match known ones
    - BUG/MEDIUM: stick-tables: don't loop on non-expirable entries
    - DEBUG: stick-tables: export stktable_add_pend_updates() for better reporting
    - BUG/MEDIUM: ssl: Fix a crash when using QUIC
    - BUG/MEDIUM: ssl: Fix a crash if we failed to create the mux
    - MEDIUM: dns: bind the nameserver sockets to the initiating thread
    - MEDIUM: resolvers: make the process_resolvers() task single-threaded
    - BUG/MINOR: stick-table: make sure never to miss a process_table_expire update
    - MEDIUM: stick-table: move process_table_expire() to a single thread
    - MEDIUM: peers: move process_peer_sync() to a single thread
    - BUG/MAJOR: stream: Force channel analysis on successful synchronous send
    - MINOR: quic: get rid of ->target quic_conn struct member
    - MINOR: quic-be: make SSL/QUIC objects use their own indexes (ssl_qc_app_data_index)
    - MINOR: quic: display build warning for compat layer on recent OpenSSL
    - DOC: quic: clarifies limited-quic support
    - BUG/MINOR: acme: null pointer dereference upon allocation failure
    - BUG/MEDIUM: jws: return size_t in JWS functions
    - BUG/MINOR: ssl: Potential NULL deref in trace macro
    - BUG/MINOR: ssl: Fix potential NULL deref in trace callback
    - BUG/MINOR: ocsp: prototype inconsistency
    - MINOR: ocsp: put internal functions as static ones
    - MINOR: ssl: set functions as static when no prototypes in the .h
    - BUILD: ssl: functions defined but not used
    - BUG/MEDIUM: resolvers: Properly cache do-resolv resolution
    - BUG/MINOR: resolvers: Restore round-robin selection on records in DNS answers
    - MINOR: activity: don't report the lat_tot column for show profiling tasks
    - MINOR: activity: add a new lkw_avg column to show profiling stats
    - MINOR: activity: collect time spent waiting on a lock for each task
    - MINOR: thread: add a lock level information in the thread_ctx
    - MINOR: activity: add a new lkd_avg column to show profiling stats
    - MINOR: activity: collect time spent with a lock held for each task
    - MINOR: activity: add a new mem_avg column to show profiling stats
    - MINOR: activity: collect CPU time spent on memory allocations for each task
    - MINOR: activity/memory: count allocations performed under a lock
    - DOC: proxy-protocol: Add TLS group and sig scheme TLVs
    - BUG/MEDIUM: resolvers: Test for empty tree when getting a record from DNS answer
    - BUG/MEDIUM: resolvers: Make resolution owns its hostname_dn value
    - BUG/MEDIUM: resolvers: Accept to create resolution without hostname
    - BUG/MEDIUM: resolvers: Wake resolver task up when unlinking a stream requester
    - BUG/MINOR: ocsp: Crash when updating CA during ocsp updates
    - Revert "BUG/MINOR: ocsp: Crash when updating CA during ocsp updates"
    - BUG/MEDIUM: http_ana: fix potential NULL deref in http_process_req_common()
    - MEDIUM: log/proxy: store log-steps selection using a bitmask, not an eb tree
    - BUG/MINOR: ocsp: Crash when updating CA during ocsp updates
    - BUG/MINOR: resolvers: always normalize FQDN from response
    - BUILD: makefile: implement support for running a command in range
    - IMPORT: cebtree: import version 0.5.0 to support duplicates
    - MEDIUM: migrate the patterns reference to cebs_tree
    - MEDIUM: guid: switch guid to more compact cebuis_tree
    - MEDIUM: server: switch addr_node to cebis_tree
    - MEDIUM: server: switch conf.name to cebis_tree
    - MEDIUM: server: switch the host_dn member to cebis_tree
    - MEDIUM: proxy: switch conf.name to cebis_tree
    - MEDIUM: stktable: index table names using compact trees
    - MINOR: proxy: add proxy_get_next_id() to find next free proxy ID
    - MINOR: listener: add listener_get_next_id() to find next free listener ID
    - MINOR: server: add server_get_next_id() to find next free server ID
    - CLEANUP: server: use server_find_by_id() when looking for already used IDs
    - MINOR: server: add server_index_id() to index a server by its ID
    - MINOR: listener: add listener_index_id() to index a listener by its ID
    - MINOR: proxy: add proxy_index_id() to index a proxy by its ID
    - MEDIUM: proxy: index proxy ID using compact trees
    - MEDIUM: listener: index listener ID using compact trees
    - MEDIUM: server: index server ID using compact trees
    - CLEANUP: server: slightly reorder fields in the struct to plug holes
    - CLEANUP: proxy: slightly reorganize fields to plug some holes
    - CLEANUP: backend: factor the connection lookup loop
    - CLEANUP: server: use eb64_entry() not ebmb_entry() to convert an eb64
    - MINOR: server: pass the server and thread to srv_migrate_conns_to_remove()
    - CLEANUP: backend: use a single variable for removed in srv_cleanup_idle_conns()
    - MINOR: connection: pass the thread number to conn_delete_from_tree()
    - MEDIUM: connection: move idle connection trees to ceb64
    - MEDIUM: connection: reintegrate conn_hash_node into connection
    - CLEANUP: tools: use the item API for the file names tree
    - CLEANUP: vars: use the item API for the variables trees
    - BUG/MEDIUM: pattern: fix possible infinite loops on deletion
    - CI: scripts: add support for git in openssl builds
    - CI: github: add an OpenSSL + ECH job
    - CI: scripts: mkdir BUILDSSL_TMPDIR
    - Revert "BUG/MEDIUM: pattern: fix possible infinite loops on deletion"
    - BUG/MEDIUM: pattern: fix possible infinite loops on deletion (try 2)
    - CLEANUP: log: remove deadcode in px_parse_log_steps()
    - MINOR: counters: document that tg shared counters are tied to shm-stats-file mapping
    - DOC: internals: document the shm-stats-file format/mapping
    - IMPORT: ebtree: delete unusable ebpttree.c
    - IMPORT: eb32/eb64: reorder the lookup loop for modern CPUs
    - IMPORT: eb32/eb64: use a more parallelizable check for lack of common bits
    - IMPORT: eb32: drop the now useless node_bit variable
    - IMPORT: eb32/eb64: place an unlikely() on the leaf test
    - IMPORT: ebmb: optimize the lookup for modern CPUs
    - IMPORT: eb32/64: optimize insert for modern CPUs
    - IMPORT: ebtree: only use __builtin_prefetch() when supported
    - IMPORT: ebst: use prefetching in lookup() and insert()
    - IMPORT: ebtree: Fix UB from clz(0)
    - IMPORT: ebtree: add a definition of offsetof()
    - IMPORT: ebtree: replace hand-rolled offsetof to avoid UB
    - MINOR: listener: add the "cc" bind keyword to set the TCP congestion controller
    - MINOR: server: add the "cc" keyword to set the TCP congestion controller
    - BUG/MEDIUM: ring: invert the length check to avoid an int overflow
    - MINOR: trace: don't call strlen() on the thread-id numeric encoding
    - MINOR: trace: don't call strlen() on the function's name
    - OPTIM: sink: reduce contention on sink_announce_dropped()
    - OPTIM: sink: don't waste time calling sink_announce_dropped() if busy
    - CLEANUP: ring: rearrange the wait loop in ring_write()
    - OPTIM: ring: always relax in the ring lock and leader wait loop
    - OPTIM: ring: check the queue's owner using a CAS on x86
    - OPTIM: ring: avoid reloading the tail_ofs value before the CAS in ring_write()
    - BUG/MEDIUM: sink: fix unexpected double postinit of sink backend
    - MEDIUM: stats: consider that shared stats pointers may be NULL
    - BUG/MEDIUM: http-client: Fix the test on the response start-line
    - MINOR: acme: acme-vars allow to pass data to the dpapi sink
    - MINOR: acme: check acme-vars allocation during escaping
    - BUG/MINOR: acme/cli: wrong description for "acme challenge_ready"
    - CI: move VTest preparation & friends to dedicated composite action
    - BUG/MEDIUM: stick-tables: Don't let table_process_entry() handle refcnt
    - BUG/MINOR: compression: Test payload size only if content-length is specified
    - BUG/MINOR: pattern: Properly flag virtual maps as using samples
    - BUG/MINOR: acme: possible overflow on scheduling computation
    - BUG/MINOR: acme: possible overflow in acme_will_expire()
    - CLEANUP: acme: acme_will_expire() uses acme_schedule_date()
    - BUG/MINOR: pattern: Fix pattern lookup for map with opt@ prefix
    - CI: scripts: build curl with ECH support
    - CI: github: add curl+ech build into openssl-ech job
    - BUG/MEDIUM: ssl: ca-file directory mode must read every certificates of a file
    - MINOR: acme: provider-name for dpapi sink
    - BUILD: acme: fix false positive null pointer dereference
    - MINOR: backend: srv_queue helper
    - MINOR: backend: srv_is_up converter
    - BUILD: halog: misleading indentation in halog.c
    - CI: github: build halog on the vtest job
    - BUG/MINOR: acme: don't unlink from acme_ctx_destroy()
    - BUG/MEDIUM: acme: cfg_postsection_acme() don't init correctly acme sections
    - MINOR: acme: implement "reuse-key" option
    - ADMIN: haproxy-dump-certs: implement a certificate dumper
    - ADMIN: dump-certs: don't update the file if it's up to date
    - ADMIN: dump-certs: create files in a tmpdir
    - ADMIN: dump-certs: fix lack of / in -p
    - ADMIN: dump-certs: use same error format as haproxy
    - ADMIN: reload: add a synchronous reload helper
    - BUG/MEDIUM: acme: free() of i2d_X509_REQ() with AWS-LC
    - ADMIN: reload: introduce verbose and silent mode
    - ADMIN: reload: introduce -vv mode
    - MINOR: mt_list: Implement MT_LIST_POP_LOCKED()
    - BUG/MEDIUM: stick-tables: Make sure not to free a pending entry
    - MINOR: sched: let's permit to share the local ctx between threads
    - MINOR: sched: pass the thread number to is_sched_alive()
    - BUG/MEDIUM: wdt: improve stuck task detection accuracy
    - MINOR: ssl: add the ssl_bc_sni sample fetch function to retrieve backend SNI
    - MINOR: rawsock: introduce CO_RFL_TRY_HARDER to detect closures on complete reads
    - MEDIUM: ssl: don't always process pending handshakes on closed connections
    - MEDIUM: servers: Schedule the server requeue target on creation
    - MEDIUM: fwlc: Make it so fwlc_srv_reposition works with unqueued srv
    - BUG/MEDIUM: fwlc: Handle memory allocation failures.
    - DOC: config: clarify some known limitations of the json_query() converter
    - BUG/CRITICAL: mjson: fix possible DoS when parsing numbers
    - BUG/MINOR: h2: forbid 'Z' as well in header field names checks
    - BUG/MINOR: h3: forbid 'Z' as well in header field names checks
    - BUG/MEDIUM: resolvers: break an infinite loop in resolv_get_ip_from_response()
2025-10-03 12:12:51 +02:00
Willy Tarreau
ced9784df4 BUG/MEDIUM: resolvers: break an infinite loop in resolv_get_ip_from_response()
The fix in 3023e98199 ("BUG/MINOR: resolvers: Restore round-robin
selection on records in DNS answers") still contained an issue not
addressed f6dfbbe870 ("BUG/MEDIUM: resolvers: Test for empty tree
when getting a record from DNS answer"). Indeed, if the next element
is the same as the first one, then we can end up with an endless loop
because the test at the end compares the next pointer (possibly null)
with the end one (first).

Let's move the null->first transition at the end. This must be
backported where the patches above were backported (3.2 for now).
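
As an illustration only, the corrected loop shape can be sketched as
below. The record type and the function name are hypothetical and are
not the actual resolvers structures; the point is only that the
NULL->first transition is done before the comparison with the starting
element, so the scan always terminates.

```c
#include <stddef.h>

/* Hypothetical record type, not HAProxy's actual resolvers structures. */
struct record {
	struct record *next;   /* NULL on the last element */
	int usable;            /* non-zero if this record may be selected */
};

/* Scan the list circularly, starting at <start>, and return the first
 * usable record, or NULL once every element has been visited once.
 */
static struct record *pick_record(struct record *head, struct record *start)
{
	struct record *cur = start;

	if (!head || !start)
		return NULL;

	do {
		if (cur->usable)
			return cur;
		cur = cur->next;
		if (!cur)
			cur = head;    /* NULL -> first, done before the test */
	} while (cur != start);

	return NULL;
}
```

If the wrap to the head were performed at the top of the loop instead,
the final test could compare NULL with <start>, never match, and spin
forever on a list where nothing is usable.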
2025-10-03 09:08:10 +02:00
zhanhb
ad75431b9c BUG/MINOR: h3: forbid 'Z' as well in header field names checks
The current tests in _h3_handle_hdr() and h3_trailers_to_htx() check
for an interval between 'A' and 'Z' for letters in header field names
that should be forbidden, but mistakenly leave the 'Z' out of the
forbidden range, resulting in it being implicitly valid.

This has no real consequences but should be fixed for the sake of
protocol validity checking.

This must be backported to all relevant versions.
2025-10-02 15:30:02 +02:00
zhanhb
7163d9180c BUG/MINOR: h2: forbid 'Z' as well in header field names checks
The current tests in h2_make_htx_request(), h2_make_htx_response()
and h2_make_htx_trailers() check for an interval between 'A' and 'Z'
for letters in header field names that should be forbidden, but
mistakenly leave the 'Z' out of the forbidden range, resulting in it
being implicitly valid.

This has no real consequences but should be fixed for the sake of
protocol validity checking.

This must be backported to all relevant versions.
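
For reference, an inclusive range check could look like the sketch
below. The helper name is hypothetical and the exact form of the
original test is an assumption; it only shows that the upper bound
must include 'Z' itself.

```c
#include <stddef.h>

/* Returns 1 if <name> contains an uppercase ASCII letter, which is
 * forbidden in HTTP/2 and HTTP/3 header field names, otherwise 0.
 * Hypothetical helper, not the actual HAProxy function.
 */
static int hdr_name_has_upper(const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		/* A strict "<" against ('Z' - 'A') would stop at 'Y';
		 * the bound has to be inclusive to cover 'Z' as well.
		 */
		if ((unsigned char)(c - 'A') <= 'Z' - 'A')
			return 1;
	}
	return 0;
}
```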
2025-10-02 15:29:58 +02:00
Willy Tarreau
06675db4bf BUG/CRITICAL: mjson: fix possible DoS when parsing numbers
Mjson comes with its own strtod() implementation for portability
reasons and probably also because many generic strtod() versions as
provided by operating systems do not focus on resource preservation
and may call malloc(), which is not welcome in a parser.

The strtod() implementation used here apparently originally comes from
https://gist.github.com/mattn/1890186 and seems to have purposely
omitted a few parts that were considered as not needed in this context
(e.g. skipping white spaces, or setting errno). But when subject to the
relevant test cases of the designated file above, the current function
provides the same results.

The aforementioned implementation uses pow() to calculate exponents,
but mjson authors visibly preferred not to introduce a libm dependency
and replaced it with an iterative loop in O(exp) time. The problem is
that the exponent is not bounded and that this loop can take a huge
amount of time. There's even an issue already opened on mjson about
this: https://github.com/cesanta/mjson/issues/59. In the case of
haproxy, fortunately, the watchdog will quickly stop a runaway process
but this remains a possible denial of service.

A first approach would consist in reintroducing pow() like in the
original implementation, but if haproxy is built without Lua nor
51Degrees, -lm is not used so this will not work everywhere.

Anyway here we're dealing with integer exponents, so an easy alternate
approach consists in simply using shifts and squares, to compute the
exponent in O(log(exp)) time. Not only does this avoid adding any new
dependency, but it turns out to be even faster than the generic pow()
(85k req/s per core vs 83.5k on the same machine).
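
A minimal sketch of the square-and-multiply idea follows. It is not
the code from the patch: the function name is made up, and it only
handles a non-negative exponent (a caller would divide instead of
multiply for negative ones).

```c
/* Compute 10^exp for a non-negative integer exponent using
 * square-and-multiply: O(log(exp)) multiplications and no libm.
 * Hypothetical helper written for illustration.
 */
static double pow10_int(unsigned int exp)
{
	double result = 1.0;
	double base = 10.0;

	while (exp) {
		if (exp & 1)
			result *= base;  /* this bit contributes base^(2^k) */
		base *= base;            /* move on to the next power of two */
		exp >>= 1;
	}
	return result;
}
```

With a 32-bit exponent the loop runs at most 32 times, whatever the
value found in the input, and simply saturates to infinity for
exponents that a double cannot represent.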

This must be backported as far as 2.4, where mjson was introduced.

Many thanks to Oula Kivalo for reporting this issue.

CVE-2025-11230 was assigned to this issue.
2025-10-02 09:37:43 +02:00
Willy Tarreau
67603162c1 DOC: config: clarify some known limitations of the json_query() converter
Oula Kivalo reported that different JSON libraries may process duplicate
keys differently and that most JSON libraries usually decode the stream
before extracting keys, while the current mjson implementation decodes the
contents during extraction instead. Let's document this point so that
users are aware of the limitations and do not rely on the current behavior
and do not use it for what it's not made for (e.g. content sanitization).

This is also the case for jwt_header_query(), jwt_payload_query() and
jwt_verify(), which already refer to this converter for specificities.
2025-10-02 08:57:39 +02:00
Olivier Houchard
b71bb6c2ae BUG/MEDIUM: fwlc: Handle memory allocation failures.
Properly handle memory allocation failures, by checking the return value
for pool_alloc(), and if it fails, make sure that the caller will take
it into account.
The only use of pool_alloc() in fwlc is to allocate the tree elements in
order to properly queue the server into the ebtree, so if that
allocation fails, just schedule the requeue tasklet, that will try
again, until it hopefully eventually succeeds.

This should be backported to 3.2.
This should fix github issue #3143.
2025-10-01 18:13:33 +02:00
Olivier Houchard
f4a9c6ffae MEDIUM: fwlc: Make it so fwlc_srv_reposition works with unqueued srv
Modify fwlc_srv_reposition() so that it does not assume that the server
was already queued, and so make it so it works even if s->tree_elt is
NULL.
While the server will usually be queued, there is an unlikely
possibility that when the server attempted to get queued when it got up,
it failed due to a memory allocation failure, and it just expects the
server_requeue tasklet to run to take care of that later.

This should be backported to 3.2.
This is part of an attempt to fix github issue #3143
2025-10-01 18:13:33 +02:00
Olivier Houchard
822ee90dc2 MEDIUM: servers: Schedule the server requeue target on creation
On creation, schedule the server requeue once it's been created.
It is possible that when the server went up, it tried to queue itself
into the lb specific code, failed to do so, and expects the tasklet to
run to take care of that.

This should be backported to 3.2.
This is part of an attempt to fix github issue #3143.
2025-10-01 18:13:33 +02:00
Willy Tarreau
7ea80cc5b6 MEDIUM: ssl: don't always process pending handshakes on closed connections
If a client aborts a pending SSL connection for whatever reason (timeout
etc) and the listen queue is large, it may inflict a severe load to a
frontend which will spend the CPU creating new sessions then killing the
connection. This is similar to HTTP requests aborted just after being
sent, except that asymmetric crypto is way more expensive.

Unfortunately "option abortonclose" has no effect on this, because it
only applies at a higher level.

This patch ensures that handshakes being received on a frontend having
"option abortonclose" set will be checked for a pending close, and if
this is the case, then the connection will be aborted before the heavy
calculations. The principle is to use recv(MSG_PEEK) to detect the end,
and to destroy the pending handshake data before returning to the SSL
library so that it cannot start computing, notices the error and stops.
We don't do it without abortonclose though, because this can be used for
health checks from other haproxy nodes or even other components which
just want to see a handshake succeed.

This is in relation with GH issue #3124.
2025-10-01 10:23:04 +02:00
Willy Tarreau
1afaa7b59d MINOR: rawsock: introduce CO_RFL_TRY_HARDER to detect closures on complete reads
Normally, when reading a full buffer, or exactly the requested size, it
is not really possible to know if the peer had closed immediately after,
and usually we don't care. There's a problematic case, though, which is
with SSL: the SSL layer reads in small chunks of a few bytes, and can
consume a client_hello this way, then start computation without knowing
yet that the client has aborted. In order to permit knowing more, we now
introduce a new read flag, CO_RFL_TRY_HARDER, which says that if we've
read up to the permitted limit and the flag is set, then we attempt one
extra byte using MSG_PEEK to detect whether the connection was closed
immediately after that content or not. The first use case will obviously
be related to SSL and client_hello, but it might possibly also make sense
on HTTP responses to detect a pending FIN at the end of a response (e.g.
if a close was already advertised).
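
The underlying system call pattern can be sketched as follows. This is
not the rawsock code: the function name is hypothetical, and it only
shows how a one-byte recv() with MSG_PEEK tells a pending close apart
from pending data without consuming anything.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <errno.h>

/* Returns 1 if the peer has already closed right after the data read
 * so far, 0 if more data is pending or nothing can be said yet, and
 * -1 on a real socket error. Nothing is consumed from the socket.
 */
static int peer_closed_after_read(int fd)
{
	char byte;
	ssize_t ret;

	ret = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
	if (ret == 0)
		return 1;   /* orderly shutdown seen */
	if (ret > 0)
		return 0;   /* more data follows, left in the socket */
	if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
		return 0;   /* nothing to read yet, peer still there */
	return -1;
}
```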
2025-10-01 10:23:01 +02:00
Willy Tarreau
dae4cfe8c5 MINOR: ssl: add the ssl_bc_sni sample fetch function to retrieve backend SNI
Sometimes in order to debug certain difficult situations it can be useful
to know what SNI was configured on a connection going to a server, for
example to match it against what the server saw or to detect cases where
a server would route on SNI instead of Host. This sample fetch function
simply retrieves the SNI configured on the backend connection, if any.
2025-10-01 10:18:53 +02:00
Willy Tarreau
205f1cbf4c BUG/MEDIUM: wdt: improve stuck task detection accuracy
The fact that the watchdog timer measures the execution time from the
last return from the poller tends to amplify the impact of multiple
bad tasks, and may explain some of the panics reported by Felipe and
Ricardo in GH issues #3084, #3092 and #3101. The problem is that we
check the time if we see that the scheduler appears not to be moving
anymore, but one situation may still arise and catch a bad task:
  - one slow task takes so long a time that it triggers the watchdog
    twice, emitting a warning the second time (~200ms). The scheduler
    is rightfully marked as stuck.
  - then it completes and the scheduler is no longer stuck. Many other
    tasks run in turn, they all take quite some time but not enough to
    trigger a warning. But collectively their cost adds up.
  - then a task takes more than the warning time (100ms), and causes
    the total execution time to cross the second. The watchdog is
    called, sees that we've spent more than 1 second since we left the
    poller, and marks the thread as stuck.
  - the task is not finished, the watchdog is called again, sees more
    than one second with a stuck thread and panics 100ms later.

The total time away from the poller is indeed more than one second,
which is very bad, but no single task caused this individually, and
while the warnings are OK, the watchdog should not panic in this case.

This patch revisits the approach to store the moment the scheduler was
marked as stuck in the wdt context. The idea is that this date will be
used to detect warnings and panics. And by doing so and exploiting the
new is_sched_alive(thr), we can greatly simplify the mechanism so that
the signal handling thread does the strict minimum (mark the scheduler
as possibly stuck and update the stuck_start date), and only bounces to
the reporting thread if the scheduler made no progress since last call.
This means that without even doing computations in the handling thread,
we can continue to avoid all bounces unless a warning is required. Then
when the reporting thread is signaled, it will check the dates from the
last moment the scheduler was marked, and will decide to warn or panic.

The panic decision continues to pass via a TH_FL_STUCK flag to probe the
code so that exceptionally slow code (e.g. live cert generation etc) can
still find a way to avoid the panic if absolutely certain that things
are still moving.

This means that now we have the guarantee that panics will only happen
if a given task spends more than one full second not moving, and that
warnings will be issued for other calls crossing the warn delay boundary.

This was tested using artificially slow operations, and all combinations
which individually took less than a second only resulted in floods of
warnings even if the total reported time in the warning was much higher,
while those above one second provoked the panic.

One improvement could consist in reporting the time since last stuck
in the thread dumps to differentiate the individual task from the whole
set.

This needs to be backported to 3.2 along with the two previous patches:

    MINOR: sched: let's permit to share the local ctx between threads
    MINOR: sched: pass the thread number to is_sched_alive()
2025-10-01 10:18:53 +02:00
Willy Tarreau
25f5f357cc MINOR: sched: pass the thread number to is_sched_alive()
Now it will be possible to query any thread's scheduler state, not
only the current one. This aims at simplifying the watchdog checks
for reported threads. The operation is now a simple atomic xchg.
2025-10-01 10:18:53 +02:00
Willy Tarreau
7c7e17a605 MINOR: sched: let's permit to share the local ctx between threads
The watchdog timer has to go through complex operations due to not being
able to check if another thread's scheduler is still ticking. This is
simply because the scheduler status is marked as thread-local while it
could in fact also be an array. Let's do that (and align the array to
avoid false sharing) so that it's now possible to check any scheduler's
status.
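
The idea of a shared, cache-line aligned per-thread array queried with
an atomic exchange can be sketched as below. All names, the thread
limit and the 64-byte line size are assumptions made for the example;
this is not the scheduler's actual layout.

```c
#include <stdatomic.h>

#define MAX_THREADS 64   /* hypothetical limit for this sketch */
#define CACHELINE   64   /* assumed cache line size */

/* One entry per thread, each on its own cache line so that polling
 * thread N from the watchdog does not disturb thread M.
 */
struct sched_state {
	_Alignas(CACHELINE) atomic_int alive;   /* set by the scheduler */
};

static struct sched_state sched_ctx[MAX_THREADS];

/* Called by thread <thr>'s scheduler each time it makes progress. */
static void sched_mark_alive(int thr)
{
	atomic_store(&sched_ctx[thr].alive, 1);
}

/* Returns non-zero if thread <thr>'s scheduler made progress since the
 * previous call, and rearms the check with a single atomic exchange.
 */
static int sched_was_alive(int thr)
{
	return atomic_exchange(&sched_ctx[thr].alive, 0);
}
```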
2025-10-01 10:18:53 +02:00
Olivier Houchard
21ae35dd29 BUG/MEDIUM: stick-tables: Make sure not to free a pending entry
There is a race condition, an entry can be free'd by stksess_kill()
between the time stktable_add_pend_updates() gets the entry from the
mt_list, and the time it adds it to the ebtree.
To prevent this, use the newly implemented MT_LIST_POP_LOCKED() to keep
the stksess locked until it is added to the tree. That way,
__stksess_kill() will wait until we're done with it.

This should be backported to 3.2.
2025-09-30 16:25:07 +02:00
Olivier Houchard
cf26745857 MINOR: mt_list: Implement MT_LIST_POP_LOCKED()
Implement MT_LIST_POP_LOCKED(), that behaves as MT_LIST_POP() and
removes the first element from the list, if any, but keeps it locked.

This should be backported to 3.2, as it will be used in a bug fix in the
stick tables that affects 3.2 too.
2025-09-30 16:25:07 +02:00
William Lallemand
6316f958e3 ADMIN: reload: introduce -vv mode
The -v verbose mode displays the loading messages returned by the master
CLI reload command upon error.

The new -vv mode displays the loading messages even upon success,
showing the content of `show startup-logs` after the reload attempt.
2025-09-29 19:29:10 +02:00
William Lallemand
5d05f343b9 ADMIN: reload: introduce verbose and silent mode
By default haproxy-reload displays the errors that are not emitted by
haproxy, but only emitted by haproxy-reload.

-s silent mode, don't display any error

-v verbose mode, display the loading messages returned by the master CLI
reload command upon error.
2025-09-29 19:29:10 +02:00
William Lallemand
3ce597bfa2 BUG/MEDIUM: acme: free() of i2d_X509_REQ() with AWS-LC
When using AWS-LC, the free() of the data ptr resulting from
i2d_X509_REQ() might crash, because it uses the free() of the libc
instead of OPENSSL_free().

It does not seem to be a problem on openssl builds.

Must be backported in 3.2.
2025-09-29 13:46:51 +02:00
William Lallemand
8635c7d789 ADMIN: reload: add a synchronous reload helper
haproxy-reload is a utility script which reloads synchronously using the
master CLI, instead of asynchronously with kill.
2025-09-28 22:10:40 +02:00
William Lallemand
02f7bff90b ADMIN: dump-certs: use same error format as haproxy
Replace error/notice by [ALERT]/[WARNING]/[NOTICE] like it's done in
haproxy.

ALERT means a failure and the program will exit 1 just after it
WARNING will continue the execution of the program
NOTICE will continue the execution as well
2025-09-28 20:21:07 +02:00
William Lallemand
5c9f28641b ADMIN: dump-certs: fix lack of / in -p
Add a trailing / so -p doesn't fail if it wasn't specified.
2025-09-28 18:21:25 +02:00
William Lallemand
172ac6ad03 ADMIN: dump-certs: create files in a tmpdir
Files dumped from the socket are put in a temporary directory, this
directory is then removed upon exit.

Variables were renamed to be clearer:
- crt_filename -> prev_crt
- key_filename -> prev_key
- ${crt_filename}.${tmp} -> new_crt
- ${key_filename}.${tmp} -> new_key
2025-09-28 18:21:25 +02:00
William Lallemand
8781c65d8a ADMIN: dump-certs: don't update the file if it's up to date
Compare the fingerprint of the leaf certificate to the previous file to
check if it needs to be updated or not

Also skip the check if no file is on the disk.
2025-09-28 18:21:20 +02:00
William Lallemand
3a6ea8b959 ADMIN: haproxy-dump-certs: implement a certificate dumper
haproxy-dump-certs is a bash script that connects to your master socket
or your stat socket in order to dump certificates from haproxy memory to
the corresponding files.
2025-09-28 13:38:48 +02:00
William Lallemand
b70c7f48fa MINOR: acme: implement "reuse-key" option
The new "reuse-key" option in the "acme" section, allows to keep the
private key instead of generating a new one at each renewal.
2025-09-27 21:41:39 +02:00
William Lallemand
a9ccf692e7 BUG/MEDIUM: acme: cfg_postsection_acme() don't init correctly acme sections
The cfg_postsection_acme() redefines its own cur_acme variable, pointing
to the first acme section created. This means that the first section would
be initialized multiple times, and the following sections would never be
initialized.

It could result in crashes at the first use of all sections that are not
the first one.

Must be backported in 3.2
2025-09-27 19:58:44 +02:00
William Lallemand
406fd0ceb1 BUG/MINOR: acme: don't unlink from acme_ctx_destroy()
Unlinking the acme_ctx element from acme_ctx_destroy() requires to have
the element unlocked, because MT_LIST_DELETE() locks the element.

acme_ctx_destroy() frees the data from acme_ctx with the ctx still
linked and unlocked, then lock to unlink. So there's a small risk of
accessing acme_ctx from somewhere else. The only way to do that would be
to use the `acme challenge_ready` CLI command at the same time.

Fix the issue by doing a mt_list_unlock_link() and a
mt_list_unlock_self() to unlink the element under the lock, then destroy
the element.

This must be backported in 3.2.
2025-09-27 18:52:56 +02:00
William Lallemand
6499c0a0d5 CI: github: build halog on the vtest job
halog was not built in the vtest job. Add it to vtest.yml to be able to
track build issues on push.
2025-09-26 16:29:29 +02:00
William Lallemand
f1f5877ce1 BUILD: halog: misleading indentation in halog.c
admin/halog/halog.c: In function 'filter_count_url':
admin/halog/halog.c:1685:9: error: this 'if' clause does not guard... [-Werror=misleading-indentation]
 1685 |         if (unlikely(!ustat))
      |         ^~
admin/halog/halog.c:1687:17: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
 1687 |                 if (unlikely(!ustat)) {
      |                 ^~

This patch fixes the indentation.

Must be backported where fbd0fb20a22 ("BUG/MINOR: halog: Add OOM checks
for calloc() in filter_count_srv_status() and filter_count_url()") was
backported.
2025-09-26 16:01:50 +02:00
Chris Staite
54f53bc875 MINOR: backend: srv_is_up converter
There is currently an srv_queue converter which is capable of taking the
output of a dynamic name and determining the queue length for a given
server.  In addition there is a sample fetcher for whether a server is
currently up.  This simply combines the two such that srv_is_up can be
used as a converter too.

Future work might extend this to other sample fetchers for servers, but
this is probably the most useful for acl routing.
2025-09-26 10:46:48 +02:00
Chris Staite
faba98c85f MINOR: backend: srv_queue helper
In preparation of providing further server converters, split the code
for finding the server from the sample out.

Additionally, update the documentation for srv_queue converter to note
security concerns.
2025-09-26 10:46:48 +02:00
William Lallemand
b3b910cc3f BUILD: acme: fix false positive null pointer dereference
src/acme.c: In function ‘cfg_parse_acme_vars_provider’:
src/acme.c:471:9: error: potential null pointer dereference [-Werror=null-dereference]
  471 |         free(*dst);
      |         ^~~~~~~~~~

gcc13 on ubuntu 24.04 detects a false positive when building
3e72a9f ("MINOR: acme: provider-name for dpapi sink").
Indeed dst can't be NULL. Clarify the code so gcc doesn't complain
anymore.
2025-09-26 10:34:35 +02:00
William Lallemand
3e72a9f618 MINOR: acme: provider-name for dpapi sink
Like "acme-vars", the "provider-name" in the acme section is used in
case of DNS-01 challenge and is sent to the dpapi sink.

This is used to pass the name of a DNS provider in order to choose the
DNS API to use.

This patch implements the cfg_parse_acme_vars_provider() which parses
either acme-vars or provider-name options and escapes their strings.

Example:

     $ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock -  | cat -e
     <0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$
     acme-vars "var1=foobar\"toto\",var2=var2"$
     provider-name "godaddy"$
     {$
       "identifier": {$
         "type": "dns",$
         "value": "example.com"$
       },$
       "status": "pending",$
       "expires": "2025-09-25T14:41:57Z",$
       [...]
2025-09-26 10:23:35 +02:00
William Lallemand
c52d69cc78 BUG/MEDIUM: ssl: ca-file directory mode must read every certificates of a file
The httpclient is configured with @system-ca by default, which uses the
directory returned by X509_get_default_cert_dir().

On debian/ubuntu systems, this directory contains multiple certificate
files that are loaded successfully. However it seems that on other
systems the files in this directory are the direct result of
ca-certificates instead of its source, meaning that you would only have
a bundle file with every certificate in it.

The loading was not done correctly in case of directory loading, and was
only loading the first certificate of each file.

This patch fixes the issue by using X509_STORE_load_locations() on each
file from the scandir instead of trying to load it manually with BIO.

Note that we can't use X509_STORE_load_locations with the `dir` argument,
which would be simpler, because it uses X509_LOOKUP_hash_dir() which
requires a directory in hash form. That wouldn't be suited for this use
case.

Must be backported to all stable branches.

Fix issue #3137.
2025-09-26 09:36:55 +02:00
William Lallemand
230a072102 CI: github: add curl+ech build into openssl-ech job
Build a curl binary with the ECH function linked with our openssl+ech
library.
2025-09-25 17:05:46 +02:00
William Lallemand
44b20e0b01 CI: scripts: build curl with ECH support
Add a script to build curl with ECH support, to specify the path of the
openssl+ECH library, you should set the SSL_LIB variable with the prefix
of the library.

Example:
   SSL_LIB=/opt/openssl-ech CURL_DESTDIR=/opt/curl-ech/ ./build-curl.sh
2025-09-25 17:05:46 +02:00
Christopher Faulet
7aa9f5ec98 BUG/MINOR: pattern: Fix pattern lookup for map with opt@ prefix
When we look for a map file reference, the file@ prefix is removed because
it may be omitted. The same is true with opt@ prefix. However this case was
not properly performed in pat_ref_lookup(). Let's do so.

This patch must be backported as far as 3.0.
2025-09-25 15:28:22 +02:00
William Lallemand
c325e34e6d CLEANUP: acme: acme_will_expire() uses acme_schedule_date()
The date computations in acme_will_expire() and acme_schedule_date() are
the same. Call acme_schedule_date() from acme_will_expire() and make the
functions static. The patch also moves the functions into the right
order.
2025-09-25 15:14:31 +02:00
William Lallemand
f256b5fdf3 BUG/MINOR: acme: possible overflow in acme_will_expire()
acme_will_expire() computes the schedule date using notAfter and
notBefore from the certificate. However notBefore could be greater than
notAfter and could result in an overflow.

This is unlikely to happen and would mean an incorrect certificate.

This patch fixes the issue by checking that notAfter > notBefore.

It also replaces the int type with a time_t to avoid overflow on 64-bit
architectures, which is also unlikely to happen with certificates.

`(date.tv_sec + diff > notAfter)` was also replaced by `if (notAfter -
diff <= date.tv_sec)` to avoid an overflow.
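
The shape of the overflow-safe test can be sketched as follows. The
function name, the -1 return value for an inverted validity period and
the way <diff> is passed in are assumptions for the example; only the
`notAfter - diff <= now` form comes from the description above.

```c
#include <time.h>

/* Returns 1 if the certificate should be renewed now, 0 if not, and
 * -1 if the validity period is inverted (notAfter <= notBefore).
 * <diff> is the lead time before expiry and is assumed to lie within
 * [0, not_after - not_before]. Hypothetical helper.
 */
static int cert_needs_renewal(time_t now, time_t not_before,
                              time_t not_after, time_t diff)
{
	if (not_after <= not_before)
		return -1;

	/* Subtract from not_after instead of adding to now: with diff
	 * bounded by the validity period, the result cannot go below
	 * not_before, whereas now + diff could overflow.
	 */
	return not_after - diff <= now;
}
```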

Fix issue #3135.

Needs to be backported to 3.2.
2025-09-25 15:12:14 +02:00
William Lallemand
68770479ea BUG/MINOR: acme: possible overflow on scheduling computation
acme_schedule_date() computes the schedule date using notAfter and
notBefore from the certificate. However notBefore could be greater than
notAfter and could result in an overflow.

This is unlikely to happen and would mean an incorrect certificate.

This patch fixes the issue by checking that notAfter > notBefore.

It also replaces the int type with a time_t to avoid overflow on 64-bit
architectures, which is also unlikely to happen with certificates.

Fix issue #3136.

Needs to be backported to 3.2.
2025-09-25 15:12:03 +02:00
Christopher Faulet
3be8b06a60 BUG/MINOR: pattern: Properly flag virtual maps as using samples
When a map file is loaded, internally, the pattern reference is flagged as
based on a sample. However it is not performed for virtual maps. This flag
is only used during startup to check the map compatibility when it is used at
different places. At runtime this does not change anything. But errors can
be triggered during configuration parsing. For instance, the following valid
config will trigger an error:

    http-request set-map(virt@test) foo bar if !{ str(foo),map(virt@test) -m found }
    http-request set-var(txn.foo) str(foo),map(virt@test)

The fix is quite obvious. PAT_REF_SMP flag must be set for virtual map as
any other map.

A workaround is to use optional map (opt@...) by checking the map id cannot
reference an existing file.

This patch must be backported as far as 3.0.
2025-09-25 10:16:53 +02:00
Christopher Faulet
23e5d272af BUG/MINOR: compression: Test payload size only if content-length is specified
When a minimum size is defined to perform the compression, the message
payload size is tested. To do so, information from the HTX message is used to
determine the message length. However it is performed regardless of whether
the payload length is fully known or not. Concretely, the test must only be
performed when a content-length value was specified or when the message was
fully received (EOM flag set). Otherwise, we are unable to determine the real
payload length.

Because of this bug, compression may be skipped for a large chunked message
because the first chunks received are too small. But this does not mean the
whole message is small.
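
The intended condition can be sketched as below. The function and its
parameters are hypothetical stand-ins for the HTX information the
filter uses; it only shows that the size test is conclusive when the
length is really known.

```c
/* Returns 1 if the message is known to be smaller than <min_size> and
 * compression can be skipped, 0 otherwise. <has_clen> is non-zero when
 * a content-length was specified, <eom> when the message was fully
 * received. Hypothetical helper written for illustration.
 */
static int payload_too_small(unsigned long long len,
                             unsigned long long min_size,
                             int has_clen, int eom)
{
	/* Length not fully known (e.g. first chunks of a chunked
	 * message): the bytes seen so far say nothing about the total.
	 */
	if (!has_clen && !eom)
		return 0;

	return len < min_size;
}
```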

This patch must be backported to 3.2.
2025-09-25 10:16:53 +02:00
Olivier Houchard
71199e394c BUG/MEDIUM: stick-tables: Don't let table_process_entry() handle refcnt
Instead of having table_process_entry() decrement the session's ref
counter, do it outside, from the caller. Some were missed, such as when
an action was invalid, which would lead to the ref counter not being
decremented, and the session not being destroyable.
It makes more sense to do that from the caller, who just obtained the
ref counter, anyway.
This should be backported up to 2.8.
2025-09-22 23:14:19 +02:00
Ilia Shipitsin
8c8e50e09a CI: move VTest preparation & friends to dedicated composite action
reference: https://docs.github.com/en/actions/tutorials/create-actions/create-a-composite-action

preparing coredump limits, installing VTest are now served by dedicated
composite action
2025-09-22 19:18:23 +02:00
William Lallemand
fbffd2e25f BUG/MINOR: acme/cli: wrong description for "acme challenge_ready"
The "acme challenge_ready" command mistakenly uses the description of the
"acme status" command. This patch adds the right description.

Must be backported to 3.2.
2025-09-22 19:14:54 +02:00
William Lallemand
34cdc5e191 MINOR: acme: check acme-vars allocation during escaping
Handle allocation properly during acme-vars parsing.
Check for an allocation failure in both the malloc and the
realloc and emit an error if that's the case.
2025-09-19 18:11:50 +02:00
William Lallemand
92c31a6fb7 MINOR: acme: acme-vars allow to pass data to the dpapi sink
In the case of the dns-01 challenge, the agent that handles the
challenge might need some extra information which depends on the DNS
provider.

This patch introduces the "acme-vars" option in the acme section, which
allows passing this data to the dpapi sink. The double quotes will be
escaped when printed in the sink.

Example:

    global
        setenv VAR1 'foobar"toto"'

    acme LE
        directory https://acme-staging-v02.api.letsencrypt.org/directory
        challenge DNS-01
        acme-vars "var1=${VAR1},var2=var2"

Would output:

    $ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock -  | cat -e
    <0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$
    acme-vars "var1=foobar\"toto\",var2=var2"$
    {$
      "identifier": {$
        "type": "dns",$
        "value": "example.com"$
      },$
      "status": "pending",$
      "expires": "2025-09-25T14:41:57Z",$
      [...]
2025-09-19 16:40:53 +02:00
Christopher Faulet
331689d216 BUG/MEDIUM: http-client: Fix the test on the response start-line
The commit 88aa7a780 ("MINOR: http-client: Trigger an error if first
response block isn't a start-line") introduced a bug. From an endpoint, an
applet or a mux, the <first> index must never be used. It is reserved to the
HTTP analyzers. From an endpoint, this value may be undefined or just point to
any block other than the first one. Instead we must always get the head
block.

In that case, to be sure the first HTX block in a response is a start-line,
we must use htx_get_head_type() function instead of htx_get_first_type().
Otherwise, we can trigger an error while the response is in fact properly
formatted.

It is a 3.3-specific issue. No backport needed.
2025-09-19 14:59:28 +02:00
Aurelien DARRAGON
5c299dee5a MEDIUM: stats: consider that shared stats pointers may be NULL
This patch looks huge, but it has a very simple goal: protect all
accesses to shared stats pointers (either reads or writes), because
we now consider that these pointers may be NULL.

The reason behind this is despite all precautions taken to ensure the
pointers shouldn't be NULL when not expected, there are still corner
cases (ie: frontends stats used on a backend with no FE cap and vice
versa) where we could try to access a memory area which is not
allocated. Willy stumbled on such cases while playing with the rings
servers upon connection error, which eventually led to process crashes
(since 3.3 when shared stats were implemented)

Also, we may decide later that shared stats are optional and should
be disabled on the proxy to save memory and CPU, and this patch is
a step further towards that goal.

So in essence, this patch ensures shared stats pointers are always
initialized (including NULL), and adds necessary guards before shared
stats pointers are de-referenced. Since we already had some checks
for backends and listeners stats, and the pointer address retrieval
should stay in cpu cache, let's hope that this patch doesn't impact
stats performance much.
2025-09-18 16:49:51 +02:00
Aurelien DARRAGON
40eb1dd135 BUG/MEDIUM: sink: fix unexpected double postinit of sink backend
Willy experienced an unexpected behavior with the config below:

    global
        stats socket :1514

    ring buf1
        server srv1 127.0.0.1:1514

Indeed, haproxy would connect to the ring server twice since commit 23e5f18b
("MEDIUM: sink: change the sink mode type to PR_MODE_SYSLOG"), and one of the
connections would report errors.

The reason behind this is that, despite the above commit saying no change of behavior
is expected, with the sink forward_px proxy now being set with PR_MODE_SYSLOG,
postcheck_log_backend() was being automatically executed in addition to the
manual cfg_post_parse_ring() function for each "ring" section. The consequence
is that sink_finalize() was called twice for a given "ring" section, which
means the connection init would be triggered twice, which in turn resulted in
the behavior described above, plus possible unexpected side-effects.

To fix the issue, when we create the forward_px proxy, we now set the
PR_CAP_INT capability on it to tell haproxy not to automatically manage the
proxy (ie: to skip the automatic log backend postinit), because we are about
to manually manage the proxy from the sink API.

No backport needed, this bug is specific to 3.3
2025-09-18 16:49:29 +02:00
Willy Tarreau
79ef362d9e OPTIM: ring: avoid reloading the tail_ofs value before the CAS in ring_write()
The load followed by the CAS seems to cause two bus cycles, one to
retrieve the cache line in shared state and a second one to get
exclusive ownership of it. Tests show that on x86 it's much better
to just rely on the previous value and preset it to zero before
entering the loop. We just mask the ring lock in case of failure
so as to challenge it on next iteration and that's done.

This little change brings 2.3% extra performance (11.34M msg/s) on
a 64-core AMD.
2025-09-18 15:27:32 +02:00
Willy Tarreau
a727c6eaa5 OPTIM: ring: check the queue's owner using a CAS on x86
In the loop where the queue's leader tries to get the tail lock,
we also need to check if another thread took ownership of the queue
the current thread is currently working for. This is currently done
using an atomic load.

Tests show that on x86, using a CAS for this is much more efficient
because it allows to keep the cache line in exclusive state for a
few more cycles that permit the queue release call after the loop
to be done without having to wait again. The measured gain is +5%
for 128 threads on a 64-core AMD system (11.08M msg/s vs 10.56M).
However, ARM loses about 1% on this, and we cannot afford that on
machines without a fast CAS anyway, so the load is performed using
a CAS only on x86_64. It might not be as efficient on low-end models
but we don't care since they are not the ones dealing with high
contention.
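
As a rough illustration of reading a value through a CAS, here is a minimal
C11 sketch. It is not the ring code: read_owner() and its argument are
hypothetical names, and the choice of a 0 -> 0 compare-and-swap is only one
way to express "a load that takes the cache line exclusively".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: return the current value of *owner.
 * On x86_64 the read is done with a CAS that tries to replace 0 with 0,
 * so it never modifies the value. When the stored value is not 0, the
 * CAS fails and copies the current value into "seen".
 */
static uintptr_t read_owner(_Atomic uintptr_t *owner)
{
	assert(owner);
#if defined(__x86_64__)
	uintptr_t seen = 0;

	atomic_compare_exchange_strong(owner, &seen, 0);
	return seen;
#else
	/* other architectures keep the plain atomic load */
	return atomic_load(owner);
#endif
}
```

Both branches return the same value; only the cache line state requested
from the hardware differs, which is where the measured gain would come from.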
2025-09-18 15:08:12 +02:00
Willy Tarreau
d25099b359 OPTIM: ring: always relax in the ring lock and leader wait loop
Tests have shown that AMD systems really need to use a cpu_relax()
in these two loops. The performance improves from 10.03 to 10.56M
messages per second (+5%) on a 128-thread system, without affecting
intel nor ARM, so let's do this.
2025-09-18 15:07:56 +02:00
Willy Tarreau
eca1f90e16 CLEANUP: ring: rearrange the wait loop in ring_write()
The loop is constructed in a complicated way with a single break
statement in the middle and many continue statements everywhere,
making it hard to better factor between variants. Let's first
reorganize it so as to make it easier to escape when the ring
tail lock is obtained. The sequence of instructions remains the
same, it's only better organized.
2025-09-18 14:58:38 +02:00
Willy Tarreau
08c6bbb542 OPTIM: sink: don't waste time calling sink_announce_dropped() if busy
If we see that another thread is already busy trying to announce the
dropped counter, there's no point going there, so let's just skip all
that operation from sink_write() and avoid disturbing the other thread.
This results in a boost from 244 to 262k req/s.
2025-09-18 09:07:35 +02:00
Willy Tarreau
4431e3bd26 OPTIM: sink: reduce contention on sink_announce_dropped()
perf top shows that sink_announce_dropped() consumes most of the CPU
on a 128-thread x86 system. Digging further reveals that the atomic
fetch_or() on the dropped field used to detect the presence of another
thread is entirely responsible for this. Indeed, the compiler implements
it using a CAS that loops without relaxing and makes all threads wait
until they can synchronize on this one, only to discover later that
another thread is there and they need to give up.

Let's just replace this with a hand-crafted CAS loop that will detect
*before* attempting the CAS if another thread is there. Doing so
achieves the same goal without forcing threads to agree. With this
simple change, the sustained request rate on h1 with all traces on
bumped from 110k/s to 244k/s!
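
The idea can be sketched with C11 atomics as follows. This is only an
illustration under assumptions: BUSY_BIT, try_take_busy() and release_busy()
are hypothetical names, and the real field layout in the sink code may
differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BUSY_BIT 1u /* hypothetical: lowest bit means "a thread is announcing" */

/* Try to become the announcing thread. Returns true if we set BUSY_BIT,
 * false if another thread already holds it. The initial plain load lets
 * contenders give up without attempting any CAS.
 */
static bool try_take_busy(_Atomic unsigned int *dropped)
{
	unsigned int cur = atomic_load_explicit(dropped, memory_order_relaxed);

	do {
		if (cur & BUSY_BIT)
			return false; /* someone else is there, don't even try */
		/* on failure, "cur" is refreshed and re-checked above */
	} while (!atomic_compare_exchange_weak(dropped, &cur, cur | BUSY_BIT));
	return true;
}

/* Release the bit while keeping the other bits (the counter) untouched. */
static void release_busy(_Atomic unsigned int *dropped)
{
	unsigned int prev = atomic_fetch_and(dropped, ~BUSY_BIT);

	assert(prev & BUSY_BIT);
	(void)prev;
}
```

Compared to a fetch_or(), a thread that sees the bit already set returns
immediately instead of looping until its own atomic operation succeeds.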

This should be backported to stable releases where it's often needed
to help debugging.
2025-09-18 08:38:34 +02:00
Willy Tarreau
361c227465 MINOR: trace: don't call strlen() on the function's name
Currently there's a small mistake in the way the trace function and
macros work. The calling function name is known as a constant until the
macro and passed as-is to the __trace() function. That one needs to
know its length and will call ist() on it, resulting in a real call
to strlen() while that length was known before the call. Let's use
an ist instead of a const char* for __trace() and __trace_enabled()
so that we can now completely avoid calling strlen() during this
operation. This has significantly reduced the importance of
__trace_enabled() in perf top.
2025-09-18 08:31:57 +02:00
Willy Tarreau
06fa9f717f MINOR: trace: don't call strlen() on the thread-id numeric encoding
In __trace(), we're making an integer for the thread id but this one
is passed through strlen() in the call to ist() because it's not a
constant. We do know that it's exactly 3 chars long so we can manage
this using ist2() and pass it the length instead in order to reduce
the number of calls to strlen().
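
A minimal sketch of the difference, using stand-in types rather than the
real ist API: struct strview, sv_from_cstr() and sv_from_buf() are
hypothetical equivalents of a pointer+length string built with or without
strlen(), and the zero-padded 3-digit encoding is an assumption made for
the example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer+length string, similar in spirit to an ist */
struct strview {
	const char *ptr;
	size_t len;
};

/* Length unknown: must scan the string with strlen() */
static struct strview sv_from_cstr(const char *s)
{
	return (struct strview){ s, strlen(s) };
}

/* Length already known by the caller: no scan needed */
static struct strview sv_from_buf(const char *s, size_t len)
{
	return (struct strview){ s, len };
}

/* Encode a thread id on exactly 3 chars, so the length is a constant */
static struct strview tid_view(char buf[4], unsigned int tid)
{
	assert(tid < 1000);
	buf[0] = (char)('0' + tid / 100);
	buf[1] = (char)('0' + (tid / 10) % 10);
	buf[2] = (char)('0' + tid % 10);
	buf[3] = 0;
	return sv_from_buf(buf, 3);
}
```

Both constructors yield the same view for a 3-char string; the second one
simply skips the scan since the caller already knows the length.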

Also let's note that the thread number will no longer be numeric for
thread numbers above 100.
2025-09-18 08:02:59 +02:00
Willy Tarreau
d53ad49ad1 BUG/MEDIUM: ring: invert the length check to avoid an int overflow
Vincent Gramer reported in GH issue #3125 a case of crash on a BUG_ON()
condition in the rings. What happens is that a message that is one byte
less than the maximum ring size is emitted, and it passes all the checks,
but once inflated by the extra +1 for the refcount, it can no longer. But
the check was made based on message size compared to space left, except
that this space left can now be negative, which is a high positive for
size_t, so the check remained valid and triggered a BUG_ON() later.

Let's compute the size the other way around instead (i.e. current +
needed) since we can't have rings as large as half of the memory space
anyway, thus we have no risk of overflow on this one.
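
The class of problem can be reproduced with a small sketch. The function
names and the size/used/needed parameters below are hypothetical and do not
mirror the ring code; they only contrast the two ways of writing the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fragile form: once used > size, "size - used" wraps around to a huge
 * positive size_t, so the check wrongly succeeds.
 */
static bool fits_subtract(size_t size, size_t used, size_t needed)
{
	return needed <= size - used;
}

/* Robust form: add on the left-hand side. This cannot wrap as long as
 * both operands stay below half of the address space.
 */
static bool fits_add(size_t size, size_t used, size_t needed)
{
	assert(used <= SIZE_MAX / 2 && needed <= SIZE_MAX / 2);
	return used + needed <= size;
}
```

With size=16, used=17 and needed=1, fits_subtract() accepts the write while
fits_add() rejects it.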

This needs to be backported to all versions supporting multi-threaded
rings (3.0 and above).

Thanks to Vincent for the easy and working reproducer.
2025-09-17 18:45:13 +02:00
Willy Tarreau
8c077c17eb MINOR: server: add the "cc" keyword to set the TCP congestion controller
It is possible on at least Linux and FreeBSD to set the congestion control
algorithm to be used with outgoing connections, among the list of supported
and permitted ones. Let's expose this setting with "cc". Unknown or
forbidden algorithms will be ignored and the default one will continue to
be used.
2025-09-17 17:19:33 +02:00
Willy Tarreau
4ed3cf295d MINOR: listener: add the "cc" bind keyword to set the TCP congestion controller
It is possible on at least Linux and FreeBSD to set the congestion control
algorithm to be used with incoming connections, among the list of supported
and permitted ones. Let's expose this setting with "cc". Permission issues
might be reported (as warnings).
2025-09-17 17:03:42 +02:00
Ben Kallus
31d0695a6a IMPORT: ebtree: replace hand-rolled offsetof to avoid UB
The C standard specifies that it's undefined behavior to dereference
NULL (even if you use & right after). The hand-rolled offsetof idiom
&(((s*)NULL)->f) is thus technically undefined. This clutters the
output of UBSan and is simple to fix: just use the real offsetof when
it's available.

Note that there's no clear statement about this point in the spec,
only several points which together converge to this:

- From N3220, 6.5.3.4:
  A postfix expression followed by the -> operator and an identifier
  designates a member of a structure or union object. The value is
  that of the named member of the object to which the first expression
  points, and is an lvalue.

- From N3220, 6.3.2.1:
  An lvalue is an expression (with an object type other than void) that
  potentially designates an object; if an lvalue does not designate an
  object when it is evaluated, the behavior is undefined.

- From N3220, 6.5.4.4 p3:
  The unary & operator yields the address of its operand. If the
  operand has type "type", the result has type "pointer to type". If
  the operand is the result of a unary * operator, neither that operator
  nor the & operator is evaluated and the result is as if both were
  omitted, except that the constraints on the operators still apply and
  the result is not an lvalue. Similarly, if the operand is the result
  of a [] operator, neither the & operator nor the unary * that is
  implied by the [] is evaluated and the result is as if the & operator
  were removed and the [] operator were changed to a + operator.

=> In short, this is saying that C guarantees these identities:
    1. &(*p) is equivalent to p
    2. &(p[n]) is equivalent to p + n

As a consequence, &(*p) doesn't result in the evaluation of *p, only
the evaluation of p (and similar for []). There is no corresponding
special carve-out for ->.
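
For illustration, here is a minimal sketch of a container_of()-style macro
built on the standard offsetof() instead of the NULL-based idiom. The
struct names and the item_of() macro are hypothetical and are not the
ebtree definitions.

```c
#include <assert.h>
#include <stddef.h>

struct link {
	struct link *next;
};

struct item {
	int id;
	struct link node; /* embedded member we get a pointer to */
};

/* Recover the enclosing struct from a pointer to one of its members.
 * offsetof() is evaluated by the compiler, no NULL pointer is involved.
 */
#define item_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static_assert(offsetof(struct item, node) >= sizeof(int),
              "node is placed after id");
```

Given "struct link *l = &it.node", item_of(l, struct item, node) yields
&it without ever forming an lvalue through a null pointer.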

See also: https://pvs-studio.com/en/blog/posts/cpp/0306/

After this patch, HAProxy can run without crashing after building w/
clang-19 -fsanitize=undefined -fno-sanitize=function,alignment

This is ebtree commit bd499015d908596f70277ddacef8e6fa998c01d5.
Signed-off-by: Willy Tarreau <w@1wt.eu>
This is ebtree commit 5211c2f71d78bf546f5d01c8d3c1484e868fac13.
2025-09-17 14:30:32 +02:00
Willy Tarreau
a31da78685 IMPORT: ebtree: add a definition of offsetof()
We'll use this to improve the definition of container_of(). Let's define
it if it does not exist. We can rely on __builtin_offsetof() on recent
enough compilers.

This is ebtree commit 1ea273e60832b98f552b9dbd013e6c2b32113aa5.
Signed-off-by: Willy Tarreau <w@1wt.eu>
This is ebtree commit 69b2ef57a8ce321e8de84486182012c954380401.
2025-09-17 14:30:32 +02:00
Ben Kallus
ddbff4e235 IMPORT: ebtree: Fix UB from clz(0)
From 'man gcc': passing 0 as the argument to "__builtin_ctz" or
"__builtin_clz" invokes undefined behavior. This triggers UBsan
in HAProxy.
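
One possible way to guard such a call is sketched below. This is not
necessarily how the ebtree fix is written: highest_bit() is a hypothetical
helper, and the code assumes a GCC-compatible compiler with a 32-bit
unsigned int.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* Return the 1-based position of the highest set bit of x, or 0 when
 * x is 0. The zero case is handled before calling the builtin since
 * __builtin_clz(0) has an undefined result.
 */
static inline unsigned int highest_bit(uint32_t x)
{
	if (!x)
		return 0;
	return 32 - (unsigned int)__builtin_clz(x);
}
```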

[wt: tested in treebench and verified not to cause any performance
 regression with opstime-u32 nor stress-u32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
This is ebtree commit 8c29daf9fa6e34de8c7684bb7713e93dcfe09029.
Signed-off-by: Willy Tarreau <w@1wt.eu>
This is ebtree commit cf3b93736cb550038325e1d99861358d65f70e9a.
2025-09-17 14:30:32 +02:00
Willy Tarreau
52c6dd773d IMPORT: ebst: use prefetching in lookup() and insert()
While the previous optimizations couldn't be preserved due to the
possibility of out-of-bounds accesses, at least the prefetch is useful.
A test on treebench shows that for 64k short strings, the lookup time
falls from 276 to 199ns per lookup (28% savings), and the insert falls
from 311 to 296ns (4.9% savings), which are pretty respectable, so
let's do this.

This is ebtree commit b44ea5d07dc1594d62c3a902783ed1fb133f568d.
2025-09-17 14:30:32 +02:00
Willy Tarreau
fef4cfbd21 IMPORT: ebtree: only use __builtin_prefetch() when supported
It looks like __builtin_prefetch() appeared in gcc-3.1 as there's no
mention of it in 3.0's doc. Let's replace it with eb_prefetch() which
maps to __builtin_prefetch() on supported compilers and falls back to
the usual do{}while(0) on other ones. It was tested to properly build
with tcc as well as gcc-2.95.
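
Such a wrapper could look like the sketch below. The macro name
my_prefetch(), the version test and the list walker are assumptions made
for the example; the actual eb_prefetch() definition may use a different
detection method.

```c
#include <assert.h>
#include <stddef.h>

/* Map to the builtin on gcc >= 3.1 (and compilers advertising it),
 * otherwise expand to an empty statement.
 */
#if defined(__GNUC__) && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1))
#define my_prefetch(addr, rw) __builtin_prefetch((addr), (rw))
#else
#define my_prefetch(addr, rw) do { } while (0)
#endif

struct cell {
	long value;
	struct cell *next;
};

/* Walk a list while hinting the CPU about the next element. A prefetch
 * is only a hint, so passing a NULL next pointer is harmless.
 */
static long sum_list(const struct cell *c)
{
	long total = 0;

	while (c) {
		my_prefetch(c->next, 0);
		total += c->value;
		c = c->next;
	}
	return total;
}
```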

This is ebtree commit 7ee6ede56a57a046cb552ed31302b93ff1a21b1a.
2025-09-17 14:30:32 +02:00
Willy Tarreau
3dda813d54 IMPORT: eb32/64: optimize insert for modern CPUs
Similar to previous patches, let's improve the insert() descent loop to
avoid discovering mandatory data too late. The change here is even
simpler than previous ones, a prefetch was installed and troot is
calculated before last instruction in a speculative way. This was enough
to gain +50% insertion rate on random data.

This is ebtree commit e893f8cc4d44b10f406b9d1d78bd4a9bd9183ccf.
2025-09-17 14:30:32 +02:00
Willy Tarreau
61654c07bd IMPORT: ebmb: optimize the lookup for modern CPUs
This is the same principles as for the latest improvements made on
integer trees. Applying the same recipes made the ebmb_lookup()
function jump from 10.07 to 12.25 million lookups per second on a
10k random values tree (+21.6%).

It's likely that the ebmb_lookup_longest() code could also benefit
from this, though this was neither explored nor tested.

This is ebtree commit a159731fd6b91648a2fef3b953feeb830438c924.
2025-09-17 14:30:32 +02:00
Willy Tarreau
6c54bf7295 IMPORT: eb32/eb64: place an unlikely() on the leaf test
In the loop we can help the compiler build slightly more efficient code
by placing an unlikely() around the leaf test. This shows a consistent
0.5% performance gain both on eb32 and eb64.

This is ebtree commit 6c9cdbda496837bac1e0738c14e42faa0d1b92c4.
2025-09-17 14:30:32 +02:00
Willy Tarreau
384907f4e7 IMPORT: eb32: drop the now useless node_bit variable
This one was previously used to preload from the node and keep a copy
in a register on i386 machines with few registers. With the new more
optimal code it's totally useless, so let's get rid of it. By the way
the 64 bit code didn't use that at all already.

This is ebtree commit 1e219a74cfa09e785baf3637b6d55993d88b47ef.
2025-09-17 14:30:31 +02:00
Willy Tarreau
c9e4adf608 IMPORT: eb32/eb64: use a more parallelizable check for lack of common bits
Instead of shifting the XOR value right and comparing it to 1, which
roughly requires 2 sequential instructions, better test if the XOR has
any bit above the current bit, which means any bit set among those
strictly higher, or in other words that XOR & (-bit << 1) is non-zero.
This is one less instruction in the fast path and gives another nice
performance gain on random keys (in million lookups/s):

    eb32   1k:  33.17 -> 37.30   +12.5%
          10k:  15.74 -> 17.08   +8.51%
         100k:   8.00 ->  9.00   +12.5%
    eb64   1k:  34.40 -> 38.10   +10.8%
          10k:  16.17 -> 17.10   +5.75%
         100k:   8.38 ->  8.87   +5.85%

This is ebtree commit c942a2771758eed4f4584fe23cf2914573817a6b.
2025-09-17 14:30:31 +02:00
Willy Tarreau
6af17d491f IMPORT: eb32/eb64: reorder the lookup loop for modern CPUs
The current code calculates the next troot based on a calculation.
This was efficient when the algorithm was developed many years ago
on K6 and K7 CPUs running at low frequencies with few registers and
limited branch prediction units but nowadays with ultra-deep pipelines
and high latency memory that's no longer efficient, because the CPU
needs to have completed multiple operations before knowing which
address to start fetching from. It's sad because we only have two
branches each time but the CPU cannot know it. In addition, the
calculation is performed late in the loop, which does not help the
address generation unit to start prefetching next data.

Instead we should help the CPU by preloading data early from the node
and calculating troot as soon as possible. The CPU will be able to
postpone that processing until the dependencies are available and it
really needs to dereference it. In addition we must absolutely avoid
serializing instructions such as "(a >> b) & 1" because there's no
way for the compiler to parallelize that code nor for the CPU to pre-
process some early data.

What this patch does is relatively simple:

  - we try to prefetch the next two branches as soon as the
    node is known, which will help dereference the selected node in
    the next iteration; it was shown that it only works with the next
    changes though, otherwise it can reduce the performance instead.
    In practice the prefetching will start a bit later once the node
    is really in the cache, but since there's no dependency between
    these instructions and any other one, we let the CPU optimize as
    it wants.

  - we preload all important data from the node (next two branches,
    key and node.bit) very early even if not immediately needed.
    This is cheap, it doesn't cause any pipeline stall and speeds
    up later operations.

  - we pre-calculate 1<<bit that we assign into a register, so as
    to avoid serializing instructions when deciding which branch to
    take.

  - we assign the troot based on a ternary operation (or if/else) so
    that the CPU knows upfront the two possible next addresses without
    waiting for the end of a calculation and can prefetch their contents
    every time the branch prediction unit guesses right.
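
The four points above can be sketched on a deliberately simplified binary
radix trie. This is not the ebtree layout: struct tnode, its fields and
trie_lookup() are hypothetical, and __builtin_prefetch() assumes a
GCC-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tnode {
	struct tnode *branch[2]; /* children, unused on leaves */
	uint32_t key;            /* meaningful on leaves only */
	int bit;                 /* bit tested at this node, -1 on a leaf */
};

static const struct tnode *trie_lookup(const struct tnode *node, uint32_t x)
{
	while (node && node->bit >= 0) {
		/* preload both branches and the mask as early as possible */
		const struct tnode *l = node->branch[0];
		const struct tnode *r = node->branch[1];
		uint32_t mask = 1u << node->bit;

		/* no dependency with the rest, the CPU schedules it freely */
		__builtin_prefetch(l);
		__builtin_prefetch(r);

		/* both candidate addresses are known before the choice */
		node = (x & mask) ? r : l;
	}
	return (node && node->key == x) ? node : NULL;
}
```

The point of the layout is that the next address is selected between two
already-loaded pointers instead of being computed from a shifted value.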

Just doing this provides significant gains at various tree sizes on
random keys (in million lookups per second):

  eb32   1k:  29.07 -> 33.17  +14.1%
        10k:  14.27 -> 15.74  +10.3%
       100k:   6.64 ->  8.00  +20.5%
  eb64   1k:  27.51 -> 34.40  +25.0%
        10k:  13.54 -> 16.17  +19.4%
       100k:   7.53 ->  8.38  +11.3%

The performance is now much closer to the sequential keys. This was
done for all variants ({32,64}{,i,le,ge}).

Another point, the equality test in the loop improves the performance
when looking up random keys (since we don't need to reach the leaf),
but is counter-productive for sequential keys, which can gain ~17%
without that test. However sequential keys are normally not used with
exact lookups, but rather with lookup_ge() that spans a time frame,
and which does not have that test for this precise reason, so in the
end both use cases are served optimally.

It's interesting to note that everything here is solely based on data
dependencies, and that trying to perform *less* operations upfront
always ends up with lower performance (typically the original one).

This is ebtree commit 05a0613e97f51b6665ad5ae2801199ad55991534.
2025-09-17 14:30:31 +02:00
Willy Tarreau
dcd4d36723 IMPORT: ebtree: delete unusable ebpttree.c
Since commit 21fd162 ("[MEDIUM] make ebpttree rely solely on eb32/eb64
trees") it was no longer used and no longer builds. The commit message
mentions that the file is no longer needed, probably that a rebase failed
and left the file there.

This is ebtree commit fcfaf8df90e322992f6ba3212c8ad439d3640cb7.
2025-09-17 14:30:31 +02:00
Aurelien DARRAGON
b72225dee2 DOC: internals: document the shm-stats-file format/mapping
Add some documentation about shm stats file structure to help writing
tools that can parse the file to use the shared stats counters.

This file was written for shm stats file version 1.0 specifically,
it may need to be updated when the shm stats file structure changes
in the future.
2025-09-17 11:32:58 +02:00
Aurelien DARRAGON
644b6b9925 MINOR: counters: document that tg shared counters are tied to shm-stats-file mapping
Let's explicitly mention that fe_counters_shared_tg and
be_counters_shared_tg structs are embedded in shm_stats_file_object
struct so any change in those structs will result in shm stats file
incompatibility between processes, thus extra precaution must be
taken when making changes to them.

Note that the provisioning made in shm_stats_file_object struct could
be used to add members to {fe,be}_counters_shared_tg without changing
shm_stats_file_object struct size if needed in order to preserve
shm stats file version.
2025-09-17 11:31:29 +02:00
Aurelien DARRAGON
31b3be7aae CLEANUP: log: remove deadcode in px_parse_log_steps()
When logsteps proxy storage was migrated from eb nodes to bitmasks in
6a92b14 ("MEDIUM: log/proxy: store log-steps selection using a bitmask,
not an eb tree"), some unused eb node related code was left over in
px_parse_log_steps()

Not only is this code unused, it also resulted in wasted memory since
an eb node was allocated for nothing.

This should fix GH #3121
2025-09-17 11:31:17 +02:00
Willy Tarreau
3d73e6c818 BUG/MEDIUM: pattern: fix possible infinite loops on deletion (try 2)
Commit e36b3b60b3 ("MEDIUM: migrate the patterns reference to cebs_tree")
changed the construction of the loops used to look up matching nodes, and
since we don't need two elements anymore, the "continue" statement now
loops on the same element when deleting. Let's fix this to make sure it
passes through the next one.

While this bug is 3.3 only, it turns out that 3.2 is also affected by
the incorrect loop construct in pat_ref_set_from_node(), where it's
possible to run an infinite loop since commit 010c34b8c7 ("MEDIUM:
pattern: consider gen_id in pat_ref_set_from_node()") due to the
"continue" statement being placed before the ebmb_next_dup() call.

As such the relevant part of this fix (pat_ref_set_from_elt) will
need to be backported to 3.2.
2025-09-16 16:32:39 +02:00
Willy Tarreau
f1b1d3682a Revert "BUG/MEDIUM: pattern: fix possible infinite loops on deletion"
This reverts commit 359a829ccb8693e0b29808acc0fa7975735c0353.
The fix is neither sufficient nor correct (it triggers ASAN). Better
redo it cleanly rather than accumulate invalid fixes.
2025-09-16 16:32:39 +02:00
William Lallemand
6b6c03bc0d CI: scripts: mkdir BUILDSSL_TMPDIR
Creates the BUILDSSL_TMPDIR at the beginning of the script instead of
having to create it in each download function
2025-09-16 15:35:35 +02:00
William Lallemand
9517116f63 CI: github: add an OpenSSL + ECH job
The upcoming ECH feature needs a patched OpenSSL with the "feature/ech"
branch.

This daily job launches an openssl build, as well as haproxy build with
reg-tests.
2025-09-16 15:05:44 +02:00
William Lallemand
31319ff7f0 CI: scripts: add support for git in openssl builds
Add support for git releases downloaded from github in openssl builds:

- GIT_TYPE variable allows you to choose between "branch" or "commit"
- OPENSSL_VERSION variable supports a "git-" prefix
- "git-${commit_id}" is stored in .openssl_version instead of the branch
  name for version comparison.
2025-09-16 15:05:44 +02:00
Willy Tarreau
359a829ccb BUG/MEDIUM: pattern: fix possible infinite loops on deletion
Commit e36b3b60b3 ("MEDIUM: migrate the patterns reference to cebs_tree")
changed the construction of the loops used to look up matching nodes, and
since we don't need two elements anymore, the "continue" statement now
loops on the same element when deleting. Let's fix this to make sure it
passes through the next one.

No backport is needed, this is only 3.3.
2025-09-16 11:49:01 +02:00
Willy Tarreau
4edff4a2cc CLEANUP: vars: use the item API for the variables trees
The variables trees use the immediate cebtree API, better use the
item one which is more expressive and safer. The "node" field was
renamed to "name_node" to avoid any ambiguity.
2025-09-16 10:51:23 +02:00
Willy Tarreau
c058cc5ddf CLEANUP: tools: use the item API for the file names tree
The file names tree uses the immediate cebtree API, better use the
item one which is more expressive and safer.
2025-09-16 10:41:19 +02:00
Willy Tarreau
2d6b5c7a60 MEDIUM: connection: reintegrate conn_hash_node into connection
Previously the conn_hash_node was placed outside the connection due
to the big size of the eb64_node that could have negatively impacted
frontend connections. But having it outside also means that one
extra allocation is needed for each backend connection, and that one
memory indirection is needed for each lookup.

With the compact trees, the tree node is smaller (16 bytes vs 40) so
the overhead is much lower. By integrating it into the connection,
we're also eliminating one pointer from the connection to the hash
node and one pointer from the hash node to the connection (in addition
to the extra object bookkeeping). This results in saving at least 24
bytes per total backend connection, and only inflates connections by
16 bytes (from 240 to 256), which is a reasonable compromise.

Tests on a 64-core EPYC show a 2.4% increase in the request rate
(from 2.08 to 2.13 Mrps).
2025-09-16 09:23:46 +02:00
Willy Tarreau
ceaf8c1220 MEDIUM: connection: move idle connection trees to ceb64
Idle connection trees currently require a 56-byte conn_hash_node per
connection, which can be reduced to 32 bytes by moving to ceb64. While
ceb64 is theoretically slower, in practice here we're essentially
dealing with trees that almost always contain a single key and many
duplicates. In this case, ceb64 insert and lookup functions become
faster than eb64 ones because all duplicates are a list accessed in
O(1) while it's a subtree for eb64. In tests it is impossible to tell
the difference between the two, so it's worth reducing the memory
usage.

This commit brings the following memory savings to conn_hash_node
(one per backend connection), and to srv_per_thread (one per thread
and per server):

     struct       before  after  delta
  conn_hash_node    56     32     -24
  srv_per_thread    96     72     -24

The delicate part is conn_delete_from_tree(), because we need to
know the tree root the connection is attached to. But thanks to
recent cleanups, it's now clear enough (i.e. idle/safe/avail vs
session are easy to distinguish).
2025-09-16 09:23:46 +02:00
Willy Tarreau
95b8adff67 MINOR: connection: pass the thread number to conn_delete_from_tree()
We'll soon need to choose the server's root based on the connection's
flags, and for this we'll need the thread it's attached to, which is
not always the current one. This patch simply passes the thread number
from all callers. They know it because they just set the idle_conns
lock on it prior to calling the function.
2025-09-16 09:23:46 +02:00
Willy Tarreau
efe519ab89 CLEANUP: backend: use a single variable for removed in srv_cleanup_idle_conns()
Probably due to older code, there's a boolean variable used to set
another one which is then checked. Also the first check is made under
the lock, which is unnecessary. Let's simplify this and use a single
variable. This only makes the code clearer, it doesn't change the output
code.
2025-09-16 09:23:46 +02:00
Willy Tarreau
f7d1fc2b08 MINOR: server: pass the server and thread to srv_migrate_conns_to_remove()
We'll need to have access to the srv_per_thread element soon from this
function, and there's no particular reason for passing it list pointers
so let's pass the server and the thread so that it is autonomous. It
also makes the calling code simpler.
2025-09-16 09:23:46 +02:00
Willy Tarreau
d1c5df6866 CLEANUP: server: use eb64_entry() not ebmb_entry() to convert an eb64
There were a few leftovers from an earlier version of the conn_hash_node
that was using ebmb nodes. A few calls to ebmb_first() and ebmb_entry()
were still present while acting on an eb64 tree. These are harmless as
one is just eb_first() and the other container_of(), but it's confusing
so let's clean them up.
2025-09-16 09:23:46 +02:00
Willy Tarreau
3d18a0d4c2 CLEANUP: backend: factor the connection lookup loop
The connection lookup loop is made of two identical blocks, one looking
in the idle or safe lists and the other one looking into the safe list
only. The second one is skipped if a connection was found or if the request
looks for a safe one (since already done). Also the two are slightly
different due to leftovers from earlier versions in that the second one
checks for safe connections and not the first one, and the second one
sets is_safe which is not used later.

Let's just rationalize all this by placing them in a loop which checks
first from the idle conns and second from the safe ones, or skips the
first step if the request wants a safe connection. This reduces the
code and shortens the time spent under the lock.
2025-09-16 09:23:46 +02:00
Willy Tarreau
7773d87ea6 CLEANUP: proxy: slightly reorganize fields to plug some holes
The proxy struct has several small holes that deserved being plugged by
moving a few fields around. Now we're down to 3056 from 3072 previously,
and the remaining holes are small.

At the moment, compared to before this series, we're seeing these
sizes:

    type\size   7d554ca62   current  delta
    listener       752        704     -48  (-6.4%)
    server        4032       3840    -192  (-4.8%)
    proxy         3184       3056    -128  (-4%)
    stktable      3392       3328     -64  (-1.9%)

Configs with many servers have shrunk by about 4% in RAM and configs
with many proxies by about 3%.
2025-09-16 09:23:46 +02:00
Willy Tarreau
8df81b6fcc CLEANUP: server: slightly reorder fields in the struct to plug holes
The struct server still has a lot of holes and padding that make it
quite big. By moving a few fields around between areas which do not
interact (e.g. boot vs aligned areas), it's quite easy to plug some
of them and/or to arrange larger ones which could be reused later with
a bit more effort. Here we've reduced holes by 40 bytes, allowing the
struct to shrink by one more cache line (64 bytes). The new size is
3840 bytes.
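
As a generic illustration of how reordering fields plugs holes, here is a minimal sketch. The two structs are hypothetical and unrelated to the real struct server; they only show that grouping small fields together removes alignment padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small fields interleaved with 8-byte ones: on common 64-bit ABIs each
 * 1-byte flag is followed by padding so that the next member is aligned.
 */
struct with_holes {
    uint8_t  flag_a;
    void    *ptr;
    uint8_t  flag_b;
    uint64_t counter;
    uint8_t  flag_c;
};

/* Same members, large ones first and small ones grouped at the end, so
 * that only trailing padding remains.
 */
struct reordered {
    void    *ptr;
    uint64_t counter;
    uint8_t  flag_a;
    uint8_t  flag_b;
    uint8_t  flag_c;
};

/* Reordering must never make the struct larger. */
static_assert(sizeof(struct reordered) <= sizeof(struct with_holes),
              "reordering should not grow the struct");

/* Number of bytes recovered by the reordering on the current platform. */
size_t bytes_saved(void)
{
    return sizeof(struct with_holes) - sizeof(struct reordered);
}
```

On a typical LP64 platform the first layout usually takes 40 bytes and the second 24, but the exact figures depend on the ABI. A tool such as pahole reports the holes directly.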
2025-09-16 09:23:46 +02:00
Willy Tarreau
d18d972b1f MEDIUM: server: index server ID using compact trees
The server ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct server for this node, plus 8 bytes in
struct proxy. The server struct is still 3904 bytes large (due to
alignment) and the proxy struct is 3072.
2025-09-16 09:23:46 +02:00
Willy Tarreau
66191584d1 MEDIUM: listener: index listener ID using compact trees
The listener ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct listener for this node, plus 8 bytes
in struct proxy. The struct listener is now 704 bytes large, and the
struct proxy 3080.
2025-09-16 09:23:46 +02:00
Willy Tarreau
1a95bc42c7 MEDIUM: proxy: index proxy ID using compact trees
The proxy ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct proxy for this node, plus 8 bytes in
the root (which is static so not very relevant here). Now the proxy is
3088 bytes large.
2025-09-16 09:23:46 +02:00
Willy Tarreau
eab5b89dce MINOR: proxy: add proxy_index_id() to index a proxy by its ID
This avoids needlessly exposing the tree's root and the mechanics outside
of the low-level code.
2025-09-16 09:23:46 +02:00
Willy Tarreau
5e4b6714e1 MINOR: listener: add listener_index_id() to index a listener by its ID
This avoids needlessly exposing the tree's root and the mechanics outside
of the low-level code.
2025-09-16 09:23:46 +02:00
Willy Tarreau
5a5cec4d7a MINOR: server: add server_index_id() to index a server by its ID
This avoids needlessly exposing the tree's root and the mechanics outside
of the low-level code.
2025-09-16 09:23:46 +02:00
Willy Tarreau
4ed4cdbf3d CLEANUP: server: use server_find_by_id() when looking for already used IDs
In srv_parse_id(), there's no point doing all the low-level work with
the tree functions to check for the existence of an ID, we already have
server_find_by_id() which does exactly this, so let's use it.
2025-09-16 09:23:46 +02:00
Willy Tarreau
0b0aefe19b MINOR: server: add server_get_next_id() to find next free server ID
This was previously achieved via the generic get_next_id() but we'll soon
get rid of generic ID trees so let's have a dedicated server_get_next_id().
As a bonus it reduces the exposure of the tree's root outside of the functions.
2025-09-16 09:23:46 +02:00
Willy Tarreau
23605eddb1 MINOR: listener: add listener_get_next_id() to find next free listener ID
This was previously achieved via the generic get_next_id() but we'll soon
get rid of generic ID trees so let's have a dedicated listener_get_next_id().
As a bonus it reduces the exposure of the tree's root outside of the functions.
2025-09-16 09:23:46 +02:00
Willy Tarreau
b2402d67b7 MINOR: proxy: add proxy_get_next_id() to find next free proxy ID
This was previously achieved via the generic get_next_id() but we'll soon
get rid of generic ID trees so let's have a dedicated proxy_get_next_id().
2025-09-16 09:23:46 +02:00
Willy Tarreau
f4059ea42f MEDIUM: stktable: index table names using compact trees
Here we're saving 64 bytes per stick-table, from 3392 to 3328, and the
change was really straightforward so there's no reason not to do it.
2025-09-16 09:23:46 +02:00
Willy Tarreau
d0d60a007d MEDIUM: proxy: switch conf.name to cebis_tree
This is used to index the proxy's name and it contains a copy of the
pointer to the proxy's name in <id>. Changing that for a ceb_node placed
just before <id> saves 32 bytes to the struct proxy, which is now 3112
bytes large.

Here we need to continue to support duplicates since they're still
allowed between type-incompatible proxies.

Interestingly, the use of cebis_next_dup() instead of cebis_next() in
proxy_find_by_name() allows us to get rid of an strcmp() that was
performed for each use_backend rule. A test with a large config
(100k backends) shows that we can get 3% extra performance on a
config involving a static use_backend rule (3.09M to 3.18M rps),
and even 4.5% on a dynamic rule selecting a random backend (2.47M
to 2.59M).
2025-09-16 09:23:46 +02:00
Willy Tarreau
fdf6fd5b45 MEDIUM: server: switch the host_dn member to cebis_tree
This member is used to index the hostname_dn contents for DNS resolution.
Let's replace it with a cebis_tree to save another 32 bytes (24 for the
node + 8 by avoiding the duplication of the pointer). The struct server is
now at 3904 bytes.
2025-09-16 09:23:46 +02:00
Willy Tarreau
413e903a22 MEDIUM: server: switch conf.name to cebis_tree
This is used to index the server name and it contains a copy of the
pointer to the server's name in <id>. Changing that for a ceb_node placed
just before <id> saves 32 bytes to the struct server, which remains 3968
bytes large due to alignment. The proxy struct shrinks by 8 bytes to 3144.

It's worth noting that the current way duplicate names are handled remains
based on the previous mechanism where dups were permitted. Ideally we
should now reject them during insertion and use unique key trees instead.
2025-09-16 09:23:46 +02:00
Willy Tarreau
0e99f64fc6 MEDIUM: server: switch addr_node to cebis_tree
This contains the text representation of the server's address, for use
with stick-tables with "srvkey addr". Switching them to a compact node
saves 24 more bytes from this structure. The key was moved to an external
pointer "addr_key" right after the node.

The server struct is now 3968 bytes (down from 4032) due to alignment, and
the proxy struct shrinks by 8 bytes to 3152.
2025-09-16 09:23:46 +02:00
Willy Tarreau
91258fb9d8 MEDIUM: guid: switch guid to more compact cebuis_tree
The current guid struct size is 56 bytes. Once reduced using compact
trees, it goes down to 32 (almost half). We're not on a critical path
and size matters here, so better switch to this.

It's worth noting that the name part could also be stored in the
guid_node at the end to save 8 extra bytes (no pointer needed anymore),
however the purpose of this struct is to be embedded into other ones,
which is not compatible with having a dynamic size.

Affected struct sizes in bytes:

           Before     After   Diff
  server    4032       4032     0*
  proxy     3184       3160    -24
  listener   752        728    -24

*: struct server is full of holes and padding (176 bytes) and is
64-byte aligned. Moving the guid_node elsewhere such as after sess_conn
reduces it to 3968, or one less cache line. There's no point in moving
anything now because forthcoming patches will arrange other parts.
2025-09-16 09:23:46 +02:00
Willy Tarreau
e36b3b60b3 MEDIUM: migrate the patterns reference to cebs_tree
cebs_tree are 24 bytes smaller than ebst_tree (16B vs 40B), and pattern
references are only used during map/acl updates, so their storage is
pure loss between updates (which most of the time never happen). By
switching their indexing to compact trees, we can save 16 to 24 bytes
per entry depending on alignment (here it's 24 per struct but 16 in
practice as malloc's alignment keeps 8 unused).

Tested on core i7-8650U running at 3.0 GHz, with a file containing
17.7M IP addresses (16.7M different):

   $ time  ./haproxy -c -f acl-ip.cfg

This saves 280 MB of RAM for 17.7M IP addresses, and slightly speeds up
the startup (5.8%, from 19.2s to 18.2s), part of which can possibly be
attributed to having to write less memory. Note that this is on small
strings. On larger ones such as user-agents, ebtree doesn't reread
the whole key and might be more efficient.

Before:
  RAM (VSZ/RSS): 4443912 3912444

  real    0m19.211s
  user    0m18.138s
  sys     0m1.068s

  Overhead  Command         Shared Object      Symbol
    44.79%  haproxy  haproxy            [.] ebst_insert
    25.07%  haproxy  haproxy            [.] ebmb_insert_prefix
     3.44%  haproxy  libc-2.33.so       [.] __libc_calloc
     2.71%  haproxy  libc-2.33.so       [.] _int_malloc
     2.33%  haproxy  haproxy            [.] free_pattern_tree
     1.78%  haproxy  libc-2.33.so       [.] inet_pton4
     1.62%  haproxy  libc-2.33.so       [.] _IO_fgets
     1.58%  haproxy  libc-2.33.so       [.] _int_free
     1.56%  haproxy  haproxy            [.] pat_ref_push
     1.35%  haproxy  libc-2.33.so       [.] malloc_consolidate
     1.16%  haproxy  libc-2.33.so       [.] __strlen_avx2
     0.79%  haproxy  haproxy            [.] pat_idx_tree_ip
     0.76%  haproxy  haproxy            [.] pat_ref_read_from_file
     0.60%  haproxy  libc-2.33.so       [.] __strrchr_avx2
     0.55%  haproxy  libc-2.33.so       [.] unlink_chunk.constprop.0
     0.54%  haproxy  libc-2.33.so       [.] __memchr_avx2
     0.46%  haproxy  haproxy            [.] pat_ref_append

After:
  RAM (VSZ/RSS): 4166108 3634768

  real    0m18.114s
  user    0m17.113s
  sys     0m0.996s

  Overhead  Command  Shared Object       Symbol
    38.99%  haproxy  haproxy             [.] cebs_insert
    27.09%  haproxy  haproxy             [.] ebmb_insert_prefix
     3.63%  haproxy  libc-2.33.so        [.] __libc_calloc
     3.18%  haproxy  libc-2.33.so        [.] _int_malloc
     2.69%  haproxy  haproxy             [.] free_pattern_tree
     1.99%  haproxy  libc-2.33.so        [.] inet_pton4
     1.74%  haproxy  libc-2.33.so        [.] _IO_fgets
     1.73%  haproxy  libc-2.33.so        [.] _int_free
     1.57%  haproxy  haproxy             [.] pat_ref_push
     1.48%  haproxy  libc-2.33.so        [.] malloc_consolidate
     1.22%  haproxy  libc-2.33.so        [.] __strlen_avx2
     1.05%  haproxy  libc-2.33.so        [.] __strcmp_avx2
     0.80%  haproxy  haproxy             [.] pat_idx_tree_ip
     0.74%  haproxy  libc-2.33.so        [.] __memchr_avx2
     0.69%  haproxy  libc-2.33.so        [.] __strrchr_avx2
     0.69%  haproxy  libc-2.33.so        [.] _IO_getline_info
     0.62%  haproxy  haproxy             [.] pat_ref_read_from_file
     0.56%  haproxy  libc-2.33.so        [.] unlink_chunk.constprop.0
     0.56%  haproxy  libc-2.33.so        [.] cfree@GLIBC_2.2.5
     0.46%  haproxy  haproxy             [.] pat_ref_append

If the addresses are totally disordered (via "shuf" on the input file),
we see both implementations reach exactly 68.0s (slower due to much
higher cache miss ratio).

On large strings such as user agents (1 million here), it's now slightly
slower (+9%):

Before:
  real    0m2.475s
  user    0m2.316s
  sys     0m0.155s

After:
  real    0m2.696s
  user    0m2.544s
  sys     0m0.147s

But such patterns are much less common than short ones, and the memory
savings do still count.

Note that while it could be tempting to get rid of the list that chains
all these pat_ref_elt together and only enumerate them by walking along
the tree to save 16 extra bytes per entry, that's not possible due to
the problem that insertion ordering is critical (think overlapping regex
such as /index.* and /index.html). Currently it's not possible to proceed
differently because patterns are first pre-loaded into the pat_ref via
pat_ref_read_from_file_smp() and later indexed by pattern_read_from_file(),
which has to only redo the second part anyway for maps/acls declared
multiple times.
2025-09-16 09:23:46 +02:00
Willy Tarreau
ddf900a0ce IMPORT: cebtree: import version 0.5.0 to support duplicates
The support for duplicates is necessary for various use cases related
to config names, so let's upgrade to the latest version which brings
this support. This updates the cebtree code to commit 808ed67 (tag
0.5.0). A few tiny adaptations were needed:
  - replace a few ceb_node** with ceb_root** since pointers are now
    tagged ;
  - replace cebu*.h with ceb*.h since both are now merged in the same
    include file. This way we can drop the unused cebu*.h files from
    cebtree that are provided only for compatibility.
  - rename immediate storage functions to cebXX_imm_XXX() as per the API
    change in 0.5 that makes immediate explicit rather than implicit.
    This only affects vars and tools.c:copy_file_name().

The tests continue to work.
2025-09-16 09:23:46 +02:00
Willy Tarreau
90b70b61b1 BUILD: makefile: implement support for running a command in range
When running "make range", it would be convenient to support running
reg tests or anything else such as "size", "pahole" or even benchmarks.
Such commands are usually specific to the developer's environment, so
let's just pass a generic variable TEST_CMD that is executed as-is if
not empty.

This way it becomes possible to run "make range RANGE=... TEST_CMD=...".
2025-09-16 09:23:46 +02:00
Valentine Krasnobaeva
f8acac653e BUG/MINOR: resolvers: always normalize FQDN from response
RFC1034 states the following:

By convention, domain names can be stored with arbitrary case, but
domain name comparisons for all present domain functions are done in a
case-insensitive manner, assuming an ASCII character set, and a high
order zero bit. This means that you are free to create a node with
label "A" or a node with label "a", but not both as brothers; you could
refer to either using "a" or "A".

In practice, most DNS resolvers normalize domain labels (i.e., convert
them to lowercase) before performing searches or comparisons to ensure
this requirement is met.

While HAProxy normalizes the domain name in the request, it currently
does not do so for the response. Commit 75cc653 ("MEDIUM: resolvers:
replace bogus resolv_hostname_cmp() with memcmp()") intentionally
removed the `tolower()` conversion from `resolv_hostname_cmp()` for
safety and performance reasons.

This commit re-introduces the necessary normalization for FQDNs received
in the response. The change is made in `resolv_read_name()`, where labels
are processed as an unsigned char string, allowing `tolower()` to be
applied safely. Since a typical FQDN has only 3-4 labels, replacing
`memcpy()` with an explicit copy that also applies `tolower()` should
not introduce a significant performance degradation.

This patch addresses the rare edge case, as most resolvers perform this
normalization themselves.

This fixes GitHub issue #3102. This fix may be backported to all stable
versions as far as 2.5 (included).
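
The explicit copy mentioned above can be sketched as follows. This is not the actual resolv_read_name() code; copy_label_lower() is a hypothetical helper that only shows why working on unsigned char makes tolower() safe to call.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy <len> bytes of a DNS label from <src> to <dst>,
 * lowercasing each byte on the fly. The bytes are unsigned char, so the
 * value passed to tolower() is always in its valid input range.
 */
void copy_label_lower(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++)
        dst[i] = (unsigned char)tolower(src[i]);
}
```

In the default "C" locale only the ASCII letters A-Z are affected, which matches the case-insensitivity rule quoted from RFC1034.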
2025-09-15 18:02:16 +02:00
Remi Tricot-Le Breton
257df69fbd BUG/MINOR: ocsp: Crash when updating CA during ocsp updates
If an ocsp response is set to be updated automatically and some
certificate or CA updates are performed on the CLI, if the CLI update
happens while the OCSP response is being updated and is then detached
from the update tree, it might be wrongly inserted into the update tree
in 'ssl_sock_load_ocsp', and then reinserted when the update finishes.

The update tree then gets corrupted and we could end up crashing when
accessing other nodes in the ocsp response update tree.

This patch must be backported up to 2.8.
This patch fixes GitHub #3100.
2025-09-15 15:34:36 +02:00
Aurelien DARRAGON
6a92b14cc1 MEDIUM: log/proxy: store log-steps selection using a bitmask, not an eb tree
An eb tree was used in anticipation of an unbounded number of custom log
steps configured at the proxy level. It turns out it makes no sense to
configure that many logging steps for a proxy, and the cost of the eb tree
is non-negligible in terms of memory footprint, especially when used in a
default section.

Instead, let's use a simple bitmask, which allows up to 64 logging steps
to be configured at the proxy level. If we lack space some day (and need
more than 64 logging steps to be configured), we could simply modify
"struct log_steps" to spread the bitmask over multiple 64-bit integers,
with some minor adjustments where the mask is set and checked.
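
A minimal sketch of such a bitmask, assuming one bit per logging step. The names (struct steps_mask, steps_set(), steps_isset()) are hypothetical and do not reflect the actual "struct log_steps" layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the selection: bit N set means step N is
 * enabled. A single 64-bit word limits the selection to 64 steps.
 */
struct steps_mask {
    uint64_t bits;
};

/* Enable logging step <step> (0..63). */
void steps_set(struct steps_mask *m, unsigned int step)
{
    assert(step < 64);
    m->bits |= UINT64_C(1) << step;
}

/* Return non-zero if logging step <step> (0..63) is enabled. */
int steps_isset(const struct steps_mask *m, unsigned int step)
{
    assert(step < 64);
    return (int)((m->bits >> step) & 1);
}
```

Going past 64 steps would mean turning bits into an array and indexing it with step / 64, which is the adjustment the message alludes to.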
2025-09-15 10:29:02 +02:00
Aurelien DARRAGON
be417c1db2 BUG/MEDIUM: http_ana: fix potential NULL deref in http_process_req_common()
As reported by @kenballus in GH #3118, a potential NULL-deref was
introduced in 3da1d63 ("BUG/MEDIUM: http_ana: handle yield for "stats
http-request" evaluation").

Indeed, px->uri_auth may be NULL when stats directive is not involved in
the current proxy section.

The bug went unnoticed because it didn't seem to cause any side-effect
so far and valgrind didn't catch it. However ASAN did, so let's fix it
before it causes harm.

It should be backported with 3da1d63.
2025-09-15 10:28:59 +02:00
Christopher Faulet
b582fd41c2 Revert "BUG/MINOR: ocsp: Crash when updating CA during ocsp updates"
This reverts commit 167ea8fc7b0cf9d1bf71ec03d7eac3141fbe0080.

The patch was backported by mistake.
2025-09-15 10:16:20 +02:00
Remi Tricot-Le Breton
167ea8fc7b BUG/MINOR: ocsp: Crash when updating CA during ocsp updates
If an ocsp response is set to be updated automatically and some
certificate or CA updates are performed on the CLI, if the CLI update
happens while the OCSP response is being updated and is then detached
from the update tree, it might be wrongly inserted into the update tree
in 'ssl_sock_load_ocsp', and then reinserted when the update finishes.

The update tree then gets corrupted and we could end up crashing when
accessing other nodes in the ocsp response update tree.

This patch must be backported up to 2.8.
This patch fixes GitHub #3100.
2025-09-15 08:20:16 +02:00
Christopher Faulet
157852ce99 BUG/MEDIUM: resolvers: Wake resolver task up whne unlinking a stream requester
Another regression introduced with the commit 3023e9819 ("BUG/MINOR:
resolvers: Restore round-robin selection on records in DNS answers"). Stream
requesters may be unlinked from any thread. So we must not try to queue the
resolver's task here because it is not allowed to do so from a thread other
than the task's thread. Instead, we can simply wake the resolver's task up.
This is only performed when the last stream requester is unlinked from the
resolution.

This patch should fix the issue #3119. It must be backported with the commit
above.
2025-09-15 07:57:29 +02:00
Christopher Faulet
e6a9192af6 BUG/MEDIUM: resolvers: Accept to create resolution without hostname
A regression was introduced by commit 6cf2401ed ("BUG/MEDIUM: resolvers:
Make resolution owns its hostname_dn value"). In fact, it is possible (and
allowed ?!) to create a resolution without hostname (hostname_dn ==
NULL). It only happens on startup for a server relying on a resolver but
defined with an IP address and not a hostname.

Because of the patch above, an error is triggered during the configuration
parsing when this happens, while it should be accepted.

This patch must be backported with the commit above.
2025-09-12 11:52:06 +02:00
Christopher Faulet
6cf2401eda BUG/MEDIUM: resolvers: Make resolution owns its hostname_dn value
The commit 37abe56b1 ("BUG/MEDIUM: resolvers: Properly cache do-resolv
resolution") introduced a regression. A resolution does not own its
hostname_dn value, it is a pointer to the first request's value. But since
the commit above, it is possible to have an orphan resolution, with no
requester. So it is important to modify resolutions to make them own their
hostname_dn value by duplicating it when they are created.

This patch must be backported with the commit above.
2025-09-12 11:09:19 +02:00
Christopher Faulet
f6dfbbe870 BUG/MEDIUM: resolvers: Test for empty tree when getting a record from DNS answer
In the previous fix 5d1d93fad ("BUG/MEDIUM: resolvers: Properly handle empty
tree when getting a record from the DNS answer"), I missed the fact the
answer tree can be empty.

So, to avoid crashes, when the answer tree is empty, we immediately exit
from resolv_get_ip_from_response() function with RSLV_UPD_NO_IP_FOUND. In
addition, when a record is removed from the tree, we take care to reset the
next node saved if necessary.

This patch must be backported with the commit above.
2025-09-12 11:09:19 +02:00
Collison, Steven
d738fa4ec0 DOC: proxy-protocol: Add TLS group and sig scheme TLVs
This change adds the PP2_SUBTYPE_SSL_GROUP and PP2_SUBTYPE_SSL_SIG_SCHEME
code point reservations in proxy_protocol.txt. The motivation for adding
these two TLVs is for backend visibility into the negotiated TLS key
exchange group and handshake signature scheme.

Demand for visibility is expected to increase as endpoints migrate to use
new Post-Quantum resistant algorithms for key exchange and signatures.
2025-09-12 09:25:14 +02:00
Willy Tarreau
8fb5ae5cc6 MINOR: activity/memory: count allocations performed under a lock
By checking the current thread's locking status, it becomes possible
to know during a memory allocation whether it's performed under a lock
or not. Both pools and memprofile functions were instrumented to check
for this and to increment the memprofile bin's locked_calls counter.

This one, when not zero, is reported on "show profiling memory" with a
percentage of all allocations that such locked allocations represent.
This way it becomes possible to try to target certain code paths that
are particularly expensive. Example:

  $ socat - /tmp/sock1 <<< "show profiling memory"|grep lock
     20297301           0     2598054528              0|   0x62a820fa3991 sockaddr_alloc+0x61/0xa3 p_alloc(128) [pool=sockaddr] [locked=54962 (0.2 %)]
            0    20297301              0     2598054528|   0x62a820fa3a24 sockaddr_free+0x44/0x59 p_free(-128) [pool=sockaddr] [locked=34300 (0.1 %)]
      9908432           0     1268279296              0|   0x62a820eb8524 main+0x81974 p_alloc(128) [pool=task] [locked=9908432 (100.0 %)]
      9908432           0      554872192              0|   0x62a820eb85a6 main+0x819f6 p_alloc(56) [pool=tasklet] [locked=9908432 (100.0 %)]
       263001           0       63120240              0|   0x62a820fa3c97 conn_new+0x37/0x1b2 p_alloc(240) [pool=connection] [locked=20662 (7.8 %)]
        71643           0       47307584              0|   0x62a82105204d pool_get_from_os_noinc+0x12d/0x161 posix_memalign(660) [locked=5393 (7.5 %)]
2025-09-11 16:32:34 +02:00
Willy Tarreau
9d8c2a888b MINOR: activity: collect CPU time spent on memory allocations for each task
When task profiling is enabled, the pool alloc/free code will measure the
time it takes to perform memory allocation after a cache miss or memory
freeing to the shared cache or OS. The time taken with the thread-local
cache is never measured as measuring that time is very expensive compared
to the pool access time. Here doing so costs around 2% performance at 2M
req/s, only when task profiling is enabled, so this remains reasonable.
The scheduler takes care of collecting that time and updating the
sched_activity entry corresponding to the current task when task profiling
is enabled.

The goal clearly is to track places that are wasting CPU time allocating
and releasing too often, or causing large evictions. This appears like
this in "show profiling tasks aggr":

  Tasks activity over 11.428 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lkd_avg   mem_avg   lat_avg
    process_stream             44183891   16.47m    22.36us   491.0ns   1.154us   1.000ns   101.1us
    h1_io_cb                   57386064   4.011m    4.193us   20.00ns   16.00ns      -      29.47us
    sc_conn_io_cb              42088024   49.04s    1.165us      -         -         -      54.67us
    h1_timeout_task              438171   196.5ms   448.0ns      -         -         -      100.1us
    srv_cleanup_toremove_conns       65   1.468ms   22.58us   184.0ns   87.00ns      -      101.3us
    task_process_applet               3   508.0us   169.3us      -      107.0us   1.847us   29.67us
    srv_cleanup_idle_conns            6   225.3us   37.55us   15.74us   36.84us      -      49.47us
    accept_queue_process              2   45.62us   22.81us      -         -      4.949us   54.33us
2025-09-11 16:32:34 +02:00
Willy Tarreau
195794eb59 MINOR: activity: add a new mem_avg column to show profiling stats
This new column will be used for reporting the average time spent
allocating or freeing memory in a task when task profiling is enabled.
For now it is not updated.
2025-09-11 16:32:34 +02:00
Willy Tarreau
98cc815e3e MINOR: activity: collect time spent with a lock held for each task
When DEBUG_THREAD > 0 and task profiling is enabled, we'll now measure the
time spent with at least one lock held for each task. The time is
collected by locking operations when locks are taken raising the level
to one, or released resetting the level. An accumulator is updated in
the thread_ctx struct that is collected by the scheduler when the task
returns, and updated in the sched_activity entry of the related task.

This makes it possible to observe figures like this one:

  Tasks activity over 259.516 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lkd_avg   lat_avg
    h1_io_cb                   15466589   2.574m    9.984us      -         -      33.45us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup
    sc_conn_io_cb               8047994   8.325s    1.034us      -         -      870.1us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup
    process_stream              7734689   4.356m    33.79us   1.990us   1.641us   1.554ms <- sc_notify@src/stconn.c:1206 task_wakeup
    process_stream              7734292   46.74m    362.6us   278.3us   132.2us   972.0us <- stream_new@src/stream.c:585 task_wakeup
    sc_conn_io_cb               7733158   46.88s    6.061us      -         -      68.78us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup
    task_process_applet         6603593   4.484m    40.74us   16.69us   34.00us   96.47us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup
    task_process_applet         4761796   3.420m    43.09us   18.79us   39.28us   138.2us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup
    process_table_expire        4710662   4.880m    62.16us   9.648us   53.95us   158.6us <- run_tasks_from_lists@src/task.c:671 task_queue
    stktable_add_pend_updates   4171868   6.786s    1.626us      -      1.487us   47.94us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup
    h1_io_cb                    2871683   1.198s    417.0ns   70.00ns   69.00ns   1.005ms <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup
    process_peer_sync           2304957   5.368s    2.328us      -      1.156us   68.54us <- stktable_add_pend_updates@src/stick_table.c:873 task_wakeup
    process_peer_sync           1388141   3.174s    2.286us      -      1.130us   52.31us <- run_tasks_from_lists@src/task.c:671 task_queue
    stktable_add_pend_updates    463488   3.530s    7.615us   2.000ns   7.134us   771.2us <- stktable_touch_with_exp@src/stick_table.c:654 tasklet_wakeup

Here we see that almost the entirety of stktable_add_pend_updates() is
spent under a lock, that 1/3 of the execution time of process_stream()
was performed under a lock and that 2/3 of it was spent waiting for a
lock (this is related to the 10 track-sc present in this config), and
that the locking time in process_peer_sync() has now significantly
reduced. This is more visible with "show profiling tasks aggr":

  Tasks activity over 475.354 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lkd_avg   lat_avg
    h1_io_cb                   25742539   3.699m    8.622us   11.00ns   10.00ns   188.0us
    sc_conn_io_cb              22565666   1.475m    3.920us      -         -      473.9us
    process_stream             21665212   1.195h    198.6us   140.6us   67.08us   1.266ms
    task_process_applet        16352495   11.31m    41.51us   17.98us   36.55us   112.3us
    process_peer_sync           7831923   17.15s    2.189us      -      1.107us   41.27us
    process_table_expire        6878569   6.866m    59.89us   9.359us   51.91us   151.8us
    stktable_add_pend_updates   6602502   14.77s    2.236us      -      2.060us   119.8us
    h1_timeout_task                 801   703.4us   878.0ns      -         -      185.7us
    srv_cleanup_toremove_conns      347   12.43ms   35.82us   240.0ns   70.00ns   1.924ms
    accept_queue_process            142   1.384ms   9.743us      -         -      340.6us
    srv_cleanup_idle_conns           74   475.0us   6.418us   896.0ns   5.667us   114.6us
2025-09-11 16:32:34 +02:00
Willy Tarreau
95433f224e MINOR: activity: add a new lkd_avg column to show profiling stats
This new column will be used for reporting the average time spent
in a task with at least one lock held. It will only have a non-zero
value when DEBUG_THREAD > 0. For now it is not updated.
2025-09-11 16:32:34 +02:00
Willy Tarreau
4b23b2ed32 MINOR: thread: add a lock level information in the thread_ctx
The new lock_level field indicates the number of cumulated locks that
are held by the current thread. It's fed as soon as DEBUG_THREAD is at
least 1. In addition, thread_isolate() adds 128, so that it's even
possible to check for combinations of both. The value is also reported
in thread dumps (warnings and panics).
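
The counting scheme can be sketched as follows. This is an illustration only: the names (struct thr_ctx, ISOLATE_BIAS and the helpers) are hypothetical, and it assumes that fewer than 128 locks are held at once and that leaving isolation subtracts the same 128.

```c
#include <assert.h>

/* Hypothetical name for the value added while the thread is isolated. */
#define ISOLATE_BIAS 128u

/* Hypothetical, reduced thread context holding only the lock level. */
struct thr_ctx {
    unsigned int lock_level;
};

/* Called when a lock is taken. */
void on_lock(struct thr_ctx *c)
{
    c->lock_level++;
}

/* Called when a lock is released. */
void on_unlock(struct thr_ctx *c)
{
    assert((c->lock_level % ISOLATE_BIAS) > 0);
    c->lock_level--;
}

/* Called when entering isolation. */
void on_isolate(struct thr_ctx *c)
{
    c->lock_level += ISOLATE_BIAS;
}

/* Called when leaving isolation (assumed to undo on_isolate()). */
void on_release(struct thr_ctx *c)
{
    assert(c->lock_level >= ISOLATE_BIAS);
    c->lock_level -= ISOLATE_BIAS;
}

/* Number of locks currently held, ignoring the isolation bias. */
unsigned int locks_held(const struct thr_ctx *c)
{
    return c->lock_level % ISOLATE_BIAS;
}

/* Non-zero if the thread is currently isolated. */
int is_isolated(const struct thr_ctx *c)
{
    return c->lock_level >= ISOLATE_BIAS;
}
```

Keeping both pieces of information in one integer lets a dump show them with a single value, for instance 130 for two locks held under isolation.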
2025-09-11 16:32:34 +02:00
Willy Tarreau
503084643f MINOR: activity: collect time spent waiting on a lock for each task
When DEBUG_THREAD > 0, and if task profiling is enabled, then each
locking attempt will measure the time it takes to obtain the lock, then
add that time to a thread_ctx accumulator that the scheduler will then
retrieve to update the current task's sched_activity entry. The value
will then appear averaged over the number of calls in the lkw_avg column
of "show profiling tasks", such as below:

  Tasks activity over 48.298 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lat_avg
    h1_io_cb                    3200170   26.81s    8.377us      -      32.73us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup
    sc_conn_io_cb               1657841   1.645s    992.0ns      -      853.0us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup
    process_stream              1600450   49.16s    30.71us   1.936us   1.392ms <- sc_notify@src/stconn.c:1206 task_wakeup
    process_stream              1600321   7.770m    291.3us   209.1us   901.6us <- stream_new@src/stream.c:585 task_wakeup
    sc_conn_io_cb               1599928   7.975s    4.984us      -      65.77us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup
    task_process_applet          997609   46.37s    46.48us   16.80us   113.0us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup
    process_table_expire         922074   48.79s    52.92us   7.275us   181.1us <- run_tasks_from_lists@src/task.c:670 task_queue
    stktable_add_pend_updates    705423   1.511s    2.142us      -      56.81us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup
    task_process_applet          683511   34.75s    50.84us   18.37us   153.3us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup
    h1_io_cb                     535395   198.1ms   370.0ns   72.00ns   930.4us <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup

It now makes it pretty obvious which tasks (hence call chains) spend their
time waiting on a lock and for what share of their execution time.
2025-09-11 16:32:34 +02:00
Willy Tarreau
1956c544b5 MINOR: activity: add a new lkw_avg column to show profiling stats
This new column will be used for reporting the average time spent waiting
for a lock. It will only have a non-zero value when DEBUG_THREAD > 0. For
now it is not updated.
2025-09-11 16:32:34 +02:00
Willy Tarreau
9f7ce9e807 MINOR: activity: don't report the lat_tot column for show profiling tasks
This column is pretty useless, as the total latency experienced by tasks
is meaningless, what matters is the average per call. Since we'll add more
columns and we need to keep all of this readable, let's get rid of this
column.
2025-09-11 16:32:34 +02:00
Christopher Faulet
3023e98199 BUG/MINOR: resolvers: Restore round-robin selection on records in DNS answers
Since the commit dcb696cd3 ("MEDIUM: resolvers: hash the records before
inserting them into the tree"), when several records are found in a DNS
answer, the round robin selection over these records is no longer performed.

Indeed, a list of records was used before. To ensure each record was
selected one after the other, at each selection, the first record of the
list was moved to the end. When this list was replaced by a tree, the same
mechanism was preserved. However, the record is indexed using its key, a
hash of the record. So its position never changes. When it is removed and
reinserted in the tree, its position remains the same. When we walk through
the tree, starting from the root, the records are always evaluated in the
same order. So, even if there are several records in a DNS answer, the same
IP address is always selected.

It is quite easy to trigger the issue with a do-resolv action.

To fix the issue, the node to perform the next selection is now saved. So
instead of restarting from the root each time, we can restart from the next
node of the previous call.

Thanks to Damien Claisse for the issue analysis and for the reproducer.

This patch should fix the issue #3116. It must be backported as far as 2.6.
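
The idea of saving the next node can be sketched as follows. This is a
simplified model, not the resolvers code: a sorted array stands in for the
tree, and the names rr_answer and rr_pick() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the record tree: records are kept in key
 * order, so removing and re-inserting one never changes its position.
 */
struct rr_answer {
    const char **records; /* records in key order */
    size_t count;         /* number of records */
    size_t next;          /* saved position for the next selection */
};

/* Returns the next record in round-robin order, or NULL if there is none.
 * The walk resumes after the previously returned record instead of
 * restarting from the first one.
 */
const char *rr_pick(struct rr_answer *ans)
{
    const char *rec;

    if (!ans->count)
        return NULL;
    if (ans->next >= ans->count)
        ans->next = 0; /* the saved position went stale, wrap around */
    rec = ans->records[ans->next];
    ans->next = (ans->next + 1) % ans->count;
    return rec;
}
```

Calling rr_pick() repeatedly on an answer holding three records returns
each of them in turn before coming back to the first one.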
2025-09-11 15:46:45 +02:00
Christopher Faulet
37abe56b18 BUG/MEDIUM: resolvers: Properly cache do-resolv resolution
As stated by the documentation, when a do-resolv resolution is performed,
the result should be cached for <hold.valid> milliseconds. However, the only
way to cache the result is to always have a requester. When the last
requester is unlinked from the resolution, the resolution is released. So, for
a do-resolv resolution, it means it could only work by chance if the same
FQDN is requested enough to always have at least two streams waiting for the
resolution. And because in that case, the cached result is used, it means
the traffic must be quite high.

In fact, a good approach to fix the issue is to keep orphan resolutions to
be able to cache the result and only release them hold.valid milliseconds
after the last real resolution. The resolver's task already releases orphan
resolutions. So we only need to check the expiration date and take care
not to release the resolution when the last stream is unlinked from it.

This patch should be backported to all stable versions. We can start to
backport it as far as 3.1 and then wait a bit.
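
The release condition described above can be sketched like this. The
structure and function names are hypothetical and the real code relies on
HAProxy's own tick helpers; this only shows the decision being made:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a resolution. */
struct resolution_sketch {
    unsigned int nb_requesters; /* streams or servers still attached */
    uint64_t last_valid_ms;     /* date of the last real resolution */
};

/* Returns true if the resolution may be released at <now_ms>: it must be
 * orphan AND its cached result must be older than <hold_valid_ms>.
 */
bool resolution_may_release(const struct resolution_sketch *res,
                            uint64_t now_ms, uint64_t hold_valid_ms)
{
    if (res->nb_requesters)
        return false; /* still in use */
    if (now_ms < res->last_valid_ms)
        return false; /* clock went backwards, keep the entry */
    return now_ms - res->last_valid_ms >= hold_valid_ms;
}
```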
2025-09-11 15:46:45 +02:00
William Lallemand
fb832e1e52 BUILD: ssl: functions defined but not used
Previous patch 50d191b ("MINOR: ssl: set functions as static when no
protypes in the .h") broke the WolfSSL function with unused functions.

This patch adds __maybe_unused to ssl_sock_sctl_parse_cbk(),
ssl_sock_sctl_add_cbk() and ssl_sock_msgcbk().
2025-09-11 15:32:59 +02:00
William Lallemand
50d191b8a3 MINOR: ssl: set functions as static when no protypes in the .h
Check with -Wmissing-prototypes what should be static.

src/ssl_sock.c:1572:5: error: no previous prototype for ‘ssl_sock_sctl_add_cbk’ [-Werror=missing-prototypes]
 1572 | int ssl_sock_sctl_add_cbk(SSL *ssl, unsigned ext_type, const unsigned char **out, size_t *outlen, int *al, void *add_arg)
      |     ^~~~~~~~~~~~~~~~~~~~~
src/ssl_sock.c:1582:5: error: no previous prototype for ‘ssl_sock_sctl_parse_cbk’ [-Werror=missing-prototypes]
 1582 | int ssl_sock_sctl_parse_cbk(SSL *s, unsigned int ext_type, const unsigned char *in, size_t inlen, int *al, void *parse_arg)
      |     ^~~~~~~~~~~~~~~~~~~~~~~
src/ssl_sock.c:1604:6: error: no previous prototype for ‘ssl_sock_infocbk’ [-Werror=missing-prototypes]
 1604 | void ssl_sock_infocbk(const SSL *ssl, int where, int ret)
      |      ^~~~~~~~~~~~~~~~
src/ssl_sock.c:2107:6: error: no previous prototype for ‘ssl_sock_msgcbk’ [-Werror=missing-prototypes]
 2107 | void ssl_sock_msgcbk(int write_p, int version, int content_type, const void *buf, size_t len, SSL *ssl, void *arg)
      |      ^~~~~~~~~~~~~~~
src/ssl_sock.c:3936:5: error: no previous prototype for ‘sh_ssl_sess_new_cb’ [-Werror=missing-prototypes]
 3936 | int sh_ssl_sess_new_cb(SSL *ssl, SSL_SESSION *sess)
      |     ^~~~~~~~~~~~~~~~~~
src/ssl_sock.c:3990:14: error: no previous prototype for ‘sh_ssl_sess_get_cb’ [-Werror=missing-prototypes]
 3990 | SSL_SESSION *sh_ssl_sess_get_cb(SSL *ssl, __OPENSSL_110_CONST__ unsigned char *key, int key_len, int *do_copy)
      |              ^~~~~~~~~~~~~~~~~~
src/ssl_sock.c:4043:6: error: no previous prototype for ‘sh_ssl_sess_remove_cb’ [-Werror=missing-prototypes]
 4043 | void sh_ssl_sess_remove_cb(SSL_CTX *ctx, SSL_SESSION *sess)
      |      ^~~~~~~~~~~~~~~~~~~~~
src/ssl_sock.c:4075:6: error: no previous prototype for ‘ssl_set_shctx’ [-Werror=missing-prototypes]
 4075 | void ssl_set_shctx(SSL_CTX *ctx)
      |      ^~~~~~~~~~~~~
src/ssl_sock.c:4103:6: error: no previous prototype for ‘SSL_CTX_keylog’ [-Werror=missing-prototypes]
 4103 | void SSL_CTX_keylog(const SSL *ssl, const char *line)
      |      ^~~~~~~~~~~~~~
src/ssl_sock.c:5167:6: error: no previous prototype for ‘ssl_sock_deinit’ [-Werror=missing-prototypes]
 5167 | void ssl_sock_deinit()
      |      ^~~~~~~~~~~~~~~
src/ssl_sock.c:6976:6: error: no previous prototype for ‘ssl_sock_close’ [-Werror=missing-prototypes]
 6976 | void ssl_sock_close(struct connection *conn, void *xprt_ctx) {
      |      ^~~~~~~~~~~~~~
src/ssl_sock.c:7846:17: error: no previous prototype for ‘ssl_action_wait_for_hs’ [-Werror=missing-prototypes]
 7846 | enum act_return ssl_action_wait_for_hs(struct act_rule *rule, struct proxy *px,
      |                 ^~~~~~~~~~~~~~~~~~~~~~
2025-09-11 15:23:59 +02:00
William Lallemand
19daee6549 MINOR: ocsp: put internal functions as static ones
-Wmissing-prototypes lets us check which functions can be made static and
are not used elsewhere.

rc/ssl_ocsp.c:1079:5: error: no previous prototype for ‘ssl_ocsp_update_insert_after_error’ [-Werror=missing-prototypes]
 1079 | int ssl_ocsp_update_insert_after_error(struct certificate_ocsp *ocsp)
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ocsp.c:1116:6: error: no previous prototype for ‘ocsp_update_response_stline_cb’ [-Werror=missing-prototypes]
 1116 | void ocsp_update_response_stline_cb(struct httpclient *hc)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ocsp.c:1127:6: error: no previous prototype for ‘ocsp_update_response_headers_cb’ [-Werror=missing-prototypes]
 1127 | void ocsp_update_response_headers_cb(struct httpclient *hc)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ocsp.c:1138:6: error: no previous prototype for ‘ocsp_update_response_body_cb’ [-Werror=missing-prototypes]
 1138 | void ocsp_update_response_body_cb(struct httpclient *hc)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ocsp.c:1149:6: error: no previous prototype for ‘ocsp_update_response_end_cb’ [-Werror=missing-prototypes]
 1149 | void ocsp_update_response_end_cb(struct httpclient *hc)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ocsp.c:2095:5: error: no previous prototype for ‘ocsp_update_postparser_init’ [-Werror=missing-prototypes]
 2095 | int ocsp_update_postparser_init()
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-09-11 15:18:48 +02:00
William Lallemand
0224d60de6 BUG/MINOR: ocsp: prototype inconsistency
Inconsistencies between the .h and the .c can't be caught because the
.h is not included in the .c.

ocsp_update_init() does not have the right prototype and lacks a const
attribute.

Must be backported in all previous stable versions.
2025-09-11 15:18:10 +02:00
Remi Tricot-Le Breton
e0844a305c BUG/MINOR: ssl: Fix potential NULL deref in trace callback
'conn' might be NULL in the trace callback so the calls to
conn_err_code_str must be covered by a proper check.

This issue was found by Coverity and raised in GitHub #3112.
The patch must be backported to 3.2.
2025-09-11 14:31:32 +02:00
Remi Tricot-Le Breton
a316342ec6 BUG/MINOR: ssl: Potential NULL deref in trace macro
'ctx' might be NULL when we exit 'ssl_sock_handshake', it can't be
dereferenced without check in the trace macro.

This was found by Coverity and raised in GitHub #3113.
This patch should be backported up to 3.2.
2025-09-11 14:31:32 +02:00
William Lallemand
e52e6f66ac BUG/MEDIUM: jws: return size_t in JWS functions
JWS functions are supposed to return 0 upon error or when nothing was
produced. This was done in order to easily put the return value in
trash->data without having to check the return value.

However functions like a2base64url() or snprintf() could return a
negative value, which would be cast to an unsigned int if this happens.

This patch adds checks on the JWS functions to ensure that no negative
value can be returned, and changes the prototype from int to size_t.

This is also related to issue #3114.

Must be backported to 3.2.
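
A minimal sketch of the pattern, assuming a hypothetical helper named
jws_sketch_header() (this is not one of the actual JWS functions): the
int returned by snprintf() is checked before being converted to size_t,
so that errors and truncations both yield 0.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes {"alg":"<alg>"} into <dst>. Returns the number of bytes written
 * (excluding the trailing zero), or 0 on error or truncation. The name
 * and output format are illustrative only.
 */
size_t jws_sketch_header(const char *alg, char *dst, size_t dsize)
{
    int ret;

    ret = snprintf(dst, dsize, "{\"alg\":\"%s\"}", alg);
    if (ret < 0 || (size_t)ret >= dsize)
        return 0; /* never let a negative int reach the size_t */
    return (size_t)ret;
}
```

The caller can then store the result directly as a length, since 0 always
means that nothing usable was produced.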
2025-09-11 14:31:32 +02:00
William Lallemand
66a7ebfeef BUG/MINOR: acme: null pointer dereference upon allocation failure
Reported in issue #3115:

     	11. var_compare_op: Comparing task to null implies that task might be null.
681                if (!task) {
682                        ret++;
683                        ha_alert("acme: couldn't start the scheduler!\n");
684                }
CID 1609721: (#1 of 1): Dereference after null check (FORWARD_NULL)
12. var_deref_op: Dereferencing null pointer task.
685                task->nice = 0;
686                task->process = acme_scheduler;
687
688                task_wakeup(task, TASK_WOKEN_INIT);
689        }
690

Task would be dereferenced upon allocation failure instead of falling
back to the end of the function after the error.

Should be backported in 3.2.
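
The shape of such a fix can be sketched as below. The types and names are
hypothetical and the allocator is passed as a parameter only so that the
failure path can be exercised; the actual patch may be structured
differently.

```c
#include <stdlib.h>

/* Hypothetical, reduced task. */
struct task_sketch {
    int nice;
    int started;
};

/* Returns the number of errors (0 on success). On allocation failure the
 * function jumps to its end without ever dereferencing <task>.
 */
int scheduler_start_sketch(void *(*alloc)(size_t), struct task_sketch **res)
{
    struct task_sketch *task;
    int ret = 0;

    *res = NULL;
    task = alloc(sizeof(*task));
    if (!task) {
        ret++;
        goto end; /* skip every access to <task> */
    }
    task->nice = 0;
    task->started = 1;
    *res = task;
end:
    return ret;
}

/* Allocator that always fails, to simulate an out-of-memory condition. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```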
2025-09-11 14:31:32 +02:00
Amaury Denoyelle
c15129f7dc DOC: quic: clarifies limited-quic support
This patch extends the documentation for "limited-quic" global keyword.
It mentions first that it relies on USE_QUIC_OPENSSL_COMPAT=1 build
option.

Compatibility with TLS libraries is now clearly exposed. In particular,
it highlights the fact that it is mostly targeted at OpenSSL versions
prior to 3.5.2, and that it should be disabled if a recent OpenSSL
release is available. It also states that limited-quic does nothing if
USE_QUIC_OPENSSL_COMPAT is not set during compilation.
2025-09-11 10:11:12 +02:00
Amaury Denoyelle
d293cc62dc MINOR: quic: display build warning for compat layer on recent OpenSSL
Build option USE_QUIC_OPENSSL_COMPAT=1 must be set to activate QUIC
support for OpenSSL prior to version 3.5.2. This compiles an internal
compatibility layer, which must be then activated at runtime with global
option limited-quic.

Starting from OpenSSL version 3.5.2, a proper QUIC TLS API is now
exposed. Thus, the compatibility layer is unneeded. However it can still
be compiled against newer OpenSSL releases and activated at runtime,
mostly for test purpose.

As this compatibility layer has some limitations, (no support for QUIC
0-RTT), it's important that users notice this situation and disable it
if possible. Thus, this patch adds a notice warning when
USE_QUIC_OPENSSL_COMPAT=1 is set when building against OpenSSL 3.5.2 and
above. This should be sufficient for users and packagers to understand
that this option is not necessary anymore.

Note that USE_QUIC_OPENSSL_COMPAT=1 is incompatible with other TLS
libraries which expose a QUIC API based on the original BoringSSL patch
set. A build error will prevent the compatibility layer from being built.
The limited-quic option is thus silently ignored.
2025-09-11 10:11:12 +02:00
Frederic Lecaille
5027ba36a9 MINOR: quic-be: make SSL/QUIC objects use their own indexes (ssl_qc_app_data_index)
This index is used to retrieve the quic_conn object from its SSL object, the same
way the connection is retrieved from its SSL object for SSL/TCP connections.

This patch implements two helper functions to avoid the ugly code with such blocks:

   #ifdef USE_QUIC
   else if (qc) { .. }
   #endif

Implement ssl_sock_get_listener() to return the listener from an SSL object.
Implement ssl_sock_get_conn() to return the connection from an SSL object
and optionally a pointer to the ssl_sock_ctx struct attached to the connections
or the quic_conns.

Use these functions where applicable:
   - ssl_tlsext_ticket_key_cb() calls ssl_sock_get_listener()
   - ssl_sock_infocbk() calls ssl_sock_get_conn()
   - ssl_sock_msgcbk() calls ssl_sock_get_ssl_conn()
   - ssl_sess_new_srv_cb() calls ssl_sock_get_conn()
   - ssl_sock_srv_verifycbk() calls ssl_sock_get_conn()

Also modify qc_ssl_sess_init() to initialize the ssl_qc_app_data_index index for
the QUIC backends.
2025-09-11 09:51:28 +02:00
Frederic Lecaille
47bb15ca84 MINOR: quic: get rid of ->target quic_conn struct member
The ->li (struct listener *) member of quic_conn struct was replaced by a
->target (struct obj_type *) member by this commit:

    MINOR: quic-be: get rid of ->li quic_conn member

to abstract the connection type (front or back) when implementing QUIC for the
backends. In these cases, ->target was a pointer to the obj_type of a server
struct. This could not work with the dynamic servers contrary to the listeners
which are not dynamic.

This patch almost reverts the one mentioned above. ->target pointer to obj_type member
is replaced by ->li pointer to listener struct member. As the listeners are not
dynamic, this is easy to do. All one has to do is to replace the
objt_listener(qc->target) statement by qc->li where applicable.

For the backend connection, when needed, this is always qc->conn->target which is
used only when qc->conn is initialized. The only "problematic" case is for
quic_dgram_parse() which takes a pointer to an obj_type as third argument.
But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function
it is used to access the proxy counters of the connection thanks to qc_counters().
So, this obj_type argument may be null from now on with this patch. This is the
reason why qc_counters() is modified to take this into consideration.
2025-09-11 09:51:28 +02:00
Christopher Faulet
5354c24c76 BUG/MAJOR: stream: Force channel analysis on successful synchronous send
This patch reverts commit a498e527b ("BUG/MAJOR: stream: Remove READ/WRITE
events on channels after analysers eval") because of a regression. It was an
attempt to properly detect synchronous sends, even when the stream was woken
up on a write event. However, the fix was wrong because it could mask
shutdowns performed during process_stream() and block the stream.

Indeed, when a shutdown is performed, because an error occurred for
instance, a write event is reported. The commit above could mask this event
while the shutdown prevents any synchronous sends. In such a case, the stream
could remain blocked indefinitely because an I/O event was missed.

So to properly fix the original issue (#3070), the write event must not be
masked before a synchronous send. Instead, we now force the channel analysis
by explicitly setting the CF_WAKE_ONCE flag on the corresponding channel if a
write event is reported after the synchronous send. The CF_WRITE_EVENT flag is
explicitly removed just before, so it is quite easy to detect.

This patch must be backported to all stable versions at the same time as the
commit above.
2025-09-11 09:47:47 +02:00
Willy Tarreau
ded2110ec6 MEDIUM: peers: move process_peer_sync() to a single thread
The remaining half of the task_queue() and task_wakeup() contention
is caused by this function when peers are in use, because just like
process_table_expire(), it's created using task_new_anywhere() and
is woken up for local updates. Let's turn it to single thread by
rotating the assigned threads during initialization so that a table
only runs on one thread at a time.

Here we go backwards to assign the threads, so that on small setups
they don't end up on the same CPUs as the ones used by the stick-tables.
This way this will make an even better use of large machines. The
performance remains the same as with previous patch, even slightly
better (1-3% on avg).

At this point there's almost no multi-threaded task activity anymore
(only srv_cleanup_idle_server once in a while). This should improve
the situation described by Felipe in issues #3084 and #3101.

This should be backported to 3.2 after some extended checks.
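
The rotation can be pictured with a simple counter. This is only a sketch
with hypothetical helper names, assuming one counter per kind of task and
nbthread > 0; the forward variant matches what is described for the
stick-tables and the backward one what is described here for the peers:

```c
/* Forward rotation: 0, 1, 2, ... then wraps around. */
unsigned int assign_thread_fwd(unsigned int *counter, unsigned int nbthread)
{
    return (*counter)++ % nbthread;
}

/* Backward rotation: nbthread-1, nbthread-2, ... then wraps around, so
 * that on small setups the first tasks land away from thread 0.
 */
unsigned int assign_thread_bwd(unsigned int *counter, unsigned int nbthread)
{
    return nbthread - 1 - ((*counter)++ % nbthread);
}
```

With 4 threads and a single table, the table's expiration task would get
thread 0 while its peers sync task would get thread 3.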
2025-09-10 19:14:05 +02:00
Willy Tarreau
e05afda249 MEDIUM: stick-table: move process_table_expire() to a single thread
A great deal of the task_queue() contention is caused by this function
because it's created using task_new_anywhere() and is subject to
heavy updates. Let's turn it to single thread by rotating the assigned
threads during initialization so that a table only runs on one thread
at a time.

However there's a trick: the function used to call task_queue() to
requeue the task if it had advanced its timer (may only happen when
learning an entry from a peer). We can't do that anymore since we can't
queue another thread's task. Thus when the task needs to be scheduled
earlier than previously planned, we simply perform a wakeup instead.
It will likely do nothing and will self-adjust its next wakeup timer.

Doing so halves the number of multi-thread task wakeups. In addition
the request rate at saturation increased by 12% with 16 peers and 40
tables on 16 8-thread processes. This should improve the situation
described by Felipe in issues #3084 and #3101.

This should be backported to 3.2 after some extended checks.
2025-09-10 19:13:33 +02:00
Willy Tarreau
2831cb104f BUG/MINOR: stick-table: make sure never to miss a process_table_expire update
In stktable_requeue_exp(), there's a tiny race at the beginning during
which we check the task's expiration date to decide whether or not to
wake process_table_expire() up. During this race, the task might just
have finished running on its owner thread and we can miss a task_queue()
opportunity, which probably explains why during testing it seldom happens
that a few entries are left at the end.

Let's perform a CAS to confirm the value is still the same before
leaving. This way we're certain that our value has been seen at least
once.

This should be backported to 3.2.
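
A sketch of the principle using C11 atomics is shown below. It is not the
stktable_requeue_exp() code: the function name is hypothetical, plain
integer comparison replaces HAProxy's wrapping tick comparison, and the
wakeup itself is left to the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Makes sure <task_expire> is not later than <new_exp>. Returns true if
 * the date was lowered, in which case the caller must wake the task up.
 * Even when nothing needs to change, a CAS writes the same value back to
 * confirm that the date did not move between the read and the decision.
 */
bool requeue_exp_sketch(_Atomic uint32_t *task_expire, uint32_t new_exp)
{
    uint32_t old_exp = atomic_load(task_expire);

    for (;;) {
        uint32_t want = (old_exp <= new_exp) ? old_exp : new_exp;

        /* on failure <old_exp> is refreshed and the test is redone */
        if (atomic_compare_exchange_weak(task_expire, &old_exp, want))
            return want != old_exp;
    }
}
```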
2025-09-10 18:45:01 +02:00
Willy Tarreau
2ce5e0edcc MEDIUM: resolvers: make the process_resolvers() task single-threaded
This task is sometimes caught triggering the watchdog while waiting for
the infamous resolvers lock, or the scheduler's wait queue lock in
task_queue(). Both are caused by its multi-threaded capability. The
task may indeed start on a thread that's different from the one that
is currently receiving a response and that holds the resolvers lock,
and when being queued back, it requires to lock the wait queue. Both
problems disappear when sticking it to a single thread. But for configs
running multiple resolvers sections, it would be suboptimal to run them
all on the same thread. In order to avoid this, we implement a counter
in the resolvers_finalize_config() section that rotates the thread for
each resolvers section.

This was sufficient to further improve the performance here, making the
CPU usage drop to about 7% (from 11 previously or 38 initially) and not
showing any resolvers lock contention anymore in perf top output.

The change was kept fairly minimal to permit a backport once enough
testing is conducted on it. It could address a significant part of
the trouble reported by Felipe in GH issue #3101.
2025-09-10 16:51:14 +02:00
Willy Tarreau
d624aceaef MEDIUM: dns: bind the nameserver sockets to the initiating thread
There's still a big architectural limitation in the dns/resolvers code
regarding threads: resolvers run as a task that is scheduled to run
anywhere, and each NS dgram socket is bound to any thread of the same
thread group as the initiating thread. This becomes a big problem when
dealing with multiple nameservers because responses arrive on any thread,
start by locking the resolvers section, and other threads dealing with
responses are just stuck waiting for the lock to disappear. This means
that most of the time is exclusively spent causing contention. The
process_resolvers() function also suffers from this contention
but apparently less often.

It turns out that the nameserver sockets are created during emission
of the first packet, triggered from the resolvers task. The present
patch exploits this to stick all sockets to the calling thread instead
of any thread. This way there is no longer any contention between
multiple nameservers of a same resolvers section. Tests with a section
having 10 name servers showed that the CPU usage dropped from 38 to
about 10%, or almost by a factor of 4.

Note that TCP resolvers do not offer this possibility because the
tasks that manage the applets are created earlier to run anywhere
during config parsing. This might possibly be refined later, e.g.
by changing the task's affinity when it first runs.

The change was kept fairly minimal to permit a backport once enough
testing is conducted on it. It could address a significant part of
the trouble reported by Felipe in GH issue #3101.
2025-09-10 16:48:09 +02:00
Olivier Houchard
07c10ec2f1 BUG/MEDIUM: ssl: Fix a crash if we failed to create the mux
In ssl_sock_io_cb(), if we failed to create the mux, we may have
destroyed the connection, so only attempt to access it to get the ALPN
if conn_create_mux() was successful.
This fixes crashes that may happen when using ssl.
2025-09-10 12:02:53 +02:00
Olivier Houchard
1759c97255 BUG/MEDIUM: ssl: Fix a crash when using QUIC
Commit 5ab9954faa9c815425fa39171ad33e75f4f7d56f introduced a new flag in
ssl_sock_ctx, to know that an ALPN was negotiated, however, the way to
get the ssl_sock_ctx was wrong for QUIC. If we're using QUIC, get it
from the quic_conn.
This should fix crashes when attempting to use QUIC.
2025-09-10 11:45:03 +02:00
Willy Tarreau
be86a69fe8 DEBUG: stick-tables: export stktable_add_pend_updates() for better reporting
This function is a tasklet handler used to send peers updates, and it can
happen quite a bit in "show tasks" and "show profiling tasks", so let's
export it so that we don't face a cryptic symbol name:

  $ socat - /tmp/haproxy-n10.stat <<< "show tasks"
  Running tasks: 43 (8 threads)
    function                     places     %    lat_tot   lat_avg  calls_tot  calls_avg calls%
    process_table_expire             16   37.2   1.072m    4.021s      115831       7239   15.4
    task_process_applet              15   34.8   1.072m    4.287s      486299      32419   65.0
    stktable_add_pend_updates         8   18.6      -         -         89725      11215   12.0
    sc_conn_io_cb                     3    6.9      -         -          5007       1669    0.6
    process_peer_sync                 1    2.3   4.293s    4.293s       50765      50765    6.7

This should be backported to 3.2 as it helps with debugging the
table+peers processing overhead.
2025-09-10 11:34:51 +02:00
Willy Tarreau
993c09438b BUG/MEDIUM: stick-tables: don't loop on non-expirable entries
The stick-table expiration of ref-counted entries was insufficiently
addressed by commit 324f0a60ab ("BUG/MINOR: stick-tables: never leave
used entries without expiration"), because now entries are just requeued
where they were, so they're visited over and over for long sessions,
causing process_table_expire() to loop, eating CPU and causing lock
contention.

Here we take care of refreshing their timer when they are met, so
that we don't meet them more than once per stick-table lifetime. It
should address at least a part of the recent degradation that Felipe
noticed in GH #3084.

Since the fix above was marked for backporting to 3.2, this one should
be backported there as well.
2025-09-10 11:27:27 +02:00
Willy Tarreau
997d217dee MINOR: tools: don't emit "+0" for symbol names which exactly match known ones
resolve_sym_name() knows a number of symbols, but when one exactly matches
(e.g. a task's handler), it systematically displays the offset behind it
("+0"). Let's only show the offset when non-zero. This can be backported
as this is helpful for debugging.
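
The behaviour can be sketched as follows. format_sym_sketch() is a
hypothetical helper and the hexadecimal offset format is an assumption;
it is not the resolve_sym_name() implementation:

```c
#include <stdio.h>
#include <stddef.h>

/* Formats "<name>+<offset>" into <dst>, omitting the offset when it is
 * zero. Returns what snprintf() returns.
 */
int format_sym_sketch(char *dst, size_t dsize, const char *name, size_t offset)
{
    if (offset)
        return snprintf(dst, dsize, "%s+%#zx", name, offset);
    return snprintf(dst, dsize, "%s", name);
}
```

An exact match on a handler is then reported by its bare name, while an
address inside a function still carries its offset.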
2025-09-10 10:44:33 +02:00
Willy Tarreau
9eb35563a6 MINOR: activity: indicate the number of calls on "show tasks"
The "show tasks" command can be useful to inspect run queues for active
tasks, but currently it's difficult to distinguish an occasional running
task from a heavily active one. Let's collect the number of calls for
each of them, report them averaged over the number of instances of each task
as well as a percentage of the total used. This way it even becomes
possible to get a hint about how CPU usage is distributed.
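
The two derived values can be computed as in the sketch below. The names
are hypothetical and the truncating divisions are an assumption about how
the values are rounded:

```c
#include <stdint.h>

/* Hypothetical per-function counters. */
struct task_stat_sketch {
    unsigned int places; /* number of instances of this task */
    uint64_t calls_tot;  /* total number of calls for all instances */
};

/* Calls averaged over the number of instances, truncated. */
uint64_t calls_avg(const struct task_stat_sketch *s)
{
    return s->places ? s->calls_tot / s->places : 0;
}

/* Share of all calls in tenths of a percent, truncated (154 = 15.4%). */
unsigned int calls_pct_x10(uint64_t calls, uint64_t total)
{
    return total ? (unsigned int)(calls * 1000 / total) : 0;
}
```

For instance, 115831 calls spread over 16 instances give an average of
7239, and 115831 calls out of a total of 747627 give 15.4%.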
2025-09-10 10:44:33 +02:00
Willy Tarreau
17d3392348 BUG/MINOR: activity: fix reporting of task latency
In 2.4, "show tasks" was introduced by commit 7eff06e162 ("MINOR:
activity: add a new "show tasks" command to list currently active tasks")
to expose some info about running tasks. The latency is not correct
because it's a u32 subtracted from a u64. It ought to have been cast
to u32 for the operation, which is what this patch does.

This can be backported to 2.4.
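
The effect of the missing cast can be reproduced with a small sketch. The
function and parameter names are hypothetical; only the arithmetic is
meant to match the description above:

```c
#include <stdint.h>

/* Wrong: the 64-bit date keeps its upper bits, so the result is off by a
 * multiple of 2^32 as soon as the date exceeds 32 bits.
 */
uint64_t lat_wrong(uint64_t now_ns, uint32_t wake_date)
{
    return now_ns - wake_date;
}

/* Fixed: both operands are reduced to 32 bits first, and the unsigned
 * subtraction wraps correctly even when the low 32 bits have rolled over.
 */
uint32_t lat_fixed(uint64_t now_ns, uint32_t wake_date)
{
    return (uint32_t)now_ns - wake_date;
}
```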
2025-09-10 10:44:33 +02:00
Willy Tarreau
bdff394195 BUILD: ssl: address a recent build warning when QUIC is enabled
Since commit 5ab9954faa ("MINOR: ssl: Add a flag to let it known we have
an ALPN negociated"), when building with QUIC we get this warning:

  src/ssl_sock.c: In function 'ssl_sock_advertise_alpn_protos':
  src/ssl_sock.c:2189:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]

Let's just move the instructions after the optional declaration. No
backport is needed.
2025-09-10 10:44:33 +02:00
Olivier Houchard
d4c51a4f57 MEDIUM: server: Make use of the stored ALPN stored in the server
Now that we know which ALPN gets negotiated for a given server, use that to
decide if we can create the mux right away in connect_server(), and use
it in conn_install_mux_be().
That way, we may create the mux soon enough for early data to be sent,
before the handshake has been completed.
This commit depends on several previous commits, and it has not been
deemed important enough to backport.
2025-09-09 19:01:24 +02:00
Willy Tarreau
6a2b3269f9 CLEANUP: backend: clarify the cases where we want to use early data
The conditions to use early data on output are super tricky and
detected later, so that it's difficult to figure out how this works. This
patch splits the condition in two parts, the one that can be performed
early that is based on config/client/etc. It is used to clear a variable
that allows early data to be used in case any condition is not satisfied.
It was purposely split into multiple independent and reviewable tests.

The second part remains where it was at the end, and is used to temporarily
clear the handshake flags to let the data layer use early data. This one
being tricky, a large comment explaining the principle was added.

The logic was not changed at all, only the code was made more readable.
2025-09-09 19:01:24 +02:00
Willy Tarreau
9b9d0720e1 CLEANUP: backend: simplify the complex ifdef related to 0RTT in connect_server()
Since 3.0 we have HAVE_SSL_0RTT precisely to avoid checking horribly
complicated and unmaintainable conditions to detect support for 0RTT.
Let's just drop the complex condition and use the macro instead.
2025-09-09 19:01:24 +02:00
Willy Tarreau
4aaf0bfbce CLEANUP: backend: invert the condition to start the mux in connect_server()
Instead of trying to switch from delayed start to instant start based
on a single condition, let's do the opposite and preset the condition
to instant start and detect what could cause it to be delayed, thus
falling back to the slow mode. The condition remains exactly the
inverted one and better matches the comment about ALPN being the only
cause of such a delay.
2025-09-09 19:01:24 +02:00
Willy Tarreau
7b4a7f92b5 CLEANUP: backend: clarify the role of the init_mux variable in connect_server()
The init_mux variable is currently used in a way that's not super easy
to grasp. It's set a bit too late and requires to know a lot of info at
once. Let's first rename it to "may_start_mux_now" to clarify its role,
as the purpose is not to *force* the mux to be initialized now but to
permit it to do it.
2025-09-09 19:01:24 +02:00
Olivier Houchard
ff47ae60f3 MEDIUM: server: Introduce the concept of path parameters
Add a new field in struct server, path parameters. It will contain
connection information for the server that is not expected to change.
For now, just store the ALPN negotiated with the server. Each time a
handshake is done, we'll update it, even though it is not supposed to
change. This will be useful when trying to send early data, that way
we'll know which mux to use.
Each time the server goes down or is disabled, that information is
erased, as we can't be sure those parameters will be the same once the
server is back up.
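
The life cycle of such parameters can be sketched as below. The structure
layout, the field name and the helpers are all hypothetical; the sketch
only shows the "update on handshake, erase when the server goes down"
behaviour:

```c
#include <string.h>
#include <stddef.h>

#define ALPN_MAX_SKETCH 16

/* Hypothetical container for the per-server path parameters. */
struct path_params_sketch {
    char nego_alpn[ALPN_MAX_SKETCH]; /* empty string when unknown */
};

/* Called after each handshake with the ALPN that was negotiated. */
void path_params_set_alpn(struct path_params_sketch *pp,
                          const char *alpn, size_t len)
{
    if (len >= sizeof(pp->nego_alpn))
        len = 0; /* too long to be stored: treat it as unknown */
    memcpy(pp->nego_alpn, alpn, len);
    pp->nego_alpn[len] = '\0';
}

/* Called when the server goes down or is disabled. */
void path_params_reset(struct path_params_sketch *pp)
{
    pp->nego_alpn[0] = '\0';
}
```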
2025-09-09 19:01:24 +02:00
Olivier Houchard
9d65f5cd4d MINOR: ssl: Use the new flag to know when the ALPN has been set.
Now that we have a flag to let us know the ALPN has been set, we no
longer have to call ssl_sock_get_alpn() to know if the alpn has been
negotiated already.
Remove the call to conn_create_mux() from ssl_sock_handshake(), and just
reuse the one already present in ssl_sock_io_cb() if we have received
early data, and if the flag is set.
2025-09-09 19:01:24 +02:00
Olivier Houchard
5ab9954faa MINOR: ssl: Add a flag to let it known we have an ALPN negociated
Add a new flag to the ssl_sock_ctx, to be set as soon as the ALPN has
been negotiated.
This happens before the handshake has been completed, and that
information will let us know that, when we receive early data, if the
ALPN has been negotiated, then we can immediately create a mux, as the
ALPN will tell us which mux to use.
2025-09-09 19:01:24 +02:00
Olivier Houchard
6b78af837d BUG/MEDIUM: ssl: create the mux immediately on early data
If we received early data, and an ALPN has been negotiated, then
immediately try to create a mux if we did not have one already.
Generally, at this point we would not have one, as the mux is decided by
the ALPN, however at this point, even if the handshake is not done yet,
we have enough to determine the ALPN, so we can immediately create the
mux.
Doing so makes us able to process the request immediately, without waiting
for the handshake to be done.

This should be backported up to 2.8.
2025-09-09 19:01:24 +02:00
Olivier Houchard
aa25ddb773 BUG/MEDIUM: h1: Allow reception if we have early data
In h1_recv_allowed(), do not forbid the reception if we are yet to
complete the connection, if we have received early data on it. That way,
we can deal with them right away, instead of waiting for the handshake
to be done.

This should be backported up to 2.8.
2025-09-09 19:01:24 +02:00
Willy Tarreau
d7696d11e1 MEDIUM: peers: don't even try to process updates under contention
Recent fix 2421c3769a ("BUG/MEDIUM: peers: don't fail twice to grab the
update lock") improved the situation a lot for peers under locking
contention but still not enough for situations with many peers and
many entries to expire fast. It's indeed still possible to trigger
warnings at end of injection sessions for 16 peers at 100k req/s each
doing 10 random track-sc when process_table_expire() runs and holds the
update lock if compiled with a high value of STKTABLE_MAX_UPDATES_AT_ONCE
(1000). Better just not insist in this case and postpone the update.

At this point, under load only ebmb_lookup() consumes CPU, other functions
are in the few percent, indicating reasonable contention, and peers remain
updated.

This should be backported to 3.2 after a bit of testing.
2025-09-09 17:56:37 +02:00
Willy Tarreau
d5e7fba5c0 MEDIUM: stick-tables: don't wait indefinitely in stktable_add_pend_updates()
This one doesn't need to wait forever, if it cannot work it can postpone
it. When building with a high value of STKTABLE_MAX_UPDATES_AT_ONCE (1000),
it's still possible to trigger warnings in this function on the write lock
that is contended by peers and expiration. Changing it for a trylock resolves
the issue.

This should be backported to 3.2 after a bit of testing.
2025-09-09 17:56:37 +02:00
Willy Tarreau
a771b14541 MEDIUM: stick-tables: give up on lock contention in process_table_expire()
process_table_expire() can take quite a lot of time running over all
shards. During this time it will hinder track-sc rules and peers, which
will experience an increased latency to do their work, especially peers
where each message will cause a lock, whose cumulated time can exceed
the watchdog's patience.

Here, we proceed just like in stktable_trash_oldest(), which is that
we're using a trylock to detect contention. The first time it happens,
if we hadn't purged anything, we switch to a regular lock to perform
the operation, and next time it happens we abort. This guarantees that
some entries will be expired and that contention will be reduced when
detected.

With this change, various tests didn't manage to produce any warning,
including at the end of the load generation session.

This should be backported to 3.2 after a bit more testing.
2025-09-09 17:56:37 +02:00
Willy Tarreau
f87cf8b76e MEDIUM: stick-tables: relax stktable_trash_oldest() to only purge what is needed
stktable_trash_oldest() does insist a lot on purging what was requested,
only limited by STKTABLE_MAX_UPDATES_AT_ONCE. This is called in two
conditions, one to allocate a new stksess, and the other one to purge
entries of a stopping process. The cost of iterating over all shards
is huge, and a shard lock is taken each time before looking up entries.

Moreover, multiple threads can end up doing the same and looking hard for
many entries to purge when only one is needed. Furthermore, all threads
start from the same shard, hence synchronize their locks. All of this
costs a lot to other operations such as access from peers.

This commit simplifies the approach by ignoring the budget, starting
from a random shard number, and using a trylock so as to be able to
give up early in case of contention. The approach chosen here consists
in trying hard to flush at least one entry, but once at least one is
evicted or at least one trylock failed, then a failure on the trylock
will result in finishing.

The function now returns a success as long as one entry was freed.

With this, tests no longer show watchdog warnings during tests, though
a few still remain when stopping the tests (which are not related to
this function but to the contention from process_table_expire()).

With this change, under high contention some entries' purge might be
postponed and the table may occasionally contain slightly more entries
than their size (though this already happens since stksess_new() first
increments ->current before decrementing it).

Measures were made on a 64-core system with 8 peers
of 16 threads each, at CPU saturation (350k req/s each doing 10
track-sc) for 10M req, with 3 different approaches:

  - this one resulted in 1500 failures to find an entry (0.015%
    size overhead), with the lowest contention and the fairest
    peers distribution.

  - leaving only after a success resulted in 229 failures (0.0029%
    size overhead) but doubled the time spent in the function (on
    the write lock precisely).

  - leaving only when both a success and a failed lock were met
    resulted in 31 failures (0.00031% overhead) but the contention
    was high enough again so that peers were not all up to date.

Considering that a saturated machine might exceed its entries by
0.015% is pretty minimal, the mechanism is kept.

This should be backported to 3.2 after a bit more testing as it
resolves some watchdog warnings and panics. It requires the previous
commit "MINOR: stick-table: permit stksess_new() to temporarily
allocate more entries" to over-allocate instead of failing in case
of contention.
2025-09-09 17:56:37 +02:00
Willy Tarreau
b119280f60 MINOR: stick-table: permit stksess_new() to temporarily allocate more entries
stksess_new() calls stktable_trash_oldest() to release some entries.
If it fails however, it will fail to allocate an entry. This is a problem
because it doesn't permit stktable_trash_oldest() to be used in best effort
mode, which forces it to impose high contention. There's no problem with
allocating slightly more in practice. In the worst case if all entries are
in use, it's not shocking to temporarily exceed the number of entries by a
few units.

Let's relax this problematic rule. This patch might need to be backported
to 3.2 after a bit more testing in order to support locking relaxation.
2025-09-09 17:56:37 +02:00
Willy Tarreau
0f33a55171 DEBUG: peers: export functions that use locks
The following functions take locks and are often involved in warnings
but are currently not resolved, so let's export them so that they are
properly decoded:

  peer_prepare_updatemsg(), peer_send_teachmsgs(),
  peer_treat_updatemsg(), peer_send_msgs(), peer_io_handler()

This should be backported to 3.2.
2025-09-09 17:56:14 +02:00
Willy Tarreau
25195ba1e7 MINOR: debug: report the time since last wakeup and call
When task profiling is enabled, the current thread knows when the
currently running task was woken up and called, so we can calculate
how long ago it was woken up and called. This is convenient to figure
whether or not a warning or panic is caused by this task or by a
previous one, so let's report this info in thread outputs when known.

It would be useful to backport this to 3.2.
2025-09-09 17:56:14 +02:00
Willy Tarreau
12bc4f9c44 MINOR: debug: report the number of loops and ctxsw for each thread
When multiple similar warnings are emitted, it can be difficult to know
whether only one task is looping slowly or if many are sharing the CPU.
Let's report the number of context switches and polling loop turns in
thread dumps so that warnings are easier to understand.

This should be backported to 3.2.
2025-09-09 17:56:14 +02:00
Willy Tarreau
c3f94fbd9b DEBUG: stream: count the number of passes in the connect loop
Normally the connect loop cannot loop, but some recent traces can easily
convince one of the opposite. Let's add a counter, including in panic
dumps, in order to avoid the repeated long head scratching sessions
starting with "and what if...". In addition, if it's found to loop, this
time it will be certain and will indicate what to zoom in. This should
be backported to 3.2.
2025-09-09 17:56:14 +02:00
Willy Tarreau
8153cf1e51 MINOR: debug: report the process id in warnings and panics
Warning and panic messages currently do not report the PID. This is
annoying when trying to reproduce problems because warnings do not
indicate which process to attach to in order to debug, and panics
do not indicate which core dump corresponds to which dump.
Let's add them in both messages. This should probably be backported
at least to 3.2.
2025-09-09 17:56:14 +02:00
Amaury Denoyelle
0678d0a69b MINOR: check: reject invalid check config on a QUIC server
QUIC is now supported on the backend side. The previous commit ensures
that simple checks can be activated on QUIC servers without any issue.

The current patch ensures that check server settings remain compatible
with a QUIC server. Thus, configuration is now invalid if check
specifies an explicit MUX proto other than QUIC, disables SSL or tries to
use PROXY protocol.
2025-09-09 16:55:09 +02:00
Amaury Denoyelle
cd3027a7ee BUG/MINOR: check: ensure checks are compatible with QUIC servers
Previously, checks were only performed on TCP. However, QUIC is now
supported on backend. Prior to this patch, check activation for QUIC
servers would result in a crash.

To ensure compatibility between QUIC servers and checks, adjust
protocol_lookup() performed during check connect step. Instead of using
a hardcoded PROTO_TYPE_STREAM, the value is now derived from server
settings.

This does not need to be backported.
2025-09-09 16:55:09 +02:00
Amaury Denoyelle
c6d33c09fc BUG/MEDIUM: checks: fix ALPN inheritance from server
If no specific check settings are defined on a server line, it is
expected that these checks will be performed with the same parameters as
normal connections on the same server.

ALPN must be carefully taken into account for checks. Most notably, MUX
initialization is delayed so that it is performed only after SSL
handshake.

Prior to this patch, MUX init delay was only performed if ALPN was
defined via check settings. Thus, with the following settings, checks
would be performed on HTTP/1.1 without consulting ALPN negotiation
result from the server :

  server s1 127.0.0.1:443 ssl crt <...> alpn h2 check

This bug may result in checks reporting failure, for example in case of
a server answering HTTP/2 to ALPN negotiation to the configuration
above. Besides, there is incoherency between normal and check
connections, which is not what the documentation specifies.

This patch fixes this code. Now server parameters are also taken into
account. This ensures that checks and normal connections by default
use the same connection method.

This must be backported up to 2.4.
2025-09-09 16:55:09 +02:00
Amaury Denoyelle
fee3bd48b4 OPTIM: check: do not delay MUX for ALPN if SSL not active
To ensure ALPN is properly applied on checks, MUX initialization is
delayed so that it is created on SSL handshake completion. However, this
does not check if SSL is really active for the connection.

This patch adjusts the condition so that MUX init is not delayed if SSL
is not active for the check connection. A similar process is already
conducted for normal connections via connect_server().

This must be backported up to 2.4. Despite not being a bug, it must be
backported for the following patch which fixes check ALPN inheritance
from server settings.
2025-09-09 16:55:09 +02:00
Amaury Denoyelle
536d2aafa3 BUG/MINOR: hq-interop: adjust parsing/encoding on backend side
HTTP/0.9 is available on top of QUIC. This protocol is reserved for
internal use, mostly interop purpose.

This patch adjusts HTTP/0.9 layer with the following changes :
* version is not emitted anymore on the status line. This is performed
  as some servers do not parse it correctly.
* status line is set explicitly on HTX status-line. This ensures the
  correct HTTP status code is reported to the upper stream layer.

This does not need to be backported.
2025-09-09 16:55:09 +02:00
Christopher Faulet
b901e56acd BUG/MEDIUM: mux-h2: Reinforce conditions to report an error to app-layer stream
This patch relies on the previous one ("BUG/MEDIUM: mux-h2: Report RST/error to
app-layer stream during 0-copy fwding").

When the end of the connection is detected, so when the H2_CF_END_REACHED
flag is set after the shutdown was received and all incoming data were
processed, if a stream is blocked by the flow control (the stream one or the
connection one), an error must be reported to the app-layer stream.

Otherwise, outgoing data won't be sent and the opposite side will handle
this as a lack of room. So the stream will be blocked until the write
timeout is triggered. By reporting the error early, the stream can be
immediately closed.

This patch should be backported to 3.2. For older versions, it is probably a
good idea to wait for bug report.
2025-09-09 16:30:54 +02:00
Christopher Faulet
22e14f7b54 BUG/MEDIUM: mux-h2: Report RST/error to app-layer stream during 0-copy fwding
In h2_nego_ff(), it is important to report reset and error to app-layer
stream and to send the RST-STREAM frame accordingly. It is not clear if it
is an issue or not. But it is clearly a difference with the classical
forwarding via h2_snd_buf. And it is mandatory for the next fix.

This patch should be backported to 3.2. But it is probably a good idea to
not backport it on older versions, except if a bug is reported in this area.
2025-09-09 16:30:21 +02:00
Christopher Faulet
3b7112aa1d BUG/MINOR: mux-h2: Remove H2_CF_DEM_DFULL flags when the demux buffer is reset
This only happens when a connection error is detected or when the H2
connection is in ERR/ERR2 state. The demux buffer is explicitly reset. In
that case, it is important to remove the flag reporting this buffer as full.

It is probably worth backporting this patch to 3.2. But it is not mandatory
on older versions because it does not fix any known issue.
2025-09-09 16:29:14 +02:00
Christopher Faulet
12edcccc82 BUG/MEDIUM: mux-h2: Restart reading when mbuf ring is no longer full
When the mbuf ring buffer is full, the flag H2_CF_DEM_MROOM is set on the H2
connection to block any demux. It is important to properly handle ACK
frames. However, we must take care to restart reading when some data were
removed from the mbuf. Otherwise, we may block the demux for no reason. It
is especially an issue if the demux buffer is full. In that case, the H2
connection is blocked, waiting for the timeout.

This patch should be backported to 3.2. But it is probably a good idea to
not backport it on older versions, except if a bug is reported in this area.
2025-09-09 16:07:20 +02:00
Christopher Faulet
c6e4584d2b BUG/MEDIUM: mux-h2; Don't block reveives in H2_CS_ERROR and H2_CS_ERROR2 states
The H2 connection is switched to ERR when a GOAWAY must be sent and in ERR2
when it is sent. In these states, no more data can be emitted by the
mux. But there is no reason to not try to process incoming data or to not
try to receive data. It is especially important to be able to get the
shutdown from the TCP connection when a SSL connection was previously
detected. Otherwise, it is possible to block a H2 connection until its
timeout expiration to be able to close it.

This patch should be backported to 3.2. But it is probably a good idea to
not backport it on older versions, except if a bug is reported in this
area.
2025-09-09 16:07:20 +02:00
Christopher Faulet
626d7934cf BUG/MEDIUM: mux-h2: Reset MUX blocking flags when a send error is caught
When a send error is detected on the underlying connection, a pending error
is reported to the H2 connection by setting H2_CF_ERR_PENDING flag. When
this happens the tail of the mux ring buffer is reset. However some blocking
flags remain set and have no chance to be removed later because of the
pending error. Especially the flag H2_CF_DEM_MROOM which blocks data
demultiplexing. Thus, it is possible to block a H2 connection with unparsed
incoming data.

Worse, if a read event is received, it could lead to a wakeup loop between
the H2 connection and the underlying SSL connection. The H2 connection is
unable to convert the pending error to a fatal error because the
demultiplexing is blocked. In the mean time, it tries to receive more data
because of the not-consumed read event. On the underlying connection side,
the error detected earlier blocks the read, but the H2 connection is woken
up to handle the error.

To fix the issue, blocking flags must be removed when a send error is caught,
H2_CF_MUX_MFULL and H2_CF_DEM_MROOM flags. But, it is not necessary to only
release the tail of the mbuf ring. When a send error is detected, all outgoing
data can be flushed. So, now, in h2_send(), h2_release_mbuf() function is called
on pending error. The mbuf ring is fully released and H2_CF_MUX_MFULL and
H2_CF_DEM_MROOM flags are removed.

Many thanks to Krzysztof Kozłowski for its help to spot this issue.

This patch could be backported at least as far as 2.8. But it is a bit
sensitive. So, it is probably a good idea to backport it to 3.2 for now and
wait for bug report on older versions.
2025-09-09 16:07:20 +02:00
Amaury Denoyelle
0b6908385e BUG/MINOR: quic: properly support GSO on backend side
Previously, GSO emission was explicitly disabled on backend side. This
is not true since the following patch, thus GSO can be used, for example
when transferring large POST requests to a HTTP/3 backend.

  commit e064e5d46171d32097a84b8f84ccc510a5c211db
  MINOR: quic: duplicate GSO unsupp status from listener to conn

However, GSO on the backend side may cause a crash when handling EIO. In
this case, GSO must be completely disabled. Previously, this was
performed by flagging listener instance. In backend side, this would
cause a crash as listener is NULL.

This patch fixes it by supporting GSO disable flag for servers. Thus, in
qc_send_ppkts(), EIO can be converted either to a listener or server
flag depending on the quic_conn proxy side. On backend side, server
instance is retrieved via <qc.conn.target>. This is enough to guarantee
that server is not deleted.

This does not need to be backported.
2025-09-08 16:18:05 +02:00
Christopher Faulet
e653dc304e MINOR: pools: Don't dump anymore info about pools when purge is forced
Historically, when the purge of pools was forced by sending a SIGQUIT to
haproxy, information about the pools was first dumped. It is now totally
pointless because this info can be retrieved via the CLI. It is even less
relevant now because the purge is typically forced when there are memory
issues and to dump pools information, data must be allocated.

dump_pools_info() function was simplified because it is now called only from
an applet. No reason to still try to dump info on stderr.
2025-09-08 16:04:40 +02:00
Christopher Faulet
982805e6a3 BUG/MINOR: pools: Fix the dump of pools info to deal with buffers limitations
The "show pools" CLI command was not designed to dump information exceeding
the size of a buffer. But there are now many more pools than a few years ago
and when detailed information is dumped, we exceed the buffer limit and
the output is truncated.

To fix the issue, the command must be refactored to be able to stream the
result. To do so, the array containing pools info is now part of the command
context and it is dynamically allocated. A dedicated function was created to
fill all info. In addition, the index of the next pool to dump is saved in
the command context too to properly handle resumption cases. Finally global
information about pools are also stored in the command context for
convenience.
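
The resumption mechanism described above (a snapshot of the items plus the index of the next one to dump, both kept in the command context) can be sketched as follows. This is a simplified illustration and not the actual "show pools" code: struct dump_ctx, its fields and dump_some() are hypothetical names, and a plain char buffer stands in for the CLI output buffer.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical command context: the data to dump and where to resume. */
struct dump_ctx {
    const char **names; /* snapshot of the items to dump */
    size_t nb;          /* number of items in the snapshot */
    size_t next;        /* index of the next item to dump */
};

/* Fills <buf> (of <size> bytes, at least 1) with as many complete lines
 * as fit, starting from ctx->next. Returns 1 once everything has been
 * dumped, or 0 if the caller must flush <buf> and call again. It is
 * assumed that any single line fits in an empty buffer.
 */
int dump_some(struct dump_ctx *ctx, char *buf, size_t size)
{
    size_t used = 0;

    buf[0] = '\0';
    while (ctx->next < ctx->nb) {
        int ret = snprintf(buf + used, size - used, "%s\n",
                           ctx->names[ctx->next]);

        if (ret < 0 || (size_t)ret >= size - used) {
            buf[used] = '\0'; /* drop the partial line and stop here */
            return 0;
        }
        used += (size_t)ret;
        ctx->next++;
    }
    return 1;
}
```

Since the position lives in the context rather than on the stack, the output is never truncated: the handler is simply called again once the previous buffer has been consumed.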

This patch should fix the issue #3067. It must be backported to 3.2. On
older releases, the buffer limit is never reached.
2025-09-08 16:01:51 +02:00
Christopher Faulet
d75718af14 REGTESTS: ssl: Fix the script about automatic SNI selection
First, the barrier to delay the client execution was moved before the client
definition. Otherwise, the connection is established too early and with
short timeouts it could be closed before the requests are sent.

The main purpose of the barrier was to workaround slow health-checks. This
is also the reason why the script was flagged as slow. But it can be
significantly sped up by setting a low "inter" value. It is now set to
100ms and the script is no longer slow.
2025-09-08 15:55:56 +02:00
Amaury Denoyelle
f645cd3c74 MINOR: quic: restore QUIC_HP_SAMPLE_LEN constant
The below patch fixes padding emission for small packets, which is
required to ensure that header protection removal can be performed by
the recipient.

  commit d7dea408c64c327cab6aebf4ccad93405b675565
  BUG/MINOR: quic: too short PADDING frame for too short packets

In addition to the proper fix, constant QUIC_HP_SAMPLE_LEN was removed
and replaced by QUIC_TLS_TAG_LEN. However, it still makes sense to have
a dedicated constant which represents the size of the sample used for
header protection. Thus, this patch restores it.

Special instructions for backport : above patch mentions that no
backport is needed. However, this is incorrect, as bug is introduced by
another patch scheduled for backport up to 2.6. Thus, it is first
mandatory to schedule d7dea408c64c327cab6aebf4ccad93405b675565 after it.
Then, this patch can also be used for the sake of code clarity.
2025-09-08 14:49:03 +02:00
Amaury Denoyelle
c20c71a079 TESTS: quic: add unit-tests for QUIC TX part
Define a new "quic_tx" unit-test which is used to test QUIC TX module.
For the moment, a single test is performed on qc_do_build_pkt(). It
checks that PADDING is correctly added for HP sampling in case of a
small packet.
2025-09-08 14:49:03 +02:00
Amaury Denoyelle
fb8c6e2030 CLEANUP: quic: fix typo in quic_tx trace
Fix trace in qc_may_build_pkt().

This can be backported up to 3.0.
2025-09-08 14:49:03 +02:00
Aurelien DARRAGON
b9ef55d56d MINOR: stats-file: use explicit unsigned integer bitshift for user slots
As reported in GH #3104, there remained a place where (1 << shift) was
used to set or remove bits from the uint64_t users bitfield. It is incorrect
and could lead to bugs for values > 32 bits.

Instead, let's use 1ULL to ensure the operation remains 64bits consistent.
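
As an illustration of the point above: the constant 1 has type int, so shifting it by 32 or more is undefined in C, while 1ULL is at least 64 bits wide. A minimal sketch, with hypothetical helper names that are not taken from the stats-file code:

```c
#include <stdint.h>

/* Sets bit <slot> (0..63) in a 64-bit users bitfield. */
static inline uint64_t users_set(uint64_t users, unsigned int slot)
{
    return users | (1ULL << slot);
}

/* Clears bit <slot> (0..63) in a 64-bit users bitfield. */
static inline uint64_t users_clear(uint64_t users, unsigned int slot)
{
    return users & ~(1ULL << slot);
}

/* Returns 1 if bit <slot> (0..63) is set, otherwise 0. */
static inline int users_isset(uint64_t users, unsigned int slot)
{
    return (users & (1ULL << slot)) != 0;
}
```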

No backport needed.
2025-09-08 13:38:49 +02:00
Aurelien DARRAGON
9272b8ce74 BUG/MEDIUM: proxy: fix crash with stop_proxy() called during init
Willy reported that the following config would segfault right after the
"removing incomplete section 'peer'" is emitted:

  peers peers
      bind :2300
      server n10 127.0.0.1:2310

  listen dummy
      bind localhost:9999

This is caused by the fact that stop_proxy(), which tries to read shared
counters, is called during early init while shared counters are not yet
initialized. To fix the crash, let's check if we're still in the starting
phase, in which case we assume the counters are not initialized and we
assume 0 value instead.

No backport needed unless 16eb0fab31 ("MAJOR: counters: dispatch counters
over thread groups") is.
2025-09-08 13:38:38 +02:00
Frederic Lecaille
6f9fccec1f MINOR: quic: SSL session reuse for QUIC
Mimic the same behavior as the one for SSL/TCP connections to implement the
SSL session reuse.

Extract the code which tries to reuse the SSL session for SSL/TCP connections
to implement ssl_sock_srv_try_reuse_sess().
Call this function from QUIC ->init() xprt callback (qc_conn_init()) as this
is done for SSL/TCP connections.
2025-09-08 11:46:26 +02:00
Olivier Houchard
b3e685ac3d BUG/MEDIUM: ssl: Properly initialize msg_controllen.
When kTLS is compiled in, make sure msg_controllen is initialized to 0.
If we're not actually using kTLS, then it won't be set, but we'll check that
it is non-zero later to check if we have ancillary data.
This does not need to be backported.
This should fix CID 1620865, as reported in github issue #3106.
2025-09-06 14:19:48 +02:00
Willy Tarreau
75bd9255dd BUG/MINOR: cpu_topo: work around a small bug in musl's CPU_ISSET()
As found in GH issue #3103, CPU_ISSET() on musl 1.25 doesn't match the man
page which says it's returning an int. The reason is pretty simple, it's
a macro that operates on the bits directly and returns the result of the
bit field applied to the mask as an unsigned long. Bits above 31 will
simply be dropped if returned as an int, which causes CPUs 32..63 to
appear as absent from cpu_sets.

The fix is trivial, it consists in just comparing the result against zero
(i.e. turning it to a boolean), but before it's merged and deployed we'll
have to face such deployments, so better implement the same workaround
in the code here since we have access to the raw long value.
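
The truncation described above can be reproduced outside of any libc with a small sketch. The helpers below are hypothetical and only mimic a macro returning the raw masked word; they are not musl's CPU_ISSET() nor the workaround merged in HAProxy.

```c
#include <stdint.h>

/* Mimics a macro that returns the masked 64-bit word itself. */
static inline uint64_t raw_isset(uint64_t set, unsigned int cpu)
{
    return set & (UINT64_C(1) << cpu);
}

/* Keeping only the low 32 bits, as happens when the raw value ends up
 * in an int: the bits of CPUs 32..63 are lost.
 */
static inline int isset_truncated(uint64_t set, unsigned int cpu)
{
    return (uint32_t)raw_isset(set, cpu) != 0;
}

/* Comparing the raw value against zero first keeps all 64 CPUs. */
static inline int isset_bool(uint64_t set, unsigned int cpu)
{
    return raw_isset(set, cpu) != 0;
}
```

With only CPU 40 present in the set, isset_truncated() reports it as absent while isset_bool() reports it as present.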

This workaround should be backported to 3.0.
2025-09-06 11:05:52 +02:00
Frederic Lecaille
d7dea408c6 BUG/MINOR: quic: too short PADDING frame for too short packets
This bug arrived with this commit:

    MINOR: quic: centralize padding for HP sampling on packet building

What was missed is the fact that at the centralization point for the
PADDING frame to add for too short packets, the <len> payload length already includes
<*pn_len>, the packet number field length value.

So when computing the length of the PADDING frame, the packet field length must
not be considered and added to the payload length (<len>).

This bug led to too short PADDING frames being added to too short packets. This was the case,
most of the time, with Application level packets with a 1-byte packet number field
followed by a 1-byte PING frame. A 1-byte PADDING frame was added in this case
in place of a correct 2-byte PADDING frame. The header protection of
such packets could not be removed by the clients, as for instance with ngtcp2 with
such traces:

    I00001828 0x5a135c81e803f092c74bac64a85513b657 pkt could not decrypt packet number

As the header protection could not be removed, the header keyupdate bit could also
not be read by packet analyzers such as pyshark used during the keyupdate tests.
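
The arithmetic behind the 1-byte versus 2-byte PADDING frame can be sketched as follows. Per RFC 9001, the header protection sample starts 4 bytes after the start of the packet number field, so with a 16-byte sample and a 16-byte AEAD tag, the packet number plus the plaintext payload must cover at least 4 bytes. The function names below are hypothetical and this is not the qc_do_build_pkt() code.

```c
#include <stddef.h>

/* Offset of the HP sample from the start of the packet number field. */
#define HP_SAMPLE_OFFSET 4

/* Correct computation: <len> already includes the packet number field
 * length, so it is compared to the sample offset directly.
 */
static inline size_t hp_padding_len(size_t len)
{
    return len < HP_SAMPLE_OFFSET ? HP_SAMPLE_OFFSET - len : 0;
}

/* Faulty computation: the packet number length is counted a second
 * time, which makes the PADDING frame too short.
 */
static inline size_t hp_padding_len_buggy(size_t len, size_t pn_len)
{
    size_t total = len + pn_len;

    return total < HP_SAMPLE_OFFSET ? HP_SAMPLE_OFFSET - total : 0;
}
```

For a 1-byte packet number followed by a 1-byte PING frame, <len> is 2: the correct computation yields 2 bytes of padding while the faulty one yields only 1.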

No need to backport.
2025-09-05 16:17:11 +02:00
Frederic Lecaille
71336bdd08 MINOR: quic: add useful trace about padding params values
When adding a PADDING frame for too short packets, add a trace about the variable
values on which this PADDING frame length depends.
2025-09-05 16:17:11 +02:00
Christopher Faulet
cc8af125be REGTESTS: ssl: Add a script to test the automatic SNI selection
The script reg-tests/ssl/ssl_sni_auto.vtc tests the automatic SNI selection
for regular server connections and for health-check ones. It relies on a
3.3-dev8 feature (in fact, it was pushed just after the dev8).
2025-09-05 15:56:42 +02:00
Christopher Faulet
f9a6ae727c OPTIM: tcpcheck: Reorder tcpchek_connect structure fields to fill holes
Thanks to this patch, two 4-bytes holes are now filled in the
tcpchek_connect structure.
2025-09-05 15:56:42 +02:00
Christopher Faulet
ffc1f096e0 MEDIUM: httpcheck/ssl: Base the SNI value on the HTTP host header by default
Similarly to the automatic SNI selection for regular SSL traffic, the SNI of
health-checks HTTPS connection is now automatically set by default by using
the host header value. "check-sni-auto" and "no-check-sni-auto" server
settings were added to change this behavior.

Only implicit HTTPS health-checks can take advantage of this feature. In
this case, the host header value from the "option httpchk" directive is used
to extract the SNI. It is disabled if http-check rules are used. So, the SNI
must still be explicitly specified via a "http-check connect" rule.

This patch should partially fix the issue #3081.
2025-09-05 15:56:42 +02:00
Christopher Faulet
668916c1a2 MEDIUM: server/ssl: Base the SNI value to the HTTP host header by default
For HTTPS outgoing connections, the SNI is now automatically set using the
Host header value if no other value is already set (via the "sni" server
keyword). It is now the default behavior. It could be disabled with the
"no-sni-auto" server keyword. And eventually "sni-auto" server keyword may
be used to reset any previous "no-sni-auto" setting. This option can be
inherited from "default-server" settings. Finally, if no connection name is
set via "pool-conn-name" setting, the selected value is used.

The automatic selection of the SNI is enabled by default for all outgoing
connections. But it is concretely used for HTTPS connections only. The
expression used is "req.hdr(host),host_only".
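
To give an idea of what the expression above extracts, here is a rough sketch of removing the port from a Host header value, bracketed IPv6 literals included. It only illustrates the idea: strip_port() is a hypothetical helper and not the implementation of the "host_only" converter.

```c
#include <string.h>

/* Copies the host part of a Host header value <host> into <dst> (of
 * <dlen> bytes), dropping any ":port" suffix. Returns the length of
 * the result, or -1 if the input is malformed or does not fit.
 */
int strip_port(const char *host, char *dst, size_t dlen)
{
    const char *end;
    size_t len;

    if (host[0] == '[') {
        /* IPv6 literal: keep everything up to the closing bracket */
        end = strchr(host, ']');
        if (!end)
            return -1;
        end++;
    } else {
        /* name or IPv4 address: stop at the first colon, if any */
        end = strchr(host, ':');
        if (!end)
            end = host + strlen(host);
    }

    len = (size_t)(end - host);
    if (len >= dlen)
        return -1;
    memcpy(dst, host, len);
    dst[len] = '\0';
    return (int)len;
}
```

A request sent with "Host: example.com:8443" would thus present "example.com" as SNI under this sketch.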

This patch should partially fix the issue #3081. It only covers the server
part. Another patch will add the feature for HTTP health-checks.
2025-09-05 15:56:42 +02:00
Christopher Faulet
58555b8653 BUG/MINOR: tcpcheck: Don't use sni as pool-conn-name for non-SSL connections
When we try to reuse a connection to perform a healthcheck, the SNI, from the
tcpcheck connection or the healthcheck itself, must not be used as
connection name for non-SSL connections.

This patch must be backported to 3.2.
2025-09-05 15:56:42 +02:00
Christopher Faulet
eb3d4eb59f OPTIM: tcpcheck: Don't set SNI and ALPN for non-ssl connections
There is no reason to set the SNI and ALPN for non-ssl connections. It is
not really an issue because ssl_sock_set_servername() and
ssl_sock_set_alpn() functions will do nothing. But it is cleaner this way
and this could avoid bugs in future.

No backport needed, because there is no bug.
2025-09-05 15:56:42 +02:00
Christopher Faulet
ef07d3511a OPTIM: proto_rhttp: Don't set SNI for non-ssl connections
There is no reason to set the SNI for non-ssl connections. It is not really
an issue because ssl_sock_set_servername() function will do nothing. But
there is no reason to uselessly evaluate an expression.

No backport needed, because there is no bug.
2025-09-05 15:56:42 +02:00
Christopher Faulet
52866349a1 OPTIM: backend: Don't set SNI for non-ssl connections
There is no reason to set the SNI for non-ssl connections. It is not really
an issue because ssl_sock_set_servername() function will do nothing. But
there is no reason to uselessly evaluate an expression.

No backport needed, because there is no bug.
2025-09-05 15:56:42 +02:00
Christopher Faulet
a97bd0f505 BUG/MINOR: server: Update healthcheck when server settings are changed via CLI
Not all changes are concerned. But when the SSL is enabled or disabled for a
server, the healthcheck xprt must eventually be updated too. This happens
when the healthcheck relies on the server settings.

In the same spirit, when the healthcheck address and port are updated, we
must fallback on the raw xprt if the SSL is not explicitly enabled for the
healthcheck with a "check-ssl" parameter.

This patch should be backported to all stable versions.
2025-09-05 15:56:42 +02:00
Christopher Faulet
f8f94ffc9c BUG/MEDIUM: server: Use sni as pool connection name for SSL server only
By default, for a given server, when no pool-conn-name is specified, the
configured sni is used. However, this must only be done when SSL is in-use
for the server. Of course, it is uncommon to have a sni expression for
non-ssl server. But this may happen.

In addition, the SSL may be disabled via the CLI. In that case, the
pool-conn-name must be discarded if it was copied from the sni. And, we must
of course take care to set it if the ssl is enabled.

Finally, when the attach-srv action is checked, we now check the
pool-conn-name expression.

This patch should be backported as far as 3.0. It relies on "MINOR: server:
Parse sni and pool-conn-name expressions in a dedicated function" which
should be backported too.
2025-09-05 15:56:08 +02:00
Christopher Faulet
086a248645 MINOR: server: Parse sni and pool-conn-name expressions in a dedicated function
This change is mandatory to fix an issue. The parsing of sni and
pool-conn-name expressions (from string to expression) is now handled in a
dedicated function. This will avoid duplicating the same code in different
places.
2025-09-05 11:32:21 +02:00
Christopher Faulet
bb407ba8e3 BUG/MINOR: acl: Fix error message about several '-m' parameters
There is a typo in the commit c51ddd5c3 ("MINOR: acl: Only allow one '-m'
matching method"). '*m' was reported in the error message instead of '-m'.

In addition, it is now mentioned that only the last one should be kept if
an old config triggers the error.

No backport needed, except if the commit above is backported.
2025-09-05 11:32:20 +02:00
Willy Tarreau
b167d545cf [RELEASE] Released version 3.3-dev8
Released version 3.3-dev8 with the following main changes :
    - BUG/MEDIUM: mux-h2: fix crash on idle-ping due to unwanted ABORT_NOW
    - BUG/MINOR: quic-be: missing Initial packet number space discarding
    - BUG/MEDIUM: quic-be: crash after backend CID allocation failures
    - BUG/MEDIUM: ssl: apply ssl-f-use on every "ssl" bind
    - BUG/MAJOR: stream: Remove READ/WRITE events on channels after analysers eval
    - MINOR: dns: dns_connect_nameserver: fix fd leak at error path
    - BUG/MEDIUM: quic: reset padding when building GSO datagrams
    - BUG/MINOR: quic: do not emit probe data if CONNECTION_CLOSE requested
    - BUG/MAJOR: quic: fix INITIAL padding with probing packet only
    - BUG/MINOR: quic: don't coalesce probing and ACK packet of same type
    - MINOR: quic: centralize padding for HP sampling on packet building
    - MINOR: http_ana: fix typo in http_res_get_intercept_rule
    - BUG/MEDIUM: http_ana: handle yield for "stats http-request" evaluation
    - MINOR: applet: Rely on applet flag to detect the new api
    - MINOR: applet: Add function to test applet flags from the appctx
    - MINOR: applet: Add a flag to know an applet is using HTX buffers
    - MINOR: applet: Make some applet functions HTX aware
    - MEDIUM: applet: Set .rcv_buf and .snd_buf functions on default ones if not set
    - BUG/MEDIUM: mux-spop: Reject connection attempts from a non-spop frontend
    - REGTESTS: jwt: create dynamically "cert.ecdsa.pem"
    - BUG/MEDIUM: spoe: Improve error detection in SPOE applet on client abort
    - MINOR: haproxy: abort config parsing on fatal errors for post parsing hooks
    - MEDIUM: server: split srv_init() in srv_preinit() + srv_postinit()
    - MINOR: proxy: handle shared listener counters preparation from proxy_postcheck()
    - DOC: configuration: reword 'generate-certificates'
    - BUG/MEDIUM: quic-be: avoid crashes when releasing Initial pktns
    - BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets
    - MINOR: ssl: diagnostic warning when both 'default-crt' and 'strict-sni' are used
    - MEDIUM: ssl: convert diag to warning for strict-sni + default-crt
    - DOC: configuration: clarify 'default-crt' and implicit default certificates
    - MINOR: quic: remove ->offset qf_crypto struct field
    - BUG/MINOR: mux-quic: trace with non initialized qcc
    - BUG/MINOR: acl: set arg_list->kw to aclkw->kw string literal if aclkw is found
    - BUG/MEDIUM: mworker: fix startup and reload on macOS
    - BUG/MINOR: connection: rearrange union list members
    - BUG/MINOR: connection: remove extra session_unown_conn() on reverse
    - MINOR: cli: display failure reason on wait command
    - BUG/MINOR: server: decrement session idle_conns on del server
    - BUG/MINOR: mux-quic: do not access conn after idle list insert
    - MINOR: session: document explicitely that session_add_conn() is safe
    - MINOR: session: uninline functions related to BE conns management
    - MINOR: session: refactor alloc/lookup of sess_conns elements
    - MEDIUM: session: protect sess conns list by idle_conns_lock
    - MINOR: server: shard by thread sess_conns member
    - MEDIUM: server: close new idle conns if server in maintenance
    - MEDIUM: session: close new idle conns if server in maintenance
    - MINOR: server: cleanup idle conns for server in maint already stopped
    - MINOR: muxes: enforce thread-safety for private idle conns
    - MEDIUM: conn/muxes/ssl: reinsert BE priv conn into sess on IO completion
    - MEDIUM: conn/muxes/ssl: remove BE priv idle conn from sess on IO
    - MEDIUM: mux-quic: enforce thread-safety of backend idle conns
    - MAJOR: server: implement purging of private idle connections
    - MEDIUM: session: account on server idle conns attached to session
    - MAJOR: server: do not remove idle conns in del server
    - BUILD: mworker: fix ignoring return value of ‘read’
    - DOC: unreliable sockpair@ on macOS
    - MINOR: muxes: adjust takeover with buf_wait interaction
    - OPTIM: backend: set release on takeover for strict maxconn
    - DOC: configuration: confuse "strict-mode" with "zero-warning"
    - MINOR: doc: add missing statistics column
    - MINOR: doc: add missing statistics column
    - MINOR: stats: display new curr_sess_idle_conns server counter
    - MINOR: proxy: extend "show servers conn" output
    - MEDIUM: proxy: Reject some header names for 'http-send-name-header' directive
    - BUG/BUILD: stats: fix build due to missing stat enum definition
    - DOC: proxy-protocol: Make example for PP2_SUBTYPE_SSL_SIG_ALG accurate
    - CLEANUP: quic: remove a useless CRYPTO frame variable assignment
    - BUG/MEDIUM: quic: CRYPTO frame freeing without eb_delete()
    - BUG/MAJOR: mux-quic: fix crash on reload during emission
    - MINOR: conn/muxes/ssl: add ASSUME_NONNULL() prior to _srv_add_idle
    - REG-TESTS: map_redirect: Don't use hdr_dom in ACLs with "-m end" matching method
    - MINOR: acl: Only allow one '-m' matching method
    - MINOR: acl; Warn when matching method based on a suffix is overwritten
    - BUG/MEDIUM: server: Duplicate healthcheck's alpn inherited from default server
    - BUG/MINOR: server: Duplicate healthcheck's sni inherited from default server
    - BUG/MINOR: acl: Properly detect overwritten matching method
    - BUG/MINOR: halog: Add OOM checks for calloc() in filter_count_srv_status() and filter_count_url()
    - BUG/MINOR: log: Add OOM checks for calloc() and malloc() in logformat parser and dup_logger()
    - BUG/MINOR: acl: Add OOM check for calloc() in smp_fetch_acl_parse()
    - BUG/MINOR: cfgparse: Add OOM check for calloc() in cfg_parse_listen()
    - BUG/MINOR: compression: Add OOM check for calloc() in parse_compression_options()
    - BUG/MINOR: tools: Add OOM check for malloc() in indent_msg()
    - BUG/MINOR: quic: ignore AGAIN ncbuf err when parsing CRYPTO frames
    - MINOR: quic/flags: complete missing flags
    - BUG/MINOR: quic: fix room check if padding requested
    - BUG/MINOR: quic: fix padding issue on INITIAL retransmit
    - BUG/MINOR: quic: pad Initial pkt with CONNECTION_CLOSE on client
    - MEDIUM: quic: strengthen BUG_ON() for unpad Initial packet on client
    - DOC: configuration: rework the jwt_verify keyword documentation
    - BUG/MINOR: haproxy: be sure not to quit too early on soft stop
    - BUILD: acl: silence a possible null deref warning in parse_acl_expr()
    - MINOR: quic: Add more information about RX packets
    - CI: fix syntax of Quic Interop pipelines
    - MEDIUM: cfgparse: warn when using user/group when built statically
    - BUG/MEDIUM: stick-tables: don't leave the expire loop with elements deleted
    - BUG/MINOR: stick-tables: never leave used entries without expiration
    - BUG/MEDIUM: peers: don't fail twice to grab the update lock
    - MINOR: stick-tables: limit the number of visited nodes during expiration
    - OPTIM: stick-tables: exit expiry faster when the update lock is held
    - MINOR: counters: retrieve detailed errmsg upon failure with counters_{fe,be}_shared_prepare()
    - MINOR: stats-file: introduce shm-stats-file directive
    - MEDIUM: stats-file: processes share the same clock source from shm-stats-file
    - MINOR: stats-file: add process slot management for shm stats file
    - MEDIUM: stats-file/counters: store and preload stats counters as shm file objects
    - DOC: config: document "shm-stats-file" directive
    - OPTIM: stats-file: don't unnecessarily die hard on shm_stats_file_reuse_object()
    - MINOR: compiler: add ALWAYS_PAD() macro
    - BUILD: stats-file: fix aligment issues
    - MINOR: stats-file: reserve some bytes in exported structs
    - MEDIUM: stats-file: add some BUG_ON() guards to ensure exported structs are not changed by accident
    - BUG/MINOR: check: ensure check-reuse is compatible with SSL
    - BUG/MINOR: check: fix dst address when reusing a connection
    - REGTESTS: explicitly use "balance roundrobin" where RR is needed
    - MAJOR: backend: switch the default balancing algo to "random"
    - BUG/MEDIUM: conn: fix UAF on connection after reversal on edge
    - BUG/MINOR: connection: streamline conn detach from lists
    - BUG/MEDIUM: quic-be: too early SSL_SESSION initialization
    - BUG/MINOR: log: fix potential memory leak upon error in add_to_logformat_list()
    - MEDIUM: init: always warn when running as root without being asked to
    - MINOR: sample: Add base2 converter
    - MINOR: version: add -vq, -vqb, and -vqs flags for concise version output
    - BUILD: trace: silence a bogus build warning at -Og
    - MINOR: trace: accept trace spec right after "-dt" on the command line
    - BUILD: makefile: bump the default minimum linux version to 4.17
2025-09-05 09:54:34 +02:00
Willy Tarreau
85ac6a6f7b BUILD: makefile: bump the default minimum linux version to 4.17
As explained during the 3.3-dev7 announcement below:
  https://www.mail-archive.com/haproxy@formilux.org/msg46073.html

no regularly maintained distro supports a kernel older than 4.18 anymore,
and KTLS is supported since 4.17. So it's about the right moment to bump
the default minimum kernel version supported by glibc and musl to
automatically cover new features. The linux-glibc-legacy target still
supports 2.6.28 and above.
2025-09-05 09:44:56 +02:00
Willy Tarreau
670dc299d3 MINOR: trace: accept trace spec right after "-dt" on the command line
I continue to mistakenly set the traces using "-dtXXX" and to have to
refer to the doc to figure out that it requires a separate argument and
differs from some other options. Worse, "-dthelp" doesn't say anything
and silently ignores the argument.

Let's make the parser take whatever follows "-dt" as the argument if
present, otherwise take the next one (as it currently does). Doing
this even simplifies the code, and makes the syntax easier to figure
out since "-dthelp" now works.
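
The rule above can be sketched as follows. This is only an illustration
in Python, not the parser from haproxy.c: take_dt_arg() is a hypothetical
helper, and the check that the next word does not start with "-" is an
assumption about how a separate argument is recognized.

```python
def take_dt_arg(argv, i):
    """Return (trace_spec, next_index) for the "-dt" option found at argv[i].

    Hypothetical helper: whatever follows "-dt" in the same word is the
    spec; otherwise the next word is consumed if there is one.
    """
    attached = argv[i][len("-dt"):]
    if attached:
        # "-dthelp" style: the spec is glued to the option
        return attached, i + 1
    # Assumption: a following word starting with "-" is another option
    if i + 1 < len(argv) and not argv[i + 1].startswith("-"):
        return argv[i + 1], i + 2
    # No spec at all
    return None, i + 1
```

With this rule both "-dthelp" and "-dt help" yield the same spec, which
is the point of the change.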
2025-09-05 09:33:28 +02:00
Willy Tarreau
abfd6f3b93 BUILD: trace: silence a bogus build warning at -Og
gcc-13.3 at -Og emits an incorrect build warning in trace.c about a
possibly uninitialized variable:

  In file included from include/haproxy/api.h:35,
                   from src/trace.c:22:
  src/trace.c: In function 'trace_parse_cmd':
  include/haproxy/bug.h:431:17: warning: 'arg' may be used uninitialized [-Wmaybe-uninitialized]
    431 |                 free(*__x);                                             \
        |                 ^~~~~~~~~~
  src/trace.c:1136:9: note: in expansion of macro 'ha_free'
   1136 |         ha_free(&oarg);
        |         ^~~~~~~
  src/trace.c:1008:15: note: 'arg' was declared here
   1008 |         char *arg, *oarg;
        |               ^~~

The warning is obviously wrong since the field is initialized in one of
the two branches of an "if" whose complementary one returns. But the
compiler doesn't seem to see this because the if is in fact two ifs each
with an opposite condition: "if (arg_src)" then "if (!arg_src)". Let's
just move upwards the default one that returns and eliminate the other
one. Reading the diff with "git diff -b" better shows the tiny change.

It could be backported to 3.0.
2025-09-05 09:19:24 +02:00
Nikita Kurashkin
ef73fe2584 MINOR: version: add -vq, -vqb, and -vqs flags for concise version output
This patch introduces three new command line flags to display HAProxy version
info more flexibly:

- `-vqs` outputs the short version string without commit info (e.g., "3.3.1").
- `-vqb` outputs only the branch (major.minor) part of the version (e.g., "3.3").
- `-vq` outputs the full version string with suffixes (e.g., "3.3.1-dev5-1bb975-71").

This allows easier parsing of version info in automation while keeping existing -v and -vv behaviors.

The command line argument parsing now calls `display_version_plain()` with a
display_mode parameter to select the desired output format. The function handles
stripping of commit or patch info as needed, depending on the mode.
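
As an illustration only, the three modes could behave as in the Python
sketch below. version_plain() and the mode names are hypothetical, and
the stripping rules are inferred solely from the three examples above;
the real display_version_plain() may treat other version strings
differently.

```python
def version_plain(full, mode):
    """Return the version string for one of the three concise modes.

    Hypothetical rules, inferred from the examples in this message only.
    """
    if mode == "full":                      # -vq: keep every suffix
        return full
    short = full.split("-", 1)[0]           # drop everything after the first "-"
    if mode == "short":                     # -vqs
        return short
    if mode == "branch":                    # -vqb: keep major.minor only
        return ".".join(short.split(".")[:2])
    raise ValueError("unknown mode: " + mode)
```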

Signed-off-by: Nikita Kurashkin <nkurashkin@stsoft.ru>
2025-09-05 08:57:57 +02:00
Maximilian Moehl
5d9abc68b4 MINOR: sample: Add base2 converter
This commit adds the base2 converter to turn binary input into its
string representation. Each input byte is converted into a series of
eight characters, each either 0 or 1, by bit-wise comparison.
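
A minimal Python sketch of such a conversion is shown below. It is not
the converter's C code, and emitting the most significant bit first is
an assumption since the message does not state the bit order.

```python
def base2(data: bytes) -> str:
    """Render each byte as eight '0'/'1' characters (MSB first, assumed)."""
    out = []
    for byte in data:
        # Test each bit from the highest (bit 7) down to the lowest (bit 0)
        for bit in range(7, -1, -1):
            out.append("1" if byte & (1 << bit) else "0")
    return "".join(out)
```

The output is always eight times as long as the input, so an empty
input gives an empty string.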
2025-09-05 08:51:51 +02:00
Willy Tarreau
a6986e1cd6 MEDIUM: init: always warn when running as root without being asked to
Like many exposed network daemons, haproxy normally does not need to run
as root and strongly recommends against this, unless strictly necessary.
On some operating systems, capabilities even totally alleviate this need.

Lately, maybe due to a rise of containerization or automated config
generation or a bit of both, we've observed a resurgence of this bad
practice, possibly due to the fact that users are just not aware of the
conditions under which they're running their daemon.

Let's add a warning at boot when starting as root without having requested
it using "uid" or "user". And take this opportunity for warning the user
about the existence of capabilities when supported, and encouraging the
use of a chroot.

This is achieved by leaving global.uid set to -1 by default, allowing us
to detect if it was explicitly set or not.
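
The sentinel idea can be sketched in Python as below. The names
UID_UNSET and should_warn_root() are hypothetical and only illustrate
the detection; they are not taken from the patch.

```python
UID_UNSET = -1  # default value meaning "uid"/"user" never appeared in the config

def should_warn_root(configured_uid, effective_uid):
    """Warn only when running as root without having asked for it."""
    return configured_uid == UID_UNSET and effective_uid == 0
```

An explicit "uid 0" leaves configured_uid at 0 instead of -1, so the
user who really wants root does not get the warning.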
2025-09-05 08:51:07 +02:00
Aurelien DARRAGON
c97ced3f93 BUG/MINOR: log: fix potential memory leak upon error in add_to_logformat_list()
As reported on GH #3099, upon memory error add_to_logformat_list() will
return an error but it fails to properly free memory which was allocated
within the function, which could result in a memory leak.

Let's free all relevant variables allocated by the function before returning.

No backport needed unless 22ac1f5ee ("BUG/MINOR: log: Add OOM checks for
calloc() and malloc() in logformat parser and dup_logger()") is.
2025-09-04 23:07:22 +02:00
Frederic Lecaille
842f32f3f1 BUG/MEDIUM: quic-be: too early SSL_SESSION initialization
When an SNI is set on a QUIC server line, ssl_sock_set_servername() is called
from connect_server() (backend.c). This leads some BUG_ON() to be triggered
because the CO_FL_WAIT_L6_CONN | CO_FL_SSL_WAIT_HS flags were not set. This must
be done in the ->init() xprt callback. This patch moves the flag settings
from the ->start() to the ->init() callback.

Indeed, connect_server() calls these functions in this order:

   ->init(),
   ssl_sock_set_servername() # => crash if CO_FL_WAIT_L6_CONN | CO_FL_SSL_WAIT_HS not set
   ->start()

Furthermore ssl_sock_set_servername() has a side effect to reset the SSL_SESSION
object (attached to SSL object) calling SSL_set_session(), leading to crashes as follows:

 [Thread debugging using libthread_db enabled]
 Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
 Core was generated by `./haproxy -f quic_srv.cfg'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  tls_process_server_hello (s=0x560c259733b0, pkt=0x7fffac239f20)
     at ssl/statem/statem_clnt.c:1624
 1624            if (s->session->session_id_length > 0) {
 [Current thread is 1 (Thread 0x7fc364e53dc0 (LWP 35514))]
 (gdb) bt
 #0  tls_process_server_hello (s=0x560c259733b0, pkt=0x7fffac239f20)
     at ssl/statem/statem_clnt.c:1624
 #1  0x00007fc36540fba4 in ossl_statem_client_process_message (s=0x560c259733b0,
     pkt=0x7fffac239f20) at ssl/statem/statem_clnt.c:1042
 #2  0x00007fc36540d028 in read_state_machine (s=0x560c259733b0) at ssl/statem/statem.c:646
 #3  0x00007fc36540ca70 in state_machine (s=0x560c259733b0, server=0)
     at ssl/statem/statem.c:439
 #4  0x00007fc36540c576 in ossl_statem_connect (s=0x560c259733b0) at ssl/statem/statem.c:250
 #5  0x00007fc3653f1698 in SSL_do_handshake (s=0x560c259733b0) at ssl/ssl_lib.c:3835
 #6  0x0000560c22620327 in qc_ssl_do_hanshake (qc=qc@entry=0x560c25961f60,
     ctx=ctx@entry=0x560c25963020) at src/quic_ssl.c:863
 #7  0x0000560c226210be in qc_ssl_provide_quic_data (len=90, data=<optimized out>,
     ctx=0x560c25963020, level=ssl_encryption_initial, ncbuf=0x560c2588bb18)
     at src/quic_ssl.c:1071
 #8  qc_ssl_provide_all_quic_data (qc=qc@entry=0x560c25961f60, ctx=0x560c25963020)
     at src/quic_ssl.c:1123
 #9  0x0000560c2260ca5f in quic_conn_io_cb (t=0x560c25962f80, context=0x560c25961f60,
     state=<optimized out>) at src/quic_conn.c:791
 #10 0x0000560c228255ed in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:648
 #11 0x0000560c22825f7a in process_runnable_tasks () at src/task.c:889
 #12 0x0000560c22793dc7 in run_poll_loop () at src/haproxy.c:2836
 #13 0x0000560c22794481 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3056
 #14 0x0000560c2259082d in main (argc=<optimized out>, argv=<optimized out>)
     at src/haproxy.c:3667

<s> is the SSL object, and <s->session> is the SSL_SESSION object.

For the client, this is the first call to SSL_do_handshake() which initializes this
SSL_SESSION object from the ->init() xprt callback. Then it is reset by
ssl_sock_set_servername(), then the tls_process_server_hello() TLS stack function is
called with a NULL value for s->session when receiving the ServerHello TLS message.

To fix this, simply move the first call to SSL_do_handshake() to the ->start() xprt
callback (qc_xprt_start()).

No need to backport.
2025-09-04 20:49:06 +02:00
Amaury Denoyelle
687df405fe BUG/MINOR: connection: streamline conn detach from lists
Over their lifetime, connections are attached to different lists. These
lists depend on whether the connection is on the frontend or backend side.
Attach point members are stored via a union in struct connection. The
next commit reorganizes them so that a proper frontend/backend
separation is performed :

  commit a96f1286a75246fef6db3e615fabdef1de927d83
  BUG/MINOR: connection: rearrange union list members

On conn_free(), the connection instance must be removed from these lists to
ensure there is no use-after-free case. However the code was still shaky
there, despite no real issue. Indeed, <toremove_list> was detached for
all connections, despite being used on the backend side only.

This patch streamlines the freeing of connections. Now, <toremove_list>
detach is performed in conn_backend_deinit(). Moreover, a new helper
conn_frontend_deinit() is defined. It ensures that <stopping_list>
detach is done. Previously it was performed individually by muxes.

Note that a similar procedure is performed when the connection is
reversed. Hence, conn_frontend_deinit() is now used here as well,
rendering reversal from FE to BE or vice versa symmetrical.

As mentioned above, no crash occurred prior to this patch, but the code
was fragile, in particular access to <toremove_list> for frontend
connections. Thus this patch is considered as a bug fix worthy of a
backport along with the above mentioned patch, currently up to 3.0.
2025-09-04 18:31:20 +02:00
Amaury Denoyelle
27ff7ff296 BUG/MEDIUM: conn: fix UAF on connection after reversal on edge
When a connection is reversed, some elements must be reset prior to
reusing it. Most notably, the connection must be removed from lists specific
to the frontend/backend sides.

When reversal was performed from frontend to backend side, the connection was
not removed via its <stopping_list> attach point. On previous releases,
this did not cause any issue. However, crashes started to occur recently,
probably due to the recent reorganization of connection list attach
points from the following patch.

  commit a96f1286a75246fef6db3e615fabdef1de927d83
  BUG/MINOR: connection: rearrange union list members

To fix this, simply ensure that <stopping_list> detach is performed via
conn_reverse().

This patch must be backported up to 3.0 release.
2025-09-04 18:13:35 +02:00
Willy Tarreau
93cc18ac42 MAJOR: backend: switch the default balancing algo to "random"
For many years, an unset load balancing algorithm would use "roundrobin".
It was shown several times that "random" with at least 2 draws (the
default) generally provides better performance and fairness in that
it will automatically adapt to the server's load and capacity. This
was further described with numbers in this discussion:

  https://www.mail-archive.com/haproxy@formilux.org/msg46011.html
  https://github.com/orgs/haproxy/discussions/3042

BTW there was no objection and only support for the change.

The goal of this patch is to change the default algo when none is
specified, from "roundrobin" to "random". This way, users who don't
care and don't set the load balancing algorithm will benefit from a
better one in most cases, while those who have good reasons to prefer
roundrobin (for session affinity or for reproducible sequences like used
in regtests) can continue to specify it.

The vast majority of users should not notice a difference.
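
For readers unfamiliar with the principle, "random" with two draws can
be sketched in Python as below. This is a simplified illustration, not
haproxy's implementation: pick_server() is a hypothetical helper, and
server weights and the hashing of the random value are left out.

```python
import random

def pick_server(loads, draws=2, rng=random):
    """Draw `draws` servers at random and keep the least loaded one.

    loads maps a server name to its current load (e.g. active requests).
    """
    names = list(loads)
    candidates = [rng.choice(names) for _ in range(draws)]
    return min(candidates, key=lambda name: loads[name])
```

With two servers, the busier one is only chosen when both draws land
on it, which is how the algorithm adapts to load without any shared
round-robin position.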
2025-09-04 08:30:35 +02:00
Willy Tarreau
60931ceae9 REGTESTS: explicitly use "balance roundrobin" where RR is needed
A few tests explicitly rely on the server ordering granted by
"balance roundrobin", but didn't specify the balance algorithm.
As it will change soon, let's make it explicit.
2025-09-04 08:18:53 +02:00
Amaury Denoyelle
9410b2ab97 BUG/MINOR: check: fix dst address when reusing a connection
The keyword check-reuse-pool allows reusing an idle connection to
perform a health check instead of opening a new one. It is implemented
similarly to HTTP transfer reuse : a hash is calculated with a subset of
properties to lookup a connection with the same characteristics.

One of these properties is the destination address. Initially it was
always set to NULL prior to reuse check, as this is necessary to match
connections on a reverse-HTTP server. However, this prevents reuse on
other servers with a proper address configured. Indeed, in this case
destination address is always used as key for connections inserted in
idle pool.

This patch fixes this by properly setting destination address for check
reuse. By default, it reuses the address from the server. The only
exception is if the server is using reverse-HTTP, in which case address
remains NULL.

A new test is also performed prior to attempting check reuse to ensure this is
not performed on a transparent server. Indeed, in this case the server
address would be unset. Anyway, a check cannot reuse a connection in this
case so this is OK. Note that this does not prevent the check from continuing
with a new connection with a NULL address : this should be handled
more properly in another patch.

This must be backported up to 3.2.
2025-09-03 16:58:14 +02:00
Amaury Denoyelle
6d3c3c7871 BUG/MINOR: check: ensure check-reuse is compatible with SSL
SSL may be activated implicitly if a server relies on SSL, even without
check-ssl keyword. This is performed by init_srv_check() function. The
main operation is to change xprt layer for check to SSL.

Prior to this patch, <use_ssl> check member was also set, despite not
being strictly necessary. This has a negative side-effect of rendering
check-reuse-pool ineffective. Indeed, reuse on check is only performed
if no specific check configuration has been specified (see
tcpcheck_use_nondefault_connect()).

This patch fixes check reuse with SSL : <use_ssl> is not set in case SSL
is inherited implicitly from server configuration. Thus, <use_ssl> is
now only set if an explicit check-ssl keyword is set, which disables
connection reuse for check.

This must be backported up to 3.2.
2025-09-03 16:54:48 +02:00
Aurelien DARRAGON
f32bc8f0a4 MEDIUM: stats-file: add some BUG_ON() guards to ensure exported structs are not changed by accident
Add two BUG_ON() in shm_stats_file_prepare() which will trigger if
exported structures (shm_stats_file_hdr and shm_stats_file_object) change
in size, because it means that they will become incompatible with older
versions and thus precautions should be taken by the developer to ensure
compatibility with older versions, or at least detect incompatible
versions by changing the version number to prevent bugs resulting
from inconsistent mapping between versions. The BUG_ON() may be
safely adjusted then.

Please note that it doesn't protect against accidental struct member
re-ordering if the resulting struct size is equal.
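
Both the guard and its limitation can be illustrated with Python's
ctypes. The layouts below are hypothetical and much smaller than the
real shm_stats_file_object; only the principle of checking the size is
taken from this patch.

```python
import ctypes

class ShmObject(ctypes.Structure):
    # Hypothetical exported layout, for illustration only
    _fields_ = [
        ("guid", ctypes.c_char * 128),
        ("tgid", ctypes.c_uint32),
        ("type", ctypes.c_uint8),
        ("pad", ctypes.c_uint8 * 3),
        ("users", ctypes.c_uint64),
    ]

class ShmObjectReordered(ctypes.Structure):
    # Same members with "users" moved first: a different mapping
    _fields_ = [
        ("users", ctypes.c_uint64),
        ("guid", ctypes.c_char * 128),
        ("tgid", ctypes.c_uint32),
        ("type", ctypes.c_uint8),
        ("pad", ctypes.c_uint8 * 3),
    ]

EXPECTED_OBJECT_SIZE = 144  # to be adjusted together with the version number

def check_exported_layout(struct_type):
    """Fail loudly if the exported struct no longer has the expected size."""
    if ctypes.sizeof(struct_type) != EXPECTED_OBJECT_SIZE:
        raise AssertionError("exported struct size changed, bump the version")
```

Both layouts pass the check since they have the same size, although
their mappings are incompatible, which is exactly the case the guard
cannot catch.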
2025-09-03 16:29:55 +02:00
Aurelien DARRAGON
1a1362ea0b MINOR: stats-file: reserve some bytes in exported structs
We may need additional struct members in shm_stats_file_object and
shm_stats_file_hdr, yet since these structs are exported they should
not change in size nor ordering, else it would require a version change
to break compatibility on purpose since the mapping would differ.

Here we reserve 64 additional bytes in shm_stats_file_object, and
128 bytes in shm_stats_file_hdr for future usage.
2025-09-03 16:29:48 +02:00
Aurelien DARRAGON
21d97ccfae BUILD: stats-file: fix aligment issues
Document some byte holes and fix some potential alignment issues
between 32 and 64 bits architectures to ensure the shm_stats_file memory
mapping is consistent between operating systems.
2025-09-03 16:28:46 +02:00
Aurelien DARRAGON
46a5948ed2 MINOR: compiler: add ALWAYS_PAD() macro
Same as THREAD_PAD() but doesn't depend on haproxy being compiled with
thread support. It may be useful for memory (or files) that may be
shared between multiple processes.
2025-09-03 16:28:46 +02:00
Aurelien DARRAGON
cf2562cddf OPTIM: stats-file: don't unnecessarily die hard on shm_stats_file_reuse_object()
shm_stats_file_reuse_object() has a non negligible cost, especially if
the shm file contains a lot of objects because the function scans the
whole shm file to find available slots.

During startup, if no existing objects could be mapped in the shm
file, shm_stats_file_add_object() is called for each object (server, fe, be or
listener) with a GUID set. On large configs it means
shm_stats_file_add_object() could be called a lot of times in a row.

With current implementation, each shm_stats_file_add_object() call
leverages shm_stats_file_reuse_object(), so the more objects are defined
in the config, the slower the startup will be.

To try to optimize startup time a bit with large configs, we don't
systematically call shm_stats_file_reuse_object(), especially when we
know that the previous attempt to reuse objects failed. In this case
we add a small tempo between failed attempts to reuse objects because
we assume the new attempt will probably fail anyway. (For slots to
become available, either an old process has to clean its entries,
or they have to time out which implies that the clock needs to be updated)
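
The throttling idea can be sketched in Python as below. ReuseThrottle
and its tempo_ms parameter are hypothetical; the actual delay and the
way the patch tracks failed attempts are not described here.

```python
class ReuseThrottle:
    """Skip the costly reuse scan for a short while after one has failed."""

    def __init__(self, tempo_ms):
        self.tempo_ms = tempo_ms        # hypothetical delay, not haproxy's value
        self.last_failure_ms = None     # no failed attempt recorded yet

    def should_try(self, now_ms):
        """Tell whether a new scan for reusable objects is worth attempting."""
        if self.last_failure_ms is None:
            return True
        return now_ms - self.last_failure_ms >= self.tempo_ms

    def record(self, succeeded, now_ms):
        """Remember the outcome of the scan that was just performed."""
        self.last_failure_ms = None if succeeded else now_ms
```

When should_try() returns False, the caller would directly allocate a
new object instead of scanning the whole file again.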
2025-09-03 16:28:41 +02:00
Aurelien DARRAGON
16abfb6e06 DOC: config: document "shm-stats-file" directive
Add some documentation for "shm-stats-file" and
"shm-stats-file-max-objects" experimental directives related to the use
of shared memory for storing stats counters (see previous commits for
implementation details)
2025-09-03 15:59:42 +02:00
Aurelien DARRAGON
585ece4c92 MEDIUM: stats-file/counters: store and preload stats counters as shm file objects
This is the last patch of the shm stats file series, in this patch we
implement the logic to store and fetch shm stats objects and associate
them to existing shared counters on the current process.

Shm objects are stored in the same memory location as the shm stats file
header. In fact they are stored right after it. All objects (struct
shm_stats_file_object) have the same size (no matter their type), which
allows for easy object traversal without having to check the object's
type, and could permit the use of external tools to scan the SHM in the
future. Each object stores a guid (of GUID_MAX_LEN+1 size) and tgid
which allows to match corresponding shared counters indexes. Also,
as stated before, each object stores the list of users making use of
it. Objects are never released (the map can only grow), but unused
objects (when no more users or active users are found in objects->users)
are automatically recycled. Also, each object stores its
type which defines how the object generic data member should be handled.
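
To show why a fixed object size makes traversal easy, here is a Python
sketch using the struct module. The header size, the field widths and
the helper names are all hypothetical; only the principle (same-sized
objects stored right after the header, matched on guid and tgid) comes
from the description above.

```python
import struct

HDR_SIZE = 16                       # hypothetical header size
# Hypothetical object: guid (32 bytes), tgid, type, 3 pad bytes, users mask
OBJ = struct.Struct("<32sIB3xQ")

def iter_objects(shm, count):
    """Walk the objects by stepping over fixed-size records."""
    for idx in range(count):
        guid, tgid, otype, users = OBJ.unpack_from(shm, HDR_SIZE + idx * OBJ.size)
        yield idx, guid.rstrip(b"\0").decode(), tgid, otype, users

def find_object(shm, count, guid, tgid):
    """Return the index of the object matching guid and tgid, or None."""
    for idx, obj_guid, obj_tgid, _otype, _users in iter_objects(shm, count):
        if obj_guid == guid and obj_tgid == tgid:
            return idx
    return None

def first_recyclable(shm, count):
    """Return the index of the first object without any user, or None."""
    for idx, _guid, _tgid, _otype, users in iter_objects(shm, count):
        if users == 0:
            return idx
    return None
```

No object type needs to be inspected to reach the next record, so an
external tool could scan the area the same way.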

Upon startup (or reload), haproxy first tries to scan existing shm to
find objects that could be associated to frontends, backends, listeners
or servers in the current config based on GUID. For associations that
couldn't be made, haproxy will automatically create missing objects in
the SHM during late startup. When haproxy matches with an existing object,
it means the counter from an older process is preserved in the new
process, so multiple processes temporarily share the same counter for as
long as required for older processes to eventually exit.
2025-09-03 15:59:37 +02:00
Aurelien DARRAGON
ee17d20245 MINOR: stats-file: add process slot management for shm stats file
Now that all processes tied to the same shm stats file share a
common clock source, we introduce the process slot notion in this
patch.

Each living process registers itself in a map at a free index: each slot
stores information about the process' PID and heartbeat. Each process is
responsible for updating its heartbeat, a slot is considered as "free" if
the heartbeat was never set or if the heartbeat is expired (60 seconds of
inactivity). The total number of slots is set to 64, this is on purpose
because it makes it easy to store the "users" of a given shm object using
a 64-bit bitmask. Given that when haproxy is reloaded older processes
are supposed to die eventually, it should be large enough (64 simultaneous
processes) to be safe. If we manage to reach this limit someday, more
slots could be added by splitting the "users" bitmask into multiple 64-bit
variables.
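
The slot and bitmask logic can be sketched in Python as below. Slot,
register() and the *_user() helpers are hypothetical names, the real
code works on shared memory with atomic operations, and treating a
heartbeat exactly 60 seconds old as expired is an assumption.

```python
MAX_SLOTS = 64
HEARTBEAT_TIMEOUT_MS = 60_000

class Slot:
    def __init__(self):
        self.pid = 0
        self.heartbeat_ms = None    # None means the heartbeat was never set

def slot_is_free(slot, now_ms):
    """A slot is free if it was never used or its heartbeat expired."""
    if slot.heartbeat_ms is None:
        return True
    return now_ms - slot.heartbeat_ms >= HEARTBEAT_TIMEOUT_MS

def register(slots, pid, now_ms):
    """Take the first free slot for this process, or return None if full."""
    for idx, slot in enumerate(slots):
        if slot_is_free(slot, now_ms):
            slot.pid = pid
            slot.heartbeat_ms = now_ms
            return idx
    return None

def add_user(users, idx):
    """Mark slot idx as a user of an object in its 64-bit mask."""
    return users | (1 << idx)

def del_user(users, idx):
    """Clear slot idx from the mask."""
    return users & ~(1 << idx)

def is_user(users, idx):
    return bool(users & (1 << idx))
```

Since a slot index is always below 64, one mask bit per slot is enough
to tell which processes still reference an object.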
2025-09-03 15:59:33 +02:00
Aurelien DARRAGON
443e657fd6 MEDIUM: stats-file: processes share the same clock source from shm-stats-file
The use of the "shm-stats-file" directive now implies that all processes
using the same file share a common clock source, this is required
for consistency regarding time-related operations.

The clock source is stored in the shm stats file header.
When the directive is set, all processes share the same clock
(global_now_ms and global_now_ns both point to variables in the map),
this is required for time-based counters such as freq counters to work
consistently. Since all processes manipulate the global clock with atomic
operations exclusively during runtime, and don't systematically rely
on it (thanks to local now_ms and now_ns), it is pretty much transparent.
2025-09-03 15:59:27 +02:00
Aurelien DARRAGON
c91d93ed1c MINOR: stats-file: introduce shm-stats-file directive
Add initial support for the "shm-stats-file" directive and
associated "shm-stats-file-max-objects" directive. For now they are
flagged as experimental directives.

The shared memory file is automatically created by the first process.
The file is created using open() so it is up to the user to provide
a relevant path (either on a regular filesystem or ramfs for performance
reasons). The directive takes only one argument which is the path of the
shared memory file. It is passed as-is to open().

The maximum number of objects per thread-group (hard limit) that can be
stored in the shm is defined by the "shm-stats-file-max-objects" directive
and defaults to 2k, which means approximately 1mb max per thread group
and should cover most setups. When the limit is reached (during startup)
an error is reported by haproxy which invites the user to increase the
"shm-stats-file-max-objects" if desired, but this means more memory will
be allocated. Actual memory usage is low at start, because only the mmap
(mapping) is provisioned with the maximum number of objects to avoid
relocating the memory area during runtime, but the actual shared memory
file is dynamically resized when objects are added (resized by following
half power of 2 curve when new objects are added, see upcoming commits).

Upon initial creation, the main shm stats file header is provisioned with
the version, which must remain the same to be compatible between processes.

For now only the file is created, further logic will be implemented in
upcoming commits.
2025-09-03 15:59:22 +02:00
Aurelien DARRAGON
cb08bcb9d6 MINOR: counters: retrieve detailed errmsg upon failure with counters_{fe,be}_shared_prepare()
counters_{fe,be}_shared_prepare now take an extra <errmsg> parameter
that contains additional hints about the error in case of failure.

It must be freed accordingly since it is allocated using memprintf
2025-09-03 15:59:17 +02:00
Willy Tarreau
46463d6850 OPTIM: stick-tables: exit expiry faster when the update lock is held
It helps keep the contention level low: when we hold the update lock
that we know other parts may be relying on (peers, track-sc etc),
we decrease the remaining visit counters 4 times as fast to further
reduce the contention. At this point no more warnings are seen during
intense synchronization (2x64 cores, 1.5M req/s with a track-sc each,
5M entries in use).
2025-09-03 15:51:13 +02:00
Willy Tarreau
696793205b MINOR: stick-tables: limit the number of visited nodes during expiration
As reported by Felipe in GH issue #3084, on large systems it's not
sufficient to leave the expiration process after a certain number of
expired entries, because if they accumulate too fast, it's possible
to still spend a lot of time visiting many entries (e.g. those still in
use).

Thus here we're taking a stricter approach consisting in counting the
number of visited entries, which allows leaving early if we can't do
the expected work in a reasonable amount of time.

In order to avoid always stopping on first shards and never visiting
last ones, we're always starting from a random shard number and looping
from that one. This way even if we always leave early, all shards will
be handled equally.

This should be backported to 3.2.
2025-09-03 15:51:13 +02:00
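The approach described in 696793205b (a budget on visited entries combined with a random starting shard) can be sketched as follows. This is only an illustration under assumptions: NB_SHARDS, the pending[] array and both function names are hypothetical, and rand() merely stands in for whatever random source the real code uses.

```c
#include <stdlib.h>

#define NB_SHARDS 8 /* hypothetical shard count */

/* Visits at most <budget> entries across all shards, starting from shard
 * <start> and wrapping around. pending[i] is the number of entries still
 * waiting in shard i. Returns the number of entries visited. */
int expire_pass(int pending[NB_SHARDS], int start, int budget)
{
	int visited = 0;

	for (int i = 0; i < NB_SHARDS && visited < budget; i++) {
		int shard = (start + i) % NB_SHARDS;

		while (pending[shard] > 0 && visited < budget) {
			pending[shard]--;
			visited++;
		}
	}
	return visited;
}

/* Picks the starting shard at random so that repeated early exits do not
 * always starve the last shards. */
int expire_pass_random(int pending[NB_SHARDS], int budget)
{
	return expire_pass(pending, rand() % NB_SHARDS, budget);
}
```

With a fixed starting shard, a pass that always exhausts its budget would never reach the last shards. Randomizing the start spreads the work evenly over time.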
Willy Tarreau
2421c3769a BUG/MEDIUM: peers: don't fail twice to grab the update lock
When the expire task is running fast (i.e. running almost alone), it's
super hard to grab the update lock and peers can easily trigger the
watchdog because the time it takes to grab this lock is multiplied by
the number of updates to perform. This is easier to trigger at the end
of an injection session where the expire task is omni-present. Let's
just record that we failed once and don't fail a second time in the
loop.

This should be backported to 3.2, but probably not further given that
this area changed significantly in 3.2.
2025-09-03 15:51:13 +02:00
Willy Tarreau
324f0a60ab BUG/MINOR: stick-tables: never leave used entries without expiration
When trying to kill/expire entries, if a ref-counted entry is found,
let's requeue it with its expiration timer instead of leaving it out,
because other ref-counters (e.g. peers) will not purge it otherwise,
leaving it orphan. This one seems trickier to trigger, though it seems
to happen sometimes when peers are late and a long resync is active
and competing with intense calls to process_table_expire() (i.e. when
no other activity is there).

This must be backported to 3.2. It's likely that older versions are
affected as well, but possibly differently since the expiration
mechanism changed between 3.1 and 3.2, so better not take unneeded
risks there.
2025-09-03 15:51:13 +02:00
Willy Tarreau
8da6ed6b6a BUG/MEDIUM: stick-tables: don't leave the expire loop with elements deleted
In 3.2, the table expiration latency was improved by commit 994cc58576
("MEDIUM: stick-tables: Limit the number of entries we expire"), however
it introduced an issue by which it's possible to leave the loop after a
certain number of elements were expired, without requeuing the deleted
elements. The issue it causes is that other places with a non-null ref_cnt
will not necessarily delete it themselves, resulting in orphan elements in
the table. These ones will then pollute it and force recycling old ones
more often which in turn results in an increase of the contention.

Let's check for the expiration counter before deleting the element so
that it can be found upon next visit.

This fix must be backported to 3.2. It is directly related to GH
issue #3084. Thanks to Felipe and Ricardo for sharing precious info
and testing a candidate fix.
2025-09-03 15:51:13 +02:00
William Lallemand
554a15562f MEDIUM: cfgparse: warn when using user/group when built statically
In issue #3013, a user observed a crash at startup of haproxy when
building statically and using the "user" global section.

This is a known problem of the glibc and the linker even warns about
this:

> warning: Using 'getgrnam' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
> warning: Using 'getpwnam' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

Let's emit a warning when using user/group in this case.
2025-09-03 14:45:00 +02:00
Ilia Shipitsin
3354719709 CI: fix syntax of Quic Interop pipelines
Previously, the wrong syntax was used to pass build arguments, so
images were built using the default SSLLIB=QuicTLS-1.1.1.
2025-09-03 11:36:14 +02:00
Frederic Lecaille
58b153b882 MINOR: quic: Add more information about RX packets
This patch is very useful to debug issues at RX packet processing level.

Should be easily backported as far as 2.6 (for debug purposes).
2025-09-03 09:41:38 +02:00
Willy Tarreau
4902195313 BUILD: acl: silence a possible null deref warning in parse_acl_expr()
The fix in commit 441cd614f9 ("BUG/MINOR: acl: set arg_list->kw to
aclkw->kw string literal if aclkw is found") involves an unchecked
access to "al" after that one is tested for possibly being NULL. This
rightfully upsets Coverity (GH #3095) and might also trigger warnings
depending on the compilers. However, no known caller to date passes
a NULL arg list here so there's no way to trigger this theoretical
bug.

This should be backported along with the fix above to avoid emitting
warnings, possibly as far as 2.6 since that fix was tagged as such.
2025-09-02 17:41:51 +02:00
Willy Tarreau
c128887b8e BUG/MINOR: haproxy: be sure not to quit too early on soft stop
The fix in 4a9e3e102e ("BUG/MINOR: haproxy: only tid 0 must not sleep
if got signal") had the nasty side effect of breaking the graceful
reload operations: threads whose id is non-zero could quit too early and
not process incoming traffic, which is visible with broken connections
during reloads. They just need to ignore the stopping condition
until the signal queue is empty. In any case, it's the thread in charge
of the signal queue which will notify them once it receives the signal.

It was verified that connections are no longer broken with this fix,
and that the issue that required it (#2537, looping threads on reload)
does not re-appear with the reproducer, while it still did without the
fix above. Since the fix above was backported to every stable version,
this one will also have to.
2025-09-02 11:33:14 +02:00
William Lallemand
ce57f11991 DOC: configuration: rework the jwt_verify keyword documentation
Split the documentation in multiple sections:

- Explanation about what it does and how
- <alg> parameter with array of parameters
- <key> parameter with details about certificates and public keys
- Return value

Others changes:

- certificates does not need to be known during configuration parsing
- differences between public key and certificate
2025-09-02 11:16:42 +02:00
Amaury Denoyelle
36d28bfca3 MEDIUM: quic: strengthen BUG_ON() for unpad Initial packet on client
To avoid hitting the anti-amplification limit, Initial packets must be
padded to at least 1200 bytes. On the server side, this only applies to
ack-eliciting packets. On the client side, however, it is mandatory for
every packet.

This patch adjusts the qc_txb_store() BUG_ON statement used to catch too
small Initial packets. On the QUIC client side, the ack-eliciting flag is
now ignored, so every packet is checked.

This is labelled as MEDIUM as this BUG_ON() is known to be easily
triggered, as QUIC datagram encoding functions are complex. However,
it's important that a QUIC endpoint respects it, else the peer will drop
the invalid packet and could immediately close the connection.
2025-09-02 10:41:49 +02:00
Amaury Denoyelle
209a54d539 BUG/MINOR: quic: pad Initial pkt with CONNECTION_CLOSE on client
Currently, when connection is closing, only CONNECTION_CLOSE frame is
emitted via qc_prep_pkts()/qc_do_build_pkt(). Also, only the first
registered encryption level is considered while the others are
dismissed. This results in a single packet datagram.

This can cause issues for QUIC client support, as padding is required
for every Initial packet, contrary to server side where only
ack-eliciting packets are eligible. Thus a client must add padding to a
CONNECTION_CLOSE frame on Initial level.

This patch adjusts qc_prep_pkts() to ensure such packet will be
correctly padded on client side. It sets <final_packet> variable which
instructs that if padding is necessary it must be applied immediately on
the current encryption level instead of the last one.

It could appear as unnecessary to pad a CONNECTION_CLOSE packet, as the
peer will enter in draining state when processing it. However, RFC
mandates that a client Initial packet too small must be dropped by the
server, so there is a risk that the CONNECTION_CLOSE is simply discarded
prior to its processing if stored in a too small datagram.

No need to backport as this is a QUIC backend issue only.
2025-09-02 10:34:12 +02:00
Amaury Denoyelle
e9b78e3fb1 BUG/MINOR: quic: fix padding issue on INITIAL retransmit
On loss detection timer expiration, qc_dgrams_retransmit() is used to
reemit lost packets. Different code paths are present depending on the
active encryption level.

If Initial level is still initialized, retransmit is performed both for
Initial and Handshake spaces, by first retrieving the list of lost
frames for each of them.

Prior to this patch, Handshake level was always registered for emission
after Initial, even if it did not have any frame to reemit. In this
case, most of the time it would result in a datagram containing Initial
reemitted frames packet coalesced with a Handshake packet consisting
only of a PADDING frame. This is because padding is only added for the
last registered QEL.

For QUIC backend support, this may cause issues. This is because
contrary to QUIC server side, Initial and Handshake levels keys are not
derived simultaneously for a QUIC client. Thus, if the latter keys are
unavailable, Handshake packet cannot be encoded in sending, leaving a
single Initial packet. However, this is now too late to add PADDING.
Thus the resulting datagram is invalid : this triggers the BUG_ON()
assert failure located on qc_txb_store().

This patch fixes this by amending qc_dgrams_retransmit(). Now, Handshake
level is only registered for emission if there is frame to retransmit,
which implies that Handshake keys are already available. Thus, PADDING
will now either be added at Initial or Handshake level as expected.

Note that this issue should not be present on QUIC frontend, due to
Initial and Handshake keys derivation almost simultaneously. However,
this should still be backported up to 3.0.
2025-09-02 10:31:32 +02:00
Amaury Denoyelle
34d5bfd23c BUG/MINOR: quic: fix room check if padding requested
qc_prep_pkts() activates padding when building an Initial packet. This
ensures that resulting datagram will always be at least 1200 bytes,
which is mandatory to prevent deadlock over anti-amplification.

Prior to padding activation, a check is performed to ensure that output
buffer is big enough for a padded datagram. However, this did not take
into account previously built packets which would be coalesced in the
same datagram. Thus this patch fixes this comparison check.

In theory, prior to this patch, in some cases Initial packets could not
be built despite a datagram of the proper size. Currently, this probably
never happens as Initial packet is always the first encoded in a
datagram, thus there is no coalesced packet prior to it. However, there
is no hard requirement on this, so it's better to reflect this in the
code.

This should be backported up to 2.6.
2025-09-02 10:29:11 +02:00
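One possible reading of the room check fixed in 34d5bfd23c is sketched below. It assumes the check compares the space left in the output buffer with the padded datagram size, and that bytes already written for coalesced packets count toward that size. The function and parameter names are hypothetical and this is not the actual qc_prep_pkts() code.

```c
#include <stddef.h>

/* Minimum size of a datagram carrying an Initial packet. */
#define INITIAL_MIN_DGRAM_LEN 1200

/* Returns 1 if a datagram that already holds <coalesced_len> bytes of
 * previously built packets can still be padded up to the minimum size
 * using the <remaining> free bytes of the output buffer, 0 otherwise.
 * Ignoring <coalesced_len> would make the check needlessly strict. */
int padding_room_ok(size_t remaining, size_t coalesced_len)
{
	return remaining + coalesced_len >= INITIAL_MIN_DGRAM_LEN;
}
```

For example, 1000 free bytes are not enough for an empty datagram, but they are enough once 300 bytes of coalesced packets are already present in the same datagram.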
Amaury Denoyelle
a84b404b34 MINOR: quic/flags: complete missing flags
Add missing quic_conn flags definition for dev utility.
2025-09-02 09:37:43 +02:00
Frederic Lecaille
fba80c7fe8 BUG/MINOR: quic: ignore AGAIN ncbuf err when parsing CRYPTO frames
This fix follows this previous one:

    BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

which is not sufficient when a client fragments and mixes its CRYPTO frames AND
leaves holes by packet. ngtcp2 (and perhaps chrome) splits their CRYPTO
frames but without holes by packet. In such a case, the CRYPTO parsing leads to
QUIC_RX_RET_FRM_AGAIN errors which cannot be fixed when the peer resends its packets.
Indeed, even if the peer resends its frames in a different order, this does not
help because since the previous commit, the CRYPTO frames are ordered on haproxy side.

This issue was detected thanks to the interop tests with quic-go as client. This
client fragments its CRYPTO frames, mixes them, and generates holes, most of
the time with the retry test.

To fix this, when a QUIC_RX_RET_FRM_AGAIN error is encountered, the CRYPTO frames
parsing is not stopped. This gives the next CRYPTO frames a chance to be parsed.

Must be backported as far as 2.6 as the commit mentioned above.
2025-09-02 08:13:58 +02:00
Alexander Stephan
26776c7b8f BUG/MINOR: tools: Add OOM check for malloc() in indent_msg()
This patch adds a missing out-of-memory (OOM) check after
the call to `malloc()` in `indent_msg()`. If memory
allocation fails, the function returns NULL to prevent
undefined behavior.

Co-authored-by: Christian Norbert Menges <christian.norbert.menges@sap.com>
2025-09-02 07:29:54 +02:00
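The series of OOM fixes starting with 26776c7b8f all follow the same pattern: test the allocator's result before using it. A minimal sketch, using a hypothetical indent_copy() helper rather than the real indent_msg():

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of <msg> prefixed
 * with <level> spaces, or NULL if the allocation fails. The caller is
 * responsible for freeing the result. */
char *indent_copy(const char *msg, size_t level)
{
	size_t len = strlen(msg);
	char *out = malloc(level + len + 1);

	if (!out)
		return NULL; /* OOM: report failure instead of writing through NULL */

	memset(out, ' ', level);
	memcpy(out + level, msg, len + 1); /* copies the trailing NUL too */
	return out;
}
```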
Alexander Stephan
aa20905ac9 BUG/MINOR: compression: Add OOM check for calloc() in parse_compression_options()
This patch adds a missing out-of-memory (OOM) check after
the call to `calloc()` in `parse_compression_options()`. If
memory allocation fails, an error message is set, the function
returns -1, and parsing is aborted to ensure safe handling
of low-memory conditions.

Co-authored-by: Christian Norbert Menges <christian.norbert.menges@sap.com>
2025-09-02 07:29:54 +02:00
Alexander Stephan
73f9a75894 BUG/MINOR: cfgparse: Add OOM check for calloc() in cfg_parse_listen()
This commit adds a missing out-of-memory (OOM) check
after the call to `calloc()` in `cfg_parse_listen()`.
If memory allocation fails, an alert is logged, error
codes are set, and parsing is aborted to prevent
undefined behavior.

Co-authored-by: Christian Norbert Menges <christian.norbert.menges@sap.com>
2025-09-02 07:29:54 +02:00
Alexander Stephan
c3e69cf065 BUG/MINOR: acl: Add OOM check for calloc() in smp_fetch_acl_parse()
This patch adds a missing out-of-memory (OOM) check after
the call to `calloc()` in `smp_fetch_acl_parse()`. If
memory allocation fails, an error message is set and
the function returns 0, improving robustness in
low-memory situations.

Co-authored-by: Christian Norbert Menges <christian.norbert.menges@sap.com>
2025-09-02 07:29:54 +02:00
Alexander Stephan
22ac1f5ee9 BUG/MINOR: log: Add OOM checks for calloc() and malloc() in logformat parser and dup_logger()
This patch adds missing out-of-memory (OOM) checks after calls
to `calloc()` and `malloc()` in the logformat parser and the
`dup_logger()` function. If memory allocation fails, an error
is reported or NULL is returned, preventing undefined behavior
in low-memory conditions.

Co-authored-by: Christian Norbert Menges <christian.norbert.menges@sap.com>
2025-09-02 07:29:54 +02:00
Alexander Stephan
fbd0fb20a2 BUG/MINOR: halog: Add OOM checks for calloc() in filter_count_srv_status() and filter_count_url()
This patch adds missing out-of-memory (OOM) checks after calls to
calloc() in the functions `filter_count_srv_status()` and `filter_count_url()`.
If memory allocation fails, an error message is printed to stderr
and the process exits with status 1. This improves robustness
and prevents undefined behavior in low-memory situations.

Co-authored-by: Christian Norbert Menges <christian.norbert.menges@sap.com>
2025-09-02 07:29:54 +02:00
Christopher Faulet
8c555a4a4e BUG/MINOR: acl: Properly detect overwritten matching method
A bug was introduced by the commit 6ea50ba46 ("MINOR: acl; Warn when
matching method based on a suffix is overwritten"). The test on the match
function, when defined was not correct. It is now fixed.

No backport needed, except if the commit above is backported.
2025-09-01 21:36:25 +02:00
Christopher Faulet
f8b7299ee7 BUG/MINOR: server: Duplicate healthcheck's sni inherited from default server
It is not really an issue, but the "check-sni" value inherited from a default
server is not duplicated while the parameter value is duplicated during the
parsing. So here there is a small leak if several "check-sni" parameters are
used on the same server line. The previous value is never released. But to
fix this issue, the value inherited from the default server must also be
duplicated. At the end it is safer this way and consistent with the parsing
of the "sni" parameter.

It is harmless so there is no reason to backport this patch.
2025-09-01 15:45:05 +02:00
Christopher Faulet
f7a04b428a BUG/MEDIUM: server: Duplicate healthcheck's alpn inherited from default server
When "check-alpn" parameter is inherited from the default server, the value
is not duplicated, the pointer of the default server is used. However, when
this parameter is overridden, the old value is released. So the "check-alpn"
value of the default server is released. So it is possible to have a UAF
if another server inherits from the same default server.

To fix the issue, the "check-alpn" parameter must be handled the same way
the "alpn" is. The default value is duplicated. So it could be safely
released if it is forced on the server line.

This patch should fix the issue #3096. It must be backported to all stable
versions.
2025-09-01 15:45:05 +02:00
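The ownership problem behind f7a04b428a can be sketched with a simplified structure. The type and function names below are hypothetical and do not match HAProxy's server code. The idea shown is only that each server gets its own copy of the inherited string, so overriding it later never frees memory that the default server or a sibling still points to.

```c
#include <stdlib.h>
#include <string.h>

struct srv_sketch {
	char *check_alpn; /* owned by this server, may be NULL */
};

/* Returns an allocated copy of <s>, or NULL on OOM. */
char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Inherits the setting from the default server by duplicating it.
 * Returns 0 on success, -1 on OOM. */
int srv_inherit(struct srv_sketch *srv, const struct srv_sketch *defsrv)
{
	srv->check_alpn = NULL;
	if (defsrv->check_alpn) {
		srv->check_alpn = dup_str(defsrv->check_alpn);
		if (!srv->check_alpn)
			return -1;
	}
	return 0;
}

/* Overrides the setting. Freeing the old value is safe because it is a
 * private copy. Returns 0 on success, -1 on OOM. */
int srv_set_check_alpn(struct srv_sketch *srv, const char *value)
{
	char *copy = dup_str(value);

	if (!copy)
		return -1;
	free(srv->check_alpn);
	srv->check_alpn = copy;
	return 0;
}
```

Had srv_inherit() copied the pointer instead of the string, the free() in srv_set_check_alpn() would release the default server's value and leave every other inheriting server with a dangling pointer.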
Christopher Faulet
6ea50ba462 MINOR: acl; Warn when matching method based on a suffix is overwritten
From time to time, issues are reported about string matching based on suffix
(for instance path_beg). Each time, it appears these ACLs are used in
conjunction with a converter or followed by an explicit matching method
(-m).

Unfortunately, it is not an issue but an expected behavior, while it is not
obvious. Matching suffixes can be considered as aliases on the corresponding
'-m' matching method. Thus "path_beg" is equivalent to "path -m beg". When a
converter is used the original matching (string) is used and the suffix is
lost. When followed by an explicit matching method, it overwrites the
matching method based on the suffix.

It is expected but confusing. Thus now a warning is emitted because it is a
configuration issue for sure. Following sample fetch functions are concerned:

 * base
 * path
 * req.cook
 * req.hdr
 * res.hdr
 * url
 * urlp

The configuration manual was modified to make it less ambiguous.
2025-09-01 15:45:05 +02:00
Christopher Faulet
c51ddd5c38 MINOR: acl: Only allow one '-m' matching method
Several '-m' explicit matching methods were allowed, but only the last one was
really used. There is no reason to specify several matching methods and it is
most probably an error or a lack of understanding of how matchings are
performed. So now, an error is triggered during the configuration parsing to
avoid any bad usage.
2025-09-01 15:45:05 +02:00
Christopher Faulet
d09d7676d0 REG-TESTS: map_redirect: Don't use hdr_dom in ACLs with "-m end" matching method
hdr_dom() is an alias of "hdr() -m dom". So using it with another explicit
matching method does not work because the matching on the domain will never
be performed. Only the last matching method is used. The script was working
by chance because no port was set on host header values.

The script was fixed by using "host_only" converter. In addition, host
header values were changed to have a port now.
2025-09-01 15:45:05 +02:00
Amaury Denoyelle
1868ca9a95 MINOR: conn/muxes/ssl: add ASSUME_NONNULL() prior to _srv_add_idle
When manipulating idle backend connections for input/output processing,
special care is taken to ensure the connection cannot be accessed by
another thread, for example via a takeover. When processing is over,
connection is reinserted in its original list.

A connection can either be attached to a session (private ones) or a
server idle tree. In the latter case, <srv> is guaranteed to be non null
prior to _srv_add_idle() thanks to CO_FL_LIST_MASK comparison with conn
flags. This patch adds an ASSUME_NONNULL() to better reflect this.

This should fix coverity reports from github issue #3095.
2025-09-01 15:35:22 +02:00
Amaury Denoyelle
dcf2261612 BUG/MAJOR: mux-quic: fix crash on reload during emission
MUX QUIC restricts buffer allocation per connection based on the
underlying congestion window. If a QCS instance cannot allocate a new
buffer, it is put in a buf_wait list. Typically, this will cause stream
upper layer to subscribe for sending.

A BUG_ON() was present on snd_buf and nego_ff callback prologue to
ensure that these functions were not called if QCS is already in
buf_wait list. The objective was to guarantee that there is no wake up
on a stream if it cannot allocate a buffer.

However, this BUG_ON() is not correct, as it can be fired legitimately.
Indeed, stream layer can retry emission even if no wake up occurred. This
case can happen on reload. Thus, BUG_ON() will cause an unexpected
crash.

Fix this by removing these BUG_ON(). Instead, snd_buf/nego_ff callbacks
ensure that QCS is not subscribed in buf_wait list. If this is the case,
a null value will be returned, which is sufficient for the stream layer
to pause emission and subscribe if necessary.

Occurrences for this crash have been reported on the mailing list. It is
also the subject of github issue #3080, which should be fixed with this
patch.

This must be backported up to 3.0.
2025-09-01 15:35:22 +02:00
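The change of strategy in dcf2261612 (returning zero instead of asserting when the stream is already waiting for a buffer) can be sketched as follows. The structure and function are hypothetical simplifications, not the real QCS type or snd_buf callback.

```c
#include <stddef.h>

struct qcs_sketch {
	int in_buf_wait; /* non-zero while the stream waits for a buffer */
	size_t sent;     /* total bytes accepted for emission */
};

/* Hypothetical send callback: returns the number of bytes accepted.
 * When the stream is already queued for a buffer, nothing can be sent,
 * so 0 is returned and the caller is expected to pause and subscribe.
 * An assertion here would crash on a legitimate retry. */
size_t snd_buf_sketch(struct qcs_sketch *qcs, size_t count)
{
	if (qcs->in_buf_wait)
		return 0;

	qcs->sent += count;
	return count;
}
```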
Frederic Lecaille
800ba73a9c BUG/MEDIUM: quic: CRYPTO frame freeing without eb_delete()
Since this commit:

	BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

when they are parsed, the CRYPTO frames are ordered by their offsets into an ebtree.
Then their data are provided to the ncbufs.

But in case of error (when qc_handle_crypto_frm() returns QUIC_RX_RET_FRM_FATAL
or QUIC_RX_RET_FRM_AGAIN), they remain attached to their tree. Then
from <err> label, they are detached and deleted (with a while(node) { eb_delete();
qc_frm_free();} loop). But before this loop, these statements directly
free the frame without deleting it from its tree, if this is a CRYPTO frame,
leading to a use after free when running the loop:

     if (frm)
	    qc_frm_free(qc, &frm);

This issue was detected by the interop tests, with quic-go as client. Weirdly, this
client sends CRYPTO frames by packet with holes.
Must be backported as far as 2.6 as the commit mentioned above.
2025-09-01 10:39:00 +02:00
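The bug class fixed by 800ba73a9c (freeing an element that is still linked into a container, then walking that container) can be illustrated with a plain circular doubly-linked list standing in for the ebtree. All names below are hypothetical. The real code uses eb_delete() and qc_frm_free().

```c
#include <stdlib.h>

struct frm_sketch {
	struct frm_sketch *prev, *next; /* stand-in for the ebtree node */
};

struct frm_list {
	struct frm_sketch head; /* circular sentinel */
};

void frm_list_init(struct frm_list *l)
{
	l->head.prev = l->head.next = &l->head;
}

/* Appends <f> at the tail of the list. */
void frm_insert(struct frm_list *l, struct frm_sketch *f)
{
	f->prev = l->head.prev;
	f->next = &l->head;
	l->head.prev->next = f;
	l->head.prev = f;
}

/* Detaches the frame from its container first, then frees it. Freeing
 * without detaching would leave a dangling pointer that the cleanup
 * loop below would follow (use after free). */
void frm_unlink_and_free(struct frm_sketch *f)
{
	f->prev->next = f->next;
	f->next->prev = f->prev;
	free(f);
}

/* Cleanup loop: releases every remaining frame, returns how many. */
int frm_list_purge(struct frm_list *l)
{
	int n = 0;

	while (l->head.next != &l->head) {
		frm_unlink_and_free(l->head.next);
		n++;
	}
	return n;
}
```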
Frederic Lecaille
90126ec9b7 CLEANUP: quic: remove a useless CRYPTO frame variable assignment
This modification should have arrived with this commit:

	MINOR: quic: remove ->offset qf_crypto struct field

Since this commit, the CRYPTO offset node key assignment is done at parsing time
when calling qc_parse_frm() from qc_parse_pkt_frms().

This useless assignment has been reported in GH #3095 by coverity.

This patch should be easily backported as far as 2.6 as the one mentioned above
to ease any further backport to come.
2025-09-01 09:31:04 +02:00
Collison, Steven
00be358426 DOC: proxy-protocol: Make example for PP2_SUBTYPE_SSL_SIG_ALG accurate
The docs call out that this field is the algorithm used to
sign the certificate. However, the example only had the hash portion of
the signature algorithm. This change updates the example to be accurate
based on a value written by HAProxy, which is based on an OID for
signature algorithms. I based example on a real TLV written by
HAProxy on my machine with all SSL TLVs enabled in config.
2025-08-29 16:26:57 +02:00
Amaury Denoyelle
1517869145 BUG/BUILD: stats: fix build due to missing stat enum definition
Recently, new server counter for private idle connections have been
added to statistics output. However, the patch was missing
ST_I_PX_PRIV_IDLE_CUR enum definition.

No need to backport.
2025-08-29 09:32:10 +02:00
Christopher Faulet
8f3b537547 MEDIUM: proxy: Reject some header names for 'http-send-name-header' directive
From time to time, we saw the 'http-send-name-header' directive used to
overwrite the Host header to workaround limitations of a buggy application.
Most of time, this led to troubles. This was never officially supported and
each time we strongly discouraged anyone to do so. We already thought to
deprecate this directive, but it seems to be still used by few people. So
for now, we decided to strengthen the tests performed on it.

The header name is now checked during the configuration parsing to forbid
some risky names. 'Host', 'Content-Length', 'Transfer-Encoding' and
'Connection' header names are now rejected. But more headers could be added
in future.
2025-08-29 09:27:01 +02:00
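The name check introduced by 8f3b537547 can be sketched as a lookup in a short deny list. The function names are hypothetical, and the case-insensitive comparison is an assumption based on HTTP header names being case-insensitive. The commit message does not say how the real check compares names.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if <a> and <b> are equal ignoring ASCII case, 0 otherwise. */
int name_eq_nocase(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/* Returns 1 if <name> is one of the header names listed in the commit
 * message as rejected for 'http-send-name-header', 0 otherwise. */
int send_name_header_is_forbidden(const char *name)
{
	static const char *const forbidden[] = {
		"Host", "Content-Length", "Transfer-Encoding", "Connection",
	};

	for (size_t i = 0; i < sizeof(forbidden) / sizeof(forbidden[0]); i++) {
		if (name_eq_nocase(name, forbidden[i]))
			return 1;
	}
	return 0;
}
```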
Amaury Denoyelle
2afcba1eb7 MINOR: proxy: extend "show servers conn" output
CLI command "show servers conn" is used as a debugging tool to monitor
the number of connections per server. This patch extends its output by
adding the content of two server counters.

<served> is the first added column. It represents the number of active
streams on a server. <curr_sess_idle_conns> is the second added column.
This is a recently added value which accounts for private idle connections
referencing a server.
2025-08-28 18:58:11 +02:00
Amaury Denoyelle
fac1de935a MINOR: stats: display new curr_sess_idle_conns server counter
Add a new stats column in proxy stats to display server counter for
private idle connections. This counter has been introduced recently.

The value is displayed on CSV output on the last column before modules.
It is also displayed on HTML page alongside other idle server counters.
2025-08-28 18:58:11 +02:00
Amaury Denoyelle
fb43343f6f MINOR: doc: add missing statistics column
Complete documentation with missing description of newly added columns.

This must be backported up to 2.8.
2025-08-28 18:58:11 +02:00
Amaury Denoyelle
f0710a1fbc MINOR: doc: add missing statistics column
Complete documentation with missing description of newly added columns.

This should be backported up to 2.4
2025-08-28 18:58:11 +02:00
William Lallemand
e0ec01849f DOC: configuration: confuse "strict-mode" with "zero-warning"
4b10302fd8 ("MINOR: cfgparse: implement a simple if/elif/else/endif
macro block handler") introduces a confusion between "strict-mode" and
"zero-warning".

This patch fixes the issue by replacing "strict-mode" with "zero-warning"
in section 2.4. Conditional blocks.

Must be backported as far as 2.4.
2025-08-28 17:35:06 +02:00
Amaury Denoyelle
21f7974e05 OPTIM: backend: set release on takeover for strict maxconn
When strict maxconn is enforced on a server, it may be necessary to kill
an idle connection to never exceed the limit. To be able to delete a
connection from any thread, takeover is first used to migrate it on the
current thread prior to its deletion.

As takeover is performed to delete a connection instead of reusing it,
<release> argument can be set to true. This removes unnecessary
allocations of resources prior to connection deletion. As such, this
patch is a small optimization for strict maxconn implementation.

Note that this patch depends on the previous one which removes any
assumption in takeover implementation that thread isolation is active if
<release> is true.
2025-08-28 16:11:32 +02:00
Amaury Denoyelle
d971d3fed8 MINOR: muxes: adjust takeover with buf_wait interaction
Takeover operation defines an argument <release>. It's a boolean which,
if set, indicates that connection resources freed during the takeover do
not have to be reallocated on the new thread. Typically, it is set to
false when takeover is performed to reuse a connection. However, when
used to be able to delete a connection from a different thread,
<release> should be set to true.

Previously, <release> was only set in conjunction with "del server"
handler. This operation was performed under thread isolation, which
guarantees that non-thread-safe operations such as removal from buf_wait
list could be performed on takeover if <release> was true. In the
contrary case, takeover operation would fail.

Recently, "del server" handler has been adjusted to remove idle
connection cleanup with takeover. As such, <release> is never set to
true in remaining takeover usage.

However, takeover is also used to enforce strict-maxconn on a server.
This is performed to delete a connection from any thread, which is the
primary reason for setting <release> to true. But for the moment, as
takeover implementers consider that thread isolation is active if
<release> is set, this is not yet applicable for strict-maxconn usage.

Thus, the purpose of this patch is to adjust takeover implementation.
Remove assumption between <release> and thread-isolation mode. If it's
not possible to remove a connection from a buf_wait list, an error will
be returned in any case.
2025-08-28 16:09:48 +02:00
William Lallemand
8a456399db DOC: unreliable sockpair@ on macOS
We discovered that the sockpair@ protocol is unreliable on macOS, this
is the same problem that we fixed in d7f6819. But it's not possible to
implement an acknowledgment once the sockets are in non-blocking mode.

The problem was discovered in issue #3045.

Must be backported to every stable version.
2025-08-28 15:35:17 +02:00
William Lallemand
ffdccb6e04 BUILD: mworker: fix ignoring return value of ‘read’
Fix read return value unused result.

src/haproxy.c: In function ‘main’:
src/haproxy.c:3630:17: error: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
 3630 |                 read(sock_pair[1], &c, 1);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~

Must be backported where d7f6819 is backported.
2025-08-28 15:13:01 +02:00
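The compiler error quoted in ffdccb6e04 is raised because the result of read() is discarded. One generic way to consume it is sketched below. This is not the actual change made in src/haproxy.c, and read_one_byte() is a hypothetical helper.

```c
#include <errno.h>
#include <unistd.h>

/* Reads a single byte from <fd>, retrying when interrupted by a signal.
 * Returns 1 if a byte was read, 0 on end of file, -1 on error. */
int read_one_byte(int fd, char *c)
{
	ssize_t ret;

	do {
		ret = read(fd, c, 1);
	} while (ret < 0 && errno == EINTR);

	return ret < 0 ? -1 : (int)ret;
}
```

Using the return value both silences -Werror=unused-result and lets the caller distinguish a received byte from a closed peer.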
Amaury Denoyelle
7232677385 MAJOR: server: do not remove idle conns in del server
Do not remove anymore idle and purgeable connections directly under the
"del server" handler. The main objective of this patch is to reduce the
amount of work performed under thread isolation. This should improve
"del server" scheduling with other haproxy tasks.

Another objective is to be able to properly support dynamic servers with
QUIC. Indeed, takeover is not yet implemented for this protocol, hence
it is not possible to rely on cleanup of idle connections performed by a
single thread under "del server" handler.

With this change it is not possible anymore to remove a server if there
are still idle connections referencing it. To ensure this cannot be
performed, srv_check_for_deletion() has been extended to check server
counters for idle and idle private connections.

Server deletion should still remain a viable procedure, as first it is
mandatory to put the targeted server into maintenance. This step forces
the cleanup of its existing idle connections. Thanks to a recent change,
all finishing connections are also removed immediately instead of
becoming idle. In short, this patch transforms idle connections removal
from a synchronous to an asynchronous procedure. However, this should
remain a steadfast and quick method achievable in less than a second.

This patch is considered major as some users may notice this change when
removing a server. In particular with the following CLI commands
pipeline:
  "disable server <X>; shutdown sessions server <X>; del server <X>"

Server deletion will now probably fail, as idle connections purge cannot
be completed immediately. Thus, it is now highly advised to always use a
small delay "wait srv-removable" before "del server" to ensure that idle
connections purge is executed prior.

Along with this change, documentation for "del server" and related
"shutdown sessions server" has been refined, in particular to better
highlight under what conditions a server can be removed.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
dbe31e3f65 MEDIUM: session: account on server idle conns attached to session
This patch adds a new member <curr_sess_idle_conns> on the server. It
serves as a counter of idle connections attached on a session instead of
regular idle/safe trees. This is used only for private connections.

The objective is to provide a method to detect if there are idle
connections still referencing a server.

This will be particularly useful to ensure that a server is removable.
Currently, this is not yet necessary as idle connections are directly
freed via "del server" handler under thread isolation. However, this
procedure will be replaced by an asynchronous mechanism outside of
thread isolation.

Careful: connections attached to a session but not idle will not be
accounted by this counter. These connections can still be detected via
srv_has_streams() so "del server" will be safe.

This counter is maintained during the whole lifetime of a private
connection. This is mandatory to guarantee "del server" safety and
conforms with other idle server counters. What this means is that
decrement is performed only when the connection transitions from idle to
in use, or just prior to its deletion. For the first case, this is
covered by session_get_conn(). The second case is trickier. It cannot be
done via session_unown_conn() as a private connection may still live a
little longer after its removal from session, most notably when
scheduled for idle purging.

Thus, conn_free() has been adjusted to handle the final decrement. Now,
conn_backend_deinit() is also called for private connections if
CO_FL_SESS_IDLE flag is present. This results in a call to
srv_release_conn() which is responsible for decrementing server idle
counters.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
7a6e3c1a73 MAJOR: server: implement purging of private idle connections
When a server goes into maintenance, or if its IP address is changed,
idle connections attached to it are scheduled for deletion via the purge
mechanism. Connections are moved from server idle/safe list to the purge
list relative to their thread. Connections are freed on their owning
thread by the scheduled purge task.

This patch extends this procedure to also handle private idle
connections stored in sessions instead of servers. This is possible
thanks to the <sess_conns> list server member. A call to the newly
defined function session_purge_conns() is performed on each list
element. This moves private connections from their session to the purge
list alongside other server idle connections.

This change relies on the series of previous commits which ensure that
access to private idle connections is now thread-safe, with idle_conns
lock usage and careful manipulation of private idle conns in
input/output handlers.

The main benefit of this patch is that now all idle connections
targeting a server set in maintenance are removed. Previously, private
connections would remain until the sessions they were attached to were closed.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
17a1daca72 MEDIUM: mux-quic: enforce thread-safety of backend idle conns
Complete QUIC MUX for backend side. Ensure access to idle connections
is performed in a thread-safe way. Even if takeover is not yet
implemented for this protocol, it is at least necessary to ensure that
there won't be any issue with idle connections purging mechanism.

This change will also be necessary to ensure that QUIC servers can
safely be removed via CLI "del server". This is not yet sufficient as
currently server deletion still relies on takeover for idle connections
removal. However, this will be adjusted in a future patch to instead use
idle connections standard purging mechanism.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
73fd12e928 MEDIUM: conn/muxes/ssl: remove BE priv idle conn from sess on IO
This is a direct follow-up of previous patch which adjusts idle private
connections access via input/output handlers.

This patch implements the handlers prologue part. Now, private idle
connections require a similar treatment to non-private idle
connections. Thus, private conns are removed temporarily from their
session under protection of idle_conns lock.

As locking usage is already performed in input/output handler,
session_unown_conn() cannot be called. Thus, a new function
session_detach_idle_conn() is implemented in session module, which
performs basically the same operation but relies on external locking.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
8de0807b74 MEDIUM: conn/muxes/ssl: reinsert BE priv conn into sess on IO completion
When dealing with input/output on a connection related handler, special
care must be taken prior to accessing the connection if it is considered as
idle, as it could be manipulated by another thread. Thus, connection is
first removed from its idle tree before processing. The connection is
reinserted on processing completion unless it has been freed during it.

Idle private connections are not concerned by this, because takeover is
not applied on them. However, a future patch will implement purging of
these connections along with regular idle ones. As such, it is necessary
to also protect private connections usage now. This is the subject of
this patch and the next one.

With this patch, input/output handlers epilogue of
muxes/SSL/conn_notify_mux() are adjusted. A new code path is able to
deal with a connection attached to a session instead of a server. In
this case, session_reinsert_idle_conn() is used. Contrary to
session_add_conn(), this new function is reserved for idle connections
usage after a temporary removal.

Contrary to _srv_add_idle() used by regular idle connections,
session_reinsert_idle_conn() may fail as an allocation can be required.
If this happens, the connection is immediately destroyed.

This patch has no effect for now. It must be coupled with the next one
which will temporarily remove private idle connections on input/output
handler prologue.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
9574867358 MINOR: muxes: enforce thread-safety for private idle conns
When a backend connection becomes idle, muxes must activate some
protection to mark future access on it as dangerous. Indeed, once a
connection is inserted in an idle list, it may be manipulated by another
thread, either via takeover or scheduled for purging.

Private idle connections are stored into a session instead of the server
tree. They are never subject to a takeover for reuse or purge mechanism.
As such, currently they do not require the same level of protection.

However, a new patch will introduce support for private idle connections
purging. Thus, the purpose of this patch is to ensure protection is
activated as well now.

TASK_F_USR1 was already set on them as an anticipation for such need.
Only some extra operations were missing, most notably xprt_set_idle()
invocation. Also, return path of muxes detach operation is adjusted to
ensure such connections are never accessed after insertion.
2025-08-28 14:55:21 +02:00
Amaury Denoyelle
b18b5e2f74 MINOR: server: cleanup idle conns for server in maint already stopped
When a server goes into maintenance mode, its idle connections are
scheduled for an immediate purge. However, this is not the case if the
server is already in stopped state, for example due to a health check
failure.

Adjust _srv_update_status_adm() to ensure that idle connections are
always scheduled for purge when going into maintenance in both cases.

The main advantage of this patch is to ensure consistent behavior for
server maintenance mode.

Note that it will also become necessary as server deletion will be
adjusted with a future patch. Idle connection closure won't be performed
by "del server" handler anymore, so it's important to ensure that a full
cleanup is always performed prior to executing it, else the server may
not be removable during a certain delay.
2025-08-28 14:55:21 +02:00
Amaury Denoyelle
fa1a168bf1 MEDIUM: session: close new idle conns if server in maintenance
Previous patch ensures that a backend connection going into idle state
is rejected and freed if its target server is in maintenance.

This patch introduces a similar change for connections attached in the
session. session_check_idle_conn() now returns an error if connection
target server is in maintenance, similarly to session max idle conns
limit reached. This is sufficient to instruct muxes to delete the
connection immediately.
2025-08-28 14:55:21 +02:00
Amaury Denoyelle
67df6577ff MEDIUM: server: close new idle conns if server in maintenance
Currently, when a server is set in maintenance mode, its idle connections
are scheduled for purge. However, this does not prevent currently used
connections from becoming idle later on, even if the server is still off.

Change this behavior: an idle connection is now rejected by the server
if it is in maintenance. This is implemented with a new condition in
srv_add_to_idle_list() which returns an error value. In this case, muxes
stream detach callback will immediately free the connection.
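
The rejection described above can be pictured with a minimal sketch. All
names below (demo_server, demo_conn, demo_add_to_idle_list(),
demo_detach()) are hypothetical stand-ins and do not reproduce the real
srv_add_to_idle_list() signature or the mux detach callbacks; the sketch
only assumes that a zero return value means "refused".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real server/connection types. */
struct demo_server {
    bool in_maintenance;   /* set when the server is put in maintenance */
    size_t nb_idle;        /* number of connections currently kept idle */
};

struct demo_conn {
    bool freed;            /* set once the connection has been released */
};

/* Try to keep <conn> as an idle connection of <srv>.
 * Returns 1 on success, 0 if the server refuses it. */
static int demo_add_to_idle_list(struct demo_server *srv, struct demo_conn *conn)
{
    (void)conn;
    if (srv->in_maintenance)
        return 0;          /* no new idle connection while in maintenance */
    srv->nb_idle++;
    return 1;
}

/* What a stream detach path would do with the result (assumption). */
static void demo_detach(struct demo_server *srv, struct demo_conn *conn)
{
    if (!demo_add_to_idle_list(srv, conn))
        conn->freed = true;    /* rejected: release it immediately */
}
```

With such a check, a connection that finishes its work while the server is
in maintenance is released instead of lingering in an idle list.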

A similar change is also performed in each MUX and SSL I/O handlers and
in conn_notify_mux(). An idle connection is not reinserted in its idle
list if server is in maintenance, but instead it is immediately freed.
2025-08-28 14:55:18 +02:00
Amaury Denoyelle
f234b40cde MINOR: server: shard by thread sess_conns member
Server member <sess_conns> is a mt_list which contains every backend
connection attached to a session which targets this server. These
connections are not present in idle server trees.

The main utility of this list is to be able to cleanup these connections
prior to removing a server via "del server" CLI. However, this procedure
will be adjusted by a future patch. As such, <sess_conns> member must be
moved into srv_per_thread struct. Effectively, this duplicates a list
for every thread.

This commit does not introduce functional change. Its goal is to ensure
that these connections are now ordered by their owning thread, which
will allow to implement a purge, similarly to idle connections attached
to servers.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
37fca75ef7 MEDIUM: session: protect sess conns list by idle_conns_lock
Introduce idle_conns_lock usage to protect manipulation of <priv_conns>
session member. This represents a list of intermediary elements used to
store backend connections attached to a session to prevent their sharing
across multiple clients.

Currently, this patch is unneeded as sessions are only manipulated on a
single thread. Indeed, contrary to idle connections stored in servers,
takeover is not implemented for connections attached to a session.
However, a future patch will introduce purging of these connections,
which is already performed for connections attached to servers. As this
can be executed by any thread, it is necessary to introduce
idle_conns_lock usage to protect their manipulation.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
f3e8e863c9 MINOR: session: refactor alloc/lookup of sess_conns elements
By default backend connections are stored into idle/avail server trees.
However, if such connections cannot be shared between multiple clients,
session serves as the alternative storage.

To be able to quickly reuse a backend conn from a session, they are
indexed by their target, which is either a server or a backend proxy.
This is the purpose of 'struct sess_priv_conns' intermediary storage
element.

Lookup and allocation of these elements are performed in several session
functions, for example to add, get or remove a backend connection from a
session. The purpose of this patch is to simplify this by providing two
internal functions sess_alloc_sess_conns() and sess_get_sess_conns().

Along with this, a new BUG_ON() is added into session_unown_conn(),
which ensures that sess_priv_conns element is found when the connection
is removed from the session.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
d4f7a2dbcc MINOR: session: uninline functions related to BE conns management
Move from header to source file functions related to session management
of backend connections. These functions are big enough to remove inline
attribute.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
d0df41fd22 MINOR: session: document explicitely that session_add_conn() is safe
A set of recent patches have simplified management of backend connection
attached to sessions. The API is now stricter to prevent any misuse.

One of these changes is the addition of a BUG_ON() in session_add_conn(),
which ensures that a connection is not attached to a session if its
<owner> field points to another entry.

On older haproxy releases, this assertion could not be enforced due to
NTLM as a connection is turned private during its transfer. When
using a true multiplexed protocol on the backend side, the connection
could be assigned in turn to several sessions. However, NTLM is now only
applied for HTTP/1.1 as it does not make sense if the connection is
already shared.

To better clarify this situation, extend the comment on BUG_ON() inside
session_add_conn().
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
b3ce464435 BUG/MINOR: mux-quic: do not access conn after idle list insert
Once a connection is inserted into the server idle/safe tree during
stream detach, it is not accessed anymore by the muxes without
idle_conns_lock protection. This is because the connection could have
been already stolen by a takeover operation.

Adjust QUIC MUX detach implementation to follow the same pattern. Note
that, no bug can occur due to takeover as QUIC does not implement it.
However, prior to this patch, there may still exist race-conditions with
idle connection purging.

No backport needed.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
0be225f341 BUG/MINOR: server: decrement session idle_conns on del server
When a server is deleted, each of its idle connections is removed. This
is also performed for every private connection stored on sessions which
referenced the target server.

As mentioned above, these private connections are idle, guaranteed by
srv_check_for_deletion(). A BUG_ON() on CO_FL_SESS_IDLE is already
present to guarantee this. Thus, these connections are accounted on the
session to enforce max-session-srv-conns limit.

However, this counter is not decremented during private conns cleanup on
"del server" handler. This patch fixes this by adding a decrement for
every private connection removed via "del server".

This should be backported up to 3.0.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
bce29bc7a4 MINOR: cli: display failure reason on wait command
wait CLI command can be used to wait until either a defined timeout or a
specific condition is reached. So far, srv-removable is the only event
supported. It is tested via srv_check_for_deletion(), which is able to
report a message describing the reason if the condition is unmet.

Previously, wait returned a generic string, to specify if the condition is
met, the timer has expired or an immediate error is encountered. In case
of srv-removable, it did not report the real reason why a server could
not be removed.

This patch improves wait command with srv-removable. It now displays the
last message returned by srv_check_for_deletion(), either on immediate
error or on timeout. This is implemented by using dynamic string output
with cli_dynmsg/dynerr() functions.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
04f05f1880 BUG/MINOR: connection: remove extra session_unown_conn() on reverse
When a connection is reversed via rhttp protocol on the edge endpoint,
it migrates from frontend to backend side. This operation is performed
by conn_reverse(). During this transition, the conn owning session is
freed as it becomes unneeded.

Prior to this patch, session_unown_conn() was also called during
frontend to backend migration. However, this is unnecessary as this
function is only used for backend connection reuse. As such, this patch
removes this unnecessary call.

This does not cause any harm to the process as session_unown_conn() can
handle a connection not inserted yet. However, for clarity purpose it's
better to backport this patch up to 3.0.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
a96f1286a7 BUG/MINOR: connection: rearrange union list members
A connection can be stored in several lists, thus there are several
attach points in struct connection. Depending on its proxy side, either
frontend or backend, a single connection will only access some of them
during its lifetime.

As an optimization, these attach points are organized in a union.
However, this split was not correctly achieved along
frontend/backend side delimitation.

Furthermore, reverse HTTP has recently been introduced. With this
feature, a connection can migrate from frontend to backend side or vice
versa. As such, it becomes even more tedious to ensure that these
members are always accessed in a safe way.

This commit rearranges these fields. First, union is now clearly split
between frontend and backend only elements. Next, backend elements are
initialized with conn_backend_init(), which is already used during
connection reversal on an edge endpoint. A new function
conn_frontend_init() serves to initialize the other members, called both
on connection first instantiation and on reversal on a dialer endpoint.

This model is much cleaner and should prevent any access to fields from
the wrong side.

Currently, there is no known case of wrong access in the existing code
base. However, this cleanup is considered an improvement which must be
backported up to 3.0 to remove any possible undefined behavior.
2025-08-28 14:52:29 +02:00
William Lallemand
d7f6819161 BUG/MEDIUM: mworker: fix startup and reload on macOS
Since the mworker rework in haproxy 3.1, the worker needs to tell the
master that it is ready. This is done using the sockpair protocol by
sending a _send_status message to the master.

It seems that the sockpair protocol is buggy on macOS because of a known
issue around fd transfer documented in sendmsg(2):

https://man.freebsd.org/cgi/man.cgi?sendmsg(2) BUGS section

  Because sendmsg() does not necessarily block until the data has been
  transferred, it is possible to transfer an open file descriptor across
  an AF_UNIX domain socket (see recv(2)), then close() it before it has
  actually been sent, the result being that the receiver gets a closed
  file descriptor. It is left to the application to implement an
  acknowledgment mechanism to prevent this from happening.

Indeed the recv side of the sockpair is closed on the send side just
after the send_fd_uxst(), which does not implement an acknowledgment
mechanism. So the master might never recv the _send_status message.

In order to implement an acknowledgment mechanism, a blocking read() is
done before closing the recv fd on the sending side, so we are sure that
the message was read on the other side.
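
As an illustration only, the acknowledgment idea can be sketched on a
plain socketpair. The helper names are hypothetical, the sketch exchanges
ordinary bytes rather than passing a file descriptor as send_fd_uxst()
does, and the one-byte acknowledgment is an assumption: the description
above does not say what the real blocking read() waits for.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sender, step 1: emit the status message on its end of the pair. */
static int demo_send_status(int fd, const char *msg)
{
    size_t len = strlen(msg);

    return write(fd, msg, len) == (ssize_t)len ? 0 : -1;
}

/* Receiver: consume the message, then acknowledge it with one byte.
 * Returns the message length, or -1 on error. */
static ssize_t demo_recv_and_ack(int fd, char *buf, size_t size)
{
    ssize_t n;

    if (size == 0)
        return -1;
    n = read(fd, buf, size - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    if (write(fd, "A", 1) != 1)
        return -1;
    return n;
}

/* Sender, step 2: block until the acknowledgment arrives, and only
 * then close the descriptor. */
static int demo_wait_ack_and_close(int fd)
{
    char ack;

    if (read(fd, &ack, 1) != 1)
        return -1;
    return close(fd);
}
```

The point of the ordering is that close() on the sending side can no
longer race with the delivery of the message, since it only happens once
the peer has proven that it read it.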

This was only reproduced on macOS, meaning the master CLI is also
impacted on macOS. But no solution was found on macOS for it.
Implementing an acknowledgment mechanism would make the protocol too
complex in non-blocking mode.

The problem was reported in ticket #3045, reproduced and analyzed by
@cognet.

Must be backported as far as 3.1.
2025-08-28 14:51:46 +02:00
Valentine Krasnobaeva
441cd614f9 BUG/MINOR: acl: set arg_list->kw to aclkw->kw string literal if aclkw is found
During configuration parsing, *args can point to different addresses, as it
changes from line to line. smp_resolve_args() is called after the
configuration parsing and uses arg_list->kw to create an error message if a
userlist referenced in some ACL is absent. This leads to wrong keyword names
being reported in such a message, or to some garbage being printed.

It does not happen in the case of sample fetches. In this case arg_list->kw is
assigned to a string literal from the sample_fetch struct returned by
find_sample_fetch(). Let's do the same in parse_acl_expr(), when find_acl_kw()
lookup returns a corresponding acl_keyword structure.

This fixes the issue #3088 at GitHub.
This should be backported in all stable versions since 2.6 including 2.6.
2025-08-28 10:22:21 +02:00
Frederic Lecaille
ffa926ead3 BUG/MINOR: mux-quic: trace with non initialized qcc
This issue leads to crashes when the QUIC mux traces are enabled and could be
reproduced with -dMfail. When the qcc allocation fails (qcc_init()) haproxy
crashes into qmux_dump_qcc_info() because ->conn qcc member is not initialized:

Program terminated with signal SIGSEGV, Segmentation fault.
    at src/qmux_trace.c:146
146             const struct quic_conn *qc = qcc->conn->handle.qc;
[Current thread is 1 (LWP 1448960)]
(gdb) p qcc
$1 = (const struct qcc *) 0x7f9c63719fa0
(gdb) p qcc->conn
$2 = (struct connection *) 0x155550508
(gdb)

This patch simply fixes the TRACE() call concerned to avoid <qcc> object
dereferencing when it is NULL.

Must be backported as far as 3.0.
2025-08-28 08:19:34 +02:00
Frederic Lecaille
31c17ad837 MINOR: quic: remove ->offset qf_crypto struct field
This patch follows this previous bug fix:

    BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

where an ebtree node has been added to qf_crypto struct. It has the same
meaning and type as ->offset_node.key field with ->offset_node an eb64tree node.
This patch simply removes ->offset which is no longer useful.

This patch should be easily backported as far as 2.6 as the one mentioned above
to ease any further backport to come.
2025-08-28 08:19:34 +02:00
William Lallemand
2ed515c632 DOC: configuration: clarify 'default-crt' and implicit default certificates
Clarify the behavior of implicit default certificates when used on the
same line as the default-crt keyword.

Should be backported as far as 3.2
2025-08-27 17:09:02 +02:00
William Lallemand
ab7358b366 MEDIUM: ssl: convert diag to warning for strict-sni + default-crt
Previous patch emits a diag warning when both 'strict-sni' +
'default-crt' are used on the same bind line.

This patch converts this diagnostic warning to a real warning, so the
previous patch could be backported without breaking configurations.

This was discussed in #3082.
2025-08-27 16:22:12 +02:00
William Lallemand
18ebd81962 MINOR: ssl: diagnostic warning when both 'default-crt' and 'strict-sni' are used
It is possible to use both 'strict-sni' and 'default-crt' on the same bind
line, which does not make much sense.

This patch implements a check which will look for default certificates
in the sni_w tree when strict-sni is used. (Referenced by their empty
sni ""). default-crt sets the CKCH_INST_EXPL_DEFAULT flag in
ckch_inst->is_default, so it is possible to differentiate explicit
defaults from implicit defaults.

Could be backported as far as 3.0.

This was discussed in ticket #3082.
2025-08-27 16:22:12 +02:00
Frederic Lecaille
d753f24096 BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets
This issue impacts the QUIC listeners. It is the same as the one fixed by this
commit:

	BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

Like Chrome, the ngtcp2 client decided to fragment its CRYPTO frames but in a much
more aggressive way. This could be fixed with a list local to qc_parse_pkt_frms()
to please Chrome thanks to the commit above. But this is not sufficient for
ngtcp2 which often splits its ClientHello message into more than 10 fragments
with very small ones. This leads the packet parser to interrupt the CRYPTO frames
parsing due to the ncbuf gap size limit.

To fix this, this patch proceeds approximately the same way but with an
ebtree to reorder the CRYPTO frames by their offsets. These frames are directly
inserted into a local ebtree. Then this ebtree is reused to provide the
reordered CRYPTO data to the underlying ncbuf (non contiguous buffer). This way
there is much less chance for the ncbufs used to store CRYPTO data
to reach an overly fragmented state.
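
The reordering step can be illustrated with a small sketch. It is not the
patch's code: qsort() on an array is swapped in for the ebtree, a flat
output buffer replaces the ncbuf (which, unlike this sketch, tolerates
gaps), and every name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fragment descriptor: <len> bytes of <data> at <offset>. */
struct demo_frag {
    uint64_t offset;
    size_t len;
    const char *data;
};

/* Order fragments by increasing offset. */
static int demo_frag_cmp(const void *a, const void *b)
{
    const struct demo_frag *fa = a, *fb = b;

    return (fa->offset > fb->offset) - (fa->offset < fb->offset);
}

/* Sort fragments by offset, then append them in order to <out>.
 * Returns the number of contiguous bytes written starting at offset 0. */
static size_t demo_reassemble(struct demo_frag *frags, size_t nb,
                              char *out, size_t size)
{
    size_t filled = 0;
    size_t i;

    qsort(frags, nb, sizeof(*frags), demo_frag_cmp);
    for (i = 0; i < nb; i++) {
        if (frags[i].offset != filled || frags[i].len > size - filled)
            break;  /* gap, overlap or lack of room: stop here */
        memcpy(out + filled, frags[i].data, frags[i].len);
        filled += frags[i].len;
    }
    return filled;
}
```

Once the fragments are sorted, each one extends the already contiguous
data, so the destination never has to track many disjoint blocks at once.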

Must be backported as far as 2.6.
2025-08-27 16:14:19 +02:00
Frederic Lecaille
729196fbed BUG/MEDIUM: quic-be: avoid crashes when releasing Initial pktns
This bug arrived with this fix:

    BUG/MINOR: quic-be: missing Initial packet number space discarding

leading to crashes when dereferencing ->ipktns.

Such crashes could be reproduced with -dMfail option. To reach them, the
memory allocations must fail. So, this is relatively rare, except on systems
with limited memory.

To fix this, do not call quic_pktns_discard() if ->ipktns is NULL.

No need to backport.
2025-08-27 16:14:19 +02:00
William Lallemand
c36e4fb17f DOC: configuration: reword 'generate-certificates'
Reword the 'generate-certificates' keyword documentation to clarify
what's happening upon error.

This was discussed in ticket #3082.
2025-08-27 13:42:29 +02:00
Aurelien DARRAGON
2cd0afb430 MINOR: proxy: handle shared listener counters preparation from proxy_postcheck()
We used to allocate and prepare listener counters from
check_config_validity() all at once. But it isn't correct, since at that
time listeners' GUIDs are not inserted yet, thus
counters_fe_shared_prepare() cannot work correctly, and so does
shm_stats_file_preload() which is meant to be called even earlier.

Thus in this commit (and to prepare for upcoming shm shared counters
preloading patches), we handle the shared listener counters prep in
proxy_postcheck(), which means that between the allocation and the
prep there is the proper window for listener's guid insertion and shm
counters preloading.

No change of behavior expected when shm shared counters are not
actually used.
2025-08-27 12:54:25 +02:00
Aurelien DARRAGON
cdb97cb73e MEDIUM: server: split srv_init() in srv_preinit() + srv_postinit()
We actually need more granularity to split srv postparsing init tasks:
Some of them are required to be run BEFORE the config is checked, and
some of them AFTER the config is checked.

Thus we push the logic from 368d0136 ("MEDIUM: server: add and use
srv_init() function") a little bit further and split the function
in two distinct ones, one of them executed under check_config_validity()
and the other one using REGISTER_POST_SERVER_CHECK() hook.

SRV_F_CHECKED flag was removed because it is no longer needed,
srv_preinit() is only called once, and so is srv_postinit().
2025-08-27 12:54:19 +02:00
Aurelien DARRAGON
9736221e90 MINOR: haproxy: abort config parsing on fatal errors for post parsing hooks
When pre-check and post-check postparsing hooks are evaluated in
step_init_2() potential fatal errors are ignored during the iteration
and are only taken into account at the end of the loop. This is not ideal
because some errors (ie: memory errors) could cause multiple alert
messages in a row, which could make troubleshooting harder for the user.

Let's stop as soon as a fatal error is encountered for post parsing
hooks, as we do everywhere else.
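
A minimal sketch of the "stop at the first fatal error" loop follows. The
DEMO_ERR_* codes, the hook type and the sample hooks are hypothetical and
only mimic the idea; they are not the hook registration API nor the error
flags actually used by step_init_2().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error flags, OR-ed together across hooks. */
#define DEMO_ERR_NONE  0x00
#define DEMO_ERR_WARN  0x01
#define DEMO_ERR_FATAL 0x02

typedef int (*demo_hook_fn)(void);

/* Sample hooks used for illustration. */
static int demo_hook_ok(void)    { return DEMO_ERR_NONE; }
static int demo_hook_warn(void)  { return DEMO_ERR_WARN; }
static int demo_hook_fatal(void) { return DEMO_ERR_FATAL; }

/* Run hooks in order and accumulate their error flags. The loop is
 * left as soon as a fatal error is seen, so that later hooks cannot
 * emit further alerts. <ran> receives the number of hooks executed. */
static int demo_run_hooks(const demo_hook_fn *hooks, size_t nb, size_t *ran)
{
    int err = DEMO_ERR_NONE;
    size_t i;

    *ran = 0;
    for (i = 0; i < nb; i++) {
        err |= hooks[i]();
        (*ran)++;
        if (err & DEMO_ERR_FATAL)
            break;
    }
    return err;
}
```

With the hooks { demo_hook_warn, demo_hook_fatal, demo_hook_ok }, only the
first two run, and the warning collected before the fatal error is kept.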
2025-08-27 12:54:13 +02:00
Christopher Faulet
49db9739d0 BUG/MEDIUM: spoe: Improve error detection in SPOE applet on client abort
It is possible to interrupt a SPOE applet without reporting an error. For
instance, when the client of the parent stream aborts. Thanks to this patch,
we take care to report an error on the SPOE applet to be sure to interrupt
the processing. It is especially important if the connection to the agent is
queued. Thanks to 886a248be ("BUG/MEDIUM: mux-spop: Reject connection
attempts from a non-spop frontend"), it is no longer an issue. But there is
no reason to continue to process if the parent stream is gone.

In addition, in the SPOE filter, if the processing is interrupted when the
filter is destroyed, no specific status code was set. It is not a big deal
because it cannot be logged at this stage. But it can be used to notify the SPOE
applet. So better to set it.

This patch should be backported as far as 3.1.
2025-08-26 16:12:18 +02:00
William Lallemand
7a30c10587 REGTESTS: jwt: create dynamically "cert.ecdsa.pem"
Stop declaring "cert.ecdsa.pem" in a crt-store, and add it dynamically
over the stats socket instead.

This way we fully verify a JWS signature with a certificate which never
existed at HAProxy startup.
2025-08-25 16:44:24 +02:00
Christopher Faulet
886a248be4 BUG/MEDIUM: mux-spop: Reject connection attempts from a non-spop frontend
It is possible to crash the process by initializing a connection to a SPOP
server from a non-spop frontend. It is of course unexpected and invalid. And
there are some checks to prevent that when the configuration is
loaded. However, it is not possible to handle all cases, especially the
"use_backend" rules relying on log-format strings.

It could be good to improve the backend selection by checking the mode
compatibility (for now, it is only performed for the HTTP).

But at the end, this can also be handled by the SPOP multiplexer when it is
initialized. If the opposite SD is not attached to an SPOE agent, we should
fail the mux initialization and return an internal error.

This patch must be backported as far as 3.1.
2025-08-25 11:11:05 +02:00
Christopher Faulet
b4a92e7cb1 MEDIUM: applet: Set .rcv_buf and .snd_buf functions on default ones if not set
Based on the applet flags, it is possible to set .rcv_buf and .snd_buf
callback functions if necessary. If these functions are not defined for an
applet using the new API, it means the default functions must be used.

We also take care to choose the raw version or the htx version, depending on
the applet flags.
2025-08-25 11:11:05 +02:00
Christopher Faulet
71c01c1010 MINOR: applet: Make some applet functions HTX aware
applet_output_room() and applet_input_data() are now HTX aware. These
functions automatically rely on htx versions if APPLET_FL_HTX flag is set
for the applet.
2025-08-25 11:11:05 +02:00
Christopher Faulet
927884a3eb MINOR: applet: Add a flag to know an applet is using HTX buffers
Multiplexers already explicitly announce their HTX support. Now that it is
possible to set flags on applets, it could be handy to do the same. So, now,
HTX aware applets must set the APPLET_FL_HTX flag.
2025-08-25 11:11:05 +02:00
Christopher Faulet
1c76e4b2e4 MINOR: applet: Add function to test applet flags from the appctx
appctx_app_test() function can now be used to test the applet flags using an
appctx. This simplifies tests on applet flags a bit. For now, this function is
used to test APPLET_FL_NEW_API flag.
2025-08-25 11:11:05 +02:00
Christopher Faulet
3de6c375aa MINOR: applet: Rely on applet flag to detect the new api
Instead of setting a flag on the applet context by checking the defined
callback functions of the applet to know if an applet is using the new API
or not, we can now rely on the applet flags itself. By checking
APPLET_FL_NEW_API flag, it does the job. APPCTX_FL_INOUT_BUFS flag is thus
removed.
2025-08-25 11:11:05 +02:00
Aurelien DARRAGON
3da1d63749 BUG/MEDIUM: http_ana: handle yield for "stats http-request" evaluation
stats http-request rules evaluation is handled separately in
http_process_req_common(). Because of that, if a rule requires yielding,
the evaluation is interrupted as (F)YIELD verdict return values are not
handled there.

Since 3.2 with the introduction of costly ruleset interruption in
0846638 ("MEDIUM: stream: interrupt costly rulesets after too many
evaluations"), the issue started being more visible because stats
http-request rules would be interrupted when the evaluation counters
reached tune.max-rules-at-once, but the evaluation would never be
resumed, and the request would continue to be handled as if the
evaluation was complete. Note however that the issue already existed
in the past for actions that could return ACT_RET_YIELD such as
"pause" for instance.

This issue was reported by GH user @Wahnes in #3087, thanks to him for
providing useful repro and details.

To fix the issue, we merge rule verdict handling in
http_process_req_common() so that "stats http-request" evaluation benefits
from all return values already supported for the current ruleset.

It should be backported in 3.2 with 0846638 ("MEDIUM: stream: interrupt
costly rulesets after too many evaluations"), and probably even further
(all stable versions) if the patch adaptation is not too complex (before
HTTP_RULE_RES_FYIELD was introduced) because it is still relevant.
2025-08-25 10:59:16 +02:00
Aurelien DARRAGON
f9b227ebff MINOR: http_ana: fix typo in http_res_get_intercept_rule
HTTP_RULE_RES_YIELD was used where HTTP_RULE_RES_FYIELD should be used.
Fortunately, aside from debug traces, both return values were treated
equally. Let's fix that to prevent confusion and from causing bugs
in the future.

It may be backported in 3.2 with 0846638 ("MEDIUM: stream: interrupt
costly rulesets after too many evaluations") if it easily applies
2025-08-25 10:59:08 +02:00
Amaury Denoyelle
1529ec1a25 MINOR: quic: centralize padding for HP sampling on packet building
The below patch has simplified INITIAL padding on emission. Now,
qc_prep_pkts() is responsible to activate padding for this case, and
there is no more special case in qc_do_build_pkt() needed.

  commit 8bc339a6ad4702f2c39b2a78aaaff665d85c762b
  BUG/MAJOR: quic: fix INITIAL padding with probing packet only

However, qc_do_build_pkt() may still activate padding on its own, to
ensure that a packet is big enough so that header protection decryption
can be performed by the peer. HP decryption is performed by extracting a
sample from the ciphered packet, starting 4 bytes after PN offset.
Sample length is 16 bytes as defined by TLS algos used by QUIC. Thus, a
QUIC sender must ensure that the length of the packet number plus payload
fields is at least 4 bytes. This is enough given that each
packet is completed by a 16 bytes AEAD tag which can be part of the HP
sample.
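
The length requirement described above can be illustrated with a minimal
sketch. The function and constant names below are hypothetical and are not
taken from the haproxy sources. The sketch only assumes what the message
states: the sample starts 4 bytes after the PN offset, is 16 bytes long, and
may overlap the 16-byte AEAD tag that completes the packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants, named for illustration only. */
#define HP_SAMPLE_OFFSET 4   /* sample starts 4 bytes after the PN offset */
#define HP_SAMPLE_LEN    16  /* sample length used for header protection */
#define AEAD_TAG_LEN     16  /* AEAD tag appended to every packet */

/* Returns the number of padding bytes needed so that a full HP sample can
 * be extracted, given the packet number length and the payload length.
 */
static size_t hp_padding_needed(size_t pn_len, size_t payload_len)
{
	size_t ciphered = pn_len + payload_len + AEAD_TAG_LEN;
	size_t required = HP_SAMPLE_OFFSET + HP_SAMPLE_LEN;

	return ciphered >= required ? 0 : required - ciphered;
}
```

With a 1-byte packet number and an empty payload, three padding bytes are
missing; as soon as packet number plus payload reach 4 bytes, none are.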

This patch simplifies qc_do_build_pkt() by centralizing padding for this
case in a single location. This is performed at the end of the function
after payload is completed. The code is thus simpler.

This is not a bug. However, it may be interesting to backport this patch
up to 2.6, as qc_do_build_pkt() is a tedious function, in particular
when dealing with padding generation, thus it may benefit greatly from
simplification.
2025-08-25 08:48:24 +02:00
Amaury Denoyelle
7d554ca629 BUG/MINOR: quic: don't coalesce probing and ACK packet of same type
Haproxy QUIC stack suffers from a limitation : it's not possible to emit
a packet which contains both probing data and an ACK frame. Thus, in
case qc_do_build_pkt() is invoked with both values as true, probing has
priority and ACK is ignored.

However, this has the undesired side-effect of possibly generating two
coalesced packets of the same type in the same datagram : the first one
with the probing data and the second with an ACK frame. This is caused
by qc_prep_pkts() loop which may call qc_do_build_pkt() multiple times
with the same QEL instance. This case is normally used when a full
datagram has been built but there is still content to emit on the
current encryption level.

To fix this, alter qc_prep_pkts() loop : if both probing and ACK are
requested, force the datagram to be written after packet encoding. This
will result in a datagram containing the packet with probing data as
final entry. A new datagram is started for the next packet, which can
contain the ACK frame.

This also has some impact on INITIAL padding. Indeed, if packet must be
the last due to probing emission, qc_prep_pkts() will also activate
padding to ensure final datagram is at least 1200 bytes long.

Note that coalescing two packets of the same type is not invalid
according to QUIC RFC. However it could cause issue with some shaky
implementations, so it is considered as a bug.

This must be backported up to 2.6.
2025-08-22 18:20:42 +02:00
Amaury Denoyelle
8bc339a6ad BUG/MAJOR: quic: fix INITIAL padding with probing packet only
A QUIC datagram that contains an INITIAL packet must be padded to 1200
bytes to prevent any deadlock due to anti-amplification protection. This
is implemented by encoding a PADDING frame on the last packet of the
datagram if necessary.
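
As a rough illustration of the rule above, the amount of padding to encode
on the last packet can be computed as follows. This is only a sketch: the
names initial_padding_len() and INITIAL_DGRAM_MIN_LEN are hypothetical and
do not come from the haproxy sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: minimum size of a datagram carrying an INITIAL packet */
#define INITIAL_DGRAM_MIN_LEN 1200

/* Returns how many PADDING bytes the last packet of the datagram needs.
 * <dgram_len> is the datagram length before padding, <has_initial> is
 * non-zero if at least one INITIAL packet was encoded in the datagram.
 */
static size_t initial_padding_len(size_t dgram_len, int has_initial)
{
	if (!has_initial || dgram_len >= INITIAL_DGRAM_MIN_LEN)
		return 0;
	return INITIAL_DGRAM_MIN_LEN - dgram_len;
}
```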

Previously, qc_prep_pkts() was responsible to activate padding when
calling qc_do_build_pkt(), as it knows which packet is the last to
encode. However, this has the side-effect of preventing PING emission
for probing with no data as this case was handled in an else-if branch
after padding. This was fixed by the below commit

  217e467e89d15f3c22e11fe144458afbf718c8a8
  BUG/MINOR: quic: fix malformed probing packet building

Above logic was altered to fix the PING case : padding was set to false
explicitly in qc_prep_pkts(). Padding was then added in a specific
block dedicated to the PING case in qc_do_build_pkt() itself for INITIAL
packets.

However, the fix is incorrect if the last QEL used to build a packet is
not the initial one and probing is used with PING frame only. In this
case, specific block in qc_do_build_pkt() does not add padding. This
causes a BUG_ON() crash in qc_txb_store() which catches these packets as
irregularly formed.

To fix this while also properly handling PING emission, revert to the
original padding logic : qc_prep_pkts() is responsible to activate
INITIAL padding. To not interfere with PING emission, qc_do_build_pkt()
body is adjusted so that PING block is moved up in the function and
detached from the padding condition.

The main benefit from this patch is that INITIAL padding decision in
qc_prep_pkts() is clearer now.

Note that padding can also be activated by qc_do_build_pkt(), as packets
should be big enough for header protection decipher. However, this case
is different from INITIAL padding, so it is not covered by this patch.

This should be backported up to 2.6.
2025-08-22 18:12:32 +02:00
Amaury Denoyelle
0376e66112 BUG/MINOR: quic: do not emit probe data if CONNECTION_CLOSE requested
If connection closing is activated, qc_prep_pkts() can only build a
datagram with a single packet. This is because we consider that only a
single CONNECTION_CLOSE frame is relevant at this stage.

This is handled both by qc_prep_pkts(), which ensures that only a single
packet datagram is built, and by qc_do_build_pkt(), which prevents the
invocation of qc_build_frms() if <cc> is set.

However, there is an incoherency for probing. First, qc_prep_pkts()
deactivates it if connection closing is requested. But qc_do_build_pkt()
may still emit probing frame as it does not check its <probe> argument
but rather <pto_probe> QEL field directly. This can result in a packet
mixing a PING and a CONNECTION_CLOSE frame, which is useless.

Fix this by adjusting qc_do_build_pkt() : closing argument is also
checked on PING probing emission. Note that there is still shaky code
here as qc_do_build_pkt() should rely only on <probe> argument to ensure
this.

This should be backported up to 2.6.
2025-08-22 18:06:43 +02:00
Amaury Denoyelle
fc3ad50788 BUG/MEDIUM: quic: reset padding when building GSO datagrams
qc_prep_pkts() encodes input data into QUIC packets in a loop into one
or several datagrams. It supports GSO, which requires building a series of
multiple datagrams of the same length.

Each packet encoding is performed via a call to qc_do_build_pkt(). This
function has an argument to specify if output packet must be completed
with a PADDING frame. This option is activated when qc_prep_pkts()
encodes the last packet of a datagram with at least one INITIAL packet
in it.

Padding is reset each time a new datagram is started. However, this
was not performed if GSO is used to build the next datagram. This patch
fixes it by properly resetting padding in this case also.

The impact of this bug is unknown. It may have several effects, one of
the most obvious being the insertion of unnecessary padding in packets.
It could also potentially trigger an infinite loop in qc_prep_pkts(),
although this has never been encountered so far.

This must be backported up to 3.1.
2025-08-22 16:22:01 +02:00
Valentine Krasnobaeva
0dc8d8d027 MINOR: dns: dns_connect_nameserver: fix fd leak at error path
This fixes the commit 2c7e05f80e3b
("MEDIUM: dns: don't call connect to dest socket for AF_INET*"). If we fail to
bind AF_INET sockets or the address family of the nameserver protocol isn't
something we expect, we need to close the fd obtained by
connect.

This fixes GitHub issue #3085.
This must be backported along with the commit 2c7e05f80e3b.
2025-08-22 10:50:47 +02:00
Christopher Faulet
a498e527b4 BUG/MAJOR: stream: Remove READ/WRITE events on channels after analysers eval
It is possible to miss a synchronous write event in process_stream() if the
stream was woken up on a write event. In that case, it is possible to freeze
the stream until the next I/O event or timeout.

Concretely, the stream is woken up with CF_WRITE_EVENT on a channel. This
flag is removed from the channel when we leave process_stream(). But before
leaving process_stream(), when a synchronous send is tried on this channel,
the flag is removed and eventually set again on success. But this event is
masked by the previous one, and the channel is not resynced as it should be.

To fix the bug, CF_READ_EVENT and CF_WRITE_EVENT flags are removed from a
channel after the corresponding analysers evaluation. This way, we will be
able to detect a successful synchronous send to restart analysers evaluation
based on the new channel state. It is safe (or it should be) to do so
because these flags are only used by analysers and tested to resync the
stream inside process_stream().

It is a very old bug and I guess all versions are affected. It was observed
on 2.9 and higher, and with the master/worker only. But it could affect any
stream. It is tagged a MAJOR because this area is really sensitive to any
change.

This patch should fix the issue #3070. It should probably be backported to
all stable versions, but only after a period of observation and with a
special care because this area is really sensitive to changes. It is
probably reasonable to backport it as far as 3.0 and wait for older
versions.

Thanks to Valentine for its help on this issue !
2025-08-21 20:15:18 +02:00
William Lallemand
7b3b3d7146 BUG/MEDIUM: ssl: apply ssl-f-use on every "ssl" bind
This patch introduces a change of behavior in the configuration parsing.

Previously the "ssl-f-use" lines were only applied on "ssl" bind lines
that do not have any "crt" configured.
Since there is no warning and you could mix bind lines with and without
crt, this is really confusing.

This patch applies the "ssl-f-use" lines on every "ssl" bind line.

This was discussed in ticket #3082.

Must be backported in 3.2.
2025-08-21 14:58:06 +02:00
Frederic Lecaille
e513620c72 BUG/MEDIUM: quic-be: crash after backend CID allocation failures
This bug impacts only the QUIC backends. It arrived with this commit:
   MINOR: quic-be: QUIC connection allocation adaptation (qc_new_conn())
which was supposed to be fixed by:
   BUG/MEDIUM: quic: crash after quic_conn allocation failures
but this commit was not sufficient.

Such a crash could be reproduced with -dMfail option. To reach it, the
<conn_id> object allocation must fail (from qc_new_conn()). So, this is
relatively rare, except on systems with limited memory.

No need to backport.
2025-08-21 14:24:31 +02:00
Frederic Lecaille
9a22770ac5 BUG/MINOR: quic-be: missing Initial packet number space discarding
A QUIC client must discard the Initial packet number space as soon as it first
sends a Handshake packet.

This patch implements this packet number space discarding, which was missing.
2025-08-21 14:24:31 +02:00
Amaury Denoyelle
901de11157 BUG/MEDIUM: mux-h2: fix crash on idle-ping due to unwanted ABORT_NOW
An ABORT_NOW() was used during debugging idle-ping but was not removed
from the final code. This may cause crash, in particular when mixing
idle-ping with shorter http-request/http-keep-alive values.

Fix this situation by removing ABORT_NOW() statement.

This should fix github issue #3079.

This must be backported up to 3.2.
2025-08-21 14:21:11 +02:00
Willy Tarreau
82b002a225 [RELEASE] Released version 3.3-dev7
Released version 3.3-dev7 with the following main changes :
    - MINOR: quic: duplicate GSO unsupp status from listener to conn
    - MINOR: quic: define QUIC_FL_CONN_IS_BACK flag
    - MINOR: quic: prefer qc_is_back() usage over qc->target
    - BUG/MINOR: cfgparse: immediately stop after hard error in srv_init()
    - BUG/MINOR: cfgparse-listen: update err_code for fatal error on proxy directive
    - BUG/MINOR: proxy: avoid NULL-deref in post_section_px_cleanup()
    - MINOR: guid: add guid_get() helper
    - MINOR: guid: add guid_count() function
    - MINOR: clock: add clock_set_now_offset() helper
    - MINOR: clock: add clock_get_now_offset() helper
    - MINOR: init: add REGISTER_POST_DEINIT_MASTER() hook
    - BUILD: restore USE_SHM_OPEN build option
    - BUG/MINOR: stick-table: cap sticky counter idx with tune.nb_stk_ctr instead of MAX_SESS_STKCTR
    - MINOR: sock: update broken accept4 detection for older hardwares.
    - CI: vtest: add os name to OT cache key
    - CI: vtest: add Ubuntu arm64 builds
    - BUG/MEDIUM: ssl: Fix 0rtt to the server
    - BUG/MEDIUM: ssl: fix build with AWS-LC
    - MEDIUM: acme: use lowercase for challenge names in configuration
    - BUG/MINOR: init: Initialize random seed earlier in the init process
    - DOC: management: clarify usage of -V with -c
    - MEDIUM: ssl/cli: relax crt insertion in crt-list of type directory
    - MINOR: tools: implement ha_aligned_zalloc()
    - CLEANUP: fd: make use of ha_aligned_alloc() for the fdtab
    - MINOR: pools: distinguish the requested alignment from the type-specific one
    - MINOR: pools: permit to optionally specify extra size and alignment
    - MINOR: pools: always check that requested alignment matches the type's
    - DOC: api: update the pools API with the alignment and typed declarations
    - MEDIUM: tree-wide: replace most DECLARE_POOL with DECLARE_TYPED_POOL
    - OPTIM: tasks: align task and tasklet pools to 64
    - OPTIM: buffers: align the buffer pool to 64
    - OPTIM: queue: align the pendconn pools to 64
    - OPTIM: connection: align connection pools to 64
    - OPTIM: server: start to use aligned allocs in server
    - DOC: management: fix typo in commit f4f93c56
    - DOC: config: recommend single quoting passwords
    - MINOR: tools: also implement ha_aligned_alloc_typed()
    - MEDIUM: server: introduce srv_alloc()/srv_free() to alloc/free a server
    - MINOR: server: align server struct to 64 bytes
    - MEDIUM: ring: always allocate properly aligned ring structures
    - CI: Update to actions/checkout@v5
    - MINOR: quic: implement qc_ssl_do_hanshake()
    - BUG/MEDIUM: quic: listener connection stuck during handshakes (OpenSSL 3.5)
    - BUG/MINOR: mux-h1: fix wrong lock label
    - MEDIUM: dns: don't call connect to dest socket for AF_INET*
    - BUG/MINOR: spoe: Properly detect and skip empty NOTIFY frames
    - BUG/MEDIUM: cli: Report inbuf is no longer full when a line is consumed
    - BUG/MEDIUM: quic: crash after quic_conn allocation failures
    - BUG/MEDIUM: quic-be: do not initialize ->conn too early
    - BUG/MEDIUM: mworker: more verbose error upon loading failure
    - MINOR: xprt: Add recvmsg() and sendmsg() parameters to rcv_buf() and snd_buf().
    - MINOR: ssl: Add a "flags" field to ssl_sock_ctx.
    - MEDIUM: xprt: Add a "get_capability" method.
    - MEDIUM: mux_h1/mux_pt: Use XPRT_CAN_SPLICE to decide if we should splice
    - MINOR: cfgparse: Add a new "ktls" option to bind and server.
    - MINOR: ssl: Define HAVE_VANILLA_OPENSSL if openssl is used.
    - MINOR: build: Add a new option, USE_KTLS.
    - MEDIUM: ssl: Add kTLS support for OpenSSL.
    - MEDIUM: splice: Don't consider EINVAL to be a fatal error
    - MEDIUM: ssl: Add splicing with SSL.
    - MEDIUM: ssl: Add ktls support for AWS-LC.
    - MEDIUM: ssl: Add support for ktls on TLS 1.3 with AWS-LC
    - MEDIUM: ssl: Handle non-Application data record with AWS-LC
    - MINOR: ssl: Add a way to globally disable ktls.
2025-08-20 21:52:39 +02:00
Olivier Houchard
6f21c5631a MINOR: ssl: Add a way to globally disable ktls.
Add a new global option, "noktls", as well as a command line option,
"-dT", to totally disable ktls usage, even if it is activated on servers
or binds in the configuration.
That makes it easier to quickly figure out if a problem is related to
ktls or not.
2025-08-20 18:33:11 +02:00
Olivier Houchard
5da3540988 MEDIUM: ssl: Handle non-Application data record with AWS-LC
Handle receiving and sending TLS records that are not application data
records.
When receiving, we ignore new session tickets records, we handle close
notify as a read0, and we consider any other records as a connection
error.
For sending, we're just sending close notify, so that the TLS connection
is properly closed.
2025-08-20 18:33:11 +02:00
Olivier Houchard
fefc1cce20 MEDIUM: ssl: Add support for ktls on TLS 1.3 with AWS-LC
AWS-LC added a new API in AWS-LC 1.54 that allows the user to retrieve
the keys for TLS 1.3 connections with SSL_get_read_traffic_secret(), so
use it to be able to use ktls with TLS 1.3 too.
2025-08-20 18:33:11 +02:00
Olivier Houchard
5c8fa50966 MEDIUM: ssl: Add ktls support for AWS-LC.
Add ktls support for AWS-LC. As it does not know anything
about ktls, it means extracting keys from the ssl lib, and provide them
to the kernel. At which point we can use regular recvmsg()/sendmsg()
calls.
This patch only provides support for TLS 1.2, AWS-LC provides a
different way to extract keys for TLS 1.3.
Note that this may work with BoringSSL too, but it has not been tested.
2025-08-20 18:33:11 +02:00
Olivier Houchard
a903004a1a MEDIUM: ssl: Add splicing with SSL.
Implement the splicing methods to the SSL xprt (which will just call the
raw_sock methods if kTLS is enabled on the socket), and properly report
that a connection supports splicing if kTLS is configured on that
connection.
For OpenSSL, if the upper layer indicated that it wanted to start using
splicing by adding the CO_FL_WANT_SPLICING flag, make sure we don't read
any more data from the socket, and just drain what may be in the
internal OpenSSL buffers, before allowing splicing.
2025-08-20 18:33:11 +02:00
Olivier Houchard
755436920d MEDIUM: splice: Don't consider EINVAL to be a fatal error
Don't consider that EINVAL is a fatal error, when calling splice().
When doing splicing from a kTLS socket, splice() will set errno to
EINVAL if the next record to be read is not an application data record.
This is not a fatal error, it just means we have to use recvmsg() to
read it, and potentially we can then resume using splicing.
It is unfortunate that EINVAL was used for that case, but we should
never get any other case of receiving EINVAL from splice(), so it should
be safe to treat it as non-fatal.
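
The error handling described above can be sketched as a small errno
classifier. This is a hedged illustration only: the enum and function names
are hypothetical, and treating EAGAIN/EINTR as "retry later" is an
assumption of this sketch rather than something stated in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical verdicts for a failed splice() call, for illustration only. */
enum splice_verdict {
	SPLICE_RETRY_LATER,   /* nothing to do now, wait for the next event */
	SPLICE_USE_RECVMSG,   /* read the pending record with recvmsg() */
	SPLICE_FATAL          /* report a connection error */
};

/* Maps the errno left by splice() to what the caller should do next. */
static enum splice_verdict classify_splice_errno(int err)
{
	switch (err) {
	case EAGAIN:
	case EINTR:
		return SPLICE_RETRY_LATER;
	case EINVAL:
		/* on a kTLS socket: the next record is not application
		 * data, so it must be read with recvmsg() before splicing
		 * can possibly resume.
		 */
		return SPLICE_USE_RECVMSG;
	default:
		return SPLICE_FATAL;
	}
}
```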
2025-08-20 18:33:11 +02:00
Olivier Houchard
ed7d20afc8 MEDIUM: ssl: Add kTLS support for OpenSSL.
Modify the SSL code to enable kTLS with OpenSSL.
It mostly requires our internal BIO to be able to handle the various
kTLS-specific controls in ha_ssl_ctrl(), as well as being able to use
recvmsg() and sendmsg() from ha_ssl_read() and ha_ssl_write().
2025-08-20 18:33:11 +02:00
Olivier Houchard
6270073072 MINOR: build: Add a new option, USE_KTLS.
Add a new define, USE_KTLS, that enables using kTLS in haproxy.
It will only work for Linux with a kernel >= 4.17.
2025-08-20 18:33:11 +02:00
Olivier Houchard
7836fe8fe3 MINOR: ssl: Define HAVE_VANILLA_OPENSSL if openssl is used.
If we're using OpenSSL as our crypto library, add a define,
HAVE_VANILLA_OPENSSL, to make it easier to differentiate between the
various crypto libs.
2025-08-20 18:33:10 +02:00
Olivier Houchard
e8674658ae MINOR: cfgparse: Add a new "ktls" option to bind and server.
Add a new "ktls" option to bind and server. Valid values are "on" and
"off".
It currently does nothing, but once kTLS is implemented, it will
enable or disable kTLS for the corresponding sockets.
It is marked as experimental for now.
2025-08-20 18:33:10 +02:00
Olivier Houchard
075e753802 MEDIUM: mux_h1/mux_pt: Use XPRT_CAN_SPLICE to decide if we should splice
In both mux_h1 and mux_pt, use the new XPRT_CAN_SPLICE capability to
decide if we should attempt to use splicing or not.
If we receive XPRT_CONN_CAN_MAYBE_SPLICE, add a new flag on the
connection, CO_FL_WANT_SPLICING, to let the xprt know that we'd love to
be able to do splicing, so that it may get ready for that.
This should have no effect right now, and is required work for adding
kTLS support.
2025-08-20 18:33:10 +02:00
Olivier Houchard
5731b8a19c MEDIUM: xprt: Add a "get_capability" method.
Add a new method to xprts, get_capability, that can be used to query if
an xprt supports something or not.
The first capability implemented is XPRT_CAN_SPLICE, to know if the xprt
will be able to use splicing for the provided connection.
The possible answers are XPRT_CONN_CAN_NOT_SPLICE, which indicates
splicing will never be possible for that connection,
XPRT_CONN_COULD_SPLICE, which indicates that splicing is not usable
right now, but may be in the future, and XPRT_CONN_CAN_SPLICE, that
means we can splice right away.
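
A minimal sketch of how a caller might act on the three answers follows.
The enum value names are the ones quoted in this message, but their
numeric values, the enum type name and the mux_should_splice() helper are
assumptions made for illustration and are not the actual haproxy
declarations.

```c
#include <assert.h>

/* Answers to the XPRT_CAN_SPLICE query; names from the message above,
 * values and type name assumed for this sketch.
 */
enum xprt_splice_answer {
	XPRT_CONN_CAN_NOT_SPLICE, /* splicing will never be possible */
	XPRT_CONN_COULD_SPLICE,   /* not usable now, may be in the future */
	XPRT_CONN_CAN_SPLICE      /* splicing can be used right away */
};

/* Hypothetical caller-side helper: returns 1 if splicing may be used now.
 * <want_splicing> is set to 1 when the caller should tell the xprt that it
 * would like to splice later, so that the xprt may get ready for it.
 */
static int mux_should_splice(enum xprt_splice_answer ans, int *want_splicing)
{
	*want_splicing = (ans == XPRT_CONN_COULD_SPLICE);
	return ans == XPRT_CONN_CAN_SPLICE;
}
```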
2025-08-20 18:33:10 +02:00
Olivier Houchard
2623b7822e MINOR: ssl: Add a "flags" field to ssl_sock_ctx.
Instead of adding more separate fields in ssl_sock_ctx, add a "flags"
one.
Convert the "can_send_early_data" to the flag SSL_SOCK_F_EARLY_ENABLED.
More flags will be added for kTLS support.
2025-08-20 17:28:03 +02:00
Olivier Houchard
3d685fcb7d MINOR: xprt: Add recvmsg() and sendmsg() parameters to rcv_buf() and snd_buf().
In rcv_buf() and snd_buf(), use sendmsg/recvmsg instead of send and
recv, and add two new optional parameters to provide msg_control and
msg_controllen.
Those are unused for now, but will be used later for kTLS.
2025-08-20 17:28:03 +02:00
William Lallemand
67cb6aab90 BUG/MEDIUM: mworker: more verbose error upon loading failure
When a worker crashes during its configuration parsing and without
emitting any messages, the master will emit the message "Failed to load
worker!". However that gives us neither the PID of the worker,
nor the status code.

This patch fixes the problem by emitting a more verbose error.

Must be backported as far as 3.1.
2025-08-20 17:15:52 +02:00
Frederic Lecaille
ca5511f022 BUG/MEDIUM: quic-be: do not initialize ->conn too early
This bug arrived with this commit:

   BUG/MEDIUM: quic: do not release BE quic-conn prior to upper conn

which added a BUG_ON(qc->conn) statement at the beginning of quic_conn_release().
It is triggered if the connection is not released before releasing the quic_conn.
But this is always the case for a backend quic_conn when its allocation from
qc_new_conn() fails.

Such crashes could be reproduced with -dMfail option. To reach them, the
memory allocations must fail. So, this is relatively rare, except on systems
with limited memory.

To fix this, simply set ->conn quic_conn struct member to a not null value
(the one passed as parameter) after the quic_conn allocation has succeeded.

No backport needed.
2025-08-20 16:25:51 +02:00
Frederic Lecaille
8514647849 BUG/MEDIUM: quic: crash after quic_conn allocation failures
This regression arrived with this commit:

	MINOR: quic-be: QUIC connection allocation adaptation (qc_new_conn())

where qc_new_conn() was modified. The ->cids allocation was moved without
checking if a quic_conn_release() call could lead to crashes due to uninitialized
quic_conn members. Indeed, if qc_new_conn() fails, then quic_conn_release() is
called. This bug could impact both QUIC servers and clients.

Such crashes could be reproduced with -dMfail option. To reach them, the
memory allocations must fail. So, this is relatively rare, except on systems
with limited memory.

This patch ensures all the quic_conn members which could lead to crash
from quic_conn_release() are initialized before any remaining memory allocations
required for the quic_conn.

The <conn_id> variable allocated by the client is no more attached to
the connection during its allocation, but after the ->cids trees is allocated.

No backport needed.
2025-08-20 16:25:51 +02:00
Christopher Faulet
c6c2ef1f11 BUG/MEDIUM: cli: Report inbuf is no longer full when a line is consumed
When the command line parsing was refactored (20ec1de21 "MAJOR: cli: Refacor
parsing and execution of pipelined commands"), a regression was introduced.
When input data are consumed, information about the applet's input buffer
are no longer updated accordingly to state it is no longer full. So it is
possible to freeze the CLI applet. And a spinning loop may be encountered if
a client shutdown is detected in this state.

The fix is obvious. When data are consumed from the applet's input buffer,
APPCTX_FL_INBLK_FULL flag is removed to notify the input buffer is no longer
full and more data can be sent to the CLI applet.

This patch should fix the issue #3064. It must be backported to 3.2.
2025-08-20 16:01:50 +02:00
Christopher Faulet
dc6e8dde23 BUG/MINOR: spoe: Properly detect and skip empty NOTIFY frames
Since the SPOE was refactored, the detection of empty NOTIFY frames is
broken. So it is possible to send a NOTIFY frames to an agent with no
message at all. The bug happens because the frame type is now added to the
buffer before the messages encoding. So the buffer is never really empty.

To fix the issue, the condition to detect empty frame was adapted.

This patch must be backported as far as 3.1.
2025-08-20 16:01:50 +02:00
Valentine Krasnobaeva
2c7e05f80e MEDIUM: dns: don't call connect to dest socket for AF_INET*
When we perform connect call for a datagram socket, used to send DNS requests,
we set for it the default destination address to some given nameserver. Then we
simply use send(), as the destination address is already set. In some usecases
described in GitHub issues #3001 and #2654, this approach becomes inefficient:
when nameservers change their IP addresses dynamically, this triggers DNS
resolution errors.

To fix this, let's perform the bind() on the wildcard address for the datagram
AF_INET* client socket. Like this we will allocate a port for it. Then let's
use sendto() instead of send().
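
The approach can be sketched with plain socket calls as below. This is a
simplified IPv4-only illustration under stated assumptions: the helper names
open_unconnected_udp() and send_query() are hypothetical, and the real code
also has to deal with AF_INET6 and haproxy's own fd management.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens an unconnected IPv4 datagram socket bound to the wildcard address
 * with a kernel-chosen port. Returns the fd, or -1 on error.
 */
static int open_unconnected_udp(void)
{
	struct sockaddr_in local;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&local, 0, sizeof(local));
	local.sin_family = AF_INET;
	local.sin_addr.s_addr = htonl(INADDR_ANY);
	local.sin_port = htons(0); /* let the kernel allocate a port */

	if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
		close(fd); /* do not leak the fd on the error path */
		return -1;
	}
	return fd;
}

/* Sends one request to nameserver <ns>. The destination is passed on each
 * call instead of being fixed once by connect().
 */
static ssize_t send_query(int fd, const struct sockaddr_in *ns,
                          const void *buf, size_t len)
{
	return sendto(fd, buf, len, 0, (const struct sockaddr *)ns, sizeof(*ns));
}
```

Since the destination is given to every sendto() call, a new nameserver
address can be used for the next request without reopening the socket.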

If the nameserver is local and is listening on the UNIX domain socket, we
continue to use the existing approach (connect() and then send()).

This fixes issues #3001 and #2654.
This may be backported in all stable versions.
2025-08-19 11:26:02 +02:00
Amaury Denoyelle
8ac54cafcd BUG/MINOR: mux-h1: fix wrong lock label
Wrong lock label is used when manipulating idle lock on h1_timeout_task.
Fix this by replacing OTHER_LOCK by IDLE_CONNS_LOCK.

This only concerns thread debugging statistics.

This must be backported up to 2.4.
2025-08-14 16:31:25 +02:00
Frederic Lecaille
878a72d001 BUG/MEDIUM: quic: listener connection stuck during handshakes (OpenSSL 3.5)
This issue was reported in GH #3071 by @famfo where a wireshark capture
reveals that some handshake could not complete after having received
two Initial packets. This could happen when the packets were parsed
in two times, calling qc_ssl_provide_all_quic_data() two times.

This is due to crypto data stream counter which was incremented two times
from qc_ssl_provide_all_quic_data() (see cstream->rx.offset += data
statement around line 1223 in quic_ssl.c). One time by the callback
which "receives" the crypto data, and one time by qc_ssl_provide_all_quic_data().

Then when parsing the second crypto data frame, the parser detected
that the crypto were already provided.

To fix this, one could comment the code which increment the crypto data
stream counter by <data>. That said, when using the OpenSSL 3.5 QUIC API
one should not modify the crypto data stream outside of the OpenSSL 3.5
QUIC API.

So, this patch stops calling qc_ssl_provide_all_quic_data() and
qc_ssl_provide_quic_data() and only calls qc_ssl_do_hanshake() after
having received some crypto data. In addition to this, as these functions
are no more called when building haproxy against OpenSSL 3.5, this patch
disables their compilation (with #ifndef HAVE_OPENSSL_QUIC).

This patch depends on this previous one:

     MINOR: quic: implement qc_ssl_do_hanshake()

Thank you to @famfo for this report.

Must be backported to 3.2.
2025-08-14 14:54:47 +02:00
Frederic Lecaille
a874821df3 MINOR: quic: implement qc_ssl_do_hanshake()
Extract the code in relation with the handshake SSL API (SSL_do_handshake()...)
from qc_ssl_provide_quic_data() to implement qc_ssl_do_handshake().
2025-08-14 14:54:47 +02:00
Tim Duesterhus
b81a7f428b CI: Update to actions/checkout@v5
No functional change, but we should keep this current.

see 5f4ddb54b05ae0355b1f64c22263a6bc381410df
see 5c923f1869881156bf3a25c9659655ae10f7dbd0
2025-08-13 19:15:04 +02:00
Willy Tarreau
a7f8693fa2 MEDIUM: ring: always allocate properly aligned ring structures
The rings were manually padded to place the various areas that compose
them into different cache lines, provided that the allocator returned
a cache-aligned address, which until now was not guaranteed. By now
switching to the aligned API we can finally have this guarantee and
hope for more consistent ring performance between tests. Like previously
the few carefully crafted THREAD_PAD() could simply be replaced by
generic THREAD_ALIGN() that dictate the type's alignment.

This was the last user of THREAD_PAD() by the way.
2025-08-13 17:47:39 +02:00
Willy Tarreau
cfdab917fe MINOR: server: align server struct to 64 bytes
Several times recently, it was noticed that some benchmarks would
highly vary depending on the position of certain fields in the server
struct, and this could even vary between runs.

The server struct does have separate areas depending on the use cases
and hot/cold aspect of the members stored there, but the areas are
artificially kept apart using fixed padding instead of real alignment,
which has the first sad effect of artificially inflating the struct,
and the second one of misaligning it.

Now that we have all the necessary tools to keep them aligned, let's
just do it. The struct has shrunk from 4160 to 4032 bytes on 64-bit
systems, 152 of which are still holes or padding.
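
The difference between fixed padding and real alignment can be illustrated
with a small standalone sketch. The struct below is hypothetical and far
simpler than the real server struct; it uses the standard C11 alignas()
rather than haproxy's own alignment macros.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64

/* Hypothetical struct for illustration, not the real struct server. */
struct demo_srv {
	/* cold area: configuration, rarely written */
	int id;
	const char *name;

	/* hot area: alignas() makes it start on its own cache line, and
	 * raises the alignment of the whole struct to CACHE_LINE.
	 */
	alignas(CACHE_LINE) unsigned long served;
	unsigned long queued;
};
```

With alignment declared on the member, the compiler inserts exactly the
padding needed, whereas a fixed-size pad only separates the areas if the
struct itself happens to be allocated at a cache-aligned address.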
2025-08-13 17:37:11 +02:00
Willy Tarreau
a469356268 MEDIUM: server: introduce srv_alloc()/srv_free() to alloc/free a server
It happens that we free servers at various places in the code, both
on error paths and at runtime thanks to the "server delete" feature. In
order to switch to an aligned struct, we'll need to change the calloc()
and free() calls. Let's first spot them and switch them to srv_alloc()
and srv_free() instead of using calloc() and either free() or ha_free().
An easy trap to fall into is that some of them are default-server
entries. The new srv_free() function also resets the pointer like
ha_free() does.
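
The "free and reset the pointer" behaviour mentioned above can be sketched
as follows. The demo_* names are hypothetical and the bodies are
deliberately simplified (plain calloc()/free()); the real srv_alloc() and
srv_free() are expected to use aligned allocations instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the server struct. */
struct demo_server {
	int id;
};

/* Returns a zero-initialized object, or NULL on allocation failure. */
static struct demo_server *demo_srv_alloc(void)
{
	return calloc(1, sizeof(struct demo_server));
}

/* Frees the object and resets the caller's pointer, so that a later
 * access or a second release cannot touch freed memory.
 */
static void demo_srv_free(struct demo_server **srv)
{
	free(*srv);
	*srv = NULL;
}
```

Taking the address of the pointer is what allows the reset, which is why
the script below rewrites free(srv) into srv_free(&srv).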

This was done by running the following coccinelle script all over the
code:

  @@
  struct server *srv;
  @@
  (
  - free(srv)
  + srv_free(&srv)
  |
  - ha_free(&srv)
  + srv_free(&srv)
  )
  @@
  struct server *srv;
  expression e1;
  expression e2;
  @@
  (
  - srv = malloc(e1)
  + srv = srv_alloc()
  |
  - srv = calloc(e1, e2)
  + srv = srv_alloc()
  )

This is marked medium because despite spotting all call places, we can
never rule out the possibility that some out-of-tree patches would
allocate their own servers and continue to use the old API... at their
own risk.
2025-08-13 17:37:11 +02:00
Willy Tarreau
33d72568dd MINOR: tools: also implement ha_aligned_alloc_typed()
This one is a macro and will allocate a properly aligned and sized
object. This will help make sure that the alignment promised to the
compiler is respected.

When memstats is used, the type name is passed as a string into the
.extra field so that it can be displayed in "debug dev memstats". Two
tiny mistakes related to memstats macros were also fixed (calloc
instead of malloc for zalloc), and the doc was also added to document
how to use these calls.
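
As a rough idea of what a typed aligned allocation looks like, here is a
sketch built on the standard C11 aligned_alloc(). The macro name
demo_aligned_alloc_typed() and the struct are hypothetical; this is not the
haproxy implementation and it ignores the memstats integration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical macro: allocates <count> objects of <type>, aligned on the
 * type's own alignment. sizeof(type) is always a multiple of alignof(type),
 * which satisfies the size requirement of aligned_alloc().
 */
#define demo_aligned_alloc_typed(count, type) \
	((type *)aligned_alloc(alignof(type), (count) * sizeof(type)))

/* Hypothetical type whose two fields sit on distinct 64-byte lines. */
struct demo_ring {
	alignas(64) unsigned long head;
	alignas(64) unsigned long tail;
};
```

Deriving both the size and the alignment from the type keeps the allocation
consistent with what the type declaration promises to the compiler.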
2025-08-13 17:37:08 +02:00
Lukas Tribus
9432e7d688 DOC: config: recommend single quoting passwords
Suggests single quoting passwords and update examples to avoid unexpected
behaviors due to special characters.

Should be backported to stable versions.

Link: https://discourse.haproxy.org/t/enhance-documentation-for-insecure-passwords-and-invald-characters/11959
2025-08-13 09:08:25 +02:00
Lukas Tribus
faacc6c084 DOC: management: fix typo in commit f4f93c56
Fixes a small typo in commit f4f93c56 ("DOC: management: clarify usage
of -V with -c").

Must be backported as far as 2.8 along commit f4f93c56.
2025-08-13 09:08:25 +02:00
Willy Tarreau
1bb9754648 OPTIM: server: start to use aligned allocs in server
This is currently for per-thread arrays like idle conns etc. We're
now cache-aligning the per-thread arrays so as to put an end to false
sharing. A comparative test between no alignment and alignment on a
simple config with round robin between 4 servers showed an average
rate of 1.75M/s vs 1.72M/s before for 100M requests. The gain seems
to be more commonly less than 1% however. This should mostly help
make measurements more reproducible across multiple runs.
2025-08-11 19:55:30 +02:00
Willy Tarreau
c2687f587e OPTIM: connection: align connection pools to 64
The struct connection is used a lot by the muxes during many operations,
particularly at the beginning of the struct (flags, ctrl, xprt and mux).
We definitely want this one not to be falsely shared with another thread,
so let's align the pools to a cache line.
2025-08-11 19:55:30 +02:00
Willy Tarreau
d6095fcfe6 OPTIM: queue: align the pendconn pools to 64
This is in order to limit false sharing, because this element is already
ultra-sensitive to sharing and we'd rather limit it as much as possible.
2025-08-11 19:55:30 +02:00
Willy Tarreau
77335f52fc OPTIM: buffers: align the buffer pool to 64
This struct is used by memcpy() and friends, particularly during the
early recv() and send(). By keeping it 64-byte aligned, we let the
underlying libs/kernel use optimal operations (e.g. AVX512) for memory
copies while right now it's just random (buffers are found to be equally
aligned to 32 and 64 in practice).
2025-08-11 19:55:30 +02:00
Willy Tarreau
c471de7964 OPTIM: tasks: align task and tasklet pools to 64
These structs are intensively used and really must not experience false
sharing, so let's declare them aligned to 64. We don't try to align the
structs themselves, as we don't want the compiler to expand them either.
2025-08-11 19:55:30 +02:00
Willy Tarreau
c264ea1679 MEDIUM: tree-wide: replace most DECLARE_POOL with DECLARE_TYPED_POOL
This will make the pools' size and alignment automatically inherit
the type declaration. It was done like this:

   sed -i -e 's:DECLARE_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_POOL src addons)
   sed -i -e 's:DECLARE_STATIC_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_STATIC_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_STATIC_POOL src addons)

81 replacements were made. The only remaining ones are those which set
their own size without depending on a structure. The few ones with an
extra size were manually handled.

It also means that the requested alignments are now checked against the
type's. Given that none is specified for now, no issue is reported.

It was verified with "show pools detailed" that the definitions are
exactly the same, and that the binaries are similar.
2025-08-11 19:55:30 +02:00
Willy Tarreau
977feb5617 DOC: api: update the pools API with the alignment and typed declarations
This adds the DECLARE_*ALIGNED*() and DECLARE_*TYPED*() macros.
2025-08-11 19:55:30 +02:00
Willy Tarreau
6be7b64bb4 MINOR: pools: always check that requested alignment matches the type's
For pool registrations that are created from the type declaration, we
now have the ability to verify that the requested alignment matches
the type's one. Let's not miss this opportunity, as we've met bugs in
the past that were caused by such mismatches. The principle is simple:
if the type alignment is known, we check that the configured alignment
is at least as large as that one otherwise we refuse to start (since
the code may crash at any moment). Obviously it doesn't crash for now!
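
The principle can be sketched as follows. This is only an illustration:
the structure and function names are hypothetical and do not reflect the
actual pool registration layout. It assumes a type alignment of zero
means "unknown".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registration record: <align> is the alignment requested
 * for the pool, <type_align> is the alignment of the declared type, or
 * zero when the pool was not declared from a type.
 */
struct pool_reg_sketch {
	const char *name;
	size_t align;
	size_t type_align;
};

/* Returns 1 if the registration may be used, 0 if startup must be
 * refused because the pool would hand out under-aligned objects.
 */
static int pool_reg_align_ok(const struct pool_reg_sketch *reg)
{
	assert(reg != NULL);

	if (!reg->type_align)
		return 1; /* type alignment unknown: nothing to compare */

	return reg->align >= reg->type_align;
}
```

A caller would run this once per registration at boot and abort on the
first one that returns 0.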
2025-08-11 19:55:30 +02:00
Willy Tarreau
e21bb531ca MINOR: pools: permit to optionally specify extra size and alignment
The common macros REGISTER_TYPED_POOL(), DECLARE_TYPED_POOL() and
DECLARE_STATIC_TYPED_POOL() will now take two optional arguments,
one being the extra size to be added to the structure, and a second
one being the desired alignment to enforce. This will permit to
specify alignments larger than the default ones promised to the
compiler.
2025-08-11 19:55:30 +02:00
Willy Tarreau
d240f387ca MINOR: pools: distinguish the requested alignment from the type-specific one
We're letting users request an alignment but that can violate one imposed
by a type, especially if we start seeing REGISTER_TYPED_POOL() grow in
adoption, encouraging users to specify alignment on their types. On the
other hand, if we ask the user to always specify the alignment, no control
is possible and the error is easy. Let's have a second field in the pool
registration, for the type-specific one. We'll set it to zero when unknown,
and to the type's alignment when known. This way it will become possible
to compare them at startup time to detect conflicts. For now no macro
permits to set both separately so this is not visible.
2025-08-11 19:55:30 +02:00
Willy Tarreau
5e2837cfb4 CLEANUP: fd: make use of ha_aligned_alloc() for the fdtab
We've forcefully aligned the fdtab in commit 97ea9c49f1 ("BUG/MEDIUM:
fd: always align fdtab[] to 64 bytes"), but now we don't need such hacks
anymore thanks to ha_aligned_alloc(). Let's use it and get rid of
fdtab_addr.
2025-08-11 19:55:30 +02:00
Willy Tarreau
746e77d000 MINOR: tools: implement ha_aligned_zalloc()
This one is exactly ha_aligned_alloc() followed by a memset(0), as
it will be convenient for a number of call places as a replacement
for calloc().

Note that ideally we should also have a calloc version that performs
basic multiply overflow checks, but these are essentially used with
numbers of threads times small structs so that's fine, and we already
do the same everywhere in malloc() calls.
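
For reference, a calloc-like variant with the multiply overflow check
mentioned above could look like the sketch below. It assumes a POSIX
system providing posix_memalign(); the function name is hypothetical and
this is not the code added by this patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates <nmemb> * <size> zeroed bytes aligned
 * on <align>, which must be a power of two and at least sizeof(void *).
 * Returns NULL on multiplication overflow or allocation failure. The
 * area must be released with free().
 */
static void *aligned_calloc_checked(size_t align, size_t nmemb, size_t size)
{
	void *ptr = NULL;
	size_t total;

	assert(align >= sizeof(void *) && (align & (align - 1)) == 0);

	/* reject products that do not fit in a size_t */
	if (size && nmemb > SIZE_MAX / size)
		return NULL;

	/* always allocate at least one byte so the result is usable */
	total = nmemb * size;
	if (!total)
		total = 1;

	/* posix_memalign() returns 0 on success, an error number otherwise */
	if (posix_memalign(&ptr, align, total) != 0)
		return NULL;

	memset(ptr, 0, total);
	return ptr;
}
```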
2025-08-11 19:55:30 +02:00
William Lallemand
55d561042c MEDIUM: ssl/cli: relax crt insertion in crt-list of type directory
In previous versions of haproxy, insertions of certificates in a
crt-list from the CLI would require to have the path of the directory,
in the path of the certificate. This helped avoid situations where the
certificate was not loaded upon a reload because it was not at the right
place.

However, since version 3.0 and crt-store, the name stored in the tree
could be an alias and not a path, so that does not make sense anymore.
Even if the path were right, the check is not right anymore in this
case.

The tool or user inserting the certificate must now check itself that
the certificate was placed at the right spot on the filesystem.

Reported in issue #3053.

Could be backported as far as haproxy 3.0.
2025-08-11 17:42:16 +02:00
William Lallemand
f4f93c56c1 DOC: management: clarify usage of -V with -c
In ticket #3065 a user complained that no success message is printed
anymore when using -c. The message does not appear by default since
version 2.9. This patch clarifies the documentation.

Must be backported as far as 2.8.
2025-08-11 16:23:00 +02:00
Remi Tricot-Le Breton
15ee49e822 BUG/MINOR: init: Initialize random seed earlier in the init process
The random seed used in ha_random functions needs to be first
initialized by calling ha_random_boot. This function was called rather
late in the init process, after the init functions (INITCALLS) are
called and after the configuration parsing for instance which means that
any ha_random call in an init function would return 0. This was the case
in 'vars_init' and 'cache_init' which tried to build seeds for specific
hash calculations but ended up not being seeded.

This patch can be backported on all stable branches.
2025-08-11 16:02:41 +02:00
William Lallemand
84589a9f48 MEDIUM: acme: use lowercase for challenge names in configuration
Both the RFC and the IANA registry refer to challenge names in
lowercase. If we need to implement more challenges, it's better to
use the correct naming.

In order to keep the compatibility with the previous configurations, the
parsing does a strcasecmp() instead of a strcmp().
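
A minimal sketch of such a case-insensitive match, assuming POSIX
strcasecmp() from <strings.h>. The enum and the function name are
hypothetical and only serve the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical identifiers for the challenges handled by this sketch */
enum acme_chall_sketch {
	CHALL_UNKNOWN = 0,
	CHALL_HTTP_01,
	CHALL_DNS_01,
};

/* Matches the lowercase names while still accepting the uppercase
 * spelling used by older configurations.
 */
static enum acme_chall_sketch parse_challenge_sketch(const char *arg)
{
	assert(arg != NULL);

	if (strcasecmp(arg, "http-01") == 0)
		return CHALL_HTTP_01;
	if (strcasecmp(arg, "dns-01") == 0)
		return CHALL_DNS_01;
	return CHALL_UNKNOWN;
}
```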

Also rename every occurrence in the code and doc to lowercase.

This was discussed in issue #1864
2025-08-11 15:09:18 +02:00
Olivier Houchard
b6702d5342 BUG/MEDIUM: ssl: fix build with AWS-LC
AWS-LC doesn't provide SSL_in_before(), and doesn't provide an easy way
to know if we already started the handshake or not. So instead, just add
a new field in ssl_sock_ctx, "can_write_early_data", that will be
initialized to 1, and will be set to 0 as soon as we start the
handshake.

This should be backported up to 2.8 with
13aa5616c9f99dbca0711fd18f716bd6f48eb2ae.
2025-08-08 20:21:14 +02:00
Olivier Houchard
13aa5616c9 BUG/MEDIUM: ssl: Fix 0rtt to the server
In order to send early data, we have to make sure no handshake has been
initiated at all. To do that, we remove the CO_FL_SSL_WAIT_HS flag, so
that we won't attempt to start a handshake. However, by removing those
flags, we allow ssl_sock_to_buf() to call SSL_read(), as it's no longer
aware that no handshake has been done, and SSL_read() will begin the
handshake, thus preventing us from sending early data.
The fix is to just call SSL_in_before() to check if no handshake has
been done yet, in addition to checking CO_FL_SSL_WAIT_HS (both are
needed, as CO_FL_SSL_WAIT_HS may come back in case of renegotiation).
In ssl_sock_from_buf(), fix the check to see if we may attempt to send
early data. Use SSL_in_before() instead of SSL_is_init_finished(), as
SSL_is_init_finished() will return 1 if the handshake has been started,
but not terminated, and if the handshake has been started, we can no
longer send early data.
This fixes errors when attempting to send early data (as well as
actually sending early data).

This should be backported up to 2.8.
2025-08-08 19:13:37 +02:00
Ilia Shipitsin
c10e8401e2 CI: vtest: add Ubuntu arm64 builds
Reference: https://github.com/actions/partner-runner-images

Since GHA now supports arm64 as well, let's add those builds. We will
start with ASAN builds; others will be added later if required.
2025-08-08 15:36:11 +02:00
Ilia Shipitsin
6b2bbcb428 CI: vtest: add os name to OT cache key
Currently the OpenTracing cache key does not include the OS name, so it
cannot distinguish, for example, between ubuntu-24.04 and
ubuntu-24.04-arm.
2025-08-08 15:36:12 +02:00
David Carlier
7fe8989fbb MINOR: sock: update broken accept4 detection for older hardwares.
Some older ARM embedded settings set errno to EPERM instead of ENOSYS
for missing implementations (e.g. Freescale ARM 2.6.35)
2025-08-08 06:01:18 +02:00
Valentine Krasnobaeva
21d5f43aa6 BUG/MINOR: stick-table: cap sticky counter idx with tune.nb_stk_ctr instead of MAX_SESS_STKCTR
Cap sticky counter index with tune.nb_stk_ctr instead of MAX_SESS_STKCTR for
sc-add-gpc. Same logic is already implemented for sc-inc-gpc and sc-set-gpt
keywords, so it seems to have simply been missed for sc-add-gpc.

This fixes the issue #3061 reported at GitHub. Thanks to @ma311 for
reporting their analysis of the issue.
This should be backported to all versions down to 2.8, 2.8 included.
2025-08-08 05:26:30 +02:00
Aurelien DARRAGON
7656a41784 BUILD: restore USE_SHM_OPEN build option
Some optional features may still require the use of shm_open() in the
future. In this patch we restore the USE_SHM_OPEN build option that
was removed in 143be1b59 ("MEDIUM: errors: get rid of shm_open()") and
should guard the use of shm_open() in the code.
2025-08-07 22:27:22 +02:00
Aurelien DARRAGON
bcb124f92a MINOR: init: add REGISTER_POST_DEINIT_MASTER() hook
Similar to REGISTER_POST_DEINIT() hook (which is invoked during deinit)
but for master process only, when haproxy was started in master-worker
mode. The goal is to be able to register cleanup functions that will
only run for the master process right before exiting.
2025-08-07 22:27:14 +02:00
Aurelien DARRAGON
c8282f6138 MINOR: clock: add clock_get_now_offset() helper
Same as clock_set_now_offset() but to retrieve the offset from external
location.
2025-08-07 22:27:09 +02:00
Aurelien DARRAGON
20f9d8fa4e MINOR: clock: add clock_set_now_offset() helper
Since now_offset is a static variable and is not exposed outside of
clock.c, let's add a helper so that it becomes possible to set its
value from another source file.
2025-08-07 22:27:05 +02:00
Aurelien DARRAGON
4c3a36c609 MINOR: guid: add guid_count() function
Returns the total number of registered GUIDs in the guid_tree.
2025-08-07 22:26:58 +02:00
Aurelien DARRAGON
7c52964591 MINOR: guid: add guid_get() helper
guid_get() is a convenient function to get the actual key string
associated to a given guid_node struct
2025-08-07 22:26:52 +02:00
Aurelien DARRAGON
3759172015 BUG/MINOR: proxy: avoid NULL-deref in post_section_px_cleanup()
post_section_px_cleanup(), which was implemented in abcc73830
("MEDIUM: proxy: register a post-section cleanup function"), is called
for the current section no matter if the parsing was aborted due to
a fatal error. In this case, the curproxy pointer may point to NULL,
yet post_section_px_cleanup() assumes curproxy pointer is always valid,
which could lead to NULL-deref.

For instance, the config below will cause SEGFAULT:

  listen toto titi

To fix the issue, let's simply consider that the curproxy pointer may
be NULL in post_section_px_cleanup(), in which case we skip the cleanup
for the curproxy since there is nothing we can do.

No backport needed
2025-08-07 22:26:47 +02:00
Aurelien DARRAGON
833158f9e0 BUG/MINOR: cfgparse-listen: update err_code for fatal error on proxy directive
When improper arguments are provided on proxy directive (listen,
frontend or backend), such alert may be emitted:

  "please use the 'bind' keyword for listening addresses"

This was introduced in 6e62fb6405 ("MEDIUM: cfgparse: check section
maximum number of arguments"). However, despite the error being reported
as alert, the err_code isn't updated accordingly, which could make the
upper parser think there was no error, while it isn't the case.

In practice, since the proxy directive is ignored, the following proxy-related
directives should raise errors, so this didn't cause much harm, yet it is
better to fix that.

It could be backported to all stable versions.
2025-08-07 22:26:42 +02:00
Aurelien DARRAGON
525750e135 BUG/MINOR: cfgparse: immediately stop after hard error in srv_init()
Since 368d01361 (" MEDIUM: server: add and use srv_init() function"), in
case of srv_init() error, we simply increment cfgerr variable and keep
going.

It isn't enough: some processing occurring later in check_config_validity()
assumes that srv_init() succeeded for servers, and may cause undefined
behavior. To fix the issue, let's consider that if (srv_init() & ERR_CODE)
returns true, then we must stop checking the config immediately.

No backport needed unless 368d01361 is.
2025-08-07 22:26:37 +02:00
Amaury Denoyelle
731b52ded9 MINOR: quic: prefer qc_is_back() usage over qc->target
Previously quic_conn <target> member was used to determine if quic_conn
was used on the frontend (as server) or backend side (as client). A new
helper function can now be used to directly check flag
QUIC_FL_CONN_IS_BACK.

This reduces the dependency between quic_conn and their relative
listener/server instances.
2025-08-07 16:59:59 +02:00
Amaury Denoyelle
cae828cbf5 MINOR: quic: define QUIC_FL_CONN_IS_BACK flag
Define a new quic_conn flag assigned if the connection is used on the
backend side. This is similar to other haproxy components such as struct
connection and muxes element.

This flag is set via qc_new_conn(). Also update quic traces to
mark proxy side as 'F' or 'B' suffix.
2025-08-07 16:59:59 +02:00
Amaury Denoyelle
e064e5d461 MINOR: quic: duplicate GSO unsupp status from listener to conn
QUIC emission can use GSO to emit multiple datagrams with a single
syscall invocation. However, this feature relies on several kernel
parameters which are checked on haproxy process startup.

Even if these checks report no issue, GSO may still be unavailable due to the
underlying network adapter. Thus, if an EIO occurred on
sendmsg() with GSO, the listener is flagged to mark GSO as unsupported. This
allows every other QUIC connection to share the status and avoid using
GSO when using this listener.

Previously, listener flag was checked for every QUIC emission. This was
done using an atomic operation to prevent races. Improve this by
duplicating GSO unsupported status at the connection level. This is done
on qc_new_conn() and also on thread rebinding if a new listener instance
is used.

The main benefit from this patch is to reduce the dependency between
quic_conn and listener instances.
2025-08-07 16:36:26 +02:00
Willy Tarreau
d76ee72d03 [RELEASE] Released version 3.3-dev6
Released version 3.3-dev6 with the following main changes :
    - MINOR: acme: implement traces
    - BUG/MINOR: hlua: take default-path into account with lua-load-per-thread
    - CLEANUP: counters: rename counters_be_shared_init to counters_be_shared_prepare
    - MINOR: clock: make global_now_ms a pointer
    - MINOR: clock: make global_now_ns a pointer as well
    - MINOR: mux-quic: release conn after shutdown on BE reuse failure
    - MINOR: session: strengthen connection attach to session
    - MINOR: session: remove redundant target argument from session_add_conn()
    - MINOR: session: strengthen idle conn limit check
    - MINOR: session: do not release conn in session_check_idle_conn()
    - MINOR: session: streamline session_check_idle_conn() usage
    - MINOR: muxes: refactor private connection detach
    - BUG/MEDIUM: mux-quic: ensure Early-data header is set
    - BUILD: acme: avoid declaring TRACE_SOURCE in acme-t.h
    - MINOR: acme: emit a log for DNS-01 challenge response
    - MINOR: acme: emit the DNS-01 challenge details on the dpapi sink
    - MEDIUM: acme: allow to wait and restart the task for DNS-01
    - MINOR: acme: update the log for DNS-01
    - BUG/MINOR: acme: possible integer underflow in acme_txt_record()
    - BUG/MEDIUM: hlua_fcn: ensure systematic watcher cleanup for server list iterator
    - MINOR: sample: Add le2dec (little endian to decimal) sample fetch
    - BUILD: fcgi: fix the struct name of fcgi_flt_ctx
    - BUILD: compat: provide relaxed versions of the MIN/MAX macros
    - BUILD: quic: use _MAX() to avoid build issues in pools declarations
    - BUILD: compat: always set _POSIX_VERSION to ease comparisons
    - MINOR: implement ha_aligned_alloc() to return aligned memory areas
    - MINOR: pools: support creating a pool from a pool registration
    - MINOR: pools: add a new flag to declare static registrations
    - MINOR: pools: force the name at creation time to be a const.
    - MEDIUM: pools: change the static pool creation to pass a registration
    - DEBUG: pools: store the pool registration file name and line number
    - DEBUG: pools: also retrieve file and line for direct callers of create_pool()
    - MEDIUM: pools: add an alignment property
    - MINOR: pools: add macros to register aligned pools
    - MINOR: pools: add macros to declare pools based on a struct type
    - MEDIUM: pools: respect pool alignment in allocations
2025-08-06 21:50:00 +02:00
Willy Tarreau
ef915e672a MEDIUM: pools: respect pool alignment in allocations
Now pool_alloc_area() takes the alignment in argument and makes use
of ha_aligned_malloc() instead of malloc(). pool_alloc_area_uaf()
simply applies the alignment before returning the mapped area. The
pool_free() function calls ha_aligned_free() so as to permit to use
a specific API for aligned alloc/free like mingw requires.

Note that it's possible to see warnings about mismatching sizes
during pool_free() since we know both the pool and the type. In
pool_free, adding just this is sufficient to detect potential
offenders:

	WARN_ON(__alignof__(*__ptr) > pool->align);
2025-08-06 19:20:36 +02:00
Willy Tarreau
f0d0922aa1 MINOR: pools: add macros to declare pools based on a struct type
DECLARE_TYPED_POOL() and friends take a name, a type and an extra
size (to be added to the size of the element), and will use this
to create the pool. This has the benefit of letting the compiler
automatically adapt sizeof() and alignof() based on the type
declaration.
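
The idea can be illustrated with the simplified macro below. It is only
a sketch written for C11: the macro, the descriptor structure and the
example type are hypothetical, and the real macros take different
arguments and register the pool through initcalls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool descriptor filled at build time */
struct pool_desc_sketch {
	const char *name;
	size_t size;
	size_t align;
};

/* Derives both the size and the alignment from <type>, so changing the
 * type declaration automatically updates the pool.
 */
#define DECLARE_TYPED_POOL_SKETCH(var, pname, type, extra)            \
	static_assert((_Alignof(type) & (_Alignof(type) - 1)) == 0,    \
	              "type alignment must be a power of two");        \
	static const struct pool_desc_sketch var = {                   \
		.name  = (pname),                                       \
		.size  = sizeof(type) + (extra),                        \
		.align = _Alignof(type),                                \
	}

/* Example type, also hypothetical */
struct conn_sketch {
	void *ctx;
	unsigned int flags;
	char id[3];
};

DECLARE_TYPED_POOL_SKETCH(pool_conn_sketch, "conn", struct conn_sketch, 0);
```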
2025-08-06 19:20:36 +02:00
Willy Tarreau
6ea0e3e2f8 MINOR: pools: add macros to register aligned pools
This adds an alignment argument to create_pool_from_loc() and
completes the existing low-level macros with new ones that expose
the alignment and the new macros permit to specify it. For now
they're not used.
2025-08-06 19:20:36 +02:00
Willy Tarreau
eb075d15f6 MEDIUM: pools: add an alignment property
This will be used to declare aligned pools. For now it's not used,
but it's properly set from the various registrations that compose
a pool, and rounded up to the next power of 2, with a minimum of
sizeof(void*).

The alignment is returned in the "show pools" part that indicates
the entry size. E.g. "(56 bytes/8)" means 56 bytes, aligned by 8.
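
The rounding rule described above can be sketched like this. The
function name is hypothetical and the overflow clamp is an addition made
for the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds <align> up to the next power of two, with a minimum of
 * sizeof(void *). Values too large to be rounded up without overflowing
 * are clamped to the largest power of two that fits in a size_t.
 */
static size_t pool_align_round_sketch(size_t align)
{
	size_t ret = sizeof(void *);

	/* double until large enough, stopping before the shift overflows */
	while (ret < align && ret <= SIZE_MAX / 2)
		ret <<= 1;

	assert((ret & (ret - 1)) == 0); /* the result is a power of two */
	return ret;
}
```

With this rule, a requested alignment of 48 becomes 64, and a request of
0 or 1 falls back to the pointer size.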
2025-08-06 19:20:36 +02:00
Willy Tarreau
ac23b873f5 DEBUG: pools: also retrieve file and line for direct callers of create_pool()
Just like previous patch, we want to retrieve the location of the caller.
For this we turn create_pool() into a macro that collects __FILE__ and
__LINE__ and passes them to the now renamed function create_pool_with_loc().

Now the remaining ~30 pools also have their location stored.
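
The technique is the usual one of hiding a function behind a macro of
the same purpose. A minimal sketch, using hypothetical names and a
simplified structure rather than the real pool types:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified pool record that remembers where it was created */
struct pool_sketch {
	const char *name;
	size_t size;
	const char *file;
	int line;
};

static struct pool_sketch create_pool_with_loc_sketch(const char *name, size_t size,
                                                      const char *file, int line)
{
	struct pool_sketch p;

	assert(name != NULL && file != NULL);
	p.name = name;
	p.size = size;
	p.file = file;
	p.line = line;
	return p;
}

/* Callers keep the short form; the macro captures their location since
 * __FILE__ and __LINE__ are expanded at the call site.
 */
#define create_pool_sketch(name, size) \
	create_pool_with_loc_sketch((name), (size), __FILE__, __LINE__)
```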
2025-08-06 19:20:34 +02:00
Willy Tarreau
efa856a8b0 DEBUG: pools: store the pool registration file name and line number
When pools are declared using DECLARE_POOL(), REGISTER_POOL etc, we
know where they are and it's trivial to retrieve the file name and line
number, so let's store them in the pool_registration, and display them
when known in "show pools detailed".
2025-08-06 19:20:32 +02:00
Willy Tarreau
ff62aacb20 MEDIUM: pools: change the static pool creation to pass a registration
Now we're creating statically allocated registrations instead of
passing all the parameters and allocating them on the fly. Not only
this is simpler to extend (we're limited in number of INITCALL args),
but it also leaves all of these in the data segment where they are
easier to find when debugging.
2025-08-06 19:20:30 +02:00
Willy Tarreau
f51d58bd2e MINOR: pools: force the name at creation time to be a const.
This is already the case as all names are constant so that's fine. If
it would ever change, it's not very hard to just replace it in-situ
via an strdup() and set a flag to mention that it's dynamically
allocated. We just don't need this right now.

One immediately visible effect is in "show pools detailed" where the
names are no longer truncated.
2025-08-06 19:20:28 +02:00
Willy Tarreau
ee5bc28865 MINOR: pools: add a new flag to declare static registrations
We must not free these ones when destroying a pool, so let's dedicate
them a flag to mention that they are static. For now we don't have any
such.
2025-08-06 19:20:26 +02:00
Willy Tarreau
18505f9718 MINOR: pools: support creating a pool from a pool registration
We've recently introduced pool registrations to be able to enumerate
all pool creation requests with their respective parameters, but till
now they were only used for debugging ("show pools detailed"). Let's
go a step further and split create_pool() in two:
  - the first half only allocates and sets the pool registration
  - the second half creates the pool from the registration

This is what this patch does. This now opens the ability to pre-create
registrations and create pools directly from there.
2025-08-06 19:20:22 +02:00
Willy Tarreau
325d1bdcca MINOR: implement ha_aligned_alloc() to return aligned memory areas
We have two versions, _safe() which verifies and adjusts alignment,
and the regular one which trusts the caller. There's also a dedicated
ha_aligned_free() due to mingw.

The currently detected OSes are mingw, unixes older than POSIX 200112
which require memalign(), and those post 200112 which will use
posix_memalign(). Solaris 10 reports 200112 (probably through
_GNU_SOURCE since it does not do it by default), and Solaris 11 still
supports memalign() so for all Solaris we use memalign(). The memstats
wrappers are also implemented, and have the exported names. This was
the opportunity for providing a separate free call that lets the caller
specify the size (e.g. for use with pools).

For now this code is not used.
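
To show why a dedicated free call is needed, here is a reduced sketch
covering only two of the cases above: Windows toolchains such as mingw,
where _aligned_malloc() must be paired with _aligned_free(), and POSIX
systems offering posix_memalign(). The function names are hypothetical
and the memalign() fallback is left out.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>
#endif

/* Trusts the caller: <align> must be a power of two and at least
 * sizeof(void *). Returns NULL on failure.
 */
static void *aligned_alloc_sketch(size_t align, size_t size)
{
	assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
#if defined(_WIN32)
	return _aligned_malloc(size, align);
#else
	void *ptr = NULL;

	if (posix_memalign(&ptr, align, size) != 0)
		return NULL;
	return ptr;
#endif
}

/* Areas from aligned_alloc_sketch() must only be released here, since
 * plain free() is not valid for _aligned_malloc() areas.
 */
static void aligned_free_sketch(void *ptr)
{
#if defined(_WIN32)
	_aligned_free(ptr);
#else
	free(ptr);
#endif
}
```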
2025-08-06 19:19:27 +02:00
Willy Tarreau
e921fe894f BUILD: compat: always set _POSIX_VERSION to ease comparisons
Sometimes we need to compare it to known versions, let's make sure it's
always defined. We set it to zero if undefined so that it cannot match
any comparison.
2025-08-06 19:19:27 +02:00
Willy Tarreau
2ce0c63206 BUILD: quic: use _MAX() to avoid build issues in pools declarations
With the upcoming pool declaration, we're filling a struct's fields,
while older versions were relying on initcalls which could be turned
to function declarations. Thus the compound expressions that were
usable there are not necessarily anymore, as witnessed here with
gcc-5.5 on solaris 10:

      In file included from include/haproxy/quic_tx.h:26:0,
                       from src/quic_tx.c:15:
      include/haproxy/compat.h:106:19: error: braced-group within expression allowed only inside a function
       #define MAX(a, b) ({    \
                         ^
      include/haproxy/pool.h:41:11: note: in definition of macro '__REGISTER_POOL'
         .size = _size,           \
                 ^
      ...
      include/haproxy/quic_tx-t.h:6:29: note: in expansion of macro 'MAX'
       #define QUIC_MAX_CC_BUFSIZE MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU)

Let's make the macro use _MAX() instead of MAX() since it relies on pure
constants.
2025-08-06 19:19:11 +02:00
Willy Tarreau
cf8871ae40 BUILD: compat: provide relaxed versions of the MIN/MAX macros
In 3.0 the MIN/MAX macros were converted to compound expressions with
commit 0999e3d959 ("CLEANUP: compat: make the MIN/MAX macros more
reliable"). However with older compilers these are not supported out
of code blocks (e.g. to initialize variables or struct members). This
is the case on Solaris 10 with gcc-5.5 where QUIC doesn't compile
anymore with the future pool registration:

  In file included from include/haproxy/quic_tx.h:26:0,
                   from src/quic_tx.c:15:
  include/haproxy/compat.h:106:19: error: braced-group within expression allowed only inside a function
   #define MAX(a, b) ({    \
                     ^
  include/haproxy/pool.h:41:11: note: in definition of macro '__REGISTER_POOL'
     .size = _size,           \
             ^
  ...
  include/haproxy/quic_tx-t.h:6:29: note: in expansion of macro 'MAX'
   #define QUIC_MAX_CC_BUFSIZE MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU)

Let's provide the old relaxed versions as _MIN/_MAX for use with constants
like such cases where it's certain that there is no risk. A previous attempt
using __builtin_constant_p() to switch between the variants did not work,
and it's really not worth the hassle of going this far.
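
As an illustration of the difference, a plain ternary macro remains a
constant expression and can therefore initialize a struct member at file
scope, which a braced-group expression cannot. The names and the MTU
values below are illustrative assumptions, not the real definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Relaxed form: evaluates its arguments twice, so it is only meant for
 * constants, but it is usable outside of functions.
 */
#define MAX_RELAXED_SKETCH(a, b) ((a) > (b) ? (a) : (b))

/* Illustrative values only */
#define MTU_V4_SKETCH  1252
#define MTU_V6_SKETCH  1232
#define BUFSIZE_SKETCH MAX_RELAXED_SKETCH(MTU_V6_SKETCH, MTU_V4_SKETCH)

/* The macro result is an integer constant expression */
static_assert(BUFSIZE_SKETCH == 1252, "expected the larger of the two");

struct reg_sketch {
	const char *name;
	size_t size;
};

/* File-scope initializer: this is where a ({ ... }) based MAX() fails */
static const struct reg_sketch tx_reg_sketch = {
	.name = "tx_buf",
	.size = BUFSIZE_SKETCH,
};
```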
2025-08-06 19:18:42 +02:00
Willy Tarreau
b1f854bb2e BUILD: fcgi: fix the struct name of fcgi_flt_ctx
The struct was mistakenly spelled flt_fcgi_ctx() in fcgi_flt_stop()
when it was introduced in 2.1 with commit 78fbb9f991 ("MEDIUM:
fcgi-app: Add FCGI application and filter"), causing build issues
when trying to get the alignment of the object in pool_free() for
debugging purposes. No backport is needed as it's just used to convey
a pointer.
2025-08-06 16:27:05 +02:00
Alexander Stephan
ffbb3cc306 MINOR: sample: Add le2dec (little endian to decimal) sample fetch
This commit introduces a sample fetch, `le2dec`, to convert
little-endian binary input samples into their decimal representations.
The function converts the input into a string containing unsigned
integer numbers, with each number derived from a specified number of
input bytes. The numbers are separated using a user-defined separator.

This new sample is achieved by adding a parametrized sample_conv_2dec
function, unifying the logic for be2dec and le2dec converters.
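
The conversion itself can be sketched as below. This is a standalone
illustration and not the converter's code: the function name, its
signature and the handling of a shorter trailing chunk (converted as-is)
are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Converts <ilen> bytes from <in> into decimal numbers, each built from
 * <chunk> little-endian bytes (1 to 8) and separated by <sep>. Returns
 * the length written into <out>, or -1 on error or lack of room.
 */
static int le2dec_sketch(const unsigned char *in, size_t ilen, const char *sep,
                         size_t chunk, char *out, size_t olen)
{
	size_t pos = 0, used = 0;

	assert(in != NULL && sep != NULL && out != NULL);

	if (chunk == 0 || chunk > sizeof(uint64_t) || olen == 0)
		return -1;
	out[0] = '\0';

	while (pos < ilen) {
		size_t n = (ilen - pos < chunk) ? ilen - pos : chunk;
		uint64_t val = 0;
		size_t i;
		int ret;

		/* the least significant byte comes first */
		for (i = 0; i < n; i++)
			val |= (uint64_t)in[pos + i] << (8 * i);

		ret = snprintf(out + used, olen - used, "%s%llu",
		               pos ? sep : "", (unsigned long long)val);
		if (ret < 0 || (size_t)ret >= olen - used)
			return -1;

		used += (size_t)ret;
		pos += n;
	}
	return (int)used;
}
```

For instance the bytes 01 00 34 12 with a chunk size of 2 and "." as the
separator give "1.4660", since 0x1234 is 4660.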

Co-authored-by: Christian Norbert Menges <christian.norbert.menges@sap.com>
[wt: tracked as GH issue #2915]
Signed-off-by: Willy Tarreau <w@1wt.eu>
2025-08-05 13:47:53 +02:00
Aurelien DARRAGON
aeff2a3b2a BUG/MEDIUM: hlua_fcn: ensure systematic watcher cleanup for server list iterator
In 358166a ("BUG/MINOR: hlua_fcn: restore server pairs iterator pointer
consistency"), I wrongly assumed that because the iterator was a temporary
object, no specific cleanup was needed for the watcher.

In fact watcher_detach() is not only relevant for the watcher itself, but
especially for its parent list to remove the current watcher from it.

As iterators are temporary objects, failing to remove their watchers from
the server watcher list causes the server watcher list to be corrupted.

On a normal iteration sequence, the last watcher_next() receives NULL
as target so it successfully detaches the last watcher from the list.
However the corner case here is with interrupted iterators: users are
free to break away from the iteration loop when a specific condition is
met, for instance from the Lua script. When this happens,
hlua_listable_servers_pairs_iterator() doesn't get a chance to detach the
last iterator.

Also, Lua doesn't tell us that the loop was interrupted,
so to fix the issue we rely on the garbage collector to force a last
detach right before the object is freed. To achieve that, watcher_detach()
was slightly modified so that it becomes possible to call it without
knowing if the watcher is already detached or not: if watcher_detach() is
called on a detached watcher, the function does nothing. This way it saves
the caller from having to track the watcher state and makes the API a
little more convenient to use. This way we now systematically call
watcher_detach() for server iterators right before they are garbage
collected.
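
The "detach is a no-op when already detached" behavior can be shown with
a plain circular list. This is a single-threaded sketch with
hypothetical names; the real watcher relies on mt_list and is not
reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list node */
struct node_sketch {
	struct node_sketch *prev, *next;
};

/* A node pointing to itself is either an empty head or a detached node */
static void list_init_sketch(struct node_sketch *n)
{
	n->prev = n->next = n;
}

static int node_attached_sketch(const struct node_sketch *n)
{
	return n->next != n;
}

static void list_append_sketch(struct node_sketch *head, struct node_sketch *n)
{
	assert(!node_attached_sketch(n));
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Safe to call any number of times: the node is reset to the detached
 * state after removal, so a second call finds nothing to do.
 */
static void node_detach_sketch(struct node_sketch *n)
{
	if (!node_attached_sketch(n))
		return;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init_sketch(n);
}
```

A garbage-collection hook can then call the detach function
unconditionally without tracking whether the iteration ended normally.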

This was first reported in GH #3055. It can be observed when the server
list is browsed more than one time when it was already browsed from Lua
for a given proxy and the iteration was interrupted before the end. As the
watcher list is corrupted, the common symptom is watcher_attach() or
watcher_next() not ending due to the internal mt_list call looping
forever.

Thanks to GH users @sabretus and @sabretus for their precious help.

It should be backported everywhere 358166a was.
2025-08-05 13:06:46 +02:00
William Lallemand
66f28dbd3f BUG/MINOR: acme: possible integer underflow in acme_txt_record()
a2base64url() can return a negative value if olen is too short to
accept ilen. This is not supposed to happen since the sha256 should
always fit in a buffer. But this is confusing since a2base64()
returns a signed integer which is put in output->data which is unsigned.

Fix the issue by setting ret to 0 instead of -1 upon error, and return
an unsigned integer instead of a signed one.
This patch also checks the return value from the caller in order
to emit an error instead of setting trash.data which is already done
from the function.
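
The hazard can be reproduced in isolation. The two stand-in functions
below are hypothetical and only model the length accounting, not the
encoding: one reports errors as -1, the other as 0.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Signed convention: -1 when the output buffer is too short */
static int encode_len_signed_sketch(size_t ilen, size_t olen)
{
	assert(ilen <= INT_MAX);
	if (olen < ilen)
		return -1;
	return (int)ilen;
}

/* Unsigned convention: 0 when the output buffer is too short, so the
 * result may be stored in an unsigned length without wrapping.
 */
static size_t encode_len_unsigned_sketch(size_t ilen, size_t olen)
{
	if (olen < ilen)
		return 0;
	return ilen;
}
```

Storing the signed error in a size_t turns -1 into the largest possible
length, which is what the unsigned convention avoids.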
2025-08-05 12:12:50 +02:00
William Lallemand
8afd3e588d MINOR: acme: update the log for DNS-01
Update the log for DNS-01 by mentioning the challenge_ready command
over the CLI.
2025-08-01 18:08:43 +02:00
William Lallemand
9ee14ed2d9 MEDIUM: acme: allow to wait and restart the task for DNS-01
DNS-01 needs an external process which would register a TXT record on a
DNS provider, using a REST API or something else.

To achieve this, the process should read the dpapi sink and wait for
events. With the DNS-01 challenge, HAProxy will put the task to sleep
before asking the ACME server to achieve the challenge. The task then
needs to be woken up, using the command implemented by this patch.

This patch implements the "acme challenge_ready" command which should be
used by the agent once the challenge was configured in order to wake the
task up.

Example:
    echo "@1 acme challenge_ready foobar.pem.rsa domain kikyo" | socat /tmp/master.sock -
2025-08-01 18:07:12 +02:00
William Lallemand
3dde7626ba MINOR: acme: emit the DNS-01 challenge details on the dpapi sink
This commit adds a new message to the dpapi sink which is emitted during
the new authorization request.

One message is emitted per challenge to resolve. The certificate name as
well as the thumbprint of the account key are on the first line of the
message. The JSON response for one challenge is then dumped, and the
message ends with a \0.

The agent consuming these messages MUST NOT access the URLs, and SHOULD
only use the thumbprint, dns and token to configure a challenge.

Example:

    $ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock -  | cat -e
    <0>2025-08-01T16:23:14.797733+02:00 acme deploy foobar.pem.rsa thumbprint Gv7pmGKiv_cjo3aZDWkUPz5ZMxctmd-U30P2GeqpnCo$
    {$
       "status": "pending",$
       "identifier": {$
          "type": "dns",$
          "value": "foobar.com"$
       },$
       "challenges": [$
          {$
             "type": "dns-01",$
             "url": "https://0.0.0.0:14000/chalZ/1o7sxLnwcVCcmeriH1fbHJhRgn4UBIZ8YCbcrzfREZc",$
             "token": "tvAcRXpNjbgX964ScRVpVL2NXPid1_V8cFwDbRWH_4Q",$
             "status": "pending"$
          },$
          {$
             "type": "dns-account-01",$
             "url": "https://0.0.0.0:14000/chalZ/z2_WzibwTPvE2zzIiP3BF0zNy3fgpU_8Nj-V085equ0",$
             "token": "UedIMFsI-6Y9Nq3oXgHcG72vtBFWBTqZx-1snG_0iLs",$
             "status": "pending"$
          },$
          {$
             "type": "tls-alpn-01",$
             "url": "https://0.0.0.0:14000/chalZ/AHnQcRvZlFw6e7F6rrc7GofUMq7S8aIoeDileByYfEI",$
             "token": "QhT4ejBEu6ZLl6pI1HsOQ3jD9piu__N0Hr8PaWaIPyo",$
             "status": "pending"$
          },$
          {$
             "type": "http-01",$
             "url": "https://0.0.0.0:14000/chalZ/Q_qTTPDW43-hsPW3C60NHpGDm_-5ZtZaRfOYDsK3kY8",$
             "token": "g5Y1WID1v-hZeuqhIa6pvdDyae7Q7mVdxG9CfRV2-t4",$
             "status": "pending"$
          }$
       ],$
       "expires": "2025-08-01T15:23:14Z"$
    }$
    ^@
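
An agent could extract the certificate name and the thumbprint from the
first line of such a message as in the sketch below. The helper name and
the 256-byte field size are assumptions made for the illustration. Note
that the trailing "$" and the "^@" in the example are produced by
"cat -e" and are not part of the message itself.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_FIELD_LEN 256 /* arbitrary field size chosen for the sketch */

/* Hypothetical helper (not part of HAProxy): parses the first line of an
 * "acme deploy" message read from the dpapi sink. Returns 1 when both
 * fields were found, 0 otherwise.
 */
int sketch_parse_deploy_line(const char *line,
                             char cert[SKETCH_FIELD_LEN],
                             char thumbprint[SKETCH_FIELD_LEN])
{
    assert(line != NULL);

    /* "%*s" skips the "<0>timestamp" prefix, which holds no whitespace;
     * "%255s" leaves room for the trailing NUL in each field.
     */
    return sscanf(line, "%*s acme deploy %255s thumbprint %255s",
                  cert, thumbprint) == 2;
}
```

The JSON document that follows the first line would still need a real
JSON parser to retrieve the "dns-01" token.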
2025-08-01 16:48:22 +02:00
William Lallemand
365a69648c MINOR: acme: emit a log for DNS-01 challenge response
This commit emits a log which outputs the TXT entry to create in case of
DNS-01. This is useful in cases where you want to update your TXT entry
manually.

Example:

    acme: foobar.pem.rsa: DNS-01 requires to set the "acme-challenge.example.com" TXT record to "7L050ytWm6ityJqolX-PzBPR0LndHV8bkZx3Zsb-FMg"
2025-08-01 16:12:27 +02:00
William Lallemand
09275fd549 BUILD: acme: avoid declaring TRACE_SOURCE in acme-t.h
Files ending with '-t.h' are supposed to be used for structure
definitions and could be included in the same file to check API
definitions.

This patch removes TRACE_SOURCE from acme-t.h to avoid conflicts with
other TRACE_SOURCE definitions.
2025-07-31 16:03:28 +02:00
Amaury Denoyelle
a6e67e7b41 BUG/MEDIUM: mux-quic: ensure Early-data header is set
QUIC MUX may be initialized prior to handshake completion, when 0-RTT is
used. In this case, connection is flagged with CO_FL_EARLY_SSL_HS, which
is notably used by wait-for-hs http rule.

Early data may be subject to replay attacks. For this reason, haproxy
adds the header 'Early-data: 1' to all requests handled as TLS early
data. Thus the server can reject it if it is deemed unsafe. This header
injection is implemented by http-ana. However, it was not functional
with QUIC due to missing CO_FL_EARLY_DATA connection flag.

Fix this by ensuring that QUIC MUX sets CO_FL_EARLY_DATA when needed.
This is performed during qcc_recv() for STREAM frame reception. It is
only set if QC_CF_WAIT_HS is set, meaning that the handshake is not yet
completed. After this, the request is considered safe and Early-data
header is not necessary anymore.

This should fix github issue #3054.

This must be backported up to 3.2 at least. If possible, it should be
backported to all stable releases as well. On these versions, the
current patch relies on the following refactoring commit :
  commit 0a53a008d032b69377869c8caaec38f81bdd5bd6
  MINOR: mux-quic: refactor wait-for-handshake support
2025-07-31 15:25:59 +02:00
Amaury Denoyelle
697f7d1142 MINOR: muxes: refactor private connection detach
Following the latest adjustment on session_add_conn() /
session_check_idle_conn(), detach muxes callbacks were rewritten for
private connection handling.

Nothing really fancy here: some more explicit comments and the removal
of duplicate checks on idle conn status for muxes with true
multiplexing support.
2025-07-30 16:14:00 +02:00
Amaury Denoyelle
2ecc5290f2 MINOR: session: streamline session_check_idle_conn() usage
session_check_idle_conn() is called by muxes when a connection becomes
idle. It ensures that the session idle limit is not yet reached. Else,
the connection is removed from the session and it can be freed.

Prior to this patch, session_check_idle_conn() was compatible with a
NULL session argument. In this case, it would return true, considering
that no limit was reached and connection not removed.

However, this renders the function error-prone and subject to future
bugs. This patch streamlines it by ensuring it is never called with a
NULL argument. Thus it can now only return true if connection is kept
in the session or false if it was removed, as first intended.
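
The resulting contract can be summarized with the sketch below. All types
and functions are simplified stand-ins invented for the illustration,
with assert() playing the role of BUG_ON(). The caller side only records
that it has to close or release the connection itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a session and a connection. */
struct sketch_session {
    unsigned int idle_conns;     /* idle connections currently kept */
    unsigned int max_idle_conns; /* limit, cf. max-session-srv-conns */
};

struct sketch_conn {
    struct sketch_session *owner; /* NULL once removed from the session */
    bool idle;
    bool released;
};

/* Returns true if the connection is kept idle in the session, false if it
 * was removed from it. The connection itself is never freed here.
 */
bool sketch_check_idle_conn(struct sketch_session *sess,
                            struct sketch_conn *conn)
{
    assert(sess != NULL); /* a NULL session is now a caller bug */
    assert(!conn->idle);  /* one call per active to idle transition */

    if (sess->idle_conns >= sess->max_idle_conns) {
        conn->owner = NULL; /* removed, but still usable by the caller */
        return false;
    }
    conn->idle = true;
    sess->idle_conns++;
    return true;
}

/* Caller side: it decides how to close the connection on failure. */
void sketch_detach(struct sketch_session *sess, struct sketch_conn *conn)
{
    if (!sketch_check_idle_conn(sess, conn))
        conn->released = true; /* stand-in for shutdown and/or release */
}
```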
2025-07-30 16:13:30 +02:00
Amaury Denoyelle
dd9645d6b9 MINOR: session: do not release conn in session_check_idle_conn()
session_check_idle_conn() is called to flag a connection already
inserted in a session list as idle. If the session limit on the number
of idle connections (max-session-srv-conns) is exceeded, the connection
is removed from the session list.

In addition to the connection removal, session_check_idle_conn()
directly calls MUX destroy callback on the connection. This means the
connection is freed by the function itself and should not be used by the
caller anymore.

This is not practical when an alternative connection closure method
should be used, such as a graceful shutdown with QUIC. As such, remove
MUX destroy invocation: it is now the responsibility of the caller to
either close or release immediately the connection.
2025-07-30 11:43:41 +02:00
Amaury Denoyelle
57e9425dbc MINOR: session: strengthen idle conn limit check
Add a BUG_ON() on session_check_idle_conn() to ensure the connection is
not already flagged as CO_FL_SESS_IDLE.

This checks that this function is only called one time per connection
transition from active to idle. This is necessary to ensure that session
idle counter is only incremented one time per connection.
2025-07-30 11:40:16 +02:00
Amaury Denoyelle
ec1ab8d171 MINOR: session: remove redundant target argument from session_add_conn()
session_add_conn() uses three arguments: connection and session
instances, plus a void pointer labelled as target. Typically, it
represents the server, but can also be a backend instance (for example
on dispatch).

In fact, this argument is redundant as <target> is already a member of
the connection. This commit simplifies session_add_conn() by removing
it. A BUG_ON() on target is extended to ensure it is never NULL.
2025-07-30 11:39:57 +02:00
Amaury Denoyelle
668c2cfb09 MINOR: session: strengthen connection attach to session
This commit is the first one of a series to refactor insertion of backend
private connection into the session list.

session_add_conn() is used to attach a connection into a session list.
Previously, this function would report an error if the connection
specified was already attached to another session. However, this case
currently never happens and thus can be considered as buggy.

Remove this check and replace it with a BUG_ON(). This allows to ensure
that session insertion remains consistent. The same check is also
transformed in session_check_idle_conn().
2025-07-30 11:39:26 +02:00
Amaury Denoyelle
cfe9bec1ea MINOR: mux-quic: release conn after shutdown on BE reuse failure
On stream detach on backend side, connection is inserted in the proper
server/session list to be able to reuse it later. If insertion fails and
the connection is idle, the connection can be removed immediately.

If this occurs on a QUIC connection, QUIC MUX implements graceful
shutdown to ensure the server is notified of the closure. However, the
connection instance is not freed. Change this to ensure that both
shutdown and release are performed.
2025-07-30 10:04:19 +02:00
Aurelien DARRAGON
14966c856b MINOR: clock: make global_now_ns a pointer as well
Similar to previous commit but for global_now_ns
2025-07-29 18:04:15 +02:00
Aurelien DARRAGON
4a20b3835a MINOR: clock: make global_now_ms a pointer
This is preparation work for shared counters between co-processes. As
co-processes will need to share a common date, global_now_ms will be used
for that as it will point to the shm when sharing is enabled.

Thus in this patch we turn global_now_ms into a pointer (and adjust the
places where it is written to and read from; fortunately atomic operations
through pointer are already used so the change is trivial)

For now global_now_ms points to process-local _global_now_ms which is a
fallback for when sharing through the shm is not enabled.
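
The idea can be sketched as follows with C11 atomics. The names are
hypothetical (HAProxy uses its own atomic macros and calls the fallback
_global_now_ms), and the shared area is only assumed to live in a shm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Process-local storage, used as long as no shared area is installed. */
static _Atomic uint32_t sketch_local_now_ms;

/* Every reader and writer goes through this pointer. */
static _Atomic uint32_t *sketch_now_ms = &sketch_local_now_ms;

/* Hypothetical helper: makes the date point to a shared area instead. */
void sketch_clock_use_shared(_Atomic uint32_t *shared)
{
    assert(shared != NULL);
    sketch_now_ms = shared;
}

uint32_t sketch_clock_read(void)
{
    return atomic_load(sketch_now_ms);
}

void sketch_clock_write(uint32_t now_ms)
{
    atomic_store(sketch_now_ms, now_ms);
}
```

Since accesses were already atomic operations on an address, switching
from a variable to a pointer only changes where that address comes from.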
2025-07-29 18:04:14 +02:00
Aurelien DARRAGON
713ebd2750 CLEANUP: counters: rename counters_be_shared_init to counters_be_shared_prepare
75e480d10 ("MEDIUM: stats: avoid 1 indirection by storing the shared
stats directly in counters struct") took care of renaming
counters_fe_shared_init() but we forgot counters_be_shared_init().

Let's fix that for consistency
2025-07-29 18:00:13 +02:00
Aurelien DARRAGON
2ffe515d97 BUG/MINOR: hlua: take default-path into account with lua-load-per-thread
As discussed in GH #3051, default-path is not taken into account when
loading files using lua-load-per-thread. In fact, the initial
hlua_load_state() (performed on first thread which parses the config)
is successful, but other threads run hlua_load_state() later based
on config hints which were saved by the first thread, and those config
hints only contain the file path provided on the lua-load-per-thread
config line, not the absolute one. Indeed, `default-path` directive
changes the current working directory only for the thread parsing the
configuration.

To fix the issue, when storing config hints under hlua_load_per_thread()
we now make sure to save the absolute file path for `lua-load-per-thread`
argument.

Thanks to GH user @zhanhb for having reported the issue

It may be backported to all stable versions.
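
One way to obtain an absolute path at parsing time is POSIX realpath(),
as in the sketch below. This is an assumption made for the illustration:
the commit message does not say which function is used, and
sketch_abs_path() is a hypothetical helper.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* exposes realpath() in strict ISO C modes */
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed absolute version of <path>,
 * resolved against the current working directory at the time of the
 * call, or NULL on error. The caller must free() the result.
 */
char *sketch_abs_path(const char *path)
{
    assert(path != NULL);
    return realpath(path, NULL);
}
```

A path saved this way no longer depends on the working directory of the
thread which reads it later.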
2025-07-29 17:58:28 +02:00
William Lallemand
83a335f925 MINOR: acme: implement traces
Implement traces for the ACME protocol.

 -dt acme:data:complete will dump every input and output buffers,
 including decoded buffers before being converted to JWS.
 It will also dump certificates in the traces.

 -dt acme:user:complete will only dump the state of the task handler.
2025-07-29 17:25:10 +02:00
Willy Tarreau
cedb4f0461 [RELEASE] Released version 3.3-dev5
Released version 3.3-dev5 with the following main changes :
    - BUG/MEDIUM: queue/stats: also use stream_set_srv_target() for pendconns
    - DOC: list missing global QUIC settings
2025-07-28 11:26:22 +02:00
Amaury Denoyelle
7fa812a1ac DOC: list missing global QUIC settings
Complete list of global keywords with missing QUIC entries.

This could be backported to stable versions. This requires to take into
account the version of introduction for each keyword.
* limited-quic, introduced in 2.8
* no-quic, introduced in 2.8
* tune.quic.cc.cubic.min-losses, introduced in 3.1
2025-07-28 11:22:35 +02:00
Aurelien DARRAGON
021a0681be BUG/MEDIUM: queue/stats: also use stream_set_srv_target() for pendconns
Following c24de07 ("OPTIM: stats: store fast sharded counters pointers
at session and stream level") some crashes were observed in
connect_server():

  #0  0x00000000007ba39c in connect_server (s=0x65117b0) at src/backend.c:2101
  2101                            _HA_ATOMIC_INC(&s->sv_tgcounters->connect);
  Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 nss-softokn-freebl-3.67.0-3.el7_9.x86_64 pcre-8.32-17.el7.x86_64
  (gdb) bt
  #0  0x00000000007ba39c in connect_server (s=0x65117b0) at src/backend.c:2101
  #1  0x00000000007baff8 in back_try_conn_req (s=0x65117b0) at src/backend.c:2378
  #2  0x00000000006c0e9f in process_stream (t=0x650f180, context=0x65117b0, state=8196) at src/stream.c:2366
  #3  0x0000000000bd3e51 in run_tasks_from_lists (budgets=0x7ffd592752e0) at src/task.c:655
  #4  0x0000000000bd49ef in process_runnable_tasks () at src/task.c:889
  #5  0x0000000000851169 in run_poll_loop () at src/haproxy.c:2834
  #6  0x0000000000851865 in run_thread_poll_loop (data=0x1a03580 <ha_thread_info>) at src/haproxy.c:3050
  #7  0x0000000000852a53 in main (argc=7, argv=0x7ffd592755f8) at src/haproxy.c:3637

Here the crash occurs during the atomic inc of a sv_tgcounters metric from
the stream pointer, which tells us the pointer is likely garbage.

In fact, we assign s->sv_tgcounters each time the stream target is set to
a valid server. For that we use stream_set_srv_target() helper which does
the assignment for us. By reviewing the code, it turns out we forgot to
call stream_set_srv_target() in pendconn_dequeue(), where the stream
target is set to the server that picked the pendconn.

Let's fix the bug by using stream_set_srv_target() there.

No backport needed unless c24de07 is.
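
The reason for always going through a helper can be seen in the sketch
below, where the target and the cached counters pointer are updated
together. The structures and the function are simplified stand-ins, not
the real stream_set_srv_target().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct sketch_counters {
    unsigned long connect;
};

struct sketch_server {
    struct sketch_counters tgcounters;
};

struct sketch_stream {
    struct sketch_server *target;
    struct sketch_counters *sv_tgcounters; /* cached "fast" pointer */
};

/* Setting the target and the cached pointer in a single place ensures
 * that they can never get out of sync.
 */
static inline void sketch_set_srv_target(struct sketch_stream *s,
                                         struct sketch_server *srv)
{
    assert(s != NULL && srv != NULL);
    s->target = srv;
    s->sv_tgcounters = &srv->tgcounters;
}
```

A code path assigning only the target would leave sv_tgcounters pointing
to whatever it held before, which matches the crash seen in the
backtrace above.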
2025-07-28 08:54:38 +02:00
Willy Tarreau
5d4ff9f02e [RELEASE] Released version 3.3-dev4
Released version 3.3-dev4 with the following main changes :
    - CLEANUP: server: do not check for duplicates anymore in findserver()
    - REORG: server: move findserver() from proxy.c to server.c
    - MINOR: server: use the tree to look up the server name in findserver()
    - CLEANUP: server: rename server_find_by_name() to server_find()
    - CLEANUP: server: rename findserver() to server_find_by_name()
    - CLEANUP: server: use server_find_by_name() where relevant
    - CLEANUP: cfgparse: lookup proxy ID using existing functions
    - CLEANUP: stream: lookup server ID using standard functions
    - CLEANUP: server: simplify server_find_by_id()
    - CLEANUP: server: add server_find_by_addr()
    - CLEANUP: stream: use server_find_by_addr() in sticking_rule_find_target()
    - CLEANUP: server: be sure never to compare src against a non-existing defsrv
    - MEDIUM: proxy: take the defsrv out of the struct proxy
    - MINOR: proxy: add checks for defsrv's validity
    - MEDIUM: proxy: no longer allocate the default-server entry by default
    - MEDIUM: proxy: register a post-section cleanup function
    - MINOR: debug: report haproxy and operating system info in panic dumps
    - BUG/MEDIUM: h3: do not overwrite interim with final response
    - BUG/MINOR: h3: properly realloc buffer after interim response encoding
    - BUG/MINOR: h3: ensure that invalid status code are not encoded (FE side)
    - MINOR: qmux: change API for snd_buf FIN transmission
    - BUG/MEDIUM: h3: handle interim response properly on FE side
    - BUG/MINOR: h3: properly handle interim response on BE side
    - BUG/MINOR: quic: Wrong source address use on FreeBSD
    - MINOR: h3: remove unused outbuf in h3_resp_headers_send()
    - BUG/MINOR: applet: Don't trigger BUG_ON if the tid is not on appctx init
    - DEV: gdb: add a memprofile decoder to the debug tools
    - MINOR: quic: Get rid of qc_is_listener()
    - DOC: connection: explain the rules for idle/safe/avail connections
    - BUG/MEDIUM: quic-be: CC buffer released from wrong pool
    - BUG/MINOR: halog: exit with error when some output filters are set simultaneosly
    - MINOR: cpu-topo: split cpu_dump_topology() to show its summary in show dev
    - MINOR: cpu-topo: write thread-cpu bindings into trash buffer
    - MINOR: debug: align output style of debug_parse_cli_show_dev with cpu_dump_topology
    - MINOR: debug: add thread-cpu bindings info in 'show dev' output
    - MINOR: quic: Remove pool_head_quic_be_cc_buf pool
    - BUILD: debug: add missed guard USE_CPU_AFFINITY to show cpu bindings
    - BUG/MEDIUM: threads: Disable the workaround to load libgcc_s on macOS
    - BUG/MINOR: logs: fix log-steps extra log origins selection
    - BUG/MINOR: hq-interop: fix FIN transmission
    - MINOR: ssl: Add ciphers in ssl traces
    - MINOR: ssl: Add curve id to curve name table and mapping functions
    - MINOR: ssl: Add curves in ssl traces
    - MINOR: ssl: Dump ciphers and sigalgs details in trace with 'advanced' verbosity
    - MINOR: ssl: Remove ClientHello specific traces if !HAVE_SSL_CLIENT_HELLO_CB
    - MINOR: h3: use smallbuf for request header emission
    - MINOR: h3: add traces to h3_req_headers_send()
    - BUG/MINOR: h3: fix uninitialized value in h3_req_headers_send()
    - MINOR: log: explicitly ignore "log-steps" on backends
    - BUG/MEDIUM: acme: use POST-as-GET instead of GET for resources
    - BUG/MINOR mux-quic: apply correctly timeout on output pending data
    - BUG/MINOR: mux-quic: ensure close-spread-time is properly applied
    - MINOR: mux-quic: refactor timeout code
    - MINOR: mux-quic: correctly implement backend timeout
    - MINOR: mux-quic: disable glitch on backend side
    - MINOR: mux-quic: store session in QCS instance
    - MEDIUM: mux-quic: implement be connection reuse
    - MINOR: mux-quic: do not reuse connection if app already shut
    - MEDIUM: mux-quic: support backend private connection
    - MINOR: acme: remove acme_req_auth() and use acme_post_as_get() instead
    - BUG/MINOR: acme: allow "processing" in challenge requests
    - CLEANUP: acme: fix wrong spelling of "resources"
    - CLEANUP: ssl: Use only NIDs in curve name to id table
    - MINOR: acme: add ACME to the haproxy -vv feature list
    - BUG/MINOR: hlua: Skip headers when a receive is performed on an HTTP applet
    - BUG/MEDIUM: applet: State inbuf is no longer full if input data are skipped
    - BUG/MEDIUM: stconn: Fix conditions to know an applet can get data from stream
    - BUG/MINOR: applet: Fix applet_getword() to not return one extra byte
    - BUG/MEDIUM: Remove sync sends from streams to applets
    - MINOR: applet: Add HTX versions for applet_input_data() and applet_output_room()
    - MINOR: applet: Improve applet API to take care of inbuf/outbuf alloc failures
    - MEDIUM: hlua: Update the tcp applet to use its own buffers
    - MINOR: hlua: Fill the request array on the first HTTP applet run
    - MINOR: hlua: Use the buffer instead of the HTTP message to get HTTP headers
    - MEDIUM: hlua: Update the http applet to use its own buffers
    - BUG/MEDIUM: hlua: Report to SC when data were consumed on a lua socket
    - BUG/MEDIUM: hlua: Report to SC when output data are blocked on a lua socket
    - MEDIUM: hlua: Update the socket applet to use its own buffers
    - BUG/MEDIUM: dns: Reset reconnect tempo when connection is finally established
    - MEDIUM: dns: Update the dns_session applet to use its own buffers
    - CLEANUP: http-client: Remove useless indentation when sending request body
    - MINOR: http-client: Try to send request body with headers if possible
    - MINOR: http-client: Trigger an error if first response block isn't a start-line
    - BUG/MINOR: httpclient-cli: Don't try to dump raw headers in HTX mode
    - MINOR: httpclient-cli: Reset httpclient HTX buffer instead of removing blocks
    - MEDIUM: http-client: Update the http-client applet to use its own buffers
    - MEDIUM: log: Update the log applet to use its own buffers
    - MEDIUM: sink: Update the sink applets to use their own buffers
    - MEDIUM: peers: Update the peer applet to use its own buffers
    - MEDIUM: promex: Update the promex applet to use their own buffers
    - MINOR: applet: Add support for flags on applets with a flag about the new API
    - MEDIUM: applet: Emit a warning when a legacy applet is spawned
    - BUG/MEDIUM: logs: fix sess_build_logline_orig() recursion with options
    - MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct
    - CLEANUP: compiler: prefer char * over void * for pointer arithmetic
    - CLEANUP: include: replace hand-rolled offsetof to avoid UB
    - CLEANUP: peers: remove unused peer_session_target()
    - OPTIM: stats: store fast sharded counters pointers at session and stream level
2025-07-26 09:55:26 +02:00
Aurelien DARRAGON
c24de077bd OPTIM: stats: store fast sharded counters pointers at session and stream level
Following commit 75e480d10 ("MEDIUM: stats: avoid 1 indirection by storing
the shared stats directly in counters struct"), in order to minimize the
impact of the recent sharded counters work, we try to push things a bit
further in this patch by storing and using "fast" pointers at the session
and stream levels when available to avoid costly indirections and
systematic "tgid" resolution (which can not be cached by the CPU due to
its THREAD-local nature).

Indeed, we know that a session/stream is tied to a given CPU. Thanks to
this, we know that the tgid for a given session/stream will never change.

Given that, we are able to store sharded frontend and listener counters
pointer at the session level (namely sess->fe_tgcounters and
sess->li_tgcounters), and once the backend and the server are selected,
we are also able to store backend and server sharded counters
pointer at the stream level (namely s->be_tgcounters and s->sv_tgcounters)

Everywhere we rely on these counters and the stream or session context is
available, we use the fast pointers instead of the indirect pointers
path to make the pointer resolution a bit faster.

This optimization proved to bring a few percent back, and together with
the previous 75e480d10 commit we have now fixed the performance regression
(we are back on par with 3.2 stats performance)
2025-07-25 18:24:23 +02:00
Aurelien DARRAGON
cf8ba60c88 CLEANUP: peers: remove unused peer_session_target()
Since commit 7293eb68 ("MEDIUM: peers: use server as stream target") peer
session target always points to the server in order to benefit from
existing server transport options.

Thanks to that, it is no longer necessary to have peer_session_target()
helper function, because all it does is return the pointer to the
server object. Let's get rid of that
2025-07-25 18:24:17 +02:00
Ben Kallus
1e48ec7f6c CLEANUP: include: replace hand-rolled offsetof to avoid UB
The C standard specifies that it's undefined behavior to dereference
NULL (even if you use & right after). The hand-rolled offsetof idiom
&(((s*)NULL)->f) is thus technically undefined. This clutters the
output of UBSan and is simple to fix: just use the real offsetof when
it's available.

Note that there's no clear statement about this point in the spec,
only several points which together converge to this:

- From N3220, 6.5.3.4:
  A postfix expression followed by the -> operator and an identifier
  designates a member of a structure or union object. The value is
  that of the named member of the object to which the first expression
  points, and is an lvalue.

- From N3220, 6.3.2.1:
  An lvalue is an expression (with an object type other than void) that
  potentially designates an object; if an lvalue does not designate an
  object when it is evaluated, the behavior is undefined.

- From N3220, 6.5.4.4 p3:
  The unary & operator yields the address of its operand. If the
  operand has type "type", the result has type "pointer to type". If
  the operand is the result of a unary * operator, neither that operator
  nor the & operator is evaluated and the result is as if both were
  omitted, except that the constraints on the operators still apply and
  the result is not an lvalue. Similarly, if the operand is the result
  of a [] operator, neither the & operator nor the unary * that is
  implied by the [] is evaluated and the result is as if the & operator
  were removed and the [] operator were changed to a + operator.

=> In short, this is saying that C guarantees these identities:
    1. &(*p) is equivalent to p
    2. &(p[n]) is equivalent to p + n

As a consequence, &(*p) doesn't result in the evaluation of *p, only
the evaluation of p (and similar for []). There is no corresponding
special carve-out for ->.

See also: https://pvs-studio.com/en/blog/posts/cpp/0306/

After this patch, HAProxy can run without crashing after building w/
clang-19 -fsanitize=undefined -fno-sanitize=function,alignment
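
As an illustration, a container_of-like macro can be built on the
standard offsetof from <stddef.h>, as sketched below. The macro, the
structure and the function are made up for the example and are not
HAProxy's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_item {
    char tag;
    int value;
    double weight;
};

/* Hand-rolled idiom, shown for reference only since it formally
 * dereferences a NULL pointer:
 *   #define SKETCH_OFFSETOF(type, field) ((size_t)&(((type *)NULL)->field))
 */

/* Returns a pointer to the structure of type <type> which contains the
 * member <field> located at address <ptr>, using the standard offsetof.
 */
#define SKETCH_CONTAINER_OF(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

struct sketch_item *sketch_item_from_value(int *value_ptr)
{
    assert(value_ptr != NULL);
    return SKETCH_CONTAINER_OF(value_ptr, struct sketch_item, value);
}
```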
2025-07-25 17:54:32 +02:00
Ben Kallus
d3b46cca7b CLEANUP: compiler: prefer char * over void * for pointer arithmetic
This patch changes two instances of pointer arithmetic on void *
to use char * instead, to avoid UB. This is essentially to please
UB analyzers, though.
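
For reference, the pattern looks like the sketch below. Arithmetic on
void * is a compiler extension rather than ISO C, whereas the same
computation through char * is well defined as long as the result stays
within the same object. sketch_advance() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Returns <base> advanced by <bytes> bytes. Writing "base + bytes"
 * directly on the void * would rely on a compiler extension.
 */
void *sketch_advance(void *base, size_t bytes)
{
    assert(base != NULL);
    return (char *)base + bytes;
}
```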
2025-07-25 17:54:32 +02:00
Aurelien DARRAGON
75e480d107 MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct
Between 3.2 and 3.3-dev we observed a noticeable performance regression
due to stats handling. After bisecting, Willy found out that recent
work to split stats computing across multiple thread groups (stats
sharding) was responsible for that performance regression. We're looking
at roughly 20% performance loss.

More precisely, it is the added indirections, multiplied by the number
of statistics that are updated for each request, which in the end causes
a significant amount of time being spent resolving pointers.

We noticed that the fe_counters_shared and be_counters_shared structures
which are currently allocated in dedicated memory since a0dcab5c
("MAJOR: counters: add shared counters base infrastructure")
are no longer huge since 16eb0fab31 ("MAJOR: counters: dispatch counters
over thread groups") because they now essentially hold flags plus the
per-thread group id pointer mapping, not the counters themselves.

As such we decided to try merging fe_counters_shared and
be_counters_shared in their parent structures. The cost is slight memory
overhead for the parent structure, but it allows to get rid of one
pointer indirection. This patch alone yields visible performance gains
and almost restores 3.2 stats performance.

counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and
now returns either failure or success instead of a pointer because we
don't need to retrieve a shared pointer anymore, the function takes care
of initializing existing pointer.
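
The layout change can be pictured with the sketch below. All names, the
number of thread groups and the prepare function are simplified
assumptions and not the real definitions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_TGROUPS 4 /* arbitrary value chosen for the sketch */

struct sketch_tg_counters {
    unsigned long long cum_req;
};

/* Small shared part: flags plus the per thread group pointer mapping. */
struct sketch_shared {
    unsigned int flags;
    struct sketch_tg_counters *tg[SKETCH_MAX_TGROUPS];
};

/* Before: parent -> shared -> tg[] -> counter (one extra indirection). */
struct sketch_counters_before {
    struct sketch_shared *shared;
};

/* After: the shared part is embedded, so parent -> tg[] -> counter. */
struct sketch_counters_after {
    struct sketch_shared shared;
};

/* Initializes the embedded part in place and reports success (1) or
 * failure (0) instead of returning a pointer.
 */
int sketch_shared_prepare(struct sketch_shared *sh,
                          struct sketch_tg_counters *storage, size_t nb)
{
    size_t i;

    assert(sh != NULL);
    if (!storage || nb > SKETCH_MAX_TGROUPS)
        return 0;

    sh->flags = 0;
    for (i = 0; i < SKETCH_MAX_TGROUPS; i++)
        sh->tg[i] = (i < nb) ? &storage[i] : NULL;
    return 1;
}
```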
2025-07-25 16:46:10 +02:00
Aurelien DARRAGON
31adfb6c15 BUG/MEDIUM: logs: fix sess_build_logline_orig() recursion with options
Since ccc43412 ("OPTIM: log: use thread local lf_buildctx to stop pushing
it on the stack"), recursively calling sess_build_logline_orig(), which
may for instance happen when leveraging %ID (or unique-id fetch) for the
first time, would lead to undefined behavior because the parent
sess_build_logline_orig() build context was shared between recursive calls
(only one build ctx per thread to avoid pushing it on the stack for each
call)

In short, the parent build ctx would be altered by the recursive calls,
which is obviously not expected and could result in log formatting errors.

To fix the issue but still avoid polluting the stack with large lf_buildctx
struct, let's move the static 256 bytes build buffer out of the buildctx
so that the buildctx is now stored in the stack again (each function
invocation has its own dedicated build ctx). On the other hand, it's
acceptable to have a single 256-byte build buffer per thread because the
build buffer is not involved in recursive calls (unlike the build ctx)

Thanks to Willy and Vincent Gramer for spotting the bug and providing
useful repro.

It should be backported in 3.0 with ccc43412.
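
The resulting split can be sketched as follows: a small context on the
stack for each call, and a single thread-local scratch buffer. The
names, the content of the context and the way recursion is triggered are
made up for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Small per-call context, cheap enough to live on the stack. */
struct sketch_buildctx {
    int options;
};

/* Single scratch buffer per thread. This is acceptable here because it
 * is never in use across the nested call.
 */
static _Thread_local char sketch_build_buf[256];

int sketch_build_logline(int options, int depth)
{
    struct sketch_buildctx ctx = { .options = options }; /* private copy */

    assert(depth >= 0);
    if (depth > 0)
        sketch_build_logline(options + 1, depth - 1); /* other options */

    /* ctx is unchanged here, whatever the nested calls did */
    snprintf(sketch_build_buf, sizeof(sketch_build_buf), "%d", ctx.options);
    return ctx.options;
}
```

With a context shared per thread instead, the nested calls would have
overwritten the options of their parent before it used them.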
2025-07-25 16:46:03 +02:00
Christopher Faulet
b8d5307bd9 MEDIUM: applet: Emit a warning when a legacy applet is spawned
To motivate developers to support the new applets API, a warning is now
emitted when a legacy applet is spawned. To not flood users, this warning is
only emitted once per legacy applet. To do so, the applet flag
APPLET_FL_WARNED was added. It is set when the warning is emitted.

Note that test and set on this flag are not performed via atomic operations.
So it is possible to have more than one warning for a given applet if it is
spawned at the same time on several threads. At worst, there is one warning
per thread.
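
A warn-once check of this kind can be sketched as below. The flag names
mirror APPLET_FL_NEW_API and APPLET_FL_WARNED, but the values, the
structure and the function are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Flag names mirror the applet flags; the values are arbitrary. */
#define SKETCH_FL_NEW_API 0x1u
#define SKETCH_FL_WARNED  0x2u

struct sketch_applet {
    const char *name;
    unsigned int flags;
};

/* Returns 1 if a warning was emitted for this applet, 0 otherwise. */
int sketch_warn_if_legacy(struct sketch_applet *app)
{
    assert(app != NULL);
    if (app->flags & (SKETCH_FL_NEW_API | SKETCH_FL_WARNED))
        return 0;

    /* Plain test and set, not atomic: two threads may both pass the
     * test above and each emit the warning once.
     */
    app->flags |= SKETCH_FL_WARNED;
    fprintf(stderr, "applet '%s' uses the legacy API\n", app->name);
    return 1;
}
```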
2025-07-25 15:53:33 +02:00
Christopher Faulet
337768656b MINOR: applet: Add support for flags on applets with a flag about the new API
A new field was added in the applet structure to be able to set flags on the
applets. The first one is related to the new API. APPLET_FL_NEW_API is set
for applets based on the new API. It was set on all HAProxy's applets.
2025-07-25 15:44:02 +02:00
Christopher Faulet
2e5e6cdf23 MEDIUM: promex: Update the promex applet to use their own buffers
Thanks to this patch, the promex applet is now using its own buffers.
.rcv_buf and .snd_buf callback functions are now defined to use the default
HTX functions. Parts to receive and send data have also been updated to use
the applet API and to remove any dependencies on the stream-connectors and
the channels.
2025-07-24 12:13:42 +02:00
Christopher Faulet
a2cb0033bd MEDIUM: peers: Update the peer applet to use its own buffers
Thanks to this patch, the peer applet is now using its own buffers. .rcv_buf
and .snd_buf callback functions are now defined to use the default raw
functions. The applet API is now used and any dependencies on the
stream-connectors and the channels were removed.
2025-07-24 12:13:42 +02:00
Christopher Faulet
576361c23e MEDIUM: sink: Update the sink applets to use their own buffers
Thanks to this patch, the sink applets are now using their own buffers.
.rcv_buf and .snd_buf callback functions are now defined to use the default
raw functions. The applet API is now used and any dependencies on the
stream-connectors and the channels were removed.
2025-07-24 12:13:42 +02:00
Christopher Faulet
5da704b55f MEDIUM: log: Update the log applet to use its own buffers
Thanks to this patch, the log applet is now using its own buffers. .rcv_buf
and .snd_buf callback functions are now defined to use the default raw
functions. The applet API is now used and any dependencies on the
stream-connectors and the channels were removed.
2025-07-24 12:13:42 +02:00
Christopher Faulet
6a2b354dea MEDIUM: http-client: Update the http-client applet to use its own buffers
Thanks to this patch, the http-client applet is now using its own buffers.
.rcv_buf and .snd_buf callback functions are now defined to use the default
HTX functions. Parts to receive and send data have also been updated to use
the applet API and to remove any dependencies on the stream-connectors and
the channels.
2025-07-24 12:13:42 +02:00
Christopher Faulet
d05ff904bf MINOR: httpclient-cli: Reset httpclient HTX buffer instead of removing blocks
In the CLI I/O handler interacting with the HTTP client, in HTX mode, after
a dump of the HTX message, data must be removed. Instead of removing all
blocks one by one, we can call htx_reset() because all the message must be
flushed.
2025-07-24 12:13:42 +02:00
Christopher Faulet
1741bc4bf0 BUG/MINOR: httpclient-cli: Don't try to dump raw headers in HTX mode
In the CLI I/O handler interacting with the HTTP client, we must not try to
push raw headers in HTX mode, because there is no raw data in this
mode. This prevents the HTX dump at the end of the I/O handler.

It is a 3.3-specific issue. No backport needed.
2025-07-24 12:13:42 +02:00
Christopher Faulet
88aa7a780c MINOR: http-client: Trigger an error if first response block isn't a start-line
The first HTX block of a response must be a start-line. There is no reason
to wait for something else. And if there are output data in the response
channel buffer, it means we must find the start-line.
2025-07-24 12:13:42 +02:00
Christopher Faulet
c08a0dae30 MINOR: http-client: Try to send request body with headers if possible
There is no reason to yield after sending the request headers, except if the
request was fully sent. If there is a payload, it is better to send it as
well. However, when the whole request was sent, we can leave the I/O handler.
2025-07-24 12:13:42 +02:00
Christopher Faulet
96aa251d20 CLEANUP: http-client: Remove useless indentation when sending request body
It was useless to have an indentation to handle HTTPCLIENT_S_REQ_BODY state
in the http-client I/O handler.
2025-07-24 12:13:42 +02:00
Christopher Faulet
217da087fd MEDIUM: dns: Update the dns_session applet to use its own buffers
Thanks to this patch, the dns_session applet is now using its own
buffers. .rcv_buf and .snd_buf callback functions are now defined to use the
default raw functions. Functions to receive and send data have also been
updated to use the applet API and to remove any dependencies on the
stream-connectors and the channels.
2025-07-24 12:13:41 +02:00
Christopher Faulet
765f14e0e3 BUG/MEDIUM: dns: Reset reconnect tempo when connection is finally established
The issue was introduced by commit 27236f221 ("BUG/MINOR: dns: add tempo
between 2 connection attempts for dns servers"). In this patch, to delay the
reconnection, a timer is used on the appctx when it is created. This
postpones the appctx initialization. However, once initialized, the
expiration time of the underlying task is not reset. So, it is always
considered as expired and the appctx is woken up in loop.

The fix is quite simple. In dns_session_init(), the expiration time of the
appctx's task is always set to TICK_ETERNITY.

This patch must be backported everywhere the commit above was backported. So
as far as 2.8 for now but possibly to all stable versions.
2025-07-24 12:13:41 +02:00
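A minimal sketch of the fix described above, assuming a simplified task
with a single expiration tick. The names sketch_task, sketch_task_expired()
and SKETCH_TICK_ETERNITY are hypothetical stand-ins, not the real haproxy
definitions, and tick wrapping is ignored:

```c
/* Hypothetical "never expires" marker, standing in for TICK_ETERNITY. */
#define SKETCH_TICK_ETERNITY 0u

/* Hypothetical, simplified task: only the expiration tick is modelled. */
struct sketch_task {
	unsigned int expire;
};

/* A task is considered expired once its tick is reached, unless no
 * expiration is set at all. Tick wrapping is ignored in this sketch.
 */
static int sketch_task_expired(const struct sketch_task *t, unsigned int now)
{
	return t->expire != SKETCH_TICK_ETERNITY && now >= t->expire;
}

/* Once the delayed initialization finally runs, the reconnect delay is
 * dropped so that the task is no longer seen as expired on every wakeup.
 */
static void sketch_session_init(struct sketch_task *t)
{
	t->expire = SKETCH_TICK_ETERNITY;
}
```

Without the reset in sketch_session_init(), sketch_task_expired() would
keep returning true after the delay elapsed, which matches the wakeup loop
described in the commit message.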
Christopher Faulet
e542d2dfaa MEDIUM: hlua: Update the socket applet to use its own buffers
Thanks to this patch, the lua cosocket applet is now using its own
buffers. .rcv_buf and .snd_buf callback functions are now defined to use the
default raw functions. Functions to receive and send data have also been
updated to use the applet API and to remove any dependencies on the
stream-connectors and the channels.
2025-07-24 12:13:41 +02:00
Christopher Faulet
7e96ff6b84 BUG/MEDIUM: hlua: Report to SC when output data are blocked on a lua socket
It is a fix similar to the previous one ("BUG/MEDIUM: hlua: Report to SC
when data were consumed on a lua socket"), but for the write side. The
writer must notify the cosocket it needs more space in the request buffer to
produce more data by calling sc_need_room(). Otherwise, nothing prevents the
cosocket applet from being woken up again and again.

This patch must be backported as far as 2.8, and maybe to 2.6 too.
2025-07-24 12:13:41 +02:00
Christopher Faulet
21e45a61d1 BUG/MEDIUM: hlua: Report to SC when data were consumed on a lua socket
The lua cosockets are quite strange. There is an applet used to handle the
connection, and writers and readers subscribe to it to write or read
data. Writers and readers are tasks woken up by the cosocket applet when
data can be consumed or produced, depending on the channels buffers
state. Then the cosocket applet is woken up by writers and readers when read
or write events were performed.

It means the cosocket applet has little information about what was produced
or consumed. It is the writers and readers responsibility to notify any
blocking. Among other things, the readers must take care to notify the
stream on top of the cosocket applet that some data was consumed. Otherwise,
it may remain blocked, waiting for a write event (a write event from the
stream point of view is a read event from the cosocket point of view).

This patch must be backported as far as 2.8, and maybe to 2.6 too.
2025-07-24 12:13:41 +02:00
Christopher Faulet
48df877dab MEDIUM: hlua: Update the http applet to use its own buffers
Thanks to this patch, the lua HTTP applet is now using its own buffers.
.rcv_buf and .snd_buf callback functions are now defined to use the default
HTX functions. Functions to receive and send data have also been updated to
use the applet API and to remove any dependencies on the stream-connectors
and the channels.
2025-07-24 12:13:41 +02:00
Christopher Faulet
3e456be5ae MINOR: hlua: Use the buffer instead of the HTTP message to get HTTP headers
hlua_http_get_headers() function was using the HTTP message from the stream
TXN to retrieve headers from a message. However, this will be an issue to
update the lua HTTP applet to use its own buffers. Indeed, in that case,
information from the channels will be unavailable. So
hlua_http_get_headers() now uses a buffer containing an HTX message. It
is just an API change because, internally, the function was already
manipulating an HTX message.
2025-07-24 12:13:41 +02:00
Christopher Faulet
15080d9aae MINOR: hlua: Fill the request array on the first HTTP applet run
When a lua HTTP applet is created, a "request" object is created, filled
with the request information (method, path, headers...), to be able to
easily retrieve this information from the script. However, this was done
when the appctx was created, retrieving the info from the request channel.

To be able to update the applet to use its own buffer, it is now performed on
the first applet run. Indeed, when the applet is created, the info is not
forwarded yet and should not be accessed. Note that for now, information is
still retrieved from the channel.
2025-07-24 12:13:41 +02:00
Christopher Faulet
fdb66e6c5e MEDIUM: hlua: Update the tcp applet to use its own buffers
Thanks to this patch, the lua TCP applet is now using its own buffers.
.rcv_buf and .snd_buf callback functions are now defined to use the default
raw functions. Other changes are quite light. Mainly, end of stream and
errors are reported on the appctx instead of the stream-endpoint descriptor.
2025-07-24 12:13:41 +02:00
Christopher Faulet
1f9a1cbefc MINOR: applet: Improve applet API to take care of inbuf/outbuf alloc failures
applet_get_inbuf() and applet_get_outbuf() functions were not testing if the
buffers were available. So, the caller had to check them before calling one
of these functions. It is not really handy. So now, these functions take
care to have a fully usable buffer before returning. Otherwise NULL is
returned.
2025-07-24 12:13:41 +02:00
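The behaviour described above can be sketched as follows. This is only an
illustration under simplifying assumptions: sketch_buf, sketch_appctx and
sketch_get_inbuf() are hypothetical names, and the real
applet_get_inbuf() signature and allocation path are not reproduced here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified buffer and applet context. */
struct sketch_buf {
	char *area;   /* storage, NULL while not allocated */
	size_t size;  /* allocated size */
	size_t data;  /* bytes currently stored */
};

struct sketch_appctx {
	struct sketch_buf inbuf;
};

/* Return the input buffer only when it is fully usable. The buffer is
 * allocated on demand; NULL is returned if that is not possible, so the
 * caller no longer has to check the buffer before calling.
 */
static struct sketch_buf *sketch_get_inbuf(struct sketch_appctx *ctx, size_t size)
{
	if (!ctx->inbuf.area) {
		if (!size)
			return NULL;
		ctx->inbuf.area = malloc(size);
		if (!ctx->inbuf.area)
			return NULL;
		ctx->inbuf.size = size;
		ctx->inbuf.data = 0;
	}
	return &ctx->inbuf;
}
```

A caller then only needs a single NULL check before using the buffer.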
Christopher Faulet
44aae94ab9 MINOR: applet: Add HTX versions for applet_input_data() and applet_output_room()
It will be useful for HTX applets because available data in the input buffer and
available space in the output buffer are computed from the HTX message and not
the buffer itself. So now, applet_htx_input_data() and applet_htx_output_room()
functions can be used.
2025-07-24 12:13:41 +02:00
Christopher Faulet
d9855102cf BUG/MEDIUM: Remove sync sends from streams to applets
When the applet API was reviewed to use dedicated buffers, the support for
sends from the streams to applets was added. Unfortunately, it was not a
good idea because this way it is possible to deliver data to an applet and
release it just after, truncating data. Indeed, the release stage for applets
is related to the stream release itself. However, unlike the multiplexers,
the applets cannot survive a stream for now.

So, for now, the sync sends from the streams are removed for applets, waiting
for a better way to handle the applets release stage.

Note that this only concerns applets using their own buffers. And for now,
the bug is harmless because all refactored applets are on server side and
consume data first. But this will be an issue with the HTTP client.

This patch should be backported as far as 3.0 after a period of observation.
2025-07-24 12:13:41 +02:00
Christopher Faulet
574d0d8211 BUG/MINOR: applet: Fix applet_getword() to not return one extra byte
applet_getword() function is returning one extra byte when a string is
returned because the "ret" variable is not reset before the loop on the
data. The patch also fixes applet_getline().

It is a 3.3-specific issue. No need to backport.
2025-07-24 12:13:41 +02:00
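To illustrate the class of bug fixed above, here is a small sketch of a
word extractor whose byte counter is reset right before the copy loop.
sketch_getword() is a hypothetical stand-in with its own simplified
semantics; it is not the real applet_getword() code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a word extractor: copy bytes from <in> into
 * <out> up to and including the first <sep>, and return the number of
 * bytes copied. <out> is NUL-terminated whenever out_size is non-zero.
 */
static size_t sketch_getword(const char *in, size_t in_len,
                             char *out, size_t out_size, char sep)
{
	size_t ret;

	if (!out_size)
		return 0;

	/* The counter must be reset right before the copy loop. Entering
	 * the loop with a stale non-zero value would report extra bytes.
	 */
	ret = 0;
	while (ret < in_len && ret < out_size - 1) {
		out[ret] = in[ret];
		ret++;
		if (out[ret - 1] == sep)
			break;
	}
	out[ret] = '\0';
	return ret;
}
```

The same reasoning applies to a line-oriented variant, with '\n' as the
separator.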
Christopher Faulet
41a40680ce BUG/MEDIUM: stconn: Fix conditions to know an applet can get data from stream
sc_is_send_allowed() function is used to know if an applet is able to
receive data from the stream. But this function was designed for applets
using the channels buffer. It is not adapted to applets using their own
buffers.

When the SE_FL_WAIT_DATA flag is set, it means the applet is waiting for
more data and should not be woken up without new data. For applets using
channels buffer, just testing the flag is enough because process_stream()
will remove it when more data are available. For applets using their own
buffers, it is more complicated. Some data may be blocked in the output
channel buffer. In that case, and when the applet input buffer can receive
data, the applet can be woken up.

This patch must be backported as far as 3.0 after a period of observation.
2025-07-24 12:13:41 +02:00
Christopher Faulet
0d371d2729 BUG/MEDIUM: applet: State inbuf is no longer full if input data are skipped
When data are skipped from the input buffer of an applet, we must take care
to notify the input buffer is no longer full. Otherwise, this could prevent
the stream to push data to the applet.

It is 3.3-specific. No backport needed.
2025-07-24 12:13:41 +02:00
Christopher Faulet
5b5ecf848d BUG/MINOR: hlua: Skip headers when a receive is performed on an HTTP applet
When an HTTP applet tries to retrieve data, the request headers are still in
the buffer. But, instead of being silently removed, their size is removed
from the amount of data retrieved. When the request payload is fully
retrieved, it is not an issue. But it is a problem when a length is
specified. The data are shortened by the headers size.

So now, we take care to silently remove headers.

This patch must be backported to all stable versions.
2025-07-24 12:13:41 +02:00
William Lallemand
8258c8166a MINOR: acme: add ACME to the haproxy -vv feature list
Add "ACME" in the feature list in order to check if the support was
built successfully.
2025-07-24 11:49:11 +02:00
Remi Tricot-Le Breton
14615a8672 CLEANUP: ssl: Use only NIDs in curve name to id table
The curve name to curve id mapping table was built out of multiple
internal tables found in openssl sources, namely the 'nid_to_group'
table found in 'ssl/t1_lib.c' which maps openssl specific NIDs to public
IANA curve identifiers. In this table, there were two instances of
EVP_PKEY_XXX ids being used while all the other ones are NID_XXX
identifiers.
Since the two EVP_PKEY are actually equal to their NID equivalent in
'include/openssl/evp.h' we can use NIDs all along for better coherence.
2025-07-24 10:58:54 +02:00
Ilia Shipitsin
a2267fafcf CLEANUP: acme: fix wrong spelling of "resources"
"ressources" was used as a variable name, let's use English variant
to make spell check happier
2025-07-24 08:11:42 +02:00
William Lallemand
02db0e6b9f BUG/MINOR: acme: allow "processing" in challenge requests
Allow the "processing" status in the challenge object when requesting
to do the challenge, in addition to "pending".

According to RFC 8555 https://datatracker.ietf.org/doc/html/rfc8555/#section-7.1.6

   Challenge objects are created in the "pending" state.  They
   transition to the "processing" state when the client responds to the
   challenge (see Section 7.5.1)

However some CA could respond with a "processing" state without ever
transitioning to "pending".

Must be backported to 3.2.
2025-07-23 16:07:03 +02:00
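The relaxed check described above could look roughly like this. The
function name sketch_chall_status_ok() is hypothetical; only the two
status strings come from the commit message and RFC 8555.

```c
#include <string.h>

/* Hypothetical helper: accept a challenge object whose status is either
 * "pending" or "processing" when asking the CA to run the challenge.
 * Any other status (or a missing one) is rejected.
 */
static int sketch_chall_status_ok(const char *status)
{
	if (!status)
		return 0;
	return strcmp(status, "pending") == 0 ||
	       strcmp(status, "processing") == 0;
}
```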
William Lallemand
c103123c9e MINOR: acme: remove acme_req_auth() and use acme_post_as_get() instead
acme_req_auth() is only a call to acme_post_as_get() now, there's no
reason to keep the function. This patch removes it.
2025-07-23 16:07:03 +02:00
Amaury Denoyelle
08d664b17c MEDIUM: mux-quic: support backend private connection
If a backend connection is private, it should not be reused outside of
its original attached session. As such, on stream detach operation, such
connection is never inserted into server idle/avail list. Instead, it is
stored directly on the session.

The purpose of this commit is to implement proper handling of private
backend connections via QUIC multiplexer.
2025-07-23 15:49:51 +02:00
Amaury Denoyelle
00d668549e MINOR: mux-quic: do not reuse connection if app already shut
QUIC connection graceful closure is performed in two steps. First, the
application layer is closed. In the context of HTTP/3, this is done with
a GOAWAY frame emission, which forbids opening of new streams. Then the
whole connection is terminated via CONNECTION_CLOSE which is the final
emitted frame.

This commit ensures that when app layer is shut for a backend
connection, this connection is removed from either idle or avail server
tree. The objective is to prevent stream layer to try to reuse a
connection if no new stream can be attached on it.

New BUG_ON checks are inserted in qmux_strm_attach() and h3_attach() to
ensure that this assertion is always true.
2025-07-23 15:45:18 +02:00
Amaury Denoyelle
3217835b1d MEDIUM: mux-quic: implement be connection reuse
Implement support for QUIC connection reuse on the backend side. The
main change is done during detach stream operation. If a connection is
idle, it is inserted in the server list. Else, it is stored in the
server avail tree if there is room for more streams.

For a non-idle connection, qmux_avail_streams() is reused to detect whether
the stream flow-control limit is reached. If this is the case, the
connection is not inserted in the avail tree, so it cannot be reused,
even if flow-control is unblocked later by the peer. This latter point
could be improved in the future.

Note that support for QUIC private connections is still missing. Reuse
code will evolve to fully support this case.
2025-07-23 15:45:09 +02:00
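The detach-time decision described above can be sketched as a small
classification function. The enum and sketch_on_detach() are hypothetical
names, and the two counters are assumptions standing in for the
connection's real state:

```c
/* Hypothetical outcome of a stream detach on a backend connection. */
enum sketch_conn_slot {
	SKETCH_CONN_NONE,   /* not stored anywhere, cannot be reused */
	SKETCH_CONN_IDLE,   /* stored in the server idle list */
	SKETCH_CONN_AVAIL,  /* stored in the server avail tree */
};

/* <active_streams> is the number of streams still attached after the
 * detach, <avail_streams> the number of streams that may still be opened
 * before hitting the flow-control limit.
 */
static enum sketch_conn_slot sketch_on_detach(unsigned int active_streams,
                                              unsigned int avail_streams)
{
	if (!active_streams)
		return SKETCH_CONN_IDLE;
	if (avail_streams)
		return SKETCH_CONN_AVAIL;
	/* Flow-control limit reached: the connection is left out, even if
	 * the peer unblocks it later.
	 */
	return SKETCH_CONN_NONE;
}
```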
Amaury Denoyelle
3bf37596ba MINOR: mux-quic: store session in QCS instance
Add a new <sess> member into QCS structure. It is used to store the
parent session of the stream on attach operation. This is only done for
backend side.

This new member will become necessary when connection reuse will be
implemented. <owner> member of connection is not suitable as it could be
set to NULL, notably after a session_add_conn() failure.

Also, a single BE conn can be shared among different session instances,
in particular when using aggressive/always reuse mode. Thus it is
necessary to link each QCS instance with its session.
2025-07-23 15:42:37 +02:00
Amaury Denoyelle
826f797bb0 MINOR: mux-quic: disable glitch on backend side
For now, QUIC glitch limit counter is only available on the frontend
side. Thus, disable incrementation on the backend side for now. Also,
session is only available as conn <owner> reliably on the frontend side,
so session_add_glitch_ctr() operation is also secured.
2025-07-23 14:39:18 +02:00
Amaury Denoyelle
89329b147d MINOR: mux-quic: correctly implement backend timeout
qcc_refresh_timeout() is the function called on QUIC MUX activity. Its
purpose is to update the timeout by selecting the correct value
depending on the connection state.

Prior to this patch, backend connections were mostly ignored by the
function. However, the default server timeout was selected as a
fallback. This is incompatible with backend connections reuse.

This patch fixes timeout applied on backend connections. Only values
specific to frontend which are http-request and http-keep-alive timeouts
are now ignored for a backend connection. Also, fallback timeout is only
used for frontend connections.

This patch ensures that an idle backend connection won't be deleted due
to server timeout. This is necessary for proper connection reuse which
will be implemented in a future patch.
2025-07-23 14:36:48 +02:00
Amaury Denoyelle
95cb763cd6 MINOR: mux-quic: refactor timeout code
This commit is a small reorganization of the conditions used in
qcc_refresh_timeout(). Its objective is to render the code more logical
before the next patch which will ensure that timeout is properly set for
backend connections.
2025-07-23 14:36:48 +02:00
Amaury Denoyelle
558532fc57 BUG/MINOR: mux-quic: ensure close-spread-time is properly applied
If a connection remains on a proxy currently disabled or stopped, a
special spread timeout is set if active close is configured. For QUIC
MUX, this is set via qcc_refresh_timeout() as with all other timeout
values.

Fix this closing timeout setting : it is now used as an override to any
other timeout that may have been chosen if calculated spread time is
lower than the previously selected value. This is done for backend
connections as well.

This should be backported up to 2.6 after a period of observation.
2025-07-23 14:36:48 +02:00
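The override rule described above amounts to keeping the smaller of the
two values when the proxy is disabled or stopped. A minimal sketch, with a
hypothetical function name and plain millisecond values instead of
haproxy ticks:

```c
/* Hypothetical helper: <selected_ms> is the timeout chosen from the
 * connection state, <spread_ms> the calculated close-spread delay. The
 * spread delay only overrides the selection when the proxy is stopping
 * and the delay is lower than the selected value.
 */
static unsigned int sketch_pick_timeout(unsigned int selected_ms,
                                        unsigned int spread_ms,
                                        int proxy_stopping)
{
	if (proxy_stopping && spread_ms < selected_ms)
		return spread_ms;
	return selected_ms;
}
```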
Amaury Denoyelle
c5bcc3a21e BUG/MINOR mux-quic: apply correctly timeout on output pending data
When no stream is attached, mux layer is responsible to maintain a
timeout. The first criteria is to apply client/server timeout if there
is still data waiting for emission.

Previously, <hreq> qcc member was used to determine this state. However,
this only covers bidirectional streams. Fix this by testing if
<send_list> is empty or not. This is enough to take into account both
bidi and uni streams.

Theoretically, this should be backported to every stable version.
However, send-list is not available on 2.6 and there is no alternative
to quickly determine if there is waiting output data. Thus, it's better
to backport it up to 2.8 only.
2025-07-23 14:36:48 +02:00
William Lallemand
7139ebd676 BUG/MEDIUM: acme: use POST-as-GET instead of GET for resources
The requests that checked the status of the challenge and the retrieval
of the certificate were done using a GET.

This is working with letsencrypt and other CA providers, but it might
not work everywhere. RFC 8555 specifies that only the directory and
newNonce resources MUST work with GET requests, but everything else
must use POST-as-GET.

Must be backported to 3.2.
2025-07-23 12:42:23 +02:00
Aurelien DARRAGON
054fa05e1f MINOR: log: explicitly ignore "log-steps" on backends
"log-steps" was already ignored if directly defined in a backend section,
however, when defined in a defaults section it was inherited to all
proxies no matter their capability (ie: including backends).

As configurations often contain more backends than frontends, this would
result in wasted memory given that the log-steps setting is only
considered on frontends.

Let's fix that by preventing the inheritance from defaults section to
anything else than frontends. Also adjust the documentation to mention
that the setting is not relevant for backends.
2025-07-22 10:22:04 +02:00
Amaury Denoyelle
e02939108e BUG/MINOR: h3: fix uninitialized value in h3_req_headers_send()
Due to the introduction of smallbuf usage for HTTP/3 headers emission,
ret variable may be used uninitialized if buffer allocation fails due to
not enough room in QUIC connection window.

Fix this by setting ret value to 0.

Function variable declarations are also adjusted so that the pattern is
similar to h3_resp_headers_send(). Finally, outbuf buffer is also
removed as it is now unused.

No need to backport.
2025-07-22 09:42:52 +02:00
Amaury Denoyelle
cbbbf4ea43 MINOR: h3: add traces to h3_req_headers_send()
Add traces during HTTP/3 request encoding. This operation is performed
on the backend side.
2025-07-21 16:58:12 +02:00
Amaury Denoyelle
3126cba82e MINOR: h3: use smallbuf for request header emission
Similarly to HTTP/3 response encoding, a small buffer is first allocated
for the request encoding on the backend side. If this is not sufficient,
the smallbuf is replaced by a standard buffer and encoding is restarted.

This is useful to reduce the window usage over a connection of smaller
requests.
2025-07-21 16:58:12 +02:00
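The "small buffer first, then restart with a standard buffer" pattern
described above can be sketched like this. Both functions and the plain
memcpy() "encoder" are hypothetical simplifications; the real code encodes
HTTP/3 headers and allocates buffers from the QUIC connection window.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoder: fails with -1 if the output does not fit. */
static int sketch_encode(const char *src, size_t len,
                         char *dst, size_t dst_size)
{
	if (len > dst_size)
		return -1;
	memcpy(dst, src, len);
	return 0;
}

/* Try the small buffer first. If the headers do not fit, restart the
 * whole encoding into the standard buffer. Returns 1 if the small
 * buffer was used, 2 for the standard one, 0 if neither is large enough.
 */
static int sketch_encode_headers(const char *hdrs, size_t len,
                                 char *small, size_t small_size,
                                 char *large, size_t large_size)
{
	if (sketch_encode(hdrs, len, small, small_size) == 0)
		return 1;
	if (sketch_encode(hdrs, len, large, large_size) == 0)
		return 2;
	return 0;
}
```

Short requests then only consume the small buffer, which is the window
saving the commit message refers to.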
Remi Tricot-Le Breton
7fd849f4e0 MINOR: ssl: Remove ClientHello specific traces if !HAVE_SSL_CLIENT_HELLO_CB
SSL libraries like wolfSSL that don't have the clienthello callback
mechanism enabled do not need to have the traces that are only called
from the said callback.
The code added to parse the ciphers relied on a function that was not
defined in wolfSSL (SSL_CIPHER_find).
2025-07-21 16:44:50 +02:00
Remi Tricot-Le Breton
665b7d4fa9 MINOR: ssl: Dump ciphers and sigalgs details in trace with 'advanced' verbosity
The contents of the extensions were only dumped with verbosity
'complete' which meant that the 'advanced' verbosity was pretty much
useless despite what its name implies (it was the same as the 'simple'
one).
The 'advanced' verbosity is now the "maximum" one, using 'complete'
would not add any extra information yet, but it leaves more room for
some actually large traces to be dumped later on (some complete
ClientHello dumps for instance).
2025-07-21 16:44:50 +02:00
Remi Tricot-Le Breton
8f2b787241 MINOR: ssl: Add curves in ssl traces
Dump the ClientHello curves in the SSL traces.
2025-07-21 16:44:50 +02:00
Remi Tricot-Le Breton
d799a1b3b2 MINOR: ssl: Add curve id to curve name table and mapping functions
The SSL libraries like OpenSSL for instance do not seem to actually
provide a public mapping between IANA defined curve IDs and curve names,
or even a mapping between curve IDs and internal NIDs.
This new table gathers all this information in a single table so that
we can convert curve names (be it SECG or NIST format) to curve IDs or
NIDs.
The previously existing 'curves2nid' function now uses the new table,
and a new 'curveid2str' one is added.
2025-07-21 16:44:50 +02:00
Remi Tricot-Le Breton
f00d9bf12d MINOR: ssl: Add ciphers in ssl traces
Decode the contents of the ClientHello ciphers extension and dump a
human readable list in the ssl traces.
2025-07-21 16:44:50 +02:00
Amaury Denoyelle
b0fe453079 BUG/MINOR: hq-interop: fix FIN transmission
Since the following patch, app_ops layer is now responsible to report
that HTX block was the last transmitted so that FIN STREAM can be set.
This is mandatory to properly support HTTP 1xx interim responses.

  f349df44b4e21d8bf9b575a0aa869056a2ebaa58
  MINOR: qmux: change API for snd_buf FIN transmission

This change was correctly implemented in HTTP/3 code, however an issue
appeared on hq-interop transcoder in case zero-copy DATA transfer is
performed when HTX buffer is swapped. If this occurred during the
transfer of the last HTX block, EOM is not detected and thus STREAM FIN
is never set.

Most of the times, QMUX shut callback is called immediately after. This
results in an emission of a RESET_STREAM to the client, which prevents
the data transfer.

To fix this, use the same method as HTTP/3 : HTX EOM flag status is
checked before any transfer, thus preserving it even after a zero-copy.

Criticality of this bug is low as hq-interop is experimental and is mostly
used for interop testing.

This should fix github issue #3038.

This patch must be backported wherever the above one is.
2025-07-21 15:38:02 +02:00
Aurelien DARRAGON
563b4fafc2 BUG/MINOR: logs: fix log-steps extra log origins selection
Willy noticed that it was not possible to select extra log origins using
log-steps directive. Extra origins are the ones registered using
log_orig_register() such as http-req.

Reason was the error path was always executed during extra log origin
matching for log-steps parser, while it should only be executed if no
match was found.

It should be backported to 3.1.
2025-07-21 15:33:55 +02:00
Olivier Houchard
f8e9545f70 BUG/MEDIUM: threads: Disable the workaround to load libgcc_s on macOS
Don't use the workaround to load libgcc_s on macOS. It is not needed
there, and it causes issues, as recent macOS dislike processes that fork
after threads were created (and the workaround creates a temporary
thread). This fixes crashes on macOS at least when using master-worker,
and using the system resolver.

This should fix Github issue #3035

This should be backported up to 2.8.
2025-07-21 13:56:29 +02:00
Valentine Krasnobaeva
5b45251d19 BUILD: debug: add missed guard USE_CPU_AFFINITY to show cpu bindings
Not all platforms support thread-cpu bindings, so let's put
cpu_topo_dump_summary() under USE_CPU_AFFINITY guards.

Only needs to be backported if 1cc0e023ce ("MINOR: debug: add thread-cpu
bindings info in 'show dev' output") is backported.
2025-07-21 11:25:08 +02:00
Frederic Lecaille
14d0f74052 MINOR: quic: Remove pool_head_quic_be_cc_buf pool
This patch impacts the QUIC frontends. It reverts this patch

    MINOR: quic-be: add a "CC connection" backend TX buffer pool

which adds <pool_head_quic_be_cc_buf> new pool to allocate CC (connection closed state)
TX buffers with bigger object size than the one for <pool_head_quic_cc_buf>.
Indeed the QUIC backends must be able to send at least 1200 bytes Initial packets.

From now on, both the QUIC frontends and backends use the same pool with
MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU)(1252 bytes) as object size.
2025-07-17 19:33:21 +02:00
Valentine Krasnobaeva
1cc0e023ce MINOR: debug: add thread-cpu bindings info in 'show dev' output
Add thread-cpu bindings info in 'show dev' output, as it can be useful for
debugging.
2025-07-17 19:08:13 +02:00
Valentine Krasnobaeva
ff461efc59 MINOR: debug: align output style of debug_parse_cli_show_dev with cpu_dump_topology
Align titles style of debug_parse_cli_show_dev() with
cpu_dump_topology(). We will call the latter inside of
debug_parse_cli_show_dev() to show thread-cpu bindings info.
2025-07-17 19:08:06 +02:00
Valentine Krasnobaeva
9e11c852fe MINOR: cpu-topo: write thread-cpu bindings into trash buffer
Write thread-cpu bindings and cluster summary into provided trash buffer.
Like this we can call this function in any place, when this info is needed.
2025-07-17 19:07:58 +02:00
Valentine Krasnobaeva
2405283230 MINOR: cpu-topo: split cpu_dump_topology() to show its summary in show dev
cpu_dump_topology() prints details about each enabled CPU and a summary with
clusters info and thread-cpu bindings. The latter is often useful for
debugging and we want to add it in the 'show dev' output.

So, let's split cpu_dump_topology() in two parts: cpu_topo_debug() to print the
details about each enabled CPU; and cpu_topo_dump_summary() to print only the
summary.

In the next commit we will modify cpu_topo_dump_summary() to write into local
trash buffer and it could be easily called from debug_parse_cli_show_dev().
2025-07-17 19:07:46 +02:00
Valentine Krasnobaeva
254e4d59f7 BUG/MINOR: halog: exit with error when some output filters are set simultaneosly
Exit with an error if multiple output filters (-ic, -srv, -st, -tc, -u*, etc.)
are used at the same time.

halog is designed to process and display output for only one filter at a time.
Using multiple filters simultaneously can cause a crash because the program is
not designed to manage multiple, separate result sets (e.g., one for
IP counts, another for URLs).

Supporting simultaneous filters would require a redesign to collect entries for
each filter in separate ebtree. This would negatively impact performance and is
not requested for the moment. This patch prevents the crash by checking filter
combinations just after the command line parsing.

This issue was reported in GitHub issue #3031.
This should be backported in all stable versions.
2025-07-17 17:22:37 +02:00
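One possible way to detect conflicting output filters right after command
line parsing is to keep them in a bit mask and reject any mask with more
than one bit set. This is a sketch only: the flag names and values below
are hypothetical and do not match halog's internal definitions.

```c
/* Hypothetical output filter flags, one bit per output mode. */
#define SKETCH_FILT_COUNT_IP   0x01u
#define SKETCH_FILT_COUNT_SRV  0x02u
#define SKETCH_FILT_COUNT_STS  0x04u
#define SKETCH_FILT_COUNT_URL  0x08u

/* Return 1 if at most one output filter is set, 0 otherwise.
 * (x & (x - 1)) clears the lowest set bit, so the result is zero only
 * when x has zero or one bit set.
 */
static int sketch_filters_valid(unsigned int filters)
{
	return (filters & (filters - 1)) == 0;
}
```

The program would then exit with an error message whenever
sketch_filters_valid() returns 0, before any log line is processed.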
Frederic Lecaille
4eef300a2c BUG/MEDIUM: quic-be: CC buffer released from wrong pool
The "connection close state" TX buffer is used to build the datagram with
basically a CONNECTION_CLOSE frame to notify the peer about the connection
closure. It allows the quic_conn memory release and its replacement by a lighter
quic_cc_conn struct.

For the QUIC backend, there is a dedicated pool to build such datagrams from
bigger TX buffers. But from quic_conn_release(), this is the pool dedicated
to the QUIC frontends which was used to release the QUIC backend TX buffers.

This patch simply adds a test about the target of the connection to release
the "connection close state" TX buffers from the correct pool.

No backport needed.
2025-07-17 11:48:41 +02:00
Willy Tarreau
b6d0ecd258 DOC: connection: explain the rules for idle/safe/avail connections
It's super difficult to find the rules that operate idle conns depending
on their idle/safe/avail/private status. Some are in lists, others not.
Some are in trees, others not. Some have a flag set, others not. This
documents the rules before the definitions in connection-t.h. It could
even be backported to help during backport sessions.
2025-07-16 18:53:57 +02:00
Frederic Lecaille
838024e07e MINOR: quic: Get rid of qc_is_listener()
Replace all calls to qc_is_listener() (resp. !qc_is_listener()) by calls to
objt_listener() (resp. objt_server()).
Remove the qc_is_listener() implementation and the QUIC_FL_CONN_LISTENER
flag it relied on.
2025-07-16 16:42:21 +02:00
Willy Tarreau
d9701d312d DEV: gdb: add a memprofile decoder to the debug tools
"memprof_dump" will visit memprofile entries and dump them in a
synthetic format counting allocations/releases count/size, type
and calling address.
2025-07-16 15:33:33 +02:00
Christopher Faulet
4f7c26cbb3 BUG/MINOR: applet: Don't trigger BUG_ON if the tid is not on appctx init
When an appctx is initialized, there is a BUG_ON() to be sure the appctx is
really initialized on the right thread to avoid bugs on the thread
affinity. However, it is possible to not choose the thread when the appctx
is created and let it starts on any thread. In that case, the thread
affinity is set when the appctx is initialized. So, we must take cate to not
trigger the BUG_ON() in that case.

For now, we never hit the bug because the thread affinity is always set
during the appctx creation.

This patch must be backported as far as 2.8.
2025-07-16 13:47:33 +02:00
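The relaxed check described above can be sketched as follows, using
assert() in place of BUG_ON(). The structure, the -1 convention for "no
thread chosen" and the function name are assumptions made for the
example, not the actual appctx layout.

```c
#include <assert.h>

/* Hypothetical, simplified applet context: <tid> is the thread chosen at
 * creation time, or -1 when the applet may start on any thread.
 */
struct sketch_appctx {
	int tid;
};

/* Initialize the context on thread <cur_tid>. The affinity check is only
 * enforced when a thread was chosen at creation. Otherwise the affinity
 * is set now, on the thread performing the initialization.
 */
static int sketch_appctx_init(struct sketch_appctx *ctx, int cur_tid)
{
	if (ctx->tid < 0)
		ctx->tid = cur_tid;
	else
		assert(ctx->tid == cur_tid);
	return ctx->tid;
}
```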
Amaury Denoyelle
88c0422e49 MINOR: h3: remove unused outbuf in h3_resp_headers_send()
Cleanup h3_resp_headers_send() by removing outbuf buffer variable which
is not necessary anymore.
2025-07-16 10:30:59 +02:00
Frederic Lecaille
1c33756f78 BUG/MINOR: quic: Wrong source address use on FreeBSD
The bug is a listener-only one, and only occurred on FreeBSD.

The FreeBSD issue has been reported here:
https://forums.freebsd.org/threads/quic-http-3-with-haproxy.98443/
where QUIC traces could reveal that sendmsg() calls lead to EINVAL
syscall errnos.

Such a similar issue could be reproduced from a FreeBSD 14-2 VM
with reg-tests/quic/retry.vtc as reg test.

As noted by Olivier, this issue could be fixed within the VM binding
the listener socket to INADDR_ANY.

That said, the symptoms are not exactly the same as the ones reported by the user.
What could be observed from such a VM is that if the first recvmsg() call
returns the datagram destination address, and if the listener
listening address is bound to a specific address, the calls to
sendmsg() fail because of the IP_SENDSRCADDR ip option value
set by cmsg_set_saddr(). According to the ip(4) FreeBSD manual,
such an IP option must be used if the listening socket is
bound to a specific address. It is to be noted that into a VM
the first call to recvmsg() of the first connection does not return the datagram
destination address. This leads the first quic_conn to be initialized without
->local_addr value. This is this value which is used by IP_SENDSRCADDR
ip option. In this case, the sendmsg() calls (without IP_SENDSRCADDR)
never fail. The issue appears at the second condition.

This patch replaces the conditions to use IP_SENDSRCADDR to a call to
qc_may_use_saddr(). This latter also checks that the listener listening
address is not INADDR_ANY to allow the use of the source address.
It is generalized to all the OSes. Indeed, there is no reason to set the source
address when the listener is bound to a specific address.

Must be backported as far as 2.8.
2025-07-16 10:17:54 +02:00
Amaury Denoyelle
63586a8ab4 BUG/MINOR: h3: properly handle interim response on BE side
On backend side, H3 layer is responsible to decode a HTTP/3 response
into an HTX message. Multiple responses may be received on a single
stream with interim status codes prior to the final one.

h3_resp_headers_to_htx() is the function used solely on backend side
responsible for H3 response to HTX transcoding. This patch extends it to
be able to properly support interim responses. When such a response is
received, the new flag H3_SF_RECV_INTERIM is set. This is converted to
QMUX qcs flag QC_SF_EOI_SUSPENDED.

The objective of this latter flag is to prevent stream EOI to be
reported during stream rcv_buf callback, even if HTX message contains
EOM and is empty. QC_SF_EOI_SUSPENDED will be cleared when the final
response is finally converted, which unblock stream EOI notification for
next rcv_buf invocations. Note however that HTX EOM is untouched : it is
always set for both interim and final response reception.

As a minor adjustment, HTX_SL_F_BODYLESS is always set for interim
responses.

Contrary to frontend interim response handling, a flag is necessary on
QMUX layer. This is because H3 to HTX transcoding and rcv_buf callback
are two distinct operations, called under different context (MUX vs
stream tasklet).

Also note that H3 layer has two distinct flags for interim response
handling, one only used as a server (FE side) and the other as a client
(BE side). It was preferred to use two distinct flags which is
considered less error-prone, contrary to a single unified flag which
would require to always set the proxy side to ensure it is relevant or
not.

No need to backport.
2025-07-15 18:39:23 +02:00
Amaury Denoyelle
e7b3a69c59 BUG/MEDIUM: h3: handle interim response properly on FE side
On frontend side, HTTP/3 layer is responsible to transcode an HTX
response message into HTTP/3 HEADERS frame. This operation is handled
via h3_resp_headers_send().

Prior to this patch, if HTX EOM was encountered in the HTX message after
response transcoding, <fin> was reported to the QMUX layer. This will in
turn cause FIN stream bit to be set when the response is emitted.
However, this is not correct as a single HTX response can be constituted
of several interim messages, each delimited by an EOM block.

Most of the time, this bug will cause the client to close the connection
as it is invalid to receive an interim response with FIN bit set.

Fix this by properly differentiating interim and final responses.
During interim response transcoding, the new flag H3_SF_SENT_INTERIM
will be set, which will prevent <fin> to be reported. Thus, <fin> will
only be notified for the final response.
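
The rule described above can be sketched as follows. This is only an
illustration under stated assumptions: the flag value, the function names
and the "any 1xx status is interim" test are hypothetical and are not taken
from the HAProxy sources.

```c
#include <stdbool.h>

/* Hypothetical flag modeled on H3_SF_SENT_INTERIM; the value is arbitrary. */
#define SF_SENT_INTERIM 0x01u

/* Assumption for this sketch: every 1xx status is an interim response. */
static bool status_is_interim(int status)
{
    return status >= 100 && status < 200;
}

/* Returns true when <fin> may be reported after encoding one response.
 * <eom> tells whether the message carried an end-of-message marker. */
static bool resp_may_report_fin(int status, bool eom, unsigned int *flags)
{
    if (status_is_interim(status)) {
        *flags |= SF_SENT_INTERIM;   /* remember that an interim was sent */
        return false;                /* EOM of an interim never ends the stream */
    }
    *flags &= ~SF_SENT_INTERIM;
    return eom;                      /* only the final response may carry FIN */
}
```

With this shape, a 103 response followed by a 200 response on the same
stream yields <fin> only for the second one.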

This must be backported up to 2.6. Note that it relies on the previous
patch which also must be taken.
2025-07-15 18:39:23 +02:00
Amaury Denoyelle
f349df44b4 MINOR: qmux: change API for snd_buf FIN transmission
Previous patches have fixed interim response encoding via
h3_resp_headers_send(). However, it is still necessary to adjust h3
layer state-machine so that several successive HTTP responses are
accepted for a single stream.

Prior to this, QMUX was responsible to decree that the final HTX message
was encoded so that FIN stream can be emitted. However, with interim
response, MUX is in fact unable to properly determine this. As such,
this is the responsibility of the application protocol layer. To reflect
this, app_ops snd_buf callback is modified so that a new output argument
<fin> is added to it.

Note that for now this commit does not bring any functional change.
However, it will be necessary for the following patch. As such, it
should be backported prior to it to every version as necessary.
2025-07-15 18:39:23 +02:00
Amaury Denoyelle
d8b34459b5 BUG/MINOR: h3: ensure that invalid status code are not encoded (FE side)
On frontend side, H3 layer transcodes HTX status code into HTTP/3
HEADERS frame. This is done by calling qpack_encode_int_status().

Prior to this patch, the latter function was also responsible to reject
an invalid value, which guaranteed that only valid codes are encoded
(between 100 and 999 values). However, this is not practical as it is
impossible to differentiate between an invalid code error and a buffer
room exhaustion.

Changes this so that now HTTP/3 layer first ensures that HTX code is
valid. The stream is closed with H3_INTERNAL_ERROR if invalid value is
present. Thus, qpack_encode_int_status() will only report an error due
to buffer room exhaustion. If a small buffer is used, a standard buffer
will be reallocated which should be sufficient to encode the response.
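
A minimal sketch of this separation of concerns is shown below. The names
(enc_result, encode_status, send_status) are hypothetical and the encoder
is a plain 3-digit ASCII stand-in, not the real QPACK encoding.

```c
#include <stddef.h>

enum enc_result { ENC_OK = 0, ENC_INVALID_STATUS, ENC_NO_ROOM };

/* Encoder stand-in: it can only fail because the output is too small. */
static int encode_status(char *out, size_t room, int status)
{
    if (room < 3)
        return -1;
    out[0] = '0' + status / 100;
    out[1] = '0' + (status / 10) % 10;
    out[2] = '0' + status % 10;
    return 3;
}

/* The caller validates the code first, so both failure causes stay distinct. */
static enum enc_result send_status(char *out, size_t room, int status)
{
    if (status < 100 || status > 999)
        return ENC_INVALID_STATUS;   /* close the stream, no retry */
    if (encode_status(out, room, status) < 0)
        return ENC_NO_ROOM;          /* retry with a larger buffer */
    return ENC_OK;
}
```

Only ENC_NO_ROOM justifies a reallocation; an invalid code is rejected
before any buffer work is attempted.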

The impact of this bug is minimal. Its main benefit is code clarity,
while also removing an unnecessary realloc when confronted with an
invalid HTTP code.

This should be backported at least up to 3.1. Prior to it, smallbuf
mechanism isn't present, hence the impact of this patch is less
important. However, it may still be backported to older versions, which
should facilitate picking patches for HTTP 1xx interim response support.
2025-07-15 18:39:23 +02:00
Amaury Denoyelle
d59bdfb8ec BUG/MINOR: h3: properly realloc buffer after interim response encoding
Previous commit fixes encoding of several following HTTP response
messages when interim status codes are first reported. However,
h3_resp_headers_send() still was unable to interrupt encoding if output
buffer room was not sufficient. This case may be likely because small
buffers are used for headers encoding.

This commit fixes this situation. If output buffer is not empty prior to
response encoding, this means that a previous interim response message
was already encoded before. In this case, and if remaining space is not
sufficient, use buffer release mechanism : this allows to restart
response encoding by using a newer buffer. This process has already been
used for DATA and trailers encoding.

This must be backported up to 2.6. However, note that buffer release
mechanism is not present for version 2.8 and lower. In this case, qcs
flag QC_SF_BLK_MROOM should be enough as a replacement.
2025-07-15 18:39:23 +02:00
Amaury Denoyelle
1290fb731d BUG/MEDIUM: h3: do not overwrite interim with final response
An HTTP response may contain several interim response messages (1xx
status) prior to a final response message (all other status codes). This may
cause issues with h3_resp_headers_send() called for response encoding
which assumes that it is only called one time per stream, most notably
during output buffer handling.

This commit fixes output buffer handling when h3_resp_headers_send() is
called multiple times due to an interim response. Prior to it, interim
response was overwritten with newer response message. Most of the time,
this resulted in error for the client due to QPACK decoding failure.
This is now fixed so that each response is encoded one after the other.

Note that if encoding of several responses is bigger than output buffer,
an error is reported. This can definitely occur as small buffers are
used during header encoding. This situation will be improved by the next
patch.

This must be backported up to 2.6.
2025-07-15 18:39:23 +02:00
Willy Tarreau
110625bdb2 MINOR: debug: report haproxy and operating system info in panic dumps
The goal is to help figure the OS version (kernel and userland), any
virtualization/containers, and the haproxy version and build features.
Sometimes even reporters themselves can be mistaken about the running
version or environment. Also printing this at the top helps draw a
visual delimitation between warnings and panic. Now we get something
like this:

  PANIC! Thread 1 is about to kill the process.

  HAProxy info:
    version: 3.3-dev3-c863c0-18
    features: +51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY (...)

  Operating system info:
    virtual machine: no
    container: no
    kernel: Linux 6.1.131 #1 SMP PREEMPT_DYNAMIC Fri Mar 14 01:04:55 CET 2025 x86_64
    userland: Slackware 15.0 x86_64

  * Thread 1 : id=0x7f615a8775c0 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
        1/1    stuck=0 prof=0 harmless=0 isolated=0
               cpu_ns: poll=1835010197 now=1835066102 diff=55905
               (...)
2025-07-15 17:18:29 +02:00
Willy Tarreau
abcc73830f MEDIUM: proxy: register a post-section cleanup function
For listen/frontend/backend, we now want to be able to clean up the
default-server directive that's no longer used past the end of the
section. For this we register a post-section function and perform the
cleanup there.
2025-07-15 10:40:17 +02:00
Willy Tarreau
49a619acae MEDIUM: proxy: no longer allocate the default-server entry by default
The default-server entry used to always be allocated. Now we'll postpone
its allocation for the first time we need it, i.e. during a "default-server"
directive, or when inheriting a defaults section which has one. The memory
savings are significant, on a large configuration with 100k backends and
no default-server directive, the memory usage dropped from 800MB RSS to
420MB (380 MB saved). It should be possible to also address configs using
default-server by releasing this entry when leaving the proxy section,
which is not done yet.
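
The allocate-on-first-use pattern can be sketched as below. The struct and
function names are hypothetical stand-ins, and the 3800-byte size is only
an approximation of the ~3.8kB server struct. As a sanity check, 100,000
proxies times roughly 3.8kB is about 380MB, which is consistent with the
saving reported above.

```c
#include <stdlib.h>

/* Stand-in for the large server struct (roughly 3.8kB). */
struct server_sketch { char opaque[3800]; };

struct proxy_sketch {
    struct server_sketch *defsrv;   /* NULL until first needed */
};

/* Returns the default-server entry, allocating it on first use.
 * Returns NULL on allocation failure. */
static struct server_sketch *proxy_get_defsrv(struct proxy_sketch *px)
{
    if (!px->defsrv)
        px->defsrv = calloc(1, sizeof(*px->defsrv));
    return px->defsrv;
}

/* Releases the entry, e.g. once the section has been fully parsed. */
static void proxy_release_defsrv(struct proxy_sketch *px)
{
    free(px->defsrv);
    px->defsrv = NULL;
}
```

A proxy that never sees a "default-server" directive never calls
proxy_get_defsrv() and therefore never pays for the entry.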
2025-07-15 10:39:44 +02:00
Willy Tarreau
76828d4120 MINOR: proxy: add checks for defsrv's validity
Now we only copy the default server's settings if such a default server
exists, otherwise we only initialize it. At the moment it always exists.

The change is mostly performed in srv_settings_cpy() since that's where
each caller passes through, and there's no point duplicating that test
everywhere.
2025-07-15 10:36:58 +02:00
Willy Tarreau
4ac28f07d0 MEDIUM: proxy: take the defsrv out of the struct proxy
The server struct has gone huge over time (~3.8kB), and having a copy
of it in the defsrv section of the struct proxy costs a lot of RAM,
that is not needed anymore at run time.

This patch replaces this struct with a dynamically allocated one. The
field is allocated and initialized during alloc_new_proxy() and is
freed when the proxy is destroyed for now. But the goal will be to
support freeing it after parsing the section.
2025-07-15 10:34:18 +02:00
Willy Tarreau
2414c5ce2f CLEANUP: server: be sure never to compare src against a non-existing defsrv
The test in srv_ssl_settings_cpy() comparing src to the server's proxy's
default server does work but it's a subtle trap. Indeed, no check is made
on srv->proxy to be valid, and this only works because the compiler is
comparing pointer offsets. During the boot, it's common to have NULL here
in srv->proxy and of course in this case srv does not match that value
which is NULL plus epsilon. But when trying to turn defsrv to a dynamic
pointer instead, then the compiler is forced to dereference this NULL
srv->proxy and dies during init.

Let's always add the null check for srv->proxy before the test to avoid
this situation.
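
As an illustration only, the guarded comparison could look like the sketch
below once defsrv is a pointer. The struct layouts and the function name
are hypothetical and reduced to the two fields involved.

```c
#include <stdbool.h>
#include <stddef.h>

struct srv_sketch;

struct px_sketch {
    struct srv_sketch *defsrv;   /* dynamically allocated, may be NULL */
};

struct srv_sketch {
    struct px_sketch *proxy;     /* may be NULL during boot */
};

/* True only when <src> is the default server of <srv>'s proxy.
 * srv->proxy is checked first so that it is never dereferenced when NULL. */
static bool src_is_defsrv(const struct srv_sketch *srv,
                          const struct srv_sketch *src)
{
    return srv->proxy && srv->proxy->defsrv && src == srv->proxy->defsrv;
}
```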

No backport is needed since the problem cannot happen yet.
2025-07-15 10:33:08 +02:00
Willy Tarreau
36f339d2fe CLEANUP: stream: use server_find_by_addr() in sticking_rule_find_target()
This makes this function a bit less of a mess by no longer manipulating
the low-level server address nodes nor the proxy lock.
2025-07-15 10:30:28 +02:00
Willy Tarreau
616c10f608 CLEANUP: server: add server_find_by_addr()
Server lookup by address requires locking and manipulation of the tree
from user code. Let's provide server_find_by_addr() which does that for
us.
2025-07-15 10:30:28 +02:00
Willy Tarreau
fda04994d9 CLEANUP: server: simplify server_find_by_id()
At a few places we're seeing some open-coding of the same function, likely
because it looks overkill for what it's supposed to do, due to extraneous
tests that are not needed (e.g. check of the backend's PR_CAP_BE etc).
Let's just remove all these superfluous tests and inline it so that it
feels more suitable for use everywhere it's needed.
2025-07-15 10:30:28 +02:00
Willy Tarreau
c8f0b69587 CLEANUP: stream: lookup server ID using standard functions
The server lookup in sticking_rule_find_target() uses an open-coded tree
search while we have a function for this server_find_by_id(). In addition,
due to the way it's coded, the stick-table lock also covers the server
lookup by accident instead of being released earlier. This is not a real
problem though since such feature is rarely used nowadays.

Let's clean all this stuff by first retrieving the ID under the lock and
then looking up the corresponding server.
2025-07-15 10:30:28 +02:00
Willy Tarreau
a3443db2eb CLEANUP: cfgparse: lookup proxy ID using existing functions
The code used to detect proxy id conflicts uses an open-coded lookup
in the ID tree which is not necessary since we already have functions
for this. Let's switch to that instead.
2025-07-15 10:30:28 +02:00
Willy Tarreau
31526f73e6 CLEANUP: server: use server_find_by_name() where relevant
Instead of open-coding a tree lookup, in sticking rules and server_find(),
let's just rely on server_find_by_name() which now does exactly the same.
2025-07-15 10:30:28 +02:00
Willy Tarreau
61acd15ea8 CLEANUP: server: rename findserver() to server_find_by_name()
Now it's more logical and matches what is done in the rest of these
functions. server_find() now relies on it.
2025-07-15 10:30:28 +02:00
Willy Tarreau
6ad9285796 CLEANUP: server: rename server_find_by_name() to server_find()
This function doesn't just look at the name but also the ID when the
argument starts with a '#'. So the name is not correct and explains
why this function is not always used when the name only is needed,
and why the list-based findserver() is used instead. So let's just
call the function "server_find()", and rename its generation-id based
cousin "server_find_unique()".
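
The dual behaviour (name, or ID when the argument starts with '#') can be
sketched as follows. This is a simplified illustration: find_server() and
struct srv_entry are hypothetical, and a linear scan over an array replaces
the trees used by the real code.

```c
#include <stdlib.h>
#include <string.h>

struct srv_entry { int id; const char *name; };

/* Looks up <arg> in <tab>: "#<num>" matches by ID, anything else by name.
 * Returns NULL when nothing matches. */
static const struct srv_entry *find_server(const struct srv_entry *tab,
                                           size_t n, const char *arg)
{
    size_t i;

    if (arg[0] == '#') {
        char *end;
        long id = strtol(arg + 1, &end, 10);

        if (end == arg + 1 || *end != '\0')
            return NULL;            /* '#' not followed by a plain number */
        for (i = 0; i < n; i++)
            if (tab[i].id == id)
                return &tab[i];
        return NULL;
    }
    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, arg) == 0)
            return &tab[i];
    return NULL;
}
```

A caller that only ever has a plain name does not need the '#' branch,
which is the reason a separate name-only lookup remains useful.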
2025-07-15 10:30:28 +02:00
Willy Tarreau
5e78ab33cd MINOR: server: use the tree to look up the server name in findserver()
Let's just use the tree-based lookup instead of walking through the list.
This function is used to find duplicates in "track" statements and a few
such places, so it's important not to waste too much time on large setups.
2025-07-15 10:30:27 +02:00
Willy Tarreau
12a6a3bb3f REORG: server: move findserver() from proxy.c to server.c
The reason this function was overlooked is that it had mostly equivalent
ones in server.c, let's move them together.
2025-07-15 10:30:27 +02:00
Willy Tarreau
732cd0dfa2 CLEANUP: server: do not check for duplicates anymore in findserver()
findserver() used to check for duplicate server names. These are no
longer accepted in 3.3 so let's get rid of that test and simplify the
code. Note that the function still only uses the list instead of the
tree.
2025-07-15 10:30:27 +02:00
Willy Tarreau
d4d72e2303 [RELEASE] Released version 3.3-dev3
Released version 3.3-dev3 with the following main changes :
    - BUG/MINOR: quic-be: Wrong retry_source_connection_id check
    - MEDIUM: sink: change the sink mode type to PR_MODE_SYSLOG
    - MEDIUM: server: move _srv_check_proxy_mode() checks from server init to finalize
    - MINOR: server: move send-proxy* incompatibility check in _srv_check_proxy_mode()
    - MINOR: mailers: warn if mailers are configured but not actually used
    - BUG/MEDIUM: counters/server: fix server and proxy last_change mixup
    - MEDIUM: server: add and use a separate last_change variable for internal use
    - MEDIUM: proxy: add and use a separate last_change variable for internal use
    - MINOR: counters: rename last_change counter to last_state_change
    - MINOR: ssl: check TLS1.3 ciphersuites again in clienthello with recent AWS-LC
    - BUG/MEDIUM: hlua: Forbid any L6/L7 sample fetche functions from lua services
    - BUG/MEDIUM: mux-h2: Properly handle connection error during preface sending
    - BUG/MINOR: jwt: Copy input and parameters in dedicated buffers in jwt_verify converter
    - DOC: Fix 'jwt_verify' converter doc
    - MINOR: jwt: Rename pkey to pubkey in jwt_cert_tree_entry struct
    - MINOR: jwt: Remove unused parameter in convert_ecdsa_sig
    - MAJOR: jwt: Allow certificate instead of public key in jwt_verify converter
    - MINOR: ssl: Allow 'commit ssl cert' with no privkey
    - MINOR: ssl: Prevent delete on certificate used by jwt_verify
    - REGTESTS: jwt: Add test with actual certificate passed to jwt_verify
    - REGTESTS: jwt: Test update of certificate used in jwt_verify
    - DOC: 'jwt_verify' converter now supports certificates
    - REGTESTS: restrict execution to a single thread group
    - MINOR: ssl: Introduce new smp_client_hello_parse() function
    - MEDIUM: stats: add persistent state to typed output format
    - BUG/MINOR: httpclient: wrongly named httpproxy flag
    - MINOR: ssl/ocsp: stop using the flags from the httpclient CLI
    - MEDIUM: httpclient: split the CLI from the actual httpclient API
    - MEDIUM: httpclient: implement a way to use directly htx data
    - MINOR: httpclient/cli: add --htx option
    - BUILD: dev/phash: remove the accidentally committed a.out file
    - BUG/MINOR: ssl: crash in ssl_sock_io_cb() with SSL traces and idle connections
    - BUILD/MEDIUM: deviceatlas: fix when installed in custom locations.
    - DOC: deviceatlas build clarifications
    - BUG/MINOR: ssl/ocsp: fix definition discrepancies with ocsp_update_init()
    - MINOR: proto-tcp: Add support for TCP MD5 signature for listeners and servers
    - BUILD: cfgparse-tcp: Add _GNU_SOURCE for TCP_MD5SIG_MAXKEYLEN
    - BUG/MINOR: proto-tcp: Take care to initialized tcp_md5sig structure
    - BUG/MINOR: http-act: Fix parsing of the expression argument for pause action
    - MEDIUM: httpclient: add a Content-Length when the payload is known
    - CLEANUP: ssl: Rename ssl_trace-t.h to ssl_trace.h
    - MINOR: pattern: add a counter of added/freed patterns
    - CI: set DEBUG_STRICT=2 for coverity scan
    - CI: enable USE_QUIC=1 for OpenSSL versions >= 3.5.0
    - CI: github: add an OpenSSL 3.5.0 job
    - CI: github: update the stable CI to ubuntu-24.04
    - BUG/MEDIUM: quic: SSL/TCP handshake failures with OpenSSL 3.5
    - CI: github: update to OpenSSL 3.5.1
    - BUG/MINOR: quic: Missing TLS 1.3 QUIC cipher suites and groups inits (OpenSSL 3.5 QUIC API)
    - BUG/MINOR: quic-be: Malformed coalesced Initial packets
    - MINOR: quic: Prevent QUIC backend use with the OpenSSL QUIC compatibility module (USE_OPENSS_COMPAT)
    - MINOR: reg-tests: first QUIC+H3 reg tests (QUIC address validation)
    - MINOR: quic-be: Set the backend alpn if not set by conf
    - MINOR: quic-be: TLS version restriction to 1.3
    - MINOR: cfgparse: enforce QUIC MUX compat on server line
    - MINOR: server: support QUIC for dynamic servers
    - CI: github: skip a ssl library version when latest is already in the list
    - MEDIUM: resolvers: switch dns-accept-family to "auto" by default
    - BUG/MINOR: resolvers: don't lower the case of binary DNS format
    - MINOR: resolvers: do not duplicate the hostname_dn field
    - MINOR: proto-tcp: Register a feature to report TCP MD5 signature support
    - BUG/MINOR: listener: really assign distinct IDs to shards
    - MINOR: quic: Prevent QUIC build with OpenSSL 3.5 new QUIC API version < 3.5.1
    - BUG/MEDIUM: quic: Crash after QUIC server callbacks restoration (OpenSSL 3.5)
    - REGTESTS: use two haproxy instances to distinguish the QUIC traces
    - BUG/MEDIUM: http-client: Don't wake http-client applet if nothing was xferred
    - BUG/MEDIUM: http-client: Properly inc input data when HTX blocks are xferred
    - BUG/MEDIUM: http-client: Ask for more room when request data cannot be xferred
    - BUG/MEDIUM: http-client: Test HTX_FL_EOM flag before commiting the HTX buffer
    - BUG/MINOR: http-client: Ignore 1XX interim responses in non-HTX mode
    - BUG/MINOR: http-client: Reject any 101-switching-protocols response
    - BUG/MEDIUM: http-client: Drain the request if an early response is received
    - BUG/MEDIUM: http-client: Notify applet has more data to deliver until the EOM
    - BUG/MINOR: h3: fix https scheme request encoding for BE side
    - MINOR: h1-htx: Add function to format an HTX message in its H1 representation
    - BUG/MINOR: mux-h1: Use configured error files if possible for early H1 errors
    - BUG/MINOR: h1-htx: Don't forget to init flags in h1_format_htx_msg function
    - CLEANUP: assorted typo fixes in the code, commits and doc
    - BUILD: adjust scripts/build-ssl.sh to modern CMake system of QuicTLS
    - MINOR: debug: add distro name and version in postmortem
2025-07-11 16:45:50 +02:00
Valentine Krasnobaeva
0c63883be1 MINOR: debug: add distro name and version in postmortem
Since 2012, systemd compliant distributions contain
/etc/os-release file. This file has some standardized format, see details at
https://www.freedesktop.org/software/systemd/man/latest/os-release.html.

Let's read it in feed_post_mortem_linux() to gather more info about the
distribution.
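
For illustration, reading one field of this format could look like the
sketch below. The helper names are hypothetical and this is not the code of
feed_post_mortem_linux(). It only handles plain and double-quoted values,
while the os-release format also allows single quotes and escapes.

```c
#include <stdio.h>
#include <string.h>

/* Extracts the value of <key> from one "KEY=value" line into <out>,
 * dropping the trailing newline and surrounding double quotes.
 * Returns 1 on match, 0 otherwise. */
static int os_release_get(const char *line, const char *key,
                          char *out, size_t outsz)
{
    size_t klen = strlen(key);
    size_t len;

    if (outsz == 0 || strncmp(line, key, klen) != 0 || line[klen] != '=')
        return 0;
    line += klen + 1;
    len = strcspn(line, "\n");
    if (len >= 2 && line[0] == '"' && line[len - 1] == '"') {
        line++;
        len -= 2;
    }
    if (len >= outsz)
        len = outsz - 1;            /* truncate rather than overflow */
    memcpy(out, line, len);
    out[len] = '\0';
    return 1;
}

/* Scans the file at <path> for <key>; returns 1 if found, 0 otherwise. */
static int os_release_read(const char *path, const char *key,
                           char *out, size_t outsz)
{
    char line[256];
    FILE *f = fopen(path, "r");
    int found = 0;

    if (!f)
        return 0;
    while (!found && fgets(line, sizeof(line), f))
        found = os_release_get(line, key, out, outsz);
    fclose(f);
    return found;
}
```

A typical call would be os_release_read("/etc/os-release", "PRETTY_NAME",
buf, sizeof(buf)), which simply returns 0 on systems without this file.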

(cherry picked from commit f1594c41368baf8f60737b229e4359fa7e1289a9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
2025-07-11 11:48:19 +02:00
Ilia Shipitsin
1888991e12 BUILD: adjust scripts/build-ssl.sh to modern CMake system of QuicTLS
QuicTLS in master branch has migrated to CMake, let's adapt the script to
it. Previous OpenSSL+QuicTLS patch is built as usual.
2025-07-11 05:04:31 +02:00
Ilia Shipitsin
0ee3d739b8 CLEANUP: assorted typo fixes in the code, commits and doc
Corrected various spelling and phrasing errors to improve clarity and consistency.
2025-07-10 19:49:48 +02:00
Christopher Faulet
516dfe16ff BUG/MINOR: h1-htx: Don't forget to init flags in h1_format_htx_msg function
The regression was introduced by commit 187ae28 ("MINOR: h1-htx: Add
function to format an HTX message in its H1 representation"). We must be
sure the flags variable is initialized in the h1_format_htx_msg() function.

This patch must be backported with the commit above.
2025-07-10 14:10:42 +02:00
Christopher Faulet
d252ec2beb BUG/MINOR: mux-h1: Use configured error files if possible for early H1 errors
The H1 multiplexer is able to produce some errors on its own to report early
errors, before the stream is created. In that case, the error files of the
proxy were tested to detect empty files (or /dev/null) but they were not
used to produce the error itself.

But the documentation states that configured error files are used in all
cases. And in fact, it is not really a problem to use these files. We must
just format a full HTX message. Thanks to the previous patch, it is now
possible.

This patch should fix the issue #3032. It should be backported to 3.2. For
older versions, it must be discussed but it should be quite easy to do.
2025-07-10 10:29:49 +02:00
Christopher Faulet
187ae28cf4 MINOR: h1-htx: Add function to format an HTX message in its H1 representation
The function h1_format_htx_msg() can now be used to convert a valid HTX
message in its H1 representation. No validity test is performed, the HTX
message must be valid. Only trailers are silently ignored if the message is
not chunked. In addition, the destination buffer must be empty. 1XX interim
responses should be supported. But again, there are no validity tests.
2025-07-10 10:29:49 +02:00
Amaury Denoyelle
378c182192 BUG/MINOR: h3: fix https scheme request encoding for BE side
An HTTP/3 request must contain the :scheme pseudo-header. Currently, only
"https" value is expected due to QUIC transport layer in use.

However, https value is incorrectly encoded due to a QPACK index value
mismatch in qpack_encode_scheme(). Fix it to ensure that scheme is now
properly set for HTTP/3 requests on the backend side.

No need to backport this.
2025-07-09 17:41:34 +02:00
Christopher Faulet
0b97bf36fa BUG/MEDIUM: http-client: Notify applet has more data to deliver until the EOM
When we leave the I/O handler with an unfinished request, we must report the
applet has more data to deliver. Otherwise, when the channel request buffer
is emptied, the http-client applet is not always woken up to forward the
remaining request data.

This issue was probably revealed by commit "BUG/MEDIUM: http-client: Don't
wake http-client applet if nothing was xferred". It is only an issue with
large POSTs, when the payload is streamed.

This patch must be backported as far as 2.6 with the commit above. But on
older versions, the applet API may differ. So be careful.
2025-07-09 16:27:24 +02:00
Christopher Faulet
25b0625d5c BUG/MEDIUM: http-client: Drain the request if an early response is received
When a large request is sent, it is possible to have a response before the
end of the request. It is valid from HTTP perspective but it is an issue
with the current design of the http-client. Indeed, the request and the
response are handled sequentially. So the response will be blocked, waiting
for the end of the request. Most of the time, it is not an issue, except when
the request transfer is blocked. In that case, the applet is blocked.

With the current API, it is not possible to handle early response and
continue the request transfer. So, this case cannot be handled. In that case,
it seems reasonable to drain the request if a response is received. This
way, the request transfer, from the caller point of view, is never blocked
and the response can be properly processed.

To do so, the action flag HTTPCLIENT_FA_DRAIN_REQ is added to the
http-client. When it is set, the request payload is just dropped. In that
case, we take care to not report the end of input to properly report the
request was truncated, especially in logs.

It is only an issue with large POSTs, when the payload is streamed.

This patch must be backported as far as 2.6.
2025-07-09 16:27:24 +02:00
Christopher Faulet
8ba754108d BUG/MINOR: http-client: Reject any 101-switching-protocols response
Protocol upgrades are not supported by the http-client. So report an error if
a 101-switching-protocols response is received. Of course, it is unexpected
because the API is not designed to support upgrades. But it is better to
properly handle this case.

This patch could be backported as far as 2.6. It depends on the commit
"BUG/MINOR: http-client: Ignore 1XX interim responses in non-HTX mode".
2025-07-09 16:27:24 +02:00
Christopher Faulet
9d10be33ae BUG/MINOR: http-client: Ignore 1XX interim responses in non-HTX mode
When the response is re-formatted in raw message, the 1XX interim responses
must be skipped. Otherwise, information of the first interim response will
be saved (status line and headers) and those from the final response will be
dropped.

Note that for now, in HTX-mode, the interim messages are removed.

This patch must be backported as far as 2.6.
2025-07-09 16:27:24 +02:00
Christopher Faulet
4bdb2e5a26 BUG/MEDIUM: http-client: Test HTX_FL_EOM flag before commiting the HTX buffer
When htx_to_buf() function is called, if the HTX message is empty, the
buffer is reset. So HTX flags must not be tested after because the info may
be lost.

So now, we take care to test HTX_FL_EOM flag before calling htx_to_buf().

This patch must be backported as far as 2.8.
2025-07-09 16:27:24 +02:00
Christopher Faulet
e4a0d40c62 BUG/MEDIUM: http-client: Ask for more room when request data cannot be xferred
When the request payload cannot be xferred to the channel because its buffer
is full, we must request for more room by calling sc_need_room(). It is
important to be sure the httpclient applet will not be woken up in loop to
push more data while it is not possible.

It is only an issue with large POSTs, when the payload is streamed.

This patch must be backported as far as 2.6. Note that on 2.6,
sc_need_room() only takes one argument.
2025-07-09 16:27:24 +02:00
Christopher Faulet
d9ca8f6b71 BUG/MEDIUM: http-client: Properly inc input data when HTX blocks are xferred
When HTX blocks from the requests are transferred into the channel buffer,
the return value of htx_xfer_blks() function must not be used to increment
the channel input value because meta data are counted here while they are
not part of input data. Because of this bug, it is possible to forward more
data than those present in the channel buffer.

Instead, we look at the input data before and after the transfer and the
difference is added.

It is only an issue with large POSTs, when the payload is streamed.

This patch must be backported as far as 2.6.
2025-07-09 16:27:24 +02:00
Christopher Faulet
fffdac42df BUG/MEDIUM: http-client: Don't wake http-client applet if nothing was xferred
When data are transferred to or from the http-client, the applet is
systematically woken up, even when no data are transferred. This could lead
to needless wakeups. When called from a lua script, if data are blocked
for a while, this leads to a wakeup ping-pong loop where the http-client
applet is woken up by the lua script which wakes back the script.

To fix the issue, in httpclient_req_xfer() and httpclient_res_xfer()
functions, we now take care to not wake the http-client applet up when no
data are transferred.

This patch must be backported as far as 2.6.
2025-07-09 16:27:24 +02:00
Frederic Lecaille
479c9fb067 REGTESTS: use two haproxy instances to distinguish the QUIC traces
The aim of this patch is to identify the QUIC traces between the QUIC frontend
and backend parts. Two haproxy instances are created. The c(1|2) http clients
connect to ha1 with TCP frontends and QUIC backends. ha2 embeds two QUIC listeners
with s1 as TCP backend. When the traces are activated, they are dumped to stderr.
Hopefully, they are prefixed by the haproxy instance name (h1 or h2). This is very
useful to identify the QUIC instances.
2025-07-09 16:01:02 +02:00
Frederic Lecaille
45ac235baa BUG/MEDIUM: quic: Crash after QUIC server callbacks restoration (OpenSSL 3.5)
Revert this patch which is no longer useful since OpenSSL 3.5.1 to remove the
QUIC server callback restoration after SSL context switch:

    MINOR: quic: OpenSSL 3.5 internal QUIC custom extension for transport parameters reset

It was required for 3.5.0. That said, there was no CI for OpenSSL 3.5 at the date
of this commit. The CI recently revealed that the QUIC server side could crash
during QUIC reg tests just after having restored the callbacks as implemented by
the commit above.

Also revert this commit which is no longer useful because it arrived with the commit
above:

	BUG/MEDIUM: quic: SSL/TCP handshake failures with OpenSSL 3.

Must be backported to 3.2.
2025-07-09 16:01:02 +02:00
Frederic Lecaille
c01eb1040e MINOR: quic: Prevent QUIC build with OpenSSL 3.5 new QUIC API version < 3.5.1
The QUIC listener part was impacted by the 3.5.0 OpenSSL new QUIC API with several
issues which have been fixed by 3.5.1.

Add a #error to prevent such OpenSSL 3.5 new QUIC API use with version below 3.5.1.

Must be backported to 3.2.
2025-07-09 16:01:02 +02:00
Willy Tarreau
dd49f1ee62 BUG/MINOR: listener: really assign distinct IDs to shards
A fix was made in 3.0 for the case where sharded listeners were using
a same ID with commit 0db8b6034d ("BUG/MINOR: listener: always assign
distinct IDs to shards"). However, the fix is incorrect. By checking the
ID of temporary node instead of the kept one in bind_complete_thread_setup()
it ends up never inserting the used nodes at this point, thus not reserving
them. The side effect is that assigning too close IDs to subsequent
listeners results in the same ID still being assigned twice since not
reserved. Example:

   global
       nbthread 20

   frontend foo
       bind :8000 shards by-thread id 10
       bind :8010 shards by-thread id 20

The first one will start a series from 10 to 29 and the second one a
series from 20 to 39. But 20 not being inserted when creating the shards,
it will remain available for the post-parsing phase that assigns all
unassigned IDs by filling holes, and two listeners will have ID 20.

By checking the correct node, the problem disappears. The patch above
was marked for backporting to 2.6, so this fix should be backported that
far as well.
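
The effect of reserving each used ID can be sketched with a plain bitmap.
This is an assumption-laden illustration: reserve_id() and MAX_ID are
hypothetical, and the real code relies on a tree of used IDs rather than
an array.

```c
#include <stdbool.h>

#define MAX_ID 64

/* Reserves and returns the first free ID >= <from>, or -1 if none is left. */
static int reserve_id(bool used[MAX_ID], int from)
{
    int id;

    for (id = from < 0 ? 0 : from; id < MAX_ID; id++) {
        if (!used[id]) {
            used[id] = true;   /* without this step the ID stays available */
            return id;
        }
    }
    return -1;
}
```

With 20 shards requested from ID 10, IDs 10 to 29 are marked as used, so a
later request starting at 20 gets 30 in this sketch instead of a second 20.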
2025-07-09 15:52:33 +02:00
Christopher Faulet
adba8ffb49 MINOR: proto-tcp: Register a feature to report TCP MD5 signature support
"HAVE_TCP_MD5SIG" feature is now registered if TCP MD5 signature is
supported. This will help the feature detection in the reg-test script
dedicated to this feature.
2025-07-09 09:51:24 +02:00
Willy Tarreau
96da670cd7 MINOR: resolvers: do not duplicate the hostname_dn field
The hostdn.key field in the server contains a pure copy of the hostname_dn
since commit 3406766d57 ("MEDIUM: resolvers: add a ref between servers and
srv request or used SRV record") which wanted to lowercase it. Since it's
not necessary, let's drop this useless copy. In addition, the return from
strdup() was not tested, so it could theoretically crash the process under
heavy memory contention.
2025-07-08 07:54:45 +02:00
Willy Tarreau
95cf518bfa BUG/MINOR: resolvers: don't lower the case of binary DNS format
The server's "hostname_dn" is in Domain Name format, not a pure string, as
converted by resolv_str_to_dn_label(). It is made of lower-case string
components delimited by binary lengths, e.g. <0x03>www<0x07>haproxy<0x03>org.
As such it must not be lowercased again in srv_state_srv_update(), because
1) it's useless on the name components since already done, and 2) because
it would replace component lengths 97 and above by 32-char shorter ones.
Granted, not many domain names have that large components so the risk is
very low but the operation is always wrong anyway. This was brought in
2.5 by commit 3406766d57 ("MEDIUM: resolvers: add a ref between servers
and srv request or used SRV record").
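
The format can be illustrated with the sketch below. The function names are
hypothetical and this is not the code of resolv_str_to_dn_label(). It
assumes the input is already lower case and it emits no terminating zero
length.

```c
#include <string.h>

/* Converts "www.haproxy.org" into length-prefixed labels:
 * <3>www<7>haproxy<3>org. Returns the number of bytes written, or -1 on
 * error (empty label, label over 63 bytes, or not enough room). */
static int str_to_dn(const char *str, unsigned char *dn, size_t dnsz)
{
    size_t out = 0;

    while (*str) {
        size_t len = strcspn(str, ".");

        if (len == 0 || len > 63 || out + 1 + len > dnsz)
            return -1;
        dn[out++] = (unsigned char)len;   /* binary length, not a character */
        memcpy(dn + out, str, len);
        out += len;
        str += len;
        if (*str == '.')
            str++;
    }
    return (int)out;
}

/* Binary comparison: length bytes are data, so no case folding applies. */
static int dn_equal(const unsigned char *a, size_t alen,
                    const unsigned char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

Since the length bytes sit in the same buffer as the characters, any
string function that interprets bytes as text (case conversion, case
insensitive comparison) is the wrong tool for this representation.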

In the same vein, let's fix the confusing strcasecmp() that are applied
to this binary format, and use memcmp() instead. Here there's basically
no risk to incorrectly match the wrong record, but that test alone is
confusing enough to provoke the existence of the bug above.

Finally let's update the comment for that field to mention that it's
in this format and already lower cased.

Better not backport this, the risk of facing this bug is almost zero, and
every time we touch such files something breaks for bad reasons.
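
The Domain Name format described above can be sketched as follows. This is a
minimal illustration, not HAProxy's resolv_str_to_dn_label(): the names
dn_label_from_str() and dn_label_equal() are hypothetical. It lowercases only
the label characters, writes the lengths as binary bytes, and compares two
converted names with memcmp() since they are binary data, not C strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts a dotted name such as "www.haproxy.org"
 * into length-prefixed labels (<3>www<7>haproxy<3>org) followed by a
 * terminating zero byte. Only label characters are lowercased; length
 * bytes are written as-is. Returns the number of bytes written before
 * the terminator, or -1 on error (empty label, label > 63, no room).
 */
static int dn_label_from_str(const char *str, unsigned char *dn, size_t dn_size)
{
	size_t len, i, lenpos = 0;

	assert(str != NULL && dn != NULL);
	len = strlen(str);
	if (len == 0 || len + 2 > dn_size)
		return -1;

	for (i = 0; i <= len; i++) {
		if (i == len || str[i] == '.') {
			/* close the current label: store its length in front of it */
			size_t clen = i - lenpos;

			if (clen == 0 || clen > 63)
				return -1;
			dn[lenpos] = (unsigned char)clen;
			lenpos = i + 1;
		} else {
			dn[i + 1] = (unsigned char)tolower((unsigned char)str[i]);
		}
	}
	dn[len + 1] = 0;
	return (int)(len + 1);
}

/* Binary comparison: the buffers contain length bytes, so no
 * case-insensitive string comparison is applied to them.
 */
static int dn_label_equal(const unsigned char *a, int alen,
                          const unsigned char *b, int blen)
{
	return alen == blen && memcmp(a, b, (size_t)alen) == 0;
}
```

With this sketch, "WWW.HAProxy.org" and "www.haproxy.org" convert to the same
16 bytes, so no further case folding is needed once the name is converted.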
2025-07-08 07:54:45 +02:00
Willy Tarreau
54d36f3e65 MEDIUM: resolvers: switch dns-accept-family to "auto" by default
As notified in the 3.2 announcement [1], dns-accept-family needed to switch
to "auto" by default in 3.3. This is now done.

[1] https://www.mail-archive.com/haproxy@formilux.org/msg45917.html
2025-07-08 07:54:45 +02:00
William Lallemand
9e78859fb3 CI: github: skip a ssl library version when latest is already in the list
Skip the job for the "latest" libssl version when this version is the same
as one already in the list.

This avoids having 2 jobs for OpenSSL 3.5.1, since no new dev versions are
available for now and 3.5.1 is already in the list.
2025-07-07 19:46:07 +02:00
Amaury Denoyelle
42365f53e8 MINOR: server: support QUIC for dynamic servers
To properly support QUIC for dynamic servers, the "add server" CLI handler
must be extended:
* ensure conformity between server address and proto
* automatically set proto to QUIC if not specified
* prepare_srv callback must be called to initialize required SSL context

Prior to this patch, crashes may occur when trying to use QUIC with
dynamic servers.

Also, destroy_srv callback must be called when a dynamic server is
deallocated. This ensures that there is no memory leak due to SSL
context.

No need to backport.
2025-07-07 14:29:29 +02:00
Amaury Denoyelle
626cfd85aa MINOR: cfgparse: enforce QUIC MUX compat on server line
Add postparsing checks to control server line conformity regarding QUIC
both on the server address and the MUX protocol. An error is reported in
the following case :
* proto quic is explicitly specified but server address does not
  specify quic4/quic6 prefix
* another proto is explicitly specified but server address uses
  quic4/quic6 prefix
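
The two error cases above can be sketched as a small check. This is only an
illustration under assumptions: check_quic_server() and addr_is_quic() are
hypothetical names, the address prefix is assumed to be written "quic4@" or
"quic6@", and a missing "proto" keyword is represented by a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the address string starts with a quic4@/quic6@ prefix
 * (assumed spelling of the prefix for this sketch).
 */
static int addr_is_quic(const char *addr)
{
	return strncmp(addr, "quic4@", 6) == 0 ||
	       strncmp(addr, "quic6@", 6) == 0;
}

/* Hypothetical post-parsing check. <proto> is NULL when no "proto"
 * keyword was set on the server line. Returns NULL when the line is
 * consistent, or a static error message otherwise.
 */
static const char *check_quic_server(const char *addr, const char *proto)
{
	int quic_addr, quic_proto;

	assert(addr != NULL);
	if (proto == NULL)
		return NULL; /* nothing explicit to compare against */

	quic_addr = addr_is_quic(addr);
	quic_proto = strcmp(proto, "quic") == 0;

	if (quic_proto && !quic_addr)
		return "proto quic requires a quic4/quic6 address prefix";
	if (!quic_proto && quic_addr)
		return "quic4/quic6 address is incompatible with this proto";
	return NULL;
}
```

Only explicit "proto" values are compared here, which mirrors the two cases
listed in the commit message.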
2025-07-07 14:29:24 +02:00
Frederic Lecaille
e76f1ad171 MINOR: quic-be: TLS version restriction to 1.3
This patch skips the TLS version settings. They have as a side effect to add
all the TLS version extensions to the ClientHello message (TLS 1.0 to TLS 1.3).
QUIC supports only TLS 1.3.
2025-07-07 14:13:02 +02:00
Frederic Lecaille
93a94ba87b MINOR: quic-be: Set the backend alpn if not set by conf
Simply set the alpn string to "h3,hq_interop" if there is no "alpn" setting for
QUIC backends.
2025-07-07 14:13:02 +02:00
Frederic Lecaille
a9b5a2eb90 MINOR: reg-tests: first QUIC+H3 reg tests (QUIC address validation)
First simple VTC file for QUIC reg tests. Two listeners are configured, one with
Retry enabled and the other without. Two clients simply try to connect to these
listeners to make a basic H3 request.
2025-07-07 14:13:02 +02:00
Frederic Lecaille
5a87f4673a MINOR: quic: Prevent QUIC backend use with the OpenSSL QUIC compatibility module (USE_OPENSS_COMPAT)
Make the server line parsing fail when a QUIC backend is configured if haproxy
is built to use the OpenSSL stack compatibility module. The latter does not
support the QUIC client part.
2025-07-07 14:13:02 +02:00
Frederic Lecaille
87ada46f38 BUG/MINOR: quic-be: Malformed coalesced Initial packets
This bug fix completes this patch which was not sufficient:

   MINOR: quic-be: Allow sending 1200 bytes Initial datagrams

That patch did not allow building well-formed Initial packets coalesced with
other (Handshake) packets. Indeed, the <padding> parameter passed to qc_build_pkt()
is deduced from a first value: <padding> value and must be set to 1 for
the last encryption level. As a client, the last encryption level is always
the Handshake encryption level. But <padding> was always set to 1 for a QUIC
client, leading the first Initial packet to be malformed because considered
as the second one into the same datagram.

So, this patch sets <padding> value passed to qc_build_pkt() to 1 only when there
is no last encryption level at all, to allow the build of Initial only packets
(not coalesced) or when it frames to send (coalesced packets).

No need to backport.
2025-07-07 14:13:02 +02:00
Frederic Lecaille
6aebca7f2c BUG/MINOR: quic: Missing TLS 1.3 QUIC cipher suites and groups inits (OpenSSL 3.5 QUIC API)
This bug impacts both QUIC backends and frontends with OpenSSL 3.5 as QUIC API.

The connections to a haproxy QUIC listener from a haproxy QUIC backend could not
work at all without HelloRetryRequest TLS messages emitted by the backend
asking the QUIC client to restart the handshake followed by TLS alerts:

    conn. @(nil) OpenSSL error[0xa000098] read_state_machine: excessive message size

Furthermore, the Initial CRYPTO data sent by the client (ClientHello TLS message)
were big (about two 1252-byte packets). Packet analysis showed a long key_share
extension (more than 1 KB) with <unknown> as value. This extension relates to
the groups but does not belong to the groups supported by QUIC.

That said such connections could work with ngtcp2 as backend built against the same
OSSL TLS stack API but with a HelloRetryRequest.

ngtcp2 always set the QUIC default cipher suites and group, for all the stacks it
supports as implemented by this patch.

So this patch configures both QUIC backend and frontend cipher suites and groups
calling SSL_CTX_set_ciphersuites() and SSL_CTX_set1_groups_list() with the correct
argument, except for SSL_CTX_set1_groups_list() which fails with QUIC TLS for
an unknown reason at this time.

The call to SSL_CTX_set_options() is useless from ssl_quic_initial_ctx() for the QUIC
clients. One relies on ssl_sock_prepare_srv_ssl_ctx() to set them from now on.

This patch is effective for all the supported stacks without impact for AWS-LC
and QUIC TLS, and fixes the connections for haproxy QUIC frontends and backends
when built against the OpenSSL 3.5 QUIC API.

A new define HAVE_OPENSSL_QUICTLS has been added to openssl-compat.h to distinguish
the QUIC TLS stack.

Must be backported to 3.2.
2025-07-07 14:13:02 +02:00
William Lallemand
0efbe6da88 CI: github: update to OpenSSL 3.5.1
Update the OpenSSL 3.5 job to 3.5.1.

This must be backported to 3.2.
2025-07-07 13:58:38 +02:00
Frederic Lecaille
fb0324eb09 BUG/MEDIUM: quic: SSL/TCP handshake failures with OpenSSL 3.5
This bug arrived with this commit:

    MINOR: quic: OpenSSL 3.5 internal QUIC custom extension for transport parameters reset

To make QUIC connection succeed with OpenSSL 3.5 API, a call to quic_ssl_set_tls_cbs()
was needed from several callbacks which call SSL_set_SSL_CTX(). This has as side effect
to set the QUIC callbacks used by the OpenSSL 3.5 API.

But quic_ssl_set_tls_cbs() was also called for TCP sessions leading the SSL stack
to run QUIC code, if the QUIC support is enabled.

To fix this, simply ignore the TCP connections inspecting the <ssl_qc_app_data_index>
index value which is NULL for such connections.

Must be backported to 3.2.
2025-07-07 12:01:22 +02:00
William Lallemand
d0bd0595da CI: github: update the stable CI to ubuntu-24.04
Update the stable CI to ubuntu-24.04.

Must be backported to 3.2.
2025-07-07 09:29:33 +02:00
William Lallemand
b6fec27ef6 CI: github: add an OpenSSL 3.5.0 job
Add an OpenSSL 3.5.0 job to test USE_QUIC.

This must be backported to 3.2.
2025-07-07 09:27:17 +02:00
Ilia Shipitsin
d8c867a1e6 CI: enable USE_QUIC=1 for OpenSSL versions >= 3.5.0
OpenSSL 3.5.0 introduced experimental support for QUIC. This change enables the use_quic option when a compatible version of OpenSSL is detected, allowing QUIC-based functionality to be leveraged where applicable. Feature remains disabled for earlier versions to ensure compatibility.
2025-07-07 09:02:11 +02:00
Ilia Shipitsin
198d422a31 CI: set DEBUG_STRICT=2 for coverity scan
enabling DEBUG_STRICT=2 will enable BUG_ON_HOT() and help coverity
in bug detection

for the reference: https://github.com/haproxy/haproxy/issues/3008
2025-07-06 08:17:37 +02:00
Willy Tarreau
573143e0c8 MINOR: pattern: add a counter of added/freed patterns
Patterns are allocated when loading maps/acls from a file or dynamically
via the CLI, and are released only from the CLI (e.g. "clear map xxx").
These ones do not use pools and are much harder to monitor, e.g. in case
a script adds many and forgets to clear them, etc.

Let's add a new pair of metrics "PatternsAdded" and "PatternsFreed" that
will report the number of added and freed patterns respectively. This
can allow to simply graph both. The difference between the two normally
represents the number of allocated patterns. If Added grows without
Freed following, it can indicate a faulty script that doesn't perform
the needed cleanup. The metrics are also made available to Prometheus
as patterns_added_total and patterns_freed_total respectively.
2025-07-05 00:12:45 +02:00
Remi Tricot-Le Breton
a075d6928a CLEANUP: ssl: Rename ssl_trace-t.h to ssl_trace.h
This header does not actually contain any structures so it's best to
remove the '-t' from the name for better consistency.
2025-07-04 15:21:50 +02:00
William Lallemand
f07f0ee21c MEDIUM: httpclient: add a Content-Length when the payload is known
This introduces a change of behavior in the httpclient API. When
generating a request with a payload buffer, the size of the buffer
payload is known and does not need to be streamed in chunks.

This patch forces the payload buffer to be sent with a Content-Length header
in the request, however the behavior does not change if a callback is
still used instead of a buffer.
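
The framing decision can be sketched like this. It is a simplified
illustration, not the httpclient code: hc_framing_header() is a hypothetical
name, and the callback case is assumed here to use chunked transfer encoding
since the payload size is not known in advance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the header describing how the request
 * body is framed. A non-NULL <payload> means the whole body is already
 * in a buffer of <payload_len> bytes, so its size can be announced.
 * Returns the number of characters written, or -1 if <out> is too small.
 */
static int hc_framing_header(char *out, size_t out_size,
                             const void *payload, size_t payload_len)
{
	int ret;

	assert(out != NULL);
	if (payload != NULL)
		ret = snprintf(out, out_size, "Content-Length: %zu\r\n", payload_len);
	else
		ret = snprintf(out, out_size, "Transfer-Encoding: chunked\r\n");

	if (ret < 0 || (size_t)ret >= out_size)
		return -1;
	return ret;
}
```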
2025-07-04 15:21:50 +02:00
Christopher Faulet
5da4da0bb6 BUG/MINOR: http-act: Fix parsing of the expression argument for pause action
When the "pause" action is parsed, if an expression is used instead of a
static value, the position of the current argument after the expression
evaluation is incremented while it should not. The sample_parse_expr()
function already takes care of it. However, it should still be incremented
when a time value was parsed.

This patch must be backported to 3.2.
2025-07-04 14:38:32 +02:00
Christopher Faulet
3cc5991c9b BUG/MINOR: proto-tcp: Take care to initialized tcp_md5sig structure
When the TCP MD5 signature is enabled, on a listening socket or an outgoing
one, the tcp_md5sig structure must be initialized first.

It is a 3.3-specific issue. No backport needed.
2025-07-04 08:32:06 +02:00
Christopher Faulet
45cb232062 BUILD: cfgparse-tcp: Add _GNU_SOURCE for TCP_MD5SIG_MAXKEYLEN
It is required for the musl library to be sure TCP_MD5SIG_MAXKEYLEN is
defined and avoid build errors.
2025-07-03 16:30:15 +02:00
Christopher Faulet
5232df57ab MINOR: proto-tcp: Add support for TCP MD5 signature for listeners and servers
This patch adds the support for the RFC2385 (Protection of BGP Sessions via
the TCP MD5 Signature Option) for the listeners and the servers. The
feature is only available on Linux. Keywords are not exposed otherwise.

By setting "tcp-md5sig <password>" option on a bind line, TCP segments of
all connections instantiated from the listening socket will be signed with a
16-byte MD5 digest. The same option can be set on a server line to protect
outgoing connections to the corresponding server.

The primary use case for this option is to allow BGP to protect itself
against the introduction of spoofed TCP segments into the connection
stream. But it can be useful for any very long-lived TCP connections.

A reg-test was added and it will be executed only on linux. All other
targets are excluded.
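
On Linux, this option is set through the TCP_MD5SIG socket option and a
struct tcp_md5sig. The sketch below shows one way to fill that structure for
an IPv4 peer and apply it. The function names are hypothetical and this is not
the code from the patch. It only illustrates the kernel interface, including
zeroing the structure before use.

```c
#define _GNU_SOURCE /* makes sure TCP_MD5SIG_MAXKEYLEN is exposed */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: prepares <sig> for the IPv4 peer <peer> with the
 * shared secret <key>. The structure is fully zeroed first so that no
 * uninitialized field reaches the kernel. Returns 0 on success, -1 if
 * the key is empty or longer than TCP_MD5SIG_MAXKEYLEN.
 */
static int md5sig_fill(struct tcp_md5sig *sig, const struct sockaddr_in *peer,
                       const char *key)
{
	size_t klen;

	assert(sig != NULL && peer != NULL && key != NULL);
	klen = strlen(key);
	if (klen == 0 || klen > TCP_MD5SIG_MAXKEYLEN)
		return -1;

	memset(sig, 0, sizeof(*sig));
	memcpy(&sig->tcpm_addr, peer, sizeof(*peer));
	sig->tcpm_keylen = (unsigned short)klen;
	memcpy(sig->tcpm_key, key, klen);
	return 0;
}

/* Applies the prepared signature settings to socket <fd>. Returns the
 * setsockopt() result (0 on success, -1 with errno set otherwise).
 */
static int md5sig_apply(int fd, const struct tcp_md5sig *sig)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_MD5SIG, sig, sizeof(*sig));
}
```

Whether md5sig_apply() succeeds depends on the kernel being built with TCP MD5
signature support, so callers should be prepared for it to fail.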
2025-07-03 15:25:40 +02:00
William Lallemand
6f6c6fa4cb BUG/MINOR: ssl/ocsp: fix definition discrepancies with ocsp_update_init()
Since patch 20718f40b6 ("MEDIUM: ssl/ckch: add filename and linenum
argument to crt-store parsing"), the definition of ocsp_update_init()
and its declaration do not share the same arguments.

Must be backported to 3.2.
2025-07-03 15:14:13 +02:00
David Carlier
e7c59a7a84 DOC: deviceatlas build clarifications
Update accordingly the related documentation, removing/clarifying confusing
parts as it was more complicated than it needed to be.
2025-07-03 09:08:06 +02:00
David Carlier
0e8e20a83f BUILD/MEDIUM: deviceatlas: fix when installed in custom locations.
We are reusing DEVICEATLAS_INC/DEVICEATLAS_LIB when the DeviceAtlas
library had been compiled and installed with cmake and make install targets.
Works fine except when ldconfig is unaware of the path, thus adding
cflags/ldflags into the mix.

Ideally, to be backported down to the lowest stable branch.
2025-07-03 09:08:06 +02:00
William Lallemand
720efd0409 BUG/MINOR: ssl: crash in ssl_sock_io_cb() with SSL traces and idle connections
TRACE_ENTER is crashing in ssl_sock_io_cb() in case an idle connection is
being stolen. Indeed the function could be called with a NULL context
and dereferencing it will crash.

This patch fixes the issue by initializing ctx only once it is usable,
and moving TRACE_ENTER after the initialization.

This must be backported to 3.2.
2025-07-02 16:14:19 +02:00
Willy Tarreau
e34a0a50ae BUILD: dev/phash: remove the accidentally committed a.out file
Commit 41f28b3c53 ("DEV: phash: Update 414 and 431 status codes to phash")
accidentally committed a.out, resulting in build/checkout issues when
locally rebuilt. Let's drop it.

This should be backported to 3.1.
2025-07-02 10:55:13 +02:00
William Lallemand
0f1c206b8f MINOR: httpclient/cli: add --htx option
Use the new HTTPCLIENT_O_RES_HTX flag when using the CLI httpclient with
--htx.

It allows processing the response directly in HTX, then the htx_dump()
function is used to display a debug output.

Example:

echo "httpclient --htx GET https://haproxy.org" | socat /tmp/haproxy.sock
 htx=0x79fd72a2e200(size=16336,data=139,used=6,wrap=NO,flags=0x00000010,extra=0,first=0,head=0,tail=5,tail_addr=139,head_addr=0,end_addr=0)
		[0] type=HTX_BLK_RES_SL    - size=31     - addr=0     	HTTP/2.0 301
		[1] type=HTX_BLK_HDR       - size=15     - addr=31    	content-length: 0
		[2] type=HTX_BLK_HDR       - size=32     - addr=46    	location: https://www.haproxy.org/
		[3] type=HTX_BLK_HDR       - size=25     - addr=78    	alt-svc: h3=":443"; ma=3600
		[4] type=HTX_BLK_HDR       - size=35     - addr=103   	set-cookie: served=2:TLSv1.3+TCP:IPv4
		[5] type=HTX_BLK_EOH       - size=1      - addr=138   	<empty>
2025-07-01 16:33:38 +02:00
William Lallemand
3e05e20029 MEDIUM: httpclient: implement a way to use directly htx data
Add a HTTPCLIENT_O_RES_HTX flag which allows storing the HTX data directly
in the response buffer instead of extracting the data in raw
format.

This is useful when the data need to be reused in another request.
2025-07-01 16:31:47 +02:00
William Lallemand
2f4219ed68 MEDIUM: httpclient: split the CLI from the actual httpclient API
This patch splits the httpclient code to prevent confusion between the
httpclient CLI command and the actual httpclient API.

Indeed there was a confusion between the flag used internally by the
CLI command, and the actual httpclient API.

hc_cli_* functions as well as HC_C_F_* defines were moved to
httpclient_cli.c.
2025-07-01 15:46:04 +02:00
William Lallemand
149f6a4879 MINOR: ssl/ocsp: stop using the flags from the httpclient CLI
The ocsp-update uses the flags from the httpclient CLI, which are not
supposed to be used elsewhere since this is a state for the CLI.

This patch implements HC_OCSP flags for the ocsp-update.
2025-07-01 15:46:04 +02:00
William Lallemand
519abefb57 BUG/MINOR: httpclient: wrongly named httpproxy flag
The HC_F_HTTPPROXY flag was wrongly named and does not use the correct
value, indeed this flag was meant to be used for the httpclient API, not
the httpclient CLI.

This patch fixes the problem by introducing HTTPCLIENT_FO_HTTPPROXY
which must be set in hc->flags.

Also add a member 'options' in the httpclient structure, because the
member flags is reinitialized when starting.

Must be backported as far as 3.0.
2025-07-01 14:47:52 +02:00
Aurelien DARRAGON
747a812066 MEDIUM: stats: add persistent state to typed output format
Add a fourth character to the second column of the "typed output format"
to indicate whether the value results from a volatile or persistent metric
('V' or 'P' characters respectively). A persistent metric means the value
could possibly be preserved across reloads by leveraging a shared memory
between multiple co-processes. Such metrics are identified as "shared" in
the code (since they are possibly shared between multiple co-processes).

Some reg-tests were updated to take that change into account, also, some
outputs in the configuration manual were updated to reflect current
behavior.
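
A consumer of the typed output could read the new character as sketched
below. The function and enum names are hypothetical. The first three
characters of the column are treated as opaque here, and a column with fewer
than four characters (older output) is reported as unknown.

```c
#include <assert.h>
#include <string.h>

enum stat_persistence {
	STAT_PERSIST_UNKNOWN = 0, /* no fourth character, or unexpected value */
	STAT_VOLATILE,            /* 'V' */
	STAT_PERSISTENT           /* 'P' */
};

/* Hypothetical parser for the second column of the typed output
 * format: only the fourth character is inspected.
 */
static enum stat_persistence typed_field_persistence(const char *tags)
{
	assert(tags != NULL);
	if (strlen(tags) < 4)
		return STAT_PERSIST_UNKNOWN;

	switch (tags[3]) {
	case 'V':
		return STAT_VOLATILE;
	case 'P':
		return STAT_PERSISTENT;
	default:
		return STAT_PERSIST_UNKNOWN;
	}
}
```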
2025-07-01 14:15:03 +02:00
Mariam John
bd076f8619 MINOR: ssl: Introduce new smp_client_hello_parse() function
In this patch we introduce a new helper function called `smp_client_hello_parse()` to extract
information presented in a TLS client hello handshake message. 7 sample fetches have also been
modified to use this helper function to do the common client hello parsing and use the result
to do further processing of extensions/cipher.

Fixes: #2532
2025-07-01 11:55:36 +02:00
Willy Tarreau
48d5ef363d REGTESTS: restrict execution to a single thread group
When threads are enabled and running on a machine with multiple CCX
or multiple nodes, thread groups are now enabled since 3.3-dev2, causing
load-balancing algorithms to randomly fail due to incoming connections
spreading over multiple groups and using different load balancing indexes.

Let's just force "thread-groups 1" into all configs when threads are
enabled to avoid this.
2025-06-30 18:54:35 +02:00
Remi Tricot-Le Breton
94d750421c DOC: 'jwt_verify' converter now supports certificates
The 'jwt_verify' converter can now accept certificates as a second
parameter, which can be updated via the CLI.
2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton
db5ca5a106 REGTESTS: jwt: Test update of certificate used in jwt_verify
Using certificates in the jwt_verify converter allows to make use of the
CLI certificate updates, which is still impossible with public keys (the
legacy behavior).
2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton
663ba093aa REGTESTS: jwt: Add test with actual certificate passed to jwt_verify
The jwt_verify can now take public certificates as second parameter,
either with actual certificate path (not previously mentioned) or from a
predefined crt-store or from a variable.
2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton
093a3ad7f2 MINOR: ssl: Prevent delete on certificate used by jwt_verify
A ckch_store used in JWT verification might not have any ckch instances
or crt-list entries linked but we don't want to be able to remove it via
the CLI anyway since it would make all future jwt_verify calls using
this certificate fail.
2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton
31955e6e0a MINOR: ssl: Allow 'commit ssl cert' with no privkey
The ckch_stores might be used to store public certificates only so in
this case we won't provide private keys when updating the certificate
via the CLI.
If the ckch_store is actually used in a bind or server line an error
will still be raised if the private key is missing.
2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton
522bca98e1 MAJOR: jwt: Allow certificate instead of public key in jwt_verify converter
The 'jwt_verify' converter could only be passed public keys as second
parameter instead of full-on public certificates. This patch allows
proper certificates to be used.
Those certificates can be loaded in ckch_stores like any other
certificate which means that all the certificate-related operations that
can be made via the CLI can now benefit JWT validation as well.

We now have two ways JWT validation can work, the legacy one which only
relies on public keys which could not be stored in ckch_stores without
some in depth changes in the way the ckch_stores are built. In this
legacy way, the public keys are fully stored in a cache dedicated to JWT
only which does not have any CLI commands and any way to update them
during runtime. It also requires that all the public keys used are
passed at least once explicitly to the 'jwt_verify' converter so that
they can be loaded during init.
The new way uses actual certificates, either already stored in the
ckch_store tree (if predefined in a crt-store or already used previously
in the configuration) or loaded in the ckch_store tree during init if
they are explicitly used in the configuration like so:
    var(txn.bearer),jwt_verify(txn.jwt_alg,"cert.pem")

When using a variable (or any other way that can only be resolved during
runtime) in place of the converter's <key> parameter, the first time we
encounter a new value (for which we don't have any entry in the jwt
tree) we will lock the ckch_store tree and try to perform a lookup in
it. If the lookup fails, an entry will still be inserted into the jwt
tree so that any following call with this value avoids performing the
ckch_store tree lookup.
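
The runtime lookup described in the last paragraph is a form of negative
caching, which can be sketched as follows. Everything here is hypothetical: a
small array stands in for the jwt tree, demo_store_lookup() stands in for the
ckch_store tree lookup, and the locking mentioned above is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define JWT_CACHE_MAX 16

struct jwt_entry {
	char name[64];
	int usable; /* 1: certificate found in the store, 0: lookup failed */
};

struct jwt_cache {
	struct jwt_entry e[JWT_CACHE_MAX];
	size_t count;
	unsigned long store_lookups; /* number of slow-path lookups performed */
};

/* Stand-in for the certificate store: only "cert.pem" exists. */
static int demo_store_lookup(const char *name)
{
	return strcmp(name, "cert.pem") == 0;
}

/* Returns 1 if <name> refers to a usable certificate. The store is only
 * consulted the first time a value is seen; the result is cached even
 * when the lookup fails, so later calls with the same value stay cheap.
 */
static int jwt_cert_usable(struct jwt_cache *c, const char *name,
                           int (*lookup)(const char *))
{
	size_t i;
	int found;

	assert(c != NULL && name != NULL && lookup != NULL);
	for (i = 0; i < c->count; i++)
		if (strcmp(c->e[i].name, name) == 0)
			return c->e[i].usable;

	/* slow path: a real implementation would take the store lock here */
	c->store_lookups++;
	found = lookup(name);

	if (c->count < JWT_CACHE_MAX && strlen(name) < sizeof(c->e[0].name)) {
		strcpy(c->e[c->count].name, name);
		c->e[c->count].usable = found;
		c->count++;
	}
	return found;
}
```

Calling jwt_cert_usable() twice with an unknown name performs a single store
lookup, which is the point of inserting an entry even on failure.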
2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton
6e9f886c4d MINOR: jwt: Remove unused parameter in convert_ecdsa_sig
The pubkey parameter in convert_ecdsa_sig was not actually used.
2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton
cd89ce1766 MINOR: jwt: Rename pkey to pubkey in jwt_cert_tree_entry struct
Rename the jwt_cert_tree_entry member pkey to pubkey to avoid any
confusion between private and public key.
2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton
5c3d0a554b DOC: Fix 'jwt_verify' converter doc
Contrary to what the doc says, the jwt_verify converter only works with
a public key and not a full certificate for certificate based protocols
(everything but HMAC).

This patch should be backported up to 2.8.
2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton
3465f88f8a BUG/MINOR: jwt: Copy input and parameters in dedicated buffers in jwt_verify converter
When resolving variable values the temporary trash chunks are used so
when calling the 'jwt_verify' converter with two variable parameters
like in the following line, the input would be overwritten by the value
of the second parameter :
    var(txn.bearer),jwt_verify(txn.jwt_alg,txn.cert)
Copying the values into dedicated alloc'ed buffers prevents any new call
to get_trash_chunk from erasing the data we need in the converter.

This patch can be backported up to 2.8.
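
The overwrite can be reproduced with a deliberately simplified model. In this
sketch, scratch_resolve() is a hypothetical stand-in that always returns the
same static buffer (the real trash chunk handling differs), and dup_resolved()
shows the fix in spirit: copy each resolved value into its own allocated
buffer before resolving the next one.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a shared scratch buffer: every call writes
 * into, and returns, the same static storage.
 */
static char *scratch_resolve(const char *value)
{
	static char scratch[128];

	assert(value != NULL);
	snprintf(scratch, sizeof(scratch), "%s", value);
	return scratch;
}

/* Resolves <value> and returns a private heap copy that later calls to
 * scratch_resolve() cannot overwrite. The caller must free() it.
 * Returns NULL on allocation failure.
 */
static char *dup_resolved(const char *value)
{
	const char *tmp = scratch_resolve(value);
	size_t len = strlen(tmp);
	char *copy = malloc(len + 1);

	if (copy != NULL)
		memcpy(copy, tmp, len + 1);
	return copy;
}
```

With the shared buffer alone, resolving the second parameter replaces the
input. With the copies, both values remain available to the converter.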
2025-06-30 17:59:55 +02:00
Christopher Faulet
5ba0a2d527 BUG/MEDIUM: mux-h2: Properly handle connection error during preface sending
On backend side, an error at connection level during the preface sending was
not properly handled and could lead to a spinning loop on process_stream()
when the h2 stream on client side was blocked, for instance because of h2
flow control.

It appeared that no transition was performed from the PREFACE state to an
ERROR state on the H2 connection when an error occurred on the underlying
connection. In that case, the H2 connection was woken up in loop to try to
receive data, waking up the upper stream at the same time.

To fix the issue, an H2C error must be reported. Most state transitions are
handled by the demux function. So it is the right place to do so. First, in
PREFACE state and on server side, if an error occurred on the TCP
connection, an error is now reported on the H2 connection. REFUSED_STREAM
error code is used in that case. In addition, in that case, we also take
care to properly handle the connection shutdown.

This patch should fix the issue #3020. It must be backported to all stable
versions.
2025-06-30 16:48:00 +02:00
Christopher Faulet
a2a142bf40 BUG/MEDIUM: hlua: Forbid any L6/L7 sample fetche functions from lua services
It was already forbidden to use HTTP sample fetch functions from lua
services. An error is triggered if it happens. However, the error must be
extended to any L6/L7 sample fetch functions.

Indeed, a lua service is an applet. It is totally unexpected for an applet to
access input data in a channel's buffer. These data have not been
analyzed yet and are still subject to any change. An applet, lua or not,
must never access "not forwarded" data. Only output data are
available. For now, if a lua applet relies on any L6/L7 sample fetch
functions, the behavior is undefined and not consistent.

So to fix the issue, hlua flag HLUA_F_MAY_USE_HTTP is renamed to
HLUA_F_MAY_USE_CHANNELS_DATA. This flag is used to prevent any lua applet to
use L6/L7 sample fetch functions.

This patch could be backported to all stable versions.
2025-06-30 16:47:59 +02:00
William Lallemand
7fc8ab0397 MINOR: ssl: check TLS1.3 ciphersuites again in clienthello with recent AWS-LC
Patch ed9b8fec49 ("BUG/MEDIUM: ssl: AWS-LC + TLSv1.3 won't do ECDSA in
RSA+ECDSA configuration") partly fixed a cipher selection problem with
AWS-LC. However, it no longer checked whether the ciphersuites were
available in haproxy, which is still a problem.

The problem was fixed in AWS-LC 1.46.0 with this PR
https://github.com/aws/aws-lc/pull/2092.

This patch allows to filter again the TLS13 ciphersuites with recent
versions of AWS-LC. However, since there are no macros to check the
AWS-LC version, it is enabled at the next AWS-LC API version change
following the fix in AWS-LC v1.50.0.

This could be backported where ed9b8fec49 was backported.
2025-06-30 16:43:51 +02:00
Aurelien DARRAGON
4fcc9b5572 MINOR: counters: rename last_change counter to last_state_change
Since proxy and server struct already have an internal last_change
variable and we cannot merge it with the shared counter one, let's
rename the last_change counter to be more specific and prevent the
mixup between the two.

last_change counter is renamed to last_state_change, and unlike the
internal last_change, this one is a shared counter so it is expected
to be updated by other processes behind our back.

However, when updating last_state_change counter, we use the value
of the server/proxy last_change as reference value.
2025-06-30 16:26:38 +02:00
Aurelien DARRAGON
5b1480c9d4 MEDIUM: proxy: add and use a separate last_change variable for internal use
Same motivation as previous commit, proxy last_change is "abused" because
it is used for 2 different purposes, one for stats, and the other one
for process-local internal use.

Let's add a separate proxy-only last_change variable for internal use,
and leave the last_change shared (and thread-grouped) counter for
statistics.
2025-06-30 16:26:31 +02:00
Aurelien DARRAGON
01dfe17acf MEDIUM: server: add and use a separate last_change variable for internal use
last_change server metric is used for 2 separate purposes. First it is
used to report last server state change date for stats and other related
metrics. But it is also used internally, including in sensitive paths,
such as lb related stuff to take decision or perform computations
(ie: in srv_dynamic_maxconn()).

Due to last_change counter now being split over thread groups since 16eb0fa
("MAJOR: counters: dispatch counters over thread groups"), reading the
aggregated value has a cost, and we cannot afford to consult last_change
value from srv_dynamic_maxconn() anymore. Moreover, since the value is
used to take decision for the current process we don't want the variable
to be updated by another process behind our back.

To prevent performance regression and sharing issues, let's instead add a
separate srv->last_change value, which is not updated atomically (given how
rare the updates are), and only serves for places where the use of the
aggregated last_change counter/stats (split over thread groups) is too
costly.
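
The split between the two values can be sketched as below. This is a toy
model with hypothetical names and a fixed number of thread groups. The
aggregated read is assumed to return the most recent per-group value, and the
atomic operations a real shared counter would need are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NB_TGROUPS 4 /* hypothetical number of thread groups */

/* Stats-oriented value, split over thread groups. */
struct shared_last {
	long per_group[NB_TGROUPS];
};

struct demo_server {
	struct shared_last shared_last_change; /* for stats, costly to read */
	long last_change;                      /* process-local, cheap to read */
};

static void demo_server_init(struct demo_server *srv)
{
	assert(srv != NULL);
	memset(srv, 0, sizeof(*srv));
}

/* Aggregated read: has to visit every thread group. */
static long shared_last_read(const struct shared_last *s)
{
	long last = 0;
	size_t i;

	for (i = 0; i < NB_TGROUPS; i++)
		if (s->per_group[i] > last)
			last = s->per_group[i];
	return last;
}

/* State change seen by thread group <tgid> at date <now>: the local
 * value is updated first and serves as reference for the shared one.
 */
static void demo_server_set_state_change(struct demo_server *srv,
                                         size_t tgid, long now)
{
	assert(srv != NULL && tgid < NB_TGROUPS);
	srv->last_change = now;
	srv->shared_last_change.per_group[tgid] = srv->last_change;
}
```

Hot paths would then read srv->last_change directly, while stats code pays
for shared_last_read().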
2025-06-30 16:26:25 +02:00
Aurelien DARRAGON
9d3c73c9f2 BUG/MEDIUM: counters/server: fix server and proxy last_change mixup
16eb0fa ("MAJOR: counters: dispatch counters over thread groups")
introduced some bugs: as a result of improper copy paste during
COUNTERS_SHARED_LAST() macro introduction, some functions such as
srv_downtime() which used to make use of the server last_change variable
now use the proxy one, which doesn't make sense and will likely cause
unexpected logical errors/bugs.

Let's fix them all at once by properly pointing to the server last_change
variable when relevant.

No backport needed.
2025-06-30 16:26:19 +02:00
Aurelien DARRAGON
837762e2ee MINOR: mailers: warn if mailers are configured but not actually used
Now that native mailers configuration is only usable with Lua mailers,
Willy noticed that we lack a way to warn the user if mailers were
previously configured on an older version but Lua mailers were not loaded,
which could trick the user into thinking mailers keep working when
transitioning to 3.2 while it is not.

In this patch we add the 'core.use_native_mailers_config()' Lua function
which should be called in Lua script body before making use of
'Proxy:get_mailers()' function to retrieve legacy mailers configuration
from haproxy main config. This way haproxy effectively knows that the
native mailers config is actually being used from Lua (which indicates
user correctly migrated from native mailers to Lua mailers), else if
mailers are configured but not used from Lua then haproxy warns the user
about the fact that they will be ignored unless they are used from Lua.
(e.g.: using the provided 'examples/lua/mailers.lua' to ease transition)
2025-06-27 16:41:18 +02:00
Aurelien DARRAGON
c7c6d8d295 MINOR: server: move send-proxy* incompatibility check in _srv_check_proxy_mode()
This way the check is executed no matter the section where the server
is declared (ie: not only under the "ring" section)
2025-06-27 16:41:13 +02:00
Aurelien DARRAGON
14d68c2ff7 MEDIUM: server: move _srv_check_proxy_mode() checks from server init to finalize
_srv_check_proxy_mode() is currently executed during server init (from
_srv_parse_init()), while it used to be fine for current checks, it
seems it occurs a bit too early to be usable for some checks that depend
on server keywords to be evaluated for instance.

As such, to make _srv_check_proxy_mode() more relevant and be extended
with additional checks in the future, let's call it later during server
finalization, once all server keywords were evaluated.

No change of behavior is expected
2025-06-27 16:41:07 +02:00
Aurelien DARRAGON
23e5f18b8e MEDIUM: sink: change the sink mode type to PR_MODE_SYSLOG
No change of behavior expected, but some compat checks will now be aware
that the proxy type is not TCP but SYSLOG instead.
2025-06-27 16:41:01 +02:00
Frederic Lecaille
1045623cb8 BUG/MINOR: quic-be: Wrong retry_source_connection_id check
This commit broke the QUIC backend connection to servers without address validation
or retry activated:

  MINOR: quic-be: address validation support implementation (RETRY)

Indeed the retry_source_connection_id transport parameter was already checked
as if it was required, as if the peer (server) was always using the address validation.
Furthermore, relying on ->odcid.len to ensure a retry token was received is not
correct.

This patch ensures the retry_source_connection_id transport parameter is checked
only when a retry token was received (->retry_token != NULL). In this case
it also checks that this transport parameter is present when a retry token
has been received (tx_params->retry_source_connection_id.len != 0).

No need to backport.
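
The corrected condition can be sketched as follows. The structure and
function names are hypothetical and only the logic stated above is modelled:
the parameter is checked only when a retry token was received, and in that
case it must be present.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the received transport parameters. */
struct demo_tx_params {
	size_t retry_scid_len; /* 0 when retry_source_connection_id is absent */
};

/* Returns 1 when the received parameters are acceptable with regard to
 * retry_source_connection_id, 0 otherwise. <retry_token> is NULL when
 * no retry token was received from the server.
 */
static int retry_scid_check(const unsigned char *retry_token,
                            const struct demo_tx_params *tp)
{
	assert(tp != NULL);
	if (retry_token == NULL)
		return 1; /* no retry token: the parameter is not required */
	return tp->retry_scid_len != 0;
}
```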
2025-06-27 07:59:12 +02:00
Willy Tarreau
299a441110 [RELEASE] Released version 3.3-dev2
Released version 3.3-dev2 with the following main changes :
    - BUG/MINOR: config/server: reject QUIC addresses
    - MINOR: server: implement helper to identify QUIC servers
    - MINOR: server: mark QUIC support as experimental
    - MINOR: mux-quic-be: allow QUIC proto on backend side
    - MINOR: quic-be: Correct Version Information transp. param encoding
    - MINOR: quic-be: Version Information transport parameter check
    - MINOR: quic-be: Call ->prepare_srv() callback at parsing time
    - MINOR: quic-be: QUIC backend XPRT and transport parameters init during parsing
    - MINOR: quic-be: QUIC server xprt already set when preparing their CTXs
    - MINOR: quic-be: Add a function for the TLS context allocations
    - MINOR: quic-be: Correct the QUIC protocol lookup
    - MINOR: quic-be: ssl_sock contexts allocation and misc adaptations
    - MINOR: quic-be: SSL sessions initializations
    - MINOR: quic-be: Add a function to initialize the QUIC client transport parameters
    - MINOR: sock: Add protocol and socket types parameters to sock_create_server_socket()
    - MINOR: quic-be: ->connect() protocol callback adaptations
    - MINOR: quic-be: QUIC connection allocation adaptation (qc_new_conn())
    - MINOR: quic-be: xprt ->init() adapatations
    - MINOR: quic-be: add field for max_udp_payload_size into quic_conn
    - MINOR: quic-be: Do not redispatch the datagrams
    - MINOR: quic-be: Datagrams and packet parsing support
    - MINOR: quic-be: Handshake packet number space discarding
    - MINOR: h3-be: Correctly retrieve h3 counters
    - MINOR: quic-be: Store asap the DCID
    - MINOR: quic-be: Build post handshake frames
    - MINOR: quic-be: Add the conn object to the server SSL context
    - MINOR: quic-be: Initial packet number space discarding.
    - MINOR: quic-be: I/O handler switch adaptation
    - MINOR: quic-be: Store the remote transport parameters asap
    - MINOR: quic-be: Missing callbacks initializations (USE_QUIC_OPENSSL_COMPAT)
    - MINOR: quic-be: Make the secret derivation works for QUIC backends (USE_QUIC_OPENSSL_COMPAT)
    - MINOR: quic-be: SSL_get_peer_quic_transport_params() not defined by OpenSSL 3.5 QUIC API
    - MINOR: quic-be: get rid of ->li quic_conn member
    - MINOR: quic-be: Prevent the MUX to send/receive data
    - MINOR: quic: define proper proto on QUIC servers
    - MEDIUM: quic-be: initialize MUX on handshake completion
    - BUG/MINOR: hlua: Don't forget the return statement after a hlua_yieldk()
    - BUILD: hlua: Fix warnings about uninitialized variables
    - BUILD: listener: fix 'for' loop inline variable declaration
    - BUILD: hlua: Fix warnings about uninitialized variables (2)
    - BUG/MEDIUM: mux-quic: adjust wakeup behavior
    - MEDIUM: backend: delay MUX init with ALPN even if proto is forced
    - MINOR: quic: mark ctrl layer as ready on quic_connect_server()
    - MINOR: mux-quic: improve documentation for snd/rcv app-ops
    - MINOR: mux-quic: define flag for backend side
    - MINOR: mux-quic: set expect data only on frontend side
    - MINOR: mux-quic: instantiate first stream on backend side
    - MINOR: quic: wakeup backend MUX on handshake completed
    - MINOR: hq-interop: decode response into HTX for backend side support
    - MINOR: hq-interop: encode request from HTX for backend side support
    - CLEANUP: quic-be: Add comments about qc_new_conn() usage
    - BUG/MINOR: quic-be: CID double free upon qc_new_conn() failures
    - MINOR: quic-be: Avoid SSL context unreachable code without USE_QUIC_OPENSSL_COMPAT
    - BUG/MINOR: quic: prevent crash on startup with -dt
    - MINOR: server: reject QUIC servers without explicit SSL
    - BUG/MINOR: quic: work around NEW_TOKEN parsing error on backend side
    - BUG/MINOR: http-ana: Properly handle keep-query redirect option if no QS
    - BUG/MINOR: quic: don't restrict reception on backend privileged ports
    - MINOR: hq-interop: handle HTX response forward if not enough space
    - BUG/MINOR: quic: Fix OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn callback (OpenSSL3.5)
    - BUG/MINOR: quic: fix ODCID initialization on frontend side
    - BUG/MEDIUM: cli: Don't consume data if outbuf is full or not available
    - MINOR: cli: handle EOS/ERROR first
    - BUG/MEDIUM: check: Set SOCKERR by default when a connection error is reported
    - BUG/MINOR: mux-quic: check sc_attach_mux return value
    - MINOR: h3: support basic HTX start-line conversion into HTTP/3 request
    - MINOR: h3: encode request headers
    - MINOR: h3: complete HTTP/3 request method encoding
    - MINOR: h3: complete HTTP/3 request scheme encoding
    - MINOR: h3: adjust path request encoding
    - MINOR: h3: adjust auth request encoding or fallback to host
    - MINOR: h3: prepare support for response parsing
    - MINOR: h3: convert HTTP/3 response into HTX for backend side support
    - MINOR: h3: complete response status transcoding
    - MINOR: h3: transcode H3 response headers into HTX blocks
    - MINOR: h3: use BUG_ON() on missing request start-line
    - MINOR: h3: reject invalid :status in response
    - DOC: config: prefer-last-server: add notes for non-deterministic algorithms
    - CLEANUP: connection: remove unused mux-ops dedicated to QUIC
    - BUG/MINOR: mux-quic/h3: properly handle too low peer fctl initial stream
    - MINOR: mux-quic: support max bidi streams value set by the peer
    - MINOR: mux-quic: abort conn if cannot create stream due to fctl
    - MEDIUM: mux-quic: implement attach for new streams on backend side
    - BUG/MAJOR: fwlc: Count an avoided server as unusable.
    - MINOR: fwlc: Factorize code.
    - BUG/MEDIUM: quic: do not release BE quic-conn prior to upper conn
    - MAJOR: cfgparse: turn the same proxy name warning to an error
    - MAJOR: cfgparse: make sure server names are unique within a backend
    - BUG/MINOR: tools: only reset argument start upon new argument
    - BUG/MINOR: stream: Avoid recursive evaluation for unique-id based on itself
    - BUG/MINOR: log: Be able to use %ID alias at anytime of the stream's evaluation
    - MINOR: hlua: emit a log instead of an alert for aborted actions due to unavailable yield
    - MAJOR: mailers: remove native mailers support
    - BUG/MEDIUM: ssl/clienthello: ECDSA with ssl-max-ver TLSv1.2 and no ECDSA ciphers
    - DOC: configuration: add details on prefer-client-ciphers
    - MINOR: ssl: Add "renegotiate" server option
    - DOC: remove the program section from the documentation
    - MAJOR: mworker: remove program section support
    - BUG/MINOR: quic: wrong QUIC_FT_CONNECTION_CLOSE(0x1c) frame encoding
    - MINOR: quic-be: add a "CC connection" backend TX buffer pool
    - MINOR: quic: Useless TX buffer size reduction in closing state
    - MINOR: quic-be: Allow sending 1200 bytes Initial datagrams
    - MINOR: quic-be: address validation support implementation (RETRY)
    - MEDIUM: proxy: deprecate the "transparent" and "option transparent" directives
    - REGTESTS: update http_reuse_be_transparent with "transparent" deprecated
    - REGTESTS: script: also add a line pointing to the log file
    - DOC: config: explain how to deal with "transparent" deprecation
    - MEDIUM: proxy: mark the "dispatch" directive as deprecated
    - DOC: config: crt-list clarify default cert + cert-bundle
    - MEDIUM: cpu-topo: switch to the "performance" cpu-policy by default
    - SCRIPTS: drop the HTML generation from announce-release
    - BUG/MINOR: tools: use my_unsetenv instead of unsetenv
    - CLEANUP: startup: move comment about nbthread where it's more appropriate
    - BUILD: qpack: fix a build issue on older compilers
2025-06-26 18:26:45 +02:00
Willy Tarreau
543b629427 BUILD: qpack: fix a build issue on older compilers
Got this on gcc-4.8:

  src/qpack-enc.c: In function 'qpack_encode_method':
  src/qpack-enc.c:168:3: error: 'for' loop initial declarations are only allowed in C99 mode
     for (size_t i = 0; i < istlen(other); ++i)
     ^

This came from commit a0912cf914 ("MINOR: h3: complete HTTP/3 request
method encoding"), no backport is needed.
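
For reference, the usual portable form declares the loop counter at the start
of the enclosing block, which older compilers accept outside of C99 mode. The
sketch below is a generic illustration and not the actual patch: the
count_upper() helper is hypothetical and only serves to show the loop shape.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts upper-case ASCII letters in a buffer. */
static size_t count_upper(const char *ptr, size_t len)
{
	size_t i;      /* declared at block start instead of inside for() */
	size_t n = 0;

	assert(ptr || !len);

	for (i = 0; i < len; ++i)
		if (ptr[i] >= 'A' && ptr[i] <= 'Z')
			n++;
	return n;
}
```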
2025-06-26 18:09:24 +02:00
Valentine Krasnobaeva
20110491d3 CLEANUP: startup: move comment about nbthread where it's more appropriate
Move the comment about non_global_section_parsed just above the line, where
we reset it.
2025-06-26 18:02:16 +02:00
Valentine Krasnobaeva
a9afc10ae8 BUG/MINOR: tools: use my_unsetenv instead of unsetenv
Let's use our own implementation of unsetenv() instead of the one provided
in libc. The libc implementation may vary depending on the UNIX distro. The
implementation from libc.so.1 ported to Illumos (see the link below) has
caused an infinite loop in clean_env(), where we invoke unsetenv().

(https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/gen/getenv.c#L411C1-L456C1)

This was reported in GitHub issue #3018 and the reporter proposed a patch,
which we really appreciate! But looking at this fix and at the implementations
of unsetenv() in FreeBSD libc and in Linux glibc 2.31, it seems that the
algorithm of clean_env() will perform better with our my_unsetenv()
implementation.

This should be backported in versions 3.1 and 3.2.
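
To illustrate why a local implementation gives predictable behavior, here is
a minimal sketch of an unsetenv-like routine. It is not the my_unsetenv()
code from the tree: the sketch_unsetenv() name is hypothetical, and it works
on a caller-supplied NULL-terminated array rather than on the process
environment so that it can be exercised in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes every "name=value" entry from the NULL-terminated array <env>.
 * Returns the number of removed entries. The order is not preserved.
 */
static int sketch_unsetenv(char **env, const char *name)
{
	size_t len;
	int vars, next, removed = 0;

	assert(env && name);
	len = strlen(name);

	/* count the entries */
	for (vars = 0; env[vars]; vars++)
		;

	next = 0;
	while (next < vars) {
		if (strncmp(env[next], name, len) != 0 || env[next][len] != '=') {
			next++;
			continue;
		}
		/* overwrite the match with the last entry, then shrink the
		 * array; the same index is checked again on the next turn.
		 */
		env[next] = env[vars - 1];
		env[--vars] = NULL;
		removed++;
	}
	return removed;
}
```

Each iteration either advances the index or shrinks the array, so the loop
always terminates, whatever the caller does around it.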
2025-06-26 18:02:16 +02:00
Willy Tarreau
27baa3f9ff SCRIPTS: drop the HTML generation from announce-release
It has not been used over the last 5 years or so and systematically
requires manual removal. Let's just stop producing it. Also take
this opportunity to add the missing link to /discussions.
2025-06-26 18:02:16 +02:00
Willy Tarreau
b74336984d MEDIUM: cpu-topo: switch to the "performance" cpu-policy by default
As mentioned during the NUMA series development, the goal is to use
all available cores in the most efficient way by default, which
normally corresponds to "cpu-policy performance". The previous default
choice of "cpu-policy first-usable-node" was only meant to stay 100%
identical to before cpu-policy.

So let's switch the default cpu-policy to "performance" right now.
The doc was updated to reflect this.
2025-06-26 16:27:43 +02:00
Maximilian Moehl
5128178256 DOC: config: crt-list clarify default cert + cert-bundle
Clarify that HAProxy duplicates crt-list entries for multi-cert bundles
which can create unexpected side-effects as only the very first
certificate after duplication is considered as default implicitly.
2025-06-26 16:27:07 +02:00
Willy Tarreau
5c15ba5eff MEDIUM: proxy: mark the "dispatch" directive as deprecated
As mentioned in [1], the "dispatch" directive from haproxy 1.0 has long
outlived its original purpose and still suffers from a number of technical
limitations (no checks, no SSL, no idle connections, etc.) and still hinders some
internal evolutions. It's now time to mark it as deprecated, and to remove
it in 3.5 [2]. It was already recommended against in the documentation but
remained popular in raw TCP environments for being shorter to write.

The directive will now cause a warning to be emitted, suggesting an
alternate method involving "server". The warning can be shut using
"expose-deprecated-directives". The rare configs from 1.0 where
"dispatch" is combined with sticky servers using cookies will just
need to set these servers' weights to zero to prevent them from
being selected by the load balancing algorithm. All of this is
explained in the doc with examples.

Two reg tests were using this method, one purposely for this directive,
which now has expose-deprecated-directives, and another one to test the
behavior of idle connections, which was updated to use "server" and
extended to test both "http-reuse never" and "http-reuse always".

[1] https://github.com/orgs/haproxy/discussions/2921
[2] https://github.com/haproxy/wiki/wiki/Breaking-changes
2025-06-26 15:29:47 +02:00
Willy Tarreau
19140ca666 DOC: config: explain how to deal with "transparent" deprecation
The explanations for the "option transparent" keyword were a bit scarce
regarding deprecation, so let's explain how to replace it with a server
line that does the same.
2025-06-26 14:52:07 +02:00
Willy Tarreau
16f382f2d9 REGTESTS: script: also add a line pointing to the log file
I never counted the number of hours I've been spending selecting then
copy-pasting the directory output and manually appending "/LOG" to read
a log file but it amounts in tens to hundreds. Let's just add a direct
pointer to the log file at the end of the log for a failed run.
2025-06-26 14:33:09 +02:00
Willy Tarreau
1d3ab10423 REGTESTS: update http_reuse_be_transparent with "transparent" deprecated
With commit e93f3ea3f8 ("MEDIUM: proxy: deprecate the "transparent" and
"option transparent" directives") this one no longer works as the config
either has to be adjusted to use server 0.0.0.0 or to enable the deprecated
feature. The test used to validate a technical limitation ("transparent"
not supporting shared connections), indicated as being comparable to
"http-reuse never". Let's now duplicate the test for "http-reuse never"
and "http-reuse always" and validate both behaviors.

Take this opportunity to fix a few problems in this config:
  - use "nbthread 1": depending on the thread where the connection
    arrives, the connection may or may not be reused
  - add explicit URLs to the clients so that they can be recognized
    in the logs
  - add comments to make it clearer what to expect for each test
2025-06-26 14:32:20 +02:00
Willy Tarreau
e93f3ea3f8 MEDIUM: proxy: deprecate the "transparent" and "option transparent" directives
As discussed here [1], "transparent" (already deprecated) and
"option transparent" are horrible hacks which should really disappear
in favor of "server xxx 0.0.0.0" which doesn't rely on hackish code
path. This old feature is now deprecated in 3.3 and will disappear in
3.5, as indicated here [2]. A warning is emitted when used, explaining
how to proceed, and how to silence the warning using the global
"expose-deprecated-directives" if needed. The doc was updated to
reflect this new state.

[1] https://github.com/orgs/haproxy/discussions/2921
[2] https://github.com/haproxy/wiki/wiki/Breaking-changes
2025-06-26 11:55:47 +02:00
Frederic Lecaille
194e3bc2d5 MINOR: quic-be: address validation support implementation (RETRY)
- Add ->retry_token and ->retry_token_len new quic_conn struct members to store
  the retry tokens. These objects are allocated by quic_rx_packet_parse() and
  released by quic_conn_release().
- Add <pool_head_quic_retry_token> new pool for these tokens.
- Implement quic_retry_packet_check() to check the integrity tag of these tokens
  upon RETRY packets receipt. quic_tls_generate_retry_integrity_tag() is called
  by this new function. It has been modified to pass the address where the tag
  must be generated.
- Add <resend> new parameter to quic_pktns_discard(). This function is called
  to discard the packet number spaces to which the already sent TX packets and
  frames are attached. <resend> allows the caller to prevent this function from
  releasing the in flight TX packets/frames. The frames are requeued to be resent.
- Modify quic_rx_pkt_parse() to handle the RETRY packets. What must be done upon
  such packets receipt is:
  - store the retry token,
  - store the new peer SCID as the DCID of the connection. Note that the peer will
    modify again its SCID. This is why this SCID is also stored as the ODCID
    which must be matched with the peer retry_source_connection_id transport parameter,
  - discard the Initial packet number space without flagging it as discarded and
    prevent retransmissions calling qc_set_timer(),
  - modify the TLS cryptographic cipher contexts (RX/TX),
  - wakeup the I/O handler to send new Initial packets asap.
- Modify quic_transport_param_decode() to handle the retry_source_connection_id
  transport parameter as a QUIC client. Then its caller is modified to
  check this transport parameter matches with the SCID sent by the peer with
  the RETRY packet.
2025-06-26 09:48:00 +02:00
Frederic Lecaille
8a25fcd36e MINOR: quic-be: Allow sending 1200 bytes Initial datagrams
This easy to understand patch is not intrusive at all and cannot break the QUIC
listeners.

The QUIC client MUST always pad its datagrams with Initial packets. A "!l" (not
a listener) test OR'ed with the existing ones is added to satisfy the condition
to allow the build of such datagrams.
2025-06-26 09:48:00 +02:00
Frederic Lecaille
c898b29e64 MINOR: quic: Useless TX buffer size reduction in closing state
There is no need to limit the size of the TX buffer to QUIC_MIN_CC_PKTSIZE bytes
when the connection is in closing state. This useless test is removed. A test
located after it already limits the number of bytes to be used from this TX
buffer to the size of the TX buffer itself:

    if (end > (unsigned char *)b_wrap(buf))
	    end = (unsigned char *)b_wrap(buf);

This is exactly what is needed when the connection is in closing state. Indeed,
the size of the TX buffers is limited to reduce the memory usage. The connection
only needs to send short datagrams with at most 2 packets carrying
CONNECTION_CLOSE* frames. They are built only one time and backed up into a small
TX buffer allocated from a dedicated pool.
The size of this TX buffer is QUIC_MAX_CC_BUFSIZE which depends on QUIC_MIN_CC_PKTSIZE:

 #define QUIC_MIN_CC_PKTSIZE  128
 #define QUIC_MAX_CC_BUFSIZE (2 * (QUIC_MIN_CC_PKTSIZE + QUIC_DGRAM_HEADLEN))

This size is smaller than an MTU.

This patch should be backported as far as 2.9 to ease further backports to come.
2025-06-26 09:48:00 +02:00
Frederic Lecaille
9cb2acd2f2 MINOR: quic-be: add a "CC connection" backend TX buffer pool
A QUIC client must be able to close a connection sending Initial packets. But
QUIC client Initial packets must always be at least 1200 bytes long. To reduce
the memory use of TX buffers of a connection when in "closing" state, a pool
was dedicated for this purpose but with a TX buffer size which is too small
for such packets (QUIC_MAX_CC_BUFSIZE).

This patch adds a "closing state connection" TX buffer pool with the same role
for QUIC backends.
2025-06-26 09:48:00 +02:00
Frederic Lecaille
1e6d8f199c BUG/MINOR: quic: wrong QUIC_FT_CONNECTION_CLOSE(0x1c) frame encoding
This is an old bug which was there since this commit:

     MINOR: quic: Avoid zeroing frame structures

It seems QUIC_FT_CONNECTION_CLOSE was confused with QUIC_FT_CONNECTION_CLOSE_APP
which does not include a "frame type" field. This field was not initialized
(so it held a random value), which prevented the packet from being built because
the packet builder assumes that packets carrying such frames are very short.

Must be backported as far as 2.6.
2025-06-26 09:48:00 +02:00
William Lallemand
7cb6167d04 MAJOR: mworker: remove program section support
This patch completely removes the support for the program section: neither
the parsing of the section nor the internals of the mworker support it
anymore.

The program section was considered dysfunctional and not fully
compatible with the "mworker V3" model. Users who want to run an
external program must use their init system.

The documentation is cleaned up in another patch.
2025-06-25 16:11:34 +02:00
William Lallemand
9b5bf81f3c DOC: remove the program section from the documentation
The program section is obsolete and can be removed from the
documentation.
2025-06-25 15:42:57 +02:00
Remi Tricot-Le Breton
34fc73ba81 MINOR: ssl: Add "renegotiate" server option
This "renegotiate" option can be set on SSL backends to allow secure
renegotiation. It is mostly useful with SSL libraries that disable
secure regotiation by default (such as AWS-LC).
The "no-renegotiate" one can be used the other way around, to disable
secure renegotation that could be allowed by default.
Those two options can be set via "ssl-default-server-options" as well.
2025-06-25 15:23:48 +02:00
William Lallemand
370a8cea4a DOC: configuration: add details on prefer-client-ciphers
prefer-client-ciphers does not work exactly the same way when used with
a dual algorithm stack (ECDSA + RSA). This patch details its behavior.

This patch must be backported in every maintained version.

Problem was discovered in #2988.
2025-06-25 14:41:45 +02:00
William Lallemand
4a298c6c5c BUG/MEDIUM: ssl/clienthello: ECDSA with ssl-max-ver TLSv1.2 and no ECDSA ciphers
Patch 23093c72 ("BUG/MINOR: ssl: suboptimal certificate selection with TLSv1.3
and dual ECDSA/RSA") introduced a problem when prioritizing the ECDSA
with TLSv1.3.

Indeed, when a client with TLSv1.3 capabilities announces a list of
ECDSA sigalgs, a list of TLSv1.3 ciphersuites compatible with ECDSA,
but only RSA ciphers for TLSv1.2, and haproxy is configured with
ssl-max-ver TLSv1.2, then haproxy would use the ECDSA keypair, but the
client wouldn't be able to process it because TLSv1.2 was negotiated.

HAProxy would be configured like that:

  ssl-default-bind-options ssl-max-ver TLSv1.2

And a client could be used this way:

  openssl s_client -connect localhost:8443 -cipher ECDHE-ECDSA-AES128-GCM-SHA256 \
          -ciphersuites TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256

This patch fixes the issue by checking if TLSv1.3 was configured before
allowing ECDSA if a TLSv1.3 ciphersuite is in the list.

This could be backported where 23093c72 ("BUG/MINOR: ssl: suboptimal
certificate selection with TLSv1.3 and dual ECDSA/RSA") was backported.
However, this is quite sensitive and we should wait a bit before the
backport.

This should fix issue #2988
2025-06-25 14:25:14 +02:00
Aurelien DARRAGON
5694a98744 MAJOR: mailers: remove native mailers support
As mentioned in the 2.8 announcement on the mailing list [1] and on the wiki
[2], native mailers were deprecated and planned for removal in 3.3. Now is
the time to drop the legacy code for native mailers which is based on a
tcpcheck "hack" and cannot be maintained. Lua mailers should be used as
a drop in replacement. Indeed, "mailers" and associated config directives
are preserved because mailers config is exposed to Lua, which helps smoothing
the transition from native mailers to Lua based ones.

As a reminder, to keep mailers configuration working as before without
making changes to the config file, simply add the line below to the global
section:

       lua-load examples/lua/mailers.lua

mailers.lua script (provided in the git repository, adjust path as needed)
may be customized by users familiar with Lua, by default it emulates the
behavior of the native (now removed) mailers.

[1]: https://www.mail-archive.com/haproxy@formilux.org/msg43600.html
[2]: https://github.com/haproxy/wiki/wiki/Breaking-changes
2025-06-24 10:55:58 +02:00
Aurelien DARRAGON
c0f6024854 MINOR: hlua: emit a log instead of an alert for aborted actions due to unavailable yield
As reported by Chris Staite in GH #3002, trying to yield from a Lua
action during a client disconnect causes the script to be interrupted
(which is expected) and an alert to be emitted with the error:
"Lua function '%s': yield not allowed".

While this error is well suited for cases where the yield is not expected
at all (ie: when context doesn't allow it) and results from a yield misuse
in the Lua script, it isn't the case when the yield is exceptionally not
available due to an abort or error in the request/response processing.
Because of that we raise an alert but the user cannot do anything about it
(the script is correct), so it is confusing and polluting the logs.

In this patch we introduce the ACT_OPT_FINAL_EARLY flag which is a
complementary flag to ACT_OPT_FINAL. This flag is set when the
ACT_OPT_FINAL is set earlier than normal (due to error/abort).
hlua_action() then checks for this flag to decide whether an error (alert)
or a simple log message should be emitted when the yield is not available.

It should solve GH #3002. Thanks to Chris Staite (@chrisstaite-menlo) for
having reported the issue and suggested a solution.
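
The decision described above can be sketched as follows. This is a
simplified illustration and not the hlua_action() code: the flag values, the
SK_ prefixed names and the sk_yield_failure_report() function are
hypothetical, and the sketch assumes that the "early" flag only complements
the flag marking the last call, during which yielding is not possible.

```c
#include <assert.h>

/* Hypothetical values standing for the action option flags. */
#define SK_OPT_FINAL        0x01   /* last call, yield not available */
#define SK_OPT_FINAL_EARLY  0x02   /* last call forced by an abort/error */

enum sk_report { SK_REPORT_NONE, SK_REPORT_LOG, SK_REPORT_ALERT };

/* Tells what to emit when a script tries to yield with these flags. */
static enum sk_report sk_yield_failure_report(unsigned int flags)
{
	/* the "early" flag is only meaningful along with the final one */
	assert(!(flags & SK_OPT_FINAL_EARLY) || (flags & SK_OPT_FINAL));

	if (!(flags & SK_OPT_FINAL))
		return SK_REPORT_NONE;   /* yield is allowed, nothing to say */

	if (flags & SK_OPT_FINAL_EARLY)
		return SK_REPORT_LOG;    /* the script is correct, just log */

	return SK_REPORT_ALERT;          /* yield misuse in the script */
}
```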
2025-06-24 10:55:55 +02:00
Christopher Faulet
20a82027ce BUG/MINOR: log: Be able to use %ID alias at anytime of the stream's evaluation
In a log-format string, using "%[unique-id]" or "%ID" should be equivalent.
However, for the first one, the unique ID is generated when the sample fetch
function is called. For the alias, it is not true. In that case, the
stream's unique ID is generated when the log message is emitted. Otherwise,
by default, the unique id is automatically generated at the end of the HTTP
request analysis.

So, if the alias "%ID" is used in a log-format string anywhere before the end
of the request analysis, the evaluation fails and the ID is considered as
empty. It is not consistent and in contradiction with the "%ID"
documentation.

To fix the issue, instead of evaluating the unique ID when the log message
is emitted, it is now performed on demand when "%ID" format is evaluated.

This patch should fix the issue #3016. It should be backported to all stable
versions. It relies on the following commit:

  * BUG/MINOR: stream: Avoid recursive evaluation for unique-id based on itself
2025-06-24 08:04:50 +02:00
Christopher Faulet
fb7b5c8a53 BUG/MINOR: stream: Avoid recursive evaluation for unique-id based on itself
There is nothing that prevents a "unique-id-format" from referencing itself,
using '%ID' or '%[unique-id]'. If the sample fetch function is used, it
leads to an infinite loop, recursively calling the function responsible for
generating the unique ID.

One solution is to detect it during the configuration parsing to trigger an
error. With this patch, we just inhibit recursive calls by considering the
unique-id as empty during its evaluation. So "id-%[unique-id]" lf string
will be evaluated as "id-".

This patch must be backported to all stable versions.
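
The inhibition of recursive calls can be sketched as follows. This is a
minimal illustration under stated assumptions, not the stream code: the
sk_stream structure, its <generating> member and both function names are
hypothetical, and the format is hard-coded to "id-%[unique-id]" instead of
being parsed from a log-format string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sk_stream {
	char unique_id[64];
	int generating;        /* set while the ID is being built */
};

static const char *sk_get_unique_id(struct sk_stream *s);

/* Builds the ID from the fixed format "id-%[unique-id]". */
static void sk_generate_unique_id(struct sk_stream *s)
{
	s->generating = 1;
	snprintf(s->unique_id, sizeof(s->unique_id), "id-%s",
	         sk_get_unique_id(s));
	s->generating = 0;
}

/* Returns the unique ID, generating it on demand. */
static const char *sk_get_unique_id(struct sk_stream *s)
{
	assert(s);

	/* recursive evaluation: the ID is considered as empty */
	if (s->generating)
		return "";

	if (!s->unique_id[0])
		sk_generate_unique_id(s);
	return s->unique_id;
}
```

Calling sk_get_unique_id() on a zeroed stream yields "id-", which matches the
behavior described above for a self-referencing format.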
2025-06-24 08:04:50 +02:00
Willy Tarreau
68c3eb3013 BUG/MINOR: tools: only reset argument start upon new argument
In issue #2995, Thomas Kjaer reported that empty argument position
reporting had been broken yet again. This time it was broken by this
latest fix: 2b60e54fb1 ("BUG/MINOR: tools: improve parse_line()'s
robustness against empty args"). It turns out that this fix is not
the culprit and it's in fact correct. The culprit was the original
commit of this series, 7e4a2f39ef ("BUG/MINOR: tools: do not create
an empty arg from trailing spaces"), which used to reset arg_start
to outpos for every new char in addition to doing it for every arg.
This resulted in the end of the line to be seen as always being in
error, thus reporting an incorrect position that the caller would
correct in a generic way designating the beginning of the line. It
did not show up before the fix mentioned above because the misassigned
value was almost not used by then.

Assigning the value before entering the loop fixes this problem and
doesn't break the series of previous oss-fuzz reproducers. Hopefully
it's the last one again.

This must be backported to 3.2. Thanks to @tkjaer for reporting the
issue along with a reproducer.
2025-06-23 18:41:52 +02:00
Willy Tarreau
d7fad1320e MAJOR: cfgparse: make sure server names are unique within a backend
There was already a check for this but there used to be an exception
that allowed duplicate server names only in case where their IDs were
explicit and different. This has been emitting a warning since 3.1 and
planned for removal in 3.3, so let's do it now. The doc was updated,
though it never mentioned this unicity constraint, so that was added.

Only the check for the exception was removed, the rest of the code
that is currently made to deal with duplicate server names was not
cleaned yet (e.g. the tree doesn't need to support dups anymore, and
this could be done at insertion time). This may be a subject for future
cleanups.
2025-06-23 15:42:32 +02:00
Willy Tarreau
067be38c0e MAJOR: cfgparse: turn the same proxy name warning to an error
As warned since 3.1, it's no longer permitted to have a frontend and
a backend under the same name. This causes too many designation issues,
and causes trouble with stick-tables as well. Now each proxy name is
unique.

This commit only changes the check to return an error. Some code parts
currently exist to find the best candidates, these will be able to be
simplified as future cleanup patches. The doc was updated.
2025-06-23 15:34:05 +02:00
Amaury Denoyelle
74b95922ef BUG/MEDIUM: quic: do not release BE quic-conn prior to upper conn
For frontend side, quic_conn is only released if MUX wasn't allocated,
either due to handshake abort, in which case upper layer is never
allocated, or after transfer completion when full conn + MUX layers are
already released.

On the backend side, initialization is not performed in the same order.
Indeed, in this case, connection is first instantiated, then the
quic_conn is created to execute the handshake, while MUX is still only
allocated on handshake completion. As such, it is not possible anymore
to free immediately quic_conn on handshake failure. Otherwise, this can
cause a crash if the connection tries to access its transport layer
again after quic_conn release.

Such crash can easily be reproduced in case of connection error to the
QUIC server. Here is an example of an experienced backtrace.

Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault.
  0x0000555555739733 in quic_close (conn=0x55555734c0d0, xprt_ctx=0x5555573a6e50) at src/xprt_quic.c:28
  28              qc->conn = NULL;
  [ ## gdb ## ] bt
  #0  0x0000555555739733 in quic_close (conn=0x55555734c0d0, xprt_ctx=0x5555573a6e50) at src/xprt_quic.c:28
  #1  0x00005555559c9708 in conn_xprt_close (conn=0x55555734c0d0) at include/haproxy/connection.h:162
  #2  0x00005555559c97d2 in conn_full_close (conn=0x55555734c0d0) at include/haproxy/connection.h:206
  #3  0x00005555559d01a9 in sc_detach_endp (scp=0x7fffffffd648) at src/stconn.c:451
  #4  0x00005555559d05b9 in sc_reset_endp (sc=0x55555734bf00) at src/stconn.c:533
  #5  0x000055555598281d in back_handle_st_cer (s=0x55555734adb0) at src/backend.c:2754
  #6  0x000055555588158a in process_stream (t=0x55555734be10, context=0x55555734adb0, state=516) at src/stream.c:1907
  #7  0x0000555555dc31d9 in run_tasks_from_lists (budgets=0x7fffffffdb30) at src/task.c:655
  #8  0x0000555555dc3dd3 in process_runnable_tasks () at src/task.c:889
  #9  0x0000555555a1daae in run_poll_loop () at src/haproxy.c:2865
  #10 0x0000555555a1e20c in run_thread_poll_loop (data=0x5555569d1c00 <ha_thread_info>) at src/haproxy.c:3081
  #11 0x0000555555a1f66b in main (argc=5, argv=0x7fffffffde18) at src/haproxy.c:3671

To fix this, change the condition prior to calling quic_conn release. If
<conn> member is not NULL, delay the release, similarly to the case when
MUX is allocated. This allows connection to be freed first, and detach
from quic_conn layer through close xprt operation.

No need to backport.
2025-06-20 17:46:10 +02:00
Olivier Houchard
ba5738489f MINOR: fwlc: Factorize code.
Always set unusable if we could not use a server, instead of doing it in
each branch

This should be backported to 3.2 after e28e647fef43e5865c87f328832fec7794a423e5
is backported.
2025-06-20 15:59:03 +02:00
Olivier Houchard
e28e647fef BUG/MAJOR: fwlc: Count an avoided server as unusable.
In fwlc_get_next_server(), if a server to avoid has been provided, and
we have to ignore it, don't forget to increase the number of unusable
servers, otherwise we may end up ignoring it over and over, never
switching to another server, in an infinite loop until the process gets
killed.
This hopefully fixes Github issues #3004 and #3014.

This should be backported to 3.2.
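
The effect of the missing increment can be sketched with a simplified
selection loop. This is not the fwlc code, which walks a tree of servers
sorted by load: the sk_server structure, the sk_pick_server() function and
the plain array scan are hypothetical and only keep the termination
condition, which relies on the count of unusable servers.

```c
#include <assert.h>
#include <stddef.h>

struct sk_server { int usable; };

/* Scans <nb> servers and returns the index of the first usable one which
 * is not <avoid> (-1 for none to avoid), or -1 if no server can be used.
 */
static int sk_pick_server(const struct sk_server *srv, int nb, int avoid)
{
	int unusable = 0;
	int i = 0;

	assert(srv && nb > 0);

	while (unusable < nb) {
		if (i == avoid || !srv[i].usable) {
			/* an avoided server counts as unusable as well;
			 * without this, the loop never ends when only the
			 * avoided server remains.
			 */
			unusable++;
			i = (i + 1) % nb;
			continue;
		}
		return i;
	}
	return -1;
}
```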
2025-06-20 15:29:51 +02:00
Amaury Denoyelle
4527a2912b MEDIUM: mux-quic: implement attach for new streams on backend side
Implement attach and avail_streams mux-ops callbacks, which are used on
backend side for connection reuse.

Attach operation is used to initiate new streams on the connection
outside of the first one. It simply relies on qcc_init_stream_local() to
instantiate a new QCS instance, which is immediately linked to its
stream data layer.

Outside of attach, it is also necessary to implement avail_streams so
that the stream layer will try to initiate connection reuse. This method
reports the number of bidirectional streams which can still be opened
for the QUIC connection. It depends directly on the flow-control value
advertised by the peer. Thus, this ensures that attach won't cause any
flow control violation.
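
The relation between both callbacks can be sketched as follows. This is an
illustration under stated assumptions and not the mux-quic code: the sk_qcc
structure, its <opened_bidi> counter and both function names are
hypothetical, and <ms_bidi> is assumed to hold the cumulative limit
advertised by the peer.

```c
#include <assert.h>
#include <stdint.h>

struct sk_qcc {
	uint64_t ms_bidi;      /* max bidi streams allowed by the peer */
	uint64_t opened_bidi;  /* bidi streams already opened locally */
};

/* Number of bidirectional streams which can still be opened. */
static uint64_t sk_avail_streams(const struct sk_qcc *qcc)
{
	assert(qcc);

	if (qcc->opened_bidi >= qcc->ms_bidi)
		return 0;
	return qcc->ms_bidi - qcc->opened_bidi;
}

/* Opens a new stream. Returns 0 on success, -1 if no credit is left. */
static int sk_attach(struct sk_qcc *qcc)
{
	if (!sk_avail_streams(qcc))
		return -1;
	qcc->opened_bidi++;
	return 0;
}
```

As long as the upper layer only attaches when the first function reports a
non-zero value, the peer limit cannot be exceeded.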
2025-06-18 17:25:27 +02:00
Amaury Denoyelle
81cfaab6b4 MINOR: mux-quic: abort conn if cannot create stream due to fctl
Prior to initiating the first stream on the backend side, ensure that peer
flow-control allows at least a single bidirectional stream to be
created. If this is not the case, abort MUX init operation.

Before this patch, flow-control limit was not checked. Hence, if peer
does not allow any bidirectional stream, haproxy would violate it, which
would then cause the peer to close the connection.

Note that with the current situation, haproxy won't be able to talk to
servers which use 0 for initial max bidi streams. A proper solution
could be to pause the request until a MAX_STREAMS is received, under
timeout supervision to ensure the connection is closed if no frame is
received.
2025-06-18 17:25:27 +02:00
Amaury Denoyelle
06cab99a0e MINOR: mux-quic: support max bidi streams value set by the peer
Implement support for MAX_STREAMS frame. On frontend, this was mostly
useless as haproxy would never initiate new bidirectional streams.
However, this becomes necessary to control stream flow-control when
using QUIC as a client on the backend side.

Parsing of MAX_STREAMS is implemented via new qcc_recv_max_streams().
This allows updating the <ms_uni>/<ms_bidi> QCC fields.

This patch is necessary to achieve QUIC backend connection reuse.
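
A minimal sketch of how a received MAX_STREAMS value might be applied is
shown below. The struct and function are hypothetical and only borrow the
<ms_bidi> name from the text above. It relies on the QUIC rule that a
MAX_STREAMS frame which does not increase the limit is ignored.

```c
#include <stdint.h>

/* Hypothetical holder for the peer-advertised cumulative stream limit. */
struct demo_limits {
	uint64_t ms_bidi;
};

/* Apply a MAX_STREAMS value. Returns 1 if the limit was raised, 0 if the
 * frame was ignored.
 */
int demo_recv_max_streams(struct demo_limits *l, uint64_t max)
{
	if (max <= l->ms_bidi)
		return 0; /* stale or reordered frame: limits never decrease */
	l->ms_bidi = max;
	return 1;
}
```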
2025-06-18 17:25:27 +02:00
Amaury Denoyelle
805a070ab9 BUG/MINOR: mux-quic/h3: properly handle too low peer fctl initial stream
Previously, no check on peer flow-control was implemented prior to opening
a local QUIC stream. This was a small problem for frontend
implementation, as in this case haproxy as a server never opens
bidirectional streams.

On frontend, the only stream opened by haproxy in this case is for
HTTP/3 control unidirectional data. If the peer uses an initial value
for max uni streams set to 0, haproxy would violate the peer's flow control,
and the peer will probably close the connection. Note however that RFC 9114
mandates that each peer defines minimal initial value so that at least
the control stream can be created.

This commit improves the situation of too low initial max uni streams
value. Now, on HTTP/3 layer initialization, haproxy preemptively checks
flow control limit on streams via a new function
qcc_fctl_avail_streams(). If credit is already expired due to a too
small initial value, haproxy preemptively closes the connection using
H3_ERR_GENERAL_PROTOCOL_ERROR. This behavior is better as haproxy is now
the initiator of the connection closure.

This should be backported up to 2.8.
2025-06-18 17:18:55 +02:00
Amaury Denoyelle
c807182ec9 CLEANUP: connection: remove unused mux-ops dedicated to QUIC
Remove avail_streams_bidi/avail_streams_uni mux_ops. These callbacks
were designed to be specific to QUIC. However, they won't be necessary,
as stream layer only cares about bidirectional streams.
2025-06-18 17:02:50 +02:00
Valentine Krasnobaeva
cdb2f8d780 DOC: config: prefer-last-server: add notes for non-deterministic algorithms
Add some notes on which load-balancing algorithms can be considered as
deterministic or non-deterministic and add some examples for each type.
This was asked via mailing list to clarify the usage of
prefer-last-server option.

This can be backported to all stable versions.
2025-06-17 21:18:23 +02:00
Amaury Denoyelle
8fc0d2fbd5 MINOR: h3: reject invalid :status in response
Add checks to ensure that :status pseudo-header received in HTTP/3
response is valid. If the header is either not provided or isn't a
3-digit number, the response is considered invalid and the stream is
rejected. Also, glitch counter is now incremented in any of these cases.

This should fix coverity report from github issue #3009.
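
The 3-digit check can be sketched as below. demo_parse_status() is a
hypothetical helper written for illustration; the actual h3 code operates on
haproxy's own string types and also updates the glitch counter.

```c
#include <stddef.h>

/* Parse a :status value of <len> bytes. Returns the numeric code if the
 * value is exactly three ASCII digits, or -1 if it is missing or malformed.
 */
int demo_parse_status(const char *v, size_t len)
{
	int code = 0;
	size_t i;

	if (!v || len != 3)
		return -1;

	for (i = 0; i < len; i++) {
		if (v[i] < '0' || v[i] > '9')
			return -1;
		code = code * 10 + (v[i] - '0');
	}
	return code;
}
```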
2025-06-17 11:39:35 +02:00
Amaury Denoyelle
f972f7d9e9 MINOR: h3: use BUG_ON() on missing request start-line
Convert BUG_ON_HOT() statements to BUG_ON() if HTX start-line is either
missing or duplicated when transcoding into a HTTP/3 request. This
ensures that such abnormal conditions will be detected even on default
builds.

This is linked to coverity report #3008.
2025-06-17 11:39:35 +02:00
Amaury Denoyelle
2284aa0d6a MINOR: h3: transcode H3 response headers into HTX blocks
Finalize HTTP/3 response transcoding into HTX message. This patch
implements conversion of HTTP/3 headers provided by the server into HTX
blocks.

Special checks have been implemented to reject connection-specific
headers, causing the stream to be shut in error. Also, handling of
content-length requires that the body size is equal to the value
advertized in the header to prevent HTTP desync.
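
The content-length consistency rule can be illustrated with the sketch
below. Names are hypothetical and the real code tracks this state on the
stream. The idea is to refuse body data exceeding the announced size, and
to refuse the end of the message if fewer bytes were received.

```c
#include <stdint.h>

/* Hypothetical per-response body accounting. */
struct demo_cl {
	uint64_t announced; /* value of the content-length header */
	uint64_t received;  /* body bytes seen so far */
};

/* Account for <len> body bytes. Returns 0 on success, -1 if the body
 * would exceed the announced size.
 */
int demo_cl_data(struct demo_cl *cl, uint64_t len)
{
	if (len > cl->announced - cl->received)
		return -1;
	cl->received += len;
	return 0;
}

/* Called on end of message. Returns 0 if sizes match, -1 otherwise. */
int demo_cl_end(const struct demo_cl *cl)
{
	return cl->received == cl->announced ? 0 : -1;
}
```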
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
d83255fdc3 MINOR: h3: complete response status transcoding
On the backend side, the HTTP/3 response from the server is transcoded
into a HTX message. Previously, a fixed value was used for the status
code.

Improve this by extracting the value specified by the server and set it
into the HTX status line. This requires detecting the :status pseudo-header
from the HTTP/3 response.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
f79effa306 MINOR: h3: convert HTTP/3 response into HTX for backend side support
Implement basic support for HTTP/3 response transcoding into
HTX. This is done via a new dedicated function h3_resp_headers_to_htx().
A valid HTX status-line is allocated and stored. Status code is
hardcoded to 200 for now.

Following patches will be added to remove hardcoded status value and
also handle response headers provided by the server.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
0eb35029dc MINOR: h3: prepare support for response parsing
Refactor HTTP/3 request headers transcoding to HTX done in
h3_headers_to_htx(). Some operations are extracted into dedicated
functions, to check pseudo-headers and headers conformity, and also trim
the value of headers before encoding it in HTX.

The objective will be to simplify implementation of HTTP/3 response
transcoding by reusing these functions.

Also, h3_headers_to_htx() has been renamed to h3_req_headers_to_htx(),
to highlight that it is reserved to frontend usage.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
555ec99d43 MINOR: h3: adjust auth request encoding or fallback to host
Implement proper encoding of HTTP/3 authority pseudo-header during
request transcoding on the backend side. A pseudo-header :authority is
encoded if a value can be extracted from HTX start-line. A special check
is also implemented to ensure that a host header is not encoded if
:authority already is.

A new function qpack_encode_auth() is defined to implement QPACK
encoding of :authority header using literal field line with name ref.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
96183abfbd MINOR: h3: adjust path request encoding
Previously, HTTP/3 backend request :path was hardcoded to value '/'.
Change this so that we can now encode any path as requested by the
client. Path is extracted from the HTX URI. Also, qpack_encode_path() is
extended to support literal field line with name ref.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
235e818fa1 MINOR: h3: complete HTTP/3 request scheme encoding
Previously, scheme was always set to https when transcoding an HTX
start-line into a HTTP/3 request. Change this so this conversion is now
fully compliant.

If no scheme is specified by the client, which is what happens most of
the time with HTTP/1, https is set for the HTTP/3 request. Else, reuse
the scheme requested by the client.

If either https or http is set, qpack_encode_scheme will encode it using
entry from QPACK static table. Else, a full literal field line with name
ref is used instead as the scheme value is specified as-is.
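
The choice between a static table entry and a literal can be sketched as
follows. demo_scheme_static_idx() is a hypothetical helper, not
qpack_encode_scheme(). The indexes 22 and 23 are those of ":scheme http"
and ":scheme https" in the QPACK static table (RFC 9204, Appendix A).

```c
#include <stddef.h>
#include <string.h>

/* Return the QPACK static table index matching both the :scheme name and
 * <scheme> value, or -1 if the value must be sent as a literal field line
 * with name reference.
 */
int demo_scheme_static_idx(const char *scheme, size_t len)
{
	if (len == 4 && memcmp(scheme, "http", 4) == 0)
		return 22;
	if (len == 5 && memcmp(scheme, "https", 5) == 0)
		return 23;
	return -1;
}
```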
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
a0912cf914 MINOR: h3: complete HTTP/3 request method encoding
On the backend side, HTX start-line is converted into a HTTP/3 request
message. Previously, GET method was hardcoded. Implement proper method
conversion, by extracting it from the HTX start-line.

qpack_encode_method() has also been extended, so that it is able to
encode any method, either using a static table entry, or with a literal
field line with name ref representation.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
f5342e0a96 MINOR: h3: encode request headers
Implement encoding of HTTP/3 request headers during HTX->H3 conversion
on the backend side. This simply relies on h3_encode_header().

Special check is implemented to ensure that connection-specific headers
are ignored. An HTTP/3 endpoint must never generate them, or the peer
will consider the message as malformed.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
7157adb154 MINOR: h3: support basic HTX start-line conversion into HTTP/3 request
This commit is the first of a series whose aim is to implement
transcoding of a HTX request into HTTP/3, which is necessary for QUIC
backend support.

Transcoding is implemented via a new function h3_req_headers_send()
when a HTX start-line is parsed. For now, most of the request fields are
hardcoded, using a GET method. This will be adjusted in the
following patches.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
fc1a17f169 BUG/MINOR: mux-quic: check sc_attach_mux return value
On backend side, QUIC MUX needs to initialize the first local stream
during MUX init operation. This is necessary so that the first transfer
can then be performed.

sc_attach_mux() is used to attach the created QCS instance to its stream
data layer. However, return value was not checked, which may cause
issues on allocation error. This patch fixes it by returning an error on
MUX init operation and freeing the QCS instance in case of
sc_attach_mux() error.

This fixes coverity report from github issue #3007.

No need to backport.
2025-06-16 18:11:09 +02:00
Christopher Faulet
54d74259e9 BUG/MEDIUM: check: Set SOCKERR by default when a connection error is reported
When a connection error is reported, we try to collect as much information
as possible on the connection status and the server status is adjusted
accordingly. However, the function does nothing if there is no connection
error and if the healthcheck is not expired yet. It is a problem when an
internal error occurred. It may happen at many places and it is hard to be
sure an error is reported on the connection. And in fact, it is already a
problem when the multiplexer allocation fails. In that case, the healthcheck
is not interrupted as it should be. Concretely, it could only happen when a
connection is established.

It is hard to predict the effects of this bug. It may be unimportant. But it
could probably lead to a crash. To avoid any issue, a SOCKERR status is now
set by default when a connection error is reported. There is no reason to
report a connection error for nothing. So a healthcheck failure must be
reported. There is no "internal error" status. So a socket error is
reported.

This patch must be backported to all stable versions.
2025-06-16 17:47:35 +02:00
Christopher Faulet
fb76655526 MINOR: cli: handle EOS/ERROR first
This is not strictly a bug fix. But APPCTX_FL_EOS and APPCTX_FL_ERROR
flags must be handled first. These flags are set by the applet itself and
should mark the end of all processing. So there is no reason to get the
output buffer in the first place.

This patch could be backported as far as 3.0.
2025-06-16 16:47:59 +02:00
Christopher Faulet
396f0252bf BUG/MEDIUM: cli: Don't consume data if outbuf is full or not available
The output buffer must be available to process a command, at least to be
able to emit error messages. When this buffer is full or cannot be
allocated, we must wait. In that case, we must take care to notify that the
SE will not consume input data. It is important to avoid wakeups in a loop,
especially when the client aborts.

When the output buffer is available again and no longer full, and the CLI
applet is waiting for a command line, it must notify it will consume input
data.

This patch must be backported as far as 3.0.
2025-06-16 16:47:59 +02:00
Amaury Denoyelle
96badf86a2 BUG/MINOR: quic: fix ODCID initialization on frontend side
QUIC support on the backend side has been implemented recently. This has
led to some adjustments on qc_new_conn() to handle both FE and BE sides,
with some of these changes performed by the following commit.

  29fb1aee57288a8b16ed91771ae65c2bfa400128
  MINOR: quic-be: QUIC connection allocation adaptation (qc_new_conn())

An issue was introduced during some code adjustment. Initialization of
ODCID was incorrectly performed, which caused haproxy to emit invalid
transport parameters. Most of the clients detected this and immediately
closed the connection.

Fix this by adjusting qc_lstnr_params_init() invocation: replace
<qc.dcid>, which in fact points to the received SCID, by <qc.odcid>
whose purpose is dedicated to original DCID storage.

This fixes github issue #3006. This issue also caused the majority of
tests in the interop to fail.

No backport needed.
2025-06-16 10:09:37 +02:00
Frederic Lecaille
5409a73721 BUG/MINOR: quic: Fix OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn callback (OpenSSL3.5)
This patch is OpenSSL3.5 QUIC API specific. It fixes
OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn() callback (see man(3) SSL_set_quic_tls_cb).

The role of this callback is to store the transport parameters received from the peer.
At this time it is never used by QUIC listeners because there is another callback
which is used to store the transport parameters. This latter callback is not specific
to OpenSSL 3.5 QUIC API. As far as I know, the TLS stack calls only once
one of the callbacks which have been set to receive and store the transport parameters.

That said, OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn() is called for QUIC
backends to store the server transport parameters.

qc_ssl_set_quic_transport_params() is useless in this callback. It is dedicated
to storing the local transport parameters (which are sent to the peer). Furthermore
<server> second parameter of quic_transport_params_store() must be 0 for a listener
(or QUIC server) which calls it, denoting it does not receive the transport parameters
of a QUIC server. It must be 1 for a QUIC backend (a QUIC client which receives
the transport parameters of a QUIC server).

Must be backported to 3.2.
2025-06-16 10:02:45 +02:00
Amaury Denoyelle
ab6895cc65 MINOR: hq-interop: handle HTX response forward if not enough space
On backend side, HTTP/0.9 response body is copied into stream data HTX
buffer. Properly handle the case where the HTX out buffer space is too
small. Only copy part of the HTTP response. Transcoding will
be restarted when new room is available.
2025-06-13 17:41:13 +02:00
Amaury Denoyelle
46cee07931 BUG/MINOR: quic: don't restrict reception on backend privileged ports
When QUIC is used on the frontend side, communication is restricted with
clients using privileged ports. This is a simple protection against
DNS/NTP spoofing.

This feature should not be activated on the backend side, as in this
case it is quite frequent to exchange with servers running on privileged
ports. As such, a new parameter is added to quic_recv() so that it is
only active on the frontend side.

Without this patch, it is impossible to communicate with QUIC servers
running on privileged ports, as incoming datagrams would be silently
dropped.

No need to backport.
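
The side-dependent filter can be sketched as below. demo_accept_dgram() is
a hypothetical function, and the 1024 threshold is an assumption based on
the conventional definition of privileged ports. The actual quic_recv()
parameter and port check may differ.

```c
#include <stdint.h>

/* Decide whether a datagram coming from <src_port> may be processed.
 * Returns 1 to accept, 0 to silently drop.
 */
int demo_accept_dgram(uint16_t src_port, int is_frontend)
{
	/* Only clients are expected to use unprivileged source ports.
	 * Servers reached on the backend side commonly listen on
	 * privileged ones, so the check is skipped there.
	 */
	if (is_frontend && src_port < 1024)
		return 0;
	return 1;
}
```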
2025-06-13 16:40:21 +02:00
Christopher Faulet
edb8f2bb60 BUG/MINOR: http-ana: Properly handle keep-query redirect option if no QS
The keep-query redirect option must do nothing if there is no query-string.
However, there is a bug. When there is no QS, an error is returned, leading
to return a 500-internal-error to the client.

To fix the bug, instead of returning 0 when there is no QS, we just skip the
QS processing.

This patch should fix the issue #3005. It must be backported as far as 3.1.
2025-06-13 11:27:20 +02:00
Amaury Denoyelle
577fa44691 BUG/MINOR: quic: work around NEW_TOKEN parsing error on backend side
NEW_TOKEN frame is never emitted by a client, hence parsing was not
tested on frontend side.

On backend side, an issue can occur, as expected token length is static,
based on the token length used internally by haproxy. This is not
sufficient for most server implementations, which use larger tokens. This
causes a parsing error, which may cause skipping of following frames in
the same packet. This issue was detected using ngtcp2 as server.

As for now tokens are unused by haproxy, simply discard test on token
length during NEW_TOKEN frame parsing. The token itself is merely
skipped without being stored. This is sufficient for now to continue on
experimenting with QUIC backend implementation.

This does not need to be backported.
2025-06-12 17:47:15 +02:00
Amaury Denoyelle
830affc17d MINOR: server: reject QUIC servers without explicit SSL
Report an error during server configuration if QUIC is used but SSL is
not activated via 'ssl' keyword. This is done in _srv_parse_finalize(),
which is used by both static and dynamic servers.

Note that contrary to listeners, an error is reported instead of a
warning, and SSL is not automatically activated if missing. This is
mainly due to the complex server configuration : _srv_parse_finalize()
is ideal to affect every server, including dynamic entries. However, it
is executed after server SSL context allocation performed via
<prepare_srv> XPRT operation. A proper fix would be to move SSL ctx
alloc in _srv_parse_finalize(), but this may have unknown impact. Thus,
for now a simpler solution has been chosen.
2025-06-12 16:16:43 +02:00
Amaury Denoyelle
33cd96a5e9 BUG/MINOR: quic: prevent crash on startup with -dt
QUIC traces in ssl_quic_srv_new_ssl_ctx() are problematic as this
function is called early during startup. If activating traces via -dt
command-line argument, a crash occurs due to stderr sink not yet
available.

Thus, traces from ssl_quic_srv_new_ssl_ctx() are simply removed.

No backport needed.
2025-06-12 15:15:56 +02:00
Frederic Lecaille
5a0ae9e9be MINOR: quic-be: Avoid SSL context unreachable code without USE_QUIC_OPENSSL_COMPAT
This commit added a "err" C label reachable only with USE_QUIC_OPENSSL_COMPAT:

   MINOR: quic-be: Missing callbacks initializations (USE_QUIC_OPENSSL_COMPAT)

leading coverity to warn this:

*** CID 1611481:         Control flow issues  (UNREACHABLE)
/src/quic_ssl.c: 802             in ssl_quic_srv_new_ssl_ctx()
796     		goto err;
797     #endif
798
799      leave:
800     	TRACE_LEAVE(QUIC_EV_CONN_NEW);
801     	return ctx;
>>>     CID 1611481:         Control flow issues  (UNREACHABLE)
>>>     This code cannot be reached: "err:
SSL_CTX_free(ctx);".
802      err:
803     	SSL_CTX_free(ctx);
804     	ctx = NULL;
805     	TRACE_DEVEL("leaving on error", QUIC_EV_CONN_NEW);
806     	goto leave;
807     }

The least intrusive (without #ifdef) way to fix this is to add a "goto err"
statement from the code part which is reachable without USE_QUIC_OPENSSL_COMPAT.

Thank you to @chipitsine for having reported this issue in GH #3003.
2025-06-12 11:45:21 +02:00
Frederic Lecaille
869fb457ed BUG/MINOR: quic-be: CID double free upon qc_new_conn() failures
This issue may occur when qc_new_conn() fails after having allocated
and attached <conn_cid> to its tree. This is the case when compiling
haproxy against WolfSSL for an unknown reason at this time. In this
case the <conn_cid> is freed by pool_head_quic_connection_id(), then
freed again by quic_conn_release().

This bug arrived with this commit:

    MINOR: quic-be: QUIC connection allocation adaptation (qc_new_conn())

So, the aim of this patch is to free <conn_cid> only for QUIC backends
and if it is not attached to its tree. This is the case when <conn_id>
local variable passed with NULL value to qc_new_conn() is then initialized
to the same <conn_cid> value.
2025-06-12 11:45:21 +02:00
Frederic Lecaille
dc3fb3a731 CLEANUP: quic-be: Add comments about qc_new_conn() usage
This patch should have come with this last commit for the last qc_new_conn()
modifications for QUIC backends:

     MINOR: quic-be: get rid of ->li quic_conn member

qc_new_conn() must be passed NULL pointers for several variables as mentioned
by the comment. Some of these local variables are used to avoid too many
code modifications.
2025-06-12 11:45:21 +02:00
Amaury Denoyelle
603afd495b MINOR: hq-interop: encode request from HTX for backend side support
Implement transcoding of a HTX request into HTTP/0.9. This protocol is a
simplified version of HTTP. Request only supports GET method without any
header. As such, only a request line is written during snd_buf
operation.
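
Since an HTTP/0.9 request is only "GET <path>" followed by CRLF, the
encoding step can be sketched as below. demo_hq_req_line() is a
hypothetical helper working on a plain char buffer, whereas the real code
writes into haproxy buffers during snd_buf.

```c
#include <stddef.h>
#include <stdio.h>

/* Write an HTTP/0.9 request line for <path> into <out> of <size> bytes.
 * Returns the number of bytes written (excluding the trailing NUL), or -1
 * if the line does not fit.
 */
int demo_hq_req_line(char *out, size_t size, const char *path)
{
	int ret = snprintf(out, size, "GET %s\r\n", path);

	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return ret;
}
```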
2025-06-12 11:28:54 +02:00
Amaury Denoyelle
a286d5476b MINOR: hq-interop: decode response into HTX for backend side support
Implement transcoding of a HTTP/0.9 response into a HTX message.

HTTP/0.9 is a really simple subset of the HTTP spec. The response does
not have any status line and contains only the payload body. Response
is finished when the underlying connection/stream is closed.

A status line is generated to be compliant with HTX. This is performed
on the first invocation of rcv_buf for the current stream. Status code
is set to 200. Payload body if present is then copied using
htx_add_data().
2025-06-12 11:28:54 +02:00
Amaury Denoyelle
4031bf7432 MINOR: quic: wakeup backend MUX on handshake completed
This commit is the second and final step to initiate QUIC MUX on the
backend side. On handshake completion, MUX is woken up just after its
creation. This step is necessary to notify the stream layer, via the QCS
instance pre-initialized on MUX init, so that the transfer can be
resumed.

This mode of operation is similar to TCP stack when TLS+ALPN are used,
which forces MUX initialization to be delayed after handshake
completion.
2025-06-12 11:28:54 +02:00
Amaury Denoyelle
1efaca8a57 MINOR: mux-quic: instantiate first stream on backend side
Adjust qmux_init() to handle frontend and backend sides differently.
Most notably, on backend side, the first bidirectional stream is created
preemptively. This step is necessary as MUX layer will be woken up just
after handshake completion.
2025-06-12 11:28:54 +02:00
Amaury Denoyelle
f8d096c05f MINOR: mux-quic: set expect data only on frontend side
Stream data layer is notified that data is expected when FIN is
received, which marks the end of the HTTP request. This prepares data
layer to be able to handle the expected HTTP response.

Thus, this step is only relevant on frontend side. On backend side, FIN
marks the end of the HTTP response. No further content is expected, thus
expect data should not be set in this case.

Note that se_expect_data() invocation via qcs_attach_sc() is not
protected. This is because this function will only be called during
request headers parsing which is performed on the frontend side.
2025-06-12 11:28:54 +02:00
Amaury Denoyelle
e8775d51df MINOR: mux-quic: define flag for backend side
Mux connection is flagged with new QC_CF_IS_BACK if used on the backend
side. For now the only change is during traces, to be able to
differentiate frontend and backend usage.
2025-06-12 11:28:54 +02:00
Amaury Denoyelle
93b904702f MINOR: mux-quic: improve documentation for snd/rcv app-ops
Complete documentation for rcv_buf/snd_buf operations. In particular, return
value is now explicitly defined. For H3 layer, associated functions
documentation is also extended.
2025-06-12 11:28:54 +02:00
Amaury Denoyelle
e7f1db0348 MINOR: quic: mark ctrl layer as ready on quic_connect_server()
Use conn_ctrl_init() on the connection when quic_connect_server()
succeeds. This is necessary so that the connection is considered as
completely initialized. Without this, connect operation will be called
again if connection is reused.
2025-06-12 11:25:12 +02:00
Amaury Denoyelle
a0db93f3d8 MEDIUM: backend: delay MUX init with ALPN even if proto is forced
On backend side, multiplexer layer is initialized during
connect_server(). However, this step is not performed if ALPN is used,
as the negotiated protocol may be unknown. Multiplexer initialization is
delayed after TLS handshake completion.

There are still exceptions though that force the MUX to be initialized
even if ALPN is used. One of them was if <mux_proto> server field was
already set at this stage, which is the case when an explicit proto is
selected on the server line configuration. Remove this condition so that
now MUX init is delayed with ALPN even if proto is forced.

The scope of this change should be minimal. In fact, the only impact
concerns server config with both proto and ALPN set, which is pretty
unlikely as it is contradictory.

The main objective of this patch is to prepare QUIC support on the
backend side. Indeed, QUIC proto will be forced on the server if a QUIC
address is used, similarly to bind configuration. However, we still want
to delay MUX initialization after QUIC handshake completion. This is
mandatory to know the selected application protocol, required during
QUIC MUX init.
2025-06-12 11:21:32 +02:00
Amaury Denoyelle
044ad3a602 BUG/MEDIUM: mux-quic: adjust wakeup behavior
Change wake callback behavior for QUIC MUX. This operation loops over
each QCS and notifies their stream data layer on certain events via
internal helper qcc_wake_some_streams().

Previously, streams were notified only if an error occurred on the
connection. Change this to notify streams data layer every time wake
callback is used. This behavior is now identical to H2 MUX.

qcc_wake_some_streams() is also renamed to qcc_wake_streams(), as it
better reflects its true behavior.

This change should not have performance impact as wake mux ops should
not be called frequently. Note that qcc_wake_streams() can also be
called directly via qcc_io_process() to ensure a new error is correctly
propagated. As wake callback first uses qcc_io_process(), it will only
call qcc_wake_streams() if no error is present.

No known issue is associated with this commit. However, it could prevent
freezing transfer under certain condition. As such, it is considered as
a bug fix worthy of backporting.

This should be backported after a period of observation.
2025-06-12 11:12:49 +02:00
Christopher Faulet
2c3f3eaaed BUILD: hlua: Fix warnings about uninitialized variables (2)
It was still failing on Ubuntu-24.04 with GCC+ASAN. So, instead of
trying to understand the code path the compiler followed to report uninitialized
variables, let's init them now.

No backport needed.
2025-06-12 10:49:54 +02:00
Aurelien DARRAGON
b5067a972c BUILD: listener: fix 'for' loop inline variable declaration
commit 16eb0fab3 ("MAJOR: counters: dispatch counters over thread groups")
introduced a build regression on some compilers:

  src/listener.c: In function 'listener_accept':
  src/listener.c:1095:3: error: 'for' loop initial declarations are only allowed in C99 mode
     for (int it = 0; it < global.nbtgroups; it++)
     ^
  src/listener.c:1095:3: note: use option -std=c99 or -std=gnu99 to compile your code
  src/listener.c:1101:4: error: 'for' loop initial declarations are only allowed in C99 mode
      for (int it = 0; it < global.nbtgroups; it++) {
      ^
  make: *** [src/listener.o] Error 1
  make: *** Waiting for unfinished jobs....

Let's fix that.
No backport needed
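
For reference, a form that builds without C99 mode declares the loop
variable at block scope instead of inside the for() statement. The function
below is a hypothetical standalone illustration, not the listener_accept()
code.

```c
/* Sum <nbtgroups> per-thread-group counters. */
long demo_sum_counters(const long *per_group, int nbtgroups)
{
	long total = 0;
	int it; /* declared at block scope, not inside the for() */

	for (it = 0; it < nbtgroups; it++)
		total += per_group[it];
	return total;
}
```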
2025-06-12 08:46:36 +02:00
Christopher Faulet
01f011faeb BUILD: hlua: Fix warnings about uninitialized variables
In hlua_applet_tcp_recv_try() and hlua_applet_tcp_getline_yield(), GCC 14.2
reports warnings about 'blk2' variable that may be used uninitialized. It is
a bit strange because the code is pretty similar to before. But to make it
happy and to avoid bugs if the API changes in the future, 'blk2' is now used only
when its length is greater than 0.

No need to backport.
2025-06-12 08:46:36 +02:00
Christopher Faulet
8c573deb9f BUG/MINOR: hlua: Don't forget the return statement after a hlua_yieldk()
In hlua_applet_tcp_getline_yield(), the function may yield if there is no
data available. However we must take care to add a return statement just
after the call to hlua_yieldk(). I don't know the details of the LUA API,
but at least, this return statement fixes a build error about uninitialized
variables that may be used.

It is a 3.3-specific issue. No backport needed.
2025-06-12 08:46:36 +02:00
Frederic Lecaille
bf6e576cfd MEDIUM: quic-be: initialize MUX on handshake completion
On backend side, MUX is instantiated after QUIC handshake completion.
This step is performed via qc_ssl_provide_quic_data(). First, connection
flags for handshake completion are reset. Then, MUX is instantiated
via conn_create_mux() function.
2025-06-11 18:37:34 +02:00
Amaury Denoyelle
cdcecb9b65 MINOR: quic: define proper proto on QUIC servers
Force QUIC as <mux_proto> for server if a QUIC address is used. This is
similar to what is already done for bind instances on the frontend
side. This step ensures that conn_create_mux() will select the proper
protocol.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
855fd63f90 MINOR: quic-be: Prevent the MUX to send/receive data
Such actions must be interrupted until the handshake completion.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
b9703cf711 MINOR: quic-be: get rid of ->li quic_conn member
Replace ->li quic_conn pointer to struct listener member by ->target which is
an object type enum and adapt the code.
Use __objt_(listener|server)() where the object type is known. Typically
this is where the code is specific to one connection type (frontend/backend).
Remove <server> parameter passed to qc_new_conn(). It is redundant with the
<target> parameter.
GSO is not supported at this time for QUIC backend. qc_prep_pkts() is modified
to prevent it from building more than an MTU. As a consequence, this prevents
qc_send_ppkts() from using GSO.
ssl_clienthello.c code is run only by listeners. This is why __objt_listener()
is used in place of ->li.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
f6ef3bbc8a MINOR: quic-be: SSL_get_peer_quic_transport_params() not defined by OpenSSL 3.5 QUIC API
Disable the code around SSL_get_peer_quic_transport_params() as this was done
for USE_QUIC_OPENSSL_COMPAT because SSL_get_peer_quic_transport_params() is not
defined by OpenSSL 3.5 QUIC API.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
034cf74437 MINOR: quic-be: Make the secret derivation works for QUIC backends (USE_QUIC_OPENSSL_COMPAT)
quic_tls_compat_keylog_callback() is the callback used by the QUIC OpenSSL
compatibility module to derive the TLS secrets from other secrets provided
by keylog. The <write> local variable to this function is initialized to denote
the direction (write to send, read to receive) the secret is supposed to be used
for. That said, as the QUIC cryptographic algorithms are symmetrical, the
direction is inverted between the peers: a secret which is used to write/send/cipher
data from a peer point of view is also the secret which is used to
read/receive/decipher data. This was confirmed by the fact that without this
patch, the TLS stack first provides the peer with Handshake to send/cipher
data. The client could not use such secret to decipher the Handshake packets
received from the server. This patch simply reverses the direction stored by
<write> variable to make the secrets derivation work for the QUIC client.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
d1cd0bb987 MINOR: quic-be: Missing callbacks initializations (USE_QUIC_OPENSSL_COMPAT)
quic_tls_compat_init() function is called from OpenSSL QUIC compatibility module
(USE_QUIC_OPENSSL_COMPAT) to initialize the keylog callback and the callback
which stores the QUIC transport parameters as a TLS extension into the stack.
These callbacks must also be initialized for QUIC backends.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
fc90964b55 MINOR: quic-be: Store the remote transport parameters asap
This is done from TLS secrets derivation callback at Application level (the last
encryption level) calling SSL_get_peer_quic_transport_params() to have an access
to the TLS transport parameters extension embedded into the Server Hello TLS message.
Then, quic_transport_params_store() is called to store a decoded version of
these transport parameters.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
8c2f2615f4 MINOR: quic-be: I/O handler switch adaptation
For connection to QUIC servers, this patch modifies the moment where the I/O
handler callback is switched to quic_conn_app_io_cb(). This is no longer
done just after the handshake has completed, as for listeners, but just after
it has been confirmed.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
f085a2f5bf MINOR: quic-be: Initial packet number space discarding.
Discard the Initial packet number space as soon as possible. This is done
during handshakes in quic_conn_io_cb() as soon as a Handshake packet could
be successfully sent.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
a62098bfb0 MINOR: quic-be: Add the conn object to the server SSL context
The initialization of the <ssl_app_data_index> SSL user data index is required
to make all the SSL sessions to QUIC servers work, as this is done for TCP
servers. The conn object is notably retrieved by SSL callbacks which are
server specific (e.g. ssl_sess_new_srv_cb()).
2025-06-11 18:37:34 +02:00
Frederic Lecaille
e226a7cb79 MINOR: quic-be: Build post handshake frames
This action is not specific to listeners. A QUIC client also has to send
NEW_CONNECTION_ID frames.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
2d076178c6 MINOR: quic-be: Store asap the DCID
Store the peer connection ID (SCID) as the connection DCID as soon as an Initial
packet is received.
Stop comparing the packet type to QUIC_PACKET_TYPE_0RTT when it has already matched
QUIC_PACKET_TYPE_INITIAL.
A QUIC server must not send too short datagrams with ack-eliciting packets inside.
This cannot be checked from quic_rx_pkt_parse() because one does not know if
there is an ack-eliciting frame in the Initial packets. If the packet must be
dropped, this is after having parsed it!
2025-06-11 18:37:34 +02:00
Frederic Lecaille
b4a9b53515 MINOR: h3-be: Correctly retrieve h3 counters
This is done using qc_counters() function which supports also QUIC servers.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
e27b7b4889 MINOR: quic-be: Handshake packet number space discarding
This is done for QUIC clients (or haproxy QUIC servers) when the handshake is
confirmed.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
43d88a44f1 MINOR: quic-be: Datagrams and packet parsing support
Modify quic_dgram_parse() to stop passing it a listener as third parameter.
Instead, the object type address of the connection socket owner is passed
to support the haproxy servers with QUIC as transport protocol.
qc_owner_obj_type() is implemented to return this address.
qc_counters() is also implemented to return the QUIC specific counters of
the proxy owning the connection.
quic_rx_pkt_parse() called by quic_dgram_parse() is also modified to use
the object type address used by the latter as last parameter. It is
also modified to send Retry packets only from listeners. A QUIC client
(connection to haproxy QUIC servers) must drop the Initial packets with
non null token length. It is also not supposed to receive 0-RTT packets
which are dropped.
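
The two client-side drop rules above can be sketched as follows. This is a
simplified illustration and not the quic_rx_pkt_parse() code: the pkt_type
enum and client_must_drop() are hypothetical names.

```c
#include <stddef.h>

/* Illustrative long header packet types (not the haproxy names). */
enum pkt_type { PKT_INITIAL, PKT_0RTT, PKT_HANDSHAKE, PKT_RETRY };

/* Returns 1 if a QUIC client must drop a received long header packet
 * according to the two rules listed above, 0 otherwise.
 */
int client_must_drop(enum pkt_type type, size_t token_len)
{
    if (type == PKT_0RTT)
        return 1; /* only clients send 0-RTT packets */
    if (type == PKT_INITIAL && token_len != 0)
        return 1; /* an Initial packet from a server carries no token */
    return 0;
}
```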
2025-06-11 18:37:34 +02:00
Frederic Lecaille
266b10b8a4 MINOR: quic-be: Do not redispatch the datagrams
The QUIC datagram redispatch is there to counter the race condition which
exists only for QUIC connections to listener where datagrams may arrive
on the wrong socket between the bind() and connect() calls.
Run this code part only for listeners.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
89d5a59933 MINOR: quic-be: add field for max_udp_payload_size into quic_conn
Add ->max_udp_payload_size new member to quic_conn struct.
Initialize it from qc_new_conn().
Adapt qc_snd_buf() to use it.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
f7c0f5ac1b MINOR: quic-be: xprt ->init() adapatations
Allocate a connection to connect to QUIC servers from qc_conn_init() which is the
->init() QUIC xprt callback.
Also initialize ->prepare_srv and ->destroy_srv callbacks as this is done for TCP
servers.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
29fb1aee57 MINOR: quic-be: QUIC connection allocation adaptation (qc_new_conn())
For haproxy QUIC servers (or QUIC clients), the peer is considered as validated.
This is a property which is more specific to QUIC servers (haproxy QUIC listeners).
No <odcid> is used for the QUIC client connection. It is used only on the QUIC server side.
The <token_odcid> is also not used on the QUIC client side. It must be embedded into
the transport parameters only on the QUIC server side.
The quic_conn is created before the socket allocation. So, the local address is
zeroed.
Initialize the transport parameters with qc_srv_params_init().
Stop hardcoding the <server> parameter passed value to qc_new_isecs() to correctly
initialize the Initial secrets.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
9831f596ea MINOR: quic-be: ->connect() protocol callback adaptations
Modify quic_connect_server() which is the ->connect() callback for QUIC protocol:
    - add a BUG_ON() run when entering this function: the <fd> socket must equal -1
    - conn->handle is a union. conn->handle.qc is used for QUIC connections,
      conn->handle.fd must not be used to store the fd.
    - code alignment fix for setsockopt(fd, SOL_SOCKET, (SO_SNDBUF|SO_RCVBUF))
      statements
    - remove the section of code which was duplicated from ->connect() TCP callback
    - fd_insert() the new socket file descriptor created to connect to the QUIC
      server with quic_conn_sock_fd_iocb() as callback for read event.
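
The reason why conn->handle.fd must not be written for a QUIC connection can
be illustrated with a reduced union. The types below are stand-ins, not the
real haproxy definitions: both members start at the same address, so storing
the fd would clobber the QUIC connection pointer.

```c
struct quic_conn_stub { int id; };   /* stand-in for the real quic_conn */

/* Simplified connection handle: both members share the same storage. */
union conn_handle {
    int fd;                      /* socket fd, used by TCP connections */
    struct quic_conn_stub *qc;   /* QUIC connection, used by QUIC */
};

/* Returns 1 if <fd> and <qc> live at the same address, which is always
 * the case for union members.
 */
int handle_members_overlap(void)
{
    union conn_handle h;

    return (void *)&h.fd == (void *)&h.qc;
}
```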
2025-06-11 18:37:34 +02:00
Frederic Lecaille
52ec3430f2 MINOR: sock: Add protocol and socket types parameters to sock_create_server_socket()
This patch only adds <proto_type> new proto_type enum parameter and <sock_type>
socket type parameter to sock_create_server_socket() and adapts its callers.
This is to prepare the use of this function by QUIC servers/backends.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
9c84f64652 MINOR: quic-be: Add a function to initialize the QUIC client transport parameters
Implement qc_srv_params_init() to initialize the QUIC client transport parameters
in relation with connections to haproxy servers/backends.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
f49bbd36b9 MINOR: quic-be: SSL sessions initializations
Modify qc_alloc_ssl_sock_ctx() to pass the connection object as parameter. It is
NULL for a QUIC listener, not NULL for a QUIC server. This connection object is
set as value for ->conn quic_conn struct member. Initialise the SSL session object from
this function for QUIC servers.
qc_ssl_set_quic_transport_params() is also modified to pass the SSL object as parameter.
This is the unique parameter this function needs. <qc> parameter is used only for
the trace.
SSL_do_handshake() must be called as soon as the SSL object is initialized for
the QUIC backend connection. This triggers the TLS CRYPTO data delivery.
tasklet_wakeup() is also called to send asap these CRYPTO data.
Modify the QUIC_EV_CONN_NEW event trace to dump the potential errors returned by
SSL_do_handshake().
2025-06-11 18:37:34 +02:00
Frederic Lecaille
1408d94bc4 MINOR: quic-be: ssl_sock contexts allocation and misc adaptations
Implement ssl_sock_new_ssl_ctx() to allocate a SSL server context as this is currently
done for TCP servers and also for QUIC servers depending on the <is_quic> boolean value
passed as new parameter. For QUIC servers, this function calls ssl_quic_srv_new_ssl_ctx()
which is specific to QUIC.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
7c76252d8a MINOR: quic-be: Correct the QUIC protocol lookup
From connect_server(), the QUIC protocol could not be retrieved by protocol_lookup()
because of the PROTO_TYPE_STREAM default passed as argument. Instead, to support
QUIC, srv->addr_type.proto_type may be safely passed.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
1e45690656 MINOR: quic-be: Add a function for the TLS context allocations
Implement ssl_quic_srv_new_ssl_ctx() whose aim is to allocate a TLS context
for QUIC servers.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
a4e1296208 MINOR: quic-be: QUIC server xprt already set when preparing their CTXs
The QUIC servers xprts have already been set at server line parsing time.
This patch prevents the QUIC servers xprts from being reset to <ssl_sock> value which is
the value used for SSL/TCP connections.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
24fc44c44d MINOR: quic-be: QUIC backend XPRT and transport parameters init during parsing
Add ->quic_params new member to server struct.
Also set the ->xprt member of the server being initialized and initialize asap its
transport parameters from _srv_parse_init().
2025-06-11 18:37:34 +02:00
Frederic Lecaille
0e67687ca9 MINOR: quic-be: Call ->prepare_srv() callback at parsing time
This XPRT callback is called from check_config_validity() after the configuration
has been parsed to initialize all the SSL server contexts.

This patch implements the same thing for the QUIC servers.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
5a711551a2 MINOR: quic-be: Version Information transport parameter check
Add a little check to verify that the version chosen by the server matches
the client one. Initialize the local transport parameters ->negotiated_version
value with this version if this is the case. If not, return 0.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
990c9f95f7 MINOR: quic-be: Correct Version Information transp. param encoding
According to the RFC, a QUIC client must encode the QUIC versions it supports
into the "Available Versions" of the "Version Information" transport parameter,
ordered by descending preference.

This is done by defining <quic_version_2> and <quic_version_draft_29> new variables,
pointers to the corresponding version of <quic_versions> array elements.
A client announces its available versions as follows: v1, v2, draft29.
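
As an illustration, the value of such a parameter can be built as below. This
is a sketch and not the haproxy encoder: encode_version_info() and put_u32()
are hypothetical helpers. It assumes the layout of the Version Information
parameter value (a 32-bit Chosen Version followed by 32-bit Available
Versions, all in network byte order) and the usual version numbers for v1,
v2 and draft-29.

```c
#include <stddef.h>
#include <stdint.h>

#define QUIC_V1       0x00000001U
#define QUIC_V2       0x6b3343cfU
#define QUIC_DRAFT29  0xff00001dU

/* Write <v> in network byte order at <p>, return the next position. */
static unsigned char *put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

/* Encode the value of a client Version Information transport parameter:
 * <chosen> followed by the available versions by descending preference.
 * Returns the number of bytes written, or 0 if <len> is too small.
 */
size_t encode_version_info(unsigned char *buf, size_t len, uint32_t chosen)
{
    static const uint32_t available[] = { QUIC_V1, QUIC_V2, QUIC_DRAFT29 };
    size_t i, n = sizeof(available) / sizeof(available[0]);
    unsigned char *p = buf;

    if (len < 4 * (n + 1))
        return 0;
    p = put_u32(p, chosen);
    for (i = 0; i < n; i++)
        p = put_u32(p, available[i]);
    return (size_t)(p - buf);
}
```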
2025-06-11 18:37:34 +02:00
Amaury Denoyelle
9c751a3cc1 MINOR: mux-quic-be: allow QUIC proto on backend side
Activate QUIC protocol support for MUX-QUIC on the backend side,
additionally to current frontend support. This change is mandatory to be
able to implement QUIC on the backend side.

Without this modification, it is impossible to explicitly activate the QUIC
protocol on a server line, hence an error is reported:
  config : proxy 'xxxx' : MUX protocol 'quic' is not usable for server 'yyyy'
2025-06-11 18:37:34 +02:00
Amaury Denoyelle
f66b495f8e MINOR: server: mark QUIC support as experimental
Mark QUIC address support for servers as experimental on the backend
side. Previously, it was allowed but wouldn't function as expected. As
QUIC backend support requires several changes, it is better to declare
it as experimental first.
2025-06-11 18:37:33 +02:00
Amaury Denoyelle
bdd5e58179 MINOR: server: implement helper to identify QUIC servers
Define srv_is_quic() which can be used to quickly identify if a server
uses QUIC protocol.
2025-06-11 18:37:19 +02:00
Amaury Denoyelle
1ecf2e9bab BUG/MINOR: config/server: reject QUIC addresses
QUIC is not implemented on the backend side. To prevent any issue, it is
better to reject any server configured which uses it. This is done via
_srv_parse_init() which is used both for static and dynamic servers.

This should be backported up to all stable versions.
2025-06-11 18:37:17 +02:00
Christopher Faulet
b5525fe759 [RELEASE] Released version 3.3-dev1
Released version 3.3-dev1 with the following main changes :
    - BUILD: tools: properly define ha_dump_backtrace() to avoid a build warning
    - DOC: config: Fix a typo in 2.7 (Name format for maps and ACLs)
    - REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+ (5)
    - REGTESTS: Remove REQUIRE_VERSION=2.3 from all tests
    - REGTESTS: Remove REQUIRE_VERSION=2.4 from all tests
    - REGTESTS: Remove tests with REQUIRE_VERSION_BELOW=2.4
    - REGTESTS: Remove support for REQUIRE_VERSION and REQUIRE_VERSION_BELOW
    - MINOR: server: group postinit server tasks under _srv_postparse()
    - MINOR: stats: add stat_col flags
    - MINOR: stats: add ME_NEW_COMMON() helper
    - MINOR: proxy: collect per-capability stat in proxy_cond_disable()
    - MINOR: proxy: add a true list containing all proxies
    - MINOR: log: only run postcheck_log_backend() checks on backend
    - MEDIUM: proxy: use global proxy list for REGISTER_POST_PROXY_CHECK() hook
    - MEDIUM: server: automatically add server to proxy list in new_server()
    - MEDIUM: server: add and use srv_init() function
    - BUG/MAJOR: leastconn: Protect tree_elt with the lbprm lock
    - BUG/MEDIUM: check: Requeue healthchecks on I/O events to handle check timeout
    - CLEANUP: applet: Update comment for applet_put* functions
    - DEBUG: check: Add the healthcheck's expiration date in the trace messags
    - BUG/MINOR: mux-spop: Fix null-pointer deref on SPOP stream allocation failure
    - CLEANUP: sink: remove useless cleanup in sink_new_from_logger()
    - MAJOR: counters: add shared counters base infrastructure
    - MINOR: counters: add shared counters helpers to get and drop shared pointers
    - MINOR: counters: add common struct and flags to {fe,be}_counters_shared
    - MEDIUM: counters: manage shared counters using dedicated helpers
    - CLEANUP: counters: merge some common counters between {fe,be}_counters_shared
    - MINOR: counters: add local-only internal rates to compute some maxes
    - MAJOR: counters: dispatch counters over thread groups
    - BUG/MEDIUM: cli: Properly parse empty lines and avoid crashed
    - BUG/MINOR: config: emit warning for empty args only in discovery mode
    - BUG/MINOR: config: fix arg number reported on empty arg warning
    - BUG/MINOR: quic: Missing SSL session object freeing
    - MINOR: applet: Add API functions to manipulate input and output buffers
    - MINOR: applet: Add API functions to get data from the input buffer
    - CLEANUP: applet: Simplify a bit comments for applet_put* functions
    - MEDIUM: hlua: Update TCP applet functions to use the new applet API
    - BUG/MEDIUM: fd: Use the provided tgid in fd_insert() to get tgroup_info
    - BUG/MINIR: h1: Fix doc of 'accept-unsafe-...-request' about URI parsing
2025-06-11 14:31:33 +02:00
Christopher Faulet
b2f64af341 BUG/MINIR: h1: Fix doc of 'accept-unsafe-...-request' about URI parsing
The description of the tests performed on the URI in H1 when the
'accept-unsafe-violations-in-http-request' option is set is wrong. It states that
only characters below 32 and 127 are blocked when this option is set,
suggesting that otherwise, when it is not set, all invalid characters in the
URI, according to the RFC3986, are blocked.

But in fact, it is not true. By default all characters below 32 and above 127
are blocked. And when 'accept-unsafe-violations-in-http-request' option is
set, characters above 127 (excluded) are accepted. But characters in
(33..126) are never checked, independently of this option.
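
The behavior described above can be summarized by the sketch below. It is not
the H1 parser code: h1_uri_char_blocked() is a hypothetical helper, and it
assumes that the boundary values 32 and 127 themselves are on the blocked
side, since only (33..126) is said to be never checked.

```c
/* Returns 1 if byte <c> of an H1 request URI is rejected, 0 otherwise.
 * <accept_unsafe> is non-zero when the
 * 'accept-unsafe-violations-in-http-request' option is set.
 * Assumption: 32 and 127 are blocked like the other control characters.
 */
int h1_uri_char_blocked(unsigned char c, int accept_unsafe)
{
    if (c <= 32 || c == 127)
        return 1;              /* blocked with or without the option */
    if (c >= 128)
        return !accept_unsafe; /* accepted only when the option is set */
    return 0;                  /* 33..126: never checked */
}
```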

This patch should fix the issue #2906. It should be backported as far as
3.0. For older versions, the documentation could also be clarified because
this part is not really clear.

Note the request URI validation is still under discussion because invalid
characters in (33..126) are never checked and some users request a stricter
parsing.
2025-06-10 19:17:56 +02:00
Olivier Houchard
6993981cd6 BUG/MEDIUM: fd: Use the provided tgid in fd_insert() to get tgroup_info
In fd_insert(), use the provided tgid to get the thread group info,
instead of using the one of the current thread, as we may call
fd_insert() from a thread of another thread group, which will happen at
least when binding the listeners. Otherwise we'd end up accessing the
thread mask containing the enabled threads of the wrong thread group, which
can lead to crashes if we're binding on threads not present in the
thread group.
This should fix Github issue #2991.

This should be backported up to 2.8.
2025-06-10 15:10:56 +02:00
Christopher Faulet
9df380a152 MEDIUM: hlua: Update TCP applet functions to use the new applet API
The functions responsible to extract data from the applet input buffer or to
push data into the applet output buffer are now relying on the newly added
functions in the applet API. This simplifies a bit the code.
2025-06-10 08:16:10 +02:00
Christopher Faulet
18f9c71041 CLEANUP: applet: Simplify a bit comments for applet_put* functions
Instead of repeating which buffer is used depending on the API used by the
applet, a reference to applet_get_outbuf() was added.
2025-06-10 08:16:10 +02:00
Christopher Faulet
79445766a3 MINOR: applet: Add API functions to get data from the input buffer
There were already functions to push data from the applet to the stream by
inserting them in the right buffer, depending on whether the applet was using
the legacy API or not. Here, functions to retrieve data pushed to the applet by the
stream were added:

  * applet_getchar   : Gets one character

  * applet_getblk    : Copies a full block of data

  * applet_getword   : Copies one text block representing a word using a
                       custom separator as delimiter

  * applet_getline   : Copies one text line

  * applet_getblk_nc : Gets one or two blocks of data

  * applet_getword_nc: Gets one or two blocks of text representing a word
                       using a custom separator as delimiter

  * applet_getline_nc: Gets one or two blocks of text representing a line
2025-06-10 08:16:10 +02:00
Christopher Faulet
0d8ecb1edc MINOR: applet: Add API functions to manipulate input and output buffers
In this patch, some functions were added to ease input and output buffers
manipulation, regardless of whether the corresponding applet is using its own
buffers or relying on channels buffers. The following functions were added:

  * applet_get_inbuf  : Get the buffer containing data pushed to the applet
                        by the stream

  * applet_get_outbuf : Get the buffer containing data pushed by the applet
                        to the stream

  * applet_input_data : Return the amount of data in the input buffer

  * applet_skip_input : Skips <len> bytes from the input buffer

  * applet_reset_input: Skips all bytes from the input buffer

  * applet_output_room: Returns the amount of space available in the output
                        buffer

  * applet_need_room  : Indicates that the applet has more data to deliver
                        and it needs more room in the output buffer to do
                        so
2025-06-10 08:16:10 +02:00
Frederic Lecaille
6b74633069 BUG/MINOR: quic: Missing SSL session object freeing
qc_alloc_ssl_sock_ctx() allocates an SSL_CTX object for each connection. It also
allocates an SSL object. When this function failed, it freed only the SSL_CTX object.
The correct way to free both of them is to call qc_free_ssl_sock_ctx().

Must be backported as far as 2.6.
2025-06-06 17:53:13 +02:00
Amaury Denoyelle
0cdf529720 BUG/MINOR: config: fix arg number reported on empty arg warning
If an empty argument is used in configuration, for example due to an
undefined environment variable, the rest of the line is not parsed. As
such, a warning is emitted to report this.

The warning was not totally correct as it reported the wrong argument
index. Fix this by this patch. Note that there is still an issue with
the "^" indicator, but this is not as easy to fix yet.

This is related to github issue #2995.

This should be backported up to 3.2.
2025-06-06 17:03:02 +02:00
Amaury Denoyelle
5f1fad1690 BUG/MINOR: config: emit warning for empty args only in discovery mode
Hide the warning about empty arguments outside of discovery mode. This is
necessary, else the message will be displayed twice, which hampers
haproxy output readability.

This should fix github issue #2995.

This should be backported up to 3.2.
2025-06-06 17:02:58 +02:00
Christopher Faulet
f5d41803d3 BUG/MEDIUM: cli: Properly parse empty lines and avoid crashed
Empty lines were not properly parsed and could lead to crashes because the
last argument was parsed outside of the cmdline buffer. Indeed, the last
argument is parsed to look for a possible payload pattern. It starts
one character after the newline at the end of the command line. But this is
only valid for a non-empty command line.

So, now, this case is properly detected and we leave if an empty line is
detected.

This patch must be backported to 3.2.
2025-06-05 10:46:13 +02:00
Aurelien DARRAGON
16eb0fab31 MAJOR: counters: dispatch counters over thread groups
Most fe and be counters are good candidates for being shared between
processes. They are now grouped inside "shared" struct sub member under
be_counters and fe_counters.

Now that they are properly identified, they would greatly benefit from being
shared over thread groups to reduce the cost of atomic operations when
updating them. For this, we take the current tgid into account so each
thread group only updates its own counters. For this to work, it is
mandatory that the "shared" member from {fe,be}_counters is initialized
AFTER global.nbtgroups is known, because each shared counter causes the stat
to be allocated global.nbtgroups times. When updating a counter without
concurrency, the first counter from the array may be updated.

To consult the shared counters (which requires aggregation of per-tgid
individual counters), some helper functions were added to counter.h to
ease code maintenance and avoid computing errors.
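
The principle can be sketched with C11 atomics as below. This is not the
haproxy implementation: struct tg_counter, its helpers and MAX_TGROUPS_STUB
are hypothetical names, the array is statically sized for simplicity, and
<tgid> is assumed to start at 1.

```c
#include <stdatomic.h>

#define MAX_TGROUPS_STUB 16   /* illustrative bound, not the haproxy one */

/* One slot per thread group. */
struct tg_counter {
    atomic_ullong per_tg[MAX_TGROUPS_STUB];
};

/* Update path: a thread only touches the slot of its own thread group,
 * so groups do not contend on the same memory location.
 */
void tg_counter_add(struct tg_counter *c, unsigned int tgid,
                    unsigned long long v)
{
    atomic_fetch_add_explicit(&c->per_tg[tgid - 1], v, memory_order_relaxed);
}

/* Read path: aggregate the slots of the <nbtgroups> thread groups. */
unsigned long long tg_counter_total(struct tg_counter *c,
                                    unsigned int nbtgroups)
{
    unsigned long long total = 0;
    unsigned int i;

    for (i = 0; i < nbtgroups; i++)
        total += atomic_load_explicit(&c->per_tg[i], memory_order_relaxed);
    return total;
}
```

Updates stay cheap because they are local to a group, while readers pay the
aggregation cost, which matches the need for dedicated read helpers.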
2025-06-05 09:59:38 +02:00
Aurelien DARRAGON
12c3ffbb48 MINOR: counters: add local-only internal rates to compute some maxes
cps_max (max new connections received per second), sps_max (max new
sessions per second) and http.rps_max (maximum new http requests per
second) all rely on shared counters (namely conn_per_sec, sess_per_sec and
http.req_per_sec). The problem is that shared counters are about to be
distributed over thread groups, and we cannot afford to compute the
total (for all thread groups) each time we update the max counters.

Instead, since such max counters (relying on shared counters) are very
few exceptions, let's add internal (sess,conn,req) per sec freq counters
that are dedicated to cps_max, sps_max and http.rps_max computing.

Thanks to that, related *_max counters shouldn't be negatively impacted
by the thread-group distribution, yet they will not benefit from it
either. Related internal freq counters are prefixed with "_" to emphasize
the fact that they should not be used for other purposes (the shared ones,
which are about to be distributed over thread groups in upcoming commits,
are still available and must be used instead). The internal ones could
eventually be removed at any time if we find another way to compute the
{cps,sps,http.rps}_max counters.
2025-06-05 09:59:31 +02:00
Aurelien DARRAGON
b72a8bb138 CLEANUP: counters: merge some common counters between {fe,be}_counters_shared
Now that we have a common struct between fe and be shared counters struct
let's perform some cleanup to merge duplicate members into the common
struct part. This will ease code maintenance.
2025-06-05 09:59:24 +02:00
Aurelien DARRAGON
b599138842 MEDIUM: counters: manage shared counters using dedicated helpers
proxies, listeners and server shared counters are now managed via helpers
added in one of the previous commits.

When guid is not set (ie: when not yet assigned), shared counters pointer
is allocated using calloc() (local memory) and a flag is set on the shared
counters struct to know how to manipulate (and free) it. Else if guid is
set, then it means that the counters may be shared, so while for now we
don't actually use a shared memory location, the API is ready for that.

The way it works, for proxies and servers (for which guid is not known
during creation), we first call counters_{fe,be}_shared_get with guid not
set, which results in local pointer being retrieved (as if we just
manually called calloc() to retrieve a pointer). Later (during postparsing)
if guid is set we try to upgrade the pointer from local to shared.

Lastly, since the memory location for some objects (proxies and servers
counters) may change from creation to postparsing, let's update
counters->last_change member directly under counters_{fe,be}_shared_get()
so we don't miss it.

No change of behavior is expected, this is only preparation work.
2025-06-05 09:59:17 +02:00
Aurelien DARRAGON
c10ce1c85b MINOR: counters: add common struct and flags to {fe,be}_counters_shared
fe_counters_shared and be_counters_shared may share some common members
since they are quite similar, so we add a common struct part shared
between the two. struct counters_shared is added for convenience as
a generic pointer to manipulate common members from fe or be shared
counters pointer.

Also, the first common member is added: shared fe and be counters now
have a flags member.
2025-06-05 09:59:10 +02:00
Aurelien DARRAGON
aa53887398 MINOR: counters: add shared counters helpers to get and drop shared pointers
Create include/haproxy/counters.h and src/counters.c files in anticipation
of further helpers, as some counter-specific tasks need to be carried
out and, since counters are shared between multiple object types (ie:
listener, proxy, server..), we need generic helpers.

Add some shared counters helpers which are not yet used but will be updated
in upcoming commits.
2025-06-05 09:59:04 +02:00
Aurelien DARRAGON
a0dcab5c45 MAJOR: counters: add shared counters base infrastructure
Shareable counters are now tagged as shared counters and are dynamically
allocated in separate memory area as a prerequisite for being stored
in shared memory area. For now, GUID and threads groups are not taken into
account, this is only a first step.

Also, we ensure all counters are now manipulated using atomic operations,
namely, "last_change" counter is now read from and written to using atomic
ops.

Despite the numerous changes caused by the counters being moved away from
counters struct, no change of behavior should be expected.
2025-06-05 09:58:58 +02:00
Aurelien DARRAGON
89b04f2191 CLEANUP: sink: remove useless cleanup in sink_new_from_logger()
As reported by Ilya in GH #2994, some cleanup parts in
sink_new_from_logger() function are not used.

We can actually simplify the cleanup logic to remove dead code, let's
do that by renaming "error_final" label to "error" and only making use
of the "error" label, because sink_free() already takes care of proper
cleanup for all sink members.
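
The single-label pattern relies on a free function which copes with partially
initialized objects. A minimal sketch, using stand-in types and names
(sink_stub, dup_str()) rather than the real sink code:

```c
#include <stdlib.h>
#include <string.h>

struct sink_stub {          /* stand-in, not the real struct sink */
    char *name;
    char *desc;
};

/* Frees every member. NULL-safe, so it also handles partial init. */
void sink_stub_free(struct sink_stub *s)
{
    if (!s)
        return;
    free(s->name);
    free(s->desc);
    free(s);
}

/* Portable string duplication. Returns NULL on NULL input or failure. */
static char *dup_str(const char *src)
{
    char *dst;

    if (!src)
        return NULL;
    dst = malloc(strlen(src) + 1);
    if (dst)
        strcpy(dst, src);
    return dst;
}

struct sink_stub *sink_stub_new(const char *name, const char *desc)
{
    struct sink_stub *s = calloc(1, sizeof(*s));

    if (!s)
        goto error;
    s->name = dup_str(name);
    if (!s->name)
        goto error;
    s->desc = dup_str(desc);
    if (!s->desc)
        goto error;
    return s;

 error:
    sink_stub_free(s);   /* one label is enough: the free function does it all */
    return NULL;
}
```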
2025-06-05 09:58:50 +02:00
Christopher Faulet
8c4bb8cab3 BUG/MINOR: mux-spop: Fix null-pointer deref on SPOP stream allocation failure
When we try to allocate a new SPOP stream, if an error is encountered,
spop_strm_destroy() is called to release the possibly allocated
stream. But it must only be called if a stream was allocated. If the
reported error is an SPOP stream allocation failure, we must just leave to
avoid a null-pointer dereference.

This patch should fix point 1 of the issue #2993. It must be backported as
far as 3.1.
2025-06-04 08:48:49 +02:00
Christopher Faulet
6786b05297 DEBUG: check: Add the healthcheck's expiration date in the trace messags
It could help to diagnose some issues about timeout processing. So let's add
it !
2025-06-03 15:06:12 +02:00
Christopher Faulet
8ee650a88b CLEANUP: applet: Update comment for applet_put* functions
These functions were copied from the channel API and modified to work with
applets using the new API or the legacy one. However, the comments were not
updated accordingly. It is the purpose of this patch.
2025-06-03 15:03:30 +02:00
Christopher Faulet
7c788f0984 BUG/MEDIUM: check: Requeue healthchecks on I/O events to handle check timeout
When a healthcheck is processed, once the first wakeup passed to start the
check, and as long as the expiration timer is not reached, only I/O events
are able to wake it up. It is an issue when there is a check timeout
defined, especially if the connect timeout is high and the check timeout is
low. In that case, the healthcheck's task is never requeued to handle any
timeout update. When the connection is established, the check timeout is set
to replace the connect timeout. It is thus possible to report a success
while a timeout should be reported.

So, now, when an I/O event is handled, the healthcheck is requeued, except if
a success or an abort is reported.

Thanks to Thierry Fournier for the report and the reproducer.

This patch must be backported to all stable versions.
2025-06-03 15:03:30 +02:00
Olivier Houchard
913b2d6c83 BUG/MAJOR: leastconn: Protect tree_elt with the lbprm lock
In fwlc_srv_reposition(), set the server's tree_elt while we still hold
the lbprm read lock. While it was protected from concurrent
fwlc_srv_reposition() calls by the server's lb_lock, it was not from
dequeuing/requeuing that could occur if the server gets down/up or its
weight is changed, and that would lead to inconsistencies, and the
watchdog killing the process because it is stuck in an infinite loop in
fwlc_get_next_server().

This hopefully fixes github issue #2990.

This should be backported to 3.2.
2025-06-03 04:42:47 +02:00
Aurelien DARRAGON
368d01361a MEDIUM: server: add and use srv_init() function
Rename _srv_postparse() internal function to srv_init() function and group
srv_init_per_thr() plus idle conns list init inside it. This way we can
perform some simplifications as srv_init() performs multiple server
init steps after parsing.

SRV_F_CHECKED flag was added, it is automatically set when srv_init()
runs successfully. If the flag is already set and srv_init() is called
again, nothing is done. This permits manually calling srv_init() earlier
than the default POST_CHECK hook when needed without the risk of doing things
twice.
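
The flag-based guard can be sketched as follows. The srv_stub type, the flag
value and the <init_runs> field are hypothetical and only serve the
illustration. They are not the real server struct nor the real srv_init().

```c
#define SRV_STUB_F_CHECKED 0x1   /* illustrative flag value */

struct srv_stub {
    unsigned int flags;
    int init_runs;               /* counts the effective initializations */
};

/* Returns 0 on success. Safe to call several times on the same server. */
int srv_stub_init(struct srv_stub *srv)
{
    if (srv->flags & SRV_STUB_F_CHECKED)
        return 0;                /* already initialized: nothing to do */

    srv->init_runs++;            /* the real init steps would go here */

    srv->flags |= SRV_STUB_F_CHECKED;   /* set only on success */
    return 0;
}
```

An early manual call followed by the regular post-check call then runs the
initialization steps only once.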
2025-06-02 17:51:33 +02:00
Aurelien DARRAGON
889ef6f67b MEDIUM: server: automatically add server to proxy list in new_server()
While new_server() takes the parent proxy as argument and even assigns
srv->proxy to the parent proxy, it didn't actually insert the server
into the parent proxy server list on success.

The result is that sometimes we add the server to the list after
new_server() is called, and sometimes we don't.

This is really error-prone and because of that, hooks such as
REGISTER_POST_SERVER_CHECK() which are run for all servers listed in
all proxies may not be relied upon for servers which are not actually
inserted in their parent proxy server list. Plus it feels very strange
to have a server that points to a proxy, but then the proxy doesn't know
about it because it cannot find it in its server list.

To prevent errors and make proxy->srv list reliable, we move the insertion
logic directly under new_server(). This requires to know if we are called
during parsing or during runtime to either insert or append the server to
the parent proxy list. For that we use PR_FL_CHECKED flag from the parent
proxy (if the flag is set, then the proxy was checked so we are past the
init phase, thus we assume we are called during runtime)

This implies that during startup if new_server() has to be cancelled on
error paths we need to call srv_detach() (which is now exposed in server.h)
before srv_drop().

The consequence of this commit is that REGISTER_POST_SERVER_CHECK() should
now run reliably on all servers created using new_server() (without having
to manually loop on global servers_list).
2025-06-02 17:51:30 +02:00
Aurelien DARRAGON
e262e4bbe4 MEDIUM: proxy: use global proxy list for REGISTER_POST_PROXY_CHECK() hook
REGISTER_POST_PROXY_CHECK() used to iterate over "main" proxies to run
registered callbacks. This means hidden proxies (and their servers) did
not get a chance to get post-checked and could cause issues if some post-
checks are expected to be executed on all proxies no matter their type.

Instead we now rely on the global proxies list. Another side effect is that
the REGISTER_POST_SERVER_CHECK() now runs as well for servers from proxies
that are not part of the main proxies list.
2025-06-02 17:51:27 +02:00
Aurelien DARRAGON
1f12e45b0a MINOR: log: only run postcheck_log_backend() checks on backend
postcheck_log_backend() checks are executed no matter if the proxy
actually has the backend capability while the checks actually depend
on this.

Let's fix that by adding an extra condition to ensure that the BE
capability is set.

This issue is not tagged as a bug because for now it remains impossible
to have a syslog proxy without BE capability in the main proxy list, but
this may change in the future.
2025-06-02 17:51:24 +02:00
Aurelien DARRAGON
943958c3ff MINOR: proxy: add a true list containing all proxies
We have global proxies_list pointer which is announced as the list of
"all existing proxies", but in fact it only represents regular proxies
declared on the config file through "listen, frontend or backend" keywords

It is ambiguous, and we currently don't have a straightforward method to
iterate over all proxies (either public or internal ones) within haproxy

Instead we still have to manually iterate over multiple lists (main
proxies, log-forward proxies, peer proxies..) which is error-prone.

In this patch we add a struct list member (8 bytes) inside struct proxy
in order to store every proxy (except default ones) within a global
"proxies" list which is actually representative for all proxies existing
under haproxy process, like we already have for servers.
2025-06-02 17:51:21 +02:00
Aurelien DARRAGON
6ccf770fe2 MINOR: proxy: collect per-capability stat in proxy_cond_disable()
proxy_cond_disable() collects and prints cumulated connections for be and
fe proxies no matter their type. With shared stats it may cause issues
because depending on the proxy capabilities only fe or be counters may
be allocated.

In this patch we add some checks to ensure we only try to read from
valid memory locations, else we rely on default values (0).
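
The guarded read can be pictured with the following sketch. The structure
layout, the DEMO_PR_CAP_* values and the helper names are hypothetical and
only illustrate "read the counter if the capability is set and the counters
are allocated, otherwise use 0":

```c
#include <stddef.h>

/* Hypothetical capability bits, for illustration only. */
#define DEMO_PR_CAP_FE 0x01u
#define DEMO_PR_CAP_BE 0x02u

struct demo_counters {
	unsigned long long cum_conn;
};

struct demo_proxy_stats {
	unsigned int cap;             /* capabilities of the proxy */
	struct demo_counters *fe;     /* may be NULL if not allocated */
	struct demo_counters *be;     /* may be NULL if not allocated */
};

/* Return the cumulated frontend connections, or 0 when the proxy has no
 * frontend capability or no frontend counters.
 */
static unsigned long long demo_fe_conn(const struct demo_proxy_stats *px)
{
	if ((px->cap & DEMO_PR_CAP_FE) && px->fe)
		return px->fe->cum_conn;
	return 0;
}

/* Same logic for the backend side. */
static unsigned long long demo_be_conn(const struct demo_proxy_stats *px)
{
	if ((px->cap & DEMO_PR_CAP_BE) && px->be)
		return px->be->cum_conn;
	return 0;
}
```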
2025-06-02 17:51:17 +02:00
Aurelien DARRAGON
c7c017ec3c MINOR: stats: add ME_NEW_COMMON() helper
Split ME_NEW_* helper into COMMON part and specific part so it becomes
easier to add alternative helpers without code duplication.
2025-06-02 17:51:12 +02:00
Aurelien DARRAGON
d04843167c MINOR: stats: add stat_col flags
Add stat_col flags member to store .generic bit and prepare for upcoming
flags. No functional change expected.
2025-06-02 17:51:08 +02:00
Aurelien DARRAGON
f0b40b49b8 MINOR: server: group postinit server tasks under _srv_postparse()
init_srv_requeue() and init_srv_slowstart() functions are called after
initial server parsing via REGISTER_POST_SERVER_CHECK() hook, and they
are also manually called for dynamic server after the server is
initialized.

This may conflict with _srv_postparse() which is also registered via
REGISTER_POST_SERVER_CHECK() and called during dynamic server creation

To ensure functions don't conflict with each other, let's ensure they
are executed in proper order by calling init_srv_requeue and
init_srv_slowstart() from _srv_postparse() which now becomes the parent
function for server related postparsing stuff. No change of behavior is
expected.
2025-06-02 17:51:05 +02:00
Tim Duesterhus
8ee8b8a04d REGTESTS: Remove support for REQUIRE_VERSION and REQUIRE_VERSION_BELOW
This is no longer used since the migration to the native `haproxy -cc
'version_atleast(X)'` functionality.

see 8727614dc4046e91997ecce421bcb6a5537cac93
see 5efc48dcf1b133dd415c759e83b21d52dc303786
2025-06-02 17:37:11 +02:00
Tim Duesterhus
d8951ec70f REGTESTS: Remove tests with REQUIRE_VERSION_BELOW=2.4
HAProxy 2.4 is the lowest supported version, thus this never matches.

see 18cd4746e5aff9da78d16220b0412947ceba24f3
2025-06-02 17:37:07 +02:00
Tim Duesterhus
534b09f2a2 REGTESTS: Remove REQUIRE_VERSION=2.4 from all tests
HAProxy 2.4 is the lowest supported version, thus this always matches.

see 7aff1bf6b90caadfa95f6b43b526275191991d6f
2025-06-02 17:37:04 +02:00
Tim Duesterhus
239785fd27 REGTESTS: Remove REQUIRE_VERSION=2.3 from all tests
HAProxy 2.4 is the lowest supported version, thus this always matches.

see 7aff1bf6b90caadfa95f6b43b526275191991d6f
2025-06-02 17:37:00 +02:00
Tim Duesterhus
294c47a5ef REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+ (5)
Introduced in:

25bcdb1d9 BUG/MAJOR: h1: Be stricter on request target validation during message parsing

see also:

fbbbc33df REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+
2025-06-02 17:36:56 +02:00
Christopher Faulet
8e8cdf114b DOC: config: Fix a typo in 2.7 (Name format for maps and ACLs)
"identified" was used instead of "identifier". May be backported as far as
3.0
2025-06-02 09:19:38 +02:00
Willy Tarreau
b88164d9c0 BUILD: tools: properly define ha_dump_backtrace() to avoid a build warning
In resolve_sym_name() we declare a few symbols that we want to be able
to resolve. ha_dump_backtrace() was declared with a struct buffer instead
of a pointer to such a struct, which has no effect since we only want to
get the function's pointer, but produces a build warning with LTO, so
let's fix it.

This can be backported to 3.0.
2025-05-30 17:15:48 +02:00
Willy Tarreau
9f4cd435d3 [RELEASE] Released version 3.3-dev0
Released version 3.3-dev0 with the following main changes :
    - MINOR: version: mention that it's development again
2025-05-28 16:46:34 +02:00
Willy Tarreau
8809251ee0 MINOR: version: mention that it's development again
This essentially reverts a6458fd4269.
2025-05-28 16:46:15 +02:00
Willy Tarreau
e134140d28 [RELEASE] Released version 3.2.0
Released version 3.2.0 with the following main changes :
    - MINOR: promex: Add agent check status/code/duration metrics
    - MINOR: ssl: support strict-sni in ssl-default-bind-options
    - MINOR: ssl: also provide the "tls-tickets" bind option
    - MINOR: server: define CLI I/O handler for "add server"
    - MINOR: server: implement "add server help"
    - MINOR: server: use stress mode for "add server help"
    - BUG/MEDIUM: server: fix crash after duplicate GUID insertion
    - BUG/MEDIUM: server: fix potential null-deref after previous fix
    - MINOR: config: list recently added sections with -dKcfg
    - BUG/MAJOR: cache: Crash because of wrong cache entry deleted
    - DOC: configuration: fix the example in crt-store
    - DOC: config: clarify the wording around single/double quotes
    - DOC: config: clarify the legacy cookie and header captures
    - DOC: config: fix alphabetical ordering of layer 7 sample fetch functions
    - DOC: config: fix alphabetical ordering of layer 6 sample fetch functions
    - DOC: config: fix alphabetical ordering of layer 5 sample fetch functions
    - DOC: config: fix alphabetical ordering of layer 4 sample fetch functions
    - DOC: config: fix alphabetical ordering of internal sample fetch functions
    - BUG/MINOR: h3: Set HTX flags corresponding to the scheme found in the request
    - BUG/MEDIUM: h3: Declare absolute URI as normalized when a :authority is found
    - DOC: config: mention in bytes_in and bytes_out that they're read on input
    - DOC: config: clarify the basics of ACLs (call point, multi-valued etc)
    - REGTESTS: Make the script testing conditional set-var compatible with Vtest2
    - REGTESTS: Explicitly allow failing shell commands in some scripts
    - MINOR: listeners: Add support for a label on bind line
    - BUG/MEDIUM: cli/ring: Properly handle shutdown in "show event" I/O handler
    - BUG/MEDIUM: hlua: Properly detect shudowns for TCP applets based on the new API
    - BUG/MEDIUM: hlua: Fix getline() for TCP applets to work with applet's buffers
    - BUG/MEDIUM: hlua: Fix receive API for TCP applets to properly handle shutdowns
    - CI: vtest: Rely on VTest2 to run regression tests
    - CI: vtest: Fix the build script to properly work on MaOS
    - CI: combine AWS-LC and AWS-LC-FIPS by template
    - BUG/MEDIUM: httpclient: Throw an error if an lua httpclient instance is reused
    - DOC: hlua: Add a note to warn user about httpclient object reuse
    - DOC: hlua: fix a few typos in HTTPMessage.set_body_len() documentation
    - DEV: patchbot: prepare for new version 3.3-dev
    - MINOR: version: mention that it's 3.2 LTS now.
2025-05-28 16:35:14 +02:00
Willy Tarreau
a6458fd426 MINOR: version: mention that it's 3.2 LTS now.
The version will be maintained up to around Q2 2030. Let's
also update the INSTALL file to mention this.
2025-05-28 16:31:27 +02:00
Willy Tarreau
2502435eb3 DEV: patchbot: prepare for new version 3.3-dev
The bot will now load the prompt for the upcoming 3.2 version so we have
to rename the files and update their contents to match the current version.
2025-05-28 16:23:12 +02:00
Willy Tarreau
21ce685fcd DOC: hlua: fix a few typos in HTTPMessage.set_body_len() documentation
A few typos were noticed while gathering info for the 3.2 announce
messages, this fixes them, and will probably constitute the last
commit of this release. There's no need to backport it unless commit
94055a5e7 ("MEDIUM: hlua: Add function to change the body length of
an HTTP Message") is backported.
2025-05-27 19:33:49 +02:00
Christopher Faulet
cb7a2444d1 DOC: hlua: Add a note to warn user about httpclient object reuse
It is not supported to reuse an lua httpclient instance to process several
requests. A new object must be created for each request. Thanks to the
previous patch ("BUG/MEDIUM: httpclient: Throw an error if an lua httpclient
instance is reused"), an error is now reported if this happens. But it is
not obvious for users. So the lua-api documentation was updated accordingly.

This patch is related to issue #2986. It should be backported with the
commit above.
2025-05-27 18:48:23 +02:00
Christopher Faulet
50fca6f0b7 BUG/MEDIUM: httpclient: Throw an error if an lua httpclient instance is reused
It is not expected/supported to reuse an httpclient instance to process
several requests. A new instance must be created for each request. However,
in lua, there is nothing to prevent a user from creating an httpclient object
and using it in a loop to process requests.

That's unfortunate because this will apparently work, the requests will be
sent and a response will be received and processed. However internally some
resources will be allocated and never released. When the next response is
processed, the resources allocated for the previous one are definitively
lost.

In this patch we take care to check that the httpclient object was never
used when a request is sent from a lua script by checking
HTTPCLIENT_FS_STARTED flag. This flag is set when an httpclient applet is
spawned to process a request and never removed after that. In lua, the
httpclient applet is created when the request is sent. So, it is the right
place to do this test.

This patch should fix the issue #2986. It should be backported as far as
2.6.
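
The "set once, never cleared" check described above can be sketched as
follows. The structure, the DEMO_HC_F_STARTED value and demo_hc_send() are
hypothetical and only model the principle, not the actual httpclient code:

```c
/* Hypothetical stand-in for the "started" flag: set on the first send
 * and never removed afterwards.
 */
#define DEMO_HC_F_STARTED 0x01u

struct demo_httpclient {
	unsigned int flags;
};

/* Try to send a request with <hc>. Returns 0 on the first use and -1 if
 * the instance was already used, so that the caller can raise an error
 * instead of silently leaking resources.
 */
static int demo_hc_send(struct demo_httpclient *hc)
{
	if (hc->flags & DEMO_HC_F_STARTED)
		return -1;

	hc->flags |= DEMO_HC_F_STARTED;
	return 0;
}
```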
2025-05-27 18:47:24 +02:00
Ilya Shipitsin
94ded5523f CI: combine AWS-LC and AWS-LC-FIPS by template
let's reduce code duplication by involving workflow templates
2025-05-27 15:06:58 +02:00
Christopher Faulet
508e074a32 CI: vtest: Fix the build script to properly work on MaOS
"config.h" header file is new in VTest2 and includes must be adapted to be
able to build VTest on MacOS. Let's add "-I." to make it work.
2025-05-27 14:48:53 +02:00
Christopher Faulet
6a18d28ba2 CI: vtest: Rely on VTest2 to run regression tests
VTest2 (https://github.com/vtest/VTest2) was released and is a replacement
for VTest. VTest was archived. So let's use the new version now.

If this commit is backported, the 2 following commits must also be
backported:

 * 2808e3577 ("REGTESTS: Explicitly allow failing shell commands in some scripts")
 * 82c291124 ("REGTESTS: Make the script testing conditional set-var compatible with Vtest2")
2025-05-27 14:38:46 +02:00
Christopher Faulet
bc4c3c7969 BUG/MEDIUM: hlua: Fix receive API for TCP applets to properly handle shutdowns
An optional timeout was added to AppletTCP.receive() to interrupt calls after a
delay. It was mandatory to be able to implement interactive applets (like
trisdemo). However, this broke the API and made it impossible to differentiate
shutdowns from delay expirations. Indeed, in both cases, an empty
string was returned.

Because historically an empty string was used to notify a connection shutdown,
it should not be changed. So now, 'nil' value is returned when no data was
available before the delay expiration.

The new AppletTCP:try_receive() function was also affected. To fix it, instead
of stating there is no delay when a receive is tried, an expired delay is
set. Concretely TICK_ETERNITY was replaced by now_ms.

Finally, AppletTCP:getline() function is not concerned for now because there
is no way to interrupt it after some delay.

The documentation and trisdemo lua script were updated accordingly.

This patch depends on "BUG/MEDIUM: hlua: Properly detect shudowns for TCP
applets based on the new API". However, it is a 3.2-specific issue, so no
backport is needed.
2025-05-27 07:53:19 +02:00
Christopher Faulet
c0ecef71d7 BUG/MEDIUM: hlua: Fix getline() for TCP applets to work with applet's buffers
The commit e5e36ce09 ("BUG/MEDIUM: hlua/cli: Fix lua CLI commands to work
with applet's buffers") fixed the TCP applets API to work with applets using
their own buffers. However, the getline() function was not updated. It could
be an issue for anyone registering CLI commands that read lines.

This patch should be backported as far as 3.0.
2025-05-27 07:53:01 +02:00
Christopher Faulet
c64781c2c8 BUG/MEDIUM: hlua: Properly detect shudowns for TCP applets based on the new API
The internal function responsible for receiving data for TCP applets with
internal buffers is buggy. Indeed, for these applets, the buffer API is used
to get data. So there are no tests on the SE to properly detect connection
shutdowns. So, it must be performed by hand after the call to b_getblk_nc().

This patch must be backported as far as 3.0.
2025-05-26 19:00:00 +02:00
Christopher Faulet
4d4da515f2 BUG/MEDIUM: cli/ring: Properly handle shutdown in "show event" I/O handler
The commit 03dc54d802 ("BUG/MINOR: ring: Fix I/O handler of "show event"
command to not rely on the SC") introduced a regression. By removing
dependencies on the SC, a test to detect client shutdowns was removed. So
now, the CLI applet is no longer released when the client shuts the
connection during a "show event -w".

So of course, we should not use the SC to detect the shutdowns. But the SE
must be used instead.

It is a 3.2-specific issue, so no backport needed.
2025-05-26 19:00:00 +02:00
Christopher Faulet
99e755d673 MINOR: listeners: Add support for a label on bind line
It is now possible to set a label on a bind line. All sockets attached to
this bind line inherit this label. The idea is to be able to group
sockets. For now, there is no mechanism to create these groups, this must be
done by hand.
2025-05-26 19:00:00 +02:00
Christopher Faulet
2808e3577f REGTESTS: Explicitly allow failing shell commands in some scripts
Vtest2, which should replace Vtest in a few months, will reject any failing
commands in shell blocks. However, some scripts are executing some commands,
expecting an error to be able to parse the error output. So, now use "set
+e" in those scripts to explicitly state failing commands are expected.

It is just used for non-final commands. At the end, the shell block must
still report a success.
2025-05-26 19:00:00 +02:00
Christopher Faulet
82c2911248 REGTESTS: Make the script testing conditional set-var compatible with Vtest2
VTest2 will replace VTest in a few months. There is not so much change
expected. One of them is that a User-Agent header is added by default in all
requests, except if a custom one is already set or if "-nouseragent" option
is used. To still be compatible with VTest, it is not possible to use the
option to avoid the header addition. So, a custom user-agent is added in the
last test of "sample_fetches/cond_set_var.vtc" to be sure it will pass with
Vtest and Vtest2. It is mandatory because the request length is tested.
2025-05-26 19:00:00 +02:00
Willy Tarreau
5b937b7a97 DOC: config: clarify the basics of ACLs (call point, multi-valued etc)
This is essentially in order to address the concerns expressed in
issue #2226 where it is mentioned that the moment they are called is
not clear enough. Admittedly, re-reading the paragraph doesn't make
it obvious on a quick read that they behave like functions. This patch
adds an extra paragraph that makes the parallel with programming
languages' boolean functions and explains the fact that they can be
multi-valued. Hoping this is clearer now.
2025-05-26 16:25:22 +02:00
Willy Tarreau
ef9511be90 DOC: config: mention in bytes_in and bytes_out that they're read on input
Issue #2267 suggests that it's unclear what exactly the byte counts mean
(particularly when compression is involved). Let's clarify that the counts
are read on data input and that they also cover headers and a bit of
internal overhead.
2025-05-26 15:54:36 +02:00
Christopher Faulet
e70c23e517 BUG/MEDIUM: h3: Declare absolute URI as normalized when a :authority is found
Since commit 2c3d656f8 ("MEDIUM: h3: use absolute URI form with
:authority"), the absolute URI form is used when a ':authority'
pseudo-header is found. However, this URI was not declared as normalized
internally. So, when the request is reformatted to be sent to an h1 server,
the absolute-form is used instead of the origin-form. It is unexpected and
may be an issue for some servers that could reject the request.

So, now, we take care to set HTX_SL_F_HAS_AUTHORITY flag on the HTX message
when an authority was found and HTX_SL_F_NORMALIZED_URI flag is set for
"http" or "https" schemes.

No backport needed because the commit above must not be backported. It
should fix a regression reported on the 3.2-dev17 in issue #2977.

This commit depends on "BUG/MINOR: h3: Set HTX flags corresponding to the
scheme found in the request".
2025-05-26 11:47:23 +02:00
Christopher Faulet
da9792cca8 BUG/MINOR: h3: Set HTX flags corresponding to the scheme found in the request
When a ":scheme" pseudo-header is found in a h3 request, the
HTX_SL_F_HAS_SCHM flag must be set on the HTX message. And if the scheme is
'http' or 'https', the corresponding HTX flag must also be set. So,
respectively, HTX_SL_F_SCHM_HTTP or HTX_SL_F_SCHM_HTTPS.

It is mainly used to send the right ":scheme" pseudo-header value to H2
server on backend side.

This patch could be backported as far as 2.6.
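
A minimal sketch of the flag selection described above. The DEMO_SL_F_*
values are hypothetical stand-ins for the HTX flags named in this message,
and the exact, case-sensitive string comparison is an assumption made for
the example:

```c
#include <string.h>

/* Hypothetical values, only the relationship between the flags matters. */
#define DEMO_SL_F_HAS_SCHM   0x01u
#define DEMO_SL_F_SCHM_HTTP  0x02u
#define DEMO_SL_F_SCHM_HTTPS 0x04u

/* Compute the start-line flags for a ":scheme" pseudo-header value.
 * <scheme> is NULL when no ":scheme" was found in the request.
 */
static unsigned int demo_scheme_flags(const char *scheme)
{
	unsigned int flags = 0;

	if (!scheme)
		return flags;

	/* a scheme was found, whatever its value */
	flags |= DEMO_SL_F_HAS_SCHM;

	if (strcmp(scheme, "http") == 0)
		flags |= DEMO_SL_F_SCHM_HTTP;
	else if (strcmp(scheme, "https") == 0)
		flags |= DEMO_SL_F_SCHM_HTTPS;

	return flags;
}
```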
2025-05-26 11:38:29 +02:00
Willy Tarreau
083708daf8 DOC: config: fix alphabetical ordering of internal sample fetch functions
Some misordering has been accumulating over time, making some of them
hard to spot. Also "uptime" was not indexed.
2025-05-26 09:36:23 +02:00
Willy Tarreau
52c2247d90 DOC: config: fix alphabetical ordering of layer 4 sample fetch functions
Some misordering has been accumulating over time, making some of them
hard to spot.
2025-05-26 09:33:17 +02:00
Willy Tarreau
770098f5e3 DOC: config: fix alphabetical ordering of layer 5 sample fetch functions
Some misordering has been accumulating over time, making some of them
hard to spot.
2025-05-26 09:26:11 +02:00
Willy Tarreau
5261e35b8f DOC: config: fix alphabetical ordering of layer 6 sample fetch functions
Some misordering has been accumulating over time, making some of them
hard to spot.
2025-05-26 09:26:11 +02:00
Willy Tarreau
e9248243e9 DOC: config: fix alphabetical ordering of layer 7 sample fetch functions
Some misordering has been accumulating over time, making some of them
hard to spot.
2025-05-26 09:26:11 +02:00
Willy Tarreau
38456f63a3 DOC: config: clarify the legacy cookie and header captures
As reported in issue #2195, cookie captures and header captures are no
longer the recommended way to proceed. Let's mention that this is the
legacy way and provide a few pointers to the recommended functions and
actions to use the modern methods.
2025-05-26 08:56:33 +02:00
Willy Tarreau
da8d6d1b2c DOC: config: clarify the wording around single/double quotes
As reported in issue #2327, the wording used in the section about quoting
can be read two ways due to the use of the two types of quotes to protect
each other quote. Better only use the quoting without mixing the two when
mentioning them.
2025-05-26 08:36:33 +02:00
William Lallemand
d607940915 DOC: configuration: fix the example in crt-store
Fix a bad example in the crt-store section. site1 does not use the "web"
crt-store but the global one.

Must be backported as far as 3.0, however the section was numbered 3.12 in
previous versions.
2025-05-25 16:55:08 +02:00
Remi Tricot-Le Breton
90441e9bfe BUG/MAJOR: cache: Crash because of wrong cache entry deleted
When "vary" is enabled, we can have multiple entries for a given primary
key in the cache tree. There is a limit to how many secondary entries
can be inserted for a given key. When we try to insert a new secondary
entry, if the limit is already reached, we can try to find expired
entries with the same primary key, and if the limit is still reached we
want to abort the current insertion and to remove the node that was just
inserted.

In commit "a29b073: MEDIUM: cache: Add refcount on cache_entry" though,
a regression was introduced. Instead of removing the entry just inserted
as the comments suggested, we removed the second to last entry and
returned NULL. We then reset the eb.key of the cache_entry in the caller
because we assumed that the entry was already removed from the tree.

This means that some entries with an empty key were wrongly kept in the
tree and the last secondary entry, which keeps the number of secondary
entries of a given key was removed.

This ended up causing some crashes later on when we tried to iterate
over the elements of this given key. The crash could occur in multiple
places, either when trying to retrieve an entry or to add some new ones.

This crash was raised in GitHub issue #2950.
The fix should be backported up to 3.0.
2025-05-23 22:38:54 +02:00
Willy Tarreau
84ffb3d0a9 MINOR: config: list recently added sections with -dKcfg
Newly added sections (crt-store, traces, acme) were not listed in
-dKcfg, let's add them. For now they have to be manually enumerated.
2025-05-23 10:49:33 +02:00
Willy Tarreau
28c7a22790 BUG/MEDIUM: server: fix potential null-deref after previous fix
A valid build warning was reported in the CI with latest commit b40ce97ecc
("BUG/MEDIUM: server: fix crash after duplicate GUID insertion"). Indeed,
if the first test in the function fails, we branch to the err label
with guid==NULL and will crash there. Let's just test guid before
dereferencing it for freeing.

This needs to be backported to 3.0 as well since the commit above was
meant to go there.
2025-05-22 18:09:12 +02:00
Amaury Denoyelle
b40ce97ecc BUG/MEDIUM: server: fix crash after duplicate GUID insertion
On "add server", if a GUID is defined, guid_insert() is used to add the
entry into the global GUID tree. If a similar entry already exists, GUID
insertion fails and the server creation is eventually aborted.

A crash could occur in this case because of an invalid memory access via
guid_remove(). The latter is called via free_server() as the server
insertion is rejected. The invalid access occurs on the GUID key.

The issue occurs because of guid_insert(). The function properly
deallocates the GUID key on duplicate insertion, but it failed to reset
<guid.node.key> to NULL. This caused the invalid memory access on
guid_remove(). To fix this, ensure that key member is properly reset
on guid_insert() error path.

This must be backported up to 3.0.
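
The pattern behind the fix can be sketched as follows. The demo_* names are
hypothetical, and the <already_present> argument merely simulates the result
of the duplicate lookup in the global tree:

```c
#include <stdlib.h>
#include <string.h>

struct demo_guid {
	char *key;   /* NULL when the GUID is not inserted */
};

/* Try to insert <uid>. <already_present> simulates a duplicate entry in
 * the tree. Returns 0 on success, -1 on error. On the error path the key
 * is freed AND reset to NULL so that a later removal cannot touch freed
 * memory.
 */
static int demo_guid_insert(struct demo_guid *g, const char *uid,
                            int already_present)
{
	size_t len = strlen(uid) + 1;

	g->key = malloc(len);
	if (!g->key)
		return -1;
	memcpy(g->key, uid, len);

	if (already_present) {
		free(g->key);
		g->key = NULL;   /* the reset that was missing */
		return -1;
	}
	return 0;
}

/* Remove the GUID if it was inserted; a NULL key means nothing to do. */
static void demo_guid_remove(struct demo_guid *g)
{
	if (!g->key)
		return;
	free(g->key);
	g->key = NULL;
}
```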
2025-05-22 17:59:37 +02:00
Amaury Denoyelle
5e088e3f8e MINOR: server: use stress mode for "add server help"
Implement stress mode on "add server help". This ensures that the
command is fully reentrant on full output buffer.

For testing, it requires compilation with USE_STRESS and global setting
"stress-level 1".
2025-05-22 17:40:05 +02:00
Amaury Denoyelle
4de5090976 MINOR: server: implement "add server help"
Implement "help" as a sub-command for "add server" CLI. The objective is
to list all the keywords that are supported for dynamic servers. CLI IO
handler and add_srv_ctx are used to support reentrancy on full output
buffer.

Now that this command is implemented, the outdated keyword list on "add
server" from management documentation can be removed.
2025-05-22 17:40:05 +02:00
Amaury Denoyelle
2570892c41 MINOR: server: define CLI I/O handler for "add server"
Extend "add server" to support an IO handler function named
cli_io_handler_add_server(). A context object is also defined whose
usage will depend on IO handler capabilities.

IO handler is skipped when "add server" is run in default mode, i.e. on
a dynamic server creation. Thus, currently IO handler is unneeded.
However, it will become useful to support sub-commands for "add server".

Note that return value of "add server" parser has been changed on server
creation success. Previously, it was used incorrectly to report if
server was inserted or not. In fact, parser return value is used by CLI
generic code to detect if command processing has been completed, or
should continue to the IO handler. Now, "add server" always returns 1 to
signal that CLI processing is completed. This is necessary to preserve
CLI output emitted by parser, even now that IO handler is defined for
the command. Previously, output was emitted in every situation because no
IO handler was defined. See below code snippet from cli.c for a better
overview :

  if (kw->parse && kw->parse(args, payload, appctx, kw->private) != 0) {
          ret = 1;
          goto fail;
  }

  /* kw->parse could set its own io_handler or io_release handler */
  if (!appctx->cli_ctx.io_handler) {
          ret = 1;
          goto fail;
  }

  appctx->st0 = CLI_ST_CALLBACK;
  ret = 1;
  goto end;
2025-05-22 17:40:05 +02:00
Willy Tarreau
1c0f2e62ad MINOR: ssl: also provide the "tls-tickets" bind option
Currently there is "no-tls-tickets" that is also supported in the
ssl-default-bind-options directive, but there's no way to re-enable
them on a specific "bind" line. This patch simply provides the option
to re-enable them. Note that the flag is inverted because tickets are
enabled by default and the no-tls-ticket option sets the flag to
disable them.
2025-05-22 15:31:54 +02:00
Willy Tarreau
3494775a1f MINOR: ssl: support strict-sni in ssl-default-bind-options
Several users already reported that it would be nice to support
strict-sni in ssl-default-bind-options. However, in order to support
it, we also need an option to disable it.

This patch moves the setting of the option from the strict_sni field
to a flag in the ssl_options field so that it can be inherited from
the default bind options, and adds a new "no-strict-sni" directive to
allow to disable it on a specific "bind" line.

The test file "del_ssl_crt-list.vtc" which already tests both options
was updated to make use of the default option and the no- variant to
confirm everything continues to work.
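
To illustrate why a flag in an options word can be inherited and then
overridden per "bind" line, here is a small sketch. DEMO_SSL_O_STRICT_SNI
and demo_apply_bind_kw() are hypothetical names and do not reflect the
actual parser:

```c
#include <string.h>

/* Hypothetical bit standing for strict-sni in the SSL options word. */
#define DEMO_SSL_O_STRICT_SNI 0x01u

/* Apply one keyword found on a "bind" line to the options inherited from
 * the default bind options, and return the resulting options.
 */
static unsigned int demo_apply_bind_kw(unsigned int inherited, const char *kw)
{
	if (strcmp(kw, "strict-sni") == 0)
		return inherited | DEMO_SSL_O_STRICT_SNI;
	if (strcmp(kw, "no-strict-sni") == 0)
		return inherited & ~DEMO_SSL_O_STRICT_SNI;
	return inherited;   /* unrelated keyword: keep inherited value */
}
```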
2025-05-22 15:31:54 +02:00
Christopher Faulet
7244f16ac4 MINOR: promex: Add agent check status/code/duration metrics
In the Prometheus exporter, the last health check status is already exposed,
with its code and duration in seconds. The server status is also exposed.
But the information about the agent check is not available. It is not
really handy because when a server status is changed because of the agent,
it is not obvious when looking at the Prometheus metrics. Indeed, the server
may be reported as DOWN for instance, while the health check status still
reports a success. Being able to get the agent status in that case could be
valuable.

So now, the last agent check status is exposed, with its code and duration
in seconds. The following metrics can be grabbed now:

  * haproxy_server_agent_status
  * haproxy_server_agent_code
  * haproxy_server_agent_duration_seconds

Note that unlike the other metrics, no per-backend aggregated metric is
exposed.

This patch is related to issue #2983.
2025-05-22 09:50:10 +02:00
Willy Tarreau
0ac41ff97e [RELEASE] Released version 3.2-dev17
Released version 3.2-dev17 with the following main changes :
    - DOC: configuration: explicit multi-choice on bind shards option
    - BUG/MINOR: sink: detect and warn when using "send-proxy" options with ring servers
    - BUG/MEDIUM: peers: also limit the number of incoming updates
    - MEDIUM: hlua: Add function to change the body length of an HTTP Message
    - BUG/MEDIUM: stconn: Disable 0-copy forwarding for filters altering the payload
    - BUG/MINOR: h3: don't insert more than one Host header
    - BUG/MEDIUM: h1/h2/h3: reject forbidden chars in the Host header field
    - DOC: config: properly index "table and "stick-table" in their section
    - DOC: management: change reference to configuration manual
    - BUILD: debug: mark ha_crash_now() as attribute(noreturn)
    - IMPORT: slz: avoid multiple shifts on 64-bits
    - IMPORT: slz: support crc32c for lookup hash on sse4 but only if requested
    - IMPORT: slz: use a better hash for machines with a fast multiply
    - IMPORT: slz: fix header used for empty zlib message
    - IMPORT: slz: silence a build warning on non-x86 non-arm
    - BUG/MAJOR: leastconn: do not loop forever when facing saturated servers
    - BUG/MAJOR: queue: properly keep count of the queue length
    - BUG/MINOR: quic: fix crash on quic_conn alloc failure
    - BUG/MAJOR: leastconn: never reuse the node after dropping the lock
    - MINOR: acme: renewal notification over the dpapi sink
    - CLEANUP: quic: Useless BIO_METHOD initialization
    - MINOR: quic: Add useful error traces about qc_ssl_sess_init() failures
    - MINOR: quic: Allow the use of the new OpenSSL 3.5.0 QUIC TLS API (to be completed)
    - MINOR: quic: implement all remaining callbacks for OpenSSL 3.5 QUIC API
    - MINOR: quic: OpenSSL 3.5 internal QUIC custom extension for transport parameters reset
    - MINOR: quic: OpenSSL 3.5 trick to support 0-RTT
    - DOC: update INSTALL for QUIC with OpenSSL 3.5 usages
    - DOC: management: update 'acme status'
    - BUG/MEDIUM: wdt: always ignore the first watchdog wakeup
    - CLEANUP: wdt: clarify the comments on the common exit path
    - BUILD: ssl: avoid possible printf format warning in traces
    - BUILD: acme: fix build issue on 32-bit archs with 64-bit time_t
    - DOC: management: precise some of the fields of "show servers conn"
    - BUG/MEDIUM: mux-quic: fix BUG_ON() on rxbuf alloc error
    - DOC: watchdog: update the doc to reflect the recent changes
    - BUG/MEDIUM: acme: check if acme domains are configured
    - BUG/MINOR: acme: fix formatting issue in error and logs
    - EXAMPLES: lua: avoid screen refresh effect in "trisdemo"
    - CLEANUP: quic: remove unused cbuf module
    - MINOR: quic: move function to check stream type in utils
    - MINOR: quic: refactor handling of streams after MUX release
    - MINOR: quic: add some missing includes
    - MINOR: quic: adjust quic_conn-t.h include list
    - CLEANUP: cfgparse: alphabetically sort the global keywords
    - MINOR: glitches: add global setting "tune.glitches.kill.cpu-usage"
2025-05-21 15:56:06 +02:00
Willy Tarreau
a1577a89a0 MINOR: glitches: add global setting "tune.glitches.kill.cpu-usage"
It was mentioned during the development of glitches that it would be
nice to support not killing misbehaving connections below a certain
CPU usage so that poor implementations that routinely misbehave without
impact are not killed. This is now possible by setting a CPU usage
threshold under which we don't kill them via this parameter. It defaults
to zero so that we continue to kill them by default.
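
The decision can be pictured with the sketch below. The function and its
parameters are hypothetical, and treating a glitch limit of 0 as "never
kill" is an assumption made for the example:

```c
/* Decide whether a misbehaving connection should be killed.
 * <glitches>          : glitches counted on the connection
 * <glitch_limit>      : configured glitch threshold (0 = never kill, by
 *                       assumption)
 * <cpu_usage_pct>     : current CPU usage in percent
 * <kill_cpu_usage_pct>: CPU usage below which connections are spared;
 *                       0 keeps the historical "always kill" behavior.
 */
static int demo_should_kill(unsigned int glitches, unsigned int glitch_limit,
                            unsigned int cpu_usage_pct,
                            unsigned int kill_cpu_usage_pct)
{
	if (!glitch_limit || glitches < glitch_limit)
		return 0;

	/* only kill once the CPU usage has reached the threshold */
	return cpu_usage_pct >= kill_cpu_usage_pct;
}
```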
2025-05-21 15:47:42 +02:00
Willy Tarreau
eee57b4d3f CLEANUP: cfgparse: alphabetically sort the global keywords
The global keywords table was no longer sorted at all, let's fix it to
ease spotting the searched ones.
2025-05-21 15:47:42 +02:00
Amaury Denoyelle
00d90e8839 MINOR: quic: adjust quic_conn-t.h include list
Adjust include list in quic_conn-t.h. This file is included in many QUIC
sources, so it is useful to keep it as lightweight as possible. Note that
connection/QUIC MUX are transformed into forward declaration for better
layer separation.
2025-05-21 14:44:27 +02:00
Amaury Denoyelle
01e3b2119a MINOR: quic: add some missing includes
Insert some missing include statements in QUIC source files. This was
detected after the next commit which adjusts the include list used in
quic_conn-t.h file.
2025-05-21 14:44:27 +02:00
Amaury Denoyelle
f286288471 MINOR: quic: refactor handling of streams after MUX release
quic-conn layer has to handle itself STREAM frames after MUX release. If
the stream was already seen, it is probably only a retransmitted frame
which can be safely ignored. For other streams, an active closure may be
needed.

Thus it's necessary that quic-conn layer knows the highest stream ID
already handled by the MUX after its release. Previously, this was done
via <nb_streams> member array in quic-conn structure.

Refactor this by replacing <nb_streams> by two members called
<stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for
quic-conn layer to monitor locally opened uni streams, as the peer
cannot by definition emit a STREAM frame on it. Also, bidirectional
streams are always opened by the remote side.

Previously, <nb_streams> was set by the quic-stream layer. Now,
<stream_max_uni>/<stream_max_bidi> members are only set one time, just
prior to QUIC MUX release. This is sufficient as quic-conn does not use
them if the MUX is available.

Note that previously, IDs were used relatively to their type, thus
incremented by 1, after shifting the original value. For simplification,
use the plain stream ID, which is incremented by 4.
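
The difference between type-relative and plain stream IDs can be pictured
with the minimal sketch below. It relies on the stream ID layout from
RFC 9000 (bit 0 gives the initiator, bit 1 the direction). The helper and
macro names are hypothetical and are not the ones used in quic_utils.

```c
#include <stdint.h>

/* RFC 9000: the two low bits of a stream ID encode its type. */
#define STREAM_BIT_INITIATOR 0x1 /* 0 = client-initiated, 1 = server-initiated */
#define STREAM_BIT_DIR       0x2 /* 0 = bidirectional, 1 = unidirectional */

/* Hypothetical helper: non-zero if <id> designates a unidirectional stream. */
static inline int stream_id_is_uni(uint64_t id)
{
	return !!(id & STREAM_BIT_DIR);
}

/* Hypothetical helper: type-relative index, which grows by 1 per stream. */
static inline uint64_t stream_id_to_index(uint64_t id)
{
	return id >> 2;
}

/* Hypothetical helper: next plain stream ID of the same type (+4). */
static inline uint64_t stream_id_next(uint64_t id)
{
	return id + 4;
}
```

Keeping the plain ID avoids the shift in both directions, and adding 4
preserves the two type bits.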
2025-05-21 14:26:45 +02:00
Amaury Denoyelle
07d41a043c MINOR: quic: move function to check stream type in utils
Move general function to check if a stream is uni or bidirectional from
QUIC MUX to quic_utils module. This should prevent unnecessary include
of QUIC MUX header file in other sources.
2025-05-21 14:17:41 +02:00
Amaury Denoyelle
cf45bf1ad8 CLEANUP: quic: remove unused cbuf module
Cbuf are not used anymore. Remove the related source and header files,
as well as include statements in the rest of QUIC source files.
2025-05-21 14:16:37 +02:00
Baptiste Assmann
b437094853 EXAMPLES: lua: avoid screen refresh effect in "trisdemo"
In current version of the game, there is a "screen refresh" effect: the
screen is cleared before being re-drawn.
I moved the clear right after the connection is opened and removed it
from rendering time.
2025-05-21 12:00:53 +02:00
William Lallemand
8b121ab6f7 BUG/MINOR: acme: fix formatting issue in error and logs
Stop emitting \n in errmsg for intermediate error messages, this was
emitting multiline logs and was returning to a new line in the middle of
sentences.

We don't need to emit them in acme_start_task() since the errmsg is
output in a send_log which already contains a \n or on the CLI which
also emits it.
2025-05-21 11:41:28 +02:00
William Lallemand
156f4bd7a6 BUG/MEDIUM: acme: check if acme domains are configured
When starting the ACME task with a ckch_conf which does not contain the
domains, the ACME task would segfault because it will try to dereference
a NULL in this case.

The patch fixes the issue by emitting a warning when no domains are
configured. It's not done at configuration parsing because it is not
easy to emit the warning there, as there is no callback system which
gives access to the whole ckch_conf once a line is parsed.

No backport needed.
2025-05-21 11:41:28 +02:00
Willy Tarreau
f5ed309449 DOC: watchdog: update the doc to reflect the recent changes
The watchdog was improved and fixed a few months ago, but the doc had
not been updated to reflect this. That's now done.
2025-05-21 11:34:55 +02:00
Amaury Denoyelle
e399daa67e BUG/MEDIUM: mux-quic: fix BUG_ON() on rxbuf alloc error
RX buffer allocation has been reworked in current dev tree. The
objective is to support multiple buffers per QCS to improve upload
throughput.

RX buffer allocation failure is handled simply : the whole connection is
closed. This is done via qcc_set_error(), with INTERNAL_ERROR as error
code. This function contains a BUG_ON() to ensure it is called only one
time per connection instance.

On RX buffer alloc failure, the aforementioned BUG_ON() crashes due to a
double invocation of qcc_set_error(). First by qcs_get_rxbuf(), and
immediately after it by qcc_recv(), which is the caller of the previous
one. This regression was introduced by the following commit.

  60f64449fbba7bb6e351e8343741bb3c960a2e6d
  MAJOR: mux-quic: support multiple QCS RX buffers

To fix this, simply remove qcc_set_error() invocation in
qcs_get_rxbuf(). On buffer alloc failure, qcc_recv() is responsible for
setting the error.

This does not need to be backported.
2025-05-21 11:33:00 +02:00
Willy Tarreau
5c628d4e09 DOC: management: precise some of the fields of "show servers conn"
As reported in issue #2970, the output of "show servers conn" is not
clear. It was essentially meant as a debugging tool during some changes
to idle connections management, but if some users want to monitor or
graph them, more info is needed. The doc mentions the currently known
list of fields, and reminds that this output is not meant to be stable
over time, but as long as it does not change, it can provide some useful
metrics to some users.
2025-05-21 10:45:07 +02:00
Willy Tarreau
4b52d5e406 BUILD: acme: fix build issue on 32-bit archs with 64-bit time_t
The build failed on mips32 with a 64-bit time_t here:

  https://github.com/haproxy/haproxy/actions/runs/15150389164/job/42595310111

Let's just turn the "remain" variable used to show the remaining time
into a more portable ullong and use %llu for all format specifiers,
since long remains limited to 32-bit on 32-bit archs.

No backport needed.
2025-05-21 10:18:47 +02:00
Willy Tarreau
09d4c9519e BUILD: ssl: avoid possible printf format warning in traces
When building on MIPS-32 with gcc-9.5 and glibc-2.31, I got this:

  src/ssl_trace.c: In function 'ssl_trace':
  src/ssl_trace.c:118:42: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'ssize_t' {aka 'const int'} [-Wformat=]
    118 |     chunk_appendf(&trace_buf, " : size=%ld", *size);
        |                                        ~~^   ~~~~~
        |                                          |   |
        |                                          |   ssize_t {aka const int}
        |                                          long int
        |                                        %d

Let's just cast the type. No backport needed.
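
The cast can be sketched as follows. This is a standalone illustration
which uses snprintf() instead of chunk_appendf(), and the function name is
hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: formats a size for a trace line. ssize_t may be an
 * int or a long depending on the platform, so it is cast to long in order
 * to match the %ld conversion everywhere.
 */
static int format_size(char *buf, size_t len, ssize_t size)
{
	return snprintf(buf, len, " : size=%ld", (long)size);
}
```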
2025-05-21 10:01:14 +02:00
Willy Tarreau
3b2fb5cc15 CLEANUP: wdt: clarify the comments on the common exit path
The conditions in which we reach the check for ha_panic() and
ha_stuck_warning() are not super clear, let's reformulate them.
2025-05-20 16:37:06 +02:00
Willy Tarreau
0a8bfb5b90 BUG/MEDIUM: wdt: always ignore the first watchdog wakeup
With commit a06c215f08 ("MEDIUM: wdt: always make the faulty thread
report its own warnings"), when the TH_FL_STUCK flag was flipped on,
we'd then go to the panic code instead of giving a second chance like
before the commit. This can trigger rare cases that only happen with
moderate loads like was addressed by commit 24ce001771 ("BUG/MEDIUM:
wdt: fix the stuck detection for warnings"). This is in fact due to
the loss of the common "goto update_and_leave" that used to serve
both the warning code and the flag setting for probation, and it's
apparently what hit Christian in issue #2980.

Let's make sure we exit naturally when turning the bit on for the
first time. Let's also update the confusing comment at the end of
the check that was left over by latest change.

Since the first commit was backported to 3.1, this commit should be
backported there as well.
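
The "second chance" logic can be pictured with the simplified sketch
below. It is not the actual wdt code: the flag value, the structure and
the function are hypothetical stand-ins, and the real handler also deals
with signals and per-thread contexts.

```c
#include <stdbool.h>

#define TH_FL_STUCK 0x1 /* stand-in value for the real thread flag */

struct thr_state {
	unsigned int flags; /* the thread clears TH_FL_STUCK when it makes progress */
};

/* Called on each watchdog wakeup for a thread which made no progress.
 * Returns true when the caller should panic.
 */
static bool wdt_wakeup(struct thr_state *th)
{
	if (!(th->flags & TH_FL_STUCK)) {
		/* first wakeup: only set the flag and give a second chance */
		th->flags |= TH_FL_STUCK;
		return false;
	}
	/* the flag was still set on a later wakeup: really stuck */
	return true;
}
```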
2025-05-20 16:37:03 +02:00
William Lallemand
dcdf27af70 DOC: management: update 'acme status'
Update the 'acme status' section with the "Stopped" status and fix the
description.
2025-05-20 16:08:57 +02:00
Frederic Lecaille
bbe302087c DOC: update INSTALL for QUIC with OpenSSL 3.5 usages
Update the QUIC sections which mention the OpenSSL library use cases.
2025-05-20 15:00:06 +02:00
Frederic Lecaille
08eee0d9cf MINOR: quic: OpenSSL 3.5 trick to support 0-RTT
For an unidentified reason, SSL_do_handshake() succeeds at its first call when 0-RTT
is enabled for the connection. This behavior looks very similar to the one encountered
with the AWS-LC stack. That said, it was documented by AWS-LC. This issue leads the
connection to stop sending handshake packets after having released the handshake
encryption level. In fact, no handshake packets could even be sent, leading
the handshake to always fail.

To fix this, this patch simulates a "handshake in progress" state waiting
for the application level read secret to be established by the TLS stack.
This may happen only after the QUIC listener has completed/confirmed the handshake
upon handshake CRYPTO data receipt from the peer.
2025-05-20 15:00:06 +02:00
Frederic Lecaille
849a3af14e MINOR: quic: OpenSSL 3.5 internal QUIC custom extension for transport parameters reset
A QUIC endpoint must send its transport parameters using a TLS custom extension. This
extension is reset by SSL_set_SSL_CTX(). It can be restored calling
quic_ssl_set_tls_cbs() (which calls SSL_set_quic_tls_cbs()).
2025-05-20 15:00:06 +02:00
Frederic Lecaille
b3ac1a636c MINOR: quic: implement all remaining callbacks for OpenSSL 3.5 QUIC API
The quic_conn struct is modified for two reasons. The first one is to store
the encoded version of the local transport parameters as this is done for
USE_QUIC_OPENSSL_COMPAT. Indeed, the local transport parameters "should remain
valid until after the parameters have been sent" as mentioned by the
SSL_set_quic_tls_cbs(3) manual. In our case, the buffer is a static buffer
attached to the quic_conn object. The role of the qc_ssl_set_quic_transport_params()
function is to call SSL_set_quic_tls_transport_params() (aliased by
SSL_set_quic_transport_params()) to set these local transport parameters into
the TLS stack from the buffer attached to the quic_conn struct.

The second quic_conn struct modification is the addition of the new ->prot_level
(SSL protection level) member to store "the most
recent write encryption level set via the OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn
callback (if it has been called)" as mentioned by the SSL_set_quic_tls_cbs(3) manual.

This patch finally implements the five remaining callbacks to make the haproxy
QUIC implementation work.

OSSL_FUNC_SSL_QUIC_TLS_crypto_send_fn() (ha_quic_ossl_crypto_send) is easy to
implement. It calls ha_quic_add_handshake_data() after having converted
qc->prot_level TLS protection level value to the correct ssl_encryption_level_t
(boringSSL API/quictls) value.

OSSL_FUNC_SSL_QUIC_TLS_crypto_recv_rcd_fn() (ha_quic_ossl_crypto_recv_rcd())
provides the non-contiguous addresses to the TLS stack, without releasing
them.

OSSL_FUNC_SSL_QUIC_TLS_crypto_release_rcd_fn() (ha_quic_ossl_crypto_release_rcd())
releases these non-contiguous buffers, relying on the fact that the list of
encryption levels (qc->qel_list) is correctly ordered by SSL protection level
secret establishment order (by the TLS stack).

OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn() (ha_quic_ossl_yield_secret())
is a simple wrapping function over ha_quic_set_encryption_secrets() which is used
by boringSSL/quictls API.

The role of OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn() (ha_quic_ossl_got_transport_params())
is to store the transport parameters received from the peer. It simply calls
quic_transport_params_store() and sets them into the TLS stack by calling
qc_ssl_set_quic_transport_params().

Also add some comments for all the OpenSSL 3.5 QUIC API callbacks.

This patch has no impact on the other uses of the QUIC API provided by the other TLS
stacks.
2025-05-20 15:00:06 +02:00
Frederic Lecaille
dc6a3c329a MINOR: quic: Allow the use of the new OpenSSL 3.5.0 QUIC TLS API (to be completed)
This patch allows the use of the new OpenSSL 3.5.0 QUIC TLS API when it is
available and detected at compilation time. The detection relies on the presence of the
OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND macro from openssl-compat.h. Indeed this
macro is defined by OpenSSL since 3.5.0 version. It is not defined by quictls.
This helps in distinguishing these two TLS stacks. When the detection succeeds,
HAVE_OPENSSL_QUIC is also defined by openssl-compat.h. It is then this new macro
which is used to detect the availability of the new OpenSSL 3.5.0 QUIC TLS API.

Note that this detection is done only if USE_QUIC_OPENSSL_COMPAT is not asked.
So, USE_QUIC_OPENSSL_COMPAT and HAVE_OPENSSL_QUIC are exclusive.

At the same location, from openssl-compat.h, ssl_encryption_level_t enum is
defined. This enum was defined by quictls and extensively used by the haproxy
QUIC implementation. SSL_set_quic_transport_params() is replaced by
SSL_set_quic_tls_transport_params(). SSL_set_quic_early_data_enabled() (quictls) is also replaced
by SSL_set_quic_tls_early_data_enabled() (OpenSSL). SSL_quic_read_level() (quictls)
is not defined by OpenSSL. It is only used by the traces to log the current
TLS stack decryption level (read). A macro makes it return -1 which is an
unused value.

Most of the differences between the quictls and OpenSSL QUIC APIs are in quic_ssl.c
where some callbacks must be defined for these two APIs. This is why this
patch modifies quic_ssl.c to define an array of OSSL_DISPATCH structs: <ha_quic_dispatch>.
Each element of this array defines a callback. So, this patch implements these
six callbacks:

  - ha_quic_ossl_crypto_send()
  - ha_quic_ossl_crypto_recv_rcd()
  - ha_quic_ossl_crypto_release_rcd()
  - ha_quic_ossl_yield_secret()
  - ha_quic_ossl_got_transport_params() and
  - ha_quic_ossl_alert().

But at this time, these implementations which must return an int return 0, interpreted
as a failure by the OpenSSL QUIC API, except for ha_quic_ossl_alert() which
is implemented the same way as for quictls. The five remaining functions above
will be implemented by the next patches to come.

ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() have been moved
to be defined for both quictls and OpenSSL QUIC API.

These callbacks are attached to the SSL objects (sessions) by calling the new
qc_ssl_set_cbs() function. The latter calls the correct function to attach the correct
callbacks to the SSL objects (defined by <ha_quic_method> for quictls, and
<ha_quic_dispatch> for OpenSSL).

The calls to SSL_provide_quic_data() and SSL_process_quic_post_handshake()
have also been disabled. These functions are not defined by OpenSSL QUIC API.
At this time, the functions which call them are still defined when HAVE_OPENSSL_QUIC
is defined.
2025-05-20 15:00:06 +02:00
Frederic Lecaille
894595b711 MINOR: quic: Add useful error traces about qc_ssl_sess_init() failures
There were no traces to diagnose qc_ssl_sess_init() failures from QUIC traces.
This patch adds calls to TRACE_DEVEL() into qc_ssl_sess_init() and its caller
(qc_alloc_ssl_sock_ctx()). This was useful at least to diagnose SSL context
initialization failures when porting QUIC to the new OpenSSL 3.5 QUIC API.

Should be easily backported as far as 2.6.
2025-05-20 15:00:06 +02:00
Frederic Lecaille
a2822b1776 CLEANUP: quic: Useless BIO_METHOD initialization
This code has been there since the start of the QUIC implementation. It was supposed to
initialize <ha_quic_meth> as a BIO_METHOD static object. But this
BIO_METHOD is not used at all!

Should be backported as far as 2.6 to help integrate the next patches to come.
2025-05-20 15:00:06 +02:00
William Lallemand
e803385a6e MINOR: acme: renewal notification over the dpapi sink
Output a sink message when the certificate was renewed by the ACME
client.

The message is emitted on the "dpapi" sink, and ends with \n\0.
Since the message contains this binary character, the right -0 parameter
must be used when consulting the sink over the CLI:

Example:

	$ echo "show events dpapi -nw -0" | socat -t9999 /tmp/haproxy.sock -
	<0>2025-05-19T15:56:23.059755+02:00 acme newcert foobar.pem.rsa\n\0

When used with the master CLI, @@1 should be used instead of @1 in order
to keep the connection to the worker.

Example:

	$ echo "@@1 show events dpapi -nw -0" | socat -t9999 /tmp/master.sock -
	<0>2025-05-19T15:56:23.059755+02:00 acme newcert foobar.pem.rsa\n\0
2025-05-19 16:07:25 +02:00
Willy Tarreau
99d6c889d0 BUG/MAJOR: leastconn: never reuse the node after dropping the lock
On ARM with 80 cores and a single server, it's sometimes possible to see
a segfault in fwlc_get_next_server() around 600-700k RPS. It seldom
happens as well on x86 with 128 threads with the same config around 1M
rps. It turns out that in fwlc_get_next_server(), before calling
fwlc_srv_reposition(), we have to drop the lock and that one takes it
back again.

The problem is that anything can happen to our node during this time,
and it can be freed. Then when continuing our work, we later iterate
over it and its next to find a node with an acceptable key, and by
doing so we can visit either uninitialized memory or simply nodes that
are no longer in the tree.

A first attempt at fixing this consisted in artificially incrementing
the elements count before dropping the lock, but that turned out to be
even worse because other threads could loop forever on such an element
looking for an entry that does not exist. Maintaining a separate
refcount didn't work well either, and it required to deal with the
memory release while dropping it, which is really not convenient.

Here we're taking a different approach consisting in simply not
trusting this node anymore and going back to the beginning of the
loop, as is done at a few other places as well. This way we can
safely ignore the possibly released node, and the test runs reliably
both on the arm and the x86 platforms mentioned above. No performance
regression was observed either, likely because this operation is quite
rare.

No backport is needed since this appeared with the leastconn rework
in 3.2.
2025-05-19 16:05:03 +02:00
Amaury Denoyelle
d358da4d83 BUG/MINOR: quic: fix crash on quic_conn alloc failure
If there is an alloc failure during qc_new_conn(), cleaning is done via
quic_conn_release(). However, since the below commit, an unchecked
dereferencing of <qc.path> is performed in the latter.

  e841164a4402118bd7b2e2dc2b5068f21de5d9d2
  MINOR: quic: account for global congestion window

To fix this, simply check <qc.path> before dereferencing it in
quic_conn_release(). This is safe as it is properly initialized to NULL
on qc_new_conn() first stage.

This does not need to be backported.
2025-05-19 11:03:48 +02:00
Willy Tarreau
099c1b2442 BUG/MAJOR: queue: properly keep count of the queue length
The queue length was moved to its own variable in commit 583303c48
("MINOR: proxies/servers: Calculate queueslength and use it."), however a
few places were missed in pendconn_unlink() and assign_server_and_queue()
resulting in never decreasing counts on aborted streams. This was
reproduced when injecting more connections than the total backend
could stand in TCP mode and letting some of them time out in the
queue. No backport is needed, this is only 3.2.
2025-05-17 10:46:10 +02:00
Willy Tarreau
6be02d1c6e BUG/MAJOR: leastconn: do not loop forever when facing saturated servers
Since commit 9fe72bba3 ("MAJOR: leastconn; Revamp the way servers are
ordered."), there's no way to escape the loop visiting the mt_list heads
in fwlc_get_next_server if all servers in the list are saturated,
resulting in a watchdog panic. It can be reproduced with this config
and injecting with more than 2 concurrent conns:

    balance leastconn
    server s1 127.0.0.1:8000 maxconn 1
    server s2 127.0.0.1:8000 maxconn 1

Here we count the number of saturated servers that were encountered, and
escape the loop once the number of remaining servers exceeds the number
of saturated ones. No backport is needed since this arrived in 3.2.
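
The idea of counting saturated servers to leave the loop can be sketched
as below. This is a simplified illustration over a plain array instead of
the mt_list heads, with hypothetical names and a simplified exit
condition, not the code from fwlc_get_next_server().

```c
#include <stddef.h>

struct server {
	int served;  /* connections currently served */
	int maxconn; /* 0 means unlimited */
};

/* Hypothetical picker: walks the servers in a circular fashion starting at
 * <start>, and counts the saturated ones so that the walk stops once every
 * server was found saturated instead of spinning forever.
 */
static struct server *pick_server(struct server *srv, size_t nb, size_t start)
{
	size_t saturated = 0;
	size_t i = start;

	while (saturated < nb) {
		struct server *s = &srv[i % nb];

		if (!s->maxconn || s->served < s->maxconn)
			return s;
		saturated++;
		i++;
	}
	return NULL; /* all servers are saturated */
}
```

With the two-server config above and both servers at maxconn, such a
picker returns NULL so that the request can be queued.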
2025-05-17 10:44:36 +02:00
Willy Tarreau
ccc65012d3 IMPORT: slz: silence a build warning on non-x86 non-arm
Building with clang 16 on MIPS64 yields this warning:

  src/slz.c:931:24: warning: unused function 'crc32_uint32' [-Wunused-function]
  static inline uint32_t crc32_uint32(uint32_t data)
                         ^

Let's guard it using UNALIGNED_LE_OK which is the only case where it's
used. This saves us from introducing a possibly non-portable attribute.

This is libslz upstream commit f5727531dba8906842cb91a75c1ffa85685a6421.
2025-05-16 16:43:53 +02:00
Willy Tarreau
31ca29eee1 IMPORT: slz: fix header used for empty zlib message
Calling slz_rfc1950_finish() without emitting any data would result in
incorrectly emitting a gzip header (rfc1952) instead of a zlib header
(rfc1950) due to a copy-paste between the two wrappers. The impact is
almost nonexistent since the zlib format is almost never used in this
context, and compressing totally empty messages is quite rare as well.
Let's take this opportunity for fixing another mistake on an RFC number
in a comment.

This is slz upstream commit 7f3fce4f33e8c2f5e1051a32a6bca58e32d4f818.
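
To check which header an output starts with, the two formats can be told
apart from their first two bytes. The sketch below is not part of slz; the
names are hypothetical and it only relies on the header layouts from
RFC 1950 and RFC 1952.

```c
#include <stddef.h>
#include <stdint.h>

enum stream_fmt { FMT_UNKNOWN, FMT_ZLIB, FMT_GZIP };

/* Guess the container format from the first two bytes of a stream.
 * - gzip (RFC 1952) starts with the magic bytes 0x1f 0x8b;
 * - zlib (RFC 1950) starts with CMF/FLG where the low nibble of CMF is 8
 *   (deflate) and the 16-bit value CMF*256+FLG is a multiple of 31.
 */
static enum stream_fmt detect_fmt(const uint8_t *buf, size_t len)
{
	if (len < 2)
		return FMT_UNKNOWN;
	if (buf[0] == 0x1f && buf[1] == 0x8b)
		return FMT_GZIP;
	if ((buf[0] & 0x0f) == 8 && ((buf[0] << 8) | buf[1]) % 31 == 0)
		return FMT_ZLIB;
	return FMT_UNKNOWN;
}
```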
2025-05-16 16:43:53 +02:00
Willy Tarreau
411b04c7d3 IMPORT: slz: use a better hash for machines with a fast multiply
The current hash involves 3 simple shifts and additions so that it can
be mapped to a multiply on architectures having a fast multiply. This is
indeed what the compiler does on x86_64. A large range of values was
scanned to try to find more optimal factors on machines supporting such
a fast multiply, and it turned out that new factor 0x1af42f resulted in
smoother hashes that provided on average 0.4% better compression on both
the Silesia corpus and an mbox file composed of very compressible emails
and uncompressible attachments. It's even slightly better than CRC32C
while being faster on Skylake. This patch enables this factor on archs
with a fast multiply.

This is slz upstream commit 82ad1e75c13245a835c1c09764c89f2f6e8e2a40.
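
For readers unfamiliar with multiplicative hashing, here is a minimal
sketch using the factor quoted above. The way slz picks the bits of the
product and the hash width are not described here, so keeping the top
<bits> bits is an assumption, and the names are hypothetical.

```c
#include <stdint.h>

#define HASH_FACTOR 0x1af42fU /* factor quoted in the commit message */

/* Multiplicative hash of a 32-bit input down to <bits> bits (1..32): the
 * product wraps modulo 2^32 and its top bits are kept. Which bits slz
 * really keeps is an assumption of this sketch.
 */
static inline uint32_t mul_hash(uint32_t x, unsigned int bits)
{
	return (uint32_t)(x * HASH_FACTOR) >> (32 - bits);
}
```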
2025-05-16 16:43:53 +02:00
Willy Tarreau
248bbec83c IMPORT: slz: support crc32c for lookup hash on sse4 but only if requested
If building for sse4 and USE_CRC32C_HASH is defined, then we can use
crc32c to calculate the lookup hash. By default we don't do it because
even on skylake it's slower than the current hash, which only involves
a short multiply (~5% slower). But the gains are marginal (0.3%).

This is slz upstream commit 44ae4f3f85eb275adba5844d067d281e727d8850.

Note: this is not used by default and only merged in order to avoid
divergence between the code bases.
2025-05-16 16:43:53 +02:00
Willy Tarreau
ea1b70900f IMPORT: slz: avoid multiple shifts on 64-bits
On 64-bit platforms, disassembling the code shows that send_huff() performs
a left shift followed by a right one, which are the result of integer
truncation and zero-extension caused solely by using different types at
different levels in the call chain. By making encode24() take a 64-bit
int on input and send_huff() take one optionally, we can remove one shift
in the hot path and gain 1% performance without affecting other platforms.

This is slz upstream commit fd165b36c4621579c5305cf3bb3a7f5410d3720b.
2025-05-16 16:43:53 +02:00
Willy Tarreau
0a91c6dcae BUILD: debug: mark ha_crash_now() as attribute(noreturn)
Building on MIPS64 with clang16 incorrectly reports some uninitialized
value warnings in stats-proxy.c due to some calls to ABORT_NOW() where
the compiler didn't know the code wouldn't return. Let's properly mark
the function as noreturn, and take this opportunity for also marking it
unused to avoid possible warnings depending on the build options (if
ABORT_NOW is not used). No backport needed though it will not harm.
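
The effect of the attribute can be illustrated with the small sketch
below, which uses GCC/clang attribute syntax. Both function names are
hypothetical and only stand in for ha_crash_now() and one of its callers.

```c
#include <stdlib.h>

/* Marked noreturn so that the compiler knows execution stops here, and
 * unused so that no warning is emitted when nothing calls it.
 */
static __attribute__((noreturn, unused)) void crash_now(void)
{
	abort();
}

/* Without the noreturn attribute, some compilers may report <ret> as
 * possibly uninitialized on the default branch.
 */
static int level_to_weight(int level)
{
	int ret;

	switch (level) {
	case 0: ret = 1; break;
	case 1: ret = 10; break;
	default: crash_now();
	}
	return ret;
}
```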
2025-05-16 16:43:53 +02:00
William Lallemand
1eebf98952 DOC: management: change reference to configuration manual
Since e24b77e7 ('DOC: config: move the extraneous sections out of the
"global" definition') the ACME section of the configuration manual was
moved from 3.13 to 12.8.

Change the reference to that section in "acme renew".
2025-05-16 16:01:43 +02:00
Willy Tarreau
81e46be026 DOC: config: properly index "table and "stick-table" in their section
Tim reported in issue #2953 that "stick-table" and "table" were not
indexed as keywords. The issue was the indent level. Also let's make
sure to put a box around the "store" arguments as well.
2025-05-16 15:37:03 +02:00
Willy Tarreau
df00164fdd BUG/MEDIUM: h1/h2/h3: reject forbidden chars in the Host header field
In continuation with 9a05c1f574 ("BUG/MEDIUM: h2/h3: reject some
forbidden chars in :authority before reassembly") and the discussion
in issue #2941, @DemiMarie rightfully suggested that Host should also
be sanitized, because it is sometimes used in concatenation, such as
this:

    http-request set-url https://%[req.hdr(host)]%[pathq]

which was proposed as a workaround for h2 upstream servers that require
:authority here:

    https://www.mail-archive.com/haproxy@formilux.org/msg43261.html

The current patch then adds the same check for forbidden chars in the
Host header, using the same function as for the patch above, since in
both cases we validate the host:port part of the authority. This way
we won't reconstruct ambiguous URIs by concatenating Host and path.

Just like the patch above, this can be backported after a period of
observation.
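
To illustrate why such a check matters for the concatenation above, here
is a minimal sketch of a host:port character filter. The exact set of
characters rejected by haproxy is not listed in this message, so the set
below is an assumption chosen for illustration, and the function name is
hypothetical.

```c
#include <stddef.h>

/* Returns 1 if <host> (of length <len>) only contains characters accepted
 * by this sketch for a host:port value, otherwise 0. The rejected set is
 * an assumption: control characters, space, DEL and the URI delimiters
 * '/', '?', '#' and '@'.
 */
static int host_chars_ok(const char *host, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)host[i];

		if (c <= 0x20 || c == 0x7f)
			return 0;
		if (c == '/' || c == '?' || c == '#' || c == '@')
			return 0;
	}
	return 1;
}
```

With such a filter, a Host value like "a.example/x?" is refused before it
can shift the boundary between the authority and the path in the rebuilt
URL.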
2025-05-16 15:13:17 +02:00
Willy Tarreau
b84762b3e0 BUG/MINOR: h3: don't insert more than one Host header
Let's make sure we drop extraneous Host headers after having compared
them. That also works when :authority was already present. This way,
like for h1 and h2, we only keep one copy of it, while still making
sure that Host matches :authority. This way, if a request has both
:authority and Host, only one Host header will be produced (from
:authority). Note that due to the different organization of the code
and wording along the evolving RFCs, here we also check that all
duplicates are identical, while h2 ignores them as per RFC7540, but
this will be re-unified later.

This should be backported to stable versions, at least 2.8, though
thanks to the existing checks the impact is probably null.
2025-05-16 15:13:17 +02:00
Christopher Faulet
f45a632bad BUG/MEDIUM: stconn: Disable 0-copy forwarding for filters altering the payload
It is especially a problem with Lua filters, but it is important to disable
the 0-copy forwarding if a filter alters the payload, or at least to be able
to disable it. While the filter is registered on the data filtering, it is
not an issue (and it is the common case) because there is no way to
fast-forward data at all. But it may be an issue if a filter decides to
alter the payload and to unregister from data filtering. In that case, the
0-copy forwarding can be re-enabled in a hardly predictable state.

To fix the issue, an SC flag was added to do so. The HTTP compression filter
sets it, and Lua filters do too if the body length is changed (via
HTTPMessage.set_body_len()).

Note that it is an issue because of a bad design of the HTX. Much of the info
about the message is stored in the HTX structure itself. It must be
refactored to move several pieces of info to the stream-endpoint descriptor. This
should ease modifications at the stream level, from filters or TCP/HTTP
rules.

This should be backported as far as 3.0. If necessary, it may be backported
on lower versions, as far as 2.6. In that case, it must be reviewed and
adapted.
2025-05-16 15:11:37 +02:00
Christopher Faulet
94055a5e73 MEDIUM: hlua: Add function to change the body length of an HTTP Message
There was no function for a lua filter to change the body length of an HTTP
Message. But it is mandatory to be able to alter the message payload. It is
not possible to directly update the message headers because the
internal state of the message must also be updated accordingly.

It is the purpose of HTTPMessage.set_body_len() function. The new body
length must be passed as argument. If it is an integer, the right
"Content-Length" header is set. If the "chunked" string is used, it forces
the message to be chunked-encoded and in that case the "Transfer-Encoding"
header is set.

This patch should fix the issue #2837. It could be backported as far as 2.6.
2025-05-16 14:34:12 +02:00
Willy Tarreau
f2d7aa8406 BUG/MEDIUM: peers: also limit the number of incoming updates
There's a configurable limit to the number of messages sent to a
peer (tune.peers.max-updates-at-once), but this one is not applied to
the receive side. While it can usually be OK with default settings,
setups involving a large tune.bufsize (1MB and above) regularly
experience high latencies and even watchdogs during reloads because
the full learning process sends a lot of data that manages to fill
the entire buffer, and due to the compactness of the protocol, 1MB
of buffer can contain more than 100k updates, meaning taking locks
etc during this time, which is not workable.

Let's make sure the receiving side also respects the max-updates-at-once
setting. For this it counts incoming updates, and refrains from
continuing once the limit is reached. It's a bit tricky to do because
after receiving updates we still have to send ours (and possibly some
ACKs) so we cannot just leave the loop.

This issue was reported on 3.1 but it should progressively be backported
to all versions having the max-updates-at-once option available.
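
The principle of bounding the work done per call can be sketched as
follows. This is a simplified illustration with hypothetical names, not
the peers code, which also has to interleave its own updates and ACKs.

```c
#include <stddef.h>

/* Hypothetical receive loop: handles at most <max_at_once> pending updates
 * per call and reports how many were processed. The caller is expected to
 * come back later for the rest, which leaves room for sending its own
 * updates and ACKs in between.
 */
static size_t process_updates(size_t *pending, size_t max_at_once)
{
	size_t done = 0;

	while (*pending > 0 && done < max_at_once) {
		/* an update would be decoded and applied under the lock here */
		(*pending)--;
		done++;
	}
	return done;
}
```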
2025-05-15 16:57:21 +02:00
Aurelien DARRAGON
098a5e5c0b BUG/MINOR: sink: detect and warn when using "send-proxy" options with ring servers
Using "send-proxy" or "send-proxy-v2" option on a ring server is not
relevant nor supported. Worse, on 2.4 it causes haproxy process to
crash as reported in GH #2965.

Let's be more explicit about the fact that this keyword is not supported
under "ring" context by ignoring the option and emitting a warning message
to inform the user about that.

Ideally, we should do the same for peers and log servers. The proper way
would be to check servers options during postparsing but we currently lack
proper cross-type server postparsing hooks. This will come later and thus
will give us a chance to perform the compatibility checks for server
options depending on proxy type. But for now let's simply fix the "ring"
case since it is the only one that's known to cause a crash.

It may be backported to all stable versions.
2025-05-15 16:18:31 +02:00
Basha Mougamadou
824bb93e18 DOC: configuration: explicit multi-choice on bind shards option
From the documentation, it wasn't clear enough that shards should
be followed by one of the options number / by-thread / by-group.
Align it with existing options in documentation so that it becomes
more explicit.
2025-05-14 19:41:38 +02:00
Willy Tarreau
17df04ff09 [RELEASE] Released version 3.2-dev16
Released version 3.2-dev16 with the following main changes :
    - BUG/MEDIUM: mux-quic: fix crash on invalid fctl frame dereference
    - DEBUG: pool: permit per-pool UAF configuration
    - MINOR: acme: add the global option 'acme.scheduler'
    - DEBUG: pools: add a new integrity mode "backup" to copy the released area
    - MEDIUM: sock-inet: re-check IPv6 connectivity every 30s
    - BUG/MINOR: ssl: doesn't fill conf->crt with first arg
    - BUG/MINOR: ssl: prevent multiple 'crt' on the same ssl-f-use line
    - BUG/MINOR: ssl/ckch: always free() the previous entry during parsing
    - MINOR: tools: ha_freearray() frees an array of string
    - BUG/MINOR: ssl/ckch: always ha_freearray() the previous entry during parsing
    - MINOR: ssl/ckch: warn when the same keyword was used twice
    - BUG/MINOR: threads: fix soft-stop without multithreading support
    - BUG/MINOR: tools: improve parse_line()'s robustness against empty args
    - BUG/MINOR: cfgparse: improve the empty arg position report's robustness
    - BUG/MINOR: server: dont depend on proxy for server cleanup in srv_drop()
    - BUG/MINOR: server: perform lbprm deinit for dynamic servers
    - MINOR: http: add a function to validate characters of :authority
    - BUG/MEDIUM: h2/h3: reject some forbidden chars in :authority before reassembly
    - MINOR: quic: account Tx data per stream
    - MINOR: mux-quic: account Rx data per stream
    - MINOR: quic: add stream format for "show quic"
    - MINOR: quic: display QCS info on "show quic stream"
    - MINOR: quic: display stream age
    - BUG/MINOR: cpu-topo: fix group-by-cluster policy for disordered clusters
    - MINOR: cpu-topo: add a new "group-by-ccx" CPU policy
    - MINOR: cpu-topo: provide a function to sort clusters by average capacity
    - MEDIUM: cpu-topo: change "performance" to consider per-core capacity
    - MEDIUM: cpu-topo: change "efficiency" to consider per-core capacity
    - MEDIUM: cpu-topo: prefer grouping by CCX for "performance" and "efficiency"
    - MEDIUM: config: change default limits to 1024 threads and 32 groups
    - BUG/MINOR: hlua: Fix Channel:data() and Channel:line() to respect documentation
    - DOC: config: Fix a typo in the "term_events" definition
    - BUG/MINOR: spoe: Don't report error on applet release if filter is in DONE state
    - BUG/MINOR: mux-spop: Don't report error for stream if ACK was already received
    - BUG/MINOR: mux-spop: Make the demux stream ID a signed integer
    - BUG/MINOR: mux-spop: Don't open new streams for SPOP connection on error
    - MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
    - BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state
    - BUG/MEDIUM: mux-spop: Properly handle CLOSING state
    - BUG/MEDIUM: spop-conn: Report short read for partial frames payload
    - BUG/MEDIUM: mux-spop: Properly detect truncated frames on demux to report error
    - BUG/MEDIUM: mux-spop; Don't report a read error if there are pending data
    - DEBUG: mux-spop: Review some trace messages to adjust the message or the level
    - DOC: config: move address formats definition to section 2
    - DOC: config: move stick-tables and peers to their own section
    - DOC: config: move the extraneous sections out of the "global" definition
    - CI: AWS-LC(fips): enable unit tests
    - CI: AWS-LC: enable unit tests
    - CI: compliance: limit run on forks only to manual + cleanup
    - CI: musl: enable unit tests
    - CI: QuicTLS (weekly): limit run on forks only to manual dispatch
    - CI: WolfSSL: enable unit tests
2025-05-14 17:01:46 +02:00
Ilia Shipitsin
12de9ecce5 CI: WolfSSL: enable unit tests
Run the new make unit-tests on the CI.
2025-05-14 17:00:31 +02:00
Ilia Shipitsin
75a1e40501 CI: QuicTLS (weekly): limit run on forks only to manual dispatch 2025-05-14 17:00:31 +02:00
Ilia Shipitsin
a8b1b08fd7 CI: musl: enable unit tests
Run the new make unit-tests on the CI.
2025-05-14 17:00:31 +02:00
Ilia Shipitsin
01225f9aa5 CI: compliance: limit run on forks only to manual + cleanup 2025-05-14 17:00:31 +02:00
Ilia Shipitsin
61b30a09c0 CI: AWS-LC: enable unit tests
Run the new make unit-tests on the CI.
2025-05-14 17:00:31 +02:00
Ilia Shipitsin
944a96156e CI: AWS-LC(fips): enable unit tests
Run the new make unit-tests on the CI.
2025-05-14 17:00:31 +02:00
Willy Tarreau
e24b77e765 DOC: config: move the extraneous sections out of the "global" definition
Due to some historic mistakes that have spread to newly added sections,
a number of recently added small sections found themselves described
under section 3 "global parameters" which is specific to "global" section
keywords. This is highly confusing, especially given that sections 3.1,
3.2, 3.3 and 3.10 directly start with keywords valid in the global section,
while others start with keywords that describe a new section.

Let's just create a new chapter "12. other sections" and move them all
there. 3.10 "HTTPclient tuning" however was moved to 3.4 as it's really
a definition of the global options assigned to the HTTP client. The
"programs" that are going away in 3.3 were moved at the end to avoid a
renumbering later.

Another nice benefit is that it moves a lot of text that was previously
keeping the global and proxies sections apart.
2025-05-14 16:08:02 +02:00
Willy Tarreau
da67a89f30 DOC: config: move stick-tables and peers to their own section
As suggested by Tim in issue #2953, stick-tables really deserve their own
section to explain the configuration. And peers have to move there as well
since they're totally dedicated to stick-tables.

Now we introduce a new section "Stick-tables and Peers", explaining the
concepts, and under which there is one subsection for stick-tables
configuration and one for the peers (which mostly keeps the existing
peers section).
2025-05-14 16:08:02 +02:00
Willy Tarreau
423dffa308 DOC: config: move address formats definition to section 2
Section 2 describes the config file format, variables naming etc, so
there's no reason why the address format used in this file should be
in a separate section, let's bring it into section 2 as well.
2025-05-14 16:08:02 +02:00
Christopher Faulet
e2ae8a74e8 DEBUG: mux-spop: Review some trace messages to adjust the message or the level
Some trace messages were not really accurate, reporting a CLOSED connection
while only an error was reported on it. In addition, a TRACE_ERROR() was
used to report a short read on HELLO/DISCONNECT frames header. But it is not
an error. A TRACE_DEVEL() should be used instead.

This patch could be backported to 3.1 to ease future backports.
2025-05-14 11:52:10 +02:00
Christopher Faulet
6e46f0bf93 BUG/MEDIUM: mux-spop; Don't report a read error if there are pending data
When a read error is detected, no error must be reported on the SPOP
connection if there are still some data to parse. It is important to be sure
to process all data before reporting the error and be sure to not truncate
received frames. However, we must also take care to handle the short read
case to not wait for data that will never be received.

This patch must be backported to 3.1.
2025-05-14 11:51:58 +02:00
Christopher Faulet
16314bb93c BUG/MEDIUM: mux-spop: Properly detect truncated frames on demux to report error
There was no test in the demux part to detect truncated frames and to report
an error at the connection level. The SPOP streams were properly switched to
half-closed state. But while waiting for the associated SPOE applets to be
woken up and released, the SPOP connection could be woken up several times
for nothing. I never triggered the watchdog in that case, but it is not
excluded.

Now, at the end of the demux function, a specific test was added to
detect truncated frames, report an error and close the connection.

This patch must be backported to 3.1.
2025-05-14 11:47:41 +02:00
Christopher Faulet
71feb49a9f BUG/MEDIUM: spop-conn: Report short read for partial frames payload
When a frame was not fully received, a short read must be reported on the
SPOP connection to help the demux to handle truncated frames. This was
performed for frames truncated on the header part but not on the payload
part. It is now properly detected.

This patch must be backported to 3.1.
2025-05-14 09:20:10 +02:00
Christopher Faulet
ddc5f8d92e BUG/MEDIUM: mux-spop: Properly handle CLOSING state
The CLOSING state was not handled at all by the SPOP multiplexer while it is
mandatory when a DISCONNECT frame was sent and the mux should wait for the
DISCONNECT frame in reply from the agent. Thanks to this patch, it should be
fixed.

In addition, if an error occurs during the AGENT HELLO frame parsing, the
SPOP connection is no longer switched to CLOSED state and remains in ERROR
state instead. It is important to be able to send the DISCONNECT frame to
the agent instead of closing the TCP connection immediately.

This patch depends on following commits:

  * BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state
  * MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
  * BUG/MINOR: mux-spop: Don't open new streams for SPOP connection on error
  * BUG/MINOR: mux-spop: Make the demux stream ID a signed integer

All the series must be backported to 3.1.
2025-05-14 09:14:12 +02:00
Christopher Faulet
a3940614c2 BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state
SPOP_CS_FRAME_H and SPOP_CS_FRAME_P states, that were used to handle frame
parsing, were removed. The demux process now relies on the demux stream ID
to know if it is waiting for the frame header or the frame
payload. Concretely, when the demux stream ID is not set (dsi == -1), the
demuxer is waiting for the next frame header. Otherwise (dsi >= 0), it is
waiting for the frame payload. It is especially important to be able to
properly handle DISCONNECT frames sent by the agents.

SPOP_CS_RUNNING state is introduced to know the hello handshake was finished
and the SPOP connection is able to open SPOP streams and exchange NOTIFY/ACK
frames with the agents.

It depends on the following fixes:

  * MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
  * BUG/MINOR: mux-spop: Make the demux stream ID a signed integer

This change will be mandatory for the next fix. It must be backported to 3.1
with the commits above.
2025-05-13 19:51:40 +02:00
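The header/payload decision described in the commit above can be pictured with a small sketch. The type and function names (`demux_sketch`, `demux_expects`, ...) are hypothetical and are not taken from the HAProxy sources; only the `dsi == -1` / `dsi >= 0` convention comes from the commit message.

```c
#include <stdint.h>

/* What the demuxer is waiting for next. */
enum demux_expect { EXPECT_FRAME_HEADER, EXPECT_FRAME_PAYLOAD };

/* Hypothetical, stripped-down demuxer: only the demux stream ID is kept. */
struct demux_sketch {
    int32_t dsi;   /* -1 when no frame is being parsed, >= 0 otherwise */
};

/* The expected input is derived from the demux stream ID alone, so no
 * dedicated "frame header" / "frame payload" connection state is needed.
 */
static enum demux_expect demux_expects(const struct demux_sketch *d)
{
    return (d->dsi == -1) ? EXPECT_FRAME_HEADER : EXPECT_FRAME_PAYLOAD;
}

/* A frame header was parsed for stream <sid> (sid >= 0). */
static void demux_header_done(struct demux_sketch *d, int32_t sid)
{
    d->dsi = sid;
}

/* The whole frame payload was consumed: wait for the next header. */
static void demux_payload_done(struct demux_sketch *d)
{
    d->dsi = -1;
}
```

Note that stream ID 0 is a valid value here, which is one reason why "unset" has to be encoded as -1 and the ID has to be signed.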
Christopher Faulet
6b0f7de4e3 MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
After the ACK frame was parsed, it is useless to set the SPOP connection
state to SPOP_CS_FRAME_H state because this will be automatically handled by
the demux function. It is not an issue, but this will simplify changes
for the next commit.
2025-05-13 19:51:40 +02:00
Christopher Faulet
197eaaadfd BUG/MINOR: mux-spop: Don't open new streams for SPOP connection on error
Till now, only SPOP connections fully closed or those with a TCP connection on
error were concerned. But available streams could be reported for SPOP
connections in error or closing state. But in these states, no NOTIFY frames
will be sent and no ACK frames will be parsed. So, no new SPOP streams should be
opened.

This patch should be backported to 3.1.
2025-05-13 19:51:40 +02:00
Christopher Faulet
cbc10b896e BUG/MINOR: mux-spop: Make the demux stream ID a signed integer
The demux stream ID of a SPOP connection, used when received frames are
parsed, must be a signed integer because it is set to -1 when the SPOP
connection is initialized. It will be important for the next fix.

This patch must be backported to 3.1.
2025-05-13 19:51:40 +02:00
Christopher Faulet
6d68beace5 BUG/MINOR: mux-spop: Don't report error for stream if ACK was already received
When a SPOP connection was closed or was in error, an error was
systematically reported on all its SPOP streams. However, SPOP streams that
already received their ACK frame must be excluded. Otherwise if an agent
sends an ACK and closes immediately, the ACK will be ignored because the SPOP
stream will handle the error first.

This patch must be backported to 3.1.
2025-05-13 19:51:40 +02:00
Christopher Faulet
1cd30c998b BUG/MINOR: spoe: Don't report error on applet release if filter is in DONE state
When the SPOE applet was released, if a SPOE filter context was still
attached to it, an error was reported to the filter. However, there is no
reason to report an error if the ACK message was already received. Because
of this bug, if the ACK message is received and the SPOE connection is
immediately closed, this prevents the ACK message from being processed.

This patch should be backported to 3.1.
2025-05-13 19:51:40 +02:00
Christopher Faulet
dcce02d6ed DOC: config: Fix a typo in the "term_events" definition
A space was missing before the colon.
2025-05-13 19:51:40 +02:00
Christopher Faulet
a5de0e1595 BUG/MINOR: hlua: Fix Channel:data() and Channel:line() to respect documentation
When the channel API was revisited, both functions above were added. An
offset can be passed as argument. However, this parameter could be reported
to be out of range if not enough input data was received yet. It
is an issue, especially with a tcp rule, because more data could be
received. If an error is reported too early, this prevents the rule from
being reevaluated later. In fact, an error should only be reported if the
offset is part of the output data.

Another issue is about the conditions to report 'nil' instead of an empty
string. 'nil' was reported when no data was found. But it is not aligned
with the documentation. 'nil' must only be returned if no more data can
be received and there is no input data at all.

This patch should fix the issue #2716. It should be backported as far as 2.6.
2025-05-13 19:51:40 +02:00
Willy Tarreau
e049bd00ab MEDIUM: config: change default limits to 1024 threads and 32 groups
A test run on a dual-socket EPYC 9845 (2x160 cores) showed that we'll
be facing new limits during the lifetime of 3.2 with our current 16
groups and 256 threads max:

  $ cat test.cfg
  global
      cpu-policy performance

  $ ./haproxy -dc -c -f test.cfg
  ...
  Thread CPU Bindings:
    Tgrp/Thr  Tid        CPU set
    1/1-32    1-32       32: 0-15,320-335
    2/1-32    33-64      32: 16-31,336-351
    3/1-32    65-96      32: 32-47,352-367
    4/1-32    97-128     32: 48-63,368-383
    5/1-32    129-160    32: 64-79,384-399
    6/1-32    161-192    32: 80-95,400-415
    7/1-32    193-224    32: 96-111,416-431
    8/1-32    225-256    32: 112-127,432-447

Raising the default limit to 1024 threads and 32 groups is sufficient
to buy us enough margin for a long time (hopefully, please don't laugh,
you, reader from the future):

  $ ./haproxy -dc -c -f test.cfg
  ...
  Thread CPU Bindings:
    Tgrp/Thr  Tid        CPU set
    1/1-32    1-32       32: 0-15,320-335
    2/1-32    33-64      32: 16-31,336-351
    3/1-32    65-96      32: 32-47,352-367
    4/1-32    97-128     32: 48-63,368-383
    5/1-32    129-160    32: 64-79,384-399
    6/1-32    161-192    32: 80-95,400-415
    7/1-32    193-224    32: 96-111,416-431
    8/1-32    225-256    32: 112-127,432-447
    9/1-32    257-288    32: 128-143,448-463
    10/1-32   289-320    32: 144-159,464-479
    11/1-32   321-352    32: 160-175,480-495
    12/1-32   353-384    32: 176-191,496-511
    13/1-32   385-416    32: 192-207,512-527
    14/1-32   417-448    32: 208-223,528-543
    15/1-32   449-480    32: 224-239,544-559
    16/1-32   481-512    32: 240-255,560-575
    17/1-32   513-544    32: 256-271,576-591
    18/1-32   545-576    32: 272-287,592-607
    19/1-32   577-608    32: 288-303,608-623
    20/1-32   609-640    32: 304-319,624-639

We can change this default now because it has no functional effect
without any configured cpu-policy, so this will only be an opt-in
and it's better to do it now than to have an effect during the
maintenance phase. A tiny effect is a doubling of the number of
pool buckets and stick-table shards internally, which means that
aside from slightly reducing contention in these areas, a dump of tables
can enumerate keys in a different order (hence the adjustment in the
vtc).

The only really visible effect is a slightly higher static memory
consumption (29->35 MB on a small config), but that difference
remains even with 50k servers so that's pretty much acceptable.

Thanks to Erwan Velu for the quick tests and the insights!
2025-05-13 18:15:33 +02:00
Willy Tarreau
158da59c34 MEDIUM: cpu-topo: prefer grouping by CCX for "performance" and "efficiency"
Most of the time, machines made of multiple CPU types use the same L3
for them, and grouping CPUs by frequencies to form groups doesn't bring
any value and on the contrary can impair the incoming connection balancing.
This choice of grouping by cluster was made in order to constitute a good
choice on homogeneous machines as well, so better rely on the per-CCX
grouping than the per-cluster one in this case. This will create fewer
clusters on machines where it counts without affecting other ones.

It doesn't seem necessary to change anything for the "resource" policy
since it selects a single cluster.
2025-05-13 16:48:30 +02:00
Willy Tarreau
70b0dd6b0f MEDIUM: cpu-topo: change "efficiency" to consider per-core capacity
This is similar to the previous change to the "performance" policy but
it applies to the "efficiency" one. Here we're changing the sorting
method to sort CPU clusters by average per-CPU capacity, and we evict
clusters whose per-CPU capacity is above 125% of the previous one.
Per-core capacity allows detecting discrepancies between CPU cores,
and continuing to focus on efficient ones as a priority.
2025-05-13 16:48:30 +02:00
Willy Tarreau
6c88e27cf4 MEDIUM: cpu-topo: change "performance" to consider per-core capacity
Running the "performance" policy on highly heterogeneous systems yields
bad choices when there are sufficiently more small than big cores,
and/or when there are multiple cluster types, because on such setups,
the higher the frequency, the lower the number of cores, despite small
differences in frequencies. In such cases, we quickly end up with
"performance" only choosing the small or the medium cores, which is
contrary to the original intent, which was to select performance cores.
This is what happens on boards like the Orion O6 for example where only
the 4 medium cores and 2 big cores are chosen, evicting the 2 biggest
cores and the 4 smallest ones.

Here we're changing the sorting method to sort CPU clusters by average
per-CPU capacity, and we evict clusters whose per-CPU capacity falls
below 80% of the previous one. Per-core capacity allows detecting
discrepancies between CPU cores, and continuing to focus on high
performance ones as a priority.
2025-05-13 16:48:30 +02:00
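As an illustration of the 80% rule from the commit above, here is a minimal sketch. It assumes the clusters are already sorted by decreasing average per-CPU capacity, and it assumes that the first cluster falling below the threshold is evicted along with all the following ones; the function name and the input layout are hypothetical and do not come from the HAProxy sources.

```c
#include <stddef.h>

/* <avg_capa> holds the average per-CPU capacity of each cluster, sorted in
 * decreasing order. Returns how many leading clusters are kept: the scan
 * stops at the first cluster whose capacity falls below 80% of the previous
 * one, and that cluster plus all the following ones are evicted.
 */
static size_t keep_performance_clusters(const unsigned long *avg_capa, size_t n)
{
    size_t kept;

    if (n == 0)
        return 0;

    for (kept = 1; kept < n; kept++) {
        /* integer form of: cur < 0.8 * prev */
        if (avg_capa[kept] * 5 < avg_capa[kept - 1] * 4)
            break;
    }
    return kept;
}
```

The "efficiency" policy described in the neighbouring commit would be the mirror image under the same assumptions: clusters sorted by increasing capacity, with eviction once a cluster exceeds 125% of the previous one.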
Willy Tarreau
5ab2c815f1 MINOR: cpu-topo: provide a function to sort clusters by average capacity
The current per-capacity sorting function acts on a whole cluster, but
in some setups having many small cores and few big ones, it becomes
easy to observe an inversion of metrics where the many small cores show
a globally higher total capacity than the few big ones. This does not
necessarily fit all use cases. Let's add a new function to sort clusters
by their per-cpu average capacity to cover more use cases.
2025-05-13 16:48:30 +02:00
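The inversion of metrics mentioned in the commit above is easy to reproduce with a `qsort()` comparator. This is only a sketch: `struct cluster_sketch` and `cmp_avg_capa_desc()` are hypothetical names, and every cluster is assumed to hold at least one CPU.

```c
#include <stdlib.h>

/* Hypothetical cluster description: total capacity and number of CPUs. */
struct cluster_sketch {
    unsigned total_capa;
    unsigned nb_cpu;      /* assumed to be > 0 */
};

/* qsort() comparator ordering clusters by decreasing average per-CPU
 * capacity. The averages total/nb_cpu are compared by cross-multiplication
 * to avoid integer division rounding.
 */
static int cmp_avg_capa_desc(const void *a, const void *b)
{
    const struct cluster_sketch *l = a;
    const struct cluster_sketch *r = b;
    unsigned long long lv = (unsigned long long)l->total_capa * r->nb_cpu;
    unsigned long long rv = (unsigned long long)r->total_capa * l->nb_cpu;

    return (lv < rv) - (lv > rv);
}
```

With 8 small cores totalling 2400 and 2 big cores totalling 2000, a sort on total capacity puts the small cluster first, while this comparator puts the big one first (1000 per CPU versus 300).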
Willy Tarreau
01df98adad MINOR: cpu-topo: add a new "group-by-ccx" CPU policy
This cpu-policy will only consider CCX and not clusters. This makes
a difference on machines with heterogeneous CPUs that generally share
the same L3 cache, where it's not desirable to create multiple groups
based on the CPU types, but instead create one with the different CPU
types. The variants "group-by-2/3/4-ccx" have also been added.

Let's also add some text explaining the difference between cluster
and CCX.
2025-05-13 16:48:30 +02:00
Willy Tarreau
33d8b006d4 BUG/MINOR: cpu-topo: fix group-by-cluster policy for disordered clusters
Some (rare) boards have their clusters in an erratic order. This is
the case for the Radxa Orion O6 where one of the big cores appears as
CPU0 due to booting from it, then followed by the small cores, then the
medium cores, then the remaining big cores. This results in clusters
appearing in this order: 0,2,1,0.

The code in cpu_policy_group_by_cluster() expected ordered clusters,
and performed ordered comparisons to decide whether a CPU's cluster has
already been taken care of. On the board above this doesn't work, only
clusters 0 and 2 appear and 1 is skipped.

Let's replace the cluster number comparison with a cpuset to record
which clusters have been taken care of. Now the groups properly appear
like this:

  Tgrp/Thr  Tid        CPU set
  1/1-2     1-2        2: 0,11
  2/1-4     3-6        4: 1-4
  3/1-6     7-12       6: 5-10

No backport is needed, this is purely 3.2.
2025-05-13 16:48:30 +02:00
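The difference between the ordered comparison and the "set of already seen clusters" approach from the commit above can be sketched as follows. Both functions are hypothetical simplifications (they only count groups, and the set is a plain 64-bit mask rather than HAProxy's cpuset type); the input used in the test mirrors the 0,2,1,0 cluster order quoted in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts clusters assuming their IDs never decrease while walking the CPUs.
 * A cluster whose ID is lower than one already met is silently skipped.
 */
static int count_clusters_ordered(const int *cluster_of_cpu, size_t ncpu)
{
    int last = -1, groups = 0;

    for (size_t i = 0; i < ncpu; i++) {
        if (cluster_of_cpu[i] > last) {
            last = cluster_of_cpu[i];
            groups++;
        }
    }
    return groups;
}

/* Counts clusters by recording each visited ID in a bit set, so the order
 * in which clusters appear no longer matters. IDs outside 0..63 are ignored
 * in this sketch.
 */
static int count_clusters_seen_set(const int *cluster_of_cpu, size_t ncpu)
{
    uint64_t seen = 0;
    int groups = 0;

    for (size_t i = 0; i < ncpu; i++) {
        int id = cluster_of_cpu[i];

        if (id < 0 || id > 63)
            continue;
        if (!(seen & (1ULL << id))) {
            seen |= 1ULL << id;
            groups++;
        }
    }
    return groups;
}
```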
Amaury Denoyelle
f3b9676416 MINOR: quic: display stream age
Add a field to save the creation date of qc_stream_desc instance. This
is useful to display QUIC stream age in "show quic stream" output.
2025-05-13 15:44:22 +02:00
Amaury Denoyelle
dbf07c754e MINOR: quic: display QCS info on "show quic stream"
Complete stream output for "show quic" by displaying information from
its upper QCS. Note that QCS may be NULL if already released, so a
default output is also provided.
2025-05-13 15:43:28 +02:00
Amaury Denoyelle
cbadfa0163 MINOR: quic: add stream format for "show quic"
Add a new format for "show quic" command labelled as "stream". This is
an equivalent of "show sess", dedicated to the QUIC stack. Each active
QUIC stream is listed on a line with its related info.

The main objective of this command is to ensure there are no frozen
streams remaining after a transfer.
2025-05-13 15:41:51 +02:00
Amaury Denoyelle
1ccede211c MINOR: mux-quic: account Rx data per stream
Add counters to measure Rx buffers usage per QCS. This reuses the newly
defined bdata_ctr type already used for Tx accounting.

Note that for now, <tot> value of bdata_ctr is not used. This is because
it is not easy to account for data across contiguous buffers.

These values are displayed both on log/traces and "show quic" output.
2025-05-13 15:41:51 +02:00
Amaury Denoyelle
a1dc9070e7 MINOR: quic: account Tx data per stream
Add accounting at qc_stream_desc level to be able to report the number
of allocated Tx buffers and the sum of their data. This represents data
ready for emission or already emitted and waiting on ACK.

To simplify this accounting, a new counter type bdata_ctr is defined in
quic_utils.h. This regroups both buffers and data counter, plus a
maximum on the buffer value.

These values are now displayed on QCS info used both on logline and
traces, and also on "show quic" output.
2025-05-13 15:41:41 +02:00
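A counter grouping a buffer count, a data total and a buffer maximum, as described in the commit above, could look like the sketch below. The real `bdata_ctr` type is defined in quic_utils.h and is not reproduced here: the field names other than `tot`, the field widths and the helper functions are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical layout inspired by the commit description. */
struct bdata_ctr_sketch {
    uint64_t tot;    /* sum of the data held in the buffers, in bytes */
    uint32_t bcnt;   /* number of buffers currently allocated */
    uint32_t bmax;   /* highest value ever reached by bcnt */
};

/* Accounts for a newly allocated buffer and updates the maximum. */
static void bdata_ctr_sketch_binc(struct bdata_ctr_sketch *c)
{
    c->bcnt++;
    if (c->bcnt > c->bmax)
        c->bmax = c->bcnt;
}

/* Accounts for a released buffer; the maximum is left untouched. */
static void bdata_ctr_sketch_bdec(struct bdata_ctr_sketch *c)
{
    if (c->bcnt)
        c->bcnt--;
}

/* Accounts for <bytes> of data added to the buffers. */
static void bdata_ctr_sketch_add(struct bdata_ctr_sketch *c, uint64_t bytes)
{
    c->tot += bytes;
}
```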
Willy Tarreau
9a05c1f574 BUG/MEDIUM: h2/h3: reject some forbidden chars in :authority before reassembly
As discussed here:
   https://github.com/httpwg/http2-spec/pull/936
   https://github.com/haproxy/haproxy/issues/2941

It's important to take care of some special characters in the :authority
pseudo header before reassembling a complete URI, because after assembly
it's too late (e.g. the '/'). This patch does this, both for h2 and h3.

The impact on H2 was measured in the worst case at 0.3% of the request
rate, while the impact on H3 is around 1%, but H3 was about 1% faster
than H2 before and is now on par.

It may be backported after a period of observation, and in this case it
relies on this previous commit:

   MINOR: http: add a function to validate characters of :authority

Thanks to @DemiMarie for reviving this topic in issue #2941 and bringing
new potential interesting cases.
2025-05-12 18:02:47 +02:00
Willy Tarreau
ebab479cdf MINOR: http: add a function to validate characters of :authority
As discussed here:
  https://github.com/httpwg/http2-spec/pull/936
  https://github.com/haproxy/haproxy/issues/2941

It's important to take care of some special characters in the :authority
pseudo header before reassembling a complete URI, because after assembly
it's too late (e.g. the '/').

This patch adds a specific function which checks all such characters
and their ranges on an ist, and benefits from modern compilers'
optimizations that arrange the comparisons into an evaluation tree for
faster match. That's the version that gave the most consistent performance
across various compilers, though some hand-crafted versions using bitmaps
stored in register could be slightly faster but super sensitive to code
ordering, suggesting that the results might vary with future compilers.
This one takes on average 1.2ns per character at 3 GHz (3.6 cycles per
char on avg). The resulting impact on H2 request processing time (small
requests) was measured around 0.3%, from 6.60 to 6.618us per request,
which is a bit high but remains acceptable given that the test only
focused on req rate.

The code was made usable both for H2 and H3.
2025-05-12 18:02:47 +02:00
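A character check of the kind described in the commit above can be sketched as a plain loop of comparisons. The exact set of forbidden characters is not listed in the commit message, so the set below (control characters, space, DEL, '#', '/', '?' and '@') is an assumption chosen for illustration, and the function name is hypothetical; it also works on a pointer and a length rather than on HAProxy's ist type.

```c
#include <stddef.h>

/* Returns 1 if <s> (of length <len>) contains a character considered
 * forbidden in an :authority value by this sketch, otherwise 0.
 * The character set is illustrative, not the one used by HAProxy.
 */
static int authority_has_forbidden_char_sketch(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];

        if (c <= 0x20 || c == 0x7f)          /* controls, space, DEL */
            return 1;
        if (c == '#' || c == '/' || c == '?' || c == '@')
            return 1;                        /* URI delimiters, userinfo */
    }
    return 0;
}
```

Rejecting '/' at this stage matters because, once the authority is glued in front of the path, a slash smuggled into it can no longer be told apart from the start of the path.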
Aurelien DARRAGON
c40d6ac840 BUG/MINOR: server: perform lbprm deinit for dynamic servers
Last commit 7361515 ("BUG/MINOR: server: dont depend on proxy for server
cleanup in srv_drop()") introduced a regression because the lbprm
server_deinit is not evaluated anymore with dynamic servers, possibly
resulting in a memory leak.

To fix the issue, in addition to free_proxy(), the server deinit check
should be manually performed in cli_parse_delete_server() as well.

No backport needed.
2025-05-12 16:29:36 +02:00
Aurelien DARRAGON
736151556c BUG/MINOR: server: dont depend on proxy for server cleanup in srv_drop()
In commit b5ee8bebfc ("MINOR: server: always call ssl->destroy_srv when
available"), we made it so srv_drop() doesn't depend on proxy to perform
server cleanup.

It turns out this is now mandatory, because during deinit, free_proxy()
can occur before the final srv_drop(). This is the case when using Lua
scripts for instance.

In 2a9436f96 ("MINOR: lbprm: Add method to deinit server and proxy") we
added a freeing check under srv_drop() that depends on the proxy.
Because of that, a UAF may occur during deinit when using a Lua script that
manipulates server objects.

To fix the issue, let's perform the lbprm server deinit logic under
free_proxy() directly, where the DEINIT server hooks are evaluated.

Also, to prevent similar bugs in the future, let's explicitly document
in srv_drop() that server cleanups should assume that the proxy may
already be freed.

No backport needed unless 2a9436f96 is.
2025-05-12 16:17:26 +02:00
Willy Tarreau
be4d816be2 BUG/MINOR: cfgparse: improve the empty arg position report's robustness
OSS Fuzz found that the previous fix ebb19fb367 ("BUG/MINOR: cfgparse:
consider the special case of empty arg caused by \x00") was incomplete,
as the output can sometimes be larger than the input (due to variables
expansion) in which case the workaround to try to report a bad arg will
fail. While the parse_line() function has been made more robust now in
order to avoid this condition, let's fix the handling of this special
case anyway by just pointing to the beginning of the line if the supposed
error location is out of the line's buffer.

All details here:
   https://oss-fuzz.com/testcase-detail/5202563081502720

No backport is needed unless the fix above is backported.
2025-05-12 16:11:15 +02:00
Willy Tarreau
2b60e54fb1 BUG/MINOR: tools: improve parse_line()'s robustness against empty args
The fix in 10e6d0bd57 ("BUG/MINOR: tools: only fill first empty arg when
not out of range") was not that good. It focused on protecting against
<arg> becoming out of range to detect we haven't emitted anything, but
it's not the right way to detect this. We're always maintaining arg_start
as a copy of outpos, and that latter one is incremented when emitting a
char, so instead of testing args[arg] against out+arg_start, we should
instead check outpos against arg_start, thereby eliminating the <out>
offset and the need to access args[]. This way we now always know if
we've emitted an empty arg without dereferencing args[].

There's no need to backport this unless the fix above is also backported.
2025-05-12 16:11:15 +02:00
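The idea from the commit above, comparing the output position with the position saved at the start of the argument, can be shown on a toy tokenizer. This is a deliberately reduced sketch and not parse_line(): it only knows spaces and double quotes, and the function name is hypothetical.

```c
#include <stddef.h>

/* Splits <in> on spaces into <out> (NUL-separated words) and returns the
 * index of the first empty argument, or -1 if there is none or if <out>
 * is too small. An argument is empty when nothing was emitted for it,
 * which is detected with (outpos == arg_start) without looking at any
 * args[] pointer.
 */
static int first_empty_arg_sketch(const char *in, char *out, size_t outlen)
{
    size_t outpos = 0;
    int arg = 0;

    while (*in) {
        size_t arg_start;
        int quoted = 0;

        while (*in == ' ')              /* skip separators */
            in++;
        if (!*in)
            break;                      /* trailing spaces: no new arg */

        arg_start = outpos;             /* copy of outpos for this arg */
        while (*in && (quoted || *in != ' ')) {
            if (*in == '"') {
                quoted = !quoted;       /* quotes are not emitted */
            } else {
                if (outpos + 1 >= outlen)
                    return -1;          /* keep room for the final NUL */
                out[outpos++] = *in;
            }
            in++;
        }

        if (outpos == arg_start)        /* nothing emitted: empty arg */
            return arg;

        out[outpos++] = 0;
        arg++;
    }
    return -1;
}
```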
Aurelien DARRAGON
7d057e56af BUG/MINOR: threads: fix soft-stop without multithreading support
When thread support is disabled ("USE_THREAD=" or "USE_THREAD=0" when
building), soft-stop doesn't work as haproxy never ends after stopping
the proxies.

This used to work fine in the past but suddenly stopped working with
ef422ced91 ("MEDIUM: thread: make stopping_threads per-group and add
stopping_tgroups") because the "break;" instruction under the stopping
condition is never executed when support for multithreading is disabled.

To fix the issue, let's add an "else" block to run the "break;"
instruction when USE_THREAD is not defined.

It should be backported up to 2.8
2025-05-12 14:18:39 +02:00
William Lallemand
8b0d1a4113 MINOR: ssl/ckch: warn when the same keyword was used twice
When using a crt-list or a crt-store, keywords mentioned twice on the
same line overwrite the previous value.

This patch emits a warning when the same keyword is found another time
on the same line.
2025-05-09 19:18:38 +02:00
William Lallemand
9c0c05b7ba BUG/MINOR: ssl/ckch: always ha_freearray() the previous entry during parsing
The ckch_conf_parse() function is the generic function which parses
crt-store keywords from the crt-store section, and also from a
crt-list.

When the same keyword appears multiple times, a leak of the previous
value happens. This patch ensures that the previous value is always
freed before overwriting it.

This is the same problem as the previous "BUG/MINOR: ssl/ckch: always
free() the previous entry during parsing" patch, however this one
applies to PARSE_TYPE_ARRAY_SUBSTR.

No backport needed.
2025-05-09 19:16:02 +02:00
William Lallemand
96b1f1fd26 MINOR: tools: ha_freearray() frees an array of string
ha_freearray() is a new function which free()s an array of strings
terminated by a NULL entry.

The pointer to the array will be freed and set to NULL.
2025-05-09 19:12:05 +02:00
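The behaviour described in the commit above (free every string, free the array, reset the pointer) can be sketched as follows. The name `freearray_sketch()` and its `char ***` signature are assumptions; the actual prototype of ha_freearray() is not shown in the commit message.

```c
#include <stdlib.h>

/* Frees every string of the NULL-terminated array pointed to by <array>,
 * then the array itself, and finally resets the caller's pointer to NULL
 * so that it cannot be freed or dereferenced again by mistake.
 */
static void freearray_sketch(char ***array)
{
    char **p;

    if (!array || !*array)
        return;

    for (p = *array; *p; p++)
        free(*p);

    free(*array);
    *array = NULL;
}
```

Resetting the pointer makes the call idempotent: calling it a second time on the same variable is a no-op.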
William Lallemand
311e0aa5c7 BUG/MINOR: ssl/ckch: always free() the previous entry during parsing
The ckch_conf_parse() function is the generic function which parses
crt-store keywords from the crt-store section, and also from a crt-list.

When the same keyword appears multiple times, a leak of the previous value
happens. This patch ensures that the previous value is always freed
before overwriting it.

This patch should be backported as far as 3.0.
2025-05-09 19:01:28 +02:00
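The leak pattern fixed by the commit above is the classic "overwrite a heap pointer without releasing it first". A minimal sketch of the corrected pattern, using a hypothetical helper that is not part of ckch_conf_parse():

```c
#include <stdlib.h>
#include <string.h>

/* Stores a copy of <value> into <*slot>, releasing whatever a previous
 * occurrence of the same keyword had stored there. Returns 0 on success,
 * -1 on allocation failure (in which case <*slot> is left unchanged).
 */
static int set_keyword_value_sketch(char **slot, const char *value)
{
    size_t len = strlen(value) + 1;
    char *copy = malloc(len);

    if (!copy)
        return -1;
    memcpy(copy, value, len);

    free(*slot);        /* free(NULL) is a no-op on first use */
    *slot = copy;
    return 0;
}
```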
William Lallemand
9ce3fb35a2 BUG/MINOR: ssl: prevent multiple 'crt' on the same ssl-f-use line
The 'ssl-f-use' implementation doesn't prevent the 'crt' keyword from
appearing multiple times, each occurrence overwriting the previous value.
This lets users think that it is possible to use multiple certificates on
the same line, which is not the case.

This patch emits an alert when setting the 'crt' keyword multiple times
on the same ssl-f-use line.

Should fix issue #2966.

No backport needed.
2025-05-09 18:52:09 +02:00
William Lallemand
0c4abf5a22 BUG/MINOR: ssl: doesn't fill conf->crt with first arg
Commit c7f29afc ("MEDIUM: ssl: replace "crt" lines by "ssl-f-use"
lines") forgot to remove the allocation of the crt field which was
done with the first argument.

Since ssl-f-use takes keywords, this would put the first keyword in
"crt" instead of the certificate name.
2025-05-09 18:23:06 +02:00
Willy Tarreau
8a96216847 MEDIUM: sock-inet: re-check IPv6 connectivity every 30s
IPv6 connectivity might start off (e.g. network not fully up when
haproxy starts), so for features like resolvers, it would be nice to
periodically recheck.

With this change, instead of having the resolvers code rely on a variable
indicating connectivity, it will now call a function that will check for
how long a connectivity check hasn't been run, and will perform a new one
if needed. The age was set to 30s which seems reasonable considering that
the DNS will cache results anyway. There's no saving in spacing it more
since the syscall is very cheap (just a connect() without any packet being
emitted).

The variables remain exported so that we could present them in show info
or anywhere else.

This way, "dns-accept-family auto" will now stay up to date. Warning
though, it does perform some caching so even with a refreshed IPv6
connectivity, an older record may be returned anyway.
2025-05-09 15:45:44 +02:00
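The "check at most every 30 seconds" logic from the commit above can be sketched with a cached result and a timestamp. Everything here is illustrative: the structure, the function names and the probed address (taken from the 2001:db8::/32 documentation prefix) are assumptions, not what HAProxy uses. The probe relies on the fact that connect() on a UDP socket only selects a route and sends no packet.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define RECHECK_SECS 30

/* Cached connectivity state. <last_check> is 0 until the first probe. */
struct conn_check_sketch {
    time_t last_check;
    int reachable;
};

/* Returns 1 if the system accepts to route a UDP socket towards an IPv6
 * destination, otherwise 0. No packet is emitted by connect() on UDP.
 */
static int probe_ipv6_udp_connect(void)
{
    struct sockaddr_in6 dst;
    int fd, ok;

    memset(&dst, 0, sizeof(dst));
    dst.sin6_family = AF_INET6;
    dst.sin6_port = htons(53);
    if (inet_pton(AF_INET6, "2001:db8::1", &dst.sin6_addr) != 1)
        return 0;

    fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0)
        return 0;
    ok = connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0;
    close(fd);
    return ok;
}

/* Returns the cached status, refreshing it with <probe> when it was never
 * checked or when the last check is at least RECHECK_SECS old.
 */
static int ipv6_reachable_sketch(struct conn_check_sketch *st, time_t now,
                                 int (*probe)(void))
{
    if (st->last_check == 0 || now - st->last_check >= RECHECK_SECS) {
        st->reachable = probe();
        st->last_check = now;
    }
    return st->reachable;
}
```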
Willy Tarreau
1404f6fb7b DEBUG: pools: add a new integrity mode "backup" to copy the released area
This way we can preserve the entire contents of the released area for
later inspection. This automatically enables comparison at reallocation
time as well (like "integrity" does). If used in combination with
integrity, the comparison is disabled but the check of non-corruption
of the area mangled by integrity is still operated.
2025-05-09 14:57:00 +02:00
William Lallemand
e7574cd5f0 MINOR: acme: add the global option 'acme.scheduler'
The automatic scheduler is useful but sometimes you don't want to use it,
or you want to schedule manually.

This patch adds an 'acme.scheduler' option in the global section, which
can be set to either 'auto' or 'off'. (auto is the default value)

This also changes the output of the 'acme status' command so it does not
show scheduled values. The state will be 'Stopped' instead of
'Scheduled'.
2025-05-09 14:00:39 +02:00
Willy Tarreau
0ae14beb2a DEBUG: pool: permit per-pool UAF configuration
The new MEM_F_UAF flag can be set just after a pool's creation to make
this pool UAF for debugging purposes. This allows to maintain a better
overall performance required to reproduce issues while still having a
chance to catch UAF. It will only be used by developers who will manually
add it to areas worth being inspected, though.
2025-05-09 13:59:02 +02:00
Amaury Denoyelle
14e4f2b811 BUG/MEDIUM: mux-quic: fix crash on invalid fctl frame dereference
Emission of flow-control frames has been recently modified. Now, each
frame is sent one by one, via a single entry list. If a failure occurs,
emission is interrupted and frame is reinserted into the original
<qcc.lfctl.frms> list.

This code is incorrect as it only checks if qcc_send_frames() returns an
error code to perform the reinsert operation. However, an error here
does not always mean that the frame was not properly emitted by lower
quic-conn layer. As such, an extra test LIST_ISEMPTY() must be performed
prior to reinserting the frame.

This bug would cause a heap overflow. Indeed, the reinserted frame would
be a random value. A crash would occur as soon as it would be
dereferenced via <qcc.lfctl.frms> list.

This was reproduced by issuing a POST with a big file and interrupting it
after just a few seconds. This results in a crash in about a third of
the tests. Here is an example command using ngtcp2 :

 $ ngtcp2-client -q --no-quic-dump --no-http-dump \
   -m POST -d ~/infra/html/1g 127.0.0.1 20443 "http://127.0.0.1:20443/post"

Heap overflow was detected via a BUG_ON() statement from qc_frm_free()
via qcc_release() caller :

  FATAL: bug condition "!((&((*frm)->reflist))->n == (&((*frm)->reflist)))" matched at src/quic_frame.c:1270

This does not need to be backported.
2025-05-09 11:07:11 +02:00
Willy Tarreau
3f9194bfc9 [RELEASE] Released version 3.2-dev15
Released version 3.2-dev15 with the following main changes :
    - BUG/MEDIUM: stktable: fix sc_*(<ctr>) BUG_ON() regression with ctx > 9
    - BUG/MINOR: acme/cli: don't output error on success
    - BUG/MINOR: tools: do not create an empty arg from trailing spaces
    - MEDIUM: config: warn about the consequences of empty arguments on a config line
    - MINOR: tools: make parse_line() provide hints about empty args
    - MINOR: cfgparse: visually show the input line on empty args
    - BUG/MINOR: tools: always terminate empty lines
    - BUG/MINOR: tools: make parseline report the required space for the trailing 0
    - DEBUG: threads: don't keep lock label "OTHER" in the per-thread history
    - DEBUG: threads: merge successive idempotent lock operations in history
    - DEBUG: threads: display held locks in threads dumps
    - BUG/MINOR: proxy: only use proxy_inc_fe_cum_sess_ver_ctr() with frontends
    - Revert "BUG/MEDIUM: mux-spop: Handle CLOSING state and wait for AGENT DISCONNECT frame"
    - MINOR: acme/cli: 'acme status' show the status acme-configured certificates
    - MEDIUM: acme/ssl: remove 'acme ps' in favor of 'acme status'
    - DOC: configuration: add "acme" section to the keywords list
    - DOC: configuration: add the "crt-store" keyword
    - BUG/MAJOR: queue: lock around the call to pendconn_process_next_strm()
    - MINOR: ssl: add filename and linenum for ssl-f-use errors
    - BUG/MINOR: ssl: can't use crt-store some certificates in ssl-f-use
    - BUG/MINOR: tools: only fill first empty arg when not out of range
    - MINOR: debug: bump the dump buffer to 8kB
    - MINOR: stick-tables: add "ipv4" as an alias for the "ip" type
    - MINOR: quic: extend return value during TP parsing
    - BUG/MINOR: quic: use proper error code on missing CID in TPs
    - BUG/MINOR: quic: use proper error code on invalid server TP
    - BUG/MINOR: quic: reject retry_source_cid TP on server side
    - BUG/MINOR: quic: use proper error code on invalid received TP value
    - BUG/MINOR: quic: fix TP reject on invalid max-ack-delay
    - BUG/MINOR: quic: reject invalid max_udp_payload size
    - BUG/MEDIUM: peers: hold the refcnt until updating ts->seen
    - BUG/MEDIUM: stick-tables: close a tiny race in __stksess_kill()
    - BUG/MINOR: cli: fix too many args detection for commands
    - MINOR: server: ensure server postparse tasks are run for dynamic servers
    - BUG/MEDIUM: stick-table: always remove update before adding a new one
    - BUG/MEDIUM: quic: free stream_desc on all data acked
    - BUG/MINOR: cfgparse: consider the special case of empty arg caused by \x00
    - DOC: config: recommend disabling libc-based resolution with resolvers
2025-05-09 10:51:30 +02:00
Willy Tarreau
4e20fab7ac DOC: config: recommend disabling libc-based resolution with resolvers
Using both libc and haproxy resolvers can lead to hard to diagnose issues
when their behaviour diverges; recommend using only one type of resolver.

Should be backported to stable versions.

Link: https://www.mail-archive.com/haproxy@formilux.org/msg45663.html
Co-authored-by: Lukas Tribus <lukas@ltri.eu>
2025-05-09 10:31:39 +02:00
Willy Tarreau
ebb19fb367 BUG/MINOR: cfgparse: consider the special case of empty arg caused by \x00
The reporting of the empty arg location added with commit 08d3caf30
("MINOR: cfgparse: visually show the input line on empty args") falls
victim to a special case detected by OSS Fuzz:

     https://issues.oss-fuzz.com/issues/415850462

In short, making an argument start with "\x00" doesn't make it empty for
the parser, but still emits an empty string which is detected and
displayed. Unfortunately in this case the error pointer is not set so
the sanitization function crashes.

What we're doing in this case is that we fall back to the position of
the output argument as an estimate of where it was located in the input.
It's clearly inexact (quoting etc) but will still help the user locate
the problem.

No backport is needed unless the commit above is backported.
2025-05-09 10:01:44 +02:00
Amaury Denoyelle
3fdb039a99 BUG/MEDIUM: quic: free stream_desc on all data acked
The following patch simplifies qc_stream_desc_ack(). The qc_stream_desc
instance is not freed anymore, even if all data were acknowledged. As
implied by the commit message, the caller is responsible for performing
this cleaning operation.
  f4a83fbb14bdd14ed94752a2280a2f40c1b690d2
  MINOR: quic: do not remove qc_stream_desc automatically on ACK handling

However, despite the commit instruction, qc_stream_desc_free()
invocation was not moved to the caller. This commit fixes this by adding
it after stream ACK handling. This is performed only when a transfer is
completed : all data is acknowledged and qc_stream_desc has been
released by its MUX stream instance counterpart.

This bug may cause a significant increase in memory usage when dealing
with long-running connections. However, there is no memory leak, as every
qc_stream_desc attached to a connection is finally freed when the quic_conn
instance is released.

This must be backported up to 3.1.
2025-05-09 09:25:47 +02:00
Willy Tarreau
576e47fb9a BUG/MEDIUM: stick-table: always remove update before adding a new one
Since commit 388539faa ("MEDIUM: stick-tables: defer adding updates to a
tasklet"), between the entry creation and its arrival in the updates tree,
there is time for scheduling, and it now becomes possible for an stksess
entry to be requeued into the list while it's still in the tree as a remote
one. Only local updates were removed prior to being inserted. In this case
we would re-insert the entry, causing it to appear as the parent of two
distinct nodes or leaves, and to be visited from the first leaf during a
delete() after having already been removed and freed, causing a crash,
as Christian reported in issue #2959.

There's no reason to backport this as this appeared with the commit above
in 3.2-dev13.
2025-05-08 23:32:25 +02:00
Aurelien DARRAGON
f03e999912 MINOR: server: ensure server postparse tasks are run for dynamic servers
commit 29b76cae4 ("BUG/MEDIUM: server/log: "mode log" after server
keyword causes crash") introduced some postparsing checks/tasks for
servers.

Initially they were mainly meant for "mode log" servers postparsing, but
we already have a check dedicated to "tcp/http" servers (ie: only tcp
proto supported)

However when dynamic servers are added they bypass _srv_postparse() since
the REGISTER_POST_SERVER_CHECK() is only executed for servers defined in
the configuration.

To ensure consistency between dynamic and static servers, and ensure no
post-check init routine is missed, let's manually invoke _srv_postparse()
after creating a dynamic server added via the cli.
2025-05-08 02:03:50 +02:00
Aurelien DARRAGON
976e0bd32f BUG/MINOR: cli: fix too many args detection for commands
d3f928944 ("BUG/MINOR: cli: Issue an error when too many args are passed
for a command") added a new check to prevent the command from running when
too many arguments are provided. In this case an error is reported.

However it turns out this check (despite being marked for backports) was
ineffective prior to 20ec1de21 ("MAJOR: cli: Refacor parsing and
execution of pipelined commands") as 'p' pointer was reset to the end of
the buffer before the check was executed.

Now since 20ec1de21, the check works, but we have another issue: we may
read past initialized bytes in the buffer because 'p' pointer is always
incremented in a while loop without checking if we increment it past 'end'
(This was detected using valgrind)

To fix the issue introduced by 20ec1de21, let's only increment 'p' pointer
if p < end.
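
The bounded increment described above can be sketched as follows. This is a
minimal illustration and not the code from this patch: the helper name
skip_blanks_bounded() and its index-based signature are assumptions made for
the example.

```c
#include <stddef.h>

/* Hypothetical helper (not HAProxy code): starting at offset <pos> in <buf>,
 * skip spaces and tabs but never step past <len>, the number of initialized
 * bytes. The bound is tested before every read and every increment.
 */
static size_t skip_blanks_bounded(const char *buf, size_t pos, size_t len)
{
	while (pos < len && (buf[pos] == ' ' || buf[pos] == '\t'))
		pos++;
	return pos;
}
```

Testing the bound first means a buffer made only of blanks stops exactly at
<len> instead of reading whatever follows it.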

For 3.2 this is it, now for older versions, since d3f928944 was marked for
backport, a slightly different approach is needed:

 - conditional p increment must be done in the loop (as in this patch)
 - max arg check must be moved above "fill unused slots" comment where p is
   assigned to the end of the buffer

This patch should be backported with d3f928944.
2025-05-08 02:03:43 +02:00
Willy Tarreau
0cee7b5b8d BUG/MEDIUM: stick-tables: close a tiny race in __stksess_kill()
It might be possible not to see the element in the tree, then not to
see it in the update list, thus not to take the lock before deleting.
But an element in the list could have moved to the tree during the
check, and be removed later without the updt_lock.

Let's delete prior to checking the presence in the tree to avoid
this situation. No backport is needed since this arrived in -dev13
with the update list.
2025-05-07 18:49:21 +02:00
Willy Tarreau
006a3acbde BUG/MEDIUM: peers: hold the refcnt until updating ts->seen
In peer_treat_updatemsg(), we call stktable_touch_remote() after
releasing the write lock on the TS, asking it to decrement the
refcnt, then we update ts->seen. Unfortunately this is racy and
causes the issue that Christian reported in issue #2959.

The sequence of events is very hard to trigger manually, but what happens
is the following:

 T1.  stktable_touch_remote(table, ts, 1);
      -> at this point the entry is in the mt_list, and the refcnt is zero.

      T2.  stktable_trash_oldest() or process_table_expire()
           -> these can run, because the refcnt is now zero.
              The entry is cleanly deleted and freed.

 T1.  HA_ATOMIC_STORE(&ts->seen, 1)
      -> we dereference freed memory.

A first attempt at a fix was made by keeping the refcnt held during
all the time the entry is in the mt_list, but this is expensive as
such entries cannot be purged, causing lots of skips during
trash_oldest_data(). This managed to trigger watchdogs, and was only
hiding the real cause of the problem.

The correct approach clearly is to maintain the ref_cnt until we
touch ->seen. That's what this patch does. It does not decrement
the refcnt, while calling stktable_touch_remote(), and does it
manually after touching ->seen. With this the problem is gone.
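
The ordering this fix relies on can be sketched with C11 atomics. This is
only an illustration under stated assumptions: struct entry, touch_remote()
and treat_update() are simplified stand-ins invented for the example, not the
real stick-table or peers code.

```c
#include <stdatomic.h>

/* Simplified stand-in for a stick-table entry; only the two fields that
 * matter for the ordering are modelled here.
 */
struct entry {
	atomic_uint ref_cnt; /* the entry may be purged once this reaches zero */
	atomic_int seen;
};

/* Hypothetical equivalent of the "touch" step: the entry would be queued
 * for an update here, and the caller's reference is dropped only if
 * <decrefcnt> is set.
 */
static void touch_remote(struct entry *e, int decrefcnt)
{
	if (decrefcnt)
		atomic_fetch_sub(&e->ref_cnt, 1);
}

/* Fixed ordering: keep the reference across the write to ->seen, and
 * release it only once the entry is no longer dereferenced.
 */
static void treat_update(struct entry *e)
{
	touch_remote(e, 0);               /* do not release the reference yet */
	atomic_store(&e->seen, 1);        /* entry is still pinned: safe */
	atomic_fetch_sub(&e->ref_cnt, 1); /* last access done, release now */
}
```

With the racy ordering, the decrement would happen inside the touch step and
another thread could free the entry before the store to ->seen.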

Note that a reproducer involves the following:
  - a config with 10 stick-ctr tracking the same table with a
    random key between 10M and 100M depending on the machine.
  - the expiration should be between 10 and 20s. http_req_cnt
    is stored and shared with the peers.
  - 4 total processes with such a config on the local machine,
    each corresponding to a different peer. 3 of the peers are
    bound to half of the cores (all threads) and share the same
    threads; the last process is bound to the other half with
    its own threads.
  - injecting at full load, ~256 conn, on the shared listening
    port. After ~2x expiration time to 1 minute the lone process
    should segfault in pools code due to a corrupted by_lru list.

This problem already exists in earlier versions but the race looks
narrower. Given how difficult it is to trigger on a given machine
in its current form, it's likely that it only happens once in a
while on stable branches. The fix must be backported wherever the
code is similar, and there's no hope to reproduce it to validate
the backport.

Thanks again to Christian for his amazing help!
2025-05-07 18:49:21 +02:00
Amaury Denoyelle
4bc7aa548a BUG/MINOR: quic: reject invalid max_udp_payload size
Add a check on the received max_udp_payload transport parameter. As
defined per RFC 9000, values below 1200 are invalid, and thus the
connection must be closed with TRANSPORT_PARAMETER_ERROR code.

Prior to this patch, an invalid value was silently ignored.
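
As a rough sketch of the rule being enforced (the enum and function names
below are illustrative assumptions, not the identifiers used in the QUIC
code), the check boils down to a lower bound on the received value:

```c
#include <stdint.h>

/* Illustrative result codes only: not HAProxy's enum. */
enum tp_check {
	TP_CHECK_OK = 0,
	TP_CHECK_INVALID, /* caller closes with TRANSPORT_PARAMETER_ERROR */
};

/* RFC 9000: max_udp_payload_size values below 1200 are invalid. */
static enum tp_check check_max_udp_payload(uint64_t value)
{
	return value < 1200 ? TP_CHECK_INVALID : TP_CHECK_OK;
}
```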

This should be backported up to 2.6. Note that it relies on previous
patch "MINOR: quic: extend return value during TP parsing".
2025-05-07 15:21:30 +02:00
Amaury Denoyelle
ffabfb0fc3 BUG/MINOR: quic: fix TP reject on invalid max-ack-delay
Checks are implemented on some received transport parameter values,
to reject invalid ones defined per RFC 9000. This is the case for
max_ack_delay parameter.

The check was not properly implemented as it only rejected values strictly
greater than the limit set to 2^14. Fix this by rejecting values of 2^14
and above. Also, the proper error code TRANSPORT_PARAMETER_ERROR is now
set.
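
The off-by-one can be illustrated with a small sketch. The macro and function
names are assumptions made for this example and do not come from the patch;
only the 2^14 limit is taken from RFC 9000.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ACK_DELAY_LIMIT (UINT64_C(1) << 14) /* 2^14 ms, per RFC 9000 */

/* Hypothetical helper: true when the received max_ack_delay is acceptable.
 * Values of 2^14 or greater are invalid, so the limit itself must be
 * rejected as well (a "value > limit" rejection test would let it through).
 */
static bool max_ack_delay_is_valid(uint64_t value)
{
	return value < MAX_ACK_DELAY_LIMIT;
}
```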

This should be backported up to 2.6. Note that it relies on previous
patch "MINOR: quic: extend return value during TP parsing".
2025-05-07 15:21:30 +02:00
Amaury Denoyelle
b60a17aad7 BUG/MINOR: quic: use proper error code on invalid received TP value
As per RFC 9000, checks must be implemented to reject invalid values for
received transport parameters. Such values are dependent on the
parameter type.

Checks were already implemented for ack_delay_exponent and
active_connection_id_limit, in accordance with the QUIC specification.
However, the connection was closed with an incorrect error code. Fix
this to ensure that TRANSPORT_PARAMETER_ERROR code is used as expected.

This should be backported up to 2.6. Note that it relies on previous
patch "MINOR: quic: extend return value during TP parsing".
2025-05-07 15:21:30 +02:00
Amaury Denoyelle
10f1f1adce BUG/MINOR: quic: reject retry_source_cid TP on server side
Close the connection on error if retry_source_connection_id transport
parameter is received. This is specified by RFC 9000 as this parameter
must not be emitted by a client. Previously, it was silently ignored.

This should be backported up to 2.6. Note that it relies on previous
patch "MINOR: quic: extend return value during TP parsing".
2025-05-07 15:21:30 +02:00
Amaury Denoyelle
a54fdd3d92 BUG/MINOR: quic: use proper error code on invalid server TP
This commit is similar to the previous one. It fixes the error code
reported when dealing with invalid received transport parameters. This
time, it handles reception of original_destination_connection_id,
preferred_address and stateless_reset_token which must not be emitted by
the client.

This should be backported up to 2.6. Note that it relies on previous
patch "MINOR: quic: extend return value during TP parsing".
2025-05-07 15:20:06 +02:00
Amaury Denoyelle
df6bd4909e BUG/MINOR: quic: use proper error code on missing CID in TPs
Handle missing received transport parameter value
initial_source_connection_id / original_destination_connection_id.
Previously, such case would result in an error reported via
quic_transport_params_store(), which triggers a TLS alert converted as
expected as a CONNECTION_CLOSE. The issue is that the error code
reported in the frame was incorrect.

Fix this by returning QUIC_TP_DEC_ERR_INVAL for such conditions. This is
directly handled via quic_transport_params_store() which sets the proper
TRANSPORT_PARAMETER_ERROR code for the CONNECTION_CLOSE. However, no
error is reported so the SSL handshake is properly terminated without a
TLS alert. This is enough to ensure that the CONNECTION_CLOSE frame will
be emitted as expected.

This should be backported up to 2.6. Note that it relies on previous
patch "MINOR: quic: extend return value during TP parsing".
2025-05-07 15:20:06 +02:00
Amaury Denoyelle
294bf26c06 MINOR: quic: extend return value during TP parsing
Extend API used for QUIC transport parameter decoding. This is done via
the introduction of a dedicated enum to report the various error
conditions detected. No functional change should occur with this patch,
as the only returned code is QUIC_TP_DEC_ERR_TRUNC, which results in the
connection closure via a TLS alert.

This patch will be necessary to properly reject transport parameters
with the proper CONNECTION_CLOSE error code. As such, it should be
backported up to 2.6 with the following series.
2025-05-07 15:19:52 +02:00
Willy Tarreau
46b5dcad99 MINOR: stick-tables: add "ipv4" as an alias for the "ip" type
However the doc purposely says the opposite, to encourage migrating away
from "ip". The goal is that in the future we change "ip" to mean "ipv6",
which seems to be what most users naturally expect. But we cannot break
configurations in the LTS version so for now "ipv4" is the alias.

The reason for not changing it in the table is that the type name is
used at a few places (look for "].kw"):
  - dumps
  - promex

We'd rather not change that output for 3.2, but only do it in 3.3.
This way, 3.2 can be made future-proof by using "ipv4" in the config
without any other side effect.

Please see github issue #2962 for updates on this transition.
2025-05-07 10:11:55 +02:00
Willy Tarreau
697a531516 MINOR: debug: bump the dump buffer to 8kB
Now with the improved backtraces, the lock history and details in the
mux layers, some dumps appear truncated or with some chars alone at
the beginning of the line. The issue is in fact caused by the limited
dump buffer size (2kB for stderr, 4kB for warning), that cannot hold
a complete trace anymore.

Let's just bump them to 8kB, this will be plenty for a long time.
2025-05-07 10:02:58 +02:00
Willy Tarreau
10e6d0bd57 BUG/MINOR: tools: only fill first empty arg when not out of range
In commit 3f2c8af313 ("MINOR: tools: make parse_line() provide hints
about empty args") we've added the ability to record the position of
the first empty arg in parse_line(), but that check requires accessing
the args[] array for the current arg, which is not valid in
case we stopped on too large an argument count. Let's just check the
arg's validity before doing so.

This was reported by OSS Fuzz:
  https://issues.oss-fuzz.com/issues/415850462

No backport is needed since this was in the latest dev branch.
2025-05-07 07:25:29 +02:00
William Lallemand
fbceabbccf BUG/MINOR: ssl: can't use crt-store some certificates in ssl-f-use
When declaring a certificate via the crt-store section, this certificate
can then be used 2 ways in a crt-list:
- only by using its name, without any crt-store options
- or by using the exact set of crt-list option that was defined in the
  crt-store

Since ssl-f-use is generating a crt-list, this is supposed to behave the
same. To achieve this, ckch_conf_parse() will parse the keywords related
to the ckch_conf on the ssl-f-use line and use ckch_conf_cmp() to
compare it to the previous declaration from the crt-store. This
comparison is only done when any ckch_conf keyword is present.

However, ckch_conf_parse() was done for the crt-list, and the crt-list
does not use the "crt" parameter to declare the name of the certificate,
since it's the first element of the line. So when used with ssl-f-use,
ckch_conf_parse() will always see a "crt" keyword which is a ckch_conf
one, and consider that it will always need to have the exact same set of
parameters when using the same crt in a crt-store and an ssl-f-use line.

So a simple configuration like this:

   crt-store web
     load crt "foo.com.crt" key "foo.com.key" alias "foo"

   frontend mysite
     bind :443 ssl
     ssl-f-use crt "@web/foo" ssl-min-ver TLSv1.2

Would lead to an error like this:

    config : '@web/foo' in crt-list '(null)' line 0, is already defined with incompatible parameters:
    - different parameter 'key' : previously 'foo.com.key' vs '(null)'

In order to fix the issue, this patch parses the "crt" parameter itself
for ssl-f-use instead of using ckch_conf_parse(), so the keyword would
never be considered as a ckch_conf keyword to compare.

This patch also takes care of setting the CKCH_CONF_SET_CRTLIST flag only
if a ckch_conf keyword was found. This flag is used by ckch_conf_cmp()
to know if it has to compare or not.

No backport needed.
2025-05-06 21:36:29 +02:00
William Lallemand
b3b282d2ee MINOR: ssl: add filename and linenum for ssl-f-use errors
Fill cfg_crt_node with a filename and linenum so the post_section
callback can use it to emit errors.

This way the errors are emitted with the right filename and linenum
where ssl-f-use is used instead of (null):0
2025-05-06 21:36:29 +02:00
Willy Tarreau
99f5be5631 BUG/MAJOR: queue: lock around the call to pendconn_process_next_strm()
The extra call to pendconn_process_next_strm() made in commit cda7275ef5
("MEDIUM: queue: Handle the race condition between queue and dequeue
differently") was performed after releasing the server queue's lock,
which is incompatible with the calling convention for this function.
The result is random corruption of the server's streams list likely
due to picking old or incorrect pendconns from the queue, and in the
end infinitely looping on apparently already locked mt_list objects.
Just adding the lock fixes the problem.

It's very difficult to reproduce, it requires low maxconn values on
servers, stickiness on the servers (cookie), a long enough slowstart
(e.g. 10s), and regularly flipping servers up/down to re-trigger the
slowstart.

No backport is needed as this was only in 3.2.
2025-05-06 18:59:54 +02:00
William Lallemand
e035f0c48e DOC: configuration: add the "crt-store" keyword
Add the "crt-store" keyword with its argument in the "3.12" section, so
this could be detected by haproxy-dconv as a keyword and put in the
keywords list.

Must be backported as far as 3.0
2025-05-06 16:07:29 +02:00
William Lallemand
e516b14d36 DOC: configuration: add "acme" section to the keywords list
Add the "acme" keyword with its argument in the "3.13" section, so this
could be detected by haproxy-dconv as a keyword and put in the keywords
list.
2025-05-06 15:34:39 +02:00
William Lallemand
b7c4a68ecf MEDIUM: acme/ssl: remove 'acme ps' in favor of 'acme status'
Remove the 'acme ps' command which does not seem useful anymore with the
'acme status' command.

The big difference with the 'acme status' command is that it was only
displaying the running tasks instead of the status of all certificates.
2025-05-06 15:27:29 +02:00
William Lallemand
48f1ce77b7 MINOR: acme/cli: 'acme status' show the status acme-configured certificates
The "acme status" command shows the status of every certificate
configured with ACME, not only the running tasks like "acme ps".

The IO handler loops on the ckch_store tree and outputs a line for each
ckch_store which has an acme section set. This is still done under the
ckch_store lock and doesn't support resuming when the buffer is full,
but we need to change that in the future.
2025-05-06 15:27:29 +02:00
Christopher Faulet
a3ce7d7772 Revert "BUG/MEDIUM: mux-spop: Handle CLOSING state and wait for AGENT DISCONNECT frame"
This reverts commit 53c3046898633e56f74f7f05fb38cabeea1c87a1.

This patch introduced a regression leading to a loop on the frames
demultiplexing because a frame may be ignored but not consumed.

But outside this regression that can be fixed, there is a design issue that
was not totally fixed by the patch above. The SPOP connection state is mixed
with the status of the frames demultiplexer and this needlessly complicates
the connection management. Instead of fixing the fix, a better solution is
to revert it and work on a proper solution.

For the record, the idea is to deal with the spop connection state only
using 'state' field and to introduce a new field to handle the frames
demultiplexer state. This should ease the closing state management.

Another issue must be fixed. We must take care to not abort a SPOP
stream when an error is detected on a SPOP connection or when the connection
is closed, if the ACK frame was already received for this stream. It is not
a common case, but it can be solved by saving the last known stream ID that
received an ACK.

This patch must be backported if the commit above is backported.
2025-05-06 13:43:59 +02:00
Aurelien DARRAGON
b39825ee45 BUG/MINOR: proxy: only use proxy_inc_fe_cum_sess_ver_ctr() with frontends
proxy_inc_fe_cum_sess_ver_ctr() was implemented in 9969adbc
("MINOR: stats: add by HTTP version cumulated number of sessions and
requests")

As its name suggests, it is meant to be called for frontends, not backends

Also, in 9969adbc, when used under h1_init(), a precaution is taken to
ensure that the function is only called with frontends.

However, this precaution was not applied in h2_init() and qc_init().

Due to this, it remains possible to have proxy_inc_fe_cum_sess_ver_ctr()
being called with a backend proxy as parameter. While it did not cause
known issues so far, it is not expected and could result in bugs in the
future. Better fix this by ensuring the function is only called with
frontends.

It may be backported up to 2.8
2025-05-06 11:01:39 +02:00
Willy Tarreau
3bb6eea6d5 DEBUG: threads: display held locks in threads dumps
Based on the lock history, we can spot some locks that are still held
by checking the last operation that happened on them: if it's not an
unlock, then we know the lock is held. In this case we append the list
after "locked:" with their label and state like below:

  U:QUEUE S:IDLE_CONNS U:IDLE_CONNS R:TASK_WQ U:TASK_WQ S:QUEUE S:QUEUE S:QUEUE locked: QUEUE(S)
  S:IDLE_CONNS U:IDLE_CONNS S:TASK_RQ U:TASK_RQ S:QUEUE U:QUEUE S:IDLE_CONNS locked: IDLE_CONNS(S)
  R:TASK_WQ S:TASK_WQ R:TASK_WQ S:TASK_WQ R:TASK_WQ S:TASK_WQ R:TASK_WQ locked: TASK_WQ(R)
  W:STK_TABLE W:STK_TABLE_UPDT U:STK_TABLE_UPDT W:STK_TABLE W:STK_TABLE_UPDT U:STK_TABLE_UPDT W:STK_TABLE W:STK_TABLE_UPDT locked: STK_TABLE(W) STK_TABLE_UPDT(W)

The format is slightly different (label(status)) so as to easily
differentiate them visually from the history.
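
The detection rule described above (a lock is held if the last operation on
its label is not an unlock) can be sketched like this. It is a simplified
model written for illustration: struct lock_op and held_locks() are
hypothetical names, and the real history is stored differently.

```c
#include <stdio.h>
#include <string.h>

/* One history entry: 'R', 'S' or 'W' when taking the lock, 'U' on unlock. */
struct lock_op {
	char op;
	const char *label;
};

/* Hypothetical helper: writes "LABEL(op)" into <out> for each label whose
 * most recent operation in <hist> is not an unlock, separated by spaces and
 * ordered by first appearance. Returns the number of locks found held.
 */
static int held_locks(const struct lock_op *hist, int n, char *out, size_t outsz)
{
	size_t len = 0;
	int held = 0;

	if (outsz)
		out[0] = '\0';

	for (int i = 0; i < n; i++) {
		int first = 1, last = i;

		/* handle each label once, at its first appearance */
		for (int j = 0; j < i; j++)
			if (strcmp(hist[j].label, hist[i].label) == 0)
				first = 0;
		if (!first)
			continue;

		/* locate the most recent operation on this label */
		for (int j = i + 1; j < n; j++)
			if (strcmp(hist[j].label, hist[i].label) == 0)
				last = j;

		if (hist[last].op == 'U')
			continue;

		if (len < outsz) {
			int ret = snprintf(out + len, outsz - len, "%s%s(%c)",
			                   held ? " " : "", hist[last].label,
			                   hist[last].op);
			if (ret > 0)
				len += (size_t)ret;
		}
		held++;
	}
	return held;
}
```

Fed with the first history line shown above, this model reports a single
held lock, QUEUE(S), which matches the "locked:" suffix of that line.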
2025-05-06 05:20:37 +02:00
Willy Tarreau
feaac66b5e DEBUG: threads: merge successive idempotent lock operations in history
In order to make the lock history a bit more useful, let's try to merge
adjacent lock/unlock sequences that don't change anything for other
threads. For this we can replace the last unlock with the new operation
on the same label, and even just not store it if it was the same as the
one before the unlock, since in the end it's the same as if the unlock
had not been done.

Now loops that used to be filled with "R:LISTENER U:LISTENER" show more
useful info such as:

  S:IDLE_CONNS U:IDLE_CONNS S:PEER U:PEER S:IDLE_CONNS U:IDLE_CONNS R:LISTENER U:LISTENER
  U:STK_TABLE W:STK_SESS U:STK_SESS R:STK_TABLE U:STK_TABLE W:STK_SESS U:STK_SESS R:STK_TABLE
  R:STK_TABLE U:STK_TABLE W:STK_SESS U:STK_SESS W:STK_TABLE_UPDT U:STK_TABLE_UPDT S:PEER

It's worth noting that it can sometimes induce confusion when recursive
locks of the same label are used (a few exist on peers or stick-tables),
as in such a case the two operations would be needed. However these ones
are already undebuggable, so instead they will just have to be renamed
to make sure they use a distinct label.
2025-05-05 18:36:12 +02:00
Willy Tarreau
743dce95d2 DEBUG: threads: don't keep lock label "OTHER" in the per-thread history
Most threads are filled with "R:OTHER U:OTHER" in their history. Since
anything non-important can use OTHER, it carries no useful information but
pollutes the history. Let's just drop OTHER entirely during the recording.
2025-05-05 18:10:57 +02:00
Willy Tarreau
1f51f1c816 BUG/MINOR: tools: make parseline report the required space for the trailing 0
The fix in commit 09a325a4de ("BUG/MINOR: tools: always terminate empty
lines") is insufficient. While it properly addresses the lack of trailing
zero, it doesn't account for it in the returned outlen that is used to
allocate a larger line. This happens at boot if the very first line of
the test file is exactly a sharp with nothing else. In this case it will
return a length 0 and the caller (parse_cfg()) will try to re-allocate an
entry of size zero and will fail, bailing out on a lack of memory. This time
it should really be OK.

It doesn't need to be backported, unless the patch above would be.
2025-05-05 17:58:04 +02:00
Willy Tarreau
09a325a4de BUG/MINOR: tools: always terminate empty lines
Since latest commit 7e4a2f39ef ("BUG/MINOR: tools: do not create an empty
arg from trailing spaces"), an empty line will no longer produce an arg
and no longer append a trailing zero to it. This was not visible because
one is already present in the input string, however all the trailing args
are set to out+outpos-1, which now points one char before the buffer since
nothing was emitted, and was noticed by ASAN, and/or when parsing garbage.
Let's make sure to always emit the zero for empty lines as well to address
this issue. No backport is needed unless the patch above gets backported.
2025-05-05 17:33:22 +02:00
Willy Tarreau
08d3caf30e MINOR: cfgparse: visually show the input line on empty args
Now when an empty arg is found on a line, we emit the sanitized
input line and the position of the first empty arg so as to help
the user figure the cause (likely an empty environment variable).

Co-authored-by: Valentine Krasnobaeva <vkrasnobaeva@haproxy.com>
2025-05-05 16:17:24 +02:00
Willy Tarreau
3f2c8af313 MINOR: tools: make parse_line() provide hints about empty args
In order to help parse_line() callers report the position of empty
args to the user, let's decide that if no error is emitted, then
we'll stuff the errptr with the position of the first empty arg
without affecting the return value.

Co-authored-by: Valentine Krasnobaeva <vkrasnobaeva@haproxy.com>
2025-05-05 16:17:24 +02:00
Willy Tarreau
9d14f2c764 MEDIUM: config: warn about the consequences of empty arguments on a config line
For historical reasons, the config parser relies on the trailing '\0'
to detect the end of the line being parsed. When the lines started to be
tokenized into arguments, this principle has been preserved, and now all
the parsers rely on *args[arg]='\0' to detect the end of a line. But as
reported in issue #2944, while most of the time it breaks the parsing
like below:

     http-request deny if { path_dir '' }

it can also cause some elements to be silently ignored like below:

     acl bad_path path_sub '%2E' '' '%2F'

This may also subtly happen with environment variables that don't exist
or which are empty:

     acl bad_path path_sub '%2E' "$BAD_PATTERN" '%2F'

Fortunately, parse_line() returns the number of arguments found, so it's
easy from the callers to verify if any was empty. The goal of this commit
is not to perform sensitive changes, it's only to mention when parsing a
line that an empty argument was found and alert about its consequences
using a warning. Most of the time when this happens, the config does not
parse. But for examples such as the ACLs above, there could be consequences
that are better detected early.
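
To see why an empty argument silently truncates a line, here is a minimal
sketch. Both functions are hypothetical illustrations, not the parser's
code; they assume, as the description above implies, that the argument array
ends with an empty string.

```c
#include <stddef.h>

/* What a keyword parser relying on *args[i] == '\0' effectively does: it
 * stops at the first empty string, whatever follows it.
 */
static int count_until_empty(const char *const *args)
{
	int i = 0;

	while (*args[i])
		i++;
	return i;
}

/* Hypothetical check: index of the first empty argument among the <nbargs>
 * arguments reported by the tokenizer, or -1 if none of them is empty.
 */
static int first_empty_arg(const char *const *args, int nbargs)
{
	for (int i = 0; i < nbargs; i++)
		if (!*args[i])
			return i;
	return -1;
}
```

For the "acl bad_path path_sub '%2E' '' '%2F'" line, the tokenizer reports 6
arguments while a parser walking the array stops after 4, so the mismatch at
index 4 is what is worth warning about.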

This patch depends on this previous fix:
   BUG/MINOR: tools: do not create an empty arg from trailing spaces

Co-authored-by: Valentine Krasnobaeva <vkrasnobaeva@haproxy.com>
2025-05-05 16:17:24 +02:00
Willy Tarreau
7e4a2f39ef BUG/MINOR: tools: do not create an empty arg from trailing spaces
Trailing spaces on the lines of the config file create an empty arg
which makes it complicated to detect really empty args. Let's first
address this. Note that it is not user-visible but prevents us from
fixing user-visible issues. No backport is needed.

The initial issue was introduced with this fix that already tried to
address it:

    8a6767d266 ("BUG/MINOR: config: don't count trailing spaces as empty arg (v2)")

The current patch properly addresses leading and trailing spaces by
only counting arguments if non-lws chars were found on the line. LWS
do not cause a transition to a new arg anymore but they complete the
current one. The whole new code relies on a state machine to detect
when to create an arg (!in_arg->in_arg), and when to close the current
arg. Special care was taken for word expansion in the form of
"${ARGS[*]}" which still continues to emit individual arguments past
the first LWS. This example works fine:

    ARGS="100 check inter 1000"
    server name 192.168.1."${ARGS[*]}"

It properly results in 6 args:

    "server", "name", "192.168.1.100", "check", "inter", "1000"

This fix should not have any visible user impact and is a bit tricky,
so it's best not to backport it, at least for a while.
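
The in_arg state machine can be sketched in Python as follows. It is a
simplified illustration only: tokenize() is a hypothetical name, and it
handles quotes and whitespace but not environment variable expansion or
escapes, unlike the real parse_line().

```python
def tokenize(line):
    """Split a line into arguments. Whitespace alone never creates an
    argument, but an explicitly quoted empty string does."""
    args = []
    current = []
    in_arg = False   # True once the current argument has been opened
    quote = None     # active quote character, if any
    for ch in line:
        if quote:
            if ch == quote:
                quote = None          # closing quote, stay in the argument
            else:
                current.append(ch)
        elif ch in "'\"":
            quote = ch
            in_arg = True             # a quote opens an argument, even if empty
        elif ch in " \t":
            if in_arg:                # whitespace closes the current argument
                args.append("".join(current))
                current = []
                in_arg = False
        else:
            current.append(ch)
            in_arg = True
    if in_arg:                        # close the last argument at end of line
        args.append("".join(current))
    return args
```

With this approach, leading and trailing spaces produce no argument, while a
quoted '' still yields a genuinely empty one that callers can detect.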

Co-authored-by: Valentine Krasnobaeva <vkrasnobaeva@haproxy.com>
2025-05-05 16:16:54 +02:00
William Lallemand
af5bbce664 BUG/MINOR: acme/cli: don't output error on success
Previous patch 7251c13c7 ("MINOR: acme: move the acme task init in a dedicated
function") mistakenly returned the wrong error code when "acme renew" parsing
was successful, and tried to emit an error message.

This patch fixes the issue by returning 0 when the acme task was correctly
scheduled to start.

No backport needed.
2025-05-02 21:21:09 +02:00
Aurelien DARRAGON
0e6f968ee3 BUG/MEDIUM: stktable: fix sc_*(<ctr>) BUG_ON() regression with ctx > 9
As reported in GH #2958, commit 6c9b315 caused a regression with sc_*
fetches and tracked counter id > 9.

As such, the below configuration would cause a BUG_ON() to be triggered:

  global
    log stdout format raw local0
    tune.stick-counters 11

  defaults
    log global
    mode http

  frontend www
    bind *:8080

    acl track_me bool(true)
    http-request set-var(txn.track_var) str("a")
    http-request track-sc10 var(txn.track_var) table rate_table if track_me
    http-request set-var(txn.track_var_rate) sc_gpc_rate(0,10,rate_table)
    http-request return status 200

  backend rate_table
      stick-table type string size 1k expire 5m store gpc_rate(1,1m)

While in 6c9b315 the src_fetch logic was removed from
smp_fetch_sc_stkctr(), num > 9 is indeed not expected anymore as
original num value. But what we didn't consider is that num is effectively
re-assigned for generic sc_* variant.

Thus the BUG_ON() is misplaced as it should only be evaluated for
non-generic fetches. It explains why it triggers with valid configurations.

Thanks to GH user @tkjaer for his detailed report and bug analysis.

No backport needed, this bug is specific to 3.2.
2025-05-02 16:57:45 +02:00
Willy Tarreau
758e0818c3 [RELEASE] Released version 3.2-dev14
Released version 3.2-dev14 with the following main changes :
    - MINOR: acme: retry label always do a request
    - MINOR: acme: does not leave task for next request
    - BUG/MINOR: acme: reinit the retries only at next request
    - MINOR: acme: change the default max retries to 5
    - MINOR: acme: allow a delay after a valid response
    - MINOR: acme: wait 5s before checking the challenges results
    - MINOR: acme: emit a log when starting
    - MINOR: acme: delay of 5s after the finalize
    - BUG/MEDIUM: quic: Let it be known if the tasklet has been released.
    - BUG/MAJOR: tasks: fix task accounting when killed
    - CLEANUP: tasks: use the local state, not t->state, to check for tasklets
    - DOC: acme: external account binding is not supported
    - MINOR: hlua: ignore "tune.lua.bool-sample-conversion" if set after "lua-load"
    - MEDIUM: peers: Give up if we fail to take locks in hot path
    - MEDIUM: stick-tables: defer adding updates to a tasklet
    - MEDIUM: stick-tables: Limit the number of old entries we remove
    - MEDIUM: stick-tables: Limit the number of entries we expire
    - MINOR: cfgparse-global: add explicit error messages in cfg_parse_global_env_opts
    - MINOR: ssl: add function to extract X509 notBefore date in time_t
    - BUILD: acme: need HAVE_ASN1_TIME_TO_TM
    - MINOR: acme: move the acme task init in a dedicated function
    - MEDIUM: acme: add a basic scheduler
    - MINOR: acme: emit a log when the scheduler can't start the task
2025-05-02 16:23:28 +02:00
William Lallemand
7ad501e6a1 MINOR: acme: emit a log when the scheduler can't start the task
Emit an error log when the renewal scheduler can't start the task.
2025-05-02 16:12:41 +02:00
William Lallemand
7fe59ebb88 MEDIUM: acme: add a basic scheduler
This patch implements a very basic scheduler for the ACME tasks.

The scheduler is a task which is started from the postparser function
when at least one acme section was configured.

The scheduler will loop over the certificates in the ckchs_tree, and for
each certificate will start an ACME task if the notAfter date is past
curtime + (notAfter - notBefore) / 12, or 7 days if notBefore is not
available.

Once the lookup over all certificates is terminated, the task will sleep
and will wakeup after 12 hours.
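
As a rough sketch of the renewal test, the Python below assumes the rule means
"renew once the remaining lifetime drops below 1/12 of the validity period";
that reading, the renewal_due() name and the use of epoch seconds are all
assumptions, not the actual implementation.

```python
SEVEN_DAYS = 7 * 24 * 3600  # fallback margin when notBefore is unknown


def renewal_due(now, not_after, not_before=None):
    """Return True when the certificate should be renewed.

    All values are epoch seconds. The margin is 1/12 of the validity
    period, or 7 days if not_before is not available."""
    if not_before is not None:
        margin = (not_after - not_before) // 12
    else:
        margin = SEVEN_DAYS
    return not_after <= now + margin


DAY = 24 * 3600
# A 90-day certificate gets a margin of 7.5 days under this reading.
print(renewal_due(now=82 * DAY, not_after=90 * DAY, not_before=0))
print(renewal_due(now=83 * DAY, not_after=90 * DAY, not_before=0))
```

Under this reading a 90-day certificate becomes eligible 7.5 days before
expiry, and the 12-hour wakeup is frequent enough not to miss that window.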
2025-05-02 16:01:32 +02:00
William Lallemand
7251c13c77 MINOR: acme: move the acme task init in a dedicated function
acme_start_task() is a dedicated function which starts an acme task
for a specified <store> certificate.

The initialization code was moved from the "acme renew" command parser to
this function, in order to be called from a scheduler.
2025-05-02 16:01:32 +02:00
William Lallemand
878a3507df BUILD: acme: need HAVE_ASN1_TIME_TO_TM
Restrict the build of the ACME feature to libraries which provide
ASN1_TIME_to_tm() function.
2025-05-02 16:01:32 +02:00
William Lallemand
626de9538e MINOR: ssl: add function to extract X509 notBefore date in time_t
Add x509_get_notbefore_time_t() which returns the notBefore date in
time_t format.
2025-05-02 16:01:32 +02:00
Valentine Krasnobaeva
8a4b3216f9 MINOR: cfgparse-global: add explicit error messages in cfg_parse_global_env_opts
When the env variable name or value is not provided for setenv/presetenv, it's
not clear from the old error message shown on stderr what exactly is missing.
The user needs to search in its configuration.

Let's add more explicit error messages about these inconsistencies.

No need to be backported.
2025-05-02 15:37:45 +02:00
Olivier Houchard
994cc58576 MEDIUM: stick-tables: Limit the number of entries we expire
In process_table_expire(), limit the number of entries we remove in one
call, and just reschedule the task if there's more to do. Removing
entries requires taking the heavily contended update write lock, and we
don't want to hold it for too long.
This helps stick tables perform better under heavy load.
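
The bounded-work pattern can be sketched in Python as below. This is an
illustration under stated assumptions: expire_batch(), the dict of expiry
dates and the budget parameter are hypothetical, and in the real code the
removal happens under the update write lock.

```python
def expire_batch(entries, now, budget):
    """Remove at most `budget` expired entries from `entries`
    (a dict mapping key -> expiry date).

    Returns True if expired entries remain, meaning the caller
    should reschedule itself instead of looping here."""
    removed = 0
    for key in list(entries):
        if entries[key] > now:
            continue                 # still valid, keep it
        if removed >= budget:
            return True              # budget exhausted, more work is pending
        del entries[key]
        removed += 1
    return False


table = {"a": 0, "b": 0, "c": 0, "live": 100}
while expire_batch(table, now=10, budget=2):
    pass  # a real task would yield and be rescheduled here
print(sorted(table))
```

Each call does a bounded amount of work, so the lock hold time no longer
depends on how many entries expired at once.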
2025-05-02 15:27:55 +02:00
Olivier Houchard
d2d4c3eb65 MEDIUM: stick-tables: Limit the number of old entries we remove
Limit the number of old entries we remove in one call of
stktable_trash_oldest(), as we do so while holding the heavily contended
update write lock, so we'd rather not hold it for too long.
This helps stick tables perform better under heavy load.
2025-05-02 15:27:55 +02:00
Olivier Houchard
388539faa3 MEDIUM: stick-tables: defer adding updates to a tasklet
There is a lot of contention trying to add updates to the tree. So
instead of trying to add the updates to the tree right away, just add
them to a mt-list (with one mt-list per thread group, so that the
mt-list does not become the new point of contention that much), and
create a tasklet dedicated to adding updates to the tree, in batches, to
avoid keeping the update lock for too long.
This helps stick tables perform better under heavy load.
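
A single-threaded Python sketch of the idea follows. The DeferredUpdates
class, its fields and the batch size are hypothetical, a plain list stands in
for the update tree, and the real code relies on mt-lists and a tasklet rather
than deques.

```python
from collections import deque


class DeferredUpdates:
    """Queue updates per thread group, then move them to the shared
    structure in bounded batches."""

    def __init__(self, groups, batch_size):
        self.pending = [deque() for _ in range(groups)]  # one queue per group
        self.tree = []               # stands in for the shared update tree
        self.batch_size = batch_size

    def add(self, group, update):
        # Cheap path: only touches the group's own queue.
        self.pending[group].append(update)

    def flush(self):
        """Move up to batch_size updates to the tree.
        Returns True if updates remain, so the caller can run again."""
        moved = 0
        for queue in self.pending:
            while queue and moved < self.batch_size:
                self.tree.append(queue.popleft())
                moved += 1
        return any(self.pending)
```

The producer side stays cheap, and the contended structure is only touched
from one place, for a bounded number of updates at a time.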
2025-05-02 15:27:55 +02:00
Olivier Houchard
b3ad7b6371 MEDIUM: peers: Give up if we fail to take locks in hot path
In peer_send_msgs(), give up in order to retry later if we failed at
getting the update read lock.
Similarly, in __process_running_peer_sync(), give up and just reschedule
the task if we failed to get the peer lock. There is heavy contention
on both those locks, so we could spend a lot of time trying to get them.
This helps peers perform better under heavy load.
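
The "give up and retry later" approach can be sketched with a non-blocking
lock acquisition in Python. try_process() is a hypothetical helper; the real
code uses HAProxy's own trylock primitives and task rescheduling.

```python
import threading


def try_process(lock, work):
    """Run work() only if the lock is free right now.

    Returns False without waiting when the lock is held, so the
    caller can reschedule itself instead of spinning."""
    if not lock.acquire(blocking=False):
        return False
    try:
        work()
    finally:
        lock.release()
    return True


lock = threading.Lock()
done = []
print(try_process(lock, lambda: done.append("sent")))  # lock is free here
```

The trade-off is some added latency for the deferred work in exchange for not
burning CPU time waiting on a contended lock in the hot path.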
2025-05-02 15:27:55 +02:00
Aurelien DARRAGON
7a8d1a3122 MINOR: hlua: ignore "tune.lua.bool-sample-conversion" if set after "lua-load"
tune.lua.bool-sample-conversion must be set before any lua-load or
lua-load-per-thread is used for it to be considered. Indeed, lua-load
directives are parsed on the fly and will cause some parts of the scripts
to be executed during init already (script body/init contexts).

As such, we cannot afford to have "tune.lua.bool-sample-conversion" set
after some Lua code was loaded, because it would mean that the setting
would be handled differently for Lua's code executed during or after
config parsing.

To avoid ambiguities, the documentation now states that the setting must
be set before any lua-load(-per-thread) directive, and if the setting
is met after some Lua was already loaded, the directive is ignored and
a warning informs about that.

It should fix GH #2957

It may be backported with 29b6d8af16 ("MINOR: hlua: rename
"tune.lua.preserve-smp-bool" to "tune.lua.bool-sample-conversion"")
2025-05-02 14:38:37 +02:00
William Lallemand
6051a6e485 DOC: acme: external account binding is not supported
Add a note on external account binding in the ACME section.
2025-05-02 12:04:07 +02:00
Willy Tarreau
1ed238101a CLEANUP: tasks: use the local state, not t->state, to check for tasklets
There's no point reading t->state to check for a tasklet after we've
atomically read the state into the local "state" variable. Not only is it
more expensive, it's also less clear whether that state is supposed to
be atomic or not. And in any case, tasks and tasklets have their type
forever and the one reflected in state is correct and stable.
2025-05-02 11:09:28 +02:00
Willy Tarreau
45e83e8e81 BUG/MAJOR: tasks: fix task accounting when killed
After recent commit b81c9390f ("MEDIUM: tasks: Mutualize the TASK_KILLED
code between tasks and tasklets"), the task accounting was no longer
correct for killed tasks due to the decrement of tasks in list that was
no longer done, resulting in infinite loops in process_runnable_tasks().
This just illustrates that this code remains complex and should be further
cleaned up. No backport is needed, as this was in 3.2.
2025-05-02 11:09:28 +02:00
Olivier Houchard
faa18c1ad8 BUG/MEDIUM: quic: Let it be known if the tasklet has been released.
quic_conn_release() may, or may not, free the tasklet associated with
the connection. So make it return 1 if it was, and 0 otherwise, so that
if it was called from the tasklet handler itself, the said handler can
act accordingly and return NULL if the tasklet was destroyed.
This should be backported if 9240cd4a2771245fae4d0d69ef025104b14bfc23
is backported.
2025-05-02 11:09:28 +02:00
William Lallemand
f63ceeded0 MINOR: acme: delay of 5s after the finalize
Give the server 5 seconds by default after the finalize to generate
the certificate. Some servers would not send a Retry-After during
processing.
2025-05-02 10:34:48 +02:00
William Lallemand
2db4848fc8 MINOR: acme: emit a log when starting
Emit an administrative log when starting the ACME client for a
certificate.
2025-05-02 10:23:42 +02:00
William Lallemand
fbd740ef3e MINOR: acme: wait 5s before checking the challenges results
Wait 5 seconds before trying to check if the challenges are ready, so that
the server has time to execute the challenges.
2025-05-02 10:18:24 +02:00
William Lallemand
f7cae0e55b MINOR: acme: allow a delay after a valid response
Use the retryafter value to set a delay before doing the next request
when the previous response was valid.
2025-05-02 10:16:12 +02:00
William Lallemand
18d2371e0d MINOR: acme: change the default max retries to 5
Change the default max retries constant to 5 instead of 3.
Some servers can take a while to execute the challenge.
2025-05-02 09:40:12 +02:00
William Lallemand
24fbd1f724 BUG/MINOR: acme: reinit the retries only at next request
The retries were reinitialized incorrectly: the counter must be
reinitialized only when we didn't retry. As a result, any valid response
would reinitialize the retry count.
2025-05-02 09:34:45 +02:00
William Lallemand
6626011720 MINOR: acme: does not leave task for next request
The next request was always leaving the task before initializing the
httpclient. This patch optimizes it by jumping to the next step at the
end of the current one. This way, only the httpclient is doing a
task_wakeup() to handle the response. But transitioning from response to
the next request does not leave the task.
2025-05-02 09:31:39 +02:00
William Lallemand
51f9415d5e MINOR: acme: retry label always do a request
Doing a retry always results in initializing a request again, so set
ACME_HTTP_REQ directly in the label instead of doing it for each step.
2025-05-02 09:15:07 +02:00
Willy Tarreau
c589964bcc [RELEASE] Released version 3.2-dev13
Released version 3.2-dev13 with the following main changes :
    - MEDIUM: checks: Make sure we return the tasklet from srv_chk_io_cb
    - MEDIUM: listener: Make sure w ereturn the tasklet from accept_queue_process
    - MEDIUM: mux_fcgi: Make sure we return the tasklet from fcgi_deferred_shut
    - MEDIUM: quic: Make sure we return the tasklet from qcc_io_cb
    - MEDIUM: quic: Make sure we return NULL in quic_conn_app_io_cb if needed
    - MEDIUM: quic: Make sure we return the tasklet from quic_accept_run
    - BUG/MAJOR: tasklets: Make sure he tasklet can't run twice
    - BUG/MAJOR: listeners: transfer connection accounting when switching listeners
    - MINOR: ssl/cli: add a '-t' option to 'show ssl sni'
    - DOC: config: fix ACME paragraph rendering issue
    - DOC: config: clarify log-forward "host" option
    - MINOR: promex: expose ST_I_PX_RATE (current_session_rate)
    - BUILD: acme: use my_strndup() instead of strndup()
    - BUILD: leastconn: fix build warning when building without threads on old machines
    - MINOR: threads: prepare DEBUG_THREAD to receive more values
    - MINOR: threads: turn the full lock debugging to DEBUG_THREAD=2
    - MEDIUM: threads: keep history of taken locks with DEBUG_THREAD > 0
    - MINOR: threads/cli: display the lock history on "show threads"
    - MEDIUM: thread: set DEBUG_THREAD to 1 by default
    - BUG/MINOR: ssl/acme: free EVP_PKEY upon error
    - MINOR: acme: separate the code generating private keys
    - MINOR: acme: failure when no directory is specified
    - MEDIUM: acme: generate the account file when not found
    - MEDIUM: acme: use 'crt-base' to load the account key
    - MINOR: compiler: add more macros to detect macro definitions
    - MINOR: cli: split APPCTX_CLI_ST1_PROMPT into two distinct flags
    - MEDIUM: cli: make the prompt mode configurable between n/i/p
    - MEDIUM: mcli: make the prompt mode configurable between i/p
    - MEDIUM: mcli: replicate the current mode when enterin the worker process
    - DOC: configuration: acme account key are auto generated
    - CLEANUP: acme: remove old TODO for account key
    - DOC: configuration: add quic4 to the ssl-f-use example
    - BUG/MINOR: acme: does not try to unlock after a failed trylock
    - BUG/MINOR: mux-h2: fix the offset of the pattern for the ping frame
    - MINOR: tcp: add support for setting TCP_NOTSENT_LOWAT on both sides
    - BUG/MINOR: acme: creating an account should not end the task
    - MINOR: quic: rename min/max fields for congestion window algo
    - MINOR: quic: refactor BBR API
    - BUG/MINOR: quic: ensure cwnd limits are always enforced
    - MINOR: thread: define cshared type
    - MINOR: quic: account for global congestion window
    - MEDIUM: quic: limit global Tx memory
    - MEDIUM: acme: use a map to store tokens and thumbprints
    - BUG/MINOR: acme: remove references to virt@acme
    - MINOR: applet: add appctx_schedule() macro
    - BUG/MINOR: dns: add tempo between 2 connection attempts for dns servers
    - CLEANUP: dns: remove unused dns_stream_server struct member
    - BUG/MINOR: dns: prevent ds accumulation within dss
    - CLEANUP: proxy: mention that px->conn_retries isn't relevant in some cases
    - DOC: ring: refer to newer RFC5424
    - MINOR: tools: make my_strndup() take a size_t len instead of and int
    - MINOR: Add "sigalg" to "sigalg name" helper function
    - MINOR: ssl: Add traces to ssl init/close functions
    - MINOR: ssl: Add traces to recv/send functions
    - MINOR: ssl: Add traces to ssl_sock_io_cb function
    - MINOR: ssl: Add traces around SSL_do_handshake call
    - MINOR: ssl: Add traces to verify callback
    - MINOR: ssl: Add ocsp stapling callback traces
    - MINOR: ssl: Add traces to the switchctx callback
    - MINOR: ssl: Add traces about sigalg extension parsing in clientHello callback
    - MINOR: Add 'conn' param to ssl_sock_chose_sni_ctx
    - BUG/MEDIUM: mux-spop: Wait end of handshake to declare a spop connection ready
    - BUG/MEDIUM: mux-spop: Handle CLOSING state and wait for AGENT DISCONNECT frame
    - BUG/MINOR: mux-h1: Don't pretend connection was released for TCP>H1>H2 upgrade
    - BUG/MINOR: mux-h1: Fix trace message in h1_detroy() to not relay on connection
    - BUILD: ssl: Fix wolfssl build
    - BUG/MINOR: mux-spop: Use the right bitwise operator in spop_ctl()
    - MEDIUM: mux-quic: increase flow-control on each bufsize
    - MINOR: mux-quic: limit emitted MSD frames count per qcs
    - MINOR: add hlua_yield_asap() helper
    - MINOR: hlua_fcn: enforce yield after *_get_stats() methods
    - DOC: config: restore default values for resolvers hold directive
    - MINOR: ssl/cli: "acme ps" shows the acme tasks
    - MINOR: acme: acme_ctx_destroy() returns upon NULL
    - MINOR: acme: use acme_ctx_destroy() upon error
    - MEDIUM: tasks: Mutualize code between tasks and tasklets.
    - MEDIUM: tasks: More code factorization
    - MEDIUM: tasks: Remove TASK_IN_LIST and use TASK_QUEUED instead.
    - MINOR: tasks: Remove unused tasklet_remove_from_tasklet_list
    - MEDIUM: tasks: Mutualize the TASK_KILLED code between tasks and tasklets
    - BUG/MEDIUM: connections: Report connection closing in conn_create_mux()
    - BUILD/MEDIUM: quic: Make sure we build with recent changes
2025-04-30 18:25:28 +02:00
Olivier Houchard
81e4083efb BUILD/MEDIUM: quic: Make sure we build with recent changes
TASK_IN_LIST has been changed to TASK_QUEUED, but one was missed in
quic_conn.c, so fix that.
2025-04-30 18:00:56 +02:00
Olivier Houchard
b138eab302 BUG/MEDIUM: connections: Report connection closing in conn_create_mux()
Add an extra parameter to conn_create_mux(), "closed_connection".
If a pointer is provided, then let it know if the connection was closed.
Callers have no way to determine that otherwise, and we need to know
that, at least in ssl_sock_io_cb(), as if the connection was closed we
need to return NULL, as the tasklet was free'd, otherwise that can lead
to memory corruption and crashes.
This should be backported if 9240cd4a2771245fae4d0d69ef025104b14bfc23
is backported too.
2025-04-30 17:17:36 +02:00
Olivier Houchard
b81c9390f4 MEDIUM: tasks: Mutualize the TASK_KILLED code between tasks and tasklets
The code to handle a task/tasklet when it's been killed before it was
to run is mostly identical, so move it outside of task and tasklet
specific code, and inside the common code.

This commit is just cosmetic, and should have no impact.
2025-04-30 17:09:14 +02:00
Olivier Houchard
4abfade371 MINOR: tasks: Remove unused tasklet_remove_from_tasklet_list
Remove tasklet_remove_from_tasklet_list, as the function hasn't been
used for a long time, and there is little reason to keep it.
2025-04-30 17:09:06 +02:00
Olivier Houchard
2bab043c8c MEDIUM: tasks: Remove TASK_IN_LIST and use TASK_QUEUED instead.
TASK_QUEUED was used to mean "the task has been scheduled to run",
TASK_IN_LIST was used to mean "the tasklet has been scheduled to run",
remove TASK_IN_LIST and just use TASK_QUEUED for tasklets instead.

This commit is just cosmetic, and should not have any impact.
2025-04-30 17:08:57 +02:00
Olivier Houchard
35df7cbe34 MEDIUM: tasks: More code factorization
There is some code that should run no matter if the task was killed or
not, and was needlessly duplicated, so only use one instance.
This also fixes a small bug when a tasklet that got killed before it
could run would still count as a tasklet that ran, when it should not,
which just means that we'd run one less useful task before going back to
the poller.
This commit is mostly cosmetic, and should not have any impact.
2025-04-30 17:08:57 +02:00
Olivier Houchard
438c000e9f MEDIUM: tasks: Mutualize code between tasks and tasklets.
The code that checks if we're currently running, and waits if so, was
identical between tasks and tasklets, so move it in code common to tasks
and tasklets.
This commit is just cosmetic, and should not have any impact.
2025-04-30 17:08:57 +02:00
William Lallemand
6462f183ad MINOR: acme: use acme_ctx_destroy() upon error
Use acme_ctx_destroy() instead of a simple free() upon error in the
"acme renew" error handling.

It's better to use this function to be sure that everything has been
freed.
2025-04-30 17:18:46 +02:00
William Lallemand
b8a5270334 MINOR: acme: acme_ctx_destroy() returns upon NULL
acme_ctx_destroy() returns when its argument is NULL.
2025-04-30 17:17:58 +02:00
William Lallemand
563ca94ab8 MINOR: ssl/cli: "acme ps" shows the acme tasks
Implement a way to display the running acme tasks over the CLI.

It currently only displays a "Running" status with the certificate name
and the acme section from the configuration.

The displayed running tasks are limited to the size of a buffer for now,
it will require a backref list later to be called multiple times to
resume the list.
2025-04-30 17:12:50 +02:00
Aurelien DARRAGON
4bceca83fc DOC: config: restore default values for resolvers hold directive
Default values for hold directive (resolver context) used to be documented
but this was lost when the keyword description was reworked in 24b319b
("Default value is 10s for "valid", 0s for "obsolete" and 30s for
others.")

Restoring the part that describes the default value.

It may be backported to all stable versions with 24b319b
2025-04-30 17:00:37 +02:00
Aurelien DARRAGON
7f418ac7d2 MINOR: hlua_fcn: enforce yield after *_get_stats() methods
{listener,proxy,server}_get_stats() methods are known to be expensive,
especially if used under an iteration. Indeed, while automatic yield
is performed every X lua instructions (defaults to 10k), computing an
object's stats 10K times in a single cpu loop is not desirable and
could create contention.

In this patch we leverage hlua_yield_asap() at the end of *_get_stats()
methods in order to force the automatic yield to occur ASAP after the
method returns. Hopefully this should help in similar scenarios as the
one described in GH #2903
2025-04-30 17:00:31 +02:00
Aurelien DARRAGON
97363015a5 MINOR: add hlua_yield_asap() helper
When called, this function will try to enforce a yield (if available) as
soon as possible. Indeed, automatic yield is already enforced every X
Lua instructions. However, there may be some cases where we know after
running heavy operation that we should yield already to avoid taking too
much CPU at once.

This is what this function offers, instead of asking the user to manually
yield using "core.yield()" from Lua itself after using an expensive
Lua method offered by haproxy, we can directly enforce the yield without
the need to do it in the Lua script.
2025-04-30 17:00:27 +02:00
Amaury Denoyelle
df50d3e39f MINOR: mux-quic: limit emitted MSD frames count per qcs
The previous commit has implemented a new calculation method for
MAX_STREAM_DATA frame emission. Now, a frame may be emitted as soon as a
buffer was consumed by a QCS instance.

This will probably increase the number of MAX_STREAM_DATA frame
emissions. It may even cause a series of frames to be emitted for the
same stream with increasing values under high load, which is completely
unnecessary.

To improve this, limit the number of MAX_STREAM_DATA frames built to one
per QCS instance. This is implemented by storing a reference to this
frame in QCS structure via a new member <tx.msd_frm>.

Note that to properly reset QCS msd_frm member, emission of flow-control
frames has been changed. Now, each frame is emitted individually. On
one side, it is better as it prevents emitting frames related to different
streams in a single datagram, which is not desirable in case of packet
loss. However, this can also increase sendto() syscall invocations.
2025-04-30 16:08:47 +02:00
Amaury Denoyelle
14a3fb679f MEDIUM: mux-quic: increase flow-control on each bufsize
Recently, QCS Rx allocation buffer method has been improved. It is now
possible to allocate multiple buffers per QCS instance, which was
necessary to improve HTTP/3 POST throughput.

However, a limitation remained related to the emission of
MAX_STREAM_DATA. These frames are only emitted once at least half of the
receive capacity has been consumed by its QCS instance. This may be too
restrictive when a client needs to upload a large payload.

Improve this by adjusting MAX_STREAM_DATA allocation. If QCS capacity is
still limited to 1 or 2 buffers max, the old calculation is still used.
This is necessary when user has limited upload throughput via their
configuration. If QCS capacity is more than 2 buffers, a new frame is
emitted if at least one buffer was consumed.

This patch has reduced the number of STREAM_DATA_BLOCKED frames received in
POST tests with some specific clients.
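
The emission rule described above can be sketched in Python as follows.
should_emit_msd() and its parameters are hypothetical names, and the sketch
only covers the decision itself, not the frame construction.

```python
def should_emit_msd(capacity_bufs, buf_size, consumed):
    """Decide whether a MAX_STREAM_DATA frame should be emitted.

    capacity_bufs: number of Rx buffers the stream may use
    buf_size:      size of one buffer in bytes
    consumed:      bytes consumed since the last emitted frame"""
    if capacity_bufs <= 2:
        # Old rule: wait until half of the receive capacity is consumed.
        return consumed * 2 >= capacity_bufs * buf_size
    # New rule: one fully consumed buffer is enough.
    return consumed >= buf_size


BUF = 16384
print(should_emit_msd(2, BUF, BUF))   # small capacity, half consumed
print(should_emit_msd(8, BUF, BUF))   # large capacity, one buffer consumed
```

With 8 buffers, the old rule would have waited for 4 buffers to be consumed
before granting more credit, which explains the blocked clients.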
2025-04-30 16:08:47 +02:00
Christopher Faulet
2ccfebcebf BUG/MINOR: mux-spop: Use the right bitwise operator in spop_ctl()
Because of a typo, '||' was used instead of '|' to test the SPOP connection
flags and decide if the mux is ready or not. The regression was introduced
in the commit fd7ebf117 ("BUG/MEDIUM: mux-spop: Wait end of handshake to
declare a spop connection ready").
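
To show why this class of typo matters, here is a small Python sketch that
emulates C's logical OR on flag constants. The flag names and values are
hypothetical and this is not the actual spop_ctl() code.

```python
FL_ERROR = 0x2  # hypothetical flag values, for illustration only
FL_SHUT = 0x4


def c_logical_or(a, b):
    """Emulate C's `a || b`, which yields 1 or 0 rather than a bit mask."""
    return int(bool(a) or bool(b))


flags = FL_SHUT                               # connection was shut down
wrong_mask = c_logical_or(FL_ERROR, FL_SHUT)  # collapses to 1, i.e. bit 0
right_mask = FL_ERROR | FL_SHUT               # the intended mask

print(bool(flags & wrong_mask))  # the shutdown goes unnoticed
print(bool(flags & right_mask))  # the shutdown is detected
```

Since the logical form collapses the mask to 1, the test ends up checking an
unrelated bit and the connection state is misreported.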

This patch must be backported to 3.1 with the commit above.
2025-04-30 16:01:36 +02:00
Remi Tricot-Le Breton
f191a830d8 BUILD: ssl: Fix wolfssl build
The newly added SSL traces require an extra 'conn' parameter to
ssl_sock_chose_sni_ctx which was added in the "regular" code but not in
the wolfssl specific one.
Wolfssl also has a different prototype for some getter functions
(SSL_get_servername for instance), which do not expect a const SSL while
openssl version does.
2025-04-30 15:50:10 +02:00
Christopher Faulet
7dc4e94830 BUG/MINOR: mux-h1: Fix trace message in h1_detroy() to not relay on connection
h1_destroy() may be called to release a H1C after a multiplexer upgrade. In
that case, the connection is no longer attached to the H1C. It must not be
used in the h1 trace message because the connection context is no longer a H1C.

Because of this bug, when a H1>H2 upgrade is performed, a crash may be
experienced if the H1 traces are enabled.

This patch must be backported to all stable versions.
2025-04-30 14:44:42 +02:00
Christopher Faulet
2dc334be61 BUG/MINOR: mux-h1: Don't pretend connection was released for TCP>H1>H2 upgrade
When an applicative upgrade of the H1 multiplexer is performed, we must not
pretend the connection was released.  Indeed, in that case, a H1 stream is
still there with a stream connector attached to it. It must be detached
first before releasing the H1 connection and the underlying connection. So
it is important to not pretend the connection was already released.

Concretely, in that case h1_process() must return 0 instead of -1. It is
a minor error because, AFAIK, it is harmless. But it is not correct. So
let's fix it to avoid future bugs.

To be clear, this happens when a TCP connection is upgraded to H1 connection
and a H2 preface is detected, leading to a second upgrade from H1 to H2.

This patch may be backported to all stable versions.
2025-04-30 14:44:42 +02:00
Christopher Faulet
53c3046898 BUG/MEDIUM: mux-spop: Handle CLOSING state and wait for AGENT DISCONNECT frame
In the SPOE specification, when an error occurred on the SPOP connection,
HAProxy must send a DISCONNECT frame and wait for the agent DISCONNECT frame
in return before truly closing the connection.

However, this part was not properly handled by the SPOP multiplexer. In this
case, the SPOP connection should be in the CLOSING state. But this state was
not used at all. Depending on when the error was encountered, the connection
could be closed immediately, without sending any DISCONNECT frame. It was
the case when an early error was detected during the AGENT-HELLO frame
parsing. Or it could be moved from ERROR to FRAME_H state, as if no error
were detected. This case was less dramatic than it seemed because some flags
were also set to prevent any problem. But it was not obvious.

So now, the SPOP connection is properly switched to CLOSING state when a
DISCONNECT is sent to the agent to be able to wait for its DISCONNECT in
reply. spop_process_demux() was updated to parse frames in that state and
some validity checks were added.

This patch must be backported to 3.1.
2025-04-30 14:44:42 +02:00
Christopher Faulet
fd7ebf117b BUG/MEDIUM: mux-spop: Wait end of handshake to declare a spop connection ready
A SPOP connection must not be considered as ready while the hello handshake
is not finished with success. In addition, no error or shutdown must have
been reported for the underlying connection. Otherwise a freshly opened
SPOP connection may be reused while it is in fact dead, leading to a
connection retry.

This patch must be backported to 3.1.
2025-04-30 14:44:42 +02:00
Remi Tricot-Le Breton
047fb37b19 MINOR: Add 'conn' param to ssl_sock_chose_sni_ctx
This is only useful in the traces, the conn parameter won't be used
otherwise.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
6519cec2ed MINOR: ssl: Add traces about sigalg extension parsing in clientHello callback
We had to parse the sigAlg extension by hand in order to properly select
the certificate used by the SSL frontends. These traces allow dumping
the allowed sigAlg list sent by the client in its clientHello.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
105c1ca139 MINOR: ssl: Add traces to the switchctx callback
This callback picks the certificate used on an SSL frontend.
The certificate selection is made according to the information sent by
the client in the clientHello. The traces that were added will help to
better understand what certificate was chosen and why. It will also warn
us if the chosen certificate was the default one.
The actual certificate parsing happens in ssl_sock_chose_sni_ctx. It's
in this function that we actually get the filename of the certificate
used.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
dbdd0630e1 MINOR: ssl: Add ocsp stapling callback traces
If OCSP stapling fails because of a missing or invalid OCSP response we
used to silently disable stapling for the given session. We can now know
a bit more what happened regarding OCSP stapling.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
0fb05540b2 MINOR: ssl: Add traces to verify callback
Those traces show which errors were met during certificate
chain validation as well as which ones were ignored.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
4a8fa28e36 MINOR: ssl: Add traces around SSL_do_handshake call
Those traces dump information about the multiple SSL_do_handshake calls
(renegotiation and regular call). Some errors could also be dumped in
case of rejected early data.
Depending on the chosen verbosity, some information about the current
handshake can be dumped as well (servername, tls version, chosen cipher
for instance).
In case of failed handshake, the error codes and messages will also be
dumped in the log to ease debugging.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
9f146bdab3 MINOR: ssl: Add traces to ssl_sock_io_cb function
Add new SSL traces.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
475bb8d843 MINOR: ssl: Add traces to recv/send functions
Those traces will allow identifying sessions on which early data is used
as well as some forcefully closed connections.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
9bb8d6dcd1 MINOR: ssl: Add traces to ssl init/close functions
Add a dedicated trace for some unlikely allocation failures and async
errors. Those traces will mostly be used to identify the start and end of
a given SSL connection.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
08e40f4589 MINOR: Add "sigalg" to "sigalg name" helper function
This function can be used to convert a TLSv1.3 sigAlg entry (2bytes)
from the signature_algorithms client hello extension into a string.

In order to ease debugging, some TLSv1.2 combinations can also be
dumped. In TLSv1.2 those signature algorithms pairs were built out of a
one byte signature identifier combined to a one byte hash identifier.
In TLSv1.3 those identifiers are two-byte blocks that must be treated as
such.
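
As a rough illustration of the conversion described above, a lookup
over the two-byte code points might look like the sketch below. The
names sigalg_entry, sigalg_table and sketch_sigalg_to_name() are
hypothetical and are not the helper added by this patch; the four code
points are taken from the TLS SignatureScheme registry (RFC 8446).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: a few TLS SignatureScheme code points (RFC 8446). */
struct sigalg_entry {
	uint16_t id;
	const char *name;
};

static const struct sigalg_entry sigalg_table[] = {
	{ 0x0401, "rsa_pkcs1_sha256" },
	{ 0x0403, "ecdsa_secp256r1_sha256" },
	{ 0x0804, "rsa_pss_rsae_sha256" },
	{ 0x0807, "ed25519" },
};

/* Returns the name of a two-byte sigalg, or NULL when it is not in the
 * table. The identifier is compared as a whole 16-bit value, it is not
 * split into two independent bytes.
 */
const char *sketch_sigalg_to_name(uint16_t id)
{
	size_t i;

	for (i = 0; i < sizeof(sigalg_table) / sizeof(sigalg_table[0]); i++)
		if (sigalg_table[i].id == id)
			return sigalg_table[i].name;
	return NULL;
}
```

A caller would typically print the raw hexadecimal value when the
function returns NULL, so that unknown entries remain visible in traces.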
2025-04-30 11:11:26 +02:00
Willy Tarreau
566b384e4e MINOR: tools: make my_strndup() take a size_t len instead of and int
In relation to issue #2954, it appears that turning some size_t length
calculations to the int that uses my_strndup() upsets coverity a bit.
Instead of dealing with such warnings each time, better address it at
the root. An inspection of all call places show that the size passed
there is always positive so we can safely use an unsigned type, and
size_t will always suit it like for strndup() where it's available.
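
For readers unfamiliar with this kind of helper, a portable strndup()
replacement taking a size_t length can be sketched as follows. This is
only an illustration under the assumption that the helper behaves like
strndup(); sketch_strndup() is a hypothetical name and this is not the
my_strndup() code itself.

```c
#include <stdlib.h>
#include <string.h>

/* Copies at most <n> bytes of <src> into a newly allocated,
 * zero-terminated string. Returns NULL on allocation failure.
 * Using size_t for <n> avoids any signed/unsigned conversion at the
 * call sites that compute lengths with strlen() or pointer differences.
 */
char *sketch_strndup(const char *src, size_t n)
{
	size_t len = 0;
	char *ret;

	/* stop at <n> bytes or at the first NUL, whichever comes first */
	while (len < n && src[len])
		len++;

	ret = malloc(len + 1);
	if (!ret)
		return NULL;

	memcpy(ret, src, len);
	ret[len] = '\0';
	return ret;
}
```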
2025-04-30 05:17:43 +02:00
Lukas Tribus
5f9ce99c79 DOC: ring: refer to newer RFC5424
In the ring configuration example we refer to RFC3164 - the original BSD
syslog protocol without support for structured data (SDATA).

Let's refer to RFC5424 instead so SDATA is by default forwarded if
someone copy & pastes from the documentation:

https://discourse.haproxy.org/t/structured-data-lost-when-forwarding-logs-voa-syslog-forwarding-feature/11741/5

Should be backported to 2.6.
2025-04-29 21:39:01 +02:00
Aurelien DARRAGON
bd48e26a74 CLEANUP: proxy: mention that px->conn_retries isn't relevant in some cases
Since 91e785edc ("MINOR: stream: Rely on a per-stream max connection
retries value"), px->conn_retries may be ignored in the following cases:

 * proxy not part of a list which gets properly post-init (ie: main proxy
   list, log-forward list, sink list)
 * proxy lacking the CAP_FE capability

Documenting such cases where the px->conn_retries is set but effectively
ignored, so that we either remove ignored statements or fix them in
the future if they are really needed. In fact all cases affected here are
autonomous applets that already handle the retries themselves so the fact
that 91e785edc made ->conn_retries ineffective should not be a big deal
anyway.
2025-04-29 21:21:19 +02:00
Aurelien DARRAGON
5288b39011 BUG/MINOR: dns: prevent ds accumulation within dss
When the dns session callback (dns_session_release()) is called upon error
(ie: when some pending queries were not sent), we try our best to
re-create the applet in order to preserve the pending queries and give
them a chance to be retried. This is done at the end of
dns_session_release().

However, doing so exposes to an issue: if the error preventing queries
from being sent is still encountered over and over the dns session could
stay there indefinitely. Meanwhile, other dns sessions may be created on
the same dns_stream_server periodically. If previous failing dns sessions
don't terminate but we also keep creating new ones, we end up accumulating
failing sessions on a given dns_stream_server, which can eventually cause
resource shortage.

This issue was found when trying to address ("BUG/MINOR: dns: add tempo
between 2 connection attempts for dns servers")

To fix it, we track the number of failed consecutive sessions for a given
dns server. When we reach the threshold (set to 100), we consider that the
link to the dns server is broken (at least temporarily) and we force
dns_session_new() to fail, so that we stop creating new sessions until one
of the existing ones eventually succeeds.

A workaround for this fix consists in setting the "maxconn" parameter on
nameserver directive (under resolvers section) to a reasonable value so
that no more than "maxconn" sessions may co-exist on the same server at
a given time.

This may be backported to all stable versions.
("CLEANUP: dns: remove unused dns_stream_server struct member") may be
backported to ease the backport.
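
The consecutive-failure threshold described above can be pictured with
the minimal sketch below. The structure, field and function names are
hypothetical; only the idea of counting consecutive failed sessions and
the value 100 come from this message.

```c
#include <stdbool.h>

#define SKETCH_MAX_CONSECUTIVE_FAILURES 100 /* threshold quoted above */

/* Hypothetical per-server state; the field name is illustrative only. */
struct sketch_dns_server {
	unsigned int consecutive_failures;
};

/* Returns true when a new session may be created for this server. */
bool sketch_may_create_session(const struct sketch_dns_server *srv)
{
	return srv->consecutive_failures < SKETCH_MAX_CONSECUTIVE_FAILURES;
}

/* Records the outcome of a session. A success clears the streak, which
 * allows new sessions again; a failure extends it up to the threshold.
 */
void sketch_session_done(struct sketch_dns_server *srv, bool success)
{
	if (success)
		srv->consecutive_failures = 0;
	else if (srv->consecutive_failures < SKETCH_MAX_CONSECUTIVE_FAILURES)
		srv->consecutive_failures++;
}
```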
2025-04-29 21:20:54 +02:00
Aurelien DARRAGON
14ebe95a10 CLEANUP: dns: remove unused dns_stream_server struct member
dns_stream_server "max_slots" is unused, let's get rid of it
2025-04-29 21:20:44 +02:00
Aurelien DARRAGON
27236f2218 BUG/MINOR: dns: add tempo between 2 connection attempts for dns servers
As reported by Lukas Tribus on the mailing list [1], trying to connect to
a nameserver with invalid network settings causes haproxy to retry a new
connection attempt immediately which eventually causes unexpected CPU usage
on the thread responsible for the applet (namely 100% on one CPU will be
observed).

This can be reproduced with the test config below:

 resolvers default
  nameserver ns1 tcp4@8.8.8.8:53 source 192.168.99.99
 listen listen
  mode http
  bind :8080
  server s1 www.google.com resolvers default init-addr none

To fix the issue, we add a delay of one second before a new
connection attempt is retried. We do this in dns_session_create() when we
know that the applet was created in the release callback (when previous
query attempt was unsuccessful), which means initial connection is not
affected.

[1]: https://www.mail-archive.com/haproxy@formilux.org/msg45665.html

This should fix GH #2909 and may be backported to all stable versions.
This patch depends on ("MINOR: applet: add appctx_schedule() macro")
2025-04-29 21:20:11 +02:00
Aurelien DARRAGON
1ced5ef2fd MINOR: applet: add appctx_schedule() macro
Just like task_schedule() but for applets to wakeup an applet at a
specific time, leverages _task_schedule() internally
2025-04-29 21:19:37 +02:00
William Lallemand
c11ab983bf BUG/MINOR: acme: remove references to virt@acme
"virt@acme" was the default map used during development, now this must
be configured in the acme section or it won't try to use any map.

This patch removes the references to virt@acme in the comments and the
code.
2025-04-29 16:35:35 +02:00
William Lallemand
5555926fdd MEDIUM: acme: use a map to store tokens and thumbprints
The stateless mode which was documented previously in the ACME example
is not convenient for all use cases.

First, when HAProxy generates the account key itself, you wouldn't be
able to put the thumbprint in the configuration, so you will have to get
the thumbprint and then reload.
Second, if you are using multiple account keys, there are
multiple thumbprints, and it's not easy to know which one you want to use
when responding to the challenger.

This patch allows to configure a map in the acme section, which will be
filled by the acme task with the token corresponding to the challenge,
as the key, and the thumbprint as the value. This way it's easy to reply
with the right thumbprint.

Example:
    http-request return status 200 content-type text/plain lf-string "%[path,field(-1,/)].%[path,field(-1,/),map(virt@acme)]\n" if { path_beg '/.well-known/acme-challenge/' }
2025-04-29 16:15:55 +02:00
Amaury Denoyelle
0f9b3daf98 MEDIUM: quic: limit global Tx memory
Define a new setting tune.quic.frontend.max-tot-window. It takes a
size argument which can be used to set a limit on the sum of the
congestion windows of all QUIC connections. This is applied both on
quic_cc_path_set() and quic_cc_path_inc().

Note that this limitation cannot reduce a congestion window more than
the minimal limit which is set to 2 datagrams.
2025-04-29 15:19:32 +02:00
Amaury Denoyelle
e841164a44 MINOR: quic: account for global congestion window
Use the newly defined cshared type to account for the sum of congestion
window of every QUIC connection. This value is stored in global counter
quic_mem_global defined in proto_quic module.
2025-04-29 15:19:32 +02:00
Amaury Denoyelle
3891456d20 MINOR: thread: define cshared type
Define a new type "struct cshared". This can be used as a tool to
manipulate a global counter with thread-safety ensured. Each thread
would declare its thread-local cshared type, which would point to a
global counter.

Each thread can then add/subtract a value to its own thread-local
cshared instance via cshared_add(). If the difference exceeds a
configured limit, either positively or negatively, the global counter is
updated and thread-local instance is reset to 0. Each thread can safely
read the global counter value using cshared_read().
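
The mechanism can be sketched with C11 atomics as below. The structure
layout and the sketch_ prefixed names are assumptions made for the
illustration; the real struct cshared and its functions may differ.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: one instance per thread, all pointing to the
 * same global counter.
 */
struct sketch_cshared {
	_Atomic uint64_t *global; /* shared counter */
	int64_t diff;             /* pending local delta, single owner */
	int64_t lim;              /* flush threshold (positive) */
};

/* Accumulates <val> locally and only touches the shared counter when
 * the pending delta exceeds the limit in either direction.
 */
void sketch_cshared_add(struct sketch_cshared *c, int64_t val)
{
	c->diff += val;
	if (c->diff > c->lim || c->diff < -c->lim) {
		/* unsigned wrap-around makes negative deltas subtract */
		atomic_fetch_add(c->global, (uint64_t)c->diff);
		c->diff = 0;
	}
}

/* Reads the shared value; pending local deltas are not included. */
uint64_t sketch_cshared_read(const struct sketch_cshared *c)
{
	return atomic_load(c->global);
}
```

The trade-off in such a design is that the value read may lag behind
reality by up to the limit multiplied by the number of threads, in
exchange for far fewer writes to the shared cache line.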
2025-04-29 15:10:06 +02:00
Amaury Denoyelle
7bad88c35c BUG/MINOR: quic: ensure cwnd limits are always enforced
The congestion window is bounded by minimum and maximum values which can
never be exceeded. Min value is hardcoded to 2 datagrams as recommended
by the specification. Max value is specified via haproxy configuration.

These values must be respected each time the congestion window size is
adjusted. However, in some rare occasions, limits were not always
enforced. Fix this by implementing wrappers to set or increment the
congestion window. These functions ensure limits are always applied
after the operation.

Additionally, wrappers also ensure that if window reached a new maximum
value, it is saved in <cwnd_last_max> field.

This should be backported up to 2.6, after a brief period of
observation.
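
The wrapper approach can be illustrated by the sketch below. The field
names follow the ones quoted in this log (<limit_min>, <limit_max>,
<cwnd_last_max>), but the structure and the sketch_ prefixed functions
are hypothetical and are not the actual quic_cc_path_set() and
quic_cc_path_inc() code.

```c
#include <stdint.h>

/* Hypothetical path state holding only what the sketch needs. */
struct sketch_cc_path {
	uint64_t cwnd;          /* current congestion window */
	uint64_t limit_min;     /* enforced lower bound */
	uint64_t limit_max;     /* enforced upper bound */
	uint64_t cwnd_last_max; /* highest window reached so far */
};

/* Sets the window, clamps it to the limits, then records a new maximum. */
void sketch_cwnd_set(struct sketch_cc_path *p, uint64_t val)
{
	if (val < p->limit_min)
		val = p->limit_min;
	if (val > p->limit_max)
		val = p->limit_max;

	p->cwnd = val;
	if (p->cwnd > p->cwnd_last_max)
		p->cwnd_last_max = p->cwnd;
}

/* Increments the window through the same clamping path. */
void sketch_cwnd_inc(struct sketch_cc_path *p, uint64_t val)
{
	/* saturate instead of wrapping on overflow */
	uint64_t next = (val > UINT64_MAX - p->cwnd) ? UINT64_MAX : p->cwnd + val;

	sketch_cwnd_set(p, next);
}
```

Routing every modification through one clamping function is what makes
it hard for a future code path to bypass the limits.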
2025-04-29 15:10:06 +02:00
Amaury Denoyelle
c01d455288 MINOR: quic: refactor BBR API
Write minor adjustments to QUIC BBR functions. The objective is to
centralize every modification of path cwnd field.

No functional change. This patch will be useful to simplify
implementation of global QUIC Tx memory usage limitation.
2025-04-29 15:10:06 +02:00
Amaury Denoyelle
2eb1b0cd96 MINOR: quic: rename min/max fields for congestion window algo
There was some possible confusion between fields related to congestion
window size min and max limit which cannot be exceeded, and the maximum
value previously reached by the window.

Fix this by adopting a new naming scheme. Enforced limits are now renamed
<limit_max>/<limit_min>, while the previously reached max value is
renamed <cwnd_last_max>.

This should be backported up to 3.1.
2025-04-29 15:10:06 +02:00
William Lallemand
62dfe1fc87 BUG/MINOR: acme: creating an account should not end the task
The account creation was mistakenly ending the task instead of letting
it be woken up for the NewOrder state. This prevented the creation of
the certificate, although the account was correctly created.

To fix this, only the jump to the end label needs to be removed; the
standard leaving codepath of the function will allow the task to be
woken up.

No backport needed.
2025-04-29 14:18:05 +02:00
Willy Tarreau
2cdb3cb91e MINOR: tcp: add support for setting TCP_NOTSENT_LOWAT on both sides
TCP_NOTSENT_LOWAT is very convenient as it indicates when to report
EAGAIN on the sending side. It takes a margin on top of the estimated
window, meaning that it's no longer needed to store too many data in
socket buffers. Instead there's just enough to fill the send window
and a little bit of margin to cover the scheduling time to restart
sending. Experiments on a 100ms network have shown a 10-fold reduction
in the memory used by socket buffers by just setting this value to
tune.bufsize, without noticing any performance degradation. Theoretically
the responsiveness on multiplexed protocols such as H2 should also be
improved.
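
For reference, setting this socket option from C looks roughly like the
sketch below. It only shows the setsockopt() call on systems whose
headers define TCP_NOTSENT_LOWAT; the function name is hypothetical and
this is not the code added by this patch.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Asks the kernel to report the socket as writable only while less
 * than <bytes> of unsent data is queued. Returns 0 on success, -1 on
 * error or when the option is unknown at build time.
 */
int sketch_set_notsent_lowat(int fd, int bytes)
{
#ifdef TCP_NOTSENT_LOWAT
	return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
	                  &bytes, sizeof(bytes));
#else
	(void)fd;
	(void)bytes;
	return -1;
#endif
}
```

A caller following the experiment described above would pass the
configured buffer size as <bytes>.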
2025-04-29 12:13:42 +02:00
Willy Tarreau
989f609b1a BUG/MINOR: mux-h2: fix the offset of the pattern for the ping frame
The ping frame's pattern must be written at offset 9 (frame header
length), not 8. This was added in 3.2 with commit 4dcfe098a6 ("MINOR:
mux-h2: prepare to support PING emission"), so no backport is needed.
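
To see why offset 9 is the right one, the sketch below builds a PING
frame by hand: the HTTP/2 frame header is 9 bytes long (24-bit length,
type, flags, 32-bit stream ID), so the 8-byte pattern starts right
after it. The names are hypothetical and this is not the mux-h2 code.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_H2_FRAME_HDR_LEN 9    /* 3 (len) + 1 (type) + 1 (flags) + 4 (sid) */
#define SKETCH_H2_PING_LEN      8    /* PING payload is always 8 bytes */
#define SKETCH_H2_FT_PING       0x06 /* PING frame type */
#define SKETCH_H2_F_ACK         0x01 /* ACK flag */

/* Writes a PING frame into <out>, which must hold at least 17 bytes.
 * Returns the total frame size.
 */
size_t sketch_h2_build_ping(uint8_t *out, const uint8_t pattern[8], int ack)
{
	out[0] = 0;                       /* 24-bit payload length */
	out[1] = 0;
	out[2] = SKETCH_H2_PING_LEN;
	out[3] = SKETCH_H2_FT_PING;
	out[4] = ack ? SKETCH_H2_F_ACK : 0;
	memset(out + 5, 0, 4);            /* stream ID 0 */

	/* the pattern starts after the 9-byte header, not at offset 8,
	 * which would overwrite the last byte of the stream ID
	 */
	memcpy(out + SKETCH_H2_FRAME_HDR_LEN, pattern, SKETCH_H2_PING_LEN);
	return SKETCH_H2_FRAME_HDR_LEN + SKETCH_H2_PING_LEN;
}
```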
2025-04-29 12:13:41 +02:00
William Lallemand
2f7f65e159 BUG/MINOR: acme: does not try to unlock after a failed trylock
Return after a failed trylock in acme_update_certificate() instead of
jumping to the error label which does an unlock.
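
The pattern being fixed can be shown with a small stand-alone sketch.
It uses a C11 atomic_flag as a stand-in for the real lock, and all
names are hypothetical; the point is only that the failed trylock path
returns directly instead of going through a path that unlocks.

```c
#include <stdatomic.h>

/* Returns 1 when the lock was acquired, 0 when it was already held. */
int sketch_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set(lock);
}

void sketch_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}

/* Updates <value> under the lock. Returns 0 on success, -1 when the
 * lock could not be taken.
 */
int sketch_update(atomic_flag *lock, int *value, int new_value)
{
	if (!sketch_trylock(lock))
		return -1; /* lock not held here: must not unlock it */

	*value = new_value;
	sketch_unlock(lock);
	return 0;
}
```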
2025-04-29 11:29:52 +02:00
William Lallemand
1cd0b35896 DOC: configuration: add quic4 to the ssl-f-use example
The ssl-f-use keyword is very useful in the case of multiple SSL bind
lines. Add a quic4 bind line in the example to show that.
2025-04-29 10:50:39 +02:00
William Lallemand
582614e1b2 CLEANUP: acme: remove old TODO for account key
Remove old TODO comments about the account key.
2025-04-29 09:59:32 +02:00
William Lallemand
59d83688e8 DOC: configuration: acme account key are auto generated
Explain that account key are auto generated when they do not exist.
2025-04-29 09:32:33 +02:00
Willy Tarreau
dc06495b71 MEDIUM: mcli: replicate the current mode when enterin the worker process
While humans can find it convenient to enter the worker process in prompt
mode, for external tools it will not be convenient to have to systematically
disable it. A better approach is to replicate the master socket's mode
there, since it has already been configured to suit the user: interactive,
prompt and timed modes are automatically passed to the worker process.
This makes using the worker commands more natural from the master
process, without having to systematically adapt it for each new connection.
2025-04-28 20:21:06 +02:00
Willy Tarreau
c347cb73fa MEDIUM: mcli: make the prompt mode configurable between i/p
Support the same syntax in master mode as in worker mode in order to
configure the prompt. The only thing is that for now the master doesn't
have a non-interactive mode and it doesn't seem necessary to implement
it, so we only support the interactive and prompt modes. However the code
was written in a way that makes it easy to change this later if desired.
2025-04-28 20:21:06 +02:00
Willy Tarreau
e5c255c4e5 MEDIUM: cli: make the prompt mode configurable between n/i/p
Now the prompt mode can more finely be configured between non-interactive
(default), interactive without prompt, and interactive with prompt. This
will ease the usage from automated tools which are not necessarily
interested in having to consume '> ' after each command nor displaying
"+" on payload lines. This can also be convenient when coming from the
master CLI to keep the same output format.
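
The three modes can be seen as combinations of two independent bits,
one for the prompt and one for the interactive mode. The sketch below
illustrates that mapping; the flag values and all names are
hypothetical and do not reflect the actual APPCTX_CLI_ST1_* values.

```c
/* Hypothetical bit values, chosen for the illustration only. */
#define SKETCH_CLI_PROMPT 0x01u /* show the prompt at the start of the line */
#define SKETCH_CLI_INTER  0x02u /* keep the connection open between commands */

enum sketch_cli_mode {
	SKETCH_MODE_N, /* non-interactive */
	SKETCH_MODE_I, /* interactive without prompt */
	SKETCH_MODE_P, /* interactive with prompt */
};

/* Translates a mode into the flag combination that describes it. */
unsigned int sketch_cli_mode_flags(enum sketch_cli_mode mode)
{
	switch (mode) {
	case SKETCH_MODE_I:
		return SKETCH_CLI_INTER;
	case SKETCH_MODE_P:
		return SKETCH_CLI_INTER | SKETCH_CLI_PROMPT;
	case SKETCH_MODE_N:
	default:
		return 0;
	}
}
```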
2025-04-28 20:21:06 +02:00
Willy Tarreau
f25b4abc9b MINOR: cli: split APPCTX_CLI_ST1_PROMPT into two distinct flags
The CLI's "prompt" command toggles two distinct things:
  - displaying or hiding the prompt at the beginning of the line
  - single-command vs interactive mode

These are two independent concepts and the prompt mode doesn't
always cope well with tools that would like to upload data without
having to read the prompt on return. Also, the master command line
works in interactive mode by default with no prompt, which is not
consistent (and not convenient for tools). So let's start by splitting
the bit in two, and have a new APPCTX_CLI_ST1_INTER flag dedicated
to the interactive mode. For now the "prompt" command alone continues
to toggle the two at once.
2025-04-28 20:21:06 +02:00
Willy Tarreau
5ac280f2a7 MINOR: compiler: add more macros to detect macro definitions
We add __equals_0(NAME) which is only true if NAME is defined as zero,
and __def_as_empty(NAME) which is only true if NAME is defined as an
empty string.
2025-04-28 20:21:06 +02:00
William Lallemand
32b2b782e2 MEDIUM: acme: use 'crt-base' to load the account key
Prefix the filename with the 'crt-base' before loading the account key,
in order to work like every other keypair in haproxy.
2025-04-28 18:20:21 +02:00
William Lallemand
856b6042d3 MEDIUM: acme: generate the account file when not found
Generate the private key in the account file when the file does not
exist. This generates a private key of the type and parameters
configured in the acme section.
2025-04-28 18:20:21 +02:00
William Lallemand
b2dd6dd72b MINOR: acme: failure when no directory is specified
The "directory" parameter of the acme section is mandatory. This patch
exits with an alert when this parameter is not found.
2025-04-28 18:20:21 +02:00
William Lallemand
420de91d26 MINOR: acme: separate the code generating private keys
acme_EVP_PKEY_gen() generates private keys of specified <keytype>,
<curves> and <bits>. Only RSA and EC are supported for now.
2025-04-28 18:20:21 +02:00
William Lallemand
0897175d73 BUG/MINOR: ssl/acme: free EVP_PKEY upon error
Free the EVP_PKEY upon error when the X509_REQ generation fails.

No backport needed.
2025-04-28 18:20:21 +02:00
Willy Tarreau
12c7189bc8 MEDIUM: thread: set DEBUG_THREAD to 1 by default
Setting DEBUG_THREAD to 1 allows recording the lock history for each
thread. Tests have shown that (as predicted) the cost of updating a
single thread-local variable is not perceptible in the noise, especially
when compared to the cost of obtaining a lock. Since this can provide
useful value when debugging deadlocks, let's enable it by default when
threads are enabled.
2025-04-28 16:50:34 +02:00
Willy Tarreau
d9a659ed96 MINOR: threads/cli: display the lock history on "show threads"
This will display the lock labels and modes for each non-empty step
at the end of "show threads" when these are defined. This allows to
emit up to the last 8 locking operations for each thread on 64-bit
machines.
2025-04-28 16:50:34 +02:00
Willy Tarreau
b8a1c2380b MEDIUM: threads: keep history of taken locks with DEBUG_THREAD > 0
By only storing a word in each thread context, we can keep the history
of all taken/dropped locks by label. This is expected to be very cheap
and to permit to store up to 8 consecutive lock operations in 64 bits.
That should significantly help detect recursive locks as well as figure
what thread was likely to hinder another one waiting for a lock.

For now we only store the final state of the lock, we don't store the
attempt to get it. It's just a matter of space since we already need
4 ops (rd,sk,wr,un) which take 2 bits, leaving max 64 labels. We're
already around 45. We could also multiply by 5 and still keep 8 bits
total per lock, that would limit us to 51 locks max. It seems that
most of the time if we get a watchdog panic, anyway the victim thread
will be perfectly located so that we don't need a specific value for
this. Another benefit is that we perform a single memory write per
lock.
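
The packing can be sketched as follows, assuming 8 bits per operation
with the operation in the two low bits and the label in the six high
bits. This bit layout, the numeric values of the operations and all
names are assumptions made for the illustration; the message above
only states the sizes.

```c
#include <stdint.h>

/* Hypothetical encoding of the four operations on 2 bits. */
enum sketch_lock_op {
	SKETCH_OP_UN = 0, /* unlocked */
	SKETCH_OP_RD = 1, /* read lock */
	SKETCH_OP_SK = 2, /* seek lock */
	SKETCH_OP_WR = 3, /* write lock */
};

/* Pushes one operation into the history word. The oldest of the 8
 * entries falls off the top, and a single write stores the result.
 */
uint64_t sketch_hist_push(uint64_t hist, unsigned int label, enum sketch_lock_op op)
{
	return (hist << 8) | ((uint64_t)(label & 0x3f) << 2) | ((unsigned int)op & 0x3);
}

/* Returns the label of entry <idx>, where 0 is the most recent one. */
unsigned int sketch_hist_label(uint64_t hist, unsigned int idx)
{
	return (unsigned int)((hist >> (idx * 8 + 2)) & 0x3f);
}

/* Returns the operation of entry <idx>, where 0 is the most recent one. */
enum sketch_lock_op sketch_hist_op(uint64_t hist, unsigned int idx)
{
	return (enum sketch_lock_op)((hist >> (idx * 8)) & 0x3);
}
```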
2025-04-28 16:50:34 +02:00
Willy Tarreau
23371b3e7c MINOR: threads: turn the full lock debugging to DEBUG_THREAD=2
At level 1 it now does nothing. This is reserved for some subsequent
patches which will implement lighter debugging.
2025-04-28 16:50:34 +02:00
Willy Tarreau
903a6b14ef MINOR: threads: prepare DEBUG_THREAD to receive more values
We now default the value to zero and make sure all tests properly take
care of values above zero. This is in preparation for supporting several
degrees of debugging.
2025-04-28 16:50:34 +02:00
Willy Tarreau
aa49965d4e BUILD: leastconn: fix build warning when building without threads on old machines
Machines lacking CAS8B/DWCAS emit a warning in lb_fwlc.c without
threads due to declaration ordering. Let's just move the variable
declaration into the block that uses it as a last variable. No
backport is needed.
2025-04-28 16:50:34 +02:00
Willy Tarreau
589d916efa BUILD: acme: use my_strndup() instead of strndup()
Not all systems have strndup(), that's why we have our "my_strndup()",
so let's make use of it here. This fixes the build on Solaris 10. No
backport is needed.
2025-04-28 16:37:54 +02:00
Aurelien DARRAGON
dc95a3ed61 MINOR: promex: expose ST_I_PX_RATE (current_session_rate)
It has been requested to have the current_session_rate exposed at the
frontend level. For now only the per-process value was exposed
(ST_I_INF_SESS_RATE).

Thanks to the work done lately to merge promex and stat_cols_px[]
array, let's simply define an .alt_name for the ST_I_PX_RATE metric in
order to have promex exposing it as current_session_rate for relevant
contexts.
2025-04-28 12:23:20 +02:00
Aurelien DARRAGON
e921362810 DOC: config: clarify log-forward "host" option
log-forward "host" option may be confusing because we often mention the
host field when talking about syslog RFC3164 or RFC5424 messages, but
neither RFC actually defines a "host" field. In fact, everywhere we used
"host field" we actually meant "hostname field" as documented in RFC5424.
This was a language abuse on our side.

In this patch we replace "host" with "hostname" where relevant in the
documentation to prevent confusion.

Thanks to Nick Ramirez for having reported the issue.
2025-04-28 12:23:16 +02:00
Aurelien DARRAGON
385b3f923f DOC: config: fix ACME paragraph rendering issue
Nick Ramirez reported that the ACME paragraph (3.13) caused a rendering
issue where simple text was rendered as a directive. This was caused
by the use of unescaped <name> which confuses dconv.

Let's escape <name> by putting quotes around it to prevent the rendering
issue.

No backport needed.
2025-04-28 12:23:12 +02:00
William Lallemand
83975f34e4 MINOR: ssl/cli: add a '-t' option to 'show ssl sni'
Add a -t option to 'show ssl sni', allowing to add an offset to the
current date so it would allow to check which certificates are expired
after a certain period of time.
2025-04-28 11:35:11 +02:00
Willy Tarreau
f1064c7382 BUG/MAJOR: listeners: transfer connection accounting when switching listeners
Since we made it possible for a bind_conf to listen to multiple thread
groups with shards in 2.8 with commit 9d360604bd ("MEDIUM: listener:
rework thread assignment to consider all groups"), the per-listener
connection count was not properly transferred to the target listener
with the connection when switching to another thread group. This results
in one listener possibly reaching high values and another one possibly
reaching negative values. Usually it's not visible, unless a maxconn is
set on the bind_conf, in which case comparisons will quickly put an end
to the willingness to accept new connections.

This problem only happens when thread groups are enabled, and it seems
very hard to trigger it normally, it only impacts sockets having a single
shard, hence currently the CLI (or any conf with "bind ... shards 1"),
where it can be reproduced with a config having a very low "maxconn" on
the stats socket directive (here, 4), and issuing a few tens of
socat <<< "show activity" in parallel, or sending HTTP connections to a
single-shard listener. Very quickly, haproxy stops accepting connections
and eats CPU in the poller which tries to get its connections accepted.

A BUG_ON(l->nbconn<0) after HA_ATOMIC_DEC() in listener_release() also
helps spotting them better.

Many thanks to Christian Ruppert who once again provided a very accurate
report in GH #2951 with the required data permitting this analysis.

This fix must be backported to 2.8.
2025-04-25 18:47:11 +02:00
Olivier Houchard
9240cd4a27 BUG/MAJOR: tasklets: Make sure he tasklet can't run twice
Tasklets were originally designed to always run on only one thread, so it
was not possible to have it run on 2 threads concurrently.
The API has been extended so that another thread may wake the tasklet,
the idea was still that we wanted to have it run on one thread only.
However, the way it's been done meant that unless a tasklet was bound to
a specific tid with tasklet_set_tid(), or we explicitly used
tasklet_wakeup_on() to specify the thread for the target to run on, it
would be scheduled to run on the current thread.
This is in fact a desirable feature. There is however a race condition
in which the tasklet would be scheduled on a thread, while it is running
on another. This could lead to the same tasklet running on multiple
threads, which we do not want.
To fix this, just do what we already do for regular tasks, set the
"TASK_RUNNING" flag, and when it's time to execute the tasklet, wait
until that flag is gone.
Only one case has been found in the current code, where the tasklet
could run on different threads depending on who wakes it up, in the
leastconn load balancer, since commit
627280e15f03755b8f59f0191cd6d6bcad5afeb3.
It should not be a problem in practice, as the function called can be
called concurrently.
If a bug is eventually found in relation to this problem, and this patch
should be backported, the following patches should be backported too :
MEDIUM: quic: Make sure we return the tasklet from quic_accept_run
MEDIUM: quic: Make sure we return NULL in quic_conn_app_io_cb if needed
MEDIUM: quic: Make sure we return the tasklet from qcc_io_cb
MEDIUM: mux_fcgi: Make sure we return the tasklet from fcgi_deferred_shut
MEDIUM: listener: Make sure w ereturn the tasklet from accept_queue_process
MEDIUM: checks: Make sure we return the tasklet from srv_chk_io_cb
2025-04-25 16:14:26 +02:00
Olivier Houchard
09f5501bb9 MEDIUM: quic: Make sure we return the tasklet from quic_accept_run
In quic_accept_run, return the tasklet to tell the scheduler the tasklet
is still alive, it is not yet needed, but will be soon.
2025-04-25 16:14:26 +02:00
Olivier Houchard
5838786fa0 MEDIUM: quic: Make sure we return NULL in quic_conn_app_io_cb if needed
In quic_conn_app_io_cb, make sure we return NULL if the tasklet has been
destroyed, so that the scheduler knows. It is not yet needed, but will
be soon.
2025-04-25 16:14:26 +02:00
Olivier Houchard
15c5846db8 MEDIUM: quic: Make sure we return the tasklet from qcc_io_cb
In qcc_io_cb, return the tasklet to tell the scheduler the tasklet is
still alive, it is not yet needed, but will be soon.
2025-04-25 16:14:26 +02:00
Olivier Houchard
8f70f9c04b MEDIUM: mux_fcgi: Make sure we return the tasklet from fcgi_deferred_shut
In fcgi_deferred_shut, return the tasklet to tell the scheduler the
tasklet is still alive, it is not yet needed, but will be soon.
2025-04-25 16:14:26 +02:00
Olivier Houchard
7d190e7df6 MEDIUM: listener: Make sure w ereturn the tasklet from accept_queue_process
In accept_queue_process, return the tasklet to tell the scheduler the
tasklet is still alive, it is not yet needed, but will be soon.
2025-04-25 16:14:26 +02:00
Olivier Houchard
81dc3e67cf MEDIUM: checks: Make sure we return the tasklet from srv_chk_io_cb
In srv_chk_io_cb, return the tasklet to tell the scheduler the tasklet
is still alive, it is not yet needed, but will be soon.
2025-04-25 16:14:26 +02:00
Willy Tarreau
beb23069c6 [RELEASE] Released version 3.2-dev12
Released version 3.2-dev12 with the following main changes :
    - BUG/MINOR: quic: do not crash on CRYPTO ncbuf alloc failure
    - BUG/MINOR: proxy: always detach a proxy from the names tree on free()
    - CLEANUP: proxy: detach the name node in proxy_free_common() instead
    - CLEANUP: Slightly reorder some proxy option flags to free slots
    - MINOR: proxy: Add options to drop HTTP trailers during message forwarding
    - MINOR: h1-htx: Skip C-L and T-E headers for 1xx and 204 messages during parsing
    - MINOR: mux-h1: Keep custom "Content-Length: 0" header in 1xx and 204 messages
    - MINOR: hlua/h1: Use http_parse_cont_len_header() to parse content-length value
    - CLEANUP: h1: Remove now useless h1_parse_cont_len_header() function
    - BUG/MEDIUM: mux-spop: Respect the negociated max-frame-size value to send frames
    - MINOR: http-act: Add 'pause' action to temporarily suspend the message analysis
    - MINOR: acme/cli: add the 'acme renew' command to the help message
    - MINOR: httpclient: add an "https" log-format
    - MEDIUM: acme: use a customized proxy
    - MEDIUM: acme: rename "uri" into "directory"
    - MEDIUM: acme: rename "account" into "account-key"
    - MINOR: stick-table: use a separate lock label for updates
    - MINOR: h3: simplify h3_rcv_buf return path
    - BUG/MINOR: mux-quic: fix possible infinite loop during decoding
    - BUG/MINOR: mux-quic: do not decode if conn in error
    - BUG/MINOR: cli: Issue an error when too many args are passed for a command
    - MINOR: cli: Use a full prompt command for bidir connections with workers
    - MAJOR: cli: Refacor parsing and execution of pipelined commands
    - MINOR: cli: Rename some CLI applet states to reflect recent refactoring
    - CLEANUP: applet: Update st0/st1 comment in appctx structure
    - BUG/MINOR: hlua: Fix I/O handler of lua CLI commands to not rely on the SC
    - BUG/MINOR: ring: Fix I/O handler of "show event" command to not rely on the SC
    - MINOR: cli/applet: Move appctx fields only used by the CLI in a private context
    - MINOR: cache: Add a pointer on the cache in the cache applet context
    - MINOR: hlua: Use the applet name in error messages for lua services
    - MINOR: applet: Save the "use-service" rule in the stream to init a service applet
    - CLEANUP: applet: Remove unsued rule pointer in appctx structure
    - BUG/MINOR: master/cli: properly trim the '@@' process name in error messages
    - MEDIUM: resolvers: add global "dns-accept-family" directive
    - MINOR: resolvers: add command-line argument -4 to force IPv4-only DNS
    - MINOR: sock-inet: detect apparent IPv6 connectivity
    - MINOR: resolvers: add "dns-accept-family auto" to rely on detected IPv6
    - MEDIUM: acme: use Retry-After value for retries
    - MEDIUM: acme: reset the remaining retries
    - MEDIUM: acme: better error/retry management of the challenge checks
    - BUG/MEDIUM: cli: Handle applet shutdown when waiting for a command line
    - Revert "BUG/MINOR: master/cli: properly trim the '@@' process name in error messages"
    - BUG/MINOR: master/cli: only parse the '@@' prefix on complete lines
    - MINOR: resolvers: use the runtime IPv6 status instead of boot time one
2025-04-25 10:19:03 +02:00
Willy Tarreau
40aceb7414 MINOR: resolvers: use the runtime IPv6 status instead of boot time one
On systems where the network is not reachable at boot time (certain HA
systems for example, or dynamically addressed test machines), we'll want
to be able to periodically revalidate the IPv6 reachability status. The
current code makes it complicated because it sets the config bits once
for all at boot time. This commit changes this so that the config bits
are not changed, but instead we rely on a static inline function that
relies on sock_inet6_seems_reachable for every test (really cheap). This
also removes the now unneeded resolvers late init code.

This variable for now is still set at boot time but this will ease the
transition later, as the resolvers code is now ready for this.
2025-04-25 09:32:05 +02:00
Willy Tarreau
7a79f54c98 BUG/MINOR: master/cli: only parse the '@@' prefix on complete lines
The new adhoc parser for the '@@' prefix forgot to require the presence
of the LF character marking the end of the line. This is the reason why
entering incomplete commands would display garbage, because the line was
expected to have its LF character replaced with a zero.

The problem is well illustrated by using socat in raw mode:

   socat /tmp/master.sock STDIO,raw,echo=0

then entering "@@1 show info" one character at a time would error just
after the second "@". The command must take care to report an incomplete
line and wait for more data in such a case.
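
The required behaviour can be sketched as a three-way check: wait for
more data when no LF is present, and only then look for the prefix.
The function name and return convention below are hypothetical and
much simpler than the real parser.

```c
#include <string.h>

/* Examines <len> bytes of <buf>. Returns -1 when the line is still
 * incomplete (no LF yet, the caller must wait for more data), 1 when a
 * complete line starts with "@@", and 0 for any other complete line.
 */
int sketch_check_at_prefix(const char *buf, size_t len)
{
	if (!memchr(buf, '\n', len))
		return -1;

	return len >= 2 && buf[0] == '@' && buf[1] == '@';
}
```

With the socat test above, the first two characters alone would yield
-1 instead of being parsed as a truncated command.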
2025-04-25 09:05:00 +02:00
Willy Tarreau
931d932b3e Revert "BUG/MINOR: master/cli: properly trim the '@@' process name in error messages"
This reverts commit 0e94339eaf1c8423132debb6b1b485d8bb1bb7da.

This patch was in fact fixing the symptom, not the cause. The root cause
of the problem is that the parser was processing an incomplete line when
looking for '@@'. When the LF is present, this problem does not exist
as it's properly replaced with a zero. This can be verified using socat
in raw mode:

  socat /tmp/master.sock STDIO,raw,echo=0

Then entering "@@1 show info" one character at a time will immediately
fail on "@@" without going further. A subsequent patch will fix this.
No backport is needed.
2025-04-25 09:05:00 +02:00
Christopher Faulet
101cc4f334 BUG/MEDIUM: cli: Handle applet shutdown when waiting for a command line
When the CLI applet was refactored in the commit 20ec1de21 ("MAJOR: cli:
Refacor parsing and execution of pipelined commands"), a regression was
introduced. The applet shutdown was no longer handled when the applet was
waiting for the next command line. It is especially visible when a client
timeout occurs because the client connection is no longer closed.

To fix the issue, the test on the SE_FL_SHW flag was reintroduced in
CLI_ST_PARSE_CMDLINE state, but only is there is no pending input data.

It is a 3.2-specific issue. No backport needed.
2025-04-25 08:47:05 +02:00
William Lallemand
27b732a661 MEDIUM: acme: better error/retry management of the challenge checks
When the ACME task is checking for the status of the challenge, it would
only succeed or retry upon failure.

However that's not the best way to do it: ACME objects contain a
"status" field which can hold either a final status or an in-progress
status, so we need to be able to retry.

This patch adds an acme_ret enum which contains OK, RETRY and FAIL.

In the case of CHKCHALLENGE, the ACME server could return a "pending" or a
"processing" status, which basically needs to be rechecked later with
RETRY. However an "invalid" or "valid" status is final and will return
either a FAIL or an OK.

So instead of retrying in any case, the "invalid" status will end the
task with an error.
2025-04-24 20:14:47 +02:00
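The status handling described in commit 27b732a661 above can be sketched as follows. This is a Python illustration, not the C code of the patch: the enum member names come from the commit message, while the numeric values, the function name and the handling of unknown statuses are assumptions.

```python
from enum import Enum


class AcmeRet(Enum):
    # Member names follow the commit message; the values are arbitrary.
    OK = 0
    RETRY = 1
    FAIL = 2


def check_challenge_status(status: str) -> AcmeRet:
    """Map a challenge "status" field to OK, RETRY or FAIL (hypothetical helper)."""
    if status in ("pending", "processing"):
        # In-progress statuses must be rechecked later.
        return AcmeRet.RETRY
    if status == "valid":
        return AcmeRet.OK
    if status == "invalid":
        # Final failure: retrying would not help.
        return AcmeRet.FAIL
    # Assumption: any status not named in the commit message is a failure.
    return AcmeRet.FAIL
```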
William Lallemand
0909832e74 MEDIUM: acme: reset the remaining retries
When a request succeeds, reset the remaining retries to the default
ACME_RETRY value (3 by default).
2025-04-24 20:14:47 +02:00
William Lallemand
bb768b3e26 MEDIUM: acme: use Retry-After value for retries
Parse the Retry-After header in the response and store it in order to use
the value as the delay before the next retry; fall back to 3s if the
value couldn't be parsed or does not exist.
2025-04-24 20:14:47 +02:00
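A small sketch of the fallback logic described in commit bb768b3e26 above, in Python. The function name and the dict-based header representation are hypothetical, and the sketch only understands the delay-in-seconds form of Retry-After; anything else, including the HTTP-date form, falls back to the default here, which may differ from the real implementation.

```python
def retry_delay_seconds(headers: dict, default: int = 3) -> int:
    """Return the delay before the next retry, in seconds (hypothetical helper)."""
    value = None
    for name, val in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "retry-after":
            value = val
            break
    if value is None:
        return default
    try:
        delay = int(value.strip())
    except ValueError:
        # Unparsable value: use the 3s fallback.
        return default
    return delay if delay >= 0 else default
```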
Willy Tarreau
69b051d1dc MINOR: resolvers: add "dns-accept-family auto" to rely on detected IPv6
Instead of always having to force IPv4 or IPv6, let's now also offer
"auto" which will only enable IPv6 if the system has a default gateway
for it. This means that properly configured dual-stack systems will
default to "ipv4,ipv6" while those lacking a gateway will only use
"ipv4". Note that no real connectivity test is performed, so firewalled
systems may still get it wrong and might prefer to rely on a manual
"ipv4" assignment.
2025-04-24 17:52:28 +02:00
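As a rough illustration of the "auto" value from commit 69b051d1dc above, the sketch below resolves the setting into an explicit family list. It is Python, the function name is hypothetical, and the `ipv6_detected` flag stands in for the global variable set by the IPv6 detection code.

```python
def effective_accept_family(setting: str, ipv6_detected: bool) -> str:
    """Resolve "auto" into an explicit family list (hypothetical helper)."""
    if setting != "auto":
        # Explicit values such as "ipv4" are kept as they are.
        return setting
    # "auto" only enables IPv6 when the system appears to have IPv6 routing.
    return "ipv4,ipv6" if ipv6_detected else "ipv4"
```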
Willy Tarreau
5d41d476f3 MINOR: sock-inet: detect apparent IPv6 connectivity
In order to ease dual-stack deployments, we could at least try to
check if ipv6 seems to be reachable. For this we're adding a test
based on a UDP connect (no traffic) on port 53 to the base of
public addresses (2001::) and see if the connect() is permitted,
indicating that the routing table knows how to reach it, or fails.
Based on this result we're setting a global variable that other
subsystems might use to preset their defaults.
2025-04-24 17:52:28 +02:00
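The probe described in commit 5d41d476f3 above can be approximated in a few lines of Python. This is a sketch under assumptions, not the HAProxy code: the function name is hypothetical, and the result only reflects whether the local routing table accepts the destination, since connect() on a UDP socket sends no traffic.

```python
import socket


def ipv6_seems_reachable(addr: str = "2001::", port: int = 53) -> bool:
    """Return True if a UDP connect() to addr is permitted (hypothetical helper)."""
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    except OSError:
        # IPv6 sockets are not available at all on this system.
        return False
    try:
        # On UDP, connect() only selects a route and a source address.
        sock.connect((addr, port))
        return True
    except OSError:
        # Typically "network is unreachable" when no IPv6 route exists.
        return False
    finally:
        sock.close()
```

As the following commit notes for "auto", no real connectivity test is performed, so a firewalled host may still report True.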
Willy Tarreau
2c46c2c042 MINOR: resolvers: add command-line argument -4 to force IPv4-only DNS
In order to ease troubleshooting and testing, the new "-4" command line
argument enforces queries and processing of "A" DNS records only, i.e.
those representing IPv4 addresses. This can be useful when a host lacks
end-to-end dual-stack connectivity. This overrides the global
"dns-accept-family" directive and is equivalent to value "ipv4".
2025-04-24 17:52:28 +02:00
Willy Tarreau
940fa19ad8 MEDIUM: resolvers: add global "dns-accept-family" directive
By default, DNS resolvers accept both IPv4 and IPv6 addresses. This can be
influenced by the "resolve-prefer" keywords on server lines as well as the
family argument to the "do-resolve" action, but that is only a preference,
which does not block the other family from being used when it's alone. In
some environments where dual-stack is not usable, stumbling on an unreachable
IPv6-only DNS record can cause significant trouble as it will replace a
previous IPv4 one which would possibly have continued to work till next
request. The "dns-accept-family" global option permits to enforce usage of
only one (or both) address families. The argument is a comma-delimited list
of the following words:
  - "ipv4": query and accept IPv4 addresses ("A" records)
  - "ipv6": query and accept IPv6 addresses ("AAAA" records)

When a single family is used, no request will be sent to resolvers for the
other family, and any response for the other family will be ignored. The
default value is "ipv4,ipv6", which effectively enables both families.
2025-04-24 17:52:28 +02:00
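To make the argument format of commit 940fa19ad8 above concrete, here is a Python sketch of parsing the comma-delimited list and deriving the DNS record types to query. The function names and the error message are hypothetical; only the "ipv4"/"ipv6" words, their record types and the default value come from the commit message.

```python
RECORD_TYPE = {"ipv4": "A", "ipv6": "AAAA"}


def parse_dns_accept_family(arg: str = "ipv4,ipv6") -> set:
    """Parse the comma-delimited family list (hypothetical helper)."""
    families = set()
    for word in arg.split(","):
        word = word.strip()
        if word not in RECORD_TYPE:
            raise ValueError(f"unknown address family '{word}'")
        families.add(word)
    return families


def record_types(families: set) -> list:
    """Return the record types that would be queried and accepted."""
    return sorted(RECORD_TYPE[f] for f in families)
```

With a single family, only one record type remains, which matches the statement that no request is sent for the other family.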
Willy Tarreau
0e94339eaf BUG/MINOR: master/cli: properly trim the '@@' process name in error messages
When '@@' alone is sent on the master CLI (no trailing LF), we get an
error that displays anything past these two characters in the buffer
since there's no room for a \0. Let's make sure to limit the length of
the process name in this case. No backport is needed since this was added
with 00c967fac4 ("MINOR: master/cli: support bidirectional communications
with workers").
2025-04-24 17:52:28 +02:00
Christopher Faulet
29632bcabf CLEANUP: applet: Remove unsued rule pointer in appctx structure
Thanks to previous commits, the "rule" field in the appctx structure is no
longer used. So we can safely remove it.
2025-04-24 16:22:31 +02:00
Christopher Faulet
568ed6484a MINOR: applet: Save the "use-service" rule in the stream to init a service applet
When a service is initialized, the "use-service" rule that was executed is
now saved in the stream, using "current_rule" field, instead of saving it
into the applet context. It is safe to do so because this field is unused at
this stage. To avoid any issue, it is reset after the service
initialization. Doing so, it is no longer necessary to save it in the applet
context. It was the last usage of the rule pointer in the applet context.

The init functions for TCP and HTTP lua services were updated accordingly.
2025-04-24 16:22:24 +02:00
Christopher Faulet
6f59986e7c MINOR: hlua: Use the applet name in error messages for lua services
The lua function name was used in error messages of HTTP/TCP lua services
while the applet name can be used. Concretely, this will not change
anything, because when a lua service is registered, the lua function name
is used to name the applet. But it is easier, cleaner and more logical
because it is really the applet name that should be displayed in these error
messages.
2025-04-24 15:59:33 +02:00
Christopher Faulet
e05074f632 MINOR: cache: Add a pointer on the cache in the cache applet context
Thanks to this change, when a response is delivered from the cache, it is no
longer necessary to get the cache filter configuration from the http
"use-cache" rule saved in the appctx to get the currently used cache. It was
a bit complex to get an info that can be directly and naturally stored in
the cache applet context.
2025-04-24 15:48:59 +02:00
Christopher Faulet
b734d7c156 MINOR: cli/applet: Move appctx fields only used by the CLI in a private context
There are several fields in the appctx structure only used by the CLI. To
make things cleaner, all these fields are now placed in a dedicated context
inside the appctx structure. The final goal is to move it in the service
context and add an API for cli commands to get a command context inside the
cli context.
2025-04-24 15:09:37 +02:00
Christopher Faulet
03dc54d802 BUG/MINOR: ring: Fix I/O handler of "show event" command to not rely on the SC
Thanks to the CLI refactoring ("MAJOR: cli: Refacor parsing and execution of
pipelined commands"), it is possible to fix "show event" I/O handler function
to no longer use the SC.

When the applet API was refactored to no longer manipulate the channels or
the stream-connectors, this part was missed. However, without the patch
above, it could not be fixed. It is now possible so let's do it.

This patch must not be backported because it depends on refactoring of the
CLI applet.
2025-04-24 15:09:37 +02:00
Christopher Faulet
e406fe16ea BUG/MINOR: hlua: Fix I/O handler of lua CLI commands to not rely on the SC
Thanks to the CLI refactoring ("MAJOR: cli: Refacor parsing and execution of
pipelined commands"), it is possible to fix the I/O handler function used by
lua CLI commands to no longer use the SC.

When the applet API was refactored to no longer manipulate the channels or
the stream-connectors, this part was missed. However, without the patch
above, it could not be fixed. It is now possible so let's do it.

This patch must not be backported because it depends on refactoring of the
CLI applet.
2025-04-24 15:09:37 +02:00
Christopher Faulet
742dc01537 CLEANUP: applet: Update st0/st1 comment in appctx structure
Today, these states are used by almost all applets. So update the comments
of these fields.
2025-04-24 15:09:37 +02:00
Christopher Faulet
44ace9a1b7 MINOR: cli: Rename some CLI applet states to reflect recent refactoring
CLI_ST_GETREQ state was renamed into CLI_ST_PARSE_CMDLINE and CLI_ST_PARSEREQ
into CLI_ST_PROCESS_CMDLINE to reflect the real action performed in these
states.
2025-04-24 15:09:37 +02:00
Christopher Faulet
20ec1de214 MAJOR: cli: Refacor parsing and execution of pipelined commands
Before this patch, when pipelined commands were received, each command was
parsed and then executed before moving to the next command. Pending commands
were not copied in the input buffer of the applet. The major issue with this
way of handling commands is that it is impossible to consume inputs from
commands with an I/O handler, like "show events" for instance. It was working
thanks to a "bug" if such a command was the last one on the command line. But
it was impossible to use them followed by another command. And this prevented
us from implementing any streaming support for CLI commands.

So we decided to refactor the command line parsing to have something similar
to a basic shell. Now an entire line is parsed, including the payload,
before starting commands execution. The command line is copied in a
dedicated buffer. "appctx->chunk" buffer is used for this purpose. It was an
unused field, so it is safe to use it here. Once the command line is copied, the
commands found on this line are executed. Because the applet input buffer
was flushed, any input can be safely consumed by the CLI applet and is
available for the command I/O handler. Thanks to this change, "show event
-w" command can be followed by a command. And in theory, it should be
possible to implement commands supporting input data streaming. For
instance, the Tetris like lua applet can be used on the CLI now.

Note that the payload, if any, is part of the command line and must be fully
received before starting the commands processing. It means there is still
the limitation to a buffer, but not only for the payload but for the whole
command line. The payload is still necessarily at the end of the command
line and is passed as argument to the last command. Internally, the
"appctx->cli_payload" field was introduced to point on the payload in the
command line buffer.

This patch is quite huge but it cannot easily be split. It should not
introduce significant changes.
2025-04-24 15:09:37 +02:00
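A simplified Python sketch of the model described in commit 20ec1de214 above: the whole command line is parsed first, then the commands found on it are executed in order, with the payload handed to the last one only. The function name is hypothetical, the payload is assumed to have been extracted already, and escaping of ';' is ignored, so this is not the actual CLI parser.

```python
from typing import List, Optional, Tuple


def split_command_line(line: str,
                       payload: Optional[str] = None
                       ) -> List[Tuple[str, Optional[str]]]:
    """Split a full command line into (command, payload) pairs (hypothetical)."""
    commands = [cmd.strip() for cmd in line.split(";") if cmd.strip()]
    result = [(cmd, None) for cmd in commands]
    if payload is not None and result:
        # The payload sits at the end of the line and belongs to the last command.
        last_cmd, _ = result[-1]
        result[-1] = (last_cmd, payload)
    return result
```

Because parsing is finished before execution starts, a command with an I/O handler no longer has to be the last one on the line.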
Christopher Faulet
69a9ec5bef MINOR: cli: Use a full prompt command for bidir connections with workers
When a bidirectional connection with no command is established with a worker
(so "@@<pid>" alone), a "prompt" command is automatically added to display
the worker's prompt and enter in interactive mode in the worker context.
However, till now, an unfinished command line is sent, with a semicolon
instead of a newline at the end. It is not exactly a bug because this
works. But it is not really expected and could be a problem for future
changes.

So now, a full command line is sent: the "prompt" command finished by a
newline character.
2025-04-24 15:09:37 +02:00
Christopher Faulet
d3f9289447 BUG/MINOR: cli: Issue an error when too many args are passed for a command
When a command is parsed to split it in an array of arguments, by default,
at most 64 arguments are supported. But no warning was emitted when there
were too many arguments. Instead, the arguments above the limit were
silently ignored. It could be an issue for some commands, like "add server",
because there was no way to know some arguments were ignored.

Now an error is issued when too many arguments are passed and the command is
not executed.

This patch should be backported to all stable versions.
2025-04-24 14:58:24 +02:00
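The behaviour change of commit d3f9289447 above, sketched in Python. The limit of 64 comes from the commit message; the function name, the error text and the plain whitespace split are assumptions and do not reflect the real CLI tokenizer.

```python
MAX_CLI_ARGS = 64  # default limit mentioned in the commit message


def split_cli_args(line: str, max_args: int = MAX_CLI_ARGS) -> list:
    """Split a command into arguments, refusing oversized ones (hypothetical)."""
    args = line.split()
    if len(args) > max_args:
        # Previously the extra arguments would have been silently dropped.
        raise ValueError(f"too many arguments ({len(args)} > {max_args})")
    return args
```

Raising instead of truncating means a command such as "add server" is never executed with part of its arguments missing.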
Amaury Denoyelle
6c5030f703 BUG/MINOR: mux-quic: do not decode if conn in error
Add an early return to qcc_decode_qcs() if QCC instance is flagged on
error and connection is scheduled for immediate closure.

The main objective is to avoid triggering BUG_ON() from
qcc_set_error(): if decoding on one stream has set the connection error, do
not try to decode other streams as they may also encounter
an error. Thus, the connection is closed asap with the first encountered
error case.

This should be backported up to 2.6, after a period of observation.
2025-04-24 14:15:02 +02:00
Amaury Denoyelle
fbedb8746f BUG/MINOR: mux-quic: fix possible infinite loop during decoding
With the support of multiple Rx buffers per QCS instance, stream
decoding in qcc_io_recv() has been reworked for the next haproxy
release. An issue appears in a double while loop : a break statement is
used in the inner loop, which is not sufficient as it should instead
exit from the outer one.

Fix this by replacing break with a goto statement.

No need to backport this.
2025-04-24 14:15:02 +02:00
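The control-flow issue fixed by commit fbedb8746f above can be shown with a toy Python example. Python has no goto, so the fixed variant returns from the function to leave both loops at once; the function names and the "STOP" marker are hypothetical, and the sketch shows the difference in control flow only, not the infinite loop of the original code.

```python
def decode_with_break(streams):
    """A break in the inner loop leaves the inner loop only."""
    decoded = []
    for frames in streams:
        for frame in frames:
            if frame == "STOP":
                break  # the outer loop keeps running
            decoded.append(frame)
    return decoded


def decode_with_full_exit(streams):
    """Leaving both loops at once, as the goto does in the C fix."""
    decoded = []
    for frames in streams:
        for frame in frames:
            if frame == "STOP":
                return decoded  # exits the inner and outer loops
            decoded.append(frame)
    return decoded
```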
Amaury Denoyelle
3dcda87e58 MINOR: h3: simplify h3_rcv_buf return path
Remove return statement in h3_rcv_buf() in case of stream/connection
error. Instead, reuse already existing label err. This simplifies the
code path. It also fixes the missing leave trace for these cases.
2025-04-24 14:15:02 +02:00
Willy Tarreau
1af592c511 MINOR: stick-table: use a separate lock label for updates
Too many locks were sharing STK_TABLE_LOCK making it hard to analyze.
Let's split the already heavily used update lock.
2025-04-24 14:02:22 +02:00
William Lallemand
f192e446d6 MEDIUM: acme: rename "account" into "account-key"
Rename the "account" option of the acme section into "account-key".
2025-04-24 11:10:46 +02:00
William Lallemand
af73f98a3e MEDIUM: acme: rename "uri" into "directory"
Rename the "uri" option of the acme section into "directory".
2025-04-24 10:52:46 +02:00
William Lallemand
4e14889587 MEDIUM: acme: use a customized proxy
Use a customized proxy for the ACME client.

The proxy is initialized at the first acme section parsed.

The proxy uses the httpsclient log format since ACME CAs use HTTPS.
2025-04-23 15:37:57 +02:00
William Lallemand
d700a242b4 MINOR: httpclient: add an "https" log-format
Add an experimental "https" log-format for the httpclient. It is not
used by the httpclient by default, but could be defined in a customized
proxy.

The string is basically a httpslog, with some of the fields replaced by
their backend equivalent or - when not available:

"%ci:%cp [%tr] %ft -/- %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r %[bc_err]/%[ssl_bc_err,hex]/-/-/%[ssl_bc_is_resumed] -/-/-"
2025-04-23 15:32:46 +02:00
William Lallemand
d19a62dc65 MINOR: acme/cli: add the 'acme renew' command to the help message
Add the 'acme renew' command to the 'help' command of the CLI.
2025-04-23 13:59:27 +02:00
Christopher Faulet
1709cfd31d MINOR: http-act: Add 'pause' action to temporarily suspend the message analysis
The 'pause' HTTP action can now be used to suspend for a moment the message
analysis. A timeout, expressed in milliseconds using a time-format
parameter, or an expression can be used. If an expression is used, errors
and invalid values are ignored.

Internally, the action will set the analysis expiration date on the
corresponding channel to the configured value and it will yield while it is
not expired.

The 'pause' action is available for 'http-request' and 'http-response'
rules.
2025-04-22 16:14:47 +02:00
Christopher Faulet
ce8c2d359b BUG/MEDIUM: mux-spop: Respect the negociated max-frame-size value to send frames
When a SPOP connection is opened, the maximum size for frames is negotiated.
This negotiated size is properly used when a frame is received and if a too
big frame is detected, an error is triggered. However, the same was not
performed on the sending path. No check was performed on frames sent to the
agent. So it was possible to send frames bigger than the maximum size
supported by the SPOE agent.

Now, the size of NOTIFY and DISCONNECT frames is checked before sending them
to the agent.

Thanks to Miroslav for reporting the issue.

This patch must be backported to 3.1.
2025-04-22 16:14:47 +02:00
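A minimal Python sketch of the send-side check added by commit ce8c2d359b above. The function names are hypothetical, the frame is treated as an opaque byte string, and whether the real limit includes any length prefix is not stated in the commit message, so that detail is left out.

```python
def frame_fits(frame: bytes, max_frame_size: int) -> bool:
    """Return True if the frame respects the negotiated maximum size."""
    return len(frame) <= max_frame_size


def send_frame(frame: bytes, max_frame_size: int, send) -> None:
    """Check the size before handing the frame to send() (hypothetical helper)."""
    if not frame_fits(frame, max_frame_size):
        # Same rule as on the receive path: an oversized frame is an error.
        raise ValueError("frame exceeds the negotiated max-frame-size")
    send(frame)
```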
Christopher Faulet
a56feffc6f CLEANUP: h1: Remove now useless h1_parse_cont_len_header() function
Since the commit "MINOR: hlua/h1: Use http_parse_cont_len_header() to parse
content-length value", this function is no longer used. So it can be safely
removed.
2025-04-22 16:14:47 +02:00
Christopher Faulet
9e05c14a41 MINOR: hlua/h1: Use http_parse_cont_len_header() to parse content-length value
Till now, h1_parse_cont_len_header() was used during the H1 message parsing and
by the lua HTTP applets to parse the content-length header value. But a more
generic function was added some years ago doing exactly the same operations. So
let's use it instead.
2025-04-22 16:14:47 +02:00
Christopher Faulet
a6b32922fc MINOR: mux-h1: Keep custom "Content-Length: 0" header in 1xx and 204 messages
Thanks to the commit "MINOR: mux-h1: Don't remove custom "Content-Length: 0"
header in 1xx and 204 messages", we are now sure that 1xx and 204 responses
were sanitized during the parsing. So, if one of these headers is found in
such responses when sent to the client, it means it was added by hand, via a
"set-header" action for instance. In this context, we are able to make an
exception for the "Content-Length: 0" header, and only this one with this
value, to not break legacy applications.

So now, a user can force the "Content-Length: 0" header to appear in 1xx and
204 responses by adding the right action in their configuration.
"Transfer-Encoding" headers are still dropped, as are "Content-Length" headers
with a value other than 0. Note that in practice, only 101 and 204 are
concerned because other 1xx messages are not subject to HTTP analysis.

This patch should fix the issue #2888. There is no reason to backport
it. But if we do so, the patch above must be backported too.
2025-04-22 16:14:47 +02:00
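The output-side rule of commit a6b32922fc above can be sketched in Python as a header filter. The function names and the list-of-tuples header representation are hypothetical; the rule itself (drop "Transfer-Encoding", drop "Content-Length" unless its value is 0, for 1xx and 204 responses only) follows the commit message.

```python
def is_bodyless_status(status: int) -> bool:
    """1xx and 204 responses must not carry C-L or T-E headers."""
    return 100 <= status < 200 or status == 204


def filter_output_headers(status: int, headers: list) -> list:
    """Filter (name, value) header pairs before sending (hypothetical helper)."""
    if not is_bodyless_status(status):
        return list(headers)
    kept = []
    for name, value in headers:
        lname = name.lower()
        if lname == "transfer-encoding":
            continue
        if lname == "content-length" and value.strip() != "0":
            continue
        # "Content-Length: 0" can only have been added by hand: keep it.
        kept.append((name, value))
    return kept
```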
Christopher Faulet
1db99b09d0 MINOR: h1-htx: Skip C-L and T-E headers for 1xx and 204 messages during parsing
According to the RFC9110 and RFC9112, a server must not add 'Content-Length'
or 'Transfer-Encoding' headers into 1xx and 204 responses. So till now,
these headers were dropped from the response when it is sent to the client.

However, it seems more logical to remove them during the message parsing. In
addition to sanitizing messages as early as possible, this will allow us to
apply some exceptions in some cases (this will be the subject of another
patch).

In this patch, 'Content-Length' and 'Transfer-Encoding' headers are removed
from 1xx and 204 responses during the parsing but the same is still
performed during the formatting stage.
2025-04-22 16:14:47 +02:00
Christopher Faulet
5200203677 MINOR: proxy: Add options to drop HTTP trailers during message forwarding
In RFC9110, it is stated that trailers could be merged with the
headers. While it should be performed with special care, it may be a
problem for some applications. To avoid any trouble with such applications,
two new options were added to drop trailers during the message forwarding.

On the backend, "http-drop-request-trailers" option can be enabled to drop
trailers from the requests before sending them to the server. And on the
frontend, "http-drop-response-trailers" option can be enabled to drop
trailers from the responses before sending them to the client. The options
can be defined in defaults sections and disabled with "no" keyword.

This patch should fix the issue #2930.
2025-04-22 16:14:46 +02:00
Christopher Faulet
044ef9b3d6 CLEANUP: Slightly reorder some proxy option flags to free slots
PR_O_TCPCHK_SSL and PR_O_CONTSTATS were shifted to free a slot. The idea is
to have 2 contiguous slots to be able to insert two new options.
2025-04-22 16:14:46 +02:00
Willy Tarreau
5763a891a9 CLEANUP: proxy: detach the name node in proxy_free_common() instead
This changes commit d2a9149f0 ("BUG/MINOR: proxy: always detach a proxy
from the names tree on free()") to be cleaner. Aurélien spotted that
the free(p->id) was indeed already done in proxy_free_common(), which is
called before we delete the node. That's still a bit ugly and it only
works because ebpt_delete() does not dereference the key during the
operation. Better play safe and delete the entry before freeing it,
that's more future-proof.
2025-04-19 10:21:19 +02:00
Willy Tarreau
d2a9149f09 BUG/MINOR: proxy: always detach a proxy from the names tree on free()
Stephen Farrell reported in issue #2942 that recent haproxy versions
crash if there's no resolv.conf. A quick bisect with his reproducer
showed that it started with commit 4194f75 ("MEDIUM: tree-wide: avoid
manually initializing proxies") which reorders the proxies initialization
sequence a bit. The crash shows a corrupted tree, typically indicating a
use-after-free. With the help of ASAN it was possible to find that a
resolver proxy had been destroyed and freed before the name insertion
that causes the crash, very likely caused by the absence of the needed
resolv.conf:

    #0 0x7ffff72a82f7 in free (/usr/local/lib64/libasan.so.5+0x1062f7)
    #1 0x94c1fd in free_proxy src/proxy.c:436
    #2 0x9355d1 in resolvers_destroy src/resolvers.c:2604
    #3 0x93e899 in resolvers_create_default src/resolvers.c:3892
    #4 0xc6ed29 in httpclient_resolve_init src/http_client.c:1170
    #5 0xc6fbcf in httpclient_create_proxy src/http_client.c:1310
    #6 0x4ae9da in ssl_ocsp_update_precheck src/ssl_ocsp.c:1452
    #7 0xa1b03f in step_init_2 src/haproxy.c:2050

But free_proxy() doesn't delete the ebpt_node that carries the name,
which perfectly explains the situation. This patch simply deletes the
name node and Stephen confirmed that it fixed the problem for him as
well. Let's also free it since the key points to p->id which is never
freed either in this function!

No backport is needed since the patch above was first merged into
3.2-dev10.
2025-04-18 23:50:13 +02:00
Amaury Denoyelle
4309a6fbf8 BUG/MINOR: quic: do not crash on CRYPTO ncbuf alloc failure
To handle out-of-order received CRYPTO frames, a ncbuf instance is
allocated. This is done via the helper quic_get_ncbuf().

Buffer allocation was improperly checked. In case b_alloc() fails, it
crashes due to a BUG_ON(). Fix this by removing it. The function now
returns NULL on allocation failure, which is already properly handled in
its caller qc_handle_crypto_frm().

This should fix the last reported crash from github issue #2935.

This must be backported up to 2.6.
2025-04-18 18:11:17 +02:00
Willy Tarreau
acd372d6ac [RELEASE] Released version 3.2-dev11
Released version 3.2-dev11 with the following main changes :
    - CI: enable weekly QuicTLS build
    - DOC: management: slightly clarify the prefix role of the '@' command
    - DOC: management: add a paragraph about the limitations of the '@' prefix
    - MINOR: master/cli: support bidirectional communications with workers
    - MEDIUM: ssl/ckch: add filename and linenum argument to crt-store parsing
    - MINOR: acme: add the acme section in the configuration parser
    - MINOR: acme: add configuration for the crt-store
    - MINOR: acme: add private key configuration
    - MINOR: acme/cli: add the 'acme renew' command
    - MINOR: acme: the acme section is experimental
    - MINOR: acme: get the ACME directory
    - MINOR: acme: handle the nonce
    - MINOR: acme: check if the account exist
    - MINOR: acme: generate new account
    - MINOR: acme: newOrder request retrieve authorizations URLs
    - MINOR: acme: allow empty payload in acme_jws_payload()
    - MINOR: acme: get the challenges object from the Auth URL
    - MINOR: acme: send the request for challenge ready
    - MINOR: acme: implement a check on the challenge status
    - MINOR: acme: generate the CSR in a X509_REQ
    - MINOR: acme: finalize by sending the CSR
    - MINOR: acme: verify the order status once finalized
    - MINOR: acme: implement retrieval of the certificate
    - BUG/MINOR: acme: ckch_conf_acme_init() when no filename
    - MINOR: ssl/ckch: handle ckch_conf in ckchs_dup() and ckch_conf_clean()
    - MINOR: acme: copy the original ckch_store
    - MEDIUM: acme: replace the previous ckch instance with new ones
    - MINOR: acme: schedule retries with a timer
    - BUILD: acme: enable the ACME feature when JWS is present
    - BUG/MINOR: cpu-topo: check the correct variable for NULL after malloc()
    - BUG/MINOR: acme: key not restored upon error in acme_res_certificate()
    - BUG/MINOR: thread: protect thread_cpus_enabled_at_boot with USE_THREAD
    - MINOR: acme: default to 2048bits for RSA
    - DOC: acme: explain how to configure and run ACME
    - BUG/MINOR: debug: remove the trailing \n from BUG_ON() statements
    - DOC: config: add the missing "profiling.memory" to the global kw index
    - DOC: config: add the missing "force-cfg-parser-pause" to the global kw index
    - DEBUG: init: report invalid characters in debug description strings
    - DEBUG: rename DEBUG_GLITCHES to DEBUG_COUNTERS and enable it by default
    - DEBUG: counters: make COUNT_IF() only appear at DEBUG_COUNTERS>=1
    - DEBUG: counters: add the ability to enable/disable updating the COUNT_IF counters
    - MINOR: tools: let dump_addr_and_bytes() support dumping before the offset
    - MINOR: debug: in call traces, dump the 8 bytes before the return address, not after
    - MINOR: debug: detect call instructions and show the branch target in backtraces
    - BUG/MINOR: acme: fix possible NULL deref
    - CLEANUP: acme: stored value is overwritten before it can be used
    - BUILD: incompatible pointer type suspected with -DDEBUG_UNIT
    - BUG/MINOR: http-ana: Properly detect client abort when forwarding the response
    - BUG/MEDIUM: http-ana: Report 502 from req analyzer only during rsp forwarding
    - CI: fedora rawhide: enable unit tests
    - DOC: configuration: fix a typo in ACME documentation
    - MEDIUM: sink: add a new dpapi ring buffer
    - Revert "BUG/MINOR: acme: key not restored upon error in acme_res_certificate()"
    - BUG/MINOR: acme: key not restored upon error in acme_res_certificate() V2
    - BUG/MINOR: acme: fix the exponential backoff of retries
    - DOC: configuration: specify limitations of ACME for 3.2
    - MINOR: acme: emit logs instead of ha_notice
    - MINOR: acme: add a success message to the logs
    - BUG/MINOR: acme/cli: fix certificate name in error message
    - MINOR: acme: register the task in the ckch_store
    - MINOR: acme: free acme_ctx once the task is done
    - BUG/MEDIUM: h3: trim whitespaces when parsing headers value
    - BUG/MEDIUM: h3: trim whitespaces in header value prior to QPACK encoding
    - BUG/MINOR: h3: filter upgrade connection header
    - BUG/MINOR: h3: reject invalid :path in request
    - BUG/MINOR: h3: reject request URI with invalid characters
    - MEDIUM: h3: use absolute URI form with :authority
    - BUG/MEDIUM: hlua: fix hlua_applet_{http,tcp}_fct() yield regression (lost data)
    - BUG/MINOR: mux-h2: prevent past scheduling with idle connections
    - BUG/MINOR: rhttp: fix reconnect if timeout connect unset
    - BUG/MINOR: rhttp: ensure GOAWAY can be emitted after reversal
    - BUG/MINOR: mux-h2: do not apply timer on idle backend connection
    - MINOR: mux-h2: refactor idle timeout calculation
    - MINOR: mux-h2: prepare to support PING emission
    - MEDIUM: server/mux-h2: implement idle-ping on backend side
    - MEDIUM: listener/mux-h2: implement idle-ping on frontend side
    - MINOR: mux-h2: do not emit GOAWAY on idle ping expiration
    - MINOR: mux-h2: handle idle-ping on conn reverse
    - BUILD: makefile: enable backtrace by default on musl
    - BUG/MINOR: threads: set threads_idle and threads_harmless even with no threads
    - BUG/MINOR debug: fix !USE_THREAD_DUMP in ha_thread_dump_fill()
    - BUG/MINOR: wdt/debug: avoid signal re-entrance between debugger and watchdog
    - BUG/MINOR: debug: detect and prevent re-entrance in ha_thread_dump_fill()
    - MINOR: debug: do not statify a few debugging functions often used with wdt/dbg
    - MINOR: tools: also protect the library name resolution against concurrent accesses
    - MINOR: tools: protect dladdr() against reentrant calls from the debug handler
    - MINOR: debug: protect ha_dump_backtrace() against risks of re-entrance
    - MINOR: tinfo: keep a copy of the pointer to the thread dump buffer
    - MINOR: debug: always reset the dump pointer when done
    - MINOR: debug: remove unused case of thr!=tid in ha_thread_dump_one()
    - MINOR: pass a valid buffer pointer to ha_thread_dump_one()
    - MEDIUM: wdt: always make the faulty thread report its own warnings
    - MINOR: debug: make ha_stuck_warning() only work for the current thread
    - MINOR: debug: make ha_stuck_warning() print the whole message at once
    - CLEANUP: debug: no longer set nor use TH_FL_DUMPING_OTHERS
    - MINOR: sched: add a new function is_sched_alive() to report scheduler's health
    - MINOR: wdt: use is_sched_alive() instead of keeping a local ctxsw copy
    - MINOR: sample: add 4 new sample fetches for clienthello parsing
    - REGTEST: add new reg-test for the 4 new clienthello fetches
    - MINOR: servers: Move the per-thread server initialization earlier
    - MINOR: proxies: Initialize the per-thread structure earlier.
    - MINOR: servers: Provide a pointer to the server in srv_per_tgroup.
    - MINOR: lb_fwrr: Move the next weight out of fwrr_group.
    - MINOR: proxies: Add a per-thread group lbprm struct.
    - MEDIUM: lb_fwrr: Use one ebtree per thread group.
    - MEDIUM: lb_fwrr: Don't start all thread groups on the same server.
    - MINOR: proxies: Do stage2 initialization for sinks too
2025-04-18 14:19:47 +02:00
Olivier Houchard
c4aec7a52f MINOR: proxies: Do stage2 initialization for sinks too
In check_config_validity(), we initialize the proxy in several stages.
We do so for the sink list for stage1, but not for stage2. It may not be
needed right now, but it may become needed in the future, so do it
anyway.
2025-04-17 17:38:23 +02:00
Olivier Houchard
658eaa4086 MEDIUM: lb_fwrr: Don't start all thread groups on the same server.
Now that there is one tree per thread group, all thread groups will
start on the same server. To prevent that, just insert the servers in a
different order for each thread group.
2025-04-17 17:38:23 +02:00
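One possible way to give each thread group a different starting server, as described in commit 658eaa4086 above, is a simple rotation of the server list. This Python sketch is an assumption: the commit message does not say how the order is varied, the function name is hypothetical, and group ids are assumed to start at 1.

```python
def servers_order_for_group(servers: list, tgid: int) -> list:
    """Return the insertion order used for thread group tgid (hypothetical)."""
    if not servers:
        return []
    # Assumption: rotate by the group index so each group starts elsewhere.
    shift = (tgid - 1) % len(servers)
    return servers[shift:] + servers[:shift]
```

With three servers, group 1 starts on the first server, group 2 on the second, and so on, wrapping around when there are more groups than servers.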
Olivier Houchard
3758eab71c MEDIUM: lb_fwrr: Use one ebtree per thread group.
When using the round-robin load balancer, the major source of contention
is the lbprm lock, that has to be held every time we pick a server.
To mitigate that, make it so there is one tree per thread-group, and
one lock per thread-group. That means we now have a lb_fwrr_per_tgrp
structure that will contain the two lb_fwrr_groups (active and backup) as well
as the lock to protect them in the per-thread lbprm struct, and all
fields in the struct server are now moved to the per-thread structure
too.
Those changes are mostly mechanical, and bring a good performance
improvement: on a 64-core AMD CPU, with 64 servers configured, we could
process about 620000 requests per second, and we can now process around
1400000 requests per second.
2025-04-17 17:38:23 +02:00
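The contention argument of commit 3758eab71c above can be illustrated with a Python sketch in which every thread group owns its own lock and its own position. The class name and layout are hypothetical, and it implements plain round-robin rather than HAProxy's weighted fwrr algorithm with ebtrees; it only shows that picking a server in one group never takes another group's lock.

```python
import threading


class PerGroupRoundRobin:
    """Plain round-robin with one lock and one cursor per thread group."""

    def __init__(self, servers, nb_groups):
        # Each group gets its own copy of the state and its own lock.
        self.groups = [
            {"lock": threading.Lock(), "servers": list(servers), "pos": 0}
            for _ in range(nb_groups)
        ]

    def pick(self, group_index):
        grp = self.groups[group_index]
        # Only threads of the same group contend on this lock.
        with grp["lock"]:
            server = grp["servers"][grp["pos"]]
            grp["pos"] = (grp["pos"] + 1) % len(grp["servers"])
            return server
```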
Olivier Houchard
f36f6cfd26 MINOR: proxies: Add a per-thread group lbprm struct.
Add a new structure in the per-thread groups proxy structure, that will
contain whatever is per-thread group in lbprm.
It will be accessed as p->per_tgrp[tgid].lbprm.
2025-04-17 17:38:23 +02:00
Olivier Houchard
7ca1c94ff0 MINOR: lb_fwrr: Move the next weight out of fwrr_group.
Move the "next_weight" outside of fwrr_group, and inside struct lb_fwrr
directly, one for the active servers, one for the backup servers.
We will soon have one fwrr_group per thread group, but next_weight will
be global to all of them.
2025-04-17 17:38:23 +02:00
Olivier Houchard
444125a764 MINOR: servers: Provide a pointer to the server in srv_per_tgroup.
Add a pointer to the server into the struct srv_per_tgroup, so that if
we only have access to that srv_per_tgroup, we can come back to the
corresponding server.
2025-04-17 17:38:23 +02:00
Olivier Houchard
5e1ce09e54 MINOR: proxies: Initialize the per-thread structure earlier.
Move the call to initialize the proxy's per-thread structure earlier
than currently done, so that they are usable when we're initializing the
load balancers.
2025-04-17 17:38:23 +02:00
Olivier Houchard
e7613d3717 MINOR: servers: Move the per-thread server initialization earlier
Move the code responsible for calling per-thread server initialization
earlier than it was done, so that per-thread structures are available a
bit later, when we initialize load-balancing.
2025-04-17 17:38:23 +02:00
Mariam John
9a8c4df45d REGTEST: add new reg-test for the 4 new clienthello fetches
Add a reg-test which uses the 4 fetches:

- req.ssl_cipherlist
- req.ssl_sigalgs
- req.ssl_keyshare_groups
- req.ssl_supported_groups
2025-04-17 16:39:47 +02:00
Mariam John
fa063a9e77 MINOR: sample: add 4 new sample fetches for clienthello parsing
This patch contains these 4 new fetches and doc changes for the new fetches:

- req.ssl_cipherlist
- req.ssl_sigalgs
- req.ssl_keyshare_groups
- req.ssl_supported_groups

Towards:#2532
2025-04-17 16:39:47 +02:00
Willy Tarreau
5901164789 MINOR: wdt: use is_sched_alive() instead of keeping a local ctxsw copy
Now we can simply call is_sched_alive() on the local thread to verify
that the scheduler is still ticking instead of having to keep a copy of
the ctxsw and comparing it. It's cleaner, doesn't require maintaining
a local copy, doesn't rely on activity[] (whose purpose is mainly for
observation and debugging), and shows how this could be extended later
to cover other use cases. Practically speaking this doesn't change
anything however, the algorithm is still the same.
2025-04-17 16:25:47 +02:00
Willy Tarreau
36ec70c526 MINOR: sched: add a new function is_sched_alive() to report scheduler's health
This verifies that the scheduler is still ticking without having to
access the activity[] array nor keeping local copies of the ctxsw
counter. It just tests and sets a flag that is reset after each
return from a ->process() function.
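
The test-and-set idea described above can be sketched as follows. This is
only an illustration under assumptions: the struct, its field and both
function names are hypothetical, and only the flag logic is modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread scheduler state; the field name is an assumption. */
struct sched_state {
    atomic_bool checked; /* set by the checker, cleared when a task handler returns */
};

/* To be called by the scheduler after each return from a ->process() function. */
static inline void sched_mark_progress(struct sched_state *s)
{
    atomic_store(&s->checked, false);
}

/* Returns true if the scheduler made progress since the previous check.
 * The flag is set on every call, so two consecutive calls with no progress
 * in between make the second one return false.
 */
static inline bool sched_is_alive(struct sched_state *s)
{
    return !atomic_exchange(&s->checked, true);
}
```

A watchdog built on this would call sched_is_alive() periodically and treat a
false result as a sign that no task handler returned since the last check.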
2025-04-17 16:25:47 +02:00
Willy Tarreau
874ba2afed CLEANUP: debug: no longer set nor use TH_FL_DUMPING_OTHERS
TH_FL_DUMPING_OTHERS was being used to try to perform exclusion between
threads running "show threads" and those producing warnings. Now that it
is much more cleanly handled, we don't need that type of protection
anymore, which was adding to the complexity of the solution. Let's just
get rid of it.
2025-04-17 16:25:47 +02:00
Willy Tarreau
513397ac82 MINOR: debug: make ha_stuck_warning() print the whole message at once
It has been noticed quite a few times during troubleshooting and even
testing that warnings can happen in avalanches from multiple threads
at the same time, and that their reporting is interleaved because the
output is produced in small chunks. Originally, this code inspired by
the panic code aimed at making sure to log whatever could be emitted
in case it would crash later. But this approach was wrong since writes
are atomic, and performing 5 writes in sequence in each dumping thread
also means that the outputs can be mixed up at 5 different locations
between multiple threads. The output of warnings is never very long,
and the stack-based buffer is 4kB so let's just concatenate everything
in the buffer and emit it at once using a single write(). Now there's
no longer this confusion on the output.
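
A minimal sketch of the "concatenate, then emit with a single write()"
approach, relying on the statement above that writes are atomic. The
function name and its arguments are hypothetical and the real function
builds its message differently; only the 4kB stack buffer size is taken
from the text.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Concatenates all parts into one stack buffer, then emits them with a
 * single write() so that a message is not split across several writes.
 * Parts that do not fit in the buffer are truncated.
 * Returns whatever write() returned.
 */
ssize_t emit_at_once(int fd, const char *const parts[], size_t nparts)
{
    char buf[4096];
    size_t len = 0;

    for (size_t i = 0; i < nparts; i++) {
        size_t n = strlen(parts[i]);

        if (n > sizeof(buf) - len)
            n = sizeof(buf) - len; /* truncate instead of overflowing */
        memcpy(buf + len, parts[i], n);
        len += n;
    }
    return write(fd, buf, len);
}
```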
2025-04-17 16:25:47 +02:00
Willy Tarreau
c16d5415a8 MINOR: debug: make ha_stuck_warning() only work for the current thread
Since we no longer call it with a foreign thread, let's simplify its code
and get rid of the special cases that were relying on ha_thread_dump_fill()
and synchronization with a remote thread. We're now only dumping the
current thread so ha_thread_dump_one() is sufficient.
2025-04-17 16:25:47 +02:00
Willy Tarreau
a06c215f08 MEDIUM: wdt: always make the faulty thread report its own warnings
Warnings remain tricky to deal with, especially for other threads as
they require some inter-thread synchronization that doesn't cope very
well with other parallel activities such as "show threads" for example.

However there is nothing that forces us to handle them this way. The
panic for example is already handled by bouncing the WDT signal to the
faulty thread.

This commit rearranges the WDT handler to make better use of this
existing signal bouncing feature of the WDT handler so that it's no
longer limited to panics but can also deal with warnings. In order not
to bounce on all wakeups, we only bounce when there is a suspicion,
that is, when the warning timer has been crossed. We'll let the target
thread verify the stuck flag and context switch count by itself to
decide whether or not to panic, warn, or just do nothing and update
the counters.

As a bonus, now all warning traces look the same regardless of the
reporting thread:

   call trace(16):
   |       0x6bc733 <01 00 00 e8 6d e6 de ff]: ha_dump_backtrace+0x73/0x309 > main-0x2570
   |       0x6bd37a <00 00 00 e8 d6 fb ff ff]: ha_thread_dump_fill+0xda/0x104 > ha_thread_dump_one
   |       0x6bd625 <00 00 00 e8 7b fc ff ff]: ha_stuck_warning+0xc5/0x19e > ha_thread_dump_fill
   |       0x7b2b60 <64 8b 3b e8 00 aa f0 ff]: wdt_handler+0x1f0/0x212 > ha_stuck_warning
   | 0x7fd7e2cef3a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
   | 0x7ffc6af9e634 <85 a6 00 00 00 0f 01 f9]: linux-vdso:__vdso_gettimeofday+0x34/0x2b0
   |       0x6bad74 <7c 24 10 e8 9c 01 df ff]: sc_conn_io_cb+0x9fa4 > main-0x2400
   |       0x67c457 <89 f2 4c 89 e6 41 ff d0]: main+0x1cf147
   |       0x67d401 <48 89 df e8 8f ed ff ff]: cli_io_handler+0x191/0xb38 > main+0x1cee80
   |       0x6dd605 <40 48 8b 45 60 ff 50 18]: task_process_applet+0x275/0xce9
2025-04-17 16:25:47 +02:00
Willy Tarreau
b24d7f248e MINOR: pass a valid buffer pointer to ha_thread_dump_one()
The goal is to let the caller deal with the pointer so that the function
only has to fill that buffer without worrying about locking. This way,
synchronous dumps from "show threads" are produced and emitted directly
without causing undesired locking of the buffer nor risking causing
confusion about thread_dump_buffer containing bits from an interrupted
dump in progress.

It's only the caller that's responsible for notifying the requester of
the end of the dump by setting bit 0 of the pointer if needed (i.e. it's
only done in the debug handler).
2025-04-17 16:25:47 +02:00
Willy Tarreau
5ac739cd0c MINOR: debug: remove unused case of thr!=tid in ha_thread_dump_one()
This function was initially designed to dump any thread into the presented
buffer, but the way it currently works is that it's always called for the
current thread, and uses the distinction between coming from a sighandler
or being called directly to detect which thread is the caller.

Let's simplify all this by replacing thr with tid everywhere, and using
the thread-local pointers where it makes sense (e.g. th_ctx, th_ctx etc).
The confusing "from_signal" argument is now replaced with "is_caller"
which clearly states whether or not the caller declares being the one
asking for the dump (the logic is inverted, but there are only two call
places with a constant).
2025-04-17 16:25:47 +02:00
Willy Tarreau
5646ec4d40 MINOR: debug: always reset the dump pointer when done
We don't need to copy the old dump pointer to the thread_dump_pointer
area anymore to indicate a dump is collected. It used to be done as an
artificial way to keep the pointer for the post-mortem analysis but
since we now have this pointer stored separately, that's no longer
needed and it simplifies the mechanism to reset it.
2025-04-17 16:25:47 +02:00
Willy Tarreau
6d8a523d14 MINOR: tinfo: keep a copy of the pointer to the thread dump buffer
Instead of using the thread dump buffer for post-mortem analysis, we'll
keep a copy of the assigned pointer whenever it's used, even for warnings
or "show threads". This will offer more opportunities to figure from a
core what happened, and will give us more freedom regarding the value of
the thread_dump_buffer itself. For example, even at the end of the dump
when the pointer is reset, the last used buffer is now preserved.
2025-04-17 16:25:47 +02:00
Willy Tarreau
d20e9cad67 MINOR: debug: protect ha_dump_backtrace() against risks of re-entrance
If a thread is dumping itself (warning, show thread etc) and another one
wants to dump the state of all threads (e.g. panic), it may interrupt the
first one during backtrace() and re-enter it from the signal handler,
possibly triggering a deadlock in the underlying libc. Let's postpone
the debug signal delivery at this point until the call ends in order to
avoid this.
2025-04-17 16:25:47 +02:00
Willy Tarreau
2dfb63313b MINOR: tools: protect dladdr() against reentrant calls from the debug handler
If a thread is currently resolving a symbol while another thread triggers
a thread dump, the current thread may enter the debug handler and call
resolve_sym_addr() again, possibly deadlocking if the underlying libc
uses locking. Let's postpone the debug signal delivery in this area
during the call. This will slow the resolution a little bit but we don't
care, it's not supposed to happen often and it must remain rock-solid.
2025-04-17 16:25:47 +02:00
Willy Tarreau
8d0c633677 MINOR: tools: also protect the library name resolution against concurrent accesses
This is an extension of eb41d768f ("MINOR: tools: use only opportunistic
symbols resolution"). It also makes sure we're not calling dladdr() in
parallel to dladdr_and_size(), as a preventive measure against some
potential deadlocks in the inner layers of the libc.
2025-04-17 16:25:47 +02:00
Willy Tarreau
5b5960359f MINOR: debug: do not statify a few debugging functions often used with wdt/dbg
A few functions are used when debugging debug signals and watchdog, but
being static, they're not resolved and are hard to spot in dumps, and
they appear as any random other function plus an offset. Let's just not
mark them static anymore, it only hurts:
  - cli_io_handler_show_threads()
  - debug_run_cli_deadlock()
  - debug_parse_cli_loop()
  - debug_parse_cli_panic()
2025-04-17 16:25:47 +02:00
Willy Tarreau
47f8397afb BUG/MINOR: debug: detect and prevent re-entrance in ha_thread_dump_fill()
In the following trace trying to abuse the watchdog from the CLI's
"debug dev loop" command running in parallel to "show threads" loops,
it's clear that some re-entrance may happen in ha_thread_dump_fill().

A first minimal fix consists in using a test-and-set on the flag
indicating that the function is currently dumping threads, so that
the one from the signal just returns. However the caller should be
made more reliable to serialize all of this, that's for future
work.

Here's an example capture of 7 threads stuck waiting for each other:
  (gdb) bt
  #0  0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
  #1  0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
  #2  0x00000000005ba4f5 in ha_thread_dump_fill (thr=2, buf=0x7ffdd8e08ab0) at src/debug.c:402
  #3  ha_thread_dump_fill (buf=0x7ffdd8e08ab0, thr=<optimized out>) at src/debug.c:384
  #4  0x00000000005baac4 in ha_stuck_warning (thr=thr@entry=2) at src/debug.c:840
  #5  0x00000000006a360d in wdt_handler (sig=<optimized out>, si=<optimized out>, arg=<optimized out>) at src/wdt.c:156
  #6  <signal handler called>
  #7  0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
  #8  0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
  #9  0x00000000005ba4c2 in ha_thread_dump_fill (thr=2, buf=0x7fe78f2d6420) at src/debug.c:426
  #10 ha_thread_dump_fill (buf=0x7fe78f2d6420, thr=2) at src/debug.c:384
  #11 0x00000000005ba7c6 in cli_io_handler_show_threads (appctx=0x2a89ab0) at src/debug.c:548
  #12 0x000000000057ea43 in cli_io_handler (appctx=0x2a89ab0) at src/cli.c:1176
  #13 0x00000000005d7885 in task_process_applet (t=0x2a82730, context=0x2a89ab0, state=<optimized out>) at src/applet.c:920
  #14 0x0000000000659002 in run_tasks_from_lists (budgets=budgets@entry=0x7ffdd8e0a5c0) at src/task.c:644
  #15 0x0000000000659bd7 in process_runnable_tasks () at src/task.c:886
  #16 0x00000000005cdcc9 in run_poll_loop () at src/haproxy.c:2858
  #17 0x00000000005ce457 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3075
  #18 0x0000000000430628 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3665
2025-04-17 16:25:47 +02:00
Willy Tarreau
ebf1757dc2 BUG/MINOR: wdt/debug: avoid signal re-entrance between debugger and watchdog
As seen in issue #2860, there are some situations where a watchdog could
trigger during the debug signal handler, and where similarly the debug
signal handler may trigger during the wdt handler. This is really bad
because it could trigger some deadlocks inside inner libc code such as
dladdr() or backtrace() since the code will not protect against re-
entrance but only against concurrent accesses.

A first attempt was made using ha_sigmask() but that's not always very
convenient because the second handler is called immediately after
unblocking the signal and before returning, leaving signal cascades in
backtrace. Instead, let's mark which signals to block at registration
time. Here we're blocking wdt/dbg for both signals, and optionally
SIGRTMAX if DEBUG_DEV is used as that one may also be used in this case.

This should be backported at least to 3.1.
2025-04-17 16:25:47 +02:00
Willy Tarreau
0b56839455 BUG/MINOR debug: fix !USE_THREAD_DUMP in ha_thread_dump_fill()
The function must make sure to return NULL for foreign threads and
the local buffer for the current thread in this case, otherwise panics
(and sometimes even warnings) will segfault when USE_THREAD_DUMP is
disabled. Let's slightly re-arrange the function to reduce the #if/else
since we have to specifically handle the case of !USE_THREAD_DUMP anyway.

This needs to be backported wherever b8adef065d ("MEDIUM: debug: on
panic, make the target thread automatically allocate its buf") was
backported (at least 2.8).
2025-04-17 16:25:47 +02:00
Willy Tarreau
337017e2f9 BUG/MINOR: threads: set threads_idle and threads_harmless even with no threads
Some signal handlers rely on these to decide about the level of detail to
provide in dumps, so let's properly fill the info about entering/leaving
idle. Note that for consistency with other tests we're using bitops with
t->ltid_bit, while we could simply assign 0/1 to the fields. But it makes
the code more readable and the whole difference is only 88 bytes on a 3MB
executable.

This bug is not important, and while older versions are likely affected
as well, it's not worth taking the risk to backport this in case it would
wake up an obscure bug.
2025-04-17 16:25:47 +02:00
Willy Tarreau
f499fa3dcd BUILD: makefile: enable backtrace by default on musl
The reason musl builds were not producing exploitable backtraces was
that the toolchain used appears to automatically omit the frame pointer
at -O2 but leaves it at -O0. This patch just makes sure to always append
-fno-omit-frame-pointer to the BACKTRACE cflags and enables the option
with musl where it now works. This will allow us to finally get
exploitable traces from docker images where core dumps are not always
available.
2025-04-17 16:25:47 +02:00
Amaury Denoyelle
bd1d02e2b3 MINOR: mux-h2: handle idle-ping on conn reverse
This commit extends MUX H2 connection reversal step to properly take
into account the new idle-ping feature. It first ensures that h2c task
is properly instantiated/freed depending now on both timers and
idle-ping configuration. Also, h2c_update_timeout() is now called
instead of manually requeuing the task, which ensures the proper timer
value is selected depending on the new connection side.
2025-04-17 14:49:36 +02:00
Amaury Denoyelle
cc5a7a760f MINOR: mux-h2: do not emit GOAWAY on idle ping expiration
If idle-ping is activated and h2c task is expired due to missing PING
ACK, consider that the peer is away and the connection can be closed
immediately. GOAWAY emission is thus skipped.

A new test is necessary in h2c_update_timeout() when PING ACK is
currently expected, but the next timer expiration selected is not
idle-ping. This may happen if http-keep-alive/http-request timers are
selected first. In this case, H2_CF_IDL_PING_SENT flag is reset. This
is necessary to not prevent GOAWAY emission on expiration.
2025-04-17 14:49:36 +02:00
Amaury Denoyelle
52246249ab MEDIUM: listener/mux-h2: implement idle-ping on frontend side
This commit is the counterpart of the previous one, adapted on the
frontend side. "idle-ping" is added as keyword to bind lines, to be able
to refresh client timeout of idle frontend connections.

H2 MUX behavior remains similar as the previous patch. The only
significant change is in h2c_update_timeout(), as idle-ping is now taken
into account also for frontend connection. The calculated value is
compared with http-request/http-keep-alive timeout value. The shorter
delay is then used as expired date. As hr/ka timeout are based on
idle_start, this allows to run them in parallel with an idle-ping timer.
2025-04-17 14:49:36 +02:00
Amaury Denoyelle
a78a04cfae MEDIUM: server/mux-h2: implement idle-ping on backend side
This commit implements support for idle-ping on the backend side. First,
a new server keyword "idle-ping" is defined in configuration parsing. It
is used to set the corresponding new server member.

The second part of this commit implements idle-ping support on H2 MUX. A
new inlined function conn_idle_ping() is defined to access connection
idle-ping value. Two new connection flags are defined H2_CF_IDL_PING and
H2_CF_IDL_PING_SENT. The first one is set for idle connections via
h2c_update_timeout().

On h2_timeout_task() handler, if first flag is set, instead of releasing
the connection as before, the second flag is set and tasklet is
scheduled. As both flags are now set, h2_process_mux() will proceed to
PING emission. The timer has also been rearmed to the idle-ping value.
If a PING ACK is received before next timeout, connection timer is
refreshed. Else, the connection is released, as with timer expiration.

Also of importance, special care is needed when a backend connection is
going to idle. In this case, idle-ping timer must be rearmed. Thus a new
invocation of h2c_update_timeout() is performed on h2_detach().
2025-04-17 14:49:36 +02:00
Amaury Denoyelle
4dcfe098a6 MINOR: mux-h2: prepare to support PING emission
Adapt the already existing function h2c_ack_ping(). The objective is to
be able to emit a PING request. First, it is renamed as h2c_send_ping().
A new boolean argument <ack> is used to emit either a PING request or
ack.
2025-04-17 14:49:36 +02:00
Amaury Denoyelle
99b2e52f89 MINOR: mux-h2: refactor idle timeout calculation
Reorganize code for timeout calculation in case the connection is idle.
The objective is to better reflect the relations between the timeouts
as follows:

* if GOAWAY already emitted, use shut-timeout, or if unset fallback to
  client/server one. However, an already set timeout is never erased.

* else, for frontend connection, http-request or keep-alive timeout is
  applied depending on the current demux state. If the selected value is
  unset, fallback to client timeout

* for backend connection, no timeout is set to perform http-reuse

This commit is pure refactoring, so no functional change should occur.
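
The three rules above can be sketched as a small selection function. This is
a simplified model, not the actual h2c_update_timeout() code: the struct, the
names and the use of 0 for "unset" are assumptions, the mapping of the demux
state to a single boolean is a guess, and the rule that an already set
timeout is never erased is not modeled.

```c
#include <stdbool.h>

#define NO_TIMEOUT 0 /* assumption: 0 stands for "unset" / "no timeout" */

/* Hypothetical container for the configured timeouts, in milliseconds. */
struct idle_timeouts {
    int shut;         /* shut timeout */
    int side;         /* client timeout (frontend) or server timeout (backend) */
    int http_request; /* http-request timeout */
    int keep_alive;   /* http-keep-alive timeout */
};

/* Returns the timeout to apply to an idle connection, or NO_TIMEOUT. */
int idle_timeout_sketch(const struct idle_timeouts *t, bool is_backend,
                        bool goaway_sent, bool request_started)
{
    int v;

    /* GOAWAY already emitted: shut timeout, else client/server timeout */
    if (goaway_sent)
        return t->shut != NO_TIMEOUT ? t->shut : t->side;

    /* idle backend connection: no timeout, so it stays reusable */
    if (is_backend)
        return NO_TIMEOUT;

    /* frontend: http-request or keep-alive, else client timeout */
    v = request_started ? t->http_request : t->keep_alive;
    return v != NO_TIMEOUT ? v : t->side;
}
```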
2025-04-17 14:49:36 +02:00
Amaury Denoyelle
243bc95de0 BUG/MINOR: mux-h2: do not apply timer on idle backend connection
Since the following commit, MUX H2 timeout function has been slightly
extended.

  d38d8c6ccb189e7bc813b3693fec3093c9be55f1
  BUG/MEDIUM: mux-h2: make sure control frames do not refresh the idle timeout

A side-effect of this patch is that now backend idle connection expire
timer is not reset if already defined. This means that if a timer was
registered prior to the connection transition to idle, the connection
would be destroyed on its timeout. If this happens for enough
connection, this may have an impact on the reuse rate.

In practice, this case should be rare, as h2c timer is set to
TICK_ETERNITY while there are active streams. The timer is not refreshed
most of the time before the transition to idle, so the connection
won't be deleted on expiration.

The only case where it could occur is if there is still pending data
blocked on emission on stream detach. Here, timeout server is applied on
the connection. When the emission completes, the connection goes to
idle, but the timer will still be armed, and thus will be triggered on the
idle connection.

To prevent this, explicitly reset h2c timer to TICK_ETERNITY for idle
backend connection via h2c_update_timeout().

This patch is explicitly not scheduled for backport for now, as it is
difficult to estimate the real impact of the previous code state.
2025-04-17 14:49:36 +02:00
Amaury Denoyelle
9e6f8ce328 BUG/MINOR: rhttp: ensure GOAWAY can be emitted after reversal
GOAWAY should not be emitted before preface. Thus, max_id field
from h2c acting as a server is initialized to -1, which prevents its
emission until preface is received from the peer. If acting as a client,
max_id is initialized to a valid value on the first h2s emission.

This causes an issue with reverse HTTP on the active side. First, it
starts as a client, so the peer does not emit a preface but instead a
simple SETTINGS frame. As roles are switched, max_id is initialized much
later when the first h2s response is emitted. Thus, if the connection
must be terminated before any stream transfer, GOAWAY cannot be emitted.

To fix this, ensure max_id is initialized to 0 on h2_conn_reverse() for
active connect side. Thus, a GOAWAY indicating that no stream has been
handled can be generated.

Note that passive connect side is not impacted, as its max_id is
initialized thanks to preface reception.

This should be backported up to 2.9.
2025-04-17 14:49:36 +02:00
Amaury Denoyelle
2b8da5f9ab BUG/MINOR: rhttp: fix reconnect if timeout connect unset
Active connect on reverse http relies on connect timeout to detect
connection failure. Thus, if this timeout was unset, connection failure
may not be properly detected.

Fix this by falling back on a hardcoded value of 1s for connect if timeout is
unset in the configuration. This is considered as a minor bug, as
haproxy advises against running with timeout unset.

This must be backported up to 2.9.
2025-04-17 14:49:36 +02:00
Amaury Denoyelle
3ebdd3ae50 BUG/MINOR: mux-h2: prevent past scheduling with idle connections
While reviewing HTTP/2 MUX timeout, it seems there is a possibility that
MUX task is requeued via h2c_update_timeout() with an already expired
date. This can happen with idle connections in two cases:
* first with shut timeout, as timer is not refreshed if already set
* second with http-request and keep-alive timers, which are based on
  idle_start

Queuing an already expired task is an undefined behavior. Fix this by
using task_wakeup() instead of task_queue() at the end of
h2c_update_timeout() if such case occurs.
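
The decision described above can be sketched as follows. The enum and
function names are hypothetical and the tick comparison is a simplified
stand-in for the real tick helpers; only the choice between queuing and
waking up is illustrated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome: queue the task for later, or wake it up right away. */
enum sched_action { SCHED_QUEUE, SCHED_WAKEUP };

/* Wrapping comparison on 32-bit millisecond ticks: a date is considered
 * expired when it is equal to or older than <now>.
 */
static inline bool tick_expired_sketch(uint32_t expire, uint32_t now)
{
    return (int32_t)(now - expire) >= 0;
}

/* An already expired date must never be queued: wake the task up instead. */
enum sched_action pick_sched_action(uint32_t expire, uint32_t now)
{
    return tick_expired_sketch(expire, now) ? SCHED_WAKEUP : SCHED_QUEUE;
}
```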

This should be backported up to 2.6.
2025-04-17 14:49:36 +02:00
Aurelien DARRAGON
b81ab159a6 BUG/MEDIUM: hlua: fix hlua_applet_{http,tcp}_fct() yield regression (lost data)
Jacques Heunis from Bloomberg reported on the mailing list [1] that
with haproxy 2.8 up to master, yielding from a Lua tcp service while
data was still buffered inside haproxy would eat some data which was
definitely lost.

He provided the reproducer below which turned out to be really helpful:

  global
      log stdout format raw local0 info
      lua-load haproxy_yieldtest.lua

  defaults
      log global
      timeout connect         10s
      timeout client          1m
      timeout server          1m

  listen echo
      bind *:9090
      mode tcp
      tcp-request content use-service lua.print_input

haproxy_yieldtest.lua:

  core.register_service("print_input", "tcp", function(applet)
      core.Info("Start printing input...")
      while true do
          local inputs = applet:getline()
          if inputs == nil or string.len(inputs) == 0 then
              core.Info("closing input connection")
              return
          end
          core.Info("Received line: "..inputs)
          core.yield()
      end
  end)

And the script below:

  #!/usr/bin/bash
  for i in $(seq 1 9999); do
      for j in $(seq 1 50); do
          echo "${i}_foo_${j}"
      done
      sleep 2
  done

Using it like this:
  ./test_seq.sh | netcat localhost 9090

We can clearly see the missing data for every "foo" burst (every 2
seconds), as there are holes in the numbering.

Thanks to the reproducer, it was quickly found that only versions
>= 2.8 were affected, and that in fact this regression was introduced
by commit 31572229e ("MEDIUM: hlua/applet: Use the sedesc to report and
detect end of processing")

In fact in 31572229e 2 mistakes were made during the refactoring.
Indeed, both in hlua_applet_tcp_fct() (which is involved in the reproducer
above) and hlua_applet_http_fct(), the request (buffer) is now
systematically consumed when returning from the function, which wasn't the
case prior to this commit: when HLUA_E_AGAIN is returned, it means a
yield was requested and that the processing is not done yet, thus we
should not consume any data, like we did prior to the refactoring.

Big thanks to Jacques who did a great job reproducing and reporting this
issue on the mailing list.

[1]: https://www.mail-archive.com/haproxy@formilux.org/msg45778.html

It should be backported up to 2.8 with commit 31572229e
2025-04-17 14:40:34 +02:00
Amaury Denoyelle
2c3d656f8d MEDIUM: h3: use absolute URI form with :authority
Change the representation of the start-line URI when parsing a HTTP/3
request into HTX. Adopt the same conversion as HTTP/2. If :authority
header is used (default case), the URI is encoded using absolute-form,
with scheme, host and path concatenated. If only a plain host header is
used instead, fallback to the origin form.

This commit may cause some configuration to be broken if parsing is
performed on the URI. Indeed, now most of the HTTP/3 requests will be
represented with an absolute-form URI at the stream layer.

Note that prior to this commit a check was performed on the path used as
URI to ensure that it did not contain any invalid characters. Now, this
is directly performed on the URI itself, which may include the path.

This must not be backported.
2025-04-16 18:32:00 +02:00
Amaury Denoyelle
1faa1285aa BUG/MINOR: h3: reject request URI with invalid characters
Ensure that the HTX start-line generated after parsing an HTTP/3 request
does not contain any invalid character, i.e. control or whitespace
characters.
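
A minimal sketch of such a check, assuming that "control or whitespace
characters" means bytes 0x00 to 0x20 plus 0x7f. The function name is
hypothetical and the real parser may use a different character table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if <uri> contains a control character (0x00-0x1f, 0x7f)
 * or a space, in which case the request would be rejected.
 */
bool uri_has_invalid_char(const char *uri, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)uri[i];

        if (c <= 0x20 || c == 0x7f)
            return true;
    }
    return false;
}
```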

Note that for now path is used directly as URI. Thus, the check is
performed directly over it. A patch will change this to generate an
absolute-form URI in most cases, but it won't be backported to avoid
configuration breaking in stable versions.

This must be backported up to 2.6.
2025-04-16 18:32:00 +02:00
Amaury Denoyelle
fc28fe7191 BUG/MINOR: h3: reject invalid :path in request
RFC 9114 specifies some requirements for :path pseudo-header when using
http or https scheme. This commit enforces this by rejecting a request
if needed. Thus, path cannot be empty, and it must either start with a
'/' character or contain only '*'.

This must be backported up to 2.6.
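
The rule on :path can be sketched as below. The function name is
hypothetical, and "contain only '*'" is interpreted here as the path being
exactly "*", which is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if <path> is acceptable for an http or https scheme:
 * not empty, and either starting with '/' or being exactly "*".
 */
bool h3_path_ok_sketch(const char *path, size_t len)
{
    if (len == 0)
        return false;
    if (path[0] == '/')
        return true;
    return len == 1 && path[0] == '*';
}
```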
2025-04-16 18:31:55 +02:00
Amaury Denoyelle
6403bfbce8 BUG/MINOR: h3: filter upgrade connection header
As specified in RFC 9114, connection headers require special care in
HTTP/3. When a request is received with connection headers, the stream
is immediately closed. Conversely, when translating the response from
HTX, such headers are not encoded but silently ignored.

However, "upgrade" was not listed in connection headers. This commit
fixes this by adding a check on it both on request parsing and response
encoding.

This must be backported up to 2.6.
2025-04-16 18:31:04 +02:00
Amaury Denoyelle
bd3587574d BUG/MEDIUM: h3: trim whitespaces in header value prior to QPACK encoding
This commit does a similar job to the previous one, but it acts now on
the response path. Any leading or trailing whitespace characters from a
HTX block header value are removed, prior to the header encoding via
QPACK.

This must be backported up to 2.6.
2025-04-16 18:31:04 +02:00
Amaury Denoyelle
a17e5b27c0 BUG/MEDIUM: h3: trim whitespaces when parsing headers value
Remove any leading and trailing whitespace from header field values
prior to inserting a new HTX header block. This is done when parsing a
HEADERS frame, both as headers and trailers.
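
A minimal sketch of the trimming step, assuming that whitespace means space
and horizontal tab. The struct and function names are hypothetical; the
value is not copied, only the start pointer and length are adjusted.

```c
#include <stddef.h>

/* Hypothetical pointer + length view over a header value. */
struct span {
    const char *ptr;
    size_t len;
};

static inline int is_ows(char c)
{
    return c == ' ' || c == '\t';
}

/* Returns the sub-range of <v> without leading and trailing whitespace. */
struct span trim_header_value(const char *v, size_t len)
{
    while (len && is_ows(*v)) {
        v++;
        len--;
    }
    while (len && is_ows(v[len - 1]))
        len--;
    return (struct span){ v, len };
}
```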

This must be backported up to 2.6.
2025-04-16 18:31:04 +02:00
William Lallemand
8efafe76a3 MINOR: acme: free acme_ctx once the task is done
Free the acme_ctx task context once the task is done.
It frees everything but the config and the httpclient,
everything else is freed.

The ckch_store is freed in case of error, but when the task is
successful, the ptr is set to NULL to prevent the free once inserted in
the tree.
2025-04-16 18:08:01 +02:00
William Lallemand
e778049ffc MINOR: acme: register the task in the ckch_store
This patch registers the task in the ckch_store so we don't run 2 tasks
at the same time for a given certificate.

Move the task creation under the lock and check if there was already a
task under the lock.
2025-04-16 17:12:43 +02:00
William Lallemand
115653bfc8 BUG/MINOR: acme/cli: fix certificate name in error message
The acme command had a new parameter so the certificate name is not
correct anymore because args[1] is not the certificate value anymore.
2025-04-16 17:06:52 +02:00
William Lallemand
39088a7806 MINOR: acme: add a success message to the logs
Add a success log when the certificate was updated.

Ex:

  acme: foobar.pem: Successful update of the certificate.
2025-04-16 14:51:18 +02:00
William Lallemand
31a1d13802 MINOR: acme: emit logs instead of ha_notice
Emit logs using the global logs when the ACME task failed or retries,
instead of using ha_notice().
2025-04-16 14:39:39 +02:00
William Lallemand
f36f9ca21c DOC: configuration: specify limitations of ACME for 3.2
Specify the version for which the limitation applies.
2025-04-16 14:30:45 +02:00
William Lallemand
608eb3d090 BUG/MINOR: acme: fix the exponential backoff of retries
Exponential backoff values were multiplied by 3000 instead of 3 with a
second to ms conversion, leading to a 9000000ms value at the 2nd
attempt.

Fix the issue by setting the value in seconds and converting the value
in tick_add().
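
The arithmetic can be illustrated with the sketch below. It is a
reconstruction from the figures given above, not the actual ACME code: both
function names are hypothetical, and the loop form is an assumption chosen
to reproduce the 9000000ms value at the 2nd attempt.

```c
/* Buggy shape: the delay is kept in milliseconds and the factor already
 * includes the second-to-ms conversion, so the conversion compounds.
 */
unsigned int backoff_buggy_ms(unsigned int attempt)
{
    unsigned int delay = 1;

    for (unsigned int i = 0; i < attempt; i++)
        delay *= 3000;
    return delay;
}

/* Fixed shape: the delay is kept in seconds, multiplied by 3 per attempt,
 * and converted to milliseconds only once, when the timer is armed.
 */
unsigned int backoff_fixed_ms(unsigned int attempt)
{
    unsigned int delay_s = 1;

    for (unsigned int i = 0; i < attempt; i++)
        delay_s *= 3;
    return delay_s * 1000;
}
```

Both variants give 3000ms for the first attempt; they only diverge from the
second one, where the fixed variant yields 9000ms.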

No backport needed.
2025-04-16 14:20:00 +02:00
William Lallemand
7814a8b446 BUG/MINOR: acme: key not restored upon error in acme_res_certificate() V2
When receiving the final certificate, it needs to be loaded by
ssl_sock_load_pem_into_ckch(). However this function will remove any
existing private key in the struct ckch_store.

In order to fix the issue, the ptr to the key is swapped with a NULL
ptr, and restored once the new certificate is committed.

However there is a discrepancy when
ssl_sock_load_pem_into_ckch() fails: the pointer is lost.

This patch fixes the issue by restoring the pointer in the error path.

This must fix issue #2933.
2025-04-16 14:05:04 +02:00
William Lallemand
e21a165af6 Revert "BUG/MINOR: acme: key not restored upon error in acme_res_certificate()"
This reverts commit 7a43094f8d8fe3c435ecc003f07453dd9de8134a.

Part of another incomplete patch was accidentally squashed into the patch.
2025-04-16 14:03:08 +02:00
William Lallemand
bea6235629 MEDIUM: sink: add a new dpapi ring buffer
Add a 1MB ring buffer called "dpapi" for communication with the
dataplane API. It would first be used to transmit ACME information to
the dataplane API but could be used for more.
2025-04-16 13:56:12 +02:00
William Lallemand
f6fc914fb6 DOC: configuration: fix a typo in ACME documentation
Fix "supposed" typo in ACME documentation.
2025-04-16 13:55:25 +02:00
Ilia Shipitsin
4dee087f19 CI: fedora rawhide: enable unit tests
Run the new make unit-tests on the CI.
2025-04-15 16:53:54 +02:00
Christopher Faulet
d160046e2c BUG/MEDIUM: http-ana: Report 502 from req analyzer only during rsp forwarding
A server abort must be handled by the request analyzers only when the
response forwarding was already started. Otherwise, it is the responsibility
of the response analyzer to detect this event. L7 retries and conditions to
decide to silently close a client connection are handled by this analyzer.

Because a reused server connection closed too early could be detected at
the wrong place, it was possible to get a 502/SH instead of a silent close,
preventing the client from safely retrying its request.

Thanks to this patch, we are able to silently close the client connection in
this case and eventually to perform a L7 retry.

This patch must be backported as far as 2.8.
2025-04-15 16:28:15 +02:00
Christopher Faulet
c672b2a297 BUG/MINOR: http-ana: Properly detect client abort when forwarding the response
During the response payload forwarding, if the back SC is closed, we try to
figure out if it is because of a client abort or a server abort. However,
the condition was not accurate, especially when abortonclose option is
set. Because of this issue, a server abort may be reported (SD-- in logs)
instead of a client abort (CD-- in logs).

The right way to detect a client abort when we try to forward the response
is to test if the back SC was shut down (SC_FL_SHUT_DOWN flag set) AND
aborted (SC_FL_ABRT_DONE flag set). When both of these flags are set, it means
the back connection underwent the shutdown, which should be converted to a
client abort at this stage.

This patch should be backported as far as 2.8. It should fix last strange SD
report in the issue #2749.
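
A minimal sketch of the "both flags set" test described above. Only the flag names come from this message; the numeric values and the helper name are placeholders chosen for the example.

```c
/* Placeholder values for this sketch; only the names are taken from the
 * commit message.
 */
#define SC_FL_ABRT_DONE  0x01u
#define SC_FL_SHUT_DOWN  0x02u

/* Returns 1 only when the back SC was both shut down and aborted,
 * 0 when none or only one of the two flags is set.
 */
static int sketch_is_client_abort(unsigned int back_sc_flags)
{
	const unsigned int both = SC_FL_ABRT_DONE | SC_FL_SHUT_DOWN;

	return (back_sc_flags & both) == both;
}
```

Comparing the masked value against the full mask, rather than testing it for non-zero, is what makes the check require both flags at once.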
2025-04-15 16:28:15 +02:00
William Lallemand
c291a5c73c BUILD: incompatible pointer type suspected with -DDEBUG_UNIT
src/jws.c: In function '__jws_init':
src/jws.c:594:38: error: passing argument 2 of 'hap_register_unittest' from incompatible pointer type [-Wincompatible-pointer-types]
  594 |         hap_register_unittest("jwk", jwk_debug);
      |                                      ^~~~~~~~~
      |                                      |
      |                                      int (*)(int,  char **)
In file included from include/haproxy/api.h:36,
                 from include/import/ebtree.h:251,
                 from include/import/ebmbtree.h:25,
                 from include/haproxy/jwt-t.h:25,
                 from src/jws.c:5:
include/haproxy/init.h:37:52: note: expected 'int (*)(void)' but argument is of type 'int (*)(int,  char **)'
   37 | void hap_register_unittest(const char *name, int (*fct)());
      |                                              ~~~~~~^~~~~~

GCC 15 is warning because the function pointer does not have its
arguments declared in the register function.

Should fix issue #2929.
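
For illustration, a registration helper whose callback type spells out its arguments avoids this class of error, since GCC 15 defaults to C23 where an empty parameter list means "no arguments". The sketch below is standalone and uses hypothetical names; it is not the hap_register_unittest() implementation.

```c
#include <stddef.h>

/* Callback type with its arguments spelled out. */
typedef int (*sketch_unittest_fct)(int argc, char **argv);

struct sketch_unittest {
	const char *name;
	sketch_unittest_fct fct;
};

static struct sketch_unittest sketch_tests[8];
static size_t sketch_nb_tests;

/* Hypothetical registration helper: returns 0 on success, -1 when the
 * table is full.
 */
static int sketch_register_unittest(const char *name, sketch_unittest_fct fct)
{
	if (sketch_nb_tests >= sizeof(sketch_tests) / sizeof(sketch_tests[0]))
		return -1;
	sketch_tests[sketch_nb_tests].name = name;
	sketch_tests[sketch_nb_tests].fct = fct;
	sketch_nb_tests++;
	return 0;
}

/* Example callback matching the prototype above. */
static int sketch_jwk_debug(int argc, char **argv)
{
	(void)argv;
	return argc;
}
```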
2025-04-15 15:49:44 +02:00
William Lallemand
05ebb448b5 CLEANUP: acme: stored value is overwritten before it can be used
>>>     CID 1609049:  Code maintainability issues  (UNUSED_VALUE)
   >>>     Assigning value "NULL" to "new_ckchs" here, but that stored value is overwritten before it can be used.
   592             struct ckch_store *old_ckchs, *new_ckchs = NULL;

Coverity reported an issue where a variable is initialized to NULL then
directly overwritten with another value. This doesn't harm but this patch
removes the useless initialization.

Must fix issue #2932.
2025-04-15 11:44:45 +02:00
William Lallemand
3866d3bd12 BUG/MINOR: acme: fix possible NULL deref
Task was dereferenced when setting ctx but was checked after.
This patch moves the setting of ctx after the check.

Should fix issue #2931
2025-04-15 11:41:58 +02:00
Willy Tarreau
3cbbf41cd8 MINOR: debug: detect call instructions and show the branch target in backtraces
In backtraces, sometimes it's difficult to know what was called by a
given point, because some functions can be fairly long making one
doubt about the correct pointer of unresolved ones, others might
just use a tail branch instead of a call + return, etc. On common
architectures (x86 and aarch64), it's not difficult to detect and
decode a relative call, so let's do it on both of these platforms
and show the branch location after a '>'. Example:

x86_64:
   call trace(19):
   |       0x6bd644 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95 > ha_thread_dump_one
   | 0x7feb3e5383a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
   | 0x7feb3e53748b <c0 b8 03 00 00 00 0f 05]: libpthread:__close+0x3b/0x8b
   |       0x7619e4 <44 89 ff e8 fc 97 d4 ff]: _fd_delete_orphan+0x1d4/0x1d6 > main-0x2130
   |       0x743862 <8b 7f 68 e8 8e e1 01 00]: sock_conn_ctrl_close+0x12/0x54 > fd_delete
   |       0x5ac822 <c0 74 05 4c 89 e7 ff d0]: main+0xff512
   |       0x5bc85c <48 89 ef e8 04 fc fe ff]: main+0x10f54c > main+0xff150
   |       0x5be410 <4c 89 e7 e8 c0 e1 ff ff]: main+0x111100 > main+0x10f2c0
   |       0x6ae6a4 <28 00 00 00 00 ff 51 58]: cli_io_handler+0x31524
   |       0x6aeab4 <7c 24 08 e8 fc fa ff ff]: sc_destroy+0x14/0x2a4 > cli_io_handler+0x31430
   |       0x6c685d <48 89 ef e8 43 82 fe ff]: process_chk_conn+0x51d/0x1927 > sc_destroy

aarch64:
   call trace(15):
   | 0xaaaaad0c1540 <60 6a 60 b8 c3 fd ff 97]: debug_handler+0x9c/0xbc > ha_thread_dump_one
   | 0xffffa8c177ac <c2 e0 3b d5 1f 20 03 d5]: linux-vdso:__kernel_rt_sigreturn
   | 0xaaaaad0b0964 <c0 03 5f d6 d2 ff ff 97]: cli_io_handler+0x28e44 > sedesc_new
   | 0xaaaaad0b22a4 <00 00 80 d2 94 f9 ff 97]: sc_new_from_strm+0x1c/0x54 > cli_io_handler+0x28dd0
   | 0xaaaaad0167e8 <21 00 80 52 a9 6e 02 94]: stream_new+0x258/0x67c > sc_new_from_strm
   | 0xaaaaad0b21f8 <e1 03 13 aa e7 90 fd 97]: sc_new_from_endp+0x38/0xc8 > stream_new
   | 0xaaaaacfda628 <21 18 40 f9 e7 5e 03 94]: main+0xcaca8 > sc_new_from_endp
   | 0xaaaaacfdb95c <42 c0 00 d1 02 f3 ff 97]: main+0xcbfdc > main+0xc8be0
   | 0xaaaaacfdd3f0 <e0 03 13 aa f5 f7 ff 97]: h1_io_cb+0xd0/0xb90 > main+0xcba40
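
The x86_64 lines above can be decoded by hand. The sketch below is a simplified, standalone illustration and not the code added by this patch (the function name is hypothetical): it looks at the 8 bytes preceding a return address and, when they end with a 5-byte relative call (opcode 0xe8 followed by a little-endian 32-bit displacement), computes the branch target.

```c
#include <stdint.h>

/* Given the 8 bytes that precede a return address, check whether they end
 * with an x86 relative call (0xe8 + rel32). On success, store the branch
 * target in <target> and return 1. Return 0 otherwise.
 */
static int sketch_decode_x86_call(uint64_t ret_addr, const uint8_t before[8],
                                  uint64_t *target)
{
	uint32_t rel;

	if (before[3] != 0xe8)
		return 0;

	rel = (uint32_t)before[4] |
	      ((uint32_t)before[5] << 8) |
	      ((uint32_t)before[6] << 16) |
	      ((uint32_t)before[7] << 24);

	/* The displacement is signed and relative to the return address. */
	*target = ret_addr + (uint64_t)(int64_t)(int32_t)rel;
	return 1;
}
```

Applied to the first line of the x86_64 trace, the displacement "ac f7 ff ff" is -0x854, which gives 0x6bd644 - 0x854 = 0x6bcdf0 as the call target. Note that this is a heuristic: a 0xe8 byte at that position may also belong to a different, longer instruction.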
2025-04-14 20:06:48 +02:00
Willy Tarreau
9740f15274 MINOR: debug: in call traces, dump the 8 bytes before the return address, not after
In call traces, we're interested in seeing the code that was executed, not
the code that was not yet. The return address is where the CPU will return
to, so we want to see the bytes that precede this location. In the example
below on x86 we can clearly see a number of direct "call" instructions
(0xe8 + 4 bytes). There are also indirect calls (0xffd0) that cannot be
exploited but it gives insights about where the code branched, which will
not always be the function above it if that one used tail branching for
example. Here's an example dump output:

         call ------------,
                          v
       0x6bd634 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95
 0x7fa4ea2593a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
       0x752132 <00 00 00 00 00 90 41 55]: htx_remove_blk+0x2/0x354
       0x5b1a2c <4c 89 ef e8 04 07 1a 00]: main+0x10471c
       0x5b5f05 <48 89 df e8 8b b8 ff ff]: main+0x108bf5
       0x60b6f4 <89 ee 4c 89 e7 41 ff d0]: tcpcheck_eval_send+0x3b4/0x14b2
       0x610ded <00 00 00 e8 53 a5 ff ff]: tcpcheck_main+0x7dd/0xd36
       0x6c5ab4 <48 89 df e8 5c ab f4 ff]: wake_srv_chk+0xc4/0x3d7
       0x6c5ddc <48 89 f7 e8 14 fc ff ff]: srv_chk_io_cb+0xc/0x13
2025-04-14 19:28:22 +02:00
Willy Tarreau
003f5168e4 MINOR: tools: let dump_addr_and_bytes() support dumping before the offset
For code dumps, dumping from the return address is pointless, what is
interesting is to dump before the return address to read the machine
code that was executed before branching. Let's just make the function
support negative sizes to indicate that we're dumping this number of
bytes to the address instead of this number from the address. In this
case, in order to distinguish them, we're using a '<' instead of '[' to
start the series of bytes, indicating where the bytes expand and where
they stop. For example we can now see this:

       0x6bd634 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95
 0x7fa4ea2593a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
       0x752132 <00 00 00 00 00 90 41 55]: htx_remove_blk+0x2/0x354
       0x5b1a2c <4c 89 ef e8 04 07 1a 00]: main+0x10471c
       0x5b5f05 <48 89 df e8 8b b8 ff ff]: main+0x108bf5
       0x60b6f4 <89 ee 4c 89 e7 41 ff d0]: tcpcheck_eval_send+0x3b4/0x14b2
       0x610ded <00 00 00 e8 53 a5 ff ff]: tcpcheck_main+0x7dd/0xd36
       0x6c5ab4 <48 89 df e8 5c ab f4 ff]: wake_srv_chk+0xc4/0x3d7
       0x6c5ddc <48 89 f7 e8 14 fc ff ff]: srv_chk_io_cb+0xc/0x13
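
The convention can be sketched with a small standalone formatter. This is an assumption-laden illustration: the name, signature and return value below are hypothetical and do not reproduce dump_addr_and_bytes(). A positive count dumps forward from the address and opens with '[', a negative count dumps the bytes that end at the address and opens with '<'.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical formatter writing a hex dump into <out> (<size> bytes).
 *   n > 0: dump n bytes starting at <addr>, opened with '['.
 *   n < 0: dump -n bytes ending just before <addr>, opened with '<'.
 * Returns the number of characters written, or -1 if <out> is too small.
 */
static int sketch_dump_bytes(char *out, size_t size,
                             const unsigned char *addr, int n)
{
	const unsigned char *p = (n < 0) ? addr + n : addr;
	int count = (n < 0) ? -n : n;
	size_t pos;
	int i, ret;

	ret = snprintf(out, size, "%c", (n < 0) ? '<' : '[');
	if (ret < 0 || (size_t)ret >= size)
		return -1;
	pos = (size_t)ret;

	for (i = 0; i < count; i++) {
		ret = snprintf(out + pos, size - pos, "%s%02x", i ? " " : "", p[i]);
		if (ret < 0 || (size_t)ret >= size - pos)
			return -1;
		pos += (size_t)ret;
	}

	ret = snprintf(out + pos, size - pos, "]");
	if (ret < 0 || (size_t)ret >= size - pos)
		return -1;
	return (int)(pos + (size_t)ret);
}
```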
2025-04-14 19:25:27 +02:00
Willy Tarreau
b708345c17 DEBUG: counters: add the ability to enable/disable updating the COUNT_IF counters
These counters can have a noticeable cost on large machines, though not
dramatic. There's no single good choice to keep them enabled or disabled.
This commit adds multiple choices:
  - DEBUG_COUNTERS set to 2 will automatically enable them by default, while
    1 will disable them by default
  - the global "debug.counters on/off" will allow to change the setting at
    boot, regardless of DEBUG_COUNTERS as long as it was at least 1.
  - the CLI "debug counters on/off" will also allow to change the value at
    run time, allowing to observe a phenomenon while it's happening, or to
    disable counters if it's suspected that their cost is too high

Finally, the "debug counters" command will append "(stopped)" at the end
of the CNT lines when these counters are stopped.

Note that the whole mechanism would easily support being extended to all
counter types by specifying the types to apply to, but it doesn't seem
useful at all and would require the user to also type "cnt" on debug
lines. This may easily be changed in the future if it's found relevant.
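
The combination of a build-time level and a run-time switch can be sketched as below. This is a simplified illustration with hypothetical names, not the real COUNT_IF() macro; the default level of 1 is an assumption made for the example.

```c
/* Build-time level: 0 = compiled out, 1 = available but stopped by
 * default, 2 = available and running by default.
 */
#ifndef DEBUG_COUNTERS
#define DEBUG_COUNTERS 1
#endif

#if DEBUG_COUNTERS >= 1
/* Run-time switch, playing the role of the on/off settings above. */
static int sketch_counters_on = (DEBUG_COUNTERS >= 2);

#define SKETCH_COUNT_IF(cond, ctr) \
	do { if (sketch_counters_on && (cond)) (ctr)++; } while (0)
#else
/* Level 0: the macro expands to nothing and costs nothing. */
#define SKETCH_COUNT_IF(cond, ctr) do { } while (0)
#endif
```

Testing the switch before the condition keeps the cost of a stopped counter down to a single read of a mostly-constant variable.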
2025-04-14 19:02:13 +02:00
Willy Tarreau
a142adaba0 DEBUG: counters: make COUNT_IF() only appear at DEBUG_COUNTERS>=1
COUNT_IF() is convenient but can be heavy since some of them were found
to trigger often (roughly 1 counter per request on avg). This might even
have an impact on large setups due to the cost of a shared cache line
bouncing between multiple cores. For now there's no way to disable it,
so let's only enable it when DEBUG_COUNTERS is 1 or above. A future
change will make it configurable.
2025-04-14 19:02:13 +02:00
Willy Tarreau
61d633a3ac DEBUG: rename DEBUG_GLITCHES to DEBUG_COUNTERS and enable it by default
Till now the per-line glitches counters were only enabled with the
confusingly named DEBUG_GLITCHES (which would not turn glitches off
when disabled). Let's instead change it to DEBUG_COUNTERS and make sure
it's enabled by default (though it can still be disabled with
-DDEBUG_GLITCHES=0 just like for DEBUG_STRICT). It will later be
expanded to cover more counters.
2025-04-14 19:02:13 +02:00
Willy Tarreau
a8148c313a DEBUG: init: report invalid characters in debug description strings
It's easy to leave some trailing \n or even other characters that can
mangle the debug output. Let's verify at boot time that the debug sections
are clean by checking for chars 0x20 to 0x7e inclusive. This is very simple
to do and it managed to find another one in a multi-line message:

  [WARNING]  (23696) : Invalid character 0x0a at position 96 in description string at src/cli.c:2516 _send_status()

This way new offending code will be spotted before being committed.
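
The check itself is small. A minimal sketch, assuming a hypothetical helper name and return convention (the real boot-time code is not shown in this message):

```c
#include <stddef.h>

/* Returns the position of the first character outside the 0x20..0x7e
 * range, or -1 when the whole string is clean.
 */
static long sketch_first_invalid_char(const char *desc)
{
	size_t i;

	for (i = 0; desc[i] != '\0'; i++) {
		unsigned char c = (unsigned char)desc[i];

		if (c < 0x20 || c > 0x7e)
			return (long)i;
	}
	return -1;
}
```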
2025-04-14 19:02:13 +02:00
Willy Tarreau
9efc60c887 DOC: config: add the missing "force-cfg-parser-pause" to the global kw index
It was documented but missing from the index, let's add it. This can be
backported to 3.1.
2025-04-14 19:02:13 +02:00
Willy Tarreau
640a699804 DOC: config: add the missing "profiling.memory" to the global kw index
It was in the description but not in the index. This can be backported to
all versions where it applies.
2025-04-14 19:02:13 +02:00
Willy Tarreau
23705564ae BUG/MINOR: debug: remove the trailing \n from BUG_ON() statements
These ones were added by mistake during the change of the cfgparse
mechanism in 3.1, but they're corrupting the output of "debug counters"
by leaving stray ']' on their own lines. We could possibly check them
all once at boot but it doesn't seem worth it.

This should be backported to 3.1.
2025-04-14 19:02:13 +02:00
William Lallemand
f9390a689f DOC: acme: explain how to configure and run ACME
Add configuration about the acme section in the configuration manual, as
well as the acme command in the management guide.
2025-04-14 16:14:57 +02:00
William Lallemand
7119b5149d MINOR: acme: default to 2048bits for RSA
Change the default RSA value to 2048 bits.
2025-04-14 16:14:57 +02:00
Valentine Krasnobaeva
08efe8cd24 BUG/MINOR: thread: protect thread_cpus_enabled_at_boot with USE_THREAD
The following error is triggered at linker invocation, when we try to compile with
USE_THREAD=0 and -O0.

  make -j 8 TARGET=linux-glibc USE_LUA=1 USE_PCRE2=1 USE_LINUX_CAP=1 \
  		USE_MEMORY_PROFILING=1 OPT_CFLAGS=-O0  USE_THREAD=0

  /usr/bin/ld: src/thread.o: warning: relocation against `thread_cpus_enabled_at_boot' in read-only section `.text'
  /usr/bin/ld: src/thread.o: in function `thread_detect_count':
  /home/vk/projects/haproxy/src/thread.c:1619: undefined reference to `thread_cpus_enabled_at_boot'
  /usr/bin/ld: /home/vk/projects/haproxy/src/thread.c:1619: undefined reference to `thread_cpus_enabled_at_boot'
  /usr/bin/ld: /home/vk/projects/haproxy/src/thread.c:1620: undefined reference to `thread_cpus_enabled_at_boot'
  /usr/bin/ld: warning: creating DT_TEXTREL in a PIE
  collect2: error: ld returned 1 exit status
  make: *** [Makefile:1044: haproxy] Error 1

thread_cpus_enabled_at_boot is only available when we compiled with
USE_THREAD=1, which is the default for most targets now.

In some cases, we need to recompile in mono-thread mode, thus
thread_cpus_enabled_at_boot should be protected with USE_THREAD in
thread_detect_count().

thread_detect_count() is always called during the process initialization
regardless of multi-thread support. It sets some defaults in global.nbthread
and global.nbtgroups.

This patch is related to GitHub issue #2916.
No need to be backported as it was added in 3.2-dev9 version.
2025-04-14 16:03:21 +02:00
William Lallemand
7a43094f8d BUG/MINOR: acme: key not restored upon error in acme_res_certificate()
When receiving the final certificate, it needs to be loaded by
ssl_sock_load_pem_into_ckch(). However this function will remove any
existing private key in the struct ckch_store.

In order to fix the issue, the ptr to the key is swapped with a NULL
ptr, and restored once the new certificate is committed.

However there is a discrepancy when ssl_sock_load_pem_into_ckch() fails:
the key is not restored and the pointer is lost.

This patch fixes the issue by restoring the pointer in the error path.

This must fix issue #2933.
2025-04-14 10:55:44 +02:00
Willy Tarreau
4a44d592ae BUG/MINOR: cpu-topo: check the correct variable for NULL after malloc()
We were testing ha_cpu_topo instead of ha_cpu_clusters after an allocation,
making the check ineffective.

No backport is needed.
2025-04-12 18:23:29 +02:00
William Lallemand
39c05cedff BUILD: acme: enable the ACME feature when JWS is present
The ACME feature depends on the JWS, which currently does not work with
every SSL libraries. This patch only enables ACME when JWS is enabled.
2025-04-12 01:39:03 +02:00
William Lallemand
a96cbe32b6 MINOR: acme: schedule retries with a timer
Schedule the retries with a 3s exponential timer. This is a temporary
measure as the client should follow the Retry-After field for
rate-limiting for every request (https://datatracker.ietf.org/doc/html/rfc8555#section-6.6)
2025-04-12 01:39:03 +02:00
William Lallemand
768458a79e MEDIUM: acme: replace the previous ckch instance with new ones
This is the last step needed to have a usable ACME certificate in haproxy.

It looks for the previous certificate, locks the "BIG CERTIFICATE LOCK",
copies every instance, deploys new ones, and removes the previous one.
This is done in one step in a function which does not yield, so it could
be problematic if you have thousands of instances to handle.

It still lacks the rate limit which is mandatory to be used in
production, and more cleanup and deinit.
2025-04-12 01:39:03 +02:00
William Lallemand
9505b5bdf0 MINOR: acme: copy the original ckch_store
Copy the original ckch_store instead of creating a new one. This allows
to inherit the ckch_conf from the previous structure when doing a
ckchs_dup(). The ckch_conf contains the SAN for ACME.

Free the previous PKEY since a new one is generated.
2025-04-12 01:39:03 +02:00
William Lallemand
5b85b81d84 MINOR: ssl/ckch: handle ckch_conf in ckchs_dup() and ckch_conf_clean()
Handle new members of the ckch_conf in ckchs_dup() and
ckch_conf_clean().

This could be automated at some point since we have the description of
all types in ckch_conf_kws.
2025-04-12 01:39:03 +02:00
William Lallemand
73ab78e917 BUG/MINOR: acme: ckch_conf_acme_init() when no filename
Do not try to strdup the configuration filename if there is none.

No backport needed.
2025-04-12 01:39:03 +02:00
William Lallemand
5500bda9eb MINOR: acme: implement retrieval of the certificate
Once the Order status is "valid", the certificate URL is accessible,
this patch implements the retrieval of the certificate which is stored
in ctx->store.
2025-04-12 01:39:03 +02:00
William Lallemand
27fff179fe MINOR: acme: verify the order status once finalized
This implements a call to the order status to check if the certificate
is ready.
2025-04-12 01:39:03 +02:00
William Lallemand
680222b382 MINOR: acme: finalize by sending the CSR
This patch does the finalize step of the ACME task.
This encodes the CSR into base64 format and sends it to the finalize URL.

https://www.rfc-editor.org/rfc/rfc8555#section-7.4
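
For reference, RFC 8555 section 7.4 asks for the CSR to be sent as the base64url encoding of its DER form, that is, base64 with '-' and '_' in place of '+' and '/' and without padding. The encoder below is a standalone sketch with a hypothetical name; haproxy relies on its own helpers, which are not reproduced here.

```c
#include <stddef.h>

/* Encode <len> bytes from <in> as unpadded base64url into <out>.
 * Returns the encoded length, or -1 if <out> (<size> bytes, trailing
 * NUL included) is too small.
 */
static long sketch_b64url_encode(const unsigned char *in, size_t len,
                                 char *out, size_t size)
{
	static const char tab[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
	size_t need = (len / 3) * 4 + (len % 3 ? len % 3 + 1 : 0);
	size_t i, o = 0;
	unsigned int v;

	if (size < need + 1)
		return -1;

	/* Full groups of 3 input bytes give 4 output characters. */
	for (i = 0; i + 2 < len; i += 3) {
		v = ((unsigned int)in[i] << 16) |
		    ((unsigned int)in[i + 1] << 8) | in[i + 2];
		out[o++] = tab[(v >> 18) & 63];
		out[o++] = tab[(v >> 12) & 63];
		out[o++] = tab[(v >> 6) & 63];
		out[o++] = tab[v & 63];
	}

	/* 1 or 2 trailing bytes give 2 or 3 characters, with no padding. */
	if (len - i == 1) {
		v = (unsigned int)in[i] << 16;
		out[o++] = tab[(v >> 18) & 63];
		out[o++] = tab[(v >> 12) & 63];
	} else if (len - i == 2) {
		v = ((unsigned int)in[i] << 16) | ((unsigned int)in[i + 1] << 8);
		out[o++] = tab[(v >> 18) & 63];
		out[o++] = tab[(v >> 12) & 63];
		out[o++] = tab[(v >> 6) & 63];
	}
	out[o] = '\0';
	return (long)o;
}
```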
2025-04-12 01:29:27 +02:00
William Lallemand
de5dc31a0d MINOR: acme: generate the CSR in a X509_REQ
Generate the X509_REQ using the generated private key and the SAN from
the configuration. This is only done once before the task is started.

It could probably be done at the beginning of the task with the private
key generation once we have a scheduler instead of a CLI command.
2025-04-12 01:29:27 +02:00
William Lallemand
00ba62df15 MINOR: acme: implement a check on the challenge status
This patch implements a check on the challenge URL: once haproxy has asked
for the challenge to be verified, it must check the status of the
challenge resolution and whether any error occurred.
2025-04-12 01:29:27 +02:00
William Lallemand
711a13a4b4 MINOR: acme: send the request for challenge ready
This patch sends the "{}" message to specify that a challenge is ready.
It iterates on every challenge URL in the authorization list from the
acme_ctx.

This allows the ACME server to proceed to the challenge validation.
https://www.rfc-editor.org/rfc/rfc8555#section-7.5.1
2025-04-12 01:29:27 +02:00
William Lallemand
ae0bc88f91 MINOR: acme: get the challenges object from the Auth URL
This patch implements the retrieval of the challenges objects on the
authorizations URLs. The challenges object contains a token and a
challenge url that need to be called once the challenge is setup.

Each authorization URL contains multiple challenge objects, usually one
per challenge type (HTTP-01, DNS-01, ALPN-01...). We only need to keep the
one that is relevant to our configuration.
2025-04-12 01:29:27 +02:00
William Lallemand
7231bf5726 MINOR: acme: allow empty payload in acme_jws_payload()
Some ACME requests are required to have a JWS with an empty payload,
let's be more flexible and allow this function to have an empty buffer.
2025-04-12 01:29:27 +02:00
William Lallemand
4842c5ea8c MINOR: acme: newOrder request retrieve authorizations URLs
This patch implements the newOrder action in the ACME task. In order to
ask for a new certificate, a list of SANs is sent as a JWS payload.
The ACME server replies with a list of Authorization URLs. One Authorization
is created per SAN on an Order.

The authorization URLs are stored in a linked list of 'struct acme_auth'
in acme_ctx, so we can get the challenge URLs from them later.

The location header is also stored as it is the URL of the order object.

https://datatracker.ietf.org/doc/html/rfc8555#section-7.4
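
As an illustration of what such a payload looks like before it is wrapped in a JWS, the sketch below builds the "identifiers" list defined by RFC 8555 section 7.4 from a list of DNS names. The function name is hypothetical, and the names are assumed to be plain host names that need no JSON escaping; this is not the code from the patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Build a newOrder payload for <nb> DNS names into <out> (<size> bytes).
 * Returns the payload length, or -1 if <out> is too small.
 */
static int sketch_new_order_payload(char *out, size_t size,
                                    const char *const *names, size_t nb)
{
	size_t pos, i;
	int ret;

	ret = snprintf(out, size, "{\"identifiers\":[");
	if (ret < 0 || (size_t)ret >= size)
		return -1;
	pos = (size_t)ret;

	/* One identifier object per SAN, separated by commas. */
	for (i = 0; i < nb; i++) {
		ret = snprintf(out + pos, size - pos,
		               "%s{\"type\":\"dns\",\"value\":\"%s\"}",
		               i ? "," : "", names[i]);
		if (ret < 0 || (size_t)ret >= size - pos)
			return -1;
		pos += (size_t)ret;
	}

	ret = snprintf(out + pos, size - pos, "]}");
	if (ret < 0 || (size_t)ret >= size - pos)
		return -1;
	return (int)(pos + (size_t)ret);
}
```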
2025-04-12 01:29:27 +02:00
William Lallemand
04d393f661 MINOR: acme: generate new account
The new account action in the ACME task uses the same function as the
chkaccount, but onlyReturnExisting is not sent in this case!
2025-04-12 01:29:27 +02:00
William Lallemand
7f9bf4d5f7 MINOR: acme: check if the account exist
This patch implements the retrieval of the KID (account identifier) using
the pkey.

A request is sent to the newAccount URL using the onlyReturnExisting
option, which allows to get the kid of an existing account.

acme_jws_payload() implements a way to generate a JWS payload using the
nonce, pkey and provided URI.
2025-04-12 01:29:27 +02:00
William Lallemand
0aa6dedf72 MINOR: acme: handle the nonce
ACME requests are supposed to be sent with a Nonce, the first Nonce
should be retrieved using the newNonce URI provided by the directory.

This nonce is stored and must be replaced by the new one received in
each response.
2025-04-12 01:29:27 +02:00
William Lallemand
471290458e MINOR: acme: get the ACME directory
The first request of the ACME protocol is getting the list of URLs for
the next steps.

This patch implements the first request and the parsing of the response.

The response is a JSON object so mjson is used to parse it.
2025-04-12 01:29:27 +02:00
William Lallemand
4780a1f223 MINOR: acme: the acme section is experimental
Allow the usage of the acme section only when
expose-experimental-directives is set.
2025-04-12 01:29:27 +02:00
William Lallemand
b8209cf697 MINOR: acme/cli: add the 'acme renew' command
The "acme renew" command launches the ACME task for a given certificate.

The CLI parser generates a new private key using the parameters from the
acme section.
2025-04-12 01:29:27 +02:00
William Lallemand
bf6a39c4d1 MINOR: acme: add private key configuration
This commit allows to configure the generated private keys, you can
configure the keytype (RSA/ECDSA), the number of bits or the curves.

Example:

    acme LE
        uri https://acme-staging-v02.api.letsencrypt.org/directory
        account account.key
        contact foobar@example.com
        challenge HTTP-01
        keytype ECDSA
        curves P-384
2025-04-12 01:29:27 +02:00
William Lallemand
2e8c350b95 MINOR: acme: add configuration for the crt-store
Add new acme keywords for the ckch_conf parsing, which will be used on a
crt-store, a crt line in a frontend, or even a crt-list.

The cfg_postparser_acme() is called in order to check if a section referenced
elsewhere really exists in the config file.
2025-04-12 01:29:27 +02:00
William Lallemand
077e2ce84c MINOR: acme: add the acme section in the configuration parser
Add a configuration parser for the new acme section, the section is
configured this way:

    acme letsencrypt
        uri https://acme-staging-v02.api.letsencrypt.org/directory
        account account.key
        contact foobar@example.com
        challenge HTTP-01

When unspecified, the challenge defaults to HTTP-01, and the account key
to "<section_name>.account.key".

Sections are stored in a linked list containing acme_cfg structures, the
configuration parsing is mostly resolved in the postsection parser
cfg_postsection_acme() which is called after the parsing of an acme section.
2025-04-12 01:29:27 +02:00
William Lallemand
20718f40b6 MEDIUM: ssl/ckch: add filename and linenum argument to crt-store parsing
Add filename and linenum arguments to the crt-store / ckch_conf parsing.

It allows to use them in the parsing function so that it can emit errors.
2025-04-12 01:29:27 +02:00
Willy Tarreau
00c967fac4 MINOR: master/cli: support bidirectional communications with workers
Some rare commands in the worker require to keep their input open and
terminate when it's closed ("show events -w", "wait"). Others maintain
a per-session context ("set anon on"). But in its default operation
mode, the master CLI passes commands one at a time to the worker, and
closes the CLI's input channel so that the command can immediately
close upon response. This effectively prevents these two specific cases
from being used.

Here the approach that we take is to introduce a bidirectional mode to
connect to the worker, where everything sent to the master is immediately
forwarded to the worker (including the raw command), allowing to queue
multiple commands at once in the same session, and to continue to watch
the input to detect when the client closes. It must be a client's choice
however, since doing so means that the client cannot batch many commands
at once to the master process, but must wait for these commands to complete
before sending new ones. For this reason we use the prefix "@@<pid>" for
this. It works exactly like "@" except that it maintains the channel
open during the whole execution. Similarly to "@<pid>" with no command,
"@@<pid>" will simply open an interactive CLI session to the worker, that
will be ended by "quit" or by closing the connection. This can be convenient
for the user, and possibly for clients willing to dedicate a connection to
the worker.
2025-04-11 16:09:17 +02:00
Willy Tarreau
b6a8abcd0b DOC: management: add a paragraph about the limitations of the '@' prefix
The '@' prefix permits to execute a single command at once in a worker.
It is very handy but comes with some limitations affecting rare commands,
which is better to be documented (one command per session, input closed)
since it can seldom have user-visible effects.
2025-04-11 16:09:17 +02:00
Willy Tarreau
e8267d1ce2 DOC: management: slightly clarify the prefix role of the '@' command
While the examples were clear, the text did not fully imply what was
reflected there. Better have the text explicitly mention that the
'@' command may be used as a prefix or wrapper in front of a command
as well as a standalone command.
2025-04-11 16:09:17 +02:00
Ilya Shipitsin
eed4116c07 CI: enable weekly QuicTLS build
QuicTLS started its own fork that does not depend on OpenSSL; let's add
that to the weekly builds.

ML: https://www.mail-archive.com/haproxy@formilux.org/msg45574.html
GH: https://github.com/quictls/quictls/issues/244
2025-04-11 16:01:45 +02:00
Willy Tarreau
a6982a898e [RELEASE] Released version 3.2-dev10
Released version 3.2-dev10 with the following main changes :
    - REORG: ssl: move curves2nid and nid2nist to ssl_utils
    - BUG/MEDIUM: stream: Fix a possible freeze during a forced shut on a stream
    - MEDIUM: stream: Save SC and channel flags earlier in process_steam()
    - BUG/MINOR: peers: fix expire learned from a peer not converted from ms to ticks
    - BUG/MEDIUM: peers: prevent learning expiration too far in futur from unsync node
    - CI: spell check: allow manual trigger
    - CI: codespell: add "pres" to spellcheck whitelist
    - CLEANUP: assorted typo fixes in the code, commits and doc
    - CLEANUP: atomics: remove support for gcc < 4.7
    - CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence()
    - TESTS: Fix build for filltab25.c
    - MEDIUM: ssl: replace "crt" lines by "ssl-f-use" lines
    - DOC: configuration: replace "crt" by "ssl-f-use" in listeners
    - MINOR: backend: mark srv as nonnull in alloc_dst_address()
    - BUG/MINOR: server: ensure check-reuse-pool is copied from default-server
    - MINOR: server: activate automatically check reuse for rhttp@ protocol
    - MINOR: check/backend: support conn reuse with SNI
    - MINOR: check: implement check-pool-conn-name srv keyword
    - MINOR: task: add thread safe notification_new and notification_wake variants
    - BUG/MINOR: hlua_fcn: fix potential UAF with Queue:pop_wait()
    - MINOR: hlua_fcn: register queue class using hlua_register_metatable()
    - MINOR: hlua: add core.wait()
    - MINOR: hlua: core.wait() takes optional delay paramater
    - MINOR: hlua: split hlua_applet_tcp_recv_yield() in two functions
    - MINOR: hlua: add AppletTCP:try_receive()
    - MINOR: hlua_fcn: add Queue:alarm()
    - MEDIUM: task: make notification_* API thread safe by default
    - CLEANUP: log: adjust _lf_cbor_encode_byte() comment
    - MEDIUM: ssl/crt-list: warn on negative wildcard filters
    - MEDIUM: ssl/crt-list: warn on negative filters only
    - BUILD: atomics: fix build issue on non-x86/non-arm systems
    - BUG/MINOR: log: fix CBOR encoding with LOG_VARTEXT_START() + lf_encode_chunk()
    - BUG/MEDIUM: sample: fix risk of overflow when replacing multiple regex back-refs
    - DOC: configuration: rework the crt-list section
    - MINOR: ring: support arbitrary delimiters through ring_dispatch_messages()
    - MINOR: ring/cli: support delimiting events with a trailing \0 on "show events"
    - DEV: h2: fix h2-tracer.lua nil value index
    - BUG/MINOR: backend: do not use the source port when hashing clientip
    - BUG/MINOR: hlua: fix invalid errmsg use in hlua_init()
    - MINOR: proxy: add setup_new_proxy() function
    - MINOR: checks: mark CHECKS-FE dummy frontend as internal
    - MINOR: flt_spoe: mark spoe agent frontend as internal
    - MEDIUM: tree-wide: avoid manually initializing proxies
    - MINOR: proxy: add deinit_proxy() helper func
    - MINOR: checks: deinit checks_fe upon deinit
    - MINOR: flt_spoe: deinit spoe agent proxy upon agent release
2025-04-11 10:04:00 +02:00
Aurelien DARRAGON
f3b231714f MINOR: flt_spoe: deinit spoe agent proxy upon agent release
Even though spoe agent proxy is statically allocated, it uses the proxy
API and is initialized like a regular proxy, thus specific cleanup is
required upon release. This is not tagged as a bug because as of now this
would only cause some minor memory leak upon deinit.

We check the presence of proxy->id to know if it was initialized since
we cannot rely on a pointer for that.
2025-04-10 22:10:31 +02:00
Aurelien DARRAGON
8a944d0e46 MINOR: checks: deinit checks_fe upon deinit
This is just to make valgrind and friends happy, leverage deinit_proxy()
for checks_fe proxy upon deinit to ensure proper cleanup.

We check the presence of proxy->id to know if it was initialized because
we cannot rely on a pointer for that.
2025-04-10 22:10:31 +02:00
Aurelien DARRAGON
fbfeb591f7 MINOR: proxy: add deinit_proxy() helper func
Same as free_proxy(), but does not free the base proxy pointer (ie: the
proxy itself may not be allocated)

Goal is to be able to cleanup statically allocated dummy proxies.
2025-04-10 22:10:31 +02:00
Aurelien DARRAGON
4194f756de MEDIUM: tree-wide: avoid manually initializing proxies
In this patch we try to use the proxy API init functions as much as
possible to avoid code redundancy and prevent proxy initialization
errors. As such, we prefer using alloc_new_proxy() and setup_new_proxy()
instead of manually allocating the proxy pointer and performing the
base init ourselves.
2025-04-10 22:10:31 +02:00
Aurelien DARRAGON
60f45564a1 MINOR: flt_spoe: mark spoe agent frontend as internal
spoe agent frontend is used by the agent internally, but it is not meant
to be directly exposed like user-facing proxies defined in the config.

As such, better mark it as internal using PR_CAP_INT capability to prevent
any mis-use.
2025-04-10 22:10:31 +02:00
Aurelien DARRAGON
5087048b6d MINOR: checks: mark CHECKS-FE dummy frontend as internal
CHECKS-FE frontend is a dummy frontend used to create checks sessions;
as such, it is internal and should not be exposed to the user.
Better mark it as internal using PR_CAP_INT capability to prevent
proxy API from ever exposing it.
2025-04-10 22:10:31 +02:00
Aurelien DARRAGON
e1cec655ee MINOR: proxy: add setup_new_proxy() function
Split alloc_new_proxy() in two functions: the preparing part is now
handled by setup_new_proxy() which can be called individually, while
alloc_new_proxy() takes care of allocating a new proxy struct and then
calling setup_new_proxy() with the freshly allocated proxy.
2025-04-10 22:10:31 +02:00
Aurelien DARRAGON
ea3c96369f BUG/MINOR: hlua: fix invalid errmsg use in hlua_init()
errmsg is used with memprintf and friends, thus it must be NULL
initialized before being passed to memprintf, else invalid read will
occur.

However in hlua_init() the errmsg value isn't initialized, let's fix that.

This is really minor because it would only cause issue on error paths,
yet it may be backported to all stable versions, just in case.
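
Why the NULL initialization matters can be shown with a small sketch. It
assumes a memprintf-like helper that frees the previous message before
storing the new one; sketch_memprintf() and sketch_init() below are
hypothetical stand-ins written for this illustration, not the real code.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memprintf-like helper: formats into a fresh
 * heap buffer, frees the previous *out and stores the new pointer there.
 * Because of the free(*out), *out must be NULL or a valid heap pointer
 * on entry.
 */
static char *sketch_memprintf(char **out, const char *fmt, ...)
{
	va_list ap;
	char *ret;
	int len;

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (len < 0)
		return NULL;

	ret = malloc((size_t)len + 1);
	if (ret) {
		va_start(ap, fmt);
		vsnprintf(ret, (size_t)len + 1, fmt, ap);
		va_end(ap);
	}
	/* the old message may be one of the arguments, so free it last */
	free(*out);
	*out = ret;
	return ret;
}

/* Hypothetical init routine showing the fixed pattern. */
static char *sketch_init(void)
{
	char *errmsg = NULL; /* the fix: never pass an uninitialized pointer */

	sketch_memprintf(&errmsg, "step %d failed", 1);
	sketch_memprintf(&errmsg, "%s (retry %d)", errmsg, 2);
	return errmsg; /* to be freed by the caller */
}
```

Without the "= NULL", the first call would free() whatever garbage happens
to be on the stack, which only shows up on error paths.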
2025-04-10 22:10:26 +02:00
Willy Tarreau
7b6df86a83 BUG/MINOR: backend: do not use the source port when hashing clientip
The server's "usesrc" keyword supports among other options "client"
and "clientip". The former means we bind to the client's IP and port
to connect to the server, while the latter means we bind to its IP
only. It's done in two steps, first alloc_bind_address() retrieves
the IP address and port, and second, tcp_connect_server() decides
to either bind to the IP only or IP+port.

The problem comes with idle connection pools, which hash all the
parameters: the hash is calculated before (and ideally without) calling
tcp_connect_server(), and it considers the whole struct sockaddr_storage
for the hash, except that both client and clientip entirely fill it with
the client's address. This means that both client and clientip make use
of the source port in the hash calculation, making idle connections
almost not reusable when using "usesrc clientip" while they should for
clients coming from the same source. A work-around is to force the
source port to zero using "tcp-request session set-src-port int(0)" but
it's ugly.

Let's fix this by properly zeroing the port for AF_INET/AF_INET6 addresses.

This can be backported to 2.4. Thanks to Sebastien Gross for providing a
reproducer for this problem.
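
As a rough illustration of the idea (not the actual patch), the sketch
below clears the port of an address before the whole storage is hashed.
sketch_clear_port() is a hypothetical name, and the sketch assumes the
storage was zero-filled before the address was written into it.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: clear the port of an IPv4/IPv6 address in place so
 * that only the IP contributes when the whole storage is hashed. Other
 * address families are left untouched.
 */
static void sketch_clear_port(struct sockaddr_storage *ss)
{
	switch (ss->ss_family) {
	case AF_INET:
		((struct sockaddr_in *)ss)->sin_port = 0;
		break;
	case AF_INET6:
		((struct sockaddr_in6 *)ss)->sin6_port = 0;
		break;
	default:
		break;
	}
}
```

Two connections from the same client IP but different source ports then
present identical bytes to the hash, which is what makes idle connections
reusable again with "usesrc clientip".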
2025-04-09 11:05:22 +02:00
Aurelien DARRAGON
afd5f5d671 DEV: h2: fix h2-tracer.lua nil value index
Nick Ramirez reported the following error while testing the h2-tracer.lua
script:

  Lua filter 'h2-tracer' : [state-id 0] runtime error: /etc/haproxy/h2-tracer.lua:227: attempt to index a nil value (field '?') from /etc/haproxy/h2-tracer.lua:227: in function line 109.

It is caused by h2ff indexing with an out of bound value. Indeed, h2ff
is indexed with the frame type, which can potentially be > 9 (not common
nor observed during Willy's tests), while h2ff only defines indexes from
0 to 9.

The fix was provided by Willy, it consists in skipping h2ff indexing if
frame type is > 9. It was confirmed that doing so fixes the error.
2025-04-08 17:44:41 +02:00
Willy Tarreau
f4634e5a38 MINOR: ring/cli: support delimiting events with a trailing \0 on "show events"
At the moment it is not supported to produce multi-line events on the
"show events" output, simply because the LF character is used as the
default end-of-event mark. However it could be convenient to produce
well-formatted multi-line events, e.g. in JSON or other formats. UNIX
utilities have already faced similar needs in the past and added
"-print0" to "find" and "-0" to "xargs" to mention that the delimiter
is the NUL character. This makes perfect sense since it's never present
in contents, so let's do exactly the same here.

Thus from now on, "show events <ring> -0" will delimit messages using
a \0 instead of a \n, permitting a better and safer encapsulation.
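
On the consumer side, the benefit can be sketched as follows. This is an
illustrative reader only, with a hypothetical function name; it simply
counts complete events in a buffer for a given delimiter.

```c
#include <stddef.h>
#include <string.h>

/* Count the complete events found in buf[0..len), each event being
 * terminated by <delim>. A trailing event without its delimiter is
 * considered incomplete and is not counted.
 */
static size_t sketch_count_events(const char *buf, size_t len, char delim)
{
	const char *p = buf;
	const char *end = buf + len;
	size_t count = 0;

	while (p < end) {
		const char *d = memchr(p, delim, (size_t)(end - p));

		if (!d)
			break;
		count++;
		p = d + 1;
	}
	return count;
}
```

With two multi-line JSON events, splitting on LF cuts each event into
pieces, while splitting on NUL yields exactly one chunk per event.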
2025-04-08 14:36:35 +02:00
Willy Tarreau
0be6d73e88 MINOR: ring: support arbitrary delimiters through ring_dispatch_messages()
In order to support delimiting output events with other characters than
just the LF, let's pass the delimiter through the API. The default remains
the LF, used by applet_append_line(), and ignored by the log forwarder.
2025-04-08 14:36:35 +02:00
William Lallemand
038a372684 DOC: configuration: rework the crt-list section
The crt-list section was unclear, this patch reworks it, giving more
details on the matching algorithms and how the things are loaded.
2025-04-08 14:29:10 +02:00
Willy Tarreau
3e3b9eebf8 BUG/MEDIUM: sample: fix risk of overflow when replacing multiple regex back-refs
Aleandro Prudenzano of Doyensec and Edoardo Geraci of Codean Labs
reported a bug in sample_conv_regsub(), which can cause replacements
of multiple back-references to overflow the temporary trash buffer.

The problem happens when doing "regsub(match,replacement,g)": we're
replacing every occurrence of "match" with "replacement" in the input
sample, which requires a length check. For this, a max is applied, so
that a replacement may not use more than the remaining length in the
buffer. However, the length check is made on the replaced pattern and
not on the temporary buffer used to carry the new string. This results
in the remaining size to be usable for each input match, which can go
beyond the temporary buffer size if more than one occurrence has to be
replaced with something that's larger than the remaining room.

The fix proposed by Aleandro and Edoardo is the correct one (check on
"trash" not "output"), and is the one implemented in this patch.

While it is very unlikely that a config will replace multiple short
patterns each with a larger one in a request, this possibility cannot
be entirely ruled out (e.g. mask a known, short IP address using
"XXX.XXX.XXX.XXX").  However when this happens, the replacement pattern
will be static, and not be user-controlled, which is why this patch is
marked as medium.

The bug was introduced in 2.2 with commit 07e1e3c93e ("MINOR: sample:
regsub now supports backreferences"), so it must be backported to all
versions.

Special thanks go to Aleandro and Edoardo for reporting this bug with
a simple reproducer and a fix.
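
The principle of the fix can be sketched with a simplified replace-all
loop. Assumptions: it matches a literal pattern instead of a regex, it
fails instead of truncating when the room is exhausted, and all names are
hypothetical. The point is only that the remaining room is computed on the
buffer being written to and shrinks after each replacement.

```c
#include <stddef.h>
#include <string.h>

/* Replace every occurrence of <pat> in <in> with <rep>, writing the result
 * into <out> of <size> bytes. Returns the output length, or -1 if the
 * result (including the trailing zero) would not fit.
 */
static long sketch_replace_all(const char *in, const char *pat,
                               const char *rep, char *out, size_t size)
{
	size_t plen = strlen(pat);
	size_t rlen = strlen(rep);
	size_t used = 0;

	if (!plen || !size)
		return -1;

	while (*in) {
		if (strncmp(in, pat, plen) == 0) {
			/* room is measured on the destination buffer and
			 * accounts for everything already written
			 */
			if (rlen > size - 1 - used)
				return -1;
			memcpy(out + used, rep, rlen);
			used += rlen;
			in += plen;
		} else {
			if (used >= size - 1)
				return -1;
			out[used++] = *in++;
		}
	}
	out[used] = '\0';
	return (long)used;
}
```

A check that forgets to subtract what was already written lets each match
consume the full remaining size again, which is the overflow pattern
described above.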
2025-04-07 15:57:28 +02:00
Aurelien DARRAGON
9e8444b730 BUG/MINOR: log: fix CBOR encoding with LOG_VARTEXT_START() + lf_encode_chunk()
There have been some reports that using %HV logformat alias with CBOR
encoder would produce invalid CBOR payload according to some CBOR
implementations such as "cbor.me". Indeed, with the below log-format:

  log-format "%{+cbor}o %(protocol)HV"

And the resulting CBOR payload:

  BF6870726F746F636F6C7F48485454502F312E31FFFF

cbor.me would complain with a "bytes/text mismatch (ASCII-8BIT != UTF-8) in
streaming string" error message.

It is due to the version string being first announced as text, while CBOR
encoder actually encodes it as byte string later when lf_encode_chunk()
is used.

In fact it affects all patterns combining LOG_VARTEXT_START() with
lf_encode_chunk() which means %HM, %HU, %HQ, %HPO and %HP are also
affected. To fix the issue, in _lf_encode_bytes() (which is
lf_encode_chunk() helper), we now check if we are inside a VARTEXT (we
can tell it if ctx->in_text is true), in which case we consider that we
already announced the current data as regular text so we keep the same
type to encode the bytes from the chunk to prevent inconsistencies.

It should be backported in 3.0
2025-04-07 12:27:14 +02:00
Willy Tarreau
f01ff2478f BUILD: atomics: fix build issue on non-x86/non-arm systems
Commit f435a2e518 ("CLEANUP: atomics: also replace __sync_synchronize()
with __atomic_thread_fence()") replaced the builtins used for barriers,
but the different API required an argument while the macros didn't specify
any, resulting in double parenthesis that were causing obscure build errors
such as "called object type 'void' is not a function or function pointer".
Let's just specify the args for the macro. No backport is needed.
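
A reduced illustration of this class of error is given below. The macro
and function names are hypothetical and the exact original definitions are
not reproduced here; only the __atomic_thread_fence() builtin is real
(gcc >= 4.7 and clang).

```c
/* Broken form, kept as a comment because it does not compile:
 *
 *   #define sketch_barrier __atomic_thread_fence(__ATOMIC_SEQ_CST)
 *   ...
 *   sketch_barrier();   expands to __atomic_thread_fence(...)()
 *
 * The object-like macro already contains the call, so the caller's own
 * parentheses try to call the void result.
 */

/* Fixed form: a function-like macro that consumes the caller's "()". */
#define sketch_barrier() __atomic_thread_fence(__ATOMIC_SEQ_CST)

/* Hypothetical user of the barrier. */
static int sketch_publish(int *slot, int value)
{
	*slot = value;
	sketch_barrier(); /* full fence after the store */
	return *slot;
}
```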
2025-04-07 09:38:22 +02:00
William Lallemand
ab4cd49c04 MEDIUM: ssl/crt-list: warn on negative filters only
negative SNI filters on crt-list lines only have a meaning when they
match a positive wildcard filter. This patch adds a warning which
is emitted when trying to use negative filters without any wildcard on
the same line.

This was discovered in ticket #2900.
2025-04-04 18:18:44 +02:00
William Lallemand
a9ae6b516d MEDIUM: ssl/crt-list: warn on negative wildcard filters
negative wildcard filters were always a noop, and are not useful for
anything unless you want to use !* alone to remove every name from a
certificate.

This is confusing and the documentation never stated it correctly. This
patch adds a warning during the bind initialization if it finds one,
only !* does not emit a warning.

This patch was done during the debugging of issue #2900.
2025-04-04 17:13:51 +02:00
Aurelien DARRAGON
ce6951d6f9 CLEANUP: log: adjust _lf_cbor_encode_byte() comment
_lf_cbor_encode_byte() comment was not updated in c33b857df ("MINOR: log:
support true cbor binary encoding") to reflect the new behavior.

Indeed, binary form is now supported. Updating the comment that says
otherwise.
2025-04-03 17:52:56 +02:00
Aurelien DARRAGON
11d4d0957e MEDIUM: task: make notification_* API thread safe by default
Some notification_* functions were not thread safe by default as they
assumed only one producer would emit events for registered tasks.

While this suited well with the Lua sockets use-case, this proved to
be a limitation with some other event sources (ie: lua Queue class)

Instead of having to deal with both the non thread safe and thread
safe variants (_mt suffix), which is error prone, let's make the
entire API thread safe regarding the event list.

Pruning functions still require that only one thread executes them,
with Lua this is always the case because there is one cleanup list
per context.
2025-04-03 17:52:50 +02:00
Aurelien DARRAGON
976890edda MINOR: hlua_fcn: add Queue:alarm()
Queue:alarm() sets a wakeup alarm on the task when new data becomes
available on Queue. It must be re-armed for each event.

Lua documentation was updated.
2025-04-03 17:52:44 +02:00
Aurelien DARRAGON
0ffc80d3ba MINOR: hlua: add AppletTCP:try_receive()
This is the non-blocking variant for AppletTCP:receive(). It doesn't
take any argument, instead it tries to read as much data as available
at once. If no data is available, empty string is returned.

Lua documentation was updated.
2025-04-03 17:52:39 +02:00
Aurelien DARRAGON
86d3cfdeeb MINOR: hlua: split hlua_applet_tcp_recv_yield() in two functions
Split hlua_applet_tcp_recv_yield() in order to create
hlua_applet_tcp_recv_try() helper function which does a single receive
attempt.
2025-04-03 17:52:34 +02:00
Aurelien DARRAGON
c7cbfafa38 MINOR: hlua: core.wait() takes optional delay paramater
core.wait() now accepts an optional delay parameter in ms. Past this delay,
the task is woken up if no event woke it up before.

Lua documentation was updated.
2025-04-03 17:52:28 +02:00
Aurelien DARRAGON
1e4e5ab4d2 MINOR: hlua: add core.wait()
Similar to core.yield(), except that the task is not woken up
automatically, instead it waits for events to trigger the task
wakeup.

Lua documentation was updated.
2025-04-03 17:52:23 +02:00
Aurelien DARRAGON
748dba4859 MINOR: hlua_fcn: register queue class using hlua_register_metatable()
Most lua classes are registered by leveraging the
hlua_register_metatable() helper. Let's use that for the Queue class as
well for consistency.
2025-04-03 17:52:17 +02:00
Aurelien DARRAGON
c6fa061f22 BUG/MINOR: hlua_fcn: fix potential UAF with Queue:pop_wait()
If Queue:pop_wait() is executed from a stream context and pop_wait() is
aborted due to a Lua or resource error, then the waiting object pointing
to the task will still be registered, so if the task eventually disappears,
Queue:push() may try to wake an invalid task pointer.

To prevent this bug from happening, we now rely on notification_* API to
deliver waiting signals. This way signals are properly garbage collected
when a lua context is destroyed.

It should be backported in 2.8 with 86fb22c55 ("MINOR: hlua_fcn: add Queue
class").
This patch depends on ("MINOR: task: add thread safe notification_new and
notification_wake variants")
2025-04-03 17:52:09 +02:00
Aurelien DARRAGON
b77b1a2c3a MINOR: task: add thread safe notification_new and notification_wake variants
notification_new and notification_wake were historically meant to be
called by a single thread doing both the init and the wakeup for other
tasks waiting on the signals.

In this patch, we extend the API so that notification_new and
notification_wake have thread-safe variants that can safely be used with
multiple threads registering on the same list of events and multiple
threads pushing updates on the list.
2025-04-03 17:52:03 +02:00
Amaury Denoyelle
f0f1816f1a MINOR: check: implement check-pool-conn-name srv keyword
This commit is a direct follow-up of the previous one. It defines a new
server keyword check-pool-conn-name. It is used as the default value for
the name parameter of idle connection hash generation.

Its behavior is similar to server keyword pool-conn-name, but reserved
for checks reuse. If check-pool-conn-name is set, it is used in priority
to match a connection for reuse. If unset, a fallback is performed on
check-sni.
2025-04-03 17:19:07 +02:00
Amaury Denoyelle
43367f94f1 MINOR: check/backend: support conn reuse with SNI
Support for connection reuse during server checks was implemented
recently. This is activated with the server keyword check-reuse-pool.

Similarly to stream processing via connect_backend(), a connection hash
is calculated when trying to perform reuse for checks. This is necessary
to retrieve a connection which shares the check connect parameters.
However, idle connections can additionally be tagged using a
pool-conn-name or SNI under connect_backend(). Check reuse does not test
these values, which prevents retrieving a matching connection.

Improve this by using "check-sni" value as idle connection hash input
for check reuse. be_calculate_conn_hash() API has been adjusted so that
name value can be passed as input, both when using streams or checks.

Even with the current patch, there are still some scenarios which cannot
be covered for checks connection reuse, most notably when using a
dynamic pool-conn-name/SNI value. It is however at least sufficient to
cover simpler cases.
2025-04-03 17:19:07 +02:00
Amaury Denoyelle
28116e307a MINOR: server: activate automatically check reuse for rhttp@ protocol
Without check-reuse-pool, it is impossible to perform check on server
using @rhttp protocol. This is due to the inherent nature of the
protocol which does not implement an active connect method.

Thus, ensure that check-reuse-pool is always set when a reverse HTTP
server is declared. This reduces server configuration and should prevent
any omission. Note that it is still required to add the "check" server
keyword to activate server checks.
2025-04-03 17:19:07 +02:00
Amaury Denoyelle
ace9f5db10 BUG/MINOR: server: ensure check-reuse-pool is copied from default-server
Duplicate server check.reuse_pool boolean value in srv_settings_cpy().
This is necessary to ensure that check-reuse-pool value can be set via
default-server or server-template.

This does not need to be backported.
2025-04-03 17:19:07 +02:00
Amaury Denoyelle
76e9156c9b MINOR: backend: mark srv as nonnull in alloc_dst_address()
Server instance can be NULL on connect_server(), either when dispatch or
transparent proxy are active. However, in alloc_dst_address() access to
<srv> is safe thanks to SF_ASSIGNED stream flag. Add an ASSUME_NONNULL()
to reflect this state.

This should fix coverity report from github issue #2922.
2025-04-03 17:19:07 +02:00
William Lallemand
feb1a9ea17 DOC: configuration: replace "crt" by "ssl-f-use" in listeners
Replace the "crt" keyword from the frontend section with a "ssl-f-use"
keyword, "crt" could be ambiguous in case we don't want to put a
certificate filename.
2025-04-03 16:38:15 +02:00
William Lallemand
c7f29afcea MEDIUM: ssl: replace "crt" lines by "ssl-f-use" lines
The new "crt" lines in frontend and listen sections are confusing:

- a filename is mandatory but we could need a syntax without the
  filename in the future, if the filename is generated for example
- there is no clue about the fact that it's only used on the frontend
  side when reading the line

A new "ssl-f-use" line replaces the "crt" line, but a "crt" keyword
can be used on this line. "f" indicates that this is the frontend
configuration, a "ssl-b-use" keyword could be used in the future.

The "crt" lines only appeared in 3.2-dev so this won't change anything
for people using configurations from previous major versions.
2025-04-03 16:38:15 +02:00
Olivier Houchard
4715c557e9 TESTS: Fix build for filltab25.c
Give a return type to main(), so that filltab25.c compiles with
modern compilers.
2025-04-03 15:59:41 +02:00
Willy Tarreau
f435a2e518 CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence()
The drop of older compilers also allows us to focus on clearer
barriers, so let's use them.
2025-04-03 11:59:31 +02:00
Willy Tarreau
34e3b83f9c CLEANUP: atomics: remove support for gcc < 4.7
The old __sync_* API is no longer necessary since we do not support
gcc before 4.7 anymore. Let's just get rid of this code, the file is
still ugly enough without it.
2025-04-03 11:55:35 +02:00
Ilia Shipitsin
27a6353ceb CLEANUP: assorted typo fixes in the code, commits and doc 2025-04-03 11:37:25 +02:00
Ilia Shipitsin
bd477d5f51 CI: codespell: add "pres" to spellcheck whitelist
spellcheck was triggered by the following:

  * pres  : same as "res" but using the parent stream, if any. "pres"
            variables are only accessible during response processing of the
            parent stream.
2025-04-03 11:37:25 +02:00
Ilia Shipitsin
30df5b0f23 CI: spell check: allow manual trigger 2025-04-03 11:37:25 +02:00
Emeric Brun
b02b8453d1 BUG/MEDIUM: peers: prevent learning expiration too far in futur from unsync node
This patch sets the expire of the entry to the max value in
configuration if the value shown in the peer update message
is too far in the future.

This should be backported to all supported branches.
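
The clamping described above can be sketched as follows. The function and
parameter names are hypothetical and the units (milliseconds) are an
assumption made for the illustration.

```c
/* Hypothetical helper: never trust an expiration learned from a peer
 * beyond what the local table configuration allows.
 */
static unsigned int sketch_clamp_expire(unsigned int learned_ms,
                                        unsigned int table_expire_ms)
{
	if (learned_ms > table_expire_ms)
		return table_expire_ms;
	return learned_ms;
}
```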
2025-04-03 11:26:29 +02:00
Emeric Brun
00461fbfbf BUG/MINOR: peers: fix expire learned from a peer not converted from ms to ticks
This has no impact currently since MS_TO_TICKS macro does nothing
but it will prevent further bugs.
2025-04-03 11:26:21 +02:00
Christopher Faulet
6365eb85e5 MEDIUM: stream: Save SC and channel flags earlier in process_steam()
At the beginning of process_stream(), the flags of the stream connectors and
channels are saved to be able to handle changes performed in sub-functions
(for instance in analyzers). But, some operations were performed before
saving these flags: synchronous receives and forced shutdowns. While it
seems to be safe for now, it is a bit annoying because some events could be
missed.

So, to avoid bugs in the future, the channels and stream connectors flags
are now really saved before any other processing.
2025-04-03 10:19:58 +02:00
Christopher Faulet
51611a5b70 BUG/MEDIUM: stream: Fix a possible freeze during a forced shut on a stream
When a forced shutdown is performed on a stream, it is possible to freeze it
infinitely because it is performed in an unexpected way from process_stream()
point of view, especially when the stream is waiting for a server
connection. The events sequence is a bit complex but at the end the stream
remains blocked in turn-around state and no event is triggered to unblock
it.

By trying to fix the issue, we considered it was safer to rethink the
feature. The idea is to quickly shutdown a stream to release resources. For
instance to be able to delete a server. So, instead of scheduling a
shutdown, it is more efficient to trigger an error and detach the stream
from the server, if necessary. The same code as the one used to deal with
connection errors in back_handle_st_cer() is used.

This patch must be slowly backported as far as 2.6.
2025-04-03 10:19:57 +02:00
William Lallemand
b351f06ff1 REORG: ssl: move curves2nid and nid2nist to ssl_utils
curves2nid and nid2nist are generic functions that could be used outside
the JWS scope, this patch puts them at the right place so they can be
reused.
2025-04-02 19:34:09 +02:00
Willy Tarreau
a8fab63604 [RELEASE] Released version 3.2-dev9
Released version 3.2-dev9 with the following main changes :
    - MINOR: quic: move global tune options into quic_tune
    - CLEANUP: quic: reorganize TP flow-control initialization
    - MINOR: quic: ignore uni-stream for initial max data TP
    - MINOR: mux-quic: define config for max-data
    - MINOR: quic: define max-stream-data configuration as a ratio
    - MEDIUM: lb-chash: add directive hash-preserve-affinity
    - MEDIUM: pools: be a bit smarter when merging comparable size pools
    - REGTESTS: disable the test balance/balance-hash-maxqueue
    - BUG/MINOR: log: fix gcc warn about truncating NUL terminator while init char arrays
    - CI: fedora rawhide: allow "on: workflow_dispatch" in forks
    - CI: fedora rawhide: install "awk" as a dependency
    - CI: spellcheck: allow "on: workflow_dispatch" in forks
    - CI: coverity scan: allow "on: workflow_dispatch" in forks
    - CI: cross compile: allow "on: workflow_dispatch" in forks
    - CI: Illumos: allow "on: workflow_dispatch" in forks
    - CI: NetBSD: allow "on: workflow_dispatch" in forks
    - CI: QUIC Interop on AWS-LC: allow "on: workflow_dispatch" in forks
    - CI: QUIC Interop on LibreSSL: allow "on: workflow_dispatch" in forks
    - MINOR: compiler: add __nonstring macro
    - MINOR: thread: dump the CPU topology in thread_map_to_groups()
    - MINOR: cpu-set: compare two cpu sets with ha_cpuset_isequal()
    - MINOR: cpu-set: add a new function to print cpu-sets in human-friendly mode
    - MINOR: cpu-topo: add a dump of thread-to-CPU mapping to -dc
    - MINOR: cpu-topo: pass an extra argument to ha_cpu_policy
    - MINOR: cpu-topo: add new cpu-policies "group-by-2-clusters" and above
    - BUG/MINOR: config: silence .notice/.warning/.alert in discovery mode
    - EXAMPLES: add "games.cfg" and an example game in Lua
    - MINOR: jws: emit the JWK thumbprint
    - TESTS: jws: change the jwk format
    - MINOR: ssl/ckch: add substring parser for ckch_conf
    - MINOR: mt_list: Implement mt_list_try_lock_prev().
    - MINOR: lbprm: Add method to deinit server and proxy
    - MINOR: threads: Add HA_RWLOCK_TRYRDTOWR()
    - MAJOR: leastconn; Revamp the way servers are ordered.
    - BUG/MINOR: ssl/ckch: leak in error path
    - BUILD: ssl/ckch: potential null pointer dereference
    - MINOR: log: support "raw" logformat node typecast
    - CLEANUP: assorted typo fixes in the code and comments
    - DOC: config: fix two missing "content" in "tcp-request" examples
    - MINOR: cpu-topo: cpu_dump_topology() SMT info check little optimisation
    - BUILD: compiler: undefine the CONCAT() macro if already defined
    - BUG/MEDIUM: leastconn: Don't try to reposition if the server is down
    - BUG/MINOR: rhttp: fix incorrect dst/dst_port values
    - BUG/MINOR: backend: do not overwrite srv dst address on reuse
    - BUG/MEDIUM: backend: fix reuse with set-dst/set-dst-port
    - MINOR: sample: define bc_reused fetch
    - REGTESTS: extend conn reuse test with transparent proxy
    - MINOR: backend: fix comment when killing idle conns
    - MINOR: backend: adjust conn_backend_get() API
    - MINOR: backend: extract conn hash calculation from connect_server()
    - MINOR: backend: extract conn reuse from connect_server()
    - MINOR: backend: remove stream usage on connection reuse
    - MINOR: check define check-reuse-pool server keyword
    - MEDIUM: check: implement check-reuse-pool
    - BUILD: backend: silence a build warning when not using ssl
    - BUILD: quic_sock: address a strict-aliasing build warning with gcc 5 and 6
    - BUILD: ssl_ckch: use my_strndup() instead of strndup()
    - DOC: update INSTALL to reflect the minimum compiler version
2025-04-02 18:12:34 +02:00
Willy Tarreau
1450b44bb9 DOC: update INSTALL to reflect the minimum compiler version
The mt_list update in 3.1 mandated the support for c11-like atomics that
arrived with gcc-4.7. As such, older versions are no longer supported.
For special cases in single-threaded environments, mt_lists could be
replaced with regular lists but it doesn't seem worth the hassle. It
was verified that gcc 4.7 to 14 and clang 3.0 and 19 do build fine.
That leaves us with 10 years of coverage of compiler versions, which
remains reasonable assuming that users of old ultra-stable systems are
unlikely to upgrade haproxy without touching the rest of the system.

This should be backported to 3.1.
2025-04-02 18:09:47 +02:00
Willy Tarreau
90e9b9d477 BUILD: ssl_ckch: use my_strndup() instead of strndup()
Not all systems have strndup(), that's why we have our "my_strndup()",
so let's make use of it here. This fixes the build on Solaris 10.
No backport is needed, this was just merged with commit fdcb97614c
("MINOR: ssl/ckch: add substring parser for ckch_conf").
2025-04-02 17:20:03 +02:00
Willy Tarreau
dd900aead8 BUILD: quic_sock: address a strict-aliasing build warning with gcc 5 and 6
The UDP GSO code emits a build warning with older toolchains (gcc 5 and 6):

  src/quic_sock.c: In function 'cmsg_set_gso':
  src/quic_sock.c:683:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    *((uint16_t *)CMSG_DATA(c)) = gso_size;
    ^

Let's just use the write_u16() function that's made for this purpose.
It was verified that for all versions from 5 to 13, gcc produces the
exact same code with the fix (and without the warning). It arrived in
3.1 with commit 448d3d388a ("MINOR: quic: add GSO parameter on quic_sock
send API") so this can be backported there.
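
For illustration, an alias-safe 16-bit store can be written as below. This
is a memcpy-based stand-in with hypothetical names, not necessarily how
write_u16() itself is implemented.

```c
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value at an arbitrary address without casting the
 * destination to uint16_t *, which is what triggers the strict-aliasing
 * warning. Compilers usually turn this memcpy() into a single store.
 */
static inline void sketch_write_u16(void *p, uint16_t v)
{
	memcpy(p, &v, sizeof(v));
}

/* Matching alias-safe load. */
static inline uint16_t sketch_read_u16(const void *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```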
2025-04-02 16:07:31 +02:00
Willy Tarreau
870f7aa5cf BUILD: backend: silence a build warning when not using ssl
Since recent commit ee94a6cfc1 ("MINOR: backend: extract conn reuse
from connect_server()") a build warning "set but not used" on the
"reuse" variable is emitted, because indeed the variable is now only
checked when SSL is in use. Let's just mark it as such.
2025-04-02 15:26:31 +02:00
Amaury Denoyelle
f1fb396d71 MEDIUM: check: implement check-reuse-pool
Implement the possibility to reuse idle connections when performing
server checks. This is done thanks to the recently introduced functions
be_calculate_conn_hash() and be_reuse_connection().

One side effect of this change is that be_calculate_conn_hash() can now
be called with a NULL stream instance. As such, part of the functions
are adjusted accordingly.

Note that to simplify configuration, connection reuse is not performed
if any specific check connection parameters are defined on the server
line or via the tcp-check connect rule. This is performed via newly
defined tcpcheck_use_nondefault_connect().
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
e34f748e3a MINOR: check define check-reuse-pool server keyword
Define a new server keyword check-reuse-pool, and its counterpart with a
"no" prefix. For the moment, only parsing is implemented. The real
behavior adjustment will be implemented in the next patch.
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
20eb57b486 MINOR: backend: remove stream usage on connection reuse
Adjust newly defined be_reuse_connection() API. The stream argument is
removed. This will allow checks to be able to invoke it without relying
on a stream instance.
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
ee94a6cfc1 MINOR: backend: extract conn reuse from connect_server()
Following the previous patch, the part directly related to connection
reuse is extracted from connect_server(). It is now defined in a new
function be_reuse_connection().
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
c7cc6b6401 MINOR: backend: extract conn hash calculation from connect_server()
On connection reuse, a hash is first calculated. It is generated from
various connection parameters, to retrieve a matching connection.

Extract hash calculation from connect_server() into a new dedicated
function be_calculate_conn_hash(). The objective is to be able to
perform connection reuse for checks, without connect_server() invocation
which relies on a stream instance.
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
4f0240f9a4 MINOR: backend: adjust conn_backend_get() API
The main objective of this patch is to remove the stream instance from
conn_backend_get() parameters. This would allow to perform reuse outside
of stream contexts, for example for checks purpose.
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
2ca616b4e1 MINOR: backend: fix comment when killing idle conns
Previously, if a server reached its pool-high-count limit, connections
were killed on connect_server() when reuse was not possible. However,
this is now performed even if reuse is done since the following patch :
  b3397367dc7cec9e78c62c54efc24d9db5cde2d2
  MEDIUM: connections: Kill connections even if we are reusing one.

Thus, adjust the related comment to reflect this state.
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
2f36162ee1 REGTESTS: extend conn reuse test with transparent proxy
Recently, work on connection reuse revealed an issue when mixed with
transparent proxy and set-dst. This patch rewrites the related regtests
to be able to catch this now fixed bug.

Note that it is the first regtest which relies on the recently introduced
bc_reused sample fetch. This fetch could be reused in other related
connection reuse regtests to simplify them.
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
ec76d52cea MINOR: sample: define bc_reused fetch
Define a new layer4 sample fetch "bc_reused". It is used as a boolean,
set to true if backend connection was reused for the request.
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
5fda64e87e BUG/MEDIUM: backend: fix reuse with set-dst/set-dst-port
On backend connection reuse, a hash is calculated from various
parameters, to ensure the selected connection matches the requested
parameters. Notably, destination address is one of these parameters.
However, it is only taken into account if using a transparent server
(server address 0.0.0.0).

This may cause issue where an incorrect connection is reused, which is
not targeted to the correct destination address. This may be the case
if a set-dst/set-dst-port is used with a transparent proxy (proxy option
transparent).

The fix is simple enough. Destination address is now always used as
input to the connection reuse hash.

This must be backported up to 2.6. Note that for reverse HTTP to work,
it relies on the following patch, which ensures destination address
remains NULL in this case.

  commit e94baf6ca71cb2319610baa74dbf17b9bc602b18
  BUG/MINOR: rhttp: fix incorrect dst/dst_port values
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
d7fa8e88c4 BUG/MINOR: backend: do not overwrite srv dst address on reuse
Previously, destination address of backend connection was systematically
reassigned. However, this step is unnecessary on connection
reuse. Indeed, reuse should only be conducted with connection using the
same destination address matching the stream requirements.

This patch removes this unnecessary assignment. It is now only performed
when reuse cannot be conducted and a new connection is instantiated.

Functionally speaking, this patch should not change anything in theory,
as reuse is performed in conformance with the destination address.
However, it appears that it was not always properly enforced. The
systematic assignment of the destination address hides these issues, so
it is now removed. The identified bogus cases will then be fixed in the
following patches.

This should be backported up to all stable versions.
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
c05bb8c967 BUG/MINOR: rhttp: fix incorrect dst/dst_port values
With a @rhttp server, connect is not possible, transfer is only possible
via idle connection reuse. The server does not have any network address.

Thus, it is unnecessary to allocate the stream destination address prior
to connection reuse. This patch adjusts this by fixing
alloc_dst_address() to take this into account.

Prior to this patch, alloc_dst_address() would incorrectly treat a
@rhttp server as if it were in transparent proxy mode. Thus stream destination
address would be copied from the destination address. Connection address
would then be rewritten with this incorrect value. This did not impact
connect or reuse as destination addr is only used in idle conn hash
calculation for transparent servers. However, it causes incorrect values
for dst/dst_port samples.

This should be backported up to 2.9.
2025-04-02 14:57:40 +02:00
Olivier Houchard
f59297e492 BUG/MEDIUM: leastconn: Don't try to reposition if the server is down
It may happen that the server is going down, and fwlc_srv_reposition()
is still called, because streams still attached to the server are
being terminated.
So in fwlc_srv_reposition(), just do nothing if we've been removed from
the tree.

This should fix github issue #2919.

This should not be backported, unless commit
9fe72bba3cf3484577fa1ef00723de08df757996 is also backported.

2025-04-02 12:24:04 +02:00
Willy Tarreau
4ec5509541 BUILD: compiler: undefine the CONCAT() macro if already defined
As Ilya reported in issue #2911, the CONCAT() macro breaks on NetBSD
which defines its own as __CONCAT() (which is exactly the same). Let's
just undefine it before ours to fix the issue instead of renaming, but
keep ours so that we don't have doubts about what we're running with.

Note that the patch introducing this breaking change was backported
to 3.0.
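
The approach can be sketched as follows. This is only an illustration:
the two-level pasting helper DEMO_PASTE(), DEMO_PREFIX and demo_concat()
are hypothetical names, and the first #define merely stands in for a
system header that already provides CONCAT(). The real definition in
haproxy's compiler header may differ.

```c
#include <assert.h>

/* Stand-in for a system header that already defines CONCAT(). */
#define CONCAT(a, b) a ## b

/* Drop any earlier definition so that only ours remains in effect. */
#ifdef CONCAT
#undef CONCAT
#endif

/* Two-level expansion: arguments are macro-expanded before being pasted. */
#define DEMO_PASTE(a, b) a ## b
#define CONCAT(a, b)     DEMO_PASTE(a, b)

#define DEMO_PREFIX value_
static const int value_42 = 42;

/* Compile-time check that tokens are pasted as expected (C11). */
static_assert(CONCAT(1, 2) == 12, "CONCAT() must paste its arguments");

/* CONCAT(DEMO_PREFIX, 42) expands to value_42. */
static int demo_concat(void)
{
	return CONCAT(DEMO_PREFIX, 42);
}
```

Undefining first avoids a macro redefinition diagnostic while keeping a
single known definition in effect.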
2025-04-02 11:36:43 +02:00
David Carlier
a703eeaef7 MINOR: cpu-topo: cpu_dump_topology() SMT info check little optimisation
Once we stumble across the first cpu having the criteria, we exit
earlier from the loop.
2025-04-02 11:31:37 +02:00
Willy Tarreau
3de99a0919 DOC: config: fix two missing "content" in "tcp-request" examples
As reported by Uku Srmus in GitHub issue #2917, two "tcp-request" rules
in an example were mistakenly missing the "content" hook, rendering them
invalid.

This can be backported.
2025-04-02 11:17:05 +02:00
Ilia Shipitsin
78b849b839 CLEANUP: assorted typo fixes in the code and comments
code, comments and doc actually.
2025-04-02 11:12:20 +02:00
Aurelien DARRAGON
423cca64b6 MINOR: log: support "raw" logformat node typecast
"raw" logformat node typecast is a special value (unlike str,bool,int..)
which tells haproxy to completely ignore logformat options (including
encoding ones) and force binary output for the current node only. It is
mainly intended for use with JSON or CBOR encoders in order to generate
nested CBOR or nested JSON by storing intermediate log-formats within
variables and assembling the final object in the parent log-format.

Example:

  http-request set-var-fmt(txn.intermediate) "%{+json}o %(lower)[str(value)]"

  log-format "%{+json}o %(upper)[str(value)] %(intermediate:raw)[var(txn.intermediate)]"

Would produce:

   {"upper": "value", "intermediate": {"lower": "value"}}
2025-04-02 11:04:43 +02:00
William Lallemand
31bd3627cd BUILD: ssl/ckch: potential null pointer dereference
src/ssl_ckch.c: In function ‘ckch_conf_parse’:
src/ssl_ckch.c:4852:40: error: potential null pointer dereference [-Werror=null-dereference]
 4852 |                                 while (*r) {
      |                                        ^~

Add a test on r before using *r.

No backport needed
2025-04-02 10:02:07 +02:00
William Lallemand
2e8acf54d4 BUG/MINOR: ssl/ckch: leak in error path
fdcb97614cb ("MINOR: ssl/ckch: add substring parser for ckch_conf")
introduced a leak in the error path when the strndup fails.

This patch fixes issue #2920. No backport needed.
2025-04-02 09:53:48 +02:00
Olivier Houchard
9fe72bba3c MAJOR: leastconn; Revamp the way servers are ordered.
For leastconn, servers used to just be stored in an ebtree.
Each server would be one node.
Change that so that nodes contain multiple mt_lists. Each list
will contain servers that share the same key (typically meaning
they have the same number of connections). Using mt_lists means
that as long as tree elements already exist, moving a server from
one tree element to another no longer requires the lbprm write
lock.
We use multiple mt_lists to reduce the contention when moving
a server from one tree element to another. A list in the new
element will be chosen randomly.
We no longer remove a tree element as soon as it no longer
contains any server. Instead, we keep a list of all elements,
and when we need a new element, we look at that list only if it
contains a number of elements already, otherwise we'll allocate
a new one. Keeping nodes in the tree ensures that we very
rarely have to take the lbprm write lock (as it only happens
when we're moving the server to a position for which no
element is currently in the tree).

The number of mt_lists used is defined as FWLC_NB_LISTS.
The number of tree elements we want to keep is defined as
FWLC_MIN_FREE_ENTRIES, both in defaults.h.
The values used were picked after experimentation, and
seem to be the best trade-off between performance and memory
usage.

Doing that gives a good boost in performance when a lot of
servers are used.
With a configuration using 500 servers, before that patch,
about 830000 requests per second could be processed, with
that patch, about 1550000 requests per second are
processed, on a 64-core AMD, using 1200 concurrent connections.
2025-04-01 18:05:30 +02:00
Olivier Houchard
ba521a1d88 MINOR: threads: Add HA_RWLOCK_TRYRDTOWR()
Add HA_RWLOCK_TRYRDTOWR(), that tries to upgrade a lock
from reader to writer, and fails if any seeker or writer already
holds it.
2025-04-01 18:05:30 +02:00
Olivier Houchard
2a9436f96b MINOR: lbprm: Add method to deinit server and proxy
Add two new methods to lbprm, server_deinit() and proxy_deinit(),
in case something should be done at the lbprm level when
removing servers and proxies.
2025-04-01 18:05:30 +02:00
Olivier Houchard
17059098e7 MINOR: mt_list: Implement mt_list_try_lock_prev().
Implement mt_list_try_lock_prev(), that does the same thing
as mt_list_lock_prev(), except if the list is locked, it
returns { NULL, NULL } instead of waiting.
2025-04-01 18:05:30 +02:00
William Lallemand
fdcb97614c MINOR: ssl/ckch: add substring parser for ckch_conf
Add a substring parser for the ckch_conf keyword parser, this will split
a string into multiple substrings, and strdup them into an array.
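
A minimal sketch of such a splitter, assuming a single-character
delimiter (a comma in the usage below) and a NULL-terminated result
array. The names split_substrings() and free_substrings() are
hypothetical and do not reflect the actual ckch_conf parser. Note how
the error path releases what was already duplicated.

```c
#define _POSIX_C_SOURCE 200809L /* for strndup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array returned by split_substrings(). */
static void free_substrings(char **arr)
{
	if (!arr)
		return;
	for (char **p = arr; *p; p++)
		free(*p);
	free(arr);
}

/* Split <str> on <delim> and return a NULL-terminated array of duplicated
 * substrings, or NULL on allocation failure. Empty fields are kept.
 */
static char **split_substrings(const char *str, char delim)
{
	size_t count = 1;
	size_t idx = 0;
	char **arr;

	assert(str != NULL && delim != '\0');

	/* one more field than there are delimiters */
	for (const char *p = str; *p; p++)
		if (*p == delim)
			count++;

	arr = calloc(count + 1, sizeof(*arr));
	if (!arr)
		return NULL;

	while (1) {
		const char *end = strchr(str, delim);
		size_t len = end ? (size_t)(end - str) : strlen(str);

		arr[idx] = strndup(str, len);
		if (!arr[idx]) {
			/* error path: release everything allocated so far */
			free_substrings(arr);
			return NULL;
		}
		idx++;
		if (!end)
			break;
		str = end + 1;
	}
	return arr;
}
```

For instance, split_substrings("a.pem,b.pem", ',') yields the array
{ "a.pem", "b.pem", NULL }, which the caller releases with
free_substrings().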
2025-04-01 15:38:32 +02:00
William Lallemand
fa01c9d92b TESTS: jws: change the jwk format
The format of the jwk output changed a little bit because of the
previous commit.
2025-04-01 14:37:22 +02:00
William Lallemand
f8fe84caca MINOR: jws: emit the JWK thumbprint
jwk_thumbprint() is a function which implements
RFC7638 and emits a JWK thumbprint using an EVP_PKEY.

EVP_PKEY_EC_to_pub_jwk() and EVP_PKEY_RSA_to_pub_jwk() were changed in
order to match what is required to emit a thumbprint (ie, no spaces or
lines and the lexicographic order of the fields)
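
To illustrate the required form, here is a sketch that only builds the
canonical JSON (required members only, lexicographic order, no
whitespace). The helper names are hypothetical, the key members are
passed as already base64url-encoded strings, and the hashing and final
encoding steps of the thumbprint are deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Canonical JSON for an EC public key: members "crv", "kty", "x", "y".
 * Returns the length written, or -1 if <buf> is too small.
 */
static int ec_jwk_canonical(char *buf, size_t size, const char *crv,
                            const char *x, const char *y)
{
	int ret;

	assert(buf && size > 0 && crv && x && y);
	ret = snprintf(buf, size,
	               "{\"crv\":\"%s\",\"kty\":\"EC\",\"x\":\"%s\",\"y\":\"%s\"}",
	               crv, x, y);
	return (ret < 0 || (size_t)ret >= size) ? -1 : ret;
}

/* Canonical JSON for an RSA public key: members "e", "kty", "n".
 * Returns the length written, or -1 if <buf> is too small.
 */
static int rsa_jwk_canonical(char *buf, size_t size, const char *e,
                             const char *n)
{
	int ret;

	assert(buf && size > 0 && e && n);
	ret = snprintf(buf, size,
	               "{\"e\":\"%s\",\"kty\":\"RSA\",\"n\":\"%s\"}", e, n);
	return (ret < 0 || (size_t)ret >= size) ? -1 : ret;
}
```

Any extra space or a different member order would change the digest,
which is why the JWK emitters had to be made strict about their output.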
2025-04-01 11:57:55 +02:00
Willy Tarreau
ed1d4807da EXAMPLES: add "games.cfg" and an example game in Lua
The purpose is mainly to exhibit certain limitations that come with such
less common programming models, to show users how to program interactive
tools in Lua, and how to connect interactively.

Other use cases that could be envisioned are "top" and various monitoring
utilities, with sliding graphs etc. Lua is particularly attractive for
this usage, easy to program, well known from most AI tools (including its
integration into haproxy), making such programs very quick to obtain in
their basic form, and to improve later.

A very limited example game is provided, following the principle of a
very popular one, where the player must compose lines from falling
pieces. It quickly revealed the need for the ability to enforce a timeout
to applet:receive(). Other identified limitations include the difficulty
from the Lua side to monitor multiple events at once, but it seems that
callbacks and/or event dispatchers would be useful here.

At the moment the CLI is not workable (its interactivity was broken in 2.9
when line buffering was adopted), though it was verified that it works
with older releases.

The command needed to connect to the game is displayed as a notice message
during boot.
2025-04-01 09:10:00 +02:00
Willy Tarreau
2c779f3938 BUG/MINOR: config: silence .notice/.warning/.alert in discovery mode
When first pre-parsing the config to detect the presence or absence of
the master mode, we must not emit messages because they are not supposed
to be visible at this point, otherwise they appear twice each. The
pre-parsing, also called discovery mode, is only for internal use,
thus it should remain silent.

This should be backported to 3.1 where this mode was introduced.
2025-04-01 09:06:25 +02:00
Willy Tarreau
9f00702dc6 MINOR: cpu-topo: add new cpu-policies "group-by-2-clusters" and above
This adds "group-by-{2,3,4}-clusters", which, as its name implies,
creates one thread group per X clusters. This can be useful when CPUs
are split into too small clusters, as well as when the total number
of assigned cores is not even between the clusters, to try to spread
the load between fewer different ones.
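
The grouping arithmetic can be pictured with the sketch below, which
assumes clusters are numbered contiguously from zero. The function
names are hypothetical and the real policy also has to deal with
thread and group limits, which are ignored here.

```c
#include <assert.h>

/* Thread group index for <cluster> when <per_group> clusters share a group. */
static int cluster_to_group(int cluster, int per_group)
{
	assert(cluster >= 0 && per_group > 0);
	return cluster / per_group;
}

/* Number of groups needed for <nb_clusters>; the last may be partly filled. */
static int count_groups(int nb_clusters, int per_group)
{
	assert(nb_clusters >= 0 && per_group > 0);
	return (nb_clusters + per_group - 1) / per_group;
}
```

With 5 clusters grouped by 2, clusters 0-1, 2-3 and 4 land in three
distinct groups.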
2025-03-31 16:21:37 +02:00
Willy Tarreau
1e9a2529aa MINOR: cpu-topo: pass an extra argument to ha_cpu_policy
This extra argument will allow common functions to distinguish between
multiple policies. For now it's not used.
2025-03-31 16:21:37 +02:00
Willy Tarreau
e4053b0d09 MINOR: cpu-topo: add a dump of thread-to-CPU mapping to -dc
When emitting the CPU topology info with -dc, also emit a list of
thread-to-CPU mapping. The group/thread and thread ID are emitted
with the list of their CPUs on each line. The count of CPUs is shown
to ease comparisons, and as much as possible, we try to pack identical
lines within a group by showing thread ranges.
2025-03-31 16:21:37 +02:00
Willy Tarreau
571573874a MINOR: cpu-set: add a new function to print cpu-sets in human-friendly mode
The new function "print_cpu_set()" will print cpu sets in a human-friendly
way, with commas and dashes for intervals. The goal is to keep them compact
enough.
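
As an illustration of the output style, the sketch below formats a
64-bit mask into a buffer. It is not the haproxy function: the name
format_cpu_mask(), the use of a plain uint64_t instead of a cpu set,
and the choice to write into a caller-provided buffer are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format the CPUs set in <mask> into <buf> as comma-separated ranges,
 * e.g. "0-3,8,10-11". Returns the output length, or -1 if <buf> is
 * too small.
 */
static int format_cpu_mask(char *buf, size_t size, uint64_t mask)
{
	size_t len = 0;
	int cpu = 0;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';

	while (cpu < 64) {
		int first, last, ret;

		if (!(mask & (UINT64_C(1) << cpu))) {
			cpu++;
			continue;
		}
		/* extend the interval as long as consecutive CPUs are set */
		first = last = cpu;
		while (last + 1 < 64 && (mask & (UINT64_C(1) << (last + 1))))
			last++;

		if (first == last)
			ret = snprintf(buf + len, size - len, "%s%d",
			               len ? "," : "", first);
		else
			ret = snprintf(buf + len, size - len, "%s%d-%d",
			               len ? "," : "", first, last);

		if (ret < 0 || (size_t)ret >= size - len)
			return -1;
		len += (size_t)ret;
		cpu = last + 1;
	}
	return (int)len;
}
```

A mask with CPUs 0 to 3, 8, 10 and 11 set (0xD0F) is rendered as
"0-3,8,10-11".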
2025-03-31 16:21:37 +02:00
Willy Tarreau
3955f151b1 MINOR: cpu-set: compare two cpu sets with ha_cpuset_isequal()
This function returns true if two CPU sets are equal.
2025-03-31 16:21:37 +02:00
Willy Tarreau
e17512c3b2 MINOR: thread: dump the CPU topology in thread_map_to_groups()
It was previously done in thread_detect_count() but that's not quite
handy because we still don't know about the groups setting. Better do
it slightly later and have all the relevant info instead.
2025-03-31 15:42:13 +02:00
Valentine Krasnobaeva
b303861469 MINOR: compiler: add __nonstring macro
GCC 15 throws the following warning on fixed-size char arrays if they do not
contain a terminating NUL:

src/tools.c:2041:25: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization]
 2041 | const char hextab[16] = "0123456789ABCDEF";

We are using a couple of such definitions for some constants. Converting them
to flexible arrays, like: hextab[] = "0123456789ABCDEF" may have consequences,
as enlarged arrays won't fit anymore where they were possibly located due to
the memory alignment constraints.

GCC provides a 'nonstring' variable attribute for such char arrays, but clang and
other compilers don't have it. Let's wrap 'nonstring' with our
__nonstring macro, which will test if the compiler supports this attribute.

This fixes the issue #2910.
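
A possible shape for such a wrapper is sketched below. The macro is
named DEMO_NONSTRING here to stay clear of reserved identifiers, and
detection through __has_attribute is an assumption of this example,
not necessarily how haproxy's __nonstring is defined.

```c
#include <assert.h>
#include <string.h>

/* Use the attribute only when the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(nonstring)
#    define DEMO_NONSTRING __attribute__((nonstring))
#  endif
#endif
#ifndef DEMO_NONSTRING
#  define DEMO_NONSTRING
#endif

/* Exactly 16 bytes: the initializer's trailing NUL is intentionally dropped. */
static const char demo_hextab[16] DEMO_NONSTRING = "0123456789ABCDEF";

/* Return the uppercase hex digit for a 4-bit value. */
static char hex_digit(unsigned int nibble)
{
	assert(nibble < 16);
	return demo_hextab[nibble];
}
```

The array keeps its 16-byte size, so nothing moves in memory, and the
attribute documents that it must never be passed to string functions.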
2025-03-31 13:50:28 +02:00
Ilia Shipitsin
415d446065 CI: QUIC Interop on LibreSSL: allow "on: workflow_dispatch" in forks
Previously this build was limited to the "haproxy" GitHub organization
only. Let's allow manual builds from forks.
2025-03-28 09:51:35 +01:00
Ilia Shipitsin
8d591c387a CI: QUIC Interop on AWS-LC: allow "on: workflow_dispatch" in forks
Previously this build was limited to the "haproxy" GitHub organization
only. Let's allow manual builds from forks.
2025-03-28 09:51:35 +01:00
Ilia Shipitsin
7de45e3874 CI: NetBSD: allow "on: workflow_dispatch" in forks
Previously this build was limited to the "haproxy" GitHub organization
only. Let's allow manual builds from forks.
2025-03-28 09:51:35 +01:00
Ilia Shipitsin
8231f58fdc CI: Illumos: allow "on: workflow_dispatch" in forks
Previously this build was limited to the "haproxy" GitHub organization
only. Let's allow manual builds from forks.
2025-03-28 09:51:35 +01:00
Ilia Shipitsin
7495dbed22 CI: cross compile: allow "on: workflow_dispatch" in forks
Previously this build was limited to the "haproxy" GitHub organization
only. Let's allow manual builds from forks.
2025-03-28 09:51:35 +01:00
Ilia Shipitsin
7eb54656ae CI: coverity scan: allow "on: workflow_dispatch" in forks
Previously this build was limited to the "haproxy" GitHub organization
only. Let's allow manual builds from forks.
2025-03-28 09:51:35 +01:00
Ilia Shipitsin
424ca19831 CI: spellcheck: allow "on: workflow_dispatch" in forks
Previously this build was limited to the "haproxy" GitHub organization
only. Let's allow manual builds from forks.
2025-03-28 09:51:35 +01:00
Ilia Shipitsin
d9cb95c2a5 CI: fedora rawhide: install "awk" as a dependency
for some reason it is not installed by default on rawhide anymore
2025-03-28 09:51:35 +01:00
Ilia Shipitsin
21894300c1 CI: fedora rawhide: allow "on: workflow_dispatch" in forks
Previously this build was limited to the "haproxy" GitHub organization
only. Let's allow manual builds from forks.
2025-03-28 09:51:35 +01:00
Valentine Krasnobaeva
44f98f1747 BUG/MINOR: log: fix gcc warn about truncating NUL terminator while init char arrays
gcc 15 throws this kind of warning about the initialization of some char arrays:

src/log.c:181:33: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization]
  181 | const char sess_term_cond[16] = "-LcCsSPRIDKUIIII"; /* normal, Local, CliTo, CliErr, SrvTo, SrvErr, PxErr, Resource, Internal, Down, Killed, Up, -- */
      |                                 ^~~~~~~~~~~~~~~~~~
src/log.c:182:33: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Werror=unterminated-string-initialization]
  182 | const char sess_fin_state[8]  = "-RCHDLQT";     /* cliRequest, srvConnect, srvHeader, Data, Last, Queue, Tarpit */

So, let's make it happy by not giving the sizes of these char arrays
explicitly, so that it can accommodate the NUL terminators there.

Reported in GitHub issue #2910.

This should be backported up to 2.6.
2025-03-27 11:52:33 +01:00
Willy Tarreau
9b53a4a7fb REGTESTS: disable the test balance/balance-hash-maxqueue
This test brought by commit 8ed1e91efd ("MEDIUM: lb-chash: add directive
hash-preserve-affinity") seems to have hit a limitation of what can be
expressed in vtc, as it would be desirable to have one server response
release two clients at once but the various attempts using barriers
have failed so far. The test seems to work fine locally but still fails
almost 100% of the time on the CI, so it remains timing dependent in
some ways. Tests have been done with nbthread 1, pool-idle-shared off,
http-reuse never (since always fails locally) etc but to no avail. Let's
just mark it broken in case we later figure another way to fix it. It's
still usable locally most of the time, though.
2025-03-25 18:24:49 +01:00
Willy Tarreau
6b17310757 MEDIUM: pools: be a bit smarter when merging comparable size pools
By default, pools of comparable sizes are merged together. However, the
current algorithm is dumb: it rounds the requested size to the next
multiple of 16 and compares the sizes like this. This results in many
entries which are already multiples of 16 not being merged, for example
1024 and 1032 are separate, 65536 and 65540 are separate, 48 and 56 are
separate (though 56 merges with 64).

This commit changes this to consider not just the entry size but also the
average entry size, that is, it compares the average size of all objects
sharing the pool with the size of the object looking for a pool. If the
object is not more than 1% bigger nor smaller than the current average
size or if it is neither 16 bytes smaller nor larger, then it can be merged.
Also, it always respects exact matches in order to avoid merging objects
into larger pools or worse, extending existing ones for no reason, and
when there's a tie, it always avoids extending an existing pool.
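
The difference between the two rules can be sketched as below. The
function names are hypothetical, whether the bounds are inclusive is
an assumption, and the selection of the best pool among candidates
(exact match first, no needless extension) is not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Former rule: round the size up to the next multiple of 16. */
static size_t round_up_16(size_t size)
{
	return (size + 15) & ~(size_t)15;
}

/* Former rule: two sizes share a pool only if they round to the same value. */
static bool old_can_merge(size_t a, size_t b)
{
	return round_up_16(a) == round_up_16(b);
}

/* New rule as described: <size> may join a pool whose users average <avg>
 * bytes if it is within 1% of that average, or less than 16 bytes away
 * from it.
 */
static bool new_can_merge(size_t avg, size_t size)
{
	size_t diff = size > avg ? size - avg : avg - size;

	assert(avg > 0);
	return diff * 100 <= avg || diff < 16;
}
```

With the former rule 1024 and 1032 round to 1024 and 1040 and stay
apart, while the tolerance-based rule accepts them together.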

Also, we now visit all existing pools in order to spot the best one, we
do not stop anymore at the smallest one large enough. Theoretically this
could cost a bit of CPU but in practice it's O(N^2) with N quite small
(typically in the order of 100) and the cost at each step is very low
(compare a few integer values). But as a side effect, pools are no
longer sorted by size, "show pools bysize" is needed for this.

This causes the objects to be much better grouped together, accepting to
use a little bit more sometimes to avoid fragmentation, without causing
everyone to be merged into the same pool. Thanks to this we're now
seeing 36 pools instead of 48 by default, with some very nice examples
of compact grouping:

  - Pool qc_stream_r (80 bytes) : 13 users
      >  qc_stream_r : size=72 flags=0x1 align=0
      >  quic_cstrea : size=80 flags=0x1 align=0
      >  qc_stream_a : size=64 flags=0x1 align=0
      >  hlua_esub   : size=64 flags=0x1 align=0
      >  stconn      : size=80 flags=0x1 align=0
      >  dns_query   : size=64 flags=0x1 align=0
      >  vars        : size=80 flags=0x1 align=0
      >  filter      : size=64 flags=0x1 align=0
      >  session pri : size=64 flags=0x1 align=0
      >  fcgi_hdr_ru : size=72 flags=0x1 align=0
      >  fcgi_param_ : size=72 flags=0x1 align=0
      >  pendconn    : size=80 flags=0x1 align=0
      >  capture     : size=64 flags=0x1 align=0

  - Pool h3s (56 bytes) : 17 users
      >  h3s         : size=56 flags=0x1 align=0
      >  qf_crypto   : size=48 flags=0x1 align=0
      >  quic_tls_se : size=48 flags=0x1 align=0
      >  quic_arng   : size=56 flags=0x1 align=0
      >  hlua_flt_ct : size=56 flags=0x1 align=0
      >  promex_metr : size=48 flags=0x1 align=0
      >  conn_hash_n : size=56 flags=0x1 align=0
      >  resolv_requ : size=48 flags=0x1 align=0
      >  mux_pt      : size=40 flags=0x1 align=0
      >  comp_state  : size=40 flags=0x1 align=0
      >  notificatio : size=48 flags=0x1 align=0
      >  tasklet     : size=56 flags=0x1 align=0
      >  bwlim_state : size=48 flags=0x1 align=0
      >  xprt_handsh : size=48 flags=0x1 align=0
      >  email_alert : size=56 flags=0x1 align=0
      >  caphdr      : size=41 flags=0x1 align=0
      >  caphdr      : size=41 flags=0x1 align=0

  - Pool quic_cids (32 bytes) : 13 users
      >  quic_cids   : size=16 flags=0x1 align=0
      >  quic_tls_ke : size=32 flags=0x1 align=0
      >  quic_tls_iv : size=12 flags=0x1 align=0
      >  cbuf        : size=32 flags=0x1 align=0
      >  hlua_queuew : size=24 flags=0x1 align=0
      >  hlua_queue  : size=24 flags=0x1 align=0
      >  promex_modu : size=24 flags=0x1 align=0
      >  cache_st    : size=24 flags=0x1 align=0
      >  spoe_appctx : size=32 flags=0x1 align=0
      >  ehdl_sub_tc : size=32 flags=0x1 align=0
      >  fcgi_flt_ct : size=16 flags=0x1 align=0
      >  sig_handler : size=32 flags=0x1 align=0
      >  pipe        : size=24 flags=0x1 align=0

  - Pool quic_crypto (1032 bytes) : 2 users
      >  quic_crypto : size=1032 flags=0x1 align=0
      >  requri      : size=1024 flags=0x1 align=0

  - Pool quic_conn_r (65544 bytes) : 2 users
      >  quic_conn_r : size=65536 flags=0x1 align=0
      >  dns_msg_buf : size=65540 flags=0x1 align=0

On a very unscientific test consisting in sending 1 million H1 requests
and 1 million H2 requests to the stats page, we're seeing an ~6% lower
memory usage with the patch:

  before the patch:
    Total: 48 pools, 4120832 bytes allocated, 4120832 used (~3555680 by thread caches).

  after the patch:
    Total: 36 pools, 3880648 bytes allocated, 3880648 used (~3299064 by thread caches).

This should be taken with care however since pools allocate and release
in batches.
2025-03-25 18:01:01 +01:00
Pierre-Andre Savalle
8ed1e91efd MEDIUM: lb-chash: add directive hash-preserve-affinity
When using hash-based load balancing, requests are always assigned to
the server corresponding to the hash bucket for the balancing key,
without taking maxconn or maxqueue into account, unlike in other load
balancing methods like 'first'. This adds a new backend directive that
can be used to take maxconn and possibly maxqueue into account in that
context. This can be used when hashing is desired to achieve cache
locality, but sending requests to a different server is preferable to
queuing for a long time or failing requests when the initial server is
saturated.

By default, affinity is preserved as was the case previously. When
'hash-preserve-affinity' is set to 'maxqueue', servers are considered
successively in the order of the hash ring until a server that does not
have a full queue is found.

When 'maxconn' is set on a server, queueing cannot be disabled, as
'maxqueue=0' means unlimited.  To support picking a different server
when a server is at 'maxconn' irrespective of the queue,
'hash-preserve-affinity' can be set to 'maxconn'.
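
The selection logic can be pictured with the sketch below. It is not
the lb-chash code: the ring is a plain array, the structure, enum and
function names are hypothetical, and falling back to the hashed server
when every server is saturated is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

enum affinity_mode {
	AFFINITY_ALWAYS,   /* default: always use the hashed server */
	AFFINITY_MAXCONN,  /* skip servers that reached maxconn */
	AFFINITY_MAXQUEUE, /* skip servers whose queue is full */
};

struct demo_server {
	unsigned int maxconn;  /* 0 = unlimited */
	unsigned int served;   /* connections currently being served */
	unsigned int maxqueue; /* 0 = unlimited */
	unsigned int queued;   /* requests currently queued */
};

/* Tell whether <s> must be skipped under the given mode. */
static int server_saturated(const struct demo_server *s,
                            enum affinity_mode mode)
{
	switch (mode) {
	case AFFINITY_MAXCONN:
		return s->maxconn && s->served >= s->maxconn;
	case AFFINITY_MAXQUEUE:
		return s->maxqueue && s->queued >= s->maxqueue;
	default:
		return 0;
	}
}

/* Walk the ring from the hashed position <start> and return the index of
 * the first usable server, or <start> if all of them are saturated.
 */
static size_t pick_server(const struct demo_server *ring, size_t nb,
                          size_t start, enum affinity_mode mode)
{
	assert(ring != NULL && nb > 0 && start < nb);

	for (size_t i = 0; i < nb; i++) {
		size_t idx = (start + i) % nb;

		if (!server_saturated(&ring[idx], mode))
			return idx;
	}
	return start;
}
```

In the default mode the hashed server is always returned, which
matches the previous behaviour.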
2025-03-25 18:01:01 +01:00
Amaury Denoyelle
cf9e40bd8a MINOR: quic: define max-stream-data configuration as a ratio
2025-03-25 16:30:35 +01:00
Amaury Denoyelle
68c10d444d MINOR: mux-quic: define config for max-data
Define a new global configuration tune.quic.frontend.max-data. This
allows users to explicitly set the value for the corresponding QUIC TP
initial-max-data, with direct impact on haproxy memory consumption.
2025-03-25 16:30:09 +01:00
Amaury Denoyelle
1f1a18e318 MINOR: quic: ignore uni-stream for initial max data TP
Initial TP value for max-data is automatically calculated to be adjusted
to the maximum number of opened streams over a QUIC connection. This
took into account both max-streams-bidi-remote and uni-streams. By
default, this is equivalent to 100 + 3 = 103 max opened streams.

This patch simplifies the calculation by only using bidirectional
streams. Uni streams are ignored because they are only used for HTTP/3
control exchanges, which should only represent a few bytes. For now,
users can only configure the max number of remote bidi streams, so the
simplified calculation should make more sense to them.

Note that this relies on the assumption that HTTP/3 is used as
application protocol. To support other protocols, it may be necessary to
review this and take into account both local bidi and uni streams.
2025-03-25 16:29:38 +01:00
Amaury Denoyelle
3db5320289 CLEANUP: quic: reorganize TP flow-control initialization
Adjust initialization of flow-control transport parameters via
quic_transport_params_init().

This is purely cosmetic, with some comments added. It is also a
preparatory step for future patches with addition of new configuration
keywords related to flow-control TP values.
2025-03-25 16:29:35 +01:00
Amaury Denoyelle
a71007c088 MINOR: quic: move global tune options into quic_tune
A new structure quic_tune has recently been defined. Its purpose is to
store global options related to QUIC. Previously, only the tunable to
toggle pacing was stored in it.

This commit moves several QUIC related tunables from global to quic_tune
structure. This better centralizes QUIC configuration options and gives
room for future generic options.
2025-03-24 10:01:46 +01:00
Willy Tarreau
119a79f479 [RELEASE] Released version 3.2-dev8
Released version 3.2-dev8 with the following main changes :
    - MINOR: jws: implement JWS signing
    - TESTS: jws: implement a test for JWS signing
    - CI: github: add "jose" to apt dependencies
    - CLEANUP: log-forward: remove useless options2 init
    - CLEANUP: log: add syslog_process_message() helper
    - MINOR: proxy: add proxy->options3
    - MINOR: log: migrate log-forward options from proxy->options2 to options3
    - MINOR: log: provide source address information in syslog_process_message()
    - MINOR: tools: only print address in sa2str() when port == -1
    - MINOR: log: add "option host" log-forward option
    - MINOR: log: handle log-forward "option host"
    - MEDIUM: log: change default "host" strategy for log-forward section
    - BUG/MEDIUM: thread: use pthread_self() not ha_pthread[tid] in set_affinity
    - MINOR: compiler: add a simple macro to concatenate resolved strings
    - MINOR: compiler: add a new __decl_thread_var() macro to declare local variables
    - BUILD: tools: silence a build warning when USE_THREAD=0
    - BUILD: backend: silence a build warning when threads are disabled
    - DOC: management: rename some last occurences from domain "dns" to "resolvers"
    - BUG/MINOR: stats: fix capabilities and hide settings for some generic metrics
    - MINOR: cli: export cli_io_handler() to ease symbol resolution
    - MINOR: tools: improve symbol resolution without dl_addr
    - MINOR: tools: ease the declaration of known symbols in resolve_sym_name()
    - MINOR: tools: teach resolve_sym_name() a few more common symbols
    - BUILD: tools: avoid a build warning on gcc-4.8 in resolve_sym_name()
    - DEV: ncpu: also emulate sysconf() for _SC_NPROCESSORS_*
    - DOC: design-thoughts: commit numa-auto.txt
    - MINOR: cpuset: make the API support negative CPU IDs
    - MINOR: thread: rely on the cpuset functions to count bound CPUs
    - MINOR: cpu-topo: add ha_cpu_topo definition
    - MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array.
    - MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus
    - MINOR: cpu-topo: add a function to dump CPU topology
    - MINOR: cpu-topo: update CPU topology from excluded CPUs at boot
    - REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo
    - MINOR: cpu-topo: add detection of online CPUs on Linux
    - MINOR: cpu-topo: add detection of online CPUs on FreeBSD
    - MINOR: cpu-topo: try to detect offline cpus at boot
    - MINOR: cpu-topo: add CPU topology detection for linux
    - MINOR: cpu-topo: also store the sibling ID with SMT
    - MINOR: cpu-topo: add NUMA node identification to CPUs on Linux
    - MINOR: cpu-topo: add NUMA node identification to CPUs on FreeBSD
    - MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable
    - MINOR: cfgparse: move the binding detection into numa_detect_topology()
    - MINOR: cfgparse: use already known offline CPU information
    - MINOR: global: add a command-line option to enable CPU binding debugging
    - MINOR: cpu-topo: add a new "cpu-set" global directive to choose cpus
    - MINOR: cpu-topo: add "drop-cpu" and "only-cpu" to cpu-set
    - MEDIUM: thread: start to detect thread groups and threads min/max
    - MEDIUM: cpu-topo: make sure to properly assign CPUs to threads as a fallback
    - MEDIUM: thread: reimplement first numa node detection
    - MEDIUM: cfgparse: remove now unused numa & thread-count detection
    - MINOR: cpu-topo: refine cpu dump output to better show kept/dropped CPUs
    - MINOR: cpu-topo: fall back to nominal_perf and scaling_max_freq for the capacity
    - MINOR: cpu-topo: use cpufreq before acpi cppc
    - MINOR: cpu-topo: boost the capacity of performance cores with cpufreq
    - MINOR: cpu-topo: skip CPU detection when /sys/.../cpu does not exist
    - MINOR: cpu-topo: skip identification of non-existing CPUs
    - MINOR: cpu-topo: skip CPU properties that we've verified do not exist
    - MINOR: cpu-topo: implement a sorting mechanism for CPU index
    - MINOR: cpu-topo: implement a sorting mechanism by CPU locality
    - MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID
    - MINOR: cpu-topo: ignore single-core clusters
    - MINOR: cpu-topo: assign clusters to cores without and renumber them
    - MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo
    - MINOR: cpu-topo: assign an L3 cache if more than 2 L2 instances
    - MINOR: cpu-topo: renumber cores to avoid holes and make them contiguous
    - MINOR: cpu-topo: add a function to sort by cluster+capacity
    - MINOR: cpu-topo: consider capacity when forming clusters
    - MINOR: cpu-topo: create an array of the clusters
    - MINOR: cpu-topo: ignore excess of too small clusters
    - MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set
    - MINOR: cpu-topo: add "only-thread" and "drop-thread" to cpu-set
    - MINOR: cpu-topo: add "only-core" and "drop-core" to cpu-set
    - MINOR: cpu-topo: add "only-cluster" and "drop-cluster" to cpu-set
    - MINOR: cpu-topo: add a CPU policy setting to the global section
    - MINOR: cpu-topo: add a 'first-usable-node' cpu policy
    - MEDIUM: cpu-topo: use the "first-usable-node" cpu-policy by default
    - CLEANUP: thread: now remove the temporary CPU node binding code
    - MINOR: cpu-topo: add cpu-policy "group-by-cluster"
    - MEDIUM: cpu-topo: let the "group-by-cluster" split groups
    - MINOR: cpu-topo: add a new "performance" cpu-policy
    - MINOR: cpu-topo: add a new "efficiency" cpu-policy
    - MINOR: cpu-topo: add a new "resource" cpu-policy
    - MINOR: jws: add new functions in jws.h
    - MINOR: cpu-topo: fix unused stack var 'cpu2' reported by coverity
    - MINOR: hlua: add an optional timeout to AppletTCP:receive()
    - MINOR: jws: use jwt_alg type instead of a char
    - BUG/MINOR: log: prevent saddr NULL deref in syslog_io_handler()
    - MINOR: stream: decrement srv->served after detaching from the list
    - BUG/MINOR: hlua: fix optional timeout argument index for AppletTCP:receive()
    - MINOR: server: simplify srv_has_streams()
    - CLEANUP: server: make it clear that srv_check_for_deletion() is thread-safe
    - MINOR: cli/server: don't take thread isolation to check for srv-removable
    - BUG/MINOR: limits: compute_ideal_maxconn: don't cap remain if fd_hard_limit=0
    - MINOR: limits: fix check_if_maxsock_permitted description
    - BUG/MEDIUM: hlua/cli: fix cli applet UAF in hlua_applet_wakeup()
    - MINOR: tools: path_base() concatenates a path with a base path
    - MEDIUM: ssl/ckch: make the ckch_conf more generic
    - BUG/MINOR: mux-h2: Reset streams with NO_ERROR code if full response was already sent
    - MINOR: stats: add .generic explicit field in stat_col struct
    - MINOR: stats: STATS_PX_CAP___B_ macro
    - MINOR: stats: add .cap for some static metrics
    - MINOR: stats: use stat_col storage stat_cols_info
    - MEDIUM: promex: switch to using stat_cols_info for global metrics
    - MINOR: promex: expose ST_I_INF_WARNINGS (AKA total_warnings) metric
    - MEDIUM: promex: switch to using stat_cols_px for front/back/server metrics
    - MINOR: stats: explicitly add frontend cap for ST_I_PX_REQ_TOT
    - CLEANUP: promex: remove unused PROMEX_FL_{INFO,FRONT,BACK,LI,SRV} flags
    - BUG/MEDIUM: mux-quic: fix crash on RS/SS emission if already close local
    - BUG/MINOR: mux-quic: remove extra BUG_ON() in _qcc_send_stream()
    - MEDIUM: mt_list: Reduce the max number of loops with exponential backoff
    - MINOR: stats: add alt_name field to stat_col struct
    - MINOR: stats: add alt name info to stat_cols_info where relevant
    - MINOR: promex: get rid of promex_global_metric array
    - MINOR: stats-proxy: add alt_name field for ME_NEW_{FE,BE,PX} helpers
    - MINOR: stats-proxy: add alt name info to stat_cols_px where relevant
    - MINOR: promex: get rid of promex_st_metrics array
    - MINOR: pools: rename the "by_what" field of the show pools context to "how"
    - MINOR: cli/pools: record the list of pool registrations even when merging them
2025-03-21 17:33:36 +01:00
Willy Tarreau
9091c5317f MINOR: cli/pools: record the list of pool registrations even when merging them
By default, create_pool() tries to merge similar pools into one. But when
dealing with certain bugs, it's hard to say which ones were merged together.
We do have the information at registration time, so let's just create a
list of registrations ("pool_registration") attached to each pool, that
will store that information. It can then be consulted on the CLI using
"show pools detailed", where the names, sizes, alignment and flags are
reported.
2025-03-21 17:09:30 +01:00
Willy Tarreau
baf8b742b4 MINOR: pools: rename the "by_what" field of the show pools context to "how"
The goal will be to support other dump options. We don't need 32 bits to
express sorting criteria, let's reserve only 4 bits for them and leave
the remaining ones unused.
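
The bit reservation described above can be sketched as follows. This is only an illustration: the mask value and the helper names (`DEMO_SHOW_POOLS_SORT_MASK`, `demo_show_pools_sort_of()`, `demo_show_pools_set_sort()`) are assumptions made for the example, not the definitions used in the patch.

```c
#include <stdint.h>

/* Hypothetical layout: the low 4 bits carry the sorting criterion, the
 * upper bits stay free for future dump options.
 */
#define DEMO_SHOW_POOLS_SORT_MASK 0x0000000Fu

/* Extract the sorting criterion from the "how" field. */
static inline unsigned int demo_show_pools_sort_of(uint32_t how)
{
	return how & DEMO_SHOW_POOLS_SORT_MASK;
}

/* Replace the sorting criterion while leaving the other bits untouched. */
static inline uint32_t demo_show_pools_set_sort(uint32_t how, unsigned int sort)
{
	return (how & ~DEMO_SHOW_POOLS_SORT_MASK) | (sort & DEMO_SHOW_POOLS_SORT_MASK);
}
```

Keeping the criterion behind a mask means new option bits can later be added above it without touching the code that reads the sort order.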
2025-03-21 17:09:30 +01:00
Aurelien DARRAGON
83074bf690 MINOR: promex: get rid of promex_st_metrics array
In this patch we pursue the work started in a5aadbd ("MEDIUM: promex:
switch to using stat_cols_px for front/back/server metrics"):

Indeed, while having ".promex_name" info in stat_cols_info generic array
was confusing, Willy suggested that we have ".alt_name" which stays
generic and may be considered by alternative exporters for metric naming.
For now, only promex exporter will make use of it.

Thanks to this, it allows us to completely get rid of the
promex_st_metrics array. The other main benefit is that it will be much harder
to overlook promex metric definition now because .alt_name has more
visibility in the main metric array rather than in an addon file.
2025-03-21 17:05:31 +01:00
Aurelien DARRAGON
276491dc22 MINOR: stats-proxy: add alt name info to stat_cols_px where relevant
For all metrics defined under promex_st_metrics array, add the
corresponding .alt_name field in the general purpose stat_cols_px
array.
2025-03-21 17:05:26 +01:00
Aurelien DARRAGON
7f9d8c1327 MINOR: stats-proxy: add alt_name field for ME_NEW_{FE,BE,PX} helpers
For now alt_name is systematically set to NULL. Thanks to this change we
may easily add an altname to existing metrics. Also by requiring explicit
value it offers more visibility for this field.
2025-03-21 17:05:19 +01:00
Aurelien DARRAGON
155fb4ec74 MINOR: promex: get rid of promex_global_metric array
In this patch we pursue the work started in 1adc796 ("MEDIUM: promex:
switch to using stat_cols_info for global metrics"):

Indeed, while having ".promex_name" info in stat_cols_info generic array
was confusing, Willy suggested that we have ".alt_name" which stays
generic and may be considered by alternative exporters for metric naming.
For now, only promex exporter will make use of it.

Thanks to this, it allows us to completely get rid of the
promex_global_metric array. The other main benefit is that it will be
much harder to overlook promex metric definition now because .alt_name
has more visibility in the main metric array rather than in an addon file.
2025-03-21 17:05:14 +01:00
Aurelien DARRAGON
b03e05cd36 MINOR: stats: add alt name info to stat_cols_info where relevant
For all metrics defined under promex_global_metrics array, add the
corresponding .alt_name field in the general purpose stat_cols_info
array.
2025-03-21 17:05:02 +01:00
Aurelien DARRAGON
7ec6f4412c MINOR: stats: add alt_name field to stat_col struct
alt_name will be used by metric exporters to know how the metric should be
presented to the user. If the alt_name is NULL, the metric should be
ignored. For now only promex exporter will make use of this.
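
As a rough sketch of how an exporter could honor this rule, the snippet below walks a column array and keeps only the entries that carry an alternative name. The `demo_stat_col` structure and the `demo_exported_names()` helper are simplified stand-ins invented for the example; the real stat_col struct has more fields.

```c
#include <stddef.h>

/* Simplified stand-in for a stat column: only the fields needed here. */
struct demo_stat_col {
	const char *name;     /* generic metric name */
	const char *alt_name; /* exporter-facing name, NULL = not exported */
};

/* Collect into <names> (at most <max> entries) the names an exporter
 * would advertise. Columns without alt_name are skipped. Returns the
 * number of names stored.
 */
static size_t demo_exported_names(const struct demo_stat_col *cols, size_t nb,
                                  const char **names, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < nb && n < max; i++) {
		if (!cols[i].alt_name)
			continue; /* metric ignored by the exporter */
		names[n++] = cols[i].alt_name;
	}
	return n;
}
```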
2025-03-21 17:04:54 +01:00
Olivier Houchard
98967aa09f MEDIUM: mt_list: Reduce the max number of loops with exponential backoff
Reduce the max number of loops in the mt_list code while waiting for
a lock to be available with exponential backoff. It's been observed that
the current value led to severe performance degradation at least on
some hardware, hopefully this value will be acceptable everywhere.
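
For readers unfamiliar with the technique, here is a generic sketch of a spin lock whose wait loop doubles at each failed attempt up to a fixed cap. It is not the mt_list code: the cap value `DEMO_MAX_LOOPS` and all `demo_*` names are assumptions chosen for the illustration.

```c
#include <stdatomic.h>

/* Hypothetical upper bound on the number of busy-wait iterations. */
#define DEMO_MAX_LOOPS 0x8000u

/* Return the next wait length: double the previous one, up to the cap. */
static inline unsigned int demo_next_backoff(unsigned int loops)
{
	loops = loops ? loops << 1 : 1;
	return loops > DEMO_MAX_LOOPS ? DEMO_MAX_LOOPS : loops;
}

/* Take the lock, waiting a bit longer after each failed attempt. */
static void demo_lock(atomic_flag *lock)
{
	unsigned int loops = 0;

	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire)) {
		loops = demo_next_backoff(loops);
		/* busy wait; real code would typically use a CPU relax hint */
		for (volatile unsigned int i = 0; i < loops; i++)
			;
	}
}

/* Release the lock. */
static void demo_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}
```

A lower cap shortens the longest pause a waiter can take, which is the knob this kind of patch adjusts.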
2025-03-21 11:30:59 +01:00
Amaury Denoyelle
c5f8df8d55 BUG/MINOR: mux-quic: remove extra BUG_ON() in _qcc_send_stream()
The following patch fixed a BUG_ON() which could be triggered if RS/SS
emission was scheduled after stream local closure.
  7ee1279f4b8416435faba5cb93a9be713f52e4df
  BUG/MEDIUM: mux-quic: fix crash on RS/SS emission if already close local

qcc_send_stream() was rewritten as a wrapper around an internal
_qcc_send_stream() used to bypass the faulty BUG_ON(). However, an extra
unnecessary BUG_ON() was added by mistake in _qcc_send_stream().

This should not cause any issue, as the BUG_ON() is only active if <urg>
argument is false, which is not the case for RS/SS emission. However,
this patch is labelled as a bug as this BUG_ON() is unnecessary and may
cause issues in the future.

This should be backported up to 2.8, after the above mentioned patch.
2025-03-20 18:18:52 +01:00
Amaury Denoyelle
7ee1279f4b BUG/MEDIUM: mux-quic: fix crash on RS/SS emission if already close local
A BUG_ON() is present in qcc_send_stream() to ensure that emission is
never performed with a stream already closed locally. However, this
function is also used for RESET_STREAM/STOP_SENDING emission. No
protection exists to ensure that RS/SS is not scheduled after stream
local closure, which would result in this BUG_ON() crash.

This crash can be triggered with the following QUIC client sequence :
1. SS is emitted to open a new stream. QUIC-MUX schedules a RS emission
   in response and the stream is locally closed.
2. An invalid HTTP/3 request is sent on the same stream, for example
   with duplicated pseudo-headers. The objective is to ensure
   qcc_abort_stream_read() is called after stream closure, which results
   in the following backtrace.

 0x000055555566a620 in qcc_send_stream (qcs=0x7ffff0061420, urg=1, count=0) at src/mux_quic.c:1633
 1633            BUG_ON(qcs_is_close_local(qcs));
 [ ## gdb ## ] bt
 #0  0x000055555566a620 in qcc_send_stream (qcs=0x7ffff0061420, urg=1, count=0) at src/mux_quic.c:1633
 #1  0x000055555566a921 in qcc_abort_stream_read (qcs=0x7ffff0061420) at src/mux_quic.c:1658
 #2  0x0000555555685426 in h3_rcv_buf (qcs=0x7ffff0061420, b=0x7ffff748d3f0, fin=0) at src/h3.c:1454
 #3  0x0000555555668a67 in qcc_decode_qcs (qcc=0x7ffff0049eb0, qcs=0x7ffff0061420) at src/mux_quic.c:1315
 #4  0x000055555566c76e in qcc_recv (qcc=0x7ffff0049eb0, id=12, len=0, offset=23, fin=0 '\000',
     data=0x7fffe0049c1c "\366\r,\230\205\354\234\301;\2563\335\037k\306\334\037\260", <incomplete sequence \323>) at src/mux_quic.c:1901
 #5  0x0000555555692551 in qc_handle_strm_frm (pkt=0x7fffe00484b0, strm_frm=0x7ffff00539e0, qc=0x7fffe0049220, fin=0 '\000') at src/quic_rx.c:635
 #6  0x0000555555694530 in qc_parse_pkt_frms (qc=0x7fffe0049220, pkt=0x7fffe00484b0, qel=0x7fffe0075fc0) at src/quic_rx.c:980
 #7  0x0000555555696c7a in qc_treat_rx_pkts (qc=0x7fffe0049220) at src/quic_rx.c:1324
 #8  0x00005555556b781b in quic_conn_app_io_cb (t=0x7fffe0037f20, context=0x7fffe0049220, state=49232) at src/quic_conn.c:601
 #9  0x0000555555d53788 in run_tasks_from_lists (budgets=0x7ffff748e2b0) at src/task.c:603
 #10 0x0000555555d541ae in process_runnable_tasks () at src/task.c:886
 #11 0x00005555559c39e9 in run_poll_loop () at src/haproxy.c:2858
 #12 0x00005555559c41ea in run_thread_poll_loop (data=0x55555629fb40 <ha_thread_info+64>) at src/haproxy.c:3075

The proper solution is to not execute this BUG_ON() for RS/SS emission.
Indeed, it is valid and can be useful to emit these frames, even after
stream local closure.

To implement this, qcc_send_stream() has been rewritten as a mere
wrapper function around the new internal _qcc_send_stream(). The latter
is used only by QMUX for STREAM, RS and SS emission. The application layer
continues to use the original function for STREAM emission, with the
BUG_ON() still in place there.

This must be backported up to 2.8.
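
The wrapper arrangement can be pictured with the simplified sketch below. The structure and function names are placeholders invented for the example (the real functions take more arguments and do more work), and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a QUIC stream. */
struct demo_stream {
	bool closed_local; /* set once the stream is closed locally */
	int  scheduled;    /* number of times emission was scheduled */
};

/* Internal variant: no check on local closure, so it can also serve
 * RESET_STREAM/STOP_SENDING emission.
 */
static void demo_send_stream_internal(struct demo_stream *s)
{
	s->scheduled++;
}

/* Public variant for STREAM data: emitting on a locally closed stream
 * is considered a bug, hence the check is kept here only.
 */
static void demo_send_stream(struct demo_stream *s)
{
	assert(!s->closed_local);
	demo_send_stream_internal(s);
}

/* Aborting the read side schedules STOP_SENDING, which stays valid even
 * after local closure, so the unchecked variant is used.
 */
static void demo_abort_stream_read(struct demo_stream *s)
{
	demo_send_stream_internal(s);
}
```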
2025-03-20 17:32:14 +01:00
Aurelien DARRAGON
85f2f93d11 CLEANUP: promex: remove unused PROMEX_FL_{INFO,FRONT,BACK,LI,SRV} flags
Now promex metric dumping relies on stat_cols API, we don't make use of
these flags, so let's remove them.
2025-03-20 11:42:58 +01:00
Aurelien DARRAGON
2ab82124ec MINOR: stats: explicitly add frontend cap for ST_I_PX_REQ_TOT
While being a generic metric, ST_I_PX_REQ_TOT is handled specifically for
the frontend case. But the frontend capability isn't set for that metric.
It is actually quite misleading, because the capability may be checked
to see whether the metric is relevant for a given scope, yet it is
relevant for frontend scope.

In this patch we also add the frontend capability for the metric.
2025-03-20 11:42:43 +01:00
Aurelien DARRAGON
a5aadbd512 MEDIUM: promex: switch to using stat_cols_px for front/back/server metrics
Now the stat_cols_px array contains all info that prometheus requires,
stop using the promex_st_metrics array that contains redundant infos.

As for ("MEDIUM: promex: switch to using stat_cols_info for global
metrics"), initial goal was to completely get rid of promex_st_metrics
array, but it turns out it is still required but only for the name
mapping part now. So in this commit we change it from complex structure
array (with redundant info) to a simple ist array with the
metric id:promex name mapping. If a metric name is not defined there, then
promex ignores it.
2025-03-20 11:40:07 +01:00
Aurelien DARRAGON
d31ef6134a MINOR: promex: expose ST_I_INF_WARNINGS (AKA total_warnings) metric
It has been requested to have the ST_I_INF_WARNINGS metric available from
prometheus, let's define it in promex_global_metrics ist array so that
prometheus starts advertising it.
2025-03-20 11:39:16 +01:00
Aurelien DARRAGON
1adc796c4b MEDIUM: promex: switch to using stat_cols_info for global metrics
Now the stat_cols_info array contains all info that prometheus requires,
stop using the promex_global_metrics array that contains redundant infos.

Initial goal was to completely drop the promex_global_metrics array.
However this turned out to be impractical as prometheus stats rely on a
custom name that cannot be derived from stat_cols_info[], unless we add
a specific ".promex_name" field or similar to name the stats for
prometheus. This is what was carried over on a first attempt but it proved
to burden stat_cols_info[] array (not only memory wise, it is quite
confusing to see promex in the main codebase, given that prometheus is
shipped as an optional add-on).

The new strategy consists in revamping the promex_global_metrics array
from promex_metric (with all redundant fields for metrics) to a simple
ID<==>IST mapping. If the metric is mapped, then it means promex addon
should advertise it (using the name provided in the mapping). Now for
all the metric retrieval, no longer rely on built-in hardcoded values
but instead leverage the new stat cols API.

The tricky part is the .type association because the general rule doesn't
apply for all metrics as it seems that we stated that some non-counters
oriented metrics (at least from haproxy point of view) had to be presented
as counter metrics. So in this patch we add some special treatment for
those metrics to emulate the old behavior. If that's not relevant in the
future, it may be removed. But this requires to ensure that promex users
will properly cope with that change. At least for now, no change of
behavior should be expected.
2025-03-20 11:38:56 +01:00
Aurelien DARRAGON
af68343a56 MINOR: stats: use stat_col storage stat_cols_info
Use stat_col storage for stat_cols_info[] array instead of name_desc.

As documented in 65624876f ("MINOR: stats: introduce a more expressive
stat definition method"), stat_col supersedes name_desc storage but
it remains backward compatible. Here we migrate to the new API to be
able to further extend stat_cols_info[] in following patches.
2025-03-20 11:38:32 +01:00
Aurelien DARRAGON
8aa8626d12 MINOR: stats: add .cap for some static metrics
Goal is to merge promex metrics definition into the main one.
Promex metrics will use the metric capability to know available scopes,
thus only metrics relevant for prometheus were updated.
2025-03-20 11:38:17 +01:00
Aurelien DARRAGON
9c60fc9fe1 MINOR: stats: STATS_PX_CAP___B_ macro
STATS_PX_CAP___B_ points to STATS_PX_CAP_BE, it is just an alias
for consistency, like STATS_PX_CAP____S which points to
STATS_PX_CAP_SRV.
2025-03-20 11:37:47 +01:00
Aurelien DARRAGON
3c1b00b127 MINOR: stats: add .generic explicit field in stat_col struct
Further extend logic implemented in 65624876 ("MINOR: stats: introduce a
more expressive stat definition method") and 4e9e8418 ("MINOR: stats:
prepare stats-file support for values other than FN_COUNTER"): we don't
rely anymore on the presence of the capability to know if the metric is
generic or not. This is because it prevents us from setting a capability
on static statistics. Yet it could be useful to set the capability even
on static metrics, thus we add a dedicated .generic bit to tell haproxy
that the metric is generic and can be handled automatically by the API.

Also, ME_NEW_* helpers are now explicitly associated to generic metric
definition (as it was already the case before) to avoid ambiguities.
It may change in the future as we may need to use the new definition
method to define static metrics (without the generic bit set). But for
now it isn't the case as this new definition was implemented for generic
metrics support in the first place. If we want to define static metrics
using the API, we could add a new set of helpers for instance.
2025-03-20 11:37:21 +01:00
Christopher Faulet
e87397bc7d BUG/MINOR: mux-h2: Reset streams with NO_ERROR code if full response was already sent
On frontend side, when a stream is shut while the response was already fully
sent, it was cancelled by sending a RST_STREAM(CANCEL) frame. However, it is
not accurate. The CANCEL error code must only be used if the response headers
were sent, but not the full response. As stated in RFC 9113, when the
response was fully sent, to stop the request sending, a RST_STREAM with an
error code of NO_ERROR must be sent.

This patch should solve the issue #1219. It must be backported to all stable
versions.
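
The decision described here boils down to a two-way choice, sketched below. The numeric values are the HTTP/2 error codes registered in RFC 9113; the `demo_*` names are placeholders for the example and the sketch only covers the two cases discussed in this message.

```c
#include <stdbool.h>

/* HTTP/2 error codes as registered in RFC 9113. */
enum demo_h2_err {
	DEMO_H2_NO_ERROR = 0x0,
	DEMO_H2_CANCEL   = 0x8,
};

/* Pick the RST_STREAM error code to use when a frontend stream is shut:
 * NO_ERROR if the full response was already sent, CANCEL if only part
 * of it (e.g. the headers) was sent.
 */
static enum demo_h2_err demo_h2_shut_errcode(bool response_fully_sent)
{
	return response_fully_sent ? DEMO_H2_NO_ERROR : DEMO_H2_CANCEL;
}
```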
2025-03-20 08:36:06 +01:00
William Lallemand
2fb6270910 MEDIUM: ssl/ckch: make the ckch_conf more generic
The ckch_store_load_files() function makes specific processing for
PARSE_TYPE_STR as if it was a type only used for paths.

This patch changes a little bit the way it's done,
PARSE_TYPE_STR is only meant to strdup() a string and stores the
resulting pointer in the ckch_conf structure.

Any processing regarding the path is now done in the callback.

Since the callbacks were basically doing the same thing, they were
transformed into the DECLARE_CKCH_CONF_LOAD() macro, which allows
some templating of these functions.

The resulting ckch_conf_load_* functions will do the same as before,
except they will also do the path processing instead of letting
ckch_store_load_files() do it, which means we don't need the "base"
member anymore in the struct ckch_conf_kws.
2025-03-19 18:08:40 +01:00
William Lallemand
b0ad777902 MINOR: tools: path_base() concatenates a path with a base path
With the SSL configuration, crt-base and key-base are often used. These
keywords concatenate the base path with the path when the path does not
start with '/'.

This is done at several places in the code, so a function to do this
would be better to standardize the code.
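
A minimal sketch of such a helper is given below. The name `demo_path_base()`, its signature and its error convention are assumptions made for the example and may differ from the function actually added to tools.c.

```c
#include <stdio.h>

/* Build into <dst> (of <size> bytes) the path to use for <path>: if a
 * base is set and <path> is relative, the result is "<base>/<path>",
 * otherwise <path> is copied as-is. Returns 0 on success, -1 if the
 * result does not fit.
 */
static int demo_path_base(const char *path, const char *base,
                          char *dst, size_t size)
{
	int ret;

	if (!base || !*base || path[0] == '/')
		ret = snprintf(dst, size, "%s", path);
	else
		ret = snprintf(dst, size, "%s/%s", base, path);

	return (ret < 0 || (size_t)ret >= size) ? -1 : 0;
}
```

For instance, with a base of "/etc/ssl", the relative path "site.pem" becomes "/etc/ssl/site.pem" while "/opt/certs/site.pem" is left untouched.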
2025-03-19 17:59:31 +01:00
Aurelien DARRAGON
21601f4a27 BUG/MEDIUM: hlua/cli: fix cli applet UAF in hlua_applet_wakeup()
Recent commit e5e36ce09 ("BUG/MEDIUM: hlua/cli: Fix lua CLI commands
to work with applet's buffers") revealed a bug in hlua cli applet handling.

Indeed, playing with Willy's lua tetris script on the cli, a segfault
would be encountered when forcefully closing the session by sending a
CTRL+C on the terminal.

In fact the crash was caused by a UAF: while the cli applet was already
freed, the lua task responsible for waking it up would still point to it.
Thus hlua_applet_wakeup() could be called even if the applet didn't exist
anymore.

To fix the issue, in hlua_cli_io_release_fct() we must also free the hlua
task linked to the applet, like we already do for
hlua_applet_tcp_release() and hlua_applet_http_release().

While this bug exists on stable versions (where it should be backported
too for precaution), it only seems to be triggered starting with 3.0.
2025-03-19 17:03:28 +01:00
Valentine Krasnobaeva
6986e3f41f MINOR: limits: fix check_if_maxsock_permitted description
Fix typo in check_if_maxsock_permitted() description.
2025-03-18 17:38:04 +01:00
Valentine Krasnobaeva
060f441199 BUG/MINOR: limits: compute_ideal_maxconn: don't cap remain if fd_hard_limit=0
'global.fd_hard_limit' stays uninitialized, if haproxy is started with -m
(global.rlimit_memmax). 'remain' is the MAX between soft and hard process fd
limits. It will be always bigger than 'global.fd_hard_limit' (0) in this case.

So, if we reassign 'remain' to the 'global.fd_hard_limit' unconditionally,
the calculated 'maxconn' will even be negative and the DEFAULT_MAXCONN (100)
will be set as the 'ideal_maxconn'.

During the 'global.maxconn' calculations in set_global_maxconn(), if the
provided 'global.rlimit_memmax' is quite big, system will refuse to calculate
based on its 'global.maxconn' and we will do a fallback to the 'ideal_maxconn',
which is 100.

Same problem for the configs with SSL frontends and backends.

This fixes the issue #2899.

This should be backported to v3.1.0.
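
The condition at the heart of the fix can be illustrated as follows. This is a standalone sketch with a hypothetical helper name (`demo_cap_remain()`), not the code from compute_ideal_maxconn().

```c
/* Cap the number of usable file descriptors to the configured hard
 * limit, but only when such a limit was actually set: a value of 0
 * means "no limit configured" and must not be used as a cap.
 */
static int demo_cap_remain(int remain, int fd_hard_limit)
{
	if (fd_hard_limit && remain > fd_hard_limit)
		remain = fd_hard_limit;
	return remain;
}
```

With an unconditional assignment, an unset limit would turn 'remain' into 0, which is what led to the negative 'maxconn' described above.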
2025-03-18 17:37:33 +01:00
Willy Tarreau
6336b636f7 MINOR: cli/server: don't take thread isolation to check for srv-removable
Thanks to the previous commits, we now know that "wait srv-removable"
does not require thread isolation, as long as 3372a2ea00 ("BUG/MEDIUM:
queues: Stricly respect maxconn for outgoing connections") and c880c32b16
("MINOR: stream: decrement srv->served after detaching from the list")
are present. Let's just get rid of thread_isolate() here, which can
consume a lot of CPU on highly threaded machines when removing many
servers at once.
2025-03-18 17:36:02 +01:00
Willy Tarreau
aad8e74cb9 CLEANUP: server: make it clear that srv_check_for_deletion() is thread-safe
This function was marked as requiring thread isolation because its code
was extracted from cli_parse_delete_server() and was running under
isolation. But upon closer inspection, and using atomic loads to check
a few counters, it is actually safe to run without isolation, so let's
reflect that in its description.

However, it remains true that cli_parse_delete_server() continues to call
it under isolation.
2025-03-18 17:36:02 +01:00
Willy Tarreau
0e8c573b4b MINOR: server: simplify srv_has_streams()
Now that thanks to commit c880c32b16 ("MINOR: stream: decrement
srv->served after detaching from the list") we can trust srv->served,
let's use it and no longer loop on threads when checking if a server
still has streams attached to it. This will be much cheaper and will
result in keeping isolation for a shorter time in the "wait" command.
2025-03-18 17:36:02 +01:00
Aurelien DARRAGON
4651c4edd5 BUG/MINOR: hlua: fix optional timeout argument index for AppletTCP:receive()
Baptiste reported that using the new optional timeout argument introduced
in 19e48f2 ("MINOR: hlua: add an optional timeout to AppletTCP:receive()")
the following error would occur at some point:

runtime error: file.lua:lineno: bad argument #-2 to 'receive' (number
expected, got light userdata) from [C]: in method 'receive...

In fact this is caused by exp_date being retrieved using relative index -1
instead of absolute index 3. Indeed, while using a relative index is fine
most of the time when we trust the stack, when combined with yielding, the
top of the stack when resuming is not necessarily the same as when the
function was first called (i.e. if some data was pushed to the stack in the
yieldable function itself). As such, it is safer to use an explicit index
to access the exp_date variable at position 3 on the stack.

It was confirmed that doing so addresses the issue.

No backport needed unless 19e48f2 is.
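
To make the difference between the two addressing modes concrete, here is a toy model of a Lua-like stack. It does not use the Lua C API: `demo_stack`, `demo_push()` and `demo_get()` are invented for the illustration and perform no bounds checking.

```c
#define DEMO_STACK_MAX 8

/* Toy stack: slots[0] is the bottom, <top> is the number of used slots. */
struct demo_stack {
	int slots[DEMO_STACK_MAX];
	int top;
};

/* Push a value on top of the stack (caller ensures there is room). */
static void demo_push(struct demo_stack *st, int v)
{
	st->slots[st->top++] = v;
}

/* Lua-like indexing: positive indexes count from the bottom (1 is the
 * first argument), negative ones from the top (-1 is the last pushed
 * value). The caller must pass a valid index.
 */
static int demo_get(const struct demo_stack *st, int idx)
{
	int pos = idx > 0 ? idx - 1 : st->top + idx;

	return st->slots[pos];
}
```

With three arguments on the stack, indexes 3 and -1 designate the same slot. As soon as the function pushes one more value before yielding, -1 points to that new value while 3 still points to the third argument.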
2025-03-18 16:48:32 +01:00
Willy Tarreau
c880c32b16 MINOR: stream: decrement srv->served after detaching from the list
In commit 3372a2ea00 ("BUG/MEDIUM: queues: Stricly respect maxconn for
outgoing connections"), it has been ensured that srv->served is held
as long as possible around the periods where a stream is attached to a
server. However, it's decremented early when entering sess_change_server,
and actually just before detaching from that server's list. While there
is theoretically nothing wrong with this, it prevents us from looking at
this counter to know if streams are still using a server or not.

We could imagine decrementing it much later but that wouldn't work with
leastconn, since that algo needs ->served to be final before calling
lbprm.server_drop_conn(). Thus what we're doing here is to detach from
the server, then decrement ->served, and only then call the LB callback
to update the server's position in the tree. At this moment the stream
doesn't know the server anymore anyway (except via this function's
local variable) so it's safe to consider that no stream knows the server
once the variable reaches zero.
2025-03-18 11:43:52 +01:00
Aurelien DARRAGON
7895726bff BUG/MINOR: log: prevent saddr NULL deref in syslog_io_handler()
In ad0133cc ("MINOR: log: handle log-forward "option host""), we
de-reference saddr without first checking if saddr is NULL. In practice
saddr shouldn't be null, but it may be the case if memory error happens
for tcp syslog handler so we must assume that it can be NULL at some
point.

To fix the bug, we simply check for NULL before de-referencing it
under syslog_io_handler(), as the function comment suggests.

No backport needed unless ad0133cc is.
2025-03-18 00:13:19 +01:00
William Lallemand
29b4b985c3 MINOR: jws: use jwt_alg type instead of a char
This patch implements the function EVP_PKEY_to_jws_algo() which returns
a jwt_alg compatible with the private key.

This value can then be passed to jws_b64_protected() and
jws_b64_signature(), which were modified to take a jwt_alg instead of a char.
2025-03-17 18:06:34 +01:00
Willy Tarreau
19e48f237f MINOR: hlua: add an optional timeout to AppletTCP:receive()
TCP services might want to be interactive, and without a timeout on
receive(), the possibilities are a bit limited. Let's add an optional
timeout in the 3rd argument to possibly limit the wait time. In this
case if the timeout strikes before the requested size is complete,
a possibly incomplete block will be returned.
2025-03-17 16:19:34 +01:00
Valentine Krasnobaeva
557f62593f MINOR: cpu-topo: fix unused stack var 'cpu2' reported by coverity
Coverity has reported that cpu2 seems sometimes unused in
cpu_fixup_topology():

*** CID 1593776:  Code maintainability issues  (UNUSED_VALUE)
/src/cpu_topo.c: 690 in cpu_fixup_topology()
684                             continue;
685
686                     if (ha_cpu_topo[cpu].cl_gid != curr_id) {
687                             if (curr_id >= 0 && cl_cpu <= 2)
688                                     small_cl++;
689                             cl_cpu = 0;
>>>     CID 1593776:  Code maintainability issues  (UNUSED_VALUE)
>>>     Assigning value from "cpu" to "cpu2" here, but that stored value is overwritten before it can be used.
690                             cpu2 = cpu;
691                             curr_id = ha_cpu_topo[cpu].cl_gid;
692                     }
693                     cl_cpu++;
694             }
695

That's right. The 'cpu2' automatic/stack variable is used only in for() loop
scopes to save the IDs of the CPUs we are interested in. In the loop pointed
out by coverity, this variable is not used for further processing within the
loop's scope. It is then always reinitialized to 0 in the following loops.

This fixes GitHub issue #2895.
2025-03-17 14:53:36 +01:00
William Lallemand
de67f25a7e MINOR: jws: add new functions in jws.h
Add signatures of jws_b64_payload(), jws_b64_protected(),
jws_b64_signature(), jws_flattened() which allows to create a complete
JWS flattened object.
2025-03-17 11:51:52 +01:00
Willy Tarreau
e3fd9970a9 MINOR: cpu-topo: add a new "resource" cpu-policy
This cpu policy keeps the smallest CPU cluster. This can
be used to limit the resource usage to the strict minimum
that still delivers decent performance, for example to
try to further reduce power consumption or minimize the
number of cores needed on some rented systems for a
sidecar setup, in order to scale the system down more
easily. Note that if a single cluster is present, it
will still be fully used.

When started on a 64-core EPYC gen3, it uses only one CCX
with 8 cores and 16 threads, all in the same group.
2025-03-14 18:33:16 +01:00
Willy Tarreau
ad3650c354 MINOR: cpu-topo: add a new "efficiency" cpu-policy
This cpu policy tries to evict performant core clusters and only
focuses on efficiency-oriented ones. On an intel i9-14900k, we can
get 525k rps using 8 performance cores, versus 405k when using all
24 efficiency cores. In some cases the power savings might be more
desirable (e.g. scalability tests on a developer's laptop), or the
performance cores might be better suited for another component
(application or security component).
2025-03-14 18:33:16 +01:00
Willy Tarreau
dcae2fa4a4 MINOR: cpu-topo: add a new "performance" cpu-policy
This cpu policy tries to evict efficient core clusters and only
focuses on performance-oriented ones. On an intel i9-14900k, we can
get 525k rps using only 8 cores this way, versus 594k when using all
24 cores. The gains from using all these cores are not significant
enough to waste them on this. Also these cores can be much slower
at doing SSL handshakes so it can make sense to evict them. Better
keep the efficiency cores for network interrupts for example.

Also, on a developer's machine it can be convenient to keep all these
cores for the local tasks and extra tools (load generators etc).
2025-03-14 18:33:16 +01:00
Willy Tarreau
96cd420dc3 MEDIUM: cpu-topo: let the "group-by-cluster" split groups
When a cluster is too large to fit into a single group, let's split it
into two equal groups, which will still be allowed to use all the CPUs
of the cluster. This allows haproxy to start all the threads with a
minimum number of groups (e.g. 2x40 for 80 cores).
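
The arithmetic behind the "2x40 for 80 cores" example can be sketched as below. The per-group limit of 64 threads is an assumption of this example, as are the `demo_*` names, and the sketch generalizes the split to any number of groups whereas the message only mentions two equal groups.

```c
/* Assumed maximum number of threads a single group can hold. */
#define DEMO_MAX_THREADS_PER_GROUP 64

struct demo_split {
	int groups;            /* number of thread groups for the cluster */
	int threads_per_group; /* threads in the largest group */
};

/* Split a cluster of <cpus> CPUs into the smallest number of groups of
 * roughly equal size that each fit within the per-group limit.
 */
static struct demo_split demo_split_cluster(int cpus)
{
	struct demo_split s;

	s.groups = (cpus + DEMO_MAX_THREADS_PER_GROUP - 1) / DEMO_MAX_THREADS_PER_GROUP;
	if (s.groups < 1)
		s.groups = 1;
	s.threads_per_group = (cpus + s.groups - 1) / s.groups;
	return s;
}
```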
2025-03-14 18:33:16 +01:00
Willy Tarreau
8aeb096740 MINOR: cpu-topo: add cpu-policy "group-by-cluster"
This policy forms thread groups from the CPU clusters, and binds all the
threads in them to all the CPUs of the cluster. This is recommended on
systems with bad inter-CCX latencies. It was shown to simply triple the
performance with queuing on a 64-core EPYC without having to manually
assign the cores with cpu-map.
2025-03-14 18:33:16 +01:00
Willy Tarreau
aaa4080b8b CLEANUP: thread: now remove the temporary CPU node binding code
This is now superseded by the default "safe" cpu-policy, and every time
it's used, that code was bypassed anyway since global.nbthread was set.
We can now safely remove it. Note that for other policies which do not
set a thread count nor further restrict CPUs (such as "none", or even
"safe" when finding a single node), we continue to go through the fallback
code that automatically assigns CPUs to threads and counts them.
2025-03-14 18:33:16 +01:00
Willy Tarreau
56d939866b MEDIUM: cpu-topo: use the "first-usable-node" cpu-policy by default
This now turns the cpu-policy to "first-usable-node" by default, so that
we preserve the current default behavior consisting in binding to the
first node if nothing was forced. If a second node is found,
global.nbthread is set and the previous code will be skipped.
2025-03-14 18:33:16 +01:00
Willy Tarreau
7fc6cdd0b1 MINOR: cpu-topo: add a 'first-usable-node' cpu policy
This is a reimplementation of the current default policy. It binds to
the first node having usable CPUs if found, and drops CPUs from the
second and next nodes.
2025-03-14 18:33:16 +01:00
Willy Tarreau
156430ceb6 MINOR: cpu-topo: add a CPU policy setting to the global section
We'll need to let the user decide what's best for their workload, and in
order to do this we'll have to provide tunable options. For that, we're
introducing struct ha_cpu_policy which contains a name, a description
and a function pointer. The purpose will be to use that function pointer
to choose the best CPUs to use and to set the number of threads and
thread-groups; it will be called during the thread setup phase. The
only supported policy for now is "none" which doesn't set/touch anything
(i.e. all available CPUs are used).
2025-03-14 18:33:16 +01:00
Willy Tarreau
9a8e8af11a MINOR: cpu-topo: add "only-cluster" and "drop-cluster" to cpu-set
These are processed after the topology is detected, and they allow to
restrict binding to or evict CPUs matching the indicated hardware
cluster number(s). It can be used to bind to only some clusters, such
as CCX or different energy efficiency cores. For this reason, here we
use the cluster's local ID (local to the node).
2025-03-14 18:33:16 +01:00
Willy Tarreau
a946cfa8b5 MINOR: cpu-topo: add "only-core" and "drop-core" to cpu-set
These are processed after the topology is detected, and they allow to
restrict binding to or evict CPUs matching the indicated hardware
core number(s). It can be used to bind to only some clusters as well
as to evict efficient cores whose number is known.
2025-03-14 18:33:16 +01:00
Willy Tarreau
c591c9d6a6 MINOR: cpu-topo: add "only-thread" and "drop-thread" to cpu-set
These are processed after the topology is detected, and they allow to
restrict binding to or evict CPUs matching the indicated hardware
thread number(s). It can be used to reserve even threads for HW IRQs
and odd threads for haproxy for example, or to evict efficient cores
that do only have thread #0.
2025-03-14 18:33:16 +01:00
Willy Tarreau
c93ee25054 MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set
These are processed after the topology is detected, and they allow to
restrict binding to or evict CPUs matching the indicated node(s).
2025-03-14 18:33:16 +01:00
Willy Tarreau
7263366606 MINOR: cpu-topo: ignore excess of too small clusters
On some Arm systems (typically A76/N1) where CPUs can be associated in
pairs, clusters are reported while they have no impact on I/O etc.
Yet it's possible to have tens of clusters of 2 CPUs each, which is
counterproductive since it does not even allow to start enough threads.

Let's detect this situation as soon as there are at least 4 clusters
having each 2 CPUs or less, which is already very suspicious. In this
case, all these clusters will be reset as meaningless. In the worst
case if needed they'll be re-assigned based on L2/L3.
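
The detection rule stated above (at least 4 clusters of 2 CPUs or less) can be expressed as a small predicate. The function name and the plain array input are assumptions of this sketch; the real code walks the ha_cpu_topo array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true when the reported clusters look meaningless, i.e. when at
 * least 4 of them contain 2 CPUs or less. <cpus_per_cluster> holds the
 * number of CPUs of each of the <nb> clusters.
 */
static bool demo_clusters_look_bogus(const int *cpus_per_cluster, size_t nb)
{
	int small = 0;

	for (size_t i = 0; i < nb; i++) {
		if (cpus_per_cluster[i] <= 2)
			small++;
	}
	return small >= 4;
}
```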
2025-03-14 18:33:12 +01:00
Willy Tarreau
aa4776210b MINOR: cpu-topo: create an array of the clusters
The goal here is to keep an array of the known CPU clusters, because
we'll use that often to decide on the performance of a cluster and
its relevance compared to other ones. We'll store the number of CPUs
in it, the total capacity etc. For the capacity, we count one unit
per core, and 1/3 of it per extra SMT thread, since this is roughly
what has been measured on modern CPUs.

In order to ease debugging, they're also dumped with -dc.
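
The capacity rule (one unit per core plus one third of a unit per extra SMT thread) can be sketched with integer arithmetic by counting in thirds of a unit. The helper name and the choice of thirds as the scale are assumptions of this example, not the representation used in the patch.

```c
/* Return the capacity of a cluster expressed in thirds of a unit: each
 * core counts for 3 thirds (one unit) and each extra SMT thread for 1
 * third. <threads> is the total number of hardware threads.
 */
static unsigned int demo_cluster_capa_thirds(unsigned int cores,
                                             unsigned int threads)
{
	unsigned int extra = threads > cores ? threads - cores : 0;

	return cores * 3 + extra;
}
```

Under this model, a cluster of 8 cores and 16 threads weighs 32 thirds, i.e. a bit less than 11 units, against 8 units for the same cluster without SMT.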
2025-03-14 18:30:31 +01:00
Willy Tarreau
204ac3c0b6 MINOR: cpu-topo: consider capacity when forming clusters
By using the cluster+capacity sorting function we can detect
heterogeneous clusters which are not properly reported. Thanks to this,
the following misnumbered machine featuring 4 big cores, 4 medium ones
and 4 small ones is properly detected with its clusters correctly
assigned:

      [keep] thr=  0 -> cpu=  0 pk=00 no=00 cl=000 ts=000 capa=1024
      [keep] thr=  1 -> cpu=  1 pk=00 no=00 cl=002 ts=008 capa=278
      [keep] thr=  2 -> cpu=  2 pk=00 no=00 cl=002 ts=009 capa=278
      [keep] thr=  3 -> cpu=  3 pk=00 no=00 cl=002 ts=010 capa=278
      [keep] thr=  4 -> cpu=  4 pk=00 no=00 cl=002 ts=011 capa=278
      [keep] thr=  5 -> cpu=  5 pk=00 no=00 cl=001 ts=004 capa=905
      [keep] thr=  6 -> cpu=  6 pk=00 no=00 cl=001 ts=005 capa=905
      [keep] thr=  7 -> cpu=  7 pk=00 no=00 cl=001 ts=006 capa=866
      [keep] thr=  8 -> cpu=  8 pk=00 no=00 cl=001 ts=007 capa=866
      [keep] thr=  9 -> cpu=  9 pk=00 no=00 cl=000 ts=001 capa=984
      [keep] thr= 10 -> cpu= 10 pk=00 no=00 cl=000 ts=002 capa=984
      [keep] thr= 11 -> cpu= 11 pk=00 no=00 cl=000 ts=003 capa=1024

Also this has the benefit of always assigning highest performance
clusters with the smallest IDs so that simple configs can decide to
simply bind to cluster 0 or clusters 0,1 and benefit from optimal
performance.
2025-03-14 18:30:31 +01:00
Willy Tarreau
4a6eaf6c5e MINOR: cpu-topo: add a function to sort by cluster+capacity
The purpose here is to detect heterogeneous clusters which are not
properly reported, based on the exposed information about the cores
capacity. The algorithm here consists in sorting CPUs by capacity
within a cluster, and considering as equal all those which have 5%
or less difference in capacity with the previous one. This allows
large clusters of more than 5% total between extremities, while
keeping apart those where the limit is more pronounced. This is
quite common in embedded environments with big.little systems, as
well as on some laptops.
2025-03-14 18:30:31 +01:00
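The 5% rule above can be sketched like this. It is a simplified illustration which ignores the reported cluster ID and only groups CPUs by capacity; the structure, the function names and the integer comparison are assumptions, only the "5% or less than the previous one" criterion comes from the commit message.

```c
#include <stdlib.h>

/* Hypothetical CPU entry: a capacity and the group it ends up in. */
struct cpu_ent {
	int capa;
	int grp;
};

/* qsort() callback sorting by decreasing capacity */
static int cmp_capa_desc(const void *a, const void *b)
{
	const struct cpu_ent *l = a, *r = b;

	return (r->capa > l->capa) - (r->capa < l->capa);
}

/* Sorts CPUs by decreasing capacity, then starts a new group each time
 * a CPU is more than 5% slower than the previous one. Returns the number
 * of groups. Comparing to the previous entry (not to the first one of
 * the group) is what allows more than 5% between both ends of a group.
 */
static int group_by_capacity(struct cpu_ent *cpu, size_t nb)
{
	size_t i;
	int grp = 0;

	if (!nb)
		return 0;

	qsort(cpu, nb, sizeof(*cpu), cmp_capa_desc);
	cpu[0].grp = 0;
	for (i = 1; i < nb; i++) {
		int diff = cpu[i - 1].capa - cpu[i].capa;

		if (diff * 100 > cpu[i - 1].capa * 5)
			grp++;
		cpu[i].grp = grp;
	}
	return grp + 1;
}
```

Fed with capacities such as 1024, 984, 905, 866 and 278, this keeps 1024 and 984 together, 905 and 866 together, and 278 apart.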
Willy Tarreau
0290b807dd MINOR: cpu-topo: renumber cores to avoid holes and make them contiguous
Due to the way core numbers are assigned and the presence of SMT on
some of them, some holes may remain in the array. Let's renumber them
to plug holes once they're known, following pkg/node/die/llc etc, so
that they're local to a (pkg,node) set. Now an i7-14700 shows cores
0 to 19, not 0 to 27.
2025-03-14 18:30:31 +01:00
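A minimal sketch of such a renumbering is shown below. It assumes the core IDs were already sorted in topology order and works on a plain array of integers, which is a simplification made for the example (the function name is hypothetical).

```c
#include <stddef.h>

/* Renumbers the IDs in <id> so that they become contiguous starting
 * from zero. The array must be sorted so that equal IDs are adjacent;
 * entries sharing an ID (e.g. SMT siblings of a same core) keep sharing
 * the new one. Returns the number of distinct IDs.
 */
static int renumber_contiguous(int *id, size_t nb)
{
	size_t i;
	int next = -1, prev = 0;

	for (i = 0; i < nb; i++) {
		if (i == 0 || id[i] != prev)
			next++;
		prev = id[i];   /* remember the old value before overwriting */
		id[i] = next;
	}
	return next + 1;
}
```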
Willy Tarreau
b633b9d422 MINOR: cpu-topo: assign an L3 cache if more than 2 L2 instances
On some machines, L3 is not always reported (e.g. on some lx2 or some
armada8040). But some also don't have L3 (core 2 quad). However, no L3
when there are more than 2 L2 is quite unheard of, and while we don't
really care about firing 2 thread groups for 2 L2, we'd rather avoid
doing this if there are 8! In this case we'll declare an L3 instance
to fix the situation. This allows small machines to continue to start
with two groups while not deviating on large ones.
2025-03-14 18:30:31 +01:00
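The rule above could look like the following sketch. The structure, the function names and the use of -1 for "not reported" are assumptions for the example; only the "more than 2 L2 instances and no L3" condition comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical per-CPU cache IDs, -1 meaning "not reported" */
struct cpu_cache {
	int l2_id;
	int l3_id;
};

/* Counts the distinct non-negative L2 IDs (quadratic, fine for a sketch) */
static int count_distinct_l2(const struct cpu_cache *c, size_t nb)
{
	size_t i, j;
	int count = 0;

	for (i = 0; i < nb; i++) {
		if (c[i].l2_id < 0)
			continue;
		for (j = 0; j < i; j++)
			if (c[j].l2_id == c[i].l2_id)
				break;
		if (j == i)
			count++;
	}
	return count;
}

/* If no CPU reports an L3 and there are more than 2 L2 instances,
 * declares a single L3 instance (ID 0) shared by all CPUs.
 * Returns 1 if an L3 was created, otherwise 0.
 */
static int fixup_missing_l3(struct cpu_cache *c, size_t nb)
{
	size_t i;

	for (i = 0; i < nb; i++)
		if (c[i].l3_id >= 0)
			return 0;

	if (count_distinct_l2(c, nb) <= 2)
		return 0;

	for (i = 0; i < nb; i++)
		c[i].l3_id = 0;
	return 1;
}
```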
Willy Tarreau
d169758fa9 MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo
It's important that we don't leave unassigned IDs in the topology,
because the selection mechanism is based on index-based masks, so an
unassigned ID will never be kept. This is particularly visible on
systems where we cannot access the CPU topology, the package id, node id
and even thread id are set to -1, and all CPUs are evicted due to -1 not
being set in the "only-cpu" sets.

Here in new function "cpu_fixup_topology()", we assign them with the
smallest unassigned value. This function will be used to assign IDs
where missing in general.
2025-03-14 18:30:31 +01:00
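To illustrate the idea of the fixup above, here is a minimal sketch operating on a single array of IDs. It assumes that all unassigned entries (-1) receive the same value, namely the smallest non-negative one not used by any other CPU; this reading of "smallest unassigned value" as well as the function name are assumptions, not the actual code of cpu_fixup_topology().

```c
#include <stddef.h>

/* Replaces all negative (unassigned) IDs in <id> with the smallest
 * non-negative value that is not used by any entry. Returns that value.
 */
static int fixup_unassigned(int *id, size_t nb)
{
	size_t i;
	int val = 0, used;

	/* look for the smallest value not present in the array */
	do {
		used = 0;
		for (i = 0; i < nb; i++) {
			if (id[i] == val) {
				used = 1;
				val++;
				break;
			}
		}
	} while (used);

	for (i = 0; i < nb; i++)
		if (id[i] < 0)
			id[i] = val;
	return val;
}
```

On a system where nothing could be read, all IDs become 0, so index-based masks such as the "only-cpu" sets can match them again.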
Willy Tarreau
af648c7b58 MINOR: cpu-topo: assign clusters to cores without and renumber them
Due to the previous commit we can end up with cores not assigned
any cluster ID. For this, at the end we sort the CPUs by topology
and assign cluster IDs to remaining CPUs based on pkg/node/llc.
For example a 14900 now shows 5 clusters, one for the 8 p-cores,
and 4 of 4 e-cores each.

The local cluster numbers are per (node,pkg) ID so that any rule could
easily be applied on them, but we also keep the global numbers that
will help with thread group assignment.

We still need to force the assignment of distinct cluster IDs to cores
running on a different L3. For example the EPYC 74F3 is reported
as having 8 different L3s (which is true) and only one cluster.

Here we introduce a new function "cpu_compose_clusters()" that is called
from the main init code just after cpu_detect_topology() so that it's
not OS-dependent. It deals with this renumbering of all clusters in
topology order, taking care of considering any distinct LLC as being
on a distinct cluster.
2025-03-14 18:30:31 +01:00
Willy Tarreau
385360fe81 MINOR: cpu-topo: ignore single-core clusters
Some platforms (several armv7, intel 14900 etc) report one distinct
cluster per core. This is problematic as it cannot let clusters be
used to distinguish real groups of cores, and cannot be used to build
thread groups.

Let's just compare the cluster cpus to the siblings, and ignore it if
they exactly match. We must also take care of not falling back to
core_cpus_list, which can enumerate cores that already have their
cluster assigned (e.g. intel 14900 has 4 4-Ecore clusters in addition
to the 8 Pcores).
2025-03-14 18:30:31 +01:00
Willy Tarreau
a4471ea56d MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID
This will be used to detect and fix incorrect setups which report
the same cluster ID for multiple L3 instances.

The arrangement of functions in this file is becoming a real problem.
Maybe we should move all this to cpu_topo for example, and better
distinguish OS-specific and generic code.
2025-03-14 18:30:31 +01:00
Willy Tarreau
a8acdbd9fd MINOR: cpu-topo: implement a sorting mechanism by CPU locality
Once we've kept only the CPUs we want, the next step will be to form
groups and these ones are based on locality. Thus we'll have to sort by
locality. For now the locality is only inferred by the index. No grouping
is made at this point. For this we add the "cpu_reorder_by_locality"
function with a locality-based comparison function.
2025-03-14 18:30:31 +01:00
Willy Tarreau
18133a054d MINOR: cpu-topo: implement a sorting mechanism for CPU index
CPU selection will be performed by sorting CPUs according to
various criteria. For dumps however, that's really not convenient
and we'll need to reorder the CPUs according to their index only.
This is what the new function cpu_reorder_by_index() does. It's
called in thread_detect_count() before dumping the CPU topology.
2025-03-14 18:30:31 +01:00
Willy Tarreau
661d49a18a MINOR: cpu-topo: skip CPU properties that we've verified do not exist
A number of entries under /cpu/cpu%d only exist on certain kernel
versions, certain archs and/or with certain modules loaded. It's
pointless to insist on trying to read them all for all CPUs when
we've already verified they do not exist. Thus let's use stat()
the first time prior to checking some of them, and only try to
access them when they really exist. This almost completely
eliminates the large number of ENOENT that was visible in strace
during startup.
2025-03-14 18:30:31 +01:00
Willy Tarreau
baeea08dba MINOR: cpu-topo: skip identification of non-existing CPUs
There's no point trying to read all entries under /cpu/cpu%d when that
one does not exist, so let's just skip it in this case.
2025-03-14 18:30:31 +01:00
Willy Tarreau
8542c79f9d MINOR: cpu-topo: skip CPU detection when /sys/.../cpu does not exist
There's no point scanning all entries when /cpu doesn't exist in the
first place. Let's check once for it and skip the loop in this case.
2025-03-14 18:30:30 +01:00
Willy Tarreau
c5ddf4a5b2 MINOR: cpu-topo: boost the capacity of performance cores with cpufreq
Cpufreq alone isn't a good metric on heterogeneous CPUs because efficient
cores can reach almost as high frequencies as performant ones. Tests have
shown that boosting performance cores by 50% gives a pretty accurate
estimate of the performance to expect on modern CPUs, and that counting
+33% per extra SMT thread is reasonable as well. We don't have the info
about the core's quality, but using the presence of SMT is a reasonable
approach in this case, given that efficiency cores will not use it.

As an example, using one thread of each of the 8 P-cores of an intel
i9-14900k gives 395k rps for a corrected total capacity of 69.3k, using
the 16 E-cores gives 40.5k for a total capacity of 70.4k, and using both
threads of 6 P-cores gives 41.1k for a total capacity of 69.6k. Thus the
3 same scores deliver the same performance in various combinations.
2025-03-14 18:30:30 +01:00
Willy Tarreau
e4aa13e786 MINOR: cpu-topo: use cpufreq before acpi cppc
The acpi_cppc method was found to take about 5ms per CPU on a 64-core
EPYC system, which is plain unacceptable as it delays the boot by half
a second. Let's use the less accurate cpufreq first, which should be
sufficient anyway since many systems do not have acpi_cppc. We'll only
fall back to acpi_cppc for systems without cpufreq. If it were to be
an issue over time, we could also automatically consider that all
threads of the same core or even of the same cluster run at the same
speed (when a cluster is known to be accurate).
2025-03-14 18:30:30 +01:00
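The resulting lookup order (cpu_capacity when exposed, then cpufreq, then acpi_cppc as a last resort) can be sketched as below. The structure, the enum and the function name are hypothetical, and the sketch takes already-read values as input instead of reading /sys. It also returns raw values without normalizing them, since the way the different sources are scaled against each other is not described here.

```c
/* Hypothetical values read for one CPU, -1 meaning "not available" */
struct cpu_hints {
	long cpu_capacity;      /* from cpu_capacity */
	long scaling_max_freq;  /* from cpufreq */
	long nominal_perf;      /* from acpi_cppc, slow to read */
};

enum capa_src {
	CAPA_NONE = 0,
	CAPA_CPU_CAPACITY,
	CAPA_CPUFREQ,
	CAPA_ACPI_CPPC,
};

/* Picks the first available source in order of preference, stores its
 * raw value into <value> and returns which source was used.
 */
static enum capa_src pick_capacity(const struct cpu_hints *h, long *value)
{
	if (h->cpu_capacity >= 0) {
		*value = h->cpu_capacity;
		return CAPA_CPU_CAPACITY;
	}
	if (h->scaling_max_freq >= 0) {
		*value = h->scaling_max_freq;
		return CAPA_CPUFREQ;
	}
	if (h->nominal_perf >= 0) {
		*value = h->nominal_perf;
		return CAPA_ACPI_CPPC;
	}
	*value = -1;
	return CAPA_NONE;
}
```

In a real implementation the point of this order is to never even open the acpi_cppc entries when cpufreq already gave an answer.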
Willy Tarreau
d11241b7ba MINOR: cpu-topo: fall back to nominal_perf and scaling_max_freq for the capacity
When cpu_capacity is not present, let's try to check acpi_cppc's
nominal_perf which is similar and commonly found on servers, then
scaling_max_freq (though that last one may vary a bit between CPUs
depending on die quality). That variation is not a problem since
we can absorb a ~5% variation without issue.

It was verified on an i9-14900 featuring 5.7-P, 6.0-P and 4.4-E GHz
that P-cores were not reordered and that E cores were placed last.
It was also OK on a W3-2345 with 4.3 to 4.5GHz.
2025-03-14 18:30:30 +01:00
Willy Tarreau
322c28cc19 MINOR: cpu-topo: refine cpu dump output to better show kept/dropped CPUs
It's becoming difficult to see which CPUs are going to be kept/dropped.
Let's just skip all offline CPUs, and indicate "keep" in front of those
that are going to be used, and "----" in front of the excluded ones. It
is way more readable this way.

Also let's just drop the array entry number, since it's always the same
as the CPU number and is only an internal representation anyway.
2025-03-14 18:30:30 +01:00
Willy Tarreau
f1210ee7c6 MEDIUM: cfgparse: remove now unused numa & thread-count detection
This is not needed anymore since already done before landing here
via thread_detect_count().
2025-03-14 18:30:30 +01:00
Willy Tarreau
e3aef4c9a4 MEDIUM: thread: reimplement first numa node detection
Let's reimplement automatic binding to the first NUMA node when thread
count is not forced. It's the same thing as is already done in
check_config_validity() except that this time it's based on the
collected CPU information. The threads are automatically counted
and CPUs from non-first node(s) are evicted.
2025-03-14 18:30:30 +01:00
Willy Tarreau
4a525e8d27 MEDIUM: cpu-topo: make sure to properly assign CPUs to threads as a fallback
If no cpu-map is done and no cpu-policy could be enforced, we still need
to count the number of usable CPUs, assign them to all threads and set
the nbthread value accordingly.

This already handles the part that was done in check_config_validity()
via thread_cpus_enabled_at_boot.
2025-03-14 18:30:30 +01:00
Willy Tarreau
1af4942c95 MEDIUM: thread: start to detect thread groups and threads min/max
By mutually refining the thread count and group count, we can try
to detect the most suitable setup for the current machine. Taskset
is implicitly handled correctly. tgroups automatically adapt to the
configured number of threads. cpu-map manages to limit tgroups to
the smallest supported value.

The thread-limit is enforced. Just like in cfgparse, if the thread
count was forced to a higher value, it's reduced and a warning is
emitted. But if it was not set, the thr_max value is bound to this
limit so that further calculations respect it.

We continue to default to the max number of available threads and 1
tgroup by default, with the limit. This normally allows to get rid
of that test in check_config_validity().
2025-03-14 18:30:30 +01:00
Willy Tarreau
68069e4b27 MINOR: cpu-topo: add "drop-cpu" and "only-cpu" to cpu-set
These allow respectively to disable binding to CPUs listed in a set, and
to disable binding to CPUs not in a set.
2025-03-14 18:30:30 +01:00
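In terms of set operations, the two actions above boil down to the following. This sketch uses a plain 64-bit mask and hypothetical function names for the sake of illustration; it does not use the real cpuset API.

```c
#include <stdint.h>

/* "drop-cpu": CPUs listed in <set> are no longer usable */
static uint64_t apply_drop_cpu(uint64_t usable, uint64_t set)
{
	return usable & ~set;
}

/* "only-cpu": CPUs not listed in <set> are no longer usable */
static uint64_t apply_only_cpu(uint64_t usable, uint64_t set)
{
	return usable & set;
}
```

Both only ever remove CPUs from the usable set, so they can be chained in any order.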
Willy Tarreau
cda4956d9c MINOR: cpu-topo: add a new "cpu-set" global directive to choose cpus
For now it's limited, it only supports "reset" to ask that any previous
"taskset" be ignored. The goal will be to later add more actions that
allow to symbolically define sets of cpus to bind to or to drop. This
also clears the cpu_mask_forced variable that is used to detect
that a taskset had been used.
2025-03-14 18:30:30 +01:00
Willy Tarreau
f0661e79fe MINOR: global: add a command-line option to enable CPU binding debugging
During development, everything related to CPU binding and the CPU topology
is debugged using state dumps at various places, but it does make sense to
have a real command line option so that this remains usable in production
to help users figure why some CPUs are not used by default. Let's add
"-dc" for this. Since the list of global.tune.options values is almost
full and does not 100% match this option, let's add a new "tune.debug"
field for this.
2025-03-14 18:30:30 +01:00
Willy Tarreau
94543d7b65 MINOR: cfgparse: use already known offline CPU information
No need to reparse cpu/online, let's just rely on the info we learned
previously about offline CPUs.
2025-03-14 18:30:30 +01:00
Willy Tarreau
1560827c9d MINOR: cfgparse: move the binding detection into numa_detect_topology()
For now the function refrains from detecting the CPU topology when a
restrictive taskset or cpu-map was already performed on the process,
and it's documented as such, the reason being that until we're able
to automatically create groups, better not change user settings. But
we'll need to be able to detect bound CPUs and to process them as
desired by the user, so we now need to move that detection into the
function itself. It changes nothing to the logic, just gives more
freedom to the function.
2025-03-14 18:30:30 +01:00
Willy Tarreau
ac1db9db7d MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable
The function is not convenient because it doesn't allow us to undo the
startup changes, and depending on where it's being used, we don't know
whether the values read have already been altered (this is not the case
right now but it's going to evolve).

Let's just compute the status during cpu_detect_usable() and set a
variable accordingly. This way we'll always read the init value, and
if needed we can even afford to reset it. Also, placing it in cpu_topo.c
limits cross-file dependencies (e.g. threads without affinity etc).
2025-03-14 18:30:30 +01:00
Willy Tarreau
3a7cc676fa MINOR: cpu-topo: add NUMA node identification to CPUs on FreeBSD
With this patch we're also assigning NUMA node IDs to each CPU when the info is
found. The code is highly inspired from the one in commit f5d48f8b3
("MEDIUM: cfgparse: numa detect topology on FreeBSD."), the difference
being that we're just setting the value in ha_cpu_topo[].
2025-03-14 18:30:30 +01:00
Willy Tarreau
f6154c079e MINOR: cpu-topo: add NUMA node identification to CPUs on Linux
With this patch we're also assigning NUMA node IDs to each CPU when one
is found. The code is highly inspired from the one in commit b56a7c89a
("MEDIUM: cfgparse: detect numa and set affinity if needed") that already
did the job, except that it could be simplified since we're just collecting
info to fill the ha_cpu_topo[] array.
2025-03-14 18:30:30 +01:00
Willy Tarreau
65612369e7 MINOR: cpu-topo: also store the sibling ID with SMT
The sibling ID was not reported because it's not directly accessible
but we don't care, what matters is that we assign numbers to all the
threads we find using the same CPU so that some strategies permit to
allocate one thread at a time if we want to use few threads with max
performance.
2025-03-14 18:30:30 +01:00
Willy Tarreau
7cb274439b MINOR: cpu-topo: add CPU topology detection for linux
This uses the publicly available information from /sys to figure the cache
and package arrangements between logical CPUs and fill ha_cpu_topo[], as
well as their SMT capabilities and relative capacity for those which expose
this. The functions clearly have to be OS-specific.
2025-03-14 18:30:30 +01:00
Willy Tarreau
12f3a2bbb7 MINOR: cpu-topo: try to detect offline cpus at boot
When possible, the offline CPUs are detected at boot and their OFFLINE
flag is set in the ha_cpu_topo[] array. When the detection is not
possible (e.g. not linux, /sys not mounted etc), we just mark none of
them as being offline, as we don't want to infer wrong info that could
hinder automatic CPU placement detection. When valid, we take this
opportunity for refining cpu_topo_lastcpu so that we don't need to
manipulate CPUs beyond this value.
2025-03-14 18:30:30 +01:00
Willy Tarreau
44881e5abf MINOR: cpu-topo: add detection of online CPUs on FreeBSD
On FreeBSD we can detect online CPUs at least by doing the bitwise-OR of
the CPUs of all domains, so we're using this and adding this detection
to ha_cpuset_detect_online(). If we find simpler later, we can always
rework it, but it's reasonably inexpensive since we only check existing
domains.
2025-03-14 18:30:30 +01:00
Willy Tarreau
8f72ce335a MINOR: cpu-topo: add detection of online CPUs on Linux
This adds a generic function ha_cpuset_detect_online() which for now
only supports linux via /sys. It fills a cpuset with the list of online
CPUs that were detected (or returns a failure).
2025-03-14 18:30:30 +01:00
Willy Tarreau
8c524c7c9d REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo
The cpuset files are normally used only for cpu manipulations. It happens
that the initial CPU binding detection was initially placed there since
there was no better place, but in practice, being OS-specific, it should
really be in cpu-topo. This simplifies cpuset which doesn't need to know
about the OS anymore.
2025-03-14 18:30:30 +01:00
Willy Tarreau
a6fdc3eaf0 MINOR: cpu-topo: update CPU topology from excluded CPUs at boot
Now before trying to resolve the thread assignment to groups, we detect
which CPUs are not bound at boot so that we can mark them with
HA_CPU_F_EXCLUDED. This will be useful to better know on which CPUs we
can count later. Note that we purposely ignore cpu-map here as we
don't know how threads and groups will map to cpu-map entries, hence
which CPUs will really be used.

It's important to proceed this way so that when we have no info we
assume they're all available.
2025-03-14 18:30:30 +01:00
Willy Tarreau
bdb731172c MINOR: cpu-topo: add a function to dump CPU topology
The new function cpu_dump_topology() will centralize most debugging
calls, and it can make efforts of not dumping some possibly irrelevant
fields (e.g. non-existing cache levels).
2025-03-14 18:30:30 +01:00
Willy Tarreau
041462c4af MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus
We don't want to constantly deal with as many CPUs as a cpuset can hold,
so let's first try to trim the value to what the system claims to support
via _SC_NPROCESSORS_CONF. It is obviously still subject to the limit of
the cpuset size though. The value is stored globally so that we can
reuse it elsewhere after initialization.
2025-03-14 18:30:30 +01:00
Willy Tarreau
656cedad42 MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array.
This does the bare minimum to allocate and initialize a global
ha_cpu_topo array for the number of supported CPUs and release
it at deinit time.
2025-03-14 18:30:30 +01:00
Willy Tarreau
d165f5d3ab MINOR: cpu-topo: add ha_cpu_topo definition
This structure will be used to store information about each CPU's
topology (package ID, L3 cache ID, NUMA node ID etc). This will be used
in conjunction with CPU affinity setting to try to perform a mostly
optimal binding between threads and CPU numbers by default. Since it
was noticed during tests that absolutely none of the many machines
tested reports different die numbers, the die_id is not stored.
Also, it was found along experiments that the cluster ID will be used
a lot, half of the time as a node-local identifier, and half of the
time as a global identifier. So let's store the two versions at once
(cl_gid, cl_lid).

Some flags are added to indicate causes of exclusion (offline, excluded
at boot, excluded by rules, ignored by policy).
2025-03-14 18:30:30 +01:00
Willy Tarreau
05a4efb102 MINOR: thread: rely on the cpuset functions to count bound CPUs
Let's just clean up the thread_cpus_enabled() code a little bit
by removing the OS-specific code and relying on ha_cpuset_detect_bound()
instead. On macos we continue to use sysconf() for now.
2025-03-14 18:30:30 +01:00
Willy Tarreau
32bb68e736 MINOR: cpuset: make the API support negative CPU IDs
Negative IDs are very convenient to mean "not set", so let's just make
the cpuset API robust against this, especially with ha_cpuset_isset()
so that we don't have to manually add this check everywhere when a
value is not known.
2025-03-14 18:30:30 +01:00
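The kind of robustness described above can be illustrated with a tiny bitmap-based set. The type and function names below are hypothetical and much simpler than the real cpuset API; the point is only that a negative ID is rejected inside the API so that callers do not have to check it.

```c
#include <limits.h>

/* Hypothetical minimal CPU set holding one machine word of bits */
struct small_cpuset {
	unsigned long bits;
};

#define SMALL_CPUSET_MAX ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Adds <cpu> to the set. Returns 0 on success or -1 if <cpu> is
 * negative ("not set") or too large for the set.
 */
static int small_cpuset_set(struct small_cpuset *s, int cpu)
{
	if (cpu < 0 || cpu >= SMALL_CPUSET_MAX)
		return -1;
	s->bits |= 1UL << cpu;
	return 0;
}

/* Returns non-zero if <cpu> is in the set. A negative or out-of-range
 * ID is simply reported as absent.
 */
static int small_cpuset_isset(const struct small_cpuset *s, int cpu)
{
	if (cpu < 0 || cpu >= SMALL_CPUSET_MAX)
		return 0;
	return !!(s->bits & (1UL << cpu));
}
```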
Willy Tarreau
f156baf8ce DOC: design-thoughts: commit numa-auto.txt
Lots of collected data and observations aggregated into a single commit
so as not to lose them. Some parts below come from several commit
messages and are incremental.

Add captures and analysis of intel 14900 where it's not easy to draw
the line between the desired P and E cores.

The 14900 raises some questions (imagine a dual-die variant in multi-socket).
That's the start of an algorithmic distribution of performance cores into
thread groups.

cpu-map currently conflicts a lot with the choices after auto-detection
but it doesn't have to. The problem is the inability to configure the
threads for the whole process like taskset does. By offering this ability
we can also start to designate groups of CPUs symbolically (package, die,
ccx, cores, smt).

It can also be useful to exploit the info from cpuinfo that is not
available in /sys, such as the model number. At least on arm, higher
numbers indicate bigger cores and can be useful to distinguish cores
inside a cluster. It will not indicate big vs medium ones of the same
type (e.g. a78 3.0 vs 2.4 GHz) but can still be effective at identifying
the efficient ones.

In short, infos such as cluster ID are not always reliable, and are
local to the package. die_id as well. die number is not reported
here but should definitely be used, as a higher priority than L3.

We're still missing a discriminant between the l3 and cluster number
in order to address heterogeneous CPUs (e.g. intel 14900), though in
terms of locality that's currently done correctly.

CPU selection is also a full topic, and some thoughts were noted
regarding sorting by perf vs locality so as never to mix inter-
socket CPUs due to sorting.

The proposed cpu-selection cannot work as-is, because it acts both on
restriction and preference, and these two are not actions but a sequence.
First restrictions must be enforced, and second the remaining CPUs are
sorted according to the preferred criterion, and a number of threads are
selected.

Currently we refine the OS-exposed cluster number but it's not correct
as we can end up with something poorly numbered. We need to respect the
LLC in any case so let's explain the approach.
2025-03-14 18:30:30 +01:00
Willy Tarreau
0ceb1f2c51 DEV: ncpu: also emulate sysconf() for _SC_NPROCESSORS_*
This is also needed in order to make the requested number of CPUs
appear. For now we don't reroute to the original sysconf() call so
we return -1,EINVAL for all other info.
2025-03-14 18:30:30 +01:00
Willy Tarreau
ed75148ca0 BUILD: tools: avoid a build warning on gcc-4.8 in resolve_sym_name()
A build warning is emitted with gcc-4.8 in tools.c since commit
e920d73f59 ("MINOR: tools: improve symbol resolution without dl_addr")
because the compiler doesn't see that <size> is necessarily initialized.
Let's just preset it.
2025-03-14 18:30:30 +01:00
Willy Tarreau
4e09789644 MINOR: tools: teach resolve_sym_name() a few more common symbols
This adds run_poll_loop, run_tasks_from_lists, process_runnable_tasks,
ha_dump_backtrace and cli_io_handler which are fairly common in
backtraces. This will result in fewer relative symbols when dladdr is
not usable.
2025-03-13 17:31:16 +01:00
Willy Tarreau
a3582a77f7 MINOR: tools: ease the declaration of known symbols in resolve_sym_name()
Let's have a macro that declares both the symbol and its name, it will
avoid the risk of introducing typos, and encourages adding more when
needed. The macro also takes an optional second argument to permit an
inline declaration of an extern symbol.
2025-03-13 17:30:48 +01:00
Willy Tarreau
e920d73f59 MINOR: tools: improve symbol resolution without dl_addr
When dl_addr is not usable or fails, better fall back to the closest
symbol among the known ones instead of providing everything relative
to main. Most often, the location of the function will give some hints
about what it can be. Thus now we can emit fct+0xXXX in addition to
main+0xXXX or main-0xXXX. We keep a margin of +256kB maximum after a
function for a match, which is around the maximum size met in an object
file, otherwise it becomes pointless again.
2025-03-13 17:30:48 +01:00
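The lookup described above can be sketched as follows. The table layout and the function name are hypothetical; only the principle (closest known symbol at or below the address, within a 256kB margin) comes from the commit message. When nothing matches, the caller is expected to fall back to the main-relative output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry of the table of known symbols */
struct known_sym {
	uintptr_t addr;
	const char *name;
};

#define SYM_MARGIN (256 * 1024)   /* max distance after a symbol */

/* Returns the known symbol located the closest at or before <addr>,
 * provided it is at most SYM_MARGIN bytes before it, otherwise NULL.
 */
static const struct known_sym *closest_sym(const struct known_sym *tab,
                                           size_t nb, uintptr_t addr)
{
	const struct known_sym *best = NULL;
	size_t i;

	for (i = 0; i < nb; i++) {
		if (tab[i].addr > addr)
			continue;   /* symbol starts after the address */
		if (addr - tab[i].addr > SYM_MARGIN)
			continue;   /* too far to be meaningful */
		if (!best || tab[i].addr > best->addr)
			best = &tab[i];
	}
	return best;
}
```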
Willy Tarreau
1e99efccef MINOR: cli: export cli_io_handler() to ease symbol resolution
It's common to meet this function in backtraces, it's a bit annoying
that it's not resolved, so let's export it so that it becomes resolvable.
2025-03-13 17:30:48 +01:00
Aurelien DARRAGON
8311be5ac6 BUG/MINOR: stats: fix capabilities and hide settings for some generic metrics
Performing a diff on stats output before vs after commit 66152526
("MEDIUM: stats: convert counters to new column definition") revealed
that some metrics were not properly ported to the new API. Namely,
"lbtot", "cli_abrt" and "srv_abrt" are now exposed on frontend and
listeners while it was not the case before.

Also, "hrsp_other" is exposed even when "mode http" wasn't set on the
proxy.

In this patch we restore original behavior by fixing the capabilities
and hide settings.

As this could be considered as a minor regression (looking at the commit
message it doesn't seem intended), better tag this as a bug. It should be
backported in 3.0 with 66152526.
2025-03-13 11:49:18 +01:00
Aurelien DARRAGON
4c3eb60e70 DOC: management: rename some last occurences from domain "dns" to "resolvers"
This is a complementary patch to cf913c2f9 ("DOC: management: rename show
stats domain cli "dns" to "resolvers"). The doc still referred to the
legacy "dns" domain filter for stat command. Let's rename those occurrences
to "resolvers".

It may be backported to all stable versions.
2025-03-13 11:49:10 +01:00
Willy Tarreau
78ef52dbd1 BUILD: backend: silence a build warning when threads are disabled
Since commit 8de8ed4f48 ("MEDIUM: connections: Allow taking over
connections from other tgroups.") we got this partially absurd
build warning when disabling threads:

  src/backend.c: In function 'conn_backend_get':
  src/backend.c:1371:27: warning: array subscript [0, 0] is outside array bounds of 'struct tgroup_info[1]' [-Warray-bounds]

The reason is that gcc sees that curtgid is not equal to tgid which is
defined as 1 in this case, thus it figures that tgroup_info[curtgid-1]
will be anything but zero and that doesn't fit. It is ridiculous as it
is a perfect case of dead code elimination which should not warrant a
warning. Nevertheless we know we don't need to do this when threads are
disabled and in this case there will not be more than 1 thread group, so
we can happily use that preliminary test to help the compiler eliminate
the dead condition and avoid spitting this warning.

No backport is needed.
2025-03-12 18:16:14 +01:00
Willy Tarreau
b61ed9babe BUILD: tools: silence a build warning when USE_THREAD=0
The dladdr_lock that was added to avoid re-entering into dladdr is
conditioned by threads, but the way it's declared causes a build
warning if threads are disabled due to the insertion of a lone semi
colon in the variables block. Let's switch to __decl_thread_var()
for this.

This can be backported wherever commit eb41d768f9 ("MINOR: tools:
use only opportunistic symbols resolution") is backported. It relies
on these previous two commits:

   bb4addabb7 ("MINOR: compiler: add a simple macro to concatenate resolved strings")
   69ac4cd315 ("MINOR: compiler: add a new __decl_thread_var() macro to declare local variables")
2025-03-12 18:11:14 +01:00
Willy Tarreau
69ac4cd315 MINOR: compiler: add a new __decl_thread_var() macro to declare local variables
__decl_thread() already exists but is more suited for struct members.
When using it in a variables block, it appends the final trailing
semi-colon which is a statement that ends the variable block. Better
clean this up and have one precisely for variable blocks. In this
case we can simply define an unused enum value that will consume the
semi-colon. That's what the new macro __decl_thread_var() does.
2025-03-12 18:08:12 +01:00
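The trick described above can be illustrated with the sketch below. The macro names (GLUE, DECL_THREAD_VAR) and the WITH_THREADS condition are hypothetical stand-ins chosen for the example, not the real ones. Without threads, the macro expands to an anonymous enum declaration, which is still a declaration, so the trailing semi-colon does not end the variables block.

```c
/* Two-level pasting so that __LINE__ is expanded before being glued */
#define GLUE_RAW(a, b) a ## b
#define GLUE(a, b) GLUE_RAW(a, b)

#ifdef WITH_THREADS
/* with threads, the declaration is emitted as-is */
#define DECL_THREAD_VAR(decl) decl
#else
/* without threads, an unused enum value consumes the semi-colon */
#define DECL_THREAD_VAR(decl) enum { GLUE(unused_decl_, __LINE__) = 0 }
#endif

static int sum_items(const int *v, int n)
{
	int i;
	DECL_THREAD_VAR(static int lock);  /* vanishes without threads */
	int total = 0;                     /* still in the variables block */

	for (i = 0; i < n; i++)
		total += v[i];
	return total;
}
```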
Willy Tarreau
bb4addabb7 MINOR: compiler: add a simple macro to concatenate resolved strings
It's often useful to be able to concatenate strings after resolving
them (e.g. __FILE__, __LINE__ etc). Let's just have a CONCAT() macro
to do that, which calls _CONCAT() with the same arguments to make
sure the contents are resolved before being concatenated.
2025-03-12 18:06:55 +01:00
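The reason for the two-level construct above is that the preprocessor does not expand the operands of ## before pasting them. The sketch below shows the difference; it assumes token pasting is what is meant by concatenation here, and uses hypothetical macro names (PASTE, PASTE_RAW) rather than the real ones.

```c
/* Direct pasting: arguments are glued as written, without expansion */
#define PASTE_RAW(a, b) a ## b
/* Indirect pasting: arguments are expanded first, then glued */
#define PASTE(a, b) PASTE_RAW(a, b)

#define FORTY 4

static int PASTE(value_, FORTY) = 42;     /* declares value_4 */
static int PASTE_RAW(value_, FORTY) = 7;  /* declares value_FORTY */
```

The same applies to __LINE__: only the indirect form produces a name that ends with the line number.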
Willy Tarreau
12383fd9f5 BUG/MEDIUM: thread: use pthread_self() not ha_pthread[tid] in set_affinity
A bug was uncovered by the work on NUMA. It only triggers in the CI
with libmusl due to a race condition. What happens is that the call
to set_thread_cpu_affinity() is done very early in the polling loop,
and that it relies on ha_pthread[tid] instead of pthread_self(). The
problem is that ha_pthread[tid] is only set by the return from
pthread_create(), which might happen later depending on the number of
CPUs available to run the starting thread.

Let's just use pthread_self() here. ha_pthread[] is only used to send
signals between threads, there's no point in using it here.

This can be backported to 2.6.
2025-03-12 15:59:23 +01:00
Aurelien DARRAGON
e942305214 MEDIUM: log: change default "host" strategy for log-forward section
Historically, log-forward proxy used to preserve host field from input
message as much as possible, and if syslog host wasn't provided
(rfc5424 '-' or bad rfc3164 or rfc5424 message) then "localhost" or "-"
would be used as host when outputting message using rfc3164 or rfc5424.

We change that behavior (which corresponds to "keep" host option), so that
log-forward now uses "fill" strategy as default: if the host is provided
in input message, it is preserved. However if it is missing and IP address
from sender is available, we use it.
2025-03-12 10:55:49 +01:00
Aurelien DARRAGON
ad0133cc50 MINOR: log: handle log-forward "option host"
Following previous patch, we now implement the logic for the host
option under log-forward section. Possible strategies are:

      replace If input message already contains a value for the host
              field, we replace it by the source IP address from the
              sender.
              If input message doesn't contain a value for the host field
              (ie: '-' as input rfc5424 message or non compliant rfc3164
              or rfc5424 message), we use the source IP address from the
              sender as host field.

      fill    If input message already contains a value for the host field,
              we keep it.
              If input message doesn't contain a value for the host field
              (ie: '-' as input rfc5424 message or non compliant rfc3164
              or rfc5424 message), we use the source IP address from the
              sender as host field.

      keep    If input message already contains a value for the host field,
              we keep it.
              If input message doesn't contain a value for the host field,
              we set it to localhost (rfc3164) or '-' (rfc5424).
              (This is the default)

      append  If input message already contains a value for the host field,
              we append a comma followed by the IP address from the sender.
              If input message doesn't contain a value for the host field,
              we use the source IP address from the sender.

Default value (unchanged) is "keep" strategy. option host is only relevant
with rfc3164 or rfc5424 format on log targets. Also, if the source address
is not available (ie: UNIX socket), default behavior prevails.
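
As an illustration only, the four strategies above can be sketched as a
small selection function. All names below (host_strategy, pick_host, the
HOST_* constants) are hypothetical and do not come from the HAProxy
sources; the fallback to "keep" when no source address is available is
one reading of "default behavior prevails".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names, for illustration only (not the HAProxy API). */
enum host_strategy { HOST_KEEP, HOST_FILL, HOST_REPLACE, HOST_APPEND };

/* Writes the host field to emit into <out> and returns it.
 * <in_host> is the host found in the input message (NULL, "" or "-" when
 * missing), <src_ip> is the sender's address (NULL when unavailable, e.g.
 * UNIX socket) and <dflt> is the format-specific placeholder ("localhost"
 * for rfc3164, "-" for rfc5424).
 */
static const char *pick_host(enum host_strategy s, const char *in_host,
                             const char *src_ip, const char *dflt,
                             char *out, size_t outsz)
{
    int has_host = in_host && *in_host && strcmp(in_host, "-") != 0;

    /* assumption: without a source address, behave like "keep" */
    if (!src_ip)
        s = HOST_KEEP;

    switch (s) {
    case HOST_REPLACE:
        snprintf(out, outsz, "%s", src_ip);
        break;
    case HOST_FILL:
        snprintf(out, outsz, "%s", has_host ? in_host : src_ip);
        break;
    case HOST_APPEND:
        if (has_host)
            snprintf(out, outsz, "%s,%s", in_host, src_ip);
        else
            snprintf(out, outsz, "%s", src_ip);
        break;
    case HOST_KEEP:
    default:
        snprintf(out, outsz, "%s", has_host ? in_host : dflt);
        break;
    }
    return out;
}
```
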

Documentation was updated.
2025-03-12 10:52:07 +01:00
Aurelien DARRAGON
003fe530ae MINOR: log: add "option host" log-forward option
add only the parsing part, options are currently unused
2025-03-12 10:51:35 +01:00
Aurelien DARRAGON
47f14be9f3 MINOR: tools: only print address in sa2str() when port == -1
Support special value for port in sa2str: if port is equal to -1, only
print the address without the port, also ignoring <map_ports> value.
2025-03-12 10:51:20 +01:00
Aurelien DARRAGON
2de62d0461 MINOR: log: provide source address information in syslog_process_message()
provide struct sockaddr_storage pointer from the message sender in
syslog_process_message()
2025-03-12 10:50:30 +01:00
Aurelien DARRAGON
bc76f6dde9 MINOR: log: migrate log-forward options from proxy->options2 to options3
Migrate recently added log-forward section options, currently stored under
proxy->options2 to proxy->options3 since proxy->options2 is running out of
space and we plan on adding more log-forward options.
2025-03-12 10:50:03 +01:00
Aurelien DARRAGON
cc5a66212d MINOR: proxy: add proxy->options3
proxy->options2 is almost full, yet we will add new log-forward options
in upcoming patches so we anticipate that by adding a new {no_}options3
and cfg_opts3[] to further extend proxy options
2025-03-12 10:49:36 +01:00
Aurelien DARRAGON
d47e7103b8 CLEANUP: log: add syslog_process_message() helper
Prevent code duplication under syslog_fd_handler() and syslog_io_handler()
by merging common code path in a single syslog_process_message() helper
that processes a single message stored in <buf> according to <frontend>
settings.
2025-03-12 10:49:18 +01:00
Aurelien DARRAGON
8b8520305e CLEANUP: log-forward: remove useless options2 init
It is actually not required to zero out proxy->options2 since proxy is
allocated using calloc() which already does it.
2025-03-12 10:49:08 +01:00
William Lallemand
c6e6318125 CI: github: add "jose" to apt dependencies
jose is used in the JWS unit-test, let's add it to the CI.
2025-03-11 22:29:40 +01:00
William Lallemand
d014d7ee72 TESTS: jws: implement a test for JWS signing
This test returns a JWS payload signed with a specified private key in the
PEM format, and uses the "jose" command tool to check if the signature
is correct against the jwk public key.

The test could be improved later by using the code from jwt.c allowing
to check a signature.
2025-03-11 22:29:40 +01:00
William Lallemand
3abb428fc8 MINOR: jws: implement JWS signing
This commit implements JWS signing, which is divided into 3 parts:

- jws_b64_protected() creates a JWS "protected" header, which takes the
  algorithm, kid or jwk, nonce and url as input, and fills a destination
  buffer with the base64url version of the header
- jws_b64_payload() just encodes a payload in base64url
- jws_b64_signature() generates a signature using as input the protected
  header and the payload. It supports ES256, ES384 and ES512 for ECDSA
  keys, and RS256 for RSA ones. The RSA signature just uses the
  EVP_DigestSign() API with its result encoded in base64url. For ECDSA
  it's a little bit more complicated, and should follow section 3.4 of
  RFC7518: R and S should each be padded to the curve's fixed octet length.

Then the JWS can be output with jws_flattened() which just formats the 3
base64url outputs in a JSON representation with the 3 fields, protected,
payload and signature.
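
The two building blocks mentioned above, unpadded base64url encoding and
fixed-length padding of the ECDSA R and S integers, can be sketched as
follows. This is a standalone illustration: b64url_encode() and pad_be()
are hypothetical helper names, not the functions added by this commit.

```c
#include <stddef.h>
#include <string.h>

/* Encodes <len> bytes from <in> as unpadded base64url (RFC 4648 section 5)
 * into <out>, NUL-terminated. Returns the encoded length, or -1 if <out>
 * is too small.
 */
static int b64url_encode(const unsigned char *in, size_t len,
                         char *out, size_t outsz)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    size_t need = (len / 3) * 4 + (len % 3 ? len % 3 + 1 : 0);
    size_t i, o = 0;

    if (outsz < need + 1)
        return -1;

    /* full 3-byte groups produce 4 characters each */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned int v = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];

        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }

    /* trailing 1 or 2 bytes, emitted without '=' padding */
    if (len - i == 1) {
        unsigned int v = in[i] << 16;

        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
    } else if (len - i == 2) {
        unsigned int v = (in[i] << 16) | (in[i + 1] << 8);

        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
    }
    out[o] = '\0';
    return (int)o;
}

/* Left-pads the big-endian integer <in> with zeroes to exactly <outlen>
 * bytes, as required for R and S before they are concatenated.
 * Returns 0 on success, -1 if <in> does not fit.
 */
static int pad_be(const unsigned char *in, size_t inlen,
                  unsigned char *out, size_t outlen)
{
    if (inlen > outlen)
        return -1;
    memset(out, 0, outlen - inlen);
    memcpy(out + (outlen - inlen), in, inlen);
    return 0;
}
```

For ES256 the signature input to the encoder would be 64 bytes: R padded
to 32 bytes followed by S padded to 32 bytes.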
2025-03-11 22:29:40 +01:00
Willy Tarreau
3cbeb6a74b [RELEASE] Released version 3.2-dev7
Released version 3.2-dev7 with the following main changes :
    - BUG/MEDIUM: applet: Don't handle EOI/EOS/ERROR is applet is waiting for room
    - BUG/MEDIUM: spoe/mux-spop: Introduce an NOOP action to deal with empty ACK
    - BUG/MINOR: cfgparse: fix NULL ptr dereference in cfg_parse_peers
    - BUG/MEDIUM: uxst: fix outgoing abns address family in connect()
    - REGTESTS: fix reg-tests/server/abnsz.vtc
    - BUG/MINOR: log: fix outgoing abns address family
    - BUG/MINOR: sink: add tempo between 2 connection attempts for sft servers
    - MINOR: clock: always use atomic ops for global_now_ms
    - CI: QUIC Interop: clean old docker images
    - BUG/MINOR: stream: do not call co_data() from __strm_dump_to_buffer()
    - BUG/MINOR: mux-h1: always make sure h1s->sd exists in h1_dump_h1s_info()
    - MINOR: tinfo: add a new thread flag to indicate a call from a sig handler
    - BUG/MEDIUM: stream: never allocate connection addresses from signal handler
    - MINOR: freq_ctr: provide non-blocking read functions
    - BUG/MEDIUM: stream: use non-blocking freq_ctr calls from the stream dumper
    - MINOR: tools: use only opportunistic symbols resolution
    - CLEANUP: task: move the barrier after clearing th_ctx->current
    - MINOR: compression: Introduce minimum size
    - BUG/MINOR: h2: always trim leading and trailing LWS in header values
    - MINOR: tinfo: split the signal handler report flags into 3
    - BUG/MEDIUM: stream: don't use localtime in dumps from a signal handler
    - OPTIM: connection: don't try to kill other threads' connection when !shared
    - BUILD: add possibility to use different QuicTLS variants
    - MEDIUM: fd: Wait if locked in fd_grab_tgid() and fd_take_tgid().
    - MINOR: fd: Add fd_lock_tgid_cur().
    - MEDIUM: epoll: Make sure we can add a new event
    - MINOR: pollers: Add a fixup_tgid_takeover() method.
    - MEDIUM: pollers: Drop fd events after a takeover to another tgid.
    - MEDIUM: connections: Allow taking over connections from other tgroups.
    - MEDIUM: servers: Add strict-maxconn.
    - BUG/MEDIUM: server: properly initialize PROXY v2 TLVs
    - BUG/MINOR: server: fix the "server-template" prefix memory leak
    - BUG/MINOR: h3: do not report transfer as aborted on preemptive response
    - CLEANUP: h3: fix documentation of h3_rcv_buf()
    - MINOR: hq-interop: properly handle incomplete request
    - BUG/MEDIUM: mux-fcgi: Try to fully fill demux buffer on receive if not empty
    - MINOR: h1: permit to relax the websocket checks for missing mandatory headers
    - BUG/MINOR: hq-interop: fix leak in case of rcv_buf early return
    - BUG/MINOR: server: check for either proxy-protocol v1 or v2 to send hedaer
    - MINOR: jws: implement a JWK public key converter
    - DEBUG: init: add a way to register functions for unit tests
    - TESTS: add a unit test runner in the Makefile
    - TESTS: jws: register a unittest for jwk
    - CI: github: run make unit-tests on the CI
    - TESTS: add config smoke checks in the unit tests
    - MINOR: jws: conversion to NIST curves name
    - CI: github: remove smoke tests from vtest.yml
    - TESTS: ist: fix wrong array size
    - TESTS: ist: use the exit code to return a verdict
    - TESTS: ist: add a ist.sh to launch in make unit-tests
    - CI: github: fix h2spec.config proxy names
    - DEBUG: init: Add a macro to register unit tests
    - MINOR: sample: allow custom date format in error-log-format
    - CLEANUP: log: removing "log-balance" references
    - BUG/MINOR: log: set proper smp size for balance log-hash
    - MINOR: log: use __send_log() with exact payload length
    - MEDIUM: log: postpone the decision to send or not log with empty messages
    - MINOR: proxy: make pr_mode enum bitfield compatible
    - MINOR: cfgparse-listen: add and use cfg_parse_listen_match_option() helper
    - MINOR: log: add options eval for log-forward
    - MINOR: log: detach prepare from parse message
    - MINOR: log: add dont-parse-log and assume-rfc6587-ntf options
    - BUG/MEIDUM: startup: return to initial cwd only after check_config_validity()
    - TESTS: change the output of run-unittests.sh
    - TESTS: unit-tests: store sh -x in a result file
    - CI: github: show results of the Unit tests
    - BUG/MINOR: cfgparse/peers: fix inconsistent check for missing peer server
    - BUG/MINOR: cfgparse/peers: properly handle ignored local peer case
    - BUG/MINOR: server: dont return immediately from parse_server() when skipping checks
    - MINOR: cfgparse/peers: provide more info when ignoring invalid "peer" or "server" lines
    - BUG/MINOR: stream: fix age calculation in "show sess" output
    - MINOR: stream/cli: rework "show sess" to better consider optional arguments
    - MINOR: stream/cli: make "show sess" support filtering on front/back/server
    - TESTS: quic: create first quic unittest
    - MINOR: h3/hq-interop: restore function for standalone FIN receive
    - MINOR/OPTIM: mux-quic: do not allocate rxbuf on standalone FIN
    - MINOR: mux-quic: refine reception of standalone STREAM FIN
    - MINOR: mux-quic: define globally stream rxbuf size
    - MINOR: mux-quic: define rxbuf wrapper
    - MINOR: mux-quic: store QCS Rx buf in a single-entry tree
    - MINOR: mux-quic: adjust Rx data consumption API
    - MINOR: mux-quic: adapt return value of qcc_decode_qcs()
    - MAJOR: mux-quic: support multiple QCS RX buffers
    - MEDIUM: mux-quic: handle too short data splitted on multiple rxbuf
    - MAJOR: mux-quic: increase stream flow-control for multi-buffer alloc
    - BUG/MINOR: cfgparse-tcp: relax namespace bind check
    - MINOR: startup: adjust alert messages, when capabilities are missed
2025-03-07 16:37:57 +01:00
Valentine Krasnobaeva
7d427134fe MINOR: startup: adjust alert messages, when capabilities are missed
CAP_SYS_ADMIN support was added, in order to access sockets in namespaces. So
let's adjust the alert at startup, where we check preserved capabilities from
global.last_checks. Let's mention here cap_sys_admin as well.
2025-03-07 16:37:16 +01:00
Damien Claisse
f0a07f834c BUG/MINOR: cfgparse-tcp: relax namespace bind check
Commit 5cbb278 introduced cap_sys_admin support, and enforced checks for
both binds and servers. However, when binding into a namespace, the bind
is done before dropping privileges. Hence, checking that we have
cap_sys_admin capability set in this case is not needed (and it would
decrease security to add it).
For users starting haproxy with other user than root and without
cap_sys_admin, bind should have already failed.
As a consequence, relax runtime check for binds into a namespace.
2025-03-07 16:23:29 +01:00
Amaury Denoyelle
dc7913d814 MAJOR: mux-quic: increase stream flow-control for multi-buffer alloc
Support for multiple Rx buffers per QCS instance has been introduced by
previous patches. However, due to flow-control initial values, clients
were still unable to fully use this to increase their upload
throughput.

This patch increases max-stream-data-bidi-remote flow-control initial
values. A new define QMUX_STREAM_RX_BUF_FACTOR will fix the number of
concurrent buffers allocable per QCS. It is set to 90.

Note that connection flow-control initial value did not change. It is
still configured to be equivalent to bufsize multiplied by the maximum
concurrent streams. This ensures that Rx buffers allocation is still
constrained per connection, so that it won't be possible to have all
active QCS instances using in parallel their maximum Rx buffers count.
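
To make the arithmetic concrete, here is a rough sketch of the two
initial windows. It assumes the per-stream window is simply bufsize
multiplied by the factor, which the message implies but does not state;
the function names are hypothetical and only the value 90 comes from the
text above.

```c
#include <stdint.h>

#define STREAM_RX_BUF_FACTOR 90 /* value quoted in the commit message */

/* Assumed per-stream initial window: <bufsize> times the buffer factor. */
static uint64_t stream_initial_window(uint64_t bufsize)
{
    return bufsize * STREAM_RX_BUF_FACTOR;
}

/* Connection initial window: <bufsize> times the max concurrent streams. */
static uint64_t conn_initial_window(uint64_t bufsize, uint64_t max_streams)
{
    return bufsize * max_streams;
}
```

With an example bufsize of 16384 and 100 concurrent streams, a single
stream could be granted 1474560 bytes while the whole connection stays
capped at 1638400 bytes, so not every stream can fill its window at once.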
2025-03-07 12:06:27 +01:00
Amaury Denoyelle
75027692a3 MEDIUM: mux-quic: handle too short data splitted on multiple rxbuf
Previous commit introduces support for multiple Rx buffers per QCS
instance. Contiguous data may be split across multiple buffers
depending on their offset.

A particular issue could arise with this new model. Indeed, app_ops
rcv_buf callback can still deal with a single buffer at a time. This may
cause a deadlock in decoding if app_ops layer cannot proceed due to
partial data, while such data are precisely split across two buffers.
This can for example occur during HTTP/3 frame header parsing.

To deal with this, a new function is implemented to force data realign
between two contiguous buffers. This is called only when app_ops rcv_buf
returned 0 but data is available in the next buffer after the current
one. In this case, data are transferred from the next into the current
buffer via qcs_transfer_rx_data(). Decoding is then restarted, which
should ensure that app_ops layer has enough data to advance.

During this operation, special care is taken to remove both
qc_stream_rxbuf entries, as their offsets are adjusted. The next buffer
is only reinserted if there is remaining data in it, else it can be
freed.

This case is not easily reproducible as it depends on the HTTP/3 framing
used by the client. It seems to be easily reproduced though with quiche.
$ quiche-client --http-version HTTP/3 --method POST --body /tmp/100m \
  "https://127.0.0.1:20443/post"
2025-03-07 12:06:27 +01:00
Amaury Denoyelle
60f64449fb MAJOR: mux-quic: support multiple QCS RX buffers
Implement support for multiple Rx buffers per QCS instance. This
requires several changes mostly in qcc_recv() / qcc_decode_qcs() which
deal with STREAM frames reception and decoding. These multiple buffers
can be stored in QCS rx.bufs tree which was introduced in an earlier
patch.

On STREAM frame reception, a buffer is retrieved from QCS bufs tree, or
allocated if necessary, based on the data starting offset. Each buffer
is aligned on bufsize for convenience. This ensures there is no overlap
between two contiguous buffers. Special care is taken when dealing with
a STREAM frame which must be split and stored in two contiguous
buffers.
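
The alignment rule can be sketched as below: each buffer covers a range
starting at a multiple of bufsize, and a frame crossing a boundary is cut
at that boundary. The struct and function names are hypothetical and the
real code stores buffers in a tree rather than an array.

```c
#include <stddef.h>
#include <stdint.h>

/* One piece of a STREAM frame, as stored in a single aligned buffer. */
struct chunk {
    uint64_t buf_off; /* stream offset where the buffer starts */
    uint64_t pos;     /* position of the data inside that buffer */
    uint64_t len;     /* number of bytes stored in that buffer */
};

/* Splits the range [off, off + len) on <bufsz> boundaries and fills up to
 * <max> entries of <out>. Returns the number of entries written.
 */
static size_t split_on_bufs(uint64_t off, uint64_t len, uint64_t bufsz,
                            struct chunk *out, size_t max)
{
    size_t n = 0;

    while (len && n < max) {
        uint64_t base = off - (off % bufsz); /* aligned buffer start */
        uint64_t room = base + bufsz - off;  /* bytes left in buffer */
        uint64_t take = len < room ? len : room;

        out[n].buf_off = base;
        out[n].pos = off - base;
        out[n].len = take;
        n++;
        off += take;
        len -= take;
    }
    return n;
}
```

For instance, with a bufsize of 16384, 1000 bytes received at offset 16000
end up as 384 bytes at the tail of the first buffer and 616 bytes at the
head of the second one.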

When decoding input data, qcc_decode_qcs() is still invoked with a
single buffer as input. This requires a new while loop to ensure
decoding is performed across multiple contiguous buffers until all data
are decoded or app stream buffer is full.

Also, after qcs_consume() has been performed, the stream Rx channel is
immediately closed if FIN was already received and QCS now contains only
a single buffer with all remaining data. This is necessary as qcc_recv()
is unable to close the Rx channel if FIN is received for a buffer
different from the current readable offset.

Note that for now stream flow-control value is still too low to fully
utilize this new infrastructure and improve clients upload throughput.
Indeed, flow-control max-stream-data initial values are set to match
bufsize. This ensures that each QCS will use 1 buffer, or at most 2 if
data are split. A future patch will increase this value to lift
this limitation.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
7b168e356f MINOR: mux-quic: adapt return value of qcc_decode_qcs()
Change return value of qcc_decode_qcs(). It now directly returns the
value from app_ops rcv_buf callback. Function documentation is updated
to reflect this.

For now, qcc_decode_qcs() return value is ignored by callers, so this
patch should not have any functional change. However, it will become
necessary when implementing multiple Rx buffers per QCS, as a loop will
be implemented to invoke qcc_decode_qcs() on several contiguous buffers.
Decoding must be stopped however as soon as an error is returned by
rcv_buf callback. This is also the case for a null value, which
indicates there is not enough data to continue decoding.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
6b5607d66f MINOR: mux-quic: adjust Rx data consumption API
HTTP/3 data are converted into HTX via qcc_decode_qcs() function. On
completion, these data are removed from QCS Rx buffer via qcs_consume().

This patch adjusts qcs_consume() API with several changes. Firstly, the
Rx buffer instance to operate on must now be specified as a new argument
to the function. Secondly, buffer liberation when all data were removed
from qcs_consume() is extracted up to qcc_decode_qcs() caller.

No functional change with this patch. The objective is to have an API
which can be better adapted to multiple Rx buffers per QCS instance.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
a4f31ffeeb MINOR: mux-quic: store QCS Rx buf in a single-entry tree
Convert QCS rx buffer pointer to a tree container. Additionally, offset
field of qc_stream_rxbuf is thus transformed into a tree node.

For now, only a single Rx buffer is stored at most in QCS tree. Multiple
Rx buffers will be implemented in a future patch to improve QUIC clients
upload throughput.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
cc3c2d1f12 MINOR: mux-quic: define rxbuf wrapper
Define a new type qc_stream_rxbuf. This is used as a wrapper around QCS
Rx buffer with encapsulation of the ncbuf storage. It is allocated via a
new pool. Several functions are adapted to be able to deal with
qc_stream_rxbuf as a wrapper instead of the previous plain ncbuf
instance.

No functional change should happen with this patch. For now, only a
single qc_stream_rxbuf can be instantiated per QCS. However, this new
type will be useful to implement multiple Rx buffer storage in a future
commit.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
4b1e63d191 MINOR: mux-quic: define globally stream rxbuf size
QCS uses ncbuf for STREAM data storage. This serves as a limit for
maximum STREAM buffering capacity, advertised via QUIC transport
parameters for initial flow-control values.

Define a new function qmux_stream_rx_bufsz() which can be used to
retrieve this Rx buffer size. This can be used both in MUX/H3 layers and
in QUIC transport parameters.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
7dd1eec2b1 MINOR: mux-quic: refine reception of standalone STREAM FIN
Reception of standalone STREAM FIN is a corner case, which may be
difficult to handle. In particular, care must be taken to ensure app_ops
rcv_buf() is always called to be notified about FIN, even if Rx buffer is
empty or full demux flag is set. Otherwise, it could prevent
closure of QCS Rx channel.

To ensure this, rcv_buf() was systematically called if FIN was received,
with or without data payload. This could cause unnecessary invocations
when FIN is transmitted with data and full demux flag is set, or data
are received out-of-order.

This patch improves qcc_recv() by explicitly detecting a standalone
FIN case. Thus, rcv_buf() is only forcefully called in this case and if
all data were already previously received.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
20dc8e4ec2 MINOR/OPTIM: mux-quic: do not allocate rxbuf on standalone FIN
STREAM FIN may be received without any payload. However, qcc_recv()
always called qcs_get_ncbuf() indiscriminately, which may allocate a QCS
Rx buffer. This is unneeded as there is no payload to store.

Improve this by skipping qcs_get_ncbuf() invocation when dealing with a
standalone FIN signal. This should prevent superfluous buffer
allocation.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
861b11334c MINOR: h3/hq-interop: restore function for standalone FIN receive
Previously, a function qcs_http_handle_standalone_fin() was implemented
to handle a received standalone FIN, bypassing app_ops layer decoding.
However, this was removed as app_ops layer interaction is necessary. For
example, HTTP/3 checks that FIN is never sent on the control uni stream.

This patch reintroduces qcs_http_handle_standalone_fin(), albeit in a
slightly diminished version. Most importantly, it is now the
responsibility of the app_ops layer itself to use it, to avoid the
shortcoming described above.

The main objective of this patch is to be able to support standalone FIN
in HTTP/0.9 layer. This is easily done via the reintroduction of
qcs_http_handle_standalone_fin() usage. This will be useful to perform
testing, as standalone FIN is a corner case which can easily be broken.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
6f95d0dad0 TESTS: quic: create first quic unittest
Define a first unit-test dedicated to QUIC. A single test for now
ensures that variable length decoding is compliant. This should be
extended in the future with new set of tests.
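
For reference, QUIC variable-length integer decoding as specified in
RFC 9000 section 16 can be sketched as below. This is an independent
illustration with a hypothetical function name, not the decoder or the
unit test added by this commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one QUIC variable-length integer from <buf> into <val>.
 * The two most significant bits of the first byte give the encoded
 * length (1, 2, 4 or 8 bytes); the remaining bits hold the value in
 * network byte order. Returns the number of bytes consumed, or 0 if
 * <buf> is too short.
 */
static size_t quic_varint_decode(const unsigned char *buf, size_t len,
                                 uint64_t *val)
{
    size_t n, i;
    uint64_t v;

    if (!len)
        return 0;

    n = (size_t)1 << (buf[0] >> 6);
    if (len < n)
        return 0;

    v = buf[0] & 0x3f;
    for (i = 1; i < n; i++)
        v = (v << 8) | buf[i];

    *val = v;
    return n;
}
```

The sample encodings listed in RFC 9000 appendix A.1 (for example the
two-byte sequence 0x7bbd, which decodes to 15293) make convenient test
vectors for such a function.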
2025-03-07 12:06:26 +01:00
Willy Tarreau
5e558c1727 MINOR: stream/cli: make "show sess" support filtering on front/back/server
With "show sess", particularly "show sess all", we're often missing the
ability to inspect only streams attached to a frontend, backend or server.
Let's just add these filters to the command. Only one at a time may be set.

One typical use case could be to dump streams attached to a server after
issuing "shutdown sessions server XXX" to figure out why some wouldn't
stop, for example.
2025-03-07 10:38:12 +01:00
Willy Tarreau
2bd7cf53cb MINOR: stream/cli: rework "show sess" to better consider optional arguments
The "show sess" CLI command parser is getting really annoying because
several options were added in an exclusive mode as the single possible
argument. Recently some cumulable options were added ("show-uri") but
the older ones were not yet adapted. Let's just make sure that the
various filters such as "older" and "age" now belong to the options
and leave only <id>, "all", and "help" for the first ones. The doc was
updated and it's now easier to find these options.
2025-03-07 10:36:58 +01:00
Willy Tarreau
1cdf2869f6 BUG/MINOR: stream: fix age calculation in "show sess" output
The "show sess" output reports an age that's based on the last byte of
the HTTP request instead of the stream creation date, due to a confusion
between logs->request_ts and the request_date sample fetch function. Most
of the time these are equal except when the request is not yet full for
any reason (e.g. wait-body). This explains why a few "show sess" could
report a few new streams aged by 99 days for example.

Let's perform the correct request timestamp calculation like the sample
fetch function does, by adding t_idle and t_handshake to the accept_ts.
Now the stream's age is correct and can be correctly used with the
"show sess older <age>" variant.

This issue was introduced in 2.9 and the fix can be backported to 3.0.
2025-03-07 10:36:58 +01:00
Aurelien DARRAGON
dbb25720dd MINOR: cfgparse/peers: provide more info when ignoring invalid "peer" or "server" lines
Invalid (incomplete) "server" or "peer" lines under peers section are now
properly ignored. For completeness, in this patch we add some reports so
that the user knows that incomplete lines were ignored.

For an incomplete server line, since it is tolerated (see GH #565), we
only emit a diag warning.

For an incomplete peer line, we report a real warning, as it is not
expected to have a peer line without an address:port specified.

Also, 'newpeer == curpeers->local' check could be simplified since
we already have the 'local_peer' variable which tells us that the
parsed line refers to a local peer.
2025-03-07 09:39:51 +01:00
Aurelien DARRAGON
a76b5358f0 BUG/MINOR: server: dont return immediately from parse_server() when skipping checks
If parse_server() is called under peers section parser, and the address
needs to be parsed but it is missing, we directly return from the function.

However since 0fc136ce5b ("REORG: server: use parsing ctx for server
parsing"), parse_server() uses parsing ctx to emit warning/errors, and
the ctx must be reset before returning from the function, yet this early
return was overlooked. Because of that, any ha_{warning,alert..} message
reported after early return from parse_server() could cause messages to
have an extra "parsing [file:line]" info.

We fix that by ensuring parse_server() doesn't return without resetting
the parsing context.

It should be backported up to 2.6
2025-03-07 09:39:46 +01:00
Aurelien DARRAGON
054443dfb9 BUG/MINOR: cfgparse/peers: properly handle ignored local peer case
In 8ba10fea6 ("BUG/MINOR: peers: Incomplete peers sections should be
validated."), some checks were relaxed in parse_server(), and extra logic
was added in the peers section parser in an attempt to properly ignore
incomplete "server" or "peer" statement under peers section.

This was done in response to GH #565, the main intent was that haproxy
should already complain about incomplete peers section (ie: missing
localpeer).

However, 8ba10fea69 explicitly skipped the peer cleanup upon missing
srv association for local peers. This is wrong because later haproxy
code always assumes that peer->srv is valid. Indeed, we got reports
that the (invalid) config below would cause segmentation fault on
all stable versions:

 global
   localpeer 01JM0TEPAREK01FQQ439DDZXD8

 peers my-table
   peer 01JM0TEPAREK01FQQ439DDZXD8

 listen dummy
   bind localhost:8080

To fix the issue, instead of by-passing some cleanup for the local
peer, handle this case specifically by doing the regular peer cleanup
and reset some fields set on the curpeers and curpeers proxy because
of the invalid local peer (do as if the peer was not declared).

It should still comply with requirements from #565.

This patch should be backported to all stable versions.
2025-03-06 22:05:29 +01:00
Aurelien DARRAGON
2560ab892f BUG/MINOR: cfgparse/peers: fix inconsistent check for missing peer server
In the "peers" section parser, right after parse_server() is called, we
used to check whether the curpeers->peers_fe->srv pointer was set or not
to know if parse_server() successfully added a server to the peers proxy,
a server that we can then associate to the new peer.

However the check is wrong: since curpeers->peers_fe->srv points to the
last added server, if a server was successfully added before the
failing one, we cannot detect that the last parse_server() didn't
add a server. This is known to cause bugs with bad "peer"/"server"
statements.

To fix the issue, we save a pointer on the last known
curpeers->peers_fe->srv before parse_server() is called, and we then
compare the saved value with the pointer after parse_server(): if the
value didn't change, then parse_server() didn't add a server. This makes
the check consistent in all situations.
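
The save-and-compare approach can be sketched with simplified stand-in
types. Everything below is hypothetical: the structs are reduced to the
one field that matters and fake_parse_server() only mimics the relevant
side effect of parse_server().

```c
#include <stddef.h>

/* Simplified stand-ins for the real structures. */
struct server { struct server *next; };
struct proxy  { struct server *srv; /* last added server */ };

/* Mimics parse_server(): adds <s> at the head of the proxy's server list
 * only when an address is provided, otherwise leaves the list untouched.
 */
static void fake_parse_server(struct proxy *px, struct server *s,
                              const char *addr)
{
    if (!addr)
        return;
    s->next = px->srv;
    px->srv = s;
}

/* Returns 1 if the call added a server, 0 otherwise. Testing px->srv
 * against NULL would wrongly return 1 whenever an earlier server exists.
 */
static int add_and_check(struct proxy *px, struct server *s,
                         const char *addr)
{
    struct server *prev = px->srv; /* saved before the call */

    fake_parse_server(px, s, addr);
    return px->srv != prev;
}
```
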

It should be backported to all stable versions.
2025-03-06 22:05:24 +01:00
William Lallemand
29db5406b4 CI: github: show results of the Unit tests
Add a "Show Unit-Tests results" section which shows each unit test that
failed by displaying its result file.
2025-03-06 21:23:54 +01:00
William Lallemand
0b22c8e0e0 TESTS: unit-tests: store sh -x in a result file
Store the `sh -e -x` output of the test in a result file. This file is
deleted upon success, but can be consulted if the test fails.
2025-03-06 21:22:38 +01:00
William Lallemand
7fdc4160b2 TESTS: change the output of run-unittests.sh
- "check" is run with sh -e so it will stop at the first error
- output of "check" is not shown anymore
- add a line with the name of the failed test
2025-03-06 17:53:53 +01:00
Valentine Krasnobaeva
e900ef987e BUG/MEIDUM: startup: return to initial cwd only after check_config_validity()
In check_config_validity() we evaluate some sample fetch expressions
(log-format, server rules, etc). These expressions may use external files like
maps.

If some particular 'default-path' was set in the global section before, it's no
longer applied to resolve file paths in check_config_validity(), because
parse_cfg() switches back to the initial cwd at the end of config parsing.

This fixes the issue #2886.

This patch should be backported in all stable versions since 2.4.0, including
2.4.0.
2025-03-06 10:49:48 +01:00
Roberto Moreda
f98b5c4f59 MINOR: log: add dont-parse-log and assume-rfc6587-ntf options
This commit introduces the dont-parse-log option to disable log message
parsing, allowing raw log data to be forwarded without modification.

Also, it adds the assume-rfc6587-ntf option to frame log messages
using only non-transparent framing as per RFC 6587. This avoids
misparsing in certain cases (mainly with non-RFC-compliant messages).
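
As a sketch of what non-transparent framing means in RFC 6587: each
message ends at a trailer character, commonly LF, and no length prefix is
interpreted. The helper below is hypothetical and only assumes LF as the
trailer.

```c
#include <stddef.h>
#include <string.h>

/* Finds the next LF-terminated message in <buf>. Returns the message
 * length without the LF and stores in <consumed> the number of bytes to
 * skip to reach the following message. Returns -1 when no complete
 * message is available yet.
 */
static long next_ntf_frame(const char *buf, size_t len, size_t *consumed)
{
    const char *lf = memchr(buf, '\n', len);

    if (!lf)
        return -1;

    *consumed = (size_t)(lf - buf) + 1;
    return (long)(lf - buf);
}
```

A stream such as "123 hello\n" shows why forcing this mode can help: a
parser that also accepts octet-counting could read the leading "123" as a
message length, whereas the helper above simply returns a 9-byte message.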

The documentation is updated to include details on the new options and
their intended use cases.

This feature was discussed in GH #2856
2025-03-06 09:30:39 +01:00
Roberto Moreda
c25e6f5efa MINOR: log: detach prepare from parse message
This commit adds a new function `prepare_log_message` to initialize log
message buffers and metadata. This function sets default values for log
level and facility, ensuring a consistent starting state for log
processing. It also prepares the buffer and metadata fields, simplifying
subsequent log parsing and construction.
2025-03-06 09:30:31 +01:00
Roberto Moreda
834e9af877 MINOR: log: add options eval for log-forward
This commit adds parsing of options in log-forward config sections and
sets the stage to implement actual changes of behavior. So far
we only take into account proxy->options2, which is the bit container with
the most available positions.
2025-03-06 09:30:25 +01:00
Aurelien DARRAGON
0746f6bde0 MINOR: cfgparse-listen: add and use cfg_parse_listen_match_option() helper
cfg_parse_listen_match_option() takes cfg_opt array as parameter, as well
as current args, expected mode and cap bitfields.

It is expected to be used under cfg_parse_listen() function or similar.
Its goal is to remove code duplication around proxy->options and
proxy->options2 handling, since the same checks are performed for the
two. Also, this function could help to evaluate proxy options for
mode-specific proxies such as log-forward section for instance:
by giving the expected mode and capability as input, the function
would only match compatible options.
2025-03-06 09:30:18 +01:00
Aurelien DARRAGON
d9aa199100 MINOR: proxy: make pr_mode enum bitfield compatible
Current pr_mode enum is a regular enum because a proxy only supports one
mode at a time. However it can be handy for a function to be given a
list of compatible modes for a proxy, and we can't do that using a
bitfield because pr_mode is not bitfield compatible (values share
the same bits).

In this patch we manually define pr_mode values so that they are all
using separate bits and allows a function to take a bitfield of
compatible modes as parameter.
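
The idea can be sketched as follows. The enum below is a hypothetical
example: the names and values are illustrative and are not copied from
the HAProxy headers.

```c
/* Each mode uses its own bit so that several modes can be OR-ed together
 * into a mask, while a proxy itself still holds exactly one value.
 */
enum mode {
    MODE_TCP    = 1 << 0,
    MODE_HTTP   = 1 << 1,
    MODE_SYSLOG = 1 << 2,
};

/* Returns non-zero if <m> is part of the <compatible> mode mask. */
static int mode_allowed(enum mode m, unsigned int compatible)
{
    return (compatible & (unsigned int)m) != 0;
}
```

With sequential values such as 0, 1, 2 the same test would be ambiguous,
since 0 matches no mask and 1 | 2 cannot be told apart from a value of 3.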
2025-03-06 09:30:11 +01:00
Aurelien DARRAGON
c7abe7778e MEDIUM: log: postpone the decision to send or not log with empty messages
As reported by Nick Ramirez in GH #2891, it is currently not possible to
use log-profile without a log-format set on the proxy.

This is due to historical reasons, because all log sending functions avoid
trying to send a log with empty message. But now with log-profile which
can override log-format, it is possible that some loggers may actually
end up generating a valid log message that should be sent! Yet from the
upper logging functions we don't know about that because loggers are
evaluated in lower API functions.

Thus, to avoid skipping potentially valid messages (thanks to log-profile
overrides), in this patch we postpone the decision to send or not empty
log messages in lower log API layer, ie: _process_send_log_final(), once
the log-profile settings were evaluated for a given logger.

A known side-effect of this change is that fe->log_count statistic may
be increased even if no log message is sent because the message was empty
and even the log-profile didn't help to produce a non empty log message.
But since configurations lacking proxy log-format are not supposed to be
used without log-profile (+ log steps combination) anyway it shouldn't be
an issue.
2025-03-05 15:38:52 +01:00
Aurelien DARRAGON
9e9b110032 MINOR: log: use __send_log() with exact payload length
Historically, __send_log() was called with terminating NULL byte after
the message payload. But now that __send_log() supports being called
without terminating NULL byte (thanks to size hint), and that __send_log()
actually strips any \n or NULL byte, we don't need to bother with that
anymore. So let's remove extra logic around __send_log() users where we
added 1 extra byte for the terminating NULL byte.

No change of behavior should be expected.
2025-03-05 15:38:46 +01:00
Aurelien DARRAGON
94a9b0f5de BUG/MINOR: log: set proper smp size for balance log-hash
result.data.u.str.size was set to size+1 to take into account terminating
NULL byte as per the comment. But this is wrong because the caller is free
to set size to just the right amount of bytes (without terminating NULL
byte). In fact all smp API functions will not read past str.data so there
is no risk of uninitialized reads, but this leaves an ambiguity for
converters that may use all the smp size to perform transformations, and
since we don't know about the "message" memory origin, we cannot assume
that its size may be greater than size. So we max it out to size just to
be safe.

This bug was not known to cause any issue, it was spotted during code
review. It should be backported in 2.9 with b30bd7a ("MEDIUM: log/balance:
support for the "hash" lb algorithm")
2025-03-05 15:38:41 +01:00
Aurelien DARRAGON
ddf66132f4 CLEANUP: log: removing "log-balance" references
This is a complementary patch to 0e1f389fe9 ("DOC: config: removing
"log-balance" references"): we properly removed all log-balance
references in the doc but there remained some in the code, let's fix
that.

It could be backported in 2.9 with 0e1f389fe9
2025-03-05 15:38:34 +01:00
Valentine Krasnobaeva
b46b81949f MINOR: sample: allow custom date format in error-log-format
Sample fetches %[accept_date] and %[request_date] with converters can be used
in error-log-format string. But in most error cases they fetch nothing,
as error logs are produced on SSL handshake issues or when invalid PROXY
protocol header is used. Stream object is never allocated in such cases and
smp_fetch_accept_date() just simply returns 0.

There is a need to have a custom date format (ISO8601) also in the error logs,
along with normal logs. When sess_build_logline_orig() builds log line it
always copies the accept date to strm_logs structure. When stream is absent,
accept date is copied from the session object.

So, if the stream object wasn't allocated, let's use the session date info in
smp_fetch_accept_date(). This allows then, in sample_process(), to apply to the
fetched date different converters and formats.

This fixes the issue #2884.
2025-03-04 18:57:29 +01:00
Olivier Houchard
335ef3264b DEBUG: init: Add a macro to register unit tests
Add a new macro, REGISTER_UNITTEST(), that will automatically make sure
we call hap_register_unittest(), instead of having to create a function
that will do so.
2025-03-04 18:18:10 +01:00
William Lallemand
588237ca6e CI: github: fix h2spec.config proxy names
h2spec.config config file emitted a warning because the frontend name
has the same name as the backend.
2025-03-04 11:44:03 +01:00
William Lallemand
06d86822c1 TESTS: ist: add a ist.sh to launch in make unit-tests
Compile and run the ist unit tests from ist.sh
2025-03-04 11:25:35 +01:00
William Lallemand
11ea331e20 TESTS: ist: use the exit code to return a verdict
Use the exit code to return a verdict on the test.
2025-03-04 11:25:35 +01:00
William Lallemand
ddd2c82a35 TESTS: ist: fix wrong array size
test_istzero() and test_istpad() have the wrong array size for buf[], which
lacks the space for the '\0'.

Could be backported in every stable branches.
2025-03-04 11:25:25 +01:00
William Lallemand
937ece45d4 CI: github: remove smoke tests from vtest.yml
Smoke tests from the vtest.yml are not useful anymore since they are run
directly by tests/unit/smoke/test.sh. This patch removes them.
2025-03-03 12:46:20 +01:00
William Lallemand
cf71e9f5cf MINOR: jws: conversion to NIST curves name
OpenSSL version greater than 3.0 does not use the same API when
manipulating EVP_PKEY structures, the EC_KEY API is deprecated and it's
not possible anymore to get an EC_GROUP and simply call
EC_GROUP_get_curve_name().

Instead, one must call EVP_PKEY_get_utf8_string_param with the
OSSL_PKEY_PARAM_GROUP_NAME parameter, but this would result in a SECG
curves name, instead of a NIST curves name in previous version.
(ex: secp384r1 vs P-384)

This patch adds 2 functions:

- the first one looks for a curve name and converts it to an OpenSSL
  NID.

- the second one converts a NID to a NIST curves name

The list only contains: P-256, P-384 and P-521 for now, it could be
extended in the future with more curves.
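
The name conversion described above can be sketched as a small lookup table.
This is only an illustration: the real patch goes through OpenSSL NIDs, while
this sketch maps names directly so that it builds without OpenSSL. The struct,
the table and curve_to_nist() are hypothetical names, and the SECG-side
spellings ("prime256v1", "secp384r1", "secp521r1") are assumed to be the ones
the library reports.

```c
#include <stddef.h>
#include <string.h>

/* Illustration only: pairs a library-reported curve name with the
 * NIST name expected in a JWK "crv" field. Not HAProxy's own table. */
struct curve_alias {
	const char *secg;  /* assumed name reported by the TLS library */
	const char *nist;  /* NIST name */
};

static const struct curve_alias curve_aliases[] = {
	{ "prime256v1", "P-256" },
	{ "secp384r1",  "P-384" },
	{ "secp521r1",  "P-521" },
};

/* Returns the NIST name for <name>, accepting either spelling,
 * or NULL when the curve is not in the table. */
const char *curve_to_nist(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(curve_aliases) / sizeof(curve_aliases[0]); i++) {
		if (strcmp(name, curve_aliases[i].secg) == 0 ||
		    strcmp(name, curve_aliases[i].nist) == 0)
			return curve_aliases[i].nist;
	}
	return NULL;
}
```

Adding a curve then only means adding one row to the table.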
2025-03-03 12:43:32 +01:00
William Lallemand
8a6b0b06cd TESTS: add config smoke checks in the unit tests
vtest.yml contains some config checks that are used to check the
memleaks.

This patch adds a unit test which runs the same tests.
2025-03-03 12:43:32 +01:00
William Lallemand
7a2a613132 CI: github: run make unit-tests on the CI
Run the new make unit-tests on the CI.

It requires HAProxy to be built with -DDEBUG_UNIT so the -U option is
available in HAProxy
2025-03-03 12:43:32 +01:00
William Lallemand
09457111bb TESTS: jws: register a unittest for jwk
Add a way to test the jwk converter in the unit test system

    $ make TARGET=linux-glibc USE_OPENSSL=1 CFLAGS="-DDEBUG_UNIT=1"
    $ ./haproxy -U jwk foobar.pem.rsa
    {
        "kty": "RSA",
        "n":   "...",
        "e":   "AQAB"
    }
    $ ./haproxy -U jwk foobar.pem.ecdsa
    {
        "kty": "EC",
        "crv": "P-384",
        "x":   "...",
        "y":   "..."
    }

This is then tested by a shell script:

    $ HAPROXY_PROGRAM=${PWD}/haproxy tests/unit/jwk/test.sh
    + readlink -f tests/unit/jwk/test.sh
    + BASENAME=/haproxy/tests/unit/jwk/test.sh
    + dirname /haproxy/tests/unit/jwk/test.sh
    + TESTDIR=/haproxy/tests/unit/jwk
    + HAPROXY_PROGRAM=/haproxy/haproxy
    + mktemp
    + FILE1=/tmp/tmp.iEICxC5yNK
    + /haproxy/haproxy -U jwk /haproxy/tests/unit/jwk/ecdsa.key
    + diff -Naurp /haproxy/tests/unit/jwk/ecdsa.pub.jwk /tmp/tmp.iEICxC5yNK
    + rm /tmp/tmp.iEICxC5yNK
    + mktemp
    + FILE2=/tmp/tmp.EIrGZGaCDi
    + /haproxy/haproxy -U jwk /haproxy/tests/unit/jwk/rsa.key
    + diff -Naurp /haproxy/tests/unit/jwk/rsa.pub.jwk /tmp/tmp.EIrGZGaCDi
    + rm /tmp/tmp.EIrGZGaCDi

    $ echo $?
    0
2025-03-03 12:43:32 +01:00
William Lallemand
1e7478bb4e TESTS: add a unit test runner in the Makefile
`make unit-tests` would run shell scripts from tests/unit/

The run-unittests.sh script will look for any .sh in tests/unit/ and
will call it twice:

- first with the 'check' argument in order to decide if we should skip
  the test or not
- second to run the check

A simple test could be written this way:

	#!/bin/sh

	check() {
	       ${HAPROXY_PROGRAM} -cc 'feature(OPENSSL)'
	       command -v socat
	}

	run() {
		 ${HAPROXY_PROGRAM} -dI -f ${ROOTDIR}/examples/quick-test.cfg -c
	}

	case "$1" in
	       "check")
	               check
	       ;;
	       "run")
	               run
	       ;;
	esac

The tests *MUST* be written in POSIX shell in order to be portable, and
any special commands should be tested with `command -v` before using it.

Tests are run with `sh -e` so everything must be tested.
2025-03-03 12:43:32 +01:00
William Lallemand
a647839954 DEBUG: init: add a way to register functions for unit tests
Doing unit tests with haproxy was always a bit difficult, some of the
function you want to test would depend on the buffer or trash buffer
initialisation of HAProxy, so building a separate main() for them is
quite hard.

This patch adds a way to register a function that can be called with the
"-U" parameter on the command line, will be executed just after
step_init_1() and will exit the process with its return value as an exit
code.

When using the -U option, every keyword after this option is passed to
the callback and could be used as a parameter, letting the capability to
handle complex arguments if required by the test.

HAProxy needs to be built with DEBUG_UNIT to activate this feature.
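
A rough sketch of the mechanism, assuming a fixed-size table and a callback
that receives the remaining command-line words. All names here
(register_unittest(), run_unittest(), demo_test(), MAX_UNITTESTS) are
hypothetical and do not claim to match the functions added by the patch:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: gets the words that follow the test name. */
typedef int (*unittest_fct)(int argc, char **argv);

struct unittest {
	const char *name;
	unittest_fct fct;
};

#define MAX_UNITTESTS 16
static struct unittest unittests[MAX_UNITTESTS];
static size_t nb_unittests;

/* Registers <fct> under <name>. Returns 0, or -1 if the table is full. */
int register_unittest(const char *name, unittest_fct fct)
{
	if (nb_unittests >= MAX_UNITTESTS)
		return -1;
	unittests[nb_unittests].name = name;
	unittests[nb_unittests].fct = fct;
	nb_unittests++;
	return 0;
}

/* Runs the test called <name> and returns its return value, which the
 * caller would use as the process exit code. Returns -1 if not found. */
int run_unittest(const char *name, int argc, char **argv)
{
	size_t i;

	for (i = 0; i < nb_unittests; i++) {
		if (strcmp(unittests[i].name, name) == 0)
			return unittests[i].fct(argc, argv);
	}
	return -1;
}

/* Example callback: succeeds (0) only when given exactly one argument. */
int demo_test(int argc, char **argv)
{
	(void)argv;
	return argc == 1 ? 0 : 1;
}
```

With such a table, "-U demo file.pem" would boil down to
run_unittest("demo", 1, args) followed by exit() with the returned value.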
2025-03-03 12:43:32 +01:00
William Lallemand
4dc0ba233e MINOR: jws: implement a JWK public key converter
Implement a converter which takes an EVP_PKEY and converts it to a
public JWK key. This is the first step of the JWS implementation.

It supports both EC and RSA keys.

Known to work with:

- LibreSSL
- AWS-LC
- OpenSSL > 1.1.1
2025-03-03 12:43:32 +01:00
Willy Tarreau
730641f7ca BUG/MINOR: server: check for either proxy-protocol v1 or v2 to send header
As reported in issue #2882, using "no-send-proxy-v2" on a server line does
not properly disable the use of proxy-protocol if it was enabled in a
default-server directive in combination with other PP options. The reason
for this is that the sending of a proxy header is determined by a test on
srv->pp_opts without any distinction, so disabling PPv2 while leaving other
options results in a PPv1 header to be sent.

Let's fix this by explicitly testing for the presence of either send-proxy
or send-proxy-v2 when deciding to send a proxy header.
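
The difference between the two tests can be illustrated as follows. The flag
names and values are assumptions made for this sketch, not copies of the
definitions used for srv->pp_opts:

```c
/* Assumed flag layout, for illustration only. */
#define PP_V1      0x01u  /* "send-proxy" */
#define PP_V2      0x02u  /* "send-proxy-v2" */
#define PP_V2_SSL  0x04u  /* an additional PPv2 option */

/* Old behaviour: any option bit left in pp_opts triggers a header. */
int must_send_proxy_old(unsigned int pp_opts)
{
	return pp_opts != 0;
}

/* Fixed behaviour: only an explicit v1 or v2 request triggers a header. */
int must_send_proxy_new(unsigned int pp_opts)
{
	return (pp_opts & (PP_V1 | PP_V2)) != 0;
}
```

Starting from PP_V2 | PP_V2_SSL and clearing PP_V2 (the "no-send-proxy-v2"
case), the old test still reports that a header must be sent while the new
one does not.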

This can be backported to all versions. Thanks to Andre Sencioles (@asenci)
for reporting the issue and testing the fix.
2025-03-03 04:05:47 +01:00
Amaury Denoyelle
d0f97040a3 BUG/MINOR: hq-interop: fix leak in case of rcv_buf early return
HTTP/0.9 parser was recently updated to support truncated requests in
rcv_buf operation. However, this caused a leak as input buffer is
allocated early.

In fact, the leak was already present in case of fatal errors. Fix this
by first delaying buffer allocation, so that initial checks are
performed before. Then, ensure that buffer is released in case of a
later error.

This is considered as minor, as HTTP/0.9 is reserved for experiment and
QUIC interop usages.

This should be backported up to 2.6.
2025-02-28 17:37:00 +01:00
Willy Tarreau
fd5d59967a MINOR: h1: permit to relax the websocket checks for missing mandatory headers
At least one user would like to allow a standards-violating client setup
WebSocket connections through haproxy to a standards-violating server that
accepts them. While this should of course never be done over the internet,
it can make sense in the datacenter between application components which do
not need to mask the data, so this typically falls into the situation of
what the "accept-unsafe-violations-in-http-request" option and the
"accept-unsafe-violations-in-http-response" option are made for.
See GH #2876 for more context.

This patch relaxes the test on the "Sec-Websocket-Key" header field in
the request, and of the "Sec-Websocket-Accept" header in the response
when these respective options are set.

The doc was updated to reference this addition. This may be backported
to 3.1 but preferably not further.
2025-02-28 17:31:20 +01:00
Christopher Faulet
0e08252294 BUG/MEDIUM: mux-fcgi: Try to fully fill demux buffer on receive if not empty
Don't reserve space for the HTX overhead on receive if the demux buffer is
not empty. Otherwise, the demux buffer may be erroneously reported as full
and this may block records processing. Because of this bug, a ping-pong loop
till timeout between data reception and demux process can be observed.

This bug was introduced by the commit 5f927f603 ("BUG/MEDIUM: mux-fcgi:
Properly handle read0 on partial records"). To fix the issue, if the demux
buffer is not empty when we try to receive more data, all free space in the
buffer can now be used. However, if the demux buffer is empty, we still try
to keep it aligned with the HTX.

This patch must be backported to 3.1.
2025-02-28 16:07:05 +01:00
Amaury Denoyelle
3cc095a011 MINOR: hq-interop: properly handle incomplete request
Extends HTTP/0.9 layer to be able to deal with incomplete requests.
Instead of an error, 0 is returned. Thus, instead of a stream closure,
QUIC-MUX may retry rcv_buf operation later if more data is received,
similarly to HTTP/3 layer.

Note that HTTP/0.9 is only used for testing and interop purpose. As
such, this limitation is not considered as a bug. It is probably not
worth to backport it.
2025-02-27 17:34:06 +01:00
Amaury Denoyelle
0aa35289b3 CLEANUP: h3: fix documentation of h3_rcv_buf()
Return value of h3_rcv_buf() is incorrectly documented. Indeed, it may
return a positive value to indicate that input bytes were converted into
HTX. This is especially important, as caller uses this value to consume
the reported data amount in QCS Rx buffer.

This should be backported up to 2.6. Note that on 2.8, h3_rcv_buf() was
named h3_decode_qcs().
2025-02-27 17:31:40 +01:00
Amaury Denoyelle
f6648d478b BUG/MINOR: h3: do not report transfer as aborted on preemptive response
HTTP/3 specification allows a server to emit the entire response even if
only a partial request was received. In particular, this happens when
request STREAM FIN is delayed and transmitted in an empty payload frame.

In this case, qcc_abort_stream_read() was used by HTTP/3 layer to emit a
STOP_SENDING. Remaining received data were not transmitted to the stream
layer as they were simply discarded. However, this prevents FIN
transmission to the stream layer. This causes the transfer to be
considered as prematurely closed, resulting in a cL-- log line status.
This is misleading to users which could interpret it as if the response
was not sent.

To fix this, disable STOP_SENDING emission on full preemptive response
emission. Rx channel is kept opened until the client closes it with
either a FIN or a RESET_STREAM. This ensures that the FIN signal can be
relayed to the stream layer, which allows the transfer to be reported as
completed.

This should be backported up to 2.9.
2025-02-27 17:23:24 +01:00
Dragan Dosen
0ae7a5d672 BUG/MINOR: server: fix the "server-template" prefix memory leak
The srv->tmpl_info.prefix was not freed in srv_free_params().

This could be backported to all stable versions.
2025-02-27 04:21:01 +01:00
Dragan Dosen
6838fe43a3 BUG/MEDIUM: server: properly initialize PROXY v2 TLVs
The PROXY v2 TLVs were not properly initialized when defined with
"set-proxy-v2-tlv-fmt" keyword, which could have caused a crash when
validating the configuration or malfunction (e.g. when used in
combination with "server-template" and/or "default-server").

The issue was introduced with commit 6f4bfed3a ("MINOR: server: Add
parser support for set-proxy-v2-tlv-fmt").

This should be backported up to 2.9.
2025-02-27 04:20:45 +01:00
Olivier Houchard
706b008429 MEDIUM: servers: Add strict-maxconn.
Maxconn is a bit of a misnomer when it comes to servers, as it doesn't
control the maximum number of connections we establish to a server, but
the maximum number of simultaneous requests. So add "strict-maxconn",
that will make it so we will never establish more connections than
maxconn.
It extends the meaning of the "restricted" setting of
tune.takeover-other-tg-connections, as it will also attempt to get idle
connections from other thread groups if strict-maxconn is set.
2025-02-26 13:00:18 +01:00
Olivier Houchard
8de8ed4f48 MEDIUM: connections: Allow taking over connections from other tgroups.
Allow haproxy to take over idle connections from other thread groups
than our own. To control that, add a new tunable,
tune.takeover-other-tg-connections. It can have 3 values, "none", where
we won't attempt to get connections from the other thread group (the
default), "restricted", where we only will try to get idle connections
from other thread groups when we're using reverse HTTP, and "full",
where we always try to get connections from other thread groups.
Unless there is a special need, it is advised to use "none" (or
restricted if we're using reverse HTTP) as using connections from other
thread groups may have a performance impact.
2025-02-26 13:00:18 +01:00
Olivier Houchard
d31b1650ae MEDIUM: pollers: Drop fd events after a takeover to another tgid.
In pollers that support it, provide the generation number in addition to
the fd, and, when an event happened, if the generation number is the
same, but the tgid changed, then assume the fd was taken over by a
thread from another thread group, and just delete the event from the
current thread's poller, as we no longer want to hear about it.
2025-02-26 13:00:18 +01:00
Olivier Houchard
c36aae2af1 MINOR: pollers: Add a fixup_tgid_takeover() method.
Add a fixup_tgid_takeover() method to pollers for which it makes sense
(epoll, kqueue and evport). That method can be called after a takeover
of a fd from a different thread group, to make sure the poller's
internal structure reflects the new state.
2025-02-26 13:00:18 +01:00
Olivier Houchard
752c5cba5d MEDIUM: epoll: Make sure we can add a new event
Check that the call to epoll_ctl() succeeds, and if it does not, if
we're adding a new event and it fails with EEXIST, then delete and
re-add the event. There are a few cases where we may already have events
for a fd. If epoll_ctl() fails for any reason, use BUG_ON to make sure
we immediately crash, as this should not happen.
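
A minimal, Linux-only sketch of the "delete and re-add on EEXIST" step. The
helper name epoll_add_or_replace() is hypothetical, and where the patch
crashes through BUG_ON on unexpected failures, this sketch simply returns -1:

```c
#include <errno.h>
#include <sys/epoll.h>

/* Registers <fd> in <epfd> for <events>. If the fd is already known to
 * the epoll instance (EEXIST), the stale entry is deleted and the fd is
 * added again. Returns 0 on success, -1 on any other failure. */
int epoll_add_or_replace(int epfd, int fd, unsigned int events)
{
	struct epoll_event ev = { 0 };

	ev.events = events;
	ev.data.fd = fd;

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
		return 0;
	if (errno != EEXIST)
		return -1;

	/* an older registration exists: drop it, then register again */
	if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &ev) != 0)
		return -1;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```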
2025-02-26 13:00:18 +01:00
Olivier Houchard
c5cc09c00d MINOR: fd: Add fd_lock_tgid_cur().
Add fd_lock_tgid_cur(), a function that will lock the tgid, without
modifying its value.
2025-02-26 13:00:18 +01:00
Olivier Houchard
52b97ff8dd MEDIUM: fd: Wait if locked in fd_grab_tgid() and fd_take_tgid().
Wait while the tgid is locked in fd_grab_tgid() and fd_take_tgid().
As that lock is barely used, it should have no impact.
2025-02-26 13:00:18 +01:00
Ilia Shipitsin
814b5dfe30 BUILD: add possibility to use different QuicTLS variants
initially QuicTLS started as a patchset on top of OpenSSL,
currently project has started its own journey as QuicTLS

somehow we need both

ML: https://www.mail-archive.com/haproxy@formilux.org/msg45574.html
GH: https://github.com/quictls/quictls/issues/244
2025-02-25 10:29:46 +01:00
Willy Tarreau
a826250659 OPTIM: connection: don't try to kill other threads' connection when !shared
Users may have good reasons for using "tune.idle-pool.shared off", one of
them being the cost of moving cache lines between cores, or the kernel-
side locking associated with moving FDs. For this reason, when getting
close to the file descriptors limits, we must not try to kill adjacent
threads' FDs when the sharing of pools is disabled. This is extremely
expensive and kills the performance. We must limit ourselves to our local
FDs only. In such cases, it's up to the users to configure a large enough
maxconn for their usages.

Before this patch, perf top reported 9% CPU usage in connect_server()
on the trylock used to kill connections when running at 4800 conns for
a global maxconn of 6400 on a 128-thread server. Now it doesn't spend
its time there anymore, and performance has increased by 12%. Note,
it was verified that disabling the locks in such a case has no effect
at all, so better keep them and stay safe.
2025-02-25 09:23:46 +01:00
Willy Tarreau
2e0bac90da BUG/MEDIUM: stream: don't use localtime in dumps from a signal handler
In issue #2861, Jarosław Rzeszótko reported another issue with
"show threads", this time in relation with the conversion of a stream's
accept date to local time. Indeed, if the libc was interrupted in this
same function, it could have been interrupted with a lock held, then
it's no longer possible to dump the date, and we face a deadlock.
This is easy to reproduce with logging enabled.

Let's detect we come from a signal handler and do not try to resolve
the time to localtime in this case.
2025-02-24 13:40:42 +01:00
Willy Tarreau
fb7874c286 MINOR: tinfo: split the signal handler report flags into 3
While signals are not recursive, one signal (e.g. wdt) may interrupt
another one (e.g. debug). The problem this causes is that when leaving
the inner handler, it removes the outer's flag, hence the protection
that comes with it. Let's just have 3 distinct flags for regular signals,
debug signal and watchdog signal. We add a 4th definition which is an
aggregate of the 3 to ease testing.
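
The idea can be sketched with three bits and one aggregate mask. Names and
values below are illustrative assumptions, and a plain variable is used
where real code would need per-thread, atomically updated flags:

```c
/* Illustrative flag values, one bit per kind of handler. */
#define FL_IN_SIG_HANDLER  0x01u  /* regular signal */
#define FL_IN_DBG_HANDLER  0x02u  /* debug signal */
#define FL_IN_WDT_HANDLER  0x04u  /* watchdog signal */
#define FL_IN_ANY_HANDLER  (FL_IN_SIG_HANDLER | FL_IN_DBG_HANDLER | \
                            FL_IN_WDT_HANDLER)

static unsigned int th_flags;  /* stand-in for the thread's flags */

void handler_enter(unsigned int flag) { th_flags |= flag; }
void handler_leave(unsigned int flag) { th_flags &= ~flag; }

/* Non-zero while at least one handler is still running. */
int in_any_handler(void)
{
	return (th_flags & FL_IN_ANY_HANDLER) != 0;
}
```

When the watchdog handler interrupts the debug handler and then leaves, only
its own bit is cleared, so in_any_handler() keeps reporting the outer one.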
2025-02-24 13:37:52 +01:00
Willy Tarreau
bbf824933f BUG/MINOR: h2: always trim leading and trailing LWS in header values
Annika Wickert reported some occasional disconnections between haproxy
and varnish when communicating over HTTP/2, with varnish complaining
about protocol errors while captures looked apparently normal. Nils
Goroll managed to reproduce this on varnish by injecting the capture of
the outgoing haproxy traffic and noticed that haproxy was forwarding a
header value containing a trailing space, which is now explicitly
forbidden since RFC9113.

It turns out that the only way for such a header to pass through haproxy
is to arrive in h2 and not be edited, in which case it will arrive in
HTX with its undesired spaces. Since the code dealing with HTX headers
always trims spaces around them, these are not observable in dumps, but
only when started in debug mode (-d). Conversions to/from h1 also drop
the spaces.

With this patch we trim LWS both on input and on output. This way we
always present clean headers in the whole stack, and even if some are
manually crafted by the configuration or Lua, they will be trimmed on
the output.
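
As an illustration of the trimming itself, here is a small helper that drops
leading and trailing spaces and horizontal tabs from a header value without
copying it. The function name trim_lws() is hypothetical; this is a sketch of
the technique, not the code from the patch:

```c
#include <stddef.h>

/* Trims leading and trailing SP / HTAB from the <len>-byte value that
 * <*start> points to. <*start> is moved to the first kept byte and the
 * new length is returned. The data itself is not modified. */
size_t trim_lws(const char **start, size_t len)
{
	const char *p = *start;

	while (len > 0 && (p[0] == ' ' || p[0] == '\t')) {
		p++;
		len--;
	}
	while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
		len--;

	*start = p;
	return len;
}
```

A value made only of spaces and tabs ends up with a length of 0.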

This must be backported to all stable versions.

Thanks to Annika for the helpful capture and Nils for the help with
the analysis on the varnish side!
2025-02-24 09:39:57 +01:00
Vincent Dechenaux
9011b3621b MINOR: compression: Introduce minimum size
This is the introduction of "minsize-req" and "minsize-res".
These two options allow you to set the minimum payload size required for
compression to be applied.
This helps save CPU on both server and client sides when the payload does
not need to be compressed.
2025-02-22 11:32:40 +01:00
Willy Tarreau
e7510d6230 CLEANUP: task: move the barrier after clearing th_ctx->current
There's a barrier after releasing the current task in the scheduler.
However it's improperly placed, it's done after pool_free() while in
fact it must be done immediately after resetting the current pointer.
Indeed, the purpose is to make sure that nobody sees the task as valid
when it's in the process of being released. This is something that
could theoretically happen if interrupted by a signal in the inlined
code of pool_free() if the compiler decided to postpone the write to
->current. In practice since nothing fancy is done in the inlined part
of the function, there's currently no risk of reordering. But it could
happen if the underlying __pool_free() were to be inlined for example,
and in this case we could possibly observe th_ctx->current pointing
to something currently being destroyed.

With the barrier between the two, there's no risk anymore.
2025-02-21 18:31:46 +01:00
Willy Tarreau
eb41d768f9 MINOR: tools: use only opportunistic symbols resolution
As seen in issue #2861, dladdr_and_size() can be quite expensive and
will often hold a mutex in the underlying library. It becomes a real
problem when issuing lots of "show threads" or wdt warnings in parallel
because threads will queue up waiting for each other to finish, adding
to their existing latency that possibly caused the warning in the first
place.

Here we're taking a different approach. If the thread is not isolated
and not panicking, it's doing unimportant stuff like showing threads
or warnings. In this case we try to grab a lock, and if we fail because
another thread is already there, we just pretend we cannot resolve the
symbol. This is not critical because then we fall back to the already
used case which consists in writing "main+<offset>". In practice this
will almost never happen except in bad situations which could have
otherwise degenerated.
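
The opportunistic path can be sketched with a C11 atomic_flag used as a
try-lock. The flag, slow_resolve() and resolve_opportunistic() are
hypothetical stand-ins (the patch may well use a different lock type), and
only the non-isolated, non-panicking case is shown:

```c
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag resolver_busy = ATOMIC_FLAG_INIT;

/* Stand-in for the expensive lookup; always "finds" a symbol here. */
static const char *slow_resolve(const void *addr)
{
	(void)addr;
	return "some_symbol";
}

/* Tries to resolve <addr>. If another caller is already resolving,
 * gives up at once and returns NULL, so that the caller can fall back
 * to the "main+<offset>" form instead of queuing up. */
const char *resolve_opportunistic(const void *addr)
{
	const char *name;

	if (atomic_flag_test_and_set_explicit(&resolver_busy,
	                                      memory_order_acquire))
		return NULL;

	name = slow_resolve(addr);
	atomic_flag_clear_explicit(&resolver_busy, memory_order_release);
	return name;
}
```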
2025-02-21 18:26:29 +01:00
Willy Tarreau
3c22fa315b BUG/MEDIUM: stream: use non-blocking freq_ctr calls from the stream dumper
The stream dump function is called from signal handlers (warning, show
threads, panic). It makes use of read_freq_ctr() which might possibly
block if it tries to access a locked freq_ctr in the process of being
updated, e.g. by the current thread.

Here we're relying on the non-blocking API instead. It may return incorrect
values (typically smaller ones after resetting the curr counter) but at
least it will not block.

This needs to be backported to stable versions along with the previous
commit below:

   MINOR: freq_ctr: provide non-blocking read functions

At least 3.1 is concerned as the warnings tend to increase the risk of
this situation appearing.
2025-02-21 18:26:29 +01:00
Willy Tarreau
29e246a84c MINOR: freq_ctr: provide non-blocking read functions
Some code called by the debug handlers in the context of a signal handler
accesses some freq_ctr and occasionally ends up on a locked one from
the same thread that is dumping it. Let's introduce a non-blocking version
that at least allows to return even if the value is in the process of being
updated, it's less problematic than hanging.
2025-02-21 18:26:29 +01:00
Willy Tarreau
84d4c948fc BUG/MEDIUM: stream: never allocate connection addresses from signal handler
In __strm_dump_to_buffer(), we call conn_get_src()/conn_get_dst() to try
to retrieve the connection's IP addresses. But this function may be called
from a signal handler to dump a currently running stream, and if the
addresses were not allocated yet, a pool_alloc() will be performed while
we might possibly already be running pools code, resulting in pool list
corruption.

Let's just make sure we don't call these sensitive functions there when
called from a signal handler.

This must be backported at least to 3.1 and ideally all other versions,
along with this previous commit:

  MINOR: tinfo: add a new thread flag to indicate a call from a sig handler
2025-02-21 17:41:38 +01:00
Willy Tarreau
ddd173355c MINOR: tinfo: add a new thread flag to indicate a call from a sig handler
Signal handlers must absolutely not change anything, but some long and
complex call chains may look innocuous at first glance, yet result in
some subtle write accesses (e.g. pools) that can conflict with a running
thread being interrupted.

Let's add a new thread flag TH_FL_IN_SIG_HANDLER that is only set when
entering a signal handler and cleared when leaving them. Note, we're
speaking about real signal handlers (synchronous ones), not deferred
ones. This will allow some sensitive call places to act differently
when detecting such a condition, and possibly even to place a few new
BUG_ON().
2025-02-21 17:41:38 +01:00
Willy Tarreau
a56dfbdcb4 BUG/MINOR: mux-h1: always make sure h1s->sd exists in h1_dump_h1s_info()
This function may be called from a signal handler during a warning,
a panic or a show thread. We need to be more cautious about what may
or may not be dereferenced since an h1s is not necessarily fully
initialized. Loops of "show threads" sometimes manage to crash when
dereferencing a null h1s->sd, so let's guard it and add a comment
reminding about the unusual call place.

This can be backported to the relevant versions.
2025-02-21 17:41:38 +01:00
Willy Tarreau
9d5bd47634 BUG/MINOR: stream: do not call co_data() from __strm_dump_to_buffer()
co_data() was instrumented to detect cases where c->output > data and
emits a warning if that's not correct. The problem is that it happens
quite a bit during "show threads" if it interrupts traffic anywhere,
and that in some environments building with -DDEBUG_STRICT_ACTION=3,
it will kill the process.

Let's just open-code the channel functions that make access to co_data(),
there are not that many and the operations remain very simple.

This can be backported to 3.1. It didn't trigger in earlier versions
because they didn't have this CHECK_IF_HOT() test.
2025-02-21 17:18:00 +01:00
Ilia Shipitsin
0bdf414fa5 CI: QUIC Interop: clean old docker images
currently temporary docker images are kept forever. let's delete
outdated ones
2025-02-21 11:34:43 +01:00
Aurelien DARRAGON
97a19517ff MINOR: clock: always use atomic ops for global_now_ms
global_now_ms is shared between threads so we must give hint to the
compiler that read/writes operations should be performed atomically.

Everywhere global_now_ms was used, atomic ops were used, except in
clock_update_global_date() where a read was performed without using
atomic op. In practise it is not an issue because on most systems
such reads should be atomic already, but to prevent any confusion or
potential bug on exotic systems, let's use an explicit _HA_ATOMIC_LOAD
there.

This may be backported up to 2.8
2025-02-21 11:22:35 +01:00
Aurelien DARRAGON
9561b9fb69 BUG/MINOR: sink: add tempo between 2 connection attempts for sft servers
When the connection for sink_forward_{oc}_applet fails or a previous one
is destroyed, the sft->appctx is instantly released.

However process_sink_forward_task(), which may run at any time, iterates
over all known sfts and tries to create sessions for orphan ones.

It means that instantly after sft->appctx is destroyed, a new one will
be created, thus a new connection attempt will be made.

It can be an issue with tcp log-servers or sink servers, because if the
server is unavailable, process_sink_forward() will keep looping without
any temporisation until the applet survives (ie: connection succeeds),
which results in unexpected CPU usage on the threads responsible for
that task.

Instead, we add a tempo logic so that a delay of 1 second is applied
between two retries. Of course the initial attempt is not delayed.
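
A possible shape for such a tempo, assuming a millisecond tick and a
per-target "last attempt" date. The structure, its fields and
should_connect() are hypothetical names chosen for this sketch:

```c
#define RETRY_DELAY_MS 1000u

/* Hypothetical per-target state. */
struct fwd_target {
	int connected;                 /* non-zero while a session exists */
	int tried_once;                /* set after the first attempt */
	unsigned int last_attempt_ms;  /* date of the last attempt */
};

/* Returns 1 if a connection attempt may be made at <now_ms> (and records
 * it), 0 otherwise. The very first attempt is never delayed. The
 * unsigned subtraction keeps working when the tick wraps. */
int should_connect(struct fwd_target *t, unsigned int now_ms)
{
	if (t->connected)
		return 0;
	if (t->tried_once && (now_ms - t->last_attempt_ms) < RETRY_DELAY_MS)
		return 0;

	t->tried_once = 1;
	t->last_attempt_ms = now_ms;
	return 1;
}
```

A task that wakes up more often than once per second then skips the target
until the delay has elapsed, instead of reconnecting in a tight loop.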

This could be backported to all stable versions.
2025-02-21 11:22:35 +01:00
Aurelien DARRAGON
c9d4192726 BUG/MINOR: log: fix outgoing abns address family
While reviewing the code in an attempt to fix GH #2875, I stumbled
on another case similar to aac570c ("BUG/MEDIUM: uxst: fix outgoing
abns address family in connect()") that caused abns(z) addresses to
fail when used as log targets.

The underlying cause is the same as aac570c, which is the rework of the
unix socket families in order to support custom addresses for different
addressing schemes, where a real_family() was overlooked before passing
a haproxy-internal address struct to socket-oriented syscall.

To fix the issue, we first copy the target's addr, and then leverage
real_family() to set the proper low-level address family that is passed
to sendmsg() syscall.

It should be backported in 3.1
2025-02-21 11:22:28 +01:00
Aurelien DARRAGON
26d97ec148 REGTESTS: fix reg-tests/server/abnsz.vtc
It was proved in GH #2875 that the regtest was broken, at least for the
server-side abnsz, as the connect() was not performed using the proper
family, which results in kernel refusing to perform the call, while the
reg-test actually succeeds.

Indeed, in the test we used vtest client to connect to haproxy, which
then routed the request to another haproxy instance listening on an
abnsz socket, and this last haproxy was the one to answer the http
request.

As we only used "rxresp" in vtest client, the test succeeded with empty
responses, which was the case due to the server connection failing on the
first haproxy process.
2025-02-21 08:22:25 +01:00
Willy Tarreau
aac570cd03 BUG/MEDIUM: uxst: fix outgoing abns address family in connect()
Since we reworked the unix socket families in order to support custom
addresses for different addressing schemes, we've been using extra
values for the ss_family field in sockaddr_storage. These ones have
to be adjusted before calling bind() or connect(). It turns out that
after the abns/abnsz updates in 3.1, the connect() code was not adjusted
to take care of the change, resulting in AF_CUST_ABNS or AF_CUST_ABNSZ
to be placed in the address that was passed to connect().

The right approach is to locally copy the address, get its length,
fixup the family and use the fixed value and length for connect().
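
The copy-and-fixup step might look like the sketch below. The two custom
family values and both function names are assumptions made for the example,
and the length computation mentioned above is left out for brevity:

```c
#include <string.h>
#include <sys/socket.h>

/* Assumed internal-only family values, for illustration. */
#define MY_AF_CUST_ABNS   250
#define MY_AF_CUST_ABNSZ  251

/* Maps an internal family to the one the kernel understands. */
static int real_family_of(int family)
{
	switch (family) {
	case MY_AF_CUST_ABNS:
	case MY_AF_CUST_ABNSZ:
		return AF_UNIX;
	default:
		return family;
	}
}

/* Copies <src> into <dst> and rewrites the family in the copy only, so
 * that <dst> can be handed to a socket syscall while <src> keeps its
 * internal family. */
void make_syscall_addr(struct sockaddr_storage *dst,
                       const struct sockaddr_storage *src)
{
	memcpy(dst, src, sizeof(*dst));
	dst->ss_family = (sa_family_t)real_family_of(src->ss_family);
}
```

Working on a local copy matters because the original address is still needed
with its internal family elsewhere.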

This must be backported to 3.1. Many thanks for @Mewp for reporting
this issue in github issue #2875.
2025-02-21 07:59:08 +01:00
Valentine Krasnobaeva
390df282c1 BUG/MINOR: cfgparse: fix NULL ptr dereference in cfg_parse_peers
When "peers" keyword is followed by more than one argument and it's the first
"peers" section in the config, cfg_parse_peers() detects it and exits with
"ERR_ALERT|ERR_FATAL" err_code.

So, the upper layer parser, parse_cfg(), continues and parses the next keyword
"peer" and then tries to check the global cfg_peers, which should contain
"my_cluster". The global cfg_peers is still NULL, because after alerting a user
in alertif_too_many_args, cfg_parse_peers() exited.

	peers my_cluster __some_wrong_data__
	peer haproxy1 1.1.1.1 1000

In order to fix this, let's add ERR_ABORT if the "peers" keyword is followed by
more than one argument. This way parse_cfg() will stop immediately and
terminate haproxy with the "too many args for peers my_cluster..." alert message.

It's more reliable than adding "if (cfg_peers != NULL)" checks in the "peer"
subparser, as we may have many "peers" sections.

	peers my_another_cluster
	peer haproxy1 1.1.1.2 1000

	peers my_cluster  __some_wrong_data__
	peer haproxy1 1.1.1.1 1000

In addition, for the example above, parse_cfg() will parse all configuration
until the end and only then terminate haproxy with the alert
"too many args...". Peer haproxy1 will be wrongly associated with
my_another_cluster.

This fixes the issue #2872.
This should be backported to all stable versions.
2025-02-20 17:10:26 +01:00
Christopher Faulet
851e52b551 BUG/MEDIUM: spoe/mux-spop: Introduce an NOOP action to deal with empty ACK
In the SPOP protocol, ACK frames with an empty payload are allowed. However, in
that case, because only the payload is transferred, there is no data to
return to the SPOE applet. Only the end of input is reported. Thus the
applet is never woken up. It means that the SPOE filter will be blocked
during the processing timeout and will finally return an error.

To work around this issue, a NOOP action is introduced with the value 0. It
is only an internal action for now. It does not exist in the SPOP
protocol. When an ACK frame with an empty payload is received, this noop
action is transferred to the SPOE applet, instead of nothing. Thanks to this
trick, the applet is properly notified. This works because unknown actions
are ignored by the SPOE filter.

This patch must be backported to 3.1.
2025-02-20 11:56:27 +01:00
Christopher Faulet
efc46de294 BUG/MEDIUM: applet: Don't handle EOI/EOS/ERROR is applet is waiting for room
The commit 7214dcd52 ("BUG/MEDIUM: applet: Don't pretend to have more data
to handle EOI/EOS/ERROR") introduced a regression. Because of this patch, it
was possible to handle EOI/EOS/ERROR applet flags too early while the applet
was waiting for more room to transfer the last output data.

This bug can be encountered with any applet using its own buffers (cache and
stats for instance). And depending on the configuration and the timing, the
data may be truncated or the stream may be blocked, infinitely or not.
Streams blocked infinitely were observed with the cache applet and the HTTP
compression enabled.

For the record, it is important to detect EOI/EOS/ERROR applet flags to be
able to report the corresponding event on the SE and by transitivity on the
SC. Most of the time, this happens when some data should be transferred to the
stream. The .rcv_buf callback function is called and these flags are
properly handled. However, some applets may also report them spontaneously,
outside of any data transfer. In that case, the .rcv_buf callback is not
called.

This is the purpose of this patch (and the one above): being able to detect
pending EOI/EOS/ERROR applet flags. However, we must be sure to not handle
them too early at this place. When these flags are set, it means no more
data will be produced by the applet. So we must only wait to have
transferred everything to the stream. And this happens when the applet is no
longer waiting for more room.
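
The resulting condition can be sketched as below. The flag names and values
are invented for the example and do not match the project's definitions;
only the ordering rule (pending end/error flags are reported once the applet
no longer waits for room) comes from the description above:

```c
#include <stdbool.h>

/* Hypothetical flags for this sketch, not the project's definitions. */
#define EX_APPLET_EOI       0x01u /* end of input reached */
#define EX_APPLET_EOS       0x02u /* end of stream reached */
#define EX_APPLET_ERROR     0x04u /* error reported */
#define EX_APPLET_WANT_ROOM 0x08u /* output pending, waiting for more room */

/* Returns true when pending EOI/EOS/ERROR flags may be propagated now. */
static bool ex_may_report_end(unsigned int flags)
{
	if (!(flags & (EX_APPLET_EOI | EX_APPLET_EOS | EX_APPLET_ERROR)))
		return false;                    /* nothing to report */
	return !(flags & EX_APPLET_WANT_ROOM);   /* wait for the last data first */
}
```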

This patch must be backported to 3.1 with the one above.
2025-02-20 10:00:32 +01:00
Willy Tarreau
4ef6be4a1f [RELEASE] Released version 3.2-dev6
Released version 3.2-dev6 with the following main changes :
    - BUG/MEDIUM: debug: close a possible race between thread dump and panic()
    - DEBUG: thread: report the spin lock counters as seek locks
    - DEBUG: thread: make lock time computation more consistent
    - DEBUG: thread: report the wait time buckets for lock classes
    - DEBUG: thread: don't keep the redundant _locked counter
    - DEBUG: thread: make lock_stat per operation instead of for all operations
    - DEBUG: thread: reduce the struct lock_stat to store only 30 buckets
    - MINOR: lbprm: add a new callback ->server_requeue to the lbprm
    - MEDIUM: server: allocate a tasklet for asyncronous requeuing
    - MAJOR: leastconn: postpone the server's repositioning under contention
    - BUG/MINOR: quic: reserve length field for long header encoding
    - BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding
    - MINOR: quic: simplify length calculation for STREAM/CRYPTO frames
    - BUG/MINOR: mworker: section ignored in discovery after a post_section_parser
    - BUG/MINOR: mworker: post_section_parser for the last section in discovery
    - CLEANUP: mworker: "program" section does not have a post_section_parser anymore
    - MEDIUM: initcall: allow to register mutiple post_section_parser per section
    - CI: cirrus-ci: bump FreeBSD image to 14-2
    - DOC: initcall: name correctly REGISTER_CONFIG_POST_SECTION()
    - REGTESTS: stop using truncated.vtc on freebsd
    - MINOR: quic: refactor STREAM encoding and splitting
    - MINOR: quic: refactor CRYPTO encoding and splitting
    - BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED
    - BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals
    - BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals
    - MINOR: ssl/cli: display more filenames in 'show ssl cert'
    - DOC: watchdog: document the sequence of the watchdog and panic
    - MINOR: ssl: store the filenames resulting from a lookup in ckch_conf
    - MINOR: startup: allow hap_register_feature() to enable a feature in the list
    - MINOR: quic: support frame type as a varint
    - BUG/MINOR: startup: leave at first post_section_parser which fails
    - BUG/MINOR: startup: hap_register_feature() fix for partial feature name
    - BUG/MEDIUM: cli: Be sure to drop all input data in END state
    - BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker
    - BUG/MEDIUM: filters: Handle filters registered on data with no payload callback
    - BUG/MINOR: fcgi: Don't set the status to 302 if it is already set
    - MINOR: ssl/crtlist: split the ckch_conf loading from the crtlist line parsing
    - MINOR: ssl/crtlist: handle crt_path == cc->crt in crtlist_load_crt()
    - MINOR: ssl/ckch: return from ckch_conf_clean() when conf is NULL
    - MEDIUM: ssl/crtlist: "crt" keyword in frontend
    - DOC: configuration: document the "crt" frontend keyword
    - DEV: h2: add a Lua-based HTTP/2 connection tracer
    - BUG/MINOR: quic: prevent crash on conn access after MUX init failure
    - BUG/MINOR: mux-quic: prevent crash after MUX init failure
    - DEV: h2: fix flags for the continuation frame
    - REGTESTS: Fix truncated.vtc to send 0-CRLF
    - BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut
    - Revert "REGTESTS: stop using truncated.vtc on freebsd"
    - MINOR: mux-quic: define a QCC application state member
    - MINOR: mux-quic/h3: emit SETTINGS via MUX tasklet handler
    - MINOR: mux-quic/h3: support temporary blocking on control stream sending
2025-02-19 18:39:51 +01:00
Amaury Denoyelle
a7645d7cd5 MINOR: mux-quic/h3: support temporary blocking on control stream sending
When HTTP/3 layer is initialized via QUIC MUX, it first emits a SETTINGS
frame on an unidirectional control stream. However, this could be
prevented if client did not provide initial flow control.

Previously, QUIC MUX was unable to deal with such situation. Thus, the
connection was closed immediately and no transfer could occur. Improve
this by extending QUIC MUX application layer API : initialization may
now return a transient error. This allows MUX to continue to use the
connection normally. Initialization will then be retried periodically
until it can succeed.
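
A rough sketch of such a retryable initialization, under stated assumptions:
the types, names and the flow-control accounting are all made up for the
example, and the real MUX API is certainly richer. The state names only
mirror the NULL/INIT/SHUT states mentioned in the neighbouring mux-quic
commits:

```c
#include <stddef.h>

enum ex_app_state { EX_APP_NULL, EX_APP_INIT, EX_APP_SHUT };
enum ex_init_ret  { EX_INIT_OK, EX_INIT_RETRY, EX_INIT_FATAL };

struct ex_mux {
	enum ex_app_state app_st;
	size_t fc_credit;     /* flow control currently granted by the peer */
	size_t settings_len;  /* size of the first frame to emit */
};

/* Application layer init: it needs enough credit to emit its first frame. */
static enum ex_init_ret ex_app_init(struct ex_mux *m)
{
	if (m->fc_credit < m->settings_len)
		return EX_INIT_RETRY;   /* transient error: try again later */
	m->fc_credit -= m->settings_len;
	return EX_INIT_OK;
}

/* Send path. Returns 0 if the connection stays usable, -1 if it must be
 * closed. Initialization is attempted again on each call until it works. */
static int ex_io_send(struct ex_mux *m)
{
	if (m->app_st == EX_APP_NULL) {
		switch (ex_app_init(m)) {
		case EX_INIT_OK:
			m->app_st = EX_APP_INIT;
			break;
		case EX_INIT_RETRY:
			break;          /* not fatal, retried on next call */
		case EX_INIT_FATAL:
			return -1;
		}
	}
	/* ... regular emission would follow here ... */
	return 0;
}
```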

This new API allows to deal with the flow control issue described above.
Note that this patch is not considered as a bug fix. Indeed, clients are
strongly advised to provide enough flow control for a SETTINGS frame
exchange.
2025-02-19 11:08:02 +01:00
Amaury Denoyelle
06e7674399 MINOR: mux-quic/h3: emit SETTINGS via MUX tasklet handler
Previously, QUIC MUX application layer was installed and initialized via
MUX init. However, the latter stage involves I/O operations, for example
when using HTTP/3 with the emission of a SETTINGS frame.

Change this to prevent any I/O operations during MUX init. As such,
finalize app_ops callback is now called during the first invocation of
qcc_io_send(), in the context of MUX tasklet. To implement this, a new
application state value is added, to detect the transition from NULL to
INIT stage.
2025-02-19 11:03:40 +01:00
Amaury Denoyelle
188fc45b95 MINOR: mux-quic: define a QCC application state member
Introduce a new QCC field to track the current application layer state.
For the moment, only the INIT and SHUT states are defined. This allows
replacing the older flag QC_CF_APP_SHUT.

This commit does not bring major changes. It is only necessary to permit
future evolutions on QUIC MUX. The only noticeable change is that QMUX
traces can now display this new field.
2025-02-19 10:59:53 +01:00
Christopher Faulet
4a99f15f0c Revert "REGTESTS: stop using truncated.vtc on freebsd"
This reverts commit 0b9a75e8781593c250f6366a64a019018ade688e.

Thanks to the previous fixes ("REGTESTS: Fix truncated.vtc to send 0-CRLF" and
"BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut"),
this script can be reenabled for FreeBSD.
2025-02-18 17:35:00 +01:00
Christopher Faulet
b70921f2c1 BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut
On shut, truncated HTX messages were not properly handled by the H2
multiplexer. Depending on how data were emitted, a chunked HTX message
without the 0-CRLF could be considered as full and an empty data with ES
flag set could be emitted instead of a RST_STREAM(CANCEL) frame.

In the H2 multiplexer, when a shut is performed, an HTX message is
considered as truncated if more HTX data are still expected. It is based on
the presence or not of the H2_SF_MORE_HTX_DATA flag on the H2 stream.
However, this flag is set or unset depending on the HTX extra field
value. This field is used to state how much data must still be
transferred, based on the announced data length. For a message with a
content-length, this assumption is valid. But for a chunked message, it is
not true. Only the length of the current chunk is announced. So we cannot
rely on this field in that case to know if a message is full or not.

Instead, we must rely on the HTX start-line flags to know if more HTX data
are expected or not. If the xfer length is known (the HTX_SL_F_XFER_LEN flag
is set on the HTX start-line), it means that more data are always expected,
until the end of message is reached (the HTX_FL_EOM flag is set on the HTX
message). This is true for bodyless message because the end of message is
reported with the end of headers. This is also true for tunneled messages
because the end of message is received before switching the H2 stream in
tunnel mode.
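
The decision described above can be sketched like this. The two EX_* flags
stand in for HTX_SL_F_XFER_LEN and HTX_FL_EOM with arbitrary values, and the
function name is invented. The behaviour when the transfer length is unknown
(the message is simply considered finished on shut) is an assumption of the
sketch:

```c
#include <stdbool.h>

/* Stand-ins with arbitrary values for HTX_SL_F_XFER_LEN and HTX_FL_EOM. */
#define EX_SL_F_XFER_LEN 0x1u
#define EX_FL_EOM        0x2u

enum ex_shut_action {
	EX_SEND_EMPTY_DATA_ES,  /* message is complete: empty DATA with ES */
	EX_SEND_RST_CANCEL,     /* message is truncated: RST_STREAM(CANCEL) */
};

/* Chooses what to emit on shut from the start-line and message flags. */
static enum ex_shut_action ex_h2_on_shut(unsigned int sl_flags,
                                         unsigned int msg_flags)
{
	/* With a known transfer length, more data are expected until EOM. */
	bool truncated = (sl_flags & EX_SL_F_XFER_LEN) &&
	                 !(msg_flags & EX_FL_EOM);

	return truncated ? EX_SEND_RST_CANCEL : EX_SEND_EMPTY_DATA_ES;
}
```

A chunked message that was shut before its 0-CRLF has the transfer-length
flag but no end-of-message flag, so it ends up in the RST_STREAM(CANCEL)
case regardless of the size of the current chunk.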

This patch must be backported as far as 2.8.
2025-02-18 17:34:59 +01:00
Christopher Faulet
b93e419750 REGTESTS: Fix truncated.vtc to send 0-CRLF
When a chunked message is sent, the 0-CRLF must be explicitly sent. It has
been missing since the beginning. Just add it.
2025-02-18 17:34:59 +01:00
Willy Tarreau
af5c07eee9 DEV: h2: fix flags for the continuation frame
It's flag 2 (end of headers) that's defined there, not 3 (padded).
2025-02-18 14:17:17 +01:00
Amaury Denoyelle
2715dbe9d0 BUG/MINOR: mux-quic: prevent crash after MUX init failure
qmux_init() may fail for several reasons. In this case, connection
resources are freed and a CONNECTION_CLOSE will be emitted via the
underlying quic_conn instance.

In case of qmux_init() failure, qcc_release() is used to clean up
resources, but QCC <conn> member is first reset to NULL, as
connection release must be delayed. Some cleanup operations are thus
skipped, one of them being the resetting of the <ctx> connection member to
NULL. This may cause a crash as <ctx> is a dangling pointer after QCC
release. One possible reproducer is to activate QMUX traces,
which will cause a segfault on the qmux_init() error leave trace.

To fix this, simply reset <ctx> to NULL manually on qmux_init() failure.

This must be backported up to 3.0.
2025-02-18 11:02:46 +01:00
Amaury Denoyelle
2cdc4695cb BUG/MINOR: quic: prevent crash on conn access after MUX init failure
Initially, QUIC-MUX was responsible for resetting quic_conn <conn> member to
NULL when MUX was released. This was performed via qcc_release().

However, qcc_release() is also used on qmux_init() failure. In this
case, connection must be freed via its session, so QCC <conn> member is
reset to NULL prior to qcc_release(), which prevents quic_conn <conn>
member from also being reset. As the connection is freed soon after,
quic_conn <conn> is a dangling pointer, which may cause crashes.

This bug should be very rare as first it implies that QUIC-MUX
initialization has failed (for example due to a memory alloc error).
Also, <conn> member is rarely used by quic_conn instance. In fact, the
only reproducible crash was done with QUIC traces activated, as in this
case connection is accessed via quic_conn under __trace_enabled()
function.

To fix this, detach connection from quic_conn via the XPRT layer instead
of the MUX. More precisely, this is performed via quic_close(). This
should ensure that it will always be conducted, either on normal
connection closure, but also after special conditions such as MUX init
failure.

This should be backported up to 2.6.
2025-02-18 10:43:56 +01:00
Willy Tarreau
607aa57b2e DEV: h2: add a Lua-based HTTP/2 connection tracer
The following config is sufficient to trace H2 exchanges between a client
and a server:

   global
       lua-load "dev/h2/h2-tracer.lua"

   listen h2_sniffer
       mode tcp
       bind :8002
       filter lua.h2-tracer #hex
       server s1 127.0.0.1:8003

The commented "hex" argument will also display full frames in hex (not
recommended). The connections are prefixed with a 3-hex digit number in
order to also support a bit of multiplexing without impacting the reading
too much. The screen is split in two, with the request on the left and
the response on the right. Here's an example of what it does between an
haproxy backend and an haproxy frontend both in H2, when submitted a
curl request for /?s=30k handled by httpterm:

  [001] ### req start
  [001] [PREFACE len=24]
  [001] [SETTINGS sid=0 len=24 (bytes=24)]
  [001]                                          | ### res start
  [001]                                          | [SETTINGS sid=0 len=18 (bytes=27)]
  [001]                                          | [SETTINGS ACK sid=0 len=0 (bytes=0)]
  [001] [SETTINGS ACK sid=0 len=0 (bytes=56)]
  [001] [HEADERS EH+ES sid=1 len=47 (bytes=47)]
  [001]                                          | [HEADERS EH sid=1 len=101 (bytes=15351)]
  [001]                                          | [DATA sid=1 len=15126 (bytes=15241)]
  [001]                                          | [DATA sid=1 len=1258 (bytes=106)]
  [001]                                          |                  ... -106 = 1152
  [001]                                          |                    ... -1152 = 0
  [001] [WINDOW_UPDATE sid=1 len=4 (bytes=43)]
  [001] [WINDOW_UPDATE sid=0 len=4 (bytes=30)]
  [001] [WINDOW_UPDATE sid=1 len=4 (bytes=17)]
  [001] [WINDOW_UPDATE sid=0 len=4 (bytes=4)]
  [001]                                          | [DATA ES sid=1 len=14336 (bytes=14336)]
  [001] [WINDOW_UPDATE sid=0 len=4 (bytes=4)]
  [001] ### req end: 31080 bytes total
  [001]                                          | [GOAWAY sid=0 len=8 (bytes=8)]
  [001]                                          | ### res end: 31097 bytes total

It deserves some improvements. For instance at the moment it does not
verify the preface, any 24 bytes will work. It does not perform any
protocol validation either. Detecting some issues such as out-of-sequence
frames could be helpful. But it already helps as-is.
2025-02-18 09:26:15 +01:00
William Lallemand
764f6910ed DOC: configuration: document the "crt" frontend keyword
Document the "crt" keyword of frontend and listen section.
2025-02-17 18:26:37 +01:00
William Lallemand
cd6a02ace9 MEDIUM: ssl/crtlist: "crt" keyword in frontend
This patch implements the "crt" keyword in frontend, declaring an
implicit crt-list named after the frontend.

The patch is split in two steps:

The first step is the crt keyword parser, which parses crt lines and
fills a "cfg_crt_node" struct containing a ssl_bind_conf and a
ckch_conf which are put in a list to be used later.

After parsing the frontend section, as a 2nd step, a
post_section_parser is called. It creates a crt-list named after
the frontend and fills it with certificates from the list of
cfg_crt_node. Once created, this crt-list is loaded in every "ssl"
bind line that didn't declare any crt or crt-list.

Example:

    listen https
       bind :443 ssl
       crt foobar.pem
       crt test1.net.crt key test1.net.key

Implements part of #2854
2025-02-17 18:26:37 +01:00
William Lallemand
82f927817e MINOR: ssl/ckch: return from ckch_conf_clean() when conf is NULL
ckch_conf_clean() mustn't be executed when the argument is NULL, this
will keep the API more consistent, like any free() function.
2025-02-17 18:26:37 +01:00
William Lallemand
0330011acf MINOR: ssl/crtlist: handle crt_path == cc->crt in crtlist_load_crt()
Handle the case where crt_path == cc->crt, so the pointer doesn't get
free'd before getting strdup'ed in crtlist_load_crt().
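
The aliasing problem can be illustrated with a small sketch. The struct and
function below are hypothetical and much simpler than crtlist_load_crt();
the point is only that the new string is duplicated before the old one is
released, so passing the current pointer itself stays safe:

```c
#include <stdlib.h>
#include <string.h>

struct ex_crt_conf {
	char *crt;   /* owned string, may be NULL */
};

/* Replaces cc->crt with a copy of <path>. Works even if path == cc->crt
 * because the copy is made before the old string is freed.
 * Returns 0 on success, -1 on allocation failure (cc->crt is then kept). */
static int ex_set_crt(struct ex_crt_conf *cc, const char *path)
{
	size_t len = strlen(path) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, path, len);  /* duplicate first ... */
	free(cc->crt);            /* ... and only then release the old one */
	cc->crt = copy;
	return 0;
}
```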
2025-02-17 18:26:37 +01:00
William Lallemand
69163cd63e MINOR: ssl/crtlist: split the ckch_conf loading from the crtlist line parsing
ckch_conf loading is not that simple as it requires checking:
- if the cert already exists in the ckchs_tree
- if the ckch_conf is compatible with an existing cert in ckchs_tree
- if the cert is a bundle which needs to load multiple ckch_store

This logic could be reused elsewhere, so this commit introduces the new
crtlist_load_crt() function which does that.
2025-02-17 18:26:37 +01:00
Christopher Faulet
ca79ed5eef BUG/MINOR: fcgi: Don't set the status to 302 if it is already set
When a "Location" header was found in a FCGI response, the status code was
forced to 302. But it should only be performed if no status code was set
first.

So now, we take care to not override an already defined status code when the
"Location" header is found.
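
In other words, the rule becomes something like the sketch below. The
function name is invented, a status of 0 is used here to mean "not set by
the application", and the fallback to 200 when nothing is set is an
assumption borrowed from usual CGI behaviour rather than from this commit:

```c
/* Returns the status to use for a response, given the status set by the
 * application (0 if none) and whether a "Location" header was found. */
static int ex_fcgi_status(int status, int has_location)
{
	if (status)            /* already set by the application: keep it */
		return status;
	return has_location ? 302 : 200;
}
```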

This patch should fix the issue #2865. It must be backported to all stable
versions.
2025-02-17 16:37:53 +01:00
Christopher Faulet
34542d5ec2 BUG/MEDIUM: filters: Handle filters registered on data with no payload callback
An HTTP filter with no http_payload callback function may be registered on
data. In that case, this filter is obviously not called when some data are
received but it remains important to update its internal state to be sure to
keep it synchronized on the stream, especially its offset value. Otherwise,
the wrong calculation on the global offset may be performed in
flt_http_end(), leading to an integer overflow when data are moved from
input to output. This overflow triggers a BUG_ON() in c_adv().

The same is true for TCP filters with no tcp_payload callback function.

This patch must be backported to all stable versions.
2025-02-17 16:16:29 +01:00
Christopher Faulet
49b7bcf583 BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker
On reload, the new worker requests bound FDs from the old one. The old worker
sends them in messages of at most 252 FDs. Each message is acknowledged by
the new worker. All messages sent or received by the old worker are handled
manually via sendmsg/recv syscalls. So the old worker must be sure to consume
all the ACK replies. However, the last one was never consumed. So it was
considered as a command by the CLI applet. This issue was hidden until
recently. But it was the root cause of the issue #2862.

Note this last ack is also the first one when there are fewer than 252 FDs to
transfer.

This patch must be backported to all stable versions.
2025-02-17 15:31:07 +01:00
Christopher Faulet
972ce87676 BUG/MEDIUM: cli: Be sure to drop all input data in END state
Commit 7214dcd ("BUG/MEDIUM: applet: Don't pretend to have more data to
handle EOI/EOS/ERROR") revealed a bug with the CLI applet. Pending input
data when the applet is in CLI_ST_END state were never consumed or dropped,
leading to a wakeup loop.

The CLI applet implements its own snd_buf callback function. It is important
that it consumes all pending input data. Otherwise, the applet is woken up in
a loop until it empties the request buffer. Another way to fix the issue would
be to report an error. But in that case, it seems reasonable to drop these
data.

The issue can be observed on reload, in master/worker mode, because of the
issue with the last ACK message, which was never consumed by the _getsocks()
command.

This patch should fix the issue #2862. It must be backported to 3.1 with the
commit above.
2025-02-17 15:31:07 +01:00
William Lallemand
ab2fa95bdd BUG/MINOR: startup: hap_register_feature() fix for partial feature name
In patch 2fe4cbd8e ("MINOR: startup: allow hap_register_feature() to
enable a feature in the list"), the ability to overwrite a '-' in the
feature list was added. However the code was not tokenizing the string
correctly, and a partial feature name found in the string could result in
having the same feature name multiple times.

This patch rewrites the lookup of the string by tokenizing it correctly.
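
A sketch of a token-based lookup, assuming a space-separated list where each
entry is prefixed with '+' or '-' (as in "+EPOLL -OPENSSL"). The function
name and the exact list format are assumptions made for the example, not the
actual hap_register_feature() code:

```c
#include <stddef.h>
#include <string.h>

/* Looks for the exact token "-<name>" in the space-separated <list> and
 * turns it into "+<name>". Returns 1 if a token was enabled, 0 otherwise.
 * Comparing whole tokens avoids matching "-FOO" inside "-FOO_BAR". */
static int ex_enable_feature(char *list, const char *name)
{
	size_t nlen = strlen(name);
	char *p = list;

	while (*p) {
		size_t tlen;

		while (*p == ' ')          /* skip separators */
			p++;
		tlen = strcspn(p, " ");    /* length of the current token */
		if (tlen == nlen + 1 && *p == '-' &&
		    memcmp(p + 1, name, nlen) == 0) {
			*p = '+';
			return 1;
		}
		p += tlen;
	}
	return 0;
}
```

With a plain substring search, enabling "OPENSSL" in
"+EPOLL -OPENSSL_AWSLC -OPENSSL" would hit the longer name first, which is
the kind of partial match the commit describes.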
2025-02-17 14:56:09 +01:00
William Lallemand
7268e9c249 BUG/MINOR: startup: leave at first post_section_parser which fails
Since we are now iterating on post_section_parser() for the same keyword,
we need to exit at the first ERR_ABORT.

The post_section_parser() is called when parsing a new section, but also
at the end of the file to be called for the last section.

The changes in 4de86bb ("MEDIUM: initcall: allow to register mutiple
post_section_parser per section") should have added tests on the
ERR_ABORT value.

Also pcs->post_section_parser() must be called instead of
cs->post_section_parser() because we could have a NULL ptr.
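
The iteration can be pictured with the following sketch. The structure,
the flag value and the helper names are hypothetical; it only shows the two
points made above: entries without a callback are skipped instead of being
dereferenced, and the loop leaves at the first abort:

```c
#include <stddef.h>
#include <string.h>

#define EX_ERR_ABORT 0x04  /* illustrative value only */

struct ex_cfg_section {
	const char *name;
	int (*post_section_parser)(void);  /* may be NULL */
};

static int ex_calls;  /* counts callback invocations in this example */
static int ex_post_ok(void)    { ex_calls++; return 0; }
static int ex_post_abort(void) { ex_calls++; return EX_ERR_ABORT; }

/* Runs every post parser registered for section <name>, in table order,
 * and stops as soon as one of them reports an abort. */
static int ex_run_post_parsers(const struct ex_cfg_section *tab, size_t nb,
                               const char *name)
{
	int err = 0;
	size_t i;

	for (i = 0; i < nb; i++) {
		if (!tab[i].post_section_parser || strcmp(tab[i].name, name) != 0)
			continue;
		err |= tab[i].post_section_parser();
		if (err & EX_ERR_ABORT)
			break;
	}
	return err;
}
```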

This bug does not affect anything since we don't use
REGISTER_CONFIG_POST_SECTION() yet.
2025-02-17 11:21:20 +01:00
Amaury Denoyelle
32691e7c25 MINOR: quic: support frame type as a varint
QUIC frame type is encoded as a variable-length integer. Thus, a 64-bit
integer should be used for it. Until now, this was not the case as the
type was represented as a 1-byte char inside quic_frame structure. This
does not cause any issue with QUIC from RFC9000, as all frame types fit
in this range. Furthermore, a QUIC implementation is required to use the
smallest size varint when encoding a frame type.

However, the current code is unable to accept QUIC extensions with bigger
frame types. This is notably the case for quic-on-streams draft. Thus,
this commit readjusts quic_frame architecture to be able to support
higher frame type values.

First, the type field of quic_frame is changed to a 64-bit variable. Both
encoding and decoding frame functions use variable-length integer
helpers to manipulate the frame type field.
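
For reference, the variable-length integer format from RFC 9000 can be
sketched as below: the two most significant bits of the first byte give the
encoded length (1, 2, 4 or 8 bytes) and the remaining bits carry the value
in network byte order. The ex_varint_* helpers are written for this example
and are not the project's own functions:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode <v>, or 0 if it exceeds 2^62 - 1. */
static size_t ex_varint_len(uint64_t v)
{
	if (v < (1ULL << 6))  return 1;
	if (v < (1ULL << 14)) return 2;
	if (v < (1ULL << 30)) return 4;
	if (v < (1ULL << 62)) return 8;
	return 0;
}

/* Encodes <v> into <buf> using the smallest size. Returns the number of
 * bytes written, or 0 if <v> is too large or <room> is too small. */
static size_t ex_varint_enc(uint8_t *buf, size_t room, uint64_t v)
{
	size_t len = ex_varint_len(v);
	size_t i;

	if (!len || len > room)
		return 0;
	for (i = 0; i < len; i++)       /* big-endian value */
		buf[len - 1 - i] = (uint8_t)(v >> (8 * i));
	/* length prefix: 00, 01, 10 or 11 in the two high bits */
	buf[0] |= (uint8_t)((len == 1 ? 0 : len == 2 ? 1 : len == 4 ? 2 : 3) << 6);
	return len;
}

/* Decodes a value from <buf>. Returns the number of bytes consumed, or 0
 * if fewer than the announced number of bytes are available. */
static size_t ex_varint_dec(const uint8_t *buf, size_t avail, uint64_t *v)
{
	size_t len, i;

	if (!avail)
		return 0;
	len = (size_t)1 << (buf[0] >> 6);
	if (len > avail)
		return 0;
	*v = buf[0] & 0x3f;
	for (i = 1; i < len; i++)
		*v = (*v << 8) | buf[i];
	return len;
}
```

All RFC 9000 frame types are below 64 and thus fit in a single byte; a frame
type of 64 or more needs at least two bytes, which a 1-byte type field cannot
represent.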

Secondly, the quic_frame builders/parsers infrastructure is still
preserved. However, it could be impossible to define a new large frame
type as an index into quic_frame_builders / quic_frame_parsers arrays.
Thus, wrapper functions are now provided to access the builders and
parsers. Both qf_builder() and qf_parser() wrappers can then be extended
to return custom builder/parser instances for larger frame types.

Finally, unknown frame type detection also uses the new wrapper
quic_frame_is_known(). As with builders/parsers, for large frame types,
this function must be manually completed to support a new type value.
2025-02-14 09:00:05 +01:00
William Lallemand
2fe4cbd8e5 MINOR: startup: allow hap_register_feature() to enable a feature in the list
This patch allows hap_register_feature() to enable a feature in the list
which was already registered and marked disabled.

This way some features can be enabled automatically under certain
conditions without needing the USE argument with make, and their
activation is correctly reported.
2025-02-14 00:09:17 +01:00
William Lallemand
7034f2ca48 MINOR: ssl: store the filenames resulting from a lookup in ckch_conf
With this patch, files resulting from a lookup (*.key, *.ocsp,
*.issuer etc) are now stored in the ckch_conf.

This makes it possible to see the original filename each one was loaded
from in "show ssl cert <filename>".
2025-02-13 17:44:00 +01:00
Willy Tarreau
a4d65c9cc8 DOC: watchdog: document the sequence of the watchdog and panic
Each time we go into the watchdog and panic code, it's super hard to
figure who calls what since signals are involved to bounce between
threads. Let's document the main principles and sequences to ease the
journey next time.
2025-02-13 16:45:07 +01:00
William Lallemand
0c0b38d64c MINOR: ssl/cli: display more filenames in 'show ssl cert'
"show ssl cert <file>" only displays a unique filename, which is the
key used in the ckch_store tree. This patch extends it by displaying
every filename from the ckch_conf that can be configured with the
crt-store.

In order to be more consistent, some changes are needed in the future:
- we need to store the complete path in the ckch_conf (meaning with
  crt-path or key-path)
- we need to fill a ckch_conf in cases the files are autodiscovered
2025-02-13 16:18:06 +01:00
William Lallemand
5a7cbb8d81 BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals
1d3c8223 ("MINOR: ssl: allow to change the server signature algorithm")
implemented the sigals keyword in the crt-list but never the dump of the
keyword over the CLI.

Must be backported as far as 2.8.
2025-02-12 17:16:50 +01:00
William Lallemand
037d2e5498 BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals
b6ae2aafde43 ("MINOR: ssl: allow to change the signature algorithm for
client authentication") implemented the client-sigals keyword in the
crt-list but never the dump of the keyword over the CLI.

Must be backported as far as 2.8.
2025-02-12 17:16:50 +01:00
Willy Tarreau
561319bd1c BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED
The crappy epoll API struck again with reloads and transferred FDs.
Indeed, when listening sockets are retrieved by a new worker from a
previous one, and the old one finally stops listening on them, it
closes the FDs. But in this case, since the sockets themselves were
not closed, epoll will not unregister them and will continue to
report new activity for these in the old process, which can only
observe, count an fd_poll_drop event and not unregister them since
they're not reachable anymore.

The unfortunate effect is that long-lasting old processes are woken
up at the same rate as the new process when accepting new connections,
and can waste a lot of CPU. Accept rates divided by 8 were observed on
a small test involving a slow transfer on 10 connections facing a reload
every second so that 10 processes were busy dealing with them while
another process was hammering the service with new connections.

Fortunately, years ago we implemented a flag FD_CLONED exactly for
similar purposes. Let's simply mark transferred FDs with FD_CLONED
so that the process knows that these ones require special treatment
and have to be manually unregistered before being closed. This does
the job fine, now old processes correctly unregister the FD before
closing it and no longer receive accept events for the new process.
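
The underlying epoll behaviour can be reproduced in isolation with the
Linux-only sketch below. It uses a pipe and dup() instead of a listening
socket passed to another process, which is an assumption made to keep the
example self-contained: the duplicate keeps the underlying file open just
like the new worker's copy would. The function name is invented:

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read side of a pipe in epoll, closes it while a duplicate
 * keeps the underlying file open, then makes the pipe readable.
 * Returns the number of events still reported by epoll_wait() (after at
 * most 100ms), or -1 on setup error. If <unregister_first> is non-zero,
 * the FD is removed with EPOLL_CTL_DEL before being closed. */
static int ex_events_after_close(int unregister_first)
{
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out;
	int p[2] = { -1, -1 };
	int keep = -1, epfd = -1, ret = -1;

	if (pipe(p) < 0)
		return -1;
	epfd = epoll_create1(0);
	keep = dup(p[0]);   /* stands for the copy held by another process */
	if (epfd < 0 || keep < 0)
		goto end;

	ev.data.fd = p[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
		goto end;

	if (unregister_first && epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], &ev) < 0)
		goto end;
	close(p[0]);        /* the file itself stays open through <keep> */
	p[0] = -1;

	if (write(p[1], "x", 1) != 1)
		goto end;
	ret = epoll_wait(epfd, &out, 1, 100);
end:
	if (p[0] >= 0)
		close(p[0]);
	close(p[1]);
	if (keep >= 0)
		close(keep);
	if (epfd >= 0)
		close(epfd);
	return ret;
}
```

Without the explicit removal, the closed FD keeps generating events that the
process can neither serve nor unregister anymore; with it, nothing is
reported.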

This needs to be backported to all stable versions. It only affects
epoll, as usual, and this time in combination with transferred FDs
(typically reloads in master-worker mode). Thanks to Damien Claisse
for providing all detailed measurements and statistics allowing to
understand and reproduce the problem.
2025-02-12 16:35:01 +01:00
Amaury Denoyelle
e2744d23be MINOR: quic: refactor CRYPTO encoding and splitting
This patch is the direct follow-up of the previous one which refactored
STREAM frame encoding. Reuse the newly defined quic_strm_frm_fillbuf()
and quic_strm_frm_split() functions for CRYPTO frame encoding.

The code for CRYPTO and STREAM frames encoding should now be clearer as
it is mostly identical.
2025-02-12 15:10:54 +01:00
Amaury Denoyelle
f96af8e463 MINOR: quic: refactor STREAM encoding and splitting
CRYPTO and STREAM frames encoding is similar. If payload is too large,
frame will be split and only the first payload part will be written
in the output QUIC packet. This process is complicated by the presence
of a variable-length integer Length field prior to the payload.

This commit aims at refactoring these operations. Define two functions to
simplify the code :
* quic_strm_frm_fillbuf() which is used to calculate the optimal frame
  length of a STREAM/CRYPTO frame with its payload in a buffer
* quic_strm_frm_split() which is used to split the frame payload if
  buffer is too small

With this patch, both functions are now implemented for STREAM encoding.
2025-02-12 15:10:03 +01:00
William Lallemand
0b9a75e878 REGTESTS: stop using truncated.vtc on freebsd
We never succeeded in making the truncated.vtc reg-test work consistently on
the Cirrus FreeBSD CI.

Let's exclude it from the FreeBSD tests so the CI doesn't break randomly.
2025-02-12 13:34:40 +01:00
William Lallemand
0b47e5fa20 DOC: initcall: name correctly REGISTER_CONFIG_POST_SECTION()
REGISTER_CONFIG_POST_SECTION() was not named correctly.
2025-02-12 13:27:44 +01:00
William Lallemand
6097938209 CI: cirrus-ci: bump FreeBSD image to 14-2
FreeBSD CI seems to have been broken for a while, try to upgrade the image to
the latest 14.2 version.
2025-02-12 13:18:55 +01:00
William Lallemand
4de86bbbfc MEDIUM: initcall: allow to register mutiple post_section_parser per section
Before this patch, REGISTER_CONFIG_SECTION() allowed to register one and only
one callback (<post>) called after the parsing of a section.

It was limiting because you couldn't register a post callback from anywhere
else in the code.

This patch introduces the new REGISTER_CONFIG_SECTION_POST() macro which allows
registering a new post callback for a section keyword from anywhere.

This patch introduces the feature by allowing `struct cfg_section` entries that
do not have a `section_parser`, and then iterating on all cfg_section with a
post_section_parser for a keyword.
2025-02-12 12:52:41 +01:00
William Lallemand
5c2039b5b8 CLEANUP: mworker: "program" section does not have a post_section_parser anymore
The "program" section does not have a post_section_parser anymore so no
need to make an exception for it.
2025-02-12 12:37:01 +01:00
William Lallemand
313eeae7db BUG/MINOR: mworker: post_section_parser for the last section in discovery
Previous patch 2c270a05f ("BUG/MINOR: mworker: section ignored in
discovery after a post_section_parser") needs an adjustment for the last
section of the file.

Indeed the post_section_parser of the last section must not be called in
discovery mode.

Must be backported to 3.1.
2025-02-12 12:34:57 +01:00
William Lallemand
2c270a05f0 BUG/MINOR: mworker: section ignored in discovery after a post_section_parser
When a new section is discovered, the post_section_parser of the
previous section is called. However in the new master-worker mode the
discovery mode will skip the post_section_parser. But instead of
trying to parse the current section keyword after that, it would skip
the current line completely.

This is a minor bug since there aren't many sections with a
post_section_parser, and not many sections to parse in discovery
mode.

But this could be reproduced like this:

	global
	        expose-deprecated-directives

	resolvers res
		parse-resolv-conf

	program foo
	        command sleep 10

	program bar
	       command sleep 10

The 'resolvers' section has a post_section_parser which will be ignored
in discovery mode with the consequence of ignoring the first program
section.

This must be backported to 3.1.
2025-02-12 12:18:17 +01:00
Amaury Denoyelle
731340afbd MINOR: quic: simplify length calculation for STREAM/CRYPTO frames
STREAM and CRYPTO frames have a similar encoding format. In particular,
both of them have a variable-length integer Length field just before the
frame payload.

It is complex to determine the optimal Length value before copying the
payload data in the remaining buffer space. As such, helper functions
were implemented to calculate this. However, the CRYPTO and STREAM frame
encoding implementations were not completely aligned, which makes the
code harder to follow.
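
The difficulty comes from the fact that the size of the Length field depends
on the value it carries. A possible way to compute the largest payload that
fits in a given room is sketched below; ex_cap_length() is a hypothetical
helper written for this example and is not claimed to match the
quic_int_cap_length() function mentioned in this commit:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the largest payload length L such that a variable-length
 * integer encoding of L followed by L bytes of payload fits in <room>
 * bytes. Returns 0 if not even an empty payload can be described. */
static size_t ex_cap_length(size_t room)
{
	/* size of the Length field and the largest value it can carry */
	static const struct { size_t hdr; uint64_t max; } enc[] = {
		{ 1, (1ULL << 6)  - 1 },
		{ 2, (1ULL << 14) - 1 },
		{ 4, (1ULL << 30) - 1 },
		{ 8, (1ULL << 62) - 1 },
	};
	uint64_t best = 0;
	size_t i;

	for (i = 0; i < sizeof(enc) / sizeof(enc[0]); i++) {
		uint64_t cand;

		if (room <= enc[i].hdr)
			break;
		cand = room - enc[i].hdr;      /* what is left for the payload */
		if (cand > enc[i].max)
			cand = enc[i].max;     /* capped by this field size */
		if (cand > best)
			best = cand;
	}
	return (size_t)best;
}
```

For instance a room of 65 bytes only allows 63 bytes of payload: 64 bytes
would need a 2-byte Length field, for a total of 66 bytes.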

The purpose of this commit is to simplify CRYPTO and STREAM frames
encoding. First, a new helper quic_int_cap_length() is defined which is
useful to determine the optimal buffer room available if prefixed by a
variable-length integer as Length field. Then, processing of both CRYPTO
and STREAM frames is now nearly identical, based on this new helper
function. Functions max_available_room() and max_stream_data_size() are
now unused and are removed.
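
As an illustration only, here is a minimal sketch of what such a helper has to solve. The names demo_varint_size() and demo_cap_length() are hypothetical and this is not the actual quic_int_cap_length() code; only the variable-length integer size thresholds from RFC 9000 are taken as given.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode <v> as a QUIC variable-length integer
 * (RFC 9000, section 16). <v> is bounded by a buffer size here, so values
 * above 2^62-1 are not considered.
 */
static size_t demo_varint_size(uint64_t v)
{
	if (v < (1ULL << 6))
		return 1;
	if (v < (1ULL << 14))
		return 2;
	if (v < (1ULL << 30))
		return 4;
	return 8;
}

/* Hypothetical helper: returns the largest payload length that fits in
 * <room> bytes once prefixed by its own varint-encoded Length field.
 * Returns 0 when only an empty payload (or nothing at all) fits.
 */
static size_t demo_cap_length(size_t room)
{
	size_t len;

	if (!room)
		return 0;

	/* Start from the most optimistic case (1-byte Length field) and
	 * shrink until Length field + payload fit. This takes at most 7
	 * iterations since a Length field never exceeds 8 bytes.
	 */
	len = room - 1;
	while (len && demo_varint_size(len) + len > room)
		len--;
	return len;
}
```

With 65 bytes of room, a 64-byte payload would need a 2-byte Length field and overflow by one byte, so the sketch settles on 63 bytes; with 66 bytes of room it returns 64.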
2025-02-12 11:51:09 +01:00
Amaury Denoyelle
e6a223542a BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding
Function max_stream_data_size() is used to determine the payload length
of a CRYPTO frame. It takes into account that the CRYPTO length field is
a variable length integer.

The implemented calculation was incorrect as it reserved too much space
for the frame header. This error is mostly because max_stream_data_size()
reuses max_available_room(), which also reserves space for a variable
length integer. This results in CRYPTO frames 1 to 2 bytes shorter
than the maximum achievable value, which in the end produces datagrams
shorter than the MTU.

Fix max_stream_data_size() implementation. It is now merely a wrapper on
max_available_room(). This ensures that CRYPTO frame encoding is now
properly optimized to use the MTU available.

This should be backported up to 2.6.
2025-02-12 11:51:09 +01:00
Amaury Denoyelle
63747452a3 BUG/MINOR: quic: reserve length field for long header encoding
Long header packets have a mandatory Length field, which contains the
size of Packet number and payload, encoded as a variable-length integer.
Its value can thus only be determined after the payload size is known,
which depends on the remaining buffer space after this variable-length
field.

Packet payloads are encoded in two steps. First, a list of input frames
is processed until the packet buffer is full. CRYPTO and STREAM frame
payloads can be split if needed to fill the buffer. Real encoding is
then performed as a second stage operation, first with Length field,
then with the selected frames themselves.

Before this patch, no space was reserved in the buffer for Length field
when attaching the frames to the packet. This could result in an error as
the packet payload would be too large for the remaining space.

In practice, this issue was rarely encountered, mostly as a side-effect
from another issue linked to CRYPTO frame encoding. Indeed, a wrong
calculation is performed on CRYPTO splitting, which results in frame
payload shorter by a few bytes than expected. This however ensured there
would be always enough room for the Length field and payload during
encoding. As CRYPTO frames are the only big enough content emitted with
a Long header packet, this renders the current issue mostly non
reproducible.

Fix the original issue by reserving some space for Length field prior to
frame payload calculation, using a maximum value based on the remaining
room space. Packet length is then reduced if needed when encoding is
performed, which ensures there is always enough room for the selected
frames.

Note that the other issue impacting CRYPTO frame encoding is not yet
fixed. This could result in datagrams with Long header packets not
completely extended to the full MTU. The issue will be addressed in
another patch.

This should be backported up to 2.6.
2025-02-12 11:51:09 +01:00
Willy Tarreau
627280e15f MAJOR: leastconn: postpone the server's repositioning under contention
When leastconn is used under many threads, there can be a lot of
contention on leastconn, because the same node has to be moved around
all the time (when picking it and when releasing it). In GH issue #2861
it was noticed that 46 threads out of 64 were waiting on the same lock
in fwlc_srv_reposition().

In such a case, the accuracy of the server's key becomes quite irrelevant
because nobody cares if the same server is picked twice in a row and the
next one twice again.

While other approaches in the past considered using a floating key to
avoid moving the server each time (which was not compatible with the
round-robin rule for equal keys), here a more drastic solution is needed.
What we're doing instead is that we turn this lock into a trylock. If we
can grab it, we do the job. If we can't, then we just wake up a server's
tasklet dedicated to this. That tasklet will then try again slightly
later, knowing that during this short time frame, the server's position
in the queue is slightly inaccurate. Note that any thread touching the
same server will also reposition it and save that work for next time.
Also if multiple threads wake the tasklet up, then that's fine, their
calls will be merged and a single lock will be taken in the end.
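
The trylock-or-defer pattern described above can be sketched as follows. This is only an illustration under stated assumptions: all demo_* names are hypothetical, a C11 atomic_flag stands in for the real lock, and a pending flag stands in for waking the server's tasklet. It is not the actual fwlc code.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_server {
	atomic_flag tree_lock;       /* stands in for the contended tree lock */
	atomic_bool requeue_pending; /* stands in for "tasklet was woken up" */
	unsigned int served;         /* current number of connections */
	unsigned int key;            /* key of the server's position in the tree */
};

static bool demo_trylock(atomic_flag *l)
{
	return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void demo_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Returns true if the server was repositioned immediately, false if the
 * work was deferred because the lock was already held.
 */
static bool demo_reposition(struct demo_server *s)
{
	if (!demo_trylock(&s->tree_lock)) {
		/* Several callers may set this flag, their requests get
		 * merged into a single later attempt.
		 */
		atomic_store(&s->requeue_pending, true);
		return false;
	}
	s->key = s->served;
	demo_unlock(&s->tree_lock);
	return true;
}

/* Deferred handler: retries the repositioning if one was requested. */
static bool demo_requeue_tasklet(struct demo_server *s)
{
	if (!atomic_exchange(&s->requeue_pending, false))
		return false;
	return demo_reposition(s);
}
```

The point of the design is that a failed trylock costs almost nothing: the caller leaves immediately and the key stays slightly stale until the deferred handler runs.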

Testing this on a 24-core EPYC 74F3 showed a significant performance
boost from 382krps to 610krps. The overhead of fwlc_srv_reposition()
reported by perf top dropped from 43% to 2.5%:

Before:
  Overhead  Shared Object             Symbol
    43.46%  haproxy-master-inlineebo  [.] fwlc_srv_reposition
    21.20%  haproxy-master-inlineebo  [.] fwlc_get_next_server
     0.91%  haproxy-master-inlineebo  [.] process_stream
     0.75%  [kernel]                  [k] ice_napi_poll
     0.51%  [kernel]                  [k] tcp_recvmsg
     0.50%  [kernel]                  [k] ice_start_xmit
     0.50%  [kernel]                  [k] tcp_ack

After:
  Overhead  Shared Object             Symbol
    30.37%  haproxy                   [.] fwlc_get_next_server
     2.51%  haproxy                   [.] fwlc_srv_reposition
     1.91%  haproxy                   [.] process_stream
     1.46%  [kernel]                  [k] ice_napi_poll
     1.36%  [kernel]                  [k] tcp_recvmsg
     1.04%  [kernel]                  [k] tcp_ack
     1.00%  [kernel]                  [k] skb_release_data
     0.96%  [kernel]                  [k] ice_start_xmit
     0.91%  haproxy                   [.] conn_backend_get
     0.82%  haproxy                   [.] connect_server
     0.82%  haproxy                   [.] run_tasks_from_lists

Tested on an Ampere Altra with 64 aarch64 cores dedicated to haproxy,
the gain is even more visible (3.6x):

  Before: 311-323k rps, 3.16-3.25ms, 6400% CPU
  Overhead  Shared Object     Symbol
    55.69%  haproxy-master    [.] fwlc_srv_reposition
    33.30%  haproxy-master    [.] fwlc_get_next_server
     0.89%  haproxy-master    [.] process_stream
     0.45%  haproxy-master    [.] h1_snd_buf
     0.34%  haproxy-master    [.] run_tasks_from_lists
     0.32%  haproxy-master    [.] connect_server
     0.31%  haproxy-master    [.] conn_backend_get
     0.31%  haproxy-master    [.] h1_headers_to_hdr_list
     0.24%  haproxy-master    [.] srv_add_to_idle_list
     0.23%  haproxy-master    [.] http_request_forward_body
     0.22%  haproxy-master    [.] __pool_alloc
     0.21%  haproxy-master    [.] http_wait_for_response
     0.21%  haproxy-master    [.] h1_send

  After: 1.21M rps, 0.842ms, 6400% CPU
  Overhead  Shared Object     Symbol
    17.44%  haproxy           [.] fwlc_get_next_server
     6.33%  haproxy           [.] process_stream
     4.40%  haproxy           [.] fwlc_srv_reposition
     3.64%  haproxy           [.] conn_backend_get
     2.75%  haproxy           [.] connect_server
     2.71%  haproxy           [.] h1_snd_buf
     2.66%  haproxy           [.] srv_add_to_idle_list
     2.33%  haproxy           [.] run_tasks_from_lists
     2.14%  haproxy           [.] h1_headers_to_hdr_list
     1.56%  haproxy           [.] stream_set_backend
     1.37%  haproxy           [.] http_request_forward_body
     1.35%  haproxy           [.] http_wait_for_response
     1.34%  haproxy           [.] h1_send

And at similar loads, the CPU usage considerably drops (3.55x), as
well as the response time (10x):

  After: 320k rps, 0.322ms, 1800% CPU
  Overhead  Shared Object     Symbol
     7.62%  haproxy           [.] process_stream
     4.64%  haproxy           [.] h1_headers_to_hdr_list
     3.09%  haproxy           [.] h1_snd_buf
     3.08%  haproxy           [.] h1_process_demux
     2.22%  haproxy           [.] __pool_alloc
     2.14%  haproxy           [.] connect_server
     1.87%  haproxy           [.] h1_send
   > 1.84%  haproxy           [.] fwlc_srv_reposition
     1.84%  haproxy           [.] run_tasks_from_lists
     1.77%  haproxy           [.] sock_conn_iocb
     1.75%  haproxy           [.] srv_add_to_idle_list
     1.66%  haproxy           [.] http_request_forward_body
     1.65%  haproxy           [.] wake_expired_tasks
     1.59%  haproxy           [.] h1_parse_msg_hdrs
     1.51%  haproxy           [.] http_wait_for_response
   > 1.50%  haproxy           [.] fwlc_get_next_server

The cost of fwlc_get_next_server() naturally increases as the server
count increases, but now has no visible effect on updates. The load
distribution remains unchanged compared to the previous approach,
the weight still being respected.

For further improvements to the fwlc algo, please consult github
issue #881 which centralizes everything related to this algorithm.
2025-02-12 11:48:10 +01:00
Willy Tarreau
b6a8318cc2 MEDIUM: server: allocate a tasklet for asyncronous requeuing
This creates a tasklet that only expects to be called when the LB
algorithm is under contention when trying to reposition the server
in its tree. Indeed, that's one of the operations that usually
requires taking a write lock on a highly contended area, often
for very little benefit under contention; indeed, under load, if
a server keeps its previous position for a few extra microseconds,
usually there's no harm. Thus this new tasklet can be woken up by
the LB algo to ask the server to later call lbprm.server_requeue().
It does nothing else.
2025-02-11 17:24:09 +01:00
Willy Tarreau
20b8c4ddba MINOR: lbprm: add a new callback ->server_requeue to the lbprm
This callback will be used to reposition a server to its expected
position regardless of the fact that it was taken or dropped. It
will only be used by supporting LB algos. For now, only fwlc defines
it and assigns it to fwlc_srv_reposition(). At the moment it's not
used yet.
2025-02-11 17:16:14 +01:00
Willy Tarreau
eced1d6d8a DEBUG: thread: reduce the struct lock_stat to store only 30 buckets
Storing only 30 buckets means we only keep 256 bytes per label. This
further simplifies address calculation and reduces the memory used
without complicating the locking code. It means we won't measure wait
times larger than a second but we're not supposed to face this as it
would trigger the watchdog anyway. It may become a little bit tight if
measuring using rdtsc() instead of now_mono_time() though (typically
the limit would be around 350ms for a 3 GHz CPU).
2025-02-10 18:34:43 +01:00
Willy Tarreau
c2f2d6fd3c DEBUG: thread: make lock_stat per operation instead of for all operations
It's more convenient (and more readable) to have the lock stats arranged
by operation type (read, seek, write). It will also allow to later simplify
the structure format and the bucket address calculation. Now lock_stat[]
got split into lock_stats_rd[], lock_stats_sk[], lock_stats_wr[].
2025-02-10 18:34:43 +01:00
Willy Tarreau
4168d1278c DEBUG: thread: don't keep the redundant _locked counter
Now that we have our sums by bucket, the _locked counter is redundant
since it's always equal to the sum of all entries. Let's just get rid
of it and replace its consumption with a loop over all buckets, this
will reduce the overhead of taking each lock at the expense of a tiny
extra effort when dumping all locks, which we don't care about.
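
A minimal sketch of the replacement loop, assuming a simplified layout: struct demo_lock_stat and demo_num_locked() are hypothetical names, and the bucket count is only chosen for the example.

```c
#include <stdint.h>

#define DEMO_NB_BUCKETS 32

struct demo_lock_stat {
	uint64_t buckets[DEMO_NB_BUCKETS]; /* number of waits per 2^N bucket */
};

/* Total number of lock acquisitions. No dedicated counter is needed since
 * every acquisition was accounted in exactly one bucket.
 */
static uint64_t demo_num_locked(const struct demo_lock_stat *st)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < DEMO_NB_BUCKETS; i++)
		total += st->buckets[i];
	return total;
}
```

The loop only runs when dumping the stats, so the lock path itself saves one atomic update per operation.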
2025-02-10 18:34:43 +01:00
Willy Tarreau
a22550fbd7 DEBUG: thread: report the wait time buckets for lock classes
In addition to the total/average wait time, we now also store the
wait time in 2^N buckets. There are 32 buckets for each type (read,
seek, write), allowing to store wait times from 1-2ns to 2.1-4.3s,
which is quite sufficient, even if we'd want to switch from NS to
CPU cycles in the future. The counters are only reported for non-
zero buckets so as not to visually pollute the output.

This significantly inflates the lock_stat struct, which is now
aligned to 256 bytes and rounded up to 1kB. But that's not really
a problem, given that there's only one per lock label.
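
For illustration, the mapping from a wait time to its 2^N bucket could look like the sketch below. demo_wait_bucket() is a hypothetical helper written for this example; the real code may well compute the index differently (e.g. with a bit-scan operation).

```c
#include <stdint.h>

#define DEMO_WAIT_BUCKETS 32

/* Hypothetical helper: index of the 2^N bucket for a wait time in ns.
 * Bucket N covers [2^N, 2^(N+1)) ns. Waits of 0 ns share bucket 0 and
 * anything at or above 2^32 ns is clamped into the last bucket.
 */
static unsigned int demo_wait_bucket(uint64_t ns)
{
	unsigned int b = 0;

	while (ns > 1 && b < DEMO_WAIT_BUCKETS - 1) {
		ns >>= 1;
		b++;
	}
	return b;
}
```

With this mapping, bucket 0 holds the 1-2ns waits and bucket 31 starts at 2^31 ns, i.e. about 2.1s, which matches the range given above.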
2025-02-10 18:34:43 +01:00
Willy Tarreau
0b849c59fb DEBUG: thread: make lock time computation more consistent
The lock time computation was a bit inconsistent between functions,
particularly those using a try_lock. Some of them would count the lock
as taken without counting the time, others would simply not count it.
This is essentially due to the way the time is retrieved, as it was
done inside the atomic increment.

Let's instead always use start_time to carry the elapsed time, by
presetting it to the negative time before the event and adding the
positive time after, so that it finally contains the duration. Then
depending on the try lock's success, we add the result or not. This
was generalized to all lock functions for consistency, and because
this will be handy for future changes.
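
The negative-preset trick can be sketched like this. Everything here is hypothetical and only meant to show the arithmetic: a fake clock replaces the real time source so that the example is deterministic, and the lock attempt is simulated.

```c
#include <stdint.h>

static uint64_t demo_clock_ns; /* fake monotonic clock, for illustration */

static uint64_t demo_now(void)
{
	return demo_clock_ns;
}

struct demo_stats {
	uint64_t wait_ns; /* cumulated wait time of successful attempts */
	uint64_t locked;  /* number of successful attempts */
};

/* Simulated lock attempt: consumes <cost_ns> and reports <succeed>. */
static int demo_attempt(uint64_t cost_ns, int succeed)
{
	demo_clock_ns += cost_ns;
	return succeed;
}

static int demo_timed_trylock(struct demo_stats *st, uint64_t cost_ns, int succeed)
{
	/* Preset to the negated start date. Unsigned wrap-around is well
	 * defined, so adding the end date below yields the duration.
	 */
	uint64_t start_time = -demo_now();
	int ok = demo_attempt(cost_ns, succeed);

	start_time += demo_now(); /* now contains the elapsed time */

	if (ok) {
		/* only successful attempts are accounted */
		st->wait_ns += start_time;
		st->locked++;
	}
	return ok;
}
```

The same variable carries the start date and then the duration, so the decision to account it can be taken in a single place after the attempt.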
2025-02-10 18:34:43 +01:00
Willy Tarreau
99a88ee904 DEBUG: thread: report the spin lock counters as seek locks
Technically speaking, spin locks use a seek lock, not a write lock,
so better count them appropriately for consistency (lock time, or
function calls count).
2025-02-10 18:34:43 +01:00
Willy Tarreau
7ddcdff33f BUG/MEDIUM: debug: close a possible race between thread dump and panic()
The rework of the thread dumping mechanism in 2.8 with commit 9a6ecbd590
("MEDIUM: debug: simplify the thread dump mechanism") opened a small
race, which is that a thread in the process of dumping other ones may
block the other one from panicking while it's looping at the end of
ha_thread_dump_fill(), or any other sequence involving the currently
dumped one.

This was emphasized in 3.1 with commit 148eb5875f ("DEBUG: wdt: better
detect apparently locked up threads and warn about them") that allowed
to emit warnings about long-stuck threads, because in this case, what
happens is that sometimes a thread starts to emit a warning (or a set
of warnings), and while the warning is being awaited for, a panic
finally happens and interrupts either the dumping thread, which never
finishes and waits for the target's pointer to become NULL which will
never happen since it was supposed to do it itself, or the currently
dumped thread which could wait for the dumping thread to become ready
while this one has not released the former.

In order to address this, first we now make sure never to dump a thread
that is already in the process of dumping another one. We're adding a
new thread flag to know this situation, that is set in ha_thread_dump_fill()
and cleared in ha_thread_dump_done(). And similarly, we don't trigger
the watchdog on a thread waiting for another one to finish its dump,
as it's likely a case of warning (and maybe even a panic) that makes
them wait for each other and we don't want such cases to be reentrant.
Finally, we check in the main polling loop that the flag never accidentally
leaked (e.g. wrong flag manipulation) as this would be difficult to spot
with bad consequences.

This should be backported at least to 2.8, and should resolve github
issue #2860. Thanks to Chris Staite for the very informative backtrace
that exhibited the problem.
2025-02-10 18:34:26 +01:00
Willy Tarreau
37e84676c7 [RELEASE] Released version 3.2-dev5
Released version 3.2-dev5 with the following main changes :
    - BUG/MINOR: ssl: put ssl_sock_load_ca under SSL_NO_GENERATE_CERTIFICATES
    - CLEANUP: ssl: rename ssl_sock_load_ca to ssl_sock_gencert_load_ca
    - CLEANUP: ssl: move ssl_sock_gencert_load_ca declaration in ssl_gencert.h
    - CLEANUP: tree-wide: define and use acl_match_cond() helper
    - MINOR: epoll: permit to mask certain specific events
    - MINOR: proxies: Add a per-thread group field to struct proxy.
    - MINOR: Add fields to the per-thread group field in struct server.
    - MINOR: proxies/servers: Calculate queueslength and use it.
    - MEDIUM: servers/proxies: Switch to using per-tgroup queues.
    - BUG/MINOR: stream: Properly handle "on-marked-up shutdown-backup-sessions"
    - MEDIUM: stream: Map task wake up reasons to dedicated stream events
    - MEDIUM: stream: No longer use TASK_F_UEVT* to shut a stream down
    - BUILD: tools: fix build on BSD by dropping the ETIME check
    - MINOR: queues: use __ha_cpu_relax() on failed CAS.
    - BUILD: queues: Use unsigned int when needed
    - BUILD: ssl: allow to build without the renegotiation API of WolfSSL
    - BUILD: ssl: more cleaner approach to WolfSSL without renegotiation
    - BUG/MEDIUM: chunk: make sure to flush the trash pool before resizing
    - MINOR: quic: remove references to burst in quic-cc-algo parsing
    - MINOR: quic: allow BBR testing without pacing
    - MINOR: quic: transform pacing settings into a global option
    - MAJOR: quic: mark pacing as stable and enable it by default
    - MINOR: quic: mark BBR as stable
    - MINOR: quic: define quic_tune
    - BUILD: quic: fix overflow in global tune
    - DEBUG: fd: add a counter of takeovers of an FD since it was last opened
    - MINOR: fd: add a generation number to file descriptors
    - DEBUG: epoll: store and compare the FD's generation count with reported event
    - MEDIUM: epoll: skip reports of stale file descriptors
    - MINOR: mux-h1: Add masks to group H1S DEMUX and MUX errors
    - BUG/MINOR: mux-h1: Only report a SE error on demux error
    - MINOR: tevt: Add the termination events log's fundations
    - MINOR: tevt/stconn: Add a termination events log in the SE descriptor
    - MINOR: tevt/mux-h1: Report termination events for the H1C and H1S
    - MINOR: tevt/mux-h2: Report termination events for the H2C
    - MINOR: tevt/stream/stconn: Report termination events for stream and sc
    - MINOR: tevt/conn: Report intercepted event for L4 rules
    - MINOR: tevt/mux-h1/mux-h2: Add termination events log when dumping mux info
    - MINOR: tevt/muxes: Add CTL and SCTL command to get the termination event logs
    - MINOR: tevt/mux-pt: Add support for termination event logs
    - MINOR: tevt/connection: Add dedicated termination events for lower locations
    - MEDIUM: tevt/muxes: Add dedicated termination events for muxc/se locations
    - MINOR: tevt/stconn: Be more accurate to report shutw events
    - MEDIUM: tevt/stconn/stream: Add dedicated termination events for stream location
    - MINOR: tevt: Don't duplicate termination event during reporting
    - MINOR: tevt/applet:  Add limited support for termination event logs for applets
    - MINOR: tevt: Add a sample to get termination events for all locations
    - MINOR: tevt: Improve function to convert a termination events log to string
    - REORG: tevt/connection: Move enums at the end of the header file
    - MINOR: tevt/dev: Add term_events tool
    - MINOR: tevt/connection: Add support for POLL_HUP/POLL_ERR events
    - MINOR: tevt/dev: Parse tuple of termination events
    - BUG/MEDIUM: htx: wrong count computation in htx_xfer_blks()
    - DOC: htx: clarify <mark> parameter for htx_xfer_blks()
    - BUILD: quic: remove GCC undefined error in qc_release_lost_pkts()
    - MEDIUM: htx: prevent <mark> to copy incomplete headers in htx_xfer_blks()
    - BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records
    - BUG/MINOR: tevt/http-ana: Remove badly placed event reports
    - DEBUG: http-ana: Remove debug counters from HTTP analyzers
    - DEBUG: mux-h1: Remove some debug counters
    - BUG/MINOR: tcp-rules: Don't forward close during tcp-response content rules eval
    - MEDIUM: stream: interrupt costly rulesets after too many evaluations
    - BUG/MINOR: http-check: Don't pretend a C-L heeader is set before adding it
    - BUILD: ssl: remove a boringssl definition defined by recent boringssl libs
    - BUG/MINOR: tevt/mux-h2: Set truncated receive/eos events at SE level on error
    - BUG/MEDIUM: flt-spoe: Set/test applet flags instead of SE flags from I/O handler
    - BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR
    - BUG/MEDIUM: flt-spoe: Properly handle end of stream from the SPOE applet
    - MINOR: flt-spoe: Report end of input immediately after applet init
    - MINOR: mux-spop: Report EOI on the SE when a ACK is received for a stream
    - MINOR: mux-spop: Set SPOP_CF_ERROR flag on connection error only
    - MINOR: tevt/mux-spop:  Report termination events for the SPOP connect/stream
    - CLEANUP: mux-spop: Remove useless comments
    - MINOR: mux-spop: Dump info about connections and streams in dedicated functions
    - MINOR: mux-spop: Implement .show_sd callback function
    - MEDIUM: mux-fcgi: Add a function to propagate termination flags from fstrm to SE
    - BUG/MEDIUM: mux-fcgi: Propagate flags to SE in fcgi_strm_wake_one_stream
    - MINOR: tevt/mux-fcgi:  Report termination events for the FCGI connect/stream
    - MINOR: mux-fcgi: Dump info about connections and streams in dedicated functions
    - MINOR: mux-spop/mux-fcgi: Add support of the debug string for logs
    - BUG/MINOR: cli: Don't set SE flags from the cli applet
    - BUG/MINOR: cli: Fix memory leak on error for _getsocks command
    - BUG/MINOR: cli: Fix a possible infinite loop in _getsocks()
    - BUG/MINOR: config/userlist: Support one 'users' option for 'group' directive
    - BUG/MINOR: auth: Fix a leak on error path when parsing user's groups
    - BUG/MINOR: flt-trace: Support only one name option
    - MINOR: filters: Improve errors formating during filters parsing
    - BUG/MINOR: stats-json: Define JSON_INT_MAX as a signed integer
    - DOC: option redispatch should mention persist options
    - BUG/MINOR: debug: make "debug dev sched" accept a negative TID
    - BUG/MINOR: debug: make sure the "debug dev sched" tasks don't block stopping
    - IMPORT: plock: export the uninlined version of the lock wait function
    - IMPORT: plock: give higher precedence to W than S
    - IMPORT: plock: lower the slope of the exponential back-off
    - IMPORT: plock: use cpu_relax() for a shorter time in EBO
    - Revert "IMPORT: plock: export the uninlined version of the lock wait function"
    - BUG/MEDIUM: ssl: chosing correct certificate using RSA-PSS with TLSv1.3
2025-02-08 05:53:40 +01:00
William Lallemand
3912780b1e BUG/MEDIUM: ssl: chosing correct certificate using RSA-PSS with TLSv1.3
The clienthello callback was written when TLSv1.3 was not yet out, and
signature algorithms have changed since then.

With TLSv1.2, the least significant byte was used to determine the
SignatureAlgorithm, which could be rsa(1), dsa(2), ecdsa(3).
https://datatracker.ietf.org/doc/html/rfc5246#section-7.4.1.4.1

This was used to choose which type of certificate to push to the client.

But TLSv1.3 changed that, and introduced new RSA-PSS algorithms that
do not have the least significant byte set to 1.
https://datatracker.ietf.org/doc/html/rfc8446#section-4.2.3

This would result in choosing the wrong certificate when both an RSA and
an ECDSA one are in the configuration for the same SNI or default entry.

This patch fixes the issue by parsing both the hash and signature fields
to check the RSA-PSS signature scheme.

This must fix issue #2852.

This must be backported to every stable version. The code was moved
from ssl_sock.c to ssl_clienthello in recent versions.
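
To make the encoding issue concrete, here is a hedged sketch of a classifier looking at both bytes of a signature_algorithms entry. The demo_* names are hypothetical and this is not the code from the patch; the code points are those of RFC 5246 (rsa=1, ecdsa=3 in the second byte) and RFC 8446 (rsa_pss_rsae_sha256/384/512 = 0x0804..0x0806). The rsa_pss_pss_* schemes (0x0809..0x080b) are deliberately left out of the RSA class in this sketch since they are tied to RSASSA-PSS keys.

```c
#include <stdint.h>

enum demo_sig_type {
	DEMO_SIG_OTHER = 0,
	DEMO_SIG_RSA,
	DEMO_SIG_ECDSA,
};

/* <hash> and <sign> are the first and second byte of one entry of the
 * signature_algorithms extension.
 */
static enum demo_sig_type demo_classify_sigalg(uint8_t hash, uint8_t sign)
{
	if (hash == 0x08) {
		/* TLSv1.3 code points: the second byte is not a TLSv1.2
		 * SignatureAlgorithm anymore. 0x0804..0x0806 are the
		 * rsa_pss_rsae_* schemes.
		 */
		if (sign >= 0x04 && sign <= 0x06)
			return DEMO_SIG_RSA;
		return DEMO_SIG_OTHER;
	}

	/* TLSv1.2-style encoding: second byte is the SignatureAlgorithm */
	switch (sign) {
	case 1:
		return DEMO_SIG_RSA;
	case 3:
		return DEMO_SIG_ECDSA;
	default:
		return DEMO_SIG_OTHER;
	}
}
```

A test on the second byte alone would see 0x04 for rsa_pss_rsae_sha256 (0x0804) and miss the RSA capability, which is the situation described above.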
2025-02-07 20:56:42 +01:00
Willy Tarreau
ae540e3d9c Revert "IMPORT: plock: export the uninlined version of the lock wait function"
This reverts commit 5496d06b2b1ea276ffb6aec78ffca177b88d89cd.

It breaks the build on Windows which apparently doesn't support the weak
attribute well on functions. It's not a big deal anyway, playing with build
options while debugging still works though it's less easy to use.
2025-02-07 19:51:15 +01:00
Willy Tarreau
b957e2f3ef IMPORT: plock: use cpu_relax() for a shorter time in EBO
Tests have shown that on modern CPUs it's interesting to wait a bit less
in cpu_relax(). Till now we were looping down to 60 iterations and then
switching to just barriers. Increasing the threshold to 90 iterations
left before getting out of the loop improved the average and max time
to grab a write lock by a few percent (e.g. 10% at 1us, 20% at 256ns
or lower). Higher values tend to progressively lose that gain so let's
stick to this one. This was measured on an EPYC 74F3 like previous
measurements that initially led to this value, and the value might
possibly depend on the mask applied to the loop counter.

This is plock commit 74ca0a7307fa6aec3139f27d3b7e534e1bdb748e.
2025-02-07 18:04:29 +01:00
Willy Tarreau
253fba01a7 IMPORT: plock: lower the slope of the exponential back-off
Along many tests involving both haproxy's scheduler and forwarded
traffic, various exponents and algorithms were attempted for the EBO
and their effects were measured. It was found that a growth in 1.25^N
limited to 128k cycles consistently gives a better latency than 1.5^N
limited to 256k cycles, without degrading general performance. The
measures of the time to grab a write lock on a 48-thread EPYC show
that the number of occurrences of low times was roughly multiplied by
2-3 while the number of occurrences of times above 64us was reduced
by similar factors, to even reach 300 at 64us and limiting the maximum
time by a factor of 4.

The other variants that were experimented with are:

  m = ((m + (m >> 1)) + 2) & 0x3ffff;            // original
  m = ((m + (m >> 1) + (m >> 3)) + 2) & 0x3ffff;
  m = ((m + (m >> 1) + (m >> 4)) + 2) & 0x3ffff;
  m = ((m + (m >> 1) + (m >> 4)) + 2) & 0x1ffff;
  m = ((m + (m >> 1) + (m >> 4)) + 1) & 0x1ffff;
  m = ((m + (m >> 2) + (m >> 4)) + 1) & 0x1ffff; // lowest CPU on pl_wr test + good perf
  m = ((m + (m >> 2)) + 1) & 0x1ffff;            // even lower cpu usage, lowest max
  m = ((m + (m >> 1) + (m >> 2)) + 1) & 0x1ffff; // correct but slightly higher maxes
  m = ((m + (m >> 1) + (m >> 3)) + 1) & 0x1ffff; // less good than m+m>>2
  m = ((m + (m >> 2) + (m >> 3)) + 1) & 0x1ffff; // better but not as good as m+m>>2
  m = ((m + (m >> 3) + (m >> 4)) + 1) & 0x1ffff; // less good, lower rates on small counts.
  m = ((m + (m >> 2) + (m >> 3) + (m >> 4)) + 1) & 0x1ffff; // less good as well
  m = ((m & 0x7fff) + (m >> 1) + (m >> 4)) + 2;
  m = ((m & 0xffff) + (m >> 1) + (m >> 4)) + 2;
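
To compare slopes among the simpler variants above, a small sketch can count how many steps a given formula takes before it wraps. demo_ebo_next() and demo_ebo_steps() are hypothetical helpers written for this note, not plock code, and they only cover the variants of the form m + (m >> shift) + inc.

```c
#include <stdint.h>

/* One back-off step: grow <m> by a factor of roughly (1 + 2^-shift), add
 * <inc> and wrap with <mask>. shift=1, inc=2, mask=0x3ffff gives the line
 * marked "original" above. shift=2, inc=1, mask=0x1ffff gives a 1.25^N
 * growth wrapped at 128k.
 */
static uint32_t demo_ebo_next(uint32_t m, unsigned int shift, uint32_t inc, uint32_t mask)
{
	return ((m + (m >> shift)) + inc) & mask;
}

/* Number of steps during which the value keeps growing when starting
 * from 0, i.e. before the mask makes it wrap.
 */
static unsigned int demo_ebo_steps(unsigned int shift, uint32_t inc, uint32_t mask)
{
	uint32_t m = 0, next;
	unsigned int steps = 0;

	while ((next = demo_ebo_next(m, shift, inc, mask)) > m) {
		m = next;
		steps++;
	}
	return steps;
}
```

A lower slope means more, shorter waits before reaching the cap, which is one way to read the latency distribution changes reported in this commit.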

This is plock commit dddd9ee01c522da33c353e2e4d4fd743d8336ec3.
2025-02-07 18:04:29 +01:00
Willy Tarreau
9dd56da730 IMPORT: plock: give higher precedence to W than S
It was noticed in haproxy that in certain extreme cases, a write lock
subject to EBO may fail for a very long time in front of a large set
of readers constantly trying to upgrade to the S state. The reason is
that among many readers, one will succeed in its upgrade, and this
situation can last for a very long time with many readers upgrading
in turn, while the writer waits longer and longer before trying again.

Here we're taking a reasonable approach which is that the write lock
should have a higher precedence in its attempt to grab the lock. What
is done is that instead of fully rolling back in case of conflict with
a pure S lock, the writer will only release its read part in order to
let the S upgrade to W if needed, and finish its operations. This
guarantees no other seek/read/write can enter. Once the conflict is
resolved, the writer grabs the read part again and waits for readers
to be gone (in practice it could even return without waiting since we
know that any possible wanderers would leave or even not be there at
all, but it avoids a complicated loop code that wouldn't improve the
practical situation but inflate the code).

Thanks to this change, the maximum write lock latency on a 48 threads
AMD with a heavily loaded scheduler went down from 256 to 64 ms, and the
number of occurrences of 32ms or more was divided by 300, while all
occurrences of 1ms or less were multiplied by up to 3 (3 for the 4-16ns
cases).

This is plock commit b6a28366d156812f59c91346edc2eab6374a5ebd.
2025-02-07 18:04:29 +01:00
Willy Tarreau
5496d06b2b IMPORT: plock: export the uninlined version of the lock wait function
The inlining of the lock waiting function was made more easily
configurable with commit 7505c2e ("plock: always expose the inline
version of the lock wait function"). However, the standard one remained
static, but in order to resolve the symbols in "perf top", it's much
better to export it, so let's move "static" with "inline" and leave it
exported when PLOCK_INLINE_EBO is not set.

This is plock commit 3bea7812ec705b9339bbb0ed482a2cd8aa6c185c.
2025-02-07 18:04:29 +01:00
Willy Tarreau
8d63dc50ab BUG/MINOR: debug: make sure the "debug dev sched" tasks don't block stopping
When "debug dev sched" is used to pop up background tasks, these tasks
are never stopped, so we must be careful to stop them when the stopping
flag is set, otherwise they can prevent the process from stopping when
sufficiently numerous (tests went as far as 100 million tasks, leading
to the run queue never being completely purged in one poll round).

No backport is needed since this is only used when debugging and tuning
the scheduler.
2025-02-07 18:04:29 +01:00
Willy Tarreau
6765a32eb4 BUG/MINOR: debug: make "debug dev sched" accept a negative TID
The TID passed to "debug dev sched" is used to pin the task to a given
thread. A negative value normally means the task is unpinned and goes
to the shared wait queue and run queue. However due to the type of the
variable, negative values were mapped as highly positive values and were
set to the current thread. Let's add the proper cast to fix this.

No backport is needed since this is only used to experiment with the
scheduler and measure its performance.
2025-02-07 18:04:29 +01:00
Lukas Tribus
5926fb7823 DOC: option redispatch should mention persist options
"option redispatch" remains vague in which cases a session would persist;
let's mention "option persist" and "force-persist" as an example so folks
don't draw the conclusion that this may be default.

Should be backported to stable branches.
2025-02-06 17:49:13 +01:00
Christopher Faulet
d48b5add88 BUG/MINOR: stats-json: Define JSON_INT_MAX as a signed integer
A JSON integer is defined in the range [-(2**53)+1, (2**53)-1]. Macros are used
to define the minimum and the maximum values. The minimum one is defined using
the maximum one. So JSON_INT_MAX must be defined as a signed integer value to
avoid a wrong cast of JSON_INT_MIN.
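
The effect of the signedness can be shown with a short sketch. The DEMO_* macros below are hypothetical and only mirror the principle (minimum derived from the maximum), they are not copied from the stats code.

```c
#include <stdint.h>

/* Signed definition: the negation stays a negative 64-bit integer */
#define DEMO_JSON_INT_MAX ((1LL << 53) - 1)
#define DEMO_JSON_INT_MIN (-DEMO_JSON_INT_MAX)

/* Unsigned definition, for comparison: the negation wraps around to a
 * huge positive value instead of yielding -(2^53)+1.
 */
#define DEMO_BAD_JSON_INT_MAX ((1ULL << 53) - 1)
#define DEMO_BAD_JSON_INT_MIN (-DEMO_BAD_JSON_INT_MAX)

/* Clamp a value to the JSON integer range using the signed macros */
static int64_t demo_json_clamp(int64_t v)
{
	if (v > DEMO_JSON_INT_MAX)
		return DEMO_JSON_INT_MAX;
	if (v < DEMO_JSON_INT_MIN)
		return DEMO_JSON_INT_MIN;
	return v;
}
```

With the unsigned variant, the "minimum" ends up larger than the maximum, so any range check built on it silently misbehaves.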

It was reported by Coverity in #2841: CID 1587769.

This patch could be backported to all stable versions.
2025-02-06 17:19:49 +01:00
Christopher Faulet
bc487afc85 MINOR: filters: Improve errors formating during filters parsing
The error messages reported by a filter during parsing are displayed between
quotes. This is not really user-friendly, so let's remove the quotes here.
2025-02-06 17:03:40 +01:00
Christopher Faulet
b20e2c96cf BUG/MINOR: flt-trace: Support only one name option
When a trace filter is defined, only one 'name' option is expected. But it
was not tested. Thus it was possible to set several names leading to a
memory leak.

It is now tested, and it is not allowed to redefine the trace filter name.
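
The kind of check involved can be sketched as below. struct demo_trace_conf and demo_set_name() are hypothetical names for this example and the error strings are made up; the point is only that an already-set pointer must be rejected (or freed) before being assigned again.

```c
#include <stdlib.h>
#include <string.h>

struct demo_trace_conf {
	char *name;
};

/* Returns 0 on success, -1 on error with <*err> set to a static message. */
static int demo_set_name(struct demo_trace_conf *conf, const char *value, const char **err)
{
	size_t len;

	if (conf->name) {
		/* Overwriting here would leak the previous allocation */
		*err = "name already defined";
		return -1;
	}

	len = strlen(value) + 1;
	conf->name = malloc(len);
	if (!conf->name) {
		*err = "out of memory";
		return -1;
	}
	memcpy(conf->name, value, len);
	return 0;
}
```

Rejecting the second definition, rather than silently replacing the first one, also makes configuration mistakes visible to the user.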

It was reported by Coverity in #2841: CID 1587768.

This patch could be backported to all stable versions.
2025-02-06 17:01:15 +01:00
Christopher Faulet
a7f513af91 BUG/MINOR: auth: Fix a leak on error path when parsing user's groups
In a userlist section, when a user is parsed, if a specified group is not
found, an error is reported. In this case we must take care to release the
already built groups list.

It was reported by Coverity in #2841: CID 1587770.

This patch could be backported to all stable versions.
2025-02-06 16:55:37 +01:00
Christopher Faulet
a1e14d2a82 BUG/MINOR: config/userlist: Support one 'users' option for 'group' directive
When a group is defined in a userlist section, only one 'users' option is
expected. But it was not tested. Thus it was possible to set several options
leading to a memory leak.

It is now tested, and it is not allowed to redefine the users option.

It was reported by Coverity in #2841: CID 1587771.

This patch could be backported to all stable versions.
2025-02-06 16:55:29 +01:00
Christopher Faulet
75e8c8ed33 BUG/MINOR: cli: Fix a possible infinite loop in _getsocks()
In the _getsocks() function, when we failed to set the unix socket in
non-blocking mode, a goto to the "out" label led to an infinite loop. To fix the
issue, we must only let the function exit.

This patch should be backported to all stable versions.
2025-02-06 15:44:21 +01:00
Christopher Faulet
372cc696d4 BUG/MINOR: cli: Fix memory leak on error for _getsocks command
Some errors in parse function of _getsocks commands were not properly handled
and immediately returned, leading to a memory leak on cmsgbuf and tmpbuf
buffers.

To fix the issue, instead of immediately returning -1, we jump to the "out"
label. Returning 1 instead of -1 in that case is valid.

This was reported by Coverity in #2841: CIDs 1587773 and 1587772.

This patch should be backported as far as 2.4.
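
The single-exit cleanup described above can be sketched as follows. This is
not the actual _getsocks code: the function name, the <fail_step> parameter
and the allocation counter are assumptions made for illustration. Only the
cmsgbuf/tmpbuf buffer names and the "out" label come from the message.

```c
#include <stdlib.h>

/* Number of buffers currently allocated; only there to check the sketch. */
int live_buffers;

static void *buf_alloc(size_t len)
{
	void *p = malloc(len);

	if (p)
		live_buffers++;
	return p;
}

static void buf_free(void *p)
{
	if (p) {
		free(p);
		live_buffers--;
	}
}

/* Hypothetical parse function. <fail_step> set to 1 simulates an error
 * detected after both buffers were allocated. Always returns 1. */
int parse_getsocks_sketch(int fail_step)
{
	char *cmsgbuf = NULL;
	char *tmpbuf = NULL;

	cmsgbuf = buf_alloc(128);
	if (!cmsgbuf)
		goto out;
	tmpbuf = buf_alloc(128);
	if (!tmpbuf)
		goto out;

	if (fail_step == 1)
		goto out;   /* an early "return -1" here would leak both buffers */

	/* normal processing would take place here */
out:
	buf_free(tmpbuf);   /* both calls are safe on NULL pointers */
	buf_free(cmsgbuf);
	return 1;
}
```

Every error path goes through the same label, so the two buffers are released
exactly once whatever the failing step.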
2025-02-06 15:43:04 +01:00
Christopher Faulet
7e927243b9 BUG/MINOR: cli: Don't set SE flags from the cli applet
Since the CLI was updated to use the new applet API, it should no longer set
the SE flags directly. Instead, the corresponding applet flags must be set,
using the applet API (applet_set_*). It is true for the CLI I/O handler but also
for the commands parse function and I/O callback function.

This patch should be backported as far as 3.0.
2025-02-06 15:23:20 +01:00
Christopher Faulet
0aa69e7865 MINOR: mux-spop/mux-fcgi: Add support of the debug string for logs
Now it is possible to have debug info about FCGI and SPOP multiplexers. To do
so, the support for the MUX_SCTL_DBG_STR command was implemented for these
muxes.

To have this log message, the log-format must be set to:

  log-format "$HAPROXY_HTTP_LOG_FMT bs=<%[bs.debug_str]>"
2025-02-06 11:19:32 +01:00
Christopher Faulet
456cfa450a MINOR: mux-fcgi: Dump info about connections and streams in dedicated functions
fcgi_show_fd() function was split to dump the info about the FCGI
connections and the FCGI streams in dedicated functions, duplicating this
way what is performed in other muxes.

In addition, the FCGI multiplexer now implements the .show_sd callback
function called by "show sess" CLI command.
2025-02-06 11:19:32 +01:00
Christopher Faulet
bbc8c98a54 MINOR: tevt/mux-fcgi: Report termination events for the FCGI connect/stream
Termination events are now reported for the FCGI connections and the FCGI
streams. In addition, all available termination events logs are reported in
the "show-fd" callback function. The .ctl and .sctl callback functions were
also updated to support, respectively, MUX_CTL_TEVTS and MUX_SCTL_TEVTS
commands.
2025-02-06 11:19:32 +01:00
Christopher Faulet
5b1c2277ae BUG/MEDIUM: mux-fcgi: Propagate flags to SE in fcgi_strm_wake_one_stream
The commit is flagged as a bug because the same fix on the H2 multiplexer was
reported as a bug. But no issue was reported.

When a stream is explicitly woken up by the FCGI connection, if an error
condition is detected, the corresponding error flag is set on the SE. So
SE_FL_ERROR or SE_FL_ERR_PENDING, depending if the end of stream was
reported or not.

However, there is no attempt to propagate other termination flags. We must
be sure to properly set SE_FL_EOI and SE_FL_EOS when appropriate to be able
to switch a pending error to a fatal error.

Because of this bug, the SE could remain with a pending error and no end of
stream, preventing the applicative stream to truly abort it. It means on
some abort scenarios, it seems to be possible to block a stream infinitely.

This patch depends on:

  * MEDIUM: mux-fcgi: Add a function to propagate termination flags from fstrm to SE
  * BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records

This patch could be backported at least as far as 2.8 after a period of
observation. However no bug was reported, so there is no rush.
2025-02-06 11:19:32 +01:00
Christopher Faulet
ccdca4bb77 MEDIUM: mux-fcgi: Add a function to propagate termination flags from fstrm to SE
The function fcgi_strm_propagate_term_flags() was added to check the FSTRM
state and evaluate when EOI/EOS/ERR_PENDING/ERROR flags must be set on the
SE. It is not the only place where those flags are set. But it centralizes
the synchro between the FCGI stream and the SC.

For now, this function is only used at the end of fcgi_rcv_buf(). But it
will be used to fix a potential bug.
2025-02-06 11:19:32 +01:00
Christopher Faulet
7b638eb1a6 MINOR: mux-spop: Implement .show_sd callback function
The SPOP multiplexer now implements the .show_sd callback function called by
"show sess" CLI command.
2025-02-06 11:19:32 +01:00
Christopher Faulet
5aeb678762 MINOR: mux-spop: Dump info about connections and streams in dedicated functions
spop_show_fd() function was split to dump the info about the SPOP
connections and the SPOP streams in dedicated functions, duplicating this
way what is performed in other muxes.
2025-02-06 11:19:32 +01:00
Christopher Faulet
eb4e517489 CLEANUP: mux-spop: Remove useless comments
Just a small cleanup to remove some comments added during the development of
the mux.
2025-02-06 11:19:32 +01:00
Christopher Faulet
4f8ae5b1f6 MINOR: tevt/mux-spop: Report termination events for the SPOP connect/stream
Termination events are now reported for the SPOP connections and the SPOP
streams. In addition, all available termination events logs are reported in
the "show-fd" callback function. The .ctl and .sctl callback functions were
also updated to support, respectively, MUX_CTL_TEVTS and MUX_SCTL_TEVTS
commands.
2025-02-06 11:19:32 +01:00
Christopher Faulet
514a912a4d MINOR: mux-spop: Set SPOP_CF_ERROR flag on connection error only
The SPOP_CF_ERROR flag is now set on connection error only. It was also set
on some demux failures. But it is not mandatory because the connection is
closed anyway. And it is handy to have a flag dedicated to tcp connection
error. It was the original purpose of this flag.

This patch could be backported to 3.1 to ease future backports.
2025-02-06 11:19:32 +01:00
Christopher Faulet
d16c534511 MINOR: mux-spop: Report EOI on the SE when a ACK is received for a stream
The spop stream now reports the end of input when the ACK is transferred to
the SPOE applet. To do so, the flag SPOP_SF_ACK_RCVD was added. It is set on
the SPOP stream when its ACK is received by the SPOP connection.

In addition when SPOP stream flags are propagated to the SE, the error is
now reported if end of input was not reached instead of testing the
connection error code. It is more accurate.

This patch should be backported to 3.1.
2025-02-06 11:19:32 +01:00
Christopher Faulet
f7e5718596 MINOR: flt-spoe: Report end of input immediately after applet init
The SPOE applet forwards the message that must be sent to agent during its
init stage. So just after it is created. When it is performed, the end of
input must be reported because no more data will be forwarded. However, it
was performed after receiving the ACK response. It is harmless, but there is
no reason to delay the EOI. It is now fixed.

This patch must be backported to 3.1.
2025-02-06 11:19:32 +01:00
Christopher Faulet
38aac2c7bc BUG/MEDIUM: flt-spoe: Properly handle end of stream from the SPOE applet
The previous fix ("BUG/MEDIUM: applet: Don't pretend to have more data to
handle EOI/EOS/ERROR") revealed an issue with the way the SPOE applet was
reporting the end of stream, leading to never shut the applet down.

In fact, there are two bugs in one. The first one is about the applet
shutdown. Since the fix above, the applet is no longer closed. Before, it
was closed because it was reported in error. But now, it is just delayed
because the applet and the SPOP stream are declared to support half close
connections. So the applet is only closed when the SPOP connection is
closed. To fix this bug, both sides are now stating that half close
connections are not supported.

The second bug is about the way the end of stream is reported. It is
reported when the ACK response is received. But it is too early, because the
parent stream must process the response first. So now, we take care to have
processed the ACK from the parent applet before reporting an end of stream.

This patch must be backported with the commit above to 3.1.
2025-02-06 11:19:32 +01:00
Christopher Faulet
7214dcd52d BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR
The way appctx EOI/EOS/ERROR flags were reported for applets using the new
API was to state the applet had more data to deliver. But it was not
correct and for APPCTX_FL_EOS, this led to report an error on the SE because
it is not expected. More data to deliver and an end of stream is an
impossible situation.

This was added as a fix by commit b8ca114031 ("BUG/MEDIUM: applet: State
appctx have more data if its EOI/EOS/ERROR flag is set"), mainly to make the
SPOE applet work.

When an applet sets one of these flags, it really means it has no more data
to deliver. So we must not try to trigger a new receive to handle these
flags. Instead we must handle them directly in task_process_applet()
function and only if the corresponding SE flags were not already set.

This patch must be backported to 3.1.
2025-02-06 11:19:32 +01:00
Christopher Faulet
db504fbdbe BUG/MEDIUM: flt-spoe: Set/test applet flags instead of SE flags from I/O handler
The SPOE applet is using the new applet API. Thus end of input, end of
stream and errors must be reported using the applet flags, not the SE
flags. This was not the case. So let's fix it.

It seems this bug is harmless for now.

This patch must be backported to 3.1.
2025-02-06 11:19:32 +01:00
Christopher Faulet
54a09dfe0f BUG/MINOR: tevt/mux-h2: Set truncated receive/eos events at SE level on error
When receive or EOS termination events are reported at the SE level, a
truncation was erroneously reported when no error was detected. Of course, it
must be the opposite.

No backport needed.
2025-02-06 11:19:32 +01:00
Frederic Lecaille
85cb1cc7f4 BUILD: ssl: remove a boringssl definition defined by recent boringssl libs
This is the case for AWS-LC which derives from boringssl, where
X509_OBJECT_get0_X509_CRL() is already defined. There is definitely
no more need to define this function to build haproxy against TLS libs derived
from boringssl.
2025-02-06 10:48:25 +01:00
Christopher Faulet
fad68cb16d BUG/MINOR: http-check: Don't pretend a C-L heeader is set before adding it
When a GET/HEAD/OPTIONS/DELETE healthcheck request was formatted, we claimed
there was a "content-length" header set even when there was no payload,
leading to actually send a "content-length: 0" header to the server. It was
unexpected and could be rejected by servers.

When a healthcheck request is sent we must take care to state there is a
"content-length" header when it is explicitly added.

This patch should fix the issue #2851. It must be backported as far as 2.9.
2025-02-03 18:46:41 +01:00
Aurelien DARRAGON
0846638f7f MEDIUM: stream: interrupt costly rulesets after too many evaluations
It is not rare to see configurations with a large number of "tcp-request
content" or "http-request" rules for instance. A large number of rules
combined with cpu-demanding actions (e.g.: actions that work on content)
may create thread contention as all the rules from a given ruleset are
evaluated under the same polling loop if the evaluation is not interrupted.

Thus, in this patch we add extra logic around "tcp-request content",
"tcp-response content", "http-request" and "http-response" rulesets, so
that when a certain number of rules are evaluated under the single polling
loop, we force the evaluating function to yield. As such, the rule which
was about to be evaluated is saved, and the function starts evaluating
rules from the save pointer when it returns (in the next polling loop).

We use task_wakeup(task, TASK_WOKEN_MSG) to explicitly wake the task so
that no time is wasted and the processing is resumed ASAP. TASK_WOKEN_MSG
is mandatory here because process_stream() expects TASK_WOKEN_MSG for
explicit analyzers re-evaluation.

rules_bcount stream's attribute was added to count how many rules were
evaluated since last interruption (yield). Also, SF_RULE_FYIELD flag
was added to know that the s->current_rule was assigned due to forced
yield and not regular yield.

By default haproxy will enforce a yield every 50 rules, this behavior
can be configured using the "tune.max-rules-at-once" global keyword.

There is a limitation though: for now, if the ACT_OPT_FINAL flag is set
on act_opts, we consider it is not safe to yield (as it is already the
case for automatic yield). In this case instead of yielding and taking
the risk of not being called back, we skip the yield and hope it will
not create contention. This is something we should ideally try to
improve in order to yield in all conditions.
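
The forced yield described above can be pictured with the following C sketch.
It is a simplified model, not the HAProxy implementation: rules are reduced to
indexes, the structure and function names are hypothetical, and only the
rules_bcount counter and the idea of a saved rule pointer come from the
message. The limit plays the role of "tune.max-rules-at-once".

```c
#include <stddef.h>

struct stream_sketch {
	size_t current_rule;        /* index of the rule to resume from */
	unsigned int rules_bcount;  /* rules evaluated since the last yield */
};

/* Evaluates the rules from the saved position. Returns 1 once the whole
 * ruleset was evaluated, 0 when a yield was forced. <max_at_once> must be
 * greater than zero. */
int eval_ruleset_sketch(struct stream_sketch *s, size_t nb_rules,
                        unsigned int max_at_once)
{
	size_t i;

	for (i = s->current_rule; i < nb_rules; i++) {
		if (s->rules_bcount >= max_at_once) {
			s->current_rule = i;   /* the rule about to be evaluated */
			s->rules_bcount = 0;
			return 0;              /* the caller would wake the task up */
		}
		/* the rule at index <i> would be evaluated here */
		s->rules_bcount++;
	}
	s->current_rule = 0;
	return 1;
}

/* Counts how many passes (polling loops) are needed for <nb_rules> rules. */
unsigned int count_passes_sketch(size_t nb_rules, unsigned int max_at_once)
{
	struct stream_sketch s = { 0, 0 };
	unsigned int passes = 1;

	while (!eval_ruleset_sketch(&s, nb_rules, max_at_once))
		passes++;
	return passes;
}
```

With a limit of 50, a ruleset of 120 rules is evaluated in three passes in
this model, while a ruleset of 50 rules or fewer still completes in one.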
2025-02-03 17:09:48 +01:00
Christopher Faulet
04bbfa4354 BUG/MINOR: tcp-rules: Don't forward close during tcp-response content rules eval
When the tcp-response content ruleset evaluation is delayed because of an
ACL condition, the close forwarding on the client side is not explicitly
blocked. So it is possible to close the client side before the end of the
response evaluation.

To fix the issue, this is now done in all cases where some data are
missing. Concretely, channel_dont_close() is called in "missing_data" goto
label.

Note it is only a theoretical bug (or pending bug). It is not possible to
trigger it for now because an ACL cannot wait for more data when a close was
received. But the code remains a bit weak. It is safer this way. It is
especially mandatory for the "force yield" option that should be added soon.

This patch could be backported to all stable versions.
2025-02-03 15:31:59 +01:00
Christopher Faulet
431c5533b7 DEBUG: mux-h1: Remove some debug counters
Several debug counters were added to debug a strange issue about early
aborts. Most of them are now useless, especially because it is now possible
to rely on the termination events logs. So, it is better to remove them.

Note that these counters are still there in 3.1.
2025-02-03 08:48:31 +01:00
Christopher Faulet
1c6512f8fc DEBUG: http-ana: Remove debug counters from HTTP analyzers
Several debug counters were added in HTTP analyzers to help debugging a
strange issue about early aborts. But these counters are a bit overkill
now. Especially because it is now possible to rely on the termination event
log. So just remove them.

Note that these counters are still there in 3.1.
2025-02-03 08:28:45 +01:00
Christopher Faulet
274c9d21a6 BUG/MINOR: tevt/http-ana: Remove badly placed event reports
When specific events for the stream location were added, some reports about
message interception were not removed. These reports are now removed.

No need to backport.
2025-02-03 08:20:41 +01:00
Christopher Faulet
5f927f603a BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records
A Read0 event could be ignored by the FCGI multiplexer if it is blocked on a
partial record. Instead of handling the event, it remained blocked, waiting
for the end of the record.

To fix the issue, the same solution as the H2 multiplexer is used. Two
flags are introduced. The first one, FCGI_CF_END_REACHED, is used to
acknowledge a read0. This flag is set when a read0 was received AND the FCGI
multiplexer must handle it. The second one, FCGI_CF_DEM_SHORT_READ, is set
when the demux is interrupted on a partial record. A short read and a read0
lead to set the FCGI_CF_END_REACHED flag.

With these changes, the FCGI mux should be able to properly handle read0 on
partial records.

This patch should be backported to all stable versions after a period of
observation.
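
The interaction between the two flags can be sketched as below. The flag
values, the function and its parameters are hypothetical, and the exact
condition is an assumption drawn from the description: a read0 is acknowledged
when nothing is left to demux or when the demux stopped on a partial record.

```c
#include <stddef.h>

/* Hypothetical values; only the flag names come from the commit message. */
#define FCGI_CF_END_REACHED     0x1u
#define FCGI_CF_DEM_SHORT_READ  0x2u

/* <buffered> is the amount of pending input, <record_len> the size of the
 * record being demuxed and <read0> is non-zero once a read0 was received.
 * Returns the updated connection flags. */
unsigned int fcgi_demux_sketch(unsigned int flags, size_t buffered,
                               size_t record_len, int read0)
{
	flags &= ~FCGI_CF_DEM_SHORT_READ;
	if (buffered && buffered < record_len)
		flags |= FCGI_CF_DEM_SHORT_READ;   /* stopped on a partial record */

	/* Assumed rule: a partial record can never complete after a read0,
	 * so the end is reached. A complete record must be processed first. */
	if (read0 && (!buffered || (flags & FCGI_CF_DEM_SHORT_READ)))
		flags |= FCGI_CF_END_REACHED;
	return flags;
}
```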
2025-02-03 07:49:50 +01:00
William Lallemand
0a28b1ea0c MEDIUM: htx: prevent <mark> to copy incomplete headers in htx_xfer_blks()
Prevent a partial copy of trailers or headers when using the <mark>
parameter.

When using htx_xfer_blks(), transferring partial headers or trailers is
prevented when restricted by the <count> parameter. However using the
<mark> parameter will still allow to do it.

This patch changes the behavior by checking the <mark> type only after
checking the headers/trailers type, so we can still rollback on partial
transfer.

No impact on the current code, which does not try to do that yet.
2025-01-31 15:51:51 +01:00
Amaury Denoyelle
4ad2accfee BUILD: quic: remove GCC undefined error in qc_release_lost_pkts()
Every once in a while, GCC reports issues with qc_release_lost_pkts()
function. It seems that its static analysis is foiled by the code
structuring. The latest warning reports the following issue :

  CC      src/quic_loss.o
src/quic_loss.c: In function ‘qc_release_lost_pkts’:
src/quic_loss.c:313:58: error: potential null pointer dereference [-Werror=null-dereference]
  313 |                         unsigned int period = newest_lost->time_sent_ms - oldest_lost->time_sent_ms;
      |                                               ~~~~~~~~~~~^~~~~~~~~~~~~~

To fix this definitively, slightly change the code. <oldest_lost> and
<newest_lost> are now initialized on the first list entry outside of the
loop. This is enough to guarantee to GCC that they cannot be NULL for
the remainder of the function.
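
The restructuring can be illustrated with a small C sketch. The packet
structure and both functions are hypothetical stand-ins, not the code of
qc_release_lost_pkts(). The point is only that the two pointers are set from
the first entry before the loop, so they are never NULL afterwards.

```c
#include <stddef.h>

/* Hypothetical lost packet entry in a singly linked list. */
struct pkt_sketch {
	unsigned int time_sent_ms;
	struct pkt_sketch *next;
};

/* Returns the delay between the oldest and the newest lost packets,
 * or 0 if the list is empty. */
unsigned int lost_period_sketch(const struct pkt_sketch *head)
{
	const struct pkt_sketch *oldest, *newest, *p;

	if (!head)
		return 0;

	/* Initialized on the first entry, outside of the loop. */
	oldest = newest = head;
	for (p = head->next; p; p = p->next) {
		if (p->time_sent_ms < oldest->time_sent_ms)
			oldest = p;
		if (p->time_sent_ms > newest->time_sent_ms)
			newest = p;
	}
	return newest->time_sent_ms - oldest->time_sent_ms;
}

/* Builds a three-entry list sent at 40, 10 and 25 ms and returns its period. */
unsigned int lost_period_demo(void)
{
	struct pkt_sketch c = { 25, NULL };
	struct pkt_sketch b = { 10, &c };
	struct pkt_sketch a = { 40, &b };

	return lost_period_sketch(&a);
}
```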
2025-01-31 15:34:30 +01:00
William Lallemand
c17e029232 DOC: htx: clarify <mark> parameter for htx_xfer_blks()
Clarify the fact that the first <mark> block is transferred before
stopping when using htx_xfer_blks()
2025-01-31 15:23:47 +01:00
William Lallemand
c6390cdf9c BUG/MEDIUM: htx: wrong count computation in htx_xfer_blks()
When transferring blocks from an src to another dst htx representation,
htx_xfer_blks() subtracts the size of each removed block from the <count>
value passed in parameter, so it can't transfer more than <count>. The
size must also contain the metadata, represented by a simple
sizeof(struct htx_blk).

However, the code was doing a sizeof(dstblk) instead of a
sizeof(*dstblk) which has the consequence of removing only a size_t from
count. Fortunately htx_blk size is 64bits, so that does not provoke any
problem in 64bits. But on 32bits architecture, the count value is not
decreased correctly and the function could try to transfer more blocks
than allowed by the count parameter.

Must be backported in every stable release.
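
The difference between the two expressions can be shown with a short C
sketch. The structure below is only a stand-in with the same 64-bit size as
the one mentioned above, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the block metadata: two 32-bit fields, 8 bytes in total. */
struct blk_sketch {
	uint32_t addr;
	uint32_t info;
};

/* Buggy accounting: sizeof(dstblk) is the size of the pointer. */
size_t meta_size_buggy(const struct blk_sketch *dstblk)
{
	(void)dstblk;
	return sizeof(dstblk);
}

/* Fixed accounting: sizeof(*dstblk) is the size of the structure. */
size_t meta_size_fixed(const struct blk_sketch *dstblk)
{
	(void)dstblk;
	return sizeof(*dstblk);
}
```

On a platform with 8-byte pointers both functions return 8, which hides the
bug. With 4-byte pointers the buggy one returns 4, so <count> is decreased by
too little for every transferred block.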
2025-01-31 15:02:58 +01:00
Christopher Faulet
956cb5d554 MINOR: tevt/dev: Parse tuple of termination events
term_events tool is now able to parse tuple of termination events, as returned
by "term_events" sample fetch function.
2025-01-31 10:46:08 +01:00
Christopher Faulet
71320fc9c1 MINOR: tevt/connection: Add support for POLL_HUP/POLL_ERR events
Connection errors can be detected via connect/recv/send syscall, but also
because it was reported by the poller. So dedicated events, at the FD level,
are introduced to make the difference.

term_events tool was updated accordingly.
2025-01-31 10:41:50 +01:00
Christopher Faulet
c7457427ab MINOR: tevt/dev: Add term_events tool
This development tool can be used to convert a string representing a
termination events log to its human readable representation. Several strings
may be converted at a time. To do so, several arguments can be specified on
the command line or they can be provided on STDIN, using "-" argument.

Here is an example:

  > term_events f2x2f4x4 m2m4m1 e2e1 s2s1S1 E1 M1 F1
  ### f2x2f4x4 : fd:shutr > xprt:shutr > fd:snd_err > xprt:snd_err
  ### m2m4m1   : muxc:shutr > muxc:snd_err > muxc:shutw
  ### e2e1     : se:eos > se:shutw
  ### s2s1S1   : strm:eos > strm:shutw > STRM:shutw
  ### E1       : SE:shutw
  ### M1       : MUXC:shutw
  ### F1       : FD:shutw

The make target "dev/term_events/term_events" must be used to compile it.
2025-01-31 10:41:50 +01:00
Christopher Faulet
990854ee0d REORG: tevt/connection: Move enums at the end of the header file
Enums used to report events were placed in the connection header for
convenience. But it is not specifically related to connection. So, they are
moved at the end of the file to have a better isolation.
2025-01-31 10:41:50 +01:00
Christopher Faulet
487d6b09f1 MINOR: tevt: Improve function to convert a termination events log to string
The function is now responsible for handling an empty log when no event was
reported. In that case, an empty string is returned. It is also responsible for
handling the case where the termination events log is not supported for a given entity
(for instance the quic mux for now). In that case, a dash ("-") is returned.
2025-01-31 10:41:50 +01:00
Christopher Faulet
b161155498 MINOR: tevt: Add a sample to get termination events for all locations
"term_events" is a sample fetch function that can be used to get
termination events for all locations in one call. The format is equivalent to:

  {fc_term_events,fc_mux_term_events,fs.term_events,txn.term_events,bs.term_events,bc_mux_term_events,bc_term_events}

If no event was reported for a location, the field is empty. If the feature
is not supported yet, a dash ('-') is printed.
2025-01-31 10:41:50 +01:00
Christopher Faulet
eb2f1a4ba4 MINOR: tevt/applet: Add limited support for termination event logs for applets
There is no termination events log for applet but events for the SE location
are filled when the endpoint is an applet. Most of them rely on the new
applet API. Only a few events are reported for legacy applets.
2025-01-31 10:41:50 +01:00
Christopher Faulet
cbd898c42b MINOR: tevt: Don't duplicate termination event during reporting
It is hard to never detect the same event several times without painful
tests. In other words, the same termination event can be reported several
times and this must be handled. To do so, "tevt_report_event" macro is
updated to ignore an event if the last reported one is of the same type, for
the same location. Of course, if the same event is reported several times at
different moments, it will not be detected.
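
A minimal sketch of this check is given below. It assumes the layout
introduced with the termination events log (8 bits per event, 4 bits for the
location and 4 bits for the type) and further assumes that the last reported
event sits in the low byte. The function name is hypothetical and capacity
handling is left out.

```c
#include <stdint.h>

/* Appends an event to <log> unless it equals the last reported one.
 * Returns the updated log. */
uint32_t tevt_sketch_report_once(uint32_t log, unsigned int loc,
                                 unsigned int type)
{
	uint8_t evt = (uint8_t)(((loc & 0xf) << 4) | (type & 0xf));

	if ((log & 0xff) == evt)
		return log;           /* same location and type as the last event */
	return (log << 8) | evt;
}
```

Reporting (1,2) twice in a row leaves the log unchanged, while the sequence
(1,2), (1,3), (1,2) stores the first event twice, as noted above.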
2025-01-31 10:41:50 +01:00
Christopher Faulet
2dc02f75b1 MEDIUM: tevt/stconn/stream: Add dedicated termination events for stream location
It is the last patch to introduce dedicated termination events for each
location. In this one, events for the stream location are introduced. The old
enum is also removed because it is now unused.

Here, more accurate events are added. The "intercepted" event was split.
2025-01-31 10:41:50 +01:00
Christopher Faulet
9697704932 MINOR: tevt/stconn: Be more accurate to report shutw events
In se_shutdown() a SE termination event is reported while the shutw stream
event is reported in sc_app_shut_conn().
2025-01-31 10:41:50 +01:00
Christopher Faulet
a58e650ad1 MEDIUM: tevt/muxes: Add dedicated termination events for muxc/se locations
Termination events dedicated to mux connection and stream-endpoint
descriptors are added in this patch. Specific events to these locations are
thus added. Changes for the H1 and H2 multiplexers are reviewed to be more
accurate.
2025-01-31 10:41:50 +01:00
Christopher Faulet
f2778ccc7d MINOR: tevt/connection: Add dedicated termination events for lower locations
To be able to add more accurate termination events for each location, the
enum will be split by location. Indeed, there are at most 16 possible
events. It will be pretty confusing to use same termination events for the
different locations. So the best is to split them.

In this patch, the termination events for the fd, hs and xprt locations are
introduced. For now some holes are added to keep similar events aligned
across enums. But this may change in future.
2025-01-31 10:41:50 +01:00
Christopher Faulet
9cbc3229ec MINOR: tevt/mux-pt: Add support for termination event logs
A termination events log is added to the mux-pt context and appropriate
events are reported for the muxc location. There is no SE events for this
mux.
2025-01-31 10:41:50 +01:00
Christopher Faulet
a4c281a190 MINOR: tevt/muxes: Add CTL and SCTL command to get the termination event logs
MUX_CTL_TEVTS command is added to get the termination event logs of a mux
connection and MUX_SCTL_TEVTS command to get the termination event logs of a
mux stream.
2025-01-31 10:41:50 +01:00
Christopher Faulet
95029305d3 MINOR: tevt/mux-h1/mux-h2: Add termination events log when dumping mux info
The termination events logs of the multiplexer connection and stream are now
dumped when corresponding mux info are dumped. The termination event logs of
the underlying connection is also dumped in the debug string.
2025-01-31 10:41:50 +01:00
Christopher Faulet
170d46989c MINOR: tevt/conn: Report intercepted event for L4 rules
When an L4 rule interrupts the processing, a termination event is reported
for the connection, with the "fd" location.
2025-01-31 10:41:50 +01:00
Christopher Faulet
00a07c8b54 MINOR: tevt/stream/stconn: Report termination events for stream and sc
In this patch, events for the stream location are reported. These events are
first reported on the corresponding stream-connector. So front events on scf
and back events on scb. Then all events are merged in the stream. But
only 4 events are saved on the stream.

Several internal events are for now grouped with the type
"tevt_type_intercepted". More events will be added to have a better
resolution. But at least the place to report these events are identified.

For now, when an event is reported on a SC, it is also reported on the stream
and vice versa.
2025-01-31 10:41:50 +01:00
Christopher Faulet
147b6d3d4d MINOR: tevt/mux-h2: Report termination events for the H2C
shutdown for reads (read0), receive errors, shutdown for writes and timeouts
are reported, but only for the H2 connection for now.

As for the H1 multiplexer, more events must be added to report protocol
errors, goaways and rst-streams. And of course, all events for the H2
streams must be reported too.
2025-01-31 10:41:50 +01:00
Christopher Faulet
5f03261166 MINOR: tevt/mux-h1: Report termination events for the H1C and H1S
shutdown for reads (read0), receive errors, shutdown for writes and timeouts
are reported. It is not too hard to know where to report events generated by
HAProxy (timeouts and shutw). For detected events (shutr and receive error),
it is not so simple. These events must not be reported when they are
detected but when the mux can handle them. For instance, some unprocessed
input data may block a read0. So, the experience will tell us if these
events are reported at the right time and on the right conditions.

For now, no internal errors (parsing errors, protocol errors, internal
errors...) are reported because these event types have not yet been added.
2025-01-31 10:41:50 +01:00
Christopher Faulet
992b4b9726 MINOR: tevt/stconn: Add a termination events log in the SE descriptor
This termination events log will be used to report events from the mux
streams. The location will be "tevt_loc_se" and the muxes will be
responsible to report the corresponding events.
2025-01-31 10:41:50 +01:00
Christopher Faulet
e944944990 MINOR: tevt: Add the termination events log's fundations
Termination events logs will be used to report the events that led to close
a connection. Unlike flags, that reflect a state, the idea here is to store
a log to preserve the order of the events. Most of time, when debugging an
issue, the order of the events is crucial to be able to understand the root
cause of the issue. The traces are truly helpful to do so. But it is not
always possible to activate them because they are pretty verbose. On heavily
loaded platforms, it is not acceptable. We hope that the termination events
logs will help us in such situations.

One termination events log will be stored at each layer (connection, mux
connection, mux stream...) as a 32-bit integer. Each event will be stored on
8 bits, 4 bits for the location and 4 bits for the type. So only the first four
events will be stored for each layer. It should be enough to understand why a
connection is closed.

In this patch, the enums defining the termination event locations and types
are added. The macro to report a new event is also added and a function to
convert a termination events log to a string that could be displayed in log
messages for instance.
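
The encoding described above can be sketched in C as follows. This is not the
macro added by the patch: the function name is hypothetical, and the choice to
shift older events towards the upper bytes is an assumption. Only the 32-bit
log, the 8 bits per event and the 4+4 bit split come from the message.

```c
#include <stdint.h>

/* Appends the event (<loc>, <type>) to <log> and returns the updated log.
 * Once four events are stored, later ones are ignored so that the first
 * four are preserved. */
uint32_t tevt_sketch_report(uint32_t log, unsigned int loc, unsigned int type)
{
	uint8_t evt = (uint8_t)(((loc & 0xf) << 4) | (type & 0xf));

	if (log & 0xff000000u)   /* upper byte in use: the log is full */
		return log;
	return (log << 8) | evt;
}
```

Reading the bytes from the most significant non-zero one down to the least
significant one gives the events in the order they were reported in this
sketch.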
2025-01-31 10:41:49 +01:00
Christopher Faulet
4ccca7efcf BUG/MINOR: mux-h1: Only report a SE error on demux error
When a demux error is reported by the H1S, an error must be reported on the
SE and not an end-of-input or an end-of-stream. So SE_FL_ERROR flag must be
set and not SE_FL_EOI/SE_FL_EOS.

It seems this bug has no impact. So there is no reason to backport it.
2025-01-31 10:41:49 +01:00
Christopher Faulet
e56e718c82 MINOR: mux-h1: Add masks to group H1S DEMUX and MUX errors
It is just a small patch to clean up mux/demux functions. Instead of listing
the H1S errors that must be handled during demux or mux operations, masks of
flags are used. It is more readable.
2025-01-31 10:41:49 +01:00
Willy Tarreau
8235a24782 MEDIUM: epoll: skip reports of stale file descriptors
Now that we can see that some events are reported for older instances
of a file descriptor, let's skip these ones instead of reporting
dangerous events on them. It might possibly qualify as a bug if it
helps fixing strange issues in certain environments, in which case it
can make sense to backport it along with the following recent patches:

  DEBUG: fd: add a counter of takeovers of an FD since it was last opened
  MINOR: fd: add a generation number to file descriptors
  DEBUG: epoll: store and compare the FD's generation count with reported event
2025-01-30 19:45:34 +01:00
Willy Tarreau
5012b6c6d9 DEBUG: epoll: store and compare the FD's generation count with reported event
There have been some reported cases where races between threads in epoll
were causing wrong reports of close or error events. Since the epoll_event
data is 64 bits, we can store the FD's generation counter in the upper
bits to verify if we're speaking about the same instance of the FD as the
current one or a stale one. If the generation number does not match, then
we classify these into 3 conditions and increment the relevant COUNT_IF()
counters (stale report for closed FD, stale report of harmless event on
reopened FD, stale report of HUP/ERR on reopened FD). Tests have shown that
with heavy concurrency, a very small maxconn (typically 1 per thread),
http-reuse always and a server closing connections first but randomly
(httpterm with /C=2r), such events can happen at a pace of a few per second
for the closed FDs, and a few per minute for the other ones, so there's value
in leaving this accessible for troubleshooting. E.g. after a few minutes:

  Count     Type Location function(): "condition" [comment]
  5541       CNT ev_epoll.c:296 _do_poll(): "1" [epoll report of event on a just closed fd (harmless)]
  10         CNT ev_epoll.c:294 _do_poll(): "1" [epoll report of event on a closed recycled fd (rare)]
  42         CNT ev_epoll.c:289 _do_poll(): "1" [epoll report of HUP on a stale fd reopened on the same thread (suspicious)]
  212        CNT ev_epoll.c:279 _do_poll(): "1" [epoll report of HUP/ERR on a stale fd reopened on another thread (harmless)]
  1          CNT mux_h1.c:3911 h1_send(): "b_data(&h1c->obuf)" [connection error (send) with pending output data]

This one with the following setup, which abuses thread contention by
starting 64 threads on two cores:
- config:
    global
        nbthread 64
        stats socket /tmp/sock1 level admin
        stats timeout 1h
    defaults
        timeout client 5s
        timeout server 5s
        timeout connect 5s
        mode http
    listen p2
        bind :8002
        http-reuse always
        server s1 127.0.0.1:8000 maxconn 4

- haproxy forcefully started on 2C4T:

    $ taskset -c 0,1,4,5 ./haproxy -db -f epoll-dbg.cfg

- httpterm on port 8000, cpus 2,3,6,7 (2C4T)

- h1load with responses larger than a single buffer, and randomly
  closing/keeping alive:

    $ taskset -c 2,3,6,7 h1load -e -t 4 -c 256 -r 1 0:8002/?s=19k/C=2r
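
The idea of storing the generation counter next to the FD in the 64-bit event
data can be sketched as below. The 32/32 bit split and all function names are
assumptions for illustration. The message only states that the upper bits hold
the generation counter.

```c
#include <stdint.h>

/* Packs <fd> in the low 32 bits and <gen> in the high 32 bits. */
uint64_t ev_pack_sketch(int fd, uint32_t gen)
{
	return ((uint64_t)gen << 32) | (uint32_t)fd;
}

/* Extracts the FD from the event data. */
int ev_fd_sketch(uint64_t data)
{
	return (int)(uint32_t)data;
}

/* Extracts the generation stored when the FD was registered. */
uint32_t ev_gen_sketch(uint64_t data)
{
	return (uint32_t)(data >> 32);
}

/* An event is stale when its generation differs from the FD's current one. */
int ev_is_stale_sketch(uint64_t data, uint32_t current_gen)
{
	return ev_gen_sketch(data) != current_gen;
}
```

A poller would compare the value reported with the event against the
generation currently stored for that FD, and count or skip the event when
they differ.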
2025-01-30 19:45:34 +01:00
Willy Tarreau
d155924efe MINOR: fd: add a generation number to file descriptors
This patch adds a counter of close() on file descriptors in the fdtab.
The goal is to better detect if reported events concern the current or
a previous file descriptor. For now the counter is only added, and is
shown in "show fd" as "gen". We're reusing unused space at the end of
the struct. If it's needed for something more important later, this
patch can be reverted.
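
As an illustration of how such a generation number could later help tell
current events from stale ones, here is a minimal sketch. The patch itself
only adds the counter and reports it in "show fd"; the table, the structure
and the function names below are hypothetical and are not the fdtab API.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FDS 1024

/* Hypothetical per-FD entry: only the generation counter is modeled. */
struct fd_entry {
    uint32_t generation; /* bumped on each close() of this FD number */
};

struct fd_entry fd_table[MAX_FDS];

/* Called when an FD is closed: events captured before this point will
 * carry an older generation.
 */
void fd_mark_closed(int fd)
{
    assert(fd >= 0 && fd < MAX_FDS);
    fd_table[fd].generation++;
}

/* Snapshot taken when the FD is registered with the poller. */
uint32_t fd_snapshot_gen(int fd)
{
    assert(fd >= 0 && fd < MAX_FDS);
    return fd_table[fd].generation;
}

/* Returns 1 if an event tagged with <seen_gen> still concerns the FD
 * currently using this number, 0 if it was closed in between.
 */
int fd_event_is_current(int fd, uint32_t seen_gen)
{
    assert(fd >= 0 && fd < MAX_FDS);
    return fd_table[fd].generation == seen_gen;
}
```

An event reported for a number that was closed and reopened in between
would then show a generation mismatch.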
2025-01-30 19:45:34 +01:00
Willy Tarreau
44ac7a7e73 DEBUG: fd: add a counter of takeovers of an FD since it was last opened
That's essentially in order to help with debugging strange cases like
the occasional epoll issues/races, by keeping a counter of how many
times an FD was taken over since last inserted. The room is available
so let's use it. If it's needed later, this patch can easily be reverted.
The counter is also reported in "show fd" as "tkov".
2025-01-30 19:45:34 +01:00
Amaury Denoyelle
b849ee5fa3 BUILD: quic: fix overflow in global tune
A new global option was recently introduced to disable pacing. However,
the value used (1<<31) caused issues with some compilers as the options
field used for storage is declared as int. Move the pacing deactivation
flag out into the newly defined quic_tune to fix this.

This should be backported up to 3.1 after a period of observation. Note
that it relied on the previous patch which defined new quic_tune type.
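
The underlying problem can be sketched as follows, assuming a 32-bit int:
bit 31 cannot be represented as a positive value in a signed int, so a flag
using it needs unsigned storage. The macro names, the structure and the
helper are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical flag names, for illustration only. */
#define OPT_LAST_SAFE   (1 << 30)   /* highest bit usable in a signed int */
#define OPT_NEEDS_UINT  (1U << 31)  /* only representable as unsigned */

/* Hypothetical dedicated structure with unsigned storage. */
struct tune_sketch {
    unsigned int options;
};

/* Returns 1 if <flag> can be stored in a signed int without overflow. */
int flag_fits_in_int(unsigned int flag)
{
    return flag <= (unsigned int)INT_MAX;
}
```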
2025-01-30 18:12:53 +01:00
Amaury Denoyelle
09e9c7d5b7 MINOR: quic: define quic_tune
Define a new structure quic_tune. It will be useful to regroup various
configuration settings and tunable related to QUIC, instead of defining
them into the global structure.
2025-01-30 18:12:40 +01:00
Amaury Denoyelle
2fc63cb186 MINOR: quic: mark BBR as stable
Pacing has recently been moved out of experimental status and is
activated by default. This is a mandatory requirement for BBR.
Furthermore, BBR is now considered stable. As such, remove its
experimental status with this commit.
2025-01-30 17:20:41 +01:00
Amaury Denoyelle
a19d9b0486 MAJOR: quic: mark pacing as stable and enable it by default
Remove pacing experimental status, so it's not required anymore to use
expose-experimental-directives to enable it.

Along this change, pacing is now activated by default. As such, pacing
configuration is transformed into its final form. The global on/off
setting is turned into a disable setting without argument.
2025-01-30 17:20:41 +01:00
Amaury Denoyelle
0c8b54b2d1 MINOR: quic: transform pacing settings into a global option
Pacing support was previously activated on each bind line individually,
via an optional argument of quic-cc-algo keyword. Remove this optional
argument and introduce a global setting to enable/disable pacing. Pacing
activation is still flagged as experimental.

One important change is that previously BBR usage automatically
activated pacing support. This is not the case anymore, so users should
now always explicitly activate pacing if BBR is selected. A new warning
message will be displayed if this is not the case.

Another consequence of this change is that now pacing_inter callback is
always defined for every quic_cc_algo types. As such, QUIC MUX uses
global.tune.options to determine if pacing is required.

This should be backported up to 3.1, after a period of observation.
2025-01-30 17:19:38 +01:00
Amaury Denoyelle
d04e93bc2e MINOR: quic: allow BBR testing without pacing
Pacing is activated per bind line via an optional boolean argument of
quic-cc-algo keyword. Contrary to the default usage, pacing is
automatically activated when BBR is chosen. This is because this
algorithm is expected to run on top of pacing, else its behavior is
undefined.

Previously, pacing argument was thus ignored when BBR was selected.
Change this to support explicit deactivation of pacing with it. This
could be useful to test BBR without pacing when debugging some issues.

This should be backported up to 3.1, after a period of observation.
2025-01-30 17:18:02 +01:00
Amaury Denoyelle
6acf391e89 MINOR: quic: remove references to burst in quic-cc-algo parsing
Pacing activation configuration has been recently revamped. Previously,
pacing related quic-cc-algo argument was used to specify a burst size.
It evolved into a boolean value as burst size is dynamically calculated
now. As such, remove any references to the old burst value in config
parsing code for cleaner code.

This should be backported up to 3.1, after a period of observation.
2025-01-30 17:02:59 +01:00
Willy Tarreau
bd7a688b8b BUG/MEDIUM: chunk: make sure to flush the trash pool before resizing
Late in 3.1 we've added an integrity check to make sure we didn't keep
trash objects allocated before resizing the trash with commit 0bfd36e7b8
("MINOR: chunk: add a BUG_ON upon the next init_trash_buffer()"), but
it turns out that the counter that is being checked includes the number
of objects left in local thread caches. As such it can trigger despite
no object being allocated. This precisely happens when setting
tune.memory.hot-size to a few megabytes because some temporarily used
trash objects will remain in cache.

In order to address this, let's first flush the pool before running
the check. That was previously done by pool_destroy() but the check
had to be inserted before it. So now we first flush the trash pool,
then verify it's no longer used, and finally we can destroy it.

This needs to be backported to 3.1. Thanks to Christian Ruppert for
reporting this bug.
2025-01-29 17:55:18 +01:00
William Lallemand
b43e5d8c16 BUILD: ssl: more cleaner approach to WolfSSL without renegotiation
Patch discussed in https://github.com/wolfSSL/wolfssl/issues/6834

When building Wolfssl without renegotiation options, WolfSSL still
defines the macros about it, which warns during the build.

This patch completes the previous one by undefining the macros so
haproxy could build without any warning.
2025-01-28 20:55:20 +01:00
William Lallemand
c6a8279cdf BUILD: ssl: allow to build without the renegotiation API of WolfSSL
In ticket https://github.com/wolfSSL/wolfssl/issues/6834, it was
suggested to push --enable-haproxy within --enable-distro.

WolfSSL does not want to include the renegotiation support in
--enable-distro.

To achieve this, let haproxy build without SSL_renegotiate_pending()
when wolfssl does not define HAVE_SECURE_RENEGOTIATION or
HAVE_SERVER_RENEGOTIATION_INFO.
2025-01-28 18:31:32 +01:00
Olivier Houchard
9253146b90 BUILD: queues: Use unsigned int when needed
Use unsigned int instead of int when calculating which thread group we
should dequeue from next, as the difference in signedness makes clang
unhappy.
2025-01-28 17:44:54 +01:00
Olivier Houchard
b74ec1efc2 MINOR: queues: use __ha_cpu_relax() on failed CAS.
Make sure we call __ha_cpu_relax() if we fail a CAS, to help with
contention.
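
A minimal sketch of the pattern, using C11 atomics. The relax helper is a
hypothetical stand-in for __ha_cpu_relax() and assumes a GCC or clang
compatible compiler; bounded_add() is an invented example, not code from
the queue implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for __ha_cpu_relax(): gives the CPU a hint that
 * we are spinning, where such an instruction is known.
 */
static inline void cpu_relax_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("pause" ::: "memory");
#elif defined(__aarch64__)
    __asm__ volatile("isb" ::: "memory");
#else
    atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Atomically adds <delta> to <counter> unless the result would exceed
 * <limit>. Returns 1 on success, 0 if the limit would be exceeded.
 */
int bounded_add(_Atomic unsigned int *counter, unsigned int delta,
                unsigned int limit)
{
    unsigned int cur;

    assert(counter != NULL);
    cur = atomic_load(counter);
    for (;;) {
        if (cur > limit || delta > limit - cur)
            return 0;
        if (atomic_compare_exchange_weak(counter, &cur, cur + delta))
            return 1;
        /* CAS failed and <cur> was reloaded: relax before retrying so
         * that competing threads get a chance to make progress.
         */
        cpu_relax_sketch();
    }
}
```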
2025-01-28 16:00:19 +01:00
Willy Tarreau
f17b0a994b BUILD: tools: fix build on BSD by dropping the ETIME check
Commit 44537379fc ("MINOR: tools: add errname to print errno macro
name") brought a facility to report errno using a symbolic string
when known instead of showing only the value. However, among the
listed options, ETIME is mentioned but is unknown from FreeBSD where
it breaks the build. Let's simply drop it, we don't use ETIME anyway
and even if it would be reported, the default code path still reports
the numeric value so there's no harm. If other ones fail to build in
the future, they could be handled the same way.
2025-01-28 15:58:57 +01:00
Christopher Faulet
36d151dc10 MEDIUM: stream: No longer use TASK_F_UEVT* to shut a stream down
Thanks to the previous patch, it is now possible to explicitly rely on
stream's events to shut it down. The right event is set in
stream_shutdown(), before waking up the stream, via an atomic operation. In
process_stream(), this event will be handled as expected.

Thus, TASK_F_UEVT* are no longer used, but not removed since still usable
for other tasks.

This patch depends on "MEDIUM: stream: Map task wake up reasons to dedicated
stream events".
2025-01-28 14:53:37 +01:00
Christopher Faulet
6048460102 MEDIUM: stream: Map task wake up reasons to dedicated stream events
To fix thread-safety issues when a stream must be shut, three new task
states were added. These states are generic (UEVT1, UEVT2 and UEVT3), the
task callback function is responsible to know what to do with them. However,
it is not really scalable.

The best is to use an atomic field in the stream structure itself to deal
with these dedicated events. There is already the "pending_events" field
that saves wake up reasons (TASK_WOKEN_*) to not lose them if
process_stream() is interrupted before it had a chance to handle them.

So the idea is to introduce a new field to handle stream-dedicated events
and merge them with the task's wake up reasons used by the stream. This
means a mapping must be performed between some task wake up reasons and
streams events. Note that not all task wake up reasons will be mapped.

In this patch, the "new_events" field is introduced. It is an atomic
bit-field. Streams events (STRM_EVT_*) are also introduced to map the task
wake up reasons used by process_stream(). Only TASK_WOKEN_TIMER and
TASK_WOKEN_MSG are mapped, in addition to TASK_F_UEVT* flags. In
process_stream(), "pending_events" field is now filled with new stream
events and the mapping of the wake up reasons.
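
The general idea of an atomic event bit-field merged into pending events
can be sketched as below. This is only an illustration: the structure, the
event values and the function names are hypothetical, and the real
STRM_EVT_* definitions are not shown in this message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical event bits, for illustration only. */
#define EVT_TIMER     0x1u
#define EVT_MSG       0x2u
#define EVT_SHUT_DOWN 0x4u

struct stream_sketch {
    atomic_uint new_events;      /* may be set by any thread */
    unsigned int pending_events; /* owned by the stream's thread */
};

/* May be called from any thread, before waking the stream up. */
void stream_post_event(struct stream_sketch *s, unsigned int evt)
{
    assert(s != NULL);
    atomic_fetch_or(&s->new_events, evt);
}

/* Called by the stream handler: atomically takes the new events, merges
 * them into the pending ones and returns the result.
 */
unsigned int stream_collect_events(struct stream_sketch *s)
{
    assert(s != NULL);
    s->pending_events |= atomic_exchange(&s->new_events, 0);
    return s->pending_events;
}
```

Because the producer only uses an atomic OR and the consumer an atomic
exchange, no event posted concurrently is lost.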
2025-01-28 14:53:37 +01:00
Christopher Faulet
0a52a75ef7 BUG/MINOR: stream: Properly handle "on-marked-up shutdown-backup-sessions"
shutdown-backup-sessions action for on-marked-up directive does not work anymore
since the stream_shutdown() function was modified to be async-safe.

When stream_shutdown() was modified to be async-safe, dedicated task events were
added to map the reasons to shut a stream down. SF_ERR_DOWN was mapped to
TASK_F_UEVT1 and SF_ERR_KILLED was mapped to TASK_F_UEVT2. The reverse mapping was
performed by process_stream() to shut the stream with the appropriate reason.

However, SF_ERR_UP reason, used by shutdown-backup-sessions action to shut a
stream down because a preferred server became available, was not mapped in the
same way. So since commit b8e3b0a18d ("BUG/MEDIUM: stream: make
stream_shutdown() async-safe"), this action is ignored and does not work
anymore.

To fix the issue, and to be able to backport the fix, a third task event was
added. TASK_F_UEVT3 is now mapped on SF_ERR_UP.

This patch should fix the issue #2848. It must be backported as far as 2.6.
2025-01-28 14:53:37 +01:00
Olivier Houchard
26b3e5236f MEDIUM: servers/proxies: Switch to using per-tgroup queues.
For both servers and proxies, use one connection queue per thread-group,
instead of only one. Having only one can lead to severe performance
issues on NUMA machines, it is actually trivial to get the watchdog to
trigger on an AMD machine, having a server with a maxconn of 96, and an
injector that uses 160 concurrent connections.
We now have one queue per thread-group, however when dequeuing, we're
dequeuing MAX_SELF_USE_QUEUE (currently 9) pendconns from our own queue,
before dequeuing one from another thread group, if available, to make
sure everybody is still running.
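
The dequeuing policy described above can be sketched as follows. Only
MAX_SELF_USE_QUEUE and its value come from this message; the structure,
the number of groups and the function are hypothetical simplifications
that ignore locking entirely.

```c
#include <assert.h>

#define MAX_SELF_USE_QUEUE 9
#define NB_TGROUPS 4 /* hypothetical number of thread groups */

struct tg_queues {
    unsigned int queued[NB_TGROUPS]; /* pending entries per thread group */
    unsigned int self_served;        /* consecutive picks from own queue */
};

/* Returns the index of the thread group to dequeue from, or -1 if all
 * queues are empty. The caller is expected to decrement queued[] itself.
 */
int pick_queue(struct tg_queues *q, unsigned int self)
{
    unsigned int i, tg;

    assert(self < NB_TGROUPS);

    if (q->queued[self] && q->self_served < MAX_SELF_USE_QUEUE) {
        q->self_served++;
        return (int)self;
    }

    /* own quota exhausted or own queue empty: serve another group once */
    for (i = 1; i < NB_TGROUPS; i++) {
        tg = (self + i) % NB_TGROUPS;
        if (q->queued[tg]) {
            q->self_served = 0;
            return (int)tg;
        }
    }

    /* nobody else has work: keep serving our own queue if possible */
    if (q->queued[self])
        return (int)self;
    return -1;
}
```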
2025-01-28 12:49:41 +01:00
Olivier Houchard
583303c48b MINOR: proxies/servers: Calculate queueslength and use it.
For both proxies and servers, properly calculate queueslength, which is
the total number of elements across all queues (as they currently are
only using one queue, it is equivalent to the number of elements of that
queue), and use it instead of the queue's length.
2025-01-28 12:49:41 +01:00
Olivier Houchard
59eddabe16 MINOR: Add fields to the per-thread group field in struct server.
Add a per-thread group queue and associated fields in per-thread group
field in struct server, as well as a new field, queues length.
This is currently unused, so should change nothing.
2025-01-28 12:49:41 +01:00
Olivier Houchard
f879b9a18a MINOR: proxies: Add a per-thread group field to struct proxy.
Add a per-thread group field to struct proxy, that will contain a struct
queue, as well as a new field, "queueslength".
This is currently unused, so should change nothing.
Please note that proxy_init_per_thr() must now be called for each proxy
once the thread groups number is known.
2025-01-28 12:49:41 +01:00
Willy Tarreau
7fa70da06d MINOR: epoll: permit to mask certain specific events
A few times in the past we've seen cases where epoll was caught reporting
a wrong event that caused trouble (e.g. spuriously reporting HUP or RDHUP
after a successful connect()). The new tune.epoll.mask-events directive
permits to mask events such as ERR, HUP and RDHUP and convert them to IN
events that are processed by the regular receive path. This should help
better diagnose and troubleshoot issues such as this one, as well as rule
out such a cause when similar issues are reported:

   https://github.com/haproxy/haproxy/issues/2368
   https://www.spinics.net/lists/netdev/msg876470.html

It should be harmless to backport this if necessary.
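
The conversion described above can be sketched as follows on Linux. This
is not the actual poller code: the function name and the way the mask is
passed are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Converts the ERR/HUP/RDHUP events selected by <masked> into a plain
 * EPOLLIN so that they are handled by the regular receive path. Other
 * events are left untouched.
 */
uint32_t mask_epoll_events(uint32_t events, uint32_t masked)
{
    uint32_t hit = events & masked & (EPOLLERR | EPOLLHUP | EPOLLRDHUP);

    if (hit) {
        events &= ~hit;
        events |= EPOLLIN;
    }
    return events;
}
```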
2025-01-27 15:47:46 +01:00
Aurelien DARRAGON
e768a531b7 CLEANUP: tree-wide: define and use acl_match_cond() helper
acl_match_cond() combines acl_exec_cond() + acl_pass() and a check on the
condition->pol (to check if the cond is inverted) in order to return
either 0 if the cond doesn't match or 1 if it matches (or NULL).

Thanks to this we can actually simplify some redundant constructs that
iterate over rules and evaluate if the condition matches or not.

Conditions for tcp-request inspect-content and tcp-response
inspect-content couldn't be simplified because they perform an extra
check for missing data, and thus still need to leverage acl_exec_cond()

It's best to display the patch using "-w", like "git show xxxx -w",
because some blocks had to be re-indented after the cleanup, which
makes the patch hard to review by default.
2025-01-27 11:11:43 +01:00
Valentine Krasnobaeva
94d3b7375a CLEANUP: ssl: move ssl_sock_gencert_load_ca declaration in ssl_gencert.h
As ssl_sock_gencert_load_ca and ssl_sock_gencert_free_ca are compiled only if
SSL_NO_GENERATE_CERTIFICATES is not defined, let's align it and move these
declarations in ssl_gencert.h.
2025-01-24 12:31:07 +01:00
Valentine Krasnobaeva
846819b316 CLEANUP: ssl: rename ssl_sock_load_ca to ssl_sock_gencert_load_ca
ssl_sock_load_ca is defined in ssl_gencert.c and compiled only if
SSL_NO_GENERATE_CERTIFICATES is not defined. Its name is a bit confusing, as
we may think at the first glance, that it's a generic function, which is also
used to load CA file, provided via 'ca-file' keyword.
ssl_set_verify_locations_file is used in this case.

So let's rename ssl_sock_load_ca into ssl_sock_gencert_load_ca. Same is
applied to ssl_sock_free_ca.
2025-01-24 12:31:07 +01:00
Valentine Krasnobaeva
c987f30245 BUG/MINOR: ssl: put ssl_sock_load_ca under SSL_NO_GENERATE_CERTIFICATES
ssl_sock_load_ca and ssl_sock_free_ca definitions are compiled only if
SSL_NO_GENERATE_CERTIFICATES is not set. When this define is set and
haproxy is built, the linker throws an error. So, let's fix this.

This should be backported in all stable versions.
2025-01-24 12:31:07 +01:00
Willy Tarreau
670182bc9e [RELEASE] Released version 3.2-dev4
Released version 3.2-dev4 with the following main changes :
    - BUG/MINOR: stktable: fix big-endian compatiblity in smp_to_stkey()
    - MINOR: stktable: add stkey_to_smp() helper
    - MINOR: stktable: add stksess_getkey() helper
    - MINOR: stktable: add sc[0-2]_key fetches
    - BUG/MEDIUM: queues: Adjust the proxy counters when appropriate
    - MINOR: trace: add help message for -dt argument
    - MINOR: trace: ensure -dt priority over traces config section
    - MINOR: trace: support all source alias on -dt
    - BUG/MINOR: quic: reject NEW_TOKEN frames from clients
    - MINOR: stktable: fix potential build issue in smp_to_stkey
    - BUG/MEDIUM: stktable: fix missing lock on some table converters
    - BUG/MEDIUM: promex: Use right context pointers to dump backends extra-counters
    - MINOR: stktable: fix potential build issue in smp_to_stkey (2nd try)
    - MINOR: stktable: add smp_fetch_stksess() helper function
    - MEDIUM: stktable: split src-based key smp_fetch_sc functions
    - MEDIUM: stktable: split sc_ and src_ fetch lookup logics
    - MEDIUM: stktable: leverage smp_fetch_* helpers from sample conv
    - DOC: config: unify sample conv|fetches optional arguments syntax
    - DOC: config: stick-table converters support implicit <table> argument
    - DOC: config: stick-table converter do accept ANY-typed input
    - DOC: config: clarify return type for some stick-table converters
    - DOC: config: refer to canonical sticktable converters for src_* fetches
    - CLEANUP: stktable: move sample_conv_table_bytes_out_rate()
    - MINOR: stktable: add table_{inc,clr}_gpc* converters
    - BUG/MAJOR: quic: reject too large CRYPTO frames
    - BUG/MAJOR: log/sink: possible sink collision in sink_new_from_srv()
    - BUG/MINOR: init: set HAPROXY_STARTUP_VERSION from the variable, not the macro
    - REORG: version: move the remaining BUILD_* stuff from haproxy.c to version.c
    - BUG/MINOR: quic: ensure a detached coalesced packet can't access its neighbours
    - MINOR: quic: Add a BUG_ON() on quic_tx_packet refcount
    - BUILD: quic: Move an ASSUME_NONNULL() for variable which is not null
    - BUG/MEDIUM: mux-h1: Properly close H1C if an error is reported before sending data
    - CLEANUP: quic: remove unused prototype
    - MINOR: quic: rename pacing_rate cb to pacing_inter
    - BUG/MINOR: quic: do not increase congestion window if app limited
    - MINOR: mux-quic: increment pacing retry counter on expired
    - MEDIUM: quic: implement credit based pacing
    - MEDIUM: mux-quic: reduce pacing CPU usage with passive wait
    - MEDIUM: quic: use dynamic credit for pacing
    - MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path
    - MINOR: quic: adapt credit based pacing to BBR
    - MINOR: tools: add errname to print errno macro name
    - MINOR: debug: debug_parse_cli_show_dev: use errname
    - MINOR: debug: show boot and runtime process settings in table
2025-01-24 11:01:06 +01:00
Valentine Krasnobaeva
8620ae7962 MINOR: debug: show boot and runtime process settings in table
Let's reformat output of "show dev" in order to show some boot and runtime
process settings in a table. This makes the output less crowded.
2025-01-24 09:54:57 +01:00
Valentine Krasnobaeva
df7f16d960 MINOR: debug: debug_parse_cli_show_dev: use errname
Let's use errname, introduced in the previous commit in the output of
"show dev". This output is intended for engineers. So, there is no need to
provide the long descriptions of errnos given by strerror.
2025-01-24 09:54:57 +01:00
Valentine Krasnobaeva
44537379fc MINOR: tools: add errname to print errno macro name
Add helper to print the name of errno's corresponding macro, for example
"EINVAL" for errno=22. This may be helpful for debugging and for using in
some CLI commands output. The switch-case in errname() contains only the
errnos currently used in the code. So, it needs to be extended, if one starts
to use new syscalls.
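
A minimal sketch of such a helper is shown below. The function name, its
signature and the handful of errnos listed are assumptions chosen for the
illustration; unknown values fall back to the numeric form.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the macro name matching <err_num> when known, otherwise the
 * numeric value printed into <buf> (of size <len>).
 */
const char *errname_sketch(int err_num, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    switch (err_num) {
    case EINVAL: return "EINVAL";
    case EAGAIN: return "EAGAIN";
    case ENOMEM: return "ENOMEM";
    case EPERM:  return "EPERM";
    default:
        /* not listed: report the number rather than nothing */
        snprintf(buf, len, "%d", err_num);
        return buf;
    }
}
```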
2025-01-24 09:54:57 +01:00
Amaury Denoyelle
42bac9339c MINOR: quic: adapt credit based pacing to BBR
Credit based pacing has been further refined to be able to calculate
dynamically burst size based on congestion parameter. However, BBR
algorithm already provides pacing rate and burst size (labelled as
send_quantum) for 1ms of emission.

Adapt quic_pacing_reload() to use BBR values to compute pacing credit.
This is done via pacing_burst callback which is now only defined for
BBR. For other algorithms, determine the burst size over 1ms with the
congestion window size and RTT.

This should be backported up to 3.1.
2025-01-23 17:41:07 +01:00
Amaury Denoyelle
7896edccdc MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path
Pacing burst size is now dynamic. As such, configuration value has been
removed and related fields in bind_conf and quic_cc_path structures can
be safely removed.

This should be backported up to 3.1.
2025-01-23 17:40:48 +01:00
Amaury Denoyelle
cb91ccd8a8 MEDIUM: quic: use dynamic credit for pacing
Major improvements have been introduced in pacing recently. Most
notably, QMUX schedules emission on a millisecond resolution, which
allows the use of passive wait and is much more CPU friendly.

However, an issue remains with the pacing max credit. Unless BBR is
used, it is fixed to the configured value from quic-cc-algo bind
statement. This is not practical as if too low, it may drastically
reduce performance due to 1ms sleep resolution. If too high, some
clients will suffer from too much packet loss.

This commit fixes the issue by implementing a dynamic maximum credit
value based on the network conditions specific to each client.
Calculation is done to fix a maximum value which should allow QMUX
current tasklet context to emit enough data to cover the delay with the
next tasklet invocation. As such, avg_loop_us is used to detect the
process load. If too small, 1.5ms is used as minimal value, to cover the
extra delay incurred by the system which will happen for a default 1ms
sleep.

This should be backported up to 3.1.
2025-01-23 17:40:48 +01:00
Amaury Denoyelle
8098be1fdc MEDIUM: mux-quic: reduce pacing CPU usage with passive wait
Pacing algorithm has been revamped in the previous commit to implement a
credit based solution. This is a far more adaptive solution, which in
particular allows to catch up in case the pause between pacing emissions
was longer than expected.

This allows QMUX to remove the active loop based on tasklet wake-up.
Instead, a new task is used when emission should be paced. The main
advantage is that CPU usage is drastically reduced.

New pacing task timer is reset each time qcc_io_send() is invoked. Timer
will be set only if pacing engine reports that emission must be
interrupted. In this case timer is set via qcc_wakeup_pacing() to the
delay reported by congestion algorithm, or 1ms if delay is too short. At
the end of qcc_io_cb(), pacing task is queued if timer has been set.

Pacing task execution is simple enough : it immediately wakes up QCC I/O
handler.

Note that to have decent performance, it requires to have a large enough
burst defined in configuration of quic-cc-algo. However, this value is
common to every client of a listener, which may cause too much loss under
some network conditions. This will be addressed in a future patch.

This should be backported up to 3.1.
2025-01-23 17:40:22 +01:00
Amaury Denoyelle
4489a61585 MEDIUM: quic: implement credit based pacing
Implement a new method for QUIC pacing emission based on credit. This
represents the number of packets which can be emitted in a single burst.
After emission, decrement from the credit the number of emitted packets.
Several emissions can be conducted in the same sequence until the credit
is completely decremented.

When a new emission sequence is initiated (i.e. under a new QMUX tasklet
invocation), credit is refilled according to the delay which occurred
between the last and current emission context.

The main advantage of this new mechanism is that it allows to conduct
several emissions in the same task context without having to wait between
each invocation. Wait is only forced if pacing is expired, which is now
equivalent to having a null credit.

Furthermore, if the delay between two emission sequences was smaller
than expected, credit is only partially refilled. This allows to
restart emission without having to wait for the whole credit to be
available.

On the implementation side, a new field <credit> is available in
quic_pacer structure. It is automatically decremented on
quic_pacing_sent_done() invocation. Also, a new function
quic_pacing_reload() must be used by QUIC MUX when a new emission
sequence is initiated to refill credit. <next> field from quic_pacer has
been removed.

For the moment, credit is based on the burst configured via quic-cc-algo
keyword, or directly reported by BBR.

This should be backported up to 3.1.
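
The credit mechanism can be sketched as follows. This is a simplified
model and not the quic_pacer implementation: the structure, its fields
other than <credit>, and the function names are hypothetical, and the
credit is counted in packets with a fixed inter-packet delay.

```c
#include <assert.h>
#include <stddef.h>

struct pacer_sketch {
    unsigned int credit;         /* packets that may still be emitted */
    unsigned int max_credit;     /* maximum burst size */
    unsigned long long last_ns;  /* date of the last refill */
    unsigned long long inter_ns; /* ideal delay between two packets */
};

/* Refills the credit according to the time elapsed since the last
 * refill. A short delay only yields a partial refill.
 */
void pacer_reload(struct pacer_sketch *p, unsigned long long now_ns)
{
    unsigned long long earned;

    assert(p != NULL && p->inter_ns > 0 && p->credit <= p->max_credit);

    earned = (now_ns - p->last_ns) / p->inter_ns;
    if (earned >= p->max_credit - p->credit) {
        p->credit = p->max_credit;
        p->last_ns = now_ns;
    } else {
        p->credit += (unsigned int)earned;
        /* keep the unused remainder for the next refill */
        p->last_ns += earned * p->inter_ns;
    }
}

/* Accounts for <sent> emitted packets. */
void pacer_sent_done(struct pacer_sketch *p, unsigned int sent)
{
    assert(p != NULL);
    p->credit = (sent >= p->credit) ? 0 : p->credit - sent;
}

/* Emission must pause when no credit is left. */
int pacer_expired(const struct pacer_sketch *p)
{
    assert(p != NULL);
    return p->credit == 0;
}
```

With a 100us inter-packet delay and a burst of 10, a full 1ms pause
restores the whole credit while a 0.3ms pause only restores 3 packets.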
2025-01-23 17:40:20 +01:00
Amaury Denoyelle
9d8589f0de MINOR: mux-quic: increment pacing retry counter on expired
A field <paced_sent_ctr> from quic_pacer structure is used to report the
number of occurrences where emission has been interrupted due to pacing.
However, it was not incremented when QUIC MUX had to pause emission
immediately as pacing was still not yet expired.

Fix this by incrementing <paced_sent_ctr> in qcc_io_send() prior to
emission if pacing is expired. Note that incrementation is only done
once if the tasklet is then repeatedly woken up until the timer is
expired.

This should be backported up to 3.1.
2025-01-23 17:29:14 +01:00
Amaury Denoyelle
bbaa7aef7b BUG/MINOR: quic: do not increase congestion window if app limited
Previously, congestion window was increased each time a new
acknowledgment was received. However, it did not take into account the
window filling level. Under network conditions with negligible loss, this
will cause the window to be incremented until the maximum value (by
default 480k), even though the application does not have enough data to
fill it.

In most cases, this issue is not noticeable. However, it may lead to
excessive memory consumption when a QUIC connection is suddenly
interrupted, as in this case haproxy will fill the window with
retransmissions. It has even caused OOM crashes when thousands of clients
were interrupted at once on a local network benchmark.

Fix this by first checking window level prior to every incrementation
via a new helper function quic_cwnd_may_increase(). It was arbitrarily
decided that the window must be at least 50% full when the ACK is
handled before incrementing it. This value is a good compromise to keep
window in check while still allowing fast increment when needed.
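
The 50% rule can be sketched as follows. Only the threshold comes from
this message: the signatures are hypothetical (they do not match
quic_cwnd_may_increase()), and growing the window by the number of
acknowledged bytes is a simplification chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the window is at least half full, which means that the
 * sender is not limited by the application.
 */
int cwnd_may_increase_sketch(size_t cwnd, size_t in_flight)
{
    return in_flight >= cwnd / 2;
}

/* Returns the new window size after <acked> bytes were acknowledged. */
size_t on_ack_sketch(size_t cwnd, size_t in_flight, size_t acked,
                     size_t max_cwnd)
{
    assert(cwnd <= max_cwnd);

    if (!cwnd_may_increase_sketch(cwnd, in_flight))
        return cwnd; /* app limited: leave the window unchanged */

    if (acked > max_cwnd - cwnd)
        return max_cwnd;
    return cwnd + acked;
}
```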

Note that this patch only concerns the cubic and newreno algorithms. BBR
already has its own notion of application limited which ensures the window
is only incremented when necessary.

This should be backported up to 2.6.
2025-01-23 14:49:35 +01:00
Amaury Denoyelle
7c0820892f MINOR: quic: rename pacing_rate cb to pacing_inter
Rename one of the congestion algorithms' pacing callbacks from pacing_rate
to pacing_inter. This better reflects that this function returns a delay
(in nanoseconds) which should be applied between each packet emission to
fill the congestion window with a perfectly smoothed emission.

This should be backported up to 3.1.
2025-01-23 14:49:35 +01:00
Amaury Denoyelle
2178bf1192 CLEANUP: quic: remove unused prototype
Remove undefined quic_pacing_send() function prototype from quic_pacing
module.

This should be backported up to 3.1.
2025-01-23 14:49:35 +01:00
Christopher Faulet
b18e988e0d BUG/MEDIUM: mux-h1: Properly close H1C if an error is reported before sending data
It is possible to have front H1 connections waiting for the client timeout
while they should be closed because a connection error was reported before
sending an error message to the client. It is not a leak because the
connections are closed when the timeout expires but it is a waste of
resources, especially if the client timeout is high.

When an early error message must be sent to the client, if an error was
already detected, no data are sent and the output buffer is released. At
this stage, the H1 connection is in CLOSING state and it must be
released. But because of a bug, this is not performed. The client timeout is
rearmed and the H1 connection is only closed when it expires.

To fix the issue, the condition to close a H1C must also be evaluated when
an error is detected before sending data.

It is only an issue with idle client connections, because there is no H1
stream in that case and the error message is generated by the mux itself.

This patch must be backported as far as 2.8.
2025-01-23 11:05:48 +01:00
Frederic Lecaille
1f099db7e2 BUILD: quic: Move an ASSUME_NONNULL() for variable which is not null
Some new compilers warn that the <oldest_lost> variable can be null even
though this cannot be the case, as mentioned by the comment on an already
present ASSUME_NONNULL() call. The warning reads as follows:

src/quic_loss.c: In function ‘qc_release_lost_pkts’:
src/quic_loss.c:307:86: error: potential null pointer dereference [-Werror=null-dereference]
  307 |   unsigned int period = newest_lost->time_sent_ms - oldest_lost->time_sent_ms;
      |                                                     ~~~~~~~~~~~^~~~~~~~~~~~~~

Move up this ASSUME_NONNULL() statement to please these compilers.

Must be backported as far as 2.6 to ease any further backport around this code part.
2025-01-21 22:01:34 +01:00
Frederic Lecaille
4f38c4bfd8 MINOR: quic: Add a BUG_ON() on quic_tx_packet refcount
It is definitely a bug to call quic_tx_packet_refdec() to decrement the reference
counter of a TX packet, and possibly to release its memory, when this counter is
already negative or null.

This counter is incremented when a TX frame is attached to it with some allocated memory
and when the packet is inserted into a data structure, if needed (list or tree).

Should be easily backported as far as 2.6 to ease any further backport around
this code part.
2025-01-21 22:01:34 +01:00
Frederic Lecaille
cb729fb64d BUG/MINOR: quic: ensure a detached coalesced packet can't access its neighbours
Reset ->prev and ->next fields of a coalesced TX packet to ensure it cannot access
its neighbours several times after it is supposed to have been detached from them
by calling quic_tx_packet_dgram_detach().

There are two cases where a packet can be coalesced to another previously built one:
when it is built into the same datagram without GSO (and flagged with
QUIC_FL_TX_PACKET_COALESCED) or when sent from the same sendto() syscall with GSO
(not flagged with QUIC_FL_TX_PACKET_COALESCED).

This fix may be in relation with GH #2839.

Must be backported as far as 2.6.
2025-01-21 22:01:34 +01:00
Willy Tarreau
b066c0affb REORG: version: move the remaining BUILD_* stuff from haproxy.c to version.c
version.c tries to centralize all variables conveying version information,
but there's still an issue with the BUILD_* variables which are only
passed to haproxy.o and are only updated when that one is rebuilt. This
is not very logical given that we can end up with values there which
contradict info from version.c.

Better move all of these to version.c which is systematically rebuilt.
Most of these variables only end up as string concatenation at the
moment. Some of them are even duplicated. In version.c we now have one
variable (or constant) for each of them and haproxy.c references them
in messages. This is much more logical and easier to maintain in a
consistent state.

The patch looks a bit large but it really only moves the ifdefed string
assignment from one file to another, placing them into variables.
2025-01-20 17:53:55 +01:00
Willy Tarreau
9e61cf6790 BUG/MINOR: init: set HAPROXY_STARTUP_VERSION from the variable, not the macro
This environment variable was added by commit d4c0be6b20 ("MINOR: startup:
HAPROXY_STARTUP_VERSION contains the version used to start"). However, it's
set from the macro that is passed during the build process instead of being
set from the variable that's kept up to date in version.c. The difference
is visible only during debugging/bisecting because only changed files and
version.o are rebuilt, but not necessarily haproxy.o, which is where the
environment variable is set. This means that the version exposed in the
environment is not necessarily the same as the one presented in
"haproxy -v" during such debugging sessions.

This should be backported to 2.8. It has no impact at all on regularly
built binaries.
2025-01-20 17:53:55 +01:00
Aurelien DARRAGON
bfa493d4be BUG/MAJOR: log/sink: possible sink collision in sink_new_from_srv()
sink_new_from_srv() leverages sink_new_buf() with the server id as name,
sink_new_buf() then calls __sink_new() with the provided name.

Unfortunately sink_new() is designed in such a way that it will first look
up in the list of existing sinks to check if a sink already exists with
given name, in which case the existing sink is returned. While this
behavior may be error-prone, it is actually up to the caller to ensure
that the provided name is unique if it really expects a unique sink
pointer.

Due to this bug in sink_new_from_srv(), multiple tcp servers with the same
name defined in distinct log backends would end up sharing the same sink,
which means messages sent to one of the servers would also be forwarded to
all servers with the same name across all log backend sections defined in
the config, which is obviously an issue and could even raise security
concerns.

Example:

  defaults
    log backend@log-1 local0

  backend log-1
    mode log
    server s1 127.0.0.1:514
  backend log-2
    mode log
    server s1 127.0.0.1:5114

With the above config, logs sent to log-1/s1 would also end up being sent
to log-2/s1 due to server id "s1" being used for tcp servers in distinct
log backends.

To fix the issue, we now prefix the sink name with the backend name:
back_name/srv_id combination is known to be unique (backend name
serves as a namespace).

This bug was reported by GH user @landon-lengyel under #2846.

UDP servers (with udp@ prefix before the address) are not affected as they
don't make use of the sink facility.

As a workaround, one should manually ensure that all tcp servers across
different log backends (backend with "mode log" enabled) use unique names.

This bug was introduced in e58a9b4 ("MINOR: sink: add sink_new_from_srv()
function") thus it exists since the introduction of log backends in 2.9,
which means this patch should be backported up to 2.9.
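
The collision and the namespacing fix can be illustrated with a minimal sketch. It assumes a name-keyed registry that returns the existing object when the name is already known, as described above. All identifiers here (demo_sink, demo_sink_get(), demo_sink_name()) are hypothetical and are not the actual haproxy functions.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_SINKS 8
#define DEMO_NAME_LEN  64

/* Hypothetical stand-in for a sink: only the name matters here. */
struct demo_sink {
    char name[DEMO_NAME_LEN];
};

static struct demo_sink demo_sinks[DEMO_MAX_SINKS];
static int demo_nb_sinks;

/* Returns the existing sink called <name>, or registers a new one.
 * Returns NULL when the registry is full or the name is too long.
 */
struct demo_sink *demo_sink_get(const char *name)
{
    int i;

    for (i = 0; i < demo_nb_sinks; i++)
        if (strcmp(demo_sinks[i].name, name) == 0)
            return &demo_sinks[i];

    if (demo_nb_sinks == DEMO_MAX_SINKS || strlen(name) >= DEMO_NAME_LEN)
        return NULL;

    strcpy(demo_sinks[demo_nb_sinks].name, name);
    return &demo_sinks[demo_nb_sinks++];
}

/* Builds "<backend>/<server>" into <dst>.
 * Returns 0 on success, -1 if the result would be truncated.
 */
int demo_sink_name(char *dst, size_t size, const char *be, const char *srv)
{
    int ret = snprintf(dst, size, "%s/%s", be, srv);

    return (ret < 0 || (size_t)ret >= size) ? -1 : 0;
}
```

With the bare server id, two lookups for "s1" return the same object. With the prefixed names "log-1/s1" and "log-2/s1", each backend gets its own entry.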
2025-01-20 12:33:20 +01:00
Amaury Denoyelle
c3a4a4d166 BUG/MAJOR: quic: reject too large CRYPTO frames
Received CRYPTO frames are inserted in a ncbuf to handle out-of-order
reception via ncb_add(). They are stored on the position relative to the
frame offset, minus a base offset which corresponds to the in-order data
length already handled.

Previously, no check was implemented on the frame offset value prior to
ncb_add(), which could easily trigger a crash if relative offset was too
large. Fix this by ensuring first that the frame can be stored in the
buffer before ncb_add() invocation. If this is not the case, connection
is closed with error CRYPTO_BUFFER_EXCEEDED, as required by QUIC
specification.

This should fix github issue #2842.

This must be backported up to 2.6.
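
The kind of check described above can be sketched as follows. This is only an illustration under simplifying assumptions: demo_crypto_frame_fits() is a hypothetical name, the buffer is modelled by its size alone, and frames starting before the base offset are simply refused, which is not necessarily how the real code treats them.

```c
#include <stdint.h>

/* Returns 1 if a frame covering [frame_off, frame_off + len) fits in a
 * buffer of <buf_size> bytes whose first byte maps to stream offset
 * <base_off>, otherwise 0.
 */
int demo_crypto_frame_fits(uint64_t base_off, uint64_t buf_size,
                           uint64_t frame_off, uint64_t len)
{
    uint64_t rel;

    if (frame_off < base_off)
        return 0;                 /* already handled data, not covered here */

    rel = frame_off - base_off;   /* position relative to the buffer start */
    if (rel > buf_size)
        return 0;

    /* written as a subtraction so that rel + len cannot overflow */
    return len <= buf_size - rel;
}
```

A caller would close the connection with CRYPTO_BUFFER_EXCEEDED when this returns 0, and only insert the data otherwise.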
2025-01-20 11:43:23 +01:00
Aurelien DARRAGON
0486b9e491 MINOR: stktable: add table_{inc,clr}_gpc* converters
As discussed in GH #2423, there are some cases where src_{inc,clr}_gpc*
is not sufficient because we need to perform the lookup on a specific
key. Indeed, just like we did in e642916 ("MEDIUM: stktable: leverage
smp_fetch_* helpers from sample conv"), we can easily implement new
table converters based on existing fetches. This is what we do in
this patch.

Also the doc was updated so that src_{inc,clr}_gpc* fetches now point to
their generic equivalent table_{inc,clr}_gpc*. Indeed, src_{inc,clr}_gpc*
are simply aliases.

This should fix GH #2423.
2025-01-16 11:50:33 +01:00
Aurelien DARRAGON
9f68049cc1 CLEANUP: stktable: move sample_conv_table_bytes_out_rate()
sample_conv_table_bytes_out_rate() was defined in the middle of other
stick-table sample convs without any ordering logic. Let's put it
where it belongs, right after sample_conv_table_bytes_in_rate().
2025-01-16 11:50:27 +01:00
Aurelien DARRAGON
62e42184ab DOC: config: refer to canonical sticktable converters for src_* fetches
When available, to prevent doc duplication, let's make src_* fetches
point to equivalent table_* converters, as they are in fact aliases
for src,table_* converters.
2025-01-16 11:50:20 +01:00
Aurelien DARRAGON
163c1124a2 DOC: config: clarify return type for some stick-table converters
Some stick-table converters such as "table_gpt" erroneously suggest that
the returned type is a boolean while in fact it is integer type, as
properly documented for the sample fetch equivalents.
2025-01-16 11:50:14 +01:00
Aurelien DARRAGON
a8407cf3f7 DOC: config: stick-table converter do accept ANY-typed input
Since 2d17db58 ("MINOR: stick-table: change all stick-table converters'
inputs to SMP_T_ANY"), all stick-table converters accept ANY input
type as parameter, which means that the key is no longer restricted to
a string representation of the input. However the doc wasn't updated when
the change was made. Moreover, some converters document the updated behavior
while others don't, which is kind of confusing. Let's fix that.
2025-01-16 11:50:08 +01:00
Aurelien DARRAGON
0d318b4383 DOC: config: stick-table converters support implicit <table> argument
As with stick-table sample fetches, the <table> argument is not strictly
needed and defaults to the current proxy's stick-table when not provided

Let's update the doc and prototype to reflect the current behavior.
2025-01-16 11:50:02 +01:00
Aurelien DARRAGON
dfdee47a8e DOC: config: unify sample conv|fetches optional arguments syntax
The most common way (and proper way it seems) to declare optional
arguments in sample fetch or converters' prototype is to declare
them between square brackets, including the leading comma (because the
comma should be omitted if the argument is not provided). Also, when
multiple optional arguments are found, we should apply the same logic
but recursively.

In this patch we fix prototypes that include optional arguments and don't
follow this syntax. This improves readability and sets the norm for
upcoming sample fetches/converters.
2025-01-16 11:49:55 +01:00
Aurelien DARRAGON
e6429166b9 MEDIUM: stktable: leverage smp_fetch_* helpers from sample conv
In this patch we try to prevent code duplication: some fetches and sample
converters do the exact same thing, except that the converter takes the
argument as input data. Until now, both the converter and the fetch
had their own implementation (copy pasted), with the fetch specific or
converter specific lookup part.

Thanks to previous commits, we now have generic sample fetch helpers
that take the stkctr as argument, so let's leverage them directly
from the converter functions when available. This allows to remove
a lot of code duplication and should make code maintenance easier in the
future.
2025-01-15 14:04:55 +01:00
Aurelien DARRAGON
6c9b315187 MEDIUM: stktable: split sc_ and src_ fetch lookup logics
While this patch actually adds more insertions than deletions, it actually
tries to simplify the lookup logic for sc_ and src_ sticktable fetches.

Indeed, smp_create_src_stkctr() and smp_fetch_sc_stkctr() combination
was used everywhere the fetch supports sc_ and src_ form, and
smp_fetch_sc_stkctr() even integrated some of the src-oriented fetch logic.

Not only was this confusing, but it made the task of adding new generic
fetches even more complex.

Thus in this patch we completely dedicate smp_fetch_sc_stkctr() to sc_
oriented fetches, while smp_create_src_stkctr() is now renamed to
smp_fetch_src_stkctr() and can now work on its own for src_ oriented
fetches. It takes an additional parameter, "create", to tell the function
if the entry should be created if it doesn't exist yet.

Now it's up to the calling function to know if it should be using the
sc_ oriented fetch or the src_ oriented one based on the input keyword.
2025-01-15 14:04:50 +01:00
Aurelien DARRAGON
22229a41a2 MEDIUM: stktable: split src-based key smp_fetch_sc functions
In this patch we split several sample fetch functions that are leveraged
by the "src-" fetches such as smp_fetch_sc_inc_gpc().

Indeed, for all of them, we add an intermediate helper function that takes
a stkctr pointer as parameter and performs the logic, leaving the lookup
part in the calling function. Before this patch existing functions were
doing the lookup + the fetch logic. Thanks to this patch it will become
easier to add generic converters taking lookup key as input.

List of targeted functions:
 - smp_fetch_sc_inc_gpc()
 - smp_fetch_sc_inc_gpc0()
 - smp_fetch_sc_inc_gpc1()
 - smp_fetch_sc_clr_gpc()
 - smp_fetch_sc_clr_gpc0()
 - smp_fetch_sc_clr_gpc1()
 - smp_fetch_sc_conn_cnt()
 - smp_fetch_sc_conn_rate()
 - smp_fetch_sc_updt_conn_cnt()
 - smp_fetch_sc_conn_curr()
 - smp_fetch_sc_glitch_cnt()
 - smp_fetch_sc_glitch_rate()
 - smp_fetch_sc_sess_cnt()
 - smp_fetch_sc_sess_rate()
 - smp_fetch_sc_http_req_cnt()
 - smp_fetch_sc_http_req_rate()
 - smp_fetch_sc_http_err_cnt()
 - smp_fetch_sc_http_err_rate()
 - smp_fetch_sc_http_fail_cnt()
 - smp_fetch_sc_http_fail_rate()
 - smp_fetch_sc_kbytes_in()
 - smp_fetch_sc_bytes_in_rate()
 - smp_fetch_sc_kbytes_out()
 - smp_fetch_sc_gpc1_rate()
 - smp_fetch_sc_gpc0_rate()
 - smp_fetch_sc_gpc_rate()
 - smp_fetch_sc_get_gpc1()
 - smp_fetch_sc_get_gpc0()
 - smp_fetch_sc_get_gpc()
 - smp_fetch_sc_get_gpt0()
 - smp_fetch_sc_get_gpt()
 - smp_fetch_sc_bytes_out_rate()

Please note that this patch doesn't render any good using "git show" or
"git diff". For all the functions listed above, a new helper function was
defined right above it, with the same name without "_sc". These new
functions perform the fetch part, while the original ones (with "_sc")
now simply perform the lookup and then leverage the corresponding fetch
helper.
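
The split between the lookup part and the fetch part can be sketched as below. This is a simplified model, not the haproxy code: the table is a plain array, and every name (demo_entry, demo_lookup(), demo_fetch_conn_cnt(), ...) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct demo_entry { const char *key; int conn_cnt; };
struct demo_table { const struct demo_entry *entries; size_t nb; };
struct demo_ctx   { const struct demo_entry *tracked[3]; };

/* Lookup part: returns the entry stored under <key>, or NULL. */
const struct demo_entry *demo_lookup(const struct demo_table *t, const char *key)
{
    size_t i;

    for (i = 0; i < t->nb; i++)
        if (strcmp(t->entries[i].key, key) == 0)
            return &t->entries[i];
    return NULL;
}

/* Fetch part: works on an already resolved entry, NULL counts as 0. */
int demo_fetch_conn_cnt(const struct demo_entry *e)
{
    return e ? e->conn_cnt : 0;
}

/* "sc" style caller: the entry comes from a tracked slot. */
int demo_fetch_sc_conn_cnt(const struct demo_ctx *ctx, unsigned int num)
{
    if (num >= 3)
        return 0;
    return demo_fetch_conn_cnt(ctx->tracked[num]);
}

/* Converter style caller: the entry is looked up from the input key. */
int demo_conv_table_conn_cnt(const struct demo_table *t, const char *input)
{
    return demo_fetch_conn_cnt(demo_lookup(t, input));
}
```

Both callers only differ in how they obtain the entry, so the counting logic lives in a single place.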
2025-01-15 14:04:45 +01:00
Aurelien DARRAGON
f71bad4694 MINOR: stktable: add smp_fetch_stksess() helper function
smp_fetch_stksess(table, smp, create) performs a lookup in <table> by
using <smp> as a key. It returns matching entry on success and NULL on
failure. <create> can be set to 1 to force the entry creation.

We then use this helper everywhere relevant to prevent code duplication
2025-01-15 14:04:40 +01:00
Aurelien DARRAGON
0fb8807820 MINOR: stktable: fix potential build issue in smp_to_stkey (2nd try)
As discussed in GH #2838, the previous fix f399dbf
("MINOR: stktable: fix potential build issue in smp_to_stkey") which
attempted to remove conversion ambiguity and prevent build warning proved
to be insufficient.

This time, we implement Willy's suggestion, which is to use a union to
perform the conversion.

Hopefully this should fix GH #2838. If that's the case (and only in that
case), then this patch may be backported with f399dbf (else the patch
won't apply) anywhere b59d1fd ("BUG/MINOR: stktable: fix big-endian
compatiblity in smp_to_stkey()") was backported.
2025-01-15 14:04:31 +01:00
Christopher Faulet
91578212d7 BUG/MEDIUM: promex: Use right context pointers to dump backends extra-counters
When backends extra counters are dumped, the wrong pointer was used in the
promex context to retrieve the stats module. p[1] must be used instead of
p[2]. Because of this typo, an infinite loop could be experienced if the
output buffer is full during this stage. But in all cases an overflow is
possible leading to a memory corruption.

This patch may be related to issue #2831. It must be backported as far as
3.0.
2025-01-14 15:38:43 +01:00
Aurelien DARRAGON
8919a80da9 BUG/MEDIUM: stktable: fix missing lock on some table converters
In 819fc6f563
("MEDIUM: threads/stick-tables: handle multithreads on stick tables"),
sample fetch and action functions were properly guarded with stksess
read/write locks for read and write operations respectively, but the
sample_conv_table functions leveraged by "table_*" converters were
overlooked.

This bug was not known to cause issues in existing deployments yet (at
least it was not reported), but due to its nature it can theoretically lead
to inconsistent values being reported by "table_*" converters if the value
is being updated by another thread in parallel.

It should be backported to all stable versions.

[ada: for versions < 3.0, glitch_cnt and glitch_rate samples should be
 ignored as they first appeared in 3.0]
2025-01-14 11:36:04 +01:00
Aurelien DARRAGON
f399dbf70c MINOR: stktable: fix potential build issue in smp_to_stkey
smp_to_stkey() uses an ambiguous cast from 64bit integer to 32 bit
unsigned integer. While it is intended, let's make the cast less
ambiguous by explicitly casting the right part of the assignment to the
proper type.

This should fix GH #2838
2025-01-13 09:45:40 +01:00
Amaury Denoyelle
4a5d82a97d BUG/MINOR: quic: reject NEW_TOKEN frames from clients
As specified by RFC 9000, reject NEW_TOKEN frames emitted by clients.
Close the connection with error code PROTOCOL_VIOLATION.

This must be backported up to 2.6.
2025-01-10 14:50:59 +01:00
Amaury Denoyelle
a2c0c459a4 MINOR: trace: support all source alias on -dt
Command line argument -dt can be used to activate traces during startup.
Via its optional argument, it is possible to change settings for a
particular trace source. It is also possible to update every registered
sources by specifying an empty name.

Support the trace source alias "all". This is an alternative to the
empty name to update every sources.
2025-01-10 14:50:59 +01:00
Amaury Denoyelle
a50dd07c16 MINOR: trace: ensure -dt priority over traces config section
Traces can be activated on startup either via -dt command line argument
or via the traces configuration section. This can cause confusion as it
may not be clear how a trace source can be completed or overridden by one
or the other.

Fix the precedence to give the priority to the command line argument.
Now, each trace source configured via -dt is first reset to a default
state before applying new settings. Then, it is impossible to change a
trace source via the configuration file if it was already targeted via
-dt argument.
2025-01-10 14:50:59 +01:00
Amaury Denoyelle
da9a7e0bd9 MINOR: trace: add help message for -dt argument
Traces can be activated on startup via -dt command line argument. To
facilitate its usage, display a usage description and examples when
"help" is specified.
2025-01-10 14:50:59 +01:00
Olivier Houchard
659d5f6579 BUG/MEDIUM: queues: Adjust the proxy counters when appropriate
In process_srv_queue(), if we manage to successfully run an extra task,
don't forget to adjust the proxy's totpend and served counters accordingly.
Having an inaccurate served could lead to various subtle bugs, as it is
used when making load balancing decisions.

This should not be backported, unless cda7275ef5d5e49fb2ea2373ea3b1ba63fc927c3
is backported too.
2025-01-09 17:46:46 +01:00
Aurelien DARRAGON
24042df94e MINOR: stktable: add sc[0-2]_key fetches
As discussed in GH #1750, we were lacking a sample fetch to be able to
retrieve the key from the currently tracked counter entry. To do so,
sc_key fetch can now be used. It returns a sample with the correct type
(table key type) corresponding to the tracked counter entry (from previous
track-sc rules).

If no entry is currently tracked, it returns nothing.

It can be used using the standard form "sc_key(<sc_number>)" or the legacy
form: "sc0_key", "sc1_key", "sc2_key"

Documentation was updated.
2025-01-09 10:57:01 +01:00
Aurelien DARRAGON
7423310d5d MINOR: stktable: add stksess_getkey() helper
stksess_getkey(t, ts) returns a stktable_key struct pointer filled with
data from input <ts> entry in <t> table. Returned pointer uses the
static_table_key variable. Indeed, stktable_key struct is more convenient
to manipulate than having to deal with the key extraction from stksess
struct directly.
2025-01-09 10:56:56 +01:00
Aurelien DARRAGON
df9c2ef2c3 MINOR: stktable: add stkey_to_smp() helper
reverse operation for smp_to_stkey(): fills input <smp> from a
stktable_key struct.

Returns 1 on success and 0 on failure.
2025-01-09 10:56:50 +01:00
Aurelien DARRAGON
b59d1fd911 BUG/MINOR: stktable: fix big-endian compatiblity in smp_to_stkey()
When smp_to_stkey() deals with SINT samples, since stick-tables deal with
32-bit integers while a SINT sample is a 64-bit integer, in-place conversion
was done in smp_to_stkey. For that the 64 bit integer was truncated before
the key would point to it. Unfortunately this only works on little endian
architectures because with big endian ones, the key would point to the
wrong 32bit range.

To fix the issue and make the conversion endian-proof, let's re-assign
the sample as 32bit integer before the key points to it.

Thanks to Willy for having spotted the bug and suggesting the above fix.

It should be backported to all stable versions.
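
A minimal sketch of the endian-proof approach, assuming the sample is held in a 64-bit slot and the key is a pointer to the start of that slot. demo_sint_to_key() is a hypothetical name, and the real function operates on haproxy's sample structure rather than on a bare integer.

```c
#include <stdint.h>
#include <string.h>

/* Truncates <*slot> to 32 bits in place and returns a pointer to the
 * 32-bit value. The value is stored at the start of the slot, so the
 * returned pointer designates it on both little and big endian hosts.
 */
const void *demo_sint_to_key(int64_t *slot)
{
    uint32_t v32 = (uint32_t)*slot;   /* keep the low 32 bits */

    memcpy(slot, &v32, sizeof(v32));  /* re-assign them at offset 0 */
    return slot;
}
```

Simply truncating the 64-bit value and pointing at the slot would leave the low 32 bits in the last four bytes on a big-endian host, which is the situation the commit describes.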
2025-01-09 10:56:43 +01:00
Willy Tarreau
7be596b35c [RELEASE] Released version 3.2-dev3
Released version 3.2-dev3 with the following main changes :
    - DOC: config: add missing "track-sc0" in action keywords matrix
    - BUG/MINOR: stktable: invalid use of stkctr_set_entry() with mixed table types
    - BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission
    - BUG/MEDIUM: mux-h2: Count copied data when looping on RX bufs in h2_rcv_buf()
    - Revert "BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission"
    - BUG/MAJOR: mux-quic: properly fix BUG_ON on empty STREAM emission
    - MINOR: mux-quic: add traces on sd attach
    - BUG/MEDIUM: mux-quic: do not attach on already closed stream
    - BUG/MINOR: compression: handle a possible strdup() failure
    - BUG/MINOR: pool: handle a possible strdup() failure
    - BUG/MINOR: cfgparse-tcp: handle a possible strdup() failure
    - BUG/MINOR: log: Allow to use if/unless conditionnals for do-log action
    - MINOR: config: Alert about extra arguments for errorfile and errorloc
    - BUG/MINOR: mux-quic: fix wakeup on qcc_set_error()
    - MINOR: mux-quic: change return value of qcs_attach_sc()
    - BUG/MINOR: mux-quic: handle closure of uni-stream
    - BUG/MEDIUM: promex/resolvers: Don't dump metrics if no nameserver is defined
    - BUG/MAJOR: ssl/ocsp: fix NULL conn object dereferencing to access QUIC TLS counters
    - MEDIUM: errors: get rid of shm_open()
    - BUILD: makefile: do not clean standalone binaries on a simple "make clean"
    - BUILD: makefile: add a qinfo macro to pass info in quiet mode
    - DEV: ncpu: add a simple utility to help with NUMA development
    - DEV: ncpu: implement a wrapper mode
    - DEV: ncpu: make the wrapper work both as a lib and executable
    - BUG/MEDIUM: h1-htx: Properly handle bodyless messages
    - MINOR: tools: add a few functions to simply check for a file's existence
2025-01-09 09:21:04 +01:00
Willy Tarreau
b25850f25b MINOR: tools: add a few functions to simply check for a file's existence
At many places we'd like to be able to simply construct a path from a
format string and check if that path corresponds to an existing file,
directory etc. Here we add 3 functions, a generic one to test that a
path corresponds to a given file mode (e.g. S_IFDIR, S_IFREG etc), and
two other ones specifically checking for a file or a dir for easier
use.
2025-01-09 09:18:49 +01:00
Christopher Faulet
b9cc361b35 BUG/MEDIUM: h1-htx: Properly handle bodyless messages
During h1 parsing, there are some postparsing checks to detect bodyless
messages and switch the parsing in DONE state. However, a case was not
properly handled. Responses to HEAD requests with a "transfer-encoding"
header. The response parser remained blocked waiting for the response body.

To fix the issue, the postparsing was slightly modified. Instead of trying to
handle bodyless messages in a common way between the request and the
response, it is now performed in the dedicated postparsing functions. It is
easier to enumerate all cases, especially because there is already a test
for responses to HEAD requests.

This patch should fix the issue #2836. It must be backported as far as 2.9.
2025-01-08 18:20:26 +01:00
Willy Tarreau
ca773e1a2a DEV: ncpu: make the wrapper work both as a lib and executable
It's convenient to have a shared lib be able to also work as a wrapper.
But recent glibc broke support for this dual-mode thing some time ago:

   https://patchwork.ozlabs.org/project/glibc/patch/20190312130235.8E82C89CE49C@oldenburg2.str.redhat.com/
   https://stackoverflow.com/questions/59074126/loading-executable-or-executing-a-library

Trying to preload such an executable indeed returns:

   ERROR: ld.so: object '/path/to/ncpu.so' from LD_PRELOAD cannot be preloaded (cannot dynamically load position-independent executable): ignored.

Note that the code still supports it since libc.so is both an executable
and a lib. The approach taken here is the same as in the nousr.so wrapper.
It consists in dropping the DF_1_PIE flag from the resulting executable
since it's what the dynamic linker is looking for. This flag is found in
FLAGS_1 in the .dynamic section. As readelf -a suggests, it's after the
tag 0x6ffffffb. The value is 0x08000000. We're using objdump to figure the
length and offset of the struct, dd to extract the 3 parts, and sed to
patch the binary.

It's likely that it will only work on 64-bit little endian, though tests
should be performed to see what to do on other platforms. At least on
x86_64, ld.so is happy and it continues to be possible to use the binary
as a .so, and that's the platform where most of the development happens so
that's fine.

In any case the wrapper and the standard shared lib are still made two
distinct files so that it's possible to use the non-patched version on
unsupported OSes or architectures.
2025-01-08 11:27:10 +01:00
Willy Tarreau
3fdf875716 DEV: ncpu: implement a wrapper mode
The wrapper mode allows to present itself as LD_PRELOAD before loading
haproxy, which is often more convenient since it allows to pass the
number of CPUs in argument. However, this mode is no longer supported by
modern glibcs, so a future patch will come to implement a trick that was
tested to work at least on x86.
2025-01-08 11:26:05 +01:00
Willy Tarreau
25c08562cb DEV: ncpu: add a simple utility to help with NUMA development
Collecting captures of /sys isn't sufficient for NUMA development because
haproxy detects the number of CPUs at boot time and will not be able to
inspect more than this number. Let's just have a small utility to report
a fake number of CPUs, that will be loaded using LD_PRELOAD. It checks
the NCPU variable if it exists and will present this number of CPUs, or
if it does not exist, will expose the maximum supported number.
2025-01-08 11:26:05 +01:00
Willy Tarreau
bd06502b22 BUILD: makefile: add a qinfo macro to pass info in quiet mode
Some commands such as $(cmd_CC) etc already handle the quiet vs verbose
mode in the makefile, but sometimes we may want to pass other info. The
new "qinfo" macro can be called with a 9-char string argument (spaces
included) as a prefix for some commands, to emit that string when in
quiet mode. The caller must fill the spaces needed for alignment. E.g:

  $(call qinfo,  CC     )$(CC) ...
2025-01-08 11:26:05 +01:00
Willy Tarreau
c87619fa25 BUILD: makefile: do not clean standalone binaries on a simple "make clean"
Running "make clean" currently gets rid of a number of auxiliary tools,
including the standalone ones that do not depend on haproxy's build
options. This is a bit annoying as they have to be rebuilt each time.
Let's move them to the distclean target instead.
2025-01-08 11:26:01 +01:00
William Lallemand
143be1b59f MEDIUM: errors: get rid of shm_open()
Since 5ee266b7 ("MINOR: error: simplify startup_logs_init_shm"), the FD
of the startup logs is always closed and the HAPROXY_STARTUPLOGS_FD
variable is not used anymore. Which means we only need a mmap.

Indeed the shm_open() function was only needed to keep the shm between
the exec() of the master so we can get the logs stored there after doing
the final exec() in wait mode. Since the wait mode doesn't exist
anymore and the parsing is done in a worker, we only need to share a
memory zone between the master and the worker.

This patch removes shm_open() and replace it with a simple mmap(), this
way the shared startup-logs become more portable and USE_SHM_OPEN is not
required anymore.
2025-01-07 16:42:38 +01:00
Frederic Lecaille
d7fc90afe9 BUG/MAJOR: ssl/ocsp: fix NULL conn object dereferencing to access QUIC TLS counters
This bug arrived with this commit in the current dev branch:

	056ec51c26 MEDIUM: ssl/ocsp: counters for OCSP stapling

and could occur for QUIC connections during handshake when the underlying
<conn> connection object is not already initialized. So in this case the TLS
counters attached to TLS listeners cannot be accessed through this object but
from the QUIC connection object.

Modify the code to initialize the listener (<li> variable) for both QUIC
and TCP connections, then initialize the variables for the TLS counters
if the listener is also initialized.

Thank you to @Tristan971 for having reported this issue in GH #2833.

Must be backported with the commit mentioned above if it is planned to be
backported.
2025-01-07 15:19:42 +01:00
Christopher Faulet
892eb2bb2c BUG/MEDIUM: promex/resolvers: Don't dump metrics if no nameserver is defined
A 'resolvers' section may be defined without any nameserver. In that case,
we must take care to not dump corresponding Prometheus metrics. However
there is an issue that could lead to a crash or a strange infinite loop
because we are looping on an empty list and, at some point, we are
dereferencing an invalid pointer.

There is an issue because the loop on the nameservers of a resolvers section
is performed via callback functions and not the standard list_for_each_entry
macro. So we must take care to properly detect end of the list and empty
lists for nameservers. But the fix is not so simple because resolvers
sections with and without nameservers may be mixed.

To fix the issue, in rslv_promex_start_ts() and rslv_promex_next_ts(), when the
next resolvers section must be evaluated, a loop is now used to properly skip
empty sections.
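
The idea of skipping empty sections from start/next callbacks can be sketched as below. This is a simplified model using arrays instead of haproxy's lists, and all names (demo_section, demo_iter_start(), demo_iter_next()) are hypothetical.

```c
#include <stddef.h>

struct demo_section {
    const char **ns;    /* nameserver names */
    size_t nb_ns;       /* may be 0 */
};

struct demo_iter {
    const struct demo_section *secs;
    size_t nb_secs;
    size_t sec;         /* current section index */
    size_t ns;          /* current nameserver index in that section */
};

/* Moves <it> past empty or exhausted sections. Returns the current
 * nameserver, or NULL once every section has been visited.
 */
const char *demo_iter_settle(struct demo_iter *it)
{
    while (it->sec < it->nb_secs && it->ns >= it->secs[it->sec].nb_ns) {
        it->sec++;
        it->ns = 0;
    }
    if (it->sec == it->nb_secs)
        return NULL;
    return it->secs[it->sec].ns[it->ns];
}

/* Start callback: returns the first nameserver, if any. */
const char *demo_iter_start(struct demo_iter *it,
                            const struct demo_section *secs, size_t nb_secs)
{
    it->secs = secs;
    it->nb_secs = nb_secs;
    it->sec = 0;
    it->ns = 0;
    return demo_iter_settle(it);
}

/* Next callback: returns the following nameserver, or NULL at the end. */
const char *demo_iter_next(struct demo_iter *it)
{
    if (it->sec == it->nb_secs)
        return NULL;
    it->ns++;
    return demo_iter_settle(it);
}
```

Because the skipping is a loop rather than a single step, several consecutive empty sections, or empty sections mixed with populated ones, are handled the same way.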

This patch is related to #2831. Not sure it fixes it. It must be backported
as far as 3.0.
2025-01-06 09:08:38 +01:00
Amaury Denoyelle
801e39e1cc BUG/MINOR: mux-quic: handle closure of uni-stream
This commit is a direct follow-up to the previous one. As already
described, a previous fix was merged to prevent streamdesc attach
operation on already completed QCS instances scheduled for purging. This
was implemented by skipping app proto decoding.

However, this has a bad side-effect for remote uni-directional stream.
If receiving a FIN stream frame on such a stream, it will be considered as
complete because streamdesc are never attached to a uni stream. Due to
the mentioned new fix, this prevents analysis of this last frame for
every uni stream.

To fix this, do not skip anymore app proto decoding for completed QCS.
Update instead qcs_attach_sc() to transform it as a noop function if QCS
is already fully closed before streamdesc instantiation. However,
success return value is still used to prevent an invalid decoding error
report.

The impact of this bug should be minor. Indeed, HTTP3 and QPACK uni
streams are never closed by the client as this is invalid due to the
spec. The only issue was that this prevented QUIC MUX to close the
connection with error H3_ERR_CLOSED_CRITICAL_STREAM.

This must be backported along the previous patch, at least to 3.1, and
possibly to 2.8 if the mentioned patches are merged there.
2025-01-03 17:21:19 +01:00
Amaury Denoyelle
af00be8e0f MINOR: mux-quic: change return value of qcs_attach_sc()
A recent fix was introduced to ensure that a streamdesc instance won't
be attached to an already completed QCS which is eligible to purging.
This was performed by skipping application protocol decoding if a QCS is
in such a state. Here is the patch responsible for this change.
  caf60ac696a29799631a76beb16d0072f65eef12
  BUG/MEDIUM: mux-quic: do not attach on already closed stream

However, this is too restrictive, in particular for unidirectional streams
where no streamdesc is ever attached. To fix this behavior, first
qcs_attach_sc() API has been modified. Instead of returning a streamdesc
instance, it returns either 0 on success or a negative error code.

There should be no functional changes with this patch. It is only to be
able to extend qcs_attach_sc() with the possibility of skipping
streamdesc instantiation while still keeping a success return value.

This should be backported wherever the above patch has been merged. For
the record, it was scheduled for immediate backport on 3.1, plus merging
on older releases up to 2.8 after a period of observation.
2025-01-03 17:19:21 +01:00
Amaury Denoyelle
4f2554903b BUG/MINOR: mux-quic: fix wakeup on qcc_set_error()
The following patch was a major refactoring of QUIC MUX. It removes
pacing specific code path. In particular, qcc_wakeup() utility function
was removed and replaced by its tasklet_wakeup() usage.
  41f0472d967b2deb095d5adc8a167da973fbee3d
  MEDIUM: mux-quic: remove pacing specific code on qcc_io_cb

However, an incorrect substitution was performed in qcc_set_error(). As
such, there was no explicit wakeup in case an error is detected by QUIC
MUX or the app protocol layer. This may lead to missing error reporting
to clients.

Fix this by re-adding tasklet_wakeup() usage into qcc_set_error().

This must be backported up to 3.1 where above patch is scheduled.
2025-01-03 10:39:49 +01:00
Christopher Faulet
f578811c4e MINOR: config: Alert about extra arguments for errorfile and errorloc
errorfile and errorloc directives expect exactly two arguments. But extra
arguments were just ignored while an error should be emitted. It is now
fixed.

This patch could be backported as far as 2.2 if necessary.
2025-01-03 10:10:09 +01:00
Christopher Faulet
a785a20bef BUG/MINOR: log: Allow to use if/unless conditionnals for do-log action
The do-log action does not accept any argument for now. But an error was
triggered if any extra argument was found, preventing the use of if/unless
conditionals.

When an action is parsed, expected arguments must be tested to detect
missing ones but not unexpected extra arguments because this should be
performed by the conditional parser. So just removing the test in the
do-log parser function is enough to fix the issue.

This patch must be backported to 3.1.
2025-01-03 09:44:08 +01:00
Ilia Shipitsin
bbd1cedefc BUG/MINOR: cfgparse-tcp: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
2025-01-02 14:31:07 +01:00
Ilia Shipitsin
beca953c55 BUG/MINOR: pool: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
2025-01-02 14:31:07 +01:00
Ilia Shipitsin
b4f965be9e BUG/MINOR: compression: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
2025-01-02 14:31:07 +01:00
Amaury Denoyelle
caf60ac696 BUG/MEDIUM: mux-quic: do not attach on already closed stream
Due to QUIC packet reordering, a stream may be opened via a new
RESET_STREAM or STOP_SENDING frame. This would cause either Tx or Rx
channel to be immediately closed.

This can cause an issue with current QUIC MUX implementation with QCS
purging. QCS are inserted into QCC purge list when transfer could be
considered as completed. In most cases, this happens after full
request/response exchange. However, it can also happen after request
reception if RESET_STREAM/STOP_SENDING are received first.

A BUG_ON() crash will occur if a STREAM frame is received afterwards. In this
case, streamdesc instance will be attached via qcs_attach_sc() to handle
the new request. However, QCS is already considered eligible to purging.
It could cause it to be released while its streamdesc instance remains.
A BUG_ON() crash detects this problem in qcc_purge_streams().

To fix this, extend qcc_decode_qcs() to skip app proto rcv_buf
invocation if QCS is considered completed. A similar condition was
already implemented when read was previously aborted after a
STOP_SENDING emission by QUIC MUX.

This crash was reproduced on haproxy.org. Here is the output of the
backtrace :
Core was generated by `./haproxy-dev -db -f /etc/haproxy/haproxy-current.cfg -sf 16495'.
Program terminated with signal SIGILL, Illegal instruction.
 #0  0x00000000004e442b in qcc_purge_streams (qcc=0x774cca0) at src/mux_quic.c:2661
2661                    BUG_ON_HOT(!qcs_is_completed(qcs));
[Current thread is 1 (LWP 1457)]
[ ## gdb ## ] bt
 #0  0x00000000004e442b in qcc_purge_streams (qcc=0x774cca0) at src/mux_quic.c:2661
 #1  0x00000000004e4db7 in qcc_io_process (qcc=0x774cca0) at src/mux_quic.c:2744
 #2  0x00000000004e5a54 in qcc_io_cb (t=0x7f71193940c0, ctx=0x774cca0, status=573504) at src/mux_quic.c:2886
 #3  0x0000000000b4f792 in run_tasks_from_lists (budgets=0x7ffdcea1e670) at src/task.c:603
 #4  0x0000000000b5012f in process_runnable_tasks () at src/task.c:883
 #5  0x00000000007de4a3 in run_poll_loop () at src/haproxy.c:2771
 #6  0x00000000007deb9f in run_thread_poll_loop (data=0x1335a00 <ha_thread_info>) at src/haproxy.c:2985
 #7  0x00000000007dfd8d in main (argc=6, argv=0x7ffdcea1e958) at src/haproxy.c:3570

This BUG_ON() crash can only happen since 3.1 refactoring. Indeed, purge
list was only implemented on this version. As such, please backport it
on 3.1 immediately. However, a logic issue remains for older version as
a stream could be attached on a fully closed QCS. Thus, it should be
backported up to 2.8, this time after a period of observation.
2025-01-02 11:25:40 +01:00
Amaury Denoyelle
4a997e5a93 MINOR: mux-quic: add traces on sd attach
Add traces into qcs_attach_sc(). This function is called when a request
is received on a QCS stream and a streamdesc instance is attached. This
will be useful to facilitate debugging.
2025-01-02 11:25:40 +01:00
Amaury Denoyelle
ddfd8031f8 BUG/MAJOR: mux-quic: properly fix BUG_ON on empty STREAM emission
Properly fix BUG_ON() occurrence when QUIC MUX emits only empty STREAM
frames. This was addressed by a previous patch but it causes another
regression so a revert was needed.

BUG_ON() on qcc_build_frms() return value is invalid. Indeed,
qcc_build_frms() may return 0, but this does not imply that frame list
is empty, as encoded frames can have a zero length payload. As such,
simply remove this invalid BUG_ON().

This must be backported up to 3.1.
2025-01-02 11:25:40 +01:00
Amaury Denoyelle
85e27f1e92 Revert "BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission"
This reverts commit 98064537423fafe05b9ddd97e81cedec8b6b278d.

Above patch tried to fix a BUG_ON() occurrence when MUX only emitted
empty STREAM frames via qcc_build_frms(). Return value of qcs_send() was
changed from the payload STREAM frame to the whole frame length.
However, this is invalid as this return value is used to ensure
connection flow-control is not exceeded on sending retry. This causes
occurrence of BUG_ON() crash in qcc_io_send() as send-list is not
properly purged after QCS emission.

Revert this incorrect fix. The original issue will be properly dealt with in
the next commit.

This commit must be backported to 3.1 if reverted commit was already
applied on it.
2025-01-02 11:00:25 +01:00
Christopher Faulet
22f8d2c99e BUG/MEDIUM: mux-h2: Count copied data when looping on RX bufs in h2_rcv_buf()
When data was copied from RX buffers to the channel buffer, more data than
expected could be moved because amount of data copied was never decremented
from the limit. This could lead to a stream deadlock when the compression
filter was in use.

The issue was introduced by commit 4eb3ff1 ("MAJOR: mux-h2: make streams use
the connection's buffers") but revealed by 3816c38 ("MAJOR: mux-h2: permit a
stream to allocate as many buffers as desired").

Because a h2 stream can now have several RX buffers, in h2_rcv_buf(), we
loop on these buffers to fill the channel buffer. However, we must still
take care to respect the limit to not copy too much data. However, the
"count" variable was never decremented to reflect amount of data already
copied. So, it was possible to exceed the limit.
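
The principle can be sketched as below. This is a minimal illustration
under stated assumptions, not the h2_rcv_buf() code: the flat buffer type
and the function name are hypothetical, and real RX buffers are not plain
byte arrays.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flat buffer, standing in for one RX buffer. */
struct example_buf {
	const char *data;
	size_t len;
};

/* Copy at most <count> bytes from the <nb> buffers in <bufs> to <dst>.
 * Returns the total number of bytes copied.
 */
static size_t example_copy_limited(char *dst, size_t count,
                                   const struct example_buf *bufs, size_t nb)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < nb && count > 0; i++) {
		size_t n = bufs[i].len < count ? bufs[i].len : count;

		memcpy(dst + total, bufs[i].data, n);
		total += n;
		count -= n; /* shrink the remaining budget after each copy */
	}
	return total;
}
```

Without the "count -= n" step, each buffer would be compared against the
full initial limit, so the total could exceed it.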

It was an issue when the compression filter was in use because the channel
buffer could be fully filled, preventing the compression from being
performed. When this happened, the stream was infinitely blocked because the
compression filter was asking for some space but nothing was scheduled to be
forwarded.

This patch should fix the issue #2826. It must be backported to 3.1.
2025-01-02 09:58:23 +01:00
Amaury Denoyelle
9806453742 BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission
A BUG_ON() is present in qcc_io_send() to ensure that encoded frame list
is empty if qcc_build_frms() previously returned 0.

This BUG_ON() may be triggered if empty STREAM frame is encoded for
standalone FIN. Indeed, qcc_build_frms() returns the sum of all STREAM
payload length. In case only empty STREAM frames are generated, return
value will be 0, despite new frames encoded and inserted into frame
list.

To fix this, change return value of qcs_send(). This now returns the
whole STREAM frame length, both header and payload included. This
ensures that qcc_build_frms() won't return a zero value if new frames are
encoded, even empty ones.

This must be backported up to 3.1.
2024-12-31 16:39:53 +01:00
Aurelien DARRAGON
5bbdd14f56 BUG/MINOR: stktable: invalid use of stkctr_set_entry() with mixed table types
Some actions such as "sc0_get_gpc0" (using smp_fetch_sc_stkctr()
internally) can take an optional table name as parameter to perform the
lookup on a different table from the tracked one but using the key from
the tracked entry. It is done by leveraging the stktable_lookup() function
which was originally meant to perform intra-table lookups.

Calling sc0_get_gpc0() with a different table name will result in
stktable_lookup() being called to perform lookup using a stksess from
a different table. While it is theoretically fine, it comes with a pitfall:
both tables (the one from where the stksess originates and the actual
target table) should rely on the exact same key type and length.

Failure to do so actually results in undefined behavior, because the key
type and/or length from one table is used to perform the lookup in
another table, while the underlying lookup API expects explicit type and
key length.

For instance, consider the below example:

  peers testpeers
    bind 127.0.0.1:10001
    server localhost

    table test type binary len 1 size 100k expire 1h store gpc0
    table test2 type string size 100k expire 1h store gpc0

  listen test_px
    mode http
    bind 0.0.0.0:8080
    http-request track-sc0 bin(AA) table testpeers/test
    http-request track-sc1 str(ok) table testpeers/test2
    log-format "%[sc0_get_gpc0(testpeers/test2)]"
    log stdout format raw local0

    server s1 git.haproxy.org:80

Performing a curl request to localhost:8080 will cause uninitialized reads
because string "ok" from test2 table will be compared as a string against
"AA" binary sample which is not NULL terminated:

==2450742== Conditional jump or move depends on uninitialised value(s)
==2450742==    at 0x484F238: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2450742==    by 0x27BCE6: stktable_lookup (stick_table.c:539)
==2450742==    by 0x281470: smp_fetch_sc_stkctr (stick_table.c:3580)
==2450742==    by 0x283083: smp_fetch_sc_get_gpc0 (stick_table.c:3788)
==2450742==    by 0x2A805C: sample_process (sample.c:1376)

So let's prevent that by adding some comments in stktable_set_entry()
func description, and by adding a check in smp_fetch_sc_stkctr() to ensure
both source stksess and target table share the same key properties.
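
The kind of check described above can be sketched as follows. This is only
an illustration: the enum, the structure and the function name are
hypothetical and do not reflect the actual stick-table definitions.

```c
#include <stddef.h>

/* Hypothetical key types and table descriptor, for illustration only. */
enum example_key_type { EX_KEY_STRING, EX_KEY_BINARY, EX_KEY_INTEGER };

struct example_table {
	enum example_key_type type;
	size_t key_size;
};

/* Return 1 if a key taken from an entry of <src> may safely be looked up
 * in <dst>, otherwise 0.
 */
static int example_tables_compatible(const struct example_table *src,
                                     const struct example_table *dst)
{
	if (src == dst)
		return 1; /* same table: nothing to compare */

	return src->type == dst->type && src->key_size == dst->key_size;
}
```

With the configuration above, a "binary len 1" source and a "string"
target would be rejected by such a check before any lookup is attempted.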

While it could be relevant to backport this in all stable versions, it is
probably safer to wait for some time before doing so, to ensure that no
existing configs rely on this ambiguity because the fact that the target
table and source stksess entry need to share the same key type and length
is not explicitly documented.
2024-12-31 16:36:00 +01:00
Aurelien DARRAGON
f94c63021b DOC: config: add missing "track-sc0" in action keywords matrix
In d54e8f8107 ("DOC: config: reorganize actions into their own section"),
"track-sc0" keyword was properly documented but the keyword was not placed
in the action keywords matrix alongside other track-sc* statements. It
was probably overlooked, so let's fix that.

Could be backported up to 2.9 with d54e8f8107.
2024-12-31 16:35:54 +01:00
Willy Tarreau
e148dfd35d [RELEASE] Released version 3.2-dev2
Released version 3.2-dev2 with the following main changes :
    - MINOR: build: define DEBUG_STRESS
    - MINOR: applet: define applet_putchk_stress() alternative
    - MINOR: stats: use stress mode to force reentrant dumps
    - CI: scripts: add support for AWS-LC-FIPS in build-ssl.sh
    - MINOR: ssl: add "FIPS" details in haproxy -vv
    - MEDIUM: ssl: rename 'OpenSSL' by 'SSL library' in haproxy -vv
    - CI: github: let's add an AWS-LC-FIPS job
    - MINOR: window_filter: rely on the time to update the filter samples (QUIC/BBR)
    - BUG/MINOR: quic: wrong logical statement in in_recovery_period() (BBR)
    - BUG/MINOR: quic: fix BBB max bandwidth oscillation issue.
    - BUG/MINOR: quic: wrong bbr_target_inflight() implementation
    - BUG/MINOR: quic: remove max_bw filter from delivery rate sampling
    - BUG/MINOR: quic: underflow issue for bbr_inflight_hi_from_lost_packet()
    - BUG/MINOR: quic: reduce packet losses at least during ProbeBW_CRUISE (BBR)
    - MINOR: quic: reduce the private data size of QUIC cc algos
    - CLEANUP: quic: remove a wrong comment about ->app_limited (drs)
    - BUG/MINOR: quic: fix the wrong tracked recovery start time value
    - BUG/MINOR: quic: too permissive exit condition for high loss detection in Startup (BBR)
    - BUG/MINOR: cli: cli_snd_buf: preserve \r\n for payload lines
    - REGTESTS: ssl: add a PEM with mix of LF and CRLF line endings
    - BUG/MINOR: quic: missing Startup accelerating probing bw states
    - CLEANUP: quic: Rename some BBR functions in relation with bw probing
    - REORG: startup: move global.maxconn calculations in limits.c
    - REORG: startup: move code that applies limits to limits.c
    - REORG: startup: move nofile limit checks in limits.c
    - MINOR: ssl: add utils functions to extract X509 notAfter date
    - MINOR: ssl/cli: allow to filter expired certificates with 'show ssl sni'
    - MINOR: ssl/cli: add -A to the 'show ssl sni' command description
    - BUG/MINOR: ssl/cli: 'show ssl cert' escape the first '*' of a filename
    - BUG/MINOR: ssl/cli: 'show ssl crl-file' escape the first '*' of a filename
    - BUG/MINOR: ssl/cli: 'show ssl ca-file' escape the first '*' of a filename
    - BUG/MEDIUM: stconn: Only consider I/O timers to update stream's expiration date
    - BUG/MEDIUM: queues: Make sure we call process_srv_queue() when leaving
    - BUG/MEDIUM: queues: Do not use pendconn_grab_from_px().
    - CLEANUP: queues: Remove pendconn_grab_from_px().
    - BUILD: debug: only dump/reset glitch counters when really defined
    - MINOR: compiler: add a __has_builtin() macro to detect features more easily
    - MINOR: compiler: rely on builtin detection for __builtin_unreachable()
    - MINOR: compiler: add a new "ASSUME" macro to help the compiler
    - MINOR: compiler: also enable __builtin_assume() for ASSUME()
    - MINOR: compiler: add ASSUME_NONNULL() to tell the compiler a pointer is valid
    - MINOR: bug: make BUG_ON() fall back to ASSUME
    - CLEANUP: cache: use ASSUME_NONNULL() instead of DISGUISE()
    - CLEANUP: hlua: use ASSUME_NONNULL() instead of ALREADY_CHECKED()
    - CLEANUP: htx: use ASSUME_NONNULL() to mark the start line as non-null
    - CLEANUP: mux-fcgi: use ASSUME_NONNULL() to indicate that the first block exists
    - CLEANUP: stats: use ASSUME_NONNULL() to indicate that the first block exists
    - CLEANUP: quic: replace ALREADY_CHECKED() with ASSUME_NONNULL() at a few places
    - CLEANUP: ssl-sock: drop two now unneeded ALREADY_CHECKED()
    - BUG/MEDIUM: mux-quic: do not mix qcc_io_send() return codes with pacing
    - CLEANUP: mux-quic: remove unused qcc member send_retry_list
    - MINOR: quic: add traces
    - MINOR: mux-quic: refactor wait-for-handshake support
    - MEDIUM/OPTIM: mux-quic: define a recv_list for demux resumption
    - MEDIUM/OPTIM: mux-quic: implement purg_list
    - MINOR: mux-quic: extract code to build STREAM frames list
    - MINOR: mux-quic: split STREAM and RS/SS emission
    - MEDIUM/OPTIM: mux-quic: do not rebuild frms list on every send
    - MEDIUM: mux-quic: remove pacing specific code on qcc_io_cb
    - MINOR: trace: implement tracing disabling API
    - MINOR: mux-quic: hide traces when woken up on pacing only
    - MINOR: ssl/cli: add a 'Uncommitted' status for 'show ssl' commands
    - MINOR: ssl/ocsp: Add extra details in error logs when possible
    - BUILD: ssl/ocsp: error: ‘%.*s’ directive argument is null
    - MEDIUM: ssl/ocsp: OCSP response is expired with OCSP_MAX_RESPONSE_TIME_SKEW
    - MINOR: ssl: improve HAVE_SSL_OCSP ifdef
    - DOC: config: add example for server "track" keyword
    - DOC: config: reorder "tune.lua.*" keywords by alphabetical order
    - DOC: config: add "tune.lua.burst-timeout" to the list of global parameters
    - MINOR: hlua: add option to preserve bool type from smp to lua
    - REGTESTS: fix lua-based regtests using tune.lua.smp-preserve-bool
    - BUG/MEDIUM: mux-quic: prevent BUG_ON() by refreshing frms on MAX_DATA
    - CLEANUP: mux-quic: remove dead err label in qcc_build_frms()
    - BUG/MINOR: h2/rhttp: fix HTTP2 conn counters on reverse
    - MINOR: hlua: rename "tune.lua.preserve-smp-bool" to "tune.lua.bool-sample-conversion"
    - MINOR: ssl: change visibility of ssl_stats_module
    - MINOR: ssl: rework the error management in the OCSP callback
    - MEDIUM: ssl/ocsp: counters for OCSP stapling
    - CI: limit aws-lc and libressl Quic Interop to "haproxy" only
    - BUG/MEDIUM: queue: Make process_srv_queue return the number of streams
    - CI: github: try to build the latest WolfSSL master weekly
    - CI: github: activate ASAN on the WolfSSL weekly job
    - BUG/MINOR: stats: fix segfault caused by uninitialized value in "show schema json"
    - MINOR: stktable: add stktable_get_data_type_idx() helper function
    - MINOR: stktable: support optional index for array types in {set, clear, show} table commands
    - CI: scripts: allow to build wolfssl with --enable-debug
    - CI: github: activate debug in wolfssl weekly build
    - BUG/MEDIUM: queues: Stricly respect maxconn for outgoing connections
    - MEDIUM: queue: Handle the race condition between queue and dequeue differently
    - CLEANUP: Remove pendconn_must_try_again().
    - BUILD: compat: add missing fcntl.h before defining F_SETPIPE_SZ
    - BUILD: mworker: always initialize the saveptr of strtok_r()
    - BUILD: limits: make normalize_rlim() take an rlim_t to fix build on m68k
    - BUG/MINOR: checks: handle a possible strdup() failure
    - BUG/MINOR: listener: handle a possible strdup() failure
    - BUG/MINOR: mux_h1: handle a possible strdup() failure
    - BUG/MINOR: debug: handle a possible strdup() failure
2024-12-25 15:17:01 +01:00
Ilia Shipitsin
6524fbfb70 BUG/MINOR: debug: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
2024-12-25 12:42:33 +01:00
Ilia Shipitsin
a3e6c783cd BUG/MINOR: mux_h1: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
2024-12-25 12:42:33 +01:00
Ilia Shipitsin
89c62693da BUG/MINOR: listener: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
2024-12-25 12:41:08 +01:00
Ilia Shipitsin
495f1f9741 BUG/MINOR: checks: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
2024-12-25 12:40:56 +01:00
Willy Tarreau
f486f976c7 BUILD: limits: make normalize_rlim() take an rlim_t to fix build on m68k
As can be seen here, the build fails on m68k since commit 665dde648
("MINOR: debug: use LIM2A to show limits") in 3.1:

  https://github.com/haproxy/haproxy/actions/runs/12440234399/job/34735360177

The reason is the comparison between a ulong limit and RLIM_INFINITY.
Indeed, on m68k, rlim_t is an unsigned long long. Let's just change
the function's input type to take an rlim_t instead. This also allows
to get rid of the casts in the call place.
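
A minimal sketch of the idea, assuming the helper simply maps "no limit" to
0 (the function name and that mapping are assumptions made for the example,
not a copy of the real helper):

```c
#include <sys/resource.h>

/* Hypothetical helper: return 0 for RLIM_INFINITY, else the limit itself.
 * Taking an rlim_t keeps the comparison in the type's own width, whatever
 * that width is on the target platform.
 */
static rlim_t example_normalize_rlim(rlim_t lim)
{
	return lim == RLIM_INFINITY ? 0 : lim;
}
```

Callers can then pass rlim_cur or rlim_max directly, without a cast.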

This can be backported to 3.1 though it's not important given the low
prevalence of this platform for such use cases.
2024-12-25 12:33:06 +01:00
Willy Tarreau
21df7677a9 BUILD: mworker: always initialize the saveptr of strtok_r()
Building with some libcs which define strtok_r() as an inline function
can yield a possibly uninitialized warning due to a loop dereferencing
this save pointer early, even though the doc clearly mentions that it
is ignored. This is actually more of a mismatch between the compiler
and the libc (gcc-4.7 and glibc-2.23 in that case). It's trivial to
set s2 to NULL here so let's do it to please this old couple. Note
that while the warning is triggered in all supported versions, there's
no point backporting it since it's unlikely this combination will be
relevant outside of backwards compatibility checks now.
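
For reference, the pattern looks like this. It is a generic sketch with a
hypothetical function name, not the mworker code:

```c
#include <stddef.h>
#include <string.h>

/* Count the tokens of <str> separated by any character of <delim>.
 * Note that <str> is modified by strtok_r().
 */
static size_t example_count_tokens(char *str, const char *delim)
{
	char *save = NULL; /* ignored on the first call, set to quiet the warning */
	char *tok;
	size_t n = 0;

	for (tok = strtok_r(str, delim, &save); tok;
	     tok = strtok_r(NULL, delim, &save))
		n++;

	return n;
}
```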
2024-12-25 12:18:46 +01:00
Willy Tarreau
f78121dd32 BUILD: compat: add missing fcntl.h before defining F_SETPIPE_SZ
In 1.5-dev8, 13 years ago, support for setting pipe size was added by
commit bd9a0a778 ("OPTIM/MINOR: make it possible to change pipe size
(tune.pipesize)"). For compatibility purposes, it was defining
F_SETPIPE_SZ in compat.h if it was not set. It apparently always had
F_SETPIPE_SZ defined before being included.

Now in 3.2-dev1, commit fbc534a6f ("REORG: startup: move nofile limit
checks in limits.c") reordered a few includes and ended up with
mworker-prog.c including compat.h before fcntl.h, causing a redefinition
error on certain libcs:

    CC      src/mworker-prog.o
  In file included from /usr/include/bits/fcntl.h:61:0,
                   from /usr/include/fcntl.h:35,
                   from include/haproxy/limits.h:11,
                   from include/haproxy/mworker.h:18,
                   from src/mworker-prog.c:27:
  /usr/include/bits/fcntl-linux.h:203:0: warning: "F_SETPIPE_SZ" redefined [enabled by default]
  In file included from include/haproxy/api-t.h:35:0,
                   from include/haproxy/api.h:33,
                   from src/mworker-prog.c:23:
  include/haproxy/compat.h:161:0: note: this is the location of the previous definition

Let's simply include fcntl.h in compat.h before the macro is redefined.
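
The ordering constraint can be sketched as follows. This is an illustration
rather than the actual compat.h content; the fallback value 1031 is the
Linux one (F_LINUX_SPECIFIC_BASE + 7) and the helper name is hypothetical.

```c
#include <fcntl.h> /* first, so that the libc definition is seen if present */

#ifndef F_SETPIPE_SZ
/* Fallback for libcs which do not expose it; Linux value. */
#define F_SETPIPE_SZ 1031
#endif

/* Hypothetical helper returning the command to pass to fcntl(). */
static int example_setpipe_cmd(void)
{
	return F_SETPIPE_SZ;
}
```

Since the header is included before the #ifndef test, the fallback can no
longer be defined ahead of the libc's own macro.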

There's normally no need to backport this, though it's harmless to do
it if needed.
2024-12-25 11:53:11 +01:00
Olivier Houchard
505480eeef CLEANUP: Remove pendconn_must_try_again().
Remove pendconn_must_try_again(), now that it no longer is used.
2024-12-24 14:10:06 +01:00
Olivier Houchard
cda7275ef5 MEDIUM: queue: Handle the race condition between queue and dequeue differently
There is a small race condition between a server checking whether there is
something left in the proxy queue and a stream being added to the proxy
queue. If the server checks just before the stream is added to the queue,
and it no longer has any stream to deal with, then nothing will take
care of the stream, which may stay in the queue forever.
This was worked around with commit 5541d4995d, by checking for that exact
condition after adding the stream to the queue, and trying again to get
a server assigned if it is detected.
That fix led to multiple infinite loops, which got fixed, but it is not
unlikely that it could happen again. So let's fix the initial problem
differently : a single server may mark itself as ready, and it removes
itself once used. The principle is that when we discover that the just
queued stream is alone with no active request anywhere to dequeue it,
instead of rebalancing it, it will be assigned to that current "ready"
server that is available to handle it. The extra cost of the atomic ops
is negligible since the situation is super rare.
2024-12-24 14:10:06 +01:00
Olivier Houchard
3372a2ea00 BUG/MEDIUM: queues: Stricly respect maxconn for outgoing connections
The "served" field of struct server is used to know how many connections
are currently in use for a server. But served used to be incremented way
after the server was picked, so there were race conditions that could
lead more than maxconn connections to be allocated for one server. To
fix this, increment served way earlier, and make sure at the time that
it never goes past maxconn.
We now should never have more outgoing connections than set by maxconn.
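
One way to reserve a slot without ever exceeding the limit is a
compare-and-swap loop, sketched below with C11 atomics. This illustrates
the principle only: the structure and function names are hypothetical, and
this is not necessarily how the patch implements it.

```c
#include <stdatomic.h>

/* Hypothetical server, reduced to the two fields discussed above. */
struct example_server {
	atomic_uint served;   /* connections currently in use */
	unsigned int maxconn; /* 0 means no limit */
};

/* Try to reserve one connection slot on <srv>.
 * Returns 1 on success, 0 if the server is already at maxconn.
 */
static int example_take_slot(struct example_server *srv)
{
	unsigned int cur = atomic_load(&srv->served);

	do {
		if (srv->maxconn && cur >= srv->maxconn)
			return 0;
		/* on failure, <cur> is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(&srv->served, &cur, cur + 1));

	return 1;
}

/* Give back a slot previously obtained with example_take_slot(). */
static void example_release_slot(struct example_server *srv)
{
	atomic_fetch_sub(&srv->served, 1);
}
```

Because the check and the increment happen in the same atomic operation,
two threads cannot both take the last available slot.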
2024-12-24 14:10:06 +01:00
William Lallemand
4332fed6c1 CI: github: activate debug in wolfssl weekly build
Activate WolfSSL debugging in the weekly job.
2024-12-23 18:00:34 +01:00
William Lallemand
287b2dc6dd CI: scripts: allow to build wolfssl with --enable-debug
Allow to activate the debugging of WolfSSL when building it.

WOLFSSL_DEBUG=1 WOLFSSL_VERSION=git-master ./scripts/build-ssl.sh
2024-12-23 18:00:25 +01:00
Aurelien DARRAGON
e8b7337d86 MINOR: stktable: support optional index for array types in {set, clear, show} table commands
As discussed in GH #2286, {set, clear, show} table commands were unable
to deal with array types such as gpt, because they handled such types as
non-array types, thus only the first entry (i.e. gpt[0]) was considered.

In this patch we add an extra logic around array-types handling so that
it is possible to specify an array index right after the type, like this:

  set table peer/table key mykey data.gpt[2] value
  # where 2 is the entry index that we want to access

If no index is specified, then it implicitly defaults to 0 to mimic
previous behavior.
2024-12-23 17:32:11 +01:00
Aurelien DARRAGON
c0dc7769d4 MINOR: stktable: add stktable_get_data_type_idx() helper function
Same as stktable_get_data_type(), but tries to parse optional index in
the form "name[idx]" (only for array types).

Falls back to stktable_get_data_type() when no index is provided.
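
Parsing the "name[idx]" form can be sketched as below. This is a
standalone illustration with a hypothetical function name, and the upper
bound on the index is arbitrary; it is not the code of the helper itself.

```c
#include <stdlib.h>
#include <string.h>

/* Parse "name" or "name[idx]" from <arg>. The name is copied into <name>
 * (of <size> bytes) and the index into <idx> (0 when absent).
 * Returns 0 on success, -1 on a malformed argument.
 */
static int example_parse_indexed(const char *arg, char *name, size_t size,
                                 unsigned int *idx)
{
	const char *open = strchr(arg, '[');
	size_t len = open ? (size_t)(open - arg) : strlen(arg);
	unsigned long val;
	char *end;

	if (len == 0 || len >= size)
		return -1;

	memcpy(name, arg, len);
	name[len] = '\0';
	*idx = 0;

	if (!open)
		return 0; /* no index given: default to entry 0 */

	/* expect at least one digit, then exactly one closing bracket */
	if (open[1] < '0' || open[1] > '9')
		return -1;

	val = strtoul(open + 1, &end, 10);
	if (end[0] != ']' || end[1] != '\0' || val > 0xFFFFu)
		return -1; /* 0xFFFF is an arbitrary bound for this sketch */

	*idx = (unsigned int)val;
	return 0;
}
```

For example, "gpt[2]" yields the name "gpt" with index 2, while "gpc0"
yields the name "gpc0" with index 0.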
2024-12-23 17:32:09 +01:00
Aurelien DARRAGON
ac1f413590 BUG/MINOR: stats: fix segfault caused by uninitialized value in "show schema json"
Since b3d5708 ("MINOR: stats: remove implicit static trash_chunk usage")
a segfault can occur when issuing "show schema json" on the stats socket.

Indeed, now the dumping functions don't rely on trash_chunk anymore, but
instead they rely on the appctx->chunk buffer. However, unlike other
stats dumping commands, the "show schema json" only has an io handler,
and no parse function. With other commands, the parse function is
responsible for pre-setting some data, including applet ctx reservation.

Thus due to "show schema json" lacking parsing function, the applet ctx is
used uninitialized, which is a bug obviously.

To fix the issue we simply add a parse function for "show schema json",
although all it does for now is calling applet_reserve_svcctx() for the
current applet ctx.

This issue was reported by @dsuch in GH #2825. It must be backported up
to 3.0.
2024-12-23 17:32:07 +01:00
William Lallemand
dfc403f5c6 CI: github: activate ASAN on the WolfSSL weekly job
Activate ASAN on the WolfSSL weekly job in order to have use-after-free
traces.
2024-12-23 17:27:27 +01:00
William Lallemand
ef108705e4 CI: github: try to build the latest WolfSSL master weekly
The latest WolfSSL release (5.7.4) is still broken, and no new release
has been made since.

Modify the weekly CI job so we could build with the latest git version.
2024-12-23 17:27:00 +01:00
Olivier Houchard
5b8899b6cc BUG/MEDIUM: queue: Make process_srv_queue return the number of streams
Make process_srv_queue() return the number of streams unqueued, as
pendconn_grab_from_px() did, as that number is used by
srv_update_status() to generate logs.

This should be backported up to 2.6 with
111ea83ed4e13ac3ab028ed5e95201a1b4aa82b8
2024-12-23 15:03:40 +01:00
Ilia Shipitsin
6aae995b1d CI: limit aws-lc and libressl Quic Interop to "haproxy" only
These CI jobs are not supposed to run in forks (however, anyone who wants
them can enable them in their own fork).
2024-12-23 13:59:48 +01:00
William Lallemand
056ec51c26 MEDIUM: ssl/ocsp: counters for OCSP stapling
Add 2 counters in the SSL stats module for OCSP stapling.

- ssl_ocsp_staple is the number of OCSP responses successfully stapled
  with the handshake
- ssl_failed_ocsp_stapled is the number of OCSP responses that we
  couldn't staple, either because of an error or because the
  response is expired.

These counters are incremented in the OCSP stapling callback, so if no
OCSP was configured they will never increase. Also, they only work on
frontends.

This was discussed in github issue #2822.
2024-12-23 11:23:00 +01:00
William Lallemand
6e4dd4c64c MINOR: ssl: rework the error management in the OCSP callback
Use an error label to fail in the OCSP callback, instead of returns
everywhere.
2024-12-23 11:23:00 +01:00
William Lallemand
0e6af97233 MINOR: ssl: change visibility of ssl_stats_module
In order to add stats from other files, the ssl_stats_module needs to be
visible from other files.

This moves the ssl_counters definition in ssl_sock-t.h and removes the
static of ssl_stats_module.
2024-12-23 11:23:00 +01:00
Aurelien DARRAGON
29b6d8af16 MINOR: hlua: rename "tune.lua.preserve-smp-bool" to "tune.lua.bool-sample-conversion"
A better name was found for the option implemented in ec74438
("MINOR: hlua: add option to preserve bool type from smp to lua")

Indeed, "tune.lua.preserve-smp-bool {on | off}" wasn't explicit enough
nor did it encourage the adoption of the new "fixed" behavior (vs
historical behavior which is now considered as a bug).

Thus it becomes "tune.lua.bool-sample-conversion { normal | pre-3.1-bug }"
which actively encourages users to switch to the new behavior after having
patched in-use Lua scripts if needed. From a technical point of view,
the logic remains the same, as the option currently defaults to
"pre-3.1-bug" to prevent script breakage, and a warning is emitted if
the option isn't set explicitly and Lua is used.

Documentation and regtests were updated.

Must be backported in 3.1 with ec74438 and f2838f5 ("REGTESTS: fix
lua-based regtests using tune.lua.smp-preserve-bool")
2024-12-20 17:34:05 +01:00
Amaury Denoyelle
8633446337 BUG/MINOR: h2/rhttp: fix HTTP2 conn counters on reverse
Dedicated HTTP/2 stats proxy counters are available for current and
total number of HTTP/2 connections on both frontend and backend sides.
Both counters are simply incremented into h2_init().

This causes issues when using reverse HTTP. First, increment is not
performed on the expected side, as it is triggered before
h2_conn_reverse() which switches a connection from frontend to backend
or vice versa. For example on active reverse side, h2_total_connections
is incremented on the backend only even after connection is reversed and
attached to a listener for the remainder of its lifetime.

h2_open_connections suffers from a similar but arguably worse behavior
as it is also decremented. If increment and decrement operations are not
performed on the same proxy side, which happens for every connection
which has been successfully reversed, it causes an invalid counter
value, possibly with an integer overflow.

To fix this, delay increment operations on reverse HTTP from h2_init()
to h2_conn_reverse(). Both counters are updated only after reverse has
completed, thus using the expected frontend or backend side.

To prevent overflow on h2_open_connections, ensure h2_release()
decrement is not performed if a connection is freed before achieving its
reversal, as in this case it would not have been accounted by H2
counters.

This should be backported up to 2.9.

This should fix github issue #2821.
2024-12-19 17:32:01 +01:00
Amaury Denoyelle
4490df57a6 CLEANUP: mux-quic: remove dead err label in qcc_build_frms()
STREAM frames emission in qcc_build_frms() has been split from
RESET_STREAM/STOP_SENDING into qcc_emit_rs_ss(). Now, the former cannot
fail, as such err label can be removed as it is unreachable.

This should be backported up to 3.1.

This should fix github issue #2824.
2024-12-19 16:36:33 +01:00
Amaury Denoyelle
7edb2ffae7 BUG/MEDIUM: mux-quic: prevent BUG_ON() by refreshing frms on MAX_DATA
QUIC MUX emission has been optimized recently by recycling STREAM frames
list between emission cycles. This is done via qcc frms list member. If
new data is available, frames list must be cleared before the next
emission to force the encoding of new STREAM frames.

If a frames list refresh is missed, it would lead to incomplete data
emission on the next transfer. In most cases, this is detected via a
BUG_ON() inside qcc_io_send(), as qcs instances remain in send_list
after a qcc_send_frames() full emission.

A bug was recently found which causes this BUG_ON() crash. This is
directly related to flow control. Indeed, when sending credit is
increased on the connection or a stream, frames list should be cleared
as new larger STREAM frames could be encoded. This was already performed
on MAX_DATA/MAX_STREAM_DATA reception but only if flow-control limit was
unblocked. However this is not the proper condition and it may lead to
insufficient frames refresh and thus this BUG_ON() crash.

Fix this by adjusting the condition for frames refresh on flow control
credit increase. Now, frames list is cleared if real offset is not
blocked and soft offset was equal or greater to the previous limit.
Indeed, this is the only case in which frames refreshing is necessary as
it would result in bigger encoded STREAM frames.

This bug was detected on QUIC interop with go-x-net client. It can also
be reproduced, albeit not systematically, using the following command :
  $ ngtcp2-client -q --no-quic-dump --no-http-dump \
    --exit-on-all-streams-close --max-data 10 \
    127.0.0.1 20443 -n10 "http://127.0.0.1:20443/?s=10k"

This bug appeared with the following patch. As it is scheduled for 3.1
backporting, the current fix should be backported with it.
  14710b5e6bf76834343d58db22e00b72590b16fe
  MEDIUM/OPTIM: mux-quic: do not rebuild frms list on every send
2024-12-19 16:36:28 +01:00
Aurelien DARRAGON
f2838f5172 REGTESTS: fix lua-based regtests using tune.lua.smp-preserve-bool
Because of the previous commit, configs making use of lua script without
setting "tune.lua.smp-preserve-bool" explicitly now raise a warning.

However, since 6f746af91 ("REGTESTS: use -dW by default on every
reg-tests"), regtests are not allowed to raise warnings anymore.

Because of this the CI now fails for every test that relies on Lua.
To fix this, let's explicitly set the "tune.lua.smp-preserve-bool" for
all tests involving Lua. Here we set the value to "on" because we know
it is safe to do so, and this way it will be future-proof.

If ec7443827 ("MINOR: hlua: add option to preserve bool type from smp to
lua") is backported, then this patch must be backported with it (if it
is not trivial to backport, then simply follow this rule: grep for
"lua-load" in reg-tests directory, then for each match, make sure to set
the tune.lua.smp-preserve-bool tunable in the global section).
2024-12-19 14:21:35 +01:00
Aurelien DARRAGON
ec74438273 MINOR: hlua: add option to preserve bool type from smp to lua
As discussed in GH #2814, there is an ambiguity in hlua implementation
that causes haproxy smp boolean type to be pushed as an integer on the
Lua stack. On the other hand, when doing Lua to haproxy smp conversion,
the boolean type is properly preserved. Of course this situation is not
desirable and can lead to unexpected results. However we cannot simply
fix the behavior because in Lua, boolean and integer are completely
distinct types and cannot be used interchangeably. So in
order to prevent breaking existing scripts logic, in this patch we add a
dedicated lua tunable named "tune.lua.smp-preserve-bool" which can take
the following values:

  - "on" : when converting haproxy smp to lua, boolean type is preserved
  - "off": when converting haproxy smp to lua, boolean is converted to
           integer (legacy behavior)

For now, the tunable defaults to "off" to preserve historical behavior.
However, when the option isn't set explicitly and lua is used, a warning
will be emitted in order to raise user's awareness about this ambiguity.
It is expected that the tunable could default to "on" in future versions,
thus it is recommended to avoid setting it to "off" except when using
existing Lua scripts that still rely on the old behavior regarding boolean
smp to Lua conversion, and that they cannot be fixed easily.

This should solve issue GH #2814. It may be relevant to backport this in
haproxy 3.1.
2024-12-19 13:50:27 +01:00
Aurelien DARRAGON
67e3270c59 DOC: config: add "tune.lua.burst-timeout" to the list of global parameters
"tune.lua.burst-timeout" was properly defined but not listed in the list
of global parameters as it was overlooked in 58e36e5b1 ("MEDIUM: hlua:
introduce tune.lua.burst-timeout")
2024-12-19 13:50:21 +01:00
Aurelien DARRAGON
985a45d9c7 DOC: config: reorder "tune.lua.*" keywords by alphabetical order
Effort was made to properly organize "tune.*" keywords by alphabetical
order, but "tune.lua" keywords didn't follow that rule with care.

Let's fix that.
2024-12-19 13:50:16 +01:00
Aurelien DARRAGON
48545113f4 DOC: config: add example for server "track" keyword
As requested on GH #2325, "track" server keyword could benefit from a
simple config example to show how to make use of it.

That's what we're doing in this commit, thanks to GH user @HAkmiller
for the suggestion.
2024-12-19 13:50:03 +01:00
William Lallemand
acb2c9eb8b MINOR: ssl: improve HAVE_SSL_OCSP ifdef
Allow to build correctly without OCSP. It can be disabled easily with
an OpenSSL built with OPENSSL_NO_OCSP, or even with
DEFINE="-DOPENSSL_NO_OCSP" on the haproxy make line.
2024-12-19 10:53:05 +01:00
William Lallemand
1c7f5ce32e MEDIUM: ssl/ocsp: OCSP response is expired with OCSP_MAX_RESPONSE_TIME_SKEW
When an OCSP response has a nextUpdate date which is
OCSP_MAX_RESPONSE_TIME_SKEW (300) seconds in the future, the OCSP
stapling callback ssl_sock_ocsp_stapling_cbk() returns SSL_TLSEXT_ERR_NOACK.

However we don't emit an error when trying to load the file.

There is an OCSP_check_validity() check using
OCSP_MAX_RESPONSE_TIME_SKEW, but it only checks that the OCSP response's
thisUpdate is not too far in the past.

This patch emits an error during loading so we don't try to load an OCSP
response which would never be emitted because of OCSP_MAX_RESPONSE_TIME_SKEW.

This was discussed in issue #2822.
2024-12-18 16:14:32 +01:00
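The load-time check described in commit 1c7f5ce32e above can be sketched as follows. This is only an illustration: the helper name sketch_ocsp_expired_at_load() and the exact comparison are assumptions made for this example, and only the OCSP_MAX_RESPONSE_TIME_SKEW value (300) comes from the commit message.

```c
#include <stdint.h>

/* Value quoted in the commit message. */
#define OCSP_MAX_RESPONSE_TIME_SKEW 300

/* Hypothetical helper, not the HAProxy function: returns 1 when a
 * response whose nextUpdate is <next_update> (seconds since the epoch)
 * should be rejected at load time, 0 otherwise. Assumption: the response
 * is treated as expired once fewer than OCSP_MAX_RESPONSE_TIME_SKEW
 * seconds remain before nextUpdate.
 */
int sketch_ocsp_expired_at_load(int64_t next_update, int64_t now)
{
	return (next_update - OCSP_MAX_RESPONSE_TIME_SKEW) < now;
}
```

Rejecting such a response while loading the file surfaces the problem at startup instead of silently never stapling it.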
William Lallemand
6e11d34940 BUILD: ssl/ocsp: error: ‘%.*s’ directive argument is null
Some gcc versions emit an error because a '%.*s' argument has a
NULL parameter. Initialize the string to "" instead.
2024-12-18 11:25:22 +01:00
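The fix in commit 6e11d34940 above follows a common pattern, sketched below. The function sketch_format_error() and its message prefix are hypothetical and only illustrate the idea of initializing the string to "" so that '%.*s' never receives NULL.

```c
#include <stdio.h>

/* Hypothetical helper: formats "update failed: <detail>" into <out>.
 * <detail> may be NULL; the string defaults to "" so that the '%.*s'
 * directive never receives a NULL argument.
 */
int sketch_format_error(char *out, size_t size, const char *detail, int len)
{
	const char *str = "";

	if (detail)
		str = detail;
	else
		len = 0; /* nothing to print for a missing detail */

	return snprintf(out, size, "update failed: %.*s", len, str);
}
```

The precision passed to '%.*s' bounds how many bytes are read, but the pointer itself must still be valid, which is why the default value matters.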
Remi Tricot-Le Breton
93f2c73423 MINOR: ssl/ocsp: Add extra details in error logs when possible
When the ocsp response auto update process fails during insertion or
while validating the received ocsp response, we call
ssl_sock_update_ocsp_response or ssl_ocsp_check_response respectively
and both these functions take an 'err' parameter in which detailed error
messages can be written. Until now, those error messages were discarded
and the only information given to the user was a generic error
(ERR_CHECK or ERR_INSERT) which does not help much.
We now keep a pointer to the last error message in the certificate_ocsp
structure and dump its content in the update logs as well as in the
"show ssl ocsp-updates" cli command.

This issue was raised in GitHub #2817.
2024-12-18 10:41:16 +01:00
William Lallemand
4abedc3fb0 MINOR: ssl/cli: add a 'Uncommitted' status for 'show ssl' commands
Add a 'Uncommitted' status for 'show ssl' commands on the 'Status' line
when accessing a non-empty and uncommitted SSL transaction.

Available with:
- show ssl cert
- show ssl ca-file
- show ssl crl-file
2024-12-18 10:32:26 +01:00
Amaury Denoyelle
53db43aff2 MINOR: mux-quic: hide traces when woken up on pacing only
Previous commit aligned default and pacing emission. This is a cleaner
and more robust code. However, it may disrupt traces analysis when
pacing is rescheduled until timer expiration.

Hide traces when qcc_io_cb() is woken up only due to pacing and timer is
not yet expired. This is implemented by using special TASK_WOKEN_IO for
pacing.

This should be backported up to 3.1.
2024-12-18 09:52:16 +01:00
Amaury Denoyelle
9d155ca706 MINOR: trace: implement tracing disabling API
Define a set of functions to temporarily disable/reactivate tracing for
the current thread. This could be useful when wanting to quickly remove
tracing output for some code parts.

The API relies on a disable/resume set of functions, with a thread-local
counter. This counter is tested under __trace_enabled(). It is a
cumulative value, so as many resume calls must be issued as there were
disable calls. There is also the possibility to force reset the
counter to 0 before restoring the old value.

This should be backported up to 3.1.
2024-12-18 09:52:06 +01:00
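The disable/resume scheme described in commit 9d155ca706 above could look like the sketch below. All function and variable names are hypothetical and do not reflect the actual HAProxy API; the sketch only shows a cumulative thread-local counter with a forced reset that returns the old value for later restoration.

```c
/* Per-thread nesting counter: tracing is enabled only when it is zero.
 * __thread is a GCC/clang extension, used here for brevity.
 */
static __thread unsigned int sketch_trace_disabled;

/* Returns non-zero when tracing is currently allowed on this thread. */
int sketch_trace_enabled(void)
{
	return sketch_trace_disabled == 0;
}

/* Disables tracing; calls may be nested. */
void sketch_trace_disable(void)
{
	sketch_trace_disabled++;
}

/* Undoes one previous sketch_trace_disable() call. */
void sketch_trace_resume(void)
{
	if (sketch_trace_disabled)
		sketch_trace_disabled--;
}

/* Forces the counter to 0 and returns the previous value. */
unsigned int sketch_trace_force_enable(void)
{
	unsigned int old = sketch_trace_disabled;

	sketch_trace_disabled = 0;
	return old;
}

/* Restores a value previously returned by sketch_trace_force_enable(). */
void sketch_trace_restore(unsigned int old)
{
	sketch_trace_disabled = old;
}
```

A counter rather than a boolean lets nested code paths disable tracing independently without re-enabling it too early.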
Amaury Denoyelle
41f0472d96 MEDIUM: mux-quic: remove pacing specific code on qcc_io_cb
Pacing was recently implemented by QUIC MUX. Its tasklet is rescheduled
until next emission timer is reached. To improve performance, an
alternate execution of qcc_io_cb was performed when rescheduled due to
pacing. This was implemented using TASK_F_USR1 flag.

However, this model is fragile, in particular when several events
happened alongside pacing scheduling. This has caused some issue
recently, most notably when MUX is subscribed on transport layer on
receive for handshake completion while pacing emission is performed in
parallel. MUX qcc_io_cb() would not execute the default code path, which
means the reception event is silently ignored.

Recent patches have reworked several parts of qcc_io_cb. The objective
was to improve performance with better algorithm on send and receive
part. Most notable, qcc frames list is only cleared when new data is
available for emission. With this, pacing alternative code is now mostly
unneeded. As such, this patch removes it. The following changes are
performed :

* TASK_F_USR1 is now not used by QUIC MUX. As such, tasklet_wakeup()
  default invocation can now replace obsolete wrappers
  qcc_wakeup/qcc_wakeup_pacing

* qcc_purge_sending is removed. On pacing rescheduling, all qcc_io_cb()
  is executed. This is less error-prone, in particular when pacing is
  mixed with other events like receive handling. This renders the code
  less fragile, as it completely solves the described issue above.

This should be backported up to 3.1.
2024-12-18 09:49:20 +01:00
Amaury Denoyelle
14710b5e6b MEDIUM/OPTIM: mux-quic: do not rebuild frms list on every send
A newly introduced frames list member has been defined into QCC instance
with pacing implementation. This allowed to preserve STREAM frames built
between different emission scheduled by pacing, without having to
regenerate it if no new QCS data is available.

Generalize this principle outside of pacing scheduling. Now, the frames
list will be reused across several qcc_io_send() calls. Frames list is
only cleared when necessary. This will force its refreshing in the next
qcc_io_send() via qcc_build_frms_list().

Frames list refreshing is performed in the following cases :
* on successful transfer from stream snd_buf / done_ff / shut
* on stream reset or read abort
* on max_data/max_stream_data reception with window increase

Note that the two first cases are in fact covered directly due to
qcc_send_stream() usage when QCS is (re)inserted into the send_list.

The main objective of this patch will be to remove QUIC MUX pacing
specific code path. It could also provide better performance as emission
of large frames may often be rescheduled due to transport layer, either
on congestion or full socket buffer. When QUIC MUX is rescheduled, no
new data is available and frames list can be reused as-is, avoiding an
unnecessary loop over send_list.

This should be backported up to 3.1.
2024-12-18 09:49:02 +01:00
Amaury Denoyelle
9ecc1a8e57 MINOR: mux-quic: split STREAM and RS/SS emission
This commit is a follow-up of the previous one which defines function
qcc_build_frms(). This function implements looping over qcc send_list,
to both encode and send individually any STOP_SENDING and RESET_STREAM,
but also encode STREAM frames as a preparatory step. STREAM frames were
then sent as a list outside of qcc_build_frms() via qcc_send_frames().

Extract STOP_SENDING/RESET_STREAM encoding and emission step into a new
function qcc_emit_rs_ss(). The code is thus cleaner. In particular it
highlights that an error during STOP_SENDING/RESET_STREAM emission stage
is fatal and prevents any STREAM frames processing.

This should be backported up to 3.1.
2024-12-18 09:40:21 +01:00
Amaury Denoyelle
244dc00b09 MINOR: mux-quic: extract code to build STREAM frames list
Extracts code responsible to generate STREAM, RESET_STREAM and
STOP_SENDING frames for each qcs instances registered in qcc send_list.
It is moved from qcc_io_send() to its own new function
qcc_build_frms().

This commit does not bring functional change. It is a preparatory step
to adapt QUIC MUX send mechanism to allow reusing of qcc frms list
across qcc_io_send() invocations.

As a side change, qcc_tx_frms_free() is renamed to qcc_clear_frms().
This better highlights its relationship with qcc_build_frms().

This should be backported up to 3.1.
2024-12-18 09:38:19 +01:00
Amaury Denoyelle
e296585ae9 MEDIUM/OPTIM: mux-quic: implement purg_list
This commit is part of the current series which aims to refactor and
improve overall performance of QUIC MUX I/O handler.

qcc_io_process() is responsible to perform some internal operations on
QUIC MUX after I/O completion. It is notably called on every qcc_io_cb()
tasklet handler.

The most intensive work on it is the purging of QCS instances after
transfer completion. This was implemented by looping on QCC streams tree
and inspecting the state of every QCS. The purpose of this commit is to
optimize this processing.

A new purg_list QCC member is defined. It is responsible to list every
QCS instances whose transfer has been completed. It is thus safe to
reuse <el_send> QCS list attach point. Stream purging will thus only
loop on purg_list instead of every known QCS.

This should be backported up to 3.1.
2024-12-18 09:33:52 +01:00
Amaury Denoyelle
4b42dd4ae0 MEDIUM/OPTIM: mux-quic: define a recv_list for demux resumption
This commit is part of the current series which aims to refactor and
improve overall performance of QUIC MUX I/O handler.

Define a recv_list element into qcc structure. This is used to
register every qcs instance currently blocked on demuxing, which
happens when no more space is available in <rx.appbuf>.

The purpose of this patch is to reduce qcc_io_recv() CPU usage. Now,
only recv_list iteration is performed, instead of the previous looping
over every qcs instances. This is useful as qcc_io_recv() is called each
time qcc_io_cb() is scheduled, even if only sending condition was the
wakeup origin.

A qcs is not inserted into recv_list immediately after blocking on demux
full buffer. Instead, this is only done after unblocking via stream
rcv_buf callback, which ensures that new buffer space is available.

This should be backported up to 3.1.
2024-12-18 09:23:41 +01:00
Amaury Denoyelle
0a53a008d0 MINOR: mux-quic: refactor wait-for-handshake support
This commit refactors wait-for-handshake support from QUIC MUX. The flag
logic QC_CF_WAIT_HS is inverted : it is now positioned only if MUX is
instantiated before handshake completion. When the handshake is
completed, the flag is removed.

The flag is now set directly on initialization via qmux_init(). Removal
via qcc_wait_for_hs() is moved from qcc_io_process() to qcc_io_recv().
This is deemed more logical as QUIC MUX is scheduled on RECV to be
notified by the transport layer about handshake termination. Moreover,
qcc_wait_for_hs() is now called if recv subscription is still active.

This commit is the first of a series which aims to refactor QUIC MUX I/O
handler and improve its overall performance. The ultimate objective is
to be able to streamline qcc_io_cb() by removing pacing specific code path
via qcc_purge_sending().

This should be backported up to 3.1.
2024-12-18 09:23:41 +01:00
Amaury Denoyelle
9dcd2369e2 MINOR: quic: add traces
Add some traces to better follow QUIC MUX scheduling, in particular with
pacing interaction.

This should be backported up to 3.1.
2024-12-18 09:20:20 +01:00
Amaury Denoyelle
17bfe93768 CLEANUP: mux-quic: remove unused qcc member send_retry_list
Remove unused fields send_retry_list from qcc and its corresponding
attach element el from qcs.

This should be backported up to 3.1.
2024-12-18 09:20:20 +01:00
Amaury Denoyelle
2e3542bec6 BUG/MEDIUM: mux-quic: do not mix qcc_io_send() return codes with pacing
With pacing implementation, qcc_send_frames() return code has been
extended to report emission interruption due to pacing limitation. This
is used only in qcc_io_send().

However, its invocation may be skipped using 'sent_done' label. This
happens on emission failure of a STOP_SENDING or RESET_STREAM (either
memory allocation failure, or transport layer rejection). In this case,
return values are mixed as qcs_send() is wrongly compared against pacing
interruption condition. This value corresponds to the length of the last
built STREAM frames.

If by mischance the last frame was 1 byte long, qcs_send() return value
is equal to pacing interruption condition. This has several effects. If
pacing is activated, it may lead to unneeded wakeup on QUIC MUX. Worse,
if pacing is not used, a BUG_ON() crash will be triggered.

Fix this by using a different variable dedicated to qcc_send_frames()
return value. By default it is initialized to 0. This ensures that
pacing code won't be activated in case qcc_send_frames() is not used.

This must be backported up to 3.1.
2024-12-18 09:18:48 +01:00
Willy Tarreau
93d4e9d50f CLEANUP: ssl-sock: drop two now unneeded ALREADY_CHECKED()
In ssl_sock_bind_verifycbk() a BUG_ON() checks the validity of "ctx" and
"bind_conf". There was a pair of ALREADY_CHECKED() macros after BUG_ON()
for the case where DEBUG_STRICT=0. But this is now addressed so we can
remove these two macros and rely on the BUG_ON() instead.
2024-12-17 17:47:57 +01:00
Willy Tarreau
7760e3a374 CLEANUP: quic: replace ALREADY_CHECKED() with ASSUME_NONNULL() at a few places
There were 4 instances of ALREADY_CHECKED() used to tell the compiler that
the argument couldn't be NULL by design. Let's change them to the cleaner
ASSUME_NONNULL(). Functions like qc_snd_buf() were slightly reduced in
size (-24 bytes).

Apparently gcc-13 sees a potential case that others don't see, and it's
likely a bug since depending what is masked, it will completely change
the output warnings to the point of contradicting itself. After many
attempts, it appears that just checking that CMSG_FIRSTHDR(msg) is not
null suffices to calm it down, so the strange warnings might have been
the result of an overoptimization based on a supposed UB in the first
place. At least now all versions up to 13.2 as well as clang are happy.
2024-12-17 17:47:57 +01:00
Willy Tarreau
1f93622779 CLEANUP: stats: use ASSUME_NONNULL() to indicate that the first block exists
In stats_scope_ptr(), the validity of blk() was assumed using
ALREADY_CHECKED(blk), but we can now use the cleaner ASSUME_NONNULL().
In addition this simplifies the BUG_ON() check that follows.
2024-12-17 17:47:57 +01:00
Willy Tarreau
6dfd541ca8 CLEANUP: mux-fcgi: use ASSUME_NONNULL() to indicate that the first block exists
In fcgi_snd_buf(), this was previously achieved using
ALREADY_CHECKED(blk), but we can now fold it into the cleaner
ASSUME_NONNULL().
2024-12-17 17:47:57 +01:00
Willy Tarreau
143a103696 CLEANUP: htx: use ASSUME_NONNULL() to mark the start line as non-null
In http_replace_req_uri(), this assumption was previously made using
ALREADY_CHECKED() but the new one is cleaner (and smaller, 24 bytes
less).
2024-12-17 17:47:57 +01:00
Willy Tarreau
a4f50c69e4 CLEANUP: hlua: use ASSUME_NONNULL() instead of ALREADY_CHECKED()
The purpose of the test in hlua_applet_tcp_new() was precisely to
declare non-nullity. Let's just do it using ASSUME_NONNULL() now.
2024-12-17 17:47:57 +01:00
Willy Tarreau
29b2c5d4d4 CLEANUP: cache: use ASSUME_NONNULL() instead of DISGUISE()
DISGUISE() was used to avoid a NULL warning. Using ASSUME_NONNULL()
instead makes it clearer and made the function slightly shorter.
2024-12-17 17:42:11 +01:00
Willy Tarreau
7b6acb6a51 MINOR: bug: make BUG_ON() fall back to ASSUME
When the strict level is zero and BUG_ON() is not implemented, some
possible null-deref warnings are emitted again because some were
covering for these cases. Let's make it fall back to ASSUME() so that
the compiler continues to know that the tested expression never happens.
It also allows to further optimize certain functions by helping the
compiler eliminate certain tests for impossible values. However it
requires that the expression is really evaluated before passing the
result through ASSUME() otherwise it was shown that gcc-11 and above
will fail to evaluate its implications and will continue to emit the
null-deref warnings in case the expression is non-trivial (e.g. it
has multiple terms).

We don't do it for BUG_ON_HOT() however because the extra cost of
evaluating the condition is generally not welcome in fast paths,
particularly when that BUG_ON_HOT() was kept disabled for
performance reasons.
2024-12-17 17:39:12 +01:00
Willy Tarreau
63798088b3 MINOR: compiler: add ASSUME_NONNULL() to tell the compiler a pointer is valid
At plenty of places we have ALREADY_CHECKED() or DISGUISE() on a pointer
just to avoid "possibly null-deref" warnings. These ones have the side
effect of weakening optimizations by passing through an assembly step.
Using ASSUME_NONNULL() we can avoid that extra step. And when the
__builtin_unreachable() builtin is not present, we fall back to the old
method using assembly. The macro returns the input value so that it may
be used both as a declarative way to claim non-nullity or directly inside
an expression like DISGUISE().
2024-12-17 16:46:46 +01:00
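The idea behind ASSUME_NONNULL() in commit 63798088b3 above can be sketched as below. This is not the HAProxy definition: the SKETCH_ASSUME_NONNULL name is hypothetical, the macro relies on GNU statement expressions, and the fallback simply returns its argument instead of using the assembly-based method mentioned in the commit message.

```c
#include <stddef.h>

/* Compilers without __has_builtin() are treated as having no builtin. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_unreachable)
/* Tells the compiler <p> is never NULL and yields <p> itself, so the
 * macro can be used directly inside an expression.
 */
#define SKETCH_ASSUME_NONNULL(p)                        \
	({                                              \
		__typeof__(p) __sk_p = (p);             \
		if (__sk_p == NULL)                     \
			__builtin_unreachable();        \
		__sk_p;                                 \
	})
#else
/* Simplified fallback for this sketch only: plain pass-through. */
#define SKETCH_ASSUME_NONNULL(p) (p)
#endif

struct sketch_item {
	int value;
};

/* The caller guarantees <item> is valid; the macro documents that and
 * lets the compiler skip any NULL handling it would otherwise consider.
 */
int sketch_item_value(struct sketch_item *item)
{
	return SKETCH_ASSUME_NONNULL(item)->value;
}
```

Because the macro evaluates to its argument, it can replace a separate check-then-use sequence with a single expression.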
Willy Tarreau
2ce63b7b17 MINOR: compiler: also enable __builtin_assume() for ASSUME()
Clang apparently has __builtin_assume() which does exactly the same
as our macro, since at least v3.8. Let's enable it, in case it may
even better detect assumptions vs unreachable code.
2024-12-17 16:46:46 +01:00
Willy Tarreau
efc897484b MINOR: compiler: add a new "ASSUME" macro to help the compiler
This macro takes an expression, tests it and calls an unreachable
statement if false. This allows the compiler to know that such a
combination does not happen, and totally eliminate tests that would
be related to this condition. When the statement is not available
in the compiler, we just perform a break from a do {} while loop
so that the expression remains evaluated if needed (e.g. function
call).
2024-12-17 16:46:46 +01:00
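A minimal sketch of the ASSUME pattern described in commit efc897484b above is shown here. The SKETCH_ASSUME name and the sketch_scale() function are hypothetical; the structure (test the expression, call an unreachable statement if false, otherwise break out of a do {} while loop) follows the commit message.

```c
/* Compilers without __has_builtin() are treated as having no builtin. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_unreachable)
/* The false branch is declared impossible, so the compiler may drop any
 * code that only matters when <expr> does not hold.
 */
#define SKETCH_ASSUME(expr) do { if (!(expr)) __builtin_unreachable(); } while (0)
#else
/* No unreachable statement: the expression is still evaluated. */
#define SKETCH_ASSUME(expr) do { if (!(expr)) break; } while (0)
#endif

/* Shifts <value> left by <shift>, which callers guarantee is below 8. */
unsigned int sketch_scale(unsigned int value, unsigned int shift)
{
	SKETCH_ASSUME(shift < 8);
	return value << shift;
}
```

Passing a value that violates the assumption is undefined behavior when the builtin is available, so such a macro is only suitable for conditions that are guaranteed by design.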
Willy Tarreau
41fc18b1d1 MINOR: compiler: rely on builtin detection for __builtin_unreachable()
Due to __builtin_unreachable() only being associated to gcc 4.5 and
above, it turns out it was not enabled for clang. It's not used *that*
much but still a little bit, so let's enable it now. This reduces the
code size by 0.2% and makes it a bit more efficient.
2024-12-17 16:46:46 +01:00
Willy Tarreau
96cfcb1df3 MINOR: compiler: add a __has_builtin() macro to detect features more easily
We already have a __has_attribute() macro to detect when the compiler
supports a specific attribute, but we didn't have the equivalent for
builtins. clang-3 and gcc-10 have __has_builtin() for this. Let's just
bring it using the same mechanism as __has_attribute(), which will allow
us to simply define the macro's value for older compilers. It will save
us from keeping that many compiler-specific tests that are incomplete
(e.g. the __builtin_unreachable() test currently doesn't cover clang).
2024-12-17 16:46:46 +01:00
Willy Tarreau
4710ab5604 BUILD: debug: only dump/reset glitch counters when really defined
If neither DEBUG_GLITCHES nor DEBUG_STRICT is set, we end up with
no dbg_cnt section, resulting in debug_parse_cli_counters not
building due to __stop_dbg_cnt and __start_dbg_cnt not being defined.
Let's just condition the end of the function to these conditions.
An alternate approach (less elegant) is to always declare a dummy
entry of type DBG_COUNTER_TYPES in debug.c.

This must be backported to 3.1 since it was brought with glitches.
2024-12-17 16:46:25 +01:00
Olivier Houchard
b3cd5a4b86 CLEANUP: queues: Remove pendconn_grab_from_px().
pendconn_grab_from_px() is now unused, so just remove it.
2024-12-17 16:05:44 +01:00
Olivier Houchard
111ea83ed4 BUG/MEDIUM: queues: Do not use pendconn_grab_from_px().
pendconn_grab_from_px() was called when a server was brought back up, to
get some streams waiting in the proxy's queue and get them to run on the
newly available server. It is very similar to process_srv_queue(),
except it only goes through the proxy's queue, which can be a problem,
because there is a small race condition that could lead us to add more
streams to the server queue just as it's going down. If that happens,
the server would just be ignored when back up by new streams, as its
queue is not empty, and it would never try to process its queue.
The other problem with pendconn_grab_from_px() is that it is very
liberal with how it dequeues streams, and it is not very good at
enforcing maxconn, it could lead to having 3*maxconn connections.
For both those reasons, just get rid of pendconn_grab_from_px(), and
just use process_srv_queue().
Both problems are easy to reproduce, especially on a 64 threads machine,
set a maxconn to 100, inject in H2 with 1000 concurrent connections
containing up to 100 streams each, and after a few seconds/minutes the
max number of concurrent output streams will be much higher than
maxconn, and eventually the server will stop processing connections.

It may be related to github issue #2744. Note that it doesn't totally
fix the problem, we can occasionally see a few more connections than
maxconn, but the max that have been observed is 4 more connections, we
no longer get multiple times maxconn.

This should be backported up to 2.6.
2024-12-17 16:05:44 +01:00
Olivier Houchard
dc9ce9c264 BUG/MEDIUM: queues: Make sure we call process_srv_queue() when leaving
In stream_free(), make sure we call process_srv_queue() each time we
call sess_change_server(), otherwise a server may end up not dequeuing
any stream when it could do so. In some extreme cases it could lead to
an infinite loop, as the server would appear to be available, as its
"served" parameter would be < maxconn, but would end up not being used,
as there are elements still in its queue.

This should be backported up to 2.6.
2024-12-17 16:05:44 +01:00
Christopher Faulet
4f32d03360 BUG/MEDIUM: stconn: Only consider I/O timers to update stream's expiration date
In sc_notify(), it remained a case where it was possible to set an
expiration date on the stream in the past, leading to a crash because of a
BUG_ON(). This must never happen of course.

In sc_notify(), the stream's expiration may be updated in case no wakeup
conditions are encountered. In that case, we must take care to never set an
expiration date in the past. However, it appeared there was still a
condition to do so. This code is based on an implicit postulate: the
stream's expiration date must always be set when we leave
process_stream(). It was true since the 2.9. But in 3.0, the buffer
allocation mechanism was improved and on an alloc failure in
process_stream(), the stream is inserted in a wait-list and its expiration
date is set to TICK_ETERNITY. With the right timing, and an analysis
expiration date set on a channel, it is possible to set the stream's
expiration date in the past.

After analysis, it appeared that the proper way to fix the issue is to only
evaluate I/O timers (read and write timeout) and not stream's timers
(analyse_exp or conn_exp) because only I/O timers may have changed since the
last process_stream() call.

This patch must be backported as far as 3.0 to fix the issue. But it is
probably a good idea to also backport it as far as 2.8.
2024-12-16 17:47:25 +01:00
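The approach described in commit 4f32d03360 above, where only the read and write timers are merged into the stream's expiration date, can be illustrated with the sketch below. The function names are hypothetical, and the value 0 for "no expiration" as well as the wrap-around comparison are assumptions made for this example, not a copy of the HAProxy tick helpers.

```c
#include <stdint.h>

/* Assumed marker for "no expiration date" in this sketch. */
#define SKETCH_TICK_ETERNITY 0u

/* Returns the earliest of two ticks, ignoring unset ones. The signed
 * difference keeps the comparison valid across 32-bit wrap-around.
 */
uint32_t sketch_tick_first(uint32_t t1, uint32_t t2)
{
	if (t1 == SKETCH_TICK_ETERNITY)
		return t2;
	if (t2 == SKETCH_TICK_ETERNITY)
		return t1;
	return ((int32_t)(t1 - t2) <= 0) ? t1 : t2;
}

/* Hypothetical update: only the read and write timers are merged into
 * the current expiration date <cur>; stream-level timers are left out
 * because they cannot have changed since the last full processing.
 */
uint32_t sketch_update_expire(uint32_t cur, uint32_t read_exp, uint32_t write_exp)
{
	uint32_t io = sketch_tick_first(read_exp, write_exp);

	return sketch_tick_first(cur, io);
}
```

If no timer is set at all, the result stays at the "eternity" marker, which matches the situation described for a stream parked in a wait-list.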
William Lallemand
e3b760ebcc BUG/MINOR: ssl/cli: 'show ssl ca-file' escape the first '*' of a filename
When doing a 'show ssl ca-file <filename>', prefixing a filename with a '*'
allows to show the uncommitted transaction associated to this filename.

However for people using '*' as the first character of their
filename, there is no way to access this filename.

This patch fixes the problem by allowing to escape the first
character with \.

This should be backported to all stable branches.
2024-12-16 17:09:34 +01:00
William Lallemand
82c83a11a1 BUG/MINOR: ssl/cli: 'show ssl crl-file' escape the first '*' of a filename
When doing a 'show ssl crl-file <filename>', prefixing a filename with a '*'
allows to show the uncommitted transaction associated to this filename.

However for people using '*' as the first character of their
filename, there is no way to access this filename.

This patch fixes the problem by allowing to escape the first
character with \.

This should be backported to all stable branches.
2024-12-16 16:46:52 +01:00
William Lallemand
2ba4cf541b BUG/MINOR: ssl/cli: 'show ssl cert' escape the first '*' of a filename
When doing a 'show ssl cert <filename>', prefixing a filename with a '*'
allows to show the uncommitted transaction associated to this filename.

However for people using '*' as the first character of their filename,
there is no way to access this filename.

This patch fixes the problem by allowing to escape the first character
with \.

This should be backported to all stable branches.
2024-12-16 16:17:12 +01:00
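The filename handling described in the three 'show ssl' fixes above (cert, crl-file and ca-file) can be sketched as follows. The function sketch_parse_filename() is hypothetical and only shows the rule stated in the commit messages: a leading '*' selects the uncommitted transaction, and a leading backslash escapes the first character.

```c
/* Hypothetical parser for the <filename> argument. Sets *uncommitted to
 * 1 when a leading '*' selects the uncommitted transaction, and returns
 * the filename to look up. A leading '\' is dropped so that the next
 * character, typically '*', is taken literally.
 */
const char *sketch_parse_filename(const char *arg, int *uncommitted)
{
	*uncommitted = 0;

	if (arg[0] == '*') {
		*uncommitted = 1;
		return arg + 1;
	}
	if (arg[0] == '\\')
		return arg + 1;

	return arg;
}
```

With this rule, "*site.pem" refers to the uncommitted transaction of site.pem, while "\*site.pem" refers to a committed file literally named "*site.pem".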
William Lallemand
fd35b7fb97 MINOR: ssl/cli: add -A to the 'show ssl sni' command description
Add [-A] to the 'show ssl sni' command description.
2024-12-16 15:22:27 +01:00
William Lallemand
7c8e38d4d6 MINOR: ssl/cli: allow to filter expired certificates with 'show ssl sni'
-A option in 'show ssl sni' shows certificates that are past the
notAfter date.

The patch reworks the option parsing to accept multiple options.
2024-12-16 14:55:23 +01:00
William Lallemand
bb88f68cf7 MINOR: ssl: add utils functions to extract X509 notAfter date
Add ASN1_to_time_t() which converts an ASN1_TIME to a time_t and
x509_get_notafter_time_t() which returns the notAfter date in time_t
format.
2024-12-16 14:54:53 +01:00
Valentine Krasnobaeva
fbc534a6fa REORG: startup: move nofile limit checks in limits.c
Let's encapsulate the code, which checks the applied nofile limit into
a separate helper check_nofile_lim_and_prealloc_fd(). Let's keep in this new
function scope the block, which tries to create a copy of FD with the highest
number, if prealloc-fd is set in the configuration.
2024-12-16 10:44:01 +01:00
Valentine Krasnobaeva
14f5e00d38 REORG: startup: move code that applies limits to limits.c
In step_init_3() we try to apply provided or calculated earlier haproxy
maxsock and memmax limits.

Let's encapsulate these code blocks in dedicated functions:
apply_nofile_limit() and apply_memory_limit() and let's move them into
limits.c. Limits.c gathers now all the logic for calculating and setting
system limits in dependency of the provided configuration.
2024-12-16 10:44:01 +01:00
Valentine Krasnobaeva
1332e9b58d REORG: startup: move global.maxconn calculations in limits.c
Let's encapsulate the code, which calculates global.maxconn and
global.maxsslconn into a dedicated function set_global_maxconn() and let's
move this function in limits.c. In limits.c we keep helpers to calculate and
check haproxy internal limits, based on the system nofile and memory limits.
2024-12-16 10:44:01 +01:00
Frederic Lecaille
949bc18f66 CLEANUP: quic: Rename some BBR functions in relation with bw probing
Rename bbr_is_probing_bw() to bbr_is_in_a_probe_state() and
bbr_is_accelerating_probing_bw() to bbr_is_probing_bw() to match
the function names of the BBR v3 internet draft.

Must be backported to 3.1 to ease any further backport to come.
2024-12-13 19:41:21 +01:00
Frederic Lecaille
0dc0c890ea BUG/MINOR: quic: missing Startup accelerating probing bw states
Startup state is also a probing with acceleration bandwidth state.
This modification should have come with this previous one:

  BUG/MINOR: quic: reduce packet losses at least during ProbeBW_CRUISE (BBR)

Must be backported to 3.1.
2024-12-13 19:41:21 +01:00
Valentine Krasnobaeva
ea4a148a7d REGTESTS: ssl: add a PEM with mix of LF and CRLF line endings
User tried to update a PEM, generated automatically. Part of this PEM has LF
line endings, and another part (CA certificate), added by some API, has CRLF
line endings. This has revealed a bug in cli_snd_buf(), see more
details in GitHub issue #2818. So, let's add an example of such PEM in our
SSL regtest.
2024-12-13 18:13:42 +01:00
Valentine Krasnobaeva
d60c893991 BUG/MINOR: cli: cli_snd_buf: preserve \r\n for payload lines
cli_snd_buf() analyzes input line by line. Before this patch it has always
scanned a given line for the presence of '\r' followed by '\n'.

This is only needed for strings that contain the command itself, like
"show ssl cert\n", "set ssl cert test.pem <<\n".

In case of strings, which contain the command's payload, like
"-----BEGIN CERTIFICATE-----\r\n", '\r\n' should be preserved
as is.

This patch fixes the GitHub issue #2818.

This patch should be backported in v3.1 and in v3.0.
2024-12-13 18:13:42 +01:00
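The distinction made in commit d60c893991 above between command lines and payload lines can be sketched as below. The function sketch_cli_line_len() is hypothetical and is not the cli_snd_buf() code. It assumes that the caller already knows whether the current line belongs to a payload, and that payload lines are forwarded unchanged.

```c
#include <stddef.h>

/* Returns how many bytes of <line> (of length <len>) should be kept.
 * Command lines lose their trailing "\n" or "\r\n"; payload lines are
 * kept as-is so that their original line endings are preserved.
 */
size_t sketch_cli_line_len(const char *line, size_t len, int in_payload)
{
	if (in_payload)
		return len;

	if (len && line[len - 1] == '\n')
		len--;
	if (len && line[len - 1] == '\r')
		len--;

	return len;
}
```

Keeping payload bytes untouched matters for PEM content mixing LF and CRLF endings, which is the case described in issue #2818.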
Frederic Lecaille
178109f608 BUG/MINOR: quic: too permissive exit condition for high loss detection in Startup (BBR)
This bug fixes the 3rd condition used by bbr_check_startup_high_loss() to decide
it has detected some high loss as mentioned by the BBR v3 RFC draft:

   4.3.1.3. Exiting Startup Based on Packet Loss
   ...
   There are at least BBRStartupFullLossCnt=6 discontiguous sequence ranges lost in that round trip.

where a <= operator was used in place of <.

Must be backported to 3.1.
2024-12-13 14:42:43 +01:00
Frederic Lecaille
e61b418907 BUG/MINOR: quic: fix the wrong tracked recovery start time value
bbr_congestion_event() role is to track the start time of recovery periods.
This was done using <ts> passed as parameter. But this parameter is the
time the newest lost packet has been sent.
The timestamp value to store in ->recovery_start_ts is <now_ms>.

Must be backported to 3.1.
2024-12-13 14:42:43 +01:00
Frederic Lecaille
e1d25cdbdd CLEANUP: quic: remove a wrong comment about ->app_limited (drs)
->app_limited quic_drs struct member is not a boolean. This is
the index of the last transmitted packet marked as application-limited, or 0 if
the connection is not currently application-limited (see C.app_limited
definition in BBR v3 draft).
2024-12-13 14:42:43 +01:00
Frederic Lecaille
eeaeb412dc MINOR: quic: reduce the private data size of QUIC cc algos
After these commits:

    BUG/MINOR: quic: remove max_bw filter from delivery rate sampling
    BUG/MINOR: quic: fix BBR max bandwidth oscillation issue

where some members were removed from bbr struct, the private data
size of QUIC cc algorithms may be reduced from 160 to 144 uint32_t.

Should be easily backported to 3.1 alongside the commits mentioned above.
2024-12-13 14:42:43 +01:00
Frederic Lecaille
9813de0537 BUG/MINOR: quic: reduce packet losses at least during ProbeBW_CRUISE (BBR)
Upon congestion events (for instance packet loss),
bbr_adapt_lower_bounds_from_congestion() role is to adapt some BBR internal
variables in relation with the estimated bandwidth (BBR.bw).

According to the BBR v3 draft, this function should do nothing
if BBRIsProbingBW() pseudo-code returns true. That said, this function
is not defined by the BBR v3 draft. But according to this part mentioned before
defining the pseudo-code for BBRAdaptLowerBoundsFromCongestion():

4.5.10.3. When not Probing for Bandwidth
When not explicitly accelerating to probe for bandwidth (Drain, ProbeRTT,
ProbeBW_DOWN, ProbeBW_CRUISE), BBR responds to loss by slowing down to some extent.
This is because loss suggests that the available bandwidth and safe volume of
in-flight data may have decreased recently, and the flow needs to adapt, slowing
down toward the latest delivery process. BBR flows implement this response by
reducing the short-term model parameters, BBR.bw_lo and BBR.inflight_lo.

BBRIsProbingBW() should concern the accelerating probe for bandwidth states
which are BBR_ST_PROBE_BW_REFILL and BBR_ST_PROBE_BW_UP.

Adapt the code to match this latter assumption. This drastically reduces
the packet loss volumes, at least during ProbeBW_CRUISE.
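
A minimal sketch of the predicate described above. Only the
BBR_ST_PROBE_BW_REFILL and BBR_ST_PROBE_BW_UP names come from this message; the
other enumerators, the enum name and the function name are hypothetical and
only serve the illustration:

```c
/* Hypothetical state enum: only the REFILL and UP names come from the
 * commit message, the others are placeholders.
 */
enum bbr_state_sketch {
	BBR_ST_STARTUP,
	BBR_ST_DRAIN,
	BBR_ST_PROBE_BW_DOWN,
	BBR_ST_PROBE_BW_CRUISE,
	BBR_ST_PROBE_BW_REFILL,
	BBR_ST_PROBE_BW_UP,
	BBR_ST_PROBE_RTT,
};

/* Returns 1 only for the states which accelerate to probe for bandwidth,
 * i.e. the states in which the lower bounds must not be adapted on loss.
 */
static int bbr_is_accelerating_probe_sketch(enum bbr_state_sketch st)
{
	return st == BBR_ST_PROBE_BW_REFILL || st == BBR_ST_PROBE_BW_UP;
}
```

With such a predicate, ProbeBW_CRUISE and ProbeBW_DOWN are treated as "not
probing", so loss still lowers the short-term model parameters there.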

As an example, on a 100 Mbit/s internet link with ~94ms RTT, before
this patch, 4329640 packets had to be sent, 1617119 of which were lost (!!!), to
download a 3GB object. After this patch, 2843952 packets are sent and 144134
are lost. There may still be some packet loss issue. I suspect the maximum bandwidth
may be overestimated: the more it is overestimated, the higher the packet loss.
That said, at this time, it remains below 5% depending on the size of the objects,
5% being for objects larger than 2GB.

Must be backported to 3.1.
2024-12-13 14:42:43 +01:00
Frederic Lecaille
ebfc301d5d BUG/MINOR: quic: underflow issue for bbr_inflight_hi_from_lost_packet()
Add a test to ensure that the value of a local variable used by
bbr_inflight_hi_from_lost_packet() is not impacted by underflow issues
when subtracting too big numbers, and make this function return a correct value.
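
The exact test added to the function is not shown here. As a generic
illustration of this kind of guard on unsigned arithmetic (the helper name is
hypothetical), a subtraction may be clamped instead of being left to wrap:

```c
#include <stdint.h>

/* Subtracts <b> from <a>, clamping the result to 0 instead of letting the
 * unsigned subtraction wrap around to a huge value.
 */
static uint64_t sub_clamped_sketch(uint64_t a, uint64_t b)
{
	return a > b ? a - b : 0;
}
```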

Must be backported to 3.1.
2024-12-13 14:42:43 +01:00
Frederic Lecaille
22ab45a3a8 BUG/MINOR: quic: remove max_bw filter from delivery rate sampling
This filter is no longer needed after this commit:

 BUG/MINOR: quic: fix BBB max bandwidth oscillation issue.

Indeed, this filter was added at delivery rate sampling level to filter
the BBR max bandwidth estimations, and was inspired by the ngtcp2 source code when
trying to fix the oscillation issue. But this BBR max bandwidth oscillation issue
was fixed by the aforementioned commit.

Furthermore this code tends to always increment the BBR max bandwidth. From my point
of view, this is not a good idea at all.

Must be backported to 3.1.
2024-12-13 14:42:43 +01:00
Frederic Lecaille
2bcd5b4cba BUG/MINOR: quic: wrong bbr_target_inflight() implementation
This bug arrived with this commit:

  6404b7a18a BUG/MINOR: quic: fix bbr_inflight() calls with wrong gain value

This patch partially reverts it after having checked the BBR v3 draft.
This bug was invisible when testing long BBR flows.

Must be backported to 3.1.
2024-12-13 14:42:43 +01:00
Frederic Lecaille
b47e1e65df BUG/MINOR: quic: fix BBB max bandwidth oscillation issue.
Remove the code in relation with BBR.ack_phase as per this commit:
ee98c12ad6

I do not know at this time why such a request was pushed on GH for the BBR v3 draft
pseudo-code. That said, the use of such an ack phase seemed confusing, adding much
more information about a BBR flow state than needed. Indeed, the ack phase
state is modified several times in the BBR draft pseudo-code but only used to
decide if the max bandwidth filter virtual clock had to be incremented by
BBRAdvanceMaxBwFilter().

In addition to this, when discussing the haproxy BBR implementation with
Neal Cardwell on the BBR development google group about an oscillation issue
of the max bandwidth (BBR.max_bw), I concluded that this was due to the fact
that its filter virtual clock was updated too often, because the ack phase
was stalled in BBR_ACK_PHASE_ACKS_PROBE_STOPPING state for too long. This is
where Neal asked me to test the aforementioned commit. This definitely
makes the max bandwidth (BBR.max_bw) oscillation issue disappear.

Another solution would have been to add a new ack phase enum after
BBR_ACK_PHASE_ACKS_PROBE_STOPPING. BBR_ACK_PHASE_ACKS_PROBE_STOPPED
would have been a good candidate.

Remove the code in relation with BBR.ack_phase.

Must be backported to 3.1.
2024-12-13 14:42:43 +01:00
Frederic Lecaille
1dbf6b8bed BUG/MINOR: quic: wrong logical statement in in_recovery_period() (BBR)
A && logical operator was mistakenly replaced by a || in this function which decides
if BBR is in a recovery period.

Must be backported to 3.1.
2024-12-13 14:42:43 +01:00
Frederic Lecaille
a9a2f98f86 MINOR: window_filter: rely on the time to update the filter samples (QUIC/BBR)
The windowed filters are only used by the BBR implementation for QUIC to filter
the maximum bandwidth samples for its estimation over a virtual time interval
tracked by counting the cyclical progression through ProbeBW cycles. ngtcp2
and quiche use such windowed filters in their BBR implementation, but in a
slightly different way: when updating the 2nd or 3rd filter samples, this
is done based on their values instead of the time they have been sampled.
It seems more logical to rely on the sample timestamps even if this has no
implication, because when a sample is updated using another sample because it
has the same value, they both have the same timestamp!

This patch replaces two statements which compare two consecutive filter samples
based on their values (smp[]->v) with statements which compare them based on the
virtual time they have been sampled (smp[]->t). This fully complies with the
code used by the Linux kernel in lib/win_minmax.c.
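
For illustration, here is a self-contained sketch of a three-sample windowed
max filter written after the algorithm of lib/win_minmax.c. The struct and
function names are made up for this example and are not haproxy's; the two
timestamp comparisons discussed above are marked in the comments:

```c
#include <stdint.h>

struct wf_smp_sketch { uint32_t t; uint32_t v; };  /* virtual time, value */
struct wf_sketch { uint32_t win; struct wf_smp_sketch smp[3]; };

/* Forgets all samples and restarts the window from (<t>, <v>). */
static uint32_t wf_reset_sketch(struct wf_sketch *wf, uint32_t t, uint32_t v)
{
	struct wf_smp_sketch s = { .t = t, .v = v };

	wf->smp[0] = wf->smp[1] = wf->smp[2] = s;
	return v;
}

/* Feeds a new sample and returns the current windowed maximum. */
static uint32_t wf_max_update_sketch(struct wf_sketch *wf, uint32_t t, uint32_t v)
{
	struct wf_smp_sketch s = { .t = t, .v = v };
	uint32_t dt;

	/* new overall max, or nothing left in the window: start over */
	if (v >= wf->smp[0].v || t - wf->smp[2].t > wf->win)
		return wf_reset_sketch(wf, t, v);

	if (v >= wf->smp[1].v)
		wf->smp[2] = wf->smp[1] = s;
	else if (v >= wf->smp[2].v)
		wf->smp[2] = s;

	dt = t - wf->smp[0].t;
	if (dt > wf->win) {
		/* the best sample expired: promote the 2nd and 3rd ones */
		wf->smp[0] = wf->smp[1];
		wf->smp[1] = wf->smp[2];
		wf->smp[2] = s;
		if (t - wf->smp[0].t > wf->win) {
			wf->smp[0] = wf->smp[1];
			wf->smp[1] = wf->smp[2];
			wf->smp[2] = s;
		}
	} else if (wf->smp[1].t == wf->smp[0].t && dt > wf->win / 4) {
		/* timestamp comparison: no 2nd choice after a quarter window */
		wf->smp[2] = wf->smp[1] = s;
	} else if (wf->smp[2].t == wf->smp[1].t && dt > wf->win / 2) {
		/* timestamp comparison: no 3rd choice after half a window */
		wf->smp[2] = s;
	}
	return wf->smp[0].v;
}
```

With a window of 10 time units and a first sample of 100 at time 0, smaller
samples keep 100 as the maximum until it leaves the window, at which point the
best of the remembered later samples takes over.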

Also take the opportunity of this patch to shorten some statements, using the <smp>
local variable value to update the smp[2] sample instead of initializing its two
members with the <smp> member values.

This patch SHOULD be easily backported to 3.1 where BBR was first implemented.
2024-12-13 14:42:43 +01:00
William Lallemand
0c1fdb2908 CI: github: let's add an AWS-LC-FIPS job
Add a job which does exactly the same as the aws-lc.yml job, but using
the AWS-LC-FIPS build.
2024-12-12 16:35:42 +01:00
William Lallemand
0107bfdb1a MEDIUM: ssl: rename 'OpenSSL' by 'SSL library' in haproxy -vv
We have been compatible with multiple SSL libraries for some time now,
let's rename the "OpenSSL library" strings to "SSL library" in
haproxy -vv, in order to be more generic.
2024-12-12 15:58:57 +01:00
William Lallemand
f97ffb9ec4 MINOR: ssl: add "FIPS" details in haproxy -vv
Add the FIPS mode in haproxy -vv. It needs to be activated on the system
with openssl.cnf or by compiling the SSL library with the right options.

Can't work with OpenSSL >= 3.0 because FIPS is a "provider" to load there. Works
with AWS-LC, WolfSSL and OpenSSL 1.1.1.
2024-12-12 15:57:38 +01:00
William Lallemand
23f670f1f5 CI: scripts: add support for AWS-LC-FIPS in build-ssl.sh
Allow the build-ssl.sh script to build AWS-LC-FIPS.

Example:

  sudo AWS_LC_FIPS_VERSION=3.0.0 BUILDSSL_DESTDIR=/opt/awslc-fips-3.0.0/ ./scripts/build-ssl.sh
2024-12-12 15:57:30 +01:00
Amaury Denoyelle
ee7241ed18 MINOR: stats: use stress mode to force reentrant dumps
Provide alternative code during stats dump when stress mode is active.
The objective is to force the applet to yield on every output line. This
allows to easily test reentrant code paths, in particular while adding
and removing server instances.

To support this, output is interrupted every time the output buffer (or
its equivalent) is not empty. Use COND_STRESS() macro to provide default
and stress alternative conditions.
2024-12-12 11:26:33 +01:00
Amaury Denoyelle
1f458b3ea8 MINOR: applet: define applet_putchk_stress() alternative
Previous patch introduced stress mode to be able to easily test
alternative code paths.

The first point would be to force interruption of stats dump on every
line and check reentrant paths, in particular while adding and removing
server instances.

The purpose of this patch is to be able to use applet_putchk_stress()
during stats dump while not impacting other applets. To support this,
extract applet_putchk() into an internal _applet_putchk() which has a
new argument stress. Define two helpers applet_putchk() and
applet_putchk_stress(), the latter to set the stress argument to true.
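
The wrapper pattern described above can be sketched as follows. This is a toy
model and not the real applet API: the buffer type, the function names and the
behaviour chosen for the stress case (refusing to append to a non-empty buffer
so that the caller has to yield) are assumptions made for this illustration:

```c
#include <stddef.h>
#include <string.h>

/* Toy output buffer standing in for an applet output channel. */
struct out_sketch {
	char data[64];
	size_t len;
};

/* Common helper: appends <s> to <o>. When <stress> is non-zero, refuses to
 * append to a non-empty buffer so that the caller must yield after each line.
 * Returns the number of bytes appended, or -1 if the caller must retry later.
 */
static int putchk_common_sketch(struct out_sketch *o, const char *s, int stress)
{
	size_t l = strlen(s);

	if (stress && o->len)
		return -1;
	if (l > sizeof(o->data) - o->len)
		return -1;
	memcpy(o->data + o->len, s, l);
	o->len += l;
	return (int)l;
}

/* Default variant, used by every caller which does not care about stress. */
static int putchk_sketch(struct out_sketch *o, const char *s)
{
	return putchk_common_sketch(o, s, 0);
}

/* Stress variant, only used by the callers which opt in. */
static int putchk_stress_sketch(struct out_sketch *o, const char *s)
{
	return putchk_common_sketch(o, s, 1);
}
```

Keeping two thin helpers around a single internal function means other callers
are not impacted: only the call sites switched to the stress variant change
behaviour.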

For the moment, applet_putchk_stress() is not used. This will be the
subject of the next patch.
2024-12-12 11:26:33 +01:00
Amaury Denoyelle
9d19fc4cf7 MINOR: build: define DEBUG_STRESS
Define a new build mode DEBUG_STRESS. This will be used to stress some
code parts which cannot be easily exercised otherwise, by running an
alternative suboptimal code.

First, a global <mode_stress> is set either to 1 or 0 depending on
DEBUG_STRESS compilation. A new global keyword "stress-level" is also
defined. It allows to specify a level from 0 to 9, to increase the
stress incurred on the code.

Helper macro STRESS_RUN* are defined for each stress level. This allows
to easily specify an instruction in default execution and a stress
counterpart if running on the corresponding stress level.
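
A rough sketch of how such a helper could look. The macro and variable names
below are hypothetical (the real ones are STRESS_RUN* and are not reproduced
here), and a single macro taking the level as argument is used for brevity:

```c
/* Sketch only: names are hypothetical, not the ones defined by the patch. */
#ifdef DEBUG_STRESS
static int stress_level_sketch = 0; /* would come from "stress-level", 0..9 */
#define STRESS_RUN_SKETCH(level, stress_expr, default_expr) \
	((stress_level_sketch >= (level)) ? (stress_expr) : (default_expr))
#else
/* Stress mode not compiled in: only the default expression remains. */
#define STRESS_RUN_SKETCH(level, stress_expr, default_expr) (default_expr)
#endif

/* Example call site: emit one line per wakeup at stress level 1 or above,
 * and a larger batch in default execution.
 */
static int lines_per_wakeup_sketch(void)
{
	return STRESS_RUN_SKETCH(1, 1, 100);
}
```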
2024-12-12 11:19:10 +01:00
Willy Tarreau
f36ac42274 [RELEASE] Released version 3.2-dev1
Released version 3.2-dev1 with the following main changes :
    - MINOR: pattern: split pat_ref_set()
    - MINOR: pattern: add pat_ref_gen_set() function
    - MINOR: pattern: add pat_ref_gen_find_elt() function
    - MINOR: pattern: add pat_ref_gen_delete() function
    - MEDIUM: pattern: consider gen_id in pat_ref_set_from_node()
    - MEDIUM: pattern: always consider gen_id for pat_ref lookup operations
    - MINOR: version: this is development again (3.2)
    - DEV: patchbot: prepare for new version 3.2-dev
    - BUG/MEDIUM: sock: Remove FD_POLL_HUP during connect() if FD_POLL_ERR is not set
    - MINOR: proxy: Add support of 421-Misdirected-Request in retry-on status
    - BUG/MINOR: log: fix lf_text() behavior with empty string
    - MINOR: log: always consider "+M" option in lf_text_len()
    - BUG/MINOR: improve BBR throughput on very fast links
    - MINOR: event_hdl: add PAT_REF events
    - MINOR: pattern: publish event_hdl events on pat_ref updates
    - MINOR: hlua: add patref class
    - MINOR: hlua: add core.get_patref method
    - MINOR: hlua_fcn: implement index and pair metamethods for patref class
    - MINOR: hlua_fcn: wrap pat_ref struct for patref class
    - MINOR: pattern: add pat_ref_may_commit() helper function
    - MINOR: hlua_fcn: add Patref:commit() method
    - MINOR: hlua_fcn: add Patref:prepare() method
    - MINOR: hlua_fcn: add Patref:purge() method
    - MINOR: hlua_fcn: add Patref:giveup()
    - MINOR: hlua_fcn: add Patref:add()
    - MINOR: hlua_fcn: add Patref:del()
    - MINOR: hlua_fcn: add Patref:set()
    - MINOR: hlua_fcn: add Patref:add_bulk()
    - MINOR: hlua_fcn: add Patref:event_sub()
    - DOC: lua: prefer Patref:{set,add}() over legacy methods for acl and maps
    - BUG/MINOR: hlua_fcn: fix Patref:set() force parameter
    - BUG/MEDIUM: event_hdl: fix uninitialized value in async mode when no data is provided
    - BUG/MEDIUM: quic: prevent stream freeze on pacing
    - BUG/MEDIUM: http-ana: Reset request flag about data sent to perform a L7 retry
    - BUG/MINOR: h1-htx: Use default reason if not set when formatting the response
    - BUILD: quic: fix a build error about an non initialized timestamp
    - CI: github: allow coredumps on aws-lc and wolfssl jobs
    - BUG/MINOR: listener: fix potential null pointer dereference in listener_release()
    - MINOR: hlua: fix ambiguous hlua usage in hlua_filter_delete()
    - BUG/MINOR: signal: register default handler for SIGINT in signal_init()
    - BUG/MINOR: startup: close pidfd and free global.pidfile in handle_pidfile()
    - BUG/MINOR: startup: fix pidfile creation
    - MINOR: tools: add a new macro DEFVAL() to provide a default argument
    - MINOR: tasklet: set TASK_WOKEN_OTHER on tasklets by default
    - BUG/MINOR: quic: fix bbr_inflight() calls with wrong gain value
    - BUG/MEDIUM: init: make sure only daemonized processes change their session
    - BUG/MINOR: init: do not call fork_poller() for non-forked processes
    - BUG/MEDIUM: mux-quic: remove pacing status when everything is sent
    - BUG/MINOR: quic: remove startup alert if conn socket-owner unsupported
    - BUG/MINOR: quic: remove startup alert if GSO unsupported
    - MINOR: stktable: implement "recv-only" table option
    - CLEANUP: stktable: replace nopurge attribute with flag
    - CLEANUP: stktable: add some stktable flags polishing
    - BUG/MEDIUM: mux-h2: make sure not to touch dummy streams when sending WU
    - MINOR: mux-quic: clean up zero-copy done_ff callback
    - BUG/MINOR: config: Fix parsing of accept-invalid-http-{request,response}
    - BUG/MINOR: mworker: don't save program PIDs in oldpids
    - BUG/MINOR: mworker: fix -D -W -sf/-st modes
    - BUG/MINOR: startup: fix error path for master, if can't open pidfile
    - CLEANUP: startup: make if condition to kill old pids more readable
    - DOC: config: fix confusing init-state examples
    - MINOR: mux-h1: use explicit __objt_server on idle conn reinsert
    - MINOR: mux-h2: use explicit __objt_server on idle conn reinsert
    - MINOR: mux-spop: use explicit __objt_server on idle conn reinsert
    - MINOR: mux-fcgi: use explicit __objt_server on idle conn reinsert
    - MINOR: quic: convert startup check in a freestanding function
    - MINOR: quic: split startup check function
    - MINOR: quic: implement build options report
    - BUG/MINOR: debug: COUNT_IF() should return true/false
    - MINOR: mux-h2/traces: add a missing trace on negative initial window size
    - CLEANUP: mux-h2/traces: reword certain ambiguous traces
    - MINOR: mux-h2/glitches: add a description to the H2 glitches
    - BUG/MINOR: mux-h2: fix expression when detecting excess of CONTINUATION frames
    - BUILD: debug: fix build issues in COUNT_IF() with -Wunused-value
    - MINOR: tools: make fddebug() automatically emit the location
    - MINOR: ssl: add notBefore and notAfter utility functions
    - MEDIUM: ssl/cli: "show ssl sni" list the loaded SNI in frontends
    - BUG/MEDIUM: startup: don't daemonize if started with -c
    - BUG/MEDIUM: startup: report status if daemonized process fails
    - BUG/MEDIUM: mworker: report status, if daemonized master fails
    - BUG/MINOR: mworker: detach from tty when received READY from worker
    - BUG/MINOR: namespace: handle a possible strdup() failure
    - BUG/MINOR: ssl_crtlist: handle a possible strdup() failure
    - BUG/MINOR: resolvers: handle a possible strdup() failure
    - CI: use "/tmp" as default value for TMPDIR when searching logs
    - DOC: management: fix typos and paragraph ordering in 'show ssl sni'
    - CLEANUP: ssl: fix comment in 'show ssl sni'
    - MINOR: ssl/cli: add negative filters to "show ssl sni"
    - BUG/MINOR: stats: decrement srv refcount on stats-file release
    - MINOR: list: define a watcher type
    - BUG/MEDIUM: stats/server: use watcher to track server during stats dump
    - MINOR: server: remove prev_deleted server list
    - BUG/MINOR: http-fetch: Ignore empty argument string for query()
    - BUG/MINOR: server-state: Fix expiration date of srvrq_check tasks
    - BUG/MINOR: hlua_fcn: restore server pairs iterator pointer consistency
2024-12-11 14:17:46 +01:00
Aurelien DARRAGON
358166ae6a BUG/MINOR: hlua_fcn: restore server pairs iterator pointer consistency
Since 9c91b30 ("MINOR: server: remove prev_deleted server list"), hlua
server pair iterator may use and return invalid (stale) server pointer
if multiple servers were deleted between two iterations.

Indeed, the server refcount mechanism (using srv_take()) is no longer
sufficient as the prev_deleted mitigation was removed.

To ensure server pointer consistency between two yields, the new watcher
mechanism must be used (as is already the case for stats dumping).

Thus in this patch we slightly change the server iteration logic:
hlua_server_list_iterator_context struct now stores the next valid server
pointer, and a watcher is added to ensure this pointer is never stale.

Then in hlua_listable_servers_pairs_iterator(), this next pointer is used
to create the Lua server object, and the next valid pointer is obtained by
leveraging watcher_next().

No backport needed unless 9c91b30 ("MINOR: server: remove prev_deleted
server list") is. Please note that dynamic servers were not supported in
Lua prior to 2.8, so it doesn't make sense to backport this patch further
than 2.8.
2024-12-11 10:52:11 +01:00
Christopher Faulet
647a290662 BUG/MINOR: server-state: Fix expiration date of srvrq_check tasks
"hold.timeout" was used as expiration date for srvrq_check tasks. But it is
not accurate. The expiration date must be based on the resolution timeouts
instead (resolve and retry).

The purpose of srvrq_check task is to clean up the server resolution status
when outdated info is inherited from the state file. Using "hold.timeout"
is not accurate here because hold timeouts concern the resolution response
items not the resolution status of servers. It may be set to a huge value or
0. The expiration date of these tasks must be based on the resolution
timeouts instead.

So now the ("timeout resolve" + resolve_retries * "timeout retry") value is
used.
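
As a quick worked example of this formula (the function and parameter names are
illustrative only, all durations being in milliseconds):

```c
/* Delay before a srvrq_check task expires, following the formula above:
 * "timeout resolve" + resolve_retries * "timeout retry".
 */
static unsigned int srvrq_check_delay_ms_sketch(unsigned int timeout_resolve,
                                                unsigned int resolve_retries,
                                                unsigned int timeout_retry)
{
	return timeout_resolve + resolve_retries * timeout_retry;
}
```

For instance, with a 1s "timeout resolve", 3 retries and a 1s "timeout retry",
the task expires after 1000 + 3 * 1000 = 4000ms, regardless of any hold timeout.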

This patch should fix the issue #2816. It must be backported to all stable
versions.
2024-12-11 10:00:01 +01:00
Christopher Faulet
e1525e7b8f BUG/MINOR: http-fetch: Ignore empty argument string for query()
query() sample fetch function takes an optional argument string. During
configuration parsing, empty string must be ignored. It is especially
important when the sample is used with empty parenthesis. The argument is
optional and it is a list of options to configure the behavior of the sample
fetch. So it is logical to ignore empty strings.

This patch should fix the issue #2815. It must be backported to 3.1.
2024-12-11 10:00:01 +01:00
Amaury Denoyelle
9c91b30139 MINOR: server: remove prev_deleted server list
This patch is a direct follow-up to the previous one. Thanks to watcher
type, it is now safe to assume that servers manipulated via stats dump
were not targeted by a "delete server" CLI command. As such,
prev_deleted list server member is now unneeded. This patch thus removes
any reference to it.
2024-12-10 16:19:33 +01:00
Amaury Denoyelle
071ae8ce3d BUG/MEDIUM: stats/server: use watcher to track server during stats dump
If a server A is deleted while a stats dump is currently on it, deletion
is delayed thanks to reference counting. Server A is nonetheless removed
from the proxy list. However, this list is a singly linked list. If the
next server B is deleted and freed immediately, server A would still
point to it. This problem has been solved by the prev_deleted list in
servers.

This model seems correct, but it is difficult to fully ensure its
validity. In particular, it implies when stats dump is resumed, server A
elements will be accessed despite the server being in a half-deleted
state.

Thus, it has been decided to completely ditch the refcount mechanism for
stats dump. Instead, use the watcher element to register every stats
dump currently tracking a server instance. Each time a server is deleted
on the CLI, each stats dump element which may point to it is updated
to access the next server instance, or NULL if this is the last server.
This ensures that a server which was deleted via CLI but not completely
freed is never accessed on stats dump resumption.

Currently, no race condition related to dynamic servers and stats dump
is known. However, as described above, the previous model is deemed too
fragile, as such this patch is labelled as bug-fix. It should be
backported up to 2.6, after a reasonable period of observation. It
relies on the following patch :
  MINOR: list: define a watcher type
2024-12-10 16:19:33 +01:00
Amaury Denoyelle
eafa8a32bb MINOR: list: define a watcher type
Define a new watcher type in the list module. This type is similar to bref
and can be used to register an element which is currently tracking a
dynamic target. Contrary to bref, if the target is freed, every watcher
element is updated to point to the next valid entry or NULL.
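
The idea can be sketched with a toy model. The types and function names below
carry a _sketch suffix because they are invented for this illustration and do
not reproduce the API added by the patch:

```c
#include <stddef.h>

struct srv_sketch;

/* A watcher remembers where a long-running operation keeps its "current
 * element" pointer, so that this pointer can be fixed up on deletion.
 */
struct watcher_sketch {
	struct srv_sketch **pptr;     /* location of the tracked pointer */
	struct watcher_sketch *next;  /* next watcher on the same target */
};

struct srv_sketch {
	int id;
	struct srv_sketch *next;          /* singly linked list of servers */
	struct watcher_sketch *watchers;  /* watchers currently on this server */
};

/* Makes <w> track <target> (possibly NULL) and updates the tracked pointer. */
static void watcher_attach_sketch(struct watcher_sketch *w, struct srv_sketch *target)
{
	*w->pptr = target;
	w->next = NULL;
	if (target) {
		w->next = target->watchers;
		target->watchers = w;
	}
}

/* To be called before <target> is removed: every watcher is moved to <next>,
 * so that no tracked pointer is left on the element being deleted.
 */
static void watcher_move_all_sketch(struct srv_sketch *target, struct srv_sketch *next)
{
	struct watcher_sketch *w;

	while ((w = target->watchers) != NULL) {
		target->watchers = w->next;
		watcher_attach_sketch(w, next);
	}
}
```

Unlike a reference count, which only delays the release of the target, this
approach makes the interrupted operation resume directly on a valid element.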

This type will simplify the handling of dynamic server deletion, in
particular while stats dumps are performed.

This patch is not a bug-fix. However, it is mandatory to fix a race
condition in dynamic servers. Thus, it should be backported along the
next commit up to 2.6.
2024-12-10 16:04:11 +01:00
Amaury Denoyelle
2199179461 BUG/MINOR: stats: decrement srv refcount on stats-file release
Server instances may be removed at runtime. This can occur during a
stats dump which currently references this server instance. This case is
protected by the server refcount to prevent the server's immediate release.

CLI output may be interrupted prior to stats dump completion, for
example if client CLI has been disconnected before the full response
transfer. As such, srv_drop() must be called in every stats dump release
callback.

srv_drop() was missing for stats-file dump release callback. This could
cause a race condition which would prevent a server instance from being fully
removed. Fix this by adding srv_drop() invocation into
cli_io_handler_release_dump_stat_file().

This should be backported up to 3.0.
2024-12-10 16:04:11 +01:00
William Lallemand
a6b3080966 MINOR: ssl/cli: add negative filters to "show ssl sni"
The 'show ssl sni' output can be confusing when using crt-list, because
the wildcards can be completed with negative filters, and they need to
be associated to the same line.

Having a negative filter on its line alone does not make much sense,
this patch adds a new 'Negative Filter' column that shows the exception
applied on a wildcard from a crt-list line.
2024-12-10 11:36:50 +01:00
William Lallemand
da28cd08f5 CLEANUP: ssl: fix comment in 'show ssl sni'
Fix a comment in the 'show ssl sni' IO handler.
2024-12-10 11:17:10 +01:00
William Lallemand
9681fe0dba DOC: management: fix typos and paragraph ordering in 'show ssl sni'
Fixes small typos, uppercase and paragraph ordering in the 'show ssl
sni' section.
2024-12-10 10:27:57 +01:00
Ilia Shipitsin
d61cac4ed1 CI: use "/tmp" as default value for TMPDIR when searching logs
VTest already uses /tmp when TMPDIR is not defined, let's stick to the same
behaviour when searching logs as well.
2024-12-10 08:20:51 +01:00
Ilia Shipitsin
193c94a539 BUG/MINOR: resolvers: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
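
For reference, the kind of check such a fix adds can be sketched as follows
(the struct and function names are hypothetical, this is not the patched code):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

struct named_sketch {
	char *name;
};

/* Replaces the name of <n> with a copy of <name>.
 * Returns 0 on success, -1 if the copy could not be allocated, in which
 * case <n> is left untouched.
 */
static int named_set_sketch(struct named_sketch *n, const char *name)
{
	char *copy = strdup(name);

	if (!copy)
		return -1; /* report the failure instead of storing NULL */
	free(n->name);
	n->name = copy;
	return 0;
}
```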
2024-12-10 08:05:50 +01:00
Ilia Shipitsin
ce30bc1730 BUG/MINOR: ssl_crtlist: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
2024-12-10 08:05:42 +01:00
Ilia Shipitsin
abee546850 BUG/MINOR: namespace: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
2024-12-10 08:05:34 +01:00
Valentine Krasnobaeva
1f63a53955 BUG/MINOR: mworker: detach from tty when received READY from worker
Some of the master process initialization steps are conditioned by receiving the
READY message from the worker (pidfile creation, forwarding the READY message to
the launching parent). So, the master process cannot perform these initialization
routines before.

If the master process fails while creating the pidfile or forwarding READY to the
parent in daemon mode, it exits with a proper alert message. In daemon mode we
no longer see such a message, as the process is already detached from the tty.

To fix this, as these alerts could be very useful, let's detach the master
process from the tty after its last initialization steps in _send_status.
2024-12-09 21:32:54 +01:00
Valentine Krasnobaeva
97aaf76716 BUG/MEDIUM: mworker: report status, if daemonized master fails
As the daemonization fork now happens very early and before the master-worker
fork, if master or worker processes fail during the initialization, some
critical errors can't be reported to stdout. The launching (parent) process in
such cases exits with 0. This gives the impression that the master and its worker
have successfully started in the background, which really complicates
operations.

In the previous commit a pipe was added to make the daemonized child communicate
with its parent. Let's add the same logic to master-worker mode. Upon
receiving the READY message from the worker, the master will "forward" it via the
pipe to the launching process. The launching process can obtain the master's exit
status if the master fails to start and nothing has been written in the pipe.

This fix should be backported only in 3.1.
2024-12-09 21:32:49 +01:00
Valentine Krasnobaeva
663d75e7a0 BUG/MEDIUM: startup: report status if daemonized process fails
Due to the master-worker rework, the daemonization fork now happens before parsing
and applying the configuration. This makes it impossible to correctly report all
warnings and alerts to the shell's stdout. If the daemonized process fails while
already in the background, the exit code reported by the shell via '$?' equals 0,
as it's the exit code of its parent.

To fix this, let's create a pipe between parent and daemonized child. The
child will send into this pipe a "READY" message when it finishes its
initialization. The parent will wait on the "read" end of the pipe until
receiving something. If read() fails, the parent obtains the status of the
exited child with waitpid(). So, the parent can correctly report the error to
stdout and exit with the child's exit code.
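
The mechanism can be sketched with plain POSIX calls. This is a simplified
model and not haproxy's startup code: the function names are hypothetical, the
child exits right after reporting READY where a real daemon would keep
running, and a read() returning no data is handled like a failed one:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Two stand-ins for the child's initialization: one succeeds, one fails. */
static int init_ok_sketch(void)   { return 0; }
static int init_fail_sketch(void) { return 3; }

/* Forks a child running <init>. Returns 0 once the child reported READY,
 * otherwise the child's exit code (or 1 on internal error).
 */
static int launch_sketch(int (*init)(void))
{
	int fds[2], status, rc;
	char buf[8];
	ssize_t n;
	pid_t pid;

	if (pipe(fds) < 0)
		return 1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return 1;
	}

	if (pid == 0) {
		/* child: initialize, then report READY to the parent */
		close(fds[0]);
		rc = init();
		if (rc != 0)
			_exit(rc);
		if (write(fds[1], "READY\n", 6) != 6)
			_exit(1);
		close(fds[1]);
		_exit(0); /* a real daemon would keep running here */
	}

	/* parent: close the write end, otherwise read() never sees EOF */
	close(fds[1]);
	n = read(fds[0], buf, sizeof(buf));
	close(fds[0]);
	if (n > 0)
		return 0;

	/* nothing received: the child died, fetch its exit code */
	if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
		return WEXITSTATUS(status);
	return 1;
}
```

Here launch_sketch(init_ok_sketch) returns 0 while
launch_sketch(init_fail_sketch) returns the child's exit code 3, which the
launching process can then use as its own exit code.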

This fix should be backported only in 3.1.
2024-12-09 21:32:44 +01:00
Valentine Krasnobaeva
5f94e98d89 BUG/MEDIUM: startup: don't daemonize if started with -c
Due to the master-worker refactoring, the daemonization fork now happens very
early, before parsing and verifying the configuration. For the moment there isn't
any specific syntax which needs the daemon mode to be really applied in
order to perform the tests.

So, it's better not to do the daemonization fork if the 'daemon' keyword is
present in the config (or the -D option is used) when started with -c (MODE_CHECK).
This way, during the config verification, the process will always stay in the
foreground and all warnings or errors will be delivered to stdout.

This fix should be backported only in 3.1.
2024-12-09 21:32:36 +01:00
William Lallemand
5d1b30d6b8 MEDIUM: ssl/cli: "show ssl sni" list the loaded SNI in frontends
The "show ssl sni" command, allows one to dump the list of SNI in an
haproxy process, or a designated frontend.

It lists the SNI with the type, filename, and dates of expiration and
activation.
2024-12-09 18:29:35 +01:00
William Lallemand
5454824e31 MINOR: ssl: add notBefore and notAfter utility functions
Extracting notBefore and notAfter as a string can be bothersome,
add 2 utility functions that return the value in a static buffer.
2024-12-09 18:29:23 +01:00
Willy Tarreau
c3ee4e375b MINOR: tools: make fddebug() automatically emit the location
fddebug() is sometimes quite helpful, but annoying to use when following
a call path because it's a pain to always repeat the function name and
call place. Let's have it automatically prepend the function name, the
file name and the line number, and make its arguments optional, replacing
them by a simple LF when all absent. This way, simply placing:

    fddebug();

is sufficient to emit a location following the "[%s@%s:%d]\n" format. This
function must not be used in production (and even call places with it shouldn't
be committed) and it should only be used by developers, so the simpler the
better.
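
A possible way to obtain such a macro in standard C is sketched below. It is
not the actual fddebug() code: the names are hypothetical, the line is stored
in a static buffer and sent to stderr, and the space inserted between the
location and an optional message is an arbitrary choice:

```c
#include <stdarg.h>
#include <stdio.h>

/* Last emitted line, kept so that the result can be inspected. */
static char dbg_last_sketch[512];

/* Emits "[func@file:line]" followed by a LF when <fmt> is empty, or by a
 * space and the formatted message otherwise.
 */
static void dbg_loc_sketch(const char *func, const char *file, int line,
                           const char *fmt, ...)
{
	size_t size = sizeof(dbg_last_sketch);
	va_list ap;
	int len;

	len = snprintf(dbg_last_sketch, size, "[%s@%s:%d]%s",
	               func, file, line, *fmt ? " " : "\n");
	if (*fmt && len > 0 && (size_t)len < size) {
		va_start(ap, fmt);
		vsnprintf(dbg_last_sketch + len, size - len, fmt, ap);
		va_end(ap);
	}
	fputs(dbg_last_sketch, stderr);
}

/* The "" prefix is concatenated with the caller's format string literal, and
 * stays alone as an empty format when the macro is called without argument.
 */
#define DBG_LOC_SKETCH(...) \
	dbg_loc_sketch(__func__, __FILE__, __LINE__, "" __VA_ARGS__)
```

Both DBG_LOC_SKETCH(); and DBG_LOC_SKETCH("x=%d\n", 5); are then accepted, the
only constraint being that the format must be a string literal.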
2024-12-09 18:05:09 +01:00
Willy Tarreau
d6dc8120c0 BUILD: debug: fix build issues in COUNT_IF() with -Wunused-value
Commit 7f64bb79fd ("BUG/MINOR: debug: COUNT_IF() should return true/false")
allowed the COUNT_IF() macro to return the evaluated value. This is handy
to place it in "if ()" conditions and count them at the same time. When
glitches are disabled, the condition is just returned as-is, but most call
places do not use the result, making some compilers complain. In addition,
while reviewing this, it was noticed that when DEBUG_STRICT=0, the macro
would still be replaced by a "do { } while (0)" statement, which not only
does not evaluate the expression, but also cannot return anything. Ditto
for COUNT_IF_HOT().

Let's make sure both are always properly evaluated now.
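
The difference can be illustrated with a simplified sketch. The names are
hypothetical and a single global counter replaces the real per-call-site
counters; the point is only that both variants are expressions which evaluate
the condition and yield its truth value, unlike a "do { } while (0)" block:

```c
/* Sketch only: one global counter stands in for per-call-site counters. */
static unsigned long count_if_hits_sketch;

#ifndef NO_COUNTING_SKETCH
/* Counts when <cond> is true and yields its truth value. */
#define COUNT_IF_SKETCH(cond) ((cond) ? (count_if_hits_sketch++, 1) : 0)
#else
/* Counting disabled: the condition is still evaluated and returned. */
#define COUNT_IF_SKETCH(cond) (!!(cond))
#endif

/* Example call site using the macro directly inside an if () condition. */
static int check_len_sketch(int len)
{
	if (COUNT_IF_SKETCH(len < 0))
		return -1;
	return len;
}
```

Call sites which ignore the result may cast it to void to keep compilers quiet
about unused values.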
2024-12-09 18:04:51 +01:00
Willy Tarreau
cb21db04c7 BUG/MINOR: mux-h2: fix expression when detecting excess of CONTINUATION frames
Latest commit f0eca8fe7 ("MINOR: mux-h2/glitches: add a description to
the H2 glitches") misplaced the optional glitch description field, with
it appearing at the end of the if () condition and always reporting
an excess of CONTINUATION frames from the first exceeding one.

This needs to be backported along with that commit once it gets backported.
2024-12-06 18:53:19 +01:00
Willy Tarreau
f0eca8fe73 MINOR: mux-h2/glitches: add a description to the H2 glitches
Since we can now list them using "debug counters" and now support a
description, better add the description to all glitches. This patch may
be backported to 3.1, but before this the following patches must also
be picked:

    86823c828 MINOR: mux-h2/traces: add a missing trace on negative initial window size
    7c8e9420a CLEANUP: mux-h2/traces: reword certain ambiguous traces
2024-12-06 18:49:07 +01:00
Willy Tarreau
7c8e9420a2 CLEANUP: mux-h2/traces: reword certain ambiguous traces
Some h2 traces were not very clear, let's reword them a bit.
2024-12-06 18:45:46 +01:00
Willy Tarreau
86823c828f MINOR: mux-h2/traces: add a missing trace on negative initial window size
When a negative initial window size is reported, we're going to close
the connection, so it's important to report a trace to explain why!
This should be backported at least to 3.1 and possibly 3.0 (adapting the
context since there's no glitches there).
2024-12-06 18:45:46 +01:00
Willy Tarreau
7f64bb79fd BUG/MINOR: debug: COUNT_IF() should return true/false
The COUNT_IF() macro was initially meant to return true/false to be used
in if() conditions but had an extra do { } while(0) that prevents it from
doing so. Let's get rid of the do { } while(0) before the code generalizes
to too many places. There's no impact on existing code, but may have to be
backported if future fixes rely on it.
2024-12-06 18:45:46 +01:00
Amaury Denoyelle
fc0bb6224c MINOR: quic: implement build options report
Define a new function quic_register_build_options(). Its purpose is to
register a build options string for QUIC features which is reported when
using haproxy -vv.

This will allow to easily determine if connection socket-owner mode and
GSO are supported or not. Here is the new filtered output :

$ ./haproxy -vv|grep '^QUIC:'
QUIC: connection socket-owner mode support : yes
QUIC: GSO emission support : yes
2024-12-06 18:34:10 +01:00
Amaury Denoyelle
cab2cc15c1 MINOR: quic: split startup check function
Two features are tested on startup via quic_test_socketopts() :
connection socket-owner mode support and GSO. Extract both tests into
separate functions called by quic_test_socketopts().

This patch will make it easy to reuse QUIC features detection for build
options report via haproxy -vv.
2024-12-06 18:34:09 +01:00
Amaury Denoyelle
e7fd458c14 MINOR: quic: convert startup check in a freestanding function
quic_test_socketopts() function is used to detect system support for
QUIC network stack. Previously, it relied on an already bound listener
instance, notably to ensure that two UDP sockets can be bound on the
same source address.

Improve quic_test_socketopts() to run without any listener argument. It
now automatically instantiates and manipulates two dummy sockets FDs to
check for multi-bind support. This brings two advantages :
* the function is now called via an initcall
* it will easily be reusable to implement build option description
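
A listener-free check of this kind could look like the sketch below. It is
only an illustration: the function names are hypothetical, and the use of
SO_REUSEADDR on a loopback address is an assumption, since the option required
to bind two UDP sockets on the same address depends on the platform:

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Creates a UDP socket with address reuse enabled and binds it to <addr>.
 * Returns the FD, or -1 on failure.
 */
static int udp_reuse_socket_sketch(const struct sockaddr_in *addr)
{
	int one = 1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
	    bind(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Returns 1 if two dummy UDP sockets could be bound on the same source
 * address and port, 0 otherwise.
 */
static int udp_multi_bind_supported_sketch(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int fd1, fd2 = -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the system pick a free port */

	fd1 = udp_reuse_socket_sketch(&addr);
	if (fd1 < 0)
		return 0;

	/* retrieve the chosen port, then try to bind a second socket on it */
	if (getsockname(fd1, (struct sockaddr *)&addr, &len) == 0)
		fd2 = udp_reuse_socket_sketch(&addr);
	close(fd1);
	if (fd2 < 0)
		return 0;
	close(fd2);
	return 1;
}
```

Since it needs no argument, such a function can run once at startup and its
result can be reused later for reporting purposes.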
2024-12-06 18:33:50 +01:00
Amaury Denoyelle
d4f6f2df5e MINOR: mux-fcgi: use explicit __objt_server on idle conn reinsert
This commit is the counterpart of the previous one for FCGI mux. It
replaces objt_server() by unsafe __objt_server(), as conn target is
guaranteed to point to a valid server instance, which can then be used as
_srv_add_idle() argument.
2024-12-06 18:02:55 +01:00
Amaury Denoyelle
1778284824 MINOR: mux-spop: use explicit __objt_server on idle conn reinsert
This commit is the counterpart of the previous one for SPOP mux. It
replaces objt_server() by unsafe __objt_server(), as conn target is
guaranteed to point to a valid server instance, which can then be used as
_srv_add_idle() argument.

This should fix coverity report from github issue #2811.
2024-12-06 18:02:55 +01:00
Amaury Denoyelle
762d0764d7 MINOR: mux-h2: use explicit __objt_server on idle conn reinsert
This commit is the counterpart of the previous one for H2 mux. It
replaces objt_server() by unsafe __objt_server(), as conn target is
guaranteed to point to a valid server instance, which can then be used as
_srv_add_idle() argument.
2024-12-06 18:02:55 +01:00
Amaury Denoyelle
ece3bf65ca MINOR: mux-h1: use explicit __objt_server on idle conn reinsert
When dealing with a backend connection, H1 mux IO handler must reinsert
it in its idle list pool if it was extracted from it at the beginning.
This is the case if conn_in_list is true.

On reinsert, idle list pool is retrieved via the server instance
accessible from <conn.target>. Replace objt_server usage with
__objt_server as an idle connection is always attached to a server. This
ensures that there is no issue when using _srv_add_idle() then.

This should fix coverity report from github issue #2810.
2024-12-06 18:02:55 +01:00
Aurelien DARRAGON
7934eef25d DOC: config: fix confusing init-state examples
In 50322dff ("MEDIUM: server: add init-state"), some examples on how to
use the init-state server keyword were added alongside the keyword
documentation.

However, as reported by Nick Ramirez, there was an error because the
example that stated that haproxy will pass the traffic to the server after
3 successful health checks used the "init-state down" instead of the
"init-state fully-down". Thus the behavior wouldn't match what the
comment said (only 1 successful health check was required).

Here we fix the example itself to match the comment. The following
example ("# or") was also affected, but it is somewhat redundant, as the
main purpose of the examples is to illustrate the feature itself and not
how to use the server-template directive, so we remove it.

This should be backported to 3.1 with 50322dff.
2024-12-06 13:16:12 +01:00
Valentine Krasnobaeva
f24e57d717 CLEANUP: startup: make if condition to kill old pids more readable
Update comment and condition. nb_oldpids is not a pointer, but a signed int,
which keeps the max number of elements in the oldpids array. So, it's good
practice to check that it's strictly positive here.
2024-12-06 12:00:22 +01:00
Valentine Krasnobaeva
cd0b58e23e BUG/MINOR: startup: fix error path for master, if can't open pidfile
If the master process can't open a pidfile, there is no point in sending
SIGTTIN to oldpids, as it will exit. So, old workers will terminate as well.
It's better to send the last alert to the log about the unrecoverable error,
because the master is already in its polling loop.

For the standalone mode we should keep the previous logic in this case: send
SIGTTIN to old process and unbind listeners for the new one. So, it's better
to put this error path in main(), as it's done when other configuration settings
can't be applied.

This patch should be backported only in 3.1.
2024-12-06 12:00:22 +01:00
Valentine Krasnobaeva
ee111d2004 BUG/MINOR: mworker: fix -D -W -sf/-st modes
When a new master process is launched like below:

	./haproxy -W -D -p ha.pid -sf $(cat ha.pid)...

The old master process and its workers do not stop. Since the master-worker
refactoring, the code, which sends USR1/TERM to old pids from -sf, is called
only for the standalone mode. In master-worker mode we should first receive the
READY message from the newly forked worker, in order to be able to terminate
the previous master.

So, to fix this, let's terminate the previous master in _send_status(), where
we parse the READY message from the newly forked worker. And let's continue to
use oldpids array, as it was in 3.0, in order to stop the workers, launched
before the reload.

This patch should be backported only in 3.1.
2024-12-06 12:00:22 +01:00
Valentine Krasnobaeva
1fead6c0ca BUG/MINOR: mworker: don't save program PIDs in oldpids
After reload, previously launched programs are stopped explicitly in
mworker_ext_launch_all(). So, there is no longer any need to save their PIDs
in the oldpids array before the master reexec().

This also prepares the fix of "-D -W -sf/-st" modes, as we will need to
loop over this array in the master process context, in order to stop the
previous master, when the new one is ready.

This patch should be backported only in 3.1.
2024-12-06 12:00:22 +01:00
Christopher Faulet
bc453c5106 BUG/MINOR: config: Fix parsing of accept-invalid-http-{request,response}
These options are now deprecated, but the proxy capabilities are not
properly checked during the configuration parsing, leading to these options
always being ignored. This is now fixed by checking the frontend capability
for the "accept-invalid-http-request" option and the backend capability for
the "accept-invalid-http-response" option.

In addition, the messages about the deprecation of these options are now
emitted with ha_warning() instead of ha_alert() because they are only
warnings and not errors.

This patch should fix the issue #2806. It must be backported to 3.1.
2024-12-05 22:02:58 +01:00
Amaury Denoyelle
7885a3b3e1 MINOR: mux-quic: clean up zero-copy done_ff callback
Recently, an issue was found with QUIC zero-copy forwarding on 3.0
version. A desynchronization could occur internally in QCS Tx bytes
counters which would cause a BUG_ON() crash on qcs_destroy() when the
stream is detached.

It was silently fixed in version 3.1 by the following patch. As it was
considered as an optimization, it was not scheduled yet for backport.

  6697e87ae5e1f569dc87cf690b5ecfc049c4aab0
  MINOR: mux-quic: Don't send an emtpy H3 DATA frame during zero-copy forwarding

This mistake was caused by some counter-intuitive manipulation
in the QUIC zero-copy implementation. Try to streamline this in the QUIC MUX
done_ff callback and its application protocol counterpart, especially
for the values exchanged between MUX and application on one side, and between
MUX and the stconn layer as done_fastfwd return value on the other.

First, the application done_ff callback now returns the length of the whole
encoded frame. For HTTP/3, this means the H3 frame header length plus the
payload length. This value can then be reused as qcc_send_stream() argument
to increase the QCS Tx soft offset.

As previously, special care has been taken to ensure that QUIC MUX
done_ff only returns the transferred data bytes. Thus, any extra offset
for the HTTP/3 header is properly excluded. This is mandatory for the stconn
layer to consider that the transfer has completed.

Secondly, remove duplicated code in application done_ff to reset iobuf
info. This is now factorized in QUIC MUX done_ff itself.

This patch is related to github issue #2678.
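
As a rough illustration of the two return values described above, the sketch
below separates the full encoded frame length (used to advance the Tx soft
offset) from the payload-only length reported to the upper layer. All sk_*
names are hypothetical and are not the actual HAProxy callbacks; only the
HTTP/3 DATA frame layout (one-byte type followed by a QUIC variable-length
integer length) comes from the specifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a QUIC variable-length integer (RFC 9000, section 16). */
static size_t sk_varint_len(uint64_t v)
{
	if (v < (1ULL << 6))
		return 1;
	if (v < (1ULL << 14))
		return 2;
	if (v < (1ULL << 30))
		return 4;
	return 8;
}

/* Hypothetical application done_ff: length of the whole encoded HTTP/3 DATA
 * frame, i.e. frame type + frame length + payload.
 */
static size_t sk_app_done_ff(size_t payload)
{
	return 1 + sk_varint_len(payload) + payload;
}

struct sk_ff_result {
	size_t tx_ofs_inc; /* full frame length, to advance the Tx soft offset */
	size_t data;       /* payload only, reported to the upper layer */
};

/* Hypothetical MUX done_ff: uses the full length internally but only
 * reports the forwarded payload bytes.
 */
static struct sk_ff_result sk_mux_done_ff(size_t payload)
{
	struct sk_ff_result res;

	res.tx_ofs_inc = sk_app_done_ff(payload);
	res.data = payload;
	assert(res.tx_ofs_inc > res.data); /* the header always adds bytes */
	return res;
}
```

With a 100-byte payload, the frame header takes 3 bytes (1 for the type, 2 for
the length), so the Tx offset would grow by 103 while the upper layer is told
that 100 bytes were forwarded.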
2024-12-05 16:57:31 +01:00
Willy Tarreau
d649278fce BUG/MEDIUM: mux-h2: make sure not to touch dummy streams when sending WU
Since commit 1cc851d9f2 ("MEDIUM: mux-h2: start to update stream when
sending WU") we started storing stream offsets in the h2s struct. These
offsets are updated at a few points, where it's safe to write to the
stream, and in h2c_send_strm_wu(), where the h2s->h2c check was not performed.

Due to this, nothing protects the h2s from being updated when sending a
WU for a closed stream, which might only happen when acknowledging a
frame after resetting that stream, which is quite unlikely. In any case
if this happens, it will crash as in issue #2793 since the closed streams
are purposely read-only to catch such bugs.

The fix is trivial, just check h2s->h2c before deciding to update the
stream.

Thanks to @Wahnes for reporting this, and Christopher for spotting the
cause. This needs to be backported to 3.1 only.
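
The guard described above can be sketched as follows. This is only an
illustration with hypothetical sk_* types, assuming that the shared dummy
stream used for closed streams has no connection attached (a NULL pointer
here); it is not the actual h2c_send_strm_wu() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_conn {
	int id;
};

struct sk_stream {
	struct sk_conn *conn;  /* NULL for the shared read-only dummy stream */
	uint64_t last_adv_ofs; /* last offset advertised to the peer */
};

/* Shared placeholder standing for closed streams; must never be written to. */
static const struct sk_stream sk_closed_stream = { NULL, 0 };

/* Records the offset advertised by a window update, but only for real
 * streams. Returns 1 if the stream was updated, 0 if it was left untouched.
 */
static int sk_stream_note_wu(struct sk_stream *s, uint64_t ofs)
{
	if (!s->conn)
		return 0; /* dummy stream: do not touch it */
	s->last_adv_ofs = ofs;
	return 1;
}
```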
2024-12-05 15:25:09 +01:00
Aurelien DARRAGON
ae9d8d40d0 CLEANUP: stktable: add some stktable flags polishing
Better late than never: commit 1f73d35 ("MINOR: stktable: implement
"recv-only" table option") implemented stktable flags and their initial
definitions, but it lacks some comments. In addition, the flag is stored
on 16 bits while the STK_FL_ definition width allows for only 8 bits,
which is a bit confusing. Let's fix that.
2024-12-05 13:14:21 +01:00
Aurelien DARRAGON
9f44c5f9be CLEANUP: stktable: replace nopurge attribute with flag
Thanks to previous commit, stktable struct now has a "flags" struct member.

Let's take this opportunity to remove the isolated "nopurge" attribute in
stktable struct and rely on a flag named STK_FL_NOPURGE instead.

This helps to better organize stktable struct members.
2024-12-05 12:15:31 +01:00
Aurelien DARRAGON
1f73d3524d MINOR: stktable: implement "recv-only" table option
When "recv-only" keyword is added on a stick table declaration (in peers
or proxy section), haproxy considers that the table is only used for
data retrieval from a remote location and not used to perform local
updates. As such, it enables the retrieval of local-only values such
as conn_cur that are ignored by default. This can be useful in some
contexts where we want to know about local-values such as conn_cur
from a remote peer.

To do this, add stktable struct flags, which default to NONE, and enable
the RECV_ONLY flag on the table when the "recv-only" keyword is found in the
table declaration. Then, in peer_treat_updatemsg(), when handling
table updates, don't ignore data updates for local-only values if the flag
is set.
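
The decision taken when handling an update can be sketched like this. The
SK_TBL_FL_* names and the helper are hypothetical and only mirror the
description above; they are not the actual stktable definitions nor the
peer_treat_updatemsg() code.

```c
#include <assert.h>

#define SK_TBL_FL_NONE      0x0000u
#define SK_TBL_FL_RECV_ONLY 0x0001u

struct sk_table {
	unsigned int flags; /* SK_TBL_FL_* */
};

/* Returns 1 if a data update received from a peer must be applied, 0 if it
 * must be ignored. Local-only data types (such as a current connections
 * counter) are ignored unless the table was declared as receive-only.
 */
static int sk_accept_peer_update(const struct sk_table *t, int local_only)
{
	if (local_only && !(t->flags & SK_TBL_FL_RECV_ONLY))
		return 0;
	return 1;
}
```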
2024-12-05 12:15:24 +01:00
Amaury Denoyelle
3c239b2f80 BUG/MINOR: quic: remove startup alert if GSO unsupported
This patch is similar to the previous one, but for GSO support. Convert the
alert level message to a diag report only visible with argument -dD.

This must be backported up to 3.1.
2024-12-05 11:30:31 +01:00
Amaury Denoyelle
6fed219fd7 BUG/MINOR: quic: remove startup alert if conn socket-owner unsupported
QUIC relies on several advanced network API features from the kernel to
perform optimally. Checks are performed during startup to ensure that
these features are supported. A fallback is automatically performed for
every incompatible feature.

Besides the automatic fallback mechanism, a message is also reported to
the user at the same time. Previously, alert level was used, but it is
incorrect as it is reserved for unrecoverable errors which should
prevent haproxy from starting. Warning level could be used, but this can
annoy users running with zero-warning mode.

This patch removes the alert message when 'socket-owner connection' mode
cannot be activated. Convert the message to a diag level. This allows
users to start without forcing configuration modification to hide a
warning. Besides, several feature fallbacks, such as the polling mechanism,
do not emit any warning either, so it's better to adopt a similar
behavior for QUIC features.

This must be backported up to 2.8.
2024-12-05 11:30:12 +01:00
Amaury Denoyelle
08f557f0c4 BUG/MEDIUM: mux-quic: remove pacing status when everything is sent
TASK_F_USR1 is used by MUX tasklet when emission has been interrupted
due to pacing. When the tasklet runs again, only qcc_purge_sending()
will be called as an optimization.

Pacing status is only removed via qcc_wakeup(). Until then, TASK_F_USR1
is not cleared. This causes an issue after emission with pacing
completion if the MUX tasklet is woken up for a recv subscribe, as
qcc_wakeup() is not used by quic-conn layer. The tasklet will
incorrectly run only for pacing emission, without handling reception
process. Worse, a crash will occur if QCC tx frames list is empty, due
to a BUG_ON() in qcc_purge_sending().

Recv subscribe is only used for 0-RTT, when QUIC MUX is instantiated
before quic-conn handshake completion. Thus, this bug can only be
reproduced with 0-rtt. Furthermore, MUX must already have emitted at
least a few response bytes with pacing, before QUIC handshake
completion. It cannot easily be reproduced, at least with CLI clients
where the handshake is always already completed before MUX exchanges.

To fix this, remove TASK_F_USR1 when pacing emission has been completed.
At least, this prevents BUG_ON() on qcc_purge_sending() as it won't be
called with an empty QCC Tx frame list anymore. However, this bug has
revealed that the MUX tasklet architecture is not suitable for handling
both the reception and emission parts. This will be improved in a future
series of patches.

This should fix github issue #2796.

This must be backported up to 3.1.
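
A simplified model of the fix, with hypothetical sk_* names: the pacing
status is dropped as soon as the pending frame list has been emptied, so
that a later wakeup (for instance for a recv subscribe) goes through the
full handler instead of the pacing-only path. This is a sketch under these
assumptions, not the actual MUX tasklet code.

```c
#include <assert.h>

#define SK_TL_F_PACING 0x1u /* emission was interrupted by pacing */

struct sk_mux {
	unsigned int tl_flags; /* tasklet flags */
	int tx_frames;         /* number of frames still waiting for emission */
};

/* Emits at most <budget> pending frames. Must not be called with an empty
 * list, which mirrors the BUG_ON() mentioned above.
 */
static void sk_purge_sending(struct sk_mux *m, int budget)
{
	assert(m->tx_frames > 0);
	m->tx_frames -= (budget < m->tx_frames) ? budget : m->tx_frames;
	if (!m->tx_frames)
		m->tl_flags &= ~SK_TL_F_PACING; /* all sent: clear pacing status */
}

/* Tasklet handler. Returns 0 if only the paced emission ran, 1 if the full
 * reception + emission path was executed.
 */
static int sk_io_handler(struct sk_mux *m, int budget)
{
	if (m->tl_flags & SK_TL_F_PACING) {
		sk_purge_sending(m, budget);
		return 0;
	}
	return 1;
}
```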
2024-12-05 11:04:06 +01:00
Willy Tarreau
8b16b72541 BUG/MINOR: init: do not call fork_poller() for non-forked processes
In 3.1-dev10, commit 8dd4efe42f ("MAJOR: mworker: move master-worker
fork in init()") made the fork_poller() code unconditional, while it
is only desirable for processes that have been forked from a parent
(standalone daemon mode) or from a master (master-worker mode). The
call can be expensive in some cases as it will create a new poller,
scan and try to migrate to it all existing FDs till the highest known
one. With very high numbers of FDs, this can take several seconds to
start.

This should be backported to 3.1.
2024-12-04 19:46:42 +01:00
Willy Tarreau
70e4938aec BUG/MEDIUM: init: make sure only daemonized processes change their session
Commit 8dd4efe42f ("MAJOR: mworker: move master-worker fork in init()")
introduced some sensitive changes to the startup code (which was
expected), and one sensitive change is that the second call to setsid()
was accidentally made unconditional. As such it even applies to foreground
processes, resulting in foreground processes being detached from the
terminal and no longer responding to Ctrl-C nor Ctrl-Z. An example of
this simply consists in starting haproxy -db under sudo. Then a new shell
is required to stop it.

This patch removes this second setsid(), as it is already done in
apply_daemon_mode().

This must be backported to 3.1.
2024-12-04 19:46:42 +01:00
Frederic Lecaille
6404b7a18a BUG/MINOR: quic: fix bbr_inflight() calls with wrong gain value
This patch fixes two wrong calls to bbr_inflight().

The aim of bbr_target_inflight() is to compute the number of bytes BBR has to
put on the network as bytes in flight (sent but not acked bytes). It must call
bbr_inflight() with the current window gain value (in place of a wrong fixed
100 gain value here, in percent).

bbr_is_time_to_cruise() also called bbr_inflight() with a wrong gain value
as parameter due to a confusion between the value mentioned by the RFC (1
meaning 100% of the current window) and our implementation which needs a value
in percent (so 100 in place of 1 here). Note that the aim of
bbr_is_time_to_cruise() is to let BBR decide when to leave the probing_bw down
state. As a side effect, the bug made BBR stay in this state for too long,
during periods in which the bottleneck bandwidth is decreasing, leading to big
oscillations between the minimum and maximum bottleneck bandwidth estimations.

This patch must be backported to 3.1 where BBR was first implemented.
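
The unit confusion can be illustrated with a deliberately simplified helper.
It is a sketch only: the function name is hypothetical and the real
bbr_inflight() takes more into account than a plain scaling of the
bandwidth-delay product. The single point shown is that the gain is expressed
in percent, so a gain of 1.0 must be passed as 100.

```c
#include <assert.h>
#include <stdint.h>

/* Scales a bandwidth-delay product <bdp> (in bytes) by a gain expressed in
 * percent: 100 means 1.0x, 200 means 2.0x.
 */
static uint64_t sk_inflight(uint64_t bdp, unsigned int gain_pct)
{
	return bdp * gain_pct / 100;
}
```

With a 120000-byte BDP, passing 1 instead of 100 yields 1200 bytes instead of
120000, which shows how far off the computed target can be.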
2024-12-04 18:47:15 +01:00
Willy Tarreau
e6f4f15929 MINOR: tasklet: set TASK_WOKEN_OTHER on tasklets by default
Now when tasklets are woken up via tasklet_wakeup(), tasklet_wakeup_on()
or tasklet_wakeup_after(), either the optional wakeup flags will be used,
or TASK_WOKEN_OTHER will be used.

This allows tasklet handlers waking up for any given cause to notice
whether or not they were also woken for another reason. For example, a
mux handler could skip heavy parts when seeing that TASK_WOKEN_OTHER is
absent, proving that no standard tasklet_wakeup() was done, for example
in response to a subscribe().

The benefit of the TASK_WOKEN_* flags is that they're purged during the
wakeup, and that they're easy to check for using TASK_WOKEN_ANY.
TASK_F_UEVT1 and TASK_F_UEVT2 are also usable for private use (e.g. wakeup
from a stream to a connection inside a mux).
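
The mechanism can be modelled as below. Flag names and values are
hypothetical stand-ins for the TASK_WOKEN_* flags, and the functions are not
the real tasklet API; the sketch only shows a default wakeup cause being
applied, the causes being purged on each run, and a handler using them to skip
its heavy part.

```c
#include <assert.h>

#define SK_WOKEN_OTHER 0x01u /* default cause when none is given */
#define SK_WOKEN_IO    0x02u
#define SK_WOKEN_ANY   (SK_WOKEN_OTHER | SK_WOKEN_IO)

struct sk_tasklet {
	unsigned int state;
};

/* Wakes the tasklet up with the given cause, or the default one if 0. */
static void sk_wakeup(struct sk_tasklet *tl, unsigned int flags)
{
	tl->state |= flags ? flags : SK_WOKEN_OTHER;
}

/* Runs the handler once. The wakeup causes are consumed, and the heavy part
 * is only executed when a standard wakeup was seen. Returns 1 if the heavy
 * part ran, otherwise 0.
 */
static int sk_run(struct sk_tasklet *tl)
{
	unsigned int woken = tl->state & SK_WOKEN_ANY;

	tl->state &= ~SK_WOKEN_ANY;
	return (woken & SK_WOKEN_OTHER) ? 1 : 0;
}
```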

In the future, code dealing with subscribe events should probably
start to place TASK_WOKEN_IO, as is done for upper layers.
2024-12-03 19:45:08 +01:00
Willy Tarreau
6322c9fbbf MINOR: tools: add a new macro DEFVAL() to provide a default argument
This is like DEFZERO and DEFNULL, but this one allows specifying the
default value to be used as the first argument.
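
One possible way to write such a macro is sketched below. The SK_* names are
hypothetical and this is not claimed to be the actual DEFVAL() definition; it
relies on the GNU ", ##__VA_ARGS__" extension supported by GCC and Clang.

```c
#include <assert.h>

/* Expands to its second argument. */
#define SK_SECOND(a, b, ...) b

/* Expands to the optional argument when one is passed, otherwise to <def>.
 * When no optional argument is given, ", ##__VA_ARGS__" removes the comma so
 * that <def> becomes the second argument of SK_SECOND().
 */
#define SK_DEFVAL(def, ...) SK_SECOND(0, ##__VA_ARGS__, (def))

/* Example: a function whose last argument is made optional by a macro. */
static int sk_wait(int fd, int timeout_ms)
{
	(void)fd;
	return timeout_ms;
}

#define sk_wait_on(fd, ...) sk_wait(fd, SK_DEFVAL(1000, ##__VA_ARGS__))
```

Here sk_wait_on(3) behaves like sk_wait(3, 1000), while sk_wait_on(3, 5)
behaves like sk_wait(3, 5).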
2024-12-03 19:45:08 +01:00
Valentine Krasnobaeva
295071007b BUG/MINOR: startup: fix pidfile creation
The pidfile should be created at the latest initialization stage, when we are
sure that the process is able to start successfully; otherwise the PID value
written in this file is no longer valid.

So, for the standalone mode, let's move the block, which opens the pidfile and
let's put it just before applying "chroot". In master-worker mode, master
doesn't perform chroot. So it creates the pidfile, only when the "READY"
message from the newly forked worker is received.

This should be backported only in 3.1
2024-12-02 17:28:04 +01:00
Valentine Krasnobaeva
a33977da48 BUG/MINOR: startup: close pidfd and free global.pidfile in handle_pidfile()
After master-worker mode refactoring, global.pidfile is only used in
handle_pidfile(), which opens the provided file and writes the PID into it. So,
it's more appropriate to perform the close(pidfd) and ha_free(&global.pidfile)
also in this function.

This commit prepares the fix of the pidfile creation, as it's now created very
early, when we are not sure that the process has successfully started. In
master-worker mode handle_pidfile() can be called in the master process context.
So, let's make it accessible from other compilation units via global.h.

This should be backported only in 3.1.
2024-12-02 17:28:04 +01:00
Valentine Krasnobaeva
d3c20b0246 BUG/MINOR: signal: register default handler for SIGINT in signal_init()
When haproxy is launched in the background and in a subshell (see example
below), according to the POSIX standard (2.11. Signals and Error Handling), it
inherits the SIG_IGN signal handler for SIGINT and SIGQUIT from the subshell.

	$ (./haproxy -f env4.cfg &)

So, when haproxy is launched like this, it doesn't stop upon receiving
SIGINT. This can be a root cause of some unexpected timeouts when haproxy
is started under VTest, as VTest sends SIGINT to the process in order to
terminate it. To fix this, let's explicitly register the default signal
handler for SIGINT in the signal_init() initcall.

This should be backported in all stable versions.
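
For reference, resetting an inherited SIG_IGN disposition with plain POSIX
calls can look like the sketch below. The function name is hypothetical and
this is not the code of signal_init(), which goes through haproxy's own signal
registration layer.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Restores the default disposition for SIGINT, whatever was inherited from
 * the parent. Returns 0 on success, -1 on error (errno is set).
 */
static int sk_restore_default_sigint(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_DFL;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGINT, &sa, NULL);
}
```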
2024-12-02 17:28:04 +01:00
Aurelien DARRAGON
70b5cd6794 MINOR: hlua: fix ambiguous hlua usage in hlua_filter_delete()
In GH #2804, @Bbulatov reported that the result of hlua_stream_ctx_get()
was used and de-referenced without checking if it's NULL in
hlua_filter_delete() while other functions used to check for NULL before
de-referencing it.

In fact hlua_stream_ctx_get() can only return NULL if
hlua_stream_ctx_prepare() failed or was not called on the current stream.

Now because of the filter's API, since hlua_filter_delete() is mapped as
detach method and hlua_filter_new() as attach method, and since
hlua_filter_new() is responsible for calling hlua_stream_ctx_prepare(),
there's no reason hlua_filter_delete() should be called if
hlua_filter_new() failed or wasn't called. Thus we can assume that hlua
can never be NULL in hlua_filter_delete(), so we add a BUG_ON() to ensure
it is always the case and remove the ambiguity.
2024-12-02 17:22:51 +01:00
Aurelien DARRAGON
b167426b6b BUG/MINOR: listener: fix potential null pointer dereference in listener_release()
As reported by @Bbulatov on GH #2804, fe is found at multiple places in
listener_release(): in some places it is first checked against NULL before
being de-referenced while in some other places it is not, which is
ambiguous and could hide a bug.

In practice, fe cannot be NULL for now, but it might not be the case in
the future as we want to keep the possibility to run isolated listeners
(that is, without proxy attached).

We've already ensured this was the case with a57786e ("BUG/MINOR:
listener: null pointer dereference suspected by coverity"), but
this promise was recently broken by 65ae134 ("BUG/MINOR: listener: Wake
proxy's mngmt task up if necessary on session release").

Let's fix that by conditioning the block with an "else if" statement
instead of a regular "else".

No need for backport except if multi-connection protocols (ie: FTP) were
to be backported as well.
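
The shape of the fix can be sketched as follows, using hypothetical types and
fields rather than the real listener and proxy structures. The second branch
used to be a plain "else", which would have dereferenced fe even when NULL.

```c
#include <assert.h>
#include <stddef.h>

struct sk_proxy {
	int queued;  /* listeners waiting to be dequeued */
	int wakeups; /* number of times the management task was woken up */
};

/* Returns 1 if a queued listener was dequeued, 2 if the management task was
 * woken up, and 0 if there is no frontend attached (isolated listener).
 */
static int sk_listener_release(struct sk_proxy *fe)
{
	if (fe && fe->queued > 0) {
		fe->queued--;
		return 1;
	}
	else if (fe) {
		fe->wakeups++;
		return 2;
	}
	return 0;
}
```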
2024-12-02 17:22:45 +01:00
William Lallemand
a582b9c18d CI: github: allow coredumps on aws-lc and wolfssl jobs
The weekly aws-lc and wolfssl jobs lack a `ulimit -c` call, which is needed
to get the coredumps.
2024-12-02 15:19:41 +01:00
Frederic Lecaille
7868dc9c45 BUILD: quic: fix a build error about an non initialized timestamp
This is to please an unidentified compiler which complains about a
hypothetical <time_ns> variable which would not be initialized, even though
this is the case only when it is not used.

This build issue arrived with this commit:
	BUG/MINOR: improve BBR throughput on very fast links

Should be backported to 3.1 with this previous commit.
2024-11-29 14:48:37 +01:00
Christopher Faulet
37487ada73 BUG/MINOR: h1-htx: Use default reason if not set when formatting the response
When the response status line is formatted before sending it to the client,
if there is no reason set, HAProxy should add one that matches the status
code, as stated in the configuration manual. However, this is not done.

It is possible to hit this bug when the response comes from a H2 server,
because there is no reason field in HTTP/2 and above.

This patch should fix the issue #2798. It should be backported to all stable
versions.
2024-11-29 14:46:38 +01:00
Christopher Faulet
62f37801c8 BUG/MEDIUM: http-ana: Reset request flag about data sent to perform a L7 retry
It is possible to lose the request after several L7 retries, leading to
crashes, because the request channel flag stating some data were sent is not
properly reset.

When a L7 retry is performed, some flags on different entities must be reset
to be sure a new connection will be properly retried, just like it was the
first one, mainly because there was no connection establishment failure. One
of them, on the request channel, is not reset: the flag stating that some data
were already sent. It is annoying because this flag is used during the
connection establishment to know if an error is triggered at the connection
level or at the data level. In the latter case, the error must be handled by
the HTTP response analyzer, to eventually perform another L7 retry.

Because CF_WROTE_DATA flag is not removed when a L7 retry is performed, a
subsequent connection establishment error may be handled as a L7 error while
in fact the request was never sent. It also means the request was never
saved in the buffer used to perform L7 retries. Thus, on the next L7
retry, the request is just lost. This inevitably leads to a bunch of
undefined behaviors. One of them is a crash, when the request is used to
perform the load-balancing.

This patch should fix issue #2793. It must be backported to all stable
versions.
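
A minimal model of the flag handling, with hypothetical names standing for the
request channel and its CF_WROTE_DATA flag: the "data sent" status is cleared
when a retry is prepared, so that a later connection error is classified at
the connection level again. It is only a sketch of the principle, not the
analyzer code.

```c
#include <assert.h>

#define SK_CF_WROTE_DATA 0x1u /* some request data were sent to the server */

struct sk_channel {
	unsigned int flags;
};

/* Resets the state which must not survive a L7 retry. */
static void sk_prepare_l7_retry(struct sk_channel *req)
{
	req->flags &= ~SK_CF_WROTE_DATA;
}

/* Returns 1 if an error must be handled at the data level (by the response
 * analyzer), 0 if it is a connection establishment error.
 */
static int sk_error_is_data_level(const struct sk_channel *req)
{
	return !!(req->flags & SK_CF_WROTE_DATA);
}
```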
2024-11-29 14:46:38 +01:00
Amaury Denoyelle
9d4c26ebaa BUG/MEDIUM: quic: prevent stream freeze on pacing
On snd_buf completion, the QUIC MUX tasklet is scheduled if new data has
been transferred from the stream layer. Thanks to qcc_wakeup(), pacing
status is removed from the tasklet, which ensures the next emission will reset
Tx frames and use the new data.

Tasklet is not scheduled if MUX is already subscribed on send due to a
previous blocking condition. This is an optimization to prevent an
unneeded IO handler execution. However, this causes a bug if an emission
is currently delayed due to pacing. As pacing status is not removed on
snd_buf, next emission process will continue emission with older data
without refreshing the newly transferred one.

This causes a transfer freeze. Unless there is some activity on the
connection, the transfer will be eventually aborted due to idle timeout.

To fix this, remove TASK_F_USR1 if tasklet wakeup is not called due to
send subscription. Note that this code is also duplicated in done_ff for
zero-copy transfer.

This must be backported up to 3.1.
2024-11-29 14:35:10 +01:00
Aurelien DARRAGON
dd56616067 BUG/MEDIUM: event_hdl: fix uninitialized value in async mode when no data is provided
In _event_hdl_publish(), when we prepare the asynchronous event and no
<data> was provided (set to NULL), we forgot to initialize the _data
event_hdl_async_event struct member to NULL, which leads to uninitialized
reads in event_hdl_async_free_event() when the event is freed:

==1002331== Conditional jump or move depends on uninitialised value(s)
==1002331==    at 0x35D9D1: event_hdl_async_free_event (event_hdl.c:224)
==1002331==    by 0x1CC8EC: hlua_event_runner (hlua.c:9917)
==1002331==    by 0x39AD3F: run_tasks_from_lists (task.c:641)
==1002331==    by 0x39B7B4: process_runnable_tasks (task.c:883)
==1002331==    by 0x314B48: run_poll_loop (haproxy.c:2976)
==1002331==    by 0x315218: run_thread_poll_loop (haproxy.c:3190)
==1002331==    by 0x18061D: main (haproxy.c:3747)

The bug severity was set to MEDIUM because of its nature, and it's best
if this patch can be backported up to 2.8. But in practice it can only be
triggered with events that don't provide optional data: since PAT_REF
events are the first native events making use of this feature, this bug
shouldn't be an issue before f72a66e ("MINOR: pattern: publish event_hdl
events on pat_ref updates")
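
The class of bug can be reproduced with a small stand-alone model. The sk_*
names are hypothetical, only the _data member name is taken from the
description above. The optional payload pointer is always initialized, so
that the release function never tests an indeterminate value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sk_async_event {
	int type;
	void *_data; /* private copy of the optional payload, NULL if none */
};

/* Allocates an event and copies the optional payload. Returns NULL on
 * allocation failure.
 */
static struct sk_async_event *sk_event_new(int type, const void *data, size_t len)
{
	struct sk_async_event *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->type = type;
	e->_data = NULL; /* the initialization that was missing */
	if (data && len) {
		e->_data = malloc(len);
		if (!e->_data) {
			free(e);
			return NULL;
		}
		memcpy(e->_data, data, len);
	}
	return e;
}

/* Releases an event and its payload copy, if any. */
static void sk_event_free(struct sk_async_event *e)
{
	if (e->_data) /* read of an uninitialized member before the fix */
		free(e->_data);
	free(e);
}
```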
2024-11-29 10:18:07 +01:00
Aurelien DARRAGON
4e52438c0b BUG/MINOR: hlua_fcn: fix Patref:set() force parameter
Patref:set(key, val[, force]) takes an optional "force" parameter (defaults
to false) to force the entry to be created if it doesn't already exist.

To retrieve the value, lua_tointeger() was used in place of
lua_toboolean(), and because of that force is not enabled if "true"
is passed as parameter (only numbers were recognized) despite the
documentation mentioning that "force" is a boolean.

To fix the issue, we replace lua_tointeger by lua_toboolean.

Also, the doc was updated to rename "bool" to "boolean" for the "force"
parameter to stay consistent with historical naming in the file.

No backport needed unless 9ee37de5c ("MINOR: hlua_fcn: add Patref:set()")
is.
2024-11-29 07:39:38 +01:00
Aurelien DARRAGON
e5acb03137 DOC: lua: prefer Patref:{set,add}() over legacy methods for acl and maps
Patref:set() can achieve the same thing as core.set_map()
Patref:add() can achieve the same thing as core.add_acl()
Patref:del() can achieve the same thing as core.del_map() and
core.del_acl()

As a bonus, Patref:{set,add} are more efficient than their core
legacy equivalent, because they don't require systematic pattern
reference lookup for each individual operation.

Let's mention that in the doc to encourage Patref methods adoption.
2024-11-29 07:23:59 +01:00
Aurelien DARRAGON
7ff9a1c341 MINOR: hlua_fcn: add Patref:event_sub()
Just like we did for server events, in this patch we expose the PAT_REF
event family (see "MINOR: event_hdl: add PAT_REF events") in Lua.

Unlike server events, Patref events don't provide additional event data,
and the registration can only take place from a Patref object (ie: not
globally).

Thanks to this commit it now becomes possible to trigger actions when
updates are performed on a map (or acl list) being monitored, without
the need to loop or use inefficient workarounds.
2024-11-29 07:23:53 +01:00
Aurelien DARRAGON
884dc6232a MINOR: hlua_fcn: add Patref:add_bulk()
There is no cli equivalent for this one. It is similar to Patref:add()
except that it takes a table as parameter (for acl: table of keys, for
maps: table of keys:values). The goal is to add multiple entries at once
to limit locking time to the strict minimum. It is recommended to use this
one over Patref:add() when adding multiple entries at once.
2024-11-29 07:23:48 +01:00
Aurelien DARRAGON
9ee37de5cf MINOR: hlua_fcn: add Patref:set()
Just like "set map" on the cli, the Patref:set() method (only relevant
for maps) can be used to modify an existing entry's value in the pattern
reference pointed to by the Lua Patref object. Lookup is performed on the
key. The update will target the live pattern reference version, unless
Patref:prepare() is ongoing.
2024-11-29 07:23:43 +01:00
Aurelien DARRAGON
a5f74a2a2d MINOR: hlua_fcn: add Patref:del()
Just like "del map" and "del acl" on the cli, the Patref:del() method can
be used to delete an existing entry in the pattern reference pointed to
by the Lua Patref object. The update will target the live pattern
reference version, unless Patref:prepare() is ongoing.
2024-11-29 07:23:37 +01:00
Aurelien DARRAGON
6cc2662ce7 MINOR: hlua_fcn: add Patref:add()
Just like "add map" and "add acl" on the cli, the Patref:add() method can
be used to add a new entry to the pattern reference pointed to by the
Lua Patref object. The update will target the live pattern reference
version, unless Patref:prepare() is ongoing.
2024-11-29 07:23:32 +01:00
Aurelien DARRAGON
3bcc653ce1 MINOR: hlua_fcn: add Patref:giveup()
If Patref:prepare() was used and the new version (generation) isn't going
to be committed, calling Patref:giveup() will allow allocated resources
to be freed and reused. It is a good habit to call this if commit()
isn't called after a prepare().
2024-11-29 07:23:26 +01:00
Aurelien DARRAGON
fda5ca3472 MINOR: hlua_fcn: add Patref:purge() method
It is a special Lua Patref method: it bypasses the commit/prepare logic
and purges the whole pattern reference items pointed to by Patref Lua
object (all versions, not just the current one). It doesn't have a cli
equivalent: it leverages pat_ref_purge_range().
2024-11-29 07:23:20 +01:00
Aurelien DARRAGON
fe394598c5 MINOR: hlua_fcn: add Patref:prepare() method
Just like the "prepare map" or "prepare acl" on the cli, but for Lua:
it leverages the pattern API to create a subset (ie: a new generation id)
that will automatically be used as target for following Patref operations
(add/set/del...) until the "commit" method is invoked to atomically push
the pending updates.
2024-11-29 07:23:14 +01:00
Aurelien DARRAGON
8bce7ff854 MINOR: hlua_fcn: add Patref:commit() method
commit() method may be used to commit pending updates on the local patref
object:

hlua_patref flags were added:
 HLUA_PATREF_FL_GEN means the patref object has been updated
 and it is associated to a new revision (curr_gen) in order to prepare
 and commit the pending updates.

upon commit, the pattern API is leveraged with curr_gen as revision to
commit new object items. Once commit is performed, previous (pending)
revisions that are older than the committed one are cleaned up (similar
to what's done with commit on the cli). Also, Patref function APIs now
take into account curr_gen to perform lookups.
2024-11-29 07:23:08 +01:00
Aurelien DARRAGON
e769d8f426 MINOR: pattern: add pat_ref_may_commit() helper function
pat_ref_may_commit() may be used to know if a given generation ID is still
valid, which means it may still be committed at some point. Else it means
that another pending generation ID older than the tested one was already
committed and thus other generation IDs below this one are stale and must
be regenerated.
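
A possible shape for such a check is sketched below. It assumes that
generation IDs are unsigned counters which may wrap, and that an ID remains
committable only while it is strictly ahead of the currently committed
generation. Both points are assumptions made for the illustration, and the
function name is hypothetical.

```c
#include <assert.h>

/* Returns non-zero if <gen> is strictly ahead of <curr_gen>, taking counter
 * wrapping into account. The conversion of the unsigned difference to int
 * relies on two's complement behavior, as provided by GCC and Clang.
 */
static int sk_gen_may_commit(unsigned int curr_gen, unsigned int gen)
{
	return (int)(gen - curr_gen) > 0;
}
```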
2024-11-29 07:23:01 +01:00
Aurelien DARRAGON
43ab25f007 MINOR: hlua_fcn: wrap pat_ref struct for patref class
In order to extend the patref class features, let's wrap the pat_ref struct
into hlua_patref struct. This way we may add additional data alongside the
pat_ref pointer to store additional context required for pat_ref data
manipulation from lua.

Since the wrapper (hlua_patref) is an allocated object, we declare the _gc
metamethod for patref class in order to properly cleanup resources when
they are out of scope.
2024-11-29 07:22:54 +01:00
Aurelien DARRAGON
2021072391 MINOR: hlua_fcn: implement index and pair metamethods for patref class
patref object may now leverage index and pair metamethods to list and
access patref elements at a specific index (=key)

Also, patref:is_map() method may be used to know if the patref stores acl
(key only) or map-style (key:value) patterns.
2024-11-29 07:22:46 +01:00
Aurelien DARRAGON
31784efad2 MINOR: hlua: add core.get_patref method
core.get_patref() method may be used to get a reference to a pattern
object (pat_ref struct which is used for maps and acl storage) from
Lua by providing the reference name (filename for files, or prefix+name
for opt or virtual pattern references).

Lua documentation was updated.
2024-11-29 07:22:38 +01:00
Aurelien DARRAGON
956a25cf60 MINOR: hlua: add patref class
Implement patref class to expose the internal pat_ref pattern struct
in lua. This is some prerequisite work needed to be able to manipulate
existing generic pattern object lists (acl/map) from Lua, because the Map
class can only be used to perform matching ops on Map files.
2024-11-29 07:22:32 +01:00
Aurelien DARRAGON
f72a66eef2 MINOR: pattern: publish event_hdl events on pat_ref updates
Now that PAT_REF events were defined in previous commit, let's actually
publish them from pattern API where relevant. Unlike server events,
pattern reference events are only published in the pat_ref subscriber's
list on purpose, because in some setups patref updates (updates performed
on a map for instance from action or cli) are very frequent, and we don't
want to impact pattern API performance just for that.

Moreover, as the main use case is to be able to subscribe to maps updates
from Lua, allowing a per-pattern reference registration is already enough.

No additional data is provided for such events (also for performance reasons).

Care was taken not to publish events when the update doesn't affect the
live subset (the one targeted by curr_gen).
2024-11-29 07:22:25 +01:00
Aurelien DARRAGON
f7267bd315 MINOR: event_hdl: add PAT_REF events
This is some prerequisite work for implementing PAT_REF events.

In this commit we define the PAT_REF event_hdl family (which gets family
slot id #2), with the following supported events:

  - EVENT_HDL_SUB_PAT_REF_ADD: element was added to the current version of
    the pattern ref
  - EVENT_HDL_SUB_PAT_REF_DEL: element was deleted from the current
    version of the pattern ref
  - EVENT_HDL_SUB_PAT_REF_SET: element was modified in the current version
    of the pattern ref
  - EVENT_HDL_SUB_PAT_REF_COMMIT: pending element(s) was/were committed in
    the current version of the pattern ref
  - EVENT_HDL_SUB_PAT_REF_CLEAR: all elements were cleared from the
    current version of the pattern ref

The goal is to be able to track a pat_ref struct in order to be notified
when it is updated. For performance reasons, events from this family won't
provide any additional info, and will only be published in the pat_ref
subscription list. Indeed, pat_ref may be updated at a relatively high
frequency (or worse, batch work), so we cannot afford doing expensive
treatment for each update.
2024-11-29 07:22:18 +01:00
Frederic Lecaille
f8b697c19b BUG/MINOR: improve BBR throughput on very fast links
This patch fixes the loss of information when computing the delivery rate
(quic_cc_drs.c) on links with very low latency due to usage of 32bits
variables with the millisecond as precision.

Initialize the quic_conn task with the TASK_F_WANTS_TIME flag to ask the
scheduler to update the call date of this task. This allows this task to get
a nanosecond resolution on the call date by calling task_mono_time(). This is
enabled only for congestion control algorithms with delivery rate estimation
support (BBR only at this time).

Store the send date of each TX packet with nanosecond precision, obtained
from task_mono_time(), into the new ->time_sent_ns member of the
quic_tx_packet struct.

Make use of this new timestamp by the delivery rate estimation algorithm (quic_cc_drs.c).

Rename current ->time_sent member from quic_tx_packet struct to ->time_sent_ms to
distinguish the unit used by this variable (millisecond) and update the code which
uses this variable. The logic found in quic_loss.c is not modified at all.

Must be backported to 3.1.
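
The loss of information can be illustrated with the following sketch. It is
not the quic_cc_drs.c code: the two helper functions are hypothetical and
only compare a rate computed from millisecond timestamps with the same rate
computed from nanosecond timestamps.

```c
#include <stdint.h>

/* Delivery rate in bytes per second from millisecond timestamps.
 * Returns 0 when the interval rounds down to 0 ms, in which case the
 * sample carries no usable information.
 */
static uint64_t rate_from_ms(uint64_t delivered, uint32_t start_ms, uint32_t end_ms)
{
	uint32_t interval = end_ms - start_ms;

	return interval ? delivered * 1000ULL / interval : 0;
}

/* Same computation with nanosecond timestamps. */
static uint64_t rate_from_ns(uint64_t delivered, uint64_t start_ns, uint64_t end_ns)
{
	uint64_t interval = end_ns - start_ns;

	return interval ? delivered * 1000000000ULL / interval : 0;
}
```

For 12520 bytes acknowledged 200 microseconds after being sent, both
millisecond timestamps are identical and no rate can be computed, while the
nanosecond variant yields 62600000 bytes per second.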
2024-11-28 21:39:05 +01:00
Aurelien DARRAGON
e37976166b MINOR: log: always consider "+M" option in lf_text_len()
Historically, when lf_text_len() or lf_text() were called with a NULL
string and "+M" option was set, "-" would be printed.

However, if the input string was simply an empty one with len > 0, then
nothing would be printed. This can happen if lf_text() is called with
an empty string because in this case len is set to size (indeed, for
performance reasons we don't pre-compute the length, we stop as soon
as we encounter a NUL byte).

In practice, a lot of call places making use of lf_text() or lf_text_len()
try their best to avoid calling lf_text() with an empty string, and
instead explicitly call lf_text_len() with NULL as parameter to consider
the "+M" option.

But this is not enough, as shown in GH #2797, there could still be places
where lf_text() is called with an empty string. In such case, instead of
ignoring the "+M" option, let's check after _lf_text_len() if the returned
pointer differs from the original one. If both are equal, then it means
that nothing was printed (ie: result of empty string): in that case we
check the "+M" option to print "-" when possible.

While this commit seems harmless, it's probably better to avoid
backporting it since it could break existing applications relying on the
historical behavior.
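
The pointer comparison described above can be sketched as follows. The
helper names and the option flag are hypothetical stand-ins, not the real
log helpers; the sketch only shows the "nothing was printed, so print a
dash" logic.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_OPT_MANDATORY 0x1 /* hypothetical stand-in for "+M" */

/* Copies <src> into <dst> (at most <size> - 1 bytes, NUL-terminated when
 * <size> is not 0) and returns a pointer to the terminating NUL in <dst>.
 */
static char *sketch_copy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (!size)
		return dst;
	len = strlen(src);
	if (len > size - 1)
		len = size - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return dst + len;
}

/* Writes <src>. If the returned pointer equals <dst>, nothing was
 * printed: in that case "-" is emitted when the mandatory option is set
 * and there is room for it.
 */
static char *sketch_lf_text(char *dst, const char *src, size_t size, int opts)
{
	char *ret = sketch_copy(dst, src ? src : "", size);

	if (ret == dst && (opts & SKETCH_OPT_MANDATORY) && size > 1) {
		*ret++ = '-';
		*ret = '\0';
	}
	return ret;
}
```

A NULL string and an empty string are handled the same way here, which is
the behavior this commit aims for.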
2024-11-28 13:11:11 +01:00
Aurelien DARRAGON
3e470471b7 BUG/MINOR: log: fix lf_text() behavior with empty string
As reported by Baptiste in GH #2797, if a logformat alias leveraging
lf_text() ends up printing nothing (empty string), the whole logformat
evaluation stops, leading to a garbage log message.

This bug was introduced during 3.0 cycle in fcb7e4b ("MINOR: log: add
lf_rawtext{_len}() functions"). At that time I genuinely thought that
if strlcpy2() returned 0, it was due to a lack of space, actually
forgetting that the function may simply be called with an empty string.

Because of that, lf_text() would return NULL if called with an empty
string, and since all lf_*() helpers are expected to return NULL on
error, this explains why the logformat evaluation immediately stops in
this case.

To fix the issue, let's simply consider that strlcpy2() returning 0 is
not an error, like it was already the case before.

It should be backported in 3.1 and 3.0 with fcb7e4b.
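
The error convention at stake can be sketched as below. Both functions are
hypothetical stand-ins (this is not strlcpy2() nor lf_text()), and treating
a zero-sized output buffer as the only error is an assumption made for the
sketch.

```c
#include <stddef.h>

/* Bounded copy: copies at most <size> - 1 bytes, NUL-terminates, and
 * returns the number of bytes copied, which is 0 for an empty source.
 */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t n = 0;

	if (!size)
		return 0;
	while (n < size - 1 && src[n]) {
		dst[n] = src[n];
		n++;
	}
	dst[n] = '\0';
	return n;
}

/* Returns the new write position, or NULL on error. A copy of 0 bytes is
 * not an error: it only means that the source string was empty.
 */
static char *sketch_text(char *dst, const char *src, size_t size)
{
	if (!size)
		return NULL; /* assumption: no room at all is an error */
	return dst + sketch_strlcpy(dst, src, size);
}
```

With this convention an empty string returns the unchanged write position
instead of NULL, so the caller keeps evaluating the remaining fields.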
2024-11-28 12:10:11 +01:00
Christopher Faulet
bc66d31985 MINOR: proxy: Add support of 421-Misdirected-Request in retry-on status
The "421" status can now be specified on retry-on directives. PR_RE_* flags
were updated to remain sorted.

This patch should fix the issue #2794. It is quite simple so it may safely
be backported to 3.1 if necessary.
2024-11-28 11:47:40 +01:00
Christopher Faulet
7262433183 BUG/MEDIUM: sock: Remove FD_POLL_HUP during connect() if FD_POLL_ERR is not set
epoll_wait() may return EPOLLHUP and/or EPOLLRDHUP after an asynchronous
connect(), to indicate that the peer accepted the connection then
immediately closed before epoll_wait() returned. When this happens,
sock_conn_check() is called to check whether or not the connection was
correctly established, and after that the receive channel of the socket is
assumed to
already be closed. This lets haproxy send the request at best (if RDHUP and
not HUP) then immediately close.

Over the last two years, there were a few reports about this spuriously
happening on connections where network captures proved that the server had
not closed at all (and sometimes even received the request and responded to
it after haproxy had closed). The logs show that a successful connection is
immediately reported on error after the request was sent. After
investigations, it appeared that a EPOLLUP, or eventually a EPOLLRDHUP, can
be reported by epool_wait() during the connect() but in sock_conn_check(),
the connect() reports a success. So the connection is validated but the HUP
is handled on the first receive and an error is reported.

The same behavior could be observed on health-checks, leading HAProxy to
consider the server as DOWN while it is not.

The only explanation at this point is that it is a kernel bug, notably
because it does not even match the documentation for connect() nor epoll. In
addition for now it was only observed with Ubuntu kernels 5.4 and 5.15 and
was never reproduced on any other one.

We have no reproducer but here is the typical strace observed:

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 114
fcntl(114, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(114, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(114, {sa_family=AF_INET, sin_port=htons(11000), sin_addr=inet_addr("A.B.C.D")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(19, EPOLL_CTL_ADD, 114, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=114, u64=114}}) = 0
epoll_wait(19, [{events=EPOLLIN, data={u32=15, u64=15}}, {events=EPOLLIN, data={u32=151, u64=151}}, {events=EPOLLIN, data={u32=59, u64=59}}, {events=EPOLLIN|EPOLLRDHUP, data={u32=114, u64=114}}], 200, 0) = 4
epoll_ctl(19, EPOLL_CTL_MOD, 114, {events=EPOLLOUT, data={u32=114, u64=114}}) = 0
epoll_wait(19, [{events=EPOLLOUT, data={u32=114, u64=114}}, {events=EPOLLIN, data={u32=15, u64=15}}, {events=EPOLLIN, data={u32=10, u64=10}}, {events=EPOLLIN, data={u32=165, u64=165}}], 200, 0) = 4
connect(114, {sa_family=AF_INET, sin_port=htons(11000), sin_addr=inet_addr("A.B.C.D")}, 16) = 0
sendto(114, "POST "..., 1009, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 1009
close(114)                              = 0

Some resources about this issue:
  - https://www.spinics.net/lists/netdev/msg876470.html
  - https://github.com/haproxy/haproxy/issues/1863
  - https://github.com/haproxy/haproxy/issues/2368

So, to work around the issue, we have decided to remove FD_POLL_HUP flag on
the FD during the connection establishment if FD_POLL_ERR is not reported
too in sock_conn_check(). This way, the call to connect() is able to
validate or reject the connection. At the end, if the HUP or RDHUP flags
were valid, either connect() would report the error itself, or the next
recv() would return 0 confirming the closure that the poller tried to
report. EPOLLRDHUP is only an optimization to save a syscall anyway, and
this pattern is so rare that nobody will ever notice the extra call to
recv().

Please note that at least one reporter confirmed that using poll() instead
of epoll() also addressed the problem, so that can also be a temporary
workaround for those discovering the problem without the ability to
immediately upgrade.

The event is accounted via a COUNT_IF(), to be able to spot it in future
issues, just in case.

This patch should fix the issue #1863 and #2368. It may be related
to #2751. It should be backported as far as 2.4. In 3.0 and below, the
COUNT_IF() must be removed.
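
The workaround boils down to a small flag filter, sketched below. The flag
values and the function name are hypothetical: the real FD_POLL_* constants
and the code in sock_conn_check() may differ.

```c
/* Hypothetical flag values, for illustration only. */
#define SK_POLL_ERR 0x1u
#define SK_POLL_HUP 0x2u

/* Called while a connect() is still being validated. A HUP reported
 * without an error is not trusted: it is cleared so that connect(), or a
 * later recv() returning 0, decides about the connection state.
 */
static unsigned int sketch_filter_hup(unsigned int state)
{
	if ((state & SK_POLL_HUP) && !(state & SK_POLL_ERR))
		state &= ~SK_POLL_HUP;
	return state;
}
```

A HUP reported together with an error is left untouched, so genuine
connection failures are still seen immediately.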
2024-11-27 12:16:25 +01:00
Willy Tarreau
eea2697e95 DEV: patchbot: prepare for new version 3.2-dev
The bot will now load the prompt for the upcoming 3.2 version so we have
to rename the files and update their contents to match the current version.
2024-11-26 17:24:21 +01:00
Willy Tarreau
97d33abb23 MINOR: version: this is development again (3.2)
This basically reverts commit b629f366a7 ("MINOR: version: mention that
3.1 is stable now").
2024-11-26 17:21:16 +01:00
Aurelien DARRAGON
aa69a02d7f MEDIUM: pattern: always consider gen_id for pat_ref lookup operations
Historically, pat_ref lookup operations were performed on the whole
pat_ref elements list. As such, set, find and delete operations on a given
key would cause any matching element in pat_ref to be considered.

When prepare/commit operations were added, gen_id was implemented in
order to be able to work on a subset from pat_ref without impacting
the current (live) version from pat_ref, until a new subset is committed
to replace the current one.

While the logic was good, there remained a design flaw from the historical
implementation: indeed, legacy functions such as pat_ref_set(),
pat_ref_delete() and pat_ref_find_elt() kept performing the lookups on the
whole set of elements instead of considering only elements from the current
subset. Because of this, mixing new prepare/commit operations with legacy
operations could yield unexpected results.

For instance, before this commit:

  echo "add map #0 key oldvalue" | socat /tmp/ha.sock -
  echo "prepare map #0" | socat /tmp/ha.sock -
  New version created: 1
  echo "add map @1 #0 key newvalue" | socat /tmp/ha.sock -
  echo "del map #0 key" | socat /tmp/ha.sock -
  echo "commit map @1 #0" | socat /tmp/ha.sock -

  -> the result would be that "key" entry doesn't exist anymore after the
  commit, while we would expect the new value to be there instead.

Thanks to the previous commits, we may finally fix this issue: for set,
find_elt and delete operations, the current generation id is considered.

With the above example, it means that the "del map #0 key" would only
target elements from the current subset, thus elements in "version 1" of
the map would be immune to the delete (as we would expect it to work).
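
The expected behavior can be modeled with the sketch below. All types and
functions are hypothetical and much simpler than the real pat_ref code: the
only point is that delete and find consider a single generation.

```c
#include <stddef.h>
#include <string.h>

#define SK_MAX_ELTS 8

/* Hypothetical, simplified element and reference types. */
struct sk_elt {
	const char *key;
	const char *val;
	unsigned int gen; /* generation this element belongs to */
	int used;
};

struct sk_ref {
	struct sk_elt elts[SK_MAX_ELTS];
	unsigned int curr_gen; /* live generation */
};

/* Adds <key>/<val> to generation <gen>. Returns 1, or 0 if full. */
static int sk_gen_add(struct sk_ref *ref, unsigned int gen,
                      const char *key, const char *val)
{
	int i;

	for (i = 0; i < SK_MAX_ELTS; i++) {
		if (!ref->elts[i].used) {
			ref->elts[i].key = key;
			ref->elts[i].val = val;
			ref->elts[i].gen = gen;
			ref->elts[i].used = 1;
			return 1;
		}
	}
	return 0;
}

/* Deletes elements matching <key> in generation <gen> only. */
static int sk_gen_delete(struct sk_ref *ref, unsigned int gen, const char *key)
{
	int i, deleted = 0;

	for (i = 0; i < SK_MAX_ELTS; i++) {
		struct sk_elt *e = &ref->elts[i];

		if (e->used && e->gen == gen && strcmp(e->key, key) == 0) {
			e->used = 0;
			deleted++;
		}
	}
	return deleted;
}

/* Looks <key> up in the live generation. Returns its value or NULL. */
static const char *sk_find(const struct sk_ref *ref, const char *key)
{
	int i;

	for (i = 0; i < SK_MAX_ELTS; i++) {
		const struct sk_elt *e = &ref->elts[i];

		if (e->used && e->gen == ref->curr_gen && strcmp(e->key, key) == 0)
			return e->val;
	}
	return NULL;
}

/* Makes <gen> live and drops elements from older generations. */
static void sk_commit(struct sk_ref *ref, unsigned int gen)
{
	int i;

	for (i = 0; i < SK_MAX_ELTS; i++) {
		if (ref->elts[i].used && (int)(ref->elts[i].gen - gen) < 0)
			ref->elts[i].used = 0;
	}
	ref->curr_gen = gen;
}
```

Replaying the CLI sequence above on this model (add in generation 0, add in
generation 1, delete in the live generation, commit generation 1) leaves
"key" associated with "newvalue".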
2024-11-26 16:12:31 +01:00
Aurelien DARRAGON
010c34b8c7 MEDIUM: pattern: consider gen_id in pat_ref_set_from_node()
Don't set all duplicates from a given node if they don't have the same
gen_id. Indeed, now we consider the gen_id to only work on the same
pattern ref revision.
2024-11-26 16:12:26 +01:00
Aurelien DARRAGON
4792f27892 MINOR: pattern: add pat_ref_gen_delete() function
pat_ref_gen_delete(ref, gen_id, key) tries to delete all samples belonging
to <gen_id> and matching <key> under <ref>

The goal is to be able to target a single subset from <ref>
2024-11-26 16:12:21 +01:00
Aurelien DARRAGON
a131c542a6 MINOR: pattern: add pat_ref_gen_find_elt() function
pat_ref_gen_find_elt(ref, gen_id, key) tries to find <elt> element
belonging to <gen_id> and matching <key> in <ref> reference.

The goal is to be able to target a single subset from <ref>
2024-11-26 16:12:16 +01:00
Aurelien DARRAGON
c9d6af3c6d MINOR: pattern: add pat_ref_gen_set() function
pat_ref_gen_set(ref, gen_id, value, err) modifies to <value> the sample
of all patterns matching <key> and belonging to <gen_id> (generation id)
under <ref>

The goal is to be able to target a single subset from <ref>
2024-11-26 16:12:11 +01:00
Aurelien DARRAGON
3d250b3be8 MINOR: pattern: split pat_ref_set()
split pat_ref_set() function in 2 distinct functions. Indeed, since
0844bed7d3 ("MEDIUM: map/acl: Improve pat_ref_set() efficiency (for
"set-map", "add-acl" action perfs)"), pat_ref_set() prototype was updated
to include an extra <elt> argument. But the logic behind is not explicit
because the function will not only try to set <elt>, but also its
duplicate (unlike pat_ref_set_elt() which only tries to update <elt>).

Thus, to make it clearer and better distinguish between the key-based
lookup version and the elt-based one, restore pat_ref_set() previous
prototype and add a dedicated pat_ref_set_elt_duplicate() that takes
<elt> as argument and tries to update <elt> and all duplicates.
2024-11-26 16:12:05 +01:00
Willy Tarreau
4d58f521ee [RELEASE] Released version 3.2-dev0
Released version 3.2-dev0 with the following main changes :
    - exact copy of 3.1.0
2024-11-26 15:33:57 +01:00
Willy Tarreau
f2b97918e8 [RELEASE] Released version 3.1.0
Released version 3.1.0 with the following main changes :
    - BUG/MAJOR: mux-h1: Properly handle wrapping on obuf when dumping the first-line
    - BUILD: activity/memprofile: fix a build warning in the posix_memalign handler
    - BUG/MINOR: quic: Avoid BUG_ON() on ->on_pkt_lost() BBR callback call
    - CI: update to the latest AWS-LC version
    - CI: update to the latest WolfSSL version
    - DOC: ot: mention planned deprecation of the OT filter
    - Revert "CI: update to the latest WolfSSL version"
    - CI: github: add a WolfSSL job which tries the latest version
    - BUILD: systemd: fix usage of reserved name "sun" in the address field
    - BUILD: init: use the more portable FD_CLOEXEC for /dev/null
    - CI: github: improve the Wolfssl job
    - CI: github: improve the AWS-LC job
    - BUG/MINOR: mux-quic: fix show quic report of QCS prepared bytes
    - BUG/MEDIUM: quic: fix sending performance due to qc_prep_pkts() return
    - MINOR: mux-quic: use sched call time for pacing
    - CI: github: allow to run the Illumos job manually
    - BUILD: tcp_sample: var_fc_counter defined but not used
    - CI: github: add 'workflow_dispatch' on remaining build jobs
    - DOC: config: refine a little bit the text on QUIC pacing
    - MINOR: proto_sockpair: send_fd_uxst: init iobuf, cmsghdr, cmsgbuf to zeros
    - MINOR: startup: rename on_new_child_failure to mworker_on_new_child_failure
    - REORG: startup: move on_new_child_failure in mworker.c
    - MINOR: startup: prefix prepare_master and run_master with mworker_*
    - REORG: startup: move mworker_prepare_master in mworker.c
    - MINOR: startup: keep updating verbosity modes only in haproxy.c
    - REORG: startup: move mworker_run_master and mworker_loop in mworker.c
    - REORG: startup: move mworker_reexec and mworker_reload in mworker.c
    - MINOR: startup: prefix apply_master_worker_mode with mworker_*
    - REORG: startup: move mworker_apply_master_worker_mode in mworker.c
    - MINOR: cfgparse-quic: strengthen quic-cc-algo parsing
    - BUG/MAJOR: quic: fix wrong packet building due to already acked frames
    - DEV: lags/show-sess-to-flags: Properly handle fd state on server side
    - BUG/MEDIUM: http-ana: Don't release too early the L7 buffer
    - MINOR: quic: make bbr consider the max window size setting
    - DOC: quic: Amend the pacing information about BBR.
    - BUG/MEDIUM: quic: prevent EMSGSIZE with GSO for larger bufsize
    - MINOR: cli: Add a "help" keyword to show sess
    - MINOR: cli/quic: Add a "help" keyword to show quic
    - DOC: management: mention "show sess help" and "show quic help"
    - DOC: install: update the list of supported versions
    - MINOR: version: mention that 3.1 is stable now
2024-11-26 15:24:10 +01:00
Christopher Faulet
b629f366a7 MINOR: version: mention that 3.1 is stable now
This version will be maintained up to around Q1 2026. The INSTALL file
also mentions it.
2024-11-26 15:23:54 +01:00
Willy Tarreau
0a406054c7 DOC: install: update the list of supported versions
OpenSSL up to 3.4 was tested, and gcc up to 14 was tested, so let's
reflect this in the install doc.
2024-11-26 15:23:54 +01:00
Willy Tarreau
16022c2a7b DOC: management: mention "show sess help" and "show quic help"
These ones were recently added but we forgot to update the doc.
2024-11-26 15:00:51 +01:00
Olivier Houchard
4f973ab23a MINOR: cli/quic: Add a "help" keyword to show quic
Add a help keyword to show quic, that will provide a longer explanation
of all the available options than what is provided by the command "help".
2024-11-26 14:55:30 +01:00
Olivier Houchard
5288d0f47b MINOR: cli: Add a "help" keyword to show sess
Add a help keyword to show sess, that will provide a longer explanation of
all the available options than what is provided by the command "help".
2024-11-26 14:55:30 +01:00
Amaury Denoyelle
2fffd85b97 BUG/MEDIUM: quic: prevent EMSGSIZE with GSO for larger bufsize
A UDP datagram cannot be greater than 65535 bytes, as UDP length header
field is encoded on 2 bytes. As such, sendmsg() will reject a bigger
input with error EMSGSIZE. By default, this does not cause any issue as
QUIC datagrams are limited to 1252 bytes and sent individually.

However, with GSO support, values bigger than 1252 bytes are specified
on sendmsg(). If using a bufsize equal to or greater than 65535, syscall
could reject the input buffer with EMSGSIZE. As this value is not
expected, the connection is immediately closed by haproxy and the
transfer is interrupted.

This bug can easily be reproduced by requesting a large object on loopback
interface and using a bufsize of 65535 bytes. In fact, the limit is
slightly less than 65535, as extra room is also needed for IP + UDP
headers.

Fix this by reducing the count of datagrams encoded in a single GSO
invocation via qc_prep_pkts(). Previously, it was set to 64 as specified
by man 7 udp. However, with 1252-byte datagrams, this is still too many.
Reduce it to a value of 52. Input to sendmsg will thus be restricted to
at most 65104 bytes if last datagram is full.

If there is still data available for encoding in qc_prep_pkts(), they
will be written in a separate batch of datagrams. qc_send_ppkts() will
then loop over the whole QUIC Tx buffer and call sendmsg() for each
series of at most 52 datagrams.

This does not need to be backported.
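
The arithmetic behind the value 52 can be checked with the sketch below. The
constant and function names are hypothetical, and the header sizes assume
IPv4 without options; the patch itself simply uses a fixed limit.

```c
#define SK_UDP_MAX_LEN  65535u /* 16-bit length field */
#define SK_UDP_HDR_LEN  8u
#define SK_IPV4_HDR_LEN 20u    /* assumption: IPv4 without options */

/* Largest number of <dgram_len>-byte datagrams whose concatenation still
 * fits in a single UDP send once headers are accounted for.
 */
static unsigned int sk_max_gso_dgrams(unsigned int dgram_len)
{
	unsigned int room = SK_UDP_MAX_LEN - SK_UDP_HDR_LEN - SK_IPV4_HDR_LEN;

	return dgram_len ? room / dgram_len : 0;
}
```

For 1252-byte datagrams this gives 52, i.e. 65104 bytes, whereas 64
datagrams would need 80128 bytes.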
2024-11-26 11:49:30 +01:00
Frederic Lecaille
3cee8d7830 DOC: quic: Amend the pacing information about BBR.
BBR handles its own burst size itself (mentioned as send_quantum in BBR RFC).
2024-11-26 08:00:58 +01:00
Frederic Lecaille
a3248a39eb MINOR: quic: make bbr consider the max window size setting
Limit the BBR congestion control window size as this is done for all the other
congestion control algorithms with tune.quic.frontend.default-max-window-size
or as first argument passed to "bbr" option for "quic-cc-algo".
2024-11-26 08:00:58 +01:00
Christopher Faulet
dc15581c02 BUG/MEDIUM: http-ana: Don't release too early the L7 buffer
In some cases, the buffer used to store the request to be able to perform a
L7 retry is released too early, leading to a crash because a retry
is performed with an empty request.

First, there is a test on invalid 101 responses that may be caught by the
"junk-response" retry policy. Then, it is possible to get an error
(empty-response, bad status code...) after an interim response. In both
cases, the L7 buffer is already released while it should not.

To fix the issue, the L7 buffer is now released at the end of the
AN_RES_WAIT_HTTP analyser, but only when a response was successfully
received and processed. In all error cases, the stream is quickly released,
with the L7 buffer. So there is no leak and it is safer this way.

This patch may fix the issue #2793. It must be backported as far as 2.4.
2024-11-25 22:18:19 +01:00
Christopher Faulet
ceb80aed57 DEV: lags/show-sess-to-flags: Properly handle fd state on server side
It must be handled as an hexadecimal value.
2024-11-25 21:57:30 +01:00
Frederic Lecaille
96b2641fc8 BUG/MAJOR: quic: fix wrong packet building due to already acked frames
If a packet build was asked to probe the peer with frames which have just
been acked, the frames build run by qc_build_frms() could be cancelled  by
qc_stream_frm_is_acked() whose aim is to check that current frames to
be built have not been already acknowledged. In this case the packet build run
by qc_do_build_pkt() is not interrupted, leading to the build of an empty packet
which should be ack-eliciting.

This is a bug detected by the BUG_ON() statement in qc_do_build_pkt():

    BUG_ON(qel->pktns->tx.pto_probe &&
           !(pkt->flags & QUIC_FL_TX_PACKET_ACK_ELICITING));

Thank you to @Tristan971 for having reported this issue in GH #2709

This is an old bug which must be backported as far as 2.6.
2024-11-25 18:55:45 +01:00
Amaury Denoyelle
d41273c633 MINOR: cfgparse-quic: strengthen quic-cc-algo parsing
quic-cc-algo is a bind keyword which is used to specify the congestion
control algorithm. It is parsed via function bind_parse_quic_cc_algo().

The parsing function was too lax as it used strncmp for algo token
matching. This could cause surprises when specifying an invalid algorithm
name that starts identically to another entry, especially if extra
parameters are specified in parentheses, as in this case the parameter
values are completely ignored and default values used instead.

To fix this, convert algo argument to ist. Then, use istsplit() to
extract algo token from the optional extra arguments and compare the
whole value with isteq().
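
The difference between both matching strategies can be sketched in plain C
as below. The function names are hypothetical, strcspn() and memcmp() stand
in for the ist helpers, and the exact form of the original strncmp() call is
an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Prefix match: any argument starting with <algo> is accepted. */
static int sk_match_prefix(const char *arg, const char *algo)
{
	return strncmp(arg, algo, strlen(algo)) == 0;
}

/* Strict match: isolate the token located before an optional '(' and
 * compare it as a whole with <algo>.
 */
static int sk_match_token(const char *arg, const char *algo)
{
	size_t tok_len = strcspn(arg, "(");

	return tok_len == strlen(algo) && memcmp(arg, algo, tok_len) == 0;
}
```

With the prefix match, an invalid name such as "cubicfoo(10m)" is accepted
as "cubic", while the strict match rejects it and still accepts
"cubic(10m)".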
2024-11-25 16:19:54 +01:00
Valentine Krasnobaeva
3500865bc1 REORG: startup: move mworker_apply_master_worker_mode in mworker.c
mworker_apply_master_worker_mode() is called only in master-worker mode, so
let's move it to mworker.c
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
3899a7ecaa MINOR: startup: prefix apply_master_worker_mode with mworker_*
This patch prepares the move of apply_master_worker_mode in mworker.c. So,
let's at first rename it to mworker_apply_master_worker_mode.
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
dee247c14e REORG: startup: move mworker_reexec and mworker_reload in mworker.c
Let's move mworker_reexec() and mworker_reload() in mworker.c. mworker_reload()
is called only within functions which are already in mworker.c. So, this
reorganization allows mworker_reload() to be declared static.
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
0c7b93eb1d REORG: startup: move mworker_run_master and mworker_loop in mworker.c
mworker_run_master() is called only in master mode. mworker_loop() is static
and called only in mworker_run_master(). So let's move both of these functions
in mworker.c.

We also need here to make run_thread_poll_loop() accessible from other units,
as it's used in mworker_loop().
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
56894db000 MINOR: startup: keep updating verbosity modes only in haproxy.c
This commit prepares the move of mworker_run_master() in mworker.c.

Let's remove from its definition the code which adjusts verbosity
depending on other global run time modes (daemon or foreground). This part
should stay in main(), where all verbosity modes are handled for
different mode combinations.
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
7974089ac6 REORG: startup: move mworker_prepare_master in mworker.c
mworker_prepare_master() performs some preparation routines for the new worker
process, which will be forked during the startup. It's called only in
master-worker mode, so let's move it in mworker.c.
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
41cc1fe310 MINOR: startup: prefix prepare_master and run_master with mworker_*
This patch prepares the move of prepare_master() and run_master() definitions
into mworker.c. So, let's at first prefix their names with mworker_*.
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
af642420b4 REORG: startup: move on_new_child_failure in mworker.c
mworker_on_new_child_failure() performs some routines for the worker process,
if it has failed the reload. As it's called only in mworker_catch_sigchld()
from mworker.c, let's move mworker_on_new_child_failure() in mworker.c as well.
This way it can also be declared static.
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
321c021a83 MINOR: startup: rename on_new_child_failure to mworker_on_new_child_failure
This patch prepares the moving of on_new_child_failure definition into
mworker.c. So, let's rename it accordingly and let's also update its
description.
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
10c14a1ed0 MINOR: proto_sockpair: send_fd_uxst: init iobuf, cmsghdr, cmsgbuf to zeros
In master-worker mode, the worker process now uses send_fd_uxst() to send
'_send_status' command to master. Since refactoring, this started to trigger
the following Valgrind reports:

==810584== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==810584==    at 0x4AAC99D: __libc_sendmsg (sendmsg.c:28)
==810584==    by 0x4AAC99D: sendmsg (sendmsg.c:25)
==810584==    by 0x56350F: send_fd_uxst (proto_sockpair.c:271)
==810584==    by 0x3AA25C: main (haproxy.c:4151)
==810584==  Address 0x1ffefffbfe is on thread 1's stack
==810584==  in frame #1, created by send_fd_uxst (proto_sockpair.c:241)
==810584==
==810584== Syscall param sendmsg(msg.msg_control) points to uninitialised byte(s)
==810584==    at 0x4AAC99D: __libc_sendmsg (sendmsg.c:28)
==810584==    by 0x4AAC99D: sendmsg (sendmsg.c:25)
==810584==    by 0x56350F: send_fd_uxst (proto_sockpair.c:271)
==810584==    by 0x3AA25C: main (haproxy.c:4151)
==810584==  Address 0x1ffefffc14 is on thread 1's stack
==810584==  in frame #1, created by send_fd_uxst (proto_sockpair.c:241)
==810584==

So, let's initialize with zeros all buffers which are passed to the sendmsg()
syscall used in send_fd_uxst(), to avoid these Valgrind messages. They
increase Valgrind output and could hide some other, more important
reports.
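
The general pattern can be sketched as follows. This is not the actual
send_fd_uxst() code: the function name and the 2-byte payload are
hypothetical, and only standard socket calls are used.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Sends <fd> over the UNIX socket <sock> as SCM_RIGHTS ancillary data.
 * Every buffer handed to sendmsg() is zeroed first so that no
 * uninitialised padding byte reaches the syscall.
 * Returns the sendmsg() result.
 */
static ssize_t sk_send_fd(int sock, int fd)
{
	char iobuf[2];
	union {
		struct cmsghdr hdr; /* forces a suitable alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct iovec iov;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(iobuf, 0, sizeof(iobuf));
	memset(&ctl, 0, sizeof(ctl));
	memset(&iov, 0, sizeof(iov));
	memset(&msg, 0, sizeof(msg));

	iov.iov_base = iobuf;
	iov.iov_len = sizeof(iobuf);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return sendmsg(sock, &msg, 0);
}
```

The memset() calls cost very little here and keep the padding bytes of the
control buffer defined.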
2024-11-25 15:20:24 +01:00
Willy Tarreau
7fb98e833c DOC: config: refine a little bit the text on QUIC pacing
The QUIC pacing options changed a few times during their development.
For example the unit is now in datagrams not bytes. Also a few
sentences were slightly ambiguous so let's reword this.

No backport is needed.
2024-11-25 14:54:16 +01:00
William Lallemand
dee3f4b3ff CI: github: add 'workflow_dispatch' on remaining build jobs
Add 'workflow_dispatch' on the remaining scheduled build jobs that do
not have it.

This keyword allows a job to be started manually from the "Actions" interface
in github.
2024-11-25 14:03:13 +01:00
William Lallemand
da1331b0b5 BUILD: tcp_sample: var_fc_counter defined but not used
var_fc_counter is not used on Illumos and emits a warning:

  src/tcp_sample.c:291:12: warning: ‘var_fc_counter’ defined but not used [-Wunused-function]
    291 | static int var_fc_counter(struct arg *args, char **err)
        |            ^~~~~~~~~~~~~~

Let's add an ifdef to build it.
2024-11-25 11:41:26 +01:00
William Lallemand
079193e375 CI: github: allow to run the Illumos job manually
Add the "workflow_dispatch" option to the Illumos CI so it can be run
manually from the github actions page.
2024-11-25 11:30:55 +01:00
Amaury Denoyelle
22bd92a87f MINOR: mux-quic: use sched call time for pacing
QUIC pacing was recently implemented to limit burst and improve overall
bandwidth. This is used only for MUX STREAM emission. Pacing requires
nanosecond resolution. As such, it used now_cpu_time() which relies on
clock_gettime() syscall.

The usage of clock_gettime() has several drawbacks :
* it is a syscall and thus requires a context-switch which may hurt
  performance
* it is not available on all systems
* timestamp is retrieved multiple times during a single task execution,
  thus yielding different values which may tamper with pacing calculation

Improve this by using task_mono_time() instead. This returns task call
time from the scheduler thread context. It requires the flag
TASK_F_WANTS_TIME on QUIC MUX tasklet to force the scheduler to update
call time with now_mono_time(). This solves every limitation listed
above :
* syscall invocation is only performed once before tasklet execution,
  thus reducing context-switch impact
* on non compatible systems, a millisecond timer is used as a fallback
  which should ensure that pacing works decently for them
* timer value is now guaranteed to be fixed during task execution
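
The "sample once per task run" idea can be sketched as below. The types and
functions are hypothetical and unrelated to the real scheduler code; they
only show a timestamp taken once before the task runs and then read as many
times as needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical task context. */
struct sk_task {
	uint64_t call_ns; /* timestamp taken once by the scheduler */
};

/* Monotonic clock in nanoseconds, 0 if the clock is unavailable. */
static uint64_t sk_now_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Scheduler side: sample the clock once, right before running the task. */
static void sk_sched_prepare(struct sk_task *t)
{
	t->call_ns = sk_now_ns();
}

/* Task side: every read during the same run returns the same value and
 * costs no extra clock access.
 */
static uint64_t sk_task_time(const struct sk_task *t)
{
	return t->call_ns;
}
```

Pacing computations made at several points of the same run then all start
from the same date.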
2024-11-25 11:21:45 +01:00
Amaury Denoyelle
044452546e BUG/MEDIUM: quic: fix sending performance due to qc_prep_pkts() return
qc_prep_pkts() is a QUIC transport level function which encodes one or
several datagrams in a buffer before sending them. It returns the number
of encoded datagrams. This is especially important when pacing is used to
limit packet bursts.

This datagram accounting was not trivial as qc_prep_pkts() used several
code paths depending on the condition of the current encoded packet.
Thus, there were several places where the local variable dgram_cnt could
have been incremented. This was implemented by the following commit :

  commit 5cb8f8a6224db96f4386277c41ddae4a29a4130d
  MINOR: quic: support a max number of built packet per send iteration

However, there is a bug due to a missing increment when all frames from
the current QEL have been encoded. In this case, the encoding continues
in the same datagram to coalesce a future packet. However, if this is the
last QEL, encoding loop will then break. As first_pkt is not NULL,
qc_txb_store() is called outside but dgram_cnt is not yet incremented.

In particular, this causes qc_prep_pkts() to return 0 when there are only
small STREAM frames to emit for application QEL. In qc_send(), this is
interpreted as a value which prevents further emission for the current
invocation. Thus, it may hurt performance, both without and with
pacing.

To fix this, remove the multiple dgram_cnt increments. Now, it is modified
only in a single place which should cover every case, and renders the
code easier to validate.
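
The single-increment idea can be sketched as follows. This is only an
illustration under simplified assumptions, not the actual qc_prep_pkts()
code: count_dgrams() and the closes_dgram[] array are hypothetical, and a
packet is assumed either to close its datagram or to leave it open for
coalescing.

```c
#include <stddef.h>

/* Hypothetical sketch: closes_dgram[i] is non-zero when packet i ends its
 * datagram, and zero when the datagram is left open to coalesce a future
 * packet. A datagram left open by the last packet must still be stored
 * and counted.
 */
int count_dgrams(const int *closes_dgram, int npkts)
{
	int dgram_cnt = 0;
	int i;

	for (i = 0; i < npkts; i++) {
		if (closes_dgram[i] || i == npkts - 1) {
			/* the only place where a datagram is accounted */
			dgram_cnt++;
		}
	}
	return dgram_cnt;
}
```

With a single small packet that leaves its datagram open, the count is 1
rather than 0, which is the case the fix addresses.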

The most notable case where the bug is visible is when using cubic with
pacing without any burst, with quic-cc-algo cubic(,1). First, transfer
bandwidth on average was suboptimal, with significant variation. Worse,
it could sometimes fall dramatically for a particular stream without
recovering before returning to an expected level on the next one.

No need to backport.
2024-11-25 11:21:28 +01:00
Amaury Denoyelle
3704e0e174 BUG/MINOR: mux-quic: fix show quic report of QCS prepared bytes
On show quic, each MUX stream is listed with its various indicators
for buffering on Rx and Tx. In particular, txoff displays in parentheses
the current level of data prepared by the upper stream instance not yet
emitted by QUIC transport layer.

This value is only accessible after a subtract operation. However,
there was a typo which caused the result to be always 0. Fix this by
reusing the correct offsets in the calculation.

This should be backported up to 3.0.
2024-11-25 11:21:28 +01:00
William Lallemand
a7e5180c71 CI: github: improve the AWS-LC job
Like the WolfSSL job, improve the AWS-LC job by adding the socat command
so all SSL reg-tests can be run.
Also add gdb and output of corefiles.
2024-11-25 11:14:33 +01:00
William Lallemand
b0c2745ed0 CI: github: improve the Wolfssl job
Improve the WolfSSL job by adding the missing socat command.
Also add gdb and output corefiles like it's done on the VTest job.
2024-11-25 11:00:03 +01:00
Willy Tarreau
a3613d239b BUILD: init: use the more portable FD_CLOEXEC for /dev/null
In 3.1-dev10, commit 8dd4efe42f ("MAJOR: mworker: move master-worker
fork in init()"), the FD associated to /dev/null was made CLOEXEC
using O_CLOEXEC. Unfortunately this is not portable on older OSes,
doesn't build on Solaris for example, and was even reported as breaking
moderately old Linux OSes for other projects. Better not use it unless
absolutely certain it will work (currently we only use it for Linux
namespaces, which are optional), and use the conventional FD_CLOEXEC
instead.
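
A minimal sketch of the portable approach, assuming a hypothetical helper
named open_devnull_cloexec() (this is not HAProxy's actual code): the
descriptor is opened normally, then flagged with fcntl() instead of
passing O_CLOEXEC to open().

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens /dev/null and marks the descriptor
 * close-on-exec using fcntl(F_SETFD, FD_CLOEXEC), which is available on
 * older systems that lack the O_CLOEXEC open() flag.
 * Returns the file descriptor, or -1 on error.
 */
int open_devnull_cloexec(void)
{
	int fd, flags;

	fd = open("/dev/null", O_RDWR);
	if (fd < 0)
		return -1;

	flags = fcntl(fd, F_GETFD);
	if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The trade-off is a short window between open() and fcntl() during which
the descriptor is not yet close-on-exec, which only matters if another
thread may fork and exec in between.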

No backport is needed.
2024-11-25 08:46:29 +01:00
Willy Tarreau
f0548302bb BUILD: systemd: fix usage of reserved name "sun" in the address field
systemd.c doesn't build on Solaris / Illumos because it uses "sun" as
the field name in a structure, while "sun" is the name of the macro
used to detect Solaris:

  src/systemd.c: In function 'sd_notify':
  src/systemd.c:43:22: error: expected identifier or '(' before numeric constant
     struct sockaddr_un sun;
                        ^
  src/systemd.c:44:2: warning: no semicolon at end of struct or union
    } socket_addr = {
    ^

Admittedly, the OS could have instead defined "sun" to itself to avoid
this. Any other name will work, let's just use "ux" for the short form
of "unix".
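
As an illustration only, here is a sketch of an address container whose
member avoids the reserved name. The union unix_addr type and the
set_unix_path() helper are hypothetical and do not reproduce the
structure used in src/systemd.c.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical address storage for a UNIX socket. The member is named
 * "ux" because compilers on Solaris / Illumos predefine "sun" as a macro,
 * so a member declared as "struct sockaddr_un sun;" does not compile
 * there. Names such as sun_path are distinct tokens and are unaffected.
 */
union unix_addr {
	struct sockaddr sa;
	struct sockaddr_un ux;
};

/* Fills <addr> with <path>. Returns 0 on success, -1 if the path is too long. */
int set_unix_path(union unix_addr *addr, const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof(addr->ux.sun_path))
		return -1;

	memset(addr, 0, sizeof(*addr));
	addr->ux.sun_family = AF_UNIX;
	memcpy(addr->ux.sun_path, path, len + 1);
	return 0;
}
```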

The problem appeared in 3.0-dev with commit aa3632962f ("MEDIUM:
mworker: get rid of libsystemd"), though by then this file was only
built when USE_SYSTEMD was set, which was not the case for non-linux
platforms. However since 3.1-dev14 with commit 15845247db ("MEDIUM:
mworker: remove USE_SYSTEMD requirement for -Ws"), all platforms
now build this file.

No backport is needed even though it will not hurt to have it in 3.0
for completeness.
2024-11-25 08:09:09 +01:00
William Lallemand
a941c92c12 CI: github: add a WolfSSL job which tries the latest version
Like the AWS-LC job, add a CI job which looks for the latest WolfSSL
version and tries to build it.

The patch adds a function which determines the latest version of WolfSSL
from the github tag, and the yml which describes the job.
2024-11-22 17:40:34 +01:00
William Lallemand
16e44e70c8 Revert "CI: update to the latest WolfSSL version"
This reverts commit 03f57fcf94dae61906b56d10d1fb21f7afaae4fc.

Looks like the 5.7.4 version is broken with HAProxy, let's revert the CI
for now.
2024-11-22 16:24:23 +01:00
Willy Tarreau
450528b9f5 DOC: ot: mention planned deprecation of the OT filter
Miroslav mentioned below that he's currently working on an OpenTelemetry
replacement for the OpenTracing filter since OpenTracing itself is no
longer maintained nor supported:

  https://github.com/haproxy/haproxy/issues/2782#issuecomment-2493576327

Given that he aims for 3.2, let's already settle on an upcoming deprecation
of the filter for 3.3 with a removal for 3.5. This will leave time to finish
the development and permit users to switch smoothly. At this point no warning
is emitted (since the users have no alternative) but better mention this plan
in the doc to make them aware of future changes.
2024-11-22 16:11:51 +01:00
William Lallemand
03f57fcf94 CI: update to the latest WolfSSL version
Update the CI to the 5.7.4 WolfSSL version.
2024-11-22 16:05:32 +01:00
William Lallemand
0022962ecb CI: update to the latest AWS-LC version
Update the CI to the 1.39.0 AWS-LC version.
2024-11-22 16:03:28 +01:00
Frederic Lecaille
7472990f86 BUG/MINOR: quic: Avoid BUG_ON() on ->on_pkt_lost() BBR callback call
The per-packet delivery rate sample is applied to ack-eliciting packets only,
by calling the ->drs_on_transmit() BBR callback. So ->on_pkt_lost(), which
inspects the delivery rate sampling information during packet loss detection,
must not be called for non ack-eliciting packets. Otherwise it would read
uninitialized variables, with a high chance of triggering a BUG_ON().

As BBR is implemented in the current development version, there is
no need to backport this patch.
2024-11-22 15:51:29 +01:00
Willy Tarreau
b30639848e BUILD: activity/memprofile: fix a build warning in the posix_memalign handler
A "return NULL" statement was placed for error handling in the
posix_memalign() handler instead of an int errno value, by recent
commit 5ddc8b3ad4 ("MINOR: activity/memprofile: monitor non-portable
calls as well"). Surprisingly the warning only triggered on gcc-4.8.
Let's use ENOMEM instead. No backport needed.
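
For illustration, a sketch of a forwarding handler with the right return
type. The memalign_handler() function and its explicit "real" function
pointer are assumptions made for this example and do not reproduce the
memprofile code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* signature shared by posix_memalign() and its hypothetical handler */
typedef int (*memalign_fn)(void **ptr, size_t align, size_t size);

/* Hypothetical handler: forwards to <real> when it is available.
 * posix_memalign() reports errors through its int return value (an errno
 * code), so the error path must return ENOMEM and not NULL.
 */
int memalign_handler(memalign_fn real, void **ptr, size_t align, size_t size)
{
	if (!real)
		return ENOMEM;
	return real(ptr, align, size);
}
```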
2024-11-22 09:42:49 +01:00
Christopher Faulet
b150ae46dd BUG/MAJOR: mux-h1: Properly handle wrapping on obuf when dumping the first-line
The formatting of the first-line, for a request or a response, does not
properly handle the wrapping of the output buffer. This may lead to a data
corruption for the current response or eventually for the previous one.

Utility functions used to format the first-line of the request or the
response rely on the chunk API. So it is not expected to pass a buffer that
wraps. Unfortunately, because of a change performed during the 2.9 dev cycle,
the output buffer was directly used instead of a non-wrapping buffer created
from it with b_make() function. It is not an issue for the request because
its start-line is always the first block formatted in the output buffer. But
for the response, the output buffer may not be empty and may wrap. In that case,
the response start-line is dumped at a random position in the buffer,
corrupting data. AFAIK, it is only an issue if the HTTP request pipelining
is used.

To fix the issue, we now take care to create a non-wrapping buffer from the
output buffer.
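
The general idea can be sketched on a simplified ring buffer. The struct
ring type and both helpers below are hypothetical and much simpler than
HAProxy's buffer API: the writer only uses the contiguous room located
after the pending data, so a formatted line never spans the wrapping
point.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ring buffer: <data> bytes are pending, starting at <head> */
struct ring {
	char *area;
	size_t size;
	size_t head;
	size_t data;
};

/* Returns the room available after the pending data without wrapping */
size_t contig_room(const struct ring *b)
{
	size_t tail = (b->head + b->data) % b->size;

	if (b->data == b->size)
		return 0;                  /* buffer full */
	if (tail >= b->head)
		return b->size - tail;     /* free up to the end of the area */
	return b->head - tail;             /* data already wrapped */
}

/* Appends <line> only if it fits contiguously. Returns 0, or -1 if not. */
int append_line(struct ring *b, const char *line)
{
	size_t len = strlen(line);
	size_t tail = (b->head + b->data) % b->size;

	if (len > contig_room(b))
		return -1;
	memcpy(b->area + tail, line, len);
	b->data += len;
	return 0;
}
```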

This patch should fix issues #2779 and #2996. It must be backported as far as
2.9.
2024-11-22 08:48:53 +01:00
Willy Tarreau
c5d0342fa2 [RELEASE] Released version 3.1-dev14
Released version 3.1-dev14 with the following main changes :
    - MINOR: acl: export find_acl_default()
    - MINOR: sample: extend the "when" converter to support an ACL
    - MINOR: cfgparse: parse tune.{rcvbuf,sndbuf}.{client,server} as sizes
    - MINOR: cfgparse: parse tune.{rcvbuf,sndbuf}.{frontend,backend} as sizes
    - MINOR: cfgparse: parse tune.pipesize as a size
    - MINOR: cfgparse: parse tune.recv_enough as a size
    - MINOR: cfgparse: parse tune.bufsize as a size
    - MINOR: cfgparse: parse tune.bufsize.small as a size
    - REGTESTS: silence the "log format ignored" warnings
    - REGTESTS: silence warning "previous 'http-response' action is final"
    - REGTESTS: make the unit explicit for very short timeouts
    - REGTESTS: silence warnings about content-type being ignored
    - REGTESTS: remove a duplicate "option httpslog" in the defaults section
    - REGTESTS: silence warning "L6 sample fetches ignored" in cond_set_var
    - REGTESTS: add missing timeouts to 30 tests
    - REGTESTS: only use tune.ssl.default-dh-param when not using AWS-LC
    - REGTESTS: enable -dW on almost all tests to fail on warnings
    - MEDIUM: config: warn on unitless timeouts < 100 ms
    - MINOR: tools: make parse_size_err() support 32/64 bits
    - MINOR: ring: support unit suffixes in the size
    - MINOR: cfgparse-global: parse options to allow non std keywords in discovery mode
    - BUG/MINOR: mworker-prog: don't warn about deprecated section with expose-deprecated-directives
    - MINOR: cli: make "show env" accessible via master CLI without enabling debug
    - MINOR: config: show HAPROXY_BRANCH in "show env" output
    - MINOR: http-ana: Add option to keep query-string on a localtion-based redirect
    - MINOR: http-ana: Add support for "set-cookie-fmt" option to redirect rules
    - MINOR: agent-check: Be able to set absolute weight via an agent
    - MINOR: stream: Add an option to "show sess" command to dump the captured URI
    - DOC: config: A a space before ':' for {bs,fs}.aborted and {bs,fs}.rst_code
    - DOC: config: Fix a typo in "1.3.1. The Request line"
    - MINOR: http: Add support for HTTP 414/431 status codes
    - DEV: phash: Update 414 and 431 status codes to phash
    - MINIR: mux-h1: Return 414 or 431 when appropriate
    - BUG/MINOR: http_ana: Report -1 for %Tr for invalid response only
    - DOC: config: Slightly improve the %Tr documentation
    - DOC: config: Move wait_end in section about internal samples
    - DOC: config: Move fs.* and bs.* in section about L5 samples
    - MINOR: stats-file: add the filename in the warning
    - MEDIUM: stats-file: explicitely ignore comments starting by //
    - DOC: quic: rename max-window-size as with default prefix
    - MINOR: mux-quic: add missing values for show flags
    - MINOR: quic: simplify qc_prep_pkts() exit path
    - MINOR: quic: support a max number of built packet per send iteration
    - MINOR: quic: extend qc_send_mux() return type with a dedicated enum
    - MINOR: quic: define quic_pacing module
    - MINOR: quic/pacing: implement quic_pacer engine
    - MINOR: quic/pacing: support pacing emission on quic_conn layer
    - MINOR: quic/pacing: add burst support
    - MINOR: mux-quic: define a tx STREAM frame list member
    - MINOR: mux-quic: encapsulate QCC tasklet wakeup
    - MAJOR: mux-quic: support pacing emission
    - MINOR: quic: use dynamic cc_algo on bind_conf
    - MINOR: quic: extend quic-cc-algo optional parameters
    - MEDIUM: quic: define cubic-pacing congestion algorithm
    - MINOR: mux_quic/pacing: display pacing info on show quic
    - MEDIUM: stats-file: silently ignore be/fe mistmatch
    - REGTESTS: use -dW by default on every reg-tests
    - DOC: lua: fix yield-dependent methods expected contexts
    - DOC: sched: add missing scheduler API documentation for tasklet_wakeup_after()
    - DOC: sched: document the missing TASK_F_UEVT* flags
    - CLEANUP: tinfo: move sched_*_date/*_mono_time to the thread-local area
    - MINOR: stream: don't update s->lat_time when the wakeup date is not set
    - MINOR: tinfo/clock: turn sched_call_date to 64-bits
    - MINOR: sched: add TASK_F_WANTS_TIME to make the scheduler update the call date
    - MINOR: tools: add new macro DEFZERO to provide a default zero argument
    - MINOR: tasklet: make the low-level tasklet API take a flag
    - MINOR: tasklet: support an optional set of wakeup flags to tasklet_wakeup_on()
    - DOC: configuration: explain the rules regarding spaces in arguments
    - DOC: configuration: explain quotes and spaces in conditional blocks
    - DOC: configuration: wrap long line for "strstr()" conditional expression
    - BUG/MINOR: http-ana: Adjust the server status before the L7 retries
    - MINOR: http-fetch: Add an option to 'query" to get the QS with the '?'
    - BUG/MINOR: cfgparse-quic: fix renaming of max-window-size
    - MEDIUM: mworker: remove USE_SYSTEMD requirement for -Ws
    - CI: vtest: temporarily build from the sd-notify PR
    - MINOR: systemd: replace SOCK_CLOEXEC by fcntl call to FD_CLOEXEC
    - BUILD: makefile: make ERR apply to build options as well
    - MINOR: startup: set HAPROXY_LOCALPEER only once
    - DOC: configuration: update "Environment variables" chapter
    - DOC: config: indent the list of environment variables
    - OPTION: map/hlua: make core.set_map() lookup more efficient
    - REGTESTS: switch to -Ws for master-worker reg-tests
    - REGTESTS: disable temporarly mworker test on OSX
    - MINOR: quic: Add the congestion window initial value to QUIC path
    - MINOR: window_filter: Implement windowed filter (only max)
    - MINOR: quic: implement delivery rate sampling algorithm
    - MINOR: quic: implement BBR congestion control algorithm for QUIC
    - MINOR: quic: quic_cc modifications to support BBR
    - MINOR: quic: quic_loss modifications to support BBR
    - MINOR: quic: RX part modifications to support BBR
    - MINOR: quic: TX part modifications to support BBR.
    - MINOR: quic: add "bbr" new "quic-cc-algo" option
    - BUG/MEDIUM: mux-h2: Increase max number of headers when encoding HEADERS frames
    - BUG/MEDIUM: mux-h2: Check the number of headers in HEADERS frame after decoding
    - BUG/MEDIUM: h3: Properly limit the number of headers received
    - BUG/MEDIUM: h3: Increase max number of headers when sending headers
    - DOC: config: Improve documentation of tune.http.maxhdr directive
    - DOC: management: Clearly state "show errors" only reports malformed H1 messages
    - BUILD: makefile: build flags.c before haproxy to speed up the build
    - BUILD: makefile: reorder object files by build time
    - MINOR: config: Improve warnings on misplaced rules by adding an optional arg
    - CLEANUP: cfgparse: Add direction in functions name that warn on misplaced rules
    - MINOR: cfgparse: Emit a warning for misplaced "tcp-response content" rules
    - BUG/MINOR: cfgparse-quic: fix bbr initialization
    - MINOR: cfgparse-quic: activate pacing only via burst argument
    - MINOR: quic: Useless rate sample member initialization
    - BUG/MINOR: cfgparse-quic: fix warning for cc-aglo with 0 burst
    - MINOR: quic: support pacing for newreno and nocc
    - BUG/MINOR: quic: Missing application limitations tracking for BBR
    - MINOR: cfgparse-global: add cfg_parse_global_chroot
    - MINOR: cfgparse-global: add more checks for "chroot" argument
    - BUG/MINOR: startup: fix UAF when set the default for log_tag
    - MINOR: capabilities: rename program_name argument to progname
    - MINOR: startup: use global progname variable
    - MINOR: cfgparse-global: add cfg_parse_global_localpeer
    - BUG/MINOR: config: allow to check HAPROXY_LOCALPEER in config
    - BUG/MINOR: startup: init_early: remove obsolete comment
    - BUG/MEDIUM: debug: don't set the STUCK flag from debug_handler()
    - BUG/MEDIUM: wdt: fix the stuck detection for warnings
    - BUG/MINOR: activity/memprofile: reinitialize the free calls on DSO summary
    - MINOR: activity/memprofile: offer a function to unregister stale info
    - BUG/MEDIUM: pools/memprofile: always clean stale pool info on pool_destroy()
    - MINOR: activity: better report nil than ffff in unknown callers
    - CLEANUP: activity: better use a mask to tests freeing methods
    - MINOR: activity/memprofile: also monitor strdup() activity
    - MINOR: activity/memprofile: monitor non-portable calls as well
    - MINOR: activity: interrupt the show profile dump more often
    - MINOR: tools: resolve main() only once in resolve_sym_name()
    - MINOR: tools: add a new function "resolve_dso_name" to find a symbol's DSO
    - MINOR: activity/memprofile: use resolve_dso_name() for the DSO summary
    - REGTESTS: relax strerror matching to avoid a failure on libmusl
    - REGTESTS: don't rely on the base64 utility when openssl base64 is already used
2024-11-21 23:26:41 +01:00
Willy Tarreau
a89a2d8902 REGTESTS: don't rely on the base64 utility when openssl base64 is already used
Regtest ocsp_auto_update.vtc used to fail here on FreeBSD because the
base64 utility was not installed by default. Once installed it would
still fail because the utility doesn't support -w to wrap lines. Since
the regtest already relies on openssl base64 for a few commands, let's
just rely on it for the other ones. The only limitation is that openssl
freezes on lines longer than 1024 bytes, and doesn't seem to process more
than 255 chars at once, which might be the reason for using base64 -w 1000
in the first place (the script was probably tested like this). Instead
sed is efficient at wrapping long lines and does the job pretty well.
The output was fixed at 72 chars so that the output is also readable on
a terminal for debugging.
2024-11-21 21:10:09 +01:00
Willy Tarreau
a1ace74b7e REGTESTS: relax strerror matching to avoid a failure on libmusl
The regtest 4be_1srv_smtpchk_httpchk_layer47errors.vtc fails on musl
because it reports "Network unreachable" for -EUNREACH while the
check matches "Network is unreachable" as on other OSes. Let's just
replace " is" with ".*". It now works on both glibc and musl.
2024-11-21 20:26:46 +01:00
Willy Tarreau
ead0b0154b MINOR: activity/memprofile: use resolve_dso_name() for the DSO summary
Let's simplify the code by making use of this simpler and sometimes
more efficient variant.
2024-11-21 19:58:06 +01:00
Willy Tarreau
670507a66e MINOR: tools: add a new function "resolve_dso_name" to find a symbol's DSO
In the memprofile summary per DSO, we currently have to pay a high price
by calling dladdr() on each symbol when doing the summary per DSO at the
end, while we're not interested in these details, we just want the DSO
name which can be made cheaper to obtain, and easier to manipulate. So
let's create resolve_dso_name() to only extract minimal information from
an address. At the moment it still uses dladdr() though it avoids all the
extra expensive work, and will further be able to leverage the same
mechanism as "show libs" to instantly spot DSO from address ranges.
2024-11-21 19:58:06 +01:00
Willy Tarreau
a205a91bb3 MINOR: tools: resolve main() only once in resolve_sym_name()
resolve_sym_name() calls dladdr(main) for each symbol in order to compare
the first address with other symbols. But this is pointless and quite
expensive in outputs to "show profiling" for example. Let's just keep a
local copy and have a variable indicating if the resolution is needed/
in progress/done to save the value for subsequent calls.
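
A minimal sketch of this "resolve once" pattern, under stated
assumptions: expensive_lookup() is a stand-in for the dladdr() call,
get_main_base() is a hypothetical name, and the code is single-threaded
for simplicity.

```c
#include <stddef.h>

static int lookup_calls;  /* counts expensive lookups, for illustration */

/* Stand-in for the costly dladdr() resolution (hypothetical) */
static const void *expensive_lookup(const void *addr)
{
	lookup_calls++;
	return addr;
}

enum { RES_NEEDED = 0, RES_IN_PROGRESS, RES_DONE };

/* Resolves <main_addr> on the first call only and returns the saved
 * value on subsequent calls. Returns NULL while a resolution is still in
 * progress.
 */
const void *get_main_base(const void *main_addr)
{
	static int state = RES_NEEDED;
	static const void *cached;

	if (state == RES_NEEDED) {
		state = RES_IN_PROGRESS;
		cached = expensive_lookup(main_addr);
		state = RES_DONE;
	}
	return state == RES_DONE ? cached : NULL;
}
```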
2024-11-21 19:58:06 +01:00
Willy Tarreau
9a8b834435 MINOR: activity: interrupt the show profile dump more often
The calls to resolve_sym_name() can be a bit expensive. Forcing to
yield more often is better for the latency and will avoid the
watchdog reporting warnings.

Note that it's still called in the sort at the end, but that one
cannot be avoided. At best we could try to rely on the list of libs
but that's not trivial and not always present.
2024-11-21 19:58:06 +01:00
Willy Tarreau
5ddc8b3ad4 MINOR: activity/memprofile: monitor non-portable calls as well
Some dependencies might very well rely on posix_memalign(), strndup()
or other less portable calls, making us miss them when chasing memory
leaks, resulting in negative global allocation counters. Let's provide
the handlers for the following functions:

  strndup()        // _POSIX_C_SOURCE >= 200809L || glibc >= 2.10
  valloc()         // _BSD_SOURCE || _XOPEN_SOURCE>=500 || glibc >= 2.12
  aligned_alloc()  // _ISOC11_SOURCE
  posix_memalign() // _POSIX_C_SOURCE >= 200112L
  memalign()       // obsolete
  pvalloc()        // obsolete

This time we don't fail if they're not found, we just silently forward
the calls.
2024-11-21 19:58:06 +01:00
Willy Tarreau
33c0ce299d MINOR: activity/memprofile: also monitor strdup() activity
Some memory profiling outputs have shown negative counters, very likely
due to some libs calling strdup(). Let's add it to the list of monitored
activities.

Actually even haproxy itself uses some. Having "profiling.memory on" in
the config reveals 35 call places.
2024-11-21 19:58:06 +01:00
Willy Tarreau
623a2c4e19 CLEANUP: activity: better use a mask to tests freeing methods
In "show profiling memory", we need to distinguish methods which really
free memory from those which do not so that we don't account for the
free value twice. However for now it's done using multiple tests, which
are going to complicate the addition of new methods. Let's switch to a
bit field defined as a mask in a single place instead, as we don't
intend to use more than 32/64 methods!
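
A sketch of the mask approach, with hypothetical method names and values
(the real enum in the activity code may differ): the set of methods that
release memory is described once, and a single bit test replaces the
chain of comparisons.

```c
/* Hypothetical list of monitored methods */
enum meth {
	METH_UNKNOWN = 0,
	METH_MALLOC,
	METH_CALLOC,
	METH_REALLOC,
	METH_STRDUP,
	METH_FREE,
	METH_COUNT /* must remain below the number of bits in the mask */
};

/* Methods which really free memory, defined in a single place */
#define METH_FREE_MASK ((1U << METH_REALLOC) | (1U << METH_FREE))

/* Returns non-zero if method <m> may release memory */
int method_frees(enum meth m)
{
	return !!(METH_FREE_MASK & (1U << m));
}
```

Adding a new freeing method then only requires one more bit in
METH_FREE_MASK.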
2024-11-21 19:58:06 +01:00
Willy Tarreau
f3547d0b74 MINOR: activity: better report nil than ffff in unknown callers
For unknown callers we try to get the lowest known address and we
purposely ignore NULL during calculation of the min. But the side
effect is that we also report ffff in the per-DSO address. Better
catch this case and finally accept to report nil. Before it would
report this:

  $ socat - /tmp/sock1 <<< "show profiling memory" |grep nil
        50000          10        9600000           9440|            (nil) [other] unknown(192) [delta=9590560] [pool=http_txn]
        50000          10        9600000           9440|            (nil) DSO:other; delta_calls=49990; delta_bytes=9590560

now it reports this:

  $ socat - /tmp/sock1 <<< "show profiling memory" |grep nil
        50000          11        9600000           9656|            (nil) [other] unknown(192) [delta=9590344] [pool=connection]
        50000          11        9600000           9656|            (nil) DSO:other; delta_calls=49989; delta_bytes=9590344
2024-11-21 19:58:06 +01:00
Willy Tarreau
ed3ed35867 BUG/MEDIUM: pools/memprofile: always clean stale pool info on pool_destroy()
There's actually a problem with memprofiles: the pool pointer is stored
in ->info but some pools are replaced during startup, such as the trash
pool, leaving a dangling pointer there, that may randomly report crap or
even crash during "show profiling memory".

Let's make pool_destroy() call memprof_remove_stale_info() added
by previous patch so that these entries are properly unregistered.

This must be backported along with the previous patch (MINOR:
activity/memprofile: offer a function to unregister stale info) as
far as 2.8.
2024-11-21 19:58:06 +01:00
Willy Tarreau
859341c1ec MINOR: activity/memprofile: offer a function to unregister stale info
There's actually a problem with memprofiles: the pool pointer is stored
in ->info but some pools are replaced during startup, such as the trash
pool, leaving a dangling pointer there.

Let's complete the API with a new function memprof_remove_stale_info()
that will remove all stale references to this info pointer. It's also
present when USE_MEMORY_PROFILING is not set so as to ease the job on
callers.
2024-11-21 19:58:06 +01:00
Willy Tarreau
c42a2b8c94 BUG/MINOR: activity/memprofile: reinitialize the free calls on DSO summary
In commit 401fb0e87a ("MINOR: activity/memprofile: show per-DSO stats")
we added a summary per DSO. However the free calls/tot were not initialized
when creating a new entry because initially they were applied to any entry,
but since we don't update free calls for non-free capable callers, we still
need to reinitialize these entries when reassigning one. Because of this
bug, a "show profiling memory" output can randomly show highly negative
values on the DSO lines if it turns out that the DSO entry was created on
an alloc instead of a realloc/free.

Since the commit above was backported to 2.9, this one must go there as
well.
2024-11-21 19:58:05 +01:00
Willy Tarreau
24ce001771 BUG/MEDIUM: wdt: fix the stuck detection for warnings
If two slow tasks trigger one warning even a few seconds apart, the
watchdog code will mistakenly take this for a definite stuck task and
kill the process. The reason is that since commit 148eb5875f ("DEBUG:
wdt: better detect apparently locked up threads and warn about them")
the updated ctxsw count is not the correct one, instead of updating
the private counter it resets the public one, preventing it from making
progress and making the wdt believe that no progress was made. In
addition the initial value was read from [tid] instead of [thr].

Please note that another fix is needed in debug_handler() otherwise the
watchdog will fire early after the first warning or thread dump.

A simple test for this is to issue several of these commands back-to-back
on the CLI, which crashes an unfixed 3.1 very quickly:

  $ socat /tmp/sock1 - <<< "expert-mode on; debug dev loop 1000"

This needs to be backported to 2.9 since the fix above was backported
there. The impact on 3.0 and 2.9 is almost nonexistent since the watchdog
there doesn't apply the shorter warning delay, so the first call already
indicates that the thread is stuck.
2024-11-21 19:58:05 +01:00
Willy Tarreau
1151fe6818 BUG/MEDIUM: debug: don't set the STUCK flag from debug_handler()
Since 2.0 with commit e6a02fa65a ("MINOR: threads: add a "stuck" flag
to the thread_info struct"), the TH_FL_STUCK flag was set by the
debugger to flag that a thread was stuck and report it in the output.

However, two commits later (2bfefdbaef "MAJOR: watchdog: implement a
thread lockup detection mechanism"), this flag was used to detect that
a thread had already been reported as stuck. The problem is that it
seldom happens that a "show threads" command instantly crashes because
it calls debug_handler(), which sets the flag, and if the watchdog timer
was about to trigger before going back to the scheduler, the watchdog
believes that the thread has been stuck for a while and will kill the
process.

The issue was magnified in 3.1 with the lower-delay warning, because
it's possible for a thread to die on the next wakeup after the first
warning (which calls debug_handler() hence sets the STUCK flag).

One good approach would have been to use two distinct flags, one for
"stuck" as reported by the debug handler, and one for "stuck" as seen
by the watchdog. However, one could also argue that since the second
commit, given that the wdt monitors the threads, there's no point any
more for the debug handler to set the flag itself. Removing this code
means that two consecutive "show threads" will not report "stuck" until
the watchdog sets it, which aligns better with expectations.

This can be backported to all stable releases. This code has changed a
bit over time, the "if" block and the harmless variables just need to
be removed.
2024-11-21 19:58:05 +01:00
Valentine Krasnobaeva
332839eb9d BUG/MINOR: startup: init_early: remove obsolete comment
This fixes the commit d6ccd1738bae
("MINOR: startup: set HAPROXY_LOCALPEER only once").

Comment "/* preset some environment variables */" is now useless here as
HAPROXY_LOCALPEER is set later during the initialization stage and only once.

This should not be backported, as related to the latest master-worker
refactoring.
2024-11-21 19:55:21 +01:00
Valentine Krasnobaeva
aa88d6ee37 BUG/MINOR: config: allow to check HAPROXY_LOCALPEER in config
This fixes the commit d6ccd1738bae
("MINOR: startup: set HAPROXY_LOCALPEER only once"). HAPROXY_LOCALPEER could
be checked in the configuration to set some servers settings or listeners. So,
we need to set it just before we read the configuration for the second time.

Let's mark HAPROXY_LOCALPEER as "usable" in the configuration in the related
documentation chapter.

This should not be backported, as related to the latest master-worker
refactoring.
2024-11-21 19:55:21 +01:00
Valentine Krasnobaeva
d253f30823 MINOR: cfgparse-global: add cfg_parse_global_localpeer
This commit prepares the parsing of localpeer keyword in MODE_DISCOVERY. We
need this, as HAPROXY_LOCALPEER environment variable could be checked in the
configuration in order to enable some backend or frontend settings.

So, let's first add a dedicated parser for localpeer. Second, we no
longer need to check whether cfg_peers is a valid pointer, as in MODE_DISCOVERY
we parse only the "global" section.

In addition, let's make the code of localpeer parser a little bit more
readable.
2024-11-21 19:55:21 +01:00
Valentine Krasnobaeva
bfe0f9d02d MINOR: startup: use global progname variable
Let's store progname in the global variable, as it is handy to use it in
different parts of code to format messages sent to stdout.

This reduces the number of arguments that we need to pass to some functions.
2024-11-21 19:55:21 +01:00
Valentine Krasnobaeva
ef154a49e1 MINOR: capabilities: rename program_name argument to progname
This commit prepares the usage of the global progname variable.
prepare_caps_from_permitted_set() uses the progname value in warning messages. So,
let's rename program_name argument to progname.
2024-11-21 19:55:21 +01:00
Valentine Krasnobaeva
351ae5dbed BUG/MINOR: startup: fix UAF when set the default for log_tag
In the init_early() global.log_tag is initialized to the string from progname
pointer and global.log_tag.area points to this pointer.

If log-tag keyword is provided in the configuration, its parser at first frees
global.log_tag.area and then it does a new memory allocation to copy
there the argument of log-tag. So, progname no longer points to the valid
memory.

To fix this, let's always keep progname and global.log_tag.area at separate
memory areas. If log_tag is redefined in the configuration, its parser will
free the memory allocated for the default value in chunk_destroy(). Memory
allocated for progname will be freed in deinit().
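
The ownership rule can be sketched as follows. The struct tag type and
the init_names() and set_log_tag() helpers are hypothetical
simplifications, not the actual startup code: each name gets its own
allocation, so freeing the default tag never invalidates progname.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified chunk */
struct tag {
	char *area;
	size_t len;
};

static char *progname;
static struct tag log_tag;

/* Gives progname and the default log tag two separate copies of <name> */
int init_names(const char *name)
{
	progname = strdup(name);
	log_tag.area = strdup(name); /* separate copy, not an alias */
	if (!progname || !log_tag.area) {
		free(progname);
		free(log_tag.area);
		progname = log_tag.area = NULL;
		return -1;
	}
	log_tag.len = strlen(name);
	return 0;
}

/* Replaces the tag: only the tag's own allocation is released */
int set_log_tag(const char *value)
{
	char *copy = strdup(value);

	if (!copy)
		return -1;
	free(log_tag.area);
	log_tag.area = copy;
	log_tag.len = strlen(copy);
	return 0;
}
```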

This should not be backported as related to the latest master-worker
refactoring.
2024-11-21 19:55:21 +01:00
Valentine Krasnobaeva
d1c3cd8974 MINOR: cfgparse-global: add more checks for "chroot" argument
If the directory provided as a "chroot" keyword argument does not exist or is
inaccessible, this is reported only at the latest initialization stage, when
haproxy tries to perform chroot. Sometimes it's not very convenient, as the
process is already bound to listen sockets.

This was done explicitly in order not to break the case, when haproxy is
launched with "-c" option in some specific environment, where it's not possible
to create or to modify chroot directory, provided in the configuration.

So, let's add more checks for the "chroot" directory during the parsing
stage and let's show diagnostic warnings if this directory has become
inaccessible or was deleted. This way, users who want to catch errors
related to a misconfigured chroot before starting the process can launch haproxy
with -dW and -dD. Zero-warning mode will stop the process with an error if any
warning was emitted during the initialization stage.
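
A sketch of the kind of early check described here, assuming a
hypothetical check_chroot_dir() helper with its own return codes. The
actual parser may test different conditions and only emits diagnostic
warnings.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes */
enum chroot_check {
	CHROOT_OK = 0,
	CHROOT_MISSING,      /* stat() failed, e.g. the directory was deleted */
	CHROOT_NOT_A_DIR,    /* the path exists but is not a directory */
	CHROOT_INACCESSIBLE  /* the directory cannot be entered */
};

/* Checks at parsing time whether <path> looks usable for chroot() */
enum chroot_check check_chroot_dir(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return CHROOT_MISSING;
	if (!S_ISDIR(st.st_mode))
		return CHROOT_NOT_A_DIR;
	if (access(path, X_OK) != 0)
		return CHROOT_INACCESSIBLE;
	return CHROOT_OK;
}
```

Such a check is only advisory: the directory can still change between
parsing and the actual chroot() call, which is why a warning is more
appropriate than a hard error.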
2024-11-21 19:55:21 +01:00
Valentine Krasnobaeva
c853502cc6 MINOR: cfgparse-global: add cfg_parse_global_chroot
Let's add a dedicated parser for "chroot" keyword, as we add some more checks
for its argument in the next commit.

This reduces the size of cfg_parse_global().
2024-11-21 19:55:21 +01:00
Frederic Lecaille
01fcbd6c08 BUG/MINOR: quic: Missing application limitations tracking for BBR
The purpose of the ->app_limited member of the delivery rate struct
(quic_cc_drs) is to store the index of the last transmitted byte marked as
application-limited, in order to track the application-limited phases. During these phases,
BBR must ignore delivery rate samples to properly estimate the delivery rate.

Without such a patch, the Startup phase could be exited very quickly with
a very low estimated bottleneck bandwidth. This had a very bad impact
on small objects with download times shorter than the expected Startup phase
duration. For such objects, with enough bandwidth, BBR should stay in the Startup
state.

No need to backport, as BBR is implemented in the current development version.
2024-11-21 19:23:53 +01:00
Amaury Denoyelle
95d3edd68f MINOR: quic: support pacing for newreno and nocc
Extend extra pacing support for newreno and nocc congestion algorithms,
as with cubic.

For better extensibility of cc algo definition, define a new flags field
in quic_cc_algo structure. For now, the only value is
QUIC_CC_ALGO_FL_OPT_PACING which is set if pacing support can be
optionally activated. Cubic, newreno and nocc all support this now.

This new flag is then reused by QUIC config parser. If set, extra
quic-cc-algo burst parameter is taken into account. If positive, this
will activate pacing support on top of the congestion algorithm. As with
cubic previously, pacing is only supported if running under experimental
mode.

Only BBR is not flagged with this new value as pacing is directly
built into the algorithm and cannot be turned off. Furthermore, BBR
calculates automatically its value for maximum burst. As such, any
quic-cc-algo burst argument used with BBR is still ignored with a
warning.
2024-11-21 11:33:44 +01:00
Amaury Denoyelle
99497d23b5 BUG/MINOR: cfgparse-quic: fix warning for cc-aglo with 0 burst
Optional burst argument for quic-cc-algo is used to toggle pacing
support on top of cubic. This is the case if it is positive.

The default value is 0, which does not activate pacing. However, in this
case, an incorrect warning is reported about the parameter being
ignored. Fix this by removing the warning in this case.

No need to backport.
2024-11-21 11:26:36 +01:00
Frederic Lecaille
ea17de01ac MINOR: quic: Useless rate sample member initialization
This poor/inefficient code was revealed by the coverity report in GH issue
#2788, where some quic_cc_rs struct member initializations were mentioned as
overwritten (after initialization) before being used, as follows:

CID 1565821:  Code maintainability issues  (UNUSED_VALUE)
/src/quic_cc_bbr.c: 1373 in bbr_handle_lost_packet()
1367     }
1368
1369     static void bbr_handle_lost_packet(struct bbr *bbr, struct quic_cc_path *p,
1370                                        struct quic_tx_packet *pkt,
1371                                        uint32_t lost)
1372     {
>>>     CID 1565821:  Code maintainability issues  (UNUSED_VALUE)
>>>     Assigning value "0UL" to "rs.tx_in_flight" here, but that stored value is overwritten before it can be used.
1373            struct quic_cc_rs rs = {0};
1374
1375            /* C.delivered = bbr->drs.delivered */
1376            bbr_note_loss(bbr, bbr->drs.delivered);
1377            if (!bbr->bw_probe_samples)
1378                    return; /* not a packet sent while probing bandwidth */

Remove the {0} initializer for the <rs> variable. This is safe because the
members of the <rs> local variable are all initialized before being used by
the functions called from bbr_handle_lost_packet(). Add a comment to mention
this.
2024-11-21 11:01:53 +01:00
Amaury Denoyelle
de86fd1e6c MINOR: cfgparse-quic: activate pacing only via burst argument
Recently, pacing support was added for cubic congestion algorithm. This
was activated by using the new token "cubic-pacing" on quic-cc-algo.
Furthermore, it was possible to define a burst size with a new
parameter after the congestion token, between parentheses.

This configuration is not obvious to users. In particular, it makes it
easy to forget to tweak the burst size, which can dramatically
impact performance.

Simplify this by removing the extra "-pacing" suffix. Now, pacing will
be activated solely based on the burst parameter. If 0, burst is
considered as infinite and no pacing will be used. Pacing will be
activating for any positive burst. This better reflects the link between
pacing and burst and its importance.

Note that for the moment, if burst is specified, it will be ignored with
a warning for algorithm outside of cubic.

This is not a breaking change as pacing support was implemented in the
current dev version.
2024-11-21 10:55:55 +01:00
Amaury Denoyelle
7b23c9075c BUG/MINOR: cfgparse-quic: fix bbr initialization
To support pacing with cubic, a recent change was introduced to render
quic_cc_algo on bind line dynamically allocated, instead of pointing to
a globally defined variable. This allows customization of the algorithm
callbacks per bind line.

This was not correctly used for BBR as it was set to point to the global
quic_cc_algo_bbr. This causes a segfault on haproxy process closing. Fix
this by properly initializing BBR as other algorithms.

This should fix coverity report from github issue #2786.
2024-11-21 10:49:16 +01:00
Christopher Faulet
e58a30d369 MINOR: cfgparse: Emit a warning for misplaced "tcp-response content" rules
When a "tcp-response content" rule is placed after a "http-response" rule, a
warning is now emitted, just like for rules applied on the requests.
2024-11-21 09:55:04 +01:00
Christopher Faulet
5dcd3b0d99 CLEANUP: cfgparse: Add direction in functions name that warn on misplaced rules
This only concerns functions emitting warnings about misplaced tcp-request
rules. The direction is now specified in the functions name. For instance
"warnif_misplaced_tcp_conn" is replaced by "warnif_misplaced_tcp_req_conn".
2024-11-21 09:51:37 +01:00
Christopher Faulet
7710580428 MINOR: config: Improve warnings on misplaced rules by adding an optional arg
In warnings about misplaced rules, only the first keyword is mentioned. It
works well for http-request or quic-initial rules for instance. But it is a
bit confusing for tcp-request rules, because the layer is missing (session
or content).

To make it a bit more systematic (and generic), the second argument can now be
provided. It can be set to NULL if there is no layer or scope. But
otherwise, it may be specified and it will be reported in the warning.

So the following snippet:

    tcp-request content reject if FALSE
    tcp-request session reject if FALSE
    tcp-request connection reject if FALSE

Will now emit the following warnings:

  a 'tcp-request session' rule placed after a 'tcp-request content' rule will still be processed before.
  a 'tcp-request connection' rule placed after a 'tcp-request session' rule will still be processed before.

This patch should fix the issue #2596.
2024-11-21 09:28:42 +01:00
Willy Tarreau
c329bfe3f5 BUILD: makefile: reorder object files by build time
mux_spop is quite long to build and was at the end. The rest did not
change much, but the build time is now dominated by hlua.o and mux_h2.o
and by a large margin. On the 80-core ARM mux_h2.o is present from
beginning to end and on the PC it's hlua.o, so both might have to be
split at some point to benefit from multi-core.

Nevertheless, the changes allowed to shrink about one second out of
the 18 it was taking on that machine.
2024-11-20 18:49:56 +01:00
Willy Tarreau
f16edcd34c BUILD: makefile: build flags.c before haproxy to speed up the build
The end of the build is often super slow. In practice it's flags.o that
now takes ages (3.4 seconds) and blocks everything on a single core at
the end. Let's declare it before the haproxy target so that it starts
earlier. On a quad-2.2 GHz CPU, the build time goes down from 44 to 42s
and the end feels less painful.
2024-11-20 18:49:56 +01:00
Christopher Faulet
667ac8acc6 DOC: management: Clearly state "show errors" only reports malformed H1 messages
For now, only the H1 multiplexer is able to capture malformed messages. So
it is better to update the management guide accordingly to avoid any
confusion.
2024-11-20 18:08:17 +01:00
Christopher Faulet
e863d8d681 DOC: config: Improve documentation of tune.http.maxhdr directive
The description was improved to clearly mention that it is applied to received
requests and responses. In addition, a comment was added about the HTTP/2 and
HTTP/3 limitation when messages are encoded to be sent.
2024-11-20 18:02:36 +01:00
Christopher Faulet
3bd9a9e7d7 BUG/MEDIUM: h3: Increase max number of headers when sending headers
In the same way as for H2, the maximum number of headers that can be
encoded when headers are sent must be increased to match the limit imposed
when they are received.

The reasons are the same. On the receive path, the maximum number of headers
accepted must be higher than the configured limit to be able to handle
pseudo headers and cookie headers. On the sending path, the same limit must
be applied because the pseudo headers will consume some extra slots and the
cookie header could be split.

This patch should be backported as far as 2.6.
2024-11-20 17:44:22 +01:00
Christopher Faulet
785e633353 BUG/MEDIUM: h3: Properly limit the number of headers received
The number of headers is limited before the decoding but pseudo headers and
cookie headers consume extra slots. In practice, this lowers the maximum number
of headers that can be received.

To work around this issue, the limit is doubled during the frame decoding to be
sure to have enough extra slots. And the number of headers is tested against the
configured limit after the HTX message was created to be able to report an
error. Unfortunately no parsing error is reported because the QUIC multiplexer
is not able to do so for now.

The same is performed on trailers to be consistent with H2.

This patch should be backported as far as 2.6.
2024-11-20 17:44:22 +01:00
Christopher Faulet
63d2760dfa BUG/MEDIUM: mux-h2: Check the number of headers in HEADERS frame after decoding
There is no explicit test on the number of headers when a HEADERS frame is
received. It is implicitly limited by the size of the header list. But that
size is twice the configured limit, to be sure to decode the frame.

So now, a check is performed after the HTX message was created. This way, we
are sure to not exceed the configured limit after the decoding stage. If
there are too many headers, a parsing error is reported.

Note the same is performed on the trailers.

This patch should partially address the issue #2685. It should be backported
to all stable versions.
2024-11-20 17:44:22 +01:00
Christopher Faulet
e415e3cb7a BUG/MEDIUM: mux-h2: Increase max number of headers when encoding HEADERS frames
When a HEADERS frame is encoded to be sent, the maximum number of headers
allowed in the frame is lower than on the receiving path. This can lead to
a sending error being reported while the message was accepted. It could be
confusing.

In addition, the start-line is split into pseudo-headers and this way
consumes some header slots, increasing the difference between HEADERS frames
encoding and decoding. It is even more noticeable because when a HEADERS
frame is decoded, a margin is used to be able to handle split cookie
headers. Concretely, on the decoding path, a limit of twice the maximum number
of headers allowed in a message (tune.http.maxhdr * 2) is used. On the encoding
path, the exact limit is used. It is not consistent.

Note that when a frame is decoded, we must use a larger limit because the
pseudo headers are reassembled in the start-line and must count for one. But
also because, most of the time, the cookies are split into several headers
and are reassembled too.

To fix the issue, the same ratio is applied on the sending path. A limit must
be defined because a dynamic allocation is not acceptable. Twice the
configured limit should be good enough to support headers manipulation.

This patch should be backported to all stable versions.
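
To make the slot arithmetic concrete, here is a simplified sketch, not the
mux-h2 code: all sketch_* names are hypothetical, pseudo-headers are
recognized only by their leading ':' and only "cookie" is merged. It shows
why the raw header list needs room for up to twice the configured limit
while the message itself is still checked against the exact limit.

```c
#include <stddef.h>
#include <string.h>

/* Number of header slots left once the wire-level list has been turned into
 * a message: pseudo-headers are folded into the start-line and repeated
 * "cookie" fields are merged into a single header.
 */
static size_t sketch_count_msg_headers(const char *const *names, size_t n)
{
	size_t count = 0;
	int seen_cookie = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (names[i][0] == ':')
			continue;               /* part of the start-line */
		if (strcmp(names[i], "cookie") == 0) {
			if (seen_cookie)
				continue;       /* merged with the first cookie */
			seen_cookie = 1;
		}
		count++;
	}
	return count;
}

/* Returns 0 if the list is acceptable for a configured limit <maxhdr>,
 * -1 otherwise. The raw list may hold up to 2 * maxhdr entries, but the
 * resulting message must not exceed maxhdr headers.
 */
static int sketch_check_headers(const char *const *names, size_t n, size_t maxhdr)
{
	if (n > 2 * maxhdr)
		return -1;                      /* does not fit in the list */
	if (sketch_count_msg_headers(names, n) > maxhdr)
		return -1;                      /* too many real headers */
	return 0;
}
```

With three pseudo-headers, one "host" and three "cookie" fields, the raw list
holds 7 entries but the message only carries 2 headers, so a limit of 4 must
accept it even though 7 is greater than 4.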
2024-11-20 17:44:22 +01:00
Frederic Lecaille
349954601f MINOR: quic: add "bbr" new "quic-cc-algo" option
Add this new "bbr" option to the list of the congestion control algorithms which
may be set by "quic-cc-algo" setting.

This new algorithm is considered as experimental and may be enabled only if
"expose-experimental-directive" is set.

Also update the documentation for this new setting.
2024-11-20 17:34:22 +01:00
Frederic Lecaille
e778b9a2b6 MINOR: quic: TX part modifications to support BBR.
Very few modifications: call the ->on_transmit() and ->drs_on_transmit()
congestion control algorithm (quic_cc) callbacks from qc_send_ppkts() just
after having sent some packets.
2024-11-20 17:34:22 +01:00
Frederic Lecaille
44af88d856 MINOR: quic: RX part modifications to support BBR
The purpose of qc_notify_cc_of_newly_acked_pkts() is to notify the congestion
algorithm of all the packet acknowledgements. It must call
quic_cc_drs_update_rate_sample() to update the delivery rate sampling
information. It must also call quic_cc_drs_on_ack_recv() to update the state of
the delivery rate sampling part used by BBR.
Finally, ->on_ack_rcvd() is called with <bytes_delivered>, the total number of
bytes delivered by the sender from the newly acknowledged packets, as
parameter. <pkt_delivered> stores the per-packet number of bytes delivered by
the most recently sent of the newly acknowledged packets (the packet with the
highest packet number). <bytes_lost> is also used and has been set by
qc_packet_loss_lookup() before calling qc_notify_cc_of_newly_acked_pkts().
2024-11-20 17:34:22 +01:00
Frederic Lecaille
d85eb127e9 MINOR: quic: quic_loss modifications to support BBR
The purpose of qc_packet_loss_lookup() is to detect packet losses. It is this
function which must call the BBR-specific ->on_pkt_lost() callback. It also
sets the <bytes_lost> parameter, for its caller, to the total number of bytes
detected as lost upon receipt of an ACK frame.
Modify qc_release_lost_pkts() to call ->congestion_event() with the send time
from the newest packet detected as lost.
Modify qc_release_lost_pkts() to call the ->slow_start() callback only if it is
defined by the congestion control algorithm. This is not the case for BBR.
2024-11-20 17:34:22 +01:00
Frederic Lecaille
af75665cb7 MINOR: quic: quic_cc modifications to support BBR
Add several callbacks to the quic_cc_algo struct which are only called by BBR.
->get_drs() may be used to retrieve the delivery rate sampling information
from a congestion algorithm struct (quic_cc).
->on_transmit() must be called by a QUIC sender before sending any packet.
->on_ack_rcvd() must be called after having received an ACK.
->on_pkt_lost() must be called after having detected a packet loss.
->congestion_event() must be called after any congestion event detection.
Modify quic_cc.c to call ->event only if defined. This is not the case
for BBR.
2024-11-20 17:34:22 +01:00
Frederic Lecaille
d04adf44dc MINOR: quic: implement BBR congestion control algorithm for QUIC
Implement the version 3 of BBR for QUIC specified by the IETF in this draft:

https://datatracker.ietf.org/doc/draft-ietf-ccwg-bbr/

Here is an extract from the Abstract part to sum up the capabilities of BBR:

BBR ("Bottleneck Bandwidth and Round-trip propagation time") uses recent
measurements of a transport connection's delivery rate, round-trip time, and
packet loss rate to build an explicit model of the network path. BBR then uses
this model to control both how fast it sends data and the maximum volume of data
it allows in flight in the network at any time. Relative to loss-based congestion
control algorithms such as Reno [RFC5681] or CUBIC [RFC9438], BBR offers
substantially higher throughput for bottlenecks with shallow buffers or random
losses, and substantially lower queueing delays for bottlenecks with deep buffers
(avoiding "bufferbloat"). BBR can be implemented in any transport protocol that
supports packet-delivery acknowledgment. Thus far, open source implementations
are available for TCP [RFC9293] and QUIC [RFC9000].

In haproxy, this implementation is considered as still experimental. It depends
on the newly implemented pacing feature.

BBR was asked in GH #2516 by @KazuyaKanemura, @osevan and @kennyZ96.
2024-11-20 17:34:22 +01:00
Frederic Lecaille
472d575950 MINOR: quic: implement delivery rate sampling algorithm
This patch implements an algorithm which may be used by congestion algorithms
for QUIC to estimate the current delivery rate of a sender. It is at least used
by BBR and could be used by other congestion algorithms such as cubic.

This algorithm was specified by an RFC draft here:
https://datatracker.ietf.org/doc/html/draft-cheng-iccrg-delivery-rate-estimation
before being merged into BBR v3 here:
https://datatracker.ietf.org/doc/html/draft-cardwell-ccwg-bbr#section-4.5.2.2
2024-11-20 17:34:22 +01:00
Frederic Lecaille
c08b877657 MINOR: window_filter: Implement windowed filter (only max)
Implement Kathleen Nichols' algorithm, used by several congestion control
algorithm implementations (TCP/BBR in Linux kernel, QUIC/BBR in quiche) to track
the maximum value of a data type during a fixed time interval.
In this implementation, counters which are periodically reset are used in place
of timestamps.
Only the max part has been implemented.
(see the lib/minmax.c implementation in the Linux kernel).
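
For readers unfamiliar with this filter, the following is a minimal sketch of
a three-sample windowed max, loosely following the structure of the kernel
implementation mentioned above. The wf_* names are hypothetical and this is
not the code added by this patch; <t> is simply a caller-provided counter
value, standing in for the counters this commit uses instead of timestamps.

```c
#include <stdint.h>

/* One retained sample: the counter value when it was taken, and the value. */
struct wf_sample {
	uint32_t t;
	uint32_t v;
};

/* s[0] is the current max over the window, s[1] and s[2] are the second and
 * third best candidates, each at least as recent as the previous one. */
struct wf {
	struct wf_sample s[3];
};

/* Forget all earlier samples and restart from <v> taken at <t>. */
static uint32_t wf_reset(struct wf *w, uint32_t t, uint32_t v)
{
	struct wf_sample val = { t, v };

	w->s[0] = w->s[1] = w->s[2] = val;
	return w->s[0].v;
}

/* Feed a new measurement <v> taken at <t> and return the max over <win>. */
static uint32_t wf_running_max(struct wf *w, uint32_t win, uint32_t t, uint32_t v)
{
	struct wf_sample val = { t, v };
	uint32_t dt;

	/* new max, or nothing left in the window: start over */
	if (v >= w->s[0].v || t - w->s[2].t > win)
		return wf_reset(w, t, v);

	if (v >= w->s[1].v)
		w->s[2] = w->s[1] = val;
	else if (v >= w->s[2].v)
		w->s[2] = val;

	dt = t - w->s[0].t;
	if (dt > win) {
		/* the best sample expired: promote the other candidates */
		w->s[0] = w->s[1];
		w->s[1] = w->s[2];
		w->s[2] = val;
		if (t - w->s[0].t > win) {
			w->s[0] = w->s[1];
			w->s[1] = w->s[2];
			w->s[2] = val;
		}
	} else if (w->s[1].t == w->s[0].t && dt > win / 4) {
		/* a quarter of the window passed without a 2nd candidate */
		w->s[2] = w->s[1] = val;
	} else if (w->s[2].t == w->s[1].t && dt > win / 2) {
		/* half of the window passed without a 3rd candidate */
		w->s[2] = val;
	}
	return w->s[0].v;
}
```

Keeping three candidates lets the filter fall back to a recent runner-up when
the best sample ages out, instead of forgetting everything at once.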
2024-11-20 17:34:22 +01:00
Frederic Lecaille
7bbe8828ba MINOR: quic: Add the congestion window initial value to QUIC path
Add ->initial_wnd new member to quic_cc_path struct to keep the initial value
of the congestion window. This member is initialized as soon as a QUIC connection
is allocated. This modification is required for BBR congestion control algorithm.
2024-11-20 17:34:22 +01:00
William Lallemand
5ebecbe45b REGTESTS: disable temporarly mworker test on OSX
-Ws on VTest is not working correctly for an unknown reason, the polling
of the NOTIFY_SOCKET seems to timeout, and VTest never receives the
READY message.

This patch disables the reg-tests using -Ws on OS X.
2024-11-20 17:13:59 +01:00
William Lallemand
b7d81b3511 REGTESTS: switch to -Ws for master-worker reg-tests
The -W mode implemented in VTest is not reliable anymore, because VTest
waits for the pidfile to be created. But with the new master-worker
mode, this file is created long before haproxy is ready. This can lead
to the test being started too soon, and failing from time to time.

The -Ws option makes it possible to wait for haproxy to deliver a message to VTest
once it is ready.
2024-11-20 17:13:59 +01:00
Aurelien DARRAGON
2ce0db4e4b OPTION: map/hlua: make core.set_map() lookup more efficient
0844bed7d3 ("MEDIUM: map/acl: Improve pat_ref_set() efficiency (for
"set-map", "add-acl" action perfs)") improved lookup efficiency for
set-map http action, but the core.set_map() lua method which is built
on the same construct was overlooked. Let's also benefit from this optim
as it easily applies.
2024-11-20 16:14:13 +01:00
Willy Tarreau
311dc748b0 DOC: config: indent the list of environment variables
In the doc our lists are indented but for some reason this one was not,
making it harder to visually delimit. Let's just indent it. No need to
backport this, it's totally cosmetic and would need adaptations since
it was recently touched.
2024-11-20 15:57:09 +01:00
Valentine Krasnobaeva
41d906d69b DOC: configuration: update "Environment variables" chapter
There are some variables which are set by the HAProxy process (HAPROXY_*). Some
of them are handy to check or to redefine in the configuration, in order to
create conditional blocks and make the configuration more flexible. But it
wasn't clear in the documentation which variables are really safe and useful
to redefine and which ones could only be read via "show env" output.

The latest changes in the master-worker architecture make the existing
description even more confusing.

So let's sort all HAPROXY_* variables into four categories and let's also mark
explicitly which ones are set in which process when haproxy is started in
master-worker mode.

In addition, update examples in chapter "2.4. Conditional blocks". This might
bring more ideas for users how HAPROXY_* variables could be used in the
conditional blocks.
2024-11-20 15:56:50 +01:00
Valentine Krasnobaeva
d6ccd1738b MINOR: startup: set HAPROXY_LOCALPEER only once
Before this patch, the HAPROXY_LOCALPEER variable could be set in init_early(),
in init_args() and in cfg_parse_global(). In master-worker mode, if the
localpeer keyword is set in the global section, HAPROXY_LOCALPEER in the worker
environment is set to this keyword's value, but in the master environment it
still keeps the default, the local hostname. This is confusing.

To fix it, let's set HAPROXY_LOCALPEER only once, when a worker or a process in
standalone mode has finished parsing its configuration. And let's set this
variable only for the worker process or for the process in standalone mode,
because the master doesn't need it.

HAPROXY_LOCALPEER takes the value saved in localpeer global variable, which is
always set by default in init_early() to the local hostname. Then, localpeer
could be reset in init_args (-L option) and in cfg_parse_global() (while
parsing "localpeer" keyword).
2024-11-20 15:44:10 +01:00
Willy Tarreau
1171a23aec BUILD: makefile: make ERR apply to build options as well
Once in a while we find some makefiles ignoring some outdated arguments
and just emit a warning. What's annoying is that if users (say, distro
packagers), have purposely added ERR=1 to their build scripts to make
sure to fail on any warning, these ones will be ignored and the build
can continue with invalid or missing options.

William rightfully suggested that ERR=1 should also catch make's warnings
so this patch implements this, by creating a new "complain" variable that
points either to "error" or "warning" depending on $(ERR), and that is
used to send the messages using $(call $(complain),...). This does the
job right at little effort (tested from GNU make 3.82 to 4.3).

Note that for this purpose the ERR declaration was upped in the makefile
so that it appears before the new errors.mk file is included.
2024-11-20 14:58:35 +01:00
William Lallemand
b861dc9371 MINOR: systemd: replace SOCK_CLOEXEC by fcntl call to FD_CLOEXEC
Since we build systemd.o for every target, we need it to be more
portable.

The SOCK_CLOEXEC argument from socket() is not portable and won't build
on some OSes like macOS.

This patch fixes the issue by replacing SOCK_CLOEXEC with an fcntl() call
that sets FD_CLOEXEC.
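
The portable pattern can be sketched as follows. The sketch_* helper names
are made up for this example and this is not the code from src/systemd.c.
Unlike SOCK_CLOEXEC, the flag is set in a second step, so there is a short
window between socket() and fcntl() during which a concurrent fork()+exec()
in another thread could still inherit the descriptor.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set the close-on-exec flag on <fd> while preserving the other descriptor
 * flags. Returns 0 on success, -1 on error.
 */
static int sketch_set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
		return -1;
	return 0;
}

/* Portable stand-in for socket(domain, type | SOCK_CLOEXEC, protocol).
 * Returns the new descriptor, or -1 on error.
 */
static int sketch_socket_cloexec(int domain, int type, int protocol)
{
	int fd = socket(domain, type, protocol);

	if (fd == -1)
		return -1;
	if (sketch_set_cloexec(fd) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```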
2024-11-20 14:26:23 +01:00
William Lallemand
1ceeeacbad CI: vtest: temporarily build from the sd-notify PR
Build VTest temporarily from the sd-notify PR until the
https://github.com/vtest/VTest/pull/41 is merged.

This PR allows starting with -Ws in order to have more reliable tests
in master-worker mode.
2024-11-20 12:07:38 +01:00
William Lallemand
15845247db MEDIUM: mworker: remove USE_SYSTEMD requirement for -Ws
Since sd_notify() is now implemented in src/systemd.c, there is no need
anymore to build its support conditionally with USE_SYSTEMD.

This patch adds support for -Ws for every build and removes the
USE_SYSTEMD build option. It also removes every reference to USE_SYSTEMD
in the documentation and the CI.

This also allows to run the reg-tests in -Ws with the new VTest support.
2024-11-20 12:07:38 +01:00
Amaury Denoyelle
16147e6cf3 BUG/MINOR: cfgparse-quic: fix renaming of max-window-size
A patch has recently tried to rename QUIC max-window-size global
parameter to default-max-window-size to better reflect its usage.
However, only the documentation was edited but not cfgparse-quic.c.

Fix this by updating cfgparse-quic.c with the new default- naming.

No need to backport.
2024-11-20 11:12:06 +01:00
Christopher Faulet
17d4e6eaf9 MINOR: http-fetch: Add an option to 'query" to get the QS with the '?'
As mentioned by Thayne McCombs in #2728, it could be handy to have a sample
fetch function to retrieve the query string with the question mark
character.

Indeed, for now, the "query" sample fetch function already extracts the query
string from the path, but the question mark character is not
included. Instead of adding a new sample fetch function with a too similar
name, an optional argument is added to "query". If "with_qm" is passed as
argument, the question mark will be included in the query string, but only
if it is not empty.

Thanks to this patch, the following rule:

  http-request redirect location /destination?%[query] if { -m found query }  some_condition
  http-request redirect location /destination if some_condition

can now be expressed this way:

  http-request redirect location /destination%[query(with_qm)] if some_condition
2024-11-20 10:20:05 +01:00
Christopher Faulet
2a5da31cce BUG/MINOR: http-ana: Adjust the server status before the L7 retries
The server status must be adjusted, if necessary, at each retry. It is
properly performed when the "observe layer4" directive is set. But for layer
7, only the last attempt was considered.

When the L7 retries were implemented, all retries were added before the
server status adjustment. So only the last attempt was considered. To fix
the issue, we must adjust the server status first, and then try to perform a
L7 retry.

This patch should fix the issue #2679. It must be backported to all stable
versions.
2024-11-20 09:22:06 +01:00
Willy Tarreau
5c15899410 DOC: configuration: wrap long line for "strstr()" conditional expression
This keyword had too long a description line, let's split it. This can be
backported to 2.8.
2024-11-20 09:04:53 +01:00
Willy Tarreau
da1620b317 DOC: configuration: explain quotes and spaces in conditional blocks
Conditional blocks inherit the same tokenizer and argument parser as
the rest of the configuration, but are also silently concatenated
around groups of spaces and tabs. This can lead to subtle failures
for configs containing spaces around commas and parenthesis, where
a string comparison might silently fail for example. Let's better
document this particular case.

Thanks to Valentine for analysing and reporting the problem.

This can be backported to 2.4.
2024-11-20 09:04:53 +01:00
Willy Tarreau
962d5e038f DOC: configuration: explain the rules regarding spaces in arguments
Spaces around commas or parenthesis in expressions are generally part
of the value due to the long history of supporting unquoted arguments.
But this tends to come as a surprise to new users and sometimes creates
subtly invalid configurations. Let's add some text covering this.

This can be backported to 2.4.
2024-11-20 08:42:02 +01:00
Willy Tarreau
12fcd65468 MINOR: tasklet: support an optional set of wakeup flags to tasklet_wakeup_on()
tasklet_wakeup_on() and its derivatives (tasklet_wakeup_after() and
tasklet_wakeup()) do not support passing a wakeup cause like
task_wakeup(). This is essentially due to an API limitation caused by
the fact that for a very long time the only reason for waking up was
to process pending I/O. But with the growing complexity of mux tasks,
it is becoming important to be able to skip certain heavy processing
when not strictly needed.

One possibility is to permit the caller of tasklet_wakeup() to pass
flags like task_wakeup(). Instead of going with a complex naming scheme,
let's simply make the flags optional and be zero when not specified. This
means that tasklet_wakeup_on() now takes either 2 or 3 args, and that the
third one is the optional flags to be passed to the callee. Eligible flags
are essentially the non-persistent ones (TASK_F_UEVT* and TASK_WOKEN_*)
which are cleared when the tasklet is executed. This way the handler
will find them in its <state> argument and will be able to distinguish
various causes for the call.
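
One standard-C way to obtain such an optional trailing argument is sketched
below. It is only an illustration: the sketch_* and SK_* names are invented
here, and HAProxy's own macros may well be built differently. The idea is to
append default values after the caller's arguments and let a helper macro
pick the second one.

```c
#include <stdint.h>

#define SK_IN_LIST   0x01u  /* hypothetical "queued" state bit */
#define SK_WOKEN_IO  0x10u  /* hypothetical one-shot wakeup cause */

struct sketch_tasklet {
	uint32_t state;
};

/* Low-level function: always takes the flags explicitly. */
static void sketch_wakeup_impl(struct sketch_tasklet *tl, uint32_t flags)
{
	tl->state |= SK_IN_LIST | flags;
}

/* Keeps the first two arguments and ignores whatever follows. */
#define SKETCH_PICK(tl, flags, ...) sketch_wakeup_impl((tl), (flags))

/* sketch_wakeup(tl)        expands to sketch_wakeup_impl((tl), (0))
 * sketch_wakeup(tl, flags) expands to sketch_wakeup_impl((tl), (flags))
 * The two trailing zeroes guarantee that SKETCH_PICK always receives a
 * <flags> argument plus at least one variadic argument.
 */
#define sketch_wakeup(...) SKETCH_PICK(__VA_ARGS__, 0, 0)
```

Existing two-argument call sites keep compiling unchanged with this
construct, which is the property the commit relies on.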
2024-11-19 20:13:41 +01:00
Willy Tarreau
0334cb28a9 MINOR: tasklet: make the low-level tasklet API take a flag
Everything in the tasklet layer supports flags, except that they are
just not implemented in the wakeup functions, while they are in the
task_wakeup functions. Initially it was not considered useful to pass
wakeup causes because these were essentially I/O, but with the growing
number of I/O handlers having to deal with various types of operations
(typically cheap I/O notifications on subscribe vs heavy parsing on
application-level wakeups), it would be nice to start to make this
distinction possible.

This commit extends _tasklet_wakeup_on() and _tasklet_wakeup_after()
to pass a set of flags that continues to be set as zero. For now this
changes nothing, but new functions will come.
2024-11-19 20:13:41 +01:00
Willy Tarreau
e57581d76d MINOR: tools: add new macro DEFZERO to provide a default zero argument
This is the equivalent of DEFNULL except that it sets a zero value instead
of a NULL for a missing argument.
2024-11-19 20:13:41 +01:00
Willy Tarreau
c5052bad8a MINOR: sched: add TASK_F_WANTS_TIME to make the scheduler update the call date
Currently tasks being profiled have th_ctx->sched_call_date set to the
current nanosecond in monotonic time. But there's no other way to have
this, despite the scheduler being capable of it. Let's just declare a
new task flag, TASK_F_WANTS_TIME, that makes the scheduler take the time
just before calling the handler. This way, a task that needs nanosecond
resolution on the call date will be able to be called with an up-to-date
date without having to abuse now_mono_time() if not needed. In addition,
if CLOCK_MONOTONIC is not supported (now_mono_time() always returns 0),
the date is set to the most recently known now_ns, which is guaranteed
to be atomic and is only updated once per poll loop.

This date can be more conveniently retrieved using task_mono_time().

This can be useful, e.g. for pacing. The code was slightly adjusted so
as to merge the common parts between the profiling case and this one.
2024-11-19 20:13:41 +01:00
Willy Tarreau
12969c1b17 MINOR: tinfo/clock: turn sched_call_date to 64-bits
We used to store it in 32-bits since we'd only use it for latency and CPU
usage calculation but usages will evolve so let's not truncate the value
anymore. Now we store the full 64 bits. Note that this doesn't even
increase the storage size due to alignment. The 3 usage places were
verified to still be valid (most were already cast to 32 bits anyway).
2024-11-19 20:13:41 +01:00
Willy Tarreau
33c461314c MINOR: stream: don't update s->lat_time when the wakeup date is not set
In 2.7 was added a stream wakeup latency calculation with commit
6a28a30efa ("MINOR: tasks: do not keep cpu and latency times in struct
task"). However, due to the transformation of the previous code, it
kept unconditionally updating s->lat_time even if the sched_wake_date
was zero. In other words, s->lat_time is constantly updated for the
huge majority of calls that are made without profiling. Let's just
check the sched_wake_date status before doing so.
2024-11-19 20:13:41 +01:00
Willy Tarreau
973c81ceec CLEANUP: tinfo: move sched_*_date/*_mono_time to the thread-local area
These ones are never atomically accessed, they have nothing to do in
the atomic ops cache line, let's move them to the thread-local area.
2024-11-19 20:13:41 +01:00
Willy Tarreau
8dc68f3c75 DOC: sched: document the missing TASK_F_UEVT* flags
These are user-defined one-shot events that are application-specific
and reset upon wakeup and were not documented. No backport is needed
since these were added to 3.1.
2024-11-19 20:13:41 +01:00
Willy Tarreau
e5ca72cb6f DOC: sched: add missing scheduler API documentation for tasklet_wakeup_after()
This was added to 2.6 but the doc was forgotten. Let's add it. It's not
needed to backport this since it's only used for new developments.
2024-11-19 20:13:41 +01:00
Aurelien DARRAGON
501827ebe0 DOC: lua: fix yield-dependent methods expected contexts
Contrary to what the doc states, it is not expected (nor relevant) to
use yield-dependent methods such as core.yield() or core.(m)sleep() from
contexts that don't support yielding. Such contexts include body, init,
fetches and converters.

Thus the doc got it wrong since the beginning, because such methods were
never supported from the above contexts, yet it was listed in the list
of compatible contexts (probably the result of a copy-paste), which is
error-prone because it could either cause a Lua runtime error to be
thrown, or be ignored in some other cases.

It should be backported to all stable versions.
2024-11-19 19:36:02 +01:00
William Lallemand
6f746af915 REGTESTS: use -dW by default on every reg-tests
Every reg-test now runs without any warning, so let's activate -dW by
default so the new ones will inherit the option.

This patch reverts 9d511b3c ("REGTESTS: enable -dW on almost all tests
to fail on warnings") and adds -dW in the default HAPROXY_ARGS of
scripts/run-regtests.sh instead.
2024-11-19 16:53:10 +01:00
William Lallemand
e1fb9a47e1 MEDIUM: stats-file: silently ignore be/fe mismatch
Most of the invalid or unknown fields in the stats-file parser are ignored
silently, which is not the case of the frontend/backend mismatch on a
guid, which is kind of strange.

Since this is "documented" to be ignored in the
reg-tests/stats/sample-stats-file file, let's also ignore this kind of
line. This will allow running the associated reg-test with -dW.
2024-11-19 16:44:51 +01:00
Amaury Denoyelle
5a29fd6c61 MINOR: mux_quic/pacing: display pacing info on show quic
To improve debugging, extend "show quic" output to report if pacing is
activated on a connection. Two values will be displayed for pacing :

* a new counter paced_sent_ctr is defined in QCC structure. It will be
  incremented each time an emission is interrupted due to pacing.

* pacing engine now saves the number of datagrams sent in the last paced
  emission. This will be helpful to ensure burst parameter is valid.
2024-11-19 16:21:05 +01:00
Amaury Denoyelle
24cea66e07 MEDIUM: quic: define cubic-pacing congestion algorithm
Define a new QUIC congestion algorithm token 'cubic-pacing' for
quic-cc-algo bind keyword. This is identical to default cubic
implementation, except that pacing is used for STREAM frames emission.

This algorithm supports an extra argument to specify a burst size. This
is stored into a new bind_conf member named quic_pacing_burst which can
be reused to initialize the QUIC path.

Pacing support is still considered experimental. As such, 'cubic-pacing'
can only be used with expose-experimental-directives set.
2024-11-19 16:20:58 +01:00
Amaury Denoyelle
6dfc8fbf1d MINOR: quic: extend quic-cc-algo optional parameters
Modify quic-cc-algo for better extensibility of optional parameters
parsing. This will be useful to support a new parameter for maximum
allowed pacing burst size.

Take this opportunity to refine quic-cc-algo documentation. Optional
parameters are now presented as a list which will soon be extended.
2024-11-19 16:20:52 +01:00
Amaury Denoyelle
a6504c9cfb MINOR: quic: use dynamic cc_algo on bind_conf
A QUIC congestion algorithm can be specified on the bind line via
keyword quic-cc-algo. As such, bind_conf structure has a member
quic_cc_algo.

Previously, if quic-cc-algo was set, bind_conf member was initialized to
one of the globally defined CC algo structure. This patch changes
bind_conf quic_cc_algo initialization to point to a dynamically
allocated copy of CC algo structure.

With this change, it will be possible to tweak individually each CC algo
of a bind line. This will be used to activate pacing on top of the
congestion algorithm.

As bind_conf member is dynamically allocated now, its member is now
freed via free_proxy() to prevent any leak.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
796446a15e MAJOR: mux-quic: support pacing emission
Support pacing emission for STREAM frames at the QUIC MUX layer. This is
implemented by adding a quic_pacer engine into QCC structure.

The main changes have been written into qcc_io_send(). It now
differentiates cases when some frames have been rejected by transport
layer. This can occur as previously due to congestion or FD buffer full,
which requires subscribing on transport layer. The new case is when
emission has been interrupted due to pacing timing. In this case, QUIC
MUX I/O tasklet is rescheduled to run with the flag TASK_F_USR1.

On tasklet execution, if TASK_F_USR1 is set, all standard processing for
emission and reception is skipped. Instead, a new function
qcc_purge_sending() is called. Its purpose is to retry emission with the
saved STREAM frames list. Either all remaining frames can now be sent,
subscribe is done on transport error or tasklet must be rescheduled for
pacing purging.

In the meantime, if tasklet is rescheduled due to other conditions,
TASK_F_USR1 is reset. This will trigger a full regeneration of STREAM
frames. In this case, pacing expiration must be checked before calling
qcc_send_frames() to ensure emission is now allowed.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
ede4cd4c2e MINOR: mux-quic: encapsulate QCC tasklet wakeup
QUIC MUX will be responsible to drive emission with pacing. This will be
implemented via setting TASK_F_USR1 before I/O tasklet wakeup. To
prepare this, encapsulate each I/O tasklet wakeup into a new function
qcc_wakeup().

This commit is purely refactoring prior to pacing implementation into
QUIC MUX.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
4a94a018f0 MINOR: mux-quic: define a tx STREAM frame list member
For STREAM emission, MUX QUIC previously used a local list defined under
qcc_io_send(). This was suitable as either all frames were sent, or
emission must be interrupted due to transport congestion or fatal error.
In the latter case, the list was emptied anyway and a new frame list was
built on future qcc_io_send() invocation.

For pacing, MUX QUIC may have to save the frame list if pacing should be
applied across emission. This is necessary to avoid unnecessarily
rebuilding the STREAM frame list between each paced emission. To support
this, the STREAM list is now stored as a member of QCC structure.

Ensure frame list is always deleted, even on QCC release, using newly
defined utility function qcc_tx_frms_free().
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
886a7c475c MINOR: quic/pacing: add burst support
qc_send_mux() has been extended previously to support pacing emission.
This will ensure that no more than one datagram will be emitted during
each invocation. However, to achieve better performance, it may be
necessary to emit a batch of several datagrams in one turn.

A so-called burst value can be specified by the user in the
configuration. However, some congestion control algos may define their
own dynamic value. As such, a new CC callback pacing_burst is defined.

quic_cc_default_pacing_burst() can be used for algo without pacing
interaction, such as cubic. It returns a static value based on user
selected configuration.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
8039fe43e6 MINOR: quic/pacing: support pacing emission on quic_conn layer
Pacing will be implemented for STREAM frames emission. As such,
qc_send_mux() API has been extended to add an argument to a quic_pacer
engine.

If non NULL, engine will be used to pace emission. In short, no more
than one datagram will be emitted for each qc_send_mux() invocation.
Pacer is then notified about the emission and a timer for a future
emission is calculated. qc_send_mux() will return PACING error value, to
inform QUIC MUX layer that it will be responsible to retry emission
after some delay.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
ab82fab442 MINOR: quic/pacing: implement quic_pacer engine
Extend quic_pacer engine to support pacing emission. Several functions
are defined.
* quic_pacing_sent_done() to notify engine about an emission of one or
  several datagrams
* quic_pacing_expired() to check if emission should be delayed or can be
  conducted immediately
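
As an illustration of the two entry points listed above, here is a minimal
sketch of a pacing engine. It is not the HAProxy implementation: the class
name, the fixed per-datagram interval and the millisecond clock are all
assumptions made for the example, since the commit does not describe how the
delay is computed.

```python
from dataclasses import dataclass


@dataclass
class Pacer:
    """Toy pacing engine (hypothetical, not HAProxy's quic_pacer)."""

    interval_ms: int   # assumed fixed delay charged for each datagram
    next_ms: int = 0   # earliest date at which emission may resume

    def sent_done(self, now_ms: int, sent: int) -> None:
        # Notify the engine that `sent` datagrams were just emitted and
        # push back the date of the next allowed emission accordingly.
        self.next_ms = now_ms + sent * self.interval_ms

    def expired(self, now_ms: int) -> bool:
        # True when emission can be conducted immediately,
        # False when it should be delayed.
        return now_ms >= self.next_ms
```

A caller would check expired() before sending, then report the number of
datagrams actually emitted through sent_done().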
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
3e11492c99 MINOR: quic: define quic_pacing module
Add a new module quic_pacing. A new structure quic_pacer is defined.
This will be used as a pacing engine to implement smooth emission of
QUIC data.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
7fd48a5723 MINOR: quic: extend qc_send_mux() return type with a dedicated enum
This commit is part of an adjustment on QUIC transport send API to
support pacing. Here, qc_send_mux() return type has been changed to use
a new enum quic_tx_err.

This is useful to explain different failure causes of emission. For now,
only two values have been defined : NONE and FATAL. When pacing is
implemented, a new value will be added to specify that emission was
interrupted on pacing. This won't be a fatal error as emission can be
retried, but not immediately.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
5cb8f8a622 MINOR: quic: support a max number of built packet per send iteration
Extend QUIC transport emission function to support a maximum datagram
argument. The purpose is to ensure that qc_send() won't emit more than
the specified value, unless it is 0 which is considered as unlimited.

In qc_prep_pkts(), a counter of built datagram has been added to support
this. The packet building loop is interrupted if it reaches a specified
maximum value. Also, its return value has been changed to the number of
prepared datagrams. This is reused by qc_send() to interrupt its work if
a specified max datagram argument value is reached over one or several
iterations of prepared/sent datagrams.

This change is necessary to support pacing emission. Note that ideally,
the total length in bytes of emitted datagrams should be taken into
account instead of the raw number of datagrams. However, for a first
implementation, it was deemed easier to implement it with the latter.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
a554d82131 MINOR: quic: simplify qc_prep_pkts() exit path
To prepare pacing support, qc_prep_pkts() exit path has been rewritten
to be easily modified. This is purely refactoring which should not have
any functional change :
* a dedicated error path has been added
* ensure qc_txb_store() is always called to finalize datagram on normal
  exit path if first_pkt is not NULL. Needed to support breaking from
  packet building loop in an easier way.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
4069873403 MINOR: mux-quic: add missing values for show flags
Add QCC QC_CF_WAIT_FOR_HS and QCS QC_SF_TXBUB_OOB flags to their
respective show_flags to be able to decipher them via dev flags utility.

These values have been added in the current dev version, thus no need to
backport this patch.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle
8540886f00 DOC: quic: rename max-window-size as with default prefix
Rename 'tune.quic.frontend.max-window-size' with the prefix 'default-'.
This highlights the fact that it is not a hard limit, as it can be
overridden if specifying an optional window size via quic-cc-algo on a
bind line.

No need to backport as this keyword was added on the current dev
version.
2024-11-19 16:16:48 +01:00
William Lallemand
f36caf7b81 MEDIUM: stats-file: explicitly ignore comments starting by //
Explicitly ignore comments starting by // so they don't emit a warning.
2024-11-19 15:49:44 +01:00
William Lallemand
96f2736e99 MINOR: stats-file: add the filename in the warning
Add the name of the stats-file in the warning so it's clear that the
warning was provoked by the stats-file and not the config file.
2024-11-19 15:49:44 +01:00
Christopher Faulet
e68c6852ad DOC: config: Move fs.* and bs.* in section about L5 samples
These sample fetch functions were added in the wrong section. Move them in
the section about sample fetch functions at L5 layer.
2024-11-19 15:29:41 +01:00
Christopher Faulet
4ccc3f4048 DOC: config: Move wait_end in section about internal samples
wait_end is an internal sample fetch function and not a L6 one. So move it
in the corresponding section.
2024-11-19 15:29:40 +01:00
Christopher Faulet
e9021a4ca1 DOC: config: Slightly improve the %Tr documentation
Specify -1 can also be reported for %Tr delay when the response is invalid.
2024-11-19 15:29:40 +01:00
Christopher Faulet
5863d33fce BUG/MINOR: http_ana: Report -1 for %Tr for invalid response only
The server response time is erroneously reported as -1 when it is
intercepted by HAProxy.

As stated in the documentation, the server response time is reported as -1
when the last response header was never seen. It happens when a server
timeout is triggered before the server managed to process the request. It
also happens if the response is invalid. This may be reported by the mux
during the response parsing, but also by the HTTP analyzers. However, in
this last case, the response time must only be reported as -1 on 502.

This patch must be backported to all stable versions. It should fix the
issue #2384.
2024-11-19 15:29:40 +01:00
Christopher Faulet
bc967758a2 MINOR: mux-h1: Return 414 or 431 when appropriate
When the request is too large to fit in a buffer a 414 or a 431 error
message is returned depending on the error state of the request parser. A
414 is returned if the URI is too long, otherwise a 431 is returned.

This patch should fix the issue #1309.
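
The selection rule described above can be sketched as follows. The enum and
function names are hypothetical and only model the decision; they are not
the identifiers used by the H1 parser.

```python
from enum import Enum, auto


class ParseError(Enum):
    """Hypothetical parser error states for a request that overflows a buffer."""

    URI_TOO_LONG = auto()
    HEADERS_TOO_LARGE = auto()


def status_for_oversized_request(err: ParseError) -> int:
    # 414 (URI Too Long) when the URI itself is too long, otherwise
    # 431 (Request Header Fields Too Large).
    return 414 if err is ParseError.URI_TOO_LONG else 431
```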
2024-11-19 15:29:40 +01:00
Christopher Faulet
41f28b3c53 DEV: phash: Update 414 and 431 status codes to phash
The phash tool was updated to reflect the previous change. 414 and 431 are
now part of the handled status codes.
2024-11-19 15:29:40 +01:00
Christopher Faulet
62dc8750a9 MINOR: http: Add support for HTTP 414/431 status codes
414-Uri-Too-Long and 431-Request-Header-Fields-Too-Large are now part of
supported status codes that can be defined as error files. The hash table
defined in http_get_status_idx() was updated accordingly.
2024-11-19 15:29:40 +01:00
Christopher Faulet
18de419f96 DOC: config: Fix a typo in "1.3.1. The Request line"
At the beginning of the last paragraph of this section, HTTP/3 was used
instead of HTTP/2. It is now fixed.
2024-11-19 15:29:40 +01:00
Christopher Faulet
3af2d91b3b DOC: config: Add a space before ':' for {bs,fs}.aborted and {bs,fs}.rst_code
A space was missing before the ':' for the sample fetch functions above. It
was an issue for the text to HTML conversion script. So, let's fix it.
2024-11-19 15:29:40 +01:00
Christopher Faulet
fa43ca2ed0 MINOR: stream: Add an option to "show sess" command to dump the captured URI
"show sess" command now supports a list of options that can be set after all
other possible arguments (<id>, all...). For now, "show-uri" is the only
supported option. With this option, the captured URI, if non-null, is added
to the dump of a stream, complete or not. The URI may be anonymized if
necessary.

This patch should fix the issue #663.
2024-11-19 15:29:40 +01:00
Christopher Faulet
e9bc5937c9 MINOR: agent-check: Be able to set absolute weight via an agent
Historically, an agent-check program is only able to set a weight proportional
to the server's initial weight. However, it could be handy to also set an
absolute value. It is the purpose of this patch.

Instead of changing the current way to set a server's weight, a new
agent-check command is introduced. The string "weight:", followed by a
positive integer or a positive integer percentage, can now be used. If the
value ends with the '%' sign, then the new weight will be proportional to
the initial weight of the server. Otherwise, the value is considered as an
absolute weight and must be between 0 and 256.

This patch should fix the issue #360.
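
The "weight:" rules above can be sketched as a small parser. This is only an
illustration under assumptions: the function name is hypothetical, and the
integer rounding of the percentage as well as the use of an exception to
report errors are choices made for the example, not taken from the patch.

```python
def apply_agent_weight(command: str, initial_weight: int) -> int:
    """Return the new server weight for a 'weight:<value>[%]' agent reply."""
    prefix = "weight:"
    if not command.startswith(prefix):
        raise ValueError("not a weight command")
    value = command[len(prefix):]

    if value.endswith("%"):
        # Percentage: proportional to the server's initial weight.
        percent = int(value[:-1])
        if percent < 0:
            raise ValueError("percentage must be positive")
        return initial_weight * percent // 100  # rounding down is an assumption

    # No '%' sign: absolute weight, which must be between 0 and 256.
    weight = int(value)
    if not 0 <= weight <= 256:
        raise ValueError("absolute weight must be between 0 and 256")
    return weight
```

For instance, with an initial weight of 100, "weight:50%" would yield 50
while "weight:200" would yield 200 regardless of the initial weight.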
2024-11-19 15:29:40 +01:00
Christopher Faulet
1be7140ade MINOR: http-ana: Add support for "set-cookie-fmt" option to redirect rules
It is now possible to use a log-format string to define the "Set-Cookie"
header value of a response generated by a redirect rule. There is no special
check on the result format and it is not possible during the configuration
parsing. It is probably not a big deal because already existing "set-cookie"
and "clear-cookie" options don't perform any check.

Here is an example:

  http-request redirect location https://someurl.com/ set-cookie haproxy="%[var(txn.var)]"

This patch should fix the issue #1784.
2024-11-19 15:20:02 +01:00
Christopher Faulet
b2877db47c MINOR: http-ana: Add option to keep query-string on a location-based redirect
On prefix-based redirect, there is an option to drop the query-string of the
location. Here it is the opposite: an option is added to preserve the
query-string of the original URI for a location-based redirect.

By setting "keep-query" option, for a location-based redirect only, the
query-string of the original URI is appended to the location. If there is no
query-string, nothing is added (no empty '?'). If there is already a
non-empty query-string on the location, the original one is appended with
'&' separator.

This patch should fix issue #2728.
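
The three cases described above (no query-string, location without
query-string, location with a non-empty query-string) can be sketched as
below. The function name is hypothetical, and the handling of a location
ending with a bare '?' is not covered by the commit message, so the choice
made here is an assumption.

```python
def keep_query_location(location: str, original_uri: str) -> str:
    """Append the original URI's query-string to a redirect location."""
    _, _, query = original_uri.partition("?")
    if not query:
        # No query-string on the original URI: nothing is added,
        # not even an empty '?'.
        return location

    _, has_mark, loc_query = location.partition("?")
    if has_mark and loc_query:
        # The location already carries a non-empty query-string.
        return f"{location}&{query}"
    if has_mark:
        # Assumption: a location ending with a bare '?' reuses that '?'.
        return f"{location}{query}"
    return f"{location}?{query}"
```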
2024-11-19 15:20:02 +01:00
Valentine Krasnobaeva
7848692c4c MINOR: config: show HAPROXY_BRANCH in "show env" output
Before this patch HAPROXY_BRANCH was unset just after configuration parsing.
Let's keep it, as it could be used in conditional blocks and some
configuration directives and it's handy to check its runtime value via "show
env".

In master-worker mode, this variable is set to the same value for both
processes.
2024-11-19 14:13:50 +01:00
Valentine Krasnobaeva
d58a8d1f64 MINOR: cli: make "show env" accessible via master CLI without enabling debug
Before this patch, we needed to put the master CLI in debug mode to be able
to issue 'show env' command for the master process. Output of this command is
handy even for the master process context, as it allows checking its
environment variables, which could be used/modified in the 'global' section.

So, let's provide in 'show env' command structure the level ACCESS_MASTER.
This allows to see and to access this command in master CLI without putting it
in debug mode.
2024-11-19 14:13:42 +01:00
Valentine Krasnobaeva
b9536717cd BUG/MINOR: mworker-prog: don't warn about deprecated section with expose-deprecated-directives
As the master now parses the expose-deprecated-directives option, let's emit the
warning about the deprecated 'program' section only if this option wasn't set in
the 'global' section. This allows people who prefer not to remove the
'program' section immediately to keep starting the process in zero-warning
mode.

Adjust the warning message accordingly and mcli_start_progs.vtc test. As
expose-deprecated-directives option is a 'global' section keyword, this section
must always precede any 'program' section, if users still continue to keep
'program' section.

This doesn't need to be backported, as related to the latest changes in
the master-worker architecture.
2024-11-19 14:13:30 +01:00
Valentine Krasnobaeva
39ea0df38f MINOR: cfgparse-global: parse options to allow non std keywords in discovery mode
The 'program' section is now considered deprecated, see the commit 581c8a27d98c
("MEDIUM: mworker: depreciate the 'program' section"). So, since this commit, the
'program' section parser emits a warning every time this section is
present. This makes it impossible to launch the process in zero-warning mode.

After the master-worker refactoring, only the master process parses the 'program'
section. So, at first, in order to be able to start in zero-warning mode, we
need to parse in the master process the option which allows deprecated keywords.
Thus, let's set in this commit the KWF_DISCOVERY flag on the
cfg_parse_global_non_std_directives parser, which parses the
'expose-deprecated-directives' and 'expose-experimental-directives' options.
2024-11-19 14:13:19 +01:00
Willy Tarreau
f8d3d2e4cf MINOR: ring: support unit suffixes in the size
The ring size used to take only numbers and silently ignore letters (due
to atol()), resulting in tiny buffers when trying to collect traces and
using e.g. "size 10g". Let's make use of parse_size_err() to properly
parse units.
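
To illustrate the difference, the sketch below models both behaviors. It is
written in Python rather than C, atol_like() only mimics how atol() stops at
the first non-digit, and parse_size() is a hypothetical stand-in: the
accepted suffixes (k, m, g) and the 1024-based multipliers are assumptions
for the example, not a description of parse_size_err().

```python
import re


def atol_like(text: str) -> int:
    """Mimic C atol(): read the leading digits, ignore whatever follows."""
    m = re.match(r"\s*([+-]?\d+)", text)
    return int(m.group(1)) if m else 0


# Assumed multipliers for the example.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text: str) -> int:
    """Parse '<digits>[k|m|g]' and reject anything else."""
    m = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if m is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2).lower()]
```

With "size 10g", atol_like() yields 10 (a tiny ring), whereas parse_size()
yields 10 * 1024**3 and rejects an unknown suffix instead of ignoring it.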
2024-11-19 10:56:45 +01:00
Willy Tarreau
82f190f882 MINOR: tools: make parse_size_err() support 32/64 bits
parse_size_err() currently is a function working only on an uint. It's
not convenient for certain elements such as rings on large machines.

This commit addresses this by having one function for uints and one
for ullong, and making parse_size_err() a macro that automatically
calls one or the other. It also has the benefit of automatically
supporting compatible types (long, size_t etc).
2024-11-19 10:50:42 +01:00
Willy Tarreau
9c6ccb8dbb MEDIUM: config: warn on unitless timeouts < 100 ms
From time to time we face a configuration with very small timeouts which
look accidental because there could be expectations that they're expressed
in seconds and not milliseconds.

This commit adds a check for non-zero unitless values smaller than 100
and emits a warning suggesting to append an explicit unit if that was
the intent.

Only the common timeouts, the server check intervals and the resolvers
hold and timeout values were covered for now. All the code needs to be
manually reviewed to verify if it supports emitting warnings.

This may break some configs using "zero-warning", but greps in existing
configs indicate that these are extremely rare and solely intentionally
done during tests. At least even if a user leaves that after a test, it
will be more obvious when reading 10ms that something's probably not
correct.
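
The check described above can be sketched as follows. The function name and
the warning text are hypothetical; only the rule itself (unitless, non-zero,
smaller than 100) comes from the commit message.

```python
from typing import Optional


def unitless_timeout_warning(value: str) -> Optional[str]:
    """Return a warning for small unitless timeouts, or None otherwise."""
    if not (value.isascii() and value.isdigit()):
        # An explicit unit is present (e.g. "10ms", "30s"), or the value
        # is malformed, which this sketch does not handle.
        return None
    ms = int(value)
    if 0 < ms < 100:
        return (f"timeout '{value}' is smaller than 100 ms and has no unit; "
                f"append an explicit unit such as '{value}ms' if that was the intent")
    return None
```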
2024-11-19 10:33:20 +01:00
Willy Tarreau
9d511b3c27 REGTESTS: enable -dW on almost all tests to fail on warnings
Now that warnings were almost all removed, let's enable zero-warning
via -dW. All tests were adjusted, but two:

  - mcli/mcli_start_progs.vtc:
      the programs section currently cannot be silenced

  - stats/stats-file.vtc:
      the warning comes from the stats file itself on comment lines.

All other ones are now OK.
2024-11-19 09:27:08 +01:00
Willy Tarreau
efd745e22d REGTESTS: only use tune.ssl.default-dh-param when not using AWS-LC
This option is not available with AWS-LC and emits a warning, so let's
properly enclose the test to cover this special case.
2024-11-19 09:27:08 +01:00
Willy Tarreau
d37610f43d REGTESTS: add missing timeouts to 30 tests
No less than 30 tests were missing timeouts, preventing them from being
started with zero-warning. Since they were not supposed to trigger, they
have been set to 30s so as never to trigger, and now they do not produce
any warning anymore.
2024-11-19 08:46:02 +01:00
Willy Tarreau
52b72ec3ba REGTESTS: silence warning "L6 sample fetches ignored" in cond_set_var
This reg-test uses req.len in an HTTP backend. It does work but emits a
warning suggesting that this is ignored, so most likely its days are
numbered now. Let's just use req.hdrs,length instead.
2024-11-19 08:33:15 +01:00
Willy Tarreau
b9537fe66d REGTESTS: remove a duplicate "option httpslog" in the defaults section
This triggers the following warning:

  'option httpslog' overrides previous 'option httpslog' in 'defaults' section.
2024-11-19 08:06:26 +01:00
Willy Tarreau
dce394a303 REGTESTS: silence warnings about content-type being ignored
The following rules are triggering warnings about content-type being
ignored:

  http-request return content-type "text/plain" if { path /def-4 }
  http-request return content-type "text/plain" file /dev/null hdr "x-custom-hdr" "%[url]"  if { path /empty-file }

Annoyingly, the content-type is mandatory when the file is not empty,
that might be something to revisit in the future to relax at least one
of the rules so that the config doesn't strictly require to know the
file contents upfront.
2024-11-19 08:06:26 +01:00
Willy Tarreau
6d70da76d3 REGTESTS: make the unit explicit for very short timeouts
Two tests were using "timeout {client,server} 1" to forcefully trigger
them, but a forthcoming patch will emit a warning for such small unitless
values, so let's be explicit about the unit.
2024-11-19 08:06:26 +01:00
Willy Tarreau
04465d25bc REGTESTS: silence warning "previous 'http-response' action is final"
The regtest "h1or2_to_h1c" contains both an allow and a deny at the end,
likely to help catch rare bugs. But this triggers a warning that we can
silence by placing a condition on the penultimate rule.
2024-11-19 08:06:26 +01:00
Willy Tarreau
671f6beac1 REGTESTS: silence the "log format ignored" warnings
Several tests were declaring a log format without having an explicit
log server configured, causing a warning. Let's clean them up.
2024-11-19 08:06:26 +01:00
Willy Tarreau
e72b525832 MINOR: cfgparse: parse tune.bufsize.small as a size
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "4k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, which
was verified to be OK.
2024-11-18 19:07:05 +01:00
Willy Tarreau
a344d37fad MINOR: cfgparse: parse tune.bufsize as a size
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, preventing from starting when set
e.g. to "64k". Let's make use of parse_size_err() on it so that units are
supported. This requires to turn it to uint as well, and to explicitly
limit its range to INT_MAX - 2*sizeof(void*), which was previously
partially handled as part of the sign check.
2024-11-18 19:06:25 +01:00
Willy Tarreau
2f0c6ff3a5 MINOR: cfgparse: parse tune.recv_enough as a size
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, and
since it's sometimes compared to an int, we limit its range to
0..INT_MAX.
2024-11-18 19:01:28 +01:00
Willy Tarreau
a90a7d4d60 MINOR: cfgparse: parse tune.pipesize as a size
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, which
was verified to be OK.
2024-11-18 18:51:31 +01:00
Willy Tarreau
f9f28b7584 MINOR: cfgparse: parse tune.{rcvbuf,sndbuf}.{frontend,backend} as sizes
Till now these values were parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on them so that
units are supported. This requires to turn them to uint as well, which
is OK.
2024-11-18 18:50:02 +01:00
Willy Tarreau
a923c72357 MINOR: cfgparse: parse tune.{rcvbuf,sndbuf}.{client,server} as sizes
Till now these values were parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on them so that
units are supported. This requires to turn them to uint as well, which
is OK.
2024-11-18 18:49:01 +01:00
Willy Tarreau
45f9e95f22 MINOR: sample: extend the "when" converter to support an ACL
Sometimes conditions to decide of an anomaly are not as easy to define
as just an error or a success. One example use case would be to monitor
the transfer time and fix a threshold.

An idea suggested by Tristan would be to permit the "when"
converter to refer to a more variable or dynamic condition.

Here we make this possible by making "when" rely on a named ACL. The
ACL then needs to be specified in either the proxy or the defaults
section. Since it is evaluated inline, it may even refer to information
available at the end (at log time) such as the data transfer time. If
the ACL evaluates to true, the converter passes the data.

Example: log "dbg={-}" when fine, or "dbg={... debug info ...}" on slow
transfers:

  acl slow_xfer res.timer.data ge 10000   # more than 10s is slow
  log-format "$HAPROXY_HTTP_LOG_FMT                                \
              fsdbg={%[fs.debug_str,when(acl,slow_xfer)]}          \
              bsdbg={%[bs.debug_str,when(acl,slow_xfer)]}"
2024-11-18 16:11:55 +01:00
Willy Tarreau
00fcda1ff2 MINOR: acl: export find_acl_default()
It will be needed in a future patch, so let's export it (it was static).
2024-11-18 15:15:54 +01:00
Willy Tarreau
9539f2b097 [RELEASE] Released version 3.1-dev13
Released version 3.1-dev13 with the following main changes :
    - MEDIUM: mworker: depreciate the 'program' section
    - BUILD: ot: use a cebtree instead of a list for variable names
    - MINOR: startup: replace HAPROXY_LOAD_SUCCESS with global load_status
    - BUG/MINOR: startup: set HAPROXY_CFGFILES in read_cfg
    - BUG/MINOR: cli: don't show sockpairs in HAPROXY_CLI and HAPROXY_MASTER_CLI
    - BUG/MEDIUM: stconn: Don't forward shut for SC in connecting state
    - BUG/MEDIUM: resolvers: Insert a non-executed resulution in front of the wait list
    - MINOR: debug: explicitly permit the counter condition to be empty
    - MINOR: debug: add a new counter type for glitches
    - MINOR: mux-h2: count glitches when they're reported
    - BUG/MINOR: deinit: release uri_auth admin rules
    - MINOR: uri_auth: add stats_uri_auth_free helper
    - MEDIUM: uri_auth: implement clean uri_auth cleaning
    - MINOR: mux-quic/h3: count glitches when they're reported
    - BUG/MEDIUM: mux-h2: Don't send RST_STREAM frame for streams with no ID
    - BUG/MINOR: Don't report early srv aborts on request forwarding in DONE state
    - MINOR: promex: Expose the global node and description in process metrics
    - MINOR: promex: Add global and proxies description as labels to all metrics
    - OPTIM: pattern: only apply LRU cache for large enough lists
    - BUG/MEDIUM: checks: make sure to always apply offsets to now_ms in expiration
    - BUG/MINOR: debug: do not set task expiration to TICK_ETERNITY
    - BUG/MEDIUM: mailers: make sure to always apply offsets to now_ms in expiration
    - BUG/MINOR: mux_quic: make sure to always apply offsets to now_ms in expiration
    - BUG/MINOR: peers: make sure to always apply offsets to now_ms in expiration
    - BUG/MEDIUM: clock: make sure now_ms cannot be TICK_ETERNITY
    - MINOR: debug/cli: replace "debug dev counters" with "debug counters"
    - DOC: config: add tune.h2.{be,fe}.rxbuf to the global keywords index
    - MINOR: chunk: add a BUG_ON upon the next init_trash_buffer()
2024-11-15 18:42:29 +01:00
William Lallemand
0bfd36e7b8 MINOR: chunk: add a BUG_ON upon the next init_trash_buffer()
The trash pool is initialized twice in haproxy, first during STG_POOL,
and 2nd after configuration parsing.

Calling alloc_trash_chunk() between these two phases can lead to strange
things if the chunk is used afterwards: the pool is destroyed, so
trying to do a free_trash_chunk() or accessing the pointer will lead to
crashes.

This patch checks that we don't have used buffers from the trash pool
before initializing the pool again.
2024-11-15 17:15:06 +01:00
Willy Tarreau
5f37af7a8e DOC: config: add tune.h2.{be,fe}.rxbuf to the global keywords index
These two keywords were missing from the index, let's add them.
2024-11-15 16:32:37 +01:00
Willy Tarreau
4420939fcd MINOR: debug/cli: replace "debug dev counters" with "debug counters"
"debug dev" commands are not meant to be used by end-users, and are
purposely not documented. Yet due to their usefulness in troubleshooting
sessions, users are increasingly invited by developers to use some of
them.

"debug dev counters" is one of them. Better move it to "debug counters"
and document it so that users can check them even if the output can look
cryptic at times. This, combined with DEBUG_GLITCHES, can be convenient
to observe suspicious activity. The doc however specifies that the format
may change between versions and that new entries/types might appear
within a stable branch.
2024-11-15 16:26:01 +01:00
Willy Tarreau
5a3735a155 BUG/MEDIUM: clock: make sure now_ms cannot be TICK_ETERNITY
In clock ticks, 0 is TICK_ETERNITY. Long ago we used to make sure now_ms
couldn't be zero so that it could be assigned to expiration timers, but
it has long changed after functions like tick_add() were instrumented to
make the check. The problem is that aside the rare few accidental direct
assignments to expiration dates, it's also used to mark the beginning of
an event that's later checked against TICK_ETERNITY to know if it has
already struck. The problem in this case is that certain events may just
be replaced or dropped just because they apparently never appeared. It's
probably the case for stconn's "lra" and "fsb" fields, just like it is
for all those involving tick_add_ifset(), like h2c->idle_start.

The right approach would be to change the type of now_ms to something
else that cannot take direct computations and that represents a timestamp,
forcing to always use the conversion functions. The variables holding such
timestamps would also be distinguished from intervals. At first glance we
could have for timestamps:
  - 0 = never happened (for the past), eternity (for the future)
  - X = date
and for intervals:
  - 0 = not set
  - X = interval

However this requires significant changes. Instead for now, let's just
make sure again that now_ms is never 0 by setting it to 1 when this
happens (1 / 4 billion times, or 1ms every 49.7 days).

This will need to be carefully backported to older versions. Note that
with this patch backported, the previous ones fixing the zero date are
not strictly needed.
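
As a rough illustration of the workaround described above, here is a minimal C sketch. It is not the actual HAProxy code: clock_sanitize_ms(), struct sketch_event and the other helpers are hypothetical names, and only the "0 means TICK_ETERNITY" convention and the 32-bit millisecond wrap are taken from the text.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_ETERNITY 0u   /* reserved value: "no timer" / "never happened" */

/* Hypothetical helper: make a raw wrapping millisecond counter safe to use
 * as a timestamp by skipping the reserved value 0. */
static inline uint32_t clock_sanitize_ms(uint32_t raw_ms)
{
    return raw_ms == TICK_ETERNITY ? 1u : raw_ms;
}

/* An event whose start date doubles as a "did it happen?" marker. */
struct sketch_event {
    uint32_t start_ms;   /* TICK_ETERNITY = never happened */
};

static inline void sketch_event_mark(struct sketch_event *ev, uint32_t now_ms)
{
    /* Storing 0 here would make the event look like it never happened. */
    assert(now_ms != TICK_ETERNITY);
    ev->start_ms = now_ms;
}

static inline int sketch_event_happened(const struct sketch_event *ev)
{
    return ev->start_ms != TICK_ETERNITY;
}

/* Time before a 32-bit millisecond counter wraps: 2^32 ms expressed in days. */
static inline double sketch_wrap_period_days(void)
{
    return 4294967296.0 / 1000.0 / 86400.0;
}
```

The cost of the clamp is that the date is off by 1 ms once per wrap period, which is the trade-off the message accepts.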
2024-11-15 16:01:31 +01:00
Willy Tarreau
ed55ff878d BUG/MINOR: peers: make sure to always apply offsets to now_ms in expiration
Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be a reconnect programmed upon signal
receipt at the wrapping date not having a working timeout.

This should be backported where it applies.
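
The difference between a direct assignment and going through an add helper can be sketched as follows. This is a simplified model rather than HAProxy's own tick_add(): sketch_tick_add() and struct sketch_task are hypothetical, and unsigned arithmetic is assumed so that the wrap is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TICK_ETERNITY 0u   /* a timer set to this value never fires */

/* Add a timeout to a date, skipping the reserved value 0 on wrap. */
static inline uint32_t sketch_tick_add(uint32_t now, uint32_t timeout)
{
    now += timeout;              /* wraps modulo 2^32 */
    if (now == TICK_ETERNITY)
        now++;                   /* never return "eternity" by accident */
    return now;
}

struct sketch_task {
    uint32_t expire;
};

/* Request an immediate wakeup. Writing "t->expire = now_ms" directly would
 * disable the timer whenever now_ms happens to be 0. */
static inline void sketch_wake_now(struct sketch_task *t, uint32_t now_ms)
{
    assert(t != NULL);
    t->expire = sketch_tick_add(now_ms, 0);
}
```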
2024-11-15 15:44:05 +01:00
Willy Tarreau
f66bfcff96 BUG/MINOR: mux_quic: make sure to always apply offsets to now_ms in expiration
Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact looks null since the task is also woken up, but better
not leave such tasks in the timer tree anyway.

This should be backported where it applies.
2024-11-15 15:41:21 +01:00
Willy Tarreau
841be4cdd1 BUG/MEDIUM: mailers: make sure to always apply offsets to now_ms in expiration
Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be mailers suddenly stopping.

This should be backported where it applies.
2024-11-15 15:39:58 +01:00
Willy Tarreau
808a7cc777 BUG/MINOR: debug: do not set task expiration to TICK_ETERNITY
Using "debug task", it's possible to change a task's expiration, but
we must be careful not to set it to TICK_ETERNITY. Let's use tick_add()
instead. The risk is basically null since it's a debugging command, so
no backport is needed.
2024-11-15 15:39:00 +01:00
Willy Tarreau
2f287f14f3 BUG/MEDIUM: checks: make sure to always apply offsets to now_ms in expiration
Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be health checks suddenly stopping.

This should be backported where it applies.
2024-11-15 15:39:00 +01:00
Willy Tarreau
555994c968 OPTIM: pattern: only apply LRU cache for large enough lists
As shown in issue #1518, the LRU cache has a non-null cost that can
sometimes be above the match cost it's trying to avoid. After a number
of tests, it appears that:
  - "simple" match operations (sub, beg, end, int etc) reach a break-even
    after ~20 patterns in list
  - "heavy" match operations (reg) reach a break-even after ~5 patterns in
    list

Let's only consult the LRU cache when the number of patterns in the
expression is at least as large as this limit. Of course there will
always be outliers but it's already a good start.

Another improvement consists in reducing the cache size to further
speed up lookups, which makes sense if fewer expressions use the cache.
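
The decision rule can be sketched in C as below. The two thresholds come from the measurements quoted above (~20 and ~5 patterns); the enum, the macro names and lru_worth_consulting() are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_match_kind {
    SKETCH_MATCH_SIMPLE,   /* sub, beg, end, int, ... */
    SKETCH_MATCH_HEAVY     /* reg */
};

#define SKETCH_LRU_MIN_SIMPLE 20   /* break-even for simple matches */
#define SKETCH_LRU_MIN_HEAVY   5   /* break-even for regex matches */

/* Return non-zero when the list is large enough for the cache lookup to be
 * cheaper on average than running the match directly. */
static inline int lru_worth_consulting(enum sketch_match_kind kind,
                                       size_t nb_patterns)
{
    size_t limit;

    assert(kind == SKETCH_MATCH_SIMPLE || kind == SKETCH_MATCH_HEAVY);
    limit = (kind == SKETCH_MATCH_HEAVY) ? SKETCH_LRU_MIN_HEAVY
                                         : SKETCH_LRU_MIN_SIMPLE;
    return nb_patterns >= limit;
}
```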
2024-11-15 15:33:04 +01:00
Christopher Faulet
25b0592745 MINOR: promex: Add global and proxies description as labels to all metrics
While the global description is exposed, when defined, in a dedicated
metric, it is not possible to dump the description defined in a
frontend/listen/backend sections. So, thanks to this patch, it is now
possible to dump it as a label of all metrics of the corresponding
section. To do so, "desc-labels" parameter must be provided on the URL:

    /metrics?desc-labels

When this parameter is set, if a description is provided in a section,
including the global one, the "desc" label will be added to all metrics of
this section. For instance:

  haproxy_frontend_current_sessions{proxy="front-http",desc="..."} 1

Note that servers metrics inherit the description of their backend/listen
section.
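
To make the resulting output format concrete, here is a small C sketch that builds one metric line with or without the "desc" label. It is only an illustration: sketch_format_metric() is a hypothetical function, not the promex implementation, and it assumes the description contains no character that would need escaping in a label value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "<name>{proxy="..."[,desc="..."]} <value>" into <out>.
 * The desc label is emitted only when requested and when a non-empty
 * description exists. Returns snprintf()'s result. */
static int sketch_format_metric(char *out, size_t size, const char *name,
                                const char *proxy, const char *desc,
                                int want_desc, long value)
{
    assert(out != NULL && size > 0 && name != NULL && proxy != NULL);

    if (want_desc && desc != NULL && strlen(desc) > 0)
        return snprintf(out, size, "%s{proxy=\"%s\",desc=\"%s\"} %ld",
                        name, proxy, desc, value);

    return snprintf(out, size, "%s{proxy=\"%s\"} %ld", name, proxy, value);
}
```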

This patch should solve the issue #1531.
2024-11-15 14:25:13 +01:00
Christopher Faulet
451d216a53 MINOR: promex: Expose the global node and description in process metrics
The global node value is now exposed via "haproxy_process_node" metrics. The
metric value is always set to 1 and the node name itself is the "node"
label. The same is performed for the global description. But only if it is
defined. In that case "haproxy_process_description" metric is defined, with
1 as value and the description itself is set in the "desc" label.
2024-11-15 14:24:31 +01:00
Christopher Faulet
a930e99f46 BUG/MINOR: Don't report early srv aborts on request forwarding in DONE state
L7-retries may be ignored if server aborts are detected during the request
forwarding, when the request is already in DONE state.

When a request was fully processed (so in HTTP_MSG_DONE state) and is
waiting to be forwarded to the server, there is a test to detect server
aborts, to be able to report the error. However, this test must be skipped
if the response was not received yet, to let the response analyzers handle
the abort. It is important to properly handle the retries. This test must
only be performed if the response analysis was finished. It means the
response must be at least in HTTP_MSG_BODY state.

This patch should be backported as far as 2.8.
2024-11-15 11:00:05 +01:00
Christopher Faulet
f065d00098 BUG/MEDIUM: mux-h2: Don't send RST_STREAM frame for streams with no ID
On server side, the H2 stream is first created with an unassigned ID (ID ==
0). Its ID is assigned when the request is emitted, before formatting the
HEADERS frame. However, the session may be aborted during that stage. We
must take care to not emit RST_STREAM frame for this stream, because it does
not exist yet for the server.

It is especially important to do so because, depending on the timing, it may
also happen before the H2 PREFACE was sent.

This patch must be backported to all stable versions. It is related to issue
2024-11-15 10:34:47 +01:00
Willy Tarreau
4fd6d15344 MINOR: mux-quic/h3: count glitches when they're reported
The qcc_report_glitch() function is now replaced with a macro to support
enumerating counters for each individual glitch line. For now this adds
36 such counters. The macro supports an optional description, though that
is not being used for now.

As a reminder, this requires building with -DDEBUG_GLITCHES=1.
2024-11-14 20:43:33 +01:00
Aurelien DARRAGON
42710b7320 MEDIUM: uri_auth: implement clean uri_auth cleaning
proxy auth_uri struct was manually cleaned up during deinit, but the logic
behind was kind of awkward because it was required to find out which ones
were shared or not. Instead, let's switch to a proper refcount mechanism
and free the auth_uri struct directly in proxy_free_common().
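
A reference-counting scheme of this kind can be sketched as follows. The struct and function names are hypothetical and much simpler than the real uri_auth code; the sketch is also not thread-safe, which is assumed acceptable for objects that are only shared at configuration time.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_auth {
    int refcount;   /* number of proxies currently sharing this object */
};

/* Allocate a new object owned by one user. */
static struct sketch_auth *sketch_auth_new(void)
{
    struct sketch_auth *a = calloc(1, sizeof(*a));

    if (a)
        a->refcount = 1;
    return a;
}

/* Share an existing object with one more user. */
static struct sketch_auth *sketch_auth_take(struct sketch_auth *a)
{
    if (a)
        a->refcount++;
    return a;
}

/* Drop one reference; the last user frees the object.
 * Returns 1 if the object was freed, 0 otherwise. */
static int sketch_auth_drop(struct sketch_auth *a)
{
    if (!a)
        return 0;
    assert(a->refcount > 0);
    if (--a->refcount > 0)
        return 0;
    free(a);
    return 1;
}
```

With this, each proxy simply drops its reference when it is freed, and nobody has to work out which objects are shared.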
2024-11-14 15:03:38 +01:00
Aurelien DARRAGON
e1ec37ea51 MINOR: uri_auth: add stats_uri_auth_free helper
Let's now leverage stats_uri_auth_free() helper to free uri_auth struct
instead of manually performing the cleanup, which is error-prone.
2024-11-14 15:03:33 +01:00
Aurelien DARRAGON
350a3ab052 BUG/MINOR: deinit: release uri_auth admin rules
When uri_auth admin rules were implemented in 474be415
("[MEDIUM] stats: add an admin level") no attempt was made to free the
list of allocated rules, which makes valgrind unhappy upon deinit when
"stats admin" is used in the config.

To fix the issue, let's cleanup the admin rules list upon deinit where
uri_auth freeing is already handled.

While this could be backported to every stable versions, given how minor
this is and has no impact on the dying process, it is probably not worth
the effort.
2024-11-14 15:03:27 +01:00
Willy Tarreau
df93cf72b9 MINOR: mux-h2: count glitches when they're reported
The h2c_report_glitch() function is now replaced with a macro to support
enumerating counters for each individual glitch line. For now this adds
43 such counters. The macro supports an optional description, though that
is not being used for now. It gives outputs like this (note that the last
one was purposely instrumented to pass a description):

   > debug dev counters glt all
   0          GLT mux_h2.c:5976 h2c_dec_hdrs()
   0          GLT mux_h2.c:5960 h2c_dec_hdrs()
   (...)
   0          GLT mux_h2.c:2207 h2c_frt_recv_preface()
   0          GLT mux_h2.c:1954 h2c_frt_stream_new(): new stream too early

As a reminder, this requires building with -DDEBUG_GLITCHES=1.
2024-11-14 09:01:57 +01:00
Willy Tarreau
502790ed7e MINOR: debug: add a new counter type for glitches
COUNT_GLITCH() will implement an unconditional counter on its declaration
line when DEBUG_GLITCHES is set, and do nothing otherwise. The output will
be reported as "GLT" and can be filtered as "glt" on the CLI. The purpose
is to help figure what's happening if some glitches counters start going
through the roof. The macro supports an optional string argument to
describe the cause of the glitch (e.g. "truncated header"), which is then
reported in the dump.

For now this is conditioned by DEBUG_GLITCHES but if it turns out to be
light enough, maybe we'll keep it enabled full time. In this case it
might have to be moved away from debug dev, or at least documented (or
done as debug counters maybe so that dev can remain undocumented and
updatable within a branch?).
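
The "one counter per declaration line" idea can be sketched in C as below. This is not how HAProxy implements COUNT_GLITCH(): the macro name, the linked-list registry and the SKETCH_DEBUG_GLITCHES switch are all hypothetical, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

#ifndef SKETCH_DEBUG_GLITCHES
#define SKETCH_DEBUG_GLITCHES 1
#endif

struct sketch_glitch {
    const char *file;
    int line;
    const char *desc;            /* optional cause, may be "" */
    unsigned long count;
    struct sketch_glitch *next;  /* registry of counters hit at least once */
};

static struct sketch_glitch *sketch_glitch_list;

/* Bump a counter and register it on first use so it can be dumped later. */
static void sketch_glitch_hit(struct sketch_glitch *c)
{
    assert(c != NULL);
    if (c->count++ == 0) {
        c->next = sketch_glitch_list;
        sketch_glitch_list = c;
    }
}

#if SKETCH_DEBUG_GLITCHES
/* Each expansion owns a static counter tied to its file and line. */
#define SKETCH_COUNT_GLITCH(desc)                                      \
    do {                                                               \
        static struct sketch_glitch c_ =                               \
            { __FILE__, __LINE__, desc, 0, NULL };                     \
        sketch_glitch_hit(&c_);                                        \
    } while (0)
#else
#define SKETCH_COUNT_GLITCH(desc) do { } while (0)
#endif

/* Sum of all registered counters. */
static unsigned long sketch_glitch_total(void)
{
    unsigned long total = 0;
    const struct sketch_glitch *c;

    for (c = sketch_glitch_list; c; c = c->next)
        total += c->count;
    return total;
}

/* Example call site: a header shorter than 4 bytes counts as a glitch. */
static void sketch_parse_header(int len)
{
    if (len < 4)
        SKETCH_COUNT_GLITCH("truncated header");
}
```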
2024-11-14 08:49:38 +01:00
Willy Tarreau
e119095290 MINOR: debug: explicitly permit the counter condition to be empty
In order to count new event types, we'll need to support empty conditions
so that we don't have to fake if (1) that would pollute the output. This
change checks if #cond is an empty string before concatenating it with
the optional var args, and avoids dumping the colon on the dump if the
whole description is empty.
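
The check on an empty stringified condition can be sketched as follows. The macro and function names are hypothetical; the only language feature relied upon is that stringifying an empty macro argument yields "" (C99 and later).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<where>[: <cond>[ <note>]]", omitting the colon entirely when both
 * the condition and the note are empty. */
static int sketch_describe(char *out, size_t size, const char *where,
                           const char *cond, const char *note)
{
    int has_cond, has_note;

    assert(out != NULL && size > 0 && where != NULL);
    has_cond = cond != NULL && strlen(cond) > 0;
    has_note = note != NULL && strlen(note) > 0;

    if (!has_cond && !has_note)
        return snprintf(out, size, "%s", where);

    return snprintf(out, size, "%s: %s%s%s", where,
                    has_cond ? cond : "",
                    (has_cond && has_note) ? " " : "",
                    has_note ? note : "");
}

/* #cond turns the condition into a string; an empty argument gives "". */
#define SKETCH_DESCRIBE(out, size, where, cond, note) \
    sketch_describe(out, size, where, #cond, note)
```

A call such as SKETCH_DESCRIBE(buf, sizeof(buf), "mux.c:10 f()", , NULL) therefore needs no fake "if (1)" and leaves no dangling colon in the output.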
2024-11-14 08:47:00 +01:00
Christopher Faulet
8f28dbeea9 BUG/MEDIUM: resolvers: Insert a non-executed resulution in front of the wait list
When a resolver is woken up to process DNS resolutions, it is possible to
trigger an infinite loop on the resolver's wait list because delayed
resolutions are always reinserted at the end of this list. This leads the
watchdog to kill the process. By re-inserting them in front of the list,
that fixes the bug.

When a resolver tries to send the queries for the resolutions in its wait
list, it may be unable to proceed for a resolution. This may happen because
the resolution must be skipped (no hostname to resolve, a resolution already
in-progress) or when an error occurred. In that case, the resolution is
re-inserted in the resolver's wait list to be retried later, on a next wakeup.

However, the resolution is inserted at the end of the wait list. So it is
immediately reevaluated, in the same execution loop, instead of being
delayed. Most of the time, it is not an issue because the resolution is
considered as not expired on the second run. But it is a problem when the
internal time wraps and is equal to 0. In that case, the resolution
expiration date is badly computed and it is always considered as expired. If
two or more resolutions are in that state, the resolver loops forever on
its wait list, until the process is killed by the watchdog.

So we can argue that the way the resolution expiration date is computed must
be fixed. And it would be true in a perfect world. However, the resolvers
code is so crappy that it is hard to be sure to not introduce regressions. It
is far easier to re-insert delayed resolutions in front of the wait
list. This fixes the issue and at worst, these resolutions will be evaluated
one time too many on the next wakeup and only if now_ms was equal to 0 on
the prior wakeup.

This patch should be backported to all stable versions. On 2.2, LIST_ADD()
must be used instead of LIST_INSERT()
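
The looping behaviour can be reproduced with a toy circular list. The sketch below is not the resolvers code: all names are hypothetical, every entry is assumed to be "delayed" on each visit (as happens when all of them look expired), and a visit cap stands in for the watchdog.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_node {
    struct sketch_node *prev, *next;
};

static void sketch_list_init(struct sketch_node *head)
{
    head->prev = head->next = head;
}

static void sketch_list_del(struct sketch_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

static void sketch_list_add_head(struct sketch_node *head, struct sketch_node *n)
{
    n->prev = head;
    n->next = head->next;
    head->next->prev = n;
    head->next = n;
}

static void sketch_list_add_tail(struct sketch_node *head, struct sketch_node *n)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* Walk the wait list once, re-inserting every visited entry either at the
 * tail or at the head. Returns the number of visits, capped at max_visits. */
static int sketch_process_wait_list(struct sketch_node *head, int at_tail,
                                    int max_visits)
{
    struct sketch_node *cur, *next;
    int visits = 0;

    assert(head != NULL);
    for (cur = head->next; cur != head && visits < max_visits; cur = next) {
        next = cur->next;        /* saved first, like a "safe" iterator */
        visits++;
        sketch_list_del(cur);
        if (at_tail)
            sketch_list_add_tail(head, cur);  /* lands ahead of the cursor */
        else
            sketch_list_add_head(head, cur);  /* lands behind the cursor */
    }
    return visits;
}
```

With two entries, the tail variant keeps chasing the entries it has just re-appended and only stops at the cap, while the head variant visits each entry exactly once.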
2024-11-13 10:53:27 +01:00
Christopher Faulet
72e529829b BUG/MEDIUM: stconn: Don't forward shut for SC in connecting state
In connecting state, shutdown must not be forwarded or scheduled because
otherwise this will prevent any connection retries. Indeed, if an EOS is
reported by the mux during the connection establishment, this should be
handled by the stream to eventually retry. If the write side is closed
first, this will not be possible because the stconn will be switched in DIS
state. If the shut is scheduled because pending data are blocked, the same
may happen, depending on the abort-on-close option.

This patch should slowly be backported as far as 2.4. But an observation
period is mandatory. On 2.4, the patch must be adapted to use the
stream-interface API.
2024-11-13 10:53:27 +01:00
Valentine Krasnobaeva
113745e6f0 BUG/MINOR: cli: don't show sockpairs in HAPROXY_CLI and HAPROXY_MASTER_CLI
Before this fix, HAPROXY_CLI and HAPROXY_MASTER_CLI contained, along with
CLI socket addresses, internal sockpairs, which are used only for the master
CLI (reload sockpair and sockpair shared with a worker process). These internal
sockpairs always need to be hidden.

At the moment, no client uses sockpair addresses for the stats listener or to
connect to the master CLI. So, let's simply not copy these internal sockpair
addresses of MASTER and GLOBAL proxy listeners.

As listeners with sockpairs are skipped and they can be present in the
listeners list in any order, let's add a semicolon separator between addresses
only when there is already some string saved in the trash and we are sure that
we are adding a new address to it. Otherwise, we could have such weird
output:

	HAPROXY_MASTER_CLI=unix@/tmp/mcli.sock;;

This fix needs to be backported to all stable versions.
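
The separator logic can be sketched in C as follows. This is an illustration only: sketch_join_addrs() is a hypothetical function working on plain strings, whereas the real code walks listeners and fills the trash buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join the addresses with ';', skipping internal "sockpair@" entries.
 * The separator is written only when something was already emitted, so
 * skipped entries never leave stray ';' behind. Returns the output length. */
static size_t sketch_join_addrs(char *out, size_t size,
                                const char *const *addrs, size_t count)
{
    size_t len = 0, i;

    assert(out != NULL && size > 0);
    out[0] = '\0';

    for (i = 0; i < count; i++) {
        int ret;

        if (strncmp(addrs[i], "sockpair@", 9) == 0)
            continue;   /* internal socket pair: keep it hidden */

        ret = snprintf(out + len, size - len, "%s%s",
                       len ? ";" : "", addrs[i]);
        if (ret < 0 || (size_t)ret >= size - len) {
            out[len] = '\0';   /* does not fit: drop the partial entry */
            break;
        }
        len += (size_t)ret;
    }
    return len;
}
```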
2024-11-13 09:50:05 +01:00
Valentine Krasnobaeva
1f0cd91fe7 BUG/MINOR: startup: set HAPROXY_CFGFILES in read_cfg
load_cfg() is called only once before the first reading of the configuration
(we parse here only the global section). Then, before reading the rest of the
sections (second call of read_cfg()), we call clean_env(). As
HAPROXY_CFGFILES is set in load_cfg(), which is called only once, clean_env()
erases it. Thus, it's no longer shown in "show env" output.

To fix this, let's set HAPROXY_CFGFILES in read_cfg(). Like this in
master-worker mode it is set for master and for worker processes, as it was
before the refactoring.

This fix doesn't need to be backported as related to the latest master-worker
architecture change.
2024-11-13 09:50:05 +01:00
Valentine Krasnobaeva
d5d41dee3d MINOR: startup: replace HAPROXY_LOAD_SUCCESS with global load_status
After the master-worker refactoring, the master performs a re-exec only once
until it receives the "reload" command or the USR2 signal. There is no longer a
second master re-exec to free unused memory. Thus, there is no longer a need to
export the environment variable HAPROXY_LOAD_SUCCESS with the worker process
load status. This status can be simply saved in a global variable load_status.
2024-11-13 09:50:05 +01:00
Miroslav Zagorac
aadda34fd6 BUILD: ot: use a cebtree instead of a list for variable names
In order for the function flt_ot_vars_scope_dump() to work, it is
necessary to take into account the changes made by the commits 47ec7c681
("OPTIM: vars: use a cebtree instead of a list for variable names") and
5d350d1e5 ("OPTIM: vars: use multiple name heads in the vars struct").

The function is only used if the OT_DEBUG=1 option is set when compiling
HAProxy.
2024-11-12 11:07:13 +01:00
William Lallemand
581c8a27d9 MEDIUM: mworker: depreciate the 'program' section
The program section is unreliable and should not be used, more reliable
alternatives exist outside HAProxy. Let's deprecate the section so we
could remove it completely in 3.3.
2024-11-08 17:06:58 +01:00
Willy Tarreau
0434e87348 [RELEASE] Released version 3.1-dev12
Released version 3.1-dev12 with the following main changes :
    - MINOR: startup: tune.renice.{startup,runtime} allow to change priorities
    - BUG/MEDIUM: promex: Fix dump of extra counters
    - BUILD: import/mt_list: support building with TCC
    - BUILD: compiler: define __builtin_prefetch() for tcc
    - CLEANUP: quic: Remove the useless directive "tune.quic.backend.max-idle-timeou"
    - DOC: config: document connection error 44 (reverse connect failure)
    - CLEANUP: connection: properly name the CO_ER_SSL_FATAL enum entry
    - DEBUG: cli: support closing "hard" using close() in addition to fd_delete()
    - MINOR: connection: add more connection error codes to cover common errno
    - MINOR: rawsock: set connection error codes when returning from recv/send/splice
    - MINOR: connection: add new sample fetch functions fc_err_name and bc_err_name
    - MINOR: quic: Help diagnosing malformed probing packets
    - BUG/MINOR: quic: fix malformed probing packet building
    - MINOR: listener: Remove useless checks on the receiver protocol existence
    - MINOR: http-conv: Remove unreachable goto statement in sample_conv_q_preferred
    - MINOR: http: don't %-encode the payload when not relevant
    - MINOR: quic: simplify qc_parse_pkt_frms() return path
    - MINOR: quic: use dynamically allocated frame on parsing
    - MINOR: quic: extend return value of CRYPTO parsing
    - BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO
    - BUG/MINOR: mworker: do 'program' postparser checks in read_cfg_in_discovery_mode
    - EXAMPLES: add "traces.cfg" with traces examples
    - BUG/MEDIUM: quic: do not consider ACK on released stream as error
    - CLEANUP: stats: fix misleading comment on top of stat_idx_info
    - MINOR: wdt: move the local timers to a struct
    - MINOR: debug: add a function to dump a stuck thread
    - DEBUG: wdt: better detect apparently locked up threads and warn about them
    - DEBUG: cli: make it possible for "debug dev loop" to trigger warnings
    - DEBUG: wdt: make the blocked traffic warning delay configurable
    - DEBUG: wdt: add a stats counter "BlockedTrafficWarnings" in show info
    - DEBUG: wdt: set the default blocked task delay to 100 ms
    - MINOR: debug: move the "recover now" warn message after the optional notes
    - MINOR: event_hdl: add event_hdl_sub_list_empty() helper func
    - MINOR: pattern: add _pat_ref_new() helper func
    - OPTIM: pattern: use malloc() to initialize new pat_ref struct
    - MINOR: pattern: add pat_ref_free() helper func
    - CLEANUP: guid: remove global tree export
    - BUG/MINOR: guid/server: ensure thread-safety on GUID insert/delete
    - DOC: management: explain the change of behavior of the program section
    - BUG/MEDIUM: mux-h2: try to wait for the peer to read the GOAWAY
    - BUG/MEDIUM: quic: prevent crash due to CRYPTO parsing error
2024-11-08 15:46:54 +01:00
Amaury Denoyelle
2975e8805d BUG/MEDIUM: quic: prevent crash due to CRYPTO parsing error
A packet which contains several split and out-of-order CRYPTO frames
may be parsed multiple times to ensure it can be handled via ncbuf. Only
3 iterations can be performed to prevent excessive CPU usage.

There is a risk of crash if packet parsing is interrupted after maximum
iterations is reached, or no progress can be made on the ncbuf. This is
because <frm> may be dangling after list_for_each_entry_safe()

The crash occurs on qc_frm_free() invocation, on error path of
qc_parse_pkt_frms(). To fix it, always reset frm to NULL after
list_for_each_entry_safe() to ensure it is not dangling.

This should fix new report on github issue #2776. This regression has
been triggered by the following patch :
  1767196d5b2d8d1e557f7b3911a940000166ecda
  BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

As such, it must be backported up to 2.6, after the above patch.
2024-11-08 15:19:57 +01:00
Willy Tarreau
3ed9361688 BUG/MEDIUM: mux-h2: try to wait for the peer to read the GOAWAY
When timeout http-keep-alive is very short (e.g. 10ms), it's possible
sometimes for a client to face truncated responses due to an early
close that happens while the system is still pushing the last data,
colliding with the client's WINDOW_UPDATEs that trigger RSTs.

Here we're trying to do better: first we send a GOAWAY on timeout, then
we wait up to clientfin/client timeout for the peer to react so that we
don't immediately close. This is sufficient to avoid truncation as soon
as the timeout is more than a few hundred ms.

It's not certain it should be backported, because it's a bit sensitive
and might possibly fall into certain edge cases.
2024-11-08 14:31:07 +01:00
William Lallemand
75b302d123 DOC: management: explain the change of behavior of the program section
The program section does not work exactly the same way with the
master-worker rework of HAProxy 3.1. Let's explain it in the program
documentation.
2024-11-08 12:00:26 +01:00
Amaury Denoyelle
8e0e7d9d1a BUG/MINOR: guid/server: ensure thread-safety on GUID insert/delete
Since 3.0, it is possible to assign a GUID to proxies, listeners and
servers. These objects are stored in a global tree guid_tree.

Proxies and listeners are static. However, servers may be added or
deleted at runtime, which implies that guid_tree must be protected. Fix
this by declaring a read-write lock to protect tree access.

For now, only guid_insert() and guid_remove() are protected using a
write lock. Outside of these, GUID tree is not accessed at runtime. If
server CLI commands are extended to support GUID as server identifier,
lookup operation should be extended with a read lock protection.

Note that during stat-file preloading, GUID tree is accessed for lookup.
However, as it is performed on startup which is single threaded, there
is no need for lock here. A BUG_ON() has been added to ensure this
precondition remains true.

This bug could cause a segfault when using dynamic servers with GUID.
However, it has never been reproduced so far.

This must be backported up to 3.0. To avoid a conflict issue, the
previous cleanup patch can be merged before it.
2024-11-07 18:17:03 +01:00
Amaury Denoyelle
b70880cdc9 CLEANUP: guid: remove global tree export
guid_tree is not directly used outside of functions provided by the guid
module. Remove its export from the include file.
2024-11-07 17:20:00 +01:00
Aurelien DARRAGON
aba3ed62ae MINOR: pattern: add pat_ref_free() helper func
For now, pat_ref struct are never freed, except during init in case of
error. The freeing is done directly in the init functions because we
don't have a helper for that.

Not having a helper func to properly free pat_ref struct doesn't encourage
us to free unused pat_ref structs, plus it is error-prone if new dynamic
members are added to pat_ref struct in the future.

To fix that, let's add a pat_ref_free() helper func and use it where
relevant (which means only under pat_ref init function for now..)
2024-11-07 11:36:13 +01:00
Aurelien DARRAGON
e8a0dbff93 OPTIM: pattern: use malloc() to initialize new pat_ref struct
As mentioned in the previous commit, in _pat_ref_new(), it was not
strictly needed to explicitly assign all struct members to 0 since
the struct was allocated with calloc() which does the zeroing for us.

However, it was verified that we already initialize all fields explicitly,
thus there is no reason to keep using calloc() instead of malloc(). In
fact using malloc() is less expensive, so let's use that instead now.
2024-11-07 11:36:08 +01:00
Aurelien DARRAGON
d1397401f0 MINOR: pattern: add _pat_ref_new() helper func
pat_ref_newid() and pat_ref_new() are two functions to create and
initialize a pat_ref struct based on input parameters.

Both functions perform the same generic allocation and initialization
for pat_ref struct, thus there is quite a lot of code redundancy.

This is error-prone if the pat_ref init sequence has to be updated at
some point.

To reduce maintenance costs, let's add a _pat_ref_new() helper func that
takes care of the generic allocation and base initialization for pat_ref
struct.
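
The factoring can be sketched as below: two public constructors delegating to one private allocator that initializes every field explicitly. The struct and its fields are hypothetical and far smaller than the real pat_ref.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_ref {
    char *name;
    int unique_id;        /* -1 when no ID was assigned */
    unsigned int flags;
};

/* Common allocation and base initialization shared by both constructors.
 * Every field is set explicitly, so the memory needs no prior zeroing. */
static struct sketch_ref *sketch_ref_alloc(const char *name, unsigned int flags)
{
    struct sketch_ref *ref;
    size_t len;

    assert(name != NULL);
    ref = malloc(sizeof(*ref));
    if (!ref)
        return NULL;

    len = strlen(name) + 1;
    ref->name = malloc(len);
    if (!ref->name) {
        free(ref);
        return NULL;
    }
    memcpy(ref->name, name, len);
    ref->unique_id = -1;
    ref->flags = flags;
    return ref;
}

/* Constructor for a reference identified by name only. */
static struct sketch_ref *sketch_ref_new(const char *name, unsigned int flags)
{
    return sketch_ref_alloc(name, flags);
}

/* Constructor for a reference that also carries a numeric ID. */
static struct sketch_ref *sketch_ref_newid(int id, const char *name,
                                           unsigned int flags)
{
    struct sketch_ref *ref = sketch_ref_alloc(name, flags);

    if (ref)
        ref->unique_id = id;
    return ref;
}

/* Single place that knows how to release every dynamic member. */
static void sketch_ref_free(struct sketch_ref *ref)
{
    if (!ref)
        return;
    free(ref->name);
    free(ref);
}
```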
2024-11-07 11:36:01 +01:00
Aurelien DARRAGON
79a346aa28 MINOR: event_hdl: add event_hdl_sub_list_empty() helper func
event_hdl_sub_list_empty() may be used to know if the subscription list
passed as argument is empty or not (ie: if there currently are any
subscribers or not). It can be useful to know if the subscription is empty
in order to avoid unnecessary preparation work and skip event publishing to
save CPU time if we already know that no one is interested in tracking the
changes for a given subscription list.
2024-11-07 11:35:55 +01:00
Willy Tarreau
5dcf2012fc MINOR: debug: move the "recover now" warn message after the optional notes
At the end of the too long processing warning added by commit 0950778b3a
("MINOR: debug: add a function to dump a stuck thread"), there can be some
optional notes about lua and memory trimming. However it's a bit awkward
that they appear after the "trying to recover now" message. Let's just move
that message after the notes.
2024-11-07 07:56:13 +01:00
Willy Tarreau
5f4fe20116 DEBUG: wdt: set the default blocked task delay to 100 ms
The warn-blocked-traffic-after can be significantly lowered. In any
case, in order to be usable it must be well below the limit to have a
chance to emit exploitable traces before the watchdog finally fires.
Even configured at 1ms it looks very difficult to trigger it on a
laptop doing SSL and compression, so applying a 100-fold factor to
cover for large configs and small machines sounds sane for 3.1. In any
case, even at 100ms, the service degradation becomes quite visible.
2024-11-06 18:35:42 +01:00
Willy Tarreau
84dd05e7d8 DEBUG: wdt: add a stats counter "BlockedTrafficWarnings" in show info
Every time a warning is issued about traffic being blocked, let's
increment a global counter so that we can check for this situation
in "show info".
2024-11-06 18:35:42 +01:00
Willy Tarreau
6127e5a4e9 DEBUG: wdt: make the blocked traffic warning delay configurable
The new global "warn-blocked-traffic-after" allows one to configure
after how much time a warning should be emitted when traffic is blocked.
2024-11-06 18:35:42 +01:00
Willy Tarreau
7337c42224 DEBUG: cli: make it possible for "debug dev loop" to trigger warnings
A new argument "warn" allows to force the emission of a warning while
stuck in the loop by making the internal state inconsistent.
2024-11-06 18:35:42 +01:00
Willy Tarreau
148eb5875f DEBUG: wdt: better detect apparently locked up threads and warn about them
In order to help users detect when threads are behaving abnormally, let's
try to emit a warning when one is no longer making any progress. This will
allow to catch faulty situations more accurately, instead of occasionally
triggering just after the long task. It will also let users know that there
is something wrong with their configuration, and inspect the call trace to
figure whether they're using excessively long rules or Lua for example (the
usual warnings about lua-load vs lua-load-per-thread are still reported).

The warning will only be emitted for threads not yet marked as stuck so
as not to interfere with panic dumps and avoid sending a warning just
before a panic. A tainted flag is set when this happens however (0x2000).
2024-11-06 18:35:42 +01:00
Willy Tarreau
0950778b3a MINOR: debug: add a function to dump a stuck thread
There's currently no way to just emit a warning informing that a thread
is stuck without crashing. This is a problem because sometimes users
would benefit from this info to clean up their configuration (e.g. abuse
of map_regm, lua-load etc).

This commit adds a new function ha_stuck_warning() that will emit a
warning indicating that the designated thread has been stuck for XX
milliseconds, with a number of streams blocked, and will make that
thread dump its own state. The warning will then be sent to stderr,
along with some reminders about the impacts of such situations to
encourage users to fix their configuration.

In order not to disrupt operations, a local 4kB buffer is allocated
in the stack. This should be quite sufficient.

For now the function is not used.
2024-11-06 18:35:42 +01:00
Willy Tarreau
3f4d646849 MINOR: wdt: move the local timers to a struct
Better have a local struct for per-thread timers, as this will allow us
to store extra info that are useful to improve accurate reporting.
2024-11-06 18:35:42 +01:00
Willy Tarreau
1f34a0fd27 CLEANUP: stats: fix misleading comment on top of stat_idx_info
The comment asks to update the "metrics_info" array, which does not
exist, instead it's called stat_cols_info[] and is in stats.c. Let's
mention all that to save time searching for the needed info.

While no version seems to have ever known that "metrics_info", it's not
needed to backport this as it's only a comment.
2024-11-06 18:35:42 +01:00
Amaury Denoyelle
3b851a326b BUG/MEDIUM: quic: do not consider ACK on released stream as error
When an ACK is received by haproxy, a lookup is performed to retrieve
the related emitted frames. For STREAM type frames, a lookup is
performed under quic_conn stream_desc tree. Indeed, the corresponding
stream instance could be already released if multiple ACK were received
referring to the same stream offset, which can happen notably if
retransmission occurred.

qc_handle_newly_acked_frm() implements this logic. If the case with an
already released stream is encountered, an error is returned. In the end,
this error is propagated via qc_parse_pkt_frms() into
qc_treat_rx_pkts(), despite being in fact a perfectly valid case. Fix
this by adjusting ACK handling function to return a success value for
the particular case of released stream instead.

The impact of this bug is unknown, but it can have several consequences.
* if the packet with the ACK contains other frames after it, their
  content will be skipped
* the packet won't be acknowledged by haproxy, even if it contains other
  frames and is ack-eliciting. This may cause unneeded retransmission by
  the client.
* RTT sampling information related to this ACK is ignored by haproxy

Finally, it also caused the increment of the quic_conn counter
dropped_parsing (droppars in "show quic" output) which should be
reserved only for real error cases.

This regression is present since the following patch :
  e7578084b0536e3e5988be7f09091c85beb8fa9d
  MINOR: quic: implement dedicated type for out-of-order stream ACK

Before, qc_handle_newly_acked_frm() return type was always ignored. As
such, no backport is needed.
2024-11-06 17:37:44 +01:00
William Lallemand
66bff034d7 EXAMPLES: add "traces.cfg" with traces examples
Add an example on how to use the traces section. The example uses the
3.1-dev8 syntax and enables all traces on stderr.
2024-11-06 17:32:32 +01:00
Valentine Krasnobaeva
e9928c306c BUG/MINOR: mworker: do 'program' postparser checks in read_cfg_in_discovery_mode
cfg_program_postparser() contains 2 parts:

	- check the combination of MODE_MWORKER and "program" section. if
	"program" section was parsed, MODE_MWORKER is mandatory;

	- check "command" keyword, which is mandatory for this section as
	well.

It is more appropriate now, after the master-worker refactoring, to do the
first part in read_cfg_in_discovery_mode(), where we already check the
combination of MODE_MWORKER and -S option.

We need to do the second part just below, in read_cfg_in_discovery_mode() as
well, because only the master process now parses the program section, and
programs are forked before running postparser functions in step_init_2.
Otherwise, mworker_ext_launch_all() will emit a log message saying that the
program is started, while actually nothing has been launched, if the 'command'
keyword is absent.

This does not need to be backported, as it is related to the master-worker
refactoring.
2024-11-06 15:49:44 +01:00
Amaury Denoyelle
1767196d5b BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO
A ClientHello may be split across several different CRYPTO frames,
then mixed in a single QUIC packet. This is used notably by clients such
as Chrome to render the first Initial packet opaque to middleboxes.

Each packet frame is handled sequentially. Out-of-order CRYPTO frames
are buffered in a ncbuf, until gaps are filled and data is transferred
to the SSL stack. If CRYPTO frames are heavily split into small
fragments, buffering may fail as ncbuf does not support small gaps. This
causes the whole packet to be rejected and unacknowledged. It could be
solved if the client reemits its ClientHello after remixing its CRYPTO
frames.

This patch is written to improve CRYPTO frame parsing. Each CRYPTO
frame which cannot be buffered due to the ncbuf limitation is now stored
in a temporary list. Packet parsing is completed until all frames have
been handled. If the temporary list is not empty, reparsing is done on the
stored frames. With the newly buffered CRYPTO frames, the ncbuf insert
operation may succeed this time if the frame now covers a whole gap.
Reparsing will loop until either no progress can be made or it has been
done at least 3 times, to limit CPU usage.
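
The reparsing loop described above can be sketched as follows. This is only a
toy model, not the haproxy code: the buffer is a plain array of booleans,
MIN_GAP stands in for the ncbuf small-gap limitation, and every name
(try_insert, parse_frames, MAX_REPARSE, ...) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SZ      64  /* size of the toy reassembly buffer */
#define MIN_GAP      4  /* assumed smallest hole the buffer can represent */
#define MAX_REPARSE  3  /* upper bound on reparsing rounds */
#define MAX_FRAMES  16  /* capacity of the temporary list */

struct frame { size_t off, len; };

/* Try to mark [off, off+len) as received. Fails, leaving <covered> untouched,
 * if the insert would leave a hole shorter than MIN_GAP before buffered data.
 */
static bool try_insert(bool covered[BUF_SZ], const struct frame *f)
{
	bool tmp[BUF_SZ];
	size_t i, gap = 0;

	if (f->off + f->len > BUF_SZ)
		return false;

	memcpy(tmp, covered, sizeof(tmp));
	for (i = f->off; i < f->off + f->len; i++)
		tmp[i] = true;

	for (i = 0; i < BUF_SZ; i++) {
		if (!tmp[i]) {
			gap++;
			continue;
		}
		if (gap > 0 && gap < MIN_GAP)
			return false; /* hole too small to be tracked */
		gap = 0;
	}
	memcpy(covered, tmp, sizeof(tmp));
	return true;
}

/* Handle all frames once, then retry the rejected ones. Returns the number
 * of frames which could still not be buffered.
 */
static size_t parse_frames(bool covered[BUF_SZ], const struct frame *frms, size_t nb)
{
	const struct frame *pending[MAX_FRAMES];
	size_t nb_pending = 0, i, round;

	if (nb > MAX_FRAMES)
		return nb; /* toy limitation: refuse oversized input */

	/* first pass: frames are handled sequentially */
	for (i = 0; i < nb; i++) {
		if (!try_insert(covered, &frms[i]))
			pending[nb_pending++] = &frms[i];
	}

	/* reparsing: stop when nothing is left, no progress is made, or
	 * the round limit is reached
	 */
	for (round = 0; nb_pending && round < MAX_REPARSE; round++) {
		size_t kept = 0;

		for (i = 0; i < nb_pending; i++) {
			if (!try_insert(covered, pending[i]))
				pending[kept++] = pending[i];
		}
		if (kept == nb_pending)
			break;
		nb_pending = kept;
	}
	return nb_pending;
}
```

With frames at offsets 0 (10 bytes), 12 (2 bytes) and 10 (2 bytes), the second
frame is rejected on the first pass because it would leave a 2-byte hole, then
accepted on the first reparsing round once the third frame has filled it.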

This patch should fix github issue #2776.

This should be backported up to 2.6, after a period of observation. Note
that it relies on the following refactor patches :
  MINOR: quic: extend return value of CRYPTO parsing
  MINOR: quic: use dynamically allocated frame on parsing
  MINOR: quic: simplify qc_parse_pkt_frms() return path
2024-11-06 14:29:14 +01:00
Amaury Denoyelle
d65e782c8c MINOR: quic: extend return value of CRYPTO parsing
qc_handle_crypto_frm() is the function used to handle a newly received
CRYPTO frame. Change its API to use a newly dedicated return type. This
allows to report if the frame was properly handled, ignored if already
parsed previously or rejected after a fatal error.

This commit does not have any functional changes. However, it allows to
simplify qc_handle_crypto_frm() API by removing <fast_retrans> as output
parameter. Also, this patch will be necessary to support multiple
iteration of packet parsing for CRYPTO frames.
2024-11-06 14:28:14 +01:00
Amaury Denoyelle
190fc97606 MINOR: quic: use dynamically allocated frame on parsing
qc_parse_pkt_frms() is the function responsible to parse a received QUIC
packet. Payload is decoded and split into individual frames which are
then handled individually. Previously, the frame was allocated locally on
the stack. Change this to work on a dynamically allocated frame.

This commit does not bring any functional changes. However, it will be
useful to extend packet parsing. In particular, it will be necessary to
save some frames during parsing to reparse them after the others.
2024-11-06 14:28:14 +01:00
Amaury Denoyelle
498a99a849 MINOR: quic: simplify qc_parse_pkt_frms() return path
Change qc_parse_pkt_frms() return path for normal and error cases. Most
notably, it allows to remove local variable ret as now return value is
hardcoded on normal and err label. This also allows to define a
different trace for error leaving code.
2024-11-06 14:28:14 +01:00
Aurelien DARRAGON
24dd7154a6 MINOR: http: don't %-encode the payload when not relevant
As reported by Pierre Maoui in GH #2477, it's not possible to render
control chars from variables or expressions verbatim in the payload part
of http-return statements. That's a problem because this part should not
require to be encoded at all (we could even imagine building favicons on
the fly for example).

In fact it is the LOG_OPT_HTTP option when passed as default options on
parse_logformat_string() which tells the log encoder that the payload
should be http-encoded using lf_chunk() instead of being printed using the
per-type encoder.

This option was set when parsing logformat expressions for lf-string
expression under http-return statements, as well as logformat expressions
for set-map action. While it is true that those actions may only be
used under http context, the LOG_OPT_HTTP logformat option is not relevant
there, because the payload is expected to be used without being encoded.

So let's simply get rid of this option when parsing logformat expressions
for set-map action key/value and lf-string from http-request return
action, and add a note next to LOG_OPT_HTTP option to indicate that it is
used to tell the log encoder that the payload should be HTTP-encoded.

Thanks to Pierre for having reported the issue and Willy for the
analysis and patch proposal.
2024-11-06 10:21:15 +01:00
Christopher Faulet
97d3096040 MINOR: http-conv: Remove unreachable goto statement in sample_conv_q_preferred
This was reported by Coverity. In sample_conv_q_preferred() function, a goto
statement after a "while(1)" loop is unreachable. Instead of just removing
it, the same goto statement in the loop is replaced by a break. It is safer
this way, in case the loop changes in the future.

This patch should fix the issue #2683.
2024-11-06 10:06:52 +01:00
Christopher Faulet
1cc9340afd MINOR: listener: Remove useless checks on the receiver protocol existence
The receiver protocol is always set when a listener is created or cloned. At
least for now. And there is no check on it at many places, except in
listener_accept() function. So, let's remove remaining useless checks. That
will avoid false Coverity reports in future.

This patch should fix the issue #2631.
2024-11-06 09:35:01 +01:00
Frederic Lecaille
217e467e89 BUG/MINOR: quic: fix malformed probing packet building
This bug arrived with this commit:

   cdfceb10a MINOR: quic: refactor qc_prep_pkts() loop

which prevents haproxy from sending PING only packets/datagrams (some
packets/datagrams with only PING frame as ack-eliciting frames inside).
Such packets/datagrams are useful in rare cases during retransmissions
when one wants to probe the peer without exceeding the anti-amplification
limit.

Modify the condition passed to qc_build_pkt() to add padding to the current
datagram. One does not want to do that when probing the peer without ack-eliciting
frames passed as <frms> parameter. Indeed qc_build_pkt() calls qc_do_build_pkt()
which supports this case: if <probe> is true (probing required), qc_do_build_pkt()
handles the case where some padding must be added to a PING only packet/datagram.
This is the case when probing with an empty <frms> frame list of ack-eliciting
frames without exceeding the anti-amplification limit from qc_dgrams_retransmit().

Add some comments to qc_build_pkt() and qc_do_build_pkt() to clarify this
as this code is easy to break!

Thank you to @Tristan971 for having reported this issue in GH #2709.

Must be backported to 3.0.
2024-11-05 20:17:35 +01:00
Frederic Lecaille
444a19ea38 MINOR: quic: Help diagnosing malformed probing packets
Add a BUG_ON() to detect some malformed packets which are supposed to probe the
peer without being ack-eliciting: the peer would not acknowledge such packets.
2024-11-05 20:17:35 +01:00
Willy Tarreau
601b34fe7b MINOR: connection: add new sample fetch functions fc_err_name and bc_err_name
These functions return a symbolic error code such as ECONNRESET to keep
logs compact while making them human-readable. It's a good alternative
to the numeric code in that it's more expressive, and a good one to the
full message since it's shorter and more precise (some codes even match
errno names).

The doc was updated so that the symbolic names appear in the table. It
could be useful to backport this feature to help with troubleshooting
some issues, though backporting the doc might possibly be more annoying
in case users have local patches already, so maybe the table update does
not need to be backported in this case.
2024-11-05 18:57:43 +01:00
Willy Tarreau
822d82caf4 MINOR: rawsock: set connection error codes when returning from recv/send/splice
For a long time the errno values returned by recv/send/splice() were not
translated to connection error codes. There are not that many eligible
and having them would help a lot when debugging some complex issues where
logs disagree with network traces. Let's add them now.
2024-11-05 18:57:43 +01:00
Willy Tarreau
00c383ff65 MINOR: connection: add more connection error codes to cover common errno
While we get reports of connection setup errors in fc_err/bc_err, we
don't have the equivalent for the recv/send/splice syscalls. Let's
add provisions for new codes that cover the common errno values that
recv/send/splice can return, i.e. ECONNREFUSED, ENOMEM, EBADF, EFAULT,
EINVAL, ENOTCONN, ENOTSOCK, ENOBUFS, EPIPE. We also add a special case
for when the poller reported the error itself. It's worth noting that
EBADF/EFAULT/EINVAL will generally indicate serious bugs in the code
and should not be reported.
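
As an illustration only, the translation from errno to a connection error
code could look like the sketch below. The enum and its SK_ER_* names are
hypothetical and do not reflect the actual CO_ER_* entries; the grouping of
EBADF/EFAULT/EINVAL under a single "bug" code is also an assumption.

```c
#include <errno.h>

/* Hypothetical error codes, for illustration only */
enum sk_conn_err {
	SK_ER_NONE = 0,
	SK_ER_CONNREFUSED,
	SK_ER_NOMEM,
	SK_ER_NOTCONN,
	SK_ER_NOTSOCK,
	SK_ER_NOBUFS,
	SK_ER_PIPE,
	SK_ER_BUG,    /* EBADF/EFAULT/EINVAL: most likely a bug in the caller */
	SK_ER_OTHER,  /* any errno value not covered above */
};

/* Translate the errno value left by recv()/send()/splice() */
static enum sk_conn_err sk_errno_to_conn_err(int err)
{
	switch (err) {
	case 0:            return SK_ER_NONE;
	case ECONNREFUSED: return SK_ER_CONNREFUSED;
	case ENOMEM:       return SK_ER_NOMEM;
	case ENOTCONN:     return SK_ER_NOTCONN;
	case ENOTSOCK:     return SK_ER_NOTSOCK;
	case ENOBUFS:      return SK_ER_NOBUFS;
	case EPIPE:        return SK_ER_PIPE;
	case EBADF:
	case EFAULT:
	case EINVAL:       return SK_ER_BUG;
	default:           return SK_ER_OTHER;
	}
}
```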

The only thing is that it's quite hard to forcefully (and reliably)
trigger these errors in automated tests as the timing is critical.
Using iptables to manually reset established connections in the
middle of large transfers at least permits to see some ECONNRESET
and/or EPIPE, but the other ones are harder to trigger.
2024-11-05 18:57:43 +01:00
Willy Tarreau
0f1d37a479 DEBUG: cli: support closing "hard" using close() in addition to fd_delete()
"debug dev close <fd>" currently closes that FD using fd_delete() after
checking that it's known from the fdtab. Sometimes we also want to just
perform a pure close() of FDs not in the fdtab (pollers, etc) in order
to provoke certain error cases. The optional "hard" argument to the
command will make it use a plain close() instead of fd_delete() and skip
the fd owner check. The main visible effect when closing a traffic socket
with it is that instead of dying from a double fd_delete() by seeing that
fd.owner is already 0, it will die during the next fd_insert() seeing that
fd.owner was not 0.
2024-11-05 18:57:43 +01:00
Willy Tarreau
393957908b CLEANUP: connection: properly name the CO_ER_SSL_FATAL enum entry
It was the only one prefixed with "CO_ERR_", making it harder to batch
process and to look up. It was added in 2.5 by commit 61944f7a73 ("MINOR:
ssl: Set connection error code in case of SSL read or write fatal failure")
so it can be backported as far as 2.6 if needed to help integrate other
patches.
2024-11-05 18:57:42 +01:00
Willy Tarreau
abed9e0426 DOC: config: document connection error 44 (reverse connect failure)
It was missing from commit ac1164de7c ("MINOR: connection: define error
for reverse connect"), and can be backported to 3.0 and 2.9.
2024-11-05 18:57:42 +01:00
Christopher Faulet
1f71ec85b0 CLEANUP: quic: Remove the useless directive "tune.quic.backend.max-idle-timeou"
First there is a typo in the directive name, then it is not documented and
finally, it is not used at all. The directive is only removed from the
keyword list. Parsing function is not updated.

This patch should fix the issue #2601.
2024-11-05 18:53:54 +01:00
Willy Tarreau
b300db55f6 BUILD: compiler: define __builtin_prefetch() for tcc
We're using a few occurrences of __builtin_prefetch() but tcc doesn't
know about it so let's give it a dummy definition. Now the code builds
and works again with tcc without thread support.
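
A minimal sketch of such a dummy definition, assuming tcc is detected through
its predefined __TINYC__ macro. The exact form used in the patch may differ,
and sum_with_prefetch() is a hypothetical caller added only to show that the
call sites stay unchanged.

```c
#include <stddef.h>

#if defined(__TINYC__) && !defined(__builtin_prefetch)
/* Prefetching is only a hint, so an empty statement keeps the behaviour */
#define __builtin_prefetch(...) do { } while (0)
#endif

/* Hypothetical caller: identical source for gcc, clang and tcc */
static long sum_with_prefetch(const long *v, size_t n)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i + 8 < n)
			__builtin_prefetch(&v[i + 8]);
		sum += v[i];
	}
	return sum;
}
```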
2024-11-05 15:43:17 +01:00
Willy Tarreau
033db091fc BUILD: import/mt_list: support building with TCC
TCC is often convenient to quickly test builds, run CI tests etc. It has
limited thread support (e.g. no thread-local stuff) but that is often
sufficient for testing. TCC lacks __atomic_exchange_n() but has the
exactly equivalent __atomic_exchange(), and doesn't have any barrier.
For this reason we force the atomic_exchange to use the stricter SEQ_CST
mem ordering that allows to ignore the barrier.

[wt: that's upstream commit ca8b865 ("BUILD: support building with TCC")]
2024-11-05 15:43:17 +01:00
Christopher Faulet
d1adfd9fe4 BUG/MEDIUM: promex: Fix dump of extra counters
When extra counters are dumped for an entity (frontend, backend, server or
listener), there is a filter on capabilities. Some extra counters are not
available for all entities and must be ignored. However, when this was
performed, the field number, used as an index to dump the metric value, was
still incremented when it should not have been, which leads to an overflow or
a stats mix-up.
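
The principle of the fix can be sketched with a toy model. It assumes that
the values array only holds one slot per counter applicable to the entity,
which is a simplification, and all names below are hypothetical.

```c
#include <stddef.h>

#define SK_CAP_FE 0x1  /* hypothetical capability bits */
#define SK_CAP_BE 0x2

struct sk_counter { const char *name; unsigned int cap; };
struct sk_metric  { const char *name; long value; };

/* Dump the counters applicable to <cap>. <values> holds one slot per
 * applicable counter only. Returns the number of dumped metrics.
 */
static size_t sk_dump_extra(const struct sk_counter *ctrs, size_t nb_ctrs,
                            unsigned int cap, const long *values,
                            struct sk_metric *out)
{
	size_t i, field = 0;

	for (i = 0; i < nb_ctrs; i++) {
		if (!(ctrs[i].cap & cap))
			continue; /* filtered out: <field> must not advance */

		out[field].name  = ctrs[i].name;
		out[field].value = values[field];
		field++;
	}
	return field;
}
```

Advancing the index on filtered counters as well would read past the end of
the values array or attach a value to the wrong metric name.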

This patch must be backported to 3.0.
2024-11-05 15:36:41 +01:00
William Lallemand
e75a019fba MINOR: startup: tune.renice.{startup,runtime} allow to change priorities
This commit introduces the tune.renice.startup and tune.renice.runtime
global keywords that allow changing the priority with setpriority().

tune.renice.startup is parsed and applied in the worker or the standalone
process for configuration parsing. If this keyword is used alone, the
nice value is changed to the previous one after configuration parsing.

tune.renice.runtime is applied after configuration parsing, so in the
worker or a standalone process. Combined with tune.renice.startup it
allows to have a different nice value during configuration parsing and
during runtime.

The feature was discussed in github issue #1919.

Example:

   global
        tune.renice.startup 15
        tune.renice.runtime 0
2024-11-04 17:48:58 +01:00
Willy Tarreau
2092199353 [RELEASE] Released version 3.1-dev11
Released version 3.1-dev11 with the following main changes :
    - BUG/MINOR: httpclient: return NULL when no proxy available during httpclient_new()
    - BUG/MEDIUM: mworker/httpclient: initialization skipped by accident in mworker mode
    - BUG/MINOR: resolvers/mworker: missing default resolvers in mworker mode
    - MINOR: mworker/ocsp: skip ocsp-update proxy init in master
    - BUG/MEDIUM: stconn: Wait iobuf is empty to shut SE down during a check send
    - MINOR: mux-h1: Show the SD iobuf in trace messages on stream send events
    - MINOR: mux-h1: Add a trace on shutdown when keep-alive is not possible
    - BUG/MINOR: http-ana: Don't report a server abort if response payload is invalid
    - BUG/MEDIUM: stconn: Check FF data of SC to perform a shutdown in sc_notify()
    - BUG/MAJOR: filters/htx: Add a flag to state the payload is altered by a filter
    - REGTESTS: Never reuse server connection in http-messaging/truncated.vtc
    - BUG/MINOR: quic: avoid leaking post handshake frames
    - MINOR: quic: send new tokens (NEW_TOKEN) even for 1RTT sessions
    - BUG/MEDIUM: quic: avoid freezing 0RTT connections
    - DOC: config: fix rfc7239 forwarded typo in desc
    - MINOR: http_ext: implement rfc7239_{nn,np} converters
    - CLEANUP: http_ext: remove useless BUG_ON() in http_handle_xot_header()
    - BUG/MINOR: sample: free err2 in smp_resolve_args for type ARGT_REG
    - MINOR: arg: add an argument type for identifier
    - BUILD: buffers: keep b_getblk_nc() and b_peek_varint() in buf.h
    - CLEANUP: buffers: simplify b_get_varint()
    - OPTIM: buffers: avoid a useless wrapping check for ofs == 0
    - MINOR: debug: make mark_tainted() return the previous value
    - MINOR: chunk: drop the global thread_dump_buffer
    - MINOR: debug: split ha_thread_dump() in two parts
    - MINOR: debug: slightly change the thread_dump_pointer signification
    - MINOR: debug: make ha_thread_dump_done() take the pointer to be used
    - MINOR: debug: replace ha_thread_dump() with its two components
    - MEDIUM: debug: on panic, make the target thread automatically allocate its buf
    - BUILD: mux-h2/traces: fix build on 32-bit due to size of the DATA frame
    - CI: prepare Coverity build for Ubuntu 24
    - CI: bump development builds explicitely to Ubuntu 24.04
    - CI: modernize macos builds to macos-15
    - BUG/MINOR: mworker: fix mworker-max-reloads parser
    - MINOR: mux-quic: simplify sending of empty STREAM FIN
    - BUG/MINOR: mux-quic: do not close STREAM with empty FIN if no data sent
    - CLEANUP: debug: make the BUG_ON() macros check the condition in the outer one
    - MEDIUM: debug: add match counters for BUG_ON/WARN_ON/CHECK_IF
    - MINOR: debug: add a new debug macro COUNT_IF()
    - MINOR: debug: add "debug dev counters" to list code counters
    - BUG/MEDIUM: stats-html: Never dump more data than expected during 0-copy FF
    - BUG/MEDIUM: mux-h2: Remove H2S from send list if data are sent via 0-copy FF
    - BUG/MINOR: stconn: Pretend the SE have more data to deliver on abortonclose
    - CLEANUP: stream: remove outdated comments
    - DEBUG: stream: Add debug counters to track some client/server aborts
    - DEBUG: mux-h1: Add debug counters to track some errors
    - MINOR: mux-h1: Add support of the debug string for logs
    - MINOR: stream: maintain per-stream counters of the number of passes on code
    - MINOR: filters: add per-filter call counters
    - MINOR: sample: add the "when" converter to condition some expressions
    - BUG/MEDIUM: connection/http-reuse: fix address collision on unhandled address families
    - BUILD: spoe: fix build warning on older gcc around sub-struct initialization
    - Revert "OPTIM: mux-h2: make h2_send() report more accurate wake up conditions"
    - DEBUG: mux-h1: Add debug counters to track errors with in/out pending data
    - BUG/MINOR: mux-h1: Fix conditions on pipe in some COUNT_IF()
    - MINOR: activity/memprofile: show per-DSO stats
    - BUG/MINOR: mworker/cli: show master startup logs in recovery mode
    - MINOR: mworker: stop MASTER proxy listener on worker mcli sockpair
    - MINOR: error: simplify startup_logs_init_shm
    - BUG/MINOR: mworker: show worker warnings in startup logs
    - CLEANUP: mworker: clean mworker_reexec
    - MINOR: mworker/cli: split mworker_cli_proxy_create
    - BUG/MINOR: server: fix dynamic server leak with check on failed init
    - BUG/MEDIUM: server: fix race on servers_list during server deletion
    - BUG/MEDIUM: stconn: Report blocked send if sends are blocked by an error
    - BUG/MINOR: http-ana: Fix wrong client abort reports during responses forwarding
    - BUG/MINOR: stconn: Don't disable 0-copy FF if EOS was reported on consumer side
    - MINOR: mworker/cli: add 'debug' to 'show proc'
    - MINOR: mworker/cli: remove comment line for program when useless
    - MINOR: mworker/cli: 'show proc debug' for old workers
    - BUILD: debug: silence a build warning with threads disabled
    - CLEANUP: mux-h2: remove the unused "full" variable in h2_frt_transfer_data()
    - MINOR: pools: export the pools variable
    - MINOR: debug: place a magic pattern at the beginning of post_mortem
    - MINOR: debug: place the post_mortem struct in its own section.
    - MINOR: debug: store important pointers in post_mortem
    - MINOR: debug: do not limit backtraces to stuck threads
    - MINOR: cli: remove non-printable characters from 'debug dev fd'
    - MINOR: cli: add an 'echo' command
    - MINOR: debug: also add a pointer to struct global to post_mortem
    - CLEANUP: mworker: make mworker_create_master_cli more readable
    - BUG/MEIDUM: mworker: fix fd leak from master to worker
    - BUG/MINOR: mworker/cli: fix mworker_cli_global_proxy_new_listener
    - MINOR: tools: add strnlen2() helper
    - CLEANUP: log: use strnlen2() in _lf_text_len() to compute string length
    - DOC: design: add notes about more detailed error reporting for logs
    - MINOR: debug: also add fdtab and acitvity to struct post_mortem
    - MINOR: debug: remove the redundant process.thread_info array from post_mortem
    - DEV: gdb: add a number of gdb scripts to navigate in core dumps
    - BUG/MINOR: trace: stop rewriting argv with -dt
    - MEDIUM: protocol: make abns a custom unix socket address family
    - MEDIUM: protocol: rely on AF_CUST_ABNS family to recognize ABNS sockets
    - CLEANUP: tools: rely on address family to detect ABNS sockets
    - MINOR: protocol: create abnsz socket address family
    - MINOR: sock: restore effective UNIX family in sock_get_old_sockets()
    - MEDIUM: sock: also restore effective unix family in get_{src,dst}()
    - MEDIUM: sock_unix: use per-family addrcmp function
    - MEDIUM: socket: add zero-terminated ABNS alternative
    - BUG/MINOR: ssl/cli: 'set ssl cert' does not check the transaction name correctly
    - BUG/MINOR: mworker: mworker_reexec: unset MODE_STARTING before free startup logs ring
    - BUG/MINOR: errors: startup_logs_free: set global startup_logs ptr to NULL
    - BUG/MINOR: errors: print_message: don't allocate startup logs ring
    - BUG/MINOR: startup: don't fork worker if started with -c -W
    - BUG/MINOR: startup: dump libs only in worker if started with -W -dL
    - BUG/MINOR: startup: dump keywords only in worker if started with -W -dKAll
    - BUG/MINOR: startup: don't dump polling info for master in verbose mode
    - CI: switch QUIC Interop on AWS-LC to common docker image
    - CI: switch QUIC Interop on LibreSSL to common docker image
    - CI: enable chacha20 test on LibreSSL QUIC Interop
    - DOC: config: add missing glitch_{cnt,rate} data types
    - DOC: config: add missing glitch_{cnt,rate} sample definitions
    - CI: LibreSSL QUIC Interop: fix docker context
    - DEBUG: mux-h1: Add H1C expiration dates in trace messages
    - BUG/MEDIUM: mux-h1: Fix how timeouts are applied on H1 connections
    - BUG/MINOR: http-ana: Report internal error if an action yields on a final eval
    - MINOR: stream: Save last evaluated rule on invalid yield
    - MINOR: quic: complete trace in qc_may_build_pkt()
    - MINOR: quic: move qc_send_mux() prototype into quic_tx.h
    - MINOR: stream: Replace last_rule_file/line fields by a more generic field
    - MINOR: stream: Save the last filter evaluated interrupting the processing
    - MINOR: stream: Save the entity waiting to continue its processing
    - MINOR: stream: Use an enum to identify last and waiting entities for streams
    - MINOR: stream: Add http-buffer-request option in the waiting entities
    - DOC: config: Add documentation about last_entity sample fetch
    - DOC: config: Add documentation about waiting_entity sample fetch
2024-11-01 10:17:02 +01:00
Christopher Faulet
1cd8173687 DOC: config: Add documentation about waiting_entity sample fetch
The commit adds the documentation for the waiting_entity sample fetch.
2024-10-31 20:47:59 +01:00
Christopher Faulet
6034080c49 DOC: config: Add documentation about last_entity sample fetch
The commit adds the documentation for the last_entity sample fetch.
2024-10-31 20:25:07 +01:00
Christopher Faulet
64554a55f4 MINOR: stream: Add http-buffer-request option in the waiting entities
When the http-buffer-request option is set on a proxy, the processing will be
paused to wait for the full request payload or a full buffer. So it is an
entity that blocks the processing, just like a rule or a filter that yields.
So now, it is reported as a waiting entity if an error or a timeout occurred.

To do so, a stream entity type is added for this option. There is no
pointer. And the "waiting_entity" sample fetch returns the option name.
2024-10-31 20:24:50 +01:00
Christopher Faulet
c64712b085 MINOR: stream: Use an enum to identify last and waiting entities for streams
Instead of using 1 for last/waiting rule and 2 for last/waiting filter, an
enum is used. It is less ambiguous this way.
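
A sketch of the idea, with hypothetical names since the commit message does
not list the enum entries. The numeric values 1 and 2 are kept here only to
mirror the previous convention.

```c
/* Hypothetical entity types, for illustration only */
enum sk_entity_type {
	SK_ENTITY_NONE = 0,  /* nothing recorded */
	SK_ENTITY_RULE,      /* previously the bare value 1 */
	SK_ENTITY_FILTER,    /* previously the bare value 2 */
};

/* A switch on the enum reads better than comparisons against 1 and 2 */
static const char *sk_entity_type_str(enum sk_entity_type t)
{
	switch (t) {
	case SK_ENTITY_RULE:   return "rule";
	case SK_ENTITY_FILTER: return "filter";
	default:               return "none";
	}
}
```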
2024-10-31 20:24:37 +01:00
Christopher Faulet
537f20eb3e MINOR: stream: Save the entity waiting to continue its processing
When a rule or a filter yields because it waits for something to be able to
continue its processing, this entity is saved in the stream. If an error or
a timeout occurred, info on this entity may be retrieved via the
"waiting_entity" sample fetch, for instance to dump it in the logs. This
info may be useful to find the root cause of some bugs because it is a way to
know the processing was temporarily stopped. This may explain timeouts for
instance.

The sample fetch is not documented yet.
2024-10-31 16:40:09 +01:00
Christopher Faulet
53de6da1c0 MINOR: stream: Save the last filter evaluated interrupting the processing
It is very similar to the last evaluated rule. When a filter returns an
error that interrupts the processing, it is saved in the stream, in the
last_entity field, with the type 2. The pointer on filter config is
saved. This pointer never changes during runtime and is part of the proxy's
structure. It is an element of the filter_configs list in the proxy
structure.

The "last_entity" sample fetch was updated accordingly. The filter identifier
is returned, if defined. Otherwise the saved pointer is returned.
2024-10-31 16:39:04 +01:00
Christopher Faulet
c9fa78e747 MINOR: stream: Replace last_rule_file/line fields by a more generic field
The last evaluated rule is now saved in a generic structure, named
last_entity, with a type to identify it. The idea is to be able to store
other kind of entity that may interrupt a specific processing.

The type of the last evaluated rule is set to 1. It will be replaced later by
an enum to be more explicit. In addition, the pointer to the rule itself is
saved instead of its location.

The sample fetch "last_entity" was added to retrieve the information about
it. In this case, it is the rule location: the config file containing the
rule followed by the line where the rule is defined, separated by a
colon. This sample fetch is not documented yet.
2024-10-31 16:36:39 +01:00
Amaury Denoyelle
dcf334168c MINOR: quic: move qc_send_mux() prototype into quic_tx.h
qc_send_mux() is defined in quic_tx.c. As such, its prototype is moved
from quic_conn.h to quic_tx.h.
2024-10-31 15:35:31 +01:00
Amaury Denoyelle
a8738f4156 MINOR: quic: complete trace in qc_may_build_pkt()
Log the encryption level in qc_may_build_pkt(). This is necessary to
fully understand the sending conditions of the QUIC stack.
2024-10-31 15:35:31 +01:00
Christopher Faulet
0b7605491e MINOR: stream: Save last evaluated rule on invalid yield
When an action yields while it is not allowed, an internal error is
reported. This interrupts the processing. So info about the last evaluated
rule must be filled.

This patch may be backported if needed. If so, the commit ("MINOR: stream:
Save last evaluated rule on invalid yield") must be backported first.
2024-10-31 09:30:52 +01:00
Christopher Faulet
65ea29dcf8 BUG/MINOR: http-ana: Report internal error if an action yields on a final eval
This was already performed for tcp actions at content level, but not for
HTTP actions. It is always a bug, so it must be reported accordingly.

This patch may be backported to all stable versions.
2024-10-31 09:30:52 +01:00
Christopher Faulet
3c09b34325 BUG/MEDIUM: mux-h1: Fix how timeouts are applied on H1 connections
There were several flaws in the way the different timeouts were applied on
H1 connections. First, the H1C task handling timeouts was not created if no
client/server timeout was specified. But there are other timeouts to
consider. First, the client-fin/server-fin timeouts. But for frontend
connections, http-keep-alive and http-request timeouts may also be used. And
finally, on soft-stop, the close-spread-time value must be considered too.

So at the end, it is probably easier to always create a task to manage H1C
timeouts. Especially since the client/server timeouts are most often set.

Then, the expiration date of the H1C's task must only be updated if the
considered timeout is set. So tick_add_ifset() must be used instead of
tick_add(). Otherwise, if a timeout is undefined, the task may expire
immediately while it should in fact never expire.
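
The difference between the two helpers can be illustrated with simplified
stand-ins. The sk_* functions below are a sketch modelled on the behaviour
described above, not the real implementations, and assume that an unset
timeout and "never expires" are both encoded as 0.

```c
#define SK_TICK_ETERNITY 0U  /* assumed encoding of "never expires" */

/* Unconditionally add <timeout> to <now> */
static unsigned int sk_tick_add(unsigned int now, unsigned int timeout)
{
	now += timeout;
	if (now == SK_TICK_ETERNITY)
		now++; /* never return the reserved value by accident */
	return now;
}

/* Add <timeout> to <now> only if the timeout is set */
static unsigned int sk_tick_add_ifset(unsigned int now, unsigned int timeout)
{
	if (!timeout)
		return SK_TICK_ETERNITY;
	return sk_tick_add(now, timeout);
}
```

With an unset timeout, sk_tick_add(now, 0) returns <now>, that is a date which
is already reached, while sk_tick_add_ifset(now, 0) returns the "never
expires" value.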

Finally, the idle expiration date must only be considered for idle
connections.

This patch should be backported in all stable versions, at least as far as
2.6. On 2.4, it will have to be slightly adapted for the idle_exp
part. On 2.2 and 2.0, the patch will have to be rewritten because
h1_refresh_timeout() is quite different.
2024-10-31 09:30:52 +01:00
Christopher Faulet
9fa5b379fa DEBUG: mux-h1: Add H1C expiration dates in trace messages
The expiration date of the H1C task and the H1C idle expiration date are now
dumped in the trace messages.
2024-10-31 09:30:52 +01:00
Ilia Shipitsin
976af317a4 CI: LibreSSL QUIC Interop: fix docker context
In commit 98099287ee, the docker build was switched to a URL, but I forgot
to change the context.

This is a follow-up fix.
2024-10-30 19:42:31 +01:00
Aurelien DARRAGON
0686fd8cfc DOC: config: add missing glitch_{cnt,rate} sample definitions
Following the previous commit, when glitch_cnt and glitch_rate data types were
implemented in c9c6b683f ("MEDIUM: stick-tables: add a new stored type for
glitch_cnt and glitch_rate"), newly exposed samples such as
table_glitch_cnt(), table_glitch_rate(), src_glitch_cnt() and
src_glitch_rate() were documented but their definitions were missing from
the supported keywords list.

It should be backported in 3.0 with c9c6b683f.
2024-10-30 17:47:30 +01:00
Aurelien DARRAGON
9a6fc2d474 DOC: config: add missing glitch_{cnt,rate} data types
When glitch_cnt and glitch_rate data types were implemented in
c9c6b683f ("MEDIUM: stick-tables: add a new stored type for glitch_cnt and
glitch_rate"), the data types list for "stick-table" keyword documentation
was overlooked.

This was reported by Nick Ramirez.

It should be backported in 3.0 with c9c6b683f.
2024-10-30 17:47:24 +01:00
Ilia Shipitsin
3ecca216b4 CI: enable chacha20 test on LibreSSL QUIC Interop
It was commented out on purpose "until LibreSSL-4.0 is released".
Let's enable it.
2024-10-30 16:46:22 +01:00
Ilia Shipitsin
98099287ee CI: switch QUIC Interop on LibreSSL to common docker image
Previously we used different docker images for different SSL libs.
Now all of them are merged into one, so let's switch to it.
2024-10-30 16:46:06 +01:00
Ilia Shipitsin
4d40e9384c CI: switch QUIC Interop on AWS-LC to common docker image
Previously we used different docker images for different SSL libs.
Now all of them are merged into one, so let's switch to it.
2024-10-30 16:45:36 +01:00
Valentine Krasnobaeva
d3eb00e61d BUG/MINOR: startup: don't dump polling info for master in verbose mode
As the master-worker fork now happens before step_init_2(), where pollers are
initialized and polling settings are dumped to stdout in verbose and debug
modes, it turns out that the master and the worker each dump the same polling
settings separately. This creates long and messy output in these modes.

Polling settings are the same for master and for worker process for the moment.
Even if they would diverge in future we are interested here in worker's
settings. So, when started in the master-worker mode let's dump it only in the
worker context.

This doesn't need to be backported as related to the latest master-worker
refactoring.
2024-10-30 10:50:09 +01:00
Valentine Krasnobaeva
bbe7828d49 BUG/MINOR: startup: dump keywords only in worker if started with -W -dKAll
If haproxy was started with -W -dK*, after master-worker refactoring, we dump
registered keywords to stdout twice in master and in worker processes. This
information is redundant and output has no longer the right format. So, as the
keyword registration happens very early before the fork, let's dump keywords
only in the worker context, if haproxy was launched with -W.

This does not need to be backported, as related to the latest master-worker
refactoring.
2024-10-30 10:01:28 +01:00
Valentine Krasnobaeva
ea824aebc1 BUG/MINOR: startup: dump libs only in worker if started with -W -dL
If haproxy was started with -W -dL, after master-worker refactoring we dump
libs to stdout twice in master and in worker processes. This information is
redundant. So let's show linked libraries only in the worker context, if
haproxy was started also with -W.

This does not need to be backported, as related to the latest master-worker
rework.
2024-10-30 10:00:40 +01:00
Valentine Krasnobaeva
d1c6d44976 BUG/MINOR: startup: don't fork worker if started with -c -W
Don't do master-worker fork if MODE_CHECK is detected from the command line along
with the master-worker mode. We should exit in MODE_CHECK, after the
configuration parsing and validation. So, with the new master-worker architecture
it's better to align this mode with the standalone.

This patch does not need to be backported, as related to the latest
master-worker rework.
2024-10-30 09:59:59 +01:00
Valentine Krasnobaeva
f0f03b98f7 BUG/MINOR: errors: print_message: don't allocate startup logs ring
Don't call startup_logs_init() in order to allocate the startup logs ring
again, if startup_logs pointer is NULL. Startup logs ring is allocated
explicitly in step_init_1 routine, when the process starts, and it's freed
explicitly for master process at the end of mworker_reexec scope. So, when
we no longer have this pointer, let's just save the log message in the
message buffer.

Otherwise, in case of master process, we will allocate the startup logs ring
again here and we will lose its address after execvp.

No need to backport this fix as it's related to the latest master-worker
refactoring.
2024-10-29 18:17:49 +01:00
Valentine Krasnobaeva
bf8c871e26 BUG/MINOR: errors: startup_logs_free: set global startup_logs ptr to NULL
ring_free() calls free() on the ring struct pointer, but startup_logs continues
to keep this address. So let's reset at the end startup_logs to NULL.
startup_logs is checked in print_message().
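
The fix above follows a common C idiom, which can be sketched as below. The
struct and both helpers are hypothetical stand-ins written for illustration;
they are not HAProxy's ring_free() or startup_logs_free().

```c
#include <stdlib.h>

/* Hypothetical stand-in for a ring buffer; not HAProxy's struct ring. */
struct demo_ring {
	size_t size;
	char *area;
};

/* Allocates a ring with <size> bytes of storage, or returns NULL. */
static struct demo_ring *demo_ring_alloc(size_t size)
{
	struct demo_ring *r = calloc(1, sizeof(*r));

	if (!r)
		return NULL;
	r->area = malloc(size);
	if (!r->area) {
		free(r);
		return NULL;
	}
	r->size = size;
	return r;
}

/* Frees the ring and resets the caller's pointer, so that a later
 * "if (!ptr)" test sees NULL instead of a stale address.
 */
static void demo_ring_release(struct demo_ring **ptr)
{
	if (!ptr || !*ptr)
		return;
	free((*ptr)->area);
	free(*ptr);
	*ptr = NULL;
}
```

Taking a pointer to the pointer lets the helper do the reset itself, so a
second call on the same variable is harmless.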

No need to backport this fix, as it's related to the latest master-worker
refactoring.
2024-10-29 18:17:49 +01:00
Valentine Krasnobaeva
cd57ee7ffa BUG/MINOR: mworker: mworker_reexec: unset MODE_STARTING before free startup logs ring
Flag MODE_STARTING should be unset for master just before freeing the startup
logs ring, as it triggers the copy of process logs to this ring, see the code
of print_message().

Moreover with this flag set, if startup logs ring pointer is NULL, any
print_message() triggered just before the execvp in mworker_reexec() will call
startup_logs_init(). So ring will be allocated again "discreetly" and after
execvp we will lose its address, as in step_init_1() we will call again
startup_logs_init().

No need to backport this fix as it's related to the latest master-worker
refactoring.
2024-10-29 18:17:49 +01:00
William Lallemand
984d2cfb61 BUG/MINOR: ssl/cli: 'set ssl cert' does not check the transaction name correctly
Since commit 089c13850f ("MEDIUM: ssl: ssl-load-extra-del-ext work
only with .crt"), the 'set ssl cert' CLI command does not check
correctly if the transaction you are trying to update is the right one.

The consequence is that you could accidentally commit a transaction on
the wrong certificate.

The fix introduces the check again in case you are not using
ssl-load-extra-del-ext.

This must be backported in all stable versions.
2024-10-29 16:01:07 +01:00
Tristan
18582ede05 MEDIUM: socket: add zero-terminated ABNS alternative
When an abstract unix socket is bound by HAProxy (using "abns@" prefix),
NUL bytes are appended at the end of its path until sun_path is filled
(for a total of 108 characters).

Here we add an alternative to pass only the non-NUL length of that path
to connect/bind calls, such that the effective path of the socket's name
is as humanly written. This may be useful to interconnect with existing
software that implements abstract sockets with this logic instead of the
default haproxy one.

This is achieved by implementing the "abnsz" socket prefix (instead of
"abns"), which stands for "zero-terminated ABNS". "abnsz" prefix may be
used anywhere "abns" is. Internally, haproxy uses the custom socket
family (AF_CUST_ABNS vs AF_CUST_ABNSZ) to differentiate default abns
sockets from zero-terminated ones.
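
The difference between the two prefixes comes down to the address length
handed to bind()/connect(). A minimal sketch, assuming a Linux AF_UNIX
sockaddr; abstract_addr_fill() is a hypothetical helper, not a HAProxy
function:

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fills <sa> with the abstract name <name> and returns the address length
 * to pass to bind()/connect(), or 0 if the name does not fit.
 * <zero_term> == 0: "abns" style, the whole sun_path counts (NUL padded).
 * <zero_term> != 0: "abnsz" style, only the leading NUL plus the name count.
 */
static socklen_t abstract_addr_fill(struct sockaddr_un *sa, const char *name,
                                    int zero_term)
{
	const size_t path_off = offsetof(struct sockaddr_un, sun_path);
	size_t len = strlen(name);

	/* one byte is reserved for the leading NUL of an abstract socket */
	if (len > sizeof(sa->sun_path) - 1)
		return 0;

	memset(sa, 0, sizeof(*sa));
	sa->sun_family = AF_UNIX;
	memcpy(sa->sun_path + 1, name, len);

	if (zero_term)
		return (socklen_t)(path_off + 1 + len);
	return (socklen_t)(path_off + sizeof(sa->sun_path));
}
```

With the same name, the two styles give different kernel-level addresses, so
both ends of a connection need to agree on one of them.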

Documentation was updated and regtest was added.

Fixes GH issues #977 and #2479

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2024-10-29 12:15:24 +01:00
Aurelien DARRAGON
43861e3234 MEDIUM: sock_unix: use per-family addrcmp function
Thanks to previous commit, we may now use dedicated addrcmp functions for
each UNIX address family. This allows to simplify sock_unix_addrcmp()
function and avoid useless checks in order to try to guess the socket
type.

In this patch we implement sock_abns_addrcmp() and sock_abnsz_addrcmp()
functions, which are respectively used for ABNS and ABNSZ custom families

sock_unix_addrcmp() now only holds regular UNIX socket comparing logic.
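
To illustrate why one comparison function per family is simpler, here is a
sketch of the two comparison rules. Both helpers are hypothetical and only
approximate what sock_abns_addrcmp() and sock_abnsz_addrcmp() might do; they
assume both addresses are abstract and that unused bytes are zeroed.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* "abns" style: every byte of sun_path is part of the name.
 * Returns 0 when both names are identical, non-zero otherwise.
 */
static int demo_abns_cmp(const struct sockaddr_un *a,
                         const struct sockaddr_un *b)
{
	return memcmp(a->sun_path, b->sun_path, sizeof(a->sun_path)) != 0;
}

/* "abnsz" style: the name starts after the leading NUL and stops at the
 * next NUL (or at the end of sun_path). Returns 0 when both names match.
 */
static int demo_abnsz_cmp(const struct sockaddr_un *a,
                          const struct sockaddr_un *b)
{
	return strncmp(a->sun_path + 1, b->sun_path + 1,
	               sizeof(a->sun_path) - 1) != 0;
}
```

Bytes located after the terminating zero matter for the first rule and are
ignored by the second one.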
2024-10-29 12:15:09 +01:00
Aurelien DARRAGON
d879bf6600 MEDIUM: sock: also restore effective unix family in get_{src,dst}()
As in previous commit, let's push the logic a bit further in order to
properly restore the effective UNIX socket type when leveraging
get_src() and get_dst() sock functions, since they rely on getpeername()
and getsockname() under the hood, both of which will actually lose the
effective family and return AF_UNIX for all our custom UNIX sockets.

To do this, add sock_restore_unix_family() helper function from the logic
implemented in the previous commit, and call this function from get_src()
and get_dst() in case of unix socket prior to returning.
2024-10-29 12:15:03 +01:00
Aurelien DARRAGON
ae64444303 MINOR: sock: restore effective UNIX family in sock_get_old_sockets()
When getting sockets from older process in sock_get_old_sockets(), we
leverage getsockname() to fill sockaddr struct from known fd.

However, the kernel doesn't know about our custom UNIX families such
as CUST_ABNS and CUST_ABNSZ which are both based on AF_UNIX real family.

Since haproxy socket API relies on effective family (and not real family)
to recognize the socket type instead of having to guess it by analyzing
the path content, let's restore it right after getsockname() since we
have all the infos needed to deduce the right family.

If the path starts with a NUL byte, we know that it is an abstract sock.
Then we simply check <addrlen> value from getsockname() to know if the
addr makes use of the whole path space (normal ABNS) or partial path
space (zero ABNS / aka ABNSZ) terminated by 0.
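
The deduction described above can be sketched as follows. The enum labels
and the function name are hypothetical; the real code works with AF_UNIX,
AF_CUST_ABNS and AF_CUST_ABNSZ.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical labels standing in for the effective address families. */
enum demo_unix_kind { DEMO_UNIX, DEMO_ABNS, DEMO_ABNSZ };

/* Guesses the effective family from what getsockname() returned. */
static enum demo_unix_kind demo_unix_kind(const struct sockaddr_un *sa,
                                          socklen_t addrlen)
{
	const size_t path_off = offsetof(struct sockaddr_un, sun_path);

	/* unnamed socket (no path bytes) or regular filesystem path */
	if (addrlen <= path_off || sa->sun_path[0] != '\0')
		return DEMO_UNIX;

	/* abstract socket using the whole sun_path: "abns" */
	if (addrlen == path_off + sizeof(sa->sun_path))
		return DEMO_ABNS;

	/* abstract socket using only part of sun_path: "abnsz" */
	return DEMO_ABNSZ;
}
```

Note that in this sketch a zero-terminated name that happens to fill the
whole sun_path is indistinguishable from a regular ABNS one.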
2024-10-29 12:14:57 +01:00
Willy Tarreau
d24768ab44 MINOR: protocol: create abnsz socket address family
For now it's the same as abns. We'll need to modify sock_unix_addrcmp(),
and a few other ones to support effective path length when dealing with
the \0. Let's check with Tristan's patch for this (upcoming patch).

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2024-10-29 12:14:50 +01:00
Aurelien DARRAGON
9fea4a3ca5 CLEANUP: tools: rely on address family to detect ABNS sockets
Following previous commit, in str2sa_range(), make use of address' family
which was just set to check if the socket is ABNS or not instead of
relying on an extra boolean to save this info.
2024-10-29 12:14:44 +01:00
Aurelien DARRAGON
5d766260f0 MEDIUM: protocol: rely on AF_CUST_ABNS family to recognize ABNS sockets
Now that we can easily distinguish regular UNIX socket from ABNS sockets
by simply looking at the address family, stop looking at the first byte
from addr->sun_path to guess if the socket is an ABNS one or not. Looking
at the family is straightforward and will allow to differentiate between
upcoming ABNSZ and ABNS (where looking at the first byte from path won't
help anymore).
2024-10-29 12:14:37 +01:00
Willy Tarreau
78ac312bbd MEDIUM: protocol: make abns a custom unix socket address family
This is a pre-requisite to adding the abnsz socket address family:

in this patch we make use of protocol API rework started by 732913f
("MINOR: protocol: properly assign the sock_domain and sock_family") in
order to implement a dedicated address family for ABNS sockets (based on
UNIX parent family).

Thanks to this, it will become trivial to implement a new ABNSZ (for abns
zero) family which is essentially the same as ABNS but with a slight
difference when it comes to path handling (ABNS uses the whole sun_path
length, while ABNSZ's path is zero terminated and evaluation stops at 0)

It was verified that this patch doesn't break reg-tests and behaves
properly (tests performed on the CLI with show sess and show fd).

Anywhere relevant, AF_CUST_ABNS is handled alongside AF_UNIX. If no
distinction needs to be made, real_family() is used to fetch the proper
real family type to handle it properly.

Both stream and dgram were converted, so no functional change should be
expected for this "internal" rework, except that proto will be displayed
as "abns_{stream,dgram}" instead of "unix_{stream,dgram}".

Before ("show sess" output):
  0x64c35528aab0: proto=unix_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 epoch=0 age=0s calls=1 rate=0 cpu=0 lat=0 rq[f=848000h,i=0,an=00h,ax=] rp[f=80008000h,i=0,an=00h,ax=] scf=[8,0h,fd=21,rex=10s,wex=] scb=[8,1h,fd=-1,rex=,wex=] exp=10s rc=0 c_exp=

After:
  0x619da7ad74c0: proto=abns_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 epoch=0 age=0s calls=1 rate=0 cpu=0 lat=0 rq[f=848000h,i=0,an=00h,ax=] rp[f=80008000h,i=0,an=00h,ax=] scf=[8,0h,fd=22,rex=10s,wex=] scb=[8,1h,fd=-1,rex=,wex=] exp=10s rc=0 c_exp=

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2024-10-29 12:14:25 +01:00
William Lallemand
596db3ef86 BUG/MINOR: trace: stop rewriting argv with -dt
When using trace with -dt, the trace_parse_cmd() function is doing a
strtok which writes \0 into the argv string.

When using the mworker mode, and reloading, argv was modified and the
trace won't work anymore because the first : is replaced by a '\0'.

This patch fixes the issue by allocating a temporary string so we don't
modify the source string directly. It also replaces strtok by its
reentrant version strtok_r.
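
The approach can be sketched as below: tokenize a private copy with
strtok_r() so that the caller's string stays intact. count_fields() is a
hypothetical helper written for this illustration, not trace_parse_cmd().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Counts the non-empty ':'-separated fields of <arg> without modifying it.
 * Returns -1 on allocation failure.
 */
static int count_fields(const char *arg)
{
	char *copy = strdup(arg);   /* tokenize a private copy, never argv */
	char *save = NULL;
	char *tok;
	int count = 0;

	if (!copy)
		return -1;

	for (tok = strtok_r(copy, ":", &save); tok;
	     tok = strtok_r(NULL, ":", &save))
		count++;

	free(copy);
	return count;
}
```

Since the source string is never written to, parsing it again after a
re-exec gives the same result.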

Must be backported as far as 2.9.
2024-10-29 11:01:47 +01:00
Willy Tarreau
e240be5495 DEV: gdb: add a number of gdb scripts to navigate in core dumps
This is a collection of functions I'm occasionally using to navigate
in core dumps. Only working ones were extracted.

Those requiring knowledge of global variables (e.g. pools, proxy list)
use the one extracted from the post_mortem struct. That one is defined
in post-mortem.gdb and needs to be initialized using "pm_init post_mortem"
or "pm_init <pointer>". From this point a number of global variables are
accessible even if symbols are missing; those ones are then used by other
functions to dump streams, threads, pools, proxies etc.

The files can be sourced or copy-pasted into a gdb session. It's worth
trying to keep them up-to-date, as the old ones used to navigate through
tasks are no longer usable due to massive changes.
2024-10-28 17:55:08 +01:00
Willy Tarreau
52240680f1 MINOR: debug: remove the redundant process.thread_info array from post_mortem
That one is huge and unneeded since we now have the pointer to the
whole thread_info[] array, which does contain the freshest version
of these info and many more. Let's just get rid of it entirely.
2024-10-28 17:14:48 +01:00
Willy Tarreau
da5cf52173 MINOR: debug: also add fdtab and activity to struct post_mortem
These ones are often used as well when trying to analyse sequences of
events, let's add them.
2024-10-28 17:14:48 +01:00
Willy Tarreau
20ffa35f66 DOC: design: add notes about more detailed error reporting for logs
These are the notes of a day long code analysis session (CFA+WTA)
aimed at figuring what's missing during most code troubleshooting
sessions.  The goal is to provide good indications about what rules/
filters were still active when the processing ended (timeout, error
etc), what subscribers are still active (indicating waiting for an
event), and what shut/abort events were met at the various levels
of each side's stack, in each direction.
2024-10-28 17:14:48 +01:00
Aurelien DARRAGON
6d5b32daad CLEANUP: log: use strnlen2() in _lf_text_len() to compute string length
Thanks to previous commit, we can now use strnlen2() function to perform
strnlen() portable equivalent instead of re-implementing the logic under
_lf_text_len() function.
2024-10-28 14:59:42 +01:00
Aurelien DARRAGON
24131dee30 MINOR: tools: add strnlen2() helper
strnlen2() is functionally equivalent to strnlen(). Goal is to provide
an alternative to strnlen() which is not portable since it requires
_POSIX_C_SOURCE >= 200809L
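
For illustration, a bounded length helper can be written with nothing but
C89's memchr(). This is only a sketch under the name bounded_strlen(); the
actual strnlen2() implementation may differ.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of <s>, looking at no more than <max> bytes.
 * Relies only on memchr(), so no POSIX feature-test macro is required.
 */
static size_t bounded_strlen(const char *s, size_t max)
{
	const char *end = memchr(s, '\0', max);

	return end ? (size_t)(end - s) : max;
}
```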
2024-10-28 14:59:35 +01:00
Valentine Krasnobaeva
7855069655 BUG/MINOR: mworker/cli: fix mworker_cli_global_proxy_new_listener
There is no need to close proc->ipc_fd[0] on the error path in
mworker_cli_global_proxy_new_listener(), as it's already closed before by the
caller.
2024-10-26 22:53:24 +02:00
Valentine Krasnobaeva
4931d1ca5f BUG/MEDIUM: mworker: fix fd leak from master to worker
During re-execution, the master always keeps the "reload" sockpair FDs and the
shared sockpair ipc_fd[0] open; the latter is used to transfer listener sockets
from the previously forked worker to the new one. So, these master FDs are
inherited by the newly forked worker and must be closed in its context.

"reload" sockpair inherited FDs and shared sockpair FD (ipc_fd[0]) are closed
separately, because master doesn't recreate "reload" sockpair each time after
its re-exec. It always keeps the same FDs for this "reload" sockpair. So in
worker context it can be closed immediately after the fork.

In contrast, a shared sockpair is created after each reload, when the new
worker is forked. So, if N previous workers still exist at this moment,
the new worker will inherit N ipc_fd[0] from master. So, it's safer to
close all these FDs after get_listeners_fd() and bind_listeners() calls.
Otherwise, early closed FDs in the worker context will be immediately bound to
listeners and we could potentially have some bugs.
2024-10-26 22:53:24 +02:00
Valentine Krasnobaeva
745a4c5e93 CLEANUP: mworker: make mworker_create_master_cli more readable
Using nested 'if' operator, while checking if we will need to allocate again the
"reload" sockpair, does not degrade performance, as mworker_create_master_cli is
a startup routine.

This nested 'if' (we check one condition in each operator) makes more visible the
fact, that the "reload" sockpair is allocated only once, when the master process
starts and it is not re-allocated again (hence, its FDs are not closed) during
reloads. This way of checking multiple conditions here makes it easier to spot
this fact, while analysing the code in order to investigate FD leaks between
master and worker.
2024-10-26 22:26:49 +02:00
Willy Tarreau
2f04ebe14a MINOR: debug: also add a pointer to struct global to post_mortem
The pointer to struct global is also an important element to have in
post_mortem given that it's used a lot to take decisions in the code.
Let's just add it. It's worth noting that we could get rid of argc/argv
at this point since they're also present in the global struct, but they
don't cost much there anyway.
2024-10-26 11:33:09 +02:00
William Lallemand
dc1c0a169c MINOR: cli: add an 'echo' command
Add an echo command to write text over the CLI output.
2024-10-24 17:20:57 +02:00
William Lallemand
944a224358 MINOR: cli: remove non-printable characters from 'debug dev fd'
When using 'debug dev fd', the output of laddr and raddr can contain
some garbage.

This patch replaces any control or non-printable character by a '.'.
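
A minimal sketch of this kind of masking, using a hypothetical
mask_unprintable() helper rather than the code of the patch itself:

```c
#include <ctype.h>
#include <stddef.h>

/* Replaces every byte of <buf> that isprint() rejects with '.', in place. */
static void mask_unprintable(char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		/* cast needed: isprint() is undefined for negative values */
		if (!isprint((unsigned char)buf[i]))
			buf[i] = '.';
	}
}
```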
2024-10-24 16:45:11 +02:00
Willy Tarreau
4adb2d864d MINOR: debug: do not limit backtraces to stuck threads
Historically for size limitation reasons, we would only dump the
backtrace of stuck threads. The problem is that when triggering
a panic or other reasons, we have no backtrace, which effectively
limits it to the watchdog timer. It's also visible in "show threads"
which used to report backtraces for all threads in 2.4 and displays
none nowadays, making its use much more limited.

A first approach could be to just dump the thread that triggers the
panic (in addition to stuck threads). But that remains quite limited
since "show threads" would still display nothing. This patch takes a
better approach consisting in dumping all non-idle threads. This way
the output is less polluted than with the older approach (no need to
dump all those waiting in the poller), and all active threads are
visible, in panics as well as in "show threads". As such, the CLI
command "debug dev panic" now dumps backtraces again. This is already
a benefit which will ease testing of various locations against the
ability to resolve useful symbols.
2024-10-24 16:12:46 +02:00
Willy Tarreau
e5fccfe0b6 MINOR: debug: store important pointers in post_mortem
Dealing with a core and a stripped executable is a pain when it comes
to finding pools, proxies or thread contexts. Let's put a pointer to
these heads and arrays in the post_mortem struct for easier location.
Other critical lists like this could possibly benefit from being added
later.

Here we now have:
  - tgroup_info
  - thread_info
  - tgroup_ctx
  - thread_ctx
  - pools
  - proxies

Example:
  $ objdump -h haproxy|grep post
   34 _post_mortem  000014b0  0000000000cfd400  0000000000cfd400  008fc400  2**8

  (gdb) set $pm=(struct post_mortem*)0x0000000000cfd400

  (gdb) p $pm->tgroup_ctx[0]
  $8 = {
    threads_harmless = 254,
    threads_idle = 254,
    stopping_threads = 0,
    timers = {
      b = {0x0, 0x0}
    },
    niced_tasks = 0,
    __pad = 0xf5662c <ha_tgroup_ctx+44> "",
    __end = 0xf56640 <ha_tgroup_ctx+64> ""
  }

  (gdb) info thr
    Id   Target Id                         Frame
  * 1    Thread 0x7f9e7706a440 (LWP 21169) 0x00007f9e76a9c868 in raise () from /lib64/libc.so.6
    2    Thread 0x7f9e76a60640 (LWP 21175) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    3    Thread 0x7f9e7613d640 (LWP 21176) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    4    Thread 0x7f9e7493a640 (LWP 21179) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    5    Thread 0x7f9e7593c640 (LWP 21177) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    6    Thread 0x7f9e7513b640 (LWP 21178) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    7    Thread 0x7f9e6ffff640 (LWP 21180) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    8    Thread 0x7f9e6f7fe640 (LWP 21181) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
  (gdb) p/x $pm->thread_info[0].pth_id
  $12 = 0x7f9e7706a440
  (gdb) p/x $pm->thread_info[1].pth_id
  $13 = 0x7f9e76a60640

  (gdb) set $px = *$pm->proxies
  while ($px != 0)
     printf "%#lx %s served=%u\n", $px, $px->id, $px->served
     set $px = ($px)->next
  end

  0x125eda0 GLOBAL served=0
  0x12645b0 stats served=0
  0x1266940 comp served=0
  0x1268e10 comp_bck served=0
  0x1260cf0 <OCSP-UPDATE> served=0
  0x12714c0 <HTTPCLIENT> served=0
2024-10-24 16:12:46 +02:00
Willy Tarreau
93c3f2a0b4 MINOR: debug: place the post_mortem struct in its own section.
Placing it in its own section will ease its finding, particularly in
gdb which is too dumb to find anything in memory. Now it will be
sufficient to issue this:

  $ gdb -ex "info files" -ex "quit" ./haproxy core 2>/dev/null |grep _post_mortem
  0x0000000000cfd300 - 0x0000000000cfe780 is _post_mortem

or this:

   $ objdump -h haproxy|grep post
    34 _post_mortem  00001480  0000000000cfd300  0000000000cfd300  008fc300  2**8

to spot the symbol's address. Then it can be read this way:

   (gdb) p *(struct post_mortem *)0x0000000000cfd300
2024-10-24 16:12:46 +02:00
Willy Tarreau
989b02e193 MINOR: debug: place a magic pattern at the beginning of post_mortem
In order to ease finding of the post_mortem struct in core dumps, let's
make it start with a recognizable pattern of exactly 32 chars (to
preserve alignment):

  "POST-MORTEM STARTS HERE+7654321\0"

It can then be found like this from gdb:

  (gdb) find 0x000000012345678, 0x0000000100000000, 'P','O','S','T','-','M','O','R','T','E','M'
  0xcfd300 <post_mortem>
  1 pattern found.

Or easier with any other more practical tool (who has ever used "find" in
gdb, given that it cannot iterate over maps and is 100% useless?).
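
As an example of such a tool, the pattern can be located with a plain byte
scan over a memory image. The struct below is a hypothetical, cut-down
stand-in for post_mortem, and pm_find_magic() is an illustrative helper.

```c
#include <stddef.h>
#include <string.h>

/* 31 visible characters plus the terminating NUL: exactly 32 bytes. */
#define PM_MAGIC "POST-MORTEM STARTS HERE+7654321"

/* Hypothetical struct; the real post_mortem holds many more fields. */
struct pm_demo {
	char magic[32];
	const void *payload;
};

/* compile-time check that the literal fills the field exactly */
typedef char pm_magic_is_32_bytes[sizeof(PM_MAGIC) == 32 ? 1 : -1];

/* Returns the offset of the magic in <mem>, or -1 if it is not found. */
static long pm_find_magic(const unsigned char *mem, size_t len)
{
	size_t i;

	for (i = 0; i + sizeof(PM_MAGIC) <= len; i++) {
		if (memcmp(mem + i, PM_MAGIC, sizeof(PM_MAGIC)) == 0)
			return (long)i;
	}
	return -1;
}
```

The returned offset, added to the base address of the scanned mapping, gives
the address to cast to the struct pointer in the debugger.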
2024-10-24 16:12:46 +02:00
Willy Tarreau
fba48e1c40 MINOR: pools: export the pools variable
We want it to be accessible from debuggers for inspection and it's
currently unavailable. Let's start by exporting it as a first step.
2024-10-24 16:12:46 +02:00
Willy Tarreau
db76949cff CLEANUP: mux-h2: remove the unused "full" variable in h2_frt_transfer_data()
During 11th and 12th iteration of the development cycle for the H2 auto
rx window, several approaches were attempted to figure if another buffer
could be allocated or not. One of them consisted in looping back to the
beginning of the function requesting a new buffer slot and getting one
if the buffer was either apparently or confirmed full. The latest one
consisted in directly allocating the next buffer from the two places
where it's found to be proven full, instead of checking with the now
defunct h2s_may_get_rxbuf() if we were allowed to get one and loop.
That approach was retained. In this case the "full" variable is no
longer needed, so let's get rid of it because the construct looks bogus
and confuses coverity (and possibly code readers as the intent is unclear
compared to the code).
2024-10-24 16:12:46 +02:00
Willy Tarreau
f163cbfb7f BUILD: debug: silence a build warning with threads disabled
Commit 091de0f9b2 ("MINOR: debug: slightly change the thread_dump_pointer
signification") caused the following warning to be emitted when threads
are disabled:

  src/debug.c: In function 'ha_thread_dump_one':
  src/debug.c:359:9: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

Let's just disguise the pointer to silence it. It should be backported
where the patch above was backported, since it was part of a series aiming
at making thread dumps more exploitable from core dumps.
2024-10-24 16:12:46 +02:00
William Lallemand
5db761f709 MINOR: mworker/cli: 'show proc debug' for old workers
Add FD details for old workers in 'show proc debug'.
2024-10-24 14:47:28 +02:00
William Lallemand
b49ddae21b MINOR: mworker/cli: remove comment line for program when useless
Remove the '# programs' line on 'show proc' output when there are no
programs.
2024-10-24 14:39:41 +02:00
William Lallemand
84640aaa2a MINOR: mworker/cli: add 'debug' to 'show proc'
This patch adds a 'debug' parameter to the 'show proc' command of the
master CLI. It allows to show debug details about the processes.

Example:

echo 'show proc debug' | socat /tmp/master.sock -
\#<PID>          <type>          <reloads>       <uptime>        <version>      		<ipc_fd[0]>     <ipc_fd[1]>
391999          master          0 [failed: 0]   0d00h00m02s     3.1-dev10-b9095a-63		5               6
\# workers
392001          worker          0               0d00h00m02s     3.1-dev10-b9095a-63		3               -1
\# programs
2024-10-24 14:23:27 +02:00
Christopher Faulet
362de90f3e BUG/MINOR: stconn: Don't disable 0-copy FF if EOS was reported on consumer side
There is no reason to disable the 0-copy data forwarding if an end-of-stream
was reported on the consumer side. Indeed, the consumer will send data in
this case. So there is no reason to check the read side here.

This patch may be backported as far as 2.9.
2024-10-24 12:07:50 +02:00
Christopher Faulet
5970c6abec BUG/MINOR: http-ana: Fix wrong client abort reports during responses forwarding
When the response forwarding is aborted, we must not report a client abort
if an EOS was seen on client side. Only an abort performed by the stream must be
considered.

This bug was introduced when the SHUTR was split into 2 flags.

This patch must be backported as far as 2.8.
2024-10-24 12:07:50 +02:00
Christopher Faulet
fbc3de6e9e BUG/MEDIUM: stconn: Report blocked send if sends are blocked by an error
When some data must be sent to the endpoint but an error was previously
reported, nothing is performed and we leave. But, in this case, the SC is not
notified the sends are blocked.

It is indeed an issue if the endpoint reports an error after consuming all
data from the SC. In the endpoint the outgoing data are trashed because of
the error, but on the SC, everything was sent, even if an error was also
reported.

Because of this bug, it is possible to have outgoing data blocked at the SC
level but without any write timeout armed. In some cases, this may lead to
blocking conditions where the stream is never closed.

So now, when outgoing data cannot be sent because a previous error was
triggered, a blocked send is reported. This way, it is possible to report a
write timeout.

This patch should fix the issue #2754. It must be backported as far as 2.8.
2024-10-24 11:46:33 +02:00
Amaury Denoyelle
7a02fcaf20 BUG/MEDIUM: server: fix race on servers_list during server deletion
Each server is inserted in a global list named servers_list on
new_server(). This list is then only used to finalize servers
initialization after parsing.

On dynamic server creation, there is no issue as new_server() is under
thread isolation. However, when a server is deleted after its refcount
reached zero, srv_drop() removes it from servers_list without lock
protection. In the long term, this can cause list corruption and crashes,
especially if multiple adjacent servers are removed in parallel.

To fix this, convert servers_list to a mt_list. This should not impact
performance as servers_list is not used during runtime outside of server
creation/deletion.

This should fix github issue #2733. Thanks to Chris Staite who first
found the issue here.

This must be backported up to 2.6.
2024-10-24 11:35:57 +02:00
Amaury Denoyelle
116178563c BUG/MINOR: server: fix dynamic server leak with check on failed init
If a dynamic server is added with check or agent-check, its refcount is
incremented after server keyword parsing. However, if add server fails
at a later stage, refcount is only decremented once, which prevents the
server from being fully released.

This causes a leak with a server which is detached from most of the
lists but still exists in the system.

This bug is considered minor as only a few conditions may cause a
failure in add server after check/agent-check initialization. This is
the case if there is a naming collision or the dynamic ID cannot be
generated.

To fix this, simply decrement server refcount on add server error path
if either check and/or agent-check are flagged as activated.
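
The error path described above can be sketched with a toy refcounted object.
Everything here is hypothetical (struct, helpers, return conventions); it
only mirrors the idea of releasing the check's reference on failure.

```c
#include <stdlib.h>

/* Hypothetical, minimal server object; not HAProxy's struct server. */
struct demo_srv {
	int refcount;
	int has_check;   /* set when a check took its own reference */
};

/* Drops one reference, frees the object on the last one.
 * Returns the number of references left.
 */
static int demo_srv_drop(struct demo_srv *srv)
{
	int left = --srv->refcount;

	if (left == 0)
		free(srv);
	return left;
}

/* Simulates "add server". <late_error> forces a failure after the check
 * reference was taken. Returns the references left (0 = fully released,
 * -1 = allocation failure) and sets <*out> only on success.
 */
static int demo_srv_add(int with_check, int late_error, struct demo_srv **out)
{
	struct demo_srv *srv = calloc(1, sizeof(*srv));

	*out = NULL;
	if (!srv)
		return -1;

	srv->refcount = 1;
	if (with_check) {
		srv->has_check = 1;
		srv->refcount++;   /* reference held by the check */
	}

	if (late_error) {
		/* release the check's reference too, otherwise the final
		 * drop would leave the object alive with refcount == 1
		 */
		if (srv->has_check)
			demo_srv_drop(srv);
		return demo_srv_drop(srv);
	}

	*out = srv;
	return srv->refcount;
}
```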

This bug is related to github issue #2733. Thanks to Chris Staite who
first found the leak.

This must be backported up to 2.6.
2024-10-24 11:35:57 +02:00
Valentine Krasnobaeva
ddb829bb51 MINOR: mworker/cli: split mworker_cli_proxy_create
There are two parts in mworker_cli_proxy_create(): allocating and setting up
MASTER proxy and allocating and setting up servers on ipc_fd[0] of the
sockpairs shared with workers.

So, let's split mworker_cli_proxy_create() into two functions respectively.
Each of them takes **errmsg as an argument to write an error message, which may
be triggered by some subcalls. The content of this errmsg will allow to extend
the final alert message shown to user, if these new functions will fail.

The main goal of this split is to allow these two parts to be moved
independently in the future and to make the code of haproxy initialization in
haproxy.c more transparent.
2024-10-24 11:32:20 +02:00
Valentine Krasnobaeva
a0d727e069 CLEANUP: mworker: clean mworker_reexec
Before refactoring master-worker architecture, resources to setup master CLI
for the new worker process (shared sockpair, entry in proc_list) were created
in init() before parsing the configuration and binding listening sockets. So,
during its re-exec the master had to clean up the new worker's resources in
case it failed at some initialization step before the fork.

Now fork happens very early and worker parses its configuration by itself. If
it fails during the initialization stage, all cleanups (deleting the fds of
the shared sockpair, proc_list cleanup) are performed in the SIGCHLD handler upon
catching the SIGCHLD corresponding to this new worker. So, there is no longer
any need to call mworker_cleanup_proc() in mworker_reexec().

As for mworker_cleanlisteners(), there is no longer need to call this function.
Master parses now only "global" and "program" sections, so it allocates only
MASTER proxy, which is stopped in mworker_reexec() by mworker_cli_proxy_stop().

Let's keep the definitions of mworker_cleanlisteners() and
mworker_cleanup_proc() in mworker.c for the moment. We may reuse parts of its
code later.
2024-10-24 11:32:20 +02:00
Valentine Krasnobaeva
4db0f69527 BUG/MINOR: mworker: show worker warnings in startup logs
As master-worker fork happens now at early init stage and worker then parses
its configuration and performs all initialization steps, let's duplicate
startup logs ring for it, just before the moment when it enters in its polling
loop. Startup logs ring content is shown as an output of the "reload" master
CLI command and we should be able to dump here worker initialization logs.

Log messages are written in startup logs ring only, when mode MODE_STARTING is
set (see print_message()). So, to be able to keep in startup logs the last
worker alerts, let's withdraw MODE_STARTING and let's reset user messages
context respectively just before entering in polling loop.

This fix does not need to be backported as it is a part of previous patches
from this version, which refactor master-worker architecture.
2024-10-24 11:32:20 +02:00
Valentine Krasnobaeva
5ee266b745 MINOR: error: simplify startup_logs_init_shm
This patch simplifies the code of startup_logs_init_shm(). We no longer re-exec
master process twice after each reload to free its unused memory, which it had
to allocate, because it has parsed all configuration sections. So, there is no
longer need to keep SHM fd opened between the first and the next reloads. We
can completely remove HAPROXY_STARTUPLOGS_FD.

In step_init_1() we continue to call startup_logs_init_shm() to open SHM and to
allocate startup logs ring area within it. In master-worker mode, worker
duplicates initial startup logs ring after sending its READY state to master.
Sharing the same ring between two processes until the worker finishes its
initialization allows to show at master CLI output worker's startup logs.

During the next reload master process should free the memory allocated for the
ring structure. Then after the execvp() it will reopen and map SHM area again
and it will reallocate again the ring structure.
2024-10-24 11:32:20 +02:00
Valentine Krasnobaeva
e9c8e0efc9 MINOR: mworker: stop MASTER proxy listener on worker mcli sockpair
After sending its "READY" status worker should not keep the access
to MASTER proxy, thus, it shouldn't be able to send any other commands further
to master process.

To achieve this, let's stop in master context master CLI listener attached on
the sockpair shared with worker. We do this just after receiving the worker's
status message.
2024-10-24 11:32:20 +02:00
Valentine Krasnobaeva
3a5b28e00c BUG/MINOR: mworker/cli: show master startup logs in recovery mode
When the master enters recovery mode after an unsuccessful reload,
HAPROXY_LOAD_SUCCESS should be set to 0. This way,
cli_io_handler_show_cli_sock() can dump to the master CLI the warnings and
alerts saved in the startup logs ring.

No need to backport this fix, as this is related to the previous patches in
this version to refactor master-worker architecture.
2024-10-24 11:32:20 +02:00
Willy Tarreau
401fb0e87a MINOR: activity/memprofile: show per-DSO stats
On systems where many libs are loaded, it's hard to track suspected
leaks. Having a per-DSO summary makes it more convenient. That's what
we're doing here by summarizing all calls per DSO before showing the
total.
2024-10-24 10:49:21 +02:00
Christopher Faulet
c91745e3a4 BUG/MINOR: mux-h1: Fix conditions on pipe in some COUNT_IF()
The previous commit contains a bug in some COUNT_IF() relying on the pipe
inside the IOBUF. We must take care to have a pipe before checking its size.

No backport needed.
2024-10-24 09:50:16 +02:00
Christopher Faulet
7e60928c9c DEBUG: mux-h1: Add debug counters to track errors with in/out pending data
Debug counters were added on all connection errors when pending data remain
blocked in the input or output buffers. The same is performed when the H1C is
released, when the connection is closed and when a timeout is reached. Idea
is to be able to count all cases where data are lost, especially the
outgoing ones.
2024-10-24 08:18:55 +02:00
Willy Tarreau
1eb31d30fe Revert "OPTIM: mux-h2: make h2_send() report more accurate wake up conditions"
This reverts commit 9fbc01710a313968c90e72537a5906432f438062.

In 3.1-dev10, commit 9fbc01710a ("OPTIM: mux-h2: make h2_send() report
more accurate wake up conditions") leveraged the more accurate distinction
between demux and recv to decide when to wake the tasklet up after a send.
But other cases are needed. When we just need to wake the processing task
up so that it itself wakes up other streams, for example because these ones
are blocked. Indeed, a temporarily blocked stream may block other ones,
which will never be woken up if the demux has nothing to do.

In an ideal world we would check all cases where blocking flags were
dropped. However it looks like this case after a send is probably the
only one that deserves waking up the connection again. It's likely that
in practice the MUX_MFULL flag was dropped and that it was that one that
was blocking the send.

In addition, dealing with these cases was not sufficient, as one case was
encountered where dbuf was empty, subs=0, short_read still present while
in FRH state... and the timeouts were still there (easily found with
halog -tcn cD at a rate of 1-2 every 2 minutes roughly).

Interestingly, in a dump, some MBUF_HAS_DATA were seen on an empty mbuf,
so it means that certain conditions must be taken very carefully in the
wakeup conditions.

So overall this indicates that there remain subtle inconsistencies that
this optimization is sensitive to. It may have to be revisited later but
for now better revert it.

No backport is needed.

Annex:
  - first dump showing a dependency on WAIT_INLIST after h2_send():

    0x6dc2800: [23/Oct/2024:18:07:22.861247] id=1696 proto=tcpv4
      flags=0x100c4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x597a900, pend_pos=(nil) waiting=0 epoch=0
      frontend=public (id=2 mode=http), listener=SSL (id=5)
      backend=gitweb-haproxy (id=6 mode=http)
      task=0x6e1d090 (state=0x00 nice=0 calls=23 rate=0 exp=2s tid=0(1/0) age=57s)
      txn=0x6e3f7c0 flags=0x43000 meth=1 status=200 req.st=MSG_DONE rsp.st=MSG_DATA req.f=0x4c rsp.f=0x2e
      scf=0x6dc33a0 flags=0x00002482 ioto=1m state=EST endp=CONN,0x6dc6c20,0x40405001 sub=3 rex=<NEVER> wex=3s rto=3s wto=3s
        iobuf.flags=0x00000000 .pipe=0 .buf=0@(nil)+0/0
          h2s=0x6dc6c20 h2s.id=59 .st=HCR .flg=0x7001 .rxwin=32712 .rxbuf.c=0 .t=0@(nil)+0/0 .h=0@(nil)+0/0
           .sc=0x6dc33a0(.flg=0x00002482 .app=0x6dc2800) .sd=0x6e83fd0(.flg=0x40405001)
           .subs=0x6dc33b8(ev=3 tl=0x6e22a20 tl.calls=10 tl.ctx=0x6dc33a0 tl.fct=sc_conn_io_cb)
           h2c=0x6e66570 h2c.st0=FRH .err=0 .maxid=77 .lastid=-1 .flg=0x2000e00 .nbst=2 .nbsc=2 .nbrcv=0 .glitches=0
           .fctl_cnt=0 .send_cnt=2 .tree_cnt=2 .orph_cnt=0 .sub=1 .dsi=77 .dbuf=0@(nil)+0/0
           .mbuf=[4..4|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=0x6dbdc60 .exp=<NEVER>
          co0=0x7f84881614b0 ctrl=tcpv4 xprt=SSL mux=H2 data=STRM target=LISTENER:0x2acb7c0
          flags=0x80000300 fd=19 fd.state=121 updt=0 fd.tmask=0x1
      scb=0x2a8da90 flags=0x00001211 ioto=1m state=EST endp=CONN,0x6e5a530,0x106c0001 sub=0 rex=<NEVER> wex=<NEVER> rto=3s wto=<NEVER>
        iobuf.flags=0x00000000 .pipe=0 .buf=0@(nil)+0/0
          h1s=0x6e5a530 h1s.flg=0x14094 .sd.flg=0x106c0001 .req.state=MSG_DONE .res.state=MSG_DATA
           .meth=GET status=200 .sd.flg=0x106c0001 .sc.flg=0x00001211 .sc.app=0x6dc2800 .subs=(nil)
           h1c=0x7f84880f5f40 h1c.flg=0x80000020 .sub=0 .ibuf=32704@0x6ddef30+16262/32768 .obuf=0@(nil)+0/0 .task=0x6e131d0 .exp=<NEVER>
          co1=0x7f8488172b70 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x597a900
          flags=0x00000300 fd=31 fd.state=10122 updt=0 fd.tmask=0x1
      filters={0x6e49f30="cache store filter", 0x6e67ad0="compression filter"}
      req=0x6dc2828 (f=0x21840000 an=0x48000 tofwd=0 total=224)
          an_exp=<NEVER> buf=0x6dc2830 data=(nil) o=0 p=0 i=0 size=0
          htx=0x104d2c0 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
      res=0x6dc2870 (f=0xa0040000 an=0x24000000 tofwd=0 total=309982)
          an_exp=<NEVER> buf=0x6dc2878 data=0x6dceef0 o=16333 p=16333 i=16435 size=32768
          htx=0x6dceef0 flags=0x0 size=32720 data=16333 used=1 wrap=NO extra=0
      -----------------------------------
      strm.flg       0x100c4a  SF_SRV_REUSED SF_HTX SF_REDIRECTABLE SF_CURR_SESS SF_BE_ASSIGNED SF_ASSIGNED
      task.state            0  0
      txn.meth              1  GET
      txn.flg         0x43000  TX_NOT_FIRST TX_CACHE_COOK TX_CACHEABLE
      txn.req.flg        0x4c  HTTP_MSGF_BODYLESS HTTP_MSGF_VER_11 HTTP_MSGF_XFER_LEN
      txn.rsp.flg        0x2e  HTTP_MSGF_COMPRESSING HTTP_MSGF_VER_11 HTTP_MSGF_XFER_LEN HTTP_MSGF_TE_CHNK
      f.sc.flg         0x2482  SC_FL_SND_EXP_MORE SC_FL_RCV_ONCE SC_FL_WONT_READ SC_FL_EOI
      f.sc.sd.flg  0x40405001  SE_FL_HAVE_NO_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_EOI SE_FL_NOT_FIRST SE_FL_T_MUX
      f.h2s.flg        0x7001  H2_SF_HEADERS_RCVD H2_SF_OUTGOING_DATA H2_SF_HEADERS_SENT H2_SF_ES_RCVD
      f.h2s.sd.flg 0x40405001  SE_FL_HAVE_NO_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_EOI SE_FL_NOT_FIRST SE_FL_T_MUX
      f.h2c.flg     0x2000e00  H2_CF_MBUF_HAS_DATA H2_CF_DEM_IN_PROGRESS H2_CF_DEM_SHORT_READ H2_CF_WAIT_INLIST
      f.co.flg     0x80000300  CO_FL_XPRT_TRACKED CO_FL_XPRT_READY CO_FL_CTRL_READY
      f.co.fd.st        0x121  FD_POLL_IN FD_EV_READY_W FD_EV_ACTIVE_R
      b.sc.flg         0x1211  SC_FL_SND_NEVERWAIT SC_FL_NEED_ROOM SC_FL_NOHALF SC_FL_ISBACK
      b.sc.sd.flg  0x106c0001  SE_FL_WAIT_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_MAY_FASTFWD_PROD SE_FL_WANT_ROOM SE_FL_RCV_MORE SE_FL_T_MUX
      b.h1s.sd.flg 0x106c0001  SE_FL_WAIT_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_MAY_FASTFWD_PROD SE_FL_WANT_ROOM SE_FL_RCV_MORE SE_FL_T_MUX
      b.h1s.flg       0x14094  H1S_F_HAVE_CLEN H1S_F_HAVE_O_CONN H1S_F_NOT_FIRST H1S_F_WANT_KAL H1S_F_RX_CONGESTED
      b.h1c.flg    0x80000020  H1C_F_IS_BACK H1C_F_IN_FULL
      b.co.flg          0x300  CO_FL_XPRT_READY CO_FL_CTRL_READY
      b.co.fd.st       0x278a  FD_POLL_OUT FD_POLL_PRI FD_POLL_IN FD_EV_ERR_RW FD_EV_READY_R 0x2008
      req.flg      0x21840000  CF_FLT_ANALYZE CF_DONT_READ CF_AUTO_CONNECT CF_WROTE_DATA
      req.ana         0x48000  AN_REQ_FLT_END AN_REQ_HTTP_XFER_BODY
      req.htx.flg           0  0
      res.flg      0xa0040000  CF_ISRESP CF_FLT_ANALYZE CF_WROTE_DATA
      res.ana      0x24000000  AN_RES_FLT_END AN_RES_HTTP_XFER_BODY
      res.htx.flg           0  0
      -----------------------------------

  - second example of stuck connection after properly checking for WAIT_INLIST
    as well:

    0x73438d0: [23/Oct/2024:18:46:57.235709] id=3963 proto=tcpv4
      flags=0x100c4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x5dd3f50, pend_pos=(nil) waiting=0 epoch=0x13
      p_stc=25 p_req=29 p_res=29 p_prp=29
      frontend=public (id=2 mode=http), listener=SSL (id=5)
      backend=gitweb-haproxy (id=6 mode=http)
      task=0x72a13e0 (state=0x00 nice=0 calls=24 rate=0 exp=7s tid=0(1/0) age=53s)
      txn=0x7287260 flags=0x43000 meth=1 status=200 req.st=MSG_DONE rsp.st=MSG_DATA req.f=0x4c rsp.f=0x2e
      scf=0x729e520 flags=0x00042082 ioto=1m state=EST endp=CONN,0x737ffd0,0x4040d001 sub=2 rex=<NEVER> wex=46s rto=46s wto=46s
        iobuf.flags=0x00000000 .pipe=0 .buf=0@(nil)+0/0
          h2s=0x737ffd0 h2s.id=57 .st=HCR .flg=0x7001 .rxwin=32712 .rxbuf.c=0 .t=0@(nil)+0/0 .h=0@(nil)+0/0
           .sc=0x729e520(.flg=0x00042082 .app=0x73438d0) .sd=0x72afd50(.flg=0x4040d001)
           .subs=0x729e538(ev=2 tl=0x72af760 tl.calls=10 tl.ctx=0x729e520 tl.fct=sc_conn_io_cb)
           h2c=0x72555a0 h2c.st0=FRH .err=0 .maxid=77 .lastid=-1 .flg=0x60e00 .nbst=1 .nbsc=1 .nbrcv=0 .glitches=0
           .fctl_cnt=0 .send_cnt=1 .tree_cnt=1 .orph_cnt=0 .sub=0 .dsi=77 .dbuf=0@(nil)+0/0
           .mbuf=[2..2|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=0x725e660 .exp=<NEVER>
          co0=0x7378e00 ctrl=tcpv4 xprt=SSL mux=H2 data=STRM target=LISTENER:0x2f24800
          flags=0x80040300 fd=23 fd.state=1122 updt=0 fd.tmask=0x1
      scb=0x2ee74c0 flags=0x00001211 ioto=1m state=EST endp=CONN,0x7287190,0x106c0001 sub=0 rex=<NEVER> wex=<NEVER> rto=46s wto=<NEVER>
        iobuf.flags=0x00000000 .pipe=0 .buf=0@(nil)+0/0
          h1s=0x7287190 h1s.flg=0x14094 .sd.flg=0x106c0001 .req.state=MSG_DONE .res.state=MSG_DATA
           .meth=GET status=200 .sd.flg=0x106c0001 .sc.flg=0x00001211 .sc.app=0x73438d0 .subs=(nil)
           h1c=0x7373920 h1c.flg=0x80000020 .sub=0 .ibuf=32704@0x7272700+318/32768 .obuf=0@(nil)+0/0 .task=0x729e700 .exp=<NEVER>
          co1=0x72f5290 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x5dd3f50
          flags=0x00000300 fd=19 fd.state=10122 updt=0 fd.tmask=0x1
      filters={0x728f1f0="cache store filter" [3], 0x728fea0="compression filter" [28]}
      req=0x73438f8 (f=0x21840000 an=0x48000 tofwd=0 total=224)
          an_exp=<NEVER> buf=0x7343900 data=(nil) o=0 p=0 i=0 size=0
          htx=0x105f440 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
      res=0x7343940 (f=0xa0040000 an=0x24000000 tofwd=0 total=359574)
          an_exp=<NEVER> buf=0x7343948 data=0x72b1b30 o=16333 p=16333 i=16435 size=32768
          htx=0x72b1b30 flags=0x8 size=32720 data=16333 used=1 wrap=NO extra=0
      -----------------------------------
      strm.flg       0x100c4a  SF_SRV_REUSED SF_HTX SF_REDIRECTABLE SF_CURR_SESS SF_BE_ASSIGNED SF_ASSIGNED
      task.state            0  0
      txn.meth              1  GET
      txn.flg         0x43000  TX_NOT_FIRST TX_CACHE_COOK TX_CACHEABLE
      txn.req.flg        0x4c  HTTP_MSGF_BODYLESS HTTP_MSGF_VER_11 HTTP_MSGF_XFER_LEN
      txn.rsp.flg        0x2e  HTTP_MSGF_COMPRESSING HTTP_MSGF_VER_11 HTTP_MSGF_XFER_LEN HTTP_MSGF_TE_CHNK
      f.sc.flg        0x42082  SC_FL_EOS SC_FL_SND_EXP_MORE SC_FL_WONT_READ SC_FL_EOI
      f.sc.sd.flg  0x4040d001  SE_FL_HAVE_NO_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_EOS SE_FL_EOI SE_FL_NOT_FIRST SE_FL_T_MUX
      f.h2s.flg        0x7001  H2_SF_HEADERS_RCVD H2_SF_OUTGOING_DATA H2_SF_HEADERS_SENT H2_SF_ES_RCVD
      f.h2s.sd.flg 0x4040d001  SE_FL_HAVE_NO_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_EOS SE_FL_EOI SE_FL_NOT_FIRST SE_FL_T_MUX
      f.h2c.flg       0x60e00  H2_CF_END_REACHED H2_CF_RCVD_SHUT H2_CF_MBUF_HAS_DATA H2_CF_DEM_IN_PROGRESS H2_CF_DEM_SHORT_READ
      f.co.flg     0x80040300  CO_FL_XPRT_TRACKED CO_FL_SOCK_RD_SH CO_FL_XPRT_READY CO_FL_CTRL_READY
      f.co.fd.st       0x1122  FD_POLL_HUP FD_POLL_IN FD_EV_READY_W FD_EV_READY_R
      b.sc.flg         0x1211  SC_FL_SND_NEVERWAIT SC_FL_NEED_ROOM SC_FL_NOHALF SC_FL_ISBACK
      b.sc.sd.flg  0x106c0001  SE_FL_WAIT_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_MAY_FASTFWD_PROD SE_FL_WANT_ROOM SE_FL_RCV_MORE SE_FL_T_MUX
2024-10-23 19:17:10 +02:00
Willy Tarreau
a1d0e58b06 BUILD: spoe: fix build warning on older gcc around sub-struct initialization
gcc-4.8 is unhappy with the cfg_file initialization:

  src/flt_spoe.c: In function 'parse_spoe_flt':
  src/flt_spoe.c:2202:9: warning: missing braces around initializer [-Wmissing-braces]
    struct cfgfile      cfg_file = {0};
         ^
  src/flt_spoe.c:2202:9: warning: (near initialization for 'cfg_file.list') [-Wmissing-braces]

This is due to the embedded list member. Initializing it to empty like
we do almost everywhere else makes it happy. No backport is needed as
this was changed in 3.1-dev5 only.
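
The warning can be illustrated with a minimal sketch. The struct names
below are hypothetical stand-ins, not the real definitions from
flt_spoe.c, and the empty initializer is a GNU extension (standardized in
C23):

```c
#include <stddef.h>

/* Hypothetical stand-in for an embedded list head. */
struct list_sketch {
	struct list_sketch *n, *p;
};

/* Hypothetical stand-in for a struct whose first member is a sub-struct. */
struct cfgfile_sketch {
	struct list_sketch list;
	const char *filename;
	int size;
};

/* Returns 1 if an empty-initialized struct has all members zeroed. */
static int cfgfile_is_blank(void)
{
	/* "= {0}" is valid C but makes old gcc emit -Wmissing-braces because
	 * the first member is itself a struct. The empty initializer zeroes
	 * every member and does not trigger the warning.
	 */
	struct cfgfile_sketch cfg_file = { };

	return cfg_file.list.n == NULL && cfg_file.list.p == NULL &&
	       cfg_file.filename == NULL && cfg_file.size == 0;
}
```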
2024-10-23 15:12:59 +02:00
Aurelien DARRAGON
b5b40a9843 BUG/MEDIUM: connection/http-reuse: fix address collision on unhandled address families
As described in GH #2765, there were situations where http connections
would be re-used for requests to different endpoints, which is obviously
unexpected. In GH #2765, this occurred with httpclient and UNIX socket
combination, but later code analysis revealed that while disabling http
reuse on httpclient proxy helped, it didn't fix the underlying issue since
it was found that conn_calculate_hash_sockaddr() didn't take into account
families such as AF_UNIX or AF_CUST_SOCKPAIR, and because of that the
sock_addr part of the connection wasn't hashed.

To properly fix the issue, let's explicitly handle UNIX (both regular and
ABNS) and AF_CUST_SOCKPAIR families, so that the destination address is
properly hashed. To prevent this bug from re-appearing: when the family
isn't known, instead of doing nothing like before, let's fall back to a
generic (suboptimal) hashing which hashes the whole sockaddr_storage struct.
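
The approach can be sketched as follows. This is only an illustration of
the per-family hashing with a generic fallback, not the code of
conn_calculate_hash_sockaddr(): the function names are hypothetical and
FNV-1a is used as a stand-in hash. AF_CUST_SOCKPAIR is HAProxy-specific,
so in this sketch it would simply take the fallback path:

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/* FNV-1a, used here only as a stand-in hash function. */
static uint64_t fnv1a_sketch(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Hashes the address-specific part of <ss> according to its family. */
static uint64_t hash_sockaddr_sketch(const struct sockaddr_storage *ss)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	h = fnv1a_sketch(h, &ss->ss_family, sizeof(ss->ss_family));
	switch (ss->ss_family) {
	case AF_INET: {
		const struct sockaddr_in *in = (const struct sockaddr_in *)ss;

		h = fnv1a_sketch(h, &in->sin_addr, sizeof(in->sin_addr));
		h = fnv1a_sketch(h, &in->sin_port, sizeof(in->sin_port));
		break;
	}
	case AF_UNIX: {
		const struct sockaddr_un *un = (const struct sockaddr_un *)ss;

		/* Hash the whole path array so that abstract names, which
		 * start with a NUL byte, are covered as well.
		 */
		h = fnv1a_sketch(h, un->sun_path, sizeof(un->sun_path));
		break;
	}
	default:
		/* Unknown family: hash everything rather than nothing. */
		h = fnv1a_sketch(h, ss, sizeof(*ss));
		break;
	}
	return h;
}
```

With such a fallback, two destinations of an unhandled family can no
longer share a hash just because their address part was ignored.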

As a workaround, http-reuse may be disabled on impacted proxies.
(unfortunately this doesn't help for httpclient since reuse policy
defaults to safe and cannot be modified from the config)

It should be backported to all stable versions.

Shout out to @christopherhibbert for having reported the issue and
provided a trivial reproducer.

[ada: prior to 3.0, ctx adjt is required because conn_hash_update()'s
 prototype is slightly different]
2024-10-23 11:48:16 +02:00
Willy Tarreau
b74fb1325e MINOR: sample: add the "when" converter to condition some expressions
Sometimes it would be desirable to include some debugging output only
under certain conditions, but the end of the transfer is too late to
apply some rules.

Here we take the approach of making a converter ("when") that takes a
condition among an arbitrary list, and decides whether or not to let
the input sample pass through or not based on the condition. This
allows for example to log debugging information only when an error
was encountered during the processing (sort of an extension of
dontlog-normal). The conditions are quite limited (stopping, error,
normal, toapplet, forwarded, processed) and can be negated. The
converter can also be chained to use more complex conditions.

A suggested example will be:

    # log "dbg={-}" when fine, or "dbg={... debug info ...}" on error:
    log-format "$HAPROXY_HTTP_LOG_FMT dbg={%[bs.debug_str,when(!normal)]}"
2024-10-22 20:13:00 +02:00
Willy Tarreau
19e4ec43b9 MINOR: filters: add per-filter call counters
The idea here is to record how many times a filter is being called on a
stream. We're incrementing the same counter all along, regardless of the
type of event, since the purpose is essentially to detect one that might
be misbehaving. The number of calls is reported in "show sess all" next
to the filter name. It may also help detect suboptimal processing. For
example compressing 1GB shows 138k calls to the compression filter, which
is roughly two calls per buffer. Maybe we wake up with incomplete buffers
and compress less. That's left for a future analysis.
2024-10-22 20:13:00 +02:00
Willy Tarreau
37d5c6fe3a MINOR: stream: maintain per-stream counters of the number of passes on code
process_stream() is a complex function and a few times some loops were
either witnessed or suspected. Each time this happens it's extremely
difficult to figure out why because it involves combinations of analysers,
filters, errors etc.

Let's at least maintain a set of 4 counters per stream that report the
number of times we've been through each of the 4 most important blocks
(stconn changes, request analysers, response analysers, and propagation
of changes down). These ones are stored in the stream and reported in
"show sess all", just like they will be reported in panic dumps.
2024-10-22 20:13:00 +02:00
Christopher Faulet
ce314cfb39 MINOR: mux-h1: Add support of the debug string for logs
Now it is possible to have info about front and back H1 multiplexer. For instance:

    <134>Oct 22 18:10:46 haproxy[3841864]: 127.0.0.1:44280 [22/Oct/2024:18:10:43.265] front-http back-http/www 0/0/-1/-1/3082 503 217 - - SC-- 1/1/0/0/3 0/0 "GET / HTTP/1.1" fs=< h1s=0x13b6f10 h1s.flg=0x14010 .sd.flg=0x50404601 .req.state=MSG_DONE .res.state=MSG_DONE .meth=GET status=503 .sd.flg=0x50404601 .sc.flg=0x00034482 .sc.app=0x11e4c30 .subs=(nil) h1c.flg=0x0 .sub=0 .ibuf
=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x1337d10 .exp=<NEVER> conn.flg=0x80000300> bs=< h1s=0x13bb400 h1s.flg=0x100010 .sd.flg=0x10400001 .req.state=MSG_RQBEFORE .res.state=MSG_RPBEFORE .meth=UNKNOWN status=0 .sd.flg=0x10400001 .sc.flg=0x0003c007 .sc.app=0x11e4c30 .subs=(nil) h1c.flg=0x80000000 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x12ba610 .exp=<NEVER> conn.flg=0x5c0300>

To have this log message, the log-format must be set to:

  log-format "$HAPROXY_HTTP_LOG_FMT fs=<%[fs.debug_str]> bs=<%[bs.debug_str]>"
2024-10-22 18:21:28 +02:00
Christopher Faulet
35ab9b8c6d DEBUG: mux-h1: Add debug counters to track some errors
Debug counters are added to track errors about a wrong payload length
during the message formatting (on the sending path). Aborts are also
concerned. Connection shutdowns and errors while the end of the message was
not reached are now tracked. On the sending path, shutdowns performed while
the whole message was not forwarded are tracked too.
2024-10-22 17:39:32 +02:00
Christopher Faulet
c8aecc393b DEBUG: stream: Add debug counters to track some client/server aborts
Not all aborts are tracked for now but only those a bit ambiguous. Mainly,
aborts during the data forwarding are concerned. Those triggered during the
request or the response analysis are easier to analyze with the stream
termination state.
2024-10-22 16:46:37 +02:00
Christopher Faulet
19b736a5fb CLEANUP: stream: remove outdated comments
Comments added during a refactoring session were still there while they are
now totally useless. So let's remove them.
2024-10-22 16:14:15 +02:00
Christopher Faulet
7dc930d231 BUG/MINOR: stconn: Pretend the SE have more data to deliver on abortonclose
When the abortonclose option is enabled on the backend, at the SC level, we
must still pretend the SE has more data to deliver to be able to receive the
EOS. It must be performed at 2 places:

  * When the backend is set and the connection is requested. It is when the
    option is seen for the first time.

  * After a receive attempt, if the EOI flag is set on the sedesc.

Otherwise, when an abort is detected by the mux, the SC is not
notified.

This patch should fix the issue #2764.

This bug probably exists in all stable versions but is only visible since
bca5e1423 ("OPTIM: stconn: Don't pretend mux have more data to deliver on
EOI/EOS/ERROR"). So I suggest not backporting it for now, unless the commit
above is backported.
2024-10-22 11:16:24 +02:00
Christopher Faulet
ded28f6e5c BUG/MEDIUM: mux-h2: Remove H2S from send list if data are sent via 0-copy FF
When data are sent via the zero-copy data forwarding, in h2_done_ff, we must
be sure to remove the H2 stream from the send list if something is sent. It
was only performed if no blocking condition was encountered. But we must
also do it if something is sent. Otherwise the transfer may be blocked till
timeout.

This patch must be backported as far as 2.9.
2024-10-22 08:00:32 +02:00
Christopher Faulet
529e4f36a3 BUG/MEDIUM: stats-html: Never dump more data than expected during 0-copy FF
During zero-copy data forwarding, the caller specifies the maximum amount
of data the producer may push. However, the HTML stats applet does not use
it and can fill all the free space in the buffer. This is especially an issue
when the consumer is limited by flow control, as with H2, because we may
emit DATA frames that are too large in this case. It is especially visible
with big buffers (for instance 32k).

In the early days of zero-copy data forwarding, the caller was responsible for
passing a properly resized buffer. During the different refactoring steps,
this changed but the HTML stats applet was not updated accordingly.

To fix the bug, the buffer used to dump the HTML page is resized to be sure
not too much data are dumped.
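
The principle can be sketched as follows. This is a simplified
illustration under assumed names (produce_limited() and its parameters are
hypothetical), not the stats applet code: the producer honors the
caller's limit even when the buffer has more free space:

```c
#include <stddef.h>
#include <string.h>

/* Copies data from <src> (<len> bytes available) into <dst>, which has
 * <room> bytes free, without ever exceeding <count>, the maximum amount
 * the caller allowed. Returns the number of bytes produced.
 */
static size_t produce_limited(char *dst, size_t room,
                              const char *src, size_t len, size_t count)
{
	size_t max = room < count ? room : count;
	size_t n = len < max ? len : max;

	memcpy(dst, src, n);
	return n;
}
```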

This patch should solve the issue #2757. It must be backported to 3.0.
2024-10-22 08:00:32 +02:00
Willy Tarreau
f2c415cec1 MINOR: debug: add "debug dev counters" to list code counters
Issuing "debug dev counters" on the CLI will now scan all existing
counters, and report their count, type, location, function name, the
condition and an optional comment passed to the macro.

The command takes a number of arguments:
  - "show": this is the default, it will just list the counters
  - "reset": will reset the matching counters instead of listing them
  - "all": by default, only non-zero counters are listed. With "all",
     they are all listed
  - "bug": restrict the reset or dump to counters of type "BUG" (BUG_ON usually)
  - "chk": restrict the reset or dump to counters of type "CHK" (CHECK_IF)
  - "cnt": restrict the reset or dump to counters of type "CNT" (COUNT_IF)

The types may be combined, and the options entered in any order. Here's
an example of the output of "debug dev counters show all bug":

  Count     Type Location function(): "condition" [comment]
  0          BUG ring.h:114 ring_dup(): "max > ring_size(dst)"
  0          BUG vecpair.h:223 vp_getblk_ofs(): "ofs >= v1->len + v2->len"
  0          BUG buf.h:395 b_add(): "b->data + count > b->size"
  0          BUG buf.h:106 b_room(): "b->data > b->size"
  0          BUG task.h:328 _task_queue(): "(ulong)caller & 1"
  0          BUG task.h:324 _task_queue(): "task->tid != tid"
  0          BUG task.h:313 _task_queue(): "(ulong)caller & 1"
  (...)

This is expected to be convenient combined with the use and abuse of
COUNT_IF() at select locations.
2024-10-21 19:17:55 +02:00
Willy Tarreau
da66c42f65 MINOR: debug: add a new debug macro COUNT_IF()
This macro works exactly like BUG_ON() except that it never logs anything
nor crashes, it only implements an atomic counter that is incremented on
every call. This can be used to count a number of unlikely events that are
worth checking at run time on setups showing unusual and unreproducible
behaviors.
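
A minimal sketch of the idea, assuming C11 atomics. The macro and
function names below are hypothetical, and unlike the real COUNT_IF() the
counter is passed explicitly instead of being created per call site:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for COUNT_IF(): bumps an atomic counter when the
 * condition is true. Nothing is logged and the process never aborts.
 */
#define COUNT_IF_SKETCH(counter, cond)                                  \
	do {                                                            \
		if (cond)                                               \
			atomic_fetch_add_explicit(&(counter), 1,        \
			                          memory_order_relaxed);\
	} while (0)

static atomic_ulong empty_read_events;

/* Example call site: counts zero-length reads without disturbing the caller. */
static void on_read(int len)
{
	COUNT_IF_SKETCH(empty_read_events, len == 0);
}

/* Returns the number of times the condition matched so far. */
static unsigned long empty_read_count(void)
{
	return atomic_load_explicit(&empty_read_events, memory_order_relaxed);
}
```

A relaxed increment is enough here since the value is only read for
diagnostics and does not synchronize anything else.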
2024-10-21 19:14:07 +02:00
Willy Tarreau
776fd03509 MEDIUM: debug: add match counters for BUG_ON/WARN_ON/CHECK_IF
These macros do not always kill the process, and sometimes it would be
nice to know if some match or not, and how many times (especially for the
CHECK_IF one).

This commit adds a new section "dbg_cnt" made of structs that contain
function name, file name, line number, check type, condition and match
count. A new macro __DBG_COUNT() adds one to the counter, and is placed
inside _BUG_ON() and _BUG_ON_ONCE(). It's worth noting that the exact
type of the check is not very precise but in practice we don't care,
as most checks will cause the process to die anyway unless they're of
type _BUG_ON_ONCE() (used by CHECK_IF by default).

All of this is limited to !defined(USE_OBSOLETE_LINKER) because we're
creating a section, thus we need a modern linker to be able to scan
this section later. Doing so adds ~50kB to the executable due to the
~1266 BUG_ON() and others placed there. That's not huge in comparison
to the visibility it can provide.
2024-10-21 19:14:07 +02:00
Willy Tarreau
8844ed2009 CLEANUP: debug: make the BUG_ON() macros check the condition in the outer one
The BUG_ON() macros are made of two levels so as to resolve the condition
to a string. However this doesn't offer much flexibility for performing
other operations when the condition is validated, so let's adjust them so
that the condition is checked in the outer macro and the operations are
performed in the inner one.
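
The resulting layout can be sketched like this. The macro names are
hypothetical and the inner part only counts and prints, whereas the real
macros may dump a backtrace or abort:

```c
#include <stdio.h>

static unsigned long check_matches; /* bumped by the inner macro */

/* Inner macro: only runs once the condition has already matched, so extra
 * operations (counting, dumping, aborting) can be added here freely.
 */
#define CHECK_SKETCH_INNER(cond_str, file, line)                        \
	do {                                                            \
		check_matches++;                                        \
		fprintf(stderr, "check '%s' matched at %s:%d\n",        \
		        cond_str, file, line);                          \
	} while (0)

/* Outer macro: evaluates the condition and stringifies it for the inner one. */
#define CHECK_SKETCH(cond)                                              \
	do {                                                            \
		if (cond)                                               \
			CHECK_SKETCH_INNER(#cond, __FILE__, __LINE__);  \
	} while (0)

/* Example call site. */
static void consume(int len, int size)
{
	CHECK_SKETCH(len > size);
}
```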
2024-10-21 18:17:25 +02:00
Amaury Denoyelle
68c8c91023 BUG/MINOR: mux-quic: do not close STREAM with empty FIN if no data sent
A stream may be shut without any HTX EOM reported to report a proper
closure. This is the case for QCS instances flagged with
QC_SF_UNKNOWN_PL_LENGTH. Shut is performed with an empty FIN emission
instead of a RESET_STREAM. This has been implemented since the following
patch :

  24962dd1784dd22babc8da09a5fc8769617f89e3
  BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length

However, in case of HTTP/3, an empty FIN should only be done after a
full message is emitted, which requires at least a HEADERS frame. If an
empty FIN is emitted without it, client may interpret this as invalid
and close the connection. To prevent this, fallback to a RESET_STREAM
emission if no data were emitted on the stream.
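
The decision described above can be sketched as below. It only models the
case of a stream shut before any EOM was reported, and the enum, function
and parameter names are hypothetical rather than taken from the QUIC MUX
code:

```c
/* Possible ways to shut a stream whose end of message was never reported. */
enum shut_action_sketch {
	SHUT_EMPTY_FIN,
	SHUT_RESET_STREAM,
};

/* <unknown_pl_length> mirrors the QC_SF_UNKNOWN_PL_LENGTH flag, and
 * <bytes_emitted> is the amount of data already emitted on the stream.
 */
static enum shut_action_sketch
pick_shut_action(int unknown_pl_length, unsigned long long bytes_emitted)
{
	/* An empty FIN is only valid after something (at least a HEADERS
	 * frame in HTTP/3) was emitted. Otherwise fall back to a reset.
	 */
	if (unknown_pl_length && bytes_emitted > 0)
		return SHUT_EMPTY_FIN;
	return SHUT_RESET_STREAM;
}
```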

This was reproduced using ngtcp2-client with 10% loss (-r 0.1) on a
remote host, with httpterm request "/?s=100k&C=1&b=0&P=400". An error
ERR_H3_FRAME_UNEXPECTED is returned by ngtcp2-client when the bug
occurs.

Note that this change is incomplete. The message validity depends solely
on the application protocol in use. As such, a new app_ops callback
should be implemented to ensure the stream is closed accordingly.
However, this first patch ensures that at least HTTP/3 case is valid
while keeping a minimal backport process.

This should be backported up to 2.8.
2024-10-21 11:24:38 +02:00
Amaury Denoyelle
b200d3d80b MINOR: mux-quic: simplify sending of empty STREAM FIN
An empty STREAM frame can be emitted by QUIC MUX to notify about a
delayed FIN when there is no data left to transmit. This requires a
tedious comparison on stream offset in qmux_ctrl_send() to ensure an
empty stream frame is not always considered as retransmitted, which is
necessary to locally close the QCS instance.

Simplify this by unsubscribing from the streamdesc layer when the QCS is
locally closed on FIN transmission notification. This prevents all
future retransmitted frames from being reported to the QCS instance,
especially any potentially retransmitted empty FIN.
2024-10-21 11:21:07 +02:00
Valentine Krasnobaeva
af1d170122 BUG/MINOR: mworker: fix mworker-max-reloads parser
Before this patch, when a wrong argument was provided in the configuration for
the mworker-max-reloads keyword, the parser showed the errors below on stderr:

[WARNING]  (1820317) : config : parsing [haproxy.cfg:154] : (null)parsing [haproxy.cfg:154] : 'mworker-max-reloads' expects an integer argument.

When two arguments were provided by mistake instead of one, this also
triggered a buggy error message:

[ALERT]    (1820668) : config : parsing [haproxy.cfg:154] : 'mworker-max-reloads' cannot handle unexpected argument '45'.
[WARNING]  (1820668) : config : parsing [haproxy.cfg:154] : (null)

So, as 'mworker-max-reloads' is parsed in discovery mode by the master process,
let's now align its parser with all the others that may be called in this
mode. This way, when there are too many args or the argument isn't a valid
integer, we return proper error codes to the global section parser and
messages are formatted properly.
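
A strict single-integer keyword parser can be sketched as follows. This is
not the HAProxy parser: the function name, the error strings and the
rejection of negative values are assumptions made for the example. As in
HAProxy's config parser, the args array is assumed to be terminated by an
empty string:

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* args[0] is the keyword, args[1] its value, and args[2] must be empty.
 * Returns 0 and fills <value> on success, or -1 with <err> pointing to a
 * static message on failure.
 */
static int parse_int_keyword(char **args, int *value, const char **err)
{
	char *end;
	long v;

	if (!*args[1]) {
		*err = "expects an integer argument";
		return -1;
	}
	if (*args[2]) {
		*err = "cannot handle unexpected argument";
		return -1;
	}

	errno = 0;
	v = strtol(args[1], &end, 10);
	if (errno || *end || v < 0 || v > INT_MAX) {
		*err = "expects an integer argument";
		return -1;
	}

	*value = (int)v;
	*err = NULL;
	return 0;
}
```

Each failure path sets exactly one message and one return code, which is
what keeps the caller from printing a "(null)" string.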

This fix should be backported to all stable versions.
2024-10-21 10:46:58 +02:00
Ilya Shipitsin
8a1aabb133 CI: modernize macos builds to macos-15
macos-15 support was announced a few months ago: https://github.com/github/roadmap/issues/986
2024-10-21 07:54:38 +02:00
Ilya Shipitsin
50cf89ad5c CI: bump development builds explicitely to Ubuntu 24.04
Initially we agreed to split builds into "latest" for development branch
and fixed 22.04 for stable branches. It got broken when "latest" label migrated
from ubuntu-22 to ubuntu-24 ... because of build cache. Cache key is built using
runner label, it was not prepared to use the same "latest" cache from ubuntu 22
on ubuntu 24. To make things clear, let's stick explicitly to ubuntu 24.
2024-10-21 07:54:35 +02:00
Ilya Shipitsin
b6491ab19f CI: prepare Coverity build for Ubuntu 24
PCRE2 is recommended, PCRE was chosen for no reason. GHA Ubuntu 22 images include both libs,
but recent Ubuntu 24 does not. Let us prepare for Ubuntu 24
2024-10-21 07:54:32 +02:00
Willy Tarreau
9aa86b9dbd BUILD: mux-h2/traces: fix build on 32-bit due to size of the DATA frame
Commit cf3fe1eed ("MINOR: mux-h2/traces: print the size of the DATA
frames") added the size of the DATA frame to the traces. Unfortunately
it uses ullong instead of ulong to cast a pointer, which breaks the
build on 32-bit platforms. Let's just switch it to ulong which works
on both.
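
The portability point can be illustrated with a short sketch. The function
name is hypothetical and this is not the trace code itself:

```c
#include <stdio.h>

/* Formats pointer <p> as hexadecimal into <out>. unsigned long has the
 * same width as a pointer on both ILP32 and LP64 targets, so the cast does
 * not trigger a "cast from pointer to integer of different size" warning,
 * unlike unsigned long long on 32-bit platforms.
 */
static int format_ptr(char *out, size_t len, const void *p)
{
	return snprintf(out, len, "%#lx", (unsigned long)p);
}
```

Where available, uintptr_t from <stdint.h> expresses the same intent and
also covers targets where long is narrower than a pointer.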
2024-10-21 04:17:59 +02:00
Willy Tarreau
278b9613a3 MEDIUM: debug: on panic, make the target thread automatically allocate its buf
One main problem with panic dumps is that they're filling the dumping
thread's trash, and that the global thread_dump_buffer is too small to
catch enough of them.

Here we're proceeding differently. When dumping threads for a panic, we're
passing the magic value 0x2 as the buffer, and it will instruct the target
thread to allocate its own buffer using get_trash_chunk() (which is signal
safe), so that each thread dumps into its own buffer. Then the thread will
wait for the buffer to be consumed, and will assign its own thread_dump_buffer
to it. This way we can simply dump all threads' buffers from gdb like this:

  (gdb) set $t=0
        while ($t < global.nbthread)
          printf "%s\n", ha_thread_ctx[$t].thread_dump_buffer.area
          set $t=$t+1
        end

For now we make it wait forever since it's only called on panic and we
want to make sure the thread doesn't leave and continues to use that trash
buffer or do other nasty stuff. That way the dumping thread will make all
of them die.

This would be useful to backport to the most recent branches to help
troubleshooting. It backports well to 2.9, except for some trivial
context in tinfo-t.h for an updated comment. 2.8 and older would also
require TAINTED_PANIC. The following previous patches are required:

   MINOR: debug: make mark_tainted() return the previous value
   MINOR: chunk: drop the global thread_dump_buffer
   MINOR: debug: split ha_thread_dump() in two parts
   MINOR: debug: slightly change the thread_dump_pointer signification
   MINOR: debug: make ha_thread_dump_done() take the pointer to be used
   MINOR: debug: replace ha_thread_dump() with its two components
2024-10-19 16:01:52 +02:00
Willy Tarreau
afeac4bc02 MINOR: debug: replace ha_thread_dump() with its two components
At the few places we were calling ha_thread_dump(), now we're
calling separately ha_thread_dump_fill() and ha_thread_dump_done()
once the data are consumed.
2024-10-19 15:42:34 +02:00
Willy Tarreau
d7c34ba479 MINOR: debug: make ha_thread_dump_done() take the pointer to be used
This will allow the caller to decide whether to definitely clear the
pointer and release the thread, or to leave it unlocked so that it's
easy to analyse from the struct (the goal will be to use that in panic()
so that cores are easy to analyse).
2024-10-19 15:42:07 +02:00
Willy Tarreau
091de0f9b2 MINOR: debug: slightly change the thread_dump_pointer signification
Now the thread_dump_pointer is returned ORed with 1 once done, or NULL
when cancelled (for now no one cancels). The goal will be to permit
the callee to provide its own pointer.

The ha_thread_dump_fill() function now returns the buffer pointer that
was used (without OR 1) or NULL, for ease of use from the caller.
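
The convention of marking completion in the lowest pointer bit can be
sketched as follows. The helper names are hypothetical, and the sketch
assumes buffers are at least 2-byte aligned so that bit 0 is free:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns <buf> with its lowest bit set to signal that the dump is done. */
static inline void *dump_ptr_mark_done(void *buf)
{
	return (void *)((uintptr_t)buf | 1);
}

/* Returns non-zero if <p> carries the "done" mark. */
static inline int dump_ptr_is_done(const void *p)
{
	return (int)((uintptr_t)p & 1);
}

/* Returns the buffer pointer without the "done" mark. */
static inline void *dump_ptr_buffer(void *p)
{
	return (void *)((uintptr_t)p & ~(uintptr_t)1);
}
```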
2024-10-19 15:42:07 +02:00
Willy Tarreau
2036f5bba1 MINOR: debug: split ha_thread_dump() in two parts
We want to have a function to trigger the dump and another one to wait
for it to be completed. This will be important to permit panic dumps to
be done on local threads. For now this does not change anything, as the
function still calls the two new functions one after the other.
2024-10-19 15:42:07 +02:00
Willy Tarreau
a6698304e0 MINOR: chunk: drop the global thread_dump_buffer
This variable is not very useful and is confusing anyway. It was mostly
used to detect that a panic dump was still in progress, but we can now
check mark_tainted() for this. The pointer was set to one of the dumping
thread's trash chunks. Let's temporarily continue to copy the dumps to
that trash, we'll remove it later.
2024-10-19 15:42:00 +02:00
Willy Tarreau
8e048603d1 MINOR: debug: make mark_tainted() return the previous value
Since mark_tainted() uses atomic ops to update the tainted status, let's
make it return the prior value, which will allow the caller to detect
if it's the first one to set it or not.
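
This can be sketched with C11 atomics as below. The names are
hypothetical stand-ins, while the real mark_tainted() operates on
HAProxy's own tainted status and atomic macros:

```c
#include <stdatomic.h>

static atomic_uint tainted_sketch;

/* ORs <flag> into the status and returns the value seen before the update. */
static unsigned int mark_tainted_sketch(unsigned int flag)
{
	return atomic_fetch_or(&tainted_sketch, flag);
}

/* Returns 1 only for the single caller that set <flag> first. */
static int first_to_set(unsigned int flag)
{
	return !(mark_tainted_sketch(flag) & flag);
}
```

Since the fetch-or is a single atomic operation, exactly one of several
concurrent callers observes the flag as previously unset.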
2024-10-19 15:13:47 +02:00
Willy Tarreau
84340d108b OPTIM: buffers: avoid a useless wrapping check for ofs == 0
As mentioned in previous commit, b_peek_ofs() performs a wrapping check
but is often called with ofs == 0 as a constant. We can detect this case
with __builtin_constant_p() so it makes sense to use it. A test shows a size
reduction of about 320 bytes, which is not much, but it happens in hot code
paths, and each 16 bytes reduction indicates an eliminated conditional
branch.

Some clear winners are ci_getblk_nc() (-48 bytes), h2c_dec_hdrs (-141B),
h1_copy_msg_data (-124B), tcpcheck_spop_expect_hello (-80B),
h1_parse_msg_data (-44B). These ones will definitely benefit from doing
less conditional jumps.
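
The trick can be sketched as follows. This is a simplified illustration, not
the real buf.h code: struct demo_buf and demo_peek_ofs() are hypothetical
stand-ins, and the sketch assumes head < size and ofs <= size.

```c
#include <stddef.h>

/* Simplified ring buffer for the sketch; not HAProxy's struct buffer. */
struct demo_buf {
	char  *area;  /* storage */
	size_t size;  /* total size of <area> */
	size_t head;  /* offset of the first byte of data, always < size */
};

/* Returns the position in <area> of the byte located <ofs> bytes after
 * the head. When <ofs> is known at build time to be 0, the first part of
 * the condition folds to false and the wrapping check disappears, since
 * head alone never reaches <size>. In all other cases the check is kept,
 * so the result is the same with or without the optimization.
 * __builtin_constant_p() is provided by both GCC and Clang.
 */
static inline size_t demo_peek_ofs(const struct demo_buf *b, size_t ofs)
{
	size_t pos = b->head + ofs;

	if ((!__builtin_constant_p(ofs) || ofs) && pos >= b->size)
		pos -= b->size;
	return pos;
}
```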
2024-10-18 18:42:47 +02:00
Willy Tarreau
fca212292a CLEANUP: buffers: simplify b_get_varint()
The function is an exact copy of b_peek_varint() with ofs==0 and doing a
b_del() at the end. We can simply call that other one and delete the
contents. It turns out that the code is bigger with this change because
b_peek_varint() passes its offset to b_peek() which performs a wrapping
check. When ofs==0 the wrapping cannot happen, but there's no real way
to tell that to the compiler. Instead conditioning the if() in b_peek()
with (!__builtin_constant_p(ofs) || ofs) does the job, but it's not worth
it at the moment since we have no users of b_get_varint() for now. Let's
just stick to the simple normal code.
2024-10-18 18:28:39 +02:00
Willy Tarreau
8b5a1fd1fc BUILD: buffers: keep b_getblk_nc() and b_peek_varint() in buf.h
Some large functions were moved to buf.c by commit ac66df4e2 ("REORG:
buffers: move some of the heavy functions from buf.h to buf.c"). However,
as found by Amaury, haring doesn't build anymore. Upon close inspection,
b_getblk_nc() isn't that big since it's very much inlinable, and a part
of its apparently large size comes from the BUG_ON_HOT() that were
implemented. Regarding b_peek_varint(), it doesn't have any dependency
and is used only at 4 places in the DNS code, so its loop will not have
big impacts, and the rest around can be optimised away by the compiler
so it remains relevant to keep it inlined. Also it can serve as a base
to deduplicate the code in b_get_varint().

No backport needed.
2024-10-18 17:53:25 +02:00
Dragan Dosen
f33e9079a9 MINOR: arg: add an argument type for identifier
The ARGT_ID argument type may now be used to set a custom resolve
function in order to help resolve the argument string value. If the
custom resolve function is not set, the behavior is the same as of
type ARGT_STR.
2024-10-18 14:30:24 +02:00
Dragan Dosen
40ab88899c BUG/MINOR: sample: free err2 in smp_resolve_args for type ARGT_REG
The err2 may be leaking memory in case an error occurred as a result of
regex_comp() call.
2024-10-18 14:29:56 +02:00
Aurelien DARRAGON
9262b7109e CLEANUP: http_ext: remove useless BUG_ON() in http_handle_xot_header()
A useless BUG_ON() statement was left in a conditional block that already
checks that the condition cannot be met within the block. Remove the
useless BUG_ON().
2024-10-17 17:25:06 +02:00
Aurelien DARRAGON
d28d016f43 MINOR: http_ext: implement rfc7239_{nn,np} converters
"option forwarded" provides a convenient way to automatically insert
rfc7239 forwarded header to requests sent to servers.

On the other hand, manually crafting the header is quite complicated due
to specific formatting rules that must be followed as per rfc7239.
However, sometimes it may be necessary to craft the header manually, for
instance if it has to be conditional or based on parameters that "option
forwarded" doesn't provide. To ease this task, in this patch we implement
rfc7239_nn and rfc7239_np which are respectively meant to craft nodename:
nodeport values, specifically intended to manually build rfc7239 'for'
and 'by' header fields while ensuring rfc7239 compliance.

Example:
  # build RFC-compliant 7239 header:
  http-request set-var-fmt(txn.forwarded) "for=\"%[ipv6(::1),rfc7239_nn]:%[str(8888),rfc7239_np]\";host=\"haproxy.org\";proto=http"
  # check RFC-compliancy:
  http-request set-var(txn.test) "var(txn.forwarded),debug(ok,stderr),rfc7239_is_valid,debug(ok,stderr)"
  #  stderr output:
  #    [debug] ok: type=str <for="[::1]:_8888";host="haproxy.org";proto=http>
  #    [debug] ok: type=bool <1>

See documentation for more info and examples.
2024-10-17 17:24:58 +02:00
Aurelien DARRAGON
45cbbdc845 DOC: config: fix rfc7239 forwarded typo in desc
Replace specicy with specify in rfc7239 forwarded option description.
Multiple occurrences were found.

May be backported in 2.8.
2024-10-17 17:24:51 +02:00
Frederic Lecaille
b1af5dabf0 BUG/MEDIUM: quic: avoid freezing 0RTT connections
This issue came with this commit:

	f627b92 BUG/MEDIUM: quic: always validate sender address on 0-RTT

and could be easily reproduced with picoquic QUIC client with -Q option
which splits a big ClientHello TLS message into two Initial datagrams.
A second condition must be fulfilled to reproduce this issue: picoquic
must not send the token provided by haproxy (NEW_TOKEN). To do that,
haproxy must be patched to prevent it from sending such tokens.

Under these conditions, if haproxy has enough time to reply to the first Initial
datagram, when it receives the second Initial datagram it sends a Retry packet.
Then the client ignores the Retry packet as mentioned by RFC 9000:

 17.2.5.2. Handling a Retry Packet
    A client MUST accept and process at most one Retry packet for each connection
    attempt. After the client has received and processed an Initial or Retry packet
    from the server, it MUST discard any subsequent Retry packets that it receives.

On its side, haproxy has closed the connection. When it receives the second
Initial datagram, it opens a new connection but with Initial packets it
cannot decrypt (wrong ODCID) leaving the client without response.

To fix this, as the aim of the token (NEW_TOKEN) sent by haproxy is to validate
the peer address, in place of closing the connection when no token was received
for a 0RTT connection, one leaves this validation to the handshake process.
Indeed, the peer address is validated during the handshake when a valid handshake
packet is received by the listener. But as one does not want haproxy to process
0RTT data when no token was received, one does not accept the connection before
the successful handshake completion. In addition to this, the 0RTT packets
are not released after successful handshake completion when no token was received
to leave a chance to haproxy to process these 0RTT data in such case (see
quic_conn_io_cb()).

Must be backported as far as 2.9.
2024-10-17 15:04:06 +02:00
Frederic Lecaille
c7f14a38f5 MINOR: quic: send new tokens (NEW_TOKEN) even for 1RTT sessions
Tokens are sent when opening a connection, just after the handshake, to
be possibly reused by the peer for the next connection. They are used
to validate the peer address during the 0RTT connection openings.
But there is no reason to reserve this feature to 0RTT connections.
This patch modifies quic_build_post_handshake_frames() to do so.
2024-10-17 15:04:06 +02:00
Frederic Lecaille
19aa320f64 BUG/MINOR: quic: avoid leaking post handshake frames
This bug came with this commit:
	f627b92 BUG/MEDIUM: quic: always validate sender address on 0-RTT
If an error happens in quic_build_post_handshake_frames() during the
code executed for the NEW_TOKEN frame allocation, some frames could leak because
of the wrong label used to interrupt this function asap.
Replace the "goto leave" by "goto err" to deallocate such frames to fix
this issue.
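
The pattern at stake can be sketched as below. This is only an illustration
with hypothetical demo_* names and a simulated failure, not the code of
quic_build_post_handshake_frames(): frames already allocated on a local list
must be released on the error path, which an early exit through a label
that skips the cleanup would not do.

```c
#include <stdlib.h>

struct demo_frm {
	struct demo_frm *next;
};

static int demo_live_frames; /* frames currently allocated, for the demo */

/* Allocates one frame and pushes it onto <list>. Returns NULL on failure. */
static struct demo_frm *demo_frm_alloc(struct demo_frm **list)
{
	struct demo_frm *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->next = *list;
	*list = f;
	demo_live_frames++;
	return f;
}

/* Builds <count> frames on a local list and hands it over through <out>
 * on success. <fail_at> simulates an allocation failure at that index
 * (-1: never). Returns 1 on success, 0 on failure with nothing left
 * allocated.
 */
static int demo_build_frames(struct demo_frm **out, int count, int fail_at)
{
	struct demo_frm *list = NULL, *f;
	int i;

	for (i = 0; i < count; i++) {
		if (i == fail_at || !demo_frm_alloc(&list))
			goto err; /* leaving without cleanup would leak <list> */
	}
	*out = list;
	return 1;
 err:
	while ((f = list)) {
		list = f->next;
		free(f);
		demo_live_frames--;
	}
	return 0;
}
```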

Must be backported as far as 2.9.
2024-10-17 15:04:06 +02:00
Christopher Faulet
e7be13da87 REGTESTS: Never reuse server connection in http-messaging/truncated.vtc
A "Connection: close" header is added to responses to avoid any connection
reuse. This should avoid errors on the client side.
2024-10-17 14:44:01 +02:00
Christopher Faulet
52a3d807fc BUG/MAJOR: filters/htx: Add a flag to state the payload is altered by a filter
When a filter is registered on the data, it means it may change the payload
length by rewriting data. It means consumers of the message cannot trust the
expected length of payload as announced by the producer. The commit 8bd835b2d2
("MEDIUM: filters/htx: Don't rely on HTX extra field if payload is filtered")
was pushed to solve this issue. When the HTTP payload of a message is filtered,
the extra field is set to 0 to be sure it will never be used by error by any
consumer. However, it is not enough.

Indeed, the filters must be called before forwarding some data. They cannot be
by-passed. But if a consumer is unable to flush the HTX message, some outgoing
data can remain blocked in the channel's buffer. If some new data are then
pushed because there is some room in the channel's buffer, the producer will set
the HTX extra field. At this stage, if the consumer is unblocked and can send
again data, it is possible to call it to forward outgoing data blocked in the
channel's buffer before waking the stream up to filter new input data. It is the
purpose of the data fast-forwarding. In this case, the HTX extra field will be
seen by the consumer. It is unexpected and leads to undefined behavior.

One consequence of this bug is to perform a wrong chunking on compressed
messages, leading to processing errors at the end of the message, reported as
"ID--" in logs.

To fix the bug, a HTX flag is added to state the payload of the current HTX
message is altered. When this flag is set (HTX_FL_ALTERED_PAYLOAD), the HTX
extra field must not be trusted. And to keep things simple, when this flag is
set, the HTX extra field is automatically set to 0 when the HTX message is
loaded, in htxbuf() function.

It is probably the least intrusive way to fix the bug for now. But this part must
be reviewed to save meta-info of the HTX message outside of the message itself.

This commit should solve the issue #2741. It must be backported as far as 2.9.
2024-10-17 13:54:54 +02:00
Christopher Faulet
0fcfed9e23 BUG/MEDIUM: stconn: Check FF data of SC to perform a shutdown in sc_notify()
In sc_notify() function, the consumer side of the SC is tested to verify if
we must perform a shutdown on the endpoint. To do so, no output data must be
present in the buffer and in the iobuf. However, there is a bug here, the
iobuf of the opposite SC is tested instead of the one of the current SC. So
a shutdown can be performed on the endpoint while there are still output
data in the iobuf that must be sent. Concretely, it can only be data blocked
in a pipe.

Because of this bug, data blocked in the pipe will never be sent. I've not
tested but I guess this may block the stream during the client or server
timeout.

This patch must be backported as far as 2.9.
2024-10-17 13:53:40 +02:00
Christopher Faulet
6790067e79 BUG/MINOR: http-ana: Don't report a server abort if response payload is invalid
If a parsing error is reported by the mux on the response payload, a proxy
error (PRXCOND) must be reported instead of a server abort (SRVCL). Because
of this bug, invalid responses may be reported as "SD--" or "SL--" in logs
instead of "PD--" or "PL--".

This patch must be backported to all stable versions.
2024-10-17 13:53:40 +02:00
Christopher Faulet
f98feda53f MINOR: mux-h1: Add a trace on shutdown when keep-alive is not possible
When the stream is shut down, some tests are performed to know if the
connection must also be closed or not. There are trace messages for all
cases, except for the default one: Abort or close-mode. Thanks to this
patch, there is now a message too in this case.
2024-10-17 13:53:40 +02:00
Christopher Faulet
2c82ca60c6 MINOR: mux-h1: Show the SD iobuf in trace messages on stream send events
Info about the SD iobuf are now dumped in trace messages when a stream send
event is processed. It is a useful information to debug zero-copy forwarding
issues.
2024-10-17 13:53:40 +02:00
Christopher Faulet
48f1e2b6fe BUG/MEDIUM: stconn: Wait iobuf is empty to shut SE down during a check send
When a send attempt is performed on the opposite side from sc_notify() and
all outgoing data are sent while a shut was scheduled, the SE is shut down
because we consider all data were sent and no more are expected. However,
here we must also be careful to have sent all pending data in the
iobuf. Indeed, some spliced data may be blocked. In this case, if the SE is
shut down, these data may be lost.

This patch should fix the original bug reported in #2749. It must be
backported as far as 2.9.
2024-10-17 13:53:40 +02:00
William Lallemand
043f11e891 MINOR: mworker/ocsp: skip ocsp-update proxy init in master
The proxy must be created in mworker mode, but only in the worker, not in
the master. The current code creates the proxy in both processes.

The patch only checks that we are not in the master to start the
ocsp-update pre-check.

No backport needed.
2024-10-17 12:30:59 +02:00
William Lallemand
5184f3fb30 BUG/MINOR: resolvers/mworker: missing default resolvers in mworker mode
Since commit fe75c1e12da061 ("MEDIUM: startup: remove
MODE_MWORKER_WAIT") the MODE_MWORKER_WAIT constant disappeared. The
initialization of the default resolvers section was conditioned by this
constant.

The section must be created in mworker mode, but only in the worker not in
the master. It was currently completely disabled in both the master and
the worker which could break configuration using it, as well as the
httpclient.

No backport needed.
2024-10-17 12:17:23 +02:00
William Lallemand
fdbff3a020 BUG/MEDIUM: mworker/httpclient: initialization skipped by accident in mworker mode
Since commit fe75c1e12da061 ("MEDIUM: startup: remove
MODE_MWORKER_WAIT") the MODE_MWORKER_WAIT constant disappeared. The
initialization of the httpclient proxy was conditioned by this
constant.

The proxy must be created in mworker mode, but only in the worker not in
the master. It was currently completely disabled in both the master and
the worker provoking a NULL dereference upon httpclient usage.

No backport needed.
2024-10-17 12:16:35 +02:00
William Lallemand
e7b7072943 BUG/MINOR: httpclient: return NULL when no proxy available during httpclient_new()
Latest patches on the mworker rework skipped the httpclient_proxy
creation by accident. This is not supposed to happen because haproxy is
supposed to stop when the proxy creation failed, but it shows a flaw in
the API.

When the httpclient_proxy or the proxy used in parameter of
httpclient_new_from_proxy() is NULL, it will be dereferenced and cause a
crash.

The patch only returns a NULL when doing an httpclient_new() if the
proxy is not available.

Must be backported as far as 2.7.
2024-10-17 11:57:29 +02:00
Willy Tarreau
1fb61475f2 [RELEASE] Released version 3.1-dev10
Released version 3.1-dev10 with the following main changes :
    - BUG/MAJOR: mux-quic: do not crash on empty STREAM frame emission
    - BUG/MINOR: stats: Fix the name for the total number of streams created
    - MINOR: quic: strengthen qc_release_frm()
    - MEDIUM: quic: decount acknowledged data for MUX txbuf window
    - MINOR: quic: implement dedicated type for out-of-order stream ACK
    - MEDIUM: quic: merge contiguous/overlapping buffered ack stream range
    - MEDIUM: quic: decount out-of-order ACK data range for MUX txbuf window
    - MINOR: log: add do_log() logging helper
    - MINOR: log: add do_log_parse_act() helper func
    - MINOR: action: add do-log action
    - REGTESTS: add some tests for 'do-log' action
    - BUG/MEDIUM: hlua: make hlua_ctx_renew() safe
    - BUG/MEDIUM: hlua: properly handle sample func errors in hlua_run_sample_{fetch,conv}()
    - BUG/MINOR: quic: fix discarding of already stored out-of-order ACK
    - BUG/MEDIUM: quic: properly decount out-of-order ACK on stream release
    - MINOR: ssl: disable server side default CRL check with WolfSSL
    - MEDIUM: sink: implement sink_find_early()
    - MINOR: trace: postresolve sink names
    - MINOR: sample: postresolve sink names in debug() converter
    - BUG/MEDIUM: mux-quic: ensure timeout server is active for short requests
    - MINOR: cfgparse: simulate long configuration parsing with force-cfg-parser-pause
    - BUILD: cache: silence an uninitialized warning at -Og with gcc-12.2
    - BUG/MINOR: mux-h2/traces: present the correct buffer for trailers errors traces
    - MINOR: mux-h2/traces: print the size of the DATA frames
    - CLEANUP: muxes: remove useless inclusion of ebmbtree.h
    - REORG: buffers: move some of the heavy functions from buf.h to buf.c
    - MINOR: buffer: add a buffer list type with functions
    - MINOR: mux-h2: split the amount of rx data from the amount to ack
    - MINOR: mux-h2: create and initialize an rx offset per stream
    - MEDIUM: mux-h2: start to update stream when sending WU
    - MEDIUM: mux-h2: start to introduce the window size in the offset calculation
    - MINOR: mux-h2: count within a connection, how many streams are receiving data
    - MINOR: mux-h2: allocate the array of shared rx bufs in the h2c
    - MINOR: mux-h2: add rxbuf head/tail/count management for h2s
    - MINOR: mux-h2: move H2_CF_WAIT_IN_LIST flag away from the demux flags
    - MINOR: mux-h2: simplify the exit code in h2_rcv_buf()
    - MINOR: mux-h2: simplify the wake up code in h2_rcv_buf()
    - MINOR: mux-h2: clear up H2_CF_DEM_DFULL and H2_CF_DEM_SHORT_READ ambiguity
    - MAJOR: mux-h2: make streams use the connection's buffers
    - MAJOR: mux-h2: permit a stream to allocate as many buffers as desired
    - MAJOR: mux-h2: make the rxbuf allocation algorithm a bit smarter
    - MINOR: mux-h2: add tune.h2.be.rxbuf and tune.h2.fe.rxbuf global settings
    - MEDIUM: mux-h2: change the default initial window to 16kB
    - DOC: design-thoughts: add diagrams illustrating an rx win groth
    - MEDIUM: mux-h2: rework h2_restart_reading() to differentiate recv and demux
    - OPTIM: mux-h2: make h2_send() report more accurate wake up conditions
    - OPTIM: mux-h2: try to continue reading after demuxing when useful
    - OPTIM: mux-h2: use tasklet_wakeup_after() in h2s_notify_recv()
    - MINOR: mux-h2/traces: add missing flags and proxy ID in traces
    - MINOR: mux-h2/traces: add buffer-related info to h2s and h2c
    - CI: cirrus-ci: bump FreeBSD image to 14-1
    - REGTESTS: fix a reload race in abns_socket.vtc
    - MINOR: activity/memprofile: always return "other" bin on NULL return address
    - MINOR: quic: notify connection layer on handshake completion
    - BUG/MINOR: stream: unblock stream on wait-for-handshake completion
    - BUG/MEDIUM: quic: support wait-for-handshake
    - BUG/MEDIUM: server: server stuck in maintenance after FQDN change
    - BUG/MEDIUM: queue: make sure never to queue when there's no more served conns
    - DEBUG: mux-h2/flags: add H2_CF_DEM_RXBUF & H2_SF_EXPECT_RXDATA for the decoder
    - REGTESTS: cli: add delay 0.1 before connect to cli
    - MINOR: startup: add O_CLOEXEC flag to open /dev/null
    - MEDIUM: startup: move daemonization fork in init
    - MINOR: startup: refactor "daemonization" fork
    - MEDIUM: startup: move PID handling in init()
    - MAJOR: mworker: move master-worker fork in init()
    - BUG/MINOR: mworker: fix memory leak due to master-worker fork
    - REORG: mworker: set nbthread=1 for master after fork
    - MINOR: init: check MODE_MWORKER before creating master CLI
    - REORG: mworker: move mworker_create_master_cli in master 'case'
    - MEDIUM: startup: call chroot() if needed in one place
    - MEDIUM: startup: do set_identity() if needed in one place
    - MINOR: startup: only worker gets capabilities from bin
    - CLEANUP: haproxy: rm no longer used mworker_reexec_waitmode
    - MINOR: startup: rename exit_on_waitmode_failure to exit_on_failure
    - MINOR: defaults: update MASTER_MAXCONN description
    - MEDIUM: startup: remove MODE_MWORKER_WAIT
    - MINOR: global: add MODE_DISCOVERY flag
    - MEDIUM: cfgparse: add KWF_DISCOVERY keyword flag
    - MEDIUM: cfgparse: call some parsers only in MODE_DISCOVERY
    - MEDIUM: cfgparse-global: parse only KWF_DISCOVERY keywords in MODE_DISCOVERY
    - MEDIUM: cfgparse: parse only "global" section in MODE_DISCOVERY
    - MEDIUM: startup: introduce load_cfg and read_cfg
    - MINOR: cfgparse: fix *thread keywords sensitive to global section position
    - MINOR: mworker/cli: rename mworker_cli_proxy_new_listener
    - MINOR: mworker/cli: rename and clean mworker_cli_sockpair_new
    - MINOR: mworker/cli: create master CLI sockpair before fork
    - MINOR: mworker/cli: create MASTER proxy before mcli listeners
    - MINOR: mworker: add and set state PROC_O_INIT for new worker
    - MEDIUM: mworker/cli: close child and parent fds, setup listeners
    - MINOR: mworker: mworker_catch_sigchld: use fd_delete instead of close
    - MINOR: startup: rename and adapt reexec_on_failure
    - MINOR: mworker: add support for case when new worker dies
    - MINOR: mworker: simplify the code that sets PROC_O_LEAVING
    - MINOR: mworker/cli: add _send_status to support state transition
    - MEDIUM: startup: split sending oldpids_sig logic for standalone and mworker modes
    - MINOR: startup: split init() into separate initialization routines
    - MINOR: startup: split main: add step_init_3
    - MINOR: startup: simplify check for calling sock_get_old_sockets
    - MINOR: startup: encapsulate sock_get_old_sockets in a function
    - MINOR: startup: add bind_listeners
    - MINOR: startup: split main: add step_init_4
    - MINOR: startup: encapsulate master's code in run_master
    - MINOR: startup: add read_cfg_in_discovery_mode
    - MINOR: mworker: adapt exit_on_failure for master recovery mode
    - MEDIUM: mworker: add support of master recovery mode
    - MINOR: startup: add set_verbosity
    - MEDIUM: mworker: block reloads
    - MINOR: mworker: slow load status delivery if worker is starting
    - MINOR: mworker: readapt program support in mworker_catch_sigchld
    - MINOR: mworker: deserialize process list before read_cfg_in_discovery_mode
    - MINOR: mworker: parse program only in MODE_DISCOVERY
    - MINOR: cfgparse: add support for program section
    - MINOR: startup: reintroduce program support
    - MINOR: mworker-prog: stop old programs in mworker_ext_launch_all
    - MINOR: mworker: reintroduce systemd support
    - MINOR: mworker: report explicitly when worker exits due to max reloads
    - MINOR: cfgparse-global: parse *env keywords in MODE_DISCOVERY
    - MINOR: startup: reintroduce *env keywords support
    - MINOR: startup: close devnullfd, when daemon mode is applied
2024-10-16 22:57:52 +02:00
Valentine Krasnobaeva
c42ad79134 MINOR: startup: close devnullfd, when daemon mode is applied
In daemon mode the daemonization fork now happens in the early init stage,
before parsing and applying the configuration, so we can't close
stdin/stderr/stdout immediately after forking. We keep them open until most
of the configuration, including chroot, is applied, in order to show alerts if
there are some problems. To achieve this /dev/null is opened just before calling
chroot(), and after the chroot block it's used to close all standard outputs
and stdin. At this point we no longer need the fd of /dev/null, so we can close
it as well.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
dc53c37234 MINOR: startup: reintroduce *env keywords support
setenv/resetenv/presetenv/unsetenv keywords in the configuration modify the
process environment. In case of master-worker and programs we need to restore
the initial process environment before reload, as the configuration could
change in between and newly forked workers and programs should be launched
in the environment corresponding to this new configuration.

To achieve this we backup the initial process environment before the first
configuration read, when 'global' and 'program' sections are read. And then we
clean up master process environment and restore the initial one from the backup
in mworker_reexec().
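
A minimal sketch of such a backup/restore cycle is shown below, assuming a
POSIX system. The demo_env_backup() and demo_env_restore() names are
hypothetical and the code is not taken from the patch; it only illustrates
the idea of saving the initial environment and putting it back later.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Returns a NULL-terminated deep copy of the current environment, or
 * NULL on allocation failure.
 */
static char **demo_env_backup(void)
{
	size_t n = 0, i;
	char **copy;

	while (environ && environ[n])
		n++;
	copy = calloc(n + 1, sizeof(*copy));
	if (!copy)
		return NULL;
	for (i = 0; i < n; i++) {
		copy[i] = strdup(environ[i]);
		if (!copy[i]) {
			while (i--)
				free(copy[i]);
			free(copy);
			return NULL;
		}
	}
	return copy;
}

/* Unsets every variable currently defined, then re-creates the ones
 * saved in <backup>. Returns 0 on success, -1 on error.
 */
static int demo_env_restore(char **backup)
{
	char **cur = demo_env_backup();
	size_t i;

	if (!cur)
		return -1;
	/* work on a snapshot: unsetenv() modifies environ while we iterate */
	for (i = 0; cur[i]; i++) {
		cur[i][strcspn(cur[i], "=")] = '\0';
		unsetenv(cur[i]);
		free(cur[i]);
	}
	free(cur);

	for (i = 0; backup[i]; i++) {
		char *eq = strchr(backup[i], '=');
		int ret;

		if (!eq || eq == backup[i])
			continue; /* malformed entry, nothing to restore */
		*eq = '\0';
		ret = setenv(backup[i], eq + 1, 1);
		*eq = '=';
		if (ret != 0)
			return -1;
	}
	return 0;
}
```

The backup would be taken once, before any setenv/unsetenv keyword has been
applied, so that variables added by a previous configuration do not survive
into the next one.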
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
d5ad92c7aa MINOR: cfgparse-global: parse *env keywords in MODE_DISCOVERY
setenv/resetenv/presetenv/unsetenv keywords should be parsed by both the master
process and the worker, as some other master parameters could be enabled in
conditional blocks (.if...endif). To achieve this let's tag '*env' keywords
with the KWF_DISCOVERY flag.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
d11dc11e5a MINOR: mworker: report explicitly when worker exits due to max reloads
It's convenient for testing and for usage to produce different warning
messages, when the former worker exits due to max reloads exceeded, and when it
was terminated by the master.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
4c8303a59e MINOR: mworker: reintroduce systemd support
Let's reintroduce systemd support in the refactored master-worker mode.

As of now, the master-worker fork happens during early initialization steps and
then the master process receives the "READY" status message from the newly
forked worker, once it has successfully loaded. Let's propagate this "READY" status
message at this moment to systemd from the master process context
(_send_status()). We use the master process to send messages to systemd,
because it is the only process monitored by systemd.

In master recovery mode, we also need to send the "READY" message to systemd,
but with the status "Reload failed". "READY" will signal to systemd
that the master process is still alive, because it doesn't exit in recovery mode
and it keeps the existing worker. Status "Reload failed" will signal to the user
that something went wrong with the configuration. The same message logic
was originally preserved for the case when the worker fails to read its
configuration, see on_new_child_failure() for more details.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
9e23cfa5c2 MINOR: mworker-prog: stop old programs in mworker_ext_launch_all
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.

Now, after refactoring in master-worker mode it's the master process, who
stops workers forked before the reload. Current worker no longer sends USR1 or
TERM signals to the previous one after ports binding. This behaviour is kept
only for the standalone mode.

So, in case of programs, it's up to master process as well to stop programs,
which were launched before reload. Let's do this in mworker_ext_launch_all(),
just before starting the new programs.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
0fc2ff4b7d MINOR: startup: reintroduce program support
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.

Let's add here mworker_ext_launch_all() call before master-worker fork to
start external programs. We keep the order and the place of these two forks
(program and master-worker) the same as before the refactoring, in order to
avoid regressions.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
a2fac5a3a1 MINOR: cfgparse: add support for program section
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.

Programs are launched by master, thus only the master process needs its
configuration. Therefore, program section parser should be called only in
discovery mode, when master parses its configuration.

Program section has a post section parser. It should be called only in
discovery mode as well.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
45a284895a MINOR: mworker: parse program only in MODE_DISCOVERY
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.

Master process launches external programs, so it needs to read program
section. Thus, it should be parsed in MODE_DISCOVERY. Worker does not need
program settings, so let's check the runtime mode in cfg_parse_program. Worker
should always skip this section.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
ee7fc98320 MINOR: mworker: deserialize process list before read_cfg_in_discovery_mode
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.

For the moment we keep the order of program and worker forks the same as before
the refactoring, as we need to be sure that this won't introduce regressions.
So, programs are forked before the new worker process.

Before the program's fork we already need the deserialized process list to find
the programs launched before the reload and to stop them. The process list is
saved before the reload in the HAPROXY_PROCESSES variable. It should be deserialized
before the first configuration read in discovery mode, because the resetenv keyword
could be present in the global section.

So, let's move mworker_env_to_proc_list() from mworker_create_master_cli() to
main(). We need to call it only after a reload in master-worker mode, thus
HAPROXY_MWORKER_REEXEC and HAPROXY_PROCESSES should still be present in the
re-executing process environment before the first configuration read.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
7a267c4a27 MINOR: mworker: readapt program support in mworker_catch_sigchld
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.

We only launch and stop external programs and there is no
communication between the master process and the started program binary. So,
ipc_fd[0] and ipc_fd[1] are not used and kept as -1 for program processes. Due
to this, there is no need for the exiting program process to call fd_delete on
these fds. Otherwise, this will trigger a BUG_ON.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
d766677d92 MINOR: mworker: slow load status delivery if worker is starting
With the refactored master-worker architecture, the master and worker processes
each parse their own part of the configuration. The worker could have a huge
configuration, so it will take some time to load. As HAPROXY_LOAD_SUCCESS is now
set to 1 only after receiving the READY status from the new worker,
cli_io_handler_show_loadstatus() may exit very quickly by showing load status 0,
and in such a case the mcli socket will be closed.

This already breaks some regression tests and can confuse some APIs. So, let's
slow down the load status delivery. If there is still some process in the
process list which is loading (PROC_O_INIT), the appctx task will sleep for
50ms and then return 0. cli_io_handler_show_loadstatus() is called in a loop, so
with such pacing, there is a high chance that the next time we enter
its scope all processes will have the READY state. This way the master CLI
connection socket won't be closed until the loading of the new worker is really
finished, thus the reload status and logs (Success=1/0) will be shown in
a synchronous way.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
5f16453082 MEDIUM: mworker: block reloads
When reloads arrive very often (sent by some APIs), newly forked workers
barely have time to load completely and to send their READY status to the
master, which then allows it to stop the previous worker (launched before the
reload). As a result, the number of workers increases very quickly, previous
workers are still alive and the memory consumption is very high.

To avoid such situations let's return in cli_parse_reload() reload status 0
with the text "Another reload is still in progress", if there is still a
process with the PROC_O_INIT flag in the process list.
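
The check described above could look roughly like the sketch below. The
struct, the flag value and the demo_* function names are hypothetical
simplifications made for illustration; only the PROC_O_INIT concept and the
message text come from this commit.

```c
#include <stddef.h>

#define DEMO_PROC_O_INIT 0x1 /* hypothetical value, for the sketch only */

/* Simplified process entry; not the real mworker process structure. */
struct demo_proc {
	unsigned int options;
	struct demo_proc *next;
};

/* Returns 1 if some process in the list is still initializing. */
static int demo_reload_in_progress(const struct demo_proc *list)
{
	for (; list; list = list->next)
		if (list->options & DEMO_PROC_O_INIT)
			return 1;
	return 0;
}

/* Returns the text to report when the reload must be refused, or NULL
 * when the reload may proceed.
 */
static const char *demo_check_reload(const struct demo_proc *list)
{
	if (demo_reload_in_progress(list))
		return "Another reload is still in progress";
	return NULL;
}
```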
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
5be14b338a MINOR: startup: add set_verbosity
Let's encapsulate the logic to set verbosity modes (MODE_DEBUG and MODE_VERBOSE)
in a separate function set_verbosity(). This makes the code of main() more
readable and this allows to call set_verbosity() for master process in recovery
mode. So, in this mode, verbosity settings before the master re-execution will
be re-applied to master. set_verbosity() will be extended in future commits to
reduce the verbosity of master in order not to dump pollers list and filters,
if it was started with -V or -d.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
5909d508bc MEDIUM: mworker: add support of master recovery mode
In this commit we add run_master_in_recovery_mode(), which groups all necessary
initialization steps, which master should perform to be able to enter in its
polling loop (run_master()), when it fails while parsing its new config.

As exit_on_failure() is now adapted for master recovery mode, let's register
it as an atexit handler when the master enters this mode. And let's remove
the atexit_flag variable for master, because we no longer use it.

We also slightly refactor here read_cfg_in_discovery_mode() in order to call
run_master_in_recovery_mode() for the case, described above. Warning messages
are mandatory before calling the run_master_in_recovery_mode() as this allows
to stop haproxy with error, if it was launched in zero-warning mode.

So, in recovery mode master does not launch any worker. It just performs its
necessary initialization routines and enters in its polling loop to continue
to monitor the existing worker process.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
fe4708feaa MINOR: mworker: adapt exit_on_failure for master recovery mode
Master recovery mode replaces the former wait-mode, with the difference that
master in this case doesn't try to fork the new worker process. But it still
needs to enter its polling loop in order to monitor the previous worker.
Master performs some initialization steps for this and it recreates its master
CLI. During its initialization steps, master could potentially fail again.
As we use for the moment for master init steps some common routines
(step_init_2() and step_init_3()), there is no way there to signal to user that
failure has happened for the master and in addition, in its recovery mode. So,
in such case exit_on_failure() can be still useful in order to print an
appropriate alert, as we can register this function as atexit handler for the
master.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
6615e46456 MINOR: startup: add read_cfg_in_discovery_mode
Let's encapsulate here the code to load and to read the configuration at the
first time in MODE_DISCOVERY. This makes the code of main() more readable and
this adds the structure for adding necessary master initializations routines
to support master recovery mode.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
1cee184145 MINOR: startup: encapsulate master's code in run_master
Let's encapsulate master's code (steps which it does before entering in its
polling loop and deinitialization routines after) in a separate run_master()
function. This makes the code of main() more readable. In future we plan to put
in run_master() more master process related code, in order to clean completely
step_init_2(), step_init_3() and step_init_4().
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
e5cd81cf8f MINOR: startup: split main: add step_init_4
Let's encapsulate here another part of main, after binding listeners sockets
and before calling the master's code in master-worker mode. This block
contains the code, which applies verbosity settings, checks limits and updates
the ready date. It will take some time to figure out, which of these parts are
really needed for the master, or which ones it could skip. So let's put all
these for the moment in step_init_4() and let's call it for all modes.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
26a6fdf542 MINOR: startup: add bind_listeners
Let's encapsulate here the code, which tries to bind listeners for the new
process in a separate function. This will make the main() code more readable.
Master process, even if it has failed while reading its new configuration, has
to bind its master CLI sockets. This way we can call this function in
the master recovery mode.

Master CLI socket address and port for external connections (user, monitoring
tools) are provided for now only via the command line. So, master, even after
this failure can and must reestablish master CLI connections again.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
babbcb047e MINOR: startup: encapsulate sock_get_old_sockets in a function
Let's encapsulate here the code, that calls sock_get_old_sockets() to obtain
listeners sockets from the previous process into a separate function. This
will make the code of main() more readable and we can move this new function
(if we might need so) in future.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
f4e73b4302 MINOR: startup: simplify check for calling sock_get_old_sockets
MODE_CHECK and MODE_CHECK_CONDITION are applied now very early in
step_init_1() and step_init_2() in order to check the configuration or to check
some condition provided via the command line. When these checks have
terminated, the main process exits. So, no longer need to verify these modes at
the moment, when the current process have already done its basic initialization
routines and is asking for listeners sockets from the previously started one.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
c4795e4019 MINOR: startup: split main: add step_init_3
The first part of main(), just after calling the former init() and before
trying to bind listeners, needs to be also encapsulated into a separate
step_init_3() as it is. It contains important blocks to register signals, to
apply memory and nofile limits, etc. The order of these blocks should be also
preserved (especially the signals part).

For the moment step_init_3() must be also executed for all runtime modes.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
49772c55e3 MINOR: startup: split init() into separate initialization routines
This is the first commit in a series to add support for the 5th reload
use case, when the master process fails to read its new configuration. In this
case it just needs to perform its initialization steps and keep the existing
worker.

To add the support for this last use case we need to split init() and main()
in a shorter steps in order to encapsulate necessary initialization routines
into separate functions.

Let's at first, make here progname as a global variable for haproxy.c, as it
will be used in error messages in the initialization functions. Then let's
split the init() into separate routines, which set and apply modes, write
process PID in a pidfile, etc.

The big part of the former init(), which called functions to allocate pools,
to initialize proxies, to calculate maxconn and to perform some post checks was
just encapsulated as is, into step_init_2(). It will take some time to figure
out exactly which parts of this initialization block are really necessary for
the master process and which ones it could skip. So, for the moment
step_init_2() is called for all runtime modes.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
81dbc2c2e2 MEDIUM: startup: split sending oldpids_sig logic for standalone and mworker modes
Before refactoring the master-worker mode, in all runtime modes, when the new
process successfully parsed its configuration and bound to sockets, it sent
either SIGUSR1 or SIGTERM to the previous one in order to terminate it.

Let's keep this logic as is for the standalone mode. In addition, in standalone
mode we need to send the signal to old process before calling set_identity(),
because in set_identity() effective user or group may change. So, the order is
important here.

In case of master-worker mode after refactoring, master terminates the previous
worker by itself up to receiving "READY" status from the new one in
_send_status(). Master also sets at this moment HAPROXY_LOAD_SUCCESS env
variable and checks, if there are some other workers to terminate with
max_reloads exceeded.

So, now in master-worker mode we terminate old workers only, when the new one
has successfully done all initialization steps and has sent "READY" status to
master.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
b73a278df4 MINOR: mworker/cli: add _send_status to support state transition
In the new master-worker architecture, when a worker process is forked and
successfully initialized it needs somehow to communicate its "READY" state to
the master, in order to terminate the previous worker and workers that might
have exceeded the max_reloads counter.

So, let's implement for this a new master CLI _send_status command. A new
worker can send its status string "READY" to the master, when it's about
entering to the run poll loop, thus it can start to receive data.

In _send_status() in the master context we update the status of the new worker:
PROC_O_INIT flag is withdrawn.

When TERM signal is sent to a worker, worker terminates and this triggers the
mworker_catch_sigchld() handler in master. This handler deletes the exiting
process entry from the processes list.

In _send_status() we loop over the processes list twice. At the first time, in
order to stop workers that exceeded the max_reloads counter. At the second time,
in order to stop the worker forked before the last reload. In the corner case,
when max_reloads=1, we avoid to send SIGTERM twice to the same worker by
setting sigterm_sent flag during the first loop.
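
The two passes can be sketched as below. This is a simplified model, not the
actual _send_status() code: the struct, the flag values, the helper names
send_term() and on_worker_ready(), and the exact comparisons (reloads >=
max_reloads in the first pass, reloads == 1 in the second) are assumptions,
chosen so that the max_reloads=1 corner case mentioned above shows up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; only the names come from the text. */
#define PROC_O_INIT    0x01
#define PROC_O_LEAVING 0x02

/* Hypothetical, simplified entry of the processes list. */
struct proc_entry {
    int pid;
    unsigned int options;
    int reloads;              /* number of reloads this worker has survived */
    int sigterm_sent;         /* set once a termination signal was sent */
    struct proc_entry *next;
};

int term_count;               /* stand-in for kill(): counts signals sent */

void send_term(struct proc_entry *p)
{
    assert(!p->sigterm_sent); /* the same worker is never signalled twice */
    p->sigterm_sent = 1;
    term_count++;
}

/* Called in the master context when the worker 'ready' reports "READY". */
void on_worker_ready(struct proc_entry *list, struct proc_entry *ready,
                     int max_reloads)
{
    struct proc_entry *p;

    /* the new worker has finished its initialization */
    ready->options &= ~PROC_O_INIT;

    /* first pass: stop workers that reached the max_reloads limit */
    for (p = list; p != NULL; p = p->next) {
        if (p == ready || p->sigterm_sent)
            continue;
        if (max_reloads >= 0 && p->reloads >= max_reloads)
            send_term(p);
    }

    /* second pass: stop the worker forked before the last reload, unless
     * the first pass already took care of it
     */
    for (p = list; p != NULL; p = p->next) {
        if (p == ready || p->sigterm_sent)
            continue;
        if ((p->options & PROC_O_LEAVING) && p->reloads == 1)
            send_term(p);
    }
}
```

With max_reloads=1 in this model, the previous worker matches both passes, and
the sigterm_sent check in the second pass is what prevents a second signal.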
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
154848a314 MINOR: mworker: simplify the code that sets PROC_O_LEAVING
When master performs a reexec, it should set the PROC_O_LEAVING flag for an
already existing worker. It means that the existing worker is marked as the
previous one and will be terminated after the reload.

In the previous implementation the master process needed to do the reexec
twice (the first time for parsing its configuration and the second time to free
unused resources). So the logic of setting PROC_O_LEAVING was based on
comparing the number of reloads, performed by each process from the processes
list, except the master.

Now, as mentioned before, reexec is performed only once. So, in this case
we need to set the PROC_O_LEAVING flag when we deserialize the list. It is done
for all processes which have a strictly positive number of reloads.
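
A minimal sketch of this rule, assuming a simplified process entry: the struct
layout, the flag value and the helper name mark_leaving() are hypothetical,
only PROC_O_LEAVING and the "reloads > 0" criterion come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value; only the name PROC_O_LEAVING comes from the text. */
#define PROC_O_LEAVING 0x02

/* Hypothetical, simplified entry rebuilt while deserializing the list. */
struct proc_entry {
    int pid;
    int reloads;              /* number of reloads this process went through */
    unsigned int options;
};

/* Marks every process with a strictly positive number of reloads as leaving.
 * Returns the number of processes that were marked.
 */
int mark_leaving(struct proc_entry *procs, size_t count)
{
    size_t i;
    int marked = 0;

    for (i = 0; i < count; i++) {
        if (procs[i].reloads > 0) {
            procs[i].options |= PROC_O_LEAVING;
            marked++;
        }
    }
    return marked;
}
```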
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
c8aac63893 MINOR: mworker: add support for case when new worker dies
The case when the new worker fails while it parses its configuration or while
it tries to apply it could be considered as a new one, because the master
process no longer needs to reexec again. The master simply keeps the previous
worker (forked before the reload) and it lets the new one exit with failure.

When the new worker exits, in the master process context (mworker_catch_sigchld)
we need to stop a MASTER proxy listener and we need to drop the server,
attached to new worker's CLI sockpair (it's inherited in master). Then we
explicitly delete master's end of this sockpair (child->ipc_fd[0]) from the
fdtab and we free the memory allocated for the worker process.
on_new_child_failure() is called before the clean up to signal systemd that
reload/load has failed.

If the new worker fails during the first start, so there is no previous
worker, master process should exit immediately in order to keep the same
behaviour, as it was before this architecture change.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
2bb07b913d MINOR: startup: rename and adapt reexec_on_failure
Previously reexec_on_failure() was called in cases when the process failed
after reload, while it was parsing its configuration or trying to apply
it. reexec_on_failure() called mworker_reexec() and the master process was
reexecuted.

With the new architecture, in such cases there is no longer a need to reexecute
the master process again after its reload. It simply keeps the previous worker,
forked before the reload, and it lets the new one exit with an error. But we
still need the code, which increments the number of failed reloads and which
notifies systemd with new "Reload failed!" status. So, let's reuse and adapt
for this reexec_on_failure() and let's rename it to on_new_child_failure().
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
9b27f82da3 MINOR: mworker: mworker_catch_sigchld: use fd_delete instead of close
If the worker exits due to failure or due to receiving TERM signal, in the
master context, we can't now simply close the master's fd (ipc_fd[0]) of
the inherited master CLI sockpair.

When the worker is created, in the master process context MASTER proxy listener
is bound to ipc_fd[0]. When this worker fails or exits, master process is
always in its polling loop. So, closing some fd in its context immediately
triggers the BUG_ON(fd->owner), as the poller tries to reinsert the "freed" fd
into fdtab and tries to reuse it. We must call fd_delete in this case. This will
deinitialize the fd auxiliary data and close it properly.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
cf150fd73d MEDIUM: mworker/cli: close child and parent fds, setup listeners
Basically, this is the continuation of the previous commits. So, here after the
fork, worker process closes the "master" end of the copied CLI sockpair and
binds its end, ipc_fd[1], to the GLOBAL proxy listener.
mworker_cli_global_proxy_new_listener() guarantees that GLOBAL proxy will be
created, if it wasn't the case before.

Master process, at first, allocates the MASTER proxy, creates master CLI listener
(-S command line option) and reload sockpair and then closes the "worker" end of
the copied CLI sockpair and binds its end, ipc_fd[0], to the created MASTER proxy.

Usage of the new PROC_O_INIT state helps to reduce test conditions to find the
newly forked worker.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
646299fc95 MINOR: mworker: add and set state PROC_O_INIT for new worker
Here, to distinguish between the new worker and the previous one let's add a
new process state PROC_O_INIT and let's set it, when the memory is allocated
for the new worker in the processes list.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
26ad5465cc MINOR: mworker/cli: create MASTER proxy before mcli listeners
For the master process we always need to create a MASTER proxy, even if
master cli settings were not provided via command line, because now we bind a
listener in the master process context at ipc_fd[0]. So, MASTER proxy should be
already allocated at this moment.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
6ec38c9a74 MINOR: mworker/cli: create master CLI sockpair before fork
The main idea here is to create a master CLI inherited sockpair just before the
master-worker fork. And only then, after the fork, let each process bind the
listener it needs to its end of this sockpair.

This way master and worker processes can close the unused "ends" of their
sockpair copy (ipc_fd[0] for worker and ipc_fd[1] for master).

When this sockpair creation happens inside
mworker_cli_global_proxy_new_listener(), it is not possible for the master to
close ipc_fd[1] bound to the GLOBAL proxy listener, as this triggers a
BUG_ON(fd->owner) in fd_insert() in master context, because master process
has already entered its polling loop and the poller in its turn tries to reuse the
closed fd.
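
The "create the sockpair before the fork, then close the unused end in each
process" pattern can be illustrated with plain POSIX calls. This is a
standalone sketch, not HAProxy code: the function name sockpair_roundtrip() is
hypothetical, and the worker writes a raw "READY" string, whereas the real
worker reports its status through the master CLI.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates the sockpair before fork(), lets each process close the end it
 * does not use, and returns 0 if the parent ("master") received "READY"
 * from the child ("worker"), -1 otherwise.
 */
int sockpair_roundtrip(void)
{
    int ipc_fd[2];
    char buf[16] = "";
    ssize_t n;
    pid_t pid;
    int status;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, ipc_fd) < 0)
        return -1;

    pid = fork();
    switch (pid) {
    case -1:
        close(ipc_fd[0]);
        close(ipc_fd[1]);
        return -1;
    case 0:
        /* worker: ipc_fd[0] belongs to the master, close our copy */
        close(ipc_fd[0]);
        if (write(ipc_fd[1], "READY", 5) != 5)
            _exit(1);
        close(ipc_fd[1]);
        _exit(0);
    default:
        /* master: ipc_fd[1] belongs to the worker, close our copy */
        close(ipc_fd[1]);
        n = read(ipc_fd[0], buf, sizeof(buf) - 1);
        close(ipc_fd[0]);
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (n != 5 || strcmp(buf, "READY") != 0)
            return -1;
        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
    }
}
```

Because both ends exist before the fork, neither process has to close a
descriptor that is already registered in its poller.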
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
cc1a631beb MINOR: mworker/cli: rename and clean mworker_cli_sockpair_new
Let's rename mworker_cli_sockpair_new() to
mworker_cli_global_proxy_new_listener() to outline that this function creates
the GLOBAL proxy, allocates the listener with "master-socket" bind conf and
attaches this listener to this GLOBAL proxy. Listener is bound to ipc_fd[1] of
the sockpair inherited in master and in worker (master CLI sockpair).
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
0fbf1973ad MINOR: mworker/cli: rename mworker_cli_proxy_new_listener
This is the first commit in a series to add the support of 4 primary reload
use-cases for the new master-worker architecture:

1. Newly forked worker process dies before any reload, due to some errors in
   the configuration. Newly forked worker process crashes before any reload
   after sending its "READY" state to master.

2. Newly forked worker process dies due to some errors in the new
   configuration. This happens after reload, when this new configuration was
   supplied, so the previous worker process is still here.

3. Newly forked worker process crashes after sending its "READY" state to
   master due to some bugs. This happens after reload, so the previous worker
   process is still here.

4. Newly forked worker process has sent its "READY" state to master and starts
   to receive traffic. This happens after reload, the old worker hasn't
   terminated yet, as it is waiting on some idle connection and it crashes.

Let's rename in this commit mworker_cli_proxy_new_listener() to
mworker_cli_master_proxy_new_listener() to outline, that this function creates
"master-socket" bind conf and allocates a listener. This listener is attached
to the MASTER proxy and it's bound to the ipc_fd[0] of the sockpair,
inherited in master and in worker processes (master CLI sockpair).
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
223caab96f MINOR: cfgparse: fix *thread keywords sensitive to global section position
*thread keywords parsers are sensitive to global section position. If they are
present there, the global section must be the first section in the
configuration. *thread parsers logic is based on non_global_section_parsed
counter. So, we need to reset it explicitly before the second configuration
read done by worker or in a standalone mode.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
0ed262d7bf MEDIUM: startup: introduce load_cfg and read_cfg
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.

In order to support discovery mode, we need to read the configuration twice.
So, we need to split the stage, when we load all configuration files, from
the stage when we parse it. To do this, let's encapsulate in read_cfg() the
part, where we load the configuration files in a separate function, load_cfg().
Like this we can call only the parsing part as many times as we need.

Before reading configuration at the first time we set MODE_DISCOVERY. After
the reading this mode is immediately unset, as the real runtime mode has been
already set by discovery keywords parsers.

Second read is performed when all primary runtime modes (daemon, master-worker)
are applied, because we should not read the configuration twice in the master
process.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
e2b4768224 MEDIUM: cfgparse: parse only "global" section in MODE_DISCOVERY
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.

So, in discovery mode, when we read the configuration the first time, we
parse for the moment only the "global" section. Unknown section names will be
ignored.
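
As a rough illustration of the two reads, the toy parser below counts the
keyword lines that would be handed to a parser in each pass. The configuration
format is deliberately simplified and is an assumption of this sketch (an
unindented line starts a section, indented lines are keywords); the function
name count_parsed_lines() is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Counts the keyword lines that would be parsed. In discovery mode only
 * the lines of the "global" section are counted, all other sections are
 * silently ignored.
 */
int count_parsed_lines(const char *cfg, int discovery)
{
    int in_global = 0;
    int parsed = 0;
    const char *line = cfg;

    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len > 0 && line[0] != ' ' && line[0] != '\t') {
            /* an unindented line starts a new section in this toy format */
            in_global = (len == 6 && strncmp(line, "global", 6) == 0);
        } else if (len > 0 && (!discovery || in_global)) {
            parsed++;
        }
        line = eol ? eol + 1 : line + len;
    }
    return parsed;
}
```

The same in-memory buffer is walked twice: once with discovery set, once
without, which is why loading and parsing are worth separating.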
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
699be6a55d MEDIUM: cfgparse-global: parse only KWF_DISCOVERY keywords in MODE_DISCOVERY
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.

Global section parser parses the majority of keywords in its function, so
those keywords don't have any dedicated parsers yet. Only after this parsing
block cfg_parse_global() starts to call dedicated parsers for any other
discovered keywords, which were not found in the block.

As all keywords which should be parsed in MODE_DISCOVERY have their own parser
functions, we can skip this block with goto discovery_kw and start directly from
the part where we call parsers from the keywords list. KWF_DISCOVERY flag helps
to call in MODE_DISCOVERY only the parsers which are needed in this mode.

All unknown keywords and garbage will be ignored at this stage.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
48371e6a30 MEDIUM: cfgparse: call some parsers only in MODE_DISCOVERY
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.

Some keyword parsers tagged with KWF_DISCOVERY (for example those, which parse
runtime modes, poller types, pidfile), should not be called twice when
the configuration will be read the second time after the discovery mode.
It's redundant and could trigger parser's errors in standalone mode. In
master-worker mode the worker process inherits parsed settings from the master.
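
The flag-based dispatch can be sketched as follows. This is a simplified
model under stated assumptions: the struct cfg_keyword layout, the flag value,
the two sample parsers and dispatch_keyword() are hypothetical, and the sketch
assumes that every KWF_DISCOVERY keyword is skipped on the second read,
whereas the text only says this for some of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value; only the name KWF_DISCOVERY comes from the text. */
#define KWF_DISCOVERY 0x01

/* Hypothetical, simplified keyword descriptor. */
struct cfg_keyword {
    const char *kw;
    int (*parse)(const char *arg);
    unsigned int flags;
};

int daemon_seen;              /* how many times each sample parser ran */
int maxconn_seen;

int parse_daemon(const char *arg)  { (void)arg; daemon_seen++;  return 0; }
int parse_maxconn(const char *arg) { (void)arg; maxconn_seen++; return 0; }

const struct cfg_keyword kw_list[] = {
    { "daemon",  parse_daemon,  KWF_DISCOVERY },
    { "maxconn", parse_maxconn, 0 },
};

/* Returns 1 if the keyword's parser was called, 0 if the keyword was
 * skipped in the current mode or is unknown.
 */
int dispatch_keyword(const char *kw, const char *arg, int discovery)
{
    size_t i;
    int is_discovery_kw;

    for (i = 0; i < sizeof(kw_list) / sizeof(kw_list[0]); i++) {
        if (strcmp(kw_list[i].kw, kw) != 0)
            continue;
        is_discovery_kw = (kw_list[i].flags & KWF_DISCOVERY) != 0;
        /* first read: only tagged keywords; second read: only the others */
        if (is_discovery_kw != (discovery != 0))
            return 0;
        kw_list[i].parse(arg);
        return 1;
    }
    return 0;
}
```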
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
f9123e2183 MEDIUM: cfgparse: add KWF_DISCOVERY keyword flag
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.

So, let's add here KWF_DISCOVERY flag to distinguish the keywords,
which should be parsed in "discovery" mode and which are needed for master
process, from all others. Keywords, that should be parsed in "discovery" mode
have their dedicated parser functions. Let's tag these functions with
KWF_DISCOVERY flag in keywords list. Like this, only these keyword parsers
might be called during the first configuration read in discovery mode.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
6769745fe5 MINOR: global: add MODE_DISCOVERY flag
This is the first commit from a series to add a support of discovery mode
in the configuration parser and in initialization sequence.

Discovery mode is the mode, when we read the configuration at the first time
and we parse and set runtime modes: daemon, zero-warning, master-worker. In
this mode we also parse some parameters needed for the master process to start,
in case if we are in the master-worker mode. Like this the master process
doesn't allocate any additional resources, which it doesn't use and it quickly
finishes its initialization and enters to its polling loop. The worker process
after its fork reads the rest of the configuration.

So, let's add in this commit MODE_DISCOVERY flag to check it in
configuration parser functions.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
fe75c1e12d MEDIUM: startup: remove MODE_MWORKER_WAIT
MODE_MWORKER_WAIT becomes redundant with MODE_MWORKER, due to moving
master-worker fork in init(). With this change master no longer needs to perform
reexec just after forking in order to free additional memory.

As after the fork in the master process we set 'master' variable, we can
replace now MODE_MWORKER_WAIT in some 'if' statements by simple check of this
'master' variable.

Let's also continue to get rid of HAPROXY_MWORKER_WAIT_ONLY environment
variable, as it's no longer needed as well.

In cfg_program_postparser(), which is used to check if cmdline is defined to
launch a program, we completely remove the check of mode for now, because
the master process does not parse the configuration for the moment. 'program'
section parsing will be reintroduced in master later in the next commits.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
fb7bef781d MINOR: defaults: update MASTER_MAXCONN description
This is one of the commits to prepare the removal of MODE_MWORKER_WAIT
support, as it became redundant with MODE_MWORKER due to moving master-worker
fork in init().
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
3f5f57845b MINOR: startup: rename exit_on_waitmode_failure to exit_on_failure
As we no longer support MODE_MWORKER_WAIT for master (it became redundant with
MODE_MWORKER after moving master-worker fork in init()), let's rename
exit_on_waitmode_failure() callback in just exit_on_failure().
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
7795d49ae6 CLEANUP: haproxy: rm no longer used mworker_reexec_waitmode
This is the first commit to prepare the removal of MODE_MWORKER_WAIT support. It
has become redundant with MODE_MWORKER, due to moving master-worker fork in
init(). Master process no longer performs reexec to free additional memory after
forking and no longer changes its mode to MODE_MWORKER_WAIT, in which it
entered its wait polling loop and handled signals. Now, master enters
this loop almost immediately after forking a worker, while always staying in
MODE_MWORKER.

So, we can remove mworker_reexec_waitmode() wrapper, which was used to set
HAPROXY_MWORKER_WAIT_ONLY variable and to call mworker_reexec(). But let's keep
for the moment the logic of reexec_on_failure() atexit callback for master, in
case we need to support this case again in the future.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
cb0f1f42e1 MINOR: startup: only worker gets capabilities from bin
Due to moving the master-worker fork in init(), we need to protect
prepare_caps_from_permitted_set() call, which is executed after init(). This
call makes sense only for worker, daemon and for foreground mono process modes.

prepare_caps_from_permitted_set() allows to read Linux capabilities from
haproxy binary and to move some of them in process Effective set, if 'setcap'
keyword lists needed capabilities in the global section.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
fe04c2ad37 MEDIUM: startup: do set_identity() if needed in one place
There are two set_identity() calls, both under nearly the same condition:

    'if ((global.mode & (MODE_MWORKER|MODE_DAEMON...)...'

The first call serves to change uid/gid and set some needed Linux capabilities
only for process in the foreground mode. The second comes after master-worker
fork and allows to do the same in daemon and in worker modes.

Due to moving the master-worker fork in init() in some previous commit, the
second set_identity() now is no longer under the 'if'. So, it is executed
for all modes, except MODE_MWORKER. Now in MODE_MWORKER process enters in its
wait polling loop just after forking a worker and it terminates almost
immediately, if it exits this loop.

Worker, daemon and process in a foreground mode will perform set_identity() as
before, but now it will be called in one place in main().

global.last_checks should be verified just after set_identity() call. As it's
stated in comments some configuration options may require full privileges or
some Linux capabilities need to be granted to process. set_identity() via
prepare_caps_for_setuid() may put configured capabilities in process Effective
set and, hence, remove respective flag from global.last_checks.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
02af1fe067 MEDIUM: startup: call chroot() if needed in one place
There are two 'chroot' code blocks, both under nearly the same condition:

	'if ((global.mode & (MODE_MWORKER|MODE_DAEMON...)...'

The first block serves to perform chroot only for process in the foreground
mode. The second comes after master-worker fork and allows to do chroot
in daemon and in worker modes.

Due to moving the master-worker fork in init() in some previous commit, the
second 'chroot' code block now is no longer under the 'if'. So, it is executed
for all modes, except MODE_MWORKER. Now in MODE_MWORKER process enters in its
wait polling loop just after forking a worker and it terminates almost
immediately, if it exits this loop.

Worker, daemon and process in a foreground mode will perform the chroot as
before, but now it will be done in one place in main().
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
7a2ee10d71 REORG: mworker: move mworker_create_master_cli in master 'case'
Let's move mworker_create_master_cli() call in 'master' case just above and get
rid of redundant global.mode tests.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
e4c10a704d MINOR: init: check MODE_MWORKER before creating master CLI
mworker_create_master_cli() creates MASTER proxy and allocates listeners,
which are attached to this proxy. It also creates a reload sockpair.

So, it's more appropriate to do the check, that we are in a MODE_MWORKER, if
master CLI settings were provided via command line, just after the config
parsing. And only then, if runtime mode and command line settings are
coherent, try to perform master-worker fork and try to create master CLI.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
26e53e2e8c REORG: mworker: set nbthread=1 for master after fork
After moving master-worker fork into init() and reintroducing it into a
switch-case (see the previous commit), it is more appropriate to set
nbthread=1 and nbtgroups=1 immediately in the 'case' for the parent process.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
ae84f06025 BUG/MINOR: mworker: fix memory leak due to master-worker fork
Before this fix, startup logs ring was duplicated before the fork(), so master
and worker had both the original startup_logs ring and the duplicated one. In
the worker context we freed the original ring and used a duplicated one. In
the master context we did nothing, but we still create a duplicated copy again
and again during the reload.

So, let's duplicate startup logs ring only in the worker context. Master
continues to use the original ring initialized in init() before its fork().
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
8dd4efe42f MAJOR: mworker: move master-worker fork in init()
This refactoring allows to simplify 'master-worker' logic. The master process
with this change will fork a worker very early at the initialization stage,
which allows to perform a configuration parsing only for the worker. In reality
only the worker process needs to parse and to apply the whole configuration.

Master process just polls master CLI sockets, watches worker status, catches
its termination state and handles the signals. With this refactoring there is
no longer need for master to perform re-execution after reading the whole
configuration file to free additional memory. And there is no longer need for
worker to register atexit callbacks, in order to free the memory, when it
fails to apply the new configuration. In contrast, we now need to set
proc_self pointer to the new worker entry in processes list just after
the fork in the worker process context. proc_self is dereferenced in
mworker_sockpair_register_per_thread(), which is called when worker enters in
its polling loop.

Following patches will try to gather more 'worker' and 'master' specific code
in the dedicated cases of this new fork() switch, or in separate functions.
2024-10-16 22:00:58 +02:00
Valentine Krasnobaeva
4cbfcc60f4 MEDIUM: startup: move PID handling in init()
Let's move PID handling in init() from the main() code. It is more appropriate
to open and to write the PID of the process just after daemonization fork. In
case of daemon monoprocess mode, we will simply write a PID of the process,
which is already in the background. In case of 'master-worker' mode, we keep
the previous behaviour and we write only a PID of the master process.

This allows to remove redundant tests of the process execution mode, tests of
the pidfd value and consequent writes to this pidfd. This patch prepares the
refactoring of master-worker fork by moving it in init() function as well.
2024-10-16 22:00:58 +02:00
Valentine Krasnobaeva
95c19be2ab MINOR: startup: refactor "daemonization" fork
Let's put "daemonization" fork into a switch-case. This is more readable and we
don't need to allocate memory for the fork() return value here.
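
The shape of such a switch can be sketched with plain POSIX calls. This is an
illustrative assumption, not the HAProxy code: the function name
fork_in_switch() is hypothetical, and the parent waits for the child here only
so that the outcome can be observed, whereas a daemonizing parent would
typically just exit.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks inside a switch, without storing the fork() return value in a
 * variable. Returns the child's exit code as seen by the parent, or -1 on
 * error.
 */
int fork_in_switch(void)
{
    int status;

    switch (fork()) {
    case -1:
        /* fork failed */
        return -1;
    case 0:
        /* child: this is where the background process would continue */
        _exit(7);
    default:
        /* parent: wait for the child in this sketch */
        if (wait(&status) < 0)
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
}
```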
2024-10-16 22:00:58 +02:00
Valentine Krasnobaeva
90b8181c0a MEDIUM: startup: move daemonization fork in init
Let's move daemonization fork in init(). We need to perform this fork always
before forking a worker process, in order to be able to launch master and then
its worker in daemon, i.e. background mode, if haproxy was started with '-D'
option.

This refactoring is a preparation step, needed for replacing then master-worker
fork in init() as well. This allows the master process not to read the whole
configuration file and not to do re-execution in order to free additional
memory, when worker was forked. In the new refactored design only the worker
process will read and apply a new configuration, while the master will arrive
very fast in its polling loop to wait worker's termination and to handle
signals. See more details in the following commits.
2024-10-16 22:00:58 +02:00
Valentine Krasnobaeva
df12791da3 MINOR: startup: add O_CLOEXEC flag to open /dev/null
As master process performs execvp() syscall to handle USR2 and HUP signals in
mworker_reexec(), let's add O_CLOEXEC flag, when we open '/dev/null' in order
to avoid fd leak.

This is a preparation step to refactor master-worker logic. See more details in
the next commits.
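
A minimal sketch of opening '/dev/null' with the close-on-exec flag and
checking that the flag is set. The helper names open_devnull_cloexec() and
fd_is_cloexec() are hypothetical; open(), fcntl(), O_CLOEXEC and FD_CLOEXEC are
standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Opens /dev/null so that the descriptor is automatically closed on exec.
 * Returns the fd, or -1 on error.
 */
int open_devnull_cloexec(void)
{
    return open("/dev/null", O_RDWR | O_CLOEXEC);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if it is not, -1 on error. */
int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A descriptor opened this way does not survive execvp(), so it cannot
accumulate across successive re-executions.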
2024-10-16 22:00:58 +02:00
Valentine Krasnobaeva
5bbcdc003a REGTESTS: cli: add delay 0.1 before connect to cli
When vtest starts the haproxy process, it loops until the haproxy pidfile is
created. Once the pidfile exists, vtest considers that the haproxy process is
ready and starts to perform test commands, in particular, it connects to the
CLI. Basing the readiness check on the PID file is not a very reliable
approach. After the master-worker architecture refactoring, the pidfile is
created in the early init stage, when the master and worker have not yet
finished their initialization routines. So, all mcli tests and some tests
where we send commands to the CLI start to fail regularly.

At the moment vtest offers no other way to check that the process is really
ready. So let's add a 0.1s delay before connecting to the CLI in all mcli
tests and in the acl_cli_spaces test.
2024-10-16 22:00:58 +02:00
Willy Tarreau
2c2dac77aa DEBUG: mux-h2/flags: add H2_CF_DEM_RXBUF & H2_SF_EXPECT_RXDATA for the decoder
Both flags were recently added but missing from the decoders flags, so
they appeared in hex in dev/flags/flags output. No backport needed.
2024-10-16 18:32:52 +02:00
Willy Tarreau
ca275d99ce BUG/MEDIUM: queue: make sure never to queue when there's no more served conns
Since commit 53f52e67a0 ("BUG/MEDIUM: queue: always dequeue the backend when
redistributing the last server"), we've got two reports again still showing
the theoretically impossible condition in pendconn_add(), including a single
threaded one.

Thanks to the traces, the issue could be tracked down to the redispatch part.
In fact, in non-deterministic LB algorithms (RR, LC, FAS), we don't perform the
LB if there are pending connections in the backend, since it indicates that
previous attempts already failed, so we directly return SRV_STATUS_FULL. And
contrary to a previous belief, it is possible to meet this condition with
be->served==0 when redispatching (and likely with maxconn not greater than
the number of threads).

The problem is that in this case, the entry is queued and then the
pendconn_must_try_again() function checks if any connections are currently
being served to detect if we missed a race, and tries again, but that
situation is not caused by a concurrent thread and will never fix itself,
resulting in the loop.

All that part around pendconn_must_try_again() is still quite brittle, and
a safer approach would involve a sequence counter to detect new arrivals
and dequeues during the pendconn_add() call. But it's more sensitive work,
probably for a later fix.

This fix must be backported wherever the fix above was backported. Thanks
to Patrick Hemmer, as well as Damien Claisse and Basha Mougamadou from
Criteo for their help on tracking this one!
2024-10-16 18:08:39 +02:00
Aurelien DARRAGON
85298189bf BUG/MEDIUM: server: server stuck in maintenance after FQDN change
Pierre Bonnat reported that SRV-based server-template recently stopped
working properly.

After reviewing the changes, it was found that the regression was caused
by a4d04c6 ("BUG/MINOR: server: make sure the HMAINT state is part of MAINT")

Indeed, HMAINT is not a regular maintenance flag. It was implemented in
b418c122 a4d04c6 ("BUG/MINOR: server: make sure the HMAINT state is part
of MAINT"). This flag is only set (and never removed) when the server FQDN
is changed from its initial config-time value. This can happen with "set
server fqdn" command as well as SRV records updates from the DNS. This
flag should ideally belong to server flags, but it was stored under
srv_admin enum because cur_admin is properly exported/imported via server
state-file while regular server's flags are not.

Due to a4d04c6, when a server FQDN changes, the server is considered in
maintenance, and since the HMAINT flag is never removed, the server is
stuck in maintenance.

To fix the issue, we partially revert a4d04c6. But this latter commit is
right on one point: HMAINT flag was way too confusing and mixed-up between
regular MAINT flags, thus there's nothing to blame about a4d04c6 as it was
error-prone anyway. To prevent this kind of bug from happening again,
let's rename HMAINT to something more explicit (SRV_ADMF_FQDN_CHANGED) and
make it stand out under srv_admin enum so we're not tempted to mix it with
regular maintenance flags anymore.

Since a4d04c6 was set to be backported in all versions, this patch must
be backported there as well.
2024-10-16 14:26:57 +02:00
Amaury Denoyelle
0918c41ef6 BUG/MEDIUM: quic: support wait-for-handshake
wait-for-handshake http-request action was completely ineffective with
QUIC protocol. This commit implements its support for QUIC.

QUIC MUX layer is extended to support wait-for-handshake. A new function
qcc_handle_wait_for_hs() is executed during qcc_io_process(). It detects
if MUX processing occurs after underlying QUIC handshake completion. If
this is the case, it indicates that early data may be received. As such,
connection is flagged with CO_FL_EARLY_SSL_HS, which is necessary to
block stream processing on wait-for-handshake action.

After this, qcc subscribes on quic_conn layer for RECV notification. This
is used to detect QUIC handshake completion. Thus,
qcc_handle_wait_for_hs() can be reexecuted one last time, to remove
CO_FL_EARLY_SSL_HS and notify every stream flagged as
SE_FL_WAIT_FOR_HS.

This patch must be backported up to 2.6, after a mandatory period of
observation. Note that it relies on the backport of the two previous
patches :
- MINOR: quic: notify connection layer on handshake completion
- BUG/MINOR: stream: unblock stream on wait-for-handshake completion
2024-10-16 11:51:35 +02:00
Amaury Denoyelle
73031e81cd BUG/MINOR: stream: unblock stream on wait-for-handshake completion
wait-for-handshake is an http-request action which allows delaying the
processing of content received as TLS early data. The action yields
as long as connection handshake is in progress. In the meantime, stconn
is flagged with SE_FL_WAIT_FOR_HS.

When the handshake is finished, MUX layer is responsible for waking up
SE_FL_WAIT_FOR_HS flagged stconn instances to restart the stream
processing. On sc_conn_process(), SE_FL_WAIT_FOR_HS flag is removed and
stream layer is woken up.

However, the stream may remain blocked after MUX notification. sc_conn_recv()
may return 0 due to no new data reception, which prevents
sc_conn_process() execution. The stream is thus blocked until its
timeout.

To fix this, check for the handshake termination condition in
sc_conn_recv(). If true, explicitly return 1 to ensure sc_conn_process()
will be executed.

Note that this bug is not reproducible due to various conditions related
to early data implementation in haproxy. Indeed, connection layer
instantiation is always delayed until SSL handshake completion, which
prevents the handling of early data as expected.

This fix will be necessary to implement wait-for-handshake support for
QUIC. As such, it must be backported with the next commit up to 2.6,
after a mandatory period of observation.
2024-10-16 11:44:31 +02:00
Amaury Denoyelle
5a5950e42d MINOR: quic: notify connection layer on handshake completion
Wake up connection layer on QUIC handshake completion via
quic_conn_io_cb. Select SUB_RETRY_RECV as this was previously unused by
QUIC MUX layer.

For the moment, QUIC MUX never subscribes for handshake completion.
However, this will be necessary for features such as the delaying of
early data forwarding via wait-for-handshake.

This patch will be necessary to implement wait-for-handshake support for
QUIC. As such, it must be backported with next commits up to 2.6,
after a mandatory period of observation.
2024-10-16 11:42:06 +02:00
Willy Tarreau
5091f90479 MINOR: activity/memprofile: always return "other" bin on NULL return address
It was found in a large "show profiling memory" output that a few entries
have a NULL return address, which causes confusion because this address
will be reused by the next new allocation caller, possibly resulting in
inconsistencies such as "free() ... pool=trash" which makes no sense. The
cause is in fact that the first caller had an entry->info pointing to the
trash pool from a p_alloc/p_free with a NULL return address, and the second
had a different type and reused that entry.

Let's make sure undecodable stacks causing an apparent NULL return address
all lead to the "other" bin.

While this is not exactly a bug, it would make sense to backport it to the
recent branches where the feature is used (probably at least as far as 2.8).
2024-10-15 08:12:34 +02:00
Willy Tarreau
93c9f19af7 REGTESTS: fix a reload race in abns_socket.vtc
This test issues a reload over the master CLI, but it is totally
possible that the master has not yet finished starting up the master
CLI when the command is issued, resulting in a failure. This was much
more visible on the new master-worker model, but definitely affects the
old one and could be the reason for this test to occasionally fail on
the CI.
2024-10-14 19:15:21 +02:00
William Lallemand
0302adf996 CI: cirrus-ci: bump FreeBSD image to 14-1
FreeBSD CI seems to have been broken for a while, try to upgrade the image to
the latest 14.1 version.
2024-10-14 14:28:26 +02:00
Willy Tarreau
e4cb0ad632 MINOR: mux-h2/traces: add buffer-related info to h2s and h2c
The traces currently don't contain any info about the amount of data
present in buffers, making it difficult to figure if an empty buffer
is the cause for not demuxing or if a full buffer is the cause for
not reading more data. Let's add them, with the head/tail info as
well.
2024-10-12 18:07:21 +02:00
Willy Tarreau
a8f907a459 MINOR: mux-h2/traces: add missing flags and proxy ID in traces
H2 traces are unusable to detect bugs most of the time because they miss
the h2c and h2s flags, as well as the proxy, which makes it very hard to
figure if the info comes from the client or the server as soon as two
layers are stacked. This commit adds this precious information as well
as the h2s's rx and tx windows.

This could be backported to a few recent branches, but the rx window
calculation will have to be replaced with the static value there.
2024-10-12 17:45:51 +02:00
Willy Tarreau
fcab647613 OPTIM: mux-h2: use tasklet_wakeup_after() in h2s_notify_recv()
This reduces the avg wakeup latency of sc_conn_io_cb() from 1900 to 51us.
The L2 cache misses went from 1.4 to 1.2 billion for 20k req. But the
perf is not better. Also there are situations where we must not perform
such wakeup, these may only be done from h2_io_cb, hence the test on the
next_tasklet pointer and its reset when leaving the function. In practice
all callers to h2s_close() or h2s_destroy() can reach that code, this
includes h2_detach, h2_snd_buf, h2_shut etc.

Another test with 40 concurrent connections, transferring 40k 1MB objects
at different concurrency levels from 1 to 80 also showed a 21% drop in L2
cache misses, and a 2% perf improvement:

Before:
   329,510,887,528  instructions
    50,907,966,181  branches
       843,515,912  branch-misses
     2,753,360,222  cache-misses
    19,306,172,474  L1-icache-load-misses
    17,321,132,742  L1-dcache-load-misses
       951,787,350  LLC-load-misses

      44.660469000 seconds user
      62.459354000 seconds sys

   => avg perf: 373 MB/s

After:
   331,310,219,157  instructions
    51,343,396,257  branches
       851,567,572  branch-misses
     2,183,369,149  cache-misses
    19,129,827,134  L1-icache-load-misses
    17,441,877,512  L1-dcache-load-misses
       906,923,115  LLC-load-misses

      42.795458000 seconds user
      62.277983000 seconds sys

   => avg perf: 380 MB/s

With small requests, it's the L1 and L3 cache misses which reduced by
3% and 7% respectively, and the performance went up by 3%.
2024-10-12 17:17:51 +02:00
Willy Tarreau
04ce6536e1 OPTIM: mux-h2: try to continue reading after demuxing when useful
When we stop demuxing in the middle of a frame, we know that there are
other data following. The demux buffer is small and unique, but now we
have rxbufs, so after h2_process_demux() is left, the dbuf is almost
empty and has room to be delivered into another rxbuf.

Let's implement a short loop with a counter and a few conditions around
the demux call. We limit the number of turns to the number of available
rxbufs and no more than 12, since it shows good performance, and the
wakeup is only called once. This has shown a nice 12-20% bandwidth gain
on backend-side H2 transferring 1MB-large objects, and does not affect
the rest (headers, control etc). The number of wakeup calls was divided
by 5 to 8, which is also a nice improvement. The counter is limited to
make sure we don't add processing latency. Tests were run to find the
optimal limit, and it turns out that 16 is just slightly better, but not
worth the +33% increase in peak processing latency.
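
The bounded loop described above can be pictured roughly as follows. This is
only a sketch under stated assumptions: demux_turn_budget(), demux_loop(),
count_and_continue() and DEMUX_MAX_TURNS are hypothetical names, and the
callback stands in for the real demux call.

```c
/* Hypothetical cap; the message above reports 12 as a good trade-off. */
#define DEMUX_MAX_TURNS 12

/* Returns the number of demux turns allowed for one wakeup: no more than
 * the number of available rxbufs, and never more than DEMUX_MAX_TURNS.
 */
static unsigned int demux_turn_budget(unsigned int avail_rxbufs)
{
	return avail_rxbufs < DEMUX_MAX_TURNS ? avail_rxbufs : DEMUX_MAX_TURNS;
}

/* Calls <demux_once> until it reports no progress (returns 0) or the
 * budget is exhausted. Returns the number of turns that made progress.
 */
static unsigned int demux_loop(unsigned int avail_rxbufs,
                               int (*demux_once)(void *ctx), void *ctx)
{
	unsigned int budget = demux_turn_budget(avail_rxbufs);
	unsigned int turns = 0;

	while (turns < budget && demux_once(ctx))
		turns++;
	return turns;
}

/* Demo callback: counts its invocations and always reports progress. */
static int count_and_continue(void *ctx)
{
	(*(unsigned int *)ctx)++;
	return 1;
}
```

With 100 available rxbufs and a callback that always makes progress, the loop
stops after 12 turns. A cap of 16 would allow one third more turns per wakeup
than 12, which is presumably where the +33% peak latency figure comes from.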

The h2_process_demux() function just doesn't call the wakeup function
anymore, and solely focuses on transferring from dbuf to rxbuf.

Practical measurement: test with h2load producing 4 concurrent connections
with 10 concurrent streams each, downloading 1MB objects (20k total) via
two layers of haproxy stacked, reaching httpterm over H1 (numbers are total
for the 2 h2 front and 1 h2 back). All on a single thread.

Before: 549-553 MB/s (on h2load)
  function    calls  cpu_tot  cpu_avg
  h2_io_cb  2562340  8.157s   3.183us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
  h2_io_cb    30109  840.9ms  27.93us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb    16105  106.4ms  6.607us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb        1  11.75us  11.75us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb  2608555  9.104s   3.490us --total--

  perf stat:
   153,117,996,214 instructions                             (71.41%)
    22,919,659,027 branches       # 14.97% of inst          (71.41%)
       384,009,600 branch-misses  #  1.68% of all branches  (71.42%)
        44,052,220 cache-misses          # 1 inst / 3476    (71.44%)
     9,819,232,047 L1-icache-load-misses # 6.4% of inst     (71.45%)
     8,426,410,306 L1-dcache-load-misses # 5.5% of inst     (57.15%)
        10,951,949 LLC-load-misses       # 1 inst / 13982   (57.13%)

      12.372600000 seconds user
      23.629506000 seconds sys

After: 660 MB/s (+20%)
  function    calls  cpu_tot  cpu_avg
  h2_io_cb   244502  4.410s   18.04us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
  h2_io_cb    42107  1.062s   25.22us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb    13703  106.3ms  7.758us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb        1  13.74us  13.74us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb   300313  5.578s   18.57us --total--

  perf stat:
   126,840,441,876 instructions                             (71.40%)
    17,576,059,236 branches       # 13.86% of inst          (71.40%)
       274,136,753 branch-misses  #  1.56% of all branches  (71.42%)
        30,413,562 cache-misses          # 1 inst / 4170    (71.45%)
     6,665,036,203 L1-icache-load-misses # 5.25% of inst    (71.46%)
     7,519,037,097 L1-dcache-load-misses # 5.9% of inst     (57.15%)
         6,702,411 LLC-load-misses       # 1 inst / 18925   (57.12%)

      10.490097000 seconds user
      19.212515000 seconds sys

It's also interesting to see that less total time is spent in these
functions, clearly indicating that the cost of interrupted processing,
and the extraneous cache misses come into play at some point. Indeed,
after the change, the number of instructions went down by 17.2%, while
the L2 cache misses dropped by 31% and the L3 cache misses by 39%!
2024-10-12 16:38:36 +02:00
Willy Tarreau
9fbc01710a OPTIM: mux-h2: make h2_send() report more accurate wake up conditions
h2_send() used to report non-zero every time any data were sent, and
this was used from h2_snd_buf() or h2_done_ff() to trigger a wakeup,
which possibly does nothing. Let's restrict this wakeup to either a
successful send() combined with the ability to demux, or an error.

Doing this makes the number of h2_io_cb() wakeups drop from 422k to
245k for 1000 1MB objects delivered over 100 streams between two H2
proxies, without any behavior change nor performance change. In
practice, most send() calls do not result in a wakeup anymore but
synchronous errors still do.

A local test downloading 10k 1MB objects from an H1 server with a single
connection shows this change:

   before      after    caller
   1547        1467     h2_process_demux()
   2138           0     h2_done_ff()       <---
     38        1453     ssl_sock_io_cb()   <---
     18           0     h2_snd_buf()
      1           1     h2_init()
   3742        2921     -- total --

In practice the ssl_sock_io_cb() wakeups are those notifying about
SUB_RETRY_RECV, which are not accounted for when h2_done_ff() performs
the wakeup because the tasklet is already queued (a counter placed
there shows that it's nonetheless called). So there's no transfer and
h2_done_ff() was only hiding the other one.

Another test involving 4 connections with 10 concurrent streams each
and 20000 1MB objects total shows a total disappearance of the wakeups
from h2_snd_buf and h2_done_ff, which used to account together for
50% of the wakeups, resulting in effectively halving the number of
wakeups which, based on their avg process time, were not doing
anything:

Before:
  function   calls     cpu_tot   cpu_avg
  h2_io_cb   2571208   7.406s    2.880us <- h2c_restart_reading@src/mux_h2.c:940 tasklet_wakeup
  h2_io_cb   2536949   251.4ms   99.00ns <- h2_snd_buf@src/mux_h2.c:7573 tasklet_wakeup ###
  h2_io_cb     41100   5.622ms   136.0ns <- h2_done_ff@src/mux_h2.c:7779 tasklet_wakeup ###
  h2_io_cb     38979   852.8ms   21.88us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb     12519   90.28ms   7.211us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb         1   13.81us   13.81us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb   5200756   8.606s    1.654us --total--

After:
  h2_io_cb   2562340   8.157s    3.183us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
  h2_io_cb     30109   840.9ms   27.93us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb     16105   106.4ms   6.607us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb         1   11.75us   11.75us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb   2608555   9.104s    3.490us --total--
2024-10-12 16:38:36 +02:00
Willy Tarreau
633c41c621 MEDIUM: mux-h2: rework h2_restart_reading() to differentiate recv and demux
From the beginning, h2_restart_reading() has always been confusing because
it decides whether or not to wake the tasklet handler up. This
tasklet handler does two things, one is receiving from the socket to the
demux buf, and one is demuxing from the demux buf to the streams' rxbufs.

The conditions are governed by h2_recv_allowed(), which is also called at
a few places to decide whether or not to actually receive from the socket.
It starts to be visible that this leaves some difficulties regarding what
to do with possibly pending data.

In 2.0 with commit 3ca18bf0b ("BUG/MEDIUM: h2: Don't attempt to recv from
h2_process_demux if we subscribed."), we even had to address a special
case where it was possible to endlessly wake up because the conditions
would rely on the demux buffer's contents, though the solution consisted
in passing a flag to decide whether or not to consider the buffer's
contents.

In 2.5 commit b5f7b5296 ("BUG/MEDIUM: mux-h2: Handle remaining read0 cases
on partial frames") introduced a new flag H2_CF_DEM_SHORT_READ which
indicates that the demux had to stop in the middle of a frame and cannot
make progress without more data. More adaptations later came in based on
this but this actually reflected exactly what was needed to solve this
painful situation: a state indicating whether to receive or parse.

Now's about time to definitely address this by reworking h2_restart_reading()
to check two completely independent things:
  - the ability to receive more data into the demux buffer, which is
    based on its allocation/fill state and the socket's errors
  - the ability to demux such data, which is based on the presence of
    enough data (i.e. no stuck short read), and ability to find an rx
    buf to continue the processing.
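
To make the two independent checks above more concrete, here is a simplified
sketch. It does not reproduce the real h2c structure nor the actual
conditions in mux_h2.c: struct conn_sketch, its fields and the three
functions are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified connection state; not haproxy's struct h2c. */
struct conn_sketch {
	size_t dbuf_data;    /* bytes currently held in the demux buffer */
	size_t dbuf_size;    /* demux buffer capacity */
	bool   sock_error;   /* the socket reported an error */
	bool   short_read;   /* demux stopped mid-frame and needs more data */
	bool   rxbuf_avail;  /* an rx buffer can be found to continue */
};

/* First check: may we receive more data into the demux buffer? */
static bool may_recv(const struct conn_sketch *c)
{
	return !c->sock_error && c->dbuf_data < c->dbuf_size;
}

/* Second check: may we demux what is already in the demux buffer? */
static bool may_demux(const struct conn_sketch *c)
{
	return c->dbuf_data > 0 && !c->short_read && c->rxbuf_avail;
}

/* Waking the handler up is only useful if one of them can progress. */
static bool should_wake(const struct conn_sketch *c)
{
	return may_recv(c) || may_demux(c);
}
```

Keeping the two predicates separate is what removes the need for a
caller-supplied hint about whether the buffer's contents must be considered.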

Now the conditions are much more understandable, and it's also visible
that the consider_buffer argument, whose value was not trivial for
callers, is not used anymore.

Tests stacking two layers of H2 show strictly no change to the wakeup
cause distributions nor counts.
2024-10-12 16:38:36 +02:00
Willy Tarreau
e057f8367c DOC: design-thoughts: add diagrams illustrating an rx win groth
Let's just see on a diagram how the receiver can detect that the
window is large enough for the remote sender to fill the link. Here
it seems that a first criterion is that data are accumulating in
the rxbuf, indicating that the next hop doesn't consume them fast
enough. On the diagram it's visible when blue arrows (incoming data)
are more frequent than the magenta ones on average (outgoing data),
which happens when silence moments are less frequent and don't allow
the reader to catch up. It's also visible that there are two phases
alternating in the transfer:
  - measure round trip time (i.e. how long it takes to restart
    sending after a WU was sent after a long silence)

  - measure the lowest rxbuf size during the previous round trip

It's worth noting that a window size change only has *observable* effect
after two RTT: the first RTT is to restart sending (opening or enlarging
the window), the second RTT to measure the lowest rxbuf size over the
period.

By turning the advertised window into an offset and comparing it to
the received quantity, it's possible to measure the RTT of the whole
chain (including the client possibly producing the data). Note that
when multiple streams compete for BW this can become tricky. Limiting
the window to available buffers and counting the number of sending
streams on a connection could work (i.e. split total buffers into
1+#senders, first one being used for tx).
2024-10-12 16:38:36 +02:00
Willy Tarreau
0fd66703c2 MEDIUM: mux-h2: change the default initial window to 16kB
Now that we're using all available rx buffers for transfers, there's
no point anymore in advertising more than the minimum value we can
safely buffer. Let's be conservative and only rely on the dynamic
buffers to improve speed beyond the configured value, and make sure
that many streams will no longer cause unfairness.

Interestingly, the total number of wakeups has further shrunk down, but
with a different distribution. From 128k for 1000 1M transfers, it went
down to 119k, with 96k from restart_reading, 10k from done_ff and 2.6k
from snd_buf. done_ff went up by 30% and restart_reading went down by
30%.
2024-10-12 16:38:26 +02:00
Willy Tarreau
1ed9d37c88 MINOR: mux-h2: add tune.h2.be.rxbuf and tune.h2.fe.rxbuf global settings
These settings allow changing the total buffer size allocated to the
backend and frontend respectively. This way it's no longer necessary to
play with tune.bufsize nor increase the number of streams to benefit from
more buffers.

Setting tune.h2.fe.rxbuf to 4m to match a sender's max tcp_wmem resulted
in 257 Mbps for a single stream at 103ms vs 121 Mbps default (or 5.1 Mbps
with a single buffer and 64kB window).
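
The 5.1 Mbps figure is consistent with the usual window/RTT bound for a
single flow-controlled stream. The helper below is only an illustration
(window_limited_mbps() is a hypothetical name, and 64kB is assumed to mean
65536 bytes):

```c
/* Upper bound on single-stream throughput, in Mbps, when at most
 * <window_bytes> may be in flight per round trip of <rtt_ms> milliseconds.
 */
static double window_limited_mbps(double window_bytes, double rtt_ms)
{
	return (window_bytes * 8.0) / (rtt_ms / 1000.0) / 1e6;
}
```

Calling window_limited_mbps(65536.0, 103.0) gives roughly 5.09 Mbps. The same
formula only gives an upper bound for the larger buffer sizes, since other
limits (such as the sender's own buffers) come into play there.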
2024-10-12 16:29:16 +02:00
Willy Tarreau
e018d9a0cf MAJOR: mux-h2: make the rxbuf allocation algorithm a bit smarter
Without using bandwidth estimates, we can already use up to the number
of allocatable rxbufs and share them evenly between receiving streams.
In practice we reserve one buffer for any non-receiving stream, plus
1 per 8 possible new streams, and divide the rest between the number
of receiving streams.

Finally, for front streams, this is rounded up to the buffer size while
for back streams we round it down. The rationale here is that front to
back is very fast to flush and slow to refill so we want to optimise
upload bandwidth regardless of the number of streams, while it's the
opposite in the other way so we try to minimize HoL.

That shows good results with a single stream being able to send at 121
Mbps at 103ms using 1.4 MB buffer with default settings, or 8 streams
sharing the bandwidth at 180kB each. Previously the limit was approx
5.1 Mbps per stream.

It also enables better sharing of backend connections: a slow (100 Mbps)
and a fast (1 Gbps) clients were both downloading 2 100MB files each over
a shared H2 connection. The fast one used to show 6.86 to 20.74s with an
avg of 11.45s and an stddev of 5.81s before the patch, and went to a
much more respectable 6.82 to 7.73s with 7.08s avg and 0.336s stddev.

We don't try to increase the window past the remaining content length.
First, this is pointless (though harmless), but in addition it causes
needless emission of WINDOW_UPDATE frames on small uploads that are
smaller than a window, and beyond being useless, it upsets vtest which
expects an RST on some tests. The scheduling is not reliable enough to
insert an expect for a window update first, so in the end with that
extra check we save a few useless frames on small uploads and please
vtest.

A new setting should be added to allow increasing the number of buffers
without having to change the number of streams. At this point it's not
done.
2024-10-12 16:29:16 +02:00
Willy Tarreau
3816c38601 MAJOR: mux-h2: permit a stream to allocate as many buffers as desired
Now we don't enforce allocation limits in h2s_get_rxbuf(), since there
is no benefit in not processing pending data, it would still cause HoL
for no saving. The only reason for not allocating is if there are no
buffers available for the connection.

In theory this should not change anything except that it exercises code
paths that support reallocating multiple buffers, which could possibly
uncover a sleeping bug. This is why it's placed in a separate commit.

And one observation worth noting is that it almost cut in half the number
of iocb wakeups: for 1000 1MB transfers over 100 concurrent streams of a
single connection, we used to observe 208k wakeups (110k from restart_reading,
80k from snd_buf, 11k from done_ff), and now we're observing 128k (113k from
restart_reading, 2.4k from snd_buf, 6.9k from done_ff), which seems to
indicate that pretty often the demuxing was blocked on a buffer full due
to the default advertised window of 64k.
2024-10-12 16:29:16 +02:00
Willy Tarreau
4eb3ff1d3b MAJOR: mux-h2: make streams use the connection's buffers
For now it seems to work as before, and even when artificially inflating
the number of allocatable buffers per stream. The number of allocated
slots is always the same as the max number of streams, which guarantees
that each stream will find one buffer. We only grant one buffer per
stream at this point, since the goal was to replace the existing single
rxbuf.

A new demux blocking flag, H2_CF_DEM_RXBUF, was added to indicate
a failure to get an rxbuf slot from the connection. It was lightly
tested (by forcing bl_init() to a lower number of buffers). It is not
yet certain whether it's more useful to have a new flag or to reuse
the existing H2_CF_DEM_SFULL which indicates the rxbuf is full,
but at least the new flag more accurately translates the condition,
that may make a difference in the future. However, given that when
RXBUF is set, most of the time it results in a failure to find more
room to demux and it sets SFULL, for now we have to always clear
SFULL when clearing RXBUF as well. This means that most of the time
we'll see 3 combinations:
  - none: everything's OK
  - SFULL: the unique rx buffer is full
  - RXBUF || (RXBUF|SFULL): cannot allocate more entries

Note that we need to be super careful in h2_frt_transfer_data() because
the htx_free_data_space() function doesn't guarantee that the room is
usable, so htx_add_data() may still fail despite an apparent room. For
this reason, h2_frt_transfer_data() maintains a "full" flag to indicate
that a transfer attempt failed and that a new buffer is required.
2024-10-12 16:29:16 +02:00
Willy Tarreau
6279cbc9e9 MINOR: mux-h2: clear up H2_CF_DEM_DFULL and H2_CF_DEM_SHORT_READ ambiguity
Since commit 485da0b05 ("BUG/MEDIUM: mux_h2: Handle others remaining
read0 cases on partial frames"), H2_CF_DEM_SHORT_READ is set when there
are no blocking flags. However, it checks H2_CF_DEM_BLOCK_ANY which does
not include H2_CF_DEM_DFULL. This results in many cases where both
H2_CF_DEM_DFULL and H2_CF_DEM_SHORT_READ are set together, which makes
no sense, since one says the demux buffer is full while the other one
says an incomplete read was done. This doesn't permit to properly
decide whether to restart reading or processing.

Let's make sure to clear DFULL in h2_process_demux() whenever we
consume incoming data from the dbuf, and check for DFULL before
setting SHORT_READ.

This could probably be considered as a bug fix but it's hard to say if
it has any impact on the current code, probably at worst it might cause
a few useless wakeups, so until there's any proof that it needs to be
backported, better not do it.
2024-10-12 16:29:16 +02:00
Willy Tarreau
b74bedf157 MINOR: mux-h2: simplify the wake up code in h2_rcv_buf()
The code used to decide when to restart reading is far from being trivial
and will cause trouble after the forthcoming changes: it checks if the
current stream is the same that is being demuxed, and only if so, wakes
the demux to restart reading. Once streams start to use multiple
buffers, this condition will make no sense anymore. Actually the real
reason is split into two steps:
  - detect if the demux is currently blocked on the current stream, and
    if so remove SFULL
  - detect if any demux blocking flags were removed during the operations,
    and if so, wake demuxing.

For now this doesn't change anything.
2024-10-12 16:29:16 +02:00
Willy Tarreau
a0ed92f3dd MINOR: mux-h2: simplify the exit code in h2_rcv_buf()
The code used to decide what to tell to the upper layer and when to free
the rxbuf is a bit convoluted and difficult to adapt to dynamic rxbufs.
We first need to deal with memory management (b_free) and only then to
decide what to report upwards. Right now it does it the other way around.

This should not change anything.
2024-10-12 16:29:16 +02:00
Willy Tarreau
3b5ac2b553 MINOR: mux-h2: move H2_CF_WAIT_IN_LIST flag away from the demux flags
It's not convenient to have this flag in the middle of the demux flags,
it easily hides other ones that need to be added. Let's move it after
the other ones.
2024-10-12 16:29:16 +02:00
Willy Tarreau
8cf418811d MINOR: mux-h2: add rxbuf head/tail/count management for h2s
Now the h2s get their rx_head, rx_tail and rx_count associated with the
shared rxbufs. A few functions are provided to manipulate all this,
essentially allocate/release a buffer for the stream, return a buffer
pointer to the head/tail, counting allocated buffers for the stream
and reporting if a stream may still allocate.

For now this code is not used.
2024-10-12 16:29:16 +02:00
Willy Tarreau
a891534bfd MINOR: mux-h2: allocate the array of shared rx bufs in the h2c
In preparation for having a shared list of rx bufs, we're now allocating
the array of shared rx bufs in the h2c. The pool is created at the max
size between the front and back max streams for now, and the array is not
used yet.
2024-10-12 16:29:16 +02:00
Willy Tarreau
721ea5b06c MINOR: mux-h2: count within a connection, how many streams are receiving data
A stream is receiving data from after the HEADERS frame missing END_STREAM,
to the end of the stream or HREM (the presence of END_STREAM). We're now
adding a flag to the stream that indicates this state, as well as a counter
in the connection of streams currently receiving data. The purpose will be
to gauge at any instant the number of streams that might have to share the
available bandwidth and buffers count in order not to allocate too much flow
control to any single stream. For now the counter is kept up to date, and is
reported in "show fd".
2024-10-12 16:29:16 +02:00
Willy Tarreau
c9275084bc MEDIUM: mux-h2: start to introduce the window size in the offset calculation
Instead of incrementing the last_max_ofs by the amount of received bytes,
we now start from the new current offset to which we add the static window
size. The result is exactly the same but it prepares the code to use a
window size combined with an offset instead of just refilling the budget
from what was received.

It was even verified that changing h2_fe_settings_initial_window_size in
the middle of a transfer using gdb does indeed allow the transfer speed
to adapt accordingly.
2024-10-12 16:29:16 +02:00
Willy Tarreau
1cc851d9f2 MEDIUM: mux-h2: start to update stream when sending WU
The rationale here is that we don't absolutely need to update the
stream offset live, there's already the rcvd_s counter to remind
us we've received data. So we can continue to exploit the current
check points for this.

Now we know that rcvd_s indicates the amount of newly received bytes
for the stream since last call to h2c_send_strm_wu() so we can update
our stream offsets within that function. The wu_s counter is set to
the difference between next_adv_ofs and last_adv_ofs, which are
resynchronized once the frame is sent.

If the stream suddenly disappears with unacked data (aborted upload),
the presence of the last update in h2c->wu_s is sufficient to let the
connection ack the data alone, and upon subsequent calls with new
rcvd_s, the received counter will be used to ack, like before. We
don't need to do more anyway since the goal is to let the client
abort ASAP when it gets an RST.

At this point, the stream knows its current rx offset, the computed
max offset and the last advertised one.
2024-10-12 16:29:16 +02:00
Willy Tarreau
eb0fe66c61 MINOR: mux-h2: create and initialize an rx offset per stream
In H2, everything is accounted as budget. But if we want to moderate
the rcv window that's not very convenient, and we'd rather have offsets
instead so that we know where we are in the stream. Let's first add
the fields to the struct and initialize them. The curr_rx_ofs indicates
the position in the stream where next incoming bytes will be stored.
last_adv_ofs tells what's the offset that was last advertised as the
window limit, and next_max_ofs is the one that will need to be
advertised, which is curr_rx_ofs plus the current window. next_max_ofs
will have to cause a WINDOW_UPDATE to be emitted when it's higher than
last_adv_ofs, and once the WU is sent, its value will have to be copied
over last_adv_ofs.

The problem is, for now wherever we emit a stream WU, we have no notion
of stream (the stream might even not exist anymore, e.g. after aborting
an upload), because we currently keep a counter of stream window to be
acked for the current stream ID (h2c->dsi) in the connection (rcvd_s).
Similarly there are a few places early in the frame header processing
where rcvd_s is incremented without knowing the stream yet. Thus, lookups
will be needed for that, unless such a connection-level counter remains
used and poured into the stream's count once known (delicate).

Thus for now this commit only creates the fields and initializes them.
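
As an illustration only, the relationship between the three offsets described above can be sketched as follows. The struct and helper names are hypothetical and are not taken from mux_h2.c, and starting last_adv_ofs at the initial window is an assumption made for this sketch:

```c
#include <stdint.h>

/* Hypothetical per-stream receive offsets; field names follow the text. */
struct rx_ofs {
    uint64_t curr_rx_ofs;   /* where the next incoming bytes will be stored */
    uint64_t last_adv_ofs;  /* window limit last advertised to the peer */
    uint64_t next_max_ofs;  /* limit to advertise next: curr_rx_ofs + window */
};

/* Assumption: the peer initially knows about one full window. */
static void rx_ofs_init(struct rx_ofs *o, uint32_t window)
{
    o->curr_rx_ofs = 0;
    o->last_adv_ofs = window;
    o->next_max_ofs = window;
}

/* Account <len> received bytes and recompute the limit to advertise. */
static void rx_ofs_received(struct rx_ofs *o, uint32_t len, uint32_t window)
{
    o->curr_rx_ofs += len;
    o->next_max_ofs = o->curr_rx_ofs + window;
}

/* Returns the WINDOW_UPDATE increment to emit, or 0 if none is needed. */
static uint64_t rx_ofs_wu_needed(const struct rx_ofs *o)
{
    if (o->next_max_ofs > o->last_adv_ofs)
        return o->next_max_ofs - o->last_adv_ofs;
    return 0;
}

/* Once the WU is sent, the advertised limit catches up. */
static void rx_ofs_wu_sent(struct rx_ofs *o)
{
    o->last_adv_ofs = o->next_max_ofs;
}
```

With a static window this yields the same increments as refilling a budget, but the window argument may later vary per call.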
2024-10-12 16:29:15 +02:00
Willy Tarreau
560e474cdd MINOR: mux-h2: split the amount of rx data from the amount to ack
We'll need to keep track of the total amount of data received for the
current stream, and the amount of data to ack for the current stream,
which might soon diverge as soon as we'll have to update the stream's
offset with received data, which are different from those to be ACKed.
One reason is that in case a stream doesn't exist anymore (e.g. aborted
an upload), the rcvd_s info might get lost after updating the stream,
so we do need to have an in-connection counter for that.

What's done here is that the rcvd_s count is transferred to wu_s in
h2c_send_strm_wu(), to be used as the counter to send, and both are
considered as sufficient when non-null to call the function.
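
A minimal sketch of that transfer, assuming a simplified connection structure. The names conn_acct, send_strm_wu() and the can_send argument are hypothetical and only stand in for the real h2c fields and for the availability of room in the mux buffer:

```c
#include <stdint.h>

/* Hypothetical connection-level counters, named after the text above. */
struct conn_acct {
    uint32_t rcvd_s; /* bytes received for the current stream */
    uint32_t wu_s;   /* bytes waiting to be acked by a stream WINDOW_UPDATE */
};

/* Either counter being non-zero is enough to justify a call. */
static int strm_wu_pending(const struct conn_acct *c)
{
    return c->rcvd_s || c->wu_s;
}

/* Pours rcvd_s into wu_s, then tries to send. Returns the increment that
 * was sent, or 0 if nothing was sent. When sending is not possible, wu_s
 * keeps the amount so that it is not lost.
 */
static uint32_t send_strm_wu(struct conn_acct *c, int can_send)
{
    uint32_t inc;

    c->wu_s += c->rcvd_s;
    c->rcvd_s = 0;
    if (!c->wu_s || !can_send)
        return 0;
    inc = c->wu_s;
    c->wu_s = 0;
    return inc;
}
```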
2024-10-12 16:29:15 +02:00
Willy Tarreau
8f09bdce10 MINOR: buffer: add a buffer list type with functions
The buffer ring is problematic in multiple aspects, one of which being
that it is only usable by one entity. With multiplexed protocols, we need
to have shared buffers used by many entities (streams and connection),
and the only way to use the buffer ring model in this case is to have
each entity store its own array, and keep a shared counter on allocated
entries. But even with the default 32 buf and 100 streams per HTTP/2
connection, we're speaking about 32*101*32 bytes = 103424 bytes per H2
connection, just to store up to 32 shared buffers, spread randomly in
these tables. Some users might want to achieve much higher than default
rates over high speed links (e.g. 30-50 MB/s at 100ms), which is 3 to 5
MB storage per connection, hence 180 to 300 buffers. There it starts to
cost a lot, up to 1 MB per connection, just to store buffer indexes.

Instead this patch introduces a variant which we call a buffer list.
That's basically just a free list encoded in an array. Each cell
contains a buffer structure, a next index, and a few flags. The index
could be reduced to 16 bits if needed, in order to make room for a new
struct member. The design permits initializing a whole freelist at once
using memset(0).

The list pointer is stored at a single location (e.g. the connection)
and all users (the streams) will just have indexes referencing their
first and last assigned entries (head and tail). This means that with
a single table we can now have all our buffers shared between multiple
streams, irrelevant to the number of potential streams which would want
to use them. Now the 180 to 300 entries array only costs 7.2 to 12 kB,
or 80 times less.

Two large functions (bl_deinit() & bl_get()) were implemented in buf.c.
A basic doc was added to explain how it works.
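
For readers unfamiliar with the idea, a free list encoded in an array that is valid right after memset(0) could look like the sketch below. This is not the code from buf.c: the cell layout, the bl_sketch_*() names and the convention that a zero "next" means "the cell right after this one" are assumptions chosen to make zero-initialization work:

```c
#include <stdint.h>
#include <string.h>

#define BL_SIZE 8 /* hypothetical size: cell 0 is the head, 7 usable cells */

struct bl_cell {
    uint32_t next;  /* 0 = "the cell right after me", else an absolute index */
    uint32_t flags;
    void *buf;      /* stands in for the buffer structure */
};

/* A zeroed array is a valid free list chaining 1 -> 2 -> ... -> BL_SIZE-1. */
static void bl_sketch_init(struct bl_cell *bl)
{
    memset(bl, 0, BL_SIZE * sizeof(*bl));
}

/* Decodes a stored <next> value for the cell at index <idx>. */
static uint32_t bl_resolve(uint32_t idx, uint32_t next)
{
    return next ? next : idx + 1;
}

/* Returns the index of an allocated cell, or 0 when the list is exhausted. */
static uint32_t bl_sketch_alloc(struct bl_cell *bl)
{
    uint32_t idx = bl_resolve(0, bl[0].next);

    if (idx >= BL_SIZE)
        return 0;
    bl[0].next = bl_resolve(idx, bl[idx].next);
    bl[idx].next = 0;
    return idx;
}

/* Puts cell <idx> back at the head of the free list. */
static void bl_sketch_free(struct bl_cell *bl, uint32_t idx)
{
    bl[idx].next = bl_resolve(0, bl[0].next);
    bl[0].next = idx;
}
```

Users such as streams would then only keep the indexes of their head and tail cells, while the array itself lives in a single place.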
2024-10-12 16:29:15 +02:00
Willy Tarreau
ac66df4e2e REORG: buffers: move some of the heavy functions from buf.h to buf.c
Over time, some of the buffer management functions grew quite a bit,
and were still forced to remain inlined since all defined in buf.h.
Let's create buf.c and move the heaviest ones there. All those moved
here were above 200 bytes.
2024-10-12 16:29:15 +02:00
Willy Tarreau
d288ddb575 CLEANUP: muxes: remove useless inclusion of ebmbtree.h
Since 2.7 with commit 8522348482 ("BUG/MAJOR: conn-idle: fix hash indexing
issues on idle conns"), we've been using eb64 trees and not ebmb trees
anymore, and later we dropped all that to centralize the operations in
the server. Let's remove the ebmbtree.h includes from the muxes that do
not use them.
2024-10-12 16:29:15 +02:00
Willy Tarreau
cf3fe1eed4 MINOR: mux-h2/traces: print the size of the DATA frames
DATA frames produce a special trace with the amount of transferred data
in arg4, but this was not reported by h2_trace(). This commit just adds
it.
2024-10-12 16:29:15 +02:00
Willy Tarreau
af064b497a BUG/MINOR: mux-h2/traces: present the correct buffer for trailers errors traces
The local "rxbuf" buffer was passed to the trace instead of h2s->rxbuf
that is used when decoding trailers. The impact is essentially the
impossibility to present some buffer contents in some rare cases. It
may be backported but it's unlikely that anyone will ever notice the
difference.
2024-10-12 16:29:15 +02:00
Willy Tarreau
0fa654ca92 BUILD: cache: silence an uninitialized warning at -Og with gcc-12.2
Building with gcc-12.2 -Og yields this incorrect warning in cache.c:

  In function 'release_entry_unlocked',
      inlined from 'http_action_store_cache' at src/cache.c:1449:4:
  src/cache.c:330:9: warning: 'object' may be used uninitialized [-Wmaybe-uninitialized]
    330 |         release_entry(cache, entry, 1);
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  src/cache.c: In function 'http_action_store_cache':
  src/cache.c:1200:29: note: 'object' was declared here
   1200 |         struct cache_entry *object, *old;
        |                             ^~~~~~

This is wrong, the only way to reach the function is with first!=NULL
and the gotos that reach there are all those made with first==NULL.
Let's just preset object to NULL to silence it.
2024-10-12 16:28:54 +02:00
William Lallemand
edf85a1d76 MINOR: cfgparse: simulate long configuration parsing with force-cfg-parser-pause
This command pauses the configuration parser for <timeout>
milliseconds. This is useful for development or for testing timeouts of
init scripts, particularly to simulate a very long reload. It requires
the expose-experimental-directives to be set.
2024-10-11 17:40:37 +02:00
Amaury Denoyelle
232083c3e5 BUG/MEDIUM: mux-quic: ensure timeout server is active for short requests
If a small request is received on QUIC MUX frontend, it can be
transmitted directly with the FIN on attach operation. rcv_buf is
skipped by the stream layer. Thus, it is necessary to ensure that there
is similar behavior when FIN is reported either on attach or rcv_buf.

One difference was that se_expect_data() was called only for rcv_buf but
not on attach. The most obvious effect is that the stream timeout was
deactivated for this request : client timeout was disabled on EOI but
server one not armed due to previous se_expect_no_data(). This prevents
the early closure of too long requests.

To fix this, add an invocation of se_expect_data() on attach operation.

This bug can simply be detected using httpterm with delay request (for
example /?t=10000) and using smaller client/server timeouts. The bug is
present if the request is not aborted on timeout but instead continues
until its proper HTTP 200 termination.

This has been introduced by the following commit :
  85eabfbf672c57e4ed082da1b96c95348b331320
  MEDIUM: mux-quic: Don't expect data from server as long as request is unfinished

This must be backported up to 2.8.
2024-10-10 17:20:39 +02:00
Aurelien DARRAGON
7144e60cd2 MINOR: sample: postresolve sink names in debug() converter
debug() converter used to resolve sink names during parsing time. Because
of this, we were unable to specify sink names that were defined after
the debug() converter was placed.

Like in the previous commit, let's implement proper postparsing for the
debug() converter, in order to be able to use sink names that are about
to be defined later in the config file.
2024-10-10 16:55:15 +02:00
Aurelien DARRAGON
ed266589b6 MINOR: trace: postresolve sink names
A previous known limitation about traces was that parsing was performed on
the fly, meaning that when using "sink" keyword, only sinks that were
either internal or previously defined in the config could be used. Indeed,
it was not possible to use a ring section defined AFTER the traces section
when using the 'sink' keyword from traces.

This limitation was also mentioned in the config file.

Let's get rid of that limitation by implementing proper postparsing for
the sink parameter in traces section. To do this, make use of the new
sink_find_early() helper to start referencing sink by their names even
if they don't exist yet (if they are about to be defined later in the
config)

Traces commands on the cli are not concerned by this change.
2024-10-10 16:55:15 +02:00
Aurelien DARRAGON
1bdf6e884a MEDIUM: sink: implement sink_find_early()
sink_find_early() is a convenient function that can be used instead of
sink_find() during parsing time in order to try to find a matching
sink even if the sink is not defined yet.

Indeed, if the sink is not defined, sink_find_early() will try to create
it and mark it as forward-declared. It will also save information from
the caller to better identify it in case of errors.

If the sink happens to be found in the config, it will transition from
forward-declared type to its final type. Else, it means that the sink
was not found in the config, in this case, during postresolve, we raise
an error to indicate that the sink was not found in the configuration.

It should help solve postresolving issue with rings, because for now only
log targets implement proper ring postresolving.. but rings may be used
at different places in the code, such as debug() converter or in "traces"
section.
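
The forward-declaration life cycle described above can be sketched as follows. The sk_*() names, the fixed-size table and the type enum are hypothetical and only mimic the idea, they are not the sink API:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sink types: a sink starts forward-declared when referenced
 * before its definition, then moves to its final type once defined.
 */
enum sk_type { SK_UNUSED = 0, SK_FORWARD, SK_RING };

struct sk {
    char name[32];
    enum sk_type type;
};

#define SK_MAX 8
static struct sk sk_table[SK_MAX];

/* Plain lookup: only returns sinks that are already known. */
static struct sk *sk_find(const char *name)
{
    for (int i = 0; i < SK_MAX; i++)
        if (sk_table[i].type != SK_UNUSED && strcmp(sk_table[i].name, name) == 0)
            return &sk_table[i];
    return NULL;
}

/* Early lookup: creates a forward-declared entry when the sink is unknown. */
static struct sk *sk_find_early(const char *name)
{
    struct sk *s = sk_find(name);

    for (int i = 0; !s && i < SK_MAX; i++) {
        if (sk_table[i].type == SK_UNUSED) {
            s = &sk_table[i];
            snprintf(s->name, sizeof(s->name), "%s", name);
            s->type = SK_FORWARD;
        }
    }
    return s;
}

/* Definition found in the config: the entry takes its final type. */
static struct sk *sk_define(const char *name, enum sk_type type)
{
    struct sk *s = sk_find_early(name);

    if (s)
        s->type = type;
    return s;
}

/* Post-resolution: anything still forward-declared was never defined. */
static int sk_postresolve(void)
{
    int unresolved = 0;

    for (int i = 0; i < SK_MAX; i++) {
        if (sk_table[i].type == SK_FORWARD) {
            fprintf(stderr, "sink '%s' not found in the configuration\n",
                    sk_table[i].name);
            unresolved++;
        }
    }
    return unresolved;
}
```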
2024-10-10 16:55:15 +02:00
Damien Claisse
ba7c03c18e MINOR: ssl: disable server side default CRL check with WolfSSL
Patch 64a77e3ea5 disabled CRL check when no CRL file was provided, but
it only did it on bind side. Add the same fix in server context
initialization side.
This allows enabling peer verification (verify required) on a server
using TLS, without having to provide a CRL file.
2024-10-10 09:31:19 +02:00
Amaury Denoyelle
456c3997b2 BUG/MEDIUM: quic: properly decount out-of-order ACK on stream release
Out-of-order STREAM ACKs are buffered in their related streambuf tree. On
insertion, overlapping or contiguous ranges are merged together. The
total size of buffered ack range is stored in <room> streambuf member
and reported to QUIC MUX layer on streambuf release. The objective is to
ensure QUIC MUX layer can allocate Tx buffers conveniently to preserve a
good transfer throughput.

Streamdesc is the overall container of many streambufs. It may also be
released when its upper QCS instance is freed, after all stream data
have been emitted. In this case, the active streambuf is also released
via custom code. However, in this code path, <room> was not reported to
the QUIC MUX layer.

This bug caused a wrong estimation of the QUIC MUX txbuf window, with
bytes remaining even after all ACKs were received. This may cause transfer
freeze on other connection streams, with RESET_STREAM emission on
timeout client.

To fix this, reuse the existing qc_stream_buf_release() function on
streamdesc release. This ensures that notify_room is correctly used.

No need to backport.
2024-10-09 17:47:16 +02:00
Amaury Denoyelle
f0049d0748 BUG/MINOR: quic: fix discarding of already stored out-of-order ACK
To properly decount out-of-order acked data range, contiguous or
overlapping ranges are first merged before their insertion in a tree.

The first step ensures that a newly reported range is not completely
covered by the existing tree ranges. However, one of the conditions was
incorrect. Fix this to ensure that the final range tree does not contain
duplicated entries.

The impact of this bug is unknown. However, it may have allowed the
insertion of overlapping ranges, which could in turn cause an error in
QUIC MUX txbuf window, with a possible transfer freeze.

No need to backport.
2024-10-09 17:32:30 +02:00
Aurelien DARRAGON
f88f162868 BUG/MEDIUM: hlua: properly handle sample func errors in hlua_run_sample_{fetch,conv}()
To execute sample fetches and converters from Lua, the hlua API leverages the
sample API. Prior to executing the sample func, the arg checker is called
from hlua_run_sample_{fetch,conv}() to detect potential errors.

However, hlua_run_sample_{fetch,conv}() both pass NULL as <err> argument,
but it is wrong for two reasons. First we miss an opportunity to report
precise error messages to help the user know what went wrong during the
check.. and more importantly, some val check functions consider that the
<err> pointer is never NULL. This is the case for example with
check_crypto_hmac(). Because of this, when such val check functions
encounter an error, they will crash the process because they will try
to de-reference NULL.

This bug was discovered and reported by GH user @JB0925 on #2745.

Perhaps val check functions should make sure that the provided <err>
pointer is != NULL prior to de-referencing it. But since there are
multiple occurrences found in the code and the API isn't clear about that,
it is easier to fix the hlua part (caller) for now.

To fix the issue, let's always provide a valid <err> pointer when
leveraging val_arg() check function pointer, and make use of it in case
of error to report a relevant message to the user before freeing it.

It should be backported to all stable versions.
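
The caller-side pattern can be illustrated with the sketch below. check_key_arg() is a hypothetical check function that, like the ones mentioned above, assumes <err> is never NULL, and run_with_check() is a hypothetical caller. Neither is taken from hlua.c:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical check function that writes to *err without testing err,
 * so passing NULL here would dereference a NULL pointer on error.
 */
static int check_key_arg(const char *key, char **err)
{
    static const char msg[] = "key is too short";

    if (strlen(key) >= 4)
        return 1;
    *err = malloc(sizeof(msg));
    if (*err)
        memcpy(*err, msg, sizeof(msg));
    return 0;
}

/* Caller side: always pass a valid pointer, report the message on error,
 * then free it. Returns 1 on success, 0 on error with <out> filled.
 */
static int run_with_check(const char *key, char *out, size_t size)
{
    char *err = NULL;

    if (!check_key_arg(key, &err)) {
        snprintf(out, size, "invalid args: %s", err ? err : "unknown error");
        free(err);
        return 0;
    }
    return 1;
}
```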
2024-10-08 12:00:42 +02:00
Aurelien DARRAGON
d0e0105181 BUG/MEDIUM: hlua: make hlua_ctx_renew() safe
hlua_ctx_renew() is called from unsafe places where the caller doesn't
expect it to LJMP.. however hlua_ctx_renew() makes use of Lua library
function that could potentially raise errors, such as lua_newthread(),
and it does nothing to catch errors. Because of this, haproxy could
unexpectedly crash. This was discovered and reported by GH user
@JB0925 on #2745.

To fix the issue, let's simply make hlua_ctx_renew() safe by applying
the same logic implemented for hlua_ctx_init() or hlua_ctx_destroy(),
which is catching Lua errors by leveraging SET_SAFE_LJMP_PARENT() helper.

It should be backported to all stable versions.
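
The general idea of catching a non-local jump before it escapes to an unprepared caller can be shown in plain C with setjmp()/longjmp(). This sketch uses neither the Lua API nor the SET_SAFE_LJMP_PARENT() helper, and all names in it are hypothetical:

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf *safe_env; /* current recovery point, NULL when none is set */

/* Stands in for a library error: jumps to the recovery point if there is
 * one, otherwise the process dies, which is what an unsafe caller risks.
 */
static void raise_error(void)
{
    if (!safe_env)
        abort();
    longjmp(*safe_env, 1);
}

/* Stands in for a library call that may raise an error. */
static int risky_new_thread(int fail)
{
    if (fail)
        raise_error();
    return 1;
}

/* Safe wrapper: installs a recovery point around the risky call and turns
 * a raised error into a plain return value. Returns 1 on success, 0 on error.
 */
static int renew_safe(int fail)
{
    jmp_buf env;
    jmp_buf *prev = safe_env;
    int ret;

    safe_env = &env;
    if (setjmp(env) != 0) {
        /* we land here when raise_error() jumped back */
        safe_env = prev;
        return 0;
    }
    ret = risky_new_thread(fail);
    safe_env = prev;
    return ret;
}
```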
2024-10-08 12:00:36 +02:00
Aurelien DARRAGON
3f4a788329 REGTESTS: add some tests for 'do-log' action
Now that 'do-log' action may be used for all existing action contexts,
let's add some tests in reg-tests/log/log_profile.vtc to ensure it works
as expected. quic-init is not tested as it may not be builtin depending on
build options..
2024-10-04 21:38:19 +02:00
Aurelien DARRAGON
3ba924a4da MINOR: action: add do-log action
Thanks to the two previous commits, we can now expose the do-log action
on all available action contexts, including the new quic-init context.

Each context is responsible for exposing the do-log action by registering
the relevant log steps, saving the identifier, and then storing it in the
rule's context so that do_log_action() automatically uses it to produce
the log during runtime.

To use the feature, it is simply needed to use "do-log" (without argument)
on an action directive, example:

   tcp-request connection do-log

As mentioned before, each context where the action is exposed has its own
log step identifier. Currently known identifiers are:

  quic-initial:           quic-init
  tcp-request connection: tcp-req-conn
  tcp-request session:    tcp-req-sess
  tcp-request content:    tcp-req-cont
  tcp-response content:   tcp-res-cont
  http-request:           http-req
  http-response:          http-res
  http-after-response:    http-after-res

Thus, these "additional" logging steps can be used as-is under log-profile
section (after "on" keyword). However, although the parser will accept
them, it makes no sense to use them with the "log-steps" proxy keyword,
since the only path for these origins to trigger a log generation is
through the explicit use of "do-log" action.

This need was described in GH #401, it should help to conditionally
trigger logs using ACL at specific key points.. and may either be used
alone or combined with "log-steps" to add additional log "trackers" during
transaction handling.

Documentation was updated and some examples were added.
2024-10-04 21:38:14 +02:00
Aurelien DARRAGON
0e271f1d2a MINOR: log: add do_log_parse_act() helper func
Function may be used from places where per-context actions are usually
registered (tcp_act.c, http_act.c, quic_rules.c.. to name a few) in
order to expose the do_log() action.
2024-10-04 21:38:08 +02:00
Aurelien DARRAGON
e63c7da508 MINOR: log: add do_log() logging helper
do_log() is quite similar to sess_log() or strm_log(), except that it
may be called at any time during session handling in an opportunistic
way as long as the session exists (the stream may or may not exist).

Also, it will try to emit the log as INFO by default, unless set-log-level
is used on the stream, or error origin flag is set.
2024-10-04 21:38:02 +02:00
Amaury Denoyelle
f6599cf5a6 MEDIUM: quic: decount out-of-order ACK data range for MUX txbuf window
This commit is the last one of a series whose objective is to restore
QUIC transfer throughput performance to the state prior to the recent
QUIC MUX buffer allocator rework.

This gain is obtained by reporting received out-of-order ACK data range
to the QUIC MUX which can then decount room in its txbuf window. This is
implemented in the QUIC streamdesc layer by adding a new invocation of the
notify_room callback. This is done in qc_stream_buf_store_ack(), which
handles out-of-order ACK data ranges.

Previous commit has introduced merging of overlapping ACK data range. As
such, it's easy to only report the newly acknowledged data range.

As with in-order ACKs, this new notification is only performed on
released streambuf. As such, when a streambuf instance is released,
notify_room notification now also reports the total length of
out-of-order ACK data range currently stored. This value is stored in a
new streambuf member <room> to avoid unnecessary tree lookup.

This <room> member is also used on in-order ACK notification to reduce
the notified room. This prevents reporting invalid values when overlapping
ranges are treated first out-of-order and then in-order, which would
cause an invalid QUIC MUX txbuf window value.

After this change has been implemented, performance has been
significantly improved, both with ngtcp2-client rate usage and on
interop goodput test. These values are now similar to the rate observed
on older haproxy version before QUIC MUX buffer allocator rework.
2024-10-04 18:09:51 +02:00
Amaury Denoyelle
ae3e768d32 MEDIUM: quic: merge contiguous/overlapping buffered ack stream range
Transfer throughput has deteriorated since the recent rework of the QUIC MUX
txbuf allocator. This was partially restored by the commit that
decounts individual in-order ACKs from the MUX buffer window.

To fully retrieve the old performance level, all ACKs must be decounted
when handled by the QUIC streamdesc layer, even out-of-order ranges.
However, this is not easily implemented as several ranges may exist in
parallel with overlap on the underlying data. It would cause
miscalculation for QUIC MUX buffer window if such ranges were blindly
reported.

The proper solution is to first implement merge of contiguous or
overlapping ACK data ranges to reduce the number of stored ranges to the
minimal. This is the purpose of this patch. This is implemented in a new
static function named qc_stream_buf_store_ack() into streamdesc layer.

The merge algorithm is simple enough. First, it ensures the newly added
range is not already fully covered by a preexisting entry. Then, it
checks if there is contiguity/overlap with one or several ranges
starting at the same or a greater offset. If true, the newly added entry
is extended to cover them all, and all contiguous/overlapped ranges are
removed. Finally, if there is contiguity or overlap with an entry
starting at a smaller offset, no new range is instantiated and instead
the smaller offset is extended.

Now that contiguous or overlapped ranges cannot exist anymore, ACK data
ranges tree instantiation can use EB_ROOT_UNIQUE.

Outside of the longer term objective which is to decount out-of-order
ACKs from MUX txbuf window, this commit could also improve some
performance and/or memory usage for connections where stream data
fragmentation and packet reordering is high.
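
The merge logic can be sketched on a small sorted array instead of an ebtree. This is a simplified stand-in and not the code of qc_stream_buf_store_ack(): the ack_set type, the fixed capacity and the <newly> output are assumptions made for this sketch. It also reports how many bytes were not covered before:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RANGES 16 /* hypothetical fixed capacity */

struct ack_range { uint64_t off; uint64_t len; };

/* Ranges are kept sorted by offset, never overlapping nor contiguous. */
struct ack_set { struct ack_range r[MAX_RANGES]; size_t nb; };

/* Stores [off, off+len) and merges it with every contiguous or overlapping
 * range. <newly> receives the number of bytes that were not covered before.
 * Returns 0 on success, -1 when a new entry is needed but the set is full.
 */
static int ack_set_store(struct ack_set *s, uint64_t off, uint64_t len,
                         uint64_t *newly)
{
    uint64_t start = off, end = off + len, covered = 0;
    size_t i = 0, j;

    /* skip ranges ending strictly before the new one starts */
    while (i < s->nb && s->r[i].off + s->r[i].len < start)
        i++;

    /* absorb every range touching or overlapping [start, end) */
    for (j = i; j < s->nb && s->r[j].off <= end; j++) {
        if (s->r[j].off < start)
            start = s->r[j].off;
        if (s->r[j].off + s->r[j].len > end)
            end = s->r[j].off + s->r[j].len;
        covered += s->r[j].len;
    }

    if (j == i && s->nb == MAX_RANGES)
        return -1;

    /* replace r[i..j) with the single merged range */
    memmove(&s->r[i + 1], &s->r[j], (s->nb - j) * sizeof(s->r[0]));
    s->nb = s->nb - (j - i) + 1;
    s->r[i].off = start;
    s->r[i].len = end - start;
    *newly = (end - start) - covered;
    return 0;
}
```

A range that is already fully covered simply yields zero new bytes here, whereas the text above describes rejecting it in a first dedicated step.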
2024-10-04 18:07:52 +02:00
Amaury Denoyelle
e7578084b0 MINOR: quic: implement dedicated type for out-of-order stream ACK
The QUIC streamdesc layer is responsible for handling reception of ACKs for
streams. It removes stream data from the underlying buffers on ACK
reception.

Streamdesc layer treats ACK in order at the stream level. Out of order
ACKs are buffered in a tree until they can be handled on older data
acknowledgement reception. Previously, qf_stream instance which comes
from the quic_tx_packet was used as tree node to buffer such ranges.

Introduce a new type dedicated to represent out of order stream ack data
range. This type is named qc_stream_ack. It contains minimal infos only
relative to the acknowledged stream data range.

This reduces the size of the frequently used quic_frame thanks to the
removal of the tree node from qf_stream. Another side effect of this change
is that quic_frame instances are now always released immediately on ACK
reception, both in-order and out-of-order. This also allows releasing the
quic_tx_packet instance, which should reduce memory consumption.

The drawback of this change is that a qc_stream_ack instance must be
allocated on out-of-order ACK reception. As such, qc_stream_desc_ack()
may fail if an error happens on allocation. For the moment, such an error
is silently recovered up to qc_treat_rx_pkts() by dropping the
received packet containing the ACK frame. In the future, it may be
useful to close the connection as this error may only happen when
memory is low.
2024-10-04 17:56:45 +02:00
Amaury Denoyelle
4ff87db5fe MEDIUM: quic: decount acknowledged data for MUX txbuf window
Recently, a new allocation mechanism was implemented for Tx buffers used
by QUIC MUX. Now, underlying congestion window size is used to determine
if it is still possible or not to allocate a new buffer when necessary.

This mechanism has rendered the QUIC stack more flexible. However, it has
also brought some performance degradation, with longer transfer times in
certain environments. It was first discovered in the measurement results
of the interop. It can also easily be reproduced using the following
ngtcp2-client example which forces a very small congestion window due to
frequent loss :

 $ ngtcp2-client -q --no-quic-dump --no-http-dump --exit-on-all-streams-close -r 0.1 127.0.0.1 20443 "https://[::]:20443/?s=10m"

This performance decrease is caused by the allocator which is now too
strict. It may cause buffer underrun frequently at the MUX layer when
the congestion window is too small, as new buffers cannot be allocated
until the current one is fully acknowledged. This results in transfers
with very bad throughput utilisation. The objective of this new series of
patches is to relax some restrictions to permit QUIC MUX to allocate new
buffers more quickly, while preserving the initial limitation based on
congestion window size.

An interesting method for this is to notify QUIC MUX about newly
available room on individual ACK reception, without waiting for the full
buffer acknowledgement. This is easily implemented by adding a new
notify_room invocation in the QUIC streamdesc layer on ACK reception.
However, ACK receptions are handled in-order at the stream level. Out of
order ACKs are buffered and are not decounted for now. This will be
implemented in a future commit.

Note that for a single buffer instance, data can in parallel be written
by QUIC MUX and removed on ACK reception. This could cause room
notification to the QUIC MUX layer to report invalid values. As such, ACK
receptions are only accounted for released buffers. This ensures that
such buffers won't receive any new data. At the same time, buffer room
is notified on release operation as it does not need acknowledgement.

This commit has improved performance for the ngtcp2-client
scenario above. However, it is not yet sufficient for the interop
goodput test.
2024-10-04 17:31:26 +02:00
Amaury Denoyelle
324a49ed4d MINOR: quic: strengthen qc_release_frm()
quic_frame is the type used to represent frames emitted in a QUIC Tx
packet. Each frame is attached to a packet, and can also be linked to
other frames from the same packet, or duplicated frames for
retransmission. As such, quic_frame free operation is a tedious process.

qc_release_frm() has been implemented to ensure quic_frame is always
properly freed after detaching from all its list attach points. One
particular point is to ensure that when a frame is released, the frame
origin and all origin copies, including the current <frm> are flagged as
acked and detached from the reflist. Add a BUG_ON() to ensure this loop
is properly conducted when dealing with the current <frm> instance.
2024-10-04 16:00:05 +02:00
Christopher Faulet
131b877565 BUG/MINOR: stats: Fix the name for the total number of streams created
Because of a copy/paste error, CurrStreams was reused by mistake. It should
be "CumStreams"

No backports needed.
2024-10-04 15:44:40 +02:00
Amaury Denoyelle
c1d714156e BUG/MAJOR: mux-quic: do not crash on empty STREAM frame emission
Most of the time STREAM frames emitted by QUIC MUX have some data in it.
However, it is possible to use an empty frame when a delayed FIN must be
transferred.

Recently, QUIC MUX send callback notification has been refactored. Now,
this callback is blindly called by quic_conn lower layer each time a
STREAM frame is built into a new Tx packet. QUIC MUX is responsible for
ensuring that the notified frame corresponds to newly emitted data or
retransmission. Offsets are used for this comparison, but this requires
special care for empty FIN frames.

Sadly, the comparison written to determine if an empty FIN frame was
sent for the first time or retransmitted is not correct. This caused
such frame to always be dismissed as retransmission in QUIC MUX sent
callback. This prevented the related QCS instance to be removed from the
send_list, causing qcc_io_send() to retry a new emission. This was
finally interrupted by the BUG_ON() assertion to prevent an infinite
loop.

Fix this crash by updating the condition in QUIC MUX send callback. For
empty STREAM frame, it is sufficient to check if QC_SF_FIN_STREAM was
already removed or not to detect a retransmission. Indeed, empty STREAM
frames are never used outside of delayed FIN reporting.

No need to backport. This crash was introduced in the current dev branch
by the following commit.
  d7f4e5abf0b7129329d0ea716c104474fd934bc6
  MEDIUM: quic: strengthen MUX send notification
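
A simplified sketch of the distinction made by the fix: offsets identify new data for non-empty frames, while the pending-FIN flag is what identifies a first emission for empty ones. SF_FIN_STREAM, strm_tx and strm_on_sent() are hypothetical names and do not reflect the real qcs layout:

```c
#include <stdint.h>

#define SF_FIN_STREAM 0x1u /* hypothetical flag: a FIN remains to be sent */

struct strm_tx {
    unsigned int flags;
    uint64_t sent_ofs; /* highest offset already notified as sent */
};

/* Called for each STREAM frame built into a packet. Returns 1 when the
 * frame carries something new, 0 when it is only a retransmission.
 */
static int strm_on_sent(struct strm_tx *s, uint64_t off, uint64_t len, int fin)
{
    if (len == 0) {
        /* Empty frames are only used for a delayed FIN: offsets cannot
         * tell a first emission from a retransmission, the flag can.
         */
        if (!(s->flags & SF_FIN_STREAM))
            return 0;
        s->flags &= ~SF_FIN_STREAM;
        return 1;
    }

    if (off + len <= s->sent_ofs)
        return 0; /* everything was already notified */

    s->sent_ofs = off + len;
    if (fin)
        s->flags &= ~SF_FIN_STREAM;
    return 1;
}
```

In this sketch, applying the offset test to an empty frame would always classify it as a retransmission, since its end offset can never exceed sent_ofs.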
2024-10-04 11:31:11 +02:00
Willy Tarreau
7cdc9325a1 [RELEASE] Released version 3.1-dev9
Released version 3.1-dev9 with the following main changes :
    - MINOR: tools: add minimal file name management
    - CLEANUP: stick-table: make the file location point to a global file name
    - MINOR: proxy: use the global file names for conf->file
    - CLEANUP: cfgparse: factor proxy vs log-forward collisions
    - BUG/MINOR: cfgparse: detect another uncaught case of duplicate defaults
    - MINOR: proxy: add a list of orphaned defaults sections
    - MEDIUM: cfgparse: drop duplicate named defaults sections after use
    - OPTIM: cfgparse: speed up duplicate server detection
    - MEDIUM: cfgparse: warn about deprecated use of duplicate server names
    - BUG/MINOR: server: shut down streams under thread isolation
    - BUG/MINOR: proxy: also make the cli and resolvers use the global name
    - REGTESTS: log: fix log-profile.vtc
    - MEDIUM: mailers: warn about deprecated legacy mailers
    - BUG/MEDIUM: cli: Be sure to catch immediate client abort
    - DEV: flags/applet: decode appctx flags
    - BUG/MEDIUM: cli: Deadlock when setting frontend maxconn
    - MINOR: log: fix indent in strm_log()
    - MINOR: log: introduce extra log profile steps
    - MINOR: log: handle extra log origins in _process_send_log_override()
    - MINOR: log: introduce log_orig flags
    - MINOR: log: explicitly handle extra log origins as error when relevant
    - MINOR: log: support extra log origins for '%OG' alias
    - MINOR: proxy: add log_steps struct member
    - MINOR: log: introduce "log-steps" proxy keyword
    - MINOR: log: add log_orig_proxy() helper function
    - MEDIUM: log: consider log-steps proxy setting for existing log origins
    - DOC: config: document proxy "log-steps" keyword
    - REGTESTS: add a test for proxy "log-steps"
    - Revert "BUG/MINOR: server: shut down streams under thread isolation"
    - MINOR: task: define two new one-shot events for use with WOKEN_OTHER or MSG
    - BUG/MEDIUM: stream: make stream_shutdown() async-safe
    - BUG/MINOR: server: make sure the HMAINT state is part of MAINT
    - BUG/MINOR: queue: make sure that maintenance redispatches server queue
    - MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()
    - BUILD: tools: only include execinfo.h for the real backtrace() function
    - MINOR: tools: do not attempt to use backtrace() on linux without glibc
    - OPTIM: channel: speed up co_getline()'s search of the end of line
    - OPTIM: stconn: Don't pretend mux have more data to deliver on EOI/EOS/ERROR
    - BUG/MINOR: mcli: Pretend the mux have more data to deliver between two commands
    - MINOR: action: Export release_expr_int_action() release function
    - MINOR: stream: Rely on a per-stream max connection retries value
    - MINOR: stream: Support dynamic changes of the number of connection retries
    - MINOR: stream/stats: Expose the current number of streams in stats
    - MINOR: stream/stats: Expose the total number of streams ever created in stats
    - BUG/MINOR: cfgparse-global: fix allowed args number for setenv
    - MINOR: cfgparse-global: add dedicated parser for *env keywords
    - MINOR: mux-quic: complete Tx infos for QCS dump
    - MINOR: quic: ensure txbuf realloc is only performed on empty buffer
    - MINOR: mux-quic: strengthen qcs_send_metadata() usage
    - MINOR: quic: remove unneeded notification of txbuf room
    - MINOR: quic: refactor MUX send notification
    - MEDIUM: quic: strengthen MUX send notification
    - MINOR: quic: refactor STREAM room notification
    - MINOR: quic: do not remove qc_stream_desc automatically on ACK handling
    - MINOR: quic: store streambuf in a streamdesc tree
    - MINOR: quic: move buffered ACK to streambuf
    - MEDIUM: quic: handle out-of-order ACK at streamdesc layer
    - MEDIUM: quic: refactor buffered STREAM ACK consuming
    - BUG/MEDIUM: queue: always dequeue the backend when redistributing the last server
    - MINOR: config/trace: Add a 'traces' section to declare debug traces
    - MINOR: trace: Be able to chain commands for a source in one line
    - MINOR: tcpcheck: Add support for an option host header value for httpchk option
    - BUG/MINOR: mux-h1: Fix condition to set EOI on SE during zero-copy forwarding
    - MINOR: mux-h1: Use a dedicated function to conditionnaly set EOI flag on SE
    - BUG/MINOR: http-ana: Disable fast-fwd for unfinished req waiting for upgrade
    - BUG/MINOR: mux-quic: fix crash on qcc_init() early return
    - BUG/MINOR: quic: fix trace on releasing STREAM frame after ack
2024-10-03 17:47:33 +02:00
Amaury Denoyelle
b74df9fbc9 BUG/MINOR: quic: fix trace on releasing STREAM frame after ack
Fix NULL argument passed to qc_release_frm(). This gives more context
to the traces inside it. Note that no crash occurred as QUIC traces
always check the validity of their first arg before dereferencing it.

No backport needed.
2024-10-02 17:10:51 +02:00
Amaury Denoyelle
58b7a72d07 BUG/MINOR: mux-quic: fix crash on qcc_init() early return
qcc_release() may be used in case qcc_init() cannot complete. In this
case, connection instance is NULL. As such, it cannot be dereferenced
without testing it first.

This should fix github coverity report #2739.

No backport needed.
2024-10-02 17:06:31 +02:00
Christopher Faulet
cea1379cf1 BUG/MINOR: http-ana: Disable fast-fwd for unfinished req waiting for upgrade
If a request is waiting for a protocol upgrade but it is not finished, the
data fast-forwarding is disabled. Otherwise, the request analyzers will miss
the end of the message.

This case is possible since the commit 01fb1a54 ("BUG/MEDIUM: mux-h1/mux-h2:
Reject upgrades with payload on H2 side only"). Indeed, before, a protocol
upgrade was not allowed for request with payload. But it is now possible and
this comes with a side-effect. It is not really satisfying but for now there
is no other way to sync the muxes and the applicative stream. It seems to be
a reasonable fix for now, pending a deeper refactoring.

This patch must be backported with the commit above.
2024-10-02 10:31:40 +02:00
Christopher Faulet
267ba1d889 MINOR: mux-h1: Use a dedicated function to conditionnaly set EOI flag on SE
The same conditions are evaluated in h1_process_demux() and h1_fastfwd() to
know if SE_FL_EOI flag must be set or not on the sedesc. So now, a dedicated
function is used.
2024-10-02 10:22:51 +02:00
Christopher Faulet
6b39e245e1 BUG/MINOR: mux-h1: Fix condition to set EOI on SE during zero-copy forwarding
During zero-copy data forwarding, the producer must set the EOI flag on the SE
when end of the message is reached. It is already done but there is a case where
this flag is set while it should not. When a request wants to perform a protocol
upgrade and it is waiting for the server response, the flag must not be set
because the HTTP message is finished but some data are possibly still expected,
depending on the server response. On a 101-switching-protocol, more data will be
sent because the producer is switched to TUNNEL state.

So, now, the right condition is used. In DONE state, SE_FL_EOI flag is set on the sedesc iff:

  - it is the response
  - it is the request and the response is also in DONE state
  - it is a request but not a protocol upgrade nor a CONNECT
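
The three cases above can be sketched as a small predicate. This is only
an illustration: the structure and its field names are hypothetical and
do not match the mux-h1 code, which works on the H1 stream state and
flags.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one H1 message in DONE state. */
struct toy_h1_msg {
    bool is_response;        /* true for the response, false for the request */
    bool peer_done;          /* for a request: the response is in DONE state too */
    bool upgrade_or_connect; /* request is a protocol upgrade or a CONNECT */
};

/* Returns true when the EOI flag may be set on the sedesc. */
static bool toy_may_set_eoi(const struct toy_h1_msg *m)
{
    if (m->is_response)
        return true;               /* it is the response */
    if (m->peer_done)
        return true;               /* request, and the response is DONE too */
    return !m->upgrade_or_connect; /* request, neither upgrade nor CONNECT */
}
```

With this sketch, an upgrade request still waiting for the server
response does not get EOI, while the same request gets it once the
response is done as well.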

This patch must be backported as far as 2.9.
2024-10-02 10:22:51 +02:00
Christopher Faulet
27ee292731 MINOR: tcpcheck: Add support for an option host header value for httpchk option
Support for headers and body hidden in the version for the "option httpchk"
directive was removed. However a Host header is mandatory for HTTP/1.1
requests and some servers may return an error if it is not set. For now, to
add it, an "http-check send" rule must be added. But it is not really handy
to use an extra config line for this purpose.

So now, it is possible to set the host header value, a log-format string, as
extra argument to "option httpchk" directive. It must be the fourth argument:

  option httpchk GET / HTTP/1.1 www.srv.com

While this patch is not a bug fix, it is simple enough to be backported if
necessary. On 2.9 and older, lf_init_expr() does not exist and LIST_INIT() must
be used instead.
2024-10-02 10:22:51 +02:00
Christopher Faulet
c39c351a73 MINOR: trace: Be able to chain commands for a source in one line
In the configuration file or on the CLI, configuring traces for a specific
source is a bit painful because this must be done in several lines. Thanks
to this patch, it is now possible to fully configure traces for a source in
one line. For instance, the following on the CLI:

  trace h1 sink stderr; trace h1 level developer; trace h1 verbosity complete; trace h1 start now

can now be replaced by:

  trace h1 sink stderr level developer verbosity complete start now

The same is true for the 'trace' directives in the configuration file.
2024-10-02 10:22:51 +02:00
Christopher Faulet
15a520d474 MINOR: config/trace: Add a 'traces' section to declare debug traces
It is no longer supported to declare debug traces, via 'trace' directive, in
a global section. A 'traces' section must be used instead. The syntax of
the 'trace' directive in these sections remains the same. But it is no
longer experimental.

The main reason for this change is to avoid having a ring section defined
before a global one. Indeed, for now, forward declarations of ring sections
are not supported. So to configure traces, you had to add a ring section
before the global one defining the traces. Most of the time, that meant
having two global sections:

  global
    [...] # global settings

  ring <name>
    [...]

  global
    [...] # trace config

In addition, it will be possible to easily extend the traces section by
adding some new directives.
2024-10-02 10:22:51 +02:00
Willy Tarreau
53f52e67a0 BUG/MEDIUM: queue: always dequeue the backend when redistributing the last server
An interesting bug was revealed by commit 5541d4995d ("BUG/MEDIUM: queue:
deal with a rare TOCTOU in assign_server_and_queue()"). When shutting
down a server to redistribute its connections, no check is made on the
backend's queue. If we're turning off the last server and the backend
has pending connections, these ones will wait there till the queue
timeout. But worse, since the commit above, we can enter an endless loop
in the following situation:

  - streams are present in the backend's queue
  - streams are purged on the last server via srv_shutdown_streams()
  - that one calls pendconn_redistribute(srv) which does not purge
    the backend's pendconns
  - a stream performs some load balancing and enters assign_server_and_queue()
  - assign_server() is called in turn
  - the LB algo is non-deterministic and there are entries in the
    backend's queue. The function notices it and returns SRV_STATUS_FULL
  - assign_server_and_queue() calls pendconn_add() to add the connection
    to the backend's queue
  - on return, pendconn_must_try_again() is called, it figures there's
    no stream served anymore on the server nor the proxy, so it removes
    the pendconn from the queue and returns 1
  - assign_server_and_queue() loops back to the beginning to try again,
    while the conditions have not changed, resulting in an endless loop.

Ideally a change count should be used in the queues so that it's possible
to detect that some dequeuing happened and/or that a last stream has left.
But that wouldn't completely solve the problem that is that we must never
ever add to a queue when there's no server streams to dequeue the new
entries.

The current solution consists in making pendconn_redistribute() take care
of the proxy after the server in case there's no more server available on
the proxy. It at least ensures that no pending streams are left in the
backend's queue when shutting streams down or when the last server goes
down. The try_again loop remains necessary to deal with inevitable races
during pendconn additions. It could be limited to a few rounds, though,
but it should never trigger if the conditions are sufficient to permit
it to converge.
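
As a rough illustration of the approach described above, here is a toy
model of the redistribution. All names and fields are hypothetical and
much simpler than the real queue code: queues are reduced to counters
and no locking is shown.

```c
/* Toy model: names and fields are hypothetical, not HAProxy's structures. */
struct toy_proxy {
    int usable_servers; /* servers still able to take traffic */
    int queued;         /* streams waiting in the backend's queue */
};

struct toy_server {
    struct toy_proxy *px;
    int queued;         /* streams waiting in this server's queue */
};

/* Flush the server's queue, then the backend's queue as well when no
 * usable server is left on the proxy. Returns the number of streams
 * released from the queues.
 */
static int toy_redistribute(struct toy_server *srv)
{
    int released = srv->queued;

    srv->queued = 0;
    if (srv->px->usable_servers == 0) {
        released += srv->px->queued;
        srv->px->queued = 0;
    }
    return released;
}
```

The point of the sketch is only the second step: when the last server
goes away, nothing is left behind in the backend's queue.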

One way to reproduce the issue is to run a config with a single server
with maxconn 1 and plenty of threads, then run in loops series of:

 "disable server px/s;shutdown sessions server px/s;
  wait 100ms server-removable px/s; show servers conn px;
  enable server px/s"

on the CLI at ~10/s while injecting with around 40 concurrent conns at
40-100k RPS. In this case in 10s - 1mn the crash can appear with a
backtrace like this one for at least 1 thread:

  #0  pendconn_add (strm=strm@entry=0x17f2ce0) at src/queue.c:487
  #1  0x000000000064797d in assign_server_and_queue (s=s@entry=0x17f2ce0) at src/backend.c:1064
  #2  0x000000000064a928 in srv_redispatch_connect (s=s@entry=0x17f2ce0) at src/backend.c:1962
  #3  0x000000000064ac54 in back_handle_st_req (s=s@entry=0x17f2ce0) at src/backend.c:2287
  #4  0x00000000005ae1d5 in process_stream (t=t@entry=0x17f4ab0, context=0x17f2ce0, state=<optimized out>) at src/stream.c:2336

It's worth noting that other threads may often appear waiting after the
poller and one in server_atomic_sync() waiting for isolation, because
the event that is processed when shutting the server down is consumed
under isolation, and having less threads available to dequeue remaining
requests increases the probability to trigger the problem, though it is
not at all necessary (some less common traces never show them).

This should carefully be backported wherever the commit above was
backported.
2024-10-01 18:57:51 +02:00
Amaury Denoyelle
8d68717a41 MEDIUM: quic: refactor buffered STREAM ACK consuming
For the moment, streamdesc layer can only deal with in-order ACK at the
stream level. Received out-of-order ACKs are buffered in a tree attached
to a streambuf instance.

Previously, caller of qc_stream_desc_ack() was responsible to implement
consumption of these buffered ACKs. Refactor this by implementing it
directly at the streamdesc layer within qc_stream_desc_ack(). This
simplifies quic_rx ACK handling and ensures buffered ACKs are consumed as
soon as possible.
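
The idea of applying in-order ACKs and consuming buffered ones from the
same entry point can be sketched as follows. This is a simplified model
under stated assumptions: the names are hypothetical, and a small fixed
array stands in for the tree used by the streamdesc layer.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PENDING 8

struct toy_range {
    uint64_t off;
    uint64_t len;
};

struct toy_stream {
    uint64_t ack_offset;                       /* all data below is acked */
    struct toy_range pending[TOY_MAX_PENDING]; /* buffered out-of-order ACKs */
    size_t nb_pending;
};

/* Advance ack_offset if the range starts at or below it. Returns 1 when
 * the range was usable (possibly a duplicate), 0 when it is out of order.
 */
static int toy_apply(struct toy_stream *s, uint64_t off, uint64_t len)
{
    if (off > s->ack_offset)
        return 0;
    if (off + len > s->ack_offset)
        s->ack_offset = off + len;
    return 1;
}

/* Handle one ACK: apply it if in order, otherwise buffer it. After an
 * in-order ACK, consume every buffered range that became applicable.
 * Returns 0 on success, -1 if the pending array is full.
 */
static int toy_stream_ack(struct toy_stream *s, uint64_t off, uint64_t len)
{
    size_t i;

    if (!toy_apply(s, off, len)) {
        if (s->nb_pending == TOY_MAX_PENDING)
            return -1;
        s->pending[s->nb_pending++] = (struct toy_range){ off, len };
        return 0;
    }

    for (i = 0; i < s->nb_pending; ) {
        if (toy_apply(s, s->pending[i].off, s->pending[i].len)) {
            /* drop the consumed entry and rescan from the start */
            s->pending[i] = s->pending[--s->nb_pending];
            i = 0;
        } else {
            i++;
        }
    }
    return 0;
}
```

The caller only ever calls toy_stream_ack(); it no longer needs to know
whether buffered ranges exist.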
2024-10-01 16:22:23 +02:00
Amaury Denoyelle
cc4384aeb7 MEDIUM: quic: handle out-of-order ACK at streamdesc layer
qc_stream_desc_ack() is the entrypoint for streamdesc layer to handle a
new acknowledgement of previously emitted STREAM data.

Previously, it was only able to deal with in-order ACK offset. The
caller was responsible to buffer out-of-order ACKs. Change this by
dealing with the latter case directly in qc_stream_desc_ack(). This
notably simplifies ACK handling in quic_rx module.
2024-10-01 16:22:20 +02:00
Amaury Denoyelle
62558a9285 MINOR: quic: move buffered ACK to streambuf
QUIC streamdesc layer is used to manage QUIC MUX stream txbuf data
storage until acknowledgment. Currently, it only supports in-order
acknowledgment at the stream level. This requires to be able to buffer
out-of-order ACKs until they can be handled.

Previously, these ACKs were stored in a tree attached to the streamdesc
instance. Move this indexed storage to the streambuf instance.

This commit is purely an architecture change. However, it will allow to
extend ACK management in future patches, such as the ability to merge
overlapping out-of-order ACKs.
2024-10-01 16:19:42 +02:00
Amaury Denoyelle
943e48dadd MINOR: quic: store streambuf in a streamdesc tree
qc_stream_desc layer is used by QUIC MUX to store emitted STREAM data
until their acknowledgement. Each stream with Tx capability can allocate
its own qc_stream_desc. In turn, each stream desc can have one or
multiple data buffers. This is useful when a MUX stream releases a
buffer and allocates a new one, to preserve bandwidth without waiting to
receive all acknowledgements of the previous buffer.

Each buffer is encapsulated in a qc_stream_buf structure. Previously, it
was stored as a list into qc_stream_desc. Change this storage to use a
tree instead. Each buffer is indexed by its offset.

This commit does not introduce functional changes. However, this
rearchitecture will be necessary for future commits extending ACK
management, which requires fetching individual buffer instances by
their offset, not just the first or last element of a streamdesc.
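
To illustrate why offset-indexed storage helps, here is a minimal
lookup sketch. It assumes a sorted array and a binary search as a
stand-in for the tree; the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_stream_buf {
    uint64_t offset; /* stream offset of the first byte in this buffer */
    uint64_t len;    /* number of bytes covered by this buffer */
};

/* <bufs> must be sorted by ascending offset and must not overlap.
 * Returns the buffer containing stream offset <off>, or NULL if none.
 */
static const struct toy_stream_buf *
toy_buf_lookup(const struct toy_stream_buf *bufs, size_t nb, uint64_t off)
{
    size_t lo = 0, hi = nb;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (off < bufs[mid].offset)
            hi = mid;
        else if (off >= bufs[mid].offset + bufs[mid].len)
            lo = mid + 1;
        else
            return &bufs[mid];
    }
    return NULL;
}
```

With a plain list, reaching a buffer in the middle means walking from
one end; an ordered index reaches any buffer directly from the ACK
offset.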
2024-10-01 16:19:41 +02:00
Amaury Denoyelle
f4a83fbb14 MINOR: quic: do not remove qc_stream_desc automatically on ACK handling
qc_stream_desc_ack() is used to handle ACK received for STREAM frame. It
removes acknowledged data from their underlying buffer.

If all data were removed after ACK handling, qc_stream_desc instance
would automatically be freed at the end of qc_stream_desc_ack().
However, this renders the function complicated to use. Simplify this by
removing this automatic removal. Now, caller is responsible to check
after ACK handling if qc_stream_desc instance can be removed. This is
easily done using qc_stream_desc_done() helper.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
db68f8ed86 MINOR: quic: refactor STREAM room notification
qc_stream_desc is an intermediary layer between QUIC MUX and quic_conn.
It is a facility which permits to store data to emit and keep them for
retransmission until acknowledgment. This layer is responsible to notify
QUIC MUX each time a buffer is freed. This is necessary as MUX buffer
allocation is limited by the underlying congestion window size.

Refactor this to use a mechanism similar to send notification. A new
callback notify_room can now be registered to qc_stream_desc instance.
This is set by QUIC MUX to qmux_ctrl_room(). On MUX QUIC free, special
care is now taken to reset notify_room callback to NULL.

Thanks to this refactoring, further adjustments have been made to refine
the architecture. One of them is the removal of qc_stream_desc
QC_SD_FL_OOB_BUF, which is now converted to a MUX layer flag
QC_SF_TXBUF_OOB.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
d7f4e5abf0 MEDIUM: quic: strengthen MUX send notification
The previous commit implements a refactor of MUX send notification from
quic_conn layer. With this new architecture, a proper callback is
defined for each qc_stream_desc instance.

This architecture change simplifies notification from the quic_conn
layer. First, ensure the MUX callback properly ignores retransmission
of an already emitted frame. Luckily, this can be handled easily by
comparing offsets and FIN status. Also, each QCS instance can now be
unregistered from send notification just prior to qc_stream_desc releasing.
This ensures a QCS is never manipulated from quic_conn after its
emission ending. Both these changes render the send notification more
robust. As a nice effect, flag QUIC_FL_CONN_TX_MUX_CONTEXT can be
removed as it is now unneeded.
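
A minimal sketch of how a send callback can ignore retransmissions by
comparing offsets and FIN status is given below. The structure and
function are hypothetical and only model the accounting, not the real
QCS handling.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_qcs {
    uint64_t sent_offset; /* highest stream offset already reported as sent */
    bool fin_sent;        /* FIN already reported as sent */
};

/* Account for an emitted frame covering [offset, offset + len). Returns
 * the number of bytes newly counted; 0 for a pure retransmission.
 */
static uint64_t toy_on_send(struct toy_qcs *qcs, uint64_t offset,
                            uint64_t len, bool fin)
{
    uint64_t end = offset + len;
    uint64_t diff = 0;

    if (end > qcs->sent_offset) {
        diff = end - qcs->sent_offset;
        qcs->sent_offset = end;
    }
    if (fin && !qcs->fin_sent)
        qcs->fin_sent = true;
    return diff;
}
```

A frame that ends at or below sent_offset and carries no new FIN
changes nothing, which is what makes the notification safe to repeat.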
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
6ad99af0a9 MINOR: quic: refactor MUX send notification
For STREAM emission, MUX QUIC generates one or several frames and emits
them via qc_send_mux(). Lower layer may use them as-is, or split them into
smaller chunks to fit in a QUIC packet. It is then responsible to notify
the MUX to report the amount of data sent.

Previously, this was done via a direct call from quic_conn to MUX using
qcc_streams_sent_done(). Modify this to have a better isolation across
layers. Define a send callback handled by the qc_stream_desc instance.
This allows the MUX to register each QCS instance individually to the
renamed qmux_ctrl_send() which replaces qcc_streams_sent_done().

At quic_conn layer, qc_stream_desc_send() can be used now. This is a
wrapper to qc_stream_desc layer to invoke the send callback if
registered.
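
The callback mechanism can be pictured with the sketch below. It is an
illustration only: the structure layout, the notify_send field name and
the callback signature are assumptions, not the qc_stream_desc API.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_stream_desc;

typedef void (*toy_send_cb)(struct toy_stream_desc *sd,
                            uint64_t len, uint64_t offset);

struct toy_stream_desc {
    toy_send_cb notify_send; /* NULL when no upper layer is registered */
    void *ctx;               /* opaque context owned by the upper layer */
};

/* Wrapper used by the lower layer: invoke the send callback only if one
 * is registered. Returns 1 if the upper layer was notified, 0 otherwise.
 */
static int toy_stream_desc_send(struct toy_stream_desc *sd,
                                uint64_t len, uint64_t offset)
{
    if (!sd->notify_send)
        return 0;
    sd->notify_send(sd, len, offset);
    return 1;
}

/* Example upper-layer callback: accumulate the amount of data sent. */
static void toy_mux_on_send(struct toy_stream_desc *sd,
                            uint64_t len, uint64_t offset)
{
    (void)offset;
    *(uint64_t *)sd->ctx += len;
}
```

The lower layer never needs to know which upper layer is attached, and
resetting notify_send to NULL is enough to stop notifications.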

This mechanism of qc_stream_desc callback should be extended later to
implement other notifications across the QUIC stack.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
4859d8e71d MINOR: quic: remove unneeded notification of txbuf room
When a stream buffer is freed, qc_stream_desc notifies MUX. This is useful
if MUX is waiting for Tx buffer allocation.

Remove this notification in qc_stream_desc(). This is because the
function is called when all stream data have been acknowledged and thus
notified. This function can also be called with some data
unacknowledged, but in this case this is only true just before
connection closure. As such, it is useless to notify the MUX in this
condition.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
12782da020 MINOR: mux-quic: strengthen qcs_send_metadata() usage
This function is reserved for QCS instance where no data was emitted.
A BUG_ON() ensures this by checking that streamdesc buf_list is empty.

However, this condition would not be enough if data were previously
emitted but already fully acknowledged. Thus, extend the condition by
also checking the streamdesc ack_offset is 0.
2024-10-01 16:17:03 +02:00
Amaury Denoyelle
fdc16c1e01 MINOR: quic: ensure txbuf realloc is only performed on empty buffer
QUIC application protocol layer has the ability to either allocate a
standard buffer or a smaller one. The latter is useful when only small
data are transferred to prevent consuming too much of the QUIC MUX
buffer window.

This operation is performed using qc_stream_buf_realloc(). Add a new
BUG_ON() in it to ensure no data is present in the buffer. Indeed, this
would cause data loss, or even a crash when trying to acknowledge data.

Note that for the moment qc_stream_buf_realloc() is only used for HTTP/3
headers transmission, and this usage conforms to the new BUG_ON. This
commit is thus not a bug fix, but only strengthens the API.
2024-10-01 11:51:51 +02:00
Amaury Denoyelle
172404a8ec MINOR: mux-quic: complete Tx infos for QCS dump
Complete debug info when a QCS instance is dumped either on traces or
show quic. Display the value of Tx offset both soft and real, along with
the current flow-control limit.
2024-10-01 11:51:51 +02:00
Valentine Krasnobaeva
f18b52cc80 MINOR: cfgparse-global: add dedicated parser for *env keywords
This commit prepares the config parser to support MODE_DISCOVERY and, thus,
the refactored master-worker mode. The latter implies that the master
process reads only the 'DISCOVERY' tagged keywords from the global section,
and it must call an appropriate keyword parser for this.

So, let's move the code, which parses *env keywords, from the global section
parser to its own keyword registered parser.
2024-10-01 10:37:29 +02:00
Valentine Krasnobaeva
df68f7ec96 BUG/MINOR: cfgparse-global: fix allowed args number for setenv
Keywords setenv and presetenv take 2 arguments: variable name and value.
So, the total number, that should be passed to alertif_too_many_args is 2
("setenv <name> <value>") instead of 3. For alertif_too_many_args the first
argument index is 0.
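
The off-by-one can be illustrated with a simplified stand-in for the
check. The function below is hypothetical and only models the indexing
rule stated above; the real alertif_too_many_args() takes more
parameters and reports an error message.

```c
#include <stddef.h>

/* Hypothetical stand-in: <args[0]> is the keyword itself and unused
 * slots hold empty strings. Returns 1 if a non-empty word follows
 * args[maxarg], 0 otherwise.
 */
static int toy_too_many_args(int maxarg, const char *const *args)
{
    return args[maxarg + 1] != NULL && *args[maxarg + 1] != '\0';
}
```

For the words ("setenv", "NAME", "value", "oops"), a limit of 2 flags
"oops" as unexpected, while a limit of 3 lets it through unnoticed.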

This should be backported in all stable versions.
2024-10-01 10:35:09 +02:00
Christopher Faulet
273d322b6f MINOR: stream/stats: Expose the total number of streams ever created in stats
A shared counter is added in the thread context to track the total number of
streams created on the thread. This number is then reported in stats. It
will be a useful information to diagnose some bugs.
2024-09-30 16:55:53 +02:00
Christopher Faulet
18ee22ff76 MINOR: stream/stats: Expose the current number of streams in stats
A shared counter is added in the thread context to track the current number
of streams. This number is then reported in stats. It will be a useful
information to diagnose some bugs.
2024-09-30 16:55:53 +02:00
Christopher Faulet
6a94b7419e MINOR: stream: Support dynamic changes of the number of connection retries
Thanks to the previous patch, it is now possible to add an action to
dynamically change the maximum number of connection retries for a stream.
"set-retries" action may now be used to do so, from a "tcp-request content"
or a "http-request" rule. This action accepts an expression or an integer
between 0 and 100. The integer value is checked during the configuration
parsing and leads to an error if it is not in the expected range. However,
for the expression, the value is retrieved at runtime. So, invalid values
are just ignored.

Too high a value is forbidden to avoid any trouble. 100 retries already
seems to be an amazingly high value. In addition, the option is only available on
backend or listen sections.

Because the max retries is limited to 100 at most, it can be stored as an
unsigned short. This saves some space in the stream structure.
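
The two validation paths described above (strict at parsing time,
lenient at runtime) can be sketched as follows. The helper names are
hypothetical and the sketch does not reproduce the action's real
parsing code.

```c
#include <errno.h>
#include <stdlib.h>

#define TOY_MAX_RETRIES 100

/* Configuration time: accept only an integer between 0 and 100.
 * Returns 0 and stores the value on success, -1 on error.
 */
static int toy_parse_retries(const char *arg, unsigned short *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(arg, &end, 10);
    if (errno || end == arg || *end != '\0' || v < 0 || v > TOY_MAX_RETRIES)
        return -1;
    *out = (unsigned short)v;
    return 0;
}

/* Runtime: apply the result of an expression. An out-of-range value is
 * ignored and the current per-stream value is kept.
 */
static unsigned short toy_apply_retries(unsigned short current, long value)
{
    if (value < 0 || value > TOY_MAX_RETRIES)
        return current;
    return (unsigned short)value;
}
```

Since the accepted range tops out at 100, an unsigned short is more
than wide enough for the stored value.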
2024-09-30 16:55:53 +02:00
Christopher Faulet
91e785edc9 MINOR: stream: Rely on a per-stream max connection retries value
Instead of directly relying on the backend parameter to limit the number of
connection retries, we now use a per-stream value. This value is by default
inherited from the backend value when it is set. So for now, there is no
change except the stream value is used instead of the backend value. But
thanks to this change, it will be possible to dynamically change this value.
2024-09-30 16:55:53 +02:00
Christopher Faulet
0d91de2be4 MINOR: action: Export release_expr_int_action() release function
This function was only used by TCP actions and was private to tcp_act.c
file. However, it makes sense to make it public to be used by any action
relying on an int-or-expression argument.
2024-09-30 16:55:53 +02:00
Christopher Faulet
688abb6f30 BUG/MINOR: mcli: Pretend the mux have more data to deliver between two commands
Since the commit "OPTIM: stconn: Don't pretend mux have more data to deliver
on EOI/EOS/ERROR", the SC no longer pretends its mux has more data to
deliver when one of EOI/EOS/ERROR flags is set on its sedesc.

However, for the master cli, it is an issue because any EOI/EOS at the end
of a command is in fact detected on the attempt to get the next command. To
do so, the stream is reset. Because of the commit above, the next receive
is never performed. To fix the issue, when the stream is reset, the front SC
pretends its mux has more data to deliver.

This patch must only be backported if the commit above is backported.
2024-09-30 16:55:53 +02:00
Christopher Faulet
bca5e14235 OPTIM: stconn: Don't pretend mux have more data to deliver on EOI/EOS/ERROR
While benchmarking 3.0, we encountered a small loss in requests/sec on
small objects compared to 2.8. After bisecting the issue, it appeared
that this was introduced when the mux-to-mux zero-copy data forwarding was
implemented in 2.9-dev8. Extra subscribes on receives at the end of the
message were responsible for the loss.

A basic configuration, sending H2 requests to a H1 server returning
responses without payload is enough to observe the issue. With the following
command, we can observe a huge increase of epoll_ctl calls on 2.9/3.x:

  h2load -c 100 -m 10 -n 100000 http://...

On 2.8 we have around 3200 calls to epoll_ctl against more than 20k on 3.1.

The fix seems obvious. After a receive, there is no reason to state a mux
has more data to deliver if EOI/EOS/ERROR flag was set on the
stream-endpoint descriptor. With this change, extra calls to epoll_ctl
disappear. However it is a sensitive part so it is important to keep an eye
on it and to not backport it.

Thanks to Willy and Emeric for spotting the issue.
2024-09-30 16:55:48 +02:00
Willy Tarreau
11051ed9c7 OPTIM: channel: speed up co_getline()'s search of the end of line
Previously, co_getline() was essentially used for occasional parsing
in peers' banner or Lua, so it could afford to read one character at
a time. However now it's also used on the TCP log path, where it can
consume up to 40% CPU as mentioned in GH issue #2731. Let's speed it
up by using memchr() to look for the LF, and copying the data at once
using memcpy().

Previously it would take 2.44s to consume 1 GB of log on a single
thread of a Core i7-8650U, now it takes 1.56s (-36%).
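
The technique can be sketched on a plain contiguous buffer as shown
below. This is a simplified illustration with a hypothetical function
name: the real co_getline() works on a channel buffer that may wrap and
has its own return conventions.

```c
#include <stddef.h>
#include <string.h>

/* Copy one line, LF included, from in[0..in_len) to out[0..out_len).
 * Returns the number of bytes copied, or 0 if no LF is found within
 * the first min(in_len, out_len) bytes.
 */
static size_t toy_getline(const char *in, size_t in_len,
                          char *out, size_t out_len)
{
    size_t max = in_len < out_len ? in_len : out_len;
    const char *lf = max ? memchr(in, '\n', max) : NULL;
    size_t n;

    if (!lf)
        return 0;
    n = (size_t)(lf - in) + 1; /* include the LF itself */
    memcpy(out, in, n);        /* copy the whole line at once */
    return n;
}
```

The search and the copy are each delegated to a single library call
instead of a per-character loop.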
2024-09-30 11:36:39 +02:00
Willy Tarreau
7caf073faa MINOR: tools: do not attempt to use backtrace() on linux without glibc
The function is provided by glibc. Nothing prevents us from using our
own outside of glibc there (tested on aarch64 with musl). We still do
not enable it by default as we don't yet know if all archs work well,
but it's sufficient to pass USE_BACKTRACE=1 when building with musl to
verify it's OK.
2024-09-29 09:52:23 +02:00
Willy Tarreau
1c4776dbc3 BUILD: tools: only include execinfo.h for the real backtrace() function
No need to include this possibly non-existing file when using our own
backtrace() implementation, it's only needed for the libc-provided one.
Because of this it's currently not possible to build with musl with
backtrace enabled.
2024-09-29 09:52:23 +02:00
Willy Tarreau
1d403caf8a MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()
When shutting down server sessions, the queue was not considered, which
is a problem if some element reached the queue at the moment the server
was going down, because there will be no more requests to kick them out
of it. Let's always make sure we scan the queue to kick these streams
out of it and that they can possibly find a more suitable server. This
may make a difference in the time it takes to shut down a server on the
CLI when lots of servers are in the queue.

It might be interesting to backport this to 3.0 but probably not much
further.
2024-09-27 19:01:38 +02:00
Willy Tarreau
1385e33eb0 BUG/MINOR: queue: make sure that maintenance redispatches server queue
Turning a server to maintenance currently doesn't redispatch the server
queue unless there's an explicit "option redispatch" and no "option
persist", while the former has never really been the purpose of this
test. Better refine this so that forced maintenance also causes the
queue to be flushed, and possibly redispatched unless the proxy has
option persist. This way now when turning a server to maintenance,
the queue is immediately flushed and streams can decide what to do.

This can be backported, though there's no need to go far since it was
never directly reported and only noticed as part of debugging some
rare "shutdown sessions" strangeness, which it might contribute to.
2024-09-27 18:54:07 +02:00
Willy Tarreau
a4d04c649a BUG/MINOR: server: make sure the HMAINT state is part of MAINT
In 1.8 when adding "set server fqdn" with commit b418c1228c ("MINOR:
server: cli: Add server FQDNs to server-state file and stats socket."),
the HMAINT flag was not made part of the MAINT ones, so technically
speaking when changing the FQDN, the server is not completely considered
as in maintenance mode.

In its defense, the code location around that was completely messy, with
the aggregator flag being hidden between other values and purposely but
discretely ignoring one of the flags, so the comments were updated to
make the intent clearer (particularly regarding CMAINT which looked like
it was also forgotten while it was on purpose).
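
The aggregator flag issue can be pictured with the sketch below. The bit
values are purely illustrative and the TOY_ names are not the real
definitions; only the relationship between the flags matters here.

```c
/* Illustrative values only; the real flags are defined by HAProxy. */
enum toy_srv_admin {
    TOY_ADMF_FMAINT = 0x01,
    TOY_ADMF_IMAINT = 0x02,
    TOY_ADMF_CMAINT = 0x04, /* purposely left out of the aggregate */
    TOY_ADMF_RMAINT = 0x08,
    TOY_ADMF_HMAINT = 0x10, /* now part of the aggregate */
    TOY_ADMF_MAINT  = TOY_ADMF_FMAINT | TOY_ADMF_IMAINT |
                      TOY_ADMF_RMAINT | TOY_ADMF_HMAINT,
};

/* Returns non-zero if any maintenance flag of the aggregate is set. */
static int toy_srv_in_maint(unsigned int admin)
{
    return (admin & TOY_ADMF_MAINT) != 0;
}
```

Any test written against the aggregate then sees a server carrying only
the HMAINT flag as being in maintenance, while CMAINT alone is still
ignored on purpose.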

This can be backported anywhere.
2024-09-27 18:40:15 +02:00
Willy Tarreau
b8e3b0a18d BUG/MEDIUM: stream: make stream_shutdown() async-safe
The solution found in commit b500e84e24 ("BUG/MINOR: server: shut down
streams under thread isolation") to deal with inter-thread stream
shutdown doesn't work fine because there exists code paths involving
a server lock which can then deadlock on thread_isolate(). A better
solution then consists in deferring the shutdown to the stream itself
and just wake it up for that.

The only thing is that TASK_WOKEN_OTHER is a bit too generic and we
need to pass at least 2 types of events (SF_ERR_DOWN and SF_ERR_KILLED),
so we're now leveraging the new TASK_F_UEVT1 and _UEVT2 flags on the
task's state to convey these info. The caller only needs to wake the
task up with these flags set, and the stream handler will then finish
the job locally using stream_shutdown_self().

This needs to be carefully backported to all branches affected by the
dequeuing issue and containing any of the 5541d4995d ("BUG/MEDIUM:
queue: deal with a rare TOCTOU in assign_server_and_queue()"), and/or
b11495652e ("BUG/MEDIUM: queue: implement a flag to check for the
dequeuing").
2024-09-27 12:15:41 +02:00
Willy Tarreau
b5281283bb MINOR: task: define two new one-shot events for use with WOKEN_OTHER or MSG
TASK_WOKEN_MSG only says "someone sent you a message" but doesn't convey
any info about the message. TASK_WOKEN_OTHER says "you're woken for another
reason" but doesn't tell which one. Most often they're used as-is by the
task handlers to report very specific situations.

For some important control notifications, having the ability to modulate
the message a little bit is useful, so let's define two user event types
UEVT1 and UEVT2 to be used in conjunction with TASK_WOKEN_MSG or _OTHER
so that the application can know that a specific condition was explicitly
requested. It will be used this way:

  task_wakeup(s->task, TASK_WOKEN_MSG | TASK_F_UEVT1);
or:
  task_wakeup(s->task, TASK_WOKEN_OTHER | TASK_F_UEVT2);

Since events are cumulative, keep in mind not to consider a 3rd value
as the combination of UEVT1+UEVT2; these really mean that the two events
appeared (though in unspecified order).
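
On the receiving side, a task handler could decode these events as
sketched below. The bit values and the TOY_ names are illustrative
assumptions, not the real TASK_* definitions.

```c
/* Illustrative bit values; the real constants are defined by HAProxy. */
#define TOY_WOKEN_OTHER 0x01u
#define TOY_WOKEN_MSG   0x02u
#define TOY_F_UEVT1     0x10u
#define TOY_F_UEVT2     0x20u

/* Hypothetical application-level reasons attached to each user event. */
#define TOY_REASON_A    0x1u
#define TOY_REASON_B    0x2u

/* Decode the wake-up state into a set of reasons. Events are cumulative,
 * so both bits set means both events happened, not a third event.
 */
static unsigned int toy_decode_wakeup(unsigned int state)
{
    unsigned int reasons = 0;

    if (state & TOY_F_UEVT1)
        reasons |= TOY_REASON_A;
    if (state & TOY_F_UEVT2)
        reasons |= TOY_REASON_B;
    return reasons;
}
```

Each bit is tested independently, so a handler written this way cannot
mistake the combination for a distinct third value.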
2024-09-27 11:56:10 +02:00
Willy Tarreau
d1c398b786 Revert "BUG/MINOR: server: shut down streams under thread isolation"
This reverts commit b500e84e24fd19ccbcdf4fae5165aeb07e46bd67.

Thread isolation does not work well for this, there exist code paths
which already hold the server's lock and result in a deadlock. Let's
revert that and address it better without isolation.
2024-09-27 10:17:31 +02:00
Aurelien DARRAGON
0c94b2efec REGTESTS: add a test for proxy "log-steps"
Now that proxy "log-steps" keyword was implemented and is usable since
("MEDIUM: log: consider log-steps proxy setting for existing log origins")
let's add some tests for it in reg-tests/log/log_profile.vtc.
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
7ad4e00c1f DOC: config: document proxy "log-steps" keyword
Now that "log-steps" proxy keyword is functional, let's add some
documentation and usage examples for it.
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
e3eb6a9035 MEDIUM: log: consider log-steps proxy setting for existing log origins
During tcp/http transaction processing, haproxy may produce logs at
different steps during the processing (accept, connect, request,
response, close). But the behavior is hardly configurable because
haproxy will only emit a single log per transaction, and by default
it will try to produce the log once all log aliases or fetches used
in the logformat could be satisfied, which means the log is often
emitted during connection teardown, unless "option logasap" is used.

We were often asked to have a way to emit multiple logs for a single
transaction, for instance to emit a log during accept, then request,
response and close; see GH #401 for more context.

Thanks to "log-steps" keyword introduced by commit "MINOR: log:
introduce "log-steps" proxy keyword", it is now possible to explicitly
configure when logs should be generated by haproxy when processing a
transaction. This commit adds the required checks so that log-steps
proxy option is properly considered for existing logs generated by
haproxy. If "log-steps" is not specified on the proxy, the old behavior
is preserved.

Note: a slight cpu overhead should only be visible when the "log-steps"
keyword is used, due to the implementation relying on eb32 lookup
instead of basic bitfield check as described in "MINOR: proxy: add
log_steps struct member". However, the default behavior shouldn't be
affected.

When combining log-steps with log-profiles, the user can explicitly
control how and when haproxy generates logs during request
handling.
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
4189eb7aca MINOR: log: add log_orig_proxy() helper function
Function may be used on proxy where log-steps are used to check if a given
log origin should be handled or not.
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
c043d5d372 MINOR: log: introduce "log-steps" proxy keyword
For now it is only available for proxies with frontend capability because
log-steps are only evaluated under sess_log() or strm_log() which
essentially focus on the frontend side when it comes to log settings so
it's better to keep it this way for better consistency, at least for now.

For now the setting does nothing (it is not considered during runtime),
it will be implemented and documented in upcoming commits.
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
9341792baf MINOR: proxy: add log_steps struct member
add proxy->conf.log_steps eb32 root tree which will be used to store the
log origin identifiers that should result in haproxy emitting a log as
configured by the user using upcoming "log-steps" proxy keyword.

It was chosen to use eb32 tree instead of simple bitfield because despite
the slight overhead it is more future-proof given that we already
implemented the prerequisites for seamless custom log origins registration
that will also be usable from "log-steps" proxy keyword.
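
As a rough illustration of the trade-off described above, the sketch below
contrasts a fixed bitfield test with a keyed lookup. It does not use the
ebtree API: a sorted array searched with bsearch() stands in for the eb32
tree, and every name in it (bitfield_has, steps_set, steps_has) is made up
for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Bitfield variant: cheap, but limited to ids 0..31 known at build time. */
static inline int bitfield_has(uint32_t mask, unsigned int id)
{
    return id < 32 && ((mask >> id) & 1U);
}

/* Keyed variant: a sorted array of ids stands in for the eb32 tree
 * (assumption made for this sketch only).
 */
struct steps_set {
    const uint32_t *ids;   /* sorted in ascending order */
    size_t count;
};

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/* Returns non-zero if <id> was configured; any 32-bit id is accepted. */
static int steps_has(const struct steps_set *set, uint32_t id)
{
    return bsearch(&id, set->ids, set->count, sizeof(*set->ids), cmp_u32) != NULL;
}
```

The bitfield cannot represent an id such as 1000 registered at runtime,
whereas the keyed lookup can, at the cost of a logarithmic search.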
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
b882402a29 MINOR: log: support extra log origins for '%OG' alias
Following previous commits, let's improve log_orig_to_str() so that
extra log origins (registered through log_orig_register()) can be
translated to string from origin ID.

For that, it is required to add eb_32 tree node to log_origin struct in
order to enable quick integer lookup during runtime. Slow name lookup
using the list is acceptable for config parsing, but it is not the case
during runtime when log_orig_to_str() is expected to be used. Also, to
prevent duplicated info, get rid of ->id field and use ->tree.key instead
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
f8bb9d5c57 MINOR: log: explicitly handle extra log origins as error when relevant
Thanks to previous commit, we can now check for log_orig optional flags
in functions taking struct log_orig as parameter. Let's take this
opportunity to add the LOG_ORIG_FL_ERROR flag and check this flag at a
few places to handle the log message differently because if the flag is
set then the caller expects the log to be handled as an error explicitly.

e.g.: in _process_send_log_override(), if the flag is set, use the error
log format instead of the dedicated one.
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
3c15ee05e9 MINOR: log: introduce log_orig flags
Rename 'enum log_orig' to 'enum log_orig_id', since this enum specifically
contains the log origin ids.

Add 'struct log_orig' which wraps 'enum log_orig' with optional flags
(no flags defined for now).

Add log_orig() helper func that takes id and flags as parameter and
returns log_orig struct initialized with input arguments.

Update functions taking log origin as parameter so they explicitly take
log orig id or log orig wrapper as argument depending on the level of
context expected by the function.
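
The wrapper described above could look roughly like the sketch below. Only
the names 'enum log_orig_id', 'struct log_orig' and log_orig() come from
this message; the enum members and the width of the flags field are
assumptions chosen for illustration.

```c
#include <stdint.h>

/* Origin identifiers; the member names are hypothetical. */
enum log_orig_id {
    DEMO_ORIG_UNSPEC = 0,
    DEMO_ORIG_ACCEPT,
    DEMO_ORIG_CLOSE,
};

/* Wraps the id with optional flags (none defined at this stage).
 * The uint16_t width is an assumption of this sketch.
 */
struct log_orig {
    enum log_orig_id id;
    uint16_t flags;
};

/* Builds a log_orig struct from its id and flags. */
static inline struct log_orig log_orig(enum log_orig_id id, uint16_t flags)
{
    struct log_orig orig;

    orig.id = id;
    orig.flags = flags;
    return orig;
}
```

A function that only needs the identifier keeps taking 'enum log_orig_id',
while one that needs more context takes the wrapper, e.g.
log_orig(DEMO_ORIG_CLOSE, 0).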
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
6567e37680 MINOR: log: handle extra log origins in _process_send_log_override()
Thanks to the previous commit, it is now possible to register additional
log origins that may be used from log-profile section as 'on' steps.

As such, let's make _process_send_log_override() function aware of them
by trying to lookup in the tree of extra logging steps in the default
switch-case catchall. If the log origin id matches with the id of the
extra logging step, we use the associated log format instead of the
"any" log format.
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
818475c5cc MINOR: log: introduce extra log profile steps
add a way to register additional log origins using log_origin_register()
that may be used as log profile steps from log profile sections.

For now this does nothing as no extra origins are registered and extra log
origins are not yet considered for runtime logging paths.

When specifying an extra logging step for on <step> under log-profile
section, the logging step is stored within a binary tree for efficient
lookup during runtime. No performance impact should be expected if extra
log origins are not being used, and slight performance impact if extra
log origins are used.

Don't forget to update the documentation when new log origins are added
(both %OG log alias and on <step> log-profile keyword are concerned).
2024-09-26 16:53:07 +02:00
Aurelien DARRAGON
facf259d88 MINOR: log: fix indent in strm_log()
8f34320e15 ("MINOR: log: provide log origin in logformat expressions
using '%OG'") caused wrong indent in strm_log()
2024-09-26 16:53:07 +02:00
Oliver Dala
a889413f5e BUG/MEDIUM: cli: Deadlock when setting frontend maxconn
The proxy lock state isn't passed down to relax_listener
through dequeue_proxy_listeners, which causes a deadlock
in relax_listener when it tries to get that lock.
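
The failure mode can be modelled with a toy, non-recursive lock, as in the
sketch below. It is not the listener code: the demo_* functions are
hypothetical stand-ins, and only the idea of forwarding the lock state
(called <lpx> here as in the text) is taken from this message.

```c
/* Toy non-recursive lock: taking it twice on the same path means deadlock. */
struct demo_lock {
    int held;
};

/* Returns 0 on success, -1 where a real lock would block forever. */
static int demo_lock_take(struct demo_lock *lk)
{
    if (lk->held)
        return -1;
    lk->held = 1;
    return 0;
}

static void demo_lock_release(struct demo_lock *lk)
{
    lk->held = 0;
}

/* <lpx> is non-zero when the caller already holds the proxy lock. */
static int demo_relax(struct demo_lock *px_lock, int lpx)
{
    if (!lpx && demo_lock_take(px_lock) < 0)
        return -1;
    /* ... work on the listener under the proxy lock ... */
    if (!lpx)
        demo_lock_release(px_lock);
    return 0;
}

/* Intermediate caller: it must forward <lpx> instead of assuming 0. */
static int demo_dequeue(struct demo_lock *px_lock, int lpx)
{
    return demo_relax(px_lock, lpx);
}
```

Calling demo_dequeue(&lock, 0) while the lock is already held reports the
would-be deadlock, whereas demo_dequeue(&lock, 1) completes and leaves the
lock to its owner.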

Backporting: Older versions didn't have relax_listener and directly called
resume_listener in dequeue_proxy_listeners. lpx should just be passed directly
to resume_listener then.

The bug was introduced in commit 001328873c352e5e4b1df0dcc8facaf2fc1408aa

[cf: This patch should fix the issue #2726. It must be backported as far as
2.4]
2024-09-25 17:12:11 +02:00
Christopher Faulet
96edacc546 DEV: flags/applet: decode appctx flags
Decode APPCTX flags via appctx_show_flags() function.
2024-09-24 18:26:36 +02:00
Christopher Faulet
14a413033c BUG/MEDIUM: cli: Be sure to catch immediate client abort
A client abort while nothing was sent is properly handled except when this
immediately happens after the connection was accepted. The read0 event is
caught before the CLI applet is created. In that case, the shutdown is not
handled and the applet is no longer woken up. In that case, the stream remains
blocked and no timeout is armed.

The bug was due to the fact that when the applet I/O handler was called for
the first time, the applet context was initialized and nothing more was
performed. A shutdown, if any, would be handled on the next call. In that
case, it was too late.

Now, after the init step, we loop to eval the first command. There is no
command here but the shutdown will be tested.

This patch should fix the issue #2727. It must be backported to 3.0.
2024-09-24 18:01:38 +02:00
Aurelien DARRAGON
d622f9d5b6 MEDIUM: mailers: warn about deprecated legacy mailers
As mentioned in 2.8 announce on the mailing list [1] and on the wiki [2],
use of legacy mailers is now deprecated and will not be supported anymore
starting with version 3.3. Use of Lua script (AKA Lua mailers) is now
encouraged (and fully supported since 2.8) for this purpose, as it offers
more flexibility (e.g: alerts can be customized) and is more future-proof.

Configurations relying on legacy mailers will now raise a warning.

Users willing to keep their existing mailers config in a working state
should simply add the following line to their global section:

   # mailers.lua file as provided in the git repository
   # adjust path as needed
   lua-load examples/lua/mailers.lua

[1]: https://www.mail-archive.com/haproxy@formilux.org/msg43600.html
[2]: https://github.com/haproxy/wiki/wiki/Breaking-changes
2024-09-23 20:16:27 +02:00
Aurelien DARRAGON
cdaa749ba0 REGTESTS: log: fix log-profile.vtc
Add missing wait for Slg4 introduced in f8299bc ("MINOR: log: "drop"
support for log-profile steps"), and missing barrier increase due to
the use of barrier sync, which could have resulted in the regtest
being timing-sensitive and thus less reliable.

Also, the "error" check in Slg4 wasn't even considered because it is
emitted by frontend 4, not frontend 2.

No backport needed unless f8299bc is.
2024-09-23 20:15:47 +02:00
Willy Tarreau
fdf38ed7fc BUG/MINOR: proxy: also make the cli and resolvers use the global name
As detected by ASAN on the CI, two places still using strdup() on the
proxy names were left by commit b325453c3 ("MINOR: proxy: use the global
file names for conf->file").

No backport is needed.
2024-09-21 20:08:06 +02:00
Willy Tarreau
b500e84e24 BUG/MINOR: server: shut down streams under thread isolation
Since the beginning of thread support, the shutdown of streams attached
to a server was run under the server's lock, but that's not sufficient.
It indeed turns out that shutting down streams (either from the CLI using
"shutdown sessions server XXX" or due to "on-error shutdown-sessions")
iterates over all the streams to shut them down, but stream_shutdown()
has no way to protect its actions against concurrent actions from the
stream itself on another thread, and streams offer no such provisions
anyway.

The impact is some rare but possible crashes when shutting down streams
from the CLI in competition with high server traffic. The probability
is low enough to mark it minor, though it was observed in the field.

At least since 2.4 the streams are arranged in per-thread lists, so it
likely would be possible using the event subsystem to delegate these
events to dedicated per-thread tasks which would address the problem.
But server streams don't get killed often enough to justify such extra
complexity, so better just run the loop under thread isolation.

It also shows that the internal API could probably be improved to
support a lighter thread exclusion instead of full isolation: various
places want to only exclude one thread and here it could work. But
again there's no point doing this for now.

This patch should be backported to all stable branches. It's important
to carefully check that this srv_shutdowns_streams() function is never
called itself under isolation in older versions (though at first glance
it looks OK).
2024-09-21 19:35:35 +02:00
Willy Tarreau
e77c73316a MEDIUM: cfgparse: warn about deprecated use of duplicate server names
As discussed below, there are too many problems and limitations caused
by still supporting duplicate server names. That's already particularly
complicated and dissuasive to use since it requires these servers to
have explicit IDs to be accepted. Let's now warn on any duplicate, even
with explicit IDs and remind that this will become forbidden in 3.3.

Link: https://www.mail-archive.com/haproxy@formilux.org/msg45185.html
2024-09-20 17:15:11 +02:00
Willy Tarreau
029d75df1e OPTIM: cfgparse: speed up duplicate server detection
Surprisingly, the duplicate server name detection has never made use
of the names tree, so lookups were still in O(N^2). It took 1 second
to validate 50k servers spread into 25 backends at 2k per backend.

By simply using the tree (and since the current server already is in
the tree), we just have to walk using ebpt_prev_dup to visit previous
servers with the same name. We can then detect which ones conflict
without having an ID set and error. The config check time is now 1/4
of the previous one for 2k servers per backend, and more importantly
it will make it simpler to check for any duplicates later.
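
To illustrate why an ordered structure removes the quadratic cost, the
sketch below sorts the names and compares neighbours only. It is a stand-in
for the tree walk described above, not the actual ebpt_prev_dup-based code;
count_dup_names() is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts <names> in place and returns how many entries repeat an earlier
 * name. Once ordered, duplicates are adjacent, so a single pass is enough
 * instead of comparing every pair.
 */
static size_t count_dup_names(const char **names, size_t count)
{
    size_t i, dups = 0;

    qsort(names, count, sizeof(*names), cmp_names);
    for (i = 1; i < count; i++) {
        if (strcmp(names[i - 1], names[i]) == 0)
            dups++;
    }
    return dups;
}
```

In a tree that is already maintained during parsing, the ordering comes for
free, which is why only the walk over same-named neighbours remains.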
2024-09-20 17:14:50 +02:00
Willy Tarreau
ccd1ecba1d MEDIUM: cfgparse: drop duplicate named defaults sections after use
It has never been permitted to explicitly reference named defaults
sections for which there are duplicate names. This means that when
a duplicate defaults section is found, there's no point in keeping
it since it will never be used for lookups, so it can be dropped.

However, some such defaults sections might have some rules in them
that are implicitly referenced by proxies placed after them. In this
case they cannot be removed.

What is done here is that upon each new named section creation, if
another one is found with the same name, its config location is stored
into the new proxy's {prev_file,prev_line} pair, and the old section is
either destroyed if its refcount is null, or just unindexed. The dup
check when creating a new proxy now consists in checking the prev_line
instead of performing a dup lookup on the defaults section.

This will guarantee that we can't find duplicate defaults sections in
their tree anymore, while still keeping track of what's allocated and
releasing everything upon exit.

Beyond the consistency gain, there are nice savings for large configs
involving many defaults sections: a test with 300k sections saved
about 1.9 GB of RAM, and started 25% faster likely thanks to spending
less time allocating memory.
2024-09-20 16:35:32 +02:00
Willy Tarreau
c8b813771d MINOR: proxy: add a list of orphaned defaults sections
We'll soon delete unreferenced and duplicated named defaults sections
from the list of proxies. The problem with this is that this list (in
fact a name-based tree) is used to release all of them at the end. Let's
add a list of orphaned defaults sections, typically those containing
"http-check send" statements or various other rules, and that are
implicitly inherited by a proxy hence have a non-zero refcount while
also having a name. This now makes it possible to remove them from
the name index while still keeping their memory around for the lifetime
of the process, and cleaning it at the end.
2024-09-20 15:59:04 +02:00
Willy Tarreau
cb4c236fac BUG/MINOR: cfgparse: detect another uncaught case of duplicate defaults
The following sequence was not properly caught:

   defaults def
   backend back from def
   defaults def

But this one was:

   defaults def
   defaults def
   backend back from def

Let's check when defaults are declared that they're not already
referenced.

Better not backport this. While it will catch broken configs (possibly
some with backends pasted after the wrong defaults), these might still
work by accident. It may be reported as a diag warning though.
2024-09-20 15:58:10 +02:00
Willy Tarreau
5b221d1e41 CLEANUP: cfgparse: factor proxy vs log-forward collisions
This simplifies the check added in 1a38684fbc ("MEDIUM: cfgparse:
detect collisions between defaults and log-forward"), by factoring it
with the other existing one.

The tests are ugly in that code because a first block tests pure
proxies, a second one proxies or defaults and inside that one we
have special cases for defaults. Let's just move the tests to the
"any proxy type" block.
2024-09-20 14:13:14 +02:00
Willy Tarreau
b325453c36 MINOR: proxy: use the global file names for conf->file
Proxy file names are assigned a bit everywhere (resolvers, peers,
cli, logs, proxy). All these elements were enumerated and now use
copy_file_name(). The only ha_free() call was turned to drop_file_name().

As a bonus side effect, a 300k backend config saved 14 MB of RAM.
2024-09-19 15:38:19 +02:00
Willy Tarreau
9ab21a3c2d CLEANUP: stick-table: make the file location point to a global file name
The file name used to point to the calling function's stack for stick
tables, which was OK during parsing but remained dangling afterwards.
At least it was already marked const so as not to accidentally free it.
Let's make it point to a file_name_node now.
2024-09-19 15:38:19 +02:00
Willy Tarreau
d6c060c5ae MINOR: tools: add minimal file name management
In proxies, stick-tables, servers, etc... at plenty of places we store
a file name and a line number. Some file names are the result of strdup()
(e.g. in proxies), others not (e.g. stick-tables) and leave dangling
pointers at the end of parsing. The risk of double-free is not null
either.

In order to stop this, let's first add a simple tool that allows to
register short strings inside a global list, these strings happening
to be server names. The strings are either duplicated and stored upon
failure to find them, or just added to this storage. Since file names
are not expected to disappear before the end of the process, for now
we don't even implement refcounting, and we free them all at the end.
There's already a drop_file_name() function to reset the pointer like
ha_free() used to do, and even if not strictly needed it's a good
habit to get used to doing it.

The strings are returned as const so that they're stored as-is in
structs, and that nasty free() calls are easily caught. The pointer
points to the char[] storage inside the node itself. This way later
if we want to implement refcounting, it will be trivial to just look
up a string and change its associated node's refcount. If needed,
comparisons can also be made on pointers.

For now they're not used yet and are released on deinit().
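
A minimal sketch of such a string registry is shown below, assuming a plain
singly linked list. The names (name_node, intern_name, free_all_names) are
hypothetical and this is not the code added by the patch; it only shows the
properties described above: one stored copy per string, const pointers into
the node's own storage, and a single release at the end.

```c
#include <stdlib.h>
#include <string.h>

/* One stored name; the string lives inside the node itself. */
struct name_node {
    struct name_node *next;
    char name[];          /* flexible array member holding the copy */
};

static struct name_node *name_list;   /* global list of known names */

/* Returns a shared const copy of <name>, or NULL on allocation failure. */
static const char *intern_name(const char *name)
{
    struct name_node *node;
    size_t len;

    for (node = name_list; node; node = node->next) {
        if (strcmp(node->name, name) == 0)
            return node->name;    /* already known: reuse it */
    }

    len = strlen(name);
    node = malloc(sizeof(*node) + len + 1);
    if (!node)
        return NULL;
    memcpy(node->name, name, len + 1);
    node->next = name_list;
    name_list = node;
    return node->name;
}

/* Releases every stored name, as would be done once at exit. */
static void free_all_names(void)
{
    while (name_list) {
        struct name_node *next = name_list->next;

        free(name_list);
        name_list = next;
    }
}
```

Because equal strings share one node, two users of the same file name hold
the same pointer, so pointer comparison is enough to tell them apart.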
2024-09-19 15:36:58 +02:00
Willy Tarreau
30a0e93fe6 [RELEASE] Released version 3.1-dev8
Released version 3.1-dev8 with the following main changes :
    - DOC: configuration: place the HAPROXY_HTTP_LOG_FMT example on the correct line
    - MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state
    - BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only
    - REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades
    - BUG/MEDIUM: clock: detect and cover jumps during execution
    - BUG/MINOR: pattern: prevent const sample from being tampered in pat_match_beg()
    - BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg}
    - BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
    - MEDIUM: ssl/cli: "dump ssl cert" allow to dump a certificate in PEM format
    - BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state
    - BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established
    - REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load
    - BUG/MINOR: pattern: do not leave a leading comma on "set" error messages
    - REGTESTS: shorten a bit the delay for the h1/h2 upgrade test
    - MINOR: server: allow init-state for dynamic servers
    - DOC: server: document what to check for when adding new server keywords
    - MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option
    - BUG/MINOR: polling: fix time reporting when using busy polling
    - BUG/MINOR: clock: make time jump corrections a bit more accurate
    - BUG/MINOR: clock: validate that now_offset still applies to the current date
    - BUG/MEDIUM: queue: implement a flag to check for the dequeuing
    - OPTIM: sample: don't check casts for samples of same type
    - OPTIM: vars: remove the unneeded lock in vars_prune_*
    - OPTIM: vars: inline vars_prune() to avoid many calls
    - MINOR: vars: remove the emptiness tests in callers before pruning
    - IMPORT: import cebtree (compact elastic binary trees)
    - OPTIM: vars: use a cebtree instead of a list for variable names
    - OPTIM: vars: use multiple name heads in the vars struct
    - BUG/MINOR: peers: local entries updates may not be advertised after resync
    - DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options
    - MINOR: proxy: Rename accept-invalid-http-* options
    - DOC: configuration: Remove dangerous directives from the proxy matrix
    - BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send
    - BUG/MEDIUM: cache/stats: Wait to have the request before sending the response
    - BUG/MEDIUM: promex: Wait to have the request before sending the response
    - MINOR: clock: test all clock_gettime() return values
    - MEDIUM: clock: collect the monotonic time in clock_local_update_date()
    - MEDIUM: clock: opportunistically use CLOCK_MONOTONIC for the internal time
    - MEDIUM: clock: use the monotonic clock for idle time calculation
    - MEDIUM: clock: don't compute before_poll when using monotonic clock
    - BUG/MINOR: fix missing "log-format overrides previous 'option tcplog clf'..." detection
    - BUG/MINOR: fix missing "'option httpslog' overrides previous 'option tcplog clf'..." detection
    - BUG/MINOR: cfgparse-listen: fix option httpslog override warning message
    - BUG/MINOR: cfgparse: detect incorrect overlap of same backend names
    - MEDIUM: cfgparse: warn about proxies having the same names
    - DOC: management: add init-state to add server keywords
    - BUG/MINOR: mux-quic: report glitches to session
    - BUILD: cebtree: silence a bogus gcc warning on impossible code paths
    - MEDIUM: cfgparse: warn about colliding names between defaults and proxies
    - MEDIUM: cfgparse: detect collisions between defaults and log-forward
2024-09-18 22:29:08 +02:00
Willy Tarreau
1a38684fbc MEDIUM: cfgparse: detect collisions between defaults and log-forward
Sadly, when log-forward sections were introduced, great care was taken to
avoid collisions with regular proxies but defaults were missed (they need to be
explicitly checked for). So now we have to move them to a warning for 3.1
instead of rejecting them.
2024-09-18 18:08:15 +02:00
Willy Tarreau
d8f4b07e40 MEDIUM: cfgparse: warn about colliding names between defaults and proxies
In order to complete the checks added in 303a66573d ("MEDIUM: cfgparse:
warn about proxies having the same names"), we also need to warn about
regular proxies having the same name as defaults sections as well as
defaults sections having the same name as proxies, since defaults
sections are inherently proxies, albeit stored in a separate list for
now.
2024-09-18 18:08:06 +02:00
Willy Tarreau
8df44eea6d BUILD: cebtree: silence a bogus gcc warning on impossible code paths
gcc-12 and above report a wrong warning about a negative length being
passed to memcmp() on an impossible code path when built at -O0. The
pattern is the same at a few places, basically:

  int foo(int op, const void *a, const void *b, size_t size, size_t arg)
  {
      if (op == 1) // arg is a strict multiple of size
          return memcmp(a, b, arg - size);
      return 0;
  }
  ...
  int bar()
  {
     return foo(0, a, b, sizeof(something), 0);
  }

It *might* be possible to invent dummy values for the "len" argument
above in the real code, but that significantly complexifies it and as
usual can easily result in introducing undesired bugs.

Here we take a different approach consisting in shutting the
-Wstringop-overread warning on gcc>=12 at -O0 since that's the only
condition that triggers it. The issue was reported to and confirmed by
the gcc team here:  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114622
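
One possible way to express this condition is sketched below, reusing the
foo() pattern from above. It relies on gcc defining __OPTIMIZE__ in all
optimizing compilations, so its absence indicates -O0. The macro name
DEMO_SILENCE_OVERREAD is made up, and how the actual patch scopes the
suppression is not shown in this message.

```c
#include <stddef.h>
#include <string.h>

/* Only gcc >= 12 built without optimization is concerned. */
#if defined(__GNUC__) && (__GNUC__ >= 12) && !defined(__OPTIMIZE__)
#define DEMO_SILENCE_OVERREAD 1
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-overread"
#endif

static int foo(int op, const void *a, const void *b, size_t size, size_t arg)
{
    if (op == 1) /* arg is a strict multiple of size */
        return memcmp(a, b, arg - size);
    return 0;
}

#ifdef DEMO_SILENCE_OVERREAD
#pragma GCC diagnostic pop
#endif
```

Using push/pop keeps the suppression local to the affected code, so the
warning stays active for the rest of the translation unit.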

No backport needed, but this should be upstreamed into cebtree after
checking that all involved macros are available.
2024-09-18 17:42:52 +02:00
Amaury Denoyelle
fcd6d29acf BUG/MINOR: mux-quic: report glitches to session
Glitch counter was implemented for QUIC/HTTP3. The counter is stored in
the QCC MUX connection instance. However, this is never reported at the
session level which is necessary if glitch counter is tracked via a
stick-table.

To fix this, use session_add_glitch_ctr() in various QUIC MUX functions
which may increment glitch counter.

This should be backported up to 3.0.
2024-09-18 16:11:03 +02:00
Damien Claisse
2c783c25d6 DOC: management: add init-state to add server keywords
Commit ce6a621ae allowed init-state to be used for dynamic servers but I
forgot to update management doc.
2024-09-17 22:44:53 +02:00
Willy Tarreau
303a66573d MEDIUM: cfgparse: warn about proxies having the same names
As discussed below, there are too many problems and uncaught bugs
in the parser when trying to support proxies having similar names
but different types. There's specific code to detect the presence
of stick-tables in a pair of such proxies for example. It's even
possible that certain combinations of backend+listen that were not
previously detected have some nasty side effects.

According to the proposal in the discussion, this is now deprecated
in 3.1 (thus we emit a warning) and will become forbidden in 3.3.

A backport might be useful, but reporting a diag_warning only, not a
classical warning, so as not to break setups running in zero-warning
mode.

It was verified with a config involving all 9 combinations of
(frontend,backend,listen) followed by one of the same three that all
collisions are now properly blocked and that only back+front are kept
and emit a warning.

Link: https://www.mail-archive.com/haproxy@formilux.org/msg45185.html
2024-09-17 19:55:00 +02:00
Willy Tarreau
c70906c8a1 BUG/MINOR: cfgparse: detect incorrect overlap of same backend names
As reported below, it's possible to declare a backend then a proxy with
the same name, because for the proxy we check a frontend capability (the
first one to be tested):

   backend b
   listen b
        bind :8888

Let's check the two capabilities in this case and not just the frontend.

Better not backport this, as there's a risk of breakage of existing
setups that work by accident. It might make sense to report them as
diag warnings though.

Link: https://www.mail-archive.com/haproxy@formilux.org/msg45185.html
2024-09-17 19:55:00 +02:00
Aurelien DARRAGON
17e52c922b BUG/MINOR: cfgparse-listen: fix option httpslog override warning message
"option httpslog" override warning message used to be reported as
"option httplog", probably as a result of copy paste without adjusting
the context. Let's fix that to prevent emitting confusing warning messages

The issue exists since 98b930d ("MINOR: ssl: Define a default https log
format"), thus it should be backported up to 2.6
2024-09-17 15:40:02 +02:00
Aurelien DARRAGON
bc4bf5779f BUG/MINOR: fix missing "'option httpslog' overrides previous 'option tcplog clf'..." detection
Same as b85edd44db0 ("BUG/MINOR: fix missing "log-format overrides
previous 'option tcplog clf'..." detection") but for "option httpslog"
keyword.

No backport needed unless fd48b28 ("MINOR: Implements new log format of
option tcplog clf") is.
2024-09-17 15:40:02 +02:00
Aurelien DARRAGON
607b9adc9b BUG/MINOR: fix missing "log-format overrides previous 'option tcplog clf'..." detection
In commit fd48b28315 ("MINOR: Implements new log format of option tcplog clf")
"option tcplog clf" detection was correctly added for "option tcplog" and
"option httplog", but "log-format" case was overlooked. Thus, this config
would report erroneous warning message:

  defaults
    option tcplog clf
    log-format "ok"

[WARNING]  (727893) : config : parsing [test.conf:3]: 'log-format' overrides previous 'log-format' in 'defaults' section.

No backport needed unless fd48b28315 is.
2024-09-17 14:41:58 +02:00
Willy Tarreau
499e057644 MEDIUM: clock: don't compute before_poll when using monotonic clock
There's no point keeping both clocks up to date; if the monotonic clock
is ticking, let's just refrain from updating the wall clock one before
polling since we won't use it. We still do it after polling however as
we need a wall clock time to communicate with outside.

This saves one gettimeofday() call per loop and two timeval comparisons.
2024-09-17 09:08:10 +02:00
Willy Tarreau
24496803d1 MEDIUM: clock: use the monotonic clock for idle time calculation
By just keeping a copy of the last known value before entering
polling, we can apply the same algorithm as we're currently using,
except that it's now applied to the monotonic clock instead of the
wall clock, when it's detected that it's ticking. This improves
idle time calculation accuracy by making it independent of the
wall clock.
2024-09-17 09:08:10 +02:00
Willy Tarreau
4150851ce5 MEDIUM: clock: opportunistically use CLOCK_MONOTONIC for the internal time
We already collect CLOCK_MONOTONIC when it's available when leaving the
poller, but it's only used for profiling. The functions that return it
set the value to zero when it's not available, so we can use that to
detect if it works or not. The idea is that if the monotonic time is
non-zero, it is ticking and usable, then we use it for now_ns, otherwise
we use the corrected date. We continue to apply the now_offset to the
returned value because it helps forcing an early time wrap-around.

Proceeding like this presents two benefits:
  - on systems supporting this, the time is much more robust against
    time changes
  - when it works, it saves us from having to go through the time
    correction code, which is usually cheap, but better avoided anyway.

Note that idle time calculation continues to rely on the wall-clock
time.
2024-09-17 09:08:10 +02:00
Willy Tarreau
f793845f4a MEDIUM: clock: collect the monotonic time in clock_local_update_date()
Now we collect this clock in clock_local_update_date(), the closest to
the poller, which is also used when busy-polling, and the value is set
into the thread's curr_mono_time which did not exist before. Later,
clock_leaving_poll() just sets the prev_mono_time value from the curr_
one instead of retrieving the time at this specific point. It also means
that the monotonic time will now also cover the time needed to update
the global time, which should be negligible. Note that we don't collect
the CPU time in the clock_local_update_date() function even though it's
tempting, because when doing busy-polling, it would be collected on each
round while being useless.

Doing so will make sure that the local time always knows the monotonic
time when it is available.
2024-09-17 09:08:10 +02:00
Willy Tarreau
42e699903e MINOR: clock: test all clock_gettime() return values
Till now we were only using clock_gettime() for profiling, so if it
would fail it was no big deal. We intend to use it as the main clock
as well now, so we need to more reliably detect its absence or failure
and gracefully fall back to other options. Without the test we would
return anything present in the stack, which is neither clean nor easy
to detect.
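
The kind of check described above can be sketched as follows. The demo_*
helpers are hypothetical and are not the functions touched by this patch;
the fallback to gettimeofday() is an assumption used to show one possible
"other option".

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Returns the monotonic time in ns, or 0 if the clock is unusable. */
static uint64_t demo_mono_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;   /* never read <ts> after a failed call */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Prefers the monotonic clock and falls back to the wall clock. */
static uint64_t demo_now_ns(void)
{
    uint64_t ns = demo_mono_ns();
    struct timeval tv;

    if (ns)
        return ns;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000000ULL + (uint64_t)tv.tv_usec * 1000ULL;
}
```

Returning zero on failure gives callers a cheap way to detect that the
clock is missing instead of consuming whatever was left on the stack.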
2024-09-17 09:08:10 +02:00
Christopher Faulet
bb2a2bc5f2 BUG/MEDIUM: promex: Wait to have the request before sending the response
It is similar to the previous fix about the stats applet ("BUG/MEDIUM:
cache/stats: Wait to have the request before sending the response").
However, for promex, there is no crash and no obvious issue. But it depends
on the filter. Indeed, the request is used by promex regardless of whether
it was considered as forwarded or not. So if it is modified by the filter,
the modifications are just ignored.

Same bug, same fix. We now wait for the request to be forwarded before
processing it and producing the response.
2024-09-16 22:56:28 +02:00
Christopher Faulet
afc50f2445 BUG/MEDIUM: cache/stats: Wait to have the request before sending the response
It seems obvious. In a classical workflow, the request headers analysis is
finished when these applets are woken up for the first time. So they don't
check that the request is really available before processing it and sending
the response. But with a filter, it is possible to stop the request analysis
after the applet creation.

If this happens for the stats applet, this leads to a crash because we
retrieve the request start-line without checking if it is available. For the
cache applet, the response is just immediately sent. And here it is a problem
if the compression is enabled. In that case too, this may lead to a crash
because the compression may be enabled but not initialized.

For a true server, there is no issue because the connection cannot be
established. The server is chosen only after the request analysis. The issue
with applets is that once created, an applet is quickly switched to the
established state. So it is probably a point that must be carefully reviewed
and probably reworked.

In the meantime, as a fix, in the cache and the stats applet, we just take
care to have the request before sending the response. This will do the
trick.

The patch must be backported as far as 2.6. On 2.6, the patch must be adapted.
2024-09-16 22:55:40 +02:00
Christopher Faulet
5fc12b0afd BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send
On a synchronous send from the stream to an applet, if some data were sent,
we must take care to wake the applet up. It is important because if
everything was sent at this stage, there is no other chance to wake the
applet up, mainly because SE_FL_WAIT_DATA flag is set on the applet's sedesc
in sc_update_tx() at the end of process_stream(). This flag prevents any
wakeup of the applet for a send event.

It is not necessary for a mux because the mux stream is called when a
synchronous send from the stream is performed. So it is responsible for
waking the mux connection if necessary.

This patch must be backported to 3.0.
2024-09-16 22:55:40 +02:00
Christopher Faulet
655124f5cc DOC: configuration: Remove dangerous directives from the proxy matrix
For now, that only concerns accept-invalid-http-{request/response} and
accept-unsafe-violations-in-http-{request/response}. But the idea is to make
dangerous directives hard to find. It is one more way to discourage anyone
from using them. And, optionally, it is also handy because it keeps the matrix
aligned on 80 columns.
2024-09-16 22:55:25 +02:00
Christopher Faulet
4de6632693 MINOR: proxy: Rename accept-invalid-http-* options
With these options, it is possible to accept some invalid messages that may
be considered unsafe and may result in vulnerabilities. The naming is not
explicit enough on this point. These options must really be considered as
dangerous and only used as a temporary workaround. Unfortunately, when used,
it is probably because there are some legacy and unsupported applications in
place. Never mind. The documentation warns about the use of these
options. Now the name of the options itself is a warning.

So now, "accept-invalid-http-request" and "accept-invalid-http-response"
options are deprecated and replaced by
"accept-unsafe-violations-in-http-request" and
"accept-unsafe-violations-in-http-response" options.
2024-09-16 22:55:25 +02:00
Christopher Faulet
0f4fad5291 DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options
From time to time, new exceptions are added to the HTTP parsing (most of the
time for H1) to avoid rejecting some invalid messages sent by legacy
applications. But the documentation of the accept-invalid-http-request and
accept-invalid-http-response options is not very clear. So, now, there is
an explicit list of relaxing rules for both options.
2024-09-16 22:55:24 +02:00
Aurelien DARRAGON
1e0920f855 BUG/MINOR: peers: local entries updates may not be advertised after resync
Since commit 864ac3117 ("OPTIM: stick-tables: check the stksess without
taking the read lock"), when entries for a local table are learned from
another peer upon resynchro, and this is the only peer haproxy speaks to,
local updates on such entries are not advertised to the peer anymore,
until they eventually expire and can be recreated upon local updates.

This is due to the fact that ts->seen is always set to 0 when creating
a new entry, and also when touch_remote is performed on the entry.

Indeed, while 864ac3117 attempts to avoid useless updates, it didn't
consider entries learned from a remote peer. Such entries are exclusively
learned in peer_treat_updatemsg(): once the entry is created (or updated)
with new data, touch_remote is used to commit the change. However, unlike
touch_local, entries committed using touch_remote will not be advertised
to the peer from which the entry was just learned (otherwise we would
enter a looping situation). Due to the above patch, once an entry is
learned from the (unique) remote peer, 'seen' will be stuck to 0 so it
will never be advertised for its whole lifetime.

Instead, when entries are learned from a peer, we should consider that
the peer that taught us the entry has seen it.

To do this, let's set seen=1 in peer_treat_updatemsg() after calling
touch_remote(). This way, if we happen to perform updates on this entry,
it will be properly advertised to relevant peers. This patch should not
affect the performance gain documented in 864ac3117 given that the test
scenario didn't involve entries learned from remote peers, but solely
locally created entries advertised to remote peers upon updates.

This should be backported in 3.0 with 864ac3117.
2024-09-16 14:06:39 +02:00
Willy Tarreau
5d350d1e50 OPTIM: vars: use multiple name heads in the vars struct
Given that the original list-based version was using a list head as the
root of the variables, while the tree is using a single pointer, it made
sense to reuse that space to place multiple roots, indexed on the lower
bits of the name hash. Two roots slightly increase the performance level,
but the best gain is obtained with 4 roots. The performance is now always
above that of the list, even with small counts, and with 100 vars, it's
21% higher than before, or 67% higher than with the list.

We keep the same lock (it could have made sense to use one lock per head),
because most of the variables in large configs are attached to a stream
or a session, hence are not shared between threads. Thus there's no point
in sharding the pointer.
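
As a rough illustration of the indexing described above, the sketch below
picks one of four roots from the low bits of a 64-bit name hash. The names
VAR_HEADS, struct vars_sketch and var_head_index() are hypothetical and are
not taken from the HAProxy source.

```c
#include <stdint.h>

/* Hypothetical sketch: 4 roots, selected by the low bits of the name hash.
 * The value must be a power of two for the mask below to work. */
#define VAR_HEADS 4

struct vars_sketch {
    void *name_root[VAR_HEADS]; /* one tree root per bucket */
};

/* Return the index of the root to use for a given 64-bit name hash. */
static inline unsigned int var_head_index(uint64_t name_hash)
{
    return (unsigned int)(name_hash & (VAR_HEADS - 1));
}
```

With a reasonably uniform hash, each tree ends up holding about a quarter of
the variables, which shortens every descent.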
2024-09-15 23:51:51 +02:00
Willy Tarreau
47ec7c681e OPTIM: vars: use a cebtree instead of a list for variable names
Configs involving many variables can start to eat a lot of CPU in name
lookups. The reason is that the names themselves are dynamic in that
they are relative to dynamic objects (sessions, streams, etc), so
there's no fixed index for example. The current implementation relies
on a standard linked list, and in order to speed up lookups and avoid
comparing strings, only a 64-bit hash of the variable's name is stored
and compared everywhere.

But with just 100 variables and 1000 accesses in a config, it's clearly
visible that variable name lookup can reach 56% CPU with a config
generated this way:

  for i in {0..100}; do
    printf "\thttp-request set-var(txn.var%04d) int(%d)" $i $i;
    for j in {1..10}; do [ $i -lt $j ] || printf ",add(txn.var%04d)" $((i-j)); done;
    echo;
  done

The performance on a 4-core Skylake at 4.4 GHz reaches 85k RPS with a perf
profile showing:

  Samples: 170K of event 'cycles', Event count (approx.): 142378815419
  Overhead  Shared Object            Symbol
    56.39%  haproxy                  [.] var_to_smp
     6.65%  haproxy                  [.] var_set.part.0
     5.76%  haproxy                  [.] sample_process_cnv
     3.23%  haproxy                  [.] sample_conv_var2smp
     2.88%  haproxy                  [.] sample_conv_arith_add
     2.33%  haproxy                  [.] __pool_alloc
     2.19%  haproxy                  [.] action_store
     2.13%  haproxy                  [.] vars_get_by_desc
     1.87%  haproxy                  [.] smp_dup

[above, var_to_smp() calls var_get() under the read lock].

By switching to a binary tree, the cost is significantly lower, the
performance reaches 117k RPS (+37%) with this profile:

  Samples: 170K of event 'cycles', Event count (approx.): 142323631229
  Overhead  Shared Object            Symbol
    40.22%  haproxy                  [.] cebu64_lookup
     7.12%  haproxy                  [.] sample_process_cnv
     6.15%  haproxy                  [.] var_to_smp
     4.75%  haproxy                  [.] cebu64_insert
     3.79%  haproxy                  [.] sample_conv_var2smp
     3.40%  haproxy                  [.] cebu64_delete
     3.10%  haproxy                  [.] sample_conv_arith_add
     2.36%  haproxy                  [.] action_store
     2.32%  haproxy                  [.] __pool_alloc
     2.08%  haproxy                  [.] vars_get_by_desc
     1.96%  haproxy                  [.] smp_dup
     1.75%  haproxy                  [.] var_set.part.0
     1.74%  haproxy                  [.] cebu64_first
     1.07%  [kernel]                 [k] aq_hw_read_reg
     1.03%  haproxy                  [.] pool_put_to_cache
     1.00%  haproxy                  [.] sample_process

The performance starts to drop a bit earlier than with the list, however.
Both maintain a plateau up to 25 vars, after which the tree starts degrading
a little bit while the list remains stable up to 28 vars. Then both cross at
42 vars and the list continues to degrade along a hyperbola while the tree
resists better. The biggest loss is at around 32 variables where the list
stays 10% higher.

Regardless, given the extremely narrow band where the list is better, it
looks relevant to switch to this in order to preserve the almost linear
performance of large setups. For example at 1000 variables and 10k
lookups, the tree is 18 times faster than the list.

In addition this reduces the size of the struct vars by 8 bytes since
there's a single pointer, though it could make sense to re-invest them
into a secondary head for example.
2024-09-15 23:49:01 +02:00
Willy Tarreau
a0205f9de4 IMPORT: import cebtree (compact elastic binary trees)
This is an import of the compact elastic binary trees at commit
a9cd84a ("OPTIM: descent: better prefetch less and for writes when
deleting")

These will be used to replace certain lists (and possibly certain
tree nodes as well). They're as fast (or even faster) than ebtrees
for lookups, as fast for insertion and slower for deletion, and a
node only uses 2 pointers (like a list).

The only changes were cebtree.h where common/tools.h was replaced
with ebtree.h which we already have and already provides the needed
functions and macros, and the addition of a wrapper cebtree-prv.h in
src/ to redirect to import/cebtree-prv.h.
2024-09-15 23:44:59 +02:00
Willy Tarreau
6e92988e20 MINOR: vars: remove the emptiness tests in callers before pruning
All callers of vars_prune_* currently check the list for emptiness.
Let's leave that to vars_prune() itself, it will ease some changes in
the code. Thanks to the previous inlining of the vars_prune() function,
there's no performance loss, and even a very tiny 0.1% gain.
2024-09-15 23:44:16 +02:00
Willy Tarreau
2c1a9c3a43 OPTIM: vars: inline vars_prune() to avoid many calls
Many configs don't have variables and call it for no reason, and even
configs with variables don't necessarily have some in all scopes.
2024-09-15 23:42:09 +02:00
Willy Tarreau
aad6b771dd OPTIM: vars: remove the unneeded lock in vars_prune_*
vars_prune() and vars_prune_all() take the variable lock while purging
all variables from a head. However this is not needed:
  - proc scope variables are only purged during deinit, hence no lock
    is needed ;
  - all other scopes are attached to entities bound to a single thread
    so no lock is needed either.

Removing the lock saves about 0.5% CPU on variables-intensive setups,
but above all simplifies the code, so let's do it.
2024-09-15 23:05:50 +02:00
Willy Tarreau
51ade2f1db OPTIM: sample: don't check casts for samples of same type
Originally when converters were created, they were mostly for casting
types. Nowadays we have many arithmetic converters to perform operations
on integers, and a number of converters operating on strings. Both of
these categories most often do not need any cast since the input and
output types are the same, which is visible as the cast function is
c_none. However, profiling shows that when heavily using arithmetic
converters, it's possible to spend up to ~7% of the time in
sample_process_cnv(), a good part of which is only in accessing the
sample_casts[] array. Simply avoiding this lookup when input and output
types are equal saves about 2% CPU on such setups doing intensive use
of converters.
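
The fast path described above can be sketched as follows. This is not the
HAProxy code: the types, the casts[] table and cast_if_needed() are
hypothetical stand-ins that only show the "same type, skip the table" test.

```c
#include <stddef.h>

/* Hypothetical sketch: a tiny cast table where a NULL entry means that
 * no cast function exists for this pair of types. */
enum stype { ST_SINT, ST_STR, ST_TYPES };

struct smp {
    enum stype type;
    long ival;
};

typedef int (*cast_fn)(struct smp *);

static int sint_to_str(struct smp *s) { s->type = ST_STR;  return 1; }
static int str_to_sint(struct smp *s) { s->type = ST_SINT; return 1; }

static const cast_fn casts[ST_TYPES][ST_TYPES] = {
    [ST_SINT] = { [ST_STR]  = sint_to_str },
    [ST_STR]  = { [ST_SINT] = str_to_sint },
};

/* Returns 1 on success, 0 if the cast is not possible. */
static int cast_if_needed(struct smp *s, enum stype wanted)
{
    if (s->type == wanted)
        return 1;                 /* fast path: the table is not touched */
    if (!casts[s->type][wanted])
        return 0;
    return casts[s->type][wanted](s);
}
```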
2024-09-15 12:43:56 +02:00
Willy Tarreau
b11495652e BUG/MEDIUM: queue: implement a flag to check for the dequeuing
As unveiled in GH issue #2711, commit 5541d4995d ("BUG/MEDIUM: queue:
deal with a rare TOCTOU in assign_server_and_queue()") does have some
side effects in that it can occasionally cause an endless loop.

As Christopher analysed it, the problem is that process_srv_queue(),
which uses a trylock in order to leave only one thread in charge of
the dequeueing process, can lose the lock race against pendconn_add().
If this happens on the last served request, then there's no more thread
to deal with the dequeuing, and assign_server_and_queue() will loop
forever on a condition that was initially expected to be extremely
rare (and still is, except that now it can become sticky). Previously
what was happening is that such queued requests would just time out
and since that was very rare, nobody would notice.

The root of the problem really is that trylock. It was added so that
only one thread dequeues at a time but it doesn't offer only that
guarantee since it also prevents a thread from dequeuing if another
one is in the process of queuing. We need a different criterion.

What we're doing now is to set a flag "dequeuing" in the server, which
indicates that one thread is currently in the process of dequeuing
requests. This one is atomically tested, and only if no thread is in
this process, then the thread grabs the queue's lock and dequeues.
This way it will be serialized with pendconn_add() and no request
addition will be missed.

It is not certain whether the original race covered by the fix above
can still happen with this change, so better keep that fix for now.

Thanks to @Yenya (Jan Kasprzak) for the precise and complete report
allowing to spot the problem.

This patch should be backported wherever the patch above was backported.
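
A minimal sketch of the "dequeuing" flag idea, assuming C11 atomics and a
compare-and-exchange as the atomic test. struct srv_sketch and try_dequeue()
are hypothetical names, and the queue lock itself is only hinted at in a
comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch: names are illustrative, not HAProxy's. */
struct srv_sketch {
    atomic_bool dequeuing;  /* set while one thread is dequeuing */
    int queued;             /* pending requests (under the queue lock in real code) */
};

/* Try to become the single dequeuing thread. Returns the number of
 * requests dequeued, or -1 if another thread is already in charge. */
static int try_dequeue(struct srv_sketch *s)
{
    bool expected = false;
    int done = 0;

    if (!atomic_compare_exchange_strong(&s->dequeuing, &expected, true))
        return -1;          /* someone else is dequeuing */

    /* Real code would take the queue lock here (a blocking lock, not a
     * trylock), so that it is serialized with request additions. */
    while (s->queued > 0) {
        s->queued--;
        done++;
    }

    atomic_store(&s->dequeuing, false);
    return done;
}
```

The point of the design is that losing a race against a thread that is adding
a request no longer makes the dequeuer give up: only another dequeuer does.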
2024-09-13 08:35:47 +02:00
Willy Tarreau
adaba6f904 BUG/MINOR: clock: validate that now_offset still applies to the current date
We want to make sure that now_offset is still valid for the current
date: another thread could very well have updated it by detecting a
backwards jump, and at the very same moment the time got fixed again,
that we retrieve and add to the new offset, which results in a larger
jump. Normally, for this to happen, it would mean that before_poll
was also affected by the jump and was detected before and bounded
within 2 seconds, resulting in max 2 seconds perturbations.

Here we try to detect this situation and fall back to re-adjusting the
offset instead.

It's more of a strengthening of what's done by commit e8b1ad4c2b
("BUG/MEDIUM: clock: also update the date offset on time jumps") than a
pure fix, in that the issue was not directly observed but it's visibly
possible by reading the code, so this should be backported along with
the patch above. This is related to issue GH #2704.

Note that this could be simplified in terms of operations by migrating
the deadlines to nanoseconds, but this was the path to least intrusive
changes.
2024-09-12 19:09:19 +02:00
Willy Tarreau
af48e4cc6b BUG/MINOR: clock: make time jump corrections a bit more accurate
Since commit e8b1ad4c2b ("BUG/MEDIUM: clock: also update the date offset
on time jumps") we try to update the now_offset based on the last known
valid date. But if it's off compared to the global_now_ns date shared
by other threads, we'll get the time off a little bit. When this happens,
we should consider the most recent of these dates so that if the global
date was already known to be more recent, we should use it and stick to
it. This will avoid setting too large an offset that could in turn provoke
a larger jump on another thread.

This is related to issue GH #2704.

This can be backported to other branches having the patch above.
2024-09-12 18:27:03 +02:00
Willy Tarreau
ad98edd00a BUG/MINOR: polling: fix time reporting when using busy polling
Since commit beb859abce ("MINOR: polling: add an option to support
busy polling") the time and status passed to clock_update_local_date()
were incorrect. Indeed, what is considered is the before_poll date
related to the configured timeout which does not correspond to what
is passed to the poller. That's not correct because before_poll+the
syscall's timeout will be crossed by the current date 100 ms after
the start of the poller. In practice it didn't happen when the poller
was limited to 1s timeout but at one minute it happens all the time.

That's particularly visible when running a multi-threaded setup with
busy polling and only half of the threads working (bind ... thread even).
In this case, the fixup code of clock_update_local_date() is executed
for each round of busy polling. The issue was made really visible
starting with recent commit e8b1ad4c2b ("BUG/MEDIUM: clock: also
update the date offset on time jumps") because upon a jump, the
shared offset is reset, while it should not be in this specific
case.

What needs to be done instead is to pass the configured timeout of
the poller (and not of the syscall), and always pass "interrupted"
set so as to claim we got an event (which is sort of true as it just
means the poller returned instantly). In this case we can still
detect backwards/forward jumps and will use a correct boundary
for the maximum date that covers the whole loop.

This can be backported to all versions since the issue was introduced
with busy-polling in 1.9-dev8.
2024-09-12 17:47:13 +02:00
Christopher Faulet
1900ca475f MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option
Since 2.6, a parsing error is reported when the chunked encoding is
found twice. As stated in RFC9112, a sender must not apply the chunked
transfer coding more than once to a message body. It means only one chunked
coding must be found. In addition, empty values are also rejected because it
is forbidden by RFC9110.

However, in both cases, it may be useful to relax the rules for trusted
legacy servers when accept-invalid-http-response option is set. Especially
because it was accepted on 2.4 and older. In addition, T-E header is now
sanitized before sending it. It is not a problem because it is a hop-by-hop
header.

Note that it remains invalid on client side because there is no good reason
to relax the parsing on this side. We can argue a server is trusted so we
can decide to support some legacy behavior. It is not true on client side
and it is highly suspicious if a client is sending an invalid T-E header.

Note also we continue to reject unsupported T-E values (so all codings except
"chunked"). Because the "TE" header is sanitized and cannot contain any value
other than "Trailers", there is absolutely no reason for a server to use something
else.

This patch should fix the issue #2677. It could probably be backported as
far as 2.6 if necessary.
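
The rules above can be sketched as a small validator. This is only an
illustration under stated assumptions, not the H1 parser: te_value_ok() and
its <relaxed> argument are hypothetical, <relaxed> standing for the response
side with accept-invalid-http-response set. strncasecmp() comes from POSIX
<strings.h>.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical sketch. Returns 1 if the Transfer-Encoding value <v> is not
 * rejected, 0 otherwise. When <relaxed> is non-zero, empty elements and a
 * repeated "chunked" are tolerated. Any other coding is always rejected. */
static int te_value_ok(const char *v, int relaxed)
{
    int chunked = 0;

    for (;;) {
        const char *end;
        size_t len;

        while (*v == ' ' || *v == '\t')       /* skip leading whitespace */
            v++;
        end = v + strcspn(v, ",");
        len = (size_t)(end - v);
        while (len && (v[len - 1] == ' ' || v[len - 1] == '\t'))
            len--;                            /* trim trailing whitespace */

        if (len == 0) {
            if (!relaxed)
                return 0;                     /* empty value */
        } else if (len == 7 && strncasecmp(v, "chunked", 7) == 0) {
            if (++chunked > 1 && !relaxed)
                return 0;                     /* "chunked" found twice */
        } else {
            return 0;                         /* unsupported coding */
        }

        if (*end == '\0')
            break;
        v = end + 1;
    }
    return 1;
}
```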
2024-09-12 09:21:57 +02:00
Willy Tarreau
2b95c77c08 DOC: server: document what to check for when adding new server keywords
It's too easy to overlook the dynamic servers when adding new server
keywords, and the fields on each keyword line are totally obscure. This
commit adds a title to each column of the table and explains what is
expected and what to check for when adding a keyword.
2024-09-10 18:50:12 +02:00
Damien Claisse
ce6a621ae3 MINOR: server: allow init-state for dynamic servers
Commit 50322df introduced the init-state keyword, but it didn't enable
it for dynamic servers. However, this feature is perfectly desirable
for virtual servers too, where someone would like a server enabled
through "set server be1/srv1 state ready" to be put out of maintenance
in down state until the next health check succeeds.
From reading the code, it seems that it's only a matter of allowing this
keyword for dynamic servers, as current code path calls
srv_adm_set_ready() which incidentally triggers a call to
_srv_update_status_adm().
2024-09-10 18:18:38 +02:00
Willy Tarreau
33deb4babe REGTESTS: shorten a bit the delay for the h1/h2 upgrade test
Commit d6c4ed9a96 ("REGTESTS: h1/h2: Update script testing H1/H2
protocol upgrades") introduced a 0.5 second delay which is higher
than those of most other tests (usually 0.05 or 0.2) and triggers
timeouts on my side. Let's just shorten it to 0.2 since its goal
is only to send data separately.

Note: maybe a barrier approach would be possible, though not
      studied.
2024-09-10 10:36:59 +02:00
Willy Tarreau
9f8d9c9e8b BUG/MINOR: pattern: do not leave a leading comma on "set" error messages
Commit 4f2493f355 ("BUG/MINOR: pattern: pat_ref_set: fix UAF reported by
coverity") dropped the condition to concatenate error messages and as
such introduced a leading comma in front of all of them. Then commit
911f4d93d4 ("BUG/MINOR: pattern: pat_ref_set: return 0 if err was found")
changed the behavior to stop at the first error anyway, so all the
mechanics dedicated to the concatenation of error messages is no longer
needed and we can simply return the error as-is, without inserting any
comma.

This should be backported where the patches above are backported.
2024-09-10 08:55:29 +02:00
Willy Tarreau
036ab62231 REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load
This test has an expect rule for syslog that looks for [cC]D, to
indicate a client abort or timeout during the data phase. The purpose
was to say that when it fails it must be this, but the very low timeout
(1ms) still makes it prone to succeeding if the machine is highly loaded.

This has become more visible since commit e8b1ad4c2b ("BUG/MEDIUM: clock:
also update the date offset on time jumps") because the clock drift
adjustments are more systematic. Since this commit, running 50 such tests
at twice more than the number of CPUs in parallel is sufficient to yield
errors due to some lines appearing as succeeding:

   make reg-tests -- --j $((($(nproc)+1)*2)) --vtestparams -n50 reg-tests/log/wrong_ip_port_logging.vtc

Pauses of up to 300ms were observed in epoll_wait() in
such circumstances, which were properly fixed by the time drift detection.
Another approach would consist in increasing the permitted margin during
which we don't fix the clock drift but that would not be logical since the
base time had really been awaited for.

This should be backported to all stable releases since the commit above
will trigger the issue more often.
2024-09-09 19:38:28 +02:00
Christopher Faulet
a99d58819f BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established
This reverts commit 225a4d02e1f6a12c0b4f3584949fad3339d71708.

When a 200-OK response is replied to a CONNECT request or a
101-Switching-protocol, a tunnel is considered as established between the
client and the server. However, we must not declare the response as
bodyless. Of course, there is no payload, but tunneled data are expected.

Because of this bug, the zero-copy forwarding is disabled on the server
side.

This patch must be backported as far as 2.9.
2024-09-09 19:01:47 +02:00
Christopher Faulet
f6e193f1b0 BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state
When the mux is woken up on I/O events, if the zero-copy forwarding is
enabled, receives are blocked. In this case, the SC is woken up to be able
to perform 0-copy forwarding to the other side. This works well, except for
the H1C in CLOSING state.

Indeed, in that case, in h1_process(), the SC is not woken up because only
RUNNING H1 connections are considered. As a consequence, the mux will ignore
connection closure. The H1 connection remains blocked, waiting for the
shutdown timeout. If no timeout is configured, the H1 connection is never
closed leading to a leak.

This patch should fix leak reported by Damien Claisse in the issue #2697. It
should be backported as far as 2.8.
2024-09-09 19:01:47 +02:00
William Lallemand
021ac6a108 MEDIUM: ssl/cli: "dump ssl cert" allow to dump a certificate in PEM format
The new "dump ssl cert" CLI command allows to dump a certificate stored
into HAProxy memory. Until now it was only possible to dump the
description of the certificate using "show ssl cert", but with this new
command you can dump the PEM content on the filesystem.

This command is only available on an admin stats socket.

$ echo "@1 dump ssl cert cert.pem" | socat /tmp/master.sock -
-----BEGIN PRIVATE KEY-----
[...]
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
2024-09-09 16:54:48 +02:00
Aurelien DARRAGON
68cfb222b5 BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
Since c5959fd ("MEDIUM: pattern: merge same pattern"), UAF (leading to
crash) can be experienced if the same pattern file (and match method) is
used in two default sections and the first one is not referenced later in
the config. In this case, the first default section will be cleaned up.
However, due to an unhandled case in the above optimization, the original
expr which the second default section relies on is mistakenly freed.

This issue was discovered while trying to reproduce GH #2708. The issue
was particularly tricky to reproduce given the config and sequence
required to make the UAF happen. Fortunately, GitHub user @asmnek not only
provided useful information, but since he was able to consistently
trigger the crash in his environment he was able to nail down the crash to
the use of pattern file involved with 2 named default sections. Big thanks
to him.

To fix the issue, let's push the logic from c5959fd a bit further. Instead
of relying on "do_free" variable to know if the expression should be freed
or not (which proved to be insufficient in our case), let's switch to a
simple refcounting logic. This way, no matter who owns the expression, the
last one attempting to free it will be responsible for freeing it.
Refcount is implemented using a 32bit value which fills a previous 4 bytes
structure gap:

        int                        mflags;               /*    80     4 */

        /* XXX 4 bytes hole, try to pack */

        long unsigned int          lock;                 /*    88     8 */
(output from pahole)

Even though it was not reproduced in 2.6 or below by @asmnek (the bug was
revealed thanks to another bugfix), this issue theoretically affects all
stable versions (up to c5959fd), thus it should be backported to all
stable versions.
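
The refcounting logic described above can be sketched as follows. The
structure and helpers are hypothetical (they are not the pattern.c ones), and
the counter is deliberately non-atomic, which assumes that all owners run on
the same thread.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of a shared expression with a 32-bit refcount. */
struct expr_sketch {
    uint32_t refcount;   /* number of current owners */
    /* ... pattern data would live here ... */
};

/* Allocate an expression owned by its creator. */
static struct expr_sketch *expr_new(void)
{
    struct expr_sketch *e = calloc(1, sizeof(*e));

    if (e)
        e->refcount = 1;
    return e;
}

/* Register one more owner of an existing expression. */
static struct expr_sketch *expr_take(struct expr_sketch *e)
{
    e->refcount++;
    return e;
}

/* Drop one owner. Returns 1 if the expression was really freed,
 * 0 if someone else still owns it. */
static int expr_release(struct expr_sketch *e)
{
    if (--e->refcount > 0)
        return 0;
    free(e);
    return 1;
}
```

Whoever releases last frees the expression, so the order in which the two
default sections are cleaned up no longer matters.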
2024-09-09 16:07:05 +02:00
Aurelien DARRAGON
8157c1caf2 BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg}
Using valgrind when running map_beg or map_str, the following error is
reported:

==242644== Conditional jump or move depends on uninitialised value(s)
==242644==    at 0x2E4AB1: pat_match_str (pattern.c:457)
==242644==    by 0x2E81ED: pattern_exec_match (pattern.c:2560)
==242644==    by 0x343176: sample_conv_map (map.c:211)
==242644==    by 0x27522F: sample_process_cnv (sample.c:1330)
==242644==    by 0x2752DB: sample_process (sample.c:1373)
==242644==    by 0x319917: action_store (vars.c:814)
==242644==    by 0x24D451: http_req_get_intercept_rule (http_ana.c:2697)

In fact, the error is legit, because in pat_match_{beg,str}, we
dereference the buffer on len+1 to check if a value was previously set,
and then decide to force NULL-byte if it wasn't set.

But the approach is no longer compatible with current architecture:
data past str.data is not guaranteed to be initialized in the buffer.
Thus we cannot dereference the value, otherwise we expose ourselves to
uninitialized read errors. Moreover, the check is useless, because we
systematically set the ending byte to 0 when the conditions are met.

Finally, restoring the older value after the lookup is not relevant:
indeed, either the sample is marked as const and in such case it
is already duplicated, or the sample is not const and we forcefully add
a terminating NULL byte outside from the actual string bytes (since we're
past str.data), so as we didn't alter effective string data and that data
past str.data cannot be dereferenced anyway as it isn't guaranteed to be
initialized, there's no point in restoring previous uninitialized data.

It could be backported in all stable versions. But since this was only
detected by valgrind and isn't known to cause issues in existing
deployments, it's probably better to wait a bit before backporting it
to avoid any breakage, although the fix should be theoretically harmless.
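
To make the reasoning above concrete, here is a minimal sketch of the
"write the terminator, never read the old byte" approach. struct buf_sketch
and terminate_for_lookup() are hypothetical, and the handling of const
samples (which are duplicated first) is left out.

```c
#include <stddef.h>

/* Hypothetical sketch of a buffer: only the first <data> bytes of <area>
 * are guaranteed to be initialized. */
struct buf_sketch {
    char *area;    /* storage */
    size_t data;   /* number of valid bytes */
    size_t size;   /* allocated size */
};

/* Make <b> usable as a C string for a lookup. Returns 1 on success,
 * 0 if there is no room for the terminating byte. */
static int terminate_for_lookup(struct buf_sketch *b)
{
    if (b->data >= b->size)
        return 0;
    b->area[b->data] = '\0';  /* write only: the previous byte is never read */
    return 1;
}
```

Since the byte past <data> is not part of the string, there is nothing to
save before the lookup and nothing to restore after it.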
2024-09-09 15:57:30 +02:00
Aurelien DARRAGON
3449525a02 BUG/MINOR: pattern: prevent const sample from being tampered in pat_match_beg()
This is a complementary patch to a68affeaa ("BUG/MINOR: pattern: a sample
marked as const could be written"). Indeed the same logic from
pat_match_str() is used there, but we lack the check to ensure that the
sample is not const before writing data to it.

It could be backported to all stable versions.
2024-09-09 15:57:23 +02:00
Willy Tarreau
ef8d8215de BUG/MEDIUM: clock: detect and cover jumps during execution
After commit e8b1ad4c2 ("BUG/MEDIUM: clock: also update the date offset
on time jumps"), @firexinghe mentioned that the issue was still present
in their case. In fact it depends on the load, which affects the
probability that the time changes between two poll() calls vs that it
changes during poll(). The time correction code used to only deal with
the latter. But under load if it changes between two poll() calls, what
happens then is that before_poll is off, and after returning from poll(),
the date is within bounds defined by before_poll, so no correction is
applied.

After many tests, it turns out that the most reliable solution without
using CLOCK_MONOTONIC is to prevent before_poll from being earlier than
the previous after_poll (trivial), and to cover forward jumps, we need
to enforce a margin. Given that the watchdog kills a looping task within
2 seconds and that no sane setup triggers it, it seems that 2 seconds
remains a safe enough margin. This means that in the worst case, some
forward jumps of up to 2 seconds will not be corrected, leading to an
apparent fast time and low rates. But this is supposed to be an exceptional
event anyway (typically an admin or crontab running ntpdate).

For future versions, given that we now opportunistically call
now_mono_time() before and after poll(), that returns zero if not
supported, we could imagine relying on this one for the thread's local
time when it's non-null.
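
One possible reading of the two rules above, as a sketch only: the helper
names, the use of nanoseconds and the exact comparison are assumptions, and
the real clock code differs.

```c
#include <stdint.h>

/* 2 seconds of margin, as discussed above (expressed in nanoseconds here,
 * which is an assumption of this sketch). */
#define MAX_DELAY_NS 2000000000ULL

/* Never let before_poll be earlier than the previous after_poll. */
static uint64_t fix_before_poll(uint64_t before_poll, uint64_t prev_after_poll)
{
    return before_poll < prev_after_poll ? prev_after_poll : before_poll;
}

/* Returns non-zero if <now> looks like a time jump: either it went
 * backwards, or it is ahead of before_poll by more than the poller's
 * timeout plus the margin. */
static int is_time_jump(uint64_t now, uint64_t before_poll, uint64_t timeout_ns)
{
    if (now < before_poll)
        return 1;
    return now - before_poll > timeout_ns + MAX_DELAY_NS;
}
```

With such a test, a forward jump smaller than the margin goes unnoticed,
which matches the worst case mentioned above.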
2024-09-08 19:15:38 +02:00
Christopher Faulet
d6c4ed9a96 REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades
"http-messaging/protocol_upgrade.vtc" script was updated to test upgrades
for requests with a payload. It should fail when the request is sent to a H2
server. When sent to a H1 server, it should succeed, except if the server
replies before the end of the request.
2024-09-06 14:18:02 +02:00
Christopher Faulet
001fb1a548 BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only
Since 1d2d77b27 ("MEDIUM: mux-h1: Return a 501-not-implemented for upgrade
requests with a body"), it is no longer possible to perform a protocol
upgrade for requests with a payload. The main reason was to be able to
support protocol upgrade for H1 client requesting a H2 server. In that case,
the upgrade request is converted to a CONNECT request. So, it is not
possible to convey a payload in that case.

But, it is a problem for anyone wanting to perform upgrades on H1 server
using requests with a payload. It is uncommon but valid. So, now, it is the
H2 multiplexer responsibility to reject upgrade requests, on server side, if
there is a payload. An INTERNAL_ERROR is returned for the H2S in that
case. On H1 side, the upgrade is now allowed, but only if the server waits
for the end of the request to return the 101-Switching-protocol
response. Indeed, it is quite hard to synchronise the frontend side and the
backend side in that case. Asking servers to fully consume the request
payload before returning the response seems reasonable.

This patch should fix the issue #2684. It could be backported after a period
of observation, as far as 2.4 if possible. But only if it is not too
hard. It depends on "MINOR: mux-h1: Set EOI on SE during demux when both
side are in DONE state".
2024-09-06 09:16:18 +02:00
Christopher Faulet
ad1ef94612 MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state
For now, this case is already handled for all requests except for those
waiting for a tunnel establishment (CONNECT and protocol upgrades). It is
not an issue because only bodyless requests are supported in these cases. So
the request is always finished at the end of headers and therefore before
the response.

However, to relax conditions for full H1 protocol upgrades (H1 client and
server), this case will be necessary. Indeed, the idea is to be able to
perform protocol upgrades for requests with a payload. Today, the "Upgrade:"
header is removed before sending the request to the server. But to support
this case, this patch is required to properly finish transaction when the
server does not perform the upgrade.
2024-09-06 09:00:13 +02:00
Willy Tarreau
c22fc591d4 DOC: configuration: place the HAPROXY_HTTP_LOG_FMT example on the correct line
When HAPROXY_HTTP_LOG_FMT was added by commit 537b9e7f36 ("MINOR: config:
add environment variables for default log format"), the example was placed
by accident after the clf log format instead of the HTTP log format,
causing a bit of confusion.

This can be backported to 2.8.
2024-09-06 07:41:16 +02:00
Willy Tarreau
a2aea9f573 [RELEASE] Released version 3.1-dev7
Released version 3.1-dev7 with the following main changes :
    - MINOR: config: Created env variables for http and tcp clf formats
    - MINOR: mux-quic: add buf_in_flight to QCC debug infos
    - MINOR: mux-quic: correct qcc_bufwnd_full() documentation
    - MINOR: tools: add helpers to backup/clean/restore env
    - MINOR: mworker: restore initial env before wait mode
    - BUG/MINOR: haproxy: free init_env in deinit only if allocated
    - BUILD: tools: environ is not defined in OS X and BSD
    - DEV: coccinelle: add a test to detect unchecked malloc()
    - DEV: coccinelle: add a test to detect unchecked calloc()
    - CI: QUIC Interop AWS-LC: enable ngtcp2 client
    - CI: fix missing comma introduced in 956839c0f68a7722acc586ecd91ffefad2ccb303
    - CI: QUIC Interop: do not run bandwidth measurement tests
    - CI: QUIC Interop: use different artifact names for uploading logs
    - BUILD: quic: 32bits build broken by wrong integer conversions for printf()
    - CLEANUP: ssl: cleanup the clienthello capture
    - MEDIUM: ssl: capture the supported_versions extension from Client Hello
    - MEDIUM: ssl/sample: add ssl_fc_supported_versions_bin sample fetch
    - MEDIUM: ssl: capture the signature_algorithms extension from Client Hello
    - MEDIUM: ssl/sample: add ssl_fc_sigalgs_bin sample fetch
    - MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status
    - BUG/MEDIUM: mux-h2: Set ES flag when necessary on 0-copy data forwarding
    - BUG/MEDIUM: stream: Prevent mux upgrades if client connection is no longer ready
    - BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry
    - CLEANUP: haproxy: fix typos in code comment
    - CLEANUP: mqtt: fix typo in MQTT_REMAINING_LENGHT_MAX_SIZE
    - MINOR: tools: Implement ipaddrcpy().
    - MINOR: quic: Implement quic_tls_derive_token_secret().
    - MINOR: quic: Token for future connections implementation.
    - BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
    - MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
    - MINOR: quic: Implement qc_ssl_eary_data_accepted().
    - MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
    - BUG/MEDIUM: quic: always validate sender address on 0-RTT
    - BUILD: quic: fix build errors on FreeBSD since recent GSO changes
    - MINOR: tools: extend str2sa_range to add an alt parameter
    - MINOR: server: add a alt_proto field for server
    - MEDIUM: sock: use protocol when creating socket
    - MEDIUM: protocol: add MPTCP per address support
    - BUG/MINOR: quic: Crash from trace dumping SSL eary data status (AWS-LC)
    - MEDIUM: stick-table: Add support of a factor for IN/OUT bytes rates
    - MEDIUM: bwlim: Use a read-lock on the sticky session to apply a shared limit
    - BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown
    - BUG/MEDIUM: cli: Always release back endpoint between two commands on the mcli
    - BUG/MINOR: quic: unexploited retransmission cases for Initial pktns.
    - BUG/MEDIUM: mux-h1: Properly handle empty message when an error is triggered
    - MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places
    - BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
    - BUG/MINOR: mux-spop: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
    - BUG/MINOR: Crash on O-RTT RX packet after dropping Initial pktns
    - BUG/MEDIUM: mux-pt: Fix condition to perform a shutdown for writes in mux_pt_shut()
    - CLEANUP: assorted typo fixes in the code and comments
    - DEV: patchbot: count the number of backported/non-backported patches
    - DEV: patchbot: add direct links to show only specific categories
    - DEV: patchbot: detect commit IDs starting with 7 chars
    - BUG/MEDIUM: clock: also update the date offset on time jumps
    - MEDIUM: server: add init-state
2024-09-05 18:53:54 +02:00
Aaron Kuehler
50322dff81 MEDIUM: server: add init-state
Allow the user to set the "initial state" of a server.

Context:

Servers are always set in an UP status by default. In
some cases, further checks are required to determine if the server is
ready to receive client traffic.

This introduces the "init-state {fully-up|up|down|fully-down}" configuration
parameter to the server.

- when set to 'fully-up', the server is considered immediately available
  and can turn to the DOWN state when ALL health checks fail.
- when set to 'up' (the default), the server is considered immediately
  available and will initiate a health check that can turn it to the DOWN
  state immediately if it fails.
- when set to 'down', the server initially is considered unavailable and
  will initiate a health check that can turn it to the UP state immediately
  if it succeeds.
- when set to 'fully-down', the server is initially considered unavailable
  and can turn to the UP state when ALL health checks succeed.

The server's init-state is considered when the HAProxy instance
is (re)started, a new server is detected (for example via service
discovery / DNS resolution), a server exits maintenance, etc.
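
The four values above can be pictured as a small state model. The following is only a sketch: the type and function names are hypothetical (they are not HAProxy's internal identifiers), and it assumes that "ALL health checks" maps to the usual "rise"/"fall" consecutive-check thresholds while "immediately" means a single check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the four init-state values; not HAProxy code. */
enum init_state { INIT_FULLY_UP, INIT_UP, INIT_DOWN, INIT_FULLY_DOWN };

struct srv_model {
	bool up;     /* current status */
	int streak;  /* consecutive check results contradicting the status */
	int needed;  /* streak length required to flip the status */
	int rise;    /* assumed: successes needed to go UP in steady state */
	int fall;    /* assumed: failures needed to go DOWN in steady state */
};

/* Set the initial status and the threshold for the first transition. */
void srv_init(struct srv_model *s, enum init_state st, int rise, int fall)
{
	assert(s && rise > 0 && fall > 0);
	s->up = (st == INIT_FULLY_UP || st == INIT_UP);
	s->streak = 0;
	s->rise = rise;
	s->fall = fall;
	s->needed = 1;                               /* "up" and "down": one check */
	if (st == INIT_FULLY_UP)
		s->needed = fall;                    /* all checks must fail */
	else if (st == INIT_FULLY_DOWN)
		s->needed = rise;                    /* all checks must succeed */
}

/* Feed one health check result and return the resulting status. */
bool srv_check(struct srv_model *s, bool success)
{
	if (success == s->up) {
		s->streak = 0;                       /* result confirms the status */
		return s->up;
	}
	if (++s->streak >= s->needed) {
		s->up = !s->up;
		s->streak = 0;
		s->needed = s->up ? s->fall : s->rise; /* back to regular thresholds */
	}
	return s->up;
}
```

With rise=2 and fall=3 in this model, a server in "down" turns UP on its first successful check, while one in "fully-down" needs two.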

Link: https://github.com/haproxy/haproxy/issues/51
2024-09-05 11:13:10 +02:00
Willy Tarreau
e8b1ad4c2b BUG/MEDIUM: clock: also update the date offset on time jumps
In GH issue #2704, @swimlessbird and @xanoxes reported problems handling
time jumps. Indeed, since 2.7 with commit 4eaf85f5d9 ("MINOR: clock: do
not update the global date too often") we refrain from updating the global
offset in case it didn't change. But there's a catch: in case of a large
time jump, if the poller was interrupted, the local time remains the same
and we return immediately from there without updating the offset. It then
becomes incorrect regarding the "date" value, and upon subsequent call to
the poller, there's no way to detect a jump anymore so we apply the old,
incorrect offset and the date becomes wrong. Worse, going back to the
original time (then in the past), global_now_ns remains higher than the
local time and neither gets updated anymore.

What is missing in practice is to immediately update the offset when
detecting a time jump. In an ideal world, the offset would be updated
upon every call, that's what was being done prior to commit above but
it's extremely CPU intensive on large systems. However we can perfectly
afford to update the offset every time we detect a time jump, as it's
not as common.

This needs to be backported as far as 2.8. Thanks to both participants
above for providing very helpful details.
2024-09-04 16:55:43 +02:00
Willy Tarreau
531bf44a65 DEV: patchbot: detect commit IDs starting with 7 chars
Some commit messages contain commit IDs as short as 7 chars, let's detect
them.
2024-09-04 09:41:40 +02:00
Willy Tarreau
f6910a4578 DEV: patchbot: add direct links to show only specific categories
The per-category counters are now clickable so that it becomes possible
to list the relevant ones.
2024-09-04 09:38:43 +02:00
Willy Tarreau
eaf4adb5e2 DEV: patchbot: count the number of backported/non-backported patches
It's useful to instantly see how many patches of each category have
already been backported and are still pending, let's count them and
report them at the top of the page.
2024-09-04 09:11:04 +02:00
Ilya Shipitsin
1f6e5f7a61 CLEANUP: assorted typo fixes in the code and comments
This is the 43rd iteration of typo fixes
2024-09-03 17:49:21 +02:00
Christopher Faulet
e1cae42879 BUG/MEDIUM: mux-pt: Fix condition to perform a shutdown for writes in mux_pt_shut()
A regression was introduced in the commit 76fa71f7a ("BUG/MEDIUM: mux-pt:
Never fully close the connection on shutdown") because of a typo on the
connection flags. CO_FL_SOCK_WR_SH flag must be tested to prevent a call to
conn_sock_shutw() and not CO_FL_SOCK_RD_SH.
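
The nature of the typo can be sketched as follows. The flag names and values below are illustrative stand-ins, not the real CO_FL_* definitions, and the helpers are hypothetical: the point is only that the decision to shut down for writes must look at the "write already shut" flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the connection flags; values are arbitrary. */
#define SK_FL_RD_SH 0x1u  /* shutdown for reads already performed */
#define SK_FL_WR_SH 0x2u  /* shutdown for writes already performed */

/* A write shutdown is still needed only if it was not done yet. Testing
 * SK_FL_RD_SH here instead would be the kind of typo described above. */
bool need_shutw(unsigned int flags)
{
	return !(flags & SK_FL_WR_SH);
}

/* Record the write shutdown; it must not be performed twice. */
unsigned int mark_shutw(unsigned int flags)
{
	assert(need_shutw(flags));
	return flags | SK_FL_WR_SH;
}
```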

Concretely, most of the time, it is harmless because the shutdown for writes is
always performed before any shutdown for reads, except in the case described by
the commit above. But it is not clear whether it has an impact or not.

This patch must be backported with the commit above, so as far as 2.9.
2024-09-03 15:25:05 +02:00
Frederic Lecaille
7e19432fd4 BUG/MINOR: Crash on O-RTT RX packet after dropping Initial pktns
This bug arrived with this naive commit:

    BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only)

which omitted to consider the case where the Initial packet number space
could be discarded before receiving 0-RTT packets.

To fix this, append/insert the 0-RTT (early-data) packet number space
into the encryption level list depending on the presence or not of
the Initial packet number space.

This issue was revealed when using aws-lc as TLS stack in GH #2701 issue.
Thank you to @Tristan971 for having reported this issue.

Must be backported where the commit mentioned above is supposed to be
backported: as far as 2.9.
2024-09-03 15:23:06 +02:00
Willy Tarreau
f8bff3b531 BUG/MINOR: mux-spop: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
That's the equivalent of the mux-h2 one, except that here there's no
real risk to loop since normally we cannot feed data that bypass the
closed state check (e.g. no zero-copy forward). But it still remains
dirty to be able to leave an empty mbuf with MFULL and MROOM set, so
better clear them as well.

No backport is needed since this is only in 3.1.
2024-09-03 14:39:04 +02:00
Willy Tarreau
830e50561c BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
There exists an extremely tricky code path that was revealed in 3.0 by
the glitches feature, though it might theoretically have existed before.

TL;DR: a mux mbuf may be full after successfully sending GOAWAY, and
discard its remaining contents without clearing H2_CF_MUX_MFULL and
H2_CF_DEM_MROOM, then endlessly loop in h2_send(), until the watchdog
takes care of it.

What can happen is the following: Some data are received, h2_io_cb() is
called. h2_recv() is called to receive the incoming data. Then
h2_process() is called and in turn calls h2_process_demux() to process
input data. At some point, a glitch limit is reached and h2c_error() is
called to close the connection. The input frame was incomplete, so some
data are left in the demux buffer. Then h2_send() is called, which in
turn calls h2_process_mux(), which manages to queue the GOAWAY frame,
turning the state to H2_CS_ERROR2. The frame is sent, and h2_process()
calls h2_send() a last time (doing nothing) and leaves. The streams
are all woken up to notify about the error.

Multiple backend streams were waiting to be scheduled and are woken up
in turn, before their parents being notified, and communicate with the
h2 mux in zero-copy-forward mode, request a buffer via h2_nego_ff(),
fill it, and commit it with h2_done_ff(). At some point the mux's output
buffer is full, and gets flags H2_CF_MUX_MFULL.

The io_cb is called again to process more incoming data. h2_send() isn't
called (polled) or does nothing (e.g. TCP socket buffers full). h2_recv()
may or may not do anything (doesn't matter). h2_process() is called since
some data remain in the demux buf. It goes till the end, where it finds
st0 == H2_CS_ERROR2 and clears the mbuf. We're now in a situation where
the mbuf is empty and MFULL is still present.

Then it calls h2_send(), which doesn't call h2_process_mux() due to
MFULL, doesn't enter the for() loop since all buffers are empty, then
keeps sent=0, which doesn't allow to clear the MFULL flag, and since
"done" was not reset, it loops forever there.

Note that the glitches make the issue more reproducible but theoretically
it could happen with any other GOAWAY (e.g. PROTOCOL_ERROR). What makes
it not happen with the data produced on the parsing side is that we
process a single buffer of input at once, and there's no way to amplify
this to 30 buffers of responses (RST_STREAM, GOAWAY, SETTINGS ACK,
WINDOW_UPDATE, PING ACK etc are all quite small), and since the mbuf is
cleared upon every exit from h2_process() once the error was sent, it is
not possible to accumulate response data across multiple calls. And the
regular h2_snd_buf() path checks for st0 >= H2_CS_ERROR so it will not
produce any data there either.

h2_nego_ff() should probably check for H2_CS_ERROR before accepting
to deliver a buffer, but this needs to be carefully studied. In the
meantime, the real problem is that the MFULL flag was kept when clearing the
buffer, making the two inconsistent.
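
The inconsistency can be illustrated with a reduced model. This is a sketch only: the structure, the flag values and the function names are hypothetical and much simpler than the real h2c connection, but they show the invariant the fix restores (an empty output buffer never carries the "full" indications).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; not the real H2_CF_* definitions. */
#define CF_MUX_MFULL 0x1u  /* output buffer reported full */
#define CF_DEM_MROOM 0x2u  /* demux blocked waiting for room in the output */

struct mux_model {
	unsigned int flags;
	size_t mbuf_len;   /* bytes pending in the output buffer */
};

/* Discard pending output and drop the "buffer full" indications with it,
 * so the flags can never describe a buffer that is actually empty. */
void mux_clear_mbuf(struct mux_model *m)
{
	assert(m);
	m->mbuf_len = 0;
	m->flags &= ~(CF_MUX_MFULL | CF_DEM_MROOM);
}

/* Invariant: the "full" flags may only be set while data is pending. */
int mux_is_consistent(const struct mux_model *m)
{
	return m->mbuf_len > 0 || !(m->flags & (CF_MUX_MFULL | CF_DEM_MROOM));
}
```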

Since it doesn't seem possible to trigger this sequence without the
zero-copy-forward mechanism, this fix needs to be backported as far as
2.9, along with previous commit "MINOR: mux-h2: try to clear DEM_MROOM
and MUX_MFULL at more places" which will strengthen the consistency
between these checks.

Many thanks to Annika Wickert for her detailed report that allowed to
diagnose this problem. CVE-2024-45506 was assigned to this problem.
2024-09-03 14:39:04 +02:00
Willy Tarreau
e9cdedb39b MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places
The code leading to H2_CF_MUX_MFULL and H2_CF_DEM_MROOM being cleared
is quite complex and assumptions about its state are extremely difficult
when reading the code. There are indeed long sequences where the mux might
possibly be empty, still having the flag set until it reaches h2_send()
which will clear it after the last send. Even then it's not obvious whether
it's always guaranteed to release the flag when invoked in multiple passes.
Let's just simplify the condition so that h2_send() does not depend on
"sent" anymore and that h2_timeout_task() doesn't leave the flags set on
the buffer on emptiness. While it doesn't seem to fix anything, it will
make the code more robust against future changes.
2024-09-03 14:39:04 +02:00
Christopher Faulet
0d4271cdae BUG/MEDIUM: mux-h1: Properly handle empty message when an error is triggered
When a 400/408/500/501 error is returned by the H1 multiplexer, we first try
to get the error message of the proxy before using the default one. This may
be configured to be mapped on /dev/null or on an empty file. In that case,
no message is emitted, as expected. But everything is handled as if the error
was successfully sent.

However, there is a bug here. In h1_send_error() function, this case is not
properly handled. The flag H1C_F_ABRTED is not set on the H1 connection as it
should be and h1_close() function is not called, leaving the H1 connection in an
undefined state.

It is especially an issue when an "empty" 408-Request-Time-out error is emitted
while there are data blocked in the output buffer. In that case, the connection
remains open until the client closes and a "cR--"/408 is logged repeatedly, every
time the client timeout is reached.

This patch must be backported as far as 2.8.
2024-09-03 14:28:42 +02:00
Frederic Lecaille
15a737eb5f BUG/MINOR: quic: unexploited retransmission cases for Initial pktns.
qc_prep_hdshk_fast_retrans()'s job is to pick some packets to be retransmitted
from the Initial and Handshake packet number spaces. A packet may be coalesced
to a first one in the same datagram. When a coalesced packet is inspected for
retransmission, it is skipped if its length would make the total length of the
datagram it is attached to exceed the anti-amplification limit. But in this
case, the first packet must be kept for the current retransmission. This is
tracked by this trace statement:
    TRACE_PROTO("will probe Initial packet number space", QUIC_EV_CONN_SPPKTS, qc);
This was not the case because of a wrong "goto end" statement. The latter
must be run only if the Initial packet number space must not be probed with
the first packet found as coalesced to another one which must be skipped.
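
The intended selection rule can be sketched as below. This is not the code of qc_prep_hdshk_fast_retrans(): the function names and the byte counters are hypothetical, and the only external fact relied upon is the factor of three that RFC 9000 imposes on data sent to an unvalidated address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AMPLIFICATION_FACTOR 3  /* RFC 9000: at most 3x the bytes received */

/* True if sending dgram_len more bytes stays within the limit. */
bool dgram_fits_limit(size_t bytes_rx, size_t bytes_tx, size_t dgram_len)
{
	return bytes_tx + dgram_len <= AMPLIFICATION_FACTOR * bytes_rx;
}

/* Decide what to retransmit from a datagram made of a first packet and an
 * optional coalesced one (coalesced_len == 0 when absent). Returns the
 * number of packets kept: 0, 1 or 2. */
int pick_for_retrans(size_t bytes_rx, size_t bytes_tx,
                     size_t first_len, size_t coalesced_len)
{
	assert(first_len > 0);
	if (!dgram_fits_limit(bytes_rx, bytes_tx, first_len))
		return 0;
	if (coalesced_len &&
	    !dgram_fits_limit(bytes_rx, bytes_tx, first_len + coalesced_len))
		return 1;   /* skip the coalesced packet but keep the first one */
	return coalesced_len ? 2 : 1;
}
```

For example, with 1200 bytes received and 2400 already sent, a 600-byte first packet still fits the 3600-byte budget while an additional 700-byte coalesced packet does not, so only the first one is kept.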

This bug was revealed by AWS-LC interop runner with handshakeloss and
handshakecorruption which always fail because this stack leads the server
to send more Initial packets.

Thank you to Ilya (@chipitsine) for this issue report in GH #2663.

Must be backported as far as 2.6.
2024-09-03 11:47:51 +02:00
Christopher Faulet
d4781bd5e7 BUG/MEDIUM: cli: Always release back endpoint between two commands on the mcli
When several commands are chained on the master CLI, the same client
connection is used. Because it is a TCP connection, the mux PT is used. It
means there is no stream at the mux level. It is not possible to release the
applicative stream between commands as for HTTP. So, to work around
this limitation, between two commands, the master CLI resets the
stream. It does exactly what was performed on HTTP to manage keep-alive
connections on old HAProxy versions.

But this part was copied from a code dealing with connection only while the
back endpoint can be an applet or a mux for the master cli. The previous fix
on the mux PT ("BUG/MEDIUM: mux-pt: Never fully close the connection on
shutdown") revealed a bug. Between two commands, the back endpoint was only
released if the connection's XPRT was closed. This works if the back
endpoint is an applet because there is no connection. But for commands sent
to a worker, a connection is used. At this stage, this only works if the
connection's XPRT is closed. Otherwise, the old endpoint is never detached
leading to undefined behavior on the next command execution (most probably a
crash).

Without the commit above, the connection's XPRT is always closed on
shutdown. It is no longer true. At this stage, we must unconditionally
release the back endpoint by resetting the corresponding sedesc to fix the
bug.

This patch must be backported with the commit above in all stable
versions. On 2.4 and lower, it will need to be adapted.
2024-09-02 18:31:35 +02:00
Christopher Faulet
76fa71f7a8 BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown
When a shutdown is reported to the mux (shutdown for reads or writes), the
connection is immediately fully closed if the mux detects the connection is
closed in both directions. Only the passthrough multiplexer is able to
perform this action at this stage because there is no stream and no internal
data. Other muxes perform a full connection close during the mux's release
stage. It was working quite well until recently. But, in theory, the bug is
quite old.

In fact, it seems possible for the lower layer to report an error on the
connection at the same time a shutdown is performed on the mux. Depending on how
events are scheduled, the following may happen:

 1. A connection error is detected at the fd layer and a wakeup is
    scheduled on the mux to handle the event.

 2. A shutdown for writes is performed on the mux. Here the mux decides to
    fully close the connection. If the xprt is not used to log info, it is
    released.

 3. The mux is finally woken up. It tries to retrieve data from the xprt
    because it is not aware there was an error. This leads to a crash
    because of a NULL-deref.

By reading the code, it is not obvious. But it seems possible with SSL
connection when the handshake is rearmed. It happens when a
SSL_ERROR_WANT_WRITE is reported on a SSL_read() attempt or a
SSL_ERROR_WANT_READ on a SSL_write() attempt.

This bug is only visible if the XPRT is not used to log info. So it is not so
common.

This patch should fix the 2nd crash reported in the issue #2656. It must
first be backported as far as 2.9 and then slowly to all stable versions.
2024-09-02 15:50:25 +02:00
Christopher Faulet
f9adcdf039 MEDIUM: bwlim: Use a read-lock on the sticky session to apply a shared limit
There is no reason to acquire a write-lock on the sticky session when a
shared limit is applied because only the frequency is updated. The sticky
session itself is not modified. We must just take care it is not removed in
the mean time. So a read-lock may be used instead.
2024-09-02 15:50:25 +02:00
Christopher Faulet
a7f6b0ac03 MEDIUM: stick-table: Add support of a factor for IN/OUT bytes rates
Add a factor parameter to stick-tables, called "brates-factor", that is
applied to in/out bytes rates to work around the 32-bits limit of the
frequency counters. Thanks to this factor, it is possible to have bytes
rates beyond the 4GB. Instead of counting each byte, we count blocks
of bytes. Among other things, it will be useful for the bwlim filter, to be
able to configure shared limit exceeding the 4GB/s.

For now, this parameter must be in the range ]0-1024].
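
The arithmetic behind the factor can be sketched as follows. The helper names are hypothetical, and the truncating division and the saturation are assumptions made for illustration; how the stick-table code handles remainders is not described above.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte count into the number of blocks stored in a 32-bit
 * counter. Assumption: truncating division, saturating at UINT32_MAX. */
uint32_t bytes_to_blocks(uint64_t bytes, uint32_t factor)
{
	assert(factor > 0 && factor <= 1024);   /* range ]0-1024] */
	uint64_t blocks = bytes / factor;
	return blocks > UINT32_MAX ? UINT32_MAX : (uint32_t)blocks;
}

/* Convert a stored block count back into bytes. */
uint64_t blocks_to_bytes(uint32_t blocks, uint32_t factor)
{
	assert(factor > 0 && factor <= 1024);
	return (uint64_t)blocks * factor;
}
```

With a factor of 1, the largest representable value is just under 4GB; with a factor of 1024, the same 32-bit counter covers roughly 4TB, at the cost of a 1024-byte granularity.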
2024-09-02 15:50:25 +02:00
Frederic Lecaille
db13df3d6e BUG/MINOR: quic: Crash from trace dumping SSL eary data status (AWS-LC)
This bug follows this patch:
     MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
where a new third variable was added to be dumped from QUIC_EV_CONN_IO_CB trace
event. The quic_trace() code did not reveal there was already another variable
passed as third argument but not dumped. This led to a crash when dereferencing
a pointer to an int in place of a pointer to an SSL object.

This issue was reproduced only by handshakecorruption aws-lc interop test with
s2n-quic as client.

Note that this patch must be backported with this one:
     BUG/MEDIUM: quic: always validate sender address on 0-RTT
which depends on the commit mentioned above.
2024-09-02 10:01:41 +02:00
Aperence
20efb856e1 MEDIUM: protocol: add MPTCP per address support
Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension
that enables a TCP connection to use different paths.

Multipath TCP has been used for several use cases. On smartphones, MPTCP
enables seamless handovers between cellular and Wi-Fi networks while
preserving established connections. This use-case is what pushed Apple
to use MPTCP since 2013 in multiple applications [2]. On dual-stack
hosts, Multipath TCP enables the TCP connection to automatically use the
best performing path, either IPv4 or IPv6. If one path fails, MPTCP
automatically uses the other path.

To benefit from MPTCP, both the client and the server have to support
it. Multipath TCP is a backward-compatible TCP extension that is enabled
by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...).
Multipath TCP is included in the Linux kernel since version 5.6 [3]. To
use it on Linux, an application must explicitly enable it when creating
the socket. No need to change anything else in the application.
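
For illustration, enabling MPTCP at socket creation on Linux can look like the sketch below. The helper name is hypothetical and the fallback to plain TCP is a design choice of this example, not something stated above; IPPROTO_MPTCP is defined manually when the libc headers are too old to provide it.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262   /* Linux value; older libc headers lack it */
#endif

/* Try to create an MPTCP stream socket and fall back to plain TCP when
 * the kernel refuses it. *used_mptcp is set to 1 if MPTCP was obtained,
 * 0 otherwise. Returns the file descriptor, or -1 if both attempts fail. */
int open_stream_socket(int family, int *used_mptcp)
{
	assert(used_mptcp);
	int fd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);

	*used_mptcp = (fd >= 0);
	if (fd < 0)
		fd = socket(family, SOCK_STREAM, IPPROTO_TCP);
	return fd;
}
```

The rest of the application (bind, listen, connect, read, write) is unchanged, which is what makes the extension easy to adopt.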

This attached patch adds MPTCP per address support, to be used with:

  mptcp{,4,6}@<address>[:port1[-port2]]

MPTCP v4 and v6 protocols have been added: they are mainly a copy of the
TCP ones, with small differences: names, proto, and receivers lists.

These protocols are stored in __protocol_by_family, as an alternative to
TCP, similar to what has been done with QUIC. By doing that, the size of
__protocol_by_family has not been increased, and it behaves like TCP.

MPTCP is both supported for the frontend and backend sides.

Also added an example of configuration using mptcp along with a backend
allowing to experiment with it.

Note that this is a re-implementation of Björn's work from 3 years ago
[4], when haproxy's internals were probably less ready to deal with
this, causing his work to be left pending for a while.

Currently, the TCP_MAXSEG socket option doesn't seem to be supported
with MPTCP [5]. This results in a warning when trying to set the MSS of
sockets in proto_tcp:tcp_bind_listener.

This can be resolved by adding two new variables:
sock_inet(6)_mptcp_maxseg_default that will hold the default
value of the TCP_MAXSEG option. Note that for the moment, this
will always be -1 as the option isn't supported. However, in the
future, once support for this option is added, it should
contain the correct value for the MSS, allowing the TCP_MAXSEG
option to be set correctly.

Link: https://www.rfc-editor.org/rfc/rfc8684.html [1]
Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2]
Link: https://www.mptcp.dev [3]
Link: https://github.com/haproxy/haproxy/issues/1028 [4]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/515 [5]

Co-authored-by: Dorian Craps <dorian.craps@student.vinci.be>
Co-authored-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
2024-08-30 18:53:49 +02:00
Aperence
2f171fe36a MEDIUM: sock: use protocol when creating socket
Use the protocol configured for a connection when creating the socket,
instead of always using 0.

This change is needed to allow new protocol to be used when creating
the sockets, such as MPTCP. Note however that this patch won't change
anything for now, as the only other value that proto->sock_prot could
hold is IPPROTO_TCP, which has the same behavior as 0 when passed to
socket.
2024-08-30 18:53:49 +02:00
Aperence
38618822e1 MINOR: server: add a alt_proto field for server
Add a new field alt_proto to the server structures that
specifies whether an alternate protocol should be used for this server.

This field can be transparently passed to protocol_lookup to get
an appropriate protocol structure.

This change thus allows servers to be created with different protocols,
and not only TCP anymore.
2024-08-30 18:53:49 +02:00
Aperence
a7b04e383a MINOR: tools: extend str2sa_range to add an alt parameter
Add a new parameter "alt" that will store whether this configuration
uses an alternate protocol.

This alt pointer will contain a value that can be transparently
passed to protocol_lookup to obtain an appropriate protocol structure.

This change is needed to allow, for example, servers to know whether they
need to use an alternate protocol or not.
2024-08-30 18:53:49 +02:00
Willy Tarreau
2bc513dd31 BUILD: quic: fix build errors on FreeBSD since recent GSO changes
The following commits broke the build on FreeBSD when QUIC is enabled:

  35470d518 ("MINOR: quic: activate UDP GSO for QUIC if supported")
  448d3d388 ("MINOR: quic: add GSO parameter on quic_sock send API")

Indeed, it turns out that netinet/udp.h requires sys/types.h to be
included before. Let's just change the includes order to fix the build.
No backport is needed.
2024-08-30 18:53:49 +02:00
Frederic Lecaille
f627b9272b BUG/MEDIUM: quic: always validate sender address on 0-RTT
Wedl Michael, a student at the University of Applied Sciences St. Poelten,
reported a potential vulnerability in haproxy, as described below.

An attacker could have obtained a TLS session ticket after having established
a connection to an haproxy QUIC listener, using its real IP address. The
attacker does not even have to send an application-level request (HTTP3). Then
the attacker could open a 0-RTT session with a spoofed IP address
trusted by the QUIC listener to bypass IP allow/block lists and send HTTP3 requests.

To mitigate this vulnerability, it was decided to use a token which can be provided
to the client each time it successfully manages to connect to haproxy. These
tokens may be reused for future connections to validate the address/path of the
remote peer, as is done with the Retry token, which is used for the current
connection, not the next one. Such tokens are transported by NEW_TOKEN frames,
which were not used by haproxy until now.

So, each time a client connects to an haproxy QUIC listener with 0-RTT
enabled, it is provided with such a token which can be reused for the
next 0-RTT session. If no such token is presented by the client,
haproxy checks if the session is a 0-RTT one, so with early-data presented
by the client. Contrary to the Retry token, the decision to refuse the
connection is made only when the TLS stack has been provided with
enough early-data from the Initial ClientHello TLS message and when
these data have been accepted. Hopefully, this event arrives fast enough
to allow haproxy to kill the connection if some early-data have been accepted
without token presented by the client.

quic_build_post_handshake_frames() has been modified to build a NEW_TOKEN
frame with this newly implemented token to be transported inside.

quic_tls_derive_retry_token_secret() was renamed to quic_do_tls_derive_token_secret()
and modified to be reused and derive the secret for the new token implementation.

quic_token_validate() has been implemented to validate both the Retry and
the new token implemented by this patch. When this is a non-retry token
which could not be validated, the datagram received is marked as requiring
a Retry packet to be sent, and no connection is created.

When the Initial packet does not embed any non-retry token and if 0-RTT is enabled,
the connection is marked with this new flag: QUIC_FL_CONN_NO_TOKEN_RCVD. As soon
as the TLS stack detects that some early-data have been provided by the client
and accepted, the connection is marked to be killed (QUIC_FL_CONN_TO_KILL) from
ha_quic_add_handshake_data(). This is done by calling the new
qc_ssl_eary_data_accepted() function. The secret TLS handshake is interrupted as
soon as possible by returning 0 from ha_quic_add_handshake_data(). The connection
is also marked as requiring a Retry packet to be sent (QUIC_FL_CONN_SEND_RETRY) from
ha_quic_add_handshake_data(). Then the handshake I/O handler (quic_conn_io_cb())
knows how to behave: kill the connection after having sent a Retry packet.
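
The decision flow described above can be summarized by the following sketch. The flag names and values are illustrative stand-ins for QUIC_FL_CONN_NO_TOKEN_RCVD, QUIC_FL_CONN_TO_KILL and QUIC_FL_CONN_SEND_RETRY, and both functions are hypothetical reductions of the real code paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the connection flags; values are arbitrary. */
#define FL_NO_TOKEN_RCVD 0x1u
#define FL_TO_KILL       0x2u
#define FL_SEND_RETRY    0x4u

/* Initial packet processing: remember that no valid token came with it. */
unsigned int on_initial(bool zero_rtt_enabled, bool has_valid_token)
{
	return (zero_rtt_enabled && !has_valid_token) ? FL_NO_TOKEN_RCVD : 0;
}

/* Handshake data callback: returns 1 to continue the handshake, or 0 to
 * interrupt it when early data was accepted without any token. */
int on_handshake_data(unsigned int *flags, bool early_data_accepted)
{
	assert(flags);
	if ((*flags & FL_NO_TOKEN_RCVD) && early_data_accepted) {
		*flags |= FL_TO_KILL | FL_SEND_RETRY;
		return 0;
	}
	return 1;
}
```

In this model the I/O handler would then see both flags set, send a Retry packet and kill the connection; a client presenting a valid token never gets FL_NO_TOKEN_RCVD and proceeds normally.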

About TLS stack compatibility, this patch is supported by aws-lc. It is
disabled for wolfssl which does not support 0-RTT at this time thanks
to HAVE_SSL_0RTT_QUIC.

This patch depends on these commits:

     MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
     MINOR: quic: Implement qc_ssl_eary_data_accepted().
     MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
     BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
     MINOR: quic: Token for future connections implementation.
     MINOR: quic: Implement quic_tls_derive_token_secret().
     MINOR: tools: Implement ipaddrcpy().

Must be backported as far as 2.6.
2024-08-30 17:04:09 +02:00
Frederic Lecaille
8854cef036 MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
Dump the early data status from QUIC_EV_CONN_IO_CB trace event.
This is very helpful to know if the QUIC server has accepted the
early data received from clients.
2024-08-30 17:04:09 +02:00
Frederic Lecaille
609b124561 MINOR: quic: Implement qc_ssl_eary_data_accepted().
This function is a wrapper around SSL_get_early_data_status() for
OpenSSL-derived stacks and SSL_early_data_accepted() for BoringSSL-derived
stacks like AWS-LC. It returns true for a TLS server if it has
accepted the early data received from a client.

Also implement quic_ssl_early_data_status_str() which is dedicated to be used
for debugging purposes (traces). This function converts the enum returned
by the two function mentionned above to a human readable string.
2024-08-30 17:04:09 +02:00
Frederic Lecaille
e926378375 MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
Modify qf_new_token structure to use a static buffer with QUIC_TOKEN_LEN
as size as defined by the token for future connections (quic_token.c).
Modify consequently the NEW_TOKEN frame parser (see quic_parse_new_token_frame()).
Also add comments to denote that the NEW_TOKEN parser function is used only by
clients and that its builder is used only by servers.
2024-08-30 17:04:09 +02:00
Frederic Lecaille
76c80605a6 BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
quic_build_new_token_frame() is the function which is called to build
a NEW_TOKEN frame into a buffer. The position pointer for this buffer
was not updated, leading the NEW_TOKEN frame to be malformed.
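
This class of bug can be illustrated with a minimal frame builder. It is a sketch, not quic_build_new_token_frame(): the function name is hypothetical and the token length is restricted to values below 64 so that it fits a 1-byte QUIC variable-length integer. The frame layout (type 0x07, token length, token) is the one defined by RFC 9000.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FT_NEW_TOKEN 0x07  /* NEW_TOKEN frame type (RFC 9000) */

/* Append a NEW_TOKEN frame at *pos and advance *pos past it.
 * Returns 1 on success, 0 if the frame does not fit before <end>.
 * Simplification: len < 64 so the length is a 1-byte varint. */
int build_new_token(unsigned char **pos, const unsigned char *end,
                    const unsigned char *token, size_t len)
{
	assert(pos && *pos && *pos <= end && token);
	assert(len > 0 && len < 64);
	if ((size_t)(end - *pos) < 2 + len)
		return 0;
	*(*pos)++ = FT_NEW_TOKEN;
	*(*pos)++ = (unsigned char)len;
	memcpy(*pos, token, len);
	*pos += len;   /* without this, the next frame overwrites the token */
	return 1;
}
```

A caller chaining several frame builders relies on *pos pointing just past the last byte written, which is why a missing increment corrupts the resulting packet.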

Must be backported as far as 2.6.
2024-08-30 17:04:09 +02:00
Frederic Lecaille
f5b09dc452 MINOR: quic: Token for future connections implementation.
QUIC uses two sorts of tokens. Both are used to validate the peer address
(path validation). Retry tokens are used for the connection the client
currently wants to open. This patch implements the other sort of token which,
after having been received from a connection, may be provided for the next
connection from the same IP address to validate it (or to validate the network
path between the client and the server).

The token generation is implemented by quic_generate_token(), and
the token validation by quic_token_chek(). The same method as for Retry
tokens is used to build these tokens to be reused for future connections.
The format is very simple: one byte for the format identifier, to distinguish
these new tokens from Retry tokens, followed by a 32-bit timestamp. As this
part is ciphered with an AEAD algorithm, 16 bytes are needed for the AEAD tag.
16 more random bytes are added to the token as a salt to derive the AEAD
secret used to cipher it. In addition to this salt, the client IP address is
also used as AAD to derive the AEAD secret. So the length of the token is
fixed: 37 bytes.
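
The 37-byte figure follows directly from the field sizes listed above. The
sketch below only redoes that arithmetic; the DEMO_* names are illustrative
and are not claimed to match the definitions in quic_token.c.

```c
#include <assert.h>
#include <stddef.h>

/* Field sizes taken from the description; names are hypothetical. */
#define DEMO_TOK_FORMAT_LEN     1  /* format identifier */
#define DEMO_TOK_TIMESTAMP_LEN  4  /* 32-bit timestamp */
#define DEMO_TOK_AEAD_TAG_LEN  16  /* AEAD tag */
#define DEMO_TOK_SALT_LEN      16  /* random salt */

enum {
    DEMO_TOKEN_LEN = DEMO_TOK_FORMAT_LEN + DEMO_TOK_TIMESTAMP_LEN +
                     DEMO_TOK_AEAD_TAG_LEN + DEMO_TOK_SALT_LEN
};

/* Checked at compile time: 1 + 4 + 16 + 16 = 37 */
static_assert(DEMO_TOKEN_LEN == 37, "token length must be 37 bytes");

static size_t demo_token_len(void)
{
    return DEMO_TOKEN_LEN;
}
```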
2024-08-30 17:04:09 +02:00
Frederic Lecaille
74caa0eece MINOR: quic: Implement quic_tls_derive_token_secret().
This function is similar to quic_tls_derive_retry_token_secret().
Its aim is to derive the secret used to cipher the token to be used
for future connections.

This patch reuses the code of quic_tls_derive_retry_token_secret() to produce
a more generic function: quic_do_tls_derive_token_secret(). Two arguments are
added to the latter so that both quic_tls_derive_retry_token_secret() and the
new quic_tls_derive_token_secret() function can call it.
2024-08-30 17:04:09 +02:00
Frederic Lecaille
fb7a092203 MINOR: tools: Implement ipaddrcpy().
Implement ipaddrcpy() new function to copy only the IP address from
a sockaddr_storage struct object into a buffer.
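
A helper of this kind could look like the sketch below. The name
copy_ip_addr_sketch() and its exact signature are assumptions made for
illustration, not the prototype of ipaddrcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: copies only the raw IPv4 or IPv6 address bytes from
 * <ss> into <dst> (port and other fields are ignored). <dst> must provide at
 * least 16 bytes. Returns the number of bytes copied (4 or 16), or 0 for any
 * other address family.
 */
static size_t copy_ip_addr_sketch(unsigned char *dst,
                                  const struct sockaddr_storage *ss)
{
    assert(dst != NULL && ss != NULL);

    switch (ss->ss_family) {
    case AF_INET: {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)ss;

        memcpy(dst, &in4->sin_addr, sizeof(in4->sin_addr));
        return sizeof(in4->sin_addr);
    }
    case AF_INET6: {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)ss;

        memcpy(dst, &in6->sin6_addr, sizeof(in6->sin6_addr));
        return sizeof(in6->sin6_addr);
    }
    default:
        return 0;
    }
}
```

Returning the copied length lets the caller append the address to a larger
buffer, for example when the address is fed to a key derivation.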
2024-08-30 17:04:09 +02:00
Nicolas CARPi
a33407b499 CLEANUP: mqtt: fix typo in MQTT_REMAINING_LENGHT_MAX_SIZE
There was a typo in the macro name, where LENGTH was incorrectly
written. This didn't cause any issue because the typo appeared in all
occurrences in the codebase.
2024-08-30 14:58:59 +02:00
Nicolas CARPi
534e7e4598 CLEANUP: haproxy: fix typos in code comment
Use "from" instead of "form" in ha_random_boot function code comments.
2024-08-30 14:58:59 +02:00
Christopher Faulet
62c9d51ca4 BUG/MINOR: proxy: Match on 429 status when trying to perform a L7 retry
Support for 429 was recently added to L7 retries (0d142e075 "MINOR: proxy:
Add support of 429-Too-Many-Requests in retry-on status"). But the
l7_status_match() function was not properly updated. The switch statement
must match the 429 status to be able to perform a L7 retry.

This patch must be backported if the commit above is backported. It is
related to #2687.
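
The kind of omission fixed here can be pictured with a reduced sketch. The
DEMO_RE_* flags, struct demo_proxy and demo_l7_status_match() are
hypothetical stand-ins, not the real PR_RE_* definitions or the real
l7_status_match() body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical retry-on flags, one bit per retryable status. */
#define DEMO_RE_429 0x01u
#define DEMO_RE_500 0x02u
#define DEMO_RE_503 0x04u

struct demo_proxy {
    unsigned int retry_type; /* OR of DEMO_RE_* set by "retry-on" */
};

/* Returns 1 if <status> is one of the statuses configured for L7 retries. */
static int demo_l7_status_match(const struct demo_proxy *px, int status)
{
    assert(px != NULL);

    switch (status) {
    case 429: /* accepting the flag at config time is not enough: without
               * this case a 429 response would never trigger a retry */
        return (px->retry_type & DEMO_RE_429) != 0;
    case 500:
        return (px->retry_type & DEMO_RE_500) != 0;
    case 503:
        return (px->retry_type & DEMO_RE_503) != 0;
    default:
        return 0;
    }
}
```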
2024-08-30 12:13:32 +02:00
Christopher Faulet
e4812404c5 BUG/MEDIUM: stream: Prevent mux upgrades if client connection is no longer ready
If an early error occurred on the client connection, we must prevent any
multiplexer upgrades. Indeed, it is unexpected for a mux to be initialized
with no xprt. On a normal workflow it is impossible. So it is not an
issue. But if a mux upgrade is performed at the stream level, an early error
on the connection may have already been handled by the previous mux and the
connection may be already fully closed. If the mux upgrade is still
performed, a crash can be experienced.

It is possible to have a crash with an implicit TCP>HTTP upgrade if there is no
data in the input buffer. But it is also possible to get a crash with an
explicit "switch-mode http" rule.

It must be backported to all stable versions. In 2.2, the patch must be
applied directly in stream_set_backend() function.
2024-08-28 16:38:20 +02:00
Christopher Faulet
4ef5251c44 BUG/MEDIUM: mux-h2: Set ES flag when necessary on 0-copy data forwarding
When DATA frames are sent via the 0-copy data forwarding, we must take care
to set the ES flag on the last DATA frame. It should be performed in
h2_done_ff() when IOBUF_FL_EOI flag was set by the producer. This flag is
here to know when the producer has reached the end of input. When this
happens, the h2s state is also updated. It is switched to "half-closed
local" or "closed" state depending on its previous state.

It is mainly an issue on uploads because the server may be blocked waiting
for the end of the request. A workaround is to disable the 0-copy forwarding
support for H2 by setting the "tune.h2.zero-copy-fwd-send" directive to off
in your global section.

This patch should fix the issue #2665. It must be backported as far as 2.9.
2024-08-28 10:05:34 +02:00
Christopher Faulet
0d142e0756 MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status
The "429" status can now be specified on retry-on directives. PR_RE_* flags
were updated to remain sorted.

This patch should fix the issue #2687. It is quite simple so it may safely
be backported to 3.0 if necessary.
2024-08-28 10:05:34 +02:00
William Lallemand
d2fc1ab66e MEDIUM: ssl/sample: add ssl_fc_sigalgs_bin sample fetch
This new sample fetch allows extracting the binary list contained in the
signature_algorithms (13) TLS extension.

https://datatracker.ietf.org/doc/html/rfc8446#section-4.2.3
2024-08-26 15:17:40 +02:00
William Lallemand
e8fecef0ff MEDIUM: ssl: capture the signature_algorithms extension from Client Hello
Activate the capture of the TLS signature_algorithms extension from the
Client Hello. This list is stored in the ssl_capture buffer when the
global option "tune.ssl.capture-cipherlist-size" is enabled.
2024-08-26 15:17:40 +02:00
William Lallemand
ac5c7158f9 MEDIUM: ssl/sample: add ssl_fc_supported_versions_bin sample fetch
This new sample fetch allows extracting the binary list contained in the
supported_versions (43) TLS extension.

https://datatracker.ietf.org/doc/html/rfc8446#section-4.2.1
2024-08-26 15:17:40 +02:00
William Lallemand
ce7fb6628e MEDIUM: ssl: capture the supported_versions extension from Client Hello
Activate the capture of the TLS supported_versions extension from the
Client Hello. This list is stored in the ssl_capture buffer when the
global option "tune.ssl.capture-cipherlist-size" is enabled.
2024-08-26 15:12:42 +02:00
William Lallemand
3c0a0f1e1b CLEANUP: ssl: cleanup the clienthello capture
In order to add more extensions, clean up the clienthello capture
function a little bit.
2024-08-26 15:12:42 +02:00
Frederic Lecaille
414e3aa6bc BUILD: quic: 32bits build broken by wrong integer conversions for printf()
Since these commits the 32-bit build is broken due to several errors as follows:

CC      src/quic_cli.o
src/quic_cli.c: In function ‘dump_quic_full’:
src/quic_cli.c:285:94: error: format ‘%ld’ expects argument of type ‘long int’,
        but argument 5 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=]
  285 |                         chunk_appendf(&trash, "  [initl] rx.ackrng=%-6zu tx.inflight=%-6zu(%ld%%)\n",
      |                                                                                            ~~^
      |                                                                                              |
      |                                                                                              long int
      |                                                                                            %lld
  286 |                                       pktns->rx.arngs.sz, pktns->tx.in_flight,
  287 |                                       pktns->tx.in_flight * 100 / qc->path->cwnd);
      |                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                 |
      |                                                                 uint64_t {aka long long unsigned int}

Replace several %ld by %llu with ull as printf conversion in quic_cli.c and a
%ld by %lld with (long long) as printf conversion in quic_cc_cubic.c.

Thank you to Ilya (@chipitsine) for having reported this issue in GH #2689.

Must be backported to 3.0.
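
A minimal sketch of the portable pattern, assuming a hypothetical
demo_format_inflight() helper that uses snprintf() instead of
chunk_appendf(): a uint64_t is "unsigned long" on typical 64-bit Linux
targets but "unsigned long long" on 32-bit ones, so casting to
unsigned long long and using %llu works on both.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Formats "<in_flight>(<percentage of cwnd>%)" into <out>. The explicit cast
 * keeps the argument type in sync with %llu on 32-bit and 64-bit builds.
 * Returns what snprintf() returns.
 */
static int demo_format_inflight(char *out, size_t size,
                                uint64_t in_flight, uint64_t cwnd)
{
    assert(out != NULL && cwnd != 0);

    return snprintf(out, size, "%llu(%llu%%)",
                    (unsigned long long)in_flight,
                    (unsigned long long)(in_flight * 100 / cwnd));
}
```

The PRIu64 macro from <inttypes.h> is another standard way to obtain the
right conversion without a cast.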
2024-08-26 11:21:48 +02:00
Ilya Shipitsin
4256961a44 CI: QUIC Interop: use different artifact names for uploading logs
Artifact names must be unique, otherwise only the logs of the first failed job
are uploaded and the others encounter a 409 conflict.
2024-08-26 11:19:41 +02:00
Ilya Shipitsin
438ad6b495 CI: QUIC Interop: do not run bandwidth measurement tests
The crosstraffic and goodput tests are intended to perform bandwidth
measurement, and we do not consider GitHub runners suitable for that purpose.

GH issue: https://github.com/haproxy/haproxy/issues/2688
2024-08-26 11:19:41 +02:00
Ilya Shipitsin
f583ed9469 CI: fix missing comma introduced in 956839c0f68a7722acc586ecd91ffefad2ccb303
In 956839c0f68a7722acc586ecd91ffefad2ccb303 the syntax was broken due to a
missing comma. This is the follow-up fix.
2024-08-26 11:19:41 +02:00
Ilya Shipitsin
956839c0f6 CI: QUIC Interop AWS-LC: enable ngtcp2 client
Let's add it and see how it goes.
GH issue: https://github.com/haproxy/haproxy/issues/2688
2024-08-24 19:13:59 +02:00
Ilya Shipitsin
8f2112a04f DEV: coccinelle: add a test to detect unchecked calloc()
The coccinelle test "unchecked-calloc.cocci" detects various cases of
unchecked calloc().
2024-08-24 19:13:56 +02:00
Ilya Shipitsin
2ec42bff48 DEV: coccinelle: add a test to detect unchecked malloc()
The coccinelle test "unchecked-malloc.cocci" detects various cases of
unchecked malloc().
2024-08-24 19:13:56 +02:00
William Lallemand
7a03ab426f BUILD: tools: environ is not defined in OS X and BSD
Add extern char **environ in order to build the new functions which
manipulate the environment.

Indeed POSIX does not require any header to declare the environ variable, so
it needs to be declared manually:

"In addition, the following variable, which must be declared by the user if it is to be used directly:

extern char **environ;"

https://pubs.opengroup.org/onlinepubs/9699919799/functions/environ.html
2024-08-23 19:39:57 +02:00
Valentine Krasnobaeva
28ca7fc594 BUG/MINOR: haproxy: free init_env in deinit only if allocated
This fixes 7b78e1571 ("MINOR: mworker: restore initial env before wait
mode").

When haproxy starts without any configuration, for example with
'haproxy -vv', the init_env array used to back up env variables is never
allocated. So, when deinit() frees its memory, we need to check that init_env
is not a NULL pointer.
2024-08-23 19:08:53 +02:00
Valentine Krasnobaeva
7b78e1571b MINOR: mworker: restore initial env before wait mode
This patch is the follow-up of 1811d2a6ba (MINOR: tools: add helpers to
backup/clean/restore env).

In order to avoid unexpected behaviour in master-worker mode during a process
reload with a new configuration, when the old one contained '*env' keywords,
let's back up the initial environment before calling parse_cfg() and let's
clean and restore it in the context of the master process, just before it
enters its wait polling loop.

This guarantees that new workers will have a new, updated environment and not
the previous one inherited from the master, which does not read the
configuration when it is in wait mode.
2024-08-23 17:06:59 +02:00
Valentine Krasnobaeva
1811d2a6ba MINOR: tools: add helpers to backup/clean/restore env
The 'setenv', 'presetenv', 'unsetenv' and 'resetenv' configuration keywords can
modify the process runtime environment. In master-worker mode this
creates a problem, as the configuration is read only once before forking a
worker, and then the master process re-executes itself without reading any
config files, just to free the memory. So, during a reload, a new worker
process will be created, but it will inherit the previous unchanged environment
from the master in wait mode, and thus it won't benefit from the configuration
changes related to '*env' keywords. This may cause unexpected behavior or some
parser errors in master-worker mode.

So, let's add a helper to back up all process env variables just before the
process reads its configuration. And let's also add helpers to clean up the
current runtime environment and to restore it to its initial state (as it was
before parsing the config).
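
The backup/clean/restore sequence can be sketched as follows. The
env_backup_sketch(), env_clean_sketch() and env_restore_sketch() names are
hypothetical and the code is a simplified illustration built on POSIX
setenv()/unsetenv(), not the helpers added by this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

extern char **environ; /* must be declared by the program itself */

/* Returns a NULL-terminated deep copy of the environment, NULL on error. */
static char **env_backup_sketch(void)
{
    size_t n = 0, i;
    char **copy;

    while (environ && environ[n])
        n++;
    copy = calloc(n + 1, sizeof(*copy));
    if (!copy)
        return NULL;
    for (i = 0; i < n; i++) {
        copy[i] = strdup(environ[i]);
        if (!copy[i]) {
            while (i--)
                free(copy[i]);
            free(copy);
            return NULL;
        }
    }
    return copy;
}

/* Removes every variable from the current environment. */
static void env_clean_sketch(void)
{
    while (environ && environ[0]) {
        const char *eq = strchr(environ[0], '=');
        char *name;
        int rc;

        if (!eq || eq == environ[0])
            break; /* malformed entry: stop rather than loop forever */
        name = strndup(environ[0], (size_t)(eq - environ[0]));
        if (!name)
            break;
        rc = unsetenv(name); /* drops the first entry, so the loop advances */
        free(name);
        if (rc != 0)
            break;
    }
}

/* Re-creates every saved variable. Returns 0 on success, -1 on failure. */
static int env_restore_sketch(char *const *saved)
{
    assert(saved != NULL);

    for (; *saved; saved++) {
        const char *eq = strchr(*saved, '=');
        char *name;
        int rc;

        if (!eq)
            continue;
        name = strndup(*saved, (size_t)(eq - *saved));
        if (!name)
            return -1;
        rc = setenv(name, eq + 1, 1);
        free(name);
        if (rc != 0)
            return -1;
    }
    return 0;
}
```

In this sketch the backup would be taken before parsing the configuration,
and the clean + restore pair would run in the master just before it waits,
so that variables set or unset by the old configuration do not leak into
the next worker.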
2024-08-23 17:06:33 +02:00
Amaury Denoyelle
960d68a5af MINOR: mux-quic: correct qcc_bufwnd_full() documentation
Fix the return value comment of qcc_bufwnd_full(), which was incorrect.
2024-08-23 16:25:04 +02:00
Amaury Denoyelle
ecfedc2570 MINOR: mux-quic: add buf_in_flight to QCC debug infos
Dump <buf_in_flight> QCC field both in QUIC MUX traces and "show quic".
This could help to detect if MUX does not allocate enough buffers
compared to quic_conn current congestion window.
2024-08-22 17:48:23 +02:00
Nathan Wehrman
5c07d58e08 MINOR: config: Created env variables for http and tcp clf formats
Since we already have variables for the other formats and the
change is trivial, I thought it would be a nice addition for
completeness.
2024-08-22 09:15:58 +02:00
Willy Tarreau
599f043e74 [RELEASE] Released version 3.1-dev6
Released version 3.1-dev6 with the following main changes :
    - BUG/MINOR: proto_tcp: delete fd from fdtab if listen() fails
    - BUG/MINOR: proto_tcp: keep error msg if listen() fails
    - MINOR: proto_tcp: tcp_bind_listener: copy errno in errmsg
    - MINOR: channel: implement ci_insert() function
    - BUG/MEDIUM: mworker/cli: fix pipelined modes on master CLI
    - REGTESTS: mcli: test the pipelined commands on master CLI
    - MINOR: cfgparse: load_cfg_in_mem: fix null ptr dereference reported by coverity
    - MINOR: startup: fix unused value reported by coverity
    - BUG/MINOR: mux-quic: do not send too big MAX_STREAMS ID
    - BUG/MINOR: proto_uxst: delete fd from fdtab if listen() fails
    - BUG/MINOR: cfgparse: parse_cfg: fix null ptr dereference reported by coverity
    - MINOR: proto_uxst: copy errno in errmsg for syscalls
    - MINOR: mux-quic: do not trace error in qcc_send_frames() on empty list
    - BUG/MINOR: h3: properly reject too long header responses
    - CLEANUP: mworker/cli: clean up the mode handling
    - BUG/MINOR: tools: make fgets_from_mem() stop at the end of the input
    - BUG/MINOR: pattern: pat_ref_set: fix UAF reported by coverity
    - BUG/MINOR: pattern: pat_ref_set: return 0 if err was found
    - CI: keep logs for failed QIUC Interop jobs
    - BUG/MINOR: release-estimator: fix relative scheme in CHANGELOG URL
    - MINOR: release-estimator: add requirements.txt
    -  MINOR: release-estimator: add installation steps in README.md
    - MINOR: release-estimator: fix the shebang of the python script
    - DOC: config: correct the table for option tcplog
    - MEDIUM: log: relax some checks and emit diag warnings instead in lf_expr_postcheck()
    - MINOR: log: "drop" support for log-profile steps
    - CI: QUIC Interop LibreSSL: document chacha20 test status
    - CI: modernize codespell action, switch to node 16
    - CI: QUIC Interop AWS-LC: enable chrome client
    - DOC: lua: fix incorrect english in lua.txt
    - MINOR: Implements new log format of option tcplog clf
    - MINOR: cfgparse: limit file size loaded via /dev/stdin
    - BUG/MINOR: stats: fix color of input elements in dark mode
    - CLEANUP: stats: use modern DOCTYPE tag
    - BUG/MINOR: stats: add lang attribute to html tag
    - DOC: quic: fix default minimal value for max window size
    - DOC: quic: document nocc debug congestion algorithm
    - MINOR: quic: extract config window-size parsing
    - MINOR: quic: define max-window-size config setting
    - MINOR: quic: allocate stream txbuf via qc_stream_desc API
    - MINOR: mux-quic: account stream txbuf in QCC
    - MEDIUM: mux-quic: implement API to ignore txbuf limit for some streams
    - MINOR: h3: mark control stream as metadata
    - MINOR: mux-quic: define buf_in_flight
    - MAJOR: mux-quic: allocate Tx buffers based on congestion window
    - MINOR: quic/config: adapt settings to new conn buffer limit
    - MINOR: quic: define sbuf pool
    - MINOR: quic: support sbuf allocation in quic_stream
    - MEDIUM: h3: allocate small buffers for headers frames
    - MINOR: mux-quic: retry after small buf alloc failure
    - BUG/MINOR: cfgparse-global: fix err msg in mworker keyword parser
    - BUG/MINOR: cfgparse-global: clean common_kw_list
    - BUG/MINOR: cfgparse-global: remove redundant goto
    - MINOR: cfgparse-global: move 'pidfile' in global keywords list
    - MINOR: cfgparse-global: move 'expose-*' in global keywords list
    - MINOR: cfgparse-global: move tune options in global keywords list
    - MINOR: cfgparse-global: move unsupported keywords in global list
    - BUG/MINOR: cfgparse-global: remove tune.fast-forward from common_kw_list
    - MINOR: quic: store the lost packets counter in the quic_cc_event element
    - MINOR: quic: support a tolerance for spurious losses
    - MINOR: protocol: properly assign the sock_domain and sock_family
    - MINOR: protocol: add a family lookup
    - MEDIUM: socket: always properly use the sock_domain for requested families
    - MINOR: protocol: add the real address family to the protocol
    - MINOR: socket: don't ban all custom families from reuseport
    - MINOR: protocol: always initialize the receivers list on registration
    - CLEANUP: protocol: no longer initialize .receivers nor .nb_receivers
2024-08-21 17:50:03 +02:00
Willy Tarreau
9911b53d75 CLEANUP: protocol: no longer initialize .receivers nor .nb_receivers
Protocol definitions no longer need to initialize these internal fields,
as they're now properly initialized during protocol registration.
2024-08-21 17:37:46 +02:00
Willy Tarreau
1cb3b0b745 MINOR: protocol: always initialize the receivers list on registration
Till now, protocols were required to self-initialize their receivers
list head, which is not very convenient, and is quite error prone.
Indeed, it's too easy to copy-paste a protocol definition and forget
to update the .receivers field to point to itself, resulting in mixed
lists. Let's just do that in protocol_register(). And while we're at
it, let's also zero the nb_receivers entry that works with it, so that
the protocol definition isn't required to pre-initialize stuff related
to internal book-keeping.
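
A reduced sketch of the idea, using a hypothetical struct demo_protocol and
demo_protocol_register() rather than the real haproxy types: the
registration function, not the static definition, makes the list head point
to itself.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list head. */
struct demo_list {
    struct demo_list *n, *p;
};

struct demo_protocol {
    const char *name;
    struct demo_list receivers; /* internal book-keeping */
    int nb_receivers;           /* internal book-keeping */
};

/* Initializes the internal fields so that protocol definitions do not have
 * to: an empty list is a head whose next and prev point to itself.
 */
static void demo_protocol_register(struct demo_protocol *proto)
{
    assert(proto != NULL);

    proto->receivers.n = proto->receivers.p = &proto->receivers;
    proto->nb_receivers = 0;
}
```

With this approach a definition obtained by copy-paste from another
protocol cannot end up with a list head pointing into the original one.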
2024-08-21 17:37:46 +02:00
Willy Tarreau
034974106f MINOR: socket: don't ban all custom families from reuseport
The test on ss_family >= AF_MAX is too strict if we want to support new
custom families, let's apply this to the real_family instead so that we
check that the underlying socket supports reuseport.
2024-08-21 17:37:46 +02:00
Willy Tarreau
2a799b64b0 MINOR: protocol: add the real address family to the protocol
For custom families, there's sometimes an underlying real address and
it would be nice to be able to directly use the real family in calls
to bind() and connect() without having to add explicit checks for
exceptions everywhere.

Let's add a .real_family field to struct proto_fam for this. For now
it's always equal to the family except for non-transferable ones such
as rhttp where it's equal to the custom one (anything else could fit).
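
As an illustration only, a family table with such a field could be used as
in the sketch below. The struct layout, the DEMO_AF_CUST_DGRAM4 identifier
and its mapping to AF_INET are assumptions, not the actual proto_fam
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

#define DEMO_AF_CUST_DGRAM4 1000 /* hypothetical custom family identifier */

struct demo_proto_fam {
    const char *name;
    int sock_family; /* family stored in the address */
    int real_family; /* family usable with bind() and connect() */
};

static const struct demo_proto_fam demo_families[] = {
    { "inet",        AF_INET,             AF_INET },
    { "unix",        AF_UNIX,             AF_UNIX },
    { "cust-dgram4", DEMO_AF_CUST_DGRAM4, AF_INET },
};

/* Returns the table entry for <family>, or NULL if it is unknown. */
static const struct demo_proto_fam *demo_fam_lookup(int family)
{
    size_t i;

    assert(family >= 0);

    for (i = 0; i < sizeof(demo_families) / sizeof(demo_families[0]); i++) {
        if (demo_families[i].sock_family == family)
            return &demo_families[i];
    }
    return NULL;
}

/* Returns the underlying real family, or -1 for an unknown family. */
static int demo_real_family(int family)
{
    const struct demo_proto_fam *fam = demo_fam_lookup(family);

    return fam ? fam->real_family : -1;
}
```

Callers can then pass demo_real_family(x) to the socket calls without
testing for each custom family explicitly.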
2024-08-21 17:37:46 +02:00
Willy Tarreau
d592ebdbeb MEDIUM: socket: always properly use the sock_domain for requested families
Now we make sure to always look up the protocol's domain for an address
family. Previously we would use it as-is, which prevented custom addresses
(which is when they differ) from being used properly.

This removes some hard-coded tests such as in log.c where UNIX vs UDP
was explicitly checked for example. It requires a bit of care, however,
so as to properly pass value 1 in the 3rd arg of the protocol_lookup()
for DGRAM stuff. Maybe one day we'll change these for defines or enums
to limit mistakes.
2024-08-21 17:36:58 +02:00
Willy Tarreau
ba4a416c66 MINOR: protocol: add a family lookup
At plenty of places we have access to an address family which may
include some custom addresses but we cannot simply convert them to
the real families without performing some random protocol lookups.

Let's simply add a proto_fam table like we have for the protocols.
The protocols could even be indexed there, but for now it's not worth
it.
2024-08-21 16:46:15 +02:00
Willy Tarreau
732913f848 MINOR: protocol: properly assign the sock_domain and sock_family
When we finally split sock_domain from sock_family in 2.3, something
was not cleanly finished. The family is what should be stored in the
address while the domain is what is supposed to be passed to socket().
But for the custom addresses, we did the opposite, just because the
protocol_lookup() function was acting on the domain, not the family
(both of which are equal for non-custom addresses).

This is an API bug but there's no point backporting it since it does
not have visible effects. It was visible in the code since a few places
were using PF_UNIX while others were comparing the domain against AF_MAX
instead of comparing the family.

This patch clarifies this in the comments on top of proto_fam, addresses
the indexing issue and properly reconfigures the two custom families.
2024-08-21 16:46:15 +02:00
Willy Tarreau
67bf1d6c9e MINOR: quic: support a tolerance for spurious losses
Tests performed between a server connected at 1 Gbps and a 100 Mbps client,
95 ms apart, showed that:

  - we need 1.1 MB in flight to fill the link
  - rare but inevitable losses are sufficient to make cubic's window
    collapse fast and long to recover
  - a 100 MB object takes 69s to download
  - tolerance for 1 loss between two ACKs suffices to shrink the download
    time to 20-22s
  - 2 losses go to 17-20s
  - 4 losses reach 14-17s

At 100 concurrent connections that fill the server's link:
  - 0 loss tolerance shows 2-3% losses
  - 1 loss tolerance shows 3-5% losses
  - 2 loss tolerance shows 10-13% losses
  - 4 loss tolerance shows 23-29% losses

As such, while there can sometimes be a significant gain in setting this
tolerance above zero, it can also significantly waste bandwidth by sending
far more than can be received. While it's probably not a solution to real
world problems, it repeatedly proved to be a very effective troubleshooting
tool, helping to figure out different root causes of low transfer speeds. In
spirit it is comparable to the no-cc congestion algorithm, i.e. it must
not be used except for experimentation.
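
The principle of such a tolerance can be sketched as below. The structure,
the function name and the exact comparison are assumptions chosen for
illustration; the actual integration with the congestion controllers is not
shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loss event carrying the number of packets lost since the
 * previous ACK.
 */
struct demo_loss_event {
    unsigned int lost_pkts;
};

/* Returns 1 if the congestion controller should react to the loss event,
 * 0 if the loss is within the configured tolerance and may be ignored.
 * With a tolerance of 0, every loss is reported.
 */
static int demo_loss_is_significant(const struct demo_loss_event *ev,
                                    unsigned int tolerance)
{
    assert(ev != NULL);

    return ev->lost_pkts > tolerance;
}
```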
2024-08-21 08:34:30 +02:00
Willy Tarreau
fab0e99aa1 MINOR: quic: store the lost packets counter in the quic_cc_event element
Upon loss detection, qc_release_lost_pkts() notifies congestion
controllers about the event and its final time. However it does not
pass the number of lost packets, which can provide useful hints for
some controllers. Let's just pass this information.
2024-08-21 08:02:44 +02:00
Valentine Krasnobaeva
2e6e159ac4 BUG/MINOR: cfgparse-global: remove tune.fast-forward from common_kw_list
Remove tune.fast-forward from common_kw_list. It was replaced by
'tune.disable-fast-forward' and it's no longer present in "if..else if.."
parser from cfg_parse_global(). Otherwise, it may be shown as the best-match
keyword for some tune options, which is now wrong.

Should be backported in versions 2.9 and 3.0.
2024-08-20 19:16:34 +02:00
Valentine Krasnobaeva
731ef865e3 MINOR: cfgparse-global: move unsupported keywords in global list
Following the previous commits and in order to clean up cfg_parse_global(),
let's move the unsupported keywords into the global list and add a dedicated
parser for them.
2024-08-20 19:16:33 +02:00
Valentine Krasnobaeva
55309592db MINOR: cfgparse-global: move tune options in global keywords list
In order to clean up cfg_parse_global() and to add support for the new
MODE_DISCOVERY in configuration parsing, let's move the keywords related to
tune options into the global keywords list and add two dedicated parsers for
them. Tune option keywords are split between the two parsers according to the
number of parameters a given tune option needs.

The tune options parser is called by the section parser and follows the common
API, i.e. it returns -1 on failure, 0 on success and 1 on recoverable error. In
case of recoverable error we previously returned ERR_ALERT (0x10) and emitted
an alert message at startup. The section parser treats all rc > 0 as ERR_WARN.
So if some tune option is set twice in the global section, the tune
options parser will return 1 (in order to respect the common API), the section
parser will treat this as ERR_WARN, and a warning message will be emitted during
process startup instead of an alert, as was the case before.
2024-08-20 19:16:32 +02:00
Valentine Krasnobaeva
c46497f16f MINOR: cfgparse-global: move 'expose-*' in global keywords list
Following the previous commit, let's also move the 'expose-*' keywords into the
global cfg_kws list and add a dedicated parser for them. This will simplify
configuration parsing in the new MODE_DISCOVERY, which allows reading only the
keywords needed at the early start of the haproxy process (i.e. modes, pidfile,
chosen poller).
2024-08-20 19:16:31 +02:00
Valentine Krasnobaeva
450ce3e61b MINOR: cfgparse-global: move 'pidfile' in global keywords list
This commit cleans up cfg_parse_global() and prepares the config parser to
support MODE_DISCOVERY. This step is needed in the early starting stage, just to
figure out in which mode the process was started, to set some necessary
parameters needed for this mode and to continue the initialization
stage.

'pidfile' is one of these common keywords, which need to be parsed
very early and which are used in almost all process modes (except the
foreground one, '-d').

The 'pidfile' keyword parser is called by the section parser and follows the
common API, i.e. it returns -1 on failure, 0 on success and 1 on recoverable
error. In case of recoverable error we previously returned ERR_ALERT (0x10) and
emitted an alert message at startup. The section parser treats all rc > 0 as
ERR_WARN. So if pidfile was already specified via the command line, the
keyword parser will return 1 (in order to respect the common API), the section
parser will treat this as ERR_WARN, and a warning message will be emitted during
process startup instead of an alert, as was the case before.
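
The return-code convention described above can be sketched as follows.
Everything prefixed with demo_ or DEMO_ is hypothetical: only the 0x10 value
of ERR_ALERT comes from the text, the other values are placeholders, and the
handling of rc < 0 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ERR_NONE  0x00
#define DEMO_ERR_FATAL 0x02 /* placeholder value */
#define DEMO_ERR_WARN  0x08 /* placeholder value */
#define DEMO_ERR_ALERT 0x10 /* value quoted in the commit message */

struct demo_global {
    const char *pidfile;
};

/* Keyword parser following the common API:
 * -1 on failure, 0 on success, 1 on recoverable error.
 */
static int demo_parse_pidfile(struct demo_global *g, const char *arg)
{
    if (!arg || !*arg)
        return -1; /* missing argument */
    if (g->pidfile)
        return 1;  /* already set, e.g. on the command line */
    g->pidfile = arg;
    return 0;
}

/* Section parser side: maps the keyword parser's rc to an error code. */
static int demo_section_parse(struct demo_global *g, const char *arg)
{
    int rc;

    assert(g != NULL);

    rc = demo_parse_pidfile(g, arg);
    if (rc < 0)
        return DEMO_ERR_ALERT | DEMO_ERR_FATAL; /* assumed mapping */
    if (rc > 0)
        return DEMO_ERR_WARN; /* any rc > 0 becomes a warning */
    return DEMO_ERR_NONE;
}
```

This is why a duplicate setting that used to produce an alert now only
produces a warning: the keyword parser can no longer choose the severity
itself.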
2024-08-20 19:16:30 +02:00
Valentine Krasnobaeva
f29be97ac7 BUG/MINOR: cfgparse-global: remove redundant goto
When the given keyword is found in the global 'cfg_kws' list, we go to the
'out' label anyway, after testing the rc returned by the keyword's parser. So
there is not much gain in performing a 'goto out' jump specifically when rc > 0.
2024-08-20 19:16:29 +02:00
Valentine Krasnobaeva
74bc6f3d66 BUG/MINOR: cfgparse-global: clean common_kw_list
This patch fixes commits 118ac11ce
("MINOR: cfgparse-global: move mode's keywords in cfg_kw_list") and 83ff4db18
(MINOR: cfgparse-global: move no<poller_name> in cfg_kw_list).

'common_kw_list' serves to show the best-match keyword in cfg_parse_global(), if
the given keyword was not parsed in "if..else if.." cases. cfg_parse_global()
is still used as a parser for some keywords from the global section.

Mode-specific and no<poller_name> keywords now have their own parsers. They no
longer appear in the "if..else if.." from cfg_parse_global() and they are
registered in the 'cfg_kws' list. So, there is no longer a need to duplicate
them in the 'common_kw_list'. Otherwise, they will be shown twice in the parser
error message.
2024-08-20 19:16:28 +02:00
Valentine Krasnobaeva
4291d10b44 BUG/MINOR: cfgparse-global: fix err msg in mworker keyword parser
This patch fixes the commit 118ac11ce
("cfgparse-global: move mode's keywords in cfg_kw_list"). Error message
delivered by keyword parser in **err is always shown with ha_alert() by the
caller cfg_parse_global(). The caller always supplies these alerts with the
filename and the line number.
2024-08-20 19:16:27 +02:00
Amaury Denoyelle
0d6112b40b MINOR: mux-quic: retry after small buf alloc failure
The previous commit switched to small buffers for HTTP/3 HEADERS emission.
This ensures that several parallel streams can allocate their own buffer
without hitting the connection buffer limit, which is now based on the
congestion window size.

However, this prevents the transmission of responses with uncommonly
large headers. Indeed, if all headers cannot be encoded in a single
buffer, an error is reported which causes the whole connection to be closed.

Adjust this by implementing a realloc API exposed by QUIC MUX. This
allows the application layer to switch from a small to a default buffer and
restart its processing. This guarantees that, once again, headers no longer
than bufsize can be properly transferred.
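
The retry logic can be pictured with the sketch below. It uses plain
malloc()/realloc() and hypothetical demo_* names; the real code goes through
the QUIC MUX buffer API and pools, which are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_buf {
    char *area;
    size_t size; /* allocated size */
    size_t data; /* bytes in use */
};

/* Stand-in for header encoding: fails with -1 if <hdrs> does not fit. */
static int demo_encode_headers(struct demo_buf *b, const char *hdrs)
{
    size_t len = strlen(hdrs);

    if (len > b->size)
        return -1;
    memcpy(b->area, hdrs, len);
    b->data = len;
    return 0;
}

/* Tries a small buffer first, then switches to a default-sized one and
 * restarts the encoding. Returns 0 on success, -1 if the headers do not fit
 * in a default buffer either (or on allocation failure).
 */
static int demo_send_headers(struct demo_buf *b, const char *hdrs,
                             size_t small_size, size_t default_size)
{
    char *bigger;

    assert(b && hdrs && small_size > 0 && small_size <= default_size);

    b->data = 0;
    b->size = small_size;
    b->area = malloc(small_size);
    if (!b->area)
        return -1;
    if (demo_encode_headers(b, hdrs) == 0)
        return 0;

    bigger = realloc(b->area, default_size); /* small -> default buffer */
    if (!bigger)
        goto fail;
    b->area = bigger;
    b->size = default_size;
    if (demo_encode_headers(b, hdrs) == 0) /* restart from scratch */
        return 0;
fail:
    free(b->area);
    b->area = NULL;
    b->size = 0;
    return -1;
}
```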
2024-08-20 18:12:27 +02:00
Amaury Denoyelle
b355e89bf9 MEDIUM: h3: allocate small buffers for headers frames
A major change was recently implemented to change QUIC MUX Tx buffer
allocation limit, which is now based on the current connection
congestion window size. As this size may be smaller than the previous
static value, it is likely that the limit will be reached more
frequently.

When using HTTP/3, the majority of request streams are used for small
object exchanges. Every response starts with a HEADERS frame which
should be much smaller in size than the default buffer. But as the whole
buffer size is accounted against the congestion window, a single stream
can block others even if only emitting a single HEADERS frame, which is
suboptimal for bandwidth usage if the congestion window is small enough.

To adapt to this new situation, rely on the newly available small
buffers to transfer the HEADERS frame of the response. This at least guarantees
that several parallel streams can allocate their own buffer for the first
part of the response, even with a small congestion window.

The situation could be further improved by using various indications of the
data size and selecting a small buffer if sufficient. This could be done
for example via the Content-length value or the HTX extra field. However
this must be the subject of a dedicated patch.
2024-08-20 18:12:27 +02:00
Amaury Denoyelle
885e4c5cf8 MINOR: quic: support sbuf allocation in quic_stream
This patch extends the qc_stream_desc API to be able to allocate small
buffers. The QUIC MUX API is similarly updated, as ultimately each application
protocol is responsible for choosing between a default or a smaller buffer.

Internally, the type of allocated buffer is remembered via the qc_stream_buf
instance. This is mandatory to ensure that the buffer is released in the
correct pool, in particular as small and standard buffers can be
configured with the same size.

This commit is purely an API change. For the moment, small buffers are
not used. This will change in a dedicated patch.
2024-08-20 18:12:27 +02:00
Amaury Denoyelle
d0d8e57d47 MINOR: quic: define sbuf pool
Define a new buffer pool reserved for allocating smaller memory areas. For
the moment, its usage will be restricted to QUIC; as such it is declared
in the quic_stream module.

Add a new config option "tune.bufsize.small" to specify the size of the
allocated objects. A special check ensures that it is not greater than
the default bufsize to avoid unexpected effects.
2024-08-20 18:12:27 +02:00
Amaury Denoyelle
1de5f718cf MINOR: quic/config: adapt settings to new conn buffer limit
The QUIC MUX buffer allocation limit is now directly based on the underlying
congestion window size. The previous static limit based on conn-tx-buffers
is now unused. As such, this commit adds a warning to inform users
that it is now obsolete.

Secondly, update the max-window-size setting. It is now the main entrypoint
to limit both the maximum congestion window size and the number of QUIC
MUX buffers allocated on emission. Remove its special value '0', which was
used to automatically adjust it based on the now unused conn-tx-buffers.
2024-08-20 17:59:35 +02:00
Amaury Denoyelle
aeb8c1ddc3 MAJOR: mux-quic: allocate Tx buffers based on congestion window
Each QUIC MUX may allocate buffers for MUX stream emission. These
buffers are then shared with quic_conn to handle ACK reception and
retransmission. A limit on the number of concurrent buffers used per
connection has been defined statically and can be updated via a
configuration option. This commit replaces the limit to instead use the
current underlying congestion window size.

The purpose of this change is to remove the artificial static buffer
count limit, which may be difficult to choose. Indeed, if a connection
performs with minimal loss rate, the buffer count would limit severely
its throughput. It could be increased to fix this, but it also impacts
other connections, even with less optimal performance, causing too much
extra data buffering on the MUX layer. By using the dynamic congestion
window size, haproxy ensures that MUX buffering corresponds roughly to
the network conditions.

Using QCC <buf_in_flight>, a new buffer can be allocated if it is less
than the current window size. If not, QCS emission is interrupted and
haproxy stream layer will subscribe until a new buffer is ready.

One of the critical parts is to ensure that MUX layer previously
blocked on buffer allocation is properly woken up when sending can be
retried. This occurs on two occasions :

* after an already used Tx buffer is cleared on ACK reception. This case
  is already handled by qcc_notify_buf() via quic_stream layer.

* on congestion window increase. A new qcc_notify_buf() invocation is
  added into qc_notify_send().

Finally, remove <avail_bufs> QCC field which is now unused.

This commit is labelled MAJOR as it may have unexpected effects and could
cause significant behavior change. For example, in previous
implementation QUIC MUX would be able to buffer more data even if the
congestion window is small. With this patch, data cannot be transferred
from the stream layer which may cause more streams to be shut down on
client timeout. Another effect may be more CPU consumption as the
connection limit would be hit more often, causing more streams to be
interrupted and woken up in cycle.
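
The allocation rule and the two wake-up occasions described above can be
sketched as follows. This is a simplified model under stated assumptions: the
toy_* names and fields are hypothetical and do not reflect the actual QCC
structure or the qcc_notify_buf() implementation:

```c
#include <stddef.h>

/* Hypothetical, simplified connection state. */
struct toy_qcc {
    size_t buf_in_flight;  /* sum of allocated Tx buffer sizes, in bytes */
    size_t cwnd;           /* current congestion window, in bytes */
    int    blocked;        /* set when a stream waits for a buffer */
    int    wakeups;        /* number of wake-ups performed */
};

/* A new buffer may be allocated only while buf_in_flight is below cwnd. */
static int toy_buf_may_alloc(const struct toy_qcc *qcc)
{
    return qcc->buf_in_flight < qcc->cwnd;
}

/* Returns 1 on success, 0 if the stream must wait for a notification. */
static int toy_buf_alloc(struct toy_qcc *qcc, size_t bufsz)
{
    if (!toy_buf_may_alloc(qcc)) {
        qcc->blocked = 1;
        return 0;
    }
    qcc->buf_in_flight += bufsz;
    return 1;
}

/* Wake up a blocked stream if allocation became possible again. */
static void toy_notify(struct toy_qcc *qcc)
{
    if (qcc->blocked && toy_buf_may_alloc(qcc)) {
        qcc->blocked = 0;
        qcc->wakeups++;
    }
}

/* First occasion: a buffer is released after ACK reception. */
static void toy_buf_release(struct toy_qcc *qcc, size_t bufsz)
{
    qcc->buf_in_flight -= bufsz;
    toy_notify(qcc);
}

/* Second occasion: the congestion window is updated. */
static void toy_cwnd_update(struct toy_qcc *qcc, size_t cwnd)
{
    qcc->cwnd = cwnd;
    toy_notify(qcc);
}
```

Forgetting either notification path in such a model leaves a blocked stream
waiting forever, which is why both occasions must be covered.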
2024-08-20 17:17:17 +02:00
Amaury Denoyelle
000976af58 MINOR: mux-quic: define buf_in_flight
Define a new QCC counter named <buf_in_flight>. Its purpose is to
account for the current sum of all allocated stream buffer sizes used on
emission.

For the moment, this counter is updated on buffer allocation and
deallocation. It will be used to replace <avail_bufs> once congestion
window is used as limit for buffer allocation in a future commit.
2024-08-20 17:17:17 +02:00
Amaury Denoyelle
f9777bea30 MINOR: h3: mark control stream as metadata
Work is in progress to change QUIC MUX buffer allocation limit
from a configurable static value to use the size of the congestion
window instead. This change may cause the buffer allocation limit to be
triggered more frequently.

To ensure HTTP/3 control emission is not perturbed by this change, mark
the stream with qcs_send_metadata(). This ensures that buffer allocation
for this stream won't be subject to the connection limit. This is
necessary to guarantee that SETTINGS and GOAWAY frames are emitted.
2024-08-20 17:17:17 +02:00
Amaury Denoyelle
4c4bf26f44 MEDIUM: mux-quic: implement API to ignore txbuf limit for some streams
Define a new qc_stream_desc flag QC_SD_FL_OOB_BUF. This is to mark
streams which are not subject to the connection limit on allocated MUX
stream buffer.

The purpose is to simplify handling of QUIC MUX streams which do not
transfer data and as such are not driven by haproxy layer, for example
HTTP/3 control stream. These streams interact synchronously with QUIC
MUX and cannot retry emission in case of temporary failure.

This commit will be useful once connection buffer allocation limit is
reimplemented to directly rely on the congestion window size. This will
probably cause the buffer limit to be reached more frequently, maybe
even on QUIC MUX initialization. As such, it will be possible to mark
control streams and prevent them from being subject to the buffer limit.

QUIC MUX exposes a new function qcs_send_metadata(). It can be used by an
application protocol to specify which streams are used for control
exchanges. For the moment, no such stream uses this mechanism.
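
A minimal sketch of the idea, assuming a count-based connection limit. The
toy_* names and the flag value are hypothetical; only the flag's purpose comes
from the description above:

```c
#include <stddef.h>

#define TOY_SD_FL_OOB_BUF 0x01u  /* hypothetical value for the flag */

struct toy_conn {
    unsigned int avail_bufs;     /* buffers still allocatable on the conn */
};

struct toy_stream {
    unsigned int     flags;
    struct toy_conn *conn;
};

/* Mark a stream as carrying control exchanges only. */
static void toy_send_metadata(struct toy_stream *s)
{
    s->flags |= TOY_SD_FL_OOB_BUF;
}

/* Returns 1 if a buffer may be allocated for this stream. Out-of-band
 * streams bypass the connection limit and are not accounted.
 */
static int toy_stream_get_buf(struct toy_stream *s)
{
    if (s->flags & TOY_SD_FL_OOB_BUF)
        return 1;
    if (!s->conn->avail_bufs)
        return 0;
    s->conn->avail_bufs--;
    return 1;
}
```

A control stream marked this way still gets a buffer when data streams have
exhausted the connection limit.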
2024-08-20 17:17:17 +02:00
Amaury Denoyelle
f4d1bd0b76 MINOR: mux-quic: account stream txbuf in QCC
A limit per connection is put on the number of buffers allocated by QUIC
MUX for emission across all its streams. This ensures memory
consumption remains under control. This limit is simply explained as a
count of buffers which can be concurrently allocated for each
connection.

As such, quic_conn structure was used to account currently allocated
buffers. However, a quic_conn never allocates new stream buffers. This
is only done at QUIC MUX layer. As such, this commit moves buffer
accounting inside QCC structure. This simplifies the API, most notably
qc_stream_buf_alloc() usage.

Note that this commit inverts the accounting. Previously, it was
initially set to 0 and incremented for each allocated buffer. Now, it is
set to the maximum value and decremented for each buf usage. This is
considered clearer to use.
2024-08-20 17:17:17 +02:00
Amaury Denoyelle
635fbaaa4a MINOR: quic: allocate stream txbuf via qc_stream_desc API
This commit simply adjusts QUIC stream buffer allocation. This operation
is conducted by QUIC MUX using qc_stream_desc layer. Previously,
qc_stream_buf_alloc() would return a qc_stream_buf instance and QUIC MUX
would finalize the buffer area allocation. Change this to perform the
buffer allocation directly into qc_stream_buf_alloc().

This patch clarifies the interaction between QUIC MUX and
qc_stream_desc. It is cleaner to allocate the buffer via qc_stream_desc
as it is already responsible to free the buffer.

It also ensures that connection buffer accounting is only done after the
whole qc_stream_buf and its buffer are allocated. Previously, the
increment operation was performed between the two steps. This was not an
issue, as this kind of error triggers the whole connection closure.
However, if in the future this is handled as a stream closure instead,
this commit ensures that the buffer remains valid in all cases.
2024-08-20 17:17:17 +02:00
Amaury Denoyelle
c24c8667b2 MINOR: quic: define max-window-size config setting
Define a new global keyword tune.quic.frontend.max-window-size. This
allows setting globally the maximum congestion window size for each QUIC
frontend connection.

The default value is 0. It is a special value which automatically derives
the size from the configured QUIC connection buffer limit. This is
similar to the previous "quic-cc-algo" behavior, which can be used to
override the maximum window size per bind line.
2024-08-20 17:02:29 +02:00
Amaury Denoyelle
280b61468a MINOR: quic: extract config window-size parsing
quic-cc-algo is a bind line keyword which allows to select a QUIC
congestion algorithm. It can take an optional integer to specify the
maximum window size. This value is an integer and supports the suffixes
'k', 'm' and 'g' to specify respectively kilobytes, megabytes and
gigabytes.

Extract the maximum window size parsing in a dedicated function named
parse_window_size(). It accepts as input an integer value with an
optional suffix, 'k', 'm' or 'g'. The first invalid character is
returned by the function to the caller.

No functional change. This commit will allow to quickly implement a new
keyword to configure a default congestion window size in the global
section.
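
The parsing contract described above can be sketched as follows. This is not
the actual parse_window_size() code: the name, signature and return
convention are assumptions made for illustration, only lowercase suffixes are
handled, and overflow caused by the suffix shift is not checked:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical sketch: parses "<int>[k|m|g]" from <arg> into <out>.
 * On return, *end_opt points to the first character not consumed, so the
 * caller can detect trailing garbage. Returns 1 on success, 0 if <arg>
 * does not start with a digit or if the number is out of range.
 */
static int toy_parse_window_size(const char *arg, unsigned long long *out,
                                 const char **end_opt)
{
    unsigned long long v;
    char *end;

    *end_opt = arg;
    if (*arg < '0' || *arg > '9')
        return 0;

    errno = 0;
    v = strtoull(arg, &end, 10);
    if (errno == ERANGE)
        return 0;

    switch (*end) {
    case 'k': v <<= 10; end++; break;  /* kilobytes */
    case 'm': v <<= 20; end++; break;  /* megabytes */
    case 'g': v <<= 30; end++; break;  /* gigabytes */
    }

    *out = v;
    *end_opt = end;
    return 1;
}
```

A caller would typically reject the value when the returned pointer does not
reference the end of the argument, then apply its own range check.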
2024-08-20 16:07:22 +02:00
Amaury Denoyelle
5b6e8c4d4d DOC: quic: document nocc debug congestion algorithm
Document nocc congestion algorithm as an entry of quic-cc-algo.
Highlight the fact that it is reserved for debugging and should not be
used outside of this use case.
2024-08-20 16:07:22 +02:00
Amaury Denoyelle
103d860777 DOC: quic: fix default minimal value for max window size
It is possible to override the default QUIC congestion algorithm on a
bind line. With the same setting, it is also possible to specify the
maximum congestion window size.

The parser rejects values outside of the range between 10k and 4g. This
is in contradiction with the documentation which specifies 1k as the lower
value. Correct this value in the documentation.

This should be backported up to 2.9.
2024-08-20 16:07:22 +02:00
Nicolas CARPi
bba679026c BUG/MINOR: stats: add lang attribute to html tag
The "html" element of the stats page was missing a "lang" attribute.
This change specifies the "en" value, which corresponds to the English
language.

It is also a required element for WCAG Success Criterion 3.1.1, which
renders the web more accessible through a set of requirements. In this
case it allows assistive technologies such as screen readers to
determine the language of the page.

MDN page: https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang
HTML standard: https://html.spec.whatwg.org/multipage/dom.html#attr-lang
WCAG criterion: https://www.w3.org/WAI/WCAG22/Understanding/language-of-page.html
2024-08-20 15:55:45 +02:00
Nicolas CARPi
9318a624a1 CLEANUP: stats: use modern DOCTYPE tag
Switching the stats page doctype to the modern standard is shorter and
less complex, and is the recommended doctype by current HTML standard.
It makes it clear that we do not want to run in quirks mode. More information below.

Quirks mode: https://developer.mozilla.org/en-US/docs/Web/HTML/Quirks_Mode_and_Standards_Mode
HTML Standard: https://html.spec.whatwg.org/multipage/syntax.html#the-doctype
2024-08-20 15:55:31 +02:00
Nicolas CARPi
c63d558e41 BUG/MINOR: stats: fix color of input elements in dark mode
Previously the text color was dark on a dark background. This makes it
white, and thus readable. This is visible on the "Scope" input field.
2024-08-20 15:55:14 +02:00
Valentine Krasnobaeva
8b1dfa9def MINOR: cfgparse: limit file size loaded via /dev/stdin
load_cfg_in_mem() can continuously reallocate memory in order to load an
extremely large input from /dev/stdin, until it fails with ENOMEM, which means
that the process has consumed all available RAM. In containers and
virtualized environments this is undesirable.

So, in order to prevent this, let's introduce MAX_CFG_SIZE as 10MB, which will
limit the size of input supplied via /dev/stdin.
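
A minimal sketch of such a bounded read, assuming a hypothetical helper
toy_load_in_mem() and chunk size. It is not the load_cfg_in_mem() code, only
the 10MB value comes from the description above:

```c
#include <stdio.h>
#include <stdlib.h>

#define TOY_MAX_CFG_SIZE (10 * 1024 * 1024)  /* 10MB, as described above */
#define TOY_CHUNK        4096                /* hypothetical read size */

/* Reads <f> until EOF into a malloc'ed buffer stored in <*content>.
 * Returns the number of bytes read, or -1 on allocation failure or if
 * the input exceeds <max> bytes. The caller frees <*content>.
 */
static long toy_load_in_mem(FILE *f, char **content, size_t max)
{
    char *buf = NULL, *nbuf;
    size_t size = 0, len = 0, ret;

    *content = NULL;
    for (;;) {
        /* always keep room for one more chunk */
        if (len + TOY_CHUNK > size) {
            size = len + TOY_CHUNK;
            nbuf = realloc(buf, size);
            if (!nbuf) {
                free(buf);
                return -1;
            }
            buf = nbuf;
        }
        ret = fread(buf + len, 1, TOY_CHUNK, f);
        len += ret;
        if (len > max) {
            /* stop early instead of growing until ENOMEM */
            free(buf);
            return -1;
        }
        if (ret < TOY_CHUNK)
            break;  /* EOF or read error */
    }
    *content = buf;
    return (long)len;
}
```

The limit check happens after each chunk, so memory usage never exceeds the
limit by more than one chunk.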
2024-08-20 14:28:34 +02:00
Nathan Wehrman
fd48b28315 MINOR: Implements new log format of option tcplog clf
Some systems require log formats in the CLF format and that meant that I
could not send my logs for proxies in mode tcp to those servers.  This
implements a format that uses log variables that are compatible with TCP
mode frontends and replaces traditional HTTP values in the CLF format
to make them stand out. Instead of logging method and URI like this
"GET /example HTTP/1.1" it will log "TCP " and for a response code I
used "000" so it would be easy to separate from legitimate HTTP
traffic. Now your log servers that require a CLF format can see the
timings for TCP traffic as well as HTTP.
2024-08-20 07:46:34 +02:00
Nicolas CARPi
974fae2b17 DOC: lua: fix incorrect english in lua.txt
This commit fixes some typos, grammatical errors and unusual English
such as "can not" instead of preferred "cannot".
2024-08-20 05:21:02 +02:00
Ilia Shipitsin
ae8f6724a1 CI: QUIC Interop AWS-LC: enable chrome client
Chrome is an important browser, let's enable it in AWS-LC weekly tests.
The only test supported by Chrome is http3.
2024-08-20 05:13:46 +02:00
Ilia Shipitsin
6301042938 CI: modernize codespell action, switch to node 16
The following action uses node12 which is deprecated and will be forced
to run on node16: codespell-project/codespell-problem-matcher@v1. For
more info:
   https://github.blog/changelog/2023-06-13-github-actions-all-actions-will-run-on-node16-instead-of-node12-by-default/
2024-08-20 05:13:46 +02:00
Ilia Shipitsin
8b422971ee CI: QUIC Interop LibreSSL: document chacha20 test status
Due to https://github.com/haproxy/haproxy/issues/2569 chacha20 is
disabled completely on LibreSSL. Let's add a comment to not forget
to enable it.
2024-08-20 05:13:26 +02:00
Aurelien DARRAGON
f8299bc5ea MINOR: log: "drop" support for log-profile steps
It is now possible to use "drop" keyword for "on" lines under a
log-profile section to specify that no log at all should be emitted for
the specified step (setting an empty format was not sufficient to do so
because only the log payload would be empty, not the log header, thus the
log would still be emitted).

It may be useful to selectively disable logging at specific steps for a
given log target (since the log profile may be set on log directives):

log-profile myprof
  on request format "blabla" sd "custom sd"
  on response drop

New testcase was added to reg-tests/log/log_profiles.vtc
2024-08-19 18:53:01 +02:00
Aurelien DARRAGON
41ca89bc6f MEDIUM: log: relax some checks and emit diag warnings instead in lf_expr_postcheck()
With 7a21c3a ("MAJOR: log: implement proper postparsing for logformat
expressions") which finally made postparsing checks reliable, we started
to get reports from users that couldn't start haproxy 3.0 with configs that
used to work in the past. The current situation is described in GH #2642.

While the checks are mostly relevant, it turns out they are not strictly
needed anymore from a technical point of view. Most of them were useful in
early logformat implementation to prevent runtime bugs due to the use of
an alias or fetch at runtime from an incompatible proxy. It's been a few
versions already that the code handling fetches and log aliases is robust
enough to support fetches/aliases used from the wrong context: all it
does is that the fetch/alias will silently fail if it's not available.

This can be proved by the fact that even if the postparsing checks were
partially broken in the past, it didn't cause runtime issues (at least
on recent haproxy versions).

Most of these checks can now be seen as configuration hints: when a check
triggers, it will indicate a configuration inconsistency in most cases,
but there are some corner cases where it is not possible to know at config
time if the conditions will be met for the alias/fetch to work properly,
so instead of failing with a hard error like we did so far, let's just be
more permissive and report our findings using "diag_warning": such
warnings are only emitted when haproxy is started with '-dD' cli option.

We also took this opportunity to improve message clarity and make them
more precise (report the offending item instead of complaining about the
whole expression because of a single element).

With this patch, configs that used to start before 7a21c3a shouldn't
trigger hard errors anymore.

This may be backported in 3.0.
2024-08-16 14:25:10 +02:00
Nathan Wehrman
9788ae1d19 DOC: config: correct the table for option tcplog
option tcplog was reported as functional in the backend section in
error. This can be backported as needed but it simply corrects
that.
2024-08-13 19:50:18 +02:00
William Lallemand
f14bdba867 MINOR: release-estimator: fix the shebang of the python script
Fix the shebang of the python script to use /usr/bin/env, allowing to
call the script directly from a virtualenv with `./release-estimator.py`
without using the python3 install of the system.
2024-08-13 17:26:36 +02:00
William Lallemand
5131f32440 MINOR: release-estimator: add installation steps in README.md
Update the README.md with the dependencies and the installation steps
 with a python venv.
2024-08-13 17:21:47 +02:00
William Lallemand
9857eba3ae MINOR: release-estimator: add requirements.txt
Add a requirements.txt file to install the release-estimator script.
2024-08-13 17:12:59 +02:00
William Lallemand
bb02d95e92 BUG/MINOR: release-estimator: fix relative scheme in CHANGELOG URL
The CHANGELOG URL which is parsed in the HTML now has a relative
scheme, which is incompatible with requests. This patch adds an https
scheme to the URL.
2024-08-13 16:43:03 +02:00
Ilia Shipitsin
ec1d93a6e9 CI: keep logs for failed QIUC Interop jobs
It might be useful to investigate logs of failed tests. To keep
artifacts small the following actions are taken:
- only failed logs are kept
- logs retention is 6 days
2024-08-13 16:21:01 +02:00
Valentine Krasnobaeva
911f4d93d4 BUG/MINOR: pattern: pat_ref_set: return 0 if err was found
pat_ref_set_elt() returns 0 if we run out of memory or can't parse a new
map value. Any error message emitted by pat_ref_set_elt() is saved in err
buffer, if it's provided by caller. These error messages are accumulated during
the loop.

pat_ref_set() is used to update values in map, referred to the same given key.
If during the update pat_ref_set_elt() fails, let's return 0 to caller
immediately. We have the same non-unique key and the same new value in each
loop. So it seems quite odd to accumulate the same error messages and print them
in CLI:

        > add map @1 mytest.map <<
        + 1.0.1.11 TestA
        + 1.0.1.11 TESTA
        + 1.0.1.11 test_a
        +

        > set map mytest.map 1.0.1.11 15
         unable to parse '15' unable to parse '15' unable to parse '15'.

cli_parse_set_map(), which calls pat_ref_set() to update map, will return only
one error message with this patch:

> set map mytest.map 1.0.1.11 15
 unable to parse '15'.

hlua_set_map() and http_action_set_map() don't provide error buffer and will
just exit on the first error.
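
The early return can be sketched with a simplified model. The toy_* names
are hypothetical, the map is a plain array with duplicate keys and values are
assumed to be integers, so this does not reflect the real pattern reference
structures:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_elt {
    const char *key;
    long        val;
};

/* Hypothetical per-element setter: fails on a non-numeric value and
 * appends its message to <err> (which must hold a valid string).
 */
static int toy_set_elt(struct toy_elt *e, const char *value,
                       char *err, size_t errlen)
{
    char *end;
    long v = strtol(value, &end, 10);

    if (end == value || *end != '\0') {
        size_t used = strlen(err);

        snprintf(err + used, errlen - used, "unable to parse '%s'", value);
        return 0;
    }
    e->val = v;
    return 1;
}

/* Updates every element matching <key>. Since the same value is applied
 * to each of them, the first failure would repeat for all the others:
 * return immediately so that only one message is reported.
 */
static int toy_set(struct toy_elt *elts, size_t n, const char *key,
                   const char *value, char *err, size_t errlen)
{
    size_t i;
    int found = 0;

    for (i = 0; i < n; i++) {
        if (strcmp(elts[i].key, key) != 0)
            continue;
        found = 1;
        if (!toy_set_elt(&elts[i], value, err, errlen))
            return 0;
    }
    if (!found) {
        snprintf(err, errlen, "entry not found");
        return 0;
    }
    return 1;
}
```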

This should be backported in all stable versions.
2024-08-13 16:13:43 +02:00
Valentine Krasnobaeva
4f2493f355 BUG/MINOR: pattern: pat_ref_set: fix UAF reported by coverity
memprintf() performs realloc and then updates the pointer to an output buffer,
where it has written the data. So free() is called on the previous buffer
address, if it was provided.

pat_ref_set_elt() uses memprintf() to write its error message as well as
pat_ref_set(). So, when we re-enter into the while loop the second time and
pat_ref_set_elt() has returned, the *err ptr (previous value of *merr) is
already freed by memprintf() from pat_ref_set_elt().

'if (!found)' condition is false at this point, because we've found a node at
the first loop. So, the second memprintf(), in order to write error messages,
does again free(*err).

This should be backported in all stable versions.
2024-08-13 16:13:41 +02:00
Willy Tarreau
0982bfd999 BUG/MINOR: tools: make fgets_from_mem() stop at the end of the input
The memchr() used to look for the LF character must consider the end of
input, not just the output buffer size.
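
The bound on the search can be illustrated with a minimal sketch of an
fgets()-like reader over a memory area. The name and signature below are
hypothetical and do not reflect the actual fgets_from_mem() prototype:

```c
#include <string.h>

/* Hypothetical sketch: copies the next line found at <*pos> into <buf>,
 * including the LF if present, and NUL-terminates it. At most <size>-1
 * bytes are copied and the input is never read at or past <end>.
 * Returns <buf>, or NULL when the input is exhausted.
 */
static char *toy_fgets_from_mem(char *buf, int size,
                                const char **pos, const char *end)
{
    const char *lf;
    size_t len, avail;

    if (size < 2 || *pos >= end)
        return NULL;

    avail = (size_t)(end - *pos);
    len = (size_t)size - 1;
    if (len > avail)
        len = avail;  /* bound the search by the remaining input */

    lf = memchr(*pos, '\n', len);
    if (lf)
        len = (size_t)(lf - *pos) + 1;

    memcpy(buf, *pos, len);
    buf[len] = '\0';
    *pos += len;
    return buf;
}
```

Without the clamp on the remaining input, memchr() would scan up to the
output buffer size and could read beyond the end of the loaded content.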

This was found by oss-fuzz:
   https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=71096

No backport is needed.
2024-08-11 14:44:28 +02:00
William Lallemand
75944e266e CLEANUP: mworker/cli: clean up the mode handling
Cleanup the mode handling by refactoring the strings constant
that are written multiple times
2024-08-09 17:47:20 +02:00
Amaury Denoyelle
48514c118c BUG/MINOR: h3: properly reject too long header responses
When encoding HTX to HTTP/3 headers on the response path, a bunch of
ABORT_NOW() were used when buffer room was not enough. In most cases
this is safe as output buffer has just been allocated and so is empty at
the start of the function. However, with a header list longer than a
whole buffer, this would cause an unexpected crash.

Fix this by replacing ABORT_NOW() statements with a proper error return
path. For the moment, this would cause the whole connection to be closed
rather than the stream only. This may be further improved in the future.

Also remove ABORT_NOW() when encoding frame length at the end of headers
or trailers encoding. Buffer room is sufficient as it was already
checked prior in the same function.

This should be backported up to 2.6. Special care should be taken
however as this code path has changed frequently :
* for 2.9 and older, the extra following statement must be inserted
  prior each newly added goto statement :
  h3c->err = H3_INTERNAL_ERROR;
* for 2.6, trailers support is not implemented. As such, related chunks
  should just be ignored when backporting.
2024-08-09 17:41:16 +02:00
Amaury Denoyelle
8939d8e473 MINOR: mux-quic: do not trace error in qcc_send_frames() on empty list
qcc_send_frames() can be called with an empty list and returns
immediately with an error code. This is a convenience to be able to call
it in a while loop.

Remove the trace with "error" when this is the case and replace it
with a less alarming "leaving on..." message. This should help debugging
when traces are active.
2024-08-09 17:41:16 +02:00
Valentine Krasnobaeva
9fc69ebc0a MINOR: proto_uxst: copy errno in errmsg for syscalls
Let's copy errno in error messages, which we emit in cases when listen() or
connect() fail. This is helpful for debugging.
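
As an illustration only, such a message could be built as below. The helper
name and message layout are hypothetical, not the format used by haproxy.
Note that errno must be saved right after the failing syscall, since later
calls may overwrite it:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "cannot <what> (<errno description>)" into
 * <msg>. <err> is the errno value saved right after the failing syscall.
 * Returns the snprintf() result.
 */
static int toy_syscall_errmsg(char *msg, size_t len, const char *what, int err)
{
    return snprintf(msg, len, "cannot %s (%s)", what, strerror(err));
}
```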
2024-08-09 17:38:42 +02:00
Valentine Krasnobaeva
16e89f6b5c BUG/MINOR: cfgparse: parse_cfg: fix null ptr dereference reported by coverity
This commit fixes potential null ptr dereferences reported by coverity, see
more details about it in the issues #2676 and #2668.

'outline' ptr, which is initialized to NULL explicitly as a temporary buffer to
store split keywords may be in theory implicitly dereferenced in some corner
cases (which we haven't encountered yet with real world configurations) in
'if (!**args)'. parse_line() code, called before under some conditions
assigns: args[arg] = outline + outpos and outpos initial value is 0.
2024-08-09 15:43:29 +02:00
Valentine Krasnobaeva
eb82358690 BUG/MINOR: proto_uxst: delete fd from fdtab if listen() fails
This patch is done mostly as a safeguard in order not to trigger
BUG_ON(fdtab[fd].owner != NULL) check, if listen() fails on a UNIX domain
socket.

In uxst_bind_listener(), pretty much the same logic of closing socket on error path
was kept, as it was in tcp_bind_listener() before. The use of fd_delete() was
not generalized, when the support of UNIX sock_stream protocol was implemented.
So, let's remove fd from fdtab on failure, instead of closing it. Otherwise,
uxst_bind_listener(), which could be called in loop for each receiver, will
obtain the same fd via socket() for the next receiver. Then, it will bind it
again and it will try to re-insert it in fdtab.

This can be backported to all stable versions.
2024-08-09 15:23:28 +02:00
Amaury Denoyelle
f3c75a52df BUG/MINOR: mux-quic: do not send too big MAX_STREAMS ID
QUIC stream IDs are expressed as QUIC variable integer which covers the
range from 0 to 2^62 - 1. As such, it is forbidden to send an ID for
MAX_STREAMS flow-control frame which would allow to exceed this value.

This patch fixes MAX_STREAMS emission to ensure sent value is valid.
This also ensures that the peer cannot open a stream with an invalid ID
as this would cause a flow-control violation instead.
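
A minimal sketch of such a bound, assuming a hypothetical helper which
computes the next value to advertise. RFC 9000 states that the MAX_STREAMS
value cannot exceed 2^60, since the two low-order bits of a stream ID encode
its type. How haproxy actually computes and clamps the value is not shown
here:

```c
#include <stdint.h>

/* Upper bound for the MAX_STREAMS frame value (RFC 9000, section 19.11). */
#define TOY_MAX_STREAMS_LIMIT (UINT64_C(1) << 60)

/* Hypothetical helper: returns the next cumulative stream count to
 * advertise, clamped so that no resulting stream ID can exceed 2^62 - 1.
 */
static uint64_t toy_next_max_streams(uint64_t current, uint64_t increment)
{
    if (current >= TOY_MAX_STREAMS_LIMIT ||
        increment > TOY_MAX_STREAMS_LIMIT - current)
        return TOY_MAX_STREAMS_LIMIT;
    return current + increment;
}
```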

This must be backported up to 2.6.
2024-08-09 14:33:49 +02:00
Valentine Krasnobaeva
aae2ff7691 MINOR: startup: fix unused value reported by coverity
Unused 0 is assigned to ret, as it's rewritten by error code of read_cfg().
This issue was reported by coverity.
2024-08-08 19:54:12 +02:00
Valentine Krasnobaeva
da82f08055 MINOR: cfgparse: load_cfg_in_mem: fix null ptr dereference reported by coverity
This helps to optimize a bit load_cfg_in_mem() and fixes the potential null ptr
dereference in fread() call. If (read_bytes + bytes_to_read) equals the initial
chunk_size (zero), realloc is never called, *cfg_content keeps its NULL value.

So, let's ensure that the initial number of bytes to read
(read_bytes + bytes_to_read) is strictly positive, when we enter the loop for
the first time.
2024-08-08 19:54:12 +02:00
William Lallemand
fe5ddcc490 REGTESTS: mcli: test the pipelined commands on master CLI
A recent fix broke the pipelined commands on the master CLI. This
reg-test implements a simple test that allows to check their correct
behavior.

This could be backported as far as 2.6.
2024-08-08 17:29:37 +02:00
William Lallemand
b75edf2f11 BUG/MEDIUM: mworker/cli: fix pipelined modes on master CLI
Since commit 3d93ecc ("BUG/MAJOR: cli: Restore non-interactive mode
behavior with pipelined commands") and commit 598c7f16 ("BUG/MEDIUM:
cli: Warn if pipelined commands are delimited by a \n"), the pipelined
commands on the master CLI are either broken or emit warnings depending
on which version.

The reason is that modes applied on the master CLI are saved in
the current CLI session, and then reinserted for each pipelined command.
However, these commands were inserted as new lines.

For example:

 "@1; expert-mode on; debug dev log foo; debug dev log bar"

 Would be sent as:

  "expert mode on\ndebug dev log foo"
  "expert mode on\ndebug dev log bar"

This patch fixes the issue by using the new ci_insert() function which
inserts a string instead of a newline, and the commands are now suffixed
by ';' upon insertion allowing a correct pipelined command chain.

This must be backported with the previous commit introducing ci_insert()
in every stable version.

This is broken since the 3.0 version, but it emits a warning in every
version below, because 598c7f164 was backported.
2024-08-08 17:29:37 +02:00
William Lallemand
b2a8e8731d MINOR: channel: implement ci_insert() function
ci_insert() is a function which allows to insert a string <str> of size
<len> at <pos> of the input buffer. This is the equivalent of
ci_insert_line2() but without inserting '\r\n'
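
The operation can be sketched on a flat buffer. This is only an illustration
under stated assumptions: toy_insert() is a hypothetical name with its own
signature, while the real function works on a channel's input buffer:

```c
#include <string.h>

/* Hypothetical sketch: inserts <len> bytes of <str> at offset <pos> of
 * <buf>, which holds <*data> bytes and has a capacity of <cap> bytes
 * (with <*data> <= <cap>). No line terminator is added.
 * Returns the number of bytes inserted, or -1 if <pos> is out of range
 * or if there is not enough room.
 */
static int toy_insert(char *buf, size_t cap, size_t *data, size_t pos,
                      const char *str, size_t len)
{
    if (pos > *data || len > cap - *data)
        return -1;

    /* shift the tail to make room, then copy the new bytes */
    memmove(buf + pos + len, buf + pos, *data - pos);
    memcpy(buf + pos, str, len);
    *data += len;
    return (int)len;
}
```

Inserting a mode prefix terminated by ';' this way keeps it on the same line
as the command which follows.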
2024-08-08 17:29:37 +02:00
Valentine Krasnobaeva
46181e730a MINOR: proto_tcp: tcp_bind_listener: copy errno in errmsg
Let's copy errno in errmsg produced by tcp_bind_listener if it fails in
a syscall(). This is helpful to debug issues, while binding listeners.
2024-08-08 16:34:13 +02:00
Valentine Krasnobaeva
81f48395b3 BUG/MINOR: proto_tcp: keep error msg if listen() fails
If listen() fails, we need to keep the message about it, which is copied then
in errmsg buffer on the error path. This buffer is properly provided by the
caller (protocol_bind_all()) and reallocated if needed in memprintf(), but
it was deleted without being returned.

This can be backported to all stable versions.
2024-08-08 16:34:06 +02:00
Valentine Krasnobaeva
308c6881c0 BUG/MINOR: proto_tcp: delete fd from fdtab if listen() fails
If listen() fails, fd should be deleted from fdtab, not just closed. Otherwise,
sock_inet_bind_receiver(), which is called in loop for each receiver, will
obtain the same fd via socket() for the next receiver, registered in the
receivers list. Then, it will bind it again and it will try to re-insert it in
fdtab, and fd_insert() will trigger the BUG_ON(fdtab[fd].owner != NULL) check.

When tcp_bind_listener() code was implemented, the use of fd_delete() was
not generalized and this one remained overlooked.

This can be backported to all stable versions.
2024-08-08 16:33:53 +02:00
Willy Tarreau
8427c5b542 [RELEASE] Released version 3.1-dev5
Released version 3.1-dev5 with the following main changes :
    - BUG/MINOR: quic: Lack of precision when computing K (cubic only cc)
    - MEDIUM: ssl/quic: implement quic crypto with EVP_AEAD
    - MINOR: quic: rename confusing wording aes to hp
    - MEDIUM: quic: add key argument to header protection crypto functions
    - MEDIUM: quic: implement CHACHA20_POLY1305 for AWS-LC
    - MEDIUM: sink: assume sft appctx stickiness
    - MINOR: quic: delay Retry emission on quic-force-retry
    - MEDIUM: quic: implement quic-initial rules
    - MINOR: quic: support ACL for quic-initial rules
    - MINOR: quic: pass quic_dgram as obj_type for quic-initial rules
    - MINOR: quic: implement reject quic-initial action
    - MINOR: quic: implement send-retry quic-initial rules
    - BUG/MEDIUM: quic: fix invalid conn reject with CONNECTION_REFUSED
    - MEDIUM: h1: allow to preserve keep-alive on T-E + C-L
    - MINOR: quic: Add information to "show quic" for CUBIC cc.
    - MINOR: quic: Dump TX in flight bytes vs window values ratio.
    - BUG/MEDIUM: jwt: Clear SSL error queue on error when checking the signature
    - BUILD: cfgparse-quic: fix build error on Solaris due to missing netinet/in.h
    - MINOR: queue: add a function to check for TOCTOU after queueing
    - BUG/MEDIUM: queue: deal with a rare TOCTOU in assign_server_and_queue()
    - DOC: config: Add documentation about spop mode for backends
    - BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set
    - BUG/MEDIUM: mux-pt/mux-h1: Release the pipe on connection error on sending path
    - BUILD: mux-pt: Use the right name for the sedesc variable
    - BUG/MINOR: stconn: bs.id and fs.id had their dependencies incorrect
    - BUG/MEDIUM: ssl: reactivate 0-RTT for AWS-LC
    - BUG/MEDIUM: ssl: 0-RTT initialized at the wrong place for AWS-LC
    - BUILD: ssl: replace USE_OPENSSL_AWSLC by OPENSSL_IS_AWSLC
    - BUG/MEDIUM: quic: prevent conn freeze on 0RTT undeciphered content
    - MINOR: tcp_sample: Move TCP low level sample fetch function to control layer
    - MINOR: quic: Define ->get_info() control layer callback for QUIC
    - MINOR: flags/mux-quic: decode qcc and qcs flags
    - BUG/MINOR: quic: fix fc_rtt/srtt values
    - BUG/MIONR: quic: fix fc_lost
    - BUG/MINOR: h1: do not forward h2c upgrade header token
    - BUG/MINOR: h2: reject extended connect for h2c protocol
    - BUG/MEDIUM: http-ana: Report error on write error waiting for the response
    - BUG/MEDIUM: h2: Only report early HTX EOM for tunneled streams
    - BUG/MEDIUM: mux-h2: Propagate term flags to SE on error in h2s_wake_one_stream
    - BUG/MEDIUM: peer: Notify the applet won't consume data when it waits for sync
    - BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only)
    - CI: add weekly QUIC Interop regression against AWS-LC
    - CI: harden NetBSD builds by ERR=1
    - BUG/MINOR: quic: Too short datagram during packet building failures (aws-lc only)
    - DEV: coccinelle: add a test to detect unchecked strdup()
    - BUG/MINOR: fcgi-app: handle a possible strdup() failure
    - BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak
    - MINOR: quic: convert qc_stream_desc release field to flags
    - MINOR: quic: implement function to check if STREAM is fully acked
    - BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
    - MINOR: quic: enforce ACK reception is handled in order
    - DOC: configuration: fix alphabetical ordering of {bs,fs}.aborted
    - MINOR: stconn: add a new pair of sf functions {bs,fs}.debug_str
    - MINOR: mux-h2: implement the debug string for logs
    - MINOR: mux-quic: define dump functions for QCC and QCS
    - MINOR: mux-quic: implement debug string for logs
    - MINOR: quic: dump quic_conn debug string for logs
    - MINOR: time: define tot_time structure
    - MINOR: mux-quic: measure QCS lifetime and its blocking state
    - BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn
    - BUG/MINOR: trace/quic: permit to lock on frontend/connect/session etc
    - BUG/MEDIUM: trace: fix null deref in lockon mechanism since TRACE_ENABLED()
    - BUG/MINOR: trace: automatically start in waiting mode with "start <evt>"
    - BUG/MINOR: trace/quic: make "qconn" selectable as a lockon criterion
    - BUG/MINOR: quic/trace: make quic_conn_enc_level_init() emit NEW not CLOSE
    - MINOR: trace: support setting the sink and level for all sources at once
    - MINOR: session/trace: enable very minimal session tracing
    - MEDIUM: trace: implement a "follow" mechanism
    - MINOR: trace: move the known trace context into a dedicated struct
    - MINOR: trace: add a per-source helper to pre-fill the context
    - MINOR: mux-h2: add a trace context filling helper
    - MINOR: mux-h1: add a trace context filling helper
    - MINOR: mux-quic: don't leave dangling pointer after freeing qcs->sd
    - MINOR: mux-quic: add a trace context filling helper
    - MINOR: mux-h1/trace: add a state trace on stream creation/upgrade
    - MINOR: mux-h2/trace: add a state trace on stream creation/destruction
    - MINOR: mux-h3/trace: add a state trace on stream creation/destruction
    - BUG/MINOR: quic: prevent freeze after early QCS closure
    - MINOR: server: ensure max_events_at_once > 0 in server_atomic_sync()
    - MINOR: cfgparse: add struct cfgfile to represent config in memory
    - REORG: tools: move list_append_word to cfgparse
    - MINOR: startup: adapt list_append_word to use cfgfile
    - MINOR: cfgparse: add load_cfg_in_mem
    - MINOR: cfgparse: load_cfg_in_mem: take in account file size
    - MINOR: tools: add fgets_from_mem
    - MEDIUM: startup: make read_cfg() return immediately on ENOMEM
    - MEDIUM: startup: load and parse configs from memory
    - MINOR: startup: rename readcfgfile in parse_cfg
2024-08-07 18:42:33 +02:00
Valentine Krasnobaeva
c6cfa7cb4a MINOR: startup: rename readcfgfile in parse_cfg
As readcfgfile no longer opens configuration files and reads them with fgets,
but performs only the parsing of provided data, let's rename it to parse_cfg by
analogy with read_cfg in haproxy.c.
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
5b52df4c4d MEDIUM: startup: load and parse configs from memory
Let's call load_cfg_in_ram() helper for each configuration file to load its
content in some area in memory. Adapt readcfgfile() parser function
respectively. In order to limit changes in its scope we give as an argument a
cfgfile structure, already filled in init_args() and in load_cfg_in_ram() with
file metadata and content.

Parser function (readcfgfile()) uses now fgets_from_mem() instead of standard
fgets from libc implementations.

SPOE filter parses its own configuration file, pointed by 'config' keyword in
the configuration already loaded in memory. So, let's allocate and fill for
this a supplementary cfgfile structure, which is not referenced in cfg_cfgfiles
list. This structure and the memory with content of SPOE filter configuration
are freed immediately in parse_spoe_flt(), when readcfgfile() returns.

HAProxy OpenTracing filter also uses its own configuration file. So, let's
follow the same logic as we do for SPOE filter.
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
2bb34edb0b MEDIUM: startup: make read_cfg() return immediately on ENOMEM
This commit prepares read_cfg() to call load_cfg_in_mem() helper in order to
load configuration files in memory. Before, read_cfg() calls the parser for all
files from cfg_cfgfiles list and cumulates parser's errors and memprintf's
errors in for_each loop. memprintf's errors did not stop this loop and were
accounted just after.

Now, as we plan to load configuration files in memory, we stop the loop, if
memprintf() fails, and we show an appropriate error message with ha_alert. Then
the process terminates. So not all cumulated syntax-related errors will be shown
before exit in this case and we have to stop, because we ran out of memory.

If we can't open the current file or we fail to allocate a memory to store
some configuration line, the previous behaviour is kept, process emits
appropriate alert message and exits.

If parser returns some syntax-related error on the current file, the previous
behaviour is kept as well. We cumulate such errors for all parsed files and we
check them just after the loop. All syntax-related errors for all files are
then shown as before in ha_alert messages line by line during the startup.
Then process will exit with 1.

As now cfg_cfgfiles list contains many pointers to some memory areas with
configuration files content and this content could be big, it's better to
free the list explicitly, when parsing was finished. So, let's change
read_cfg() to return some integer value to its caller init(), and let's perform
the free routine at a caller level, as cfg_cfgfiles list was initialized and
initially filled at this level.
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
007f7f2f02 MINOR: tools: add fgets_from_mem
Add fgets_from_mem() helper to read lines from configuration files, stored now
as memory chunks. In order to limit changes in the first-level parser code
(readcfgfile()), it is better to reimplement the standard fgets, i.e. to
have a fgets, which can read the serialized data line by line from some memory
area, instead of file stream, and can keep the same behaviour as libc
implementations fgets.
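
For illustration only, an fgets-like reader over a memory area could look like
the sketch below. The name mem_gets() and its signature are assumptions made
for this example; they are not HAProxy's fgets_from_mem(). Unlike libc's fgets,
this sketch simply refuses buffers smaller than two bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not HAProxy's actual fgets_from_mem():
 * copy at most size-1 bytes of the next line from [*pos, end) into buf,
 * stopping after the first '\n', and NUL-terminate it like fgets() does.
 * Returns buf, or NULL when no data is left or the buffer is too small. */
char *mem_gets(char *buf, size_t size, const char **pos, const char *end)
{
	const char *start;
	const char *nl;
	size_t len;

	assert(buf && pos && *pos && end);
	start = *pos;

	if (size < 2 || start >= end)
		return NULL;

	/* never copy more than what fits, keeping room for the NUL */
	len = (size_t)(end - start);
	if (len > size - 1)
		len = size - 1;

	/* stop right after the newline when there is one in range */
	nl = memchr(start, '\n', len);
	if (nl)
		len = (size_t)(nl - start) + 1;

	memcpy(buf, start, len);
	buf[len] = '\0';
	*pos = start + len;
	return buf;
}
```

The caller keeps the read position itself, which plays the role of the FILE
stream offset in the libc version.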
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
03e63b98ca MINOR: cfgparse: load_cfg_in_mem: take in account file size
Let's take in account the given file size, when it is reported via stat.

It's very convenient for large configuration files, as this allows to
perform only the one memory allocation call for precisely the needed file size.
This also allows to perform only the one call to fread().

We need to provide to fread() file_stat.st_size + 1 to be able to grab EOF.
Like this it sets feof(f)=1 flag and this allows to exit from the loop
immediately, just after fread call.

If /dev/stdin or /dev/null is provided as a file, we continue to read the
configuration chunk by chunk, stat doesn't report the size.
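
A minimal sketch of this idea is shown below, assuming a POSIX system. The
function load_in_mem() and the 4096-byte fallback chunk are assumptions made
for this example and do not reproduce load_cfg_in_mem().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical sketch, not HAProxy's load_cfg_in_mem(): read a whole stream
 * into a malloc'ed buffer. When fstat() reports a regular-file size, a single
 * allocation of st_size + 1 bytes is made so that one fread() both fetches
 * the data and reaches EOF. Otherwise the buffer grows chunk by chunk.
 * Returns the number of bytes read, or -1 on error. */
long load_in_mem(FILE *f, char **out)
{
	struct stat st;
	size_t chunk = 4096, cap = 0, len = 0;
	char *buf = NULL;

	assert(f && out);

	if (fstat(fileno(f), &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0)
		chunk = (size_t)st.st_size + 1;

	do {
		char *tmp = realloc(buf, cap + chunk);

		if (!tmp) {
			free(buf);
			return -1;
		}
		buf = tmp;
		cap += chunk;
		/* asking for one byte more than the file size sets feof() */
		len += fread(buf + len, 1, cap - len, f);
	} while (!feof(f) && !ferror(f));

	if (ferror(f)) {
		free(buf);
		return -1;
	}
	*out = buf;
	return (long)len;
}
```

With a regular file the loop body runs once; with a pipe or a device it keeps
growing the buffer until EOF is seen.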
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
5b9ed6e4be MINOR: cfgparse: add load_cfg_in_mem
Add load_cfg_in_mem() helper, which allows to store the content of a given file
in memory.
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
bafb0ce272 MINOR: startup: adapt list_append_word to use cfgfile
list_append_word() helper was used before only to chain configuration file names
in a list. As now we start to use cfgfile structure which represents entire file
in memory and its metadata, let's adapt this helper to use this structure and
let's rename it to list_append_cfgfile().

Adapt functions, which process configuration files and directories to use
cfgfile structure and list_append_cfgfile() instead of wordlist.
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
39f2a19620 REORG: tools: move list_append_word to cfgparse
Let's move list_append_word to cfgparse.c as it is used only to fill
cfg_cfgfiles list with configuration file names.
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
70b842e847 MINOR: cfgparse: add struct cfgfile to represent config in memory
This and following commits serve to prepare loading configuration files in
memory, before parsing them, as we may need to parse some parts of
configuration in different moments of the startup sequence. This is a case of
the new master-worker initialization process. Here we need to read at first
only the global and the program sections and only after some steps
(forking worker, etc) the rest of the configuration.

Add a new structure cfgfile to keep configuration files metadata and content,
loaded somewhere in a memory. Instances of filled cfgfile structures could be
chained in a list, as the order in which they were loaded is important.
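
As a rough illustration of such an ordered chain, the sketch below uses a plain
singly linked list. The type cfg_entry, its fields and cfg_list_append() are
assumptions made for this example; HAProxy's cfgfile structure and its list
helpers differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loaded configuration file. */
struct cfg_entry {
	const char *filename;    /* where the content came from */
	char *content;           /* file content held in memory */
	size_t size;             /* number of bytes in <content> */
	struct cfg_entry *next;  /* next file, in load order */
};

/* Append at the tail so that iterating from the head follows load order.
 * Returns the new entry, or NULL on allocation failure. */
struct cfg_entry *cfg_list_append(struct cfg_entry **head, const char *filename)
{
	struct cfg_entry *e;

	assert(head);
	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	e->filename = filename;

	while (*head)
		head = &(*head)->next;
	*head = e;
	return e;
}
```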
2024-08-07 18:41:41 +02:00
Aurelien DARRAGON
a6d1eb8f5d MINOR: server: ensure max_events_at_once > 0 in server_atomic_sync()
In 8f1fd96 ("BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once
event miss and leak"), we added a comment saying that
tune.events.max-events-at-once is assumed to be strictly positive.

It is so because the keyword parser forces values between 1 and 10000:
we don't want less than 1 because it wouldn't make any sense, and 10k
max because beyond that we could create contention in server_atomic_sync()

Now as the above commit implements a do..while it heavily relies on the
fact that the budget is at least 1. Upon soft-stop, we break away from
the loop without decrementing the budget. With all that in mind, it is
safe to assume that the 'remain' counter will only fall to 0 if the task
runs out of budget while doing work, in which case the task still exists
and must be rescheduled.

As seen in GH #2667 this assumption was ambiguous, so let's make it
official by adding a pair of BUG_ON() that make it explicit that it
works because remain 'cannot' be 0 unless the entire budget was
consumed.

No backport needed.
2024-08-07 18:31:35 +02:00
Amaury Denoyelle
3ef1ee477d BUG/MINOR: quic: prevent freeze after early QCS closure
A connection freeze may occur if a QCS is released before transmitting
any data. This can happen when an error is detected early by the stream,
for example during HTTP response headers encoding, forcing the whole
connection closure.

In this case, a connection error is registered by the QUIC MUX to the
lower layer. MUX is then released and xprt layer is notified to prepare
CONNECTION_CLOSE emission. However, this is prevented because quic_conn
streams tree is not empty as it contains the qc_stream_desc previously
attached to the failed QCS instance. The connection will freeze until
QUIC idle timeout.

This situation is caused by an omission during qc_stream_desc release
operation. In the described situation, qc_stream_desc current buffer is
empty and can thus be removed, which is the purpose of this patch. This
unblocks this previously failed situation, with qc_stream_desc removal
from quic_conn tree.

This issue can be reproduced by modifying H3/QPACK code to return an
early error during HEADERS response processing.

This must be backported up to 2.6, after a period of observation.
2024-08-07 18:14:29 +02:00
Willy Tarreau
d5da87b5dc MINOR: mux-h3/trace: add a state trace on stream creation/destruction
Logging below the developer level doesn't always yield very convenient
traces as we don't know well where streams are allocated nor released.
Let's just make that more explicit by using state-level traces for these
important steps.
2024-08-07 16:02:59 +02:00
Willy Tarreau
23417ab9d4 MINOR: mux-h2/trace: add a state trace on stream creation/destruction
Logging below the developer level doesn't always yield very convenient
traces as we don't know well where streams are allocated nor released.
Let's just make that more explicit by using state-level traces for these
important steps.
2024-08-07 16:02:59 +02:00
Willy Tarreau
cc12d1b253 MINOR: mux-h1/trace: add a state trace on stream creation/upgrade
Logging below the developer level doesn't always yield very convenient
traces as we don't know well where streams are allocated nor released.
Let's just make that more explicit by using state-level traces. Note that
h1s destruction was already logged as closing connection or switching
to idle mode.
2024-08-07 16:02:59 +02:00
Willy Tarreau
6191de6aa6 MINOR: mux-quic: add a trace context filling helper
This helper is able to find a connection, a session, a stream, or a
frontend from its args.
2024-08-07 16:02:59 +02:00
Willy Tarreau
b2cede590b MINOR: mux-quic: don't leave dangling pointer after freeing qcs->sd
In qcs_free() we're calling a few other functions after releasing
qcs->sd. None of them make use of it for now but with traces that
will change. Make sure to clear qcs->sd after releasing it.
2024-08-07 16:02:59 +02:00
Willy Tarreau
adfe0a30e1 MINOR: mux-h1: add a trace context filling helper
This helper is able to find a connection, a session, a stream, a
frontend or a backend from its args.
2024-08-07 16:02:59 +02:00
Willy Tarreau
6c6ef5ae12 MINOR: mux-h2: add a trace context filling helper
This helper is able to find a connection, a session, a stream, a
frontend or a backend from its args.

Note that this required to always make sure that h2s->sess is reset on
allocation because it's normally initialized later for backend streams,
and producing traces between the two could pre-fill a bad pointer in
the trace_ctx.
2024-08-07 16:02:59 +02:00
Willy Tarreau
10c8baca44 MINOR: trace: add a per-source helper to pre-fill the context
Now sources which want to do it can provide a helper that can pre-fill
some fields in the context based on their knowledge (e.g. mux streams).
2024-08-07 16:02:59 +02:00
Willy Tarreau
7d55a70f5a MINOR: trace: move the known trace context into a dedicated struct
We now have a trace_ctx to hold the sess, conn, qc, stream and so on.
This will allow us to pass it across layers so that other helpers can
help fill them.

Ideally it should be passed as an argument to __trace_enabled() by
__trace() so that it can be passed back to the trace callback. But
it seems that trace callbacks are smart enough to figure all their
info when they need them.
2024-08-07 16:02:59 +02:00
Willy Tarreau
d465610ec3 MEDIUM: trace: implement a "follow" mechanism
With "follow" from one source to another, it becomes possible for a
source to automatically follow another source's tracked pointer. The
best example is the session:
  - the "session" source is enabled and has a "lockon session"
    -> its lockon_ptr is equal to the session when valid
  - other sources (h1,h2,h3 etc) are configured for "follow session"
    and will then automatically check if session's lockon_ptr matches
    its own session, in which case tracing will be enabled for that
    trace (no state change).

It's not necessary to start/pause/stop traces when using this, only
"follow" followed by a source with lockon enabled is needed. Some
combinations might work better than others. At the moment the session
is almost never known from the backend, but this may improve.

The meta-source "all" is supported for the follower so that all sources
will follow the tracked one.
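
The matching rule described above can be pictured with the simplified sketch
below. The structure trace_src and the function trace_follow_match() are
assumptions made for this example and are much smaller than HAProxy's real
trace source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced trace source. */
struct trace_src {
	const void *lockon_ptr;          /* pointer currently tracked, if any */
	const struct trace_src *follow;  /* source whose lockon_ptr we follow */
};

/* Returns non-zero when <src> follows another source whose tracked pointer
 * is valid and equals the session seen by <src> for the current trace. */
int trace_follow_match(const struct trace_src *src, const void *sess)
{
	assert(src);
	return src->follow &&
	       src->follow->lockon_ptr &&
	       src->follow->lockon_ptr == sess;
}
```

In this picture the "session" source owns the tracked pointer, and a mux
source only compares its own session against it without changing state.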
2024-08-07 16:02:59 +02:00
Willy Tarreau
abb07af67e MINOR: session/trace: enable very minimal session tracing
By having traces at the session level, it becomes possible to start
traces on session creation and pause them on session end. Doing so
will soon open new possibilities to synchronize multiple traces.
2024-08-07 16:02:59 +02:00
Willy Tarreau
d2a49de9c7 MINOR: trace: support setting the sink and level for all sources at once
It's extremely painful to have to set "trace <src> sink buf1" for all
sources, then to do the same for "level developer" (for example). Let's
have a possibility via a meta-source "all" to apply the change to all
sources at once. This currently supports level and sink, which are not
dependent on the source, this is a good start.
2024-08-07 16:02:59 +02:00
Willy Tarreau
6bf50dfccc BUG/MINOR: quic/trace: make quic_conn_enc_level_init() emit NEW not CLOSE
The event emitted by this trace was of type CLOSE instead of NEW, which
would sometimes temporarily pause a started trace.

This can be backported to 3.0, probably 2.6.
2024-08-07 16:02:59 +02:00
Willy Tarreau
7a22fbd453 BUG/MINOR: trace/quic: make "qconn" selectable as a lockon criterion
The test was performed but there's no way to set the option! Let's
just add "qconn" to select the quic conn when the source supports it.

This can be backported at least to 3.0, probably 2.6.
2024-08-07 16:02:59 +02:00
Willy Tarreau
0406efe9ad BUG/MINOR: trace: automatically start in waiting mode with "start <evt>"
The doc clearly says that "start <evt>" should leave the trace in pause
mode until the indicated event appears. However it's not what's happening,
the state is not changed until one command uses "now", so it's typically
needed to configure the events with "start <evt>" then enable the waiting
mode using "pause now". This is counter-intuitive and does not match the
doc, so let's fix it so that "start <evt>" switches from stopped to waiting
as long as at least one event is enabled.

This can be backported to all versions.
2024-08-07 16:02:59 +02:00
Willy Tarreau
b5df6b5a31 BUG/MEDIUM: trace: fix null deref in lockon mechanism since TRACE_ENABLED()
When calling TRACE_ENABLED(), which is called by TRACE_PRINTF(), we pass
a NULL plockptr to __trace_enabled(). This argument is used when lockon
is active, and may update the pointer. This is an overlook which also
broke the lockon mechanism because now for calls from __trace(), it
dereferences a pointer pointing to NULL, and never updates it due to the
broken condition, so that trace() never sets up src->lockon_ptr.

The bug was introduced in 2.8 by commit 8f9a9704bb ("MINOR: trace: add a
TRACE_ENABLED() macro to determine if a trace is active"), so the fix must
be backported there.
2024-08-07 16:02:59 +02:00
Willy Tarreau
88a752ca78 BUG/MINOR: trace/quic: permit to lock on frontend/connect/session etc
These ones were not proposed in the list of trackable elements. Note
that this depends on previous commit:

    BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn

This should be backported to at least 3.0, maybe even 2.6.
2024-08-07 16:02:59 +02:00
Willy Tarreau
aa1915a9f5 BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn
In __trace_enabled(), a quic_conn was detected, but it was not possible
to derive the connection nor the session from it, which was quite limiting
in terms of ability to track a same instance.

This should be backported to at least 3.0, maybe even 2.6.
2024-08-07 16:02:59 +02:00
Amaury Denoyelle
9f829ea3f3 MINOR: mux-quic: measure QCS lifetime and its blocking state
Reuse newly defined tot_time structure to measure various values related
to a QCS lifetime.

First, a timer is used to account for the total QCS lifetime. Then, two
other timers are used to account the total time during which Tx from
stream layer to MUX is blocked, either on lack of buffer or due to
flow-control.

These three timers are reported in qmux_dump_qcs_info(). Thus, they are
available in traces and for QUIC MUX debug string sample.
2024-08-07 15:40:52 +02:00
Amaury Denoyelle
a6e2523ca1 MINOR: time: define tot_time structure
Define a new utility type tot_time. Its purpose is to be able to account
elapsed time across multiple periods. Functions are defined to easily
start and stop measures, and return the current value.
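
A possible shape for such an accumulating timer is sketched below. The names
acc_time, acc_time_start(), acc_time_stop() and acc_time_read() are assumptions
made for this example, as is the use of caller-supplied millisecond
timestamps; the real tot_time API may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulating timer. */
struct acc_time {
	uint32_t start;   /* timestamp at which the current period began */
	uint32_t total;   /* sum of all completed periods */
	int running;      /* non-zero while a period is being measured */
};

/* Begin a new period unless one is already running. */
void acc_time_start(struct acc_time *t, uint32_t now)
{
	assert(t);
	if (!t->running) {
		t->start = now;
		t->running = 1;
	}
}

/* Close the running period, if any, and add it to the total. */
void acc_time_stop(struct acc_time *t, uint32_t now)
{
	assert(t);
	if (t->running) {
		t->total += now - t->start;
		t->running = 0;
	}
}

/* Completed periods plus the one still running, if any. */
uint32_t acc_time_read(const struct acc_time *t, uint32_t now)
{
	assert(t);
	return t->total + (t->running ? now - t->start : 0);
}
```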
2024-08-07 15:40:52 +02:00
Amaury Denoyelle
663416b4ef MINOR: quic: dump quic_conn debug string for logs
Define a new xprt_ops callback named dump_info. This can be used to
extend MUX debug string with infos from the lower layer.

Implement dump_info for QUIC stack. For now, only minimal info are
reported : bytes in flight and size of the sending window. This should
allow to detect if the congestion controller is fine. These info are
reported via QUIC MUX debug string sample.
2024-08-07 15:40:52 +02:00
Amaury Denoyelle
630fa53c51 MINOR: mux-quic: implement debug string for logs
Implement MUX_SCTL_DBG_STR for QUIC MUX. This returns info for the
current QCS and QCC instances, reusing qmux_dump_qc{c,s}_info functions
already used for traces, and the connection flags.

This stream operation is useful for debug string sample support.
2024-08-07 15:40:52 +02:00
Amaury Denoyelle
eb4dfa3b36 MINOR: mux-quic: define dump functions for QCC and QCS
Extract trace code to dump QCC and QCS instances into dedicated
functions named qmux_dump_qc{c,s}_info(). This will allow to easily
print QCC/QCS infos outside of traces.
2024-08-07 15:40:52 +02:00
Willy Tarreau
490cb16d3a MINOR: mux-h2: implement the debug string for logs
Now it permits to have this for a front and a back:

<134>Jul 30 19:32:53 haproxy[24405]: 127.0.0.1:64860 [30/Jul/2024:19:32:53.732] test2 test2/s1 0/0/0/0/0 200 130 - - ---- 2/1/0/0/0 0/0 "GET /blah HTTP/2.0"  h2s.id=1 .st=CLO .flg=0x7003 .rxbuf=0@(nil)+0/0 .sc=0x1e03fb0(.flg=0x00034482 .app=0x1e04020) .sd=0x1e03f30(.flg=0x50405601) .subs=(nil) h2c.st0=FRH .err=0 .maxid=1 .lastid=-1 .flg=0x100e00 .nbst=0 .nbsc=1, .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=1 .dsi=1 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=(nil) conn.flg=0x80000300
<134>Jul 30 19:32:53 haproxy[24405]: 127.0.0.1:65246 [30/Jul/2024:19:32:53.732] test1 test1/s1 0/0/0/0/0 200 130 - - ---- 2/1/0/0/0 0/0 "GET /blah HTTP/1.1"  h2s.id=1 .st=CLO .flg=0x7003 .rxbuf=0@(nil)+0/0 .sc=0x1dfc7b0(.flg=0x0006d01b .app=0x1c65fe0) .sd=0x1dfc820(.flg=0x1040ca01) .subs=(nil) h2c.st0=FRH .err=0 .maxid=1 .lastid=-1 .flg=0x108e00 .nbst=0 .nbsc=1, .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=1 .dsi=1 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=(nil) conn.flg=0x000300

Just with this in the front and back proxies respectively:
  log-format "$HAPROXY_HTTP_LOG_FMT %[bs.debug_str(15)]"
  log-format "$HAPROXY_HTTP_LOG_FMT %[fs.debug_str(15)]"

For now the mux only implements muxs, muxc, conn. Xprt is ignored.
2024-08-07 14:07:41 +02:00
Willy Tarreau
921e04bf87 MINOR: stconn: add a new pair of sf functions {bs,fs}.debug_str
These are passed to the underlying mux to retrieve debug information
at the mux level (stream/connection) as a string that's meant to be
added to logs.

The API is quite complex just because we can't pass any info to the
bottom function. So we construct a union and pass the argument as an
int, and expect the callee to fill that with its buffer in return.

Most likely the mux->ctl and ->sctl API should be reworked before
the release to simplify this.

The functions take an optional argument that is a bit mask of the
layers to dump:
  muxs=1
  muxc=2
  xprt=4
  conn=8
  sock=16

The default (0) logs everything available.
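
To make the mask semantics concrete, here is a small sketch. The bit values
come from the list above; the enum and the function dbg_layer_wanted() are
hypothetical names chosen for this example.

```c
#include <assert.h>

/* Layer bits as listed in this commit message; names are hypothetical. */
enum dbg_layer {
	DBG_MUXS = 1,
	DBG_MUXC = 2,
	DBG_XPRT = 4,
	DBG_CONN = 8,
	DBG_SOCK = 16
};

/* A mask of 0 means "dump every layer available". */
int dbg_layer_wanted(unsigned int mask, enum dbg_layer layer)
{
	return mask == 0 || (mask & (unsigned int)layer) != 0;
}
```

For instance a mask of 15 selects muxs, muxc, xprt and conn, but not sock.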
2024-08-07 14:07:41 +02:00
Willy Tarreau
b681a9e488 DOC: configuration: fix alphabetical ordering of {bs,fs}.aborted
These must be before {bs,fs}.id, not after. Should be backported wherever
068ce2d5d2 ("MINOR: stconn: Add samples to retrieve about stream aborts")
is (normally 3.0).
2024-08-07 14:07:41 +02:00
Amaury Denoyelle
b2282082dd MINOR: quic: enforce ACK reception is handled in order
Add a new BUG_ON() in qc_stream_desc_ack(). It ensures that
acknowledgements are always notified in order. This is because out-of-order
ACKs cannot be handled by qc_stream_desc layer which does not support
gap in STREAM sent data.

Prior to this fix, out-of-order ACKs are simply ignored without any
error. This currently cannot happen thanks to careful
qc_stream_desc_ack() invocation. If this assumption is broken in the
future by inattention, this would cause loss of ACK notification which
will prevent qc_stream_desc release.
2024-08-07 11:08:20 +02:00
Amaury Denoyelle
e177cf341c BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
STREAM frames have dedicated handling on retransmission. A special check
is done to remove data already acked in case of duplicated frames, thus
only unacked data are retransmitted.

This handling is faulty in case of an empty STREAM frame with FIN set.
On retransmission, this frame does not cover any unacked range as it is
empty and is thus discarded. This may cause the transfer to freeze with
the client waiting indefinitely for the FIN notification.

To handle retransmission of empty FIN STREAM frame, qc_stream_desc layer
has been extended. A new flag QC_SD_FL_WAIT_FOR_FIN is set by MUX QUIC
when FIN has been transmitted. If set, it prevents qc_stream_desc to be
freed until FIN is acknowledged. On retransmission side,
qc_stream_frm_is_acked() has been updated. It now reports false if
FIN bit is set on the frame and qc_stream_desc has QC_SD_FL_WAIT_FOR_FIN
set.
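
The resulting decision can be summarized by the simplified sketch below. The
types sd_state and fin_frm, the flag SD_FL_WAIT_FOR_FIN and the function
fin_frm_is_acked() are stand-ins invented for this example; they only mimic
the rule stated above and omit the rest of the real function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QC_SD_FL_WAIT_FOR_FIN. */
#define SD_FL_WAIT_FOR_FIN 0x1U

/* Hypothetical, reduced stream descriptor. */
struct sd_state {
	uint64_t ack_offset;  /* contiguous bytes acked from stream start */
	unsigned int flags;
};

/* Hypothetical, reduced STREAM frame covering [offset, offset + len). */
struct fin_frm {
	uint64_t offset;
	uint64_t len;
	int fin;              /* non-zero when the FIN bit is set */
};

/* Returns 1 if the frame can be dropped from retransmission, 0 otherwise. */
int fin_frm_is_acked(const struct fin_frm *frm, const struct sd_state *sd)
{
	assert(frm && sd);

	/* a FIN still awaiting its ACK must be resent, even with no data */
	if (frm->fin && (sd->flags & SD_FL_WAIT_FOR_FIN))
		return 0;

	return frm->offset + frm->len <= sd->ack_offset;
}
```

Without the first test, an empty frame located exactly at the acked offset is
always considered acked, which is the faulty case described in this message.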

This must be backported up to 2.6. However, this modifies heavily
critical section for ACK handling and retransmission. As such, it must
be backported only after a period of observation.

This issue can be reproduced by using the following socat command as
server to add delay between the response and connection closure :
  $ socat TCP-LISTEN:<port>,fork,reuseaddr,crlf SYSTEM:'echo "HTTP/1.1 200 OK"; echo ""; sleep 1;'

On the client side, ngtcp2 can be used to simulate packet drop. Without
this patch, connection will be interrupted on QUIC idle timeout or
haproxy client timeout with ERR_DRAINING on ngtcp2 :
  $ ngtcp2-client --exit-on-all-streams-close -r 0.3 <host> <port> "http://<host>:<port>/?s=32o"

Alternatively to ngtcp2 random loss, an extra haproxy patch can also be
used to force skipping the emission of the empty STREAM frame :

diff --git a/include/haproxy/quic_tx-t.h b/include/haproxy/quic_tx-t.h
index efbdfe687..1ff899acd 100644
--- a/include/haproxy/quic_tx-t.h
+++ b/include/haproxy/quic_tx-t.h
@@ -26,6 +26,8 @@ extern struct pool_head *pool_head_quic_cc_buf;
 /* Flag a sent packet as being probing with old data */
 #define QUIC_FL_TX_PACKET_PROBE_WITH_OLD_DATA (1UL << 5)

+#define QUIC_FL_TX_PACKET_SKIP_SENDTO (1UL << 6)
+
 /* Structure to store enough information about TX QUIC packets. */
 struct quic_tx_packet {
 	/* List entry point. */
diff --git a/src/quic_tx.c b/src/quic_tx.c
index 2f199ac3c..2702fc9b9 100644
--- a/src/quic_tx.c
+++ b/src/quic_tx.c
@@ -318,7 +318,7 @@ static int qc_send_ppkts(struct buffer *buf, struct ssl_sock_ctx *ctx)
 		tmpbuf.size = tmpbuf.data = dglen;

 		TRACE_PROTO("TX dgram", QUIC_EV_CONN_SPPKTS, qc);
-		if (!skip_sendto) {
+		if (!skip_sendto && !(first_pkt->flags & QUIC_FL_TX_PACKET_SKIP_SENDTO)) {
 			int ret = qc_snd_buf(qc, &tmpbuf, tmpbuf.data, 0, gso);
 			if (ret < 0) {
 				if (gso && ret == -EIO) {
@@ -354,6 +354,7 @@ static int qc_send_ppkts(struct buffer *buf, struct ssl_sock_ctx *ctx)
 					qc->cntrs.sent_bytes_gso += ret;
 			}
 		}
+		first_pkt->flags &= ~QUIC_FL_TX_PACKET_SKIP_SENDTO;

 		b_del(buf, dglen + QUIC_DGRAM_HEADLEN);
 		qc->bytes.tx += tmpbuf.data;
@@ -2066,6 +2067,17 @@ static int qc_do_build_pkt(unsigned char *pos, const unsigned char *end,
 				continue;
 			}

+			switch (cf->type) {
+			case QUIC_FT_STREAM_8 ... QUIC_FT_STREAM_F:
+				if (!cf->stream.len && (qc->flags & QUIC_FL_CONN_TX_MUX_CONTEXT)) {
+					TRACE_USER("artificially drop packet with empty STREAM frame", QUIC_EV_CONN_TXPKT, qc);
+					pkt->flags |= QUIC_FL_TX_PACKET_SKIP_SENDTO;
+				}
+				break;
+			default:
+				break;
+			}
+
 			quic_tx_packet_refinc(pkt);
 			cf->pkt = pkt;
 		}
2024-08-07 11:03:32 +02:00
Amaury Denoyelle
714009b7bc MINOR: quic: implement function to check if STREAM is fully acked
When a STREAM frame is retransmitted, a check is performed to remove
range of data already acked from it. This is useful when STREAM frames
are duplicated and split to cover different data ranges. The newly
retransmitted frame contains only unacked data.

This process is performed similarly in qc_dup_pkt_frms() and
qc_build_frms(). Refactor the code into a new function named
qc_stream_frm_is_acked(). It returns true if frame data are already
fully acked and retransmission can be avoided. If only a partial range
of data is acknowledged, frame content is updated to only cover the
unacked data.
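
The trimming logic can be sketched as follows, assuming that acknowledged data
always form a contiguous prefix of the stream. The type strm_frm and the
function strm_frm_is_acked() are simplified stand-ins for this example, not
the actual qc_stream_frm_is_acked().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced STREAM frame covering [offset, offset + len). */
struct strm_frm {
	uint64_t offset;
	uint64_t len;
};

/* <ack_offset> is the number of contiguous bytes already acknowledged from
 * the start of the stream. Returns 1 if the frame is fully acked and needs no
 * retransmission. Otherwise removes any acked prefix so that the frame covers
 * only unacked data, and returns 0. */
int strm_frm_is_acked(struct strm_frm *frm, uint64_t ack_offset)
{
	assert(frm);

	if (frm->offset + frm->len <= ack_offset)
		return 1;

	if (frm->offset < ack_offset) {
		uint64_t diff = ack_offset - frm->offset;

		frm->offset += diff;
		frm->len -= diff;
	}
	return 0;
}
```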

This patch does not have any functional change. However, it simplifies
retransmission for STREAM frames. Also, it will be reused to fix
retransmission for empty STREAM frames with FIN set from the following
patch :
  BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM

As such, it must be backported prior to it.
2024-08-07 10:57:10 +02:00
Amaury Denoyelle
bb9ac256a1 MINOR: quic: convert qc_stream_desc release field to flags
qc_stream_desc had a field <release> used as a boolean. Convert it with
a new <flags> field and QC_SD_FL_RELEASE value as equivalent.

The purpose of this patch is to be able to extend qc_stream_desc by
adding newer flags values. This patch is required for the following
patch
  BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM

As such, it must be backported prior to it.
2024-08-06 18:00:17 +02:00
Aurelien DARRAGON
8f1fd96d17 BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak
An issue has been introduced with cd99440 ("BUG/MAJOR: server/addr: fix
a race during server addr:svc_port updates").

Indeed, in the above commit we implemented the atomic_sync task which is
responsible for consuming pending server events to apply the changes
atomically. For now only server's addr updates are concerned.

To prevent the task from causing contention, a budget was assigned to it.
It can be controlled with the global tunable
'tune.events.max-events-at-once': the task may not process more than this
number of events at once.

However, a bug was introduced with this budget logic: each time the task
has to be interrupted because it runs out of budget, we reschedule the
task to finish where it left off, but the current event which was already
removed from the queue wasn't processed yet. This means that this pending
event (each tune.events.max-events-at-once) is effectively lost.

When the atomic_sync task deals with large number of concurrent events,
this bug has 2 known consequences: first a server's addr/port update
will be lost every 'tune.events.max-events-at-once'. This can of course
cause reliability issues because if the event is not republished
periodically, the server could stay in a stale state for indefinite amount
of time. This is the case when the DNS server flaps for instance: some
servers may not come back UP after the incident as described in GH #2666.

Another issue is that the lost event was not cleaned up, resulting in a
small memory leak. So in the end, it means that the bug is likely to
cause more and more degradation over time until haproxy is restarted.

As a workaround, 'tune.events.max-events-at-once' may be set to the
maximum number of events expected per batch. Note however that this value
cannot exceed 10 000, otherwise it could cause the watchdog to trigger due
to the task being busy for too long and preventing other threads from
making any progress. Setting higher values may not be optimal for common
workloads so it should only be used to mitigate the bug while waiting for
this fix.

Since tune.events.max-events-at-once defaults to 100, this bug only
affects configs that involve more than 100 servers whose addr:port
properties are likely to be updated at the same time (batched updates
from cli, lua, dns..)

To fix the bug, we move the budget check after the current event is fully
handled. For that we went from a basic 'while' to 'do..while' loop as we
assume from the config that 'tune.events.max-events-at-once' cannot be 0.
While at it, we reschedule the task once thread isolation ends (it was not
required to perform the reschedule while under isolation) to give the hand
back faster to waiting threads.
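
The difference between the two loop shapes can be reproduced with the toy
model below. The queue evq and the functions drain_buggy() and drain_fixed()
are invented for this example and only mimic the ordering of the dequeue,
budget check and processing steps; they are not the code of
server_atomic_sync().

```c
#include <assert.h>

/* Hypothetical event queue reduced to two counters. */
struct evq {
	int pending;  /* events still queued */
	int handled;  /* events fully processed */
};

/* Dequeue one event; returns 0 when the queue is empty. */
static int evq_pop(struct evq *q)
{
	if (!q->pending)
		return 0;
	q->pending--;
	return 1;
}

/* Buggy shape: the event is dequeued before the budget check, so the event
 * popped when the budget reaches zero is never handled.
 * Returns 1 when the task must be rescheduled. */
int drain_buggy(struct evq *q, int budget)
{
	while (evq_pop(q)) {
		if (!budget--)
			return 1;   /* popped event is lost here */
		q->handled++;
	}
	return 0;
}

/* Fixed shape: fully handle the current event, then check the budget.
 * This relies on the budget being at least 1. */
int drain_fixed(struct evq *q, int budget)
{
	assert(budget > 0);
	do {
		if (!evq_pop(q))
			return 0;   /* queue empty, nothing left to do */
		q->handled++;
	} while (--budget);
	return 1;                   /* out of budget, reschedule */
}
```

With 5 queued events and a budget of 2, repeated calls to the buggy variant
handle only 4 events, while the fixed variant handles all 5.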

This patch should be backported up to 2.9 with cd99440. It should fix
GH #2666.
2024-08-06 16:41:37 +02:00
Ilia Shipitsin
aaaacaaf4b BUG/MINOR: fcgi-app: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to 2.2.
2024-08-06 08:21:49 +02:00
Ilia Shipitsin
661e1db826 DEV: coccinelle: add a test to detect unchecked strdup()
The coccinelle test "unchecked-strdup.cocci" detects various cases of
unchecked strdup().
2024-08-06 08:21:49 +02:00
Frederic Lecaille
eb1a097a66 BUG/MINOR: quic: Too short datagram during packet building failures (aws-lc only)
This issue was reported by Ilya (@Chipitsine) when building haproxy against
aws-lc in GH #2663 where handshakeloss and handshakecorruption interop tests could
lead haproxy to crash after having built too short datagrams:

FATAL: bug condition "first_pkt->type == QUIC_PACKET_TYPE_INITIAL && (first_pkt->flags & (1UL << 0)) && length < 1200" matched at src/quic_tx.c:163
call trace(13):
| 0x55f4ee4dcc02 [ba d9 00 00 00 48 8d 35]: main-0x195bf2
| 0x55f4ee4e3112 [83 3d 2f 16 35 00 00 0f]: qc_send+0x11f3/0x1b5d
| 0x55f4ee4e9ab4 [85 c0 0f 85 00 f6 ff ff]: quic_conn_io_cb+0xab1/0xf1c
| 0x55f4ee6efa82 [48 c7 c0 f8 55 ff ff 64]: run_tasks_from_lists+0x173/0x9c2
| 0x55f4ee6f05d3 [8b 7d a0 29 c7 85 ff 0f]: process_runnable_tasks+0x302/0x6e6
| 0x55f4ee671bb7 [83 3d 86 72 44 00 01 0f]: run_poll_loop+0x6e/0x57b
| 0x55f4ee672367 [48 8b 1d 22 d4 1d 00 48]: main-0x48d
| 0x55f4ee6755e0 [b8 00 00 00 00 e8 08 61]: main+0x2dec/0x335d

This could happen after Handshake packet building failures which follow a successful
Initial packet into the same datagram. In this case, the datagram could be emitted
with a too short length (<1200 bytes).

To fix this, store the datagram only if the first packet is not an Initial packet
or if its length is big enough (>=1200 bytes).
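
Expressed as a predicate, the rule reads as in the sketch below. The enum
pkt_type, the macro INITIAL_DGRAM_MIN_LEN and the function
dgram_can_be_stored() are names invented for this example; only the 1200-byte
threshold comes from the BUG_ON() shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum datagram length taken from the BUG_ON() condition above. */
#define INITIAL_DGRAM_MIN_LEN 1200

/* Hypothetical, reduced set of packet types. */
enum pkt_type { PKT_INITIAL, PKT_HANDSHAKE, PKT_SHORT };

/* A datagram whose first packet is an Initial one may only be stored for
 * sending once it has reached the minimum length. */
int dgram_can_be_stored(enum pkt_type first, size_t dglen)
{
	return first != PKT_INITIAL || dglen >= INITIAL_DGRAM_MIN_LEN;
}
```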

Must be backported as far as 2.6.
2024-08-05 13:40:51 +02:00
Ilia Shipitsin
7fc52032e3 CI: harden NetBSD builds by ERR=1
Add ERR=1 build option to the NetBSD build from github.
2024-08-05 08:49:19 +02:00
Ilia Shipitsin
15d47eda37 CI: add weekly QUIC Interop regression against AWS-LC
currently only quic-go and picoquic clients are enabled.
Tests will be run weekly.
2024-08-05 08:46:49 +02:00
Frederic Lecaille
e12620a8a9 BUG/MINOR: quic: Too short datagram during 0-RTT handshakes (aws-lc only)
By "aws-lc only", one means that this bug was first revealed by aws-lc stack.
This does not mean it will not appear for new versions of other TLS stacks which
have never revealed this bug.

This bug was reported by Ilya (@chipitsine) in GH #2657 where some QUIC interop
tests (resumption, zerortt) could lead to crash with haproxy compiled against
aws-lc TLS stack. These crashes were triggered by this BUG_ON() which detects
that too short datagrams with at least one ack-eliciting Initial packet inside
could be built.

  <0>2024-07-31T15:13:42.562717+02:00 [01|quic|5|quic_tx.c:739] qc_prep_pkts():
  next encryption level : qc@0x61d000041080 idle_timer_task@0x60d000006b80 flags=0x6000058

  FATAL: bug condition "first_pkt->type == QUIC_PACKET_TYPE_INITIAL && (first_pkt->flags & (1UL << 0)) && length < 1200" matched at src/quic_tx.c:163
  call trace(12):
  | 0x563ea447bc02 [ba d9 00 00 00 48 8d 35]: main-0x1958ce
  | 0x563ea4482703 [e9 73 fe ff ff ba 03 00]: qc_send+0x17e4/0x1b5d
  | 0x563ea4488ab4 [85 c0 0f 85 00 f6 ff ff]: quic_conn_io_cb+0xab1/0xf1c
  | 0x563ea468e6f9 [48 c7 c0 f8 55 ff ff 64]: run_tasks_from_lists+0x173/0x9c2
  | 0x563ea468f24a [8b 7d a0 29 c7 85 ff 0f]: process_runnable_tasks+0x302/0x6e6
  | 0x563ea4610893 [83 3d aa 65 44 00 01 0f]: run_poll_loop+0x6e/0x57b
  | 0x563ea4611043 [48 8b 1d 46 c7 1d 00 48]: main-0x48d
  | 0x7f64d05fb609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609
  | 0x7f64d0520353 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e

That said, everything was correctly done by qc_prep_pkts() to prevent such a case.
But this relied on the hypothesis that the list of encryption levels it used
was always built in the same order as follows for 0-RTT sessions:

    initial, early-data, handshake, application

But this order is determined by the order in which the TLS stack derives the
secrets for these encryption levels. For aws-lc, this order is not the same but
as follows:

    initial, handshake, application, early-data

During 0-RTT sessions, the server may have to build three ack-eliciting packets
(with CRYPTO data inside) to reply to the first client packet: initial, handshake,
application. qc_prep_pkts() adds a PADDING frame to the last built packet
for the last encryption level in the list. But with aws-lc, the early-data
encryption level comes after the application one. This prevented qc_prep_pkts()
from building a padded application level last packet to send a 1200-byte datagram.

To fix this, always insert early-data encryption level after the initial
encryption level into the encryption levels list when initializing this encryption
level from quic_conn_enc_level_init().
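
A minimal sketch of such an ordered insertion, assuming a fixed-size array
instead of the real linked list. All names below (enum enc_level, struct
el_list, el_list_add()) are hypothetical and only illustrate the idea of
placing early-data right after initial whatever the order in which the TLS
stack derives the secrets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encryption level identifiers. */
enum enc_level { EL_INITIAL, EL_EARLY_DATA, EL_HANDSHAKE, EL_APP };

#define MAX_LEVELS 4

struct el_list {
    enum enc_level lvl[MAX_LEVELS];
    size_t count;
};

/* Adds <lvl> to <list>. Early-data is inserted right after the initial level
 * (or at the head if initial is not present yet); other levels are appended.
 */
void el_list_add(struct el_list *list, enum enc_level lvl)
{
    size_t pos = list->count;
    size_t i;

    assert(list->count < MAX_LEVELS);

    if (lvl == EL_EARLY_DATA) {
        pos = 0;
        for (i = 0; i < list->count; i++) {
            if (list->lvl[i] == EL_INITIAL) {
                pos = i + 1;
                break;
            }
        }
    }

    /* Shift the tail one slot to the right to make room at <pos>. */
    for (i = list->count; i > pos; i--)
        list->lvl[i] = list->lvl[i - 1];

    list->lvl[pos] = lvl;
    list->count++;
}
```

With the aws-lc derivation order (initial, handshake, application, early-data),
the resulting list is initial, early-data, handshake, application, so the
application level is last again and can receive the padding.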

Must be backported as far as 2.9.
2024-08-02 15:25:26 +02:00
Christopher Faulet
78b8b60030 BUG/MEDIUM: peer: Notify the applet won't consume data when it waits for sync
When the peer applet is waiting for a synchronisation with the global sync
task, we must notify it won't consume data. Otherwise, if some data are
already waiting in the input buffer, the applet will be woken up in a loop and
this will trigger the watchdog. Once synchronized, the applet is woken up. In
that case, the peer applet must indicate it is going to consume data again.

This patch should fix the issue #2656. It must be backported to 3.0.
2024-08-02 08:42:29 +02:00
Christopher Faulet
184f16ded7 BUG/MEDIUM: mux-h2: Propagate term flags to SE on error in h2s_wake_one_stream
When a stream is explicitly woken up by the H2 connection, if an error
condition is detected, the corresponding error flag is set on the SE. So
SE_FL_ERROR or SE_FL_ERR_PENDING, depending on whether the end of stream was
reported or not.

However, there is no attempt to propagate other termination flags. We must
be sure to properly set SE_FL_EOI and SE_FL_EOS when appropriate to be able
to switch a pending error to a fatal error.

Because of this bug, the SE remains with a pending error and no end of
stream, preventing the applicative stream from truly aborting it. It means that
in some abort scenarios, it is possible to block a stream infinitely.

This patch must be backported at least as far as 2.8. No bug was observed on
older versions while the same code is in use.
2024-08-02 08:42:28 +02:00
Christopher Faulet
6743e128f3 BUG/MEDIUM: h2: Only report early HTX EOM for tunneled streams
For regular H2 messages, the HTX EOM flag is synonymous with the end of input. So
SE_FL_EOI flag must also be set on the stream-endpoint descriptor. However,
there is an exception. For tunneled streams, the end of message is reported
on the HTX message just after the headers. But in that case, no end of input
is reported on the SE.

But here, there is a bug. The "early" EOM is also reported on the HTX messages
when there is no payload (for instance a content-length set to 0). If there
is no ES flag on the H2 HEADERS frame, it is an unexpected case, because for
the applicative stream and most probably for the opposite endpoint, the
message is considered as finished. It is switched in its DONE state (or the
equivalent on the endpoint). But, if an extra H2 frame with the ES flag is
received, a TRAILERS frame or an empty DATA frame, an extra EOT HTX block is
pushed to carry the HTX EOM flag. So an extra HTX block is emitted for a
regular HTX message. It is totally invalid, it must never happen.

Because it is an undefined behavior, it is difficult to predict the result.
But it definitely prevents the applicative stream from properly handling aborts
and errors because data remain blocked in the channel buffer. Indeed, the
end of the message was seen, so no more data are forwarded.

It seems to be an issue for 2.8 and upper. Harder to evaluate for older
versions.

This patch must be backported as far as 2.4.
2024-08-02 08:42:28 +02:00
Christopher Faulet
0ba6202796 BUG/MEDIUM: http-ana: Report error on write error waiting for the response
When we are waiting for the server response, if an error is pending on the
frontend side (a write error on client), it is handled as an abort and all
regular response analyzers are removed, except the one responsible to
release the filters, if any. However, while it is handled as an abort, the
error is not reported, as usual, via the http_reply_and_close() function. This
is an issue because in that case, the channel buffers are not reset.

Because of this bug, it is possible to block a stream infinitely. The
request side is waiting for the response side and the response side is
blocked because filters must be released and this cannot be done because
data remain blocked in the channel buffers.

So, in that case, calling http_reply_and_close() with no message is enough
to unblock the stream.

This patch must be backported as far as 2.8.
2024-08-02 08:42:28 +02:00
Amaury Denoyelle
7a5a30d28a BUG/MINOR: h2: reject extended connect for h2c protocol
This commit prevents forwarding of an HTTP/2 Extended CONNECT when "h2c"
or "h2" token is set as targeted protocol. Contrary to the previous
commit which deals with HTTP/1 mux, this time the request is rejected
and a RESET_STREAM is reported to the client.

This must be backported up to 2.4 after a period of observation.
2024-08-01 18:23:44 +02:00
Amaury Denoyelle
7b89aa5b19 BUG/MINOR: h1: do not forward h2c upgrade header token
haproxy supports tunnel establishment through HTTP Upgrade mechanism.
Since the following commit, extended CONNECT is also supported for
HTTP/2 both on frontend and backend side.

  commit 9bf957335e2c385b74901481f7a89c9565dfce53
  MEDIUM: mux_h2: generate Extended CONNECT from htx upgrade

As specified by HTTP/2 rfc, "h2c" can be used by an HTTP/1.1 client to
request an upgrade to HTTP/2. In haproxy, this is not supported so it
silently ignores this. However, Connection and Upgrade headers are
forwarded as-is on the backend side.

If using HTTP/1 on the backend side and the server supports this upgrade
mechanism, haproxy won't be able to parse the HTTP response. If using
HTTP/2, mux backend tries to incorrectly convert the request to an
Extended CONNECT with h2c protocol, which may also prevent the response
to be transmitted.

To fix this, flag HTTP/1 requests with an "h2c" or "h2" token in an upgrade
header. On converting the header list to HTX, the upgrade header is
skipped if any of these tokens is present and the H1_MF_CONN_UPG flag is
removed.
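
The token detection could look like the sketch below. It is not the H1 parser
code: upgrade_has_h2_token() and token_is() are hypothetical helpers, and the
sketch assumes a comma-separated header value whose tokens are compared
case-insensitively.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the <len> bytes at <tok> with <name>. */
static bool token_is(const char *tok, size_t len, const char *name)
{
    size_t i;

    if (strlen(name) != len)
        return false;
    for (i = 0; i < len; i++) {
        if (tolower((unsigned char)tok[i]) != tolower((unsigned char)name[i]))
            return false;
    }
    return true;
}

/* Returns true if the Upgrade header value <value> contains an "h2c" or "h2"
 * token, in which case the caller would skip the header.
 */
bool upgrade_has_h2_token(const char *value)
{
    const char *p = value;

    while (*p) {
        const char *start, *end;

        /* Skip separators and leading whitespace. */
        while (*p == ',' || *p == ' ' || *p == '\t')
            p++;
        start = p;
        while (*p && *p != ',')
            p++;
        end = p;
        /* Trim trailing whitespace. */
        while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
            end--;

        if (token_is(start, (size_t)(end - start), "h2c") ||
            token_is(start, (size_t)(end - start), "h2"))
            return true;
    }
    return false;
}
```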

This issue can easily be reproduced using curl --http2 argument to
connect to an HTTP/1 frontend.

This must be backported up to 2.4 after a period of observation.
2024-08-01 18:23:32 +02:00
Amaury Denoyelle
a7a2db4ad5 BUG/MINOR: quic: fix fc_lost
Control layer callback get_info has recently been implemented for QUIC.
However, fc_lost always returned 0. This is because quic_get_info() does
not use the correct input argument value to identify lost value.

This does not need to be backported.
2024-08-01 11:35:27 +02:00
Amaury Denoyelle
522c3bea2c BUG/MINOR: quic: fix fc_rtt/srtt values
QUIC has recently implemented the get_info callback to return RTT/sRTT values.
However, it uses milliseconds, contrary to TCP which uses microseconds.
This causes smp fetch functions to return invalid values. Fix this by
converting QUIC values to microseconds.
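
As a worked example of the unit conversion, assuming a hypothetical helper
named rtt_ms_to_us() (the real change lives in quic_get_info()):

```c
#include <assert.h>
#include <stdint.h>

/* Converts an RTT measured in milliseconds into microseconds, the unit the
 * TCP code path reports. The result is widened to 64 bits so the
 * multiplication cannot overflow.
 */
uint64_t rtt_ms_to_us(uint32_t rtt_ms)
{
    return (uint64_t)rtt_ms * 1000;
}
```

A 25 ms smoothed RTT is thus reported as 25000 instead of 25.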

This does not need to be backported.
2024-08-01 11:35:27 +02:00
Amaury Denoyelle
4b0bda42f7 MINOR: flags/mux-quic: decode qcc and qcs flags
Decode QUIC MUX connection and stream elements via qcc_show_flags() and
qcs_show_flags(). Flag definitions have been moved outside of USE_QUIC
to ease compilation of the flags binary.
2024-07-31 17:59:35 +02:00
Frederic Lecaille
f7f76b8b0d MINOR: quic: Define ->get_info() control layer callback for QUIC
This low level callback may be called by several sample fetches for
frontend connections like "fc_rtt", "fc_rttvar" etc.
Define this callback for QUIC protocol as pointer to quic_get_info().
This latter supports these sample fetches:
   "fc_lost", "fc_reordering", "fc_rtt" and "fc_rttvar".

Update the documentation consequently.
2024-07-31 10:29:42 +02:00
Frederic Lecaille
1733dff42a MINOR: tcp_sample: Move TCP low level sample fetch function to control layer
Add ->get_info() new control layer callback definition to protocol struct to
retrieve statistical counter information at transport layer (TCPv4/TCPv6)
identified by an integer into a long long int.
Move the TCP specific code from get_tcp_info() to the tcp_get_info() control layer
function (src/proto_tcp.c) and define it as the ->get_info() callback for
TCPv4 and TCPv6.
Note that get_tcp_info() is called for several TCP sample fetches.
This patch is useful to support some of these sample fetches for QUIC and to
keep the code simple and easy to maintain.
2024-07-31 10:29:42 +02:00
Amaury Denoyelle
bba6baff30 BUG/MEDIUM: quic: prevent conn freeze on 0RTT undeciphered content
Received QUIC packets are stored in quic_conn Rx buffer after header
protection removal in qc_rx_pkt_handle(). These packets are then removed
after quic_conn IO handler via qc_treat_rx_pkts().

If HP cannot be removed, packets are still copied into quic_conn Rx
buffer. This can happen if encryption level TLS keys are not yet
available. The packet remains in the buffer until HP can be removed and
its content processed.

An issue occurs if client emits a 0-RTT packet but haproxy does not have
the shared secret, for example after a haproxy process restart. In this
case, the packet is copied in quic_conn Rx buffer but its HP won't ever
be removed. This prevents the buffer from being purged. After some time, if
the client has emitted enough packets, Rx buffer won't have any space
left and received packets are dropped. This will cause the connection to
freeze.

To fix this, remove any 0-RTT buffered packets on handshake completion.
At this stage, 0-RTT packets are no longer necessary. The client is
expected to reemit its content in 1-RTT packets which are properly
deciphered.
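
The purge can be pictured as an in-place filter over the buffered packets.
The sketch below assumes a plain array of descriptors; struct rx_pkt,
enum rx_pkt_type and purge_0rtt_pkts() are hypothetical names and do not
reflect the actual quic_conn Rx buffer layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet types and descriptor, for illustration only. */
enum rx_pkt_type { RX_PKT_INITIAL, RX_PKT_0RTT, RX_PKT_HANDSHAKE, RX_PKT_1RTT };

struct rx_pkt {
    enum rx_pkt_type type;
    size_t len;
};

/* Removes every 0-RTT packet from pkts[0..count), preserving the order of the
 * remaining ones. Returns the new number of buffered packets.
 */
size_t purge_0rtt_pkts(struct rx_pkt *pkts, size_t count)
{
    size_t rd, wr = 0;

    for (rd = 0; rd < count; rd++) {
        if (pkts[rd].type == RX_PKT_0RTT)
            continue; /* can never be deciphered: drop it */
        pkts[wr++] = pkts[rd];
    }
    return wr;
}
```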

This can easily be reproduced with HTTP/3 POST requests or by retrieving a big
enough object, which will fill the Rx buffer with ACK frames. Here is a
picoquic command to provoke the issue on haproxy startup :

$ picoquicdemo -Q -v 00000001 -a h3 <hostname> 20443 "/?s=1g"

Note that allow-0rtt must be present on the bind line to trigger the
issue. Else haproxy will reject any 0-RTT packets.

This must be backported up to 2.6.

This could be one of the reasons for github issue #2549 but it's unsure
for now.
2024-07-31 10:24:53 +02:00
William Lallemand
f76e8e50f4 BUILD: ssl: replace USE_OPENSSL_AWSLC by OPENSSL_IS_AWSLC
Replace USE_OPENSSL_AWSLC by OPENSSL_IS_AWSLC in the source code, so we
won't need to set USE_OPENSSL_AWSLC in the Makefile on the long term.
2024-07-30 18:53:08 +02:00
William Lallemand
1889b86561 BUG/MEDIUM: ssl: 0-RTT initialized at the wrong place for AWS-LC
Revert patch fcc8255 "MINOR: ssl_sock: Early data disabled during
SSL_CTX switching (aws-lc)". The patch was done in the wrong callback
which is never built for AWS-LC, and applies options on the SSL_CTX
instead of the SSL, which should never be done elsewhere than in the
configuration parsing.

This was probably triggered by successfully linking haproxy against
AWS-LC without using USE_OPENSSL_AWSLC.

The patch also reintroduced SSL_CTX_set_early_data_enabled() in the
ssl_quic_initial_ctx() and ssl_sock_initial_ctx(). So the initial_ctx
does have the right setting, but it still needs to be applied to the
selected SSL_CTX in the clienthello, because we need it on the selected
SSL_CTX.

Must be backported to 3.0. (ssl_clienthello.c part was in ssl_sock.c)
2024-07-30 18:53:08 +02:00
William Lallemand
56eefd6827 BUG/MEDIUM: ssl: reactivate 0-RTT for AWS-LC
Then reactivate HAVE_SSL_0RTT and HAVE_SSL_0RTT_QUIC for AWS-LC, which
were wrongly deactivated in f5353f2c ("MINOR: ssl: add HAVE_SSL_0RTT
constant").

Must be backported to 3.0.
2024-07-30 18:53:08 +02:00
Willy Tarreau
376b147fff BUG/MINOR: stconn: bs.id and fs.id had their dependencies incorrect
The backend depends on the response and the frontend on the request, not
the other way around. In addition, they used to depend on L6 (hence
contents in the channel buffers) while they should only depend on L5
(permanent info known in the mux).

This came in 2.9 with commit 24059615a7 ("MINOR: Add sample fetches to
get the frontend and backend stream ID") so this can be backported there.

(cherry picked from commit 61dd0156c82ea051779e6524cad403871c31fc5a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
2024-07-30 18:39:29 +02:00
Christopher Faulet
d9f41b1d6e BUILD: mux-pt: Use the right name for the sedesc variable
A typo was introduced in 760d26a86 ("BUG/MEDIUM: mux-pt/mux-h1: Release the
pipe on connection error on sending path"). The sedesc variable is 'sd', not
'se'.

This patch must be backported with the commit above.
2024-07-30 10:44:00 +02:00
Christopher Faulet
760d26a862 BUG/MEDIUM: mux-pt/mux-h1: Release the pipe on connection error on sending path
When data are sent using the kernel splicing, if a connection error
occurred, the pipe must be released. Indeed, in that case, no more data can
be sent and there is no reason to not release the pipe. But it is in fact an
issue for the stream because the channel will appear as not empty. This may
prevent the stream from being released. This happens on 2.8 when a filter is
also attached on it. On 2.9 and upper, it seems there is no issue. But it is
hard to be sure and the current patch remains valid in all cases. On 2.6 and
lower, the code is not the same and, AFAIK, there is no issue.

This patch must be backported to 2.8. However, on 2.8, there is no zero-copy
data forwarding. The patch must be adapted. There is no done_ff/resume_ff
callback functions for muxes. The pipe must be released in sc_conn_send() when
an error flag is set on the SE, after the call to snd_pipe callback
function.
2024-07-30 09:05:25 +02:00
Christopher Faulet
5dc45445ff BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set
When a send on a connection is performed, if a SE error (or a pending error)
was already reported earlier, we leave immediately. No send is performed.
However, we must be sure to report the error at the SC level if necessary.
Indeed, the SE error may have been reported during the zero-copy data
forwarding. So during receive on the opposite side. In that case, we may
have missed the opportunity to report it at the SC level.

The patch must be backported as far as 2.8.
2024-07-30 09:05:25 +02:00
Christopher Faulet
33c9562f07 DOC: config: Add documentation about spop mode for backends
The SPOE was refactored. Now backends referenced by a SPOE filter must use
the spop mode to be able to use the spop multiplexer for server connections.
The "spop" mode was added in the list of supported modes for backends.
2024-07-30 09:05:25 +02:00
Willy Tarreau
5541d4995d BUG/MEDIUM: queue: deal with a rare TOCTOU in assign_server_and_queue()
After checking that a server or backend is full, it remains possible
to call pendconn_add() just after the last pending request finishes, so
that there's no more connection on the server for very low maxconn (typ 1),
leaving new ones in queue till the timeout.

The approach depends on where the request was queued, though:
  - when queued on a server, we can simply detect that we may dequeue
    pending requests and wake them up, it will wake our request and
    that's fine. This needs to be done in srv_redispatch_connect() when
    the server is set.

  - when queued on a backend, it means that all servers are done with
    their requests. It means that all servers were full before the
    check and all were empty after. In practice this will only concern
    configs with fewer servers than threads. It's where the issue was
    first spotted, and it's very hard to reproduce with more than one
    server. In this case we need to load-balance again in order to find
    a spare server (or even to fail). For this, we call the newly added
    dedicated function pendconn_must_try_again() that tells whether or
    not a blocked pending request was dequeued and needs to be retried.

This should be backported along with pendconn_must_try_again() to all
stable versions, but with extreme care because over time the queue's
locking evolved.
2024-07-29 09:27:01 +02:00
Willy Tarreau
1a8f3a368f MINOR: queue: add a function to check for TOCTOU after queueing
There's a rare TOCTOU case that happens from time to time with maxconn 1
and multiple threads. Between the moment we see the queue full and the
moment we queue a request, it's possible that the last request on the
server or proxy ended and that no other one is left to offer it its place.

Given that all this code path is performance-critical and we cannot afford
to increase the lock duration, better recheck for the condition after
queueing. For this we need to be able to check for the condition and
cleanly dequeue a request. That's what this patch provides via the new
function pendconn_must_try_again(). It will catch more requests than
absolutely needed though it will catch them all. It may find that around
1/1000 of requests are at risk, though testing shows that in practice,
it's around 1 per million that really gets stuck (other ones benefit
from timing and finishing late requests). Maybe in the future some
conditions might be refined but it's harmless.

What happens to such requests is that they're dequeued and their pendconn
freed, so that the caller can decide to try to LB or queue them again. For
now the function is not used, it's just added separately for easier tracking.
2024-07-29 09:27:01 +02:00
Willy Tarreau
4316ef2eab BUILD: cfgparse-quic: fix build error on Solaris due to missing netinet/in.h
Since commit 35470d518 ("MINOR: quic: activate UDP GSO for QUIC if
supported"), Solaris build fails due to netinet/udp.h being included
without netinet/in.h. Adding it is sufficient to fix the problem. No
backport is needed.
2024-07-28 14:59:23 +02:00
Christopher Faulet
46b1fec0e9 BUG/MEDIUM: jwt: Clear SSL error queue on error when checking the signature
When the signature included in a JWT is verified, if an error occurred, one
or more SSL errors are queued and never cleared. These errors may be then
caught by the SSL stack and a fatal SSL error may be erroneously reported
during an SSL receive or send.

So we must take care to clear the SSL error queue when the signature
verification failed.

This patch should fix issue #2643. It must be backported as far as 2.6.
2024-07-26 16:59:00 +02:00
Frederic Lecaille
4abaadd842 MINOR: quic: Dump TX in flight bytes vs window values ratio.
Display the ratio of the numbers of bytes in flight by packet number spaces
versus the current window values in percent.
2024-07-26 16:42:44 +02:00
Frederic Lecaille
76ff8afa2d MINOR: quic: Add information to "show quic" for CUBIC cc.
Add ->state_cli() new callback to quic_cc_algo struct to define a
function called by the "show quic (cc|full)" commands to dump some information
about the congestion algorithm internal state currently in use by the QUIC
connections.

Implement this callback for CUBIC algorithm to dump its internal variables:
   - K: (the time to reach the cubic curve inflexion point),
   - last_w_max: the last maximum window value reached before entering
     the last recovery period. This is also the window value at the
     inflexion point of the cubic curve,
   - wdiff: the difference between the current window value and last_w_max.
     So negative before the inflexion point, and positive after.
2024-07-26 16:42:44 +02:00
Willy Tarreau
2dab1ba84b MEDIUM: h1: allow to preserve keep-alive on T-E + C-L
In 2.5-dev9, commit 631c7e866 ("MEDIUM: h1: Force close mode for invalid
uses of T-E header") enforced a recently arrived new security rule in the
HTTP specification aiming at preventing a class of content-smuggling
attacks involving HTTP/1.0 agents. It consists in handling the very rare
T-E + C-L requests or responses in close mode.

It turns out that it does have an impact on a rare few and very old clients
(probably running insecure TLS stacks by the way) that continue to send
both with their POST requests. The impact is that for each and every
request they'll have to reconnect, possibly negotiating a full TLS
handshake that becomes harmful to the machine in terms of CPU computation.

This commit adds a new option "h1-do-not-close-on-insecure-transfer-encoding"
that does exactly what it says, it just asks not to close on such messages,
even though the message continues to be sanitized and C-L dropped. It means
that the risk is only between the sender and haproxy, which is limited, and
might be the only acceptable solution for such environments having to deal
with broken implementations.
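
The behaviour described above can be summarised by the following sketch. The
structures and the function h1_te_cl_decide() are hypothetical and only model
the decision; they are not the H1 mux code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the headers seen on an H1 message. */
struct h1_msg_info {
    bool has_te; /* a Transfer-Encoding header is present */
    bool has_cl; /* a Content-Length header is present */
};

/* Hypothetical outcome of the sanitization step. */
struct h1_decision {
    bool drop_cl;     /* Content-Length is removed from the message */
    bool force_close; /* the connection is handled in close mode */
};

/* <do_not_close_opt> models "h1-do-not-close-on-insecure-transfer-encoding".
 * The message is always sanitized; only the close mode depends on the option.
 */
struct h1_decision h1_te_cl_decide(struct h1_msg_info msg, bool do_not_close_opt)
{
    struct h1_decision d = { false, false };

    if (msg.has_te && msg.has_cl) {
        d.drop_cl = true;
        d.force_close = !do_not_close_opt;
    }
    return d;
}
```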

The cases are so rare that it should not need to be backported, or in the
worst case, to the latest LTS if there is any demand.
2024-07-26 15:59:35 +02:00
Amaury Denoyelle
85131f91bf BUG/MEDIUM: quic: fix invalid conn reject with CONNECTION_REFUSED
quic-initial rules were implemented just recently. For some actions, a
new flags field was added in quic_dgram structure. This is used to
report the result of the rules execution.

However, this flags field was left uninitialized. Depending on its
value, it may cause the connection to be wrongly rejected via
CONNECTION_REFUSED. Fix this by properly setting flags value to 0.

No need to backport.
2024-07-26 15:24:35 +02:00
Amaury Denoyelle
08515af9df MINOR: quic: implement send-retry quic-initial rules
Define a new quic-initial "send-retry" rule. This allows to force the
emission of a Retry packet on an initial without token instead of
instantiating a new QUIC connection.
2024-07-25 15:39:39 +02:00
Amaury Denoyelle
69d7e9f3b7 MINOR: quic: implement reject quic-initial action
Define a new quic-initial action named "reject". Contrary to dgram-drop,
the client is notified of the rejection by a CONNECTION_CLOSE with
CONNECTION_REFUSED error code.

To be able to emit the necessary CONNECTION_CLOSE frame, quic_conn is
instantiated, contrary to dgram-drop action. quic_set_connection_close()
is called immediately after qc_new_conn() which prevents the handshake
startup.
2024-07-25 15:39:39 +02:00
Amaury Denoyelle
f91be2657e MINOR: quic: pass quic_dgram as obj_type for quic-initial rules
To extend quic-initial rules, pass the quic_dgram instance as argument to
the various actions. As such, quic_dgram is now supported as an obj_type
and can be used in session origin field.
2024-07-25 15:39:39 +02:00
Amaury Denoyelle
1259700763 MINOR: quic: support ACL for quic-initial rules
Add ACL condition support for quic-initial rules. This requires the
extension of quic_parse_quic_initial() to parse an extra if/unless
block.

Only layer4 client samples are allowed to be used with quic-initial
rules. However, due to the early execution of quic-initial rules prior
to any connection instantiation, some samples are not supported.

To be able to use the 4 described samples, a dummy session is
instantiated before quic-initial rules execution. Its src and dst fields
are set from the received datagram values.
2024-07-25 15:39:39 +02:00
Amaury Denoyelle
cafe596608 MEDIUM: quic: implement quic-initial rules
Implement a new set of rules labelled as quic-initial.

These rules are specific to QUIC. They are scheduled to be executed early
on Initial packet parsing, prior to a new QUIC connection instantiation.
Contrary to tcp-request connection, this allows to reject traffic
earlier, most notably by avoiding unnecessary QUIC SSL handshake
processing.

A new module quic_rules is created. Its main function
quic_init_exec_rules() is called on Initial packet parsing in function
quic_rx_pkt_retrieve_conn().

For the moment, only "accept" and "dgram-drop" are valid actions. Both
are final. The latter drops silently the Initial packet instead of
allocating a new QUIC connection.
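
A rough sketch of a first-match evaluation with two final actions follows. It
does not reproduce quic_init_exec_rules(): the rule structure, the IPv4 source
match and the default of accepting when no rule matches are all assumptions
made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two actions named in the commit message. */
enum qi_action { QI_ACCEPT, QI_DGRAM_DROP };

/* Hypothetical rule: an action guarded by an IPv4 source network match. */
struct qi_rule {
    enum qi_action action;
    uint32_t src_net;  /* network address, host byte order */
    uint32_t src_mask; /* network mask, 0 matches any source */
};

/* Evaluates <rules> in order against source address <src>. Both actions are
 * final, so the first matching rule decides.
 */
enum qi_action qi_exec_rules(const struct qi_rule *rules, size_t nb, uint32_t src)
{
    size_t i;

    for (i = 0; i < nb; i++) {
        if ((src & rules[i].src_mask) == (rules[i].src_net & rules[i].src_mask))
            return rules[i].action;
    }
    return QI_ACCEPT; /* assumption: no matching rule means accept */
}
```
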
2024-07-25 15:39:39 +02:00
Amaury Denoyelle
a72e82c382 MINOR: quic: delay Retry emission on quic-force-retry
Currently, QUIC Retry packets are emitted for two different reasons
after processing an Initial without token :
- quic-force-retry is set on bind-line
- an abnormal number of half-open connections is currently detected

Previously, these two conditions were checked separately in different
functions during datagram parsing. Uniformize this by moving
quic-force-retry check in quic_rx_pkt_retrieve_conn() along the second
condition check.

The purpose of this patch is to uniformize datagram parsing stages. It
is necessary to implement quic-initial rules in
quic_rx_pkt_retrieve_conn() prior to any Retry emission. This avoids
emitting an unnecessary Retry if an Initial is subject to a reject rule.
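
Once both checks sit in the same place, the decision can be expressed as a
single predicate. The sketch below is only illustrative: the function name, the
parameters and the use of a simple threshold for "abnormal number of half-open
connections" are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a Retry packet should be emitted in reply to an Initial.
 * <has_token>: the Initial carries a token.
 * <force_retry>: quic-force-retry is set on the bind line.
 * <half_open>/<threshold>: hypothetical half-open connection accounting.
 */
bool must_send_retry(bool has_token, bool force_retry,
                     unsigned int half_open, unsigned int threshold)
{
    if (has_token)
        return false;
    return force_retry || half_open >= threshold;
}
```
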
2024-07-25 15:29:50 +02:00
Aurelien DARRAGON
e328056ddc MEDIUM: sink: assume sft appctx stickiness
As mentioned in b40d804 ("MINOR: sink: add some comments about sft->appctx
usage in applet handlers"), there are few places in the code where it
looks like we assumed that the applet callbacks such as
sink_forward_session_init() or sink_forward_io_handler() could be
executing an appctx whose sft is detached from the appctx
(appctx != sft->appctx).

In practice this should not be happening since an appctx sticks to the
same thread its entire lifetime, and the only times sft->appctx is
effectively assigned is during the session/appctx creation (in
process_sink_forward()) or release.

Thus if sft->appctx wouldn't point to the appctx that the sft was bound
to after appctx creation, it would probably indicate a bug rather than
an expected condition. To further emphasize that and prevent the
confusion, and since 3.1-dev4 was released, let's remove such checks and
instead add a BUG_ON to ensure this never happens.

In _sink_forward_io_handler(), the "hard_close" label was removed since
there are no more uses for it (no hard errors may be caught from the
function for now)
2024-07-25 14:56:19 +02:00
William Lallemand
28cb01f8e8 MEDIUM: quic: implement CHACHA20_POLY1305 for AWS-LC
With AWS-LC, the AEAD part is covered by the EVP_AEAD API which
provides the correct EVP_aead_chacha20_poly1305(), however for header
protection it does not provide an EVP_CIPHER for chacha20.

This patch implements exceptions in the header protection code and uses
EVP_CIPHER_CHACHA20 and EVP_CIPHER_CTX_CHACHA20 placeholders so we can
use the CRYPTO_chacha_20() primitive manually instead of the EVP_CIPHER
API.

This requires checking whether we are using EVP_CIPHER_CTX_CHACHA20 when
doing EVP_CIPHER_CTX_free().
2024-07-25 13:45:39 +02:00
William Lallemand
177c84808c MEDIUM: quic: add key argument to header protection crypto functions
In order to prepare the code for using Chacha20 with the EVP_AEAD API,
both quic_tls_hp_decrypt() and quic_tls_hp_encrypt() need an extra key
argument.

Indeed Chacha20 does not exist as an EVP_CIPHER in AWS-LC, so the key
won't be embedded into the EVP_CIPHER_CTX, so we need an extra parameter
to use it.
2024-07-25 13:45:39 +02:00
William Lallemand
d55a297b85 MINOR: quic: rename confusing wording aes to hp
Some of the crypto functions used for header protection in QUIC are
named with an "aes" name even though they are not used for AES
encryption only.

This patch renames these "aes" to "hp" so it is clearer.
2024-07-25 13:45:38 +02:00
William Lallemand
31c831e29b MEDIUM: ssl/quic: implement quic crypto with EVP_AEAD
The QUIC crypto is using the EVP_CIPHER API in order to achieve
authenticated encryption, this was the API which was used with OpenSSL.
With libraries inspired by BoringSSL (libreSSL and AWS-LC), the
AEAD algorithms are implemented using the EVP_AEAD API.

This patch converts the calls to the EVP_CIPHER API when called in the
context of AEAD cryptography for QUIC.

The patch defines some QUIC_AEAD macros that can be either EVP_CIPHER or
EVP_AEAD depending on the library.

This was mainly done for AWS-LC but this could be useful for other
libraries. This should finally allow to use CHACHA20_POLY1305 with
AWS-LC.

This patch allows to use the following ciphers with the EVP_AEAD API:
- TLS1_3_CK_AES_128_GCM_SHA256
- TLS1_3_CK_AES_256_GCM_SHA384

AWS-LC does not implement TLS1_3_CK_AES_128_CCM_SHA256 and
TLS1_3_CK_CHACHA20_POLY1305_SHA256 requires some hack for headers
protection which will come in another patch.
2024-07-25 13:45:38 +02:00
Frederic Lecaille
a6d40e09f7 BUG/MINOR: quic: Lack of precision when computing K (cubic only cc)
The cubic K variable is stored in ms. But the formula used to compute it took
the window difference parameter with the second as unit, without considering
the loss of information. Then the result was converted to ms (K *= 1000).
This led to a lack of precision and multiples of 1000 as values.

To fix this, use the same formula but with the window difference in ms as parameter
passed to the cubic function and remove the conversion.
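
The precision loss can be illustrated with an integer cube root. This is a
sketch, not the quic_cc_cubic code: it assumes K = cbrt(wdiff / C) with the
usual CUBIC constant C = 0.4 (written 4/10), a window difference <wdiff>
counted in segments, and hypothetical function names. The precise variant
scales the operand by 1000^3 so that the cube root is directly in ms, and it
assumes wdiff stays below about 1.8e9 to avoid overflowing 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Integer cube root: largest r such that r^3 <= x (binary search). */
uint64_t icbrt(uint64_t x)
{
    uint64_t lo = 0, hi = 2642245; /* largest value whose cube fits in 64 bits */

    while (lo < hi) {
        uint64_t mid = lo + (hi - lo + 1) / 2;

        if (mid * mid * mid <= x)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* K computed in seconds then converted: always a multiple of 1000. */
uint64_t cubic_k_ms_coarse(uint64_t wdiff)
{
    return icbrt(wdiff * 10 / 4) * 1000;
}

/* K computed directly in ms: cbrt(x * 1000^3) == 1000 * cbrt(x). */
uint64_t cubic_k_ms_precise(uint64_t wdiff)
{
    return icbrt(wdiff * 10 * 1000000000ULL / 4);
}
```

For a window difference of 100 segments, the coarse version yields 6000 ms
while the precise one yields 6299 ms.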

Must be backported as far as 2.6.
2024-07-24 18:24:39 +02:00
Willy Tarreau
7eca16921b [RELEASE] Released version 3.1-dev4
Released version 3.1-dev4 with the following main changes :
    - MINOR: limits: prepare to keep limits in one place
    - REORG: fd: move raise_rlim_nofile to limits
    - CLEANUP: fd: rm struct rlimit definition
    - REORG: global: move rlim_fd_*_at_boot in limits
    - MINOR: haproxy: prepare to move limits-related code
    - REORG: haproxy: move limits handlers to limits
    - MINOR: limits: add is_any_limit_configured
    - CLEANUP: quic: remove obsolete comment on send
    - MINOR: quic: extend detection of UDP API OS features
    - MINOR: quic: activate UDP GSO for QUIC if supported
    - MINOR: quic: define quic_cc_path MTU as constant
    - MINOR: quic: add GSO parameter on quic_sock send API
    - MAJOR: quic: support GSO when encoding datagrams
    - MEDIUM: quic: implement GSO fallback mechanism
    - MINOR: quic: add counters of sent bytes with and without GSO
    - BUG/MEDIUM: bwlim: Be sure to never set the analyze expiration date in past
    - CLEANUP: proto: rename TID affinity callbacks
    - CLEANUP: quic: rename TID affinity elements
    - BUG/MINOR: limits: fix license type in limits.h
    - BUG/MINOR: session: Eval L4/L5 rules defined in the default section
    - CLEANUP: stconn: Fix a typo in comments for SE_ABRT_SRC_*
    - MEDIUM: spoe: Remove fragmentation support
    - MEDIUM: spoe: Remove async mode support
    - MINOR: spoe: Use only a global engine-id per agent
    - MINOR: spoe: Remove debugging
    - MAJOR: spoe: Remove idle applets and pipelining support
    - MINOR: spoe: Remove the dedicated SPOE applet task
    - MEDIUM: proxy/spoe: Add a SPOP mode
    - MEDIUM: applet: Add a .shut callback function for applets
    - MINOR: connection: No longer include stconn type header in connection-t.h
    - MINOR: stconn: Use a dedicated function to get the opposite sedesc
    - MINOR: spoe: Rename some flags and constant to use SPOP prefix
    - MINOR: spoe: Dynamically alloc the message list per event of an agent
    - MINOR: spoe: Move all stuff regarding the filter/applet in the C file
    - MINOR: spoe: Move spoe_str_to_vsn() into the header file
    - MEDIUM: mux-spop: Introduce the SPOP multiplexer
    - MEDIUM: check/spoe: Use SPOP multiplexer to perform SPOP health-checks
    - MAJOR: spoe: Rewrite SPOE applet to use the SPOP mux
    - CLEANUP: spoe: Uniformize function definitions
    - MINOR: spoe: Add internal sample fetch to retrieve the SPOE engine ID
    - MEDIUM: spoe: Set a specific name for the connection pool of SPOP servers
    - MINOR: backend: Remove test on HTX streams to reuse idle connections on connect
    - MEDIUM: spoe: Force the reuse 'always' mode for SPOP backends
    - MINOR: mux-spop: Use a dedicated function to update the SPOP connection timeout
    - MAJOR: mux-spop: Make the SPOP connections reusable
    - MINOR: stats-html: Display reuse ratio for spop connections
    - MEDIUM: spoe: Directly xfer NOTIFY frame when SPOE applet is created
    - MEDIUM: spoe: Directly receive ACK frame in the SPOE context buffer
    - MEDIUM: mux-spop/spoe: Save negociated max-frame-size value in the mux
    - MINOR: spoe: Remove the spop version from the SPOE appctx context
    - MEDIUM: mux-spop: Add checks on received frames
    - MEDIUM: mux-spop: Announce the pipeling support if possible
    - MEDIUM: spoe: Forward SPOE context error to the SPOE applet
    - MEDIUM: spoe: Make the SPOE applet use its own buffers
    - DOC: spoe: Update SPOE documentation to reflect recent refactoring
    - BUILD: mux-spop: fix build failure on gcc 4-10 and clang
    - MINOR: fd: don't scan the full fdtab on all threads
    - MINOR: server: better mt_list usage for node migration (prev_deleted handling)
    - BUG/MINOR: do not close uninit FD in quic_test_socketops()
    - BUG/MEDIUM: debug/cli: fix "show threads" crashing with low thread counts
    - MINOR: debug: prepare feed_post_mortem_late
    - CLEANUP: debug: fix indents in debug_parse_cli_show_dev
    - MINOR: debug: store runtime uid/gid in postmortem
    - MINOR: debug: keep runtime capabilities in post_mortem
    - MINOR: debug: use LIM2A to show limits
    - MINOR: debug: prepare to show runtime limits
    - MINOR: debug: keep runtime limits in postmortem
    - DOC: install: don't reference removed CPU arg
    - BUG/MEDIUM: ssl_sock: fix deadlock in ssl_sock_load_ocsp() on error path
    - BUG/MAJOR: mux-h2: force a hard error upon short read with pending error
    - MEDIUM: sink: start applets asynchronously
    - OPTIM: sink: balance applets accross threads
    - MEDIUM: ocsp: fix ocsp when the chain is loaded from 'issuers-chain-path'
    - MEDIUM: ssl: add extra_chain to ckch_data
    - MINOR: ssl: change issuers-chain for show_cert_detail()
    - REGTESTS: ssl: test the issuers-chain-path keyword
    - DOC: configuration: issuers-chain-path not compatible with OCSP
    - DOC: configuration: issuers-chain-path is compatible with OCSP
    - BUG/MEDIUM: startup: fix zero-warning mode
    - BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char (2)
    - MINOR: cfgparse-global: move mode's keywords in cfg_kw_list
    - MINOR: cfgparse-global: move no<poller_name> in cfg_kw_list
    - DOC: config: improve the http-keep-alive section
    - BUG/MINOR: stick-table: fix crash for src_inc_gpc() without stkcounter
    - BUG/MINOR: server: Don't warn fallback IP is used during init-addr resolution
    - BUG/MINOR: cli: Atomically inc the global request counter between CLI commands
    - MINOR: stream: Add a pointer to set the parent stream
    - MINOR: vars: Fill a description instead of hash and scope when a name is parsed
    - MINOR: vars: Use a description to set/unset a variable instead of its hash and scope
    - MEDIUM: vars: Be able to parse parent scopes for variables
    - MINOR: vars: Use a variable description to get variables of a specific scope
    - MEDIUM: vars: Be able to retrieve variable of the parent stream, if any
    - MEDIUM: spoe: Set the parent stream for SPOE streams
    - BUG/MINOR: quic: Non optimal first datagram.
    - DOC: config: Add a dedicated section about variables
    - DOC: config: Add info about variable scopes referencing the parent stream
    - DOC: config: Explicitly state the SPOE streams have a usable parent stream
    - MINOR: quic: Avoid cc priv buffer overflow.
    - MINOR: spoe: Add a function to validate a version is supported
    - MINOR: spoe: export the list of SPOP error reasons
    - MEDIUM: spoe/tcpcheck: Reintroduce SPOP check as a customized tcp-check
    - REGTESTS: check/spoe: Re-enable the script performing SPOP health-checks
    - BUG/MEDIUM: sink: properly init applet under sft lock
    - MINOR: sink: unify and sink_forward_io_handler() and sink_forward_oc_io_handler()
    - MINOR: sink: Remove useless test on SE_FL_SHR/SHW flags
    - MINOR: sink: merge sink_forward_io_handler() with sink_forward_oc_io_handler()
    - MINOR: sink: add some comments about sft->appctx usage in applet handlers
    - MINOR: sink: distinguish between hard and soft close in _sink_forward_io_handler()
    - MEDIUM: sink: don't set NOLINGER flag on the outgoing stream interface
    - MINOR: ring: count processed messages in ring_dispatch_messages()
    - MINOR: sink: add processed events counter in sft
    - MEDIUM: sink: "max-reuse" support for sink servers
    - OPTIM: sink: consider threads' current load when rebalancing applets
2024-07-24 18:20:24 +02:00
Aurelien DARRAGON
2513bd257f OPTIM: sink: consider threads' current load when rebalancing applets
In c454296f0 ("OPTIM: sink: balance applets accross threads"), we already
made sure to balance applets across threads by picking a random thread
to spawn the new applet.

Also, thanks to the previous commit, we also have the ability to destroy
the applet when a certain amount of messages were processed to help
distribute the load during runtime.

Let's improve that by trying up to 3 different threads in the hope to pick
a non-overloaded one in the best scenario, and the least overloaded one
in the worst case. This should help to better distribute the load over
multiple threads when high loads are expected.

The logic was greatly inspired by the thread migration logic used by server
health checks, but it was simplified for the sink use case.
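
The selection strategy can be sketched as follows. This is an illustration
under stated assumptions, not the code from the patch: pick_thread(), the
per-thread load array and the overload limit are all hypothetical, and the
candidates are passed in by the caller so the example stays deterministic
(the commit picks them randomly):

```c
#include <assert.h>
#include <stddef.h>

/* Pick a thread among up to 3 candidates: the first one whose load is
 * below <limit> wins, otherwise the least loaded candidate is returned.
 */
static int pick_thread(const unsigned int *load, const int *cand,
                       size_t n, unsigned int limit)
{
    int best;
    size_t i;

    assert(n >= 1 && n <= 3);
    best = cand[0];
    for (i = 0; i < n; i++) {
        if (load[cand[i]] < limit)
            return cand[i];     /* best case: not overloaded */
        if (load[cand[i]] < load[best])
            best = cand[i];     /* remember the least loaded one */
    }
    return best;
}
```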
2024-07-24 17:59:18 +02:00
Aurelien DARRAGON
237849c911 MEDIUM: sink: "max-reuse" support for sink servers
Thanks to the previous commit, it is now possible to know how many events
were processed for a given sft/server sink pair. As mentioned in commit
c454296 ("OPTIM: sink: balance applets accross threads"), let's provide
the ability to restart a server connection when a certain amount of events
were processed to help better balance the load over multiple threads.

For this, we make use of the "max-reuse" server keyword which was only
relevant under "http" context so far. Under sink context, "max-reuse"
corresponds to the number of times the tcp connection can be reused
for sending messages, which in fact means that "max-reuse + 1" is the
number of events (ie: messages) that are allowed to be sent using the
same tcp server connection: when this threshold is met, the connection
will be destroyed and a new one will be created on a random thread.
The value is not strict: it is the minimum value above which the
connection may be destroyed since the value is checked after
ring_dispatch_messages() which may process multiple messages at once.

By default, no limit is enforced (the connection will be reused for as
long as it is available).

The documentation was updated accordingly.
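
The threshold semantics described above can be sketched as follows.
sink_conn_must_restart() is a hypothetical helper, and encoding "no limit" as
a negative value is an assumption made for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the connection should be recycled, 0 otherwise.
 * A negative <max_reuse> means "no limit" (assumed encoding of the default).
 */
static int sink_conn_must_restart(size_t e_processed, long max_reuse)
{
    if (max_reuse < 0)
        return 0;
    return e_processed >= (size_t)max_reuse + 1;
}

/* Usage: with max-reuse 2, the third message reaches the threshold. */
static void sink_conn_demo(void)
{
    assert(!sink_conn_must_restart(1000, -1)); /* default: never recycled */
    assert(!sink_conn_must_restart(2, 2));
    assert(sink_conn_must_restart(3, 2));
    assert(sink_conn_must_restart(7, 2));      /* a batch may overshoot */
}
```

The last case shows why the value is not strict: the check runs after a whole
batch was dispatched, so the counter may already be past the threshold.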
2024-07-24 17:59:14 +02:00
Aurelien DARRAGON
709b3db941 MINOR: sink: add processed events counter in sft
Add a new struct member to sft structure named e_processed in order to
track the total number of events processed by sft applets.

sink_forward_oc_io_handler() and sink_forward_io_handler() now make use
of ring_dispatch_messages() optional value added in the previous commit
in order to increase the number of processed events.
2024-07-24 17:59:08 +02:00
Aurelien DARRAGON
47323e64ad MINOR: ring: count processed messages in ring_dispatch_messages()
ring_dispatch_messages() now takes an optional argument <processed> which
must point to a size_t counter when provided.

When provided, the value is updated to the number of messages processed
by the function.
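
The optional out-parameter pattern can be sketched as follows.
dispatch_messages() is a simplified, hypothetical stand-in and does not
reproduce the real ring_dispatch_messages() signature; it also assumes the
counter is overwritten rather than incremented:

```c
#include <assert.h>
#include <stddef.h>

/* Handles at most <batch> of the <pending> messages and returns how many
 * remain. When <processed> is non-NULL, it receives the number handled.
 */
static size_t dispatch_messages(size_t pending, size_t batch, size_t *processed)
{
    size_t done = (pending < batch) ? pending : batch;

    /* ... the <done> messages would be written to the endpoint here ... */

    if (processed)
        *processed = done;
    return pending - done;
}

static void dispatch_demo(void)
{
    size_t e_processed = 0, n = 0;

    dispatch_messages(5, 3, NULL);  /* caller not interested in the count */
    dispatch_messages(5, 3, &n);    /* n receives the batch size */
    e_processed += n;
    assert(e_processed == 3);
}
```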
2024-07-24 17:59:03 +02:00
Aurelien DARRAGON
0821460e3f MEDIUM: sink: don't set NOLINGER flag on the outgoing stream interface
Given that sink applets are responsible for conveying messages from the
ring to the tcp server endpoint, there are no protocol timeout or errors
expected there, it is a unidirectional flow of data over TCP.

As such, NOLINGER flag which was inherited from peers applet, see
dbd026792 ("BUG/MEDIUM: peers: set NOLINGER on the outgoing stream
interface") is not desirable under sink context:

The reason why we have the NOLINGER flag set is to ensure the connection
is closed right away and avoid 60s TIME_WAIT delay on closed sockets.
The downside is that messages sent right before closing the socket are
not guaranteed to make it to the server because closing with NOLINGER
flag set will result in RST packet being emitted right away, which could
prevent in-flight messages from being properly delivered.

Unlike peers applets, the only cases where sink applets are expected to
close the connection are upon unexpected error or upon stopping, which are
relatively rare events. Thanks to previous commit, ERROR flag is already
set in case of error, so the use of NOLINGER is not mandatory for the
RST to be sent. Now for the stopping case, it only happens once in the
process lifetime so it's acceptable to close the socket using EOS+EOI
flags without the NOLINGER option set.

So in our case, it is preferable to ensure messages get properly delivered
knowing that closed sockets should be piling up in TIME_WAIT, this means
removing the NOLINGER flag on the outgoing stream interface for sink
applets. It is a prerequisite for upcoming patches in order to cleanly
shut the applet during runtime without risking to send the RST packet
before all pending messages were sent to the endpoint.
2024-07-24 17:58:58 +02:00
Aurelien DARRAGON
c6ab0e14e2 MINOR: sink: distinguish between hard and soft close in _sink_forward_io_handler()
Aborting the socket on soft-stop is not the same as aborting it due to
unexpected error. As such, let's leverage the granularity offered by
sedesc flags to better reflect the situation: abort during soft-stop is
handled as a soft close thanks to EOI+EOS flags, while abort due to
unexpected error is handled as hard error thanks to ERROR+EOS flags.

Thanks to this change, hard error will always emit RST packet even if
the NOLINGER option wasn't set on the socket.
2024-07-24 17:58:52 +02:00
Aurelien DARRAGON
b40d804c7f MINOR: sink: add some comments about sft->appctx usage in applet handlers
There seems to be an ambiguity in the code where sft->appctx would differ
from the appctx that was assigned to it upon appctx creation.

In practice, it doesn't seem this could happen. Adding a few notes
to come back to this later and try to see if we can remove this
ambiguity.
2024-07-24 17:58:47 +02:00
Aurelien DARRAGON
10811fdfd6 MINOR: sink: merge sink_forward_io_handler() with sink_forward_oc_io_handler()
Now that sink_forward_oc_io_handler() and sink_forward_io_handler() were
unified again thanks to the previous commit, let's take a chance to merge
code that is common to both functions in order to ease code maintenance.

Let's add _sink_forward_io_handler() internal function which takes the
applet and a message handler as argument: sink_forward_io_handler() and
sink_forward_oc_io_handler() leverage this internal function by passing
the correct message handler for the desired format.
2024-07-24 17:58:41 +02:00
Aurelien DARRAGON
f2848e6146 MINOR: sink: Remove useless test on SE_FL_SHR/SHW flags
Re-apply dcd917d972 ("MINOR: applet: Remove uselelss test on SE_FL_SHR/SHW
flags") for sink_forward_oc_io_handler() function as it was probably
overlooked given that sink_forward_oc_io_handler() and
sink_forward_io_handler() follow the same logic.
2024-07-24 17:58:35 +02:00
Aurelien DARRAGON
901a66b3fc MINOR: sink: unify and sink_forward_io_handler() and sink_forward_oc_io_handler()
In a739dc2 ("MEDIUM: sink: Use the sedesc to report and detect end of
processing"), we added a drain after close in sink_forward_oc_io_handler()
by the use of "goto out".

However, since we perform a close, there is no reason to drain data from
the socket. Moreover, before the patch there was no drain and nothing
mentioned the fact that the drain was added on purpose. Lastly,
sink_forward_io_handler() and sink_forward_oc_io_handler() functions are
strictly identical when it comes to processing logic, and the drain was
only added in sink_forward_oc_io_handler() and not in
sink_forward_io_handler().

As such, it's pretty safe to assume that the drain is not needed here
and was added by accident. So in this patch we remove it in an attempt
to unify sink_forward_io_handler() and sink_forward_oc_io_handler()
functions like it was already the case before.
2024-07-24 17:58:30 +02:00
Aurelien DARRAGON
c81b8ee480 BUG/MEDIUM: sink: properly init applet under sft lock
Since 09d69eacf8 ("MEDIUM: sink: start applets asynchronously") the applet
is no longer initialized under the sft lock while it was the case before.

At first it doesn't seem to be an issue, but if we look closer at
sink_forward_session_init(), we can see that sft->appctx is assigned
while it can be accessed at the same time from sink_init_forward().

Let's restore the old guarantees by performing the .init under the sft
lock.

No backport needed unless 09d69eacf8 is.
2024-07-24 17:58:24 +02:00
Christopher Faulet
06547dcf52 REGTESTS: check/spoe: Re-enable the script performing SPOP health-checks
Thanks to previous patches, it is now possible to re-enable the test on SPOP
health-checks support.
2024-07-24 14:19:10 +02:00
Christopher Faulet
51e18c9aa6 MEDIUM: spoe/tcpcheck: Reintroduce SPOP check as a customized tcp-check
To be able to retrieve accurate errors when a SPOP health-check is
performed, a customized tcp-check is used. Indeed, it is not possible to
rely on the SPOP multiplexer for now because the check is performed at the
mux connection layer and the error, if any, cannot be retrieved by the
health-check. A L4 success or error is reported.

To fix this issue and restore the previous behavior, a customized tcp-check
is created. The connection is forced to use the PT multiplexer. A hardcoded
message is sent and a custom handler is used to decode the SPOA response.
This way, it is possible to parse the response and return an accurate
status code.
2024-07-24 14:19:10 +02:00
Christopher Faulet
2f3c4d1b6c MINOR: spoe: export the list of SPOP error reasons
The strings representing the human-readable version for SPOP errors are now
exported. It is now an array of IST to ease manipulation.
2024-07-24 14:19:10 +02:00
Christopher Faulet
f8fed07d3a MINOR: spoe: Add a function to validate a version is supported
spoe_check_vsn() function can now be used to check if a version, converted
to an integer, via spoe_str_to_vsn() for instance, is supported. To do so,
the list of all supported versions is now exported.
2024-07-24 14:19:10 +02:00
Frederic Lecaille
735e4aecfc MINOR: quic: Avoid cc priv buffer overflow.
Add two initcall callbacks with BUG_ON_HOT() to the newreno and cubic modules to
ensure there is no buffer overflow when accessing the private data of
these congestion control algorithm state structures. This is to ensure
that further modifications about these data structures will not
lead to surprises. At this time there is no possible buffer overflow.
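
The kind of check described here can be sketched as follows. The structure
layouts, sizes and names (cc_ctx, my_algo_state, my_algo_check) are
hypothetical, and a plain assert() stands in for BUG_ON_HOT():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes and layouts, for illustration only. */
struct cc_ctx {
    uint32_t priv[20];          /* opaque per-algorithm storage */
};

struct my_algo_state {
    uint32_t cwnd;
    uint32_t ssthresh;
    uint64_t last_event_ms;
};

/* Startup check: the algorithm state must fit in the private area, so a
 * later growth of the state structure is caught immediately.
 */
static void my_algo_check(void)
{
    assert(sizeof(struct my_algo_state) <= sizeof(((struct cc_ctx *)0)->priv));
}
```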
2024-07-24 11:07:19 +02:00
Christopher Faulet
e902db2609 DOC: config: Explicitly state the SPOE streams have a usable parent stream
It is explicitly mentioned in the configuration manual that the parent of a
SPOE stream is the filtered stream. It means variables of the filtered
stream are usable from the SPOE stream.
2024-07-19 16:35:44 +02:00
Christopher Faulet
2e86de0e0f DOC: config: Add info about variable scopes referencing the parent stream
It is now possible for a stream to have a parent and it is also possible to
retrieve variables defined in the parent stream context. To do so, some
extra scopes were introduced. The section 2.8. was updated accordingly.
2024-07-19 16:35:38 +02:00
Christopher Faulet
b643fbb1a6 DOC: config: Add a dedicated section about variables
The variables in the HAProxy configuration are now described in a dedicated
section. Instead of repeating the same description everywhere a variable
name can be used, the section 2.8. is now referenced.
2024-07-19 16:31:13 +02:00
Frederic Lecaille
402ce29e9e BUG/MINOR: quic: Non optimal first datagram.
This bug arrived with this commit:

     b068e758f MINOR: quic: simplify rescheduling for handshake

This commit introduced a bad side effect. Haproxy always replied by an ACK-only
datagram when it received the first client Initial packet. Then it handled
the CRYPTO data inside. And finally, it sent its own CRYPTO data. This broke
the packet coalescing rule whose aim is to optimally build and send as many
QUIC packets as possible per datagram.

To fix this, simply partially revert this commit, to make the low level I/O
task return again if some CRYPTO were received. This will delay the
acknowledgement which will be sent with the CRYPTO data from the same
datagram again.

Must be backported to 3.0.
2024-07-19 16:22:00 +02:00
Christopher Faulet
127083a7a2 MEDIUM: spoe: Set the parent stream for SPOE streams
When a SPOE applet is created to send a message to an agent, the parent of
the associated stream is set to the one filtered. And the relationship
between the streams is removed when the applet is released or when the
processing on main stream is finished.

In the mean time, it is possible to get variables of the parent stream from
the SPOE one. It is not a huge change but this will be amazingly useful. For
instance, it is now possible to be sticky on a server using a criterion of the
main stream. Here is an example using the client source address:

  listen http
    bind *:80
    tcp-request content set-var(txn.client_src) src
    filter spoe engine {SPOE-NAME} config /{SPOE-CONFIG}
    http-request send-spoe-group {SPOE-NAME} {SPOE-MSG}
    server www 127.0.0.1:8000

  backend spoe-backend
    mode spop
    timeout server 10s

    stick-table type ip size 200k expire 30m
    stick on var(ptxn.client_src)

    server srv1 ...
    server srv2 ...
    server srv3 ...
    server srv4 ...

Of course, the feature is not limited to stick-tables. Everywhere variables
are used, it is now possible to get the value set on the parent stream from
the SPOE stream.
2024-07-18 17:06:12 +02:00
Christopher Faulet
230c1570ac MEDIUM: vars: Be able to retrieve variable of the parent stream, if any
It is now possible to retrieve the value of a variable using the parent
stream or the parent session instead of the current one. It remains
forbidden to set or unset this value. The sample fetch used to store the
result is a local copy. So it may be safely altered by a converter without
changing the value of the original variable.

Note that for now, the parent of a stream is never set. So this part is not
really used. This will change with the SPOE.
2024-07-18 17:06:12 +02:00
Christopher Faulet
1a1afecb8b MINOR: vars: Use a variable description to get variables of a specific scope
Now that a variable description is retrieved when a variable is parsed, we can
use it to get the variable value. It is mandatory to be able to know whether
the parent stream, if any, must be used instead of the current one.
2024-07-18 17:06:12 +02:00
Christopher Faulet
f93828f229 MEDIUM: vars: Be able to parse parent scopes for variables
Add session/stream scopes related to the parent. To do so, "psess", "ptxn",
"preq" or "pres" must be used instead of traditional scopes (without the
first "p"). The "proc" scope is not concerned by this change because it is
not linked to a stream. When such scopes are used, a specific flag is added
on the variable description during the variable parsing.

For now, these scopes are parsed and the variable description is updated
accordingly. But at the end, any operation on the variable value fails.
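
The scope parsing described above can be sketched as follows. The enum, the
var_desc layout and parse_scope() are hypothetical and only cover the scopes
named in this message; they are not the structures used by the patch:

```c
#include <assert.h>
#include <string.h>

enum scope { SC_NONE, SC_PROC, SC_SESS, SC_TXN, SC_REQ, SC_RES };

struct var_desc {
    enum scope scope;
    int parent;          /* 1 if the parent stream must be used */
};

/* Parse a scope name such as "txn" or "ptxn" into <desc>.
 * Returns 1 on success, 0 if the scope is unknown.
 */
static int parse_scope(const char *name, struct var_desc *desc)
{
    static const struct { const char *name; enum scope scope; } scopes[] = {
        { "sess", SC_SESS }, { "txn", SC_TXN },
        { "req", SC_REQ },   { "res", SC_RES },
    };
    size_t i;

    assert(name && desc);
    desc->scope = SC_NONE;
    desc->parent = 0;

    if (strcmp(name, "proc") == 0) {    /* no parent variant for "proc" */
        desc->scope = SC_PROC;
        return 1;
    }
    if (name[0] == 'p') {               /* leading "p": parent scope */
        desc->parent = 1;
        name++;
    }
    for (i = 0; i < sizeof(scopes) / sizeof(scopes[0]); i++) {
        if (strcmp(name, scopes[i].name) == 0) {
            desc->scope = scopes[i].scope;
            return 1;
        }
    }
    return 0;
}
```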
2024-07-18 16:39:39 +02:00
Christopher Faulet
d430edcda3 MINOR: vars: Use a description to set/unset a variable instead of its hash and scope
Now that a variable description is retrieved when a variable is parsed, we can
use it to set or unset the variable value. It is mandatory to be able to
know whether the parent stream, if any, must be used instead of the current one.
2024-07-18 16:39:38 +02:00
Christopher Faulet
eb2d71614f MINOR: vars: Fill a description instead of hash and scope when a name is parsed
A variable description is now used to parse a variable and extract its name
and its scope. It is mandatory to be able to add some flags on the variable
when it is evaluated (set or get). Among other things, this will be used to
know whether the parent stream, if any, must be used instead of the current one.
2024-07-18 16:39:38 +02:00
Christopher Faulet
b020bb73a0 MINOR: stream: Add a pointer to set the parent stream
A pointer to a parent stream was added in the stream structure. For now,
this pointer is never set, but the idea is to have an access to a stream
environment from another one from the moment there is a parent/child
relationship between these streams.

Concretely, for now, there is nothing to formalize this relationship.
2024-07-18 16:39:38 +02:00
Christopher Faulet
3cdb3fa5d9 BUG/MINOR: cli: Atomically inc the global request counter between CLI commands
The global request counter is used to set the stream id (s->uniq_id). It is
incremented at different places. And it must be atomically incremented
because it is a global value. However, in the analyzer dealing with CLI
command response, this was not the case. It is now fixed.

This patch must be backported to all stable versions.
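
As an illustration of the atomic increment, here is a minimal sketch using C11
atomics as a stand-in for HAProxy's own atomic macros. The counter and the
next_uniq_id() helper are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the global request counter. */
static atomic_uint global_req_count;

/* Returns a unique ID: the value of the counter before the increment.
 * The read-modify-write is a single atomic operation, so two threads
 * can never obtain the same value.
 */
static unsigned int next_uniq_id(void)
{
    return atomic_fetch_add(&global_req_count, 1);
}

static void uniq_id_demo(void)
{
    unsigned int a = next_uniq_id();
    unsigned int b = next_uniq_id();

    assert(b == a + 1);
}
```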
2024-07-18 16:39:38 +02:00
Christopher Faulet
abaafda485 BUG/MINOR: server: Don't warn fallback IP is used during init-addr resolution
When a fallback IP address is provided in the list of methods to use to
resolve the server address, a warning is emitted if previous methods
failed. The aim is to inform this address will be used for the
server. However, it is a valid use case. It is the expected behavior. There is
no reason to emit a warning. Having a message during HAProxy startup to
inform the fallback IP address will be used is probably a good idea. But it
should be a notice not a warning. Otherwise, checking the configuration
validity will always fail, just like starting HAProxy in zero-warning
mode while the option was set on purpose.

This patch should fix the issue #2627. It must be backported to all stable
versions.
2024-07-18 16:39:38 +02:00
Amaury Denoyelle
ea7ea5198a BUG/MINOR: stick-table: fix crash for src_inc_gpc() without stkcounter
Since 2.5, an array of GPC is provided to replace legacy gpc0/gpc1.
src_inc_gpc is a sample fetch which is used to increment counters in
this array.

A crash occurs if src_inc_gpc is used without any previous track-sc
rule. This is caused by an error in smp_fetch_sc_inc_gpc(). When
temporary stick counter is created via smp_create_src_stkctr(), table
pointer arg value used is not correct : it points to the counter ID
instead of the table argument. To fix this, use the proper sample fetch
second arg.

This can be reproduced with the following config :
  acl mark src_inc_gpc(0,<table>) -m bool
  tcp-request connection accept if mark

This should be backported up to 2.6.
2024-07-18 16:12:36 +02:00
Willy Tarreau
2bd269cf2a DOC: config: improve the http-keep-alive section
Nathan Wehrman suggested this add-on to try to better explain the
interactions between http-keep-alive and other timeouts, and the
impacts on protocols (HTTP/1, HTTP/2 etc).
2024-07-18 14:24:07 +02:00
Valentine Krasnobaeva
83ff4db188 MINOR: cfgparse-global: move no<poller_name> in cfg_kw_list
This commit continues to clean up cfg_parse_global() and to prepare the
refactoring of master-worker mode. Master, after forking a worker, enters in
its wait polling loop to catch signals and to provide master CLI. So, some
poller types could be disabled for the master process as well.
2024-07-18 14:15:59 +02:00
Valentine Krasnobaeva
118ac11cea MINOR: cfgparse-global: move mode's keywords in cfg_kw_list
This commit cleans up cfg_parse_global() and prepares the config parser for
master-worker mode refactoring, where daemon and master-worker fork() calls
will happen very early in init().

So, the config in such case should be read twice:
 - at first: only some keywords in the global section for the mode discovery
   and everything, which is related to master process by opportunity;
 - at second: except the master process, all other keywords would be parsed;
2024-07-18 14:15:52 +02:00
Aurelien DARRAGON
d3d35f0fc6 BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char (2)
Fix build warning on NetBSD by reapplying f278eec37a ("BUILD: tree-wide:
cast arguments to tolower/toupper to unsigned char").

This should fix issue #2551.
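
For context, the <ctype.h> functions take an int that must be either EOF or
representable as an unsigned char, so passing a plain char that happens to be
negative is undefined behavior. A minimal sketch of the cast, using a
hypothetical str_tolower() helper:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Lowercase a NUL-terminated string in place. */
static void str_tolower(char *s)
{
    assert(s != NULL);
    for (; *s; s++)
        *s = (char)tolower((unsigned char)*s);  /* cast avoids negative values */
}
```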
2024-07-18 13:29:52 +02:00
Valentine Krasnobaeva
fcd4bf54c8 BUG/MEDIUM: startup: fix zero-warning mode
Let's check the second time a global counter of "ha_warning" messages, if
zero-warning is set. And let's do this just before forking. At this moment we
are sure, that we've already done all init operations, where we could emit
"ha_warning", and we still have stderr fd opened.

Even with the second check, we could lose some late and rare warnings
about failing to drop supplementary groups and about re-enabling core dumps.
Notes about this are added into 'zero-warning' keyword description.
2024-07-18 05:24:56 +02:00
William Lallemand
beaa0e1635 DOC: configuration: issuers-chain-path is compatible with OCSP
Since patch f3dfd95a ("MEDIUM: ocsp: fix ocsp when the chain is loaded
from 'issuers-chain-path'") the OCSP features are compatible with
'issuers-chain-path'.
2024-07-17 18:20:43 +02:00
William Lallemand
8a3e4a608b DOC: configuration: issuers-chain-path not compatible with OCSP
State that issuers-chain-path is not compatible with OCSP features.

Must be backported in every stable version.
2024-07-17 17:46:16 +02:00
William Lallemand
4bac38d088 REGTESTS: ssl: test the issuers-chain-path keyword
Add a reg-test which tests the completion of the issuers-chain-path
keyword.

Note that it could be interesting to have the loading of a .ocsp
combined with this, but our pki for OCSP tests lacks
the SubjectKeyIdentifier extensions.
2024-07-17 16:52:06 +02:00
William Lallemand
ae8c3f7f77 MINOR: ssl: change issuers-chain for show_cert_detail()
Since data->chain is now completed when loading the files, we don't need
to use ssl_get0_issuer_chain() anywhere else in the code.

data->chain will always be completed once the files are loaded, but we
can't know from show_cert_detail() from what chain file it was completed.
That's why the extra_chain pointer was added to dump the chain file.
2024-07-17 16:52:06 +02:00
William Lallemand
344c3ce8fc MEDIUM: ssl: add extra_chain to ckch_data
The extra_chain member is a pointer to the 'issuers-chain-path' file
that completed the chain.

This is useful to get what chain file was used.
2024-07-17 16:52:06 +02:00
Valentine Krasnobaeva
f3dfd95aa2 MEDIUM: ocsp: fix ocsp when the chain is loaded from 'issuers-chain-path'
This fixes OCSP when the issuer chain is in a separate PEM file. This is the
case with the issuers-chain-path keyword, which points to a folder that contains
only PEM files with the RootCA and IntermediateCA.

Before this patch, the chain from 'issuers-chain-path' was applied
directly to the SSL_CTX without being applied to the data->chain
structure. This would work for SSL traffic, but every test done with
data->chain would fail, OCSP included, because the chain would be NULL.

This patch moves the loading of the chain from
ssl_sock_load_cert_chain(), which is the function that applies the chain
to the SSL_CTX, to ssl_sock_load_pem_into_ckch() which is the function
that loads the files into the ckch_data structure.

Fixes issue #2635 but it changes things on the CLI, so that's not
backportable.
2024-07-17 16:52:06 +02:00
Aurelien DARRAGON
c454296f07 OPTIM: sink: balance applets accross threads
Most of the time all sink applets (which are responsible for relaying
messages from the ring to the tcp servers endpoints) would end up being
assigned to the first available thread (tid:0), resulting in excessive CPU
usage on a single thread when multiple sink servers were defined (no
matter if they were defined over multiple "ring" sections) and significant
message load was pushed through them over the ring API.

This patch is similar to 34e4085f ("MEDIUM: peers: Balance applets across
threads") but for sinks. We use a slightly different approach, which is to
elect a random thread instead of picking the one with the fewest applets. This
proves to be already sufficient to alleviate the issue.

In the case we want to have a better load distribution we should consider
breaking existing connections to reestablish them on a new thread when we
find out that they start monopolizing a cpu thread (ie: after a certain
amount of messages for instance). Also check tcpchecks migrating model for
inspiration.

This patch depends on the previous one ("MEDIUM: sink: start applets
asynchronously").
2024-07-17 16:45:49 +02:00
Aurelien DARRAGON
09d69eacf8 MEDIUM: sink: start applets asynchronously
Since d9c1d33fa1 ("MEDIUM: applet: Add support for async appctx startup
on a thread subset"), it is now possible to delay appctx's init: for that
it is required that the .init callback is defined on the applet.

When the applet will be processed on the first run, applet API will
automatically finish the applet initialization. Thus we explicitly
call appctx_wakeup() on the applet to schedule it for initial run
instead of calling appctx_init() ourselves.

This is done in anticipation of the next patch in order to be able to
schedule the applet on a different thread from the one executing
sink_forward_session_create() function.

Note: 'out_free_appctx' label was removed since it is no longer used.
2024-07-17 16:45:43 +02:00
Willy Tarreau
4de03e42cd BUG/MAJOR: mux-h2: force a hard error upon short read with pending error
A risk of truncated packet was addressed in 2.9 by commit 19fb19976f
("BUG/MEDIUM: mux-h2: Only Report H2C error on read error if demux
buffer is empty") by ignoring CO_FL_ERROR after a recv() call as long
as some data remained present in the buffer. However it has a side
effect due to the fact that some frame processors only deal with full
frames, for example, HEADERS. The side effect is that an incomplete
frame will not be processed and will remain in the buffer, preventing
the error from being taken into account, so the I/O handler wakes up
the H2 parser to handle the error, and that one just subscribes for
more data, and this loops forever wasting CPU cycles.

Note that this only happens with errors at the SSL layer exclusively,
otherwise we'd have a read0 pending that would properly be detected:

  conn->flags = CO_FL_XPRT_TRACKED | CO_FL_ERROR | CO_FL_XPRT_READY | CO_FL_CTRL_READY
  conn->err_code = CO_ERR_SSL_FATAL
  h2c->flags  = H2_CF_ERR_PENDING | H2_CF_WINDOW_OPENED | H2_CF_MBUF_HAS_DATA | H2_CF_DEM_IN_PROGRESS | H2_CF_DEM_SHORT_READ

The condition to report the error in h2_recv() needs to be refined, so
that connection errors are taken into account either when the buffer is
empty, or when there's an incomplete frame, since we're certain it will
never be completed. We're certain to enter that function because
H2_CF_DEM_SHORT_READ implies too short a frame, and earlier there's a
protocol check to validate that no frame size is larger than bufsize,
hence a H2_CF_DEM_SHORT_READ implies there's some room left in the
buffer and we're allowed to try to receive.
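
The refined condition can be sketched as follows. This is only a minimal
illustration, not the actual h2_recv() code: the structure and the two
functions are hypothetical stand-ins for the connection error flag, the
demux buffer level and H2_CF_DEM_SHORT_READ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the state discussed above. */
struct demo_h2_state {
	bool   conn_error; /* transport reported an error (CO_FL_ERROR) */
	size_t buffered;   /* bytes still present in the demux buffer */
	bool   short_read; /* pending frame is incomplete (H2_CF_DEM_SHORT_READ) */
};

/* Old rule: the error is only reported once the buffer is empty, so a
 * partial frame stuck in the buffer hides the error forever.
 */
bool demo_report_error_old(const struct demo_h2_state *s)
{
	assert(s != NULL);
	return s->conn_error && s->buffered == 0;
}

/* Refined rule: also report the error when an incomplete frame is
 * pending, since it will never be completed after a connection error.
 */
bool demo_report_error_new(const struct demo_h2_state *s)
{
	assert(s != NULL);
	return s->conn_error && (s->buffered == 0 || s->short_read);
}
```

With an error pending and a truncated frame in the buffer, the old rule
keeps returning false (hence the endless wakeups) while the refined one
reports the error.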

The condition to reproduce the bug seems super hard to meet but was
observed once by Patrick Hemmer who had the reflex to capture lots of
information that made it possible to explain the problem. In order to
reproduce it, the SSL code had to be significantly modified to alter
received contents at empirically chosen places, but that was sufficient
to reproduce it and confirm that the current patch works as expected.

The bug was tagged MAJOR because when it triggers there's no other
solution to get rid of it but to restart the process. However given how
hard it is to trigger in a lab, it does not seem very likely to occur
in the field.

This needs to be backported to 2.9.
2024-07-17 15:07:47 +02:00
Valentine Krasnobaeva
9371c28c28 BUG/MEDIUM: ssl_sock: fix deadlock in ssl_sock_load_ocsp() on error path
HAProxy may run under heavy load in containers or on premises while some
automatic tool uses the CLI in parallel to check OCSP update statuses or to
upload new OCSP responses. In such conditions, the calloc() used to store the
OCSP update callback arguments may fail, and ocsp_tree_lock needs to be
unlocked when exiting due to this failure.
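
The error path can be pictured with the sketch below. It is not the
ssl_sock_load_ocsp() code: the toy lock stands in for ocsp_tree_lock, and
the function names and the injectable allocator are assumptions made only
to simulate a calloc() failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy lock standing in for ocsp_tree_lock. */
struct demo_lock {
	bool held;
};

void demo_lock_take(struct demo_lock *l)
{
	assert(l != NULL && !l->held);
	l->held = true;
}

void demo_lock_release(struct demo_lock *l)
{
	assert(l != NULL && l->held);
	l->held = false;
}

/* Allocator that always fails, to simulate memory pressure. */
void *demo_failing_calloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}

/* Allocates the callback argument under the lock. Returns 0 on success
 * and -1 on allocation failure. The lock is released on every path,
 * which is the point of the fix.
 */
int demo_register_callback(struct demo_lock *l,
                           void *(*alloc)(size_t, size_t), void **out)
{
	int ret = -1;

	demo_lock_take(l);
	*out = alloc(1, 64);
	if (!*out)
		goto out_unlock; /* the failing path must not keep the lock */
	ret = 0;
 out_unlock:
	demo_lock_release(l);
	return ret;
}
```

Calling demo_register_callback() with demo_failing_calloc returns -1 and
still leaves the lock released, so a later call can take it again.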

This needs to be backported to all stable versions down to v2.4.0 included.
2024-07-17 14:52:11 +02:00
Lukas Tribus
a9e3decd76 DOC: install: don't reference removed CPU arg
Remove the reference to the CPU= build argument, which was removed by
commit 018443b8a1 ("BUILD: makefile: get rid of the CPU variable").

This should be backported to 3.0.
2024-07-16 20:06:06 +02:00
Valentine Krasnobaeva
e8799d2880 MINOR: debug: keep runtime limits in postmortem
It's useful to keep runtime limits (fd and RAM) in postmortem and show them in
debug_parse_cli_show_dev(). Runtime limits are fed in feed_post_mortem_late(),
as we are sure that at this moment all the configuration was parsed and
all applied limits were already adjusted.
2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva
3abd03aa78 MINOR: debug: prepare to show runtime limits
This is a preparation patch to extend postmortem in order to store runtime
limits. No need to perform getrlimit() in feed_post_mortem(), as we do this
in the very beginning of main() and we store initial fd limits in global
'rlim_fd_cur_at_boot' and 'rlim_fd_max_at_boot' variables.
2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva
665dde6481 MINOR: debug: use LIM2A to show limits
It is handier to use LIM2A in debug_parse_cli_show_dev(), as it allows
showing a custom string ("unlimited") if a given limit value equals 0.

The normalize_rlim() handler is needed to properly convert RLIM_INFINITY to
zero, with respect to type sizes, as rlim_t is always 4 bytes on 32bit and
64bit arch.
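
As an illustration of the idea, here is a minimal sketch. It does not
reproduce the real LIM2A macro or normalize_rlim() handler: both demo_*
functions below are hypothetical and only show the mapping of
RLIM_INFINITY to 0 and of 0 to a custom string.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Maps RLIM_INFINITY to 0 so that 0 consistently means "no limit",
 * and widens the value to a fixed-size type for printing.
 */
unsigned long long demo_normalize_rlim(rlim_t v)
{
	return (v == RLIM_INFINITY) ? 0ULL : (unsigned long long)v;
}

/* Returns a printable form of a normalized limit: the constant string
 * "unlimited" for 0, otherwise the decimal value written into <buf>.
 */
const char *demo_lim2a(unsigned long long v, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (v == 0)
		return "unlimited";
	snprintf(buf, len, "%llu", v);
	return buf;
}
```

A caller would typically pass the rlim_cur or rlim_max field returned by
getrlimit() through demo_normalize_rlim() before formatting it.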
2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva
93cc7df276 MINOR: debug: keep runtime capabilities in post_mortem
Let's extend postmortem to keep process runtime capabilities. This information
is gathered in feed_post_mortem_late(), as it is called just before
run_poll_loop() and we are sure at this moment, that all configuration
settings were successfully applied.
2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva
baa4e1cf39 MINOR: debug: store runtime uid/gid in postmortem
Let's extend post_mortem to store runtime process uid and gid.
This information is fed in feed_post_mortem_late(), just before calling
run_poll_loop(). Like this we are sure that all configuration settings were
successfully applied.
2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva
ac8bd679dc CLEANUP: debug: fix indents in debug_parse_cli_show_dev
Fix indents in debug_parse_cli_show_dev() to avoid useless conflicts in case of
future changes in this function or git-bisect.
2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva
7cdf5751b5 MINOR: debug: prepare feed_post_mortem_late
Process runtime information could be very useful in post_mortem, but we have to
collect it just before calling run_poll_loop(). Like this we are sure, that
we've successfully applied all configuration parameters and what we've
collected are the latest runtime settings.

The most appropriate place to collect such information is
feed_post_mortem_late(). It's called in each thread, but puts thread info in
the post_mortem only when it's in the last thread context. As it's called
under mutex lock, other threads at this moment have to wait until
feed_post_mortem_late() and the other initialization functions from
per_thread_init_list finish. The number of threads could be large. So, to
avoid spending a lot of time under the lock, let's exit immediately from
feed_post_mortem_late() if it wasn't called in the last thread.
2024-07-16 14:04:41 +02:00
Willy Tarreau
e0e2b66132 BUG/MEDIUM: debug/cli: fix "show threads" crashing with low thread counts
The "show threads" command introduced early in the 2.0 dev cycle uses
appctx->st1 to store its context (the number of the next thread to dump).
It goes back to an era where contexts were shared between the various
applets and the CLI's command handlers.

In fact it was already not good by then because st1 could possibly have
APPCTX_CLI_ST1_PAYLOAD (2) in it, which would make the dump start at
thread 2, though it was extremely unlikely.

When contexts were finally cleaned up and moved to their own storage,
this one was overlooked, maybe due to using st1 instead of st2 like
most others. So it continues to rely on st1, and more recently some
new flags were appended, one of which is APPCTX_CLI_ST1_LASTCMD (16)
and is always there. This makes "show threads" believe it must
start to dump from thread 16, and if this thread is not present, it can
simply crash the process. A tiny reproducer is:

  global
    nbthread 1
    stats socket /tmp/sock1 level admin mode 666

  $ socat /tmp/sock1 - <<< "show threads"

The fix for modern versions simply consists in assigning a context to
this command from the applet storage. We're using a single int, no need
for a struct, an int* will do it. That's valid till 2.6.

Prior to 2.6, better switch to appctx->ctx.cli.i0 or i1 which are all
properly initialized before the command is executed.

This must be backported to all stable versions.

Thanks to Andjelko Horvat for the report and the reproducer.
2024-07-16 11:35:06 +02:00
Amaury Denoyelle
d57b95aab7 BUG/MINOR: do not close uninit FD in quic_test_socketops()
On startup, quic_test_socketops() is called to ensure that the chosen
configuration options are compatible with the UDP system stack. A dummy FD is
allocated to invoke various setsockopt() settings.

If no tests are required, the FD is not allocated. In this case, close()
should not be called. This is mostly for better coding as this does not
cause any real issue for users.
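
The pattern looks roughly like the sketch below. It is not the
quic_test_socketops() code: the function name, its parameters and the
close counter are assumptions added for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Optionally opens a dummy UDP socket to probe options, then releases
 * it. <nb_close> receives the number of close() calls performed.
 * Returns 0 on success, -1 if the socket could not be created.
 */
int demo_test_socketops(bool need_test, int *nb_close)
{
	int fd = -1; /* -1 means "never allocated" */

	assert(nb_close != NULL);
	*nb_close = 0;

	if (need_test) {
		fd = socket(AF_INET, SOCK_DGRAM, 0);
		if (fd < 0)
			return -1;
		/* setsockopt() probes would be performed here */
	}

	if (fd >= 0) { /* only close an FD that was really opened */
		close(fd);
		(*nb_close)++;
	}
	return 0;
}
```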

This should fix github issue #2638.

No need to backport.
2024-07-16 10:51:02 +02:00
Aurelien DARRAGON
05f33e95ba MINOR: server: better mt_list usage for node migration (prev_deleted handling)
Now that the mt_list v2 API was merged into haproxy's codebase in 4e65fc6
("MAJOR: import: update mt_list to support exponential back-off (try #2)"),
let's fix a hack in cli_parse_delete_server() which abused the mt_list
API to migrate an element from one list to another: there used to be a
tiny race there between the pop and the append operations, a race that was
compensated by the fact that it was performed under full thread isolation.

However that was a bad example of the mt_list API which could have
resulted in an actual bug if the code was duplicated elsewhere without thread
isolation. To fix this, we now make use of the
MT_LIST_FOR_EACH_ENTRY_LOCKED() macro which allows us to simply migrate
the current element to another list since the element is appended into
another one while still in busy state and then unlinked from the original
list.
2024-07-16 09:12:39 +02:00
Willy Tarreau
75b335abc7 MINOR: fd: don't scan the full fdtab on all threads
During tests, it's pretty visible that with many threads and a large
number of FDs, the process may take time to be ready. The reason for
this is that the full fdtab array is scanned by each and every thread
at boot in fd_reregister_all() in order to make each thread-local
poller adopt the FDs that are relevant to it. The problem is that
when dealing with 1-2M FDs and 64+ threads, it starts to represent
quite a number of loops, and usually the fdtab array doesn't entirely
fit in the CPU's L3 cache, causing extra memory accesses.

It's particularly visible when issuing debugging commands to the CLI
because usually the first one fails while the CPU is at 100% for half
a second (which also is socat's timeout). A quick test with this:

    global
        stats socket /tmp/sock1 level admin mode 666
        stats timeout 1h
        maxconn 2000000

And the following script started in another window:

    while ! time socat -t5 - /tmp/sock1 <<< "show version";do date -Ins;done

shows that it takes 1.58s for the socat instance that succeeds on an
Ampere Altra with 80 cores. This requires changing the timeout (which
defaults to half a second), otherwise it returns nothing. In addition it
also means that during reloads, some CPU spikes will be noticed.

Adding a prefetch of the current FD + 16 improves the startup time by 30%
but that's far from being sufficient.

In practice all of this is performed at boot time, a moment at which we
know that extremely few FDs are registered (basically just the listeners),
so FD numbers are usually very low and the rest of the table is scanned
for no benefit. Ideally, knowing upfront how many FDs we have should be
sufficient.

A first approach would consist in counting the entries on a single thread
before registering pollers. It's not necessarily efficient and would take
time anyway.

This patch takes a different approach. It consists in keeping a thread-local
max ("fd_highest") that is updated whenever fd_insert() is called with a
larger number. Of course this is not correct once all threads have started,
but it will remain valid during boot since the same value is used during
startup and is cloned for each thread, and no scheduling happens anywhere
during this period, so that all threads are aware of the highest FD they've
seen registered, even if it had been done in some init code, and this without
having to deal with a shared variable.

Here on the test platform, the script gets its response in 10ms vs 1580
before.
2024-07-15 19:19:13 +02:00
Willy Tarreau
a5c5a68454 BUILD: mux-spop: fix build failure on gcc 4-10 and clang
A label at end of block was added in mux_spop.c in function
spop_conn_update_timeout() by commit 7e1bb7283b ("MEDIUM: mux-spop:
Introduce the SPOP multiplexer"). This is normally not permitted,
so gcc-4 to 10 and clang whine about it:

    CC      src/mux_spop.o
  src/mux_spop.c: In function 'spop_conn_update_timeout':
  src/mux_spop.c:899:2: error: label at end of compound statement
    899 |  leave:
        |  ^~~~~

Let's just add a return there to make the compiler happy. No backport
is needed.
2024-07-15 19:19:13 +02:00
Christopher Faulet
b353232641 DOC: spoe: Update SPOE documentation to reflect recent refactoring
The SPOE was refactored. Several parameters were deprecated. Fragmentation
and async capabilities support were removed. The default log-format was
updated too.

So, the SPOE documentation was updated accordingly.

The related issue is #2502.
2024-07-12 16:38:49 +02:00
Christopher Faulet
e83ab972cc MEDIUM: spoe: Make the SPOE applet use its own buffers
The SPOE applet is rewritten to use its own buffers. It is not a huge change
because, once started, the only responsibility of the SPOE applet is to
transfer the ACK frame to the SPOE filter. So it means it does not send any
data to the opposite endpoint, the NOTIFY frame was already transferred
during the applet creation. And it does only receive one full frame. Once
received, it can exit.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
1dd2e484b0 MEDIUM: spoe: Forward SPOE context error to the SPOE applet
Errors triggered by a SPOE filter instance, mainly the processing timeout, are
now forwarded to the SPOE applet. This way, an error can be reported to the SPOP
mux stream to abort it early.

Note that, for now, no abort reason is set because the SPOP connection is not
closed. Only the SPOP stream is aborted. But thanks to this patch, the SPOE
applet can be released immediately, instead of waiting for the ACK frame or an
error on the mux side.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
1755c32949 MEDIUM: mux-spop: Announce the pipeling support if possible
Reintroduce the pipelining support. Everything was already in place to be able
to multiplex the streams on a SPOP connection. Here, the pipelining support
is announced and checked in the agent replies. A hard-coded limit of 20
streams is set if the pipelining is supported on both sides. Otherwise, it
is disabled and only one stream at a time is allowed.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
880c037bcf MEDIUM: mux-spop: Add checks on received frames
Some conformance checks on received frames are added with this patch. The idea
is to detect invalid frames and ignore unknown ones if possible. All checks
are performed on the frame metadata, mainly on the stream and the frame
identifiers.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
7890d6b28d MINOR: spoe: Remove the spop version from the SPOE appctx context
The SPOE applet no longer manipulates the SPOP version. So it can be safely
removed from its context.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
62d3a96301 MEDIUM: mux-spop/spoe: Save negociated max-frame-size value in the mux
The SPOE applet is just a pass-through now. It is no longer responsible for
checking the frame size. On the other hand, the SPOP multiplexer negotiates the
maximum frame size with the agent. So, it seems logical to store this
negotiated value in the mux and no longer in the applet context.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
ba64bc3f20 MEDIUM: spoe: Directly receive ACK frame in the SPOE context buffer
Just like the previous patch, here we avoid a buffer copy between the SPOE
applet and the SPOE filter for the ACK reply. The buffer from the SPOE
context is used to retrieve the ACK reply from the channel response buffer.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
07cf7769ce MEDIUM: spoe: Directly xfer NOTIFY frame when SPOE applet is created
Instead of using a buffer from the SPOE filter to store the NOTIFY frame, then
copying it into a trash buffer in the SPOE applet to add meta-data and finally
transferring it to the channel, the original buffer is directly transferred to
the channel during the SPOE applet creation.

The SPOE applet is thus simplified, the I/O handler is now only responsible for
retrieving the ACK reply.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
6b9daec93d MINOR: stats-html: Display reuse ratio for spop connections
Now that SPOP connections can be reused, it could be pretty useful to know the
reuse rate. The corresponding backend and server counters are already
incremented, but not displayed on the stats HTML page. Thanks to this patch,
it is now possible to get it, just like for HTTP proxies.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
e68274c90a MAJOR: mux-spop: Make the SPOP connections reusable
Thanks to this patch, SPOP connections can now be inserted in the idle
connections list of the server or the session. There is no multiplexing but
SPOP connections can be reused. It is the same mechanism as for other
muxes. Nothing really new. But it is a huge improvement.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
078f9d3583 MINOR: mux-spop: Use a dedicated function to update the SPOP connection timeout
Force the SPOP servers to use the SPOE engine identifier as pool connection
name. This way, idle SPOP connections, once implemented, of different engines
but using the same backend will not be mixed up.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
e65ff4bf58 MEDIUM: spoe: Force the reuse 'always' mode for SPOP backends
The reuse "always" mode is forced for SPOP backends. For now, SPOP
connections cannot be idle, but once implemented, thanks to this patch, it
will be possible to reuse SPOP connections.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
d2ce835fb7 MINOR: backend: Remove test on HTX streams to reuse idle connections on connect
In the connect_server() function, there is a test to be able to reuse idle
connections for HTX streams only. Till now, only HTTP connections could be
idle. And this test was added to be sure to not reuse idle connections for
legacy HTTP streams. But the legacy HTTP was removed in HAProxy-2.1. So we
can safely remove this test.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
3a7879a652 MEDIUM: spoe: Set a specific name for the connection pool of SPOP servers
With this patch, we force the connection pool name of SPOP servers to the
SPOE engine identifier. This way, SPOP idle connections cannot be shared
between different engines.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
706a57d55a MINOR: spoe: Add internal sample fetch to retrieve the SPOE engine ID
The internal sample fetch "spoe.engine-id" is added. It may be used to
retrieve the current engine identifier, but only if the client endpoint is
an SPOE applet. For now, this sample is not documented. It will only be used
to set the connection pool name for a specific engine. This way, several
engines can use the same SPOP backend without sharing their idle connections.

The documentation will be added later, mainly because other SPOE sample
fetches will be added, and some changes are expected.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
a492e08e62 CLEANUP: spoe: Uniformize function definitions
SPOE function definitions were split over 2 or more lines, with the return
type alone on the first line. It is unusual in the HAProxy code.

The related issue is #2502.
2024-07-12 15:27:05 +02:00
Christopher Faulet
cab98784d8 MAJOR: spoe: Rewrite SPOE applet to use the SPOP mux
It is the biggest part of the series. The patch is not so huge, it removes
functions to produce or consume frames. The SPOE applet is pretty light
now. But since this patch, the SPOP multiplexer is now used. The SPOP mode
is now automatically used for SPOP backends. So if there are bugs in the
SPOP multiplexer, they will be visible now.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
1bea73612a MEDIUM: check/spoe: Use SPOP multiplexer to perform SPOP health-checks
The SPOP health-checks are now performed using the SPOP multiplexer. This
will be fixed later, but for now, it is considered as a L4 health-check and
no specific status code is reported. It means the corresponding vtest script
is marked as broken for now.

Functionally speaking, the same is performed. A connection is opened, a
HELLO frame is sent to the agent and we wait for the HELLO frame from the
agent in reply. But only L4OK, L4KO or L4TOUT will be reported.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
7e1bb7283b MEDIUM: mux-spop: Introduce the SPOP multiplexer
It is not possible to use it yet. Idle connections and pipelining mode are
not supported for now. But it should be possible to open a SPOP connection,
perform the HELLO handshake, send a NOTIFY frame based on data produced by
the client side and receive the corresponding ACK frame to transfer its
content to the client side.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
d0d23a7a66 MINOR: spoe: Move spoe_str_to_vsn() into the header file
The function used to convert the SPOE version from a string to an integer is
now located in spoe-t.h header file.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
08b522d6ac MINOR: spoe: Move all stuff regarding the filter/applet in the C file
Structures describing the SPOE applet context, the SPOE filter configuration
and context and the SPOE messages and groups are moved in the C file. In
spoe-t.h file, it remains the structure describing an SPOE agent and flags
used by both sides.

In addition, the SPOE frontend, created for a given SPOE engine, is moved
from the SPOE filter configuration to the SPOE agent structure.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
e6145a0ea1 MINOR: spoe: Dynamically alloc the message list per event of an agent
The inline array used to store, the configured messages per event in the
SPOE agent structure, is replaced by a dynamic array, allocated during the
configuration parsing. The main purpose of this change is to be able to move
all stuff regarding the SPOE filter and applet in the C file.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
ce53bb6284 MINOR: spoe: Rename some flags and constant to use SPOP prefix
A SPOP multiplexer will be added. Many flags, constants and structures will
be removed from the applet scope. So the "SPOP" prefix is used instead of
"SPOE", to be consistent.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
51ebf644e5 MINOR: stconn: Use a dedicated function to get the opposite sedesc
The se_opposite() function is added to let an endpoint retrieve the opposite
endpoint descriptor. Muxes supporting the zero-copy forwarding can now use
it. The se_shutdown() function too. This will be used by the SPOP multiplexer
to be able to retrieve the SPOE agent configuration attached to the applet
on client side.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
4b8098bf48 MINOR: connection: No longer include stconn type header in connection-t.h
It is a small change, but it is cleaner to not include the stconn-t.h header
in connection-t.h, mainly to avoid circular definitions.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
33ac3dabcb MEDIUM: applet: Add a .shut callback function for applets
Applets can now define a shutdown callback function, just like the
multiplexer. It is especially useful to get the abort reason. This will be
pretty useful to get the status code from the SPOP stream to report it at
the SPOE filter level.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
1538c4aa82 MEDIUM: proxy/spoe: Add a SPOP mode
The SPOE was significantly lightened. It is now possible to refactor it to
use a dedicated multiplexer. The first step is to add a SPOP mode for
proxies. The corresponding multiplexer mode is also added.

For now, there is no SPOP multiplexer, so it is only declarative. But at the
end, the SPOP multiplexer will be automatically selected for servers inside
a SPOP backend.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
b986952a75 MINOR: spoe: Remove the dedicated SPOE applet task
The dedicated task per SPOE applet is no longer used. So it is removed.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
4e589095d9 MAJOR: spoe: Remove idle applets and pipelining support
Management of idle applets is removed. Consequently, the pipelining support
is also removed. It is a huge change but it should be transparent for the
agents, except regarding the performances. Of course, being able to reuse
already opened connections and being able to multiplex frames on a given
connection is a must have. These features will be restored later.

The hello and idle timeouts are no longer used. Because an applet is spawned to
process a NOTIFY frame and closed after receiving the ACK reply, the
processing timeout is the only one required. In addition, the parameters to
limit the SPOE applet creation are no longer used too.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
2405881ab0 MINOR: spoe: Remove debugging
All the SPOE debugging is removed. The code will be easier to rework this
way and the debugging will be mainly moved in the SPOP multiplexer via the
trace API.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
d37489abef MINOR: spoe: Use only a global engine-id per agent
Because the async mode was removed, it is no longer mandatory to announce a
different engine identifier per thread for a given SPOE agent. This was
used to be sure requests and the corresponding responses are stuck on the
same thread.

So, now, a SPOE agent only announces one engine identifier on all
connections. No changes should be expected for agents.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
52ad7eb79e MEDIUM: spoe: Remove async mode support
The support for asynchronous mode, the ability to send messages on a
connection and receive the responses on any other connections, is removed.
It appears this feature was a bit overkill. And it is a problem for this
refactoring. This feature is removed and will not be restored at the end.

It is not a big deal for agent supporting the async mode because it is
usable if it is announced on both sides. HAProxy stops to announce it. This
should be transparent for agents.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
e3c92209f7 MEDIUM: spoe: Remove fragmentation support
It is the first patch of a long series to refactor the SPOE filter. The idea
is to rely on a dedicated multiplexer instead of hacking HAProxy with a list
of applets processing a message queue.

First of all, optional features will be removed. Some will be restored at
the end, some others will just be removed. It is the case here. The frame
fragmentation support is removed. The only purpose of this feature is to be
able to support the streaming. Because it is out of the scope of this
refactoring, the fragmentation is removed.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Christopher Faulet
249a547f37 CLEANUP: stconn: Fix a typo in comments for SE_ABRT_SRC_*
Just a little typo: s/set bu/ set by/
2024-07-12 15:27:04 +02:00
Christopher Faulet
0764445505 BUG/MINOR: session: Eval L4/L5 rules defined in the default section
It is possible to define TCP/HTTP rules in a named default section to
inherit from it in a proxy. However, there is an issue with L4/L5 rules.
Only the lists of the current frontend are checked to know if an eval must
be performed. Nothing is done for an empty list. Of course, the lists of the
default proxy must also be checked to be sure not to ignore default L4/L5
rules. It is now fixed.
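
The corrected check can be sketched as follows. The structure and
function below are hypothetical simplifications: real proxies hold rule
lists rather than counters, and the link to the defaults section is
modelled here by a plain pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced proxy: a rule count and its defaults section. */
struct demo_proxy {
	int nb_l4_rules;                /* L4 rules defined in this proxy */
	const struct demo_proxy *defpx; /* named defaults section, or NULL */
};

/* Tells whether L4 rules must be evaluated for frontend <fe>. Before
 * the fix, only the frontend's own list was considered.
 */
bool demo_must_eval_l4(const struct demo_proxy *fe)
{
	assert(fe != NULL);
	if (fe->nb_l4_rules > 0)
		return true;
	return fe->defpx != NULL && fe->defpx->nb_l4_rules > 0;
}
```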

This patch should fix the issue #2637. It must be backported as far as 2.6.
2024-07-12 15:27:04 +02:00
Valentine Krasnobaeva
9302869c95 BUG/MINOR: limits: fix license type in limits.h
Need to use LGPL-2.1-or-later in headers since our headers default
to LGPL.
2024-07-11 18:15:48 +02:00
Amaury Denoyelle
3be58fc720 CLEANUP: quic: rename TID affinity elements
This commit is the renaming counterpart of the previous one, this time
for the quic_conn module. Several elements related to TID affinity update
from quic_conn have been renamed: public functions, but also the flag
renamed to QUIC_FL_CONN_TID_REBIND and the trace event to
QUIC_EV_CONN_BIND_TID.

This should be backported with the same instruction as the previous
commit.
2024-07-11 15:14:06 +02:00
Amaury Denoyelle
9fbe8b0334 CLEANUP: proto: rename TID affinity callbacks
Since the following patch, protocol API to update a connection TID
affinity has been extended.
  commit 1a43b9f32c71267e3cb514aa70a13c75adb20742
  MINOR: proto: extend connection thread rebind API

The single callback set_affinity has been split into 3 different
functions which are called at different stages during listener_accept(),
depending on accept queue push success or not. However, the naming was
rendered confusing by the usage of function prefix 1 and 2.

Rename the proto callbacks related to TID affinity update and use the
following names:

* bind_tid_prep
* bind_tid_commit
* bind_tid_reset

This commit should probably be backported at least up to 3.0 with the
above patch. This is because the fix was recently backported and it
would allow to keep changes minimal between the two versions. It could
even be backported up to 2.8 if there is no major conflict.
2024-07-11 15:14:06 +02:00
Christopher Faulet
2cb5b7dca6 BUG/MEDIUM: bwlim: Be sure to never set the analyze expiration date in past
Every time a bandwidth limitation is evaluated on a channel, the analyze
expiration date is renewed, mainly based on the internal bandwidth
limitation filter expiration date. However, when the filter is called while
there is no data to filter, we skip all limitation computations to jump at
the end of the function. At this stage, the analyze expiration date is
renewed before exiting. But here the internal expiration date may be expired
and not reset.

To sum up, it is possible to set the analyze expiration date of a channel in
the past. It is unexpected and this could lead to a loop in process_stream.

To fix the issue, we just now take care to reset the internal expiration
date, if needed, before exiting.

This patch should fix the issue #2634. It must be backported as far as 2.8.
2024-07-11 14:51:23 +02:00
Amaury Denoyelle
b0990b38f8 MINOR: quic: add counters of sent bytes with and without GSO
Add a sent bytes counter for each quic_conn instance. A secondary field
only accounts for bytes sent via GSO, which is useful to check whether it
is activated.

For the moment, these counters are reported on "show quic" but not
aggregated on proxy quic module stats.
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
d0ea173e35 MEDIUM: quic: implement GSO fallback mechanism
UDP GSO on Linux is not implemented in every network device. For
example, it is not available for veth devices frequently used in
container environments. In such a case, EIO is reported on send()
invocation.

It is impossible to test at startup for proper GSO support in this case
as a listener may be bound on multiple network interfaces. Furthermore,
network interfaces may change during haproxy lifetime.

As such, the only option is to react on send syscall error when GSO is
used. The purpose of this patch is to implement a fallback when
encountering such conditions. Emission can be retried immediately by
trying to send each prepared datagram individually.

To support this, qc_send_ppkts() is able to iterate over each datagram
in a so-called non-GSO fallback mode. Between each emission, a datagram
header is rewritten in front of the buffer which allows the sending loop
to proceed until last datagram is emitted.

To complement this, quic_conn listener is flagged on first GSO send
error with value LI_F_UDP_GSO_NOTSUPP. This completely disables GSO for
all future emissions with QUIC connections using this listener.

For the moment, non-GSO fallback mode is activated when EIO is reported
after GSO has been set. This is the error reported for the veth usage
described above.
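
The fallback logic can be pictured with the following sketch. It is
not the qc_send_ppkts() code: the send callback, the listener structure
and the fake sender are hypothetical, and the datagram header rewriting
described above is left out. Only the reaction to EIO and the permanent
disabling of GSO on the listener follow the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical send callback: <gso_size> of 0 means a plain send.
 * Returns 0 on success, -1 with errno set on error.
 */
typedef int (*demo_send_fn)(const unsigned char *buf, size_t len,
                            size_t gso_size);

/* Hypothetical listener state (LI_F_UDP_GSO_NOTSUPP in the text). */
struct demo_listener {
	bool gso_notsupp;
};

/* Sends <len> bytes made of datagrams of <dgram> bytes, the last one
 * possibly shorter. Returns the number of send calls, or -1 on error.
 */
int demo_send_dgrams(struct demo_listener *li, demo_send_fn snd,
                     const unsigned char *buf, size_t len, size_t dgram)
{
	int calls = 0;

	assert(li != NULL && snd != NULL && dgram > 0);

	if (!li->gso_notsupp && len > dgram) {
		calls++;
		if (snd(buf, len, dgram) == 0)
			return calls;       /* GSO worked */
		if (errno != EIO)
			return -1;          /* unrelated error */
		li->gso_notsupp = true; /* never try GSO again here */
	}

	/* non-GSO fallback: emit each datagram individually */
	while (len) {
		size_t chunk = (len < dgram) ? len : dgram;

		calls++;
		if (snd(buf, chunk, 0) != 0)
			return -1;
		buf += chunk;
		len -= chunk;
	}
	return calls;
}

/* Fake sender behaving like a device without GSO support. */
int demo_send_no_gso(const unsigned char *buf, size_t len, size_t gso_size)
{
	(void)buf;
	(void)len;
	if (gso_size) {
		errno = EIO;
		return -1;
	}
	return 0;
}
```

With the fake sender, the first call pays for one failed GSO attempt
before falling back, and later calls on the same listener skip GSO.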
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
af22792a43 MAJOR: quic: support GSO when encoding datagrams
QUIC datagrams are encoded during emission via the function
qc_prep_pkts(). By default, if GSO is not used, each datagram is
prefixed by a metadata header which specifies its length and the address of
its first quic_tx_packet instance.

If GSO is activated, the metadata header won't be inserted for datagrams
following the first one sent in a single syscall. The length field will
contain the total size of these datagrams. This allows supporting both
GSO and non-GSO prepared datagrams in the same Tx buffer.

qc_send_ppkts() is invoked just after datagrams encoding. It iterates
over each metadata header in Tx buffer to send each datagram
individually. If length field is bigger than network MTU, GSO usage is
assumed and qc_snd_buf() GSO parameter will be set.

Another important point to note regarding GSO implementation is that
during datagram encoding, packets from the same datagram instance are
attached together. However, if using GSO, consecutive packets from
different datagrams are also linked, but without
QUIC_FL_TX_PACKET_COALESCED flag. This allows to properly update
quic_conn status with all sent packets in qc_send_ppkts(). Packets from
different datagrams are then unlinked to treat them separately when
receiving corresponding ACK frames.
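
The detection rule described above (length field bigger than MTU implies
GSO) can be sketched as follows. This is a simplified model: the Tx buffer
is reduced to a list of length fields, and using the MTU as the GSO segment
size is an assumption made for the example.

```python
def plan_sends(header_lengths, mtu):
    """Return one (length, gso_size) tuple per send call.

    `header_lengths` models the length field of each metadata header found
    in the Tx buffer. A gso_size of 0 means a plain, non-GSO send.
    """
    plan = []
    for length in header_lengths:
        # A length above the MTU can only come from several datagrams
        # grouped under a single header, hence GSO is assumed.
        gso_size = mtu if length > mtu else 0
        plan.append((length, gso_size))
    return plan
```

This rule only holds if the MTU does not change between encoding and
emission.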
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
448d3d388a MINOR: quic: add GSO parameter on quic_sock send API
Add <gso_size> parameter to qc_snd_buf(). When non-null, this specifies
the value for socket option SOL_UDP/UDP_SEGMENT. This allows to send
several datagrams in a single call by splitting data multiple times at
<gso_size> boundary.

For now, <gso_size> remains set to 0 by caller, as such there should not
be any functional change.
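
As an illustration of the splitting performed at <gso_size> boundary, here
is a small sketch. The helper names are hypothetical; the numeric constants
are the Linux values of SOL_UDP and UDP_SEGMENT, and passing the segment
size as a 16-bit ancillary value is an assumption about the kernel
interface, not a description of qc_snd_buf().

```python
import struct

SOL_UDP = 17       # Linux value of SOL_UDP
UDP_SEGMENT = 103  # Linux value from <linux/udp.h>


def gso_cmsg(gso_size):
    """Build an ancillary data tuple carrying the segment size."""
    return (SOL_UDP, UDP_SEGMENT, struct.pack("H", gso_size))


def split_like_gso(data, gso_size):
    """Return the datagrams obtained by splitting `data` every gso_size bytes.

    A gso_size of 0 means GSO is not used: data is sent as one datagram.
    """
    if not gso_size:
        return [data]
    return [data[i:i + gso_size] for i in range(0, len(data), gso_size)]
```

Only the last datagram may be shorter than <gso_size>.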
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
96a34d79d9 MINOR: quic: define quic_cc_path MTU as constant
Future commits will implement GSO support to be able to emit multiple
datagrams in a single syscall invocation. This will be used every time
there is more data to send than the UDP network MTU.

No change will be done for Tx buffer encoding, in particular when using
extra metadata datagram header. When GSO will be used, length field will
contain the total length of all datagrams to emit in a single GSO
syscall send. As such, QUIC send functions will detect that GSO is in
use if total length is greater than MTU.

This last assumption requires the MTU to be constant. Indeed, in
case qc_send() is interrupted, Tx buffer will be left with prepared
datagrams. These datagrams will be emitted at the next qc_send()
invocation. If the MTU changed between these two calls, it would be
impossible to know if GSO was used or not. To prevent this, mark <mtu>
field of quic_cc_path as constant.
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
35470d5185 MINOR: quic: activate UDP GSO for QUIC if supported
Add a startup test for GSO support in quic_test_socketopts() and
automatically activate it in qc_prep_pkts() when building datagrams as
big as MTU.

Also define a new config option tune.quic.disable-udp-gso. This is
useful to prevent warning on older platform or to debug an issue which
may be related to GSO.
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
5bddf39fb2 MINOR: quic: extend detection of UDP API OS features
QUIC haproxy implementation relies on specific OS features to activate
some UDP optimization. One of these is the ability to bind multiple
sockets on the same address, which is necessary to have a dedicated
socket for each QUIC connection. Support for this feature is tested during
startup via an internal proto-quic function. It automatically deactivates
socket per connection if the OS is not compatible.

The purpose of this patch is to render this QUIC feature detection code
more generic. Function is renamed quic_test_socketopts() and is still
invoked on startup. Its internal code has been refactored to be able to
implement other features support test in it.

Return value has also been changed and is now taken into account. In
case of ERR_FATAL, haproxy startup will be interrupted. This happens on
socket() syscall failure used to duplicate a QUIC listener FD.

This commit will become necessary to detect GSO support on startup.
2024-07-11 11:02:44 +02:00
Amaury Denoyelle
cac47d19bd CLEANUP: quic: remove obsolete comment on send
Remove comment on send which is now obsolete since the introduction of
per-connection socket.
2024-07-11 11:02:44 +02:00
Valentine Krasnobaeva
3a0b44b122 MINOR: limits: add is_any_limit_configured
Let's encapsulate the check of all currently supported process internal limits
in a separate function. This will help in cases where we simply need to check
whether at least one limit is set in the configuration file. It's important, as
the default value for one limit (fd-hard-limit, for example) sometimes must
not affect the computation of the others.
2024-07-10 18:05:48 +02:00
Valentine Krasnobaeva
1f8addfdc2 REORG: haproxy: move limits handlers to limits
This patch moves handlers to compute process related limits in 'limits'
compilation unit.
2024-07-10 18:05:48 +02:00
Valentine Krasnobaeva
22db643648 MINOR: haproxy: prepare to move limits-related code
This patch is done in order to prepare the move of handlers to compute and to
check process related limits as maxconn, maxsock, maxpipes.

So, these handlers become no longer static due to the future move.

We add the handlers declarations in limits.h in this patch as well, in order to
keep the next patch, dedicated to code replacement, without any additional
modifications.

Such a split also ensures that this patch can be compiled separately from the
next one, where we move the handlers. This is important in case of
git-bisect.
2024-07-10 18:05:48 +02:00
Valentine Krasnobaeva
b8dc783eb9 REORG: global: move rlim_fd_*_at_boot in limits
Let's move in 'limits' compilation unit global variables to keep the initial
process fd limits.
2024-07-10 18:05:48 +02:00
Valentine Krasnobaeva
47f2afb436 CLEANUP: fd: rm struct rlimit definition
As raise_rlim_nofile() was moved to limits compilation unit, limits.h includes
the system <sys/resource.h>. So, this definition of rlimit system type
structure is no longer needed for compilation of fd unit.
2024-07-10 18:05:48 +02:00
Valentine Krasnobaeva
3759674047 REORG: fd: move raise_rlim_nofile to limits
Let's move raise_rlim_nofile() from 'fd' compilation unit to 'limits', as it
wraps setrlimit to change process RLIMIT_NOFILE.
2024-07-10 18:05:48 +02:00
Valentine Krasnobaeva
1517bcb5e3 MINOR: limits: prepare to keep limits in one place
The code which gets, sets and checks initial and current fd limits and process
related limits (maxconn, maxsock, ulimit-n, fd-hard-limit) is spread around
different functions in haproxy.c and in fd.c. Let's group it together in
dedicated limits.c and limits.h.

This patch is done in order to prepare the moving of limits-related functions
from different places to the new 'limits' compilation unit. It helps to keep
clean the next patch, which will do only the move without any additional
modifications.

Such detailed split is needed in order to be sure not to break accidentally
limits logic and in order to be able to compile each commit separately in case
of git-bisect.
2024-07-10 18:05:48 +02:00
Willy Tarreau
a4bc71a1a3 [RELEASE] Released version 3.1-dev3
Released version 3.1-dev3 with the following main changes :
    - BUG/MINOR: quic: Wrong datagram building when probing.
    - BUG/MEDIUM: quic: fix possible exit from qc_check_dcid() without unlocking
    - BUG/MINOR: promex: Remove Help prefix repeated twice for each metric
    - DOC: configuration: add details about crt-store in bind "crt" keyword
    - BUG/MEDIUM: hlua/cli: Fix lua CLI commands to work with applet's buffers
    - DOC: configuration: more details about the master-worker mode
    - BUG/MEDIUM: server: fix race on server_atomic_sync()
    - BUG/MINOR: jwt: don't try to load files with HMAC algorithm
    - CLEANUP: quic: cleanup prototypes related to CIDs handling
    - CLEANUP: quic: remove non-existing quic_cid_tree definition
    - MINOR: quic: remove access to CID global tree outside of quic_cid module
    - REORG: quic: remove quic_cid_trees reference from proto_quic
    - MINOR: quic: add 2 BUG_ON() on datagram dispatch
    - MINOR: quic: ensure quic_conn is never removed on thread affinity rebind
    - MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD
    - DOC: configuration: update maxconn description
    - MINOR: proto: extend connection thread rebind API
    - BUG/MEDIUM: quic: prevent crash on accept queue full
    - BUG/MEDIUM: peers: Fix crash when syncing learn state of a peer without appctx
    - CI: add weekly QUIC Interop regression against LibreSSL
    - DEV: flags/quic: decode quic_conn flags
    - MINOR: quic: rename "ssl error" trace
    - BUG/MEDIUM: init: fix fd_hard_limit default in compute_ideal_maxconn
    - BUG/MINOR: jwt: fix variable initialisation
    - MINOR: ssl/sample: ssl_c_san returns a comma separated list of SAN
    - OPTIM: pool: improve needed_avg cache line access pattern
    - MAJOR: import: update mt_list to support exponential back-off (try #2)
    - CI: weekly QUIC Interop: try to fix private image
    - BUG/MINOR: h1: Fail to parse empty transfer coding names
    - BUG/MINOR: h1: Reject empty coding name as last transfer-encoding value
    - BUG/MEDIUM: h1: Reject empty Transfer-encoding header
    - BUG/MEDIUM: spoe: Be sure to create a SPOE applet if none on the current thread
    - BUILD: listener: silence a build warning about unused value without threads
    - DOC: architecture: remove the totally outdated architecture manual
    - SCRIPTS: create-release: no more need to skip architecture.txt
2024-07-10 15:39:36 +02:00
Willy Tarreau
d96b9f4249 SCRIPTS: create-release: no more need to skip architecture.txt
Now that it's gone we won't stumble upon it by accident anymore.
2024-07-10 15:38:45 +02:00
Willy Tarreau
95b9d8abee DOC: architecture: remove the totally outdated architecture manual
We've discussed removing it many times and I thought it had been
removed long ago, but apparently not, as William proved to me. Let's get
rid of it now. It's totally outdated (last updated 18 years ago, when
laptop processors were still 32 bits), mentions keywords and external
products that don't exist anymore. It's not even on docs.haproxy.org.
At some point, old stuff must really die.
2024-07-10 15:38:20 +02:00
Willy Tarreau
0cb8743209 BUILD: listener: silence a build warning about unused value without threads
A variable introduced in commit 1a43b9f32c ("MINOR: proto: extend
connection thread rebind API") is not used without threads and causes a
build warning. Let's just mark it maybe_unused.

Since the commit above is tagged for backporting, this one will need to
be backported along with it.
2024-07-10 15:17:04 +02:00
Christopher Faulet
5e84f13a0b BUG/MEDIUM: spoe: Be sure to create a SPOE applet if none on the current thread
When a message is queued, waiting to be processed by a SPOE applet, there
are some heuristics to know if a new applet must be created or not. There are
2 conditions to skip the applet creation:

  1 - if there are enough idle applets on the current thread, or,

  2 - if the processing rate on the current thread is high enough to handle
      this new message

In the 2nd case, there is a flaw when the number of processed messages falls
to zero while the processing rate is still greater than zero. In that case,
we will skip the SPOE applet creation without taking care to check there is
at least one applet on the current thread.

So now, the conditions above to skip the SPOE applet creation are only
evaluated if there is at least one applet on the current thread.

This patch must be backported to all stable versions.
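
The fixed decision can be pictured with the sketch below. All names and
thresholds are hypothetical: the real heuristics are not detailed here, so
"enough idle applets" and "high enough processing rate" are simply compared
to the number of queued messages.

```python
def must_create_applet(nb_applets, nb_idle, nb_queued, processing_rate):
    """Tell if a new SPOE applet must be created on the current thread."""
    # The skip conditions are only evaluated if there is at least one
    # applet on the current thread.
    if nb_applets == 0:
        return True
    # 1 - enough idle applets on the current thread
    if nb_idle >= nb_queued:
        return False
    # 2 - processing rate high enough to handle the new message
    if processing_rate >= nb_queued:
        return False
    return True
```

Without the first test, a thread with no applet but a processing rate still
greater than zero would never create one.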
2024-07-10 10:52:20 +02:00
Christopher Faulet
4a2dd6f377 BUG/MEDIUM: h1: Reject empty Transfer-encoding header
The Transfer-Encoding header lists the transfer codings that have been
applied to the content in order to form the message body. It is a list of
tokens. And as specified by RFC 9110, a token cannot be empty. When several
coding names are specified as a comma-separated value, this case is properly
handled and an error is triggered. However, an empty header value will just
be skipped and no error is triggered. This could be an issue with some buggy
servers.

Now, empty Transfer-Encoding headers are rejected too.

This patch must be backported as far as 2.6.
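
The resulting behavior for empty coding names can be summarized with the
following sketch. It is not the H1 parser: the function name is
hypothetical and a ValueError stands for the 400-bad-request response.

```python
def parse_transfer_encoding(value):
    """Return the list of coding names found in a Transfer-Encoding value.

    Raises ValueError when any coding name is empty, which covers an empty
    header value, an empty last value and an empty name inside the list.
    """
    names = [token.strip(" \t") for token in value.split(",")]
    if any(not name for name in names):
        raise ValueError("400: empty transfer coding name")
    # Transfer coding names are case-insensitive.
    return [name.lower() for name in names]
```

For instance "gzip, chunked" is accepted, while "", "chunked," and
"gzip,,chunked" are all rejected.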
2024-07-10 10:52:20 +02:00
Christopher Faulet
428451fe96 BUG/MINOR: h1: Reject empty coding name as last transfer-encoding value
The following Transfer-Encoding header is now rejected with a
400-bad-request:

  Transfer-Encoding: chunked,\r\n

This case was not properly handled and the last empty value was just
ignored.

This patch must be backported as far as 2.6.
2024-07-10 10:52:20 +02:00
Christopher Faulet
b8b0102760 BUG/MINOR: h1: Fail to parse empty transfer coding names
Empty transfer coding names, inside a comma-separated list, are already
rejected. But it is only by chance. Today, it is detected as an unknown
coding name (concretely, not "chunked"). Then, it is handled by the H1
multiplexer as an error and a 422-Unprocessable-Content response is
returned.

So, the error is properly detected in this case, but it is not accurate. A
400-bad-request response must be returned instead. Then, it is better to
catch the error during the header parsing. It is the purpose of this patch.

This patch should be backported as far as 2.6.
2024-07-10 10:52:20 +02:00
Ilia Shipitsin
89bdd8b62a CI: weekly QUIC Interop: try to fix private image
for some reason the image built in the HAProxy workflow is "private": it
is successfully built, but fails to pull. Let's try an explicit docker login
for the run job as well
2024-07-10 09:43:02 +02:00
Willy Tarreau
4e65fc66f6 MAJOR: import: update mt_list to support exponential back-off (try #2)
This is the second attempt at importing the updated mt_list code (commit
59459ea3). The previous one was attempted with commit c618ed5ff4 ("MAJOR:
import: update mt_list to support exponential back-off") but revealed
problems with QUIC connections and was reverted.

The problem that was faced was that elements deleted inside an iterator
were no longer reset, and that if they were to be recycled in this form,
they could appear as busy to the next user. This was trivially reproduced
with this:

  $ cat quic-repro.cfg
  global
          stats socket /tmp/sock1 level admin
          stats timeout 1h
          limited-quic

  frontend stats
          mode http
          bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
          timeout client 5s
          stats uri /

  $ ./haproxy -db -f quic-repro.cfg  &

  $ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/
  => hang

This was purely an API issue caused by the simplified usage of the macros
for the iterator. The original version had two backups (one full element
and one pointer) that the user had to take care of, while the new one only
uses one that is transparent for the user. But during removal, the element
still has to be unlocked if it's going to be reused.

All of this sparked discussions with Fred and Aurélien regarding the still
unclear state of locking. It was found that the lock API does too much at
once and is lacking granularity. The new version offers a much more fine-
grained control allowing to selectively lock/unlock an element, a link,
the rest of the list etc.

It was also found that plenty of places just want to free the current
element, or delete it to do anything with it, hence don't need to reset
its pointers (e.g. event_hdl). Finally it appeared obvious that the
root cause of the problem was the unclear usage of the list iterators
themselves because one does not necessarily expect the element to be
presented locked when not needed, which makes the unlock easy to overlook
during reviews.

The updated version of the list presents explicit lock status in the
macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED
suffix, the caller is expected to unlock the element if it intends to
reuse it. At least the status is advertised. The _UNLOCKED variant,
instead, always unlocks it before starting the loop block. This means
it's not necessary to think about unlocking it, though it's obviously
not usable with everything. A few _UNLOCKED were used at obvious places
(i.e. where the element is deleted and freed without any prior check).

Interestingly, the tests performed last year on QUIC forwarding, that
resulted in limited traffic for the original version and higher bit
rate for the new one couldn't be reproduced because since then the QUIC
stack has gained in efficiency, and the 100 Gbps barrier is now reached
with or without the mt_list update. However the unit tests definitely
show a huge difference, particularly on EPYC platforms where the EBO
provides tremendous CPU savings.

Overall, the following changes are visible from the application code:

  - mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr
    => MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
       + 1 back elem

  - MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
      => just manually set iterator to NULL however.
    For MT_LIST_FOR_EACH_ENTRY_LOCKED()
      => mt_list_unlock_self() (if element going to be reused) + NULL

  - MT_LIST_LOCK_ELT => mt_list_lock_full()
  - MT_LIST_UNLOCK_ELT => mt_list_unlock_full()

  - l = MT_LIST_APPEND_LOCKED(h, e);  MT_LIST_UNLOCK_ELT();
    => l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)
2024-07-09 16:46:38 +02:00
Willy Tarreau
87d269707b OPTIM: pool: improve needed_avg cache line access pattern
On an AMD EPYC 3rd gen, 20% of the CPU is spent calculating the amount
of pools needed when using QUIC, because pool allocations/releases are
quite frequent and the inter-CCX communication is super slow. Still,
there's a way to save between 0.5 and 1% CPU by using fetch-add and
sub-fetch that are converted to XADD so that the result is directly
fed into the swrate_add argument without having to re-read the memory
area. That's what this patch does.
2024-07-09 16:46:38 +02:00
William Lallemand
9797a7718c MINOR: ssl/sample: ssl_c_san returns a comma separated list of SAN
The ssl_c_san sample fetch returns the list of Subject Alternative Names
presented by the client certificate.

The format is the same as the "openssl x509 -text" command, it's a
Description: Value list separated by commas.
The format is directly generated by the GENERAL_NAME_print() openssl
function.

https://github.com/openssl/openssl/blob/openssl-3.0/crypto/x509/v3_san.c#L207

Example:
    IP Address:127.0.0.1, IP Address:127.0.0.2, IP Address:127.0.0.3, URI:http://docs.haproxy.org/2.7/, DNS:ca.tests.haproxy.com
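
For readers who need to post-process this value, here is a small sketch
splitting it into (description, value) pairs. The function name is
hypothetical, and the sketch assumes that no value contains the ", "
separator itself.

```python
def parse_san_list(text):
    """Split a "Description:Value, Description:Value" list into pairs."""
    entries = []
    for item in text.split(", "):
        # Only split on the first colon, as values such as URIs contain
        # colons too.
        kind, _, value = item.partition(":")
        entries.append((kind, value))
    return entries
```

Applied to the example above, this yields three "IP Address" entries, one
"URI" entry and one "DNS" entry.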
2024-07-09 13:57:18 +02:00
William Lallemand
0a1b251c1a BUG/MINOR: jwt: fix variable initialisation
Set the alg variable from sample_conv_jwt_verify_check() to
JWT_ALG_DEFAULT.

This was reported by coverity in #2630, but since you need to use the
first argument to use the 2nd, this has no real impact.

Must be backported with 883f1bd (as far as 2.6).
2024-07-08 14:23:14 +02:00
Valentine Krasnobaeva
16a5fac4bb BUG/MEDIUM: init: fix fd_hard_limit default in compute_ideal_maxconn
This commit fixes 41275a691 ("MEDIUM: init: set default for fd_hard_limit via
DEFAULT_MAXFD").

fd_hard_limit is taken into account implicitly via 'ideal_maxconn' value in
all maxconn adjustments, when global.rlimit_memmax is set:

	MIN(global.maxconn, capped by global.rlimit_memmax, ideal_maxconn);

It also caps provided global.rlimit_nofile, if it couldn't be set as a current
process fd limit (see more details in the main() code).

So, let's set the default value for fd_hard_limit only when no other
haproxy-specific limit is provided, i.e. rlimit_memmax, maxconn,
rlimit_nofile. Otherwise we may break users' configs.

Please, note, that in master-worker mode, master does not need the
DEFAULT_MAXFD (1048576) as well, as we explicitly limit its maxconn to 100.

Must be backported in all stable versions until v2.6.0, including v2.6.0,
like the commit above.
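
The rule introduced by this fix can be sketched as below. This is a
simplified model with hypothetical names, where 0 means "not configured";
it does not reproduce the actual code of compute_ideal_maxconn().

```python
DEFAULT_MAXFD = 1048576  # built-in default mentioned above


def effective_fd_hard_limit(fd_hard_limit=0, rlimit_memmax=0, maxconn=0,
                            rlimit_nofile=0):
    """Return the fd_hard_limit to apply, or 0 if none applies."""
    if fd_hard_limit:
        # An explicit setting is always used as-is.
        return fd_hard_limit
    if rlimit_memmax or maxconn or rlimit_nofile:
        # Another haproxy-specific limit is provided: do not apply the
        # default, to avoid changing the result of existing configs.
        return 0
    return DEFAULT_MAXFD
```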
2024-07-08 11:26:16 +02:00
Amaury Denoyelle
3d4baa3c7b MINOR: quic: rename "ssl error" trace
SSL status is reported each time quic_conn_io_cb() is finished via a
trace. Change the trace label from "ssl error" to "ssl status". This
makes it easier to search for errors without being distracted by this
trace.
2024-07-08 09:38:35 +02:00
Amaury Denoyelle
19b8c1b7cd DEV: flags/quic: decode quic_conn flags
Decode quic_conn flags via qc_show_flags() function.

To support this, quic flags definitions have been moved outside of the
USE_QUIC directive.
2024-07-08 09:38:35 +02:00
Ilia Shipitsin
f8a30b69d2 CI: add weekly QUIC Interop regression against LibreSSL
currently only quic-go and picoquic clients are enabled with testsuites
supposed to be "green". Tests will be run weekly.
2024-07-05 15:11:21 +02:00
Christopher Faulet
3e2d1476e6 BUG/MEDIUM: peers: Fix crash when syncing learn state of a peer without appctx
For a given peer, the synchronization of the learn state is no longer
performed in the peer appctx. It is delayed to be handled by the peers sync
task. It means that for a given peer, it is possible to have finished to
learn and only handle it after the appctx release. So the synchronization
may happen on a peer without appctx.

This was not tested and an unconditional wakeup on the appctx could lead to
a crash because of a NULL-deref. It may be experienced by running
reg-tests/peers/tls_basic_sync.vtc script in loop. The fix is obvious. In
sync_peer_learn_state(), we must omit to wakeup the appctx if it was already
released.

This patch should fix issue #2629. It must be backported to 3.0.
2024-07-05 12:14:27 +02:00
Amaury Denoyelle
95f624540b BUG/MEDIUM: quic: prevent crash on accept queue full
Handshake for quic_conn instances runs on a single non-chosen thread. On
completion, listener_accept() is performed to select the least loaded
thread before initializing connection instance. As such, quic_conn
instance is migrated to the thread with its upper connection.

In case accept queue is full, listener_accept() falls back to local accept
mode, which causes the connection to be assigned to the current thread.
However, this is not supported by QUIC as quic_conn instance is left on
the previously selected thread. In most cases, this will cause a
BUG_ON() due to a task manipulation from an outside thread.

To fix this, handle quic_conn thread rebind in multiple steps using the
new extended protocol API. Several operations have been moved from
qc_set_tid_affinity1() to newly defined qc_set_tid_affinity2(), in
particular CID TID update. This ensures that quic_conn instance is not
prematurely accessed on the new thread until accept queue push is
guaranteed to succeed.

qc_reset_tid_affinity() is also newly defined to reassign the newly
created tasks and tasklets to the current thread. This is necessary to
prevent the BUG_ON() crash described above.

This must be backported up to 2.8 after a period of observation. Note
that it depends on previous patch :
  MINOR: proto: extend connection thread rebind API
2024-07-04 17:28:56 +02:00
Amaury Denoyelle
1a43b9f32c MINOR: proto: extend connection thread rebind API
MINOR: listener: define callback for accept queue push

Extend the connection thread rebind API by replacing the single callback
set_affinity by three different ones. Each one of them is used at a
different stage of the operation :

* set_affinity1 is used similarly to previous set_affinity

* set_affinity2 is called directly from accept_queue_push_mp() when an
  entry has been found in accept ring. This operation cannot fail.

* reset_affinity is called after set_affinity1 in case of failure from
  accept_queue_push_mp() due to no space left in accept ring. This is
  necessary for protocols which must reconfigure resources before
  fallback on the current tid.

This patch does not have any functional changes. However, it will be
required to fix crashes for QUIC connections when accept queue ring is
full. As such, it must be backported with it.
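
The ordering of the three callbacks can be pictured with this sketch. It is
only a model: FakeConn, the list-based ring and the capacity argument are
hypothetical, and set_affinity1 is assumed to always succeed here.

```python
class FakeConn:
    """Hypothetical connection which records the callbacks invoked on it."""

    def __init__(self):
        self.steps = []
        self.tid = None

    def set_affinity1(self, tid):
        self.steps.append("set_affinity1")

    def set_affinity2(self, tid):
        self.steps.append("set_affinity2")
        self.tid = tid

    def reset_affinity(self, tid):
        self.steps.append("reset_affinity")
        self.tid = tid


def accept_with_rebind(conn, ring, capacity, target_tid, current_tid):
    """Try to push conn to target_tid, else fall back on current_tid."""
    conn.set_affinity1(target_tid)
    if len(ring) < capacity:
        # An entry has been found in the accept ring: cannot fail anymore.
        conn.set_affinity2(target_tid)
        ring.append(conn)
        return target_tid
    # No space left in the accept ring: reconfigure resources before
    # falling back on the current thread.
    conn.reset_affinity(current_tid)
    return current_tid
```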
2024-07-04 16:33:21 +02:00
Valentine Krasnobaeva
ff024206f0 DOC: configuration: update maxconn description
Let's update maxconn keyword description, in order to make it clear, which
setting has the precedence over the global.maxconn and the SYSTEM_MAXCONN if
set.
2024-07-04 07:53:07 +02:00
Valentine Krasnobaeva
41275a6918 MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD
Let's provide a default value for fd_hard_limit, if it's not set in the
configuration. With this patch we could set some specific default via
compile-time variable DEFAULT_MAXFD as well. Hopefully, this will be helpful
for haproxy package maintainers.

    make -j 8 TARGET=linux-glibc DEBUG=-DDEFAULT_MAXFD=50000

If haproxy is compiled without DEFAULT_MAXFD defined, the default will be set
to 1048576.

This is done to avoid the process being killed by its watchdog when it is
started without any limitations in its configuration or in the command line
and the hard RLIMIT_NOFILE is extremely huge (~1000000000). We use in this case
compute_ideal_maxconn() to calculate maxconn and maxsock, maxsock defines the
size of internal fdtab, which becomes very large as well. When
the process starts to simply loop over this fdtab (O(n)), this takes a lot of
time, so the watchdog does its job.

To avoid this, maxconn now is always reduced to some reasonable value either
by explicit global.fd-hard-limit from configuration, or by its default. The
default may be changed at build-time and overwritten then by
global.fd-hard-limit at runtime. Explicit global.fd-hard-limit from the
configuration has always precedence over DEFAULT_MAXFD, if set.

Must be backported in all stable versions until v2.6.0, including v2.6.0.
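
The precedence described above can be summarized with the following sketch.
Names are hypothetical and the function only models how the cap is chosen,
not how maxconn and maxsock are then derived from it.

```python
def capped_fd_limit(rlimit_nofile_hard, configured_fd_hard_limit=None,
                    build_default=1048576):
    """Return the number of FDs the process will consider at most.

    `build_default` models DEFAULT_MAXFD, `configured_fd_hard_limit` models
    global.fd-hard-limit, which has precedence when set.
    """
    if configured_fd_hard_limit:
        limit = configured_fd_hard_limit
    else:
        limit = build_default
    # The hard RLIMIT_NOFILE still applies when it is lower than the cap.
    return min(rlimit_nofile_hard, limit)
```

With a hard RLIMIT_NOFILE around 1000000000 and nothing configured, the
result is 1048576 instead of a value leading to a huge fdtab.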
2024-07-04 07:52:42 +02:00
Amaury Denoyelle
bfdf145859 MINOR: quic: ensure quic_conn is never removed on thread affinity rebind
On accept, quic_conn instance is migrated from its original thread to a
new one. This operation is conducted in two steps, on the original then
the new thread instance. During the interval, quic_conn is artificially
rendered inactive. It must never be accessed nor removed until migration
is completed via qc_finalize_affinity_rebind(). This new BUG_ON() will
enforce that removal is never conducted until migration is completed.
2024-07-03 15:02:40 +02:00
Amaury Denoyelle
a4240fb26f MINOR: quic: add 2 BUG_ON() on datagram dispatch
QUIC datagram dispatch is an error-prone operation as it must always
ensure the correct thread is used before accessing the recipient
quic_conn instance. Strengthen this code part by adding two BUG_ON_HOT()
to ensure thread safety.
2024-07-03 15:02:40 +02:00
Amaury Denoyelle
8550549cca REORG: quic: remove quic_cid_trees reference from proto_quic
Previous commit removed access/manipulation to QUIC CID global tree
outside of quic_cid module. This ensures that proper locking is always
performed.

This commit finalizes this cleanup by marking CID global tree as static
only to quic_cid source file. Initialization of this tree is removed
from proto_quic and now performed using dedicated initcalls
quic_alloc_global_cid_tree().

As a side change, complete CID global tree documentation, in particular
to explain CID global tree artificial splitting and ODCID handling.
Overall, the code is now clearer and safer.
2024-07-03 15:02:40 +02:00
Amaury Denoyelle
0a352ef08e MINOR: quic: remove access to CID global tree outside of quic_cid module
haproxy generates for each QUIC connection a set of CID. The peer must
reuse them as DCID for its emitted packet. On datagram reception, DCID
field serves as identifier to dispatch them on their correct thread.

These CIDs are stored in a global CID tree. Access to this data
structure must always be protected with CID_LOCK. This commit is a
refactoring to regroup all CID tree access in quic_cid module. Several
code parts are adjusted :

* quic_cid_insert() is extended to check for insertion race-condition.
  This is useful on quic_conn instantiation. Code where such race cannot
  happen can use unsafe _quic_cid_insert() instead.

* on RETIRE_CONNECTION_ID frame reception, existing quic_cid_delete()
  function is used.

* remove tree lookup from qc_check_dcid(), extracted in the new
  quic_cmp_cid_conn() function. Ultimately, the latter should be removed
  as CID lookup could be conducted on quic_conn owned tree without
  locking.
2024-07-03 15:02:40 +02:00
Amaury Denoyelle
5d186673df CLEANUP: quic: remove non-existing quic_cid_tree definition
quic_cid_tree global variable does not exist anymore. Remove its
definition in quic_conn.c.
2024-07-03 15:02:40 +02:00
Amaury Denoyelle
a05fefe74d CLEANUP: quic: cleanup prototypes related to CIDs handling
Remove duplicated prototypes from quic_conn.h also present in
quic_cid.h. Also remove quic_derive_cid() prototype and mark it as
static.
2024-07-03 15:02:40 +02:00
William Lallemand
883f1bdbce BUG/MINOR: jwt: don't try to load files with HMAC algorithm
When trying to use a HMAC algorithm (HS256, HS384, HS512) the
sample_conv_jwt_verify_check() function of the converter tries to load a
file even if it is only supposed to contain a secret instead of a path.

When using lua, the check function is called at runtime so it even tries
to load file at each call... This fixes the issue for HMAC algorithm
but this is still a problem with the other algorithms, since we don't
have a way of pre-loading files before the call.

Another solution must be found to prevent disk IO with lua using other
algorithms.

Must be backported as far as 2.6.
2024-07-03 12:35:50 +02:00
Amaury Denoyelle
50ae717624 BUG/MEDIUM: server: fix race on server_atomic_sync()
The following patch fixes a race condition during server addr/port
update :
  cd994407a9545a8d84e410dc0cc18c30966b70d8
  BUG/MAJOR: server/addr: fix a race during server addr:svc_port updates

The new update mechanism is implemented via an event update. It uses
thread isolation to guarantee that no other thread is accessing server
addr/port. Furthermore, to ensure the server instance is not deleted just
before the event handler, the server instance is looked up via its ID in
the proxy tree.

However, thread isolation is only entered after server lookup. This
leaves a tiny race condition as the thread will be marked as harmless
and a concurrent thread can delete the server in the meantime. This
causes server_atomic_sync() to manipulate a deleted server instance to
reinsert it in used_server_addr backend tree. This can cause a segfault
during this operation or possibly on a future used_server_addr tree
access.

This issue was detected by criteo. Several backtraces were retrieved,
each related to server addr_node insert or delete operation, either in
srv_set_addr_desc(), or add/delete dynamic server handlers.

To fix this, simply extend thread isolation section to start it before
server lookup. This ensures that once retrieved the server cannot be
deleted until its addr/port are updated. To ensure this issue won't
happen anymore, a new BUG_ON() is added in srv_set_addr_desc().
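
The ordering issue can be illustrated with the sketch below, where a plain
mutex stands in for thread isolation and a dict for the proxy server tree.
Both are simplifications chosen for the example, and the names are
hypothetical.

```python
import threading

isolation = threading.Lock()  # stand-in for thread isolation
servers = {}                  # stand-in for the proxy tree, keyed by ID


def update_server_addr(srv_id, addr, port):
    """Update a server addr/port; return False if it no longer exists."""
    # Isolation is entered BEFORE the lookup, so the server cannot be
    # deleted between the lookup and the update.
    with isolation:
        srv = servers.get(srv_id)
        if srv is None:
            return False
        srv["addr"], srv["port"] = addr, port
        return True


def delete_server(srv_id):
    """Delete a server; return True if it was present."""
    with isolation:
        return servers.pop(srv_id, None) is not None
```

Performing the lookup outside of the protected section would reopen the
window in which a concurrent deletion can occur.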

Also note that ebpt_delete() is now called every time on delete handler
as this is a safe idempotent operation.

To reproduce these crashes, a script was executed to add then remove
different servers every second. In parallel, the following CLI command
was issued repeatedly without any delay to force multiple updates on
server ports :

  set server <srv> addr 0.0.0.0 port $((1024 + RANDOM % 1024))

This must be backported at least up to 3.0. If the above-mentioned patch
has been selected for previous versions, this commit must also be
backported to them.
2024-07-03 09:20:24 +02:00
William Lallemand
419b79492a DOC: configuration: more details about the master-worker mode
Add more details about the master-worker mode in the "master-worker"
global keyword.

Should fix issue #2198.
2024-07-02 18:23:34 +02:00
Christopher Faulet
e5e36ce097 BUG/MEDIUM: hlua/cli: Fix lua CLI commands to work with applet's buffers
In 3.0, the CLI applet was rewritten to use its own buffers. However, the
lua part, used to register CLI commands at runtime, was not updated
accordingly. It means the lua CLI commands still try to write in the channel
buffers. This is of course totally unexpected and not supported. Because of
this bug, the applet hangs instead of returning the command result.

The registration of lua CLI commands relies on the lua TCP applets. So the
send and receive functions were fixed to use the applet's buffer when it is
required and still use the channel buffers otherwise. This way, other lua
TCP applets can still run in legacy mode, without the applet's buffers.

This patch must be backported to 3.0.
2024-07-02 10:05:40 +02:00
William Lallemand
ba37ad41b2 DOC: configuration: add details about crt-store in bind "crt" keyword
Add some details about the certificate storage cache system in the "crt"
bind keyword.

This should be backported to 3.0. Fix issue #2618.
2024-07-01 12:30:06 +02:00
Christopher Faulet
b789cef91f BUG/MINOR: promex: Remove Help prefix repeated twice for each metric
When the support for modules was added, the function producing the #HELP
line of each metric was refactored. Since then, the prefix "#HELP
<metric-name>" is printed twice because a code block was not removed. It is
now fixed.

This patch must be backported to 3.0.
2024-07-01 10:50:27 +02:00
Willy Tarreau
192abc6f83 BUG/MEDIUM: quic: fix possible exit from qc_check_dcid() without unlocking
Locking of the CID tree was extended in qc_check_dcid() by recent commit
05f59a5 ("BUG/MINOR: quic: fix race condition in qc_check_dcid()") but
there was a direct return from the middle of the function which was not
covered by the unlock, resulting in the function keeping the lock on
success return.

Let's just remove this return and replace it with a variable to merge all
exit paths.

This must be backported wherever the fix above is backported.
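
The "merge all exit paths" approach described above can be sketched as
follows. This is only an illustration, not the actual qc_check_dcid() code:
the cid_entry type, the flat table and the atomic_flag-based lock are
hypothetical stand-ins for the CID tree and its lock.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CID tree node (not a HAProxy type). */
struct cid_entry {
    unsigned char cid[20];
    size_t len;
    const void *owner;      /* connection owning this CID */
};

static atomic_flag cid_lock = ATOMIC_FLAG_INIT;
static int cid_lock_held;   /* 1 while the lock is held, for inspection */

static void cid_tree_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&cid_lock, memory_order_acquire))
        ;                   /* spin until the lock is free */
    cid_lock_held = 1;
}

static void cid_tree_unlock(void)
{
    cid_lock_held = 0;
    atomic_flag_clear_explicit(&cid_lock, memory_order_release);
}

/* Returns 1 if <cid> is known and belongs to <owner>, otherwise 0. */
static int check_dcid(const struct cid_entry *tab, size_t n,
                      const unsigned char *cid, size_t len,
                      const void *owner)
{
    int ret = 0;            /* single result variable for every path */
    size_t i;

    cid_tree_lock();
    for (i = 0; i < n; i++) {
        if (tab[i].len == len && memcmp(tab[i].cid, cid, len) == 0) {
            /* the entry is read while the lock is still held */
            ret = (tab[i].owner == owner);
            break;          /* no direct return from the locked region */
        }
    }
    cid_tree_unlock();      /* reached on success and on failure */
    return ret;
}
```

Since the only return statement sits after the unlock, a later change cannot
add an exit that leaves the lock held without also touching that line.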
2024-07-01 10:29:31 +02:00
Frederic Lecaille
6d943b8db6 BUG/MINOR: quic: Wrong datagram building when probing.
This issue was revealed by chacha20 interop test which very often fails with
ngtcp2 as client. This was due to the fact that 2 application level packets could
be coalesced into the same datagram as revealed by such a capture:

Frame 380: 255 bytes on wire (2040 bits), 255 bytes captured (2040 bits)
Point-to-Point Protocol
Internet Protocol Version 4, Src: 193.167.100.100, Dst: 193.167.0.100
User Datagram Protocol
QUIC IETF
    QUIC Connection information
        [Connection Number: 0]
    [Packet Length: 187]
    QUIC Short Header DCID=ec523fe99840f9c17c868a88d649147814 PKN=333
        0... .... = Header Form: Short Header (0)
        .1.. .... = Fixed Bit: True
        ..0. .... = Spin Bit: False
        [...0 0... = Reserved: 0]
        [.... .0.. = Key Phase Bit: False]
        [.... ..00 = Packet Number Length: 1 bytes (0)]
        Destination Connection ID: ec523fe99840f9c17c868a88d649147814
        [Packet Number: 333]
        Protected Payload […]: 43537d43a3c83e47db6891bd6a4fd7d7fa31941badcb87a540e843341d6a5e493ed4c3f6e6bbff094804ee0ab06830dc1a1bbf52ace4323d2e4f6e0bd4eea73df0721d2949d05a058d3afb974e814494ebf44d1375b0e7f1fd5bcf634cf32ef9a9b4018758a49d39a24c40
    STREAM id=0 fin=0 off=294768 len=144 dir=Bidirectional origin=Client-initiated
        Frame Type: STREAM (0x000000000000000e)
            .... ...0 = Fin: False
            .... ..1. = Len(gth): True
            .... .1.. = Off(set): True
        Stream ID: 0
            .... .... .... .... .... .... .... .... .... .... .... .... .... .... .... ...0 = Stream initiator: Client-initiated (0)
            .... .... .... .... .... .... .... .... .... .... .... .... .... .... .... ..0. = Stream direction: Bidirectional (0)
        Offset: 294768
        Length: 144
        Stream Data […]: 63eef6ccee0d2ab602db3682d0e7cc09b72db6adc307d7699a211144b4b6c029cbed9beae1491c10a5fe0678d815a5303843d33c0593fedc9b64068fd0207e280d05aac2c0054fe9ab30857bc3669ee51d34756cfd2e098eb1ab31a03911f6a103f0a16f8f984d9861efdcf4433c
QUIC IETF
    [Packet Length: 38]
    QUIC Short Header DCID=ec523fe99840f9c17c868a88d649147814 PKN=334
        0... .... = Header Form: Short Header (0)
        .1.. .... = Fixed Bit: True
        ..0. .... = Spin Bit: False
        [...0 0... = Reserved: 0]
        [.... .0.. = Key Phase Bit: False]
        [.... ..00 = Packet Number Length: 1 bytes (0)]
        Destination Connection ID: ec523fe99840f9c17c868a88d649147814
        [Packet Number: 334]
        Protected Payload: b9c0e6dc3fc523574f8164c31b6cd156496212
    PING
        Frame Type: PING (0x0000000000000001)
    PADDING Length: 2
        Frame Type: PADDING (0x0000000000000000)
        [Padding Length: 2]

On the peer side, these two packets are treated as a single one, because
there may be only one packet per datagram at the application encryption
level, and this is reported as a STREAM frame encoding error:

I00000332 0xec523fe99840f9c17c868a88d649147814 con recv packet len=225
mask=b2c69c7827 sample=43a3c83e47db6891bd6a4fd7d7fa3194
I00000332 0xec523fe99840f9c17c868a88d649147814 pkt rx pkn=333 dcid=0xec523fe99840f9c17c868a88d649147814 type=1RTT k=0
I00000332 0xec523fe99840f9c17c868a88d649147814 frm rx 333 1RTT STREAM(0x0e) id=0x0 fin=0 offset=294768 len=144 uni=0
ngtcp2_conn_read_pkt: ERR_FRAME_ENCODING
I00000332 0xec523fe99840f9c17c868a88d649147814 pkt tx pkn=1531039643 dcid=0xae79dfc99d6c65d6 type=1RTT k=0
I00000332 0xec523fe99840f9c17c868a88d649147814 frm tx 1531039643 1RTT CONNECTION_CLOSE(0x1c) error_code=FRAME_ENCODING_ERROR(0x7) frame_type=0 reason_len=0 reason=[]
I00000332 0xec523fe99840f9c17c868a88d649147814 frm tx 1531039643 1RTT PADDING(0x00) len=9

Note here that the sum of the two packet sizes (from the capture) is the same as the
packet length reported by ngtcp2: 187+38 = 225. It also seems that wireshark tries
to parse as many packets as possible from the same datagram, regardless of the QUIC
protocol rules.

Haproxy traces revealed that this could happen at least when probing the peer.
The aim of the recent low-level packet building modifications was to build
as many datagrams as possible into the same buffer. But it seems that the
handling of the probing packet case has been broken. That said, I have not
identified the commit which introduced the issue. It could only be reproduced
inside the interop test environment (no git bisection possible).

To fix this, rely on the <probe> variable value to identify if the last
packet built by qc_prep_pkts() was a probing one, then try to
coalesce other packets into the same datagram if this was not the case.
Of course the test on <probe> value has to be done before setting it
for the next packet.

Must be backported to 3.0.
2024-07-01 09:29:09 +02:00
Willy Tarreau
bbc2f043e3 [RELEASE] Released version 3.1-dev2
Released version 3.1-dev2 with the following main changes :
    - BUG/MINOR: log: fix broken '+bin' logformat node option
    - DEBUG: hlua: distinguish burst timeout errors from exec timeout errors
    - REGTESTS: ssl: fix some regtests 'feature cmd' start condition
    - BUG/MEDIUM: ssl: AWS-LC + TLSv1.3 won't do ECDSA in RSA+ECDSA configuration
    - MINOR: ssl: activate sigalgs feature for AWS-LC
    - REGTESTS: ssl: activate new SSL reg-tests with AWS-LC
    - BUG/MEDIUM: proxy: fix email-alert invalid free
    - REORG: mailers: move free_email_alert() to mailers.c
    - BUG/MINOR: proxy: fix email-alert leak on deinit() (2nd try)
    - DOC: configuration: fix alphabetical order of bind options
    - DOC: management: document ptr lookup for table commands
    - BUG/MAJOR: quic: fix padding with short packets
    - BUG/MAJOR: quic: do not loop on emission on closing/draining state
    - MINOR: sample: date converter takes HTTP date and output an UNIX timestamp
    - SCRIPTS: git-show-backports: do not truncate git-show output
    - DOC: api/event_hdl: small updates, fix an example and add some precisions
    - BUG/MINOR: h3: fix crash on STOP_SENDING receive after GOAWAY emission
    - BUG/MINOR: mux-quic: fix crash on qcs SD alloc failure
    - BUG/MINOR: h3: fix BUG_ON() crash on control stream alloc failure
    - BUG/MINOR: quic: fix BUG_ON() on Tx pkt alloc failure
    - DEV: flags/show-fd-to-flags: adapt to recent versions
    - MINOR: capabilities: export capget and __user_cap_header_struct
    - MINOR: capabilities: prepare support for version 3
    - MINOR: capabilities: use _LINUX_CAPABILITY_VERSION_3
    - MINOR: cli/debug: show dev: add cmdline and version
    - MINOR: cli/debug: show dev: show capabilities
    - MINOR: debug: print gdb hints when crashing
    - BUILD: debug: also declare strlen() in __ABORT_NOW()
    - BUILD: Missing inclusion header for ssize_t type
    - BUG/MINOR: hlua: report proper context upon error in hlua_cli_io_handler_fct()
    - MINOR: cfgparse/log: remove leftover dead code
    - BUG/MEDIUM: stick-table: Decrement the ref count inside lock to kill a session
    - MINOR: stick-table: Always decrement ref count before killing a session
    - REORG: init: do MODE_CHECK_CONDITION logic first
    - REORG: init: encapsulate CHECK_CONDITION logic in a func
    - REORG: init: encapsulate 'reload' sockpair and master CLI listeners creation
    - REORG: init: encapsulate code that reads cfg files
    - BUG/MINOR: server: fix first server template name lookup UAF
    - MINOR: activity: make the memory profiling hash size configurable at build time
    - BUG/MEDIUM: server/dns: prevent DOWN/UP flap upon resolution timeout or error
    - BUG/MEDIUM: h3: ensure the ":method" pseudo header is totally valid
    - BUG/MEDIUM: h3: ensure the ":scheme" pseudo header is totally valid
    - BUG/MEDIUM: quic: fix race-condition in quic_get_cid_tid()
    - BUG/MINOR: quic: fix race condition in qc_check_dcid()
    - BUG/MINOR: quic: fix race-condition on trace for CID retrieval
2024-06-29 11:28:41 +02:00
Amaury Denoyelle
bbb9f8248e BUG/MINOR: quic: fix race-condition on trace for CID retrieval
quic_rx_pkt_retrieve_conn() is used when parsing a received datagram
from the listener socket. It returned the quic_conn instance
corresponding to the first packet DCID, unless it is mapped to another
thread.

As expected, global CID tree access is protected by a lock in the
function. However, there is a race condition due to the final trace
where the qc instance is dereferenced outside of the lock. Fix this by
adding a new trace under lock protection and removing the qc dereference at
the end of the function.

This may fix first crash of github issue #2607.

This must be backported up to 2.8.
2024-06-28 16:28:33 +02:00
Amaury Denoyelle
05f59a51ac BUG/MINOR: quic: fix race condition in qc_check_dcid()
qc_check_dcid() is a function which checks that a DCID is associated with
the expected quic_conn instance. This is used by the quic_conn socket
receive handler as there is a tiny risk that a datagram for another
connection was received on this socket.

As for other operations on the global CID tree, a lock must be used to protect
against race conditions. However, as in the previous commit, the lock was not
held long enough, as the CID tree node is accessed outside of the lock region.
To fix this, extend the critical section until the CID has been dereferenced.

The impact of this bug should be similar to the previous one. However,
the risk of a crash is even lower as it should be extremely rare to
receive datagrams for other connections on a quic_conn socket. As such,
most of the time the first check condition of qc_check_dcid() is enough.

This may fix the first crash of github issue #2607.

This must be backported up to 2.8.
2024-06-28 16:28:33 +02:00
Amaury Denoyelle
72267ff35f BUG/MEDIUM: quic: fix race-condition in quic_get_cid_tid()
haproxy generates CID for clients which reuse them as DCID on their
packets. These CID are stored in a global tree quic_cid_trees. Each
operation on this tree must be done under lock protection.

quic_get_cid_tid() is a function which looks up a CID in the global tree and
returns the associated thread ID. This is used on datagram reception on the
listener socket before redispatching the datagram to the correct thread.
This function uses a lock to protect quic_cid_trees access. However, the
lock region is too small as the CID tree node is accessed outside of it. Fix
this by extending lock protection over the CID dereference until the thread
ID is retrieved.

The impact of this bug is unknown, but it may possibly cause crashes.
However, it is probably rare as most datagram reception is done on the
quic_conn socket, which does not use quic_get_cid_tid().

This may fix first crash of github issue #2607.

This must be backported up to 2.8.
2024-06-28 16:27:20 +02:00
Amaury Denoyelle
a3bed52d1f BUG/MEDIUM: h3: ensure the ":scheme" pseudo header is totally valid
Ensure the scheme pseudo-header is only constituted of valid characters
according to RFC 9110. If an invalid value is found, the request is
rejected and the stream is reset.

It's the same as for previous commit "BUG/MEDIUM: h3: ensure the
":method" pseudo header is totally valid" except that this time it
applies to the ":scheme" pseudo header.

This must be backported up to 2.6.
2024-06-28 14:36:30 +02:00
Amaury Denoyelle
789d4abd73 BUG/MEDIUM: h3: ensure the ":method" pseudo header is totally valid
Ensure the method pseudo-header is only constituted of valid characters
according to RFC 9110. If an invalid value is found, the request is
rejected and the stream is reset.

Previously only characters forbidden in headers were rejected (NUL/CR/LF),
but this is insufficient for :method, where some other forbidden chars
might be used to trick a non-compliant backend server into seeing a
different path from the one seen by haproxy. Note that header injection
is not possible though.
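
The stricter check can be sketched as below. RFC 9110 defines a method as a
"token", i.e. one or more "tchar" characters. The function names here are
hypothetical and this is not the code from the patch, only an illustration
of the grammar being enforced.

```c
#include <stddef.h>
#include <string.h>

/* Returns non-zero if <c> is a "tchar" as defined by RFC 9110:
 * ALPHA / DIGIT / one of ! # $ % & ' * + - . ^ _ ` | ~
 */
static int is_tchar(unsigned char c)
{
    if ((c >= '0' && c <= '9') ||
        (c >= 'A' && c <= 'Z') ||
        (c >= 'a' && c <= 'z'))
        return 1;
    /* NUL is excluded explicitly since strchr() would match the
     * terminating zero of the set.
     */
    return c != '\0' && strchr("!#$%&'*+-.^_`|~", c) != NULL;
}

/* Returns 1 if the <len> bytes at <s> form a valid token, otherwise 0. */
static int is_valid_method(const char *s, size_t len)
{
    size_t i;

    if (len == 0)
        return 0;           /* a token needs at least one character */

    for (i = 0; i < len; i++) {
        if (!is_tchar((unsigned char)s[i]))
            return 0;       /* space, '/', control chars, ... */
    }
    return 1;
}
```

With such a check, a value like "GET /admin" is rejected because of the
space and the slash, while "GET" or "M-SEARCH" are accepted.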

This must be backported up to 2.6.

Many thanks to Yuki Mogi of FFRI Security Inc for the detailed report
that allowed to quickly spot, confirm and fix the problem.
2024-06-28 14:36:30 +02:00
Aurelien DARRAGON
80aba1d284 BUG/MEDIUM: server/dns: prevent DOWN/UP flap upon resolution timeout or error
This is a complementary patch to c16eba818 ("BUG/MEDIUM: server/dns:
preserve server's port upon resolution timeout or error").

Indeed, since c16eba818, the port is properly preserved, but unsetting
server's address this way results in server_atomic_sync() function
thinking that we're actually setting a new address and not unsetting
the previous one because addr family is != AF_UNSPEC.

Upon DNS timeout, this could be observed:

[WARNING]  (2588257) : Server http/s1 is going DOWN for maintenance (DNS timeout status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (2588257) : Server http/s1 ('test1.localhost') is UP/READY (resolves again).

Notice that the server times out and then immediately resolves again. Of course,
in this case the server's address was properly set to 0, meaning
that the server will not receive any traffic, but it is confusing and
could result in haproxy temporarily thinking that the server is actually
available while it's not.

To properly fix the issue and restore historical behavior, let's
explicitly set inetaddr's family to AF_UNSPEC after fetching original
server's address.

It should be backported in 3.0 with c16eba818.
2024-06-28 11:26:52 +02:00
Willy Tarreau
290659ffd3 MINOR: activity: make the memory profiling hash size configurable at build time
The MEMPROF_HASH_BITS variable was set to 10 without a possibility to
change it (beyond patching the code). After seeing a few reports already
with "other" being listed and a list with close to 1024 entries, it looks
like it's about time to either increase the hash size, or at least make
it configurable for special cases. As a reminder, in order to remain
fast, the algorithm searches no more than 16 places after the hash, so
when a table is almost full, searches are long and new places are rare.

The present patch just makes it possible to redefine it by passing
"-DMEMPROF_HASH_BITS=11" or "-DMEMPROF_HASH_BITS=12" in CFLAGS, and
moves the definition to defaults.h to make it easier to find. Such
values should be way sufficient for the vast majority of use cases.
Maybe in the future we'd change the default. At least this version
should be backported to ease rebuilds, say, till 2.8 or so.
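
A build-time overridable hash size combined with a bounded search can be
sketched like this. Only MEMPROF_HASH_BITS and the limit of 16 places come
from the message above; the structure, the hash function and the other
names are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Default value, overridable with e.g. -DMEMPROF_HASH_BITS=12 in CFLAGS. */
#ifndef MEMPROF_HASH_BITS
#define MEMPROF_HASH_BITS 10
#endif

#define MEMPROF_HASH_BUCKETS (1U << MEMPROF_HASH_BITS)
#define MEMPROF_MAX_PROBES   16     /* hypothetical name for the limit */

/* Hypothetical per-caller statistics bin. */
struct memprof_bin {
    const void *caller;             /* NULL means the bin is unused */
    unsigned long calls;
};

/* One extra entry at the end collects everything that did not fit. */
static struct memprof_bin bins[MEMPROF_HASH_BUCKETS + 1];

/* Illustrative multiplicative hash keeping the top MEMPROF_HASH_BITS bits. */
static unsigned int hash_caller(const void *caller)
{
    uint64_t v = (uint64_t)(uintptr_t)caller * 0x9E3779B97F4A7C15ULL;

    return (unsigned int)(v >> (64 - MEMPROF_HASH_BITS));
}

/* Returns the bin for <caller>, looking at no more than
 * MEMPROF_MAX_PROBES places, else the shared "other" bin.
 */
static struct memprof_bin *get_bin(const void *caller)
{
    unsigned int h = hash_caller(caller);
    unsigned int i;

    for (i = 0; i < MEMPROF_MAX_PROBES; i++) {
        struct memprof_bin *b = &bins[(h + i) & (MEMPROF_HASH_BUCKETS - 1)];

        if (b->caller == caller)
            return b;               /* already known */
        if (!b->caller) {
            b->caller = caller;     /* claim a free place */
            return b;
        }
    }
    return &bins[MEMPROF_HASH_BUCKETS];   /* reported as "other" */
}
```

The bounded probe count keeps lookups cheap, which is why a nearly full
table starts sending new callers to "other" and why a larger
MEMPROF_HASH_BITS helps in that situation.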
2024-06-27 18:01:27 +02:00
Aurelien DARRAGON
eec8048042 BUG/MINOR: server: fix first server template name lookup UAF
This is a follow-up for 7223296 ("BUG/MINOR: server: fix first server
template not being indexed").

Indeed, in 7223296 we added a new call to _srv_parse_set_id_from_prefix()
for the first server before handling additional ones. But we actually
overlooked the fact that _srv_parse_set_id_from_prefix() was already
performed at the end of _srv_parse_tmpl_init() for the same server.

Since _srv_parse_set_id_from_prefix() frees srv->id, it results in UAF
when performing name lookups on the first server, because used_server_name
node key still uses the freed string pointer.

The early _srv_parse_set_id_from_prefix() call (added in 7223296) and
the original one perform the same task, except that the new one is
followed by name node insertion logic required for name lookups to work
properly. So let's simply get rid of the old one at the end of the
function.

_srv_parse_set_id_from_prefix() in the 'err:' label was also removed since
it is now useless as well starting with 7223296 and would trigger the same
bug on error paths. Thanks to Amaury for noticing it.

This bug was discovered while trying to address GH issue #2620.
Thanks to @x-yuri for his detailed report (with working repro).

It should be backported in 3.0 with 7223296.
2024-06-27 16:38:25 +02:00
Valentine Krasnobaeva
ed90ad895c REORG: init: encapsulate code that reads cfg files
Haproxy master process should not read its configuration the second time
after performing reexec and passing to MODE_MWORKER_WAIT. So, to make
this part of init() function more readable and to distinguish better the
point, where configs have been read, let's encapsulate it in a separate
function.
2024-06-27 16:09:38 +02:00
Valentine Krasnobaeva
5e06d45df7 REORG: init: encapsulate 'reload' sockpair and master CLI listeners creation
Let's encapsulate the logic of 'reload' sockpair and master CLI listeners
creation, used by master CLI into a separate function, as we needed this
only in master-worker runtime  mode. This makes the code of init() more
readable.
2024-06-27 16:08:42 +02:00
Valentine Krasnobaeva
6f613faa71 REORG: init: encapsulate CHECK_CONDITION logic in a func
As MODE_CHECK_CONDITION logic terminates the process anyway, no matter if
the test for the provided condition was successful or not, let's
encapsulate it in a separate function. This makes the code of init() more
readable.
2024-06-27 16:01:01 +02:00
Valentine Krasnobaeva
10de58fbfb REORG: init: do MODE_CHECK_CONDITION logic first
In MODE_CHECK_CONDITION we only parse check_condition string, provided by
'-cc', and then we evaluate it. Haproxy process terminates at the
end of {if..else} block anyway, if the test has failed or passed. So, it
will be more appropriate to perform MODE_CHECK_CONDITION test first and
then do all other process runtime mode verifications.
2024-06-27 15:59:43 +02:00
Christopher Faulet
ad946a704d MINOR: stick-table: Always decrement ref count before killing a session
Guarded functions to kill a sticky session, stksess_kill() and
stksess_kill_if_expired(), may or may not decrement and test its reference
counter before really killing it. This depends on a parameter. If it is set
to a non-zero value, the ref count is decremented and if it falls to zero, the
session is killed. Otherwise, if this parameter is equal to zero, the
session is killed, regardless of the ref count value.

In the code, these functions are always called with a non-zero parameter and
the ref count is always decremented and tested. So, there is no reason to
still have a special case. Especially because it is not really easy to say
if it is supported or not. Does it mean it is possible to kill a sticky
session while it is still referenced somewhere? Probably not. So, does it
mean it is possible to kill an unreferenced session? This case may be
problematic because the session is accessed outside of any lock and thus may
be released by another thread because it is unreferenced. Enlarging the scope of
the lock to avoid any issue is possible but it would be a bit of a shame to do so
because there is no usage for now.

The best is to simplify the API and remove this case. Now, stksess_kill()
and stksess_kill_if_expired() functions always decrement and test the ref
count before killing a sticky session.
2024-06-26 15:05:06 +02:00
Christopher Faulet
9357873641 BUG/MEDIUM: stick-table: Decrement the ref count inside lock to kill a session
When we try to kill a session, the shard must be locked before decrementing
the ref count on the session. Otherwise, the ref count can fall to 0 and a
purge task (stktable_trash_oldest or process_table_expire) may release the
session before we have the opportunity to acquire the lock on the shard to
effectively kill the session. This could lead to a double free.

Here is the scenario:

    Thread 1                                 Thread 2

  stksess_kill(ts)
    if (ATOMIC_DEC(&ts->ref_cnt) != 0)
        return
                   /* here the ref count is 0 */

                                       stktable_trash_oldest()
                                          LOCK(&sh_lock)
                                          if (!ATOMIC_LOAD(&ts->ref_cnt))
                                              __stksess_free(ts)
                                          UNLOCK(&sh_lock)

                  /* here the session was released */
    LOCK(&sh_lock)
    __stksess_free(ts)  <--- double free
    UNLOCK(&sh_lock)

The bug was introduced in 2.9 by the commit 7968fe3889 ("MEDIUM:
stick-table: change the ref_cnt atomically"). The ref count must be
decremented inside the lock for the stksess_kill() and
stksess_kill_if_expired() functions.

This patch should fix the issue #2611. It must be backported as far as 2.9. On
the 2.9, there is no sharding. All the table is locked. The patch will have to
be adapted.
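
The ordering required by this fix can be sketched as follows. This is an
illustration only: the entry type, the atomic_flag-based lock and the
function names are hypothetical stand-ins for the stick-table session, the
shard lock and the kill/purge functions.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a stick-table session. */
struct entry {
    atomic_uint ref_cnt;
};

static atomic_flag shard_lock = ATOMIC_FLAG_INIT;

static void shard_lock_take(void)
{
    while (atomic_flag_test_and_set_explicit(&shard_lock, memory_order_acquire))
        ;                           /* spin until the lock is free */
}

static void shard_lock_release(void)
{
    atomic_flag_clear_explicit(&shard_lock, memory_order_release);
}

static struct entry *entry_new(unsigned int refs)
{
    struct entry *e = malloc(sizeof(*e));

    if (e)
        atomic_init(&e->ref_cnt, refs);
    return e;
}

/* Drops the caller's reference. Returns 1 if the entry was freed. */
static int entry_kill(struct entry *e)
{
    int freed = 0;

    shard_lock_take();              /* lock first ... */
    if (atomic_fetch_sub(&e->ref_cnt, 1) == 1) {
        /* ... so the count only reaches 0 while no purge can run */
        free(e);
        freed = 1;
    }
    shard_lock_release();
    return freed;
}

/* Purge side: frees an unreferenced entry. Returns 1 if freed. */
static int entry_purge(struct entry *e)
{
    int freed = 0;

    shard_lock_take();
    if (atomic_load(&e->ref_cnt) == 0) {
        free(e);
        freed = 1;
    }
    shard_lock_release();
    return freed;
}
```

Because the decrement happens under the same lock as the purge test, the
purge side can never observe a zero count for an entry that the kill side
is about to free, which closes the window shown in the scenario above.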
2024-06-26 12:05:37 +02:00
Aurelien DARRAGON
bcf98c9b5f MINOR: cfgparse/log: remove leftover dead code
Remove development leftover introduced by commit 15e9c7da6 ("MINOR: log:
add log-profile parsing logic").

Indeed, since "log-profile" section keyword is registered via
REGISTER_CONFIG_SECTION() macro, it is not relevant to declare it in
common_kw_list[] from cfgparse-global.c. All it does is that it could
confuse the user by suggesting him to use "log-profile" inside a global
section when trying to find a best match in cfg_parse_global().
2024-06-26 11:06:31 +02:00
Aurelien DARRAGON
185d230e2c BUG/MINOR: hlua: report proper context upon error in hlua_cli_io_handler_fct()
As a result of copy pasting, hlua_cli_io_handler_fct() used to report lua
exceptions like E_ETMOUT as "Lua converter" instead of "Lua cli".

Let's fix that.

It could be backported to all stable versions.

[ada: for older versions, HLUA_E_BTMOUT case didn't exist so it has to be
 skipped]
2024-06-26 11:06:24 +02:00
Frederic Lecaille
bc9821fd26 BUILD: Missing inclusion header for ssize_t type
Compilation issue detected as follows by gcc:

In file included from src/ncbuf.c:19:
src/ncbuf.c: In function 'ncb_write_off':
include/haproxy/bug.h:144:10: error: unknown type name 'ssize_t'
  144 |   extern ssize_t write(int, const void *, size_t); \
2024-06-26 10:17:09 +02:00
Willy Tarreau
2d27c80288 BUILD: debug: also declare strlen() in __ABORT_NOW()
Previous commit 8f204fa8ae ("MINOR: debug: print gdb hints when crashing")
broke the build on the CI where strlen() isn't known. Let's forward-declare it in
the __ABORT_NOW() functions, just like write(). No backport is needed.
2024-06-26 08:04:40 +02:00
Willy Tarreau
8f204fa8ae MINOR: debug: print gdb hints when crashing
To make bug reporting easier for users, when crashing, let's suggest
what to do. Typically when a BUG_ON() matches, only the current thread
is useful the vast majority of the time, while when the watchdog
triggers, all threads are interesting.

The messages are printed at the end after the dump. We may adjust these
with wiki links in the future if more detailed instructions are relevant.
2024-06-26 07:43:00 +02:00
Valentine Krasnobaeva
2cd52a88be MINOR: cli/debug: show dev: show capabilities
If haproxy is compiled with Linux capabilities support, let's show process
capabilities before applying the configuration and at runtime in the 'show dev'
command output. This may be useful for debugging purposes, especially in
cases where the process changes its UID and GID to non-privileged ones, or where
it has been started and runs under a non-privileged UID and the needed
capabilities are set by the admin on the haproxy binary.
2024-06-26 07:38:21 +02:00
Valentine Krasnobaeva
0d79c9bedf MINOR: cli/debug: show dev: add cmdline and version
The 'show dev' command is very convenient to obtain haproxy debugging information
while the process runs in a container. Let's extend its output with version and
cmdline. cmdline is useful as it shows the absolute binary path and its
arguments, because sometimes the person who is debugging a failing container is
not the same as the one who created and deployed it.

argc and argv are stored in the exported global structure, because
feed_post_mortem() is added as a post check function callback in the
post_check_list. So we can't simply change the signature of
feed_post_mortem(), without breaking other post check callbacks APIs.

Parsers are not supposed to modify argv, so we can safely pass its pointer
to debug_parse_cli_show_dev(), without copying all argument strings somewhere
in the heap or on the stack.
2024-06-26 07:38:21 +02:00
Valentine Krasnobaeva
fba9ade891 MINOR: capabilities: use _LINUX_CAPABILITY_VERSION_3
Linux kernel shows the warning below, when _LINUX_CAPABILITY_VERSION_1 is
used in capset() and capget().

        [1710243.523230] capability: warning: `haproxy' uses 32-bit capabilities (legacy support in use)

This triggers questions from users. The warning is shown by the kernel because,
since Linux 2.6.25, 64-bit capabilities support was introduced with
_LINUX_CAPABILITY_VERSION_2, in order to be able to continuously
extend the capabilities list with new ones.

We can't use _LINUX_CAPABILITY_VERSION_2, because this version triggers
another warning, according to linux/kernel/capability.c (see also more details
about it in comments from kernel sources and in man capset(2)).

kernel/capability.c:
    ...
    static int cap_validate_magic(cap_user_header_t header, unsigned *tocopy)
    {
            __u32 version;

            if (get_user(version, &header->version))
                    return -EFAULT;

            switch (version) {
            case _LINUX_CAPABILITY_VERSION_1:
                    warn_legacy_capability_use();
                    *tocopy = _LINUX_CAPABILITY_U32S_1;
                    break;
            case _LINUX_CAPABILITY_VERSION_2:
                    warn_deprecated_v2();
                    fallthrough;    /* v3 is otherwise equivalent to v2 */
            case _LINUX_CAPABILITY_VERSION_3:
                    *tocopy = _LINUX_CAPABILITY_U32S_3;
                    break;
            default:
            ...

So, to avoid any warnings, let's use _LINUX_CAPABILITY_VERSION_3, which
according to comments in linux/kernel/capability.c, has the same
functionality as _LINUX_CAPABILITY_VERSION_2 (i.e. array of 2
__user_cap_data_struct with 32-bits integers for each capability set), but
comes in Linux 2.6.26 with a header change, in order to protect legacy
source code.

For the moment, we don't authorize capabilities higher, than CAP_SYS_ADMIN
(21-st bit), so we always check the "low" 32 bits, i.e.
__user_cap_data_struct[0].
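
A minimal, Linux-only sketch of reading the effective capabilities with the
version 3 header is shown below. It calls the raw capget syscall through
syscall(2). The helper names are hypothetical and this is not the wrapper
used by haproxy.

```c
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fills <low> and <high> with the effective capability bits of the
 * calling process. Returns 0 on success, -1 on error.
 */
static int read_effective_caps(unsigned int *low, unsigned int *high)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0,                   /* 0 designates the calling thread */
    };
    /* version 3 uses an array of 2 data structs of 32 bits each */
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;

    *low  = data[0].effective;      /* capabilities 0..31 */
    *high = data[1].effective;      /* capabilities 32..63 */
    return 0;
}

/* Tests capability <cap> in the low 32 bits only, which is enough for
 * any capability numbered below 32.
 */
static int has_low_cap(unsigned int low, int cap)
{
    return cap >= 0 && cap < 32 && ((low >> cap) & 1U);
}
```

For example, has_low_cap(low, CAP_NET_BIND_SERVICE) tells whether the
process may bind to privileged ports. With this header version the kernel
emits neither the legacy nor the deprecation warning.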
2024-06-26 07:38:21 +02:00
Valentine Krasnobaeva
e2e756a67d MINOR: capabilities: prepare support for version 3
Commit e338d263a76a ("Add 64-bit capability support to the kernel")
introduces in the kernel _LINUX_CAPABILITY_VERSION_1 and
_LINUX_CAPABILITY_VERSION_2 and their corresponding magic numbers "1"
(_LINUX_CAPABILITY_U32S_1) and "2" (_LINUX_CAPABILITY_U32S_2).

Capability sets, since this commit, are laid out as arrays of
__user_cap_data_struct with a length defined by the version's magic number
(e.g. struct __user_cap_data_struct kdata[_LINUX_CAPABILITY_U32S_1]).

These magic numbers also help the kernel to figure out how many data
(in __user_cap_data_struct "units") it needs to copy_from/to_user in
capset/capget syscalls.

In order to use _LINUX_CAPABILITY_VERSION_3 in the next commit (it has the
same functionality as version 2), let's follow the kernel code and let's
allocate memory to store 32-capabilities as an array of
__user_cap_data_struct with the length of 1 (_LINUX_CAPABILITY_U32S_1).
2024-06-26 07:38:21 +02:00
Valentine Krasnobaeva
fcf1a0bcf5 MINOR: capabilities: export capget and __user_cap_header_struct
To be able to show process capabilities before applying its configuration and
also at runtime in 'show dev' command output, we need to export the wrapper
around capget() syscall. It also seems more handy to place
__user_cap_header_struct in .data section and declare it as globally
accessible, as we always fill it with the same values. This avoids allocating
and filling these 8 bytes each time on the stack frame, when capget() or capset()
wrappers are called.
2024-06-26 07:38:21 +02:00
Willy Tarreau
a14c7d194a DEV: flags/show-fd-to-flags: adapt to recent versions
The script hadn't been updated since it was introduced, and the
hard-coded field 12 doesn't match anymore (it's 16 now). Let's just
use "grep -o cflg..." to extract the desired part more flexibly.
This can be backported at least to 3.0, probably further, but it
will need to be tested prior to this. Better not bring it too far,
it's only used when debugging.
2024-06-25 08:13:24 +02:00
Amaury Denoyelle
d5376b7a87 BUG/MINOR: quic: fix BUG_ON() on Tx pkt alloc failure
On quic_tx_packet allocation failure, it is possible to trigger a BUG_ON()
crash on INITIAL packet building. This statement is responsible for
ensuring INITIAL packets are padded to 1200 bytes as required. If the
allocation of a packet on a higher encryption level fails, the PADDING frame
cannot be properly encoded, despite the INITIAL packet being properly built.

This crash happens due to the qc_txb_store() invocation after quic_tx_packet
allocation failure to validate already built packets. However, this
call is unneeded as qc_purge_tx_buf() is called just after. Simply
remove qc_txb_store() to fix this issue.

This was detected using -dMfail.

This should be backported up to 2.6.
2024-06-24 14:40:38 +02:00
Amaury Denoyelle
5718c67c19 BUG/MINOR: h3: fix BUG_ON() crash on control stream alloc failure
BUG_ON() from qcc_set_error() is triggered on HTTP/3 control stream
allocation failure. This is caused because both h3_finalize() and
qcc_init_stream_local() call qcc_set_error() which is forbidden to
prevent error code erasure.

Fix this by removing the qcc_set_error() invocation from h3_finalize() on
allocation failure. Note that this function is still responsible for using
it on SETTINGS frame emission failure.

This was detected using -dMfail.

This must be backported up to 3.0.
2024-06-24 14:40:38 +02:00
Amaury Denoyelle
3aded1d375 BUG/MINOR: mux-quic: fix crash on qcs SD alloc failure
Since the following commit, sedesc are created since QCS instantiation
in qcs_new().
  086e51017e7731ee9820b882fe6e8cc5f0dd5352
  BUG/MEDIUM: mux-quic: Create sedesc in same time of the QUIC stream

However, sedesc is initialized before other QCS mandatory fields. If
sedesc allocation fails, a crash would occur on qcs_free() invocation
for QCS early release. To fix this, delay sedesc allocation until
function end.

This bug was detected using -dMfail.

This should be backported up to 2.6.
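
The ordering issue can be illustrated with the sketch below, where every
type and function name is hypothetical. Mandatory fields are set first and
the allocation which may fail is performed last, so that the release
function never sees an uninitialized field.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the stream descriptor and the stream. */
struct desc {
    int flags;
};

struct stream {
    int *rx_buf;
    struct desc *sd;
};

typedef void *(*alloc_fn)(size_t);

/* Release function: relies on every field being initialized. */
static void stream_free(struct stream *s)
{
    if (!s)
        return;
    free(s->rx_buf);        /* free(NULL) is a no-op */
    free(s->sd);
    free(s);
}

/* <sd_alloc> is injectable so that a failure can be simulated. */
static struct stream *stream_new(alloc_fn sd_alloc)
{
    struct stream *s = malloc(sizeof(*s));

    if (!s)
        return NULL;

    /* mandatory fields first: stream_free() is safe from here on */
    s->rx_buf = NULL;
    s->sd = NULL;

    /* the allocation which may fail comes last */
    s->sd = sd_alloc(sizeof(*s->sd));
    if (!s->sd) {
        stream_free(s);     /* early release on a consistent object */
        return NULL;
    }
    s->sd->flags = 0;
    return s;
}

/* Allocator which always fails, in the spirit of -dMfail. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

Calling stream_new(failing_alloc) exercises the error path: the function
returns NULL after releasing a fully initialized object instead of
crashing in the release function.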
2024-06-24 14:04:48 +02:00
Amaury Denoyelle
85838822ba BUG/MINOR: h3: fix crash on STOP_SENDING receive after GOAWAY emission
After emitting a HTTP/3 GOAWAY frame, opening of streams higher than
advertised ID was prevented. h3_attach operation would return success
but without allocating H3S stream context for QCS. In addition, the
stream would be immediately scheduled for RESET_STREAM emission.

Despite the immediate stream close, the current behavior is not sufficient
and can cause crashes. One occurrence of this can be found if
STOP_SENDING is the first frame received for a stream. A crash would
occur under qcc_recv_stop_sending() after h3_attach invocation, when
h3_close() is used, which tries to access the H3S context.

To fix this, change h3_attach API. In case of success, H3S stream
context is always allocated, even if the stream will be scheduled for
immediate close. This renders the code more reliable.

This crash should be extremely rare, as it can only happen after GOAWAY
emission, which is only used on soft-stop or reload.

This should solve the second crash occurrence reported on GH #2607.

This must be backported up to 2.8.
2024-06-24 12:03:55 +02:00
Aurelien DARRAGON
13e0972aea DOC: api/event_hdl: small updates, fix an example and add some precisions
Fix an example suggesting that using EVENT_HDL_SUB_TYPE(x, y) with y being
0 was valid. Then add some notes to explain how to use
EVENT_HDL_SUB_FAMILY() and EVENT_HDL_SUB_TYPE() with valid values.

Also mention that the feature is available starting from 2.8 and not 2.7.
Finally, perform some purely cosmetic updates.

This could be backported in 2.8.
2024-06-21 18:12:31 +02:00
Amaury Denoyelle
b27470fd1d SCRIPTS: git-show-backports: do not truncate git-show output
git-show-backports lists a git-show command which can be used to inspect
all commits subject to backport. This command specifies formatting
options to reproduce default git-show output, especially for commit
messages indented with 4 space characters. However, it also adds wrapping
on message lines longer than 72 characters. This reduces the readability of
messages containing large pieces of information such as backtraces.

Improve this by changing git-show format option. Use a limit value of 0
to disable wrapping while preserving indentation.

This could be backported to every stable version to simplify backporting
process.
2024-06-21 15:08:42 +02:00
William Lallemand
5756f10cbc MINOR: sample: date converter takes HTTP date and output an UNIX timestamp
The `date` converter takes an HTTP date as input, which can be either an
imf, rfc850 or asctime date. It outputs a UNIX timestamp.
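
To give an idea of what such a conversion involves, here is a minimal standalone sketch that handles only the IMF-fixdate form (e.g. "Sun, 06 Nov 1994 08:49:37 GMT"). It is not the converter's implementation: the function names are hypothetical, and the rfc850 and asctime forms are deliberately left out.

```c
#include <stdio.h>
#include <string.h>

/* Days since 1970-01-01 for a proleptic Gregorian date (m in 1..12). */
static long long days_from_civil(int y, int m, int d)
{
    y -= m <= 2;
    long long era = (y >= 0 ? y : y - 399) / 400;
    int yoe = (int)(y - era * 400);                            /* [0, 399] */
    int doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* [0, 365] */
    int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* [0, 146096] */
    return era * 146097 + doe - 719468;
}

/* Hypothetical helper: parse "Sun, 06 Nov 1994 08:49:37 GMT".
 * Returns 0 and stores the timestamp in *out on success, -1 on error.
 * The day-of-week is read but not cross-checked against the date.
 */
int imf_date_to_epoch(const char *s, long long *out)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char wday[4], mon[4], tz[4];
    int day, year, hh, mm, ss;
    const char *p;

    if (sscanf(s, "%3[A-Za-z], %2d %3[A-Za-z] %4d %2d:%2d:%2d %3[A-Z]",
               wday, &day, mon, &year, &hh, &mm, &ss, tz) != 8)
        return -1;
    if (strcmp(tz, "GMT") != 0 || strlen(mon) != 3)
        return -1;

    /* A valid month name sits at a multiple of 3 in the table. */
    p = strstr(months, mon);
    if (!p || (p - months) % 3 != 0)
        return -1;

    if (day < 1 || day > 31 || hh < 0 || hh > 23 ||
        mm < 0 || mm > 59 || ss < 0 || ss > 60)
        return -1;

    *out = days_from_civil(year, (int)(p - months) / 3 + 1, day) * 86400LL
         + hh * 3600 + mm * 60 + ss;
    return 0;
}
```

The day count is computed arithmetically rather than through timegm(), which keeps the sketch independent from the local timezone and from non-standard libc functions.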
2024-06-20 16:38:48 +02:00
Amaury Denoyelle
937324d493 BUG/MAJOR: quic: do not loop on emission on closing/draining state
To emit CONNECTION_CLOSE frame, a special buffer is allocated via
qc_txb_store(). This is due to QUIC_FL_CONN_IMMEDIATE_CLOSE flag.
However this flag is reset after qc_send_ppkts() invocation to prevent
reemission of CONNECTION_CLOSE frame.

qc_send() can invoke multiple times a series of qc_prep_pkts() +
qc_send_ppkts() to emit several datagrams. However, this may cause a
crash if on first loop a CONNECTION_CLOSE is emitted. On the next loop
iteration, QUIC_FL_CONN_IMMEDIATE_CLOSE is reset, thus qc_prep_pkts()
will use the wrong buffer size as end delimiter. In some cases, this may
cause a BUG_ON() crash due to b_add() outside of buffer.

This bug can be reproduced by running ngtcp2-client in a while loop and
interrupting it randomly via Ctrl+C.

Here is the patch which introduced this regression:
  cdfceb10ae136b02e51f9bb346321cf0045d58e0
  MINOR: quic: refactor qc_prep_pkts() loop
2024-06-19 15:15:59 +02:00
Amaury Denoyelle
c714b6bb55 BUG/MAJOR: quic: fix padding with short packets
QUIC sending functions were extended to be more flexible. Among other
changes, they now support iterating over a variable number of QEL
instances instead of only 2 previously. This change has rendered PADDING
emission less predictable, which was adjusted via the following patch:

  a60609f1aa3e5f61d2a2286fdb40ebf6936a80ee
  BUG/MINOR: quic: fix padding of INITIAL packets

Its main purpose was to ensure PADDING would only be generated for the
last iterated QEL instance, to avoid unnecessary padding. In parallel, a
BUG_ON() statement ensures that built INITIAL packets are always padded
to 1200 bytes as necessary before emitting them.

This BUG_ON() statement caused a crash in one particular case: when
building datagrams that mixed Initial long packets and 1-RTT short
packets. The latter packet type does not have a length field in its
header, contrary to long packets. This caused a miscalculation of the
necessary padding size, with INITIAL packets not padded enough to reach
the necessary 1200 bytes size.

This issue was detected on 3.0.2. It can be reproduced by using 0-RTT
combined with latency. Here are the commands used:

  $ ngtcp2-client --tp-file=/tmp/ngtcp2-tp.txt \
    --session-file=/tmp/ngtcp2-session.txt --exit-on-all-streams-close \
    127.0.0.1 20443 "https://[::]/?s=32o"
  $ sudo tc qdisc add dev lo root netem latency 500ms

Note that this issue cannot be reproduced on the current dev version.
Indeed, it seems that the following patch introduces a slight change in
packet building ordering:

  cdfceb10ae136b02e51f9bb346321cf0045d58e0
  MINOR: quic: refactor qc_prep_pkts() loop

This must be backported to 3.0.

This should fix github issue #2609.
2024-06-19 11:11:57 +02:00
Aurelien DARRAGON
7422f16da3 DOC: management: document ptr lookup for table commands
Add missing documentation and examples for the optional ptr lookup method
for table {show,set,clear} commands introduced in commit 9b2717e7 ("MINOR:
stktable: use {show,set,clear} table with ptr"), as initially described in
GH #2118.

It may be backported in 3.0.
2024-06-19 10:28:10 +02:00
William Lallemand
0cc2913aec DOC: configuration: fix alphabetical order of bind options
Put the curves, ecdhe, severity-output, v4v6 and v6only keywords at the
right place.

Fix issue #2594.

Could be backported to all stable versions.
2024-06-18 12:08:19 +02:00
Aurelien DARRAGON
9d312212df BUG/MINOR: proxy: fix email-alert leak on deinit() (2nd try)
As shown in GH #2608 and ("BUG/MEDIUM: proxy: fix email-alert invalid
free"), simply calling free_email_alert() from free_proxy() is not the
right thing to do.

In this patch, we reuse proxy->email_alert.set memory space to introduce
proxy->email_alert.flags in order to support 2 flags:
PR_EMAIL_ALERT_SET (to mimic proxy->email_alert.set) and
PR_EMAIL_ALERT_RESOLVED (set once init_email_alert() was called on the
proxy to resolve email_alert.mailer pointer).

Thanks to PR_EMAIL_ALERT_RESOLVED flag, free_email_alert() may now
properly handle the freeing of proxy email_alert settings: if the RESOLVED
flag is set, then it means the .email_alert.mailers.name parsing hint was
replaced by the actual mailers pointer, thus no free should be attempted.

No backport needed: as described in ("BUG/MEDIUM: proxy: fix email-alert
invalid free"), this historical leak is not sensitive as it cannot be
triggered during runtime.. thus given that the fix is not backport-
friendly, it's not worth the trouble.
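
The flag-guarded free described above can be sketched as follows. Only the two flag names come from the commit message; their numeric values, the struct layout and the `_sketch` names are simplified assumptions, not the actual proxy definitions.

```c
#include <stdlib.h>

/* Flag names from the commit message; values are arbitrary here. */
#define PR_EMAIL_ALERT_SET      0x01  /* email-alert settings are configured */
#define PR_EMAIL_ALERT_RESOLVED 0x02  /* name hint replaced by section pointer */

struct mailers_sketch { int unused; }; /* stand-in for a mailers section */

struct email_alert_sketch {
    union {
        char *name;                /* parsing hint, owned by the proxy */
        struct mailers_sketch *m;  /* resolved section, shared by proxies */
    } mailers;
    unsigned int flags;
};

/* Returns 1 if the name hint was freed, 0 if the union held a shared
 * section pointer that must not be freed from here.
 */
int free_email_alert_sketch(struct email_alert_sketch *ea)
{
    int freed = 0;

    if (!(ea->flags & PR_EMAIL_ALERT_RESOLVED)) {
        free(ea->mailers.name);
        freed = 1;
    }
    ea->mailers.name = NULL;
    ea->flags = 0;
    return freed;
}
```

Because both members share the same storage, the flag is the only way to tell which of the two is currently valid.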
2024-06-17 19:37:29 +02:00
Aurelien DARRAGON
ee8be55942 REORG: mailers: move free_email_alert() to mailers.c
free_email_alert() was declared in cfgparse.c, but it should belong to
mailers.c instead.
2024-06-17 19:37:29 +02:00
Aurelien DARRAGON
8e226682be BUG/MEDIUM: proxy: fix email-alert invalid free
In fa90a7d3 ("BUG/MINOR: proxy: fix email-alert leak on deinit()"), I
tried to fix email-alert deinit() leak the simple way by leveraging
existing free_email_alert() helper function which was already used for
freeing email alert settings used in a default section.

However, as described in GH #2608, there is a subtlety that makes
free_email_alert() not suitable for use from free_proxy().

Indeed, proxy 'mailers.name' hint shares the same memory space as the
pointer to the corresponding mailers section (once the proxy is resolved,
name hint is replaced by the pointer to the section). However, since both
values share the same space (through union), we have to take care of not
freeing `mailers.name` once init_email_alert() was called on the proxy.

Unfortunately, free_email_alert() isn't protected against that, causing
double free() during deinit when mailers section is referenced from
multiple proxy sections. Since there is no easy fix, and that the leak in
itself isn't a big deal (fa90a7d3 was simply an opportunistic fix rather
than a must-have given that the leak only occurs during deinit and not
during runtime), let's actually revert the fix to restore legacy behavior
and prevent deinit errors.

Thanks to @snetat for having reported the issue on Github as well as
providing relevant infos to pinpoint the bug.

It should be backported everywhere fa90a7d3 was backported.
[ada: for versions prior to 3.0, simply revert the offending commit using
'git revert' as proxy_free_common() first appears in 3.0]
2024-06-17 19:37:24 +02:00
William Lallemand
c268313f60 REGTESTS: ssl: activate new SSL reg-tests with AWS-LC
Prerequisites are now available in AWS-LC, so we can enable these
reg-tests.

With this patch, aws-lc only has 5 reg-tests that are not working:
- reg-tests/ssl/ssl_reuse.vtc: stateful session resumption is only supported with TLSv1.2
- reg-tests/ssl/ssl_curve_name.vtc: function to extract curve name is not available
- reg-tests/ssl/ssl_errors.vtc: errors are not the same as with OpenSSL
- reg-tests/ssl/ssl_dh.vtc: AWS-LC does not support DH
- reg-tests/ssl/ssl_curves.vtc: not working correctly

Which means most of the features are working correctly.
2024-06-17 17:43:22 +02:00
William Lallemand
30a432d198 MINOR: ssl: activate sigalgs feature for AWS-LC
AWS-LC lacks the SSL_CTX_set1_sigalgs_list define even though the function
exists, which disables the feature in HAProxy even though we could have
built with it.

SSL_CTX_set1_client_sigalgs_list() is not available, though.

This patch introduces the define so the feature is enabled.
2024-06-17 17:40:49 +02:00
William Lallemand
ed9b8fec49 BUG/MEDIUM: ssl: AWS-LC + TLSv1.3 won't do ECDSA in RSA+ECDSA configuration
SSL_get_ciphers() in AWS-LC seems to lack the TLSv1.3 ciphersuites,
which breaks the ECDSA key selection when doing TLSv1.3.

An issue was opened: https://github.com/aws/aws-lc/issues/1638

Indeed, in ssl_sock_switchctx_cbk(), the sigalgs is used to determine if
ECDSA is doable or not, then the function compares the list of ciphers in
the clienthello with the list of configured ciphers.

The fix solves the issue by never skipping the TLSv1.3 ciphersuites,
even if they are not in SSL_get_ciphers().
2024-06-17 17:40:49 +02:00
William Lallemand
6da0879083 REGTESTS: ssl: fix some regtests 'feature cmd' start condition
Since patch fde517b ("REGTESTS: wolfssl: temporarly disable some failing
reg-tests") some 'feature cmd' lines have an extra quotation mark, so
they were disabled in all cases.

Must be backported to 2.9.
2024-06-17 16:12:57 +02:00
Aurelien DARRAGON
983513d901 DEBUG: hlua: distinguish burst timeout errors from exec timeout errors
hlua burst timeout was introduced in 58e36e5b1 ("MEDIUM: hlua: introduce
tune.lua.burst-timeout").

It is a safety measure that allows to detect when too much time is spent
on a single lua execution (between 2 interruptions/yields), meaning that
the current thread is not able to perform other tasks. Such scenario
should be avoided because it will cause thread contention which may have
negative performance impact and could cause the watchdog to trigger. When
the burst timeout is exceeded, the current Lua execution is aborted and a
timeout error is reported to the user.

Unfortunately, the same error is currently being reported for cumulative
(AKA execution) timeout and for burst timeout, which may be confusing to
the user.

Indeed, "execution timeout" error historically results from the current
hlua context exceeding the total (cumulative) time it's allowed to run.
It is set per lua context using the dedicated tunables:
 - tune.lua.session-timeout
 - tune.lua.task-timeout
 - tune.lua.service-timeout

We've already faced a user report where the user was able to trigger the
burst timeout and got "Lua task: execution timeout." error while the user
didn't set cumulative timeout. Thus the error was actually confusing
because it was indeed the burst timeout which was causing it due to the
use of cpu-intensive call from within the task without sufficient manual
"yield" keypoints around the cpu-intensive call to ensure it runs on a
dedicated scheduler cycle.

In this patch we make it so burst timeout related errors are reported as
"burst timeout" errors instead of "execution timeout" errors (which
in fact became the generic timeout errors catchall with 58e36e5b1).

To do this, hlua_timer_check() now returns a different value depending on
whether the exceeded timeout is the burst one or the cumulative one, which
allows us to return either HLUA_E_ETMOUT or HLUA_E_BTMOUT in
hlua_ctx_resume().

It should improve the situation described in GH #2356 and may possibly be
backported with 58e36e5b1 to improve error reporting if it applies without
resistance.
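
The distinction between the two timeouts can be sketched as below. The HLUA_E_* names are taken from the commit message (HLUA_E_OK is assumed), but the timer struct, the function name and the millisecond bookkeeping are assumptions and do not reflect the real hlua_timer_check() signature.

```c
/* Numeric values are arbitrary in this sketch. */
enum hlua_exec_sketch {
    HLUA_E_OK = 0,   /* no timeout exceeded (assumed name) */
    HLUA_E_ETMOUT,   /* cumulative (execution) timeout exceeded */
    HLUA_E_BTMOUT    /* burst timeout exceeded */
};

struct hlua_timer_sketch {
    unsigned int burst_ms;       /* time spent since the last yield */
    unsigned int cumulative_ms;  /* total time spent, current burst included */
    unsigned int max_ms;         /* cumulative limit, 0 means no limit */
};

/* Hypothetical check: report which limit was exceeded, if any. */
enum hlua_exec_sketch timer_check_sketch(const struct hlua_timer_sketch *t,
                                         unsigned int burst_limit_ms)
{
    if (burst_limit_ms && t->burst_ms > burst_limit_ms)
        return HLUA_E_BTMOUT;
    if (t->max_ms && t->cumulative_ms > t->max_ms)
        return HLUA_E_ETMOUT;
    return HLUA_E_OK;
}
```

With a distinct return value per cause, the caller can emit "burst timeout" even when no cumulative limit is configured, which is the confusing case described in the user report.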
2024-06-14 18:25:58 +02:00
Aurelien DARRAGON
0030f722a2 BUG/MINOR: log: fix broken '+bin' logformat node option
In 12d08cf912 ("BUG/MEDIUM: log: don't ignore disabled node's options"),
while trying to restore historical node option inheritance behavior, I
broke the '+bin' logformat node option recently introduced in b7c3d8c87c
("MINOR: log: add +bin logformat node option").

Indeed, because of 12d08cf912, LOG_OPT_BIN is not set anymore on
individual nodes even if it was set globally, making the feature unusable.
('+bin' is also used for binary cbor encoding)

What I should have done instead is include LOG_OPT_BIN in the options
inherited from global ones. This is what's being done in this commit.
Misleading comment was adjusted.

It must be backported in 3.0 with 12d08cf912.
2024-06-14 18:25:21 +02:00
Christopher Faulet
dc1bca4e9f [RELEASE] Released version 3.1-dev1
Released version 3.1-dev1 with the following main changes :
    - REGTESTS: Remove REQUIRE_VERSION=2.1 from all tests
    - REGTESTS: Remove REQUIRE_VERSION=2.2 from all tests
    - CI: use "--no-install-recommends" for apt-get
    - CI: switch to lua 5.4
    - CI: use USE_PCRE2 instead of USE_PCRE
    - DOC: replace the README by a markdown version
    - CI: VTest: accelerate package install a bit
    - ADMIN: acme.sh: remove the old acme.sh code
    - BUG/MINOR: cfgparse: remove the correct option on httpcheck send-state warning
    - BUG/MINOR: tcpcheck: report correct error in tcp-check rule parser
    - BUG/MINOR: tools: fix possible null-deref in env_expand() on out-of-memory
    - DOC: configuration: add an example for keywords from crt-store
    - CI: speedup apt package install
    - DOC: add the FreeBSD status badge to README.md
    - DOC: change the link to the FreeBSD CI in README.md
    - MINOR: stktable: avoid ambiguous stktable_data_ptr() usage in cli_io_handler_table()
    - BUG/MINOR: hlua: use CertCache.set() from various hlua contexts
    - CLEANUP: hlua: fix CertCache class comment
    - CI: FreeBSD: upgrade image, packages
    - BUG/MEDIUM: h1-htx: Don't state interim responses are bodyless
    - MEDIUM: stconn: Be able to unblock zero-copy data forwarding from done_fastfwd
    - BUG/MEDIUM: mux-quic: Unblock zero-copy forwarding if the txbuf can be released
    - BUG/MINOR: quic: prevent crash on qc_kill_conn()
    - CLEANUP: hlua: use hlua_pusherror() where relevant
    - BUG/MINOR: hlua: don't use lua_pushfstring() when we don't expect LJMP
    - BUG/MINOR: hlua: fix unsafe hlua_pusherror() usage
    - BUG/MINOR: hlua: prevent LJMP in hlua_traceback()
    - CLEANUP: hlua: get rid of hlua_traceback() security checks
    - BUG/MINOR: hlua: fix leak in hlua_ckch_set() error path
    - CLEANUP: hlua: simplify ambiguous lua_insert() usage in hlua_ctx_resume()
    - BUG/MEDIUM: mux-quic: Don't unblock zero-copy fwding if blocked during nego
    - MINOR: mux-quic: Don't send an emtpy H3 DATA frame during zero-copy forwarding
    - BUG/MEDIUM: ssl: wrong priority whem limiting ECDSA ciphers in ECDSA+RSA configuration
    - BUG/MEDIUM: ssl: bad auth selection with TLS1.2 and WolfSSL
    - BUG/MINOR: quic: fix computed length of emitted STREAM frames
    - BUG/MINOR: quic: ensure Tx buf is always purged
    - BUG/MEDIUM: stconn/mux-h1: Fix suspect change causing timeouts
    - BUG/MAJOR: mux-h1:  Properly copy chunked input data during zero-copy nego
    - BUG/MINOR: mux-h1: Use the right variable to set NEGO_FF_FL_EXACT_SIZE flag
    - DOC: install: remove boringssl from the list of supported libraries
    - MINOR: log: fix "http-send-name-header" ignore warning message
    - BUG/MINOR: proxy: fix server_id_hdr_name leak on deinit()
    - BUG/MINOR: proxy: fix log_tag leak on deinit()
    - BUG/MINOR: proxy: fix email-alert leak on deinit()
    - BUG/MINOR: proxy: fix check_{command,path} leak on deinit()
    - BUG/MINOR: proxy: fix dyncookie_key leak on deinit()
    - BUG/MINOR: proxy: fix source interface and usesrc leaks on deinit()
    - BUG/MINOR: proxy: fix header_unique_id leak on deinit()
    - MINOR: proxy: add proxy_free_common() helper function
    - BUG/MEDIUM: proxy: fix UAF with {tcp,http}checks logformat expressions
    - MINOR: log: change wording in lf_expr_postcheck() error message
    - BUG/MEDIUM: log: fix lf_expr_postcheck() behavior with default section
    - CLEANUP: log/proxy: fix comment in proxy_free_common()
    - DOC: config: move "hash-key" from proxy to server options
    - DOC: config: add missing section hint for "guid" proxy keyword
    - DOC: config: add missing context hint for new server and proxy keywords
    - BUG/MINOR: promex: Skip resolvers metrics when there is no resolver section
    - DOC: internals: add a documentation about the master worker
    - BUG/MAJOR: mux-h1: Prevent any UAF on H1 connection after draining a request
    - BUG/MINOR: quic: fix padding of INITIAL packets
    - OPTIM: quic: fill whole Tx buffer if needed
    - MINOR: quic: refactor qc_build_pkt() error handling
    - MINOR: quic: use global datagram headlen definition
    - MINOR: quic: refactor qc_prep_pkts() loop
    - DOC/MINOR: management: add missed -dR and -dv options
    - DOC/MINOR: management: add -dZ option
    - DOC: management: rename show stats domain cli "dns" to "resolvers"
    - REORG: log: reorder send log helpers by dependency order
    - MINOR: session: expose session_embryonic_build_legacy_err() function
    - MEDIUM: log/session: handle embryonic session log within sess_log()
    - MINOR: log: provide sending log context to process_send_log() when available
    - MINOR: log: add log_orig_to_str() function
    - MINOR: log: provide log origin in logformat expressions using '%OG'
    - CLEANUP: log: remove ambiguous legacy comment for resolve_logger()
    - MINOR: log/backend: always free parsing hints in resolve_logger()
    - MINOR: log: make resolve_logger() static
    - MINOR: log: provide proxy context to resolve_logger()
    - MINOR: log: add __send_log_set_metadata_sd helper
    - MINOR: log: add logger flags
    - MINOR: log: add log-profile parsing logic
    - MINOR: log: add log profile buildlines
    - MEDIUM: log: handle log-profile in process_send_log()
    - DOC: config: add documentation for log profiles
    - REGTESTS: log: add a test for log-profile
    - MINOR: ssl: add ssl_sock_bind_verifycbk() in ssl_sock.h
    - REORG: ssl: move the SNI selection code in ssl_clienthello.c
    - BUILD: ssl: fix build with wolfSSL
    - CI: github: upgrade aws-lc to 1.29.0
    - Revert "CI: github: upgrade aws-lc to 1.29.0"
    - MEDIUM: ssl: support for ECDA+RSA certificate selection with AWS-LC
    - BUILD: ssl: disable deprecated functions for AWS-LC 1.29.0
    - MINOR: ssl: relax the 'ssl.default-dh-param' keyword parsing
    - CI: github: upgrade aws-lc to 1.29.0
    - DOC: INSTALL: minimum AWS-LC version is v1.22.0
    - CI: github: do the AWS-LC weekly build with ERR=1
2024-06-14 16:04:18 +02:00
William Lallemand
5e361c7767 CI: github: do the AWS-LC weekly build with ERR=1
The weekly CI that tries new versions of AWS-LC was not building with
ERR=1, which let us think that everything was good, but there were in fact
new warnings that we missed.

Add ERR=1 to the build so the CI will fail on any new warning.
2024-06-14 12:18:32 +02:00
William Lallemand
1950996e83 DOC: INSTALL: minimum AWS-LC version is v1.22.0
Change the minimum AWS-LC version required
2024-06-14 12:06:03 +02:00
William Lallemand
11e13175d4 CI: github: upgrade aws-lc to 1.29.0
Upgrade aws-lc to 1.29.0 on the push CI.
2024-06-14 11:37:11 +02:00
William Lallemand
7e80af04ca MINOR: ssl: relax the 'ssl.default-dh-param' keyword parsing
Some libraries are ignoring SSL_CTX_set_tmp_dh_callback(), but disabling
the 'ssl.default-dh-param' keyword when the function is not supported would
result in an error instead of silently continuing. This patch emits a
warning when the keyword is not supported instead of a loading failure.
2024-06-14 11:36:52 +02:00
William Lallemand
ee5aa4e5e6 BUILD: ssl: disable deprecated functions for AWS-LC 1.29.0
AWS-LC has a lot of functions that do nothing, which are now
deprecated and emit warnings.

This patch disables the following useless functions that emit a warning:
SSL_CTX_get_security_level(), SSL_CTX_set_tmp_dh_callback(),
ERR_load_SSL_strings(), RAND_keep_random_devices_open()

The list of deprecated functions is here:

https://github.com/aws/aws-lc/blob/main/docs/porting/functionality-differences.md
2024-06-14 10:41:36 +02:00
William Lallemand
7120c77b14 MEDIUM: ssl: support for ECDA+RSA certificate selection with AWS-LC
AWS-LC does not support the SSL_CTX_set_client_hello_cb() function from
OpenSSL, which allows analyzing the ciphers and signature algorithms of the
ClientHello. However it supports SSL_CTX_set_select_certificate_cb(),
which allows the same thing and is the implementation inherited from the
boringSSL side.

This patch uses the SSL_CTX_set_select_certificate_cb() as well as the
SSL_early_callback_ctx_extension_get() function to get the signature
algorithms.

This was successfully tested with openssl s_client as well as
testssl.sh.

This should allow to enable more reg-tests that depend on certificate
selection.

Require at least AWS-LC 1.22.0.
2024-06-13 19:36:40 +02:00
William Lallemand
935b3bd1b7 Revert "CI: github: upgrade aws-lc to 1.29.0"
This reverts commit 6e986e7493ad2aa0c5a11c59d1235b03c02ef71c.
2024-06-13 17:14:58 +02:00
William Lallemand
6e986e7493 CI: github: upgrade aws-lc to 1.29.0
Upgrade aws-lc to 1.29.0 on the push CI.
2024-06-13 17:11:04 +02:00
William Lallemand
5149cc4990 BUILD: ssl: fix build with wolfSSL
Fix build with wolfSSL, broken since the reorg in src/ssl_clienthello.c
2024-06-13 17:01:45 +02:00
William Lallemand
4ced880d22 REORG: ssl: move the SNI selection code in ssl_clienthello.c
Move the code which is used to select the final certificate with the
clienthello callback. ssl_sock_client_sni_pool needs to be exposed
outside ssl_sock.c
2024-06-13 16:48:17 +02:00
William Lallemand
fc7c5d892b MINOR: ssl: add ssl_sock_bind_verifycbk() in ssl_sock.h
Add missing ssl_sock_bind_verifycbk() in ssl_sock.h
2024-06-13 16:48:17 +02:00
Aurelien DARRAGON
bcad26c814 REGTESTS: log: add a test for log-profile
Try to cover some common use-cases for "log-profile" feature. The tests
mainly focus on log-profile section declaration, and testing the behavior
of logformat / log-tag overriding capabilities.

For now, the use of log-profiles is somewhat limited because we lack
the ability to explicitly trigger the log building process at specific
steps during the stream handling. Indeed, for now we rely on
"option logasap" and proxy log-format string content "hacks" to force
the log emission at some specific steps, thus more tests should be added
over the time, when new mechanisms allowing the emission of logs at
expected processing steps will be added, or if new keywords are added to
the log-profile section.

This test requires versions >= 3.0-dev1
2024-06-13 15:43:10 +02:00
Aurelien DARRAGON
8fa4036dae DOC: config: add documentation for log profiles
Now that log-profile parsing logic has been implemented in "MINOR: log:
add log-profile parsing logic" and is actually effective since "MEDIUM:
log: handle log-profile in process_send_log()", let's document the feature
and add some examples.

Log-profile section is declared like this:

  log-profile myprof
    log-tag "custom-tag"

    on error format "%ci: error"
    on any format "(custom httplog) ${HAPROXY_HTTP_LOG_FMT}" sd "[exampleSDID@1234 step=\"accept\" id=\"%ID\"]"

(check out the documentation for the full list of options, some options
are only relevant under specific contexts)

And used this way (from usual "log" directive lines):

  global
    log stdout format rfc5424 profile myprof local0
                              --------------

For now, the use of log-profiles is somewhat limited because we lack
the ability to explicitly trigger the log building process at specific
steps during the stream handling, but it should gain more traction over
the time as the feature evolves and new mechanisms allowing the emission
of logs at expected processing steps will be added.

It should partially fix GH #401
2024-06-13 15:43:10 +02:00
Aurelien DARRAGON
cc6fd2646b MEDIUM: log: handle log-profile in process_send_log()
In previous commit we implemented log-profile parsing logic. Now let's
actually make use of available log-profile information from logger struct
to decide whether we need to rebuild the logline under process_send_log()
according to log profile settings. Nothing is done if the logger didn't
specify a log-profile.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
48d34b98e4 MINOR: log: add log profile buildlines
Now that we have log-profile parsing done, let's prepare for runtime
log-profile handling by adding the necessary string buffer required to
re-build log strings using sess_build_logline() on the fly without
altering regular loglines content.

Indeed, since a different log-profile may (or may not) be specified for
each logger, we must keep the original string and only rebuild a custom
one when required for the current logger (according to the selected log-
profile).
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
15e9c7da6b MINOR: log: add log-profile parsing logic
This patch implements prerequisite log-profile struct and parser logic.
It has no effect during runtime for now.

Logformat expressions provided in log-profile "steps" are postchecked
during postparsing for each proxy "log" directive that makes use of a
given profile. (this allows to ensure that the logformat expressions
used in the profile are compatible with proxy using them)
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
33f3bec7ee MINOR: log: add logger flags
Logger struct may benefit from having a "flags" struct member to set
or remove different logger states. For that, we reuse an existing
4 bytes hole in the logger struct to store a 2 bytes flags integer,
leaving the struct with a 2-bytes hole now.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
a6e38465fb MINOR: log: add __send_log_set_metadata_sd helper
Extract sd metadata assignment in __send_log() to make an inline helper
function out of it in order to be able to use it from other functions if
needed.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
3102c89dde MINOR: log: provide proxy context to resolve_logger()
Prerequisite work for log-profiles, we need to know under which proxy
context the logger is being used. When the info is not available, (ie:
global section or log-forward section, <px> is set to NULL)
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
42139fa16e MINOR: log: make resolve_logger() static
There is no need to expose this internal function, let's make it static.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
db47471155 MINOR: log/backend: always free parsing hints in resolve_logger()
Since resolve_logger() always resolves logger target (even when error
occurs), we must take care of freeing parsing hints because free_logger()
won't try to do it if target RESOLVED flag is set on the target.

This isn't considered as a bug because resolve_logger(), being a
postparsing check, will make haproxy immediately exit upon fatal error
in haproxy.c, but it's better to ensure that everything will be properly
freed if we decide to perform a clean exit upon postparsing checks error
in the future.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
2a1bf99923 CLEANUP: log: remove ambiguous legacy comment for resolve_logger()
It is no longer relevant to say that <logger> is used for implicit
settings. In fact the function resolves <logger>, but currently
mainly focuses on the logger's target. However we could extend the
function to perform additional work on the logger itself in the future.

Let's adjust the comment to prevent any confusion.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
8f34320e15 MINOR: log: provide log origin in logformat expressions using '%OG'
'%OG' logformat alias may be used to report the log origin (when/where)
that triggered log generation using sess_build_logline().

Possible values are:
  - "sess_error": log was generated during session error handling
  - "sess_killed": log was generated during session abortion (killed
    embryonic session)
  - "txn_accept": log was generated right after frontend conn was accepted
  - "txn_request": log was generated after client request was received
  - "txn_connect": log was generated after backend connection establishment
  - "txn_response": log was generated during server response handling
  - "txn_close": log was generated at the final txn step, before closing
  - "unspec": unknown or not specified

Documentation was updated.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
b52862d401 MINOR: log: add log_orig_to_str() function
Get human readable string from log_orig enum members.
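
A minimal sketch of such a helper, using the string values that '%OG' reports. The enum member names and the `_sketch` suffixes are hypothetical; only the returned strings come from the commit messages.

```c
/* Hypothetical enum: member names are illustrative only. */
enum log_orig_sketch {
    LO_UNSPEC = 0,
    LO_SESS_ERROR,
    LO_SESS_KILLED,
    LO_TXN_ACCEPT,
    LO_TXN_REQUEST,
    LO_TXN_CONNECT,
    LO_TXN_RESPONSE,
    LO_TXN_CLOSE
};

/* Map a log origin to its human readable name. */
const char *log_orig_to_str_sketch(enum log_orig_sketch orig)
{
    switch (orig) {
    case LO_SESS_ERROR:   return "sess_error";
    case LO_SESS_KILLED:  return "sess_killed";
    case LO_TXN_ACCEPT:   return "txn_accept";
    case LO_TXN_REQUEST:  return "txn_request";
    case LO_TXN_CONNECT:  return "txn_connect";
    case LO_TXN_RESPONSE: return "txn_response";
    case LO_TXN_CLOSE:    return "txn_close";
    case LO_UNSPEC:       break;
    }
    /* unknown or not specified */
    return "unspec";
}
```

Falling back to "unspec" after the switch keeps the function total even if an out-of-range value is passed.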
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
2a91bd52ad MINOR: log: provide sending log context to process_send_log() when available
This is another prerequisite work in preparation for log-profiles: in this
patch we make process_send_log() aware of the log origin, primarily aiming
for sess and txn logging steps such as error, accept, connect, close, as
well as relevant sess and stream pointers.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
0b7a5a64eb MEDIUM: log/session: handle embryonic session log within sess_log()
Move the embryonic session logging logic down to sess_log() in preparation
for log-profiles because then log preferences will be set per logger and
not per proxy. Indeed, as each logger may come with its own log-profile
that possibly overrides proxy logformat preferences, the check will need
to be performed at a central place by lower sending functions.

To ensure the change doesn't break existing behavior, a dedicated
sess_log_embryonic() wrapper was added and is exclusively used by
session_kill_embryonic() to indicate that a special logging logic must
be performed under sess_log().

Also, thanks to this change, log-format-sd will now be taken into account
for legacy embryonic session logging.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
79a0a7b4d8 MINOR: session: expose session_embryonic_build_legacy_err() function
Rename session_build_err_string() to session_embryonic_build_legacy_err()
and add new <out> buffer argument to the prototype. <out> will be used as
destination for the generated string instead of implicitly relying on the
trash buffer. Finally, expose the new function through the header file so
that it becomes usable from any source file.

The function is expected to be called with a session originating from
a connection and should not be used for applets.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
ee288a4eef REORG: log: reorder send log helpers by dependency order
This commit looks messy, but all it does is reorganize send_log() helpers
by dependency order to remove the need of forward-declaring some of them.

Also, since they're all internal helpers, let's explicitly mark them as
static to prevent any misuse.
2024-06-13 15:43:09 +02:00
Aurelien DARRAGON
cf913c2f90 DOC: management: rename show stats domain cli "dns" to "resolvers"
In commit f8642ee82 ("MEDIUM: resolvers: rename dns extra counters to
resolvers extra counters"), we renamed "dns" counters to "resolvers", but
we forgot to update the documentation accordingly.

This may be backported to all stable versions.
2024-06-13 15:43:09 +02:00
Valentine Krasnobaeva
61d66a3d06 DOC/MINOR: management: add -dZ option
Add some description for the missing -dZ command line option in
the "3. Starting HAProxy" chapter.

Needs to be backported down to 2.9.
2024-06-12 18:21:21 +02:00
Valentine Krasnobaeva
27623d8393 DOC/MINOR: management: add missed -dR and -dv options
Add a description for the missing -dR and -dv command line options in
the "3. Starting HAProxy" chapter.

Needs to be backported to every stable version.
2024-06-12 18:20:41 +02:00
Amaury Denoyelle
cdfceb10ae MINOR: quic: refactor qc_prep_pkts() loop
qc_prep_pkts() is built around a double loop. The outer loop iterates
over every QEL instance registered for sending. The inner loop
repeatedly calls qc_build_pkt() with a QEL instance. If the QEL
instance has no more data to send, the next QEL entry is selected. The
loop can also be interrupted earlier if there is not enough room in the
send buffer.

Clarify the inner loop by calling qc_may_build_pkt() directly in its
condition, alongside the check on the room left in the buffer. This
function tests whether the QEL instance has something to send.

This should simplify future changes to the send path, in particular the
GSO implementation.
2024-06-12 18:05:40 +02:00
Amaury Denoyelle
ba00431625 MINOR: quic: use global datagram headlen definition
Each emitted QUIC datagram is prefixed by an out-of-band header. This
header specifies the datagram length and the pointer to the first QUIC
packet instance. This header length is defined via QUIC_DGRAM_HEADLEN.

Replace every occurrence of a manually calculated header length with the
globally defined QUIC_DGRAM_HEADLEN. This should ease code maintenance
and simplify the GSO implementation.
2024-06-12 18:05:40 +02:00
Amaury Denoyelle
88681681cc MINOR: quic: refactor qc_build_pkt() error handling
qc_build_pkt() error handling was difficult due to the multiple possible
error codes. Improve this by defining a proper enum to describe the
various error codes. Also clean up ending labels inside qc_build_pkt().
2024-06-12 18:05:40 +02:00
Amaury Denoyelle
ab37b86921 OPTIM: quic: fill whole Tx buffer if needed
Previously, packet encoding was stopped as soon as the room left in the
buffer was less than the UDP MTU. This is suboptimal if the next packet
would be smaller than that.

To improve this, only check that there is at least enough room for the
mandatory packet header. qc_build_pkt() is thus responsible for returning
QC_BUILD_PKT_ERR_BUFROOM as soon as the room left in the buffer is
insufficient, which stops packet encoding. An extra check is added to
ensure the end pointer never exceeds the buffer end.

This should not have any significant impact on performance. However,
it makes the intention of the code clearer.
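
The loop condition described above can be sketched as follows. This is a
simplified illustration, not the actual haproxy code: only the
QC_BUILD_PKT_ERR_BUFROOM name comes from the message above, while
SKETCH_PKT_HDR_LEN, sketch_build_pkt() and sketch_fill_buffer() are
hypothetical names, and the header size is an arbitrary assumption.

```c
#include <stddef.h>

/* Only QC_BUILD_PKT_ERR_BUFROOM is taken from the commit message; the
 * other names and the header size are assumptions for this sketch. */
enum sketch_build_err { SKETCH_BUILD_OK = 0, QC_BUILD_PKT_ERR_BUFROOM };

#define SKETCH_PKT_HDR_LEN 16 /* assumed size of the mandatory header */

/* Encode one packet of <payload> bytes at <*pos>. Refuse to go past <end>:
 * this is the extra check keeping the end pointer inside the buffer. */
static enum sketch_build_err sketch_build_pkt(unsigned char **pos,
                                              const unsigned char *end,
                                              size_t payload)
{
    size_t need = SKETCH_PKT_HDR_LEN + payload;

    if ((size_t)(end - *pos) < need)
        return QC_BUILD_PKT_ERR_BUFROOM;
    *pos += need;
    return SKETCH_BUILD_OK;
}

/* Keep building while at least a header fits, instead of stopping as soon
 * as less than a full MTU is left. Returns the number of packets built. */
static int sketch_fill_buffer(unsigned char *buf, size_t size,
                              const size_t *payloads, int count)
{
    unsigned char *pos = buf;
    const unsigned char *end = buf + size;
    int built = 0;

    while (built < count && (size_t)(end - pos) >= SKETCH_PKT_HDR_LEN) {
        if (sketch_build_pkt(&pos, end, payloads[built]) != SKETCH_BUILD_OK)
            break;
        built++;
    }
    return built;
}
```

With a 100-byte buffer and payloads of 50, 10 and 10 bytes, the second
packet still fits in the 34 bytes left after the first one, whereas a check
against a larger MTU value would have stopped after a single packet.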
2024-06-12 18:05:40 +02:00
Amaury Denoyelle
a60609f1aa BUG/MINOR: quic: fix padding of INITIAL packets
The sending API has been extended to support emission on more than 2 QEL
instances. However, this has rendered PADDING emission for INITIAL
packets less predictable. Indeed, if qc_send() is used with empty QEL
instances, a padding frame may be generated before handling the last QEL
registered, which could cause unnecessary padding to be emitted.

This commit simplifies PADDING by only activating it for the last QEL
registered. This ensures that no superfluous padding is generated: if
the minimal INITIAL datagram length is reached, padding is reset
before handling the last QEL instance.

This bug is labelled as minor as haproxy already emits big enough INITIAL
packets coalesced with HANDSHAKE ones without needing padding. This
however renders the padding code difficult to test. Thus, it may be
useful to force emission on the INITIAL QEL only, without coalescing a
HANDSHAKE packet. Here is a sample to reproduce it:

--- a/src/quic_conn.c
+++ b/src/quic_conn.c
@@ -794,6 +794,14 @@ struct task *quic_conn_io_cb(struct task *t, void *context, unsigned int state)
                }
        }

+       if (qc->iel && qel_need_sending(qc->iel, qc)) {
+               struct list empty = LIST_HEAD_INIT(empty);
+               qel_register_send(&send_list, qc->iel, &qc->iel->pktns->tx.frms);
+               if (qc->hel)
+                       qel_register_send(&send_list, qc->hel, &empty);
+               qc_send(qc, 0, &send_list);
+       }
+
        /* Insert each QEL into sending list if needed. */
        list_for_each_entry(qel, &qc->qel_list, list) {
                if (qel_need_sending(qel, qc))

This should be backported up to 3.0.
2024-06-12 18:05:40 +02:00
Christopher Faulet
0e09cce0fd BUG/MAJOR: mux-h1: Prevent any UAF on H1 connection after draining a request
Since 2.9, it is possible to drain the request payload from the H1
multiplexer in case of early reply. When this happens, the upper stream is
detached but the H1 stream is not destroyed. Once the whole request is
drained, the end of the detach stage is finished. So the H1 stream is
destroyed and the H1 connection is ready to be reused, if possible,
otherwise it is released.

And here is the issue. If some data of the next request are received with
last bytes of the drained one, parsing of the next request is immediately
started. The previous H1 stream is destroyed and a new one is created to
handle the parsing.  At this stage the H1 connection may be released, for
instance because of a parsing error. This case was not properly handled.
Instead of immediately exiting the mux, it was still possible to access the
released H1 connection to refresh its timeouts, leading to a UAF issue.

Many thanks to Annika for her invaluable help on this issue.

The patch should fix the issue #2602. It must be backported as far as 2.9.
2024-06-12 16:12:47 +02:00
William Lallemand
82a4dd7df6 DOC: internals: add a documentation about the master worker
Add documentation about the history of the master-worker, how it was
implemented in its first version, and how it currently works.
This is a global view of the architecture, and not an exhaustive
explanation of all mechanisms.
2024-06-12 14:46:05 +02:00
Christopher Faulet
91fe085943 BUG/MINOR: promex: Skip resolvers metrics when there is no resolver section
By default, there is always at least one resolver section, the default one,
based on "/etc/resolv.conf" content. However, it is possible to have no
resolver at all if the file is empty or if any error occurred. Errors are
silently ignored at this stage.

In that case, there was a bug in the Prometheus exporter leading to a crash
because the resolver section list is empty. An invalid resolver entity was
used. To fix the issue, simply avoid dumping resolvers metrics when there
is no resolver.

Thanks to Aurelien for having spotted the offending commit.

This patch should fix the issue #2604. It must be backported to 3.0.
2024-06-12 08:55:52 +02:00
Aurelien DARRAGON
c157894ba9 DOC: config: add missing context hint for new server and proxy keywords
To stay consistent with the work started in 54627f991 ("DOC: config: add
context hint for proxy keywords") and 3d4e1e682 ("DOC: config: add context
hint for server keywords"), we add missing context hint for "guid" (both
proxy and server) keyword and "hash-key" server keyword that were added
during 3.0 development.

This may be backported in 3.0.
2024-06-11 17:03:02 +02:00
Aurelien DARRAGON
aec02320bd DOC: config: add missing section hint for "guid" proxy keyword
"guid" proxy keyword added in da754b45 ("MINOR: proxy: implement GUID
support") was lacking the section hint in the keyword description, let's
fix that.

It could be backported in 3.0 with da754b45.
2024-06-11 17:02:55 +02:00
Aurelien DARRAGON
cdf1d20e8a DOC: config: move "hash-key" from proxy to server options
As reported by Ashley Morris, "hash-key" keyword which was introduced in
commit faa8c3e0 ("MEDIUM: lb-chash: Deterministic node hashes based on
server address") doesn't belong to proxy keywords and should be found in
5.2 "Server and default-server options" instead.

It should be backported in 3.0 with faa8c3e0
2024-06-11 17:02:50 +02:00
Aurelien DARRAGON
c6931a4f01 CLEANUP: log/proxy: fix comment in proxy_free_common()
Thanks to the previous commit, logformat expressions for default proxies
are also postchecked. Adjust a comment that suggested it was not the case.
2024-06-11 11:00:11 +02:00
Aurelien DARRAGON
e4f122f3f4 BUG/MEDIUM: log: fix lf_expr_postcheck() behavior with default section
Since 7a21c3a4ef ("MAJOR: log: implement proper postparsing for logformat
expressions"), logformat expressions stored in a default section are not
postchecked anymore. This is because the REGISTER_POST_PROXY_CHECK() only
evaluates regular proxies. Because of this, proxy options which are
automatically enabled on the proxy depending on the logformat expression
features in use are not set on the default proxy, which means such options
are not passed to the regular proxies that inherit from it (the proxies
that will actually be running the logformat expression during runtime).

Because of that, a logformat expression stored inside a default section
and executed by a regular proxy may not behave properly. Also, since
03ca16f38b ("OPTIM: log: resolve logformat options during postparsing"),
it's even worse because logformat node options postresolving is also
skipped, which may also alter the logformat expression encoding feature.

To fix the issue, let's add a special case for default proxies in
parse_logformat_string() and lf_expr_postcheck() so that default proxies
are postchecked on the fly during parsing time in a "relaxed" way as we
cannot assume that the features involved in the logformat expression won't
be compatible with the proxy actually running it since we may have
different types of proxies inheriting from the same default section.

This bug was discovered while trying to address GH #2597.

It should be backported to 3.0 with 7a21c3a4ef and 03ca16f38b.
2024-06-11 11:00:05 +02:00
Aurelien DARRAGON
cbc8e1394d MINOR: log: change wording in lf_expr_postcheck() error message
logformat_node was referenced as "node" in the error message reported
to the user, but in fact it is referred to as "item" in user
documentation. Using "item" in the error message to better comply with
the doc.

Error message was introduced with 7a21c3a4ef ("MAJOR: log: implement
proper postparsing for logformat expressions")
2024-06-11 10:59:58 +02:00
Aurelien DARRAGON
318c290ff2 BUG/MEDIUM: proxy: fix UAF with {tcp,http}checks logformat expressions
When parsing a logformat expression using parse_logformat_string(), the
caller passes the proxy under which the expression is found as argument.

This information allows the logformat expression API to check if the
expression is compatible with the proxy settings.

Since 7a21c3a ("MAJOR: log: implement proper postparsing for logformat
expressions"), the proxy compatibility checks are postponed until after the
proxy is fully parsed to ensure proxy properties are fully resolved for
check consistency.

The way it works, is that each time parse_logformat_string() is called for
a given expression and proxy, it schedules the expression for postchecking
by appending the expression to the list of pending expression checks on
the proxy (lf_checks struct). Then, when the proxy is called with the
REGISTER_POST_PROXY_CHECK() hook, it iterates over unchecked expressions
and performs the check, then it removes the expression from its list.

However, I overlooked a special case: if a logformat expression is used
on a disabled proxy or a default proxy, the
REGISTER_POST_PROXY_CHECK() hook is never called. Because of that, lf
expressions may still point to the proxy after the proxy is freed.

For most logformat expressions, this isn't an issue because they are
stored within the proxy itself, but this isn't the case with
{tcp,http}checks logformat expressions: during deinit() sequence, all
proxies are first cleaned up, and only then shared checks are freed.

Because of that, the below config will trigger UAF since 7a21c3a:

uaf.conf:
  listen dummy
    bind localhost:2222

  backend testback
    disabled
    mode http
    option httpchk
    http-check send hdr test "test"
    http-check expect status 200

haproxy -f uaf.conf -c:

==152096== Invalid write of size 8
==152096==    at 0x21C317: lf_expr_deinit (log.c:3491)
==152096==    by 0x2334A3: free_tcpcheck_http_hdr (tcpcheck.c:84)
==152096==    by 0x2334A3: free_tcpcheck_http_hdr (tcpcheck.c:79)
==152096==    by 0x2334A3: free_tcpcheck_http_hdrs (tcpcheck.c:98)
==152096==    by 0x23365A: free_tcpcheck.part.0 (tcpcheck.c:130)
==152096==    by 0x2338B1: free_tcpcheck (tcpcheck.c:108)
==152096==    by 0x2338B1: deinit_tcpchecks (tcpcheck.c:3780)
==152096==    by 0x2CF9A4: deinit (haproxy.c:2949)
==152096==    by 0x2D0065: deinit_and_exit (haproxy.c:3052)
==152096==    by 0x169BC0: main (haproxy.c:3996)
==152096==  Address 0x52a8df8 is 6,968 bytes inside a block of size 7,168 free'd
==152096==    at 0x484B27F: free (vg_replace_malloc.c:872)
==152096==    by 0x2CF8AD: deinit (haproxy.c:2906)
==152096==    by 0x2D0065: deinit_and_exit (haproxy.c:3052)
==152096==    by 0x169BC0: main (haproxy.c:3996)

To fix the issue, let's ensure in proxy_free_common() that no unchecked
expressions may still point to the proxy after the proxy is freed by
purging the list (DEL_INIT is used to reset list items).
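
The purge described above can be illustrated with a minimal
circular doubly-linked list. This is only a sketch: apart from the
lf_checks name mentioned in this message, every identifier below
(sketch_list, sketch_del_init(), the struct layouts) is hypothetical and
does not reproduce haproxy's list API.

```c
#include <stddef.h>

/* Minimal circular doubly-linked list, standing in for haproxy's lists. */
struct sketch_list { struct sketch_list *n, *p; };

static void sketch_list_init(struct sketch_list *l) { l->n = l->p = l; }

static int sketch_list_is_empty(const struct sketch_list *l)
{
    return l->n == l;
}

static void sketch_list_append(struct sketch_list *head, struct sketch_list *el)
{
    el->p = head->p;
    el->n = head;
    head->p->n = el;
    head->p = el;
}

/* Unlink <el> and reset it so it points to itself: it no longer keeps any
 * pointer into the list head it was attached to. */
static void sketch_del_init(struct sketch_list *el)
{
    el->n->p = el->p;
    el->p->n = el->n;
    sketch_list_init(el);
}

/* Hypothetical layouts: an expression is queued on its proxy for a
 * deferred check through an embedded list element. */
struct sketch_lf_expr { struct sketch_list pending; };
struct sketch_proxy   { struct sketch_list lf_checks; };

/* Before the proxy memory is released, detach every pending expression so
 * that none of them still references the soon-to-be-freed list head. */
static void sketch_purge_lf_checks(struct sketch_proxy *px)
{
    while (!sketch_list_is_empty(&px->lf_checks))
        sketch_del_init(px->lf_checks.n);
}
```

After the purge, expressions that outlive the proxy (such as those owned by
shared checks) hold self-referencing list elements, so cleaning them up
later does not write into freed proxy memory.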

Special thanks to GH user @mhameed who filed a comprehensive issue with
all the relevant information required to reproduce the bug (see GH #2597),
after having first reported the issue on the alpine project bug tracker.
2024-06-11 10:59:52 +02:00
Aurelien DARRAGON
005e4ba715 MINOR: proxy: add proxy_free_common() helper function
As shown by previous patch series, having to free some common proxy
struct members twice (in free_proxy() and proxy_free_defaults()) is
error-prone: we often overlook one of the two free locations when
adding new features.

To prevent such bugs from being introduced in the future, and also avoid
code duplication, we now have a proxy_free_common() function to free all
proxy struct members that are common to all proxy types (either regular or
default ones).
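
The idea can be sketched as below. The struct and function names carry a
"sketch" suffix because they are hypothetical and heavily simplified; only
the overall shape (one shared helper called from both free paths) reflects
what this message describes.

```c
#include <stdlib.h>

/* Hypothetical, reduced proxy: two members shared by regular and default
 * proxies, and one member assumed to exist only for regular proxies. */
struct proxy_sketch {
    char *log_tag;
    char *dyncookie_key;
    char *regular_only;
};

static void free_and_null(char **p)
{
    free(*p);
    *p = NULL;
}

/* Single place where the shared members are released. */
static void proxy_free_common_sketch(struct proxy_sketch *px)
{
    free_and_null(&px->log_tag);
    free_and_null(&px->dyncookie_key);
}

/* Default proxies: only the shared members need to be released here. */
static void proxy_free_defaults_sketch(struct proxy_sketch *px)
{
    proxy_free_common_sketch(px);
}

/* Regular proxies: shared members first, then the specific ones. */
static void free_proxy_sketch(struct proxy_sketch *px)
{
    proxy_free_common_sketch(px);
    free_and_null(&px->regular_only);
}
```

A new shared member then only needs one extra line in the common helper,
instead of one line in each of the two free functions.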

This should greatly improve code maintenance related to proxy freeing
logic.
2024-06-11 10:59:45 +02:00
Aurelien DARRAGON
847c406b9a BUG/MINOR: proxy: fix header_unique_id leak on deinit()
proxy header_unique_id wasn't cleaned up in proxy_free_defaults(),
resulting in small memory leak if "unique-id-header" was used on a
default proxy section.

It may be backported to all stable versions.
2024-06-11 10:59:39 +02:00
Aurelien DARRAGON
1aa219078d BUG/MINOR: proxy: fix source interface and usesrc leaks on deinit()
proxy conn_src.iface_name was only freed in proxy_free_defaults(), whereas
proxy conn_src.bind_hdr_name was only freed in free_proxy().

Because of that, using "source usesrc hdr_ip()" in a default proxy, or
"source interface" in a regular or default proxy would cause memory leaks
during deinit.

It may be backported to all stable versions.
2024-06-11 10:59:33 +02:00
Aurelien DARRAGON
6f53df3fcf BUG/MINOR: proxy: fix dyncookie_key leak on deinit()
proxy dyncookie_key wasn't cleaned up in free_proxy(), resulting in small
memory leak if "dynamic-cookie-key" was used on a regular or default
proxy.

It may be backported to all stable versions.
2024-06-11 10:59:27 +02:00
Aurelien DARRAGON
62d0465a96 BUG/MINOR: proxy: fix check_{command,path} leak on deinit()
proxy check_{command,path} members (used for "external-check" feature)
weren't cleaned up in free_proxy(), resulting in small memory leak if
"external-check command" or "external-check path" were used on a regular
or default proxy.

It may be backported to all stable versions.
2024-06-11 10:59:20 +02:00
Aurelien DARRAGON
fa90a7d313 BUG/MINOR: proxy: fix email-alert leak on deinit()
proxy email-alert settings weren't cleaned up in free_proxy(), resulting
in small memory leak if "email-alert to" or "email-alert from" were used
on a regular or default proxy.

It may be backported to all stable versions.
2024-06-11 10:59:15 +02:00
Aurelien DARRAGON
77b192ea36 BUG/MINOR: proxy: fix log_tag leak on deinit()
proxy log_tag wasn't cleaned up in free_proxy(), resulting in small
memory leak if "log-tag" was used on a regular or default proxy.

It may be backported to all stable versions.
2024-06-11 10:59:08 +02:00
Aurelien DARRAGON
99f3409582 BUG/MINOR: proxy: fix server_id_hdr_name leak on deinit()
proxy server_id_hdr_name member (used for "http-send-name-header" option)
wasn't cleaned up in free_proxy(), resulting in small memory leak if
"http-send-name-header" was used on a regular or default proxy.

This may be backported to all stable versions.
2024-06-11 10:59:02 +02:00
Aurelien DARRAGON
e5ccfda9d3 MINOR: log: fix "http-send-name-header" ignore warning message
Warning message to indicate that the "http-send-name-header" option is
ignored for backend in "mode log" was referenced using its internal
struct wording instead of public name (as seen in the documentation).

Let's fix that.

It may be backported with c7783fb ("MINOR: log/backend: prevent
"http-send-name-header" use with LOG mode") in 2.9.
2024-06-11 10:58:55 +02:00
William Lallemand
7acdc3f6ff DOC: install: remove boringssl from the list of supported libraries
BoringSSL support has been known to be broken since 2021; it was removed
from the CI at that time and never fixed.
(30ee2965b66f20a2649323ca36029bf2440e34b9)

Even the QUIC code for boringSSL was removed in 2022.
(e06f7459faf36f5f63092cb6ce89d281dfc4ee6a)
2024-06-10 18:54:28 +02:00
Christopher Faulet
7bff576ebb BUG/MINOR: mux-h1: Use the right variable to set NEGO_FF_FL_EXACT_SIZE flag
Instead of being set on the flags used for the zero-copy negotiation,
this flag was set on the connection flags used for the xprt->rcv_buf()
call. Fortunately, there is no real consequence. The only visible effect is
that the chunk size is written on 8 bytes for no reason.

This patch is related to issue #2598. It must be backported to 3.0.
2024-06-10 14:06:35 +02:00
Christopher Faulet
e8cc8a60be BUG/MAJOR: mux-h1: Properly copy chunked input data during zero-copy nego
When data are transferred via zero-copy data forwarding, if some data were
already received, we try to immediately transfer them during the negotiation
step. If data are chunked and the chunk size is unknown, 10 bytes are reserved
to write the chunk size during the done step. However, when input data are
finally transferred, the offset is ignored. Data are copied into the output
buffer, but the first 10 bytes are then overwritten by the chunk size. Thus
the chunk is truncated, leading to a malformed message.
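
A reduced illustration of the offset handling is given below. It is a
sketch under stated assumptions, not the mux-h1 code: the function names
are hypothetical, and the 10 reserved bytes are assumed to hold 8
hexadecimal digits followed by CRLF.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_CHUNK_RESERVE 10 /* reserved bytes, as stated in the message */

/* Copy already-received input AFTER the reserved area. Returns the number
 * of bytes used in <out>, or 0 if the data does not fit. */
static size_t sketch_copy_after_reserve(char *out, size_t out_size,
                                        const char *in, size_t len)
{
    if (out_size < SKETCH_CHUNK_RESERVE ||
        len > out_size - SKETCH_CHUNK_RESERVE)
        return 0;
    /* Honoring the offset keeps the payload clear of the reserved bytes. */
    memcpy(out + SKETCH_CHUNK_RESERVE, in, len);
    return SKETCH_CHUNK_RESERVE + len;
}

/* Done step: fill the reserved area with the chunk size. The layout
 * (8 hex digits + CRLF) is an assumption made for this sketch. */
static void sketch_write_chunk_size(char *out, size_t len)
{
    static const char hex[] = "0123456789ABCDEF";
    int i;

    for (i = 7; i >= 0; i--) {
        out[i] = hex[len & 0xF];
        len >>= 4;
    }
    out[8] = '\r';
    out[9] = '\n';
}
```

Had the payload been copied at offset 0 instead, writing the chunk size
afterwards would have overwritten its first 10 bytes, which is the
truncation described above.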

This patch should fix the issue #2598. It must be backported to 3.0.
2024-06-10 14:06:35 +02:00
William Manley
52eb6b23f8 BUG/MEDIUM: stconn/mux-h1: Fix suspect change causing timeouts
This fixes an issue I've had where if a connection was idle for ~23s
it would get in a bad state.  I don't understand this code, so I'm
not sure exactly why it was failing.

I discovered this by bisecting to identify the commit that caused the
regression between 2.9 and 3.0.  The commit is
d2c3f8dde7c2474616c0ea51234e6ba9433a4bc1: "MINOR: stconn/connection:
Move shut modes at the SE descriptor level" - a part of v3.0-dev8.
It seems to be an innocent renaming, so I looked through it and this
stood out as suspect:

    -        if (mode != CO_SHW_NORMAL)
    +        if (mode & SE_SHW_NORMAL)

It looks like the not went missing here, so this patch reverses that
condition.  It fixes my test.
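
The difference between the two conditions can be sketched as follows. The
flag values are assumptions chosen for illustration and the function names
are hypothetical; only the shape of the two tests comes from the diff
quoted above.

```c
/* Assumed bit values, for illustration only. */
enum {
    SKETCH_SHW_NORMAL = 0x04,
    SKETCH_SHW_SILENT = 0x08
};

/* Condition after the rename: true when the mode IS a normal shutdown. */
static int sketch_regressed_cond(unsigned int mode)
{
    return (mode & SKETCH_SHW_NORMAL) != 0;
}

/* Bit-flag counterpart of the former "mode != CO_SHW_NORMAL" test: true
 * when the mode is NOT a normal shutdown. */
static int sketch_restored_cond(unsigned int mode)
{
    return !(mode & SKETCH_SHW_NORMAL);
}
```

For any mode, the two functions return opposite results, which is why the
branch was taken in exactly the wrong cases.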

I don't quite understand what this is doing or is for so I can't write
a regression test or decent commit message.  Hopefully someone else
will be able to pick this up from where I've left it.

[CF: The condition to perform clean shutdowns was inverted. This means no
     clean shutdown was performed when it should have been. This patch must
     be backported to 3.0]
2024-06-10 14:06:35 +02:00
Amaury Denoyelle
0ef94e2dff BUG/MINOR: quic: ensure Tx buf is always purged
quic_conn API for sending was recently refactored. The main objective
was to regroup the different functions present for both handshake and
application emission.

After this refactoring, an optimization was introduced to avoid calling
qc_send() if there was nothing new to emit. However, this prevents the Tx
buffer from being purged if a previous send was interrupted, until new
frames are finally available.

To fix this, simply remove the optimization. qc_send() is thus now
always called in quic_conn IO handlers.

The impact of this bug should be minimal as it happens only on a
temporary sending error. However, in this case, it could cause extra
latency or even a complete sending freeze in the worst scenario.

This must be backported up to 3.0.
2024-06-10 10:29:28 +02:00
Amaury Denoyelle
50470a5181 BUG/MINOR: quic: fix computed length of emitted STREAM frames
qc_build_frms() is responsible for encoding multiple frames in a single
QUIC packet. It accounts for the room left in the packet buffer for each
newly encoded frame.

An incorrect computation was performed when encoding a STREAM frame in a
single packet. The frame length was accounted twice, which excessively
reduced the packet buffer room. This caused the remaining built frames to
be reduced, with the resulting packet unable to fill the whole MTU.

The impact of this bug should be minimal. It is only present when
multiple frames are encoded in a single packet after a STREAM. However,
in this case the datagrams built are smaller than expected, which is
suboptimal for bandwidth.

This should be backported up to 2.6.
2024-06-10 10:24:02 +02:00
William Lallemand
711338e1ce BUG/MEDIUM: ssl: bad auth selection with TLS1.2 and WolfSSL
The ClientHello callback for WolfSSL introduced in haproxy 2.9 seems
not to behave correctly with TLSv1.2.

In TLSv1.2, the cipher is used to choose the authentication algorithm
(ECDSA or RSA); however, an SSL client can also send a signature algorithm.

In TLSv1.3, the authentication is not part of the ciphersuites, and
is selected using the signature algorithm.

The mistake in the code is that the signature algorithms in TLSv1.2 are
overwriting the auth that was selected using the ciphers.

This must be backported as far as 2.9.
2024-06-07 15:47:15 +02:00
William Lallemand
93cc23a355 BUG/MEDIUM: ssl: wrong priority when limiting ECDSA ciphers in ECDSA+RSA configuration
The ClientHello Callback which is used for certificate selection uses
both the signature algorithms and the ciphers sent by the client.

However, when a client announces both ECDSA and RSA capabilities
with ECDSA ciphers that are not available on the haproxy side and RSA
ciphers that are compatible, the ECDSA certificate will still be used,
but this will result in a "no shared cipher" error instead of a
fallback on the RSA certificate.

For example, a client could send
'ECDHE-ECDSA-AES128-CCM:ECDHE-RSA-AES256-SHA' and HAProxy could be
configured with only 'ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA'.

This patch fixes the issue by validating that at least one ECDSA cipher
is available on both sides before choosing the ECDSA certificate.
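
The validation can be sketched as follows, using the cipher lists from the
example above. This is a simplified illustration with hypothetical names:
the real selection also takes the signature algorithms into account, and
detecting ECDSA ciphers through the "-ECDSA-" substring is an assumption
made for this sketch only.

```c
#include <stddef.h>
#include <string.h>

enum sketch_cert { SKETCH_CERT_RSA, SKETCH_CERT_ECDSA };

/* Assumption for this sketch: OpenSSL-style names carry "-ECDSA-". */
static int sketch_is_ecdsa(const char *cipher)
{
    return strstr(cipher, "-ECDSA-") != NULL;
}

/* Returns 1 if at least one ECDSA cipher is present on both sides. */
static int sketch_has_shared_ecdsa(const char *const *client, size_t nc,
                                   const char *const *server, size_t ns)
{
    size_t i, j;

    for (i = 0; i < nc; i++) {
        if (!sketch_is_ecdsa(client[i]))
            continue;
        for (j = 0; j < ns; j++)
            if (strcmp(client[i], server[j]) == 0)
                return 1;
    }
    return 0;
}

/* Pick the ECDSA certificate only when an ECDSA cipher can actually be
 * negotiated, otherwise fall back on the RSA one. */
static enum sketch_cert sketch_pick_cert(const char *const *client, size_t nc,
                                         const char *const *server, size_t ns)
{
    return sketch_has_shared_ecdsa(client, nc, server, ns)
           ? SKETCH_CERT_ECDSA : SKETCH_CERT_RSA;
}
```

With the two lists from the example, no ECDSA cipher is shared, so the
sketch falls back on the RSA certificate instead of failing with a
"no shared cipher" error.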

This must be backported on all stable versions.
2024-06-05 15:33:36 +02:00
Christopher Faulet
6697e87ae5 MINOR: mux-quic: Don't send an empty H3 DATA frame during zero-copy forwarding
It may only happen when there is no data to forward but a last stream frame
must be sent with the FIN bit. It is not invalid, but it is useless to send
an empty H3 DATA frame in that case.
2024-06-05 07:28:10 +02:00
Christopher Faulet
9748df29ff BUG/MEDIUM: mux-quic: Don't unblock zero-copy fwding if blocked during nego
The previous fix (792a645ec2 ["BUG/MEDIUM: mux-quic: Unblock zero-copy
forwarding if the txbuf can be released"]) introduced a regression. The
zero-copy data forwarding must only be unblocked if it was blocked by the
producer, after a successful negotiation.

It is important because during a negotiation, the consumer may be blocked
for another reason, because of the flow control for instance. In that case,
there is not necessarily a TX buffer, and it is unexpected to try to
release an unallocated TX buf.

In addition, the same may happen while a TX buf is still in-use. In that
case, it must also not be released. So testing the TX buffer is not the
right solution.

To fix the issue, a new IOBUF flag was added (IOBUF_FL_FF_WANT_ROOM). It
must be set by the producer if it is blocked after a successful negotiation
because it needs more room. In that case, we know a buffer was provided by
the consumer. In the done_fastfwd() callback function, it is then possible
to safely unblock the zero-copy data forwarding if this flag is set.

This patch must be backported to 3.0 with the commit above.
2024-06-05 07:28:10 +02:00
Aurelien DARRAGON
2bde0d64dd CLEANUP: hlua: simplify ambiguous lua_insert() usage in hlua_ctx_resume()
'lua_insert(lua->T, -lua_gettop(lua->T))' is actually used to rotate the
top value with the bottom one, thus the code was overkill and the comment
was actually misleading, let's fix that by using explicit equivalent form
(absolute index).
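
The equivalence can be checked with a small model of the Lua stack. This
sketch does not use the Lua API: sketch_abs_index() and
sketch_insert_top_at() are hypothetical helpers that mimic, on a plain int
array, how a negative index is resolved and how lua_insert() moves the top
element.

```c
/* With <top> elements on the stack, a negative index is relative to the
 * top: -1 is the top element and -top is the bottom one (index 1). */
static int sketch_abs_index(int top, int idx)
{
    return idx < 0 ? top + idx + 1 : idx;
}

/* Model of lua_insert(): move the top element to position <idx>, shifting
 * up the elements above that position. stack[0] is Lua index 1. */
static void sketch_insert_top_at(int *stack, int top, int idx)
{
    int pos = sketch_abs_index(top, idx) - 1;
    int v = stack[top - 1];
    int i;

    for (i = top - 1; i > pos; i--)
        stack[i] = stack[i - 1];
    stack[pos] = v;
}
```

Passing -top or 1 as the index resolves to the same slot, so both forms
move the top value to the bottom of the stack; the absolute form simply
states it directly.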

It may be backported with 5508db9a2 ("BUG/MINOR: hlua: fix unsafe
lua_tostring() usage with empty stack") to all stable versions to ease
code maintenance.
2024-06-04 16:31:38 +02:00
Aurelien DARRAGON
755c2daf0f BUG/MINOR: hlua: fix leak in hlua_ckch_set() error path
In hlua_ckch_commit_yield() and hlua_ckch_set(), when an error occurs,
we enter the error path and try to raise an error from the <err> msg
pointer which must be freed afterwards.

However, the fact that luaL_error() never returns was overlooked; because
of that, the <err> msg is never freed in such a case.

To fix the issue, let's use hlua_pushfstring_safe() helper to push the
err on the lua stack and then free it before throwing the error using
lua_error().
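
The leak pattern and its fix can be modelled without the Lua library. In
the sketch below, setjmp()/longjmp() stand in for Lua's non-returning error
mechanism; all names are hypothetical and nothing here is the actual hlua
code.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

static jmp_buf sketch_handler;   /* stands in for a protected call frame */
static char sketch_pushed[64];   /* stands in for the pushed error string */
static int sketch_live_allocs;   /* error messages not yet freed */

/* Never returns to its caller, like lua_error() or luaL_error(). */
static void sketch_raise(void)
{
    longjmp(sketch_handler, 1);
}

/* Buggy shape: the free() placed after the raise is never reached. */
static void sketch_fail_leaky(void)
{
    char *err = malloc(32);

    if (!err)
        sketch_raise();
    sketch_live_allocs++;
    strcpy(err, "commit failed");
    strncpy(sketch_pushed, err, sizeof(sketch_pushed) - 1);
    sketch_raise();
    free(err);                   /* unreachable */
    sketch_live_allocs--;
}

/* Fixed shape: copy the message, free it, and only then raise. */
static void sketch_fail_fixed(void)
{
    char *err = malloc(32);

    if (!err)
        sketch_raise();
    sketch_live_allocs++;
    strcpy(err, "commit failed");
    strncpy(sketch_pushed, err, sizeof(sketch_pushed) - 1);
    free(err);
    sketch_live_allocs--;
    sketch_raise();
}

/* Run <fn> under the handler; returns the number of leaked messages. */
static int sketch_run(void (*fn)(void))
{
    sketch_live_allocs = 0;
    if (setjmp(sketch_handler) == 0)
        fn();
    return sketch_live_allocs;
}
```

The message is still available to the error handler in both cases, but only
the second ordering releases the allocation before control leaves the
function.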

It should be backported up to 2.6 with 30fcca18 ("MINOR: ssl/lua:
CertCache.set() allows to update an SSL certificate file")
2024-06-04 16:31:30 +02:00
Aurelien DARRAGON
2be94c008e CLEANUP: hlua: get rid of hlua_traceback() security checks
Thanks to the previous commit, we may now assume that hlua_traceback()
won't LJMP, so it's safe to use it from unprotected environment without
any precautions.
2024-06-04 16:31:22 +02:00
Aurelien DARRAGON
365ee28510 BUG/MINOR: hlua: prevent LJMP in hlua_traceback()
The function is often used on error paths where no precaution is taken
against LJMP. Since the function is used on error paths (which include
out-of-memory error paths), lua_getinfo() could also raise a memory
exception, causing the process to crash or improper error handling if
the caller isn't prepared for that eventuality. Since the function is
only used on rare events (error handling) and is lacking the __LJMP
prototype prefix, let's make it safe by protecting the lua_getinfo()
call so that hlua_traceback() callers may use it safely now (the function
will always succeed, output will be truncated in case of error).

This could be backported to all stable versions.
2024-06-04 16:31:15 +02:00
Aurelien DARRAGON
f0e5b825cf BUG/MINOR: hlua: fix unsafe hlua_pusherror() usage
Following previous commit's logic: hlua_pusherror() is mainly used
from cleanup paths where the caller isn't protected against LJMPs.

Caller was tempted to think that the function was safe because func
prototype was lacking the __LJMP prefix.

Let's make the function really LJMP-safe by wrapping the sensitive calls
under lua_pcall().

This may be backported to all stable versions.
2024-06-04 16:31:09 +02:00
Aurelien DARRAGON
c0a3c1281f BUG/MINOR: hlua: don't use lua_pushfstring() when we don't expect LJMP
lua_pushfstring() is used in multiple cleanup paths (upon error) to
push the error message that will be raised by lua_error(). However this
is often done from an unprotected environment, or in the middle of a
cleanup sequence, thus we don't want the function to LJMP! (It may cause
various issues ranging from memory leaks to crashing the process.)

Hopefully this has very few chances of happening but since the use of
lua_pushfstring() is limited to error reporting here, it's ok to use our
own hlua_pushfstring_safe() implementation with a little overhead to
ensure that the function will never LJMP.

This could be backported to all stable versions.
2024-06-04 16:31:01 +02:00
Aurelien DARRAGON
6e484996c6 CLEANUP: hlua: use hlua_pusherror() where relevant
In hlua_map_new(), when error occurs we use a combination of luaL_where,
lua_pushfstring and lua_concat to build the error string before calling
lua_error().

It turns out that we already have the hlua_pusherror() macro which is
exactly made for that purpose so let's use it.

It could be backported to all stable versions to ease code maintenance.
2024-06-04 16:30:55 +02:00
Amaury Denoyelle
f7ae84e7d1 BUG/MINOR: quic: prevent crash on qc_kill_conn()
Ensure idle_timer task is allocated in qc_kill_conn() before waking it
up. It can be NULL if idle timer has already fired but MUX layer is
still present, which prevents immediate quic_conn release.

qc_kill_conn() is only used on send() syscall fatal error to notify
upper layer of an error and close the whole connection asap.

This crash occurrence is pretty rare as it relies on timing issues. It
happens only if the idle timer fires before the MUX release (a bigger
client timeout is thus required) and a send() syscall then reports a
fatal error. For now, it was only reproduced using GDB to interrupt
haproxy longer than the idle timeout.

This should be backported up to 2.6.
2024-06-04 14:59:24 +02:00
Christopher Faulet
792a645ec2 BUG/MEDIUM: mux-quic: Unblock zero-copy forwarding if the txbuf can be released
In the done_fastfwd() callback function, if nothing was forwarded while the
SD is blocked, it means there is not enough space in the buffer to proceed.
It may be because there are data to be sent. But it may also be data already
sent and waiting for an ack. In this case, there is no data to be sent by
the mux, so the quic stream is not woken up when data are finally removed
from the buffer. The data forwarding can thus be stuck. This happens when
the stats page is requested in QUIC/H3. Only applets are affected by this
issue, and only with the QUIC multiplexer because it is the only mux with
already sent data in the TX buf.

To fix the issue, the idea is to release the txbuf if possible and then
unblock the SD to perform a new zero-copy data forwarding attempt. Doing so,
and thanks to the previous patch ("MEDIUM: applet: Be able to unblock
zero-copy data forwarding from done_fastfwd"), the applet will be woken up.

This patch should fix the issue #2584. It must be backported to 3.0.
2024-06-04 14:23:40 +02:00
Christopher Faulet
d2a2014f15 MEDIUM: stconn: Be able to unblock zero-copy data forwarding from done_fastfwd
This only concerns applets. When an applet tries to forward data
via an iobuf, it may decide to block for any reason even if there is free
space in the buffer. For instance, the stats applet doesn't produce data if
the buffer is almost full.

However, in this case, it could be good to let the consumer decide a new
attempt is possible because more space was made. So, if IOBUF_FL_FF_BLOCKED
flag is removed by the consumer when done_fastfwd() callback function is
called, the SE_FL_WANT_ROOM flag is removed on the producer sedesc. It is
only done for applets. And thanks to this change, the applet can be woken up
for a new attempt.

This patch is required for a fix on the QUIC multiplexer.
2024-06-04 14:23:40 +02:00
Christopher Faulet
7c84ee71f7 BUG/MEDIUM: h1-htx: Don't state interim responses are bodyless
Interim responses are by definition bodyless. But we must not set the
corresponding HTX start-line flag, because the start-line of the final
response is still expected. Setting the flag above too early may lead the
multiplexer on the sending side to consider the message is finished after
the headers of the interim message.

It happens with the H2 multiplexer on frontend side if a "100-Continue" is
received from the server. The interim response is sent and
HTX_SL_F_BODYLESS_RESP flag is evaluated. Then, the headers of the final
response are sent with ES flag, because HTX_SL_F_BODYLESS_RESP flag was seen
too early, leading to a protocol error if the response has a body.

Thanks to grembo for this analysis.

This patch should fix the issue #2587. It must be backported as far as 2.9.
2024-06-04 14:23:40 +02:00
Ilia Shipitsin
1ef6cdcd26 CI: FreeBSD: upgrade image, packages
FreeBSD-13.2 was removed from cirrus-ci, let's upgrade to 14.0,
also, pcre is EOL, let's switch to pcre2. lua is updated to 5.4
2024-06-04 11:19:00 +02:00
Aurelien DARRAGON
a63f2cde94 CLEANUP: hlua: fix CertCache class comment
CLASS_CERTCACHE is used to declare CertCache global object, not Regex one

This copy-paste typo was introduced in 30fcca18 ("MINOR: ssl/lua:
CertCache.set() allows to update an SSL certificate file")
2024-06-03 17:00:06 +02:00
Aurelien DARRAGON
4f906a9c38 BUG/MINOR: hlua: use CertCache.set() from various hlua contexts
Using CertCache.set() from init context wasn't explicitly supported and
caused the process to crash:

crash.lua:
  core.register_init(function()
    CertCache.set{filename="reg-tests/ssl/set_cafile_client.pem", ocsp=""}
  end)

crash.conf:
  global
    lua-load crash.lua
  listen front
    bind localhost:9090 ssl crt reg-tests/ssl/set_cafile_client.pem ca-file reg-tests/ssl/set_cafile_interCA1.crt verify none

./haproxy -f crash.conf
[NOTICE]   (267993) : haproxy version is 3.0-dev2-640ff6-910
[NOTICE]   (267993) : path to executable is ./haproxy
[WARNING]  (267993) : config : missing timeouts for proxy 'front'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[1]    267993 segmentation fault (core dumped)  ./haproxy -f crash.conf

This is because in hlua_ckch_set/hlua_ckch_commit_yield, we always
consider that we're being called from a yield-capable runtime context.
As such, hlua_gethlua() is never checked for NULL and we systematically
try to wake hlua->task and yield every 10 instances.

In fact, if we're called from the body or init context (that is, during
haproxy startup), hlua_gethlua() will return NULL, and in this case we
shouldn't care about yielding because it is ok to commit all instances
at once since haproxy is still starting up.

Also, when calling CertCache.set() from a non-yield capable runtime
context (such as hlua fetch context), we kept doing as if the yield
succeeded, resulting in unexpected function termination (operation
would be aborted and the CertCache lock wouldn't be released). Instead,
now we explicitly state in the doc that CertCache.set() cannot be used
from a non-yield capable runtime context, and we raise a runtime error
if it is used that way.
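
The control flow for the startup case can be pictured with the sketch
below. It is not the hlua code: struct lua_ctx_sketch and
commit_instances() are made-up names, and a counter stands in for the
real task wakeup and yield.

```c
#include <stddef.h>

/* Stand-in for the Lua runtime context. NULL means "body or init
 * context", i.e. haproxy is still starting up.
 */
struct lua_ctx_sketch { int yields; };

/* Commits <total> instances. With a runtime context, a yield is
 * simulated every 10 instances; without one, everything is committed
 * at once and the context is never dereferenced.
 */
static int commit_instances(struct lua_ctx_sketch *hlua, int total)
{
    int done;

    for (done = 0; done < total; done++) {
        /* ... commit one instance here ... */
        if (hlua && (done + 1) % 10 == 0 && done + 1 < total)
            hlua->yields++;   /* would wake the task and yield */
    }
    return done;
}
```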

These bugs were discovered by reading the code when trying to address
Svace report documented by @Bbulatov GH #2586.

It should be backported up to 2.6 with 30fcca18 ("MINOR: ssl/lua:
CertCache.set() allows to update an SSL certificate file")
2024-06-03 17:00:00 +02:00
Aurelien DARRAGON
8860c22c00 MINOR: stktable: avoid ambiguous stktable_data_ptr() usage in cli_io_handler_table()
As reported by @Bbulatov in GH #2586, stktable_data_ptr() return value is
used without checking it isn't NULL first, which may happen if the given
type is invalid or not stored in the table.

However, since data_type is set by table_prepare_data_request() right
before cli_io_handler_table() is invoked, data_type is not expected to
be invalid: table_prepare_data_request() normally checked that the type
is stored inside the table. Thus stktable_data_ptr() should not be failing
at this point, so we add a BUG_ON() to indicate that.
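
A minimal sketch of the pattern, using assert() as a stand-in for
BUG_ON(). The structures and the helpers data_ptr() and read_counter()
are simplified assumptions and do not match the stick-table code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NB_TYPES 4

/* data_ofs[type] == 0 means the type is not stored in the table. */
struct table_sketch { int data_ofs[NB_TYPES]; };
struct entry_sketch { char data[64]; };

/* Returns NULL if <type> is invalid or not stored in the table. */
static void *data_ptr(struct table_sketch *t, struct entry_sketch *e, int type)
{
    if (type < 0 || type >= NB_TYPES)
        return NULL;
    if (!t->data_ofs[type])
        return NULL;
    return e->data + t->data_ofs[type];
}

/* The caller is expected to have validated <type> beforehand, so a
 * NULL pointer here is a programming error, not a runtime condition.
 */
static long read_counter(struct table_sketch *t, struct entry_sketch *e, int type)
{
    void *ptr = data_ptr(t, e, type);
    long val;

    assert(ptr != NULL);            /* plays the role of BUG_ON(!ptr) */
    memcpy(&val, ptr, sizeof(val));
    return val;
}
```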
2024-06-03 16:59:54 +02:00
William Lallemand
dc8a2c7f43 DOC: change the link to the FreeBSD CI in README.md
Change the link to the FreeBSD CI status badge to use the cirrus.com
jobs list.
2024-06-03 15:21:29 +02:00
William Lallemand
45cac52212 DOC: add the FreeBSD status badge to README.md
Add the FreeBSD status badge that comes from the Cirrus CI in the
README.md
2024-06-03 15:14:37 +02:00
Ilia Shipitsin
ab23d7eb69 CI: speedup apt package install
we are fine to skip some repos like languages and translations.
this halves the number of repos
2024-06-03 11:59:07 +02:00
William Lallemand
c79c312142 DOC: configuration: add an example for keywords from crt-store
In ticket #785, people are still confused about how to use the crt-store
load parameters in a crt-list.

This patch adds an example.

This must be backported in 3.0
2024-06-03 11:02:23 +02:00
Willy Tarreau
ba958fb230 BUG/MINOR: tools: fix possible null-deref in env_expand() on out-of-memory
In GH issue #2586 @Bbulatov reported a theoretical null-deref in
env_expand() in case there's no memory anymore to expand an environment
variable. The function should return NULL in this case so that the only
caller (str2sa_range) sees it. In practice it may only happen during
boot thus is harmless but better fix it since it's easy. This can be
backported to all versions where this applies.
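
The general pattern (return NULL on allocation failure so that the
caller can see it) may be sketched like this. concat_or_null() and the
injectable allocator are illustrative assumptions, not the env_expand()
code.

```c
#include <stdlib.h>
#include <string.h>

typedef void *(*alloc_fn)(size_t);

/* Returns a newly allocated "<a><b>" string, or NULL if the
 * allocation failed, instead of writing through a NULL pointer.
 */
static char *concat_or_null(const char *a, const char *b, alloc_fn alloc)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = alloc(la + lb + 1);

    if (!out)
        return NULL;            /* propagate out-of-memory to the caller */

    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1); /* includes the trailing zero */
    return out;
}

/* Allocator that always fails, to exercise the error path. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```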
2024-05-31 18:55:36 +02:00
Willy Tarreau
8a7afb6964 BUG/MINOR: tcpcheck: report correct error in tcp-check rule parser
When parsing tcp-check expect-header, a copy-paste error in the error
message causes the name of the header to be reported as the invalid
format string instead of its value. This is really harmless but should
be backported to all versions to help users understand the cause of the
problem when this happens. This was reported in GH issue #2586 by
@Bbulatov.
2024-05-31 18:37:56 +02:00
Willy Tarreau
d8194fab82 BUG/MINOR: cfgparse: remove the correct option on httpcheck send-state warning
In GH issue #2586 @Bbulatov reported a bug where the http-check
send-state flag is removed from options instead of options2 when
http-check is disabled. It only has an effect when this option is
set and http-check disabled, where it displays a warning indicating
this will be ignored. The option removed instead is srvtcpka when
this happens. It's likely that both options being so minor, nobody
ever faced it.
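
The class of bug can be illustrated as below. The two flag values are
hypothetical and were deliberately chosen equal so that the mix-up is
visible; only the principle (clearing a bit in the wrong bitfield
silently drops an unrelated option) is taken from the description above.

```c
/* Hypothetical values: both flags share the same bit, but they live
 * in two different fields.
 */
#define PR_O_TCP_SRV_KA  0x00000020u  /* belongs to ->options  */
#define PR_O2_CHK_SNDST  0x00000020u  /* belongs to ->options2 */

struct proxy_sketch {
    unsigned int options;
    unsigned int options2;
};

/* Buggy: clears the bit in the wrong field, dropping srvtcpka. */
static void drop_send_state_buggy(struct proxy_sketch *px)
{
    px->options &= ~PR_O2_CHK_SNDST;
}

/* Fixed: clears the send-state flag where it is really stored. */
static void drop_send_state_fixed(struct proxy_sketch *px)
{
    px->options2 &= ~PR_O2_CHK_SNDST;
}
```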

This can be backported to all versions.
2024-05-31 18:30:16 +02:00
William Lallemand
f8418d3ade ADMIN: acme.sh: remove the old acme.sh code
Remove the acme.sh script since it was merged in
https://github.com/acmesh-official/acme.sh/pull/4581

So people don't try to download a script which is not up to date with
the current acme.sh master.
2024-05-31 13:37:47 +02:00
Ilia Shipitsin
f3e6dfdc92 CI: VTest: accelerate package install a bit
let's check and install only the packages that are required
2024-05-30 17:04:08 +02:00
William Lallemand
485b206f61 DOC: replace the README by a markdown version
This patch removes the old README file and replaces it with a more
modern markdown version which allows clickable links on the github page.

It also adds some of the Github Actions workflow Status.

This patch includes the HAProxy png in the doc directory.
2024-05-30 13:53:46 +02:00
Ilia Shipitsin
09db70d021 CI: use USE_PCRE2 instead of USE_PCRE
USE_PCRE2 is recommended, I guess USE_PCRE is left unintentionally
2024-05-29 22:37:26 +02:00
Ilia Shipitsin
11c088e203 CI: switch to lua 5.4
current release is 5.4, let's switch to it
2024-05-29 22:37:26 +02:00
Ilia Shipitsin
01c213a4bb CI: use "--no-install-recommends" for apt-get
this reduces number of packages installed by 1
2024-05-29 22:37:26 +02:00
Tim Duesterhus
e349159a34 REGTESTS: Remove REQUIRE_VERSION=2.2 from all tests
HAProxy 2.2 is the lowest supported version, thus this always matches.

see 7aff1bf6b90caadfa95f6b43b526275191991d6f
2024-05-29 22:36:15 +02:00
Tim Duesterhus
10418b6b5a REGTESTS: Remove REQUIRE_VERSION=2.1 from all tests
HAProxy 2.2 is the lowest supported version, thus this always matches.

see 7aff1bf6b90caadfa95f6b43b526275191991d6f
2024-05-29 22:36:15 +02:00
Willy Tarreau
1eb0f22ee1 [RELEASE] Released version 3.1-dev0
Released version 3.1-dev0 with the following main changes :
    - MINOR: version: mention that it's development again
2024-05-29 15:00:02 +02:00
Willy Tarreau
555772e961 MINOR: version: mention that it's development again
This essentially reverts 2e42a19cde.
2024-05-29 14:59:19 +02:00
Willy Tarreau
5590ada473 [RELEASE] Released version 3.0.0
Released version 3.0.0 with the following main changes :
    - MINOR: sample: implement the uptime sample fetch
    - CI: scripts: fix build of vtest regarding option -C
    - CI: scripts: build vtest using multiple CPUs
    - MINOR: log: rename 'log-format tag' to 'log-format alias'
    - DOC: config: document logformat item naming and typecasting features
    - BUILD: makefile: yearly reordering of objects by build time
    - BUILD: fd: errno is also needed without poll()
    - DOC: config: fix two typos "RST_STEAM" vs "RST_STREAM"
    - DOC: config: refer to the non-deprecated keywords in ocsp-update on/off
    - DOC: streamline http-reuse and connection naming definition
    - REGTESTS: complete http-reuse test with pool-conn-name
    - DOC: config: add %ID logformat alias alternative
    - CLEANUP: ssl/ocsp: readable ifdef in ssl_sock_load_ocsp
    - BUG/MINOR: ssl/ocsp: init callback func ptr as NULL
    - CLEANUP: ssl_sock: move dirty openssl-1.0.2 wrapper to openssl-compat
    - BUG/MINOR: activity: fix Delta_calls and Delta_bytes count
    - CI: github: upgrade the WolfSSL job to 5.7.0
    - DOC: install: update quick build reminders with some missing options
    - DOC: install: update the range of tested openssl version to cover 3.3
    - DEV: patchbot: prepare for new version 3.1-dev
    - MINOR: version: mention that it's 3.0 LTS now.
2024-05-29 14:43:38 +02:00
Willy Tarreau
2e42a19cde MINOR: version: mention that it's 3.0 LTS now.
The version will be maintained up to around Q2 2029. Let's
also update the INSTALL file to mention this.
2024-05-29 14:40:26 +02:00
Willy Tarreau
bb7e62b98a DEV: patchbot: prepare for new version 3.1-dev
The bot will now load the prompt for the upcoming 3.1 version so we have
to rename the files and update their contents to match the current version.
2024-05-29 14:38:21 +02:00
Willy Tarreau
8452a3f7c9 DOC: install: update the range of tested openssl version to cover 3.3
OpenSSL 3.3 is known to work since it's tested on the CI, so let's add
it to the list of known good versions.
2024-05-29 10:23:59 +02:00
Willy Tarreau
2a949be18d DOC: install: update quick build reminders with some missing options
The quick build reminders claimed to present "all options" but were
still missing QUIC. It was also the moment to split FreeBSD and
OpenBSD apart since the latter uses LibreSSL and does not require
the openssl compatibility wrapper. We also replace the hard-coded
number of cpus for the parallel build, by the real number reported
by the system.
2024-05-29 08:43:01 +02:00
William Lallemand
40cd5cc0e2 CI: github: upgrade the WolfSSL job to 5.7.0
WolfSSL 5.7.0 was released in March 2024, let's upgrade our CI job to
this version.
2024-05-28 19:26:52 +02:00
Valentine Krasnobaeva
d5e43caaf5 BUG/MINOR: activity: fix Delta_calls and Delta_bytes count
Thanks to the commit 5714aff4a6bf
"DEBUG: pool: store the memprof bin on alloc() and update it on free()", the
number of memory allocations and memory "frees" is now shown on the same line,
corresponding to the caller name. This is very convenient to debug memory leaks
(haproxy should run with the -dMcaller option).

The implicit drawback of this solution is that we count the same free_calls
and free_tot (bytes) values twice in cli_io_handler_show_profiling() when
computing tot_free_calls and tot_free_bytes, by adding them to these
totals for the p_alloc, malloc and calloc allocator types. See the details
about why this happens this way in the __pool_free() implementation and
also in the commit message for 5714aff4a6bf.

This double addition of free counters skews 'Delta_calls' and 'Delta_bytes';
sometimes we even noticed that they showed negative values.

The same problem affected the calculation of the average allocated buffer size
for lines where we simultaneously show the number of allocated and freed bytes.
2024-05-28 19:25:08 +02:00
Willy Tarreau
decb7c90df CLEANUP: ssl_sock: move dirty openssl-1.0.2 wrapper to openssl-compat
Valentine noticed this ugly SSL_CTX_get_tlsext_status_cb() macro
definition inside ssl_sock.c that is dedicated to openssl-1.0.2 only.
It would be better placed in openssl-compat.h, which is what this
patch does. It also addresses a missing pair of parentheses and
removes an invalid extra semicolon.
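
For readers unfamiliar with this class of macro issue, here is a generic
illustration. These are not the haproxy macros; TWICE_UNSAFE and
TWICE_SAFE are made up for the example.

```c
/* Missing parentheses: the expansion mixes with the caller's operators. */
#define TWICE_UNSAFE(x) x + x

/* Parenthesized arguments and result, and no trailing semicolon: a
 * semicolon inside the macro would break uses such as
 * "if (c) y = M(1); else ..." by ending the statement too early.
 */
#define TWICE_SAFE(x) ((x) + (x))

/* Expands to 3 * 2 + 2, which is not the intended 3 * (2 + 2). */
static int demo_unsafe(void)
{
    return 3 * TWICE_UNSAFE(2);
}

/* Expands to 3 * ((2) + (2)). */
static int demo_safe(void)
{
    return 3 * TWICE_SAFE(2);
}
```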
2024-05-28 19:17:57 +02:00
Valentine Krasnobaeva
84380965a5 BUG/MINOR: ssl/ocsp: init callback func ptr as NULL
In ssl_sock_load_ocsp() it is better to initialize local scope variable
'callback' function pointer as NULL, while we are declaring it. According to
SSL_CTX_get_tlsext_status_cb() API, then we will provide a pointer to this
'on stack' variable in order to check, if the callback was already set before:

OpenSSL 1.x.x and 3.x.x:
  long SSL_CTX_get_tlsext_status_cb(SSL_CTX *ctx, int (**callback)(SSL *, void *));
  long SSL_CTX_set_tlsext_status_cb(SSL_CTX *ctx, int (*callback)(SSL *, void *));

WolfSSL 5.7.0:
  typedef int(*tlsextStatusCb)(WOLFSSL* ssl, void*);
  WOLFSSL_API int wolfSSL_CTX_get_tlsext_status_cb(WOLFSSL_CTX* ctx, tlsextStatusCb* cb);
  WOLFSSL_API int wolfSSL_CTX_set_tlsext_status_cb(WOLFSSL_CTX* ctx, tlsextStatusCb cb);

When this func ptr variable stays uninitialized, haproxy compiled with ASAN
crashes in ssl_sock_load_ocsp():

  ./haproxy -d -f haproxy.cfg
  ...
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==114919==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x5eab8951bb32 bp 0x7ffcdd6d8410 sp 0x7ffcdd6d82e0 T0)
  ==114919==The signal is caused by a READ memory access.
  ==114919==Hint: address points to the zero page.
    #0 0x5eab8951bb32 in ssl_sock_load_ocsp /home/vk/projects/haproxy/src/ssl_sock.c:1248:22
    #1 0x5eab89510d65 in ssl_sock_put_ckch_into_ctx /home/vk/projects/haproxy/src/ssl_sock.c:3389:6
  ...

This happens because the callback variable is allocated on the stack. Since it
is not explicitly initialized, it may contain some garbage value at runtime,
due to the linked crypto library update or recompilation.

So, following the ssl_sock_load_ocsp() code, SSL_CTX_get_tlsext_status_cb() may
fail, callback will still contain its initial garbage value, and the
'if (!callback) {...' test will put us on the wrong path to access some
ocsp_cbk_arg properties via its pointer, which won't be set, and we will
end up with a segmentation fault.
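
The failure mode can be reproduced in miniature with the sketch below.
It does not use any TLS library: struct ctx_sketch, get_status_cb() and
callback_already_set() are stand-ins invented for the example, with a
getter that may fail without writing to its output argument.

```c
#include <stddef.h>

typedef int (*status_cb)(void *ssl, void *arg);

/* Stand-in for a library context. */
struct ctx_sketch {
    int supported;   /* 0: the getter fails and leaves *cb untouched */
    status_cb cb;
};

/* Mimics a getter that only writes to <cb> on success. */
static long get_status_cb(const struct ctx_sketch *ctx, status_cb *cb)
{
    if (!ctx->supported)
        return 0;
    *cb = ctx->cb;
    return 1;
}

static int dummy_cb(void *ssl, void *arg)
{
    (void)ssl;
    (void)arg;
    return 0;
}

/* Returns 1 if a callback was already registered, 0 otherwise. */
static int callback_already_set(const struct ctx_sketch *ctx)
{
    status_cb callback = NULL;   /* without this, a failed getter would
                                  * leave a garbage pointer behind */

    get_status_cb(ctx, &callback);
    return callback != NULL;
}
```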

Must be backported to all stable versions. Not all versions have
the ifdef; the previous cleanup patch is useful starting from the 2.7
version.
2024-05-28 18:14:26 +02:00
Valentine Krasnobaeva
fb7b46d267 CLEANUP: ssl/ocsp: readable ifdef in ssl_sock_load_ocsp
Due to the support of different TLS/SSL libraries and their different versions,
sometimes we are forced to use different internal typedefs and callback
functions. We strive to avoid this, but from time to time "#ifdef... #endif"
becomes inevitable.

In particular, in ssl_sock_load_ocsp() we define a 'callback' variable, which
will contain a function pointer to our OCSP stapling callback, assigned
further via SSL_CTX_set_tlsext_status_cb() to the internal SSL context
struct in a linked crypto library.

If this linked crypto library is OpenSSL 1.x.x/3.x.x, for setting and
getting this callback we have the following API signatures
(see doc/man3/SSL_CTX_set_tlsext_status_cb.pod):

  long SSL_CTX_get_tlsext_status_cb(SSL_CTX *ctx, int (**callback)(SSL *, void *));
  long SSL_CTX_set_tlsext_status_cb(SSL_CTX *ctx, int (*callback)(SSL *, void *));

If we are using WolfSSL, same APIs expect tlsextStatusCb function prototype,
provided via the typedef below (see wolfssl/wolfssl/ssl.h):

  typedef int(*tlsextStatusCb)(WOLFSSL* ssl, void*);
  WOLFSSL_API int wolfSSL_CTX_get_tlsext_status_cb(WOLFSSL_CTX* ctx, tlsextStatusCb* cb);
  WOLFSSL_API int wolfSSL_CTX_set_tlsext_status_cb(WOLFSSL_CTX* ctx, tlsextStatusCb cb);

It seems that in OpenSSL < 1.0.0 there was no support for the OCSP extension,
so there is no need to set this callback.

Let's avoid #ifndef... #endif for this 'callback' variable definition to keep
things clear. #ifndef... #endif is usually less readable than a
straightforward "#ifdef... #endif".
2024-05-28 18:00:44 +02:00
Aurelien DARRAGON
f9740230fc DOC: config: add %ID logformat alias alternative
unique-id sample fetch may be used instead of %ID alias but it wasn't
mentioned explicitly in the doc.
2024-05-28 15:45:03 +02:00
Amaury Denoyelle
b0e1f77fea REGTESTS: complete http-reuse test with pool-conn-name
Add new test cases in http_reuse_conn_hash vtest. Ensure new server
parameter "pool-conn-name" is used as expected for idle connection name,
both alone and mixed with a SNI.
2024-05-28 15:00:54 +02:00
Amaury Denoyelle
8c09c7f39f DOC: streamline http-reuse and connection naming definition
With the introduction of "pool-conn-name", documentation related to
http-reuse became even more complex than it already was, notably with multiple
cross-references between "pool-conn-name" and "sni" server keywords.

Took the opportunity to improve all http-reuse related documentation.
First, "http-reuse" keyword general purpose has been greatly expanded
and reordered.

Then, "pool-conn-name" and "sni" have been clarified, in particular the
relation between them, with the former being an advanced usage compared to
the default SSL SNI case in the context of http-reuse. Also update
attach-srv rule documentation as its name parameter is directly linked
to both "pool-conn-name" and "sni".
2024-05-28 13:58:08 +02:00
Willy Tarreau
652a6f18b2 DOC: config: refer to the non-deprecated keywords in ocsp-update on/off
The doc for "ocsp-update [ off | on ]" was still referring to
"tune.ssl.ocsp-update.*" instead of "ocsp-update.*". No backport
needed.
2024-05-27 20:13:42 +02:00
Willy Tarreau
2ed3531619 DOC: config: fix two typos "RST_STEAM" vs "RST_STREAM"
These were added in 3.0-dev11 by commit 068ce2d5d2 ("MINOR: stconn:
Add samples to retrieve about stream aborts"), no backport needed.
2024-05-27 19:51:19 +02:00
Willy Tarreau
725fa0ecd2 BUILD: fd: errno is also needed without poll()
When building without USE_POLL, fd.c fails on errno because that one is
only included when USE_POLL is set. Let's move it outside of the ifdef.
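
The shape of the fix is sketched below, assuming a POSIX system.
close_and_report() is a hypothetical helper that only exists to show
code outside the ifdef depending on errno.

```c
#include <errno.h>    /* needed by code outside the #ifdef below */
#include <unistd.h>

#ifdef USE_POLL
#include <poll.h>     /* only this one is specific to poll() */
#endif

/* Uses errno regardless of USE_POLL, hence the unconditional include.
 * Returns 0 on success, otherwise the errno value set by close().
 */
static int close_and_report(int fd)
{
    if (close(fd) < 0)
        return errno;
    return 0;
}
```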
2024-05-27 19:14:14 +02:00
Willy Tarreau
35e9826c13 BUILD: makefile: yearly reordering of objects by build time
Some large files have been split since 2.9 (e.g. stats) and build times
have moved and become less smooth, causing a less even parallel build.
As usual, a small reordering cleans all this up. The effect was less
visible than previous years though.
2024-05-27 19:14:14 +02:00
Aurelien DARRAGON
141bc5ba0d DOC: config: document logformat item naming and typecasting features
The ability to give a name to a logformat_node (known as logformat item in
the documentation) implemented in 2ed6068f2a ("MINOR: log: custom name for
logformat node") wasn't documented.

The same goes for the ability to force the logformat_node's output type to
a specific type implemented in 1448478d62 ("MINOR: log: explicit
typecasting for logformat nodes")

Let's quickly describe such new usages at the start of the custom log
format section.
2024-05-27 17:04:16 +02:00
Aurelien DARRAGON
435a9da267 MINOR: log: rename 'log-format tag' to 'log-format alias'
In 2.9 we started to introduce an ambiguity in the documentation by
referring to historical log-format variables ('%var') as log-format
tags in 739c4e5b1e ("MINOR: sample: accept_date / request_date return
%Ts / %tr timestamp values") and 454c372b60 ("DOC: configuration: add
sample fetches for timing events").

In fact, we've had this confusion between log-format tag and log-format
var for more than 10 years now, but in 2.9 it was the first time the
confusion was exposed in the documentation.

Indeed, both 'log-format variable' and 'log-format tag' actually refer
to the same feature (that is: '%B' and friends that can be used for
direct access to some log-oriented predefined fetches instead of using
%[expr] with generic sample expressions).

This feature was first implemented in 723b73ad75 ("MINOR: config: Parse
the string of the log-format config keyword") and later documented in
4894040fa ("DOC: log-format documentation"). At that time, it was clear
that we used to name it 'log-format variable'.

But later the same year, 'log-format tag' naming started to appear in
some commit messages (while still referring to the same feature), for
instance with ffc3fcd6d ("MEDIUM: log: report SSL ciphers and version
in logs using logformat %sslc/%sslv").

Unfortunately in 2.9 when we added (and documented) new log-format
variables we officially started drifting to the misleading 'log-format
tag' naming (perhaps because it was the most recent naming found for
this feature in git log history, or because the confusion has always
been there)

Even worse, in 3.0 this confusion led us to rename all 'var' occurrences
to 'tag' in log-format related code to unify the code with the doc.

Hopefully William quickly noticed that we made a mistake there, but
instead of reverting to historical naming (log-format variable), it was
decided that we must use a different name that is less confusing than
'tags' or 'variables' (tags and variables are keywords that are already
used to designate other features in the code and that are not very
explicit under log-format context today).

Now we refer to '%B' and friends as a logformat alias, which is
essentially a handy way to print some log oriented information in the
log string instead of leveraging '%[expr]' with generic sample expressions
made of fetches and converters. Of course, there are some subtleties, such
as a few log-format aliases that still don't have sample fetch equivalent
for historical reasons, and some aliases that may be a little faster than
their generic sample expression equivalents because most aliases are
pretty much hardcoded in the log building function. But in general
logformat aliases should be simply considered as an alternative to using
expressions (with '%[expr]')

Also, under log-format context, when we want to refer to either an alias
('%alias') or an expression ('%[expr]'), we should use the generic term
'logformat item', which in fact designates a single item within the
logformat string provided by the user. Indeed, a logformat item (whether
it is an alias or an expression) always starts with '%' and may accept
optional flags / arguments

Both the code and the documentation were updated in that sense, hopefully
this will clarify things and prevent future confusions.
2024-05-27 17:03:48 +02:00
Willy Tarreau
7e943cdf27 CI: scripts: build vtest using multiple CPUs
Now that vtest supports make -j, let's use it to save a bit of time
(the build time is ~6s per test by default).
2024-05-27 12:15:50 +02:00
Willy Tarreau
01843c47a1 CI: scripts: fix build of vtest regarding option -C
On Linux, GNU make emits "w" at the beginning of the MAKEFLAGS
variable if -C is passed, which happens since vtest d6d228bcb3.
In fact it emits any of the command line flags without the leading
'-' in this case. gmake doesn't do that on BSD apparently. It's
documented under Options/Recursion in the GNU make doc. There's
also MFLAGS that could work but it does not contain the variables
definitions. So let's just avoid the -C that we don't really need.

This needs to be backported to stable versions.
2024-05-27 12:15:50 +02:00
William Lallemand
0a00302fab MINOR: sample: implement the uptime sample fetch
'uptime' returns the uptime of the current HAProxy worker in seconds.
2024-05-27 11:06:40 +02:00
Willy Tarreau
f76e73511a [RELEASE] Released version 3.0-dev13
Released version 3.0-dev13 with the following main changes :
    - CLEANUP: ssl/cli: remove unused code in dump_crtlist_conf
    - MINOR: ssl: check parameter in ckch_conf_cmp()
    - BUG/MINOR: ring: free ring's allocated area not ring's usable area when using maps
    - DOC: configuration: rework the crt-store load documentation
    - DEBUG: tools: add vma_set_name() helper
    - DEBUG: shctx: name shared memory using vma_set_name()
    - DEBUG: sink: add name hint for memory area used by memory-backed sinks
    - DEBUG: pollers: add name hint for large memory areas used by pollers
    - DEBUG: errors: add name hint for startup-logs memory area
    - DEBUG: fd: add name hint for large memory areas
    - MEDIUM: ssl: don't load file by discovering them in crt-store
    - DOC: configuration: update the crt-list documentation
    - DOC: configuration: add the supported crt-store options in crt-list
    - BUG/MEDIUM: proto: fix fd leak in <proto>_connect_server
    - MINOR: sock: set conn->err_code in case of EPERM
    - BUG/MINOR: http-ana: Don't crush stream termination condition on internal error
    - MAJOR: spoe: Let the SPOE back into the game
    - BUG/MINOR: connection: parse PROXY TLV for LOCAL mode
    - BUG/MINOR: server: free PROXY v2 TLVs on srv drop
    - MINOR: rhttp: add log on connection allocation failure
    - BUG/MEDIUM: rhttp: fix preconnect on single-thread
    - BUG/MINOR: rhttp: prevent listener suspend
    - BUG/MINOR: rhttp: fix task_wakeup state
    - MINOR: session: define flag to explicitely release listener on free
    - MEDIUM: rhttp: create session for active preconnect
    - MINOR: rhttp: support PROXY emission on preconnect
    - MINOR: connection: support PROXY v2 TLV emission without stream
    - MINOR: traces: enumerate the list of levels/verbosities when not found
    - BUG/MINOR: sock: fix sock_create_server_socket
    - MINOR: proto: fix coding style
    - BUG/MAJOR: quic: Crash with TLS_AES_128_CCM_SHA256 (libressl only)
    - REGTESTS: scripts: allow to change the vtest timeout
    - BUG/MEDIUM: quic_tls: prevent LibreSSL < 4.0 from negotiating CHACHA20_POLY1305
    - CI: scripts/build-ssl.sh: loudly fail on unsupported platforms
    - BUG/MEDIUM: mux-quic: Create sedesc in same time of the QUIC stream
    - MINOR: mux-quic: Set abort info for SC-less QCS on STOP_SENDING frame
    - CI: scripts/build-ssl: add a DESTDIR and TMPDIR variable
    - CI: scripts/buil-ssl: cleanup the boringssl and quictls build
    - MINOR: config: add thread-hard-limit to set an upper bound to nbthread
    - BUILD: quic: fix unused variable warning when threads are disabled
    - BUG/MEDIUM: stick-tables: Fix race with peers when trashing oldest entries
    - BUG/MEDIUM: stick-tables: Fix race with peers when killing a sticky session
    - BUG/MEDIUM: stick-tables: make sure never to create two same remote entries
    - CLEANUP: stick-tables: remove a few unneeded tests for use_wrlock
    - MINOR: stick-tables: remove the uneeded read lock in stksess_free()
    - CLEANUP: tools: fix vma_set_name() function comment
    - DEBUG: tools: add vma_set_name_id() helper
    - DEBUG: pollers/fd: add thread id suffix to per-thread memory areas name hints
    - DOC: config: fix aes_gcm_enc() description text
    - BUILD: trace: fix warning on null dereference
    - MEDIUM: config: prevent communication with privileged ports
    - MAJOR: config: prevent QUIC with clients privileged port by default
    - BUG/MINOR: quic: adjust restriction for stateless reset emission
    - MINOR: quic: clarify doc for quic_recv()
    - MINOR: server: generalize sni expr parsing
    - MINOR: server: define pool-conn-name keyword
    - MEDIUM: connection: use pool-conn-name instead of sni on reuse
    - BUG/MINOR: rhttp: initialize session origin after preconnect reversal
    - BUG/MEDIUM: server/dns: preserve server's port upon resolution timeout or error
    - BUG/MINOR: http-htx: Support default path during scheme based normalization
    - BUG/MINOR: server: Don't reset resolver options on a new default-server line
    - DOC: quic: specify that connection migration is not supported
    - DOC: config: fix incorrect section reference about custom log format
    - DOC: config: uniformize the naming and description of custom log format args
    - DOC: config: clarify the fact that custom log format is not just for logging
    - REGTESTS: acl_cli_spaces: avoid a warning caused by undefined logs
2024-05-24 17:57:29 +02:00
Willy Tarreau
45a187304e REGTESTS: acl_cli_spaces: avoid a warning caused by undefined logs
There's a warning being reported in this reg test in the detailed startup
logs because of "log global" and "option httplog" while there's no global
section hence no logger. Let's just drop both options since they're not
relevant to this test.
2024-05-24 17:50:19 +02:00
Willy Tarreau
0af9bfcbc5 DOC: config: clarify the fact that custom log format is not just for logging
The wording in the Custom log format section was still extremely centered
on logging, but it's about time to mention that these are usable for other
actions as well, otherwise it's very confusing for newcomers who try to
define a variable or header. The updated text also warns about the risks
of safe encodings that may (rarely) mangle an output string, and encourages
migrating away from the unquoted definition which is full of backslashes.
It would definitely deserve further improvements and refinements.
2024-05-24 17:32:59 +02:00
Willy Tarreau
c02cefce23 DOC: config: uniformize the naming and description of custom log format args
A significant number of actions now take arguments that are evaluated as
log-format expressions. Some of them are called "fmt", others "string".
The description of the argument sometimes just says "the log-format
string" or "log format" or "custom log format" etc. Most of them do not
mention the section to visit, and section 8.2 speaking about log-format
is very centric on logs usage (the primary use case), making all of this
very confusing for newcomers.

Since section 8.2.6 is titled "Custom log format" and describes the syntax
to be used with the "log-format" (and other) directives, let's call this
"Custom log format" everywhere and mention section 8.2.6. When the field
was called "string", it was also renamed to "fmt".

It doesn't seem worth backporting this, unless it applies fine.
2024-05-24 17:32:59 +02:00
Willy Tarreau
474cbcf842 DOC: config: fix incorrect section reference about custom log format
Since 2.5 with commit 98b930d043 ("MINOR: ssl: Define a default https
log format"), some log-format sections were shifted a bit without having
been renumbered, causing 8.2.4 to be referenced as the custom log
format while it's in fact 8.2.6. This patch fixes the affected
locations.

In addition two places mentioned 8.2.6 instead of 8.2.5 for the error
log format.

This can be backported to 2.6.
2024-05-24 17:32:59 +02:00
Amaury Denoyelle
59b69aafae DOC: quic: specify that connection migration is not supported
Currently haproxy does not support QUIC connection migration. This is
advertized to clients on their connections. Document this in the first
QUIC related paragraph.

This should be backported up to 2.6.
2024-05-24 17:32:37 +02:00
Christopher Faulet
0d7c1bc6ab BUG/MINOR: server: Don't reset resolver options on a new default-server line
When a new "default-server" line is parsed, some resolver options are reset.
Thus previously defined default options cannot be inherited. There is no
reason to do so. First because other server options are inherited. And then
because not all resolver options are reset. It is not consistent.

This patch should fix issue #2559. It should be backported to all stable
versions.
2024-05-24 16:31:01 +02:00
Christopher Faulet
8d2514e087 BUG/MINOR: http-htx: Support default path during scheme based normalization
As stated in RFC3986, for an absolute-form URI, an empty path should be
normalized to a path of "/". This is part of scheme based normalization
rules. This kind of normalization is already performed for default ports. So
we might as well deal with the case of empty path.
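
The rule can be sketched with a small standalone helper. This is not the
http-htx implementation: normalize_empty_path() is a hypothetical
function working on plain C strings, written only to show the expected
input/output pairs.

```c
#include <stdio.h>
#include <string.h>

/* Copies <uri> into <out>, inserting "/" when an absolute-form URI has
 * an empty path. Returns 0 on success, -1 if <uri> has no "://" or if
 * <out> is too small.
 */
static int normalize_empty_path(const char *uri, char *out, size_t outlen)
{
    const char *auth = strstr(uri, "://");
    size_t alen;
    int n;

    if (!auth)
        return -1;                    /* not an absolute-form URI */

    auth += 3;                        /* skip "://" */
    alen = strcspn(auth, "/?#");      /* length of the authority part */

    if (auth[alen] == '/')
        n = snprintf(out, outlen, "%s", uri);      /* path present */
    else
        n = snprintf(out, outlen, "%.*s/%s",
                     (int)(auth - uri + alen), uri, auth + alen);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this helper, "http://example.com" becomes "http://example.com/" and
"http://example.com?q=1" becomes "http://example.com/?q=1", while a URI
that already has a path is left unchanged.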

The associated reg-tests was updated accordingly.

This patch should fix the issue #2573. It may be backported as far as 2.4 if
necessary.
2024-05-24 16:17:24 +02:00
Aurelien DARRAGON
c16eba8183 BUG/MEDIUM: server/dns: preserve server's port upon resolution timeout or error
@boi4 reported in GH #2578 that since 3.0-dev1, for servers with an address
learned from A/AAAA records, after a DNS flap the server would be taken out
of maintenance with the proper address but with an invalid port (== 0),
making it unusable and causing tcp checks to fail:

[NOTICE]   (1) : Loading success.
[WARNING]  (8) : Server mybackend/myserver1 is going DOWN for maintenance (DNS refused status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT]    (8) : backend 'mybackend' has no server available!
[WARNING]  (8) : mybackend/myserver1: IP changed from '(none)' to '127.0.0.1' by 'myresolver/ns1'.
[WARNING]  (8) : Server mybackend/myserver1 ('myhost') is UP/READY (resolves again).
[WARNING]  (8) : Server mybackend/myserver1 administratively READY thanks to valid DNS answer.
[WARNING]  (8) : Server mybackend/myserver1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

@boi4 also mentioned that this used to work fine before.

Willy suggested that this regression may have been introduced by 64c9c8e
("BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS")

Turns out he was right! Indeed, in 64c9c8e we systematically memset the
whole server_inetaddr struct (which contains both the requested server's
addr and port planned for atomic update) instead of only memsetting the
addr part of the structure: except when SRV records are involved (SRV
records provide both the address and the port unlike A or AAAA records),
we must not reset the server's port upon DNS errors because the port may
have been provided at config time and we don't want to lose its value.
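
The intended behavior can be sketched as follows. This is a simplified
model and not the haproxy code: struct inetaddr_update and
unset_from_dns() are hypothetical names standing in for the
server_inetaddr struct and the reset logic described above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the pending address/port update. */
struct inetaddr_update {
    unsigned char addr[16]; /* resolved address, zeroed when unknown */
    unsigned int port;      /* port from the config or from an SRV record */
};

/* Called on resolution timeout or error. */
void unset_from_dns(struct inetaddr_update *upd, int from_srv_record)
{
    assert(upd);

    /* the address always came from DNS, so it is always cleared */
    memset(upd->addr, 0, sizeof(upd->addr));

    /* the port only came from DNS when an SRV record provided it */
    if (from_srv_record)
        upd->port = 0;
}
```

The buggy variant corresponds to a memset() of the whole struct, which
also wipes a port that was set at config time.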

Big thanks to @boi4 for his well-documented issue that really helped us to
pinpoint the bug right on time for the dev-13 release.

No backport needed (unless 64c9c8e gets backported).
2024-05-24 15:29:48 +02:00
Amaury Denoyelle
98ed11b0c5 BUG/MINOR: rhttp: initialize session origin after preconnect reversal
Since the following commit, session is initialized early for rhttp
preconnect.

  12c40c25a9520fe3365950184fe724a1f4e91d03
  MEDIUM: rhttp: create session for active preconnect

Session origin member was not set. However, this prevents several
session fetches from working as expected. Worse, this caused a regression
as previously the session was created after reversal with the origin member
defined. This was reported on the mailing-list by user William Manley,
who relies on set-dst.

One possible fix would be to set origin on session_new(). However, as
this is done before reversal, some session members may be incorrectly
initialized, in particular source and destination address.

Thus, session origin is only set after reversal is completed. This
ensures that session fetches have the same behavior on standard
connections and reversable ones.

This does not need to be backported.
2024-05-24 14:47:21 +02:00
Amaury Denoyelle
47168e217a MEDIUM: connection: use pool-conn-name instead of sni on reuse
Implement pool-conn-name support for idle connection reuse. It replaces
SNI as arbitrary identifier for connections in the idle pool. Thus,
every SNI reference in this context has been replaced.

The main change occurs in connect_server() where the pool-conn-name sample
fetch is now prehashed to generate the idle connection identifier. SNI is now
solely used in the context of SSL for ssl_sock_set_servername().
2024-05-24 14:47:21 +02:00
Amaury Denoyelle
be4f89f2b2 MINOR: server: define pool-conn-name keyword
Define a new server keyword pool-conn-name. The purpose of this keyword
will be to identify connections inside the idle connections pool,
replacing SNI in case SSL is not wanted.

This keyword uses a sample expression argument. It thus can reuse
existing function parse_srv_expr() for parsing. In the future, it may be
necessary to define a keyword variant which uses a logformat for
extensibility.

This patch only implements parsing. Argument is stored inside new server
field <pool_conn_name> and expression is generated in
_srv_parse_finalize() into <pool_conn_name_expr>.

If pool-conn-name is not set but SNI is, the latter is reused
automatically as pool-conn-name via _srv_parse_finalize(). This ensures
current reuse behavior remains compatible and idle connection reuse will
not mix connections with different SNIs by mistake.

Main usage will be for rhttp when SSL is not wanted between the two
haproxy instances. Previously, it was possible to use the "sni" keyword even
without SSL on a server line, which has a similar effect. However,
having a dedicated "pool-conn-name" keyword is deemed clearer. Besides,
it would allow for more complex configurations where pool-conn-name and
SNI are used in parallel with different values.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
91001422b4 MINOR: server: generalize sni expr parsing
Two functions exist for server sni sample expression parsing. This is
confusing so this commit aims at clarifying this.

Functions are renamed with the following identifiers. First function is
named parse_srv_expr() and can be used during parsing. Besides
expression parsing, it ensures sample fetch validity in the context
of a server line.

Second function is renamed _parse_srv_expr() and is used internally by
parse_srv_expr(). It only implements sample parsing without extra
checks. It is already used for server instantiation derived from
server-template as checks were already performed. Also, it is now used
in http-client code as SNI is a fixed string.

Finally, both functions are generalized to remove any reference to SNI.
This will allow to reuse it to parse other server keywords which use an
expression. This will be the case for the future keyword pool-conn-name.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
b9f67a46a2 MINOR: quic: clarify doc for quic_recv()
Just highlight the fact that quic_recv() only receives a single datagram.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
5764bc50b5 BUG/MINOR: quic: adjust restriction for stateless reset emission
Review RFC 9000 and ensure restrictions on Stateless Reset are properly
enforced. After careful examination, several changes are introduced.

First, redefine minimal Stateless Reset emitted packet length to 21
bytes (5 random bytes + a token). This is the new default length used in
every case, unless received packet which triggered it is 43 bytes or
smaller.

Ensure every Stateless Reset packet emitted is at least 1 byte shorter than
the received packet which triggered it. No Stateless reset will be
emitted if this falls under the above limit of 21 bytes. Thus this
should prevent looping issues.
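
The length rules above can be summarized by the following sketch. It is
written from the description in this message only; the function name
stateless_reset_len() and the convention of returning 0 for "do not emit"
are assumptions of this example.

```c
#include <stddef.h>

#define STATELESS_RESET_MINLEN 21 /* 5 random bytes + 16-byte token */

/* Returns the length of the Stateless Reset to emit in response to a
 * packet of <rx_len> bytes, or 0 if no Stateless Reset must be emitted.
 */
size_t stateless_reset_len(size_t rx_len)
{
    size_t len = STATELESS_RESET_MINLEN; /* default length */

    /* small triggering packet: answer with one byte less */
    if (rx_len <= 43)
        len = rx_len ? rx_len - 1 : 0;

    /* never go below the minimal length, stay silent instead */
    return (len < STATELESS_RESET_MINLEN) ? 0 : len;
}
```

A 1200-byte packet thus triggers a 21-byte reset, a 43-byte packet a
42-byte one, and a packet of 21 bytes or less triggers nothing, so each
exchange of resets between two endpoints shrinks until it stops.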

This should be backported up to 2.6.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
f55748a422 MAJOR: config: prevent QUIC with clients privileged port by default
The previous commit introduced a new protection mechanism to forbid
communications with clients which use a privileged source port. By
default, this mechanism is disabled for every protocol.

This patch changes the default value and activates the protection
mechanism for the QUIC protocol. This is justified as it is a probable sign
of DNS/NTP amplification attack.

This is labelled as major as it can be a breaking change with some
network environments.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
45f40bac4c MEDIUM: config: prevent communication with privileged ports
This commit introduces a new global setting named
harden.reject_privileged_ports.{tcp|quic}. When active, communications
with clients which use privileged source ports are forbidden. Such
behavior is considered suspicious as it can be used for spoofing or
DNS/NTP amplification attacks.

Value is configured per transport protocol. For TCP and QUIC,
distinct code locations are impacted by this setting. The first one is
in sock_accept_conn() which acts as a filter for all TCP based
communications just after accept() returns a new connection. The second
one is dedicated to QUIC communication in quic_recv(). In both cases,
if a privileged source port is used and the protection is active, the
received message is silently dropped.
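
The decision taken at both locations can be pictured with the sketch
below. It assumes the conventional definition of a privileged port (below
1024), which this message does not spell out; reject_privileged_port()
and PRIVILEGED_PORT_LIMIT are hypothetical names.

```c
#include <stdint.h>

/* Assumption: ports below 1024 are the privileged ones, by convention. */
#define PRIVILEGED_PORT_LIMIT 1024

/* Returns 1 if traffic coming from <src_port> must be dropped, else 0.
 * <protection_enabled> reflects the per-protocol setting (tcp or quic).
 */
int reject_privileged_port(uint16_t src_port, int protection_enabled)
{
    return protection_enabled && src_port < PRIVILEGED_PORT_LIMIT;
}
```

For instance a datagram coming from source port 53 or 123 is dropped
only when the protection is enabled for the protocol it arrived on.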

By default, protections are disabled for both protocols. This is to be
able to backport it without breaking changes on stable releases.

This should be backported as it is an interesting security feature yet
relatively simple to implement.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
4e632545f7 BUILD: trace: fix warning on null dereference
Since a recent change on trace, the following compilation warning may
occur :
  src/trace.c: In function ‘trace_parse_cmd’:
  src/trace.c:865:33: error: potential null pointer dereference [-Werror=null-dereference]
    865 |                         for (nd = src->decoding; nd->name && nd->desc; nd++)
        |                              ~~~^~~~~~~~~~~~~~~

Fix this by rearranging code path to better highlight that only "quiet"
verbosity is allowed if no trace source is specified.

This was detected with GCC 14.1.
2024-05-24 14:36:03 +02:00
Willy Tarreau
77c228f04f DOC: config: fix aes_gcm_enc() description text
As reported by Nick Ramirez, it was written "decrypts" instead of
"encrypts". No backport needed.
2024-05-24 12:09:25 +02:00
Aurelien DARRAGON
c9af6d5414 DEBUG: pollers/fd: add thread id suffix to per-thread memory areas name hints
Willy reported that since abb8412d2 ("DEBUG: pollers: add name hint for
large memory areas used by pollers") and 22ec2ad8b ("DEBUG: fd: add name
hint for large memory areas") multiple maps with the same name could be
found in /proc/<pid>/maps when haproxy process is started with multiple
threads, which can be annoying.

In fact this happens because some poller and fd-created memory areas are
being created for each available thread, and since the naming was done
using vma_set_name() with the same <type> and <name> inputs, the resulting
name was the same for all threads.

Thanks to the previous commit, we now use vma_set_name_id() for naming
per-thread memory areas so that a "-id" suffix is appended after the
name, where "id" equals 'tid+1' (to match the thread numbering logic
found in config file or in ha_panic() report), allowing to easily
identify which haproxy thread owns the map in /proc/<pid>/maps:

7d3b26200000-7d3b26a01000 rw-p 00000000 00:00 0                          [anon:ev_poll:poll_events-2]
7d3b26c00000-7d3b27001000 rw-p 00000000 00:00 0                          [anon:fd:fd_updt-2]
7d3b27200000-7d3b27a01000 rw-p 00000000 00:00 0                          [anon:ev_poll:poll_events-1]
7d3b34200000-7d3b34601000 rw-p 00000000 00:00 0                          [anon:fd:fd_updt-1]
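
The naming scheme visible in the output above can be sketched as follows.
This only shows how the "<type>:<name>-<id>" string is built with
id = tid + 1; it does not name any memory area, and both function names
are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Builds "<type>:<name>-<id>" into <out>.
 * Returns 0 on success, -1 if <out> is too small.
 */
int format_area_name(char *out, size_t outlen,
                     const char *type, const char *name, unsigned int id)
{
    int n;

    assert(out && type && name);
    n = snprintf(out, outlen, "%s:%s-%u", type, name, id);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Per-thread areas use id = tid + 1 to match the thread numbering
 * found in the config file.
 */
int format_thread_area_name(char *out, size_t outlen,
                            const char *type, const char *name,
                            unsigned int tid)
{
    return format_area_name(out, outlen, type, name, tid + 1);
}
```

With type "fd", name "fd_updt" and tid 0, the resulting string is
"fd:fd_updt-1", which is what appears after "anon:" in the first thread's
map.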
2024-05-24 12:07:18 +02:00
Aurelien DARRAGON
9d37c4b989 DEBUG: tools: add vma_set_name_id() helper
Just like vma_set_name() from 51a8f134e ("DEBUG: tools: add vma_set_name()
helper"), but also takes <id> as parameter to append "-$id" suffix after
the name in order to differentiate 2 areas that were named using the same
<type> and <name> combination.

example, using mmap + MAP_SHARED|MAP_ANONYMOUS:
  7364c4fff000-736508000000 rw-s 00000000 00:01 3540  [anon_shmem:type:name-id]
Another example, using mmap + MAP_PRIVATE|MAP_ANONYMOUS or using
glibc/malloc() above MMAP_THRESHOLD:
  7364c4fff000-736508000000 rw-s 00000000 00:01 3540  [anon:type:name-id]
2024-05-24 12:07:13 +02:00
Aurelien DARRAGON
23814a44e5 CLEANUP: tools: fix vma_set_name() function comment
There was a typo in the example provided in vma_set_name(): maps named
using the function will show up as "type:name", not "type.name", updating
the comment to reflect the current behavior.
2024-05-24 12:07:07 +02:00
Willy Tarreau
0bda33a3ec MINOR: stick-tables: remove the unneeded read lock in stksess_free()
During changes made in 2.7 by commits 8d3c3336f9 ("MEDIUM: stick-table:
make stksess_kill_if_expired() avoid the exclusive lock") and 996f1a5124
("MEDIUM: stick-table: do not take a lock to update t->current anymore."),
the operation was done cautiously one baby step at a time and the final
cleanup was not done, as we're keeping a read lock under an atomic dec.
Furthermore there's a pool_free() call under that lock, and we try to
avoid pool_alloc() and pool_free() under locks for their nasty side
effects (e.g. when memory gets recompacted), so let's really drop it
now.

Note that the performance gain is not really perceptible here, it's
essentially for code clarity reasons that this has to be done.
2024-05-24 11:52:57 +02:00
Willy Tarreau
8580f9db20 CLEANUP: stick-tables: remove a few unneeded tests for use_wrlock
Due to the code in stktable_touch_with_exp() being the same as in other
functions previously made around a loop trying first to upgrade a read
lock then to fall back to a direct write lock, there remains a confusing
construct with multiple tests on use_wrlock that is obviously zero when
tested. Let's remove them since the value is known and the loop does not
exist anymore.
2024-05-24 11:52:19 +02:00
Willy Tarreau
77f286e8bc BUG/MEDIUM: stick-tables: make sure never to create two same remote entries
In GH issue #2552, Christian Ruppert reported an increase in crashes
with recent 3.0-dev versions, always related with stick-tables and peers.
One particularity of his config is that it has a lot of peers.

While trying to reproduce, it empirically was found that firing 10 load
generators at 10 different haproxy instances tracking a random key among
100k against a table of max 5k entries, on 8 threads and between a total
of 50 parallel peers managed to reproduce the crashes in seconds, very
often in ebtree deletion or insertion code, but not only.

The debugging revealed that the crashes are often caused by a parent node
being corrupted while delete/insert tries to update it regarding a recently
inserted/removed node, and that that corrupted node had always been proven
to be deleted, then immediately freed, so it ought not be visited in the
tree from functions enclosed between a pair of lock/unlock. As such the
only possibility was that it had experienced unexpected inserts. Also,
running with pool integrity checking would 90% of the time cause crashes
during allocation based on corrupted contents in the node, likely because
it was found at two places in the same tree and still present as a parent
of a node being deleted or inserted (hence the __stksess_free and
stktable_trash_oldest callers being visible on these items).

Indeed the issue is in fact related to the test set (occasionally redundant
keys, many peers). What happens is that sometimes, a same key is learned
from two different peers. When it is learned for the first time, we end up
in stktable_touch_with_exp() in the "else" branch, where the test for
existence is made before taking the lock (since commit cfeca3a3a3
("MEDIUM: stick-table: touch updates under an upgradable read lock") that
was merged in 2.9), and from there the entry is added. But if one of the
threads manages to insert it before the other thread takes the lock, then
the second thread will try to insert this node again. And inserting an
already inserted node will corrupt the tree (note that we never switched
to enforcing a check in insertion code on this due to API history that
would break various code parts).

Here the solution is simple, it requires to recheck leaf_p after getting
the lock, to avoid touching anything if the entry has already been
inserted in the mean time.
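
The pattern is the classical "test, lock, test again" one. The sketch
below is a simplified model and not the stick-table code: it uses a plain
pthread mutex instead of the upgradable read lock, an atomic flag named
in_tree instead of leaf_p, and a counter instead of the real tree
insertion. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct entry {
    atomic_int in_tree; /* non-zero once linked, plays the role of leaf_p */
};

static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static int nb_inserts; /* protected by tree_lock */

/* Returns 1 if this call linked the entry, 0 if it was already linked. */
int insert_once(struct entry *e)
{
    int inserted = 0;

    assert(e);

    /* cheap unlocked test: avoids the lock in the common case */
    if (atomic_load(&e->in_tree))
        return 0;

    pthread_mutex_lock(&tree_lock);
    /* recheck: another thread may have inserted the entry between
     * the test above and the moment the lock was obtained.
     */
    if (!atomic_load(&e->in_tree)) {
        nb_inserts++;                 /* stands in for the tree insertion */
        atomic_store(&e->in_tree, 1);
        inserted = 1;
    }
    pthread_mutex_unlock(&tree_lock);
    return inserted;
}
```

Without the second test, two threads passing the first test at the same
time would both perform the insertion, which is the double insert
described above.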

Many thanks to Christian Ruppert for testing this and for his invaluable
help on this hard-to-trigger issue.

This fix needs to be backported to 2.9.
2024-05-24 11:52:11 +02:00
Christopher Faulet
9938fb9c7a BUG/MEDIUM: stick-tables: Fix race with peers when killing a sticky session
When a sticky session is killed, we must be sure no other entity is still
referencing it. The session's ref_cnt must be 0. However, there is a race
with peers, as described in 21447b1dd4 ("BUG/MAJOR: stick-tables: fix race
with peers in entry expiration"). When the update lock is acquired, we must
recheck the ref_cnt value.

This patch is part of a debugging session about issue #2552. It must be
backported to 2.9.
2024-05-24 11:52:11 +02:00
Christopher Faulet
dfd938bad6 BUG/MEDIUM: stick-tables: Fix race with peers when trashing oldest entries
It is the same race as the one fixed in process_table_expire() (21447b1dd4
["BUG/MAJOR: stick-tables: fix race with peers in entry expiration"]). In
stktable_trash_oldest(), when the update lock is acquired, we must take care
to check again the ref_cnt because some peers may increment it (See commit
above for details).

This patch fixes a crash mentioned in 2552#issuecomment-2110532706. It must
be backported to 2.9.
2024-05-24 11:52:11 +02:00
Willy Tarreau
51f9f6cfd4 BUILD: quic: fix unused variable warning when threads are disabled
The tree variable was introduced in 3.0 by commit dd58dff1e6
("BUG/MEDIUM: quic: QUIC CID removed from tree without locking") which
was marked for backport. The variable is only used for locks.
Let's just mark the variable __maybe_unused for when the code is
built without threads.

The patch above was marked for backport to 2.7 so this should be
backported wherever the fix was backported.
2024-05-24 11:51:41 +02:00
Willy Tarreau
381ed2a4dd MINOR: config: add thread-hard-limit to set an upper bound to nbthread
On today's large systems, it's not always desired to run on all threads
for light loads, and usually users enforce nbthread to a lower value
(e.g. 8). The problem is that this is a fixed value, and moving such
configs to smaller machines continues to enforce the value and this
becomes extremely unproductive due to having more threads than CPUs.
This also happens quite a bit in VMs, containers, or cloud instances
of various sizes.

This commit introduces the thread-hard-limit setting that allows to only
set an upper bound to the number of threads without raising a lower value.
This means that using "thread-hard-limit 8" will make sure that no more
than 8 threads will be used when available, but it will remain two when
run on a dual-core machine.
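
In other words the setting acts as a cap and never as a target. A minimal
sketch of that semantic, assuming <available> is the thread count haproxy
would otherwise pick and that 0 means the setting is absent (both the
function name and that convention are made up for this example):

```c
/* Returns the number of threads to start: <available> capped by
 * <hard_limit>. A <hard_limit> of 0 means no limit was configured.
 */
unsigned int bounded_threads(unsigned int available, unsigned int hard_limit)
{
    if (hard_limit && available > hard_limit)
        return hard_limit;
    return available;
}
```

With a limit of 8, a 64-thread machine runs 8 threads and a dual-core
machine still runs 2.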
2024-05-24 09:46:49 +02:00
William Lallemand
9c1fa3e411 CI: scripts/build-ssl: cleanup the boringssl and quictls build
Put the quictls and boringssl build in their own function instead of
keeping it in the main part of the script.
2024-05-23 16:54:30 +02:00
William Lallemand
5d73643ca3 CI: scripts/build-ssl: add a DESTDIR and TMPDIR variable
Add DESTDIR and TMPDIR variables so the build-ssl.sh script can be used as a
generic SSL lib installer outside the CI.

The variables are prefixed with BUILDSSL so they don't collide with the
makefile ones.

Ex:

  OPENSSL_VERSION=3.2.0 BUILDSSL_DESTDIR=/opt/openssl-3.2.0/ ./scripts/build-ssl.sh
  WOLFSSL_VERSION=5.7.0 BUILDSSL_DESTDIR=/opt/wolfssl-5.7.0/ ./scripts/build-ssl.sh
2024-05-23 15:34:59 +02:00
Christopher Faulet
d11249f292 MINOR: mux-quic: Set abort info for SC-less QCS on STOP_SENDING frame
It is a revert of cc9827bb09 ("BUG/MEDIUM: mux-quic: fix crash on
STOP_SENDING received without SD"). This fix was based on a wrong assumption
about QUIC streams that may have no stream-endpoint descriptor. However, it
must never happen. And this was fixed. So we can now safely revert the
commit above. However, it is not a bugfix because, for now, abort info are
only used by the upper layer. So it is not a big deal to not set it when
there is no SC.
2024-05-23 11:18:19 +02:00
Christopher Faulet
086e51017e BUG/MEDIUM: mux-quic: Create sedesc at the same time as the QUIC stream
Recent changes to save abort reason revealed an issue during the QUIC stream
creation. Indeed, by design, when a mux stream is created, it must always
have a valid stream-endpoint descriptor and it must remain valid till the
mux stream destruction. On frontend side, it is the multiplexer
responsibility to create it and set it as orphan. On the backend side, the
sedesc is provided by the upper layer. It is the sedesc of the back
stream-connector.

For the QUIC multiplexer, the stream-endpoint descriptor was only created
when the stream-connector was created and attached on it. It is unexpected
and some bugs may be introduced because there is no valid sedesc on a QUIC
stream. And a recent bug was introduced for this reason.

This patch must be backported as far as 2.6.
2024-05-23 11:18:06 +02:00
Ilia Shipitsin
4a968d9d27 CI: scripts/build-ssl.sh: loudly fail on unsupported platforms
2024-05-22 16:52:43 +02:00
Willy Tarreau
c7335d55f8 BUG/MEDIUM: quic_tls: prevent LibreSSL < 4.0 from negotiating CHACHA20_POLY1305
As diagnosed in GH issue #2569, there's currently an issue in LibreSSL's
CHACHA20 in-place implementation that makes haproxy discard incoming QUIC
packets encrypted with it. It's not very easy to observe the issue because:
  - QUIC recommends that CHACHA20 is used in priority
  - on x86 with AES-NI, LibreSSL prefers AES-GCM for performance
    reasons, so the problem is only observed there if a client
    explicitly forces TLS_CHACHA20_POLY1305_SHA256 only.
  - discarded packets cause retransmits showing some apparent activity,
    and the handshake succeeds so it's not easy to analyze from the
    client which thinks that the server is slow to respond.

Thus in practice, on non-x86 machines running LibreSSL, requests made over
QUIC freeze for a long time, unless the client explicitly forces algos
excluding TLS_CHACHA20_POLY1305_SHA256. That's typically the case by
default on modern OpenBSD systems, and was reported in the issue above
for an arm64 machine running OpenBSD -current, and was also observed on a
mips64 one running OpenBSD 7.5.

There is no simple solution to this problem due to some of the protocol's
constraints without digging too low into the stack (and risking to break
more). Here we're taking a pragmatic approach consisting in making the
connection fail hard when TLS_CHACHA20_POLY1305_SHA256 is selected,
regardless of the availability of other ciphers. This means that every
time a connection would have hung, instead it will fail fast, allowing
the client to retry over TLS/TCP.

Theo Buehler recommends that we limit this protection to all LibreSSL
versions before 4.0 since it's where the fix will be implemented. Older
stable versions will just see TLS_CHACHA20_POLY1305_SHA256 disabled,
which should be sufficient to make QUIC work there again as well.

The following config is sufficient to reproduce the issue (on a non-x86
machine, both arm64 & mips64 were confirmed to reproduce it):

    global
        limited-quic

    frontend stats
        mode http
        #bind :8181
        #bind :8443 ssl crt rsa+dh2048.pem
        bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
        timeout client 5s
        stats uri /

And the following commands will trigger the problem on affected LibreSSL
versions:
  curl --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -v --http3 -k https://127.0.0.1:8443/
  curl -v --http3 -k https://127.0.0.1:8443/

while these ones must work:
  curl --tls13-ciphers TLS_AES_128_GCM_SHA256 -v --http3 -k https://127.0.0.1:8443/
  curl --tls13-ciphers TLS_AES_256_GCM_SHA384 -v --http3 -k https://127.0.0.1:8443/

Normally all of them will work with LibreSSL 4, and only the first one
should fail with stable LibreSSL versions higher than 3.9.2. An haproxy
version without this workaround will show an unresponsive command after
the GET is sent, while a version with the workaround will close the
connection on error. On a version with this workaround, if TCP listeners
are uncommented, curl will automatically fall back to TCP and attempt
the request again over HTTP/2. Finally, on OpenSSL 1.1.1 in compat mode
(hence the limited-quic option above) all of them must work.

Many thanks to github user @lgv5 for the detailed report, tests, and
for spotting the issue, and to @botovq (Theo Buehler) for the quick
analysis, patch and help on this workaround.

This needs to be backported to versions 2.6 and above.
2024-05-22 16:22:22 +02:00
William Lallemand
0182f6bbb6 REGTESTS: scripts: allow to change the vtest timeout
$ make reg-tests VTEST_TIMEOUT=5

Allow to change the timeout of the regtests with the VTEST_TIMEOUT
variable. The default value is still 10.
2024-05-22 15:43:53 +02:00
Frederic Lecaille
169fc0b771 BUG/MAJOR: quic: Crash with TLS_AES_128_CCM_SHA256 (libressl only)
At least the 3.9.0 version of the libressl TLS stack does not behave like other
stacks such as quictls, which make SSL_do_handshake() return an error when no
cipher could be negotiated, in addition to emitting a TLS alert (0x28). This is
the case when TLS_AES_128_CCM_SHA256 is forced as the TLS1.3 cipher from the
client side. This makes haproxy enter a code path which leads to a crash as
follows:

[Switching to Thread 0x7ffff76b9640 (LWP 23902)]
0x0000000000487627 in quic_tls_key_update (qc=qc@entry=0x7ffff00371f0) at src/quic_tls.c:910
910             struct quic_kp_trace kp_trace = {
(gdb) list
905     {
906             struct quic_tls_ctx *tls_ctx = &qc->ael->tls_ctx;
907             struct quic_tls_secrets *rx = &tls_ctx->rx;
908             struct quic_tls_secrets *tx = &tls_ctx->tx;
909             /* Used only for the traces */
910             struct quic_kp_trace kp_trace = {
911                     .rx_sec = rx->secret,
912                     .rx_seclen = rx->secretlen,
913                     .tx_sec = tx->secret,
914                     .tx_seclen = tx->secretlen,
(gdb) p qc
$1 = (struct quic_conn *) 0x7ffff00371f0
(gdb) p qc->ael
$2 = (struct quic_enc_level *) 0x0
(gdb) bt
 #0  0x0000000000487627 in quic_tls_key_update (qc=qc@entry=0x7ffff00371f0) at src/quic_tls.c:910
 #1  0x000000000049bca9 in qc_ssl_provide_quic_data (len=268, data=<optimized out>, ctx=0x7ffff0047f80, level=<optimized out>, ncbuf=<optimized out>) at src/quic_ssl.c:617
 #2  qc_ssl_provide_all_quic_data (qc=qc@entry=0x7ffff00371f0, ctx=0x7ffff0047f80) at src/quic_ssl.c:688
 #3  0x00000000004683a7 in quic_conn_io_cb (t=0x7ffff0047f30, context=0x7ffff00371f0, state=<optimized out>) at src/quic_conn.c:760
 #4  0x000000000063cd9c in run_tasks_from_lists (budgets=budgets@entry=0x7ffff76961f0) at src/task.c:596
 #5  0x000000000063d934 in process_runnable_tasks () at src/task.c:876
 #6  0x0000000000600508 in run_poll_loop () at src/haproxy.c:3073
 #7  0x0000000000600b67 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3287
 #8  0x00007ffff7f6ae45 in start_thread () from /lib64/libpthread.so.0
 #9  0x00007ffff78254af in clone () from /lib64/libc.so.6

When a TLS alert is emitted, haproxy calls quic_set_connection_close() which sets
the QUIC_FL_CONN_IMMEDIATE_CLOSE connection flag. It is this flag which is tested
by this patch to make the handshake fail even if SSL_do_handshake() does not
return an error. This test is specific to libressl and never runs with
other TLS stacks.

Thank you to @lgv5 and @botovq for having reported this issue in GH #2569.

Must be backported as far as 2.6.
2024-05-22 15:21:55 +02:00
Valentine Krasnobaeva
0e93549d2a MINOR: proto: fix coding style
Remove redundant brackets for 'if' statements that contain only one
instruction.
2024-05-22 12:00:11 +02:00
Valentine Krasnobaeva
83ab1479d0 BUG/MINOR: sock: fix sock_create_server_socket
Set stream_err value as SF_ERR_NONE, if obtained socket fd has passed all
common runtime and configuration related checks.

'.connect()' method implementation in higher protocol layers requires Stream
Error Flag as the return value. So, at the socket layer, we need to pass to
sock_create_server_socket() a variable to set this flag, because syscalls and
some socket options checks are convenient to perform at the socket layer.
2024-05-22 11:59:55 +02:00
Willy Tarreau
5b9503ed33 MINOR: traces: enumerate the list of levels/verbosities when not found
It's quite frustrating, particularly on the command line, not to have
access to the list of available levels and verbosities when one does
not exist for a given source, because there's no easy way to find them
except by starting without and connecting to the CLI. Let's enumerate
the list of supported levels and verbosities when a name does not match.

For example:

  $ ./haproxy -db -f quic-repro.cfg -dt h2:help
  [NOTICE]   (9602) : haproxy version is 3.0-dev12-60496e-27
  [NOTICE]   (9602) : path to executable is ./haproxy
  [ALERT]    (9602) : -dt: no such trace level 'help', available levels are 'error', 'user', 'proto', 'state', 'data', and 'developer'.

  $ ./haproxy -db -f quic-repro.cfg -dt h2:user:help
  [NOTICE]   (9604) : haproxy version is 3.0-dev12-60496e-27
  [NOTICE]   (9604) : path to executable is ./haproxy
  [ALERT]    (9604) : -dt: no such trace verbosity 'help' for source 'h2', available verbosities for this source are: 'quiet', 'clean', 'minimal', 'simple', 'advanced', and 'complete'.

The same is done for the CLI where the existing help message is always
displayed when entering an invalid verbosity or level.
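
The approach can be sketched as a table lookup which, on a miss, lets the
caller build the list of valid names for the error message. The level
names are taken from the alert shown above; find_level() and
list_levels() are hypothetical names and not the functions of trace.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Level names as listed in the alert message above. */
static const char *const trace_levels[] = {
    "error", "user", "proto", "state", "data", "developer"
};
#define NB_LEVELS (sizeof(trace_levels) / sizeof(trace_levels[0]))

/* Returns the index of <name> in the table, or -1 when it is unknown. */
int find_level(const char *name)
{
    size_t i;

    assert(name);
    for (i = 0; i < NB_LEVELS; i++)
        if (strcmp(trace_levels[i], name) == 0)
            return (int)i;
    return -1;
}

/* Writes "'a', 'b', ..., and 'z'" into <out>.
 * Returns 0 on success, -1 if <out> is too small.
 */
int list_levels(char *out, size_t outlen)
{
    size_t pos = 0, i;

    assert(out);
    for (i = 0; i < NB_LEVELS; i++) {
        const char *sep = (i == 0) ? "" :
                          (i == NB_LEVELS - 1) ? ", and " : ", ";
        int n = snprintf(out + pos, outlen - pos, "%s'%s'", sep,
                         trace_levels[i]);

        if (n < 0 || (size_t)n >= outlen - pos)
            return -1;
        pos += (size_t)n;
    }
    return 0;
}
```

A caller would try find_level() first and only call list_levels() to
complete the alert when the lookup returns -1.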
2024-05-22 11:17:57 +02:00
Amaury Denoyelle
60496e884e MINOR: connection: support PROXY v2 TLV emission without stream
Update API for PROXY protocol header encoding. Previously, it required the
stream parameter to be set. Change make_proxy_line() and associated
functions to add an extra session parameter. This is useful in context
where no stream is instantiated. For example, this is the case for rhttp
preconnect.

This change allows to extend PROXY v2 TLV encoding. Replace
build_logline() which requires a stream instance and call directly
sess_build_logline().

Note that stream parameter is kept as it is necessary for unique ID
encoding.

This change has no functional change for standard connections. However,
it is necessary to support TLV encoding on rhttp preconnect.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
7a81bfc8d2 MINOR: rhttp: support PROXY emission on preconnect
Extend preconnect to support PROXY protocol emission. Code is duplicated
from connect_server() into new_reverse_conn(). This is necessary to
support send-proxy on server line used as rhttp.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
12c40c25a9 MEDIUM: rhttp: create session for active preconnect
Modify rhttp preconnect by instantiating a new session for each
connection attempt. Connection is thus linked to a session directly on
its instantiation contrary to previously where no session existed until
listener_accept().

This patch will allow to extend rhttp usage. Most notably, it will be
useful to use various sample fetches on the server line and extend
logging capabilities.

Changes are minimal, yet consequences are considered not trivial as for
the first time a FE connection session is instantiated before
listener_accept(). This requires an extra explicit check in
session_accept_fd() to not overwrite an existing session. Also, flag
SESS_FL_RELEASE_LI is not set immediately as listener counters must not
be decremented if connection and its session are freed before reversal
is completed, or else listener counters will be invalid.

conn_session_free() is used as connection destroy callback to ensure the
session will be freed automatically on connection release.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
45b80aed70 MINOR: session: define flag to explicitly release listener on free
When a session is allocated for a FE connection, session_free() is
responsible to call listener_release() to decrement listener connection
counters and resume listening.

Until now, <listener> member of session was tested inside session_free()
before invoking listener_release(). To highlight more explicitly the
relation between sessions and listeners, introduce a new flag
SESS_FL_RELEASE_LI. Only sessions with such flag set will invoke
listener_release() on their cleanup. Flag is set inside
session_accept_fd() on success.

This patch has no functional change. However, it will be useful to
implement session creation for rHTTP preconnect.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
808daa7cfb BUG/MINOR: rhttp: fix task_wakeup state
TASK_WOKEN_ANY was incorrectly used as argument to task_wakeup() for
rhttp preconnect task. This value is used as a flag. Replace it by
proper individual values. This is labelled as a bug but it has no known
impact.

This should be backported up to 2.9.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
2770ef352e BUG/MINOR: rhttp: prevent listener suspend
Ensure "disable frontend" on a reverse HTTP listener is forbidden by
returning -1 on suspend callback. Suspending such a listener has unknown
effect and so is not properly implemented for now.

This should be backported up to 2.9.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
ceebb09744 BUG/MEDIUM: rhttp: fix preconnect on single-thread
On initialization of a rhttp bind, the first thread available on the
listener is selected to execute the first occurrence of the preconnect
task.

This thread selection was incorrect as it used my_ffsl(), which returns
a value indexed from 1, contrary to tid values which are indexed from 0.
This causes the first listener thread to be skipped in favor of the
second one. Worse, if haproxy runs in single-thread mode, the calculated
thread ID is invalid and the task never runs, which prevents any
preconnect execution.

Fix this by subtracting 1 from the result of my_ffsl() to have a value
indexed from 0.

This must be backported up to 2.9.
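
As an illustration of the off-by-one described above, here is a minimal
sketch. It uses POSIX ffs() as a stand-in for my_ffsl(), and
first_thread_id() is a hypothetical helper name, not the function touched
by this patch:

```c
#include <assert.h>
#include <strings.h>   /* ffs() */

/* Hypothetical helper: return the 0-based ID of the first thread set in
 * <mask>. ffs() returns a 1-based bit position (0 when no bit is set), so
 * 1 must be subtracted to obtain a usable thread ID.
 */
static int first_thread_id(unsigned int mask)
{
	assert(mask != 0);   /* at least one thread must be bound */
	return ffs((int)mask) - 1;
}
```

With a single-thread mask of 0x1, ffs() returns 1, which is not a valid
thread ID when only thread 0 exists; the subtraction yields 0 as expected.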
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
4f80543220 MINOR: rhttp: add log on connection allocation failure
Add an error log when new_reverse_conn() fails. This may help to
diagnose future issues on reverse HTTP.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
3efd9f3925 BUG/MINOR: server: free PROXY v2 TLVs on srv drop
Dynamically allocated servers PROXY TLVs were not freed on server
release. This patch fixes this leak by extending srv_free_params().
Every server line with set-proxy-v2-tlv-fmt keyword is impacted.

For static servers, issue is minimal as it will only cause leak on
deinit(). However, this could be aggravated when performing multiple
removal of dynamic servers.

This should be backported up to 2.9.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
8b72270e95 BUG/MINOR: connection: parse PROXY TLV for LOCAL mode
conn_recv_proxy() is responsible to parse PROXY protocol header. For v2
of the protocol, TLVs parsing is implemented. However, this step was
only done inside 'PROXY' command label. TLVs were never extracted for
'LOCAL' command mode.

Fix this by extracting TLV parsing loop outside of the switch case. Of
notable importance, tlv_offset is updated on LOCAL label to point to
first TLV location.

This bug should be backported up to 2.9 at least. It should even
probably be backported to every stable versions. Note however that this
code has changed much over time. It may be useful to use option
'--ignore-all-space' to have a clearer overview of the git diff.
2024-05-22 10:01:57 +02:00
Christopher Faulet
eb89a7da33 MAJOR: spoe: Let the SPOE back into the game
This reverts commits 885e40494c5de6aee841222496d84dc718401fa0 and
dff98071888ae06dcec0a6c3a9222e76e893305d.

We decided to spend some time to refactor and rationalize the SPOE for the
3.1. Thus there is no reason to still consider it as deprecated for the
3.0. Compatibility between both versions will be maintained.

See #2502 for more info.
2024-05-22 09:04:38 +02:00
Christopher Faulet
746e6f8597 BUG/MINOR: http-ana: Don't crush stream termination condition on internal error
When internal error is reported from an HTTP analyzer, we must take care to
not set the stream termination condition if it was already set. For
instance, it happens when a message rewrite fails. In this case
SF_ERR_PXCOND is set by the rule. The HTTP analyzer must not crush it with
SF_ERR_INTERNAL.

The regression was introduced with the commit 0fd25514d6 ("MEDIUM: http-ana:
Set termination state before returning haproxy response").

The bug was discovered working in the issue #2568. It must be backported to
2.9.
2024-05-22 09:04:38 +02:00
Valentine Krasnobaeva
39caa20b3c MINOR: sock: set conn->err_code in case of EPERM
To improve the readability of sock_handle_system_err(), let's
set explicitly conn->err_code as CO_ER_SOCK_ERR in case of EPERM
(could be returned by setns syscall).
2024-05-21 20:14:31 +02:00
Valentine Krasnobaeva
5f713c03be BUG/MEDIUM: proto: fix fd leak in <proto>_connect_server
This fixes the fd leak, introduced in the commit d3fc982cd788
("MEDIUM: proto: make common fd checks in sock_create_server_socket").

Initially sock_create_server_socket() was designed to return only created
socket FD or -1. Its callers from upper protocol layers were required to test
the returned errno and were required then to apply different configuration
related checks to obtained positive sock_fd. A lot of this code was duplicated
among protocols implementations.

The new refactored version of sock_create_server_socket() gathers in one place
all duplicated checks, but in order to be compliant with upper protocol
layers, it needs the 3rd parameter: 'stream_err', in which it sets the
Stream Error Flag for upper levels, if the obtained sock_fd has passed all
additional checks.

No backport needed since this was introduced in 3.0-dev10.
2024-05-21 20:14:05 +02:00
William Lallemand
04a42a92f4 DOC: configuration: add the supported crt-store options in crt-list
The crt-list supports some crt-store keywords. This patch list them in
the crt-list documentation.
2024-05-21 18:30:45 +02:00
William Lallemand
e732de7db2 DOC: configuration: update the crt-list documentation
Update the crt-list documentation with the supported keywords.

Also format it in a more clear way.

Must be backported to 2.8.
2024-05-21 18:30:45 +02:00
William Lallemand
e6657fd108 MEDIUM: ssl: don't load file by discovering them in crt-store
In commit 55e9e9591 ("MEDIUM: ssl: temporarily load files by detecting
their presence in crt-store"), ssl_sock_load_pem_into_ckch() was
replaced by ssl_sock_load_files_into_ckch() in the crt-store loading.

But the side effect was that we always try to autodetect, and this is
not what we want. This patch reverts this, and adds specific code in the
crt-list loading, so we can autodetect in crt-list like it was done
before, but still try to load files when a crt-store filename keyword is
specified.

Example:

These crt-list lines won't autodetect files:

    foobar.crt [key foobar.key issuer foobar.issuer ocsp-update on] *.foo.bar
    foobar.crt [key foobar.key] *.foo.bar

These crt-list lines will autodetect files:

    foobar.pem [ocsp-update on] *.foo.bar
    foobar.pem
2024-05-21 18:30:45 +02:00
Aurelien DARRAGON
22ec2ad8b0 DEBUG: fd: add name hint for large memory areas
Thanks to ("MINOR: tools: add vma_set_name() helper"), set a name hint
for large arrays created by fd api (fdtab arrays and so on) so that
they can be easily identified in /proc/<pid>/maps.

Depending on malloc() implementation, such memory areas will normally be
merged on the heap under MMAP_THRESHOLD (128 kB by default) and will
have a dedicated memory area once the threshold is exceeded. As such, when
large enough, they will appear like this in /proc/<pid>/maps:

7b8e83200000-7b8e84201000 rw-p 00000000 00:00 0                          [anon:fd:fdinfo]
7b8e84400000-7b8e85401000 rw-p 00000000 00:00 0                          [anon:fd:polled_mask]
7b8e85600000-7b8e89601000 rw-p 00000000 00:00 0                          [anon:fd:fdtab_addr]
7b8e90a00000-7b8e90e01000 rw-p 00000000 00:00 0                          [anon:fd:fd_updt]
2024-05-21 17:55:29 +02:00
Aurelien DARRAGON
9424e5a06f DEBUG: errors: add name hint for startup-logs memory area
Thanks to ("MINOR: tools: add vma_set_name() helper"), set a name hint
for startup-logs ring's memory area created using mmap() so it can be
easily identified in /proc/<pid>/maps.

7b8e91cce000-7b8e91cde000 rw-s 00000000 00:19 46                         [anon_shmem:errors:startup_logs]
2024-05-21 17:55:20 +02:00
Aurelien DARRAGON
abb8412d20 DEBUG: pollers: add name hint for large memory areas used by pollers
Thanks to ("MINOR: tools: add vma_set_name() helper"), set a name hint
for large memory areas allocated by pollers upon init so that they can
be easily identified in /proc/<pid>/maps.

For now, only linux-compatible pollers are considered since vma_set_name()
requires a recent linux kernel (>= 5.17).

Depending on malloc() implementation, such memory areas will normally be
merged on the heap under MMAP_THRESHOLD (128 kB by default) and will
have a dedicated memory area once the threshold is exceeded. As such, when
large enough, they will appear like this in /proc/<pid>/maps:

7ec6b2d40000-7ec6b2d61000 rw-p 00000000 00:00 0                          [anon:ev_poll:fd_evts_wr]
7ec6b2d61000-7ec6b2d82000 rw-p 00000000 00:00 0                          [anon:ev_poll:fd_evts_rd]
2024-05-21 17:55:14 +02:00
Aurelien DARRAGON
6c5869f846 DEBUG: sink: add name hint for memory area used by memory-backed sinks
Thanks to ("MINOR: tools: add vma_set_name() helper"), set a name hint
for user created memory-backed sinks (ring sections without backing-file)
so that they can be easily identified in /proc/<pid>/maps.

Depending on malloc() implementation, such memory areas will normally be
merged on the heap under MMAP_THRESHOLD (128 kB by default) and will
have a dedicated memory area once the threshold is exceeded. As such, when
large enough, they will appear like this in /proc/<pid>/maps:

7b8e8ac00000-7b8e8bf13000 rw-p 00000000 00:00 0                          [anon:ring:myring]
2024-05-21 17:55:09 +02:00
Aurelien DARRAGON
6de0da1b54 DEBUG: shctx: name shared memory using vma_set_name()
In 98d22f212 ("MEDIUM: shctx: Naming shared memory context"), David
implemented prctl/PR_SET_VMA support to give a name to shctx maps when
supported. Maps were named after "HAProxy $name". It turns out that it
is not relevant to include "HAProxy" in the map name, given that we're
already looking at maps for a given PID (and here it's HAProxy's pid).

Instead, let's name shctx maps by making use of the new vma_set_name()
helper introduced by previous commit. Resulting maps will be named
"shctx:$name", e.g.: "shctx:globalCache", they will appear like this in
/proc/<pid>/maps:

7ec6aab0f000-7ec6ac000000 rw-s 00000000 00:01 405                        [anon_shmem:shctx:custom_name]
2024-05-21 17:55:03 +02:00
Aurelien DARRAGON
51a8f134ef DEBUG: tools: add vma_set_name() helper
Following David Carlier's work in 98d22f21 ("MEDIUM: shctx: Naming shared
memory context"), let's provide a helper function to set a name hint on
a virtual memory area (ie: anonymous map created using mmap(), or memory
area returned by malloc()).

Naming will only occur if available, and naming errors will be ignored.
The function takes mandatory <type> and <name> parameters to build the
map name as follows: "type:name". When looking at /proc/<pid>/maps, vma
named using this helper function will show up this way (provided that
the kernel has prctl support for PR_SET_VMA_ANON_NAME):

example, using mmap + MAP_SHARED|MAP_ANONYMOUS:
  7364c4fff000-736508000000 rw-s 00000000 00:01 3540  [anon_shmem:type:name]
Another example, using mmap + MAP_PRIVATE|MAP_ANONYMOUS or using
glibc/malloc() above MMAP_THRESHOLD:
  7364c4fff000-736508000000 rw-s 00000000 00:01 3540  [anon:type:name]
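
The naming scheme described above can be sketched as follows. This is not
the helper added by the patch: build_vma_name() and name_area() are
hypothetical names, the fallback constant values are assumed to match
linux/prctl.h, and the 80-byte buffer is an assumption about the kernel's
name length limit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/prctl.h>

/* Fallback definitions for older headers (assumed values from linux/prctl.h). */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/* Build "type:name" into <dst>. Returns the length, or -1 if truncated. */
static int build_vma_name(char *dst, size_t size, const char *type, const char *name)
{
	int len;

	assert(dst && type && name);
	len = snprintf(dst, size, "%s:%s", type, name);
	return (len < 0 || (size_t)len >= size) ? -1 : len;
}

/* Best-effort naming of the area [addr, addr + size): any error (kernel
 * without support, unaligned address, ...) is silently ignored.
 */
static void name_area(void *addr, size_t size, const char *type, const char *name)
{
	char buf[80];   /* assumed kernel limit, NUL included */

	if (build_vma_name(buf, sizeof(buf), type, name) < 0)
		return;
	(void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
	            (unsigned long)addr, (unsigned long)size, (unsigned long)buf);
}
```

Ignoring the prctl() return value mirrors the "naming errors will be
ignored" behaviour stated in the message: the name is only a debugging
hint and must never affect the allocation itself.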
2024-05-21 17:54:58 +02:00
William Lallemand
4bb6ea5d00 DOC: configuration: rework the crt-store load documentation
The load keyword from the documentation has its own section to be
readable (like the server or bind options section).

The ocsp-update keyword was moved from the bind section to the crt-list
load one.
2024-05-21 12:00:55 +02:00
Aurelien DARRAGON
0cfbeb1ae8 BUG/MINOR: ring: free ring's allocated area not ring's usable area when using maps
Since 40d1c84bf0 ("BUG/MAJOR: ring: free the ring storage not the ring
itself when using maps"), munmap() call for startup_logs's ring and
file-backed rings fails to work (EINVAL) and causes memory leaks during
process cleanup.

munmap() fails because it is called with the ring's usable area pointer
which is an offset from the underlying original memory block allocated
using mmap(). Indeed, ring_area() helper function was misused because
it didn't explicitly mention that the returned address corresponds to
the usable storage's area, not the allocated one.

To fix the issue, we add an explicit ring_allocated_area() helper to
return the allocated area for the ring, just like we already have
ring_allocated_size() for the allocated size, and we properly use both
the allocated size and allocated area to manipulate them using munmap()
and msync().

No backport needed.
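
The difference between the allocated area and the usable area can be
sketched as below. The structure, the HDR_SIZE value and the function
names are hypothetical and only illustrate why munmap() must receive the
address originally returned by mmap():

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define HDR_SIZE 64   /* assumed size of the header stored before the data */

struct mapped_ring {
	void  *base;         /* address returned by mmap() */
	size_t total_size;   /* size passed to mmap(), header included */
};

/* Usable storage: an offset inside the mapping, never valid for munmap(). */
static void *ring_usable_area(const struct mapped_ring *r)
{
	return (char *)r->base + HDR_SIZE;
}

static int ring_map(struct mapped_ring *r, size_t usable_size)
{
	assert(r != NULL);
	r->total_size = usable_size + HDR_SIZE;
	r->base = mmap(NULL, r->total_size, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (r->base == MAP_FAILED) {
		r->base = NULL;
		return -1;
	}
	return 0;
}

/* Release the mapping using the allocated base and the allocated size. */
static int ring_unmap(struct mapped_ring *r)
{
	int ret = munmap(r->base, r->total_size);

	if (ret == 0)
		r->base = NULL;
	return ret;
}
```

Passing ring_usable_area() to munmap() instead would hand it an address
that is not page-aligned, which is the EINVAL failure mentioned above.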
2024-05-21 11:42:35 +02:00
William Lallemand
d74ba7cc24 MINOR: ssl: check parameter in ckch_conf_cmp()
Check prev and new parameters in ckch_conf_cmp() so we don't dereference
a NULL ptr. There is no risk since it's not used with a NULL ptr yet.

Also remove the checks that are done later, and do them at the beginning
of the function.

Should fix issue #2572.
2024-05-21 11:09:59 +02:00
William Lallemand
140078c19d CLEANUP: ssl/cli: remove unused code in dump_crtlist_conf
This code was never used because space is never defined before:

    if (space) chunk_appendf(buf, " ");

Should fix issue #2571.
2024-05-21 10:58:09 +02:00
Willy Tarreau
d236b43da7 [RELEASE] Released version 3.0-dev12
Released version 3.0-dev12 with the following main changes :
    - CI: drop asan.log umbrella completely
    - BUG/MINOR: log: fix leak in add_sample_to_logformat_list() error path
    - BUG/MINOR: log: smp_rgs array issues with inherited global log directives
    - MINOR: rhttp: Don't require SSL when attach-srv name parsing
    - REGTESTS: ssl: be more verbose with ocsp_compat_check.vtc
    - DOC: Update UUID references to RFC 9562
    - MINOR: hlua: add hlua_nb_instruction getter
    - MEDIUM: hlua: take nbthread into account in hlua_get_nb_instruction()
    - BUG/MEDIUM: server: clear purgeable conns before server deletion
    - BUG/MINOR: mux-quic: fix error code on shutdown for non HTTP/3
    - BUG/MINOR: qpack: fix error code reported on QPACK decoding failure
    - BUG/MEDIUM: htx: mark htx_sl as packed since it may be realigned
    - BUG/MEDIUM: stick-tables: properly mark stktable_data as packed
    - SCRIPTS: run-regtests: fix a few occurrences of extended regexes
    - BUG/MINOR: ssl_sock: fix xprt_set_used() to properly clear the TASK_F_USR1 bit
    - MINOR: dynbuf: provide a b_dequeue() variant for multi-thread
    - BUG/MEDIUM: muxes: enforce buf_wait check in takeover()
    - BUG/MINOR: h1: Check authority for non-CONNECT methods only if a scheme is found
    - BUG/MEDIUM: h1: Reject CONNECT request if the target has a scheme
    - BUG/MAJOR: h1: Be stricter on request target validation during message parsing
    - MINOR: qpack: prepare error renaming
    - MINOR: h3/qpack: adjust naming for errors
    - MINOR: h3: adjust error reporting on sending
    - MINOR: h3: adjust error reporting on receive
    - MINOR: mux-quic: support glitches
    - MINOR: h3: report glitch on RFC violation
    - BUILD: stick-tables: better mark the stktable_data as 32-bit aligned
    - MINOR: ssl: rename tune.ssl.ocsp-update.mode in ocsp-update.mode
    - REGTESTS: update the ocsp-update tests
    - BUILD: stats: remove non portable getline() usage
    - MEDIUM: ssl: add ocsp-update.mindelay and ocsp-update.maxdelay
    - BUILD: log: get rid of non-portable strnlen() func
    - BUG/MEDIUM: fd: prevent memory waste in fdtab array
    - CLEANUP: compat: make the MIN/MAX macros more reliable
    - Revert: MEDIUM: evports: permit to report multiple events at once"
    - BUG/MINOR: stats: Don't state the 303 redirect response is chunked
    - MINOR: mux-h1: Add a flag to ignore the request payload
    - REORG: mux-h1: Group H1S_F_BODYLESS_* flags
    - CLEANUP: mux-h1: Remove unused H1S_F_ERROR_MASK mask value
    - MEDIUM: mux-h1: Support C-L/T-E header suppressions when sending messages
    - MINOR: ssl: ckch_store_new_load_files_conf() loads filenames from ckch_conf
    - MEDIUM: ssl/crtlist: loading crt-store keywords from a crt-list
    - CLEANUP: ssl/ocsp: remove the deprecated parsing code for "ocsp-update"
    - MINOR: ssl: pass ckch_store instead of ckch_data to ssl_sock_load_ocsp()
    - MEDIUM: ssl: ckch_conf_parse() uses -1/0/1 for off/default/on
    - MINOR: ssl: handle PARSE_TYPE_INT and PARSE_TYPE_ONOFF in ckch_store_load_files()
    - MINOR: ssl/ocsp: use 'ocsp-update' in crt-store
    - MINOR: ssl: ckch_conf_clean() utility function for ckch_conf
    - MEDIUM: ssl: add ocsp-update.disable global option
    - MEDIUM: ssl/cli: handle crt-store keywords in crt-list over the CLI
    - MINOR: ssl: ckch_conf_cmp() compare multiple ckch_conf structures
    - MEDIUM: ssl: temporarily load files by detecting their presence in crt-store
    - REGTESTS: ocsp-update: change the reg-test to support the new crt-store mode
    - DOC: capabilities: fix chapter header rendering
2024-05-18 16:51:23 +02:00
Valentine Krasnobaeva
63bed0161d DOC: capabilities: fix chapter header rendering
The header of a new management guide chapter, "13.1. Linux capabilities
support", is not rendered in HTML format in a proper way, because of missing
dots at the end of this chapter's number.
2024-05-18 16:48:20 +02:00
William Lallemand
d33a5f8e14 REGTESTS: ocsp-update: change the reg-test to support the new crt-store mode
Update the ocsp-update tests for the recent changes:

- Incompatibilities check string changed to match the crt-store one
- The "good configurations" are not good anymore because the
  ckch_conf_cmp() does not compare anymore with a global value.
2024-05-17 17:35:51 +02:00
William Lallemand
55e9e95914 MEDIUM: ssl: temporarily load files by detecting their presence in crt-store
crt-store is meant to be stricter than your common crt argument on a
bind line, and is supposed to be a declarative format.

However, since the 'ocsp-update' was migrated from ssl_conf to
ckch_conf, the .issuer file is not autodetected anymore when adding a
ocsp-update keyword in a crt-list file, which breaks retro-compatibility.

This patch is a quick fix that will disappear once we are able to be
strict on a crt-store and autodetect on a crt-list.
2024-05-17 17:35:51 +02:00
William Lallemand
58103bc8e6 MINOR: ssl: ckch_conf_cmp() compare multiple ckch_conf structures
The ckch_conf_cmp() function allows comparing multiple ckch_conf
structures in order to check that multiple usages of the same crt in the
configuration use the same ckch_conf definition.

A crt-list allows to use "crt-store" keywords that define a ckch_store,
which can lead to inconsistencies when a crt is called multiple times
with different parameters.

This function compares and dumps a list of differences in the err variable
to be output as error.

The variant ckch_conf_cmp_empty() compares the ckch_conf structure to an
empty one, which is useful for bind lines, that are not able to have
crt-store keywords.

These functions are used when a crt-store is already initialized and we
need to verify if the parameters are compatible.

ckch_conf_cmp() handles multiple cases:

- When the previous ckch_conf was declared with CKCH_CONF_SET_EMPTY, we
  can't define any new keyword in the next initialisation
- When the previous ckch_conf was declared with keywords in a crtlist
  (CKCH_CONF_SET_CRTLIST), the next initialisation must have the exact
  same keywords.
- When the previous ckch_conf was declared in a "crt-store"
  (CKCH_CONF_SET_CRTSTORE), the next initialisation could use no keyword
  at all or the exact same keywords.
2024-05-17 17:35:51 +02:00
William Lallemand
1bc6e990f2 MEDIUM: ssl/cli: handle crt-store keywords in crt-list over the CLI
This patch adds crt-store keywords from the crt-list on the CLI.

- keywords from crt-store can be used over the CLI when inserting
  certificate in a crt-list
- keywords from crt-store are dumped when showing a crt-list content
  over the CLI

The ckch_conf_kws.func function pointer needed a new "cli" parameter, in
order to differentiate loading that comes from the CLI or from the
startup, as they don't behave the same. For example it must not try to
load a file on the filesystem when loading a crt-list line from the CLI.

dump_crtlist_sslconf() was renamed in dump_crtlist_conf() and takes a
new ckch_conf parameter in order to dump relevant crt-store keywords.
2024-05-17 17:35:51 +02:00
William Lallemand
2bcf38c7c8 MEDIUM: ssl: add ocsp-update.disable global option
This option allows disabling ocsp-update completely.

To achieve this, the ocsp-update.mode global keyword doesn't rely anymore
on SSL_SOCK_OCSP_UPDATE_OFF during parsing to call
ssl_create_ocsp_update_task().

Instead, we will inherit the SSL_SOCK_OCSP_UPDATE_* value from
ocsp-update.mode for each certificate which does not specify its own
mode.

To disable completely the ocsp without editing all crt entries,
ocsp-update.disable is used instead of "ocsp-update.mode" which is now
only used as the default value for crt.
2024-05-17 17:35:51 +02:00
William Lallemand
2e6615b282 MINOR: ssl: ckch_conf_clean() utility function for ckch_conf
- ckch_conf_clean() to free() the content of a ckch_conf structure,
  mostly the strings that were allocated with strdup()
2024-05-17 17:35:51 +02:00
William Lallemand
2b6b7fea58 MINOR: ssl/ocsp: use 'ocsp-update' in crt-store
Use the ocsp-update keyword in the crt-store section. This is not used
as an exception in the crtlist code anymore.

This patch introduces the "ocsp_update_mode" variable in the ckch_conf
structure.

The SSL_SOCK_OCSP_UPDATE_* enum was changed to a define to match the
ckch_conf on/off parser so we can have off to -1.
2024-05-17 17:35:51 +02:00
William Lallemand
462e5b0098 MINOR: ssl: handle PARSE_TYPE_INT and PARSE_TYPE_ONOFF in ckch_store_load_files()
The callback used by ckch_store_load_files() only works with
PARSE_TYPE_STR.

This allows to use a callback which will use an integer type for
PARSE_TYPE_INT and PARSE_TYPE_ONOFF.

This requires changing the type of the callback to void * to pass either
a char * or an int depending on the parsing type.

The ssl_sock_load_* functions were encapsulated in ckch_conf_load_*
functions just to match the type.

This will allow to handle crt-store keywords that are ONOFF or INT
types.
2024-05-17 17:35:51 +02:00
William Lallemand
c5a665f5d8 MEDIUM: ssl: ckch_conf_parse() uses -1/0/1 for off/default/on
ckch_conf_parse() now sets -1 for an off value and 1 for an on value.
This allows to detect when a value is the default since the structs are memset
to 0.
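
A minimal sketch of this -1/0/1 convention follows. The structure, the
enum and both functions are hypothetical illustrations, not the actual
ckch_conf code:

```c
#include <assert.h>
#include <string.h>

/* Tri-state values: 0 is what a zeroed structure contains, so it can
 * safely mean "not configured".
 */
enum { TRI_OFF = -1, TRI_DEFAULT = 0, TRI_ON = 1 };

struct cert_conf {
	int ocsp_update;   /* TRI_OFF, TRI_DEFAULT or TRI_ON */
};

/* Parse "on"/"off" into <out>. Returns 0 on success, -1 on unknown word. */
static int parse_onoff(const char *arg, int *out)
{
	assert(arg && out);
	if (strcmp(arg, "on") == 0)
		*out = TRI_ON;
	else if (strcmp(arg, "off") == 0)
		*out = TRI_OFF;
	else
		return -1;
	return 0;
}

/* An explicit per-certificate setting wins, otherwise the global default
 * is inherited.
 */
static int effective_mode(const struct cert_conf *conf, int global_default)
{
	return conf->ocsp_update != TRI_DEFAULT ? conf->ocsp_update : global_default;
}
```

With a plain 0/1 boolean, an explicit "off" would be indistinguishable
from an unset field after memset(), which is what the third value avoids.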
2024-05-17 17:35:51 +02:00
William Lallemand
2b8880e395 MINOR: ssl: pass ckch_store instead of ckch_data to ssl_sock_load_ocsp()
ssl_sock_put_ckch_into_ctx() and ssl_sock_load_ocsp() need to take a
ckch_store in argument. Indeed the ocsp_update_mode is not stored
anymore in ckch_data, but in ckch_conf which is part of the ckch_store.

This is a minor change, but the function definition had to change.
2024-05-17 17:35:51 +02:00
William Lallemand
db09c2168f CLEANUP: ssl/ocsp: remove the deprecated parsing code for "ocsp-update"
Remove the "ocsp-update" keyword handling from the crt-list.

The code was made as an exception everywhere so we could activate the
ocsp-update for an individual certificate.

The feature will still exist but will be parsed as a "crt-store"
keyword which will still be usable in a "crt-list". This will appear in
future commits.

This commit also disables the reg-tests for now.
2024-05-17 17:35:51 +02:00
William Lallemand
d616932076 MEDIUM: ssl/crtlist: loading crt-store keywords from a crt-list
This patch allows the usage of "crt-store" keywords from a "crt-list".

The crtstore_parse_load() function was split into 2 functions, so the
keywords parsing is done in ckch_conf_parse().

With this patch, crt are loaded with ckch_store_new_load_files_conf() or
ckch_store_new_load_files_path() depending on whether or not there is a
"crt-store" keyword.

More checks need to be done on "crt" bind keywords to ensure that
keywords are compatible.

This patch does not introduce the feature on the CLI.
2024-05-17 17:35:51 +02:00
William Lallemand
8526d666d2 MINOR: ssl: ckch_store_new_load_files_conf() loads filenames from ckch_conf
ckch_store_new_load_files_conf() is the equivalent of
new_ckch_store_load_files_path() but instead of trying to find the files
using a base filename, it will load them from a list of files.
2024-05-17 17:35:51 +02:00
Christopher Faulet
2fc9e6fa39 MEDIUM: mux-h1: Support C-L/T-E header suppressions when sending messages
During the 2.9 dev cycle, to be able to support zero-copy data forwarding, a
change on the H1 mux was performed to ignore the headers modifications about
payload representation (Content-Length and Transfer-Encoding headers).

It appears there are some use-cases where it could be handy to change values
of these headers or just remove them. For instance, we can imagine to remove
these headers on a server response to force the old HTTP/1.0 close mode
behavior. So thanks to this patch, the rules are relaxed. It is now possible
to remove these headers. When this happens, the following rules are applied:

 * If "Content-Length" header is removed but a "Transfer-Encoding: chunked"
   header is found, no special processing is performed. The message remains
   chunked. However the close mode is not forced.

 * If "Transfer-Encoding" header is removed but a "Content-Length" header is
   found, no special processing is performed. The payload length must comply
   to the specified content length.

 * If one of them is removed and the other one is not found, a response is
   switched to the close mode and a "Content-Length: 0" header is forced on a
   request.

With these rules, we fit the best to the user expectations.
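
The three rules above can be summarized by the following sketch. It only
models the outcome once the headers have been edited; the enum and the
pick_framing() function are hypothetical and do not reflect the H1 mux
internals:

```c
#include <assert.h>

enum framing {
	FRAMING_CHUNKED,   /* "Transfer-Encoding: chunked" still present */
	FRAMING_CLEN,      /* "Content-Length" still present */
	FRAMING_CLOSE,     /* response delimited by connection close */
	FRAMING_EMPTY      /* request sent with "Content-Length: 0" */
};

/* Decide how the message payload is delimited from the headers that
 * remain after the rewrite rules were applied.
 */
static enum framing pick_framing(int is_response, int has_clen, int has_te_chunked)
{
	assert(is_response == 0 || is_response == 1);
	if (has_te_chunked)
		return FRAMING_CHUNKED;   /* message remains chunked */
	if (has_clen)
		return FRAMING_CLEN;      /* payload must match the length */
	/* neither header is left: only a response may rely on close */
	return is_response ? FRAMING_CLOSE : FRAMING_EMPTY;
}
```

A request cannot be delimited by closing the connection since the client
still expects a response, hence the forced empty payload in that case.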

This patch depends on the following commit:

  * MINOR: mux-h1: Add a flag to ignore the request payload

This patch should fix the issue #2536. It should be backported to 2.9
with the commit above.
2024-05-17 16:33:53 +02:00
Christopher Faulet
1a2699d5f7 CLEANUP: mux-h1: Remove unused H1S_F_ERROR_MASK mask value
This mask value is unused, so we can safely remove it. This is fortunate
because its value was wrong. But there is no bug here, even in stable
versions, because it is no longer used in any version.
2024-05-17 16:33:53 +02:00
Christopher Faulet
071057d112 REORG: mux-h1: Group H1S_F_BODYLESS_* flags
To ease reading of H1S flags, H1S_F_BODYLESS_REQ and H1S_F_BODYLESS_RESP
flags are grouped.
2024-05-17 16:33:53 +02:00
Christopher Faulet
8e55d29109 MINOR: mux-h1: Add a flag to ignore the request payload
There was a flag to skip the response payload on output, if any, by stating
it is bodyless. It is used for responses to HEAD requests or for 204/304
responses. This allows rewrites during analysis. For instance a HEAD request
can be rewritten to a GET request for any reason (ie, a server not supporting
HEAD requests). In this case, the server will send a response with a
payload. On frontend side, the payload will be skipped and a valid response
(without payload) will be sent to the client.

With this patch we introduce the corresponding flag for the request. It will
be used to skip the request payload. In addition, when payload must be
skipped for a request or a response, the zero-copy data forwarding is now
disabled.
2024-05-17 16:33:53 +02:00
Christopher Faulet
45a45c917a BUG/MINOR: stats: Don't state the 303 redirect response is chunked
Start-line flags for 303-See-Other response returned by the stats applet are
not properly set. Indeed, the response has a "content-length" header but both
HTX_SL_F_CHNK and HTX_SL_F_CLEN flags are set. Because of this bug, the
response is considered as chunked. So, let's remove HTX_SL_F_CHNK flag.

And also add HTX_SL_F_BODYLESS flag because there is no payload
("content-length" header is always set to 0).

This patch must be backported to all stable versions. On the 2.8 and lower
versions, the commit d0b04920d1 ("BUG/MINOR: htpp-ana/stats: Specify that
HTX redirect messages have a C-L header") must be backported first.
2024-05-17 16:33:53 +02:00
Willy Tarreau
e362b076b1 Revert: MEDIUM: evports: permit to report multiple events at once"
Tests have shown that switching nevlist to global.tune.maxpollevents
is totally unreliable when using evports, and that events seem to be
missed. A good reproducer seems to be QUIC. There are not enough
users of Solaris to warrant spending more time trying to get down to
this, and even the few that remain are by definition not interested
in performance, so let's just revert the commit that tried to lift the
value: e6662bf706 ("MEDIUM: evports: permit to report multiple events
at once").

No backport is needed.
2024-05-17 15:57:18 +02:00
Willy Tarreau
0999e3d959 CLEANUP: compat: make the MIN/MAX macros more reliable
After every release we say that MIN/MAX should be changed to be an
expression that only evaluates each operand once, and before every
version we forget to change it and we recheck that the code doesn't
misuse them. Let's fix them now.
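
To illustrate the double-evaluation problem, here is a sketch that is not
necessarily the form adopted by the patch. It assumes a compiler
supporting GNU C statement expressions (gcc, clang); the macro and
function names are made up for the example:

```c
#include <assert.h>

/* Naive form: the larger operand is evaluated a second time. */
#define NAIVE_MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Single-evaluation form relying on GNU C statement expressions and
 * __typeof__.
 */
#define SAFE_MAX(a, b) ({        \
	__typeof__(a) _a = (a);  \
	__typeof__(b) _b = (b);  \
	_a > _b ? _a : _b;       \
})

static int evals;   /* number of times tracked() was called */

static int tracked(int v)
{
	evals++;
	return v;
}

/* Number of operand evaluations performed by the naive macro. */
static int naive_evals(void)
{
	int m;

	evals = 0;
	m = NAIVE_MAX(tracked(5), tracked(3));
	assert(m == 5);
	(void)m;
	return evals;
}

/* Number of operand evaluations performed by the safe macro. */
static int safe_evals(void)
{
	int m;

	evals = 0;
	m = SAFE_MAX(tracked(5), tracked(3));
	assert(m == 5);
	(void)m;
	return evals;
}
```

Both macros return the same value, but an operand with side effects such
as a function call or "i++" runs twice with the naive form.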
2024-05-17 15:57:18 +02:00
Aurelien DARRAGON
b9915a745e BUG/MEDIUM: fd: prevent memory waste in fdtab array
In 97ea9c49f1 ("BUG/MEDIUM: fd: always align fdtab[] to 64 bytes"), the
patch doesn't do what the message says. The intent was only to align the
base fdtab addr on 64 bytes so that all fdtab entries are aligned and thus
don't share the same cache line. For that, fdtab pointer is adjusted from
fdtab_addr (unaligned) address after it is allocated. Thus, all we need
is an extra 64 bytes in the fdtab_addr array for the alignment. Because
we use calloc() to perform the allocation, a dumb mistake was made: the
'+64' was added on <size> calloc argument, which means EACH fdtab entry
is allocated with 64 extra bytes.

Given that a single fdtab entry is 64 bytes, since 97ea9c49f1 each fdtab
entry now takes 128 bytes! We doubled fdtab memory consumption.

To give you an idea, on my laptop, when looking at memory consumption
using 'ps -p `pidof haproxy` -o size' right after starting haproxy
process with default settings (no maxsock enforced):

before 97ea9c49f1:
  -> 118440 (KB, ~= 118MB)

after 97ea9c49f1:
  -> 183976 (KB, ~= 184MB)

To fix this, use calloc with 1 <nmemb> and manually provide the size with
<size> as we would do if we used malloc(). With this patch, we're back to
pre-97ea9c49f1 for fdtab memory consumption (with 64 extra bytes for the
whole array, which is insignificant).

It should be backported to all stable versions.
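
The arithmetic behind the mistake can be sketched as follows. ENTRY_SIZE,
ALIGNMENT and the function names are assumptions made for the example and
do not come from the fd code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRY_SIZE 64   /* assumed size of one table entry */
#define ALIGNMENT  64   /* assumed cache line size */

/* Bytes requested by the faulty form: calloc(nb, ENTRY_SIZE + ALIGNMENT). */
static size_t wasteful_bytes(size_t nb)
{
	return nb * (ENTRY_SIZE + ALIGNMENT);
}

/* Bytes requested by the fixed form: calloc(1, nb * ENTRY_SIZE + ALIGNMENT). */
static size_t fixed_bytes(size_t nb)
{
	return nb * ENTRY_SIZE + ALIGNMENT;
}

/* Allocate <nb> zeroed entries and return a pointer aligned on ALIGNMENT.
 * The pointer to pass to free() is stored into <raw>.
 */
static void *alloc_aligned_table(size_t nb, void **raw)
{
	uintptr_t addr;

	assert(raw != NULL);
	*raw = calloc(1, fixed_bytes(nb));
	if (!*raw)
		return NULL;
	/* round up to the next multiple of ALIGNMENT */
	addr = ((uintptr_t)*raw + ALIGNMENT - 1) & ~(uintptr_t)(ALIGNMENT - 1);
	return (void *)addr;
}
```

For 1000 entries the faulty form requests 128000 bytes against 64064 for
the fixed one, which matches the doubling described above.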
2024-05-17 15:25:03 +02:00
Aurelien DARRAGON
e84c8dee1a BUILD: log: get rid of non-portable strnlen() func
In c614fd3b9 ("MINOR: log: add +cbor encoding option"), I wrongly used
strnlen() without noticing that the function is not portable (requires
_POSIX_C_SOURCE >= 2008) and that it was the first occurrence in the
entire project. In fact it is not a hard requirement since it's a pretty
simple function. Thus to restore build compatibility with minimal/older
build systems, let's actually get rid of it and use an equivalent portable
code where needed (we cannot simply rely on strlen() because the string
might not be NULL terminated, we must take upstream len into account).

No backport needed (unless c614fd3b9 gets backported)
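
One possible portable equivalent is sketched below using memchr(), which
is part of standard C. The bounded_len() name is hypothetical and the
patch may well use a different construct:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of <str>, scanning at most <max> bytes. Unlike strlen(), this
 * never reads past <max>, so <str> does not need to be NUL-terminated.
 */
static size_t bounded_len(const char *str, size_t max)
{
	const char *end;

	assert(str != NULL);
	end = memchr(str, '\0', max);
	return end ? (size_t)(end - str) : max;
}
```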
2024-05-17 15:24:53 +02:00
William Lallemand
f18ed8d07e MEDIUM: ssl: add ocsp-update.mindelay and ocsp-update.maxdelay
This patch deprecates tune.ssl.ocsp-update.* in favor of
"ocsp-update.*", since ocsp-update is not really a tunable of the SSL
connections.
2024-05-17 15:00:11 +02:00
Amaury Denoyelle
fbc3d46b9f BUILD: stats: remove non portable getline() usage
getline() was used to read stats-file. However, this function is not
portable and may cause build issue on some systems. Replace it by
standard fgets().

No need to backport.
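
A line-reading loop based on fgets() could look like the sketch below.
count_lines() and the 64-byte buffer are assumptions for the example; the
stats-file parser itself is not reproduced here:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the lines of <fp> with fgets() and a fixed-size buffer. A line
 * longer than the buffer is returned in several chunks, so a line is only
 * counted once its trailing '\n' is seen (or at EOF for the last one).
 */
static int count_lines(FILE *fp)
{
	char buf[64];
	int lines = 0;
	int partial = 0;   /* set while inside a line not yet terminated */

	assert(fp != NULL);
	while (fgets(buf, sizeof(buf), fp)) {
		size_t len = strlen(buf);

		if (len && buf[len - 1] == '\n') {
			lines++;
			partial = 0;
		} else {
			partial = 1;
		}
	}
	return lines + partial;
}
```

The main difference with getline() is that the caller owns the buffer, so
over-long lines have to be handled explicitly instead of being
reallocated automatically.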
2024-05-17 14:53:19 +02:00
William Lallemand
ef943c186d REGTESTS: update the ocsp-update tests
Update the ocsp-update tests for the recent changes:

- "tune.ssl.ocsp-update.mode" was renamed to "ocsp-update.mode"
2024-05-17 14:50:00 +02:00
William Lallemand
ee58fac1b4 MINOR: ssl: rename tune.ssl.ocsp-update.mode in ocsp-update.mode
Since the ocsp-update is not strictly a tuning of the SSL stack, but a
feature of its own, let's rename the option.

The option was also missing from the index.
2024-05-17 14:50:00 +02:00
Willy Tarreau
ea3b89952d BUILD: stick-tables: better mark the stktable_data as 32-bit aligned
Aurélien reported that clang's build was broken by the recent fix
845fb846c7 ("BUG/MEDIUM: stick-tables: properly mark stktable_data
as packed"), because it now wants to use a helper for some atomic
ops (to increment std_t_uint). While it makes no sense to do
something that slow on modern architectures like x86 and arm64 which
are fine with unaligned accesses, we can actually simply mark the
struct as aligned to its smallest element which is 32-bit (but still
packed). With this, it was verified that it is enough for clang to
see that its 32-bit operations will always be aligned, while making
64-bit operations safe on 64-bit platforms that do not support unaligned
accesses.

This should be backported wherever the patch above is backported.
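
The effect of combining both attributes can be checked at build time. The
unions below are hypothetical stand-ins (not the real stktable_data) and
use the GCC/clang attribute syntax:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical union mixing 32-bit and 64-bit counters. "packed" alone
 * drops the type's alignment to 1; adding aligned(4) raises it back to the
 * size of the smallest member while keeping the members packed.
 */
union demo_data {
	uint32_t v32;
	uint64_t v64;
} __attribute__((packed, aligned(4)));

union demo_packed_only {
	uint32_t v32;
	uint64_t v64;
} __attribute__((packed));

static_assert(_Alignof(union demo_data) == 4, "32-bit alignment expected");
static_assert(_Alignof(union demo_packed_only) == 1, "packed alone gives 1");
static_assert(sizeof(union demo_data) == 8, "size is unchanged");
```

With a known 4-byte alignment the compiler can emit plain 32-bit atomic
operations, while 64-bit members are still treated as possibly unaligned.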
2024-05-17 11:00:45 +02:00
Amaury Denoyelle
0d35f8d918 MINOR: h3: report glitch on RFC violation
Increment the glitch connection counter on every HTTP/3 or QPACK error
which is a violation of the specification. This could be useful to get
rid of bogus clients early.
2024-05-16 10:58:54 +02:00
Amaury Denoyelle
216f70f989 MINOR: mux-quic: support glitches
Implement basic support for glitches on QUIC multiplexer. This is mostly
identical to glitches for HTTP/2.

A new configuration option named tune.quic.frontend.glitches-threshold
is defined to limit the number of glitches on a connection before
closing it.

Glitches counter is incremented via qcc_report_glitch(). A new
qcc_app_ops callback <report_susp> is defined. When the threshold is
reached, it allows setting an application error code to close the
connection. For HTTP/3, value H3_EXCESSIVE_LOAD is returned. If not
defined, default code INTERNAL_ERROR is used.

For the moment, no glitches are reported for QUIC or HTTP/3 usage. This
will be added in future patches as needed.
2024-05-16 10:58:20 +02:00
Amaury Denoyelle
a6993a669b MINOR: h3: adjust error reporting on receive
This commit is the second step to simplify HTTP/3 error management. This
time it deals with the receive side in h3_rcv_buf().

The various internal HTTP/3 to HTX conversion functions do not set
H3_INTERNAL_ERROR on h3c err anymore. Only standard error codes are set.
For every error, both internal and protocol ones, a negative value is
returned. This ensures that h3_rcv_buf() looping is interrupted. This
function will then set H3_INTERNAL_ERROR only if no standard error is
registered via h3c or h3s.

Along with the previous commit, this should better distinguish internal
errors from protocol ones caused by a faulty client.
2024-05-16 10:31:17 +02:00
Amaury Denoyelle
079d13f73f MINOR: h3: adjust error reporting on sending
It's currently difficult to differentiate HTTP/3 standard protocol
violations from internal issues, which solely use the H3_INTERNAL_ERROR
code. This patch is the first step to simplify this. The objective is to
reduce the use of H3_INTERNAL_ERROR. The <err> field of h3c should be
reserved exclusively for other values.

Simplify error management in sending via h3_snd_buf(). The sending side is
straightforward as only internal errors can be encountered. Do not
manually set h3c.err to H3_INTERNAL_ERROR in the various HTX to HTTP/3
conversion functions. Instead, just return a negative value which is
enough to break the h3_snd_buf() loop. H3_INTERNAL_ERROR is thus positioned
at a single location in this function for all sending operations.
2024-05-16 10:31:17 +02:00
Amaury Denoyelle
e094412337 MINOR: h3/qpack: adjust naming for errors
Rename enum values used for HTTP/3 and QPACK RFC defined codes. The
former now use an H3_ERR_* prefix which serves to tell them apart. Also
separate QPACK values into a new dedicated enum qpack_err. This is deemed
cleaner.
2024-05-16 10:31:17 +02:00
Amaury Denoyelle
2dabcf30be MINOR: qpack: prepare error renaming
There are two distinct enums both related to QPACK error management. The
first one is dedicated to RFC defined codes. The other one is a set of
internal values returned by qpack_decode_fs(). Issues were discovered
recently due to the confusion between them.

Rename internal values with the prefix QPACK_RET_*. The older name
QPACK_ERR_* will be used in a future commit for the first enum.
2024-05-16 10:31:17 +02:00
Christopher Faulet
25bcdb1d95 BUG/MAJOR: h1: Be stricter on request target validation during message parsing
As stated in issue #2565, checks on the request target during H1 message
parsing are not good enough. Invalid paths, not starting with a slash, are
in fact parsed as authorities. The same error is repeated at the sample fetch
level. This last point is annoying because routing rules may be fooled. It
is also an issue when the URI or the Host header are updated.

Because the error is repeated at different places, it must be fixed. We
cannot be lax by arguing it is the server's job to accept or reject invalid
request targets. With this patch, we strengthen the checks performed on the
request target during H1 parsing. The idea is to reject invalid requests at this
step to be sure it is safe to manipulate the path or the authority at other
places.

So now, the asterisk-form is only allowed for OPTIONS and OTHER methods.
This last point was added to not reject the H2 preface. In addition, we take
care to have only one asterisk and nothing more. For the CONNECT method, we
take care to have a valid authority-form. All other forms are rejected. The
authority-form is now only supported for the CONNECT method. No specific
check is performed on the origin-form (except for the CONNECT method). For
the absolute-form, we take care to have a scheme and a valid authority.
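
The rules above can be sketched as a small classifier. This is only an
illustration: the names are hypothetical, and the authority and scheme
checks are deliberately simplified compared to a real H1 parser:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum demo_meth { DEMO_GET, DEMO_OPTIONS, DEMO_CONNECT, DEMO_OTHER };

/* Simplified authority-form check: "host:port" with a non-empty host made
 * of no '/', '?', '#', '@' or space, and a non-empty numeric port.
 */
static int demo_is_authority(const char *t)
{
	const char *colon = strrchr(t, ':');
	const char *p;

	if (!colon || colon == t || !colon[1])
		return 0;
	for (p = t; p < colon; p++)
		if (strchr("/?#@", *p) || isspace((unsigned char)*p))
			return 0;
	for (p = colon + 1; *p; p++)
		if (!isdigit((unsigned char)*p))
			return 0;
	return 1;
}

/* Simplified absolute-form check: a scheme, "://", then a non-empty
 * authority.
 */
static int demo_is_absolute(const char *t)
{
	const char *p = t;

	if (!isalpha((unsigned char)*p))
		return 0;
	while (isalnum((unsigned char)*p) || *p == '+' || *p == '-' || *p == '.')
		p++;
	if (strncmp(p, "://", 3) != 0)
		return 0;
	p += 3;
	return *p && *p != '/' && *p != '?' && *p != '#';
}

/* Returns 1 if request target <t> is acceptable for method <m>, else 0. */
int demo_target_allowed(enum demo_meth m, const char *t)
{
	assert(t != NULL);
	if (m == DEMO_CONNECT)
		return demo_is_authority(t); /* only authority-form */
	if (t[0] == '*')                     /* asterisk-form */
		return !t[1] && (m == DEMO_OPTIONS || m == DEMO_OTHER);
	if (t[0] == '/')
		return 1;                    /* origin-form: no further check */
	return demo_is_absolute(t);          /* anything else: absolute-form */
}
```

With such a classifier, a target like "admin" is neither origin-form nor
absolute-form for a GET request, so it is rejected instead of being taken
for an authority.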

These checks are not perfect but should be good enough to properly identify
each part of the request target for a relatively small cost. But, it is a
breaking change. Some requests are now rejected while they were not on
older versions. However, nowadays, it is most probably not an issue. If it
turns out it's really an issue for legitimate use-cases, an option would be
to support these kinds of requests when the "accept-invalid-http-request"
option is set, with the consequence of seeing some sample fetches having an
unexpected behavior.

This patch should fix the issue #2565. It MUST NOT be backported. First
because it is a breaking change. And then because by avoiding backporting
it, it remains possible to relax the parsing with the
"accept-invalid-http-request" option.
2024-05-15 21:20:37 +02:00
Christopher Faulet
d3d9d83f03 BUG/MEDIUM: h1: Reject CONNECT request if the target has a scheme
The target of a CONNECT request must not have a scheme. However, this was not
checked during the message parsing. It is now rejected.

This patch may be backported as far as 2.4.
2024-05-15 21:20:37 +02:00
Christopher Faulet
d724b0d147 BUG/MINOR: h1: Check authority for non-CONNECT methods only if a scheme is found
When a non-CONNECT H1 request is parsed, the authority is compared to the
host header value, to validate that they are the same. However there is an
issue here when a relative path is used (not beginning with a '/'). In this
case, the path is considered as the authority and will be erroneously
compared to the host header value. It is observable with this kind of
request:

  GET admin HTTP/1.1
  Host: www.mysite.com

In this case "admin" is parsed as an authority while it is in fact a path.
At this step, it is not a big deal because it just happens on the very first
checks on the message during the parsing. However, the same happens when the
authority is updated. This will be fixed in another commit.

Note this kind of request is invalid because the path does not start with a
'/'. But, till now, HAProxy does not reject it.

This patch is related to issue #2565. It must be backported as far as 2.4.
2024-05-15 21:20:37 +02:00
Willy Tarreau
821a04377d BUG/MEDIUM: muxes: enforce buf_wait check in takeover()
The ->takeover() is quite tricky. It didn't take care of the possibility
that the original thread's connection handler had been woken up to handle
an event (e.g. read0), failed to get a buffer, registered against its own
thread's buffer_wait queue and left the connection in an idle state.

A new thread could then come by, perform a takeover(), and when a buffer
was available, the new thread's tasklet would be woken up by the old one
via *_buf_available(), causing all sorts of problems. These problems are
easy to reproduce, by running with shared backend connections and few
buffers (tune.buffers.limit=20, 8 threads, 500 connections, transfer
64kB objects and wait 2-5s for a crash to appear).

A first estimated solution consisted in removing the connection from the
idle list but it turns out that it would be worse for the delete stuff
(the connection no longer appearing as idle, making it impossible to find
it in order to close it). Also, idle counts wouldn't match anymore the
list's state, and the special case of private connections could be
difficult to handle as the connection could be forcefully re-added to the
idle list after allocation despite being private.

After multiple attempts to address the problem in various ways, it appears
that the only reliable solution for now (without starting to turn many
lists to mt_lists) is to have the takeover() function handle the buf_wait
detection or unregistration itself:

  - when doing a regular takeover aiming at finding an idle connection
    for a new request, connections that are blocked in a buffer_wait
    queue are quite rare and not interesting at all (since not immediately
    usable), so skipping them is sufficient. For this we detect that the
    desired connection belongs to a buffer_wait list by checking its
    buf_wait.list element. Note that this check is *not* thread-safe! The
    LIST_DEL_INIT() is performed by __offer_buffers() after the callback
    was called. But this is sufficient as it is now because the only way
    for the element to be seen as not in a list is after the element was
    last touched by __offer_buffers(), so the situation for this connection
    will not change in a different way later.

  - when doing a server delete, we're running under thread isolation.
    The connection might get taken over to be killed. The only trick is
    that private connections not belonging to any idle list may also
    experience this, and in this case even the idle_conns lock will not
offer any protection against anything. But since we're running under
    thread isolation, we're certain not to compete with the other thread,
    so it's safe to directly unregister the connection from its owner
    thread. Normally this is already handled by conn_release() in
    cli_parse_delete_server(), which calls mux->destroy(), but this would
    actually update the current thread's queue instead of the origin
    thread's, thus we do need to perform an explicit dequeue before
    completing the takeover.

With this, the problem now looks solved for HTTP/1, HTTP/2 and FCGI,
though extensive tests were essentially run on HTTP/1 and HTTP/2.

While the problem has been there for a very long time, there should be
no reason to backport it since buffer_wait didn't practically work
before 3.0-dev and the process used to freeze hard very quickly before
we'd even have a chance to meet that race.
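
The two takeover cases can be sketched with a minimal self-pointing list.
Everything here is hypothetical (names, signature, the <release> flag) and
it ignores locking and per-thread queues entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly-linked list where a detached element points to itself. */
struct demo_list {
	struct demo_list *n, *p;
};

void demo_list_init(struct demo_list *l)
{
	l->n = l->p = l;
}

int demo_list_inlist(const struct demo_list *l)
{
	return l->n != l;
}

void demo_list_append(struct demo_list *head, struct demo_list *el)
{
	el->p = head->p;
	el->n = head;
	head->p->n = el;
	head->p = el;
}

void demo_list_del_init(struct demo_list *el)
{
	el->n->p = el->p;
	el->p->n = el->n;
	demo_list_init(el);
}

/* Hypothetical connection with its buffer_wait list element. */
struct demo_waiter {
	struct demo_list buf_wait;
};

/* Returns 0 if the takeover may proceed, -1 if the connection is skipped.
 * A regular takeover (<release> == 0) skips connections still queued for a
 * buffer. A takeover for deletion (<release> != 0, assumed to run under
 * thread isolation) dequeues the element first.
 */
int demo_takeover(struct demo_waiter *c, int release)
{
	assert(c != NULL);
	if (demo_list_inlist(&c->buf_wait)) {
		if (!release)
			return -1;
		demo_list_del_init(&c->buf_wait);
	}
	return 0;
}
```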
2024-05-15 19:37:12 +02:00
Willy Tarreau
b0349cf2de MINOR: dynbuf: provide a b_dequeue() variant for multi-thread
In order to forcefully unregister a buffer waiter during an inter-thread
takeover under isolation, we'll need the function to work not with
th_ctx but with the target thread's ctx instead. Let's implement this by
passing the target thread as an argument. Now b_dequeue() simply calls
this one with tid. It's OK since it's not on that critical a path, especially
since the list has been checked for existence before performing the call.
2024-05-15 19:37:12 +02:00
Willy Tarreau
edb99e296d BUG/MINOR: ssl_sock: fix xprt_set_used() to properly clear the TASK_F_USR1 bit
In 2.4-dev8 with commit 5c7086f6b0 ("MEDIUM: connection: protect idle
conn lists with locks"), the idle conns list started to be protected
using the lock for takeover, and the SSL layer used to always take
that lock. Later in 2.4-dev11, with commit 4149168255 ("MEDIUM: ssl:
implement xprt_set_used and xprt_set_idle to relax context checks"), we
decided to relax this lock using TASK_F_USR1 just as is done in muxes.

However the xprt_set_used() call, that's supposed to clear the flag,
visibly suffered from a copy-paste and kept the OR operation instead of
the AND, resulting in the flag never being released, so that SSL on the
backend continues to take the lock on each and every I/O access even when
the connection is not idle.

The effect is only a reduced performance. This could be backported, but
given the non-zero risk of triggering another bug somewhere, it would
be prudent to wait for this fix to be sufficiently tested in new
versions first.
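
The copy-paste mistake boils down to the difference below. The flag value
and function names are hypothetical, and the sketch uses plain integers
where the real code relies on atomic operations:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_F_USR1 0x00010000u /* hypothetical stand-in for TASK_F_USR1 */

static_assert((DEMO_F_USR1 & (DEMO_F_USR1 - 1)) == 0, "single bit expected");

/* Marks the context as idle: the bit is set with an OR. */
uint32_t demo_set_idle(uint32_t state)
{
	return state | DEMO_F_USR1;
}

/* Buggy variant: the OR was kept, so the bit is never released. */
uint32_t demo_set_used_buggy(uint32_t state)
{
	return state | DEMO_F_USR1;
}

/* Fixed variant: the bit is cleared with an AND on the inverted mask. */
uint32_t demo_set_used(uint32_t state)
{
	return state & ~DEMO_F_USR1;
}
```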
2024-05-15 19:37:12 +02:00
Willy Tarreau
b6ed749adc SCRIPTS: run-regtests: fix a few occurrences of extended regexes
Running run-regtests on OpenBSD failed to identify haproxy version and
the various build options because the backslash is not recognized in
grep expressions. One must only use -E for the extended regexes and
not use the backslash.
2024-05-15 19:33:45 +02:00
Willy Tarreau
845fb846c7 BUG/MEDIUM: stick-tables: properly mark stktable_data as packed
The stktable_data union is made of types of varying sizes, and depending
on which types are stored in a table, some offsets might not necessarily
be aligned. This results in a bus error for certain regtests (e.g.
lb-services) on MIPS64. This bug may impact MIPS64, SPARC64, armv7 when
accessing a 64-bit counter (e.g. bytes) and depending on how the compiler
emitted the operation, and cause a trap that's emulated by the OS on RISCV
(heavy cost). x86_64 and armv8 are not affected at all.

Let's properly mark the struct with __attribute__((packed)) so that the
compiler emits the suitable unaligned-compatible instructions when
accessing the fields.

This should be backported to all versions where it applies.
2024-05-15 19:03:18 +02:00
Willy Tarreau
276cdc11e8 BUG/MEDIUM: htx: mark htx_sl as packed since it may be realigned
A test on MIPS64 revealed that the following reg tests would all
fail at the same place in htx_replace_stline() when updating
parts of the request line:
  reg-tests/cache/if-modified-since.vtc
  reg-tests/http-rules/h1or2_to_h1c.vtc
  reg-tests/http-rules/http_after_response.vtc
  reg-tests/http-rules/normalize_uri.vtc
  reg-tests/http-rules/path_and_pathq.vtc

While the status line is normally aligned since it's the first block
of the HTX, it may become unaligned once replaced. The problem is, it
is a structure which contains some u16 and u32, and dereferencing them
on machines not natively supporting unaligned accesses makes them crash
or handle crap. Typically, MIPS/MIPS64/SPARC will crash, ARMv5 will
either crash or (more likely) return swapped values and do crap, and
RISCV will trap and turn to slow emulation.

We can assign the htx_sl struct the packed attribute, but then this
also causes the ints to fill the 2-bytes gap before them, always causing
unaligned accesses for this part on such machines. The patch does a bit
better, by explicitly filling this two-bytes hole, and packing the
struct.

This should be backported to all versions.
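
The layout trade-off can be checked with offsetof(). The structures below
are hypothetical and much simpler than the real htx_sl; the offsets assume
a usual ABI where uint32_t is 4-byte aligned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler leaves a 2-byte hole after <status>. */
struct demo_sl_natural {
	uint32_t flags;
	uint16_t status;
	uint32_t len[3];
};

/* Packed without filling the hole: <len> slides down to offset 6, so it is
 * misaligned even when the structure itself is aligned.
 */
struct demo_sl_packed {
	uint32_t flags;
	uint16_t status;
	uint32_t len[3];
} __attribute__((packed));

/* Packed with the hole filled explicitly: same offsets as the natural
 * layout, but the compiler no longer assumes the structure is aligned.
 */
struct demo_sl_filled {
	uint32_t flags;
	uint16_t status;
	uint16_t unused;
	uint32_t len[3];
} __attribute__((packed));

static_assert(offsetof(struct demo_sl_natural, len) == 8, "natural layout");
static_assert(offsetof(struct demo_sl_packed, len) == 6, "hole removed");
static_assert(offsetof(struct demo_sl_filled, len) == 8, "hole filled");
```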
2024-05-15 19:03:17 +02:00
Amaury Denoyelle
86aafd0236 BUG/MINOR: qpack: fix error code reported on QPACK decoding failure
qpack_decode_fs() is used to decode QPACK field section on HTTP/3
headers parsing. Its return value is inconsistent as it returns either
QPACK_DECOMPRESSION_FAILED defined in RFC 9204 or any other internal
value defined in qpack-dec.h. On failure, such return code is reused by
the HTTP/3 layer to be reported via a CONNECTION_CLOSE frame. This is
incorrect if an internal error value was reported as it is not defined
by any specification.

Fix return values of qpack_decode_fs() in two ways. Firstly, fix invalid
usages of QPACK_DECOMPRESSION_FAILED when decoded content is too large
by using the correct internal error QPACK_ERR_TOO_LARGE instead.

Secondly, adjust the qpack_decode_fs() API to only return internal code
values. A new internal enum QPACK_ERR_DECOMP is defined to replace
QPACK_DECOMPRESSION_FAILED. The caller is responsible for converting it to
a suitable error value. For other internal values, H3_INTERNAL_ERROR is
used. This is done through a set of convert functions.

This should be backported up to 2.6. Note that trailers are not
supported in 2.6 so chunk related to h3_trailers_to_htx() can be safely
skipped.
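
A conversion of this kind could look like the sketch below. The enum and
function names are hypothetical; the two wire values come from RFC 9204
(QPACK_DECOMPRESSION_FAILED) and RFC 9114 (H3_INTERNAL_ERROR):

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_H3_INTERNAL_ERROR          0x0102 /* RFC 9114 */
#define DEMO_QPACK_DECOMPRESSION_FAILED 0x0200 /* RFC 9204 */

/* Hypothetical internal decoder results, never sent on the wire. */
enum demo_qpack_ret {
	DEMO_QPACK_OK = 0,
	DEMO_QPACK_DECOMP,    /* malformed field section */
	DEMO_QPACK_TOO_LARGE, /* decoded content does not fit */
	DEMO_QPACK_TRUNCATED, /* input ended too early */
};

/* Converts a failed internal result to a code usable in CONNECTION_CLOSE.
 * Only the decompression failure maps to the RFC 9204 code; every other
 * internal value is reported as an HTTP/3 internal error.
 */
uint64_t demo_qpack_ret_to_wire(enum demo_qpack_ret ret)
{
	assert(ret != DEMO_QPACK_OK);
	switch (ret) {
	case DEMO_QPACK_DECOMP:
		return DEMO_QPACK_DECOMPRESSION_FAILED;
	default:
		return DEMO_H3_INTERNAL_ERROR;
	}
}
```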
2024-05-15 16:07:15 +02:00
Amaury Denoyelle
4295dd21bd BUG/MINOR: mux-quic: fix error code on shutdown for non HTTP/3
qcc_shutdown() is called whenever the connection must be closed. If the
application protocol defines its own shutdown callback, it is invoked
to use the correct error code. Else transport error code NO_ERROR is
used.

A bug occurs in the latter case as NO_ERROR is used with quic_err_app()
which is reserved for application error codes. This will trigger the
emission of a CONNECTION_CLOSE of type 0x1d (Application) instead of
0x1c (Transport).
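
The link between the error constructor and the frame type can be sketched
as follows. The demo_* names are hypothetical; the frame types 0x1c and
0x1d and the NO_ERROR value 0x00 come from RFC 9000:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_QC_ERR_NO_ERROR 0x00 /* transport NO_ERROR, RFC 9000 */

/* Hypothetical error descriptor: a code plus its namespace. */
struct demo_quic_err {
	uint64_t code;
	int app; /* 1 for an application error code, 0 for a transport one */
};

struct demo_quic_err demo_err_transport(uint64_t code)
{
	struct demo_quic_err err = { code, 0 };

	return err;
}

struct demo_quic_err demo_err_app(uint64_t code)
{
	struct demo_quic_err err = { code, 1 };

	return err;
}

/* Returns the CONNECTION_CLOSE frame type matching the error namespace. */
uint8_t demo_cc_frame_type(struct demo_quic_err err)
{
	return err.app ? 0x1d : 0x1c;
}
```

Building NO_ERROR with the application constructor selects 0x1d, which is
the mismatch described above.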

This bug is considered minor as it does not impact QUIC with HTTP/3. It
may only be visible when using experimental HTTP/0.9 protocol.

This should be backported up to 2.6. For 2.6, the patch must be completely
rewritten due to code differences. Here is the change to apply:

  diff --git a/src/mux_quic.c b/src/mux_quic.c
  index 26fb70ddf..c48f82e27 100644
  --- a/src/mux_quic.c
  +++ b/src/mux_quic.c
  @@ -1918,7 +1918,9 @@ static void qc_release(struct qcc *qcc)
                          qc_send(qcc);
                  }
                  else {
  -                       qcc_emit_cc_app(qcc, QC_ERR_NO_ERROR, 0);
  +                       /* Duplicate from qcc_emit_cc_app() for Transport error code. */
  +                       if (!(qcc->conn->handle.qc->flags & QUIC_FL_CONN_IMMEDIATE_CLOSE))
  +                               qcc->conn->handle.qc->err = quic_err_transport(QC_ERR_NO_ERROR);
                  }
          }
2024-05-15 16:03:01 +02:00
Amaury Denoyelle
412f1eeb89 BUG/MEDIUM: server: clear purgeable conns before server deletion
Since the following commit, idle connections are cleared before a server
is deleted. This is better than blocking server deletion due to inactive
connections :

  6e0afb2e274952663957121ea33cb6bae574fc2e
  MEDIUM: server: close idle conn on server deletion

A BUG_ON() has been added to ensure that the server idle conn counter is
null after these connections are removed. However, Willy managed to trigger
it easily by repeatedly and randomly deleting servers across a
single-thread haproxy using a server-template with 1000 instances. In
parallel, a h1load client is executed to generate traffic.

This BUG_ON() reflected that some connections referencing the server
targeted for deletion remained, even though the idle server list is empty.
In fact, this is caused by connections scheduled for purging. These
connections are moved from the idle server list to a global toremove_list
while still being accounted by the server.

A first approach could be to decrement the server idle counter while moving
connections to the purge list. However, this is functionally incorrect as
these purgeable connections still reference the server and it could
cause a crash if cleared after it.

The correct fix for this issue is simply to remove every purgeable
connection before a server is deleted. This is implemented by this
patch by extending cli_parse_delete_server(). It could be enough to only
remove connections targeting the deleted server, but as these
connections will be purged anyway it is justified to clear the whole
list.

This must not be backported, unless the above mentioned patch is.
2024-05-15 15:01:55 +02:00
Aurelien DARRAGON
231d3d32be MEDIUM: hlua: take nbthread into account in hlua_get_nb_instruction()
Based on Willy's idea (from 3.0-dev6 announcement message): in this patch
we try to reduce the max latency that can be caused by running lua scripts
with default settings.

Indeed, by default, hlua engine is allowed to process up to 10k
instructions per batch. While this value was found to be the optimal one
for a single thread, it turns out that keeping a thread busy for 10k lua
instructions could increase thread contention. This is especially true
when the script is loaded with 'lua-load', because in that case the
current thread owns the main lua lock and prevents other threads from
making any progress if they're also waiting on the main lock.

Thanks to Thierry Fournier's work, we know that performance-wise we can
reach optimal performance by sticking between 500 and 10k instructions
per batch. Given that, when the script is loaded using 'lua-load', if no
"tune.lua.forced-yield" was set by the user, we automatically divide the
default value (10K) by the number of threads haproxy can use to reduce
thread contention (given that all threads could compete for the main lua
lock), however we make sure not to return a value below 500, because
Thierry's work showed that this would come with a significant performance
loss.
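
The computation described above can be sketched as follows. The function
and parameter names are hypothetical and do not reflect the actual
hlua_get_nb_instruction() signature; only the 10000 and 500 bounds come
from the text:

```c
#include <assert.h>

#define DEMO_DEFAULT_NB_INSTR 10000
#define DEMO_MIN_NB_INSTR       500

/* <forced_yield> is the user setting, or 0 when unset. <shared_state> is
 * non-zero when the script was loaded with 'lua-load' and thus runs under
 * the main lua lock. <nbthread> is the number of usable threads.
 */
int demo_nb_instruction(int forced_yield, int shared_state, int nbthread)
{
	int nb;

	assert(nbthread >= 1);
	if (forced_yield > 0)
		return forced_yield;          /* explicit setting wins */
	if (!shared_state)
		return DEMO_DEFAULT_NB_INSTR; /* no contention on the main lock */
	nb = DEMO_DEFAULT_NB_INSTR / nbthread;
	return nb < DEMO_MIN_NB_INSTR ? DEMO_MIN_NB_INSTR : nb;
}
```

With 8 threads this yields 1250 instructions per batch, and from 20
threads onwards the 500 floor applies.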

The historical behavior may still be enforced by setting
"tune.lua.forced-yield" to 10000 in the global config section.
2024-05-15 11:59:44 +02:00
Aurelien DARRAGON
e60d9dddf8 MINOR: hlua: add hlua_nb_instruction getter
No functional behavior change, but this will ease the work of dynamically
computing hlua_nb_instruction value depending on various inputs.
2024-05-15 11:59:37 +02:00
Tim Duesterhus
6610f656ea DOC: Update UUID references to RFC 9562
When support for UUIDv7 was added in commit
aab6477b67415c4cc260bba5df359fa2e6f49733
the specification still was a draft.

It has since been published as RFC 9562.

This patch updates all UUID references from the obsoleted RFC 4122 and the
draft for RFC 9562 to the published RFC 9562.
2024-05-15 11:40:08 +02:00
William Lallemand
8c6f43d382 REGTESTS: ssl: be more verbose with ocsp_compat_check.vtc
The ocsp_compat_check.vtc reg-test is difficult to debug given that the
haproxy output is piped in `grep -q`.

This patch helps by showing the haproxy output as well as the return
code.
2024-05-15 10:36:02 +02:00
William Manley
366b722f7e MINOR: rhttp: Don't require SSL when attach-srv name parsing
An attach-srv config line usually looks like this:
    tcp-request session attach-srv be/srv name ssl_c_s_dn(CN)

while a rhttp server line usually looks like this:
    server srv rhttp@ sni req.hdr(host)

The server sni argument is used as a key for looking up connection in
the connection pool. The attach-srv name argument is used as a key for
inserting connections into the pool. For it to work correctly they must
match. There was a check that either both the attach-srv and server
provide that key or neither does.

It also checked that SSL and SNI were activated on the server. However,
thanks to current connect_server() implementation, it appears that SNI
is usable even without SSL to identify a connection in the pool. Thus,
it can be diverted from its original intent in reverse HTTP case to
serve even without SSL activated. For example, this could be useful to
use `fc_pp_unique_id` as a name expression (DISCLAIMER: note that for
now PROXY protocol is not compatible with rhttp).

Error is still reported if either SNI or name is used without the other.
This patch adjusts the message to a more helpful one.

Arguably it would be easier to understand if instead of using `name` and
`sni` for `attach-srv` and `server` rules it used the same term in both
places - like "conn-pool-key" or something. That would make it clear
that the two must match.
2024-05-14 16:39:07 +02:00
Aurelien DARRAGON
32f0cd3242 BUG/MINOR: log: smp_rgs array issues with inherited global log directives
When a log directive is defined in the global section, each time we use
"log global" in a proxy section, the global log directives are duplicated
for the current proxy. This works by creating a new proxy logger struct
and duplicating every member for each global one.

However, smp_rgs logger member is a special pointer member that is
allocated when "range" is used on a log directive. Currently, we simply
copy the array pointer (from the global one), instead of creating our own
copy. Because of that, range log sampling may not work properly in some
situations prior to 3f1284560 ("MINOR: log: remove the unused curr_idx in
struct smp_log_range") when used in global log directives, for instance:

  global
    log 127.0.0.1:5114 format raw sample 1-2,3:4 local0 info # should receive 75% of all proxy logs
    log 127.0.0.1:5115 format raw sample 4:4 local0 info     # should receive 25% of all proxy logs

  listen proxy1
    log global

  listen proxy2
    log global

May not work as expected, because curr_idx was stored within smp_rgs array
member prior to 3f1284560, and due to this bug, it happens to be shared
between every log directive inherited from a "global" one. The result is
that curr_idx counter will not behave properly because the index will be
increased globally instead of per-log directive, and it could even suffer
from concurrent thread accesses under load since we don't own the global
log directive's lock when manipulating it.

Another issue that was revealed because of this bug is that the smp_rgs
array allocated during config parsing is never freed in free_logger(),
resulting in a small memory leak during clean exit.

To fix these issues all at once, let's properly duplicate smp_rgs logger
struct member in dup_logger() like we already do for other special members
so that every log directive has its own smp_rgs copy, and then
systematically free it in free_logger().
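
The difference between copying the pointer and duplicating the array can
be sketched as below. The structures and function names are hypothetical
simplifications and not the real logger definitions:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_range {
	unsigned int low, high;
};

/* Hypothetical logger: <smp_rgs> is NULL when no sampling is configured. */
struct demo_logger {
	struct demo_range *smp_rgs;
	size_t smp_rgs_sz;
};

/* Returns a newly allocated copy of <src> owning its own range array, or
 * NULL on allocation failure.
 */
struct demo_logger *demo_dup_logger(const struct demo_logger *src)
{
	struct demo_logger *cpy;
	size_t bytes;

	assert(src != NULL);
	cpy = malloc(sizeof(*cpy));
	if (!cpy)
		return NULL;
	*cpy = *src;         /* plain members are copied by value */
	cpy->smp_rgs = NULL; /* never share the source array */
	if (src->smp_rgs && src->smp_rgs_sz) {
		bytes = src->smp_rgs_sz * sizeof(*src->smp_rgs);
		cpy->smp_rgs = malloc(bytes);
		if (!cpy->smp_rgs) {
			free(cpy);
			return NULL;
		}
		memcpy(cpy->smp_rgs, src->smp_rgs, bytes);
	}
	return cpy;
}

/* Releases a logger along with the array it owns. */
void demo_free_logger(struct demo_logger *logger)
{
	if (!logger)
		return;
	free(logger->smp_rgs);
	free(logger);
}
```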

While this bug affects all stable versions (including 2.4), it's probably
best to not backport this beyond 2.6 because of 211ea252d
("BUG/MINOR: logs: fix logsrv leaks on clean exit") prerequisite that
first appears in 2.6.

[ada: for versions prior to 2.9, 969e212
 ("MINOR: log: add dup_logsrv() helper function") and 76acde91
 ("BUG/MINOR: log: keep the ref in dup_logger()") must be backported
 first.
 Note: Some ctx adjustments should be performed because 'logger' struct
 used to be named 'logsrv' in the past and 2.9 introduced logger target
 struct member. Thus it's probably easier to manually apply 76acde91 and
 the current bugfix by hand directly on top of 969e212.
]
2024-05-14 12:00:23 +02:00
Aurelien DARRAGON
9d4a44e713 BUG/MINOR: log: fix leak in add_sample_to_logformat_list() error path
If add_sample_to_logformat_list() fails to allocate new logformat_node,
then we directly jump to error_free label to cleanup the node using
free_logformat_node() before returning an error.

However if the node failed to allocate, then the sample expression that
was allocated just before (not yet assigned) isn't released
(free_logformat_node() is a no-op when NULL is provided). Thus if expr
wasn't assigned to the node during early failure, then it must be manually
released.

This bug was introduced by 2462e5bcc ("BUG/MINOR: log: fix potential
lf->name memory leak") which wasn't marked for backports. It only
affects 3.0.
2024-05-13 16:44:27 +02:00
Ilia Shipitsin
cbe78c0281 CI: drop asan.log umbrella completely
asan.log redirection appeared to work poorly, let's cease that practice
for good.

ML: https://www.mail-archive.com/haproxy@formilux.org/msg44844.html
2024-05-13 11:36:36 +02:00
Willy Tarreau
7217a9e9b9 [RELEASE] Released version 3.0-dev11
Released version 3.0-dev11 with the following main changes :
    - BUILD: clock: improve check for pthread_getcpuclockid()
    - CI: add Illumos scheduled workflow
    - CI: netbsd: limit scheduled workflow to parent repo only
    - OPTIM: log: resolve logformat options during postparsing
    - BUG/MINOR: haproxy: only tid 0 must not sleep if got signal
    - REGTEST: add tests for acl() sample fetch
    - BUG/MINOR: acl: support built-in ACLs with acl() sample
    - BUG/MINOR: cfgparse: use curproxy global var from config post validation
    - MEDIUM: stconn/muxes: Add an abort reason for SE shutdowns on muxes
    - MINOR: mux-h2: Set the SE abort reason when a RST_STREAM frame is received
    - MEDIUM: mux-h2: Forward h2 client cancellations to h2 servers
    - MINOR: mux-quic: Set tha SE abort reason when a STOP_SENDING frame is received
    - MINOR: stconn: Add samples to retrieve about stream aborts
    - MINOR: mux-quic: Add .ctl callback function to get info about a mux connection
    - MINOR: muxes: Add ctl commands to get info on streams for a connection
    - MINOR: connection: Add samples to retrieve info on streams for a connection
    - BUG/MEDIUM: log/ring: broken syslog octet counting
    - BUG/MEDIUM: mux-quic: fix crash on STOP_SENDING received without SD
    - DOC: lua: fix filters.txt file location
    - MINOR: dynbuf: pass a criticality argument to b_alloc()
    - MINOR: dynbuf: add functions to help queue/requeue buffer_wait fields
    - MINOR: dynbuf: use the b_queue()/b_requeue() functions everywhere
    - MEDIUM: dynbuf: make the buffer_wq an array of list heads
    - CLEANUP: tinfo: better align fields in thread_ctx
    - MINOR: dynbuf: provide a b_dequeue() function to detach a bw from the queue
    - MEDIUM: dynbuf: generalize the use of b_dequeue() to detach buffer_wait
    - MEDIUM: dynbuf/stream: re-enable queueing upon failed buffer allocation
    - MEDIUM: dynbuf/stream: do not allocate the buffers in the callback
    - MEDIUM: applet: make appctx_buf_available() only wake the applet up, not allocate
    - MINOR: applet: set the blocking flag in the buffer allocation function
    - MINOR: applet: adjust the allocation criticity based on the requested buffer
    - MINOR: dynbuf/mux-h1: use different criticalities for buffer allocations
    - MEDIUM: dynbuf/mux-h1: do not allocate the buffers in the callback
    - MEDIUM: dynbuf: refrain from offering a buffer if more critical ones are waiting
    - MINOR: stconn: report that a buffer allocation succeeded
    - MINOR: stream: report that a buffer allocation succeeded
    - MINOR: applet: report about buffer allocation success
    - MINOR: mux-h1: report that a buffer allocation succeeded
    - MEDIUM: stream: allocate without queuing when retrying
    - MEDIUM: channel: allocate without queuing when retrying
    - MEDIUM: mux-h1: allocate without queuing when retrying
    - MEDIUM: dynbuf: implement emergency buffers
    - MEDIUM: dynbuf: use emergency buffers upon failed memory allocations
2024-05-10 17:39:19 +02:00
Willy Tarreau
fc792694a6 MEDIUM: dynbuf: use emergency buffers upon failed memory allocations
Now, if a pool_alloc() fails for a buffer and if conditions are met
based on the queue number, we'll try to get an emergency buffer.

Thanks to this the situation is way more stable now. With only 4 reserve
buffers and 1 buffer it's possible to reliably serve 500 concurrent end-
to-end H1 connections and consult stats in parallel in loops showing the
growing number of buf_wait events in "show activity" without facing an
instant stall like in the past. Lower values still cause quick stalls
though.

It's also apparent that some subsystems do not seem to detach from the
buffer_wait lists when leaving. For example several crashes in the H1
part showed list elements still present after a free(), so maybe some
operations performed inside h1_release() after the b_dequeue() call
can sometimes result in a new allocation. Same for streams, where
the dequeue is done relatively early.
2024-05-10 17:18:13 +02:00
Willy Tarreau
0ce51dc93b MEDIUM: dynbuf: implement emergency buffers
The buffer reserve set by tune.buffers.reserve has long been unused, and
in order to deal gracefully with failed memory allocations we'll need to
resort to a few emergency buffers that are pre-allocated per thread.

These buffers are only for emergency use, so every time their count is
below the configured number a b_free() will refill them. For this reason
their count can remain pretty low. We changed the default number from 2
to 4 per thread, and the minimum value is now zero (e.g. for low-memory
systems). The tune.buffers.limit setting has always been a problem when
trying to deal with the reserve but now we could simplify it by simply
pushing the limit (if set) to match the reserve. That was already done in
the past with a static value, but now with threads it was a bit trickier,
which is why the per-thread allocators increment the limit on the fly
before allocating their own buffers. This also means that the configured
limit is saner and now corresponds to the regular buffers that can be
allocated on top of emergency buffers.

At the moment these emergency buffers are not used upon allocation
failure. The only reason is to ease bisecting later if needed, since
this commit only has to deal with resource management.
2024-05-10 17:18:13 +02:00
Willy Tarreau
47665be083 MEDIUM: mux-h1: allocate without queuing when retrying
Now when trying to allocate a buffer, we can check whether we've been notified
of availability via the callback, in which case we should not consult the
queue, or whether we're doing a first allocation, in which case we check the
queue. At this point it still doesn't change much since the stream still
doesn't make use of it but some progress is expected.
2024-05-10 17:18:13 +02:00
Willy Tarreau
5b8d27617f MEDIUM: channel: allocate without queuing when retrying
Now when trying to allocate a channel buffer, we can check whether we've been
notified of availability via the producer stream connector callback, in
which case we should not consult the queue, or whether we're doing a first
allocation, in which case we check the queue.
2024-05-10 17:18:13 +02:00
Willy Tarreau
b5714b45e8 MEDIUM: stream: allocate without queuing when retrying
Now when trying to allocate the work buffer, we can check whether we've been
notified of availability via the buf_wait callback, in which case we
should not consult the queue, or whether we're doing a first allocation, in
which case we check the queue.
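
The retry logic described in these three commits can be pictured with a small Python sketch. It is illustrative only: `try_alloc`, its arguments and the `MAYALLOC` value are hypothetical stand-ins, and keeping the flag set after a failed attempt is an assumption rather than something the commit messages state.

```python
MAYALLOC = 0x1  # stand-in for the "allocation was announced" flag

def try_alloc(flags, queue_busy, buffers_free):
    """Return (got_buffer, new_flags).

    A first allocation attempt respects the wait queue; a retry that follows
    the availability callback (MAYALLOC set) does not consult it.
    """
    bypass_queue = bool(flags & MAYALLOC)
    if queue_busy and not bypass_queue:
        return False, flags          # others are waiting: go and queue
    if buffers_free == 0:
        return False, flags          # nothing to hand out (assumed: flag kept)
    return True, flags & ~MAYALLOC   # success clears the flag
```

For example, `try_alloc(MAYALLOC, True, 1)` succeeds even though the queue is busy, while `try_alloc(0, True, 1)` does not.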
2024-05-10 17:18:13 +02:00
Willy Tarreau
f552f79ba5 MINOR: mux-h1: report that a buffer allocation succeeded
When the buffer allocation callback is notified of a buffer availability,
it will now set a MAYALLOC flag in addition to clearing the ALLOC one, for
each of the 3 levels where we may fail an allocation. The flag will be
cleared upon a successful allocation. This will soon be used to decide to
re-allocate without waiting again in the queue. For now it has no effect.

There's just one trick: we need to clear the various *_ALLOC flags before
testing h1_recv_allowed(), otherwise it will return false!
2024-05-10 17:18:13 +02:00
Willy Tarreau
cb2d758043 MINOR: applet: report about buffer allocation success
When appctx_buf_available() is called, it now sets APPCTX_FL_IN_MAYALLOC
or APPCTX_FL_OUT_MAYALLOC depending on the reportedly permitted buffer
allocation, and these flags are cleared when the said buffers are
allocated. For now they're not used for anything else.
2024-05-10 17:18:13 +02:00
Willy Tarreau
17d8916bb1 MINOR: stream: report that a buffer allocation succeeded
When the buffer allocation callback is notified of a buffer availability,
it will now set a MAYALLOC flag on the stream so that the stream knows it
is allowed to bypass the queue checks. For now this is not used.
2024-05-10 17:18:13 +02:00
Willy Tarreau
7aff64518c MINOR: stconn: report that a buffer allocation succeeded
We used to have two states for the channel's input buffer used by the SC,
NEED_BUFF or not, flipped by sc_need_buff() and sc_have_buff(). We want to
have a 3rd state, indicating that we've just got a desired buffer. Let's
add a HAVE_BUFF flag that is set by sc_have_buff() and that is cleared by
sc_used_buff(). This way by looking at HAVE_BUFF we know that we're coming
back from the allocation callback and that the offered buffer has not yet
been used.
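
The three states can be summarised with a short Python sketch. The flag values are illustrative, since the commit message does not give them, and the functions only mirror the transitions described above rather than the real C helpers.

```python
# Illustrative bit values; HAProxy's actual flag values are not shown above.
NEED_BUFF = 0x1
HAVE_BUFF = 0x2

def sc_need_buff(flags):
    """Allocation failed: the SC is now waiting for a buffer."""
    return flags | NEED_BUFF

def sc_have_buff(flags):
    """Allocation callback: a buffer is offered but has not been used yet."""
    return (flags & ~NEED_BUFF) | HAVE_BUFF

def sc_used_buff(flags):
    """The offered buffer has been used: back to the neutral state."""
    return flags & ~HAVE_BUFF
```

Seeing `HAVE_BUFF` set therefore means "coming back from the allocation callback, buffer not consumed yet".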
2024-05-10 17:18:13 +02:00
Willy Tarreau
d1eb48a12b MEDIUM: dynbuf: refrain from offering a buffer if more critical ones are waiting
Now b_alloc() will check the queues at the same and higher criticality
levels before allocating a buffer, and will refrain from allocating one
if these are not empty. The purpose is to put some priorities in the
allocation order so that most critical allocators are offered a chance
to complete.

However in order to permit a freshly dequeued task to allocate again while
siblings are still in the queue, there is a special DB_F_NOQUEUE flag to
pass to b_alloc() that will take care of this special situation.
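
A minimal Python sketch of this check, assuming that queues are numbered by increasing criticality and that a bitmap has bit q set while queue q has waiters. `may_allocate` and the `NOQUEUE` value are hypothetical; only the behaviour comes from the description above.

```python
NOQUEUE = 0x1  # stand-in for the DB_F_NOQUEUE flag

def may_allocate(bitmap, queue, flags=0):
    """Return True if an allocation for `queue` may proceed.

    The request is refused when its own queue or a more critical one still
    has waiters, unless the caller was just dequeued and passes NOQUEUE.
    """
    if flags & NOQUEUE:
        return True
    # Shifting drops the bits of the less critical queues.
    return (bitmap >> queue) == 0
```

With waiters only in queue 0, a request for queue 2 goes through; with waiters in queue 2 or 3 it is refused unless `NOQUEUE` is passed.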
2024-05-10 17:18:13 +02:00
Willy Tarreau
a160b3c50c MEDIUM: dynbuf/mux-h1: do not allocate the buffers in the callback
One of the problematic designs with the buffer_wait mechanism is that
the callbacks pre-allocate the buffers and stay in the run queue for
a while, resulting in all of the few buffers being assigned to waiting
tasks instead of being all available to one task that needs them all at
once.

Here we simply stop doing this: the callback clears the waiting flags
and wakes the task up so that it has a chance of still finding some
buffers.
2024-05-10 17:18:13 +02:00
Willy Tarreau
c510e81a3f MINOR: dynbuf/mux-h1: use different criticalities for buffer allocations
While it could certainly still be improved, this first approach consists
in assigning buffers like this in the H1 mux:
  - h1c->obuf : DB_MUX_TX
  - h1c->ibuf : DB_MUX_RX
  - h1s->rxbuf: DB_SE_RX

That's done via 3 distinct functions for better code clarity, and it
also allowed to move the missing buffer flags assignment there.

Among possible improvements would be to take into consideration the
state of the parser (i.e. no data yet vs data, or headers vs payload)
so that even server beginning of response or pure payload can be lowered
in priority.
2024-05-10 17:18:13 +02:00
Willy Tarreau
4a42af1744 MINOR: applet: adjust the allocation criticity based on the requested buffer
When we want to allocate an in buffer, it's in order to pass data to
the applet, that will consume it, so it must be seen as the same as
a send() from the higher level, i.e. MUX_TX. And for the outbuf, it's
a stream endpoint returning data, i.e. DB_SE_RX.
2024-05-10 17:18:13 +02:00
Willy Tarreau
4ffb3b5ebe MINOR: applet: set the blocking flag in the buffer allocation function
Instead of having each caller of appctx_get_buf() think about setting
the blocking flag, better have the function do it, since it's already
handling the queue anyway. This way we're sure that both are consistent.
2024-05-10 17:18:13 +02:00
Willy Tarreau
ee0d56ac85 MEDIUM: applet: make appctx_buf_available() only wake the applet up, not allocate
Now we don't want bufwait handlers to preallocate the resources they
were expecting since it contributes to the shortage. Let's just wake
the applet up and that's all.
2024-05-10 17:18:13 +02:00
Willy Tarreau
9a27d7aa6f MEDIUM: dynbuf/stream: do not allocate the buffers in the callback
One of the problematic designs with the buffer_wait mechanism is that
the callbacks pre-allocate the buffers and stay in the run queue for
a while, resulting in all of the few buffers being assigned to waiting
tasks instead of being all available to one task that needs them all at
once.

Here we simply stop doing this: the callback clears the waiting flags
and wakes the task up so that it has a chance of still finding some
buffers.
2024-05-10 17:18:13 +02:00
Willy Tarreau
db21062881 MEDIUM: dynbuf/stream: re-enable queueing upon failed buffer allocation
The errors were not working fine anyway since we know that upon low memory
condition everything freezes. However we have a chance to do better now,
so let's start by re-enabling queueing when allocations fail.
2024-05-10 17:18:13 +02:00
Willy Tarreau
f5566afec6 MEDIUM: dynbuf: generalize the use of b_dequeue() to detach buffer_wait
Now thanks to this the bufq_map field is expected to remain accurate.
2024-05-10 17:18:13 +02:00
Willy Tarreau
f70bd5fad1 MINOR: dynbuf: provide a b_dequeue() function to detach a bw from the queue
Now that we need to keep the bitmap in sync with the list heads, we don't
want tasks to leave just doing a LIST_DEL_INIT() without updating the map.
Let's provide a b_dequeue() function for that purpose. The function detects
when it's going to remove the last element and figures the queue number
based on the pointer since it points to the root. It's not used yet.
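
As an illustration of why a dedicated helper is needed, here is a Python sketch in which removing the last waiter also clears the queue's bit. `dequeue_waiter` is a hypothetical name, and unlike the C function described above it takes the queue number explicitly instead of deriving it from a pointer.

```python
from collections import deque

def dequeue_waiter(queues, bitmap, q, waiter):
    """Detach `waiter` from queue `q` and return the updated bitmap.

    Clearing bit `q` when the last element leaves keeps the bitmap in sync
    with the queues, which a bare list removal would not do.
    """
    if waiter in queues[q]:
        queues[q].remove(waiter)
        if not queues[q]:
            bitmap &= ~(1 << q)
    return bitmap
```

Calling it for a waiter that is no longer queued is harmless and leaves the bitmap unchanged.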
2024-05-10 17:18:13 +02:00
Willy Tarreau
53461e4d94 CLEANUP: tinfo: better align fields in thread_ctx
The introduction of buffer_wq[] in thread_ctx pushed a few fields around
and the cache line alignment is less satisfying. And more importantly, even
before this, all the lists in the local parts were 8-aligned, with the first
one split across two cache lines.

We can do better:
  - sched_profile_entry is not atomic at all, the data it points to is
    atomic so it doesn't need to be in the atomic-only region, and it can
    fill the 8-hole before the lists
  - the align(2*void) that was only before tasklets[] moves before all
    lists (and it's a nop for now)

This now makes the lists and buffer_wq[] start on a cache line boundary,
leaves 48 bytes after the lists before the atomic-only cache line, and
leaves a full cache line at the end for 128-alignment. This way we still
have plenty of room in both parts with better aligned fields.
2024-05-10 17:18:13 +02:00
Willy Tarreau
a5d6a79986 MEDIUM: dynbuf: make the buffer_wq an array of list heads
Let's turn the buffer_wq into an array of 4 list heads. These are chosen
by criticality. The DB_CRIT_TO_QUEUE() macro maps each criticality level
into one of these 4 queues. The goal here clearly is to make it possible
to wake up the most critical queues in priority in order to let some tasks
finish their job and release buffers that others can use.

In order to avoid having to look up all queues, a bit map indicates which
queues are in use, which also avoids looping in the most common
case where queues are empty.
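
The structure can be sketched in Python as follows. This is a toy model, not the C implementation: the class and method names are hypothetical, and treating the highest-numbered queue as the most critical one is an assumption.

```python
from collections import deque

NUM_QUEUES = 4  # the commit uses an array of 4 list heads

class BufferWaitQueues:
    """One FIFO per criticality queue plus an 'in use' bitmap."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_QUEUES)]
        self.bitmap = 0  # bit q is set while queues[q] is non-empty

    def enqueue(self, q, waiter):
        self.queues[q].append(waiter)
        self.bitmap |= 1 << q

    def wake_most_critical(self):
        """Pop one waiter from the highest-numbered non-empty queue."""
        if self.bitmap == 0:              # common case: nothing is waiting
            return None
        q = self.bitmap.bit_length() - 1  # index of the highest set bit
        waiter = self.queues[q].popleft()
        if not self.queues[q]:
            self.bitmap &= ~(1 << q)
        return waiter
```

The bitmap test makes the empty case a single comparison, and `bit_length()` finds the queue to serve without scanning all four lists.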
2024-05-10 17:18:13 +02:00
Willy Tarreau
a214197ce7 MINOR: dynbuf: use the b_queue()/b_requeue() functions everywhere
The places in the code that used to manipulate the buffer_wq manually
now just call b_queue() or b_requeue(). This will simplify the multiple
list management later.
2024-05-10 17:18:13 +02:00
Willy Tarreau
d1c2f325a2 MINOR: dynbuf: add functions to help queue/requeue buffer_wait fields
When failing an allocation we always do the same dance: add the
buffer_wait struct to a list if it's not already there, and return. Let's
just add dedicated functions to centralize this; this will be useful to
implement slightly more complex logic.

For now they're not used.
2024-05-10 17:18:13 +02:00
Willy Tarreau
72d0dcda8e MINOR: dynbuf: pass a criticality argument to b_alloc()
The goal is to indicate how critical the allocation is, between the
least one (growing an existing buffer ring) and the topmost one (boot
time allocation for the life of the process).

The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function
to try to allocate, or otherwise subscribe. There's currently no distinction
of direction nor part that tries to allocate, and this should be revisited
to improve this situation, particularly when we consider that mux-h2 can
reduce its Tx allocations if needed.

For now, 4 main levels are planned, to translate how the data travels
inside haproxy from a producer to a consumer:
  - MUX_RX:   buffer used to receive data from the OS
  - SE_RX:    buffer used to place a transformation of the RX data for
              a mux, or to produce a response for an applet
  - CHANNEL:  the channel buffer for sync recv
  - MUX_TX:   buffer used to transfer data from the channel to the outside,
              generally a mux but there can be a few specificities (e.g.
              http client's response buffer passed to the application,
              which also gets a transformation of the channel data).

The other levels are a bit different in that they don't strictly need to
allocate for the first two ones, or they're permanent for the last one
(used by compression).
2024-05-10 17:18:13 +02:00
Aurelien DARRAGON
84f7525c5b DOC: lua: fix filters.txt file location
At the beginning of the filter class section, we encourage the user to
check out filters.txt file to get to know how the filters API works
within haproxy.

However the file location is incorrect. The proper directory to look for
the file is: doc/internals/api.

It should be backported up to 2.5.
2024-05-10 11:02:56 +02:00
Amaury Denoyelle
cc9827bb09 BUG/MEDIUM: mux-quic: fix crash on STOP_SENDING received without SD
Abort reason code received on STOP_SENDING is notified to the upper layer
since the following commit:
  367ce1ebf3e4cead319a9f01581037c9f0280e77
  MINOR: mux-quic: Set tha SE abort reason when a STOP_SENDING frame is received

However, this causes a crash when a STOP_SENDING is received on a QCS
instance without any stream instantiated. Fix this by checking first if
qcs->sd is not NULL before setting abort code.

This bug can easily be reproduced by emitting a STOP_SENDING as first
frame of a stream.

This should fix github issue #2563.

This does not need to be backported.
2024-05-10 11:01:05 +02:00
Aurelien DARRAGON
fbbc2925d4 BUG/MEDIUM: log/ring: broken syslog octet counting
As reported by Tristan in GH #2561, syslog messages sent over rings are
malformed since commit 01aa0a05 ("MEDIUM: ring: change the ring reader
to use the new vector-based API now").

Indeed, take a look at the following log message produced prior to
01aa0a05:

  181 <134>1 2024-05-07T09:45:21.543263+02:00 - haproxy 113700 - - 127.0.0.1:56136 [07/May/2024:09:45:21.491] front front/s1 0/0/21/30/51 404 369 - - ---- 1/1/0/0/0 0/0   "GET / HTTP/1.1"

Starting with 01aa0a05, here's the equivalent log message:

  <134>1 2024-05-07T09:45:21.543263+02:00 - haproxy 112729 - - 127.0.0.1:56136 [07/May/2024:09:45:21.491] front front/s1 0/0/66/39/105 404 369 - - ---- 1/1/0/0/0 0/0   "GET / HTTP/1.1"-fwr

-> Message is missing octet counting header, and garbage bytes are found
at the end of the payload.

This bug is caused by a small mistake in syslog_applet_append_event():
when the function was refactored to use vector API instead of buffer
API, we used 'trash.area' as starting pointer to write the event instead
of 'trash.area + trash.data', causing existing octet counting prefix
(already written in trash) to be overwritten and trash.data to be
wrongly incremented.
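
The effect of writing at the wrong offset can be reproduced with a few lines of Python. This is a simplified model: `append_event` and its `buggy` switch are hypothetical, a `bytearray` plays the role of the trash buffer, and `frame_octet_counted` follows the octet-counting framing of RFC 6587 (message length, a space, then the message).

```python
def frame_octet_counted(msg: bytes) -> bytes:
    """RFC 6587 octet counting: b'<len> <msg>'."""
    return str(len(msg)).encode("ascii") + b" " + msg

def append_event(area: bytearray, data: int, event: bytes, buggy: bool) -> int:
    """Copy `event` into `area` and return the new amount of valid data.

    `data` bytes (the octet-counting prefix) are already present in `area`.
    The correct version writes after them; the buggy one writes at offset 0.
    """
    start = 0 if buggy else data
    area[start:start + len(event)] = event
    return data + len(event)
```

With the prefix `b"5 "` already in the buffer, the correct call yields `b"5 hello"`, whereas the buggy one overwrites the prefix and leaves two stale bytes at the end of the payload, which matches the symptoms shown in the log samples above.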

No backport needed (01aa0a05 was introduced during 3.0 development)
2024-05-07 19:23:01 +02:00
Christopher Faulet
bd47e344b8 MINOR: connection: Add samples to retrieve info on streams for a connection
Thanks to the previous fix, it is now possible to get the number of opened
streams for a connection and the negotiated limit. Here, corresponding
sample fetches are added, in fc_ and bc_ scopes.

On the frontend side, the limit of streams is imposed by HAProxy. But on the
backend side, the limit is defined by the server. It may be useful for
debugging purposes because it may explain slow-downs on some processing.
2024-05-06 22:00:01 +02:00
Christopher Faulet
eca9831ec8 MINOR: muxes: Add ctl commands to get info on streams for a connection
There are 2 new ctl commands that may be used to retrieve the current number
of streams opened for a connection and its limit (the maximum number of
streams a mux connection supports).

For the PT and H1 muxes, the limit is always 1 and the current number of
streams is 0 for idle connections, otherwise 1 is returned.

For the H2 and the FCGI muxes, the info is already available in the mux
connection.

For the QUIC mux, the limit is also directly available. It is the maximum
initial sub-ID of bidirectional stream allowed for the connection. For the
current number of streams, it is the number of SC attached on the connection
and the number of not already attached streams present in the "opening_list"
list.
2024-05-06 22:00:00 +02:00
Christopher Faulet
12fb6d73cd MINOR: mux-quic: Add .ctl callback function to get info about a mux connection
Other muxes implement this callback function. It was not implemented for the
QUIC mux because it was useless. It will be used to retrieve the current/max
number of streams for a QUIC connection. So let's add it, with default
support for the MUX_CTL_EXIT_STATUS command.
2024-05-06 22:00:00 +02:00
Christopher Faulet
068ce2d5d2 MINOR: stconn: Add samples to retrieve about stream aborts
It is now possible to retrieve some info about the abort received for a
server or a client stream, if any.

  * fs.aborted and bs.aborted can be used to know if an abort was received
    on frontend or backend side. A boolean is returned.

  * fs.rst_code and bs.rst_code return the code of the received RESET_STREAM
    frame for a H2 stream or the code of the received STOP_SENDING frame for
    a QUIC stream. In both cases, the error code attached to the frame is
    returned. The sample fetch fails if no such frame was received or if the
    stream is not an H2/QUIC stream.
2024-05-06 22:00:00 +02:00
Christopher Faulet
367ce1ebf3 MINOR: mux-quic: Set tha SE abort reason when a STOP_SENDING frame is received
When STOP_SENDING frame is received for a quic stream, the error code is now
saved in the SE abort reason. To do so, we use the QUIC source
(SE_ABRT_SRC_MUX_QUIC). For now, this code is only set but not used on the
opposite side.
2024-05-06 22:00:00 +02:00
Christopher Faulet
20b156ee15 MEDIUM: mux-h2: Forward h2 client cancellations to h2 servers
When a H2 client sends a RST_STREAM(CANCEL) frame to abort a request, the
abort reason is now used on server side, in the H2 mux, to set the
RST_STREAM code. The main use case is to forward client cancellations to
gRPC applications.

This patch should fix the issue #172.
2024-05-06 22:00:00 +02:00
Christopher Faulet
dea79f3fe1 MINOR: mux-h2: Set the SE abort reason when a RST_STREAM frame is received
When RST_STREAM frame is received, the error code is now saved in the SE
abort reason. To do so, we use the H2 source (SE_ABRT_SRC_MUX_H2). For now,
this code is only set but not used on the opposite side.
2024-05-06 22:00:00 +02:00
Christopher Faulet
96f8b7ad08 MEDIUM: stconn/muxes: Add an abort reason for SE shutdowns on muxes
A reason is now passed as parameter to muxes shutdowns to pass additional
info about the abort, if any. No info means no abort or only generic one.

For now, the reason is composed of two 32-bit integers. The first one represents
the abort code and the other one represents the info about the code (for
instance the source). The code should be interpreted according to the associated
info.

One info is the source, encoded on 5 bits. Other bits are reserved for now.
For now, the muxes are the only supported source. But we can imagine extending
it to applets, streams, health-checks...

The current design is quite simple and will most probably evolve. But the
idea is to let the opposite side forward some errors and let a mux know
why its stream was aborted. At first glance, an abort reason must only be
evaluated if SE_SHW_SILENT flag is set.

The main short-term goal is to forward some H2 RST_STREAM codes because
it is mandatory for gRPC applications, mainly to forward gRPC cancellation
from an H2 client to an H2 server. But we can imagine altering this reason
at the application level to enrich it. It could also be used to report more
accurate errors in logs.
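
A possible reading of this layout, sketched in Python. The class, the helper names and the `SRC_MUX_H2` value are hypothetical, and placing the 5-bit source in the low bits of the info word is an assumption: the commit message only gives the widths.

```python
from dataclasses import dataclass

SRC_BITS = 5
SRC_MASK = (1 << SRC_BITS) - 1   # 0x1f: the source is encoded on 5 bits
SRC_MUX_H2 = 3                   # hypothetical source identifier

@dataclass(frozen=True)
class AbortReason:
    code: int   # 32-bit abort code, e.g. an H2 RST_STREAM error code
    info: int   # 32-bit info word; the low 5 bits hold the source here

def make_reason(code, src):
    """Build a reason from an abort code and a source identifier."""
    if not 0 <= src <= SRC_MASK:
        raise ValueError("source does not fit on 5 bits")
    return AbortReason(code & 0xFFFFFFFF, src)

def source_of(reason):
    """Extract the source, which tells how `code` must be interpreted."""
    return reason.info & SRC_MASK
```

For instance `make_reason(0x8, SRC_MUX_H2)` would carry the H2 CANCEL error code (0x8) together with the information that it came from the H2 mux.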
2024-05-06 22:00:00 +02:00
Patrick Hemmer
28489021b3 BUG/MINOR: cfgparse: use curproxy global var from config post validation
Previously check_config_validity() had its own curproxy variable. This
resulted in the acl() sample fetch being unable to determine which
proxy was in use when used from within log-format statements. This
change addresses the issue by having the check_config_validity()
function use the global variable instead.
2024-05-06 18:45:47 +02:00
Patrick Hemmer
93d4e99714 BUG/MINOR: acl: support built-in ACLs with acl() sample
Built-in ACLs were not being searched by the acl() sample fetch. This
fixes that so they are searched if no other match is found.
2024-05-06 18:42:54 +02:00
Patrick Hemmer
7c6b410b35 REGTEST: add tests for acl() sample fetch
This adds reg tests for the recently added acl() sample fetch
2024-05-06 18:41:57 +02:00
Valentine Krasnobaeva
4a9e3e102e BUG/MINOR: haproxy: only tid 0 must not sleep if got signal
This patch fixes the commit eea152ee68
("BUG/MINOR: signals/poller: ensure wakeup from signals").

There is some probability that run_poll_loop() becomes infinite, if
TH_FL_SLEEPING is withdrawn from all threads in the second signal_queue_len
check, when a signal is received just after the first check.

In such a particular case, the 'wake' variable, which is used to terminate
the thread's poll loop, is never reset to 0. So, we never enter the "stopping"
part of run_poll_loop() and threads, except the one with id 0 (tid 0
handles signals), will continue to call _do_poll() eternally and will never
sleep, as their TH_FL_SLEEPING flag was unset.

This flag needs to be removed only for tid 0, as was done in the first
signal_queue_len check.

This fixes an issue #2537 "infinite loop when shutting down".

This fix must be backported in every stable version.
2024-05-06 18:39:08 +02:00
Aurelien DARRAGON
03ca16f38b OPTIM: log: resolve logformat options during postparsing
In lf_buildctx_prepare(), we perform costly bitwise operations for every
node to resolve node options and check for incompatibilities with global
options.

In fact, all this logic may safely be performed during postparsing. This
is what we're doing in this commit. Doing so saves us from unnecessary
runtime checks and could help speed up sess_build_logline().

Since checks are not as costly as before (due to them being performed
during postparsing and not on log building path anymore), a complementary
check for OPT_HTTP vs OPT_ENCODE incompatibility was added:

  encoding is ignored if HTTP option is set, unless HTTP option wasn't
  set globally and encoding was set globally, which means encoding
  takes the precedence

Thanks to this patch, lf_buildctx_prepare() now only takes care of
assigning proper typecast and options settings depending if it's used
from global or per-node context, and prepares CBOR-specific structure
members when CBOR encode option is set.
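
One way to read the precedence rule quoted above, as a Python sketch. Everything here is an interpretation: the bit values and the function name are hypothetical, and merging node and global options with a bitwise OR before resolving the conflict is an assumption about how the two sets combine.

```python
OPT_HTTP = 0x1     # illustrative bit values
OPT_ENCODE = 0x2

def resolve_node_options(node_opts, global_opts):
    """Resolve the HTTP vs ENCODE conflict once, at postparsing time."""
    opts = node_opts | global_opts
    if (opts & OPT_HTTP) and (opts & OPT_ENCODE):
        if not (global_opts & OPT_HTTP) and (global_opts & OPT_ENCODE):
            opts &= ~OPT_HTTP      # global encoding takes the precedence
        else:
            opts &= ~OPT_ENCODE    # encoding is ignored when HTTP is set
    return opts
```

Doing this once per node at postparsing time is what removes the per-log-line bitwise checks from the log building path.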
2024-05-06 11:13:46 +02:00
Ilia Shipitsin
05ecba0813 CI: netbsd: limit scheduled workflow to parent repo only
It is not very useful for most forks.
2024-05-06 08:26:14 +02:00
Ilia Shipitsin
fab5a23731 CI: add Illumos scheduled workflow
This is a very initial, build-only implementation.
2024-05-06 08:26:05 +02:00
Ilia Shipitsin
a7cf2454dd BUILD: clock: improve check for pthread_getcpuclockid()
If _POSIX_THREAD_CPUTIME is greater than 0, pthread_getcpuclockid()
is implemented.

This should fix the build on Solaris 11.

Reference: https://docs.oracle.com/cd/E88353_01/html/E37842/unistd-3head.html
ML: https://www.mail-archive.com/haproxy@formilux.org/msg44915.html
2024-05-06 08:25:17 +02:00
960 changed files with 109256 additions and 36857 deletions


@ -1,15 +1,15 @@
FreeBSD_task:
freebsd_instance:
matrix:
image_family: freebsd-13-2
image_family: freebsd-14-3
only_if: $CIRRUS_BRANCH =~ 'master|next'
install_script:
- pkg update -f && pkg upgrade -y && pkg install -y openssl git gmake lua53 socat pcre
- pkg update -f && pkg upgrade -y && pkg install -y openssl git gmake lua54 socat pcre2
script:
- sudo sysctl kern.corefile=/tmp/%N.%P.core
- sudo sysctl kern.sugid_coredump=1
- scripts/build-vtest.sh
- gmake CC=clang V=1 ERR=1 TARGET=freebsd USE_ZLIB=1 USE_PCRE=1 USE_OPENSSL=1 USE_LUA=1 LUA_INC=/usr/local/include/lua53 LUA_LIB=/usr/local/lib LUA_LIB_NAME=lua-5.3
- gmake CC=clang V=1 ERR=1 TARGET=freebsd USE_ZLIB=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 LUA_INC=/usr/local/include/lua54 LUA_LIB=/usr/local/lib LUA_LIB_NAME=lua-5.4
- ./haproxy -vv
- ldd haproxy
test_script:

.github/actions/setup-vtest/action.yml vendored Normal file

@ -0,0 +1,34 @@
name: 'setup VTest'
description: 'ssss'
runs:
using: "composite"
steps:
- name: Setup coredumps
if: ${{ startsWith(matrix.os, 'ubuntu-') }}
shell: bash
run: |
sudo sysctl -w fs.suid_dumpable=1
sudo sysctl kernel.core_pattern=/tmp/core.%h.%e.%t
- name: Setup ulimit for core dumps
shell: bash
run: |
# This is required for macOS which does not actually allow to increase
# the '-n' soft limit to the hard limit, thus failing to run.
ulimit -n 65536
ulimit -c unlimited
- name: Install VTest
shell: bash
run: |
scripts/build-vtest.sh
- name: Install problem matcher for VTest
shell: bash
# This allows one to more easily see which tests fail.
run: echo "::add-matcher::.github/vtest.json"


@ -19,9 +19,9 @@ defaults
frontend h2
mode http
bind 127.0.0.1:8443 ssl crt reg-tests/ssl/common.pem alpn h2,http/1.1
default_backend h2
bind 127.0.0.1:8443 ssl crt reg-tests/ssl/certs/common.pem alpn h2,http/1.1
default_backend h2b
backend h2
backend h2b
errorfile 200 .github/errorfile
http-request deny deny_status 200

.github/matrix.py vendored

@ -67,6 +67,37 @@ def determine_latest_aws_lc(ssl):
latest_tag = max(valid_tags, key=aws_lc_version_string_to_num)
return "AWS_LC_VERSION={}".format(latest_tag[1:])
def aws_lc_fips_version_string_to_num(version_string):
return tuple(map(int, version_string[12:].split('.')))
def aws_lc_fips_version_valid(version_string):
return re.match('^AWS-LC-FIPS-[0-9]+(\.[0-9]+)*$', version_string)
@functools.lru_cache(5)
def determine_latest_aws_lc_fips(ssl):
# the AWS-LC-FIPS tags are at the end of the list, so let's get a lot
tags = get_all_github_tags("https://api.github.com/repos/aws/aws-lc/tags?per_page=200")
if not tags:
return "AWS_LC_FIPS_VERSION=failed_to_detect"
valid_tags = list(filter(aws_lc_fips_version_valid, tags))
latest_tag = max(valid_tags, key=aws_lc_fips_version_string_to_num)
return "AWS_LC_FIPS_VERSION={}".format(latest_tag[12:])
def wolfssl_version_string_to_num(version_string):
return tuple(map(int, version_string[1:].removesuffix('-stable').split('.')))
def wolfssl_version_valid(version_string):
return re.match('^v[0-9]+(\.[0-9]+)*-stable$', version_string)
@functools.lru_cache(5)
def determine_latest_wolfssl(ssl):
tags = get_all_github_tags("https://api.github.com/repos/wolfssl/wolfssl/tags")
if not tags:
return "WOLFSSL_VERSION=failed_to_detect"
valid_tags = list(filter(wolfssl_version_valid, tags))
latest_tag = max(valid_tags, key=wolfssl_version_string_to_num)
return "WOLFSSL_VERSION={}".format(latest_tag[1:].removesuffix('-stable'))
@functools.lru_cache(5)
def determine_latest_libressl(ssl):
try:
@ -94,9 +125,11 @@ def main(ref_name):
# Ubuntu
if "haproxy-" in ref_name:
os = "ubuntu-22.04" # stable branch
os = "ubuntu-24.04" # stable branch
os_arm = "ubuntu-24.04-arm" # stable branch
else:
os = "ubuntu-latest" # development branch
os = "ubuntu-24.04" # development branch
os_arm = "ubuntu-24.04-arm" # development branch
TARGET = "linux-glibc"
for CC in ["gcc", "clang"]:
@ -123,11 +156,10 @@ def main(ref_name):
"OT_INC=${HOME}/opt-ot/include",
"OT_LIB=${HOME}/opt-ot/lib",
"OT_RUNPATH=1",
"USE_PCRE=1",
"USE_PCRE_JIT=1",
"USE_PCRE2=1",
"USE_PCRE2_JIT=1",
"USE_LUA=1",
"USE_OPENSSL=1",
"USE_SYSTEMD=1",
"USE_WURFL=1",
"WURFL_INC=addons/wurfl/dummy",
"WURFL_LIB=addons/wurfl/dummy",
@ -142,37 +174,37 @@ def main(ref_name):
# ASAN
matrix.append(
{
"name": "{}, {}, ASAN, all features".format(os, CC),
"os": os,
"TARGET": TARGET,
"CC": CC,
"FLAGS": [
"USE_OBSOLETE_LINKER=1",
'ARCH_FLAGS="-g -fsanitize=address"',
'OPT_CFLAGS="-O1"',
"USE_ZLIB=1",
"USE_OT=1",
"OT_INC=${HOME}/opt-ot/include",
"OT_LIB=${HOME}/opt-ot/lib",
"OT_RUNPATH=1",
"USE_PCRE=1",
"USE_PCRE_JIT=1",
"USE_LUA=1",
"USE_OPENSSL=1",
"USE_SYSTEMD=1",
"USE_WURFL=1",
"WURFL_INC=addons/wurfl/dummy",
"WURFL_LIB=addons/wurfl/dummy",
"USE_DEVICEATLAS=1",
"DEVICEATLAS_SRC=addons/deviceatlas/dummy",
"USE_PROMEX=1",
"USE_51DEGREES=1",
"51DEGREES_SRC=addons/51degrees/dummy/pattern",
],
}
)
for os_asan in [os, os_arm]:
matrix.append(
{
"name": "{}, {}, ASAN, all features".format(os_asan, CC),
"os": os_asan,
"TARGET": TARGET,
"CC": CC,
"FLAGS": [
"USE_OBSOLETE_LINKER=1",
'ARCH_FLAGS="-g -fsanitize=address"',
'OPT_CFLAGS="-O1"',
"USE_ZLIB=1",
"USE_OT=1",
"OT_INC=${HOME}/opt-ot/include",
"OT_LIB=${HOME}/opt-ot/lib",
"OT_RUNPATH=1",
"USE_PCRE2=1",
"USE_PCRE2_JIT=1",
"USE_LUA=1",
"USE_OPENSSL=1",
"USE_WURFL=1",
"WURFL_INC=addons/wurfl/dummy",
"WURFL_LIB=addons/wurfl/dummy",
"USE_DEVICEATLAS=1",
"DEVICEATLAS_SRC=addons/deviceatlas/dummy",
"USE_PROMEX=1",
"USE_51DEGREES=1",
"51DEGREES_SRC=addons/51degrees/dummy/pattern",
],
}
)
for compression in ["USE_ZLIB=1"]:
matrix.append(
@ -189,9 +221,10 @@ def main(ref_name):
"stock",
"OPENSSL_VERSION=1.0.2u",
"OPENSSL_VERSION=1.1.1s",
"OPENSSL_VERSION=3.5.1",
"QUICTLS=yes",
"WOLFSSL_VERSION=5.6.6",
"AWS_LC_VERSION=1.16.0",
"WOLFSSL_VERSION=5.7.0",
"AWS_LC_VERSION=1.39.0",
# "BORINGSSL=yes",
]
@ -203,8 +236,7 @@ def main(ref_name):
for ssl in ssl_versions:
flags = ["USE_OPENSSL=1"]
if ssl == "BORINGSSL=yes" or ssl == "QUICTLS=yes" or "LIBRESSL" in ssl or "WOLFSSL" in ssl or "AWS_LC" in ssl:
flags.append("USE_QUIC=1")
skipdup=0
if "WOLFSSL" in ssl:
flags.append("USE_OPENSSL_WOLFSSL=1")
if "AWS_LC" in ssl:
@ -214,8 +246,23 @@ def main(ref_name):
flags.append("SSL_INC=${HOME}/opt/include")
if "LIBRESSL" in ssl and "latest" in ssl:
ssl = determine_latest_libressl(ssl)
skipdup=1
if "OPENSSL" in ssl and "latest" in ssl:
ssl = determine_latest_openssl(ssl)
skipdup=1
# if "latest" equals a version already in the list
if ssl in ssl_versions and skipdup == 1:
continue
openssl_supports_quic = False
try:
openssl_supports_quic = version.Version(ssl.split("OPENSSL_VERSION=",1)[1]) >= version.Version("3.5.0")
except:
pass
if ssl == "BORINGSSL=yes" or ssl == "QUICTLS=yes" or "LIBRESSL" in ssl or "WOLFSSL" in ssl or "AWS_LC" in ssl or openssl_supports_quic:
flags.append("USE_QUIC=1")
matrix.append(
{
@ -233,7 +280,7 @@ def main(ref_name):
if "haproxy-" in ref_name:
os = "macos-13" # stable branch
else:
os = "macos-14" # development branch
os = "macos-26" # development branch
TARGET = "osx"
for CC in ["clang"]:
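
The tag-selection helpers added to matrix.py in this diff can be tried offline with a hard-coded tag list. The sketch below reuses the wolfSSL pattern and parsing from the diff; `latest_wolfssl` and the sample tags are hypothetical, and the network call to the GitHub API is left out. Raw strings are used for the regular expression, which avoids the invalid-escape warning that recent Python versions emit for `'\.'` in a plain string.

```python
import re

def wolfssl_version_valid(tag):
    """Accept only release tags of the form vX.Y.Z-stable."""
    return re.match(r'^v[0-9]+(\.[0-9]+)*-stable$', tag)

def wolfssl_version_string_to_num(tag):
    """Turn 'v5.7.0-stable' into (5, 7, 0) so versions compare numerically."""
    return tuple(map(int, tag[1:].removesuffix('-stable').split('.')))

def latest_wolfssl(tags):
    """Pick the highest valid tag from an already fetched list."""
    valid = [t for t in tags if wolfssl_version_valid(t)]
    latest = max(valid, key=wolfssl_version_string_to_num)
    return "WOLFSSL_VERSION={}".format(latest[1:].removesuffix('-stable'))
```

Comparing tuples of integers rather than strings matters as soon as a component reaches two digits: `(5, 10, 0)` sorts above `(5, 7, 0)`, while the string `"5.10.0"` would sort below `"5.7.0"`. Note that `str.removesuffix()` requires Python 3.9 or later.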

.github/workflows/aws-lc-fips.yml vendored Normal file

@ -0,0 +1,12 @@
name: AWS-LC-FIPS
on:
schedule:
- cron: "0 0 * * 4"
workflow_dispatch:
jobs:
test:
uses: ./.github/workflows/aws-lc-template.yml
with:
command: "from matrix import determine_latest_aws_lc_fips; print(determine_latest_aws_lc_fips(''))"

.github/workflows/aws-lc-template.yml vendored Normal file

@ -0,0 +1,94 @@
name: AWS-LC template
on:
workflow_call:
inputs:
command:
required: true
type: string
permissions:
contents: read
jobs:
test:
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
steps:
- uses: actions/checkout@v5
- name: Determine latest AWS-LC release
id: get_aws_lc_release
run: |
result=$(cd .github && python3 -c "${{ inputs.command }}")
echo $result
echo "result=$result" >> $GITHUB_OUTPUT
- name: Cache AWS-LC
id: cache_aws_lc
uses: actions/cache@v4
with:
path: '~/opt/'
key: ssl-${{ steps.get_aws_lc_release.outputs.result }}-Ubuntu-latest-gcc
- name: Install apt dependencies
run: |
sudo apt-get update -o Acquire::Languages=none -o Acquire::Translation=none
sudo apt-get --no-install-recommends -y install socat gdb jose
- name: Install AWS-LC
if: ${{ steps.cache_ssl.outputs.cache-hit != 'true' }}
run: env ${{ steps.get_aws_lc_release.outputs.result }} scripts/build-ssl.sh
- name: Compile HAProxy
run: |
make -j$(nproc) ERR=1 CC=gcc TARGET=linux-glibc \
USE_OPENSSL_AWSLC=1 USE_QUIC=1 \
SSL_LIB=${HOME}/opt/lib SSL_INC=${HOME}/opt/include \
DEBUG="-DDEBUG_POOL_INTEGRITY -DDEBUG_UNIT" \
ADDLIB="-Wl,-rpath,/usr/local/lib/ -Wl,-rpath,$HOME/opt/lib/"
sudo make install
- name: Show HAProxy version
id: show-version
run: |
ldd $(which haproxy)
haproxy -vv
echo "version=$(haproxy -v |awk 'NR==1{print $3}')" >> $GITHUB_OUTPUT
- uses: ./.github/actions/setup-vtest
- name: Run VTest for HAProxy
id: vtest
run: |
make reg-tests VTEST_PROGRAM=../vtest/vtest REGTESTS_TYPES=default,bug,devel
- name: Run Unit tests
id: unittests
run: |
make unit-tests
- name: Show VTest results
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
for folder in ${TMPDIR:-/tmp}/haregtests-*/vtc.*; do
printf "::group::"
cat $folder/INFO
cat $folder/LOG
echo "::endgroup::"
done
exit 1
- name: Show coredumps
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
failed=false
shopt -s nullglob
for file in /tmp/core.*; do
failed=true
printf "::group::"
gdb -ex 'thread apply all bt full' ./haproxy $file
echo "::endgroup::"
done
if [ "$failed" = true ]; then
exit 1;
fi
- name: Show Unit-Tests results
if: ${{ failure() && steps.unittests.outcome == 'failure' }}
run: |
for result in ${TMPDIR:-/tmp}/ha-unittests-*/results/res.*; do
printf "::group::"
cat $result
echo "::endgroup::"
done
exit 1
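
The template passes the output of the Python `command` from one step to the next: it is appended to `$GITHUB_OUTPUT` as `result=...`, then expanded on the `env ... scripts/build-ssl.sh` line. A minimal sketch of that hand-off, runnable outside GitHub Actions, assuming the command prints a single `NAME=value` pair. `DEMO_SSL_VERSION`, the value `1.2.3`, and the temporary file standing in for `$GITHUB_OUTPUT` are made up for illustration:

```shell
#!/bin/sh
# Stand-in for the file GitHub Actions exposes through $GITHUB_OUTPUT.
GITHUB_OUTPUT=$(mktemp)

# Hypothetical value; in the workflow this is whatever the Python command prints.
result="DEMO_SSL_VERSION=1.2.3"
echo "result=$result" >> "$GITHUB_OUTPUT"

# A later step reads the stored value back and hands it to env(1), which
# exports it as an environment variable for the command that follows.
stored=$(sed -n 's/^result=//p' "$GITHUB_OUTPUT")
built=$(env "$stored" sh -c 'echo "building $DEMO_SSL_VERSION"')
echo "$built"

rm -f "$GITHUB_OUTPUT"
```

Because the value is consumed as an `env` assignment, the command passed as the `command` input has to print exactly that form for the build script to see the variable.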

@ -5,62 +5,8 @@ on:
- cron: "0 0 * * 4"
workflow_dispatch:
permissions:
contents: read
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install VTest
run: |
scripts/build-vtest.sh
- name: Determine latest AWS-LC release
id: get_aws_lc_release
run: |
result=$(cd .github && python3 -c "from matrix import determine_latest_aws_lc; print(determine_latest_aws_lc(''))")
echo $result
echo "result=$result" >> $GITHUB_OUTPUT
- name: Cache AWS-LC
id: cache_aws_lc
uses: actions/cache@v4
with:
path: '~/opt/'
key: ssl-${{ steps.get_aws_lc_release.outputs.result }}-Ubuntu-latest-gcc
- name: Install AWS-LC
if: ${{ steps.cache_ssl.outputs.cache-hit != 'true' }}
run: env ${{ steps.get_aws_lc_release.outputs.result }} scripts/build-ssl.sh
- name: Compile HAProxy
run: |
make -j$(nproc) CC=gcc TARGET=linux-glibc \
USE_OPENSSL_AWSLC=1 USE_QUIC=1 \
SSL_LIB=${HOME}/opt/lib SSL_INC=${HOME}/opt/include \
DEBUG="-DDEBUG_POOL_INTEGRITY" \
ADDLIB="-Wl,-rpath,/usr/local/lib/ -Wl,-rpath,$HOME/opt/lib/"
sudo make install
- name: Show HAProxy version
id: show-version
run: |
ldd $(which haproxy)
haproxy -vv
echo "version=$(haproxy -v |awk 'NR==1{print $3}')" >> $GITHUB_OUTPUT
- name: Install problem matcher for VTest
run: echo "::add-matcher::.github/vtest.json"
- name: Run VTest for HAProxy
id: vtest
run: |
# This is required for macOS, which does not actually allow increasing
# the '-n' soft limit to the hard limit, thus failing to run.
ulimit -n 65536
make reg-tests VTEST_PROGRAM=../vtest/vtest REGTESTS_TYPES=default,bug,devel
- name: Show VTest results
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
for folder in ${TMPDIR}/haregtests-*/vtc.*; do
printf "::group::"
cat $folder/INFO
cat $folder/LOG
echo "::endgroup::"
done
exit 1
uses: ./.github/workflows/aws-lc-template.yml
with:
command: "from matrix import determine_latest_aws_lc; print(determine_latest_aws_lc(''))"

@ -3,6 +3,7 @@ name: Spelling Check
on:
schedule:
- cron: "0 0 * * 2"
workflow_dispatch:
permissions:
contents: read
@ -10,12 +11,12 @@ permissions:
jobs:
codespell:
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' }}
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
steps:
- uses: actions/checkout@v4
- uses: codespell-project/codespell-problem-matcher@v1
- uses: actions/checkout@v5
- uses: codespell-project/codespell-problem-matcher@v1.2.0
- uses: codespell-project/actions-codespell@master
with:
skip: CHANGELOG,Makefile,*.fig,*.pem,./doc/design-thoughts,./doc/internals
ignore_words_list: ist,ists,hist,wan,ca,cas,que,ans,te,nd,referer,ot,uint,iif,fo,keep-alives,dosen,ifset,thrid,strack,ba,chck,hel,unx,mor,clen,collet,bu,htmp,siz,experim
ignore_words_list: pres,ist,ists,hist,wan,ca,cas,que,ans,te,nd,referer,ot,uint,iif,fo,keep-alives,dosen,ifset,thrid,strack,ba,chck,hel,unx,mor,clen,collet,bu,htmp,siz,experim
uri_ignore_words_list: trafic,ressources
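
The job-level `if:` changed above recurs in most workflows of this series. Scheduled runs stay limited to the `haproxy` organisation, while a manual `workflow_dispatch` is allowed anywhere, including forks. A small sketch of the resulting truth table; `should_run` and the owner name `someone-else` are hypothetical, and the sketch compares strings exactly, whereas GitHub expressions ignore case:

```shell
#!/bin/sh
# should_run OWNER EVENT: mirrors
#   github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch'
should_run() {
    [ "$1" = "haproxy" ] || [ "$2" = "workflow_dispatch" ]
}

for owner in haproxy someone-else; do
    for event in schedule workflow_dispatch; do
        if should_run "$owner" "$event"; then
            verdict=run
        else
            verdict=skip
        fi
        echo "$owner / $event -> $verdict"
    done
done
```

Only the combination of a foreign owner and a scheduled event is skipped.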

@ -11,15 +11,10 @@ permissions:
jobs:
h2spec:
name: h2spec
runs-on: ${{ matrix.os }}
strategy:
matrix:
include:
- TARGET: linux-glibc
CC: gcc
os: ubuntu-latest
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5
- name: Install h2spec
id: install-h2spec
run: |
@ -28,12 +23,12 @@ jobs:
tar xvf h2spec.tar.gz
sudo install -m755 h2spec /usr/local/bin/h2spec
echo "version=${H2SPEC_VERSION}" >> $GITHUB_OUTPUT
- name: Compile HAProxy with ${{ matrix.CC }}
- name: Compile HAProxy with gcc
run: |
make -j$(nproc) all \
ERR=1 \
TARGET=${{ matrix.TARGET }} \
CC=${{ matrix.CC }} \
TARGET=linux-glibc \
CC=gcc \
DEBUG="-DDEBUG_POOL_INTEGRITY" \
USE_OPENSSL=1
sudo make install

@ -10,7 +10,7 @@ jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5
- name: Compile admin/halog/halog
run: |
make admin/halog/halog

@ -15,14 +15,15 @@ permissions:
jobs:
scan:
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' }}
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5
- name: Install apt dependencies
run: |
sudo apt-get update
sudo apt-get install -y \
liblua5.3-dev \
sudo apt-get update -o Acquire::Languages=none -o Acquire::Translation=none
sudo apt-get --no-install-recommends -y install \
liblua5.4-dev \
libpcre2-dev \
libsystemd-dev
- name: Install QUICTLS
run: |
@ -37,7 +38,7 @@ jobs:
- name: Build with Coverity build tool
run: |
export PATH=`pwd`/coverity_tool/bin:$PATH
cov-build --dir cov-int make CC=clang TARGET=linux-glibc USE_ZLIB=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_LUA=1 USE_OPENSSL=1 USE_QUIC=1 USE_SYSTEMD=1 USE_WURFL=1 WURFL_INC=addons/wurfl/dummy WURFL_LIB=addons/wurfl/dummy USE_DEVICEATLAS=1 DEVICEATLAS_SRC=addons/deviceatlas/dummy USE_51DEGREES=1 51DEGREES_SRC=addons/51degrees/dummy/pattern ADDLIB=\"-Wl,-rpath,$HOME/opt/lib/\" SSL_LIB=${HOME}/opt/lib SSL_INC=${HOME}/opt/include DEBUG+=-DDEBUG_STRICT=1 DEBUG+=-DDEBUG_USE_ABORT=1
cov-build --dir cov-int make CC=clang TARGET=linux-glibc USE_ZLIB=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_LUA=1 USE_OPENSSL=1 USE_QUIC=1 USE_WURFL=1 WURFL_INC=addons/wurfl/dummy WURFL_LIB=addons/wurfl/dummy USE_DEVICEATLAS=1 DEVICEATLAS_SRC=addons/deviceatlas/dummy USE_51DEGREES=1 51DEGREES_SRC=addons/51degrees/dummy/pattern ADDLIB=\"-Wl,-rpath,$HOME/opt/lib/\" SSL_LIB=${HOME}/opt/lib SSL_INC=${HOME}/opt/include DEBUG+=-DDEBUG_STRICT=2 DEBUG+=-DDEBUG_USE_ABORT=1
- name: Submit build result to Coverity Scan
run: |
tar czvf cov.tar.gz cov-int

@ -6,6 +6,7 @@ name: Cross Compile
on:
schedule:
- cron: "0 0 21 * *"
workflow_dispatch:
permissions:
contents: read
@ -90,15 +91,15 @@ jobs:
}
]
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' }}
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
steps:
- name: install packages
run: |
sudo apt-get update
sudo apt-get update -o Acquire::Languages=none -o Acquire::Translation=none
sudo apt-get -yq --force-yes install \
gcc-${{ matrix.platform.arch }} \
${{ matrix.platform.libs }}
- uses: actions/checkout@v4
- uses: actions/checkout@v5
- name: install quictls

@ -3,6 +3,7 @@ name: Fedora/Rawhide/QuicTLS
on:
schedule:
- cron: "0 0 25 * *"
workflow_dispatch:
permissions:
contents: read
@ -17,19 +18,19 @@ jobs:
{ name: x86, cc: gcc, QUICTLS_EXTRA_ARGS: "-m32 linux-generic32", ADDLIB_ATOMIC: "-latomic", ARCH_FLAGS: "-m32" },
{ name: x86, cc: clang, QUICTLS_EXTRA_ARGS: "-m32 linux-generic32", ADDLIB_ATOMIC: "-latomic", ARCH_FLAGS: "-m32" }
]
fail-fast: false
name: ${{ matrix.platform.cc }}.${{ matrix.platform.name }}
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' }}
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
container:
image: fedora:rawhide
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5
- name: Install dependencies
run: |
dnf -y install diffutils git pcre-devel zlib-devel pcre2-devel 'perl(FindBin)' perl-IPC-Cmd 'perl(File::Copy)' 'perl(File::Compare)' lua-devel socat findutils systemd-devel clang
dnf -y install awk diffutils git pcre-devel zlib-devel pcre2-devel 'perl(FindBin)' perl-IPC-Cmd 'perl(File::Copy)' 'perl(File::Compare)' lua-devel socat findutils systemd-devel clang
dnf -y install 'perl(FindBin)' 'perl(File::Compare)' perl-IPC-Cmd 'perl(File::Copy)' glibc-devel.i686 lua-devel.i686 lua-devel.x86_64 systemd-devel.i686 zlib-ng-compat-devel.i686 pcre-devel.i686 libatomic.i686
- name: Install VTest
run: scripts/build-vtest.sh
- uses: ./.github/actions/setup-vtest
- name: Install QuicTLS
run: QUICTLS=yes QUICTLS_EXTRA_ARGS="${{ matrix.platform.QUICTLS_EXTRA_ARGS }}" scripts/build-ssl.sh
- name: Build contrib tools
@ -40,7 +41,7 @@ jobs:
make dev/hpack/decode dev/hpack/gen-enc dev/hpack/gen-rht
- name: Compile HAProxy with ${{ matrix.platform.cc }}
run: |
make -j3 CC=${{ matrix.platform.cc }} V=1 ERR=1 TARGET=linux-glibc USE_OPENSSL=1 USE_QUIC=1 USE_ZLIB=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_LUA=1 USE_SYSTEMD=1 ADDLIB="${{ matrix.platform.ADDLIB_ATOMIC }} -Wl,-rpath,${HOME}/opt/lib" SSL_LIB=${HOME}/opt/lib SSL_INC=${HOME}/opt/include ARCH_FLAGS="${{ matrix.platform.ARCH_FLAGS }}"
make -j3 CC=${{ matrix.platform.cc }} V=1 ERR=1 TARGET=linux-glibc DEBUG="-DDEBUG_POOL_INTEGRITY -DDEBUG_UNIT" USE_OPENSSL=1 USE_QUIC=1 USE_ZLIB=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_LUA=1 ADDLIB="${{ matrix.platform.ADDLIB_ATOMIC }} -Wl,-rpath,${HOME}/opt/lib" SSL_LIB=${HOME}/opt/lib SSL_INC=${HOME}/opt/include ARCH_FLAGS="${{ matrix.platform.ARCH_FLAGS }}"
make install
- name: Show HAProxy version
id: show-version
@ -57,9 +58,13 @@ jobs:
- name: Show VTest results
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
for folder in ${TMPDIR}/haregtests-*/vtc.*; do
for folder in ${TMPDIR:-/tmp}/haregtests-*/vtc.*; do
printf "::group::"
cat $folder/INFO
cat $folder/LOG
echo "::endgroup::"
done
- name: Run Unit tests
id: unittests
run: |
make unit-tests
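
The switch from `${TMPDIR}` to `${TMPDIR:-/tmp}` matters when `TMPDIR` is unset or empty: the old pattern is then rooted at `/`, where no VTest output is expected, while the new one falls back to `/tmp`. A sketch that prints the pattern each variant would search; the helper names are hypothetical, and the patterns are quoted so that the shell does not expand them:

```shell
#!/bin/sh
# Hypothetical helpers: print the glob pattern each variant would search.
old_pattern() { printf '%s\n' "${TMPDIR}/haregtests-*/vtc.*"; }
new_pattern() { printf '%s\n' "${TMPDIR:-/tmp}/haregtests-*/vtc.*"; }

unset TMPDIR
old_pattern    # prints /haregtests-*/vtc.*
new_pattern    # prints /tmp/haregtests-*/vtc.*

TMPDIR=/var/tmp
new_pattern    # prints /var/tmp/haregtests-*/vtc.*
```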

.github/workflows/illumos.yml (new file)

@ -0,0 +1,24 @@
name: Illumos
on:
schedule:
- cron: "0 0 25 * *"
workflow_dispatch:
jobs:
gcc:
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
permissions:
contents: read
steps:
- name: "Checkout repository"
uses: actions/checkout@v5
- name: "Build on VM"
uses: vmactions/solaris-vm@v1
with:
prepare: |
pkg install gcc make
run: |
gmake CC=gcc TARGET=solaris USE_OPENSSL=1 USE_PROMEX=1

@ -20,13 +20,13 @@ jobs:
run: |
ulimit -c unlimited
echo '/tmp/core/core.%h.%e.%t' > /proc/sys/kernel/core_pattern
- uses: actions/checkout@v4
- uses: actions/checkout@v5
- name: Install dependencies
run: apk add gcc gdb make tar git python3 libc-dev linux-headers pcre-dev pcre2-dev openssl-dev lua5.3-dev grep socat curl musl-dbg lua5.3-dbg
run: apk add gcc gdb make tar git python3 libc-dev linux-headers pcre-dev pcre2-dev openssl-dev lua5.3-dev grep socat curl musl-dbg lua5.3-dbg jose
- name: Install VTest
run: scripts/build-vtest.sh
- name: Build
run: make -j$(nproc) TARGET=linux-musl ARCH_FLAGS='-ggdb3' CC=cc V=1 USE_LUA=1 LUA_INC=/usr/include/lua5.3 LUA_LIB=/usr/lib/lua5.3 USE_OPENSSL=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_PROMEX=1
run: make -j$(nproc) TARGET=linux-musl DEBUG="-DDEBUG_POOL_INTEGRITY -DDEBUG_UNIT" ARCH_FLAGS='-ggdb3' CC=cc V=1 USE_LUA=1 LUA_INC=/usr/include/lua5.3 LUA_LIB=/usr/lib/lua5.3 USE_OPENSSL=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_PROMEX=1
- name: Show version
run: ./haproxy -vv
- name: Show linked libraries
@ -37,6 +37,10 @@ jobs:
- name: Run VTest
id: vtest
run: make reg-tests VTEST_PROGRAM=../vtest/vtest REGTESTS_TYPES=default,bug,devel
- name: Run Unit tests
id: unittests
run: |
make unit-tests
- name: Show coredumps
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
@ -60,3 +64,13 @@ jobs:
cat $folder/LOG
echo "::endgroup::"
done
- name: Show Unit-Tests results
if: ${{ failure() && steps.unittests.outcome == 'failure' }}
run: |
for result in ${TMPDIR:-/tmp}/ha-unittests-*/results/res.*; do
printf "::group::"
cat $result
echo "::endgroup::"
done
exit 1

@ -3,15 +3,17 @@ name: NetBSD
on:
schedule:
- cron: "0 0 25 * *"
workflow_dispatch:
jobs:
gcc:
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
permissions:
contents: read
steps:
- name: "Checkout repository"
uses: actions/checkout@v4
uses: actions/checkout@v5
- name: "Build on VM"
uses: vmactions/netbsd-vm@v1
@ -19,4 +21,4 @@ jobs:
prepare: |
/usr/sbin/pkg_add gmake curl
run: |
gmake CC=gcc TARGET=netbsd USE_OPENSSL=1 USE_LUA=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_PROMEX=1 USE_ZLIB=1
gmake CC=gcc TARGET=netbsd ERR=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_PROMEX=1 USE_ZLIB=1

.github/workflows/openssl-ech.yml (new file)

@ -0,0 +1,82 @@
name: openssl ECH
on:
schedule:
- cron: "0 3 * * *"
workflow_dispatch:
permissions:
contents: read
jobs:
test:
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
steps:
- uses: actions/checkout@v5
- name: Install VTest
run: |
scripts/build-vtest.sh
- name: Install apt dependencies
run: |
sudo apt-get update -o Acquire::Languages=none -o Acquire::Translation=none
sudo apt-get --no-install-recommends -y install socat gdb
sudo apt-get --no-install-recommends -y install libpsl-dev
- name: Install OpenSSL+ECH
run: env OPENSSL_VERSION="git-feature/ech" GIT_TYPE="branch" scripts/build-ssl.sh
- name: Install curl+ECH
run: env SSL_LIB=${HOME}/opt/ scripts/build-curl.sh
- name: Compile HAProxy
run: |
make -j$(nproc) CC=gcc TARGET=linux-glibc \
USE_QUIC=1 USE_OPENSSL=1 USE_ECH=1 \
SSL_LIB=${HOME}/opt/lib SSL_INC=${HOME}/opt/include \
DEBUG="-DDEBUG_POOL_INTEGRITY -DDEBUG_UNIT" \
ADDLIB="-Wl,-rpath,/usr/local/lib/ -Wl,-rpath,$HOME/opt/lib/" \
ARCH_FLAGS="-ggdb3 -fsanitize=address"
sudo make install
- name: Show HAProxy version
id: show-version
run: |
ldd $(which haproxy)
haproxy -vv
echo "version=$(haproxy -v |awk 'NR==1{print $3}')" >> $GITHUB_OUTPUT
- name: Install problem matcher for VTest
run: echo "::add-matcher::.github/vtest.json"
- name: Run VTest for HAProxy
id: vtest
run: |
# This is required for macOS, which does not actually allow increasing
# the '-n' soft limit to the hard limit, thus failing to run.
ulimit -n 65536
# allow coredumps to be caught
ulimit -c unlimited
make reg-tests VTEST_PROGRAM=../vtest/vtest REGTESTS_TYPES=default,bug,devel
- name: Show VTest results
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
for folder in ${TMPDIR:-/tmp}/haregtests-*/vtc.*; do
printf "::group::"
cat $folder/INFO
cat $folder/LOG
echo "::endgroup::"
done
exit 1
- name: Run Unit tests
id: unittests
run: |
make unit-tests
- name: Show coredumps
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
failed=false
shopt -s nullglob
for file in /tmp/core.*; do
failed=true
printf "::group::"
gdb -ex 'thread apply all bt full' ./haproxy $file
echo "::endgroup::"
done
if [ "$failed" = true ]; then
exit 1;
fi

.github/workflows/openssl-master.yml (new file)

@ -0,0 +1,77 @@
name: openssl master
on:
schedule:
- cron: "0 3 * * *"
workflow_dispatch:
permissions:
contents: read
jobs:
test:
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
steps:
- uses: actions/checkout@v5
- name: Install apt dependencies
run: |
sudo apt-get update -o Acquire::Languages=none -o Acquire::Translation=none
sudo apt-get --no-install-recommends -y install socat gdb
sudo apt-get --no-install-recommends -y install libpsl-dev
- uses: ./.github/actions/setup-vtest
- name: Install OpenSSL master
run: env OPENSSL_VERSION="git-master" GIT_TYPE="branch" scripts/build-ssl.sh
- name: Compile HAProxy
run: |
make -j$(nproc) ERR=1 CC=gcc TARGET=linux-glibc \
USE_QUIC=1 USE_OPENSSL=1 \
SSL_LIB=${HOME}/opt/lib SSL_INC=${HOME}/opt/include \
DEBUG="-DDEBUG_POOL_INTEGRITY -DDEBUG_UNIT" \
ADDLIB="-Wl,-rpath,/usr/local/lib/ -Wl,-rpath,$HOME/opt/lib/"
sudo make install
- name: Show HAProxy version
id: show-version
run: |
ldd $(which haproxy)
haproxy -vv
echo "version=$(haproxy -v |awk 'NR==1{print $3}')" >> $GITHUB_OUTPUT
- name: Install problem matcher for VTest
run: echo "::add-matcher::.github/vtest.json"
- name: Run VTest for HAProxy
id: vtest
run: |
# This is required for macOS, which does not actually allow increasing
# the '-n' soft limit to the hard limit, thus failing to run.
ulimit -n 65536
# allow coredumps to be caught
ulimit -c unlimited
make reg-tests VTEST_PROGRAM=../vtest/vtest REGTESTS_TYPES=default,bug,devel
- name: Show VTest results
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
for folder in ${TMPDIR:-/tmp}/haregtests-*/vtc.*; do
printf "::group::"
cat $folder/INFO
cat $folder/LOG
echo "::endgroup::"
done
exit 1
- name: Run Unit tests
id: unittests
run: |
make unit-tests
- name: Show coredumps
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
failed=false
shopt -s nullglob
for file in /tmp/core.*; do
failed=true
printf "::group::"
gdb -ex 'thread apply all bt full' ./haproxy $file
echo "::endgroup::"
done
if [ "$failed" = true ]; then
exit 1;
fi
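
The "Show HAProxy version" step relies on the third whitespace-separated field of the first line printed by `haproxy -v`. A sketch of that extraction on a canned line, so it runs without an HAProxy binary; the sample line only imitates the general shape of the real output, and the version string in it is invented:

```shell
#!/bin/sh
# Made-up sample in the shape of the first line of `haproxy -v`.
sample='HAProxy version 3.4-dev2-abc123 2026/01/16 - https://haproxy.org/'

# NR==1 restricts awk to the first line; $3 is the version field.
version=$(printf '%s\nany further line is ignored\n' "$sample" | awk 'NR==1{print $3}')
echo "version=$version"
```

In the workflow the same `version=...` line is appended to `$GITHUB_OUTPUT`, which makes it available to later steps as `steps.show-version.outputs.version`.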

@ -1,33 +0,0 @@
#
# special purpose CI: test against OpenSSL built in "no-deprecated" mode
# let us run those builds weekly
#
# for example, OpenWRT uses such OpenSSL builds (those builds are smaller)
#
#
# some details might be found at NL: https://www.mail-archive.com/haproxy@formilux.org/msg35759.html
# GH: https://github.com/haproxy/haproxy/issues/367
name: openssl no-deprecated
on:
schedule:
- cron: "0 0 * * 4"
permissions:
contents: read
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install VTest
run: |
scripts/build-vtest.sh
- name: Compile HAProxy
run: |
make DEFINE="-DOPENSSL_API_COMPAT=0x10100000L -DOPENSSL_NO_DEPRECATED" -j3 CC=gcc ERR=1 TARGET=linux-glibc USE_OPENSSL=1
- name: Run VTest
run: |
make reg-tests VTEST_PROGRAM=../vtest/vtest REGTESTS_TYPES=default,bug,devel

@ -0,0 +1,104 @@
#
# goodput and crosstraffic are deliberately not run: those tests are intended for bandwidth measurement, and we currently do not want to use GitHub runners for that
#
name: QUIC Interop AWS-LC
on:
workflow_dispatch:
schedule:
- cron: "0 0 * * 2"
jobs:
build:
runs-on: ubuntu-24.04
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v5
- name: Log in to the Container registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push Docker image
id: push
uses: docker/build-push-action@v5
with:
context: https://github.com/haproxytech/haproxy-qns.git
push: true
build-args: |
SSLLIB=AWS-LC
tags: ghcr.io/${{ github.repository }}:aws-lc
- name: Cleanup registry
uses: actions/delete-package-versions@v5
with:
owner: ${{ github.repository_owner }}
package-name: 'haproxy'
package-type: container
min-versions-to-keep: 1
delete-only-untagged-versions: 'true'
run:
needs: build
strategy:
matrix:
suite: [
{ client: chrome, tests: "http3" },
{ client: picoquic, tests: "handshake,transfer,longrtt,chacha20,multiplexing,retry,resumption,zerortt,http3,blackhole,keyupdate,ecn,amplificationlimit,handshakeloss,transferloss,handshakecorruption,transfercorruption,ipv6,v2" },
{ client: quic-go, tests: "handshake,transfer,longrtt,chacha20,multiplexing,retry,resumption,zerortt,http3,blackhole,keyupdate,ecn,amplificationlimit,handshakeloss,transferloss,handshakecorruption,transfercorruption,ipv6,v2" },
{ client: ngtcp2, tests: "handshake,transfer,longrtt,chacha20,multiplexing,retry,resumption,zerortt,http3,blackhole,keyupdate,ecn,amplificationlimit,handshakeloss,transferloss,handshakecorruption,transfercorruption,ipv6,v2" }
]
fail-fast: false
name: ${{ matrix.suite.client }}
runs-on: ubuntu-24.04
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
steps:
- uses: actions/checkout@v5
- name: Log in to the Container registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Install tshark
run: |
sudo apt-get update
sudo apt-get -y install tshark
- name: Pull image
run: |
docker pull ghcr.io/${{ github.repository }}:aws-lc
- name: Run
run: |
git clone https://github.com/quic-interop/quic-interop-runner
cd quic-interop-runner
pip install -r requirements.txt --break-system-packages
python run.py -j result.json -l logs -r haproxy=ghcr.io/${{ github.repository }}:aws-lc -t ${{ matrix.suite.tests }} -c ${{ matrix.suite.client }} -s haproxy
- name: Delete succeeded logs
if: failure()
run: |
cd quic-interop-runner/logs/haproxy_${{ matrix.suite.client }}
cat ../../result.json | jq -r '.results[][] | select(.result=="succeeded") | .name' | xargs rm -rf
- name: Logs upload
if: failure()
uses: actions/upload-artifact@v4
with:
name: logs-${{ matrix.suite.client }}
path: quic-interop-runner/logs/
retention-days: 6

@ -0,0 +1,102 @@
#
# goodput and crosstraffic are deliberately not run: those tests are intended for bandwidth measurement, and we currently do not want to use GitHub runners for that
#
name: QUIC Interop LibreSSL
on:
workflow_dispatch:
schedule:
- cron: "0 0 * * 2"
jobs:
build:
runs-on: ubuntu-24.04
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v5
- name: Log in to the Container registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push Docker image
id: push
uses: docker/build-push-action@v5
with:
context: https://github.com/haproxytech/haproxy-qns.git
push: true
build-args: |
SSLLIB=LibreSSL
tags: ghcr.io/${{ github.repository }}:libressl
- name: Cleanup registry
uses: actions/delete-package-versions@v5
with:
owner: ${{ github.repository_owner }}
package-name: 'haproxy'
package-type: container
min-versions-to-keep: 1
delete-only-untagged-versions: 'true'
run:
needs: build
strategy:
matrix:
suite: [
{ client: picoquic, tests: "handshake,transfer,longrtt,chacha20,multiplexing,retry,http3,blackhole,amplificationlimit,handshakeloss,transferloss,handshakecorruption,transfercorruption,v2" },
{ client: quic-go, tests: "handshake,transfer,longrtt,chacha20,multiplexing,retry,http3,blackhole,amplificationlimit,transferloss,transfercorruption,v2" }
]
fail-fast: false
name: ${{ matrix.suite.client }}
runs-on: ubuntu-24.04
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
steps:
- uses: actions/checkout@v5
- name: Log in to the Container registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Install tshark
run: |
sudo apt-get update
sudo apt-get -y install tshark
- name: Pull image
run: |
docker pull ghcr.io/${{ github.repository }}:libressl
- name: Run
run: |
git clone https://github.com/quic-interop/quic-interop-runner
cd quic-interop-runner
pip install -r requirements.txt --break-system-packages
python run.py -j result.json -l logs -r haproxy=ghcr.io/${{ github.repository }}:libressl -t ${{ matrix.suite.tests }} -c ${{ matrix.suite.client }} -s haproxy
- name: Delete succeeded logs
if: failure()
run: |
cd quic-interop-runner/logs/haproxy_${{ matrix.suite.client }}
cat ../../result.json | jq -r '.results[][] | select(.result=="succeeded") | .name' | xargs rm -rf
- name: Logs upload
if: failure()
uses: actions/upload-artifact@v4
with:
name: logs-${{ matrix.suite.client }}
path: quic-interop-runner/logs/
retention-days: 6

.github/workflows/quictls.yml (new file)

@ -0,0 +1,74 @@
#
# weekly run against modern QuicTLS branch, i.e. https://github.com/quictls/quictls
#
name: QuicTLS
on:
schedule:
- cron: "0 0 * * 4"
workflow_dispatch:
permissions:
contents: read
jobs:
test:
runs-on: ubuntu-latest
if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
steps:
- uses: actions/checkout@v5
- name: Install apt dependencies
run: |
sudo apt-get update -o Acquire::Languages=none -o Acquire::Translation=none
sudo apt-get --no-install-recommends -y install socat gdb
- name: Install QuicTLS
run: env QUICTLS=yes QUICTLS_URL=https://github.com/quictls/quictls scripts/build-ssl.sh
- name: Compile HAProxy
run: |
make -j$(nproc) ERR=1 CC=gcc TARGET=linux-glibc \
USE_QUIC=1 USE_OPENSSL=1 \
SSL_LIB=${HOME}/opt/lib SSL_INC=${HOME}/opt/include \
DEBUG="-DDEBUG_POOL_INTEGRITY -DDEBUG_UNIT" \
ADDLIB="-Wl,-rpath,/usr/local/lib/ -Wl,-rpath,$HOME/opt/lib/" \
ARCH_FLAGS="-ggdb3 -fsanitize=address"
sudo make install
- name: Show HAProxy version
id: show-version
run: |
ldd $(which haproxy)
haproxy -vv
echo "version=$(haproxy -v |awk 'NR==1{print $3}')" >> $GITHUB_OUTPUT
- uses: ./.github/actions/setup-vtest
- name: Run VTest for HAProxy
id: vtest
run: |
make reg-tests VTEST_PROGRAM=../vtest/vtest REGTESTS_TYPES=default,bug,devel
- name: Show VTest results
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
for folder in ${TMPDIR:-/tmp}/haregtests-*/vtc.*; do
printf "::group::"
cat $folder/INFO
cat $folder/LOG
echo "::endgroup::"
done
exit 1
- name: Run Unit tests
id: unittests
run: |
make unit-tests
- name: Show coredumps
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
failed=false
shopt -s nullglob
for file in /tmp/core.*; do
failed=true
printf "::group::"
gdb -ex 'thread apply all bt full' ./haproxy $file
echo "::endgroup::"
done
if [ "$failed" = true ]; then
exit 1;
fi

@ -23,7 +23,7 @@ jobs:
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5
- name: Generate Build Matrix
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@ -44,16 +44,10 @@ jobs:
TMPDIR: /tmp
OT_CPP_VERSION: 1.6.0
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5
with:
fetch-depth: 100
- name: Setup coredumps
if: ${{ startsWith(matrix.os, 'ubuntu-') }}
run: |
sudo sysctl -w fs.suid_dumpable=1
sudo sysctl kernel.core_pattern=/tmp/core.%h.%e.%t
#
# GitHub Actions cache keys cannot contain commas, so we calculate it based on the job name
#
@ -76,26 +70,24 @@ jobs:
uses: actions/cache@v4
with:
path: '~/opt-ot/'
key: ot-${{ matrix.CC }}-${{ env.OT_CPP_VERSION }}-${{ contains(matrix.name, 'ASAN') }}
key: ${{ matrix.os }}-ot-${{ matrix.CC }}-${{ env.OT_CPP_VERSION }}-${{ contains(matrix.name, 'ASAN') }}
- name: Install apt dependencies
if: ${{ startsWith(matrix.os, 'ubuntu-') }}
run: |
sudo apt-get update
sudo apt-get install -y \
liblua5.3-dev \
libpcre2-dev \
libsystemd-dev \
ninja-build \
sudo apt-get update -o Acquire::Languages=none -o Acquire::Translation=none
sudo apt-get --no-install-recommends -y install \
${{ contains(matrix.FLAGS, 'USE_LUA=1') && 'liblua5.4-dev' || '' }} \
${{ contains(matrix.FLAGS, 'USE_PCRE2=1') && 'libpcre2-dev' || '' }} \
${{ contains(matrix.ssl, 'BORINGSSL=yes') && 'ninja-build' || '' }} \
socat \
gdb
gdb \
jose
- name: Install brew dependencies
if: ${{ startsWith(matrix.os, 'macos-') }}
run: |
brew install socat
brew install lua
- name: Install VTest
run: |
scripts/build-vtest.sh
- uses: ./.github/actions/setup-vtest
- name: Install SSL ${{ matrix.ssl }}
if: ${{ matrix.ssl && matrix.ssl != 'stock' && steps.cache_ssl.outputs.cache-hit != 'true' }}
run: env ${{ matrix.ssl }} scripts/build-ssl.sh
@ -118,10 +110,19 @@ jobs:
ERR=1 \
TARGET=${{ matrix.TARGET }} \
CC=${{ matrix.CC }} \
DEBUG="-DDEBUG_POOL_INTEGRITY" \
DEBUG="-DDEBUG_POOL_INTEGRITY -DDEBUG_UNIT" \
${{ join(matrix.FLAGS, ' ') }} \
ADDLIB="-Wl,-rpath,/usr/local/lib/ -Wl,-rpath,$HOME/opt/lib/"
sudo make install-bin
- name: Compile admin/halog/halog
run: |
make -j$(nproc) admin/halog/halog \
ERR=1 \
TARGET=${{ matrix.TARGET }} \
CC=${{ matrix.CC }} \
DEBUG="-DDEBUG_POOL_INTEGRITY -DDEBUG_UNIT" \
${{ join(matrix.FLAGS, ' ') }} \
ADDLIB="-Wl,-rpath,/usr/local/lib/ -Wl,-rpath,$HOME/opt/lib/"
sudo make install
- name: Show HAProxy version
id: show-version
run: |
@ -136,45 +137,33 @@ jobs:
echo "::endgroup::"
haproxy -vv
echo "version=$(haproxy -v |awk 'NR==1{print $3}')" >> $GITHUB_OUTPUT
- name: Install problem matcher for VTest
# This allows one to more easily see which tests fail.
run: echo "::add-matcher::.github/vtest.json"
- name: Run VTest for HAProxy ${{ steps.show-version.outputs.version }}
id: vtest
env:
# Force ASAN output into asan.log to make the output more readable.
ASAN_OPTIONS: log_path=asan.log
run: |
# This is required for macOS, which does not actually allow increasing
# the '-n' soft limit to the hard limit, thus failing to run.
ulimit -n 65536
ulimit -c unlimited
make reg-tests VTEST_PROGRAM=../vtest/vtest REGTESTS_TYPES=default,bug,devel
- name: Config syntax check memleak smoke testing
if: ${{ contains(matrix.name, 'ASAN') }}
run: |
./haproxy -dI -f .github/h2spec.config -c
./haproxy -dI -f examples/content-sw-sample.cfg -c
./haproxy -dI -f examples/option-http_proxy.cfg -c
./haproxy -dI -f examples/quick-test.cfg -c
./haproxy -dI -f examples/transparent_proxy.cfg -c
- name: Show VTest results
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
for folder in ${TMPDIR}/haregtests-*/vtc.*; do
for folder in ${TMPDIR:-/tmp}/haregtests-*/vtc.*; do
printf "::group::"
cat $folder/INFO
cat $folder/LOG
echo "::endgroup::"
done
shopt -s nullglob
for asan in asan.log*; do
echo "::group::$asan"
cat $asan
exit 1
- name: Run Unit tests
id: unittests
run: |
make unit-tests
- name: Show Unit-Tests results
if: ${{ failure() && steps.unittests.outcome == 'failure' }}
run: |
for result in ${TMPDIR:-/tmp}/ha-unittests-*/results/res.*; do
printf "::group::"
cat $result
echo "::endgroup::"
done
exit 1
- name: Show coredumps
if: ${{ failure() && steps.vtest.outcome == 'failure' }}
run: |
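
The reworked "Install apt dependencies" step in this file selects packages with expressions of the form `contains(...) && 'pkg' || ''`, so a package is only requested when the matching build flag or SSL setting is present. The same selection can be sketched in plain shell. `packages_for` is a hypothetical helper, and it assumes the flags arrive as one space-separated string:

```shell
#!/bin/sh
# packages_for FLAGS SSL: print the apt packages the step would request.
packages_for() {
    flags=$1
    ssl=$2
    pkgs=""
    # Whole-word match, like contains() on the FLAGS list.
    case " $flags " in *" USE_LUA=1 "*) pkgs="$pkgs liblua5.4-dev" ;; esac
    case " $flags " in *" USE_PCRE2=1 "*) pkgs="$pkgs libpcre2-dev" ;; esac
    # Substring match, like contains() on the ssl string.
    case "$ssl" in *BORINGSSL=yes*) pkgs="$pkgs ninja-build" ;; esac
    # socat, gdb and jose are installed unconditionally.
    echo "$pkgs socat gdb jose" | sed 's/^ *//'
}

packages_for "USE_LUA=1 USE_ZLIB=1" "stock"
packages_for "USE_ZLIB=1" "BORINGSSL=yes"
```

An expression that evaluates to the empty string simply leaves nothing on its line of the `apt-get install` command.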

@ -35,7 +35,7 @@ jobs:
- USE_THREAD=1
- USE_ZLIB=1
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5
- uses: msys2/setup-msys2@v2
with:
install: >-

.github/workflows/wolfssl.yml (new file)

@ -0,0 +1,80 @@
name: WolfSSL
on:
  schedule:
    - cron: "0 0 * * 4"
  workflow_dispatch:
permissions:
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest
    if: ${{ github.repository_owner == 'haproxy' || github.event_name == 'workflow_dispatch' }}
    steps:
      - uses: actions/checkout@v5
      - name: Install apt dependencies
        run: |
          sudo apt-get update -o Acquire::Languages=none -o Acquire::Translation=none
          sudo apt-get --no-install-recommends -y install socat gdb jose
      - name: Install WolfSSL
        run: env WOLFSSL_VERSION=git-master WOLFSSL_DEBUG=1 scripts/build-ssl.sh
      - name: Compile HAProxy
        run: |
          make -j$(nproc) ERR=1 CC=gcc TARGET=linux-glibc \
            USE_OPENSSL_WOLFSSL=1 USE_QUIC=1 \
            SSL_LIB=${HOME}/opt/lib SSL_INC=${HOME}/opt/include \
            DEBUG="-DDEBUG_POOL_INTEGRITY -DDEBUG_UNIT" \
            ADDLIB="-Wl,-rpath,/usr/local/lib/ -Wl,-rpath,$HOME/opt/lib/" \
            ARCH_FLAGS="-ggdb3 -fsanitize=address"
          sudo make install
      - name: Show HAProxy version
        id: show-version
        run: |
          ldd $(which haproxy)
          haproxy -vv
          echo "version=$(haproxy -v |awk 'NR==1{print $3}')" >> $GITHUB_OUTPUT
      - uses: ./.github/actions/setup-vtest
      - name: Run VTest for HAProxy
        id: vtest
        run: |
          make reg-tests VTEST_PROGRAM=../vtest/vtest REGTESTS_TYPES=default,bug,devel
      - name: Run Unit tests
        id: unittests
        run: |
          make unit-tests
      - name: Show VTest results
        if: ${{ failure() && steps.vtest.outcome == 'failure' }}
        run: |
          for folder in ${TMPDIR:-/tmp}/haregtests-*/vtc.*; do
            printf "::group::"
            cat $folder/INFO
            cat $folder/LOG
            echo "::endgroup::"
          done
          exit 1
      - name: Show coredumps
        if: ${{ failure() && steps.vtest.outcome == 'failure' }}
        run: |
          failed=false
          shopt -s nullglob
          for file in /tmp/core.*; do
            failed=true
            printf "::group::"
            gdb -ex 'thread apply all bt full' ./haproxy $file
            echo "::endgroup::"
          done
          if [ "$failed" = true ]; then
            exit 1;
          fi
      - name: Show Unit-Tests results
        if: ${{ failure() && steps.unittests.outcome == 'failure' }}
        run: |
          for result in ${TMPDIR:-/tmp}/ha-unittests-*/results/res.*; do
            printf "::group::"
            cat $result
            echo "::endgroup::"
          done
          exit 1
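
The "Show HAProxy version" step in the workflow above takes the version from
the first line of "haproxy -v" with awk. The sketch below runs the same awk
program against a hard-coded banner. The banner string and the
extract_version name are illustrative assumptions, not captured output or
part of the workflow.

```shell
# Hypothetical first line of "haproxy -v"; the real text depends on the build.
sample_banner='HAProxy version 3.4-dev2 2026/01/16 - https://haproxy.org/'

# Same awk program as in the workflow: print the third field of line 1 only.
extract_version() {
    awk 'NR==1{print $3}'
}

version=$(printf '%s\n' "$sample_banner" | extract_version)
echo "version=$version"
```

In the workflow, the resulting "version=..." line is appended to
$GITHUB_OUTPUT so that it is exposed as an output of the step.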

1
.gitignore

@ -57,3 +57,4 @@ dev/udp/udp-perturb
/src/dlmalloc.c
/tests/test_hashes
doc/lua-api/_build
dev/term_events/term_events

View File

@ -8,7 +8,7 @@ branches:
env:
global:
- FLAGS="USE_LUA=1 USE_OPENSSL=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_SYSTEMD=1 USE_ZLIB=1"
- FLAGS="USE_LUA=1 USE_OPENSSL=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_ZLIB=1"
- TMPDIR=/tmp
addons:

View File

@ -171,7 +171,17 @@ feedback for developers:
as the previous releases that had 6 months to stabilize. In terms of
stability it really means that the point zero version already accumulated
6 months of fixes and that it is much safer to use even just after it is
released.
released. There is one exception though: features marked as "experimental"
are not guaranteed to be maintained beyond the release of the next LTS
branch. The rationale is that the experimental status exists to expose an
early preview of a feature that is often incomplete, not always in its
definitive form regarding configuration, and for which developers are
seeking feedback from users. Changes may even be brought within the stable
branch, and the feature may break as a result. Bug fixes cannot always be
backported very far in this context since the code and configuration may
change quite a bit. Users who want to try experimental features are
expected to upgrade quickly to benefit from the improvements made to that
feature.
- for developers, given that the odd versions are solely used by highly
skilled users, it's easier to get advanced traces and captures, and there

3832
CHANGELOG

File diff suppressed because it is too large

View File

@ -1010,7 +1010,7 @@ you notice you're already practising some of them:
- continue to send pull requests after having been explained why they are not
welcome.
- give wrong advices to people asking for help, or sending them patches to
- give wrong advice to people asking for help, or send them patches to
try which make no sense, waste their time, and give them a bad impression
of the people working on the project.

120
INSTALL

@ -9,7 +9,7 @@ used to follow updates then it is recommended that instead you use the packages
provided by your software vendor or Linux distribution. Most of them are taking
this task seriously and are doing a good job at backporting important fixes.
If for any reason you'd prefer to use a different version than the one packaged
If for any reason you would prefer a different version than the one packaged
for your system, you want to be certain to have all the fixes or to get some
commercial support, other choices are available at http://www.haproxy.com/.
@ -34,18 +34,26 @@ are a few build examples :
- recent Linux system with all options, make and install :
$ make clean
$ make -j $(nproc) TARGET=linux-glibc \
USE_OPENSSL=1 USE_LUA=1 USE_PCRE2=1 USE_SYSTEMD=1
USE_OPENSSL=1 USE_QUIC=1 USE_QUIC_OPENSSL_COMPAT=1 \
USE_LUA=1 USE_PCRE2=1
$ sudo make install
- FreeBSD and OpenBSD, build with all options :
$ gmake -j 4 TARGET=freebsd USE_OPENSSL=1 USE_LUA=1 USE_PCRE2=1
- FreeBSD + OpenSSL, build with all options :
$ gmake -j $(sysctl -n hw.ncpu) TARGET=freebsd \
USE_OPENSSL=1 USE_QUIC=1 USE_QUIC_OPENSSL_COMPAT=1 \
USE_LUA=1 USE_PCRE2=1
- OpenBSD + LibreSSL, build with all options :
$ gmake -j $(sysctl -n hw.ncpu) TARGET=openbsd \
USE_OPENSSL=1 USE_QUIC=1 USE_LUA=1 USE_PCRE2=1
- embedded Linux, build using a cross-compiler :
$ make -j $(nproc) TARGET=linux-glibc USE_OPENSSL=1 USE_PCRE2=1 \
CC=/opt/cross/gcc730-arm/bin/gcc ADDLIB=-latomic
CC=/opt/cross/gcc730-arm/bin/gcc CFLAGS="-mthumb" ADDLIB=-latomic
- Build with static PCRE on Solaris / UltraSPARC :
$ make TARGET=solaris CPU_CFLAGS="-mcpu=v9" USE_STATIC_PCRE2=1
$ make -j $(/usr/sbin/psrinfo -p) TARGET=solaris \
CPU_CFLAGS="-mcpu=v9" USE_STATIC_PCRE2=1
For more advanced build options or if a command above reports an error, please
read the following sections.
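
The build examples above pass a different CPU-count command to "-j" on each
system. As a minimal sketch, the mapping can be written as one shell helper.
The ncpu_cmd name and the single-job fallback are assumptions made for this
illustration; only the three commands come from the examples.

```shell
# Hypothetical helper: print the CPU-count command used in the build
# examples for a given "uname -s" value.
ncpu_cmd() {
    case "$1" in
        Linux)           echo 'nproc' ;;
        FreeBSD|OpenBSD) echo 'sysctl -n hw.ncpu' ;;
        SunOS)           echo '/usr/sbin/psrinfo -p' ;;
        *)               echo 'echo 1' ;;  # assumption: fall back to one job
    esac
}

# Show which command would be used on the current system.
ncpu_cmd "$(uname -s)"
```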
@ -103,20 +111,22 @@ HAProxy requires a working GCC or Clang toolchain and GNU make :
may want to retry with "gmake" which is the name commonly used for GNU make
on BSD systems.
- GCC >= 4.2 (up to 13 tested). Older versions can be made to work with a
few minor adaptations if really needed. Newer versions may sometimes break
due to compiler regressions or behaviour changes. The version shipped with
your operating system is very likely to work with no trouble. Clang >= 3.0
is also known to work as an alternative solution. Recent versions may emit
a bit more warnings that are worth reporting as they may reveal real bugs.
TCC (https://repo.or.cz/tinycc.git) is also usable for developers but will
not support threading and was found at least once to produce bad code in
some rare corner cases (since fixed). But it builds extremely quickly
(typically half a second for the whole project) and is very convenient to
run quick tests during API changes or code refactoring.
- GCC >= 4.7 (up to 15 tested). Older versions are no longer supported due to
the latest mt_list update which only uses C11-like atomics. Newer versions
may sometimes break due to compiler regressions or behaviour changes. The
version shipped with your operating system is very likely to work with no
trouble. Clang >= 3.0 is also known to work as an alternative solution, and
versions up to 19 were successfully tested. Recent versions may emit a bit
more warnings that are worth reporting as they may reveal real bugs. TCC
(https://repo.or.cz/tinycc.git) is also usable for developers but will not
support threading and was found at least once to produce bad code in some
rare corner cases (since fixed). But it builds extremely quickly (typically
half a second for the whole project) and is very convenient to run quick
tests during API changes or code refactoring.
- GNU ld (binutils package), with no particular version. Other linkers might
work but were not tested.
work but were not tested. The default one from your operating system will
normally work.
On debian or Ubuntu systems and their derivatives, you may get all these tools
at once by issuing the two following commands :
@ -227,7 +237,7 @@ to forcefully enable it using "USE_LIBCRYPT=1".
-----------------
For SSL/TLS, it is necessary to use a cryptography library. HAProxy currently
supports the OpenSSL library, and is known to build and work with branches
1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 3.0, 3.1 and 3.2. It is recommended to use
1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, and 3.0 to 3.6. It is recommended to use
at least OpenSSL 1.1.1 to have support for all SSL keywords and configuration
in HAProxy. OpenSSL follows a long-term support cycle similar to HAProxy's,
and each of the branches above receives its own fixes, without forcing you to
@ -244,16 +254,20 @@ https://github.com/openssl/openssl/issues/17627). If a migration to 3.x is
mandated by support reasons, at least 3.1 recovers a small fraction of this
important loss.
Four OpenSSL derivatives called LibreSSL, BoringSSL, QUICTLS, and AWS-LC are
Three OpenSSL derivatives called LibreSSL, QUICTLS, and AWS-LC are
reported to work as well. While there are some efforts from the community to
ensure they work well, OpenSSL remains the primary target and this means that
in case of conflicting choices, OpenSSL support will be favored over other
options. Note that QUIC is not fully supported when haproxy is built with
OpenSSL. In this case, QUICTLS is the preferred alternative. As of writing
this, the QuicTLS project follows OpenSSL very closely and provides update
simultaneously, but being a volunteer-driven project, its long-term future does
not look certain enough to convince operating systems to package it, so it
needs to be build locally. See the section about QUIC in this document.
OpenSSL versions older than 3.5.2. In this case, QUICTLS or AWS-LC are the
preferred alternatives. As of writing this, the QuicTLS project follows
OpenSSL very closely and provides updates simultaneously, but being a
volunteer-driven project, its long-term future does not look certain enough
to convince operating systems to package it, so it needs to be built locally.
Recent versions of AWS-LC (>= 1.22 and the FIPS branches) are pretty complete
and generally more performant than other OpenSSL derivatives, but may behave
slightly differently, particularly when dealing with outdated setups. See
the section about QUIC in this document.
A fifth option is wolfSSL (https://github.com/wolfSSL/wolfssl). It is the only
supported alternative stack not based on OpenSSL, yet which implements almost
@ -312,7 +326,7 @@ command line, for example:
$ make -j $(nproc) TARGET=generic USE_OPENSSL_WOLFSSL=1 USE_QUIC=1 \
SSL_INC=/opt/wolfssl-5.6.6/include SSL_LIB=/opt/wolfssl-5.6.6/lib
To use HAProxy with AWS-LC you must have version v1.13.0 or newer of AWS-LC
To use HAProxy with AWS-LC you must have version v1.22.0 or newer of AWS-LC
built and installed locally.
$ cd ~/build/aws-lc
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/aws-lc
@ -375,10 +389,15 @@ systems, by passing "USE_SLZ=" to the "make" command.
Please note that SLZ will benefit from some CPU-specific instructions like the
availability of the CRC32 extension on some ARM processors. Thus it can further
improve its performance to build with "CPU=native" on the target system, or
"CPU=armv81" (modern systems such as Graviton2 or A55/A75 and beyond),
"CPU=a72" (e.g. for RPi4, or AWS Graviton), "CPU=a53" (e.g. for RPi3), or
"CPU=armv8-auto" (automatic detection with minor runtime penalty).
improve its performance to build with:
- "CPU_CFLAGS=-march=native" on the target system, or
- "CPU_CFLAGS=-march=armv81" (modern systems such as Graviton2 or A55/A75
and beyond)
- "CPU_CFLAGS=-march=a72" (e.g. for RPi4, or AWS Graviton)
- "CPU_CFLAGS=-march=a53" (e.g. for RPi3)
- "CPU_CFLAGS=-march=armv8-auto" (automatic detection with minor runtime
penalty)
A second option involves the widely known zlib library, which is very likely
installed on your system. In order to use zlib, simply pass "USE_ZLIB=1" to the
@ -452,12 +471,6 @@ are the extra libraries that may be referenced at build time :
on Linux. It is automatically detected and may be disabled
using "USE_DL=", though it should never harm.
- USE_SYSTEMD=1 enables support for the sdnotify features of systemd,
allowing better integration with systemd on Linux systems
which come with it. It is never enabled by default so there
is no need to disable it.
4.10) Common errors
-------------------
Some build errors may happen depending on the options combinations or the
@ -481,8 +494,8 @@ target. Common issues may include:
other supported compatible library.
- many "dereferencing pointer 'sa.985' does break strict-aliasing rules"
=> these warnings happen on old compilers (typically gcc-4.4), and may
safely be ignored; newer ones are better on these.
=> these warnings happen on old compilers (typically gcc before 7.x),
and may safely be ignored; newer ones are better on these.
4.11) QUIC
@ -491,10 +504,11 @@ QUIC is the new transport layer protocol and is required for HTTP/3. This
protocol stack is currently supported as an experimental feature in haproxy on
the frontend side. In order to enable it, use "USE_QUIC=1 USE_OPENSSL=1".
Note that QUIC is not fully supported by the OpenSSL library. Indeed QUIC 0-RTT
cannot be supported by OpenSSL contrary to others libraries with full QUIC
support. The preferred option is to use QUICTLS. This is a fork of OpenSSL with
a QUIC-compatible API. Its repository is available at this location:
Note that QUIC is not always fully supported by the OpenSSL library, depending
on its version. Indeed, QUIC 0-RTT cannot be supported by OpenSSL versions
before 3.5, unlike other libraries with full QUIC support. The preferred option
is to use QUICTLS. This is a fork of OpenSSL with a QUIC-compatible API. Its
repository is available at this location:
https://github.com/quictls/openssl
@ -522,14 +536,18 @@ way assuming that wolfSSL was installed in /opt/wolfssl-5.6.0 as shown in 4.5:
SSL_INC=/opt/wolfssl-5.6.0/include SSL_LIB=/opt/wolfssl-5.6.0/lib
LDFLAGS="-Wl,-rpath,/opt/wolfssl-5.6.0/lib"
As last resort, haproxy may be compiled against OpenSSL as follows:
As a last resort, haproxy may be compiled against OpenSSL 3.5 or later, with
0-RTT support, as follows:
$ make TARGET=generic USE_OPENSSL=1 USE_QUIC=1
or as follows for all OpenSSL versions, but without 0-RTT support:
$ make TARGET=generic USE_OPENSSL=1 USE_QUIC=1 USE_QUIC_OPENSSL_COMPAT=1
Note that QUIC 0-RTT is not supported by haproxy QUIC stack when built against
OpenSSL. In addition to this compilation requirements, the QUIC listener
bindings must be explicitly enabled with a specific QUIC tuning parameter.
(see "limited-quic" global parameter of haproxy Configuration Manual).
In addition to these requirements, the QUIC listener bindings must be
explicitly enabled with a specific QUIC tuning parameter (see the
"limited-quic" global parameter in the haproxy Configuration Manual).
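
The choice between the two OpenSSL command lines above can be sketched as a
small shell helper. The quic_openssl_flags name is hypothetical, and the 3.5
threshold is simply the one stated in this section. The helper expects a
version string of the form "major.minor[.patch]".

```shell
# Hypothetical helper: print the make variables for a QUIC build against
# OpenSSL, given a version string such as "3.5.0" or "3.0.13".
quic_openssl_flags() {
    ver=$1
    major=${ver%%.*}      # text before the first dot
    rest=${ver#*.}        # text after the first dot
    minor=${rest%%.*}     # second numeric component
    flags="USE_OPENSSL=1 USE_QUIC=1"
    # Before 3.5 the compatibility layer is required (and 0-RTT is lost).
    if [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 5 ]; }; then
        flags="$flags USE_QUIC_OPENSSL_COMPAT=1"
    fi
    echo "$flags"
}

quic_openssl_flags 3.5.0
quic_openssl_flags 3.0.13
```

The printed variables would then be appended to a command such as
"make TARGET=generic".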
5) How to build HAProxy
@ -545,9 +563,9 @@ It goes into more details with the main options.
To build haproxy, you have to choose your target OS amongst the following ones
and assign it to the TARGET variable :
- linux-glibc for Linux kernel 2.6.28 and above
- linux-glibc for Linux kernel 4.17 and above
- linux-glibc-legacy for Linux kernel 2.6.28 and above without new features
- linux-musl for Linux kernel 2.6.28 and above with musl libc
- linux-musl for Linux kernel 4.17 and above with musl libc
- solaris for Solaris 10 and above
- freebsd for FreeBSD 10 and above
- dragonfly for DragonFlyBSD 4.3 and above
@ -747,8 +765,8 @@ forced to produce final binaries, and must not be used during bisect sessions,
as it will often lead to the wrong commit.
Examples:
# silence strict-aliasing warnings with old gcc-4.4:
$ make -j$(nproc) TARGET=linux-glibc CC=gcc-44 CFLAGS=-fno-strict-aliasing
# silence strict-aliasing warnings with old gcc-5.5:
$ make -j$(nproc) TARGET=linux-glibc CC=gcc-55 CFLAGS=-fno-strict-aliasing
# disable all warning options:
$ make -j$(nproc) TARGET=linux-glibc CC=mycc WARN_CFLAGS= NOWARN_CFLAGS=

View File

@ -138,7 +138,7 @@ ScientiaMobile WURFL Device Detection
Maintainer: Paul Borile, Massimiliano Bellomi <wurfl-haproxy-support@scientiamobile.com>
Files: addons/wurfl, doc/WURFL-device-detection.txt
SPOE (deprecated)
SPOE
Maintainer: Christopher Faulet <cfaulet@haproxy.com>
Files: src/flt_spoe.c, include/haproxy/spoe*.h, doc/SPOE.txt

203
Makefile

@ -35,6 +35,7 @@
# USE_OPENSSL : enable use of OpenSSL. Recommended, but see below.
# USE_OPENSSL_AWSLC : enable use of AWS-LC
# USE_OPENSSL_WOLFSSL : enable use of wolfSSL with the OpenSSL API
# USE_ECH : enable use of ECH with the OpenSSL API
# USE_QUIC : enable use of QUIC with the quictls API (quictls, libressl, boringssl)
# USE_QUIC_OPENSSL_COMPAT : enable use of QUIC with the standard openssl API (limited features)
# USE_ENGINE : enable use of OpenSSL Engine.
@ -56,14 +57,14 @@
# USE_DEVICEATLAS : enable DeviceAtlas api.
# USE_51DEGREES : enable third party device detection library from 51Degrees
# USE_WURFL : enable WURFL detection library from Scientiamobile
# USE_SYSTEMD : enable sd_notify() support.
# USE_OBSOLETE_LINKER : use when the linker fails to emit __start_init/__stop_init
# USE_THREAD_DUMP : use the more advanced thread state dump system. Automatic.
# USE_OT : enable the OpenTracing filter
# USE_MEMORY_PROFILING : enable the memory profiler. Linux-glibc only.
# USE_LIBATOMIC : force to link with/without libatomic. Automatic.
# USE_PTHREAD_EMULATION : replace pthread's rwlocks with ours
# USE_SHM_OPEN : use shm_open() for the startup-logs
# USE_SHM_OPEN : use shm_open() for features that can make use of shared memory
# USE_KTLS : use kTLS (requires at least Linux 4.17).
#
# Options can be forced by specifying "USE_xxx=1" or can be disabled by using
# "USE_xxx=" (empty string). The list of enabled and disabled options for a
@ -135,7 +136,12 @@
# VTEST_PROGRAM : location of the vtest program to run reg-tests.
# DEBUG_USE_ABORT: use abort() for program termination, see include/haproxy/bug.h for details
#### Add -Werror when set to non-empty, and make Makefile stop on warnings.
#### It must be declared before includes because it's used there.
ERR =
include include/make/verbose.mk
include include/make/errors.mk
include include/make/compiler.mk
include include/make/options.mk
@ -159,7 +165,7 @@ TARGET =
CPU =
ifneq ($(CPU),)
ifneq ($(CPU),generic)
$(warning Warning: the "CPU" variable was forced to "$(CPU)" but is no longer \
$(call $(complain),the "CPU" variable was forced to "$(CPU)" but is no longer \
used and will be ignored. For native builds, modern compilers generally \
prefer that the string "-march=native" is passed in CPU_CFLAGS or CFLAGS. \
For other CPU-specific options, please read suggestions in the INSTALL file.)
@ -169,7 +175,7 @@ endif
#### No longer used
ARCH =
ifneq ($(ARCH),)
$(warning Warning: the "ARCH" variable was forced to "$(ARCH)" but is no \
$(call $(complain),the "ARCH" variable was forced to "$(ARCH)" but is no \
longer used and will be ignored. Please check the INSTALL file for other \
options, but usually in order to pass arch-specific options, ARCH_FLAGS, \
CFLAGS or LDFLAGS are preferred.)
@ -187,7 +193,7 @@ OPT_CFLAGS = -O2
#### No longer used
DEBUG_CFLAGS =
ifneq ($(DEBUG_CFLAGS),)
$(warning Warning: DEBUG_CFLAGS was forced to "$(DEBUG_CFLAGS)" but is no \
$(call $(complain),DEBUG_CFLAGS was forced to "$(DEBUG_CFLAGS)" but is no \
longer used and will be ignored. If you have ported this build setting from \
an older version, it is likely that you just want to pass these options \
to the CFLAGS variable. If you are passing some debugging-related options \
@ -195,12 +201,10 @@ $(warning Warning: DEBUG_CFLAGS was forced to "$(DEBUG_CFLAGS)" but is no \
both the compilation and linking stages.)
endif
#### Add -Werror when set to non-empty
ERR =
#### May be used to force running a specific set of reg-tests
REG_TEST_FILES =
REG_TEST_SCRIPT=./scripts/run-regtests.sh
UNIT_TEST_SCRIPT=./scripts/run-unittests.sh
#### Standard C definition
# Compiler-specific flags that may be used to set the standard behavior we
@ -210,7 +214,8 @@ REG_TEST_SCRIPT=./scripts/run-regtests.sh
# undefined behavior to silently produce invalid code. For this reason we have
# to use -fwrapv or -fno-strict-overflow to guarantee the intended behavior.
# It is preferable not to change this option in order to avoid breakage.
STD_CFLAGS := $(call cc-opt-alt,-fwrapv,-fno-strict-overflow)
STD_CFLAGS := $(call cc-opt-alt,-fwrapv,-fno-strict-overflow) \
$(call cc-opt,-fvect-cost-model=very-cheap)
#### Compiler-specific flags to enable certain classes of warnings.
# Some are hard-coded, others are enabled only if supported.
@ -247,7 +252,7 @@ endif
#### No longer used
SMALL_OPTS =
ifneq ($(SMALL_OPTS),)
$(warning Warning: SMALL_OPTS was forced to "$(SMALL_OPTS)" but is no longer \
$(call $(complain),SMALL_OPTS was forced to "$(SMALL_OPTS)" but is no longer \
used and will be ignored. Please check if this setting is still relevant, \
and move it either to DEFINE or to CFLAGS instead.)
endif
@ -260,8 +265,9 @@ endif
# without appearing here. Currently defined DEBUG macros include DEBUG_FULL,
# DEBUG_MEM_STATS, DEBUG_DONT_SHARE_POOLS, DEBUG_FD, DEBUG_POOL_INTEGRITY,
# DEBUG_NO_POOLS, DEBUG_FAIL_ALLOC, DEBUG_STRICT_ACTION=[0-3], DEBUG_HPACK,
# DEBUG_AUTH, DEBUG_SPOE, DEBUG_UAF, DEBUG_THREAD, DEBUG_STRICT, DEBUG_DEV,
# DEBUG_TASK, DEBUG_MEMORY_POOLS, DEBUG_POOL_TRACING, DEBUG_QPACK, DEBUG_LIST.
# DEBUG_AUTH, DEBUG_SPOE, DEBUG_UAF, DEBUG_THREAD=0-2, DEBUG_STRICT, DEBUG_DEV,
# DEBUG_TASK, DEBUG_MEMORY_POOLS, DEBUG_POOL_TRACING, DEBUG_QPACK, DEBUG_LIST,
# DEBUG_COUNTERS=[0-2], DEBUG_STRESS, DEBUG_UNIT.
DEBUG =
#### Trace options
@ -336,14 +342,16 @@ use_opts = USE_EPOLL USE_KQUEUE USE_NETFILTER USE_POLL \
USE_TPROXY USE_LINUX_TPROXY USE_LINUX_CAP \
USE_LINUX_SPLICE USE_LIBCRYPT USE_CRYPT_H USE_ENGINE \
USE_GETADDRINFO USE_OPENSSL USE_OPENSSL_WOLFSSL USE_OPENSSL_AWSLC \
USE_ECH \
USE_SSL USE_LUA USE_ACCEPT4 USE_CLOSEFROM USE_ZLIB USE_SLZ \
USE_CPU_AFFINITY USE_TFO USE_NS USE_DL USE_RT USE_LIBATOMIC \
USE_MATH USE_DEVICEATLAS USE_51DEGREES \
USE_WURFL USE_SYSTEMD USE_OBSOLETE_LINKER USE_PRCTL USE_PROCCTL \
USE_WURFL USE_OBSOLETE_LINKER USE_PRCTL USE_PROCCTL \
USE_THREAD_DUMP USE_EVPORTS USE_OT USE_QUIC USE_PROMEX \
USE_MEMORY_PROFILING USE_SHM_OPEN \
USE_STATIC_PCRE USE_STATIC_PCRE2 \
USE_PCRE USE_PCRE_JIT USE_PCRE2 USE_PCRE2_JIT USE_QUIC_OPENSSL_COMPAT
USE_PCRE USE_PCRE_JIT USE_PCRE2 USE_PCRE2_JIT \
USE_QUIC_OPENSSL_COMPAT USE_KTLS
# preset all variables for all supported build options among use_opts
$(reset_opts_vars)
@ -374,13 +382,13 @@ ifeq ($(TARGET),haiku)
set_target_defaults = $(call default_opts,USE_POLL USE_TPROXY USE_OBSOLETE_LINKER)
endif
# For linux >= 2.6.28 and glibc
# For linux >= 4.17 and glibc
ifeq ($(TARGET),linux-glibc)
set_target_defaults = $(call default_opts, \
USE_POLL USE_TPROXY USE_LIBCRYPT USE_DL USE_RT USE_CRYPT_H USE_NETFILTER \
USE_CPU_AFFINITY USE_THREAD USE_EPOLL USE_LINUX_TPROXY USE_LINUX_CAP \
USE_ACCEPT4 USE_LINUX_SPLICE USE_PRCTL USE_THREAD_DUMP USE_NS USE_TFO \
USE_GETADDRINFO USE_BACKTRACE USE_SHM_OPEN USE_SYSTEMD)
USE_GETADDRINFO USE_BACKTRACE USE_SHM_OPEN USE_KTLS)
INSTALL = install -v
endif
@ -393,13 +401,13 @@ ifeq ($(TARGET),linux-glibc-legacy)
INSTALL = install -v
endif
# For linux >= 2.6.28 and musl
# For linux >= 4.17 and musl
ifeq ($(TARGET),linux-musl)
set_target_defaults = $(call default_opts, \
USE_POLL USE_TPROXY USE_LIBCRYPT USE_DL USE_RT USE_CRYPT_H USE_NETFILTER \
USE_CPU_AFFINITY USE_THREAD USE_EPOLL USE_LINUX_TPROXY USE_LINUX_CAP \
USE_ACCEPT4 USE_LINUX_SPLICE USE_PRCTL USE_THREAD_DUMP USE_NS USE_TFO \
USE_GETADDRINFO USE_SHM_OPEN)
USE_GETADDRINFO USE_BACKTRACE USE_SHM_OPEN USE_KTLS)
INSTALL = install -v
endif
@ -416,7 +424,7 @@ endif
ifeq ($(TARGET),freebsd)
set_target_defaults = $(call default_opts, \
USE_POLL USE_TPROXY USE_LIBCRYPT USE_THREAD USE_CPU_AFFINITY USE_KQUEUE \
USE_ACCEPT4 USE_CLOSEFROM USE_GETADDRINFO USE_PROCCTL USE_SHM_OPEN)
USE_ACCEPT4 USE_CLOSEFROM USE_GETADDRINFO USE_PROCCTL)
endif
# kFreeBSD glibc
@ -590,10 +598,16 @@ endif
ifneq ($(USE_BACKTRACE:0=),)
BACKTRACE_LDFLAGS = -Wl,$(if $(EXPORT_SYMBOL),$(EXPORT_SYMBOL),--export-dynamic)
BACKTRACE_CFLAGS = -fno-omit-frame-pointer
endif
ifneq ($(USE_MEMORY_PROFILING:0=),)
MEMORY_PROFILING_CFLAGS = -fno-optimize-sibling-calls
endif
ifneq ($(USE_CPU_AFFINITY:0=),)
OPTIONS_OBJS += src/cpuset.o
OPTIONS_OBJS += src/cpu_topo.o
endif
# OpenSSL is packaged in various forms and with various dependencies.
@ -626,7 +640,10 @@ ifneq ($(USE_OPENSSL:0=),)
SSL_LDFLAGS := $(if $(SSL_LIB),-L$(SSL_LIB)) -lssl -lcrypto
endif
USE_SSL := $(if $(USE_SSL:0=),$(USE_SSL:0=),implicit)
OPTIONS_OBJS += src/ssl_sock.o src/ssl_ckch.o src/ssl_sample.o src/ssl_crtlist.o src/cfgparse-ssl.o src/ssl_utils.o src/jwt.o src/ssl_ocsp.o src/ssl_gencert.o
OPTIONS_OBJS += src/ssl_sock.o src/ssl_ckch.o src/ssl_ocsp.o src/ssl_crtlist.o \
src/ssl_sample.o src/cfgparse-ssl.o src/ssl_gencert.o \
src/ssl_utils.o src/jwt.o src/ssl_clienthello.o src/jws.o src/acme.o \
src/ssl_trace.o src/jwe.o
endif
ifneq ($(USE_ENGINE:0=),)
@ -638,17 +655,22 @@ ifneq ($(USE_ENGINE:0=),)
endif
ifneq ($(USE_QUIC:0=),)
OPTIONS_OBJS += src/quic_conn.o src/mux_quic.o src/h3.o src/xprt_quic.o \
src/quic_frame.o src/quic_tls.o src/quic_tp.o \
src/quic_stats.o src/quic_sock.o src/proto_quic.o \
src/qmux_trace.o src/quic_loss.o src/qpack-enc.o \
src/quic_cc_newreno.o src/quic_cc_cubic.o src/qpack-tbl.o \
src/qpack-dec.o src/hq_interop.o src/quic_stream.o \
src/h3_stats.o src/qmux_http.o src/cfgparse-quic.o \
src/cbuf.o src/quic_cc.o src/quic_cc_nocc.o src/quic_ack.o \
src/quic_trace.o src/quic_cli.o src/quic_ssl.o \
src/quic_rx.o src/quic_tx.o src/quic_cid.o src/quic_retry.o\
src/quic_retransmit.o src/quic_fctl.o
OPTIONS_OBJS += src/mux_quic.o src/h3.o src/quic_rx.o src/quic_tx.o \
src/quic_conn.o src/quic_frame.o src/quic_sock.o \
src/quic_tls.o src/quic_ssl.o src/proto_quic.o \
src/quic_cli.o src/quic_trace.o src/quic_tp.o \
src/quic_cid.o src/quic_stream.o \
src/quic_retransmit.o src/quic_loss.o \
src/hq_interop.o src/quic_cc_cubic.o \
src/quic_cc_bbr.o src/quic_retry.o \
src/cfgparse-quic.o src/xprt_quic.o src/quic_token.o \
src/quic_ack.o src/qpack-dec.o src/quic_cc_newreno.o \
src/qmux_http.o src/qmux_trace.o src/quic_rules.o \
src/quic_cc_nocc.o src/quic_cc.o src/quic_pacing.o \
src/h3_stats.o src/quic_stats.o src/qpack-enc.o \
src/qpack-tbl.o src/quic_cc_drs.o src/quic_fctl.o \
src/quic_enc.o
endif
ifneq ($(USE_QUIC_OPENSSL_COMPAT:0=),)
@ -760,10 +782,6 @@ ifneq ($(USE_WURFL:0=),)
WURFL_LDFLAGS = $(if $(WURFL_LIB),-L$(WURFL_LIB)) -lwurfl
endif
ifneq ($(USE_SYSTEMD:0=),)
OPTIONS_OBJS += src/systemd.o
endif
ifneq ($(USE_PCRE:0=)$(USE_STATIC_PCRE:0=)$(USE_PCRE_JIT:0=),)
ifneq ($(USE_PCRE2:0=)$(USE_STATIC_PCRE2:0=)$(USE_PCRE2_JIT:0=),)
$(error cannot compile both PCRE and PCRE2 support)
@ -933,7 +951,7 @@ all:
@echo
@exit 1
else
all: haproxy dev/flags/flags $(EXTRA)
all: dev/flags/flags haproxy $(EXTRA)
endif # obsolete targets
endif # TARGET
@ -943,40 +961,48 @@ ifneq ($(EXTRA_OBJS),)
OBJS += $(EXTRA_OBJS)
endif
OBJS += src/mux_h2.o src/mux_fcgi.o src/mux_h1.o src/tcpcheck.o \
src/stream.o src/stats.o src/http_ana.o src/server.o \
src/stick_table.o src/sample.o src/flt_spoe.o src/tools.o \
src/log.o src/cfgparse.o src/peers.o src/backend.o src/resolvers.o \
src/cli.o src/connection.o src/proxy.o src/http_htx.o \
src/cfgparse-listen.o src/pattern.o src/check.o src/haproxy.o \
src/cache.o src/stconn.o src/http_act.o src/http_fetch.o \
src/http_client.o src/listener.o src/dns.o src/vars.o src/debug.o \
src/tcp_rules.o src/sink.o src/h1_htx.o src/task.o src/mjson.o \
src/h2.o src/filters.o src/server_state.o src/payload.o \
src/fcgi-app.o src/map.o src/htx.o src/h1.o src/pool.o src/dns_ring.o \
src/cfgparse-global.o src/trace.o src/tcp_sample.o src/http_ext.o \
src/flt_http_comp.o src/mux_pt.o src/flt_trace.o src/mqtt.o \
src/acl.o src/sock.o src/mworker.o src/tcp_act.o src/ring.o \
src/session.o src/proto_tcp.o src/fd.o src/channel.o src/activity.o \
src/queue.o src/lb_fas.o src/http_rules.o src/extcheck.o \
src/flt_bwlim.o src/thread.o src/http.o src/lb_chash.o src/applet.o \
src/compression.o src/raw_sock.o src/ncbuf.o src/frontend.o \
src/errors.o src/uri_normalizer.o src/http_conv.o src/lb_fwrr.o \
src/sha1.o src/proto_sockpair.o src/mailers.o src/lb_fwlc.o \
src/ebmbtree.o src/cfgcond.o src/action.o src/xprt_handshake.o \
src/protocol.o src/proto_uxst.o src/proto_udp.o src/lb_map.o \
src/fix.o src/ev_select.o src/arg.o src/sock_inet.o src/event_hdl.o \
src/mworker-prog.o src/hpack-dec.o src/cfgparse-tcp.o src/lb_ss.o \
src/sock_unix.o src/shctx.o src/proto_uxdg.o src/fcgi.o \
src/eb64tree.o src/clock.o src/chunk.o src/cfgdiag.o src/signal.o \
src/regex.o src/lru.o src/eb32tree.o src/eb32sctree.o \
src/cfgparse-unix.o src/hpack-tbl.o src/ebsttree.o src/ebimtree.o \
src/base64.o src/auth.o src/uri_auth.o src/time.o src/ebistree.o \
src/dynbuf.o src/wdt.o src/pipe.o src/init.o src/http_acl.o \
src/hpack-huff.o src/hpack-enc.o src/dict.o src/freq_ctr.o \
src/ebtree.o src/hash.o src/dgram.o src/version.o src/proto_rhttp.o \
src/guid.o src/stats-html.o src/stats-json.o src/stats-file.o \
src/stats-proxy.o
OBJS += src/mux_h2.o src/mux_h1.o src/mux_fcgi.o src/log.o \
src/server.o src/stream.o src/tcpcheck.o src/http_ana.o \
src/stick_table.o src/tools.o src/mux_spop.o src/sample.o \
src/activity.o src/cfgparse.o src/peers.o src/cli.o \
src/backend.o src/connection.o src/resolvers.o src/proxy.o \
src/cache.o src/stconn.o src/http_htx.o src/debug.o \
src/check.o src/stats-html.o src/haproxy.o src/listener.o \
src/applet.o src/pattern.o src/cfgparse-listen.o \
src/flt_spoe.o src/cebis_tree.o src/http_ext.o \
src/http_act.o src/http_fetch.o src/cebs_tree.o \
src/cebib_tree.o src/http_client.o src/dns.o \
src/cebb_tree.o src/vars.o src/event_hdl.o src/tcp_rules.o \
src/trace.o src/stats-proxy.o src/pool.o src/stats.o \
src/cfgparse-global.o src/filters.o src/mux_pt.o \
src/flt_http_comp.o src/sock.o src/h1.o src/sink.o \
src/ceba_tree.o src/session.o src/payload.o src/htx.o \
src/cebl_tree.o src/ceb32_tree.o src/ceb64_tree.o \
src/server_state.o src/proto_rhttp.o src/flt_trace.o src/fd.o \
src/task.o src/map.o src/fcgi-app.o src/h2.o src/mworker.o \
src/tcp_sample.o src/mjson.o src/h1_htx.o src/tcp_act.o \
src/ring.o src/flt_bwlim.o src/acl.o src/thread.o src/queue.o \
src/http_rules.o src/http.o src/channel.o src/proto_tcp.o \
src/mqtt.o src/lb_chash.o src/extcheck.o src/dns_ring.o \
src/errors.o src/ncbuf.o src/compression.o src/http_conv.o \
src/frontend.o src/stats-json.o src/proto_sockpair.o \
src/raw_sock.o src/action.o src/stats-file.o src/buf.o \
src/xprt_handshake.o src/proto_uxst.o src/lb_fwrr.o \
src/uri_normalizer.o src/mailers.o src/protocol.o \
src/cfgcond.o src/proto_udp.o src/lb_fwlc.o src/ebmbtree.o \
src/proto_uxdg.o src/cfgdiag.o src/sock_unix.o src/sha1.o \
src/lb_fas.o src/clock.o src/sock_inet.o src/ev_select.o \
src/lb_map.o src/shctx.o src/hpack-dec.o src/net_helper.o \
src/arg.o src/signal.o src/fix.o src/dynbuf.o src/guid.o \
src/cfgparse-tcp.o src/lb_ss.o src/chunk.o src/counters.o \
src/cfgparse-unix.o src/regex.o src/fcgi.o src/uri_auth.o \
src/eb64tree.o src/eb32tree.o src/eb32sctree.o src/lru.o \
src/limits.o src/ebimtree.o src/wdt.o src/hpack-tbl.o \
src/ebistree.o src/base64.o src/auth.o src/time.o \
src/ebsttree.o src/freq_ctr.o src/systemd.o src/init.o \
src/http_acl.o src/dict.o src/dgram.o src/pipe.o \
src/hpack-huff.o src/hpack-enc.o src/ebtree.o src/hash.o \
src/httpclient_cli.o src/version.o src/ncbmbuf.o src/ech.o
ifneq ($(TRACE),)
OBJS += src/calltrace.o
@ -1011,8 +1037,9 @@ help:
# TARGET variable is not set since we're not building, by definition.
IGNORE_OPTS=help install install-man install-doc install-bin \
uninstall clean tags cscope tar git-tar version update-version \
opts reg-tests reg-tests-help admin/halog/halog dev/flags/flags \
dev/haring/haring dev/poll/poll dev/tcploop/tcploop
opts reg-tests reg-tests-help unit-tests admin/halog/halog dev/flags/flags \
dev/haring/haring dev/ncpu/ncpu dev/poll/poll dev/tcploop/tcploop \
dev/term_events/term_events
ifneq ($(TARGET),)
ifeq ($(filter $(firstword $(MAKECMDGOALS)),$(IGNORE_OPTS)),)
@ -1049,6 +1076,9 @@ dev/haring/haring: dev/haring/haring.o
dev/hpack/%: dev/hpack/%.o
$(cmd_LD) $(ARCH_FLAGS) $(LDFLAGS) -o $@ $^ $(LDOPTS)
dev/ncpu/ncpu:
$(cmd_MAKE) -C dev/ncpu ncpu V='$(V)'
dev/poll/poll:
$(cmd_MAKE) -C dev/poll poll CC='$(CC)' OPTIMIZE='$(COPTS)' V='$(V)'
@ -1061,13 +1091,16 @@ dev/tcploop/tcploop:
dev/udp/udp-perturb: dev/udp/udp-perturb.o
$(cmd_LD) $(ARCH_FLAGS) $(LDFLAGS) -o $@ $^ $(LDOPTS)
dev/term_events/term_events: dev/term_events/term_events.o
$(cmd_LD) $(ARCH_FLAGS) $(LDFLAGS) -o $@ $^ $(LDOPTS)
# rebuild it every time
.PHONY: src/version.c dev/poll/poll dev/tcploop/tcploop
.PHONY: src/version.c dev/ncpu/ncpu dev/poll/poll dev/tcploop/tcploop
src/calltrace.o: src/calltrace.c $(DEP)
$(cmd_CC) $(TRACE_COPTS) -c -o $@ $<
src/haproxy.o: src/haproxy.c $(DEP)
src/version.o: src/version.c $(DEP)
$(cmd_CC) $(COPTS) \
-DBUILD_TARGET='"$(strip $(TARGET))"' \
-DBUILD_CC='"$(strip $(CC))"' \
@ -1090,6 +1123,11 @@ install-doc:
$(INSTALL) -m 644 doc/$$x.txt "$(DESTDIR)$(DOCDIR)" ; \
done
install-admin:
$(Q)$(INSTALL) -d "$(DESTDIR)$(SBINDIR)"
$(Q)$(INSTALL) admin/cli/haproxy-dump-certs "$(DESTDIR)$(SBINDIR)"
$(Q)$(INSTALL) admin/cli/haproxy-reload "$(DESTDIR)$(SBINDIR)"
install-bin:
$(Q)for i in haproxy $(EXTRA); do \
if ! [ -e "$$i" ]; then \
@ -1100,7 +1138,7 @@ install-bin:
$(Q)$(INSTALL) -d "$(DESTDIR)$(SBINDIR)"
$(Q)$(INSTALL) haproxy $(EXTRA) "$(DESTDIR)$(SBINDIR)"
install: install-bin install-man install-doc
install: install-bin install-admin install-man install-doc
uninstall:
$(Q)rm -f "$(DESTDIR)$(MANDIR)"/man1/haproxy.1
@ -1122,10 +1160,13 @@ clean:
$(Q)rm -f addons/ot/src/*.[oas]
$(Q)rm -f addons/wurfl/*.[oas] addons/wurfl/dummy/*.[oas]
$(Q)rm -f admin/*/*.[oas] admin/*/*/*.[oas]
$(Q)rm -f dev/*/*.[oas]
$(Q)rm -f dev/flags/flags
distclean: clean
$(Q)rm -f admin/iprange/iprange admin/iprange/ip6range admin/halog/halog
$(Q)rm -f admin/dyncookie/dyncookie
$(Q)rm -f dev/*/*.[oas]
$(Q)rm -f dev/flags/flags dev/haring/haring dev/poll/poll dev/tcploop/tcploop
$(Q)rm -f dev/haring/haring dev/ncpu/ncpu{,.so} dev/poll/poll dev/tcploop/tcploop
$(Q)rm -f dev/hpack/decode dev/hpack/gen-enc dev/hpack/gen-rht
$(Q)rm -f dev/qpack/decode
@ -1245,10 +1286,17 @@ reg-tests-help:
.PHONY: reg-tests reg-tests-help
unit-tests:
$(Q)$(UNIT_TEST_SCRIPT)
.PHONY: unit-tests
# "make range" iteratively builds using "make all" and the exact same build
# options for all commits within RANGE. RANGE may be either a git range
# such as ref1..ref2 or a single commit, in which case all commits from
# the master branch to this one will be tested.
# Will execute TEST_CMD for each commit if defined, and will stop in case of
# failure.
range:
$(Q)[ -d .git/. ] || { echo "## Fatal: \"make $@\" may only be used inside a Git repository."; exit 1; }
@ -1274,6 +1322,7 @@ range:
echo "[ $$index/$$count ] $$commit #############################"; \
git checkout -q $$commit || die 1; \
$(MAKE) all || die 1; \
[ -z "$(TEST_CMD)" ] || $(TEST_CMD) || die 1; \
index=$$((index + 1)); \
done; \
echo;echo "Done! $${count} commit(s) built successfully for RANGE $${RANGE}" ; \

README

@ -1,22 +0,0 @@
The HAProxy documentation has been split into a number of different files for
ease of use.
Please refer to the following files depending on what you're looking for :
- INSTALL for instructions on how to build and install HAProxy
- BRANCHES to understand the project's life cycle and what version to use
- LICENSE for the project's license
- CONTRIBUTING for the process to follow to submit contributions
The more detailed documentation is located into the doc/ directory :
- doc/intro.txt for a quick introduction on HAProxy
- doc/configuration.txt for the configuration's reference manual
- doc/lua.txt for the Lua's reference manual
- doc/SPOE.txt for how to use the SPOE engine
- doc/network-namespaces.txt for how to use network namespaces under Linux
- doc/management.txt for the management guide
- doc/regression-testing.txt for how to use the regression testing suite
- doc/peers.txt for the peers protocol reference
- doc/coding-style.txt for how to adopt HAProxy's coding style
- doc/internals for developer-specific documentation (not all up to date)

README.md Normal file

@ -0,0 +1,62 @@
# HAProxy
[![alpine/musl](https://github.com/haproxy/haproxy/actions/workflows/musl.yml/badge.svg)](https://github.com/haproxy/haproxy/actions/workflows/musl.yml)
[![AWS-LC](https://github.com/haproxy/haproxy/actions/workflows/aws-lc.yml/badge.svg)](https://github.com/haproxy/haproxy/actions/workflows/aws-lc.yml)
[![openssl no-deprecated](https://github.com/haproxy/haproxy/actions/workflows/openssl-nodeprecated.yml/badge.svg)](https://github.com/haproxy/haproxy/actions/workflows/openssl-nodeprecated.yml)
[![Illumos](https://github.com/haproxy/haproxy/actions/workflows/illumos.yml/badge.svg)](https://github.com/haproxy/haproxy/actions/workflows/illumos.yml)
[![NetBSD](https://github.com/haproxy/haproxy/actions/workflows/netbsd.yml/badge.svg)](https://github.com/haproxy/haproxy/actions/workflows/netbsd.yml)
[![FreeBSD](https://api.cirrus-ci.com/github/haproxy/haproxy.svg?task=FreeBSD)](https://cirrus-ci.com/github/haproxy/haproxy/)
[![VTest](https://github.com/haproxy/haproxy/actions/workflows/vtest.yml/badge.svg)](https://github.com/haproxy/haproxy/actions/workflows/vtest.yml)
![HAProxy logo](doc/HAProxyCommunityEdition_60px.png)
HAProxy is a free, very fast and reliable reverse-proxy offering high availability, load balancing, and proxying for TCP
and HTTP-based applications.
## Installation
The [INSTALL](INSTALL) file describes how to build HAProxy.
A [list of packages](https://github.com/haproxy/wiki/wiki/Packages) is also available on the wiki.
## Getting help
The [discourse](https://discourse.haproxy.org/) and the [mailing-list](https://www.mail-archive.com/haproxy@formilux.org/)
are available for questions or configuration assistance. You can also use the [slack](https://slack.haproxy.org/) or
[IRC](irc://irc.libera.chat/%23haproxy) channel. Please don't use the issue tracker for these.
The [issue tracker](https://github.com/haproxy/haproxy/issues/) is only for bug reports or feature requests.
## Documentation
The HAProxy documentation has been split into a number of different files for
ease of use. It is available in text format as well as HTML. The wiki is also meant to replace the old architecture
guide.
- [HTML documentation](http://docs.haproxy.org/)
- [HTML HAProxy Lua API Documentation](https://www.arpalert.org/haproxy-api.html)
- [Wiki](https://github.com/haproxy/wiki/wiki)
Please refer to the following files depending on what you're looking for:
- [INSTALL](INSTALL) for instructions on how to build and install HAProxy
- [BRANCHES](BRANCHES) to understand the project's life cycle and what version to use
- [LICENSE](LICENSE) for the project's license
- [CONTRIBUTING](CONTRIBUTING) for the process to follow to submit contributions
The more detailed documentation is located in the doc/ directory:
- [ doc/intro.txt ](doc/intro.txt) for a quick introduction on HAProxy
- [ doc/configuration.txt ](doc/configuration.txt) for the configuration's reference manual
- [ doc/lua.txt ](doc/lua.txt) for the Lua reference manual
- [ doc/SPOE.txt ](doc/SPOE.txt) for how to use the SPOE engine
- [ doc/network-namespaces.txt ](doc/network-namespaces.txt) for how to use network namespaces under Linux
- [ doc/management.txt ](doc/management.txt) for the management guide
- [ doc/regression-testing.txt ](doc/regression-testing.txt) for how to use the regression testing suite
- [ doc/peers.txt ](doc/peers.txt) for the peers protocol reference
- [ doc/coding-style.txt ](doc/coding-style.txt) for how to adopt HAProxy's coding style
- [ doc/internals ](doc/internals) for developer-specific documentation (not all up to date)
## License
HAProxy is licensed under [GPL 2](doc/gpl.txt) or any later version, and the headers under [LGPL 2.1](doc/lgpl.txt). See the
[LICENSE](LICENSE) file for a more detailed explanation.


@ -1,2 +1,2 @@
$Format:%ci$
2024/05/04
2026/01/07


@ -1 +1 @@
3.0-dev10
3.4-dev2


@ -5,7 +5,8 @@ CXX := c++
CXXLIB := -lstdc++
ifeq ($(DEVICEATLAS_SRC),)
OPTIONS_LDFLAGS += -lda
OPTIONS_CFLAGS += -I$(DEVICEATLAS_INC)
OPTIONS_LDFLAGS += -Wl,-rpath,$(DEVICEATLAS_LIB) -L$(DEVICEATLAS_LIB) -lda
else
DEVICEATLAS_INC = $(DEVICEATLAS_SRC)
DEVICEATLAS_LIB = $(DEVICEATLAS_SRC)


@ -212,7 +212,7 @@ da_status_t da_atlas_compile(void *ctx, da_read_fn readfn, da_setpos_fn setposfn
* da_getpropid on the atlas, and if generated by the search, the ID will be consistent across
* different calls to search.
* Properties added by a search that are neither in the compiled atlas, nor in the extra_props list
* Are assigned an ID within the context that is not transferrable through different search results
* Are assigned an ID within the context that is not transferable through different search results
* within the same atlas.
* @param atlas Atlas instance
* @param extra_props properties


@ -47,6 +47,12 @@ via the OpenTracing API with OpenTracing compatible servers (tracers).
Currently, tracers that support this API include Datadog, Jaeger, LightStep
and Zipkin.
Note: The OpenTracing filter shouldn't be used for new designs as OpenTracing
itself is no longer maintained nor supported by its authors. A
replacement filter based on OpenTelemetry is currently under development
and is expected to be ready around HAProxy 3.2. As such OpenTracing will
be deprecated in 3.3 and removed in 3.5.
The OT filter was primarily tested with the Jaeger tracer, while configurations
for both Datadog and Zipkin tracers were also set in the test directory.


@ -718,7 +718,7 @@ static void flt_ot_check_timeouts(struct stream *s, struct filter *f)
if (flt_ot_is_disabled(f FLT_OT_DBG_ARGS(, -1)))
FLT_OT_RETURN();
s->pending_events |= TASK_WOKEN_MSG;
s->pending_events |= STRM_EVT_MSG;
flt_ot_return_void(f, &err);


@ -1074,8 +1074,9 @@ static int flt_ot_post_parse_cfg_scope(void)
*/
static int flt_ot_parse_cfg(struct flt_ot_conf *conf, const char *flt_name, char **err)
{
struct list backup_sections;
int retval = ERR_ABORT | ERR_ALERT;
struct list backup_sections;
struct cfgfile cfg_file = {0};
int retval = ERR_ABORT | ERR_ALERT;
FLT_OT_FUNC("%p, \"%s\", %p:%p", conf, flt_name, FLT_OT_DPTR_ARGS(err));
@ -1094,8 +1095,16 @@ static int flt_ot_parse_cfg(struct flt_ot_conf *conf, const char *flt_name, char
/* Do nothing. */;
else if (access(conf->cfg_file, R_OK) == -1)
FLT_OT_PARSE_ERR(err, "'%s' : %s", conf->cfg_file, strerror(errno));
else
retval = readcfgfile(conf->cfg_file);
else {
cfg_file.filename = conf->cfg_file;
cfg_file.size = load_cfg_in_mem(cfg_file.filename, &cfg_file.content);
if (cfg_file.size < 0) {
ha_free(&cfg_file.content);
FLT_OT_RETURN_INT(retval);
}
retval = parse_cfg(&cfg_file);
ha_free(&cfg_file.content);
}
/* Unregister OT sections and restore previous sections. */
cfg_unregister_sections();
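
The new else-branch in this hunk follows a load, parse, free sequence: the file is read into memory, a negative size aborts early after releasing the buffer, and the buffer is released again once parsing is done. A minimal standalone sketch of that shape is given below. The helpers `demo_load_in_mem()`, `demo_count_lines()` and `demo_parse_file()` are hypothetical stand-ins written for illustration only; they do not reproduce the behavior of HAProxy's `load_cfg_in_mem()` or `parse_cfg()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical loader: reads the whole file at <path> into a newly allocated,
 * NUL-terminated buffer stored in *out. Returns the file size, or -1 on error
 * (in which case *out is NULL). */
static long demo_load_in_mem(const char *path, char **out)
{
	FILE *f;
	long size;
	char *buf;

	assert(path != NULL && out != NULL);
	*out = NULL;

	f = fopen(path, "rb");
	if (!f)
		return -1;

	if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
	    fseek(f, 0, SEEK_SET) != 0) {
		fclose(f);
		return -1;
	}

	buf = malloc((size_t)size + 1);
	if (!buf) {
		fclose(f);
		return -1;
	}

	if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
		free(buf);
		fclose(f);
		return -1;
	}
	buf[size] = '\0';
	fclose(f);

	*out = buf;
	return size;
}

/* Hypothetical "parser": only counts newline characters in the buffer. */
static int demo_count_lines(const char *content, long size)
{
	int lines = 0;
	long i;

	for (i = 0; i < size; i++)
		if (content[i] == '\n')
			lines++;
	return lines;
}

/* Load, parse, free: the buffer is released on both the error path and the
 * success path, so the caller never owns it. Returns -1 on load failure. */
static int demo_parse_file(const char *path)
{
	char *content = NULL;
	long size = demo_load_in_mem(path, &content);
	int ret;

	if (size < 0) {
		free(content); /* content is NULL here; free(NULL) is a no-op */
		return -1;
	}

	ret = demo_count_lines(content, size);
	free(content);
	return ret;
}
```

Keeping the buffer's lifetime inside one function means a failed load and a completed parse both leave nothing for the caller to clean up, which is the property the hunk preserves with its two `ha_free()` calls.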


@ -39,14 +39,21 @@
*/
static void flt_ot_vars_scope_dump(struct vars *vars, const char *scope)
{
const struct var *var;
int i;
if (vars == NULL)
return;
vars_rdlock(vars);
list_for_each_entry(var, &(vars->head), l)
FLT_OT_DBG(2, "'%s.%016" PRIx64 "' -> '%.*s'", scope, var->name_hash, (int)b_data(&(var->data.u.str)), b_orig(&(var->data.u.str)));
for (i = 0; i < VAR_NAME_ROOTS; i++) {
struct ceb_node *node = cebu64_first(&(vars->name_root[i]));
for ( ; node != NULL; node = cebu64_next(&(vars->name_root[i]), node)) {
struct var *var = container_of(node, struct var, node);
FLT_OT_DBG(2, "'%s.%016" PRIx64 "' -> '%.*s'", scope, var->name_hash, (int)b_data(&(var->data.u.str)), b_orig(&(var->data.u.str)));
}
}
vars_rdunlock(vars);
}


@ -91,6 +91,18 @@ name must be preceded by a minus character ('-'). Here are examples:
# Only dump frontends, backends and servers status
/metrics?metrics=haproxy_frontend_status,haproxy_backend_status,haproxy_server_status
* Add section description as label for all metrics
It is possible to set a description in global and proxy sections, via the
"description" directive. The global description, if defined, is exposed via
the "haproxy_process_description" metric. The descriptions provided in proxy
sections are not dumped. However, it is possible to add them as a label for all
metrics of the corresponding section, including the global one. To do so, the
"desc-labels" parameter must be set:
/metrics?desc-labels
/metrics?scope=frontend&desc-labels
* Dump extra counters
@ -193,6 +205,8 @@ listed below. Metrics from extra counters are not listed.
| haproxy_process_current_tasks |
| haproxy_process_current_run_queue |
| haproxy_process_idle_time_percent |
| haproxy_process_node |
| haproxy_process_description |
| haproxy_process_stopping |
| haproxy_process_jobs |
| haproxy_process_unstoppable_jobs |
@ -375,6 +389,9 @@ listed below. Metrics from extra counters are not listed.
| haproxy_server_max_connect_time_seconds |
| haproxy_server_max_response_time_seconds |
| haproxy_server_max_total_time_seconds |
| haproxy_server_agent_status |
| haproxy_server_agent_code |
| haproxy_server_agent_duration_seconds |
| haproxy_server_internal_errors_total |
| haproxy_server_unsafe_idle_connections_current |
| haproxy_server_safe_idle_connections_current |


@ -32,11 +32,11 @@
/* Prometheus exporter flags (ctx->flags) */
#define PROMEX_FL_METRIC_HDR 0x00000001
#define PROMEX_FL_INFO_METRIC 0x00000002
#define PROMEX_FL_FRONT_METRIC 0x00000004
#define PROMEX_FL_BACK_METRIC 0x00000008
#define PROMEX_FL_SRV_METRIC 0x00000010
#define PROMEX_FL_LI_METRIC 0x00000020
#define PROMEX_FL_BODYLESS_RESP 0x00000002
/* unused: 0x00000004 */
/* unused: 0x00000008 */
/* unused: 0x00000010 */
/* unused: 0x00000020 */
#define PROMEX_FL_MODULE_METRIC 0x00000040
#define PROMEX_FL_SCOPE_GLOBAL 0x00000080
#define PROMEX_FL_SCOPE_FRONT 0x00000100
@ -47,6 +47,7 @@
#define PROMEX_FL_NO_MAINT_SRV 0x00002000
#define PROMEX_FL_EXTRA_COUNTERS 0x00004000
#define PROMEX_FL_INC_METRIC_BY_DEFAULT 0x00008000
#define PROMEX_FL_DESC_LABELS 0x00010000
#define PROMEX_FL_SCOPE_ALL (PROMEX_FL_SCOPE_GLOBAL | PROMEX_FL_SCOPE_FRONT | \
PROMEX_FL_SCOPE_LI | PROMEX_FL_SCOPE_BACK | \
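
The `PROMEX_FL_DESC_LABELS` bit added in this hunk pairs with the `desc-labels` query parameter documented in the exporter README. As a rough illustration of how a bare query parameter can be mapped onto such a flag, here is a small sketch. The flag value is copied from the hunk; the function `demo_apply_query()` is hypothetical and is not the exporter's actual parser.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PROMEX_FL_DESC_LABELS 0x00010000 /* value taken from the hunk above */

/* Hypothetical helper: walks the '&'-separated parameters of <query>
 * (for example "scope=frontend&desc-labels") and returns <flags> with
 * PROMEX_FL_DESC_LABELS set when a bare "desc-labels" parameter is found.
 * All other parameters are ignored and all other bits are left untouched. */
static uint32_t demo_apply_query(uint32_t flags, const char *query)
{
	static const char key[] = "desc-labels";
	const size_t klen = sizeof(key) - 1;

	assert(query != NULL);

	while (*query) {
		/* length of the current parameter, up to the next '&' or the end */
		size_t len = strcspn(query, "&");

		if (len == klen && memcmp(query, key, klen) == 0)
			flags |= PROMEX_FL_DESC_LABELS;

		query += len;
		if (*query == '&')
			query++;
	}
	return flags;
}
```

Because each flag occupies its own bit, the new value `0x00010000` can be OR-ed into a context that already carries scope bits such as `PROMEX_FL_SCOPE_GLOBAL` (`0x00000080`) without disturbing them.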

File diff suppressed because it is too large


@ -1,674 +0,0 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:
<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<https://www.gnu.org/licenses/>.
The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.


@ -1,13 +0,0 @@
This directory contains a fork of the acme.sh deploy script for haproxy, which
allows acme.sh to run as non-root and does not require reloading haproxy.
The content of this directory is licensed under GPLv3 as explained in the
LICENSE file.
This was originally written for this pull request
https://github.com/acmesh-official/acme.sh/pull/4581.
The documentation is available on the haproxy wiki:
https://github.com/haproxy/wiki/wiki/Letsencrypt-integration-with-HAProxy-and-acme.sh
The haproxy.sh script must replace the one provided by acme.sh.
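One way to perform that replacement is sketched below. It assumes acme.sh keeps its deploy hooks in a `deploy/` directory under its home (commonly `~/.acme.sh`); the `install_hook` helper and the `.orig` backup suffix are illustrative names and are not part of this directory.

```sh
#!/bin/sh
# install_hook SRC ACME_HOME
# Copy SRC over ACME_HOME/deploy/haproxy.sh, keeping a backup of any
# existing hook as haproxy.sh.orig. Hypothetical helper, for illustration.
install_hook() {
  src=$1
  acme_home=$2
  dest="${acme_home}/deploy/haproxy.sh"

  # Refuse to continue if the replacement script is missing
  [ -f "${src}" ] || { echo "missing ${src}" >&2; return 1; }

  mkdir -p "${acme_home}/deploy" || return 1

  # Keep the hook shipped with acme.sh so it can be restored later
  if [ -f "${dest}" ]; then
    cp -p "${dest}" "${dest}.orig" || return 1
  fi

  cp "${src}" "${dest}"
}
```

A typical call would be `install_hook ./haproxy.sh "$HOME/.acme.sh"`, adjusted to wherever acme.sh is actually installed.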


@ -1,403 +0,0 @@
#!/usr/bin/env sh
# Script for acme.sh to deploy certificates to haproxy
#
# The following variables can be exported:
#
# export DEPLOY_HAPROXY_PEM_NAME="${domain}.pem"
#
# Defines the name of the PEM file.
# Defaults to "<domain>.pem"
#
# export DEPLOY_HAPROXY_PEM_PATH="/etc/haproxy"
#
# Defines location of PEM file for HAProxy.
# Defaults to /etc/haproxy
#
# export DEPLOY_HAPROXY_RELOAD="systemctl reload haproxy"
#
# OPTIONAL: Reload command used post deploy
# This defaults to a no-op (i.e. "true").
# It is strongly recommended to set this to something that makes sense
# for your distro.
#
# export DEPLOY_HAPROXY_ISSUER="no"
#
# OPTIONAL: Places the CA file next to the PEM file, as "<PEM file>.issuer"
# Note: Required for OCSP stapling to work
#
# export DEPLOY_HAPROXY_BUNDLE="no"
#
# OPTIONAL: Deploy this certificate as part of a multi-cert bundle
# This adds a suffix to the certificate based on the certificate type
# eg RSA certificates will have .rsa as a suffix to the file name
# HAProxy will load all certificates and provide one or the other
# depending on client capabilities
# Note: This functionality requires HAProxy to be compiled against
# a version of OpenSSL that supports it.
#
# export DEPLOY_HAPROXY_HOT_UPDATE="yes"
# export DEPLOY_HAPROXY_STATS_SOCKET="UNIX:/run/haproxy/admin.sock"
#
# OPTIONAL: Deploy the certificate over the HAProxy stats socket without
# needing to reload HAProxy. Default is "no".
#
# Requires the socat binary. The DEPLOY_HAPROXY_STATS_SOCKET variable uses the
# socat address format.
#
# export DEPLOY_HAPROXY_MASTER_CLI="UNIX:/run/haproxy-master.sock"
#
# OPTIONAL: To use the master CLI with DEPLOY_HAPROXY_HOT_UPDATE="yes" instead
# of a stats socket, use this variable.
######## Public functions #####################
#domain keyfile certfile cafile fullchain
haproxy_deploy() {
_cdomain="$1"
_ckey="$2"
_ccert="$3"
_cca="$4"
_cfullchain="$5"
_cmdpfx=""
# Some defaults
DEPLOY_HAPROXY_PEM_PATH_DEFAULT="/etc/haproxy"
DEPLOY_HAPROXY_PEM_NAME_DEFAULT="${_cdomain}.pem"
DEPLOY_HAPROXY_BUNDLE_DEFAULT="no"
DEPLOY_HAPROXY_ISSUER_DEFAULT="no"
DEPLOY_HAPROXY_RELOAD_DEFAULT="true"
DEPLOY_HAPROXY_HOT_UPDATE_DEFAULT="no"
DEPLOY_HAPROXY_STATS_SOCKET_DEFAULT="UNIX:/run/haproxy/admin.sock"
_debug _cdomain "${_cdomain}"
_debug _ckey "${_ckey}"
_debug _ccert "${_ccert}"
_debug _cca "${_cca}"
_debug _cfullchain "${_cfullchain}"
# PEM_PATH is optional. If not provided then assume "${DEPLOY_HAPROXY_PEM_PATH_DEFAULT}"
_getdeployconf DEPLOY_HAPROXY_PEM_PATH
_debug2 DEPLOY_HAPROXY_PEM_PATH "${DEPLOY_HAPROXY_PEM_PATH}"
if [ -n "${DEPLOY_HAPROXY_PEM_PATH}" ]; then
Le_Deploy_haproxy_pem_path="${DEPLOY_HAPROXY_PEM_PATH}"
_savedomainconf Le_Deploy_haproxy_pem_path "${Le_Deploy_haproxy_pem_path}"
elif [ -z "${Le_Deploy_haproxy_pem_path}" ]; then
Le_Deploy_haproxy_pem_path="${DEPLOY_HAPROXY_PEM_PATH_DEFAULT}"
fi
# Ensure PEM_PATH exists
if [ -d "${Le_Deploy_haproxy_pem_path}" ]; then
_debug "PEM_PATH ${Le_Deploy_haproxy_pem_path} exists"
else
_err "PEM_PATH ${Le_Deploy_haproxy_pem_path} does not exist"
return 1
fi
# PEM_NAME is optional. If not provided then assume "${DEPLOY_HAPROXY_PEM_NAME_DEFAULT}"
_getdeployconf DEPLOY_HAPROXY_PEM_NAME
_debug2 DEPLOY_HAPROXY_PEM_NAME "${DEPLOY_HAPROXY_PEM_NAME}"
if [ -n "${DEPLOY_HAPROXY_PEM_NAME}" ]; then
Le_Deploy_haproxy_pem_name="${DEPLOY_HAPROXY_PEM_NAME}"
_savedomainconf Le_Deploy_haproxy_pem_name "${Le_Deploy_haproxy_pem_name}"
elif [ -z "${Le_Deploy_haproxy_pem_name}" ]; then
Le_Deploy_haproxy_pem_name="${DEPLOY_HAPROXY_PEM_NAME_DEFAULT}"
# We better not have '*' as the first character
if [ "${Le_Deploy_haproxy_pem_name%%"${Le_Deploy_haproxy_pem_name#?}"}" = '*' ]; then
# remove the first character and add a _ instead
Le_Deploy_haproxy_pem_name="_${Le_Deploy_haproxy_pem_name#?}"
fi
fi
# BUNDLE is optional. If not provided then assume "${DEPLOY_HAPROXY_BUNDLE_DEFAULT}"
_getdeployconf DEPLOY_HAPROXY_BUNDLE
_debug2 DEPLOY_HAPROXY_BUNDLE "${DEPLOY_HAPROXY_BUNDLE}"
if [ -n "${DEPLOY_HAPROXY_BUNDLE}" ]; then
Le_Deploy_haproxy_bundle="${DEPLOY_HAPROXY_BUNDLE}"
_savedomainconf Le_Deploy_haproxy_bundle "${Le_Deploy_haproxy_bundle}"
elif [ -z "${Le_Deploy_haproxy_bundle}" ]; then
Le_Deploy_haproxy_bundle="${DEPLOY_HAPROXY_BUNDLE_DEFAULT}"
fi
# ISSUER is optional. If not provided then assume "${DEPLOY_HAPROXY_ISSUER_DEFAULT}"
_getdeployconf DEPLOY_HAPROXY_ISSUER
_debug2 DEPLOY_HAPROXY_ISSUER "${DEPLOY_HAPROXY_ISSUER}"
if [ -n "${DEPLOY_HAPROXY_ISSUER}" ]; then
Le_Deploy_haproxy_issuer="${DEPLOY_HAPROXY_ISSUER}"
_savedomainconf Le_Deploy_haproxy_issuer "${Le_Deploy_haproxy_issuer}"
elif [ -z "${Le_Deploy_haproxy_issuer}" ]; then
Le_Deploy_haproxy_issuer="${DEPLOY_HAPROXY_ISSUER_DEFAULT}"
fi
# RELOAD is optional. If not provided then assume "${DEPLOY_HAPROXY_RELOAD_DEFAULT}"
_getdeployconf DEPLOY_HAPROXY_RELOAD
_debug2 DEPLOY_HAPROXY_RELOAD "${DEPLOY_HAPROXY_RELOAD}"
if [ -n "${DEPLOY_HAPROXY_RELOAD}" ]; then
Le_Deploy_haproxy_reload="${DEPLOY_HAPROXY_RELOAD}"
_savedomainconf Le_Deploy_haproxy_reload "${Le_Deploy_haproxy_reload}"
elif [ -z "${Le_Deploy_haproxy_reload}" ]; then
Le_Deploy_haproxy_reload="${DEPLOY_HAPROXY_RELOAD_DEFAULT}"
fi
# HOT_UPDATE is optional. If not provided then assume "${DEPLOY_HAPROXY_HOT_UPDATE_DEFAULT}"
_getdeployconf DEPLOY_HAPROXY_HOT_UPDATE
_debug2 DEPLOY_HAPROXY_HOT_UPDATE "${DEPLOY_HAPROXY_HOT_UPDATE}"
if [ -n "${DEPLOY_HAPROXY_HOT_UPDATE}" ]; then
Le_Deploy_haproxy_hot_update="${DEPLOY_HAPROXY_HOT_UPDATE}"
_savedomainconf Le_Deploy_haproxy_hot_update "${Le_Deploy_haproxy_hot_update}"
elif [ -z "${Le_Deploy_haproxy_hot_update}" ]; then
Le_Deploy_haproxy_hot_update="${DEPLOY_HAPROXY_HOT_UPDATE_DEFAULT}"
fi
# STATS_SOCKET is optional. If not provided then assume "${DEPLOY_HAPROXY_STATS_SOCKET_DEFAULT}"
_getdeployconf DEPLOY_HAPROXY_STATS_SOCKET
_debug2 DEPLOY_HAPROXY_STATS_SOCKET "${DEPLOY_HAPROXY_STATS_SOCKET}"
if [ -n "${DEPLOY_HAPROXY_STATS_SOCKET}" ]; then
Le_Deploy_haproxy_stats_socket="${DEPLOY_HAPROXY_STATS_SOCKET}"
_savedomainconf Le_Deploy_haproxy_stats_socket "${Le_Deploy_haproxy_stats_socket}"
elif [ -z "${Le_Deploy_haproxy_stats_socket}" ]; then
Le_Deploy_haproxy_stats_socket="${DEPLOY_HAPROXY_STATS_SOCKET_DEFAULT}"
fi
# MASTER_CLI is optional. No defaults are used. When the master CLI is used,
# all commands are sent with a prefix.
_getdeployconf DEPLOY_HAPROXY_MASTER_CLI
_debug2 DEPLOY_HAPROXY_MASTER_CLI "${DEPLOY_HAPROXY_MASTER_CLI}"
if [ -n "${DEPLOY_HAPROXY_MASTER_CLI}" ]; then
Le_Deploy_haproxy_stats_socket="${DEPLOY_HAPROXY_MASTER_CLI}"
_savedomainconf Le_Deploy_haproxy_stats_socket "${Le_Deploy_haproxy_stats_socket}"
_cmdpfx="@1 " # command prefix used for master CLI only.
fi
# Set the suffix depending on whether we are creating a bundle or not
if [ "${Le_Deploy_haproxy_bundle}" = "yes" ]; then
_info "Bundle creation requested"
# Initialise $Le_Keylength if it is not already set
if [ -z "${Le_Keylength}" ]; then
Le_Keylength=""
fi
if _isEccKey "${Le_Keylength}"; then
_info "ECC key type detected"
_suffix=".ecdsa"
else
_info "RSA key type detected"
_suffix=".rsa"
fi
else
_suffix=""
fi
_debug _suffix "${_suffix}"
# Set variables for later
_pem="${Le_Deploy_haproxy_pem_path}/${Le_Deploy_haproxy_pem_name}${_suffix}"
_issuer="${_pem}.issuer"
_ocsp="${_pem}.ocsp"
_reload="${Le_Deploy_haproxy_reload}"
_statssock="${Le_Deploy_haproxy_stats_socket}"
_info "Deploying PEM file"
# Create a temporary PEM file
_temppem="$(_mktemp)"
_debug _temppem "${_temppem}"
cat "${_ccert}" "${_cca}" "${_ckey}" | grep . >"${_temppem}"
_ret="$?"
# Check that we could create the temporary file
if [ "${_ret}" != "0" ]; then
_err "Error code ${_ret} returned during PEM file creation"
[ -f "${_temppem}" ] && rm -f "${_temppem}"
return ${_ret}
fi
# Move PEM file into place
_info "Moving new certificate into place"
_debug _pem "${_pem}"
cat "${_temppem}" >"${_pem}"
_ret=$?
# Clean up temp file
[ -f "${_temppem}" ] && rm -f "${_temppem}"
# Deal with any failure of moving PEM file into place
if [ "${_ret}" != "0" ]; then
_err "Error code ${_ret} returned while moving new certificate into place"
return ${_ret}
fi
# Update .issuer file if requested
if [ "${Le_Deploy_haproxy_issuer}" = "yes" ]; then
_info "Updating .issuer file"
_debug _issuer "${_issuer}"
cat "${_cca}" >"${_issuer}"
_ret="$?"
if [ "${_ret}" != "0" ]; then
_err "Error code ${_ret} returned while copying issuer/CA certificate into place"
return ${_ret}
fi
else
[ -f "${_issuer}" ] && _err "Issuer file update not requested but .issuer file exists"
fi
# Update .ocsp file if certificate was requested with --ocsp/--ocsp-must-staple option
if [ -z "${Le_OCSP_Staple}" ]; then
Le_OCSP_Staple="0"
fi
if [ "${Le_OCSP_Staple}" = "1" ]; then
_info "Updating OCSP stapling info"
_debug _ocsp "${_ocsp}"
_info "Extracting OCSP URL"
_ocsp_url=$(${ACME_OPENSSL_BIN:-openssl} x509 -noout -ocsp_uri -in "${_pem}")
_debug _ocsp_url "${_ocsp_url}"
# Only process OCSP if URL was present
if [ "${_ocsp_url}" != "" ]; then
# Extract the hostname from the OCSP URL
_info "Extracting OCSP host"
_ocsp_host=$(echo "${_ocsp_url}" | cut -d/ -f3)
_debug _ocsp_host "${_ocsp_host}"
# Only process the certificate if we have a .issuer file
if [ -r "${_issuer}" ]; then
# Check if issuer cert is also a root CA cert
_subjectdn=$(${ACME_OPENSSL_BIN:-openssl} x509 -in "${_issuer}" -subject -noout | cut -d'/' -f2,3,4,5,6,7,8,9,10)
_debug _subjectdn "${_subjectdn}"
_issuerdn=$(${ACME_OPENSSL_BIN:-openssl} x509 -in "${_issuer}" -issuer -noout | cut -d'/' -f2,3,4,5,6,7,8,9,10)
_debug _issuerdn "${_issuerdn}"
_info "Requesting OCSP response"
# If the issuer is a CA cert then our command line has "-CAfile" added
if [ "${_subjectdn}" = "${_issuerdn}" ]; then
_cafile_argument="-CAfile \"${_issuer}\""
else
_cafile_argument=""
fi
_debug _cafile_argument "${_cafile_argument}"
# if OpenSSL/LibreSSL is v1.1 or above, the format for the -header option has changed
_openssl_version=$(${ACME_OPENSSL_BIN:-openssl} version | cut -d' ' -f2)
_debug _openssl_version "${_openssl_version}"
_openssl_major=$(echo "${_openssl_version}" | cut -d '.' -f1)
_openssl_minor=$(echo "${_openssl_version}" | cut -d '.' -f2)
if [ "${_openssl_major}" -eq "1" ] && [ "${_openssl_minor}" -ge "1" ] || [ "${_openssl_major}" -ge "2" ]; then
_header_sep="="
else
_header_sep=" "
fi
# Request the OCSP response from the issuer and store it
_openssl_ocsp_cmd="${ACME_OPENSSL_BIN:-openssl} ocsp \
-issuer \"${_issuer}\" \
-cert \"${_pem}\" \
-url \"${_ocsp_url}\" \
-header Host${_header_sep}\"${_ocsp_host}\" \
-respout \"${_ocsp}\" \
-verify_other \"${_issuer}\" \
${_cafile_argument} \
| grep -q \"${_pem}: good\""
_debug _openssl_ocsp_cmd "${_openssl_ocsp_cmd}"
eval "${_openssl_ocsp_cmd}"
_ret=$?
else
# Non fatal: No issuer file was present so no OCSP stapling file created
_err "OCSP stapling in use but no .issuer file was present"
fi
else
# Non fatal: No OCSP URL was found in the certificate
_err "OCSP update requested but no OCSP URL was found in certificate"
fi
# Non fatal: Check return code of openssl command
if [ "${_ret}" != "0" ]; then
_err "Updating OCSP stapling failed with return code ${_ret}"
fi
else
# An OCSP file was already present but certificate did not have OCSP extension
if [ -f "${_ocsp}" ]; then
_err "OCSP was not requested but .ocsp file exists."
# Could remove the file at this step, although HAProxy just ignores it in this case
# rm -f "${_ocsp}" || _err "Problem removing stale .ocsp file"
fi
fi
if [ "${Le_Deploy_haproxy_hot_update}" = "yes" ]; then
# set the socket name for messages
if [ -n "${_cmdpfx}" ]; then
_socketname="master CLI"
else
_socketname="stats socket"
fi
# Update certificate over HAProxy stats socket or master CLI.
if _exists socat; then
# look for the certificate on the stats socket, to choose between updating or creating one
_socat_cert_cmd="echo '${_cmdpfx}show ssl cert' | socat '${_statssock}' - | grep -q '^${_pem}$'"
_debug _socat_cert_cmd "${_socat_cert_cmd}"
eval "${_socat_cert_cmd}"
_ret=$?
if [ "${_ret}" != "0" ]; then
_newcert="1"
_info "Creating new certificate '${_pem}' over HAProxy ${_socketname}."
# certificate wasn't found, it's a new one. Check that the crt-list exists before creating and inserting the certificate.
_socat_crtlist_show_cmd="echo '${_cmdpfx}show ssl crt-list' | socat '${_statssock}' - | grep -q '^${Le_Deploy_haproxy_pem_path}$'"
_debug _socat_crtlist_show_cmd "${_socat_crtlist_show_cmd}"
eval "${_socat_crtlist_show_cmd}"
_ret=$?
if [ "${_ret}" != "0" ]; then
_err "Couldn't find '${Le_Deploy_haproxy_pem_path}' in haproxy 'show ssl crt-list'"
return "${_ret}"
fi
# create a new certificate
_socat_new_cmd="echo '${_cmdpfx}new ssl cert ${_pem}' | socat '${_statssock}' - | grep -q 'New empty'"
_debug _socat_new_cmd "${_socat_new_cmd}"
eval "${_socat_new_cmd}"
_ret=$?
if [ "${_ret}" != "0" ]; then
_err "Couldn't create '${_pem}' in haproxy"
return "${_ret}"
fi
else
_info "Update existing certificate '${_pem}' over HAProxy ${_socketname}."
fi
_socat_cert_set_cmd="echo -e '${_cmdpfx}set ssl cert ${_pem} <<\n$(cat "${_pem}")\n' | socat '${_statssock}' - | grep -q 'Transaction created'"
_debug _socat_cert_set_cmd "${_socat_cert_set_cmd}"
eval "${_socat_cert_set_cmd}"
_ret=$?
if [ "${_ret}" != "0" ]; then
_err "Can't update '${_pem}' in haproxy"
return "${_ret}"
fi
_socat_cert_commit_cmd="echo '${_cmdpfx}commit ssl cert ${_pem}' | socat '${_statssock}' - | grep -q '^Success!$'"
_debug _socat_cert_commit_cmd "${_socat_cert_commit_cmd}"
eval "${_socat_cert_commit_cmd}"
_ret=$?
if [ "${_ret}" != "0" ]; then
_err "Can't commit '${_pem}' in haproxy"
return ${_ret}
fi
if [ "${_newcert}" = "1" ]; then
# if this is a new certificate, it needs to be inserted into the crt-list
_socat_cert_add_cmd="echo '${_cmdpfx}add ssl crt-list ${Le_Deploy_haproxy_pem_path} ${_pem}' | socat '${_statssock}' - | grep -q 'Success!'"
_debug _socat_cert_add_cmd "${_socat_cert_add_cmd}"
eval "${_socat_cert_add_cmd}"
_ret=$?
if [ "${_ret}" != "0" ]; then
_err "Can't add '${_pem}' to crt-list '${Le_Deploy_haproxy_pem_path}' in haproxy"
return "${_ret}"
fi
fi
else
_err "'socat' is not available, couldn't update over ${_socketname}"
return 1
fi
else
# Reload HAProxy
_debug _reload "${_reload}"
eval "${_reload}"
_ret=$?
if [ "${_ret}" != "0" ]; then
_err "Error code ${_ret} during reload"
return ${_ret}
else
_info "Reload successful"
fi
fi
return 0
}
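Two of the less obvious idioms in `haproxy_deploy` can be tried in isolation. The sketch below restates the wildcard file-name handling and the OpenSSL version test as standalone POSIX sh functions; `sanitize_pem_name` and `ocsp_header_sep` are hypothetical names introduced here for illustration and do not exist in the script.

```sh
#!/bin/sh
# Illustrative helpers only; these names are not part of the deploy script.

# Replace a leading '*' (wildcard certificate) with '_' in a PEM file name.
sanitize_pem_name() {
  name=$1
  rest=${name#?}             # everything after the first character
  first=${name%%"${rest}"}   # the first character alone (pattern is quoted, so literal)
  if [ "${first}" = '*' ]; then
    name="_${rest}"
  fi
  printf '%s\n' "${name}"
}

# Choose the separator used with "openssl ocsp -header" from a version
# string such as "1.0.2" or "3.0.13": '=' from 1.1 onwards, ' ' before.
ocsp_header_sep() {
  major=$(echo "$1" | cut -d '.' -f1)
  minor=$(echo "$1" | cut -d '.' -f2)
  # In sh, "A && B || C" groups as "(A && B) || C"
  if [ "${major}" -eq 1 ] && [ "${minor}" -ge 1 ] || [ "${major}" -ge 2 ]; then
    printf '='
  else
    printf ' '
  fi
}
```

For example, `sanitize_pem_name '*.example.com.pem'` yields `_.example.com.pem`, while a name without a leading `*` is returned unchanged.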

235
admin/cli/haproxy-dump-certs Executable file

@ -0,0 +1,235 @@
#!/bin/bash
#
# Dump certificates from the HAProxy stats or master socket to the filesystem
# Experimental script
#
set -e
export BASEPATH=${BASEPATH:-/etc/haproxy}/
export SOCKET=${SOCKET:-/var/run/haproxy-master.sock}
export DRY_RUN=0
export DEBUG=
export VERBOSE=
export M="@1 "
export TMP
vecho() {
[ -n "$VERBOSE" ] && echo "$@"
return 0
}
read_certificate() {
name=$1
crt_filename=
key_filename=
OFS=$IFS
IFS=":"
while read -r key value; do
case "$key" in
"Crt filename")
crt_filename="${value# }"
key_filename="${value# }"
;;
"Key filename")
key_filename="${value# }"
;;
esac
done < <(echo "${M}show ssl cert ${name}" | socat "${SOCKET}" -)
IFS=$OFS
if [ -z "$crt_filename" ] || [ -z "$key_filename" ]; then
return 1
fi
# handle fields without a crt-base/key-base
[ "${crt_filename:0:1}" != "/" ] && crt_filename="${BASEPATH}${crt_filename}"
[ "${key_filename:0:1}" != "/" ] && key_filename="${BASEPATH}${key_filename}"
vecho "name:$name"
vecho "crt:$crt_filename"
vecho "key:$key_filename"
export NAME="$name"
export CRT_FILENAME="$crt_filename"
export KEY_FILENAME="$key_filename"
return 0
}
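The field parsing done by `read_certificate` can be exercised without a running HAProxy. The following is a minimal sketch, assuming the reply to `show ssl cert <name>` is a list of `Key: value` lines; the sample text and paths are made up, and `parse_cert_fields` is a hypothetical name. It scopes `IFS=:` to the `read` call instead of saving and restoring `IFS`, which is a variation on the loop above rather than the script's own code.

```sh
#!/bin/sh
# Parse "Key: value" lines from stdin and report the crt/key file names.
parse_cert_fields() {
  crt_filename=
  key_filename=
  # IFS=: applies to this read only; the value keeps its leading space
  while IFS=: read -r key value; do
    case "$key" in
      "Crt filename")
        crt_filename="${value# }"
        key_filename="${value# }"   # default: key stored in the same file
        ;;
      "Key filename")
        key_filename="${value# }"
        ;;
    esac
  done
  printf 'crt=%s key=%s\n' "$crt_filename" "$key_filename"
}

# Made-up sample standing in for the socket reply
parse_cert_fields <<'EOF'
Filename: /etc/haproxy/certs/example.pem
Status: Used
Crt filename: certs/example.crt
Key filename: certs/example.key
EOF
```

When only a `Crt filename` line is present, both values point at the same file, which mirrors the combined-PEM case handled by the script.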
cmp_certkey() {
prev=$1
new=$2
if [ ! -f "$prev" ]; then
return 1;
fi
if ! cmp -s <(openssl x509 -in "$prev" -noout -fingerprint -sha256) <(openssl x509 -in "$new" -noout -fingerprint -sha256); then
return 1
fi
return 0
}
dump_certificate() {
name=$1
prev_crt=$2
prev_key=$3
r="tmp.${RANDOM}"
d="old.$(date +%s)"
new_crt="$TMP/$(basename "$prev_crt").${r}"
new_key="$TMP/$(basename "$prev_key").${r}"
if ! touch "${new_crt}" || ! touch "${new_key}"; then
echo "[ALERT] ($$) : can't dump \"$name\", can't create tmp files" >&2
return 1
fi
echo "${M}dump ssl cert ${name}" | socat "${SOCKET}" - | openssl pkey >> "${new_key}"
# use crl2pkcs7 as a way to dump multiple x509, storeutl could be used in modern versions of openssl
echo "${M}dump ssl cert ${name}" | socat "${SOCKET}" - | openssl crl2pkcs7 -nocrl -certfile /dev/stdin | openssl pkcs7 -print_certs >> "${new_crt}"
if ! cmp -s <(openssl x509 -in "${new_crt}" -pubkey -noout) <(openssl pkey -in "${new_key}" -pubout); then
echo "[ALERT] ($$) : Private key \"${new_key}\" and public key \"${new_crt}\" don't match" >&2
return 1
fi
if cmp_certkey "${prev_crt}" "${new_crt}"; then
echo "[NOTICE] ($$) : ${prev_crt} is already up to date" >&2
return 0
fi
# dry run will just return before trying to move the files
if [ "${DRY_RUN}" != "0" ]; then
return 0
fi
# move the current certificates to ".old.timestamp"
if [ -f "${prev_crt}" ] && [ -f "${prev_key}" ]; then
mv "${prev_crt}" "${prev_crt}.${d}"
[ "${prev_crt}" != "${prev_key}" ] && mv "${prev_key}" "${prev_key}.${d}"
fi
# move the new certificates to old place
mv "${new_crt}" "${prev_crt}"
[ "${prev_crt}" != "${prev_key}" ] && mv "${new_key}" "${prev_key}"
return 0
}
dump_all_certificates() {
echo "${M}show ssl cert" | socat "${SOCKET}" - | grep -v '^#' | grep -v '^$' | while read -r line; do
export NAME
export CRT_FILENAME
export KEY_FILENAME
if read_certificate "$line"; then
dump_certificate "$NAME" "$CRT_FILENAME" "$KEY_FILENAME"
else
echo "[WARNING] ($$) : can't dump \"$name\", crt/key filename details not found in \"show ssl cert\"" >&2
fi
done
}
usage() {
echo "Usage:"
echo " $0 [options]* [cert]*"
echo ""
echo " Dump certificates from the HAProxy stats or master socket to the filesystem"
echo " Requires socat and openssl"
echo " EXPERIMENTAL script, backup your files!"
echo " The script will move your previous files to FILE.old.unixtimestamp (ex: foo.com.pem.old.1759044998)"
echo ""
echo "Options:"
echo " -S, --master-socket <path> Use the master socket at <path> (default: ${SOCKET})"
echo " -s, --socket <path> Use the stats socket at <path>"
echo " -p, --path <path> Specify a base path for relative files (default: ${BASEPATH})"
echo " -n, --dry-run Read certificates on the socket but don't dump them"
echo " -d, --debug Debug mode, set -x"
echo " -v, --verbose Verbose mode"
echo " -h, --help This help"
echo " -- End of options"
echo ""
echo "Examples:"
echo " $0 -v -p ${BASEPATH} -S ${SOCKET}"
echo " $0 -v -p ${BASEPATH} -S ${SOCKET} bar.com.rsa.pem"
echo " $0 -v -p ${BASEPATH} -S ${SOCKET} -- foo.com.ecdsa.pem bar.com.rsa.pem"
}
main() {
while [ -n "$1" ]; do
case "$1" in
-S|--master-socket)
SOCKET="$2"
M="@1 "
shift 2
;;
-s|--socket)
SOCKET="$2"
M=
shift 2
;;
-p|--path)
BASEPATH="$2/"
shift 2
;;
-n|--dry-run)
DRY_RUN=1
shift
;;
-d|--debug)
DEBUG=1
shift
;;
-v|--verbose)
VERBOSE=1
shift
;;
-h|--help)
usage "$@"
exit 0
;;
--)
shift
break
;;
-*)
echo "[ALERT] ($$) : Unknown option '$1'" >&2
usage "$@"
exit 1
;;
*)
break
;;
esac
done
if [ -n "$DEBUG" ]; then
set -x
fi
TMP=${TMP:-$(mktemp -d)}
if [ -z "$1" ]; then
dump_all_certificates
else
# compute the certificates names at the end of the command
while [ -n "$1" ]; do
if ! read_certificate "$1"; then
echo "[ALERT] ($$) : can't dump \"$1\", crt/key filename details not found in \"show ssl cert\"" >&2
exit 1
fi
[ "${DRY_RUN}" = "0" ] && dump_certificate "$NAME" "$CRT_FILENAME" "$KEY_FILENAME"
shift
done
fi
}
trap 'rm -rf -- "$TMP"' EXIT
main "$@"

admin/cli/haproxy-reload Executable file

@ -0,0 +1,113 @@
#!/bin/bash
set -e
export VERBOSE=1
export TIMEOUT=90
export MASTER_SOCKET=${MASTER_SOCKET:-/var/run/haproxy-master.sock}
export RET=
alert() {
if [ "$VERBOSE" -ge "1" ]; then
echo "[ALERT] $*" >&2
fi
}
reload() {
while read -r line; do
if [ "$line" = "Success=0" ]; then
RET=1
elif [ "$line" = "Success=1" ]; then
RET=0
elif [ "$line" = "Another reload is still in progress." ]; then
alert "$line"
elif [ "$line" = "--" ]; then
continue;
else
if [ "$RET" = 1 ] && [ "$VERBOSE" = "2" ]; then
echo "$line" >&2
elif [ "$VERBOSE" = "3" ]; then
echo "$line" >&2
fi
fi
done < <(echo "reload" | socat -t"${TIMEOUT}" "${MASTER_SOCKET}" -)
if [ -z "$RET" ]; then
alert "Couldn't finish the reload before the timeout (${TIMEOUT})."
return 1
fi
return "$RET"
}
usage() {
echo "Usage:"
echo " $0 [options]*"
echo ""
echo " Trigger a reload from the master socket"
echo " Requires socat"
echo " EXPERIMENTAL script!"
echo ""
echo "Options:"
echo " -S, --master-socket <path> Use the master socket at <path> (default: ${MASTER_SOCKET})"
echo " -d, --debug Debug mode, set -x"
echo " -t, --timeout Timeout (socat -t) (default: ${TIMEOUT})"
echo " -s, --silent Silent mode (no output)"
echo " -v, --verbose Verbose output (output from haproxy on failure)"
echo " -vv Even more verbose output (output from haproxy on success and failure)"
echo " -h, --help This help"
echo ""
echo "Examples:"
echo " $0 -S ${MASTER_SOCKET} -t ${TIMEOUT}"
}
main() {
while [ -n "$1" ]; do
case "$1" in
-S|--master-socket)
MASTER_SOCKET="$2"
shift 2
;;
-t|--timeout)
TIMEOUT="$2"
shift 2
;;
-s|--silent)
VERBOSE=0
shift
;;
-v|--verbose)
VERBOSE=2
shift
;;
-vv)
VERBOSE=3
shift
;;
-d|--debug)
DEBUG=1
shift
;;
-h|--help)
usage "$@"
exit 0
;;
*)
echo "[ALERT] ($$) : Unknown option '$1'" >&2
usage "$@"
exit 1
;;
esac
done
if [ -n "$DEBUG" ]; then
set -x
fi
}
main "$@"
reload

View File

@ -123,6 +123,22 @@ struct url_stat {
#define FILT2_PRESERVE_QUERY 0x02
#define FILT2_EXTRACT_CAPTURE 0x04
#define FILT_OUTPUT_FMT (FILT_COUNT_ONLY| \
FILT_COUNT_STATUS| \
FILT_COUNT_SRV_STATUS| \
FILT_COUNT_COOK_CODES| \
FILT_COUNT_TERM_CODES| \
FILT_COUNT_URL_ONLY| \
FILT_COUNT_URL_COUNT| \
FILT_COUNT_URL_ERR| \
FILT_COUNT_URL_TAVG| \
FILT_COUNT_URL_TTOT| \
FILT_COUNT_URL_TAVGO| \
FILT_COUNT_URL_TTOTO| \
FILT_COUNT_URL_BAVG| \
FILT_COUNT_URL_BTOT| \
FILT_COUNT_IP_COUNT)
unsigned int filter = 0;
unsigned int filter2 = 0;
unsigned int filter_invert = 0;
@ -192,7 +208,7 @@ void help()
" you can also use -n to start from earlier than field %d\n"
" -query preserve the query string for per-URL (-u*) statistics\n"
"\n"
"Output format - only one may be used at a time\n"
"Output format - **only one** may be used at a time\n"
" -c only report the number of lines that would have been printed\n"
" -pct output connect and response times percentiles\n"
" -st output number of requests per HTTP status code\n"
@ -898,6 +914,9 @@ int main(int argc, char **argv)
if (!filter && !filter2)
die("No action specified.\n");
if ((filter & FILT_OUTPUT_FMT) & ((filter & FILT_OUTPUT_FMT) - 1))
die("Please, set only one output filter.\n");
if (filter & FILT_ACC_COUNT && !filter_acc_count)
filter_acc_count=1;
@ -1552,6 +1571,10 @@ void filter_count_srv_status(const char *accept_field, const char *time_field, s
if (!srv_node) {
/* server not yet in the tree, let's create it */
srv = (void *)calloc(1, sizeof(struct srv_st) + e - b + 1);
if (unlikely(!srv)) {
fprintf(stderr, "%s: not enough memory\n", __FUNCTION__);
exit(1);
}
srv_node = &srv->node;
memcpy(&srv_node->key, b, e - b);
srv_node->key[e - b] = '\0';
@ -1661,6 +1684,10 @@ void filter_count_url(const char *accept_field, const char *time_field, struct t
*/
if (unlikely(!ustat))
ustat = calloc(1, sizeof(*ustat));
if (unlikely(!ustat)) {
fprintf(stderr, "%s: not enough memory\n", __FUNCTION__);
exit(1);
}
ustat->nb_err = err;
ustat->nb_req = 1;

View File

@ -7,6 +7,21 @@ the queue.
## Requirements
- Python 3.x
- [lxml](https://lxml.de/installation.html)
- requests
- urllib3
## Installation
It can be easily installed with venv from python3
$ python3 -m venv ~/.local/venvs/stable-bot/
$ source ~/.local/venvs/stable-bot/bin/activate
$ pip install -r requirements.txt
And can be executed with:
$ ~/.local/venvs/stable-bot/bin/python release-estimator.py
## Usage

View File

@ -1,4 +1,4 @@
#!/usr/bin/python3
#!/usr/bin/env python3
#
# Release estimator for HAProxy
#
@ -16,6 +16,7 @@
#
from lxml import html
from urllib.parse import urljoin
import requests
import traceback
import smtplib
@ -190,6 +191,7 @@ This is a friendly bot that watches fixes pending for the next haproxy-stable re
# parse out the CHANGELOG link
CHANGELOG = tree.xpath('//a[contains(@href,"CHANGELOG")]/@href')[0]
CHANGELOG = urljoin("https://", CHANGELOG)
last_version = tree.xpath('//td[contains(text(), "last")]/../td/a/text()')[0]
first_version = "%s.0" % (version)

View File

@ -0,0 +1,3 @@
lxml
requests
urllib3

View File

@ -6,9 +6,9 @@ Wants=network-online.target
[Service]
EnvironmentFile=-/etc/default/haproxy
EnvironmentFile=-/etc/sysconfig/haproxy
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" "EXTRAOPTS=-S /run/haproxy-master.sock"
ExecStart=@SBINDIR@/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS
ExecReload=@SBINDIR@/haproxy -Ws -f $CONFIG -c $EXTRAOPTS
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" "CFGDIR=/etc/haproxy/conf.d" "EXTRAOPTS=-S /run/haproxy-master.sock"
ExecStart=@SBINDIR@/haproxy -Ws -f $CONFIG -f $CFGDIR -p $PIDFILE $EXTRAOPTS
ExecReload=@SBINDIR@/haproxy -Ws -f $CONFIG -f $CFGDIR -c $EXTRAOPTS
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always

View File

@ -0,0 +1,34 @@
// find calls to calloc
@call@
expression ptr;
position p;
@@
ptr@p = calloc(...);
// find ok calls to calloc
@ok@
expression ptr;
position call.p;
@@
ptr@p = calloc(...);
... when != ptr
(
(ptr == NULL || ...)
|
(ptr == 0 || ...)
|
(ptr != NULL || ...)
|
(ptr != 0 || ...)
)
// fix bad calls to calloc
@depends on !ok@
expression ptr;
position call.p;
@@
ptr@p = calloc(...);
+ if (ptr == NULL) return;

View File

@ -0,0 +1,34 @@
// find calls to malloc
@call@
expression ptr;
position p;
@@
ptr@p = malloc(...);
// find ok calls to malloc
@ok@
expression ptr;
position call.p;
@@
ptr@p = malloc(...);
... when != ptr
(
(ptr == NULL || ...)
|
(ptr == 0 || ...)
|
(ptr != NULL || ...)
|
(ptr != 0 || ...)
)
// fix bad calls to malloc
@depends on !ok@
expression ptr;
position call.p;
@@
ptr@p = malloc(...);
+ if (ptr == NULL) return;

View File

@ -0,0 +1,34 @@
// find calls to strdup
@call@
expression ptr;
position p;
@@
ptr@p = strdup(...);
// find ok calls to strdup
@ok@
expression ptr;
position call.p;
@@
ptr@p = strdup(...);
... when != ptr
(
(ptr == NULL || ...)
|
(ptr == 0 || ...)
|
(ptr != NULL || ...)
|
(ptr != 0 || ...)
)
// fix bad calls to strdup
@depends on !ok@
expression ptr;
position call.p;
@@
ptr@p = strdup(...);
+ if (ptr == NULL) return;

View File

@ -4,6 +4,7 @@
/* make the include files below expose their flags */
#define HA_EXPOSE_FLAGS
#include <haproxy/applet-t.h>
#include <haproxy/channel-t.h>
#include <haproxy/connection-t.h>
#include <haproxy/fd-t.h>
@ -12,7 +13,10 @@
#include <haproxy/mux_fcgi-t.h>
#include <haproxy/mux_h2-t.h>
#include <haproxy/mux_h1-t.h>
#include <haproxy/mux_quic-t.h>
#include <haproxy/mux_spop-t.h>
#include <haproxy/peers-t.h>
#include <haproxy/quic_conn-t.h>
#include <haproxy/stconn-t.h>
#include <haproxy/stream-t.h>
#include <haproxy/task-t.h>
@ -39,11 +43,17 @@
#define SHOW_AS_FSTRM 0x00040000
#define SHOW_AS_PEERS 0x00080000
#define SHOW_AS_PEER 0x00100000
#define SHOW_AS_QC 0x00200000
#define SHOW_AS_SPOPC 0x00400000
#define SHOW_AS_SPOPS 0x00800000
#define SHOW_AS_QCC 0x01000000
#define SHOW_AS_QCS 0x02000000
#define SHOW_AS_APPCTX 0x04000000
// command line names, must be in exact same order as the SHOW_AS_* flags above
// so that show_as_words[i] matches flag 1U<<i.
const char *show_as_words[] = { "ana", "chn", "conn", "sc", "stet", "strm", "task", "txn", "sd", "hsl", "htx", "hmsg", "fd", "h2c", "h2s", "h1c", "h1s", "fconn", "fstrm",
"peers", "peer"};
"peers", "peer", "qc", "spopc", "spops", "qcc", "qcs", "appctx"};
/* will be sufficient for even largest flag names */
static char buf[4096];
@ -158,6 +168,12 @@ int main(int argc, char **argv)
if (show_as & SHOW_AS_FSTRM) printf("fstrm->flags = %s\n",(fstrm_show_flags (buf, bsz, " | ", flags), buf));
if (show_as & SHOW_AS_PEERS) printf("peers->flags = %s\n",(peers_show_flags (buf, bsz, " | ", flags), buf));
if (show_as & SHOW_AS_PEER) printf("peer->flags = %s\n", (peer_show_flags (buf, bsz, " | ", flags), buf));
if (show_as & SHOW_AS_QC) printf("qc->flags = %s\n", (qc_show_flags (buf, bsz, " | ", flags), buf));
if (show_as & SHOW_AS_SPOPC) printf("spopc->flags = %s\n",(spop_conn_show_flags(buf, bsz, " | ", flags), buf));
if (show_as & SHOW_AS_SPOPS) printf("spops->flags = %s\n",(spop_strm_show_flags(buf, bsz, " | ", flags), buf));
if (show_as & SHOW_AS_QCC) printf("qcc->flags = %s\n", (qcc_show_flags (buf, bsz, " | ", flags), buf));
if (show_as & SHOW_AS_QCS) printf("qcs->flags = %s\n", (qcs_show_flags (buf, bsz, " | ", flags), buf));
if (show_as & SHOW_AS_APPCTX) printf("appctx->flags = %s\n", (appctx_show_flags(buf, bsz, " | ", flags), buf));
}
return 0;
}

View File

@ -1,2 +1,2 @@
#!/bin/sh
awk '{print $12}' | grep cflg= | sort | uniq -c | sort -nr | while read a b; do c=${b##*=}; d=$(${0%/*}/flags conn $c);d=${d##*= }; printf "%6d %s %s\n" $a "$b" "$d";done
grep -o 'cflg=[0-9a-fx]*' | sort | uniq -c | sort -nr | while read a b; do c=${b##*=}; d=$(${0%/*}/flags conn $c);d=${d##*= }; printf "%6d %s %s\n" $a "$b" "$d";done

View File

@ -195,7 +195,7 @@ while read -r; do
! [[ "$REPLY" =~ [[:blank:]]h2c.*\.flg=([0-9a-fx]*) ]] || append_flag b.h2c.flg h2c "${BASH_REMATCH[1]}"
elif [ $ctx = cob ]; then
! [[ "$REPLY" =~ [[:blank:]]flags=([0-9a-fx]*) ]] || append_flag b.co.flg conn "${BASH_REMATCH[1]}"
! [[ "$REPLY" =~ [[:blank:]]fd.state=([0-9a-fx]*) ]] || append_flag b.co.fd.st fd "${BASH_REMATCH[1]}"
! [[ "$REPLY" =~ [[:blank:]]fd.state=([0-9a-fx]*) ]] || append_flag b.co.fd.st fd 0x"${BASH_REMATCH[1]}"
elif [ $ctx = res ]; then
! [[ "$REPLY" =~ [[:blank:]]\(f=([0-9a-fx]*) ]] || append_flag res.flg chn "${BASH_REMATCH[1]}"
! [[ "$REPLY" =~ [[:blank:]]an=([0-9a-fx]*) ]] || append_flag res.ana ana "${BASH_REMATCH[1]}"

dev/gdb/ebtree.gdb Normal file

@ -0,0 +1,118 @@
# sets $tag and $node from $arg0, for internal use only
define _ebtree_set_tag_node
set $tag = (unsigned long)$arg0 & 0x1
set $node = (unsigned long)$arg0 & 0xfffffffffffffffe
set $node = (struct eb_node *)$node
end
# get root from any node (leaf or node), returns in $node
define ebtree_root
set $node = (struct eb_root *)$arg0->node_p
if $node == 0
# sole node
set $node = (struct eb_root *)$arg0->leaf_p
end
# walk up
while 1
_ebtree_set_tag_node $node
if $node->branches.b[1] == 0
break
end
set $node = $node->node_p
end
# root returned in $node
end
# returns $node filled with the first node of ebroot $arg0
define ebtree_first
# browse ebtree left until encountering leaf
set $node = (struct eb_node *)$arg0->b[0]
while 1
_ebtree_set_tag_node $node
if $tag == 0
loop_break
end
set $node = (struct eb_root *)$node->branches.b[0]
end
# extract last node
_ebtree_set_tag_node $node
end
# finds next ebtree node after $arg0, and returns it in $node
define ebtree_next
# get parent
set $node = (struct eb_root *)$arg0->leaf_p
# Walking up from right branch, so we cannot be below root
# while (eb_gettag(t) != EB_LEFT) // #define EB_LEFT 0
while 1
_ebtree_set_tag_node $node
if $tag == 0
loop_break
end
set $node = (struct eb_root *)$node->node_p
end
set $node = (struct eb_root *)$node->branches.b[1]
# walk down (left side => 0)
# while (eb_gettag(start) == EB_NODE) // #define EB_NODE 1
while 1
_ebtree_set_tag_node $node
if $node == 0
loop_break
end
if $tag != 1
loop_break
end
set $node = (struct eb_root *)$node->branches.b[0]
end
end
# sets $tag and $node from $arg0, for internal use only
define _ebsctree_set_tag_node
set $tag = (unsigned long)$arg0 & 0x1
set $node = (unsigned long)$arg0 & 0xfffffffffffffffe
set $node = (struct eb32sc_node *)$node
end
# returns $node filled with the first node of ebroot $arg0
define ebsctree_first
# browse ebsctree left until encountering leaf
set $node = (struct eb32sc_node *)$arg0->b[0]
while 1
_ebsctree_set_tag_node $node
if $tag == 0
loop_break
end
set $node = (struct eb_root *)$node->branches.b[0]
end
# extract last node
_ebsctree_set_tag_node $node
end
# finds next ebtree node after $arg0, and returns it in $node
define ebsctree_next
# get parent
set $node = (struct eb_root *)$arg0->node.leaf_p
# Walking up from right branch, so we cannot be below root
# while (eb_gettag(t) != EB_LEFT) // #define EB_LEFT 0
while 1
_ebsctree_set_tag_node $node
if $tag == 0
loop_break
end
set $node = (struct eb_root *)$node->node.node_p
end
set $node = (struct eb_root *)$node->node.branches.b[1]
# walk down (left side => 0)
# while (eb_gettag(start) == EB_NODE) // #define EB_NODE 1
while 1
_ebsctree_set_tag_node $node
if $node == 0
loop_break
end
if $tag != 1
loop_break
end
set $node = (struct eb_root *)$node->node.branches.b[0]
end
end

dev/gdb/list.gdb Normal file

@ -0,0 +1,26 @@
# lists entries starting at list head $arg0
define list_dump
set $h = $arg0
set $p = *(void **)$h
while ($p != $h)
printf "%#lx\n", $p
if ($p == 0)
loop_break
end
set $p = *(void **)$p
end
end
# list all entries starting at list head $arg0 until meeting $arg1
define list_find
set $h = $arg0
set $k = $arg1
set $p = *(void **)$h
while ($p != $h)
printf "%#lx\n", $p
if ($p == 0 || $p == $k)
loop_break
end
set $p = *(void **)$p
end
end

dev/gdb/memprof.dbg Normal file

@ -0,0 +1,19 @@
# show non-null memprofile entries with method, alloc/free counts/tot and caller
define memprof_dump
set $i = 0
set $meth={ "UNKN", "MALL", "CALL", "REAL", "STRD", "FREE", "P_AL", "P_FR", "STND", "VALL", "ALAL", "PALG", "MALG", "PVAL" }
while $i < sizeof(memprof_stats) / sizeof(memprof_stats[0])
if memprof_stats[$i].alloc_calls || memprof_stats[$i].free_calls
set $m = memprof_stats[$i].method
printf "m:%s ac:%u fc:%u at:%u ft:%u ", $meth[$m], \
memprof_stats[$i].alloc_calls, memprof_stats[$i].free_calls, \
memprof_stats[$i].alloc_tot, memprof_stats[$i].free_tot
output/a memprof_stats[$i].caller
printf "\n"
end
set $i = $i + 1
end
end

dev/gdb/pools.gdb Normal file

@ -0,0 +1,21 @@
# dump pool contents (2.9 and above, with buckets)
define pools_dump
set $h = $po
set $p = *(void **)$h
while ($p != $h)
set $e = (struct pool_head *)(((char *)$p) - (unsigned long)&((struct pool_head *)0)->list)
set $total = 0
set $used = 0
set $idx = 0
while $idx < sizeof($e->buckets) / sizeof($e->buckets[0])
set $total=$total + $e->buckets[$idx].allocated
set $used=$used + $e->buckets[$idx].used
set $idx=$idx + 1
end
set $mem = $total * $e->size
printf "list=%#lx pool_head=%p name=%s size=%u alloc=%u used=%u mem=%u\n", $p, $e, $e->name, $e->size, $total, $used, $mem
set $p = *(void **)$p
end
end

dev/gdb/post-mortem.gdb Normal file

@ -0,0 +1,47 @@
# This script will set the post_mortem struct pointer ($pm) from the one found
# in the "post_mortem" symbol. If not found or if not correct, it's the same
# address as the "_post_mortem" section, which can be found using "info files"
# or "objdump -h" on the executable. The guessed value is set by a first call
# to pm_init, but if not correct, you just need to call pm_init again with the
# correct pointer, e.g.:
# pm_init 0xcfd400
define pm_init
set $pm = (struct post_mortem*)$arg0
set $g = $pm.global
set $ti = $pm.thread_info
set $tc = $pm.thread_ctx
set $tgi = $pm.tgroup_info
set $tgc = $pm.tgroup_ctx
set $fd = $pm.fdtab
set $pxh = *$pm.proxies
set $po = $pm.pools
set $ac = $pm.activity
end
# show basic info on the running process (OS, uid, etc)
define pm_show_info
print $pm->platform
print $pm->process
end
# show thread IDs to easily map between gdb threads and tid
define pm_show_threads
set $t = 0
while $t < $g.nbthread
printf "Tid %4d: pthread_id=%#lx stack_top=%#lx\n", $t, $ti[$t].pth_id, $ti[$t].stack_top
set $t = $t + 1
end
end
# dump all threads' dump buffers
define pm_show_thread_dump
set $t = 0
while $t < $g.nbthread
printf "%s\n", $tc[$t].thread_dump_buffer->area
set $t = $t + 1
end
end
# initialize the various pointers
pm_init &post_mortem

dev/gdb/proxies.gdb Normal file

@ -0,0 +1,25 @@
# list proxies starting with the one in argument (typically $pxh)
define px_list
set $p = (struct proxy *)$arg0
while ($p != 0)
printf "%p (", $p
if $p->cap & 0x10
printf "LB,"
end
if $p->cap & 0x1
printf "FE,"
end
if $p->cap & 0x2
printf "BE,"
end
printf "%s)", $p->id
if $p->cap & 0x1
printf " feconn=%u cmax=%u cum_conn=%llu cpsmax=%u", $p->feconn, $p->fe_counters.conn_max, $p->fe_counters.cum_conn, $p->fe_counters.cps_max
end
if $p->cap & 0x2
printf " beconn=%u served=%u queued=%u qmax=%u cum_sess=%llu wact=%u", $p->beconn, $p->served, $p->queue.length, $p->be_counters.nbpend_max, $p->be_counters.cum_sess, $p->lbprm.tot_wact
end
printf "\n"
set $p = ($p)->next
end
end

dev/gdb/servers.gdb Normal file

@ -0,0 +1,9 @@
# list servers in a proxy whose pointer is passed in argument
define px_list_srv
set $h = (struct proxy *)$arg0
set $p = ($h)->srv
while ($p != 0)
printf "%#lx %s maxconn=%u cur_sess=%u max_sess=%u served=%u queued=%u st=%u->%u ew=%u sps_max=%u\n", $p, $p->id, $p->maxconn, $p->cur_sess, $p->counters.cur_sess_max, $p->served, $p->queue.length, $p->cur_state, $p->next_state, $p->cur_eweight, $p->counters.sps_max
set $p = ($p)->next
end
end

dev/gdb/stream.gdb Normal file

@ -0,0 +1,18 @@
# list all streams for all threads
define stream_dump
set $t = 0
while $t < $g.nbthread
set $h = &$tc[$t].streams
printf "Tid %4d: &streams=%p\n", $t, $h
set $p = *(void **)$h
while ($p != $h)
set $s = (struct stream *)(((char *)$p) - (unsigned long)&((struct stream *)0)->list)
printf " &list=%#lx strm=%p uid=%u strm.fe=%s strm.flg=%#x strm.list={n=%p,p=%p}\n", $p, $s, $s->uniq_id, $s->sess->fe->id, $s->flags, $s->list.n, $s->list.p
if ($p == 0)
loop_break
end
set $p = *(void **)$p
end
set $t = $t + 1
end
end

dev/h2/h2-tracer.lua Normal file

@ -0,0 +1,247 @@
-- This is an HTTP/2 tracer for a TCP proxy. It will decode the frames that are
-- exchanged between the client and the server and indicate their direction,
-- types, flags and lengths. Lines are prefixed with a connection number modulo
-- 4096, which makes it possible to sort out multiplexed exchanges. To use this,
-- simply load this file in the global section and use it from a TCP proxy:
--
-- global
-- lua-load "dev/h2/h2-tracer.lua"
--
-- listen h2_sniffer
-- mode tcp
-- bind :8002
-- filter lua.h2-tracer #hex
-- server s1 127.0.0.1:8003
--
-- define the decoder's class here
Dec = {}
Dec.id = "Lua H2 tracer"
Dec.flags = 0
Dec.__index = Dec
Dec.args = {} -- args passed by the filter's declaration
Dec.cid = 0 -- next connection ID
-- prefix to indent responses
res_pfx = " | "
-- H2 frame types
h2ft = {
[0] = "DATA",
[1] = "HEADERS",
[2] = "PRIORITY",
[3] = "RST_STREAM",
[4] = "SETTINGS",
[5] = "PUSH_PROMISE",
[6] = "PING",
[7] = "GOAWAY",
[8] = "WINDOW_UPDATE",
[9] = "CONTINUATION",
}
h2ff = {
[0] = { [0] = "ES", [3] = "PADDED" }, -- data
[1] = { [0] = "ES", [2] = "EH", [3] = "PADDED", [5] = "PRIORITY" }, -- headers
[2] = { }, -- priority
[3] = { }, -- rst_stream
[4] = { [0] = "ACK" }, -- settings
[5] = { [2] = "EH", [3] = "PADDED" }, -- push_promise
[6] = { [0] = "ACK" }, -- ping
[7] = { }, -- goaway
[8] = { }, -- window_update
[9] = { [2] = "EH" }, -- continuation
}
function Dec:new()
local dec = {}
setmetatable(dec, Dec)
dec.do_hex = false
if (Dec.args[1] == "hex") then
dec.do_hex = true
end
Dec.cid = Dec.cid+1
-- mix the thread number when multithreading.
dec.cid = Dec.cid + 64 * core.thread
-- state per dir. [1]=req [2]=res
dec.st = {
[1] = {
hdr = { 0, 0, 0, 0, 0, 0, 0, 0, 0 },
fofs = 0,
flen = 0,
ftyp = 0,
fflg = 0,
sid = 0,
tot = 0,
},
[2] = {
hdr = { 0, 0, 0, 0, 0, 0, 0, 0, 0 },
fofs = 0,
flen = 0,
ftyp = 0,
fflg = 0,
sid = 0,
tot = 0,
},
}
return dec
end
function Dec:start_analyze(txn, chn)
if chn:is_resp() then
io.write(string.format("[%03x] ", self.cid % 4096) .. res_pfx .. "### res start\n")
else
io.write(string.format("[%03x] ", self.cid % 4096) .. "### req start\n")
end
filter.register_data_filter(self, chn)
end
function Dec:end_analyze(txn, chn)
if chn:is_resp() then
io.write(string.format("[%03x] ", self.cid % 4096) .. res_pfx .. "### res end: " .. self.st[2].tot .. " bytes total\n")
else
io.write(string.format("[%03x] ", self.cid % 4096) .. "### req end: " ..self.st[1].tot.. " bytes total\n")
end
end
function Dec:tcp_payload(txn, chn)
local data = { }
local dofs = 1
local pfx = ""
local dir = 1
local sofs = 0
local ft = ""
local ff = ""
if chn:is_resp() then
pfx = res_pfx
dir = 2
end
pfx = string.format("[%03x] ", self.cid % 4096) .. pfx
-- stream offset before processing
sofs = self.st[dir].tot
if (chn:input() > 0) then
data = chn:data()
self.st[dir].tot = self.st[dir].tot + chn:input()
end
if (chn:input() > 0 and self.do_hex ~= false) then
io.write("\n" .. pfx .. "Hex:\n")
for i = 1, #data do
if ((i & 7) == 1) then io.write(pfx) end
io.write(string.format("0x%02x ", data:sub(i, i):byte()))
if ((i & 7) == 0 or i == #data) then io.write("\n") end
end
end
-- start at byte 1 in the <data> string
dofs = 1
-- the first 24 bytes are expected to be an H2 preface on the request
if (dir == 1 and sofs < 24) then
-- let's not check it for now
local bytes = self.st[dir].tot - sofs
if (sofs + bytes >= 24) then
-- skip what was missing from the preface
dofs = dofs + 24 - sofs
sofs = 24
io.write(pfx .. "[PREFACE len=24]\n")
else
-- consume more preface bytes
sofs = sofs + self.st[dir].tot
return
end
end
-- parse contents as long as there are pending data
while true do
-- check if we need to consume data from the current frame
-- flen is the number of bytes left before the frame's end.
if (self.st[dir].flen > 0) then
if dofs > #data then return end -- missing data
if (#data - dofs + 1 < self.st[dir].flen) then
-- insufficient data
self.st[dir].flen = self.st[dir].flen - (#data - dofs + 1)
io.write(pfx .. string.format("%32s\n", "... -" .. (#data - dofs + 1) .. " = " .. self.st[dir].flen))
dofs = #data + 1
return
else
-- enough data to finish
if (dofs == 1) then
-- only print a partial size if the frame was interrupted
io.write(pfx .. string.format("%32s\n", "... -" .. self.st[dir].flen .. " = 0"))
end
dofs = dofs + self.st[dir].flen
self.st[dir].flen = 0
end
end
-- here, flen = 0, we're at the beginning of a new frame --
-- read possibly missing header bytes until dec.fofs == 9
while self.st[dir].fofs < 9 do
if dofs > #data then return end -- missing data
self.st[dir].hdr[self.st[dir].fofs + 1] = data:sub(dofs, dofs):byte()
dofs = dofs + 1
self.st[dir].fofs = self.st[dir].fofs + 1
end
-- we have a full frame header here
if (self.do_hex ~= false) then
io.write("\n" .. pfx .. string.format("hdr=%02x %02x %02x %02x %02x %02x %02x %02x %02x\n",
self.st[dir].hdr[1], self.st[dir].hdr[2], self.st[dir].hdr[3],
self.st[dir].hdr[4], self.st[dir].hdr[5], self.st[dir].hdr[6],
self.st[dir].hdr[7], self.st[dir].hdr[8], self.st[dir].hdr[9]))
end
-- we have a full frame header, we'll be ready
-- for a new frame once the data is gone
self.st[dir].flen = self.st[dir].hdr[1] * 65536 +
self.st[dir].hdr[2] * 256 +
self.st[dir].hdr[3]
self.st[dir].ftyp = self.st[dir].hdr[4]
self.st[dir].fflg = self.st[dir].hdr[5]
self.st[dir].sid = self.st[dir].hdr[6] * 16777216 +
self.st[dir].hdr[7] * 65536 +
self.st[dir].hdr[8] * 256 +
self.st[dir].hdr[9]
self.st[dir].fofs = 0
-- decode frame type
if self.st[dir].ftyp <= 9 then
ft = h2ft[self.st[dir].ftyp]
else
ft = string.format("TYPE_0x%02x", self.st[dir].ftyp)
end
-- decode frame flags for frame type <ftyp>
ff = ""
for i = 7, 0, -1 do
if (((self.st[dir].fflg >> i) & 1) ~= 0) then
if self.st[dir].ftyp <= 9 and h2ff[self.st[dir].ftyp][i] ~= nil then
ff = ff .. ((ff == "") and "" or "+")
ff = ff .. h2ff[self.st[dir].ftyp][i]
else
ff = ff .. ((ff == "") and "" or "+")
ff = ff .. string.format("0x%02x", 1<<i)
end
end
end
io.write(pfx .. string.format("[%s %ssid=%u len=%u (bytes=%u)]\n",
ft, (ff == "") and "" or ff .. " ",
self.st[dir].sid, self.st[dir].flen,
(#data - dofs + 1)))
end
end
core.register_filter("h2-tracer", Dec, function(dec, args)
Dec.args = args
return dec
end)

View File

@ -59,9 +59,9 @@ struct ring_v2 {
struct ring_v2a {
size_t size; // storage size
size_t rsvd; // header length (used for file-backed maps)
size_t tail __attribute__((aligned(64))); // storage tail
size_t head __attribute__((aligned(64))); // storage head
char area[0] __attribute__((aligned(64))); // storage area begins immediately here
size_t tail ALIGNED(64); // storage tail
size_t head ALIGNED(64); // storage head
char area[0] ALIGNED(64); // storage area begins immediately here
};
/* display the message and exit with the code */

dev/ncpu/Makefile Normal file

@ -0,0 +1,31 @@
include ../../include/make/verbose.mk
CC = cc
OPTIMIZE = -O2 -g
DEFINE =
INCLUDE =
OBJS = ncpu.so ncpu
OBJDUMP = objdump
all: $(OBJS)
%.o: %.c
$(cmd_CC) $(OPTIMIZE) $(DEFINE) $(INCLUDE) -shared -fPIC -c -o $@ $^
%.so: %.o
$(cmd_CC) -pie -o $@ $^
$(Q)rm -f $^
%: %.so
$(call qinfo, PATCHING)set -- $$($(OBJDUMP) -j .dynamic -h $^ | fgrep .dynamic); \
ofs=$$6; size=$$3; \
dd status=none bs=1 count=$$((0x$$ofs)) if=$^ of=$^-p1; \
dd status=none bs=1 skip=$$((0x$$ofs)) count=$$((0x$$size)) if=$^ of=$^-p2; \
dd status=none bs=1 skip=$$((0x$$ofs+0x$$size)) if=$^ of=$^-p3; \
sed -e 's,\xfb\xff\xff\x6f\x00\x00\x00\x00\x00\x00\x00\x08,\xfb\xff\xff\x6f\x00\x00\x00\x00\x00\x00\x00\x00,g' < $^-p2 > $^-p2-patched; \
cat $^-p1 $^-p2-patched $^-p3 > "$@"
$(Q)rm -f $^-p*
$(Q)chmod 755 "$@"
clean:
rm -f $(OBJS) *.[oas] *.so-* *~

dev/ncpu/ncpu.c Normal file

@ -0,0 +1,136 @@
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
// gcc -fPIC -shared -O2 -o ncpu{.so,.c}
// NCPU=16 LD_PRELOAD=$PWD/ncpu.so command args...
static char prog_full_path[PATH_MAX];
long sysconf(int name)
{
if (name == _SC_NPROCESSORS_ONLN ||
name == _SC_NPROCESSORS_CONF) {
const char *ncpu = getenv("NCPU");
int n;
n = ncpu ? atoi(ncpu) : CPU_SETSIZE;
if (n < 0 || n > CPU_SETSIZE)
n = CPU_SETSIZE;
return n;
}
errno = EINVAL;
return -1;
}
/* return a cpu_set having the first $NCPU set */
int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask)
{
const char *ncpu;
int i, n;
CPU_ZERO_S(cpusetsize, mask);
ncpu = getenv("NCPU");
n = ncpu ? atoi(ncpu) : CPU_SETSIZE;
if (n < 0 || n > CPU_SETSIZE)
n = CPU_SETSIZE;
for (i = 0; i < n; i++)
CPU_SET_S(i, cpusetsize, mask);
return 0;
}
/* silently ignore the operation */
int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask)
{
return 0;
}
void usage(const char *argv0)
{
fprintf(stderr,
"Usage: %s [-n ncpu] [cmd [args...]]\n"
" Will install itself in LD_PRELOAD before calling <cmd> with args.\n"
" The number of CPUs may also come from variable NCPU or default to %d.\n"
"\n"
"",
argv0, CPU_SETSIZE);
exit(1);
}
/* Called in wrapper mode, no longer supported on recent glibc */
int main(int argc, char **argv)
{
const char *argv0 = argv[0];
char *preload;
int plen;
prog_full_path[0] = 0;
plen = readlink("/proc/self/exe", prog_full_path, sizeof(prog_full_path) - 1);
if (plen != -1)
prog_full_path[plen] = 0;
else
plen = snprintf(prog_full_path, sizeof(prog_full_path), "%s", argv[0]);
while (1) {
argc--;
argv++;
if (argc < 1)
usage(argv0);
if (strcmp(argv[0], "--") == 0) {
argc--;
argv++;
break;
}
else if (strcmp(argv[0], "-n") == 0) {
if (argc < 2)
usage(argv0);
if (setenv("NCPU", argv[1], 1) != 0)
usage(argv0);
argc--;
argv++;
}
else {
/* unknown arg, that's the command */
break;
}
}
/* here the only args left start with the cmd name */
/* now we'll concatenate ourselves at the end of the LD_PRELOAD variable */
preload = getenv("LD_PRELOAD");
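if (preload) {
int olen = strlen(preload);
char *buf;
/* the string returned by getenv() must not be passed to realloc(),
 * so the new value is built in a freshly allocated buffer.
 */
buf = malloc(olen + 1 + plen + 1);
if (!buf) {
perror("malloc");
exit(2);
}
memcpy(buf, preload, olen);
buf[olen] = ' ';
memcpy(buf + olen + 1, prog_full_path, plen);
buf[olen + 1 + plen] = 0;
preload = buf;
}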
else {
preload = prog_full_path;
}
if (setenv("LD_PRELOAD", preload, 1) < 0) {
perror("setenv");
exit(2);
}
execvp(*argv, argv);
perror("execve");
exit(2);
}

View File

@ -14,11 +14,11 @@ that are picked from the development branch.
Branches are numbered in 0.1 increments. Every 6 months, upon a new major
release, the development branch enters maintenance and a new development branch
is created with a new, higher version. The current development branch is
3.0-dev, and maintenance branches are 2.9 and below.
3.1-dev, and maintenance branches are 3.0 and below.
Fixes created in the development branch for issues that were introduced in an
earlier branch are applied in descending order to each and every version till
that branch that introduced the issue: 2.9 first, then 2.8, then 2.7 and so
that branch that introduced the issue: 3.0 first, then 2.9, then 2.8 and so
on. This operation is called "backporting". A fix for an issue is never
backported beyond the branch that introduced the issue. An important point is
that the project maintainers really aim at zero regression in maintenance

View File

@ -17,7 +17,7 @@ Finally, based on your analysis, give your general conclusion as "Conclusion: X"
where X is a single word among:
- "yes", if you recommend to backport the patch right now either because
it explicitly states this or because it's a fix for a bug that affects
a maintenance branch (2.9 or lower);
a maintenance branch (3.0 or lower);
- "wait", if this patch explicitly mentions that it must be backported, but
only after waiting some time.
- "no", if nothing clearly indicates a necessity to backport this patch (e.g.

View File

@ -0,0 +1,70 @@
BEGININPUT
BEGINCONTEXT
HAProxy's development cycle consists in one development branch, and multiple
maintenance branches.
All the development is made into the development branch exclusively. This
includes mostly new features, doc updates, cleanups and of course, fixes.
The maintenance branches, also called stable branches, never see any
development, and only receive ultra-safe fixes for bugs that affect them,
that are picked from the development branch.
Branches are numbered in 0.1 increments. Every 6 months, upon a new major
release, the development branch enters maintenance and a new development branch
is created with a new, higher version. The current development branch is
3.2-dev, and maintenance branches are 3.1 and below.
Fixes created in the development branch for issues that were introduced in an
earlier branch are applied in descending order to each and every version till
that branch that introduced the issue: 3.1 first, then 3.0, then 2.9, then 2.8
and so on. This operation is called "backporting". A fix for an issue is never
backported beyond the branch that introduced the issue. An important point is
that the project maintainers really aim at zero regression in maintenance
branches, so they're never willing to take any risk backporting patches that
are not deemed strictly necessary.
Fixes consist of patches managed using the Git version control tool and are
identified by a Git commit ID and a commit message. For this reason we
indistinctly talk about backporting fixes, commits, or patches; all mean the
same thing. When mentioning commit IDs, developers always use a short form
made of the first 8 characters only, and expect the AI assistant to do the
same.
It seldom happens that some fixes depend on changes that were brought by other
patches that were not in some branches and that will need to be backported as
well for the fix to work. In this case, such information is explicitly provided
in the commit message by the patch's author in natural language.
Developers are serious and always indicate if a patch needs to be backported.
Sometimes they omit the exact target branch, or they will say that the patch is
"needed" in some older branch, but it means the same. If a commit message
doesn't mention any backport instructions, it means that the commit does not
have to be backported. And patches that are not strictly bug fixes nor doc
improvements are normally not backported. For example, fixes for design
limitations, architectural improvements and performance optimizations are
considered too risky for a backport. Finally, all bug fixes are tagged as
"BUG" at the beginning of their subject line. Patches that are not tagged as
such are not bugs, and must never be backported unless their commit message
explicitly requests so.
ENDCONTEXT
A developer is reviewing the development branch, trying to spot which commits
need to be backported to maintenance branches. This person is already expert
on HAProxy and everything related to Git, patch management, and the risks
associated with backports, so he doesn't want to be told how to proceed nor to
review the contents of the patch.
The goal for this developer is to get some help from the AI assistant to save
some precious time on this tedious review work. In order to do a better job, he
needs an accurate summary of the information and instructions found in each
commit message. Specifically he needs to figure if the patch fixes a problem
affecting an older branch or not, if it needs to be backported, if so to which
branches, and if other patches need to be backported along with it.
The indented text block below after an "id" line and starting with a Subject line
is a commit message from the HAProxy development branch that describes a patch
applied to that branch, starting with its subject line, please read it carefully.
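
The descending backport order described in the context above can be sketched in C as follows. This is only an illustration, not part of the patchbot scripts: the function name backport_targets and the encoding of branches as tenths (3.1 becomes 31, 2.9 becomes 29) are assumptions made for this example, and it presumes that every 0.1 step between the two branches exists and is still maintained.

```c
#include <assert.h>
#include <stddef.h>

/* Branches are encoded as tenths: 3.1 -> 31, 2.9 -> 29 (an encoding chosen
 * for this sketch only). Fills <out> with the branches a fix has to visit,
 * newest first, from <newest_stable> down to <introduced_in>, which is the
 * branch that introduced the issue. Returns the number of entries written,
 * never more than <max>.
 */
size_t backport_targets(int newest_stable, int introduced_in, int *out, size_t max)
{
	size_t n = 0;
	int v;

	assert(out != NULL);
	for (v = newest_stable; v >= introduced_in && n < max; v--)
		out[n++] = v;
	return n;
}
```

With 3.2-dev as the development branch, a bug introduced in 2.8 would yield 3.1, 3.0, 2.9 and 2.8 in that order, and nothing older, since a fix is never backported beyond the branch that introduced the issue.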

View File

@ -0,0 +1,29 @@
ENDINPUT
BEGININSTRUCTION
You are an AI assistant that follows instruction extremely well. Help as much
as you can, responding to a single question using a single response.
The developer wants to know if he needs to backport the patch above to fix
maintenance branches, for which branches, and what possible dependencies might
be mentioned in the commit message. Carefully study the commit message and its
backporting instructions if any (otherwise it should probably not be backported),
then provide a very concise and short summary that will help the developer decide
to backport it, or simply to skip it.
Start by explaining in one or two sentences what you recommend for this one and why.
Finally, based on your analysis, give your general conclusion as "Conclusion: X"
where X is a single word among:
- "yes", if you recommend to backport the patch right now either because
it explicitly states this or because it's a fix for a bug that affects
a maintenance branch (3.1 or lower);
- "wait", if this patch explicitly mentions that it must be backported, but
only after waiting some time.
- "no", if nothing clearly indicates a necessity to backport this patch (e.g.
lack of explicit backport instructions, or it's just an improvement);
- "uncertain" otherwise for cases not covered above
ENDINSTRUCTION
Explanation:
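
Since the instruction above asks for a final line of the form "Conclusion: X", a consumer of the response has to pick that word out of free text. A minimal C sketch of such an extraction is shown here. The function name extract_verdict is hypothetical, the match is case-sensitive, and mapping any missing or unexpected word to "uncertain" is a choice made for this example rather than the behavior of the actual scripts.

```c
#include <assert.h>
#include <string.h>

/* Returns the verdict word that follows the last "Conclusion:" marker in
 * <resp>: one of "yes", "wait", "no" or "uncertain". A missing marker or an
 * unknown word maps to "uncertain" (a choice made for this sketch).
 */
const char *extract_verdict(const char *resp)
{
	static const char *words[] = { "yes", "wait", "no", "uncertain" };
	const char *marker = "Conclusion:";
	const char *p, *last = NULL;
	size_t i, len;

	assert(resp != NULL);

	/* keep only the last occurrence of the marker */
	for (p = strstr(resp, marker); p; p = strstr(p + 1, marker))
		last = p;
	if (!last)
		return "uncertain";

	/* skip blanks and an optional opening quote */
	p = last + strlen(marker);
	while (*p == ' ' || *p == '\t' || *p == '"')
		p++;

	for (i = 0; i < sizeof(words) / sizeof(words[0]); i++) {
		len = strlen(words[i]);
		/* the word must not be a prefix of a longer one ("no" vs "nothing") */
		if (strncmp(p, words[i], len) == 0 &&
		    !((p[len] >= 'a' && p[len] <= 'z') || (p[len] >= 'A' && p[len] <= 'Z')))
			return words[i];
	}
	return "uncertain";
}
```

Keeping only the last marker avoids being misled by a response that quotes the instruction text before giving its own conclusion.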

View File

@ -0,0 +1,70 @@
BEGININPUT
BEGINCONTEXT
HAProxy's development cycle consists in one development branch, and multiple
maintenance branches.
All the development is made into the development branch exclusively. This
includes mostly new features, doc updates, cleanups and of course, fixes.
The maintenance branches, also called stable branches, never see any
development, and only receive ultra-safe fixes for bugs that affect them,
that are picked from the development branch.
Branches are numbered in 0.1 increments. Every 6 months, upon a new major
release, the development branch enters maintenance and a new development branch
is created with a new, higher version. The current development branch is
3.3-dev, and maintenance branches are 3.2 and below.
Fixes created in the development branch for issues that were introduced in an
earlier branch are applied in descending order to each and every version till
that branch that introduced the issue: 3.2 first, then 3.1, then 3.0, then 2.9
and so on. This operation is called "backporting". A fix for an issue is never
backported beyond the branch that introduced the issue. An important point is
that the project maintainers really aim at zero regression in maintenance
branches, so they're never willing to take any risk backporting patches that
are not deemed strictly necessary.
Fixes consist of patches managed using the Git version control tool and are
identified by a Git commit ID and a commit message. For this reason we
indistinctly talk about backporting fixes, commits, or patches; all mean the
same thing. When mentioning commit IDs, developers always use a short form
made of the first 8 characters only, and expect the AI assistant to do the
same.
It seldom happens that some fixes depend on changes that were brought by other
patches that were not in some branches and that will need to be backported as
well for the fix to work. In this case, such information is explicitly provided
in the commit message by the patch's author in natural language.
Developers are serious and always indicate if a patch needs to be backported.
Sometimes they omit the exact target branch, or they will say that the patch is
"needed" in some older branch, but it means the same. If a commit message
doesn't mention any backport instructions, it means that the commit does not
have to be backported. And patches that are not strictly bug fixes nor doc
improvements are normally not backported. For example, fixes for design
limitations, architectural improvements and performance optimizations are
considered too risky for a backport. Finally, all bug fixes are tagged as
"BUG" at the beginning of their subject line. Patches that are not tagged as
such are not bugs, and must never be backported unless their commit message
explicitly requests so.
ENDCONTEXT
A developer is reviewing the development branch, trying to spot which commits
need to be backported to maintenance branches. This person is already expert
on HAProxy and everything related to Git, patch management, and the risks
associated with backports, so he doesn't want to be told how to proceed nor to
review the contents of the patch.
The goal for this developer is to get some help from the AI assistant to save
some precious time on this tedious review work. In order to do a better job, he
needs an accurate summary of the information and instructions found in each
commit message. Specifically he needs to figure if the patch fixes a problem
affecting an older branch or not, if it needs to be backported, if so to which
branches, and if other patches need to be backported along with it.
The indented text block below after an "id" line and starting with a Subject line
is a commit message from the HAProxy development branch that describes a patch
applied to that branch, starting with its subject line, please read it carefully.

View File

@ -0,0 +1,29 @@
ENDINPUT
BEGININSTRUCTION
You are an AI assistant that follows instruction extremely well. Help as much
as you can, responding to a single question using a single response.
The developer wants to know if he needs to backport the patch above to fix
maintenance branches, for which branches, and what possible dependencies might
be mentioned in the commit message. Carefully study the commit message and its
backporting instructions if any (otherwise it should probably not be backported),
then provide a very concise and short summary that will help the developer decide
to backport it, or simply to skip it.
Start by explaining in one or two sentences what you recommend for this one and why.
Finally, based on your analysis, give your general conclusion as "Conclusion: X"
where X is a single word among:
- "yes", if you recommend to backport the patch right now either because
it explicitly states this or because it's a fix for a bug that affects
a maintenance branch (3.2 or lower);
- "wait", if this patch explicitly mentions that it must be backported, but
only after waiting some time.
- "no", if nothing clearly indicates a necessity to backport this patch (e.g.
lack of explicit backport instructions, or it's just an improvement);
- "uncertain" otherwise for cases not covered above
ENDINSTRUCTION
Explanation:

View File

@ -0,0 +1,70 @@
BEGININPUT
BEGINCONTEXT
HAProxy's development cycle consists in one development branch, and multiple
maintenance branches.
All the development is made into the development branch exclusively. This
includes mostly new features, doc updates, cleanups and of course, fixes.
The maintenance branches, also called stable branches, never see any
development, and only receive ultra-safe fixes for bugs that affect them,
that are picked from the development branch.
Branches are numbered in 0.1 increments. Every 6 months, upon a new major
release, the development branch enters maintenance and a new development branch
is created with a new, higher version. The current development branch is
3.4-dev, and maintenance branches are 3.3 and below.
Fixes created in the development branch for issues that were introduced in an
earlier branch are applied in descending order to each and every version till
that branch that introduced the issue: 3.3 first, then 3.2, then 3.1, then 3.0
and so on. This operation is called "backporting". A fix for an issue is never
backported beyond the branch that introduced the issue. An important point is
that the project maintainers really aim at zero regression in maintenance
branches, so they're never willing to take any risk backporting patches that
are not deemed strictly necessary.
Fixes consist of patches managed using the Git version control tool and are
identified by a Git commit ID and a commit message. For this reason we
indistinctly talk about backporting fixes, commits, or patches; all mean the
same thing. When mentioning commit IDs, developers always use a short form
made of the first 8 characters only, and expect the AI assistant to do the
same.
It seldom happens that some fixes depend on changes that were brought by other
patches that were not in some branches and that will need to be backported as
well for the fix to work. In this case, such information is explicitly provided
in the commit message by the patch's author in natural language.
Developers are serious and always indicate if a patch needs to be backported.
Sometimes they omit the exact target branch, or they will say that the patch is
"needed" in some older branch, but it means the same. If a commit message
doesn't mention any backport instructions, it means that the commit does not
have to be backported. And patches that are not strictly bug fixes nor doc
improvements are normally not backported. For example, fixes for design
limitations, architectural improvements and performance optimizations are
considered too risky for a backport. Finally, all bug fixes are tagged as
"BUG" at the beginning of their subject line. Patches that are not tagged as
such are not bugs, and must never be backported unless their commit message
explicitly requests so.
ENDCONTEXT
A developer is reviewing the development branch, trying to spot which commits
need to be backported to maintenance branches. This person is already expert
on HAProxy and everything related to Git, patch management, and the risks
associated with backports, so he doesn't want to be told how to proceed nor to
review the contents of the patch.
The goal for this developer is to get some help from the AI assistant to save
some precious time on this tedious review work. In order to do a better job, he
needs an accurate summary of the information and instructions found in each
commit message. Specifically he needs to figure if the patch fixes a problem
affecting an older branch or not, if it needs to be backported, if so to which
branches, and if other patches need to be backported along with it.
The indented text block below after an "id" line and starting with a Subject line
is a commit message from the HAProxy development branch that describes a patch
applied to that branch, starting with its subject line, please read it carefully.

View File

@ -0,0 +1,29 @@
ENDINPUT
BEGININSTRUCTION
You are an AI assistant that follows instruction extremely well. Help as much
as you can, responding to a single question using a single response.
The developer wants to know if he needs to backport the patch above to fix
maintenance branches, for which branches, and what possible dependencies might
be mentioned in the commit message. Carefully study the commit message and its
backporting instructions if any (otherwise it should probably not be backported),
then provide a very concise and short summary that will help the developer decide
to backport it, or simply to skip it.
Start by explaining in one or two sentences what you recommend for this one and why.
Finally, based on your analysis, give your general conclusion as "Conclusion: X"
where X is a single word among:
- "yes", if you recommend to backport the patch right now either because
it explicitly states this or because it's a fix for a bug that affects
a maintenance branch (3.3 or lower);
- "wait", if this patch explicitly mentions that it must be backported, but
only after waiting some time.
- "no", if nothing clearly indicates a necessity to backport this patch (e.g.
lack of explicit backport instructions, or it's just an improvement);
- "uncertain" otherwise for cases not covered above
ENDINSTRUCTION
Explanation:

View File

@ -150,11 +150,14 @@ function updt_table(line) {
var w = document.getElementById("sh_w").checked;
var y = document.getElementById("sh_y").checked;
var tn = 0, tu = 0, tw = 0, ty = 0;
var bn = 0, bu = 0, bw = 0, by = 0;
var i, el;
for (i = 1; i < nb_patches; i++) {
if (document.getElementById("bt_" + i + "_n").checked) {
tn++;
if (bkp[i])
bn++;
if (line && i != line)
continue;
el = document.getElementById("tr_" + i);
@ -163,6 +166,8 @@ function updt_table(line) {
}
else if (document.getElementById("bt_" + i + "_u").checked) {
tu++;
if (bkp[i])
bu++;
if (line && i != line)
continue;
el = document.getElementById("tr_" + i);
@ -171,6 +176,8 @@ function updt_table(line) {
}
else if (document.getElementById("bt_" + i + "_w").checked) {
tw++;
if (bkp[i])
bw++;
if (line && i != line)
continue;
el = document.getElementById("tr_" + i);
@ -179,6 +186,8 @@ function updt_table(line) {
}
else if (document.getElementById("bt_" + i + "_y").checked) {
ty++;
if (bkp[i])
by++;
if (line && i != line)
continue;
el = document.getElementById("tr_" + i);
@ -198,6 +207,18 @@ function updt_table(line) {
document.getElementById("cnt_u").innerText = tu;
document.getElementById("cnt_w").innerText = tw;
document.getElementById("cnt_y").innerText = ty;
document.getElementById("cnt_bn").innerText = bn;
document.getElementById("cnt_bu").innerText = bu;
document.getElementById("cnt_bw").innerText = bw;
document.getElementById("cnt_by").innerText = by;
document.getElementById("cnt_bt").innerText = bn + bu + bw + by;
document.getElementById("cnt_nbn").innerText = tn - bn;
document.getElementById("cnt_nbu").innerText = tu - bu;
document.getElementById("cnt_nbw").innerText = tw - bw;
document.getElementById("cnt_nby").innerText = ty - by;
document.getElementById("cnt_nbt").innerText = tn - bn + tu - bu + tw - bw + ty - by;
}
function updt_output() {
@ -236,23 +257,47 @@ function updt(line,value) {
updt_output();
}
function show_only(b,n,u,w,y) {
document.getElementById("sh_b").checked = !!b;
document.getElementById("sh_n").checked = !!n;
document.getElementById("sh_u").checked = !!u;
document.getElementById("sh_w").checked = !!w;
document.getElementById("sh_y").checked = !!y;
document.getElementById("show_all").checked = true;
updt(0,"r");
}
// -->
</script>
</HEAD>
EOF
echo "<BODY>"
echo -n "<table cellpadding=3 cellspacing=5 style='font-size: 150%;'><tr><th align=left>Backported</th>"
echo -n "<td style='background-color:$BG_N'><a href='#' onclick='show_only(1,1,0,0,0);'> N: <span id='cnt_bn'>0</span> </a></td>"
echo -n "<td style='background-color:$BG_U'><a href='#' onclick='show_only(1,0,1,0,0);'> U: <span id='cnt_bu'>0</span> </a></td>"
echo -n "<td style='background-color:$BG_W'><a href='#' onclick='show_only(1,0,0,1,0);'> W: <span id='cnt_bw'>0</span> </a></td>"
echo -n "<td style='background-color:$BG_Y'><a href='#' onclick='show_only(1,0,0,0,1);'> Y: <span id='cnt_by'>0</span> </a></td>"
echo -n "<td>total: <span id='cnt_bt'>0</span></td>"
echo "</tr><tr>"
echo -n "<th align=left>Not backported</th>"
echo -n "<td style='background-color:$BG_N'><a href='#' onclick='show_only(0,1,0,0,0);'> N: <span id='cnt_nbn'>0</span> </a></td>"
echo -n "<td style='background-color:$BG_U'><a href='#' onclick='show_only(0,0,1,0,0);'> U: <span id='cnt_nbu'>0</span> </a></td>"
echo -n "<td style='background-color:$BG_W'><a href='#' onclick='show_only(0,0,0,1,0);'> W: <span id='cnt_nbw'>0</span> </a></td>"
echo -n "<td style='background-color:$BG_Y'><a href='#' onclick='show_only(0,0,0,0,1);'> Y: <span id='cnt_nby'>0</span> </a></td>"
echo -n "<td>total: <span id='cnt_nbt'>0</span></td>"
echo "</tr></table><P/>"
echo -n "<big><big>Show:"
echo -n " <span style='background-color:$BG_B'><input type='checkbox' onclick='updt_table(0);' id='sh_b' checked />B (${#bkp[*]})</span> "
echo -n " <span style='background-color:$BG_N'><input type='checkbox' onclick='updt_table(0);' id='sh_n' checked />N (<span id='cnt_n'>0</span>)</span> "
echo -n " <span style='background-color:$BG_U'><input type='checkbox' onclick='updt_table(0);' id='sh_u' checked />U (<span id='cnt_u'>0</span>)</span> "
echo -n " <span style='background-color:$BG_W'><input type='checkbox' onclick='updt_table(0);' id='sh_w' checked />W (<span id='cnt_w'>0</span>)</span> "
echo -n " <span style='background-color:$BG_Y'><input type='checkbox' onclick='updt_table(0);' id='sh_y' checked />Y (<span id='cnt_y'>0</span>)</span> "
echo -n "</big/></big> (B=show backported, N=no/drop, U=uncertain, W=wait/next, Y=yes/pick"
echo -n "</big/></big><br/>(B=show backported, N=no/drop, U=uncertain, W=wait/next, Y=yes/pick"
echo ")<P/>"
echo "<TABLE COLS=5 BORDER=1 CELLSPACING=0 CELLPADDING=3>"
echo "<TR><TH>All<br/><input type='radio' name='review' onclick='updt(0,\"r\");' checked title='Start review here'/></TH><TH>CID</TH><TH>Subject</TH><TH>Verdict<BR>N U W Y</BR></TH><TH>Reason</TH></TR>"
echo "<TR><TH>All<br/><input type='radio' name='review' id='show_all' onclick='updt(0,\"r\");' checked title='Start review here'/></TH><TH>CID</TH><TH>Subject</TH><TH>Verdict<BR>N U W Y</BR></TH><TH>Reason</TH></TR>"
seq_num=1; do_check=1; review=0;
for patch in "${PATCHES[@]}"; do
# try to retrieve the patch's numbering (0001-9999)
@ -335,7 +380,7 @@ for patch in "${PATCHES[@]}"; do
resp=$(echo "$resp" | sed -e "s|#\([0-9]\{1,5\}\)|<a href='${ISSUES}\1'>#\1</a>|g")
# put links to commit IDs
resp=$(echo "$resp" | sed -e "s|\([0-9a-f]\{8,40\}\)|<a href='${GITURL}\1'>\1</a>|g")
resp=$(echo "$resp" | sed -e "s|\([0-9a-f]\{7,40\}\)|<a href='${GITURL}\1'>\1</a>|g")
echo -n "<TD nowrap align=center ${bkp[$cid]:+style='background-color:${BG_B}'}>$seq_num<BR/>"
echo -n "<input type='radio' name='review' onclick='updt($seq_num,\"r\");' ${do_check:+checked} title='Start review here'/></TD>"
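
The sed expression changed above now links runs of 7 to 40 lowercase hexadecimal characters instead of 8 to 40. The rule can be illustrated with a small C check. The function name looks_like_commit_id is hypothetical, and unlike the unanchored sed pattern, which also matches inside longer words, this sketch tests a whole token.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if <tok> is made only of characters in [0-9a-f] and is between
 * 7 and 40 characters long, otherwise 0. This mirrors the character class
 * and the length bounds of the sed pattern, applied to a full token.
 */
int looks_like_commit_id(const char *tok)
{
	size_t len = 0;

	assert(tok != NULL);
	for (; tok[len]; len++) {
		char c = tok[len];

		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
	}
	return len >= 7 && len <= 40;
}
```

A 7-character abbreviation such as the one Git prints by default is accepted with the new bound, whereas it was left unlinked with the previous minimum of 8.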

View File

@ -22,7 +22,8 @@ STABLE=$(cd "$HAPROXY_DIR" && git describe --tags "v${BRANCH}-dev0^" |cut -f1,2
PATCHES_DIR="$PATCHES_PFX"-"$BRANCH"
(cd "$HAPROXY_DIR"
git pull
# avoid git pull, it chokes on forced push
git remote update origin; git reset origin/master;git checkout -f
last_file=$(ls -1 "$PATCHES_DIR"/*.patch 2>/dev/null | tail -n1)
if [ -n "$last_file" ]; then
restart=$(head -n1 "$last_file" | cut -f2 -d' ')

View File

@ -17,9 +17,9 @@
//const int codes[CODES] = { 200,400,401,403,404,405,407,408,410,413,421,422,425,429,500,501,502,503,504};
#define CODES 32
const int codes[CODES] = { 200,400,401,403,404,405,407,408,410,413,421,422,425,429,500,501,502,503,504,
const int codes[CODES] = { 200,400,401,403,404,405,407,408,410,413,414,421,422,425,429,431,500,501,502,503,504,
/* padding entries below, which will fall back to the default code */
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1};
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1};
unsigned mul, xor;
unsigned bmul = 0, bxor = 0;

View File

@ -0,0 +1,233 @@
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <haproxy/connection-t.h>
#include <haproxy/intops.h>
struct tevt_info {
const char *loc;
const char **types;
};
/* will be sufficient for even largest flag names */
static char buf[4096];
static size_t bsz = sizeof(buf);
static const char *tevt_unknown_types[16] = {
[ 0] = "-", [ 1] = "-", [ 2] = "-", [ 3] = "-",
[ 4] = "-", [ 5] = "-", [ 6] = "-", [ 7] = "-",
[ 8] = "-", [ 9] = "-", [10] = "-", [11] = "-",
[12] = "-", [13] = "-", [14] = "-", [15] = "-",
};
static const char *tevt_fd_types[16] = {
[ 0] = "-", [ 1] = "shutw", [ 2] = "shutr", [ 3] = "rcv_err",
[ 4] = "snd_err", [ 5] = "-", [ 6] = "-", [ 7] = "conn_err",
[ 8] = "intercepted", [ 9] = "conn_poll_err", [10] = "poll_err", [11] = "poll_hup",
[12] = "-", [13] = "-", [14] = "-", [15] = "-",
};
static const char *tevt_hs_types[16] = {
[ 0] = "-", [ 1] = "-", [ 2] = "-", [ 3] = "rcv_err",
[ 4] = "snd_err", [ 5] = "-", [ 6] = "-", [ 7] = "-",
[ 8] = "-", [ 9] = "-", [10] = "-", [11] = "-",
[12] = "-", [13] = "-", [14] = "-", [15] = "-",
};
static const char *tevt_xprt_types[16] = {
[ 0] = "-", [ 1] = "shutw", [ 2] = "shutr", [ 3] = "rcv_err",
[ 4] = "snd_err", [ 5] = "-", [ 6] = "-", [ 7] = "-",
[ 8] = "-", [ 9] = "-", [10] = "-", [11] = "-",
[12] = "-", [13] = "-", [14] = "-", [15] = "-",
};
static const char *tevt_muxc_types[16] = {
[ 0] = "-", [ 1] = "shutw", [ 2] = "shutr", [ 3] = "rcv_err",
[ 4] = "snd_err", [ 5] = "truncated_shutr", [ 6] = "truncated_rcv_err", [ 7] = "tout",
[ 8] = "goaway_rcvd", [ 9] = "proto_err", [10] = "internal_err", [11] = "other_err",
[12] = "graceful_shut", [13] = "-", [14] = "-", [15] = "-",
};
static const char *tevt_se_types[16] = {
[ 0] = "-", [ 1] = "shutw", [ 2] = "eos", [ 3] = "rcv_err",
[ 4] = "snd_err", [ 5] = "truncated_eos", [ 6] = "truncated_rcv_err", [ 7] = "-",
[ 8] = "rst_rcvd", [ 9] = "proto_err", [10] = "internal_err", [11] = "other_err",
[12] = "cancelled", [13] = "-", [14] = "-", [15] = "-",
};
static const char *tevt_strm_types[16] = {
[ 0] = "-", [ 1] = "shutw", [ 2] = "eos", [ 3] = "rcv_err",
[ 4] = "snd_err", [ 5] = "truncated_eos", [ 6] = "truncated_rcv_err", [ 7] = "tout",
[ 8] = "intercepted", [ 9] = "proto_err", [10] = "internal_err", [11] = "other_err",
[12] = "aborted", [13] = "-", [14] = "-", [15] = "-",
};
static const struct tevt_info tevt_location[26] = {
[ 0] = {.loc = "-", .types = tevt_unknown_types}, [ 1] = {.loc = "-", .types = tevt_unknown_types},
[ 2] = {.loc = "-", .types = tevt_unknown_types}, [ 3] = {.loc = "-", .types = tevt_unknown_types},
[ 4] = {.loc = "se", .types = tevt_se_types}, [ 5] = {.loc = "fd", .types = tevt_fd_types},
[ 6] = {.loc = "-", .types = tevt_unknown_types}, [ 7] = {.loc = "hs", .types = tevt_hs_types},
[ 8] = {.loc = "-", .types = tevt_unknown_types}, [ 9] = {.loc = "-", .types = tevt_unknown_types},
[10] = {.loc = "-", .types = tevt_unknown_types}, [11] = {.loc = "-", .types = tevt_unknown_types},
[12] = {.loc = "muxc", .types = tevt_muxc_types}, [13] = {.loc = "-", .types = tevt_unknown_types},
[14] = {.loc = "-", .types = tevt_unknown_types}, [15] = {.loc = "-", .types = tevt_unknown_types},
[16] = {.loc = "-", .types = tevt_unknown_types}, [17] = {.loc = "-", .types = tevt_unknown_types},
[18] = {.loc = "strm", .types = tevt_strm_types}, [19] = {.loc = "-", .types = tevt_unknown_types},
[20] = {.loc = "-", .types = tevt_unknown_types}, [21] = {.loc = "-", .types = tevt_unknown_types},
[22] = {.loc = "-", .types = tevt_unknown_types}, [23] = {.loc = "xprt", .types = tevt_xprt_types},
[24] = {.loc = "-", .types = tevt_unknown_types}, [25] = {.loc = "-", .types = tevt_unknown_types},
};
void usage_exit(const char *name)
{
fprintf(stderr, "Usage: %s { value* | - }\n", name);
exit(1);
}
char *to_upper(char *dst, const char *src)
{
int i;
for (i = 0; src[i]; i++)
dst[i] = toupper(src[i]);
dst[i] = 0;
return dst;
}
char *tevt_show_events(char *buf, size_t len, const char *delim, const char *value)
{
char loc[5];
int ret;
if (!value || !*value) {
snprintf(buf, len, "##NONE");
goto end;
}
if (strcmp(value, "-") == 0) {
snprintf(buf, len, "##UNK");
goto end;
}
if (strlen(value) % 2 != 0) {
snprintf(buf, len, "##INV");
goto end;
}
while (*value) {
struct tevt_info info;
char l = value[0];
char t = value[1];
if (!isalpha(l) || !isxdigit(t)) {
snprintf(buf, len, "##INV");
goto end;
}
info = tevt_location[tolower(l) - 'a'];
ret = snprintf(buf, len, "%s:%s%s",
isupper(l) ? to_upper(loc, info.loc) : info.loc,
info.types[hex2i(t)],
value[2] != 0 ? delim : "");
if (ret < 0)
break;
len -= ret;
buf += ret;
value += 2;
}
end:
return buf;
}
char *tevt_show_tuple_events(char *buf, size_t len, char *value)
{
char *p = value;
/* skip '{' */
p++;
while (*p) {
char *v;
char c;
while (*p == ' ' || *p == '\t')
p++;
v = p;
while (*p && *p != ',' && *p != '}')
p++;
c = *p;
*p = 0;
tevt_show_events(buf, len, " > ", v);
printf("\t- %s\n", buf);
*p = c;
if (*p == ',')
p++;
else if (*p == '}')
break;
else {
printf("\t- ##INV\n");
break;
}
}
*buf = 0;
return buf;
}
int main(int argc, char **argv)
{
const char *name = argv[0];
char line[128];
char *value;
int multi = 0;
int use_stdin = 0;
char *err;
if (argc == 1)
usage_exit(name);
argv++; argc--;
if (argc > 1)
multi = 1;
if (strcmp(argv[0], "-") == 0)
use_stdin = 1;
while (argc > 0) {
if (use_stdin) {
value = fgets(line, sizeof(line), stdin);
if (!value)
break;
/* skip common leading delimiters that slip from copy-paste */
while (*value == ' ' || *value == '\t' || *value == ':' || *value == '=')
value++;
err = value;
while (*err && *err != '\n')
err++;
*err = 0;
}
else {
value = argv[0];
argv++; argc--;
}
if (multi)
printf("### %-8s : ", value);
if (*value == '{') {
if (!use_stdin)
printf("\n");
tevt_show_tuple_events(buf, bsz, value);
}
else
tevt_show_events(buf, bsz, " > ", value);
printf("%s\n", buf);
}
return 0;
}

View File

@ -3,7 +3,9 @@ DeviceAtlas Device Detection
In order to add DeviceAtlas Device Detection support, you would need to download
the API source code from https://deviceatlas.com/deviceatlas-haproxy-module.
Once extracted :
Once extracted, two modes are supported :
1/ Build HAProxy and DeviceAtlas in one command
$ make TARGET=<target> USE_DEVICEATLAS=1 DEVICEATLAS_SRC=<path to the API root folder>
@ -14,10 +16,6 @@ directory. Also, in the case the api cache support is not needed and/or a C++ to
$ make TARGET=<target> USE_DEVICEATLAS=1 DEVICEATLAS_SRC=<path to the API root folder> DEVICEATLAS_NOCACHE=1
However, if the API had been installed beforehand, DEVICEATLAS_SRC
can be omitted. Note that the DeviceAtlas C API version supported is from the 3.x
releases series (3.2.1 minimum recommended).
For HAProxy developers who need to verify that their changes didn't accidentally
break the DeviceAtlas code, it is possible to build a dummy library provided in
the addons/deviceatlas/dummy directory and to use it as an alternative for the
@ -27,6 +25,29 @@ validate API changes :
$ make TARGET=<target> USE_DEVICEATLAS=1 DEVICEATLAS_SRC=$PWD/addons/deviceatlas/dummy
2/ Build and install DeviceAtlas according to https://docs.deviceatlas.com/apis/enterprise/c/<release version>/README.html
For example :
In the deviceatlas library folder :
$ cmake .
$ make
$ sudo make install
In the HAProxy folder :
$ make TARGET=<target> USE_DEVICEATLAS=1
Note that if the -DCMAKE_INSTALL_PREFIX cmake option was used, DEVICEATLAS_LIB and
DEVICEATLAS_INC must be set as well, as follows :
$ make TARGET=<target> USE_DEVICEATLAS=1 DEVICEATLAS_INC=<CMAKE_INSTALL_PREFIX value>/include DEVICEATLAS_LIB=<CMAKE_INSTALL_PREFIX value>/lib
For example :
$ cmake -DCMAKE_INSTALL_PREFIX=/opt/local
$ make
$ sudo make install
$ make TARGET=<target> USE_DEVICEATLAS=1 DEVICEATLAS_INC=/opt/local/include DEVICEATLAS_LIB=/opt/local/lib
Note that DEVICEATLAS_SRC is omitted in this case.
These are supported DeviceAtlas directives (see doc/configuration.txt) :
- deviceatlas-json-file <path to the DeviceAtlas JSON data file>.
- deviceatlas-log-level <number> (0 to 3, level of information returned by

Binary file not shown (size: 15 KiB).

View File

@ -1,16 +1,12 @@
-----------------------------------------------
Stream Processing Offload Engine (SPOE)
Version 1.2
( Last update: 2020-06-13 )
( Last update: 2024-07-12 )
-----------------------------------------------
Author : Christopher Faulet
Contact : cfaulet at haproxy dot com
WARNING: The SPOE is now deprecated and will be removed in future version.
SUMMARY
--------
@ -73,13 +69,10 @@ systems (often at least the connect() is blocking). So, it is hard to properly
implement Single Sign On solution (SSO) in HAProxy. The SPOE will ease this
kind of processing, or we hope so.
Now, the aim of SPOE is to allow any kind of offloading on the streams. First
releases won't do lot of things. As we will see, there are few handled events
and even less actions supported. Actually, for now, the SPOE can offload the
processing before "tcp-request content", "tcp-response content", "http-request"
and "http-response" rules. And it only supports variables definition. But, in
spite of these limited features, we can easily imagine to implement SSO
solution, ip reputation or ip geolocation services.
The aim of SPOE is to allow any kind of offloading on the streams. It can
offload the processing before "tcp-request content", "tcp-response content",
"http-request" and "http-response" rules. It is also possible to offload the
processing via a TCP/HTTP rule.
Some example implementations in various languages are linked to from the
HAProxy Wiki page dedicated to this mechanism:
@ -89,8 +82,8 @@ HAProxy Wiki page dedicated to this mechanism:
2. SPOE configuration
----------------------
Because SPOE is implemented as a filter, To use it, you must declare a "filter
spoe" line in a proxy section (frontend/backend/listen) :
Because SPOE is implemented as a filter, a "filter spoe" line must be declared
in a proxy section (frontend/backend/listen) to use it :
frontend my-front
...
@ -103,9 +96,10 @@ the SPOE configuration. So it is possible to use the same SPOE configuration
for several engines. If no name is provided, the SPOE configuration must not
contain any scope directive.
We use a separate configuration file on purpose. By commenting SPOE filter
line, you completely disable the feature, including the parsing of sections
reserved to SPOE. This is also a way to keep the HAProxy configuration clean.
Using a separate configuration file makes it possible to completely disable an
engine, including the parsing of the sections reserved to SPOE, simply by
commenting out the SPOE filter line. This is also a way to keep the HAProxy
configuration clean.
A SPOE configuration file must contain, at least, the SPOA configuration
("spoe-agent" section) and SPOE messages/groups ("spoe-message" or "spoe-group"
@ -118,12 +112,13 @@ file.
2.1. SPOE scope
-------------------------
If you specify an engine name on the SPOE filter line, then you need to define
scope in the SPOE configuration with the same name. You can have several SPOE
scope in the same file. In each scope, you must define one and only one
"spoe-agent" section to configure the SPOA linked to your SPOE and several
"spoe-message" and "spoe-group" sections to describe, respectively, messages and
group of messages sent to servers managed by your SPOA.
If an engine name is specified on the SPOE filter line, then a scope with the
same name must be defined in the SPOE configuration. It is possible to have
several SPOE scopes in the same file. Each scope must define exactly one
"spoe-agent" section, to configure the SPOA linked to the engine, and may
define several "spoe-message" and "spoe-group" sections to describe,
respectively, the messages and groups of messages sent to servers managed by
the SPOA.
A SPOE scope starts with this kind of line :
@ -152,15 +147,15 @@ If no engine name is provided on the SPOE filter line, no SPOE scope must be
found in the SPOE configuration file. All the file is considered to be in the
same anonymous and implicit scope.
The engine name must be uniq for a proxy. If no engine name is provided on the
SPOE filter line, the SPOE agent name is used by default.
The engine name must be unique for a proxy. If no engine name is provided on
the SPOE filter line, the SPOE agent name is used by default.
2.2. "spoe-agent" section
--------------------------
For each engine, you must define one and only one "spoe-agent" section. In this
section, you will declare SPOE messages and the backend you will use. You will
also set timeouts and options to customize your agent's behaviour.
For each engine, exactly one "spoe-agent" section must be defined. This section
declares the enabled SPOE messages and all the parameters (timeouts, options,
...) used to customize the agent behavior.
spoe-agent <name>
@ -173,15 +168,10 @@ spoe-agent <name>
following keywords are supported :
- groups
- log
- maxconnrate
- maxerrrate
- max-frame-size
- max-waiting-frames
- messages
- [no] option async
- [no] option dontlog-normal
- [no] option pipelining
- [no] option send-frag-payload
- option continue-on-error
- option force-set-var
- option set-on-error
@ -189,9 +179,16 @@ spoe-agent <name>
- option set-total-time
- option var-prefix
- register-var-names
- timeout hello|idle|processing
- timeout processing
- use-backend
following keywords are deprecated and ignored:
- maxconnrate
- maxerrrate
- max-waiting-frames
- [no] option async
- [no] option send-frag-payload
- timeout hello|idle
groups <grp-name> ...
Declare the list of SPOE groups that an agent will handle.
@ -200,11 +197,11 @@ groups <grp-name> ...
<grp-name> is the name of a SPOE group.
Groups declared here must be found in the same engine scope, else an error is
triggered during the configuration parsing. You can have many "groups" lines.
triggered during the configuration parsing. Several "groups" lines can be
defined.
See also: "spoe-group" section.
log global
log <address> [len <length>] [format <format>] <facility> [<level> [<minlevel>]]
no log
@ -215,28 +212,35 @@ no log
See the HAProxy Configuration Manual for details about this option.
maxconnrate <number>
maxconnrate <number> [DEPRECATED]
Set the maximum number of connections per second to <number>. The SPOE will
stop to open new connections if the maximum is reached and will wait to
acquire an existing one. So it is important to set "timeout hello" to a
relatively small value.
This parameter is now deprecated and ignored. It will be removed in future
versions.
maxerrrate <number>
maxerrrate <number> [DEPRECATED]
Set the maximum number of errors per second to <number>. The SPOE will stop
its processing if the maximum is reached.
This parameter is now deprecated and ignored. It will be removed in future
versions.
max-frame-size <number>
Set the maximum allowed size for frames exchanged between HAProxy and SPOA.
It must be in the range [256, tune.bufsize-4] (4 bytes are reserved for the
frame length). By default, it is set to (tune.bufsize-4).
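The range check described for "max-frame-size" can be sketched in C as follows.
This is only an illustration under stated assumptions: the function and macro
names are hypothetical and are not taken from the HAProxy sources; only the
bounds (256 and tune.bufsize-4) come from the text above.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
#define SPOE_MIN_FRAME_SIZE  256u  /* documented lower bound */
#define SPOE_FRAME_LEN_BYTES 4u    /* bytes reserved for the frame length */

/* Default value: the buffer size minus the 4 bytes of the frame length.
 * The caller is assumed to pass a bufsize of at least 260 bytes. */
unsigned int spoe_default_max_frame_size(unsigned int bufsize)
{
    return bufsize - SPOE_FRAME_LEN_BYTES;
}

/* Returns true if <size> lies in the range [256, bufsize - 4]. */
bool spoe_frame_size_is_valid(unsigned int size, unsigned int bufsize)
{
    /* A buffer too small to hold even a minimal frame has no valid size. */
    if (bufsize < SPOE_MIN_FRAME_SIZE + SPOE_FRAME_LEN_BYTES)
        return false;
    return size >= SPOE_MIN_FRAME_SIZE &&
           size <= bufsize - SPOE_FRAME_LEN_BYTES;
}
```

For instance, with a buffer size of 16384 bytes, the accepted values would go
from 256 to 16380, the latter also being the default.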
max-waiting-frames <number>
max-waiting-frames <number> [DEPRECATED]
Set the maximum number of frames waiting for an acknowledgement on the same
connection. This value is only used when the pipelined or asynchronous
exchanges between HAProxy and SPOA are enabled. By default, it is set to 20.
This parameter is now deprecated and ignored. It will be removed in future
versions.
messages <msg-name> ...
Declare the list of SPOE messages that an agent will handle.
@ -244,23 +248,24 @@ messages <msg-name> ...
<msg-name> is the name of a SPOE message.
Messages declared here must be found in the same engine scope, else an error
is triggered during the configuration parsing. You can have many "messages"
lines.
is triggered during the configuration parsing. Several "messages" lines can
be defined.
See also: "spoe-message" section.
option async
option async [DEPRECATED]
no option async
Enable or disable the support of asynchronous exchanges between HAProxy and
SPOA. By default, this option is enabled.
This parameter is now deprecated and ignored. It will be removed in future
versions.
option continue-on-error
Do not stop the events processing when an error occurred on a stream.
By default, for a specific stream, when an abnormal/unexpected error occurs,
the SPOE is disabled for all the transaction. So if you have several events
the SPOE is disabled for the whole transaction. If several events are
configured, such an error on an event will disable all the following ones. For TCP
streams, this will disable the SPOE for the whole session. For HTTP streams,
this will disable it for the transaction (request and response).
@ -268,7 +273,6 @@ option continue-on-error
When set, this option bypasses this behaviour and only the current event will
be ignored.
option dontlog-normal
no option dontlog-normal
Enable or disable logging of normal, successful processing.
@ -277,29 +281,27 @@ no option dontlog-normal
See also: "log" and section 4 about logging.
option force-set-var
By default, the SPOE filter only registers already known variables (mainly from
parsing of the configuration), and process-wide variables (those of scope
"proc") cannot be created. If you want that haproxy trusts the agent and
registers all variables (ex: can be useful for LUA workload), activate this
option.
"proc") cannot be created. If HAProxy trusts the agent and registers all
variables (e.g. this can be useful for Lua workloads), this option can be set.
Caution : this option opens to a variety of attacks such as a rogue SPOA that
asks to register too many variables.
option pipelining
no option pipelining
Enable or disable the support of pipelined exchanges between HAProxy and
SPOA. By default, this option is enabled.
option send-frag-payload
option send-frag-payload [DEPRECATED]
no option send-frag-payload
Enable or disable the sending of fragmented payload to SPOA. By default, this
option is enabled.
This parameter is now deprecated and ignored. It will be removed in future
versions.
option set-on-error <var name>
Define the variable to set when an error occurred during an event processing.
@ -311,13 +313,13 @@ option set-on-error <var name>
This variable will only be set when an error occurred in the scope of the
transaction. As for all other variables defined by the SPOE, it will be
prefixed. So, if your variable name is "error" and your prefix is
prefixed. So, if the variable name is "error" and the prefix is
"my_spoe_pfx", the variable will be "txn.my_spoe_pfx.error".
When set, the variable is an integer representing the error reason. For values
under 256, it represents an error coming from the engine. From 256 upward, it
reports a SPOP error. In this case, to retrieve the right SPOP status code,
you must remove 256 to this value. Here are possible values:
256 must be subtracted from this value. Here are possible values:
* 1 a timeout occurred during the event processing.
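As an illustration of how the value stored by "option set-on-error" may be
interpreted, here is a minimal C sketch. It assumes that values under 256 are
engine errors and that any other value is a SPOP status code shifted by 256.
The structure and function names are hypothetical and do not exist in HAProxy.

```c
#include <stdbool.h>

/* Hypothetical offset name: SPOP status codes are reported shifted by 256. */
#define SPOE_ERR_SPOP_OFFSET 256

struct spoe_error {
    bool from_spop; /* true: SPOP status code, false: engine error */
    int  code;      /* error code once the offset has been removed */
};

/* Splits the integer found in the variable into its origin and its code. */
struct spoe_error spoe_decode_error(int value)
{
    struct spoe_error err;

    err.from_spop = (value >= SPOE_ERR_SPOP_OFFSET);
    err.code = err.from_spop ? value - SPOE_ERR_SPOP_OFFSET : value;
    return err;
}
```

With this sketch, the value 1 is decoded as the engine error 1 (a timeout
during the event processing), while 258 is decoded as the SPOP status code 2.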
@ -351,8 +353,8 @@ option set-process-time <var name>
contain characters 'a-z', 'A-Z', '0-9', '.' and '_'.
This variable will be set in the scope of the transaction. As for all other
variables define by the SPOE, it will be prefixed. So, if your variable name
is "process_time" and your prefix is "my_spoe_pfx", the variable will be
variables defined by the SPOE, it will be prefixed. So, if the variable name
is "process_time" and the prefix is "my_spoe_pfx", the variable will be
"txn.my_spoe_pfx.process_time".
When set, the variable is an integer representing the delay to process the
@ -360,11 +362,10 @@ option set-process-time <var name>
latency added by the SPOE processing for the last handled event or group.
If several events or groups are processed for the same stream, this value
will be overrideen.
will be overridden.
See also: "option set-total-time".
option set-total-time <var name>
Define the variable to set to report the total SPOE processing time for a
stream.
@ -375,8 +376,8 @@ option set-total-time <var name>
contain characters 'a-z', 'A-Z', '0-9', '.' and '_'.
This variable will be set in the scope of the transaction. As for all other
variables define by the SPOE, it will be prefixed. So, if your variable name
is "total_time" and your prefix is "my_spoe_pfx", the variable will be
variables defined by the SPOE, it will be prefixed. So, if the variable name
is "total_time" and the prefix is "my_spoe_pfx", the variable will be
"txn.my_spoe_pfx.total_time".
When set, the variable is an integer representing the sum of processing times
@ -388,7 +389,6 @@ option set-total-time <var name>
See also: "option set-process-time".
option var-prefix <prefix>
Define the prefix used when variables are set by an agent.
@ -403,19 +403,19 @@ option var-prefix <prefix>
The prefix will be added between the variable scope and its name, separated
by a '.'. It may only contain characters 'a-z', 'A-Z', '0-9', '.' and '_', as
for variables name. In HAProxy configuration, you need to use this prefix as
a part of the variables name. For example, if an agent define the variable
"myvar" in the "txn" scope, with the prefix "my_spoe_pfx", then you should
use "txn.my_spoe_pfx.myvar" name in your HAProxy configuration.
for variable names. In the HAProxy configuration, this prefix must be used as
part of the variable name. For example, if an agent defines the variable
"myvar" in the "txn" scope, with the prefix "my_spoe_pfx", then the name
"txn.my_spoe_pfx.myvar" must be used in the HAProxy configuration.
By default, an agent will never set new variables at runtime: It can only set
new value for existing ones. If you want a different behaviour, see
force-set-var option and register-var-names directive.
new value for existing ones. To change this behaviour, see "force-set-var"
option and "register-var-names" directive.
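The naming rule described for "option var-prefix" (scope, then prefix, then
name, separated by dots) can be sketched in C as follows. The helper names are
hypothetical and only illustrate the rule; rejecting an empty string is an
assumption of this sketch, not something stated in this document.

```c
#include <stdbool.h>
#include <stdio.h>

/* Returns true if <s> only contains 'a-z', 'A-Z', '0-9', '.' and '_'.
 * An empty string is rejected (assumption made for this sketch). */
bool spoe_name_is_valid(const char *s)
{
    if (!*s)
        return false;
    for (; *s; s++) {
        char c = *s;

        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
              (c >= '0' && c <= '9') || c == '.' || c == '_'))
            return false;
    }
    return true;
}

/* Writes "<scope>.<prefix>.<name>" into <dst>. Returns the length of the
 * resulting name, or -1 if it does not fit into <size> bytes. */
int spoe_build_var_name(char *dst, size_t size, const char *scope,
                        const char *prefix, const char *name)
{
    int ret = snprintf(dst, size, "%s.%s.%s", scope, prefix, name);

    return (ret < 0 || (size_t)ret >= size) ? -1 : ret;
}
```

Calling spoe_build_var_name() with "txn", "my_spoe_pfx" and "myvar" produces
"txn.my_spoe_pfx.myvar", which is the name to use in the HAProxy configuration.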
register-var-names <var name> ...
Register some variable names. By default, an agent will not be allowed to set
new variables at runtime. This rule can be totally relaxed by setting the
option "force-set-var". If you know all the variables you will need, this
option "force-set-var". If all the required variables are known, this
directive is a good way to register them without letting an agent do whatever
it wants. This is only required if these variables are not referenced anywhere
in the HAProxy configuration or the SPOE one.
@ -424,12 +424,12 @@ register-var-names <var name> ...
<var name> is a variable name without the scope. The name may only
contain characters 'a-z', 'A-Z', '0-9', '.' and '_'.
The prefix will be automatically added during the registration. You can have
many "register-var-names" lines.
The prefix will be automatically added during the registration. Several
"register-var-names" lines can be used.
See also: "option force-set-var", "option var-prefix".
timeout hello <timeout>
timeout hello <timeout> [DEPRECATED]
Set the maximum time to wait for an agent to receive the AGENT-HELLO frame.
It is applied on the stream that handle the connection with the agent.
@ -441,8 +441,10 @@ timeout hello <timeout>
This timeout is an applicative timeout. It differ from "timeout connect"
defined on backends.
This parameter is now deprecated and ignored. It will be removed in future
versions.
timeout idle <timeout>
timeout idle <timeout> [DEPRECATED]
Set the maximum time to wait for an agent to close an idle connection. It is
applied on the stream that handle the connection with the agent.
@ -451,6 +453,8 @@ timeout idle <timeout>
can be in any other unit if the number is suffixed by the unit,
as explained at the top of this document.
This parameter is now deprecated and ignored. It will be removed in future
versions.
timeout processing <timeout>
Set the maximum time to wait for a stream to process an event, i.e to acquire
@ -486,21 +490,19 @@ spoe-message <name>
Arguments :
<name> is the name of the SPOE message.
Here you define a message that can be referenced in a "spoe-agent"
section. Following keywords are supported :
Here a message that can be referenced in a "spoe-agent" section is
defined. Following keywords are supported :
- acl
- args
- event
See also: "spoe-agent" section.
acl <aclname> <criterion> [flags] [operator] <value> ...
Declare or complete an access list.
See section 7 about ACL usage in the HAProxy Configuration Manual.
args [name=]<sample> ...
Define arguments passed into the SPOE message.
@ -514,7 +516,6 @@ args [name=]<sample> ...
For example:
args frontend=fe_id src dst
event <name> [ { if | unless } <condition> ]
Set the event that triggers sending of the message. It may optionally be
followed by an ACL-based condition, in which case it will only be evaluated
@ -556,13 +557,12 @@ spoe-group <name>
Arguments :
<name> is the name of the SPOE group.
Here you define a group of SPOE messages that can be referenced in a
Here a group of SPOE messages is defined. It can be referenced in a
"spoe-agent" section. Following keywords are supported :
- messages
- messages
See also: "spoe-agent" and "spoe-message" sections.
messages <msg-name> ...
Declare the list of SPOE messages belonging to the group.
@ -571,7 +571,7 @@ messages <msg-name> ...
Messages declared here must be found in the same engine scope, else an error
is triggered during the configuration parsing. Furthermore, a message belongs
at most to a group. You can have many "messages" lines.
to at most one group. Several "messages" lines can be defined.
See also: "spoe-message" section.
@ -602,7 +602,7 @@ and 0 a blacklisted IP with no doubt).
server http A.B.C.D:80
backend iprep-servers
mode tcp
mode spop
balance roundrobin
timeout connect 5s # greater than hello timeout
@ -620,8 +620,6 @@ and 0 a blacklisted IP with no doubt).
option var-prefix iprep
timeout hello 2s
timeout idle 2m
timeout processing 10ms
use-backend iprep-servers
@ -718,62 +716,37 @@ actions.
+---+---+----------+
FIN: Indicates that this is the final payload fragment. The first fragment
may also be the final fragment.
may also be the final fragment. The payload fragmentation was removed
and is now deprecated. It means the FIN flag must be set on all
frames.
ABORT: Indicates that the processing of the current frame must be
cancelled. This bit should be set on frames with a fragmented
payload. It can be ignore for frames with an unfragemnted
payload. When it is set, the FIN bit must also be set.
cancelled.
Frames cannot exceed a maximum size negotiated between HAProxy and agents
during the HELLO handshake. Most of time, payload will be small enough to send
it in one frame. But when supported by the peer, it will be possible to
fragment huge payload on many frames. This ability is announced during the
HELLO handshake and it can be asynmetric (supported by agents but not by
HAProxy or the opposite). The following rules apply to fragmentation:
* An unfragemnted payload consists of a single frame with the FIN bit set.
* A fragemented payload consists of several frames with the FIN bit clear and
terminated by a single frame with the FIN bit set. All these frames must
share the same STREAM-ID and FRAME-ID. The first frame must set the right
FRAME-TYPE (e.g, NOTIFY). The following frames must have an unset type (0).
Beside the support of fragmented payload by a peer, some payload must not be
fragmented. See below for details.
it in one frame.
IMPORTANT : The maximum size supported by peers for a frame must be greater
than or equal to 256 bytes.
than or equal to 256 bytes. A good common value is the HAProxy
buffer size minus 4 bytes, reserved for the frame length
(tune.bufsize - 4). It is the default value announced by HAProxy.
3.2.1. Frame capabilities
--------------------------
Here is the list of official capabilities that HAProxy and agents can support:
* fragmentation: This is the ability for a peer to support fragmented
payload in received frames. This is an asymmectical
capability, it only concerns the peer that announces
it. This is the responsibility to the other peer to use it
or not.
* pipelining: This is the ability for a peer to decouple NOTIFY and ACK
frames. This is a symmetrical capability. To be used, it must
be supported by HAProxy and agents. Unlike HTTP pipelining, the
ACK frames can be sent in any order, but always on the same TCP
connection used for the corresponding NOTIFY frame.
* async: This ability is similar to the pipelining, but here any TCP
connection established between HAProxy and the agent can be used to
send ACK frames. if an agent accepts connections from multiple
HAProxy, it can use the "engine-id" value to group TCP
connections. See details about HAPROXY-HELLO frame.
Unsupported or unknown capabilities are silently ignored, when possible.
NOTE: HAProxy does not support the fragmentation for now. This means it is not
able to handle fragmented frames. However, if an agent announces the
fragmentation support, HAProxy may choose to send fragemented frames.
NOTE: Fragmentation and async capabilities were deprecated and are now ignored.
3.2.2. Frame types overview
----------------------------
@ -782,9 +755,6 @@ Here are types of frame supported by SPOE. Frames sent by HAProxy come first,
then frames sent by agents :
TYPE | ID | DESCRIPTION
-----------------------------+-----+-------------------------------------
UNSET | 0 | Used for all frames but the first when a
| | payload is fragmented.
-----------------------------+-----+-------------------------------------
HAPROXY-HELLO | 1 | Sent by HAProxy when it opens a
| | connection on an agent.
@ -805,7 +775,8 @@ then frames sent by agents :
ACK | 103 | Sent to acknowledge a NOTIFY frame
-----------------------------+-----+-------------------------------------
Unknown frames may be silently skipped.
Unknown frames may be silently skipped or trigger an error, depending on the
implementation.
3.2.3. Workflow
----------------
@ -869,37 +840,6 @@ Unknown frames may be silently skipped.
| <-------------------------- |
| |
* Notify / Ack exchange (fragmented payload):
HAPROXY AGENT SRV
| NOTIFY (frag 1) |
| --------------------------> |
| |
| UNSET (frag 2) |
| --------------------------> |
| ... |
| UNSET (frag N) |
| --------------------------> |
| |
| ACK |
| <-------------------------- |
| |
* Aborted fragmentation of a NOTIFY frame:
HAPROXY AGENT SRV
| ... |
| UNSET (frag X) |
| --------------------------> |
| |
| ACK/ABORT |
| <-------------------------- |
| |
| UNSET (frag X+1) |
| -----------X |
| |
| |
* Connection closed by haproxy:
HAPROXY AGENT SRV
@ -921,8 +861,8 @@ Unknown frames may be silently skipped.
----------------------------
This frame is the first one exchanged between HAProxy and an agent, when the
connection is established. The payload of this frame is a KV-LIST. It cannot be
fragmented. STREAM-ID and FRAME-ID are must be set 0.
connection is established. The payload of this frame is a KV-LIST. STREAM-ID
and FRAME-ID must be set to 0.
Following items are mandatory in the KV-LIST:
@ -967,7 +907,7 @@ AGENT-DISCONNECT frame must be returned.
This frame is sent in reply to a HAPROXY-HELLO frame to finish a HELLO
handshake. As for HAPROXY-HELLO frame, STREAM-ID and FRAME-ID are also set to
0. The payload of this frame is a KV-LIST and it cannot be fragmented.
0. The payload of this frame is a KV-LIST.
Following items are mandatory in the KV-LIST:
@ -1001,8 +941,7 @@ will close the connection at the end of the health check.
Information is sent to the agents inside NOTIFY frames. These frames are
attached to a stream, so STREAM-ID and FRAME-ID must be set. The payload of
NOTIFY frames is a LIST-OF-MESSAGES and, if supported by agents, it can be
fragmented.
NOTIFY frames is a LIST-OF-MESSAGES.
NOTIFY frames must be acknowledged by agents sending an ACK frame, repeating
the right STREAM-ID and FRAME-ID.
@ -1012,8 +951,7 @@ right STREAM-ID and FRAME-ID.
ACK frames must be sent by agents to reply to NOTIFY frames. STREAM-ID and
FRAME-ID found in a NOTIFY frame must be reused in the corresponding ACK
frame. The payload of ACK frames is a LIST-OF-ACTIONS and, if supported by
HAProxy, it can be fragmented.
frame. The payload of ACK frames is a LIST-OF-ACTIONS.
3.2.8. Frame: HAPROXY-DISCONNECT
---------------------------------
@ -1023,8 +961,8 @@ frame is sent with information describing the error. HAProxy will wait an
AGENT-DISCONNECT frame in reply. All other frames will be ignored. The agent
must then close the socket.
The payload of this frame is a KV-LIST. It cannot be fragmented. STREAM-ID and
FRAME-ID are must be set 0.
The payload of this frame is a KV-LIST. STREAM-ID and FRAME-ID must be set to
0.
Following items are mandatory in the KV-LIST:
@ -1046,8 +984,8 @@ is sent, with information describing the error. such frame is also sent in reply
to a HAPROXY-DISCONNECT. The agent must close the socket just after sending
this frame.
The payload of this frame is a KV-LIST. It cannot be fragmented. STREAM-ID and
FRAME-ID are must be set 0.
The payload of this frame is a KV-LIST. STREAM-ID and FRAME-ID must be set to
0.
Following items are mandatory in the KV-LIST:
@ -1064,10 +1002,10 @@ For more information about known errors, see section "Errors & timeouts"
3.3. Events & Messages
-----------------------
Information about streams are sent in NOTIFY frames. You can specify which kind
of information to send by defining "spoe-message" sections in your SPOE
configuration file. for each "spoe-message" there will be a message in a NOTIFY
frame when the right event is triggered.
Information about streams is sent in NOTIFY frames. It is possible to specify
which kind of information to send by defining "spoe-message" sections in the
SPOE configuration file. For each "spoe-message" there will be a message in a
NOTIFY frame when the right event is triggered.
A NOTIFY frame is sent for a specific event when there is at least one
"spoe-message" attached to this event. All messages for an event will be added
@ -1189,21 +1127,15 @@ An agent can define its own errors using a not yet assigned status code.
IMPORTANT NOTE: By default, for a specific stream, when an abnormal/unexpected
error occurs, the SPOE is disabled for all the transaction. So
if you have several events configured, such error on an event
will disabled all following. For TCP streams, this will
disable the SPOE for the whole session. For HTTP streams, this
will disable it for the transaction (request and response).
See 'option continue-on-error' to bypass this limitation.
if several events are configured, such an error on an event will
disable all the following ones. For TCP streams, this will disable the
SPOE for the whole session. For HTTP streams, this will disable
it for the transaction (request and response). See 'option
continue-on-error' to bypass this limitation.
To avoid a stream to wait undefinetly, you must carefully choose the
acknowledgement timeout. In most of cases, it will be quiet low. But it depends
on the responsivness of your service.
You must also choose idle timeout carefully. Because connection with your
service depends on the backend configuration used by the SPOA, it is important
to use a lower value for idle timeout than the server timeout. Else the
connection will be closed by HAProxy. The same is true for hello timeout. You
should choose a lower value than the connect timeout.
To prevent a stream from waiting indefinitely, a processing timeout should be
carefully defined. Most of the time, it will be quite low, but it depends on
the SPOA responsiveness.
4. Logging
-----------
@ -1218,40 +1150,19 @@ LOG_NOTICE. Otherwise, the message is logged with the level LOG_WARNING.
The messages are logged using the agent's logger, if defined, and use the
following format:
SPOE: [AGENT] <TYPE:NAME> sid=STREAM-ID st=STATUS-CODE reqT/qT/wT/resT/pT \
<idles>/<applets> <nb_sending>/<nb_waiting> <nb_error>/<nb_processed>
SPOE: [AGENT] <TYPE:NAME> sid=STREAM-ID st=STATUS-CODE pT <nb_error>/<nb_processed>
AGENT is the agent name
TYPE is EVENT or GROUP
NAME is the event or the group name
STREAM-ID is an integer, the unique id of the stream
STATUS_CODE is the processing's status code
reqT/qT/wT/resT/pT are the following time events:
* reqT : the encoding time. It includes ACLs processing, if any. For
fragmented frames, it is the sum of all fragments.
* qT : the delay before the request gets out the sending queue. For
fragmented frames, it is the sum of all fragments.
* wT : the delay before the response is received. No fragmentation
supported here.
* resT : the delay to process the response. No fragmentation supported
here.
* pT : the delay to process the event or the group. From the stream
point of view, it is the latency added by the SPOE processing.
It is more or less the sum of values above.
<idle> is the numbers of idle SPOE applets
<applets> is the numbers of SPOE applets
<nb_sending> is the numbers of streams waiting to send data
<nb_waiting> is the numbers of streams waiting for a ack
pT is the delay to process the event or the group.
From the stream point of view, it is the latency added
by the SPOE processing.
<nb_error> is the numbers of processing errors
<nb_processed> is the numbers of events/groups processed
For all these time events, -1 means the processing was interrupted before the
end. So -1 for the queue time means the request was never dequeued. For
fragmented frames it is harder to know when the interruption happened.
/*
* Local variables:
* fill-column: 79


@ -0,0 +1,114 @@
2024-10-28 - error reporting
----------------------------
- rules:
-> stream->current_rule ~= yielding rule or error
pb: not always set.
-> todo: curr_rule_in_progress points to &rule->conf (file+line)
- set on ACT_RET_ERR, ACT_RET_YIELD, ACT_RET_INV.
- sample_fetch: curr_rule
- filters:
-> strm_flt.filters[2] (1 per direction) ~= yielding filter or error
-> to check: what to do on forward filters (e.g. compression)
-> check spoe / waf (stream data)
-> sample_fetch: curr_filt
- cleanup:
- last_rule_line + last_rule_file can point to &rule->conf
- xprt:
- all handshakes use the dummy xprt "xprt_handshake" ("HS"). No data
exchange is possible there. The ctx is of type xprt_handshake_ctx
for all of them, and contains a wait_event.
=> conn->xprt_ctx->wait_event contains the sub for current handshake
*if* xprt points to xprt_handshake.
- at most 2 active xprt at once: top and bottom (bottom=raw_sock)
- proposal:
- combine 2 bits for muxc, 2 bits for xprt, 4 bits for fd (active,ready).
=> 8 bits for muxc and below. QUIC uses something different TBD.
- muxs uses 6 bits max (ex: h2 send_list, fctl_list, full etc; h1: full,
blocked connect...).
- 2 bits for sc's sub
- mux_sctl to retrieve a 32-bit code padded right, limited to 16 bits
for now.
=> [ 0000 | 0000 | 0000 | 0000 | SC | MUXS | MUXC | XPRT | FD ]
2 6 2 2 4
- sample-fetch for each side.
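A possible reading of the 16-bit layout proposed above, as a C sketch. The
field order and widths (SC=2, MUXS=6, MUXC=2, XPRT=2, FD=4 bits, FD in the
lowest bits) are taken from the diagram; the function names are hypothetical
and nothing here is claimed to match an existing implementation.

```c
#include <stdint.h>

/* Bit positions derived from [ SC | MUXS | MUXC | XPRT | FD ] = 2/6/2/2/4. */
#define SUB_FD_SHIFT    0
#define SUB_XPRT_SHIFT  4
#define SUB_MUXC_SHIFT  6
#define SUB_MUXS_SHIFT  8
#define SUB_SC_SHIFT   14

/* Packs the five fields into the low 16 bits of a 32-bit code. Each field
 * is masked to its width so that an oversized value cannot spill over. */
uint32_t sub_pack(unsigned int sc, unsigned int muxs, unsigned int muxc,
                  unsigned int xprt, unsigned int fd)
{
    return ((uint32_t)(sc   & 0x03) << SUB_SC_SHIFT)   |
           ((uint32_t)(muxs & 0x3F) << SUB_MUXS_SHIFT) |
           ((uint32_t)(muxc & 0x03) << SUB_MUXC_SHIFT) |
           ((uint32_t)(xprt & 0x03) << SUB_XPRT_SHIFT) |
           ((uint32_t)(fd   & 0x0F) << SUB_FD_SHIFT);
}

/* Extracts the 6-bit MUXS field back from a packed code. */
unsigned int sub_get_muxs(uint32_t code)
{
    return (code >> SUB_MUXS_SHIFT) & 0x3F;
}
```

The upper 16 bits stay at zero, which matches the "limited to 16 bits for now"
remark and leaves room to enlarge the fields later.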
- shut / abort
- history, almost human-readable.
- event locations:
- fd (detected by rawsock)
- handshake (detected by xprt_handshake). Eg. parsing or address encoding
- xprt (ssl)
- muxc
- se: muxs / applet
- stream
< 8 total. +8 to distinguish front from back at stream level.
suggest:
- F, H, X, M, E, S front or back
- f, h, x, m, e, s back or front
- event types:
- 0 = no event yet
- 1 = timeout
- 2 = intercepted (rule, etc)
- 3 unused
// shutr / shutw: +1 if other side already shut
- 4 = aligned shutr
- 6 = aligned recv error
- 8 = early shutr (truncation)
- 10 = early error (truncation)
- 12 = shutw
- 14 = send error
- event location = MSB
event type = LSB
appending a single event:
-- if code not full --
code <<= 8;
code |= location << 4;
code |= event type;
- up to 4 events per connection in 32-bit mode stored on connection
(since raw_sock & ssl_sock need to access it).
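The "appending a single event" steps above can be sketched as a small C
function. It assumes a 32-bit log holding up to 4 events of 8 bits each, with
the location in the upper nibble and the event type in the lower nibble. The
function name and the numeric location values used in the usage note are
hypothetical; the notes only suggest letters for the locations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Appends one event (4-bit location, 4-bit type) to a 32-bit event log.
 * Returns false, leaving the log unchanged, if 4 events are already stored,
 * i.e. if the most significant byte is in use. */
bool evlog_append(uint32_t *code, unsigned int location, unsigned int type)
{
    if (*code >> 24)
        return false;                          /* code is full */

    *code <<= 8;                               /* make room for the new event */
    *code |= (uint32_t)(location & 0xF) << 4;  /* event location = MSB nibble */
    *code |= (uint32_t)(type & 0xF);           /* event type = LSB nibble */
    return true;
}
```

Starting from an empty log, appending (location 1, type 4) and then
(location 2, type 12) gives 0x142C: the oldest event sits in the most
significant used byte, and the most recent one in the lowest byte.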
- SE (muxs/applet) store their event log in the SD: se_event_log (64 bits).
- muxs must aggregate the connection's flags with its own:
- store last known connection state in SD: conn_event_log
- detect changes at the connection level by comparing with SD conn_event_log
- create a new SD event with difference(s) into SD se_event_log
- update connection state in SD conn_event_log
- stream
- store their event log in the stream: strm_event_log (64 bits).
- for each side:
- store last known SE state in SD: last_se_event_log
- detect changes at the SE level by comparing with SD se_event_log
- create a new STREAM event with difference(s) into STREAM strm_event_log
and patch the location depending on front vs back (+8 for back).
- update SE state in SD last_se_event_log
=> strm_event_log contains a composite of each side + stream.
- converted to string using the location letters
- if more event types needed later, can enlarge bits and use another letter.
- note: also possible to create an exhaustive enumeration of all possible codes
(types+locations).
- sample fetch to retrieve strm_event_log.
- Note that fc_err and fc_err_str are already usable
- questions:
- htx layer needed ?
- ability to map EOI/EOS etc to SE activity ?
- we'd like to detect an HTTP response before end of POST.
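
The "appending a single event" steps above can be sketched as follows. This is
only an illustration of the proposal, not existing code: the names evt_append(),
evt_loc_letter() and the EVT_LOC_* values are hypothetical, the locations are
assumed to be numbered from 0 in the order listed above, upper case is assumed
to mean "front", and a zero byte is assumed to mean "empty slot" (so the pair
location 0 / type 0 cannot be logged).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical numbering: the notes fix the letters, not the values. */
enum evt_loc {
	EVT_LOC_FD,    /* F: fd (raw_sock) */
	EVT_LOC_HS,    /* H: handshake     */
	EVT_LOC_XPRT,  /* X: xprt (ssl)    */
	EVT_LOC_MUXC,  /* M: muxc          */
	EVT_LOC_SE,    /* E: muxs / applet */
	EVT_LOC_STRM,  /* S: stream        */
};

#define EVT_LOC_BACK 8 /* "+8 for back" at stream level */

/* Appends one event (location = high nibble, type = low nibble) to a 32-bit
 * log holding up to 4 events. Returns 1 on success, 0 if the log is full,
 * which is assumed to be the case when the oldest byte is already non-zero.
 */
static int evt_append(uint32_t *code, unsigned int loc, unsigned int type)
{
	if (*code >> 24)
		return 0;
	*code <<= 8;
	*code |= (loc & 0xF) << 4;
	*code |= type & 0xF;
	return 1;
}

/* Returns the letter for a location; lower case when the back bit is set
 * (assumption: the notes leave open which side gets which case).
 */
static char evt_loc_letter(unsigned int loc)
{
	static const char letters[] = "FHXMES";
	char c;

	if ((loc & 7) >= 6)
		return '?';
	c = letters[loc & 7];
	return (loc & EVT_LOC_BACK) ? (char)(c + ('a' - 'A')) : c;
}
```

With these assumptions, logging "xprt timeout" then "muxc shutw" on an empty
log gives 0x213C, and a fifth evt_append() on the same 32-bit log is refused.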


@ -0,0 +1,750 @@
#FIG 3.2 Produced by xfig version 3.1
Landscape
Center
Metric
A4
100.00
Single
-2
1200 2
0 32 #8e8e8e
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
450 450 450 6750
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 547 2250 637
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 592 2250 682
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 637 2250 727
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 682 2250 772
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 900 2250 990
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 945 2250 1035
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 990 2250 1080
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 1035 2250 1125
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 1080 2250 1170
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 1125 2250 1215
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 1168 2250 1258
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 1213 2250 1303
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 1429 2250 1519
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 1384 2250 1474
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 1339 2250 1429
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
450 1303 2250 1393
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
448 1253 2248 1343
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
2251 794 451 884
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
2250 450 2250 6750
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
2251 1130 451 1220
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
2251 1309 451 1399
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
2295 810 2475 810
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
2295 1305 2475 1305
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
10800 450 10800 7155
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
9000 450 9000 7155
2 1 0 2 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 547 10800 1440
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 592 10800 1485
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 637 10800 1530
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 682 10800 1575
2 1 0 2 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 2437 10800 3330
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 2482 10800 3375
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 2527 10800 3420
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 2572 10800 3465
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 2617 10800 3510
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 2707 10800 3600
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 2752 10800 3645
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 2662 10800 3555
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 4327 10800 5220
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 4372 10800 5265
2 1 0 2 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 4462 10800 5355
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 4417 10800 5310
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 4507 10800 5400
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 4552 10800 5445
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 4597 10800 5490
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 4642 10800 5535
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
10801 5334 9001 6189
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
10801 5532 9001 6387
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
10801 3629 9001 4484
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
10801 3476 9001 4331
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
10801 1575 9001 2430
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
10845 1575 11610 1575
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
10845 3645 11565 3645
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
10845 6120 11610 6120
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 1487 10948 1366 10948 1456 11173 1276
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 1741 10948 1620 10948 1710 11173 1530
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 3406 10948 3285 10948 3375 11173 3195
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 3681 10948 3560 10948 3650 11173 3470
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 3996 10948 3875 10948 3965 11173 3785
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 4266 10948 4145 10948 4235 11173 4055
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 5278 10948 5157 10948 5247 11173 5067
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 5537 10948 5416 10948 5506 11173 5326
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 5002 10800 5895
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 5047 10800 5940
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 5092 10800 5985
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 5137 10800 6030
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 5182 10800 6075
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 5227 10800 6120
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6802 10800 7695
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6847 10800 7740
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6892 10800 7785
2 1 0 2 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6982 10800 7875
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7027 10800 7920
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7072 10800 7965
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6937 10800 7830
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7117 10800 8010
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7162 10800 8055
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
10801 6129 9001 6984
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
10801 5942 9001 6797
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 4950 10800 5843
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 4905 10800 5798
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
3150 450 3150 6750
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
4905 450 4905 6750
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 592 4950 1485
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 637 4950 1530
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 547 4950 1440
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 682 4950 1575
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 2572 4950 3465
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 2527 4950 3420
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 2482 4950 3375
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 2437 4950 3330
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 2617 4950 3510
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 2662 4950 3555
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 2707 4950 3600
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 2752 4950 3645
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 4552 4950 5445
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 4597 4950 5490
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 4642 4950 5535
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 4687 4950 5580
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 4867 4950 5760
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 4912 4950 5805
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 5047 4950 5940
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 5092 4950 5985
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 4822 4950 5715
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 4777 4950 5670
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 4732 4950 5625
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 4957 4950 5850
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 5002 4950 5895
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 5137 4950 6030
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 5227 4950 6120
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
3150 5182 4950 6075
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
4951 1575 3151 2430
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
4951 3673 3151 4528
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
4995 1575 5175 1575
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
4995 3645 5175 3645
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
4995 6120 5175 6120
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
7650 450 7650 7155
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
5850 450 5850 7155
2 1 0 2 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 547 7650 1440
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 592 7650 1485
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 637 7650 1530
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 682 7650 1575
2 1 0 2 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 2437 7650 3330
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 2482 7650 3375
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 2527 7650 3420
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 2572 7650 3465
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 2617 7650 3510
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 2707 7650 3600
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 2752 7650 3645
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 2662 7650 3555
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4327 7650 5220
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4372 7650 5265
2 1 0 2 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4462 7650 5355
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4417 7650 5310
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4507 7650 5400
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4552 7650 5445
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4597 7650 5490
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4642 7650 5535
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4687 7650 5580
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4732 7650 5625
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4777 7650 5670
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4822 7650 5715
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4867 7650 5760
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4912 7650 5805
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 4957 7650 5850
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 5002 7650 5895
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6213 7650 7106
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6262 7650 7155
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6307 7650 7200
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6352 7650 7245
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6397 7650 7290
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6487 7650 7380
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6532 7650 7425
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6577 7650 7470
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6442 7650 7335
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6622 7650 7515
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6667 7650 7560
2 1 0 2 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6757 7650 7650
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6802 7650 7695
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6847 7650 7740
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6712 7650 7605
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6892 7650 7785
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
5850 6937 7650 7830
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
7651 5334 5851 6189
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
7651 5532 5851 6387
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
7651 5698 5851 6553
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
7651 5917 5851 6772
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
7651 3629 5851 4484
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
7651 3476 5851 4331
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
7651 1575 5851 2430
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
7695 1575 8460 1575
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
7695 3645 8415 3645
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 1 2
10 1 1.00 60.00 120.00
7695 6120 8460 6120
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 1487 7798 1366 7798 1456 8023 1276
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 1741 7798 1620 7798 1710 8023 1530
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 3406 7798 3285 7798 3375 8023 3195
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 3681 7798 3560 7798 3650 8023 3470
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 3996 7798 3875 7798 3965 8023 3785
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 4266 7798 4145 7798 4235 8023 4055
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 5278 7798 5157 7798 5247 8023 5067
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 5537 7798 5416 7798 5506 8023 5326
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 0 4
8955 4680 8910 4680 8910 4860 8955 4860
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 0 0 4
8955 6570 8910 6570 8910 6750 8955 6750
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 5791 10948 5670 10948 5760 11173 5580
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 6060 10948 5939 10948 6029 11173 5849
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 6372 10948 6251 10948 6341 11173 6161
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 6601 10948 6480 10948 6570 11173 6390
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 6781 10948 6660 10948 6750 11173 6570
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
10813 6970 10948 6849 10948 6939 11173 6759
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 5791 7798 5670 7798 5760 8023 5580
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 6060 7798 5939 7798 6029 8023 5849
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 6372 7798 6251 7798 6341 8023 6161
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 6601 7798 6480 7798 6570 8023 6390
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 6781 7798 6660 7798 6750 8023 6570
2 1 0 1 5 7 54 -1 -1 0.000 0 0 -1 1 0 4
1 1 1.00 60.00 120.00
7663 6970 7798 6849 7798 6939 8023 6759
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
10801 7245 9001 8100
2 1 0 1 12 7 52 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
10801 7425 9001 8280
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7920 10800 8813
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7965 10800 8858
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 8010 10800 8903
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 8055 10800 8948
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 8100 10800 8993
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 8145 10800 9038
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 8190 10800 9083
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 8235 10800 9128
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7560 10800 8453
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7605 10800 8498
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7650 10800 8543
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7695 10800 8588
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7740 10800 8633
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7785 10800 8678
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7830 10800 8723
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7875 10800 8768
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7200 10800 8093
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7245 10800 8138
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7290 10800 8183
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7335 10800 8228
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7380 10800 8273
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7425 10800 8318
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7470 10800 8363
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 7515 10800 8408
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6210 10800 7103
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6255 10800 7148
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6300 10800 7193
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6345 10800 7238
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6390 10800 7283
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6435 10800 7328
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6480 10800 7373
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 6525 10800 7418
2 1 0 1 4 7 53 -1 -1 0.000 0 0 -1 1 0 2
2 1 1.00 60.00 120.00
8190 8280 8955 8280
2 1 0 1 1 7 51 -1 -1 0.000 0 0 -1 1 0 2
1 1 1.00 60.00 120.00
9000 8282 10800 9175
3 0 0 1 4 7 53 -1 -1 0.000 0 1 0 5
1 1 1.00 60.00 120.00
8910 4905 8820 5310 8775 5805 8865 6345 8910 6525
0.000 1.000 1.000 1.000 0.000
4 0 0 53 -1 16 6 0.0000 4 105 495 2520 1350 WP1 @12\001
4 0 0 53 -1 16 6 0.0000 4 105 435 2565 855 WP0 @4\001
4 0 0 53 -1 16 6 0.0000 4 75 390 2565 1005 => +8\001
4 1 0 52 -1 16 8 0.4363 4 105 765 9945 4050 WU: win=16\001
4 1 0 52 -1 16 8 0.4363 4 105 690 9945 1935 WU: win=8\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 1305 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 1485 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 3195 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 3465 -2 = 4\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 3825 -2 = 2\001
4 0 20 54 -1 18 6 0.0000 4 75 270 11205 4095 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 5085 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 5355 -2 = 4\001
4 0 0 53 -1 16 6 0.0000 4 105 495 11340 3645 WP1 @12\001
4 0 0 53 -1 16 6 0.0000 4 105 495 11295 6075 WP2 @28\001
4 0 0 53 -1 16 6 0.0000 4 105 435 11340 1710 WP0 @4\001
4 0 0 53 -1 16 6 0.0000 4 75 360 11340 1860 => +8\001
4 1 0 52 -1 16 8 0.4363 4 105 765 9945 6480 WU: win=32\001
4 0 0 53 -1 16 6 0.0000 4 105 495 5220 3690 WP1 @12\001
4 0 0 53 -1 16 6 0.0000 4 105 495 5220 6165 WP2 @28\001
4 0 0 53 -1 16 6 0.0000 4 105 435 5220 1620 WP0 @4\001
4 0 0 53 -1 16 6 0.0000 4 75 390 5220 1770 => +8\001
4 1 0 52 -1 16 8 0.4363 4 105 765 6795 6300 WU: win=32\001
4 1 0 52 -1 16 8 0.4363 4 105 765 6795 4050 WU: win=16\001
4 1 0 52 -1 16 8 0.4363 4 105 690 6795 1935 WU: win=8\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 1305 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 1485 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 3195 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 3465 -2 = 4\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 3825 -2 = 2\001
4 0 20 54 -1 18 6 0.0000 4 75 270 8055 4095 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 5085 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 5355 -2 = 4\001
4 0 0 53 -1 16 6 0.0000 4 105 495 8190 3645 WP1 @12\001
4 0 0 53 -1 16 6 0.0000 4 105 495 8145 6075 WP2 @28\001
4 0 0 53 -1 16 6 0.0000 4 105 435 8190 1710 WP0 @4\001
4 0 0 53 -1 16 6 0.0000 4 75 360 8190 1860 => +8\001
4 2 0 53 -1 16 6 0.0000 4 90 315 8865 4770 Pause\001
4 2 0 53 -1 16 6 0.0000 4 90 210 8865 6660 Zero\001
4 2 0 53 -1 16 6 0.0000 4 90 390 8865 6750 Window\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 5625 -2 = 3\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 6435 -2 = 4\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 6615 -2 = 2\001
4 0 20 54 -1 18 6 0.0000 4 75 270 11205 6795 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 5625 -2 = 8\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 5850 -2 = 8\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 6210 -2 = 6\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 6435 -2 = 4\001
4 0 0 54 -1 16 6 0.0000 4 75 270 8055 6615 -2 = 2\001
4 0 20 54 -1 18 6 0.0000 4 75 270 8055 6795 -2 = 0\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 5850 -2 = 7\001
4 0 0 54 -1 16 6 0.0000 4 75 270 11205 6210 -2 = 6\001
4 2 0 53 -1 16 6 0.0000 4 90 270 8910 8190 Fixed\001

File diff suppressed because it is too large


@ -548,11 +548,15 @@ buffer_almost_full | const buffer *buf| returns true if the buffer is not null
| | are used. A waiting buffer will match.
--------------------+------------------+---------------------------------------
b_alloc | buffer *buf | ensures that <buf> is allocated or
| ret: buffer * | allocates a buffer and assigns it to
| | *buf. If no memory is available, (1)
| | is assigned instead with a zero size.
| enum dynbuf_crit | allocates a buffer and assigns it to
| criticality | *buf. If no memory is available, (1)
| ret: buffer * | is assigned instead with a zero size.
| | The allocated buffer is returned, or
| | NULL in case no memory is available
| | NULL in case no memory is available.
| | The criticality indicates how the
| | buffer might be used and how likely it
| | is that the allocated memory will be
| | quickly released.
--------------------+------------------+---------------------------------------
__b_free | buffer *buf | releases <buf> which must be allocated
| ret: void | and marks it empty


@ -0,0 +1,128 @@
2024-09-30 - Buffer List API
1. Use case
The buffer list API allows one to share a certain amount of buffers between
multiple entities, which will each see their own as lists of buffers, while
keeping a shared free list. The immediate use case is for muxes, which may
want to allocate up to a certain number of buffers per connection, shared
among all streams. In this case, each stream will first request a new list
for its own use, then may request extra entries from the free list. At any
moment it will be possible to enumerate all allocated lists and to know which
buffer follows which one.
2. Representation
The buffer list is an array of struct bl_elem. It can hold up to N-1 buffers
for N elements. The first one serves as the bookkeeping head and creates the
free list.
Each bl_elem contains a struct buffer, a pointer to the next cell, and a few
flags. The struct buffer is a real struct buffer for all cells, except the
first one where it holds useful data to describe the state of the array:
struct bl_elem {
    struct buffer {
        size_t size;  // head: size of the array in number of elements
        char *area;   // head: not used (0)
        size_t data;  // head: number of elements allocated
        size_t head;  // head: number of users
    } buf;
    uint32_t next;
    uint32_t flags;
};
There are a few important properties here:
- for the free list, the first element isn't part of the list, otherwise
there wouldn't be any head storage anymore.
- the head's buf.data doesn't include the first cell of the array, thus its
maximum value is buf.size - 1.
- allocations are always made by appending to the end of the existing list
- releases are always made by releasing the beginning of the existing list
- next == 0 for an allocatable cell implies that all the cells from this
element to the last one of the array are free. This allows one to simply
initialize a whole new array with memset(array, 0, sizeof(array)).
- next == ~0 for an allocated cell indicates we've reached the last element
of the current list.
- for the head of the list, next points to the first available cell, or 0 if
the free list is depleted.
3. Example
The array starts like this, created with a calloc() and having size initialized
to the total number of cells. The number represented is the 'next' value. "~"
here stands for ~0 (i.e. end marker).
[1|0|0|0|0|0|0|0|0|0] => array entirely free
strm1: bl_get(0) -> 1 = assign 1 to strm1's first cell
[2|~|0|0|0|0|0|0|0|0] => strm1 allocated at [1]
   1
strm1: bl_get(1) -> 2 = allocate one cell after cell 1
[3|2|~|0|0|0|0|0|0|0]
   1
strm1: bl_get(2) -> 3 = allocate one cell after cell 2
[4|2|3|~|0|0|0|0|0|0]
   1
strm2: bl_get(0) -> 4 = assign 4 to strm2's first cell
[5|2|3|~|~|0|0|0|0|0]
   1     2
strm1: bl_put(1) -> 2 = release cell 1, jump to next one (2)
[1|5|3|~|~|0|0|0|0|0]
     1   2
4. Manipulating buffer lists
The API is very simple: it allows one to reserve a buffer for a new stream or
for an existing one, to release a stream's first buffer or release the entire
stream, and to initialize / release the whole array.
====================+==================+=======================================
Function | Arguments/Return | Description
--------------------+------------------+---------------------------------------
bl_users() | const bl_elem *b | returns the current number of users on
| ret: uint32_t | the array (i.e. buf.head).
--------------------+------------------+---------------------------------------
bl_size() | const bl_elem *b | returns the total number of
| ret: uint32_t | allocatable cells (i.e. buf.size-1)
--------------------+------------------+---------------------------------------
bl_used() | const bl_elem *b | returns the number of cells currently
| ret: uint32_t | in use (i.e. buf.data)
--------------------+------------------+---------------------------------------
bl_avail() | const bl_elem *b | returns the number of cells still
| ret: uint32_t | available.
--------------------+------------------+---------------------------------------
bl_init() | bl_elem *b | initializes b for n elements. All are
| uint32_t n | in the free list.
--------------------+------------------+---------------------------------------
bl_put() | bl_elem *b | releases cell <idx> to the free list,
| uint32_t n | possibly deleting the user. Returns
| ret: uint32_t | next cell idx or 0 if none (last one).
--------------------+------------------+---------------------------------------
bl_deinit() | bl_elem *b | only when DEBUG_STRICT==2, scans the
| | array to check for leaks.
--------------------+------------------+---------------------------------------
bl_get() | bl_elem *b | allocates a new cell to add after
| uint32_t n | cell <n>, or for a new stream (n=0).
| ret: uint32_t | Returns the cell or 0 if no more space.
====================+==================+=======================================
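
The example of section 3 can be reproduced with the minimal sketch below. It is
a simplified re-implementation written for illustration only, not the HAProxy
code: the function bodies are assumptions derived from the properties listed in
section 2, struct buffer is redeclared locally, and one extra convention is
introduced that the text does not specify (a cell released while the free list
is depleted stores ~0 so that it is not mistaken for "all following cells are
free").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BL_END (~(uint32_t)0) /* "~" in the example: end of list */

struct buffer { size_t size; char *area; size_t data; size_t head; };
struct bl_elem { struct buffer buf; uint32_t next; uint32_t flags; };

/* Initializes <b> for <n> elements; cell 0 is the head, all others are free. */
static void bl_init(struct bl_elem *b, uint32_t n)
{
	memset(b, 0, n * sizeof(*b));
	b->buf.size = n;
	b->next = (n > 1) ? 1 : 0;
}

/* Allocates a cell after tail cell <idx>, or for a new user if <idx> is 0.
 * Returns the cell's index, or 0 if no more space.
 */
static uint32_t bl_get(struct bl_elem *b, uint32_t idx)
{
	uint32_t e = b->next; /* first free cell, 0 if depleted */
	uint32_t n;

	if (!e)
		return 0;

	n = b[e].next;
	if (n == BL_END)      /* assumed marker: last entry of the free list */
		n = 0;
	else if (n == 0)      /* this cell and all following ones are free */
		n = (e + 1 < b->buf.size) ? e + 1 : 0;

	b->next = n;
	b[e].next = BL_END;
	if (idx)
		b[idx].next = e;  /* append to the existing list */
	else
		b->buf.head++;    /* one more user */
	b->buf.data++;
	return e;
}

/* Releases cell <idx>, which must be the first one of its list. Returns the
 * next cell of that list, or 0 if it was the last one (the user is deleted).
 */
static uint32_t bl_put(struct bl_elem *b, uint32_t idx)
{
	uint32_t n = b[idx].next;

	b[idx].next = b->next ? b->next : BL_END;
	b->next = idx;
	b->buf.data--;
	if (n == BL_END) {
		b->buf.head--;
		return 0;
	}
	return n;
}
```

Calling bl_init() on a 10-cell array, then bl_get(0), bl_get(1), bl_get(2),
bl_get(0) and bl_put(1) in that order walks through the same 'next' values as
the arrays shown in section 3.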


@ -1,12 +1,12 @@
-----------------------------------------
event_hdl Guide - version 2.8
( Last update: 2022-11-14 )
event_hdl Guide - version 3.1
( Last update: 2024-06-21 )
------------------------------------------
ABSTRACT
--------
The event_hdl support is a new feature of HAProxy 2.7. It is a way to easily
The event_hdl support is a new feature of HAProxy 2.8. It is a way to easily
handle general events in a simple to maintain fashion, while keeping core code
impact to the bare minimum.
@ -38,7 +38,7 @@ SUMMARY
1. EVENT_HDL INTRODUCTION
-----------------------
-------------------------
EVENT_HDL provides two complementary APIs, both are implemented
in src/event_hdl.c and include/haproxy/event_hdl(-t).h:
@ -52,7 +52,7 @@ an event that is happening in the process.
(See section 3.)
2. HOW TO HANDLE EXISTING EVENTS
---------------------
--------------------------------
To handle existing events, you must first decide which events you're
interested in.
@ -197,7 +197,7 @@ event subscription is performed using the function:
As the name implies, anonymous subscriptions don't support lookups.
2.1 SYNC MODE
---------------------
-------------
Example, you want to register a sync handler that will be called when
a new server is added.
@ -280,12 +280,12 @@ identified subscription where freeing private is required when subscription ends
```
2.2 ASYNC MODE
---------------------
--------------
As mentioned before, async mode comes in 2 flavors, normal and task.
2.2.1 NORMAL VERSION
---------------------
--------------------
Normal is meant to be really easy to use, and highly compatible with sync mode.
@ -379,7 +379,7 @@ identified subscription where freeing private is required when subscription ends
```
2.2.2 TASK VERSION
---------------------
------------------
task version requires a bit more setup, but it's pretty
straightforward actually.
@ -510,14 +510,14 @@ Note: it is not recommended to perform multiple subscriptions
that might already be freed. Thus UAF will occur.
2.3 ADVANCED FEATURES
-----------------------
---------------------
We've already covered some of these features in the previous examples.
Here is a documented recap.
2.3.1 SUB MGMT
-----------------------
--------------
From an event handler context, either sync or async mode:
You have the ability to directly manage the subscription
@ -565,7 +565,7 @@ task and notify async modes (from the event):
```
2.3.2 SUBSCRIPTION EXTERNAL LOOKUPS
-----------------------
-----------------------------------
As you've seen in 2.3.1, managing the subscription directly
from the handler is a possibility.
@ -620,7 +620,7 @@ unsubscribing:
```
2.3.3 SUBSCRIPTION PTR
-----------------------
----------------------
To manage existing subscriptions from external code,
we already talked about identified subscriptions that
@ -720,7 +720,7 @@ Example:
```
2.3.4 PRIVATE FREE
-----------------------
------------------
Upon handler subscription, you have the ability to provide
a private data pointer that will be passed to the handler
@ -777,7 +777,7 @@ Then:
```
3 HOW TO ADD SUPPORT FOR NEW EVENTS
-----------------------
-----------------------------------
Adding support for a new event is pretty straightforward.
@ -787,9 +787,20 @@ First, you need to declare a new event subtype in event_hdl-t.h file
You might want to declare a whole new event family, in which case
you declare both the new family and the associated subtypes (if any).
Up to 256 families containing 16 subtypes each are supported by the API.
Family 0 is reserved for special events, which means there are 255 usable
families.
You can declare a family using EVENT_HDL_SUB_FAMILY(x) where x is the
family.
You can declare a subtype using EVENT_HDL_SUB_TYPE(x, y) where x is the
family previously declared and y the subtype. Subtypes range from 1 to
16 (inclusive); 0 is not a valid subtype.
```
#define EVENT_HDL_SUB_NEW_FAMILY EVENT_HDL_SUB_FAMILY(4)
#define EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1 EVENT_HDL_SUB_TYPE(4,0)
#define EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1 EVENT_HDL_SUB_TYPE(4,1)
```
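The limits above (256 families, family 0 reserved, subtypes 1 to 16) can be
illustrated with a small standalone sketch. The struct layout and every DEMO_*
or demo_* name below are assumptions made for illustration only; they are not
the actual definitions found in event_hdl-t.h.

```c
#include <stdint.h>

/* Hypothetical layout: an 8-bit family (256 values) and a 16-bit subtype
 * mask with one bit per subtype, hence 16 subtypes numbered 1..16.
 */
struct demo_sub_type {
	uint8_t family;
	uint16_t subtype;
};

/* Whole family: all 16 subtype bits set (assumption). */
#define DEMO_SUB_FAMILY(f) \
	((struct demo_sub_type){ .family = (f), .subtype = 0xFFFF })

/* Single subtype y in 1..16, mapped to bit (y - 1) (assumption). */
#define DEMO_SUB_TYPE(f, y) \
	((struct demo_sub_type){ .family = (f), \
	                         .subtype = (uint16_t)(1U << ((y) - 1)) })

/* Returns 1 if (family, subtype) is usable under the documented rules:
 * family 0 is reserved and subtype 0 is not valid.
 */
static inline int demo_sub_type_valid(unsigned int family, unsigned int subtype)
{
	return family >= 1 && family <= 255 && subtype >= 1 && subtype <= 16;
}
```

With a bit-per-subtype encoding like this one, subtype 0 would require a shift
by -1, which is one way to see why a declaration such as (4,0) cannot work.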
Then, you need to update the event_hdl_sub_type_map map,
@ -803,7 +814,7 @@ Please follow this procedure:
You added a new family: go to section 3.1
3.1 DECLARING A NEW EVENT DATA STRUCTURE
-----------------------
----------------------------------------
You have the ability to provide additional data for a given
event family when such events occur.
@ -943,7 +954,7 @@ Event publishing can be performed from anywhere in the code.
--------------------------------------------------------------------------------
4 SUBSCRIPTION LISTS
-----------------------
--------------------
As you may already know, EVENT_HDL API main functions rely on
subscription lists.


@ -540,14 +540,15 @@ message. These functions are used by HTX analyzers or by multiplexers.
the amount of data drained.
- htx_xfer_blks() transfers HTX blocks from an HTX message to another,
stopping on the first block of a specified type or when a specific amount
of bytes, including meta-data, was moved. If the tail block is a DATA
block, it may be partially moved. All other blocks are transferred at once
or kept. This function returns a mixed value, with the last block moved,
or NULL if nothing was moved, and the amount of data transferred. When
HEADERS or TRAILERS blocks must be transferred, this function transfers
all of them. Otherwise, if it is not possible, it triggers an error. It is
the caller's responsibility to transfer all headers or trailers at once.
stopping after the first block of a specified type is transferred or when
a specific amount of bytes, including meta-data, was moved. If the tail
block is a DATA block, it may be partially moved. All other blocks are
transferred at once or kept. This function returns a mixed value, with the
last block moved, or NULL if nothing was moved, and the amount of data
transferred. When HEADERS or TRAILERS blocks must be transferred, this
function transfers all of them. Otherwise, if it is not possible, it
triggers an error. It is the caller's responsibility to transfer all headers
or trailers at once.
- htx_append_msg() append an HTX message to another one. All the message is
copied or nothing. So, if an error occurred, a rollback is performed. This


@ -314,6 +314,16 @@ alphanumerically ordered:
call to cfg_register_section() with the three arguments at stage
STG_REGISTER.
You can only register a section once, but you can register post callbacks
multiple times for this section with REGISTER_CONFIG_POST_SECTION().
- REGISTER_CONFIG_POST_SECTION(name, post)
Registers a function which will be called after a section is parsed. This is
the same as the <post> argument in REGISTER_CONFIG_SECTION(), the difference
is that it allows registering multiple <post> callbacks, and registering them
elsewhere in the code.
- REGISTER_PER_THREAD_ALLOC(fct)
Registers a call to register_per_thread_alloc(fct) at stage STG_REGISTER.


@ -0,0 +1,86 @@
2025-08-13 - Memory allocation in HAProxy 3.3
The vast majority of dynamic memory allocations are performed from pools. Pools
are optimized to store pre-calibrated objects of the right size for a given
usage, try to favor locality and hot objects as much as possible, and are
heavily instrumented to detect and help debug a wide class of bugs including
buffer overflows, use-after-free, etc.
For objects of random sizes, or those used only at configuration time, pools
are not suited, and the regular malloc/free family is available, in addition to
a few others.
The standard allocation calls are intercepted at the code level (#define) when
the code is compiled with -DDEBUG_MEM_STATS. For this reason, these calls are
redefined as macros in "bug.h", and one must not try to use the pointers to
such functions, as this may break DEBUG_MEM_STATS. This provides fine-grained
stats about allocation/free per line of source code using locally implemented
counters that can be consulted by "debug dev memstats". The calls are
categorized into one of "calloc", "free", "malloc", "realloc", "strdup",
"p_alloc", "p_free", the latter two designating pools. Extra calls such as
memalign() and similar are also intercepted and counted as malloc.
Due to the nature of this replacement, DEBUG_MEM_STATS cannot see operations
performed in libraries or dependencies.
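The interception principle described above can be sketched as follows. This is
a minimal illustration and not the code from "bug.h": the demo_* names and the
counter layout are hypothetical, and the macro relies on the GNU
statement-expression extension supported by gcc and clang.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical per-call-site counter (the real layout may differ). */
struct demo_mem_stats {
	const char *file;
	int line;
	size_t calls;
	size_t bytes;
};

/* Accounts for the call in the given call-site record, then allocates. */
static inline void *demo_counted_malloc(struct demo_mem_stats *site, size_t size)
{
	site->calls++;
	site->bytes += size;
	return malloc(size);
}

/* Each expansion of the macro owns a static counter, so statistics are
 * kept per line of source code.
 */
#define demo_malloc(size) ({                                               \
	static struct demo_mem_stats _site = { __FILE__, __LINE__, 0, 0 }; \
	demo_counted_malloc(&_site, (size));                               \
})
```

Since demo_malloc is a macro and not a function, its address cannot be taken,
and code compiled without the macro (a library, for example) bypasses the
counters entirely, which matches the two restrictions mentioned above.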
In addition to DEBUG_MEM_STATS, when haproxy is built with USE_MEMORY_PROFILING
the standard functions are wrapped by new ones defined in "activity.c", which
also hold counters by call place. These ones are able to trace activity in
libraries because the functions check the return pointer to figure where the
call was made. The approach is different and relies on a large hash table. The
files, function names and line numbers are not known, but by passing the pointer
to dladdr(), we can often resolve most of these symbols. The collected counters
are consulted via "show profiling memory". Profiling must first be enabled,
either in the global config "profiling.memory on" or the CLI using "set
profiling memory on".
Memory profiling can also track pool allocations and frees thanks to knowing
the size of the element and knowing a place where to store it. Some future
evolutions might consider making this possible for pure malloc/free as well
by leveraging malloc_usable_size() a bit more.
Finally, 3.3 brought aligned allocations. These are made available via a new
family of functions around ha_aligned_alloc() that simply map to either
posix_memalign(), memalign() or _aligned_malloc() for CYGWIN, depending on
which one is available. The latter requires the pointer to be passed to
_aligned_free() instead of free(); for this reason, all aligned allocations
have to be released using ha_aligned_free(). Since this mostly happens on
configuration elements, in practice it's not as inconvenient as it can sound.
These functions are in reality macros handled in "bug.h" like the previous
ones in order to deal with DEBUG_MEM_STATS. All "alloc" variants are reported
in memstats as "malloc". All "zalloc" variants are reported in memstats as
"calloc".
The currently available allocators are the following:
- void *ha_aligned_alloc(size_t align, size_t size)
- void *ha_aligned_zalloc(size_t align, size_t size)
Equivalent of malloc() but aligned to <align> bytes. The alignment MUST be
at least as large as one word and MUST be a power of two. The "zalloc"
variant also zeroes the area on success. Both return NULL on failure.
- void *ha_aligned_alloc_safe(size_t align, size_t size)
- void *ha_aligned_zalloc_safe(size_t align, size_t size)
Equivalent of malloc() but aligned to <align> bytes. The alignment is
automatically adjusted to the nearest larger power of two that is at least
as large as a word. The "zalloc" variant also zeroes the area on
success. Both return NULL on failure.
- (type *)ha_aligned_alloc_typed(size_t count, type)
(type *)ha_aligned_zalloc_typed(size_t count, type)
This macro returns an area aligned to the required alignment for type
<type>, large enough for <count> objects of this type, and the result is a
pointer of this type. The goal is to ease allocation of known structures
whose alignment is not necessarily known to the developer (and to avoid
encouraging hard-coded alignment). The cast on return also provides a
last-minute control in case a wrong type is mistakenly used due to a poor
copy-paste or an extra "*" after the type. When DEBUG_MEM_STATS is in use,
the type is stored as a string in the ".extra" field so that it can be
displayed in "debug dev memstats". The "zalloc" variant also zeroes the
area on success. Both return NULL on failure.
- void ha_aligned_free(void *ptr)
Frees the area pointed to by ptr. It is the equivalent of free() but for
objects allocated using one of the functions above.
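The semantics of the "_safe" and "_typed" variants listed above can be sketched
on top of posix_memalign(). This is only an illustration under assumptions: the
demo_* names are hypothetical stand-ins, not the ha_aligned_*() macros
themselves, and the sketch does not cover the memalign() or _aligned_malloc()
fallbacks.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* Rounds <align> up to a power of two that is at least one word. */
static inline size_t demo_fix_align(size_t align)
{
	size_t a = sizeof(void *);

	while (a < align && a <= SIZE_MAX / 2)
		a <<= 1;
	return a;
}

/* Stand-in for the "_safe" allocator: adjusts the alignment first. */
static inline void *demo_aligned_alloc_safe(size_t align, size_t size)
{
	void *ptr = NULL;

	if (posix_memalign(&ptr, demo_fix_align(align), size) != 0)
		return NULL;
	return ptr;
}

/* Same as above, but the area is zeroed on success. */
static inline void *demo_aligned_zalloc_safe(size_t align, size_t size)
{
	void *ptr = demo_aligned_alloc_safe(align, size);

	if (ptr)
		memset(ptr, 0, size);
	return ptr;
}

/* Typed variant: alignment and size are derived from the type, and the
 * cast lets the compiler complain when the result is assigned to a
 * pointer of a different type.
 */
#define demo_aligned_zalloc_typed(count, type) \
	((type *)demo_aligned_zalloc_safe(_Alignof(type), (count) * sizeof(type)))

/* Single release function, so callers never need to know which
 * underlying allocator was used.
 */
static inline void demo_aligned_free(void *ptr)
{
	free(ptr);
}
```

In this sketch a requested alignment of 48 is rounded up to 64, and every area
is released through demo_aligned_free(), mirroring the rule that aligned
allocations must go through ha_aligned_free().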
