The UDP GSO code emits a build warning with older toolchains (gcc 5 and 6):
src/quic_sock.c: In function 'cmsg_set_gso':
src/quic_sock.c:683:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
*((uint16_t *)CMSG_DATA(c)) = gso_size;
^
Let's just use the write_u16() function that's made for this purpose.
It was verified that for all versions from 5 to 13, gcc produces the
exact same code with the fix (and without the warning). It arrived in
3.1 with commit 448d3d388a ("MINOR: quic: add GSO parameter on quic_sock
send API") so this can be backported there.
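As a rough illustration of the kind of change described above, the sketch
below stores a 16-bit value through memcpy() instead of a type-punned
pointer. store_u16() and load_u16() are hypothetical stand-ins, not
haproxy's write_u16(), whose actual implementation may differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a helper such as write_u16(): store a 16-bit
 * value at an arbitrary address without casting the pointer, so that no
 * strict-aliasing rule is violated. Compilers typically turn this into a
 * single store.
 */
static inline void store_u16(void *p, uint16_t v)
{
	memcpy(p, &v, sizeof(v));
}

/* Same idea for reading the value back. */
static inline uint16_t load_u16(const void *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

A call such as store_u16(CMSG_DATA(c), gso_size) would then replace the
cast-and-dereference form that triggers the warning.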
Since recent commit ee94a6cfc1 ("MINOR: backend: extract conn reuse
from connect_server()") a build warning "set but not used" on the
"reuse" variable is emitted, because indeed the variable is now only
checked when SSL is in use. Let's just mark it as such.
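A minimal sketch of this kind of marking is shown below. The macro
SKETCH_MAYBE_UNUSED, the SKETCH_USE_SSL build option and the function
connect_sketch() are all hypothetical names used for illustration only;
the real code uses its own macro and conditions.

```c
/* Hypothetical macro; projects usually provide their own equivalent. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_MAYBE_UNUSED __attribute__((unused))
#else
#define SKETCH_MAYBE_UNUSED
#endif

/* Returns 1 when an idle connection was picked, 0 otherwise. "reuse" is
 * always assigned but only read when SKETCH_USE_SSL is defined, which is
 * what would trigger "set but not used" in builds without it.
 */
static int connect_sketch(int idle_available, int *ssl_reused)
{
	int reuse SKETCH_MAYBE_UNUSED = 0;

	if (idle_available)
		reuse = 1;
#ifdef SKETCH_USE_SSL
	*ssl_reused = reuse;
#else
	(void)ssl_reused;
#endif
	return idle_available != 0;
}
```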
Implement the possibility to reuse idle connections when performing
server checks. This is done thanks to the recently introduced functions
be_calculate_conn_hash() and be_reuse_connection().
One side effect of this change is that be_calculate_conn_hash() can now
be called with a NULL stream instance. As such, part of the functions
are adjusted accordingly.
Note that to simplify configuration, connection reuse is not performed
if any specific check connection parameters are defined on the server
line or via the tcp-check connect rule. This is performed via newly
defined tcpcheck_use_nondefault_connect().
Define a new server keyword check-reuse-pool, and its counterpart with a
"no" prefix. For the moment, only parsing is implemented. The real
behavior adjustment will be implemented in the next patch.
Adjust the newly defined be_reuse_connection() API. The stream argument is
removed. This will allow checks to invoke it without relying
on a stream instance.
Following the previous patch, the part directly related to connection
reuse is extracted from connect_server(). It is now defined in a new
function be_reuse_connection().
On connection reuse, a hash is first calculated. It is generated from
various connection parameters, to retrieve a matching connection.
Extract hash calculation from connect_server() into a new dedicated
function be_calculate_conn_hash(). The objective is to be able to
perform connection reuse for checks without invoking connect_server(),
which relies on a stream instance.
The main objective of this patch is to remove the stream instance from
conn_backend_get() parameters. This would allow reuse to be performed
outside of stream contexts, for example for checks.
Previously, if a server reached its pool-high-count limit, connections
were killed on connect_server() when reuse was not possible. However,
this is now performed even if reuse is done since the following patch :
b3397367dc
MEDIUM: connections: Kill connections even if we are reusing one.
Thus, adjust the related comment to reflect this state.
Recently, work on connection reuse revealed an issue when mixed with
transparent proxy and set-dst. This patch rewrites the related regtests
to be able to catch this now fixed bug.
Note that it is the first regtest which relies on the recently
introduced bc_reused sample fetch. This fetch could be reused in other
connection reuse regtests to simplify them.
On backend connection reuse, a hash is calculated from various
parameters, to ensure the selected connection matches the requested
parameters. Notably, destination address is one of these parameters.
However, it is only taken into account if using a transparent server
(server address 0.0.0.0).
This may cause an issue where an incorrect connection is reused, which is
not targeted to the correct destination address. This may be the case
if a set-dst/set-dst-port is used with a transparent proxy (proxy option
transparent).
The fix is simple enough. Destination address is now always used as
input to the connection reuse hash.
This must be backported up to 2.6. Note that for reverse HTTP to work,
it relies on the following patch, which ensures destination address
remains NULL in this case.
commit e94baf6ca71cb2319610baa74dbf17b9bc602b18
BUG/MINOR: rhttp: fix incorrect dst/dst_port values
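The idea can be sketched as follows, assuming a simplified parameter set.
struct reuse_params, reuse_hash() and the FNV-1a mixing function are
illustrative assumptions only; the real hash input and hash function
differ. The point shown is that the destination address is mixed in
whenever it is set, and left out only when it is NULL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of the parameters feeding the hash. */
struct reuse_params {
	const char *srv_name;   /* target server identifier */
	const uint8_t *dst;     /* raw destination address, or NULL */
	size_t dst_len;         /* length of <dst> in bytes */
	uint16_t dst_port;      /* destination port */
};

/* FNV-1a, used here only as a simple stand-in hash. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

static uint64_t reuse_hash(const struct reuse_params *p)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	h = fnv1a(h, p->srv_name, strlen(p->srv_name));
	/* always mixed in when set, whatever the server type; it stays
	 * NULL in cases such as reverse HTTP.
	 */
	if (p->dst) {
		h = fnv1a(h, p->dst, p->dst_len);
		h = fnv1a(h, &p->dst_port, sizeof(p->dst_port));
	}
	return h;
}
```

With such a scheme, two requests for the same server but different
destination addresses no longer map to the same idle connection.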
Previously, the destination address of a backend connection was
systematically reassigned. However, this step is unnecessary on connection
reuse. Indeed, reuse should only be conducted with connection using the
same destination address matching the stream requirements.
This patch removes this unnecessary assignment. It is now only performed
when reuse cannot be conducted and a new connection is instantiated.
Functionally speaking, this patch should not change anything in theory,
as reuse is performed in conformance with the destination address.
However, it appears that it was not always properly enforced. The
systematic assignment of the destination address hides these issues, so
it is now removed. The identified bogus cases will then be fixed in the
following patches.
This should be backported up to all stable versions.
With a @rhttp server, connect is not possible, transfer is only possible
via idle connection reuse. The server does not have any network address.
Thus, it is unnecessary to allocate the stream destination address prior
to connection reuse. This patch adjusts this by fixing
alloc_dst_address() to take this into account.
Prior to this patch, alloc_dst_address() would incorrectly assimilate a
@rhttp server with a transparent proxy mode. Thus stream destination
address would be copied from the destination address. Connection address
would then be rewritten with this incorrect value. This did not impact
connect or reuse as destination addr is only used in idle conn hash
calculation for transparent servers. However, it causes incorrect values
for dst/dst_port samples.
This should be backported up to 2.9.
It may happen that the server is going down, and fwlc_srv_reposition()
is still called, because streams still attached to the server are
being terminated.
So in fwlc_srv_reposition(), just do nothing if we've been removed from
the tree.
This should fix github issue #2919.
This should not be backported, unless commit
9fe72bba3c is also backported.

As Ilya reported in issue #2911, the CONCAT() macro breaks on NetBSD
which defines its own as __CONCAT() (which is exactly the same). Let's
just undefine it before ours to fix the issue instead of renaming, but
keep ours so that we don't have doubts about what we're running with.
Note that the patch introducing this breaking change was backported
to 3.0.
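The approach can be sketched as below. The first definition only
simulates a system header that already provides CONCAT(); the helper
name SKETCH_CONCAT2 and the example function are assumptions made for
this illustration and are not taken from the patch.

```c
/* Simulates a system header that already provides its own CONCAT(). */
#define CONCAT(x, y) x ## y

/* Drop any pre-existing definition, then install ours. The two-level
 * form lets macro arguments be expanded before being pasted.
 */
#ifdef CONCAT
#undef CONCAT
#endif
#define SKETCH_CONCAT2(a, b) a ## b
#define CONCAT(a, b) SKETCH_CONCAT2(a, b)

#define SKETCH_SUFFIX 42

/* Expands to: static int value_42(void) */
static int CONCAT(value_, SKETCH_SUFFIX)(void)
{
	return 42;
}
```

Without the #undef, the second definition would conflict with the first
one whenever the two are not spelled identically.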
As reported by Uku Srmus in GitHub issue #2917, two "tcp-request" rules
in an example were mistakenly missing the "content" hook, rendering them
invalid.
This can be backported.
"raw" logformat node typecast is a special value (unlike str,bool,int..)
which tells haproxy to completely ignore logformat options (including
encoding ones) and force binary output for the current node only. It is
mainly intended for use with JSON or CBOR encoders in order to generate
nested CBOR or nested JSON by storing intermediate log-formats within
variables and assembling the final object in the parent log-format.
Example:
http-request set-var-fmt(txn.intermediate) "%{+json}o %(lower)[str(value)]"
log-format "%{+json}o %(upper)[str(value)] %(intermediate:raw)[var(txn.intermediate)]"
Would produce:
{"upper": "value", "intermediate": {"lower": "value"}}
src/ssl_ckch.c: In function ‘ckch_conf_parse’:
src/ssl_ckch.c:4852:40: error: potential null pointer dereference [-Werror=null-dereference]
4852 | while (*r) {
| ^~
Add a test on r before using *r.
No backport needed.
fdcb97614c ("MINOR: ssl/ckch: add substring parser for ckch_conf")
introduced a leak in the error path when the strndup fails.
This patch fixes issue #2920. No backport needed.
For leastconn, servers used to just be stored in an ebtree.
Each server would be one node.
Change that so that nodes contain multiple mt_lists. Each list
will contain servers that share the same key (typically meaning
they have the same number of connections). Using mt_lists means
that as long as tree elements already exist, moving a server from
one tree element to another no longer requires the lbprm write
lock.
We use multiple mt_lists to reduce the contention when moving
a server from one tree element to another. A list in the new
element will be chosen randomly.
We no longer remove a tree element as soon as it no longer
contains any server. Instead, we keep a list of all elements,
and when we need a new element, we look at that list only if it
contains a number of elements already, otherwise we'll allocate
a new one. Keeping nodes in the tree ensures that we very
rarely have to take the lbprm write lock (as it only happens
when we're moving the server to a position for which no
element is currently in the tree).
The number of mt_lists used is defined as FWLC_NB_LISTS.
The number of tree elements we want to keep is defined as
FWLC_MIN_FREE_ENTRIES, both in defaults.h.
The values used were picked after experimentation, and
seem to be the best trade-off between performance and memory
usage.
Doing that gives a good performance boost when a lot of
servers are used.
With a configuration using 500 servers, about 830000 requests
per second could be processed before that patch, versus about
1550000 requests per second with that patch, on a 64-core AMD,
using 1200 concurrent connections.
Add two new methods to lbprm, server_deinit() and proxy_deinit(),
in case something should be done at the lbprm level when
removing servers and proxies.
Implement mt_list_try_lock_prev(), that does the same thing
as mt_list_lock_prev(), except that if the list is locked, it
returns { NULL, NULL } instead of waiting.
jwk_thumbprint() is a function which implements RFC7638 and emits a
JWK thumbprint using an EVP_PKEY.
EVP_PKEY_EC_to_pub_jwk() and EVP_PKEY_RSA_to_pub_jwk() were changed in
order to match what is required to emit a thumbprint (i.e. no spaces or
line breaks, and lexicographic order of the fields).
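For illustration, the canonical JSON that gets hashed for a thumbprint
can be sketched as below: only the required members, sorted
lexicographically, with no whitespace. The two helper functions are
hypothetical and only build the string; the SHA-256 and base64url steps
that produce the final thumbprint are left out.

```c
#include <stdio.h>

/* Builds the thumbprint input for an EC public key. <crv> is the curve
 * name (e.g. "P-256"), <x> and <y> are the base64url-encoded coordinates.
 * Returns the length written, or -1 if <dst> is too small.
 */
static int ec_thumbprint_input(char *dst, size_t size, const char *crv,
                               const char *x, const char *y)
{
	int ret = snprintf(dst, size,
	                   "{\"crv\":\"%s\",\"kty\":\"EC\",\"x\":\"%s\",\"y\":\"%s\"}",
	                   crv, x, y);

	return (ret < 0 || (size_t)ret >= size) ? -1 : ret;
}

/* Same for an RSA public key: <e> and <n> are the base64url-encoded
 * exponent and modulus.
 */
static int rsa_thumbprint_input(char *dst, size_t size, const char *e,
                                const char *n)
{
	int ret = snprintf(dst, size,
	                   "{\"e\":\"%s\",\"kty\":\"RSA\",\"n\":\"%s\"}",
	                   e, n);

	return (ret < 0 || (size_t)ret >= size) ? -1 : ret;
}
```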
The purpose is mainly to exhibit certain limitations that come with such
less common programming models, to show users how to program interactive
tools in Lua, and how to connect interactively.
Other use cases that could be envisioned are "top" and various monitoring
utilities, with sliding graphs etc. Lua is particularly attractive for
this usage, easy to program, well known from most AI tools (including its
integration into haproxy), making such programs very quick to obtain in
their basic form, and to improve later.
A very limited example game is provided, following the principle of a
very popular one, where the player must compose lines from falling
pieces. It quickly revealed the need for the ability to enforce a timeout
on applet:receive(). Other identified limitations include the difficulty
from the Lua side to monitor multiple events at once, but it seems that
callbacks and/or event dispatchers would be useful here.
At the moment the CLI is not workable (its interactivity was broken in 2.9
when line buffering was adopted), though it was verified that it works
with older releases.
The command needed to connect to the game is displayed as a notice message
during boot.
When first pre-parsing the config to detect the presence or absence of
the master mode, we must not emit messages because they are not supposed
to be visible at this point, otherwise they appear twice each. The
pre-parsing, also called discovery mode, is only for internal use,
thus it should remain silent.
This should be backported to 3.1 where this mode was introduced.
This adds "group-by-{2,3,4}-clusters", which, as its name implies,
creates one thread group per X clusters. This can be useful when CPUs
are split into too small clusters, as well as when the total number
of assigned cores is not even between the clusters, to try to spread
the load between fewer different ones.
When emitting the CPU topology info with -dc, also emit a list of
thread-to-CPU mapping. The group/thread and thread ID are emitted
with the list of their CPUs on each line. The count of CPUs is shown
to ease comparisons, and as much as possible, we try to pack identical
lines within a group by showing thread ranges.
The new function "print_cpu_set()" will print cpu sets in a human-friendly
way, with commas and dashes for intervals. The goal is to keep them compact
enough.
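The output format described above can be sketched as follows. This is
not the actual print_cpu_set() code: format_cpu_mask() is a hypothetical
helper working on a plain 64-bit mask and writing into a buffer, just to
show how runs of consecutive CPUs collapse into dashed ranges.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: formats the CPUs set in a 64-bit mask into <dst>
 * as comma-separated values with dashes for ranges.
 * Returns the number of characters written, or -1 if <dst> is too small.
 */
static int format_cpu_mask(char *dst, size_t size, uint64_t mask)
{
	size_t len = 0;
	int cpu = 0;

	if (!size)
		return -1;
	dst[0] = 0;

	while (cpu < 64) {
		int first, last, ret;

		if (!(mask & (1ULL << cpu))) {
			cpu++;
			continue;
		}
		/* find the end of this run of consecutive CPUs */
		first = last = cpu;
		while (last + 1 < 64 && (mask & (1ULL << (last + 1))))
			last++;

		if (first == last)
			ret = snprintf(dst + len, size - len, "%s%d",
			               len ? "," : "", first);
		else
			ret = snprintf(dst + len, size - len, "%s%d-%d",
			               len ? "," : "", first, last);

		if (ret < 0 || (size_t)ret >= size - len)
			return -1;
		len += ret;
		cpu = last + 1;
	}
	return (int)len;
}
```

For example, a mask with CPUs 0 to 3, 8, 10 and 11 set is rendered as
"0-3,8,10-11".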
It was previously done in thread_detect_count() but that's not quite
handy because we still don't know about the groups setting. Better do
it slightly later and have all the relevant info instead.
GCC 15 throws the following warning on fixed-size char arrays if they do not
contain a terminating NUL:
src/tools.c:2041:25: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization]
2041 | const char hextab[16] = "0123456789ABCDEF";
We are using a couple of such definitions for some constants. Converting them
to flexible arrays, like: hextab[] = "0123456789ABCDEF" may have consequences,
as enlarged arrays won't fit anymore where they were possibly located due to
the memory alignment constraints.
GCC adds 'nonstring' variable attribute for such char arrays, but clang and
other compilers don't have it. Let's wrap 'nonstring' with our
__nonstring macro, which will test if the compiler supports this attribute.
This fixes the issue #2910.
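A minimal sketch of such a wrapper is shown below. The names
SKETCH_NONSTRING, sketch_hextab and hex_digit() are hypothetical and only
serve the illustration; the actual __nonstring definition in the tree may
be written differently.

```c
/* Hypothetical wrapper name; expands to the attribute only when the
 * compiler reports support for it, and to nothing otherwise.
 */
#if defined(__has_attribute)
#  if __has_attribute(nonstring)
#    define SKETCH_NONSTRING __attribute__((nonstring))
#  endif
#endif
#ifndef SKETCH_NONSTRING
#  define SKETCH_NONSTRING
#endif

/* Exactly 16 bytes, deliberately without a trailing NUL. */
static const char sketch_hextab[16] SKETCH_NONSTRING = "0123456789ABCDEF";

/* Returns the hexadecimal digit for the low 4 bits of <v>. */
static char hex_digit(unsigned int v)
{
	return sketch_hextab[v & 15];
}
```

The array keeps its original 16-byte size, so nothing changes in the
memory layout.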
gcc 15 throws such kind of warnings about initialization of some char arrays:
src/log.c:181:33: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization]
181 | const char sess_term_cond[16] = "-LcCsSPRIDKUIIII"; /* normal, Local, CliTo, CliErr, SrvTo, SrvErr, PxErr, Resource, Internal, Down, Killed, Up, -- */
| ^~~~~~~~~~~~~~~~~~
src/log.c:182:33: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Werror=unterminated-string-initialization]
182 | const char sess_fin_state[8] = "-RCHDLQT"; /* cliRequest, srvConnect, srvHeader, Data, Last, Queue, Tarpit */
So, let's make it happy by not giving the sizes of these char arrays
explicitly, so that they can accommodate the NUL terminators.
Reported in GitHub issue #2910.
This should be backported up to 2.6.
This test brought by commit 8ed1e91efd ("MEDIUM: lb-chash: add directive
hash-preserve-affinity") seems to have hit a limitation of what can be
expressed in vtc, as it would be desirable to have one server response
release two clients at once but the various attempts using barriers
have failed so far. The test seems to work fine locally but still fails
almost 100% of the time on the CI, so it remains timing dependent in
some ways. Tests have been done with nbthread 1, pool-idle-shared off,
http-reuse never (since always fails locally) etc but to no avail. Let's
just mark it broken in case we later figure another way to fix it. It's
still usable locally most of the time, though.
By default, pools of comparable sizes are merged together. However, the
current algorithm is dumb: it rounds the requested size to the next
multiple of 16 and compares the sizes like this. This results in many
entries which are already multiples of 16 not being merged, for example
1024 and 1032 are separate, 65536 and 65540 are separate, 48 and 56 are
separate (though 56 merges with 64).
This commit changes this to consider not just the entry size but also the
average entry size, that is, it compares the average size of all objects
sharing the pool with the size of the object looking for a pool. If the
object is not more than 1% bigger nor smaller than the current average
size or if it is neither 16 bytes smaller nor larger, then it can be merged.
Also, it always respects exact matches in order to avoid merging objects
into larger pools or worse, extending existing ones for no reason, and
when there's a tie, it always avoids extending an existing pool.
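The difference between the two rules can be sketched as follows. Both
functions are hypothetical simplifications: the names are made up, the
inclusive thresholds are an assumption, and the exact-match and
tie-breaking logic described above is not modelled.

```c
#include <stddef.h>

/* Old rule: round each size up to the next multiple of 16 and only
 * share a pool when the rounded sizes are equal.
 */
static size_t round_up_16(size_t sz)
{
	return (sz + 15) & ~(size_t)15;
}

static int old_rule_same_pool(size_t a, size_t b)
{
	return round_up_16(a) == round_up_16(b);
}

/* New rule, simplified: an object of size <sz> may join a pool whose
 * users have an average size of <avg> when it is within 1% of that
 * average, or within 16 bytes of it (thresholds assumed inclusive).
 */
static int new_rule_can_merge(size_t avg, size_t sz)
{
	size_t diff = avg > sz ? avg - sz : sz - avg;

	return diff * 100 <= avg || diff <= 16;
}
```

With these definitions, 1024 and 1032 are kept apart by the old rule but
accepted by the new one, as are 65536 and 65540.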
Also, we now visit all existing pools in order to spot the best one, we
do not stop anymore at the smallest one large enough. Theoretically this
could cost a bit of CPU but in practice it's O(N^2) with N quite small
(typically in the order of 100) and the cost at each step is very low
(compare a few integer values). But as a side effect, pools are no
longer sorted by size, "show pools bysize" is needed for this.
This causes the objects to be much better grouped together, accepting to
use a little bit more sometimes to avoid fragmentation, without causing
everyone to be merged into the same pool. Thanks to this we're now
seeing 36 pools instead of 48 by default, with some very nice examples
of compact grouping:
- Pool qc_stream_r (80 bytes) : 13 users
> qc_stream_r : size=72 flags=0x1 align=0
> quic_cstrea : size=80 flags=0x1 align=0
> qc_stream_a : size=64 flags=0x1 align=0
> hlua_esub : size=64 flags=0x1 align=0
> stconn : size=80 flags=0x1 align=0
> dns_query : size=64 flags=0x1 align=0
> vars : size=80 flags=0x1 align=0
> filter : size=64 flags=0x1 align=0
> session pri : size=64 flags=0x1 align=0
> fcgi_hdr_ru : size=72 flags=0x1 align=0
> fcgi_param_ : size=72 flags=0x1 align=0
> pendconn : size=80 flags=0x1 align=0
> capture : size=64 flags=0x1 align=0
- Pool h3s (56 bytes) : 17 users
> h3s : size=56 flags=0x1 align=0
> qf_crypto : size=48 flags=0x1 align=0
> quic_tls_se : size=48 flags=0x1 align=0
> quic_arng : size=56 flags=0x1 align=0
> hlua_flt_ct : size=56 flags=0x1 align=0
> promex_metr : size=48 flags=0x1 align=0
> conn_hash_n : size=56 flags=0x1 align=0
> resolv_requ : size=48 flags=0x1 align=0
> mux_pt : size=40 flags=0x1 align=0
> comp_state : size=40 flags=0x1 align=0
> notificatio : size=48 flags=0x1 align=0
> tasklet : size=56 flags=0x1 align=0
> bwlim_state : size=48 flags=0x1 align=0
> xprt_handsh : size=48 flags=0x1 align=0
> email_alert : size=56 flags=0x1 align=0
> caphdr : size=41 flags=0x1 align=0
> caphdr : size=41 flags=0x1 align=0
- Pool quic_cids (32 bytes) : 13 users
> quic_cids : size=16 flags=0x1 align=0
> quic_tls_ke : size=32 flags=0x1 align=0
> quic_tls_iv : size=12 flags=0x1 align=0
> cbuf : size=32 flags=0x1 align=0
> hlua_queuew : size=24 flags=0x1 align=0
> hlua_queue : size=24 flags=0x1 align=0
> promex_modu : size=24 flags=0x1 align=0
> cache_st : size=24 flags=0x1 align=0
> spoe_appctx : size=32 flags=0x1 align=0
> ehdl_sub_tc : size=32 flags=0x1 align=0
> fcgi_flt_ct : size=16 flags=0x1 align=0
> sig_handler : size=32 flags=0x1 align=0
> pipe : size=24 flags=0x1 align=0
- Pool quic_crypto (1032 bytes) : 2 users
> quic_crypto : size=1032 flags=0x1 align=0
> requri : size=1024 flags=0x1 align=0
- Pool quic_conn_r (65544 bytes) : 2 users
> quic_conn_r : size=65536 flags=0x1 align=0
> dns_msg_buf : size=65540 flags=0x1 align=0
On a very unscientific test consisting in sending 1 million H1 requests
and 1 million H2 requests to the stats page, we're seeing an ~6% lower
memory usage with the patch:
before the patch:
Total: 48 pools, 4120832 bytes allocated, 4120832 used (~3555680 by thread caches).
after the patch:
Total: 36 pools, 3880648 bytes allocated, 3880648 used (~3299064 by thread caches).
This should be taken with care however since pools allocate and release
in batches.