Commit Graph

7956 Commits

Author SHA1 Message Date
Aurelien DARRAGON
ee288a4eef REORG: log: reorder send log helpers by dependency order
This commit looks messy, but all it does is reorganize send_log() helpers
by dependency order to remove the need of forward-declaring some of them.

Also, since they're all internal helpers, let's explicitly mark them as
static to prevent any misuse.
2024-06-13 15:43:09 +02:00
Amaury Denoyelle
88681681cc MINOR: quic: refactor qc_build_pkt() error handling
qc_build_pkt() error handling was difficult due to multiple error code
possible. Improve this by defining a proper enum to describe the various
error code. Also clean up ending labels inside qc_build_pkt().
2024-06-12 18:05:40 +02:00
Christopher Faulet
9748df29ff BUG/MEDIUM: mux-quic: Don't unblock zero-copy fwding if blocked during nego
The previous fix (792a645ec2 ["BUG/MEDIUM: mux-quic: Unblock zero-copy
forwarding if the txbuf can be released"]) introduced a regression. The
zero-copy data forwarding must only be unblocked if it was blocked by the
producer, after a successful negotiation.

It is important because during a negotiation, the consumer may be blocked
for another reason. Because of the flow control for instance. In that case,
there is not necessarily a TX buffer. And it unexpected to try to release an
unallocated TX buf.

In addition, the same may happen while a TX buf is still in-use. In that
case, it must also not be released. So testing the TX buffer is not the
right solution.

To fix the issue, a new IOBUF flag was added (IOBUF_FL_FF_WANT_ROOM). It
must be set by the producer if it is blocked after a sucessful negotiation
because it needs more room. In that case, we know a buffer was provided by
the consummer. In done_fastfwd() callback function, it is then possible to
safely unblock the zero-copy data forwarding if this flag is set.

This patch must be backported to 3.0 with the commit above.
2024-06-05 07:28:10 +02:00
Willy Tarreau
1eb0f22ee1 [RELEASE] Released version 3.1-dev0
Released version 3.1-dev0 with the following main changes :
    - MINOR: version: mention that it's development again
2024-05-29 15:00:02 +02:00
Willy Tarreau
555772e961 MINOR: version: mention that it's development again
This essentially reverts 2e42a19cde.
2024-05-29 14:59:19 +02:00
Willy Tarreau
2e42a19cde MINOR: version: mention that it's 3.0 LTS now.
The version will be maintained up to around Q2 2029. Let's
also update the INSTALL file to mention this.
2024-05-29 14:40:26 +02:00
Willy Tarreau
decb7c90df CLEANUP: ssl_sock: move dirty openssl-1.0.2 wrapper to openssl-compat
Valentine noticed this ugly SSL_CTX_get_tlsext_status_cb() macro
definition inside ssl_sock.c that is dedicated to openssl-1.0.2 only.
It would be better placed in openssl-compat.h, which is what this
patch does. It also addresses a missing pair of parenthesis and
removes an invalid extra semicolon.
2024-05-28 19:17:57 +02:00
Aurelien DARRAGON
435a9da267 MINOR: log: rename 'log-format tag' to 'log-format alias'
In 2.9 we started to introduce an ambiguity in the documentation by
referring to historical log-format variables ('%var') as log-format
tags in 739c4e5b1e ("MINOR: sample: accept_date / request_date return
%Ts / %tr timestamp values") and 454c372b60 ("DOC: configuration: add
sample fetches for timing events").

In fact, we've had this confusion between log-format tag and log-format
var for more than 10 years now, but in 2.9 it was the first time the
confusion was exposed in the documentation.

Indeed, both 'log-format variable' and 'log-format tag' actually refer
to the same feature (that is: '%B' and friends that can be used for
direct access to some log-oriented predefined fetches instead of using
%[expr] with generic sample expressions).

This feature was first implemented in 723b73ad75 ("MINOR: config: Parse
the string of the log-format config keyword") and later documented in
4894040fa ("DOC: log-format documentation"). At that time, it was clear
that we used to name it 'log-format variable'.

But later the same year, 'log-format tag' naming started to appear in
some commit messages (while still referring to the same feature), for
instance with ffc3fcd6d ("MEDIUM: log: report SSL ciphers and version
in logs using logformat %sslc/%sslv").

Unfortunately in 2.9 when we added (and documented) new log-format
variables we officially started drifting to the misleading 'log-format
tag' naming (perhaps because it was the most recent naming found for
this feature in git log history, or because the confusion has always
been there)

Even worse, in 3.0 this confusion led us to rename all 'var' occurrences
to 'tag' in log-format related code to unify the code with the doc.

Hopefully William quickly noticed that we made a mistake there, but
instead of reverting to historical naming (log-format variable), it was
decided that we must use a different name that is less confusing than
'tags' or 'variables' (tags and variables are keywords that are already
used to designate other features in the code and that are not very
explicit under log-format context today).

Now we refer to '%B' and friends as a logformat alias, which is
essentially a handy way to print some log oriented information in the
log string instead of leveraging '%[expr]' with generic sample expressions
made of fetches and converters. Of course, there are some subtelties, such
as a few log-format aliases that still don't have sample fetch equivalent
for historical reasons, and some aliases that may be a little faster than
their generic sample expression equivalents because most aliases are
pretty much hardcoded in the log building function. But in general
logformat aliases should be simply considered as an alternative to using
expressions (with '%[expr']')

Also, under log-format context, when we want to refer to either an alias
('%alias') or an expression ('%[expr]'), we should use the generic term
'logformat item', which in fact designates a single item within the
logformat string provided by the user. Indeed, a logformat item (whether
is is an alias or an expression) always starts with '%' and may accept
optional flags / arguments

Both the code and the documentation were updated in that sense, hopefully
this will clarify things and prevent future confusions.
2024-05-27 17:03:48 +02:00
Amaury Denoyelle
47168e217a MEDIUM: connection: use pool-conn-name instead of sni on reuse
Implement pool-conn-name support for idle connection reuse. It replaces
SNI as arbitrary identifier for connections in the idle pool. Thus,
every SNI reference in this context have been replaced.

Main change occurs in connect_server() where pool-conn-name sample fetch
is now prehash to generate idle connection identifier. SNI is now solely
used in the context of SSL for ssl_sock_set_servername().
2024-05-24 14:47:21 +02:00
Amaury Denoyelle
be4f89f2b2 MINOR: server: define pool-conn-name keyword
Define a new server keyword pool-conn-name. The purpose of this keyword
will be to identify connections inside the idle connections pool,
replacing SNI in case SSL is not wanted.

This keyword uses a sample expression argument. It thus can reuse
existing function parse_srv_expr() for parsing. In the future, it may be
necessary to define a keyword variant which uses a logformat for
extensability.

This patch only implement parsing. Argument is stored inside new server
field <pool_conn_name> and expression is generated in
_srv_parse_finalize() into <pool_conn_name_expr>.

If pool-conn-name is not set but SNI is, the latter is reused
automatically as pool-conn-name via _srv_parse_finalize(). This ensures
current reuse behavior remains compatible and idle connection reuse will
not mix connections with different SNIs by mistake.

Main usage will be for rhttp when SSL is not wanted between the two
haproxy instances. Previously, it was possible to use "sni" keyword even
without SSL on a server line which have a similar effect. However,
having a dedicated "pool-conn-name" keyword is deemed clearer. Besides,
it would allow for more complex configuration where pool-conn-name and
SNI are use in parallel with different values.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
91001422b4 MINOR: server: generalize sni expr parsing
Two functions exists for server sni sample expression parsing. This is
confusing so this commit aims at clarifying this.

Functions are renamed with the following identifiers. First function is
named parse_srv_expr() and can be used during parsing. Besides
expression parsing, it has ensure sample fetch validity in the context
of a server line.

Second function is renamed _parse_srv_expr() and is used internally by
parse_srv_expr(). It only implements sample parsing without extra
checks. It is already use for server instantiation derived from
server-template as checks were already performed. Also, it is now used
in http-client code as SNI is a fixed string.

Finally, both functions are generalized to remove any reference to SNI.
This will allow to reuse it to parse other server keywords which use an
expression. This will be the case for the future keyword pool-conn-name.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
5764bc50b5 BUG/MINOR: quic: adjust restriction for stateless reset emission
Review RFC 9000 and ensure restriction on Stateless reset are properly
enforced. After careful examination, several changes are introduced.

First, redefine minimal Stateless Reset emitted packet length to 21
bytes (5 random bytes + a token). This is the new default length used in
every case, unless received packet which triggered it is 43 bytes or
smaller.

Ensure every Stateless Reset packets emitted are at 1 byte shorter than
the received packet which triggered it. No Stateless reset will be
emitted if this falls under the above limit of 21 bytes. Thus this
should prevent looping issues.

This should be backported up to 2.6.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
45f40bac4c MEDIUM: config: prevent communication with privileged ports
This commit introduces a new global setting named
harden.reject_privileged_ports.{tcp|quic}. When active, communications
with clients which use privileged source ports are forbidden. Such
behavior is considered suspicious as it can be used as spoofing or
DNS/NTP amplication attack.

Value is configured per transport protocol. For each TCP and QUIC
distinct code locations are impacted by this setting. The first one is
in sock_accept_conn() which acts as a filter for all TCP based
communications just after accept() returns a new connection. The second
one is dedicated for QUIC communication in quic_recv(). In both cases,
if a privileged source port is used and setting is disabled, received
message is silently dropped.

By default, protection are disabled for both protocols. This is to be
able to backport it without breaking changes on stable release.

This should be backported as it is an interesting security feature yet
relatively simple to implement.
2024-05-24 14:36:31 +02:00
Aurelien DARRAGON
9d37c4b989 DEBUG: tools: add vma_set_name_id() helper
Just like vma_set_name() from 51a8f134e ("DEBUG: tools: add vma_set_name()
helper"), but also takes <id> as parameter to append "-$id" suffix after
the name in order to differentiate 2 areas that were named using the same
<type> and <name> combination.

example, using mmap + MAP_SHARED|MAP_ANONYMOUS:
  7364c4fff000-736508000000 rw-s 00000000 00:01 3540  [anon_shmem:type:name-id]
Another example, using mmap + MAP_PRIVATE|MAP_ANONYMOUS or using
glibc/malloc() above MMAP_THRESHOLD:
  7364c4fff000-736508000000 rw-s 00000000 00:01 3540  [anon:type:name-id]
2024-05-24 12:07:13 +02:00
Willy Tarreau
381ed2a4dd MINOR: config: add thread-hard-limit to set an upper bound to nbthread
On todays large systems, it's not always desired to run on all threads
for light loads, and usually users enforce nbthread to a lower value
(e.g. 8). The problem is that this is a fixed value, and moving such
configs to smaller machines continues to enforce the value and this
becomes extremely unproductive due to having more threads than CPUs.
This also happens quite a bit in VMs, containers, or cloud instances
of various sizes.

This commit introduces the thread-hard-limit setting that allows to only
set an upper bound to the number of threads without raising a lower value.
This means that using "thread-hard-limit 8" will make sure that no more
than 8 threads will be used when available, but it will remain two when
run on a dual-core machine.
2024-05-24 09:46:49 +02:00
Willy Tarreau
c7335d55f8 BUG/MEDIUM: quic_tls: prevent LibreSSL < 4.0 from negotiating CHACHA20_POLY1305
As diagnosed in GH issue #2569, there's currently an issue in LibreSSL's
CHACHA20 in-place implementation that makes haproxy discard incoming QUIC
packets encrypted with it. It's not very easy to observe the issue because:
  - QUIC recommends that CHACHA20 is used in priority
  - on x86 with AES-NI, LibreSSL prefers AES-GCM for performance
    reasons, so the problem is only observed there if a client
    explicitly forces TLS_CHACHA20_POLY1305_SHA256 only.
  - discarded packets cause retransmits showing some apparent activity,
    and the handshake succeeds so it's not easy to analyze from the
    client which thinks that the server is slow to respond.

Thus in practice, on non-x86 machines running LibreSSL, requests made over
QUIC freeze for a long time, unless the client explicitly forces algos
excluding TLS_CHACHA20_POLY1305_SHA256. That's typically the case by
default on modern OpenBSD systems, and was reported in the issue above
for an arm64 machine running OpenBSD -current, and was also observed on a
mips64 one running OpenBSD 7.5.

There is no simple solution to this problem due to some of the protocol's
constraints without digging too low into the stack (and risking to break
more). Here we're taking a pragmatic approach consisting in making the
connection fail hard when TLS_CHACHA20_POLY1305_SHA256 is selected,
regardless of the availability of other ciphers. This means that every
time a connection would have hung, instead it will fail fast, allowing
the client to retry over TLS/TCP.

Theo Buehler recommends that we limit this protection to all LibreSSL
versions before 4.0 since it's where the fix will be implemented. Older
stable versions will just see TLS_CHACHA20_POLY1305_SHA256 disabled,
which should be sufficient to make QUIC work there again as well.

The following config is sufficient to reproduce the issue (on a non-x86
machine, both arm64 & mips64 were confirmed to reproduce it):

    global
        limited-quic

    frontend stats
        mode http
        #bind :8181
        #bind :8443 ssl crt rsa+dh2048.pem
        bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
        timeout client 5s
        stats uri /

And the following commands will trigger the problem on affected LibreSSL
versions:
  curl --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -v --http3 -k https://127.0.0.1:8443/
  curl -v --http3 -k https://127.0.0.1:8443/

while these ones must work:
  curl --tls13-ciphers TLS_AES_128_GCM_SHA256 -v --http3 -k https://127.0.0.1:8443/
  curl --tls13-ciphers TLS_AES_256_GCM_SHA384 -v --http3 -k https://127.0.0.1:8443/

Normally all of them will work with LibreSSL 4, and only the first one
should fail with stable LibreSSL versions higher than 3.9.2. An haproxy
version without this workaround will show an unresponsive command after
the GET is sent, while a version with the workaround will close the
connection on error. On a version with this workaround, if TCP listeners
are uncommented, curl will automatically fall back to TCP and attempt
the reqeust again over HTTP/2. Finally, on OpenSSL 1.1.1 in compat mode
(hence the limited-quic option above) all of them must work.

Many thanks to github user @lgv5 for the detailed report, tests, and
for spotting the issue, and to @botovq (Theo Buehler) for the quick
analysis, patch and help on this workaround.

This needs to be backported to versions 2.6 and above.
2024-05-22 16:22:22 +02:00
Amaury Denoyelle
60496e884e MINOR: connection: support PROXY v2 TLV emission without stream
Update API for PROXY protocol header encoding. Previously, it requires
stream parameter to be set. Change make_proxy_line() and associated
functions to add an extra session parameter. This is useful in context
where no stream is instantiated. For example, this is the case for rhttp
preconnect.

This change allows to extend PROXY v2 TLV encoding. Replace
build_logline() which requires a stream instance and call directly
sess_build_logline().

Note that stream parameter is kept as it is necessary for unique ID
encoding.

This change has no functional change for standard connections. However,
it is necessary to support TLV encoding on rhttp preconnect.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
12c40c25a9 MEDIUM: rhttp: create session for active preconnect
Modify rhttp preconnect by instantiating a new session for each
connection attempt. Connection is thus linked to a session directly on
its instantiation contrary to previously where no session existed until
listener_accept().

This patch will allow to extend rhttp usage. Most notably, it will be
useful to use various sample fetches on the server line and extend
logging capabilities.

Changes are minimal, yet consequences are considered not trivial as for
the first time a FE connection session is instantiated before
listener_accept(). This requires an extra explicit check in
session_accept_fd() to not overwrite an existing session. Also, flag
SESS_FL_RELEASE_LI is not set immediately as listener counters must note
be decremented if connection and its session are freed before reversal
is completed, or else listener counters will be invalid.

conn_session_free() is used as connection destroy callback to ensure the
session will be freed automatically on connection release.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
45b80aed70 MINOR: session: define flag to explicitely release listener on free
When a session is allocated for a FE connection, session_free() is
responsible to call listener_release() to decrement listener connection
counters and resume listening.

Until now, <listener> member of session was tested inside session_free()
before invocating listener_release(). To highlight more explicitely the
relation between sessions and listeners, introduce a new flag
SESS_FL_RELEASE_LI. Only session with such flag set will invoke
listener_release() on their cleanup. Flag is set inside
session_accept_fd() on success.

This patch has no functional change. However, it will be useful to
implement session creation for rHTTP preconnect.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
2770ef352e BUG/MINOR: rhttp: prevent listener suspend
Ensure "disable frontend" on a reverse HTTP listener is forbidden by
returing -1 on suspend callback. Suspending such a listener has unknown
effect and so is not properly implemented for now.

This should be backported up to 2.9.
2024-05-22 10:01:57 +02:00
Valentine Krasnobaeva
5f713c03be BUG/MEDIUM: proto: fix fd leak in <proto>_connect_server
This fixes the fd leak, introduced in the commit d3fc982cd7
("MEDIUM: proto: make common fd checks in sock_create_server_socket").

Initially sock_create_server_socket() was designed to return only created
socket FD or -1. Its callers from upper protocol layers were required to test
the returned errno and were required then to apply different configuration
related checks to obtained positive sock_fd. A lot of this code was duplicated
among protocols implementations.

The new refactored version of sock_create_server_socket() gathers in one place
all duplicated checks, but in order to be complient with upper protocol
layers, it needs the 3rd parameter: 'stream_err', in which it sets the
Stream Error Flag for upper levels, if the obtained sock_fd has passed all
additional checks.

No backport needed since this was introduced in 3.0-dev10.
2024-05-21 20:14:05 +02:00
William Lallemand
e6657fd108 MEDIUM: ssl: don't load file by discovering them in crt-store
In commit 55e9e9591 ("MEDIUM: ssl: temporarily load files by detecting
their presence in crt-store"), ssl_sock_load_pem_into_ckch() was
replaced by ssl_sock_load_files_into_ckch() in the crt-store loading.

But the side effect was that we always try to autodetect, and this is
not what we want. This patch reverse this, and add specific code in the
crt-list loading, so we could autodetect in crt-list like it was done
before, but still try to load files when a crt-store filename keyword is
specified.

Example:

These crt-list lines won't autodetect files:

    foobar.crt [key foobar.key issuer foobar.issuer ocsp-update on] *.foo.bar
    foobar.crt [key foobar.key] *.foo.bar

These crt-list lines will autodect files:

    foobar.pem [ocsp-update on] *.foo.bar
    foobar.pem
2024-05-21 18:30:45 +02:00
Aurelien DARRAGON
51a8f134ef DEBUG: tools: add vma_set_name() helper
Following David Carlier's work in 98d22f21 ("MEDIUM: shctx: Naming shared
memory context"), let's provide an helper function to set a name hint on
a virtual memory area (ie: anonymous map created using mmap(), or memory
area returned by malloc()).

Naming will only occur if available, and naming errors will be ignored.
The function takes mandatory <type> and <name> parameterss to build the
map name as follow: "type:name". When looking at /proc/<pid>/maps, vma
named using this helper function will show up this way (provided that
the kernel has prtcl support for PR_SET_VMA_ANON_NAME):

example, using mmap + MAP_SHARED|MAP_ANONYMOUS:
  7364c4fff000-736508000000 rw-s 00000000 00:01 3540  [anon_shmem:type:name]
Another example, using mmap + MAP_PRIVATE|MAP_ANONYMOUS or using
glibc/malloc() above MMAP_THRESHOLD:
  7364c4fff000-736508000000 rw-s 00000000 00:01 3540  [anon:type:name]
2024-05-21 17:54:58 +02:00
Aurelien DARRAGON
0cfbeb1ae8 BUG/MINOR: ring: free ring's allocated area not ring's usable area when using maps
Since 40d1c84bf0 ("BUG/MAJOR: ring: free the ring storage not the ring
itself when using maps"), munmap() call for startup_logs's ring and
file-backed rings fails to work (EINVAL) and causes memory leaks during
process cleanup.

munmap() fails because it is called with the ring's usable area pointer
which is an offset from the underlying original memory block allocated
using mmap(). Indeed, ring_area() helper function was misused because
it didn't explicitly mention that the returned address corresponds to
the usable storage's area, not the allocated one.

To fix the issue, we add an explicit ring_allocated_area() helper to
return the allocated area for the ring, just like we already have
ring_allocated_size() for the allocated size, and we properly use both
the allocated size and allocated area to manipulate them using munmap()
and msync().

No backport needed.
2024-05-21 11:42:35 +02:00
William Lallemand
55e9e95914 MEDIUM: ssl: temporarily load files by detecting their presence in crt-store
crt-store is maint to be stricter than your common crt argument on a
bind line, and is supposed to be a declarative format.

However, since the 'ocsp-update' was migrated from ssl_conf to
ckch_conf, the .issuer file is not autodetected anymore when adding a
ocsp-update keyword in a crt-list file, which breaks retro-compatibility.

This patch is a quick fix that will disappear once we are able to be
strict on a crt-store and autodetect on a crt-list.
2024-05-17 17:35:51 +02:00
William Lallemand
58103bc8e6 MINOR: ssl: ckch_conf_cmp() compare multiple ckch_conf structures
The ckch_conf_cmp() function allow to compare multiple ckch_conf
structures in order to check that multiple usage of the same crt in the
configuration uses the same ckch_conf definition.

A crt-list allows to use "crt-store" keywords that defines a ckch_store,
that can lead to inconsistencies when a crt is called multiple time with
different parameters.

This function compare and dump a list of differences in the err variable
to be output as error.

The variant ckch_conf_cmp_empty() compares the ckch_conf structure to an
empty one, which is useful for bind lines, that are not able to have
crt-store keywords.

These functions are used when a crt-store is already inialized and we
need to verify if the parameters are compatible.

ckch_conf_cmp() handles multiple cases:

- When the previous ckch_conf was declared with CKCH_CONF_SET_EMPTY, we
  can't define any new keyword in the next initialisation
- When the previous ckch_conf was declared with keywords in a crtlist
  (CKCH_CONF_SET_CRTLIST), the next initialisation must have the exact
  same keywords.
- When the previous ckch_conf was declared in a "crt-store"
  (CKCH_CONF_SET_CRTSTORE), the next initialisaton could use no keyword
  at all or the exact same keywords.
2024-05-17 17:35:51 +02:00
William Lallemand
1bc6e990f2 MEDIUM: ssl/cli: handle crt-store keywords in crt-list over the CLI
This patch adds crt-store keywords from the crt-list on the CLI.

- keywords from crt-store can be used over the CLI when inserting
  certificate in a crt-list
- keywords from crt-store are dumped when showing a crt-list content
  over the CLI

The ckch_conf_kws.func function pointer needed a new "cli" parameter, in
order to differenciate loading that come from the CLI or from the
startup, as they don't behave the same. For example it must not try to
load a file on the filesystem when loading a crt-list line from the CLI.

dump_crtlist_sslconf() was renamed in dump_crtlist_conf() and takes a
new ckch_conf parameter in order to dump relevant crt-store keywords.
2024-05-17 17:35:51 +02:00
William Lallemand
2bcf38c7c8 MEDIUM: ssl: add ocsp-update.disable global option
This option allow to disable completely the ocsp-update.

To achieve this, the ocsp-update.mode global keyword don't rely anymore
on SSL_SOCK_OCSP_UPDATE_OFF during parsing to call
ssl_create_ocsp_update_task().

Instead, we will inherit the SSL_SOCK_OCSP_UPDATE_* value from
ocsp-update.mode for each certificate which does not specify its own
mode.

To disable completely the ocsp without editing all crt entries,
ocsp-update.disable is used instead of "ocsp-update.mode" which is now
only used as the default value for crt.
2024-05-17 17:35:51 +02:00
William Lallemand
2e6615b282 MINOR: ssl: ckch_conf_clean() utility function for ckch_conf
- ckch_conf_clean() to free() the content of a ckch_conf structure,
  mostly the string that were strdup()
2024-05-17 17:35:51 +02:00
William Lallemand
2b6b7fea58 MINOR: ssl/ocsp: use 'ocsp-update' in crt-store
Use the ocsp-update keyword in the crt-store section. This is not used
as an exception in the crtlist code anymore.

This patch introduces the "ocsp_update_mode" variable in the ckch_conf
structure.

The SSL_SOCK_OCSP_UPDATE_* enum was changed to a define to match the
ckch_conf on/off parser so we can have off to -1.
2024-05-17 17:35:51 +02:00
William Lallemand
462e5b0098 MINOR: ssl: handle PARSE_TYPE_INT and PARSE_TYPE_ONOFF in ckch_store_load_files()
The callback used by ckch_store_load_files() only works with
PARSE_TYPE_STR.

This allows to use a callback which will use a integer type for PARSE_TYPE_INT
and PARSE_TYPE_ONOFF.

This require to change the type of the callback to void * to pass either
a char * or a int depending of the parsing type.

The ssl_sock_load_* functions were encapsuled in ckch_conf_load_*
function just to match the type.

This will allow to handle crt-store keywords that are ONOFF or INT
types.
2024-05-17 17:35:51 +02:00
William Lallemand
db09c2168f CLEANUP: ssl/ocsp: remove the deprecated parsing code for "ocsp-update"
Remove the "ocsp-update" keyword handling from the crt-list.

The code was made as an exception everywhere so we could activate the
ocsp-update for an individual certificate.

The feature will still exists but will be parsed as a "crt-store"
keyword which will still be usable in a "crt-list". This will appear in
future commits.

This commit also disable the reg-tests for now.
2024-05-17 17:35:51 +02:00
William Lallemand
d616932076 MEDIUM: ssl/crtlist: loading crt-store keywords from a crt-list
This patch allows the usage of "crt-store" keywords from a "crt-list".

The crtstore_parse_load() function was splitted into 2 functions, so the
keywords parsing is done in ckch_conf_parse().

With this patch, crt are loaded with ckch_store_new_load_files_conf() or
ckch_store_new_load_files_path() depending on weither or not there is a
"crt-store" keyword.

More checks need to be done on "crt" bind keywords to ensure that
keywords are compatible.

This patch does not introduce the feature on the CLI.
2024-05-17 17:35:51 +02:00
William Lallemand
8526d666d2 MINOR: ssl: ckch_store_new_load_files_conf() loads filenames from ckch_conf
ckch_store_new_load_files_conf() is the equivalent of
new_ckch_store_load_files_path() but instead of trying to find the files
using a base filename, it will load them from a list of files.
2024-05-17 17:35:51 +02:00
Christopher Faulet
1a2699d5f7 CLEANUP: mux-h1: Remove unused H1S_F_ERROR_MASK mask value
This mask value is unused, so we can safely remove it. It is a chance
because its value was wrong. But there is no bug here, even in stable
versions, because it is no longer used in all versions.
2024-05-17 16:33:53 +02:00
Christopher Faulet
071057d112 REORG: mux-h1: Group H1S_F_BODYLESS_* flags
To ease reading of H1S flags, H1S_F_BODYLESS_REQ and H1S_F_BODYLESS_RESP
flags are grouped.
2024-05-17 16:33:53 +02:00
Christopher Faulet
8e55d29109 MINOR: mux-h1: Add a flag to ignore the request payload
There was a flag to skip the response payload on output, if any, by stating
it is bodyless. It is used for responses to HEAD requests or for 204/304
responses. This allow rewrites during analysis. For instance a HEAD request
can be rewrite to a GET request for any reason (ie, a server not supporting
HEAD requests). In this case, the server will send a response with a
payload. On frontend side, the payload will be skipped and a valid response
(without payload) will be sent to the client.

With this patch we introduce the corresponding flag for the request. It will
be used to skip the request payload. In addition, when payload must be
skipped for a request or a response, The zero-copy data forwarding is now
disabled.
2024-05-17 16:33:53 +02:00
Willy Tarreau
0999e3d959 CLEANUP: compat: make the MIN/MAX macros more reliable
After every release we say that MIN/MAX should be changed to be an
expression that only evaluates each operand once, and before every
version we forget to change it and we recheck that the code doesn't
misuse them. Let's fix them now.
2024-05-17 15:57:18 +02:00
Willy Tarreau
ea3b89952d BUILD: stick-tables: better mark the stktable_data as 32-bit aligned
Aurlien reported that clang's build was broken by the recent fix
845fb846c7 ("BUG/MEDIUM: stick-tables: properly mark stktable_data
as packed"), because it now wants to use a helper for some atomic
ops (to increment std_t_uint). While this makes no sense to do
something that slow on modern architectures like x86 and arm64 which
are fine with unaligned accesses, we actually we can simply mark the
struct as aligned to its smallest element which is 32-bit (but still
packed). With this, it was verified that it is enough for clang to
see that its 32-bit operations will always be aligned, while making
64-bit operations safe on 64-bit platforms that do not support unaligned
accesses.

This should be backported wherever the patch above is backported.
2024-05-17 11:00:45 +02:00
Amaury Denoyelle
216f70f989 MINOR: mux-quic: support glitches
Implement basic support for glitches on QUIC multiplexer. This is mostly
identical too glitches for HTTP/2.

A new configuration option named tune.quic.frontend.glitches-threshold
is defined to limit the number of glitches on a connection before
closing it.

Glitches counter is incremented via qcc_report_glitch(). A new
qcc_app_ops callback <report_susp> is defined. On threshold reaching, it
allows to set an application error code to close the connection. For
HTTP/3, value H3_EXCESSIVE_LOAD is returned. If not defined, default
code INTERNAL_ERROR is used.

For the moment, no glitch are reported for QUIC or HTTP/3 usage. This
will be added in future patches as needed.
2024-05-16 10:58:20 +02:00
Amaury Denoyelle
e094412337 MINOR: h3/qpack: adjust naming for errors
Rename enum values used for HTTP/3 and QPACK RFC defined codes. First
uses a prefix H3_ERR_* which serves as identifier between them. Also
separate QPACK values in a new dedicated enum qpack_err. This is deemed
cleaner.
2024-05-16 10:31:17 +02:00
Amaury Denoyelle
2dabcf30be MINOR: qpack: prepare error renaming
There is two distinct enums both related to QPACK error management. The
first one is dedicated to RFC defined code. The other one is a set of
internal values returned by qpack_decode_fs(). There has been issues
discovered recently due to the confusion between them.

Rename internal values with the prefix QPACK_RET_*. The older name
QPACK_ERR_* will be used in a future commit for the first enum.
2024-05-16 10:31:17 +02:00
Willy Tarreau
b0349cf2de MINOR: dynbuf: provide a b_dequeue() variant for multi-thread
In order to forcefully unregister a buffer waiter during an inter-thread
takeover under isolation, we'll need to that the function works without
th_ctx but the target thread's ctx instead. Let's implement this by
passing the target thread as an argument. Now b_dequeue() simply calls
this one with tid. It's OK it's not on that critical a path, especially
since the list has been checked for existence before performing the call.
2024-05-15 19:37:12 +02:00
Willy Tarreau
845fb846c7 BUG/MEDIUM: stick-tables: properly mark stktable_data as packed
The stktable_data union is made of types of varying sizes, and depending
on which types are stored in a table, some offsets might not necessarily
be aligned. This results in a bus error for certain regtests (e.g.
lb-services) on MIPS64. This bug may impact MIPS64, SPARC64, armv7 when
accessing a 64-bit counter (e.g. bytes) and depending on how the compiler
emitted the operation, and cause a trap that's emulated by the OS on RISCV
(heavy cost). x86_64 and armv8 are not affected at all.

Let's properly mark the struct with __attribute__((packed)) so that the
compiler emits the suitable unaligned-compatible instructions when
accessing the fields.

This should be backported to all versions where it applies.
2024-05-15 19:03:18 +02:00
Willy Tarreau
276cdc11e8 BUG/MEDIUM: htx: mark htx_sl as packed since it may be realigned
A test on MIPS64 revealed that the following reg tests would all
fail at the same place in htx_replace_stline() when updating
parts of the request line:
  reg-tests/cache/if-modified-since.vtc
  reg-tests/http-rules/h1or2_to_h1c.vtc
  reg-tests/http-rules/http_after_response.vtc
  reg-tests/http-rules/normalize_uri.vtc
  reg-tests/http-rules/path_and_pathq.vtc

While the status line is normally aligned since it's the first block
of the HTX, it may become unaligned once replaced. The problem is, it
is a structure which contains some u16 and u32, and dereferencing them
on machines not natively supporting unaligned accesses makes them crash
or handle crap. Typically, MIPS/MIPS64/SPARC will crash, ARMv5 will
either crash or (more likely) return swapped values and do crap, and
RISCV will trap and turn to slow emulation.

We can assign the htx_sl struct the packed attribute, but then this
also causes the ints to fill the 2-bytes gap before them, always causing
unaligned accesses for this part on such machines. The patch does a bit
better, by explicitly filling this two-bytes hole, and packing the
struct.

This should be backported to all versions.
2024-05-15 19:03:17 +02:00
Amaury Denoyelle
86aafd0236 BUG/MINOR: qpack: fix error code reported on QPACK decoding failure
qpack_decode_fs() is used to decode QPACK field section on HTTP/3
headers parsing. Its return value is incoherent as it returns either
QPACK_DECOMPRESSION_FAILED defined in RFC 9204 or any other internal
values defined in qpack-dec.h. On failure, such return code is reused by
HTTP/3 layer to be reported via a CONNECTION_CLOSE frame. This is
incorrect if an internal error values was reported as it is not defined
by any specification.

Fir return values of qpack_decode_fs() in two ways. Firstly, fix invalid
usages of QPACK_DECOMPRESSION_FAILED when decoded content is too large
for the correct internal error QPACK_ERR_TOO_LARGE.

Secondly, adjust qpack_decode_fs() API to only returns internal code
values. A new internal enum QPACK_ERR_DECOMP is defined to replace
QPACK_DECOMPRESSION_FAILED. Caller is responsible to convert it to a
suitable error value. For other internal values, H3_INTERNAL_ERROR is
used. This is done through a set of convert functions.

This should be backported up to 2.6. Note that trailers are not
supported in 2.6 so chunk related to h3_trailers_to_htx() can be safely
skipped.
2024-05-15 16:07:15 +02:00
Willy Tarreau
fc792694a6 MEDIUM: dynbuf: use emergency buffers upon failed memory allocations
Now, if a pool_alloc() fails for a buffer and if conditions are met
based on the queue number, we'll try to get an emergency buffer.

Thanks to this the situation is way more stable now. With only 4 reserve
buffers and 1 buffer it's possible to reliably serve 500 concurrent end-
to-end H1 connections and consult stats in parallel in loops showing the
growing number of buf_wait events in "show activity" without facing an
instant stall like in the past. Lower values still cause quick stalls
though.

It's also apparent that some subsystems do not seem to detach from the
buffer_wait lists when leaving. For example several crashes in the H1
part showed list elements still present after a free(), so maybe some
operations performed inside h1_release() after the b_dequeue() call
can sometimes result in a new allocation. Same for streams, where
the dequeue is done relatively early.
2024-05-10 17:18:13 +02:00
Willy Tarreau
0ce51dc93b MEDIUM: dynbuf: implement emergency buffers
The buffer reserve set by tune.buffers.reserve has long been unused, and
in order to deal gracefully with failed memory allocations we'll need to
resort to a few emergency buffers that are pre-allocated per thread.

These buffers are only for emergency use, so every time their count is
below the configured number a b_free() will refill them. For this reason
their count can remain pretty low. We changed the default number from 2
to 4 per thread, and the minimum value is now zero (e.g. for low-memory
systems). The tune.buffers.limit setting has always been a problem when
trying to deal with the reserve but now we could simplify it by simply
pushing the limit (if set) to match the reserve. That was already done in
the past with a static value, but now with threads it was a bit trickier,
which is why the per-thread allocators increment the limit on the fly
before allocating their own buffers. This also means that the configured
limit is saner and now corresponds to the regular buffers that can be
allocated on top of emergency buffers.

At the moment these emergency buffers are not used upon allocation
failure. The only reason is to ease bisecting later if needed, since
this commit only has to deal with resource management.
2024-05-10 17:18:13 +02:00
Willy Tarreau
5b8d27617f MEDIUM: channel: allocate without queuing when retrying
Now when trying to allocate a channel buffer, we can check if we've been
notified of availability via the producer stream connector callback, in
which case we should not consult the queue, or if we're doing a first
allocation and check the queue.
2024-05-10 17:18:13 +02:00
Willy Tarreau
f552f79ba5 MINOR: mux-h1: report that a buffer allocation succeeded
When the buffer allocation callback is notified of a buffer availability,
it will now set a MAYALLOC flag in addition to clearing the ALLOC one, for
each of the 3 levels where we may fail an allocation. The flag will be
cleared upon a successful allocation. This will soon be used to decide to
re-allocate without waiting again in the queue. For now it has no effect.

There's just a trick, we need to clear the various *_ALLOC flags before
testing h1_recv_allowed() otherwise it will return false!
2024-05-10 17:18:13 +02:00
Willy Tarreau
cb2d758043 MINOR: applet: report about buffer allocation success
When appctx_buf_available() is called, it now sets APPCTX_FL_IN_MAYALLOC
or APPCTX_FL_OUT_MAYALLOC depending on the reportedly permitted buffer
allocation, and these flags are cleared when the said buffers are
allocated. For now they're not used for anything else.
2024-05-10 17:18:13 +02:00
Willy Tarreau
17d8916bb1 MINOR: stream: report that a buffer allocation succeeded
When the buffer allocation callback is notified of a buffer availability,
it will now set a MAYALLOC flag on the stream so that the stream knows it
is allowed to bypass the queue checks. For now this is not used.
2024-05-10 17:18:13 +02:00
Willy Tarreau
7aff64518c MINOR: stconn: report that a buffer allocation succeeded
We used to have two states for the channel's input buffer used by the SC,
NEED_BUFF or not, flipped by sc_need_buff() and sc_have_buff(). We want to
have a 3rd state, indicating that we've just got a desired buffer. Let's
add an HAVE_BUFF flag that is set by sc_have_buff() and that is cleared by
sc_used_buff(). This way by looking at HAVE_BUFF we know that we're coming
back from the allocation callback and that the offered buffer has not yet
been used.
2024-05-10 17:18:13 +02:00
Willy Tarreau
d1eb48a12b MEDIUM: dynbuf: refrain from offering a buffer if more critical ones are waiting
Now b_alloc() will check the queues at the same and higher criticality
levels before allocating a buffer, and will refrain from allocating one
if these are not empty. The purpose is to put some priorities in the
allocation order so that most critical allocators are offered a chance
to complete.

However in order to permit a freshly dequeued task to allocate again while
siblings are still in the queue, there is a special DB_F_NOQUEUE flag to
pass to b_alloc() that will take care of this special situation.
2024-05-10 17:18:13 +02:00
Willy Tarreau
4a42af1744 MINOR: applet: adjust the allocation criticity based on the requested buffer
When we want to allocate an in buffer, it's in order to pass data to
the applet, that will consume it, so it must be seen as the same as
a send() from the higher level, i.e. MUX_TX. And for the outbuf, it's
a stream endpoint returning data, i.e. DB_SE_RX.
2024-05-10 17:18:13 +02:00
Willy Tarreau
4ffb3b5ebe MINOR: applet: set the blocking flag in the buffer allocation function
Instead of having each caller of appctx_get_buf() think about setting
the blocking flag, better have the function do it, since it's already
handling the queue anyway. This way we're sure that both are consistent.
2024-05-10 17:18:13 +02:00
Willy Tarreau
f5566afec6 MEDIUM: dynbuf: generalize the use of b_dequeue() to detach buffer_wait
Now thanks to this the bufq_map field is expected to remain accurate.
2024-05-10 17:18:13 +02:00
Willy Tarreau
f70bd5fad1 MINOR: dynbuf: provide a b_dequeue() function to detach a bw from the queue
Now that we need to keep the bitmap in sync with the list heads, we don't
want tasks to leave just doing a LIST_DEL_INIT() without updating the map.
Let's provide a b_dequeue() function for that purpose. The function detects
when it's going to remove the last element and figures the queue number
based on the pointer since it points to the root. It's not used yet.
2024-05-10 17:18:13 +02:00
Willy Tarreau
53461e4d94 CLEANUP: tinfo: better align fields in thread_ctx
The introduction of buffer_wq[] in thread_ctx pushed a few fields around
and the cache line alignment is less satisfying. And more importantly, even
before this, all the lists in the local parts were 8-aligned, with the first
one split across two cache lines.

We can do better:
  - sched_profile_entry is not atomic at all, the data it points to is
    atomic so it doesn't need to be in the atomic-only region, and it can
    fill the 8-hole before the lists
  - the align(2*void) that was only before tasklets[] moves before all
    lists (and it's a nop for now)

This now makes the lists and buffer_wq[] start on a cache line boundary,
leaves 48 bytes after the lists before the atomic-only cache line, and
leaves a full cache line at the end for 128-alignment. This way we still
have plenty of room in both parts with better aligned fields.
2024-05-10 17:18:13 +02:00
Willy Tarreau
a5d6a79986 MEDIUM: dynbuf: make the buffer_wq an array of list heads
Let's turn the buffer_wq into an array of 4 list heads. These are chosen
by criticality. The DB_CRIT_TO_QUEUE() macro maps each criticality level
into one of these 4 queues. The goal here clearly is to make it possible
to wake up the most critical queues in priority in order to let some tasks
finish their job and release buffers that others can use.

In order to avoid having to look up all queues, a bit map indicates which
queues are in use, which also allows to avoid looping in the most common
case where queues are empty..
2024-05-10 17:18:13 +02:00
Willy Tarreau
a214197ce7 MINOR: dynbuf: use the b_queue()/b_requeue() functions everywhere
The code places that were used to manipulate the buffer_wq manually
now just call b_queue() or b_requeue(). This will simplify the multiple
list management later.
2024-05-10 17:18:13 +02:00
Willy Tarreau
d1c2f325a2 MINOR: dynbuf: add functions to help queue/requeue buffer_wait fields
When failing an allocation we always do the same dance, add the
buffer_wait struct to a list if it's not, and return. Let's just add
dedicated functions to centralize this, this will be useful to implement
a bit more complex logic.

For now they're not used.
2024-05-10 17:18:13 +02:00
Willy Tarreau
72d0dcda8e MINOR: dynbuf: pass a criticality argument to b_alloc()
The goal is to indicate how critical the allocation is, between the
least one (growing an existing buffer ring) and the topmost one (boot
time allocation for the life of the process).

The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function
to try to allocate otherwise subscribe. There's currently no distinction
of direction nor part that tries to allocate, and this should be revisited
to improve this situation, particularly when we consider that mux-h2 can
reduce its Tx allocations if needed.

For now, 4 main levels are planned, to translate how the data travels
inside haproxy from a producer to a consumer:
  - MUX_RX:   buffer used to receive data from the OS
  - SE_RX:    buffer used to place a transformation of the RX data for
              a mux, or to produce a response for an applet
  - CHANNEL:  the channel buffer for sync recv
  - MUX_TX:   buffer used to transfer data from the channel to the outside,
              generally a mux but there can be a few specificities (e.g.
              http client's response buffer passed to the application,
              which also gets a transformation of the channel data).

The other levels are a bit different in that they don't strictly need to
allocate for the first two ones, or they're permanent for the last one
(used by compression).
2024-05-10 17:18:13 +02:00
Christopher Faulet
eca9831ec8 MINOR: muxes: Add ctl commands to get info on streams for a connection
There are 2 new ctl commands that may be used to retrieve the current number
of streams openned for a connection and its limit (the maximum number of
streams a mux connection supports).

For the PT and H1 muxes, the limit is always 1 and the current number of
streams is 0 for idle connections, otherwise 1 is returned.

For the H2 and the FCGI muxes, info are already available in the mux
connection.

For the QUIC mux, the limit is also directly available. It is the maximum
initial sub-ID of bidirectional stream allowed for the connection. For the
current number of streams, it is the number of SC attached on the connection
and the number of not already attached streams present in the "opening_list"
list.
2024-05-06 22:00:00 +02:00
Christopher Faulet
96f8b7ad08 MEDIUM: stconn/muxes: Add an abort reason for SE shutdowns on muxes
A reason is now passed as parameter to muxes shutdowns to pass additional
info about the abort, if any. No info means no abort or only generic one.

For now, the reason is composed of 2 32-bits integer. The first on represents
the abort code and the other one represents the info about the code (for
instance the source). The code should be interpreted according to the associated
info.

One info is the source, encoding on 5 bits. Other bits are reserverd for now.
For now, the muxes are the only supported source. But we can imagine to extend
it to applets, streams, health-checks...

The current design is quite simple and will most probably evolved.. But the
idea is to let the opposite side forward some errors and let's a mux know
why its stream was aborted. At first glance, a abort reason must only be
evaluated if SE_SHW_SILENT flag is set.

The main goal at short term, is to forward some H2 RST_STREAM codes because
it is mandatory for gRPC applications, mainly to forward gRPC cancellation
from an H2 client to an H2 server. But we can imagine to alter this reason
at the applicative level to enrich it. It would also be used to report more
accurate errors in logs.
2024-05-06 22:00:00 +02:00
Aurelien DARRAGON
48e0efb00b MEDIUM: log: optimizing tmp->type handling in sess_build_logline()
Instead of chaining 2 switchcases and performing encoding checks for all
nodes let's actually split the logic in 2: first handle simple node types
(text/separator), and then handle dynamic node types (tag, expr). Encoding
options are only evaluated for dynamic node types.

Also, last_isspace is always set to 0 after next_fmt label, since next_fmt
label is only used for dynamic nodes, thus != LOG_FMT_SEPARATOR.

Since LF_NODE_WITH_OPT() macro (which was introduced recently) is now
unused, let's get rid of it.

No functional change should be expected.

(Use diff -w to check patch changes since reindentation makes the patch
look heavy, but in fact it remains fairly small)
2024-05-03 16:48:21 +02:00
Ilia Shipitsin
a65c6d3574 CLEANUP: assorted typo fixes in the code and comments
This is 42nd iteration of typo fixes
2024-05-03 09:01:36 +02:00
Amaury Denoyelle
53782b9ea5 MINOR: stats: extract proxy clear-counter in a dedicated function
Split code related to proxies list looping in cli_parse_clear_counters()
to a new dedicated function. This function is placed in the new module
stats-proxy.
2024-05-02 16:43:26 +02:00
Amaury Denoyelle
f0644d1bd7 REORG: stats: define stats-proxy source module
Create a new module stats-proxy. Move stats functions related to proxies
list looping in it. This allows to reduce stats source file dividing its
size by half.
2024-05-02 16:42:36 +02:00
William Lallemand
964f093504 CLEANUP: ssl: rename new_ckch_store_load_files_path() to ckch_store_new_load_files_path()
Rename the new_ckch_store_load_files_path() function to
ckch_store_new_load_files_path(), in order to be more consistent.
2024-05-02 16:03:20 +02:00
Amaury Denoyelle
10ab56831e MINOR: stats: convert age as generic column for proxy stat
Convert FN_AGE in stat_cols_px[] as generic columns. These values will
be automatically used for dump/preload of a stats-file.

Remove srv_lastsession() / be_lastsession() function which are now
useless as last_sess is calculated via me_generate_field().
2024-05-02 10:55:25 +02:00
Amaury Denoyelle
634cc2a5d8 MINOR: counters: move last_change into counters struct
last_change was a member present in both proxy and server struct. It is
used as an age statistics to report the last update of the object.

Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.

Note that there is a change for proxy structure with now 2 different
last_change values, on frontend and backend side. Special care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only backend counter is reported in stats. However, with now two
distinct values, stats could be extended to report it on both side.
2024-05-02 10:55:25 +02:00
Amaury Denoyelle
fec2ae9b76 MINOR: stats: support rate in stats-file
Implement support for FN_RATE stat column into stat-file.

For the output part, only minimal change is required. Reuse the function
read_freq_ctr() to print the same value in both stats output and
stats-file dump.

For counter preloading, define a new utility function
preload_freq_ctr(). This can be used to initialize a freq-ctr type by
preloading previous period value. Reuse this function in load_ctr()
during stats-file parsing.

At the moment, no rate column is defined as generic. Thus, this commit
does not have functional change. This will be changed as soon as FN_RATE
are converted to generic columns.
2024-05-02 10:55:25 +02:00
Amaury Denoyelle
639e73f8f2 MINOR: counters: move freq-ctr from proxy/server into counters struct
Move freq-ctr defined in proxy or server structures into their dedicated
fe_counters/be_counters struct.

Functionnaly no change here. This commit will allow to convert rate
stats column to generic one, which is mandatory to manipulate them in
the stats-file.
2024-05-02 10:55:25 +02:00
Amaury Denoyelle
4e9e841878 MINOR: stats: prepare stats-file support for values other than FN_COUNTER
Currently, only FN_COUNTER are dumped and preloaded via a stats-file.
Thus in several places we relied on the assumption that only FN_COUNTER
are valid in stats-file context.

New stats types will soon be implemented as they are also eligilible to
statistics reloading on process startup. Thus, prepare stats-file
functions to remove any FN_COUNTER restriction.

As one of this change, generate_stat_tree() now uses stcol_is_generic()
for stats name tree indexing before stats-file parsing.

Also related to stats-file parsing, individual counter preloading step
as been extracted from line parsing in a dedicated new function
load_ctr(). This will allow to extend it to support multiple mechanism
of counter preloading depending on the stats type.
2024-05-02 10:55:25 +02:00
Valentine Krasnobaeva
5cbb278fae MINOR: capabilities: add cap_sys_admin support
If 'namespace' keyword is used in the backend server settings or/and in the
bind string, it means that haproxy process will call setns() to change its
default namespace to the configured one and then, it will create a
socket in this new namespace. setns() syscall requires CAP_SYS_ADMIN
capability in the process Effective set (see man 2 setns). Otherwise, the
process must be run as root.

To avoid to run haproxy as root, let's add cap_sys_admin capability in the
same way as we already added the support for some other network capabilities.

As CAP_SYS_ADMIN belongs to CAP_SYS_* capabilities type, let's add a separate
flag LSTCHK_SYSADM for it. This flag is set, if the 'namespace' keyword was
found during configuration parsing. The flag may be unset only in
prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which
inspect process EUID/RUID and Effective and Permitted capabilities sets.

If system doesn't support Linux capabilities or 'cap_sys_admin' was not set
in 'setcap', but 'namespace' keyword is presented in the configuration, we
keep the previous strict behaviour. Process, that has changed uid to the
non-priviledged user, will terminate with alert. This alert invites the user
to recheck its configuration.

In the case, when haproxy will start and run under a non-root user and
'cap_sys_admin' is not set, but 'namespace' keyword is presented, this patch
does not change previous behaviour as well. We'll still let the user to try
its configuration, but we inform via warning, that unexpected things, like
socket creation errors, may occur.
2024-04-30 21:40:17 +02:00
Valentine Krasnobaeva
d3fc982cd7 MEDIUM: proto: make common fd checks in sock_create_server_socket
quic_connect_server(), tcp_connect_server(), uxst_connect_server() duplicate
same code to check different ERRNOs, that socket() and setns() may return.
They also duplicate some runtime condition checks, applied to the obtained
server socket fd.

So, in order to remove these duplications and to improve code readability,
let's encapsulate socket() and setns() ERRNOs handling in
sock_handle_system_err(). It must be called just before fd's runtime condition
checks, which we also move in sock_create_server_socket by the same reason.
2024-04-30 21:39:24 +02:00
Valentine Krasnobaeva
772d070ab5 MINOR: sock_set_mark: take sock family in account
SO_MARK, SO_USER_COOKIE, SO_RTABLE socket options (used to set the special
mark/ID on socket, in order to perform mark-based routing) are only supported
by AF_INET sockets. So, let's check socket address family, when we enter into
this function.
2024-04-30 21:38:29 +02:00
Aurelien DARRAGON
9931a62c3f BUG/MINOR: log: fix global lf_expr node options behavior (2nd try)
In 98b44e8 ("BUG/MINOR: log: fix global lf_expr node options behavior"),
I properly restored global node options behavior for when encoding is
not used, however the fix is not optimal when encoding is involved:

Indeed, encoding logic in sess_build_logline() relies on global node
options to know if encoding must be handled expression-wide or
individually. However, because of the above fix, if an expression is
made of 1 or multiple nodes that all set an encoding option manually
(without '%o'), we consider that the option was set globally, but
that's probably not what the user intended. Instead we should only
evaluate global options from '%o', so that it remains possible to
skip global encoding when needed.

No backport needed.
2024-04-30 10:10:35 +02:00
William Lallemand
95949e6868 MINOR: httpclient: allow to use absolute URI with new flag HC_F_HTTPROXY
The new HC_F_HTTPPROXY flag allows to use an absolute URI within a
request that won't be modified in order to use an http proxy.
2024-04-29 17:10:47 +02:00
Aurelien DARRAGON
9bdce67585 CLEANUP: log: add a macro to know if a lf_node is configurable
LF_NODE_WITH_OPT(node) returns true if the node's option may be set and
thus should be considered. Logic is based on logformat node's type:
for now only TAG and FMT nodes can be configured.
2024-04-29 14:47:37 +02:00
Aurelien DARRAGON
0e2aea8224 CLEANUP: tools/cbor: rename cbor_encode_ctx struct members
Rename e_byte_fct to e_fct_byte and e_fct_byte_ctx to e_fct_ctx, and
adjust some comments to make it clear that e_fct_ctx is here to provide
additional user-ctx to the custom cbor encode function pointers.

For now, only e_fct_byte function may be provided, but we could imagine
having e_fct_int{16,32,64}() one day to speed up the encoding when we
know we can encode multiple bytes at a time, but for now it's not worth
the hassle.
2024-04-29 14:47:37 +02:00
Willy Tarreau
1db3a390bb MINOR: list: add a macro to detect that a list contains at most one element
The new LIST_ATMOST1() test verifies that the designated element is either
alone or points on both sides to the same element. This is used to detect
that a list has at most a single element, or that an element about to be
deleted was the last one of a list.
2024-04-27 09:36:36 +02:00
Aurelien DARRAGON
c614fd3b9f MINOR: log: add +cbor encoding option
In this patch, we make use of the CBOR (RFC8949) encode helper functions
from the previous commit to implement '+cbor' encoding option for log-
formats. The logic behind it is pretty similar to '+json' encoding option,
except that the produced output is a CBOR payload written in HEX format so
that it remains compatible to use this with regular syslog endpoints.

Example:
  log-format "%{+cbor}o %[int(4)] test %(named_field)[str(ok)]"

Will produce:
  BF6B6E616D65645F6669656C64626F6BFF

  Detailed view (from cbor.me):
    BF                           # map(*)
       6B                        # text(11)
          6E616D65645F6669656C64 # "named_field"
       62                        # text(2)
          6F6B                   # "ok"
       FF                        # primitive(*)

If the option isn't set globally, but on a specific node instead, then
only the value will be encoded according to CBOR specification.

Example:
  log-format "test cbor bool: %{+cbor}[bool(true)]"

Will produce:
  test cbor bool: F5
2024-04-26 18:39:32 +02:00
Aurelien DARRAGON
810303e3e6 MINOR: tools: add cbor encode helpers
Add cbor helpers to encode strings (bytes/text) and integers according to
RFC8949, also add cbor_encode_ctx struct to pass encoding options such as
how to encode a single byte.
2024-04-26 18:39:32 +02:00
Aurelien DARRAGON
3f7c8387c0 MINOR: log: add +json encoding option
In this patch, we add the "+json" log format option that can be set
globally or per log format node.

What it does, it that it sets the LOG_OPT_ENCODE_JSON flag for the
current context which is provided to all lf_* log building function.

This way, all lf_* are now aware of this option and try to comply with
JSON specification when the option is set.

If the option is set globally, then sess_build_logline() will produce a
map-like object with key=val pairs for named logformat nodes.
(logformat nodes that don't have a name are simply ignored).

Example:
  log-format "%{+json}o %[int(4)] test %(named_field)[str(ok)]"

Will produce:
  {"named_field": "ok"}

If the option isn't set globally, but on a specific node instead, then
only the value will be encoded according to JSON specification.

Example:
  log-format "{ \"manual_key\": %(named_field){+json}[bool(true)] }"

Will produce:
  {"manual_key": true}

When the option is set, +E option will be ignored, and partial numerical
values (ie: because of logasap) will be encoded as-is.
2024-04-26 18:39:32 +02:00
Aurelien DARRAGON
b7c3d8c87c MINOR: log: add +bin logformat node option
Support '+bin' option argument on logformat nodes to try to preserve
binary output type with binary sample expressions.

For this, we rely on the log/sink API which is capable of conveying binary
data since all related functions don't search for a terminating NULL byte
in provided log payload as they take a string pointer and a string length
as argument.

Example:
  log-format "%{+bin}o %[bin(00AABB)]"

Will produce:
  00aabb

(output was piped to `hexdump  -ve '1/1 "%.2x"'` to dump raw bytes as HEX
characters)

This should be used carefully, because many syslog endpoints don't expect
binary data (especially NULL bytes). This is mainly intended for use with
set-var-fmt actions or with ring/udp log endpoints that know how to deal
with such binary payloads.

Also, this option is only supported globally (for use with '%o'), it will
not have any effect when set on an individual node. (it makes no sense to
have binary data in the middle of log payload that was started without
binary data option)
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
2caa921abf MINOR: log: add LOG_OPT_NONE flag
Add LOG_OPT_NONE flag for default value. Flag is not explicitly used yet
but with way we make it official that 0 value means NONE.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
a1583ec7c7 MINOR: log: make all lf_* sess build helper static
There is no need to expose such functions since they are only involved in
the log building process that occurs inside sess_build_logline().

Making functions static and removing their public prototype to ease code
maintenance.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
507223d527 MINOR: log: global lf_expr node options
Add options to lf_expr->nodes to store global options (those that are
common to all node) for easier access.

No functional change should be expected.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
7ff4f09e23 MINOR: log: store lf_expr nodes inside substruct
Add another struct level inside lf_expr struct to allow new information
to be stored alongside lf_expr nodes.
2024-04-26 18:39:31 +02:00
Amaury Denoyelle
374dc08611 MINOR: stats: parse header lines from stats-file
This patch implements parsing of headers line from stats-file.

A header line is defined as starting with '#' character. It is directly
followed by a domain name. For the moment, either 'fe' or 'be' is
allowed. The following lines will contain counters values relatives to
the domain context until the next header line.

This is implemented via static function parse_header_line(). It first
sets the domain context used during apply_stats_file(). A stats column
array is generated to contains the order on which column are stored.
This will be reused to parse following lines values.

If an invalid line is found and no header was parsed, considered the
stats-file as ill formatted and stop parsing. This allows to immediately
interrupt parsing if a garbage file was used without emitting a ton of
warnings to the user.
2024-04-26 11:34:02 +02:00
Amaury Denoyelle
34ae7755b3 MINOR: stats: apply stats-file on process startup
This commit is the first one of a serie to implement preloading of
haproxy counters via stats-file parsing.

This patch defines a basic apply_stats_file() function. It implements
reading line by line of a stats-file without any parsing for the moment.
It is called automatically on process startup via init().
2024-04-26 11:29:25 +02:00
Amaury Denoyelle
83731c8048 MINOR: guid: define guid_is_valid_fmt()
Extract GUID format validation in a dedicated function named
guid_is_valid_fmt(). For the moment, it is only used on guid_insert().

This will be reused when parsing stats-file, to ensure GUID has a valid
format before tree lookup.
2024-04-26 11:29:25 +02:00
Amaury Denoyelle
bc3c117dc0 MINOR: ist: define iststrip() new function
Implement iststrip(). This function removes any trailing newline
sequence if present from an ist.
2024-04-26 11:29:25 +02:00
Amaury Denoyelle
e74148fb7c MEDIUM: stats: implement dump stats-file CLI
Define a new CLI command "dump stats-file" with its handler
cli_parse_dump_stat_file(). It will loop twice on proxies_list to dump
first frontend and then backend side. It reuses the common function
stats_dump_stat_to_buffer(), using STAT_F_BOUND to restrict on the
correct side.

A new module stats-file.c is added to regroup function specifics to
stats-file. It defines two main functions :
* stats_dump_file_header() to generate the list of column list prefixed
  by the line context, either "#fe" or "#be"
* stats_dump_fields_file() to generate each stat lines. Object without
  GUID are skipped. Each stat entry is separated by a comma.

For the moment, stats-file does not support statistics modules. As such,
stats_dump_*_line() functions are updated to prevent looping over stats
module on stats-file output.
2024-04-26 10:20:57 +02:00
Amaury Denoyelle
83281303f6 MINOR: stats: define stats-file output format support
Prepare stats function to handle a new format labelled "stats-file". Its
purpose is to generate a statistics dump with a format closed from the
CSV output. Such output will be then used to preload haproxy internal
counters on process startup.

stats-file output differs from a standard CSV on several points. First,
only an excerpt of all statistics is outputted. All values that does not
make sense to preload are excluded. For the moment, stats-file only list
stats fully defined via "struct stat_col" method. Contrary to a CSV, sll
columns of a stats-file will be filled. As such, empty field value is
used to mark stats which should not be outputted.

Some adaptation specifics to stats-file are necessary into
me_generate_field(). First, stats-file will output separatedly values
from frontend and backend sides with their own respective set of
columns. As such, an empty field value is returned if stat is not
defined for either frontend/listener, or backend/server when outputting
the other side. Also, as stats-file does not support empty column,
stcol_hide() is not used for it.

A minor adjustement was necessary for stats_fill_fe_line() to pass
context flags. This is necessary to detect stat output format. All other
listener/server/backend corresponding functions already have it.
2024-04-26 10:20:57 +02:00
Amaury Denoyelle
6615252656 MEDIUM: stats: convert counters to new column definition
Convert most of proxy counters statistics to new "struct stat_col"
definition. Remove their corresponding switch..case entries in
stats_fill_*_line() functions. Their value are automatically calculate
via me_generate_field() invocation.

Along with this, also complete stcol_hide() when some stats should be
hidden.

Only a few counters where not converted. This is because they rely on
values stored outside of fe/be_counters structure, which
me_generate_field() cannot use for now.
2024-04-26 10:20:57 +02:00
Amaury Denoyelle
a7810b7be6 MINOR: stats: implement automatic metric generation from stat_col
This commit is a direct follow-up of the previous one which define a new
type "struct stat_col" to fully define a statistic entry.

Define a new function metric_generate(). For metrics statistics, it is
able to automatically calculate a stat value field for "offsets" from
"struct stat_col". Use it in stats_fill_*_stats() functions. Maintain a
fallback to previously used switch-case for old-style statistics.

This commit does not introduce functional change as currently no
statistic is defined as "struct stat_col". This will be the subject of a
future commit.
2024-04-26 10:20:57 +02:00
Amaury Denoyelle
65624876f2 MINOR: stats: introduce a more expressive stat definition method
Previously, statistics were simply defined as a list of name_desc, as
for example "stat_cols_px" for proxy stats. No notion of type was fixed
for each stat definition. This correspondance was done individually
inside stats_fill_*_line() functions. This renders the process to
define new statistics tedious.

Implement a more expressive stat definition method via a new API. A new
type "struct stat_col" for stat column to replace name_desc usage is
defined. It contains a field to store the stat nature and format. A
<cap> field is also defined to be able to define a proxy stat only for
certain type of objects.

This new type is also further extended to include counter offsets. This
allows to define a method to automatically generate a stat value field
from a "struct stat_col". This will be the subject of a future commit.

New type "struct stat_col" is fully compatible full name_desc. This
allows to gradually convert stats definition. The focus will be first
for proxies counters to implement statistics preservation on reload.
2024-04-26 10:20:57 +02:00
Amaury Denoyelle
861370a6d4 MINOR: stats: update ambiguous "metrics" naming to "stat_cols"
The name "metrics" was chosen to represent the various list of haproxy
exposed statistics. However, it is deemed as ambiguous as some stats are
indeed metric in the true sense, but some are not, as highlighted by
various "enum field_origin" values.

Replace it by the new name "stat_cols" for statistic columns. Along with
the already existing notion of stat lines it should better reflect its
purpose.
2024-04-26 10:20:57 +02:00
Christopher Faulet
608e23c495 MINOR: peers: Use a static variable to wait a resync on reload
When a process is reloaded, the old process must performed a synchronisation
with the new process. To do so, the sync task notify the local peer to
proceed and waits. Internally, the sync task used PEERS_F_DONOTSTOP flag to
know it should wait. However, this flag was only set/unset in a single
function. There is no real reason to set a flag to do so. A static variable
set to 1 when the resync starts and to 0 when it is finished is enough.
2024-04-25 18:29:58 +02:00
Christopher Faulet
5df54f4796 DEV: flags/peers: Decode PEER and PEERS flags
Decode peer and peers flags via peer_show_flags() and peers_show_flags()
functions.
2024-04-25 18:29:58 +02:00
Christopher Faulet
697bd69efc REORG: peers: Move peer and peers flags in the corresponding header file
PEER_F_* and PEERS_F_ * flags were moved to <peer-t.h> header file. It is
mandatory to decode them from "flags" dev tool.
2024-04-25 18:29:58 +02:00
Christopher Faulet
c904f7b440 MEDIUM: peers: Use true states for the learn state of a peer
Some flags were used to define the learn state of a peer. It was a bit
confusing, especially because the learn state of a peer is manipulated from
the peer applet but also from the sync task. It is harder to understand the
transitions if it is based on flags than if it is based a dedicated state
based on an enum. It is the purpose of this patch.

Now, we can define the following rules regarding this learn state:

  * A peer is assigned to learn by the sync task
  * The learn state is then changed by the peer itself to notify the
    learning is in progress and when it is finished.
  * Finally, when the peer finished to learn, the sync task must acknowledge
    it by unassigning the peer.
2024-04-25 18:29:57 +02:00
Christopher Faulet
ea9bd6d075 MEDIUM: peers: Use true states for the peer applets as seen from outside
This patch is a cleanup of the recent change about the relation between a
peer and the applet used to deal with I/O. Three flags was introduced to
reflect the peer applet state as seen from outside (from the sync task in
fact). Using flags instead of true states was in fact a bad idea. This work
but it is confusing. Especially because it was mixed with LEARN and TEACH
peer flags.

So, now, to make it clearer, we are now using a dedicated state for this
purpose. From the outside, the peer may be in one of the following state
with respects of its applet:

 * the peer has no applet, it is stopped (PEER_APP_ST_STOPPED).

 * the peer applet was created with a validated connection from the protocol
   perspective. But the sync task must synchronized it with the peers
   section. It is in starting state (PEER_APP_ST_STARTING).

 * The starting starting was acknowledged by the sync task, the peer applet
   can start to process messages. It is in running state
   (PEER_APP_ST_RUNNING).

 * The last peer applet was released and the associated connection
   closed. But the sync task must synchronized it with the peers section. It
   is in stopping state (PEER_APP_ST_STOPPING).

Functionnaly speaking, there is no true change here. But it should be easier
to understand now.

In addition to these changes, __process_peer_state() function was renamed
sync_peer_app_state().
2024-04-25 18:29:57 +02:00
Christopher Faulet
bea541b70a MINOR: applet: Add a function to know the side where an applet was created
appctx_is_back() function may be used to know if an applet was create on
frontend side or on backend side. It may be handy for some applets that may
exist on both sides, like peer applets.
2024-04-25 18:29:57 +02:00
Willy Tarreau
13515d9fbe MINOR: intops: add a pair of functions to check multi-byte ranges
These new functions is_char4_outside() and is_char8_outside() are meant
to be used to verify if any of the 4 or 8 chars represented respectively
by a uint32_t or a uint64_t is outside of the min,max byte range passed
in argument. This is the simplified, fast version of the function so it
is restricted to less than 0x80 distance between min and max (sufficient
to validate chars). Extra functions are also provided to check for min
or max alone as well, with the same restriction.

The use case typically is to check that the output of read_u32() or
read_u64() contains exclusively certain bytes.
2024-04-24 15:54:55 +02:00
David Carlier
98d22f212a MEDIUM: shctx: Naming shared memory context
From Linux 5.17, anonymous regions can be name via prctl/PR_SET_VMA
so caches can be identified when looking at HAProxy process memory
mapping.
The most possible error is lack of kernel support, as a result
we ignore it, if the naming fails the mapping of memory context
ought to still occur.
2024-04-24 10:25:38 +02:00
Tim Duesterhus
aab6477b67 MINOR: Add ha_generate_uuid_v7
This function generates a version 7 UUID as per
draft-ietf-uuidrev-rfc4122bis-14.
2024-04-24 08:23:56 +02:00
Tim Duesterhus
c6cea750a9 MINOR: tools: Rename ha_generate_uuid to ha_generate_uuid_v4
This is in preparation of adding support for other UUID versions.
2024-04-24 08:23:56 +02:00
Willy Tarreau
19f8762a98 BUILD: stick-tables: silence build warnings when threads are disabled
Since 3.0-dev7 with commit 1a088da7c2 ("MAJOR: stktable: split the keys
across multiple shards to reduce contention"), building without threads
yields a warning about the shard not being used. This is because the
locks API does nothing of its arguments, which is the only place where
the shard is being used. We cannot modify the lock API to pretend to
consume its argument because quite often it's not even instantiated.
Let's just pretend we consume shard using an explict ALREADY_CHECKED()
statement instead. While we're at it, let's make sure that XXH32() is
not called when there is a single bucket!

No backport is needed.
2024-04-24 08:23:56 +02:00
Amaury Denoyelle
341bf913d4 MINOR: stats: use STAT_F_* prefix for flags
Some flags are defined during statistics generation and output. They use
the prefix STAT_* which is also used for other purposes. Rename them
with the new prefix STAT_F_* to differentiate them from the other
usages.
2024-04-22 16:25:18 +02:00
Amaury Denoyelle
e97375dcab MINOR: stats: use stricter naming stats/field/line
Several unique names were used for different purposes under statistics
implementation. This caused the code to be difficult to understand.

* stat/stats name is removed when a more specific name could be used
* restrict field usage to purely refer to <struct field> which
  represents a raw stat value.
* use "line" naming to represent an array of <struct field>
2024-04-22 16:25:18 +02:00
Amaury Denoyelle
8dbb74542f MINOR: stats: rename info stats
Info are used to expose haproxy global metrics. It is similar to proxy
statistics and any other module. As such, rename info indexes using
SI_I_INF_* prefix. Also info variable is renamed stat_line_info.

Thanks to this, naming is now consistent between info and other
statistics. It will help to integrate it as a "global" statistics
module.
2024-04-22 16:25:18 +02:00
Amaury Denoyelle
02e0dd6d30 MINOR: stats: rename ambiguous stat_l and stat_count
Statistics were extended with the introduction of stats module. This
mechanism allows to expose various metrics for several haproxy
components. As a consequence of this, some static variables were
transformed to dynamic ones to be able to regroup all statistics
definition.

Rename these variables with more explicit naming :
* stat_lines can be used to generate one line of statistics for any
  module using struct field as value
* metrics and metrics_len are used to stored description of metrics
  indexed by module

Note that info is not integrated in the statistics module mechanism.
However, it could be done in the future to better reflect its purpose.
2024-04-22 16:25:18 +02:00
Amaury Denoyelle
8fc0b18087 MINOR: stats: rename proxy stats
This commit is the first one of a serie which adjust naming convention
for stats module. The objective is to remove ambiguity and better
reflect how stats are implemented, especially since the introduction of
stats module.

This patch renames elements related to proxies statistics. One of the
main change is to rename ST_F_* statistics indexes prefix with the new
name ST_I_PX_*. This remove the reference to field which represents
another concept in the stats module. In the same vein, global
stat_fields variable is renamed metrics_px.
2024-04-22 16:25:18 +02:00
Amaury Denoyelle
c02ec9a9db BUG/MINOR: backend: use cum_sess counters instead of cum_conn
This commit is part of a serie to align counters usage between
frontends/listeners on one side and backends/servers on the other.

"stot" metric refers to the total number of sessions. On backend side,
it is interpreted as a number of streams. Previously, this was accounted
using <cum_sess> be_counters field for servers, but <cum_conn> instead
for backend proxies.

Adjust this by using <cum_sess> for both proxies and servers. As such,
<cum_conn> field can be removed from be_counters.

Note that several diagnostic messages which reports total frontend and
backend connections were adjusted to use <cum_sess>. However, this is an
outdated and misleading information as it does reports streams count on
backend side. These messages should be fixed in a separate commit.

This should be backported to all stable releases.
2024-04-22 10:35:18 +02:00
Amaury Denoyelle
93066be32d MINOR: backend: use be_counters for health down accounting
This commit is the first one of a series which aims to align counters
usage between frontends/listeners on one side and backends/servers on
the other.

Remove <down_trans> field from proxy structure. Use instead the same
name field from be_counters structure, which is already used for
servers.
2024-04-22 10:35:18 +02:00
Christopher Faulet
fbc0850d36 MEDIUM: muxes: Use one callback function to shut a mux stream
mux-ops .shutr and .shutw callback functions are merged into a unique
functions, called .shut. The shutdown mode is still passed as argument,
muxes are responsible to test it. Concretly, .shut() function of each mux is
now the content of the old .shutw() followed by the content of the old
.shutr().
2024-04-19 16:33:40 +02:00
Christopher Faulet
1e38ac72ce MEDIUM: stconn: Use one function to shut connection and applet endpoints
se_shutdown() function is now used to perform a shutdown on a connection
endpoint and an applet endpoint. The same function is used for
both. sc_conn_shut() function was removed and appctx_shut() function was
updated to only deal with the applet stuff.
2024-04-19 16:33:35 +02:00
Christopher Faulet
4b80442832 MEDIUM: stconn: Explicitly pass shut modes to shut applet endpoints
It is the same than the previous patch but for applets. Here there is
already only one function. But with this patch, appctx_shut() function was
modified to explicitly get shutdown mode as parameter. In addition
appctx_shutw() was removed.
2024-04-19 16:25:06 +02:00
Christopher Faulet
c96a873ba3 MEDIUM: stconn: Use only one SC function to shut connection endpoints
The SC API to perform shutdowns on connection endpoints was unified to have
only one function, sc_conn_shut(), with read/write shut modes passed
explicitly. It means sc_conn_shutr() and sc_conn_shutw() were removed. The
next step is to do the same at the mux level.
2024-04-19 16:25:06 +02:00
Christopher Faulet
d2c3f8dde7 MINOR: stconn/connection: Move shut modes at the SE descriptor level
CO_SHR_* and CO_SHW_* modes are in fact used by the stream-connectors to
instruct the muxes how streams must be shut done. It is then the mux
responsibility to decide if it must be propagated to the connection layer or
not. And in this case, the modes above are only tested to pass a boolean
(clean or not).

So, it is not consistant to still use connection related modes for
information set at an upper layer and never used by the connection layer
itself.

These modes are thus moved at the sedesc level and merged into a single
enum. Idea is to add more modes, not necessarily mutually exclusive, to pass
more info to the muxes. For now, it is a one-for-one renaming.
2024-04-19 16:24:46 +02:00
Christopher Faulet
f58883002c BUG/MINOR: stconn: Fix sc_mux_strm() return value
Since the begining, this function returns a pointer on an appctx while it
should be a void pointer. It is the caller responsibility to cast it to the
right type, the corresponding mux stream in this case.

However, it is not a big deal because this function is unused for now. Only
the unsafe one is used.

This patch must be backported as far as 2.6.
2024-04-19 15:31:06 +02:00
Olivier Houchard
a7caa14a64 MINOR: stats: Get the right prototype for stats_dump_html_end().
When the stat code was reorganized, and the prototype to
stats_dump_html_end() was moved to its own header, it missed the function
arguments. Fix that.

This should fix issue 2540.
2024-04-19 01:54:00 +02:00
Amaury Denoyelle
0109c0658d REORG: stats: extract JSON related functions
This commit is similar to the previous one. This time it deals with
functions related to stats JSON output.
2024-04-18 17:04:08 +02:00
Amaury Denoyelle
b8c1fdf24e REORG: stats: extract HTML related functions
Extract functions related to HTML stats webpage from stats.c into a new
module named stats-html. This allows to reduce stats.c to roughly half
of its original size.
2024-04-18 17:04:08 +02:00
Amaury Denoyelle
b3d5708adc MINOR: stats: remove implicit static trash_chunk usage
A static variable trash_chunk was used as implicit buffer in most of
stats output function. It was a oneline buffer uses as temporary storage
before emitting to the final applet or CLI buffer.

Replaces it by a buffer defined in show_stat_ctx structure. This allows
to retrieve it in most of stats output function. An additional parameter
was added for the function where context was not already used. This
renders the code cleaner and will allow to split stats.c in several
source files.

As a result of a new member into show_stat_ctx, per-command context max
size has increased. This forces to increase APPLET_MAX_SVCCTX to ensure
pool size is big enough. Increase it to 128 bytes which includes some
extra room for the future.
2024-04-18 17:04:08 +02:00
Christopher Faulet
9b3a27f70c BUILD: linuxcap: Properly declare prepare_caps_from_permitted_set()
Expected arguments were not specified in the
prepare_caps_from_permitted_set() function declaration. It is an issue for
some compilers, for instance clang. But at the end, it is unexpected and
deprecated.

No backport needed, except if f0b6436f57 ("MEDIUM: capabilities: check
process capabilities sets") is backported.
2024-04-18 10:17:38 +02:00
Christopher Faulet
40aa87a28f BUG/MEDIUM: applet: Fix applet API to put input data in a buffer
applet_putblk and co were added to simplify applets. In 2.8, a fix was
pushed to deal with all errors as a room error because the vast majority of
applets didn't expect other kind of errors. The API was changed with the
commit 389b7d1f7b ("BUG/MEDIUM: applet: Fix API for function to push new
data in channels buffer").

Unfortunately and for unknown reason, the fix was totally failed. Checks on
channel functions were just wrong and not consistent. applet_putblk()
function is especially affected because the error is returned but no flag
are set on the SC to request more room. Because of this bug, applets relying
on it may be blocked, waiting for more room, and never woken up.

It is an issue for the peer and spoe applets.

This patch must be backported as far as 2.8.
2024-04-18 09:17:03 +02:00
William Lallemand
10224d72fd BUG/MINOR: ssl: fix crt-store load parsing
The crt-store load line parser relies on offsets of member of the
ckch_conf struct. However the new "alias" keyword as an offset to
-1, because it does not need to be used. Plan was to handle it that way
in the parser, but it wasn't supported yet. So -1 was still used in an
offset computation which was not used, but ASAN could see the problem.

This patch fixes the issue by using a signed type for the offset value,
so any negative value would be skipped. It also introduced a
PARSE_TYPE_NONE for the parser.

No backport needed.
2024-04-17 21:00:34 +02:00
Ilya Shipitsin
ab7f05daba CLEANUP: assorted typo fixes in the code and comments
This is 41st iteration of typo fixes
2024-04-17 11:14:44 +02:00
Willy Tarreau
99c918ed8a BUILD: xxhash: silence a build warning on Solaris + gcc-5.5
Testing an undefined macro emits warnings due to -Wundef, and we have
exactly one such case in xxhash:

  include/import/xxhash.h:3390:42: warning: "__cplusplus" is not defined [-Wundef]
   #if ((defined(sun) || defined(__sun)) && __cplusplus) /* Solaris includes __STDC_VERSION__ with C++. Tested with GCC 5.5 */

Let's just prepend "defined(__cplusplus) &&" before __cplusplus to
resolve the problem. Upstream is still affected apparently.
2024-04-17 09:43:32 +02:00
Frederic Lecaille
98583c4256 BUG/MEDIUM: grpc: Fix several unaligned 32/64 bits accesses
There were several places in grpc and its dependency protobuf where unaligned
accesses were done. Read accesses to 32 (resp. 64) bits values should be performed
by read_u32() (resp. read_u64()).
Replace these unligned read accesses by correct calls to these functions.
Same fixes for doubles and floats.

Such unaligned read accesses could lead to crashes with bus errors on CPU
archictectures which do not fix them at run time.

This patch depends on this previous commit:
    861199fa71 MINOR: net_helper: Add support for floats/doubles.

Must be backported as far as 2.6.
2024-04-16 07:37:28 +02:00
Frederic Lecaille
153fac4804 MINOR: net_helper: Add support for floats/doubles.
Implement (read|write)_flt() (resp. (read|write)_dbl()) to read/write floats
(resp. read/write doubles) from/to an unaligned buffer.
2024-04-16 07:37:28 +02:00
William Lallemand
fa5c4cc6ce MINOR: ssl: 'key-base' allows to load a 'key' from a specific path
The global 'key-base' keyword allows to read the 'key' parameter of a
crt-store load line using a path prefix.

This is the equivalent of the 'crt-base' keyword but for 'key'.

It only applies on crt-store.
2024-04-15 15:27:10 +02:00
William Lallemand
6567d09af5 MINOR: ssl: supports crt-base in crt-store
Add crt-base support for "crt-store". It will be used by 'crt', 'ocsp',
'issuer', 'sctl' load line parameter.

In order to keep compatibility with previous configurations and scripts
for the CLI, a crt-store load line will save its ckch_store using the
absolute crt path with the crt-base as the ckch tree key. This way, a
`show ssl cert` on the CLI will always have the completed path.
2024-04-15 15:25:36 +02:00
Willy Tarreau
4615cb510c MINOR: ring: always check that the old ring fits in the new one in ring_dup()
Let's add a BUG_ON() to make sure we don't accidentally shrink a buffer.
2024-04-15 08:31:01 +02:00
Willy Tarreau
b662c5d2b8 MINOR: ring: clarify the usage of ring_size() and add ring_allocated_size()
There's currently an abiguity around ring_size(), it's said to return
the allocated size but returns the usable size. We can't change it as
it's used everywhere in the code like this. Let's fix the comment and
add ring_allocated_size() instead for anything related to allocation.
2024-04-15 08:25:03 +02:00
Willy Tarreau
c0ee2d78d7 DEBUG: pools: report the data around the offending area in case of mismatch
When the integrity check fails, it's useful to get a dump of the area
around the first faulty byte. That's what this patch does. For example
it now shows this before reporting info about the tag itself:

  Contents around first corrupted address relative to pool item:.
  Contents around address 0xe4febc0792c0+40=0xe4febc0792e8:
    0xe4febc0792c8 [80 75 56 d8 fe e4 00 00] [.uV.....]
    0xe4febc0792d0 [a0 f7 23 a4 fe e4 00 00] [..#.....]
    0xe4febc0792d8 [90 75 56 d8 fe e4 00 00] [.uV.....]
    0xe4febc0792e0 [d9 93 fb ff fd ff ff ff] [........]
    0xe4febc0792e8 [d9 93 fb ff ff ff ff ff] [........]
    0xe4febc0792f0 [d9 93 fb ff ff ff ff ff] [........]
    0xe4febc0792f8 [d9 93 fb ff ff ff ff ff] [........]
    0xe4febc079300 [d9 93 fb ff ff ff ff ff] [........]

This may be backported to 2.9 and maybe even 2.8 as it does help spot
the cause of the memory corruption.
2024-04-12 18:01:55 +02:00
Willy Tarreau
16e3655fbd REORG: pool: move the area dump with symbol resolution to tools.c
This function is particularly useful to dump unknown areas watching
for opportunistic symbols, so let's move it to tools.c so that we can
reuse it a little bit more.
2024-04-12 18:01:20 +02:00
William Lallemand
81e54ef197 MINOR: ssl: rename ckchs_load_cert_file to new_ckch_store_load_files_path
Remove the ambiguous "ckchs" name and reflect the fact that its loaded
from a path.
2024-04-12 15:38:54 +02:00
William Lallemand
00eb44864b MINOR: ssl: add the section parser for 'crt-store'
'crt-store' is a new section useful to define the struct ckch_store.

The "load" keyword in the "crt-store" section allows to define which
files you want to load for a specific certificate definition.

Ex:
    crt-store
        load crt "site1.crt" key "site1.key"
        load crt "site2.crt" key "site2.key"

    frontend in
        bind *:443 ssl crt "site1.crt" crt "site2.crt"

This is part of the certificate loading which was discussed in #785.
2024-04-12 15:38:54 +02:00
Willy Tarreau
772f9a5874 BUILD: pools: make DEBUG_MEMORY_POOLS=1 the default option
This option has been set by default for a very long time and also
complicates the manipulation of the DEBUG variable. Let's make it
the official default and permit to unset it by setting it to zero.
The other pool-related DEBUG options were adjusted to also explicitly
check for the zero value for consistency.
2024-04-11 17:25:45 +02:00
Willy Tarreau
b70981532a BUILD: debug: make DEBUG_STRICT=1 the default
We continue to carry it in the makefile, which adds to the difficulty
of passing new options. Let's make DEBUG_STRICT=1 the default so that
one has to explicitly pass DEBUG_STRICT=0 to disable it. This allows us
to remove the option from the default DEBUG variable in the makefile.
2024-04-11 17:25:45 +02:00
Willy Tarreau
e791b243f0 BUG/MINOR: debug: make sure DEBUG_STRICT=0 does work as documented
Setting DEBUG_STRICT=0 only validates the defined(DEBUG_STRICT) test
regarding DEBUG_STRICT_ACTION, which is equivalent to DEBUG_STRICT>=0.
Let's make sure the test checks for >0 so that DEBUG_STRICT=0 properly
disables DEBUG_STRICT.
2024-04-11 16:41:08 +02:00
Willy Tarreau
2a9ccf5b25 BUILD: atomic: fix peers build regression on gcc < 4.7 after recent changes
Recent commit 4c1480f13b ("MINOR: stick-tables: mark the seen stksess
with a flag "seen"") introduced a build regression on older versions of
gcc before 4.7. This is in the old __sync_ API, the HA_ATOMIC_LOAD()
implementation uses an intermediary return value called "ret" that is
of the same name as the variable passed in argument to the macro in the
aforementioned commit. As such, the compiler complains with a cryptic
error:
  src/peers.c: In function 'peer_teach_process_stksess_lookup':
  src/peers.c:1502: error: invalid type argument of '->' (have 'int')

The solution is to avoid referencing the argument in the expression and
using an intermediary variable for the pointer as done elsewhere in the
code. It seems there's no other place affected with this. It probably
does not need to be backported since this code is antique and very rarely
used nowadays.
2024-04-11 16:41:08 +02:00
Willy Tarreau
d78c346670 BUILD: makefile: support USE_xxx=0 as well
William rightfully reported that not supporting =0 to disable a USE_xxx
option is sometimes painful (e.g. a script might do USE_xxx=$(command)).
It's not that difficult to handle actually, we just need to consider the
value 0 as empty at the few places that test for an empty string in
options.mk, and in each "ifneq" test in the main Makefile, so let's do
that. We even take care of preserving the original value in the build
options string so that building with USE_OPENSSL=0 will be reported
as-is in haproxy -vv, and with "-OPENSSL" in the feature list.
2024-04-11 11:06:19 +02:00
Willy Tarreau
aa32ab13f0 BUILD: makefile: warn about unknown USE_* variables
William suggested that it would be nice to warn about unknown USE_*
variables to more easily catch misspelled ones. The valid ones are
present in use_opts, so by appending "=%" to each of them, we can
build a series of patterns to exclude from MAKEOVERRIDES and emit
a warning for the ones that stand out.

Example:

  $ make TARGET=linux-glibc  USE_QUIC_COMPAT_OPENSSL=1
  Makefile:338: Warning: ignoring unknown build option: USE_QUIC_COMPAT_OPENSSL=1
    CC      src/slz.o
2024-04-11 11:06:19 +02:00
Christopher Faulet
1fa6eb2eb9 BUG/MINOR: http-ana: Fix TX_L7_RETRY and TX_D_L7_RETRY values
These values are obviously wrong. There is an extra zero at the end for both
defines. By chance, it is harmless. But it is better to fix it.

This patch should be backported as far as 2.6.
2024-04-10 15:50:00 +02:00
Amaury Denoyelle
34b31d85cb OPTIM: quic: do not call qc_send() if nothing to emit
qc_send() was systematically called by quic_conn IO handlers with all
instantiated quic_enc_level. Change this to only register quic_enc_level
for send if needed. Do not call at all qc_send() if no qel registered.

A new function qel_need_sending() is defined to detect if sending is
required. First, it checks if quic_enc_level has prepared frames or
probing is set. It can also returns true if ACK required either on
quic_enc_level itself or because of quic_conn ack timer fired. Finally,
a CONNECTION_CLOSE emission for quic_conn is also a valid case.

This should reduce the number of invocations of qc_send(). This could
improve slightly performance, as well as simplify traces debugging.
2024-04-10 11:17:21 +02:00
Amaury Denoyelle
7fc1ce5bc8 MEDIUM: quic: remove duplicate hdshk/app send functions
A series of previous patches have clean up sending function for
handshake case. Their new exposed API is now flexible enough to convert
app case to use the same functions.

As such, qc_send_hdshk_pkts() is renamed qc_send() and become the single
entry point for QUIC emission. It is used during application packets
emission in quic_conn_app_io_cb(), qc_send_mux(). Also the internal
function qc_prep_hpkts() is renamed qc_prep_pkts().

Remove the new unneeded qc_send_app_pkts() and qc_prep_app_pkts().

Also removed qc_send_app_probing(). It was a simple wrapper over other
application send functions. Now, default qc_send() can be reuse for such
cases with <old_data> argument set to true.

An adjustment was needed when converting qc_send_hdshk_pkts() to the
general qc_send() version. Previously, only a single packets
encoding/emission cycle was performed. This was enough as handshake
packets are always smaller than Tx buffer. However, it may be possible
to emit more application data. As such, a loop is necessary to perform
multiple encoding/emission cycles, as this was already the case in
qc_send_app_pkts().

No functional difference should happen with this commit. However, as
these are critcal functions with a lot of changes, this patch is
labelled as medium.
2024-04-10 11:07:35 +02:00
Amaury Denoyelle
4e4127a66d MINOR: quic: use qc_send_hdshk_pkts() in handshake IO cb
quic_conn_io_cb() manually implements emission by using lower level
functions qc_prep_pkts() and qc_send_ppkts(). Replace this by using the
higher level function qc_send_hdshk_pkts() which notably handle buffer
allocation and purging.

This allows to clean up send API by flagging qc_prep_pkts() and
qc_send_ppkts() as static. They are now used in a single location inside
qc_send_hdshk_pkts().
2024-04-10 11:07:19 +02:00
Amaury Denoyelle
3a8f4761e7 MINOR: quic: improve sending API on retransmit
qc_send_hdshk_pkts() is a wrapper for qc_prep_hpkts() used on
retransmission. It was restricted to use two quic_enc_level pointers as
distinct arguments. Adapt it to directly use the same list of
quic_enc_level which is passed then to qc_prep_hpkts().

Now for retransmission quic_enc_level send list is built directly into
qc_dgrams_retransmit() which calls qc_send_hdshk_pkts().

Along this change, a new utility function qel_register_send() is
defined. It is an helper to build the quic_enc_level send list. It
enfores that each quic_enc_level instance is only registered in a single
list to prevent memory issues. It is both used in qc_dgrams_retransmit()
and quic_conn_io_cb().
2024-04-10 11:06:55 +02:00
Amaury Denoyelle
93f5b4c8ae MINOR: quic: uniformize sending methods for handshake
Emission of packets during handshakes was implemented via an API which
uses two alternative ways to specify the list of frames.

The first one uses a NULL list of quic_enc_level as argument for
qc_prep_hpkts(). This was an implicit method to iterate on all qels
stored in quic_conn instance, with frames already inserted in their
corresponding quic_pktns.

The second method was used for retransmission. It uses a custom local
quic_enc_level list specified by the caller as input to qc_prep_hpkts().
Frames were accessible through <retransmit> list pointers of each
quic_enc_level used in an implicit mechanism.

This commit clarifies the API by using a single common method. Now
quic_enc_level list must always be specified by the caller. As for
frames list, each qels must set its new field <send_frms> pointer to the
list of frames to send. Callers of qc_prep_hpkts() are responsible to
always clear qels send list. This prevent a single instance of
quic_enc_level to be inserted while being attached to another list.

This allows notably to clean up some unnecessary code. First,
<retransmit> list of quic_enc_level is removed as it is replaced by new
<send_frms>. Also, it's now possible to use proper list_for_each_entry()
inside qc_prep_hpkts() to loop over each qels. Internal functions for
quic_enc_level selection is now removed.
2024-04-10 11:06:41 +02:00
Aurelien DARRAGON
8226e92eb0 BUG/MINOR: tools/log: invalid encode_{chunk,string} usage
encode_{chunk,string}() is often found to be used this way:

  ret = encode_{chunk,string}(start, stop...)
  if (ret == NULL || *ret != '\0') {
	//error
  }
  //success

Indeed, encode_{chunk,string} will always try to add terminating NULL byte
to the output string, unless no space is available for even 1 byte.
However, it means that for the caller to be able to spot an error, then it
must provide a buffer (here: start) which is already initialized.

But this is wrong: not only this is very tricky to use, but since those
functions don't return NULL on failure, then if the output buffer was not
properly initialized prior to calling the function, the caller will
perform invalid reads when checking for failure this way. Moreover, even
if the buffer is initialized, we cannot reliably tell if the function
actually failed this way because if the buffer was previously initialized
with NULL byte, then the caller might think that the call actually
succeeded (since the function didn't return NULL and didn't update the
buffer).

Also, sess_build_logline() relies lf_encode_{chunk,string}() functions
which are in fact wrappers for encode_{chunk,string}() functions and thus
exhibit the same error handling mechanism. It turns out that
sess_build_logline() makes unsafe use of those functions because it uses
the error-checking logic mentionned above while buffer (tmplog) is not
guaranteed to be initialized when entering the function. This may
ultimately cause malfunctions or invalid reads if the output buffer is
lacking space.

To fix the issue once and for all and prevent similar bugs from being
introduced, we make it so encode_{string, chunk} and escape_string()
(based on encode_string()) now explicitly return NULL on failure
(when the function failed to write at least the ending NULL byte)

lf_encode_{string,chunk}() helpers had to be patched as well due to code
duplication.

This should be backported to all stable versions.

[ada: for 2.4 and 2.6 the patch won't apply as-is, it might be helpful to
 backport ae1e14d65 ("CLEANUP: tools: removing escape_chunk() function")
 first, considering it's not very relevant to maintain a dead function]
2024-04-09 17:35:45 +02:00
Valentine Krasnobaeva
eef14e9574 CLEANUP: global: remove LSTCHK_CAP_BIND
Remove LSTCHK_CAP_BIND as it is never set and never checked.
2024-04-05 18:01:54 +02:00
Valentine Krasnobaeva
f0b6436f57 MEDIUM: capabilities: check process capabilities sets
Since the Linux capabilities support add-on (see the commit bd84387beb
("MEDIUM: capabilities: enable support for Linux capabilities")), we can also
check haproxy process effective and permitted capabilities sets, when it
starts and runs as non-root.

Like this, if needed network capabilities are presented only in the process
permitted set, we can get this information with capget and put them in the
process effective set via capset. To do this properly, let's introduce
prepare_caps_from_permitted_set().

First, it checks if binary effective set has CAP_NET_ADMIN or CAP_NET_RAW. If
there is a match, LSTCHK_NETADM is removed from global.last_checks list to
avoid warning, because in the initialization sequence some last configuration
checks are based on LSTCHK_NETADM flag and haproxy process euid may stay
unpriviledged.

If there are no CAP_NET_ADMIN and CAP_NET_RAW in the effective set, permitted
set will be checked and only capabilities given in 'setcap' keyword will be
promoted in the process effective set. LSTCHK_NETADM will be also removed in
this case by the same reason. In order to be transparent, we promote from
permitted set only capabilities given by user in 'setcap' keyword. So, if
caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW, LSTCHK_NETADM would not
be unset and warning about missing priviledges will be emitted at
initialization.

Need to call it before protocol_bind_all() to allow binding to priviledged
ports under non-root and 'setcap cap_net_bind_service' must be set in the
global section in this case.
2024-04-05 18:01:54 +02:00
Amaury Denoyelle
0489d85263 MINOR: listener: implement GUID support
This commit is similar with the two previous ones. Its purpose is to add
GUID support on listeners. Due to bind_conf and listeners configuration,
some specifities were required.

Its possible to define several listeners on a single bind line, for
example by specifying multiple addresses. As such, it's impossible to
support a "guid" keyword on a bind line. The problem is exacerbated by
the cloning of listeners when sharding is used.

To resolve this, a new keyword "guid-prefix" is defined for bind lines.
It allows to specify a string which will be used as a prefix for
automatically generated GUID for each listeners attached to a bind_conf.

Automatic GUID listeners generation is implemented via a new function
bind_generate_guid(). It is called on post-parsing, after
bind_complete_thread_setup(). For each listeners on a bind_conf, a new
GUID is generated with bind_conf prefix and the index of the listener
relative to other listeners in the bind_conf. This last value is stored
in a new bind_conf field named <guid_idx>. If a GUID cannot be inserted,
for example due to a non-unique value, an error is returned, startup is
interrupted with configuration rejected.
2024-04-05 15:40:42 +02:00
Amaury Denoyelle
8259456981 MINOR: server: implement GUID support
This commit is similar to previous one, except that it implements GUID
support for server instances. A guid_node field is inserted into server
structure. A new "guid" server keyword is defined.
2024-04-05 15:40:42 +02:00
Amaury Denoyelle
da754b4533 MINOR: proxy: implement GUID support
Implement proxy identiciation through GUID. As such, a guid_node member
is inserted into proxy structure. A proxy keyword "guid" is defined to
allow user to fix its value.
2024-04-05 15:40:42 +02:00
Amaury Denoyelle
1009ca4160 MINOR: guid: restrict guid format
GUID format is unspecified to allow users to choose the naming scheme.
Some restrictions however are added by this patch, mainly to ensure
coherence and memory usage.

The first restriction is on the length of GUID. No more than 127
characters can be used to prevent memory over consumption.

The second restriction is on the character set allowed in GUID. Utility
function invalid_char() is used for this : it allows alphanumeric
values and '-', '_', '.' and ':'.
2024-04-05 15:40:42 +02:00
Amaury Denoyelle
84fa6b344a MINOR: guid: introduce global UID module
Define a new module guid. Its purpose is to be able to attach a global
identifier for various objects such as proxies, servers and listeners.

A new type guid_node is defined. It will be stored in the objects which
can be referenced by such GUID. Several functions are implemented to
properly initialized, insert, remove and lookup GUID in a global tree.
Modification operations should only be conducted under thread isolation.
2024-04-05 15:40:42 +02:00
Aurelien DARRAGON
e751eebfc6 MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing
Currently, the way proxy-oriented logformat directives are handled is way
too complicated. Indeed, "log-format", "log-format-error", "log-format-sd"
and "unique-id-format" all rely on preparsing hints stored inside
proxy->conf member struct. Those preparsing hints include the original
string that should be compiled once the proxy parameters are known plus
the config file and line number where the string was found to generate
precise error messages in case of failure during the compiling process
that happens within check_config_validity().

Now that lf_expr API permits to compile a lf_expr struct that was
previously prepared (with original string and config hints), let's
leverage lf_expr_compile() from check_config_validity() and instead
of relying on individual proxy->conf hints for each logformat expression,
store string and config hints in the lf_expr struct directly and use
lf_expr helpers funcs to handle them when relevant (ie: original
logformat string freeing is now done at a central place inside
lf_expr_deinit(), which allows for some simplifications)

Doing so allows us to greatly simplify the preparsing logic for those 4
proxy directives, and to finally save some space in the proxy struct.

Also, since httpclient proxy has its "logformat" automatically compiled
in check_config_validity(), we now use the file hint from the logformat
expression struct to set an explicit name that will be reported in case
of error ("parsing [httpclient:0] : ...") and remove the extraneous check
in httpclient_precheck() (logformat was parsed twice previously..)
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
2b79457bc0 MEDIUM: log: add compiling logic to logformat expressions
split parse_logformat_string() into two functions:

parse_logformat_string() sticks to the same behavior, but now becomes an
helper for lf_expr_compile() which uses explicit arguments so that it
becomes possible to use lf_expr_compile() without a proxy, but also
compile an expression which was previously prepared for compiling (set
string and config hints within the logformat expression to avoid manually
storing string and config context if the compiling step happens later).

lf_expr_dup() may be used to duplicate an expression before it is
compiled, lf_expr_xfer() now makes sure that the input logformat is
already compiled.

This is some prerequisite works for log-profiles implementation, no
functional change should be expected.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
7a21c3a4ef MAJOR: log: implement proper postparsing for logformat expressions
This patch tries to address a design flaw with how logformat expressions
are parsed from config. Indeed, some parse_logformat_string() calls are
performed during config parsing when the proxy mode is not yet known.

Here's a config example that illustrates the issue:

  defaults
     mode tcp

  listen test
     bind :8888
     http-response set-header custom-hdr "%trl" # needs http
     mode http

The above config should work, because the effective proxy mode is http,
yet haproxy fails with this error:

  [ALERT]    (99051) : config : parsing [repro.conf:6] : error detected in proxy 'test' while parsing 'http-response set-header' rule : format tag 'trl' is reserved for HTTP mode.

To fix the issue once and for all, let's implement smart postparsing for
logformat expressions encountered during config parsing:

  - split parse_logformat_string() (and subfonctions) in order to create a
    new lf_expr_postcheck() function that must be called to finish
    preparing and checking the logformat expression once the proxy type is
    known.
  - save some config hints info during parse_logformat_string() to
    generate more precise error messages during lf_expr_postcheck(), if
    needed, we rely on curpx->conf.args.{file,line} hints for that because
    parse_logformat_string() doesn't know about current file and line
    number.
  - lf_expr_postcheck() uses PR_FL_CHECKED proxy flag to know if the
    function may try to make the proxy compatible with the expression, or
    if it should simply fail as soon as an incompatibility is detected.
  - if parse_logformat_string() is called from an unchecked proxy, then
    schedule the expression for postparsing, else (ie: during runtime),
    run the postcheck right away.

This change will also allow for some logformat expression error handling
simplifications in the future.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
56d8074798 MINOR: proxy: add PR_FL_CHECKED flag
PR_FL_CHECKED is set on proxy once the proxy configuration was fully
checked (including postparsing checks).

This information may be useful to functions that need to know if some
config-related proxy properties are likely to change or not due to parsing
or postparsing/check logics. Also, during runtime, except for some rare cases
config-related proxy properties are not supposed to be changed.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
6810c41f8e MEDIUM: tree-wide: add logformat expressions wrapper
log format expressions are broadly used within the code: once they are
parsed from input string, they are converted to a linked list of
logformat nodes.

We're starting to face some limitations because we're simply storing the
converted expression as a generic logformat_node list.

The first issue we're facing is that storing logformat expressions that
way doesn't allow us to add metadata alongside the list, which is part
of the prerequites for implementing log-profiles.

Another issue with storing logformat expressions as generic lists of
logformat_node elements is that it's starting to become really hard to
tell when we rely on logformat expressions or not in the code given that
there isn't always a comment near the list declaration or manipulation
to indicate that it's relying on logformat expressions under the hood,
so this adds some complexity for code maintenance.

This patch looks quite impressive due to changes in a lot of header and
source files (since logformat expressions are broadly used), but it does
a simple thing: it defines the lf_expr structure which itself holds a
generic list of logformat nodes, and then declares some helpers to
manipulate lf_expr elements and fixes the code so that we now exclusively
manipulate logformat_node lists as lf_expr elements outside of log.c.

For now, lf_expr struct only contains the list of logformat nodes (no
additional metadata), but now that we have dedicated type and helpers,
doing so in the future won't be problematic at all and won't require
extensive code changes.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
7d8f45b647 MEDIUM: log: carry tag context in logformat node
This is a pretty simple patch despite requiring to make some visible
changes in the code:

When parsing a logformat string, log tags (ie: '%tag', AKA log tags) are
turned into logformat nodes with their type set to the type of the
corresponding logformat_tag element which was matched by name. Thus, when
"compiling" a logformat tag, we only keep a reference to the tag type
from the original logformat_tag.

For example, for "%B" log tag, we have the following logformat_tag
element:

  {
    .name = "B",
    .type = LOG_FMT_BYTES,
    .mode = PR_MODE_TCP,
    .lw = LW_BYTES,
    .config_callback = NULL
  }

When parsing "%B" string, we search for a matching logformat tag
inside logformat_tags[] array using the provided name, once we find a
matching element, we craft a logformat node whose type will be
LOG_FMT_BYTES, but from the node itself, we no longer have access to
other informations that are set in the logformat_tag struct element.

Thus from a logformat_node resulting from a log tag, with current
implementation, we cannot easily get back to matching logformat_tag
struct element as it would require us to scan the whole logformat_tags
array at runtime using node->type to find the matching element.

Let's take a simpler path and consider all tag-specific LOG_FMT_*
subtypes as being part of the same logformat node type: LOG_FMT_TAG.

Thanks to that, we're now able to distinguish logformat nodes made
from logformat tag from other logformat nodes, and link them to
their corresponding logformat_tag element from logformat_tags[] array. All
it costs is a simple indirection and an extra pointer in logformat_node
struct.

While at it, all LOG_FMT_* types related to logformat tags were moved
inside log.c as they have no use outside of it since they are simply
lookup indexes for sess_build_logline() and could even be replaced by
function pointers some day...
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
8cf5c3d7f0 MINOR: log: expose logformat_tag struct
rename logformat_type internal struct to logformat_tag to to make it less
confusing, then expose logformat_tag struct through header file so that it
can be referenced in other structs.

also rename logformat_keywords[] to logformat_tags[] for better
consistency.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
c85cbc1061 MEDIUM: log: rename logformat var to logformat tag
What we use to call logformat variable in the code is referred as
log-format tag in the documentation. Having both 'var' and 'tag' labels
referring to the same thing is really confusing. Let's make the code
comply with the documentation by replacing all logformat var/variable/VAR
occurences with either tag or TAG.

No functional change should be expected, the only visible side-effect from
user point of view is that "variable" was replaced by "tag" in some error
messages.
2024-04-04 19:10:01 +02:00
Willy Tarreau
1a088da7c2 MAJOR: stktable: split the keys across multiple shards to reduce contention
In order to reduce the contention on the table when keys expire quickly,
we're spreading the load over multiple trees. That counts for keys and
expiration dates. The shard number is calculated from the key value
itself, both when looking up and when setting it.

The "show table" dump on the CLI iterates over all shards so that the
output is not fully sorted, it's only sorted within each shard. The Lua
table dump just does the same. It was verified with a Lua program to
count stick-table entries that it works as intended (the test case is
reproduced here as it's clearly not easy to automate as a vtc):

  function dump_stk()
    local dmp = core.proxies['tbl'].stktable:dump({});
    local count = 0
    for _, __ in pairs(dmp) do
        count = count + 1
    end
    core.Info('Total entries: ' .. count)
  end

  core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0);

  ##
  global
    tune.lua.log.stderr on
    lua-load-per-thread lua-cnttbl.lua

  listen front
    bind :8001
    http-request lua.dump_stk if { path_beg /stk }
    http-request track-sc1 rand(),upper,hex table tbl
    http-request redirect location /

  backend tbl
    stick-table size 100k type string len 12 store http_req_cnt

  ##
  $ h2load -c 16 -n 10000 0:8001/
  $ curl 0:8001/stk

  ## A count close to 100k appears on haproxy's stderr
  ## On the CLI, "show table tbl" | wc will show the same.

Some large parts were reindented only to add a top-level loop to iterate
over shards (e.g. process_table_expire()). Better check the diff using
git show -b.

The number of shards is decided just like for the pools, at build time
based on the max number of threads, so that we can keep a constant. Maybe
this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used,
and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the
measurements made for the pools. It turns out that this value seems to
be the most reasonable one without inflating the struct stktable too
much. By default for 1024 threads the value is 32 and delivers 980k RPS
in a test involving 80 threads, while adding 1kB to the struct stktable
(roughly doubling it). The same test at 64 gives 1008 kRPS and at 128
it gives 1040 kRPS for 8 times the initial size. 16 would be too low
however, with 675k RPS.

The stksess already have a shard number, it's the one used to decide which
peer connection to send the entry. Maybe we should also store the one
associated with the entry itself instead of recalculating it, though it
does not happen that often. The operation is done by hashing the key using
XXH32().

The peers also take and release the table's lock but the way it's used
it not very clear yet, so at this point it's sure this will not work.

At this point, this allowed to completely unlock the performance on a
80-thread setup:

 before: 5.4 Gbps, 150k RPS, 80 cores
  52.71%  haproxy    [.] stktable_lookup_key
  36.90%  haproxy    [.] stktable_get_entry.part.0
   0.86%  haproxy    [.] ebmb_lookup
   0.18%  haproxy    [.] process_stream
   0.12%  haproxy    [.] process_table_expire
   0.11%  haproxy    [.] fwrr_get_next_server
   0.10%  haproxy    [.] eb32_insert
   0.10%  haproxy    [.] run_tasks_from_lists

 after: 36 Gbps, 980k RPS, 80 cores
  44.92%  haproxy    [.] stktable_get_entry
   5.47%  haproxy    [.] ebmb_lookup
   2.50%  haproxy    [.] fwrr_get_next_server
   0.97%  haproxy    [.] eb32_insert
   0.92%  haproxy    [.] process_stream
   0.52%  haproxy    [.] run_tasks_from_lists
   0.45%  haproxy    [.] conn_backend_get
   0.44%  haproxy    [.] __pool_alloc
   0.35%  haproxy    [.] process_table_expire
   0.35%  haproxy    [.] connect_server
   0.35%  haproxy    [.] h1_headers_to_hdr_list
   0.34%  haproxy    [.] eb_delete
   0.31%  haproxy    [.] srv_add_to_idle_list
   0.30%  haproxy    [.] h1_snd_buf

WIP: uint64_t -> long

WIP: ulong -> uint

code is much smaller
2024-04-03 17:34:47 +02:00
Willy Tarreau
4c1480f13b MINOR: stick-tables: mark the seen stksess with a flag "seen"
Right now we're taking the stick-tables update lock for reads just for
the sake of checking if the update index is past it or not. That's
costly because even taking the read lock is sufficient to provoke a
cache line write, while when under load or attack it's frequent that
the update has not yet been propagated and wouldn't require anything.

This commit brings a new field to the stksess, "seen", which is zeroed
when the entry is updated, and set to one as soon as at least one peer
starts to consult it. This way it will reflect that the entry must be
updated again so that this peer can see it. Otherwise no update will
be necessary. For now the flag is only set/reset but not exploited.
A great care is taken to avoid writes whenever possible.
2024-04-03 17:34:47 +02:00
William Lallemand
aa3632962f MEDIUM: mworker: get rid of libsystemd
Given the xz drama which allowed liblzma to be linked to openssh, lets remove
libsystemd to get rid of useless dependencies.

The sd_notify API seems to be stable and is now documented. This patch replaces
the sd_notify() and sd_notifyf() function by a reimplementation inspired by the
systemd documentation.

This should not change anything functionnally. The function will be built when
haproxy is built using USE_SYSTEMD=1.

References:
  https://github.com/systemd/systemd/issues/32028
  https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes

Before:

wla@kikyo:~% ldd /usr/sbin/haproxy
	linux-vdso.so.1 (0x00007ffcfaf65000)
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x000074637fef4000)
	libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x000074637fe4f000)
	libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x000074637f400000)
	liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x000074637fe0d000)
	libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x000074637f92a000)
	libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x000074637f365000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000074637f000000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000074637f27a000)
	libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x000074637fdff000)
	libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x000074637eeb8000)
	liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x000074637fdcd000)
	libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x000074637ee01000)
	liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x000074637fda8000)
	/lib64/ld-linux-x86-64.so.2 (0x000074637ff5d000)
	libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x000074637f904000)

After:

wla@kikyo:~% ldd /usr/sbin/haproxy
	linux-vdso.so.1 (0x00007ffd51901000)
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f758d6c0000)
	libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007f758d61b000)
	libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f758ca00000)
	liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x00007f758d5d9000)
	libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f758d365000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f758d5ba000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f758c600000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f758c915000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f758d729000)

A backport to all stable versions could be considered at some point.
2024-04-03 15:53:18 +02:00
Frederic Lecaille
a305bb92b9 MINOR: quic: HyStart++ implementation (RFC 9406)
This is a simple algorithm to replace the classic slow start phase of the
congestion control algorithms. It should reduce the high packet loss during
this step.

Implemented only for Cubic.
2024-04-02 18:47:19 +02:00
Anthony Deschamps
faa8c3e024 MEDIUM: lb-chash: Deterministic node hashes based on server address
Motivation: When services are discovered through DNS resolution, the order in
which DNS records get resolved and assigned to servers is arbitrary. Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they have
assigned different IPs and ports to the server in that slot.

This patch adds a server option, "hash-key <key>" which can be set to "id" (the
existing behaviour, default), "addr", or "addr-port". By deriving the keys for
the chash tree nodes from a server's address and port we ensure that independent
HAProxy instances will agree on routing decisions. If an address is not known
then the key is derived from the server's puid as it was previously.

When adjusting a server's weight, we now check whether the server's hash has
changed. If it has, we have to remove all its nodes first, since the node keys
will also have to change.
2024-04-02 07:00:10 +02:00
Aurelien DARRAGON
3c6dfa618a MEDIUM: log/balance: leverage lbprm api for log load-balancing
log load-balancing implementation was not seamlessly integrated within
lbprm API. The consequence is that it could become harder to maintain
over time since it added some specific cases just for the log backend.
Moreover, it resulted in some code duplication since balance algorithms
that are common to logs and regular (tcp, http) backends were specifically
rewritten for log backends.

Thanks to the previous commit, we now have all the prerequisites to make
log load-balancing fully leverage lbprm logic. Thus in this patch we make
__do_send_log_backend() use existing lbprm algorithms, and we no longer
require log-specific lbprm initialization in cfgparse.c and in
postcheck_log_backend().

As a bonus, for log backends this allows weighed algorithms to properly
support weights (ie: roundrobin, random and log-hash) since we now
leverage the same lb algorithms that we use for tcp/http backends
(doc was updated).
2024-03-29 17:08:37 +01:00
Aurelien DARRAGON
9aea6df81f MINOR: lbprm: implement true "sticky" balance algo
As previously mentioned in cd352c0db ("MINOR: log/balance: rename
"log-sticky" to "sticky""), let's define a sticky algorithm that may be
used from any protocol. Sticky algorithm sticks on the same server as
long as it remains available.

The documentation was updated accordingly.
2024-03-29 17:08:37 +01:00
Christopher Faulet
87426e82ec MAJOR: cli: Use a custom .snd_buf function to only copy the current command
The CLI applet is now using its own snd_buf callback function. Instead of
copying as most output data as possible, only one command is copied at a
time.

To do so, a new state CLI_ST_PARSEREQ is added for the CLI applet. In this
state, the CLI I/O handle knows a full command was copied into its input
buffer and it must parse this command to evaluate it.
2024-03-28 17:32:55 +01:00
Christopher Faulet
838fb54de6 MINOR: stconn: Add a connection flag to notify sending data are the last ones
This flag can be use by endpoints to know the data to send, via .snd_buf
callback function are the last ones. It is useful to know a shutdown is
pending but it cannot be delivered while sedning data are not consumed.
2024-03-28 17:32:55 +01:00
Christopher Faulet
2c6321842b MEDIUM: applet: Handle applets with their own buffers in put functions
applet_putchk() and other similar functions are now testing the applet's
type to use the applet's outbuf instead of the channel's buffer. This will
ease applets convertion because most of them relies on these functions.
2024-03-28 17:28:20 +01:00
Christopher Faulet
1380b97285 MEDIUM: buf: Add b_getline() and b_getdelim() functions
These functions are very similar to co_getline() and co_getdelim(). The
first one retrieves the longest part of the buffer that is composed
exclusively of characters not in the a delimiter set. The second one stops
on LF only and thus returns a line.
2024-03-28 17:28:20 +01:00
Christopher Faulet
5056cbdb86 MINOR: sc_strm: Add generic version to perform sync receives and sends
sc_sync_recv() and sc_sync_send() were added to use connection or applet
versions, depending on the endpoint type. For now these functions are not
used. But this will be used by process_stream() to replace the connection
version.
2024-03-28 17:28:20 +01:00
Remi Tricot-Le Breton
7359c0c7f4 MEDIUM: ssl: Add 'tune.ssl.ocsp-update.mode' global option
This option can be used to set a default ocsp-update mode for all
certificates of a given conf file. It allows to activate ocsp-update on
certificates without the need to create separate crt-lists. It can still
be superseded by the crt-list 'ocsp-update' option. It takes either "on"
or "off" as value and defaults to "off".
Since setting this new parameter to "on" would mean that we try to
enable ocsp-update on any certificate, and also certificates that don't
have an OCSP URI, the checks performed in ssl_sock_load_ocsp were
softened. We don't systematically raise an error when trying to enable
ocsp-update on a certificate that does not have an OCSP URI, be it via
the global option or the crt-list one. We will still raise an error when
a user tries to load a certificate that does have an OCSP URI but a
missing issuer certificate (if ocsp-update is enabled).
2024-03-27 11:38:28 +01:00
Willy Tarreau
6c1b29d06f MINOR: ring: make the number of queues configurable
Now the rings have one wait queue per group. This should limit the
contention on systems such as EPYC CPUs where the performance drops
dramatically when using more than one CCX.

Tests were run with different numbers and it was showed that value
6 outperforms all other ones at 12, 24, 48, 64 and 80 threads on an
EPYC, a Xeon and an Ampere CPU. Value 7 sometimes comes close and
anything around these values degrades quickly. The value has been
left tunable in the global section.

This commit only introduces everything needed to set up the queue count
so that it's easier to adjust it in the forthcoming patches, but it was
initially added after the series, making it harder to compare.

It was also shown that trying to group the threads in queues by their
thread groups is counter-productive and that it was more efficient to
do that by applying a modulo on the thread number. As surprising as it
seems, it does have the benefit of well balancing any number of threads.
2024-03-25 17:34:19 +00:00
Willy Tarreau
e3f101a19a MINOR: ring: add the definition of a ring waiting cell
This is what will be used to describe one waiting thread, its message
in the queues, and the aggregation of pending messages after it.
2024-03-25 17:34:19 +00:00
Willy Tarreau
c7bd7a68e4 OPTIM: ring: have only one thread at a time wake up all readers
It's inefficient and counter-productive that each ring writer iterates
over all readers to wake them up. Let's just have one in charge of this,
it strongly limits contention. The only thing is that since the thread
is iterating over a list, we want to be sure that if the first readers
have already completed their job, they will be woken up again. For this
we keep a counter of messages delivered after the wakeup started, and
the waking thread will check it before going back to sleep. In order to
avoid looping forever, it will also drop its waking flag soon enough to
possibly let another one take it.

There used to be a few cases of watchdogs before this on a 24-core AMD
EPYC platform on the list iteration those never appeared anymore.
The perf has dropped a bit on 3C6T on the EPYC, from 6.61 to 6.0M but
remains unchanged at 24C48T.
2024-03-25 17:34:19 +00:00
Willy Tarreau
9e99cfbeb6 MAJOR: ring: drop the now unneeded lock
It was only used to protect the list which is now an mt_list so it
doesn't provide any required protection anymore. It obviously also
used to provide strict ordering between the writer and the reader
when the writer started to update the messages, but that's now
covered by the oredered tail updates and updates to the readers
count to protect the area.

The message rate on small thread counts (up to 12) saw a boost of
roughly 5% while on large counts while for large counts it lost
about 2% due to some contention now becoming visible elsewhere.
Typical measures are 6.13M -> 6.61M at 3C6T, and 1.88 -> 1.92M at
24C48T on the EPYC.
2024-03-25 17:34:19 +00:00
Willy Tarreau
a2d2dbf210 MEDIUM: ring/applet: turn the wait_entry list to an mt_list instead
Rings are keeping a lock only for the list, which apparently doesn't
need anything more than an mt_list, so let's first turn it into that
before dropping the lock. There should be no visible effect.
2024-03-25 17:34:19 +00:00
Willy Tarreau
eb3d5f464d MEDIUM: ring: use the topmost bit of the tail as a lock
We're now locking the tail while looking for some room in the ring. In
fact it's still while writing to it, but the goal definitely is to get
rid of the lock ASAP. For this we reserve the topmost bit of the tail
as a lock, which may have as a possible visible effect that buffers will
be limited to 2GB instead of 4GB on 32-bit machines (though in practise,
good luck for allocating more than 2GB contiguous on 32-bit), but in
practice since the size is read with atol() and some operating systems
limit it to LONG_MAX unless passing negative numbers, the limit is
already there.

For now the impact on x86_64 is significant (drop from 2.35 to 1.4M/s
on 48 threads on EPYC 24 cores) but this situation is only temporary
so that changes can be reviewable and bisectable.

Other approaches were attempted, such as using XCHG instead, which is
slightly faster on x86 with low thread counts (but causes more write
contention), and forces readers to stall under heavy traffic because
they can't access a valid value for the queue anymore. A CAS requires
preloading the value and is les good on ARMv8.1. XADD could also be
considered with 12-13 upper bits of the offset dedicated to locking,
but that looks overkill.
2024-03-25 17:34:19 +00:00
Willy Tarreau
dd8ea5d928 MEDIUM: ring: align the head and tail fields in the ring_storage structure
We really want to let the readers and writers act on different areas, so
we want to have the tail and the head on separate cache lines, themselves
separate from the rest of the ring. Doing so improves the performance from
2.15 to 2.35M msg/s at 48 threads on a 24-core EPYC.

This increases the header space from 32 to 192 bytes when threads are
enabled. But since we already have the header size available in the file,
haring remains able to detect the aligned vs unaligned formats and call
dump_v2a() when aligned is detected.
2024-03-25 17:34:19 +00:00
Willy Tarreau
bf3dead20c MEDIUM: ring: remove the struct buffer from the ring
The purpose is to store a head and a tail that are independent so that
we can further improve the API to update them independently from each
other.

The struct was arranged like the original one so that as long as a ring
has its head set to zero (i.e. no recycling) it will continue to work.
The new format is already detectable thanks to the "rsvd" field which
indicates the number of reserved bytes at the beginning. It's located
where the buffer's area pointer previously was, so that older versions
of haring can continue to open the ring in repair mode, and newer ones
can use the fact that the upper bits of that variable are zero to guess
that it's working with the new format instead of the old one. Also let's
keep in mind that the layout will further change to place some alignment
constraints.

The haring tool will thus updated based on this and it detects that the
rsvd field is smaller than a page and that the sum of it with the size
equals the mapped size, in which case it uses the new dump_v2() function
instead of dump_v1(). The new function also creates a buffer from the
ring's area, size, head and tail and calls the generic one so that no
other code had to be adapted.
2024-03-25 17:34:19 +00:00
Willy Tarreau
01aa0a057c MEDIUM: ring: change the ring reader to use the new vector-based API now
The code now looks cleaner and more easily shows what still needs to be
addressed. There are not that many changes in practice, these are mostly
mechanical, essentially hiding the buffer from the callers.
2024-03-25 17:34:19 +00:00
Willy Tarreau
03816ccfa9 MAJOR: ring: insert an intermediary ring_storage level
We'll need to add more complex structures in the ring, such as wait
queues. That's far too much to be stored into the area in case of
file-backed contents, so let's split the ring definition and its
storage once for all.

This patch introduces a struct ring_storage which is assigned to
ring->storage, which contains minimal information to represent the
storage layout, i.e. for now only the buffer, and all the rest
remains in the ring itself. The storage is appended immediately after
it and the buffer's pointer always points to that area. It has the
benefit of remaining 100% compatible with the existing file-backed
layout. In memory, the allocation loses the size of a struct buffer.

It's not even certain it's worth placing the size there, given that it's
constant and that a dump of a ring wouldn't really need it (the file size
is sufficient). But for now everything comes with the struct buffer, and
later this will change once split into head and tail. Also this area may
be completed with more information in the future (e.g. storage version,
format, endianness, word size etc).
2024-03-25 17:34:19 +00:00
Willy Tarreau
01abdcb307 MINOR: ring: add a flag to indicate a mapped file
Till now we used to rely on a heuristic pointer comparison to check if
a ring was mapped or allocated. Better assign a flag to clarify this
because it's going to become difficult otherwise.
2024-03-25 17:34:19 +00:00
Willy Tarreau
b30fd8cc2d MINOR: ring: also add ring_area(), ring_head(), ring_tail()
These will essentially be used to simplify the conversion to a new API.
2024-03-25 17:34:19 +00:00
Willy Tarreau
dc4836c15c MINOR: ring: add ring_dup() to copy a ring into another one
This will mostly be used during reallocation and boot-time duplicates,
the purpose is simply to save the caller from having to know the details
of the internal representation.
2024-03-25 17:34:19 +00:00
Willy Tarreau
a185d3d90d MINOR: ring: add ring_size() to return the ring's size
This is just to ease conversion so that callers stop accessing the ring's
buffer.
2024-03-25 17:34:19 +00:00
Willy Tarreau
4c41fcd0da MINOR: ring: add ring_data() to report the amount of data in a ring
This will be used as an accessor for the few functions that need this
outside of ring.c.
2024-03-25 17:34:19 +00:00
Willy Tarreau
c222cb8389 MINOR: vecpair: add necessary functions to use vecpairss from/to ring APIs
Many ring-based APIs need a tail and a head, with some extra assumption
that the user takes care of not filling the ring so that tail==head is
unambiguous. Vectors are particularly suited to this usage so here we
create 4 functions to create vectors representing free room or data
from a ring, as well as updating rings based on a pair of vectors that
represents either free space or data.
2024-03-25 17:34:19 +00:00
Willy Tarreau
63261aae39 MINOR: vecpair: add new vector pair based data manipulation mechanisms
The buffers API defines both a storage layout and how to handle the
data. The storage is shared with the chunks API which only deals with
non-wrapping messages while buffers support wrapping both of the data
and of the free space. As such, most of the buffers code already makes
special cases of two parts in a buffer, the first one before wrapping
and the optional second one after the wrapping occurred.

The thing is, there are plenty of other places (e.g. rings) where the
code dealing with wrapping is desirable but with a different storage
layout. Let's export the existing buffer handling code related to
reading/writing wrapping data and make it work with arbitrary vector
pairs instead. This will handle wrapping and holes in messages if
desired, and it will be up to the caller to decide how its messages
are arranged and to pass the relevant ptr,len elements.

The code is limited to two vectors because this is sufficient to deal
with wrapping without making the code needlessly complex. I.e. this will
not reassemble an iovec. For vectors, since we already had the ist type,
there's no point inventing a new type, and it's even possible that over
time some callers will find benefits in using this unified API (i.e. no
NOP translation layer). It also allows to pass inputs as direct arguments
and outputs as pointers. Not only this is more efficient code-wise, but
it also avoids the accidental use of a wrong function. It was indeed
found that naming functions is even harder than with the buffer as the
notion of from/to is even fuzzier here.

The API will likely continue to evolve and some functions might get
renamed to more explicit ones over time to limit confusion. For now
the code provides anything needed to reset/create/fill/erase/read/peek
or measure vector pairs and to manipulate chars/blocks/varints to/from
there.
2024-03-25 17:34:19 +00:00
Willy Tarreau
0b1c17a2dd MINOR: ring: reserve one special value for the readers count
In order to support concurrent writers we'll need to lock areas in the
buffer. For this we'll use one special value of the single-byte readers
count. Let's reserve it now and use the macro instead of the hardcoded
255.
2024-03-25 17:34:19 +00:00
Willy Tarreau
63242a59c4 MINOR: buf: add b_getblk_ofs() that works relative to area and not head
For some concurrently accessed buffers we can't rely on head/data etc,
but sometimes the access patterns guarantees that the buffer contents
are there. Let's implement a function to read contents from a fixed
offset, which never checks head nor data, only the area and its size.
It's the caller's job to get this offset.
2024-03-25 17:34:19 +00:00
Willy Tarreau
2f28981546 MINOR: buf: add b_putblk_ofs() to copy a block at a specific position
This new function b_putblk_ofs() puts one full block of data of length
<len> from <blk> into the buffer, starting from absolute offset <offset>
after the buffer's area.  As a convenience to avoid complex checks in
callers, the offset is allowed to exceed a valid one by no more than one
buffer size, and will automatically be wrapped. The caller is responsible
for ensuring that <len> doesn't exceed the known length of the available
room at this position, otherwise data may be overwritten. The buffer's
length is *not* updated, so generally the caller will have updated it
before calling this function. This is meant to be used on concurrently
accessed buffers, so that a writer can append data while a reader is
blocked by other means from reaching the current area The function
guarantees never to use ->head nor ->data.
2024-03-25 17:34:19 +00:00
Willy Tarreau
c5004ccb36 MINOR: buf: add b_rel_ofs() to turn an absolute offset into a relative one
It basically does the opposite of b_peek_ofs(). If x=b_peek_ofs(y), then
y=b_rel_ofs(x).
2024-03-25 17:34:19 +00:00
Willy Tarreau
15e47b6a59 MINOR: buf: add b_add_ofs() to add a count to an absolute position
This function is used to compute a new absolute buffer offset by adding a
length to an existing valid offset. It will check for wrapping.
2024-03-25 17:34:19 +00:00
Willy Tarreau
c62a2d540d MEDIUM: ring: move the ring reader code to ring_dispatch_messages()
This new function is made around the loop that scans a ring for new
messages and dispatches them to a message handler. It also takes
ring flags (WAIT, NEW, etc) and offset pointers that the caller will
use to initialize/reuse/update the current processing offset. The
caller is still responsible for presetting it to ~0 before the
first call if it wants the function to automatically adjust it (or set
it to the correct value). The function may also return the last_ofs
that was known before releasing the lock so that the caller knows
what to compare against and if it needs to restart processing or not.
The context remains a void* so that should not necessarily depend on
an appctx.

The current "show ring" code was ported to this and it continues to
work as expected.
2024-03-25 17:34:19 +00:00
Willy Tarreau
ad31e53287 REORG: dns/ring: split the ring between the generic one and the DNS one
A ring is used for the DNS code but slightly differently from the generic
one, which prevents some important changes from being made to the generic
code without breaking DNS. As the use cases differ, it's better to just
split them apart for now and have the DNS code use its own ring that we
rename dns_ring and let the generic code continue to live on its own.

The unused parts such as CLI registration were dropped, resizing and
allocation from a mapped area were dropped. dns_ring_detach_appctx() was
kept despite not being used, so as to stay consistent with the comments
that say it must be called, despite the DNS code explicitly mentioning
that it skips it for now (i.e. this may change in the future).

Hopefully after the generic rings are converted the DNS code can migrate
back to them, though this is really not necessary.
2024-03-25 17:34:19 +00:00
Willy Tarreau
201c706330 MINOR: log/applet: add new function syslog_applet_append_event()
This function takes a buffer on input, and offset and a length, and
consumes the block from that buffer to send it to the appctx's output
buffer. Contrary to its sibling applet_append_line(), instead of just
appending an LF at the end of the line, it prepends the message size
in decimal and a space before the message, as expected by syslog TCP
implementaions. This will be used to simplify the ring reader code.
2024-03-25 17:34:19 +00:00
Willy Tarreau
6ae41dc510 MINOR: applet: add new function applet_append_line()
This function takes a buffer on input, and offset and a length, and
consumes the block from that buffer to send it to the appctx's output
buffer. This will be used to simplify the ring reader code.
2024-03-25 17:34:19 +00:00
Willy Tarreau
c038ca8e8c MINOR: atomic: add a read-specific variant of __ha_cpu_relax()
Tests on various systems show that x86 prefers not to wait at all inside
read loops while aarch64 prefers to wait a little bit. Instead of having
to stuff ifdefs around __ha_cpu_relax() inside plenty of such loops
waiting for a condition to appear, better implement a new variant that
we call __ha_cpu_relax_for_read() which honors each architecture's
preferences and is the same as __ha_cpu_relax() for other ones.
2024-03-25 17:34:19 +00:00
Aurelien DARRAGON
3de1acfb23 BUILD: server: fix build regression on old compilers (<= gcc-4.4)
Willy reported that since 3ac79b504 ("MEDIUM: server:
make server_set_inetaddr() updater serializable"), haproxy fails to
compile on some older compilers such as gcc-4.4 with this kind of error:

  src/server.c: In function 'snr_resolution_cb':
  src/server.c:4471: error: unknown field 'dns_resolver' specified in initializer
  compilation terminated due to -Wfatal-errors.
  make: *** [Makefile:1006: src/server.o] Error 1

This is due to referencing a member inside anonymous union from a compound
literal assignment. Apparently such use of anonymous union wasn't properly
supported back then on older compilers. To fix the issue, we give "u" name
to the parent union use this name to explicitly refer to the union where
relevant in the code (only a few changes fortunately).

The fix itself was verified to restore build compatibility with gcc 4.4
(and even 4.2).

As 3ac79b504 is used as a prerequisite for 64c9c8ef3 ("BUG/MINOR:
server/dns: use server_set_inetaddr() to unset srv addr from DNS"), please
consider backporting this patch too if 64c9c8ef3 happens to be backported
in 2.9.
2024-03-25 16:23:37 +01:00
Amaury Denoyelle
0d4273f04b MEDIUM: server: close private idle connection before server deletion
This commit similar to the following one :
  65ae241dcfe710e1cdd3ec4e7a9bde38d2e4c116
  MEDIUM: server: close idle conn before server deletion

This patch implements a similar logic, this time to close private idle
connections stored in sessions. The principle is identical to the above
commit : conn_release() is used on idle connections after a takeover to
ensure thread safety.

An extra change was required to be able to execute takeover on such
connections. Their original thread ID was unknown, contrary to non
private connections which are stored in sharded lists. As such, a new
tid member has been added under sess_priv_conns chaining element.
2024-03-22 17:12:27 +01:00
Amaury Denoyelle
f3862a9bc7 MINOR: connection: extend takeover with release option
Extend takeover API both for MUX and XPRT with a new boolean argument
<release>. Its purpose is to signal if the connection will be freed
immediately after the takeover, rendering new resources allocation
unnecessary.

For the moment, release argument is always false. However, it will be
set to true on delete server CLI handler to proactively close server
idle connections.
2024-03-22 16:12:36 +01:00
Amaury Denoyelle
ff2e71ae24 MINOR: connection: implement conn_release()
Several places reuse the same code to ensure a connection is properly
freed, either via its MUX or by calling the proper set of functions.
Factorize all of this in a new function conn_release().

This new function is now called via session_free() and
session_accept_fd(). It will also be reused on delete server to
proactively close idle connections.
2024-03-22 16:12:36 +01:00
Remi Tricot-Le Breton
5c25c577a0 BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing
The CLI command "update ssl ocsp-response" was forcefully removing an
OCSP response from the update tree regardless of whether it used to be
in it beforehand or not. But since the main OCSP upate task works by
removing the entry being currently updated from the update tree and then
reinserting it when the update process is over, it meant that in the CLI
command code we were modifying a structure that was already being used.

These concurrent accesses were not properly locked on the "regular"
update case because it was assumed that once an entry was removed from
the update tree, the update task was the only one able to work on it.

Rather than locking the whole update process, an "updating" flag was
added to the certificate_ocsp in order to prevent the "update ssl
ocsp-response" command from trying to update a response already being
updated.

An easy way to reproduce this crash was to perform two "simultaneous"
calls to "update ssl ocsp-response" on the same certificate. It would
then crash on an eb64_delete call in the main ocsp update task function.

This patch can be backported up to 2.8. Wait a little bit before
backporting.
2024-03-20 16:12:10 +01:00
Remi Tricot-Le Breton
69071490ff BUG/MAJOR: ocsp: Separate refcount per instance and per store
With the current way OCSP responses are stored, a single OCSP response
is stored (in a certificate_ocsp structure) when it is loaded during a
certificate parsing, and each SSL_CTX that references it increments its
refcount. The reference to the certificate_ocsp is kept in the SSL_CTX
linked to each ckch_inst, in an ex_data entry that gets freed when the
context is freed.
One of the downsides of this implementation is that if every ckch_inst
referencing a certificate_ocsp gets detroyed, then the OCSP response is
removed from the system. So if we were to remove all crt-list lines
containing a given certificate (that has an OCSP response), and if all
the corresponding SSL_CTXs were destroyed (no ongoing connection using
them), the OCSP response would be destroyed even if the certificate
remains in the system (as an unused certificate).
In such a case, we would want the OCSP response not to be "usable",
since it is not used by any ckch_inst, but still remain in the OCSP
response tree so that if the certificate gets reused (via an "add ssl
crt-list" command for instance), its OCSP response is still known as
well.
But we would also like such an entry not to be updated automatically
anymore once no instance uses it. An easy way to do it could have been
to keep a reference to the certificate_ocsp structure in the ckch_store
as well, on top of all the ones in the ckch_instances, and to remove the
ocsp response from the update tree once the refcount falls to 1, but it
would not work because of the way the ocsp response tree keys are
calculated. They are decorrelated from the ckch_store and are the actual
OCSP_CERTIDs, which is a combination of the issuer's name hash and key
hash, and the certificate's serial number. So two copies of the same
certificate but with different names would still point to the same ocsp
response tree entry.

The solution that answers to all the needs expressed aboved is actually
to have two reference counters in the certificate_ocsp structure, one
actual reference counter corresponding to the number of "live" pointers
on the certificate_ocsp structure, incremented for every SSL_CTX using
it, and one for the ckch stores.
If the ckch_store reference counter falls to 0, the corresponding
certificate must have been removed via CLI calls ('set ssl cert' for
instance).
If the actual refcount falls to 0, then no live SSL_CTX uses the
response anymore. It could happen if all the corresponding crt-list
lines were removed and there are no live SSL sessions using the
certificate anymore.
If any of the two refcounts becomes 0, we will always remove the
response from the auto update tree, because there's no point in spending
time updating an OCSP response that no new SSL connection will be able
to use. But the certificate_ocsp object won't be removed from the tree
unless both refcounts are 0.

Must be backported up to 2.8. Wait a little bit before backporting.
2024-03-20 16:12:10 +01:00
Amaury Denoyelle
c130f74803 BUG/MINOR: session: ensure conn owner is set after insert into session
A crash could occured if a session_add_conn() would temporarily failed
when called via h2_detach(). In this case, connection owner is reset to
NULL. However, if this wasn't the last connection stream, the connection
won't be destroyed. When h2_detach() is recalled for another stream and
this time session_add_conn() succeeds, a crash will occur due to
session_check_idle_conn() invocation with a NULL connection owner.

To fix this, ensure connection owner is always set after
session_add_conn() success.

This bug is considered as minor as the only failure reason for
session_add_conn() is a pool allocation issue.

This should be backported up to all stable releases.
2024-03-20 14:26:57 +01:00
Christopher Faulet
189f74d4ff MINOR: cfgparse: Add a global option to expose deprecated directives
Similarly to "expose-exprimental-directives" option, there is no a global
option to expose some deprecated directives. Idea is to have a way to silent
warnings about deprecated directives when there is no alternative solution.

Of course, deprecated directives covered by this option are not listed and
may change. It is only a best effort to let users upgrade smoothly.
2024-03-15 11:31:48 +01:00
Amaury Denoyelle
7dae3ceaa0 BUG/MAJOR: server: do not delete srv referenced by session
A server can only be deleted if there is no elements which reference it.
This is taken care via srv_check_for_deletion(), most notably for active
and idle connections.

A special case occurs for connections directly managed by a session.
This is for so-called private connections, when using http-reuse never
or H2 + http-reuse safe for example. In this case. server does not
account these connections into its idle lists. This caused a bug as the
server is deleted despite the session still being able to access it.

To properly fix this, add a new referencing element into the server for
these session connections. A mt_list has been chosen for this. On
default http-reuse, private connections are typically not used so it
won't make any difference. If using H2 servers, or more generally when
dealing with private connections, insert/delete should typically occur
only once per session lifetime so impact on performance should be
minimal.

This should be backported up to 2.4. Note that srv_check_for_deletion()
was introduced in 3.0 dev tree. On backport, the extra condition in it
should be placed in cli_parse_delete_server() instead.
2024-03-14 15:21:07 +01:00
Amaury Denoyelle
5ad801c058 MINOR: session: rename private conns elements
By default, backend connections are attached to a server instance. This
allows to implement connection reuse. However, in some particular cases,
connection cannot be shared accross several clients. These connections
are considered and private and are attached to the session instance
instead.

These private connections are also indexed by the target server to not
mix them. All of this is implemented via a dedicated structure
previously named struct sess_srv_list.

Rename it to better reflect its usage to struct sess_priv_conns. Also
rename its internal members and all of the associated functions.

This commit is only a renaming, thus no functional impact is expected.
2024-03-14 15:21:02 +01:00
Aurelien DARRAGON
07b2e84bce BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread (2nd try)
While trying to reproduce another crash case involving lua filters
reported by @bgrooot on GH #2467, we found out that mixing filters loaded
from different contexts ('lua-load' vs 'lua-load-per-thread') for the same
stream isn't supported and may even cause the process to crash.

Historically, mixing lua-load and lua-load-per-threads for a stream wasn't
supported, but this changed thanks to 0913386 ("BUG/MEDIUM: hlua: streams
don't support mixing lua-load with lua-load-per-thread").

However, the above fix didn't consider lua filters's use-case properly:
unlike lua fetches, actions or even services, lua filters don't simply
use the stream hlua context as a "temporary" hlua running context to
process some hlua code. For fetches, actions.. hlua executions are
processed sequentially, so we simply reuse the hlua context from the
previous action/fetch to run the next one (this allows to bypass memory
allocations and initialization, thus it increases performance), unless
we need to run on a different hlua state-id, in which case we perform a
reset of the hlua context.

But this cannot work with filters: indeed, once registered, a filter will
last for the whole stream duration. It means that the filter will rely
on the stream hlua context from ->attach() to ->detach(). And here is the
catch, if for the same stream we register 2 lua filters from different
contexts ('lua-load' + 'lua-load-per-thread'), then we have an issue,
because the hlua stream will be re-created each time we switch between
runtime contexts, which means each time we switch between the filters (may
happen for each stream processing step), and since lua filters rely on the
stream hlua to carry context between filtering steps, this context will be
lost upon a switch. Given that lua filters code was not designed with that
in mind, it would confuse the code and cause unexpected behaviors ranging
from lua errors to crashing process.

So here we take another approach: instead of re-creating the stream hlua
context each time we switch between "global" and "per-thread" runtime
context, let's have both of them inside the stream directly as initially
suggested by Christopher back then when talked about the original issue.

For this we leverage hlua_stream_ctx_prepare() and hlua_stream_ctx_get()
helper functions which return the proper hlua context for a given stream
and state_id combination.

As for debugging infos reported after ha_panic(), we check for both hlua
runtime contexts to check if one of them was active when the panic occured
(only 1 runtime ctx per stream may be active at a given time).

This should be backported to all stable versions with 0913386
("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread")

This commit depends on:
 - "DEBUG: lua: precisely identify if stream is stuck inside lua or not"
   [for versions < 2.9 the ha_thread_dump_one() part should be skipped]
 - "MINOR: hlua: use accessors for stream hlua ctx"

For 2.4, the filters API didn't exist. However it may be a good idea to
backport it anyway because ->set_priv()/->get_priv() from tcp/http lua
applets may also be affected by this bug, plus it will ease code
maintenance. Of course, filters-related parts should be skipped in this
case.
2024-03-13 09:24:46 +01:00
Aurelien DARRAGON
1a2cdf64c9 DEBUG: lua: precisely identify if stream is stuck inside lua or not
When ha_panic() is called by the watchdog, we try to guess from
ha_task_dump() and ha_thread_dump_one() if the thread was stuck while
executing lua from the stream context. However we consider this is the
case by simply checking if the stream hlua context was set, but this is
not very precise because if the hlua context is set, then it simply means
that at least one lua instruction was executed at the stream level, not
that the stuck was currently executing lua when the panic occured.

This is especially true with filters, one could simply register a lua
filter that does nothing but this will still end up initializing the
stream hlua context for each stream. If the thread end up being stuck
during the stream handling, then debug dumping functions will report
that the stream was stuck while handling lua, which is not necessarilly
true, and could in fact confuse us even more.

So here we take another approach, we add the BUSY flag to hlua context:
this flag is set by hlua_ctx_resume() around lua_resume() call, this way
we can precisely tell if the thread was handling lua when it was
interrupted, and we rely on this flag in debug functions to check if the
thread was effectively stuck inside lua or not while processing the stream

No backport needed unless a commit depends on it.
2024-03-13 09:24:46 +01:00
William Lallemand
bbc215d3bd CLEANUP: ssl: remove useless #ifdef in openssl-compat.h
Remove a useless #ifdef in openssl-compat.h
2024-03-13 08:51:04 +01:00
William Lallemand
501d9fdb86 MEDIUM: ssl: allow to change the OpenSSL security level from global section
The new "ssl-security-level" option allows one to change the OpenSSL
security level without having to change the openssl.cnf global file of
your distribution. This directives applies on every SSL_CTX context.

People sometimes change their security level directly in the ciphers
directive, however there are some cases when the security level change
is not applied in the right order (for example when applying a DH
param).

Before this patch, it was to possible to trick by using a specific
openssl.cnf file and start haproxy this way:

    OPENSSL_CONF=./openssl.cnf ./haproxy -f bug-2468.cfg

Values for the security level can be found there:

https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_security_level.html

This was discussed in github issue #2468.
2024-03-12 17:37:11 +01:00
Amaury Denoyelle
c499d66f37 MINOR: quic: remove qc_treat_rx_crypto_frms()
This commit removes qc_treat_rx_crypto_frms(). This function was used in
a single place inside qc_ssl_provide_all_quic_data(). Besides, its
naming was confusing as conceptually it is directly linked to quic_ssl
module instead of quic_rx.

Thus, body of qc_treat_rx_crypto_frms() is inlined directly inside
qc_ssl_provide_all_quic_data(). Also, qc_ssl_provide_quic_data() is now
only used inside quic_ssl to its scope is set to static. Overall, API
for CRYPTO frame handling is now cleaner.
2024-03-11 14:27:51 +01:00
matthias sweertvaegher
062ea3a3d4 BUILD: solaris: fix compilation errors
Compilation on solaris fails because of usage of names reserved on that
platform, i.e. 'queue' and 's_addr'.

This patch redefines 'queue' as '_queue' and renames 's_addr' to
'srv_addr' which fixes compilation for now.

Future plan: rename 'queue' in code base so define can be removed again.

Backporting: 2.9, 2.8
2024-03-09 11:24:54 +01:00
Willy Tarreau
758cb450a2 OPTIM: sink: drop the sink lock used to count drops
The sink lock was made to prevent event producers from passing while
there were other threads trying to print a "dropped" message, in order
to guarantee the absence of reordering. It has a serious impact however,
which is that all threads need to take the read lock when producing a
regular trace even when there's no reader.

This patch takes a different approach. The drop counter is shifted left
by one so that the lowest bit is used to indicate that one thread is
already taking care of trying to dump the counter. Threads only read
this value normally, and will only try to change it if it's non-null,
in which case they'll first check if they are the first ones trying to
dump it, otherwise will simply count another drop and leave. This has
a large benefit. First, it will avoid the locking that causes stalls
as soon as a slow reader is present. Second, it avoids any write on the
fast path as long as there's no drop. And it remains very lightweight
since we just need to add +2 or subtract 2*dropped in operations, while
offering the guarantee that the sink_write() has succeeded before
unlocking the counter.

While a reader was previously limiting the traffic to 11k RPS under
4C/8T, now we reach 36k RPS vs 14k with no reader, so readers will no
longer slow the traffic down and will instead even speed it up due to
avoiding the contention down the chain in the ring. The locking cost
dropped from ~75% to ~60% now (it's in ring_write now).
2024-03-09 11:23:52 +01:00
Willy Tarreau
26cd248feb BUILD: ssl: define EVP_CTRL_AEAD_GET_TAG for older versions
Amaury reported that previous commit 08ac282375 ("MINOR: Add aes_gcm_enc
converter") broke the CI on OpenSSL 1.0.2 due to the define above not
existing there. Let's just map it to its older name when not existing.
For reference, these were renamed when switching to 1.1.0:

    https://marc.info/?l=openssl-cvs&m=142244190907706&w=2

No backport is needed.
2024-03-08 18:23:34 +01:00
Amaury Denoyelle
1ee7bf5bd9 MINOR: quic: always use ncbuf for rx CRYPTO
The previous patch fix the handling of in-order CRYPTO frames which
requires the usage of a new buffer for these data as their handling is
delayed to run under TASK_HEAVY.

In fact, as now all CRYPTO frames handling must be delayed, their
handling can be unify. This is the purpose of this commit, which removes
the just introduced new buffer. Now, all CRYPTO frames are buffered
inside the ncbuf. Unused elements such as crypto_frms member for
encryption level are also removed.

This commit is not a bugcfix but is a direct follow-up to the last one.
As such, it can probably be backported with it to 2.9 to reduce code
differences between these versions.
2024-03-08 17:22:48 +01:00
Amaury Denoyelle
81f118cec0 BUG/MEDIUM: quic: fix handshake freeze under high traffic
QUIC relies on SSL_do_hanshake() to be able to validate handshake. As
this function is computation heavy, it is since 2.9 called only under
TASK_HEAVY. This has been implemented by the following patch :
  94d20be138
  MEDIUM: quic: Heavy task mode during handshake

Instead of handling CRYPTO frames immediately during reception, this
patch delays the process to run under TASK_HEAVY tasklet. A frame copy
is stored in qel.rx.crypto_frms list. However, this frame still
reference the receive buffer. If the receive buffer is cleared before
the tasklet is rescheduled, it will point to garbage data, resulting in
haproxy decryption error. This happens if a fair amount of data is
received constantly to preempt the quic_conn tasklet execution.

This bug can be reproduced with a fair amount of clients. It is
exhibited by 'show quic full' which can report connections blocked on
handshake. Using the following commands result in h2load non able to
complete the last connections.

$ h2load --alpn-list h3 -t 8 -c 800 -m 10 -w 10 -n 8000 "https://127.0.0.1:20443/?s=10k"

Also, haproxy QUIC listener socket mode was active to trigger the issue.
This forces several connections to share the same reception buffer,
rendering the bug even more plausible to occur. It should be possible to
reproduce it with connection socket if increasing the clients amount.

To fix this bug, define a new buffer under quic_cstream. It is used
exclusively to copy CRYPTO data for in-order frame if ncbuf is empty.
This ensures data remains accessible even if receive buffer is cleared.

Note that this fix is only a temporary step. Indeed, a ncbuf is also
already used for out-of-order data. It should be possible to unify its
usage for both in and out-of-order data, rendering this new buffer
instance unnecessary. In this case, several unneeded elements will
become obsolete such as qel.rx.crypto_frms list. This will be done in a
future refactoring patch.

This must be backported up to 2.9.
2024-03-08 17:22:48 +01:00
Nenad Merdanovic
e225e04ba7 MINOR: vars: export var_set and var_unset functions
Co-authored-by: Dragan Dosen <ddosen@haproxy.com>
2024-03-08 17:20:43 +01:00
Willy Tarreau
93a0fb74f4 BUILD: buf: make b_ncat() take a const for the source
In 2.7 with commit 35df34223b ("MINOR: buffers: split b_force_xfer() into
b_cpy() and b_force_xfer()"), b_ncat() was extracted from b_force_xfer()
but kept its source variable instead of constant, making it unusable for
calls from a const source. Let's just fix it.
2024-03-05 11:50:34 +01:00
Willy Tarreau
0a0041d195 BUILD: tree-wide: fix a few missing includes in a few files
Some include files, mostly types definitions, are missing a few includes
to define the types they're using, causing include ordering dependencies
between files, which are most often not seen due to the alphabetical
order of includes. Let's just fix them.

These were spotted by building pre-compiled headers for all these files
to .h.gch.
2024-03-05 11:50:34 +01:00
Willy Tarreau
ac692d7ee5 BUILD: thread: move lock label definitions to thread-t.h
The 'lock_label' enum is defined in thread.h but it's used in a few
type files, so let's move it to thread-t.h to allow explicit includes.
2024-03-05 11:50:34 +01:00
Amaury Denoyelle
f913d42aaf MINOR: quic: add MUX output for show quic
Extend "show quic" to be able to dump MUX related information. This is
done via the new function qcc_show_quic(). This replaces the old streams
dumping list which was incomplete.

These info are displayed on full output or by specifying "mux" field.
2024-02-29 10:03:36 +01:00
Christopher Faulet
60fcc27577 MEDIUM: htx/http-ana: No longer close connection on early HAProxy response
When a response was returned by HAProxy, a dedicated HTX flag was
set. Thanks to this flag, it was possible to add a "connection: close"
header to the response if the request was not fully received and to close
the connection. In the same way, when a redirect rule was applied,
keep-alive was forcefully disabled for unfinished requests.

All these mechanisms are now useless because the H1 mux is able to drain the
response. So HTX_FL_PROXY_RESP flag is removed and no special processing is
performed on HAProxy response when the request is unfinished.
2024-02-28 16:02:33 +01:00
Christopher Faulet
077906da14 MAJOR: mux-h1: Drain requests on client side before shut a stream down
unlike for H2 and H3, there is no mechanism in H1 to notify the client it
must stop to upload data when a response is replied before the end of the
request without closing the connection. There is no RST_STREAM frame
equivalent.

Thus, there is only two ways to deal with this situation: closing the
connection or draining the request. Until now, HAProxy didn't support
draining H1 messages. Closing the connection in this case has however a
major drawback. It leads to send a TCP reset, dropping this way all in-fly
data. There is no warranty the client has fully received the response.

Draining H1 messages was never implemented because in old versions it was a
bit tricky to implement. However, it is now far simplier to support this
feature because it is possible to have a H1 stream without any applicative
stream. It is the purpose of this patch. Now, when a shutdown is requested
and the stream is detached from the connection, if the request is unfinished
while the response was fully sent, the request in drained.

To do so, in this case the shutdown and the detach are delayed. From the
upper layer point of view, there is no changes. The endpoint is shut down
and detached as usual. But on H1 mux point of view, the H1 stream is still
alive and is being able to drain data. However the stream-endpoint
descriptor is orphan. Once the request is fully received (and drained), the
connection is shut down if it cannot be reused for a new transaction and the
H1 stream is destroyed.
2024-02-28 16:02:33 +01:00
Amaury Denoyelle
8a31783b64 BUG/MEDIUM: server: fix dynamic servers initial settings
Contrary to static servers, dynamic servers does not initialize their
settings from a default server instance. As such, _srv_parse_init() was
responsible to set a set of minimal values to have a correct behavior.

However, some settings were not properly initialized. This caused
dynamic servers to not behave as static ones without explicit
parameters.

Currently, the main issue detected is connection reuse which was
completely impossible. This is due to incorrect pool_purge_delay and
max_reuse settings incompatible with srv_add_to_idle_list().

To fix the connection reuse, but also more generally to ensure dynamic
servers are aligned with other server instances, define a new function
srv_settings_init(). This is used to set initial values for both default
servers and dynamic servers. For static servers, srv_settings_cpy() is
kept instead, using their default server as reference.

This patch could have unexpected effects on dynamic servers behavior as
it restored proper initial settings. Previously, they were set to 0 via
calloc() invocation from new_server().

This should be backported up to 2.6, after a brief period of
observation.
2024-02-27 17:02:20 +01:00
William Lallemand
4895fdac5a BUG/MAJOR: ssl/ocsp: crash with ocsp when old process exit or using ocsp CLI
This patch reverts 2 fixes that were made in an attempt to fix the
ocsp-update feature used with the 'commit ssl cert' command.

The patches crash the worker when doing a soft-stop when the 'set ssl
ocsp-response' command was used, or during runtime if the ocsp-update
was used.

This was reported in issue #2462 and #2442.

The last patch reverted is the associated reg-test.

Revert "BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing"
This reverts commit 5e66bf26ec.

Revert "BUG/MEDIUM: ocsp: Separate refcount per instance and per store"
This reverts commit 04b77f84d1b52185fc64735d7d81137479d68b00.

Revert "REGTESTS: ssl: Add OCSP related tests"
This reverts commit acd1b85d3442fc58164bd0fb96e72f3d4b501d15.
2024-02-26 18:04:25 +01:00
Willy Tarreau
a4d44250eb BUG/MINOR: ist: only store NUL byte on succeeded alloc
The trailing NUL added at the end of istdup() by recent commit de0216758
("BUG/MINOR: ist: allocate nul byte on istdup") was placed outside of
the pointer validity test, rightfully showing null deref warnings. This
fix should be backported along with the fix above, to the same versions.
2024-02-23 19:51:54 +01:00
Miroslav Zagorac
3f771f5118 MINOR: ssl: Call callback function after loading SSL CRL data
Due to the possibility of calling a control process after adding CRLs, the
ssl_commit_crlfile_cb variable was added.  It is actually a pointer to the
callback function, which is called if defined after initial loading of CRL
data from disk and after committing CRL data via CLI command
'commit ssl crl-file ..'.

If the callback function returns an error, then the CLI commit operation
is terminated.

Also, one case was added to the CLI context used by "commit cafile" and
"commit crlfile": CACRL_ST_CRLCB in which the callback function is called.

Signed-off-by: William Lallemand <wlallemand@haproxy.com>
2024-02-23 18:12:27 +01:00
Christopher Faulet
3d93ecc132 BUG/MAJOR: cli: Restore non-interactive mode behavior with pipelined commands
The issue was decribed in commit "BUG/MEDIUM: cli: Warn if pipelined commands
are delimited by a \n". In non-interactive mode, it was possible to use a
newline character as delimiter for pipelined commands. As a consequence, it was
possible to stop commands processing on the middle.

With the above commit, a warning is emitted to notify users. With this one,
we restore the expected behavior, as documented in the management guide.
Only the first line of commands is parsed. This commit will not be
backported to avoid breaking changes on stable versions.

This commit has of course some visible effects. All script using a newline
character as delimiter to pipeline commands in non-interactive mode will
stop working. Only the first command will be evaluated, all others will be
ignored. Pipelined commands MUST now be separated by a semi-colon.

But there is a more subtle and probably more annoying change. It is no
longer possible to pipeline commands with a payload ! A command with a
payload will always be the last one evaluated because it must be finished by
a newline (eventually preceeded by a custom pattern).

It is really annoying to introduce such breaking change. But, on the long
term, it is mandatory. The 2.8 will be the last LST version supporting the
old behavior (with some warning however). This will let 4 years to users to
adapt their scripts.

No backport needed.
2024-02-23 15:19:49 +01:00
Christopher Faulet
598c7f164c BUG/MEDIUM: cli: Warn if pipelined commands are delimited by a \n
This was broken since commit 0011c25144 ("BUG/MINOR: cli: avoid O(bufsize)
parsing cost on pipelined commands"). It is not really a bug fix but it is
labelled as is to make it more visible.

Before, a full line was first retrieved from the request buffer before
extracting the first command to eval it. Now, only one command is retrieved.
But we rely on the request buffer state to interrupt processing in
non-interactive mode. After a command processing, if output of the request
buffer is empty, we leave. Before the above commit, this was not a problem.
But since then, it is obviously a bad statement. First because some input
data may still be there. It is not true today, but it might change. Then,
there is no warranty to receive all commands in same time. For small list of
commands, it will be most of time the case, but it is a dangerous
assumption. For long list of commands, it is almost always false.

To be an issue, commands must be chunked exactly between two commands. But
in this case, remaining commands are skipped. A good way to reproduce the
issue is to wait a bit between two commands, for instance:

    (printf "show info;"; sleep 2; printf "show stat\n") | socat ...

In fact, to properly fix the issue, we should exit on the first command
finished by a newline. Indeed, as stated in the documentation, in
non-interactive mode, a single line is processed. To pipeline commands,
commands must be separated by a semi-colon. Unfortunately, the above commit
introduced another change. It is possible to pipeline commands delimited by
a newline. It was pushed 2 years ago and backported to all stable versions.
Several scripts may rely on this behavior.

So, on stable version, the bug will not be fixed. However a warning will be
emitted to notify users their scripts don't respect the documentation and
they must adapt it. Mainly because the cli behavior on this point will be
changed in 3.0 to stick to the doc. This warning will only be emitted once
over the whole worker process life. Idea is to not flood the logs with the
same warning for every offending commands.

This commit should probably be backported to all stable versions. But with
some cautions because the CLI was often modified.
2024-02-23 15:19:49 +01:00
Amaury Denoyelle
de02167584 BUG/MINOR: ist: allocate nul byte on istdup
istdup() is documented as having the same behavior as strdup(). However,
it may cause confusion as it allocates a block of input length, without
an extra byte for \0 delimiter. This behavior is incoherent as in case
of an empty string however a single \0 is allocated.

This API inconsistency could cause a bug anywhere an IST is used as a
C-string after istdup() invocation. Currently, the only found issue is
with 'wait' CLI command using 'srv-unused'. This causes a buffer
overflow due to ist0() invocation after istdup() for be_name and
sv_name.

Backport should be done to all stable releases. Even if no bug has been
found outside of wait CLI implementation, it ensures the code is more
consistent on every releases.
2024-02-22 18:24:35 +01:00
Aurelien DARRAGON
1c2e16ba8a MINOR: log: add free_logformat_node() helper function
Function may be used to free a single logformat node.
2024-02-22 15:32:42 +01:00
Aurelien DARRAGON
891bac673b CLEANUP: proxy/log: remove unused proxy flag
Since 3d6350e10 ("MINOR: log: Remove log-error-via-logformat option"),
PR_O_ERR_LOGFMT flag is not used anymore, but it was left in the proxy-t.h
header file. Simply removing it and adding a comment to indicate that the
corresponding bit is now unused.
2024-02-22 15:32:42 +01:00
Amaury Denoyelle
8b950f40fa MINOR: quic: only use sendmsg() syscall variant
This patch is the direct followup of the previous one :
  MINOR: quic: remove sendto() usage variant

This finalizes qc_snd_buf() simplification by removing send() syscall
usage for quic-conn owned socket. Syscall invocation is merged in a
single code location to the sendmsg() variant.

The only difference for owned socket is that destination address for
sendmsg() is set to NULL. This usage is documented in man 2 sendmsg as
valid for connected sockets. This allows maximum performance by avoiding
unnecessary lookups on kernel socket address tables.

As the previous patch, no functional change should happen here. However,
it will be simpler to extend qc_snd_buf() for GSO usage.
2024-02-20 16:42:05 +01:00
Aurelien DARRAGON
1448478d62 MINOR: log: explicit typecasting for logformat nodes
Add the ability to manually specify desired output type after a custom
field name for logformat nodes. Forcing the type can be useful to ensure
value is stored with the proper type representation. (i.e.: forcing
numerical to string to work around the limited resolution of JS number
types)

By default, type is set to SMP_T_SAME, which means the original type will
be preserved.

Currently supported types are: bool, str, sint
2024-02-20 15:49:54 +01:00
Aurelien DARRAGON
0cfcc64b79 MINOR: sample: add type_to_smp() helper function
type_to_smp(type) does the reverse operation of smp_to_type[smp]: it takes
a type name as input string and tries to return the corresponding SMP_T_*
smp type or SMP_TYPES if not found.
2024-02-20 15:18:39 +01:00
Aurelien DARRAGON
2ed6068f2a MINOR: log: custom name for logformat node
Add the ability to specify custom name (will be used for representation
in verbose output types such as json) to logformat nodes.

For now, a custom name should be composed by characters [a-zA-Z0-9-_]*
2024-02-20 15:18:39 +01:00
Christopher Faulet
22d4b0e901 MINOR: stconn: Add SE flag to announce zero-copy forwarding on consumer side
The SE_FL_MAY_FASTFWD_CONS is added and it will be used by endpoints to
announce their support for the zero-copy forwarding on the consumer
side. The flag is not necessarily permanent. However, it will be used this
way for now.
2024-02-14 15:09:14 +01:00
Christopher Faulet
7598c0ba69 MINOR: stconn: Rename SE_FL_MAY_FASTFWD and reorder bitfield
To fix a bug, a flag to announce the capabitlity to support the zero-copy
forwarding on the consumer side will be added on the SE descriptor. So the
old flag SE_FL_MAY_FASTFWD is renamed to indicate it concerns the producer
side. It is now SE_FL_MAY_FASTFWD_PROD. And to prepare addition of the new
flag, the bitfield is a bit reordered.
2024-02-14 15:00:32 +01:00
Christopher Faulet
8cfc11f461 CLEANUP: stconn: Move SE flags set by app layer at the end of the bitfield
To fix a bug, some SE flags must be added or renamed. To avoid mixing flags
set by the endpoint and flags set by the app, the second set of flags are
moved at the end of the bitfield, leaving the holes on the middle.
2024-02-14 14:52:25 +01:00
Christopher Faulet
40d98176ba BUG/MEDIUM: stconn: Don't check pending shutdown to wake an applet up
This revert of commit 0b93ff8c87 ("BUG/MEDIUM: stconn: Wake applets on
sending path if there is a pending shutdown") and 9e394d34e0 ("BUG/MINOR:
stconn: Don't report blocked sends during connection establishment") because
it was not the right fixes.

We must not wake an applet up when a shutdown is pending because it means
output some data are still blocked in the channel buffer. The applet does
not necessarily consume these data. In this case, the applet may be woken up
infinitly, except if it explicitly reports it wont consume datay yet.

This patch must be backported as far as 2.8. For older versions, as far as
2.2, it may be backported. If so, a previous fix must be pushed to prevent
an HTTP applet to be stuck. In http_ana.c, in http_end_request() and
http_end_reponse(), the call to channel_htx_truncate() on the request
channel in case of MSG_ERROR must be replace by a call to
channel_htx_erase().
2024-02-14 14:22:36 +01:00
Christopher Faulet
2c672f282d BUG/MEDIUM: stconn: Allow expiration update when READ/WRITE event is pending
When a READ or a WRITE activity is reported on a channel, the corresponding
date is updated. the last-read-activity date (lra) is updated and the
first-send-block date (fsb) is reset. The event is also reported at the
channel level by setting CF_READ_EVENT or CF_WRITE_EVENT flags. When one of
these flags is set, this prevent the update of the stream's task expiration
date from sc_notify(). It also prevents corresponding timeout to be reported
from process_stream().

But it is a problem during fast-forwarding stage if no expiration date was
set by the stream. Only process_stream() resets these flags. So a first READ
or WRITE event will prevent any stream's expiration date update till a new
call to process_stream(). But with no expiration date, this will only happen
on shutdown/abort event, blocking the stream for a while.

It is for instance possible to block the stats applet or the cli applet if a
client does not consume the response. The stream may be blocked, the client
timeout is not respected and the stream can only be closed on a client
abort.

So now, we update the stream's expiration date, regardless of reported
READ/WRITE events. It is not a big deal because lra and fsb date are
properly updated. It also means an old READ/WRITE event will no prevent the
stream to report a timeout and it is expected too.

This patch must be backported as far as 2.8. On older versions, timeouts and
stream's expiration date are not updated in the same way and this works as
expected.
2024-02-14 14:22:36 +01:00
Christopher Faulet
4a78f766ff MEDIUM: applet: Add notion of shutdown for write for applets
In fact there is already flags on the SE to state a shutdown for reads or
writes was performed. But for applets, this notion does not exist. Both
flags are set in same time when the applet is released. But at the SC level,
there are functions to perform a shutdown (formely the shutw) and an abort
(formely the shutr). For applets, when a shutdown is performed on the SC, if
the applet is not immediately released, nothing is acknowledge at the SE
level.

With old way to implement applets, this was not an real issue until recently
because applets accessed to the channel/SC flags. It was thus possible to
catch the shutdowns. But the "wait" command on the CLI reveals the
flaw. Indeed, when this command is executed, nothing is read or sent. So, it
is not possible to detect the shutdowns. As a workaround, a dedicated test
on the SC flags was added at the end of the wait command I/O handler. But it
is pretty ugly.

With new way to implement applets, there is no longer access to the channel
or SC. So we must add a way to acknowledge shutdown into the SE.

This patch solves the both sides of the issue. The shutw notion is added for
applets. Its only purpose is to set SE_FL_SHWN flags. This flag is tested by
all applets, so, it solves the issue quite simply.

Note that it is described as a bug fix but there is no real issue, just a
design flaw. However, if the "wait" command is backported, this patch must
be backported too. Unfortinately it will require an adaptation because there
is no appctx flags on older versions.
2024-02-14 14:22:36 +01:00
Christopher Faulet
5df45cff8f BUG/MEDIUM: stconn/applet: Block 0-copy forwarding if producer needs more room
This case does not exist yet with the H1 multiplexer, but applets may decide to
not produce data if there is not enough room in the destination buffer (the
applet's outbuf or the opposite SE buffer). It is true for the stats applets for
instance. However this case is not properly handled when the zero-copy
forwarding is in-use.

To fix the issue, the se_done_ff() function was modified to return the number of
bytes really forwarded and to subs for sends if nothing was forwarded while the
zero-copy forwarding was blocked by the producer. On the applet side, we take
care to block the zero-copy forwarding if the applet requests more room. At the
end, zero-copy forwarding is unblocked if something was forwarded.

This way, it is now possible for the stats applet to report a full buffer and
block the zero-copy forwarding, even if the buffer is not really full, by
requesting more room.

No backport needed.
2024-02-14 14:22:36 +01:00
Christopher Faulet
ece002af1d BUG/MEDIUM: applet: Add a flag to state an applet is using zero-copy forwarding
An issue was introduced when zero-copy forwarding was added to the stats and
cache applets. There is no test to be sure the upper layer is ready to use
the zero-copy forwarding. So these applets refuse to deliver the response
into the applet's output buffer if the zero-copy forwarding is supported by
the opposite endpoint. It is especially an issue when a filter, like the
compression, is in-use on the response channel.

Because of this bug, the response is not delivered and the applet is woken
up in loop to produce data.

To fix the issue, an appctx flag was added, APPCTX_FL_FASTFWD, to know when
the zero-copy forwarding is in-use. We rely on this flag to not fill the
outbuf in the applet's I/O handler.

No backport needed.
2024-02-14 14:22:36 +01:00
Frederic Lecaille
167e38e0e0 MINOR: quic: Add a counter for reordered packets
A packet is considered as reordered when it is detected as lost because its packet
number is above the largest acknowledeged packet number by at least the
packet reordering threshold value.

Add ->nb_reordered_pkt new quic_loss struct member at the same location that
the number of lost packets to count such packets.

Should be backported to 2.6.
2024-02-14 11:32:29 +01:00
Frederic Lecaille
eeeb81bb49 MINOR: quic: Dynamic packet reordering threshold
Let's say that the largest packet number acknowledged by the peer is #10, when inspecting
the non already acknowledged packets to detect if they are lost or not, this is the
case a least if the difference between this largest packet number and and their
packet numbers are bigger or equal to the packet reordering threshold as defined
by the RFC 9002. This latter must not be less than QUIC_LOSS_PACKET_THRESHOLD(3).
Which such a value, packets #7 and oldest are detected as lost if non acknowledged,
contrary to packet number #8 or #9.

So, the packet loss detection is very sensitive to such a network characteristic
where non acknowledged packets are distant from each others by their packet number
differences.

Do not use this static value anymore for the packet reordering threshold which is used
as a criteria to detect packet loss. In place, make it depend on the difference
between the number of the last transmitted packet and the number of the oldest
one among the packet which are still in flight before being inspected to be
deemed as lost.

Add new tune.quic.reorder-ratio setting to apply a ratio in percent to this
dynamic packet reorder threshold.

Should be backported to 2.6.
2024-02-14 11:32:29 +01:00
Remi Tricot-Le Breton
5e66bf26ec BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing
The CLI command "update ssl ocsp-response" was forcefully removing an
OCSP response from the update tree regardless of whether it used to be
in it beforehand or not. But since the main OCSP upate task works by
removing the entry being currently updated from the update tree and then
reinserting it when the update process is over, it meant that in the CLI
command code we were modifying a structure that was already being used.

These concurrent accesses were not properly locked on the "regular"
update case because it was assumed that once an entry was removed from
the update tree, the update task was the only one able to work on it.

Rather than locking the whole update process, an "updating" flag was
added to the certificate_ocsp in order to prevent the "update ssl
ocsp-response" command from trying to update a response already being
updated.

An easy way to reproduce this crash was to perform two "simultaneous"
calls to "update ssl ocsp-response" on the same certificate. It would
then crash on an eb64_delete call in the main ocsp update task function.

This patch can be backported up to 2.8.
2024-02-12 11:15:45 +01:00
Willy Tarreau
613e959c7b MINOR: cli/wait: add a condition to wait on a server to become unused
The "wait" command now supports a condition, "srv-unused", which waits
for the designated server to become totally unused, indicating that it
is removable. Upon each wakeup it calls srv_check_for_deletion() to
verify if conditions are met, if not if it's recoverable, or if it's
not recoverable, and proceeds according to this, never waiting for a
final decision longer than the configured delay.

The purpose is to make it possible to remove servers from the CLI after
waiting for their sessions to be terminated:

  $ socat -t5 /path/to/socket - <<< "
        disable server px/srv1
        shutdown sessions server px/srv1
        wait 2s srv-unused px/srv1
        del server px/srv1"

Or even wait for connections to terminate themselves:

  $ socat -t70 /path/to/socket - <<< "
        disable server px/srv1
        wait 1m srv-unused px/srv1
        del server px/srv1"
2024-02-09 20:38:08 +01:00
Willy Tarreau
66989ff426 MINOR: cli/wait: also pass up to 4 arguments to the external conditions
Conditions will need to have context, arguments etc from the command line.
Since these will vary with time (otherwise we wouldn't wait), let's just
pass them as text (possibly pre-processed). We're starting with 4 strings
that are expected to be allocated by strdup() and are always sent to free()
upon release.
2024-02-09 20:38:08 +01:00
Willy Tarreau
2673f8be82 MINOR: cli/wait: also support an unrecoverable failure status
Since we'll support waiting for an action to succeed or permanently
fail, we need the ability to return an unrecoverable failure. Let's
add CLI_WAIT_ERR_FAIL for this. A static error message may be placed
into ctx->msg to report to the user why the failure is unrecoverable.
2024-02-09 20:38:08 +01:00
Willy Tarreau
9b680d7411 MINOR: server: split the server deletion code in two parts
We'll need to be able to verify whether or not a server may be deleted.
For now, both the verification and the action are performed in the same
function, at once under thread isolation. The goal here is to extract
the verification code into a new function that will perform these checks,
return a status between success/recoverable/non-recoverable failure, and
will also return a message for the caller.
2024-02-09 20:38:08 +01:00
Willy Tarreau
1d2255a78a MINOR: cli: add a new "wait" command to wait for a certain delay
This allows to insert delays between commands, i.e. to collect a same
set of metrics at a fixed interval. E.g:

  $ socat -t20 /path/to/socket <<< "show activity; wait 10s; show activity"

The goal will be to extend the feature to optionally support waiting on
certain conditions. For this reason the struct definitions and enums were
placed into cli-t.h.
2024-02-08 21:54:54 +01:00
Willy Tarreau
8581d62daf MINOR: session: add the necessary functions to update the per-session glitches
This provides a new function session_add_glitch_ctr() that will update
the glitch counter and rate for the session, if tracked at all.
2024-02-08 15:51:49 +01:00
Willy Tarreau
c9c6b683fb MEDIUM: stick-tables: add a new stored type for glitch_cnt and glitch_rate
This adds a new pair of stored types in the stick-tables:
  - glitch_cnt
  - glitch_rate

These keep count of the number of glitches reported on a front connection,
in order to decide how to act with a badly defective client or a potential
attacker. For now nothing updates these counters, but all the infrastructure
needed to configure, update and retrieve them was added, including the doc.

No regtest was added yet since they're not filled yet.
2024-02-08 15:51:49 +01:00
Remi Tricot-Le Breton
befebf8b51 BUG/MEDIUM: ocsp: Separate refcount per instance and per store
With the current way OCSP responses are stored, a single OCSP response
is stored (in a certificate_ocsp structure) when it is loaded during a
certificate parsing, and each ckch_inst that references it increments
its refcount. The reference to the certificate_ocsp is actually kept in
the SSL_CTX linked to each ckch_inst, in an ex_data entry that gets
freed when he context is freed.
One of the downside of this implementation is that is every ckch_inst
referencing a certificate_ocsp gets detroyed, then the OCSP response is
removed from the system. So if we were to remove all crt-list lines
containing a given certificate (that has an OCSP response), the response
would be destroyed even if the certificate remains in the system (as an
unused certificate). In such a case, we would want the OCSP response not
to be "usable", since it is not used by any ckch_inst, but still remain
in the OCSP response tree so that if the certificate gets reused (via an
"add ssl crt-list" command for instance), its OCSP response is still
known as well. But we would also like such an entry not to be updated
automatically anymore once no instance uses it. An easy way to do it
could have been to keep a reference to the certificate_ocsp structure in
the ckch_store as well, on top of all the ones in the ckch_instances,
and to remove the ocsp response from the update tree once the refcount
falls to 1, but it would not work because of the way the ocsp response
tree keys are calculated. They are decorrelated from the ckch_store and
are the actual OCSP_CERTIDs, which is a combination of the issuer's name
hash and key hash, and the certificate's serial number. So two copies of
the same certificate but with different names would still point to the
same ocsp response tree entry.

The solution that answers to all the needs expressed aboved is actually
to have two reference counters in the certificate_ocsp structure, one
for the actual ckch instances and one for the ckch stores. If the
instance refcount becomes 0 then we remove the entry from the auto
update tree, and if the store reference becomes 0 we can then remove the
OCSP response from the tree. This would allow to chain some "del ssl
crt-list" and "add ssl crt-list" CLI commands without losing any
functionality.

Must be backported to 2.8.
2024-02-07 17:10:05 +01:00
Remi Tricot-Le Breton
28e78a0a74 MINOR: ssl: Use OCSP_CERTID instead of ckch_store in ckch_store_build_certid
The only useful information taken out of the ckch_store in order to copy
an OCSP certid into a buffer (later used as a key for entries in the
OCSP response tree) is the ocsp_certid field of the ckch_data structure.
We then don't need to pass a pointer to the full ckch_store to
ckch_store_build_certid or even any information related to the store
itself.
The ckch_store_build_certid is then converted into a helper function
that simply takes an OCSP_CERTID and converts it into a char buffer.
2024-02-07 17:09:39 +01:00
Christopher Faulet
d7467cd495 MINOR: applet: Identify applets using their own buffers via a flag
These applets can now be identified by testing APPCTX_FL_INOUT_BUFS
flag. This will be useful between the kind of applets in helper functions.
2024-02-07 15:05:05 +01:00
Christopher Faulet
a9301c96f1 MINOR: applet: Use an option to disable zero-copy forwarding for all applets
At the beginning of the 3.0-dev cycle, the zero-copy forwarding support was
added only for the cache applet with an option to disable it. This was a
hack, waiting for a better integration with applets. It is now possible to
implement the zero-copy forwarding for any applets. So the specific option
for the cache applet was renamed to be used for all applets. And this option
is now also checked for the stats applet.

Concretely, 'tune.cache.zero-copy-forwarding' was renamed to
'tune.applet.zero-copy-forwarding'.
2024-02-07 15:05:01 +01:00
Christopher Faulet
ee53d8421f MEDIUM: applet: Simplify a bit API to exchange data with applets
Default .rcv_buf and .snd_buf functions that applets can use are now
specialized to manipulate raw buffers or HTX buffers.

Thus a TCP applet should use appctx_raw_rcv_buf() and appctx_raw_snd_buf()
while HTTP applet should use appctx_htx_rcv_buf() and appctx_htx_snd_buf().

Note that the appctx is now directly passed to these functions instead of
the SC.
2024-02-07 15:04:52 +01:00
Christopher Faulet
868205943c MAJOR: stats: Send stats dump over HTTP using zero-copy forwarding
Just like for the cache applet, it is now possible to send response to the
opposite side using the zero-copy forwarding. Internal functions were
slightly updated but there is nothing special to say. Except the requested
size during the nego stage is not exact.
2024-02-07 15:04:48 +01:00
Christopher Faulet
1c18d32a0d MEDIUM: stconn: Nofify requested size during zero-copy forwarding nego is exact
It is now possible to use a flag during zero-copy forwarding negotiation to
specify the requested size is exact, it means the producer really expect to
receive at least this amount of data.

It can be used by consumer to prepare some processing at this stage, based
on the requested size. For instance, in the H1 mux, it is used to write the
next chunk size.
2024-02-07 15:04:38 +01:00
Christopher Faulet
2297f52734 MINOR: stconn: Add support for flags during zero-copy forwarding negotiation
During zero-copy forwarding negotiation, a pseudo flag was already used to
notify the consummer if the producer is able to use kernel splicing or not. But
this was not extensible. So, now we use a true bitfield to be able to pass flags
during the negotiation. NEGO_FF_FL_* flags may be used now.

Of course, for now, there is only one flags, the kernel splicing support on
producer side (NEGO_FF_FL_MAY_SPLICE).
2024-02-07 15:04:29 +01:00
Christopher Faulet
39b6f5b04c MEDIUM: applet: Add support for zero-copy forwarding from an applet
Thanks to this patch, it is possible to an applet to directly send data to
the opposite endpoint. To do so, it must implement <fastfwd> appctx callback
function and set SE_FL_MAY_FASTFWD flag.

Everything will be handled by appctx_fastfwd() function. The applet is only
responsible to transfer data. If it sets <to_forward> value, it is used to
limit the amount of data to forward.
2024-02-07 15:04:01 +01:00
Christopher Faulet
62a81cb6a6 MINOR: applet: Add callback function to deal with zero-copy forwarding
This patch introduces the support for the callback function responsible to
produce data via the zero-copy forwarding mechanism. There is no
implementation for now. But <to_forward> field was added in the appctx
structure to let an applet inform how much data it want to forward. It is
not mandatory but it will be used during the zero-copy forwarding
negociation.
2024-02-07 15:03:57 +01:00
Christopher Faulet
cc7b141e1c MINOR: applet: Add an appctx flag to report shutdown to applets
There is no shutdown for reads and send with applets. Both are performed
when the appctx is released. So instead of 2 flags, like for
muxes/connections, only one flag is used. But the idea is the same:
acknowledge the event at the applet level.
2024-02-07 15:03:50 +01:00
Christopher Faulet
14bd091fd7 MINOR: applet: Remove appctx state field to only used the flags
The appctx state was never really used as a state. It is only used to know
when an applet should be freed on the next wakeup. This can be converted to
a flag and the state can be removed. This is what this patch does.
2024-02-07 15:03:46 +01:00
Christopher Faulet
4434b03358 MINIOR: applet: Add flags to deal with ends of input, ends of stream and errors
Dedicated appctx flags to report EOI, EOS and errors (pending or terminal) were
added with the functions to set these flags. It is pretty similar to what it
done on most of muxes.
2024-02-07 15:03:42 +01:00
Christopher Faulet
e8655546b7 MINOR: applet: Add flags on the appctx and stop abusing its state
Till now, we've extended the appctx state to add some flags. However, the
field name is misleading. So a bitfield was added to handle real flags. And
helper functions to manipulate this bitfield were added.
2024-02-07 15:03:34 +01:00
Christopher Faulet
4ad8192ce4 MEDIM: applet: Add the applet handler based on IN/OUT buffers
A dedicated function to run applets was introduced, in addition to the old
one, to deal with applets that use their own buffers. The main differnce
here is that this handler does not use channels at all. It performs a
synchronous send before calling the applet and performs a synchronous
receive just after.

No applets are plugged on this handler for now.
2024-02-07 15:03:26 +01:00
Christopher Faulet
f81b704d01 MEDIUM: stconn: Add functions to handle applets I/O from the SC layer
There is no tasklet to handle I/O subscriptions for applets, but functions
to deal with receives and sends from the SC layer were added. it meanse a
function to retrieve data from an applet with this synchronous version and a
function to push data to an applet wit this synchronous version.

It is pretty similar to the functions used for muxes but there are some
differences. So for now, we keep them separated.

Zero-copy forwarding is not supported for now. In addition, there is no
subscription mechanism.
2024-02-07 15:03:23 +01:00
Christopher Faulet
525ec12305 MINOR: applet: Implement default functions to exchange data with channels
In this patch, we add default functions to copy data from a channel to the
<inbuf> buffer of an applet (appctx_rcv_buf) and another on to copy data
from <outbuf> buffer of an applet to a channel (appctx_snd_buf).

These functions are not used for now, but they will be used by applets to
define their <rcv_buf> and <snd_buf> callback functions. Of course, it will
be possible for a specific applet to implement its own functions but these
ones should be good enough for most of applets. HTX and RAW buffers are
supported.
2024-02-07 15:03:18 +01:00
Christopher Faulet
361b81bfca MINOR: applet: Add support for callback functions to exchange data with channels
For now, it is not usable, but this patch introduce the support of callback
functions, in the applet structure, to exchange data between channels and
applets. It is pretty similar to callback functions defined by muxes.
2024-02-07 15:03:14 +01:00
Christopher Faulet
ab9d2c6ca8 MINOR: applet: Add dedicated IN/OUT buffers for appctx
It is the first patch of a series aimed to align applets on connections.
Here, dedicated buffers are added for applets. For now, buffers are
initialized and helpers function to deal with allocation are added. In
addition, flags to report allocation failures or full buffers are also
introduced. <inbuf> will be used to push data to the applet from the stream
and <outbuf> will be used to push data from the applet to the stream.
2024-02-07 15:03:01 +01:00
Christopher Faulet
0dd7ff0d67 MINOR: stconn: Be able to detect applets using HTX
IS_HXT_SC() macro is only usable if the stream-connector is attached to a
connection. It is a bit restrictive because this cannot work if the SC is
attached to an applet. So let's fix that be adding the support of applets
too.
2024-02-07 15:02:19 +01:00
Christopher Faulet
6734e56514 MINOR: task: Move wait_event in the task header file
wait_event structure was in connection header file because it is only used
by connections and muxes. But, this may change. For instance applets may be
good candidates to use it too. So, the structure is moved to the task header
file instead.
2024-02-07 15:02:13 +01:00
Willy Tarreau
25968c186a MINOR: debug: add an optional message argument to the BUG_ON() family
This commit adds support for an optional second argument to BUG_ON(),
WARN_ON(), CHECK_IF(), that can be a constant string. When such an
argument is given, it will be printed on a second line after the
existing first message that contains the condition.

This can be used to provide more human-readable explanations about
what happened, such as "too low on memory" or "memory corruption
detected" that may help a user resolve the incident by themselves.
2024-02-05 17:09:00 +01:00
Willy Tarreau
d417863828 MINOR: debug: support passing an optional message in ABORT_NOW()
The ABORT_NOW() macro is not much used since we have BUG_ON(), but
there are situations where it makes sense, typically if the program
must always die regardless od DEBUG_STRICT, or if the condition must
always be evaluated (e.g. decompress something and check it).

It's not convenient not to have any hint about what happened there. But
providing too much info also results in wiping some registers, making
the trace less exploitable, so a compromise must be found.

What this patch does is to provide the support for an optional argument
to ABORT_NOW(). When an argument is passed (a string), then a message
will be emitted with the file name, line number, the message and a
trailing LF, before the stack dump and the crash. It should be used
reasonably, for example in functions that have multiple calls that need
to be more easily distinguished.
2024-02-05 17:09:00 +01:00
Willy Tarreau
bc70b385fd MINOR: debug: make BUG_ON() catch build errors even without DEBUG_STRICT
As seen in previous commit 59acb27001 ("BUILD: quic: Variable name typo
inside a BUG_ON()."), it can sometimes happen that with DEBUG forced
without DEBUG_STRICT, BUG_ON() statements are ignored. Sadly, it means
that typos there are not even build-tested.

This patch makes these statements reference sizeof(cond) to make sure
the condition is parsed. This doesn't result in any code being emitted,
but makes sure the expression is correct so that an issue such as the one
above will fail to build (which was verified).

This may be backported as it can help spot failed backports as well.
2024-02-05 15:09:37 +01:00
Aurelien DARRAGON
be0165b249 BUILD: debug: remove leftover parentheses in ABORT_NOW()
Since d480b7b ("MINOR: debug: make ABORT_NOW() store the caller's line
number when using abort"), building with 'DEBUG_USE_ABORT' fails with:

  |In file included from include/haproxy/api.h:35,
  |                 from include/haproxy/activity.h:26,
  |                 from src/ev_poll.c:20:
  |include/haproxy/thread.h: In function ‘ha_set_thread’:
  |include/haproxy/bug.h:107:47: error: expected ‘;’ before ‘_with_line’
  |  107 | #define ABORT_NOW() do { DUMP_TRACE(); abort()_with_line(__LINE__); } while (0)
  |      |                                               ^~~~~~~~~~
  |include/haproxy/bug.h:129:25: note: in expansion of macro ‘ABORT_NOW’
  |  129 |                         ABORT_NOW();                                    \
  |      |                         ^~~~~~~~~
  |include/haproxy/bug.h:123:9: note: in expansion of macro ‘__BUG_ON’
  |  123 |         __BUG_ON(cond, file, line, crash, pfx, sfx)
  |      |         ^~~~~~~~
  |include/haproxy/bug.h:174:30: note: in expansion of macro ‘_BUG_ON’
  |  174 | #  define BUG_ON(cond)       _BUG_ON     (cond, __FILE__, __LINE__, 3, "FATAL: bug ",     "")
  |      |                              ^~~~~~~
  |include/haproxy/thread.h:201:17: note: in expansion of macro ‘BUG_ON’
  |  201 |                 BUG_ON(!thr->ltid_bit);
  |      |                 ^~~~~~
  |compilation terminated due to -Wfatal-errors.
  |make: *** [Makefile:1006: src/ev_poll.o] Error 1

This is because of a leftover: abort()_with_line(__LINE__);
                                    ^^
Fixing it by removing the extra parentheses after 'abort' since the
abort() call is now performed under abort_with_line() helper function.

This was raised by Ilya in GH #2440.

No backport is needed, unless the above commit gets backported.
2024-02-05 14:55:04 +01:00
Willy Tarreau
d480b7be96 MINOR: debug: make ABORT_NOW() store the caller's line number when using abort
Placing DO_NOT_FOLD() before abort() only works in -O2 but not in -Os which
continues to place only 5 calls to abort() in h3.o for call places. The
approach taken here is to replace abort() with a new function that wraps
it and stores the line number in the stack. This slightly increases the
code size (+0.1%) but when unwinding a crash, the line number remains
present now. This is a very low cost, especially if we consider that
DEBUG_USE_ABORT is almost only used by code coverage tools and occasional
debugging sessions.
2024-02-02 17:12:06 +01:00
Willy Tarreau
2bb192ba91 MINOR: debug: make sure calls to ha_crash_now() are never merged
As indicated in previous commit, we don't want calls to ha_crash_now()
to be merged, since it will make gdb return a wrong line number. This
was found to happen with gcc 4.7 and 4.8 in h3.c where 26 calls end up
as only 5 to 18 "ud2" instructions depending on optimizations. By
calling DO_NOT_FOLD() just before provoking the trap, we can reliably
avoid this folding problem. Note that this does not address the case
where abort() is used instead (DEBUG_USE_ABORT).
2024-02-02 17:12:06 +01:00
Willy Tarreau
e06e8a2390 MINOR: compiler: add a new DO_NOT_FOLD() macro to prevent code folding
Modern compilers sometimes perform function tail merging and identical
code folding, which consist in merging identical occurrences of same
code paths, generally final ones (e.g. before a return, a jump or an
unreachable statement). In the case of ABORT_NOW(), it can happen that
the compiler merges all of them into a single one in a function,
defeating the purpose of the check which initially was to figure where
the bug occurred.

Here we're creating a DO_NO_FOLD() macro which makes use of the line
number and passes it as an integer argument to an empty asm() statement.
The effect is a code position dependency which prevents the compiler
from merging the code till that point (though it may still merge the
following code). In practice it's efficient at stopping the compilers
from merging calls to ha_crash_now(), which was the initial purpose.

It may also be used to force certain optimization constructs since it
gives more control to the developer.
2024-02-02 17:12:06 +01:00
Christopher Faulet
3246f863d6 MEDIUM: stats: Be able to access a specific field into a stats module
It is now possible to selectively retrieve extra counters from stats
modules. H1, H2, QUIC and H3 fill_stats() callback functions are updated to
return a specific counter.
2024-02-01 12:00:53 +01:00
Christopher Faulet
fd366a106b MINOR: stats: Be able to access to registered stats modules from anywhere
The list of modules registered on the stats to expose extra counters is now
public. It is required to export these counters into the Prometheus
exporter.
2024-02-01 12:00:53 +01:00