Commit Graph

1129 Commits

Author SHA1 Message Date
Valentine Krasnobaeva
4a9e3e102e BUG/MINOR: haproxy: only tid 0 must not sleep if got signal
This patch fixes the commit eea152ee68
("BUG/MINOR: signals/poller: ensure wakeup from signals").

There is some probability that run_poll_loop() becomes inifinite, if
TH_FL_SLEEPING is withdrawn from all threads in the second signal_queue_len
check, when a signal has received just after the first one.

In such particular case, the 'wake' variable, which is used to terminate
thread's poll loop is never reset to 0. So, we never enter to the "stopping"
part of the run_poll_loop() and threads, except the one with id 0 (tid 0
handles signals), will continue to call _do_poll() eternally and will never
sleep, as its TH_FL_SLEEPING flag was unset.

This flag needs to be removed only for the tid 0, as it was done in the first
signal_queue_len check.

This fixes an issue #2537 "infinite loop when shutting down".

This fix must be backported in every stable version.
2024-05-06 18:39:08 +02:00
Ilia Shipitsin
a65c6d3574 CLEANUP: assorted typo fixes in the code and comments
This is 42nd iteration of typo fixes
2024-05-03 09:01:36 +02:00
Valentine Krasnobaeva
5cbb278fae MINOR: capabilities: add cap_sys_admin support
If 'namespace' keyword is used in the backend server settings or/and in the
bind string, it means that haproxy process will call setns() to change its
default namespace to the configured one and then, it will create a
socket in this new namespace. setns() syscall requires CAP_SYS_ADMIN
capability in the process Effective set (see man 2 setns). Otherwise, the
process must be run as root.

To avoid to run haproxy as root, let's add cap_sys_admin capability in the
same way as we already added the support for some other network capabilities.

As CAP_SYS_ADMIN belongs to CAP_SYS_* capabilities type, let's add a separate
flag LSTCHK_SYSADM for it. This flag is set, if the 'namespace' keyword was
found during configuration parsing. The flag may be unset only in
prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which
inspect process EUID/RUID and Effective and Permitted capabilities sets.

If system doesn't support Linux capabilities or 'cap_sys_admin' was not set
in 'setcap', but 'namespace' keyword is presented in the configuration, we
keep the previous strict behaviour. Process, that has changed uid to the
non-priviledged user, will terminate with alert. This alert invites the user
to recheck its configuration.

In the case, when haproxy will start and run under a non-root user and
'cap_sys_admin' is not set, but 'namespace' keyword is presented, this patch
does not change previous behaviour as well. We'll still let the user to try
its configuration, but we inform via warning, that unexpected things, like
socket creation errors, may occur.
2024-04-30 21:40:17 +02:00
William Lallemand
2ab42dddc4 BUG/MINOR: mworker: reintroduce way to disable seamless reload with -x /dev/null
Since the introduction of the automatic seamless reload using the
internal socketpair, there is no way of disabling the seamless reload.

Previously we just needed to remove -x from the startup command line,
and remove any "expose-fd" keyword on stats socket lines.

This was introduced in 2be557f7c ("MEDIUM: mworker: seamless reload use
the internal sockpairs").

The patch copy /dev/null again and pass it to the next exec so we never
try to get socket from the -x.

Must be backported as far as 2.6.
2024-04-26 15:25:49 +02:00
Amaury Denoyelle
34ae7755b3 MINOR: stats: apply stats-file on process startup
This commit is the first one of a serie to implement preloading of
haproxy counters via stats-file parsing.

This patch defines a basic apply_stats_file() function. It implements
reading line by line of a stats-file without any parsing for the moment.
It is called automatically on process startup via init().
2024-04-26 11:29:25 +02:00
Amaury Denoyelle
c02ec9a9db BUG/MINOR: backend: use cum_sess counters instead of cum_conn
This commit is part of a serie to align counters usage between
frontends/listeners on one side and backends/servers on the other.

"stot" metric refers to the total number of sessions. On backend side,
it is interpreted as a number of streams. Previously, this was accounted
using <cum_sess> be_counters field for servers, but <cum_conn> instead
for backend proxies.

Adjust this by using <cum_sess> for both proxies and servers. As such,
<cum_conn> field can be removed from be_counters.

Note that several diagnostic messages which reports total frontend and
backend connections were adjusted to use <cum_sess>. However, this is an
outdated and misleading information as it does reports streams count on
backend side. These messages should be fixed in a separate commit.

This should be backported to all stable releases.
2024-04-22 10:35:18 +02:00
Valentine Krasnobaeva
865db6307f MINOR: init: use RLIMIT_DATA instead of RLIMIT_AS
Limiting total allocatable process memory (VSZ) via setting RLIMIT_AS limit is
no longer effective, in order to restrict memory consumption at run time.
We can see from process memory map below, that there are many holes within
the process VA space, which bumps its VSZ to 1.5G. These holes are here by
many reasons and could be explaned at first by the full randomization of
system VA space. Now it is usually enabled in Linux kernels by default. There
are always gaps around the process stack area to trap overflows. Holes before
and after shared libraries could be explained by the fact, that on many
architectures libraries have a 'preferred' address to be loaded at; putting
them elsewhere requires relocation work, and probably some unshared pages.
Repetitive holes of 65380K are most probably correspond to the header that
malloc has to allocate before asked a claimed memory block. This header is
used by malloc to link allocated chunks together and for its internal book
keeping.

	$ sudo pmap -x -p `pidof haproxy`
	127136:   ./haproxy -f /home/haproxy/haproxy/haproxy_h2.cfg
	Address           Kbytes     RSS   Dirty Mode  Mapping
	0000555555554000     388      64       0 r---- /home/haproxy/haproxy/haproxy
	00005555555b5000    2608    1216       0 r-x-- /home/haproxy/haproxy/haproxy
	0000555555841000     916      64       0 r---- /home/haproxy/haproxy/haproxy
	0000555555926000      60      60      60 r---- /home/haproxy/haproxy/haproxy
	0000555555935000     116     116     116 rw--- /home/haproxy/haproxy/haproxy
	0000555555952000    7872    5236    5236 rw---   [ anon ]
	00007fff98000000     156      36      36 rw---   [ anon ]
	00007fff98027000   65380       0       0 -----   [ anon ]
	00007fffa0000000     156      36      36 rw---   [ anon ]
	00007fffa0027000   65380       0       0 -----   [ anon ]
	00007fffa4000000     156      36      36 rw---   [ anon ]
	00007fffa4027000   65380       0       0 -----   [ anon ]
	00007fffa8000000     156      36      36 rw---   [ anon ]
	00007fffa8027000   65380       0       0 -----   [ anon ]
	00007fffac000000     156      36      36 rw---   [ anon ]
	00007fffac027000   65380       0       0 -----   [ anon ]
	00007fffb0000000     156      36      36 rw---   [ anon ]
	00007fffb0027000   65380       0       0 -----   [ anon ]
	...
	00007ffff7fce000       4       4       0 r-x--   [ anon ]
	00007ffff7fcf000       4       4       0 r---- /usr/lib/x86_64-linux-gnu/ld-2.31.so
	00007ffff7fd0000     140     140       0 r-x-- /usr/lib/x86_64-linux-gnu/ld-2.31.so
	...
	00007ffff7ffe000       4       4       4 rw---   [ anon ]
	00007ffffffde000     132      20      20 rw---   [ stack ]
	ffffffffff600000       4       0       0 --x--   [ anon ]
	---------------- ------- ------- -------
	total kB         1499288   75504   72760

This exceeded VSZ makes impossible to start an haproxy process with 200M
memory limit, set at its initialization stage as RLIMIT_AS. We usually
have in this case such cryptic output at stderr:

	$ haproxy -m 200 -f haproxy_quic.cfg
        (null)(null)(null)(null)(null)(null)

At the same time the process RSS (a memory really used) is only 75,5M.
So to make process memory accounting more realistic let's base the memory
limit, set by -m option, on RSS measurement and let's use RLIMIT_DATA instead
of RLIMIT_AS.

RLIMIT_AS was used before, because earlier versions of haproxy always allocate
memory buffers for new connections, but data were not written there
immediately. So these buffers were not instantly counted in RSS, but were
always counted in VSZ. Now we allocate new buffers only in the case, when we
will write there some data immediately, so using RLIMIT_DATA becomes more
appropriate.
2024-04-19 17:36:40 +02:00
Ilya Shipitsin
ab7f05daba CLEANUP: assorted typo fixes in the code and comments
This is 41st iteration of typo fixes
2024-04-17 11:14:44 +02:00
Willy Tarreau
018443b8a1 BUILD: makefile: get rid of the CPU variable
The CPU variable, when used, is almost always exclusively used with
"generic" to disable any CPU-specific optimizations, or "native" to
enable "-march=native". Other options are not used and are just making
CPU_CFLAGS more confusing.

This commit just drops all pre-configured variants and replaces them
with documentation about examples of supported options. CPU_CFLAGS is
preserved as it appears that it's mostly used as a proxy to inject the
distro's CFLAGS, and it's just empty by default.

The CPU variable is checked, and if set to anything but "generic", it
emits a warning about its deprecation and invites the user to read
INSTALL.

Users who would just set CPU_CFLAGS will be able to continue to do so,
those who were using CPU=native will have to pass CPU_CFLAGS=-march=native
and those who were passing one of the other options will find it in the
doc as well.

Note that this also removes the "CPU=" line from haproxy -vv, that most
users got used to seeing set to "generic" or occasionally "native"
anyway, thus that didn't provide any useful information.
2024-04-11 17:33:28 +02:00
Valentine Krasnobaeva
f0b6436f57 MEDIUM: capabilities: check process capabilities sets
Since the Linux capabilities support add-on (see the commit bd84387beb
("MEDIUM: capabilities: enable support for Linux capabilities")), we can also
check haproxy process effective and permitted capabilities sets, when it
starts and runs as non-root.

Like this, if needed network capabilities are presented only in the process
permitted set, we can get this information with capget and put them in the
process effective set via capset. To do this properly, let's introduce
prepare_caps_from_permitted_set().

First, it checks if binary effective set has CAP_NET_ADMIN or CAP_NET_RAW. If
there is a match, LSTCHK_NETADM is removed from global.last_checks list to
avoid warning, because in the initialization sequence some last configuration
checks are based on LSTCHK_NETADM flag and haproxy process euid may stay
unpriviledged.

If there are no CAP_NET_ADMIN and CAP_NET_RAW in the effective set, permitted
set will be checked and only capabilities given in 'setcap' keyword will be
promoted in the process effective set. LSTCHK_NETADM will be also removed in
this case by the same reason. In order to be transparent, we promote from
permitted set only capabilities given by user in 'setcap' keyword. So, if
caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW, LSTCHK_NETADM would not
be unset and warning about missing priviledges will be emitted at
initialization.

Need to call it before protocol_bind_all() to allow binding to priviledged
ports under non-root and 'setcap cap_net_bind_service' must be set in the
global section in this case.
2024-04-05 18:01:54 +02:00
Valentine Krasnobaeva
e4306fb822 BUG/MINOR: init: relax LSTCHK_NETADM checks for non root
Linux capabilities support and ability to preserve it for running process
after switching to a global.uid was added recently by the commit bd84387beb
("MEDIUM: capabilities: enable support for Linux capabilities")).
This new feature hasn't yet been taken into account by last config checks,
which are performed at initialization stage.

So, to update it, let's perform it after set_identity() call. Like this,
current EUID is already changed to a global.uid and prepare_caps_for_setuid()
would unset LSTCHK_NETADM flag, only if capabilities given in the 'setcap'
keyword in the configuration file were preserved.

Otherwise, if system doesn't support Linux capabilities or they were not set
via 'setcap', we keep the previous strict behaviour: process will terminate
with an alert, in order to insist that user: either needs to change
run UID (worst case: start and run as root), or he needs to set/recheck
capabilities listed as 'setcap' arguments.

In the case, when haproxy will start and run under a non-root user this patch
doesn't change the previous behaviour: we'll still let him try the
configuration, but we inform via warning that unexpected things may occur.

Need to be backported until v2.9, including v2.9.
2024-04-05 18:01:54 +02:00
Aurelien DARRAGON
56d8074798 MINOR: proxy: add PR_FL_CHECKED flag
PR_FL_CHECKED is set on proxy once the proxy configuration was fully
checked (including postparsing checks).

This information may be useful to functions that need to know if some
config-related proxy properties are likely to change or not due to parsing
or postparsing/check logics. Also, during runtime, except for some rare cases
config-related proxy properties are not supposed to be changed.
2024-04-04 19:10:01 +02:00
Tim Duesterhus
ad54273cf9 MINOR: systemd: Include MONOTONIC_USEC field in RELOADING=1 message
As per the `sd_notify` manual:

> A field carrying the monotonic timestamp (as per CLOCK_MONOTONIC) formatted
> in decimal in μs, when the notification message was generated by the client.
> This is typically used in combination with "RELOADING=1", to allow the
> service manager to properly synchronize reload cycles. See systemd.service(5)
> for details, specifically "Type=notify-reload".

Thus this change allows users with a recent systemd to switch to
`Type=notify-reload`, should they desire to do so. Correct behavior was
verified with a Fedora 39 VM.

see systemd/systemd#25916

[wla: the service file should be updated this way:]

    diff --git a/admin/systemd/haproxy.service.in b/admin/systemd/haproxy.service.in
    index 22a53d8aab..8c6dadb5e5 100644
    --- a/admin/systemd/haproxy.service.in
    +++ b/admin/systemd/haproxy.service.in
    @@ -8,12 +8,11 @@ EnvironmentFile=-/etc/default/haproxy
     EnvironmentFile=-/etc/sysconfig/haproxy
     Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" "EXTRAOPTS=-S /run/haproxy-master.sock"
     ExecStart=@SBINDIR@/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS
    -ExecReload=@SBINDIR@/haproxy -Ws -f $CONFIG -c $EXTRAOPTS
    -ExecReload=/bin/kill -USR2 $MAINPID
     KillMode=mixed
     Restart=always
     SuccessExitStatus=143
    -Type=notify
    +Type=notify-reload
    +ReloadSignal=SIGUSR2

     # The following lines leverage SystemD's sandboxing options to provide
     # defense in depth protection at the expense of restricting some flexibility

Signed-off-by: William Lallemand <wlallemand@haproxy.com>
2024-04-04 15:58:29 +02:00
Willy Tarreau
f821a3983e BUILD: systemd: fix build error on non-systemd systems with USE_SYSTEMD=1
Thanks to previous commit, we can now build with USE_SYSTEMD=1 on any
system without requiring any parts from systemd. It just turns our that
there was one remaining include in haproxy.c that needed to be replaced
with haproxy/systemd.h to build correctly. That's what this commit does.
2024-04-03 17:34:36 +02:00
Willy Tarreau
6c1b29d06f MINOR: ring: make the number of queues configurable
Now the rings have one wait queue per group. This should limit the
contention on systems such as EPYC CPUs where the performance drops
dramatically when using more than one CCX.

Tests were run with different numbers and it was showed that value
6 outperforms all other ones at 12, 24, 48, 64 and 80 threads on an
EPYC, a Xeon and an Ampere CPU. Value 7 sometimes comes close and
anything around these values degrades quickly. The value has been
left tunable in the global section.

This commit only introduces everything needed to set up the queue count
so that it's easier to adjust it in the forthcoming patches, but it was
initially added after the series, making it harder to compare.

It was also shown that trying to group the threads in queues by their
thread groups is counter-productive and that it was more efficient to
do that by applying a modulo on the thread number. As surprising as it
seems, it does have the benefit of well balancing any number of threads.
2024-03-25 17:34:19 +00:00
Christopher Faulet
189f74d4ff MINOR: cfgparse: Add a global option to expose deprecated directives
Similarly to "expose-exprimental-directives" option, there is no a global
option to expose some deprecated directives. Idea is to have a way to silent
warnings about deprecated directives when there is no alternative solution.

Of course, deprecated directives covered by this option are not listed and
may change. It is only a best effort to let users upgrade smoothly.
2024-03-15 11:31:48 +01:00
William Lallemand
70be894e41 MINOR: debug: enable insecure fork on the command line
-dI allow to enable "insure-fork-wanted" directly from the command line,
which is useful when you want to run ASAN with addr2line with a lot of
configuration files without editing them.
2024-03-13 11:23:14 +01:00
Frederic Lecaille
eeeb81bb49 MINOR: quic: Dynamic packet reordering threshold
Let's say that the largest packet number acknowledged by the peer is #10, when inspecting
the non already acknowledged packets to detect if they are lost or not, this is the
case a least if the difference between this largest packet number and and their
packet numbers are bigger or equal to the packet reordering threshold as defined
by the RFC 9002. This latter must not be less than QUIC_LOSS_PACKET_THRESHOLD(3).
Which such a value, packets #7 and oldest are detected as lost if non acknowledged,
contrary to packet number #8 or #9.

So, the packet loss detection is very sensitive to such a network characteristic
where non acknowledged packets are distant from each others by their packet number
differences.

Do not use this static value anymore for the packet reordering threshold which is used
as a criteria to detect packet loss. In place, make it depend on the difference
between the number of the last transmitted packet and the number of the oldest
one among the packet which are still in flight before being inspected to be
deemed as lost.

Add new tune.quic.reorder-ratio setting to apply a ratio in percent to this
dynamic packet reorder threshold.

Should be backported to 2.6.
2024-02-14 11:32:29 +01:00
Willy Tarreau
75d64c0d4c BUG/MINOR: diag: run the final diags before quitting when using -c
Final diags were added in 2.4 by commit 5a6926dcf ("MINOR: diag: create
cfgdiag module"), but it's called too late in the startup process,
because when "-c" is passed, the call is not made, while it's its primary
use case. Let's just move the call earlier.

Note that currently the check in this function is limited to verifying
unicity of server cookies in a backend, so it can be backported as far
as 2.4, but there is little value in insisting if it doesn't backport
easily.
2024-02-03 12:08:11 +01:00
Willy Tarreau
2b930aa7c3 [RELEASE] Released version 3.0-dev1
Released version 3.0-dev1 with the following main changes :
    - MINOR: channel: Use dedicated functions to deal with STREAMER flags
    - MEDIUM: applet: Handle channel's STREAMER flags on applets size
    - MINOR: applets: Use channel's field to compute amount of data received
    - MEDIUM: cache: Save body size of cached objects and track it on delivery
    - MEDIUM: cache: Add support for endp-to-endp fast-forwarding
    - MINOR: cache: Add global option to enable/disable zero-copy forwarding
    - MINOR: pattern: Use reference name as filename to read patterns from a file
    - MEDIUM: pattern: Add support for virtual and optional files for patterns
    - DOC: config: Add section about name format for maps and ACLs
    - DOC: management/lua: Update commands about map and acl
    - MINOR: promex: Add support for specialized front/back/li/srv metric names
    - MINOR: promex: Export active/backup metrics per-server
    - BUG/MINOR: ssl: Double free of OCSP Certificate ID
    - MINOR: ssl/cli: Add ha_(warning|alert) msgs to CLI ckch callback
    - BUG/MINOR: ssl: Wrong OCSP CID after modifying an SSL certficate
    - BUG/MINOR: lua: Wrong OCSP CID after modifying an SSL certficate (LUA)
    - DOC: configuration: typo req.ssl_hello_type
    - MINOR: hq-interop: add fastfwd support
    - CLEANUP: mux_quic: rename ffwd function with prefix qmux_strm_
    - MINOR: mux-quic: add traces for 0-copy/fast-forward
    - BUG/MINOR: mworker/cli: fix set severity-output support
    - CLEANUP: mworker/cli: add comments about pcli_find_and_exec_kw()
    - BUG/MEDIUM: quic: Possible buffer overflow when building TLS records
    - BUILD: ssl: update types in wolfssl cert selection callback
    - MINOR: ssl: activate the certificate selection callback for WolfSSL
    - CI: github: switch to wolfssl git-c4b77ad for new PR
    - BUG/MEDIUM: map/acl: pat_ref_{set,delete}_by_id regressions
    - BUG/MINOR: ext-check: cannot use without preserve-env
    - CLEANUP: mux-quic: remove unused prototype
    - MINOR: mux-quic: clean up qcs Rx buffer allocation API
    - MINOR: mux-quic: clean up qcs Tx buffer allocation API
    - CLEANUP: mux-quic: clean up app ops callback definitions
    - MINOR: mux-quic: factorize QC_SF_UNKNOWN_PL_LENGTH set
    - MINOR: h3: complete traces for sending
    - MINOR: h3: adjust zero-copy sending related code
    - MINOR: hq-interop: use zero-copy to transfer single HTX data block
    - BUG/MEDIUM: quic: QUIC CID removed from tree without locking
    - BUG/MEDIUM: stconn: Block zero-copy forwarding if EOS/ERROR on consumer side
    - BUG/MEDIUM: mux-h1: Cound data from input buf during zero-copy forwarding
    - BUG/MEDIUM: mux-h1: Explicitly skip request's C-L header if not set originally
    - CLEANUP: mux-h1: Fix a trace message about C-L header addition
    - BUG/MEDIUM: mux-h2: Report too large HEADERS frame only when rxbuf is empty
    - BUG/MEDIUM: mux-quic: report early error on stream
    - DOC: config: add arguments to sample fetch methods in the table
    - DOC: config: also add arguments to the converters in the table
    - BUG/MINOR: resolvers: default resolvers fails when network not configured
    - SCRIPTS: mk-patch-list: produce a list of patches
    - DEV: patchbot: add the AI-based bot to pre-select candidate patches to backport
    - BUG/MEDIUM: mux-h2: Switch pending error to error if demux buffer is empty
    - BUG/MEDIUM: mux-h2: Only Report H2C error on read error if demux buffer is empty
    - BUG/MEDIUM: mux-h2: Don't report error on SE if error is only pending on H2C
    - BUG/MEDIUM: mux-h2: Don't report error on SE for closed H2 streams
    - DOC: config: Update documentation about local haproxy response
    - DEV: patchbot: use checked buttons as reference instead of internal table
    - DEV: patchbot: allow to show/hide backported patches
    - MINOR: h3: remove quic_conn only reference
    - BUG/MINOR: server: Use the configured address family for the initial resolution
    - MINOR: mux-quic: remove qcc_shutdown() from qcc_release()
    - MINOR: mux-quic: use qcc_release in case of init failure
    - MINOR: mux-quic: adjust error code in init failure
    - MINOR: h3: add traces for connection init stage
    - BUG/MINOR: h3: properly handle alloc failure on finalize
    - MINOR: h3: use INTERNAL_ERROR code for init failure
    - BUG/MAJOR: stconn: Disable zero-copy forwarding if consumer is shut or in error
    - MINOR: stats: store the parent proxy in stats ctx (http)
    - BUG/MEDIUM: stats: unhandled switching rules with TCP frontend
    - MEDIUM: proxy: set PR_O_HTTP_UPG on implicit upgrades
    - MINOR: proxy: monitor-uri works with tcp->http upgrades
    - OPTIM: server: eb lookup for server_find_by_name()
    - OPTIM: server: ebtree lookups for findserver_unique_* functions
    - MINOR: server/event_hdl: add server_inetaddr struct to facilitate event data usage
    - MINOR: server/event_hdl: update _srv_event_hdl_prepare_inetaddr prototype
    - BUG/MINOR: server/event_hdl: propagate map port info through inetaddr event
    - MINOR: server: ensure connection cleanup on server addr changes
    - CLEANUP: server/event_hdl: remove purge_conn hint in INETADDR event
    - MEDIUM: server: merge srv_update_addr() and srv_update_addr_port() logic
    - CLEANUP: server: remove unused server_parse_addr_change_request() function
    - CLEANUP: resolvers: remove duplicate func prototype
    - MINOR: resolvers: add unique numeric id to nameservers
    - MEDIUM: server: make server_set_inetaddr() updater serializable
    - MINOR: server/event_hdl: expose updater info through INETADDR event
    - MINOR: server: add dns hint in server_inetaddr_updater struct
    - MEDIUM: server/dns: clear RMAINT when addr resolves again
    - BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS
    - BUG/MEDIUM: server/dns: perform svc_port updates atomically from SRV records
    - MEDIUM: peers: use server as stream target
    - CLEANUP: peers: remove unused sock_init_arg struct member
    - CLEANUP: peers: remove unused "proto" and "xprt" struct members
    - MINOR: peers: rely on srv->addr and remove peer->addr
    - DOC: config: add context hint for server keywords
    - MINOR: stktable: add table_process_entry helper function
    - MINOR: stktable: use {show,set,clear} table with ptr
    - MINOR: map: add map_*_key converters to provide the matching key
    - DOC: fix typo for fastfwd QUIC option
    - BUG/MINOR: mux-quic: always report error to SC on RESET_STREAM emission
    - MEDIUM: mux-quic: add BUG_ON if sending on locally closed QCS
    - BUG/MINOR: mux-quic: disable fast-fwd if connection on error
    - BUG/MINOR: quic: Wrong keylog callback setting.
    - BUG/MINOR: quic: Missing call to TLS message callbacks
    - MINOR: h3: check connection error during sending
    - BUG/MINOR: h3: close connection on header list too big
    - BUG/MINOR: h3: close connection on sending alloc errors
    - BUG/MINOR: h3: disable fast-forward on buffer alloc failure
    - Revert "MINOR: mux-quic: Disable zero-copy forwarding for send by default"
    - MINOR: stktable: stktable_data_ptr() cannot fail in table_process_entry()
    - CLEANUP: assorted typo fixes in the code and comments
    - CI: use semantic version compare for determing "latest" OpenSSL
    - CLEANUP: server: remove ambiguous check in srv_update_addr_port()
    - CLEANUP: resolvers: remove unused RSLV_UPD_OBSOLETE_IP flag
    - CLEANUP: resolvers: remove some more unused RSLV_UDP flags
    - MEDIUM: server: simplify snr_set_srv_down() to prevent confusions
    - MINOR: backend: export get_server_*() functions
    - MINOR: tcpcheck: export proxy_parse_tcpcheck()
    - MEDIUM: udp: allow to retrieve the frontend destination address
    - MINOR: global: export a way to list build options
    - MINOR: debug: add features and build options to "show dev"
    - BUG/MINOR: server: fix server_find_by_name() usage during parsing
    - REGTESTS: check attach-srv out of order declaration
    - CLEANUP: quic: Remaining useless code into server part
    - BUILD: quic: Missing quic_ssl.h header protection
    - BUG/MEDIUM: h3: fix incorrect snd_buf return value
    - MINOR: h3: do not consider missing buf room as error on trailers
    - BUG/MEDIUM: stconn: Forward shutdown on write timeout only if it is forwardable
    - BUG/MEDIUM: stconn: Set fsb date if zero-copy forwarding is blocked during nego
    - BUG/MEDIUM: spoe: Never create new spoe applet if there is no server up
    - MINOR: mux-h2: support limiting the total number of H2 streams per connection
    - CLEANUP: mux-h2: remove the printfs from previous commit on h2 streams limit.
    - DEV: h2: add the ability to emit literals in mkhdr
    - DEV: h2: add the preface as well in supported output types
    - DEV: h2: support passing raw data for a frame
    - IMPORT: ebtree: implement and use flsnz_long() to count bits
    - IMPORT: ebtree: switch the sizes and offsets to size_t and ssize_t
    - IMPORT: ebtree: rework the fls macros to better deal with arch-specific ones
    - IMPORT: ebtree: make string_equal_bits turn back to unsigned char
    - IMPORT: ebtree: use unsigned ints for flznz()
    - IMPORT: ebtree: make string_equal_bits() return an unsigned
2024-01-06 14:09:35 +01:00
Willy Tarreau
afba58f21e MINOR: global: export a way to list build options
The new function hap_get_next_build_opt() will iterate over the list of
build options. This will be used for debugging, so that the build options
can be retrieved from the CLI.
2024-01-02 11:44:42 +01:00
Amaury Denoyelle
b7274e69ef Revert "MINOR: mux-quic: Disable zero-copy forwarding for send by default"
This reverts commit 18f2ccd244.

Found issues related to QUIC fast-forward were resolved (see github
issue #2372). Reenable it by default. If any issue arises, it can be
disabled using the global statement :
  tune.quit.zero-copy-fwd-send off

This can be backported to 2.9, but only after a sensible period of
observation.
2023-12-22 16:30:37 +01:00
Christopher Faulet
18f2ccd244 MINOR: mux-quic: Disable zero-copy forwarding for send by default
There is at least an bug for now in this part and it is still unstable. Thus
it is better to disable it for now by default. It can be enable by setting
tune.quic.zero-copy-fwd-send to 'on'.
2023-12-04 15:36:02 +01:00
Christopher Faulet
7732323cf3 MINOR: global: Use a dedicated bitfield to customize zero-copy fast-forwarding
Zero-copy fast-forwading feature is a quite new and is a bit sensitive.
There is an option to disable it globally. However, all protocols have not
the same maturity. For instance, for the PT multiplexer, there is nothing
really new. The zero-copy fast-forwading is only another name for the kernel
splicing. However, for the QUIC/H3, it is pretty new, not really optimized
and it will evolved. And soon, the support will be added for the cache
applet.

In this context, it is usefull to be able to enable/disable zero-copy
fast-forwading per-protocol and applet. And when it is applicable, on sends
or receives separately. So, instead of having one flag to disable it
globally, there is now a dedicated bitfield, global.tune.no_zero_copy_fwd.
2023-12-04 15:31:47 +01:00
Amaury Denoyelle
e97489a526 MINOR: trace: support -dt optional format
Add an optional argument for "-dt". This argument is interpreted as a
list of several trace statement separated by comma. For each statement,
a specific trace name can be specifed, or none to act on all sources.
Using double-colon separator, it is possible to add specifications on
the wanted level and verbosity.
2023-11-27 17:15:14 +01:00
Amaury Denoyelle
cef29d3708 MINOR: trace: define simple -dt argument
Add '-dt' haproxy process argument. This will automatically activate all
trace sources on stderr with the error level. This could be useful to
troubleshoot issues such as protocol violations.
2023-11-27 17:10:18 +01:00
William Lallemand
3dd55fa132 MINOR: mworker/cli: implement hard-reload over the master CLI
The mworker mode never had a proper 'hard-stop' (-st) for the reload,
this is a mode which was commonly used with the daemon mode, but it was
never implemented in mworker mode.

This patch fixes the problem by implementing a "hard-reload" command
over the master CLI. It does the same as the "reload" command, but
instead of waiting for the connections to stop in the previous process,
it immediately quits the previous process after binding.
2023-11-24 21:44:25 +01:00
Willy Tarreau
45a9e4e24b MINOR: init: add info about the main program to the post_mortem struct
This way we'll still have haproxy's version, build options etc in core
dumps and centralized all at once.
2023-11-23 15:39:21 +01:00
Willy Tarreau
2268f10dd6 DEBUG: tinfo: store the pthread ID and the stack pointer in tinfo
When debugging a core, it's difficult to match a given gdb thread number
against an internal thread. Let's just store the pthread ID and the stack
pointer in each tinfo. This could help in the future by allowing to just
glance over them and pick the right one depending what info is found
first.
2023-11-23 14:32:55 +01:00
Amaury Denoyelle
decf29d06d MINOR: quic: remove unneeded QUIC specific stopping function
On CONNECTION_CLOSE reception/emission, QUIC connections enter CLOSING
state. At this stage, only CONNECTION_CLOSE can be reemitted and all
other exchanges are stopped.

Previously, on haproxy process stopping, if all QUIC connections were in
CLOSING state, they were released before their closing timer expiration
to not block the process shutdown. However, since a recent commit, the
closing timer has been shorten to a more reasonable delay. It is now
consider viable to respect connections closing state even on process
shutdown. As such, stopping specific code in QUIC connections idle timer
task was removed.

A specific function quic_handle_stopping() was implemented to notify
QUIC connections on shutdown from main() function. It should have been
deleted along the removal in QUIC idle timer task. This patch just does
this.
2023-11-20 17:59:52 +01:00
William Lallemand
ef9a195742 BUG/MINOR: startup: set GTUNE_SOCKET_TRANSFER correctly
This bug was forbidding the GTUNE_SOCKET_TRANSFER option to be set
when haproxy is neither in daemon mode nor in mworker mode. So it
basically only impacts the foreground mode.

The fix moves the code outside the 'if (global.mode & (MODE_DAEMON |
MODE_MWORKER | MODE_MWORKER_WAIT))' condition.

Bug was introduced with 7f80eb23 ("MEDIUM: proxy: zombify proxies only
when the expose-fd socket is bound").

Must be backported in every stable version.
2023-11-20 10:49:05 +01:00
Willy Tarreau
c7a90cc181 CLEANUP: haproxy: remove old comment from 1.1 from the file header
There was still a totally outdated comment speaking about issues
affecting solaris on 1.1.8pre4 (April 2002, 21 year-old)! This
proves that comments in headers are never read, so let's take this
opportunity for also removing the outdated one recommending to read
the "updated" RFC7230.
2023-11-17 18:10:16 +01:00
William Lallemand
d76fa37534 BUG/MEDIUM: mworker: set the master variable earlier
Since 2.7 and the mcli_reload_bind_conf (56f73b21a5), upon a reload
failure because of a bind error, the mcli_reload_bind_conf go through a
sock_unbind((). This is not supposed to do anything when a listener is
RX_F_INHERITED in the master, but unfortunately this is done too early
and provokes an exit of the master.

We already suspected in the past that setting the 'master' variable this
late could have negative impact.

The fix sets the master variable earlier before the bind.

This must be backported at least to 2.7. This could be backported
earlier but better wait any feedbacks on the fix.
2023-11-14 14:32:39 +01:00
William Lallemand
a06f6212c9 MEDIUM: startup: 'haproxy -c' is quiet when valid
MODE_CHECK does not output "Configuration file is valid" by default
anymore. To display this message the -V option must be used with -c.

However the warning and errors are still output by default if they
exist.

This allows to clean the output of the systemd unit file with is doing a
-c.
2023-11-13 09:59:34 +01:00
William Lallemand
3ac3a06963 MEDIUM: mworker: -W is mandatory when using -S
Defining a master CLI without the master-worker mode emits a warning
since version 1.8. This patch enforce the behavior by forbiding the
usage of the -S option without the master-worker mode.
2023-11-09 15:07:15 +01:00
Christopher Faulet
023564b685 MINOR: global: Add an option to disable the zero-copy forwarding
The zero-copy forwarding or the mux-to-mux forwarding is a way to
fast-forward data without using the channels buffers. Data are transferred
from a mux to the other one. The kernel splicing is an optimization of the
zero-copy forwarding. But it can also use normal buffers (but not channels
ones). This way, it could be possible to fast-forward data with muxes not
supporting the kernel splicing (H2 and H3 muxes) but also with applets.

However, this mode can introduce regressions or bugs in future (just like
the kernel splicing). Thus, It could be usefull to disable this optim. To do
so, in configuration, the global tune settting
'tune.disable-zero-copy-forwarding' may be set in a global section or the
'-dZ' command line parameter may be used to start HAProxy. Of course, this
also disables the kernel splicing.
2023-10-17 18:51:13 +02:00
Aurelien DARRAGON
18da35c123 MEDIUM: tree-wide: logsrv struct becomes logger
When 'log' directive was implemented, the internal representation was
named 'struct logsrv', because the 'log' directive would directly point
to the log target, which used to be a (UDP) log server exclusively at
that time, hence the name.

But things have become more complex, since today 'log' directive can point
to ring targets (implicit, or named) for example.

Indeed, a 'log' directive does no longer reference the "final" server to
which the log will be sent, but instead it describes which log API and
parameters to use for transporting the log messages to the proper log
destination.

So now the term 'logsrv' is rather confusing and prevents us from
introducing a new level of abstraction because they would be mixed
with logsrv.

So in order to better designate this 'log' directive, and make it more
generic, we chose the word 'logger' which now replaces logsrv everywhere
it was used in the code (including related comments).

This is internal rewording, so no functional change should be expected
on user-side.
2023-10-13 10:05:06 +02:00
Willy Tarreau
90fa2eaa15 MINOR: haproxy: permit to register features during boot
The regtests are using the "feature()" predicate but this one can only
rely on build-time options. It would be nice if some runtime-specific
options could be detected at boot time so that regtests could more
flexibly adapt to what is supported (capabilities, splicing, etc).

Similarly, certain features that are currently enabled with USE_XXX
could also be automatically detected at build time using ifdefs and
would simplify the configuration, but then we'd lose the feature
report in the feature list which is convenient for regtests.

This patch makes sure that haproxy -vv shows the variable's contents
and not the macro's contents, and adds a new hap_register_feature()
to allow the code to register a new keyword.
2023-10-06 11:40:02 +02:00
Willy Tarreau
5119109e3f MINOR: cpuset: dynamically allocate cpu_map
cpu_map is 8.2kB/entry and there's one such entry per group, that's
~520kB total. In addition, the init code is still in haproxy.c enclosed
in ifdefs. Let's make this a dynamically allocated array in the cpuset
code and remove that init code.

Later we may even consider reallocating it once the number of threads
and groups is known, in order to shrink it a little bit, as the typical
setup with a single group will only need 8.2kB, thus saving half a MB
of RAM. This would require that the upper bound is placed in a variable
though.
2023-09-08 16:25:19 +02:00
Willy Tarreau
5f10176e2c MEDIUM: init: initialize the trash earlier
More and more utility functions rely on the trash while most of the init
code doesn't have access to it because it's initialized very late (in
PRE_CHECK for the initial one). It's a pool, and it purposely supports
being reallocated, so let's initialize it in STG_POOL so that early
STG_INIT code can at least use it.
2023-09-08 16:25:19 +02:00
Frédéric Lécaille
292dfdd78d BUG/MINOR: quic: Wrong cluster secret initialization
The function generate_random_cluster_secret() which initializes the cluster secret
when not supplied by configuration is buggy. There 1/256 that the cluster secret
string is empty.

To fix this, one stores the cluster as a reduced size first 128 bits of its own
SHA1 (160 bits) digest, if defined by configuration. If this is not the case, it
is initialized with a 128 bits random value. Furthermore, thus the cluster secret
is always initialized.

As the cluster secret is always initialized, there are several tests which
are for now on useless. This patch removes such tests (if(global.cluster_secret))
in the QUIC code part and at parsing time: no need to check that a cluster
secret was initialized with "quic-force-retry" option.

Must be backported as far as 2.6.
2023-09-08 09:50:58 +02:00
Aurelien DARRAGON
e187361b52 MINOR: log: move log-forwarders cleanup in log.c
Move the log-forwarded proxies cleanup from global deinit() function into
log dedicated deinit function.

No backport needed.
2023-09-06 16:06:39 +02:00
William Lallemand
d90d3bf894 MINOR: global: export the display_version() symbol
Export the display_version() function which can be used elsewhere than
in haproxy.c
2023-09-05 15:24:39 +02:00
Willy Tarreau
86854dd032 MEDIUM: threads: detect excessive thread counts vs cpu-map
This detects when there are more threads bound via cpu-map than CPUs
enabled in cpu-map, or when there are more total threads than the total
number of CPUs available at boot (for unbound threads) and configured
for bound threads. In this case, a warning is emitted to explain the
problems it will cause, and explaining how to address the situation.

Note that some configurations will not be detected as faulty because
the algorithmic complexity to resolve all arrangements grows in O(N!).
This means that having 3 threads on 2 CPUs and one thread on 2 CPUs
will not be detected as it's 4 threads for 4 CPUs. But at least configs
such as T0:(1,4) T1:(1,4) T2:(2,4) T3:(3,4) will not trigger a warning
since they're valid.
2023-09-04 19:39:17 +02:00
Willy Tarreau
8357f950cb MEDIUM: threads: detect incomplete CPU bindings
It's very easy to mess up with some cpu-map directives and to leave
some thread unbound. Let's add a test that checks that either all
threads are bound or none are bound, but that we do not face the
intermediary situation where some are pinned and others are left
wandering around, possibly on the same CPUs as bound ones.

Note that this should not be backported, or maybe turned into a
notice only, as it appears that it will easily catch invalid
configs and that may break updates for some users.
2023-09-04 19:39:17 +02:00
Andrew Hopkins
b3f94f8b3b BUILD: ssl: Build with new cryptographic library AWS-LC
This adds a new option for the Makefile USE_OPENSSL_AWSLC, and
update the documentation with instructions to use HAProxy with
AWS-LC.

Update the type of the OCSP callback retrieved with
SSL_CTX_get_tlsext_status_cb with the actual type for
libcrypto versions greater than 1.0.2. This doesn't affect
OpenSSL which casts the callback to void* in SSL_CTX_ctrl.
2023-09-04 18:19:18 +02:00
Willy Tarreau
bd84387beb MEDIUM: capabilities: enable support for Linux capabilities
For a while there has been the constraint of having to run as root for
transparent proxying, and we're starting to see some cases where QUIC is
not running in socket-per-connection mode due to the missing capability
that would be needed to bind a privileged port. It's not realistic to
ask all QUIC users on port 443 to run as root, so instead let's provide
a basic support for capabilities at least on linux. The ones currently
supported are cap_net_raw, cap_net_admin and cap_net_bind_service. The
mechanism was made OS-specific with a dedicated file because it really
is. It can be easily refined later for other OSes if needed.

A new keyword "setcaps" is added to the global section, to enumerate the
capabilities that must be kept when switching from root to non-root. This
is ignored in other situations though. HAProxy has to be built with
USE_LINUX_CAP=1 for this to be supported, which is enabled by default
for linux-glibc, linux-glibc-legacy and linux-musl.

A good way to test this is to start haproxy with such a config:

    global
        uid 1000
        setcap cap_net_bind_service

    frontend test
        mode http
        timeout client 3s
        bind quic4@:443 ssl crt rsa+dh2048.pem allow-0rtt

and run it under "sudo strace -e trace=bind,setuid", then connecting
there from an H3 client. The bind() syscall must succeed despite the
user id having been switched.
2023-08-29 11:11:50 +02:00
Willy Tarreau
f54d8c6457 CLEANUP: cpuset: remove the unused proc_t1 field in cpu_map
This field used to store the cpumap of the first thread in a group, and
was used till 2.4 to hold some default settings, after which it was no
longer used. Let's just drop it.
2023-07-20 11:01:09 +02:00
Willy Tarreau
c955659906 BUG/MINOR: init: set process' affinity even in foreground
The per-process CPU affinity settings are only applied during forking,
which means that cpu-map are ignored when running in foreground (e.g.
haproxy started with -db). This is historic due to the original semantics
of a process array, but isn't documented and causes surprises when trying
to debug affinity settings.

Let's make sure the setting is applied to the workers themselves even
in foreground. This may be backported to 2.6 though it is really not
important. If backported, it also depends on previous commit:

  BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct
2023-07-20 11:01:09 +02:00
Willy Tarreau
151f9a2808 BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct
We're currently having a problem with the porting from cpu_map from
processes to thread-groups as it happened in 2.7 with commit 5b09341c0
("MEDIUM: cpu-map: replace the process number with the thread group
number"), though it seems that it has deeper roots even in 2.0 and
that it was progressively made worng over time.

The issue stems in the way the per-process and per-thread cpu-sets were
employed over time. Originally only processes were supported. Then
threads were added after an optional "/" and it was documented that
"cpu-map 1" is exactly equivalent to "cpu-map 1/all" (this was clarified
in 2.5 by commit 317804d28 ("DOC: update references to process numbers
in cpu-map and bind-process").

The reality is different: when processes were still supported, setting
"cpu-map 1" would apply the mask to the process itself (and only when
run in the background, which is not documented either and is also a
bug for another fix), and would be combined with any possible per-thread
mask when calculating the threads' affinity, possibly resulting in empty
sets. However, "cpu-map 1/all" would only set the mask for the threads
and not the process. As such the following:

    cpu-map 1 odd
    cpu-map 1/1-8 even

would leave no CPU while doing:

    cpu-map 1/all odd
    cpu-map 1/1-8 even

would allow all CPUs.

While such configs are very unlikely to ever be met (which is why this
bug is tagged minor), this is becoming quite more visible while testing
automatic CPU binding during 2.9 development because due to this bug
it's much more common to end up with incorrect bindings.

This patch fixes it by simply removing the .proc entry from cpu_map and
always setting all threads' maps. The process is no longer arbitrarily
bound to the group 1's mask, but in case threads are disabled, we'll
use thread 1's mask since it contains the configured CPUs.

This fix should be backported at least to 2.6, but no need to insist if
it resists as it's easier to break cpu-map than to fix an unlikely issue.
2023-07-20 11:01:09 +02:00