The current tests in _h3_handle_hdr() and h3_trailers_to_htx() check
the forbidden range of uppercase letters ('A' to 'Z') in header field
names, but mistakenly leave 'Z' out of the forbidden range, making it
implicitly valid.
This has no real consequences but should be fixed for the sake of
protocol validity checking.
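For illustration only (the variable name and error label below are
hypothetical, not the actual parser code), the fix amounts to making the
upper bound of the range inclusive:
/* HTTP/2 and HTTP/3 require lowercase header field names, so any
 * uppercase letter must be rejected. A '< Z' comparison lets 'Z'
 * through; both bounds of the check must be inclusive.
 */
if (c >= 'A' && c <= 'Z')
        goto fail;   /* hypothetical error path */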
This must be backported to all relevant versions.
The current tests in h2_make_htx_request(), h2_make_htx_response()
and h2_make_htx_trailers() check the forbidden range of uppercase
letters ('A' to 'Z') in header field names, but mistakenly leave 'Z'
out of the forbidden range, making it implicitly valid.
This has no real consequences but should be fixed for the sake of
protocol validity checking.
This must be backported to all relevant versions.
Mjson comes with its own strtod() implementation for portability
reasons and probably also because many generic strtod() versions as
provided by operating systems do not focus on resource preservation
and may call malloc(), which is not welcome in a parser.
The strtod() implementation used here apparently originally comes from
https://gist.github.com/mattn/1890186 and seems to have purposely
omitted a few parts that were considered unnecessary in this context
(e.g. skipping white spaces, or setting errno). But when subjected to
the relevant test cases of the file above, the current function
provides the same results.
The aforementioned implementation uses pow() to calculate exponents,
but the mjson authors apparently preferred not to introduce a libm
dependency and replaced it with an iterative loop running in O(exp)
time. The problem is
that the exponent is not bounded and that this loop can take a huge
amount of time. There's even an issue already opened on mjson about
this: https://github.com/cesanta/mjson/issues/59. In the case of
haproxy, fortunately, the watchdog will quickly stop a runaway process
but this remains a possible denial of service.
A first approach would consist in reintroducing pow() like in the
original implementation, but if haproxy is built without Lua nor
51Degrees, -lm is not used so this will not work everywhere.
Anyway, here we're dealing with integer exponents, so an easy alternate
approach consists in simply using shifts and squares to compute the
power in O(log(exp)) time. Not only does it not introduce any new
dependency, it also turns out to be even faster than the generic pow()
(85k req/s per core vs 83.5k on the same machine).
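A minimal sketch of the shifts-and-squares idea (hypothetical helper,
not the exact mjson code):
/* Compute base^exp for a non-negative integer exponent using
 * exponentiation by squaring: O(log(exp)) multiplications instead of
 * O(exp). A negative exponent is handled by the caller as
 * 1.0 / int_pow(base, -exp).
 */
static double int_pow(double base, unsigned int exp)
{
        double result = 1.0;

        while (exp) {
                if (exp & 1)      /* lowest bit set: multiply this square in */
                        result *= base;
                base *= base;     /* square the base for the next bit */
                exp >>= 1;
        }
        return result;
}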
This must be backported as far as 2.4, where mjson was introduced.
Many thanks to Oula Kivalo for reporting this issue.
CVE-2025-11230 was assigned to this issue.
Oula Kivalo reported that different JSON libraries may process duplicate
keys differently and that most JSON libraries usually decode the stream
before extracting keys, while the current mjson implementation decodes the
contents during extraction instead. Let's document this point so that
users are aware of the limitations, do not rely on the current behavior,
and do not use it for what it's not made for (e.g. content sanitization).
This is also the case for jwt_header_query(), jwt_payload_query() and
jwt_verify(), which already refer to this converter for specificities.
Properly handle memory allocation failures by checking the return value
of pool_alloc() and, if it fails, making sure that the caller takes it
into account.
The only use of pool_alloc() in fwlc is to allocate the tree elements in
order to properly queue the server into the ebtree, so if that
allocation fails, just schedule the requeue tasklet, which will try
again until it hopefully succeeds.
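A hedged sketch of the pattern (the pool and tasklet names below are
illustrative, not the exact fwlc fields):
struct fwlc_tree_elt *elt;

elt = pool_alloc(pool_head_fwlc_elt);        /* hypothetical pool name */
if (!elt) {
        /* Allocation failed: don't queue the server now, just wake the
         * requeue tasklet so that it retries later until it succeeds.
         */
        tasklet_wakeup(lb_fwlc.requeue_tasklet); /* hypothetical field */
        return;
}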
This should be backported to 3.2.
This should fix github issue #3143.
Modify fwlc_srv_reposition() so that it does not assume that the server
was already queued, and make it work even if s->tree_elt is NULL.
While the server will usually be queued, there is an unlikely
possibility that when the server attempted to get queued as it went up,
it failed due to a memory allocation failure, and it now just expects
the server_requeue tasklet to run to take care of that later.
This should be backported to 3.2.
This is part of an attempt to fix github issue #3143.
On server creation, schedule the requeue tasklet once the server has
been created.
It is possible that when the server went up, it tried to queue itself
into the LB-specific code, failed to do so, and now expects the tasklet
to run to take care of that.
This should be backported to 3.2.
This is part of an attempt to fix github issue #3143.
If a client aborts a pending SSL connection for whatever reason (timeout
etc) and the listen queue is large, it may inflict a severe load on a
frontend, which will spend CPU creating new sessions and then killing the
connection. This is similar to HTTP requests aborted just after being
sent, except that asymmetric crypto is way more expensive.
Unfortunately "option abortonclose" has no effect on this, because it
only applies at a higher level.
This patch ensures that handshakes being received on a frontend having
"option abortonclose" set will be checked for a pending close, and if
this is the case, then the connection will be aborted before the heavy
calculations. The principle is to use recv(MSG_PEEK) to detect the end,
and to destroy the pending handshake data before returning to the SSL
library, so that it cannot start computing and instead notices the error
and stops.
We don't do it without abortonclose though, because this can be used for
health checks from other haproxy nodes or even other components which
just want to see a handshake succeed.
This is in relation with GH issue #3124.
Normally, when reading a full buffer, or exactly the requested size, it
is not really possible to know if the peer had closed immediately after,
and usually we don't care. There's a problematic case, though, which is
with SSL: the SSL layer reads in small chunks of a few bytes, and can
consume a client_hello this way, then start computation without knowing
yet that the client has aborted. In order to permit knowing more, we now
introduce a new read flag, CO_RFL_TRY_HARDER, which says that if we've
read up to the permitted limit and the flag is set, then we attempt one
extra byte using MSG_PEEK to detect whether the connection was closed
immediately after that content or not. The first use case will obviously
be related to SSL and client_hello, but it might possibly also make sense
on HTTP responses to detect a pending FIN at the end of a response (e.g.
if a close was already advertised).
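A rough sketch of such a probe with plain POSIX calls, ignoring EINTR
and pending socket errors (the real code goes through the connection
layer instead):
#include <sys/socket.h>

/* Returns 1 if the peer already closed right after the data we just
 * read, 0 if nothing can be concluded (more data, or nothing yet).
 */
static int peer_closed_after_data(int fd)
{
        char tmp;
        ssize_t ret = recv(fd, &tmp, 1, MSG_PEEK | MSG_DONTWAIT);

        return ret == 0;  /* 0 means a FIN is queued right behind the data */
}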
Sometimes in order to debug certain difficult situations it can be useful
to know what SNI was configured on a connection going to a server, for
example to match it against what the server saw or to detect cases where
a server would route on SNI instead of Host. This sample fetch function
simply retrieves the SNI configured on the backend connection, if any.
The fact that the watchdog timer measures the execution time from the
last return from the poller tends to amplify the impact of multiple
bad tasks, and may explain some of the panics reported by Felipe and
Ricardo in GH issues #3084, #3092 and #3101. The problem is that we
check the time if we see that the scheduler appears not to be moving
anymore, but one situation may still arise and catch a bad task:
- one slow task takes so long a time that it triggers the watchdog
twice, emitting a warning the second time (~200ms). The scheduler
is rightfully marked as stuck.
- then it completes and the scheduler is no longer stuck. Many other
tasks run in turn, they all take quite some time but not enough to
trigger a warning. But collectively their cost adds up.
- then a task takes more than the warning time (100ms), and causes
  the total execution time to cross the one-second mark. The watchdog is
  called, sees that we've spent more than 1 second since we left the
  poller, and marks the thread as stuck.
- the task is not finished, the watchdog is called again, sees more
than one second with a stuck thread and panics 100ms later.
The total time away from the poller is indeed more than one second,
which is very bad, but no single task caused this individually, and
while the warnings are OK, the watchdog should not panic in this case.
This patch revisits the approach to store the moment the scheduler was
marked as stuck in the wdt context. The idea is that this date will be
used to detect warnings and panics. And by doing so and exploiting the
new is_sched_alive(thr), we can greatly simplify the mechanism so that
the signal handling thread does the strict minimum (mark the scheduler
as possibly stuck and update the stuck_start date), and only bounces to
the reporting thread if the scheduler made no progress since last call.
This means that without even doing computations in the handling thread,
we can continue to avoid all bounces unless a warning is required. Then
when the reporting thread is signaled, it will check the dates from the
last moment the scheduler was marked, and will decide to warn or panic.
The panic decision continues to be passed via a TH_FL_STUCK flag to the
probed code so that exceptionally slow code (e.g. live cert generation
etc) can still find a way to avoid the panic if absolutely certain that
things are still moving.
This means that now we have the guarantee that panics will only happen
if a given task spends more than one full second not moving, and that
warnings will be issued for other calls crossing the warn delay boundary.
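Under these assumptions, the decision roughly boils down to the
following sketch (field and helper names such as stuck_start,
warn_delay_ns or ha_stuck_warning are hypothetical):
/* Run by the reporting thread once the signal handler has noticed that
 * the scheduler made no progress since the last tick.
 */
unsigned long long now = now_mono_time();

if (!ctx->stuck_start)
        ctx->stuck_start = now;                 /* first time seen stuck */

if (now - ctx->stuck_start >= 1000000000ULL)
        ha_panic();                             /* >1s without the task moving */
else if (now - ctx->stuck_start >= warn_delay_ns)
        ha_stuck_warning();                     /* hypothetical warning dump */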
This was tested using artificially slow operations, and all combinations
which individually took less than a second only resulted in floods of
warnings even if the total reported time in the warning was much higher,
while those above one second provoked the panic.
One improvement could consist in reporting the time since last stuck
in the thread dumps to differentiate the individual task from the whole
set.
This needs to be backported to 3.2 along with the two previous patches:
MINOR: sched: let's permit to share the local ctx between threads
MINOR: sched: pass the thread number to is_sched_alive()
Now it will be possible to query any thread's scheduler state, not
only the current one. This aims at simplifying the watchdog checks
for reported threads. The operation is now a simple atomic xchg.
The watchdog timer has to go through complex operations due to not being
able to check if another thread's scheduler is still ticking. This is
simply because the scheduler status is marked as thread-local while it
could in fact also be an array. Let's do that (and align the array to
avoid false sharing) so that it's now possible to check any scheduler's
status.
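A hedged illustration of the layout with generic C11 atomics (the real
scheduler uses its own atomic macros and field names):
#include <stdalign.h>
#include <stdatomic.h>

#define MAX_THREADS 256   /* illustration only */

/* One slot per thread, padded to a cache line to avoid false sharing.
 * The scheduler bumps its own slot on each loop; is_sched_alive(thr)
 * can then atomically exchange the value and compare it with the
 * previous observation to tell whether that scheduler progressed.
 */
static struct {
        alignas(64) atomic_uint loops;
} sched_activity[MAX_THREADS];

static inline unsigned int sched_activity_xchg(int thr, unsigned int newval)
{
        return atomic_exchange(&sched_activity[thr].loops, newval);
}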
There is a race condition: an entry can be freed by stksess_kill()
between the time stktable_add_pend_updates() gets the entry from the
mt_list and the time it adds it to the ebtree.
To prevent this, use the newly implemented MT_LIST_POP_LOCKED() to keep
the stksess locked until it is added to the tree. That way,
__stksess_kill() will wait until we're done with it.
This should be backported to 3.2.
Implement MT_LIST_POP_LOCKED(), that behaves as MT_LIST_POP() and
removes the first element from the list, if any, but keeps it locked.
This should be backported to 3.2, as it will be used in a bug fix in the
stick tables that affects 3.2 too.
The -v verbose mode displays the loading messages returned by the master
CLI reload command upon error.
The new -vv mode displays the loading messages even upon success,
showing the content of `show startup-logs` after the reload attempt.
By default, haproxy-reload only displays the errors emitted by
haproxy-reload itself, not those emitted by haproxy.
-s silent mode, don't display any error
-v verbose mode, display the loading messages returned by the master CLI
reload command upon error.
When using AWS-LC, the free() of the data ptr resulting from
i2d_X509_REQ() might crash, because it uses the libc free()
instead of OPENSSL_free().
It does not seem to be a problem with OpenSSL builds.
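For reference, the usual pattern is to let the library allocate the DER
buffer and to release it with OPENSSL_free() (simplified example):
#include <openssl/x509.h>

static int dump_req_der(X509_REQ *req)
{
        unsigned char *der = NULL;
        int len = i2d_X509_REQ(req, &der);  /* the library allocates <der> */

        if (len < 0)
                return -1;
        /* ... use der/len ... */
        OPENSSL_free(der);  /* not free(): must match the lib's allocator */
        return 0;
}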
Must be backported in 3.2.
Replace error/notice by [ALERT]/[WARNING]/[NOTICE] like it's done in
haproxy.
ALERT means a failure and the program will exit with status 1 just after it.
WARNING lets the program continue its execution.
NOTICE lets the execution continue as well.
Files dumped from the socket are put in a temporary directory, which is
then removed upon exit.
Variables were renamed to be clearer:
- crt_filename -> prev_crt
- key_filename -> prev_key
- ${crt_filename}.${tmp} -> new_crt
- ${key_filename}.${tmp} -> new_key
Compare the fingerprint of the leaf certificate to the previous file to
check if it needs to be updated or not.
Also skip the check if no file is on the disk.
haproxy-dump0-certs is a bash script that connects to your master socket
or your stat socket in order to dump certificates from haproxy memory to
the corresponding files.
cfg_postsection_acme() redefines its own cur_acme variable, pointing
to the first acme section created, meaning that the first section would
be initialized multiple times, and the following sections would never be
initialized.
It could result in crashes at the first use of any section other than
the first one.
Must be backported to 3.2.
Unlinking the acme_ctx element from acme_ctx_destroy() requires the
element to be unlocked, because MT_LIST_DELETE() locks the element.
acme_ctx_destroy() frees the data from acme_ctx with the ctx still
linked and unlocked, then locks it to unlink. So there's a small risk of
accessing acme_ctx from somewhere else. The only way to do that would be
to use the `acme challenge_ready` CLI command at the same time.
Fix the issue by using mt_list_unlock_link() and mt_list_unlock_self()
to unlink the element under the lock, then destroy the element.
This must be backported in 3.2.
admin/halog/halog.c: In function 'filter_count_url':
admin/halog/halog.c:1685:9: error: this 'if' clause does not guard... [-Werror=misleading-indentation]
1685 | if (unlikely(!ustat))
| ^~
admin/halog/halog.c:1687:17: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
1687 | if (unlikely(!ustat)) {
| ^~
This patch fixes the indentation.
Must be backported where fbd0fb20a22 ("BUG/MINOR: halog: Add OOM checks
for calloc() in filter_count_srv_status() and filter_count_url()") was
backported.
There is currently an srv_queue converter which is capable of taking the
output of a dynamic name and determining the queue length for a given
server. In addition there is a sample fetcher for whether a server is
currently up. This simply combines the two such that srv_is_up can be
used as a converter too.
Future work might extend this to other sample fetchers for servers, but
this is probably the most useful for acl routing.
In preparation for providing further server converters, split out the
code that finds the server from the sample.
Additionally, update the documentation of the srv_queue converter to note
security concerns.
src/acme.c: In function ‘cfg_parse_acme_vars_provider’:
src/acme.c:471:9: error: potential null pointer dereference [-Werror=null-dereference]
471 | free(*dst);
| ^~~~~~~~~~
gcc13 on ubuntu 24.04 detects a false positive when building
3e72a9f ("MINOR: acme: provider-name for dpapi sink").
Indeed, dst can't be NULL. Clarify the code so gcc doesn't complain
anymore.
Like "acme-vars", the "provider-name" in the acme section is used in
case of DNS-01 challenge and is sent to the dpapi sink.
This is used to pass the name of a DNS provider in order to chose the
DNS API to use.
This patch implements the cfg_parse_acme_vars_provider() which parses
either acme-vars or provider-name options and escape their strings.
Example:
$ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock - | cat -e
<0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$
acme-vars "var1=foobar\"toto\",var2=var2"$
provider-name "godaddy"$
{$
"identifier": {$
"type": "dns",$
"value": "example.com"$
},$
"status": "pending",$
"expires": "2025-09-25T14:41:57Z",$
[...]
The httpclient is configured with @system-ca by default, which uses the
directory returned by X509_get_default_cert_dir().
On debian/ubuntu systems, this directory contains multiple certificate
files that are loaded successfully. However, it seems that on other
systems the files in this directory are the direct result of
ca-certificates instead of its source, meaning that you would only have
a bundle file with every certificate in it.
The loading was not done correctly in the case of directory loading, and
only the first certificate of each file was being loaded.
This patch fixes the issue by using X509_STORE_load_locations() on each
file from the scandir instead of trying to load it manually with BIO.
Note that we can't use X509_STORE_load_locations() with the `dir`
argument, which would be simpler, because it uses X509_LOOKUP_hash_dir(),
which requires a directory in hashed form. That wouldn't be suited for
this use case.
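A reduced sketch of the per-file approach (error handling trimmed,
names illustrative):
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <openssl/x509_vfy.h>

/* Load every file of <path> into <store>. Each file may be a bundle
 * containing several PEM certificates, which
 * X509_STORE_load_locations() handles, unlike a single BIO/PEM read.
 */
static int load_ca_dir(X509_STORE *store, const char *path)
{
        struct dirent **list;
        int i, n, ret = 0;
        char file[4096];

        n = scandir(path, &list, 0, alphasort);
        if (n < 0)
                return -1;
        for (i = 0; i < n; i++) {
                if (list[i]->d_name[0] != '.') {
                        snprintf(file, sizeof(file), "%s/%s", path, list[i]->d_name);
                        if (!X509_STORE_load_locations(store, file, NULL))
                                ret = -1;
                }
                free(list[i]);
        }
        free(list);
        return ret;
}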
Must be backported to all stable branches.
Fix issue #3137.
Add a script to build curl with ECH support. To specify the path of the
openssl+ECH library, set the SSL_LIB variable to the prefix of the
library.
Example:
SSL_LIB=/opt/openssl-ech CURL_DESTDIR=/opt/curl-ech/ ./build-curl.sh
When we look for a map file reference, the file@ prefix is removed because
it may be omitted. The same is true for the opt@ prefix. However this case
was not properly handled in pat_ref_lookup(). Let's do so.
This patch must be backported as far as 3.0.
The date computation in acme_will_expire() and acme_schedule_date() is
the same. Call acme_schedule_date() from acme_will_expire() and mark the
functions as static. The patch also moves the functions into the right
order.
acme_will_expire() computes the schedule date using notAfter and
notBefore from the certificate. However notBefore could be greater than
notAfter and could result in an overflow.
This is unlikely to happen and would mean an incorrect certificate.
This patch fixes the issue by checking that notAfter > notBefore.
It also replaces the int type with a time_t to avoid an overflow on
64-bit architectures, which is also unlikely to happen with certificates.
`(date.tv_sec + diff > notAfter)` was also replaced by `if (notAfter -
diff <= date.tv_sec)` to avoid an overflow.
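As a simplified illustration of the safer form (the renewal margin below
is purely hypothetical, the real scheduling computation is more
involved):
#include <time.h>

/* Returns 1 when renewal is due, 0 otherwise. An inconsistent
 * certificate (notAfter <= notBefore) is treated as "renew now"
 * instead of computing a negative validity period.
 */
static int renewal_due(time_t notbefore, time_t notafter, time_t now)
{
        time_t diff;

        if (notafter <= notbefore)
                return 1;

        diff = (notafter - notbefore) / 12;  /* hypothetical margin */

        /* written as 'notafter - diff <= now' rather than
         * 'now + diff > notafter' so the addition cannot overflow
         */
        return notafter - diff <= now;
}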
Fix issue #3135.
Need to be backported to 3.2.
acme_schedule_date() computes the schedule date using notAfter and
notBefore from the certificate. However notBefore could be greater than
notAfter and could result in an overflow.
This is unlikely to happen and would mean an incorrect certificate.
This patch fixes the issue by checking that notAfter > notBefore.
It also replaces the int type with a time_t to avoid an overflow on
64-bit architectures, which is also unlikely to happen with certificates.
Fix issue #3136.
Need to be backported to 3.2.
When a map file is loaded, internally, the pattern reference is flagged as
based on a sample. However this is not performed for virtual maps. The flag
is only used during startup to check the map compatibility when it is used
at different places. At runtime this does not change anything. But errors
can be triggered during configuration parsing. For instance, the following
valid config will trigger an error:
http-request set-map(virt@test) foo bar if !{ str(foo),map(virt@test) -m found }
http-request set-var(txn.foo) str(foo),map(virt@test)
The fix is quite obvious: the PAT_REF_SMP flag must be set for virtual
maps as for any other map.
A workaround is to use an optional map (opt@...), making sure that the
map id does not reference an existing file.
This patch must be backported as far as 3.0.
When a minimum size is defined to perform the compression, the message
payload size is tested. To do so, information from the HTX message is used
to determine the message length. However this is performed regardless of
whether the payload length is fully known or not. Concretely, the test must
only be performed when a content-length value was specified or when the
message was fully received (EOM flag set). Otherwise, we are unable to
really determine the real payload length.
Because of this bug, compression may be skipped for a large chunked message
because the first chunks received are too small. But this does not mean the
whole message is small.
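In other words, the size test should only run once the length is
trustworthy, along the lines of this hedged fragment (flag usage
simplified, variables hypothetical):
/* Only compare the payload size with the configured minimum when the
 * total length is known: either an explicit content-length was parsed
 * or the whole message was received (EOM). For an in-progress chunked
 * message, the bytes seen so far say nothing about the final size.
 */
if ((msg->flags & HTTP_MSGF_CNT_LEN) || (htx->flags & HTX_FL_EOM)) {
        if (payload_len < comp_min_size)   /* hypothetical variables */
                goto skip_compression;
}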
This patch must be backported to 3.2.
Instead of having table_process_entry() decrement the session's ref
counter, do it outside, from the caller. Some decrements were missed, such
as when an action was invalid, which would lead to the ref counter not
being decremented, and the session not being destroyable.
It makes more sense to do that from the caller, which just obtained the
ref counter, anyway.
This should be backported up to 2.8.
The "acme challenge_ready" command mistakenly use the description of the
"acme status" command. This patch adds the right description.
Must be backported to 3.2.
Handle allocation failures properly during acme-vars parsing.
Check for an allocation failure in both the malloc and the realloc, and
emit an error if that's the case.
In the case of the dns-01 challenge, the agent that handles the
challenge might need some extra information which depends on the DNS
provider.
This patch introduces the "acme-vars" option in the acme section, which
allows passing this data to the dpapi sink. The double quotes will be
escaped when printed in the sink.
Example:
global
setenv VAR1 'foobar"toto"'
acme LE
directory https://acme-staging-v02.api.letsencrypt.org/directory
challenge DNS-01
acme-vars "var1=${VAR1},var2=var2"
Would output:
$ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock - | cat -e
<0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$
acme-vars "var1=foobar\"toto\",var2=var2"$
{$
"identifier": {$
"type": "dns",$
"value": "example.com"$
},$
"status": "pending",$
"expires": "2025-09-25T14:41:57Z",$
[...]
The commit 88aa7a780 ("MINOR: http-client: Trigger an error if first
response block isn't a start-line") introduced a bug. From an endpoint, an
applet or a mux, the <first> index must never be used. It is reserved for
the HTTP analyzers. From an endpoint, this value may be undefined or just
point to any block other than the first one. Instead we must always get
the head block.
In that case, to be sure the first HTX block in a response is a start-line,
we must use the htx_get_head_type() function instead of
htx_get_first_type(). Otherwise, we can trigger an error while the
response is in fact properly formatted.
This is a 3.3-specific issue. No backport needed.
This patch looks huge, but it has a very simple goal: protect all
accesses to shared stats pointers (either reads or writes), because
we now consider that these pointers may be NULL.
The reason behind this is that despite all precautions taken to ensure
the pointers shouldn't be NULL when not expected, there are still corner
cases (i.e. frontend stats used on a backend with no FE cap and vice
versa) where we could try to access a memory area which is not
allocated. Willy stumbled on such cases while playing with ring
servers upon connection error, which eventually led to process crashes
(since 3.3, when shared stats were implemented).
Also, we may decide later that shared stats are optional and should
be disabled on the proxy to save memory and CPU, and this patch is
a step further towards that goal.
So in essence, this patch ensures shared stats pointers are always
initialized (including NULL), and adds necessary guards before shared
stats pointers are de-referenced. Since we already had some checks
for backends and listeners stats, and the pointer address retrieval
should stay in cpu cache, let's hope that this patch doesn't impact
stats performance much.
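The added guards essentially follow this pattern (the counter layout
shown is illustrative only, not the exact structure):
/* Shared stats may legitimately be missing for this proxy/side, so
 * always check the pointer before touching the shared counters.
 */
if (px->fe_counters.shared)                           /* hypothetical layout */
        HA_ATOMIC_INC(&px->fe_counters.shared->denied_req);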
Willy experienced an unexpected behavior with the config below:
global
stats socket :1514
ring buf1
server srv1 127.0.0.1:1514
Indeed, haproxy would connect to the ring server twice since commit 23e5f18b
("MEDIUM: sink: change the sink mode type to PR_MODE_SYSLOG"), and one of
the connections would report errors.
The reason behind this is that, despite the above commit saying no change
of behavior was expected, with the sink forward_px proxy now being set to
PR_MODE_SYSLOG, postcheck_log_backend() was being automatically executed in
addition to the manual cfg_post_parse_ring() function for each "ring"
section. The consequence is that sink_finalize() was called twice for a
given "ring" section, which means the connection init would be triggered
twice, which in turn resulted in the behavior described above, plus
possible unexpected side effects.
To fix the issue, when we create the forward_px proxy, we now set the
PR_CAP_INT capability on it to tell haproxy not to automatically manage the
proxy (ie: to skip the automatic log backend postinit), because we are about
to manually manage the proxy from the sink API.
No backport needed, this bug is specific to 3.3.