A few functions are used when debugging debug signals and watchdog, but
being static, they're not resolved and are hard to spot in dumps, and
they appear as any random other function plus an offset. Let's just not
mark them static anymore, it only hurts:
- cli_io_handler_show_threads()
- debug_run_cli_deadlock()
- debug_parse_cli_loop()
- debug_parse_cli_panic()
In the following trace trying to abuse the watchdog from the CLI's
"debug dev loop" command running in parallel to "show threads" loops,
it's clear that some re-entrance may happen in ha_thread_dump_fill().
A first minimal fix consists in using a test-and-set on the flag
indicating that the function is currently dumping threads, so that
the one from the signal just returns. However the caller should be
made more reliable to serialize all of this, that's for future
work.
Here's an example capture of 7 threads stuck waiting for each other:
(gdb) bt
#0 0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
#1 0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
#2 0x00000000005ba4f5 in ha_thread_dump_fill (thr=2, buf=0x7ffdd8e08ab0) at src/debug.c:402
#3 ha_thread_dump_fill (buf=0x7ffdd8e08ab0, thr=<optimized out>) at src/debug.c:384
#4 0x00000000005baac4 in ha_stuck_warning (thr=thr@entry=2) at src/debug.c:840
#5 0x00000000006a360d in wdt_handler (sig=<optimized out>, si=<optimized out>, arg=<optimized out>) at src/wdt.c:156
#6 <signal handler called>
#7 0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
#8 0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
#9 0x00000000005ba4c2 in ha_thread_dump_fill (thr=2, buf=0x7fe78f2d6420) at src/debug.c:426
#10 ha_thread_dump_fill (buf=0x7fe78f2d6420, thr=2) at src/debug.c:384
#11 0x00000000005ba7c6 in cli_io_handler_show_threads (appctx=0x2a89ab0) at src/debug.c:548
#12 0x000000000057ea43 in cli_io_handler (appctx=0x2a89ab0) at src/cli.c:1176
#13 0x00000000005d7885 in task_process_applet (t=0x2a82730, context=0x2a89ab0, state=<optimized out>) at src/applet.c:920
#14 0x0000000000659002 in run_tasks_from_lists (budgets=budgets@entry=0x7ffdd8e0a5c0) at src/task.c:644
#15 0x0000000000659bd7 in process_runnable_tasks () at src/task.c:886
#16 0x00000000005cdcc9 in run_poll_loop () at src/haproxy.c:2858
#17 0x00000000005ce457 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3075
#18 0x0000000000430628 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3665
As seen in issue #2860, there are some situations where a watchdog could
trigger during the debug signal handler, and where similarly the debug
signal handler may trigger during the wdt handler. This is really bad
because it could trigger some deadlocks inside inner libc code such as
dladdr() or backtrace() since the code will not protect against re-
entrance but only against concurrent accesses.
A first attempt was made using ha_sigmask() but that's not always very
convenient because the second handler is called immediately after
unblocking the signal and before returning, leaving signal cascades in
backtrace. Instead, let's mark which signals to block at registration
time. Here we're blocking wdt/dbg for both signals, and optionally
SIGRTMAX if DEBUG_DEV is used as that one may also be used in this case.
This should be backported at least to 3.1.
The function must make sure to return NULL for foreign threads and
the local buffer for the current thread in this case, otherwise panics
(and sometimes even warnings) will segfault when USE_THREAD_DUMP is
disabled. Let's slightly re-arrange the function to reduce the #if/else
since we have to specifically handle the case of !USE_THREAD_DUMP anyway.
This needs to be backported wherever b8adef065d ("MEDIUM: debug: on
panic, make the target thread automatically allocate its buf") was
backported (at least 2.8).
Some signal handlers rely on these to decide about the level of detail to
provide in dumps, so let's properly fill the info about entering/leaving
idle. Note that for consistency with other tests we're using bitops with
t->ltid_bit, while we could simply assign 0/1 to the fields. But it makes
the code more readable and the whole difference is only 88 bytes on a 3MB
executable.
This bug is not important, and while older versions are likely affected
as well, it's not worth taking the risk to backport this in case it would
wake up an obscure bug.
The reason musl builds was not producing exploitable backtraces was
that the toolchain used appears to automatically omit the frame pointer
at -O2 but leaves it at -O0. This patch just makes sure to always append
-fno-omit-frame-pointer to the BACKTRACE cflags and enables the option
with musl where it now works. This will allow us to finally get
exploitable traces from docker images where core dumps are not always
available.
This commit extends MUX H2 connection reversal step to properly take
into account the new idle-ping feature. It first ensures that h2c task
is properly instantiated/freed depending now on both timers and
idle-ping configuration. Also, h2c_update_timeout() is now called
instead of manually requeuing the task, which ensures the proper timer
value is selected depending on the new connection side.
If idle-ping is activated and h2c task is expired due to missing PING
ACK, consider that the peer is away and the connection can be closed
immediately. GOAWAY emission is thus skipped.
A new test is necessary in h2c_update_timeout() when PING ACK is
currently expected, but the next timer expiration selected is not
idle-ping. This may happen if http-keep-alive/http-request timers are
selected first. In this case, H2_CF_IDL_PING_SENT flag is resetted. This
is necessary to not prevent GOAWAY emission on expiration.
This commit is the counterpart of the previous one, adapted on the
frontend side. "idle-ping" is added as keyword to bind lines, to be able
to refresh client timeout of idle frontend connections.
H2 MUX behavior remains similar as the previous patch. The only
significant change is in h2c_update_timeout(), as idle-ping is now taken
into account also for frontend connection. The calculated value is
compared with http-request/http-keep-alive timeout value. The shorter
delay is then used as expired date. As hr/ka timeout are based on
idle_start, this allows to run them in parallel with an idle-ping timer.
This commit implements support for idle-ping on the backend side. First,
a new server keyword "idle-ping" is defined in configuration parsing. It
is used to set the corresponding new server member.
The second part of this commit implements idle-ping support on H2 MUX. A
new inlined function conn_idle_ping() is defined to access connection
idle-ping value. Two new connection flags are defined H2_CF_IDL_PING and
H2_CF_IDL_PING_SENT. The first one is set for idle connections via
h2c_update_timeout().
On h2_timeout_task() handler, if first flag is set, instead of releasing
the connection as before, the second flag is set and tasklet is
scheduled. As both flags are now set, h2_process_mux() will proceed to
PING emission. The timer has also been rearmed to the idle-ping value.
If a PING ACK is received before next timeout, connection timer is
refreshed. Else, the connection is released, as with timer expiration.
Also of importance, special care is needed when a backend connection is
going to idle. In this case, idle-ping timer must be rearmed. Thus a new
invokation of h2c_update_timeout() is performed on h2_detach().
Adapt the already existing function h2c_ack_ping(). The objective is to
be able to emit a PING request. First, it is renamed as h2c_send_ping().
A new boolean argument <ack> is used to emit either a PING request or
ack.
Reorganize code for timeout calculation in case the connection is idle.
The objective is to better reflect the relations between each timeouts
as follow :
* if GOAWAY already emitted, use shut-timeout, or if unset fallback to
client/server one. However, an already set timeout is never erased.
* else, for frontend connection, http-request or keep-alive timeout is
applied depending on the current demux state. If the selected value is
unset, fallback to client timeout
* for backend connection, no timeout is set to perform http-reuse
This commit is pure refactoring, so no functional change should occur.
Since the following commit, MUX H2 timeout function has been slightly
exetended.
d38d8c6ccb
BUG/MEDIUM: mux-h2: make sure control frames do not refresh the idle timeout
A side-effect of this patch is that now backend idle connection expire
timer is not reset if already defined. This means that if a timer was
registered prior to the connection transition to idle, the connection
would be destroyed on its timeout. If this happens for enough
connection, this may have an impact on the reuse rate.
In practice, this case should be rare, as h2c timer is set to
TICK_ETERNITY while there is active streams. The timer is not refreshed
most of the time before going the transition to idle, so the connection
won't be deleted on expiration.
The only case where it could occur is if there is still pending data
blocked on emission on stream detach. Here, timeout server is applied on
the connection. When the emission completes, the connection goes to
idle, but the timer will still armed, and thus will be triggered on the
idle connection.
To prevent this, explicitely reset h2c timer to TICK_ETERNITY for idle
backend connection via h2c_update_timeout().
This patch is explicitely not scheduled for backport for now, as it is
difficult to estimate the real impact of the previous code state.
GOAWAY emission should not be emitted before preface. Thus, max_id field
from h2c acting as a server is initialized to -1, which prevents its
emission until preface is received from the peer. If acting as a client,
max_id is initialized to a valid value on the first h2s emission.
This causes an issue with reverse HTTP on the active side. First, it
starts as a client, so the peer does not emit a preface but instead a
simple SETTINGS frame. As role are switched, max_id is initialized much
later when the first h2s response is emitted. Thus, if the connection
must be terminated before any stream transfer, GOAWAY cannot be emitted.
To fix this, ensure max_id is initialized to 0 on h2_conn_reverse() for
active connect side. Thus, a GOAWAY indicating that no stream has been
handled can be generated.
Note that passive connect side is not impacted, as it max_id is
initialized thanks to preface reception.
This should be backported up to 2.9.
Active connect on reverse http relies on connect timeout to detect
connection failure. Thus, if this timeout was unset, connection failure
may not be properly detected.
Fix this by fallback on hardcoded value of 1s for connect if timeout is
unset in the configuration. This is considered as a minor bug, as
haproxy advises against running with timeout unset.
This must be backported up to 2.9.
While reviewing HTTP/2 MUX timeout, it seems there is a possibility that
MUX task is requeued via h2c_update_timeout() with an already expired
date. This can happens with idle connections on two cases :
* first with shut timeout, as timer is not refreshed if already set
* second with http-request and keep-alive timers, which are based on
idle_start
Queuing an already expired task is an undefined behavior. Fix this by
using task_wakeup() instead of task_queue() at the end of
h2c_update_timeout() if such case occurs.
This should be backported up to 2.6.
Jacques Heunis from bloomberg reported on the mailing list [1] that
with haproxy 2.8 up to master, yielding from a Lua tcp service while
data was still buffered inside haproxy would eat some data which was
definitely lost.
He provided the reproducer below which turned out to be really helpful:
global
log stdout format raw local0 info
lua-load haproxy_yieldtest.lua
defaults
log global
timeout connect 10s
timeout client 1m
timeout server 1m
listen echo
bind *:9090
mode tcp
tcp-request content use-service lua.print_input
haproxy_yieldtest.lua:
core.register_service("print_input", "tcp", function(applet)
core.Info("Start printing input...")
while true do
local inputs = applet:getline()
if inputs == nil or string.len(inputs) == 0 then
core.Info("closing input connection")
return
end
core.Info("Received line: "..inputs)
core.yield()
end
end)
And the script below:
#!/usr/bin/bash
for i in $(seq 1 9999); do
for j in $(seq 1 50); do
echo "${i}_foo_${j}"
done
sleep 2
done
Using it like this:
./test_seq.sh | netcat localhost 9090
We can clearly see the missing data for every "foo" burst (every 2
seconds), as they are holes in the numbering.
Thanks to the reproducer, it was quickly found that only versions
>= 2.8 were affected, and that in fact this regression was introduced
by commit 31572229e ("MEDIUM: hlua/applet: Use the sedesc to report and
detect end of processing")
In fact in 31572229e 2 mistakes were made during the refaco.
Indeed, both in hlua_applet_tcp_fct() (which is involved in the reproducer
above) and hlua_applet_http_fct(), the request (buffer) is now
systematically consumed when returning from the function, which wasn't the
case prior to this commit: when HLUA_E_AGAIN is returned, it means a
yield was requested and that the processing is not done yet, thus we
should not consume any data, like we did prior to the refacto.
Big thanks to Jacques who did a great job reproducing and reporting this
issue on the mailing list.
[1]: https://www.mail-archive.com/haproxy@formilux.org/msg45778.html
It should be backported up to 2.8 with commit 31572229e
Change the representation of the start-line URI when parsing a HTTP/3
request into HTX. Adopt the same conversion as HTTP/2. If :authority
header is used (default case), the URI is encoded using absolute-form,
with scheme, host and path concatenated. If only a plain host header is
used instead, fallback to the origin form.
This commit may cause some configuration to be broken if parsing is
performed on the URI. Indeed, now most of the HTTP/3 requests will be
represented with an absolute-form URI at the stream layer.
Note that prior to this commit a check was performed on the path used as
URI to ensure that it did not contain any invalid characters. Now, this
is directly performed on the URI itself, which may include the path.
This must not be backported.
Ensure that the HTX start-line generated after parsing an HTTP/3 request
does not contain any invalid character, i.e. control or whitespace
characters.
Note that for now path is used directly as URI. Thus, the check is
performed directly over it. A patch will change this to generate an
absolute-form URI in most cases, but it won't be backported to avoid
configuration breaking in stable versions.
This must be backported up to 2.6.
RFC 9114 specifies some requirements for :path pseudo-header when using
http or https scheme. This commit enforces this by rejecting a request
if needed. Thus, path cannot be empty, and it must either start with a
'/' character or contains only '*'.
This must be backported up to 2.6.
As specified in RFC 9114, connection headers required special care in
HTTP/3. When a request is received with connection headers, the stream
is immediately closed. Conversely, when translating the response from
HTX, such headers are not encoded but silently ignored.
However, "upgrade" was not listed in connection headers. This commit
fixes this by adding a check on it both on request parsing and response
encoding.
This must be backported up to 2.6.
This commit does a similar job than the previous one, but it acts now on
the response path. Any leading or trailing whitespaces characters from a
HTX block header value are removed, prior to the header encoding via
QPACK.
This must be backported up to 2.6.
Remove any leading and trailing whitespace from header field values
prior to inserting a new HTX header block. This is done when parsing a
HEADERS frame, both as headers and trailers.
This must be backported up to 2.6.
Free the acme_ctx task context once the task is done.
It frees everything but the config and the httpclient,
everything else is free.
The ckch_store is freed in case of error, but when the task is
successful, the ptr is set to NULL to prevent the free once inserted in
the tree.
This patch registers the task in the ckch_store so we don't run 2 tasks
at the same time for a given certificate.
Move the task creation under the lock and check if there was already a
task under the lock.
Exponential backoff values was multiplied by 3000 instead of 3 with a
second to ms conversion. Leading to a 9000000ms value at the 2nd
attempt.
Fix the issue by setting the value in seconds and converting the value
in tick_add().
No backport needed.
When receiving the final certificate, it need to be loaded by
ssl_sock_load_pem_into_ckch(). However this function will remove any
existing private key in the struct ckch_store.
In order to fix the issue, the ptr to the key is swapped with a NULL
ptr, and restored once the new certificate is commited.
However there is a discrepancy when there is an error in
ssl_sock_load_pem_into_ckch() fails and the pointer is lost.
This patch fixes the issue by restoring the pointer in the error path.
This must fix issue #2933.
Add a 1MB ring buffer called "dpapi" for communication with the
dataplane API. It would first be used to transmit ACME informations to
the dataplane API but could be used for more.
A server abort must be handled by the request analyzers only when the
response forwarding was already started. Otherwise, it it the responsability
of the response analyzer to detect this event. L7-retires and conditions to
decide to silently close a client conneciotn are handled by this analyzer.
Because a reused server connections closed too early could be detected at
the wrong place, it was possible to get a 502/SH instead of a silent close,
preventing the client to safely retries its request.
Thanks to this patch, we are able to silently close the client connection in
this case and eventually to perform a L7 retry.
This patch must be backported as far as 2.8.
During the response payload forwarding, if the back SC is closed, we try to
figure out if it is because of a client abort or a server abort. However,
the condition was not accurrate, especially when abortonclose option is
set. Because of this issue, a server abort may be reported (SD-- in logs)
instead of a client abort (CD-- in logs).
The right way to detect a client abort when we try to forward the response
is to test if the back SC was shut down (SC_FL_SHUT_DOWN flag set) AND
aborted (SC_FL_ABRT_DONE flag set). When these both flags are set, it means
the back connection underwent the shutdown, which should be converted to a
client abort at this stage.
This patch should be backported as far as 2.8. It should fix last strange SD
report in the issue #2749.
src/jws.c: In function '__jws_init':
src/jws.c:594:38: error: passing argument 2 of 'hap_register_unittest' from incompatible pointer type [-Wincompatible-pointer-types]
594 | hap_register_unittest("jwk", jwk_debug);
| ^~~~~~~~~
| |
| int (*)(int, char **)
In file included from include/haproxy/api.h:36,
from include/import/ebtree.h:251,
from include/import/ebmbtree.h:25,
from include/haproxy/jwt-t.h:25,
from src/jws.c:5:
include/haproxy/init.h:37:52: note: expected 'int (*)(void)' but argument is of type 'int (*)(int, char **)'
37 | void hap_register_unittest(const char *name, int (*fct)());
| ~~~~~~^~~~~~
GCC 15 is warning because the function pointer does have its
arguments in the register function.
Should fix issue #2929.
>>> CID 1609049: Code maintainability issues (UNUSED_VALUE)
>>> Assigning value "NULL" to "new_ckchs" here, but that stored value is overwritten before it can be used.
592 struct ckch_store *old_ckchs, *new_ckchs = NULL;
Coverity reported an issue where a variable is initialized to NULL then
directry overwritten with another value. This doesn't arm but this patch
removes the useless initialization.
Must fix issue #2932.
In call traces, we're interested in seeing the code that was executed, not
the code that was not yet. The return address is where the CPU will return
to, so we want to see the bytes that precede this location. In the example
below on x86 we can clearly see a number of direct "call" instructions
(0xe8 + 4 bytes). There are also indirect calls (0xffd0) that cannot be
exploited but it gives insights about where the code branched, which will
not always be the function above it if that one used tail branching for
example. Here's an example dump output:
call ------------,
v
0x6bd634 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95
0x7fa4ea2593a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
0x752132 <00 00 00 00 00 90 41 55]: htx_remove_blk+0x2/0x354
0x5b1a2c <4c 89 ef e8 04 07 1a 00]: main+0x10471c
0x5b5f05 <48 89 df e8 8b b8 ff ff]: main+0x108bf5
0x60b6f4 <89 ee 4c 89 e7 41 ff d0]: tcpcheck_eval_send+0x3b4/0x14b2
0x610ded <00 00 00 e8 53 a5 ff ff]: tcpcheck_main+0x7dd/0xd36
0x6c5ab4 <48 89 df e8 5c ab f4 ff]: wake_srv_chk+0xc4/0x3d7
0x6c5ddc <48 89 f7 e8 14 fc ff ff]: srv_chk_io_cb+0xc/0x13
For code dumps, dumping from the return address is pointless, what is
interesting is to dump before the return address to read the machine
code that was executed before branching. Let's just make the function
support negative sizes to indicate that we're dumping this number of
bytes to the address instead of this number from the address. In this
case, in order to distinguish them, we're using a '<' instead of '[' to
start the series of bytes, indicating where the bytes expand and where
they stop. For example we can now see this:
0x6bd634 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95
0x7fa4ea2593a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
0x752132 <00 00 00 00 00 90 41 55]: htx_remove_blk+0x2/0x354
0x5b1a2c <4c 89 ef e8 04 07 1a 00]: main+0x10471c
0x5b5f05 <48 89 df e8 8b b8 ff ff]: main+0x108bf5
0x60b6f4 <89 ee 4c 89 e7 41 ff d0]: tcpcheck_eval_send+0x3b4/0x14b2
0x610ded <00 00 00 e8 53 a5 ff ff]: tcpcheck_main+0x7dd/0xd36
0x6c5ab4 <48 89 df e8 5c ab f4 ff]: wake_srv_chk+0xc4/0x3d7
0x6c5ddc <48 89 f7 e8 14 fc ff ff]: srv_chk_io_cb+0xc/0x13
These counters can have a noticeable cost on large machines, though not
dramatic. There's no single good choice to keep them enabled or disabled.
This commit adds multiple choices:
- DEBUG_COUNTERS set to 2 will automatically enable them by default, while
1 will disable them by default
- the global "debug.counters on/off" will allow to change the setting at
boot, regardless of DEBUG_COUNTERS as long as it was at least 1.
- the CLI "debug counters on/off" will also allow to change the value at
run time, allowing to observe a phenomenon while it's happening, or to
disable counters if it's suspected that their cost is too high
Finally, the "debug counters" command will append "(stopped)" at the end
of the CNT lines when these counters are stopped.
Not that the whole mechanism would easily support being extended to all
counter types by specifying the types to apply to, but it doesn't seem
useful at all and would require the user to also type "cnt" on debug
lines. This may easily be changed in the future if it's found relevant.
COUNT_IF() is convenient but can be heavy since some of them were found
to trigger often (roughly 1 counter per request on avg). This might even
have an impact on large setups due to the cost of a shared cache line
bouncing between multiple cores. For now there's no way to disable it,
so let's only enable it when DEBUG_COUNTERS is 1 or above. A future
change will make it configurable.
Till now the per-line glitches counters were only enabled with the
confusingly named DEBUG_GLITCHES (which would not turn glitches off
when disabled). Let's instead change it to DEBUG_COUNTERS and make sure
it's enabled by default (though it can still be disabled with
-DDEBUG_GLITCHES=0 just like for DEBUG_STRICT). It will later be
expanded to cover more counters.
It's easy to leave some trailing \n or even other characters that can
mangle the debug output. Let's verify at boot time that the debug sections
are clean by checking for chars 0x20 to 0x7e inclusive. This is very simple
to do and it managed to find another one in a multi-line message:
[WARNING] (23696) : Invalid character 0x0a at position 96 in description string at src/cli.c:2516 _send_status()
This way new offending code will be spotted before being committed.
These ones were added by mistake during the change of the cfgparse
mechanism in 3.1, but they're corrupting the output of "debug counters"
by leaving stray ']' on their own lines. We could possibly check them
all once at boot but it doens't seem worth it.
This should be backported to 3.1.