This patch contains this 4 new fetches and doc changes for the new fetches:
- req.ssl_cipherlist
- req.ssl_sigalgs
- req.ssl_keyshare_groups
- req.ssl_supported_groups
Towards:#2532
Now we can simply call is_sched_alive() on the local thread to verify
that the scheduler is still ticking instead of having to keep a copy of
the ctxsw and comparing it. It's cleaner, doesn't require to maintain
a local copy, doesn't rely on activity[] (whose purpose is mainly for
observation and debugging), and shows how this could be extended later
to cover other use cases. Practically speaking this doesn't change
anything however, the algorithm is still the same.
This verifies that the scheduler is still ticking without having to
access the activity[] array nor keeping local copies of the ctxsw
counter. It just tests and sets a flag that is reset after each
return from a ->process() function.
TH_FL_DUMPING_OTHERS was being used to try to perform exclusion between
threads running "show threads" and those producing warnings. Now that it
is much more cleanly handled, we don't need that type of protection
anymore, which was adding to the complexity of the solution. Let's just
get rid of it.
It has been noticed quite a few times during troubleshooting and even
testing that warnings can happen in avalanches from multiple threads
at the same time, and that their reporting it interleaved bacause the
output is produced in small chunks. Originally, this code inspired by
the panic code aimed at making sure to log whatever could be emitted
in case it would crash later. But this approach was wrong since writes
are atomic, and performing 5 writes in sequence in each dumping thread
also means that the outputs can be mixed up at 5 different locations
between multiple threads. The output of warnings is never very long,
and the stack-based buffer is 4kB so let's just concatenate everything
in the buffer and emit it at once using a single write(). Now there's
no longer this confusion on the output.
Since we no longer call it with a foreign thread, let's simplify its code
and get rid of the special cases that were relying on ha_thread_dump_fill()
and synchronization with a remote thread. We're not only dumping the
current thread so ha_thread_dump_one() is sufficient.
Warnings remain tricky to deal with, especially for other threads as
they require some inter-thread synchronization that doesn't cope very
well with other parallel activities such as "show threads" for example.
However there is nothing that forces us to handle them this way. The
panic for example is already handled by bouncing the WDT signal to the
faulty thread.
This commit rearranges the WDT handler to make a better used of this
existing signal bouncing feature of the WDT handler so that it's no
longer limited to panics but can also deal with warnings. In order not
to bounce on all wakeups, we only bounce when there is a suspicion,
that is, when the warning timer has been crossed. We'll let the target
thread verify the stuck flag and context switch count by itself to
decide whether or not to panic, warn, or just do nothing and update
the counters.
As a bonus, now all warning traces look the same regardless of the
reporting thread:
call trace(16):
| 0x6bc733 <01 00 00 e8 6d e6 de ff]: ha_dump_backtrace+0x73/0x309 > main-0x2570
| 0x6bd37a <00 00 00 e8 d6 fb ff ff]: ha_thread_dump_fill+0xda/0x104 > ha_thread_dump_one
| 0x6bd625 <00 00 00 e8 7b fc ff ff]: ha_stuck_warning+0xc5/0x19e > ha_thread_dump_fill
| 0x7b2b60 <64 8b 3b e8 00 aa f0 ff]: wdt_handler+0x1f0/0x212 > ha_stuck_warning
| 0x7fd7e2cef3a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
| 0x7ffc6af9e634 <85 a6 00 00 00 0f 01 f9]: linux-vdso:__vdso_gettimeofday+0x34/0x2b0
| 0x6bad74 <7c 24 10 e8 9c 01 df ff]: sc_conn_io_cb+0x9fa4 > main-0x2400
| 0x67c457 <89 f2 4c 89 e6 41 ff d0]: main+0x1cf147
| 0x67d401 <48 89 df e8 8f ed ff ff]: cli_io_handler+0x191/0xb38 > main+0x1cee80
| 0x6dd605 <40 48 8b 45 60 ff 50 18]: task_process_applet+0x275/0xce9
The goal is to let the caller deal with the pointer so that the function
only has to fill that buffer without worrying about locking. This way,
synchronous dumps from "show threads" are produced and emitted directly
without causing undesired locking of the buffer nor risking causing
confusion about thread_dump_buffer containing bits from an interrupted
dump in progress.
It's only the caller that's responsible for notifying the requester of
the end of the dump by setting bit 0 of the pointer if needed (i.e. it's
only done in the debug handler).
This function was initially designed to dump any threadd into the presented
buffer, but the way it currently works is that it's always called for the
current thread, and uses the distinction between coming from a sighandler
or being called directly to detect which thread is the caller.
Let's simplify all this by replacing thr with tid everywhere, and using
the thread-local pointers where it makes sense (e.g. th_ctx, th_ctx etc).
The confusing "from_signal" argument is now replaced with "is_caller"
which clearly states whether or not the caller declares being the one
asking for the dump (the logic is inverted, but there are only two call
places with a constant).
We don't need to copy the old dump pointer to the thread_dump_pointer
area anymore to indicate a dump is collected. It used to be done as an
artificial way to keep the pointer for the post-mortem analysis but
since we now have this pointer stored separately, that's no longer
needed and it simplifies the mechanim to reset it.
Instead of using the thread dump buffer for post-mortem analysis, we'll
keep a copy of the assigned pointer whenever it's used, even for warnings
or "show threads". This will offer more opportunities to figure from a
core what happened, and will give us more freedom regarding the value of
the thread_dump_buffer itself. For example, even at the end of the dump
when the pointer is reset, the last used buffer is now preserved.
If a thread is dumping itself (warning, show thread etc) and another one
wants to dump the state of all threads (e.g. panic), it may interrupt the
first one during backtrace() and re-enter it from the signal handler,
possibly triggering a deadlock in the underlying libc. Let's postpone
the debug signal delivery at this point until the call ends in order to
avoid this.
If a thread is currently resolving a symbol while another thread triggers
a thread dump, the current thread may enter the debug handler and call
resolve_sym_addr() again, possibly deadlocking if the underlying libc
uses locking. Let's postpone the debug signal delivery in this area
during the call. This will slow the resolution a little bit but we don't
care, it's not supposed to happen often and it must remain rock-solid.
This is an extension of eb41d768f ("MINOR: tools: use only opportunistic
symbols resolution"). It also makes sure we're not calling dladddr() in
parallel to dladdr_and_size(), as a preventive measure against some
potential deadlocks in the inner layers of the libc.
A few functions are used when debugging debug signals and watchdog, but
being static, they're not resolved and are hard to spot in dumps, and
they appear as any random other function plus an offset. Let's just not
mark them static anymore, it only hurts:
- cli_io_handler_show_threads()
- debug_run_cli_deadlock()
- debug_parse_cli_loop()
- debug_parse_cli_panic()
In the following trace trying to abuse the watchdog from the CLI's
"debug dev loop" command running in parallel to "show threads" loops,
it's clear that some re-entrance may happen in ha_thread_dump_fill().
A first minimal fix consists in using a test-and-set on the flag
indicating that the function is currently dumping threads, so that
the one from the signal just returns. However the caller should be
made more reliable to serialize all of this, that's for future
work.
Here's an example capture of 7 threads stuck waiting for each other:
(gdb) bt
#0 0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
#1 0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
#2 0x00000000005ba4f5 in ha_thread_dump_fill (thr=2, buf=0x7ffdd8e08ab0) at src/debug.c:402
#3 ha_thread_dump_fill (buf=0x7ffdd8e08ab0, thr=<optimized out>) at src/debug.c:384
#4 0x00000000005baac4 in ha_stuck_warning (thr=thr@entry=2) at src/debug.c:840
#5 0x00000000006a360d in wdt_handler (sig=<optimized out>, si=<optimized out>, arg=<optimized out>) at src/wdt.c:156
#6 <signal handler called>
#7 0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
#8 0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
#9 0x00000000005ba4c2 in ha_thread_dump_fill (thr=2, buf=0x7fe78f2d6420) at src/debug.c:426
#10 ha_thread_dump_fill (buf=0x7fe78f2d6420, thr=2) at src/debug.c:384
#11 0x00000000005ba7c6 in cli_io_handler_show_threads (appctx=0x2a89ab0) at src/debug.c:548
#12 0x000000000057ea43 in cli_io_handler (appctx=0x2a89ab0) at src/cli.c:1176
#13 0x00000000005d7885 in task_process_applet (t=0x2a82730, context=0x2a89ab0, state=<optimized out>) at src/applet.c:920
#14 0x0000000000659002 in run_tasks_from_lists (budgets=budgets@entry=0x7ffdd8e0a5c0) at src/task.c:644
#15 0x0000000000659bd7 in process_runnable_tasks () at src/task.c:886
#16 0x00000000005cdcc9 in run_poll_loop () at src/haproxy.c:2858
#17 0x00000000005ce457 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3075
#18 0x0000000000430628 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3665
As seen in issue #2860, there are some situations where a watchdog could
trigger during the debug signal handler, and where similarly the debug
signal handler may trigger during the wdt handler. This is really bad
because it could trigger some deadlocks inside inner libc code such as
dladdr() or backtrace() since the code will not protect against re-
entrance but only against concurrent accesses.
A first attempt was made using ha_sigmask() but that's not always very
convenient because the second handler is called immediately after
unblocking the signal and before returning, leaving signal cascades in
backtrace. Instead, let's mark which signals to block at registration
time. Here we're blocking wdt/dbg for both signals, and optionally
SIGRTMAX if DEBUG_DEV is used as that one may also be used in this case.
This should be backported at least to 3.1.
The function must make sure to return NULL for foreign threads and
the local buffer for the current thread in this case, otherwise panics
(and sometimes even warnings) will segfault when USE_THREAD_DUMP is
disabled. Let's slightly re-arrange the function to reduce the #if/else
since we have to specifically handle the case of !USE_THREAD_DUMP anyway.
This needs to be backported wherever b8adef065d ("MEDIUM: debug: on
panic, make the target thread automatically allocate its buf") was
backported (at least 2.8).
Some signal handlers rely on these to decide about the level of detail to
provide in dumps, so let's properly fill the info about entering/leaving
idle. Note that for consistency with other tests we're using bitops with
t->ltid_bit, while we could simply assign 0/1 to the fields. But it makes
the code more readable and the whole difference is only 88 bytes on a 3MB
executable.
This bug is not important, and while older versions are likely affected
as well, it's not worth taking the risk to backport this in case it would
wake up an obscure bug.
The reason musl builds was not producing exploitable backtraces was
that the toolchain used appears to automatically omit the frame pointer
at -O2 but leaves it at -O0. This patch just makes sure to always append
-fno-omit-frame-pointer to the BACKTRACE cflags and enables the option
with musl where it now works. This will allow us to finally get
exploitable traces from docker images where core dumps are not always
available.
This commit extends MUX H2 connection reversal step to properly take
into account the new idle-ping feature. It first ensures that h2c task
is properly instantiated/freed depending now on both timers and
idle-ping configuration. Also, h2c_update_timeout() is now called
instead of manually requeuing the task, which ensures the proper timer
value is selected depending on the new connection side.
If idle-ping is activated and h2c task is expired due to missing PING
ACK, consider that the peer is away and the connection can be closed
immediately. GOAWAY emission is thus skipped.
A new test is necessary in h2c_update_timeout() when PING ACK is
currently expected, but the next timer expiration selected is not
idle-ping. This may happen if http-keep-alive/http-request timers are
selected first. In this case, H2_CF_IDL_PING_SENT flag is resetted. This
is necessary to not prevent GOAWAY emission on expiration.
This commit is the counterpart of the previous one, adapted on the
frontend side. "idle-ping" is added as keyword to bind lines, to be able
to refresh client timeout of idle frontend connections.
H2 MUX behavior remains similar as the previous patch. The only
significant change is in h2c_update_timeout(), as idle-ping is now taken
into account also for frontend connection. The calculated value is
compared with http-request/http-keep-alive timeout value. The shorter
delay is then used as expired date. As hr/ka timeout are based on
idle_start, this allows to run them in parallel with an idle-ping timer.
This commit implements support for idle-ping on the backend side. First,
a new server keyword "idle-ping" is defined in configuration parsing. It
is used to set the corresponding new server member.
The second part of this commit implements idle-ping support on H2 MUX. A
new inlined function conn_idle_ping() is defined to access connection
idle-ping value. Two new connection flags are defined H2_CF_IDL_PING and
H2_CF_IDL_PING_SENT. The first one is set for idle connections via
h2c_update_timeout().
On h2_timeout_task() handler, if first flag is set, instead of releasing
the connection as before, the second flag is set and tasklet is
scheduled. As both flags are now set, h2_process_mux() will proceed to
PING emission. The timer has also been rearmed to the idle-ping value.
If a PING ACK is received before next timeout, connection timer is
refreshed. Else, the connection is released, as with timer expiration.
Also of importance, special care is needed when a backend connection is
going to idle. In this case, idle-ping timer must be rearmed. Thus a new
invokation of h2c_update_timeout() is performed on h2_detach().
Adapt the already existing function h2c_ack_ping(). The objective is to
be able to emit a PING request. First, it is renamed as h2c_send_ping().
A new boolean argument <ack> is used to emit either a PING request or
ack.
Reorganize code for timeout calculation in case the connection is idle.
The objective is to better reflect the relations between each timeouts
as follow :
* if GOAWAY already emitted, use shut-timeout, or if unset fallback to
client/server one. However, an already set timeout is never erased.
* else, for frontend connection, http-request or keep-alive timeout is
applied depending on the current demux state. If the selected value is
unset, fallback to client timeout
* for backend connection, no timeout is set to perform http-reuse
This commit is pure refactoring, so no functional change should occur.
Since the following commit, MUX H2 timeout function has been slightly
exetended.
d38d8c6ccb
BUG/MEDIUM: mux-h2: make sure control frames do not refresh the idle timeout
A side-effect of this patch is that now backend idle connection expire
timer is not reset if already defined. This means that if a timer was
registered prior to the connection transition to idle, the connection
would be destroyed on its timeout. If this happens for enough
connection, this may have an impact on the reuse rate.
In practice, this case should be rare, as h2c timer is set to
TICK_ETERNITY while there is active streams. The timer is not refreshed
most of the time before going the transition to idle, so the connection
won't be deleted on expiration.
The only case where it could occur is if there is still pending data
blocked on emission on stream detach. Here, timeout server is applied on
the connection. When the emission completes, the connection goes to
idle, but the timer will still armed, and thus will be triggered on the
idle connection.
To prevent this, explicitely reset h2c timer to TICK_ETERNITY for idle
backend connection via h2c_update_timeout().
This patch is explicitely not scheduled for backport for now, as it is
difficult to estimate the real impact of the previous code state.
GOAWAY emission should not be emitted before preface. Thus, max_id field
from h2c acting as a server is initialized to -1, which prevents its
emission until preface is received from the peer. If acting as a client,
max_id is initialized to a valid value on the first h2s emission.
This causes an issue with reverse HTTP on the active side. First, it
starts as a client, so the peer does not emit a preface but instead a
simple SETTINGS frame. As role are switched, max_id is initialized much
later when the first h2s response is emitted. Thus, if the connection
must be terminated before any stream transfer, GOAWAY cannot be emitted.
To fix this, ensure max_id is initialized to 0 on h2_conn_reverse() for
active connect side. Thus, a GOAWAY indicating that no stream has been
handled can be generated.
Note that passive connect side is not impacted, as it max_id is
initialized thanks to preface reception.
This should be backported up to 2.9.
Active connect on reverse http relies on connect timeout to detect
connection failure. Thus, if this timeout was unset, connection failure
may not be properly detected.
Fix this by fallback on hardcoded value of 1s for connect if timeout is
unset in the configuration. This is considered as a minor bug, as
haproxy advises against running with timeout unset.
This must be backported up to 2.9.
While reviewing HTTP/2 MUX timeout, it seems there is a possibility that
MUX task is requeued via h2c_update_timeout() with an already expired
date. This can happens with idle connections on two cases :
* first with shut timeout, as timer is not refreshed if already set
* second with http-request and keep-alive timers, which are based on
idle_start
Queuing an already expired task is an undefined behavior. Fix this by
using task_wakeup() instead of task_queue() at the end of
h2c_update_timeout() if such case occurs.
This should be backported up to 2.6.
Jacques Heunis from bloomberg reported on the mailing list [1] that
with haproxy 2.8 up to master, yielding from a Lua tcp service while
data was still buffered inside haproxy would eat some data which was
definitely lost.
He provided the reproducer below which turned out to be really helpful:
global
log stdout format raw local0 info
lua-load haproxy_yieldtest.lua
defaults
log global
timeout connect 10s
timeout client 1m
timeout server 1m
listen echo
bind *:9090
mode tcp
tcp-request content use-service lua.print_input
haproxy_yieldtest.lua:
core.register_service("print_input", "tcp", function(applet)
core.Info("Start printing input...")
while true do
local inputs = applet:getline()
if inputs == nil or string.len(inputs) == 0 then
core.Info("closing input connection")
return
end
core.Info("Received line: "..inputs)
core.yield()
end
end)
And the script below:
#!/usr/bin/bash
for i in $(seq 1 9999); do
for j in $(seq 1 50); do
echo "${i}_foo_${j}"
done
sleep 2
done
Using it like this:
./test_seq.sh | netcat localhost 9090
We can clearly see the missing data for every "foo" burst (every 2
seconds), as they are holes in the numbering.
Thanks to the reproducer, it was quickly found that only versions
>= 2.8 were affected, and that in fact this regression was introduced
by commit 31572229e ("MEDIUM: hlua/applet: Use the sedesc to report and
detect end of processing")
In fact in 31572229e 2 mistakes were made during the refaco.
Indeed, both in hlua_applet_tcp_fct() (which is involved in the reproducer
above) and hlua_applet_http_fct(), the request (buffer) is now
systematically consumed when returning from the function, which wasn't the
case prior to this commit: when HLUA_E_AGAIN is returned, it means a
yield was requested and that the processing is not done yet, thus we
should not consume any data, like we did prior to the refacto.
Big thanks to Jacques who did a great job reproducing and reporting this
issue on the mailing list.
[1]: https://www.mail-archive.com/haproxy@formilux.org/msg45778.html
It should be backported up to 2.8 with commit 31572229e
Change the representation of the start-line URI when parsing a HTTP/3
request into HTX. Adopt the same conversion as HTTP/2. If :authority
header is used (default case), the URI is encoded using absolute-form,
with scheme, host and path concatenated. If only a plain host header is
used instead, fallback to the origin form.
This commit may cause some configuration to be broken if parsing is
performed on the URI. Indeed, now most of the HTTP/3 requests will be
represented with an absolute-form URI at the stream layer.
Note that prior to this commit a check was performed on the path used as
URI to ensure that it did not contain any invalid characters. Now, this
is directly performed on the URI itself, which may include the path.
This must not be backported.
Ensure that the HTX start-line generated after parsing an HTTP/3 request
does not contain any invalid character, i.e. control or whitespace
characters.
Note that for now path is used directly as URI. Thus, the check is
performed directly over it. A patch will change this to generate an
absolute-form URI in most cases, but it won't be backported to avoid
configuration breaking in stable versions.
This must be backported up to 2.6.
RFC 9114 specifies some requirements for :path pseudo-header when using
http or https scheme. This commit enforces this by rejecting a request
if needed. Thus, path cannot be empty, and it must either start with a
'/' character or contains only '*'.
This must be backported up to 2.6.
As specified in RFC 9114, connection headers required special care in
HTTP/3. When a request is received with connection headers, the stream
is immediately closed. Conversely, when translating the response from
HTX, such headers are not encoded but silently ignored.
However, "upgrade" was not listed in connection headers. This commit
fixes this by adding a check on it both on request parsing and response
encoding.
This must be backported up to 2.6.
This commit does a similar job than the previous one, but it acts now on
the response path. Any leading or trailing whitespaces characters from a
HTX block header value are removed, prior to the header encoding via
QPACK.
This must be backported up to 2.6.
Remove any leading and trailing whitespace from header field values
prior to inserting a new HTX header block. This is done when parsing a
HEADERS frame, both as headers and trailers.
This must be backported up to 2.6.
Free the acme_ctx task context once the task is done.
It frees everything but the config and the httpclient,
everything else is free.
The ckch_store is freed in case of error, but when the task is
successful, the ptr is set to NULL to prevent the free once inserted in
the tree.
This patch registers the task in the ckch_store so we don't run 2 tasks
at the same time for a given certificate.
Move the task creation under the lock and check if there was already a
task under the lock.
Exponential backoff values was multiplied by 3000 instead of 3 with a
second to ms conversion. Leading to a 9000000ms value at the 2nd
attempt.
Fix the issue by setting the value in seconds and converting the value
in tick_add().
No backport needed.
When receiving the final certificate, it need to be loaded by
ssl_sock_load_pem_into_ckch(). However this function will remove any
existing private key in the struct ckch_store.
In order to fix the issue, the ptr to the key is swapped with a NULL
ptr, and restored once the new certificate is commited.
However there is a discrepancy when there is an error in
ssl_sock_load_pem_into_ckch() fails and the pointer is lost.
This patch fixes the issue by restoring the pointer in the error path.
This must fix issue #2933.
Add a 1MB ring buffer called "dpapi" for communication with the
dataplane API. It would first be used to transmit ACME informations to
the dataplane API but could be used for more.
A server abort must be handled by the request analyzers only when the
response forwarding was already started. Otherwise, it it the responsability
of the response analyzer to detect this event. L7-retires and conditions to
decide to silently close a client conneciotn are handled by this analyzer.
Because a reused server connections closed too early could be detected at
the wrong place, it was possible to get a 502/SH instead of a silent close,
preventing the client to safely retries its request.
Thanks to this patch, we are able to silently close the client connection in
this case and eventually to perform a L7 retry.
This patch must be backported as far as 2.8.