These counters can have a noticeable cost on large machines, though not
dramatic. There's no single good choice to keep them enabled or disabled.
This commit adds multiple choices:
- DEBUG_COUNTERS set to 2 will automatically enable them by default, while
1 will disable them by default
- the global "debug.counters on/off" will allow to change the setting at
boot, regardless of DEBUG_COUNTERS as long as it was at least 1.
- the CLI "debug counters on/off" will also allow to change the value at
run time, allowing to observe a phenomenon while it's happening, or to
disable counters if it's suspected that their cost is too high
Finally, the "debug counters" command will append "(stopped)" at the end
of the CNT lines when these counters are stopped.
Not that the whole mechanism would easily support being extended to all
counter types by specifying the types to apply to, but it doesn't seem
useful at all and would require the user to also type "cnt" on debug
lines. This may easily be changed in the future if it's found relevant.
Till now the per-line glitches counters were only enabled with the
confusingly named DEBUG_GLITCHES (which would not turn glitches off
when disabled). Let's instead change it to DEBUG_COUNTERS and make sure
it's enabled by default (though it can still be disabled with
-DDEBUG_GLITCHES=0 just like for DEBUG_STRICT). It will later be
expanded to cover more counters.
It's easy to leave some trailing \n or even other characters that can
mangle the debug output. Let's verify at boot time that the debug sections
are clean by checking for chars 0x20 to 0x7e inclusive. This is very simple
to do and it managed to find another one in a multi-line message:
[WARNING] (23696) : Invalid character 0x0a at position 96 in description string at src/cli.c:2516 _send_status()
This way new offending code will be spotted before being committed.
Doing unit tests with haproxy was always a bit difficult, some of the
function you want to test would depend on the buffer or trash buffer
initialisation of HAProxy, so building a separate main() for them is
quite hard.
This patch adds a way to register a function that can be called with the
"-U" parameter on the command line, will be executed just after
step_init_1() and will exit the process with its return value as an exit
code.
When using the -U option, every keywords after this option is passed to
the callback and could be used as a parameter, letting the capability to
handle complex arguments if required by the test.
HAProxy need to be built with DEBUG_UNIT to activate this feature.
While signals are not recursive, one signal (e.g. wdt) may interrupt
another one (e.g. debug). The problem this causes is that when leaving
the inner handler, it removes the outer's flag, hence the protection
that comes with it. Let's just have 3 distinct flags for regular signals,
debug signal and watchdog signal. We add a 4th definition which is an
aggregate of the 3 to ease testing.
Signal handlers must absolutely not change anything, but some long and
complex call chains may look innocuous at first glance, yet result in
some subtle write accesses (e.g. pools) that can conflict with a running
thread being interrupted.
Let's add a new thread flag TH_FL_IN_SIG_HANDLER that is only set when
entering a signal handler and cleared when leaving them. Note, we're
speaking about real signal handlers (synchronous ones), not deferred
ones. This will allow some sensitive call places to act differently
when detecting such a condition, and possibly even to place a few new
BUG_ON().
The rework of the thread dumping mechanism in 2.8 with commit 9a6ecbd590
("MEDIUM: debug: simplify the thread dump mechanism") opened a small
race, which is that a thread in the process of dumping other ones may
block the other one from panicing while it's looping at the end of
ha_thread_dump_fill(), or any other sequence involving the currently
dumped one.
This was emphasized in 3.1 with commit 148eb5875f ("DEBUG: wdt: better
detect apparently locked up threads and warn about them") that allowed
to emit warnings about long-stuck threads, because in this case, what
happens is that sometimes a thread starts to emit a warning (or a set
of warnings), and while the warning is being awaited for, a panic
finally happens and interrupts either the dumping thread, which never
finishes and waits for the target's pointer to become NULL which will
never happen since it was supposed to do it itself, or the currently
dumped thread which could wait for the dumping thread to become ready
while this one has not released the former.
In order to address this, first we now make sure never to dump a thread
that is already in the process of dumping another one. We're adding a
new thread flag to know this situation, that is set in ha_thread_dump_fill()
and cleared in ha_thread_dump_done(). And similarly, we don't trigger
the watchdog on a thread waiting for another one to finish its dump,
as it's likely a case of warning (and maybe even a panic) that makes
them wait for each other and we don't want such cases to be reentrant.
Finally, we check in the main polling loop that the flag never accidentally
leaked (e.g. wrong flag manipulation) as this would be difficult to spot
with bad consequences.
This should be backported at least to 2.8, and should resolve github
issue #2860. Thanks to Chris Staite for the very informative backtrace
that exhibited the problem.
When "debug dev sched" is used to pop up background tasks, these tasks
are never stopped, so we must be careful to stop them when the stopping
flag is set, otherwise they can prevent the process from stopping when
sufficiently numerous (tests went as far as 100 million tasks, leading
the run queue never being completely purged in one poll round).
No backport is needed since this is only used when debugging and tuning
the scheduler.
The TID passed to "debug dev sched" is used to pin the task to a given
thread. A negative value normally means the task is unpinned and goes
to the shared wait queue and run queue. However due to the type of the
variable, negative values were mapped as highly positive values and were
set to the current thread. Let's add the proper cast to fix this.
No backport is needed since this is only used to experiment with the
scheduler and measure its performance.
Let's use errname, introduced in the previous commit in the output of
"show dev". This output is destined to engineers. So, no need to provide a
long descriptions of errnos given by strerror.
As can be seen here, the build fails on m68k since commit 665dde648
("MINOR: debug: use LIM2A to show limits") in 3.1:
https://github.com/haproxy/haproxy/actions/runs/12440234399/job/34735360177
The reason is the comparison between a ulong limit and RLIM_INFINITY.
Indeed, on m68k, rlim_t is an unsigned long long. Let's just change
the function's input type to take an rlim_t instead. This also allows
to get rid of the casts in the call place.
This can be backported to 3.1 though it's not important given the low
prevalence of this platform for such use cases.
If neither DEBUG_GLITCHES nor DEBUG_STRICT is set, we end up with
no dbg_cnt section, resulting in debug_parse_cli_counters not
building due to __stop_dbg_cnt and __start_dbg_cnt not being defined.
Let's just condition the end of the function to these conditions.
An alternate approach (less elegant) is to always declare a dummy
entry of type DBG_COUNTER_TYPES in debug.c.
This must be backported to 3.1 since it was brought with glitches.
Since 2.0 with commit e6a02fa65a ("MINOR: threads: add a "stuck" flag
to the thread_info struct"), the TH_FL_STUCK flag was set by the
debugger to flag that a thread was stuck and report it in the output.
However, two commits later (2bfefdbaef "MAJOR: watchdog: implement a
thread lockup detection mechanism"), this flag was used to detect that
a thread had already been reported as stuck. The problem is that it
seldom happens that a "show threads" command instantly crashes because
it calls debug_handler(), which sets the flag, and if the watchdog timer
was about to trigger before going back to the scheduler, the watchdog
believes that the thread has been stuck for a while and will kill the
process.
The issue was magnified in 3.1 with the lower-delay warning, because
it's possible for a thread to die on the next wakeup after the first
warning (which calls debug_handler() hence sets the STUCK flag).
One good approach would have been to use two distinct flags, one for
"stuck" as reported by the debug handler, and one for "stuck" as seen
by the watchdog. However, one could also argue that since the second
commit, given that the wdt monitors the threads, there's no point any
more for the debug handler to set the flag itself. Removing this code
means that two consecutive "show threads" will not report "stuck" until
the watchdog sets it, which aligns better with expectations.
This can be backported to all stable releases. This code has changed a
bit over time, the "if" block and the harmless variables just need to
be removed.
"debug dev" commands are not meant to be used by end-users, and are
purposely not documented. Yet due to their usefulness in troubleshooting
sessions, users are increasingly invited by developers to use some of
them.
"debug dev counters" is one of them. Better move it to "debug counters"
and document it so that users can check them even if the output can look
cryptic at times. This, combined with DEBUG_GLITCHES, can be convenient
to observe suspcious activity. The doc however precises that the format
may change between versions and that new entries/types might appear
within a stable branch.
Using "debug task", it's possible to change a task's expiration, but
we must be careful not to set it to TICK_ETERNITY. Let's use tick_add()
instead. The risk is basically nul since it's a debugging command, so
no backport is needed.
COUNT_GLITCH() will implement an unconditional counter on its declaration
line when DEBUG_GLITCHES is set, and do nothing otherwise. The output will
be reported as "GLT" and can be filtered as "glt" on the CLI. The purpose
is to help figure what's happening if some glitches counters start going
through the roof. The macro supports an optional string argument to
describe the cause of the glitch (e.g. "truncated header"), which is then
reported in the dump.
For now this is conditioned by DEBUG_GLITCHES but if it turns out to be
light enough, maybe we'll keep it enabled full time. In this case it
might have to be moved away from debug dev, or at least documented (or
done as debug counters maybe so that dev can remain undocumented and
updatable within a branch?).
In order to count new event types, we'll need to support empty conditions
so that we don't have to fake if (1) that would pollute the output. This
change checks if #cond is an empty string before concatenating it with
the optional var args, and avoids dumping the colon on the dump if the
whole description is empty.
At the end of the too long processing warning added by commit 0950778b3a
("MINOR: debug: add a function to dump a stuck thread"), there can be some
optional notes about lua and memory trimming. However it's a bit awkward
that they appear after the "trying to recover now" message. Let's just move
that message after the notes.
In order to help users detect when threads are behaving abnormally, let's
try to emit a warning when one is no longer making any progress. This will
allow to catch faulty situations more accurately, instead of occasionally
triggering just after the long task. It will also let users know that there
is something wrong with their configuration, and inspect the call trace to
figure whether they're using excessively long rules or Lua for example (the
usual warnings about lua-load vs lua-load-per-thread are still reported).
The warning will only be emitted for threads not yet marked as stuck so
as not to interfere with panic dumps and avoid sending a warning just
before a panic. A tainted flag is set when this happens however (0x2000).
There's currently no way to just emit a warning informing that a thread
is stuck without crashing. This is a problem because sometimes users
would benefit from this info to clean up their configuration (e.g. abuse
of map_regm, lua-load etc).
This commit adds a new function ha_stuck_warning() that will emit a
warning indicating that the designated thread has been stuck for XX
milliseconds, with a number of streams blocked, and will make that
thread dump its own state. The warning will then be sent to stderr,
along with some reminders about the impacts of such situations to
encourage users to fix their configuration.
In order not to disrupt operations, a local 4kB buffer is allocated
in the stack. This should be quite sufficient.
For now the function is not used.
"debug dev close <fd>" currently closes that FD using fd_delete() after
checking that it's known from the fdtab. Sometimes we also want to just
perform a pure close() of FDs not in the fdtab (pollers, etc) in order
to provoke certain error cases. The optional "hard" argument to the
command will make it use a plain close() instead of fd_delete() and skip
the fd owner check. The main visible effect when closing a traffic socket
with it is that instead of dying from a double fd_delete() by seeing that
fd.owner is already 0, it will die during the next fd_insert() seeing that
fd.owner was not 0.
That one is huge and unneeded since we now have the pointer to the
whole thread_info[] array, which does contain the freshest version
of these info and many more. Let's just get rid of it entirely.
The pointer to struct global is also an important element to have in
post_mortem given that it's used a lot to take decisions in the code.
Let's just add it. It's worth noting that we could get rid of argc/argv
at this point since they're also present in the global struct, but they
don't cost much there anyway.
When using 'debug dev fd', the output of laddr and raddr can contain
some garbage.
This patch replaces any control or non-printable character by a '.'.
Historically for size limitation reasons, we would only dump the
backtrace of stuck threads. The problem is that when triggering
a panic or other reasons, we have no backtrace, which effectively
limits it to the watchdog timer. It's also visible in "show threads"
which used to report backtraces for all threads in 2.4 and displays
none nowadays, making its use much more limited.
A first approach could be to just dump the thread that triggers the
panic (in addition to stuck threads). But that remains quite limited
since "show threads" would still display nothing. This patch takes a
better approach consisting in dumping all non-idle threads. This way
the output is less polluted that with the older approach (no need to
dump all those waiting in the poller), and all active threads are
visible, in panics as well as in "show threads". As such, the CLI
command "debug dev panic" now dmups backtraces again. This is already
a benefit which will ease testing of various locations against the
ability to resolve useful symbols.
Dealing with a core and a stripped executable is a pain when it comes
to finding pools, proxies or thread contexts. Let's put a pointer to
these heads and arrays in the post_mortem struct for easier location.
Other critical lists like this could possibly benefit from being added
later.
Here we now have:
- tgroup_info
- thread_info
- tgroup_ctx
- thread_ctx
- pools
- proxies
Example:
$ objdump -h haproxy|grep post
34 _post_mortem 000014b0 0000000000cfd400 0000000000cfd400 008fc400 2**8
(gdb) set $pm=(struct post_mortem*)0x0000000000cfd400
(gdb) p $pm->tgroup_ctx[0]
$8 = {
threads_harmless = 254,
threads_idle = 254,
stopping_threads = 0,
timers = {
b = {0x0, 0x0}
},
niced_tasks = 0,
__pad = 0xf5662c <ha_tgroup_ctx+44> "",
__end = 0xf56640 <ha_tgroup_ctx+64> ""
}
(gdb) info thr
Id Target Id Frame
* 1 Thread 0x7f9e7706a440 (LWP 21169) 0x00007f9e76a9c868 in raise () from /lib64/libc.so.6
2 Thread 0x7f9e76a60640 (LWP 21175) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
3 Thread 0x7f9e7613d640 (LWP 21176) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
4 Thread 0x7f9e7493a640 (LWP 21179) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
5 Thread 0x7f9e7593c640 (LWP 21177) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
6 Thread 0x7f9e7513b640 (LWP 21178) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
7 Thread 0x7f9e6ffff640 (LWP 21180) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
8 Thread 0x7f9e6f7fe640 (LWP 21181) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
(gdb) p/x $pm->thread_info[0].pth_id
$12 = 0x7f9e7706a440
(gdb) p/x $pm->thread_info[1].pth_id
$13 = 0x7f9e76a60640
(gdb) set $px = *$pm->proxies
while ($px != 0)
printf "%#lx %s served=%u\n", $px, $px->id, $px->served
set $px = ($px)->next
end
0x125eda0 GLOBAL served=0
0x12645b0 stats served=0
0x1266940 comp served=0
0x1268e10 comp_bck served=0
0x1260cf0 <OCSP-UPDATE> served=0
0x12714c0 <HTTPCLIENT> served=0
Placing it in its own section will ease its finding, particularly in
gdb which is too dumb to find anything in memory. Now it will be
sufficient to issue this:
$ gdb -ex "info files" -ex "quit" ./haproxy core 2>/dev/null |grep _post_mortem
0x0000000000cfd300 - 0x0000000000cfe780 is _post_mortem
or this:
$ objdump -h haproxy|grep post
34 _post_mortem 00001480 0000000000cfd300 0000000000cfd300 008fc300 2**8
to spot the symbol's address. Then it can be read this way:
(gdb) p *(struct post_mortem *)0x0000000000cfd300
In order to ease finding of the post_mortem struct in core dumps, let's
make it start with a recognizable pattern of exactly 32 chars (to
preserve alignment):
"POST-MORTEM STARTS HERE+7654321\0"
It can then be found like this from gdb:
(gdb) find 0x000000012345678, 0x0000000100000000, 'P','O','S','T','-','M','O','R','T','E','M'
0xcfd300 <post_mortem>
1 pattern found.
Or easier with any other more practical tool (who as ever used "find" in
gdb, given that it cannot iterate over maps and is 100% useless?).
Commit 091de0f9b2 ("MINOR: debug: slightly change the thread_dump_pointer
signification") caused the following warning to be emitted when threads
are disabled:
src/debug.c: In function 'ha_thread_dump_one':
src/debug.c:359:9: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
Let's just disguise the pointer to silence it. It should be backported
where the patch above was backported, since it was part of a series aiming
at making thread dumps more exploitable from core dumps.
Issuing "debug dev counters" on the CLI will now scan all existing
counters, and report their count, type, location, function name, the
condition and an optional comment passed to the macro.
The command takes a number of arguments:
- "show": this is the default, it will just list the counters
- "reset": will reset the matching counters instead of listing them
- "all": by default, only non-zero counters are listed. With "all",
they are all listed
- "bug": restrict the reset or dump to counters of type "BUG" (BUG_ON usually)
- "chk": restrict the reset or dump to counters of type "CHK" (CHECK_IF)
- "cnt": restrict the reset or dump to counters of type "CNT" (COUNT_IF)
The types may be cumulated, and the option entered in any order. Here's
an example of the output of "debug dev counters show all bug":
Count Type Location function(): "condition" [comment]
0 BUG ring.h:114 ring_dup(): "max > ring_size(dst)"
0 BUG vecpair.h:223 vp_getblk_ofs(): "ofs >= v1->len + v2->len"
0 BUG buf.h:395 b_add(): "b->data + count > b->size"
0 BUG buf.h:106 b_room(): "b->data > b->size"
0 BUG task.h:328 _task_queue(): "(ulong)caller & 1"
0 BUG task.h:324 _task_queue(): "task->tid != tid"
0 BUG task.h:313 _task_queue(): "(ulong)caller & 1"
(...)
This is expected to be convenient combined with the use and abuse of
COUNT_IF() at select locations.
One main problem with panic dumps is that they're filling the dumping
thread's trash, and that the global thread_dump_buffer is too small to
catch enough of them.
Here we're proceeding differently. When dumping threads for a panic, we're
passing the magic value 0x2 as the buffer, and it will instruct the target
thread to allocate its own buffer using get_trash_chunk() (which is signal
safe), so that each thread dumps into its own buffer. Then the thread will
wait for the buffer to be consumed, and will assign its own thread_dump_buffer
to it. This way we can simply dump all threads' buffers from gdb like this:
(gdb) set $t=0
while ($t < global.nbthread)
printf "%s\n", ha_thread_ctx[$t].thread_dump_buffer.area
set $t=$t+1
end
For now we make it wait forever since it's only called on panic and we
want to make sure the thread doesn't leave and continues to use that trash
buffer or do other nasty stuff. That way the dumping thread will make all
of them die.
This would be useful to backport to the most recent branches to help
troubleshooting. It backports well to 2.9, except for some trivial
context in tinfo-t.h for an updated comment. 2.8 and older would also
require TAINTED_PANIC. The following previous patches are required:
MINOR: debug: make mark_tainted() return the previous value
MINOR: chunk: drop the global thread_dump_buffer
MINOR: debug: split ha_thread_dump() in two parts
MINOR: debug: slightly change the thread_dump_pointer signification
MINOR: debug: make ha_thread_dump_done() take the pointer to be used
MINOR: debug: replace ha_thread_dump() with its two components
At the few places we were calling ha_thread_dump(), now we're
calling separately ha_thread_dump_fill() and ha_thread_dump_done()
once the data are consumed.
This will allow the caller to decide whether to definitely clear the
pointer and release the thread, or to leave it unlocked so that it's
easy to analyse from the struct (the goal will be to use that in panic()
so that cores are easy to analyse).
Now the thread_dump_pointer is returned ORed with 1 once done, or NULL
when cancelled (for now noone cancels). The goal will be to permit
the callee to provide its own pointer.
The ha_thread_dump_fill() function now returns the buffer pointer that
was used (without OR 1) or NULL, for ease of use from the caller.
We want to have a function to trigger the dump and another one to wait
for it to be completed. This will be important to permit panic dumps to
be done on local threads. For now this does not change anything, as the
function still calls the two new functions one after the other.
This variable is not very useful and is confusing anyway. It was mostly
used to detect that a panic dump was still in progress, but we can now
check mark_tainted() for this. The pointer was set to one of the dumping
thread's trash chunks. Let's temporarily continue to copy the dumps to
that trash, we'll remove it later.
It's usefull to keep runtime limits (fd and RAM) in postmortem and show them in
debug_parse_cli_show_dev(). Runtime limits are fed in feed_post_mortem_late(),
as we are sure that at this moment that all configuration was parsed and
all applied limits were alredy adjusted.
This is a preparation patch to extend postmortem in order to store runtime
limits. No need to perform getrlimit() in feed_post_mortem(), as we do this
in the very beginning of main() and we store initial fd limits in global
'rlim_fd_cur_at_boot' and 'rlim_fd_max_at_boot' variables.
It is more handy to use LIM2A in debug_parse_cli_show_dev(), as it allows to
show a custom string ("unlimited"), if a given limit value equals to 0.
normalize_rlim() handler is needed to convert properly RLIM_INFINITY to zero,
with the respect of type sizes, as rlim_t is always 4 bytes on 32bit and
64bit arch.
Let's extend postmortem to keep process runtime capabilities. This information
is gathered in feed_post_mortem_late(), as it is called just before
run_poll_loop() and we are sure at this moment, that all configuration
settings were successfully applied.
Let's extend post_mortem to store runtime process uid and gid.
This information is fed in feed_post_mortem_late(), just before calling
run_poll_loop(). Like this we are sure that all configuration settings were
successfully applied.
Process runtime information could be very useful in post_mortem, but we have to
collect it just before calling run_poll_loop(). Like this we are sure, that
we've successfully applied all configuration parameters and what we've
collected are the latest runtime settings.
The most appropraite place to collect such information is
feed_post_mortem_late(). It's called in each thread, but puts thread info in
the post_mortem only when it's in the last thread context. As it's called
under mutex lock, other threads at this moment have to wait until
feed_post_mortem_late() and another initialization functions from
per_thread_init_list will finish. The number of threads could be large. So, to
avoid spending a lot of time under the lock, let's exit immediately from
feed_post_mortem_late(), if it wasn't called in the last thread.