haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-07 07:37:02 +02:00

Author	SHA1	Message	Date
Valentine Krasnobaeva	ff461efc59	MINOR: debug: align output style of debug_parse_cli_show_dev with cpu_dump_topology Align titles style of debug_parse_cli_show_dev() with cpu_dump_topology(). We will call the latter inside of debug_parse_cli_show_dev() to show thread-cpu bindings info.	2025-07-17 19:08:06 +02:00
Willy Tarreau	110625bdb2	MINOR: debug: report haproxy and operating system info in panic dumps The goal is to help figure the OS version (kernel and userland), any virtualization/containers, and the haproxy version and build features. Sometimes even reporters themselves can be mistaken about the running version or environment. Also printing this at the top helps draw a visual delimitation between warnings and panic. Now we get something like this: PANIC! Thread 1 is about to kill the process. HAProxy info: version: 3.3-dev3-c863c0-18 features: +51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY (...) Operating system info: virtual machine: no container: no kernel: Linux 6.1.131 #1 SMP PREEMPT_DYNAMIC Fri Mar 14 01:04:55 CET 2025 x86_64 userland: Slackware 15.0 x86_64 * Thread 1 : id=0x7f615a8775c0 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0 1/1 stuck=0 prof=0 harmless=0 isolated=0 cpu_ns: poll=1835010197 now=1835066102 diff=55905 (...)	2025-07-15 17:18:29 +02:00
Valentine Krasnobaeva	0c63883be1	MINOR: debug: add distro name and version in postmortem Since 2012, systemd compliant distributions contain /etc/os-release file. This file has some standardized format, see details at https://www.freedesktop.org/software/systemd/man/latest/os-release.html. Let's read it in feed_post_mortem_linux() to gather more info about the distribution. (cherry picked from commit f1594c41368baf8f60737b229e4359fa7e1289a9) Signed-off-by: Willy Tarreau <w@1wt.eu>	2025-07-11 11:48:19 +02:00
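The os-release format mentioned above is a list of KEY=value lines where the value may be quoted. A minimal sketch of extracting one field from one line follows; the function name os_release_value() is hypothetical and this is not the code of feed_post_mortem_linux(). It does not handle backslash escapes inside quoted values.

```c
#include <stddef.h>
#include <string.h>

/* If <line> is "<key>=<value>", copies <value> into <out> without its
 * surrounding quotes and trailing newline, truncating to <size> - 1
 * bytes. Returns 1 on a match, 0 otherwise. Hypothetical helper.
 */
int os_release_value(const char *line, const char *key, char *out, size_t size)
{
	size_t klen = strlen(key);
	size_t vlen;
	const char *val;

	if (!size || strncmp(line, key, klen) != 0 || line[klen] != '=')
		return 0;

	val = line + klen + 1;
	vlen = strcspn(val, "\r\n");

	/* strip one pair of matching single or double quotes */
	if (vlen >= 2 && (val[0] == '"' || val[0] == '\'') && val[vlen - 1] == val[0]) {
		val++;
		vlen -= 2;
	}

	if (vlen >= size)
		vlen = size - 1;
	memcpy(out, val, vlen);
	out[vlen] = 0;
	return 1;
}
```

A caller would typically read /etc/os-release line by line with fgets() and try keys such as PRETTY_NAME or VERSION_ID on each line.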
Willy Tarreau	697a531516	MINOR: debug: bump the dump buffer to 8kB Now with the improved backtraces, the lock history and details in the mux layers, some dumps appear truncated or with some chars alone at the beginning of the line. The issue is in fact caused by the limited dump buffer size (2kB for stderr, 4kB for warning), that cannot hold a complete trace anymore. Let's bump them to 8kB, this will be plenty for a long time.	2025-05-07 10:02:58 +02:00
Willy Tarreau	3bb6eea6d5	DEBUG: threads: display held locks in threads dumps Based on the lock history, we can spot some locks that are still held by checking the last operation that happened on them: if it's not an unlock, then we know the lock is held. In this case we append the list after "locked:" with their label and state like below: U:QUEUE S:IDLE_CONNS U:IDLE_CONNS R:TASK_WQ U:TASK_WQ S:QUEUE S:QUEUE S:QUEUE locked: QUEUE(S) S:IDLE_CONNS U:IDLE_CONNS S:TASK_RQ U:TASK_RQ S:QUEUE U:QUEUE S:IDLE_CONNS locked: IDLE_CONNS(S) R:TASK_WQ S:TASK_WQ R:TASK_WQ S:TASK_WQ R:TASK_WQ S:TASK_WQ R:TASK_WQ locked: TASK_WQ(R) W:STK_TABLE W:STK_TABLE_UPDT U:STK_TABLE_UPDT W:STK_TABLE W:STK_TABLE_UPDT U:STK_TABLE_UPDT W:STK_TABLE W:STK_TABLE_UPDT locked: STK_TABLE(W) STK_TABLE_UPDT(W) The format is slightly different (label(status)) so as to easily differentiate them visually from the history.	2025-05-06 05:20:37 +02:00
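The rule described above (a lock is held when the last operation recorded on it is not an unlock) can be sketched as follows. The struct lock_op layout and the function held_locks() are hypothetical; the single-letter codes mirror the output shown in the commit message, not HAProxy's internal history encoding.

```c
#include <stdio.h>
#include <string.h>

/* One step of a thread's lock history (hypothetical representation). */
struct lock_op {
	const char *label; /* e.g. "QUEUE" */
	char op;           /* 'R', 'S' or 'W' when taking the lock, 'U' for unlock */
};

/* Scans <hist> (oldest entry first, <n> entries) and writes the locks
 * still held into <out> as "LABEL(mode) LABEL(mode)...". A lock is
 * considered held when the last operation seen on its label is not an
 * unlock. Returns the number of held locks written.
 */
int held_locks(const struct lock_op *hist, int n, char *out, size_t size)
{
	size_t len = 0;
	int i, j, ret, held = 0;

	if (size)
		out[0] = 0;

	for (i = 0; i < n; i++) {
		/* skip this entry if a later one exists for the same label */
		for (j = i + 1; j < n; j++)
			if (strcmp(hist[i].label, hist[j].label) == 0)
				break;
		if (j < n || hist[i].op == 'U')
			continue;

		ret = snprintf(out + len, size - len, "%s%s(%c)",
		               held ? " " : "", hist[i].label, hist[i].op);
		if (ret < 0 || (size_t)ret >= size - len)
			break; /* no more room in the output buffer */
		len += ret;
		held++;
	}
	return held;
}
```

Fed with the last history line quoted above, this produces "STK_TABLE(W) STK_TABLE_UPDT(W)", matching the "locked:" list in the message.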
Willy Tarreau	d9a659ed96	MINOR: threads/cli: display the lock history on "show threads" This will display the lock labels and modes for each non-empty step at the end of "show threads" when these are defined. This allows to emit up to the last 8 locking operations for each thread on 64 bit machines.	2025-04-28 16:50:34 +02:00
Willy Tarreau	874ba2afed	CLEANUP: debug: no longer set nor use TH_FL_DUMPING_OTHERS TH_FL_DUMPING_OTHERS was being used to try to perform exclusion between threads running "show threads" and those producing warnings. Now that it is much more cleanly handled, we don't need that type of protection anymore, which was adding to the complexity of the solution. Let's just get rid of it.	2025-04-17 16:25:47 +02:00
Willy Tarreau	513397ac82	MINOR: debug: make ha_stuck_warning() print the whole message at once It has been noticed quite a few times during troubleshooting and even testing that warnings can happen in avalanches from multiple threads at the same time, and that their reporting is interleaved because the output is produced in small chunks. Originally, this code inspired by the panic code aimed at making sure to log whatever could be emitted in case it would crash later. But this approach was wrong since writes are atomic, and performing 5 writes in sequence in each dumping thread also means that the outputs can be mixed up at 5 different locations between multiple threads. The output of warnings is never very long, and the stack-based buffer is 4kB so let's just concatenate everything in the buffer and emit it at once using a single write(). Now there's no longer this confusion on the output.	2025-04-17 16:25:47 +02:00
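The "concatenate, then emit with a single write()" approach can be sketched like this. The function name emit_warning_once() and the message wording are illustrative assumptions, not the actual ha_stuck_warning() code.

```c
#include <stdio.h>
#include <unistd.h>

/* Builds the complete warning in a 4kB stack buffer and emits it with
 * one write() call, so that chunks from concurrent threads cannot be
 * interleaved inside the message. Returns what write() returned, or -1
 * if formatting failed. Illustrative helper only.
 */
ssize_t emit_warning_once(int fd, int thr, unsigned int ms, const char *dump)
{
	char buf[4096];
	int len;

	len = snprintf(buf, sizeof(buf),
	               "WARNING! thread %d has been stuck for %u milliseconds\n%s\n",
	               thr, ms, dump);
	if (len < 0)
		return -1;
	if ((size_t)len >= sizeof(buf))
		len = sizeof(buf) - 1; /* the message was truncated */

	return write(fd, buf, len);
}
```

The previous scheme would correspond to calling write() once per fragment, which leaves a gap between fragments where another thread's output can land.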
Willy Tarreau	c16d5415a8	MINOR: debug: make ha_stuck_warning() only work for the current thread Since we no longer call it with a foreign thread, let's simplify its code and get rid of the special cases that were relying on ha_thread_dump_fill() and synchronization with a remote thread. We're now only dumping the current thread so ha_thread_dump_one() is sufficient.	2025-04-17 16:25:47 +02:00
Willy Tarreau	b24d7f248e	MINOR: pass a valid buffer pointer to ha_thread_dump_one() The goal is to let the caller deal with the pointer so that the function only has to fill that buffer without worrying about locking. This way, synchronous dumps from "show threads" are produced and emitted directly without causing undesired locking of the buffer nor risking causing confusion about thread_dump_buffer containing bits from an interrupted dump in progress. It's only the caller that's responsible for notifying the requester of the end of the dump by setting bit 0 of the pointer if needed (i.e. it's only done in the debug handler).	2025-04-17 16:25:47 +02:00
Willy Tarreau	5ac739cd0c	MINOR: debug: remove unused case of thr!=tid in ha_thread_dump_one() This function was initially designed to dump any thread into the presented buffer, but the way it currently works is that it's always called for the current thread, and uses the distinction between coming from a sighandler or being called directly to detect which thread is the caller. Let's simplify all this by replacing thr with tid everywhere, and using the thread-local pointers where it makes sense (e.g. th_ctx, th_ctx etc). The confusing "from_signal" argument is now replaced with "is_caller" which clearly states whether or not the caller declares being the one asking for the dump (the logic is inverted, but there are only two call places with a constant).	2025-04-17 16:25:47 +02:00
Willy Tarreau	5646ec4d40	MINOR: debug: always reset the dump pointer when done We don't need to copy the old dump pointer to the thread_dump_pointer area anymore to indicate a dump is collected. It used to be done as an artificial way to keep the pointer for the post-mortem analysis but since we now have this pointer stored separately, that's no longer needed and it simplifies the mechanism to reset it.	2025-04-17 16:25:47 +02:00
Willy Tarreau	6d8a523d14	MINOR: tinfo: keep a copy of the pointer to the thread dump buffer Instead of using the thread dump buffer for post-mortem analysis, we'll keep a copy of the assigned pointer whenever it's used, even for warnings or "show threads". This will offer more opportunities to figure from a core what happened, and will give us more freedom regarding the value of the thread_dump_buffer itself. For example, even at the end of the dump when the pointer is reset, the last used buffer is now preserved.	2025-04-17 16:25:47 +02:00
Willy Tarreau	d20e9cad67	MINOR: debug: protect ha_dump_backtrace() against risks of re-entrance If a thread is dumping itself (warning, show thread etc) and another one wants to dump the state of all threads (e.g. panic), it may interrupt the first one during backtrace() and re-enter it from the signal handler, possibly triggering a deadlock in the underlying libc. Let's postpone the debug signal delivery at this point until the call ends in order to avoid this.	2025-04-17 16:25:47 +02:00
Willy Tarreau	5b5960359f	MINOR: debug: do not statify a few debugging functions often used with wdt/dbg A few functions are used when debugging debug signals and watchdog, but being static, they're not resolved and are hard to spot in dumps, and they appear as any random other function plus an offset. Let's just not mark them static anymore, it only hurts: - cli_io_handler_show_threads() - debug_run_cli_deadlock() - debug_parse_cli_loop() - debug_parse_cli_panic()	2025-04-17 16:25:47 +02:00
Willy Tarreau	47f8397afb	BUG/MINOR: debug: detect and prevent re-entrance in ha_thread_dump_fill() In the following trace trying to abuse the watchdog from the CLI's "debug dev loop" command running in parallel to "show threads" loops, it's clear that some re-entrance may happen in ha_thread_dump_fill(). A first minimal fix consists in using a test-and-set on the flag indicating that the function is currently dumping threads, so that the one from the signal just returns. However the caller should be made more reliable to serialize all of this, that's for future work. Here's an example capture of 7 threads stuck waiting for each other: (gdb) bt #0 0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6 #1 0x0000000000674a05 in ha_thread_relax () at src/thread.c:356 #2 0x00000000005ba4f5 in ha_thread_dump_fill (thr=2, buf=0x7ffdd8e08ab0) at src/debug.c:402 #3 ha_thread_dump_fill (buf=0x7ffdd8e08ab0, thr=<optimized out>) at src/debug.c:384 #4 0x00000000005baac4 in ha_stuck_warning (thr=thr@entry=2) at src/debug.c:840 #5 0x00000000006a360d in wdt_handler (sig=<optimized out>, si=<optimized out>, arg=<optimized out>) at src/wdt.c:156 #6 <signal handler called> #7 0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6 #8 0x0000000000674a05 in ha_thread_relax () at src/thread.c:356 #9 0x00000000005ba4c2 in ha_thread_dump_fill (thr=2, buf=0x7fe78f2d6420) at src/debug.c:426 #10 ha_thread_dump_fill (buf=0x7fe78f2d6420, thr=2) at src/debug.c:384 #11 0x00000000005ba7c6 in cli_io_handler_show_threads (appctx=0x2a89ab0) at src/debug.c:548 #12 0x000000000057ea43 in cli_io_handler (appctx=0x2a89ab0) at src/cli.c:1176 #13 0x00000000005d7885 in task_process_applet (t=0x2a82730, context=0x2a89ab0, state=<optimized out>) at src/applet.c:920 #14 0x0000000000659002 in run_tasks_from_lists (budgets=budgets@entry=0x7ffdd8e0a5c0) at src/task.c:644 #15 0x0000000000659bd7 in process_runnable_tasks () at src/task.c:886 #16 0x00000000005cdcc9 in run_poll_loop () at src/haproxy.c:2858 
#17 0x00000000005ce457 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3075 #18 0x0000000000430628 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3665	2025-04-17 16:25:47 +02:00
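The test-and-set guard mentioned in this fix can be sketched with C11 atomics. This is a simplified assumption: it uses a single process-wide flag and hypothetical function names, whereas the commit refers to a flag indicating that the function is currently dumping threads.

```c
#include <stdatomic.h>

/* Set while a dump is in progress (simplified: one global flag). */
static atomic_flag dump_in_progress = ATOMIC_FLAG_INIT;

/* Returns 1 if the caller may proceed with the dump, or 0 if one is
 * already running, e.g. when re-entered from a signal handler. In the
 * latter case the caller is expected to simply return.
 */
int dump_try_enter(void)
{
	return !atomic_flag_test_and_set(&dump_in_progress);
}

/* Must be called by the owner of the dump once it is finished. */
void dump_leave(void)
{
	atomic_flag_clear(&dump_in_progress);
}
```

Because the test and the set are one atomic operation, a handler interrupting the dumping code between the two steps cannot also believe it owns the dump.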
Willy Tarreau	ebf1757dc2	BUG/MINOR: wdt/debug: avoid signal re-entrance between debugger and watchdog As seen in issue #2860, there are some situations where a watchdog could trigger during the debug signal handler, and where similarly the debug signal handler may trigger during the wdt handler. This is really bad because it could trigger some deadlocks inside inner libc code such as dladdr() or backtrace() since the code will not protect against re- entrance but only against concurrent accesses. A first attempt was made using ha_sigmask() but that's not always very convenient because the second handler is called immediately after unblocking the signal and before returning, leaving signal cascades in backtrace. Instead, let's mark which signals to block at registration time. Here we're blocking wdt/dbg for both signals, and optionally SIGRTMAX if DEBUG_DEV is used as that one may also be used in this case. This should be backported at least to 3.1.	2025-04-17 16:25:47 +02:00
Willy Tarreau	0b56839455	BUG/MINOR debug: fix !USE_THREAD_DUMP in ha_thread_dump_fill() The function must make sure to return NULL for foreign threads and the local buffer for the current thread in this case, otherwise panics (and sometimes even warnings) will segfault when USE_THREAD_DUMP is disabled. Let's slightly re-arrange the function to reduce the #if/else since we have to specifically handle the case of !USE_THREAD_DUMP anyway. This needs to be backported wherever b8adef065d ("MEDIUM: debug: on panic, make the target thread automatically allocate its buf") was backported (at least 2.8).	2025-04-17 16:25:47 +02:00
Willy Tarreau	3cbbf41cd8	MINOR: debug: detect call instructions and show the branch target in backtraces In backtraces, sometimes it's difficult to know what was called by a given point, because some functions can be fairly long making one doubt about the correct pointer of unresolved ones, others might just use a tail branch instead of a call + return, etc. On common architectures (x86 and aarch64), it's not difficult to detect and decode a relative call, so let's do it on both of these platforms and show the branch location after a '>'. Example: x86_64: call trace(19): \| 0x6bd644 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95 > ha_thread_dump_one \| 0x7feb3e5383a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0 \| 0x7feb3e53748b <c0 b8 03 00 00 00 0f 05]: libpthread:__close+0x3b/0x8b \| 0x7619e4 <44 89 ff e8 fc 97 d4 ff]: _fd_delete_orphan+0x1d4/0x1d6 > main-0x2130 \| 0x743862 <8b 7f 68 e8 8e e1 01 00]: sock_conn_ctrl_close+0x12/0x54 > fd_delete \| 0x5ac822 <c0 74 05 4c 89 e7 ff d0]: main+0xff512 \| 0x5bc85c <48 89 ef e8 04 fc fe ff]: main+0x10f54c > main+0xff150 \| 0x5be410 <4c 89 e7 e8 c0 e1 ff ff]: main+0x111100 > main+0x10f2c0 \| 0x6ae6a4 <28 00 00 00 00 ff 51 58]: cli_io_handler+0x31524 \| 0x6aeab4 <7c 24 08 e8 fc fa ff ff]: sc_destroy+0x14/0x2a4 > cli_io_handler+0x31430 \| 0x6c685d <48 89 ef e8 43 82 fe ff]: process_chk_conn+0x51d/0x1927 > sc_destroy aarch64: call trace(15): \| 0xaaaaad0c1540 <60 6a 60 b8 c3 fd ff 97]: debug_handler+0x9c/0xbc > ha_thread_dump_one \| 0xffffa8c177ac <c2 e0 3b d5 1f 20 03 d5]: linux-vdso:__kernel_rt_sigreturn \| 0xaaaaad0b0964 <c0 03 5f d6 d2 ff ff 97]: cli_io_handler+0x28e44 > sedesc_new \| 0xaaaaad0b22a4 <00 00 80 d2 94 f9 ff 97]: sc_new_from_strm+0x1c/0x54 > cli_io_handler+0x28dd0 \| 0xaaaaad0167e8 <21 00 80 52 a9 6e 02 94]: stream_new+0x258/0x67c > sc_new_from_strm \| 0xaaaaad0b21f8 <e1 03 13 aa e7 90 fd 97]: sc_new_from_endp+0x38/0xc8 > stream_new \| 0xaaaaacfda628 <21 18 40 f9 e7 5e 03 94]: main+0xcaca8 > 
sc_new_from_endp \| 0xaaaaacfdb95c <42 c0 00 d1 02 f3 ff 97]: main+0xcbfdc > main+0xc8be0 \| 0xaaaaacfdd3f0 <e0 03 13 aa f5 f7 ff 97]: h1_io_cb+0xd0/0xb90 > main+0xcba40	2025-04-14 20:06:48 +02:00
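On x86_64, a direct relative call is the opcode 0xe8 followed by a 32-bit little-endian displacement, and the target is the return address plus that signed displacement. A minimal sketch of decoding it from the 8 bytes preceding the return address, as printed in the traces above, follows. The function name is hypothetical, and the check is a heuristic since 0xe8 may also appear as an operand byte of another instruction.

```c
#include <stdint.h>

/* <before> holds the 8 bytes that precede return address <ret>. If the
 * last 5 of them look like a direct relative call (0xe8 + rel32), the
 * branch target is stored into <*target> and 1 is returned. Otherwise
 * 0 is returned and <*target> is left untouched.
 */
int x86_call_target(const uint8_t before[8], uintptr_t ret, uintptr_t *target)
{
	uint32_t raw;

	if (before[3] != 0xe8)
		return 0;

	/* the displacement is stored least significant byte first */
	raw = (uint32_t)before[4] |
	      (uint32_t)before[5] << 8 |
	      (uint32_t)before[6] << 16 |
	      (uint32_t)before[7] << 24;

	/* sign-extend it and add it to the address following the call */
	*target = ret + (uintptr_t)(intptr_t)(int32_t)raw;
	return 1;
}
```

For the entry "0x743862 <8b 7f 68 e8 8e e1 01 00]" above, the displacement is 0x1e18e and the computed target is 0x7619f0, which the trace resolves to fd_delete.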
Willy Tarreau	9740f15274	MINOR: debug: in call traces, dump the 8 bytes before the return address, not after In call traces, we're interested in seeing the code that was executed, not the code that was not yet. The return address is where the CPU will return to, so we want to see the bytes that precede this location. In the example below on x86 we can clearly see a number of direct "call" instructions (0xe8 + 4 bytes). There are also indirect calls (0xffd0) that cannot be exploited but it gives insights about where the code branched, which will not always be the function above it if that one used tail branching for example. Here's an example dump output: call ------------, v 0x6bd634 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95 0x7fa4ea2593a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0 0x752132 <00 00 00 00 00 90 41 55]: htx_remove_blk+0x2/0x354 0x5b1a2c <4c 89 ef e8 04 07 1a 00]: main+0x10471c 0x5b5f05 <48 89 df e8 8b b8 ff ff]: main+0x108bf5 0x60b6f4 <89 ee 4c 89 e7 41 ff d0]: tcpcheck_eval_send+0x3b4/0x14b2 0x610ded <00 00 00 e8 53 a5 ff ff]: tcpcheck_main+0x7dd/0xd36 0x6c5ab4 <48 89 df e8 5c ab f4 ff]: wake_srv_chk+0xc4/0x3d7 0x6c5ddc <48 89 f7 e8 14 fc ff ff]: srv_chk_io_cb+0xc/0x13	2025-04-14 19:28:22 +02:00
Willy Tarreau	b708345c17	DEBUG: counters: add the ability to enable/disable updating the COUNT_IF counters These counters can have a noticeable cost on large machines, though not dramatic. There's no single good choice to keep them enabled or disabled. This commit adds multiple choices: - DEBUG_COUNTERS set to 2 will automatically enable them by default, while 1 will disable them by default - the global "debug.counters on/off" will allow to change the setting at boot, regardless of DEBUG_COUNTERS as long as it was at least 1. - the CLI "debug counters on/off" will also allow to change the value at run time, allowing to observe a phenomenon while it's happening, or to disable counters if it's suspected that their cost is too high Finally, the "debug counters" command will append "(stopped)" at the end of the CNT lines when these counters are stopped. Note that the whole mechanism would easily support being extended to all counter types by specifying the types to apply to, but it doesn't seem useful at all and would require the user to also type "cnt" on debug lines. This may easily be changed in the future if it's found relevant.	2025-04-14 19:02:13 +02:00
Willy Tarreau	61d633a3ac	DEBUG: rename DEBUG_GLITCHES to DEBUG_COUNTERS and enable it by default Till now the per-line glitches counters were only enabled with the confusingly named DEBUG_GLITCHES (which would not turn glitches off when disabled). Let's instead change it to DEBUG_COUNTERS and make sure it's enabled by default (though it can still be disabled with -DDEBUG_GLITCHES=0 just like for DEBUG_STRICT). It will later be expanded to cover more counters.	2025-04-14 19:02:13 +02:00
Willy Tarreau	a8148c313a	DEBUG: init: report invalid characters in debug description strings It's easy to leave some trailing \n or even other characters that can mangle the debug output. Let's verify at boot time that the debug sections are clean by checking for chars 0x20 to 0x7e inclusive. This is very simple to do and it managed to find another one in a multi-line message: [WARNING] (23696) : Invalid character 0x0a at position 96 in description string at src/cli.c:2516 _send_status() This way new offending code will be spotted before being committed.	2025-04-14 19:02:13 +02:00
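The boot-time check described above amounts to scanning each description string for a byte outside the 0x20 to 0x7e range. A minimal sketch, using a hypothetical function name rather than HAProxy's own:

```c
#include <stddef.h>

/* Returns the position of the first byte of <s> that lies outside the
 * printable ASCII range 0x20..0x7e, or -1 if the string is clean.
 */
long first_invalid_char(const char *s)
{
	size_t i;

	for (i = 0; s[i]; i++) {
		unsigned char c = (unsigned char)s[i];

		if (c < 0x20 || c > 0x7e)
			return (long)i;
	}
	return -1;
}
```

A caller can then report the offending byte and its position in the way the warning quoted above does.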
William Lallemand	a647839954	DEBUG: init: add a way to register functions for unit tests Doing unit tests with haproxy was always a bit difficult, some of the functions you want to test would depend on the buffer or trash buffer initialisation of HAProxy, so building a separate main() for them is quite hard. This patch adds a way to register a function that can be called with the "-U" parameter on the command line, will be executed just after step_init_1() and will exit the process with its return value as an exit code. When using the -U option, every keyword after this option is passed to the callback and can be used as a parameter, making it possible to handle complex arguments if required by the test. HAProxy needs to be built with DEBUG_UNIT to activate this feature.	2025-03-03 12:43:32 +01:00
Willy Tarreau	fb7874c286	MINOR: tinfo: split the signal handler report flags into 3 While signals are not recursive, one signal (e.g. wdt) may interrupt another one (e.g. debug). The problem this causes is that when leaving the inner handler, it removes the outer's flag, hence the protection that comes with it. Let's just have 3 distinct flags for regular signals, debug signal and watchdog signal. We add a 4th definition which is an aggregate of the 3 to ease testing.	2025-02-24 13:37:52 +01:00
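The problem and its fix can be illustrated with plain bit flags. The names and values below are hypothetical; only the idea of one flag per handler kind plus an aggregate mask comes from the commit message.

```c
/* One bit per kind of handler, plus an aggregate to ease testing. */
#define FL_IN_SIG_HANDLER 0x01u /* regular signal handler */
#define FL_IN_DBG_HANDLER 0x02u /* debug signal handler */
#define FL_IN_WDT_HANDLER 0x04u /* watchdog signal handler */
#define FL_IN_ANY_HANDLER (FL_IN_SIG_HANDLER | FL_IN_DBG_HANDLER | FL_IN_WDT_HANDLER)

/* Called when entering a handler of kind <which>. */
static inline unsigned int handler_enter(unsigned int flags, unsigned int which)
{
	return flags | which;
}

/* Called when leaving a handler: only its own bit is cleared, so an
 * outer handler that was interrupted keeps its protection.
 */
static inline unsigned int handler_leave(unsigned int flags, unsigned int which)
{
	return flags & ~which;
}

/* Non-zero if the thread is currently inside any signal handler. */
static inline int in_any_handler(unsigned int flags)
{
	return !!(flags & FL_IN_ANY_HANDLER);
}
```

With a single shared flag, the watchdog handler returning inside the debug handler would clear the flag while the debug handler is still running; with distinct bits the debug bit survives.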
Willy Tarreau	ddd173355c	MINOR: tinfo: add a new thread flag to indicate a call from a sig handler Signal handlers must absolutely not change anything, but some long and complex call chains may look innocuous at first glance, yet result in some subtle write accesses (e.g. pools) that can conflict with a running thread being interrupted. Let's add a new thread flag TH_FL_IN_SIG_HANDLER that is only set when entering a signal handler and cleared when leaving them. Note, we're speaking about real signal handlers (synchronous ones), not deferred ones. This will allow some sensitive call places to act differently when detecting such a condition, and possibly even to place a few new BUG_ON().	2025-02-21 17:41:38 +01:00
Willy Tarreau	7ddcdff33f	BUG/MEDIUM: debug: close a possible race between thread dump and panic() The rework of the thread dumping mechanism in 2.8 with commit `9a6ecbd590` ("MEDIUM: debug: simplify the thread dump mechanism") opened a small race, which is that a thread in the process of dumping other ones may block the other one from panicing while it's looping at the end of ha_thread_dump_fill(), or any other sequence involving the currently dumped one. This was emphasized in 3.1 with commit `148eb5875f` ("DEBUG: wdt: better detect apparently locked up threads and warn about them") that allowed to emit warnings about long-stuck threads, because in this case, what happens is that sometimes a thread starts to emit a warning (or a set of warnings), and while the warning is being awaited for, a panic finally happens and interrupts either the dumping thread, which never finishes and waits for the target's pointer to become NULL which will never happen since it was supposed to do it itself, or the currently dumped thread which could wait for the dumping thread to become ready while this one has not released the former. In order to address this, first we now make sure never to dump a thread that is already in the process of dumping another one. We're adding a new thread flag to know this situation, that is set in ha_thread_dump_fill() and cleared in ha_thread_dump_done(). And similarly, we don't trigger the watchdog on a thread waiting for another one to finish its dump, as it's likely a case of warning (and maybe even a panic) that makes them wait for each other and we don't want such cases to be reentrant. Finally, we check in the main polling loop that the flag never accidentally leaked (e.g. wrong flag manipulation) as this would be difficult to spot with bad consequences. This should be backported at least to 2.8, and should resolve github issue #2860. Thanks to Chris Staite for the very informative backtrace that exhibited the problem.	2025-02-10 18:34:26 +01:00
Willy Tarreau	8d63dc50ab	BUG/MINOR: debug: make sure the "debug dev sched" tasks don't block stopping When "debug dev sched" is used to pop up background tasks, these tasks are never stopped, so we must be careful to stop them when the stopping flag is set, otherwise they can prevent the process from stopping when sufficiently numerous (tests went as far as 100 million tasks, leading to the run queue never being completely purged in one poll round). No backport is needed since this is only used when debugging and tuning the scheduler.	2025-02-07 18:04:29 +01:00
Willy Tarreau	6765a32eb4	BUG/MINOR: debug: make "debug dev sched" accept a negative TID The TID passed to "debug dev sched" is used to pin the task to a given thread. A negative value normally means the task is unpinned and goes to the shared wait queue and run queue. However due to the type of the variable, negative values were mapped as highly positive values and were set to the current thread. Let's add the proper cast to fix this. No backport is needed since this is only used to experiment with the scheduler and measure its performance.	2025-02-07 18:04:29 +01:00
Valentine Krasnobaeva	8620ae7962	MINOR: debug: show boot and runtime process settings in table Let's reformat output of "show dev" in order to show some boot and runtime process settings in a table. This makes the output less crowded.	2025-01-24 09:54:57 +01:00
Valentine Krasnobaeva	df7f16d960	MINOR: debug: debug_parse_cli_show_dev: use errname Let's use errname, introduced in the previous commit, in the output of "show dev". This output is intended for engineers, so there is no need to provide the long errno descriptions given by strerror.	2025-01-24 09:54:57 +01:00
Ilia Shipitsin	6524fbfb70	BUG/MINOR: debug: handle a possible strdup() failure This defect was found by the coccinelle script "unchecked-strdup.cocci". It can be backported to all supported branches.	2024-12-25 12:42:33 +01:00
Willy Tarreau	f486f976c7	BUILD: limits: make normalize_rlim() take an rlim_t to fix build on m68k As can be seen here, the build fails on m68k since commit `665dde648` ("MINOR: debug: use LIM2A to show limits") in 3.1: https://github.com/haproxy/haproxy/actions/runs/12440234399/job/34735360177 The reason is the comparison between a ulong limit and RLIM_INFINITY. Indeed, on m68k, rlim_t is an unsigned long long. Let's just change the function's input type to take an rlim_t instead. This also allows to get rid of the casts in the call place. This can be backported to 3.1 though it's not important given the low prevalence of this platform for such use cases.	2024-12-25 12:33:06 +01:00
Willy Tarreau	4710ab5604	BUILD: debug: only dump/reset glitch counters when really defined If neither DEBUG_GLITCHES nor DEBUG_STRICT is set, we end up with no dbg_cnt section, resulting in debug_parse_cli_counters not building due to __stop_dbg_cnt and __start_dbg_cnt not being defined. Let's just condition the end of the function to these conditions. An alternate approach (less elegant) is to always declare a dummy entry of type DBG_COUNTER_TYPES in debug.c. This must be backported to 3.1 since it was brought with glitches.	2024-12-17 16:46:25 +01:00
Willy Tarreau	1151fe6818	BUG/MEDIUM: debug: don't set the STUCK flag from debug_handler() Since 2.0 with commit `e6a02fa65a` ("MINOR: threads: add a "stuck" flag to the thread_info struct"), the TH_FL_STUCK flag was set by the debugger to flag that a thread was stuck and report it in the output. However, two commits later (`2bfefdbaef` "MAJOR: watchdog: implement a thread lockup detection mechanism"), this flag was used to detect that a thread had already been reported as stuck. The problem is that it seldom happens that a "show threads" command instantly crashes because it calls debug_handler(), which sets the flag, and if the watchdog timer was about to trigger before going back to the scheduler, the watchdog believes that the thread has been stuck for a while and will kill the process. The issue was magnified in 3.1 with the lower-delay warning, because it's possible for a thread to die on the next wakeup after the first warning (which calls debug_handler() hence sets the STUCK flag). One good approach would have been to use two distinct flags, one for "stuck" as reported by the debug handler, and one for "stuck" as seen by the watchdog. However, one could also argue that since the second commit, given that the wdt monitors the threads, there's no point any more for the debug handler to set the flag itself. Removing this code means that two consecutive "show threads" will not report "stuck" until the watchdog sets it, which aligns better with expectations. This can be backported to all stable releases. This code has changed a bit over time, the "if" block and the harmless variables just need to be removed.	2024-11-21 19:58:05 +01:00
Willy Tarreau	4420939fcd	MINOR: debug/cli: replace "debug dev counters" with "debug counters" "debug dev" commands are not meant to be used by end-users, and are purposely not documented. Yet due to their usefulness in troubleshooting sessions, users are increasingly invited by developers to use some of them. "debug dev counters" is one of them. Better move it to "debug counters" and document it so that users can check them even if the output can look cryptic at times. This, combined with DEBUG_GLITCHES, can be convenient to observe suspicious activity. The doc however specifies that the format may change between versions and that new entries/types might appear within a stable branch.	2024-11-15 16:26:01 +01:00
Willy Tarreau	808a7cc777	BUG/MINOR: debug: do not set task expiration to TICK_ETERNITY Using "debug task", it's possible to change a task's expiration, but we must be careful not to set it to TICK_ETERNITY. Let's use tick_add() instead. The risk is basically null since it's a debugging command, so no backport is needed.	2024-11-15 15:39:00 +01:00
Willy Tarreau	502790ed7e	MINOR: debug: add a new counter type for glitches COUNT_GLITCH() will implement an unconditional counter on its declaration line when DEBUG_GLITCHES is set, and do nothing otherwise. The output will be reported as "GLT" and can be filtered as "glt" on the CLI. The purpose is to help figure what's happening if some glitches counters start going through the roof. The macro supports an optional string argument to describe the cause of the glitch (e.g. "truncated header"), which is then reported in the dump. For now this is conditioned by DEBUG_GLITCHES but if it turns out to be light enough, maybe we'll keep it enabled full time. In this case it might have to be moved away from debug dev, or at least documented (or done as debug counters maybe so that dev can remain undocumented and updatable within a branch?).	2024-11-14 08:49:38 +01:00
Willy Tarreau	e119095290	MINOR: debug: explicitly permit the counter condition to be empty In order to count new event types, we'll need to support empty conditions so that we don't have to fake if (1) that would pollute the output. This change checks if #cond is an empty string before concatenating it with the optional var args, and avoids dumping the colon on the dump if the whole description is empty.	2024-11-14 08:47:00 +01:00
Willy Tarreau	5dcf2012fc	MINOR: debug: move the "recover now" warn message after the optional notes At the end of the too long processing warning added by commit `0950778b3a` ("MINOR: debug: add a function to dump a stuck thread"), there can be some optional notes about lua and memory trimming. However it's a bit awkward that they appear after the "trying to recover now" message. Let's just move that message after the notes.	2024-11-07 07:56:13 +01:00
Willy Tarreau	84dd05e7d8	DEBUG: wdt: add a stats counter "BlockedTrafficWarnings" in show info Every time a warning is issued about traffic being blocked, let's increment a global counter so that we can check for this situation in "show info".	2024-11-06 18:35:42 +01:00
Willy Tarreau	6127e5a4e9	DEBUG: wdt: make the blocked traffic warning delay configurable The new global "warn-blocked-traffic-after" allows one to configure after how much time a warning should be emitted when traffic is blocked.	2024-11-06 18:35:42 +01:00
Willy Tarreau	7337c42224	DEBUG: cli: make it possible for "debug dev loop" to trigger warnings A new argument "warn" allows to force the emission of a warning while stuck in the loop by making the internal state inconsistent.	2024-11-06 18:35:42 +01:00
Willy Tarreau	148eb5875f	DEBUG: wdt: better detect apparently locked up threads and warn about them In order to help users detect when threads are behaving abnormally, let's try to emit a warning when one is no longer making any progress. This will allow to catch faulty situations more accurately, instead of occasionally triggering just after the long task. It will also let users know that there is something wrong with their configuration, and inspect the call trace to figure whether they're using excessively long rules or Lua for example (the usual warnings about lua-load vs lua-load-per-thread are still reported). The warning will only be emitted for threads not yet marked as stuck so as not to interfere with panic dumps and avoid sending a warning just before a panic. A tainted flag is set when this happens however (0x2000).	2024-11-06 18:35:42 +01:00
Willy Tarreau	0950778b3a	MINOR: debug: add a function to dump a stuck thread There's currently no way to just emit a warning informing that a thread is stuck without crashing. This is a problem because sometimes users would benefit from this info to clean up their configuration (e.g. abuse of map_regm, lua-load etc). This commit adds a new function ha_stuck_warning() that will emit a warning indicating that the designated thread has been stuck for XX milliseconds, with a number of streams blocked, and will make that thread dump its own state. The warning will then be sent to stderr, along with some reminders about the impacts of such situations to encourage users to fix their configuration. In order not to disrupt operations, a local 4kB buffer is allocated in the stack. This should be quite sufficient. For now the function is not used.	2024-11-06 18:35:42 +01:00
Willy Tarreau	0f1d37a479	DEBUG: cli: support closing "hard" using close() in addition to fd_delete() "debug dev close <fd>" currently closes that FD using fd_delete() after checking that it's known from the fdtab. Sometimes we also want to just perform a pure close() of FDs not in the fdtab (pollers, etc) in order to provoke certain error cases. The optional "hard" argument to the command will make it use a plain close() instead of fd_delete() and skip the fd owner check. The main visible effect when closing a traffic socket with it is that instead of dying from a double fd_delete() by seeing that fd.owner is already 0, it will die during the next fd_insert() seeing that fd.owner was not 0.	2024-11-05 18:57:43 +01:00
Willy Tarreau	52240680f1	MINOR: debug: remove the redundant process.thread_info array from post_mortem That one is huge and unneeded since we now have the pointer to the whole thread_info[] array, which does contain the freshest version of these info and many more. Let's just get rid of it entirely.	2024-10-28 17:14:48 +01:00
Willy Tarreau	da5cf52173	MINOR: debug: also add fdtab and activity to struct post_mortem These ones are often used as well when trying to analyse sequences of events, let's add them.	2024-10-28 17:14:48 +01:00
Willy Tarreau	2f04ebe14a	MINOR: debug: also add a pointer to struct global to post_mortem The pointer to struct global is also an important element to have in post_mortem given that it's used a lot to take decisions in the code. Let's just add it. It's worth noting that we could get rid of argc/argv at this point since they're also present in the global struct, but they don't cost much there anyway.	2024-10-26 11:33:09 +02:00
William Lallemand	944a224358	MINOR: cli: remove non-printable characters from 'debug dev fd' When using 'debug dev fd', the output of laddr and raddr can contain some garbage. This patch replaces any control or non-printable character by a '.'.	2024-10-24 16:45:11 +02:00
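Replacing control and non-printable characters by a '.' can be sketched as below. The function name is hypothetical, and isprint() is an assumption about how printability is decided; it depends on the current locale (printable ASCII in the default "C" locale).

```c
#include <ctype.h>
#include <stddef.h>

/* Replaces every non-printable byte among the first <len> bytes of
 * <buf> with a '.', in place.
 */
void mask_non_printable(char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		/* the cast avoids undefined behaviour on negative chars */
		if (!isprint((unsigned char)buf[i]))
			buf[i] = '.';
	}
}
```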


285 Commits