Commit Graph

184 Commits

Aurelien DARRAGON
b6a24a52a2 BUG/MINOR: debug: fix pointer check in debug_parse_cli_task()
The task pointer check in debug_parse_cli_task() computes the theoretical end
address of the provided task pointer to check whether it is valid, using the
may_access() helper function.

However, the relative end address is calculated by adding the task size to the
't' pointer (a struct task pointer), which yields an incorrect address since
the compiler automatically translates 't + x' into 't + x * sizeof(*t)'
internally (with sizeof(*t) != 1 here).

Solve the issue by using 'ptr' (the raw void * address) as the starting
address, so that no automatic pointer scaling occurs.

This was revealed by Coverity, see GH #2157.

No backport is needed, unless 9867987 ("DEBUG: cli: add "debug dev task"
to show/wake/expire/kill tasks and tasklets") gets backported.
2023-05-17 16:49:17 +02:00
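
To illustrate the pointer scaling described in the commit above, here is a
minimal standalone C sketch; the struct task below is only a stand-in for the
real one and may_access() is not reproduced:

  #include <stdio.h>
  #include <stddef.h>

  struct task { char pad[160]; };   /* stand-in; the real struct task differs */

  int main(void)
  {
      struct task dummy, *t = &dummy;
      void *ptr = &dummy;

      /* buggy form: arithmetic on the typed pointer is scaled by sizeof(*t),
       * so the computed end lands sizeof(*t) * sizeof(*t) bytes after 't' */
      ptrdiff_t wrong = (char *)(t + sizeof(*t)) - (char *)t;

      /* fixed form: byte arithmetic on the raw address adds exactly sizeof(*t) */
      ptrdiff_t right = ((char *)ptr + sizeof(*t)) - (char *)ptr;

      printf("wrong end offset: %td bytes, right end offset: %td bytes\n",
             wrong, right);
      return 0;
  }
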
Willy Tarreau
94df1b57ee BUILD: debug: fix build issue on 32-bit platforms in "debug dev task"
Commit 986798718 ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill
tasks and tasklets") caused a build failure on 32-bit platforms when parsing
the task's pointer. Let's use strtoul() and not strtoll(). No backport is
needed, unless the commit above gets backported.
2023-05-12 04:40:06 +02:00
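
As a hedged illustration of the commit above: unsigned long matches the
pointer width on the usual ILP32 and LP64 targets, so strtoul() can parse a
pointer-sized value safely, whereas strtoll() always yields a 64-bit long
long. A minimal sketch:

  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      const char *arg = (argc > 1) ? argv[1] : "0x1234abcd";

      /* unsigned long is pointer-sized on the usual ILP32 and LP64 targets,
       * so the parsed value can be cast back to a pointer without truncation */
      void *p = (void *)strtoul(arg, NULL, 0);

      printf("parsed pointer: %p\n", p);
      return 0;
  }
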
Willy Tarreau
95e6c9999a BUILD: debug: do not check the isolated_thread variable in non-threaded builds
The build without thread support was broken by commit b30ced3d8 ("BUG/MINOR:
debug: fix incorrect profiling status reporting in show threads") because
it accesses the isolated_thread variable that is not defined when threads
are disabled. In fact both the test on harmless and this one make no sense
without threads, so let's comment out the block and mark the related
variables as unused.

This may have to be backported to 2.7 if the commit above is.
2023-05-07 15:02:30 +02:00
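
As a hypothetical sketch of the general pattern (not the actual HAProxy code),
thread-only diagnostics can be compiled out in non-threaded builds while still
referencing the variables so that no "unused" warning fires:

  #include <stdio.h>

  /* hypothetical helper, not the real code: report a thread's dump status */
  static void report_thread_state(int thr, int harmless, int isolated)
  {
  #ifdef USE_THREAD
      if (isolated)
          printf("Thread %d is isolated\n", thr);
      else if (harmless)
          printf("Thread %d is harmless\n", thr);
  #else
      /* non-threaded build: the checks make no sense, silence the warnings */
      (void)thr; (void)harmless; (void)isolated;
  #endif
  }

  int main(void)
  {
      report_thread_state(1, 1, 0);
      return 0;
  }
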
Willy Tarreau
e69919d1ba CLEANUP: debug: remove the now unused ha_thread_dump_all_to_trash()
The function isn't used anymore since each call place performs its
own loop. Let's get rid of it.
2023-05-04 19:19:04 +02:00
Willy Tarreau
009b5519e6 MINOR: debug: make "show threads" properly iterate over all threads
Previously it would re-dump all threads to the same trash if the output
buffer was full, which it never was since the trash is of the same size.
Now it dumps one thread, copies it to the buffer and yields until it can
continue. Showing 256 threads works as expected.
2023-05-04 19:15:50 +02:00
Willy Tarreau
880d1684a7 MINOR: debug: write panic dump to stderr one thread at a time
Currently large setups cannot dump all their threads because they're
first dumped to the trash buffer, then copied to stderr. Here we can
now change this, instead we dump one thread at a time into the trash
and immediately send it to stderr. We also keep a copy into a local
trash chunk that's assigned to thread_dump_buffer so that a core file
still contains a copy of a large number of threads, which is generally
sufficient for the vast majority of situations.

It was verified that dumping 256 threads now produces ~55kB of output
and all of them are properly dumped.
2023-05-04 19:15:50 +02:00
Willy Tarreau
9a6ecbd590 MEDIUM: debug: simplify the thread dump mechanism
The thread dump mechanism that is used by "show threads" and by the
panic dump is overly complicated due to an initial misdesign. It
first wakes all threads, then serializes their dumps, then releases
them, while taking extreme care not to face colliding dumps. In fact
this is not what we need and it reached a limit where big machines
cannot dump all their threads anymore due to buffer size limitations.

What is needed instead is to be able to dump *one* thread, and to let
the requester iterate on all threads.

That's what this patch does. It adds the thread_dump_buffer to the
struct thread_ctx so that the requester offers the buffer to the
thread that is about to be dumped. This buffer also serves as a lock.
A thread at rest has a NULL, a valid pointer indicates the thread is
using it, and 0x1 (NULL+1) is used by the dumped thread to tell the
requester it's done. This makes sure that a given thread is dumped
once at a time. In addition to this, the calling thread decides
whether it accesses the thread by itself or via the debug signal
handler, in order to get a backtrace. This is much saner because the
calling thread is free to do whatever it wants with the buffer after
each thread is dumped, and there is no dependency between threads: once
they've dumped, they're free to continue (and possibly to dump
for another requester if needed). Finally, when the THREAD_DUMP
feature is disabled and the debug signal is not used, the requester
accesses the thread by itself like before.

For now we still have the buffer size limitation but it will be
addressed in future patches.
2023-05-04 19:15:44 +02:00
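
Below is a rough, single-threaded sketch of the buffer-as-lock protocol
described above, using C11 atomics; the names (thread_dump_slot,
dump_one_thread, request_dump) are made up and the real implementation
differs:

  #include <stdatomic.h>
  #include <stdio.h>

  #define DUMP_DONE ((char *)0x1)        /* NULL+1: the target thread is done */

  /* one slot per thread: NULL = at rest, valid pointer = dump in progress */
  static _Atomic(char *) thread_dump_slot[4];

  /* hypothetical: runs on (or on behalf of) the target thread; it fills the
   * offered buffer and then flags the slot as done */
  static void dump_one_thread(int thr, char *buf, size_t len)
  {
      snprintf(buf, len, "Thread %d: <state would be dumped here>\n", thr);
      atomic_store(&thread_dump_slot[thr], DUMP_DONE);
  }

  /* requester side: offer a buffer to thread <thr> and wait for completion */
  static void request_dump(int thr, char *buf, size_t len)
  {
      char *expected = NULL;

      /* the buffer pointer also serves as a lock: one requester at a time */
      while (!atomic_compare_exchange_weak(&thread_dump_slot[thr], &expected, buf))
          expected = NULL;

      /* normally the target is reached through the debug signal handler;
       * in this single-threaded sketch we simply call it ourselves */
      dump_one_thread(thr, buf, len);

      while (atomic_load(&thread_dump_slot[thr]) != DUMP_DONE)
          ;                              /* wait for the target to finish */

      atomic_store(&thread_dump_slot[thr], NULL);   /* release for the next one */
  }

  int main(void)
  {
      char buf[256];

      for (int thr = 0; thr < 4; thr++) {
          request_dump(thr, buf, sizeof(buf));
          fputs(buf, stdout);            /* free to reuse the buffer afterwards */
      }
      return 0;
  }
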
Willy Tarreau
cb01f5daa7 BUG/MINOR: debug: do not emit empty lines in thread dumps
In 2.3, commit 471425f51 ("BUG/MINOR: debug: Don't dump the lua stack
if it is not initialized") introduced the possibility to emit an empty
line when there's no Lua info to dump. The problem is that doing this
on the CLI in "show threads" marks the end of the output, and it may
affect some external tools. We need to make sure that LFs are only
emitted if there's something on the line and that all lines properly
start with the prefix.

This may be backported as far as 2.0 since the commit above was
backported there.
2023-05-04 16:51:50 +02:00
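
A small hedged sketch of the output rule: a line (and its LF) is only emitted
when there is actual content, and every emitted line starts with the prefix;
emit_dump_line() is a made-up helper, not the real dump code:

  #include <stdio.h>

  /* hypothetical helper: emit one dump line, skipping empty ones, so a bare
   * LF can never be mistaken for the end of the "show threads" output */
  static void emit_dump_line(const char *prefix, const char *line)
  {
      if (!line || !*line)
          return;                        /* nothing to say: emit no LF at all */
      printf("%s%s\n", prefix, line);    /* every emitted line gets the prefix */
  }

  int main(void)
  {
      emit_dump_line("             ", "curr_task=0");
      emit_dump_line("             ", "");   /* e.g. no Lua info: dropped */
      emit_dump_line("             ", "stuck=0 prof=0");
      return 0;
  }
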
Willy Tarreau
e5e62231d8 MINOR: debug: permit the "debug dev loop" to run under isolation
Sometimes it's convenient to test the effect of tasks running under
isolation, e.g. to validate the contents of the crash dumps. Let's
add an optional "isolated" keyword to "debug dev loop" for this.
2023-05-04 11:50:26 +02:00
Willy Tarreau
b30ced3d88 BUG/MINOR: debug: fix incorrect profiling status reporting in show threads
Thread dumps include a field "prof" for each thread that reports whether
task profiling is currently active or not. It turns out that in 2.7-dev1,
commit 680ed5f28 ("MINOR: task: move profiling bit to per-thread")
mistakenly replaced it with a check for the current thread's bit in the
thread dumps, which basically is the only place where another thread is
being watched. The same mistake was made a few lines later by confusing
threads_want_rdv_mask with the profiling mask. This mask disappeared
in 2.7-dev2 with commit 598cf3f22 ("MAJOR: threads: change thread_isolate
to support inter-group synchronization"), though instead we know the ID
of the isolated thread. This commit fixes this and now reports "isolated"
instead of "wantrdv".

This can be backported to 2.7.
2023-05-04 11:41:33 +02:00
Willy Tarreau
ff508f12c6 BUILD: cli: fix build on Windows due to isalnum() implemented as a macro
Commit 986798718 ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill
tasks and tasklets") broke the build on Windows due to this:

  src/debug.c:940:95: error: array subscript has type char [-Werror=char-subscripts]
    940 |  caller && may_access(caller) && may_access(caller->func) && isalnum(*caller->func) ? caller->func : "0",
        |                                                                      ^~~~~~~~~~~~~

This is a classic issue on platforms which implement ctype.h as macros instead
of functions; let's cast the argument to uchar. No backport is needed.
2023-05-03 16:32:50 +02:00
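
The classic portable form, as a standalone sketch: ctype.h functions take an
int that must be representable as an unsigned char (or EOF), so the argument
is cast before the call:

  #include <ctype.h>
  #include <stdio.h>

  int main(void)
  {
      const char *func = "process_stream";

      /* cast to unsigned char before any ctype.h call: this is portable and
       * silences -Werror=char-subscripts on macro-based implementations */
      if (isalnum((unsigned char)*func))
          printf("'%s' looks like a symbol name\n", func);
      return 0;
  }
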
Willy Tarreau
9867987182 DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets
When analyzing certain types of bugs in the field, it would sometimes be
nice to be able to wake up a task or tasklet to see how events progress
(e.g. to detect a missing wakeup condition), or expire or kill such a
task. This restricted command shows the current state of a task or tasklet
and allows it to be manipulated in these ways. However, it must be used with
extreme care because while it does verify that the pointers are mapped, it
cannot know whether they point to a real task, and performing such actions on
something that is not a task will easily lead to a crash. In addition,
performing a "kill" on a task is very likely to provoke a deferred crash due
to a double free and/or another kill that is not idempotent. Use with extreme
care!
2023-05-03 11:47:44 +02:00
Willy Tarreau
dd01448953 MINOR: debug: clarify "debug dev stream" help message
The help message was insufficient to figure out how to use the command,
i.e. how to specify the stream pointer and the changes to perform.
2023-05-03 11:47:44 +02:00
Christopher Faulet
208c712b40 MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE
Here again, it is just a flag renaming. In the SC flags, there is no longer a
shutdown for writes, only shutdowns.
2023-04-14 15:01:21 +02:00
Christopher Faulet
7faac7cf34 MINOR: tree-wide: Simplify some tests on SHUT flags by accessing SCs directly
At many places, we simplify the tests on SHUT flags to remove calls to the
chn_prod() or chn_cons() functions because the corresponding SC is available.
2023-04-05 08:57:06 +02:00
Christopher Faulet
87633c3a11 MEDIUM: tree-wide: Move flags about shut from the channel to the SC
The purpose of this patch is only a one-to-one replacement, as far as
possible.

CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connector. The CF_ prefix is replaced by the SC_FL_ one. Of course, it
is not so simple because at many places we were testing whether a channel was
shut for reads and writes at the same time. To do the same now, shut for reads
must be tested on the SC on one side and shut for writes on the opposite SC.
Special care was taken with process_stream(): the flags of the SCs must be
saved to be able to detect changes, just like for the channels.
2023-04-05 08:57:06 +02:00
Willy Tarreau
bd3b44edff MINOR: debug: add random delay injection with "debug dev delay-inj"
The goal is to send signals to random threads at random instants so that
they spin for a random delay in a relax() loop, trying to give back the
CPU to another competing hardware thread, in the hope that from time to time
this can trigger in critical areas and increase the chances of provoking a
latent concurrency bug. For now none were observed.

For example, this command starts 64 such tasks waking after random delays
of 0-1ms and delivering signals to trigger such loops on 3 random threads:

  for i in {1..64}; do
    socat - /tmp/sock1 <<< "expert-mode on;debug dev delay-inj 2 3"
  done

This command is only enabled when DEBUG_DEV is set at build time.
2023-03-09 14:01:58 +01:00
Christopher Faulet
15315d6c0a CLEANUP: stconn: Remove old read and write expiration dates
Old read and write expiration dates are no longer used. Thus we can safely
remove them.
2023-02-22 15:59:16 +01:00
Christopher Faulet
f8413cba2a MEDIUM: channel/stconn: Move rex/wex timer from the channel to the sedesc
These timers are related to the I/O. Thus it is logical to move them into
the SE descriptor. The patch is a bit huge but it is just a
replacement. However it is error-prone.

From the stconn or the stream, helper functions are used to get, set or
reset these timers. This simplifies timer manipulation.
2023-02-22 14:52:15 +01:00
Willy Tarreau
9debe0fb27 BUG/MEDIUM: debug/thread: make the debug handler not wait for !rdv_requests
The debug handler may deadlock with some threads waiting for isolation.
This may happen during a "show threads" command or even during a panic.
The reason is the call to thread_harmless_end() which waits for rdv_requests
to turn to zero before releasing its position in thread_dump_state,
while that one may not progress if another thread was interrupted in
thread_isolate() and is waiting for that thread to drop thread_dump_state.

In order to address this, we now use thread_harmless_end_sig() introduced
by previous commit:

   MINOR: threads: add a thread_harmless_end() version that doesn't wait

However there's a catch: since commit f7afdd910 ("MINOR: debug: mark
oneself harmless while waiting for threads to finish"), there's a second
pair of thread_harmless_now()/thread_harmless_end() that surround the
loop around thread_dump_state. Marking a thread harmless before this
loop and dropping that without checking rdv_requests there could break
the harmless promise made to the other thread if it returns first and
proceeds with its isolated work. Hence we just drop this pair which was
only preventive for other signal handlers, while as indicated in that
patch's commit message, other signals are handled asynchronously and do
not require that extra protection.

This fix must be backported to 2.7.

The problem can be seen by running "show threads" in fast loops (100/s)
while reloading haproxy very quickly (10/s) and sending lots of traffic
to it (100krps, 15 Gbps). In this case the soft stop calls pool_gc()
which isolates a lot and manages to race with the dumps after a few
tens of seconds, leaving the process with all threads at 100%.
2023-01-19 19:22:17 +01:00
Christopher Faulet
da89e9b95b MINOR: channel/applets: Stop to test CF_WRITE_ERROR flag if CF_SHUTW is enough
In applets, we stop processing when a write error (CF_WRITE_ERROR) or a shutdown
for writes (CF_SHUTW) is detected. However, any write error leads to an
immediate shutdown for writes. Thus, it is enough to only test if CF_SHUTW is
set.
2023-01-09 18:41:08 +01:00
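
Illustrated with made-up flag values (the real CF_* constants live in
HAProxy's channel header), the simplification boils down to dropping the
redundant bit from the test:

  #include <stdio.h>

  #define CF_WRITE_ERROR  0x00000800u    /* illustrative values only */
  #define CF_SHUTW        0x00080000u

  int main(void)
  {
      /* a write error always sets CF_SHUTW as well */
      unsigned int flags = CF_WRITE_ERROR | CF_SHUTW;

      /* before: test both bits; after: testing CF_SHUTW alone is sufficient */
      printf("old test: %d, new test: %d\n",
             !!(flags & (CF_WRITE_ERROR | CF_SHUTW)),
             !!(flags & CF_SHUTW));
      return 0;
  }
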
Willy Tarreau
b5662519df BUG/MINOR: debug: don't mask the TH_FL_STUCK flag before dumping threads
Commit f0c86ddfe ("BUG/MEDIUM: debug: fix parallel thread dumps again")
added a clearing of the TH_FL_STUCK flag before dumping threads in case
of parallel dumps, but that was in part a sort of workaround for some
remains of the commit that introduced the flag in 2.0 before the watchdog
existed, and which would set it after dumping a thread: e6a02fa65 ("MINOR:
threads: add a "stuck" flag to the thread_info struct"), and in part an
attempt to avoid that a thread waiting for too long during the dump would
get the flag set. But that is not possible, a thread waiting for being
dumped has the harmless bit set and doesn't get the stuck bit. What happens
in fact is that issuing "show threads" in fast loops ends up causing some
threads to keep their STUCK bit that was set at the end of "show threads",
and confuses the output.

The problem with doing this is that the flag is cleared before the thread
is dumped, and since this flag is used to decide whether to show a backtrace
or not, we don't get backtraces anymore of stuck threads since the commit
above in 2.7.

This patch just removes the two points where the flag was cleared by the
commit above. It should be backported to 2.7.
2023-01-02 09:51:35 +01:00
Willy Tarreau
b59e3f6045 MINOR: debug: add a balance of alloc - free at the end of the memstats dump
When digging into suspected memory leaks, it's cumbersome to count the
number of allocations and free calls. Here we're adding, at the end, a summary
of the sum of allocs minus the sum of frees, excluding realloc since we can't
know how much it releases upon each call. This means that when
doing many realloc+free the count may be negative but in practice there
are very few reallocs so that's not a problem. Also the size/call is signed
and corresponds to the average size allocated (e.g. leaked) per call.

It seems to work reasonably well for now:

  > debug dev memstats match buf
  quic_conn.c:2978       P_FREE  size:   1239547904  calls:     75656  size/call:  16384 buffer
  quic_conn.c:2960      P_ALLOC  size:   1239547904  calls:     75656  size/call:  16384 buffer
  mux_quic.c:393        P_ALLOC  size:   9112780800  calls:    556200  size/call:  16384 buffer
  mux_quic.c:383        P_ALLOC  size:  17783193600  calls:   1085400  size/call:  16384 buffer
  mux_quic.c:159         P_FREE  size:   8935833600  calls:    545400  size/call:  16384 buffer
  mux_quic.c:142         P_FREE  size:   9112780800  calls:    556200  size/call:  16384 buffer
  h3.c:776              P_ALLOC  size:   8935833600  calls:    545400  size/call:  16384 buffer
  quic_stream.c:166      P_FREE  size:    975241216  calls:     59524  size/call:  16384 buffer
  quic_stream.c:127      P_FREE  size:   7960592384  calls:    485876  size/call:  16384 buffer
  stream.c:772           P_FREE  size:      8798208  calls:       537  size/call:  16384 buffer
  stream.c:768           P_FREE  size:      2424832  calls:       148  size/call:  16384 buffer
  stream.c:751          P_ALLOC  size:   8852062208  calls:    540287  size/call:  16384 buffer
  stream.c:641           P_FREE  size:   8849162240  calls:    540110  size/call:  16384 buffer
  stream.c:640           P_FREE  size:   8847360000  calls:    540000  size/call:  16384 buffer
  channel.h:850         P_ALLOC  size:      2441216  calls:       149  size/call:  16384 buffer
  channel.h:850         P_ALLOC  size:      5914624  calls:       361  size/call:  16384 buffer
  dynbuf.c:55            P_FREE  size:        32768  calls:         2  size/call:  16384 buffer
  Total                 BALANCE  size:            0  calls:   5606906  size/call:      0 (excl. realloc)

Let's see how useful this becomes over time.
2022-12-01 16:12:21 +01:00
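
A simplified sketch of how such a balance can be computed while excluding
realloc; the record layout below is made up and is not the real struct
mem_stats:

  #include <stdio.h>
  #include <string.h>

  /* made-up, simplified record standing in for one memstats line */
  struct stat_line {
      const char *type;                  /* "P_ALLOC", "P_FREE", "REALLOC"... */
      long long size;
      long long calls;
  };

  int main(void)
  {
      const struct stat_line lines[] = {
          { "P_ALLOC", 1239547904, 75656 },
          { "P_FREE",  1239547904, 75656 },
          { "REALLOC",     409600,   100 },  /* excluded from the balance */
      };
      long long bal_size = 0, bal_calls = 0;

      for (unsigned i = 0; i < sizeof(lines) / sizeof(lines[0]); i++) {
          if (!strcmp(lines[i].type, "REALLOC"))
              continue;                  /* can't know how much it releases */
          int sign = strcmp(lines[i].type, "P_FREE") ? 1 : -1;
          bal_size  += sign * lines[i].size;
          bal_calls += lines[i].calls;
      }

      /* signed size/call: average size allocated (e.g. leaked) per call */
      printf("Total BALANCE size: %lld calls: %lld size/call: %lld (excl. realloc)\n",
             bal_size, bal_calls, bal_calls ? bal_size / bal_calls : 0);
      return 0;
  }
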
Willy Tarreau
e57fbed3c4 MINOR: debug: support pool filtering on "debug dev memstats"
Sometimes when debugging it's convenient to be able to focus only on
certain pools. Just like we did for "show pools", let's add a filter
based on a prefix on "debug dev memstats match <prefix>".
2022-12-01 16:12:21 +01:00
Willy Tarreau
111c78329e MINOR: debug: relax access restrictions on "debug dev hash" and "memstats"
These two have absolutely zero impact on the process and do not need to
be restricted to the expert mode. The first one calculates a string hash
that can be used by anyone when checking a dump; the second one may be
used by anyone tracking a memory leak, and is cumbersome to use due to
the "expert-mode on" that needs to be prepended. In addition this gives
bad habits to users and needlessly taints the process. So let's drop
this restriction for these two commands.
2022-11-30 17:58:00 +01:00
Willy Tarreau
50dd7e95c8 CLEANUP: anon: clarify the help message on "debug dev hash"
This command is used to hash a section name using the current anon key,
it was brought in 2.7 by commit 54966dffd ("MINOR: anon: store the
anonymizing key in the CLI's appctx"). However the help message only
says "return msg hashed" which is misleading because if anon mode is
not enabled, it returns the string as-is. Let's just mention this
condition in the help message, and also fix the alphabetical ordering
and alignment on the line.
2022-11-30 17:58:00 +01:00
Willy Tarreau
334d091b75 MINOR: debug: improve error handling on the memstats command parser
"debug dev memstats" supports various options but silently ignores the
unknown ones. Let's make sure it returns indications about what it
expects, as the help message is quite limited otherwise.
2022-11-30 17:24:29 +01:00
Erwan Le Goas
54966dffda MINOR: anon: store the anonymizing key in the CLI's appctx
In order to allow users to dump internal states using a specific key
without changing the global one, we're introducing a key in the CLI's
appctx. This key is preloaded from the global one when "set anon on"
is used (and if none exists, a random one is assigned). And the key
can optionally be assigned manually for the whole CLI session.

A "show anon" command was also added to show the anon state, and the
current key if the user has sufficient permissions. In addition, a
"debug dev hash" command was added to test the feature.
2022-09-17 11:27:09 +02:00
Willy Tarreau
d96d214b4c CLEANUP: debug: use struct ha_caller for memstat
The memstats code currently defines its own file/function/line number,
type and extra pointer. We don't need to keep them separate and we can
easily replace them all with just a struct ha_caller. Note that the
extra pointer could be converted to a pool ID stored into arg8 or
arg32 and be dropped as well, but this would first require to define
IDs for pools (which we currently do not have).
2022-09-08 14:19:15 +02:00
Willy Tarreau
04e50b3d32 CLEANUP: task: rename ->call_date to ->wake_date
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus it will now be called wake_date.

This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.
2022-09-08 14:19:15 +02:00
Willy Tarreau
4a426e2082 MINOR: debug/memstats: automatically determine first column size
The first column's width may vary a lot depending on outputs, and it's
annoying to have large empty columns for small names, or mangled output when
a column is not large enough. In order to overcome this, this
patch adds a width field to the memstats applet's context, and this
width is calculated the first time the function is entered, by estimating
the width of all lines that will be dumped. This is simple enough and
does the job well. If in the future some filtering criteria are added,
it will still be possible to perform a single pass on everything
depending on the desired output format.
2022-08-09 08:51:08 +02:00
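
A hedged sketch of the idea, not the applet code itself: one pass over the
entries measures the widest first column, a second pass prints with that
width:

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      const char *names[] = { "dynbuf.c:55", "quic_conn.c:2978", "h3.c:776" };
      size_t n = sizeof(names) / sizeof(names[0]), width = 0;

      /* first pass: estimate the width of every line that will be dumped */
      for (size_t i = 0; i < n; i++)
          if (strlen(names[i]) > width)
              width = strlen(names[i]);

      /* second pass: dump with the computed width, no oversized empty column */
      for (size_t i = 0; i < n; i++)
          printf("%-*s  P_FREE   size: ...\n", (int)width, names[i]);
      return 0;
  }
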
Willy Tarreau
17200dd1f3 MINOR: debug: also store the function name in struct mem_stats
The calling function name is now stored in the structure, and it's
reported when the "all" argument is passed. The first column is
significantly enlarged because some names are really wide :-(
2022-08-09 08:42:42 +02:00
Willy Tarreau
55c950baa9 MINOR: debug: store and report the pool's name in struct mem_stats
Let's add a generic "extra" pointer to the struct mem_stats to store
context-specific information. When tracing pool_alloc/pool_free, we
can now store a pointer to the pool, which allows to report the pool
name on an extra column. This significantly improves tracing
capabilities.

Example:

  proxy.c:1598      CALLOC   size:  28832  calls:  4     size/call:  7208
  dynbuf.c:55       P_FREE   size:  32768  calls:  2     size/call:  16384  buffer
  quic_tls.h:385    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:389    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:554    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:558    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:562    P_FREE   size:  34008  calls:  1417  size/call:  24     quic_tls_iv
  quic_tls.h:401    P_ALLOC  size:  34080  calls:  1420  size/call:  24     quic_tls_iv
  quic_tls.h:403    P_ALLOC  size:  34080  calls:  1420  size/call:  24     quic_tls_iv
  xprt_quic.c:4060  MALLOC   size:  45376  calls:  5672  size/call:  8
  quic_sock.c:328   P_ALLOC  size:  46440  calls:  215   size/call:  216    quic_dgram
2022-08-09 08:26:59 +02:00
Willy Tarreau
dadf00e226 DEBUG: cli: add a new "debug dev deadlock" expert command
This command will create the requested number of tasks competing on a
lock, resulting in triggering the watchdog and crashing the process.
This will help stress the watchdog and inspect the lock debugging parts.
2022-07-15 19:41:26 +02:00
Willy Tarreau
f0c86ddfe8 BUG/MEDIUM: debug: fix parallel thread dumps again
The previous attempt to fix thread dumps in commit 672972604 ("BUG/MEDIUM:
debug: fix possible hang when multiple threads dump at once") still had
some shortcomings. Sometimes parallel dumps are jerky essentially due to
the way that threads synchronize on startup and end. In addition the risk
of waiting forever for a stopped thread exists, and panics happening in
parallel to thread dumps are not more reliable either.

This commit revisits the state transitions so that all threads may request
a dump in parallel, that all of them wait for each other in the handler,
and that one thread is responsible for counting every other and checking
that the total matches the number of active threads.

Then for stopping there's a finishing phase that all threads wait for so
that none quits this area too early. Given that we now know the number of
participants to the dump, we can let them each decrement the counter when
leaving so that another dump may only start after the last participant
has completely left.

Now many thread dumps in parallel run fine, and so do panics. No
backport is needed as this was the result of the changes for thread
groups.
2022-07-15 19:41:26 +02:00
Willy Tarreau
55433f9b34 BUG/MINOR: debug: enter ha_panic() only once
Some panic dumps are mangled or truncated due to the watchdog firing at
the same time on multiple threads and calling ha_panic() simultaneously.
What may happen in this case is that the second one waits for the first
one to finish but as soon as it's done the second one resets the buffer
and dumps again, sometimes resetting the first one's dump. Also the first
one's abort() may trigger while the second one is currently dumping,
resulting in a full dump followed by a truncated one, leading to
confusion. Sometimes some lines appear in the middle of a dump as well.
It doesn't happen often and is easier to trigger by causing massive
deadlocks.

There's no reason for the process to resist to a panic, so we can safely
add a counter and do nothing on subsequent calls. Ideally we'd wait there
forever but as this may happen inside a signal handler (e.g. watchdog),
it doesn't always work, so the easiest thing to do is to return so that
the thread is interrupted as soon as possible and brought to the debug
handler to be dumped.

This should be backported, at least to 2.6 and possibly to older versions
as well.
2022-07-15 19:41:26 +02:00
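
A minimal sketch of the guard described above (the real ha_panic() does much
more): an atomic counter makes every caller after the first one return
immediately, so the late thread simply falls through to the debug handler:

  #include <stdatomic.h>
  #include <stdio.h>
  #include <stdlib.h>

  static atomic_uint panic_started;

  /* hypothetical panic entry point: only the first caller dumps and aborts,
   * later callers return immediately */
  static void my_panic(void)
  {
      if (atomic_fetch_add(&panic_started, 1) != 0)
          return;                        /* a panic is already in progress */

      fprintf(stderr, "thread dumps would go here...\n");
      abort();
  }

  int main(void)
  {
      my_panic();   /* in real life, two watchdog signals could race here */
      return 0;
  }
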
Willy Tarreau
52f238d326 BUG/MEDIUM: cli/threads: make "show threads" more robust on applets
Running several concurrent "show threads" in loops might occasionally
cause a segfault when trying to retrieve the stream from appctx_sc()
which may be null while the applet is finishing. It's not easy to
reproduce, it requires 3-5 sessions in parallel for about a minute
or so. The appctx_sc must be checked before passing it to sc_strm().

This must be backported to 2.6 which also has the bug.
2022-07-15 19:41:26 +02:00
Willy Tarreau
672972604f BUG/MEDIUM: debug: fix possible hang when multiple threads dump at once
A bug in the thread dumper was introduced by commit 00c27b50c ("MEDIUM:
debug: make the thread dumper not rely on a thread mask anymore"). If
two or more threads try to trigger a thread dump exactly at the same
time, the second one may loop indefinitely trying to set the value to 1
while the other ones will wait for it to finish dumping before leaving.
This is a consequence of a logic change using thread numbers instead of
a thread mask, as threads do not need to see all other ones there anymore.

No backport is needed, this is only for 2.7.
2022-07-13 09:03:02 +02:00
Willy Tarreau
89ed89e895 BUILD: debug: re-export thread_dump_state
Building with threads and without thread dump (e.g. macos, freebsd)
warns that thread_dump_state is unused. This happened in fact with
recent commit 1229ef312 ("MINOR: wdt: do not rely on threads_to_dump
anymore"). The solution would be to mark it unused, but on second
thought, it can be convenient to keep it exported to help
debug crashes, so let's export it again. It's just not referenced in
include files since it's not needed outside.
2022-07-01 21:18:03 +02:00
Willy Tarreau
039972b4e5 BUILD: debug: fix build issue on clang with previous commit
Since the thread_dump_state type changed to uint, the old value in the
CAS needs to be the same as well.
2022-07-01 19:37:42 +02:00
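
The underlying rule, as a small hedged sketch with C11 atomics (HAProxy uses
its own HA_ATOMIC_CAS wrapper): the "expected" object passed to a
compare-and-swap must have the same type as the atomic variable, here an
unsigned int:

  #include <stdatomic.h>
  #include <stdio.h>

  static _Atomic unsigned int thread_dump_state;

  int main(void)
  {
      /* the "expected" object must have the same type as the atomic variable
       * (unsigned int here); a mismatched type would make the call ill-formed */
      unsigned int old = 0;

      if (atomic_compare_exchange_strong(&thread_dump_state, &old, 1))
          printf("CAS succeeded, state is now %u\n",
                 atomic_load(&thread_dump_state));
      return 0;
  }
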
Willy Tarreau
00c27b50c0 MEDIUM: debug: make the thread dumper not rely on a thread mask anymore
The thread mask is too short to dump more than 64 threads. Thus here we're
using a different approach with two counters, one for the next thread ID
to dump (which always exists, as it's looked up), and the second one for
the number of threads done dumping. This allows dumping threads in ascending
order, then letting them wait for all others to be done, then leaving without
the risk of an overlapping dump until the done count is zero again.

This allows removing threads_to_dump, which was the last non-FD variable
using a global thread mask.
2022-07-01 19:31:39 +02:00
Willy Tarreau
1229ef312d MINOR: wdt: do not rely on threads_to_dump anymore
This flag is not needed anymore as we're already marking the waiting
threads as harmless, thus the thread's bit is already covered by this
information. The variable was unexported.
2022-07-01 19:26:35 +02:00
Willy Tarreau
f7afdd910b MINOR: debug: mark oneself harmless while waiting for threads to finish
The debug_handler() function waits for other threads to join, but does
not mark itself as harmless, so if at the same time another thread tries
to isolate, this may deadlock. In practice this does not happen as the
signal is received during epoll_wait() hence under harmless mode, but
it can possibly arrive under other conditions.

In order to improve this, while waiting for other threads to join, we're
now marking the current thread as harmless, as it's doing nothing but
waiting for the other ones. This way another harmless waiter will be able
to proceed. It's valid to do this since we're not doing anything else in
this loop.

One improvement could be to also check for the thread being idle and
marking it idle in addition to harmless, so that it can even release a
full isolation requester. But that really doesn't look worth it.
2022-07-01 19:26:35 +02:00
Willy Tarreau
a2b8ed4b44 MINOR: thread: add is_thread_harmless() to know if a thread already is harmless
The harmless status is not re-entrant, so sometimes for signal handling
it can be useful to know if we're already harmless or not. Let's add a
function doing that, and make the debugger use it instead of manipulating
the harmless mask.
2022-07-01 19:26:35 +02:00
Willy Tarreau
03f9b35114 MEDIUM: tinfo: add a dynamic thread-group context
The thread group info is not sufficient to represent a thread group's
current state as it's read-only. We also need something comparable to
the thread context to represent the aggregate state of the threads in
that group. This patch introduces ha_tgroup_ctx[] and tg_ctx for this.
It's indexed on the group id and must be cache-line aligned. The thread
masks that were global and that do not need to remain global were moved
there (want_rdv, harmless, idle).

Given that all the masks placed there now become group-specific, the
associated thread mask (tid_bit) now switches to the thread's local
bit (ltid_bit). Both are the same for nbtgroups 1 but will differ for
other values.

There's also a tg_ctx pointer in the thread so that it can be reached
from other threads.
2022-07-01 19:15:15 +02:00
Willy Tarreau
38d0712748 MINOR: debug: use ltid_bit in ha_thread_dump()
Since commit cc7a11ee3 ("MINOR: threads: set the tid, ltid and their bit
in thread_cfg") we ought not use (1UL << thr) to get the group mask for
thread <thr>, but (ha_thread_info[thr].ltid_bit). ha_thread_dump() needs
this.
2022-07-01 19:15:14 +02:00
Willy Tarreau
66ad98a772 MINOR: tinfo: add the tgid to the thread_info struct
At several places we're dereferencing the thread group just to catch
the group number, and this will become even more required once we start
to use per-group contexts. Let's just add the tgid in the thread_info
struct to make this easier.
2022-07-01 19:15:14 +02:00
Willy Tarreau
e7475c8e79 MEDIUM: tasks/fd: replace sleeping_thread_mask with a TH_FL_SLEEPING flag
Every single place where sleeping_thread_mask was still used was to test
or set a single thread. We can now add a per-thread flag to indicate a
thread is sleeping, and remove this shared mask.

The wake_thread() function now always performs an atomic fetch-and-or
instead of a first load then an atomic OR. That's cleaner and more
reliable.

This is not easy to test, as broadcast FD events are rare. The good
way to test for this is to run a very low rate-limited frontend with
a listener that listens to the fewest possible threads (2), and to
send it only 1 connection at a time. The listener will periodically
pause and the wakeup task will sometimes wake up on a random thread
and will call wake_thread():

   frontend test
        bind :8888 maxconn 10 thread 1-2
        rate-limit sessions 5

Alternately, disabling/enabling a frontend in loops via the CLI also
broadcasts such events, but they're more difficult to observe since
this is causing connection failures.
2022-07-01 19:15:14 +02:00
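
The difference between the two forms, as a generic sketch with C11 atomics
(HAProxy uses its own HA_ATOMIC_* wrappers and real TH_FL_* flags): a single
fetch-and-or is one indivisible read-modify-write whose return value tells
whether the bit was already set, while a first load followed by a separate
atomic OR leaves a window during which the flags may change:

  #include <stdatomic.h>
  #include <stdio.h>

  #define FLAG_WAKE_PENDING  0x01u   /* hypothetical flag, not a real TH_FL_* */

  static atomic_uint th_flags;

  int main(void)
  {
      /* one indivisible read-modify-write: set the bit and learn from the
       * returned previous value whether it was already set */
      unsigned int prev = atomic_fetch_or(&th_flags, FLAG_WAKE_PENDING);

      if (!(prev & FLAG_WAKE_PENDING))
          printf("we set the flag first: wake the target thread\n");
      else
          printf("already set by someone else: nothing to do\n");

      /* with a first load followed by a separate atomic OR, the flags could
       * change between the two operations and the decision would be stale */
      return 0;
  }
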
Willy Tarreau
bdcd32598f MINOR: thread: only use atomic ops to touch the flags
The thread flags are touched a little bit by other threads, e.g. the STUCK
flag may be set by other ones, and they're watched a little bit. As such
we need to use atomic ops only to manipulate them. Most places were already
using them, but here we generalize the practice. Only ha_thread_dump() does
not change because it's run under isolation.
2022-07-01 19:15:14 +02:00
Willy Tarreau
c958c70ec8 MINOR: task: replace global_tasks_mask with a check for tree's emptiness
This bit field used to be a per-thread cache of the result of the last
lookup of the presence of a task for each thread in the shared cache.
Since we now know that each thread has its own shared cache, a test of
emptiness is now sufficient to decide whether or not the shared tree
has a task for the current thread. Let's just remove this mask.
2022-07-01 19:15:14 +02:00