haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-14 02:57:01 +02:00

Author	SHA1	Message	Date
Ilya Shipitsin	1f6e5f7a61	CLEANUP: assorted typo fixes in the code and comments This is 43rd iteration of typo fixes	2024-09-03 17:49:21 +02:00
Valentine Krasnobaeva	e8799d2880	MINOR: debug: keep runtime limits in postmortem It's useful to keep runtime limits (fd and RAM) in postmortem and show them in debug_parse_cli_show_dev(). Runtime limits are fed in feed_post_mortem_late(), as we are sure that at this moment all configuration was parsed and all applied limits were already adjusted.	2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva	3abd03aa78	MINOR: debug: prepare to show runtime limits This is a preparation patch to extend postmortem in order to store runtime limits. No need to perform getrlimit() in feed_post_mortem(), as we do this in the very beginning of main() and we store initial fd limits in global 'rlim_fd_cur_at_boot' and 'rlim_fd_max_at_boot' variables.	2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva	665dde6481	MINOR: debug: use LIM2A to show limits It is more handy to use LIM2A in debug_parse_cli_show_dev(), as it allows showing a custom string ("unlimited") if a given limit value equals 0. The normalize_rlim() handler is needed to properly convert RLIM_INFINITY to zero, with respect to type sizes, as rlim_t is always 4 bytes on 32bit and 64bit arch.	2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva	93cc7df276	MINOR: debug: keep runtime capabilities in post_mortem Let's extend postmortem to keep process runtime capabilities. This information is gathered in feed_post_mortem_late(), as it is called just before run_poll_loop() and we are sure at this moment, that all configuration settings were successfully applied.	2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva	baa4e1cf39	MINOR: debug: store runtime uid/gid in postmortem Let's extend post_mortem to store runtime process uid and gid. This information is fed in feed_post_mortem_late(), just before calling run_poll_loop(). Like this we are sure that all configuration settings were successfully applied.	2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva	ac8bd679dc	CLEANUP: debug: fix indents in debug_parse_cli_show_dev Fix indents in debug_parse_cli_show_dev() to avoid useless conflicts in case of future changes in this function or git-bisect.	2024-07-16 14:04:41 +02:00
Valentine Krasnobaeva	7cdf5751b5	MINOR: debug: prepare feed_post_mortem_late Process runtime information could be very useful in post_mortem, but we have to collect it just before calling run_poll_loop(). Like this we are sure that we've successfully applied all configuration parameters and that what we've collected are the latest runtime settings. The most appropriate place to collect such information is feed_post_mortem_late(). It's called in each thread, but puts thread info in the post_mortem only when it's in the last thread context. As it's called under mutex lock, other threads at this moment have to wait until feed_post_mortem_late() and the other initialization functions from per_thread_init_list finish. The number of threads could be large. So, to avoid spending a lot of time under the lock, let's exit immediately from feed_post_mortem_late() if it wasn't called in the last thread.	2024-07-16 14:04:41 +02:00
Willy Tarreau	e0e2b66132	BUG/MEDIUM: debug/cli: fix "show threads" crashing with low thread counts The "show threads" command introduced early in the 2.0 dev cycle uses appctx->st1 to store its context (the number of the next thread to dump). It goes back to an era where contexts were shared between the various applets and the CLI's command handlers. In fact it was already not good by then because st1 could possibly have APPCTX_CLI_ST1_PAYLOAD (2) in it, that would make the dump start at thread 2, though it was extremely unlikely. When contexts were finally cleaned up and moved to their own storage, this one was overlooked, maybe due to using st1 instead of st2 like most others. So it continues to rely on st1, and more recently some new flags were appended, one of which is APPCTX_CLI_ST1_LASTCMD (16) and is always there. This results in "show threads" believing it must start to dump from thread 16, and if this thread is not present, it can simply crash the process. A tiny reproducer is: global nbthread 1 stats socket /tmp/sock1 level admin mode 666 $ socat /tmp/sock1 - <<< "show threads" The fix for modern versions simply consists in assigning a context to this command from the applet storage. We're using a single int, no need for a struct, an int* will do it. That's valid till 2.6. Prior to 2.6, better switch to appctx->ctx.cli.i0 or i1 which are all properly initialized before the command is executed. This must be backported to all stable versions. Thanks to Andjelko Horvat for the report and the reproducer.	2024-07-16 11:35:06 +02:00
Willy Tarreau	8f204fa8ae	MINOR: debug: print gdb hints when crashing To make bug reporting easier for users, when crashing, let's suggest what to do. Typically when a BUG_ON() matches, only the current thread is useful the vast majority of the time, while when the watchdog triggers, all threads are interesting. The messages are printed at the end after the dump. We may adjust these with wiki links in the future if more detailed instructions are relevant.	2024-06-26 07:43:00 +02:00
Valentine Krasnobaeva	2cd52a88be	MINOR: cli/debug: show dev: show capabilities If haproxy is compiled with Linux capabilities support, let's show process capabilities before applying the configuration and at runtime in 'show dev' command output. This may be useful for debugging purposes, especially in cases when the process changes its UID and GID to non-privileged ones, or when it has started and runs under a non-privileged UID and the needed capabilities are set by the admin on the haproxy binary.	2024-06-26 07:38:21 +02:00
Valentine Krasnobaeva	0d79c9bedf	MINOR: cli/debug: show dev: add cmdline and version 'show dev' command is very convenient to obtain haproxy debugging information while the process is run in a container. Let's extend its output with version and cmdline. cmdline is useful as it shows the absolute binary path and its arguments, because sometimes the person who is debugging a failing container is not the same one who created and deployed it. argc and argv are stored in the exported global structure, because feed_post_mortem() is added as a post check function callback in the post_check_list. So we can't simply change the signature of feed_post_mortem() without breaking other post check callbacks APIs. Parsers are not supposed to modify argv, so we can safely pass its pointer to debug_parse_cli_show_dev(), without copying all argument strings somewhere in the heap or on stack.	2024-06-26 07:38:21 +02:00
Valentine Krasnobaeva	865db6307f	MINOR: init: use RLIMIT_DATA instead of RLIMIT_AS Limiting the total allocatable process memory (VSZ) by setting the RLIMIT_AS limit is no longer an effective way to restrict memory consumption at run time. We can see from the process memory map below that there are many holes within the process VA space, which bump its VSZ to 1.5G. These holes are there for many reasons and could be explained first by the full randomization of system VA space, which is now usually enabled in Linux kernels by default. There are always gaps around the process stack area to trap overflows. Holes before and after shared libraries could be explained by the fact that on many architectures libraries have a 'preferred' address to be loaded at; putting them elsewhere requires relocation work, and probably some unshared pages. The repetitive holes of 65380K most probably correspond to the header that malloc has to allocate before each requested memory block. This header is used by malloc to link allocated chunks together and for its internal book keeping. 
$ sudo pmap -x -p `pidof haproxy` 127136: ./haproxy -f /home/haproxy/haproxy/haproxy_h2.cfg Address Kbytes RSS Dirty Mode Mapping 0000555555554000 388 64 0 r---- /home/haproxy/haproxy/haproxy 00005555555b5000 2608 1216 0 r-x-- /home/haproxy/haproxy/haproxy 0000555555841000 916 64 0 r---- /home/haproxy/haproxy/haproxy 0000555555926000 60 60 60 r---- /home/haproxy/haproxy/haproxy 0000555555935000 116 116 116 rw--- /home/haproxy/haproxy/haproxy 0000555555952000 7872 5236 5236 rw--- [ anon ] 00007fff98000000 156 36 36 rw--- [ anon ] 00007fff98027000 65380 0 0 ----- [ anon ] 00007fffa0000000 156 36 36 rw--- [ anon ] 00007fffa0027000 65380 0 0 ----- [ anon ] 00007fffa4000000 156 36 36 rw--- [ anon ] 00007fffa4027000 65380 0 0 ----- [ anon ] 00007fffa8000000 156 36 36 rw--- [ anon ] 00007fffa8027000 65380 0 0 ----- [ anon ] 00007fffac000000 156 36 36 rw--- [ anon ] 00007fffac027000 65380 0 0 ----- [ anon ] 00007fffb0000000 156 36 36 rw--- [ anon ] 00007fffb0027000 65380 0 0 ----- [ anon ] ... 00007ffff7fce000 4 4 0 r-x-- [ anon ] 00007ffff7fcf000 4 4 0 r---- /usr/lib/x86_64-linux-gnu/ld-2.31.so 00007ffff7fd0000 140 140 0 r-x-- /usr/lib/x86_64-linux-gnu/ld-2.31.so ... 00007ffff7ffe000 4 4 4 rw--- [ anon ] 00007ffffffde000 132 20 20 rw--- [ stack ] ffffffffff600000 4 0 0 --x-- [ anon ] ---------------- ------- ------- ------- total kB 1499288 75504 72760 This excessive VSZ makes it impossible to start an haproxy process with a 200M memory limit, set at its initialization stage as RLIMIT_AS. In this case we usually get this cryptic output on stderr: $ haproxy -m 200 -f haproxy_quic.cfg (null)(null)(null)(null)(null)(null) At the same time the process RSS (the memory really used) is only 75.5M. So to make process memory accounting more realistic let's base the memory limit, set by the -m option, on RSS measurement and let's use RLIMIT_DATA instead of RLIMIT_AS. 
RLIMIT_AS was used before because earlier versions of haproxy always allocated memory buffers for new connections, but data were not written there immediately. So these buffers were not instantly counted in RSS, but were always counted in VSZ. Now we allocate new buffers only when we are about to write some data there immediately, so using RLIMIT_DATA becomes more appropriate.	2024-04-19 17:36:40 +02:00
Amaury Denoyelle	da03396bb3	BUG/BUILD: debug: fix unused variable error A compilation error occurs when using DEBUG_MEM_STATS due to a variable now being unused in debug_iohandler_memstats() : src/debug.c: In function ‘debug_iohandler_memstats’: src/debug.c:1862:24: error: unused variable ‘sc’ [-Werror=unused-variable] 1862 \| struct stconn *sc = appctx_sc(appctx); \| ^~ This is caused since the following commit : `94b8ed446f` MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands This must not be backported.	2024-03-29 17:21:04 +01:00
Christopher Faulet	94b8ed446f	MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands The main CLI I/O handler is responsible for interrupting the processing on shutdown/abort. It is not the responsibility of the I/O handler of CLI commands to take care of it.	2024-03-28 17:28:20 +01:00
Willy Tarreau	5df0df96dd	MINOR: debug: add "debug dev trace" to flood with traces This new command, enabled only with "DEBUG_DEV", sends 2 or 20 traces per task wakeup (depending on the verbosity level), and stops after 1M wakeups per thread in order not to have to stop/start the process each time it's fired. We have two small messages and 18 larger ones from 20 to 270 bytes each, so that the average size is approx 213 bytes counting headers (the header adds approx 82 bytes), which matches what's generally observed on average when traces are enabled in all muxes. Typical figures show variations between 5.7M and 6.2M msg/s on an EPYC in a 3C6T setup (single CCX), and 2.12M - 2.22M in a 24C48T setup (across 8 CCX, with 8 thread groups).	2024-03-25 17:32:22 +00:00
Aurelien DARRAGON	07b2e84bce	BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread (2nd try) While trying to reproduce another crash case involving lua filters reported by @bgrooot on GH #2467, we found out that mixing filters loaded from different contexts ('lua-load' vs 'lua-load-per-thread') for the same stream isn't supported and may even cause the process to crash. Historically, mixing lua-load and lua-load-per-thread for a stream wasn't supported, but this changed thanks to `0913386` ("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread"). However, the above fix didn't consider the lua filters' use-case properly: unlike lua fetches, actions or even services, lua filters don't simply use the stream hlua context as a "temporary" hlua running context to process some hlua code. For fetches, actions.. hlua executions are processed sequentially, so we simply reuse the hlua context from the previous action/fetch to run the next one (this allows to bypass memory allocations and initialization, thus it increases performance), unless we need to run on a different hlua state-id, in which case we perform a reset of the hlua context. But this cannot work with filters: indeed, once registered, a filter will last for the whole stream duration. It means that the filter will rely on the stream hlua context from ->attach() to ->detach(). And here is the catch, if for the same stream we register 2 lua filters from different contexts ('lua-load' + 'lua-load-per-thread'), then we have an issue, because the hlua stream will be re-created each time we switch between runtime contexts, which means each time we switch between the filters (may happen for each stream processing step), and since lua filters rely on the stream hlua to carry context between filtering steps, this context will be lost upon a switch. 
Given that lua filters code was not designed with that in mind, it would confuse the code and cause unexpected behaviors ranging from lua errors to crashing process. So here we take another approach: instead of re-creating the stream hlua context each time we switch between "global" and "per-thread" runtime context, let's have both of them inside the stream directly as initially suggested by Christopher back then when we talked about the original issue. For this we leverage hlua_stream_ctx_prepare() and hlua_stream_ctx_get() helper functions which return the proper hlua context for a given stream and state_id combination. As for debugging infos reported after ha_panic(), we check for both hlua runtime contexts to check if one of them was active when the panic occurred (only 1 runtime ctx per stream may be active at a given time). This should be backported to all stable versions with `0913386` ("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread") This commit depends on: - "DEBUG: lua: precisely identify if stream is stuck inside lua or not" [for versions < 2.9 the ha_thread_dump_one() part should be skipped] - "MINOR: hlua: use accessors for stream hlua ctx" For 2.4, the filters API didn't exist. However it may be a good idea to backport it anyway because ->set_priv()/->get_priv() from tcp/http lua applets may also be affected by this bug, plus it will ease code maintenance. Of course, filters-related parts should be skipped in this case.	2024-03-13 09:24:46 +01:00
Aurelien DARRAGON	1a2cdf64c9	DEBUG: lua: precisely identify if stream is stuck inside lua or not When ha_panic() is called by the watchdog, we try to guess from ha_task_dump() and ha_thread_dump_one() if the thread was stuck while executing lua from the stream context. However we consider this is the case by simply checking if the stream hlua context was set, but this is not very precise because if the hlua context is set, then it simply means that at least one lua instruction was executed at the stream level, not that the stuck thread was currently executing lua when the panic occurred. This is especially true with filters, one could simply register a lua filter that does nothing but this will still end up initializing the stream hlua context for each stream. If the thread ends up being stuck during the stream handling, then debug dumping functions will report that the stream was stuck while handling lua, which is not necessarily true, and could in fact confuse us even more. So here we take another approach, we add the BUSY flag to hlua context: this flag is set by hlua_ctx_resume() around lua_resume() call, this way we can precisely tell if the thread was handling lua when it was interrupted, and we rely on this flag in debug functions to check if the thread was effectively stuck inside lua or not while processing the stream. No backport needed unless a commit depends on it.	2024-03-13 09:24:46 +01:00
Willy Tarreau	ab8928b9db	BUILD: address a few remaining calloc(size, n) cases In issue #2427 Ilya reports that gcc-14 rightfully complains about sizeof() being placed in the left term of calloc(). There's no impact but it's a bad pattern that gets copy-pasted over time. Let's fix the few remaining occurrences (debug.c, halog, udp-perturb). This can be backported to all branches, and the irrelevant parts dropped.	2024-02-10 11:37:27 +01:00
Willy Tarreau	7cba015c85	DEBUG: make the "debug dev {debug\|warn\|check}" command print a message In order to test the new message output capability, these commands will now explicitly mention that the bug was triggered from the CLI.	2024-02-05 17:09:00 +01:00
Willy Tarreau	9d869b10de	MINOR: debug: add features and build options to "show dev" The "show dev" CLI command is still missing useful elements such as the build options, SSL version etc. Let's just add the build features and the build options there so that it's possible to collect all of this from a running process without having to start the executable with -vv. This is still dumped all at once from the parsing function since the output is small. If it were to grow, this would possibly require to be reworked to support a context. It might be helpful to backport this to 2.9 since it can help narrow down certain issues.	2024-01-02 11:44:42 +01:00
Ilya Shipitsin	80813cdd2a	CLEANUP: assorted typo fixes in the code and comments This is 37th iteration of typo fixes	2023-11-23 16:23:14 +01:00
Willy Tarreau	6455fd5024	MINOR: debug: add the ability to enter components in the post_mortem struct Here the idea is to collect components' versions and build options. The main component is haproxy, but the API is made so that any sub-system can easily add a component there (for example the detailed version of a device detection lib, or some info about a lib loaded from Lua). The elements are stored as a pointer to an array of structs and its count so that it's sufficient to issue this in gdb to list them all at once: print *post_mortem.components@post_mortem.nb_components For now we collect name, version, toolchain, toolchain options, build options and path. Maybe more could be useful in the future.	2023-11-23 15:39:21 +01:00
Willy Tarreau	a88a3482b5	MINOR: debug: dump the mapping of the libs into post_mortem Having the libs and their addresses listed in the post_mortem struct is also helpful. Sometimes it helps notice that one version is not the expected one, e.g. due to some LD_LIBRARY_PATH. We don't emit it on "show dev" however since that's already available via "show libs".	2023-11-23 15:39:21 +01:00
Willy Tarreau	37e3dd718c	MINOR: debug: copy the thread info into the post_mortem struct The last starting thread now copies the pthread ID and stack top of each thread into post_mortem. That way it's as easy as issuing "p post_mortem" in gdb to see all thread IDs and stack frames and more easily map them to the threads met in a core.	2023-11-23 15:39:21 +01:00
Willy Tarreau	c0eec3a4aa	MINOR: debug: collect some boot-time info related to the process Here we collect the original uid/gid/rlimits for FD and RAM since these ones do affect behavior and are sometimes different from expected in containers or when starting as a service.	2023-11-23 15:39:21 +01:00
Willy Tarreau	ff9e06cd53	MINOR: debug: report any detected hypervisor in post_mortem When the x86 CPU flags show the "hypervisor" flag, we know we're running inside QEMU, VMware or possibly other flavors of hypervisors. In this case we'll report either "qemu", "vmware" or "yes" for other ones in the "virt_techno" field, based on the DMI hardware vendor name, otherwise "no" when the flag is not found.	2023-11-23 15:39:21 +01:00
Willy Tarreau	0cc799bdd1	MINOR: debug: detect CPU model and store it in post_mortem The CPU model and type has significant impact on certain bugs, such as contention issues caused by CPUs having split L3 caches, or stricter memory models that exhibit some barrier issues. It's complicated though because the info about the model depends on the arch. For example, x86 reports an SKU name while ARM rather reports the CPU core types, families and versions for each CPU core. There, the SoC will sometimes be reported in the device tree or DMI info instead. But we don't really care, it's essentially useful to know if the code is running on an armv8.0 such as A53, a 8.2 such as A55/A76/Neoverse etc. For MIPS the model appears to generally be there, and in addition the SoC is often present in the "system type" field before the first CPU, and the type of machine in the "machine" field, to replace the missing DMI and DT, so they are also collected. Note that only the first CPU is checked and reported, that's expected to be vastly sufficient, since we're just trying to spot known incompatibilities or issues.	2023-11-23 15:39:21 +01:00
Willy Tarreau	2974f3e71b	MINOR: debug: report in post_mortem if the container techno used is docker If we detect we're running inside a container on Linux, let's check if it seems to be docker. Docker usually creates a /.dockerenv file, which is easy to check. It's uncertain whether it's always the case, but on the few tested instances that was true, and we don't really care, what matters is to place helpful debugging info for developers. When this file is detected, we report "docker" instead of "yes" in the container techno.	2023-11-23 15:39:21 +01:00
Willy Tarreau	cf8be50a3d	MINOR: debug: report in post_mortem whether a container was detected Containers often cause significant trouble depending on how they're set up, and they're not always trivial for their users to extract info from. Here we're trying to detect if we're running inside a container on Linux. There are plenty of approaches and none is perfectly clean nor reliable, which makes sense since the goal is to remain transparent enough. One interesting approach is to rely on the observation that containers generally do not expose most kernel threads, and that the very first of them are extremely stable across all kernel versions: pid 2 was called "keventd" in kernel 2.4, became "kthreadd" in kernel 2.6, and has since not changed. This is true on all architectures tested, even with highly stripped down kernels such as those found on 15 year-old OpenWRT images. And this one doesn't appear inside containers. Thus here we check if we find such a thread via /proc and whether it's called keventd or kthreadd, to detect a container, and we set the "cont_techno" variable to "yes" or "no" depending on what is found.	2023-11-23 15:39:21 +01:00
Willy Tarreau	4e3f9921de	MINOR: debug: add OS/hardware info to the post_mortem struct Let's extract some info about the system (board model, vendor etc), this will indicate some hypervisors, some cloud instances or some uncommon embedded boards etc. Typically, vmware, qemu and raspberry-pi are visible here and can help during the troubleshooting session.	2023-11-23 15:39:21 +01:00
Willy Tarreau	0184597522	MINOR: debug: start to create a new struct post_mortem The goal here is to accumulate precious debugging information in a struct that is easy to find in memory. It's aligned to 256-byte as it also helps. We'll progressively add a lot of info about the startup conditions, the operating system, the hardware and hypervisor so as to limit the number of round trips between developers and users during debugging sessions. Also, opening a core file with a hex editor should often be sufficient to extract most of the info. In addition, a new "show dev" command will show this information so that it can be checked at runtime without having to wait for a crash (e.g. if a limit is bad in a container, better know it early). For now the struct only contains utsname that's fed at boot time.	2023-11-23 15:39:21 +01:00
Willy Tarreau	96bb99a87d	DEBUG: pools: detect that malloc_trim() is in progress Now when calling ha_panic() with a thread still under malloc_trim(), we'll set a new tainted flag to easily report it, and the output trace will report that this condition happened and will suggest to use no-memory-trimming to avoid it in the future.	2023-10-25 15:48:02 +02:00
Willy Tarreau	26a6481f00	DEBUG: lua: add tainted flags for stuck Lua contexts William suggested that since we can detect the presence of Lua in the stack, let's combine it with stuck detection to set a new pair of flags indicating a stuck Lua context and a stuck Lua shared context. Now, executing an infinite loop in a Lua sample fetch function with yield disabled crashes with tainted=0xe40 if loaded from a lua-load statement, or tainted=0x640 from a lua-load-per-thread statement. In addition, at the end of the panic dump, we can check if Lua was seen stuck and emit recommendations about lua-load-per-thread and the choice of dependencies depending on the presence of threads and/or shared context.	2023-10-25 15:48:02 +02:00
Willy Tarreau	46bbb3a33b	DEBUG: add a tainted flag when ha_panic() is called This will make it easier to know that the panic function was called, for the occasional case where the dump crashes and/or the stack is corrupted and not much exploitable. Now at least it will be sufficient to check the tainted value to know that someone called ha_panic(), and it will also be usable to condition extra analysis.	2023-10-25 15:48:02 +02:00
Willy Tarreau	b3dcd59f8d	MINOR: stream: fix output alignment of stuck thread dumps Since commit `c185bc465` ("MEDIUM: stream: now provide full stream dumps in case of loops"), the stuck threads show the stream's pointer in the margin since it appears immediately after a line feed. Let's add it after the prefix and "stream=" to make the output more readable.	2023-09-29 16:43:07 +02:00
Willy Tarreau	feff6296a1	MINOR: debug: use the more detailed stream dump in panics Similarly upon a panic we'd like to have a more detailed dump of a stream's state, so let's use the full dump function for this now.	2023-09-29 09:20:27 +02:00
Willy Tarreau	5743eeea88	MINOR: stream: make stream_dump() always multi-line There used to be two working modes for this function, a single-line one and a multi-line one, the difference being made on the "eol" argument which could contain either a space or an LF (and with the prefix being adjusted accordingly). Let's get rid of the single-line mode as it's what limits the output contents because it's difficult to produce exploitable structured data this way. It was only used in the rare case of spinning streams and applets and these are the ones lacking info. Now a spinning stream produces: [ALERT] (3511) : A bogus STREAM [0x227e7b0] is spinning at 5581202 calls per second and refuses to die, aborting now! Please report this error to developers: strm=0x227e7b0,c4a src=127.0.0.1 fe=public be=public dst=s1 txn=0x2041650,3000 txn.req=MSG_DONE,4c txn.rsp=MSG_RPBEFORE,0 rqf=1840000 rqa=8000 rpf=80000000 rpa=1400000 scf=0x24af280,EST,482 scb=0x24af430,EST,1411 af=(nil),0 sab=(nil),0 cof=0x7fdb28026630,300:H1(0x24a6f60)/RAW((nil))/tcpv4(33) cob=0x23199f0,10000300:H1(0x24af630)/RAW((nil))/tcpv4(32) filters={} call trace(11): (...)	2023-09-29 09:20:27 +02:00
Willy Tarreau	1f60133bfb	BUILD: debug: avoid a build warning related to epoll_wait() in debug code Ilya reported in issue #2193 that the latest Fedora complains about us passing NULL to epoll_wait() in the "debug dev fd" code to try to detect an epoll FD. That was intentional to get the kernel's verifications and make sure we're facing a poller, but since such a warning comes from the libc, it's possible that it plans to replace the syscall with a wrapper in the near future (e.g. epoll_pwait()), and that just hiding the NULL (as was confirmed to work) might just postpone the problem. Let's take another approach, instead we'll create a new dummy FD that we'll try to remove from the epoll set using epoll_ctl(). Since we created the FD we're certain it cannot be there. In this case (and only in this case) epoll_ctl() will return -ENOENT, otherwise it will typically return EINVAL or EBADF. It was verified that it works and doesn't return false positives for other FD types. It should be backported to the branches that contain a backport of the commit which introduced the feature, apparently as far as 2.4: `5be7c198e` ("DEBUG: cli: add a new "debug dev fd" expert command")	2023-07-02 11:01:37 +02:00
Aurelien DARRAGON	b6a24a52a2	BUG/MINOR: debug: fix pointer check in debug_parse_cli_task() Task pointer check in debug_parse_cli_task() computes the theoretical end address of the provided task pointer to check if it is valid or not thanks to the may_access() helper function. However, the relative ending address is calculated by adding the task size to the 't' pointer (which is a struct task pointer), thus it results in an incorrect address since the compiler automatically translates 't + x' to 't + x * sizeof(*t)' internally (with sizeof(*t) != 1 here). Solving the issue by using 'ptr' (which is the void * raw address) as starting address to prevent automatic address scaling. This was revealed by coverity, see GH #2157. No backport is needed, unless `9867987` ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets") gets backported.	2023-05-17 16:49:17 +02:00
Willy Tarreau	94df1b57ee	BUILD: debug: fix build issue on 32-bit platforms in "debug dev task" Commit `986798718` ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets") caused a build failure on 32-bit platforms when parsing the task's pointer. Let's use strtoul() and not strtoll(). No backport is needed, unless the commit above gets backported.	2023-05-12 04:40:06 +02:00
Willy Tarreau	95e6c9999a	BUILD: debug: do not check the isolated_thread variable in non-threaded builds The build without thread support was broken by commit `b30ced3d8` ("BUG/MINOR: debug: fix incorrect profiling status reporting in show threads") because it accesses the isolated_thread variable that is not defined when threads are disabled. In fact both the test on harmless and this one make no sense without threads, so let's comment out the block and mark the related variables as unused. This may have to be backported to 2.7 if the commit above is.	2023-05-07 15:02:30 +02:00
Willy Tarreau	e69919d1ba	CLEANUP: debug: remove the now unused ha_thread_dump_all_to_trash() The function isn't used anymore since each call place performs its own loop. Let's get rid of it.	2023-05-04 19:19:04 +02:00
Willy Tarreau	009b5519e6	MINOR: debug: make "show threads" properly iterate over all threads Previously it would re-dump all threads to the same trash if the output buffer was full, which it never was since the trash is of the same size. Now it dumps one thread, copies it to the buffer and yields until it can continue. Showing 256 threads works as expected.	2023-05-04 19:15:50 +02:00
Willy Tarreau	880d1684a7	MINOR: debug: write panic dump to stderr one thread at a time Currently large setups cannot dump all their threads because they're first dumped to the trash buffer, then copied to stderr. Here we can now change this, instead we dump one thread at a time into the trash and immediately send it to stderr. We also keep a copy into a local trash chunk that's assigned to thread_dump_buffer so that a core file still contains a copy of a large number of threads, which is generally sufficient for the vast majority of situations. It was verified that dumping 256 threads now produces ~55kB of output and all of them are properly dumped.	2023-05-04 19:15:50 +02:00
Willy Tarreau	9a6ecbd590	MEDIUM: debug: simplify the thread dump mechanism The thread dump mechanism that is used by "show threads" and by the panic dump is overly complicated due to an initial misdesign. It first wakes all threads, then serializes their dumps, then releases them, while taking extreme care not to face colliding dumps. In fact this is not what we need and it reached a limit where big machines cannot dump all their threads anymore due to buffer size limitations. What is needed instead is to be able to dump one thread, and to let the requester iterate on all threads. That's what this patch does. It adds the thread_dump_buffer to the struct thread_ctx so that the requester offers the buffer to the thread that is about to be dumped. This buffer also serves as a lock. A thread at rest has a NULL, a valid pointer indicates the thread is using it, and 0x1 (NULL+1) is used by the dumped thread to tell the requester it's done. This makes sure that a given thread is dumped once at a time. In addition to this, the calling thread decides whether it accesses the thread by itself or via the debug signal handler, in order to get a backtrace. This is much saner because the calling thread is free to do whatever it wants with the buffer after each thread is dumped, and there is no dependency between threads, once they've dumped, they're free to continue (and possibly to dump for another requester if needed). Finally, when the THREAD_DUMP feature is disabled and the debug signal is not used, the requester accesses the thread by itself like before. For now we still have the buffer size limitation but it will be addressed in future patches.	2023-05-04 19:15:44 +02:00
Willy Tarreau	cb01f5daa7	BUG/MINOR: debug: do not emit empty lines in thread dumps In 2.3, commit `471425f51` ("BUG/MINOR: debug: Don't dump the lua stack if it is not initialized") introduced the possibility to emit an empty line when there's no Lua info to dump. The problem is that doing this on the CLI in "show threads" marks the end of the output, and it may affect some external tools. We need to make sure that LFs are only emitted if there's something on the line and that all lines properly start with the prefix. This may be backported as far as 2.0 since the commit above was backported there.	2023-05-04 16:51:50 +02:00
Willy Tarreau	e5e62231d8	MINOR: debug: permit the "debug dev loop" to run under isolation Sometimes it's convenient to test the effect of tasks running under isolation, e.g. to validate the contents of the crash dumps. Let's add an optional "isolated" keyword to "debug dev loop" for this.	2023-05-04 11:50:26 +02:00
Willy Tarreau	b30ced3d88	BUG/MINOR: debug: fix incorrect profiling status reporting in show threads Thread dumps include a field "prof" for each thread that reports whether task profiling is currently active or not. It turns out that in 2.7-dev1, commit `680ed5f28` ("MINOR: task: move profiling bit to per-thread") mistakenly replaced it with a check for the current thread's bit in the thread dumps, which basically is the only place where another thread is being watched. The same mistake was done a few lines later by confusing threads_want_rdv_mask with the profiling mask. This mask disappeared in 2.7-dev2 with commit `598cf3f22` ("MAJOR: threads: change thread_isolate to support inter-group synchronization"), though instead we know the ID of the isolated thread. This commit fixes this and now reports "isolated" instead of "wantrdv". This can be backported to 2.7.	2023-05-04 11:41:33 +02:00
Willy Tarreau	ff508f12c6	BUILD: cli: fix build on Windows due to isalnum() implemented as a macro Commit `986798718` ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets") broke the build on Windows due to this: src/debug.c:940:95: error: array subscript has type char [-Werror=char-subscripts] 940 | caller && may_access(caller) && may_access(caller->func) && isalnum(*caller->func) ? caller->func : "0", | ^~~~~~~~~~~~~ It's classical on platforms which implement ctype.h as macros instead of functions, let's cast it as uchar. No backport is needed.	2023-05-03 16:32:50 +02:00

223 Commits