Commit Graph

211 Commits

Author SHA1 Message Date
Valentine Krasnobaeva
865db6307f MINOR: init: use RLIMIT_DATA instead of RLIMIT_AS
Limiting total allocatable process memory (VSZ) via setting RLIMIT_AS limit is
no longer effective, in order to restrict memory consumption at run time.
We can see from process memory map below, that there are many holes within
the process VA space, which bumps its VSZ to 1.5G. These holes are here by
many reasons and could be explaned at first by the full randomization of
system VA space. Now it is usually enabled in Linux kernels by default. There
are always gaps around the process stack area to trap overflows. Holes before
and after shared libraries could be explained by the fact, that on many
architectures libraries have a 'preferred' address to be loaded at; putting
them elsewhere requires relocation work, and probably some unshared pages.
Repetitive holes of 65380K are most probably correspond to the header that
malloc has to allocate before asked a claimed memory block. This header is
used by malloc to link allocated chunks together and for its internal book
keeping.

	$ sudo pmap -x -p `pidof haproxy`
	127136:   ./haproxy -f /home/haproxy/haproxy/haproxy_h2.cfg
	Address           Kbytes     RSS   Dirty Mode  Mapping
	0000555555554000     388      64       0 r---- /home/haproxy/haproxy/haproxy
	00005555555b5000    2608    1216       0 r-x-- /home/haproxy/haproxy/haproxy
	0000555555841000     916      64       0 r---- /home/haproxy/haproxy/haproxy
	0000555555926000      60      60      60 r---- /home/haproxy/haproxy/haproxy
	0000555555935000     116     116     116 rw--- /home/haproxy/haproxy/haproxy
	0000555555952000    7872    5236    5236 rw---   [ anon ]
	00007fff98000000     156      36      36 rw---   [ anon ]
	00007fff98027000   65380       0       0 -----   [ anon ]
	00007fffa0000000     156      36      36 rw---   [ anon ]
	00007fffa0027000   65380       0       0 -----   [ anon ]
	00007fffa4000000     156      36      36 rw---   [ anon ]
	00007fffa4027000   65380       0       0 -----   [ anon ]
	00007fffa8000000     156      36      36 rw---   [ anon ]
	00007fffa8027000   65380       0       0 -----   [ anon ]
	00007fffac000000     156      36      36 rw---   [ anon ]
	00007fffac027000   65380       0       0 -----   [ anon ]
	00007fffb0000000     156      36      36 rw---   [ anon ]
	00007fffb0027000   65380       0       0 -----   [ anon ]
	...
	00007ffff7fce000       4       4       0 r-x--   [ anon ]
	00007ffff7fcf000       4       4       0 r---- /usr/lib/x86_64-linux-gnu/ld-2.31.so
	00007ffff7fd0000     140     140       0 r-x-- /usr/lib/x86_64-linux-gnu/ld-2.31.so
	...
	00007ffff7ffe000       4       4       4 rw---   [ anon ]
	00007ffffffde000     132      20      20 rw---   [ stack ]
	ffffffffff600000       4       0       0 --x--   [ anon ]
	---------------- ------- ------- -------
	total kB         1499288   75504   72760

This exceeded VSZ makes impossible to start an haproxy process with 200M
memory limit, set at its initialization stage as RLIMIT_AS. We usually
have in this case such cryptic output at stderr:

	$ haproxy -m 200 -f haproxy_quic.cfg
        (null)(null)(null)(null)(null)(null)

At the same time the process RSS (a memory really used) is only 75,5M.
So to make process memory accounting more realistic let's base the memory
limit, set by -m option, on RSS measurement and let's use RLIMIT_DATA instead
of RLIMIT_AS.

RLIMIT_AS was used before, because earlier versions of haproxy always allocate
memory buffers for new connections, but data were not written there
immediately. So these buffers were not instantly counted in RSS, but were
always counted in VSZ. Now we allocate new buffers only in the case, when we
will write there some data immediately, so using RLIMIT_DATA becomes more
appropriate.
2024-04-19 17:36:40 +02:00
Amaury Denoyelle
da03396bb3 BUG/BUILD: debug: fix unused variable error
A compilation error occurs when using DEBUG_MEM_STATS due to a variable
now being unused in debug_iohandler_memstats() :

src/debug.c: In function ‘debug_iohandler_memstats’:
src/debug.c:1862:24: error: unused variable ‘sc’ [-Werror=unused-variable]
 1862 |         struct stconn *sc = appctx_sc(appctx);
      |                        ^~

This is caused since the following commit :
  94b8ed446f
  MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands

This must not be backported.
2024-03-29 17:21:04 +01:00
Christopher Faulet
94b8ed446f MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands
The main CLI I/O handle is responsible to interrupt the processing on
shutdown/abort. It is not the responsibility of the I/O handler of CLI
commands to take care of it.
2024-03-28 17:28:20 +01:00
Willy Tarreau
5df0df96dd MINOR: debug: add "debug dev trace" to flood with traces
This new command, enabled only with "DEBUG_DEV", sends 2 or 20 traces
per task wakeup (depending on the verbosity level), and stops after 1M
wakeups per thread in order not to have to stop/start the process each
time it's fired.

We have two small messages and 18 larger ones from 20 to 270 bytes
each, so that the average size is approx 213 bytes counting headers
(the header adds approx 82 bytes), which matches what's generally
observed on average when traces are enabled in all muxes.

Typical figures show varations between 5.7M and 6.2M msg/s on an EPYC
in a 3C6T setup (single CCX), and 2.12M - 2.22M in a 24C48T setup
(across 8 CCX, with 8 thread groups).
2024-03-25 17:32:22 +00:00
Aurelien DARRAGON
07b2e84bce BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread (2nd try)
While trying to reproduce another crash case involving lua filters
reported by @bgrooot on GH #2467, we found out that mixing filters loaded
from different contexts ('lua-load' vs 'lua-load-per-thread') for the same
stream isn't supported and may even cause the process to crash.

Historically, mixing lua-load and lua-load-per-threads for a stream wasn't
supported, but this changed thanks to 0913386 ("BUG/MEDIUM: hlua: streams
don't support mixing lua-load with lua-load-per-thread").

However, the above fix didn't consider lua filters's use-case properly:
unlike lua fetches, actions or even services, lua filters don't simply
use the stream hlua context as a "temporary" hlua running context to
process some hlua code. For fetches, actions.. hlua executions are
processed sequentially, so we simply reuse the hlua context from the
previous action/fetch to run the next one (this allows to bypass memory
allocations and initialization, thus it increases performance), unless
we need to run on a different hlua state-id, in which case we perform a
reset of the hlua context.

But this cannot work with filters: indeed, once registered, a filter will
last for the whole stream duration. It means that the filter will rely
on the stream hlua context from ->attach() to ->detach(). And here is the
catch, if for the same stream we register 2 lua filters from different
contexts ('lua-load' + 'lua-load-per-thread'), then we have an issue,
because the hlua stream will be re-created each time we switch between
runtime contexts, which means each time we switch between the filters (may
happen for each stream processing step), and since lua filters rely on the
stream hlua to carry context between filtering steps, this context will be
lost upon a switch. Given that lua filters code was not designed with that
in mind, it would confuse the code and cause unexpected behaviors ranging
from lua errors to crashing process.

So here we take another approach: instead of re-creating the stream hlua
context each time we switch between "global" and "per-thread" runtime
context, let's have both of them inside the stream directly as initially
suggested by Christopher back then when talked about the original issue.

For this we leverage hlua_stream_ctx_prepare() and hlua_stream_ctx_get()
helper functions which return the proper hlua context for a given stream
and state_id combination.

As for debugging infos reported after ha_panic(), we check for both hlua
runtime contexts to check if one of them was active when the panic occured
(only 1 runtime ctx per stream may be active at a given time).

This should be backported to all stable versions with 0913386
("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread")

This commit depends on:
 - "DEBUG: lua: precisely identify if stream is stuck inside lua or not"
   [for versions < 2.9 the ha_thread_dump_one() part should be skipped]
 - "MINOR: hlua: use accessors for stream hlua ctx"

For 2.4, the filters API didn't exist. However it may be a good idea to
backport it anyway because ->set_priv()/->get_priv() from tcp/http lua
applets may also be affected by this bug, plus it will ease code
maintenance. Of course, filters-related parts should be skipped in this
case.
2024-03-13 09:24:46 +01:00
Aurelien DARRAGON
1a2cdf64c9 DEBUG: lua: precisely identify if stream is stuck inside lua or not
When ha_panic() is called by the watchdog, we try to guess from
ha_task_dump() and ha_thread_dump_one() if the thread was stuck while
executing lua from the stream context. However we consider this is the
case by simply checking if the stream hlua context was set, but this is
not very precise because if the hlua context is set, then it simply means
that at least one lua instruction was executed at the stream level, not
that the stuck was currently executing lua when the panic occured.

This is especially true with filters, one could simply register a lua
filter that does nothing but this will still end up initializing the
stream hlua context for each stream. If the thread end up being stuck
during the stream handling, then debug dumping functions will report
that the stream was stuck while handling lua, which is not necessarilly
true, and could in fact confuse us even more.

So here we take another approach, we add the BUSY flag to hlua context:
this flag is set by hlua_ctx_resume() around lua_resume() call, this way
we can precisely tell if the thread was handling lua when it was
interrupted, and we rely on this flag in debug functions to check if the
thread was effectively stuck inside lua or not while processing the stream

No backport needed unless a commit depends on it.
2024-03-13 09:24:46 +01:00
Willy Tarreau
ab8928b9db BUILD: address a few remaining calloc(size, n) cases
In issue #2427 Ilya reports that gcc-14 rightfully complains about
sizeof() being placed in the left term of calloc(). There's no impact
but it's a bad pattern that gets copy-pasted over time. Let's fix the
few remaining occurrences (debug.c, halog, udp-perturb).

This can be backported to all branches, and the irrelevant parts dropped.
2024-02-10 11:37:27 +01:00
Willy Tarreau
7cba015c85 DEBUG: make the "debug dev {debug|warn|check}" command print a message
In order to test the new message output capability, these commands will
now explicitly mention that the bug was triggered from the CLI.
2024-02-05 17:09:00 +01:00
Willy Tarreau
9d869b10de MINOR: debug: add features and build options to "show dev"
The "show dev" CLI command is still missing useful elements such as the
build options, SSL version etc. Let's just add the build features and
the build options there so that it's possible to collect all of this
from a running process without having to start the executable with -vv.

This is still dumped all at once from the parsing function since the
output is small. If it were to grow, this would possibly require to be
reworked to support a context.

It might be helpful to backport this to 2.9 since it can help narrow
down certain issues.
2024-01-02 11:44:42 +01:00
Ilya Shipitsin
80813cdd2a CLEANUP: assorted typo fixes in the code and comments
This is 37th iteration of typo fixes
2023-11-23 16:23:14 +01:00
Willy Tarreau
6455fd5024 MINOR: debug: add the ability to enter components in the post_mortem struct
Here the idea is to collect components' versions and build options. The
main component is haproxy, but the API is made so that any sub-system
can easily add a component there (for example the detailed version of a
device detection lib, or some info about a lib loaded from Lua).

The elements are stored as a pointer to an array of structs and its count
so that it's sufficient to issue this in gdb to list them all at once:

  print *post_mortem.components@post_mortem.nb_components

For now we collect name, version, toolchain, toolchain options, build
options and path. Maybe more could be useful in the future.
2023-11-23 15:39:21 +01:00
Willy Tarreau
a88a3482b5 MINOR: debug: dump the mapping of the libs into post_mortem
Having the libs and their addresses listed in the post_mortem struct
is also helpful. Sometimes it helps notice that one version is not the
expected one, e.g. due to some LD_LIBRARY_PATH. We don't emit it on
"show dev" however since that's already available via "show libs".
2023-11-23 15:39:21 +01:00
Willy Tarreau
37e3dd718c MINOR: debug: copy the thread info into the post_mortem struct
The last starting thread now copies the pthread ID and stack top of
each thread into post_mortem. That way it's as easy as issuing
"p post_mortem" in gdb to see all thread IDs and stack frames and more
easily map them to the threads met in a core.
2023-11-23 15:39:21 +01:00
Willy Tarreau
c0eec3a4aa MINOR: debug: collect some boot-time info related to the process
Here we collect the original uid/gid/rlimits for FD and RAM since these
ones do affect behavior and are sometimes different from expected in
containers or when starting as a service.
2023-11-23 15:39:21 +01:00
Willy Tarreau
ff9e06cd53 MINOR: debug: report any detected hypervisor in post_mortem
When the x86 CPU flags show the "hypervisor" flag, we know we're running
inside QEMU, VMware or possibly other flavors of hypervisors. In this
case we'll report either "qemu", "vmware" or "yes" for other ones in
the "virt_techno" field, based on the DMI hardware vendor name,
otherwise "no" when the flag is not found.
2023-11-23 15:39:21 +01:00
Willy Tarreau
0cc799bdd1 MINOR: debug: detect CPU model and store it in post_mortem
The CPU model and type has significant impact on certain bugs, such
as contention issues caused by CPUs having split L3 caches, or stricter
memory models that exhibit some barrier issues. It's complicated though
because the info about the model depends on the arch. For example, x86
reports an SKU name while ARM rather reports the CPU core types, families
and versions for each CPU core. There, the SoC will sometimes be reported
in the device tree or DMI info instead. But we don't really care, it's
essentially useful to know if the code is running on an armv8.0 such as
A53, a 8.2 such as A55/A76/Neoverse etc. For MIPS the model appears to
generally be there, and in addition the SoC is often present in the
"system type" field before the first CPU, and the type of machine in the
"machine" field, to replace the missing DMI and DT, so they are also
collected. Note that only the first CPU is checked and reported, that's
expected to be vastly sufficient, since we're just trying to spot known
incompatibilities or issues.
2023-11-23 15:39:21 +01:00
Willy Tarreau
2974f3e71b MINOR: debug: report in post_mortem if the container techno used is docker
If we detect we're running inside a container on Linux, let's check if
it seems to be docker. Docker usually creates a /.dockerenv file, which
is easy to check. It's uncertain whether it's always the case, but on the
few tested instances that was true, and we don't really care, what matters
is to place helpful debugging info for developers. When this file is
detected, we report "docker" instead of "yes" in the container techno.
2023-11-23 15:39:21 +01:00
Willy Tarreau
cf8be50a3d MINOR: debug: report in port_mortem whether a container was detected
Containers often cause significant trouble depending on how they're
set up, and they're not always trivial for their users to extract info
from. Here we're trying to detect if we're running inside a container
on Linux. There are plenty of approaches and none is perfectly clean
nor reliable, which makes sense since the goal is to remain transparent
enough.

One interesting approach is to rely on the observation that containers
generally do not expose most kernel threads, and that the very firsts
of them are extremely stable across all kernel versions: pid 2 was
called "keventd" in kernel 2.4, became "kthreadd" in kernel 2.6, and
has since not changed. This is true on all architectures tested, even
with highly stripped down kernels such as those found on 15 year-old
OpenWRT images. And this one doesn't appear inside containers. Thus
here we check if we find such a thread via /proc and whether it's
called keventd or kthreadd, to detect a container, and we set the
"cont_techno" variable to "yes" or "no" depending on what is found.
2023-11-23 15:39:21 +01:00
Willy Tarreau
4e3f9921de MINOR: debug: add OS/hardware info to the post_mortem struct
Let's extract some info about the system (board model, vendor etc),
this will indicate some hypervisors, some cloud instances or some
uncommon embedded boards etc. Typically, vmware, qemu and raspberry-pi
are visible here and can help during the troubleshooting session.
2023-11-23 15:39:21 +01:00
Willy Tarreau
0184597522 MINOR: debug: start to create a new struct post_mortem
The goal here is to accumulate precious debugging information in a
struct that is easy to find in memory. It's aligned to 256-byte as
it also helps. We'll progressively add a lot of info about the
startup conditions, the operating system, the hardware and hypervisor
so as to limit the number of round trips between developers and users
during debugging sessions. Also, opening a core file with an hex editor
should often be sufficient to extract most of the info.

In addition, a new "show dev" command will show these information so
that they can be checked at runtime without having to wait for a crash
(e.g. if a limit is bad in a container, better know it early).

For now the struct only contains utsname that's fed at boot time.
2023-11-23 15:39:21 +01:00
Willy Tarreau
96bb99a87d DEBUG: pools: detect that malloc_trim() is in progress
Now when calling ha_panic() with a thread still under malloc_trim(),
we'll set a new tainted flag to easily report it, and the output
trace will report that this condition happened and will suggest to
use no-memory-trimming to avoid it in the future.
2023-10-25 15:48:02 +02:00
Willy Tarreau
26a6481f00 DEBUG: lua: add tainted flags for stuck Lua contexts
William suggested that since we can detect the presence of Lua in the
stack, let's combine it with stuck detection to set a new pair of flags
indicating a stuck Lua context and a stuck Lua shared context.

Now, executing an infinite loop in a Lua sample fetch function with
yield disabled crashes with tainted=0xe40 if loaded from a lua-load
statement, or tainted=0x640 from a lua-load-per-thread statement.

In addition, at the end of the panic dump, we can check if Lua was
seen stuck and emit recommendations about lua-load-per-thread and
the choice of dependencies depending on the presence of threads
and/or shared context.
2023-10-25 15:48:02 +02:00
Willy Tarreau
46bbb3a33b DEBUG: add a tainted flag when ha_panic() is called
This will make it easier to know that the panic function was called,
for the occasional case where the dump crashes and/or the stack is
corrupted and not much exploitable. Now at least it will be sufficient
to check the tainted value to know that someone called ha_panic(), and
it will also be usable to condition extra analysis.
2023-10-25 15:48:02 +02:00
Willy Tarreau
b3dcd59f8d MINOR: stream: fix output alignment of stuck thread dumps
Since commit c185bc465 ("MEDIUM: stream: now provide full stream dumps
in case of loops"), the stuck threads show the stream's pointer in the
margin since it appears immediately after a line feed. Let's add it after
the prefix and "stream=" to make the output more readable.
2023-09-29 16:43:07 +02:00
Willy Tarreau
feff6296a1 MINOR: debug: use the more detailed stream dump in panics
Similarly upon a panic we'd like to have a more detailed dump of a
stream's state, so let's use the full dump function for this now.
2023-09-29 09:20:27 +02:00
Willy Tarreau
5743eeea88 MINOR: stream: make stream_dump() always multi-line
There used to be two working modes for this function, a single-line one
and a multi-line one, the difference being made on the "eol" argument
which could contain either a space or an LF (and with the prefix being
adjusted accordingly). Let's get rid of the single-line mode as it's
what limits the output contents because it's difficult to produce
exploitable structured data this way. It was only used in the rare case
of spinning streams and applets and these are the ones lacking info. Now
a spinning stream produces:

[ALERT]    (3511) : A bogus STREAM [0x227e7b0] is spinning at 5581202 calls per second and refuses to die, aborting now! Please report this error to developers:
  strm=0x227e7b0,c4a src=127.0.0.1 fe=public be=public dst=s1
  txn=0x2041650,3000 txn.req=MSG_DONE,4c txn.rsp=MSG_RPBEFORE,0
  rqf=1840000 rqa=8000 rpf=80000000 rpa=1400000
  scf=0x24af280,EST,482 scb=0x24af430,EST,1411
  af=(nil),0 sab=(nil),0
  cof=0x7fdb28026630,300:H1(0x24a6f60)/RAW((nil))/tcpv4(33)
  cob=0x23199f0,10000300:H1(0x24af630)/RAW((nil))/tcpv4(32)
  filters={}
  call trace(11):
  (...)
2023-09-29 09:20:27 +02:00
Willy Tarreau
1f60133bfb BUILD: debug: avoid a build warning related to epoll_wait() in debug code
Ilya reported in issue #2193 that the latest Fedora complains about us
passing NULL to epoll_wait() in the "debug dev fd" code to try to detect
an epoll FD. That was intentional to get the kernel's verifications and
make sure we're facing a poller, but since such a warning comes from the
libc, it's possible that it plans to replace the syscall with a wrapper
in the near future (e.g. epoll_pwait()), and that just hiding the NULL
(as was confirmed to work) might just postpone the problem.

Let's take another approach, instead we'll create a new dummy FD that
we'll try to remove from the epoll set using epoll_ctl(). Since we
created the FD we're certain it cannot be there.  In this case (and
only in this case) epoll_ctl() will return -ENOENT, otherwise it will
typically return EINVAL or EBADF. It was verified that it works and
doesn't return false positives for other FD types. It should be
backported to the branches that contain a backport of the commit which
introduced the feature, apparently as far as 2.4:

  5be7c198e ("DEBUG: cli: add a new "debug dev fd" expert command")
2023-07-02 11:01:37 +02:00
Aurelien DARRAGON
b6a24a52a2 BUG/MINOR: debug: fix pointer check in debug_parse_cli_task()
Task pointer check in debug_parse_cli_task() computes the theoric end
address of provided task pointer to check if it is valid or not thanks to
may_access() helper function.

However, relative ending address is calculated by adding task size to 't'
pointer (which is a struct task pointer), thus it will result to incorrect
address since the compiler automatically translates 't + x' to
't + x * sizeof(*t)' internally (with sizeof(*t) != 1 here).

Solving the issue by using 'ptr' (which is the void * raw address) as
starting address to prevent automatic address scaling.

This was revealed by coverity, see GH #2157.

No backport is needed, unless 9867987 ("DEBUG: cli: add "debug dev task"
to show/wake/expire/kill tasks and tasklets") gets backported.
2023-05-17 16:49:17 +02:00
Willy Tarreau
94df1b57ee BUILD: debug: fix build issue on 32-bit platforms in "debug dev task"
Commit 986798718 ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill
tasks and tasklets") caused a build failure on 32-bit platforms when parsing
the task's pointer. Let's use strtoul() and not strtoll(). No backport is
needed, unless the commit above gets backported.
2023-05-12 04:40:06 +02:00
Willy Tarreau
95e6c9999a BUILD: debug: do not check the isolated_thread variable in non-threaded builds
The build without thread support was broken by commit b30ced3d8 ("BUG/MINOR:
debug: fix incorrect profiling status reporting in show threads") because
it accesses the isolated_thread variable that is not defined when threads
are disabled. In fact both the test on harmless and this one make no sense
without threads, so let's comment out the block and mark the related
variables as unused.

This may have to be backported to 2.7 if the commit above is.
2023-05-07 15:02:30 +02:00
Willy Tarreau
e69919d1ba CLEANUP: debug: remove the now unused ha_thread_dump_all_to_trash()
The function isn't used anymore since each call place performs its
own loop. Let's get rid of it.
2023-05-04 19:19:04 +02:00
Willy Tarreau
009b5519e6 MINOR: debug: make "show threads" properly iterate over all threads
Previously it would re-dump all threads to the same trash if  the output
buffer was full, which it never was since the trash is of the same size.
Now it dumps one thread, copies it to the buffer and yields until it can
continue. Showing 256 threads works as expected.
2023-05-04 19:15:50 +02:00
Willy Tarreau
880d1684a7 MINOR: debug: write panic dump to stderr one thread at a time
Currently large setups cannot dump all their threads because they're
first dumped to the trash buffer, then copied to stderr. Here we can
now change this, instead we dump one thread at a time into the trash
and immediately send it to stderr. We also keep a copy into a local
trash chunk that's assigned to thread_dump_buffer so that a core file
still contains a copy of a large number of threads, which is generally
sufficient for the vast majority of situations.

It was verified that dumping 256 threads now produces ~55kB of output
and all of them are properly dumped.
2023-05-04 19:15:50 +02:00
Willy Tarreau
9a6ecbd590 MEDIUM: debug: simplify the thread dump mechanism
The thread dump mechanism that is used by "show threads" and by the
panic dump is overly complicated due to an initial misdesign. It
firsts wakes all threads, then serializes their dumps, then releases
them, while taking extreme care not to face colliding dumps. In fact
this is not what we need and it reached a limit where big machines
cannot dump all their threads anymore due to buffer size limitations.

What is needed instead is to be able to dump *one* thread, and to let
the requester iterate on all threads.

That's what this patch does. It adds the thread_dump_buffer to the
struct thread_ctx so that the requester offers the buffer to the
thread that is about to be dumped. This buffer also serves as a lock.
A thread at rest has a NULL, a valid pointer indicates the thread is
using it, and 0x1 (NULL+1) is used by the dumped thread to tell the
requester it's done. This makes sure that a given thread is dumped
once at a time. In addition to this, the calling thread decides
whether it accesses the thread by itself or via the debug signal
handler, in order to get a backtrace. This is much saner because the
calling thread is free to do whatever it wants with the buffer after
each thread is dumped, and there is no dependency between threads,
once they've dumped, they're free to continue (and possibly to dump
for another requester if needed). Finally, when the THREAD_DUMP
feature is disabled and the debug signal is not used, the requester
accesses the thread by itself like before.

For now we still have the buffer size limitation but it will be
addressed in future patches.
2023-05-04 19:15:44 +02:00
Willy Tarreau
cb01f5daa7 BUG/MINOR: debug: do not emit empty lines in thread dumps
In 2.3, commit 471425f51 ("BUG/MINOR: debug: Don't dump the lua stack
if it is not initialized") introduced the possibility to emit an empty
line when there's no Lua info to dump. The problem is that doing this
on the CLI in "show threads" marks the end of the output, and it may
affect some external tools. We need to make sure that LFs are only
emitted if there's something on the line and that all lines properly
start with the prefix.

This may be backported as far as 2.0 since the commit above was
backported there.
2023-05-04 16:51:50 +02:00
Willy Tarreau
e5e62231d8 MINOR: debug: permit the "debug dev loop" to run under isolation
Sometimes it's convenient to test the effect of tasks running under
isolation, e.g. to validate the contents of the crash dumps. Let's
add an optional "isolated" keyword to "debug dev loop" for this.
2023-05-04 11:50:26 +02:00
Willy Tarreau
b30ced3d88 BUG/MINOR: debug: fix incorrect profiling status reporting in show threads
Thread dumps include a field "prof" for each thread that reports whether
task profiling is currently active or not. It turns out that in 2.7-dev1,
commit 680ed5f28 ("MINOR: task: move profiling bit to per-thread")
mistakenly replaced it with a check for the current thread's bit in the
thread dumps, which basically is the only place where another thread is
being watched. The same mistake was done a few lines later by confusing
threads_want_rdv_mask with the profiling mask. This mask disappeared
in 2.7-dev2 with commit 598cf3f22 ("MAJOR: threads: change thread_isolate
to support inter-group synchronization"), though instead we know the ID
of the isolated thread. This commit fixes this and now reports "isolated"
instead of "wantrdv".

This can be backported to 2.7.
2023-05-04 11:41:33 +02:00
Willy Tarreau
ff508f12c6 BUILD: cli: fix build on Windows due to isalnum() implemented as a macro
Commit 986798718 ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill
tasks and tasklets") broke the build on windows due to this:

  src/debug.c:940:95: error: array subscript has type char [-Werror=char-subscripts]
    940 |  caller && may_access(caller) && may_access(caller->func) && isalnum(*caller->func) ? caller->func : "0",
        |                                                                      ^~~~~~~~~~~~~

It's classical on platforms which implement ctype.h as macros instead of
functions, let's cast it as uchar. No backport is needed.
2023-05-03 16:32:50 +02:00
Willy Tarreau
9867987182 DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets
When analyzing certain types of bugs in field, sometimes it would be
nice to be able to wake up a task or tasklet to see how events progress
(e.g. to detect a missing wakeup condition), or expire or kill such a
task. This restricted command shows hte current state of a task or tasklet
and allows to manipulate it like this. However it must be used with extreme
care because while it does verify that the pointers are mapped, it cannot
know if they point to a real task, and performing such actions on something
not a task will easily lead to a crash. In addition, performing a "kill"
on a task has great chances of provoking a deferred crash due to a double
free and/or another kill that is not idempotent. Use with extreme care!
2023-05-03 11:47:44 +02:00
Willy Tarreau
dd01448953 MINOR: debug: clarify "debug dev stream" help message
The help message was insufficient to figure how to use it and specify
the stream pointer and changes to operate.
2023-05-03 11:47:44 +02:00
Christopher Faulet
208c712b40 MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE
Here again, it is just a flag renaming. In SC flags, there is no longer
shutdown for writes but shutdowns.
2023-04-14 15:01:21 +02:00
Christopher Faulet
7faac7cf34 MINOR: tree-wide: Simplifiy some tests on SHUT flags by accessing SCs directly
At many places, we simplify the tests on SHUT flags to remove calls to
chn_prod() or chn_cons() function because the corresponding SC is available.
2023-04-05 08:57:06 +02:00
Christopher Faulet
87633c3a11 MEDIUM: tree-wide: Move flags about shut from the channel to the SC
The purpose of this patch is only a one-to-one replacement, as far as
possible.

CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connecter. CF_ prefix is replaced by SC_FL_ one. Of course, it is not
so simple because at many places, we were testing if a channel was shut for
reads and writes in same time. To do the same, shut for reads must be tested
on one side on the SC and shut for writes on the other side on the opposite
SC. A special care was taken with process_stream(). flags of SCs must be
saved to be able to detect changes, just like for the channels.
2023-04-05 08:57:06 +02:00
Willy Tarreau
bd3b44edff MINOR: debug: add random delay injection with "debug dev delay-inj"
The goal is to send signals to random threads at random instants so that
they spin for a random delay in a relax() loop, trying to give back the
CPU to another competing hardware thread, in hope that from time to time
this can trigger in critical areas and increase the chances to provoke a
latent concurrency bug. For now none were observed.

For example, this command starts 64 such tasks waking after random delays
of 0-1ms and delivering signals to trigger such loops on 3 random threads:

  for i in {1..64}; do
    socat - /tmp/sock1 <<< "expert-mode on;debug dev delay-inj 2 3"
  done

This command is only enabled when DEBUG_DEV is set at build time.
2023-03-09 14:01:58 +01:00
Christopher Faulet
15315d6c0a CLEANUP: stconn: Remove old read and write expiration dates
Old read and write expiration dates are no longer used. Thus we can safely
remove them.
2023-02-22 15:59:16 +01:00
Christopher Faulet
f8413cba2a MEDIUM: channel/stconn: Move rex/wex timer from the channel to the sedesc
These timers are related to the I/O. Thus it is logical to move them into
the SE descriptor. The patch is a bit huge but it is just a
replacement. However it is error-prone.

From the stconn or the stream, helper functions are used to get, set or
reset these timers. This simplify the timers manipulations.
2023-02-22 14:52:15 +01:00
Willy Tarreau
9debe0fb27 BUG/MEDIUM: debug/thread: make the debug handler not wait for !rdv_requests
The debug handler may deadlock with some threads waiting for isolation.
This may happend during a "show threads" command or even during a panic.
The reason is the call to thread_harmless_end() which waits for rdv_requests
to turn to zero before releasing its position in thread_dump_state,
while that one may not progress if another thread was interrupted in
thread_isolate() and is waiting for that thread to drop thread_dump_state.

In order to address this, we now use thread_harmless_end_sig() introduced
by previous commit:

   MINOR: threads: add a thread_harmless_end() version that doesn't wait

However there's a catch: since commit f7afdd910 ("MINOR: debug: mark
oneself harmless while waiting for threads to finish"), there's a second
pair of thread_harmless_now()/thread_harmless_end() that surround the
loop around thread_dump_state. Marking a thread harmless before this
loop and dropping that without checking rdv_requests there could break
the harmless promise made to the other thread if it returns first and
proceeds with its isolated work. Hence we just drop this pair which was
only preventive for other signal handlers, while as indicated in that
patch's commit message, other signals are handled asynchronously and do
not require that extra protection.

This fix must be backported to 2.7.

The problem can be seen by running "show threads" in fast loops (100/s)
while reloading haproxy very quickly (10/s) and sending lots of traffic
to it (100krps, 15 Gbps). In this case the soft stop calls pool_gc()
which isolates a lot and manages to race with the dumps after a few
tens of seconds, leaving the process with all threads at 100%.
2023-01-19 19:22:17 +01:00
Christopher Faulet
da89e9b95b MINOR: channel/applets: Stop to test CF_WRITE_ERROR flag if CF_SHUTW is enough
In applets, we stop processing when a write error (CF_WRITE_ERROR) or a shutdown
for writes (CF_SHUTW) is detected. However, any write error leads to an
immediate shutdown for writes. Thus, it is enough to only test if CF_SHUTW is
set.
2023-01-09 18:41:08 +01:00
Willy Tarreau
b5662519df BUG/MINOR: debug: don't mask the TH_FL_STUCK flag before dumping threads
Commit f0c86ddfe ("BUG/MEDIUM: debug: fix parallel thread dumps again")
added a clearing of the TH_FL_STUCK flag before dumping threads in case
of parallel dumps, but that was in part a sort of workaround for some
remains of the commit that introduced the flag in 2.0 before the watchdog
existed, and which would set it after dumping a thread: e6a02fa65 ("MINOR:
threads: add a "stuck" flag to the thread_info struct"), and in part an
attempt to avoid that a thread waiting for too long during the dump would
get the flag set. But that is not possible, a thread waiting for being
dumped has the harmless bit set and doesn't get the stuck bit. What happens
in fact is that issuing "show threads" in fast loops ends up causing some
threads to keep their STUCK bit that was set at the end of "show threads",
and confuses the output.

The problem with doing this is that the flag is cleared before the thread
is dumped, and since this flag is used to decide whether to show a backtrace
or not, we don't get backtraces anymore of stuck threads since the commit
above in 2.7.

This patch just removes the two points where the flag was cleared by the
commit above. It should be backported to 2.7.
2023-01-02 09:51:35 +01:00
Willy Tarreau
b59e3f6045 MINOR: debug: add a balance of alloc - free at the end of the memstats dump
When digging into suspected memory leaks, it's cumbersome to count the
number of allocations and free calls. Here we're adding a summary at the
end of the sum of allocs minus the sum of frees, excluding realloc since
we can't know how much it releases upon each call. This means that when
doing many realloc+free the count may be negative but in practice there
are very few reallocs so that's not a problem. Also the size/call is signed
and corresponds to the average size allocated (e.g. leaked) per call.

It seems to work reasonably well for now:

  > debug dev memstats match buf
  quic_conn.c:2978       P_FREE  size:   1239547904  calls:     75656  size/call:  16384 buffer
  quic_conn.c:2960      P_ALLOC  size:   1239547904  calls:     75656  size/call:  16384 buffer
  mux_quic.c:393        P_ALLOC  size:   9112780800  calls:    556200  size/call:  16384 buffer
  mux_quic.c:383        P_ALLOC  size:  17783193600  calls:   1085400  size/call:  16384 buffer
  mux_quic.c:159         P_FREE  size:   8935833600  calls:    545400  size/call:  16384 buffer
  mux_quic.c:142         P_FREE  size:   9112780800  calls:    556200  size/call:  16384 buffer
  h3.c:776              P_ALLOC  size:   8935833600  calls:    545400  size/call:  16384 buffer
  quic_stream.c:166      P_FREE  size:    975241216  calls:     59524  size/call:  16384 buffer
  quic_stream.c:127      P_FREE  size:   7960592384  calls:    485876  size/call:  16384 buffer
  stream.c:772           P_FREE  size:      8798208  calls:       537  size/call:  16384 buffer
  stream.c:768           P_FREE  size:      2424832  calls:       148  size/call:  16384 buffer
  stream.c:751          P_ALLOC  size:   8852062208  calls:    540287  size/call:  16384 buffer
  stream.c:641           P_FREE  size:   8849162240  calls:    540110  size/call:  16384 buffer
  stream.c:640           P_FREE  size:   8847360000  calls:    540000  size/call:  16384 buffer
  channel.h:850         P_ALLOC  size:      2441216  calls:       149  size/call:  16384 buffer
  channel.h:850         P_ALLOC  size:      5914624  calls:       361  size/call:  16384 buffer
  dynbuf.c:55            P_FREE  size:        32768  calls:         2  size/call:  16384 buffer
  Total                 BALANCE  size:            0  calls:   5606906  size/call:      0 (excl. realloc)

Let's see how useful this becomes over time.
2022-12-01 16:12:21 +01:00