haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-06 23:27:04 +02:00

Author	SHA1	Message	Date
Willy Tarreau	111c78329e	MINOR: debug: relax access restrictions on "debug dev hash" and "memstats" These two have absolutely zero impact on the process and do not need to be restricted to the expert mode. The first one calculates a string hash that can be used by anyone when checking a dump; the second one may be used by anyone tracking a memory leak, and is cumbersome to use due to the "expert-mode on" that needs to be prepended. In addition this gives bad habits to users and needlessly taints the process. So let's drop this restriction for these two commands.	2022-11-30 17:58:00 +01:00
Willy Tarreau	50dd7e95c8	CLEANUP: anon: clarify the help message on "debug dev hash" This command is used to hash a section name using the current anon key, it was brought in 2.7 by commit `54966dffd` ("MINOR: anon: store the anonymizing key in the CLI's appctx"). However the help message only says "return msg hashed" which is misleading because if anon mode is not enabled, it returns the string as-is. Let's just mention this condition in the help message, and also fix the alphabetical ordering and alignment on the line.	2022-11-30 17:58:00 +01:00
Willy Tarreau	334d091b75	MINOR: debug: improve error handling on the memstats command parser "debug dev memstats" supports various options but silently ignores the unknown ones. Let's make sure it returns indications about what it expects, as the help message is quite limited otherwise.	2022-11-30 17:24:29 +01:00
Erwan Le Goas	54966dffda	MINOR: anon: store the anonymizing key in the CLI's appctx In order to allow users to dump internal states using a specific key without changing the global one, we're introducing a key in the CLI's appctx. This key is preloaded from the global one when "set anon on" is used (and if none exists, a random one is assigned). And the key can optionally be assigned manually for the whole CLI session. A "show anon" command was also added to show the anon state, and the current key if the user has sufficient permissions. In addition, a "debug dev hash" command was added to test the feature.	2022-09-17 11:27:09 +02:00
Willy Tarreau	d96d214b4c	CLEANUP: debug: use struct ha_caller for memstat The memstats code currently defines its own file/function/line number, type and extra pointer. We don't need to keep them separate and we can easily replace them all with just a struct ha_caller. Note that the extra pointer could be converted to a pool ID stored into arg8 or arg32 and be dropped as well, but this would first require to define IDs for pools (which we currently do not have).	2022-09-08 14:19:15 +02:00
Willy Tarreau	04e50b3d32	CLEANUP: task: rename ->call_date to ->wake_date This field is misnamed because its real and important content is the date the task was woken up, not the date it was called. It temporarily holds the call date during execution but this remains confusing. In fact before the latency measurements were possible it was indeed a call date. Thus it will now be called wake_date. This change is necessary because a subsequent fix will require the introduction of the real call date in the thread ctx.	2022-09-08 14:19:15 +02:00
Willy Tarreau	4a426e2082	MINOR: debug/memstats: automatically determine first column size The first column's width may vary a lot depending on outputs, and it's annoying to have large empty columns on small names and mangled large columns that are not yet large enough. In order to overcome this, this patch adds a width field to the memstats applet's context, and this width is calculated the first time the function is entered, by estimating the width of all lines that will be dumped. This is simple enough and does the job well. If in the future some filtering criteria are added, it will still be possible to perform a single pass on everything depending on the desired output format.	2022-08-09 08:51:08 +02:00
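The width pre-computation described in commit `4a426e2082` above can be illustrated with a minimal sketch. It assumes that the first column of each row is rendered as `file:line`; the function name `ex_first_col_width()` and its parameters are hypothetical and are not taken from HAProxy's code.

```c
#include <stdio.h>

/* Hypothetical helper: returns the widest "file:line" label among <n> entries,
 * so that every row can then be printed with the same padded first column.
 */
size_t ex_first_col_width(const char *const *files, const int *lines, size_t n)
{
	size_t width = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* snprintf() with a NULL buffer and a size of 0 only returns the needed length */
		int len = snprintf(NULL, 0, "%s:%d", files[i], lines[i]);

		if (len > 0 && (size_t)len > width)
			width = (size_t)len;
	}
	return width;
}
```

The result would typically be passed to a `"%-*s"` format so that short and long names line up in a single pass over the output.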
Willy Tarreau	17200dd1f3	MINOR: debug: also store the function name in struct mem_stats The calling function name is now stored in the structure, and it's reported when the "all" argument is passed. The first column is significantly enlarged because some names are really wide :-(	2022-08-09 08:42:42 +02:00
Willy Tarreau	55c950baa9	MINOR: debug: store and report the pool's name in struct mem_stats Let's add a generic "extra" pointer to the struct mem_stats to store context-specific information. When tracing pool_alloc/pool_free, we can now store a pointer to the pool, which allows to report the pool name on an extra column. This significantly improves tracing capabilities. Example: proxy.c:1598 CALLOC size: 28832 calls: 4 size/call: 7208 dynbuf.c:55 P_FREE size: 32768 calls: 2 size/call: 16384 buffer quic_tls.h:385 P_FREE size: 34008 calls: 1417 size/call: 24 quic_tls_iv quic_tls.h:389 P_FREE size: 34008 calls: 1417 size/call: 24 quic_tls_iv quic_tls.h:554 P_FREE size: 34008 calls: 1417 size/call: 24 quic_tls_iv quic_tls.h:558 P_FREE size: 34008 calls: 1417 size/call: 24 quic_tls_iv quic_tls.h:562 P_FREE size: 34008 calls: 1417 size/call: 24 quic_tls_iv quic_tls.h:401 P_ALLOC size: 34080 calls: 1420 size/call: 24 quic_tls_iv quic_tls.h:403 P_ALLOC size: 34080 calls: 1420 size/call: 24 quic_tls_iv xprt_quic.c:4060 MALLOC size: 45376 calls: 5672 size/call: 8 quic_sock.c:328 P_ALLOC size: 46440 calls: 215 size/call: 216 quic_dgram	2022-08-09 08:26:59 +02:00
Willy Tarreau	dadf00e226	DEBUG: cli: add a new "debug dev deadlock" expert command This command will create the requested number of tasks competing on a lock, resulting in triggering the watchdog and crashing the process. This will help stress the watchdog and inspect the lock debugging parts.	2022-07-15 19:41:26 +02:00
Willy Tarreau	f0c86ddfe8	BUG/MEDIUM: debug: fix parallel thread dumps again The previous attempt to fix thread dumps in commit `672972604` ("BUG/MEDIUM: debug: fix possible hang when multiple threads dump at once") still had some shortcomings. Sometimes parallel dumps are jerky essentially due to the way that threads synchronize on startup and end. In addition the risk of waiting forever for a stopped thread exists, and panics happening in parallel to thread dumps are not more reliable either. This commit revisits the state transitions so that all threads may request a dump in parallel, that all of them wait for each other in the handler, and that one thread is responsible for counting every other and checking that the total matches the number of active threads. Then for stopping there's a finishing phase that all threads wait for so that none quits this area too early. Given that we now know the number of participants to the dump, we can let them each decrement the counter when leaving so that another dump may only start after the last participant has completely left. Now many thread dumps in parallel are running fine, so do panics. No backport is needed as this was the result of the changes for thread groups.	2022-07-15 19:41:26 +02:00
Willy Tarreau	55433f9b34	BUG/MINOR: debug: enter ha_panic() only once Some panic dumps are mangled or truncated due to the watchdog firing at the same time on multiple threads and calling ha_panic() simultaneously. What may happen in this case is that the second one waits for the first one to finish but as soon as it's done the second one resets the buffer and dumps again, sometimes resetting the first one's dump. Also the first one's abort() may trigger while the second one is currently dumping, resulting in a full dump followed by a truncated one, leading to confusion. Sometimes some lines appear in the middle of a dump as well. It doesn't happen often and is easier to trigger by causing massive deadlocks. There's no reason for the process to resist a panic, so we can safely add a counter and do nothing on subsequent calls. Ideally we'd wait there forever but as this may happen inside a signal handler (e.g. watchdog), it doesn't always work, so the easiest thing to do is to return so that the thread is interrupted as soon as possible and brought to the debug handler to be dumped. This should be backported, at least to 2.6 and possibly to older versions as well.	2022-07-15 19:41:26 +02:00
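The "counter and do nothing on subsequent calls" idea from commit `55433f9b34` above can be sketched with a C11 atomic. This is only an illustration of the technique under that assumption: `ex_panic_entered` and `ex_panic_enter_once()` are hypothetical names, and HAProxy uses its own atomic macros rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>

/* Hypothetical counter: how many times the panic path was entered */
atomic_uint ex_panic_entered;

/* Returns 1 for the very first caller and 0 for all later ones, so that
 * only one thread resets the buffer and dumps. Later callers simply return.
 */
int ex_panic_enter_once(void)
{
	/* atomic_fetch_add() returns the value held before the increment */
	return atomic_fetch_add(&ex_panic_entered, 1) == 0;
}
```

Returning instead of spinning matters here because, as the message notes, the call may come from a signal handler where waiting forever is not always possible.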
Willy Tarreau	52f238d326	BUG/MEDIUM: cli/threads: make "show threads" more robust on applets Running several concurrent "show threads" in loops might occasionally cause a segfault when trying to retrieve the stream from appctx_sc() which may be null while the applet is finishing. It's not easy to reproduce, it requires 3-5 sessions in parallel for about a minute or so. The appctx_sc must be checked before passing it to sc_strm(). This must be backported to 2.6 which also has the bug.	2022-07-15 19:41:26 +02:00
Willy Tarreau	672972604f	BUG/MEDIUM: debug: fix possible hang when multiple threads dump at once A bug in the thread dumper was introduced by commit `00c27b50c` ("MEDIUM: debug: make the thread dumper not rely on a thread mask anymore"). If two or more threads try to trigger a thread dump exactly at the same time, the second one may loop indefinitely trying to set the value to 1 while the other ones will wait for it to finish dumping before leaving. This is a consequence of a logic change using thread numbers instead of a thread mask, as threads do not need to see all other ones there anymore. No backport is needed, this is only for 2.7.	2022-07-13 09:03:02 +02:00
Willy Tarreau	89ed89e895	BUILD: debug: re-export thread_dump_state Building with threads and without thread dump (e.g. macos, freebsd) warns that thread_dump_state is unused. This happened in fact with recent commit `1229ef312` ("MINOR: wdt: do not rely on threads_to_dump anymore"). The solution would be to mark it unused, but after a second thought, it can be convenient to keep it exported to help debug crashes, so let's export it again. It's just not referenced in include files since it's not needed outside.	2022-07-01 21:18:03 +02:00
Willy Tarreau	039972b4e5	BUILD: debug: fix build issue on clang with previous commit Since the thread_dump_state type changed to uint, the old value in the CAS needs to be the same as well.	2022-07-01 19:37:42 +02:00
Willy Tarreau	00c27b50c0	MEDIUM: debug: make the thread dumper not rely on a thread mask anymore The thread mask is too short to dump more than 64 bits. Thus here we're using a different approach with two counters, one for the next thread ID to dump (which always exists, as it's looked up), and the second one for the number of threads done dumping. This allows to dump threads in ascending order then to let them wait for all others to be done, then to leave without the risk of an overlapping dump until the done count is null again. This allows to remove threads_to_dump which was the last non-FD variable using a global thread mask.	2022-07-01 19:31:39 +02:00
Willy Tarreau	1229ef312d	MINOR: wdt: do not rely on threads_to_dump anymore This flag is not needed anymore as we're already marking the waiting threads as harmless, thus the thread's bit is already covered by this information. The variable was unexported.	2022-07-01 19:26:35 +02:00
Willy Tarreau	f7afdd910b	MINOR: debug: mark oneself harmless while waiting for threads to finish The debug_handler() function waits for other threads to join, but does not mark itself as harmless, so if at the same time another thread tries to isolate, this may deadlock. In practice this does not happen as the signal is received during epoll_wait() hence under harmless mode, but it can possibly arrive under other conditions. In order to improve this, while waiting for other threads to join, we're now marking the current thread as harmless, as it's doing nothing but waiting for the other ones. This way another harmless waiter will be able to proceed. It's valid to do this since we're not doing anything else in this loop. One improvement could be to also check for the thread being idle and marking it idle in addition to harmless, so that it can even release a full isolation requester. But that really doesn't look worth it.	2022-07-01 19:26:35 +02:00
Willy Tarreau	a2b8ed4b44	MINOR: thread: add is_thread_harmless() to know if a thread already is harmless The harmless status is not re-entrant, so sometimes for signal handling it can be useful to know if we're already harmless or not. Let's add a function doing that, and make the debugger use it instead of manipulating the harmless mask.	2022-07-01 19:26:35 +02:00
Willy Tarreau	03f9b35114	MEDIUM: tinfo: add a dynamic thread-group context The thread group info is not sufficient to represent a thread group's current state as it's read-only. We also need something comparable to the thread context to represent the aggregate state of the threads in that group. This patch introduces ha_tgroup_ctx[] and tg_ctx for this. It's indexed on the group id and must be cache-line aligned. The thread masks that were global and that do not need to remain global were moved there (want_rdv, harmless, idle). Given that all the masks placed there now become group-specific, the associated thread mask (tid_bit) now switches to the thread's local bit (ltid_bit). Both are the same for nbtgroups 1 but will differ for other values. There's also a tg_ctx pointer in the thread so that it can be reached from other threads.	2022-07-01 19:15:15 +02:00
Willy Tarreau	38d0712748	MINOR: debug: use ltid_bit in ha_thread_dump() Since commit `cc7a11ee3` ("MINOR: threads: set the tid, ltid and their bit in thread_cfg") we ought not use (1UL << thr) to get the group mask for thread <thr>, but (ha_thread_info[thr].ltid_bit). ha_thread_dump() needs this.	2022-07-01 19:15:14 +02:00
Willy Tarreau	66ad98a772	MINOR: tinfo: add the tgid to the thread_info struct At several places we're dereferencing the thread group just to catch the group number, and this will become even more required once we start to use per-group contexts. Let's just add the tgid in the thread_info struct to make this easier.	2022-07-01 19:15:14 +02:00
Willy Tarreau	e7475c8e79	MEDIUM: tasks/fd: replace sleeping_thread_mask with a TH_FL_SLEEPING flag Every single place where sleeping_thread_mask was still used was to test or set a single thread. We can now add a per-thread flag to indicate a thread is sleeping, and remove this shared mask. The wake_thread() function now always performs an atomic fetch-and-or instead of a first load then an atomic OR. That's cleaner and more reliable. This is not easy to test, as broadcast FD events are rare. The good way to test for this is to run a very low rate-limited frontend with a listener that listens to the fewest possible threads (2), and to send it only 1 connection at a time. The listener will periodically pause and the wakeup task will sometimes wake up on a random thread and will call wake_thread(): frontend test bind :8888 maxconn 10 thread 1-2 rate-limit sessions 5 Alternately, disabling/enabling a frontend in loops via the CLI also broadcasts such events, but they're more difficult to observe since this is causing connection failures.	2022-07-01 19:15:14 +02:00
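The difference between "load then atomic OR" and a single atomic fetch-and-or, mentioned in commit `e7475c8e79` above, can be sketched as follows. The flag names `EX_FL_SLEEPING` and `EX_FL_NOTIFIED` and the function `ex_must_wake()` are assumptions made for this illustration; they only stand in for per-thread flags such as TH_FL_SLEEPING and do not reproduce the real wake_thread().

```c
#include <stdatomic.h>

#define EX_FL_SLEEPING 0x1u /* hypothetical: the thread is sleeping in the poller */
#define EX_FL_NOTIFIED 0x2u /* hypothetical: someone already asked to wake it up */

/* Marks the thread as notified and returns 1 only if the caller is the one
 * that must actually wake it: it was sleeping and nobody had notified it yet.
 */
int ex_must_wake(atomic_uint *flags)
{
	/* a single read-modify-write: no window between the test and the update */
	unsigned int old = atomic_fetch_or(flags, EX_FL_NOTIFIED);

	return (old & (EX_FL_SLEEPING | EX_FL_NOTIFIED)) == EX_FL_SLEEPING;
}
```

Because the decision is taken on the value returned by the atomic operation itself, two concurrent callers cannot both conclude that they are responsible for the wakeup.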
Willy Tarreau	bdcd32598f	MINOR: thread: only use atomic ops to touch the flags The thread flags are touched a little bit by other threads, e.g. the STUCK flag may be set by other ones, and they're watched a little bit. As such we need to use atomic ops only to manipulate them. Most places were already using them, but here we generalize the practice. Only ha_thread_dump() does not change because it's run under isolation.	2022-07-01 19:15:14 +02:00
Willy Tarreau	c958c70ec8	MINOR: task: replace global_tasks_mask with a check for tree's emptiness This bit field used to be a per-thread cache of the result of the last lookup of the presence of a task for each thread in the shared cache. Since we now know that each thread has its own shared cache, a test of emptiness is now sufficient to decide whether or not the shared tree has a task for the current thread. Let's just remove this mask.	2022-07-01 19:15:14 +02:00
Willy Tarreau	8e5c53a6c9	MINOR: debug: remove mask support from "debug dev sched" The thread mask will not be used anymore, instead the thread id only is used. Interestingly it was already implemented in the parsing but not used. The single/multi thread argument is not needed anymore since it's sufficient to pass tid<0 to get a multi-threaded task/tasklet. This is in preparation for the removal of the thread_mask in tasks as only this debug code was using it!	2022-07-01 19:15:14 +02:00
Willy Tarreau	27061cd144	MEDIUM: debug: improve DEBUG_MEM_STATS to also report pool alloc/free Sometimes using "debug dev memstats" can be frustrating because all pool allocations are reported through pool-os.h and that's all. But in practice there's nothing wrong with also intercepting pool_alloc, pool_free and pool_zalloc and report their call counts and locations, so that's what this patch does. It only uses an alternate set of macros for these 3 calls when DEBUG_MEM_STATS is defined. The outputs are reported as P_ALLOC (for both pool_alloc() and pool_zalloc()) and P_FREE (for pool_free()).	2022-06-23 11:58:01 +02:00
Willy Tarreau	680ed5f28b	MINOR: task: move profiling bit to per-thread Instead of having a global mask of all the profiled threads, let's have one flag per thread in each thread's flags. They are never accessed more than one at a time and are better located inside the threads' contexts for both performance and scalability.	2022-06-14 10:38:03 +02:00
Willy Tarreau	c12b321661	CLEANUP: applet: rename appctx_cs() to appctx_sc() It returns a stream connector, not a conn_stream anymore, so let's fix its name.	2022-05-27 19:33:35 +02:00
Willy Tarreau	475e4636bc	CLEANUP: cli: rename all occurrences of stconn "cs" to "sc" Function arguments and local variables called "cs" were renamed to "sc" in the various keyword handlers.	2022-05-27 19:33:35 +02:00
Willy Tarreau	cb086c6de1	REORG: stconn: rename conn_stream.{c,h} to stconn.{c,h} There's no more reason for keeping the code and definitions in conn_stream, let's move all that to stconn. The alphabetical ordering of include files was adjusted.	2022-05-27 19:33:35 +02:00
Willy Tarreau	5edca2f0e1	REORG: rename cs_utils.h to sc_strm.h This file contains all the stream-connector functions that are specific to application layers of type stream. So let's name it accordingly so that it's easier to figure what's located there. The alphabetical ordering of include files was preserved.	2022-05-27 19:33:35 +02:00
Willy Tarreau	462b989d4c	CLEANUP: stconn: rename cs_conn_*() to sc_conn_*() The following functions which act on a connection-based stream connector were renamed to sc_conn_* (~60 places): cs_conn_drain_and_shut cs_conn_process cs_conn_read0 cs_conn_ready cs_conn_recv cs_conn_send cs_conn_shut cs_conn_shutr cs_conn_shutw	2022-05-27 19:33:34 +02:00
Willy Tarreau	ea27f48c5a	CLEANUP: stconn: rename cs_{check,strm,strm_task} to sc_strm_* These functions return the app-layer associated with an stconn, which is a check, a stream or a stream's task. They're used a lot to access channels, flags and for waking up tasks. Let's just name them appropriately for the stream connector.	2022-05-27 19:33:34 +02:00
Willy Tarreau	40a9c32e3a	CLEANUP: stconn: rename cs_{i,o}{b,c} to sc_{i,o}{b,c} We're starting to propagate the stream connector's new name through the API. Most call places of these functions that retrieve the channel or its buffer are in applets. The local variable names are not changed in order to keep the changes small and reviewable. There were ~92 uses of cs_ic(), ~96 of cs_oc() (due to co_get() being less factorizable than ci_put), and ~5 accesses to the buffer itself.	2022-05-27 19:33:34 +02:00
Willy Tarreau	d0a06d52f4	CLEANUP: applet: use applet_put() everywhere possible This applies the change so that the applet code stops using ci_putchk() and friends everywhere possible, for the much safer applet_put() instead. The change is mechanical but large. Two or three functions used to have no appctx and a cs derived from the appctx instead, which was a reminiscence of old times' stream_interface. These were simply changed to directly take the appctx. No sensitive change was performed, and the old (more complex) API is still usable when needed (e.g. the channel is already known). The change touched roughly a hundred locations, with no less than 124 lines removed. It's worth noting that the stats applet, the oldest of the series, could get a serious lifting, as it's still very channel-centric instead of propagating the appctx along the chain. Given that this code doesn't change often, there's no emergency to clean it up but it would look better.	2022-05-27 19:33:34 +02:00
Willy Tarreau	7cb9e6c6ba	CLEANUP: stream: rename "csf" and "csb" to "scf" and "scb" These are the stream connectors, let's give them consistent names. The patch is large (405 locations) but totally trivial.	2022-05-27 19:33:34 +02:00
Willy Tarreau	4596fe20d9	CLEANUP: conn_stream: tree-wide rename to stconn (stream connector) This renames the "struct conn_stream" to "struct stconn" and updates the descriptions in all comments (and the rare help descriptions) to "stream connector" or "connector". This touches a lot of files but the change is minimal. The local variables were not even renamed, so there's still a lot of "cs" everywhere.	2022-05-27 19:33:34 +02:00
Willy Tarreau	0698c80a58	CLEANUP: applet: remove the unneeded appctx->owner This one is the pointer to the conn_stream which is always in the endpoint that is always present in the appctx, thus it's not needed. This patch removes it and replaces it with appctx_cs() instead. A few occurrences that were using __cs_strm(appctx->owner) were moved directly to appctx_strm() which does the equivalent.	2022-05-13 14:28:48 +02:00
Willy Tarreau	aa229ccc4c	MINOR: lua: move the http service context out of appctx.ctx Just like for the TCP service, let's move the context away from appctx.ctx. A new struct hlua_http_ctx was defined, reserved in hlua_applet_http_init() and used everywhere else. Similarly, the task dump code will no more report decoded stack traces in case these services would be involved. That may be solved later.	2022-05-06 18:13:36 +02:00
Willy Tarreau	e23f33bbfe	MINOR: lua: move the tcp service storage outside of appctx.ctx The use-service mechanism for Lua in TCP mode relies on the hlua_tcp storage in appctx->ctx. We can move its definition to hlua.c and simply use appctx_reserve_svcctx() to reserve and access the storage. One tiny side effect is that the task dump used in panics will not show anymore the Lua call stack in its trace. For this a better API is needed from the Lua code to expose a function that does the job from an appctx.	2022-05-06 18:13:36 +02:00
Willy Tarreau	40e952f1a6	CLEANUP: debug/cli: make "debug dev memstats" not use ctx.cli anymore There was only the need for a start and a stop pointer, and a show_all flag. All of that moved to a locally-defined struct dev_mem_ctx.	2022-05-06 18:13:36 +02:00
Willy Tarreau	e06bbf3f19	CLEANUP: debug/cli: make "debug dev fd" not use ctx.cli anymore The command only requires to store an int, but it will be useful later to have a struct to pass extra info such as an "all" flag to dump all FDs. The new context is now a struct dev_fd_ctx stored in svcctx.	2022-05-06 18:13:36 +02:00
Willy Tarreau	7831e0272e	BUILD: debug: unify the definition of ha_backtrace_to_stderr() It was both defined as ha_backtrace_to_stderr(void) and ha_backtrace_to_stderr(), and tcc is not happy with this, so let's adjust this tiny detail.	2022-05-06 15:16:19 +02:00
Willy Tarreau	382474348c	CLEANUP: tree-wide: use fd_set_nonblock() and fd_set_cloexec() This gets rid of most open-coded fcntl() calls, some of which were passed through DISGUISE() to avoid a useless test. The FD_CLOEXEC was most often set without preserving previous flags, which could become a problem once new flags are created. Now this will not happen anymore.	2022-04-26 10:59:48 +02:00
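The flag-preserving behaviour described in commit `382474348c` above can be sketched with plain fcntl() calls. The helpers below are hypothetical equivalents written for illustration (`ex_set_cloexec()` and `ex_set_nonblock()`); they are not the bodies of fd_set_cloexec() and fd_set_nonblock().

```c
#include <fcntl.h>

/* Hypothetical helper: sets FD_CLOEXEC while keeping any other descriptor
 * flag already present. Returns 0 on success, -1 on error.
 */
int ex_set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Hypothetical helper: same read-modify-write approach for O_NONBLOCK,
 * which lives in the file status flags (F_GETFL/F_SETFL).
 */
int ex_set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

An open-coded `fcntl(fd, F_SETFD, FD_CLOEXEC)` would instead overwrite whatever descriptor flags were set before, which is the problem the message warns about.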
Christopher Faulet	6b0a0fb2f9	CLEANUP: tree-wide: Remove any ref to stream-interfaces Stream-interfaces are gone. Corresponding files can safely be removed. In addition, comments are updated accordingly.	2022-04-13 15:10:16 +02:00
Christopher Faulet	582a226a2c	MINOR: conn-stream: Remove the stream-interface from the conn-stream The stream-interface API is no longer used. Thus, it is removed from the conn-stream. From now on, stream-interfaces are no longer used!	2022-04-13 15:10:16 +02:00
Christopher Faulet	5e29b76ea6	MEDIUM: stream-int/conn-stream: Move I/O functions to conn-stream cs_conn_io_cb(), cs_conn_sync_recv() and cs_conn_sync_send() are moved in conn_stream.c. Associated functions are moved too (cs_notify, cs_conn_read0, cs_conn_recv, cs_conn_send and cs_conn_process).	2022-04-13 15:10:15 +02:00
Christopher Faulet	a0bdec350f	MEDIUM: stream-int/conn-stream: Move blocking flags from SI to CS Remaining flags and associated functions are moved to the conn-stream scope. These flags are added on the endpoint and not the conn-stream itself. This way it will be possible to get them from the mux or the applet. The functions to get or set these flags are renamed accordingly with the "cs_" prefix and updated to manipulate a conn-stream instead of a stream-interface.	2022-04-13 15:10:15 +02:00
Christopher Faulet	4a7764ae9d	MINOR: stream-int/conn-stream: Move si_cs_io_cb() in the conn-stream scope si_cs_io_cb() is renamed cs_conn_io_cb(). In addition, the context of the tasklet used to wake-up the conn-stream is now a conn-stream.	2022-04-13 15:10:15 +02:00
Christopher Faulet	62e757470a	MEDIUM: stream-int/conn-stream: Move stream-interface state in the conn-stream The stream-interface state (SI_ST_*) is now in the conn-stream. It is a mechanical replacement for now. Nothing special. SI_ST_* and SI_SB_* were renamed accordingly. Utility functions to manipulate this information were moved under the conn-stream scope. But it could be good to keep in mind that this part should be reworked. Indeed, at the CS level, we only need to know if it is ready to receive or to send. The state of conn-stream from INI to EST is only used on the server side. The client CS is immediately set to EST. Thus current SI_ST_* states should probably be moved to the stream to reflect the server connection state during the establishment stage.	2022-04-13 15:10:15 +02:00
Christopher Faulet	ae024ced03	MEDIUM: stream-int/stream: Use connect expiration instead of SI expiration The expiration date in the stream-interface was only used on the server side to set the connect, queue or turn-around timeout. It was checked on the frontend stream-interface, but never used concretely. So it was removed and replaced by a connect expiration date in the stream itself. Thus, SI_FL_EXP flag in stream-interfaces is replaced by a stream flag, SF_CONN_EXP.	2022-04-13 15:10:14 +02:00
Christopher Faulet	908628c4c0	MEDIUM: tree-wide: Use CS util functions instead of SI ones At many places, we now use the new CS functions to get a stream or a channel from a conn-stream instead of using the stream-interface API. It is the first step to reduce the scope of the stream-interfaces. The main change here is about the applet I/O callback functions. Before the refactoring, the stream-interface was the appctx owner. Thus, it was heavily used. Now, as far as possible, the conn-stream is used. Of course, many calls to the stream-interface API remain.	2022-04-13 15:10:14 +02:00
Christopher Faulet	fe14af30ec	BUG/MEDIUM: cli/debug: Properly get the stream-int in all debug I/O handlers The appctx owner is not a stream-interface anymore. It is now a conn-stream. In the cli I/O handler for the command "debug dev fd", we still handle it as a stream-interface. It is now fixed. It is 2.6-specific, no backport is needed.	2022-03-16 09:52:13 +01:00
Willy Tarreau	06e66c84fc	DEBUG: reduce the footprint of BUG_ON() calls Many inline functions involve some BUG_ON() calls and because of the partial complexity of the functions, they're not inlined anymore (e.g. co_data()). The reason is that the expression instantiates the message, its size, sometimes a counter, then the atomic OR to taint the process, and the back trace. That can be a lot for an inline function and most of it is always the same. This commit modifies this by delegating the common parts to a dedicated function "complain()" that takes care of updating the counter if needed, writing the message and measuring its length, and tainting the process. This way the caller only has to check a condition, pass a pointer to the preset message, and the info about the type (bug or warn) for the tainting, then decide whether to dump or crash. Note that this part could also be moved to the function but resulted in complain() always being at the top of the stack, which didn't seem like an improvement. Thanks to these changes, the BUG_ON() calls do not result in uninlining functions anymore and the overall code size was reduced by 60 to 120 kB depending on the build options.	2022-03-02 16:00:42 +01:00
Willy Tarreau	6d3f1e322e	DEBUG: rename WARN_ON_ONCE() to CHECK_IF() The only reason for warning once is to check if a condition really happens. Let's use a term that better translates the intent, that's important when reading the code.	2022-02-28 11:51:23 +01:00
Willy Tarreau	4e0a8b1224	DEBUG: add a new WARN_ON_ONCE() macro This one will maintain a static counter per call place and will only emit the warning on the first call. It may be used to invite users to report an unexpected event without spamming them with messages.	2022-02-25 11:55:47 +01:00
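The "static counter per call place" mechanism from commit `4e0a8b1224` above can be sketched as a macro. This is a simplified illustration and not HAProxy's macro: `EX_WARN_ON_ONCE()`, `ex_warnings_emitted` and `ex_check_len()` are hypothetical names, the message format is invented, and the plain increment is not thread-safe.

```c
#include <stdio.h>

/* Number of warnings really emitted; only present to make the sketch observable */
unsigned int ex_warnings_emitted;

/* Hypothetical macro: each expansion owns its own static counter, so the
 * warning is printed on the first hit at that call place and never again.
 * With threads, the counter would need an atomic increment.
 */
#define EX_WARN_ON_ONCE(cond) do {                                   \
		static unsigned int ex_hits;                         \
		if ((cond) && ex_hits++ == 0) {                      \
			ex_warnings_emitted++;                       \
			fprintf(stderr, "WARNING: %s at %s:%d\n",    \
				#cond, __FILE__, __LINE__);          \
		}                                                    \
	} while (0)

/* Example call place: complains about a negative length, but only once */
void ex_check_len(int len)
{
	EX_WARN_ON_ONCE(len < 0);
}
```

Since the counter is declared inside the macro body, two different call places warn independently of each other, which matches the intent of inviting a report without flooding the logs.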
Willy Tarreau	305cfbde43	DEBUG: add a new WARN_ON() macro This is the same as BUG_ON() except that it never crashes and only emits a warning and a backtrace, inviting users to report the problem. This will be usable for non-fatal issues that should not happen and need to be fixed. This way the BUG_ON() when using DEBUG_STRICT_NOCRASH is effectively an equivalent of WARN_ON().	2022-02-25 11:55:47 +01:00
Christopher Faulet	5d3c8aa154	MINOR: debug: Always access the stream-int via the conn-stream To be able to move the stream-interface from the stream to the conn-stream, all access to the SI is done via the conn-stream. This patch is limited to the debug part.	2022-02-24 11:00:02 +01:00
Christopher Faulet	86e1c3381b	MEDIUM: applet: Set the conn-stream as appctx owner instead of the stream-int Because appctx is now an endpoint of the conn-stream, there is no reason to still have the stream-interface as appctx owner. Thus, the conn-stream is now the appctx owner.	2022-02-24 11:00:02 +01:00
Willy Tarreau	410942b92a	BUILD: debug/cli: condition test of O_ASYNC to its existence David Carlier reported a build breakage on Haiku since commit `5be7c198e` ("DEBUG: cli: add a new "debug dev fd" expert command") due to O_ASYNC not being defined. Ilya also reported it broke the build on Cygwin. It's not that portable and sometimes defined as O_NONBLOCK for portability. But here we don't even need that, as we already condition other flags, let's just ignore it if it does not exist.	2022-01-25 14:51:53 +01:00
Willy Tarreau	5be7c198e5	DEBUG: cli: add a new "debug dev fd" expert command This command will scan the whole file descriptors space to look for existing FDs that are unknown to haproxy's fdtab, and will try to dump as much information as possible about them (including type, mode, device, size, uid/gid, cloexec, O_* flags, socket types and addresses when relevant). The goal is to help detect FDs inherited from parent processes as well as potential leaks. Some of those listed are actually known but handled so deep into some systems that they're not in the fdtab (such as epoll FDs or inter-thread pipes). This might be refined in the future so that these ones become known and do not appear. Example of output: $ socat - /tmp/sock1 <<< "expert-mode on;debug dev fd" 0 type=tty. mod=0620 dev=0x8803 siz=0 uid=1000 gid=5 fs=0x16 ino=0x6 getfd=+0 getfl=O_RDONLY,O_APPEND 1 type=tty. mod=0620 dev=0x8803 siz=0 uid=1000 gid=5 fs=0x16 ino=0x6 getfd=+0 getfl=O_RDONLY,O_APPEND 2 type=tty. mod=0620 dev=0x8803 siz=0 uid=1000 gid=5 fs=0x16 ino=0x6 getfd=+0 getfl=O_RDONLY,O_APPEND 3 type=pipe mod=0600 dev=0 siz=0 uid=1000 gid=100 fs=0xc ino=0x18112348 getfd=+0 4 type=epol mod=0600 dev=0 siz=0 uid=0 gid=0 fs=0xd ino=0x3674 getfd=+0 getfl=O_RDONLY 33 type=pipe mod=0600 dev=0 siz=0 uid=1000 gid=100 fs=0xc ino=0x24af8251 getfd=+0 getfl=O_RDONLY 34 type=epol mod=0600 dev=0 siz=0 uid=0 gid=0 fs=0xd ino=0x3674 getfd=+0 getfl=O_RDONLY 36 type=pipe mod=0600 dev=0 siz=0 uid=1000 gid=100 fs=0xc ino=0x24af8d1b getfd=+0 getfl=O_RDONLY 37 type=epol mod=0600 dev=0 siz=0 uid=0 gid=0 fs=0xd ino=0x3674 getfd=+0 getfl=O_RDONLY 39 type=pipe mod=0600 dev=0 siz=0 uid=1000 gid=100 fs=0xc ino=0x24afa04f getfd=+0 getfl=O_RDONLY 41 type=pipe mod=0600 dev=0 siz=0 uid=1000 gid=100 fs=0xc ino=0x24af8252 getfd=+0 getfl=O_RDONLY 42 type=epol mod=0600 dev=0 siz=0 uid=0 gid=0 fs=0xd ino=0x3674 getfd=+0 getfl=O_RDONLY	2022-01-24 20:26:09 +01:00
Willy Tarreau	6ab7b21a11	MINOR: debug: add ability to dump loaded shared libraries Many times core dumps reported by users who experience trouble are difficult to exploit due to missing system libraries. Sometimes, having just a list of loaded libraries and their respective addresses can already provide some hints about some problems. This patch makes a step in that direction by adding a new "show libs" command that will try to enumerate the list of object files that are loaded in memory, relying on the dynamic linker for this. It may also be used to detect that some foreign code embarks other undesired libs (e.g. some external Lua modules). At the moment it's only supported on glibc when USE_DL is set, but it's implemented in a way that ought to make it reasonably easy to be extended to other platforms.	2021-12-28 16:59:00 +01:00
Willy Tarreau	a3870b7952	MINOR: debug: report the group and thread ID in the thread dumps Now thread dumps will report the thread group number and the ID within this group. Note that this is still quite limited because some masks are calculated based on the thread in argument while they have to be performed against a group-level thread ID.	2021-10-08 17:22:26 +02:00
Willy Tarreau	a0b99536c8	REORG: thread/sched: move the thread_info flags to the thread_ctx The TI_FL_STUCK flag is manipulated by the watchdog and scheduler and describes the apparent life/death of a thread so it changes all the time and it makes sense to move it to the thread's context for an active thread.	2021-10-08 17:22:26 +02:00
Willy Tarreau	45c38e22bf	REORG: thread/clock: move the clock parts of thread_info to thread_ctx The "thread_info" name was initially chosen to store all info about threads but since we now have a separate per-thread context, there is no point keeping some of its elements in the thread_info struct. As such, this patch moves prev_cpu_time, prev_mono_time and idle_pct to thread_ctx, into the thread context, with the scheduler parts. Instead of accessing them via "ti->" we now access them via "th_ctx->", which makes more sense as they're totally dynamic, and will be required for future evolutions. There's no room problem for now, the structure still has 84 bytes available at the end.	2021-10-08 17:22:26 +02:00
Willy Tarreau	1a9c922b53	REORG: thread/sched: move the task_per_thread stuff to thread_ctx The scheduler contains a lot of stuff that is thread-local and not exclusively tied to the scheduler. Other parts (namely thread_info) contain similar thread-local context that ought to be merged with it but that is even less related to the scheduler. However moving more data into this structure isn't possible since task.h is high level and cannot be included everywhere (e.g. activity) without causing include loops. In the end, it appears that the task_per_thread represents most of the per-thread context defined with generic types and should simply move to tinfo.h so that everyone can use them. The struct was renamed to thread_ctx and the variable "sched" was renamed to "th_ctx". "sched" used to be initialized manually from run_thread_poll_loop(), now it's initialized by ha_set_tid() just like ti, tid, tid_bit. The memset() in init_task() was removed in favor of a bss initialization of the array, so that other subsystems can put their stuff in this array. Since the tasklet array has TL_CLASSES elements, the TL_* definitions were moved there as well, but it's not a problem. The vast majority of the change in this patch is caused by the renaming of the structures.	2021-10-08 17:22:26 +02:00
Willy Tarreau	2169498941	MINOR: clock: move the clock_ids to clock.c This removes the knowledge of clockid_t from anywhere but clock.c, thus eliminating a source of includes burden. The unused clock_id field was removed from thread_info, and the definition setting of clockid_t was removed from compat.h. The most visible change is that the function now_cpu_time_thread() now takes the thread number instead of a tinfo pointer.	2021-10-08 17:22:26 +02:00
Willy Tarreau	5554264f31	REORG: time: move time-keeping code and variables to clock.c There is currently a problem related to time keeping. We're mixing the functions to perform calculations with the os-dependent code needed to retrieve and adjust the local time. This patch extracts from time.{c,h} the parts that are solely dedicated to time keeping. These are the "now" or "before_poll" variables for example, as well as the various now_() functions that make use of gettimeofday() and clock_gettime() to retrieve the current time. The "tv_" functions moved there were also more appropriately renamed to "clock_*". Other parts used to compute stolen time are in other files, they will have to be picked next.	2021-10-08 17:22:26 +02:00
Willy Tarreau	b7fc4c4e9f	BUILD: tree-wide: add missing http_ana.h from many places At least 6 files make use of s->txn without including http_ana which defines it. They used to get it from other includes.	2021-10-07 01:36:51 +02:00
Willy Tarreau	b205bfdab7	CLEANUP: cli/tree-wide: properly re-align the CLI commands' help messages There were 102 CLI commands whose help were zig-zagging all along the dump making them unreadable. This patch realigns all these messages so that the command now uses up to 40 characters before the delimiting colon. About a third of the commands did not correctly list their arguments which were added after the first version, so they were all updated. Some abuses of the term "id" were fixed to use a more explanatory term. The "set ssl ocsp-response" command was not listed because it lacked a help message, this was fixed as well. The deprecated enable/disable commands for agent/health/server were prominently written as deprecated. Whenever possible, clearer explanations were provided.	2021-05-07 11:51:26 +02:00
Willy Tarreau	48129be18a	MINOR: debug: add a new "debug dev sym" command in expert mode This command attempts to resolve a pointer to a symbol name. This is convenient during development as it's easier to get such pointers live than by issuing a debugger or calling addr2line.	2021-05-05 07:47:29 +02:00
Willy Tarreau	4781b1521a	CLEANUP: atomic/tree-wide: replace single increments/decrements with inc/dec This patch replaces roughly all occurrences of an HA_ATOMIC_ADD(&foo, 1) or HA_ATOMIC_SUB(&foo, 1) with the equivalent HA_ATOMIC_INC(&foo) and HA_ATOMIC_DEC(&foo) respectively. These are 507 changes over 45 files.	2021-04-07 18:18:37 +02:00
Christopher Faulet	cc2c4f8f4c	BUG/MEDIUM: debug/lua: Use internal hlua function to dump the lua traceback The commit reverts following commits: * `83926a04` BUG/MEDIUM: debug/lua: Don't dump the lua stack if not dumpable * `a61789a1` MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua Instead of relying on a Lua function to print the lua traceback into the debugger, we are now using our own internal function (hlua_traceback()). This one does not allocate memory and use a chunk instead. This avoids any issue with a possible deadlock in the memory allocator because the thread processing was interrupted during a memory allocation. This patch relies on the commit "BUG/MEDIUM: debug/lua: Use internal hlua function to dump the lua traceback". Both must be backported wherever the patches above are backported, thus as far as 2.0	2021-03-24 16:35:23 +01:00
Christopher Faulet	83926a04fe	BUG/MEDIUM: debug/lua: Don't dump the lua stack if not dumpable When we try to dump the stack of a lua context, if it is not dumpable, nothing is performed and a message is emitted instead. This happens when a lua execution was interrupted inside a non-reentrant part. This patch depends on following commit : * MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua Thanks to this patch, we avoid a possible deadlock if the lua is interrupted by the watchdog in the lua memory allocator, because realloc() is not async-signal-safe. Both patches must be backported as far as 2.0.	2021-03-19 16:19:59 +01:00
Willy Tarreau	144f84a09d	MEDIUM: task: extend the state field to 32 bits It's been too short for quite a while now and is now full. It's still time to extend it to 32-bits since we have room for this without wasting any space, so we now gained 16 new bits for future flags. The values were not reassigned just in case there would be a few hidden u16 or short somewhere in which these flags are placed (as it used to be the case with stream->pending_events). The patch is tagged MEDIUM because this required to update the task's process() prototype to use an int instead of a short, that's quite a bunch of places.	2021-03-05 08:30:08 +01:00
Willy Tarreau	06e69b556c	REORG: tools: promote the debug PRNG to more general use as a statistical one We frequently need to access a simple and fast PRNG for statistical purposes. The debug_prng() function did exactly this using a xorshift generator but its use was limited to debug only. Let's move this to tools.h and tools.c to make it accessible everywhere. Since it needs to be fast, its state is thread-local. An initialization function starts a different initial value for each thread for better distribution.	2021-03-05 08:30:08 +01:00
Willy Tarreau	1f3b1417b8	CLEANUP: tasks: use a less confusing name for task_list_size This one is systematically misunderstood due to its unclear name. It is in fact the number of tasks in the local tasklet list. Let's call it "tasks_in_list" to remove some of the confusion.	2021-02-24 17:42:04 +01:00
Willy Tarreau	9c7b8085f4	MEDIUM: task: remove the tasks_run_queue counter and have one per thread This counter is solely used for reporting in the stats and is the hottest thread contention point to date. Moving it to the scheduler and having a separate one for the global run queue dramatically improves the performance, showing a 12% boost on the request rate on 16 threads! In addition, the thread debugging output which used to rely on rqueue_size was not totally accurate as it would only report task counts. Now we can return the exact thread's run queue length. It is also interesting to note that there are still a few other task/tasklet counters in the scheduler that are not efficiently updated because some cover a single area and others cover multiple areas. It looks like having a distinct counter for each of the following entries would help and would keep the code a bit cleaner: - global run queue (tree) - per-thread run queue (tree) - per-thread shared tasklets list - per-thread local lists Maybe even splitting the shared tasklets lists between pure tasklets and tasks instead of having the whole and tasks would simplify the code because there remain a number of places where several counters have to be updated.	2021-02-24 17:42:04 +01:00
Willy Tarreau	2cbe2e7f84	BUILD: debug: fix build warning by consuming the write() result When writing commit `a8459b28c` ("MINOR: debug: create ha_backtrace_to_stderr() to dump an instant backtrace") I just forgot that some distros are a bit extremist about the syscall return values. src/debug.c: In function `ha_backtrace_to_stderr': src/debug.c:147:3: error: ignoring return value of `write', declared with attribute warn_unused_result [-Werror=unused-result] write(2, b.area, b.data); ^~~~~~~~~~~~~~~~~~~~~~~~ CC src/h1_htx.o Let's apply the usual tricks to shut them up. No backport is needed.	2021-01-22 15:58:26 +01:00
Willy Tarreau	2bfce7e424	MINOR: debug: let ha_dump_backtrace() dump a bit further for some callers The dump state is now passed to the function so that the caller can adjust the behavior. A new series of 4 values allow to stop after dumping main instead of before it or any of the usual loops. This allows to also report BUG_ON() that could happen very high in the call graph (e.g. startup, or the scheduler itself) while still understanding what the call path was.	2021-01-22 14:48:34 +01:00
Willy Tarreau	5baf4fe31a	MEDIUM: debug: now always print a backtrace on CRASH_NOW() and friends The purpose is to enable the dumping of a backtrace on BUG_ON(). While it's very useful to know that a condition was met, very often some caller context is missing to figure how the condition could happen. From now on, on systems featuring backtrace, a backtrace of the calling thread will also be dumped to stderr in addition to the unexpected condition. This will help users of DEBUG_STRICT as they'll most often find this backtrace in their logs even if they can't find their core file. A new "debug dev bug" expert-mode CLI command was added to test the feature.	2021-01-22 14:18:34 +01:00
Willy Tarreau	a8459b28c3	MINOR: debug: create ha_backtrace_to_stderr() to dump an instant backtrace This function calls the ha_dump_backtrace() function with a locally allocated buffer and sends the output slightly indented to fd #2. It's meant to be used as an emergency backtrace dump.	2021-01-22 14:15:36 +01:00
Willy Tarreau	123fc9786a	MINOR: debug: extract the backtrace dumping code to its own function The backtrace dumping code was located into the thread dump function but it looks particularly convenient to be able to call it to produce a dump in other situations, so let's move it to its own function and make sure it's called last in the function so that we can benefit from tail merging to save one entry.	2021-01-22 13:52:41 +01:00
Willy Tarreau	2f1227eb3f	MINOR: debug: always export the my_backtrace function In order to simplify the code and remove annoying ifdefs everywhere, let's always export my_backtrace() and make it adapt to the situation and return zero if not supported. A small update in the thread dump function was needed to make sure we don't use its results if it fails now.	2021-01-22 12:12:29 +01:00
Willy Tarreau	c7ead07b9c	CLEANUP: debug: mark the RNG's seed as unsigned Since commit `8a069eb9a` ("MINOR: debug: add a trivial PRNG for scheduler stress-tests"), 32-bit gcc 4.7 emits this warning when parsing the initial seed for the debugger's RNG (2463534242): src/debug.c:46:1: warning: this decimal constant is unsigned only in ISO C90 [enabled by default] Let's mark it explicitly unsigned.	2020-12-18 16:31:08 +01:00
Willy Tarreau	8a069eb9a4	MINOR: debug: add a trivial PRNG for scheduler stress-tests Commit `a5a447984` ("MINOR: debug: add "debug dev sched" to stress the scheduler.") doesn't scale with threads because ha_random64() takes care of being totally thread-safe for use with UUIDs. We don't need this for the stress-testing functions, let's just implement a xorshift PRNG instead. On 8 threads the performance jumped from 230k ctx/s with 96% spent in ha_random64() to 14M ctx/s.	2020-11-30 17:07:32 +01:00
Willy Tarreau	a5a4479849	MINOR: debug: add "debug dev sched" to stress the scheduler. This command supports starting a bunch of tasks or tasklets, either on the current thread (mask=0), all (default), or any set, either single-threaded or multi-threaded, and possibly auto-scheduled. These tasks/tasklets will randomly pick another one to wake it up. The tasks only do it 50% of the time while tasklets always wake two tasks up, in order to achieve roughly 50% load (since the target might already be woken up).	2020-11-29 17:43:07 +01:00
Christopher Faulet	fc633b6eff	CLEANUP: config: Return ERR_NONE from config callbacks instead of 0 Return ERR_NONE instead of 0 on success for all config callbacks that should return ERR_* codes. There is no change because ERR_NONE is a macro equal to 0. But this makes the return value more explicit.	2020-11-13 16:26:10 +01:00
Christopher Faulet	471425f51d	BUG/MINOR: debug: Don't dump the lua stack if it is not initialized When the watchdog is fired because of the lua, the stack of the corresponding lua context is dumped. But we must be sure the lua context is fully initialized to do so. If we are blocked on the global lua lock, during the lua context initialization, the lua stack may be NULL. This patch should fix the issue #776. It must be backported as far as 2.0.	2020-07-27 09:37:18 +02:00
Willy Tarreau	0c439d8956	BUILD: tools: make resolve_sym_name() return a const Originally it was made to return a void* because some comparisons in the code where it was used required a lot of casts. But now we don't need that anymore. And having it non-const breaks the build on NetBSD 9 as reported in issue #728. So let's switch to const and adjust debug.c to accommodate this.	2020-07-05 20:26:04 +02:00
Willy Tarreau	a6026a0c92	MINOR: debug: add a new "debug dev memstats" command Now when building with -DDEBUG_MEM_STATS, some malloc/calloc/strdup/realloc stats are kept per file+line number and may be displayed and even reset on the CLI using "debug dev memstats". This allows to easily track potential leakers or abnormal usages.	2020-07-02 09:14:48 +02:00
Willy Tarreau	59153fef86	MINOR: tasks: make run_tasks_from_lists() scan the queues itself Now process_runnable_tasks is responsible for calculating the budgets for each queue, dequeuing from the tree, and calling run_tasks_from_lists(). This latter one scans the queues, picking tasks there and respecting budgets. Note that its name was updated with a plural "s" for this reason.	2020-06-24 12:21:26 +02:00
Willy Tarreau	b2551057af	CLEANUP: include: tree-wide alphabetical sort of include files This patch fixes all the leftovers from the include cleanup campaign. There were not that many (~400 entries in ~150 files) but it was definitely worth doing it as it revealed a few duplicates.	2020-06-11 10:18:59 +02:00
Willy Tarreau	aeed4a85d6	REORG: include: move log.h to haproxy/log{,-t}.h The current state of the logging is a real mess. The main problem is that almost all files include log.h just in order to have access to the alert/warning functions like ha_alert() etc, and don't care about logs. But log.h also deals with real logging as well as log-format and depends on stream.h and various other things. As such it forces a few heavy files like stream.h to be loaded early and to hide missing dependencies depending where it's loaded. Among the missing ones is syslog.h which was often automatically included resulting in no less than 3 users missing it. Among 76 users, only 5 could be removed, and probably 70 don't need the full set of dependencies. A good approach would consist in splitting that file in 3 parts: - one for error output ("errors" ?). - one for log_format processing - and one for actual logging.	2020-06-11 10:18:58 +02:00
Willy Tarreau	5e539c9b8d	REORG: include: move stream_interface.h to haproxy/stream_interface{,-t}.h Almost no changes, removed stdlib and added buf-t and connection-t to the types to avoid a warning.	2020-06-11 10:18:58 +02:00
Willy Tarreau	83487a833c	REORG: include: move cli.h to haproxy/cli{,-t}.h Almost no change except moving the cli_kw struct definition after the defines. Almost all users had both types&proto included, which is not surprising since this code is old and it used to be the norm a decade ago. These places were cleaned.	2020-06-11 10:18:58 +02:00
Willy Tarreau	3727a8a083	REORG: include: move signal.h to haproxy/signal{,-t}.h No change was necessary. Include from wdt.c was dropped since unneeded.	2020-06-11 10:18:58 +02:00
Willy Tarreau	cea0e1bb19	REORG: include: move task.h to haproxy/task{,-t}.h The TASK_IS_TASKLET() macro was moved to the proto file instead of the type one. The proto part was a bit reordered to remove a number of ugly forward declarations of static inline functions. About ten C and H files had their dependency dropped since they were not using anything from task.h.	2020-06-11 10:18:58 +02:00
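
Commits `8a069eb9a4`, `c7ead07b9c` and `06e69b556c` above describe a small xorshift PRNG whose initial seed is 2463534242 and whose state is kept per thread. For readers unfamiliar with the technique, here is a minimal sketch of a 32-bit xorshift step. It is not taken from HAProxy's sources: the names `SKETCH_PRNG_SEED` and `sketch_xorshift32()` are hypothetical, and the shift triple (13, 17, 5) is the classic Marsaglia choice, assumed here because the commit messages do not state which one is used.

```c
#include <stdint.h>

/* Hypothetical names: this is an illustration, not HAProxy's code. */

/* Seed value quoted in the message of commit c7ead07b9c. It is marked
 * unsigned because 2463534242 does not fit in a signed 32-bit int.
 */
#define SKETCH_PRNG_SEED 2463534242U

/* One step of a 32-bit xorshift generator using the classic Marsaglia
 * shift triple (13, 17, 5), which is an assumption of this sketch.
 * The state must never be zero, otherwise it remains zero forever.
 */
static inline uint32_t sketch_xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}
```

A caller would typically keep one `uint32_t` state per thread, initialized to a distinct non-zero value, and pass its address on each call. Since no state is shared, no locking or atomic operation is needed, which is consistent with the scaling argument made in commit `8a069eb9a4`.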

210 Commits