haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-10-29 07:31:00 +01:00

Author	SHA1	Message	Date
Ilia Shipitsin	78b849b839	CLEANUP: assorted typo fixes in the code and comments code, comments and doc actually.	2025-04-02 11:12:20 +02:00
Willy Tarreau	b30639848e	BUILD: activity/memprofile: fix a build warning in the posix_memalign handler A "return NULL" statement was placed for error handling in the posix_memalign() handler instead of an int errno value, by recent commit 5ddc8b3ad4 ("MINOR: activity/memprofile: monitor non-portable calls as well"). Surprisingly the warning only triggered on gcc-4.8. Let's use ENOMEM instead. No backport needed.	2024-11-22 09:42:49 +01:00
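The point of the fix above is that posix_memalign() reports failure through an int error number rather than through a NULL pointer. A minimal sketch of a handler with that contract follows; `guarded_posix_memalign` and the `in_init` flag are hypothetical names used for illustration, not HAProxy's actual handler.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag: set while the wrapper cannot safely forward calls. */
static int in_init;

/* Same contract as posix_memalign(): returns 0 on success or an errno
 * value on failure. The error path returns ENOMEM, an int; a
 * "return NULL" here would have the wrong type for this function.
 */
int guarded_posix_memalign(void **ptr, size_t align, size_t size)
{
	if (in_init)
		return ENOMEM;

	/* forward to the real allocator and pass its result through */
	return posix_memalign(ptr, align, size);
}
```

A caller checks the return value against 0 and only then uses `*ptr`, which it later releases with free().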
Willy Tarreau	ead0b0154b	MINOR: activity/memprofile: use resolve_dso_name() for the DSO summary Let's simplify the code by making use of this simpler and sometimes more efficient variant.	2024-11-21 19:58:06 +01:00
Willy Tarreau	9a8b834435	MINOR: activity: interrupt the show profile dump more often The calls to resolv_sym_name() can be a bit expensive. Forcing to yield more often is better for the latency and will avoid the watchdog reporting warnings. Note that it's still called in the sort at the end, but that one cannot be avoided. At best we could try to rely on the list of libs but that's not trivial and not always present.	2024-11-21 19:58:06 +01:00
Willy Tarreau	5ddc8b3ad4	MINOR: activity/memprofile: monitor non-portable calls as well Some dependencies might very well rely on posix_memalign(), strndup() or other less portable calls, making us miss them when chasing memory leaks, resulting in negative global allocation counters. Let's provide the handlers for the following functions: strndup() // _POSIX_C_SOURCE >= 200809L \|\| glibc >= 2.10 valloc() // _BSD_SOURCE \|\| _XOPEN_SOURCE>=500 \|\| glibc >= 2.12 aligned_alloc() // _ISOC11_SOURCE posix_memalign() // _POSIX_C_SOURCE >= 200112L memalign() // obsolete pvalloc() // obsolete This time we don't fail if they're not found, we just silently forward the calls.	2024-11-21 19:58:06 +01:00
Willy Tarreau	33c0ce299d	MINOR: activity/memprofile: also monitor strdup() activity Some memory profiling outputs have shown negative counters, very likely due to some libs calling strdup(). Let's add it to the list of monitored activities. Actually even haproxy itself uses some. Having "profiling.memory on" in the config reveals 35 call places.	2024-11-21 19:58:06 +01:00
Willy Tarreau	623a2c4e19	CLEANUP: activity: better use a mask to tests freeing methods In "show profiling memory", we need to distinguish methods which really free memory from those which do not so that we don't account for the free value twice. However for now it's done using multiple tests, which are going to complicate the addition of new methods. Let's switch to a bit field defined as a mask in a single place instead, as we don't intend to use more than 32/64 methods!	2024-11-21 19:58:06 +01:00
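The single-mask idea described above can be sketched as follows. The enum values and the names `FREEING_METHODS_MASK` and `method_frees_memory` are assumptions made for this example, and the choice of realloc, free and pool free as the freeing methods is illustrative rather than a copy of the real definition.

```c
/* Hypothetical method list; the real profiler has its own enum. */
enum alloc_method {
	METH_UNKNOWN = 0,
	METH_MALLOC,
	METH_CALLOC,
	METH_REALLOC,
	METH_FREE,
	METH_P_ALLOC,
	METH_P_FREE,
	METH_COUNT
};

/* One bit per method that releases memory, defined in a single place.
 * Adding a new freeing method means adding one bit here and nowhere else.
 */
#define FREEING_METHODS_MASK \
	((1U << METH_REALLOC) | (1U << METH_FREE) | (1U << METH_P_FREE))

/* Returns 1 if method <m> releases memory, otherwise 0. */
static inline int method_frees_memory(enum alloc_method m)
{
	return (int)((FREEING_METHODS_MASK >> m) & 1U);
}
```

With an unsigned int mask this holds up to 32 methods, which matches the remark about not needing more than 32/64.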
Willy Tarreau	f3547d0b74	MINOR: activity: better report nil than ffff in unknown callers For unknown callers we try to get the lowest known address and we purposely ignore NULL during calculation of the min. But the side effect is that we also report ffff in the per-DSO address. Better catch this case and finally accept to report nil. Before it would report this: $ socat - /tmp/sock1 <<< "show profiling memory" \|grep nil 50000 10 9600000 9440\| (nil) [other] unknown(192) [delta=9590560] [pool=http_txn] 50000 10 9600000 9440\| (nil) DSO:other; delta_calls=49990; delta_bytes=9590560 now it reports this: $ socat - /tmp/sock1 <<< "show profiling memory" \|grep nil 50000 11 9600000 9656\| (nil) [other] unknown(192) [delta=9590344] [pool=connection] 50000 11 9600000 9656\| (nil) DSO:other; delta_calls=49989; delta_bytes=9590344	2024-11-21 19:58:06 +01:00
Willy Tarreau	859341c1ec	MINOR: activity/memprofile: offer a function to unregister stale info There's actually a problem with memprofiles: the pool pointer is stored in ->info but some pools are replaced during startup, such as the trash pool, leaving a dangling pointer there. Let's complete the API with a new function memprof_remove_stale_info() that will remove all stale references to this info pointer. It's also present when USE_MEMORY_PROFILING is not set so as to ease the job on callers.	2024-11-21 19:58:06 +01:00
Willy Tarreau	c42a2b8c94	BUG/MINOR: activity/memprofile: reinitialize the free calls on DSO summary In commit 401fb0e87a ("MINOR: activity/memprofile: show per-DSO stats") we added a summary per DSO. However the free calls/tot were not initialized when creating a new entry because initially they were applied to any entry, but since we don't update free calls for non-free capable callers, we still need to reinitialize these entries when reassigning one. Because of this bug, a "show profiling memory" output can randomly show highly negative values on the DSO lines if it turns out that the DSO entry was created on an alloc instead of a realloc/free. Since the commit above was backported to 2.9, this one must go there as well.	2024-11-21 19:58:05 +01:00
Willy Tarreau	401fb0e87a	MINOR: activity/memprofile: show per-DSO stats On systems where many libs are loaded, it's hard to track suspected leaks. Having a per-DSO summary makes it more convenient. That's what we're doing here by summarizing all calls per DSO before showing the total.	2024-10-24 10:49:21 +02:00
Willy Tarreau	5091f90479	MINOR: activity/memprofile: always return "other" bin on NULL return address It was found in a large "show profiling memory" output that a few entries have a NULL return address, which causes confusion because this address will be reused by the next new allocation caller, possibly resulting in inconsistencies such as "free() ... pool=trash" which makes no sense. The cause is in fact that the first caller had an entry->info pointing to the trash pool from a p_alloc/p_free with a NULL return address, and the second had a different type and reused that entry. Let's make sure undecodable stacks causing an apparent NULL return address all lead to the "other" bin. While this is not exactly a bug, it would make sense to backport it to the recent branches where the feature is used (probably at least as far as 2.8).	2024-10-15 08:12:34 +02:00
Valentine Krasnobaeva	d5e43caaf5	BUG/MINOR: activity: fix Delta_calls and Delta_bytes count Thanks to commit 5714aff4a6bf "DEBUG: pool: store the memprof bin on alloc() and update it on free()", the number of memory allocations and memory "frees" is now shown on the same line, corresponding to the caller name. This is very convenient for debugging memory leaks (haproxy should run with the -dMcaller option). The implicit drawback of this solution is that we count the same free_calls and free_tot (bytes) values twice in cli_io_handler_show_profiling() when calculating tot_free_calls and tot_free_bytes, by also adding them to these totals for the p_alloc, malloc and calloc allocator types. See the __pool_free() implementation and the commit message for 5714aff4a6bf for details on why this happens. This double addition of free counters skews 'Delta_calls' and 'Delta_bytes', and we sometimes even noticed that they showed negative values. The same problem affected the calculation of the average allocated buffer size for lines where we simultaneously show the number of allocated and freed bytes.	2024-05-28 19:25:08 +02:00
Christopher Faulet	94b8ed446f	MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands The main CLI I/O handler is responsible for interrupting the processing on shutdown/abort. It is not the responsibility of the I/O handlers of CLI commands to take care of it.	2024-03-28 17:28:20 +01:00
Willy Tarreau	a63e016d27	MINOR: activity: report profiling duration and age in "show profiling" Seeing counters in "show profiling" is not always very helpful without an indication of how long the analysis lasted nor if it's still active or not. Let's add a pair of start/stop timers for tasks and memory so that we can now indicate how long the measurements lasted and when they ended (or 0 if still running). Note that for tasks profiling set to "auto", the measurement is considered enabled since it can automatically switch on and off on a per-thread basis.	2023-11-14 11:46:37 +01:00
Willy Tarreau	00de9e0804	MINOR: checks: maintain counters of active checks per thread Let's keep two check counters per thread: - one for "active" checks, i.e. checks that are no more sleeping and are assigned to the thread. These include sleeping and running checks ; - one for "running" checks, i.e. those which are currently executing on the thread. By doing so, we'll be able to spread the health checks load a bit better and refrain from sending too many at once per thread. The counters are atomic since a migration increments the target thread's active counter. These numbers are reported in "show activity", which allows to check per thread and globally how many checks are currently pending and running on the system. Ideally, we should only consider checks in the process of establishing a connection since that's really the expensive part (particularly with OpenSSL 3.0). But the inner layers are really not suitable to doing this. However knowing the number of active checks is already a good enough hint.	2023-09-01 08:26:06 +02:00
Willy Tarreau	3b7942a1c9	MINOR: check/activity: collect some per-thread check activity stats We now count the number of times a check was started on each thread and the number of times a check was adopted. This helps understand better what is observed regarding checks.	2023-09-01 08:26:06 +02:00
Willy Tarreau	338431ecb6	MINOR: activity: report the current run queue size While troubleshooting the causes of load spikes, it appeared that the length of individual run queues was missing, let's add it to "show activity".	2023-09-01 08:26:06 +02:00
Willy Tarreau	8b3e39e37b	MINOR: activity: allow "show activity" to restart in the middle of a line 16kB buffers are not enough to dump 4096 threads with up to 10 bytes value on each line. By storing the column number in the applet's context, we can now restart from the last attempted column. This requires to dump all values as they are produced, but it doesn't cost that much: a 4096-thread output from a fresh process produces 300kB of output in ~8ms, or ~400us per call (19*16kB), most of which are spent in vfprintf(). Given that we don't print more than needed, it doesn't really change anything. The main caveat is that when interrupted on such large lines, there's a great possibility that the total or average on the first column doesn't match anymore the sum or average of all dumped values. In order to avoid this whenever possible (typically less than ~1500 threads), we first try to dump entire lines and only proceed one column at a time when we have to retry a failed dump. This is already the same for other stats that are dumped in an interruptible way anyway and there's little that can be done about it at this point (and not much immediately perceived benefit in doing this with extreme accuracy for >1500 threads).	2023-05-03 17:26:11 +02:00
Willy Tarreau	6ed0b9885d	MINOR: activity: allow "show activity" to restart dumping on any line When using many threads, it's difficult to see the end of "show activity" due to the numerous columns which fill the buffer. For example a dump of a 256-thread, freshly booted process yields around 15kB. Here by arranging the dump in a loop around a switch/case block where each case checks the code line number against the current dump position, we have a restartable counter for free with a granularity of the line of code, without having to maintain a matching between states and specific lines. It just requires to reset the trash buffer for each line and to try to dump it after each line. Now dumping 256 threads after a few seconds of traffic happily emits 20kB.	2023-05-03 17:24:54 +02:00
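The "loop around a switch/case keyed on the code line number" trick described above can be illustrated with a small standalone sketch. It is not the actual "show activity" handler: `struct dump_ctx`, `dump_report` and the three sample values are invented for the example, and a plain character buffer stands in for the trash buffer.

```c
#include <stdio.h>
#include <string.h>

struct dump_ctx {
	int line; /* source line of the next case to run; 0 on first call */
};

/* Appends report lines to <out> (a NUL-terminated buffer of <size> bytes).
 * Returns 1 once everything was dumped, or 0 if the next line does not
 * fit; calling again with the same <ctx> then resumes at that line.
 */
int dump_report(struct dump_ctx *ctx, char *out, size_t size)
{
	char tmp[64];

	while (1) {
		tmp[0] = 0; /* reset the scratch area before each line */
		switch (ctx->line) {
		case __LINE__: snprintf(tmp, sizeof(tmp), "threads: %d\n", 4); break;
		case __LINE__: snprintf(tmp, sizeof(tmp), "loops: %d\n", 1000); break;
		case __LINE__: snprintf(tmp, sizeof(tmp), "wakeups: %d\n", 250); break;
		case __LINE__: return 1; /* end marker: stay here on later calls */
		default: break; /* not a dump line: emit nothing, just advance */
		}
		if (strlen(out) + strlen(tmp) + 1 > size)
			return 0; /* no room: ctx->line is kept so this line is retried */
		strcat(out, tmp);
		ctx->line++;
	}
}
```

Each `case __LINE__:` sits on its own source line, so the labels are distinct and consecutive without maintaining a separate state enum. The caller empties the buffer between calls, much like resetting the trash buffer for each line.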
Willy Tarreau	8ee0d11cb8	MINOR: activity: iterate over all fields in a main loop for dumping Now each line of "show activity" will iterate over n+2 fields, one for the line header, one for the total, and one per thread. This will soon allow us to save the current state in a restartable way.	2023-05-03 17:24:54 +02:00
Willy Tarreau	a465b21516	MINOR: activity: show the line header inside the SHOW_VAL macro Doing so will allow us to drop the extra chunk_appendf() dedicated to the line header and simplify iteration over restartable columns.	2023-05-03 17:24:54 +02:00
Willy Tarreau	5ddf9bea09	MINOR: activity: use a single macro to iterate over all fields Instead of having SHOW_AVG() and SHOW_TOT(), let's just have SHOW_VAL() which iterates over all values.	2023-05-03 17:24:54 +02:00
Willy Tarreau	c05d30e9d8	MINOR: clock: replace the timeval start_time with start_time_ns Now that "now" is no more a timeval, there's no point keeping a copy of it as a timeval, let's also switch start_time to nanoseconds, it simplifies operations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	69530f59ae	MEDIUM: clock: replace timeval "now" with integer "now_ns" This puts an end to the occasional confusion between the "now" date that is internal, monotonic and not synchronized with the system's date, and "date" which is the system's date and not necessarily monotonic. Variable "now" was removed and replaced with a 64-bit integer "now_ns" which is a counter of nanoseconds. It wraps every 585 years, so if all goes well (i.e. if humanity does not need haproxy anymore in 500 years), it will just never wrap. This implies that now_ns is never null and that the zero value can reliably be used as "not set yet" for a timestamp if needed. This will also simplify date checks where it becomes possible again to do "date1<date2". All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns". Due to the intricacies between now, global_now and now_offset, all 3 had to be turned to nanoseconds at once. It's not a problem since all of them were solely used in 3 functions in clock.c, but they make the patch look bigger than it really is. The clock_update_local_date() and clock_update_global_date() functions are now much simpler as there's no need anymore to perform conversions nor to round the timeval up or down. The wrapping continues to happen by presetting the internal offset in the short future so that the 32-bit now_ms continues to wrap 20 seconds after boot. The start_time used to calculate uptime can still be turned to nanoseconds now. One open question concerns global_now_ms which is used only for the freq counters. It's unclear whether there's more value in using two variables that need to be synchronized sequentially like today or to just use global_now_ns divided by 1 million. Both approaches will work equally well on modern systems, the difference might come from smaller ones. Better not change anything for now.
One benefit of the new approach is that we now have an internal date with a resolution of the nanosecond and the precision of the microsecond, which can be useful to extend some measurements given that timestamps also have this resolution.	2023-04-28 16:08:08 +02:00
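The "wraps every 585 years" figure can be checked with a few lines of arithmetic. The sketch below assumes an unsigned 64-bit nanosecond counter and 365.25-day years; the function names are made up for the example. The second helper shows why a 32-bit millisecond counter such as now_ms is expected to wrap in normal operation.

```c
#include <stdint.h>

/* Years before an unsigned 64-bit nanosecond counter wraps
 * (about 584.5 with 365.25-day years).
 */
double ns_counter_wrap_years(void)
{
	const double ns_per_year = 1e9 * 86400.0 * 365.25;

	return (double)UINT64_MAX / ns_per_year;
}

/* Days before an unsigned 32-bit millisecond counter wraps
 * (a little under 50 days).
 */
double ms_counter_wrap_days(void)
{
	const double ms_per_day = 1000.0 * 86400.0;

	return (double)UINT32_MAX / ms_per_day;
}
```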
Willy Tarreau	b68d308aec	MINOR: activity: use nanoseconds, not timeval to compute uptime Now that we have the required functions, let's get rid of the timeval in intermediary calculations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	82bde18aa4	BUG/MINOR: activity: show wall-clock date, not internal date in show activity Another case where "now" was used instead of "date" for a publicly visible date that was already incorrect and became worse after commit 28360dc ("MEDIUM: clock: force internal time to wrap early after boot"). No backport is needed.	2023-04-27 14:47:50 +02:00
Willy Tarreau	e6f5ab5afa	MINOR: listener: make accept_queue index atomic There has always been a race when checking the length of an accept queue to determine which one is more loaded that another, because the head and tail are read at two different moments. This is not required, we can merge them as two 16 bit numbers inside a single 32-bit index that is always accessed atomically. This way we read both values at once and always have a consistent measurement.	2023-04-21 17:41:26 +02:00
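Packing both positions into one atomically accessed word, as described above, might look like the sketch below. The layout (tail in the upper 16 bits, head in the lower 16), the ring size and all names are assumptions chosen for illustration; the commit message only states that two 16-bit numbers share a single 32-bit index.

```c
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_SIZE 1024U /* hypothetical ring size, must be a power of two */

struct ring_idx {
	_Atomic uint32_t idx; /* assumed layout: tail << 16 | head */
};

/* Builds the combined index from separate head and tail positions. */
static inline uint32_t pack_idx(uint16_t head, uint16_t tail)
{
	return ((uint32_t)tail << 16) | head;
}

/* A single atomic load returns a head/tail pair taken at the same
 * instant, so the computed length is always consistent.
 */
unsigned int ring_len(struct ring_idx *r)
{
	uint32_t v = atomic_load(&r->idx);
	uint16_t head = (uint16_t)(v & 0xffff);
	uint16_t tail = (uint16_t)(v >> 16);

	return (unsigned int)(tail - head) & (QUEUE_SIZE - 1);
}
```

Reading head and tail from two separate variables would allow one of them to move between the two reads, which is the race the message refers to.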
Christopher Faulet	208c712b40	MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE Here again, it is just a flag renaming. In SC flags, there is no longer shutdown for writes but shutdowns.	2023-04-14 15:01:21 +02:00
Willy Tarreau	28f2a590f6	MINOR: activity: add a line reporting the average CPU usage to "show activity" It was missing from the output but is sometimes convenient to observe and understand how incoming connections are distributed. The CPU usage is reported as the instant measurement of 100-idle_pct for each thread, and the average value is shown for the aggregated value. This could be backported as it's helpful in certain troubleshooting sessions.	2023-04-12 08:42:52 +02:00
Christopher Faulet	7faac7cf34	MINOR: tree-wide: Simplifiy some tests on SHUT flags by accessing SCs directly At many places, we simplify the tests on SHUT flags to remove calls to chn_prod() or chn_cons() function because the corresponding SC is available.	2023-04-05 08:57:06 +02:00
Christopher Faulet	87633c3a11	MEDIUM: tree-wide: Move flags about shut from the channel to the SC The purpose of this patch is only a one-to-one replacement, as far as possible. CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the stream connector. The CF_ prefix is replaced by the SC_FL_ one. Of course, it is not so simple because in many places, we were testing if a channel was shut for reads and writes at the same time. To do the same, shut for reads must be tested on one side on the SC and shut for writes on the other side on the opposite SC. Special care was taken with process_stream(): the flags of the SCs must be saved to be able to detect changes, just like for the channels.	2023-04-05 08:57:06 +02:00
Willy Tarreau	6093ba47c0	BUG/MINOR: clock: do not mix wall-clock and monotonic time in uptime calculation We've had a start date even before the internal monotonic clock existed, but once the monotonic clock was added, the start date was not updated to distinguish the wall clock time units and the internal monotonic time units. The distinction is important because both clocks do not necessarily progress at the same speed. The very rare occurrences of the wall-clock date are essentially for human consumption and communication with third parties (e.g. report the start date in "show info" for monitoring purposes). However currently this one is also used to measure the distance to "now" as being the process' uptime. This is actually not correct. It only works because for now the two dates are initialized at the exact same instant at boot but could still be wrong if the system's date shows a big jump backwards during startup for example. In addition the current situation prevents us from enforcing an arbitrary offset at boot to reveal some heisenbugs. This patch adds a new "start_time" at boot that is set from "now" and is used in uptime calculations. "start_date" instead is now set from "date" and will always reflect the system date for human consumption (e.g. in "show info"). This way we're now sure that any drift of the internal clock relative to the system date will not impact the reported uptime. This could possibly be backported though it's unlikely that anyone has ever noticed the problem.	2023-02-08 11:06:55 +01:00
Christopher Faulet	da89e9b95b	MINOR: channel/applets: Stop to test CF_WRITE_ERROR flag if CF_SHUTW is enough In applets, we stop processing when a write error (CF_WRITE_ERROR) or a shutdown for writes (CF_SHUTW) is detected. However, any write error leads to an immediate shutdown for writes. Thus, it is enough to only test if CF_SHUTW is set.	2023-01-09 18:41:08 +01:00
Willy Tarreau	f9607f8b1f	REORG: activity/cli: move the "show activity" handler to activity.c Initially the code was placed into cli.c to keep activity.c small and independent of the cli stuff, but that's no longer the case anyway and keeping that code over there makes it harder to find. Let's move it to its more natural place now.	2022-11-25 15:41:47 +01:00
Willy Tarreau	e86bc35672	MINOR: activity/cli: support sorting task profiling by total CPU time The new "bytime" sorting criterion uses the reported CPU time instead of the usage. This is convenient to spot tasks that are mostly reponsible for the CPU usage in a running process. It supports both the detailed and the aggregated format. The output looks like this: > show profiling tasks bytime Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg qc_io_cb 117739 1.961m 999.1us 37.45s 318.1us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup process_stream 7376273 1.384m 11.26us 1.013h 494.2us <- stream_new@src/stream.c:563 task_wakeup process_stream 8104400 1.133m 8.389us 1.130h 502.0us <- sc_notify@src/stconn.c:1209 task_wakeup qc_io_cb 43280 45.76s 1.057ms 13.95s 322.3us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup h1_io_cb 11025715 24.82s 2.251us 5.406m 29.42us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup quic_conn_app_io_cb 312861 23.86s 76.27us 2.373s 7.584us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after qc_io_cb 37063 12.65s 341.4us 6.409s 172.9us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup h1_io_cb 4783520 11.79s 2.463us 1.419h 1.068ms <- conn_subscribe@src/connection.c:732 tasklet_wakeup sc_conn_io_cb 12269693 11.51s 938.0ns 2.117h 621.2us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup sc_conn_io_cb 6479006 10.94s 1.689us 7.984m 73.93us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup qc_io_cb 12011 10.72s 892.5us 2.120s 176.5us <- qcc_release_remote_stream@src/mux_quic.c:1200 tasklet_wakeup h2_io_cb 246423 6.225s 25.26us 56.52s 229.4us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup h2_io_cb 137744 6.076s 44.11us 16.59s 120.4us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup quic_lstnr_dghdlr 323575 3.062s 9.462us 3.424m 634.9us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup sc_conn_io_cb 1206939 1.616s 1.338us 27.62m 1.373ms <- qcs_notify_send@src/mux_quic.c:529 
tasklet_wakeup h2_io_cb 212370 251.2ms 1.182us 6.476s 30.49us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup h1_io_cb 44109 197.0ms 4.466us 31.89s 723.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup quic_conn_app_io_cb 3029 87.59ms 28.92us 999.0ms 329.8us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup task_run_applet 40 35.77ms 894.3us 4.407ms 110.2us <- sc_applet_create@src/stconn.c:489 appctx_wakeup task_run_applet 18 27.36ms 1.520ms 19.56us 1.086us <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup sc_conn_io_cb 2186 11.76ms 5.377us 963.0ms 440.5us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup qc_io_cb 8 9.880ms 1.235ms 5.871ms 733.9us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup quic_conn_io_cb 4 5.951ms 1.488ms 38.85us 9.713us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after qc_io_cb 101 4.975ms 49.26us 13.91ms 137.8us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup h1_io_cb 2186 1.809ms 827.0ns 720.2ms 329.5us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup qc_process_timer 3031 1.735ms 572.0ns 1.153s 380.3us <- wake_expired_tasks@src/task.c:344 task_wakeup accept_queue_process 359 1.362ms 3.793us 80.32ms 223.7us <- listener_accept@src/listener.c:1099 tasklet_wakeup quic_conn_app_io_cb 2 921.1us 460.6us 203.1us 101.5us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup h1_timeout_task 2618 526.8us 201.0ns 1.121s 428.4us <- h1_release@src/mux_h1.c:1087 task_wakeup process_resolvers 316 283.3us 896.0ns 14.96ms 47.33us <- wake_expired_tasks@src/task.c:429 task_drop_running sc_conn_io_cb 420 235.6us 560.0ns 116.7ms 277.8us <- h2s_notify_recv@src/mux_h2.c:1298 tasklet_wakeup qc_idle_timer_task 1 225.5us 225.5us 506.0ns 506.0ns <- wake_expired_tasks@src/task.c:344 task_wakeup accept_queue_process 36 153.0us 4.250us 5.834ms 162.1us <- accept_queue_process@src/listener.c:165 tasklet_wakeup sc_conn_io_cb 18 54.05us 3.003us 11.50us 638.0ns <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup h2_io_cb 6 
38.88us 6.480us 2.089ms 348.2us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup srv_cleanup_idle_conns 54 37.72us 698.0ns 14.21ms 263.1us <- wake_expired_tasks@src/task.c:429 task_drop_running sc_conn_io_cb 50 32.86us 657.0ns 28.83ms 576.5us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup qc_io_cb 2 30.25us 15.12us 6.093us 3.046us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup srv_cleanup_toremove_conns 1 27.16us 27.16us 905.6us 905.6us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup task_run_applet 39 19.61us 502.0ns 818.7us 20.99us <- run_tasks_from_lists@src/task.c:652 task_drop_running quic_accept_run 2 15.46us 7.727us 305.5us 152.8us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup h2_timeout_task 32 12.91us 403.0ns 4.207ms 131.5us <- h2_release@src/mux_h2.c:1191 task_wakeup quic_conn_app_io_cb 1 9.645us 9.645us 1.445us 1.445us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup > show profiling tasks bytime aggr Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg qc_io_cb 212301 3.147m 889.5us 1.009m 285.2us process_stream 15503573 2.519m 9.747us 2.148h 498.7us h1_io_cb 15916733 36.95s 2.321us 1.535h 347.1us quic_conn_app_io_cb 318845 24.21s 75.92us 3.410s 10.70us sc_conn_io_cb 20037058 24.19s 1.207us 2.737h 491.8us h2_io_cb 596543 12.55s 21.04us 1.326m 133.4us quic_lstnr_dghdlr 326624 3.094s 9.473us 3.462m 635.9us task_run_applet 100 64.43ms 644.3us 5.285ms 52.85us quic_conn_io_cb 4 5.951ms 1.488ms 38.85us 9.713us qc_process_timer 3061 1.750ms 571.0ns 1.162s 379.5us accept_queue_process 396 1.521ms 3.840us 86.16ms 217.6us h1_timeout_task 2618 526.8us 201.0ns 1.121s 428.4us process_resolvers 319 286.0us 896.0ns 16.82ms 52.73us qc_idle_timer_task 1 225.5us 225.5us 506.0ns 506.0ns srv_cleanup_idle_conns 54 37.72us 698.0ns 14.21ms 263.1us srv_cleanup_toremove_conns 1 27.16us 27.16us 905.6us 905.6us quic_accept_run 2 15.46us 7.727us 305.5us 152.8us h2_timeout_task 32 12.91us 403.0ns 4.207ms 131.5us	2022-09-08 16:38:10 +02:00
Willy Tarreau	dc89b1806c	MINOR: activity/cli: support aggregating task profiling outputs By default we now dump stats between caller and callee, but by specifying "aggr" on the command line, stats get aggregated by callee again as it used to be before the feature was available. It may sometimes be helpful when comparing total call counts, though that's about all.	2022-09-08 16:32:17 +02:00
Willy Tarreau	64435aaa85	MINOR: tasks/activity: improve the caller-callee activity hash The previous dump already showed that the "other" category was getting a few entries. Let's proceed like for the memory profiling, by scanning a limited range of adjacent slots to find a spare one (16 max). That's pretty fast since close and likely prefetched and the comparison is cheap. The new dump now shows up to 45 entries below without "other": Now: Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg task_run_applet 22 34.56ms 1.571ms 1.145ms 52.04us <- sc_applet_create@src/stconn.c:489 appctx_wakeup task_run_applet 21 11.11us 529.0ns 2.590ms 123.3us <- run_tasks_from_lists@src/task.c:652 task_drop_running task_run_applet 5 7.715ms 1.543ms 2.186us 437.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup accept_queue_process 345 3.129ms 9.068us 72.84ms 211.1us <- listener_accept@src/listener.c:1099 tasklet_wakeup accept_queue_process 32 113.0us 3.529us 3.070ms 95.94us <- accept_queue_process@src/listener.c:165 tasklet_wakeup sc_conn_io_cb 5026032 3.037s 604.0ns 17.47m 208.5us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup sc_conn_io_cb 4361192 7.626s 1.748us 3.179m 43.74us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup sc_conn_io_cb 178293 275.4ms 1.544us 2.740m 922.0us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup sc_conn_io_cb 2561 15.84ms 6.185us 1.036s 404.4us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup sc_conn_io_cb 453 261.4us 577.0ns 86.79ms 191.6us <- h2s_notify_recv@src/mux_h2.c:1298 tasklet_wakeup sc_conn_io_cb 89 50.05us 562.0ns 100.7ms 1.131ms <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup sc_conn_io_cb 8 19.04us 2.379us 472.5us 59.06us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup process_resolvers 50 57.50us 1.149us 1.116ms 22.32us <- wake_expired_tasks@src/task.c:429 task_drop_running srv_cleanup_idle_conns 8 5.669us 708.0ns 216.6us 27.08us <- wake_expired_tasks@src/task.c:429 
task_drop_running process_stream 4599847 48.79s 10.61us 16.92m 220.7us <- sc_notify@src/stconn.c:1209 task_wakeup process_stream 4530081 52.82s 11.66us 14.92m 197.6us <- stream_new@src/stream.c:563 task_wakeup process_stream 15 201.7us 13.45us 31.58ms 2.105ms <- sc_app_chk_snd_conn@src/stconn.c:857 task_wakeup h1_io_cb 7861205 18.22s 2.317us 2.408m 18.38us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup h1_io_cb 474763 1.379s 2.905us 6.578m 831.4us <- conn_subscribe@src/connection.c:732 tasklet_wakeup h1_io_cb 34830 38.64ms 1.109us 18.85s 541.2us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup h1_io_cb 2561 2.150ms 839.0ns 674.4ms 263.3us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup h1_timeout_task 2634 588.5us 223.0ns 890.5ms 338.1us <- h1_release@src/mux_h1.c:1087 task_wakeup h2_timeout_task 16 7.519us 469.0ns 1.146ms 71.63us <- h2_release@src/mux_h2.c:1191 task_wakeup h2_io_cb 99601 2.212s 22.21us 19.33s 194.1us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup h2_io_cb 79777 146.6ms 1.837us 3.529s 44.24us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup h2_io_cb 60698 2.259s 37.21us 4.704s 77.50us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup h2_io_cb 5 36.90us 7.380us 2.045ms 409.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup qc_io_cb 26595 8.007s 301.1us 4.261s 160.2us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup qc_io_cb 7921 5.284s 667.1us 2.171s 274.1us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup qc_io_cb 6229 5.851s 939.3us 1.856s 297.9us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup qc_io_cb 994 699.1ms 703.3us 174.9ms 176.0us <- qcc_release_remote_stream@src/mux_quic.c:1200 tasklet_wakeup qc_io_cb 65 9.883ms 152.0us 13.33ms 205.1us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup qc_io_cb 1 293.5us 293.5us 105.9us 105.9us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup qc_io_cb 1 10.87us 10.87us 3.307us 3.307us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup quic_conn_io_cb 2 2.531ms 1.265ms 2.839us 1.419us 
<- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after quic_conn_app_io_cb 61392 2.620s 42.67us 268.0ms 4.365us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after quic_conn_app_io_cb 408 10.56ms 25.88us 124.0ms 303.8us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup quic_conn_app_io_cb 2 15.61us 7.806us 103.2us 51.59us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup quic_conn_app_io_cb 1 410.6us 410.6us 11.52us 11.52us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup quic_lstnr_dghdlr 62716 409.2ms 6.523us 21.81s 347.8us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup qc_process_timer 410 245.4us 598.0ns 238.5ms 581.7us <- wake_expired_tasks@src/task.c:344 task_wakeup quic_accept_run 1 7.711us 7.711us 82.28us 82.28us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup	2022-09-08 16:25:36 +02:00
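The lookup strategy described in this message, scanning a limited range of adjacent slots (16 at most) before giving up and using the "other" entry, can be sketched as below. The table size, the hash function and every identifier are hypothetical, and the sketch is single-threaded, whereas a real multi-threaded implementation would need atomic slot claiming.

```c
#include <stddef.h>
#include <stdint.h>

#define BINS     1024U /* hypothetical table size, power of two */
#define MAX_SCAN 16U   /* number of adjacent slots examined */

struct bin {
	const void *func;   /* callee; NULL marks a free slot */
	const void *caller; /* call place */
	unsigned long calls;
};

static struct bin bins[BINS];
static struct bin other_bin; /* catch-all when no slot is available */

/* Hypothetical hash mixing both pointers into a slot index. */
static unsigned int hash_pair(const void *func, const void *caller)
{
	uintptr_t h = ((uintptr_t)func * 2654435761u) ^ ((uintptr_t)caller * 40503u);

	return (unsigned int)(h >> 4) & (BINS - 1);
}

/* Returns the bin for the (func, caller) pair, claiming the first free
 * slot within MAX_SCAN adjacent entries, or the "other" bin if all of
 * them belong to different pairs.
 */
struct bin *get_bin(const void *func, const void *caller)
{
	unsigned int start = hash_pair(func, caller);
	unsigned int i;

	for (i = 0; i < MAX_SCAN; i++) {
		struct bin *b = &bins[(start + i) & (BINS - 1)];

		if (b->func == func && b->caller == caller)
			return b; /* existing entry */
		if (!b->func) {
			b->func = func; /* spare slot: claim it */
			b->caller = caller;
			return b;
		}
	}
	return &other_bin;
}
```

Because the scanned slots are contiguous, they are likely to sit in the same or neighboring cache lines, which is the reason the message gives for the scan being fast.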
Willy Tarreau	3d4cdb198c	MEDIUM: tasks/activity: combine the called function with the caller Now instead of getting aggregate stats per called function, we have them per function AND per call place. The "byaddr" sort considers the function pointer first, then the call count, so that dominant callers of a given callee are instantly spotted. This allows to get sorted outputs like this: Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg h1_io_cb 17357952 40.91s 2.357us 4.849m 16.76us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup sc_conn_io_cb 10357182 6.297s 607.0ns 27.93m 161.8us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup process_stream 9891131 1.809m 10.97us 53.61m 325.2us <- sc_notify@src/stconn.c:1209 task_wakeup process_stream 9823934 1.887m 11.52us 48.31m 295.1us <- stream_new@src/stream.c:563 task_wakeup sc_conn_io_cb 9347863 16.59s 1.774us 6.143m 39.43us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup h1_io_cb 501344 1.848s 3.686us 6.544m 783.2us <- conn_subscribe@src/connection.c:732 tasklet_wakeup sc_conn_io_cb 239717 492.3ms 2.053us 3.213m 804.3us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup h2_io_cb 173019 4.204s 24.30us 40.95s 236.7us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup h2_io_cb 149487 424.3ms 2.838us 14.63s 97.87us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup other 101893 4.626s 45.40us 14.84s 145.7us quic_lstnr_dghdlr 94389 614.0ms 6.504us 30.54s 323.6us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup quic_conn_app_io_cb 92205 3.735s 40.51us 390.9ms 4.239us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after qc_io_cb 50355 19.01s 377.5us 10.65s 211.4us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup h1_io_cb 44427 155.0ms 3.489us 21.50s 484.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup qc_io_cb 9018 4.924s 546.0us 3.084s 342.0us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup h1_timeout_task 3236 1.172ms 362.0ns 1.119s 345.9us <- 
h1_release@src/mux_h1.c:1087 task_wakeup h1_io_cb 2804 7.974ms 2.843us 1.980s 706.0us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup sc_conn_io_cb 2804 33.44ms 11.92us 2.597s 926.2us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup qc_io_cb 2623 2.669s 1.017ms 1.347s 513.5us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup qc_process_timer 662 526.4us 795.0ns 1.081s 1.633ms <- wake_expired_tasks@src/task.c:344 task_wakeup quic_conn_app_io_cb 648 12.62ms 19.47us 225.7ms 348.2us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup accept_queue_process 286 1.571ms 5.494us 72.55ms 253.7us <- listener_accept@src/listener.c:1099 tasklet_wakeup process_resolvers 176 157.8us 896.0ns 7.835ms 44.52us <- wake_expired_tasks@src/task.c:429 task_drop_running qc_io_cb 167 10.71ms 64.12us 32.47ms 194.4us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup sc_conn_io_cb 123 80.05us 650.0ns 50.35ms 409.4us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup h2_timeout_task 32 30.69us 958.0ns 9.038ms 282.4us <- h2_release@src/mux_h2.c:1191 task_wakeup task_run_applet 24 33.79ms 1.408ms 5.838ms 243.3us <- sc_applet_create@src/stconn.c:489 appctx_wakeup accept_queue_process 17 56.34us 3.314us 7.505ms 441.5us <- accept_queue_process@src/listener.c:165 tasklet_wakeup srv_cleanup_toremove_conns 16 1.133ms 70.81us 5.685ms 355.3us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup srv_cleanup_idle_conns 16 74.57us 4.660us 2.797ms 174.8us <- wake_expired_tasks@src/task.c:429 task_drop_running quic_conn_app_io_cb 12 786.9us 65.58us 2.042ms 170.1us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup sc_conn_io_cb 9 20.55us 2.283us 2.475ms 275.0us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup h2_io_cb 8 34.12us 4.265us 1.784ms 223.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup task_run_applet 4 6.615ms 1.654ms 2.306us 576.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup quic_conn_io_cb 4 4.278ms 1.069ms 6.469us 1.617us <- 
qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after qc_io_cb 2 20.81us 10.40us 4.943us 2.471us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup quic_conn_app_io_cb 2 752.9us 376.4us 63.97us 31.99us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup quic_accept_run 2 13.84us 6.920us 172.8us 86.42us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup qc_idle_timer_task 2 295.0us 147.5us 8.761us 4.380us <- wake_expired_tasks@src/task.c:344 task_wakeup qc_io_cb 1 867.1us 867.1us 812.8us 812.8us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup ... and calls sorted by address like this: Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg task_run_applet 23 32.73ms 1.423ms 5.837ms 253.8us <- sc_applet_create@src/stconn.c:489 appctx_wakeup task_run_applet 4 6.615ms 1.654ms 2.306us 576.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup accept_queue_process 285 1.566ms 5.495us 72.49ms 254.3us <- listener_accept@src/listener.c:1099 tasklet_wakeup accept_queue_process 17 56.34us 3.314us 7.505ms 441.5us <- accept_queue_process@src/listener.c:165 tasklet_wakeup sc_conn_io_cb 10357182 6.297s 607.0ns 27.93m 161.8us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup sc_conn_io_cb 9347863 16.59s 1.774us 6.143m 39.43us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup sc_conn_io_cb 239717 492.3ms 2.053us 3.213m 804.3us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup sc_conn_io_cb 2804 33.44ms 11.92us 2.597s 926.2us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup sc_conn_io_cb 123 80.05us 650.0ns 50.35ms 409.4us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup sc_conn_io_cb 9 20.55us 2.283us 2.475ms 275.0us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup process_resolvers 159 145.9us 917.0ns 7.823ms 49.20us <- wake_expired_tasks@src/task.c:429 task_drop_running srv_cleanup_idle_conns 16 74.57us 4.660us 2.797ms 174.8us <- wake_expired_tasks@src/task.c:429 task_drop_running srv_cleanup_toremove_conns 16 1.133ms 70.81us 
5.685ms 355.3us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup process_stream 9891130 1.809m 10.97us 53.61m 325.2us <- sc_notify@src/stconn.c:1209 task_wakeup process_stream 9823933 1.887m 11.52us 48.31m 295.1us <- stream_new@src/stream.c:563 task_wakeup h1_io_cb 17357952 40.91s 2.357us 4.849m 16.76us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup h1_io_cb 501344 1.848s 3.686us 6.544m 783.2us <- conn_subscribe@src/connection.c:732 tasklet_wakeup h1_io_cb 44427 155.0ms 3.489us 21.50s 484.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup h1_io_cb 2804 7.974ms 2.843us 1.980s 706.0us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup h1_timeout_task 3236 1.172ms 362.0ns 1.119s 345.9us <- h1_release@src/mux_h1.c:1087 task_wakeup h2_timeout_task 32 30.69us 958.0ns 9.038ms 282.4us <- h2_release@src/mux_h2.c:1191 task_wakeup h2_io_cb 173019 4.204s 24.30us 40.95s 236.7us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup h2_io_cb 149487 424.3ms 2.838us 14.63s 97.87us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup h2_io_cb 8 34.12us 4.265us 1.784ms 223.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup qc_io_cb 50355 19.01s 377.5us 10.65s 211.4us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup qc_io_cb 9018 4.924s 546.0us 3.084s 342.0us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup qc_io_cb 2623 2.669s 1.017ms 1.347s 513.5us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup qc_io_cb 167 10.71ms 64.12us 32.47ms 194.4us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup qc_io_cb 2 20.81us 10.40us 4.943us 2.471us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup qc_io_cb 1 867.1us 867.1us 812.8us 812.8us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup qc_idle_timer_task 2 295.0us 147.5us 8.761us 4.380us <- wake_expired_tasks@src/task.c:344 task_wakeup quic_conn_io_cb 4 4.278ms 1.069ms 6.469us 1.617us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after quic_conn_app_io_cb 92205 3.735s 40.51us 390.9ms 4.239us <- 
qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after quic_conn_app_io_cb 648 12.62ms 19.47us 225.7ms 348.2us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup quic_conn_app_io_cb 12 786.9us 65.58us 2.042ms 170.1us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup quic_conn_app_io_cb 2 752.9us 376.4us 63.97us 31.99us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup quic_lstnr_dghdlr 94389 614.0ms 6.504us 30.54s 323.6us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup qc_process_timer 662 526.4us 795.0ns 1.081s 1.633ms <- wake_expired_tasks@src/task.c:344 task_wakeup quic_accept_run 2 13.84us 6.920us 172.8us 86.42us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup other 101892 4.626s 45.40us 14.84s 145.7us It already becomes visible that some tasks have very different costs depending on where they're called (e.g. process_stream). The method used to wake them up is also shown. Applets are handled specially and shown as appctx_wakeup.	2022-09-08 16:21:22 +02:00
Willy Tarreau	a3423873fe	CLEANUP: activity: make the number of sched activity entries more configurable This removes all the hard-coded 8-bit and 256 entries to use a pair of macros instead so that we can more easily experiment with larger table sizes if needed.	2022-09-08 14:55:09 +02:00
Willy Tarreau	4c1bc01f31	CLEANUP: activity: make taskprof use ptr_hash() There's no more point using a different hash function here, xxh64 is of course better distributed but we really don't care so let's unify the code.	2022-09-08 14:19:15 +02:00
Willy Tarreau	245d32fe8f	CLEANUP: activity: make memprof use the generic ptr_hash() function There's no need to keep a local version of that function anymore.	2022-09-08 14:19:15 +02:00
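The cleanups above (a macro pair sizing the activity table, and one shared pointer hash for both profilers) can be pictured with a minimal sketch. Every name here (`DEMO_HASH_BITS`, `DEMO_HASH_BUCKETS`, `demo_ptr_hash()`, `demo_lookup()`) is hypothetical, and the hash is a generic multiplicative one chosen for illustration; it is not HAProxy's actual `ptr_hash()`.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical macro pair: changing DEMO_HASH_BITS resizes the table. */
#define DEMO_HASH_BITS    8
#define DEMO_HASH_BUCKETS (1u << DEMO_HASH_BITS)

/* Generic multiplicative hash of a pointer, keeping the top <bits> bits.
 * Illustration only, <bits> must be between 1 and 63.
 */
static inline unsigned int demo_ptr_hash(const void *p, int bits)
{
	uint64_t x = (uint64_t)(uintptr_t)p;

	x *= 0x9E3779B97F4A7C15ULL; /* odd 64-bit constant (golden ratio) */
	return (unsigned int)(x >> (64 - bits));
}

struct demo_entry {
	const void *key;   /* profiled pointer, NULL while the slot is free */
	uint64_t calls;
};

static struct demo_entry demo_table[DEMO_HASH_BUCKETS];

/* Finds or creates the entry for <key> using linear probing.
 * Returns NULL when the table is full.
 */
static struct demo_entry *demo_lookup(const void *key)
{
	unsigned int start = demo_ptr_hash(key, DEMO_HASH_BITS);
	unsigned int i;

	for (i = 0; i < DEMO_HASH_BUCKETS; i++) {
		struct demo_entry *e = &demo_table[(start + i) % DEMO_HASH_BUCKETS];

		if (e->key == key)
			return e;
		if (!e->key) {
			e->key = key;
			return e;
		}
	}
	return NULL;
}
```

With the table size derived from a single bit count, trying a larger table only means changing one macro, which is the kind of experiment the a3423873fe message mentions.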
Willy Tarreau	04e50b3d32	CLEANUP: task: rename ->call_date to ->wake_date This field is misnamed because its real and important content is the date the task was woken up, not the date it was called. It temporarily holds the call date during execution but this remains confusing. In fact before the latency measurements were possible it was indeed a call date. Thus it will now be called wake_date. This change is necessary because a subsequent fix will require the introduction of the real call date in the thread ctx.	2022-09-08 14:19:15 +02:00
Willy Tarreau	42b180dcdb	MINOR: pools/memprof: store and report the pool's name in each bin Storing the pointer to the pool along with the stats is quite useful as it allows to report the name. That's what we're doing here. We could store it in place of another field but that's not convenient as it would require to change all functions that manipulate counters. Thus here we store one extra field, as well as some padding because the struct turns 56 bytes long, thus better go to 64 directly. Example of output from "show profiling memory": 2 0 48 0| 0x4bfb2c ha_quic_set_encryption_secrets+0xcc/0xb5e p_alloc(24) [pool=quic_tls_iv] 0 55252 0 10608384| 0x4bed32 main+0x2beb2 free(-192) 15 0 2760 0| 0x4be855 main+0x2b9d5 p_alloc(184) [pool=quic_frame] 1 0 1048 0| 0x4be266 ha_quic_add_handshake_data+0x2b6/0x66d p_alloc(1048) [pool=quic_crypto] 3 0 552 0| 0x4be142 ha_quic_add_handshake_data+0x192/0x66d p_alloc(184) [pool=quic_frame] 31276 0 6755616 0| 0x4bb8f9 quic_sock_fd_iocb+0x689/0x69b p_alloc(216) [pool=quic_dgram] 0 31424 0 6787584| 0x4bb7f3 quic_sock_fd_iocb+0x583/0x69b p_free(-216) [pool=quic_dgram] 152 0 32832 0| 0x4bb4d9 quic_sock_fd_iocb+0x269/0x69b p_alloc(216) [pool=quic_dgram]	2022-08-17 10:34:00 +02:00
Willy Tarreau	facfad2b64	MINOR: pool/memprof: report pool alloc/free in memory profiling Pools are being used so well that it becomes difficult to profile their usage via the regular memory profiling. Let's add new entries for pools there, named "p_alloc" and "p_free" that correspond to pool_alloc() and pool_free(). Ideally it would be nice to only report those that fail cache lookups but that's complicated, particularly on the free() path since free lists are released in clusters to the shared pools. It's worth noting that the alloc_tot/free_tot fields can easily be determined by multiplying alloc_calls/free_calls by the pool's size, and could be better used to store a pointer to the pool itself. However it would require significant changes down the code that sorts output. If this were to cause a measurable slowdown, an alternate approach could consist in using a different value of USE_MEMORY_PROFILING to enable pools profiling. Also, this profiler doesn't depend on intercepting regular malloc functions, so we could also imagine enabling it alone or the other one alone or both. Tests show that the CPU overhead on QUIC (which is already an extremely intensive user of pools) jumps from ~7% to ~10%. This is quite acceptable in most deployments.	2022-08-17 09:38:05 +02:00
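The remark that alloc_tot/free_tot follow from the call counts can be checked against the "show profiling memory" sample in the 42b180dcdb entry: 15 calls to p_alloc(184) account for 2760 bytes, and 31276 calls to p_alloc(216) account for 6755616 bytes. A minimal sketch of that arithmetic, where `demo_pool_bytes()` is a hypothetical helper and not a HAProxy function:

```c
#include <stdint.h>

/* Hypothetical helper: total bytes attributed to a pool call site,
 * assuming every call moves exactly one object of <pool_size> bytes.
 */
static inline uint64_t demo_pool_bytes(uint64_t calls, uint64_t pool_size)
{
	return calls * pool_size;
}
```

For example, `demo_pool_bytes(15, 184)` yields 2760, the alloc_tot value reported for the quic_frame pool in that sample.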
Willy Tarreau	219afa2ca8	MINOR: memprof: export the minimum definitions for memory profiling Right now it's not possible to feed memory profiling info from outside activity.c, so let's export the function and move the enum and struct to the include file.	2022-08-17 09:03:57 +02:00
Willy Tarreau	bdcd32598f	MINOR: thread: only use atomic ops to touch the flags The thread flags are touched a little bit by other threads, e.g. the STUCK flag may be set by other ones, and they're watched a little bit. As such we need to use atomic ops only to manipulate them. Most places were already using them, but here we generalize the practice. Only ha_thread_dump() does not change because it's run under isolation.	2022-07-01 19:15:14 +02:00
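As an illustration of why flags that other threads may set need atomic read-modify-write operations, here is a minimal sketch using standard C11 `<stdatomic.h>` in place of HAProxy's own atomic macros. The structure, the helper names and the bit value given to the STUCK flag are all hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical bit value for the STUCK flag mentioned in the commit. */
#define DEMO_TH_FL_STUCK 0x00000001u

/* Hypothetical per-thread context holding only the flags word. */
struct demo_thread_ctx {
	atomic_uint flags;
};

/* Sets <f> without losing bits concurrently changed by other threads. */
static inline void demo_flag_set(struct demo_thread_ctx *th, unsigned int f)
{
	atomic_fetch_or(&th->flags, f);
}

/* Clears <f> without losing bits concurrently changed by other threads. */
static inline void demo_flag_clear(struct demo_thread_ctx *th, unsigned int f)
{
	atomic_fetch_and(&th->flags, ~f);
}

/* Returns non-zero if any bit of <f> is currently set. */
static inline int demo_flag_test(struct demo_thread_ctx *th, unsigned int f)
{
	return (atomic_load(&th->flags) & f) != 0;
}
```

A plain `flags |= f` is a separate load and store, so a bit set by another thread between the two could be overwritten; the atomic variants avoid that lost update.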
Willy Tarreau	319d136ff9	MEDIUM: task: use regular eb32 trees for the run queues Since we don't mix tasks from different threads in the run queues anymore, we don't need to use the eb32sc_ trees and we can switch to the regular eb32 ones. This uses cheaper lookup and insert code, and a 16-thread test on the queues shows a performance increase from 570k RPS to 585k RPS.	2022-07-01 19:15:14 +02:00
Willy Tarreau	6f78038d72	MEDIUM: task: move the shared runqueue to one per thread Since we only use the shared runqueue to put tasks only assigned to known threads, let's move that runqueue to each of these threads. The goal will be to arrange an N*(N-1) mesh instead of a central contention point. The global_rqueue_ticks had to be dropped (for good) since we'll now use the per-thread rqueue_ticks counter for both trees. A few points to note: - the rq_lock still remains the global one for now so there should not be any gain in doing this, but should this trigger any regression, it is important to detect whether it's related to the lock or to the tree. - there's no more reason for using the scope-based version of the ebtree now, we could switch back to the regular eb32_tree. - it's worth checking if we still need TASK_GLOBAL (probably only to delete a task in one's own shared queue maybe).	2022-07-01 19:15:14 +02:00
Willy Tarreau	680ed5f28b	MINOR: task: move profiling bit to per-thread Instead of having a global mask of all the profiled threads, let's have one flag per thread in each thread's flags. They are never accessed more than one at a time and are better located inside the threads' contexts for both performance and scalability.	2022-06-14 10:38:03 +02:00


115 Commits