haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-20 22:21:24 +02:00

Author	SHA1	Message	Date
Willy Tarreau	3d4cdb198c	MEDIUM: tasks/activity: combine the called function with the caller Now instead of getting aggregate stats per called function, we have them per function AND per call place. The "byaddr" sort considers the function pointer first, then the call count, so that dominant callers of a given callee are instantly spotted. This allows to get sorted outputs like this: Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg h1_io_cb 17357952 40.91s 2.357us 4.849m 16.76us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup sc_conn_io_cb 10357182 6.297s 607.0ns 27.93m 161.8us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup process_stream 9891131 1.809m 10.97us 53.61m 325.2us <- sc_notify@src/stconn.c:1209 task_wakeup process_stream 9823934 1.887m 11.52us 48.31m 295.1us <- stream_new@src/stream.c:563 task_wakeup sc_conn_io_cb 9347863 16.59s 1.774us 6.143m 39.43us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup h1_io_cb 501344 1.848s 3.686us 6.544m 783.2us <- conn_subscribe@src/connection.c:732 tasklet_wakeup sc_conn_io_cb 239717 492.3ms 2.053us 3.213m 804.3us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup h2_io_cb 173019 4.204s 24.30us 40.95s 236.7us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup h2_io_cb 149487 424.3ms 2.838us 14.63s 97.87us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup other 101893 4.626s 45.40us 14.84s 145.7us quic_lstnr_dghdlr 94389 614.0ms 6.504us 30.54s 323.6us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup quic_conn_app_io_cb 92205 3.735s 40.51us 390.9ms 4.239us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after qc_io_cb 50355 19.01s 377.5us 10.65s 211.4us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup h1_io_cb 44427 155.0ms 3.489us 21.50s 484.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup qc_io_cb 9018 4.924s 546.0us 3.084s 342.0us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup h1_timeout_task 3236 1.172ms 362.0ns 1.119s 345.9us <- 
h1_release@src/mux_h1.c:1087 task_wakeup h1_io_cb 2804 7.974ms 2.843us 1.980s 706.0us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup sc_conn_io_cb 2804 33.44ms 11.92us 2.597s 926.2us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup qc_io_cb 2623 2.669s 1.017ms 1.347s 513.5us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup qc_process_timer 662 526.4us 795.0ns 1.081s 1.633ms <- wake_expired_tasks@src/task.c:344 task_wakeup quic_conn_app_io_cb 648 12.62ms 19.47us 225.7ms 348.2us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup accept_queue_process 286 1.571ms 5.494us 72.55ms 253.7us <- listener_accept@src/listener.c:1099 tasklet_wakeup process_resolvers 176 157.8us 896.0ns 7.835ms 44.52us <- wake_expired_tasks@src/task.c:429 task_drop_running qc_io_cb 167 10.71ms 64.12us 32.47ms 194.4us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup sc_conn_io_cb 123 80.05us 650.0ns 50.35ms 409.4us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup h2_timeout_task 32 30.69us 958.0ns 9.038ms 282.4us <- h2_release@src/mux_h2.c:1191 task_wakeup task_run_applet 24 33.79ms 1.408ms 5.838ms 243.3us <- sc_applet_create@src/stconn.c:489 appctx_wakeup accept_queue_process 17 56.34us 3.314us 7.505ms 441.5us <- accept_queue_process@src/listener.c:165 tasklet_wakeup srv_cleanup_toremove_conns 16 1.133ms 70.81us 5.685ms 355.3us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup srv_cleanup_idle_conns 16 74.57us 4.660us 2.797ms 174.8us <- wake_expired_tasks@src/task.c:429 task_drop_running quic_conn_app_io_cb 12 786.9us 65.58us 2.042ms 170.1us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup sc_conn_io_cb 9 20.55us 2.283us 2.475ms 275.0us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup h2_io_cb 8 34.12us 4.265us 1.784ms 223.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup task_run_applet 4 6.615ms 1.654ms 2.306us 576.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup quic_conn_io_cb 4 4.278ms 1.069ms 6.469us 1.617us <- 
qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after qc_io_cb 2 20.81us 10.40us 4.943us 2.471us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup quic_conn_app_io_cb 2 752.9us 376.4us 63.97us 31.99us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup quic_accept_run 2 13.84us 6.920us 172.8us 86.42us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup qc_idle_timer_task 2 295.0us 147.5us 8.761us 4.380us <- wake_expired_tasks@src/task.c:344 task_wakeup qc_io_cb 1 867.1us 867.1us 812.8us 812.8us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup ... and calls sorted by address like this: Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg task_run_applet 23 32.73ms 1.423ms 5.837ms 253.8us <- sc_applet_create@src/stconn.c:489 appctx_wakeup task_run_applet 4 6.615ms 1.654ms 2.306us 576.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup accept_queue_process 285 1.566ms 5.495us 72.49ms 254.3us <- listener_accept@src/listener.c:1099 tasklet_wakeup accept_queue_process 17 56.34us 3.314us 7.505ms 441.5us <- accept_queue_process@src/listener.c:165 tasklet_wakeup sc_conn_io_cb 10357182 6.297s 607.0ns 27.93m 161.8us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup sc_conn_io_cb 9347863 16.59s 1.774us 6.143m 39.43us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup sc_conn_io_cb 239717 492.3ms 2.053us 3.213m 804.3us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup sc_conn_io_cb 2804 33.44ms 11.92us 2.597s 926.2us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup sc_conn_io_cb 123 80.05us 650.0ns 50.35ms 409.4us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup sc_conn_io_cb 9 20.55us 2.283us 2.475ms 275.0us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup process_resolvers 159 145.9us 917.0ns 7.823ms 49.20us <- wake_expired_tasks@src/task.c:429 task_drop_running srv_cleanup_idle_conns 16 74.57us 4.660us 2.797ms 174.8us <- wake_expired_tasks@src/task.c:429 task_drop_running srv_cleanup_toremove_conns 16 1.133ms 70.81us 
5.685ms 355.3us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup process_stream 9891130 1.809m 10.97us 53.61m 325.2us <- sc_notify@src/stconn.c:1209 task_wakeup process_stream 9823933 1.887m 11.52us 48.31m 295.1us <- stream_new@src/stream.c:563 task_wakeup h1_io_cb 17357952 40.91s 2.357us 4.849m 16.76us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup h1_io_cb 501344 1.848s 3.686us 6.544m 783.2us <- conn_subscribe@src/connection.c:732 tasklet_wakeup h1_io_cb 44427 155.0ms 3.489us 21.50s 484.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup h1_io_cb 2804 7.974ms 2.843us 1.980s 706.0us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup h1_timeout_task 3236 1.172ms 362.0ns 1.119s 345.9us <- h1_release@src/mux_h1.c:1087 task_wakeup h2_timeout_task 32 30.69us 958.0ns 9.038ms 282.4us <- h2_release@src/mux_h2.c:1191 task_wakeup h2_io_cb 173019 4.204s 24.30us 40.95s 236.7us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup h2_io_cb 149487 424.3ms 2.838us 14.63s 97.87us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup h2_io_cb 8 34.12us 4.265us 1.784ms 223.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup qc_io_cb 50355 19.01s 377.5us 10.65s 211.4us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup qc_io_cb 9018 4.924s 546.0us 3.084s 342.0us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup qc_io_cb 2623 2.669s 1.017ms 1.347s 513.5us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup qc_io_cb 167 10.71ms 64.12us 32.47ms 194.4us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup qc_io_cb 2 20.81us 10.40us 4.943us 2.471us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup qc_io_cb 1 867.1us 867.1us 812.8us 812.8us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup qc_idle_timer_task 2 295.0us 147.5us 8.761us 4.380us <- wake_expired_tasks@src/task.c:344 task_wakeup quic_conn_io_cb 4 4.278ms 1.069ms 6.469us 1.617us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after quic_conn_app_io_cb 92205 3.735s 40.51us 390.9ms 4.239us <- 
qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after quic_conn_app_io_cb 648 12.62ms 19.47us 225.7ms 348.2us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup quic_conn_app_io_cb 12 786.9us 65.58us 2.042ms 170.1us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup quic_conn_app_io_cb 2 752.9us 376.4us 63.97us 31.99us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup quic_lstnr_dghdlr 94389 614.0ms 6.504us 30.54s 323.6us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup qc_process_timer 662 526.4us 795.0ns 1.081s 1.633ms <- wake_expired_tasks@src/task.c:344 task_wakeup quic_accept_run 2 13.84us 6.920us 172.8us 86.42us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup other 101892 4.626s 45.40us 14.84s 145.7us It is already visible that some tasks have very different costs depending on where they're called (e.g. process_stream). The method used to wake them up is also shown. Applets are handled specially and shown as appctx_wakeup.	2022-09-08 16:21:22 +02:00
Willy Tarreau	a3423873fe	CLEANUP: activity: make the number of sched activity entries more configurable This removes all the hard-coded 8-bit and 256 entries to use a pair of macros instead so that we can more easily experiment with larger table sizes if needed.	2022-09-08 14:55:09 +02:00
Willy Tarreau	e0e6d81460	CLEANUP: task: move tid and wake_date into the common part There used to be one tid for tasklets and a thread_mask for tasks. Since 2.7, both tasks and tasklets now use a tid (albeit with a very slight semantic difference for the negative value), so in order to limit code duplication and to ease debugging it makes sense to move tid into the common part. One limitation is that it will leave a hole in the structure, but we now have the wake_date that is always present and can move there as well to plug the hole. This results in something overall pretty clean (and cleaner than before), with the low-level stuff (state,tid,process,context) appearing first, then the caller stuff (caller,wake_date,calls,debug) next, and finally the type-specific stuff (rq/wq/expire/nice).	2022-09-08 14:30:38 +02:00
Willy Tarreau	2830d282e5	DEBUG: task: simplify the caller recording in DEBUG_TASK Instead of storing an index that's swapped at every call, let's use the two pointers as a shifting history. Now we have a permanent "caller" field that records the last caller, and an optional prev_caller in the debug section enabled by DEBUG_TASK that keeps a copy of the previous one. This way, not only is it much easier to follow what's happening during debugging, but it also saves 8 bytes in the struct task in debug mode and still keeps it under 2 cache lines in nominal mode, and this will finally be usable everywhere and later in profiling. The caller_idx was also used as a hint that the entry was freed, in order to detect wakeup-after-free. This was changed by setting caller to -1 instead and preserving its value in caller[1]. Finally, the operations were made atomic. That's not critical but since it's used for debugging and race conditions represent a significant part of the issues in multi-threaded mode, it seems wise to at least eliminate some possible factors of faulty analysis.	2022-09-08 14:30:38 +02:00
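The shifting history described in commit 2830d282e5 can be pictured with the sketch below. It is only an illustration under assumptions: `struct mini_task`, `record_caller()`, `mark_freed()`, `was_freed()` and `CALLER_FREED` are hypothetical names, callers are reduced to plain strings, `prev_caller` stands in for the commit's "caller[1]", and the atomic operations mentioned in the commit are left out for brevity.

```c
#include <stdint.h>

/* Hypothetical marker: "caller set to -1" means the entry was freed. */
#define CALLER_FREED ((const char *)(intptr_t)-1)

/* Hypothetical, reduced task: only the two history slots are kept. */
struct mini_task {
	const char *caller;      /* last recorded caller */
	const char *prev_caller; /* previous caller (debug-only in the commit) */
};

/* Shift the history by one slot, then record the new caller. */
static void record_caller(struct mini_task *t, const char *where)
{
	t->prev_caller = t->caller;
	t->caller = where;
}

/* On release, keep the last caller in prev_caller and mark the entry. */
static void mark_freed(struct mini_task *t)
{
	t->prev_caller = t->caller;
	t->caller = CALLER_FREED;
}

/* A wakeup on an entry for which this returns 1 is a wakeup-after-free. */
static int was_freed(const struct mini_task *t)
{
	return t->caller == CALLER_FREED;
}
```

With this layout there is no index to decode while debugging: the most recent caller is always in the same field, and the previous one sits right next to it.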
Willy Tarreau	8d71abf0cd	DEBUG: applet: instrument appctx_wakeup() to log the caller's location appctx_wakeup() relies on task_wakeup(), but since it calls it from a function, the calling place is always appctx_wakeup() itself, which is not very useful. Let's turn it into a macro so that we can log the location of the caller instead. As an example, the cli_io_handler() which used to be seen as this: (gdb) p appctx->t.debug.caller[0] $10 = { func = 0x9ffb78 <__func__.37996> "appctx_wakeup", file = 0x9b336a "include/haproxy/applet.h", line = 110, what = 1 '\001', arg8 = 0 '\000', arg32 = 0 } Now shows the more useful: (gdb) p appctx->t.debug.caller[0] $6 = { func = 0x9ffe80 <__func__.38641> "sc_app_chk_snd_applet", file = 0xa00320 "src/stconn.c", line = 996, what = 6 '\006', arg8 = 0 '\000', arg32 = 0 }	2022-09-08 14:30:38 +02:00
Willy Tarreau	e08af9a0f4	DEBUG: task: use struct ha_caller instead of arrays of file:line This reduces the task struct by 8 bytes, reduces the code size a little bit by simplifying the calling convention (one argument dropped), and as a bonus provides the function name in the caller.	2022-09-08 14:30:38 +02:00
Willy Tarreau	d2b2ad902b	DEBUG: task: define a series of wakeup types for tasks and tasklets The WAKEUP_* values will be used to report how a task/tasklet was woken up, and task_wakeup_type_str() will report the associated function name.	2022-09-08 14:30:16 +02:00
Willy Tarreau	d96d214b4c	CLEANUP: debug: use struct ha_caller for memstat The memstats code currently defines its own file/function/line number, type and extra pointer. We don't need to keep them separate and we can easily replace them all with just a struct ha_caller. Note that the extra pointer could be converted to a pool ID stored into arg8 or arg32 and be dropped as well, but this would first require to define IDs for pools (which we currently do not have).	2022-09-08 14:19:15 +02:00
Willy Tarreau	7f2f1f294c	MINOR: debug: add struct ha_caller to describe a calling location The purpose of this structure is to assemble all constant parts of a generic calling point for a specific event. These ones are created by the compiler as a static const element outside of the code path, so they cost nothing in terms of CPU, and a pointer to that descriptor can be passed to the place that needs it. This is very similar to what is being done for the mem_stat stuff. This will be useful to simplify and improve DEBUG_TASK.	2022-09-08 14:19:15 +02:00
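As a rough illustration of the "static const descriptor" idea from commit 7f2f1f294c, the sketch below builds one constant descriptor per call site and only passes a pointer around. The field names `func`, `file` and `line` are taken from the gdb dump shown in this log; everything else (`struct caller_desc`, `RECORD_CALLER()`, `last_caller`, the two wake functions) is hypothetical, and the real struct ha_caller carries more fields than shown here.

```c
#include <stddef.h>

/* Hypothetical, reduced descriptor of a calling location. */
struct caller_desc {
	const char *func; /* name of the calling function */
	const char *file; /* source file of the call site */
	int line;         /* line of the call site */
};

/* Last recorded call site (a single global slot keeps the sketch short). */
static const struct caller_desc *last_caller = NULL;

/* Emit one static const descriptor at the expansion site and store a
 * pointer to it: nothing is built at run time, only a pointer is copied.
 */
#define RECORD_CALLER(slot) do {                          \
		static const struct caller_desc here_ = { \
			__func__, __FILE__, __LINE__      \
		};                                        \
		(slot) = &here_;                          \
	} while (0)

static void wake_from_timer(void)
{
	RECORD_CALLER(last_caller);
}

static void wake_from_io(void)
{
	RECORD_CALLER(last_caller);
}
```

Because the macro is expanded in the caller, `__func__` and `__LINE__` describe the place that triggered the event rather than a shared helper, which is the same reason appctx_wakeup() was turned into a macro in commit 8d71abf0cd.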
Willy Tarreau	4a3907617f	MINOR: tools: add generic pointer hashing functions There are a few places where it's convenient to hash a pointer to compute a statistics bucket. Here we're basically reusing the hash that was used by memory profiling with a minor update that the multiplier was corrected to be prime and stand by its promise to have equal numbers of 1 and 0, and that 32-bit platforms won't lose range anymore. A two-pointer variant was also added.	2022-09-08 14:19:15 +02:00
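A pointer can be turned into a statistics bucket with a multiplicative hash, as in the minimal sketch below. This is not HAProxy's function: `ptr_bucket()` is a hypothetical name, and the multiplier is the widely used 64-bit golden-ratio constant, chosen here only for illustration and not the prime multiplier the commit refers to.

```c
#include <stdint.h>

/* Hypothetical helper: map a pointer to a bucket index on <bits> bits
 * (1 <= bits <= 32). The pointer is widened to 64 bits first so that
 * 32-bit platforms use the same range, then multiplied by an odd
 * constant; the top bits of the product are the best mixed ones.
 */
static unsigned int ptr_bucket(const void *p, int bits)
{
	uint64_t x = (uint64_t)(uintptr_t)p;

	x *= 0x9E3779B97F4A7C15ULL; /* illustrative multiplier only */
	return (unsigned int)(x >> (64 - bits));
}
```

For a 256-entry table one would call `ptr_bucket(ptr, 8)`. Taking the upper bits matters because nearby pointers mostly differ in their low bits, and the multiplication propagates those differences upward.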
Willy Tarreau	6a28a30efa	MINOR: tasks: do not keep cpu and latency times in struct task It was a mistake to put these two fields in the struct task. This was added in 1.9 via commit 9efd7456e ("MEDIUM: tasks: collect per-task CPU time and latency"). These fields are used solely by streams in order to report the measurements via the lat_ns* and cpu_ns* sample fetch functions when task profiling is enabled. For the rest of the tasks, this is pure CPU waste when profiling is enabled, and memory waste 100% of the time, as the point where these latencies and usages are measured is in the profiling array. Let's move the fields to the stream instead, and have process_stream() retrieve the relevant info from the thread's context. The struct task is now back to 120 bytes, i.e. almost two cache lines, with 32 bits still available.	2022-09-08 14:19:15 +02:00
Willy Tarreau	1efddfa6bf	MINOR: sched: store the current profile entry in the thread context The profile entry that corresponds to the current task/tasklet being profiled is now stored into the thread's context. This will allow it to be accessed from the tasks themselves. This is needed for an upcoming fix.	2022-09-08 14:19:15 +02:00
Willy Tarreau	62b5b96bcc	BUG/MINOR: sched: properly account for the CPU time of dying tasks When task profiling is enabled, the scheduler can measure and report the cumulated time spent in each task and their respective latencies. But this was wrong for tasks with few wakeups as well as for self-waking ones, because the call date needed to measure how long it takes to process the task is retrieved in the task itself (->wake_date was turned to the call date), and we could face two conditions: - a new wakeup while the task is executing would reset the ->wake_date field before returning and cause abnormally low values to be reported; that was likely the case for task_run_applet for self-waking applets; - when the task dies, NULL is returned and the call date couldn't be retrieved, so that CPU time was not being accounted for. This was particularly visible with process_stream() which is usually called only twice per request, and whose time was systematically halved. The cleanest solution here is to keep in mind that the scheduler already uses quite a bit of local context in th_ctx, and place the intermediary values there so that they cannot vanish. The wake_date has to be reset immediately once read, and only its copy is used along the function. Note that this must be done both for tasks and tasklets, and that until recently tasklets were also able to report wrong values due to their sole dependency on TH_FL_TASK_PROFILING between tests. One nice benefit for future improvements is that such information will now be available from the task without having to be stored into the task itself anymore. Since the tasklet part was computed on wrapping 32-bit arithmetic and the task one was on 64-bit, the values were now consistently moved to 32-bit as it's already largely sufficient (4s spent in a task is more than twice what the watchdog would tolerate). Some further cleanups might be necessary, but the patch aimed at staying minimal.
Task profiling output after 1 million HTTP requests previously looked like this: Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg h1_io_cb 2012338 4.850s 2.410us 12.91s 6.417us process_stream 2000136 9.594s 4.796us 34.26s 17.13us sc_conn_io_cb 2000135 1.973s 986.0ns 30.24s 15.12us h1_timeout_task 137 - - 2.649ms 19.34us accept_queue_process 49 152.3us 3.107us 321.7yr 6.564yr main+0x146430 7 5.250us 750.0ns 25.92us 3.702us srv_cleanup_idle_conns 1 559.0ns 559.0ns 918.0ns 918.0ns task_run_applet 1 - - 2.162us 2.162us Now it looks like this: Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg h1_io_cb 2014194 4.794s 2.380us 13.75s 6.826us process_stream 2000151 20.01s 10.00us 36.04s 18.02us sc_conn_io_cb 2000148 2.167s 1.083us 32.27s 16.13us h1_timeout_task 198 54.24us 273.0ns 3.487ms 17.61us accept_queue_process 52 158.3us 3.044us 409.9us 7.882us main+0x1466e0 18 16.77us 931.0ns 63.98us 3.554us srv_cleanup_toremove_conns 8 282.1us 35.26us 546.8us 68.35us srv_cleanup_idle_conns 3 149.2us 49.73us 8.131us 2.710us task_run_applet 3 268.1us 89.38us 11.61us 3.871us Note the two-fold difference on process_stream(). This feature is essentially used for debugging so it has extremely limited impact. However it's used quite a bit more in bug reports and it would be desirable that at least 2.6 gets this fix backported. It depends on at least these two previous patches which will then also have to be backported: MINOR: task: permanently enable latency measurement on tasklets CLEANUP: task: rename ->call_date to ->wake_date	2022-09-08 14:19:15 +02:00
Willy Tarreau	04e50b3d32	CLEANUP: task: rename ->call_date to ->wake_date This field is misnamed because its real and important content is the date the task was woken up, not the date it was called. It temporarily holds the call date during execution but this remains confusing. In fact before the latency measurements were possible it was indeed a call date. Thus it will now be called wake_date. This change is necessary because a subsequent fix will require the introduction of the real call date in the thread ctx.	2022-09-08 14:19:15 +02:00
Willy Tarreau	768c2c5678	MINOR: task: permanently enable latency measurement on tasklets When tasklet latency measurement was enabled in 2.4 with commit b2285de04 ("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"), the feature was conditioned on DEBUG_TASK because the field would add 8 bytes to the struct tasklet. This approach was not a very good idea because the struct ends on an int anyway thus it does finish with a 32-bit hole regardless of the presence of this field. What is true however is that adding it turned a 64-byte struct to 72-byte when caller debugging is enabled. This patch revisits this with a minor change. Now only the lowest 32 bits of the call date are stored, so they always fit in the remaining hole, and this allows the dependency on DEBUG_TASK to be removed. With debugging off, we're now seeing a 48-byte struct, and with debugging on it's exactly 64 bytes, thus still exactly one cache line. 32 bits allow a latency of 4 seconds on a tasklet, which already indicates a completely dead process, so there's no point storing the upper bits at all. And even in the event it would happen once in a while, the lost upper bits do not really add any value to the debug reports. Also, now one tasklet wakeup every 4 billion will not be sampled due to the test on the value itself. Similarly we just don't care, it's statistics and the measurements are not 9-digit accurate anyway.	2022-09-08 14:19:15 +02:00
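The reason the lowest 32 bits are enough can be checked with a small sketch. It assumes dates expressed in nanoseconds (which is consistent with the "4 seconds" figure above, since 2^32 ns is about 4.29 s); `latency_ns32()` is a hypothetical helper, not a function from the tree.

```c
#include <stdint.h>

/* Hypothetical helper: compute a latency from two 64-bit nanosecond
 * dates while only keeping their lowest 32 bits, as a tasklet would
 * store them. Unsigned subtraction wraps modulo 2^32, so the result is
 * exact as long as the real latency stays below 2^32 ns.
 */
static uint32_t latency_ns32(uint64_t wake_ns, uint64_t call_ns)
{
	uint32_t wake32 = (uint32_t)wake_ns; /* what the struct would hold */
	uint32_t call32 = (uint32_t)call_ns;

	return (uint32_t)(call32 - wake32);
}
```

For example, a wakeup at 0xFFFFFFF0 ns and a call at 0x100000010 ns straddle a 32-bit boundary, yet the truncated subtraction still returns 0x20 ns. If a stored value of zero is what denotes "no date recorded", as the "test on the value itself" suggests, a wakeup whose truncated date happens to be exactly zero cannot be told apart from an unset one, which would explain the one-in-4-billion unsampled wakeup.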
Willy Tarreau	0fae3a0360	BUG/MINOR: task: make task_instant_wakeup() work on a task not a tasklet There's a subtle (harmless) bug in task_instant_wakeup(). As it uses some tasklet code instead of some task code, the debug part also acts on the tasklet equivalent, and the call_date is only set when DEBUG_TASK is set instead of unconditionally like with tasks. As such, without this debugging macro, call dates are not updated for tasks woken this way. There isn't any impact yet because this function was introduced in 2.6 to solve certain classes of issues and is not used yet, and in the worst case it would only affect the reported latency time. This may be backported to 2.6 in case a future fix would depend on it but currently will not fix existing code.	2022-09-08 14:19:15 +02:00
Willy Tarreau	f27acd961e	BUG/MINOR: task: always reset a new tasklet's call date The tasklet's call date was not reset, so if profiling was enabled while some tasklets were in the run queue, their initial random value could be used to preload a bogus initial latency value into the task profiling bin. Let's just zero the initial value. This should be backported to 2.4 as it was brought with initial commit b2285de04 ("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"). The impact is very low though.	2022-09-08 14:19:15 +02:00
Frédéric Lécaille	3122c75fd1	BUG/MINOR: quic: Wrong connection ID to thread ID association To work, quic_pin_cid_to_tid() must set cid[0] to a value whose remainder modulo <global.nbthread> is <target_id>. For any integer n, (n - (n % m)) + d is always congruent to d modulo m (with d < m). So this statement seemed correct: cid[0] = cid[0] - (cid[0] % global.nbthread) + target_tid; except when the result wraps or when another modulo is applied to the result of the addition. Here, with 8-bit modular arithmetic, if m does not divide 256, this cannot work for values which wrap when incremented by d. For instance, with n=255, m=3 and d=1, the formula yields 0, whose remainder modulo m is 0 instead of d. To fix this, we first limit cid[0] to 255 - <target_id> to prevent it from wrapping. Thank you to @esb for having reported this issue in GH #1855. Must be backported to 2.6.	2022-09-07 15:59:43 +02:00
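The wrap described in commit 3122c75fd1 is easy to reproduce in isolation. The sketch below is not the HAProxy function: `pin_naive()` and `pin_clamped()` are hypothetical helpers that take the thread count and target thread as plain arguments, and they assume a target below both the thread count and 256.

```c
#include <stdint.h>

/* Original formula: the sum may exceed 255 and wrap when stored on 8 bits. */
static uint8_t pin_naive(uint8_t cid0, unsigned int nbthread,
                         unsigned int target_tid)
{
	return (uint8_t)(cid0 - (cid0 % nbthread) + target_tid);
}

/* Fixed variant: first cap cid0 at 255 - target_tid so that adding
 * target_tid can no longer overflow the byte.
 */
static uint8_t pin_clamped(uint8_t cid0, unsigned int nbthread,
                           unsigned int target_tid)
{
	if (cid0 > 255 - target_tid)
		cid0 = (uint8_t)(255 - target_tid);
	return (uint8_t)(cid0 - (cid0 % nbthread) + target_tid);
}
```

With cid0 = 255, 3 threads and target 1, `pin_naive()` computes 255 - 0 + 1 = 256, which wraps to 0 and maps to thread 0. `pin_clamped()` first lowers cid0 to 254, then returns 254 - 2 + 1 = 253, and 253 % 3 is 1 as required.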
William Lallemand	d2be9d4c48	BUILD: quic: temporarily ignore chacha20_poly1305 for libressl LibreSSL does not implement EVP_chacha20_poly1305() with EVP_CIPHER but uses the EVP_AEAD API instead: https://man.openbsd.org/EVP_AEAD_CTX_init This patch disables this cipher for LibreSSL for now.	2022-09-07 09:33:46 +02:00
William Lallemand	844009d77a	BUILD: ssl: fix ssl_sock_switchtx_cbk when no client_hello_cb When building HAProxy with USE_QUIC and libressl 3.6.0, the ssl_sock_switchtx_cbk symbol is not found because libressl does not implement the client_hello_cb. A ssl_sock_switchtx_cbk version for the servername callback is available but wasn't exported correctly.	2022-09-07 09:33:46 +02:00
William Lallemand	6d74e179ee	BUILD: quic: add some ifdef around the SSL_ERROR_* for libressl SSL_ERROR_WANT_ASYNC, SSL_ERROR_WANT_ASYNC_JOB and SSL_ERROR_WANT_CLIENT_HELLO_CB do not seem to be supported by libressl.	2022-09-07 09:33:46 +02:00
Willy Tarreau	ce57777660	MINOR: muxes: add a "show_sd" helper to complete "show sess" dumps This helper will be called for muxes that provide it and will be used to let the mux provide extra information about the stream attached to a stream descriptor. A line prefix is passed in argument so that the mux is free to break long lines without breaking indent. No prefix means no line breaks should be produced (e.g. for short dumps).	2022-09-02 15:48:50 +02:00
Willy Tarreau	178dda6b41	DEBUG: stream: minor rearrangement of a few fields in struct stream. Some recent traces started to show confusing stream pointers ending with 0xe. The reason was that the stream's obj_type was almost unused in the past and was stuffed in a hole in the structure. But now it's present in all "show sess all" outputs and having to mentally match this value against another one that's 0x17e lower is painful. The solution here is to move the obj_type at the top, like in almost every other structure, but without breaking the efficient layout. This patch moves a few fields around and manages to both plug some holes (16 bytes saved, 976 to 960) and avoid channels needlessly crossing cache boundaries (res was spread over 3 lines vs 2 now). Nothing else was changed. It would be desirable to backport this to 2.6 since it's where dumps are currently being processed the most.	2022-09-02 15:48:10 +02:00
Willy Tarreau	d8009a1ca6	BUILD: debug: make sure debug macros are never empty As outlined in commit f7ebe584d7 ("BUILD: debug: Add braces to if statement calling only CHECK_IF()"), the BUG_ON() family of macros is incorrectly defined to be empty when debugging is disabled, and that can lead to trouble. Make sure they always fall back to the usual "do { } while (0)". This may be backported to 2.6 if needed, though no such issue was met there to date.	2022-08-31 10:53:53 +02:00
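The fallback described in commit d8009a1ca6 can be sketched as follows. `CHECK_SKETCH()`, `SKETCH_DEBUG` and `count_non_negative()` are hypothetical names used only for this illustration; the real BUG_ON() family is defined differently. The point is that the disabled variant still expands to one complete statement, so a call used as the sole body of an `if` never degenerates into a bare `;` (an empty `if` body is something compilers may warn about).

```c
#include <stdio.h>

#ifdef SKETCH_DEBUG
/* Debug build: report when the suspicious condition matches. */
#define CHECK_SKETCH(cond) do {                                     \
		if (cond)                                           \
			fprintf(stderr, "check matched: %s\n", #cond); \
	} while (0)
#else
/* Non-debug build: never empty, always a single complete statement. */
#define CHECK_SKETCH(cond) do { } while (0)
#endif

/* Count non-negative entries; the macro is the sole body of an "if"
 * without braces, which is the usage that an empty macro would break.
 */
static int count_non_negative(const int *v, int n)
{
	int i, cnt = 0;

	for (i = 0; i < n; i++) {
		if (v[i] < 0)
			CHECK_SKETCH(v[i] < -100);
		else
			cnt++;
	}
	return cnt;
}
```

Whether `SKETCH_DEBUG` is defined or not, the `if`/`else` pairing stays identical and the function returns the same count.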
Frédéric Lécaille	c242832af3	BUG/MINOR: quic: Missing header protection AES cipher context initialisations (draft-v2) This bug was introduced by this commit: "MINOR: quic: Add reusable cipher contexts for header protection". haproxy could crash because of missing cipher context initializations for the header protection and draft-v2 Initial secrets. This was because these initializations, for both RX and TX secrets, were done outside of qc_new_isecs(). The role of this function is precisely to initialize these cipher contexts in addition to the derived secrets. Indeed, this function is called by qc_new_conn(), which initializes the connection, but also by qc_conn_finalize() when the peers negotiated a QUIC version different from the one used by the client for its first Initial packet. This was reported by the "v2" QUIC interop test with at least picoquic as client. Must be backported to 2.6.	2022-08-29 18:46:40 +02:00
Frédéric Lécaille	f34c1c9568	CLEANUP: quic: No more use ->rx_list MT_LIST entry point (quic_rx_packet) This quic_rx_packet MT_LIST entry point is no longer used at all. Should be backported to 2.6 to ease the future backports.	2022-08-24 18:17:13 +02:00
William Lallemand	b10b1196b8	MINOR: resolvers: shut the warning when "default" resolvers is implicit Shut the connect() warning of resolvers_finalize_config() when the configuration was not emitted manually. This shuts the warning for the "default" resolvers which is created automatically for the httpclient. Must be backported to 2.6.	2022-08-24 14:56:42 +02:00
Christopher Faulet	871dd82117	BUG/MINOR: tcpcheck: Disable QUICKACK only if data should be sent after connect It is only a real problem for agent-checks when there is no agent string to send. The condition to disable TCP_QUICKACK was only based on the type of the action following the connect one, but this is not always accurate. Indeed, for agent-checks, there is always a SEND action, but if there is no "agent-send" string defined, nothing is sent. In this case, this adds 200ms of latency for no reason. To fix the bug, a flag is now used on the CONNECT action to indicate that there are data to send after the connect. For health-checks, this flag is set if the action following the connect is a SEND action. For agent-checks, it is set if an "agent-send" string is defined. This patch should fix the issue #1836. It must be backported as far as 2.2.	2022-08-24 11:59:04 +02:00
Frédéric Lécaille	a2d8ad20a3	MINOR: quic: Replace MT_LISTs by LISTs for RX packets. Replace the MT_LIST of the ->rx.pqpkts quic_enc_level struct member with a LIST. Same thing for the MT_LIST of the ->list quic_rx_packet struct member. Update the code accordingly. This was a remnant of the multithreading support (several threads per connection). Must be backported to 2.6.	2022-08-23 17:55:02 +02:00
Frédéric Lécaille	ea4a5cbbdf	BUG/MINOR: mux-quic: Fix memleak on QUIC stream buffer for unacknowledged data Some clients send a CONNECTION_CLOSE frame without acknowledging the STREAM data haproxy has sent. In this case, when closing the connection, any data remaining in QUIC stream buffers were not released. Add a <closing> boolean option to qc_stream_desc_free() to force the release of the stream buffer memory when closing the connection. Thank you to Tristan for having reported such a memory leak issue in GH #1801. Must be backported to 2.6.	2022-08-20 19:08:31 +02:00
William Lallemand	62c0b99e3b	MINOR: ssl/cli: implement "add ssl ca-file" In ticket #1805 a user is impacted by the size limit of the CLI buffer when updating a ca-file. This patch allows a user to append new certificates to a ca-file instead of trying to put them all with "set ssl ca-file". The implementation uses a new function ssl_store_dup_cafile_entry() which duplicates a cafile_entry and its X509_STORE. ssl_store_load_ca_from_buf() was modified to take an append parameter so that the function can be shared between "set" and "add".	2022-08-19 19:58:53 +02:00
William Lallemand	d4774d3cfa	MINOR: ssl: handle ca-file appending in cafile_entry In order to be able to append a new CA to a cafile_entry, ssl_store_load_ca_from_buf() was reworked and an "append" parameter was added. The function is able to keep the previous X509_STORE which was already present in the cafile_entry.	2022-08-19 19:58:53 +02:00
Frédéric Lécaille	86a53c5669	MINOR: quic: Add reusable cipher contexts for header protection Implement quic_tls_rx_hp_ctx_init() and quic_tls_tx_hp_ctx_init() to initialize such header protection cipher contexts for each of the RX and TX parts and for each packet number space, only once per connection. Make qc_new_isecs() call these two functions to initialize the cipher contexts of the Initial secrets. Same thing for ha_quic_set_encryption_secrets() to initialize the cipher contexts of the subsequent derived secrets (0RTT, 1RTT, Handshake). Modify qc_do_rm_hp() and quic_apply_header_protection() to reuse these cipher contexts. Note that there is no need to modify the key update for the header protection. The header protection secrets are never updated.	2022-08-19 18:31:59 +02:00
Willy Tarreau	1cc08a33e1	MINOR: applet: add a function to reset the svcctx of an applet The CLI needs to reset the svcctx between commands, and there was nothing done to handle this. Let's add appctx_reset_svcctx() to do that, it's the closing equivalent of appctx_reserve_svcctx(). This will have to be backported to 2.6 as it will be used by a subsequent patch to fix a bug.	2022-08-18 18:16:36 +02:00
Amaury Denoyelle	2c5a7ee333	REORG: h2: extract cookies concat function in http_htx As specified by RFC 7540, multiple cookie headers are merged in a single entry before passing it to a HTTP/1.1 connection. This step is implemented during headers parsing in h2 module. Extract this code in the generic http_htx module. This will allow it to be reused quickly by the HTTP/3 implementation, which has the same requirement for cookie headers.	2022-08-18 16:13:33 +02:00
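The merge step mentioned in commit 2c5a7ee333 boils down to joining the individual cookie values with the two-octet delimiter "; " that RFC 7540 prescribes. The sketch below works on plain C strings and is not the http_htx code: `join_cookies()` and its signature are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join <n> cookie values into <dst> (of <size>
 * bytes), separated by "; ". Returns the resulting length, or -1 if
 * <dst> is too small; on failure <dst> may hold a partial, unterminated
 * result and must not be used.
 */
static int join_cookies(char *dst, size_t size,
                        const char *const *vals, size_t n)
{
	size_t off = 0, i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(vals[i]);
		size_t sep = i ? 2 : 0; /* no delimiter before the first */

		if (off + sep + len + 1 > size)
			return -1;
		if (sep) {
			memcpy(dst + off, "; ", 2);
			off += 2;
		}
		memcpy(dst + off, vals[i], len);
		off += len;
	}
	if (off + 1 > size)
		return -1;
	dst[off] = '\0';
	return (int)off;
}
```

Given the two header values "a=1" and "b=2", the function produces "a=1; b=2", which is the single cookie entry an HTTP/1.1 peer expects.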
Amaury Denoyelle	704675656b	BUG/MEDIUM: quic: fix crash on MUX send notification MUX notification on TX has been edited recently: the MUX is now notified only when its own data are sent, and not, for example, on retransmission by the quic-conn layer. This is the subject of the patch: b29a1dc2f4a334c1c7fea76c59abb4097422c05c BUG/MINOR: quic: do not notify MUX on frame retransmit A new flag QUIC_FL_CONN_RETRANS_LOST_DATA has been introduced to differentiate qc_send_app_pkts invocation by MUX and directly by the quic-conn layer in quic_conn_app_io_cb(). However, this is a first problem as internal quic-conn layer usage is not limited to retransmission, for example for NEW_CONNECTION_ID emission. Another, much more important problem is that send functions are also called through quic_conn_io_cb(), which has not been protected from MUX notification. This could probably result in a crash when trying to notify the MUX. To fix both problems, quic-conn flagging has been inverted: when used by the MUX, quic-conn is flagged with QUIC_FL_CONN_TX_MUX_CONTEXT. To improve the API, the MUX must now use qc_send_mux, which ensures the flag is set. qc_send_app_pkts is now static and can only be used by the quic-conn layer. This must be backported wherever the previously mentioned patch is.	2022-08-18 11:33:22 +02:00
Amaury Denoyelle	b29a1dc2f4	BUG/MINOR: quic: do not notify MUX on frame retransmit On STREAM emission, quic-conn notifies MUX through a callback named qcc_streams_sent_done(). This also happens on retransmission : in this case offset are examined and notification is ignored if already seen. However, this behavior has slightly changed since e53b489826ba9760a527b461095402ca05d2b6be BUG/MEDIUM: mux-quic: fix server chunked encoding response Indeed, if offset diff is NULL, frame is now not ignored. This is to support FIN notification with a final empty STREAM frame. A side-effect of this is that if the last stream frame is retransmitted, it won't be ignored in qcc_streams_sent_done(). In most cases, this side-effect is harmless as qcs instance will soon be freed after being closed. But if qcs is still alive, this will cause a BUG_ON crash as it is considered as locally closed. This bug depends on delay condition and seems to be extremely rare. But it might be the reason for a crash seen on interop with s2n client on http3 testcase : FATAL: bug condition "qcs->st == QC_SS_CLO" matched at src/mux_quic.c:372 call trace(16): \| 0x558228912b0d [b8 01 00 00 00 c6 00 00]: main-0x1c7878 \| 0x558228917a70 [48 8b 55 d8 48 8b 45 e0]: qcc_streams_sent_done+0xcf/0x355 \| 0x558228906ff1 [e9 29 05 00 00 48 8b 05]: main-0x1d3394 \| 0x558228907cd9 [48 83 c4 10 85 c0 0f 85]: main-0x1d26ac \| 0x5582289089c1 [48 83 c4 50 85 c0 75 12]: main-0x1d19c4 \| 0x5582288f8d2a [48 83 c4 40 48 89 45 a0]: main-0x1e165b \| 0x5582288fc4cc [89 45 b4 83 7d b4 ff 74]: qc_send_app_pkts+0xc6/0x1f0 \| 0x5582288fd311 [85 c0 74 12 eb 01 90 48]: main-0x1dd074 \| 0x558228b2e4c1 [48 c7 c0 d0 60 ff ff 64]: run_tasks_from_lists+0x4e6/0x98e \| 0x558228b2f13f [8b 55 80 29 c2 89 d0 89]: process_runnable_tasks+0x7d6/0x84c \| 0x558228ad9aa9 [8b 05 75 16 4b 00 83 f8]: run_poll_loop+0x80/0x48c \| 0x558228ada12f [48 8b 05 aa c5 20 00 48]: main-0x256 \| 0x7ff01ed2e609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609 \| 
0x7ff01e8ca163 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e To reproduce it locally, code was artificially patched to produce retransmission and avoid qcs liberation. In order to fix this and avoid a future class of similar problems, the best way is to not call qcc_streams_sent_done() to notify MUX for retransmission. To implement this, we test if any of QUIC_FL_CONN_RETRANS_OLD_DATA or the new flag QUIC_FL_CONN_RETRANS_LOST_DATA is set. A new wrapper qc_send_app_retransmit() has been added to set the new flag as a complement to already existing qc_send_app_probing(). This must be backported up to 2.6.	2022-08-17 11:06:24 +02:00
Amaury Denoyelle	cc13047364	MINOR: quic: refactor application send Adjust qc_send_app_pkts function: remove <old_data> arg and provide a new wrapper function qc_send_app_probing() which should be used instead when probing with old data. This simplifies the interface of the default function, most notably for the MUX which does not interfere with retransmission. QUIC_FL_CONN_RETRANS_OLD_DATA flag is set/unset directly in the wrapper qc_send_app_probing(). At the same time, function documentation has been updated to clarify arguments and return values. This commit will be useful for the next patch to differentiate MUX and retransmission send context. As a consequence, the current patch should be backported wherever the next one will be.	2022-08-17 11:05:49 +02:00
Amaury Denoyelle	bf3c208760	BUG/MEDIUM: mux-quic: reject uni stream ID exceeding flow control Emit STREAM_LIMIT_ERROR if a client tries to open an unidirectional stream with an ID greater than the value specified by our flow-control limit. The code is similar to the bidirectional stream opening. MAX_STREAMS_UNI emission is not implemented for the moment and is left as a TODO. This should not be too urgent for the moment: in HTTP/3, a client has only a limited use for unidirectional streams (H3 control stream + 2 QPACK streams). This is covered by the value provided by haproxy in transport parameters. This patch has been tagged with BUG as it should have prevented the last crash reported on github issue #1808 when opening a new unidirectional stream with an invalid ID. However, it is probably not the main cause of the bug contrary to the patch commit 11a6f4007b908b49ecd3abd5cd10fba177f07c11 BUG/MINOR: quic: Wrong status returned by qc_pkt_decrypt() This must be backported up to 2.6.	2022-08-17 11:05:19 +02:00
Amaury Denoyelle	26aa399d6b	MINOR: qpack: report error on enc/dec stream close As specified by RFC 9204, encoder and decoder streams must not be closed. If the peer behaves incorrectly and closes one of them, emit a H3_CLOSED_CRITICAL_STREAM connection error. To implement this, QPACK stream decoding API has been slightly adjusted. Firstly, fin parameter is passed to notify about FIN STREAM bit. Secondly, qcs instance is passed via unused void* context. This allows to use qcc_emit_cc_app() function to report a CONNECTION_CLOSE error.	2022-08-17 11:04:53 +02:00
Willy Tarreau	cc1a2a1867	MINOR: chunk: inline alloc_trash_chunk() This function is responsible for all calls to pool_alloc(trash), whose total size can be huge. As such it's quite a pain that it doesn't provide more hints about its users. However, since the function is tiny, it fully makes sense to inline it, the code is less than 0.1% larger with this. This way we can now detect where the callers are via "show profiling", e.g.: 0 1953671 0 32071463136\| 0x59960f main+0x10676f p_free(-16416) [pool=trash] 0 1 0 16416\| 0x59960f main+0x10676f p_free(-16416) [pool=trash] 1953672 0 32071479552 0\| 0x599561 main+0x1066c1 p_alloc(16416) [pool=trash] 0 976835 0 16035723360\| 0x576ca7 http_reply_to_htx+0x447/0x920 p_free(-16416) [pool=trash] 0 1 0 16416\| 0x576ca7 http_reply_to_htx+0x447/0x920 p_free(-16416) [pool=trash] 976835 0 16035723360 0\| 0x576a5d http_reply_to_htx+0x1fd/0x920 p_alloc(16416) [pool=trash] 1 0 16416 0\| 0x576a5d http_reply_to_htx+0x1fd/0x920 p_alloc(16416) [pool=trash]	2022-08-17 10:45:22 +02:00
Willy Tarreau	42b180dcdb	MINOR: pools/memprof: store and report the pool's name in each bin Storing the pointer to the pool along with the stats is quite useful as it allows to report the name. That's what we're doing here. We could store it in place of another field but that's not convenient as it would require to change all functions that manipulate counters. Thus here we store one extra field, as well as some padding because the struct turns 56 bytes long, thus better go to 64 directly. Example of output from "show profiling memory": 2 0 48 0\| 0x4bfb2c ha_quic_set_encryption_secrets+0xcc/0xb5e p_alloc(24) [pool=quic_tls_iv] 0 55252 0 10608384\| 0x4bed32 main+0x2beb2 free(-192) 15 0 2760 0\| 0x4be855 main+0x2b9d5 p_alloc(184) [pool=quic_frame] 1 0 1048 0\| 0x4be266 ha_quic_add_handshake_data+0x2b6/0x66d p_alloc(1048) [pool=quic_crypto] 3 0 552 0\| 0x4be142 ha_quic_add_handshake_data+0x192/0x66d p_alloc(184) [pool=quic_frame] 31276 0 6755616 0\| 0x4bb8f9 quic_sock_fd_iocb+0x689/0x69b p_alloc(216) [pool=quic_dgram] 0 31424 0 6787584\| 0x4bb7f3 quic_sock_fd_iocb+0x583/0x69b p_free(-216) [pool=quic_dgram] 152 0 32832 0\| 0x4bb4d9 quic_sock_fd_iocb+0x269/0x69b p_alloc(216) [pool=quic_dgram]	2022-08-17 10:34:00 +02:00
Willy Tarreau	facfad2b64	MINOR: pool/memprof: report pool alloc/free in memory profiling Pools are being used so well that it becomes difficult to profile their usage via the regular memory profiling. Let's add new entries for pools there, named "p_alloc" and "p_free" that correspond to pool_alloc() and pool_free(). Ideally it would be nice to only report those that fail cache lookups but that's complicated, particularly on the free() path since free lists are released in clusters to the shared pools. It's worth noting that the alloc_tot/free_tot fields can easily be determined by multiplying alloc_calls/free_calls by the pool's size, and could be better used to store a pointer to the pool itself. However it would require significant changes down the code that sorts output. If this were to cause a measurable slowdown, an alternate approach could consist in using a different value of USE_MEMORY_PROFILING to enable pools profiling. Also, this profiler doesn't depend on intercepting regular malloc functions, so we could also imagine enabling it alone or the other one alone or both. Tests show that the CPU overhead on QUIC (which is already an extremely intensive user of pools) jumps from ~7% to ~10%. This is quite acceptable in most deployments.	2022-08-17 09:38:05 +02:00
Willy Tarreau	219afa2ca8	MINOR: memprof: export the minimum definitions for memory profiling Right now it's not possible to feed memory profiling info from outside activity.c, so let's export the function and move the enum and struct to the include file.	2022-08-17 09:03:57 +02:00
Willy Tarreau	0b8e9ceb12	MINOR: ring: add support for a backing-file This mmaps a file which will serve as the backing-store for the ring's contents. The idea is to provide a way to retrieve sensitive information (last logs, debugging traces) even after the process stops and even after a possible crash. Until now this was possible by connecting to the CLI and dumping the contents of the ring live, but this is not handy and consumes quite a bit of resources before it is needed. With a backing file, the ring is effectively a RAM-mapped file, so that contents stored there are the same as those found in the file (the OS doesn't guarantee immediate sync but if the process dies it will be OK). Note that doing that on a filesystem backed by a physical device is a bad idea, as it will induce slowdowns at high loads. It's really important that the device is RAM-based. Also, this may have security implications: if the file is corrupted by another process, the storage area could be corrupted, causing haproxy to crash or to overwrite its own memory. As such this should only be used for debugging.	2022-08-12 11:18:46 +02:00
Willy Tarreau	6df10d872b	MINOR: ring: support creating a ring from a linear area Instead of allocating two parts, one for the ring struct itself and one for the storage area, ring_make_from_area() will arrange the two inside the same memory area, with the storage starting immediately after the struct. This will allow to store a complete ring state in shared memory areas for example.	2022-08-12 11:18:46 +02:00
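
The single-area layout described in the entry above can be sketched as follows. This is only an illustration under stated assumptions: `struct demo_ring` and `demo_ring_from_area()` are hypothetical names, the fields are not those of HAProxy's real ring struct, and the caller is assumed to pass an area that is suitably aligned for the header.

```c
#include <stddef.h>

/* Hypothetical ring header (NOT HAProxy's struct ring). The storage is a
 * flexible array member, so it starts right after the header and the
 * whole state holds no absolute pointer into itself.
 */
struct demo_ring {
	size_t size;          /* usable bytes in data[] */
	size_t head;          /* read offset */
	size_t tail;          /* write offset */
	unsigned char data[]; /* storage, immediately after the header */
};

/* Lays a ring out inside the caller-provided <area> of <size> bytes.
 * Returns NULL if the area is missing or cannot hold the header plus at
 * least one byte of storage. <area> must be aligned for struct demo_ring.
 */
static struct demo_ring *demo_ring_from_area(void *area, size_t size)
{
	struct demo_ring *ring = area;
	size_t hdr = offsetof(struct demo_ring, data);

	if (!area || size <= hdr)
		return NULL;

	ring->size = size - hdr;
	ring->head = 0;
	ring->tail = 0;
	return ring;
}
```

Because header and storage share one contiguous block, the same bytes could in principle live in a shared or file-backed mapping, which is the use case the commit message mentions.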
Willy Tarreau	8df098c2b1	BUILD: ring: forward-declare struct appctx to avoid a build warning When using ring.h standalone it emits warnings about appctx. Let's forward-declare it.	2022-08-12 11:18:46 +02:00
Frédéric Lécaille	7629f5d670	BUG/MEDIUM: quic: Wrong use of <token_odcid> in qc_lsntr_pkt_rcv() This commit was not complete: "BUG/MEDIUM: quic: Possible use of uninitialized <odcid> variable in qc_lstnr_params_init()" <token_odcid> should have been directly passed to qc_lstnr_params_init() without dereferencing it to prevent haproxy to have new chances to crash! Must be backported to 2.6.	2022-08-11 19:12:12 +02:00
Frédéric Lécaille	e9325e97c2	BUG/MEDIUM: quic: Possible use of uninitialized <odcid> variable in qc_lstnr_params_init() When receiving a token into a client Initial packet without a cluster secret defined by configuration, the <odcid> variable used to parse the ODCID from the token could be used without having been initialized. Such a packet must be dropped. So the sufficient part of this patch is this check: + } + else if (!global.cluster_secret && token_len) { + /* Impossible case: a token was received without configured + * cluster secret. + */ + TRACE_PROTO("Packet dropped", QUIC_EV_CONN_LPKT, + NULL, NULL, NULL, qv); + goto drop; } Take the opportunity of this patch to rework the part of the code where such a packet must be dropped and make it more readable, removing the <check_token> variable. When an ODCID is parsed from a token, the new <token_odcid> pointer variable is set to the address of the parsed ODCID. This way, if it is used without having been set, it will make haproxy crash. This was not always the case with an uninitialized local variable. Adapt the API to use such a pointer variable: the <token> boolean variable is removed from qc_lstnr_params_init() prototype. This must be backported to 2.6.	2022-08-11 18:33:36 +02:00
Amaury Denoyelle	4c9a1642c1	MINOR: mux-quic: define new traces Add new traces to help debugging on QUIC MUX. Most notably, the following functions are now traced: * qcc_emit_cc * qcs_free * qcs_consume * qcc_decode_qcs * qcc_emit_cc_app * qcc_install_app_ops * qcc_release_remote_stream * qcc_streams_sent_done * qc_init	2022-08-11 15:20:44 +02:00


3465 Commits