haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-11-19 09:51:01 +01:00

Author	SHA1	Message	Date
Willy Tarreau	98cc815e3e	MINOR: activity: collect time spent with a lock held for each task When DEBUG_THREAD > 0 and task profiling enabled, we'll now measure the time spent with at least one lock held for each task. The time is collected by locking operations when locks are taken raising the level to one, or released resetting the level. An accumulator is updated in the thread_ctx struct that is collected by the scheduler when the task returns, and updated in the sched_activity entry of the related task. This allows to observe figures like this one: Tasks activity over 259.516 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lkd_avg lat_avg h1_io_cb 15466589 2.574m 9.984us - - 33.45us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup sc_conn_io_cb 8047994 8.325s 1.034us - - 870.1us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup process_stream 7734689 4.356m 33.79us 1.990us 1.641us 1.554ms <- sc_notify@src/stconn.c:1206 task_wakeup process_stream 7734292 46.74m 362.6us 278.3us 132.2us 972.0us <- stream_new@src/stream.c:585 task_wakeup sc_conn_io_cb 7733158 46.88s 6.061us - - 68.78us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup task_process_applet 6603593 4.484m 40.74us 16.69us 34.00us 96.47us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup task_process_applet 4761796 3.420m 43.09us 18.79us 39.28us 138.2us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup process_table_expire 4710662 4.880m 62.16us 9.648us 53.95us 158.6us <- run_tasks_from_lists@src/task.c:671 task_queue stktable_add_pend_updates 4171868 6.786s 1.626us - 1.487us 47.94us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup h1_io_cb 2871683 1.198s 417.0ns 70.00ns 69.00ns 1.005ms <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup process_peer_sync 2304957 5.368s 2.328us - 1.156us 68.54us <- stktable_add_pend_updates@src/stick_table.c:873 task_wakeup process_peer_sync 1388141 3.174s 2.286us - 1.130us 52.31us <- 
run_tasks_from_lists@src/task.c:671 task_queue stktable_add_pend_updates 463488 3.530s 7.615us 2.000ns 7.134us 771.2us <- stktable_touch_with_exp@src/stick_table.c:654 tasklet_wakeup Here we see that almost the entirety of stktable_add_pend_updates() is spent under a lock, that 1/3 of the execution time of process_stream() was performed under a lock and that 2/3 of it was spent waiting for a lock (this is related to the 10 track-sc present in this config), and that the locking time in process_peer_sync() has now significantly reduced. This is more visible with "show profiling tasks aggr": Tasks activity over 475.354 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lkd_avg lat_avg h1_io_cb 25742539 3.699m 8.622us 11.00ns 10.00ns 188.0us sc_conn_io_cb 22565666 1.475m 3.920us - - 473.9us process_stream 21665212 1.195h 198.6us 140.6us 67.08us 1.266ms task_process_applet 16352495 11.31m 41.51us 17.98us 36.55us 112.3us process_peer_sync 7831923 17.15s 2.189us - 1.107us 41.27us process_table_expire 6878569 6.866m 59.89us 9.359us 51.91us 151.8us stktable_add_pend_updates 6602502 14.77s 2.236us - 2.060us 119.8us h1_timeout_task 801 703.4us 878.0ns - - 185.7us srv_cleanup_toremove_conns 347 12.43ms 35.82us 240.0ns 70.00ns 1.924ms accept_queue_process 142 1.384ms 9.743us - - 340.6us srv_cleanup_idle_conns 74 475.0us 6.418us 896.0ns 5.667us 114.6us	2025-09-11 16:32:34 +02:00
Willy Tarreau	95433f224e	MINOR: activity: add a new lkd_avg column to show profiling stats This new column will be used for reporting the average time spent in a task with at least one lock held. It will only have a non-zero value when DEBUG_THREAD > 0. For now it is not updated.	2025-09-11 16:32:34 +02:00
Willy Tarreau	4b23b2ed32	MINOR: thread: add a lock level information in the thread_ctx The new lock_level field indicates the number of cumulated locks that are held by the current thread. It's fed as soon as DEBUG_THREAD is at least 1. In addition, thread_isolate() adds 128, so that it's even possible to check for combinations of both. The value is also reported in thread dumps (warnings and panics).	2025-09-11 16:32:34 +02:00
Willy Tarreau	503084643f	MINOR: activity: collect time spent waiting on a lock for each task When DEBUG_THREAD > 0, and if task profiling is enabled, then each locking attempt will measure the time it takes to obtain the lock, then add that time to a thread_ctx accumulator that the scheduler will then retrieve to update the current task's sched_activity entry. The value will then appear averaged over the number of calls in the lkw_avg column of "show profiling tasks", such as below: Tasks activity over 48.298 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lat_avg h1_io_cb 3200170 26.81s 8.377us - 32.73us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup sc_conn_io_cb 1657841 1.645s 992.0ns - 853.0us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup process_stream 1600450 49.16s 30.71us 1.936us 1.392ms <- sc_notify@src/stconn.c:1206 task_wakeup process_stream 1600321 7.770m 291.3us 209.1us 901.6us <- stream_new@src/stream.c:585 task_wakeup sc_conn_io_cb 1599928 7.975s 4.984us - 65.77us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup task_process_applet 997609 46.37s 46.48us 16.80us 113.0us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup process_table_expire 922074 48.79s 52.92us 7.275us 181.1us <- run_tasks_from_lists@src/task.c:670 task_queue stktable_add_pend_updates 705423 1.511s 2.142us - 56.81us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup task_process_applet 683511 34.75s 50.84us 18.37us 153.3us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup h1_io_cb 535395 198.1ms 370.0ns 72.00ns 930.4us <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup It now makes it pretty obvious which tasks (hence call chains) spend their time waiting on a lock and for what share of their execution time.	2025-09-11 16:32:34 +02:00
Willy Tarreau	1956c544b5	MINOR: activity: add a new lkw_avg column to show profiling stats This new column will be used for reporting the average time spent waiting for a lock. It will only have a non-zero value when DEBUG_THREAD > 0. For now it is not updated.	2025-09-11 16:32:34 +02:00
Willy Tarreau	9f7ce9e807	MINOR: activity: don't report the lat_tot column for show profiling tasks This column is pretty useless, as the total latency experienced by tasks is meaningless, what matters is the average per call. Since we'll add more columns and we need to keep all of this readable, let's get rid of this column.	2025-09-11 16:32:34 +02:00
Christopher Faulet	3023e98199	BUG/MINOR: resolvers: Restore round-robin selection on records in DNS answers Since commit dcb696cd3 ("MEDIUM: resolvers: hash the records before inserting them into the tree"), when several records are found in a DNS answer, the round-robin selection over these records is no longer performed. Previously, a list of records was used. To ensure each record was selected one after the other, at each selection, the first record of the list was moved to the end. When this list was replaced by a tree, the same mechanism was preserved. However, the record is indexed using its key, a hash of the record, so its position never changes. When it is removed and reinserted in the tree, its position remains the same. When we walk through the tree, starting from the root, the records are always evaluated in the same order. So, even if there are several records in a DNS answer, the same IP address is always selected. It is quite easy to trigger the issue with a do-resolv action. To fix the issue, the node from which to perform the next selection is now saved. So instead of restarting from the root each time, we can restart from the node following the one used by the previous call. Thanks to Damien Claisse for the issue analysis and for the reproducer. This patch should fix issue #3116. It must be backported as far as 2.6.	2025-09-11 15:46:45 +02:00
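The fix described in 3023e98199 can be illustrated with a minimal sketch. It is not the resolvers code: a fixed-order array stands in for the tree, and `struct rr_set`, `rr_pick()` and the `next` field are hypothetical names introduced only for this example. The point it shows is that when element positions never change, rotation has to come from a saved cursor rather than from reordering the container.

```c
#include <stddef.h>

/* Hypothetical stand-in for the record tree: entries keep a fixed order
 * (as they would when indexed by a hash key), so the only way to rotate
 * is to remember where the previous selection stopped.
 */
struct rr_set {
	const char **items; /* records, in their fixed key order */
	size_t count;       /* number of records */
	size_t next;        /* saved position for the next selection */
};

/* Returns the next record in round-robin order, or NULL if the set is empty. */
static const char *rr_pick(struct rr_set *s)
{
	const char *rec;

	if (!s->count)
		return NULL;
	if (s->next >= s->count)  /* the set may have shrunk since the last call */
		s->next = 0;
	rec = s->items[s->next];
	s->next = (s->next + 1) % s->count; /* resume after this record next time */
	return rec;
}
```

Always starting from index 0 would reproduce the reported symptom, where the same address is returned on every call.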
Christopher Faulet	37abe56b18	BUG/MEDIUM: resolvers: Properly cache do-resolv resolution As stated by the documentation, when a do-resolv resolution is performed, the result should be cached for <hold.valid> milliseconds. However, the only way to cache the result is to always have a requester. When the last requester is unlinked from the resolution, the resolution is released. So, for a do-resolv resolution, it means it could only work by chance if the same FQDN is requested often enough to always have at least two streams waiting for the resolution. And because in that case the cached result is used, it means the traffic must be quite high. In fact, a good approach to fix the issue is to keep orphan resolutions to be able to cache the result and only release them hold.valid milliseconds after the last real resolution. The resolver's task already releases orphan resolutions. So we only need to check the expiration date and take care not to release the resolution when the last stream is unlinked from it. This patch should be backported to all stable versions. We can start to backport it as far as 3.1 and then wait a bit.	2025-09-11 15:46:45 +02:00
William Lallemand	fb832e1e52	BUILD: ssl: functions defined but not used Previous patch 50d191b ("MINOR: ssl: set functions as static when no protypes in the .h") broke the WolfSSL build because of unused functions. This patch adds __maybe_unused to ssl_sock_sctl_parse_cbk(), ssl_sock_sctl_add_cbk() and ssl_sock_msgcbk().	2025-09-11 15:32:59 +02:00
William Lallemand	50d191b8a3	MINOR: ssl: set functions as static when no protypes in the .h Check with -Wmissing-prototypes what should be static. src/ssl_sock.c:1572:5: error: no previous prototype for ‘ssl_sock_sctl_add_cbk’ [-Werror=missing-prototypes] 1572 \| int ssl_sock_sctl_add_cbk(SSL ssl, unsigned ext_type, const unsigned char out, size_t outlen, int al, void add_arg) \| ^~~~~~~~~~~~~~~~~~~~~ src/ssl_sock.c:1582:5: error: no previous prototype for ‘ssl_sock_sctl_parse_cbk’ [-Werror=missing-prototypes] 1582 \| int ssl_sock_sctl_parse_cbk(SSL s, unsigned int ext_type, const unsigned char in, size_t inlen, int al, void parse_arg) \| ^~~~~~~~~~~~~~~~~~~~~~~ src/ssl_sock.c:1604:6: error: no previous prototype for ‘ssl_sock_infocbk’ [-Werror=missing-prototypes] 1604 \| void ssl_sock_infocbk(const SSL ssl, int where, int ret) \| ^~~~~~~~~~~~~~~~ src/ssl_sock.c:2107:6: error: no previous prototype for ‘ssl_sock_msgcbk’ [-Werror=missing-prototypes] 2107 \| void ssl_sock_msgcbk(int write_p, int version, int content_type, const void buf, size_t len, SSL ssl, void arg) \| ^~~~~~~~~~~~~~~ src/ssl_sock.c:3936:5: error: no previous prototype for ‘sh_ssl_sess_new_cb’ [-Werror=missing-prototypes] 3936 \| int sh_ssl_sess_new_cb(SSL ssl, SSL_SESSION sess) \| ^~~~~~~~~~~~~~~~~~ src/ssl_sock.c:3990:14: error: no previous prototype for ‘sh_ssl_sess_get_cb’ [-Werror=missing-prototypes] 3990 \| SSL_SESSION sh_ssl_sess_get_cb(SSL ssl, __OPENSSL_110_CONST__ unsigned char key, int key_len, int do_copy) \| ^~~~~~~~~~~~~~~~~~ src/ssl_sock.c:4043:6: error: no previous prototype for ‘sh_ssl_sess_remove_cb’ [-Werror=missing-prototypes] 4043 \| void sh_ssl_sess_remove_cb(SSL_CTX ctx, SSL_SESSION sess) \| ^~~~~~~~~~~~~~~~~~~~~ src/ssl_sock.c:4075:6: error: no previous prototype for ‘ssl_set_shctx’ [-Werror=missing-prototypes] 4075 \| void ssl_set_shctx(SSL_CTX ctx) \| ^~~~~~~~~~~~~ src/ssl_sock.c:4103:6: error: no previous prototype for ‘SSL_CTX_keylog’ [-Werror=missing-prototypes] 4103 \| 
void SSL_CTX_keylog(const SSL ssl, const char line) \| ^~~~~~~~~~~~~~ src/ssl_sock.c:5167:6: error: no previous prototype for ‘ssl_sock_deinit’ [-Werror=missing-prototypes] 5167 \| void ssl_sock_deinit() \| ^~~~~~~~~~~~~~~ src/ssl_sock.c:6976:6: error: no previous prototype for ‘ssl_sock_close’ [-Werror=missing-prototypes] 6976 \| void ssl_sock_close(struct connection conn, void xprt_ctx) { \| ^~~~~~~~~~~~~~ src/ssl_sock.c:7846:17: error: no previous prototype for ‘ssl_action_wait_for_hs’ [-Werror=missing-prototypes] 7846 \| enum act_return ssl_action_wait_for_hs(struct act_rule rule, struct proxy *px, \| ^~~~~~~~~~~~~~~~~~~~~~	2025-09-11 15:23:59 +02:00
William Lallemand	19daee6549	MINOR: ocsp: put internal functions as static ones -Wmissing-prototypes let us check which functions can be made static and is not used elsewhere. rc/ssl_ocsp.c:1079:5: error: no previous prototype for ‘ssl_ocsp_update_insert_after_error’ [-Werror=missing-prototypes] 1079 \| int ssl_ocsp_update_insert_after_error(struct certificate_ocsp ocsp) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/ssl_ocsp.c:1116:6: error: no previous prototype for ‘ocsp_update_response_stline_cb’ [-Werror=missing-prototypes] 1116 \| void ocsp_update_response_stline_cb(struct httpclient hc) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/ssl_ocsp.c:1127:6: error: no previous prototype for ‘ocsp_update_response_headers_cb’ [-Werror=missing-prototypes] 1127 \| void ocsp_update_response_headers_cb(struct httpclient hc) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/ssl_ocsp.c:1138:6: error: no previous prototype for ‘ocsp_update_response_body_cb’ [-Werror=missing-prototypes] 1138 \| void ocsp_update_response_body_cb(struct httpclient hc) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/ssl_ocsp.c:1149:6: error: no previous prototype for ‘ocsp_update_response_end_cb’ [-Werror=missing-prototypes] 1149 \| void ocsp_update_response_end_cb(struct httpclient *hc) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~ src/ssl_ocsp.c:2095:5: error: no previous prototype for ‘ocsp_update_postparser_init’ [-Werror=missing-prototypes] 2095 \| int ocsp_update_postparser_init() \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~	2025-09-11 15:18:48 +02:00
William Lallemand	0224d60de6	BUG/MINOR: ocsp: prototype inconsistency Inconsistencies between the .h and the .c can't be caught because the .h is not included in the .c. ocsp_update_init() does not have the right prototype and lacks a const attribute. Must be backported to all previous stable versions.	2025-09-11 15:18:10 +02:00
Remi Tricot-Le Breton	e0844a305c	BUG/MINOR: ssl: Fix potential NULL deref in trace callback 'conn' might be NULL in the trace callback so the calls to conn_err_code_str must be covered by a proper check. This issue was found by Coverity and raised in GitHub #3112. The patch must be backported to 3.2.	2025-09-11 14:31:32 +02:00
Remi Tricot-Le Breton	a316342ec6	BUG/MINOR: ssl: Potential NULL deref in trace macro 'ctx' might be NULL when we exit 'ssl_sock_handshake', so it can't be dereferenced without a check in the trace macro. This was found by Coverity and raised in GitHub #3113. This patch should be backported up to 3.2.	2025-09-11 14:31:32 +02:00
William Lallemand	e52e6f66ac	BUG/MEDIUM: jws: return size_t in JWS functions JWS functions are supposed to return 0 upon error or when nothing was produced. This was done in order to easily put the return value in trash->data without having to check it. However, functions like a2base64url() or snprintf() could return a negative value, which would be cast to an unsigned int if this happened. This patch adds checks to the JWS functions to ensure that no negative value can be returned, and changes the prototype from int to size_t. This is also related to issue #3114. Must be backported to 3.2.	2025-09-11 14:31:32 +02:00
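The hazard described in e52e6f66ac can be sketched in isolation. This is not the JWS code: `fmt_json_checked()` is a hypothetical helper, and treating truncation as an error is an assumption made for the example. It shows the general pattern of checking an `int` result from snprintf() before it is ever stored in an unsigned length, so that the caller only sees 0 or a valid size.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats a small JSON object into <dst>.
 * Returns the number of bytes written (excluding the trailing NUL), or 0
 * on error. A negative snprintf() result is caught here instead of being
 * converted to a huge unsigned value. Truncated output is also reported
 * as an error, which is a choice made for this sketch.
 */
static size_t fmt_json_checked(char *dst, size_t size, const char *value)
{
	int ret = snprintf(dst, size, "{\"v\":\"%s\"}", value);

	if (ret < 0 || (size_t)ret >= size)
		return 0;
	return (size_t)ret;
}
```

With this shape, the result can be assigned directly to an unsigned length field, which is the usage the commit message describes for trash->data.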
William Lallemand	66a7ebfeef	BUG/MINOR: acme: null pointer dereference upon allocation failure Reported in issue #3115: 11. var_compare_op: Comparing task to null implies that task might be null. 681 if (!task) { 682 ret++; 683 ha_alert("acme: couldn't start the scheduler!\n"); 684 } CID 1609721: (#1 of 1): Dereference after null check (FORWARD_NULL) 12. var_deref_op: Dereferencing null pointer task. 685 task->nice = 0; 686 task->process = acme_scheduler; 687 688 task_wakeup(task, TASK_WOKEN_INIT); 689 } 690 Task would be dereferenced upon allocation failure instead of falling back to the end of the function after the error. Should be backported in 3.2.	2025-09-11 14:31:32 +02:00
Amaury Denoyelle	c15129f7dc	DOC: quic: clarifies limited-quic support This patch extends the documentation for the "limited-quic" global keyword. It first mentions that it relies on the USE_QUIC_OPENSSL_COMPAT=1 build option. Compatibility with TLS libraries is now clearly exposed. In particular, it highlights the fact that it is mostly targeted at OpenSSL versions prior to 3.5.2, and that it should be disabled if a recent OpenSSL release is available. It also states that limited-quic does nothing if USE_QUIC_OPENSSL_COMPAT is not set during compilation.	2025-09-11 10:11:12 +02:00
Amaury Denoyelle	d293cc62dc	MINOR: quic: display build warning for compat layer on recent OpenSSL Build option USE_QUIC_OPENSSL_COMPAT=1 must be set to activate QUIC support for OpenSSL prior to version 3.5.2. This compiles an internal compatibility layer, which must then be activated at runtime with the global option limited-quic. Starting from OpenSSL version 3.5.2, a proper QUIC TLS API is now exposed. Thus, the compatibility layer is unneeded. However it can still be compiled against newer OpenSSL releases and activated at runtime, mostly for testing purposes. As this compatibility layer has some limitations (no support for QUIC 0-RTT), it's important that users notice this situation and disable it if possible. Thus, this patch adds a notice warning when USE_QUIC_OPENSSL_COMPAT=1 is set when building against OpenSSL 3.5.2 and above. This should be sufficient for users and packagers to understand that this option is not necessary anymore. Note that USE_QUIC_OPENSSL_COMPAT=1 is incompatible with other TLS libraries which expose a QUIC API based on the original BoringSSL patch set. A build error will prevent the compatibility layer from being built. The limited-quic option is thus silently ignored.	2025-09-11 10:11:12 +02:00
Frederic Lecaille	5027ba36a9	MINOR: quic-be: make SSL/QUIC objects use their own indexes (ssl_qc_app_data_index) This index is used to retrieve the quic_conn object from its SSL object, the same way the connection is retrieved from its SSL object for SSL/TCP connections. This patch implements two helper functions to avoid the ugly code with such blocks: #ifdef USE_QUIC else if (qc) { .. } #endif Implement ssl_sock_get_listener() to return the listener from an SSL object. Implement ssl_sock_get_conn() to return the connection from an SSL object and optionally a pointer to the ssl_sock_ctx struct attached to the connections or the quic_conns. Use these functions where applicable: - ssl_tlsext_ticket_key_cb() calls ssl_sock_get_listener() - ssl_sock_infocbk() calls ssl_sock_get_conn() - ssl_sock_msgcbk() calls ssl_sock_get_ssl_conn() - ssl_sess_new_srv_cb() calls ssl_sock_get_conn() - ssl_sock_srv_verifycbk() calls ssl_sock_get_conn() Also modify qc_ssl_sess_init() to initialize the ssl_qc_app_data_index index for the QUIC backends.	2025-09-11 09:51:28 +02:00
Frederic Lecaille	47bb15ca84	MINOR: quic: get rid of ->target quic_conn struct member The ->li (struct listener ) member of quic_conn struct was replaced by a ->target (struct obj_type ) member by this commit: MINOR: quic-be: get rid of ->li quic_conn member to abstract the connection type (front or back) when implementing QUIC for the backends. In these cases, ->target was a pointer to the obj_type of a server struct. This could not work with dynamic servers, contrary to listeners, which are not dynamic. This patch almost reverts the one mentioned above. The ->target pointer to obj_type member is replaced by a ->li pointer to listener struct member. As listeners are not dynamic, this is easy to do. All one has to do is to replace the objt_listener(qc->target) statement by qc->li where applicable. For the backend connection, when needed, this is always qc->conn->target which is used, and only when qc->conn is initialized. The only "problematic" case is quic_dgram_parse(), which takes a pointer to an obj_type as third argument. But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function it is used to access the proxy counters of the connection thanks to qc_counters(). So, this obj_type argument may be null from now on with this patch. This is the reason why qc_counters() is modified to take this into consideration.	2025-09-11 09:51:28 +02:00
Christopher Faulet	5354c24c76	BUG/MAJOR: stream: Force channel analysis on successful synchronous send This patch reverts commit a498e527b ("BUG/MAJOR: stream: Remove READ/WRITE events on channels after analysers eval") because of a regression. It was an attempt to properly detect synchronous sends, even when the stream was woken up on a write event. However, the fix was wrong because it could mask shutdowns performed during process_stream() and block the stream. Indeed, when a shutdown is performed, because an error occurred for instance, a write event is reported. The commit above could mask this event while the shutdown prevents any synchronous send. In such a case, the stream could remain blocked indefinitely because an I/O event was missed. So to properly fix the original issue (#3070), the write event must not be masked before a synchronous send. Instead, we now force the channel analysis by explicitly setting the CF_WAKE_ONCE flag on the corresponding channel if a write event is reported after the synchronous send. The CF_WRITE_EVENT flag is explicitly removed just before, so it is quite easy to detect. This patch must be backported to all stable versions at the same time as the commit above.	2025-09-11 09:47:47 +02:00
Willy Tarreau	ded2110ec6	MEDIUM: peers: move process_peer_sync() to a single thread The remaining half of the task_queue() and task_wakeup() contention is caused by this function when peers are in use, because just like process_table_expire(), it's created using task_new_anywhere() and is woken up for local updates. Let's turn it to single thread by rotating the assigned threads during initialization so that a table only runs on one thread at a time. Here we go backwards to assign the threads, so that on small setups they don't end up on the same CPUs as the ones used by the stick-tables. This way this will make an even better use of large machines. The performance remains the same as with previous patch, even slightly better (1-3% on avg). At this point there's almost no multi-threaded task activity anymore (only srv_cleanup_idle_server once in a while). This should improve the situation described by Felipe in issues #3084 and #3101. This should be backported to 3.2 after some extended checks.	2025-09-10 19:14:05 +02:00
Willy Tarreau	e05afda249	MEDIUM: stick-table: move process_table_expire() to a single thread A great deal of the task_queue() contention is caused by this function because it's created using task_new_anywhere() and is subject to heavy updates. Let's turn it to single thread by rotating the assigned threads during initialization so that a table only runs on one thread at a time. However there's a trick: the function used to call task_queue() to requeue the task if it had advanced its timer (may only happen when learning an entry from a peer). We can't do that anymore since we can't queue another thread's task. Thus, when the task needs to be scheduled earlier than previously planned, we simply perform a wakeup instead. It will likely do nothing and will self-adjust its next wakeup timer. Doing so halves the number of multi-thread task wakeups. In addition the request rate at saturation increased by 12% with 16 peers and 40 tables on 16 8-thread processes. This should improve the situation described by Felipe in issues #3084 and #3101. This should be backported to 3.2 after some extended checks.	2025-09-10 19:13:33 +02:00
Willy Tarreau	2831cb104f	BUG/MINOR: stick-table: make sure never to miss a process_table_expire update In stktable_requeue_exp(), there's a tiny race at the beginning during which we check the task's expiration date to decide whether or not to wake process_table_expire() up. During this race, the task might just have finished running on its owner thread and we can miss a task_queue() opportunity, which probably explains why during testing it seldom happens that a few entries are left at the end. Let's perform a CAS to confirm the value is still the same before leaving. This way we're certain that our value has been seen at least once. This should be backported to 3.2.	2025-09-10 18:45:01 +02:00
Willy Tarreau	2ce5e0edcc	MEDIUM: resolvers: make the process_resolvers() task single-threaded This task is sometimes caught triggering the watchdog while waiting for the infamous resolvers lock, or the scheduler's wait queue lock in task_queue(). Both are caused by its multi-threaded capability. The task may indeed start on a thread that's different from the one that is currently receiving a response and that holds the resolvers lock, and when being queued back, it requires to lock the wait queue. Both problems disappear when sticking it to a single thread. But for configs running multiple resolvers sections, it would be suboptimal to run them all on the same thread. In order to avoid this, we implement a counter in the resolvers_finalize_config() section that rotates the thread for each resolvers section. This was sufficient to further improve the performance here, making the CPU usage drop to about 7% (from 11 previously or 38 initially) and not showing any resolvers lock contention anymore in perf top output. The change was kept fairly minimal to permit a backport once enough testing is conducted on it. It could address a significant part of the trouble reported by Felipe in GH issue #3101.	2025-09-10 16:51:14 +02:00
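The thread rotation mentioned in 2ce5e0edcc (and used in a similar way for tables and peers in e05afda249 and ded2110ec6) can be sketched as a simple counter. This is an illustrative assumption, not the code from resolvers_finalize_config(): `next_thread_id()`, `counter` and `nbthread` are hypothetical names, and the real commits may assign threads in a different order (the peers one is described as going backwards).

```c
#include <assert.h>

/* Hypothetical helper: hands out thread ids in rotation so that each
 * section (resolvers section, table, ...) is bound to a single thread,
 * while successive sections are spread over all <nbthread> threads.
 */
static unsigned int next_thread_id(unsigned int *counter, unsigned int nbthread)
{
	assert(nbthread > 0);
	return (*counter)++ % nbthread;
}
```

Calling it once per section at configuration time gives each section a single owner thread without piling every section onto thread 0.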
Willy Tarreau	d624aceaef	MEDIUM: dns: bind the nameserver sockets to the initiating thread There's still a big architectural limitation in the dns/resolvers code regarding threads: resolvers run as a task that is scheduled to run anywhere, and each NS dgram socket is bound to any thread of the same thread group as the initiating thread. This becomes a big problem when dealing with multiple nameservers because responses arrive on any thread, start by locking the resolvers section, and other threads dealing with responses are just stuck waiting for the lock to disappear. This means that most of the time is exclusively spent causing contention. The process_resolvers() function also suffers from this contention but apparently less often. It turns out that the nameserver sockets are created during emission of the first packet, triggered from the resolvers task. The present patch exploits this to stick all sockets to the calling thread instead of any thread. This way there is no longer any contention between multiple nameservers of a same resolvers section. Tests with a section having 10 name servers showed that the CPU usage dropped from 38 to about 10%, or almost by a factor of 4. Note that TCP resolvers do not offer this possibility because the tasks that manage the applets are created earlier to run anywhere during config parsing. This might possibly be refined later, e.g. by changing the task's affinity when it first runs. The change was kept fairly minimal to permit a backport once enough testing is conducted on it. It could address a significant part of the trouble reported by Felipe in GH issue #3101.	2025-09-10 16:48:09 +02:00
Olivier Houchard	07c10ec2f1	BUG/MEDIUM: ssl: Fix a crash if we failed to create the mux In ssl_sock_io_cb(), if we failed to create the mux, we may have destroyed the connection, so only attempt to access it to get the ALPN if conn_create_mux() was successful. This fixes crashes that may happen when using ssl.	2025-09-10 12:02:53 +02:00
Olivier Houchard	1759c97255	BUG/MEDIUM: ssl: Fix a crash when using QUIC Commit 5ab9954faa9c815425fa39171ad33e75f4f7d56f introduced a new flag in ssl_sock_ctx to know that an ALPN was negotiated. However, the way to get the ssl_sock_ctx was wrong for QUIC. If we're using QUIC, get it from the quic_conn. This should fix crashes when attempting to use QUIC.	2025-09-10 11:45:03 +02:00
Willy Tarreau	be86a69fe8	DEBUG: stick-tables: export stktable_add_pend_updates() for better reporting This function is a tasklet handler used to send peers updates, and it can happen quite a bit in "show tasks" and "show profiling tasks", so let's export it so that we don't face a cryptic symbol name: $ socat - /tmp/haproxy-n10.stat <<< "show tasks" Running tasks: 43 (8 threads) function places % lat_tot lat_avg calls_tot calls_avg calls% process_table_expire 16 37.2 1.072m 4.021s 115831 7239 15.4 task_process_applet 15 34.8 1.072m 4.287s 486299 32419 65.0 stktable_add_pend_updates 8 18.6 - - 89725 11215 12.0 sc_conn_io_cb 3 6.9 - - 5007 1669 0.6 process_peer_sync 1 2.3 4.293s 4.293s 50765 50765 6.7 This should be backported to 3.2 as it participates to debugging the table+peers processing overhead.	2025-09-10 11:34:51 +02:00
Willy Tarreau	993c09438b	BUG/MEDIUM: stick-tables: don't loop on non-expirable entries The stick-table expiration of ref-counted entries was insufficiently addressed by commit 324f0a60ab ("BUG/MINOR: stick-tables: never leave used entries without expiration"), because now entries are just requeued where they were, so they're visited over and over for long sessions, causing process_table_expire() to loop, eating CPU and causing lock contention. Here we take care of refreshing their timer when they are met, so that we don't meet them more than once per stick-table lifetime. It should address at least a part of the recent degradation that Felipe noticed in GH #3084. Since the fix above was marked for backporting to 3.2, this one should be backported there as well.	2025-09-10 11:27:27 +02:00
Willy Tarreau	997d217dee	MINOR: tools: don't emit "+0" for symbol names which exactly match known ones resolve_sym_name() knows a number of symbols, but when one exactly matches (e.g. a task's handler), it systematically displays the offset behind it ("+0"). Let's only show the offset when non-zero. This can be backported as this is helpful for debugging.	2025-09-10 10:44:33 +02:00
Willy Tarreau	9eb35563a6	MINOR: activity: indicate the number of calls on "show tasks" The "show tasks" command can be useful to inspect run queues for active tasks, but currently it's difficult to distinguish an occasional running task from a heavily active one. Let's collect the number of calls for each of them, and report them averaged over the number of instances of each task as well as a percentage of the total. This way it even becomes possible to get a hint about how CPU usage is distributed.	2025-09-10 10:44:33 +02:00
Willy Tarreau	17d3392348	BUG/MINOR: activity: fix reporting of task latency In 2.4, "show tasks" was introduced by commit 7eff06e162 ("MINOR: activity: add a new "show tasks" command to list currently active tasks") to expose some info about running tasks. The latency is not correct because it's a u32 subtracted from a u64. It ought to have been cast to u32 for the operation, which is what this patch does. This can be backported to 2.4.	2025-09-10 10:44:33 +02:00
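The arithmetic issue behind 17d3392348 can be reproduced with a small standalone sketch. The names are hypothetical and this is not the activity code: it only assumes a 64-bit "now" timestamp and a wakeup date stored as its low 32 bits. Subtracting the 32-bit value from the 64-bit one leaves the upper bits of "now" in the result, while casting "now" to 32 bits first yields the expected modular difference.

```c
#include <stdint.h>

/* Buggy form: <wake_date> is promoted to 64 bits, so the upper 32 bits
 * of <now> end up in the reported latency.
 */
static uint64_t latency_naive(uint64_t now, uint32_t wake_date)
{
	return now - wake_date;
}

/* Fixed form: both operands are 32 bits wide, so the subtraction wraps
 * modulo 2^32 and gives the elapsed time even across a 32-bit rollover.
 */
static uint32_t latency_fixed(uint64_t now, uint32_t wake_date)
{
	return (uint32_t)now - wake_date;
}
```

For example, with now = 2^32 + 150 and a stored wakeup date of 50, the naive form returns 2^32 + 100 while the fixed form returns 100.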
Willy Tarreau	bdff394195	BUILD: ssl: address a recent build warning when QUIC is enabled Since commit 5ab9954faa ("MINOR: ssl: Add a flag to let it known we have an ALPN negociated"), when building with QUIC we get this warning: src/ssl_sock.c: In function 'ssl_sock_advertise_alpn_protos': src/ssl_sock.c:2189:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] Let's just move the instructions after the optional declaration. No backport is needed.	2025-09-10 10:44:33 +02:00
Olivier Houchard	d4c51a4f57	MEDIUM: server: Make use of the stored ALPN stored in the server Now that we know which ALPN gets negotiated for a given server, use that to decide if we can create the mux right away in connect_server(), and use it in conn_install_mux_be(). That way, we may create the mux soon enough for early data to be sent, before the handshake has been completed. This commit depends on several previous commits, and it has not been deemed important enough to backport.	2025-09-09 19:01:24 +02:00
Willy Tarreau	6a2b3269f9	CLEANUP: backend: clarify the cases where we want to use early data The conditions to use early data on output are super tricky and detected later, so that it's difficult to figure how this works. This patch splits the condition in two parts, the one that can be performed early that is based on config/client/etc. It is used to clear a variable that allows early data to be used in case any condition is not satisfied. It was purposely split into multiple independent and reviewable tests. The second part remains where it was at the end, and is used to temporarily clear the handshake flags to let the data layer use early data. This one being tricky, a large comment explaining the principle was added. The logic was not changed at all, only the code was made more readable.	2025-09-09 19:01:24 +02:00
Willy Tarreau	9b9d0720e1	CLEANUP: backend: simplify the complex ifdef related to 0RTT in connect_server() Since 3.0 we have HAVE_SSL_0RTT precisely to avoid checking horribly complicated and unmaintainable conditions to detect support for 0RTT. Let's just drop the complex condition and use the macro instead.	2025-09-09 19:01:24 +02:00
Willy Tarreau	4aaf0bfbce	CLEANUP: backend: invert the condition to start the mux in connect_server() Instead of trying to switch from delayed start to instant start based on a single condition, let's do the opposite and preset the condition to instant start and detect what could cause it to be delayed, thus falling back to the slow mode. The condition remains exactly the inverted one and better matches the comment about ALPN being the only cause of such a delay.	2025-09-09 19:01:24 +02:00
Willy Tarreau	7b4a7f92b5	CLEANUP: backend: clarify the role of the init_mux variable in connect_server() The init_mux variable is currently used in a way that's not super easy to grasp. It's set a bit too late and requires to know a lot of info at once. Let's first rename it to "may_start_mux_now" to clarify its role, as the purpose is not to force the mux to be initialized now but to permit it to do it.	2025-09-09 19:01:24 +02:00
Olivier Houchard	ff47ae60f3	MEDIUM: server: Introduce the concept of path parameters Add a new field in struct server, path parameters. It will contain connection information for the server that is not expected to change. For now, just store the ALPN negotiated with the server. Each time a handshake is done, we'll update it, even though it is not supposed to change. This will be useful when trying to send early data, that way we'll know which mux to use. Each time the server goes down or is disabled, that information is erased, as we can't be sure those parameters will be the same once the server is back up.	2025-09-09 19:01:24 +02:00
Olivier Houchard	9d65f5cd4d	MINOR: ssl: Use the new flag to know when the ALPN has been set. Now that we have a flag to let us know the ALPN has been set, we no longer have to call ssl_sock_get_alpn() to know if the ALPN has been negotiated already. Remove the call to conn_create_mux() from ssl_sock_handshake(), and just reuse the one already present in ssl_sock_io_cb() if we have received early data, and if the flag is set.	2025-09-09 19:01:24 +02:00
Olivier Houchard	5ab9954faa	MINOR: ssl: Add a flag to let it known we have an ALPN negociated Add a new flag to the ssl_sock_ctx, to be set as soon as the ALPN has been negotiated. This happens before the handshake has been completed, and that information will let us know that, when we receive early data, if the ALPN has been negotiated, then we can immediately create a mux, as the ALPN will tell us which mux to use.	2025-09-09 19:01:24 +02:00
Olivier Houchard	6b78af837d	BUG/MEDIUM: ssl: create the mux immediately on early data If we received early data, and an ALPN has been negotiated, then immediately try to create a mux if we did not have one already. Generally, at this point we would not have one, as the mux is decided by the ALPN, however at this point, even if the handshake is not done yet, we have enough to determine the ALPN, so we can immediately create the mux. Doing so makes us able to treat the request immediately, without waiting for the handshake to be done. This should be backported up to 2.8.	2025-09-09 19:01:24 +02:00
Olivier Houchard	aa25ddb773	BUG/MEDIUM: h1: Allow reception if we have early data In h1_recv_allowed(), do not forbid the reception if we are yet to complete the connection, if we have received early data on it. That way, we can deal with them right away, instead of waiting for the handshake to be done. This should be backported up to 2.8.	2025-09-09 19:01:24 +02:00
Willy Tarreau	d7696d11e1	MEDIUM: peers: don't even try to process updates under contention Recent fix 2421c3769a ("BUG/MEDIUM: peers: don't fail twice to grab the update lock") improved the situation a lot for peers under locking contention but still not enough for situations with many peers and many entries to expire fast. It's indeed still possible to trigger warnings at end of injection sessions for 16 peers at 100k req/s each doing 10 random track-sc when process_table_expire() runs and holds the update lock if compiled with a high value of STKTABLE_MAX_UPDATES_AT_ONCE (1000). Better just not insist in this case and postpone the update. At this point, under load only ebmb_lookup() consumes CPU, other functions are in the few percent, indicating reasonable contention, and peers remain updated. This should be backported to 3.2 after a bit of testing.	2025-09-09 17:56:37 +02:00
Willy Tarreau	d5e7fba5c0	MEDIUM: stick-tables: don't wait indefinitely in stktable_add_pend_updates() This one doesn't need to wait forever, if it cannot work it can postpone it. When building with a high value of STKTABLE_MAX_UPDATES_AT_ONCE (1000), it's still possible to trigger warnings in this function on the write lock that is contended by peers and expiration. Changing it for a trylock resolves the issue. This should be backported to 3.2 after a bit of testing.	2025-09-09 17:56:37 +02:00
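The "trylock, otherwise postpone" idea from this commit can be sketched as follows. This is only an illustration under stated assumptions: the struct, its fields, the function name and the use of a C11 `atomic_flag` as the lock are all hypothetical and do not reflect HAProxy's locking primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a table with updates waiting to be applied. */
struct pending_updates {
	atomic_flag lock;   /* set while a writer holds the lock */
	int pending;        /* updates waiting to be committed */
	int committed;      /* updates already committed */
};

/* Tries to commit pending updates. Returns false without waiting when
 * the lock is contended, so that the caller can reschedule the work. */
static bool flush_or_postpone(struct pending_updates *p)
{
	if (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		return false;   /* lock already held: postpone */

	p->committed += p->pending;
	p->pending = 0;
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
	return true;
}
```

A caller would typically wake itself up again later when the function returns false, instead of spinning on the lock.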
Willy Tarreau	a771b14541	MEDIUM: stick-tables: give up on lock contention in process_table_expire() process_table_expire() can take quite a lot of time running over all shards. During this time it will hinder track-sc rules and peers, which will experience an increased latency to do their work, especially peers where each message will cause a lock, whose cumulated time can exceed the watchdog's patience. Here, we proceed just like in stktable_trash_oldest(), which is that we're using a trylock to detect contention. The first time it happens, if we hadn't purged anything, we switch to a regular lock to perform the operation, and next time it happens we abort. This guarantees that some entries will be expired and that contention will be reduced when detected. With this change, various tests didn't manage to produce any warning, including at the end of the load generation session. This should be backported to 3.2 after a bit more testing.	2025-09-09 17:56:37 +02:00
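The policy described in this commit (detect contention with a trylock, insist once with a regular lock if nothing was purged yet, abort on the next contention) can be sketched as below. The shard structure, the helper names and the `atomic_flag` spinlock are assumptions for illustration and are not taken from HAProxy's sources.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shard: a lock plus a count of entries ready to expire. */
struct shard {
	atomic_flag lock;
	int expired;
};

static bool shard_trylock(struct shard *s)
{
	return !atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire);
}

static void shard_lock(struct shard *s)
{
	while (!shard_trylock(s))
		;   /* spin until the lock is released */
}

static void shard_unlock(struct shard *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Walks the shards and purges expired entries. On contention, it waits
 * for the lock only once and only if nothing was purged so far; any
 * other contention ends the walk. Returns the number of purged entries. */
static int expire_shards(struct shard *shards, int nb)
{
	int purged = 0;
	bool insisted = false;

	for (int i = 0; i < nb; i++) {
		struct shard *s = &shards[i];

		if (!shard_trylock(s)) {
			if (purged > 0 || insisted)
				break;          /* give up, retry later */
			insisted = true;
			shard_lock(s);          /* insist exactly once */
		}
		purged += s->expired;
		s->expired = 0;
		shard_unlock(s);
	}
	return purged;
}
```

The single blocking attempt is what keeps the walk from returning empty-handed every time under sustained contention, while the early exit bounds the time spent competing with other lock users.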
Willy Tarreau	f87cf8b76e	MEDIUM: stick-tables: relax stktable_trash_oldest() to only purge what is needed stktable_trash_oldest() does insist a lot on purging what was requested, only limited by STKTABLE_MAX_UPDATES_AT_ONCE. This is called in two conditions, one to allocate a new stksess, and the other one to purge entries of a stopping process. The cost of iterating over all shards is huge, and a shard lock is taken each time before looking up entries. Moreover, multiple threads can end up doing the same and looking hard for many entries to purge when only one is needed. Furthermore, all threads start from the same shard, hence synchronize their locks. All of this costs a lot to other operations such as access from peers. This commit simplifies the approach by ignoring the budget, starting from a random shard number, and using a trylock so as to be able to give up early in case of contention. The approach chosen here consists in trying hard to flush at least one entry, but once at least one is evicted or at least one trylock failed, then a failure on the trylock will result in finishing. The function now returns a success as long as one entry was freed. With this, tests no longer show watchdog warnings while running, though a few still remain when stopping the tests (which are not related to this function but to the contention from process_table_expire()). With this change, under high contention some entries' purge might be postponed and the table may occasionally contain slightly more entries than its size (though this already happens since stksess_new() first increments ->current before decrementing it). Measurements were made on a 64-core system with 8 peers of 16 threads each, at CPU saturation (350k req/s each doing 10 track-sc) for 10M req, with 3 different approaches: - this one resulted in 1500 failures to find an entry (0.015% size overhead), with the lowest contention and the fairest peers distribution.
- leaving only after a success resulted in 229 failures (0.0029% size overhead) but doubled the time spent in the function (on the write lock precisely). - leaving only when both a success and a failed lock were met resulted in 31 failures (0.00031% overhead) but the contention was high enough again so that peers were not all up to date. Considering that a saturated machine exceeding its entries by 0.015% is pretty minimal, the mechanism is kept. This should be backported to 3.2 after a bit more testing as it resolves some watchdog warnings and panics. It requires the preceding commit "MINOR: stick-table: permit stksess_new() to temporarily allocate more entries" to over-allocate instead of failing in case of contention.	2025-09-09 17:56:37 +02:00
Willy Tarreau	b119280f60	MINOR: stick-table: permit stksess_new() to temporarily allocate more entries stksess_new() calls stktable_trash_oldest() to release some entries. If it fails however, it will fail to allocate an entry. This is a problem because it doesn't permit stktable_trash_oldest() to be used in best effort mode, which forces it to impose high contention. There's no problem with allocating slightly more in practice. In the worst case if all entries are in use, it's not shocking to temporarily exceed the number of entries by a few units. Let's relax this problematic rule. This patch might need to be backported to 3.2 after a bit more testing in order to support locking relaxation.	2025-09-09 17:56:37 +02:00
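The relaxed rule in this commit (allocate anyway when the best-effort purge fails) might look roughly like the sketch below. Everything here is hypothetical: the struct, its fields and both function names are invented for illustration, and the `contended` flag merely simulates a purge that gives up early.

```c
#include <stdbool.h>

/* Hypothetical table model, for illustration only. */
struct table {
	unsigned size;       /* nominal maximum number of entries */
	unsigned current;    /* entries currently allocated */
	unsigned purgeable;  /* entries that could be evicted */
	bool contended;      /* simulates a purge that gives up on contention */
};

/* Best-effort purge: evicts one entry unless contended or nothing to evict. */
static bool purge_one_best_effort(struct table *t)
{
	if (t->contended || t->purgeable == 0)
		return false;
	t->purgeable--;
	t->current--;
	return true;
}

/* Allocates one entry. A failed purge is tolerated, so the table may
 * temporarily hold slightly more entries than its nominal size. */
static void table_alloc_entry(struct table *t)
{
	if (t->current >= t->size)
		(void)purge_one_best_effort(t);
	t->current++;
}
```

The design choice shown is to trade a small, temporary overshoot of the table size for not having to force the purge through under contention.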
Willy Tarreau	0f33a55171	DEBUG: peers: export functions that use locks The following functions take locks and are often involved in warnings but are currently not resolved, so let's export them so that they are properly decoded: peer_prepare_updatemsg(), peer_send_teachmsgs(), peer_treat_updatemsg(), peer_send_msgs(), peer_io_handler() This should be backported to 3.2.	2025-09-09 17:56:14 +02:00

25368 Commits