haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-09-20 21:31:28 +02:00

Author	SHA1	Message	Date
Willy Tarreau	b2402d67b7	MINOR: proxy: add proxy_get_next_id() to find next free proxy ID This was previously achieved via the generic get_next_id() but we'll soon get rid of generic ID trees so let's have a dedicated proxy_get_next_id().	2025-09-16 09:23:46 +02:00
Willy Tarreau	f4059ea42f	MEDIUM: stktable: index table names using compact trees Here we're saving 64 bytes per stick-table, from 3392 to 3328, and the change was really straightforward so there's no reason not to do it.	2025-09-16 09:23:46 +02:00
Willy Tarreau	d0d60a007d	MEDIUM: proxy: switch conf.name to cebis_tree This is used to index the proxy's name and it contains a copy of the pointer to the proxy's name in <id>. Changing that for a ceb_node placed just before <id> saves 32 bytes to the struct proxy, which is now 3112 bytes large. Here we need to continue to support duplicates since they're still allowed between type-incompatible proxies. Interestingly, the use of cebis_next_dup() instead of cebis_next() in proxy_find_by_name() allows us to get rid of an strcmp() that was performed for each use_backend rule. A test with a large config (100k backends) shows that we can get 3% extra performance on a config involving a static use_backend rule (3.09M to 3.18M rps), and even 4.5% on a dynamic rule selecting a random backend (2.47M to 2.59M).	2025-09-16 09:23:46 +02:00
Willy Tarreau	fdf6fd5b45	MEDIUM: server: switch the host_dn member to cebis_tree This member is used to index the hostname_dn contents for DNS resolution. Let's replace it with a cebis_tree to save another 32 bytes (24 for the node + 8 by avoiding the duplication of the pointer). The struct server is now at 3904 bytes.	2025-09-16 09:23:46 +02:00
Willy Tarreau	413e903a22	MEDIUM: server: switch conf.name to cebis_tree This is used to index the server name and it contains a copy of the pointer to the server's name in <id>. Changing that for a ceb_node placed just before <id> saves 32 bytes to the struct server, which remains 3968 bytes large due to alignment. The proxy struct shrinks by 8 bytes to 3144. It's worth noting that the current way duplicate names are handled remains based on the previous mechanism where dups were permitted. Ideally we should now reject them during insertion and use unique key trees instead.	2025-09-16 09:23:46 +02:00
Willy Tarreau	0e99f64fc6	MEDIUM: server: switch addr_node to cebis_tree This contains the text representation of the server's address, for use with stick-tables with "srvkey addr". Switching them to a compact node saves 24 more bytes from this structure. The key was moved to an external pointer "addr_key" right after the node. The server struct is now 3968 bytes (down from 4032) due to alignment, and the proxy struct shrinks by 8 bytes to 3152.	2025-09-16 09:23:46 +02:00
Willy Tarreau	91258fb9d8	MEDIUM: guid: switch guid to more compact cebuis_tree The current guid struct size is 56 bytes. Once reduced using compact trees, it goes down to 32 (almost half). We're not on a critical path and size matters here, so better switch to this. It's worth noting that the name part could also be stored in the guid_node at the end to save 8 extra byte (no pointer needed anymore), however the purpose of this struct is to be embedded into other ones, which is not compatible with having a dynamic size. Affected struct sizes in bytes: Before After Diff server 4032 4032 0* proxy 3184 3160 -24 listener 752 728 -24 *: struct server is full of holes and padding (176 bytes) and is 64-byte aligned. Moving the guid_node elsewhere such as after sess_conn reduces it to 3968, or one less cache line. There's no point in moving anything now because forthcoming patches will arrange other parts.	2025-09-16 09:23:46 +02:00
Willy Tarreau	e36b3b60b3	MEDIUM: migrate the patterns reference to cebs_tree cebs_tree are 24 bytes smaller than ebst_tree (16B vs 40B), and pattern references are only used during map/acl updates, so their storage is pure loss between updates (which most of the time never happen). By switching their indexing to compact trees, we can save 16 to 24 bytes per entry depending on alightment (here it's 24 per struct but 16 practical as malloc's alignment keeps 8 unused). Tested on core i7-8650U running at 3.0 GHz, with a file containing 17.7M IP addresses (16.7M different): $ time ./haproxy -c -f acl-ip.cfg Save 280 MB RAM for 17.7M IP addresses, and slightly speeds up the startup (5.8%, from 19.2s to 18.2s), a part of which possible being attributed to having to write less memory. Note that this is on small strings. On larger ones such as user-agents, ebtree doesn't reread the whole key and might be more efficient. Before: RAM (VSZ/RSS): 4443912 3912444 real 0m19.211s user 0m18.138s sys 0m1.068s Overhead Command Shared Object Symbol 44.79% haproxy haproxy [.] ebst_insert 25.07% haproxy haproxy [.] ebmb_insert_prefix 3.44% haproxy libc-2.33.so [.] __libc_calloc 2.71% haproxy libc-2.33.so [.] _int_malloc 2.33% haproxy haproxy [.] free_pattern_tree 1.78% haproxy libc-2.33.so [.] inet_pton4 1.62% haproxy libc-2.33.so [.] _IO_fgets 1.58% haproxy libc-2.33.so [.] _int_free 1.56% haproxy haproxy [.] pat_ref_push 1.35% haproxy libc-2.33.so [.] malloc_consolidate 1.16% haproxy libc-2.33.so [.] __strlen_avx2 0.79% haproxy haproxy [.] pat_idx_tree_ip 0.76% haproxy haproxy [.] pat_ref_read_from_file 0.60% haproxy libc-2.33.so [.] __strrchr_avx2 0.55% haproxy libc-2.33.so [.] unlink_chunk.constprop.0 0.54% haproxy libc-2.33.so [.] __memchr_avx2 0.46% haproxy haproxy [.] pat_ref_append After: RAM (VSZ/RSS): 4166108 3634768 real 0m18.114s user 0m17.113s sys 0m0.996s Overhead Command Shared Object Symbol 38.99% haproxy haproxy [.] cebs_insert 27.09% haproxy haproxy [.] ebmb_insert_prefix 3.63% haproxy libc-2.33.so [.] __libc_calloc 3.18% haproxy libc-2.33.so [.] _int_malloc 2.69% haproxy haproxy [.] free_pattern_tree 1.99% haproxy libc-2.33.so [.] inet_pton4 1.74% haproxy libc-2.33.so [.] _IO_fgets 1.73% haproxy libc-2.33.so [.] _int_free 1.57% haproxy haproxy [.] pat_ref_push 1.48% haproxy libc-2.33.so [.] malloc_consolidate 1.22% haproxy libc-2.33.so [.] __strlen_avx2 1.05% haproxy libc-2.33.so [.] __strcmp_avx2 0.80% haproxy haproxy [.] pat_idx_tree_ip 0.74% haproxy libc-2.33.so [.] __memchr_avx2 0.69% haproxy libc-2.33.so [.] __strrchr_avx2 0.69% haproxy libc-2.33.so [.] _IO_getline_info 0.62% haproxy haproxy [.] pat_ref_read_from_file 0.56% haproxy libc-2.33.so [.] unlink_chunk.constprop.0 0.56% haproxy libc-2.33.so [.] cfree@GLIBC_2.2.5 0.46% haproxy haproxy [.] pat_ref_append If the addresses are totally disordered (via "shuf" on the input file), we see both implementations reach exactly 68.0s (slower due to much higher cache miss ratio). On large strings such as user agents (1 million here), it's now slightly slower (+9%): Before: real 0m2.475s user 0m2.316s sys 0m0.155s After: real 0m2.696s user 0m2.544s sys 0m0.147s But such patterns are much less common than short ones, and the memory savings do still count. Note that while it could be tempting to get rid of the list that chains all these pat_ref_elt together and only enumerate them by walking along the tree to save 16 extra bytes per entry, that's not possible due to the problem that insertion ordering is critical (think overlapping regex such as /index.* and /index.html). Currently it's not possible to proceed differently because patterns are first pre-loaded into the pat_ref via pat_ref_read_from_file_smp() and later indexed by pattern_read_from_file(), which has to only redo the second part anyway for maps/acls declared multiple times.	2025-09-16 09:23:46 +02:00
Willy Tarreau	ddf900a0ce	IMPORT: cebtree: import version 0.5.0 to support duplicates The support for duplicates is necessary for various use cases related to config names, so let's upgrade to the latest version which brings this support. This updates the cebtree code to commit 808ed67 (tag 0.5.0). A few tiny adaptations were needed: - replace a few ceb_node with ceb_root since pointers are now tagged ; - replace cebu.h with ceb.h since both are now merged in the same include file. This way we can drop the unused cebu*.h files from cebtree that are provided only for compatibility. - rename immediate storage functions to cebXX_imm_XXX() as per the API change in 0.5 that makes immediate explicit rather than implicit. This only affects vars and tools.c:copy_file_name(). The tests continue to work.	2025-09-16 09:23:46 +02:00
Willy Tarreau	90b70b61b1	BUILD: makefile: implement support for running a command in range When running "make range", it would be convenient to support running reg tests or anything else such as "size", "pahole" or even benchmarks. Such commands are usually specific to the developer's environment, so let's just pass a generic variable TEST_CMD that is executed as-is if not empty. This way it becomes possible to run "make range RANGE=... TEST_CMD=...".	2025-09-16 09:23:46 +02:00
Valentine Krasnobaeva	f8acac653e	BUG/MINOR: resolvers: always normalize FQDN from response RFC1034 states the following: By convention, domain names can be stored with arbitrary case, but domain name comparisons for all present domain functions are done in a case-insensitive manner, assuming an ASCII character set, and a high order zero bit. This means that you are free to create a node with label "A" or a node with label "a", but not both as brothers; you could refer to either using "a" or "A". In practice, most DNS resolvers normalize domain labels (i.e., convert them to lowercase) before performing searches or comparisons to ensure this requirement is met. While HAProxy normalizes the domain name in the request, it currently does not do so for the response. Commit 75cc653 ("MEDIUM: resolvers: replace bogus resolv_hostname_cmp() with memcmp()") intentionally removed the `tolower()` conversion from `resolv_hostname_cmp()` for safety and performance reasons. This commit re-introduces the necessary normalization for FQDNs received in the response. The change is made in `resolv_read_name()`, where labels are processed as an unsigned char string, allowing `tolower()` to be applied safely. Since a typical FQDN has only 3-4 labels, replacing `memcpy()` with an explicit copy that also applies `tolower()` should not introduce a significant performance degradation. This patch addresses the rare edge case, as most resolvers perform this normalization themselves. This fixes the GitHub issue #3102. This fix may be backported in all stable versions since 2.5 included 2.5.	2025-09-15 18:02:16 +02:00
Remi Tricot-Le Breton	257df69fbd	BUG/MINOR: ocsp: Crash when updating CA during ocsp updates If an ocsp response is set to be updated automatically and some certificate or CA updates are performed on the CLI, if the CLI update happens while the OCSP response is being updated and is then detached from the udapte tree, it might be wrongly inserted into the update tree in 'ssl_sock_load_ocsp', and then reinserted when the update finishes. The update tree then gets corrupted and we could end up crashing when accessing other nodes in the ocsp response update tree. This patch must be backported up to 2.8. This patch fixes GitHub #3100.	2025-09-15 15:34:36 +02:00
Aurelien DARRAGON	6a92b14cc1	MEDIUM: log/proxy: store log-steps selection using a bitmask, not an eb tree An eb tree was used to anticipate for infinite amount of custom log steps configured at a proxy level. In turns out this makes no sense to configure that much logging steps for a proxy, and the cost of the eb tree is non negligible in terms of memory footprint, especially when used in a default section. Instead, let's use a simple bitmask, which allows up to 64 logging steps configured at proxy level. If we lack space some day (and need more than 64 logging steps to be configured), we could simply modify "struct log_steps" to spread the bitmask over multiple 64bits integers, minor some adjustments where the mask is set and checked.	2025-09-15 10:29:02 +02:00
Aurelien DARRAGON	be417c1db2	BUG/MEDIUM: http_ana: fix potential NULL deref in http_process_req_common() As reported by @kenballus in GH #3118, a potential NULL-deref was introduced in 3da1d63 ("BUG/MEDIUM: http_ana: handle yield for "stats http-request" evaluation") Indeed, px->uri_auth may be NULL when stats directive is not involved in the current proxy section. The bug went unnoticed because it didn't seem to cause any side-effect so far and valgrind didn't catch it. However ASAN did, so let's fix it before it causes harm. It should be backported with 3da1d63.	2025-09-15 10:28:59 +02:00
Christopher Faulet	b582fd41c2	Revert "BUG/MINOR: ocsp: Crash when updating CA during ocsp updates" This reverts commit 167ea8fc7b0cf9d1bf71ec03d7eac3141fbe0080. The patch was backported by mistake.	2025-09-15 10:16:20 +02:00
Remi Tricot-Le Breton	167ea8fc7b	BUG/MINOR: ocsp: Crash when updating CA during ocsp updates If an ocsp response is set to be updated automatically and some certificate or CA updates are performed on the CLI, if the CLI update happens while the OCSP response is being updated and is then detached from the udapte tree, it might be wrongly inserted into the update tree in 'ssl_sock_load_ocsp', and then reinserted when the update finishes. The update tree then gets corrupted and we could end up crashing when accessing other nodes in the ocsp response update tree. This patch must be backported up to 2.8. This patch fixes GitHub #3100.	2025-09-15 08:20:16 +02:00
Christopher Faulet	157852ce99	BUG/MEDIUM: resolvers: Wake resolver task up whne unlinking a stream requester Another regression introduced with the commit 3023e9819 ("BUG/MINOR: resolvers: Restore round-robin selection on records in DNS answers"). Stream requesters are unlinked from any theards. So we must not try to queue the resolver's task here because it is not allowed to do so from another thread than the task thread. Instead, we can simply wake the resolver's task up. It is only performed when the last stream requester is unlink from the resolution. This patch should fix the issue #3119. It must be backported with the commit above.	2025-09-15 07:57:29 +02:00
Christopher Faulet	e6a9192af6	BUG/MEDIUM: resolvers: Accept to create resolution without hostname A regression was introduced by commit 6cf2401ed ("BUG/MEDIUM: resolvers: Make resolution owns its hostname_dn value"). In fact, it is possible (an allowed ?!) to create a resolution without hostname (hostname_dn == NULL). It only happens on startup for a server relying on a resolver but defined with an IP address and not a hostname Because of the patch above, an error is triggered during the configuration parsing when this happens, while it should be accepted. This patch must be backported with the commit above.	2025-09-12 11:52:06 +02:00
Christopher Faulet	6cf2401eda	BUG/MEDIUM: resolvers: Make resolution owns its hostname_dn value The commit 37abe56b1 ("BUG/MEDIUM: resolvers: Properly cache do-resolv resolution") introduced a regression. A resolution does not own its hostname_dn value, it is a pointer on the first request value. But since the commit above, it is possible to have orphan resolution, with no requester. So it is important to modify the resolutions to make it owns its hostname_dn value by duplicating it when it is created. This patch must be backported with the commit above.	2025-09-12 11:09:19 +02:00
Christopher Faulet	f6dfbbe870	BUG/MEDIUM: resolvers: Test for empty tree when getting a record from DNS answer In the previous fix 5d1d93fad ("BUG/MEDIUM: resolvers: Properly handle empty tree when getting a record from the DNS answer"), I missed the fact the answer tree can be empty. So, to avoid crashes, when the answer tree is empty, we immediately exit from resolv_get_ip_from_response() function with RSLV_UPD_NO_IP_FOUND. In addition, when a record is removed from the tree, we take care to reset the next node saved if necessary. This patch must be backported with the commit above.	2025-09-12 11:09:19 +02:00
Collison, Steven	d738fa4ec0	DOC: proxy-protocol: Add TLS group and sig scheme TLVs This change adds the PP2_SUBTYPE_SSL_GROUP and PP2_SUBTYPE_SSL_SIG_SCHEME code point reservations in proxy_protocol.txt. The motivation for adding these two TLVs is for backend visibility into the negotiated TLS key exchange group and handshake signature scheme. Demand for visibility is expected to increase as endpoints migrate to use new Post-Quantum resistant algorithms for key exchange and signatures.	2025-09-12 09:25:14 +02:00
Willy Tarreau	8fb5ae5cc6	MINOR: activity/memory: count allocations performed under a lock By checking the current thread's locking status, it becomes possible to know during a memory allocation whether it's performed under a lock or not. Both pools and memprofile functions were instrumented to check for this and to increment the memprofile bin's locked_calls counter. This one, when not zero, is reported on "show profiling memory" with a percentage of all allocations that such locked allocations represent. This way it becomes possible to try to target certain code paths that are particularly expensive. Example: $ socat - /tmp/sock1 <<< "show profiling memory"\|grep lock 20297301 0 2598054528 0\| 0x62a820fa3991 sockaddr_alloc+0x61/0xa3 p_alloc(128) [pool=sockaddr] [locked=54962 (0.2 %)] 0 20297301 0 2598054528\| 0x62a820fa3a24 sockaddr_free+0x44/0x59 p_free(-128) [pool=sockaddr] [locked=34300 (0.1 %)] 9908432 0 1268279296 0\| 0x62a820eb8524 main+0x81974 p_alloc(128) [pool=task] [locked=9908432 (100.0 %)] 9908432 0 554872192 0\| 0x62a820eb85a6 main+0x819f6 p_alloc(56) [pool=tasklet] [locked=9908432 (100.0 %)] 263001 0 63120240 0\| 0x62a820fa3c97 conn_new+0x37/0x1b2 p_alloc(240) [pool=connection] [locked=20662 (7.8 %)] 71643 0 47307584 0\| 0x62a82105204d pool_get_from_os_noinc+0x12d/0x161 posix_memalign(660) [locked=5393 (7.5 %)]	2025-09-11 16:32:34 +02:00
Willy Tarreau	9d8c2a888b	MINOR: activity: collect CPU time spent on memory allocations for each task When task profiling is enabled, the pool alloc/free code will measure the time it takes to perform memory allocation after a cache miss or memory freeing to the shared cache or OS. The time taken with the thread-local cache is never measured as measuring that time is very expensive compared to the pool access time. Here doing so costs around 2% performance at 2M req/s, only when task profiling is enabled, so this remains reasonable. The scheduler takes care of collecting that time and updating the sched_activity entry corresponding to the current task when task profiling is enabled. The goal clearly is to track places that are wasting CPU time allocating and releasing too often, or causing large evictions. This appears like this in "show profiling tasks aggr": Tasks activity over 11.428 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lkd_avg mem_avg lat_avg process_stream 44183891 16.47m 22.36us 491.0ns 1.154us 1.000ns 101.1us h1_io_cb 57386064 4.011m 4.193us 20.00ns 16.00ns - 29.47us sc_conn_io_cb 42088024 49.04s 1.165us - - - 54.67us h1_timeout_task 438171 196.5ms 448.0ns - - - 100.1us srv_cleanup_toremove_conns 65 1.468ms 22.58us 184.0ns 87.00ns - 101.3us task_process_applet 3 508.0us 169.3us - 107.0us 1.847us 29.67us srv_cleanup_idle_conns 6 225.3us 37.55us 15.74us 36.84us - 49.47us accept_queue_process 2 45.62us 22.81us - - 4.949us 54.33us	2025-09-11 16:32:34 +02:00
Willy Tarreau	195794eb59	MINOR: activity: add a new mem_avg column to show profiling stats This new column will be used for reporting the average time spent allocating or freeing memory in a task when task profiling is enabled. For now it is not updated.	2025-09-11 16:32:34 +02:00
Willy Tarreau	98cc815e3e	MINOR: activity: collect time spent with a lock held for each task When DEBUG_THREAD > 0 and task profiling enabled, we'll now measure the time spent with at least one lock held for each task. The time is collected by locking operations when locks are taken raising the level to one, or released resetting the level. An accumulator is updated in the thread_ctx struct that is collected by the scheduler when the task returns, and updated in the sched_activity entry of the related task. This allows to observe figures like this one: Tasks activity over 259.516 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lkd_avg lat_avg h1_io_cb 15466589 2.574m 9.984us - - 33.45us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup sc_conn_io_cb 8047994 8.325s 1.034us - - 870.1us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup process_stream 7734689 4.356m 33.79us 1.990us 1.641us 1.554ms <- sc_notify@src/stconn.c:1206 task_wakeup process_stream 7734292 46.74m 362.6us 278.3us 132.2us 972.0us <- stream_new@src/stream.c:585 task_wakeup sc_conn_io_cb 7733158 46.88s 6.061us - - 68.78us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup task_process_applet 6603593 4.484m 40.74us 16.69us 34.00us 96.47us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup task_process_applet 4761796 3.420m 43.09us 18.79us 39.28us 138.2us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup process_table_expire 4710662 4.880m 62.16us 9.648us 53.95us 158.6us <- run_tasks_from_lists@src/task.c:671 task_queue stktable_add_pend_updates 4171868 6.786s 1.626us - 1.487us 47.94us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup h1_io_cb 2871683 1.198s 417.0ns 70.00ns 69.00ns 1.005ms <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup process_peer_sync 2304957 5.368s 2.328us - 1.156us 68.54us <- stktable_add_pend_updates@src/stick_table.c:873 task_wakeup process_peer_sync 1388141 3.174s 2.286us - 1.130us 52.31us <- run_tasks_from_lists@src/task.c:671 task_queue stktable_add_pend_updates 463488 3.530s 7.615us 2.000ns 7.134us 771.2us <- stktable_touch_with_exp@src/stick_table.c:654 tasklet_wakeup Here we see that almost the entirety of stktable_add_pend_updates() is spent under a lock, that 1/3 of the execution time of process_stream() was performed under a lock and that 2/3 of it was spent waiting for a lock (this is related to the 10 track-sc present in this config), and that the locking time in process_peer_sync() has now significantly reduced. This is more visible with "show profiling tasks aggr": Tasks activity over 475.354 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lkd_avg lat_avg h1_io_cb 25742539 3.699m 8.622us 11.00ns 10.00ns 188.0us sc_conn_io_cb 22565666 1.475m 3.920us - - 473.9us process_stream 21665212 1.195h 198.6us 140.6us 67.08us 1.266ms task_process_applet 16352495 11.31m 41.51us 17.98us 36.55us 112.3us process_peer_sync 7831923 17.15s 2.189us - 1.107us 41.27us process_table_expire 6878569 6.866m 59.89us 9.359us 51.91us 151.8us stktable_add_pend_updates 6602502 14.77s 2.236us - 2.060us 119.8us h1_timeout_task 801 703.4us 878.0ns - - 185.7us srv_cleanup_toremove_conns 347 12.43ms 35.82us 240.0ns 70.00ns 1.924ms accept_queue_process 142 1.384ms 9.743us - - 340.6us srv_cleanup_idle_conns 74 475.0us 6.418us 896.0ns 5.667us 114.6us	2025-09-11 16:32:34 +02:00
Willy Tarreau	95433f224e	MINOR: activity: add a new lkd_avg column to show profiling stats This new column will be used for reporting the average time spent in a task with at least one lock held. It will only have a non-zero value when DEBUG_THREAD > 0. For now it is not updated.	2025-09-11 16:32:34 +02:00
Willy Tarreau	4b23b2ed32	MINOR: thread: add a lock level information in the thread_ctx The new lock_level field indicates the number of cumulated locks that are held by the current thread. It's fed as soon as DEBUG_THREAD is at least 1. In addition, thread_isolate() adds 128, so that it's even possible to check for combinations of both. The value is also reported in thread dumps (warnings and panics).	2025-09-11 16:32:34 +02:00
Willy Tarreau	503084643f	MINOR: activity: collect time spent waiting on a lock for each task When DEBUG_THREAD > 0, and if task profiling is enabled, then each locking attempt will measure the time it takes to obtain the lock, then add that time to a thread_ctx accumulator that the scheduler will then retrieve to update the current task's sched_activity entry. The value will then appear avearaged over the number of calls in the lkw_avg column of "show profiling tasks", such as below: Tasks activity over 48.298 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lat_avg h1_io_cb 3200170 26.81s 8.377us - 32.73us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup sc_conn_io_cb 1657841 1.645s 992.0ns - 853.0us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup process_stream 1600450 49.16s 30.71us 1.936us 1.392ms <- sc_notify@src/stconn.c:1206 task_wakeup process_stream 1600321 7.770m 291.3us 209.1us 901.6us <- stream_new@src/stream.c:585 task_wakeup sc_conn_io_cb 1599928 7.975s 4.984us - 65.77us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup task_process_applet 997609 46.37s 46.48us 16.80us 113.0us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup process_table_expire 922074 48.79s 52.92us 7.275us 181.1us <- run_tasks_from_lists@src/task.c:670 task_queue stktable_add_pend_updates 705423 1.511s 2.142us - 56.81us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup task_process_applet 683511 34.75s 50.84us 18.37us 153.3us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup h1_io_cb 535395 198.1ms 370.0ns 72.00ns 930.4us <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup It now makes it pretty obvious which tasks (hence call chains) spend their time waiting on a lock and for what share of their execution time.	2025-09-11 16:32:34 +02:00
Willy Tarreau	1956c544b5	MINOR: activity: add a new lkw_avg column to show profiling stats This new column will be used for reporting the average time spent waiting for a lock. It will only have a non-zero value when DEBUG_THREAD > 0. For now it is not updated.	2025-09-11 16:32:34 +02:00
Willy Tarreau	9f7ce9e807	MINOR: activity: don't report the lat_tot column for show profiling tasks This column is pretty useless, as the total latency experienced by tasks is meaningless, what matters is the average per call. Since we'll add more columns and we need to keep all of this readable, let's get rid of this column.	2025-09-11 16:32:34 +02:00
Christopher Faulet	3023e98199	BUG/MINOR: resolvers: Restore round-robin selection on records in DNS answers Since the commit dcb696cd3 ("MEDIUM: resolvers: hash the records before inserting them into the tree"), When several records are found in a DNS answer, the round robin selection over these records is no longer performed. Indeed, before a list of records was used. To ensure each records was selected one after the other, at each selection, the first record of the list was moved at the end. When this list was replaced bu a tree, the same mechanism was preserved. However, the record is indexed using its key, a hash of the record. So its position never changes. When it is removed and reinserted in the tree, its position remains the same. When we walk though the tree, starting from the root, the records are always evaluated in the same order. So, even if there are several records in a DNS answer, the same IP address is always selected. It is quite easy to trigger the issue with a do-resolv action. To fix the issue, the node to perform the next selection is now saved. So instead of restarting from the root each time, we can restart from the next node of the previous call. Thanks to Damien Claisse for the issue analysis and for the reproducer. This patch should fix the issue #3116. It must be backported as far as 2.6.	2025-09-11 15:46:45 +02:00
Christopher Faulet	37abe56b18	BUG/MEDIUM: resolvers: Properly cache do-resolv resolution As stated by the documentation, when a do-resolv resolution is performed, the result should be cached for <hold.valid> milliseconds. However, the only way to cache the result is to always have a requester. When the last requester is unlink from the resolution, the resolution is released. So, for a do-resolv resolution, it means it could only work by chance if the same FQDN is requested enough to always have at least two streams waiting for the resolution. And because in that case, the cached result is used, it means the traffic must be quite high. In fact, a good approach to fix the issue is to keep orphan resolutions to be able cache the result and only release them after hold.valid milliseconds after the last real resolution. The resolver's task already releases orphan resolutions. So we only need to check the expiration date and take care to not release the resolution when the last stream is unlink from it. This patch should be backported to all stable versions. We can start to backport it as far as 3.1 and then wait a bit.	2025-09-11 15:46:45 +02:00
William Lallemand	fb832e1e52	BUILD: ssl: functions defined but not used Previous patch 50d191b ("MINOR: ssl: set functions as static when no protypes in the .h") broke the WolfSSL function with unused functions. This patch add __maybe_unused to ssl_sock_sctl_parse_cbk(), ssl_sock_sctl_add_cbk() and ssl_sock_msgcbk()	2025-09-11 15:32:59 +02:00
William Lallemand	50d191b8a3	MINOR: ssl: set functions as static when no protypes in the .h Check with -Wmissing-prototypes what should be static. src/ssl_sock.c:1572:5: error: no previous prototype for ‘ssl_sock_sctl_add_cbk’ [-Werror=missing-prototypes] 1572 \| int ssl_sock_sctl_add_cbk(SSL ssl, unsigned ext_type, const unsigned char out, size_t outlen, int al, void add_arg) \| ^~~~~~~~~~~~~~~~~~~~~ src/ssl_sock.c:1582:5: error: no previous prototype for ‘ssl_sock_sctl_parse_cbk’ [-Werror=missing-prototypes] 1582 \| int ssl_sock_sctl_parse_cbk(SSL s, unsigned int ext_type, const unsigned char in, size_t inlen, int al, void parse_arg) \| ^~~~~~~~~~~~~~~~~~~~~~~ src/ssl_sock.c:1604:6: error: no previous prototype for ‘ssl_sock_infocbk’ [-Werror=missing-prototypes] 1604 \| void ssl_sock_infocbk(const SSL ssl, int where, int ret) \| ^~~~~~~~~~~~~~~~ src/ssl_sock.c:2107:6: error: no previous prototype for ‘ssl_sock_msgcbk’ [-Werror=missing-prototypes] 2107 \| void ssl_sock_msgcbk(int write_p, int version, int content_type, const void buf, size_t len, SSL ssl, void arg) \| ^~~~~~~~~~~~~~~ src/ssl_sock.c:3936:5: error: no previous prototype for ‘sh_ssl_sess_new_cb’ [-Werror=missing-prototypes] 3936 \| int sh_ssl_sess_new_cb(SSL ssl, SSL_SESSION sess) \| ^~~~~~~~~~~~~~~~~~ src/ssl_sock.c:3990:14: error: no previous prototype for ‘sh_ssl_sess_get_cb’ [-Werror=missing-prototypes] 3990 \| SSL_SESSION sh_ssl_sess_get_cb(SSL ssl, __OPENSSL_110_CONST__ unsigned char key, int key_len, int do_copy) \| ^~~~~~~~~~~~~~~~~~ src/ssl_sock.c:4043:6: error: no previous prototype for ‘sh_ssl_sess_remove_cb’ [-Werror=missing-prototypes] 4043 \| void sh_ssl_sess_remove_cb(SSL_CTX ctx, SSL_SESSION sess) \| ^~~~~~~~~~~~~~~~~~~~~ src/ssl_sock.c:4075:6: error: no previous prototype for ‘ssl_set_shctx’ [-Werror=missing-prototypes] 4075 \| void ssl_set_shctx(SSL_CTX ctx) \| ^~~~~~~~~~~~~ src/ssl_sock.c:4103:6: error: no previous prototype for ‘SSL_CTX_keylog’ [-Werror=missing-prototypes] 4103 \| void SSL_CTX_keylog(const SSL ssl, const char line) \| ^~~~~~~~~~~~~~ src/ssl_sock.c:5167:6: error: no previous prototype for ‘ssl_sock_deinit’ [-Werror=missing-prototypes] 5167 \| void ssl_sock_deinit() \| ^~~~~~~~~~~~~~~ src/ssl_sock.c:6976:6: error: no previous prototype for ‘ssl_sock_close’ [-Werror=missing-prototypes] 6976 \| void ssl_sock_close(struct connection conn, void xprt_ctx) { \| ^~~~~~~~~~~~~~ src/ssl_sock.c:7846:17: error: no previous prototype for ‘ssl_action_wait_for_hs’ [-Werror=missing-prototypes] 7846 \| enum act_return ssl_action_wait_for_hs(struct act_rule rule, struct proxy *px, \| ^~~~~~~~~~~~~~~~~~~~~~	2025-09-11 15:23:59 +02:00
William Lallemand	19daee6549	MINOR: ocsp: put internal functions as static ones -Wmissing-prototypes let us check which functions can be made static and is not used elsewhere. rc/ssl_ocsp.c:1079:5: error: no previous prototype for ‘ssl_ocsp_update_insert_after_error’ [-Werror=missing-prototypes] 1079 \| int ssl_ocsp_update_insert_after_error(struct certificate_ocsp ocsp) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/ssl_ocsp.c:1116:6: error: no previous prototype for ‘ocsp_update_response_stline_cb’ [-Werror=missing-prototypes] 1116 \| void ocsp_update_response_stline_cb(struct httpclient hc) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/ssl_ocsp.c:1127:6: error: no previous prototype for ‘ocsp_update_response_headers_cb’ [-Werror=missing-prototypes] 1127 \| void ocsp_update_response_headers_cb(struct httpclient hc) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/ssl_ocsp.c:1138:6: error: no previous prototype for ‘ocsp_update_response_body_cb’ [-Werror=missing-prototypes] 1138 \| void ocsp_update_response_body_cb(struct httpclient hc) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/ssl_ocsp.c:1149:6: error: no previous prototype for ‘ocsp_update_response_end_cb’ [-Werror=missing-prototypes] 1149 \| void ocsp_update_response_end_cb(struct httpclient *hc) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~ src/ssl_ocsp.c:2095:5: error: no previous prototype for ‘ocsp_update_postparser_init’ [-Werror=missing-prototypes] 2095 \| int ocsp_update_postparser_init() \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~	2025-09-11 15:18:48 +02:00
William Lallemand	0224d60de6	BUG/MINOR: ocsp: prototype inconsistency Inconsistencies between the .h and the .c can't be catched because the .h is not included in the .c. ocsp_update_init() does not have the right prototype and lacks a const attribute. Must be backported in all previous stable versions.	2025-09-11 15:18:10 +02:00
Remi Tricot-Le Breton	e0844a305c	BUG/MINOR: ssl: Fix potential NULL deref in trace callback 'conn' might be NULL in the trace callback so the calls to conn_err_code_str must be covered by a proper check. This issue was found by Coverity and raised in GitHub #3112. The patch must be backported to 3.2.	2025-09-11 14:31:32 +02:00
Remi Tricot-Le Breton	a316342ec6	BUG/MINOR: ssl: Potential NULL deref in trace macro 'ctx' might be NULL when we exit 'ssl_sock_handshake', it can't be dereferenced without check in the trace macro. This was found by Coverity andraised in GitHub #3113. This patch should be backported up to 3.2.	2025-09-11 14:31:32 +02:00
William Lallemand	e52e6f66ac	BUG/MEDIUM: jws: return size_t in JWS functions JWS functions are supposed to return 0 upon error or when nothing was produced. This was done in order to put easily the return value in trash->data without having to check the return value. However functions like a2base64url() or snprintf() could return a negative value, which would be casted in a unsigned int if this happen. This patch add checks on the JWS functions to ensure that no negative value can be returned, and change the prototype from int to size_t. This is also related to issue #3114. Must be backported to 3.2.	2025-09-11 14:31:32 +02:00
William Lallemand	66a7ebfeef	BUG/MINOR: acme: null pointer dereference upon allocation failure Reported in issue #3115: 11. var_compare_op: Comparing task to null implies that task might be null. 681 if (!task) { 682 ret++; 683 ha_alert("acme: couldn't start the scheduler!\n"); 684 } CID 1609721: (#1 of 1): Dereference after null check (FORWARD_NULL) 12. var_deref_op: Dereferencing null pointer task. 685 task->nice = 0; 686 task->process = acme_scheduler; 687 688 task_wakeup(task, TASK_WOKEN_INIT); 689 } 690 Task would be dereferenced upon allocation failure instead of falling back to the end of the function after the error. Should be backported in 3.2.	2025-09-11 14:31:32 +02:00
Amaury Denoyelle	c15129f7dc	DOC: quic: clarifies limited-quic support This patch extends the documentation for "limited-quic" global keyword. It mentions first that it relies on USE_QUIC_OPENSSL_COMPAT=1 build option. Compatibility with TLS libraries is now clearly exposed. In particular, it highlights the fact that it is mostly targetted at OpenSSL version prior to 3.5.2, and that it should be disabled if a recent OpenSSL release is available. It also states that limited-quic does nothing if USE_QUIC_OPENSSL_COMPAT is not set during compilation.	2025-09-11 10:11:12 +02:00
Amaury Denoyelle	d293cc62dc	MINOR: quic: display build warning for compat layer on recent OpenSSL Build option USE_QUIC_OPENSSL_COMPAT=1 must be set to activate QUIC support for OpenSSL prior to version 3.5.2. This compiles an internal compatibility layer, which must be then activated at runtime with global option limited-quic. Starting from OpenSSL version 3.5.2, a proper QUIC TLS API is now exposed. Thus, the compatibility layer is unneeded. However it can still be compiled against newer OpenSSL releases and activated at runtime, mostly for test purpose. As this compatibility layer has some limitations, (no support for QUIC 0-RTT), it's important that users notice this situation and disable it if possible. Thus, this patch adds a notice warning when USE_QUIC_OPENSSL_COMPAT=1 is set when building against OpenSSL 3.5.2 and above. This should be sufficient for users and packagers to understand that this option is not necessary anymore. Note that USE_QUIC_OPENSSL_COMPAT=1 is incompatible with others TLS library which exposed a QUIC API based on original BoringSSL patches set. A build error will prevent the compatibility layer to be built. limited-quic option is thus silently ignored.	2025-09-11 10:11:12 +02:00
Frederic Lecaille	5027ba36a9	MINOR: quic-be: make SSL/QUIC objects use their own indexes (ssl_qc_app_data_index) This index is used to retrieve the quic_conn object from its SSL object, the same way the connection is retrieved from its SSL object for SSL/TCP connections. This patch implements two helper functions to avoid the ugly code with such blocks: #ifdef USE_QUIC else if (qc) { .. } #endif Implement ssl_sock_get_listener() to return the listener from an SSL object. Implement ssl_sock_get_conn() to return the connection from an SSL object and optionally a pointer to the ssl_sock_ctx struct attached to the connections or the quic_conns. Use this functions where applicable: - ssl_tlsext_ticket_key_cb() calls ssl_sock_get_listener() - ssl_sock_infocbk() calls ssl_sock_get_conn() - ssl_sock_msgcbk() calls ssl_sock_get_ssl_conn() - ssl_sess_new_srv_cb() calls ssl_sock_get_conn() - ssl_sock_srv_verifycbk() calls ssl_sock_get_conn() Also modify qc_ssl_sess_init() to initialize the ssl_qc_app_data_index index for the QUIC backends.	2025-09-11 09:51:28 +02:00
Frederic Lecaille	47bb15ca84	MINOR: quic: get rid of ->target quic_conn struct member The ->li (struct listener ) member of quic_conn struct was replaced by a ->target (struct obj_type ) member by this commit: MINOR: quic-be: get rid of ->li quic_conn member to abstract the connection type (front or back) when implementing QUIC for the backends. In these cases, ->target was a pointer to the ojb_type of a server struct. This could not work with the dynamic servers contrary to the listeners which are not dynamic. This patch almost reverts the one mentioned above. ->target pointer to obj_type member is replaced by ->li pointer to listener struct member. As the listener are not dynamic, this is easy to do this. All one has to do is to replace the objt_listener(qc->target) statement by qc->li where applicable. For the backend connection, when needed, this is always qc->conn->target which is used only when qc->conn is initialized. The only "problematic" case is for quic_dgram_parse() which takes a pointer to an obj_type as third argument. But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function it is used to access the proxy counters of the connection thanks to qc_counters(). So, this obj_type argument may be null for now on with this patch. This is the reason why qc_counters() is modified to take this into consideration.	2025-09-11 09:51:28 +02:00
Christopher Faulet	5354c24c76	BUG/MAJOR: stream: Force channel analysis on successful synchronous send This patchs reverts commit a498e527b ("BUG/MAJOR: stream: Remove READ/WRITE events on channels after analysers eval") because of a regression. It was an attempt to properly detect synchronous sends, even when the stream was woken up on a write event. However, the fix was wrong because it could mask shutdowns performed during process_stream() and block the stream. Indeed, when a shutdown is performed, because an error occurred for instance, a write event is reported. The commit above could mask this event while the shutdown prevent any synchronous sends. In such case, the stream could remain blocked infinitly because an I/O event was missed. So to properly fix the original issue (#3070), the write event must not be masked before a synchronous send. Instead, we now force the channel analysis by setting explicitly CF_WAKE_ONCE flags on the corresponding channel if a write event is reported after the synchronous send. CF_WRITE_EVENT flag is remove explicitly just before, so it is quite easy to detect. This patch must be backport to all stable version in same time of the commit above.	2025-09-11 09:47:47 +02:00
Willy Tarreau	ded2110ec6	MEDIUM: peers: move process_peer_sync() to a single thread The remaining half of the task_queue() and task_wakeup() contention is caused by this function when peers are in use, because just like process_table_expire(), it's created using task_new_anywhere() and is woken up for local updates. Let's turn it to single thread by rotating the assigned threads during initialization so that a table only runs on one thread at a time. Here we go backwards to assign the threads, so that on small setups they don't end up on the same CPUs as the ones used by the stick-tables. This way this will make an even better use of large machines. The performance remains the same as with previous patch, even slightly better (1-3% on avg). At this point there's almost no multi-threaded task activity anymore (only srv_cleanup_idle_server once in a while). This should improve the situation described by Felipe in issues #3084 and #3101. This should be backported to 3.2 after some extended checks.	2025-09-10 19:14:05 +02:00
Willy Tarreau	e05afda249	MEDIUM: stick-table: move process_table_expire() to a single thread A big deal of the task_queue() contention is caused by this function because it's created using task_new_anywhere() and is subject to heavy updates. Let's turn it to single thread by rotating the assigned threads during initialization so that a table only runs on one thread at a time. However there's a trick: the function used to call task_queue() to requeue the task if it had advanced its timer (may only happen when learning an entry from a peer). We can't do that anymore since we can't queue another thread's task. Thus instead of the task needs to be scheduled earlier than previously planned, we simply perform a wakeup. It will likely do nothing and will self-adjust its next wakeup timer. Doing so halves the number of multi-thread task wakeups. In addition the request rate at saturation increased by 12% with 16 peers and 40 tables on a 16 8-thread processes. This should improve the situation described by Felipe in issues #3084 and #3101. This should be backported to 3.2 after some extended checks.	2025-09-10 19:13:33 +02:00
Willy Tarreau	2831cb104f	BUG/MINOR: stick-table: make sure never to miss a process_table_expire update In stktable_requeue_exp(), there's a tiny race at the beginning during which we check the task's expiration date to decide whether or not to wake process_table_expire() up. During this race, the task might just have finished running on its owner thread and we can miss a task_queue() opportunity, which probably explains why during testing it seldom happens that a few entries are left at the end. Let's perform a CAS to confirm the value is still the same before leaving. This way we're certain that our value has been seen at least once. This should be backported to 3.2.	2025-09-10 18:45:01 +02:00
Willy Tarreau	2ce5e0edcc	MEDIUM: resolvers: make the process_resolvers() task single-threaded This task is sometimes caught triggering the watchdog while waiting for the infamous resolvers lock, or the scheduler's wait queue lock in task_queue(). Both are caused by its multi-threaded capability. The task may indeed start on a thread that's different from the one that is currently receiving a response and that holds the resolvers lock, and when being queued back, it requires to lock the wait queue. Both problems disappear when sticking it to a single thread. But for configs running multiple resolvers sections, it would be suboptimal to run them all on the same thread. In order to avoid this, we implement a counter in the resolvers_finalize_config() section that rotates the thread for each resolvers section. This was sufficient to further improve the performance here, making the CPU usage drop to about 7% (from 11 previously or 38 initially) and not showing any resolvers lock contention anymore in perf top output. The change was kept fairly minimal to permit a backport once enough testing is conducted on it. It could address a significant part of the trouble reported by Felipe in GH issue #3101.	2025-09-10 16:51:14 +02:00
Willy Tarreau	d624aceaef	MEDIUM: dns: bind the nameserver sockets to the initiating thread There's still a big architectural limitation in the dns/resolvers code regarding threads: resolvers run as a task that is scheduled to run anywhere, and each NS dgram socket is bound to any thread of the same thread group as the initiating thread. This becomes a big problem when dealing with multiple nameservers because responses arrive on any thread, start by locking the resolvers section, and other threads dealing with responses are just stuck waiting for the lock to disappear. This means that most of the time is exclusively spent causing contention. The process_resolvers() function also also suffers from this contention but apparently less often. It turns out that the nameserver sockets are created during emission of the first packet, triggered from the resolvers task. The present patch exploits this to stick all sockets to the calling thread instead of any thread. This way there is no longer any contention between multiple nameservers of a same resolvers section. Tests with a section having 10 name servers showed that the CPU usage dropped from 38 to about 10%, or almost by a factor of 4. Note that TCP resolvers do not offer this possibility because the tasks that manage the applets are created earlier to run anywhere during config parsing. This might possibly be refined later, e.g. by changing the task's affinity when it first runs. The change was kept fairly minimal to permit a backport once enough testing is conducted on it. It could address a significant part of the trouble reported by Felipe in GH issue #3101.	2025-09-10 16:48:09 +02:00

1 2 3 4 5 ...

25392 Commits