Aurelien DARRAGON
6a92b14cc1 MEDIUM: log/proxy: store log-steps selection using a bitmask, not an eb tree
An eb tree was used to anticipate an infinite number of custom log steps
configured at the proxy level. It turns out that it makes no sense to
configure that many logging steps for a proxy, and the cost of the eb tree
is non-negligible in terms of memory footprint, especially when used in a
defaults section.

Instead, let's use a simple bitmask, which allows up to 64 logging steps
configured at the proxy level. If we lack space some day (and need more
than 64 logging steps to be configured), we could simply modify
"struct log_steps" to spread the bitmask over multiple 64-bit integers,
modulo some adjustments where the mask is set and checked.
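
For illustration, here is a minimal standalone sketch of such a bitmask
(the names are hypothetical, not the actual haproxy code):

    #include <stdint.h>

    struct log_steps {
        uint64_t steps; /* bit N set = logging step N enabled */
    };

    static inline void log_step_set(struct log_steps *ls, unsigned int id)
    {
        ls->steps |= (uint64_t)1 << id; /* assumes id < 64 */
    }

    static inline int log_step_isset(const struct log_steps *ls, unsigned int id)
    {
        return !!(ls->steps & ((uint64_t)1 << id));
    }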
2025-09-15 10:29:02 +02:00
Christopher Faulet
b582fd41c2 Revert "BUG/MINOR: ocsp: Crash when updating CA during ocsp updates"
This reverts commit 167ea8fc7b0cf9d1bf71ec03d7eac3141fbe0080.

The patch was backported by mistake.
2025-09-15 10:16:20 +02:00
Remi Tricot-Le Breton
167ea8fc7b BUG/MINOR: ocsp: Crash when updating CA during ocsp updates
If an ocsp response is set to be updated automatically and some
certificate or CA updates are performed on the CLI, and if the CLI update
happens while the OCSP response is being updated and thus detached
from the update tree, it might be wrongly inserted into the update tree
in 'ssl_sock_load_ocsp', and then reinserted when the update finishes.

The update tree then gets corrupted and we could end up crashing when
accessing other nodes in the ocsp response update tree.

This patch must be backported up to 2.8.
This patch fixes GitHub #3100.
2025-09-15 08:20:16 +02:00
Willy Tarreau
8fb5ae5cc6 MINOR: activity/memory: count allocations performed under a lock
By checking the current thread's locking status, it becomes possible
to know during a memory allocation whether it's performed under a lock
or not. Both pools and memprofile functions were instrumented to check
for this and to increment the memprofile bin's locked_calls counter.

This one, when not zero, is reported on "show profiling memory" with a
percentage of all allocations that such locked allocations represent.
This way it becomes possible to try to target certain code paths that
are particularly expensive. Example:

  $ socat - /tmp/sock1 <<< "show profiling memory"|grep lock
     20297301           0     2598054528              0|   0x62a820fa3991 sockaddr_alloc+0x61/0xa3 p_alloc(128) [pool=sockaddr] [locked=54962 (0.2 %)]
            0    20297301              0     2598054528|   0x62a820fa3a24 sockaddr_free+0x44/0x59 p_free(-128) [pool=sockaddr] [locked=34300 (0.1 %)]
      9908432           0     1268279296              0|   0x62a820eb8524 main+0x81974 p_alloc(128) [pool=task] [locked=9908432 (100.0 %)]
      9908432           0      554872192              0|   0x62a820eb85a6 main+0x819f6 p_alloc(56) [pool=tasklet] [locked=9908432 (100.0 %)]
       263001           0       63120240              0|   0x62a820fa3c97 conn_new+0x37/0x1b2 p_alloc(240) [pool=connection] [locked=20662 (7.8 %)]
        71643           0       47307584              0|   0x62a82105204d pool_get_from_os_noinc+0x12d/0x161 posix_memalign(660) [locked=5393 (7.5 %)]
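
As a rough standalone sketch of the mechanism (illustrative names, not the
actual haproxy code), the accounting boils down to checking a thread-local
lock depth at allocation time:

    #include <stdint.h>

    struct memprof_bin {
        uint64_t alloc_calls;
        uint64_t locked_calls; /* subset of alloc_calls done under a lock */
    };

    /* thread-local lock depth, maintained by the locking macros */
    static __thread unsigned int lock_level;

    void memprof_account_alloc(struct memprof_bin *bin)
    {
        __atomic_fetch_add(&bin->alloc_calls, 1, __ATOMIC_RELAXED);
        if (lock_level)
            __atomic_fetch_add(&bin->locked_calls, 1, __ATOMIC_RELAXED);
    }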
2025-09-11 16:32:34 +02:00
Willy Tarreau
9d8c2a888b MINOR: activity: collect CPU time spent on memory allocations for each task
When task profiling is enabled, the pool alloc/free code will measure the
time it takes to perform memory allocation after a cache miss or memory
freeing to the shared cache or OS. The time taken with the thread-local
cache is never measured as measuring that time is very expensive compared
to the pool access time. Here doing so costs around 2% performance at 2M
req/s, only when task profiling is enabled, so this remains reasonable.
The scheduler takes care of collecting that time and updating the
sched_activity entry corresponding to the current task when task profiling
is enabled.

The goal clearly is to track places that waste CPU time by allocating
and releasing too often, or by causing large evictions. It shows up like
this in "show profiling tasks aggr":

  Tasks activity over 11.428 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lkd_avg   mem_avg   lat_avg
    process_stream             44183891   16.47m    22.36us   491.0ns   1.154us   1.000ns   101.1us
    h1_io_cb                   57386064   4.011m    4.193us   20.00ns   16.00ns      -      29.47us
    sc_conn_io_cb              42088024   49.04s    1.165us      -         -         -      54.67us
    h1_timeout_task              438171   196.5ms   448.0ns      -         -         -      100.1us
    srv_cleanup_toremove_conns       65   1.468ms   22.58us   184.0ns   87.00ns      -      101.3us
    task_process_applet               3   508.0us   169.3us      -      107.0us   1.847us   29.67us
    srv_cleanup_idle_conns            6   225.3us   37.55us   15.74us   36.84us      -      49.47us
    accept_queue_process              2   45.62us   22.81us      -         -      4.949us   54.33us
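
A standalone sketch of the timing idea (hypothetical names; only the slow
path is timed, as described above):

    #include <stdint.h>
    #include <stdlib.h>
    #include <time.h>

    static __thread uint64_t mem_wait_ns; /* collected by the scheduler */

    static uint64_t now_mono_ns(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }

    /* slow path only: thread-local cache hits are never timed */
    void *pool_alloc_slow(size_t size)
    {
        uint64_t t0 = now_mono_ns();
        void *ptr = malloc(size); /* stand-in for the shared cache/OS refill */

        mem_wait_ns += now_mono_ns() - t0;
        return ptr;
    }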
2025-09-11 16:32:34 +02:00
Willy Tarreau
195794eb59 MINOR: activity: add a new mem_avg column to show profiling stats
This new column will be used for reporting the average time spent
allocating or freeing memory in a task when task profiling is enabled.
For now it is not updated.
2025-09-11 16:32:34 +02:00
Willy Tarreau
98cc815e3e MINOR: activity: collect time spent with a lock held for each task
When DEBUG_THREAD > 0 and task profiling is enabled, we'll now measure
the time spent with at least one lock held for each task. The time is
collected by the locking operations, when a lock is taken and raises the
level to one, or released and resets it to zero. An accumulator is updated
in the thread_ctx struct; it is collected by the scheduler when the task
returns, and added to the sched_activity entry of the related task.
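
A standalone sketch of that accumulation (illustrative names, reusing the
now_mono_ns() helper from the previous sketch):

    static __thread unsigned int lock_level;   /* current lock depth  */
    static __thread uint64_t lock_start_ns;    /* set on 0 -> 1       */
    static __thread uint64_t lkd_ns;           /* scheduler collects  */

    void lock_taken(void)
    {
        if (lock_level++ == 0)
            lock_start_ns = now_mono_ns(); /* first lock: start timing */
    }

    void lock_released(void)
    {
        if (--lock_level == 0)
            lkd_ns += now_mono_ns() - lock_start_ns; /* back to unlocked */
    }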

This allows one to observe figures like these:

  Tasks activity over 259.516 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lkd_avg   lat_avg
    h1_io_cb                   15466589   2.574m    9.984us      -         -      33.45us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup
    sc_conn_io_cb               8047994   8.325s    1.034us      -         -      870.1us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup
    process_stream              7734689   4.356m    33.79us   1.990us   1.641us   1.554ms <- sc_notify@src/stconn.c:1206 task_wakeup
    process_stream              7734292   46.74m    362.6us   278.3us   132.2us   972.0us <- stream_new@src/stream.c:585 task_wakeup
    sc_conn_io_cb               7733158   46.88s    6.061us      -         -      68.78us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup
    task_process_applet         6603593   4.484m    40.74us   16.69us   34.00us   96.47us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup
    task_process_applet         4761796   3.420m    43.09us   18.79us   39.28us   138.2us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup
    process_table_expire        4710662   4.880m    62.16us   9.648us   53.95us   158.6us <- run_tasks_from_lists@src/task.c:671 task_queue
    stktable_add_pend_updates   4171868   6.786s    1.626us      -      1.487us   47.94us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup
    h1_io_cb                    2871683   1.198s    417.0ns   70.00ns   69.00ns   1.005ms <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup
    process_peer_sync           2304957   5.368s    2.328us      -      1.156us   68.54us <- stktable_add_pend_updates@src/stick_table.c:873 task_wakeup
    process_peer_sync           1388141   3.174s    2.286us      -      1.130us   52.31us <- run_tasks_from_lists@src/task.c:671 task_queue
    stktable_add_pend_updates    463488   3.530s    7.615us   2.000ns   7.134us   771.2us <- stktable_touch_with_exp@src/stick_table.c:654 tasklet_wakeup

Here we see that almost the entirety of stktable_add_pend_updates() is
spent under a lock, that 1/3 of the execution time of process_stream()
was performed under a lock and that 2/3 of it was spent waiting for a
lock (this is related to the 10 track-sc present in this config), and
that the locking time in process_peer_sync() has now been significantly
reduced. This is more visible with "show profiling tasks aggr":

  Tasks activity over 475.354 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lkd_avg   lat_avg
    h1_io_cb                   25742539   3.699m    8.622us   11.00ns   10.00ns   188.0us
    sc_conn_io_cb              22565666   1.475m    3.920us      -         -      473.9us
    process_stream             21665212   1.195h    198.6us   140.6us   67.08us   1.266ms
    task_process_applet        16352495   11.31m    41.51us   17.98us   36.55us   112.3us
    process_peer_sync           7831923   17.15s    2.189us      -      1.107us   41.27us
    process_table_expire        6878569   6.866m    59.89us   9.359us   51.91us   151.8us
    stktable_add_pend_updates   6602502   14.77s    2.236us      -      2.060us   119.8us
    h1_timeout_task                 801   703.4us   878.0ns      -         -      185.7us
    srv_cleanup_toremove_conns      347   12.43ms   35.82us   240.0ns   70.00ns   1.924ms
    accept_queue_process            142   1.384ms   9.743us      -         -      340.6us
    srv_cleanup_idle_conns           74   475.0us   6.418us   896.0ns   5.667us   114.6us
2025-09-11 16:32:34 +02:00
Willy Tarreau
95433f224e MINOR: activity: add a new lkd_avg column to show profiling stats
This new column will be used for reporting the average time spent
in a task with at least one lock held. It will only have a non-zero
value when DEBUG_THREAD > 0. For now it is not updated.
2025-09-11 16:32:34 +02:00
Willy Tarreau
4b23b2ed32 MINOR: thread: add a lock level information in the thread_ctx
The new lock_level field indicates the number of cumulated locks that
are held by the current thread. It's fed as soon as DEBUG_THREAD is at
least 1. In addition, thread_isolate() adds 128, so that it's even
possible to check for combinations of both. The value is also reported
in thread dumps (warnings and panics).
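
A standalone sketch of how such a combined value can be decoded
(illustrative code, assuming the 128 offset described above):

    #include <stdio.h>

    void describe_lock_level(unsigned int lvl)
    {
        unsigned int isolated = lvl >= 128; /* thread_isolate() adds 128 */
        unsigned int locks    = isolated ? lvl - 128 : lvl;

        printf("isolated=%u, locks held=%u\n", isolated, locks);
    }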
2025-09-11 16:32:34 +02:00
Willy Tarreau
503084643f MINOR: activity: collect time spent waiting on a lock for each task
When DEBUG_THREAD > 0, and if task profiling is enabled, then each
locking attempt will measure the time it takes to obtain the lock, then
add that time to a thread_ctx accumulator that the scheduler will then
retrieve to update the current task's sched_activity entry. The value
will then appear avearaged over the number of calls in the lkw_avg column
of "show profiling tasks", such as below:

  Tasks activity over 48.298 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lat_avg
    h1_io_cb                    3200170   26.81s    8.377us      -      32.73us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup
    sc_conn_io_cb               1657841   1.645s    992.0ns      -      853.0us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup
    process_stream              1600450   49.16s    30.71us   1.936us   1.392ms <- sc_notify@src/stconn.c:1206 task_wakeup
    process_stream              1600321   7.770m    291.3us   209.1us   901.6us <- stream_new@src/stream.c:585 task_wakeup
    sc_conn_io_cb               1599928   7.975s    4.984us      -      65.77us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup
    task_process_applet          997609   46.37s    46.48us   16.80us   113.0us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup
    process_table_expire         922074   48.79s    52.92us   7.275us   181.1us <- run_tasks_from_lists@src/task.c:670 task_queue
    stktable_add_pend_updates    705423   1.511s    2.142us      -      56.81us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup
    task_process_applet          683511   34.75s    50.84us   18.37us   153.3us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup
    h1_io_cb                     535395   198.1ms   370.0ns   72.00ns   930.4us <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup

It now makes it pretty obvious which tasks (hence call chains) spend their
time waiting on a lock and for what share of their execution time.
2025-09-11 16:32:34 +02:00
Willy Tarreau
1956c544b5 MINOR: activity: add a new lkw_avg column to show profiling stats
This new column will be used for reporting the average time spent waiting
for a lock. It will only have a non-zero value when DEBUG_THREAD > 0. For
now it is not updated.
2025-09-11 16:32:34 +02:00
Christopher Faulet
3023e98199 BUG/MINOR: resolvers: Restore round-robin selection on records in DNS answers
Since the commit dcb696cd3 ("MEDIUM: resolvers: hash the records before
inserting them into the tree"), when several records are found in a DNS
answer, the round robin selection over these records is no longer performed.

Indeed, a list of records was used before. To ensure each record was
selected one after the other, at each selection, the first record of the
list was moved to the end. When this list was replaced by a tree, the same
mechanism was preserved. However, the record is indexed using its key, a
hash of the record. So its position never changes. When it is removed and
reinserted in the tree, its position remains the same. When we walk through
the tree, starting from the root, the records are always evaluated in the
same order. So, even if there are several records in a DNS answer, the same
IP address is always selected.

It is quite easy to trigger the issue with a do-resolv action.

To fix the issue, the node from which to perform the next selection is now
saved. So instead of restarting from the root each time, we can restart
from the next node of the previous call.
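
A sketch of the approach using haproxy's eb64 API (the surrounding
structure and field names are hypothetical):

    #include <import/eb64tree.h>

    struct eb64_node *pick_next_record(struct eb_root *records,
                                       struct eb64_node **saved)
    {
        struct eb64_node *node = *saved ? eb64_next(*saved) : NULL;

        if (!node)                       /* first call or end of tree */
            node = eb64_first(records);  /* wrap around to the start  */
        *saved = node;
        return node;
    }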

Thanks to Damien Claisse for the issue analysis and for the reproducer.

This patch should fix the issue #3116. It must be backported as far as 2.6.
2025-09-11 15:46:45 +02:00
William Lallemand
e52e6f66ac BUG/MEDIUM: jws: return size_t in JWS functions
JWS functions are supposed to return 0 upon error or when nothing was
produced. This was done in order to easily store the return value in
trash->data without having to check it.

However functions like a2base64url() or snprintf() could return a
negative value, which would be cast to an unsigned int if this happens.

This patch adds checks in the JWS functions to ensure that no negative
value can be returned, and changes the prototypes from int to size_t.
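
The defensive pattern looks roughly like this (standalone sketch; the
function name is hypothetical):

    #include <stdio.h>
    #include <stddef.h>

    size_t jws_write_hdr(char *dst, size_t len, const char *alg)
    {
        int ret = snprintf(dst, len, "{\"alg\":\"%s\"}", alg);

        if (ret < 0 || (size_t)ret >= len)
            return 0; /* error or truncation: "nothing was produced" */
        return (size_t)ret;
    }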

This is also related to issue #3114.

Must be backported to 3.2.
2025-09-11 14:31:32 +02:00
Amaury Denoyelle
d293cc62dc MINOR: quic: display build warning for compat layer on recent OpenSSL
Build option USE_QUIC_OPENSSL_COMPAT=1 must be set to activate QUIC
support for OpenSSL prior to version 3.5.2. This compiles an internal
compatibility layer, which must then be activated at runtime with the
global option limited-quic.

Starting from OpenSSL version 3.5.2, a proper QUIC TLS API is exposed.
Thus, the compatibility layer is unneeded. However it can still be
compiled against newer OpenSSL releases and activated at runtime, mostly
for testing purposes.

As this compatibility layer has some limitations (no support for QUIC
0-RTT), it's important that users notice this situation and disable it
if possible. Thus, this patch adds a notice warning when
USE_QUIC_OPENSSL_COMPAT=1 is set while building against OpenSSL 3.5.2 and
above. This should be sufficient for users and packagers to understand
that this option is not necessary anymore.

Note that USE_QUIC_OPENSSL_COMPAT=1 is incompatible with other TLS
libraries which expose a QUIC API based on the original BoringSSL patch
set. A build error will prevent the compatibility layer from being built,
so the limited-quic option is silently ignored there.
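
Such a build-time notice could look like this (a sketch, not the actual
patch; the version threshold encoding is an assumption based on the usual
OPENSSL_VERSION_NUMBER layout):

    #include <openssl/opensslv.h>

    #if defined(USE_QUIC_OPENSSL_COMPAT) && \
        OPENSSL_VERSION_NUMBER >= 0x30500020L /* approx. 3.5.2 */
    #warning "USE_QUIC_OPENSSL_COMPAT is not needed with OpenSSL >= 3.5.2 (native QUIC TLS API)"
    #endif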
2025-09-11 10:11:12 +02:00
Frederic Lecaille
5027ba36a9 MINOR: quic-be: make SSL/QUIC objects use their own indexes (ssl_qc_app_data_index)
This index is used to retrieve the quic_conn object from its SSL object, the same
way the connection is retrieved from its SSL object for SSL/TCP connections.

This patch implements two helper functions to avoid ugly code with blocks
such as:

   #ifdef USE_QUIC
   else if (qc) { .. }
   #endif

Implement ssl_sock_get_listener() to return the listener from an SSL object.
Implement ssl_sock_get_conn() to return the connection from an SSL object
and optionally a pointer to the ssl_sock_ctx struct attached to the connections
or the quic_conns.

Use these functions where applicable:
   - ssl_tlsext_ticket_key_cb() calls ssl_sock_get_listener()
   - ssl_sock_infocbk() calls ssl_sock_get_conn()
   - ssl_sock_msgcbk() calls ssl_sock_get_ssl_conn()
   - ssl_sess_new_srv_cb() calls ssl_sock_get_conn()
   - ssl_sock_srv_verifycbk() calls ssl_sock_get_conn()

Also modify qc_ssl_sess_init() to initialize the ssl_qc_app_data_index index for
the QUIC backends.
2025-09-11 09:51:28 +02:00
Frederic Lecaille
47bb15ca84 MINOR: quic: get rid of ->target quic_conn struct member
The ->li (struct listener *) member of quic_conn struct was replaced by a
->target (struct obj_type *) member by this commit:

    MINOR: quic-be: get rid of ->li quic_conn member

to abstract the connection type (front or back) when implementing QUIC for
the backends. In these cases, ->target was a pointer to the obj_type of a
server struct. This could not work with dynamic servers, contrary to
listeners, which are not dynamic.

This patch almost reverts the one mentioned above. The ->target pointer to
an obj_type member is replaced by ->li, a pointer to a listener struct. As
listeners are not dynamic, this is easy to do. All one has to do is replace
the objt_listener(qc->target) statements by qc->li where applicable.

For the backend connections, when needed, this is always qc->conn->target,
which is used only when qc->conn is initialized. The only "problematic"
case is quic_dgram_parse() which takes a pointer to an obj_type as third
argument. But this obj_type is only used to call quic_rx_pkt_parse(). Inside
this function it is used to access the proxy counters of the connection
thanks to qc_counters(). So, this obj_type argument may be NULL from now on
with this patch. This is the reason why qc_counters() is modified to take
this into consideration.
2025-09-11 09:51:28 +02:00
Olivier Houchard
ff47ae60f3 MEDIUM: server: Introduce the concept of path parameters
Add a new field in struct server, the path parameters. It will contain
connection information for the server that is not expected to change.
For now, just store the ALPN negotiated with the server. Each time a
handshake is done, we'll update it, even though it is not supposed to
change. This will be useful when trying to send early data: that way
we'll know which mux to use.
Each time the server goes down or is disabled, this information is
erased, as we can't be sure those parameters will be the same once the
server comes back up.
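
A standalone sketch of the idea (hypothetical names): remember the ALPN
negotiated with a server so early data can pick the right mux before any
handshake completes.

    #include <string.h>

    struct srv_path_params {
        char alpn[16]; /* empty string = not learned yet */
    };

    void path_params_update(struct srv_path_params *pp, const char *alpn)
    {
        /* refreshed after each handshake, even if unchanged */
        strncpy(pp->alpn, alpn, sizeof(pp->alpn) - 1);
        pp->alpn[sizeof(pp->alpn) - 1] = '\0';
    }

    void path_params_reset(struct srv_path_params *pp)
    {
        pp->alpn[0] = '\0'; /* server went down: may differ when back up */
    }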
2025-09-09 19:01:24 +02:00
Olivier Houchard
5ab9954faa MINOR: ssl: Add a flag to let it be known an ALPN was negotiated
Add a new flag to the ssl_sock_ctx, to be set as soon as the ALPN has
been negotiated.
This happens before the handshake has been completed. That way, when we
receive early data and the ALPN has already been negotiated, we can
immediately create a mux, as the ALPN will tell us which mux to use.
2025-09-09 19:01:24 +02:00
Willy Tarreau
f87cf8b76e MEDIUM: stick-tables: relax stktable_trash_oldest() to only purge what is needed
stktable_trash_oldest() insists a lot on purging what was requested,
only limited by STKTABLE_MAX_UPDATES_AT_ONCE. It is called in two
situations: to allocate a new stksess, and to purge the entries of a
stopping process. The cost of iterating over all shards is huge, and a
shard lock is taken each time before looking up entries.

Moreover, multiple threads can end up doing the same and looking hard for
many entries to purge when only one is needed. Furthermore, all threads
start from the same shard, hence synchronize their locks. All of this is
very costly to other operations such as accesses from peers.

This commit simplifies the approach by ignoring the budget, starting
from a random shard number, and using a trylock so as to be able to
give up early in case of contention. The approach chosen here consists
in trying hard to flush at least one entry, but once at least one entry
has been evicted or one trylock has failed, any subsequent trylock
failure ends the operation.

The function now returns a success as long as one entry was freed.
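
A simplified standalone sketch of the relaxed loop (illustrative names and
stubs; it stops after the first eviction, and gives up empty-handed on
repeated trylock failures):

    #include <stdlib.h>

    #define NB_SHARDS 32                  /* arbitrary for the sketch */
    struct table;                         /* opaque here */
    extern int  shard_trylock(struct table *t, unsigned int shard);
    extern void shard_unlock(struct table *t, unsigned int shard);
    extern int  shard_evict_oldest(struct table *t, unsigned int shard);

    /* returns non-zero if at least one entry was freed */
    int trash_oldest_relaxed(struct table *t)
    {
        unsigned int start = rand(); /* random starting shard */
        int freed = 0, failed = 0;
        unsigned int i;

        for (i = 0; i < NB_SHARDS; i++) {
            unsigned int shard = (start + i) % NB_SHARDS;

            if (!shard_trylock(t, shard)) {
                if (failed++)
                    break;   /* repeated contention: give up */
                continue;
            }
            freed = shard_evict_oldest(t, shard);
            shard_unlock(t, shard);
            if (freed)
                break;       /* one entry is enough for the caller */
        }
        return freed;
    }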

With this, watchdog warnings are no longer seen during tests, though a
few still remain when stopping (these are not related to this function
but to the contention from process_table_expire()).

With this change, under high contention some entries' purge might be
postponed and the table may occasionally contain slightly more entries
than its size (though this already happens since stksess_new() first
increments ->current before decrementing it).

Measurements were made on a 64-core system with 8 peers
of 16 threads each, at CPU saturation (350k req/s each doing 10
track-sc) for 10M req, with 3 different approaches:

  - this one resulted in 1500 failures to find an entry (0.015%
    size overhead), with the lowest contention and the fairest
    peers distribution.

  - leaving only after a success resulted in 229 failures (0.0029%
    size overhead) but doubled the time spent in the function (on
    the write lock precisely).

  - leaving only when both a success and a failed lock were met
    resulted in 31 failures (0.00031% overhead) but the contention
    was high enough again so that peers were not all up to date.

Considering that a saturated machine exceeding its entries by
0.015% is pretty minimal, the mechanism is kept.

This should be backported to 3.2 after a bit more testing, as it
resolves some watchdog warnings and panics. It requires the preceding
commit "MINOR: stick-table: permit stksess_new() to temporarily
allocate more entries" to over-allocate instead of failing in case
of contention.
2025-09-09 17:56:37 +02:00
Willy Tarreau
c3f94fbd9b DEBUG: stream: count the number of passes in the connect loop
Normally the connect loop cannot loop, but some recent traces can easily
convince one of the opposite. Let's add a counter, including in panic
dumps, in order to avoid the repeated long head-scratching sessions
starting with "and what if...". In addition, if it's found to loop, this
time it will be certain and will indicate what to zoom in on. This should
be backported to 3.2.
2025-09-09 17:56:14 +02:00
Amaury Denoyelle
0b6908385e BUG/MINOR: quic: properly support GSO on backend side
Previously, GSO emission was explicitly disabled on the backend side. This
is no longer true since the following patch, thus GSO can be used, for
example when transferring large POST requests to an HTTP/3 backend.

  commit e064e5d46171d32097a84b8f84ccc510a5c211db
  MINOR: quic: duplicate GSO unsupp status from listener to conn

However, GSO on the backend side may cause a crash when handling EIO. In
this case, GSO must be completely disabled. Previously, this was
performed by flagging the listener instance. On the backend side, this
would cause a crash as the listener is NULL.

This patch fixes it by supporting a GSO disable flag for servers. Thus, in
qc_send_ppkts(), EIO can be converted either to a listener or a server
flag depending on the quic_conn proxy side. On the backend side, the server
instance is retrieved via <qc.conn.target>. This is enough to guarantee
that the server is not deleted.

This does not need to be backported.
2025-09-08 16:18:05 +02:00
Christopher Faulet
e653dc304e MINOR: pools: Don't dump anymore info about pools when purge is forced
Historically, when the purge of pools was forced by sending a SIGQUIT to
haproxy, information about the pools was first dumped. It is now totally
pointless because this info can be retrieved via the CLI. It is even less
relevant now because the purge is typically forced when there are memory
issues, and dumping pools information requires allocating data.

The dump_pools_info() function was simplified because it is now called only
from an applet. There is no reason to still try to dump info on stderr.
2025-09-08 16:04:40 +02:00
Amaury Denoyelle
f645cd3c74 MINOR: quic: restore QUIC_HP_SAMPLE_LEN constant
The below patch fixes padding emission for small packets, which is
required to ensure that header protection removal can be performed by
the recipient.

  commit d7dea408c64c327cab6aebf4ccad93405b675565
  BUG/MINOR: quic: too short PADDING frame for too short packets

In addition to the proper fix, the constant QUIC_HP_SAMPLE_LEN was removed
and replaced by QUIC_TLS_TAG_LEN. However, it still makes sense to have
a dedicated constant which represents the size of the sample used for
header protection. Thus, this patch restores it.

Special instructions for backport: the above patch mentions that no
backport is needed. However, this is incorrect, as the bug was introduced
by another patch scheduled for backport up to 2.6. Thus, it is first
mandatory to schedule d7dea408c64c327cab6aebf4ccad93405b675565 after it.
Then, this patch can also be used for the sake of code clarity.
2025-09-08 14:49:03 +02:00
Frederic Lecaille
6f9fccec1f MINOR: quic: SSL session reuse for QUIC
Mimic the behavior of SSL/TCP connections to implement SSL session reuse.

Extract the code which tries to reuse the SSL session for SSL/TCP
connections into ssl_sock_srv_try_reuse_sess().
Call this function from the QUIC ->init() xprt callback (qc_conn_init())
as is done for SSL/TCP connections.
2025-09-08 11:46:26 +02:00
Frederic Lecaille
d7dea408c6 BUG/MINOR: quic: too short PADDING frame for too short packets
This bug arrived with this commit:

    MINOR: quic: centralize padding for HP sampling on packet building

What was missed is the fact that, at the centralization point for the
PADDING frame to add to too short packets, the <len> payload length
already includes <*pn_len>, the packet number field length.

So when computing the length of the PADDING frame, the packet number field
length must not be added again to the payload length (<len>).
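
As a rough worked example (assuming the usual 16-byte AEAD tag and an HP
sample taken 4 bytes past the start of the packet number field):

    /* pn_len = 1, payload = 1 (PING), AEAD tag = 16
     * bytes available from the pn field start: 1 + 1 + 16 = 18
     * bytes required for HP sampling:          4 + 16     = 20
     * => 2 bytes of PADDING are needed; counting pn_len a second
     *    time overestimated the available bytes and emitted only 1. */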

This bug led to too short PADDING frames in too short packets. This was the
case, most of the time, with Application level packets with a 1-byte packet
number field followed by a 1-byte PING frame. A 1-byte PADDING frame was
added in this case in place of a correct 2-byte PADDING frame. The header
protection of such packets could not be removed by the clients, as for
instance ngtcp2 with traces such as:

    I00001828 0x5a135c81e803f092c74bac64a85513b657 pkt could not decrypt packet number

As the header protection could not be removed, the header keyupdate bit
could also not be read by packet analyzers such as pyshark used during the
keyupdate tests.

No need to backport.
2025-09-05 16:17:11 +02:00
Christopher Faulet
f9a6ae727c OPTIM: tcpcheck: Reorder tcpcheck_connect structure fields to fill holes
Thanks to this patch, two 4-byte holes are now filled in the
tcpcheck_connect structure.
2025-09-05 15:56:42 +02:00
Christopher Faulet
ffc1f096e0 MEDIUM: httpcheck/ssl: Base the SNI value on the HTTP host header by default
Similarly to the automatic SNI selection for regular SSL traffic, the SNI
of health-check HTTPS connections is now automatically set by default by
using the host header value. The "check-sni-auto" and "no-check-sni-auto"
server settings were added to change this behavior.

Only implicit HTTPS health-checks can take advantage of this feature. In
this case, the host header value from the "option httpchk" directive is used
to extract the SNI. The feature is disabled if http-check rules are used; in
that case, the SNI must still be explicitly specified via an "http-check
connect" rule.

This patch should partially fix issue #3081.
2025-09-05 15:56:42 +02:00
Christopher Faulet
668916c1a2 MEDIUM: server/ssl: Base the SNI value on the HTTP host header by default
For HTTPS outgoing connections, the SNI is now automatically set using the
Host header value if no other value is already set (via the "sni" server
keyword). It is now the default behavior. It could be disabled with the
"no-sni-auto" server keyword. And eventually "sni-auto" server keyword may
be used to reset any previous "no-sni-auto" setting. This option can be
inherited from "default-server" settings. Finally, if no connection name is
set via "pool-conn-name" setting, the selected value is used.

The automatic selection of the SNI is enabled by default for all outgoing
connections. But it is concretely used for HTTPS connections only. The
expression used is "req.hdr(host),host_only".

This patch should partially fix issue #3081. It only covers the server
part. Another patch will add the feature for HTTP health-checks.
2025-09-05 15:56:42 +02:00
Christopher Faulet
f8f94ffc9c BUG/MEDIUM: server: Use sni as pool connection name for SSL server only
By default, for a given server, when no pool-conn-name is specified, the
configured sni is used. However, this must only be done when SSL is in use
for the server. Of course, it is uncommon to have a sni expression for a
non-SSL server. But this may happen.

In addition, SSL may be disabled via the CLI. In that case, the
pool-conn-name must be discarded if it was copied from the sni. And, we must
of course take care to set it if SSL is enabled.

Finally, when the attach-srv action is checked, we now check the
pool-conn-name expression.

This patch should be backported as far as 3.0. It relies on "MINOR: server:
Parse sni and pool-conn-name expressions in a dedicated function" which
should be backported too.
2025-09-05 15:56:08 +02:00
Aurelien DARRAGON
1a1362ea0b MINOR: stats-file: reserve some bytes in exported structs
We may need additional struct members in shm_stats_file_object and
shm_stats_file_hdr, yet since these structs are exported they should
not change in size nor in ordering, otherwise it would require a version
change to break compatibility on purpose, since the mapping would differ.

Here we reserve 64 additional bytes in shm_stats_file_object, and
128 bytes in shm_stats_file_hdr for future usage.
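
The pattern looks roughly like this (a sketch, not the actual layout):

    #include <stdint.h>

    struct exported_object_sketch {
        /* ... existing exported members stay in place ... */
        uint8_t reserved[64]; /* consumed field by field by future versions */
    };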
2025-09-03 16:29:48 +02:00
Aurelien DARRAGON
21d97ccfae BUILD: stats-file: fix alignment issues
Document some byte holes and fix some potential alignment issues
between 32- and 64-bit architectures to ensure the shm_stats_file memory
mapping is consistent between operating systems.
2025-09-03 16:28:46 +02:00
Aurelien DARRAGON
46a5948ed2 MINOR: compiler: add ALWAYS_PAD() macro
Same as THREAD_PAD() but doesn't depend on haproxy being compiled with
thread support. It may be useful for memory (or files) that may be
shared between multiple processes.
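
A plausible sketch of such a macro (hypothetical definition, usable once
per struct in this simplified form):

    #define ALWAYS_PAD(x)  char __pad[(x)]

    struct shared_layout_example {
        unsigned long long counter;
        ALWAYS_PAD(56);  /* keep the next member on its own cache line */
        unsigned long long other_counter;
    };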
2025-09-03 16:28:46 +02:00
Aurelien DARRAGON
585ece4c92 MEDIUM: stats-file/counters: store and preload stats counters as shm file objects
This is the last patch of the shm stats file series: in this patch we
implement the logic to store and fetch shm stats objects and associate
them with existing shared counters in the current process.

Shm objects are stored in the same memory location as the shm stats file
header. In fact they are stored right after it. All objects (struct
shm_stats_file_object) have the same size (no matter their type), which
allows for easy object traversal without having to check the object's
type, and could permit the use of external tools to scan the SHM in the
future. Each object stores a guid (of GUID_MAX_LEN+1 size) and a tgid,
which allows matching the corresponding shared counter indexes. Also,
as stated before, each object stores the list of users making use of
it. Objects are never released (the map can only grow), but an unused
object (when no more active users are found in object->users) is
automatically recycled. Also, each object stores its type, which defines
how the object's generic data member should be handled.

Upon startup (or reload), haproxy first tries to scan existing shm to
find objects that could be associated to frontends, backends, listeners
or servers in the current config based on GUID. For associations that
couldn't be made, haproxy will automatically create missing objects in
the SHM during late startup. When haproxy matches with an existing object,
it means the counter from an older process is preserved in the new
process, so multiple processes temporarily share the same counter for as
long as required for older processes to eventually exit.
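
A rough sketch of such a fixed-size object (hypothetical layout; the real
struct lives in the haproxy sources, and GUID_MAX_LEN's value is assumed
here):

    #include <stdint.h>

    #define GUID_MAX_LEN 127 /* assumption for the sketch */

    struct shm_stats_file_object_sketch {
        char     guid[GUID_MAX_LEN + 1]; /* matches config-side GUIDs    */
        uint16_t tgid;                   /* thread-group id              */
        uint8_t  type;                   /* how to interpret 'data'      */
        uint64_t users;                  /* bitmask of process slots     */
        uint64_t data[16];               /* generic type-defined payload */
    };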
2025-09-03 15:59:37 +02:00
Aurelien DARRAGON
ee17d20245 MINOR: stats-file: add process slot management for shm stats file
Now that all processes tied to the same shm stats file share a
common clock source, we introduce the process slot notion in this
patch.

Each living process registers itself in a map at a free index: each slot
stores information about the process' PID and heartbeat. Each process is
responsible for updating its heartbeat; a slot is considered "free" if
the heartbeat was never set or if it has expired (60 seconds of
inactivity). The total number of slots is set to 64 on purpose, because
it allows easily storing the "users" of a given shm object using
a 64-bit bitmask. Given that older processes are supposed to die
eventually when haproxy is reloaded, it should be large enough (64
simultaneous processes) to be safe. If we manage to reach this limit
someday, more slots could be added by splitting the "users" bitmask over
multiple 64-bit variables.
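
A standalone sketch of slot claiming under these rules (illustrative; real
code would claim the slot atomically):

    #include <stdint.h>
    #include <time.h>

    #define NB_SLOTS     64
    #define SLOT_TTL_SEC 60

    struct proc_slot {
        uint32_t pid;
        uint64_t heartbeat; /* unix time of last update, 0 = never set */
    };

    /* returns the claimed slot index, or -1 if all 64 are alive */
    int claim_slot(struct proc_slot *slots, uint32_t pid)
    {
        uint64_t now = (uint64_t)time(NULL);
        int i;

        for (i = 0; i < NB_SLOTS; i++) {
            if (!slots[i].heartbeat ||
                now - slots[i].heartbeat > SLOT_TTL_SEC) {
                slots[i].pid = pid;       /* a CAS in real code */
                slots[i].heartbeat = now;
                return i;
            }
        }
        return -1;
    }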
2025-09-03 15:59:33 +02:00
Aurelien DARRAGON
443e657fd6 MEDIUM: stats-file: processes share the same clock source from shm-stats-file
The use of the "shm-stats-file" directive now implies that all processes
using the same file share a common clock source; this is required
for consistency regarding time-related operations.

The clock source is stored in the shm stats file header.
When the directive is set, all processes share the same clock
(global_now_ms and global_now_ns both point to variables in the map);
this is required for time-based counters such as freq counters to work
consistently. Since all processes manipulate the global clock with atomic
operations exclusively during runtime, and don't systematically rely
on it (thanks to the local now_ms and now_ns), it is pretty much
transparent.
2025-09-03 15:59:27 +02:00
Aurelien DARRAGON
c91d93ed1c MINOR: stats-file: introduce shm-stats-file directive
Add initial support for the "shm-stats-file" directive and the
associated "shm-stats-file-max-objects" directive. For now they are
flagged as experimental directives.

The shared memory file is automatically created by the first process.
The file is created using open(), so it is up to the user to provide a
relevant path (either on a regular filesystem or on a ramfs for
performance reasons). The directive takes only one argument, which is the
path of the shared memory file. It is passed as-is to open().

The maximum number of objects per thread-group (hard limit) that can be
stored in the shm is defined by the "shm-stats-file-max-objects"
directive.

Upon initial creation, the main shm stats file header is provisioned with
the version, which must remain the same for processes to be compatible.
"shm-stats-file-max-objects" defaults to 2k, which means approximately 1MB
max per thread group and should cover most setups. When the limit is
reached (during startup), an error is reported by haproxy which invites
the user to increase "shm-stats-file-max-objects" if desired, but this
means more memory will be allocated. Actual memory usage is low at start,
because only the mmap (mapping) is provisioned with the maximum number of
objects to avoid relocating the memory area during runtime; the actual
shared memory file is dynamically resized when objects are added
(following a half power of 2 curve, see upcoming commits).

For now only the file is created, further logic will be implemented in
upcoming commits.
2025-09-03 15:59:22 +02:00
Aurelien DARRAGON
cb08bcb9d6 MINOR: counters: retrieve detailed errmsg upon failure with counters_{fe,be}_shared_prepare()
counters_{fe,be}_shared_prepare() now take an extra <errmsg> parameter
that contains additional hints about the error in case of failure.

It must be freed accordingly since it is allocated using memprintf().
2025-09-03 15:59:17 +02:00
Amaury Denoyelle
a84b404b34 MINOR: quic/flags: complete missing flags
Add missing quic_conn flags definition for dev utility.
2025-09-02 09:37:43 +02:00
Amaury Denoyelle
1517869145 BUG/BUILD: stats: fix build due to missing stat enum definition
Recently, a new server counter for private idle connections was
added to the statistics output. However, the patch was missing the
ST_I_PX_PRIV_IDLE_CUR enum definition.

No need to backport.
2025-08-29 09:32:10 +02:00
Amaury Denoyelle
dbe31e3f65 MEDIUM: session: account on server idle conns attached to session
This patch adds a new member <curr_sess_idle_conns> to the server. It
serves as a counter of idle connections attached to a session instead of
the regular idle/safe trees. This is used only for private connections.

The objective is to provide a method to detect whether idle
connections are still referencing a server.

This will be particularly useful to ensure that a server is removable.
Currently, this is not yet necessary as idle connections are directly
freed via "del server" handler under thread isolation. However, this
procedure will be replaced by an asynchronous mechanism outside of
thread isolation.

Careful: connections attached to a session but not idle will not be
accounted by this counter. These connections can still be detected via
srv_has_streams() so "del server" will be safe.

This counter is maintained during the whole lifetime of a private
connection. This is mandatory to guarantee "del server" safety and is
consistent with the other idle server counters. What this means is that
the decrement is performed only when the connection transitions from idle
to in use, or just prior to its deletion. The first case is
covered by session_get_conn(). The second case is trickier. It cannot be
done via session_unown_conn() as a private connection may still live a
little longer after its removal from the session, most notably when
scheduled for idle purging.

Thus, conn_free() has been adjusted to handle the final decrement. Now,
conn_backend_deinit() is also called for private connections if the
CO_FL_SESS_IDLE flag is present. This results in a call to
srv_release_conn() which is responsible for decrementing the server idle
counters.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
7a6e3c1a73 MAJOR: server: implement purging of private idle connections
When a server goes into maintenance, or if its IP address is changed,
idle connections attached to it are scheduled for deletion via the purge
mechanism. Connections are moved from the server idle/safe list to the
purge list relative to their thread. Connections are freed on their owning
thread by the scheduled purge task.

This patch extends this procedure to also handle private idle
connections stored in sessions instead of servers. This is possible
thanks to the <sess_conns> server list member. A call to the newly
defined function session_purge_conns() is performed on each list
element. This moves private connections from their session to the purge
list alongside other server idle connections.

This change relies on the series of previous commits which ensure that
access to private idle connections is now thread-safe, with idle_conns
lock usage and careful manipulation of private idle conns in
input/output handlers.

The main benefit of this patch is that now all idle connections
targeting a server set in maintenance are removed. Previously, private
connections would remain until their attached sessions were closed.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
73fd12e928 MEDIUM: conn/muxes/ssl: remove BE priv idle conn from sess on IO
This is a direct follow-up of the previous patch which adjusts idle
private connections access via input/output handlers.

This patch implements the handlers' prologue part. Now, private idle
connections require a treatment similar to non-private idle
connections. Thus, private conns are temporarily removed from their
session under the protection of the idle_conns lock.

As locking is already performed in the input/output handlers,
session_unown_conn() cannot be called. Thus, a new function,
session_detach_idle_conn(), is implemented in the session module, which
performs basically the same operation but relies on external locking.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
8de0807b74 MEDIUM: conn/muxes/ssl: reinsert BE priv conn into sess on IO completion
When dealing with an input/output connection-related handler, special
care must be taken prior to accessing the connection if it is considered
idle, as it could be manipulated by another thread. Thus, the connection is
first removed from its idle tree before processing. The connection is
reinserted on processing completion unless it has been freed meanwhile.

Idle private connections are not concerned by this, because takeover is
not applied to them. However, a future patch will implement purging of
these connections along with regular idle ones. As such, it is necessary
to also protect private connections usage now. This is the subject of
this patch and the next one.

With this patch, the input/output handler epilogues of
muxes/SSL/conn_notify_mux() are adjusted. A new code path is able to
deal with a connection attached to a session instead of a server. In
this case, session_reinsert_idle_conn() is used. Contrary to
session_add_conn(), this new function is reserved for idle connections
usage after a temporary removal.

Contrary to _srv_add_idle() used by regular idle connections,
session_reinsert_idle_conn() may fail as an allocation can be required.
If this happens, the connection is immediately destroyed.

This patch has no effect for now. It must be coupled with the next one
which will temporarily remove private idle connections on input/output
handler prologue.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
f234b40cde MINOR: server: shard by thread sess_conns member
Server member <sess_conns> is an mt_list which contains every backend
connection attached to a session which targets this server. These
connections are not present in the idle server trees.

The main utility of this list is to be able to clean up these connections
prior to removing a server via the "del server" CLI command. However, this
procedure will be adjusted by a future patch. As such, the <sess_conns>
member must be moved into the srv_per_thread struct. Effectively, this
duplicates the list for every thread.

This commit does not introduce functional changes. Its goal is to ensure
that these connections are now ordered by their owning thread, which
will allow implementing a purge, similarly to idle connections attached
to servers.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
d4f7a2dbcc MINOR: session: uninline functions related to BE conns management
Move functions related to session management of backend connections from
the header to the source file. These functions are big enough to remove
the inline attribute.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
d0df41fd22 MINOR: session: document explicitely that session_add_conn() is safe
A set of recent patches have simplified management of backend connection
attached to sessions. The API is now stricter to prevent any misuse.

One of these changes is the addition of a BUG_ON() in session_add_conn(),
which ensures that a connection is not attached to a session if its
<owner> field points to another entry.

On older haproxy releases, this assertion could not be enforced due to
NTLM, as a connection is turned private during its transfer. When
using a true multiplexed protocol on the backend side, the connection
could be assigned in turn to several sessions. However, NTLM is now only
applied to HTTP/1.1 as it does not make sense if the connection is
already shared.

To better clarify this situation, extend the comment on BUG_ON() inside
session_add_conn().
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
a96f1286a7 BUG/MINOR: connection: rearrange union list members
A connection can be stored in several lists, thus there are several
attach points in struct connection. Depending on its proxy side, either
frontend or backend, a single connection will only access some of them
during its lifetime.

As an optimization, these attach points are organized in a union.
However, this repartition did not correctly follow the
frontend/backend side delimitation.

Furthermore, reverse HTTP has recently been introduced. With this
feature, a connection can migrate from frontend to backend side or vice
versa. As such, it becomes even more tedious to ensure that these
members are always accessed in a safe way.

This commit rearranges these fields. First, the union is now clearly split
between frontend-only and backend-only elements. Next, backend elements are
initialized with conn_backend_init(), which is already used during
connection reversal on an edge endpoint. A new function,
conn_frontend_init(), serves to initialize the other members; it is called
both on the connection's first instantiation and on reversal on a dialer
endpoint.

This model is much cleaner and should prevent any access to fields from
the wrong side.

Currently, there is no known case of wrong access in the existing code
base. However, this cleanup is considered an improvement which must be
backported up to 3.0 to remove any possible undefined behavior.
2025-08-28 14:52:29 +02:00
Frederic Lecaille
31c17ad837 MINOR: quic: remove ->offset qf_crypto struct field
This patch follows this previous bug fix:

    BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

where an ebtree node has been added to the qf_crypto struct. ->offset has
the same meaning and type as the ->offset_node.key field, with
->offset_node being an eb64 tree node. This patch simply removes ->offset,
which is no longer useful.

This patch should be easily backported as far as 2.6, like the one
mentioned above, to ease any further backports to come.
2025-08-28 08:19:34 +02:00
William Lallemand
18ebd81962 MINOR: ssl: diagnostic warning when both 'default-crt' and 'strict-sni' are used
It is possible to use both 'strict-sni' and 'default-crt' on the same bind
line, which does not make much sense.

This patch implements a check which will look for default certificates
in the sni_w tree when strict-sni is used (they are referenced by their
empty sni ""). default-crt sets the CKCH_INST_EXPL_DEFAULT flag in
ckch_inst->is_default, so it is possible to differentiate explicit
defaults from implicit ones.

Could be backported as far as 3.0.

This was discussed in ticket #3082.
2025-08-27 16:22:12 +02:00
Frederic Lecaille
d753f24096 BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets
This issue impacts the QUIC listeners. It is the same as the one fixed by this
commit:

	BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

Like chrome, the ngtcp2 client decided to fragment its CRYPTO frames, but
in a much more aggressive way. This could be fixed with a list local to
qc_parse_pkt_frms() to please chrome, thanks to the commit above. But this
is not sufficient for ngtcp2, which often splits its ClientHello message
into more than 10 fragments, with very small ones. This leads the packet
parser to interrupt the CRYPTO frames parsing due to the ncbuf gap size
limit.

To fix this, this patch proceeds approximately the same way, but with an
ebtree to reorder the CRYPTO frames by their offsets. These frames are
directly inserted into a local ebtree. Then this ebtree is reused to
provide the reordered CRYPTO data to the underlying ncbuf (non contiguous
buffer). This way there are far fewer chances for the ncbufs used to store
CRYPTO data to reach an overly fragmented state.
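
A sketch of the mechanism with haproxy's eb64 API (the structure and field
names around the node are hypothetical):

    #include <import/eb64tree.h>

    struct rx_crypto_frag {
        struct eb64_node offset_node; /* key = CRYPTO frame offset */
        /* the fragment's data pointer and length would follow */
    };

    void insert_frag(struct eb_root *tree, struct rx_crypto_frag *frag,
                     unsigned long long offset)
    {
        frag->offset_node.key = offset;
        eb64_insert(tree, &frag->offset_node);
    }

    void replay_in_order(struct eb_root *tree)
    {
        struct eb64_node *node;

        /* walk in ascending offset order; each fragment's data
         * would be fed to the ncbuf from here */
        for (node = eb64_first(tree); node; node = eb64_next(node))
            ;
    }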

Must be backported as far as 2.6.
2025-08-27 16:14:19 +02:00