haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-10-27 06:31:23 +01:00

Author	SHA1	Message	Date
Willy Tarreau	67603162c1	DOC: config: clarify some known limitations of the json_query() converter Oula Kivalo reported that different JSON libraries may process duplicate keys differently and that most JSON libraries usually decode the stream before extracting keys, while the current mjson implementation decodes the contents during extraction instead. Let's document this point so that users are aware of the limitations and do not rely on the current behavior and do not use it for what it's not made for (e.g. content sanitization). This is also the case for jwt_header_query(), jwt_payload_query() and jwt_verify(), which already refer to this converter for specificities.	2025-10-02 08:57:39 +02:00
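To illustrate the duplicate-key point in the commit above, here is a minimal Python sketch; it is not HAProxy or mjson code. Python's standard `json` module decodes the whole document first and keeps the last value of a duplicated key. The hypothetical `first_match()` helper returns the first occurrence instead, as an extractor that stops at its first hit might. The commit message does not say which answer mjson gives, so the helpers only show that two components can disagree on the same input.

```python
import json

# A document with a duplicated key, which JSON parsers handle inconsistently.
DOC = '{"role": "guest", "role": "admin"}'

def last_match(text, key):
    """Decode the whole object first; json.loads keeps the last duplicate."""
    return json.loads(text)[key]

def first_match(text, key):
    """Hypothetical extractor returning the first occurrence of a key.

    Only meant for flat objects: object_pairs_hook=list hands us the
    (name, value) pairs in document order.
    """
    pairs = json.loads(text, object_pairs_hook=list)
    for name, value in pairs:
        if name == key:
            return value
    raise KeyError(key)

def has_duplicate_keys(text):
    """Tell whether a flat JSON object repeats one of its keys."""
    pairs = json.loads(text, object_pairs_hook=list)
    names = [name for name, _ in pairs]
    return len(names) != len(set(names))
```

A check such as `has_duplicate_keys()` would have to run in a component that fully decodes the document, which is why the converter should not be relied on for content sanitization.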
Olivier Houchard	b71bb6c2ae	BUG/MEDIUM: fwlc: Handle memory allocation failures. Properly handle memory allocation failures, by checking the return value for pool_alloc(), and if it fails, make sure that the caller will take it into account. The only use of pool_alloc() in fwlc is to allocate the tree elements in order to properly queue the server into the ebtree, so if that allocation fails, just schedule the requeue tasklet, that will try again, until it hopefully eventually succeeds. This should be backported to 3.2. This should fix github issue #3143.	2025-10-01 18:13:33 +02:00
Olivier Houchard	f4a9c6ffae	MEDIUM: fwlc: Make it so fwlc_srv_reposition works with unqueued srv Modify fwlc_srv_reposition() so that it does not assume that the server was already queued, and so make it so it works even if s->tree_elt is NULL. While the server will usually be queued, there is an unlikely possibility that when the server attempted to get queued when it got up, it failed due to a memory allocation failure, and it just expect the server_requeue tasklet to run to take care of that later. This should be backported to 3.2. This is part of an attempt to fix github issue #3143	2025-10-01 18:13:33 +02:00
Olivier Houchard	822ee90dc2	MEDIUM: servers: Schedule the server requeue target on creation On creation, schedule the server requeue once it's been created. It is possible that when the server went up, it tried to queue itself into the lb specific code, failed to do so, and expect the tasklet to run to take care of that. This should be backported to 3.2. This is part of an attempt to fix github issue #3143.	2025-10-01 18:13:33 +02:00
Willy Tarreau	7ea80cc5b6	MEDIUM: ssl: don't always process pending handshakes on closed connections If a client aborts a pending SSL connection for whatever reason (timeout etc) and the listen queue is large, it may inflict a severe load to a frontend which will spend the CPU creating new sessions then killing the connection. This is similar to HTTP requests aborted just after being sent, except that asymmetric crypto is way more expensive. Unfortunately "option abortonclose" has no effect on this, because it only applies at a higher level. This patch ensures that handshakes being received on a frontend having "option abortonclose" set will be checked for a pending close, and if this is the case, then the connection will be aborted before the heavy calculations. The principle is to use recv(MSG_PEEK) to detect the end, and to destroy the pending handshake data before returning to the SSL library so that it cannot start computing, notices the error and stops. We don't do it without abortonclose though, because this can be used for health checks from other haproxy nodes or even other components which just want to see a handshake succeed. This is in relation with GH issue #3124.	2025-10-01 10:23:04 +02:00
Willy Tarreau	1afaa7b59d	MINOR: rawsock: introduce CO_RFL_TRY_HARDER to detect closures on complete reads Normally, when reading a full buffer, or exactly the requested size, it is not really possible to know if the peer had closed immediately after, and usually we don't care. There's a problematic case, though, which is with SSL: the SSL layer reads in small chunks of a few bytes, and can consume a client_hello this way, then start computation without knowing yet that the client has aborted. In order to permit knowing more, we now introduce a new read flag, CO_RFL_TRY_HARDER, which says that if we've read up to the permitted limit and the flag is set, then we attempt one extra byte using MSG_PEEK to detect whether the connection was closed immediately after that content or not. The first use case will obviously be related to SSL and client_hello, but it might possibly also make sense on HTTP responses to detect a pending FIN at the end of a response (e.g. if a close was already advertised).	2025-10-01 10:23:01 +02:00
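The "one extra byte with MSG_PEEK" idea described in the two commits above can be sketched in Python as follows. This is an illustration on plain sockets, not the rawsock code: the function name `read_exact_then_probe()` is hypothetical, and the real flag (`CO_RFL_TRY_HARDER`) lives in HAProxy's C connection layer.

```python
import socket

def read_exact_then_probe(sock, size):
    """Read up to `size` bytes, then peek one extra byte to spot a pending close.

    Returns (data, peer_closed). The probe only runs when the read filled
    the requested size, which is the case where a close would otherwise
    go unnoticed.
    """
    data = sock.recv(size)
    peer_closed = False
    if len(data) == size:
        sock.setblocking(False)
        try:
            # MSG_PEEK leaves any pending byte in the kernel buffer;
            # an empty result means the peer has already closed.
            peer_closed = sock.recv(1, socket.MSG_PEEK) == b""
        except BlockingIOError:
            peer_closed = False  # nothing pending, connection still open
        finally:
            sock.setblocking(True)
    return data, peer_closed
```

With a result of `(data, True)`, a caller could drop the buffered handshake bytes before handing them to the TLS library, which is the behavior the SSL commit describes for frontends using "option abortonclose".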
Willy Tarreau	dae4cfe8c5	MINOR: ssl: add the ssl_bc_sni sample fetch function to retrieve backend SNI Sometimes in order to debug certain difficult situations it can be useful to know what SNI was configured on a connection going to a server, for example to match it against what the server saw or to detect cases where a server would route on SNI instead of Host. This sample fetch function simply retrieves the SNI configured on the backend connection, if any.	2025-10-01 10:18:53 +02:00
Willy Tarreau	205f1cbf4c	BUG/MEDIUM: wdt: improve stuck task detection accuracy The fact that the watchdog timer measures the execution time from the last return from the poller tends to amplify the impact of multiple bad tasks, and may explain some of the panics reported by Felipe and Ricardo in GH issues #3084, #3092 and #3101. The problem is that we check the time if we see that the scheduler appears not to be moving anymore, but one situation may still arise and catch a bad task: - one slow task takes so long a time that it triggers the watchdog twice, emitting a warning the second time (~200ms). The scheduler is rightfully marked as stuck. - then it completes and the scheduler is no longer stuck. Many other tasks run in turn, they all take quite some time but not enough to trigger a warning. But collectively their cost adds up. - then a task takes more than the warning time (100ms), and causes the total execution time to cross the second. The watchdog is called, sees that we've spent more than 1 second since we left the poller, and marks the thread as stuck. - the task is not finished, the watchdog is called again, sees more than one second with a stuck thread and panics 100ms later. The total time away from the poller is indeed more than one second, which is very bad, but no single task caused this individually, and while the warnings are OK, the watchdog should not panic in this case. This patch revisits the approach to store the moment the scheduler was marked as stuck in the wdt context. The idea is that this date will be used to detect warnings and panics. And by doing so and exploiting the new is_sched_alive(thr), we can greatly simplify the mechanism so that the signal handling thread does the strict minimum (mark the scheduler as possibly stuck and update the stuck_start date), and only bounces to the reporting thread if the scheduler made no progress since last call. This means that without even doing computations in the handling thread, we can continue to avoid all bounces unless a warning is required. Then when the reporting thread is signaled, it will check the dates from the last moment the scheduler was marked, and will decide to warn or panic. The panic decision continues to pass via a TH_FL_STUCK flag to probe the code so that exceptionally slow code (e.g. live cert generation etc) can still find a way to avoid the panic if absolutely certain that things are still moving. This means that now we have the guarantee that panics will only happen if a given task spends more than one full second not moving, and that warnings will be issued for other calls crossing the warn delay boundary. This was tested using artificially slow operations, and all combinations which individually took less than a second only resulted in floods of warnings even if the total reported time in the warning was much higher, while those above one second provoked the panic. One improvement could consist in reporting the time since last stuck in the thread dumps to differentiate the individual task from the whole set. This needs to be backported to 3.2 along with the two previous patches: MINOR: sched: let's permit to share the local ctx between threads MINOR: sched: pass the thread number to is_sched_alive()	2025-10-01 10:18:53 +02:00
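The revised detection logic can be modelled with a small Python simulation. This is a simplified sketch, not the wdt code: the class, the `simulate()` driver and the progress counter are hypothetical stand-ins for the stuck_start date and the `is_sched_alive()` check named in the commit, and the real signal-handling and reporting threads are collapsed into a single `tick()` call. The 100 ms and 1 s thresholds are the ones quoted in the message.

```python
WARN_DELAY_MS = 100    # warning threshold quoted in the commit message
PANIC_DELAY_MS = 1000  # panic threshold quoted in the commit message

class StuckDetector:
    """Toy per-thread model driven by periodic watchdog ticks."""

    def __init__(self, now=0):
        self.stuck_start = now  # when the scheduler was last seen moving
        self.last_progress = 0  # progress counter seen at the previous tick

    def tick(self, now, progress):
        """Return "ok", "warn" or "panic" for a watchdog tick at time `now`."""
        if progress != self.last_progress:
            # The scheduler moved since the last tick: restart the measurement.
            self.last_progress = progress
            self.stuck_start = now
            return "ok"
        stuck_for = now - self.stuck_start
        if stuck_for >= PANIC_DELAY_MS:
            return "panic"
        if stuck_for >= WARN_DELAY_MS:
            return "warn"
        return "ok"

def simulate(task_durations_ms, tick_ms=100):
    """Run tasks back to back and collect the verdict of every tick."""
    detector = StuckDetector()
    ends, elapsed = [], 0
    for duration in task_durations_ms:
        elapsed += duration
        ends.append(elapsed)
    verdicts = []
    for now in range(tick_ms, elapsed + 1, tick_ms):
        completed = sum(1 for end in ends if end <= now)
        verdicts.append(detector.tick(now, completed))
    return verdicts
```

For example, `simulate([300] * 5)` spends 1.5 s away from the poller but only yields warnings, whereas `simulate([1200])` reaches the panic verdict, matching the guarantee stated in the commit that only a single task stuck for a full second panics.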
Willy Tarreau	25f5f357cc	MINOR: sched: pass the thread number to is_sched_alive() Now it will be possible to query any thread's scheduler state, not only the current one. This aims at simplifying the watchdog checks for reported threads. The operation is now a simple atomic xchg.	2025-10-01 10:18:53 +02:00
Willy Tarreau	7c7e17a605	MINOR: sched: let's permit to share the local ctx between threads The watchdog timer has to go through complex operations due to not being able to check if another thread's scheduler is still ticking. This is simply because the scheduler status is marked as thread-local while it could in fact also be an array. Let's do that (and align the array to avoid false sharing) so that it's now possible to check any scheduler's status.	2025-10-01 10:18:53 +02:00
Olivier Houchard	21ae35dd29	BUG/MEDIUM: stick-tables: Make sure not to free a pending entry There is a race condition, an entry can be free'd by stksess_kill() between the time stktable_add_pend_updates() gets the entry from the mt_list, and the time it adds it to the ebtree. To prevent this, use the newly implemented MT_LIST_POP_LOCKED() to keep the stksess locked until it is added to the tree. That way, __stksess_kill() will wait until we're done with it. This should be backported to 3.2.	2025-09-30 16:25:07 +02:00
Olivier Houchard	cf26745857	MINOR: mt_list: Implement MT_LIST_POP_LOCKED() Implement MT_LIST_POP_LOCKED(), that behaves as MT_LIST_POP() and removes the first element from the list, if any, but keeps it locked. This should be backported to 3.2, as it will be used in a bug fix in the stick tables that affects 3.2 too.	2025-09-30 16:25:07 +02:00
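The "pop but keep locked" pattern used by the two commits above can be sketched in Python with one lock per element. This is an analogy only: `PendingList`, `Entry` and `try_kill()` are hypothetical names, mt_list is a lock-free C structure, and the real `__stksess_kill()` waits for the lock where this sketch merely reports that it could not take it.

```python
import threading
from collections import deque

class Entry:
    """Stand-in for a stick-table session with its own lock."""
    def __init__(self, key):
        self.key = key
        self.lock = threading.Lock()
        self.freed = False

class PendingList:
    """Stand-in for the list of entries waiting to be inserted in the tree."""
    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, entry):
        with self._lock:
            self._items.append(entry)

    def pop_locked(self):
        """Detach the first entry, if any, and return it with its lock held."""
        with self._lock:
            if not self._items:
                return None
            entry = self._items[0]
            entry.lock.acquire()   # lock first, so it is never detached unlocked
            self._items.popleft()
            return entry

def try_kill(entry):
    """Free the entry unless another party still holds its lock."""
    if not entry.lock.acquire(blocking=False):
        return False
    try:
        entry.freed = True
        return True
    finally:
        entry.lock.release()
```

A consumer would call `pop_locked()`, insert the entry into its tree, and only then release `entry.lock`; until that point `try_kill()` fails, which closes the window described in the stick-tables fix.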
William Lallemand	6316f958e3	ADMIN: reload: introduce -vv mode The -v verbose mode displays the loading messages returned by the master CLI reload command upon error. The new -vv mode displays the loading messages even upon success, showing the content of `show startup-logs` after the reload attempt.	2025-09-29 19:29:10 +02:00
William Lallemand	5d05f343b9	ADMIN: reload: introduce verbose and silent mode By default haproxy-reload displays the error that are not emitted by haproxy, but only emitted by haproxy-reload. -s silent mode, don't display any error -v verbose mode, display the loading messages returned by the master CLI reload command upon error.	2025-09-29 19:29:10 +02:00
William Lallemand	3ce597bfa2	BUG/MEDIUM: acme: free() of i2d_X509_REQ() with AWS-LC When using AWS-LC, the free() of the data ptr resulting from i2d_X509_REQ() might crash, because it uses the free() of the libc instead of OPENSSL_free(). It does not seem to be a problem on openssl builds. Must be backported in 3.2.	2025-09-29 13:46:51 +02:00
William Lallemand	8635c7d789	ADMIN: reload: add a synchronous reload helper haproxy-reload is a utility script which reload synchronously using the master CLI, instead of asynchronously with kill.	2025-09-28 22:10:40 +02:00
William Lallemand	02f7bff90b	ADMIN: dump-certs: use same error format as haproxy Replace error/notice by [ALERT]/[WARNING]/[NOTICE] like it's done in haproxy. ALERT means a failure and the program will exit 1 just after it WARNING will continue the execution of the program NOTICE will continue the execution as well	2025-09-28 20:21:07 +02:00
William Lallemand	5c9f28641b	ADMIN: dump-certs: fix lack of / in -p Add a trailing / so -p doesn't fail if it wasn't specified.	2025-09-28 18:21:25 +02:00
William Lallemand	172ac6ad03	ADMIN: dump-certs: create files in a tmpdir Files dumped from the socket are put in a temporary directory, this directory is then removed upon exit. Variable were cleaned to be clearer: - crt_filename -> prev_crt - key_filename -> prev_key - ${crt_filename}.${tmp} -> new_crt - ${key_filename}.${tmp} -> new_key	2025-09-28 18:21:25 +02:00
William Lallemand	8781c65d8a	ADMIN: dump-certs: don't update the file if it's up to date Compare the fingerprint of the leaf certificate to the previous file to check if it needs to be updated or not Also skip the check if no file is on the disk.	2025-09-28 18:21:20 +02:00
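The "only rewrite the file when the leaf fingerprint changed" check from the commit above can be sketched as follows. This is a Python illustration under assumptions, not the bash script: the function names are hypothetical, the leaf is assumed to be the first certificate in the PEM file, and SHA-256 over the DER bytes is one common fingerprint definition that may differ from what the script computes.

```python
import base64
import hashlib
import re

PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.S)

def leaf_fingerprint(pem_text):
    """SHA-256 of the DER bytes of the first certificate in a PEM bundle."""
    match = PEM_CERT.search(pem_text)
    if match is None:
        raise ValueError("no certificate found")
    # Strip line breaks inside the base64 body before decoding.
    der = base64.b64decode("".join(match.group(1).split()))
    return hashlib.sha256(der).hexdigest()

def needs_update(prev_pem, new_pem):
    """Decide whether the file on disk must be replaced.

    prev_pem is None when no file exists yet, in which case the
    comparison is skipped and the new content is always written.
    """
    if prev_pem is None:
        return True
    return leaf_fingerprint(prev_pem) != leaf_fingerprint(new_pem)
```

Comparing only the leaf means a change limited to the rest of the chain would not trigger a rewrite in this sketch.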
William Lallemand	3a6ea8b959	ADMIN: haproxy-dump-certs: implement a certificate dumper haproxy-dump-certs is a bash script that connects to your master socket or your stat socket in order to dump certificates from haproxy memory to the corresponding files.	2025-09-28 13:38:48 +02:00
William Lallemand	b70c7f48fa	MINOR: acme: implement "reuse-key" option The new "reuse-key" option in the "acme" section, allows to keep the private key instead of generating a new one at each renewal.	2025-09-27 21:41:39 +02:00
William Lallemand	a9ccf692e7	BUG/MEDIUM: acme: cfg_postsection_acme() don't init correctly acme sections The cfg_postsection_acme() redefines its own cur_acme variable, pointing to the first acme section created. This means that the first section would be initialized multiple times, and the following sections would never be initialized. It could result in crashes at the first use of all sections that are not the first one. Must be backported in 3.2	2025-09-27 19:58:44 +02:00
William Lallemand	406fd0ceb1	BUG/MINOR: acme: don't unlink from acme_ctx_destroy() Unlinking the acme_ctx element from acme_ctx_destroy() requires to have the element unlocked, because MT_LIST_DELETE() locks the element. acme_ctx_destroy() frees the data from acme_ctx with the ctx still linked and unlocked, then lock to unlink. So there's a small risk of accessing acme_ctx from somewhere else. The only way to do that would be to use the `acme challenge_ready` CLI command at the same time. Fix the issue by doing a mt_list_unlock_link() and a mt_list_unlock_self() to unlink the element under the lock, then destroy the element. This must be backported in 3.2.	2025-09-27 18:52:56 +02:00
William Lallemand	6499c0a0d5	CI: github: build halog on the vtest job halog was not built in the vtest job. Add it to vtest.yml to be able to track build issues on push.	2025-09-26 16:29:29 +02:00
William Lallemand	f1f5877ce1	BUILD: halog: misleading indentation in halog.c admin/halog/halog.c: In function 'filter_count_url': admin/halog/halog.c:1685:9: error: this 'if' clause does not guard... [-Werror=misleading-indentation] 1685 \| if (unlikely(!ustat)) \| ^~ admin/halog/halog.c:1687:17: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if' 1687 \| if (unlikely(!ustat)) { \| ^~ This patch fixes the indentation. Must be backported where fbd0fb20a22 ("BUG/MINOR: halog: Add OOM checks for calloc() in filter_count_srv_status() and filter_count_url()") was backported.	2025-09-26 16:01:50 +02:00
Chris Staite	54f53bc875	MINOR: backend: srv_is_up converter There is currently an srv_queue converter which is capable of taking the output of a dynamic name and determining the queue length for a given server. In addition there is a sample fetcher for whether a server is currently up. This simply combines the two such that srv_is_up can be used as a converter too. Future work might extend this to other sample fetchers for servers, but this is probably the most useful for acl routing.	2025-09-26 10:46:48 +02:00
Chris Staite	faba98c85f	MINOR: backend: srv_queue helper In preparation of providing further server converters, split the code for finding the server from the sample out. Additionally, update the documentation for srv_queue converter to note security concerns.	2025-09-26 10:46:48 +02:00
William Lallemand	b3b910cc3f	BUILD: acme: fix false positive null pointer dereference src/acme.c: In function ‘cfg_parse_acme_vars_provider’: src/acme.c:471:9: error: potential null pointer dereference [-Werror=null-dereference] 471 \| free(*dst); \| ^~~~~~~~~~ gcc13 on ubuntu 24.04 detects a false positive when building 3e72a9f ("MINOR: acme: provider-name for dpapi sink"). Indeed dst can't be NULL. Clarify the code so gcc don't complain anymore.	2025-09-26 10:34:35 +02:00
William Lallemand	3e72a9f618	MINOR: acme: provider-name for dpapi sink Like "acme-vars", the "provider-name" in the acme section is used in case of DNS-01 challenge and is sent to the dpapi sink. This is used to pass the name of a DNS provider in order to chose the DNS API to use. This patch implements the cfg_parse_acme_vars_provider() which parses either acme-vars or provider-name options and escape their strings. Example: $ ( echo "@@1 show events dpapi -w -0"; cat - ) \| socat /tmp/master.sock - \| cat -e <0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$ acme-vars "var1=foobar\"toto\",var2=var2"$ provider-name "godaddy"$ {$ "identifier": {$ "type": "dns",$ "value": "example.com"$ },$ "status": "pending",$ "expires": "2025-09-25T14:41:57Z",$ [...]	2025-09-26 10:23:35 +02:00
William Lallemand	c52d69cc78	BUG/MEDIUM: ssl: ca-file directory mode must read every certificates of a file The httpclient is configured with @system-ca by default, which uses the directory returned by X509_get_default_cert_dir(). On debian/ubuntu systems, this directory contains multiple certificate files that are loaded successfully. However it seems that on other systems the content of this directory is the direct result of ca-certificates instead of its source, meaning that you would only have a bundle file with every certificate in it. The loading was not done correctly in case of directory loading, and was only loading the first certificate of each file. This patch fixes the issue by using X509_STORE_load_locations() on each file from the scandir instead of trying to load it manually with BIO. Note that we can't use X509_STORE_load_locations with the `dir` argument, which would be simpler, because it uses X509_LOOKUP_hash_dir() which requires a directory in hash form. That wouldn't be suited for this use case. Must be backported in every stable branches. Fix issue #3137.	2025-09-26 09:36:55 +02:00
William Lallemand	230a072102	CI: github: add curl+ech build into openssl-ech job Build a curl binary with the ECH function linked with our openssl+ech library.	2025-09-25 17:05:46 +02:00
William Lallemand	44b20e0b01	CI: scripts: build curl with ECH support Add a script to build curl with ECH support, to specify the path of the openssl+ECH library, you should set the SSL_LIB variable with the prefix of the library. Example: SSL_LIB=/opt/openssl-ech CURL_DESTDIR=/opt/curl-ech/ ./build-curl.sh	2025-09-25 17:05:46 +02:00
Christopher Faulet	7aa9f5ec98	BUG/MINOR: pattern: Fix pattern lookup for map with opt@ prefix When we look for a map file reference, the file@ prefix is removed because it may be omitted. The same is true with opt@ prefix. However this case was not properly handled in pat_ref_lookup(). Let's do so. This patch must be backported as far as 3.0.	2025-09-25 15:28:22 +02:00
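The prefix handling described in the commit above amounts to normalizing a reference name before looking it up. A minimal Python sketch, with hypothetical names and a plain dictionary standing in for the pattern reference list (the actual `pat_ref_lookup()` is C code working on HAProxy's own structures):

```python
# Prefixes that may be omitted when naming a map file, per the commit message.
OPTIONAL_PREFIXES = ("file@", "opt@")

def normalize_ref(name):
    """Strip an optional file@ or opt@ prefix; leave other names untouched."""
    for prefix in OPTIONAL_PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name

def lookup(refs, name):
    """Find a reference regardless of which optional prefix was used."""
    return refs.get(normalize_ref(name))
```

With this normalization, `opt@/etc/haproxy/a.map`, `file@/etc/haproxy/a.map` and `/etc/haproxy/a.map` all resolve to the same entry, while a name such as `virt@test` keeps its prefix.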
William Lallemand	c325e34e6d	CLEANUP: acme: acme_will_expire() uses acme_schedule_date() Date computation between acme_will_expire() and acme_schedule_date() are the same. Call acme_schedule_date() from acme_will_expire() and put the functions as static. The patch also move the functions in the right order.	2025-09-25 15:14:31 +02:00
William Lallemand	f256b5fdf3	BUG/MINOR: acme: possible overflow in acme_will_expire() acme_will_expire() computes the schedule date using notAfter and notBefore from the certificate. However notBefore could be greater than notAfter and could result in an overflow. This is unlikely to happen and would mean an incorrect certificate. This patch fixes the issue by checking that notAfter > notBefore. It also replace the int type by a time_t to avoid overflow on 64bits architecture which is also unlikely to happen with certificates. `(date.tv_sec + diff > notAfter)` was also replaced by `if (notAfter - diff <= date.tv_sec)` to avoid an overflow. Fix issue #3135. Need to be backported to 3.2.	2025-09-25 15:12:14 +02:00
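The reordering of the comparison in the commit above can be demonstrated with a short Python sketch. Python integers do not overflow, so the hypothetical `int32()` helper emulates a 32-bit signed C integer through `ctypes`; that width is an assumption chosen for the demonstration, and the function names are illustrative, not taken from `src/acme.c`.

```python
import ctypes

def int32(value):
    """Wrap a Python int the way a 32-bit signed C integer would."""
    return ctypes.c_int32(value).value

def valid_period(not_before, not_after):
    """Sanity check added by the fix: notAfter must be later than notBefore."""
    return not_after > not_before

def expires_naive(now, diff, not_after):
    """Old form: `now + diff > notAfter`; the addition may wrap around."""
    return int32(now + diff) > not_after

def expires_safe(now, diff, not_after):
    """New form: `notAfter - diff <= now`; stays in range when diff <= notAfter."""
    return int32(not_after - diff) <= now
```

With `now = 2_000_000_000`, `diff = 200_000_000` and `not_after = 2_100_000_000`, the sum in the old form exceeds the 32-bit range and wraps to a negative value, so the certificate is wrongly reported as not expiring, while the new form gives the expected answer.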
William Lallemand	68770479ea	BUG/MINOR: acme: possible overflow on scheduling computation acme_schedule_date() computes the schedule date using notAfter and notBefore from the certificate. However notBefore could be greater than notAfter and could result in an overflow. This is unlikely to happen and would mean an incorrect certificate. This patch fixes the issue by checking that notAfter > notBefore. It also replace the int type by a time_t to avoid overflow on 64bits architecture which is also unlikely to happen with certificates. Fix issue #3136. Need to be backported to 3.2.	2025-09-25 15:12:03 +02:00
Christopher Faulet	3be8b06a60	BUG/MINOR: pattern: Properly flag virtual maps as using samples When a map file is loaded, internally, the pattern reference is flagged as based on a sample. However it is not performed for virtual maps. This flag is only used during startup to check the map compatibility when it is used at different places. At runtime this does not change anything. But errors can be triggered during configuration parsing. For instance, the following valid config will trigger an error: http-request set-map(virt@test) foo bar if !{ str(foo),map(virt@test) -m found } http-request set-var(txn.foo) str(foo),map(virt@test) The fix is quite obvious. PAT_REF_SMP flag must be set for virtual maps as for any other map. A workaround is to use optional map (opt@...) by checking the map id cannot reference an existing file. This patch must be backported as far as 3.0.	2025-09-25 10:16:53 +02:00
Christopher Faulet	23e5d272af	BUG/MINOR: compression: Test payload size only if content-length is specified When a minimum size is defined to perform the compression, the message payload size is tested. To do so, information from the HTX message is used to determine the message length. However it is performed regardless of whether the payload length is fully known or not. Concretely, the test must only be performed when a content-length value was specified or when the message was fully received (EOM flag set). Otherwise, we are unable to really determine the real payload length. Because of this bug, compression may be skipped for a large chunked message because the first chunks received are too small. But this does not mean the whole message is small. This patch must be backported to 3.2.	2025-09-25 10:16:53 +02:00
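The corrected decision from the commit above can be written as a small predicate. This Python sketch uses hypothetical parameter names and reduces the HTX message to three values; it only mirrors the rule stated in the message, not the compression filter itself.

```python
def too_small_to_compress(length_seen, min_size, has_content_length, eom_received):
    """Return True only when the payload is known to be below the minimum size.

    length_seen: payload bytes accounted for so far (assumed input)
    has_content_length: a content-length value was specified
    eom_received: the message was fully received (EOM flag set)
    """
    size_is_known = has_content_length or eom_received
    if not size_is_known:
        # A partial chunked message: the first chunks say nothing
        # about the final size, so do not rule out compression yet.
        return False
    return length_seen < min_size
```

For instance, a chunked response whose first chunk is 100 bytes is no longer rejected against a 1024-byte minimum, because the total size is still unknown at that point.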
Olivier Houchard	71199e394c	BUG/MEDIUM: stick-tables: Don't let table_process_entry() handle refcnt Instead of having table_process_entry() decrement the session's ref counter, do it outside, from the caller. Some were missed, such as when an action was invalid, which would lead to the ref counter not being decremented, and the session not being destroyable. It makes more sense to do that from the caller, who just obtained the ref counter, anyway. This should be backported up to 2.8.	2025-09-22 23:14:19 +02:00
Ilia Shipitsin	8c8e50e09a	CI: move VTest preparation & friends to dedicated composite action reference: https://docs.github.com/en/actions/tutorials/create-actions/create-a-composite-action preparing coredump limits, installing VTest are now served by dedicated composite action	2025-09-22 19:18:23 +02:00
William Lallemand	fbffd2e25f	BUG/MINOR: acme/cli: wrong description for "acme challenge_ready" The "acme challenge_ready" command mistakenly use the description of the "acme status" command. This patch adds the right description. Must be backported to 3.2.	2025-09-22 19:14:54 +02:00
William Lallemand	34cdc5e191	MINOR: acme: check acme-vars allocation during escaping Handle allocation properly during acme-vars parsing. Check if we have a allocation failure in both the malloc and the realloc and emits an error if that's the case.	2025-09-19 18:11:50 +02:00
William Lallemand	92c31a6fb7	MINOR: acme: acme-vars allow to pass data to the dpapi sink In the case of the dns-01 challenge, the agent that handles the challenge might need some extra information which depends on the DNS provider. This patch introduces the "acme-vars" option in the acme section, which allows to pass these data to the dpapi sink. The double quotes will be escaped when printed in the sink. Example: global setenv VAR1 'foobar"toto"' acme LE directory https://acme-staging-v02.api.letsencrypt.org/directory challenge DNS-01 acme-vars "var1=${VAR1},var2=var2" Would output: $ ( echo "@@1 show events dpapi -w -0"; cat - ) \| socat /tmp/master.sock - \| cat -e <0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$ acme-vars "var1=foobar\"toto\",var2=var2"$ {$ "identifier": {$ "type": "dns",$ "value": "example.com"$ },$ "status": "pending",$ "expires": "2025-09-25T14:41:57Z",$ [...]	2025-09-19 16:40:53 +02:00
Christopher Faulet	331689d216	BUG/MEDIUM: http-client: Fix the test on the response start-line The commit 88aa7a780 ("MINOR: http-client: Trigger an error if first response block isn't a start-line") introduced a bug. From an endpoint, an applet or a mux, the <first> index must never be used. It is reserved to the HTTP analyzers. From an endpoint, this value may be undefined or just point to any block other than the first one. Instead we must always get the head block. In that case, to be sure the first HTX block in a response is a start-line, we must use htx_get_head_type() function instead of htx_get_first_type(). Otherwise, we can trigger an error while the response is in fact properly formatted. It is a 3.3-specific issue. No backport needed.	2025-09-19 14:59:28 +02:00
Aurelien DARRAGON	5c299dee5a	MEDIUM: stats: consider that shared stats pointers may be NULL This patch looks huge, but it has a very simple goal: protect all accesses to shared stats pointers (either reads or writes), because we now consider that these pointers may be NULL. The reason behind this is that despite all precautions taken to ensure the pointers shouldn't be NULL when not expected, there are still corner cases (ie: frontends stats used on a backend with no FE cap and vice versa) where we could try to access a memory area which is not allocated. Willy stumbled on such cases while playing with the rings servers upon connection error, which eventually led to process crashes (since 3.3 when shared stats were implemented) Also, we may decide later that shared stats are optional and should be disabled on the proxy to save memory and CPU, and this patch is a step further towards that goal. So in essence, this patch ensures shared stats pointers are always initialized (including NULL), and adds necessary guards before shared stats pointers are de-referenced. Since we already had some checks for backends and listeners stats, and the pointer address retrieval should stay in cpu cache, let's hope that this patch doesn't impact stats performance much.	2025-09-18 16:49:51 +02:00
Aurelien DARRAGON	40eb1dd135	BUG/MEDIUM: sink: fix unexpected double postinit of sink backend Willy experienced an unexpected behavior with the config below: global stats socket :1514 ring buf1 server srv1 127.0.0.1:1514 Indeed, haproxy would connect to the ring server twice since commit 23e5f18b ("MEDIUM: sink: change the sink mode type to PR_MODE_SYSLOG"), and one of the connections would report errors. The reason behind this is that, despite the above commit saying no change of behavior is expected, with the sink forward_px proxy now being set with PR_MODE_SYSLOG, postcheck_log_backend() was being automatically executed in addition to the manual cfg_post_parse_ring() function for each "ring" section. The consequence is that sink_finalize() was called twice for a given "ring" section, which means the connection init would be triggered twice, which in turn resulted in the behavior described above, plus possible unexpected side-effects. To fix the issue, when we create the forward_px proxy, we now set the PR_CAP_INT capability on it to tell haproxy not to automatically manage the proxy (ie: to skip the automatic log backend postinit), because we are about to manually manage the proxy from the sink API. No backport needed, this bug is specific to 3.3	2025-09-18 16:49:29 +02:00
Willy Tarreau	79ef362d9e	OPTIM: ring: avoid reloading the tail_ofs value before the CAS in ring_write() The load followed by the CAS seem to cause two bus cycles, one to retrieve the cache line in shared state and a second one to get exclusive ownership of it. Tests show that on x86 it's much better to just rely on the previous value and preset it to zero before entering the loop. We just mask the ring lock in case of failure so as to challenge it on next iteration and that's done. This little change brings 2.3% extra performance (11.34M msg/s) on a 64-core AMD.	2025-09-18 15:27:32 +02:00
Willy Tarreau	a727c6eaa5	OPTIM: ring: check the queue's owner using a CAS on x86 In the loop where the queue's leader tries to get the tail lock, we also need to check if another thread took ownership of the queue the current thread is currently working for. This is currently done using an atomic load. Tests show that on x86, using a CAS for this is much more efficient because it allows to keep the cache line in exclusive state for a few more cycles that permit the queue release call after the loop to be done without having to wait again. The measured gain is +5% for 128 threads on a 64-core AMD system (11.08M msg/s vs 10.56M). However, ARM loses about 1% on this, and we cannot afford that on machines without a fast CAS anyway, so the load is performed using a CAS only on x86_64. It might not be as efficient on low-end models but we don't care since they are not the ones dealing with high contention.	2025-09-18 15:08:12 +02:00
Willy Tarreau	d25099b359	OPTIM: ring: always relax in the ring lock and leader wait loop Tests have shown that AMD systems really need to use a cpu_relax() in these two loops. The performance improves from 10.03 to 10.56M messages per second (+5%) on a 128-thread system, without affecting intel nor ARM, so let's do this.	2025-09-18 15:07:56 +02:00

25591 Commits