haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-06 23:27:04 +02:00

Author	SHA1	Message	Date
Willy Tarreau	2c46c2c042	MINOR: resolvers: add command-line argument -4 to force IPv4-only DNS In order to ease troubleshooting and testing, the new "-4" command line argument enforces queries and processing of "A" DNS records only, i.e. those representing IPv4 addresses. This can be useful when a host lacks end-to-end dual-stack connectivity. This overrides the global "dns-accept-family" directive and is equivalent to value "ipv4".	2025-04-24 17:52:28 +02:00
Willy Tarreau	940fa19ad8	MEDIUM: resolvers: add global "dns-accept-family" directive By default, DNS resolvers accept both IPv4 and IPv6 addresses. This can be influenced by the "resolve-prefer" keywords on server lines as well as the family argument to the "do-resolve" action, but that is only a preference, which does not block the other family from being used when it's alone. In some environments where dual-stack is not usable, stumbling on an unreachable IPv6-only DNS record can cause significant trouble as it will replace a previous IPv4 one which would possibly have continued to work till next request. The "dns-accept-family" global option makes it possible to enforce usage of only one (or both) address families. The argument is a comma-delimited list of the following words: - "ipv4": query and accept IPv4 addresses ("A" records) - "ipv6": query and accept IPv6 addresses ("AAAA" records) When a single family is used, no request will be sent to resolvers for the other family, and any response for the other family will be ignored. The default value is "ipv4,ipv6", which effectively enables both families.	2025-04-24 17:52:28 +02:00
Christopher Faulet	29632bcabf	CLEANUP: applet: Remove unused rule pointer in appctx structure Thanks to previous commits, the "rule" field in the appctx structure is no longer used. So we can safely remove it.	2025-04-24 16:22:31 +02:00
Christopher Faulet	b734d7c156	MINOR: cli/applet: Move appctx fields only used by the CLI in a private context There are several fields in the appctx structure only used by the CLI. To make things cleaner, all these fields are now placed in a dedicated context inside the appctx structure. The final goal is to move it in the service context and add an API for cli commands to get a command context inside the cli context.	2025-04-24 15:09:37 +02:00
Christopher Faulet	742dc01537	CLEANUP: applet: Update st0/st1 comment in appctx structure Today, these states are used by almost all applets. So update the comments of these fields.	2025-04-24 15:09:37 +02:00
Christopher Faulet	44ace9a1b7	MINOR: cli: Rename some CLI applet states to reflect recent refactoring CLI_ST_GETREQ state was renamed into CLI_ST_PARSE_CMDLINE and CLI_ST_PARSEREQ into CLI_ST_PROCESS_CMDLINE to reflect the real action performed in these states.	2025-04-24 15:09:37 +02:00
Christopher Faulet	20ec1de214	MAJOR: cli: Refactor parsing and execution of pipelined commands Before this patch, when pipelined commands were received, each command was parsed and then executed before moving to the next command. Pending commands were not copied in the input buffer of the applet. The major issue with this way to handle commands is the impossibility to consume inputs from commands with an I/O handler, like "show events" for instance. It was working thanks to a "bug" if such commands were the last one on the command line. But it was impossible to use them followed by another command. And this prevents us from implementing any streaming support for CLI commands. So we decided to refactor the command line parsing to have something similar to a basic shell. Now an entire line is parsed, including the payload, before starting commands execution. The command line is copied in a dedicated buffer. "appctx->chunk" buffer is used for this purpose. It was an unused field, so it is safe to use it here. Once the command line is copied, the commands found on this line are executed. Because the applet input buffer was flushed, any input can be safely consumed by the CLI applet and is available for the command I/O handler. Thanks to this change, "show event -w" command can be followed by a command. And in theory, it should be possible to implement commands supporting input data streaming. For instance, the Tetris like lua applet can be used on the CLI now. Note that the payload, if any, is part of the command line and must be fully received before starting the commands processing. It means there is still the limitation to a buffer, but not only for the payload but for the whole command line. The payload is still necessarily at the end of the command line and is passed as argument to the last command. Internally, the "appctx->cli_payload" field was introduced to point on the payload in the command line buffer. This patch is quite huge but it cannot easily be split. It should not introduce significant changes.	2025-04-24 15:09:37 +02:00
Willy Tarreau	1af592c511	MINOR: stick-table: use a separate lock label for updates Too many locks were sharing STK_TABLE_LOCK making it hard to analyze. Let's split the already heavily used update lock.	2025-04-24 14:02:22 +02:00
William Lallemand	af73f98a3e	MEDIUM: acme: rename "uri" into "directory" Rename the "uri" option of the acme section into "directory".	2025-04-24 10:52:46 +02:00
William Lallemand	d700a242b4	MINOR: httpclient: add an "https" log-format Add an experimental "https" log-format for the httpclient. It is not used by the httpclient by default, but could be defined in a customized proxy. The string is basically a httpslog, with some of the fields replaced by their backend equivalent or - when not available: "%ci:%cp [%tr] %ft -/- %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r %[bc_err]/%[ssl_bc_err,hex]/-/-/%[ssl_bc_is_resumed] -/-/-"	2025-04-23 15:32:46 +02:00
Christopher Faulet	a56feffc6f	CLEANUP: h1: Remove now useless h1_parse_cont_len_header() function Since the commit "MINOR: hlua/h1: Use http_parse_cont_len_header() to parse content-length value", this function is no longer used. So it can be safely removed.	2025-04-22 16:14:47 +02:00
Christopher Faulet	5200203677	MINOR: proxy: Add options to drop HTTP trailers during message forwarding In RFC9110, it is stated that trailers could be merged with the headers. While it should be performed with special care, it may be a problem for some applications. To avoid any trouble with such applications, two new options were added to drop trailers during the message forwarding. On the backend, "http-drop-request-trailers" option can be enabled to drop trailers from the requests before sending them to the server. And on the frontend, "http-drop-response-trailers" option can be enabled to drop trailers from the responses before sending them to the client. The options can be defined in defaults sections and disabled with "no" keyword. This patch should fix the issue #2930.	2025-04-22 16:14:46 +02:00
Christopher Faulet	044ef9b3d6	CLEANUP: Slightly reorder some proxy option flags to free slots PR_O_TCPCHK_SSL and PR_O_CONTSTATS were shifted to free a slot. The idea is to have 2 contiguous slots to be able to insert two new options.	2025-04-22 16:14:46 +02:00
Amaury Denoyelle	4309a6fbf8	BUG/MINOR: quic: do not crash on CRYPTO ncbuf alloc failure To handle out-of-order received CRYPTO frames, a ncbuf instance is allocated. This is done via the helper quic_get_ncbuf(). Buffer allocation was improperly checked. In case b_alloc() fails, it crashes due to a BUG_ON(). Fix this by removing it. The function now returns NULL on allocation failure, which is already properly handled in its caller qc_handle_crypto_frm(). This should fix the last reported crash from github issue #2935. This must be backported up to 2.6.	2025-04-18 18:11:17 +02:00
Olivier Houchard	3758eab71c	MEDIUM: lb_fwrr: Use one ebtree per thread group. When using the round-robin load balancer, the major source of contention is the lbprm lock, that has to be held every time we pick a server. To mitigate that, make it so there is one tree per thread-group, and one lock per thread-group. That means we now have a lb_fwrr_per_tgrp structure that will contain the two lb_fwrr_groups (active and backup) as well as the lock to protect them in the per-thread lbprm struct, and all fields in the struct server are now moved to the per-thread structure too. Those changes are mostly mechanical, and bring a good performance improvement: on a 64-core AMD CPU, with 64 servers configured, we could process about 620000 requests per second, and we can now process around 1400000 requests per second.	2025-04-17 17:38:23 +02:00
Olivier Houchard	f36f6cfd26	MINOR: proxies: Add a per-thread group lbprm struct. Add a new structure in the per-thread groups proxy structure, that will contain whatever is per-thread group in lbprm. It will be accessed as p->per_tgrp[tgid].lbprm.	2025-04-17 17:38:23 +02:00
Olivier Houchard	7ca1c94ff0	MINOR: lb_fwrr: Move the next weight out of fwrr_group. Move the "next_weight" outside of fwrr_group, and inside struct lb_fwrr directly, one for the active servers, one for the backup servers. We will soon have one fwrr_group per thread group, but next_weight will be global to all of them.	2025-04-17 17:38:23 +02:00
Olivier Houchard	444125a764	MINOR: servers: Provide a pointer to the server in srv_per_tgroup. Add a pointer to the server into the struct srv_per_tgroup, so that if we only have access to that srv_per_tgroup, we can come back to the corresponding server.	2025-04-17 17:38:23 +02:00
Willy Tarreau	36ec70c526	MINOR: sched: add a new function is_sched_alive() to report scheduler's health This verifies that the scheduler is still ticking without having to access the activity[] array nor keeping local copies of the ctxsw counter. It just tests and sets a flag that is reset after each return from a ->process() function.	2025-04-17 16:25:47 +02:00
Willy Tarreau	874ba2afed	CLEANUP: debug: no longer set nor use TH_FL_DUMPING_OTHERS TH_FL_DUMPING_OTHERS was being used to try to perform exclusion between threads running "show threads" and those producing warnings. Now that it is much more cleanly handled, we don't need that type of protection anymore, which was adding to the complexity of the solution. Let's just get rid of it.	2025-04-17 16:25:47 +02:00
Willy Tarreau	c16d5415a8	MINOR: debug: make ha_stuck_warning() only work for the current thread Since we no longer call it with a foreign thread, let's simplify its code and get rid of the special cases that were relying on ha_thread_dump_fill() and synchronization with a remote thread. We're now only dumping the current thread so ha_thread_dump_one() is sufficient.	2025-04-17 16:25:47 +02:00
Willy Tarreau	b24d7f248e	MINOR: pass a valid buffer pointer to ha_thread_dump_one() The goal is to let the caller deal with the pointer so that the function only has to fill that buffer without worrying about locking. This way, synchronous dumps from "show threads" are produced and emitted directly without causing undesired locking of the buffer nor risking causing confusion about thread_dump_buffer containing bits from an interrupted dump in progress. It's only the caller that's responsible for notifying the requester of the end of the dump by setting bit 0 of the pointer if needed (i.e. it's only done in the debug handler).	2025-04-17 16:25:47 +02:00
Willy Tarreau	5ac739cd0c	MINOR: debug: remove unused case of thr!=tid in ha_thread_dump_one() This function was initially designed to dump any thread into the presented buffer, but the way it currently works is that it's always called for the current thread, and uses the distinction between coming from a sighandler or being called directly to detect which thread is the caller. Let's simplify all this by replacing thr with tid everywhere, and using the thread-local pointers where it makes sense (e.g. th_ctx, th_ctx etc). The confusing "from_signal" argument is now replaced with "is_caller" which clearly states whether or not the caller declares being the one asking for the dump (the logic is inverted, but there are only two call places with a constant).	2025-04-17 16:25:47 +02:00
Willy Tarreau	6d8a523d14	MINOR: tinfo: keep a copy of the pointer to the thread dump buffer Instead of using the thread dump buffer for post-mortem analysis, we'll keep a copy of the assigned pointer whenever it's used, even for warnings or "show threads". This will offer more opportunities to figure from a core what happened, and will give us more freedom regarding the value of the thread_dump_buffer itself. For example, even at the end of the dump when the pointer is reset, the last used buffer is now preserved.	2025-04-17 16:25:47 +02:00
Willy Tarreau	337017e2f9	BUG/MINOR: threads: set threads_idle and threads_harmless even with no threads Some signal handlers rely on these to decide about the level of detail to provide in dumps, so let's properly fill the info about entering/leaving idle. Note that for consistency with other tests we're using bitops with t->ltid_bit, while we could simply assign 0/1 to the fields. But it makes the code more readable and the whole difference is only 88 bytes on a 3MB executable. This bug is not important, and while older versions are likely affected as well, it's not worth taking the risk to backport this in case it would wake up an obscure bug.	2025-04-17 16:25:47 +02:00
Amaury Denoyelle	52246249ab	MEDIUM: listener/mux-h2: implement idle-ping on frontend side This commit is the counterpart of the previous one, adapted on the frontend side. "idle-ping" is added as keyword to bind lines, to be able to refresh client timeout of idle frontend connections. H2 MUX behavior remains similar as the previous patch. The only significant change is in h2c_update_timeout(), as idle-ping is now taken into account also for frontend connection. The calculated value is compared with http-request/http-keep-alive timeout value. The shorter delay is then used as expired date. As hr/ka timeout are based on idle_start, this allows to run them in parallel with an idle-ping timer.	2025-04-17 14:49:36 +02:00
Amaury Denoyelle	a78a04cfae	MEDIUM: server/mux-h2: implement idle-ping on backend side This commit implements support for idle-ping on the backend side. First, a new server keyword "idle-ping" is defined in configuration parsing. It is used to set the corresponding new server member. The second part of this commit implements idle-ping support on H2 MUX. A new inlined function conn_idle_ping() is defined to access connection idle-ping value. Two new connection flags are defined H2_CF_IDL_PING and H2_CF_IDL_PING_SENT. The first one is set for idle connections via h2c_update_timeout(). On h2_timeout_task() handler, if first flag is set, instead of releasing the connection as before, the second flag is set and tasklet is scheduled. As both flags are now set, h2_process_mux() will proceed to PING emission. The timer has also been rearmed to the idle-ping value. If a PING ACK is received before next timeout, connection timer is refreshed. Else, the connection is released, as with timer expiration. Also of importance, special care is needed when a backend connection is going to idle. In this case, idle-ping timer must be rearmed. Thus a new invocation of h2c_update_timeout() is performed on h2_detach().	2025-04-17 14:49:36 +02:00
William Lallemand	e778049ffc	MINOR: acme: register the task in the ckch_store This patch registers the task in the ckch_store so we don't run 2 tasks at the same time for a given certificate. Move the task creation under the lock and check if there was already a task under the lock.	2025-04-16 17:12:43 +02:00
William Lallemand	c291a5c73c	BUILD: incompatible pointer type suspected with -DDEBUG_UNIT src/jws.c: In function '__jws_init': src/jws.c:594:38: error: passing argument 2 of 'hap_register_unittest' from incompatible pointer type [-Wincompatible-pointer-types] 594 \| hap_register_unittest("jwk", jwk_debug); \| ^~~~~~~~~ \| \| \| int ()(int, char ) In file included from include/haproxy/api.h:36, from include/import/ebtree.h:251, from include/import/ebmbtree.h:25, from include/haproxy/jwt-t.h:25, from src/jws.c:5: include/haproxy/init.h:37:52: note: expected 'int ()(void)' but argument is of type 'int ()(int, char )' 37 \| void hap_register_unittest(const char name, int (*fct)()); \| ~~~~~~^~~~~~ GCC 15 is warning because the function pointer does have its arguments in the register function. Should fix issue #2929.	2025-04-15 15:49:44 +02:00
Willy Tarreau	b708345c17	DEBUG: counters: add the ability to enable/disable updating the COUNT_IF counters These counters can have a noticeable cost on large machines, though not dramatic. There's no single good choice to keep them enabled or disabled. This commit adds multiple choices: - DEBUG_COUNTERS set to 2 will automatically enable them by default, while 1 will disable them by default - the global "debug.counters on/off" will allow to change the setting at boot, regardless of DEBUG_COUNTERS as long as it was at least 1. - the CLI "debug counters on/off" will also allow to change the value at run time, allowing to observe a phenomenon while it's happening, or to disable counters if it's suspected that their cost is too high Finally, the "debug counters" command will append "(stopped)" at the end of the CNT lines when these counters are stopped. Note that the whole mechanism would easily support being extended to all counter types by specifying the types to apply to, but it doesn't seem useful at all and would require the user to also type "cnt" on debug lines. This may easily be changed in the future if it's found relevant.	2025-04-14 19:02:13 +02:00
Willy Tarreau	a142adaba0	DEBUG: counters: make COUNT_IF() only appear at DEBUG_COUNTERS>=1 COUNT_IF() is convenient but can be heavy since some of them were found to trigger often (roughly 1 counter per request on avg). This might even have an impact on large setups due to the cost of a shared cache line bouncing between multiple cores. For now there's no way to disable it, so let's only enable it when DEBUG_COUNTERS is 1 or above. A future change will make it configurable.	2025-04-14 19:02:13 +02:00
Willy Tarreau	61d633a3ac	DEBUG: rename DEBUG_GLITCHES to DEBUG_COUNTERS and enable it by default Till now the per-line glitches counters were only enabled with the confusingly named DEBUG_GLITCHES (which would not turn glitches off when disabled). Let's instead change it to DEBUG_COUNTERS and make sure it's enabled by default (though it can still be disabled with -DDEBUG_GLITCHES=0 just like for DEBUG_STRICT). It will later be expanded to cover more counters.	2025-04-14 19:02:13 +02:00
William Lallemand	39c05cedff	BUILD: acme: enable the ACME feature when JWS is present The ACME feature depends on the JWS, which currently does not work with every SSL libraries. This patch only enables ACME when JWS is enabled.	2025-04-12 01:39:03 +02:00
William Lallemand	5500bda9eb	MINOR: acme: implement retrieval of the certificate Once the Order status is "valid", the certificate URL is accessible. This patch implements the retrieval of the certificate, which is stored in ctx->store.	2025-04-12 01:39:03 +02:00
William Lallemand	27fff179fe	MINOR: acme: verify the order status once finalized This implements a call to the order status to check if the certificate is ready.	2025-04-12 01:39:03 +02:00
William Lallemand	680222b382	MINOR: acme: finalize by sending the CSR This patch does the finalize step of the ACME task. This encodes the CSR into base64 format and sends it to the finalize URL. https://www.rfc-editor.org/rfc/rfc8555#section-7.4	2025-04-12 01:29:27 +02:00
William Lallemand	de5dc31a0d	MINOR: acme: generate the CSR in a X509_REQ Generate the X509_REQ using the generated private key and the SAN from the configuration. This is only done once before the task is started. It could probably be done at the beginning of the task with the private key generation once we have a scheduler instead of a CLI command.	2025-04-12 01:29:27 +02:00
William Lallemand	00ba62df15	MINOR: acme: implement a check on the challenge status This patch implements a check on the challenge URL: once haproxy has asked for the challenge to be verified, it must verify the status of the challenge resolution and whether there was any error.	2025-04-12 01:29:27 +02:00
William Lallemand	711a13a4b4	MINOR: acme: send the request for challenge ready This patch sends the "{}" message to specify that a challenge is ready. It iterates on every challenge URL in the authorization list from the acme_ctx. This allows the ACME server to proceed to the challenge validation. https://www.rfc-editor.org/rfc/rfc8555#section-7.5.1	2025-04-12 01:29:27 +02:00
William Lallemand	ae0bc88f91	MINOR: acme: get the challenges object from the Auth URL This patch implements the retrieval of the challenges objects on the authorizations URLs. The challenges object contains a token and a challenge URL that needs to be called once the challenge is set up. Each authorization URL contains multiple challenge objects, usually one per challenge type (HTTP-01, DNS-01, ALPN-01...). We only need to keep the one that is relevant to our configuration.	2025-04-12 01:29:27 +02:00
William Lallemand	4842c5ea8c	MINOR: acme: newOrder request retrieve authorizations URLs This patch implements the newOrder action in the ACME task. In order to ask for a new certificate, a list of SAN is sent as a JWS payload. The ACME server replies with a list of Authorization URLs. One Authorization is created per SAN on an Order. The authorization URLs are stored in a linked list of 'struct acme_auth' in acme_ctx, so we can get the challenge URLs from them later. The location header is also stored as it is the URL of the order object. https://datatracker.ietf.org/doc/html/rfc8555#section-7.4	2025-04-12 01:29:27 +02:00
William Lallemand	04d393f661	MINOR: acme: generate new account The new account action in the ACME task uses the same function as the chkaccount, but onlyReturnExisting is not sent in this case!	2025-04-12 01:29:27 +02:00
William Lallemand	7f9bf4d5f7	MINOR: acme: check if the account exist This patch implements the retrieval of the KID (account identifier) using the pkey. A request is sent to the newAccount URL using the onlyReturnExisting option, which allows getting the kid of an existing account. acme_jws_payload() implements a way to generate a JWS payload using the nonce, pkey and provided URI.	2025-04-12 01:29:27 +02:00
William Lallemand	0aa6dedf72	MINOR: acme: handle the nonce ACME requests are supposed to be sent with a Nonce. The first Nonce should be retrieved using the newNonce URI provided by the directory. This nonce is stored and must be replaced by the new one received in each response.	2025-04-12 01:29:27 +02:00
William Lallemand	471290458e	MINOR: acme: get the ACME directory The first request of the ACME protocol is getting the list of URLs for the next steps. This patch implements the first request and the parsing of the response. The response is a JSON object so mjson is used to parse it.	2025-04-12 01:29:27 +02:00
William Lallemand	b8209cf697	MINOR: acme/cli: add the 'acme renew' command The "acme renew" command launches the ACME task for a given certificate. The CLI parser generates a new private key using the parameters from the acme section.	2025-04-12 01:29:27 +02:00
William Lallemand	bf6a39c4d1	MINOR: acme: add private key configuration This commit allows to configure the generated private keys, you can configure the keytype (RSA/ECDSA), the number of bits or the curves. Example: acme LE uri https://acme-staging-v02.api.letsencrypt.org/directory account account.key contact foobar@example.com challenge HTTP-01 keytype ECDSA curves P-384	2025-04-12 01:29:27 +02:00
William Lallemand	2e8c350b95	MINOR: acme: add configuration for the crt-store Add new acme keywords for the ckch_conf parsing, which will be used on a crt-store, a crt line in a frontend, or even a crt-list. The cfg_postparser_acme() is called in order to check if a section referenced elsewhere really exists in the config file.	2025-04-12 01:29:27 +02:00
William Lallemand	077e2ce84c	MINOR: acme: add the acme section in the configuration parser Add a configuration parser for the new acme section, the section is configured this way: acme letsencrypt uri https://acme-staging-v02.api.letsencrypt.org/directory account account.key contact foobar@example.com challenge HTTP-01 When unspecified, the challenge defaults to HTTP-01, and the account key to "<section_name>.account.key". Section are stored in a linked list containing acme_cfg structures, the configuration parsing is mostly resolved in the postsection parser cfg_postsection_acme() which is called after the parsing of an acme section.	2025-04-12 01:29:27 +02:00
William Lallemand	20718f40b6	MEDIUM: ssl/ckch: add filename and linenum argument to crt-store parsing Add filename and linenum arguments to the crt-store / ckch_conf parsing. This allows the parsing function to use them when emitting errors.	2025-04-12 01:29:27 +02:00
Willy Tarreau	00c967fac4	MINOR: master/cli: support bidirectional communications with workers Some rare commands in the worker require to keep their input open and terminate when it's closed ("show events -w", "wait"). Others maintain a per-session context ("set anon on"). But in its default operation mode, the master CLI passes commands one at a time to the worker, and closes the CLI's input channel so that the command can immediately close upon response. This effectively prevents these two specific cases from being used. Here the approach that we take is to introduce a bidirectional mode to connect to the worker, where everything sent to the master is immediately forwarded to the worker (including the raw command), allowing to queue multiple commands at once in the same session, and to continue to watch the input to detect when the client closes. It must be a client's choice however, since doing so means that the client cannot batch many commands at once to the master process, but must wait for these commands to complete before sending new ones. For this reason we use the prefix "@@<pid>" for this. It works exactly like "@" except that it maintains the channel open during the whole execution. Similarly to "@<pid>" with no command, "@@<pid>" will simply open an interactive CLI session to the worker, that will be ended by "quit" or by closing the connection. This can be convenient for the user, and possibly for clients willing to dedicate a connection to the worker.	2025-04-11 16:09:17 +02:00
Aurelien DARRAGON	fbfeb591f7	MINOR: proxy: add deinit_proxy() helper func Same as free_proxy(), but does not free the base proxy pointer (ie: the proxy itself may not be allocated) Goal is to be able to cleanup statically allocated dummy proxies.	2025-04-10 22:10:31 +02:00
Aurelien DARRAGON	e1cec655ee	MINOR: proxy: add setup_new_proxy() function Split alloc_new_proxy() in two functions: the preparing part is now handled by setup_new_proxy() which can be called individually, while alloc_new_proxy() takes care of allocating a new proxy struct and then calling setup_new_proxy() with the freshly allocated proxy.	2025-04-10 22:10:31 +02:00
Willy Tarreau	f4634e5a38	MINOR: ring/cli: support delimiting events with a trailing \0 on "show events" At the moment it is not supported to produce multi-line events on the "show events" output, simply because the LF character is used as the default end-of-event mark. However it could be convenient to produce well-formatted multi-line events, e.g. in JSON or other formats. UNIX utilities have already faced similar needs in the past and added "-print0" to "find" and "-0" to "xargs" to mention that the delimiter is the NUL character. This makes perfect sense since it's never present in contents, so let's do exactly the same here. Thus from now on, "show events <ring> -0" will delimit messages using a \0 instead of a \n, permitting a better and safer encapsulation.	2025-04-08 14:36:35 +02:00
Willy Tarreau	0be6d73e88	MINOR: ring: support arbitrary delimiters through ring_dispatch_messages() In order to support delimiting output events with other characters than just the LF, let's pass the delimiter through the API. The default remains the LF, used by applet_append_line(), and ignored by the log forwarder.	2025-04-08 14:36:35 +02:00
Willy Tarreau	f01ff2478f	BUILD: atomics: fix build issue on non-x86/non-arm systems Commit `f435a2e518` ("CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence()") replaced the builtins used for barriers, but the different API required an argument while the macros didn't specify any, resulting in double parenthesis that were causing obscure build errors such as "called object type 'void' is not a function or function pointer". Let's just specify the args for the macro. No backport is needed.	2025-04-07 09:38:22 +02:00
Aurelien DARRAGON	11d4d0957e	MEDIUM: task: make notification_* API thread safe by default Some notification_* functions were not thread safe by default as they assumed only one producer would emit events for registered tasks. While this suited the Lua sockets use-case well, it proved to be a limitation with some other event sources (i.e. the Lua Queue class). Instead of having to deal with both the non thread safe and thread safe variants (_mt suffix), which is error prone, let's make the entire API thread safe regarding the event list. Pruning functions still require that only one thread executes them; with Lua this is always the case because there is one cleanup list per context.	2025-04-03 17:52:50 +02:00
Aurelien DARRAGON	748dba4859	MINOR: hlua_fcn: register queue class using hlua_register_metatable() Most lua classes are registered by leveraging the hlua_register_metatable() helper. Let's use that for the Queue class as well for consistency.	2025-04-03 17:52:17 +02:00
Aurelien DARRAGON	b77b1a2c3a	MINOR: task: add thread safe notification_new and notification_wake variants notification_new and notification_wake were historically meant to be called by a single thread doing both the init and the wakeup for other tasks waiting on the signals. In this patch, we extend the API so that notification_new and notification_wake have thread-safe variants that can safely be used with multiple threads registering on the same list of events and multiple threads pushing updates on the list.	2025-04-03 17:52:03 +02:00
Amaury Denoyelle	f0f1816f1a	MINOR: check: implement check-pool-conn-name srv keyword This commit is a direct follow-up of the previous one. It defines a new server keyword check-pool-conn-name. It is used as the default value for the name parameter of idle connection hash generation. Its behavior is similar to server keyword pool-conn-name, but reserved for checks reuse. If check-pool-conn-name is set, it is used in priority to match a connection for reuse. If unset, a fallback is performed on check-sni.	2025-04-03 17:19:07 +02:00
Amaury Denoyelle	43367f94f1	MINOR: check/backend: support conn reuse with SNI Support for connection reuse during server checks was implemented recently. This is activated with the server keyword check-reuse-pool. Similarly to stream processing via connect_backend(), a connection hash is calculated when trying to perform reuse for checks. This is necessary to look up a connection which shares the check connect parameters. However, idle connections can additionally be tagged using a pool-conn-name or SNI under connect_backend(). Check reuse does not test these values, which prevents it from retrieving a matching connection. Improve this by using "check-sni" value as idle connection hash input for check reuse. be_calculate_conn_hash() API has been adjusted so that name value can be passed as input, both when using streams or checks. Even with the current patch, there are still some scenarios which cannot be covered for checks connection reuse, most notably when using a dynamic pool-conn-name/SNI value. It is however at least sufficient to cover simpler cases.	2025-04-03 17:19:07 +02:00
Willy Tarreau	f435a2e518	CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence() The drop of older compilers also allows us to focus on clearer barriers, so let's use them.	2025-04-03 11:59:31 +02:00
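As a rough illustration of the replacement described in f435a2e518, the sketch below wraps the GCC/Clang builtin `__atomic_thread_fence()` in a helper. The helper name `full_barrier()`, the `publish()` example and the choice of `__ATOMIC_SEQ_CST` are assumptions made for this example and are not taken from the patch.

```c
/* Hypothetical helper (the name is illustrative): a full memory barrier
 * expressed with the __atomic builtin instead of __sync_synchronize().
 */
static inline void full_barrier(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

static int shared_data;
static int shared_ready;

/* Store a value, then raise a "ready" flag, with a barrier in between so
 * that the flag cannot become visible before the data.
 */
static void publish(int value)
{
	__atomic_store_n(&shared_data, value, __ATOMIC_RELAXED);
	full_barrier();
	__atomic_store_n(&shared_ready, 1, __ATOMIC_RELAXED);
}
```

A reader on another thread would pair this with its own fence between loading the flag and loading the data.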
Willy Tarreau	34e3b83f9c	CLEANUP: atomics: remove support for gcc < 4.7 The old __sync_* API is no longer necessary since we do not support gcc before 4.7 anymore. Let's just get rid of this code, the file is still ugly enough without it.	2025-04-03 11:55:35 +02:00
Ilia Shipitsin	27a6353ceb	CLEANUP: assorted typo fixes in the code, commits and doc	2025-04-03 11:37:25 +02:00
William Lallemand	b351f06ff1	REORG: ssl: move curves2nid and nid2nist to ssl_utils curves2nid and nid2nist are generic functions that could be used outside the JWS scope, this patch put them at the right place so they can be reused.	2025-04-02 19:34:09 +02:00
Amaury Denoyelle	f1fb396d71	MEDIUM: check: implement check-reuse-pool Implement the possibility to reuse idle connections when performing server checks. This is done thanks to the recently introduced functions be_calculate_conn_hash() and be_reuse_connection(). One side effect of this change is that be_calculate_conn_hash() can now be called with a NULL stream instance. As such, part of the functions are adjusted accordingly. Note that to simplify configuration, connection reuse is not performed if any specific check connection parameters are defined on the server line or via the tcp-check connect rule. This is performed via newly defined tcpcheck_use_nondefault_connect().	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	e34f748e3a	MINOR: check define check-reuse-pool server keyword Define a new server keyword check-reuse-pool, and its counterpart with a "no" prefix. For the moment, only parsing is implemented. The real behavior adjustment will be implemented in the next patch.	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	20eb57b486	MINOR: backend: remove stream usage on connection reuse Adjust newly defined be_reuse_connection() API. The stream argument is removed. This will allow checks to invoke it without relying on a stream instance.	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	ee94a6cfc1	MINOR: backend: extract conn reuse from connect_server() Following the previous patch, the part directly related to connection reuse is extracted from connect_server(). It is now defined in a new function be_reuse_connection().	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	c7cc6b6401	MINOR: backend: extract conn hash calculation from connect_server() On connection reuse, a hash is first calculated. It is generated from various connection parameters, to retrieve a matching connection. Extract hash calculation from connect_server() into a new dedicated function be_calculate_conn_hash(). The objective is to be able to perform connection reuse for checks, without connect_server() invocation which relies on a stream instance.	2025-04-02 14:57:40 +02:00
Willy Tarreau	4ec5509541	BUILD: compiler: undefine the CONCAT() macro if already defined As Ilya reported in issue #2911, the CONCAT() macro breaks on NetBSD which defines its own as __CONCAT() (which is exactly the same). Let's just undefine it before ours to fix the issue instead of renaming, but keep ours so that we don't have doubts about what we're running with. Note that the patch introducing this breaking change was backported to 3.0.	2025-04-02 11:36:43 +02:00
Ilia Shipitsin	78b849b839	CLEANUP: assorted typo fixes in the code and comments code, comments and doc actually.	2025-04-02 11:12:20 +02:00
Olivier Houchard	9fe72bba3c	MAJOR: leastconn; Revamp the way servers are ordered. For leastconn, servers used to just be stored in an ebtree. Each server would be one node. Change that so that nodes contain multiple mt_lists. Each list will contain servers that share the same key (typically meaning they have the same number of connections). Using mt_lists means that as long as tree elements already exist, moving a server from one tree element to another no longer requires the lbprm write lock. We use multiple mt_lists to reduce the contention when moving a server from one tree element to another. A list in the new element will be chosen randomly. We no longer remove a tree element as soon as it no longer contains any server. Instead, we keep a list of all elements, and when we need a new element, we look at that list only if it contains a number of elements already, otherwise we'll allocate a new one. Keeping nodes in the tree ensures that we very rarely have to take the lbprm write lock (as it only happens when we're moving the server to a position for which no element is currently in the tree). The number of mt_lists used is defined as FWLC_NB_LISTS. The number of tree elements we want to keep is defined as FWLC_MIN_FREE_ENTRIES, both in defaults.h. The values used were picked after experimentation, and seem to be the best trade-off between performance and memory usage. Doing that gives a good boost in performance when a lot of servers are used. With a configuration using 500 servers, before that patch, about 830000 requests per second could be processed, with that patch, about 1550000 requests per second are processed, on a 64-core AMD, using 1200 concurrent connections.	2025-04-01 18:05:30 +02:00
Olivier Houchard	ba521a1d88	MINOR: threads: Add HA_RWLOCK_TRYRDTOWR() Add HA_RWLOCK_TRYRDTOWR(), that tries to upgrade a lock from reader to writer, and fails if any seeker or writer already holds it.	2025-04-01 18:05:30 +02:00
Olivier Houchard	2a9436f96b	MINOR: lbprm: Add method to deinit server and proxy Add two new methods to lbprm, server_deinit() and proxy_deinit(), in case something should be done at the lbprm level when removing servers and proxies.	2025-04-01 18:05:30 +02:00
Olivier Houchard	17059098e7	MINOR: mt_list: Implement mt_list_try_lock_prev(). Implement mt_list_try_lock_prev(), that does the same thing as mt_list_lock_prev(), except if the list is locked, it returns { NULL, NULL } instead of waiting.	2025-04-01 18:05:30 +02:00
William Lallemand	fdcb97614c	MINOR: ssl/ckch: add substring parser for ckch_conf Add a substring parser for the ckch_conf keyword parser, this will split a string into multiple substrings, and strdup them in an array.	2025-04-01 15:38:32 +02:00
William Lallemand	f8fe84caca	MINOR: jws: emit the JWK thumbprint jwk_thumbprint() is a function which implements RFC7638 and emits a JWK thumbprint using an EVP_PKEY. EVP_PKEY_EC_to_pub_jwk() and EVP_PKEY_RSA_to_pub_jwk() were changed in order to match what is required to emit a thumbprint (i.e., no spaces or lines and the lexicographic order of the fields)	2025-04-01 11:57:55 +02:00
Willy Tarreau	1e9a2529aa	MINOR: cpu-topo: pass an extra argument to ha_cpu_policy This extra argument will allow common functions to distinguish between multiple policies. For now it's not used.	2025-03-31 16:21:37 +02:00
Willy Tarreau	571573874a	MINOR: cpu-set: add a new function to print cpu-sets in human-friendly mode The new function "print_cpu_set()" will print cpu sets in a human-friendly way, with commas and dashes for intervals. The goal is to keep them compact enough.	2025-03-31 16:21:37 +02:00
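The compact "commas and dashes" output described in 571573874a can be sketched as follows. This is not the actual print_cpu_set() code: the function name `format_cpu_mask()`, its signature and the use of a plain 64-bit mask instead of HAProxy's cpu-set type are all assumptions chosen to keep the example self-contained.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical sketch: format the set bits of a 64-bit mask as a compact
 * list such as "0-3,8,10-11". Returns the number of characters written
 * (excluding the trailing NUL), or -1 if the buffer is too small.
 */
static int format_cpu_mask(unsigned long long mask, char *buf, size_t size)
{
	size_t len = 0;
	int cpu = 0;

	if (size == 0)
		return -1;
	buf[0] = '\0';

	while (cpu < 64) {
		int first, last, ret;

		if (!(mask & (1ULL << cpu))) {
			cpu++;
			continue;
		}
		/* find the end of the run of consecutive CPUs */
		first = last = cpu;
		while (last + 1 < 64 && (mask & (1ULL << (last + 1))))
			last++;

		if (first == last)
			ret = snprintf(buf + len, size - len, "%s%d",
			               len ? "," : "", first);
		else
			ret = snprintf(buf + len, size - len, "%s%d-%d",
			               len ? "," : "", first, last);

		if (ret < 0 || (size_t)ret >= size - len)
			return -1;
		len += (size_t)ret;
		cpu = last + 1;
	}
	return (int)len;
}
```

With CPUs 0 to 3, 8, 10 and 11 set (mask 0xD0F), the buffer receives "0-3,8,10-11".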
Willy Tarreau	3955f151b1	MINOR: cpu-set: compare two cpu sets with ha_cpuset_isequal() This function returns true if two CPU sets are equal.	2025-03-31 16:21:37 +02:00
Valentine Krasnobaeva	b303861469	MINOR: compiler: add __nonstring macro GCC 15 throws the following warning on fixed-size char arrays if they do not contain a terminating NUL: src/tools.c:2041:25: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization] 2041 | const char hextab[16] = "0123456789ABCDEF"; We are using a couple of such definitions for some constants. Converting them to flexible arrays, like: hextab[] = "0123456789ABCDEF" may have consequences, as enlarged arrays won't fit anymore where they were possibly located due to the memory alignment constraints. GCC adds 'nonstring' variable attribute for such char arrays, but clang and other compilers don't have it. Let's wrap 'nonstring' with our __nonstring macro, which will test if the compiler supports this attribute. This fixes the issue #2910.	2025-03-31 13:50:28 +02:00
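A minimal sketch of the approach described in b303861469, assuming the compiler provides `__has_attribute`. The macro name `MY_NONSTRING` and the helper `hex_digit()` are illustrative and do not reproduce HAProxy's actual `__nonstring` definition.

```c
/* Illustrative macro (not HAProxy's actual definition): expands to the
 * 'nonstring' attribute only when the compiler reports support for it,
 * and to nothing otherwise.
 */
#if defined(__has_attribute)
#  if __has_attribute(nonstring)
#    define MY_NONSTRING __attribute__((nonstring))
#  endif
#endif
#ifndef MY_NONSTRING
#  define MY_NONSTRING
#endif

/* 16 characters, deliberately leaving no room for the terminating NUL */
static const char hextab[16] MY_NONSTRING = "0123456789ABCDEF";

/* Return the upper-case hex digit for the low 4 bits of <nibble> */
static char hex_digit(unsigned int nibble)
{
	return hextab[nibble & 0xF];
}
```

The array keeps its exact 16-byte size, which is the point of annotating it rather than switching to `hextab[]`.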
Willy Tarreau	6b17310757	MEDIUM: pools: be a bit smarter when merging comparable size pools By default, pools of comparable sizes are merged together. However, the current algorithm is dumb: it rounds the requested size to the next multiple of 16 and compares the sizes like this. This results in many entries which are already multiples of 16 not being merged, for example 1024 and 1032 are separate, 65536 and 65540 are separate, 48 and 56 are separate (though 56 merges with 64). This commit changes this to consider not just the entry size but also the average entry size, that is, it compares the average size of all objects sharing the pool with the size of the object looking for a pool. If the object is not more than 1% bigger nor smaller than the current average size or if it is neither 16 bytes smaller nor larger, then it can be merged. Also, it always respects exact matches in order to avoid merging objects into larger pools or worse, extending existing ones for no reason, and when there's a tie, it always avoids extending an existing pool. Also, we now visit all existing pools in order to spot the best one, we do not stop anymore at the smallest one large enough. Theoretically this could cost a bit of CPU but in practice it's O(N^2) with N quite small (typically in the order of 100) and the cost at each step is very low (compare a few integer values). But as a side effect, pools are no longer sorted by size, "show pools bysize" is needed for this. This causes the objects to be much better grouped together, accepting to use a little bit more sometimes to avoid fragmentation, without causing everyone to be merged into the same pool. 
Thanks to this we're now seeing 36 pools instead of 48 by default, with some very nice examples of compact grouping: - Pool qc_stream_r (80 bytes) : 13 users > qc_stream_r : size=72 flags=0x1 align=0 > quic_cstrea : size=80 flags=0x1 align=0 > qc_stream_a : size=64 flags=0x1 align=0 > hlua_esub : size=64 flags=0x1 align=0 > stconn : size=80 flags=0x1 align=0 > dns_query : size=64 flags=0x1 align=0 > vars : size=80 flags=0x1 align=0 > filter : size=64 flags=0x1 align=0 > session pri : size=64 flags=0x1 align=0 > fcgi_hdr_ru : size=72 flags=0x1 align=0 > fcgi_param_ : size=72 flags=0x1 align=0 > pendconn : size=80 flags=0x1 align=0 > capture : size=64 flags=0x1 align=0 - Pool h3s (56 bytes) : 17 users > h3s : size=56 flags=0x1 align=0 > qf_crypto : size=48 flags=0x1 align=0 > quic_tls_se : size=48 flags=0x1 align=0 > quic_arng : size=56 flags=0x1 align=0 > hlua_flt_ct : size=56 flags=0x1 align=0 > promex_metr : size=48 flags=0x1 align=0 > conn_hash_n : size=56 flags=0x1 align=0 > resolv_requ : size=48 flags=0x1 align=0 > mux_pt : size=40 flags=0x1 align=0 > comp_state : size=40 flags=0x1 align=0 > notificatio : size=48 flags=0x1 align=0 > tasklet : size=56 flags=0x1 align=0 > bwlim_state : size=48 flags=0x1 align=0 > xprt_handsh : size=48 flags=0x1 align=0 > email_alert : size=56 flags=0x1 align=0 > caphdr : size=41 flags=0x1 align=0 > caphdr : size=41 flags=0x1 align=0 - Pool quic_cids (32 bytes) : 13 users > quic_cids : size=16 flags=0x1 align=0 > quic_tls_ke : size=32 flags=0x1 align=0 > quic_tls_iv : size=12 flags=0x1 align=0 > cbuf : size=32 flags=0x1 align=0 > hlua_queuew : size=24 flags=0x1 align=0 > hlua_queue : size=24 flags=0x1 align=0 > promex_modu : size=24 flags=0x1 align=0 > cache_st : size=24 flags=0x1 align=0 > spoe_appctx : size=32 flags=0x1 align=0 > ehdl_sub_tc : size=32 flags=0x1 align=0 > fcgi_flt_ct : size=16 flags=0x1 align=0 > sig_handler : size=32 flags=0x1 align=0 > pipe : size=24 flags=0x1 align=0 - Pool quic_crypto (1032 bytes) : 2 users > 
quic_crypto : size=1032 flags=0x1 align=0 > requri : size=1024 flags=0x1 align=0 - Pool quic_conn_r (65544 bytes) : 2 users > quic_conn_r : size=65536 flags=0x1 align=0 > dns_msg_buf : size=65540 flags=0x1 align=0 On a very unscientific test consisting in sending 1 million H1 requests and 1 million H2 requests to the stats page, we're seeing an ~6% lower memory usage with the patch: before the patch: Total: 48 pools, 4120832 bytes allocated, 4120832 used (~3555680 by thread caches). after the patch: Total: 36 pools, 3880648 bytes allocated, 3880648 used (~3299064 by thread caches). This should be taken with care however since pools allocate and release in batches.	2025-03-25 18:01:01 +01:00
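The merge criterion stated in 6b17310757 (within 1% of the pool's average object size, or less than 16 bytes away from it) can be sketched as a single predicate. The function name `size_is_mergeable()` and the exact boundary handling are assumptions; the preference for exact matches, the tie-breaking and the search for the best pool are left out.

```c
#include <stddef.h>

/* Hypothetical sketch of the merge test: an object of <size> bytes may
 * join a pool whose users average <avg> bytes when it is within 1% of
 * that average, or fewer than 16 bytes away from it. The exact
 * boundaries are assumptions made for illustration.
 */
static int size_is_mergeable(size_t avg, size_t size)
{
	size_t diff = (size > avg) ? size - avg : avg - size;

	if (diff * 100 <= avg)   /* within 1% of the average */
		return 1;
	if (diff < 16)           /* fewer than 16 bytes apart */
		return 1;
	return 0;
}
```

Under these assumptions the pairs quoted in the message (1024 and 1032, 65536 and 65540, 48 and 56) all pass the test, while 64 and 1024 do not.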
Pierre-Andre Savalle	8ed1e91efd	MEDIUM: lb-chash: add directive hash-preserve-affinity When using hash-based load balancing, requests are always assigned to the server corresponding to the hash bucket for the balancing key, without taking maxconn or maxqueue into account, unlike in other load balancing methods like 'first'. This adds a new backend directive that can be used to take maxconn and possibly maxqueue in that context. This can be used when hashing is desired to achieve cache locality, but sending requests to a different server is preferable to queuing for a long time or failing requests when the initial server is saturated. By default, affinity is preserved as was the case previously. When 'hash-preserve-affinity' is set to 'maxqueue', servers are considered successively in the order of the hash ring until a server that does not have a full queue is found. When 'maxconn' is set on a server, queueing cannot be disabled, as 'maxqueue=0' means unlimited. To support picking a different server when a server is at 'maxconn' irrespective of the queue, 'hash-preserve-affinity' can be set to 'maxconn'.	2025-03-25 18:01:01 +01:00
Amaury Denoyelle	cf9e40bd8a	MINOR: quic: define max-stream-data configuration as a ratio	2025-03-25 16:30:35 +01:00
Amaury Denoyelle	68c10d444d	MINOR: mux-quic: define config for max-data Define a new global configuration tune.quic.frontend.max-data. This allows users to explicitly set the value for the corresponding QUIC TP initial-max-data, with direct impact on haproxy memory consumption.	2025-03-25 16:30:09 +01:00
Amaury Denoyelle	a71007c088	MINOR: quic: move global tune options into quic_tune A new structure quic_tune has recently been defined. Its purpose is to store global options related to QUIC. Previously, only the tunable to toggle pacing was stored in it. This commit moves several QUIC related tunables from global to quic_tune structure. This better centralizes QUIC configuration options and gives room for future generic options.	2025-03-24 10:01:46 +01:00
Willy Tarreau	9091c5317f	MINOR: cli/pools: record the list of pool registrations even when merging them By default, create_pool() tries to merge similar pools into one. But when dealing with certain bugs, it's hard to say which ones were merged together. We do have the information at registration time, so let's just create a list of registrations ("pool_registration") attached to each pool, that will store that information. It can then be consulted on the CLI using "show pools detailed", where the names, sizes, alignment and flags are reported.	2025-03-21 17:09:30 +01:00
Aurelien DARRAGON	7ec6f4412c	MINOR: stats: add alt_name field to stat_col struct alt_name will be used by metric exporters to know how the metric should be presented to the user. If the alt_name is NULL, the metric should be ignored. For now only promex exporter will make use of this.	2025-03-21 17:04:54 +01:00
Olivier Houchard	98967aa09f	MEDIUM: mt_list: Reduce the max number of loops with exponential backoff Reduce the max number of loops in the mt_list code while waiting for a lock to be available with exponential backoff. It's been observed that the current value led to severe performance degradation on at least some hardware; hopefully this value will be acceptable everywhere.	2025-03-21 11:30:59 +01:00
Aurelien DARRAGON	af68343a56	MINOR: stats: use stat_col storage stat_cols_info Use stat_col storage for stat_cols_info[] array instead of name_desc. As documented in `65624876f` ("MINOR: stats: introduce a more expressive stat definition method"), stat_col supersedes name_desc storage but it remains backward compatible. Here we migrate to the new API to be able to further extend stat_cols_info[] in following patches.	2025-03-20 11:38:32 +01:00
Aurelien DARRAGON	9c60fc9fe1	MINOR: stats: STATS_PX_CAP___B_ macro STATS_PX_CAP___B_ points to STATS_PX_CAP_BE, it is just an alias for consistency, like STATS_PX_CAP____S which points to STATS_PX_CAP_SRV.	2025-03-20 11:37:47 +01:00
Aurelien DARRAGON	3c1b00b127	MINOR: stats: add .generic explicit field in stat_col struct Further extend logic implemented in `65624876` ("MINOR: stats: introduce a more expressive stat definition method") and `4e9e8418` ("MINOR: stats: prepare stats-file support for values other than FN_COUNTER"): we don't rely anymore on the presence of the capability to know if the metric is generic or not. This is because it prevents us from setting a capability on static statistics. Yet it could be useful to set the capability even on static metrics, thus we add a dedicated .generic bit to tell haproxy that the metric is generic and can be handled automatically by the API. Also, ME_NEW_* helpers are not explicitly associated to generic metric definition (as it was already the case before) to avoid ambiguities. It may change in the future as we may need to use the new definition method to define static metrics (without the generic bit set). But for now it isn't the case as this need definition was implemented for generic metrics support in the first place. If we want to define static metrics using the API, we could add a new set of helpers for instance.	2025-03-20 11:37:21 +01:00
William Lallemand	2fb6270910	MEDIUM: ssl/ckch: make the ckch_conf more generic The ckch_store_load_files() function makes specific processing for PARSE_TYPE_STR as if it was a type only used for paths. This patch changes a little bit the way it's done, PARSE_TYPE_STR is only meant to strdup() a string and store the resulting pointer in the ckch_conf structure. Any processing regarding the path is now done in the callback. Since the callbacks were basically doing the same thing, they were transformed into the DECLARE_CKCH_CONF_LOAD() macro, which allows some templating of these functions. The resulting ckch_conf_load_* functions will do the same as before, except they will also do the path processing instead of letting ckch_store_load_files() do it, which means we don't need the "base" member anymore in the struct ckch_conf_kws.	2025-03-19 18:08:40 +01:00
William Lallemand	b0ad777902	MINOR: tools: path_base() concatenates a path with a base path With the SSL configuration, crt-base, key-base are often used, these keywords concatenate the base path with the path when the path does not start with '/'. This is done at several places in the code, so a function to do this would be better to standardize the code.	2025-03-19 17:59:31 +01:00
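The rule described in b0ad777902 (prefix the base path unless the path starts with '/') might look like the sketch below. The name `join_base_path()`, its signature and its return convention are hypothetical and do not reflect the real path_base() prototype.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical sketch (name and signature are illustrative, not the real
 * path_base()): write <path> into <out>, prefixed with <base> and a '/'
 * unless <path> is absolute or no base is configured. Returns <out>, or
 * NULL if the result does not fit in <size> bytes.
 */
static const char *join_base_path(const char *path, const char *base,
                                  char *out, size_t size)
{
	int ret;

	if (path[0] == '/' || !base || !*base)
		ret = snprintf(out, size, "%s", path);
	else
		ret = snprintf(out, size, "%s/%s", base, path);

	if (ret < 0 || (size_t)ret >= size)
		return NULL;
	return out;
}
```

For example, "site.pem" with a base of "/etc/ssl" gives "/etc/ssl/site.pem", while "/tmp/a.pem" is returned unchanged.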
William Lallemand	29b4b985c3	MINOR: jws: use jwt_alg type instead of a char This patch implements the function EVP_PKEY_to_jws_algo() which returns a jwt_alg compatible with the private key. This value can then be passed to jws_b64_protected() and jws_b64_signature() which were modified to take a jwt_alg instead of a char.	2025-03-17 18:06:34 +01:00
William Lallemand	de67f25a7e	MINOR: jws: add new functions in jws.h Add signatures of jws_b64_payload(), jws_b64_protected(), jws_b64_signature(), jws_flattened() which allow to create a complete JWS flattened object.	2025-03-17 11:51:52 +01:00
Willy Tarreau	156430ceb6	MINOR: cpu-topo: add a CPU policy setting to the global section We'll need to let the user decide what's best for their workload, and in order to do this we'll have to provide tunable options. For that, we're introducing struct ha_cpu_policy which contains a name, a description and a function pointer. The purpose will be to use that function pointer to choose the best CPUs to use and now to set the number of threads and thread-groups, that will be called during the thread setup phase. The only supported policy for now is "none" which doesn't set/touch anything (i.e. all available CPUs are used).	2025-03-14 18:33:16 +01:00
Willy Tarreau	c93ee25054	MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set These are processed after the topology is detected, and they allow to restrict binding to or evict CPUs matching the indicated node(s).	2025-03-14 18:33:16 +01:00
Willy Tarreau	aa4776210b	MINOR: cpu-topo: create an array of the clusters The goal here is to keep an array of the known CPU clusters, because we'll use that often to decide of the performance of a cluster and its relevance compared to other ones. We'll store the number of CPUs in it, the total capacity etc. For the capacity, we count one unit per core, and 1/3 of it per extra SMT thread, since this is roughly what has been measured on modern CPUs. In order to ease debugging, they're also dumped with -dc.	2025-03-14 18:30:31 +01:00
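The capacity rule in aa4776210b (one unit per core, plus one third of a unit per extra SMT thread) can be written as integer arithmetic. The function name `cluster_capacity_thirds()` and the choice of counting in thirds of a core are assumptions for this sketch, not the representation used by the patch.

```c
/* Hypothetical sketch: capacity of a cluster expressed in thirds of a
 * core so that the arithmetic stays in integers. Each core counts for
 * one full unit (3 thirds) and each extra SMT thread on a core adds one
 * third.
 */
static unsigned int cluster_capacity_thirds(unsigned int cores,
                                            unsigned int threads_per_core)
{
	unsigned int extra_threads = 0;

	if (threads_per_core > 1)
		extra_threads = cores * (threads_per_core - 1);

	return cores * 3 + extra_threads;
}
```

Eight cores with two threads each yield 32 thirds, that is about 10.7 core units, against 24 thirds (8 units) without SMT.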
Willy Tarreau	4a6eaf6c5e	MINOR: cpu-topo: add a function to sort by cluster+capacity The purpose here is to detect heterogenous clusters which are not properly reported, based on the exposed information about the cores capacity. The algorithm here consists in sorting CPUs by capacity within a cluster, and considering as equal all those which have 5% or less difference in capacity with the previous one. This allows large clusters of more than 5% total between extremities, while keeping apart those where the limit is more pronounced. This is quite common in embedded environments with big.little systems, as well as on some laptops.	2025-03-14 18:30:31 +01:00
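The 5% rule from 4a6eaf6c5e can be sketched as a single pass over capacities that are already sorted. The function name `group_by_capacity()`, the descending sort order and the choice of measuring the 5% against the previous CPU's capacity are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical sketch: <capa> holds per-CPU capacities already sorted in
 * descending order. CPUs whose capacity is within 5% of the previous
 * CPU's stay in the same group; a larger drop starts a new group.
 * Fills <group> and returns the number of groups found.
 */
static int group_by_capacity(const int *capa, int *group, size_t count)
{
	int cur = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (i > 0) {
			int prev = capa[i - 1];
			int diff = prev - capa[i];

			/* more than 5% below the previous CPU: new group */
			if (diff * 100 > prev * 5)
				cur++;
		}
		group[i] = cur;
	}
	return count ? cur + 1 : 0;
}
```

Because each CPU is compared with its immediate predecessor only, capacities of 100, 96 and 92 stay in one group even though the extremes differ by 8%, which matches the remark about large clusters exceeding 5% between extremities.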
Willy Tarreau	d169758fa9	MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo It's important that we don't leave unassigned IDs in the topology, because the selection mechanism is based on index-based masks, so an unassigned ID will never be kept. This is particularly visible on systems where we cannot access the CPU topology, the package id, node id and even thread id are set to -1, and all CPUs are evicted due to -1 not being set in the "only-cpu" sets. Here in new function "cpu_fixup_topology()", we assign them with the smallest unassigned value. This function will be used to assign IDs where missing in general.	2025-03-14 18:30:31 +01:00
Willy Tarreau	af648c7b58	MINOR: cpu-topo: assign clusters to cores without and renumber them Due to the previous commit we can end up with cores not assigned any cluster ID. For this, at the end we sort the CPUs by topology and assign cluster IDs to remaining CPUs based on pkg/node/llc. For example an 14900 now shows 5 clusters, one for the 8 p-cores, and 4 of 4 e-cores each. The local cluster numbers are per (node,pkg) ID so that any rule could easily be applied on them, but we also keep the global numbers that will help with thread group assignment. We still need to force to assign distinct cluster IDs to cores running on a different L3. For example the EPYC 74F3 is reported as having 8 different L3s (which is true) and only one cluster. Here we introduce a new function "cpu_compose_clusters()" that is called from the main init code just after cpu_detect_topology() so that it's not OS-dependent. It deals with this renumbering of all clusters in topology order, taking care of considering any distinct LLC as being on a distinct cluster.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a4471ea56d	MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID This will be used to detect and fix incorrect setups which report the same cluster ID for multiple L3 instances. The arrangement of functions in this file is becoming a real problem. Maybe we should move all this to cpu_topo for example, and better distinguish OS-specific and generic code.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a8acdbd9fd	MINOR: cpu-topo: implement a sorting mechanism by CPU locality Once we've kept only the CPUs we want, the next step will be to form groups and these ones are based on locality. Thus we'll have to sort by locality. For now the locality is only inferred by the index. No grouping is made at this point. For this we add the "cpu_reorder_by_locality" function with a locality-based comparison function.	2025-03-14 18:30:31 +01:00
Willy Tarreau	18133a054d	MINOR: cpu-topo: implement a sorting mechanism for CPU index CPU selection will be performed by sorting CPUs according to various criteria. For dumps however, that's really not convenient and we'll need to reorder the CPUs according to their index only. This is what the new function cpu_reorder_by_index() does. It's called in thread_detect_count() before dumping the CPU topology.	2025-03-14 18:30:31 +01:00
Willy Tarreau	1af4942c95	MEDIUM: thread: start to detect thread groups and threads min/max By mutually refining the thread count and group count, we can try to detect the most suitable setup for the current machine. Taskset is implicitly handled correctly. tgroups automatically adapt to the configured number of threads. cpu-map manages to limit tgroups to the smallest supported value. The thread-limit is enforced. Just like in cfgparse, if the thread count was forced to a higher value, it's reduced and a warning is emitted. But if it was not set, the thr_max value is bound to this limit so that further calculations respect it. We continue to default to the max number of available threads and 1 tgroup by default, with the limit. This normally allows to get rid of that test in check_config_validity().	2025-03-14 18:30:30 +01:00
Willy Tarreau	f0661e79fe	MINOR: global: add a command-line option to enable CPU binding debugging During development, everything related to CPU binding and the CPU topology is debugged using state dumps at various places, but it does make sense to have a real command line option so that this remains usable in production to help users figure why some CPUs are not used by default. Let's add "-dc" for this. Since the list of global.tune.options values is almost full and does not 100% match this option, let's add a new "tune.debug" field for this.	2025-03-14 18:30:30 +01:00
Willy Tarreau	ac1db9db7d	MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable The function is not convenient because it doesn't allow us to undo the startup changes, and depending on where it's being used, we don't know whether the values read have already been altered (this is not the case right now but it's going to evolve). Let's just compute the status during cpu_detect_usable() and set a variable accordingly. This way we'll always read the init value, and if needed we can even afford to reset it. Also, placing it in cpu_topo.c limits cross-file dependencies (e.g. threads without affinity etc).	2025-03-14 18:30:30 +01:00
Willy Tarreau	7cb274439b	MINOR: cpu-topo: add CPU topology detection for linux This uses the publicly available information from /sys to figure the cache and package arrangements between logical CPUs and fill ha_cpu_topo[], as well as their SMT capabilities and relative capacity for those which expose this. The functions clearly have to be OS-specific.	2025-03-14 18:30:30 +01:00
Willy Tarreau	8f72ce335a	MINOR: cpu-topo: add detection of online CPUs on Linux This adds a generic function ha_cpuset_detect_online() which for now only supports linux via /sys. It fills a cpuset with the list of online CPUs that were detected (or returns a failure).	2025-03-14 18:30:30 +01:00
Willy Tarreau	8c524c7c9d	REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo The cpuset files are normally used only for cpu manipulations. It happens that the initial CPU binding detection was initially placed there since there was no better place, but in practice, being OS-specific, it should really be in cpu-topo. This simplifies cpuset which doesn't need to know about the OS anymore.	2025-03-14 18:30:30 +01:00
Willy Tarreau	a6fdc3eaf0	MINOR: cpu-topo: update CPU topology from excluded CPUs at boot Now before trying to resolve the thread assignment to groups, we detect which CPUs are not bound at boot so that we can mark them with HA_CPU_F_EXCLUDED. This will be useful to better know on which CPUs we can count later. Note that we purposely ignore cpu-map here as we don't know how threads and groups will map to cpu-map entries, hence which CPUs will really be used. It's important to proceed this way so that when we have no info we assume they're all available.	2025-03-14 18:30:30 +01:00
Willy Tarreau	bdb731172c	MINOR: cpu-topo: add a function to dump CPU topology The new function cpu_dump_topology() will centralize most debugging calls, and it can make efforts of not dumping some possibly irrelevant fields (e.g. non-existing cache levels).	2025-03-14 18:30:30 +01:00
Willy Tarreau	041462c4af	MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus We don't want to constantly deal with as many CPUs as a cpuset can hold, so let's first try to trim the value to what the system claims to support via _SC_NPROCESSORS_CONF. It is obviously still subject to the limit of the cpuset size though. The value is stored globally so that we can reuse it elsewhere after initialization.	2025-03-14 18:30:30 +01:00
Willy Tarreau	656cedad42	MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array. This does the bare minimum to allocate and initialize a global ha_cpu_topo array for the number of supported CPUs and release it at deinit time.	2025-03-14 18:30:30 +01:00
Willy Tarreau	d165f5d3ab	MINOR: cpu-topo: add ha_cpu_topo definition This structure will be used to store information about each CPU's topology (package ID, L3 cache ID, NUMA node ID etc). This will be used in conjunction with CPU affinity setting to try to perform a mostly optimal binding between threads and CPU numbers by default. Since it was noticed during tests that absolutely none of the many machines tested reports different die numbers, the die_id is not stored. Also, it was found along experiments that the cluster ID will be used a lot, half of the time as a node-local identifier, and half of the time as a global identifier. So let's store the two versions at once (cl_gid, cl_lid). Some flags are added to indicate causes of exclusion (offline, excluded at boot, excluded by rules, ignored by policy).	2025-03-14 18:30:30 +01:00
Willy Tarreau	69ac4cd315	MINOR: compiler: add a new __decl_thread_var() macro to declare local variables __decl_thread() already exists but is more suited for struct members. When using it in a variables block, it appends the final trailing semi-colon which is a statement that ends the variable block. Better clean this up and have one precisely for variable blocks. In this case we can simply define an unused enum value that will consume the semi-colon. That's what the new macro __decl_thread_var() does.	2025-03-12 18:08:12 +01:00
Willy Tarreau	bb4addabb7	MINOR: compiler: add a simple macro to concatenate resolved strings It's often useful to be able to concatenate strings after resolving them (e.g. __FILE__, __LINE__ etc). Let's just have a CONCAT() macro to do that, which calls _CONCAT() with the same arguments to make sure the contents are resolved before being concatenated.	2025-03-12 18:06:55 +01:00
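The two-level macro trick mentioned in bb4addabb7 can be illustrated with token pasting. The names `MY_CONCAT`, `MY_CONCAT_RAW` and `ID_BASE` are made up for this sketch (partly to avoid clashing with system headers) and are not the definitions from the patch.

```c
/* Illustrative names. The inner macro pastes its arguments as-is; going
 * through the outer one first lets the preprocessor expand macro
 * arguments before they are pasted.
 */
#define MY_CONCAT_RAW(a, b) a ## b
#define MY_CONCAT(a, b)     MY_CONCAT_RAW(a, b)

#define ID_BASE 42

/* MY_CONCAT(value_, ID_BASE) expands to value_42, whereas
 * MY_CONCAT_RAW(value_, ID_BASE) produces value_ID_BASE because the
 * argument is pasted before being expanded.
 */
static int MY_CONCAT(value_, ID_BASE) = 1;
static int MY_CONCAT_RAW(value_, ID_BASE) = 2;
```

The same indirection is what makes arguments such as `__LINE__` resolve to their value before concatenation.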
Aurelien DARRAGON	003fe530ae	MINOR: log: add "option host" log-forward option add only the parsing part, options are currently unused	2025-03-12 10:51:35 +01:00
Aurelien DARRAGON	47f14be9f3	MINOR: tools: only print address in sa2str() when port == -1 Support special value for port in sa2str: if port is equal to -1, only print the address without the port, also ignoring <map_ports> value.	2025-03-12 10:51:20 +01:00
Aurelien DARRAGON	bc76f6dde9	MINOR: log: migrate log-forward options from proxy->options2 to options3 Migrate recently added log-forward section options, currently stored under proxy->options2 to proxy->options3 since proxy->options2 is running out of space and we plan on adding more log-forward options.	2025-03-12 10:50:03 +01:00
Aurelien DARRAGON	cc5a66212d	MINOR: proxy: add proxy->options3 proxy->options2 is almost full, yet we will add new log-forward options in upcoming patches so we anticipate that by adding a new {no_}options3 and cfg_opts3[] to further extend proxy options	2025-03-12 10:49:36 +01:00
Amaury Denoyelle	dc7913d814	MAJOR: mux-quic: increase stream flow-control for multi-buffer alloc Support for multiple Rx buffers per QCS instance has been introduced by previous patches. However, due to flow-control initial values, clients were still unable to fully use this to increase their upload throughput. This patch increases max-stream-data-bidi-remote flow-control initial values. A new define QMUX_STREAM_RX_BUF_FACTOR will fix the number of concurrent buffers allocable per QCS. It is set to 90. Note that connection flow-control initial value did not change. It is still configured to be equivalent to bufsize multiplied by the maximum concurrent streams. This ensures that Rx buffers allocation is still constrained per connection, so that it won't be possible to have all active QCS instances using in parallel their maximum Rx buffers count.	2025-03-07 12:06:27 +01:00
Amaury Denoyelle	a4f31ffeeb	MINOR: mux-quic: store QCS Rx buf in a single-entry tree Convert QCS rx buffer pointer to a tree container. Additionally, offset field of qc_stream_rxbuf is thus transformed into a node tree. For now, only a single Rx buffer is stored at most in QCS tree. Multiple Rx buffers will be implemented in a future patch to improve QUIC clients upload throughput.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	cc3c2d1f12	MINOR: mux-quic: define rxbuf wrapper Define a new type qc_stream_rxbuf. This is used as a wrapper around QCS Rx buffer with encapsulation of the ncbuf storage. It is allocated via a new pool. Several functions are adapted to be able to deal with qc_stream_rxbuf as a wrapper instead of the previous plain ncbuf instance. No functional change should happen with this patch. For now, only a single qc_stream_rxbuf can be instantiated per QCS. However, this new type will be useful to implement multiple Rx buffer storage in a future commit.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	4b1e63d191	MINOR: mux-quic: define globally stream rxbuf size QCS uses ncbuf for STREAM data storage. This serves as a limit for maximum STREAM buffering capacity, advertised via QUIC transport parameters for initial flow-control values. Define a new function qmux_stream_rx_bufsz() which can be used to retrieve this Rx buffer size. This can be used both in MUX/H3 layers and in QUIC transport parameters.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	861b11334c	MINOR: h3/hq-interop: restore function for standalone FIN receive Previously, a function qcs_http_handle_standalone_fin() was implemented to handle a received standalone FIN, bypassing app_ops layer decoding. However, this was removed as app_ops layer interaction is necessary. For example, HTTP/3 checks that FIN is never sent on the control uni stream. This patch reintroduces qcs_http_handle_standalone_fin(), albeit in a slightly diminished version. Most importantly, it is now the responsibility of the app_ops layer itself to use it, to avoid the shortcoming described above. The main objective of this patch is to be able to support standalone FIN in HTTP/0.9 layer. This is easily done via the reintroduction of qcs_http_handle_standalone_fin() usage. This will be useful to perform testing, as standalone FIN is a corner case which can easily be broken.	2025-03-07 12:06:26 +01:00
Valentine Krasnobaeva	e900ef987e	BUG/MEDIUM: startup: return to initial cwd only after check_config_validity() In check_config_validity() we evaluate some sample fetch expressions (log-format, server rules, etc). These expressions may use external files like maps. If some particular 'default-path' was set in the global section before, it's no longer applied to resolve file paths in check_config_validity(). parse_cfg() at the end of config parsing switches back to the initial cwd. This fixes the issue #2886. This patch should be backported in all stable versions since 2.4.0, including 2.4.0.	2025-03-06 10:49:48 +01:00
Roberto Moreda	f98b5c4f59	MINOR: log: add dont-parse-log and assume-rfc6587-ntf options This commit introduces the dont-parse-log option to disable log message parsing, allowing raw log data to be forwarded without modification. Also, it adds the assume-rfc6587-ntf option to frame log messages using only non-transparent framing as per RFC 6587. This avoids misparsing in certain cases (mainly with non RFC compliant messages). The documentation is updated to include details on the new options and their intended use cases. This feature was discussed in GH #2856	2025-03-06 09:30:39 +01:00
Aurelien DARRAGON	0746f6bde0	MINOR: cfgparse-listen: add and use cfg_parse_listen_match_option() helper cfg_parse_listen_match_option() takes cfg_opt array as parameter, as well as current args, expected mode and cap bitfields. It is expected to be used under cfg_parse_listen() function or similar. Its goal is to remove code duplication around proxy->options and proxy->options2 handling, since the same checks are performed for the two. Also, this function could help to evaluate proxy options for mode-specific proxies such as log-forward section for instance: by giving the expected mode and capability as input, the function would only match compatible options.	2025-03-06 09:30:18 +01:00
Aurelien DARRAGON	d9aa199100	MINOR: proxy: make pr_mode enum bitfield compatible Current pr_mode enum is a regular enum because a proxy only supports one mode at a time. However it can be handy for a function to be given a list of compatible modes for a proxy, and we can't do that using a bitfield because pr_mode is not bitfield compatible (values share the same bits). In this patch we manually define pr_mode values so that they are all using separate bits and allows a function to take a bitfield of compatible modes as parameter.	2025-03-06 09:30:11 +01:00
Olivier Houchard	335ef3264b	DEBUG: init: Add a macro to register unit tests Add a new macro, REGISTER_UNITTEST(), that will automatically make sure we call hap_register_unittest(), instead of having to create a function that will do so.	2025-03-04 18:18:10 +01:00
William Lallemand	a647839954	DEBUG: init: add a way to register functions for unit tests Doing unit tests with haproxy was always a bit difficult, some of the functions you want to test would depend on the buffer or trash buffer initialisation of HAProxy, so building a separate main() for them is quite hard. This patch adds a way to register a function that can be called with the "-U" parameter on the command line, will be executed just after step_init_1() and will exit the process with its return value as an exit code. When using the -U option, every keyword after this option is passed to the callback and could be used as a parameter, letting the capability to handle complex arguments if required by the test. HAProxy needs to be built with DEBUG_UNIT to activate this feature.	2025-03-03 12:43:32 +01:00
William Lallemand	4dc0ba233e	MINOR: jws: implement a JWK public key converter Implement a converter which takes an EVP_PKEY and converts it to a public JWK key. This is the first step of the JWS implementation. It supports both EC and RSA keys. Known to work with: - LibreSSL - AWS-LC - OpenSSL > 1.1.1	2025-03-03 12:43:32 +01:00
Willy Tarreau	730641f7ca	BUG/MINOR: server: check for either proxy-protocol v1 or v2 to send header As reported in issue #2882, using "no-send-proxy-v2" on a server line does not properly disable the use of proxy-protocol if it was enabled in a default-server directive in combination with other PP options. The reason for this is that the sending of a proxy header is determined by a test on srv->pp_opts without any distinction, so disabling PPv2 while leaving other options results in a PPv1 header to be sent. Let's fix this by explicitly testing for the presence of either send-proxy or send-proxy-v2 when deciding to send a proxy header. This can be backported to all versions. Thanks to Andre Sencioles (@asenci) for reporting the issue and testing the fix.	2025-03-03 04:05:47 +01:00
Olivier Houchard	706b008429	MEDIUM: servers: Add strict-maxconn. Maxconn is a bit of a misnomer when it comes to servers, as it doesn't control the maximum number of connections we establish to a server, but the maximum number of simultaneous requests. So add "strict-maxconn", that will make it so we will never establish more connections than maxconn. It extends the meaning of the "restricted" setting of tune.takeover-other-tg-connections, as it will also attempt to get idle connections from other thread groups if strict-maxconn is set.	2025-02-26 13:00:18 +01:00
Olivier Houchard	8de8ed4f48	MEDIUM: connections: Allow taking over connections from other tgroups. Allow haproxy to take over idle connections from other thread groups than our own. To control that, add a new tunable, tune.takeover-other-tg-connections. It can have 3 values, "none", where we won't attempt to get connections from the other thread group (the default), "restricted", where we only will try to get idle connections from other thread groups when we're using reverse HTTP, and "full", where we always try to get connections from other thread groups. Unless there is a special need, it is advised to use "none" (or restricted if we're using reverse HTTP) as using connections from other thread groups may have a performance impact.	2025-02-26 13:00:18 +01:00
Olivier Houchard	c36aae2af1	MINOR: pollers: Add a fixup_tgid_takeover() method. Add a fixup_tgid_takeover() method to pollers for which it makes sense (epoll, kqueue and evport). That method can be called after a takeover of a fd from a different thread group, to make sure the poller's internal structure reflects the new state.	2025-02-26 13:00:18 +01:00
Olivier Houchard	c5cc09c00d	MINOR: fd: Add fd_lock_tgid_cur(). Add fd_lock_tgid_cur(), a function that will lock the tgid, without modifying its value.	2025-02-26 13:00:18 +01:00
Olivier Houchard	52b97ff8dd	MEDIUM: fd: Wait if locked in fd_grab_tgid() and fd_take_tgid(). Wait while the tgid is locked in fd_grab_tgid() and fd_take_tgid(). As that lock is barely used, it should have no impact.	2025-02-26 13:00:18 +01:00
Willy Tarreau	fb7874c286	MINOR: tinfo: split the signal handler report flags into 3 While signals are not recursive, one signal (e.g. wdt) may interrupt another one (e.g. debug). The problem this causes is that when leaving the inner handler, it removes the outer's flag, hence the protection that comes with it. Let's just have 3 distinct flags for regular signals, debug signal and watchdog signal. We add a 4th definition which is an aggregate of the 3 to ease testing.	2025-02-24 13:37:52 +01:00
Vincent Dechenaux	9011b3621b	MINOR: compression: Introduce minimum size This is the introduction of "minsize-req" and "minsize-res". These two options allow you to set the minimum payload size required for compression to be applied. This helps save CPU on both server and client sides when the payload does not need to be compressed.	2025-02-22 11:32:40 +01:00
Willy Tarreau	29e246a84c	MINOR: freq_ctr: provide non-blocking read functions Some code called by the debug handlers in the context of a signal handler accesses some freq_ctr and occasionally ends up on a locked one from the same thread that is dumping it. Let's introduce a non-blocking version that at least allows to return even if the value is in the process of being updated, it's less problematic than hanging.	2025-02-21 18:26:29 +01:00
Willy Tarreau	ddd173355c	MINOR: tinfo: add a new thread flag to indicate a call from a sig handler Signal handlers must absolutely not change anything, but some long and complex call chains may look innocuous at first glance, yet result in some subtle write accesses (e.g. pools) that can conflict with a running thread being interrupted. Let's add a new thread flag TH_FL_IN_SIG_HANDLER that is only set when entering a signal handler and cleared when leaving them. Note, we're speaking about real signal handlers (synchronous ones), not deferred ones. This will allow some sensitive call places to act differently when detecting such a condition, and possibly even to place a few new BUG_ON().	2025-02-21 17:41:38 +01:00
Aurelien DARRAGON	9561b9fb69	BUG/MINOR: sink: add tempo between 2 connection attempts for sft servers When the connection for sink_forward_{oc}_applet fails or a previous one is destroyed, the sft->appctx is instantly released. However process_sink_forward_task(), which may run at any time, iterates over all known sfts and tries to create sessions for orphan ones. It means that instantly after sft->appctx is destroyed, a new one will be created, thus a new connection attempt will be made. It can be an issue with tcp log-servers or sink servers, because if the server is unavailable, process_sink_forward() will keep looping without any temporisation until the applet survives (ie: connection succeeds), which results in unexpected CPU usage on the threads responsible for that task. Instead, we add a tempo logic so that a delay of 1 second is applied between two retries. Of course the initial attempt is not delayed. This could be backported to all stable versions.	2025-02-21 11:22:35 +01:00
Christopher Faulet	851e52b551	BUG/MEDIUM: spoe/mux-spop: Introduce an NOOP action to deal with empty ACK In the SPOP protocol, ACK frames with an empty payload are allowed. However, in that case, because only the payload is transferred, there is no data to return to the SPOE applet. Only the end of input is reported. Thus the applet is never woken up. It means that the SPOE filter will be blocked during the processing timeout and will finally return an error. To work around this issue, a NOOP action is introduced with the value 0. It is only an internal action for now. It does not exist in the SPOP protocol. When an ACK frame with an empty payload is received, this noop action is transferred to the SPOE applet, instead of nothing. Thanks to this trick, the applet is properly notified. This works because unknown actions are ignored by the SPOE filter. This patch must be backported to 3.1.	2025-02-20 11:56:27 +01:00
Amaury Denoyelle	06e7674399	MINOR: mux-quic/h3: emit SETTINGS via MUX tasklet handler Previously, QUIC MUX application layer was installed and initialized via MUX init. However, the latter stage involves I/O operations, for example when using HTTP/3 with the emission of a SETTINGS frame. Change this to prevent any I/O operations during MUX init. As such, finalize app_ops callback is now called during the first invocation of qcc_io_send(), in the context of MUX tasklet. To implement this, a new application state value is added, to detect the transition from NULL to INIT stage.	2025-02-19 11:03:40 +01:00
Amaury Denoyelle	188fc45b95	MINOR: mux-quic: define a QCC application state member Introduce a new QCC field to track the current application layer state. For the moment, only INIT and SHUT state are defined. This allows to replace the older flag QC_CF_APP_SHUT. This commit does not bring major changes. It is only necessary to permit future evolutions on QUIC MUX. The only noticeable change is that QMUX traces can now display this new field.	2025-02-19 10:59:53 +01:00
William Lallemand	69163cd63e	MINOR: ssl/crtlist: split the ckch_conf loading from the crtlist line parsing ckch_conf loading is not that simple as it requires to check - if the cert already exists in the ckchs_tree - if the ckch_conf is compatible with an existing cert in ckchs_tree - if the cert is a bundle which needs to load multiple ckch_store This logic could be reused elsewhere, so this commit introduces the new crtlist_load_crt() function which does that.	2025-02-17 18:26:37 +01:00
Amaury Denoyelle	32691e7c25	MINOR: quic: support frame type as a varint QUIC frame type is encoded as a variable-length integer. Thus, 64-bit integer should be used for them. Currently, this was not the case as type was represented as a 1-byte char inside quic_frame structure. This does not cause any issue with QUIC from RFC9000, as all frame types fit in this range. Furthermore, a QUIC implementation is required to use the smallest size varint when encoding a frame type. However, the current code is unable to accept QUIC extension with bigger frame types. This is notably the case for quic-on-streams draft. Thus, this commit readjusts quic_frame architecture to be able to support higher frame type values. First, type field of quic_frame is changed to a 64-bit variable. Both encoding and decoding frame functions use variable-length integer helpers to manipulate the frame type field. Secondly, the quic_frame builders/parsers infrastructure is still preserved. However, it could be impossible to define new large frame type as an index into quic_frame_builders / quic_frame_parsers arrays. Thus, wrapper functions are now provided to access the builders and parsers. Both qf_builder() and qf_parser() wrappers can then be extended to return custom builder/parser instances for larger frame type. Finally, unknown frame type detection also uses the new wrapper quic_frame_is_known(). As with builders/parsers, for large frame type, this function must be manually completed to support a new type value.	2025-02-14 09:00:05 +01:00
William Lallemand	7034f2ca48	MINOR: ssl: store the filenames resulting from a lookup in ckch_conf With this patch, files resulting from a lookup (.key, .ocsp, *.issuer etc) are now stored in the ckch_conf. It allows to see the original filename from where it was loaded in "show ssl cert <filename>"	2025-02-13 17:44:00 +01:00
Amaury Denoyelle	f96af8e463	MINOR: quic: refactor STREAM encoding and splitting CRYPTO and STREAM frames encoding is similar. If payload is too large, frame will be split and only the first payload part will be written in the output QUIC packet. This process is complicated by the presence of a variable-length integer Length field prior to the payload. This commit aims to refactor these operations. Define two functions to simplify the code : * quic_strm_frm_fillbuf() which is used to calculate the optimal frame length of a STREAM/CRYPTO frame with its payload in a buffer * quic_strm_frm_split() which is used to split the frame payload if buffer is too small With this patch, both functions are now implemented for STREAM encoding.	2025-02-12 15:10:03 +01:00
William Lallemand	4de86bbbfc	MEDIUM: initcall: allow to register multiple post_section_parser per section Before this patch, REGISTER_CONFIG_SECTION() allowed to register one and only one callback (<post>) called after the parsing of a section. It was limiting because you couldn't register a post callback from anywhere else in the code. This patch introduces the new REGISTER_CONFIG_SECTION_POST() macro which allows to register a new post callback for a section keyword from anywhere. This patch introduces the feature by allowing `struct cfg_section` entries that do not have a `section_parser`, and then iterating on all cfg_section with a post_section_parser for a keyword.	2025-02-12 12:52:41 +01:00
Amaury Denoyelle	731340afbd	MINOR: quic: simplify length calculation for STREAM/CRYPTO frames STREAM and CRYPTO frames have a similar encoding format. In particular, both of them have a variable-length integer Length field just before the frame payload. It is complex to determine the optimal Length value before copying the payload data in the remaining buffer space. As such, helper functions were implemented to calculate this. However, CRYPTO and STREAM frames encoding implementation were not completely aligned, which renders the code harder to follow. The purpose of this commit is to simplify CRYPTO and STREAM frames encoding. First, a new helper quic_int_cap_length() is defined which is useful to determine the optimal buffer room available if prefixed by a variable-length integer as Length field. Then, processing of both CRYPTO and STREAM frames is now nearly identical, based on this new helper function. Functions max_available_room() and max_stream_data_size() are now unused and are removed.	2025-02-12 11:51:09 +01:00
Willy Tarreau	b6a8318cc2	MEDIUM: server: allocate a tasklet for asynchronous requeuing This creates a tasklet that only expects to be called when the LB algorithm is under contention when trying to reposition the server in its tree. Indeed, that's one of the operations that usually requires to take a write lock on a highly contended area, often for very little benefits under contention; indeed, under load, if a server keeps its previous position for a few extra microseconds, usually there's no harm. Thus this new tasklet can be woken up by the LB algo to ask the server to later call lbprm.server_requeue(). It does nothing else.	2025-02-11 17:24:09 +01:00
Willy Tarreau	20b8c4ddba	MINOR: lbprm: add a new callback ->server_requeue to the lbprm This callback will be used to reposition a server to its expected position regardless of the fact that it was taken or dropped. It will only be used by supporting LB algos. For now, only fwlc defines it and assigns it to fwlc_srv_reposition(). At the moment it's not used yet.	2025-02-11 17:16:14 +01:00
Willy Tarreau	eced1d6d8a	DEBUG: thread: reduce the struct lock_stat to store only 30 buckets Storing only 30 buckets means we only keep 256 bytes per label. This further simplifies address calculation and reduces the memory used without complicating the locking code. It means we won't measure wait times larger than a second but we're not supposed to face this as it would trigger the watchdog anyway. It may become a little bit just if measuring using rdtsc() instead of now_mono_time() though (typically the limit would be around 350ms for a 3 GHz CPU).	2025-02-10 18:34:43 +01:00
Willy Tarreau	c2f2d6fd3c	DEBUG: thread: make lock_stat per operation instead of for all operations It's more convenient (and more readable) to have the lock stats arranged by operation type (read, seek, write). It will also allow to later simplify the structure format and the bucket address calculation. Now lock_stat[] got split into lock_stats_rd[], lock_stats_sk[], lock_stats_wr[].	2025-02-10 18:34:43 +01:00
Willy Tarreau	4168d1278c	DEBUG: thread: don't keep the redundant _locked counter Now that we have our sums by bucket, the _locked counter is redundant since it's always equal to the sum of all entries. Let's just get rid of it and replace its consumption with a loop over all buckets, this will reduce the overhead of taking each lock at the expense of a tiny extra effort when dumping all locks, which we don't care about.	2025-02-10 18:34:43 +01:00
Willy Tarreau	a22550fbd7	DEBUG: thread: report the wait time buckets for lock classes In addition to the total/average wait time, we now also store the wait time in 2^N buckets. There are 32 buckets for each type (read, seek, write), allowing to store wait times from 1-2ns to 2.1-4.3s, which is quite sufficient, even if we'd want to switch from NS to CPU cycles in the future. The counters are only reported for non- zero buckets so as not to visually pollute the output. This significantly inflates the lock_stat struct, which is now aligned to 256 bytes and rounded up to 1kB. But that's not really a problem, given that there's only one per lock label.	2025-02-10 18:34:43 +01:00
Willy Tarreau	7ddcdff33f	BUG/MEDIUM: debug: close a possible race between thread dump and panic() The rework of the thread dumping mechanism in 2.8 with commit `9a6ecbd590` ("MEDIUM: debug: simplify the thread dump mechanism") opened a small race, which is that a thread in the process of dumping other ones may block the other one from panicking while it's looping at the end of ha_thread_dump_fill(), or any other sequence involving the currently dumped one. This was emphasized in 3.1 with commit `148eb5875f` ("DEBUG: wdt: better detect apparently locked up threads and warn about them") that allowed to emit warnings about long-stuck threads, because in this case, what happens is that sometimes a thread starts to emit a warning (or a set of warnings), and while the warning is being awaited for, a panic finally happens and interrupts either the dumping thread, which never finishes and waits for the target's pointer to become NULL which will never happen since it was supposed to do it itself, or the currently dumped thread which could wait for the dumping thread to become ready while this one has not released the former. In order to address this, first we now make sure never to dump a thread that is already in the process of dumping another one. We're adding a new thread flag to know this situation, that is set in ha_thread_dump_fill() and cleared in ha_thread_dump_done(). And similarly, we don't trigger the watchdog on a thread waiting for another one to finish its dump, as it's likely a case of warning (and maybe even a panic) that makes them wait for each other and we don't want such cases to be reentrant. Finally, we check in the main polling loop that the flag never accidentally leaked (e.g. wrong flag manipulation) as this would be difficult to spot with bad consequences. This should be backported at least to 2.8, and should resolve github issue #2860. Thanks to Chris Staite for the very informative backtrace that exhibited the problem.	2025-02-10 18:34:26 +01:00
Willy Tarreau	ae540e3d9c	Revert "IMPORT: plock: export the uninlined version of the lock wait function" This reverts commit `5496d06b2b`. It breaks the build on Windows which apparently doesn't support the weak attribute well on functions. It's not a big deal anyway, playing with build options while debugging still works though it's less easy to use.	2025-02-07 19:51:15 +01:00
Willy Tarreau	b957e2f3ef	IMPORT: plock: use cpu_relax() for a shorter time in EBO Tests have shown that on modern CPUs it's interesting to wait a bit less in cpu_relax(). Till now we were looping down to 60 iterations and then switching to just barriers. Increasing the threshold to 90 iterations left before getting out of the loop improved the average and max time to grab a write lock by a few percent (e.g. 10% at 1us, 20% at 256ns or lower). Higher values tend to progressively lose that gain so let's stick to this one. This was measured on an EPYC 74F3 like previous measurements that initially led to this value, and the value might possibly depend on the mask applied to the loop counter. This is plock commit 74ca0a7307fa6aec3139f27d3b7e534e1bdb748e.	2025-02-07 18:04:29 +01:00
Willy Tarreau	253fba01a7	IMPORT: plock: lower the slope of the exponential back-off Along many tests involving both haproxy's scheduler and forwarded traffic, various exponents and algorithms were attempted for the EBO and their effects were measured. It was found that a growth in 1.25^N limited to 128k cycles consistently gives a better latency than 1.5^N limited to 256k cycles, without degrading general performance. The measures of the time to grab a write lock on a 48-thread EPYC show that the number of occurrences of low times was roughly multiplied by 2-3 while the number of occurrences of times above 64us was reduced by similar factors, to even reach 300 at 64us and limiting the maximum time by a factor of 4. The other variants that were experimented with are: m = ((m + (m >> 1)) + 2) & 0x3ffff; // original m = ((m + (m >> 1) + (m >> 3)) + 2) & 0x3ffff; m = ((m + (m >> 1) + (m >> 4)) + 2) & 0x3ffff; m = ((m + (m >> 1) + (m >> 4)) + 2) & 0x1ffff; m = ((m + (m >> 1) + (m >> 4)) + 1) & 0x1ffff; m = ((m + (m >> 2) + (m >> 4)) + 1) & 0x1ffff; // lowest CPU on pl_wr test + good perf m = ((m + (m >> 2)) + 1) & 0x1ffff; // even lower cpu usage, lowest max m = ((m + (m >> 1) + (m >> 2)) + 1) & 0x1ffff; // correct but slightly higher maxes m = ((m + (m >> 1) + (m >> 3)) + 1) & 0x1ffff; // less good than m+m>>2 m = ((m + (m >> 2) + (m >> 3)) + 1) & 0x1ffff; // better but not as good as m+m>>2 m = ((m + (m >> 3) + (m >> 4)) + 1) & 0x1ffff; // less good, lower rates on small counts. m = ((m + (m >> 2) + (m >> 3) + (m >> 4)) + 1) & 0x1ffff; // less good as well m = ((m & 0x7fff) + (m >> 1) + (m >> 4)) + 2; m = ((m & 0xffff) + (m >> 1) + (m >> 4)) + 2; This is plock commit dddd9ee01c522da33c353e2e4d4fd743d8336ec3.	2025-02-07 18:04:29 +01:00
Willy Tarreau	9dd56da730	IMPORT: plock: give higher precedence to W than S It was noticed in haproxy that in certain extreme cases, a write lock subject to EBO may fail for a very long time in front of a large set of readers constantly trying to upgrade to the S state. The reason is that among many readers, one will succeed in its upgrade, and this situation can last for a very long time with many readers upgrading in turn, while the writer waits longer and longer before trying again. Here we're taking a reasonable approach which is that the write lock should have a higher precedence in its attempt to grab the lock. What is done is that instead of fully rolling back in case of conflict with a pure S lock, the writer will only release its read part in order to let the S upgrade to W if needed, and finish its operations. This guarantees no other seek/read/write can enter. Once the conflict is resolved, the writer grabs the read part again and waits for readers to be gone (in practice it could even return without waiting since we know that any possible wanderers would leave or even not be there at all, but it avoids a complicated loop code that wouldn't improve the practical situation but inflate the code). Thanks to this change, the maximum write lock latency on a 48 threads AMD with a heavily loaded scheduler went down from 256 to 64 ms, and the number of occurrences of 32ms or more was divided by 300, while all occurrences of 1ms or less were multiplied by up to 3 (3 for the 4-16ns cases). This is plock commit b6a28366d156812f59c91346edc2eab6374a5ebd.	2025-02-07 18:04:29 +01:00
Willy Tarreau	5496d06b2b	IMPORT: plock: export the uninlined version of the lock wait function The inlining of the lock waiting function was made more easily configurable with commit 7505c2e ("plock: always expose the inline version of the lock wait function"). However, the standard one remained static, but in order to resolve the symbols in "perf top", it's much better to export it, so let's move "static" with "inline" and leave it exported when PLOCK_INLINE_EBO is not set. This is plock commit 3bea7812ec705b9339bbb0ed482a2cd8aa6c185c.	2025-02-07 18:04:29 +01:00
Christopher Faulet	eb4e517489	CLEANUP: mux-spop: Remove useless comments Just a small cleanup to remove some comments added during the development of the mux.	2025-02-06 11:19:32 +01:00
Christopher Faulet	d16c534511	MINOR: mux-spop: Report EOI on the SE when an ACK is received for a stream The spop stream now reports the end of input when the ACK is transferred to the SPOE applet. To do so, the flag SPOP_SF_ACK_RCVD was added. It is set on the SPOP stream when its ACK is received by the SPOP connection. In addition when SPOP stream flags are propagated to the SE, the error is now reported if end of input was not reached instead of testing the connection error code. It is more accurate. This patch should be backported to 3.1.	2025-02-06 11:19:32 +01:00
Frederic Lecaille	85cb1cc7f4	BUILD: ssl: remove a boringssl definition defined by recent boringssl libs This is the case for AWS-LC which derives from boringssl, where X509_OBJECT_get0_X509_CRL() is already defined. There is definitively no more need to define this function to build haproxy against TLS libs derived from boringssl.	2025-02-06 10:48:25 +01:00
Aurelien DARRAGON	0846638f7f	MEDIUM: stream: interrupt costly rulesets after too many evaluations It is not rare to see configurations with a large number of "tcp-request content" or "http-request" rules for instance. A large number of rules combined with cpu-demanding actions (e.g.: actions that work on content) may create thread contention as all the rules from a given ruleset are evaluated under the same polling loop if the evaluation is not interrupted. Thus, in this patch we add extra logic around "tcp-request content", "tcp-response content", "http-request" and "http-response" rulesets, so that when a certain number of rules are evaluated under the single polling loop, we force the evaluating function to yield. As such, the rule which was about to be evaluated is saved, and the function starts evaluating rules from the save pointer when it returns (in the next polling loop). We use task_wakeup(task, TASK_WOKEN_MSG) to explicitly wake the task so that no time is wasted and the processing is resumed ASAP. TASK_WOKEN_MSG is mandatory here because process_stream() expects TASK_WOKEN_MSG for explicit analyzers re-evaluation. rules_bcount stream's attribute was added to count how many rules were evaluated since last interruption (yield). Also, SF_RULE_FYIELD flag was added to know that the s->current_rule was assigned due to forced yield and not regular yield. By default haproxy will enforce a yield every 50 rules, this behavior can be configured using the "tune.max-rules-at-once" global keyword. There is a limitation though: for now, if the ACT_OPT_FINAL flag is set on act_opts, we consider it is not safe to yield (as it is already the case for automatic yield). In this case instead of yielding and taking the risk of not being called back, we skip the yield and hope it will not create contention. This is something we should ideally try to improve in order to yield in all conditions.	2025-02-03 17:09:48 +01:00
Christopher Faulet	5f927f603a	BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records A Read0 event could be ignored by the FCGI multiplexer if it is blocked on a partial record. Instead of handling the event, it remained blocked, waiting for the end of the record. To fix the issue, the same solution as for the H2 multiplexer is used. Two flags are introduced. The first one, FCGI_CF_END_REACHED, is used to acknowledge a read0. This flag is set when a read0 was received AND the FCGI multiplexer must handle it. The second one, FCGI_CF_DEM_SHORT_READ, is set when the demux is interrupted on a partial record. A short read and a read0 lead to set the FCGI_CF_END_REACHED flag. With these changes, the FCGI mux should be able to properly handle read0 on partial records. This patch should be backported to all stable versions after a period of observation.	2025-02-03 07:49:50 +01:00
Christopher Faulet	71320fc9c1	MINOR: tevt/connection: Add support for POLL_HUP/POLL_ERR events Connection errors can be detected via connect/recv/send syscall, but also because it was reported by the poller. So dedicated events, at the FD level, are introduced to make the difference. term_events tool was updated accordingly.	2025-01-31 10:41:50 +01:00
Christopher Faulet	990854ee0d	REORG: tevt/connection: Move enums at the end of the header file Enums used to report events were placed in the connection header for convenience. But they are not specifically related to connections. So, they are moved at the end of the file to have a better isolation.	2025-01-31 10:41:50 +01:00
Christopher Faulet	487d6b09f1	MINOR: tevt: Improve function to convert a termination events log to string The function is now responsible for handling an empty log, when no event was reported. In that case, an empty string is returned. It is also responsible for handling the case where the termination events log is not supported for a given entity (for instance the quic mux for now). In that case, a dash ("-") is returned.	2025-01-31 10:41:50 +01:00
Christopher Faulet	cbd898c42b	MINOR: tevt: Don't duplicate termination event during reporting It is hard to never detect the same event several times without painful tests. In other words, the same termination event can be reported several times and this must be handled. To do so, "tevt_report_event" macro is updated to ignore an event if the last reported one is of the same type, for the same location. Of course, if the same event is reported several times at different moments, it will not be detected.	2025-01-31 10:41:50 +01:00
Christopher Faulet	2dc02f75b1	MEDIUM: tevt/stconn/stream: Add dedicated termination events for stream location This is the last patch to introduce dedicated termination events for each location. In this one, events for the stream location are introduced. The old enum is also removed because it is now unused. Here, more accurate events are added. The "intercepted" event was split.	2025-01-31 10:41:50 +01:00
Christopher Faulet	a58e650ad1	MEDIUM: tevt/muxes: Add dedicated termination events for muxc/se locations Termination events dedicated to mux connection and stream-endpoint descriptors are added in this patch. Specific events to these locations are thus added. Changes for the H1 and H2 multiplexers are reviewed to be more accurate.	2025-01-31 10:41:50 +01:00
Christopher Faulet	f2778ccc7d	MINOR: tevt/connection: Add dedicated termination events for lower locations To be able to add more accurate termination events for each location, the enum will be split by location. Indeed, there are at most 16 possible events. It would be pretty confusing to use the same termination events for the different locations. So the best is to split them. In this patch, the termination events for the fd, hs and xprt locations are introduced. For now some holes are added to keep similar events aligned across enums. But this may change in future.	2025-01-31 10:41:50 +01:00
Christopher Faulet	a4c281a190	MINOR: tevt/muxes: Add CTL and SCTL command to get the termination event logs MUX_CTL_TEVTS command is added to get the termination event logs of a mux connection and MUX_SCTL_TEVTS command to get the termination event logs of a mux stream.	2025-01-31 10:41:50 +01:00
Christopher Faulet	00a07c8b54	MINOR: tevt/stream/stconn: Report termination events for stream and sc In this patch, events for the stream location are reported. These events are first reported on the corresponding stream-connector. So front events on scf and back events on scb. Then all events are merged in the stream. But only 4 events are saved on the stream. Several internal events are for now grouped with the type "tevt_type_intercepted". More events will be added to have a better resolution. But at least the places to report these events are identified. For now, when an event is reported on a SC, it is also reported on the stream and vice versa.	2025-01-31 10:41:50 +01:00
Christopher Faulet	992b4b9726	MINOR: tevt/stconn: Add a termination events log in the SE descriptor This termination events log will be used to report events from the mux streams. The location will be "tevt_loc_se" and the muxes will be responsible to report the corresponding events.	2025-01-31 10:41:50 +01:00
Christopher Faulet	e944944990	MINOR: tevt: Add the termination events log's fundations Termination events logs will be used to report the events that led to close a connection. Unlike flags, that reflect a state, the idea here is to store a log to preserve the order of the events. Most of the time, when debugging an issue, the order of the events is crucial to be able to understand the root cause of the issue. The traces are truly helpful to do so. But it is not always possible to activate them because they are pretty verbose. On heavily loaded platforms, it is not acceptable. We hope that the termination events logs will help us in those situations. One termination events log will be stored at each layer (connection, mux connection, mux stream...) as a 32-bit integer. Each event will be stored on 8 bits, 4 bits for the location and 4 bits for the type. So only the first four events will be stored for each layer. It should be enough to understand why a connection is closed. In this patch, the enums defining the termination event locations and types are added. The macro to report a new event is also added, as well as a function to convert a termination events log to a string that could be displayed in log messages for instance.	2025-01-31 10:41:49 +01:00
Christopher Faulet	e56e718c82	MINOR: mux-h1: Add masks to group H1S DEMUX and MUX errors It is just a small patch to clean up mux/demux functions. Instead of listing the H1S errors that must be handled during demux or mux operations, masks of flags are used. It is more readable.	2025-01-31 10:41:49 +01:00
Willy Tarreau	d155924efe	MINOR: fd: add a generation number to file descriptors This patch adds a counter of close() on file descriptors in the fdtab. The goal is to better detect if reported events concern the current or a previous file descriptor. For now the counter is only added, and is shown in "show fd" as "gen". We're reusing unused space at the end of the struct. If it's needed for something more important later, this patch can be reverted.	2025-01-30 19:45:34 +01:00
Willy Tarreau	44ac7a7e73	DEBUG: fd: add a counter of takeovers of an FD since it was last opened That's essentially in order to help with debugging strange cases like the occasional epoll issues/races, by keeping a counter of how many times an FD was taken over since last inserted. The room is available so let's use it. If it's needed later, this patch can easily be reverted. The counter is also reported in "show fd" as "tkov".	2025-01-30 19:45:34 +01:00
Amaury Denoyelle	b849ee5fa3	BUILD: quic: fix overflow in global tune A new global option was recently introduced to disable pacing. However, the value used (1<<31) caused issues with some compilers as the options field used for storage is declared as int. Move the pacing deactivation flag into the newly defined quic_tune to fix this. This should be backported up to 3.1 after a period of observation. Note that it relies on the previous patch which defined the new quic_tune type.	2025-01-30 18:12:53 +01:00
Amaury Denoyelle	09e9c7d5b7	MINOR: quic: define quic_tune Define a new structure quic_tune. It will be useful to regroup various configuration settings and tunables related to QUIC, instead of defining them into the global structure.	2025-01-30 18:12:40 +01:00
Amaury Denoyelle	0c8b54b2d1	MINOR: quic: transform pacing settings into a global option Pacing support was previously activated on each bind line individually, via an optional argument of quic-cc-algo keyword. Remove this optional argument and introduce a global setting to enable/disable pacing. Pacing activation is still flagged as experimental. One important change is that previously BBR usage automatically activated pacing support. This is not the case anymore, so users should now always explicitly activate pacing if BBR is selected. A new warning message will be displayed if this is not the case. Another consequence of this change is that now pacing_inter callback is always defined for every quic_cc_algo types. As such, QUIC MUX uses global.tune.options to determine if pacing is required. This should be backported up to 3.1, after a period of observation.	2025-01-30 17:19:38 +01:00
William Lallemand	b43e5d8c16	BUILD: ssl: more cleaner approach to WolfSSL without renegotiation Patch discussed in https://github.com/wolfSSL/wolfssl/issues/6834 When building WolfSSL without renegotiation options, WolfSSL still defines the macros about it, which causes warnings during the build. This patch completes the previous one by undefining the macros so that haproxy can build without any warning.	2025-01-28 20:55:20 +01:00
William Lallemand	c6a8279cdf	BUILD: ssl: allow to build without the renegotiation API of WolfSSL In ticket https://github.com/wolfSSL/wolfssl/issues/6834, it was suggested to push --enable-haproxy within --enable-distro. WolfSSL does not want to include the renegotiation support in --enable-distro. To achieve this, let haproxy build without SSL_renegotiate_pending() when wolfssl does not define HAVE_SECURE_RENEGOCIATION or HAVE_SERVER_RENEGOCIATION_INFO.	2025-01-28 18:31:32 +01:00
Willy Tarreau	f17b0a994b	BUILD: tools: fix build on BSD by dropping the ETIME check Commit `44537379fc` ("MINOR: tools: add errname to print errno macro name") brought a facility to report errno using a symbolic string when known instead of showing only the value. However, among the listed options, ETIME is mentioned but is unknown from FreeBSD where it breaks the build. Let's simply drop it, we don't use ETIME anyway and even if it would be reported, the default code path still reports the numeric value so there's no harm. If other ones fail to build in the future, they could be handled the same way.	2025-01-28 15:58:57 +01:00
Christopher Faulet	36d151dc10	MEDIUM: stream: No longer use TASK_F_UEVT* to shut a stream down Thanks to the previous patch, it is now possible to explicitly rely on stream's events to shut it down. The right event is set in stream_shutdown(), before waking up the stream, via an atomic operation. In process_stream(), this event will be handled as expected. Thus, TASK_F_UEVT* are no longer used, but not removed since still usable for other tasks. This patch depends on "MEDIUM: stream: Map task wake up reasons to dedicated stream events".	2025-01-28 14:53:37 +01:00
Christopher Faulet	6048460102	MEDIUM: stream: Map task wake up reasons to dedicated stream events To fix thread-safety issues when a stream must be shut, three new task states were added. These states are generic (UEVT1, UEVT2 and UEVT3), the task callback function is responsible to know what to do with them. However, it is not really scalable. The best is to use an atomic field in the stream structure itself to deal with these dedicated events. There is already the "pending_events" field that saves wake up reasons (TASK_WOKEN_) to not lose them if process_stream() is interrupted before it had a chance to handle them. So the idea is to introduce a new field to handle streams dedicated events and merge them with the task's wake up reasons used by the stream. This means a mapping must be performed between some task wake up reasons and streams events. Note that not all task wake up reasons will be mapped. In this patch, the "new_events" field is introduced. It is an atomic bit-field. Streams events (STRM_EVT_) are also introduced to map the task wake up reasons used by process_stream(). Only TASK_WOKEN_TIMER and TASK_WOKEN_MSG are mapped, in addition to TASK_F_UEVT* flags. In process_stream(), "pending_events" field is now filled with new stream events and the mapping of the wake up reasons.	2025-01-28 14:53:37 +01:00
Christopher Faulet	0a52a75ef7	BUG/MINOR: stream: Properly handle "on-marked-up shutdown-backup-sessions" shutdown-backup-sessions action for on-marked-up directive does not work anymore since the stream_shutdown() function was modified to be async-safe. When stream_shutdown() was modified to be async-safe, dedicated task events were added to map the reasons to shut a stream down. SF_ERR_DOWN was mapped to TASK_F_EVT1 and SF_ERR_KILLED was mapped to TASK_F_EVT2. The reverse mapping was performed by process_stream() to shut the stream with the appropriate reason. However, SF_ERR_UP reason, used by shutdown-backup-sessions action to shut a stream down because a preferred server became available, was not mapped in the same way. So since commit `b8e3b0a18d` ("BUG/MEDIUM: stream: make stream_shutdown() async-safe"), this action is ignored and does not work anymore. To fix the issue, and to be able to backport the fix, a third task event was added. TASK_F_EVT3 is now mapped on SF_ERR_UP. This patch should fix the issue #2848. It must be backported as far as 2.6.	2025-01-28 14:53:37 +01:00
Olivier Houchard	26b3e5236f	MEDIUM: servers/proxies: Switch to using per-tgroup queues. For both servers and proxies, use one connection queue per thread-group, instead of only one. Having only one can lead to severe performance issues on NUMA machines: it is actually trivial to get the watchdog to trigger on an AMD machine, with a server that has a maxconn of 96 and an injector that uses 160 concurrent connections. We now have one queue per thread-group, however when dequeuing, we're dequeuing MAX_SELF_USE_QUEUE (currently 9) pendconns from our own queue, before dequeuing one from another thread group, if available, to make sure everybody is still running.	2025-01-28 12:49:41 +01:00
Olivier Houchard	583303c48b	MINOR: proxies/servers: Calculate queueslength and use it. For both proxies and servers, properly calculate queueslength, which is the total number of elements across all queues (as they currently are only using one queue, it is equivalent to the number of elements in that queue), and use it instead of the queue's length.	2025-01-28 12:49:41 +01:00
Olivier Houchard	59eddabe16	MINOR: Add fields to the per-thread group field in struct server. Add a per-thread group queue and associated fields in per-thread group field in struct server, as well as a new field, queues length. This is currently unused, so should change nothing.	2025-01-28 12:49:41 +01:00
Olivier Houchard	f879b9a18a	MINOR: proxies: Add a per-thread group field to struct proxy. Add a per-thread group field to struct proxy, that will contain a struct queue, as well as a new field, "queueslength". This is currently unused, so should change nothing. Please note that proxy_init_per_thr() must now be called for each proxy once the thread groups number is known.	2025-01-28 12:49:41 +01:00
Aurelien DARRAGON	e768a531b7	CLEANUP: tree-wide: define and use acl_match_cond() helper acl_match_cond() combines acl_exec_cond() + acl_pass() and a check on the condition->pol (to check if the cond is inverted) in order to return either 0 if the cond doesn't match or 1 if it matches (or is NULL). Thanks to this we can actually simplify some redundant constructs that iterate over rules and evaluate if the condition matches or not. Conditions for tcp-request inspect-content and tcp-response inspect-content couldn't be simplified because they perform an extra check for missing data, and thus still need to leverage acl_exec_cond(). It's best to display the patch using "-w", like "git show xxxx -w", because some blocks had to be re-indented after the cleanup, which makes the patch hard to review by default.	2025-01-27 11:11:43 +01:00
Valentine Krasnobaeva	94d3b7375a	CLEANUP: ssl: move ssl_sock_gencert_load_ca declaration in ssl_gencert.h As ssl_sock_gencert_load_ca and ssl_sock_gencert_free_ca are compiled only if SSL_NO_GENERATE_CERTIFICATES is not defined, let's align it and move these declarations in ssl_gencert.h.	2025-01-24 12:31:07 +01:00
Valentine Krasnobaeva	846819b316	CLEANUP: ssl: rename ssl_sock_load_ca to ssl_sock_gencert_load_ca ssl_sock_load_ca is defined in ssl_gencert.c and compiled only if SSL_NO_GENERATE_CERTIFICATES is not defined. Its name is a bit confusing, as we may think at first glance that it's a generic function, which is also used to load the CA file provided via the 'ca-file' keyword. ssl_set_verify_locations_file is used in this case. So let's rename ssl_sock_load_ca into ssl_sock_gencert_load_ca. Same is applied to ssl_sock_free_ca.	2025-01-24 12:31:07 +01:00
Valentine Krasnobaeva	44537379fc	MINOR: tools: add errname to print errno macro name Add helper to print the name of errno's corresponding macro, for example "EINVAL" for errno=22. This may be helpful for debugging and for using in some CLI commands output. The switch-case in errname() contains only the errnos currently used in the code. So, it needs to be extended, if one starts to use new syscalls.	2025-01-24 09:54:57 +01:00
Amaury Denoyelle	7896edccdc	MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path Pacing burst size is now dynamic. As such, configuration value has been removed and related fields in bind_conf and quic_cc_path structures can be safely removed. This should be backported up to 3.1.	2025-01-23 17:40:48 +01:00
Amaury Denoyelle	cb91ccd8a8	MEDIUM: quic: use dynamic credit for pacing Major improvements have been introduced in pacing recently. Most notably, QMUX schedules emission on a millisecond resolution, which allows the use of passive wait and is much more CPU friendly. However, an issue remains with the pacing max credit. Unless BBR is used, it is fixed to the configured value from quic-cc-algo bind statement. This is not practical: if too low, it may drastically reduce performance due to 1ms sleep resolution. If too high, some clients will suffer from too much packet loss. This commit fixes the issue by implementing a dynamic maximum credit value based on the network condition specific to each client. Calculation is done to fix a maximum value which should allow QMUX current tasklet context to emit enough data to cover the delay with the next tasklet invocation. As such, avg_loop_us is used to detect the process load. If too small, 1.5ms is used as minimal value, to cover the extra delay incurred by the system which will happen for a default 1ms sleep. This should be backported up to 3.1.	2025-01-23 17:40:48 +01:00
Amaury Denoyelle	8098be1fdc	MEDIUM: mux-quic: reduce pacing CPU usage with passive wait Pacing algorithm has been revamped in the previous commit to implement a credit based solution. This is a far more adaptive solution, which in particular allows to catch up in case the pause between pacing emissions was longer than expected. This allows QMUX to remove the active loop based on tasklet wake-up. Instead, a new task is used when emission should be paced. The main advantage is that CPU usage is drastically reduced. New pacing task timer is reset each time qcc_io_send() is invoked. Timer will be set only if pacing engine reports that emission must be interrupted. In this case timer is set via qcc_wakeup_pacing() to the delay reported by congestion algorithm, or 1ms if delay is too short. At the end of qcc_io_cb(), pacing task is queued if timer has been set. Pacing task execution is simple enough: it immediately wakes up QCC I/O handler. Note that to have decent performance, a large enough burst must be defined in the configuration of quic-cc-algo. However, this value is common to every listener clients, which may cause too much loss under some network conditions. This will be addressed in a future patch. This should be backported up to 3.1.	2025-01-23 17:40:22 +01:00
Amaury Denoyelle	4489a61585	MEDIUM: quic: implement credit based pacing Implement a new method for QUIC pacing emission based on credit. This represents the number of packets which can be emitted in a single burst. After emission, decrement from the credit the number of emitted packets. Several emissions can be conducted in the same sequence until the credit is completely decremented. When a new emission sequence is initiated (i.e. under a new QMUX tasklet invocation), credit is refilled according to the delay which occurred between the last and current emission context. The main advantage of this new mechanism is that it allows to conduct several emissions in the same task context without having to wait between each invocation. Wait is only forced if pacing is expired, which is now equivalent to having a null credit. Furthermore, if the delay between two emission sequences has been smaller than expected, credit is only partially refilled. This allows to restart emission without having to wait for the whole credit to be available. On the implementation side, a new field <credit> is available in quic_pacer structure. It is automatically decremented on quic_pacing_sent_done() invocation. Also, a new function quic_pacing_reload() must be used by QUIC MUX when a new emission sequence is initiated to refill credit. <next> field from quic_pacer has been removed. For the moment, credit is based on the burst configured via quic-cc-algo keyword, or directly reported by BBR. This should be backported up to 3.1.	2025-01-23 17:40:20 +01:00
Amaury Denoyelle	bbaa7aef7b	BUG/MINOR: quic: do not increase congestion window if app limited Previously, the congestion window was increased each time a new acknowledgment was received. However, it did not take into account the window filling level. In a network condition with negligible loss, this will cause the window to be incremented until the maximum value (by default 480k), even though the application does not have enough data to fill it. In most cases, this issue is not noticeable. However, it may lead to excessive memory consumption when a QUIC connection is suddenly interrupted, as in this case haproxy will fill the window with retransmission. It even has caused OOM crash when thousands of clients were interrupted at once on a local network benchmark. Fix this by first checking window level prior to every incrementation via a new helper function quic_cwnd_may_increase(). It was arbitrarily decided that the window must be at least 50% full when the ACK is handled prior to increment it. This value is a good compromise to keep window in check while still allowing fast increment when needed. Note that this patch only concerns cubic and newreno algorithm. BBR has already its notion of application limited which ensures the window is only incremented when necessary. This should be backported up to 2.6.	2025-01-23 14:49:35 +01:00
Amaury Denoyelle	7c0820892f	MINOR: quic: rename pacing_rate cb to pacing_inter Rename one of the congestion algorithms pacing callback from pacing_rate to pacing_inter. This better reflects that this function returns a delay (in nanoseconds) which should be applied between each packet emission to fill the congestion window with a perfectly smoothed emission. This should be backported up to 3.1.	2025-01-23 14:49:35 +01:00
Amaury Denoyelle	2178bf1192	CLEANUP: quic: remove unused prototype Remove undefined quic_pacing_send() function prototype from quic_pacing module. This should be backported up to 3.1.	2025-01-23 14:49:35 +01:00
Frederic Lecaille	4f38c4bfd8	MINOR: quic: Add a BUG_ON() on quic_tx_packet refcount It is definitely a bug to call quic_tx_packet_refdec() to decrement the reference counter of a TX packet, and possibly to release its memory, when this counter is already negative or null. This counter is incremented when a TX frm is attached to it with some allocated memory and when the packet is inserted into a data structure, if needed (list or tree). Should be easily backported as far as 2.6 to ease any further backport around this code part.	2025-01-21 22:01:34 +01:00
Frederic Lecaille	cb729fb64d	BUG/MINOR: quic: ensure a detached coalesced packet can't access its neighbours Reset ->prev and ->next fields of a coalesced TX packet to ensure it cannot access several times its neighbours after it is supposed to be detached from them calling quic_tx_packet_dgram_detach(). There are two cases where a packet can be coalesced to another previously built one: this is when it is built into the same datagram without GSO (and flagged with QUIC_FL_TX_PACKET_COALESCED) or when sent from the same sendto() syscall with GSO (not flagged with QUIC_FL_TX_PACKET_COALESCED). This fix may be in relation with GH #2839. Must be backported as far as 2.6.	2025-01-21 22:01:34 +01:00
Willy Tarreau	b066c0affb	REORG: version: move the remaining BUILD_* stuff from haproxy.c to version.c version.c tries to centralize all variables conveying version information, but there's still an issue with the BUILD_* variables which are only passed to haproxy.o and are only updated when that one is rebuilt. This is not very logical given that we can end up with values there which contradict info from version.c. Better move all of these to version.c which is systematically rebuilt. Most of these variables only end up as string concatenation at the moment. Some of them are even duplicated. In version.c we now have one variable (or constant) for each of them and haproxy.c references them in messages. This is much more logical and easier to maintain in a consistent state. The patch looks a bit large but it really only moves the ifdefed string assignment from one file to another, placing them into variables.	2025-01-20 17:53:55 +01:00
Amaury Denoyelle	a50dd07c16	MINOR: trace: ensure -dt priority over traces config section Traces can be activated on startup either via -dt command line argument or via the traces configuration section. This can cause confusion as it may not be clear how a trace source can be completed or overridden by one or the other. Fix the precedence to give the priority to the command line argument. Now, each trace source configured via -dt is first reset to a default state before applying new settings. Then, it is impossible to change a trace source via the configuration file if it was already targeted via -dt argument.	2025-01-10 14:50:59 +01:00
Willy Tarreau	b25850f25b	MINOR: tools: add a few functions to simply check for a file's existence At many places we'd like to be able to simply construct a path from a format string and check if that path corresponds to an existing file, directory etc. Here we add 3 functions, a generic one to test that a path corresponds to a given file mode (e.g. S_IFDIR, S_IFREG etc), and two other ones specifically checking for a file or a dir for easier use.	2025-01-09 09:18:49 +01:00
Willy Tarreau	bd06502b22	BUILD: makefile: add a qinfo macro to pass info in quiet mode Some commands such as $(cmd_CC) etc already handle the quiet vs verbose mode in the makefile, but sometimes we may want to pass other info. The new "qinfo" macro can be called with a 9-char string argument (spaces included) as a prefix for some commands, to emit that string when in quiet mode. The caller must fill the spaces needed for alignment. E.g: $(call qinfo, CC )$(CC) ...	2025-01-08 11:26:05 +01:00
Amaury Denoyelle	af00be8e0f	MINOR: mux-quic: change return value of qcs_attach_sc() A recent fix was introduced to ensure that a streamdesc instance won't be attached to an already completed QCS which is eligible to purging. This was performed by skipping application protocol decoding if a QCS is in such a state. Here is the patch responsible for this change. `caf60ac696` BUG/MEDIUM: mux-quic: do not attach on already closed stream However, this is too restrictive, in particular for unidirectional streams where no streamdesc is ever attached. To fix this behavior, first qcs_attach_sc() API has been modified. Instead of returning a streamdesc instance, it returns either 0 on success or a negative error code. There should be no functional changes with this patch. It is only to be able to extend qcs_attach_sc() with the possibility of skipping streamdesc instantiation while still keeping a success return value. This should be backported wherever the above patch has been merged. For the record, it was scheduled for immediate backport on 3.1, plus merging on older releases up to 2.8 after a period of observation.	2025-01-03 17:19:21 +01:00
Willy Tarreau	f486f976c7	BUILD: limits: make normalize_rlim() take an rlim_t to fix build on m68k As can be seen here, the build fails on m68k since commit `665dde648` ("MINOR: debug: use LIM2A to show limits") in 3.1: https://github.com/haproxy/haproxy/actions/runs/12440234399/job/34735360177 The reason is the comparison between a ulong limit and RLIM_INFINITY. Indeed, on m68k, rlim_t is an unsigned long long. Let's just change the function's input type to take an rlim_t instead. This also allows to get rid of the casts in the call place. This can be backported to 3.1 though it's not important given the low prevalence of this platform for such use cases.	2024-12-25 12:33:06 +01:00
Willy Tarreau	f78121dd32	BUILD: compat: add missing fcntl.h before defining F_SETPIPE_SZ In 1.5-dev8, 13 years ago, support for setting pipe size was added by commit `bd9a0a778` ("OPTIM/MINOR: make it possible to change pipe size (tune.pipesize)"). For compatibility purposes, it was defining F_SETPIPE_SZ in compat.h if it was not set. It apparently always had F_SETPIPE_SZ defined before being included. Now in 3.2-dev1, commit `fbc534a6f` ("REORG: startup: move nofile limit checks in limits.c") reordered a few includes and ended up with mworker-prog.c including compat.h before fcntl.h, causing a redefinition error on certain libcs: CC src/mworker-prog.o In file included from /usr/include/bits/fcntl.h:61:0, from /usr/include/fcntl.h:35, from include/haproxy/limits.h:11, from include/haproxy/mworker.h:18, from src/mworker-prog.c:27: /usr/include/bits/fcntl-linux.h:203:0: warning: "F_SETPIPE_SZ" redefined [enabled by default] In file included from include/haproxy/api-t.h:35:0, from include/haproxy/api.h:33, from src/mworker-prog.c:23: include/haproxy/compat.h:161:0: note: this is the location of the previous definition Let's simply include fcntl.h in compat.h before the macro is redefined. There's normally no need to backport this, though it's harmless to do it if needed.	2024-12-25 11:53:11 +01:00
Olivier Houchard	505480eeef	CLEANUP: Remove pendconn_must_try_again(). Remove pendconn_must_try_again(), now that it no longer is used.	2024-12-24 14:10:06 +01:00
Olivier Houchard	cda7275ef5	MEDIUM: queue: Handle the race condition between queue and dequeue differently There is a small race condition between a server checking whether there is something left in the proxy queue and a stream being added to the proxy queue. If the server checks just before the stream is added to the queue, and it no longer has any stream to deal with, then nothing will take care of the stream, that may stay in the queue forever. This was worked around with commit `5541d4995d`, by checking for that exact condition after adding the stream to the queue, and trying again to get a server assigned if it is detected. That fix led to multiple infinite loops, that got fixed, but it is not unlikely that it could happen again. So let's fix the initial problem differently: a single server may mark itself as ready, and it removes itself once used. The principle is that when we discover that the just queued stream is alone with no active request anywhere to dequeue it, instead of rebalancing it, it will be assigned to that current "ready" server that is available to handle it. The extra cost of the atomic ops is negligible since the situation is super rare.	2024-12-24 14:10:06 +01:00
Olivier Houchard	5b8899b6cc	BUG/MEDIUM: queue: Make process_srv_queue return the number of streams Make process_srv_queue() return the number of streams unqueued, as pendconn_grab_from_px() did, as that number is used by srv_update_status() to generate logs. This should be backported up to 2.6 with `111ea83ed4`	2024-12-23 15:03:40 +01:00
William Lallemand	056ec51c26	MEDIUM: ssl/ocsp: counters for OCSP stapling Add 2 counters in the SSL stats module for OCSP stapling. - ssl_ocsp_staple is the number of OCSP responses successfully stapled with the handshake - ssl_failed_ocsp_stapled is the number of OCSP responses that we couldn't staple, either because of an error or because the response is expired. These counters are incremented in the OCSP stapling callback, so if no OCSP was configured they will never increase. Also they are only working in frontends. This was discussed in github issue #2822.	2024-12-23 11:23:00 +01:00
William Lallemand	0e6af97233	MINOR: ssl: change visibility of ssl_stats_module In order to add stats from other files, the ssl_stats_module needs to be visible from other files. This moves the ssl_counters definition in ssl_sock-t.h and removes the static of ssl_stats_module.	2024-12-23 11:23:00 +01:00
William Lallemand	acb2c9eb8b	MINOR: ssl: improve HAVE_SSL_OCSP ifdef Allow to build correctly without OCSP. It could be disabled easily with an OpenSSL built with OPENSSL_NO_OCSP, or even with DEFINE="-DOPENSSL_NO_OCSP" on the haproxy make line.	2024-12-19 10:53:05 +01:00
Remi Tricot-Le Breton	93f2c73423	MINOR: ssl/ocsp: Add extra details in error logs when possible When the ocsp response auto update process fails during insertion or while validating the received ocsp response, we call ssl_sock_update_ocsp_response or ssl_ocsp_check_response respectively and both these functions take an 'err' parameter in which detailed error messages can be written. Until now, those error messages were discarded and the only information given to the user was a generic error (ERR_CHECK or ERR_INSERT) which does not help much. We now keep a pointer to the last error message in the certificate_ocsp structure and dump its content in the update logs as well as in the "show ssl ocsp-updates" cli command. This issue was raised in GitHub #2817.	2024-12-18 10:41:16 +01:00
Amaury Denoyelle	9d155ca706	MINOR: trace: implement tracing disabling API Define a set of functions to temporarily disable/reactivate tracing for the current thread. This could be useful when wanting to quickly remove tracing output for some code parts. The API relies on a disable/resume set of functions, with a thread-local counter. This counter is tested under __trace_enabled(). It is a cumulative value so that the same count of resume must be issued after several disable usage. There is also the possibility to force reset the counter to 0 before restoring the old value. This should be backported up to 3.1.	2024-12-18 09:52:06 +01:00
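The disable/resume counter described in the entry above can be sketched as follows. All names here (`my_trace_*`, `trace_disable_ctr`) are hypothetical and chosen for illustration; they are not HAProxy's actual API. The sketch only assumes what the commit message states: a thread-local cumulative counter, tested when deciding whether tracing is enabled, plus a forced reset that returns the old value for later restoration.

```c
#include <assert.h>

/* Hypothetical per-thread counter: tracing is active only when it is zero. */
static _Thread_local unsigned int trace_disable_ctr;

/* Each call disables tracing one more level for the current thread. */
static inline void my_trace_disable(void)
{
    trace_disable_ctr++;
}

/* Undo one disable; as many resumes as disables are needed to re-enable. */
static inline void my_trace_resume(void)
{
    if (trace_disable_ctr)
        trace_disable_ctr--;
}

/* Force tracing back on and return the previous count so it can be restored. */
static inline unsigned int my_trace_force_reset(void)
{
    unsigned int old = trace_disable_ctr;

    trace_disable_ctr = 0;
    return old;
}

/* Restore a count previously returned by my_trace_force_reset(). */
static inline void my_trace_restore(unsigned int old)
{
    trace_disable_ctr = old;
}

/* Equivalent of the check performed before emitting a trace. */
static inline int my_trace_enabled(void)
{
    return trace_disable_ctr == 0;
}

/* Usage example: nested disables need the same number of resumes. */
static void my_trace_demo(void)
{
    unsigned int saved;

    my_trace_disable();
    my_trace_disable();
    my_trace_resume();
    assert(!my_trace_enabled());     /* still one disable pending */

    saved = my_trace_force_reset();  /* temporarily re-enable tracing */
    assert(my_trace_enabled());
    my_trace_restore(saved);

    my_trace_resume();
    assert(my_trace_enabled());
}
```

A counter rather than a boolean lets nested code sections disable tracing independently without one of them re-enabling it too early for the other.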
Amaury Denoyelle	e296585ae9	MEDIUM/OPTIM: mux-quic: implement purg_list This commit is part of the current series which aims to refactor and improve overall performance of QUIC MUX I/O handler. qcc_io_process() is responsible to perform some internal operations on QUIC MUX after I/O completion. It is notably called on every qcc_io_cb() tasklet handler. The most intensive work on it is the purging of QCS instances after transfer completion. This was implemented by looping on QCC streams tree and inspecting the state of every QCS. The purpose of this commit is to optimize this processing. A new purg_list QCC member is defined. It is responsible to list every QCS instance whose transfer has been completed. It is thus safe to reuse <el_send> QCS list attach point. Stream purging will thus only loop on purg_list instead of every known QCS. This should be backported up to 3.1.	2024-12-18 09:33:52 +01:00
Amaury Denoyelle	4b42dd4ae0	MEDIUM/OPTIM: mux-quic: define a recv_list for demux resumption This commit is part of the current series which aims to refactor and improve overall performance of QUIC MUX I/O handler. Define a recv_list element into qcc structure. This is used to register every instance of qcs which is currently blocked on demuxing, which happens when no more space is available in <rx.appbuf>. The purpose of this patch is to reduce qcc_io_recv() CPU usage. Now, only recv_list iteration is performed, instead of the previous looping over every qcs instance. This is useful as qcc_io_recv() is called each time qcc_io_cb() is scheduled, even if only sending condition was the wakeup origin. A qcs is not inserted into recv_list immediately after blocking on demux full buffer. Instead, this is only done after unblocking via stream rcv_buf callback, which ensures that new buffer space is available. This should be backported up to 3.1.	2024-12-18 09:23:41 +01:00
Amaury Denoyelle	0a53a008d0	MINOR: mux-quic: refactor wait-for-handshake support This commit refactors wait-for-handshake support from QUIC MUX. The flag logic QC_CF_WAIT_HS is inverted: it is now positioned only if MUX is instantiated before handshake completion. When the handshake is completed, the flag is removed. The flag is now set directly on initialization via qmux_init(). Removal via qcc_wait_for_hs() is moved from qcc_io_process() to qcc_io_recv(). This is deemed more logical as QUIC MUX is scheduled on RECV to be notified by the transport layer about handshake termination. Moreover, qcc_wait_for_hs() is now called if recv subscription is still active. This commit is the first of a series which aims to refactor QUIC MUX I/O handler and improve its overall performance. The ultimate objective is to be able to stream qcc_io_cb() by removing pacing specific code path via qcc_purge_sending(). This should be backported up to 3.1.	2024-12-18 09:23:41 +01:00
Amaury Denoyelle	17bfe93768	CLEANUP: mux-quic: remove unused qcc member send_retry_list Remove unused fields send_retry_list from qcc and its corresponding attach element el from qcs. This should be backported up to 3.1.	2024-12-18 09:20:20 +01:00
Willy Tarreau	7b6acb6a51	MINOR: bug: make BUG_ON() fall back to ASSUME When the strict level is zero and BUG_ON() is not implemented, some possible null-deref warnings are emitted again because some were covering for these cases. Let's make it fall back to ASSUME() so that the compiler continues to know that the tested expression never happens. It also allows to further optimize certain functions by helping the compiler eliminate certain tests for impossible values. However it requires that the expression is really evaluated before passing the result through ASSUME() otherwise it was shown that gcc-11 and above will fail to evaluate its implications and will continue to emit the null-deref warnings in case the expression is non-trivial (e.g. it has multiple terms). We don't do it for BUG_ON_HOT() however because the extra cost of evaluating the condition is generally not welcome in fast paths, particularly when that BUG_ON_HOT() was kept disabled for performance reasons.	2024-12-17 17:39:12 +01:00
Willy Tarreau	63798088b3	MINOR: compiler: add ASSUME_NONNULL() to tell the compiler a pointer is valid At plenty of places we have ALREADY_CHECKED() or DISGUISE() on a pointer just to avoid "possibly null-deref" warnings. These ones have the side effect of weakening optimizations by passing through an assembly step. Using ASSUME_NONNULL() we can avoid that extra step. And when the __builtin_unreachable() builtin is not present, we fall back to the old method using assembly. The macro returns the input value so that it may be used both as a declarative way to claim non-nullity or directly inside an expression like DISGUISE().	2024-12-17 16:46:46 +01:00
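A macro in the spirit of the ASSUME_NONNULL() entry above might look like the sketch below. `MY_ASSUME_NONNULL` is a hypothetical name, not HAProxy's definition, and the sketch assumes a gcc or clang compiler, since it relies on GNU statement expressions, `__typeof__` and `__builtin_unreachable()`. It does not reproduce the assembly-based fallback mentioned in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: tell the compiler that <p> is never NULL and return
 * it, so the macro can be used as a statement or inside an expression. */
#define MY_ASSUME_NONNULL(p) ({                    \
        __typeof__(p) __ptr = (p);                 \
        if (__ptr == NULL)                         \
            __builtin_unreachable();               \
        __ptr;                                     \
    })

struct node {
    int value;
};

/* The caller guarantees <n> is valid; the macro passes that knowledge on
 * to the compiler so it does not warn about a possible null dereference. */
static int node_value(struct node *n)
{
    return MY_ASSUME_NONNULL(n)->value;
}

/* Usage example for the sketch. */
static void node_value_demo(void)
{
    struct node n = { .value = 42 };

    assert(node_value(&n) == 42);
}
```

Passing a pointer that is actually NULL makes the behavior undefined, so such a macro is only appropriate where the non-nullity was already established by earlier code.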
Willy Tarreau	2ce63b7b17	MINOR: compiler: also enable __builtin_assume() for ASSUME() Clang apparently has __builtin_assume() which does exactly the same as our macro, since at least v3.8. Let's enable it, in case it may even better detect assumptions vs unreachable code.	2024-12-17 16:46:46 +01:00
Willy Tarreau	efc897484b	MINOR: compiler: add a new "ASSUME" macro to help the compiler This macro takes an expression, tests it and calls an unreachable statement if false. This allows the compiler to know that such a combination does not happen, and totally eliminate tests that would be related to this condition. When the statement is not available in the compiler, we just perform a break from a do {} while loop so that the expression remains evaluated if needed (e.g. function call).	2024-12-17 16:46:46 +01:00
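The behavior described in the ASSUME entry above, together with the builtin detection from the two older entries below it, can be sketched like this. The names `MY_ASSUME` and `first_byte()` are hypothetical; only the general shape (test the expression, reach an unreachable statement if false, otherwise break out of a do {} while loop) is taken from the commit messages.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_builtin() are treated as lacking every builtin. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_unreachable)
/* The false branch is declared unreachable, so the compiler may consider
 * <expr> to always be true in the code that follows. */
#define MY_ASSUME(expr) do { if (!(expr)) __builtin_unreachable(); } while (0)
#else
/* Fallback: the expression is still evaluated (e.g. a function call), and
 * a false result simply leaves the loop. */
#define MY_ASSUME(expr) do { if (!(expr)) break; } while (0)
#endif

/* Hypothetical helper: the caller guarantees a valid, non-empty buffer. */
static int first_byte(const char *buf, size_t len)
{
    MY_ASSUME(buf != NULL);
    MY_ASSUME(len > 0);
    return (unsigned char)buf[0];
}

/* Usage example for the sketch. */
static void first_byte_demo(void)
{
    assert(first_byte("abc", 3) == 'a');
}
```

As with any assumption of this kind, violating the condition at run time leads to undefined behavior when the builtin is used, so it only suits conditions that are guaranteed by construction.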
Willy Tarreau	41fc18b1d1	MINOR: compiler: rely on builtin detection for __builtin_unreachable() Due to __builtin_unreachable() only being associated to gcc 4.5 and above, it turns out it was not enabled for clang. It's not used that much but still a little bit, so let's enable it now. This reduces the code size by 0.2% and makes it a bit more efficient.	2024-12-17 16:46:46 +01:00
Willy Tarreau	96cfcb1df3	MINOR: compiler: add a __has_builtin() macro to detect features more easily We already have a __has_attribute() macro to detect when the compiler supports a specific attribute, but we didn't have the equivalent for builtins. clang-3 and gcc-10 have __has_builtin() for this. Let's just bring it using the same mechanism as __has_attribute(), which will allow us to simply define the macro's value for older compilers. It will save us from keeping that many compiler-specific tests that are incomplete (e.g. the __builtin_unreachable() test currently doesn't cover clang).	2024-12-17 16:46:46 +01:00
Olivier Houchard	b3cd5a4b86	CLEANUP: queues: Remove pendconn_grab_from_px(). pendconn_grab_from_px() is now unused, so just remove it.	2024-12-17 16:05:44 +01:00
William Lallemand	bb88f68cf7	MINOR: ssl: add utils functions to extract X509 notAfter date Add ASN1_to_time_t() which converts an ASN1_TIME to a time_t and x509_get_notafter_time_t() which returns the notAfter date in time_t format.	2024-12-16 14:54:53 +01:00
Valentine Krasnobaeva	fbc534a6fa	REORG: startup: move nofile limit checks in limits.c Let's encapsulate the code, which checks the applied nofile limit into a separate helper check_nofile_lim_and_prealloc_fd(). Let's keep in this new function scope the block, which tries to create a copy of FD with the highest number, if prealloc-fd is set in the configuration.	2024-12-16 10:44:01 +01:00
Valentine Krasnobaeva	14f5e00d38	REORG: startup: move code that applies limits to limits.c In step_init_3() we try to apply provided or calculated earlier haproxy maxsock and memmax limits. Let's encapsulate these code blocks in dedicated functions: apply_nofile_limit() and apply_memory_limit() and let's move them into limits.c. Limits.c gathers now all the logic for calculating and setting system limits in dependency of the provided configuration.	2024-12-16 10:44:01 +01:00
Valentine Krasnobaeva	1332e9b58d	REORG: startup: move global.maxconn calculations in limits.c Let's encapsulate the code, which calculates global.maxconn and global.maxsslconn into a dedicated function set_global_maxconn() and let's move this function in limits.c. In limits.c we keep helpers to calculate and check haproxy internal limits, based on the system nofile and memory limits.	2024-12-16 10:44:01 +01:00
Frederic Lecaille	e1d25cdbdd	CLEANUP: quic: remove a wrong comment about ->app_limited (drs) ->app_limited quic_drs struct member is not a boolean. This is the index of the last transmitted packet marked as application-limited, or 0 if the connection is not currently application-limited (see C.app_limited definition in BBR v3 draft).	2024-12-13 14:42:43 +01:00
Frederic Lecaille	eeaeb412dc	MINOR: quic: reduce the private data size of QUIC cc algos After these commits: BUG/MINOR: quic: remove max_bw filter from delivery rate sampling BUG/MINOR: quic: fix BBB max bandwidth oscillation issue where some members were removed from bbr struct, the private data size of QUIC cc algorithms may be reduced from 160 to 144 uint32_t. Should be easily backported to 3.1 alongside the commits mentioned above.	2024-12-13 14:42:43 +01:00
Frederic Lecaille	22ab45a3a8	BUG/MINOR: quic: remove max_bw filter from delivery rate sampling This filter is no more needed after this commit: BUG/MINOR: quic: fix BBB max bandwidth oscillation issue. Indeed, one added this filter at delivery rate sampling level to filter the BBR max bandwidth estimations and was inspired from ngtcp2 code source when trying to fix the oscillation issue. But this BBR max bandwidth oscillation issue was fixed by the aforementioned commit. Furthermore this code tends to always increment the BBR max bandwidth. From my point of view, this is not a good idea at all. Must be backported to 3.1.	2024-12-13 14:42:43 +01:00
Frederic Lecaille	a9a2f98f86	MINOR: window_filter: rely on the time to update the filter samples (QUIC/BBR) The windowed filters are used only in the BBR implementation for QUIC to filter the maximum bandwidth samples for its estimation over a virtual time interval tracked by counting the cyclical progression through ProbeBW cycles. ngtcp2 and quiche use such windowed filters in their BBR implementation, but in a slightly different way. When updating the 2nd or 3rd filter samples, this is done based on their values in place of the time they have been sampled. It seems more logical to rely on the sample timestamps even if this has no implication because when a sample is updated using another sample because it has the same value, they have both the same timestamps! This patch modifies two statements which compare two consecutive filter samples based on their values (smp[]->v) by statements which compare them based on the virtual time they have been sampled (smp[]->t). This fully complies with the code used by the Linux kernel in lib/win_minmax.c. Also take the opportunity of this patch to shorten some statements using <smp> local variable value to update smp[2] sample in place of initializing its two members with the <smp> member values. This patch SHOULD be easily backported to 3.1 where BBR was first implemented.	2024-12-13 14:42:43 +01:00
Amaury Denoyelle	1f458b3ea8	MINOR: applet: define applet_putchk_stress() alternative Previous patch introduced stress mode to be able to easily test alternative code paths. The first point would be to force interruption of stats dump on every line and check reentrant paths, in particular while adding and removing server instances. The purpose of this patch is to be able to use applet_putchk_stress() during stats dump while not impacting other applets. To support this, extract applet_putchk() into an internal _applet_putchk() which has a new argument stress. Define two helpers applet_putchk() and applet_putchk_stress(), the latter to set the stress argument to true. For the moment, applet_putchk_stress() is not used. This will be the subject of the next patch.	2024-12-12 11:26:33 +01:00
Amaury Denoyelle	9d19fc4cf7	MINOR: build: define DEBUG_STRESS Define a new build mode DEBUG_STRESS. This will be used to stress some code parts which cannot be reproduced easily with an alternative suboptimal code. First, a global <mode_stress> is set either to 1 or 0 depending on DEBUG_STRESS compilation. A new global keyword "stress-level" is also defined. It allows to specify a level from 0 to 9, to increase the stress incurred on the code. Helper macro STRESS_RUN* are defined for each stress level. This allows to easily specify an instruction in default execution and a stress counterpart if running on the corresponding stress level.	2024-12-12 11:19:10 +01:00
Aurelien DARRAGON	358166ae6a	BUG/MINOR: hlua_fcn: restore server pairs iterator pointer consistency Since `9c91b30` ("MINOR: server: remove prev_deleted server list"), hlua server pair iterator may use and return invalid (stale) server pointer if multiple servers were deleted between two iterations. Indeed, the server refcount mechanism (using srv_take()) is no longer sufficient as the prev_deleted mitigation was removed. To ensure server pointer consistency between two yields, the new watcher mechanism must be used (as is already the case for stats dumping). Thus in this patch we slightly change the server iteration logic: hlua_server_list_iterator_context struct now stores the next valid server pointer, and a watcher is added to ensure this pointer is never stale. Then in hlua_listable_servers_pairs_iterator(), this next pointer is used to create the Lua server object, and the next valid pointer is obtained by leveraging watcher_next(). No backport needed unless `9c91b30` ("MINOR: server: remove prev_deleted server list") is. Please note that dynamic servers were not supported in Lua prior to 2.8, so it doesn't make sense to backport this patch further than 2.8.	2024-12-11 10:52:11 +01:00
Amaury Denoyelle	9c91b30139	MINOR: server: remove prev_deleted server list This patch is a direct follow-up to the previous one. Thanks to watcher type, it is not safe to assume that servers manipulated via stats dump were not targeted by a "delete server" CLI command. As such, prev_deleted list server member is now unneeded. This patch thus removes any reference to it.	2024-12-10 16:19:33 +01:00

8512 Commits