This patch provides two functions acme_gen_tmp_pkey() and
acme_gen_tmp_x509().
These functions generate a unique key pair and X509 certificate that
will be stored in tmp_pkey and tmp_x509. If the key pair or certificate
was already generated, they return the existing one.
The key is an RSA2048 and the X509 is generated with an expiration in
the past. The CN is "expired".
These are just placeholders to be used if we don't have files.
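For illustration, a minimal OpenSSL 3.x sketch of the idea (error checks
and serial handling omitted; the actual haproxy helper names and storage
differ):

    #include <openssl/evp.h>
    #include <openssl/rsa.h>
    #include <openssl/x509.h>

    static EVP_PKEY *tmp_pkey;
    static X509 *tmp_x509;

    static X509 *gen_tmp_x509(void)
    {
        X509 *x;
        X509_NAME *name;

        if (tmp_x509)
            return tmp_x509;              /* reuse the existing one */
        if (!tmp_pkey)
            tmp_pkey = EVP_RSA_gen(2048); /* RSA2048 placeholder key */

        x = X509_new();
        /* validity entirely in the past: already expired */
        X509_gmtime_adj(X509_getm_notBefore(x), -7200);
        X509_gmtime_adj(X509_getm_notAfter(x), -3600);
        name = X509_get_subject_name(x);
        X509_NAME_add_entry_by_txt(name, "CN", MBSTRING_ASC,
                                   (const unsigned char *)"expired",
                                   -1, -1, 0);
        X509_set_issuer_name(x, name);    /* self-signed */
        X509_set_pubkey(x, tmp_pkey);
        X509_sign(x, tmp_pkey, EVP_sha256());
        tmp_x509 = x;
        return x;
    }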
This is an API change: instead of passing a ckch_data alone,
ckch_conf_kws.func() is now called with a ckch_store.
This allows the callback to access the whole ckch_store, with the
ckch_conf and the ckch_data. But it requires the ckch_conf to be
actually put in the ckch_store beforehand.
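A hedged sketch of what a callback may look like after the change
(member names here are assumptions, not the actual haproxy layout):

    /* the callback now receives the whole store, from which both the
     * ckch_conf and the ckch_data can be reached
     */
    static int example_kw_cb(struct ckch_store *store)
    {
        struct ckch_conf *conf = &store->conf; /* assumed member name */
        struct ckch_data *data = store->data;  /* assumed member name */

        /* ... use both conf and data ... */
        return 0;
    }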
This patch removes <mux_state> field from quic_conn structure. The
purpose of this field was to indicate if MUX layer above quic_conn is
not yet initialized, active, or already released.
It became tedious to properly set it, as the initialization order of the
various quic_conn/conn/MUX layers now differs between the frontend and
backend sides, and also depends on whether 0-RTT is used or not.
Recently, a change introduced in connect_server() allows the QUIC MUX to
be initialized earlier if the ALPN is cached on the server structure.
This added yet another level of complexity.
Thus, this patch removes the <mux_state> field completely. Instead, a
new flag QUIC_FL_CONN_XPRT_CLOSED is defined. It is set at a single
place only, on close XPRT callback invocation. It can be combined with
the new utility functions qc_wait_for_conn()/qc_is_conn_ready() to
determine the status of the conn/MUX layers without an extra quic_conn
field.
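A plausible shape for such a helper (illustrative only; the actual
implementation may differ):

    /* conn/MUX readiness deduced from existing fields only */
    static inline int qc_is_conn_ready(const struct quic_conn *qc)
    {
        /* ready once a connection is attached and the XPRT close
         * callback was not invoked yet; no <mux_state> needed
         */
        return qc->conn && !(qc->flags & QUIC_FL_CONN_XPRT_CLOSED);
    }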
There's currently a function conn_delete_from_tree() which is used to
detach an idle connection from the tree it's currently attached to so
that it is no longer found. This function is used in three circumstances:
- when picking a new connection that no longer has any avail stream
- when temporarily working on the connection from an I/O handler,
in which case it's re-added at the end
- when killing a connection
The 2nd case above is quite specific, as it requires to preserve the
CO_FL_LIST_MASK flags so that the connection can be re-inserted into
the proper tree when leaving the handler. However, there's a catch.
When killing a connection, we want to be certain it will not be
reinserted into the tree. The flags preservation is causing a tiny
race if an I/O happens while the connection is in the kill list,
because in this case the I/O handler will note the connection flags,
do its work, then reinsert the connection where it believed it was,
then the connection gets purged, and another user can find it in the
tree.
The issue is very difficult to reproduce. On a 128-thread machine it
happens in H2 around 500k req/s after around 50M requests. In H1 it
happens after around 1 billion requests.
The fix here consists in passing an extra argument to the function to
indicate if the removal is permanent or not. When it's permanent, the
function will clear the associated flags. The callers were adjusted
so that all those dequeuing a connection in order to kill it do it
permanently and all other ones do it only temporarily.
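A sketch of the resulting calling convention (simplified: the 3.3
version also takes a "thr" argument, and member names are assumed):

    void conn_delete_from_tree(struct connection *conn, int permanent)
    {
        eb64_delete(&conn->hash_node->node); /* detach from idle tree */
        if (permanent)
            conn->flags &= ~CO_FL_LIST_MASK; /* forbid re-insertion */
    }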
A slightly different approach could have worked: the function could
always remove all flags, and the callers would need to restore them.
But this would require trickier modifications of the various call
places, compared to only passing 0/1 to indicate the permanent status.
This will need to be backported to all stable versions. The issue was
at least reproduced since 3.1 (not tested before). The patch will need
to be adjusted for 3.2 and older, because a 2nd argument "thr" was
added in 3.3, so the patch will not apply to older versions as-is.
Add an rwlock to control the server's path_parameter, to make sure
multiple threads don't set it at the same time, and that it can't be
seen in an inconsistent state.
Also, don't set the parameters every time; only set them if they have
changed, to prevent needless writes.
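Illustrative locking pattern (the lock label and member names here are
assumptions):

    /* writer side: take the write lock, skip identical values */
    HA_RWLOCK_WRLOCK(SERVER_LOCK, &srv->params_lock);
    if (memcmp(&srv->path_parameter, &new_params, sizeof(new_params)))
        srv->path_parameter = new_params;
    HA_RWLOCK_WRUNLOCK(SERVER_LOCK, &srv->params_lock);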
This does not need to be backported.
This patch is similar to the previous one, this time dealing with
qc_new_conn(). This function was asymmetric between the frontend and
backend sides, as the connection argument was set only in the latter
case.
This was previously required due to the qc_alloc_ssl_sock_ctx()
signature. This has changed with the previous patch, thus qc_new_conn()
can also be realigned on both the FE and BE sides. The <conn> member of
the quic_conn instance is always set outside of it, in qc_xprt_start()
in the backend case.
ssl_sock_ctx is a generic object used both on TCP/SSL and QUIC stacks.
Most notably it contains a <conn> member which is a pointer to struct
connection.
On the QUIC frontend side, this member is always set to NULL. Indeed,
the connection is only created after handshake completion. However, this
has changed on the backend side, where the connection is instantiated
prior to its quic_conn counterpart. Thus, the ssl_sock_ctx member was
set in this case as a convenience for later use in qc_ssl_do_hanshake().
However, this method was unsafe as the connection can be released
without the ssl_sock_ctx member being reset. Thus, the previous patch
fixed this by accessing the <conn> member through the quic_conn
instance, which is the proper way.
As a result, this patch resets the ssl_sock_ctx <conn> member to NULL.
This is deemed the cleanest method as it ensures that neither the
frontend nor the backend side can use it anymore.
Perf top showed that h1_snd_buf() was having great difficulties accessing
the proxy's server_id_hdr_name field in the middle of the headers loop.
Moving the assignment out of the loop to a local variable moved the
problem there as well:
| if (!(h1m->flags & H1_MF_RESP) && isttest(h1c->px->server_id_hdr_n
0.10 |20b0: mov -0x120(%rbp),%rdi
1.33 | mov 0x60(%rdi),%r10
0.01 | test %eax,%eax
0.18 | jne 2118
12.87 | mov 0x350(%r10),%rdi
0.01 | test %rdi,%rdi
0.05 | je 2118
| mov 0x358(%r10),%r11
It turns out that there are several atomically accessed fields in its
vicinity, causing the cache line to bounce all the time. Let's collect
the few frequently changed fields and place them together at the end
of the structure, and plug the 32-bit hole with another isolated field.
Doing so also reduced a little bit the cost of decrementing be->be_conn
in process_stream(), and overall the HTTP/1 performance increased by
about 1% both on ARM and x86_64.
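The idea, schematically (field names simplified, not the actual struct
proxy layout):

    /* keep read-mostly fields away from hot counters so that atomic
     * writes don't keep invalidating the cache line used by readers
     */
    struct example {
        struct ist server_id_hdr_name; /* read-mostly, fast path */
        /* ... other stable configuration fields ... */

        /* frequently written fields grouped on their own cache line */
        unsigned int beconn __attribute__((aligned(64)));
        unsigned int served;
        unsigned int totpend;
    };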
When trying to reuse a backend connection, a connection hash is
calculated to match an entry with similar parameters. Previously, this
operation was skipped if the stream content wasn't based on HTTP, as it
would have been incompatible with http-reuse.
With the introduction of SPOP backends, this condition was removed so
that they could also benefit from connection reuse. However, this means
that the hash calculation is now always performed when connecting to a
server, even for TCP or log backends. This is unnecessary as these
proxies cannot perform connection reuse.
Note also that the reuse mode is reset during postparsing for
incompatible backends. This at least guarantees that no tree lookup will
be performed via be_reuse_connection(). However, a connection lookup is
still performed in the session via session_get_conn(), which is another
unnecessary operation.
Thus, this patch restores the condition so that reuse operations are now
entirely skipped if a backend mode is incompatible. This is implemented
via a new utility function named be_supports_conn_reuse().
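A plausible shape for this helper (the exact modes tested are an
assumption):

    /* only modes compatible with connection reuse qualify */
    static inline int be_supports_conn_reuse(const struct proxy *be)
    {
        return be->mode == PR_MODE_HTTP || be->mode == PR_MODE_SPOP;
    }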
This could be backported up to 3.1, as this commit could be considered
as a performance regression for tcp/log backend modes.
Ensure that QUIC support is compiled into haproxy when a QUIC server is
configured. This check is performed during _srv_parse_finalize() so that
it is detected both on configuration parsing and when adding a dynamic
server via the CLI.
Note that this changes the behavior of srv_is_quic() utility function.
Previously, it always returned false when QUIC support wasn't compiled.
With this new check introduced, it is now guaranteed that a QUIC server
won't exist if compilation support is not active. Hence srv_is_quic()
does not rely anymore on USE_QUIC define.
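A hedged sketch of the kind of check now performed in
_srv_parse_finalize() (error handling is illustrative):

    #ifndef USE_QUIC
        if (srv_is_quic(srv)) {
            memprintf(errmsg, "QUIC servers require a build with USE_QUIC");
            return ERR_ALERT | ERR_FATAL;
        }
    #endif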
The mux currently refrains from sending data before H2_CS_FRAME_H, i.e.
before the peer's SETTINGS frame was received. While it makes sense on
the frontend, it's causing harm on the backend because it forces the
first request to be sent in two halves over an extra RTT: first the
preface and settings, second the request once the settings are received.
This is totally contrary to the philosophy of the H2 protocol, consisting
in permitting the client to send as soon as possible.
Actually what happens is the following:
- process_stream() calls connect_server()
- connect_server() creates a connection, and if the proto/alpn is guessed
or known, the mux is instantiated for the current request.
- the H2 init code wakes the h2 tasklet up and returns
- process_stream() tries to send the request using h2_snd_buf(), but that
one sees that we're before H2_CS_FRAME_H, refrains from doing so and
returns.
- process_stream() subscribes and quits
- the h2 tasklet can now execute to send the preface and settings, which
leave as a first TCP segment. The connection is ready.
- the iocb is woken again once the server's SETTINGS frame is received,
turning the connection to the H2_CS_FRAME_H state, and the iocb wakes
up process_stream().
- process_stream() executes again and can try to send again.
- h2_snd_buf() is called and finally sends the request as a second TCP
segment.
Not only is this inefficient, but it also renders 0-RTT and TFO impossible
on H2 connections. When 0-RTT is used, only the preface and settings leave
as early data (the very first data of that connection), which is totally
pointless.
In order to fix this, we have to go through a few steps:
- first we need to let data be sent to a server immediately after the
SETTINGS frame was sent (i.e. in H2_CS_SETTINGS1 state instead of
H2_CS_FRAME_H). However, some protocol extensions are advertised by
the server using SETTINGS (e.g. RFC8441) and some requests might need
to know the existence of such extensions. For this reason we're adding
a new h2c flag, H2_CF_SETTINGS_NEEDED, which indicates that some
operations were not done because a server's SETTINGS frame is needed.
This is set when trying to send a protocol upgrade or extended CONNECT
during H2_CS_SETTINGS1, indicating that it's needed to wait for
H2_CS_FRAME_H in this case. The flag is always set on frontend
connections. This is what is being done in this patch.
- second, we need to be able to push the preface opportunistically with
the first h2_snd_buf() so that it's not needed to wake the tasklet up
just to send that and wake process_stream() again. This will be in a
separate patch.
By doing the first step, we're at least saving one needless tasklet
wakeup per connection (~9%), which results in ~5% backend connection
rate increase.
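Schematically, the send-side gate in h2_snd_buf() moves from the first
test to something like the second (simplified sketch, not the exact
code):

    /* before: nothing may be sent until the peer's SETTINGS arrived */
    if (h2c->st0 < H2_CS_FRAME_H)
        return 0;

    /* after: sending is allowed once our SETTINGS left, except when an
     * extension advertised by the server's SETTINGS is still needed
     */
    if (h2c->st0 < H2_CS_SETTINGS1 ||
        (h2c->st0 < H2_CS_FRAME_H && (h2c->flags & H2_CF_SETTINGS_NEEDED)))
        return 0;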
Returns a pointer to the first bind_conf matching <name> in a frontend
<front>.
When <name> is prefixed by a '@' (@<filename>:<linenum>), it tries to
look up the bind_conf by the corresponding filename and line number of
the configuration file.
NULL is returned if no match is found.
This patch adds functions to expose Encrypted Client Hello (ECH) status
and outer SNI information for logging and sample fetching.
Two new helper functions are introduced in ech.c:
- conn_get_ech_status() places the ECH processing status string into a
buffer.
- conn_get_ech_outer_sni() retrieves the outer SNI value if ECH
succeeded.
Two new sample fetch keywords are added:
- "ssl_fc_ech_status" returns the ECH status string.
- "ssl_fc_ech_outer_sni" returns the outer SNI value seen during ECH.
These allow ECH information to be used in HAProxy logs, ACLs, and
captures.
This patch introduces the USE_ECH option in the Makefile to enable
support for Encrypted Client Hello (ECH) with OpenSSL.
A new function, load_echkeys, is added to load ECH keys from a specified
directory. The SSL context initialization process in ssl_sock.c is
updated to load these keys if configured.
A new configuration directive, `ech`, is introduced to allow users to
specify the ECH key directory in the listener configuration.
A private key that is password protected and was decoded during init,
thanks to the password obtained via 'ssl-passphrase-cmd', should not be
dumped by the 'dump ssl cert' CLI command.
When a certificate is protected by a password, we can provide the
password via the dedicated pem_password_cb param provided to
PEM_read_bio_PrivateKey.
HAProxy will fetch the password automatically during init by calling a
user-defined external command that should dump the right password on its
standard output (see new 'ssl-passphrase-cmd' global option).
The devnull fd might be needed during configuration parsing, for
instance if some options require forking/exec'ing. So we now create it
much earlier in the init process, without depending on the '-q' or '-d'
parameters.
Since 3.0 where the CLI started to use rcv_buf, it appears that some
external tools sending chained commands are randomly experiencing
failures. Each time, this happens when the whole command is sent as a
single packet, immediately followed by a close. This is not a correct
way to use the CLI but this has been working for ages for simple
netcat-based scripts, so we should at least try to preserve this.
The cause of the failure is that the first LF that acks a command is
immediately sent back to the client and rejected due to the closed
connection. This in turn forwards the error back to the applet which
aborts its processing.
Before 3.0 the responses would be queued into the buffer, then sent
back to the channel, and would all fail at once. This changed when
snd_buf/rcv_buf were implemented because the applets are much more
responsive and since they yield between each command, they can
deliver one ACK at a time that is immediately forwarded down the
chain.
An easy way to observe the problem is to send 5 map updates, a shutdown,
and immediately close via tcploop, and in parallel run a periodic
"show map" to count the number of elements:
$ tcploop -U /tmp/sock1 C S:"add map #0 1 1; add map #0 2 2; add map #0 3 3; add map #0 4 4; add map #0 5 5\n" F K
Before 3.0, there would always be 5 elements. Since 3.0 and before
20ec1de214 ("MAJOR: cli: Refacor parsing and execution of pipelined
commands"), almost always 2. And since that commit above in 3.2, almost
always one. Doing the same using socat or netcat shows almost always 5...
It's entirely timing-dependent, and might even vary based on the RTT
between the client and haproxy!
The approach taken here consists in doing the same principle as MSG_MORE
or Nagle but on the response buffer: the applet doesn't need to send a
single ACK for each command when it has already been woken up and is
scheduled to come back to work. It's fine (and even desirable) that
ACKs are grouped in a single packet as much as possible.
For this reason, this patch implements APPCTX_CLI_ST1_YIELD, a new CLI
flag which indicates that the applet left in a yielding state, i.e.
it has not finished its work. This flag is used by .rcv_buf to hold
pending data. This way we won't return partial responses for no reason,
and we can continue to emulate the previous behavior.
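The gist of it, as a sketch (the real logic lives in the CLI's rcv_buf
handler):

    /* while the applet is only yielding between pipelined commands,
     * hold the partial response so that ACKs get grouped
     */
    if (appctx->st1 & APPCTX_CLI_ST1_YIELD)
        return 0; /* report nothing available yet */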
One very nice benefit to this is that it saves huge amounts of CPU on
the client. In the test below that tries to update 1M map entries, the
CPU used by socat went from 100% to 0% and the total transfer time
dropped by 28%:
before:
$ time awk 'BEGIN{ printf "prompt i\n"; for (i=0;i<1000000;i++) { \
printf "add map #0 %d %d\n",i,i,i }}' | socat /tmp/sock1 - >/dev/null
real 0m2.407s
user 0m1.485s
sys 0m1.682s
after:
$ time awk 'BEGIN{ printf "prompt i\n"; for (i=0;i<1000000;i++) { \
printf "add map #0 %d %d\n",i,i,i }}' | socat /tmp/sock1 - >/dev/null
real 0m1.721s
user 0m0.952s
sys 0m0.057s
The difference is also quite visible on the number of syscalls during
the test (for 1k updates):
before:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.071691 0 100001 sendmsg
after:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000011 1 9 sendmsg
This patch will need to be backported to 3.0, and depends on these two
patches to be backported as well:
MINOR: applet: do not put SE_FL_WANT_ROOM on rcv_buf() if the channel is empty
MINOR: cli: create cli_raw_rcv_buf() from the generic applet_raw_rcv_buf()
As a follow-up to f40f5401b9f24becc6fdd2e77d4f4578bbecae7f, explicitly
use atomic operations to set the prev and next fields, to make sure the
compiler can't assume anything about them, and just does it.
This should be backported after f40f5401b9 up to 2.8.
The USE_KTLS test is currently being done outside of the USE_OPENSSL
guard so disabling USE_OPENSSL still results in build failures on
libcs built with support for kernels before 4.17, because we enable
KTLS by default on linux. Let's move the KTLS block inside the
USE_OPENSSL guard instead.
No backport is needed since KTLS is only in 3.3.
This is a second attempt at fixing issues on 32bits systems which would
trigger the following BUG_ON() statement:
FATAL: bug condition "sizeof(struct shm_stats_file_object) != 544" matched at src/stats-file.c:825 shm_stats_file_object struct size changed, is is part of the exported API: ensure all precautions were taken (ie: shm_stats_file version change) before adjusting this
This is a drop-in replacement for d30b88a6c + 4693ee0ff, as suggested by
Willy.
Indeed, on supported platforms an unsigned int can be assumed to be 4
bytes long, and a long long can be assumed to be 8 bytes long. As such,
the previous attempt was overkill and added unnecessary maintenance
complexity which could result in bugs if not used properly. Moreover, it
would only partially solve the issue, since on little endian vs big
endian architectures, the provisioned memory areas (originating from the
same shm stats file) could be read differently by the host.
Instead we fix the alignment issues, and this alone helps to ensure
struct memory consistency on 64 vs 32 bit platforms. It was tested
on both i386 and i586.
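As an illustration of the underlying alignment trap:

    #include <stdint.h>

    /* the i386 ABI aligns uint64_t on 4 bytes while x86_64 aligns it
     * on 8, so this struct is 12 bytes on one and 16 on the other
     */
    struct demo {
        uint32_t a; /* offset 0 everywhere */
        uint64_t b; /* offset 4 on i386, offset 8 on x86_64 */
    };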
last_change and last_sess counters are now stored as unsigned int, as
this helped fix the alignment issues, and they were found to be used
as 32-bit integers anyway.
Thanks to Willy for problem analysis and the patch proposal.
No backport needed.
This reverts commit 466a603b59ed77e9787398ecf1baf77c46ae57b1.
Due to the last 2 commits, this macro is now unused, and will probably
never be used, so let's get rid of that for now.
This reverts commit 4693ee0ff7a5fa4a12ff69b1a33adca142e781ac.
As discussed in GH #3168, this works but it is not the proper way to fix
the issue. See following commits.
This reverts commit d30b88a6cc47d662e92b524ad5818be312401d0e.
As discussed in GH #3168, this works but it is not the proper way to fix
the issue. See following commits.
Several settings can be set to control stream multiplexing and the
associated receive window. Previously, all of these settings were
configured using the prefix "tune.quic.frontend.", despite being applied
blindly on both sides.
Fix this by duplicating these settings for the frontend and backend
sides. Options are also renamed to use the standardized prefix
"tune.quic.[be|fe].stream." notation.
Also, each option is individually renamed to better reflect its purpose
and hide technical details relative to QUIC transport parameter naming:
* max-data-size -> stream.rxbuf
* max-streams-bidi -> stream.max-concurrent
* stream-data-ratio -> stream.data-ratio
No need to backport.
Streamline the max-idle-timeout option. Rename it to use the newer
cohesive naming scheme 'tune.quic.fe|be.'.
Two different fields were already defined in the global struct. These
fields are moved into quic_tune along with other QUIC settings. However,
no parser was defined for the backend option; this commit fixes this.
No need to backport this.
On the frontend side, a quic_conn can have a dedicated FD or use the
listener's one. These different modes can be activated via a global QUIC
tune setting.
This patch adjusts the option. First, it is renamed to the more
meaningful name 'tune.quic.fe.sock-per-conn'. Also, arguments are now
either 'default-on' or 'force-off'. The objective is to better highlight
the relationship with the 'quic-socket' bind option.
The older option is deprecated and will be removed in 3.5.
A QUIC global tune setting is defined to be able to force Retry emission
prior to the handshake. By definition, this ability is only supported by
QUIC servers, hence it is a frontend-only option.
Rename the option to use the "fe" prefix. The old option name is
deprecated and will be removed in 3.5.
QUIC global memory can be limited across the entire process via a global
tune setting. Previously, this setting used the misleading "frontend"
prefix. As it is applied as a sum over all QUIC connections, both from
the frontend and backend sides, remove the prefix. The new option name
is "tune.quic.mem.tx-max".
The older option name is deprecated and will be removed in 3.5.
This patch is similar to the previous one, except that it is focused on
Tx QUIC settings. It is now possible to toggle GSO and pacing on the
frontend and backend sides independently.
As with the previous patch, options are renamed to use the unified
"fe/be" prefixes. This is part of the current series of commits which
unify QUIC settings. Older options are deprecated and will be removed in
the 3.5 release.
Various settings related to the QUIC congestion controller can be
configured. This patch duplicates them to be able to set independent
values on the frontend and backend sides.
As with the previous patch, options are renamed to use the unified
"fe/be" prefixes. This is part of the current series of commits which
unify QUIC settings. Older options are deprecated and will be removed in
the 3.5 release.
Previously, QUIC glitches support was only implemented on the frontend
side. Extend this so that the option can be specified separately on both
the frontend and backend sides. The function _qcc_report_glitch() now
retrieves the relevant max value based on the connection side.
In addition to this, the option has been renamed to use the "fe/be"
prefixes. This is part of the current series of commits which unify QUIC
settings. Older options are deprecated and will be removed in the 3.5
release.
Rename the option which quickly enables/disables all QUIC listeners. It
now takes an on/off argument. The documentation is extended to reflect
the fact that QUIC backends are not impacted by this option.
The older keyword is simply removed. Deprecation is considered
unnecessary as this setting is only useful during debugging.
A major reorganization of QUIC settings is going to be performed. One of
its objectives is to clearly define options which can be separately
configured on the frontend and backend proxy sides.
To implement this, the quic_tune structure is extended to support fe and
be options. A set of macros/functions is also defined: it allows
retrieving an option defined on both sides with unified code, based on
the proxy side of a quic_conn/connection instance.
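One plausible shape for the unified lookup (names are illustrative):

    /* pick the fe or be variant of a setting depending on which side
     * the quic_conn belongs to
     */
    #define QUIC_TUNE_OPT(qc, opt) \
        (qc_is_back(qc) ? quic_tune.be.opt : quic_tune.fe.opt)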
Avoid setting both el->prev and el->next in the same statement.
The goal is to set both el->prev and el->next to el, but a naive
compilation, such as with -O0, will set el->next first, and then set
el->prev from the value of el->next; if we're unlucky, el->next will
have been set to something else by another thread in between.
So explicitly set both to what we want.
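In other words:

    /* before: one statement; the compiler may store el->next first,
     * then reload it to set el->prev
     */
    el->prev = el->next = el;

    /* after: two explicit stores of the intended value */
    el->next = el;
    el->prev = el;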
This should be backported up to 2.8.
As reported by @tianon on GH #3168, running haproxy on 32bits i386
platform would trigger the following BUG_ON() statement:
FATAL: bug condition "sizeof(struct shm_stats_file_object) != 544" matched at src/stats-file.c:825
shm_stats_file_object struct size changed, is is part of the exported API: ensure all precautions were taken (ie: shm_stats_file version change) before adjusting this
In fact, some efforts were already taken to ensure shm_stats_file_object
struct size remains consistent on 64 vs 32 bits platforms, since
shm_stats_file_object is part of the public API and directly exposed in
the stats file.
However, some parts were overlooked: some structs that are embedded in
the shm_stats_file_object struct itself weren't using fixed-width
integers, and would sometimes be unaligned. The result of this is that
it was up to the compiler (platform-dependent) to choose how to deal
with such ambiguities, which could cause the struct mapping/size to be
inconsistent from one platform to another.
Fortunately this was caught by the BUG_ON() statement, with the precious
help of @tianon.
To fix this, we now use fixed-width integers everywhere for the members
(and submembers) of the shm_stats_file_object struct, and we use
explicit padding where it was missing, to avoid automatic padding where
we don't expect it. As in the previous commit, we leverage the
FIXED_SIZE() and FIXED_SIZE_ARRAY() macros to set the expected width for
each integer without causing build issues on platforms that don't
support larger integers.
No backport needed, this feature was introduced during 3.3-dev.
The freq-ctr struct is used by the shm_stats_file API; more precisely,
it is used in the shm_stats_file_object struct for counters.
The shm_stats_file_object struct is required to be platform-independent,
thus we switch to using explicit size types (AKA fixed-width integer
types) for freq-ctr, in an attempt to make the freq-ctr size and memory
mapping consistent from one platform to another.
We cannot simply use fixed-width integers because some of them are
involved in atomic operations, and forcing a given width could cause
build issues on some platforms where atomic ops are not implemented for
large integers. Instead we leverage the FIXED_SIZE macro to keep
handling the integers as before, while forcing them to be stored using
the expected number of bytes (unused bytes are simply ignored).
No change of behavior should be expected.
The FIXED_SIZE() macro can be used to instruct the compiler that the
struct member named <name>, handled as <type>, must be stored using
<size> bytes, even if the type used is actually smaller than the
expected size.
FIXED_SIZE_ARRAY() is similar to FIXED_SIZE() but for arrays: it takes
an extra argument which is the number of members.
They may be used for portability concerns to ensure a structure mapping
remains consistent between platforms.
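One plausible way to implement such a macro (the argument order here is
illustrative):

    /* store <type> <name> in exactly <size> bytes by padding the
     * remainder with unused bytes
     */
    #define FIXED_SIZE(size, type, name) \
        union { type name; char name##_pad[size]; }

    struct example {
        FIXED_SIZE(8, unsigned int, last_change); /* 4 used, 8 stored */
    };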
The previous commit switched from ncbuf to ncbmbuf as storage for
received CRYPTO frames. The latter ensures that buffering such frames
can no longer fail because of gap sizes.
Previously, extra mechanisms were implemented in the QUIC frame parsing
functions to overcome the ncbuf limitation on gap sizes. Before
insertion, CRYPTO frames were stored in a temporary tree to order their
insertion. As this is no longer necessary, this commit removes the
temporary tree insertion.
This commit is closely associated with the previous bug fix. As it
provides a neat optimization and code simplification, it can be
backported with it, but not in the next immediate release, in order to
spot potential regressions.
In QUIC, TLS handshake messages such as the ClientHello are encapsulated
in CRYPTO frames. Each QUIC implementation can split the content into
several frames of random sizes. In fact, this feature is now used by
several clients, based on Chrome's so-called "chaos protection"
mechanism:
https://quiche.googlesource.com/quiche/+/cb6b51054274cb2c939264faf34a1776e0a5bab7
To support this, haproxy uses an ncbuf storage to store received CRYPTO
frames before passing them to the SSL library. However, this storage
suffers from a limitation: gaps between two filled blocks cannot be
smaller than 8 bytes. Thus, depending on the size of received CRYPTO
frames and their order, ncbuf may not be sufficient. Over time, several
mechanisms were implemented in haproxy's QUIC frame parsing to overcome
the ncbuf limitation.
However, recent reports highlight that with some clients haproxy is not
able to deal with CRYPTO frame reception. In particular, this is the
case with the latest ngtcp2 release, which implements a similar chaos
protection mechanism via the following patch. It also seems that this
impacts haproxy's interaction with firefox.
commit 89c29fd8611d5e6d2f6b1f475c5e3494c376028c
Author: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Date: Mon Aug 4 22:48:06 2025 +0900
Crumble Client Initial CRYPTO (aka chaos protection)
To fix haproxy's CRYPTO frame buffering once and for all, an alternative
non-contiguous buffer named ncbmbuf has recently been implemented. This
type does not suffer from the gap size limitation, albeit at the cost of
a small reduction in the size available for data storage.
Thus, the purpose of the current patch is to replace ncbuf with the
newer ncbmbuf for QUIC CRYPTO frame parsing. Now, ncbmb_add() is used to
buffer received frames, and it is guaranteed to succeed. The only
remaining error case is a received frame whose offset and length exceed
the ncbmbuf data storage, which results in a CRYPTO_BUFFER_EXCEEDED
error code.
A notable behavior change when switching to the ncbmbuf implementation
is that the NCB_ADD_COMPARE mode cannot be used anymore during add.
Instead, CRYPTO frame content received at a similar offset is
overwritten.
A final note regarding STREAM frame parsing: for now, it is considered
unnecessary to switch from ncbuf in this case. Indeed, QUIC clients do
not perform aggressive fragmentation for them. Keeping ncbuf ensures
that the data storage size is bigger than the equivalent ncbmbuf area.
This should fix github issue #3141.
This patch must be backported up to 2.6. It is first necessary to pick
the relevant commits for ncbmbuf implementation prior to it.
Implement the ncbmb_advance() function for the ncbmbuf type. It allows
removing bytes from the front of the buffer, regardless of the existing
gaps. This is implemented by resetting the corresponding bits of the
bitmap.
As with the previous patch, this commit must be backported prior to the
fix to come on QUIC CRYPTO frames parsing.
Implement the ncbmb_data() function for the ncbmbuf type. Its purpose is
similar to its ncbuf counterpart: it returns the size in bytes of the
data starting at a specific offset, until the next gap.
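Conceptually (standalone sketch, assuming one bitmap bit per data byte):

    #include <stddef.h>

    /* returns the number of contiguous filled bytes from <off> */
    static size_t bitmap_data(const unsigned char *bitmap, size_t size,
                              size_t off)
    {
        size_t len = 0;

        while (off + len < size &&
               (bitmap[(off + len) / 8] & (1U << ((off + len) % 8))))
            len++;
        return len;
    }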
As with the previous patch, this commit must be backported prior to the
fix to come on QUIC CRYPTO frames parsing.
This patch implements the add operation for the ncbmbuf type.
This function is simpler than its ncbuf counterpart. Indeed, for now
only the NCB_ADD_OVERWRT mode is supported. This compromise was chosen
as ncbmbuf will first be used for QUIC CRYPTO frame handling, which does
not mandate comparing existing filled blocks during insertion.
As with the previous patch, this commit must be backported prior to the
fix to come on QUIC CRYPTO frames parsing.
Define ncbmbuf, which is an alternative non-contiguous buffer
implementation. The "bm" abbreviation stands for bitmap, which reflects
how gaps and filled blocks are encoded. The main purpose of this
implementation is to get rid of the ncbuf limitation regarding the
minimal size of gaps between two blocks of data.
This commit adds the new module ncbmbuf. Along with it, some utility
functions such as ncbmb_make(), ncbmb_init() and ncbmb_is_empty() are
defined. The public API of ncbmbuf will be extended in the following
patches.
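A hypothetical layout, just to fix ideas (the real type may differ):

    /* one bitmap bit per byte of storage: a set bit means the
     * corresponding byte holds data
     */
    struct ncbmbuf_sketch {
        char  *area; /* bitmap followed by the data bytes */
        size_t size; /* number of usable data bytes */
    };

    static int ncbmb_is_empty_sketch(const struct ncbmbuf_sketch *b)
    {
        size_t i;

        for (i = 0; i < (b->size + 7) / 8; i++)
            if (b->area[i]) /* any set bit means data is present */
                return 0;
        return 1;
    }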
This patch is not considered a bug fix. However, it will be required to
fix an issue encountered on QUIC CRYPTO frames parsing. Thus, it will be
necessary to backport the current patch prior to the fix to come.
ncbuf is a module which provides a non-contiguous buffer type
implementation. This patch extracts some basic types related to it into
a new file, ncbuf_common.h.
This will be useful to provide a new alternative non-contiguous buffer
implementation based on a bitmap.
This patch is not a bug fix. However, it is necessary for the ncbmbuf
implementation, which will be required to fix a QUIC issue on CRYPTO
frames parsing. Thus, it will be necessary to backport the current patch
prior to the fix to come.
Instead of having per-table expiration tasks, just use one per shard.
The task will now go through all the tables to expire entries. When a
table gets an expiration earlier than the one previously known, it will
be put in an mt-list, and the task will be responsible for putting it
into an eb32 tree, ordered based on the next expiration.
Each per-shard task will run on a different thread, so it should lead to
a better load distribution than the per-table tasks.
Add a new initcall stage, STG_INIT_2, for stuff to be called after
step_init_2(), i.e. after we know for sure that global.nbthread is set.
Modify stick-tables' stkt_late_init() to run at STG_INIT_2 instead of
STG_INIT, in anticipation of it being enhanced and needing
global.nbthread.
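Presumably registered through the usual initcall machinery, e.g.:

    /* run stkt_late_init() once global.nbthread is known */
    INITCALL0(STG_INIT_2, stkt_late_init);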
In mt_list_delete(), if the element was not in a list, then n and p will
point to it, so setting n->prev and n->next is enough to unlock it.
Don't do it twice: once it has been done the first time, another thread
may be working with the element and may already have added it to a list,
and doing it a second time can lead to list inconsistencies.
This should be backported up to 2.8.