Right now we're taking the stick-tables update lock for reads just for
the sake of checking if the update index is past it or not. That's
costly because even taking the read lock is sufficient to provoke a
cache line write, while under load or attack it's frequent that the
update has not yet been propagated and nothing would be needed anyway.
This commit brings a new field to the stksess, "seen", which is zeroed
when the entry is updated, and set to one as soon as at least one peer
starts to consult it. This way it will reflect that the entry must be
updated again so that this peer can see it. Otherwise no update will
be necessary. For now the flag is only set/reset but not exploited.
Great care is taken to avoid writes whenever possible.
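As a rough illustration of the intent (apart from "seen", the surrounding
names are assumptions and not haproxy's exact code):

    /* in struct stksess: 0 = not consulted by any peer since last update */
    struct stksess {
        /* ... existing fields ... */
        int seen;
    };

    /* on a local update, mark the entry as not seen yet */
    HA_ATOMIC_STORE(&ts->seen, 0);

    /* when a peer starts to consult the entry, only write when needed so
     * that the cache line is not dirtied for nothing
     */
    if (!HA_ATOMIC_LOAD(&ts->seen))
        HA_ATOMIC_STORE(&ts->seen, 1);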
Some bwlim error messages at parsing time were missing the trailing '\n'
in commit 2b6777021d ("MEDIUM: bwlim: Add support of bandwith limitation
at the stream level"). This commit can be backported wherever the commit
above is (likely as far as 2.7).
Thanks to previous commit, we can now build with USE_SYSTEMD=1 on any
system without requiring any parts from systemd. It just turns out that
there was one remaining include in haproxy.c that needed to be replaced
with haproxy/systemd.h to build correctly. That's what this commit does.
Given the xz drama which allowed liblzma to be linked to openssh, let's remove
libsystemd to get rid of useless dependencies.
The sd_notify API seems to be stable and is now documented. This patch replaces
the sd_notify() and sd_notifyf() functions with a reimplementation inspired by
the systemd documentation.
This should not change anything functionally. The function will be built when
haproxy is built using USE_SYSTEMD=1.
References:
https://github.com/systemd/systemd/issues/32028
https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes
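For reference, a minimal libsystemd-free sd_notify() can be sketched like
this, following the sd_notify(3) documentation (an illustrative sketch, not
haproxy's exact implementation in haproxy/systemd.h):

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* send <message> (e.g. "READY=1") to the socket found in NOTIFY_SOCKET */
    int sd_notify(int unset_environment, const char *message)
    {
        const char *path = getenv("NOTIFY_SOCKET");
        struct sockaddr_un sun = { .sun_family = AF_UNIX };
        socklen_t addrlen;
        int fd, ret = 1;

        if (!path || !*path)
            return 0;                       /* not started by systemd */
        if (strlen(path) >= sizeof(sun.sun_path))
            return -1;

        memcpy(sun.sun_path, path, strlen(path));
        addrlen = offsetof(struct sockaddr_un, sun_path) + strlen(path);
        if (sun.sun_path[0] == '@')         /* abstract socket namespace */
            sun.sun_path[0] = 0;

        fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&sun, addrlen) < 0 ||
            write(fd, message, strlen(message)) < 0)
            ret = -1;
        close(fd);

        if (unset_environment)
            unsetenv("NOTIFY_SOCKET");
        return ret;
    }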
Before:
wla@kikyo:~% ldd /usr/sbin/haproxy
linux-vdso.so.1 (0x00007ffcfaf65000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x000074637fef4000)
libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x000074637fe4f000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x000074637f400000)
liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x000074637fe0d000)
libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x000074637f92a000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x000074637f365000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000074637f000000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000074637f27a000)
libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x000074637fdff000)
libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x000074637eeb8000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x000074637fdcd000)
libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x000074637ee01000)
liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x000074637fda8000)
/lib64/ld-linux-x86-64.so.2 (0x000074637ff5d000)
libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x000074637f904000)
After:
wla@kikyo:~% ldd /usr/sbin/haproxy
linux-vdso.so.1 (0x00007ffd51901000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f758d6c0000)
libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007f758d61b000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f758ca00000)
liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x00007f758d5d9000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f758d365000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f758d5ba000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f758c600000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f758c915000)
/lib64/ld-linux-x86-64.so.2 (0x00007f758d729000)
A backport to all stable versions could be considered at some point.
Since faa8c3e ("MEDIUM: lb-chash: Deterministic node hashes based on
server address") the following configuration will cause haproxy to crash:
backend test1
mode http
balance hash int(1)
server s1 haproxy.org:80
This is because the lbprm.update_server_eweight() method is now systematically
called in _srv_set_inetaddr_port() upon srv addr/port change (and with the
above config it happens during startup after initial dns resolution).
However, depending on the chosen lbprm algo, update_server_eweight function
may not be set (it is not a mandatory method, some lb implementations don't
define it).
Thus, using 'balance hash' with map-based hashing or 'balance sticky' will
cause a crash due to a NULL de-reference in _srv_set_inetaddr_port(). To
fix the issue, we first check that the update_server_eweight() method is
set before using it.
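The fix boils down to something like this (a simplified sketch; variable
names are assumptions):

    /* only propagate the weight change if the LB algo implements it */
    if (srv->proxy->lbprm.update_server_eweight)
        srv->proxy->lbprm.update_server_eweight(srv);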
No backport needed unless faa8c3e ("MEDIUM: lb-chash: Deterministic node
hashes based on server address") gets backported.
This issue arrived with this commit:
"MINOR: quic: HyStart++ implementation (RFC 9406)"
Thanks to @chipitsine for having reported this issue in GH #2513.
Should be backported where the previous commit will be backported.
In peer_send_msg(), we take a lock before calling
peer_send_teach_process_msgs() because of the check on the flags and update
indexes, and the function then drops it and takes it again just to resume
in the same situation, so that on return we can drop it again! Not only
is this absurd because it doubles the cost of taking the lock, it's also
totally inefficient because it takes a write lock while the only use
made of it is to read the indexes! Let's drop the lock from
peer_send_teach_process_msgs() and move it explicitly into its only caller
around the condition, and turn it into a read lock only.
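The caller-side change can be sketched as follows (the lock label and field
names are assumptions, not haproxy's exact code):

    int updates_to_push, ret;

    /* take the update lock for reading only, just to check whether this
     * peer still has updates to push for this shared table
     */
    HA_RWLOCK_RDLOCK(STK_TABLE_UPDT_LOCK, &st->table->updt_lock);
    updates_to_push = (st->last_pushed != st->table->update);
    HA_RWLOCK_RDUNLOCK(STK_TABLE_UPDT_LOCK, &st->table->updt_lock);

    if (updates_to_push)
        ret = peer_send_teach_process_msgs(appctx, peer, st);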
The MAX() macro was used to limit the count of bytes to be transferred
in appctx_raw_rcv_buf() by commit ee53d8421f ("MEDIUM: applet: Simplify
a bit API to exchange data with applets") instead of MIN(). It didn't
seem to have any consequences until commit f37ddbeb4b ("MAJOR: cli:
Update the CLI applet to handle its own buffers") that triggers a BUG_ON()
in __b_putblk() when the other side is slow to read, because we're trying
to append a full buffer on top of a non-empty one. A way to reproduce it
is to dump a heavy stick table on the CLI while the screen is scrolling.
No backport is needed since this was introduced in 3.0-dev3 and revealed
after dev5 only.
In 2.9, the stick-tables' locking was split between the lock used to
manipulate the contents (->lock) and the lock used to manipulate the
list of updates and the update indexes (->updt_lock). This was done
with commit 87e072eea5 ("MEDIUM: stick-table: use a distinct lock for
the updates tree"). However a part was overlooked in the peers code,
the parts that consult (and update) the indexes use the table's lock
instead of the update lock. It's surprising that it hasn't caused more
trouble. It's likely due to the fact that the tree nodes are not often
immediately freed and that their memory area remains connected to valid
nodes in the tree during peer_stksess_lookup(), while other parts only
check or update indexes, thus are not that critical.
This needs to be backported wherever the commit above is, thus logically
2.9.
It is only an issue when the kernel splicing is used. The zero-copy
forwarding via the buffers is not affected. When a shutdown is received on
the producer side and some data are blocked in the pipe for a while, the
shutdown may be forwarded to the other side. Usually, in this case, the
shutdown must be scheduled, waiting for all output data (from the channel
and the consumer's iobuf) to be sent. But only the channel was considered.
The bug was introduced by commit 20c463955d ("MEDIUM: channel: don't look at
iobuf to report an empty channel"). To fix the issue, we must also check
data blocked in the consumer's iobuf.
This patch should solve the issue #2505. It must be backported to 2.9.
This is a simple algorithm to replace the classic slow start phase of the
congestion control algorithms. It should reduce the high packet loss during
this step.
Implemented only for Cubic.
According to the documentation, "option redispatch 0" is expected to
disable redispatch just like "no option redispatch", but due to the
fact that it keeps PR_O_REDISP set, it doesn't actually work. Let's
make sure value 0 is properly handled and drops PR_O_REDISP. This can
be backported to all versions since it seems it has been broken since
its introduction in 1.6 with commit 726ab7145c ("MEDIUM: backend: Allow
redispatch on retry intervals").
As a workaround, "no option redispatch" does work though.
In 2.7 we addressed a race condition in the stick tables expiration task
with commit fbb934d ("BUG/MEDIUM: stick-table: fix a race condition when
updating the expiration task"). The issue was that the task could be
running on another thread which would destroy its expiration timer
while another thread had just recalculated it and was preparing to queue it, causing
a bug due to the attempt to queue an expired task. The fix consisted in
enclosing the change into the stick-table's lock, which had a very low
cost since it's done only after having checked that the date changed,
i.e. no more than once every millisecond.
But as reported by Ricardo and Felipe from Taghos in github issue #2508,
a tiny race remained after the fix: the unlock() was done before the call
to task_queue(), leaving a tiny window for another thread to run between
unlock() and task_queue() and erase the timer. As confirmed, it's
sufficient to also protect the task_queue() call.
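In other words, the fix keeps task_queue() inside the protected section,
roughly like this (a simplified sketch around the names mentioned above;
the exact surrounding code differs):

    HA_RWLOCK_WRLOCK(STK_TABLE_LOCK, &t->lock);
    if (!tick_isset(t->exp_task->expire) ||
        tick_is_lt(new_exp, t->exp_task->expire)) {
        t->exp_task->expire = new_exp;
        task_queue(t->exp_task);       /* now done before unlocking */
    }
    HA_RWLOCK_WRUNLOCK(STK_TABLE_LOCK, &t->lock);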
But overall this raises a point regarding the task_queue() API on tasks
that may run anywhere. A while ago an attempt was made at removing the
timer for woken up tasks, but something like this would deserve to be
revisited with more atomicity on the timer manipulation (e.g. maybe
atomically use task_schedule() instead). This should be backported to all
stable branches.
Motivation: When services are discovered through DNS resolution, the order in
which DNS records get resolved and assigned to servers is arbitrary. Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they have
assigned different IPs and ports to the server in that slot.
This patch adds a server option, "hash-key <key>" which can be set to "id" (the
existing behaviour, default), "addr", or "addr-port". By deriving the keys for
the chash tree nodes from a server's address and port we ensure that independent
HAProxy instances will agree on routing decisions. If an address is not known
then the key is derived from the server's puid as it was previously.
When adjusting a server's weight, we now check whether the server's hash has
changed. If it has, we have to remove all its nodes first, since the node keys
will also have to change.
A compilation error occurs when using DEBUG_MEM_STATS due to a variable
now being unused in debug_iohandler_memstats() :
src/debug.c: In function ‘debug_iohandler_memstats’:
src/debug.c:1862:24: error: unused variable ‘sc’ [-Werror=unused-variable]
1862 | struct stconn *sc = appctx_sc(appctx);
| ^~
This has been caused by the following commit:
94b8ed446f
MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands
This must not be backported.
log load-balancing implementation was not seamlessly integrated within
lbprm API. The consequence is that it could become harder to maintain
over time since it added some specific cases just for the log backend.
Moreover, it resulted in some code duplication since balance algorithms
that are common to logs and regular (tcp, http) backends were specifically
rewritten for log backends.
Thanks to the previous commit, we now have all the prerequisites to make
log load-balancing fully leverage lbprm logic. Thus in this patch we make
__do_send_log_backend() use existing lbprm algorithms, and we no longer
require log-specific lbprm initialization in cfgparse.c and in
postcheck_log_backend().
As a bonus, for log backends this allows weighted algorithms to properly
support weights (i.e. roundrobin, random and log-hash) since we now
leverage the same lb algorithms that we use for tcp/http backends
(doc was updated).
As previously mentioned in cd352c0db ("MINOR: log/balance: rename
"log-sticky" to "sticky""), let's define a sticky algorithm that may be
used from any protocol. Sticky algorithm sticks on the same server as
long as it remains available.
The documentation was updated accordingly.
b61147fd ("MEDIUM: log/balance: merge tcp/http algo with log ones")
introduced some ambiguities, because while it shares some algos with the
ones from mode {tcp,http}, we forgot to report an error when the user tries
to use an algorithm that is not available in this mode (as per the doc).
Because of that, haproxy would silently drop log messages during runtime.
To fix that, we ensure that algo is one of the supported ones during log
backend postparsing. If the algo is not supported, we raise an error.
This should be backported to 2.9 with b61147fd.
The CLI applet is now using its own snd_buf callback function. Instead of
copying as much output data as possible, only one command is copied at a
time.
To do so, a new state CLI_ST_PARSEREQ is added for the CLI applet. In this
state, the CLI I/O handler knows a full command was copied into its input
buffer and it must parse this command to evaluate it.
This flag can be used by endpoints to know that the data to send, via the
.snd_buf callback function, are the last ones. It is useful to know a
shutdown is pending but cannot be delivered as long as the data being sent
are not consumed.
It is now the responsibility of the applets' .snd_buf callback function to
notify that the input buffer is full. This will allow the applets to not
consume all data while waiting for more data. Of course, it is only useful
for applets using a custom .snd_buf callback function.
It is the third applet to be refactored to use its own buffers. In addition to
the CLI applet, some I/O handlers of CLI commands were also updated, especially
the stats ones.
Some command I/O handlers were updated to use applet's buffers instead of
channels ones.
It is a harmless bug for now because only the stats and cache applets are using
their own buffers and it is not possible to trigger this bug with these
applets. However, it remains important to try a receive if EOI, EOS or ERROR
is reached by the applet while no data was produced. Otherwise, it is not
possible to ack these events at the SE level.
No backport needed.
The main CLI I/O handler is responsible for interrupting the processing on
shutdown/abort. It is not the responsibility of the I/O handlers of CLI
commands to take care of it.
Instead of using connection versions, we now use generic versions. It means
we will also perform sync receives and sync sends on applets too, but only
for applets using their own buffers. Old applets are not concerned.
sc_sync_recv() and sc_sync_send() were added to use connection or applet
versions, depending on the endpoint type. For now these functions are not
used. But this will be used by process_stream() to replace the connection
version.
A command larger than a buffer was silently rejected by the CLI
applet. It was handled as an error and the connection was closed, but no
error message was reported to notify the user. Now an error is reported
before closing. It is only displayed if the chunk buffer used by the CLI
applet is full and no delimiter (\n or ;) is found to mark the end of the
command. It works for a simple command but also for a command with a huge
payload.
This patch could be backported to all stable versions.
This commit allows "cookie" keyword for dynamic servers. After code
review, nothing was found which could prevent a dynamic server to use
it. An extra warning is added under cli_parse_add_server() if cookie
value is ignored due to a non HTTP backend.
This patch is not considered a bugfix. However, it may be backported if
needed as its impact seems minimal.
When adding a server dynamically, we observe that when a backend has a
dynamic persistence cookie, the new server has no cookie as we receive
the following HTTP header:
set-cookie: test-cookie=; Expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/
Whereas we were expecting to receive something like the following, which
is what we receive for a server added in the config file:
set-cookie: test-cookie=abcdef1234567890; path=/
After investigating code path, srv_set_dyncookie() is never called when
adding a server through CLI, it is only called when parsing config file
or using "set server bkd1/srv1 addr".
To fix this, call srv_set_dyncookie() inside cli_parse_add_server().
This patch must be backported up to 2.4.
Since their first implementation, dynamic servers are created into
maintenance state. This has been done purposely to avoid immediate
activation of a newly inserted server.
However, this principle is incompatible if "enabled" keyword is used on
"add server". The newly created instance will be unreacheable as proxy
load-balancing algorithm is not informed of its presence via
srv_lb_propagate(). The new server could be unblocked by toggling its
state with "disable server" / "enable server" commands, which will
trigger srv_lb_propagate() invocation.
To avoid this unexpected state, simply forbid "enabled" keyword for
dynamic servers. In the long term, it could be possible to re-authorize
it but at least this requires to call srv_lb_propagate() on dynamic
server creation.
This should fix github issue #2497.
This patch should not be backported as-is, to avoid breaking dynamic
servers API on stable versions. "enabled" should instead be ignored for
them. This will be implemented in a dedicated patch on top of 2.9.
This option can be used to set a default ocsp-update mode for all
certificates of a given conf file. It allows to activate ocsp-update on
certificates without the need to create separate crt-lists. It can still
be superseded by the crt-list 'ocsp-update' option. It takes either "on"
or "off" as value and defaults to "off".
Since setting this new parameter to "on" would mean that we try to
enable ocsp-update on any certificate, and also certificates that don't
have an OCSP URI, the checks performed in ssl_sock_load_ocsp were
softened. We don't systematically raise an error when trying to enable
ocsp-update on a certificate that does not have an OCSP URI, be it via
the global option or the crt-list one. We will still raise an error when
a user tries to load a certificate that does have an OCSP URI but a
missing issuer certificate (if ocsp-update is enabled).
The inconsistencies in 'ocsp-update' parameter were only checked when
parsing a crt-list line so if a certificate was used on a bind line
after being used in a crt-list with 'ocsp-update' set to 'on', then no
error would be raised. This patch helps detect such inconsistencies.
This patch can be backported up to branch 2.8.
In a crt-list such as the following:
foo.pem [ocsp-update off] foo.com
foo.pem bar.com
we would get a wrong "Incompatibilities found in OCSP update mode ..."
error message during init when the two lines are actually saying the
same thing since the default for 'ocsp-update' option is 'off'.
This patch can be backported up to branch 2.8.
A recent issue was uncovered by the CI which started to randomly report
segfaults on a few tests, and more systematically on FreeBSD. It turns
out that it was introduced by recent commit 03816ccfa9 ("MAJOR: ring:
insert an intermediary ring_storage level"), which overlooked the munmap()
path of the sink and startup logs: once the ring and its storage were
split, it was no longer correct to munmap() the ring, only its storage
area needs to be unmapped, and the ring must always be freed separately.
Thanks to Christopher and William for their help at trying to reproduce
it and figure the circumstances that triggers it.
No backport is needed.
Sebastien Gross reported that 'interface' keyword ('source' subargument)
is silently ignored when used from 'default-server' directive despite the
documentation implicitly stating that the keyword should be supported
there.
When support for 'source' keyword was added to 'default-server' directive
in dba97077 ("MINOR: server: Make 'default-server' support 'source'
keyword."), we properly duplicated the conn iface_name from the default-
server but we forgot to copy the conn iface_len which must be set as well
since it is used as setsockopt()'s 'optlen' argument in
tcp_connect_server().
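The fix is essentially a one-liner in the default-server settings copy
(a sketch assuming the usual conn_src layout):

    /* copy both the interface name and its length from the default server */
    if (defsrv->conn_src.iface_name)
        srv->conn_src.iface_name = strdup(defsrv->conn_src.iface_name);
    srv->conn_src.iface_len = defsrv->conn_src.iface_len;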
It should be backported to all stable versions.
OpenSSL 3.2 triggers the code part added by commit 25da217 ("MINOR: ssl:
Update ssl_fc_curve/ssl_bc_curve to use SSL_get0_group_name") which
contains a variable declaration in the for() statement and breaks on
older compilers, as reported in GH issue #2501.
Let's just declare it normally to fix the problem. This must be
backported wherever the commit above is (at least 2.9).
We don't care in what order the threads are released, so we can write
their sent value using relaxed atomic stores. This brings a 3-5% perf
boost on ARM with 80 cores, reaching 7.25M/s, and doesn't change
anything on x86 since it keeps using strict ordering.
It has been found that performing a first pass consisting in copying
all messages, and a second one to notify about releases is more efficient
on AMD than updating all of them on the fly using a CAS, despite making
writers wait longer to be released.
Maybe it's related to the ability for the CPU to prefetch the contents
during a simple load while it wouldn't do it for an XCHG, it's unsure
at this point. This will also later permit the use of relaxed stores to
release threads.
On ARM the performance increased to 7.0M/s. If this patch is applied
before the dropping of the intermediary step, instead it drops to
3.9M/s. This shows the dependency between such changes that strive to
limit the number of writes on the fast path.
On x86_64, the EPYC at 3C6T saw a small drop from 4.57M to 4.45M, but
the 24C48T setup saw a nice 33% boost from 3.33M to 4.44M, i.e. we
get stable perf at 3 and 24 cores, despite having 8 CCX involved and
fighting with each other.
Other possibilities are:
- use of HA_ATOMIC_XCHG() instead of FETCH_OR()
=> slightly faster (4.62/7.37 vs 4.58/7.34). Pb: requires to
modify the readers to wait much longer since the tail value
won't be valid in this case during updates, and it will have
to wait by looping over it.
- use other conditions to release a cell
=> to be tested
Archs relying on CAS benefit from a read prior to FETCH_OR, so it's
not just x86 that benefits from this. Let's just change the condition
to only exclude __ARM_FEATURE_ATOMICS which is the only one faster
without.
The loop was cleaned up a little bit so that the inner loops are more
readable and that the ifdef'd parts are whole blocks and not just an
"if" condition. A few conditions were adjusted to benefit from "break"
and "continue".
This is mostly a cleanup in that it turns the two-level loop into a
single one, but it also simplifies the code a little bit and brings
some performance savings again, which are mostly noticeable on ARM,
but don't change anything for x86.
x86_64 doesn't have a native atomic FETCH_OR(), it's implemented using
a CAS, which will always cause a write cycle. Here we know we can just
wait as long as the lock bit is held so better loop on a load, and only
attempt the CAS on success. This requires a tiny ifdef and brings nice
benefits. This brings the performance back from 3.33M to 3.75M at 24C48T
while doing no change at 3C6T.
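The idea looks roughly like this (an illustrative fragment; RING_TAIL_LOCK
and tail_ofs are assumed names, not haproxy's exact code):

    uint64_t curr;    /* tail_ofs is the shared tail offset word */

    while (1) {
    #if defined(__x86_64__)
        /* FETCH_OR is a CAS loop here, so wait on a cheap load first and
         * only attempt the CAS once the lock bit appears to be released
         */
        while ((curr = HA_ATOMIC_LOAD(&tail_ofs)) & RING_TAIL_LOCK)
            __ha_cpu_relax();
        if (HA_ATOMIC_CAS(&tail_ofs, &curr, curr | RING_TAIL_LOCK))
            break;
    #else
        curr = HA_ATOMIC_FETCH_OR(&tail_ofs, RING_TAIL_LOCK);
        if (!(curr & RING_TAIL_LOCK))
            break;
        __ha_cpu_relax();
    #endif
    }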
By doing that and placing the cpu_relax at the right places, the ARM
reaches 6.0M/s on 80 threads. On x86_64, at 3C6T the EPYC sees a small
increase from 4.45M to 4.57M but at 24C48T it sees a drop from 3.82M
to 3.33M due to the write contention hidden behind the CAS that
implements the FETCH_OR(), that we'll address next.
The queue-based approach consists in forcing threads to wait away from
the work area so as not to disturb the current writer, and to prepare
the work by grouping them in a queue. The last arrived takes the head
of the queue by placing its preinitialized ring cell there, becomes the
queue's leader, informs itself about the amount of previously accumulated
bytes so that when its turn comes, it immediately knows how much room is
needed to be released.
It can then take the whole queue with it, leaving an empty one for new
threads to come while it's releasing the room needed to copy everything.
By doing so we're cascading contention areas so that multiple parts can
work in parallel.
Note that we must never leave a write counter set to 0xFF at tail, and
this happens when a message cannot fit and we give up, because in this
case we're writing back tail_ofs, and only later we restore the counter.
The solution here is to make a special case when we're going to drop
the messages, and to write the readers count before restoring tail.
This already shows a tremendous performance gain on ARM (385k -> 4.8M),
thanks to the fact that now all waiting threads wait on the queue's
head instead of polluting the tail lock. On x86_64, the EPYC sees a big
boost at 24C48T (1.88M -> 3.82M) and a slowdown at 3C6T (6.0->4.45)
though this one is much less of a concern as so few threads need less
bandwidth than bigger counts.
Now the rings have one wait queue per group. This should limit the
contention on systems such as EPYC CPUs where the performance drops
dramatically when using more than one CCX.
Tests were run with different numbers and it was shown that value
6 outperforms all other ones at 12, 24, 48, 64 and 80 threads on an
EPYC, a Xeon and an Ampere CPU. Value 7 sometimes comes close and
anything around these values degrades quickly. The value has been
left tunable in the global section.
This commit only introduces everything needed to set up the queue count
so that it's easier to adjust it in the forthcoming patches, but it was
initially added after the series, making it harder to compare.
It was also shown that trying to group the threads in queues by their
thread groups is counter-productive and that it was more efficient to
do that by applying a modulo on the thread number. As surprising as it
seems, it does have the benefit of well balancing any number of threads.
Code disassembly shows that ring->storage->tail and ring->queue are
accessed a lot and reloaded a lot due to aliasing. Let's just have
variables for them in the local stack. It makes the code smaller and
slightly faster.
It's inefficient and counter-productive that each ring writer iterates
over all readers to wake them up. Let's just have one in charge of this,
it strongly limits contention. The only thing is that since the thread
is iterating over a list, we want to be sure that if the first readers
have already completed their job, they will be woken up again. For this
we keep a counter of messages delivered after the wakeup started, and
the waking thread will check it before going back to sleep. In order to
avoid looping forever, it will also drop its waking flag soon enough to
possibly let another one take it.
There used to be a few watchdog reports on this list iteration on a
24-core AMD EPYC platform before this change; those never appeared
anymore afterwards.
The perf has dropped a bit on 3C6T on the EPYC, from 6.61 to 6.0M but
remains unchanged at 24C48T.
If there's nothing to read, it's pointless for a reader to try to update
the offset pointer, that's two atomic ops to replace a value by itself
twice. Let's just stop this.
It was only used to protect the list which is now an mt_list so it
doesn't provide any required protection anymore. It obviously also
used to provide strict ordering between the writer and the reader
when the writer started to update the messages, but that's now
covered by the ordered tail updates and updates to the readers
count to protect the area.
The message rate on small thread counts (up to 12) saw a boost of
roughly 5%, while for large counts it lost about 2% due to some
contention now becoming visible elsewhere.
Typical measures are 6.13M -> 6.61M at 3C6T, and 1.88 -> 1.92M at
24C48T on the EPYC.
The writer is using tags 0xFF instead of readers count at the front of
messages that are undergoing an update, while the tail has already been
updated. The reader needs to take care of this because it can face these
messages and mistakenly parse data that's still being written, leading
to corruption (especially if this happens while the size is changing).
Let's just stop reading when facing reserved codes, since they indicate
that the end of usable messages was reached.
Since we're going to remove the lock, there's no more way to prevent the
ring from being fed while we're attaching a client to it. We need to
freeze the buffer while looking at its head so that we can attach there
and have a trustable one. We could do it by setting the lock bit on the
tail offset but quite frankly we don't need to bother with that, attaching
a client is rare enough to permit a thread_isolate().
Rings are keeping a lock only for the list, which apparently doesn't
need anything more than an mt_list, so let's first turn it into that
before dropping the lock. There should be no visible effect.
There's no point looking for freshly attached readers if there are none,
taking this lock requires an atomic write to a shared area, something we
clearly want to avoid.
A general test with 213-byte messages on different thread counts shows
how the performance degrades across CCX and how this patch improves the
situation:
Before After
3C6T/1CCX: 6.39 Mmsg/s 6.35 Mmsg/s
6C12T/2CCX: 2.90 Mmsg/s 3.16 Mmsg/s
12C24T/4CCX: 2.14 Mmsg/s 2.33 Mmsg/s
24C48T/8CCX: 1.75 Mmsg/s 1.92 Mmsg/s
This tends to confirm that the queues will really be needed and that
they'll have to be per-ccx hence per thread-group. They will amortize
the number of updates on head & tail (one per multiple messages).
We know we can continue to protect the message area so we can unlock the
tail as soon as we know its new value. Now we're seeing ~6.4M msg/s vs
5.4M previously on 3C6T of a 3rd gen EPYC, and 1.88M vs 1.54M for 24C48T
threads, which is a significant gain!
This requires to carefully write the new head counter before releasing
the writers, and to change the calculation of the work area from
tail..head to tail...new_tail while writing the message.
Now the lock is only taken around the readers list. With careful
ordering of writes to head/tail, the ring remains protected.
The perf is a bit better, though (1.54M msg/s vs 1.4M at 48T on
a 3rd gen EPYC, and 5.4M vs 5.3M for a 3C6T setup).
We're now locking the tail while looking for some room in the ring. In
fact it's still while writing to it, but the goal definitely is to get
rid of the lock ASAP. For this we reserve the topmost bit of the tail
as a lock, which may have as a possible visible effect that buffers will
be limited to 2GB instead of 4GB on 32-bit machines (though in practice,
good luck allocating more than 2GB contiguous on 32-bit), but in
practice since the size is read with atol() and some operating systems
limit it to LONG_MAX unless passing negative numbers, the limit is
already there.
For now the impact on x86_64 is significant (drop from 2.35 to 1.4M/s
on 48 threads on EPYC 24 cores) but this situation is only temporary
so that changes can be reviewable and bisectable.
Other approaches were attempted, such as using XCHG instead, which is
slightly faster on x86 with low thread counts (but causes more write
contention), and forces readers to stall under heavy traffic because
they can't access a valid value for the queue anymore. A CAS requires
preloading the value and is less good on ARMv8.1. XADD could also be
considered with 12-13 upper bits of the offset dedicated to locking,
but that looks overkill.
The reader now needs to protect the positions it's reading. This is
already done via the readers counter at the beginning of messages,
but as long as the lock is present, this counter is decremented
before starting to parse messages, and incremented at the end.
We must now do that in reverse, first protect the end of the messages,
and only then remove ourselves from the already processed messages, so
that at no point could a writer pass over and possibly overwrite data
we're currently watching.
The goal here is to start to protect the writing area inside the area
itself so that we'll later be able to release the ring's lock. We're not
there yet, but at least the tail is marked as protected for as long as the
message is not fully written.
We'll want to reserve some special values for the readers count to
temporary lock the following message, but for this it will be mandatory
that readers check for them before incrementing/decrementing the counter.
Let's do that using a CAS. The readers' performance is not as critical as
the writer's anyway so the slight overhead is not a problem.
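A minimal sketch of the reader-side CAS (the macro name for the reserved
values is an assumption):

    /* ptr points to the single-byte readers count in front of the message;
     * only grab a reader slot if it is not one of the values reserved by
     * writers to lock the area
     */
    uint8_t cnt = HA_ATOMIC_LOAD(ptr);
    do {
        if (cnt >= RING_MAX_READERS)    /* reserved: area being written */
            break;
    } while (!HA_ATOMIC_CAS(ptr, &cnt, (uint8_t)(cnt + 1)));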
The purpose is to store a head and a tail that are independent so that
we can further improve the API to update them independently from each
other.
The struct was arranged like the original one so that as long as a ring
has its head set to zero (i.e. no recycling) it will continue to work.
The new format is already detectable thanks to the "rsvd" field which
indicates the number of reserved bytes at the beginning. It's located
where the buffer's area pointer previously was, so that older versions
of haring can continue to open the ring in repair mode, and newer ones
can use the fact that the upper bits of that variable are zero to guess
that it's working with the new format instead of the old one. Also let's
keep in mind that the layout will further change to place some alignment
constraints.
The haring tool is thus updated based on this: it detects that the
rsvd field is smaller than a page and that the sum of it and the size
equals the mapped size, in which case it uses the new dump_v2() function
instead of dump_v1(). The new function also creates a buffer from the
ring's area, size, head and tail and calls the generic one so that no
other code had to be adapted.
The code now looks cleaner and more easily shows what still needs to be
addressed. There are not that many changes in practice, these are mostly
mechanical, essentially hiding the buffer from the callers.
This is the start of the replacement of the buffer API calls. Only the
ring_write() function was touched. Instead of manipulating a buffer all
along, we now extract the ring buffer's head and tail upon entry, store
them locally and use them using the vec<->ring API until the last moment
where we can update the buffer with the new values. One subtle point is
that we must never fill the buffer past the last byte otherwise the
vec-to-ring conversion gets lost and there's no more possibility to know
where's the beginning nor the end (just like when dealing with head+tail
in fact), because it then becomes impossible to distinguish between an
empty and a full buffer.
In ring_resize() we used to check if the new ring was at least as large
as the previous one before resizing it, but what counts is that it's as
large as the previous one's contents. Initially it was thought this
would not really matter, but given that rings are initially created as
BUFSIZE, it's currently not possible to shrink them for debugging
purposes. Now with this change it is.
The ring resizing was already quite tricky, but when facing atomic
writes it will no longer be possible and we definitely do not want to
have to deal with a lock there. Since it's only done at boot time, and
possibly later from the CLI, let's just do it under thread isolation.
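Something along these lines (a sketch; the isolation could equally well
live inside the resize function itself):

    /* stop all other threads while the ring storage is being swapped */
    thread_isolate();
    ring_resize(ring, new_size);
    thread_release();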
We'll need to add more complex structures in the ring, such as wait
queues. That's far too much to be stored into the area in case of
file-backed contents, so let's split the ring definition and its
storage once for all.
This patch introduces a struct ring_storage which is assigned to
ring->storage, which contains minimal information to represent the
storage layout, i.e. for now only the buffer, and all the rest
remains in the ring itself. The storage is appended immediately after
it and the buffer's pointer always points to that area. It has the
benefit of remaining 100% compatible with the existing file-backed
layout. In memory, the allocation loses the size of a struct buffer.
It's not even certain it's worth placing the size there, given that it's
constant and that a dump of a ring wouldn't really need it (the file size
is sufficient). But for now everything comes with the struct buffer, and
later this will change once split into head and tail. Also this area may
be completed with more information in the future (e.g. storage version,
format, endianness, word size etc).
Till now we used to rely on a heuristic pointer comparison to check if
a ring was mapped or allocated. Better assign a flag to clarify this
because it's going to become difficult otherwise.
Some open-coded constructs were updated to make use of the ring accessors
instead. This allows to remove some direct dependencies on the buffers
API a bit more.
In startup_logs_dup() we currently need to reference the ring's buffer,
better not do this as it will complicate operations when switching to
other types.
The ring_write() function uses confusing variable names: totlen is in
fact the length of the message, not the total length that is going to
be written. Let's rename it msglen and have a real "needed" that
corresponds to the total size we're going to write. We also add a
BUG_ON_HOT() to catch mistakes causing discrepancies.
In order to support concurrent writers we'll need to lock areas in the
buffer. For this we'll use one special value of the single-byte readers
count. Let's reserve it now and use the macro instead of the hardcoded
255.
The goal is to remove references to the buffer's head and tail in the
fast path so that we can release the lock during some reads. This means
no more comparisons with b_data() nor operations relative to b_head()
will be possible anymore. As a first step we need to have an absolute
offset in the buffer, and to use b_getblk_ofs() in the applet callbacks
to retrieve the data based on this.
This code becomes even simpler and almost does not need any knowledge
of the structure of the ring anymore. It even highlighted that an old
race had not been fixed due to code duplication, but that's now done.
This new function is made around the loop that scans a ring for new
messages and dispatches them to a message handler. It also takes
ring flags (WAIT, NEW, etc) and offset pointers that the caller will
use to initialize/reuse/update the current processing offset. The
caller is still responsible for presetting it to ~0 before the
first call if it wants the function to automatically adjust it (or set
it to the correct value). The function may also return the last_ofs
that was known before releasing the lock so that the caller knows
what to compare against and if it needs to restart processing or not.
The context remains a void* so that should not necessarily depend on
an appctx.
The current "show ring" code was ported to this and it continues to
work as expected.
A ring is used for the DNS code but slightly differently from the generic
one, which prevents some important changes from being made to the generic
code without breaking DNS. As the use cases differ, it's better to just
split them apart for now and have the DNS code use its own ring that we
rename dns_ring and let the generic code continue to live on its own.
The unused parts such as CLI registration were dropped, resizing and
allocation from a mapped area were dropped. dns_ring_detach_appctx() was
kept despite not being used, so as to stay consistent with the comments
that say it must be called, despite the DNS code explicitly mentioning
that it skips it for now (i.e. this may change in the future).
Hopefully after the generic rings are converted the DNS code can migrate
back to them, though this is really not necessary.
The ring reader code was duplicated as-is in 2.2 for the ring forwarding
code in commits 494c505703 ("MEDIUM: ring: add server statement to forward
messages from a ring") and 975564784f ("MEDIUM: ring: add new srv statement
to support octet counting forward") (which only differs by using a prefix
instead of a suffix to delimit messages).
Unfortunately, that makes it almost impossible to rework the core ring
code because all these parts rely on it. This first commit aims at
restoring a common structure for the core loop by just calling a distinct
function based on the use case. The functions are either
applet_append_line() when a whole line is to be emitted followed by an LF
character, or syslog_applet_appent_event() when trying to send a TCP
syslog line prepended with its size in decimal.
There is no functional change beyond this.
This function takes a buffer on input, and offset and a length, and
consumes the block from that buffer to send it to the appctx's output
buffer. Contrary to its sibling applet_append_line(), instead of just
appending an LF at the end of the line, it prepends the message size
in decimal and a space before the message, as expected by syslog TCP
implementations. This will be used to simplify the ring reader code.
This function takes a buffer on input, and offset and a length, and
consumes the block from that buffer to send it to the appctx's output
buffer. This will be used to simplify the ring reader code.
This new command, enabled only with "DEBUG_DEV", sends 2 or 20 traces
per task wakeup (depending on the verbosity level), and stops after 1M
wakeups per thread in order not to have to stop/start the process each
time it's fired.
We have two small messages and 18 larger ones from 20 to 270 bytes
each, so that the average size is approx 213 bytes counting headers
(the header adds approx 82 bytes), which matches what's generally
observed on average when traces are enabled in all muxes.
Typical figures show variations between 5.7M and 6.2M msg/s on an EPYC
in a 3C6T setup (single CCX), and 2.12M - 2.22M in a 24C48T setup
(across 8 CCX, with 8 thread groups).
In http_7239_extract_{ipv4,ipv6}, we declare a local buffer in order to
use inet_pton() since it requires a valid destination argument (cannot be
NULL). Then, if the caller provided <ip> argument, we copy inet_pton()
result (from local buffer to <ip>).
In fact when the caller provides <ip>, we may directly use <ip> as
inet_pton() dst argument to avoid a useless copy. Thus the local buffer
is only relevant when the user doesn't provide <ip>.
While at it, let's add a missing testcase for the rfc7239_n2nn converter
(to check that http_7239_extract_ipv4() with <ip> provided works properly).
This could be backported to 2.8 with b2bb925 ("MINOR: proxy/http_ext:
introduce proxy forwarded option").
Willy reported that since 3ac79b504 ("MEDIUM: server:
make server_set_inetaddr() updater serializable"), haproxy fails to
compile on some older compilers such as gcc-4.4 with this kind of error:
src/server.c: In function 'snr_resolution_cb':
src/server.c:4471: error: unknown field 'dns_resolver' specified in initializer
compilation terminated due to -Wfatal-errors.
make: *** [Makefile:1006: src/server.o] Error 1
This is due to referencing a member inside an anonymous union from a compound
literal assignment. Apparently such use of anonymous unions wasn't properly
supported back then on older compilers. To fix the issue, we give the name "u"
to the parent union and use this name to explicitly refer to the union where
relevant in the code (only a few changes fortunately).
The fix itself was verified to restore build compatibility with gcc 4.4
(and even 4.2).
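The underlying compiler limitation can be illustrated with a tiny standalone
example (hypothetical types, unrelated to haproxy's own structures):

    /* On gcc < ~4.6, a designated initializer cannot reference a member of
     * an anonymous union; naming the union "u" and going through it works.
     */
    struct updater {
        int type;
        union {
            int dns_ns_id;
            int cli_token;
        } u;                               /* previously anonymous */
    };

    struct updater up = { .type = 1, .u.dns_ns_id = 42 };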
As 3ac79b504 is used as a prerequisite for 64c9c8ef3 ("BUG/MINOR:
server/dns: use server_set_inetaddr() to unset srv addr from DNS"), please
consider backporting this patch too if 64c9c8ef3 happens to be backported
in 2.9.
Trailers are skipped by the FCGI multiplexer. However empty chunked messages
are not properly handled. It may be a chunked H1 request with no payload or
an H2/H3 POST request with no payload. In that case, the EOT HTX block is
just ignored. The issue is that the EOM flag is thus ignored too. It means
no empty STDIN record is sent to mark the end of the request to the server.
To fix the issue, when an EOT HTX block is found and it is the last HTX block
of the message (and it should be), the EOM flag is tested. If it is found,
an empty STDIN record is emitted.
This patch should fix the issue #2499. It must be backported as far as 2.4.
QUIC MUX is freed via qcc_release(). This in turn releases all the
remaining QCS instances. For each one of them, their corresponding
stream-desc is released via qc_stream_desc_release().
This last function may itself notify QUIC MUX when new buffers are
available. This is useful when QCS are closed individually without the
whole connection. However, when the connection is closed through
qcc_release(), this may cause issues as some elements of QUIC MUX are
already freed.
In 2.9.6, a bug was detected directly linked to this. Indeed, QCC
instance may be woken up on stream-desc release. If called through
qcc_release(), this is an issue because QCC tasklet is freed before QCS
instances. However, this bug is not systematic and relies on prior
conditions : in particular, QUIC MUX must be under Tx buffers exhaustion
prior to the qcc_release() invocation.
The current dev tree is not impacted by this bug, thanks to QUIC MUX
refactoring. Indeed, notification across layers has changed and now
stream-desc release notifies individual QCS instances instead of the QCC
element, which is a safer mechanism. However, to simplify backport
process, bugfix is introduced in the current dev tree as it does not
have any impact.
Note that a proper fix would be to set quic-conn MUX state to
QC_MUX_RELEASED. However, it is not possible to call quic_close()
without having released all stream-desc elements first. The simpler
solution was chosen to prevent other breaking issues during backports.
This should fix github issue #2494.
It should be backported up to 2.6. Note that prior to 2.7 qcc_release()
was named qc_release().
This commit is similar to the following one:
65ae241dcfe710e1cdd3ec4e7a9bde38d2e4c116
MEDIUM: server: close idle conn before server deletion
This patch implements a similar logic, this time to close private idle
connections stored in sessions. The principle is identical to the above
commit : conn_release() is used on idle connections after a takeover to
ensure thread safety.
An extra change was required to be able to execute takeover on such
connections. Their original thread ID was unknown, contrary to non
private connections which are stored in sharded lists. As such, a new
tid member has been added under sess_priv_conns chaining element.
When a backend connection is marked as idle, a special flag TASK_F_USR1
is set on MUX tasklet. When MUX tasklet is reactivated, extra checks are
executed under this flag to ensure no takeover occurred in the meantime.
Previously, only non-private connections could be targeted by a
takeover. However, this will change when implementing private idle
connections closure on "delete server" CLI handler. As such, TASK_F_USR1
is now also set for private connections in MUX detach callbacks.
To be able to delete a server, a number of preconditions must be
validated to ensure it is not in use anymore. Previously, if idle
connections were stored in the server, the deletion was cancelled. No
action was implemented to force idle connection closure, the only
solution was to wait for the periodic purging to be achieved.
This is an extra burden to be able to delete a server. Indeed, idle
connections are by definition inactive and can be closed prior to deleting
a server. This is the exact purpose of this patch.
Idle connections removal is implemented inside "delete server" handler,
once it has been determined that the server can be freely removed. A
simple loop is run to call conn_release() over each idle connection.
Takeover is also executed before conn_release() to ensure tasks/tasklets
or any other sensible elements are not deleted from a foreign thread.
This patch should reduce the occurrence of rejected "delete server"
execution, especially when connection reuse is high.
Extend takeover API both for MUX and XPRT with a new boolean argument
<release>. Its purpose is to signal if the connection will be freed
immediately after the takeover, rendering new resource allocation
unnecessary.
For the moment, the release argument is always false. However, it will be
set to true on delete server CLI handler to proactively close server
idle connections.
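The extended prototype looks like this (a sketch; the exact declaration in
the mux_ops/xprt_ops structures may differ slightly):

    /* <release> is non-zero when the connection will be freed right after
     * the takeover, so the callee may skip allocating replacement resources
     */
    int (*takeover)(struct connection *conn, int orig_tid, int release);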
Several places reuse the same code to ensure a connection is properly
freed, either via its MUX or by calling the proper set of functions.
Factorize all of this in a new function conn_release().
This new function is now called via session_free() and
session_accept_fd(). It will also be reused on delete server to
proactively close idle connections.
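A rough sketch of what such a helper looks like (the exact body in
src/connection.c may differ):

    /* Free a connection, going through its mux when there is one so that
     * all attached resources are released consistently.
     */
    void conn_release(struct connection *conn)
    {
        if (conn->mux) {
            /* the mux is responsible for destroying everything */
            conn->mux->destroy(conn->ctx);
        }
        else {
            conn_stop_tracking(conn);
            conn_full_close(conn);
            conn_free(conn);
        }
    }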
Those fetches were undocumented and were just there so that the
ocsp-update log could be made through a regular log format. But since
the logging is now "handmade" (since BUG/MEDIUM: ssl: Fix crash in
ocsp-update log function), we don't need those anymore.
Since commit "BUG/MEDIUM: ssl: Fix crash in ocsp-update log function",
some information in the log line is "faked" because it cannot actually be
retrieved anymore (or never could). We should then remove it from the log
line altogether instead of providing some useless fields.
We then only keep pure OCSP-update information in the log line:
"<certname> <status> <status str> <fail count> <success count>"
The ocsp-update logging mechanism was built around the 'sess_log'
function which required to keep a pointer to the said session until the
logging function could be called. This was made by keeping a pointer to
the appctx returned by the 'httpclient_start' function. But this appctx
lives its life on its own and might be destroyed before
'ssl_ocsp_send_log' is called, which could result in a crash (UAF).
Fixing this crash requires to stop using the 'sess_log' function to emit
the ocsp-update logs. The log line will then need to be built by hand
out of the information actually available when 'ssl_ocsp_send_log' is
called. Since we don't use the "regular" logging functions anymore, we
don't need to use the error_logformat anymore. In order to keep a
behavior consistent with before, we will keep the same format for the
logs but replace the fields that required a 'sess' pointer with fake
values (the %ci:%cp for instance, which was never filled anyway).
This crash was raised in GitHub issue #2442.
It should be backported up to branch 2.8.
The CLI command "update ssl ocsp-response" was forcefully removing an
OCSP response from the update tree regardless of whether it used to be
in it beforehand or not. But since the main OCSP update task works by
removing the entry being currently updated from the update tree and then
reinserting it when the update process is over, it meant that in the CLI
command code we were modifying a structure that was already being used.
These concurrent accesses were not properly locked on the "regular"
update case because it was assumed that once an entry was removed from
the update tree, the update task was the only one able to work on it.
Rather than locking the whole update process, an "updating" flag was
added to the certificate_ocsp in order to prevent the "update ssl
ocsp-response" command from trying to update a response already being
updated.
An easy way to reproduce this crash was to perform two "simultaneous"
calls to "update ssl ocsp-response" on the same certificate. It would
then crash on an eb64_delete call in the main ocsp update task function.
This patch can be backported up to 2.8. Wait a little bit before
backporting.
With the current way OCSP responses are stored, a single OCSP response
is stored (in a certificate_ocsp structure) when it is loaded during a
certificate parsing, and each SSL_CTX that references it increments its
refcount. The reference to the certificate_ocsp is kept in the SSL_CTX
linked to each ckch_inst, in an ex_data entry that gets freed when the
context is freed.
One of the downsides of this implementation is that if every ckch_inst
referencing a certificate_ocsp gets destroyed, then the OCSP response is
removed from the system. So if we were to remove all crt-list lines
containing a given certificate (that has an OCSP response), and if all
the corresponding SSL_CTXs were destroyed (no ongoing connection using
them), the OCSP response would be destroyed even if the certificate
remains in the system (as an unused certificate).
In such a case, we would want the OCSP response not to be "usable",
since it is not used by any ckch_inst, but still remain in the OCSP
response tree so that if the certificate gets reused (via an "add ssl
crt-list" command for instance), its OCSP response is still known as
well.
But we would also like such an entry not to be updated automatically
anymore once no instance uses it. An easy way to do it could have been
to keep a reference to the certificate_ocsp structure in the ckch_store
as well, on top of all the ones in the ckch_instances, and to remove the
ocsp response from the update tree once the refcount falls to 1, but it
would not work because of the way the ocsp response tree keys are
calculated. They are decorrelated from the ckch_store and are the actual
OCSP_CERTIDs, which is a combination of the issuer's name hash and key
hash, and the certificate's serial number. So two copies of the same
certificate but with different names would still point to the same ocsp
response tree entry.
The solution that answers all the needs expressed above is actually
to have two reference counters in the certificate_ocsp structure, one
actual reference counter corresponding to the number of "live" pointers
on the certificate_ocsp structure, incremented for every SSL_CTX using
it, and one for the ckch stores.
If the ckch_store reference counter falls to 0, the corresponding
certificate must have been removed via CLI calls ('set ssl cert' for
instance).
If the actual refcount falls to 0, then no live SSL_CTX uses the
response anymore. It could happen if all the corresponding crt-list
lines were removed and there are no live SSL sessions using the
certificate anymore.
If any of the two refcounts becomes 0, we will always remove the
response from the auto update tree, because there's no point in spending
time updating an OCSP response that no new SSL connection will be able
to use. But the certificate_ocsp object won't be removed from the tree
unless both refcounts are 0.
Must be backported up to 2.8. Wait a little bit before backporting.
By default, backend connections are accounted by the server. This allows
to determine the number of idle connections to keep. A backend
connection can also be marked as private to prevent its reuse. It is
thus moved from the server lists into the session list. As such, a private
connection is not accounted into server : conn_set_private() uses
srv_release_conn() to ensure this.
When using HTTP/2 on backend side with default http-reuse safe, the
above principles are mixed. Indeed, when a connection is first used, or
switches from idle to used, it is moved into the session list but it is
not flagged as private. This is done to prevent its sharing by different
clients to prevent head-of-line blocking issue. When all streams are
closed, the connection becomes idle again and is reinserted in the
server list. This has been introduced by the following patch :
0d21deaded
MEDIUM: backend: add reused conn to sess if mux marked as HOL blocking
When freeing a backend connection, special care is taken to ensure
server used counter is decremented. This is implemented into
conn_backend_deinit(). However, this function does this only if the
connection is not present in a session list. This is valid for private
connections. However, if a connection is non-private and present only
temporarily into a session list, the decrement operation won't be
executed despite the connection being accounted by the server.
This bug has several impacts. The server used counter won't be able to
reach its initial null value, even when all its connections are closed.
This can result in a wrong estimation of necessary idle connections,
which may cause unnecessary new connection usage. Also, this will
prevent definitely the server from being removed via "delete server" CLI
command.
This should be backported up to 2.4. Note that conn_backend_deinit() was
introduced in 2.9. For lesser versions, the change should be done
directly into conn_free().
Backend connections can be marked as private to prevent their sharing by
multiple clients. Now, this has become an exception as only two reasons
for data traffic can trigger this (checks are ignored here) :
* http-reuse never
* HTTP response with NTLM header
The first case is easy to manage as the connection is flagged as private
since its inception. However, the second case is dynamic as the
connection can be flagged anytime during its lifetime. When using a
backend protocol such as HTTP/2 with reuse mode aggressive or always, we
face a design issue as the connection would be marked as private,
despite potentially being shared by several clients at the same time.
This is conceptually invalid, but worse, it can trigger crashes on MUX
stream detach callback depending on the order of release of the streams,
by calling session_check_idle_conn() with a NULL session. It could also
be possible to have several NTLM responses on a single connection for
different sessions. In this case, connection owner is still being
updated without attaching the connection to its correct session, which
ultimately would cause a crash on session_check_idle_conn with an
invalid session.
Here are two backtrace examples from GDB for such cases:
Thread 1 (Thread 0x7ff73e9fc700 (LWP 648859)):
#0 session_check_idle_conn (conn=0x7ff72f597800, sess=0x0) at include/haproxy/session.h:209
#1 h2_detach (sd=<optimized out>) at src/mux_h2.c:4520
#2 0x000056151742be24 in sc_detach_endp (scp=scp@entry=0x7ff73e9f0f18) at src/stconn.c:376
#3 0x000056151742c208 in sc_destroy (sc=<optimized out>) at src/stconn.c:444
#4 0x0000561517370871 in stream_free (s=s@entry=0x7ff72a2dbd80) at src/stream.c:728
#5 0x000056151737541f in process_stream (t=t@entry=0x7ff72d5e2620, context=0x7ff72a2dbd80, state=<optimized out>) at src/stream.c:2645
#6 0x0000561517456cbb in run_tasks_from_lists (budgets=budgets@entry=0x7ff73e9f10d0) at src/task.c:632
#7 0x00005615174576b9 in process_runnable_tasks () at src/task.c:876
#8 0x000056151742275a in run_poll_loop () at src/haproxy.c:2996
#9 0x0000561517422db1 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3195
#10 0x00007ff789e081ca in start_thread () from /lib64/libpthread.so.0
#11 0x00007ff789a39e73 in clone () from /lib64/libc.so.6
(gdb)
Thread 1 (Thread 0x7ff52e7fc700 (LWP 681458)):
#0 0x0000556ebd6e7e69 in session_check_idle_conn (conn=0x7ff5787ff100, sess=0x7ff51d2539a0) at include/haproxy/session.h:209
#1 h2_detach (sd=<optimized out>) at src/mux_h2.c:4520
#2 0x0000556ebd7f3e24 in sc_detach_endp (scp=scp@entry=0x7ff52e7f0f18) at src/stconn.c:376
#3 0x0000556ebd7f4208 in sc_destroy (sc=<optimized out>) at src/stconn.c:444
#4 0x0000556ebd738871 in stream_free (s=s@entry=0x7ff520e28200) at src/stream.c:728
#5 0x0000556ebd73d41f in process_stream (t=t@entry=0x7ff565783700, context=0x7ff520e28200, state=<optimized out>) at src/stream.c:2645
#6 0x0000556ebd81ecbb in run_tasks_from_lists (budgets=budgets@entry=0x7ff52e7f10d0) at src/task.c:632
#7 0x0000556ebd81f6b9 in process_runnable_tasks () at src/task.c:876
#8 0x0000556ebd7ea75a in run_poll_loop () at src/haproxy.c:2996
#9 0x0000556ebd7eadb1 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3195
#10 0x00007ff5752081ca in start_thread () from /lib64/libpthread.so.0
#11 0x00007ff574e39e73 in clone () from /lib64/libc.so.6
(gdb)
To solve this issue, simply ignore NTLM responses when using a
multiplexer with stream support and the connection is not already
attached to the session. The connection is not marked as private and
will continue to be shared freely across clients. This is considered
conceptually valid as NTLM usage (RFC 4559) with HTTP is broken and was
designed only with HTTP/1.1 in mind. A side-effect of the change is that
SESS_FL_PREFER_LAST is also not set anymore on NTLM detection, which
allows the following requests to be load-balanced across several server
instances.
The original behavior is kept for HTTP/1 or if the connection is already
attached to the session. This last case happens when using HTTP/2 with
default http-reuse safe mode since the following patch :
0d21deaded
MEDIUM: backend: add reused conn to sess if mux marked as HOL blocking
This should be backported to all stable releases. Down to 2.4, it can be
taken as-is. For older versions, the above patch is not present. In this
case the condition should be restricted to HTTP/1 usage only:
if (srv_conn && strcmp(srv_conn->mux->name, "H1") == 0) {
Frames with a too small size must be detected on receive and an error must
be triggered. It is especially important for frames of size 0. Otherwise,
because the frame length is used as the return value, the frame is ignored
(0 is the return value stating the frame must be ignored). It is an issue
because in this case the outgoing data, the 4 bytes representing the frame
size, are never consumed. If the agent also closes the connection, this
leads to a wakeup loop because outgoing data are stuck and a shutdown is
pending.
In addition, all pending outgoing data are systematically skipped when the
applet is in SPOE_APPCTX_ST_END state.
The patch should fix the issue #2490. It must be backported to all stable
versions.
It is the first deprecated directive exposed via the
'expose-deprecated-directives' global option. This way, it is possible to
silence the warning about SPOE usage.
Similarly to the "expose-experimental-directives" option, there is now a
global option to expose some deprecated directives. The idea is to have a way
to silence warnings about deprecated directives when there is no alternative
solution.
Of course, deprecated directives covered by this option are not listed and
may change. It is only a best effort to let users upgrade smoothly.
As announced on the ML a few weeks (months?) ago and on several GH issues,
the SPOE is now deprecated. Sadly, this filter should be refactored to work
properly. It was implemented as a functional PoC for 1.7 and since then, no
time was invested to improve it and make it truly maintainable over time.
Worse, other parts of HAProxy evolve, especially the applets part, making
maintenance ever more expensive.
Instead of keeping the SPOE filter in this state and always replying to users
encountering issues or limitations that it is far from perfect but that we
cannot work on it for now, we decided to deprecate it.
We can still change our mind before the 3.0.0 release if the situation
evolves. Otherwise the filter will be removed or marked as unmaintained for
the 3.1. If the situation does not change, it means the 3.0 will be the last
version with true SPOE support.
On soft-stop, we try, as far as possible, to process all pending messages
before closing SPOE applets. However, in sync mode, when an applet waiting
for a response receives the ACK frame, it is switched to the IDLE state
without checking if it may be closed. In this case, we will wait for the
idle timeout before closing the applet, delaying the soft-stop.
To reduce this delay, on soft-stop, IDLE applets are woken up. On the next
wakeup, the applet will try to process pending messages or will be
closed.
This patch should be backported to all stable versions.
On stream side, the SPOE filter relied on the stream's expiration date to be
woken up and be able to detect processing timeout. However, the stream
expiration date must not be updated this way. Mainly because it may be
overwritten at the end of process_stream(). In the worst case, it is set to
TICK_ETERNITY for any reason. In this case, it is impossible to detect that
the SPOE filter must time out and abort the processing.
The right way to do it is to set an analysis expiration date on the
corresponding channel, depending on the direction. This expiration date will
be used to compute the stream's expiration date at the end of
process_stream().
This patch may be related to issue #2478. It must be backported to all
stable versions.
A server can only be deleted if no element references it anymore.
This is taken care of via srv_check_for_deletion(), most notably for active
and idle connections.
A special case occurs for connections directly managed by a session.
This is for so-called private connections, when using http-reuse never
or H2 + http-reuse safe for example. In this case, the server does not
account these connections into its idle lists. This caused a bug as the
server could be deleted despite the session still being able to access it.
To properly fix this, add a new referencing element into the server for
these session connections. An mt_list has been chosen for this. With the
default http-reuse, private connections are typically not used so it
won't make any difference. When using H2 servers, or more generally when
dealing with private connections, insert/delete should typically occur
only once per session lifetime, so the impact on performance should be
minimal.
This should be backported up to 2.4. Note that srv_check_for_deletion()
was introduced in 3.0 dev tree. On backport, the extra condition in it
should be placed in cli_parse_delete_server() instead.
By default, backend connections are attached to a server instance. This
makes it possible to implement connection reuse. However, in some particular
cases, a connection cannot be shared across several clients. These
connections are considered private and are attached to the session instance
instead.
These private connections are also indexed by the target server to not
mix them. All of this is implemented via a dedicated structure
previously named struct sess_srv_list.
Rename it to better reflect its usage to struct sess_priv_conns. Also
rename its internal members and all of the associated functions.
This commit is only a renaming, thus no functional impact is expected.
A null pointer dereference was reported by Coverity in the listener_release()
function. Indeed, we must not try to schedule a frontend without a task when
a limit is still blocking the frontend. This issue was introduced by commit
65ae1347c7 ("BUG/MINOR: listener: Wake proxy's mngmt task up if necessary on
session release").
This patch should fix issue #2488. It must be backported to all stable
versions with the commit above.
When a session is released, the listener_release() function is called to
notify the listener. It is an opportunity to resume limited/full listeners.
We first try to resume the listener owning the released session, then all
limited listeners in the global queue and finally all limited listeners in
the frontend's waiting queue. This last step is only performed if there is
no limit applied on the frontend. Nothing is performed if the session rate
is still limited. And it is an issue because if this happens for the last
listener's session, there is no other event to wake the frontend's
management task up and the listener remains in the limited state.
To fix the issue, when a limit is still applied on the frontend, we must
compute the new wake-up date from the session rate and schedule the
frontend's management task.
It is easy to reproduce the issue in SSL by setting a maxconn and a rate
limit on sessions.
This patch should fix the issue #2476. It must be backported to all stable
versions.
-dI allows enabling "insecure-fork-wanted" directly from the command line,
which is useful when you want to run ASAN with addr2line on a lot of
configuration files without editing them.
While trying to reproduce another crash case involving lua filters
reported by @bgrooot on GH #2467, we found out that mixing filters loaded
from different contexts ('lua-load' vs 'lua-load-per-thread') for the same
stream isn't supported and may even cause the process to crash.
Historically, mixing lua-load and lua-load-per-threads for a stream wasn't
supported, but this changed thanks to 0913386 ("BUG/MEDIUM: hlua: streams
don't support mixing lua-load with lua-load-per-thread").
However, the above fix didn't consider the lua filters' use-case properly:
unlike lua fetches, actions or even services, lua filters don't simply
use the stream hlua context as a "temporary" hlua running context to
process some hlua code. For fetches, actions... hlua executions are
processed sequentially, so we simply reuse the hlua context from the
previous action/fetch to run the next one (this bypasses memory
allocations and initialization, thus increasing performance), unless
we need to run on a different hlua state-id, in which case we perform a
reset of the hlua context.
But this cannot work with filters: indeed, once registered, a filter will
last for the whole stream duration. It means that the filter will rely
on the stream hlua context from ->attach() to ->detach(). And here is the
catch, if for the same stream we register 2 lua filters from different
contexts ('lua-load' + 'lua-load-per-thread'), then we have an issue,
because the hlua stream will be re-created each time we switch between
runtime contexts, which means each time we switch between the filters (may
happen for each stream processing step), and since lua filters rely on the
stream hlua to carry context between filtering steps, this context will be
lost upon a switch. Given that the lua filters code was not designed with
that in mind, it would confuse the code and cause unexpected behaviors
ranging from lua errors to a crashing process.
So here we take another approach: instead of re-creating the stream hlua
context each time we switch between "global" and "per-thread" runtime
context, let's have both of them inside the stream directly as initially
suggested by Christopher back then when we talked about the original issue.
For this we leverage hlua_stream_ctx_prepare() and hlua_stream_ctx_get()
helper functions which return the proper hlua context for a given stream
and state_id combination.
As for the debugging info reported after ha_panic(), we check both hlua
runtime contexts to see if one of them was active when the panic occurred
(only 1 runtime ctx per stream may be active at a given time).
This should be backported to all stable versions with 0913386
("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread")
This commit depends on:
- "DEBUG: lua: precisely identify if stream is stuck inside lua or not"
[for versions < 2.9 the ha_thread_dump_one() part should be skipped]
- "MINOR: hlua: use accessors for stream hlua ctx"
For 2.4, the filters API didn't exist. However it may be a good idea to
backport it anyway because ->set_priv()/->get_priv() from tcp/http lua
applets may also be affected by this bug, plus it will ease code
maintenance. Of course, filters-related parts should be skipped in this
case.
Change hlua_stream_ctx_prepare() prototype so that it now returns the
proper hlua ctx on success instead of returning a boolean.
Add hlua_stream_ctx_get() to retrieve hlua ctx out of a given stream.
This way we may easily change the storage mechanism for hlua stream in
the future without extensive code changes.
No backport needed unless a commit depends on it.
When ha_panic() is called by the watchdog, we try to guess from
ha_task_dump() and ha_thread_dump_one() if the thread was stuck while
executing lua from the stream context. However we consider this is the
case by simply checking if the stream hlua context was set, but this is
not very precise because if the hlua context is set, then it simply means
that at least one lua instruction was executed at the stream level, not
that the thread was currently executing lua when the panic occurred.
This is especially true with filters: one could simply register a lua
filter that does nothing, but this will still end up initializing the
stream hlua context for each stream. If the thread ends up being stuck
during the stream handling, then the debug dumping functions will report
that the stream was stuck while handling lua, which is not necessarily
true, and could in fact confuse us even more.
So here we take another approach: we add the BUSY flag to the hlua context.
This flag is set by hlua_ctx_resume() around the lua_resume() call, this way
we can precisely tell if the thread was handling lua when it was
interrupted, and we rely on this flag in debug functions to check if the
thread was effectively stuck inside lua or not while processing the stream.
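Conceptually, the change looks like the following sketch (flag handling
heavily simplified, Lua 5.4 lua_resume() signature assumed):
|/* in hlua_ctx_resume(), around the resume call */
|lua->flags |= HLUA_BUSY;
|ret = lua_resume(lua->T, from, lua->nargs, &nres);
|lua->flags &= ~HLUA_BUSY;
|
|/* in the panic/debug dump path */
|if (stream_hlua && (stream_hlua->flags & HLUA_BUSY))
|        /* the thread really was interrupted inside lua */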
No backport needed unless a commit depends on it.
hlua_filter_delete() calls hlua_unref() on the stream hlua stack, but
we should own the lock prior to manipulating the stack.
This should be backported up to 2.6.
This is a complementary patch to 8670db7 ("BUG/MAJOR: hlua: improper lock
usage with hlua_ctx_resume()") for hlua_filter_new().
Indeed, the HLUA_E_ERRMSG case still relies on the lua stack but didn't
take the lock to do so.
This should be backported up to 2.6.
Trying to register the same lua filter from global and per-thread context
(using 'lua-load' + 'lua-load-per-thread') causes a segmentation fault in
hlua_post_init().
This is due to a simple copy paste error as we try to print the function
name in the error message (like we do when loading the same lua function
from different contexts) instead of the filter name.
This should be backported up to 2.6.
The new "ssl-security-level" option allows one to change the OpenSSL
security level without having to change the openssl.cnf global file of
your distribution. This directive applies to every SSL_CTX context.
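For example, assuming the directive simply takes the numeric OpenSSL level
(0 to 5):
global
    ssl-security-level 2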
People sometimes change their security level directly in the ciphers
directive, however there are some cases when the security level change
is not applied in the right order (for example when applying a DH
param).
Before this patch, it was only possible to work around this by using a
specific openssl.cnf file and starting haproxy this way:
OPENSSL_CONF=./openssl.cnf ./haproxy -f bug-2468.cfg
Values for the security level can be found there:
https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_security_level.html
This was discussed in github issue #2468.
In issue #2448, users are complaining that FIPS is not working correctly
since the removal of SSL_library_init().
This was removed because SSL_library_init() is deprecated with OpenSSL
3.x and emits a warning. But the initialization was not needed anymore
because it is done at the first openssl API call.
However in some cases it is needed. SSL_library_init() is now a define
to OPENSSL_init_ssl(0, NULL). This patch adds OPENSSL_init_ssl(0, NULL)
to the init.
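A minimal sketch of the kind of call that was added (the wrapper name is
illustrative; the actual init code in haproxy differs):

#include <openssl/ssl.h>

/* equivalent of the now-deprecated SSL_library_init() */
static int ssl_explicit_init(void)
{
        return OPENSSL_init_ssl(0, NULL); /* returns 0 on failure */
}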
This could be backported to every stable branch, however let's wait
before backporting it.
3.0-dev1 introduced a small regression with commit b4db3be86e ("BUG/MINOR:
server: fix server_find_by_name() usage during parsing"). By changing the
way servers are indexed and moving it into the server template loop, the
first one is no longer indexed because the loop starts at low+1 since it
focuses on duplication. Let's index the first one explicitly now.
This should not be backported, unless the commit above is backported.
This was not useful and was using an uninitialized value. Introduced with
the commit 08ac28237 ("MINOR: Add aes_gcm_enc converter").
Must be backported wherever the commit 08ac28237 was backported.
The issue was introduced with the commit c31499d74 ("MINOR: ssl: Add
aes_gcm_dec converter").
This must be backported to all stable branches where the above converter
is present, but it may need to be adjusted for older branches because of
code refactoring.
Where possible (FreeBSD 13+), use the public, documented interface to
the ELF auxiliary argument vector: elf_aux_info().
__elf_aux_vector is a private interface exported so that the runtime
linker can set its value during process startup and not intended for
public consumption. In FreeBSD 15 it has been removed from libc and
moved to libsys.
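A sketch of the assumed usage (not the exact haproxy patch):

#if defined(__FreeBSD__)
#include <sys/auxv.h>

/* getauxval()-like helper built on the documented interface */
static unsigned long read_auxval(int type)
{
        unsigned long val = 0;

        if (elf_aux_info(type, &val, sizeof(val)) != 0)
                return 0; /* entry not found or not supported */
        return val;
}
#endif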
This commit removes qc_treat_rx_crypto_frms(). This function was used in
a single place inside qc_ssl_provide_all_quic_data(). Besides, its
naming was confusing as conceptually it is directly linked to the quic_ssl
module instead of quic_rx.
Thus, the body of qc_treat_rx_crypto_frms() is inlined directly inside
qc_ssl_provide_all_quic_data(). Also, qc_ssl_provide_quic_data() is now
only used inside quic_ssl, so its scope is set to static. Overall, the API
for CRYPTO frame handling is now cleaner.
On CRYPTO frames reception, tasklet is rescheduled with TASK_HEAVY to
limit CPU consumption. This commit slightly simplifies this by regrouping
TASK_HEAVY setting and tasklet_wakeup() instructions in a single
location in qc_handle_crypto_frm(). All other unnecessary
tasklet_wakeup() are removed.
Till now it was still needed to write rules to eliminate badly behaving
H2 clients, while most of the time it would be desirable to just be able
to set a threshold on the level of anomalies on a connection.
This is what this patch does. By setting a glitches threshold for frontend
and backend, it becomes possible to automatically switch a connection to the
error state when the threshold is reached so that the connection dies by itself
without having to write possibly complex rules.
One subtlety is that we still have the error state being exclusive to the
parser's state so this requires the h2c_report_glitches() function to return
a status indicating if the threshold was reached or not so that processing
can instantly stop and bypass the state update, otherwise the state could
be turned back to a valid one (e.g. after parsing CONTINUATION); we should
really contemplate the possibility to use H2_CF_ERROR for this. Fortunately
there were very few places where a glitch was reported outside of an error
path so the changes are quite minor.
Now by setting the front value to 1000, a client flooding with short
CONTINUATION frames is instantly stopped.
The function aims at centralizing counter measures but due to the fact
that it only increments the counter by one unit, sometimes it was not
used and the value was calculated directly. Let's pass the increment as an
argument so that it can be used everywhere.
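The counting logic then boils down to something like this self-contained
sketch (struct and field names are illustrative, not the actual h2c members):

struct glitch_ctr {
        unsigned int count;
        unsigned int limit;  /* 0 = no threshold configured */
};

/* account <inc> glitches at once; returns non-zero if the threshold was
 * reached and the connection should be switched to the error state.
 */
static int report_glitches(struct glitch_ctr *g, unsigned int inc)
{
        g->count += inc;
        return g->limit && g->count >= g->limit;
}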
Compilation on Solaris fails because of usage of names reserved on that
platform, i.e. 'queue' and 's_addr'.
This patch redefines 'queue' as '_queue' and renames 's_addr' to
'srv_addr' which fixes compilation for now.
Future plan: rename 'queue' in code base so define can be removed again.
Backporting: 2.9, 2.8
The sink lock was made to prevent event producers from passing while
there were other threads trying to print a "dropped" message, in order
to guarantee the absence of reordering. It has a serious impact however,
which is that all threads need to take the read lock when producing a
regular trace even when there's no reader.
This patch takes a different approach. The drop counter is shifted left
by one so that the lowest bit is used to indicate that one thread is
already taking care of trying to dump the counter. Threads only read
this value normally, and will only try to change it if it's non-null,
in which case they'll first check if they are the first ones trying to
dump it, otherwise they will simply count another drop and leave. This has
a large benefit. First, it will avoid the locking that causes stalls
as soon as a slow reader is present. Second, it avoids any write on the
fast path as long as there's no drop. And it remains very lightweight
since we just need to add +2 or subtract 2*dropped in operations, while
offering the guarantee that the sink_write() has succeeded before
unlocking the counter.
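The idea can be summarized with the following standalone sketch (simplified,
using C11 atomics instead of haproxy's own primitives):

#include <stdatomic.h>

static _Atomic unsigned long dropped; /* (count << 1) | "dump in progress" bit */

static void account_drop(void)
{
        atomic_fetch_add(&dropped, 2);  /* +2 == one more dropped event */
}

static void try_report_drops(void)
{
        unsigned long curr = atomic_load(&dropped);

        if (!curr)
                return;  /* fast path: nothing dropped, no write at all */

        if ((curr & 1) || !atomic_compare_exchange_strong(&dropped, &curr, curr | 1))
                return;  /* another thread is already reporting */

        /* ... emit the "dropped (curr >> 1) events" message; only once it was
         * successfully written do we release the counter:
         */
        atomic_fetch_sub(&dropped, curr + 1);  /* -2*reported, and clear bit 0 */
}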
While a reader was previously limiting the traffic to 11k RPS under
4C/8T, now we reach 36k RPS vs 14k with no reader, so readers will no
longer slow the traffic down and will instead even speed it up due to
avoiding the contention down the chain in the ring. The locking cost
dropped from ~75% to ~60% now (it's in ring_write now).
When a reader doesn't read fast enough and causes drops, subsequent
threads try to produce a "dropped" message. But it takes time to
produce and emit this message, in part due to the use of chunk_printf()
that relies on vfprintf() which has to parse the printf format, and
during this time other threads may continue to increment the counter.
This is the reason why this is currently performed in a loop. When
reading what is received, it's common to see a large count followed
by one or two single-digit counts, indicating that we could possibly
have improved that by writing faster.
Let's improve the situation a little bit. First we're now using a
static message prefixed with enough space to write the digits, and a
call to ultoa_r() fills these digits from right to left so that we
don't have to process a format string nor perform a copy of the message.
Second, we now re-check the counter immediately after having prepared
the message so that we still get an opportunity for updating it. In
order to avoid too long loops, this is limited to 10 iterations.
Tests show that the number of single-digit "dropped" counters on output
now dropped roughly by 15-30%. Also, it was observed that with 8 threads,
there's almost never more than one retry.
The previous patch fixed the handling of in-order CRYPTO frames which
requires the usage of a new buffer for these data as their handling is
delayed to run under TASK_HEAVY.
In fact, as all CRYPTO frame handling must now be delayed, it can be
unified. This is the purpose of this commit, which removes
the just introduced new buffer. Now, all CRYPTO frames are buffered
inside the ncbuf. Unused elements such as crypto_frms member for
encryption level are also removed.
This commit is not a bugfix but is a direct follow-up to the last one.
As such, it can probably be backported with it to 2.9 to reduce code
differences between these versions.
QUIC relies on SSL_do_handshake() to be able to validate the handshake. As
this function is computation heavy, it is since 2.9 called only under
TASK_HEAVY. This has been implemented by the following patch :
94d20be138
MEDIUM: quic: Heavy task mode during handshake
Instead of handling CRYPTO frames immediately during reception, this
patch delays the process to run under TASK_HEAVY tasklet. A frame copy
is stored in the qel.rx.crypto_frms list. However, this frame still
references the receive buffer. If the receive buffer is cleared before
the tasklet is rescheduled, it will point to garbage data, resulting in
haproxy decryption error. This happens if a fair amount of data is
received constantly to preempt the quic_conn tasklet execution.
This bug can be reproduced with a fair amount of clients. It is
exhibited by 'show quic full' which can report connections blocked on
handshake. Using the following command results in h2load not being able to
complete the last connections.
$ h2load --alpn-list h3 -t 8 -c 800 -m 10 -w 10 -n 8000 "https://127.0.0.1:20443/?s=10k"
Also, haproxy QUIC listener socket mode was active to trigger the issue.
This forces several connections to share the same reception buffer,
making the bug even more likely to occur. It should be possible to
reproduce it with connection sockets by increasing the number of clients.
To fix this bug, define a new buffer under quic_cstream. It is used
exclusively to copy CRYPTO data for in-order frame if ncbuf is empty.
This ensures data remains accessible even if receive buffer is cleared.
Note that this fix is only a temporary step. Indeed, a ncbuf is also
already used for out-of-order data. It should be possible to unify its
usage for both in and out-of-order data, rendering this new buffer
instance unnecessary. In this case, several unneeded elements will
become obsolete such as qel.rx.crypto_frms list. This will be done in a
future refactoring patch.
This must be backported up to 2.9.
The converter can be used to encrypt the raw byte input using the
AES-GCM algorithm, using provided nonce and key.
Co-authored-by: Dragan Dosen (ddosen@haproxy.com)
When a parsing error occurs inside a log-format-sd expression, we report
the location of the log-format directive (which may not be set) instead
of reporting the proper log-format-sd directive location where the parsing
error occurred.
1|listen test
2| log-format "%B" # no error
3| log-format-sd "%bad" # error
| [ALERT] (322261) : config : Parsing [empty.conf:2]: failed to parse log-format-sd : no such format variable 'bad'. If you wanted to emit the '%' character verbatim, you need to use '%%'.
The fix consists in using the config hints dedicated to log-format-sd
directive instead of the log-format one.
The bug was introduced in 8a4e4420 ("MEDIUM: log-format: Use standard
HAProxy log system to report errors").
This should be backported to every stable versions.
httpclient_precheck(), ssl_ocsp_update_precheck(), and
resolvers_create_default() functions are registered through
REGISTER_PRE_CHECK() macro to be called by haproxy during init from the
pre_check_list list. When calling functions registered in pre_check_list,
haproxy expects ERR_* return values. However those 3 functions currently
use raw return values, so we better use explicit ERR_* macros to prevent
breakage in the future if ERR_* values mapping were to change.
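For reference, a pre-check callback is expected to look roughly like this
(the callback and its init step are hypothetical examples):

static int my_subsystem_precheck(void)
{
        if (!my_subsystem_alloc())            /* hypothetical init step */
                return ERR_ALERT | ERR_FATAL; /* abort startup with an alert */
        return ERR_NONE;
}

REGISTER_PRE_CHECK(my_subsystem_precheck);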
Since 833cc794 ("MEDIUM: sample: handle comma-delimited converter list")
logformat expressions now support having a comma-delimited converter list
right after the fetch. Let's remove a leftover comment from the initial
implementation that says otherwise.
A remote unidirectional stream can be aborted prematurely if application
layers cannot identify its type. In this case, a STOP_SENDING frame is
emitted.
Since QUIC MUX refactoring, a crash would occur in this scenario due to
2 specific characteristics of remote uni streams :
* qcs.tx.fctl was not initialized completely. This caused a crash due to
a BUG_ON() statement inside qcs_destroy().
* qcs.stream is never allocated. This caused qcs_prep_bytes() to crash
inside qcc_io_send().
This bug is considered minor as it happens only on very specific QUIC
clients. It was detected when using s2n-quic over interop.
This does not need to be backported.
After handshake completion, QUIC server is responsible to emit
HANDSHAKE_DONE frame. Some clients wait for it to begin STREAM
transfers.
Previously, there was no explicit tasklet_wakeup() after handshake
completion, which is necessary to emit post-handshake frames. In most
cases, this was undetected as most clients continue emission, which will
reschedule the tasklet. However, as there is no tasklet_wakeup(), this
is not a consistent behavior. If this bug occurs, it causes a connection
freeze, preventing the client from emitting any request. The connection is
finally closed on idle timeout.
To fix this, add an explicit tasklet_wakeup() after handshake
completion. It sounds simple enough but in fact it's difficult to find
the correct location for the tasklet_wakeup() invocation, as post-handshake
is directly linked to connection accept, with different orderings.
Notably, if 0-RTT is used, the connection can be accepted prior to handshake
completion. Another major point is that along with the HANDSHAKE_DONE frame,
a series of NEW_CONNECTION_ID frames are emitted. However, these new CID
allocations must occur after the connection is migrated to its new thread as
these CIDs are tied to it. A BUG_ON() is present to check this in
qc_set_tid_affinity().
With all this in mind, 2 locations were selected for the necessary
tasklet_wakeup():
* on qc_xprt_start() : this is useful for standard case without 0-RTT.
This ensures that this is done only after connection thread migration.
* on qc_ssl_provide_all_quic_data() : this is done on handshake
completion with 0-RTT used. In this case only, connection is already
accepted and migrated, so tasklet_wakeup() is safe.
Note that as a side-change, quic_accept_push_qc() API has evolved to
better reflect differences between standard and 0-RTT usages. It is now
forbidden to call it multiple times on a single quic_conn instance. A
BUG_ON() has been added.
This issue is labelled as medium even though it seems pretty rare. It
was only reproducible using QUIC interop runner, with haproxy compiled
with LibreSSL and quic-go as the client. However, the affected code parts
are pretty sensitive, which justifies the chosen severity.
This should fix github issue #2418.
It should be backported up to 2.6, after a brief period of observation.
Note that the extra comment added in qc_set_tid_affinity() can be
removed in 2.6 as thread migration is not implemented for this version.
Other parts should apply without conflict.
In resolvers.c:rslv_promex_next_ts() and in
stick-tables.c:stk_promex_next_ts(), an unused argument was mistakenly
called "unsued" instead of "unused". Let's fix this in a separate patch
so that it can be omitted from backports if this causes build problems.
This is the 39th iteration of typo fixes.
The naming issue on the argument called "unsued" instead of "unused"
in two functions from resolvers and stick-tables was put into a second
patch so that it can be omitted if it were to cause backport issues.
That's exactly the same as commit 53bfab080c ("BUG/MINOR: sink: fix a race
condition between the writer and the reader") that went into 2.7 and was
backported as far as 2.4, except that since the code was duplicated, the
second instance was not noticed, leaving the race present. The race has
a limited impact, if a forwarder reaches the end of the logs and a new
message arrives before it leaves, the forwarder will only wake up after
yet another new message is sent. In practice it remains unnoticeable
because for the race to trigger, one needs to have a steady flow of logs,
which means the wakeup will happen anyway.
This should be backported, but no need to insist on it if it resists.
Instead of reporting lua errors using ha_alert(), let's use SEND_ERR()
helper which will also try to generate a log message according to lua
log settings.
hlua_event_subscribe() is meant to be called from a protected lua env
during init and/or runtime. As such, only hlua_event_sub() makes use
of it: when an error happens, hlua_event_sub() will already raise a Lua
exception. Thus it's not relevant to use ha_alert() there as it could
generate log pollution (the error is relevant from the Lua script's point
of view, not from haproxy's).
This could be backported in 2.8.
hlua_ctx_resume() itself can safely be used as-is in a multithreading
context because it takes care of taking the lua lock.
However, when hlua_ctx_resume() returns, the lock is released and it is
thus the caller's responsibility to ensure it owns the lock prior to
performing additional manipulations on the Lua stack. Unfortunately, since
early haproxy lua implementation, we used to do it wrong:
The most common hlua_ctx_resume() pattern we can find in the code (because
it was duplicated again and again over time) is the following:
|ret = hlua_ctx_resume()
|switch (ret) {
| case HLUA_E_OK:
| break;
| case HLUA_E_ERRMSG:
| break;
| [...]
|}
Problem is: for some of the switch cases, we still perform lua stack
manipulations. This is the case for the HLUA_E_ERRMSG for instance where
we often use lua_tostring() to retrieve last lua error message on the top
of the stack, or sometimes for the HLUA_E_OK case, when we need to perform
some lua cleanup logic once the resume ended. But all of this is done
WITHOUT the lua lock, so this means that the main lua stack could be
accessed simultaneously by concurrent threads when a script was loaded
using 'lua-load'.
While it is not critical for switch-cases dedicated to error handling,
(those are not supposed to happen very often), it can be very problematic
for stack manipulations occurring in the HLUA_E_OK case under heavy load
for instance. In this case, main lua stack corruptions will eventually
happen. This is especially true inside hlua_filter_new(), where this bug
was known to cause lua stack corruptions under load, leading to lua errors
and even crashing the process as reported by @bgrooot in GH #2467.
The fix is relatively simple, once hlua_ctx_resume() returns: we should
consider that ANY lua stack access should be lua-lock protected. If the
related lua calls may raise lua errors, then (RE)SET_SAFE_LJMP
combination should be used as usual (it locks the lua stack and catches
lua exceptions at the same time), else hlua_{lock,unlock} may be
used if no exceptions are expected.
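For instance, the HLUA_E_ERRMSG branch of the pattern shown earlier
conceptually becomes (simplified sketch):
|ret = hlua_ctx_resume()
|switch (ret) {
| case HLUA_E_ERRMSG:
|        hlua_lock(hlua);
|        error = hlua_tostring_safe(hlua->T, -1);
|        SEND_ERR(NULL, "*: %s.\n", error);
|        hlua_unlock(hlua);
|        break;
| [...]
|}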
This patch should fix GH #2467.
It should be backported to all stable versions.
[ada: some ctx adj will be required for older versions as event_hdl
doesn't exist prior to 2.8 and filters were implemented in 2.5, thus
some chunks won't apply]
When we want to perform some unsafe lua stack manipulations from an
unprotected lua environment, we use SET_SAFE_LJMP() RESET_SAFE_LJMP()
combination to lock lua stack and catch potential lua exceptions that
may occur between the two.
Hence, we regularly find this pattern (duplicated over and over):
|if (!SET_SAFE_LJMP(hlua)) {
| const char *error;
|
| if (lua_type(hlua->T, -1) == LUA_TSTRING)
| error = hlua_tostring_safe(hlua->T, -1);
| else
| error = "critical error";
| SEND_ERR(NULL, "*: %s.\n", error);
|}
This is wrong because when SET_SAFE_LJMP() returns false (meaning that an
exception was caught), then the lua lock was released already, thus the
caller is not expected to perform lua stack manipulations (because the
main lua stack may be shared between multiple threads). In the pattern
above we only want to retrieve the lua exception message which may be
found at the top of the stack; to do so, we now explicitly take the lua
lock before accessing the lua stack. Note that hlua_lock() doesn't catch
lua exceptions so only safe lua functions are expected to be used there
(lua functions that may NOT raise exceptions).
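The corrected pattern thus looks like this (simplified):
|if (!SET_SAFE_LJMP(hlua)) {
|        const char *error;
|
|        hlua_lock(hlua);
|        if (lua_type(hlua->T, -1) == LUA_TSTRING)
|                error = hlua_tostring_safe(hlua->T, -1);
|        else
|                error = "critical error";
|        SEND_ERR(NULL, "*: %s.\n", error);
|        hlua_unlock(hlua);
|}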
It should be backported to all stable versions.
[ada: some ctx adj will be required for older versions as event_hdl
doesn't exist prior to 2.8 and filters were implemented in 2.5, thus
some chunks won't apply, but other fixes should stay relevant]
In hlua_filter_new(), after each hlua resume, we systematically try to
empty the stack by calling lua_settop(). However we're doing this without
locking the lua context, so it is unsafe in multithreading context if the
script is loaded using 'lua-load'. To fix the issue, we protect the call
with hlua_{lock,unlock}() helpers.
This should be backported up to 2.6.
In hlua_filter_callback(), some lua stack work is performed under
SET_SAFE_LJMP() guard which also takes care of locking the hlua context
when needed. However, a lua_gettop() call is performed out of the guard,
thus it is unsafe in multithreading context if the script is loaded using
'lua-load' because in this case the main lua stack is shared between
threads and each access to a lua stack must be performed under the lock,
thus we move the lua_gettop() call under the lock.
It should be backported up to 2.6.
hlua_filter_new() handles memory allocation errors by jumping to the
"end:" cleanup label in case of errors. Such errors may happen when the
system is heavily loaded for instance.
In hlua_filter_new(), we try to allocate two hlua contexts in a row before
checking if one of them failed (in which case we jump to the cleanup part
of the function), and only then we initialize them both.
If a memory allocation failure happens for only one out of the two
flt_ctx->hlua[] contexts pair, we still jump to the cleanup part.
It means that the hlua context that was successfully allocated and wasn't
initialized yet will be passed to hlua_ctx_destroy(), resulting in invalid
reads in the cleanup function, which may ultimately cause the process to
crash.
To fix the issue: we make sure flt_ctx hlua contexts are initialized right
after they are allocated, that is before any error handling condition that
may force the cleanup.
This bug was discovered when trying to reproduce GH #2467 with haproxy
started with "-dMfail" argument.
It should be backported up to 2.6.
As per lua documentation, lua_tostring() may raise a memory error.
However, we're often using it to fetch the error message at the top of
the stack (ie: after a failing lua call) from unprotected environments.
In practice, lua_tostring() has rare chances of failing, but still, if
it happens to be the case, it could crash the process and we better not
risk it.
So here, we add hlua_tostring_safe() function, which works exactly as
lua_tostring(), but the function cannot LJMP as it will catch
lua_tostring() exceptions to return NULL instead.
Everywhere lua_tostring() was used to retrieve error string from such
unprotected contexts, we now rely on hlua_tostring_safe().
This should be backported to all stable versions.
[ada: ctx adj will be required, for versions prior to 2.8 event_hdl
API didn't exist so some chunks won't apply, and prior to 2.5 filters
API didn't exist either, so again, some chunks should be ignored]
Lua documentation says that lua_tostring() returns a pointer that remains
valid as long as the object is not removed from the stack.
However there are some places where we use the returned string AFTER the
corresponding object is removed from the stack. In practice this doesn't
seem to cause visible bugs (probably because the pointer remains valid
waiting for a GC cycle), but let's fix that to comply with the
documentation and avoid undefined behavior.
It should be backported in all stable versions.
Thomas Baroux reported a very interesting issue. "balance random" would
systematically assign the same server first upon restart. That comes from
its use of statistical_prng() which is only seeded with the thread number,
and since at low loads threads are assigned to incoming connections in
round robin order, practically speaking, the same thread always gets the
same request and will produce the same random number.
We already have a much better RNG that's also way more expensive, but we
can use it at boot time to seed the PRNG instead of using the thread ID
only.
This needs to be backported to 2.4.
Add core.silent (-1) value to be able to disable logging via
TXN:set_loglevel() call. Otherwise, there is no way to do so and it may be
handy. This special value cannot be used with TXN:log() function.
This patch may be backported if necessary.
When the log level is changed in lua, by calling TXN:set_loglevel function,
it must be incremented by one because it is decremented in strm_log()
function.
This patch must be backported to all stable versions.
PROXY protocol is not supported on QUIC for now. Thus return an error during
configuration parsing if the 'accept-proxy' option is used for a QUIC
listener.
This patch should fix the issue #2186. It should be backported as far as 2.6.
2 return values are specified in the h2s_make_data() function comment. Both
are more or less equivalent but the latter is probably more accurate. So,
keep the right one and remove the other one.
This patch should fix the issue #2175.
Extend "show quic" to be able to dump MUX related information. This is
done via the new function qcc_show_quic(). This replaces the old streams
dumping list which was incomplete.
This information is displayed on the full output or by specifying the "mux" field.
Add the possibility to customize show quic full output with only a
specific set of printed fields. This is specified as a comma-separated
list. Here are the currently supported values:
* tp: transport parameters
* sock: connection addresses and socket FD
* pktns: packet number space with ack ranges and in flight bytes
* cc: congestion controller and loss information
Note that the streams output is not filtered by this mechanism. This is
because it will soon be replaced by an output generated from the MUX which
will use its own field names.
Add the possibility to restrict show quic output to only a single
connection. This is done by specifying a quic_conn address pointer.
Default format selection has evolved with it. Indeed, it seems more
fitting to use full format by default when filtering on a connection.
However, it's still possible to revert to the original oneline format
with it by specifying it explicitly.
When a response was returned by HAProxy, a dedicated HTX flag was
set. Thanks to this flag, it was possible to add a "connection: close"
header to the response if the request was not fully received and to close
the connection. In the same way, when a redirect rule was applied,
keep-alive was forcefully disabled for unfinished requests.
All these mechanisms are now useless because the H1 mux is able to drain the
request. So the HTX_FL_PROXY_RESP flag is removed and no special processing is
performed on HAProxy response when the request is unfinished.
Unlike H2 and H3, there is no mechanism in H1 to notify the client that it
must stop uploading data when a response is sent before the end of the
request, short of closing the connection. There is no RST_STREAM frame
equivalent.
Thus, there are only two ways to deal with this situation: closing the
connection or draining the request. Until now, HAProxy didn't support
draining H1 messages. Closing the connection in this case however has a
major drawback. It leads to sending a TCP reset, dropping this way all
in-flight data. There is no guarantee the client has fully received the
response.
Draining H1 messages was never implemented because in old versions it was a
bit tricky to implement. However, it is now far simpler to support this
feature because it is possible to have an H1 stream without any applicative
stream. It is the purpose of this patch. Now, when a shutdown is requested
and the stream is detached from the connection, if the request is unfinished
while the response was fully sent, the request is drained.
To do so, in this case the shutdown and the detach are delayed. From the
upper layer point of view, there are no changes. The endpoint is shut down
and detached as usual. But from the H1 mux point of view, the H1 stream is
still alive and is able to drain data. However the stream-endpoint
descriptor is orphan. Once the request is fully received (and drained), the
connection is shut down if it cannot be reused for a new transaction and the
H1 stream is destroyed.
All code from the h1_detach() function was moved into an internal function,
h1s_finish_detach(). It will be used to defer the detach and be able to
drain the requests payload.
Checks performed in h1_shutw() to determine if the connection must be
shut down now or not were moved into a dedicated function. This will be used to
be able to drain the requests payload.
During a zero-copy forwarding negotiation, if the H1 mux is blocked for any
reason, the IOBUF_FL_FF_BLOCKED flag must be set on its iobuf to notify the
producer it must wait. However, there were two places where it was not
performed: when the output buffer allocation failed and when the chunk
formatting failed.
This patch fixes the issue. It must be backported to 2.9.
There is still an issue with zero-copy forwarding of chunks with an unknown
size. It is possible for a producer to fill the space reserved for the CRLF
at the end of the chunk. The root cause is that this space is not accounted
in the iobuf offset. So, from the producer point of view, the space may be
used. We can also argue the current design for iobuf is not well suited for
this case. Instead of using a pointer on the consumer's buffer, it could be
easier to use a custom buffer built on top of the consumer one, via a call
to b_make(), with the size, head and data fields reflecting the available
space the producer can use.
By the way, because of this bug, it is possible to trigger a BUG_ON() when
we try to write the CRLF at the end of the chunk because the buffer is
full. It is unexpected. Only the stats applet may hit this bug.
To fix the issue, instead of writing this CRLF when the current chunk is
consumed, it is written before consuming the next one. This way, all space
reserved to create the chunk formatting is always placed before forwarding
data.
No backport needed.
This is a followup of the previous commit: GH user @songliumeng initially
reported an issue with the GPL license version for event_hdl source file
which was fixed by the previous commit. It turns out the same mistake was
made in http_ext source file: due to a mixup between LGPL and GPL, GPL
version '2.1' was referenced instead of '2'.
Again, clarify that this is indeed GPL by making use of the banner
provided in doc/gpl.txt
This should be backported in 2.8 with b2bb925 ("MINOR: proxy/http_ext:
introduce proxy forwarded option")
As spotted by user @songliumeng in GH #2463, there was a mixup between
LGPL and GPL in event_hdl source file: GPL version '2.1' was referenced
instead of '2'. Clarify that this is indeed GPL by making use of the
banner provided in doc/gpl.txt.
This should be backported in 2.8 with 68e692d ("MINOR: event_hdl: add
event handler base api")
Since 23cab33 ("BUG/MINOR: ssl: Clear the ckch instance when deleting a
crt-list line"), LIST_DELETE is done twice, one time in
cli_parse_del_crtlist() and another time in ckch_inst_free().
It could trigger a crash with -DDEBUG_LIST.
This isn't a major problem since the ptr is not freed in the meantime so
it will only trigger with the debug.
This patch removes the LIST_DELETE as well as the loop done on link_ref
which is also done in ckch_inst_free().
Could be backported as far as 2.4. 2.4 version does not have a link_ref
loop.
Contrary to static servers, dynamic servers do not initialize their
settings from a default server instance. As such, _srv_parse_init() was
responsible to set a set of minimal values to have a correct behavior.
However, some settings were not properly initialized. This caused
dynamic servers to not behave as static ones without explicit
parameters.
Currently, the main issue detected is connection reuse which was
completely impossible. This is due to incorrect pool_purge_delay and
max_reuse settings incompatible with srv_add_to_idle_list().
To fix the connection reuse, but also more generally to ensure dynamic
servers are aligned with other server instances, define a new function
srv_settings_init(). This is used to set initial values for both default
servers and dynamic servers. For static servers, srv_settings_cpy() is
kept instead, using their default server as reference.
This patch could have unexpected effects on dynamic server behavior as
it restores proper initial settings. Previously, they were set to 0 via
calloc() invocation from new_server().
This should be backported up to 2.6, after a brief period of
observation.
This patch reverts 2 fixes that were made in an attempt to fix the
ocsp-update feature used with the 'commit ssl cert' command.
The patches crash the worker when doing a soft-stop when the 'set ssl
ocsp-response' command was used, or during runtime if the ocsp-update
was used.
This was reported in issue #2462 and #2442.
The last patch reverted is the associated reg-test.
Revert "BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing"
This reverts commit 5e66bf26ec.
Revert "BUG/MEDIUM: ocsp: Separate refcount per instance and per store"
This reverts commit 04b77f84d1b52185fc64735d7d81137479d68b00.
Revert "REGTESTS: ssl: Add OCSP related tests"
This reverts commit acd1b85d3442fc58164bd0fb96e72f3d4b501d15.
In appctx_htx_rcv_buf(), HTX blocks found in the appctx output buffer are
copied into the channel buffer. At the end, the state of the underlying
buffer must be updated. If everything was copied, the buffer is reset. This
way, it will be released later, at the end of the applet process function.
However, here there was a typo. We do it on the input buffer instead of the
output buffer. As a side effect, an empty HTX message remained stuck in the
appctx output buffer, blocking the applet and leading to a blocked session
with no expiration date.
No backport needed.
Due to the possibility of calling a control process after adding CRLs, the
ssl_commit_crlfile_cb variable was added. It is actually a pointer to the
callback function, which is called if defined after initial loading of CRL
data from disk and after committing CRL data via CLI command
'commit ssl crl-file ..'.
If the callback function returns an error, then the CLI commit operation
is terminated.
Also, one case was added to the CLI context used by "commit cafile" and
"commit crlfile": CACRL_ST_CRLCB in which the callback function is called.
Signed-off-by: William Lallemand <wlallemand@haproxy.com>
Output of 'show quic' is messed up since the introduction of reordered
packets counter in the following commit. The new counter is mixed up
with the first stream line. This is due to the wrong placement of the
newline delimiter.
167e38e0e0
MINOR: quic: Add a counter for reordered packets
This should be backported up to 2.6.
The issue was described in commit "BUG/MEDIUM: cli: Warn if pipelined commands
are delimited by a \n". In non-interactive mode, it was possible to use a
newline character as delimiter for pipelined commands. As a consequence, it was
possible to stop command processing in the middle.
With the above commit, a warning is emitted to notify users. With this one,
we restore the expected behavior, as documented in the management guide.
Only the first line of commands is parsed. This commit will not be
backported to avoid breaking changes on stable versions.
This commit has of course some visible effects. All scripts using a newline
character as delimiter to pipeline commands in non-interactive mode will
stop working. Only the first command will be evaluated, all others will be
ignored. Pipelined commands MUST now be separated by a semi-colon.
But there is a more subtle and probably more annoying change. It is no
longer possible to pipeline commands with a payload! A command with a
payload will always be the last one evaluated because it must be finished by
a newline (possibly preceded by a custom pattern).
It is really annoying to introduce such a breaking change. But, in the long
term, it is mandatory. 2.8 will be the last LTS version supporting the old
behavior (with some warnings however). This will give users 4 years to adapt
their scripts.
No backport needed.
This was broken since commit 0011c25144 ("BUG/MINOR: cli: avoid O(bufsize)
parsing cost on pipelined commands"). It is not really a bug fix but it is
labelled as such to make it more visible.
Before, a full line was first retrieved from the request buffer before
extracting the first command to eval it. Now, only one command is retrieved.
But we rely on the request buffer state to interrupt processing in
non-interactive mode. After processing a command, if the output of the
request buffer is empty, we leave. Before the above commit, this was not a
problem. But since then, it is obviously wrong. First because some input
data may still be there. It is not true today, but it might change. Then,
there is no guarantee that all commands are received at the same time. For a
small list of commands, it will be the case most of the time, but it is a
dangerous assumption. For a long list of commands, it is almost always false.
To be an issue, commands must be chunked exactly between two commands. But
in this case, remaining commands are skipped. A good way to reproduce the
issue is to wait a bit between two commands, for instance:
(printf "show info;"; sleep 2; printf "show stat\n") | socat ...
In fact, to properly fix the issue, we should exit on the first command
finished by a newline. Indeed, as stated in the documentation, in
non-interactive mode, a single line is processed. To pipeline commands,
commands must be separated by a semi-colon. Unfortunately, the above commit
introduced another change. It is possible to pipeline commands delimited by
a newline. It was pushed 2 years ago and backported to all stable versions.
Several scripts may rely on this behavior.
So, on stable versions, the bug will not be fixed. However a warning will be
emitted to notify users that their scripts don't respect the documentation
and that they must adapt them. Mainly because the cli behavior on this point
will be changed in 3.0 to stick to the doc. This warning will only be emitted
once over the whole worker process life. The idea is to not flood the logs
with the same warning for every offending command.
This commit should probably be backported to all stable versions. But with
some cautions because the CLI was often modified.
This loop was added to detect pipelined commands when only co_getline() was
used to get commands. Now, co_getdelim() is used and the semi-colon is also
considered as a command delimiter.
As a side effect, the last semi-colon, if any, is no longer replaced by a
newline. Thus, we must take care to adapt the test to detect partial
commands.
On qcs_destroy(), a BUG_ON() statement checks that the QCS does not have
any prepared data anymore. This is to ensure connection flow control is
always coherent and to prevent transfer freezes.
However, this BUG_ON() may cause a spurious crash in case the QCC is
considered in error. Indeed, in this case, all transfers are interrupted
and qmux_strm_detach() will proceed to immediate QCS free before
releasing the connection. In this situation, connection flow control is
irrelevant so the BUG_ON() should be ignored.
This crash occurs since the MUX refactoring via the following patch.
Previously, a similar BUG_ON() was used but it was incorrectly
implemented, rendering it immune even to the targeted cause.
3fe3251593
MEDIUM: mux-quic: simplify sending API
This should fix github issue #2456.
This does not need to be backported.
Before a dynamic server can be deleted, a set of preconditions must be
validated to ensure it is not referenced anymore by a stream or a
connection. This is implemented in srv_check_for_deletion().
The various criteria specified were incomplete. This allowed a server
instance to be deleted while still being referenced by a stream and a
connection.
This bug was reproduced by using ASAN compilation. A script was used to
add and delete a server every second, while using h2load to generate
traffic with download of 1k objects. Here is the ASAN error.
==140916==ERROR: AddressSanitizer: heap-use-after-free on address 0x520000020080 at pc 0x63cb25679537 bp 0x701529ff5070 sp 0x701529ff5060
READ of size 1 at 0x520000020080 thread T7
#0 0x63cb25679536 in objt_server include/haproxy/obj_type.h:99
#1 0x63cb2568f465 in process_stream src/stream.c:1823
#2 0x63cb25a4a4a2 in run_tasks_from_lists src/task.c:632
#3 0x63cb25a4bf62 in process_runnable_tasks src/task.c:876
#4 0x63cb2596a220 in run_poll_loop src/haproxy.c:3050
#5 0x63cb2596b192 in run_thread_poll_loop src/haproxy.c:3252
#6 0x701539aa9559 (/usr/lib/libc.so.6+0x8b559) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
#7 0x701539b26a3b (/usr/lib/libc.so.6+0x108a3b) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
To fix this, add <curr_used_conns> to the counters checked in
srv_check_for_deletion().
Outside of this bug, one case which remains sensitive is for SF_DIRECT
streams which reference a server instance early in process_stream()
before connect_server(). This occurs with the use-server directive,
force-persist rule or cookie persistence. However, after code
reexamination, the code is considered reliable as process_stream() is
not rescheduled before connect_server() invocation. These observations
have been saved in sess_change_server() documentation to ensure it
remains valid in the future.
This must be backported up to 2.6.
Server refcount is used to protect from server deletion while dumping a
server instance, for stats dump on both CLI and HTTP applet. However,
dump can be aborted prematurely before reaching the end. In this case,
server refcount is never decremented.
This bug can cause an inconsistency on server refcounts, preventing servers
from being deleted even after "del server" succeeds.
To fix this, implement release handler for both stats CLI and HTTP
applet. Drop server reference if dump was interrupted during servers
loop.
This should be backported up to 2.6.
Recent commit 2ed6068 ("MINOR: log: custom name for logformat node")
introduced a potential memory leak because when custom name is provided,
lf->name value is allocated using strdup(), thus is expected to be freed
alongside the node when the node is released.
However lf->name was only freed in some common places within log.c
cleanups and helper functions, but in reality there are still cases where
lf nodes are manually freed without making use of the freeing helpers.
That's what this patch does: it makes sure all lf freeing places now
leverage the free_logformat_node() helper function that takes care of
freeing all known allocated elements within the node, including the custom
name.
This commit depends on:
- "MINOR: log: add free_logformat_node() helper function"
No backport needed unless 2ed6068 gets backported.
This is a follow up for 24a5e42db6 ("CLEANUP: log: deinitialization of
the log buffer in one function") as there was another opportunity to
make use of the new cleanup function.
A chunked message transferred via zero-copy forwarding in H1 may be
corrupted. This only happens when the chunk size is not known during the
nego stage and when there is nothing to forward when h1_done_ff() is
called. In this case, we always emit a chunk. Because there is nothing to
forward, a 0-CRLF is emitted in the middle of the message.
The issue occurred with the HTTP stats applet only.
A simple fix is to check the size of the data in the iobuf before emitting a
new chunk in h1_done_ff(). However, we still try to send outgoing data
because when this happens, it is most of the time because the H1 output
buffer is almost full.
This patch should fix the issue #2453. No backport needed.
Previously, the msghdr struct used for sendmsg was memset to 0. This was
updated for performance reasons, with each member individually defined.
This is done by the following commit :
commit 107d6d7546
OPTIM: quic: improve slightly qc_snd_buf() internal
msg_flags is the only member left unset, as the sendmsg manual page reports
that it is unused. However, this caused a coverity report. In the end, it is
better to explicitly set it to 0 to avoid any future questions, compiler
warnings or even portability issues.
This should fix coverity report from github issue #2455.
No need to backport unless above patch is.
The to_forward field was added to the debugging output of applets with commit
62a81cb6a ("MINOR: applet: Add callback function to deal with zero-copy
forwarding"), though it's a size_t printed as %lu, which causes complaints
on 32-bit archs. Let's just cast it so that it matches the %lu format.
No backport is needed.
This patch is the direct followup of the previous one :
MINOR: quic: remove sendto() usage variant
This finalizes the qc_snd_buf() simplification by removing the send() syscall
usage for the quic-conn owned socket. Syscall invocation is merged into a
single code location using the sendmsg() variant.
The only difference for the owned socket is that the destination address for
sendmsg() is set to NULL. This usage is documented in man 2 sendmsg as
valid for connected sockets. This allows maximum performance by avoiding
unnecessary lookups on kernel socket address tables.
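For illustration only (this is not the actual qc_snd_buf() code, and the
helper name is made up), the connected-socket case described above boils
down to leaving the destination out of the msghdr:

  #include <stddef.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  /* Sketch: emit <len> bytes from <buf> on an already connected socket.
   * msg_name/msg_namelen are left NULL/0 so the kernel reuses the
   * destination set at connect() time, skipping the address lookup.
   */
  static ssize_t send_on_connected_fd(int fd, const void *buf, size_t len)
  {
          struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
          struct msghdr msg = {
                  .msg_name    = NULL,
                  .msg_namelen = 0,
                  .msg_iov     = &iov,
                  .msg_iovlen  = 1,
                  /* remaining members (incl. msg_flags) stay zero */
          };

          return sendmsg(fd, &msg, 0);
  }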
As the previous patch, no functional change should happen here. However,
it will be simpler to extend qc_snd_buf() for GSO usage.
qc_snd_buf() is a wrapper around emission syscalls. Depending on the QUIC
configuration, a different variant is used. When using a connection
socket, send() is the only one used. For listener sockets, sendmsg() and
sendto() are possible. The first one is used only if the local address has
been retrieved beforehand. This allows to pin it on sending to guarantee the
source address selection. Finally, sendto() is used for systems which do
not support local address retrieval.
All of these variants render the code too complex. As such, this patch
simplifies this by removing sendto() alternative. Now, sendmsg() is
always used for listener sockets. Source address is then specified only
if supported by the system.
This patch should not exhibit functional behavior changes. It will be
useful when implementing GSO as the code is now simpler.
When using a listener socket, the source address for emission is explicitly
set using ancillary data for sendmsg(). This is useful to guarantee the
correct address is used when binding on a non-explicit address.
This code was implemented directly under qc_snd_buf(). However, it is
quite complex due to portability issues. For IPv4, two parallel
implementations coexist, defined under IP_PKTINFO or IP_RECVDSTADDR. For
IPv6, another option is defined under IPV6_RECVPKTINFO. Each variant
uses its own distinct names, which increases the code complexity.
Extract the ancillary data filling into a dedicated function named
cmsg_set_saddr(). This greatly reduces the body of qc_snd_buf(). Such
functions can be replicated when other ancillary data types are
implemented. This will notably be useful for the GSO implementation.
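As an illustration of the idea only (the real cmsg_set_saddr() handles the
IP_RECVDSTADDR and IPV6_RECVPKTINFO variants as well, and its exact
prototype may differ), the IPv4/IP_PKTINFO case can be sketched as:

  #define _GNU_SOURCE /* struct in_pktinfo on glibc */
  #include <netinet/in.h>
  #include <string.h>
  #include <sys/socket.h>

  /* Sketch: force the source address used by sendmsg(). Assumes
   * msg->msg_control points to a buffer of at least
   * CMSG_SPACE(sizeof(struct in_pktinfo)) bytes and msg->msg_controllen
   * was set to that size beforehand.
   */
  static void cmsg_set_saddr_v4(struct msghdr *msg, const struct in_addr *saddr)
  {
  #ifdef IP_PKTINFO
          struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);
          struct in_pktinfo *pktinfo;

          cmsg->cmsg_level = IPPROTO_IP;
          cmsg->cmsg_type  = IP_PKTINFO;
          cmsg->cmsg_len   = CMSG_LEN(sizeof(*pktinfo));
          pktinfo = (struct in_pktinfo *)CMSG_DATA(cmsg);
          memset(pktinfo, 0, sizeof(*pktinfo));
          pktinfo->ipi_spec_dst = *saddr;  /* source address selection */
          msg->msg_controllen = CMSG_SPACE(sizeof(*pktinfo));
  #endif
  }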
qc_snd_buf() is a wrapper for the sendmsg() syscall (or its derivatives)
used for all QUIC emissions. This patch aims at removing several
non-optimal code sections:
* fd_send_ready() for connected sockets is only checked in the function
preamble instead of inside the emission loop
* zeroing the msghdr structure for unconnected sockets is removed. This is
unnecessary as all fields are then properly initialized.
* extra memcpy/memset invocations when using IP_PKTINFO/IPV6_RECVPKTINFO
are removed by setting directly the address value into cmsg buffer
Binding on multiple addresses for QUIC is safe only if IP_PKTINFO or an
equivalent is available. Otherwise, the behavior may be undefined as the
system is responsible for choosing the network interface and source address
of the response.
This commit adds a warning on boot if no or partial support for
IP_PKTINFO or equivalent is detected and configuration contains UDP
binding on multiple addresses.
This should be backported up to 2.6. Special backport recommendations:
* change ha_warning() to ha_diag_warning() to ensure no spurious
warnings will be triggered on stable releases
* IP_PKTINFO usage was introduced in 2.7. For 2.6, multiple-address
QUIC binding is always unreliable. As such, the preprocessor condition
must simply be removed so that the warning is always active regardless
of the system. The warning message should also be truncated to remove the
IP_PKTINFO reference.
Make it so that string array construction is performed by dedicated macro
helpers instead of manual char insertion between string members.
The goal is to easily be able to support multiple forms of array
construction depending on the data encoding format (raw, json..).
Only %hrl and %hsl logformats are concerned.
Some log variables may be prefixed with specific chars that carry extra
information relevant to the value but not directly part of the "raw" value
itself.
e.g.: a '+' char is prepended to some values when "option logasap" is
used, to indicate that the value has not yet reached its final value.
However, as those "metadata" are printed using the general purpose
LOGCHAR() printing helper, it's not easy to tell if they are part of the
base value or not.
In this patch we add the LOGMETACHAR() helper that is a wrapper for
LOGCHAR(). The goal is to prepare for adding some logic to prevent such
additional infos from being generated when not relevant or needed.
Quote building for some log formats is directly performed under each
switch-case statement, so it would become painful to add other conditions
to prevent the quotes from being generated when they are not supported by
the data encoding format, for instance (e.g.: JSON).
Let's centralize and simplify quotes handling by adding LOGQUOTE_START()
and LOGQUOTE_END() helper macros. If a quotation is started and not
explicitly ended, it will be automatically ended at the end of the current
logformat node:
LOGQUOTE_START() sets 'quote' variable to 1, this way LOGQUOTE_END() only
prints the ending quote when needed. LOGQUOTE_END() is systematically
called after each node switch-case (after each value). LOGQUOTE_START()
does nothing if LOG_OPT_QUOTE isn't set, and neither does LOGQUOTE_END().
Some rare cases such as %hsl (list of captured headers) required special
handling: in this case multiple quoted texts are generated for the same
field value so explicit LOGQUOTE_START() + LOGQUOTE_END() combination was
needed.
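As a rough standalone sketch (the actual definitions in log.c may differ;
LOGCHAR() is simplified to putchar() and <options>/<quote> stand for the
locals of the log building loop):

  #include <stdio.h>

  #define LOGCHAR(c)     putchar(c)
  #define LOG_OPT_QUOTE  0x01

  /* open the quote once, only when quoting is enabled */
  #define LOGQUOTE_START() do {                        \
          if ((options & LOG_OPT_QUOTE) && !quote) {   \
                  LOGCHAR('"');                        \
                  quote = 1;                           \
          }                                            \
  } while (0)

  /* close the quote only if one was opened */
  #define LOGQUOTE_END() do {                          \
          if (quote) {                                 \
                  LOGCHAR('"');                        \
                  quote = 0;                           \
          }                                            \
  } while (0)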
last_isspace variable is explicitly set to 0 in all cases except
LOG_FMT_SEPARATOR case. So we can actually simplify the code by setting
last_isspace to 0 by default and skipping the assignment for the
LOG_FMT_SEPARATOR case.
Add the ability to manually specify desired output type after a custom
field name for logformat nodes. Forcing the type can be useful to ensure
value is stored with the proper type representation (e.g.: forcing a
numerical value to string to work around the limited resolution of JS number
types).
By default, type is set to SMP_T_SAME, which means the original type will
be preserved.
Currently supported types are: bool, str, sint
type_to_smp(type) does the reverse operation of smp_to_type[smp]: it takes
a type name as input string and tries to return the corresponding SMP_T_*
smp type or SMP_TYPES if not found.
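A minimal sketch of such a lookup, assuming smp_to_type[] is the existing
array mapping SMP_T_* values to their names, could be:

  #include <string.h>

  /* Sketch: return the SMP_T_* value whose name matches <type>, or
   * SMP_TYPES when no type matches.
   */
  static int type_to_smp(const char *type)
  {
          int it;

          for (it = 0; it < SMP_TYPES; it++)
                  if (strcmp(type, smp_to_type[it]) == 0)
                          return it;    /* found */
          return SMP_TYPES;             /* not found */
  }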
Add the ability to specify custom name (will be used for representation
in verbose output types such as json) to logformat nodes.
For now, a custom name should be composed of characters matching [a-zA-Z0-9-_]*
Transient send errors are handled differently depending on whether a
connection or a listener socket is used for QUIC transfers. In the first
case, proper poller subscription is used via fd_cant_send()/fd_want_send().
For the listener socket case, the error is ignored by the qc_snd_buf()
caller and the retransmission mechanism will allow to re-emit the data.
For the listener socket, transient error code handling is buggy. It blindly
uses fd_cant_send() with the <qc.fd> member, which is set to -1 for listener
socket usage. This results in an invalid fdtab access, with a possible
crash or a modification of a totally unrelated FD.
This bug is simply fixed by using qc_test_fd() before using
fd_cant_send()/fd_want_send(). This ensures <qc.fd> is used only if
initialized which is only the case when using connection socket.
No crash was reported yet for this bug. However, it is reproducible by
using ASAN compilation and the following strace sendmsg() errno command
injection :
# strace -qq -yy -p $(pgrep haproxy) -f -e trace=%network \
-e inject=sendto,sendmsg:error=EAGAIN:when=20+20
This must be backported up to 2.7.
If some data are received for a lua socket while the lua script responsible
for consuming these data is not ready to do so, for instance because it is
sleeping, the applet is woken up in a loop because it never states that it
will not consume these data yet.
To fix the issue, in the applet I/O handler, when there are outgoing data, we
always pretend the applet will not consume them. It is the responsibility of
the lua script to reactivate receives by calling the Socket.receive()
function.
This patch must be backported to every stable version. For 2.4 and older,
si_want_get()/si_cant_get() must be used instead of
applet_will_consume()/applet_wont_consume().
It is possible to create a lua socket without performing any connect. In
this case, the lua socket is released by the garbage collector.
However, the garbage collector does not release the applet, it wakes it
up. Since commit 751b59c40b ("BUG/MEDIUM: hlua: Initialize appctx used by a
lua socket on connect only"), the applet initialization is performed on
connect. So, here, it is possible to wake an uninitialized applet. It is an
unexpected case for the applet's I/O handler, leading to a segfault because
some resources are not initialized (the stream's target in this case).
So, now, in the lua socket GC function, we take care to immediately release
uninitialized applets. At worst, the release itself is delayed. But it is
safe because we are sure the applet's I/O handler will never be executed.
In addition, we take care to increment the GC counter when the lua socket is
created. This way, uninitialized lua sockets are released more quickly.
This patch should fix the issue #2451. It must be backported as far as 2.6.
When an error is triggered during the applet initialization, a dedicated
function is called to release it. Indeed, in this case, because the applet
was not initialized, the ->release callback must not be called. However,
because the init stage may be delayed and performed during the first
applet wakeup, we must also take care not to rely on the default
appctx_free() function, and to immediately release the applet. Otherwise, if
the error happens in a delayed init stage, the applet is never released.
This patch partially fixes the issue #2451. It must be backported as far as
2.6.
Currently haproxy does not implement dynamic table support for QPACK. As
such, the dynamic table capacity advertised via H3 SETTINGS is 0. When
receiving a non-null Set Dynamic Table Capacity instruction, close
immediately the connection using QPACK_ENCODER_STREAM_ERROR.
Prior to this patch, such instructions were simply ignored. This does not
conform to the QUIC specification.
This should be backported up to 2.6. Note that on 2.6 qcc_set_error()
must be replaced by function qcc_emit_cc_app().
Close the connection using QPACK_DECODER_STREAM_ERROR when receiving an
invalid insert count increment. As haproxy does not use dynamic table,
this instruction must never be emitted by the peer.
Prior to this patch, haproxy silently ignored such an instruction, which does
not conform to the QUIC specification.
This should be backported up to 2.6. Note that on 2.6 qcc_set_error()
must be replaced by function qcc_emit_cc_app().
As specified in RFC 9000, a client must never emit a HANDSHAKE_DONE
frame. If this happens, the server must close the connection with error
PROTOCOL VIOLATION.
Previously, such a frame was silently discarded on the server side. The
connection remained open, which does not conform to the specification.
This should be backported up to 2.6.
Ensure every frame type is handled in qc_parse_pkt_frms(). Add an
ABORT_NOW on the default case. This is safe as an unknown frame must have
been rejected earlier via qc_parse_frm().
As specified by RFC 9000, connection is closed on error if an unknown
QUIC frame type is received.
Previously, a frame with an unknown type was silently discarded. The
connection remained open, which does not conform to the specification.
This should be backported up to 2.6.
Global options to disable zero-copy forwarding are now tested outside the
callbacks responsible for performing the forwarding itself. It is cleaner
this way because we don't try zero-copy forwarding at all if at least one
side does not support it. It is equivalent to what was performed before, but
it is simpler this way.
There is a nego stage when a producer is ready to forward data to the other
side. At this stage, the zero-copy forwarding may be disabled if the
consumer does not support it. However, there is a flaw with this way of
proceeding. If the channel buffer is not empty, we delay the zero-copy
forwarding to flush all data from the channel first. During this delay,
receives on the endpoint (at the connection level for muxes) are blocked to
be sure to have the opportunity to switch to zero-copy forwarding. It is a
problem if the consumer cannot flush data from the channel's buffer, waiting
for more data for instance.
It is especially annoying with the CLI applet, because this scenario can
happen if a command is partially received. For instance without the LF at
the end. In this case, the CLI applet is blocked because it waits more
data. The frontend connection is also blocked because the channel's data
must be flushed before trying to receive more data. Worse, this happens at a
point where no timeout is armed. Thus the session is stuck indefinitely,
client aborts cannot be detected because receives are blocked, and the
applet cannot abort on its side because there are pending outgoing data. It
is clearly a situation where it is easy to consume all CLI slots.
To fix the issue, thanks to previous commits, we now check zero-copy
forwarding support on both sides before proceeding.
This patch relies on the following commits:
* MINOR: muxes: Announce support for zero-copy forwarding on consumer side
* MINOR: stconn: Add SE flag to announce zero-copy forwarding on consumer side
* MINOR: stconn: Rename SE_FL_MAY_FASTFWD and reorder bitfield
* CLEANUP: stconn: Move SE flags set by app layer at the end of the bitfield
All the series must be backported to 2.9.
It is unused for now, but the muxes announce their support of zero-copy
forwarding on the consumer side. All muxes, except the FCGI one, support
it.
To fix a bug, a flag to announce the capability to support zero-copy
forwarding on the consumer side will be added on the SE descriptor. So the
old flag SE_FL_MAY_FASTFWD is renamed to indicate it concerns the producer
side. It is now SE_FL_MAY_FASTFWD_PROD. And to prepare the addition of the
new flag, the bitfield is a bit reordered.
This reverts commits 0b93ff8c87 ("BUG/MEDIUM: stconn: Wake applets on
sending path if there is a pending shutdown") and 9e394d34e0 ("BUG/MINOR:
stconn: Don't report blocked sends during connection establishment") because
they were not the right fixes.
We must not wake an applet up when a shutdown is pending because it means
some output data are still blocked in the channel buffer. The applet does
not necessarily consume these data. In this case, the applet may be woken up
indefinitely, unless it explicitly reports that it won't consume data yet.
This patch must be backported as far as 2.8. For older versions, as far as
2.2, it may be backported. If so, a previous fix must be pushed to prevent
an HTTP applet from being stuck. In http_ana.c, in http_end_request() and
http_end_response(), the call to channel_htx_truncate() on the request
channel in case of MSG_ERROR must be replaced by a call to
channel_htx_erase().
Thanks to the previous patch ("MEDIUM: applet: Add notion of shutdown for
write for applets"), it is no longer necessary to check SC flags to detect
shutdowns to interrupt the wait command. It is possible to remove this ugly
workaround. In addition, we only test the SE for shutdown because end of
stream and error are already checked by the CLI I/O handler. And it is no
longer necessary to remove output data from the channel's buffer because
shutdowns are not reported if there are remaining outgoing data.
Of course, if the "wait" command is backported, the commit above and this
one must be backported too.
In fact there are already flags on the SE to state that a shutdown for reads
or writes was performed. But for applets, this notion does not exist. Both
flags are set at the same time when the applet is released. But at the SC
level, there are functions to perform a shutdown (formerly the shutw) and an
abort (formerly the shutr). For applets, when a shutdown is performed on the
SC, if the applet is not immediately released, nothing is acknowledged at
the SE level.
With the old way to implement applets, this was not a real issue until
recently because applets accessed the channel/SC flags. It was thus possible
to catch the shutdowns. But the "wait" command on the CLI reveals the
flaw. Indeed, when this command is executed, nothing is read or sent. So, it
is not possible to detect the shutdowns. As a workaround, a dedicated test
on the SC flags was added at the end of the wait command I/O handler. But it
is pretty ugly.
With the new way to implement applets, there is no longer access to the
channel or the SC. So we must add a way to acknowledge shutdowns in the SE.
This patch solves both sides of the issue. The shutw notion is added for
applets. Its only purpose is to set the SE_FL_SHWN flag. This flag is tested
by all applets, so it solves the issue quite simply.
Note that it is described as a bug fix but there is no real issue, just a
design flaw. However, if the "wait" command is backported, this patch must
be backported too. Unfortunately it will require an adaptation because there
are no appctx flags on older versions.
Both of these flags are set after releasing the applet, in
appctx_shut(). Concretely, it means the applet is shut down for reads and
writes. Once they are set, the applet's I/O handler is no longer called.
Tests on these flags are thus useless: there is no chance to match them.
This case does not exist yet with the H1 multiplexer, but applets may decide to
not produce data if there is not enough room in the destination buffer (the
applet's outbuf or the opposite SE buffer). It is true for the stats applets for
instance. However this case is not properly handled when the zero-copy
forwarding is in-use.
To fix the issue, the se_done_ff() function was modified to return the
number of bytes really forwarded and to subscribe for sends if nothing was
forwarded while the zero-copy forwarding was blocked by the producer. On the
applet side, we take care to block the zero-copy forwarding if the applet
requests more room. At the end, zero-copy forwarding is unblocked if
something was forwarded.
This way, it is now possible for the stats applet to report a full buffer and
block the zero-copy forwarding, even if the buffer is not really full, by
requesting more room.
No backport needed.
An issue was introduced when zero-copy forwarding was added to the stats and
cache applets. There is no test to be sure the upper layer is ready to use
the zero-copy forwarding. So these applets refuse to deliver the response
into the applet's output buffer if the zero-copy forwarding is supported by
the opposite endpoint. It is especially an issue when a filter, like the
compression, is in-use on the response channel.
Because of this bug, the response is not delivered and the applet is woken
up in loop to produce data.
To fix the issue, an appctx flag was added, APPCTX_FL_FASTFWD, to know when
the zero-copy forwarding is in-use. We rely on this flag to not fill the
outbuf in the applet's I/O handler.
No backport needed.
This simplifies a bit the stats applet. Because the CLI part was not
refactored yet to use the applet's buffers, there are 3 ways to produce
data:
* the HTX message for the HTTP stats when zero-copy forwarding is not
used
* raw data in the opposite endpoint buffer for the HTTP stats when
zero-copy forwarding is used
* the channel buffer when the CLI "show stat" command is evaluated
There is already a dedicated function taking care of copying data at the
right place. There is now also a dedicated function to check whether the
output buffer is almost full.
Commit 91b77c1632 ("MEDIUM: mux-h1: Support zero-copy forwarding for chunks with
an unknown size") was recently pushed but it contains 3 bugs. The first one is
during the nego. The extra size reserved for the CRLF at the end of the chunk
must not be added to the offset value. Indeed, the CRLF will be appended after
the data and not prepended to them.
The second one, still during the nego, is an integer overflow when the available
room in the output buffer is computed.
Finally, the last one is when the chunk itself is formatted. This part was
totally buggy if the output buffer was not empty at the beginning.
No backport needed.
A packet is considered as reordered when it is detected as lost because its
packet number is above the largest acknowledged packet number by at least
the packet reordering threshold value.
Add a new ->nb_reordered_pkt quic_loss struct member, at the same location
as the number of lost packets, to count such packets.
Should be backported to 2.6.
Let's say that the largest packet number acknowledged by the peer is #10.
When inspecting the not yet acknowledged packets to detect whether they are
lost, a packet is deemed lost at least when the difference between this
largest packet number and its packet number is greater than or equal to the
packet reordering threshold as defined by RFC 9002. The latter must not be
less than QUIC_LOSS_PACKET_THRESHOLD (3). With such a value, packets #7 and
older are detected as lost if not acknowledged, contrary to packets #8 or
#9.
So, the packet loss detection is very sensitive to such a network
characteristic where non-acknowledged packets are distant from each other by
their packet number differences.
Do not use this static value anymore for the packet reordering threshold
which is used as a criterion to detect packet loss. Instead, make it depend
on the difference between the number of the last transmitted packet and the
number of the oldest one among the packets which are still in flight, before
they are inspected to be deemed as lost.
Add a new tune.quic.reorder-ratio setting to apply a ratio in percent to
this dynamic packet reordering threshold.
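A hedged sketch of the resulting computation (variable and setting names
are illustrative, not the actual quic_loss code):

  #include <stdint.h>

  #define QUIC_LOSS_PACKET_THRESHOLD 3

  /* Sketch: derive the reordering threshold from the distance between the
   * newest transmitted packet number and the oldest one still in flight,
   * scaled by tune.quic.reorder-ratio (in percent), never below the
   * RFC 9002 minimum. A packet is then deemed lost when
   * largest_acked - pn >= threshold.
   */
  static uint64_t pkt_reorder_threshold(uint64_t last_tx_pn,
                                        uint64_t oldest_inflight_pn,
                                        unsigned int reorder_ratio)
  {
          uint64_t thres = (last_tx_pn - oldest_inflight_pn) * reorder_ratio / 100;

          return thres < QUIC_LOSS_PACKET_THRESHOLD ?
                 QUIC_LOSS_PACKET_THRESHOLD : thres;
  }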
Should be backported to 2.6.
The new formula for K CUBIC which arrives with RFC 9438 is as follows:
K = cubic_root((W_max - cwnd_epoch) / C)
Note that W_max is c->last_w_max, and cwnd_epoch is c->cwnd when entering
quic_cubic_update() just after a congestion event.
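For clarity, a standalone sketch of this computation, with K computed in
seconds and only converted to milliseconds at the end (names follow the
commit message, not necessarily the actual code):

  #include <math.h>

  /* K = cubic_root((W_max - cwnd_epoch) / C), with W_max and cwnd_epoch
   * in bytes and C in bytes/s^3; the result is returned in milliseconds.
   */
  static unsigned int cubic_K_ms(double W_max, double cwnd_epoch, double C)
  {
          double K = cbrt((W_max - cwnd_epoch) / C);  /* seconds */

          return (unsigned int)(K * 1000.0);          /* milliseconds */
  }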
Must be backported as far as 2.6.
The formula for K CUBIC calculation is as follows:
K = cubic_root(W_max * (1 - beta_quic) / C).
Note that this does not match the comment. But the aim of this patch is to not
hide a bug inside another patch to update this K CUBIC calculation.
The unit of C is bytes/s^3 (or segments/s^3). And we want to store K as
milliseconds. So, the conversion inside the cubic_root() to convert seconds in
milliseconds is wrong. The unit used here is bytes/(ms/1000)^3 or
bytes*1000^3/ms^3. That said, it is preferable to compute K as seconds, then
convert to milliseconds as done by this patch.
Must be backported as far as 2.6.
The CLI command "update ssl ocsp-response" was forcefully removing an
OCSP response from the update tree regardless of whether it used to be
in it beforehand or not. But since the main OCSP update task works by
removing the entry being currently updated from the update tree and then
reinserting it when the update process is over, it meant that in the CLI
command code we were modifying a structure that was already being used.
These concurrent accesses were not properly locked on the "regular"
update case because it was assumed that once an entry was removed from
the update tree, the update task was the only one able to work on it.
Rather than locking the whole update process, an "updating" flag was
added to the certificate_ocsp in order to prevent the "update ssl
ocsp-response" command from trying to update a response already being
updated.
An easy way to reproduce this crash was to perform two "simultaneous"
calls to "update ssl ocsp-response" on the same certificate. It would
then crash on an eb64_delete call in the main ocsp update task function.
This patch can be backported up to 2.8.
As reported by github user @JB0925 in issue #2427, there is a possible
crash in pool_flush(). The problem is that if the free_list is not empty
in the first test, and is empty at the moment the xchg() is performed,
for example because another thread called it in parallel, we place a
POOL_BUSY there that is never removed later, causing the next thread to
wait forever.
This was introduced in 2.5 with commit 2a4523f6f ("BUG/MAJOR: pools: fix
possible race with free() in the lockless variant"). It has probably
very rarely been detected, because:
- pool_flush() is only called when stopping is set
- the function does nothing if global pools are disabled, which is
the case on most modern systems with a fast memory allocator.
It's possible to reproduce it by modifying __task_free() to call
pool_flush() on 1% of the calls instead of only when stopping.
The fix is quite simple, it consists in moving the zeroing of the
entry in the break path after verifying that the entry was not already
busy.
This must be backported wherever commit 2a4523f6f is.
In issue #2427 Ilya reports that gcc-14 rightfully complains about
sizeof() being placed in the left term of calloc(). There's no impact
but it's a bad pattern that gets copy-pasted over time. Let's fix the
few remaining occurrences (debug.c, halog, udp-perturb).
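Purely for illustration, the pattern in question and its fix look like
this (hypothetical variable names):

  /* before: sizeof() sits in the element-count slot */
  entries = calloc(sizeof(*entries), nb_entries);

  /* after: count first, element size second, as calloc() expects */
  entries = calloc(nb_entries, sizeof(*entries));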
This can be backported to all branches, and the irrelevant parts dropped.
The "wait" command now supports a condition, "srv-unused", which waits
for the designated server to become totally unused, indicating that it
is removable. Upon each wakeup it calls srv_check_for_deletion() to
verify whether the conditions are met and, if not, whether the situation is
recoverable or not, and proceeds accordingly, never waiting for a
final decision longer than the configured delay.
The purpose is to make it possible to remove servers from the CLI after
waiting for their sessions to be terminated:
$ socat -t5 /path/to/socket - <<< "
disable server px/srv1
shutdown sessions server px/srv1
wait 2s srv-unused px/srv1
del server px/srv1"
Or even wait for connections to terminate themselves:
$ socat -t70 /path/to/socket - <<< "
disable server px/srv1
wait 1m srv-unused px/srv1
del server px/srv1"
Conditions will need to have context, arguments etc from the command line.
Since these will vary with time (otherwise we wouldn't wait), let's just
pass them as text (possibly pre-processed). We're starting with 4 strings
that are expected to be allocated by strdup() and are always sent to free()
upon release.
Since we'll support waiting for an action to succeed or permanently
fail, we need the ability to return an unrecoverable failure. Let's
add CLI_WAIT_ERR_FAIL for this. A static error message may be placed
into ctx->msg to report to the user why the failure is unrecoverable.
We'll need to be able to verify whether or not a server may be deleted.
For now, both the verification and the action are performed in the same
function, at once under thread isolation. The goal here is to extract
the verification code into a new function that will perform these checks,
return a status between success/recoverable/non-recoverable failure, and
will also return a message for the caller.
When an applet is using its own buffers, it is important to release them, if
empty, after processing to recycle unused buffers. It is not a leak because
these buffers are necessarily released when the applet is released. But this
leads to an excess of buffer allocations.
No need to backport.
This allows to insert delays between commands, e.g. to collect the same
set of metrics at a fixed interval. E.g:
$ socat -t20 /path/to/socket <<< "show activity; wait 10s; show activity"
The goal will be to extend the feature to optionally support waiting on
certain conditions. For this reason the struct definitions and enums were
placed into cli-t.h.
The CLI applet doesn't make use of its timeout at all, only the stream
does. That's convenient because it allows any command's I/O handler to
trivially set a wakeup timer by simply touching the task's ->expire
field, and the I/O handler will automatically be woken up again. The
only condition for this is that we properly take care of clearing that
timeout whenever we finish processing a command and switch back to the
PROMPT state. That's what this patch does.
If a release handler produces a final message, it's currently left
pending in the CLI context and needs another I/O event to be dumped
because immediately after calling ->release, we check for states
OUTPUT and above and we wait until more data arrives.
This patch adds a continue statement to go back to the loop immediately
after leaving the release handler in order to attempt to emit the
output message.
At this point it's not sure whether any release handlers are producing
messages, so it's probably not needed to backport this.
Some commands are still missing their trailing LF, and very few were even
already spotted in the past emitting more than one. The risk of missing
this LF is particularly high, especially when tests are run in non-
interactive mode where the output looks good at first glance. The problem
is that once run in interactive mode, the missing empty line makes the
command not being complete, and scripts can wait forever.
Let's tackle the problem at its root: messages emitted at the end must
always end with an LF and we know some miss it. Thus, in cli_output_msg()
we now start by removing the trailing LFs from the string, and we always
add exactly one. This way the trailing LFs added by correct functions are
silently ignored and all functions are now correct.
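The normalization amounts to something like the following standalone
sketch (not the actual cli_output_msg() body):

  #include <string.h>

  /* Strip any trailing LFs from <msg> in place and return the new length;
   * the caller then emits the message followed by exactly one '\n'.
   */
  static size_t cli_trim_lf(char *msg)
  {
          size_t len = strlen(msg);

          while (len && msg[len - 1] == '\n')
                  msg[--len] = 0;
          return len;
  }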
This would need to be progressively backported to all supported versions
in order to address them all at once, though the risk of breaking a legacy
script relying on the wrong output is never zero. At first it should at
least go as far as the latest LTS (2.8), and maybe another one depending
on user demands. Note that it also requires previous patch ("BUG/MINOR:
vars/cli: fix missing LF after "get var" output") because it fixes a test
for a bogus output for "get var" in a VTC.
"get var" on the CLI was also missing an LF, and the vtest as well, so
that fixing only the code breaks the vtest. This must be backported to
2.4 as the issue was brought with commit c35eb38f1d ("MINOR: vars/cli:
add a "get var" CLI command to retrieve global variables").
Some cli_err(), cli_msg() or even ha_error() etc are missing the trailing
LF, which breaks the continuity of the CLI parsing: the extra LF that serves
to mark the end of the command is in fact taken as the missing LF and no
extra one is added.
This patch adds the missing LF on identified messages. It might be worth
trying to proceed in a more generic way with this, given the amount of
code that is possibly at risk.
We now update the session's tracked counters with the observed glitches.
In order to avoid incurring a high cost, e.g. if many small frames contain
issues, we batch the updates around h2_process_demux() by directly passing
the difference. Indeed, for now all functions that increment glitches are
called from h2_process_demux(). If that were to change, we'd just need to
keep the value of the last synced counter in the h2c struct instead of the
stack.
The regtest was updated to verify that the 3rd client that does not cause
issue still sees the counter resulting from client 2's mistakes. The rate
is also verified, considering it shouldn't fail since the period is very
long (1m).
This adds a new pair of stored types in the stick-tables:
- glitch_cnt
- glitch_rate
These keep count of the number of glitches reported on a front connection,
in order to decide how to act with a badly defective client or a potential
attacker. For now nothing updates these counters, but all the infrastructure
needed to configure, update and retrieve them was added, including the doc.
No regtest was added yet since they're not filled yet.
It's quite uncommon for a client to decide to change the connection's
initial window size after the settings exchange phase, unless it tries
to increase it. One of the impacts is that it updates all
streams, so it can be expensive, depending on the stacks, and may even
be used to construct an attack. For this reason, we now count a glitch
when this happens.
A test with h2spec shows that it triggers 9 across a full test.
Here we consider that if a HEADERS frame is made of more than 4 fragments
whose average size is lower than 1kB, that's very likely an abuse so we
count a glitch per 16 fragments, which means 1 glitch per 1kB frame in a
16kB buffer. This means that an abuser sending 1600 1-byte frames would
increase the counter by 100, and that sending 100 headers per request in
individual frames each results in a count of ~7 to be added per request.
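An illustrative sketch of this heuristic (not the actual mux-h2 code, and
the rounding is an assumption chosen to match the figures above):

  #include <stddef.h>

  /* Count glitches for a HEADERS frame received as <frags> fragments
   * totalling <bytes> bytes: more than 4 fragments averaging less than
   * 1kB each yields one glitch per 16 fragments (1600 one-byte fragments
   * => 100, 101 fragments => 7).
   */
  static unsigned int headers_frag_glitches(unsigned int frags, size_t bytes)
  {
          if (frags > 4 && bytes / frags < 1024)
                  return (frags + 15) / 16;
          return 0;
  }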
A test consisting in sending 100M requests made of 101 frames each over
a connection resulted in ~695M glitches to be counted for this connection.
Note that no special care is taken to avoid wrapping since it already takes
a very long time to reach 100M and there's no particular impact of wrapping
here (roughly 1M/s).
RFC9113 clarified a point regarding the payload from DATA frames sent to
closed streams. It must always be counted against the connection's flow
control. In practice it should have no visible effect, but if
repeated upload attempts are aborted, this might cause the client's
window to progressively shrink since that data is not being ACKed.
It's probably not necessary to backport this, unless another patch
depends on it.
The fetch will return true if the stream was redispatched: this is a
past action, thus we rename the fetch to better reflect its true
meaning and prevent confusions.
Documentation was updated.
While at it, the fetch was moved from internal states section to Layer 4
section, which is where it belongs.
No backport needed unless 92b2edb ("MINOR: stream: add "txn.redispatch"
fetch") gets backported.
If a certificate that has an OCSP uri is unused and gets added to a
crt-list with the ocsp auto update option "on", it would not have been
inserted into the auto update tree because this insertion was only
working on the first call of the ssl_sock_load_ocsp function.
If the configuration used a crt-list like the following:
cert1.pem *
cert2.pem [ocsp-update on] *
Then calling "del ssl crt-list" on the second line and then reverting
the delete by calling "add ssl crt-list" with the same line, then the
cert2.pem would not appear in the ocsp update list (can be checked
thanks to "show ssl ocsp-updates" command).
This patch ensures that in such a case we still perform the insertion in
the update tree.
This patch can be backported up to branch 2.8.
The ckch_store's free'ing function might end up calling
'ssl_sock_free_ocsp' if the corresponding certificate had ocsp data.
This ocsp cleanup function expects the 'refcount_instance' member of
the certificate_ocsp structure to be 0, meaning that no live
ckch instance kept a reference on this certificate_ocsp structure.
But since in ckch_store_free() we were destroying the ckch_data before
destroying the linked instances, the BUG_ON would trigger during a standard
deinit. Reversing the cleanup order fixes the problem.
Must be backported to 2.8.
With the current way OCSP responses are stored, a single OCSP response
is stored (in a certificate_ocsp structure) when it is loaded during a
certificate parsing, and each ckch_inst that references it increments
its refcount. The reference to the certificate_ocsp is actually kept in
the SSL_CTX linked to each ckch_inst, in an ex_data entry that gets
freed when the context is freed.
One of the downsides of this implementation is that if every ckch_inst
referencing a certificate_ocsp gets destroyed, then the OCSP response is
removed from the system. So if we were to remove all crt-list lines
containing a given certificate (that has an OCSP response), the response
would be destroyed even if the certificate remains in the system (as an
unused certificate). In such a case, we would want the OCSP response not
to be "usable", since it is not used by any ckch_inst, but still remain
in the OCSP response tree so that if the certificate gets reused (via an
"add ssl crt-list" command for instance), its OCSP response is still
known as well. But we would also like such an entry not to be updated
automatically anymore once no instance uses it. An easy way to do it
could have been to keep a reference to the certificate_ocsp structure in
the ckch_store as well, on top of all the ones in the ckch_instances,
and to remove the ocsp response from the update tree once the refcount
falls to 1, but it would not work because of the way the ocsp response
tree keys are calculated. They are decorrelated from the ckch_store and
are the actual OCSP_CERTIDs, which is a combination of the issuer's name
hash and key hash, and the certificate's serial number. So two copies of
the same certificate but with different names would still point to the
same ocsp response tree entry.
The solution that answers all the needs expressed above is actually
to have two reference counters in the certificate_ocsp structure, one
for the actual ckch instances and one for the ckch stores. If the
instance refcount becomes 0 then we remove the entry from the auto
update tree, and if the store reference becomes 0 we can then remove the
OCSP response from the tree. This would allow to chain some "del ssl
crt-list" and "add ssl crt-list" CLI commands without losing any
functionality.
Must be backported to 2.8.
When deleting a crt-list line through a "del ssl crt-list" call on the
CLI, we ended up freeing the corresponding ckch instances without fully
clearing their contents. This left some dangling references on other
objects because the attached SSL_CTX was not deleted, as well as all the
ex_data referenced by it (OCSP responses for instance).
This patch can be backported up to branch 2.4.
The only useful information taken out of the ckch_store in order to copy
an OCSP certid into a buffer (later used as a key for entries in the
OCSP response tree) is the ocsp_certid field of the ckch_data structure.
We then don't need to pass a pointer to the full ckch_store to
ckch_store_build_certid or even any information related to the store
itself.
The ckch_store_build_certid is then converted into a helper function
that simply takes an OCSP_CERTID and converts it into a char buffer.
When calling ckchs_dup (during a "set ssl cert" CLI command), if the
modified store had OCSP auto update enabled then the new certificate
would not keep the previous update mode and would not appear in the auto
update list.
This patch can be backported to 2.8.
At the beginning of the 3.0-dev cycle, the zero-copy forwarding support was
added only for the cache applet with an option to disable it. This was a
hack, waiting for a better integration with applets. It is now possible to
implement the zero-copy forwarding for any applets. So the specific option
for the cache applet was renamed to be used for all applets. And this option
is now also checked for the stats applet.
Concretely, 'tune.cache.zero-copy-forwarding' was renamed to
'tune.applet.zero-copy-forwarding'.
This field was introduced when the first implementation of the zero-copy
forwarding was added. It is now useless. However, we must still save the
body-size of the object in the cache.
Default .rcv_buf and .snd_buf functions that applets can use are now
specialized to manipulate raw buffers or HTX buffers.
Thus a TCP applet should use appctx_raw_rcv_buf() and appctx_raw_snd_buf()
while HTTP applet should use appctx_htx_rcv_buf() and appctx_htx_snd_buf().
Note that the appctx is now directly passed to these functions instead of
the SC.
Just like for the cache applet, it is now possible to send response to the
opposite side using the zero-copy forwarding. Internal functions were
slightly updated but there is nothing special to say. Except the requested
size during the nego stage is not exact.
Till now, for chunked messages, the H1 mux used the size requested during
the zero-copy forwarding negotiation as the chunk size. And till now, this
was accurate because the requested size was indeed the chunk size on the
producer side.
But this will be a problem to implement the zero-copy forwarding on some
applets because the content size is not known during the nego but only when
it is produced. Thanks to previous patches, it is now possible to know the
requested size is not exact and we are able to reserve a larger space to
write the chunk size later, in h1_done_ff(), with some padding.
Now, during the zero-copy forwarding negotiation, when the requested size is
exact, we are able to check whether it is bigger than the expected one or
not. If it is indeed bigger than expected, the zero-copy forwarding is
disabled and the error will be triggered later on the normal sending path.
It is now possible to use a flag during zero-copy forwarding negotiation to
specify that the requested size is exact, which means the producer really
expects at least this amount of data to be received.
It can be used by consumer to prepare some processing at this stage, based
on the requested size. For instance, in the H1 mux, it is used to write the
next chunk size.
It is now possible to impose the length used to represent the chunk size in
the function that prepends the chunk size in a buffer (so before the chunk
itself). It is thus possible to reserve a specific space for an unknown
chunk size and to pad it with leading '0' characters to use all the space
and avoid holes.
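As an illustration of the idea (not the actual H1 helper), reserving a
fixed-width chunk-size field padded with leading zeroes can be done like
this:

  #include <stddef.h>
  #include <stdio.h>

  /* Write the chunk header "<size>CRLF" using exactly <width> hex digits,
   * left-padded with '0', so a size learned after the data was produced
   * still fits the room reserved during the nego stage.
   */
  static int h1_write_chunk_size(char *out, size_t outlen,
                                 size_t chunk_size, int width)
  {
          return snprintf(out, outlen, "%0*zx\r\n", width, chunk_size);
  }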
During the zero-copy forwarding negotiation, a pseudo flag was already used
to notify the consumer whether the producer is able to use kernel splicing or
not. But this was not extensible. So, now we use a true bitfield to be able
to pass flags during the negotiation. NEGO_FF_FL_* flags may be used now.
Of course, for now, there is only one flag, the kernel splicing support on
the producer side (NEGO_FF_FL_MAY_SPLICE).
The cache applet will be refactored to use its own buffer. Thus, for now,
the zero-copy forwarding support is removed and it will be reintroduced
later.
The HTTP stats applet and all internal functions were adapted to use its own
buffers instead of the channel ones. The CLI part was not refactored yet,
thus there are still some accesses to channels in the file. But for the HTTP
part, we no longer use the channels at all.
To do so, the HTTP stats applet now uses default .rcv_buf and .snd_buf
callback function. In addition, it sets appctx flags instead of SE ones.
We no longer test the opposite stream-connector to detect aborted partial
posts. Applets must not try to access info outside their scope: this makes
the code more sensitive to changes and it is a common source of bugs. Tests
on the sedesc flags at the beginning of the I/O handler should be
enough.
Thanks to this patch, it is possible for an applet to directly send data to
the opposite endpoint. To do so, it must implement the <fastfwd> appctx
callback function and set the SE_FL_MAY_FASTFWD flag.
Everything will be handled by the appctx_fastfwd() function. The applet is
only responsible for transferring data. If it sets the <to_forward> value,
it is used to limit the amount of data to forward.
This patch introduces the support for the callback function responsible to
produce data via the zero-copy forwarding mechanism. There is no
implementation for now. But a <to_forward> field was added to the appctx
structure to let an applet state how much data it wants to forward. It is
not mandatory but it will be used during the zero-copy forwarding
negotiation.
We have introduced flags to deal with end of input, end of stream and errors
at the applet level. With this patch we make the link with the endpoint
descriptor.
In appctx_rcv_buf(), applet flags are converted to SE flags.
There is no shutdown for reads and sends with applets. Both are performed
when the appctx is released. So instead of 2 flags, like for
muxes/connections, only one flag is used. But the idea is the same:
acknowledge the event at the applet level.
The appctx state was never really used as a state. It is only used to know
when an applet should be freed on the next wakeup. This can be converted to
a flag and the state can be removed. This is what this patch does.
Till now, we've extended the appctx state to add some flags. However, the
field name is misleading. So a bitfield was added to handle real flags. And
helper functions to manipulate this bitfield were added.
A dedicated function to run applets was introduced, in addition to the old
one, to deal with applets that use their own buffers. The main difference
here is that this handler does not use channels at all. It performs a
synchronous send before calling the applet and a synchronous
receive just after.
No applets are plugged on this handler for now.
There is no tasklet to handle I/O subscriptions for applets, but functions
to deal with receives and sends from the SC layer were added. It means a
function to retrieve data from an applet in a synchronous way and a
function to push data to an applet in a synchronous way.
It is pretty similar to the functions used for muxes but there are some
differences. So for now, we keep them separated.
Zero-copy forwarding is not supported for now. In addition, there is no
subscription mechanism.
In this patch, we add default functions to copy data from a channel to the
<inbuf> buffer of an applet (appctx_rcv_buf) and another one to copy data
from the <outbuf> buffer of an applet to a channel (appctx_snd_buf).
These functions are not used for now, but they will be used by applets to
define their <rcv_buf> and <snd_buf> callback functions. Of course, it will
be possible for a specific applet to implement its own functions but these
ones should be good enough for most applets. HTX and RAW buffers are
supported.
It is the first patch of a series aimed to align applets on connections.
Here, dedicated buffers are added for applets. For now, buffers are
initialized and helper functions to deal with allocation are added. In
addition, flags to report allocation failures or full buffers are also
introduced. <inbuf> will be used to push data to the applet from the stream
and <outbuf> will be used to push data from the applet to the stream.
sc_attach_applet() was changed to be able to fail and callers were updated
accordingly. For now it cannot fail but if this changes, callers will be
prepared to handle errors.
In sc_attach_applet, an untyped pointer (void *) was used to attach a SC on
an applet. There is no reason to not use the right type here. So now a
pointer on an appctx is explicitly used.
Avoid loss of precision when computing the K CUBIC value.
The same issue exists when computing the congestion window value from the
cubic increase function formula, with possible integer variable wrap-around.
Depends on this commit:
MINOR: quic: Code clarifications for QUIC CUBIC (RFC 9438)
Must be backported as far as 2.6.
The first version of our QUIC CUBIC implementation is confusing because it
relies on the TCP CUBIC Linux kernel implementation and refers to RFC 8312,
which was obsoleted by RFC 9438 (August 2023) after our implementation. RFC
8312 is a little bit hard to understand. RFC 9438 arrived with many more
clarifications.
So, RFC 9438 is about "CUBIC for Fast Long-Distance Networks". Our
implementation for QUIC is not very well documented. As it was difficult to
reread this code, this patch adds only some comments at complicated
locations and renames some macros and variables without any logic
modification at all.
So, the aim of this patch is to first add some comments and variable/macro
renamings to avoid embedding too many code modifications in the same big
patch.
Some code modifications will come to adapt this CUBIC implementation to this new
RFC 9438.
Rename some macros:
CUBIC_BETA -> CUBIC_BETA_SCALED
CUBIC_C -> CUBIC_C_SCALED
CUBIC_BETA_SCALE_SHIFT -> CUBIC_SCALE_FACTOR_SHIFT (this is the scaling factor
which is used only for CUBIC_BETA_SCALED)
CUBIC_DIFF_TIME_LIMIT -> CUBIC_TIME_LIMIT
CUBIC_ONE_SCALED was added (scaled value of 1).
These cubic struct members were renamed:
->tcp_wnd -> ->W_est
->origin_point -> ->W_target
->epoch_start -> ->t_epoch
->remaining_tcp_inc -> remaining_W_est_inc
Local variables to quic_cubic_update() were renamed:
t -> elapsed_time
diff -> t
delta -> W_cubic_t
Add a graphic curve of the CUBIC increase function.
Add large RFC 9438 extracts, copied and pasted, in relation with the 3
different increase function regions.
Same thing for the fast convergence.
Fix a typo in the reference to QUIC RFC 9002.
Must be backported as far as 2.6 to ease any further modifications to come.
If we were to enable 'ocsp-update' on a certificate that does not have
an OCSP URI, we would exit ssl_sock_load_ocsp with a negative error code
which would raise a misleading error message ("<cert> has an OCSP URI
and OCSP auto-update is set to 'on' ..."). This patch simply fixes the
error message but an error is still raised.
This issue was raised in GitHub #2432.
It can be backported up to branch 2.8.
Fetch will return true if the stream underwent a redispatch according to
"option redispatch" setting upon retries.
Documentation was added, and the "%rc" logformat alternative now mentions
the new fetch to properly emulate the logformat behavior.
This build issue was introduced by this previous commit, which is a bugfix:
BUG/MINOR: quic: Wrong ack ranges handling when reaching the limit.
A BUG_ON() referenced <fist> variable in place of <first>.
Must be backported as far as 2.6 as the previous commit.
Acknowledgement ranges are used to build ACK frames. To avoid allocating too
many such objects, a limit was set to 32 (QUIC_MAX_ACK_RANGES) by this
commit:
MINOR: quic: Do not allocate too much ack ranges
But there is an inversion when removing the oldest range from its tree.
eb64_first() must be used in place of eb64_last(). Note that this patch
only does this modification, in addition to renaming the <last> variable to
<first>.
This bug leads such an h2load command to block when a request ends up not
being acknowledged by haproxy even though it was correctly served:
/opt/nghttp2/build/bin/h2load --alpn-list h3 -t 1 -c 1 -m 1 -n 100 \
https://127.0.0.1/?s=5m
There is a remaining question to be answered. In such a case, haproxy
refuses to reopen the stream, which is a good thing, but shouldn't haproxy
acknowledge the request (because it was correctly parsed again)?
Note that to be easily reproduced, this setting had to be applied to the client
network interface:
tc qdisc add dev eth1 root netem delay 100ms 1s loss random
Must be backported as far as 2.6.
As noticed in this thread, some bogus configurations are not always easy
to spot: https://www.mail-archive.com/haproxy@formilux.org/msg44558.html
Here it was about config keywords being used in ACL patterns where strings
were expected, hence they're always valid.
Since we have the diag mode (-dD) we can perform some extra checks when
it's used, and emit them to suggest the user there might be an issue.
Here we detect a few common words (logic such as "and"/"or"/"||" etc),
C++/JS comments mistakenly used to try to isolate final args, and words
that have the exact name of a sample fetch or an ACL keyword. These checks
are only done in diag mode of course.
Final diags were added in 2.4 by commit 5a6926dcf ("MINOR: diag: create
cfgdiag module"), but it's called too late in the startup process,
because when "-c" is passed, the call is not made, while it's its primary
use case. Let's just move the call earlier.
Note that currently the check in this function is limited to verifying
unicity of server cookies in a backend, so it can be backported as far
as 2.4, but there is little value in insisting if it doesn't backport
easily.
Diag warnings were added in 2.4 by commit 7b01a8dbd ("MINOR: global:
define diagnostic mode of execution") but probably due to the split
function that checks for the mode, they did not reuse the emission of
the version string before the first warning, as was brought in 2.2 by
commit bebd21206 ("MINOR: init: report in "haproxy -c" whether there
were warnings or not"). The effet is that diag warnings are emitted
before the version string if there is no other warning nor error. Let's
just proceed like for the two other ones.
This can be backported to 2.4, though this is of very low importance.
Just like for stick-tables, this patch adds a promex module to dump
resolvers metrics. It adds the "resolver" scope and for now, it dumps
the following metrics:
* haproxy_resolver_sent
* haproxy_resolver_send_error
* haproxy_resolver_valid
* haproxy_resolver_update
* haproxy_resolver_cname
* haproxy_resolver_cname_error
* haproxy_resolver_any_err
* haproxy_resolver_nx
* haproxy_resolver_timeout
* haproxy_resolver_refused
* haproxy_resolver_other
* haproxy_resolver_invalid
* haproxy_resolver_too_big
* haproxy_resolver_outdated
It is now possible to selectively retrieve extra counters from stats
modules. H1, H2, QUIC and H3 fill_stats() callback functions are updated to
return a specific counter.
The list of modules registered on the stats to expose extra counters is now
public. It is required to export these counters into the Prometheus
exporter.
set-bc-{mark,tos} actions are pretty similar to set-fc-{mark,tos}, except
that they set the mark/tos on packets sent from haproxy to the server. They
act on the whole backend/srv connection: from connect() to connection
teardown, thus they may only be used before the connection to the server
is instantiated, meaning that they are only relevant for request-oriented
rules such as tcp-request or http-request rules. For now their use is
limited to content request rules, because tos and mark information is
stored directly within the stream, thus it is required that the stream
already exists.
Stream flags are used in combination with dedicated stream struct members
to pass the 'tos' and 'mark' information so that it is correctly considered
during the stream connection assignment logic (prior to actually connecting
to the server).
The 'tos' and 'mark' fd sockopts are taken into account in the conn hash
parameters for the connection reuse mechanism.
The documentation was updated accordingly.
In this patch we add the possibility to use a sample expression as argument
for set-fc-{mark,tos} actions. To make it backward compatible with the
previous behavior, during parsing we first try to parse the value as an
integer (decimal or hex notation), and then fall back to expr parsing in
case of failure.
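A minimal sketch of this parse-integer-first-then-expression fallback (the
expression parser call is only a hypothetical placeholder here):

#include <stdlib.h>
#include <errno.h>

/* Hypothetical stand-in for haproxy's sample expression parser. */
extern void *parse_sample_expr_stub(const char *str);

/* Accept either a plain integer (decimal or hex) or an expression. */
static void *parse_mark_tos_arg(const char *arg, long *value)
{
	char *end;

	errno = 0;
	*value = strtol(arg, &end, 0); /* base 0 accepts both 10 and 0x10 forms */
	if (!errno && end != arg && *end == '\0')
		return NULL;           /* valid integer: no expression needed */

	return parse_sample_expr_stub(arg); /* fallback to expression parsing */
}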
The documentation was updated accordingly.
This is a complementary patch to "MINOR: tcp-act: Rename "set-{mark,tos}"
to "set-fc-{mark,tos}"", but for the Lua API.
set_mark and set_tos were kept as aliases for set_fc_mark and set_fc_tos
but they were marked as deprecated.
We use this opportunity to reorder set_mark and set_tos in alphabetical
order.
"set-mark" and "set-tos" only alter packets from haproxy to client
(frontend connection). Since we may add support for equivalent keywords
on the server side, we rename them with more explicit names to prevent
confusion.
Thus, we rename:
- "set-mark" to "set-fc-mark"
- "set-tos" to "set-fc-tos"
"set-mark" and "set-tos" were kept as aliases (to "set-fc-mark" and
"set-fc-tos" respectively) for now to prevent config breakage, but they
have been marked as deprecated so they can be removed in a future version.
Some CPU time is needlessly wasted in conn_calculate_hash(), because all
params are first copied into a temporary buffer before computing the
hash on the whole buffer. Instead, let's leverage the XXH progressive
hash update functions to avoid expensive memcpys.
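For reference, the streaming ("progressive") XXH64 API lets each parameter be
fed directly into the hash state; a simplified sketch of the approach
(assuming XXH64 and haproxy's bundled <import/xxhash.h>, not the exact
conn_calculate_hash() code):

#include <stddef.h>
#include <import/xxhash.h>

/* Hash two parameters without first memcpy'ing them into a temporary
 * buffer, by feeding them one by one to the XXH64 streaming state.
 */
static XXH64_hash_t hash_two_params(const void *p1, size_t len1,
                                    const void *p2, size_t len2)
{
	XXH64_state_t *state = XXH64_createState();
	XXH64_hash_t hash = 0;

	if (!state)
		return 0;
	XXH64_reset(state, 0);          /* seed = 0 */
	XXH64_update(state, p1, len1);
	XXH64_update(state, p2, len2);
	hash = XXH64_digest(state);
	XXH64_freeState(state);
	return hash;
}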
A major reorganization of QUIC MUX sending has been implemented. Now
data transfer occurs over a single QCS buffer. This has improved
performance but at the cost of restrictions on snd_buf. Indeed, buffer
instances are now shared from the stream callback snd_buf up to the
quic-conn layer.
As such, snd_buf can no longer freely manipulate a buffer whose data is
already present. In particular, realignment has been completely removed
by the previous patches.
This commit reintroduces a partial realign support. This is only done if
the buffer contains only unsent data, via a new MUX function
qcc_realign_stream_txbuf() which is called during snd_buf.
This commit is a direct follow-up on the major rearchitecture of send
buffering. This patch implements the proper handling of connection pool
buffer temporary exhaustion.
The first step is to be able to differentiate a fatal allocation error
from a temporary pool exhaustion. This is done via a new output argument
on qcc_get_stream_txbuf(). For a fatal error, application protocol layer
will schedule the immediate connection closing. For a pool exhaustion,
QCC is flagged with QC_CF_CONN_FULL and stream sending process is
interrupted. QCS instance is also registered in a new list
<qcc.buf_wait_list>.
A new connection buffer can become available when all ACKs are received
for an older buffer. This process is taken in charge by quic-conn layer.
It uses the qcc_notify_buf() function to clear QC_CF_CONN_FULL and to wake
up every stream registered on buf_wait_list to resume the sending process.
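A rough sketch of what this notification amounts to (stub types and fields
for illustration only, using haproxy's list primitives):

#include <haproxy/list.h>

#define STUB_CF_CONN_FULL 0x1

struct stub_qcs {
	struct list el;               /* attached to stub_qcc.buf_wait_list */
	void (*wake)(struct stub_qcs *qcs);
};

struct stub_qcc {
	unsigned int flags;
	struct list buf_wait_list;    /* streams waiting for a tx buffer */
};

/* A connection buffer became available: clear the "full" flag and wake
 * every waiting stream so it can resume its sending process.
 */
static void stub_notify_buf(struct stub_qcc *qcc)
{
	struct stub_qcs *qcs, *back;

	qcc->flags &= ~STUB_CF_CONN_FULL;
	list_for_each_entry_safe(qcs, back, &qcc->buf_wait_list, el) {
		LIST_DEL_INIT(&qcs->el);
		qcs->wake(qcs);
	}
}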
This commit is a direct follow-up on the major rearchitecture of send
buffering. It allows the application protocol to react if the current QCS
sending buffer space is too small. In this case, the buffer can be
released to the quic-conn layer. This allows a new QCS buffer to be
allocated and HTX parsing to be retried, unless the connection buffer
pool is already depleted.
A new function qcc_release_stream_txbuf() serves as API for app protocol
to release the QCS sending buffer. This operation fails if there is
unsent data in it. In this case, MUX has to keep it to finalize transfer
of unsent data to quic-conn layer. QCS is thus flagged with
QC_SF_BLK_MROOM to interrupt snd_buf operation.
When all data are sent to the quic-conn layer, QC_SF_BLK_MROOM is
cleared via qcc_streams_sent_done() and stream layer is woken up to
restart snd_buf.
Note that a new function qcc_stream_can_send() has been defined. It
allows app proto to check if sending is currently blocked for the
current QCS. For now, it checks QC_SF_BLK_MROOM flag. However, it will
be extended to other conditions with the following patches.
The previous commit was a major rework for QUIC MUX sending process.
Following this, this patch cleans up a few elements that remain but can
be removed as they are duplicated.
Most notably, the offset fields from QCS and QCC are removed. They are
both equivalent to flow control soft offsets.
A new function qcs_prep_bytes() is implemented. Its purpose is to return
the count of prepared data bytes not yet sent. It also replaces
qcs_need_sending().
Previously, QUIC MUX sending was implemented with data transferred across
two different buffer instances per stream.
The first QCS buffer was used for HTX blocks conversion into H3 (or
other application protocol) during snd_buf stream callback. QCS instance
is then registered for sending via qcc_io_cb().
For each sending QCS, data memcpy is performed from the first to a
secondary buffer. A STREAM frame is produced for each QCS based on the
content of their secondary buffer.
This model is useful for the QUIC MUX which has a major difference with
other muxes: data must be preserved longer, even after being sent to the
lower layer. Data references are shared with the quic-conn layer which
implements retransmission and data deletion on ACK reception.
This double-buffering stage was the first model implemented and remained
active until today. One of its major drawbacks is that it requires a
memcpy invocation for every data transferred between the two buffers.
Another important drawback is that the first buffer is allocated by
each QCS individually without restriction. On the other hand, secondary
buffers are accounted for the connection. A bottleneck can appear if
the secondary buffer pool is exhausted, causing unnecessary haproxy
buffering.
The purpose of this commit is to completely break this model. The first
buffer instance is removed. Now, application protocols will directly
allocate their buffer from the qc_stream_desc layer. This completely
removes the memcpy invocation.
This commit has a lot of code modifications. The most obvious one is the
removal of <qcs.tx.buf> field. Now, qcc_get_stream_txbuf() returns a
buffer instance from qc_stream_desc layer. qcs_xfer_data() which was
responsible for the memcpy between the two buffers is also completely
removed. Offset fields of QCS and QCC are now incremented directly by
qcc_send_stream(). These values are used as boundary with flow control
real offset to delimit the STREAM frames built.
As this change has a big impact on the code, this commit is only the
first part to fully support single buffer emission. For the moment, some
limitations are reintroduced and will be fixed in the next patches :
* on snd_buf if the QCS send buffer in use has room but not enough for the
application protocol to store its content
* on snd_buf if the QCS send buffer is NULL and allocation cannot succeed
due to connection pool exhaustion
One final important aspect is that extra care is necessary now in
snd_buf callback. The same buffer instance is referenced by both the
stream and the quic-conn layer. As such, some operations such as realign
can no longer be done freely.
qcs_build_stream_frm() is responsible to generate a STREAM frame
pointing to the content of QCS TX buffer.
This patch moves send flow control overflow check from qcs_xfer_data()
to qcs_build_stream_frm(), i.e. from transfer between internal
QCS buffer and qc_stream_desc, to STREAM frame generation.
Flow control is checked at both the stream and connection levels. For
connection flow control, as several frames are built before emission, an
accumulator is used as an extra argument to functions to account for the
total length of already built frames.
This patch should not provide any functional changes. Its main purpose
is to prepare for the removal of QCS internal buffer.
Both QCS and QCC have their own sent offset field. These fields store
the newest offset sent to the quic-conn layer. It is similar to QCS/QCC
flow control real offset. This patch removes them and replaces them by
the latter for code clarification.
MINOR: mux-quic: remove unneeded qcc.tx.sent_offsets field
This commit has a similar purpose as the previous one, except that it
removes the QCC <sent_offsets> field, now equivalent to the connection
flow control real offset.
This commit is a direct follow-up on the previous one. This time, it
deals with connection level flow control. Process is similar to stream
level : soft offset is incremented during snd_buf and real offset during
STREAM frame emission.
On MAX_DATA reception, both the stream layer and QMUX are woken up if
necessary. One extra feature for the conn level is the introduction of a
new QCC list to reference QCS instances. It will store instances for which
the snd_buf callback has been interrupted because the QCC soft offset was
reached. Every stream instance is woken up on MAX_DATA reception if the
soft offset is unblocked.
This patch is the first of two to reimplement flow control emission
limits check. The objective is to account flow control earlier during
snd_buf stream callback. This should smooth transfers and prevent over
buffering on haproxy side if flow control limit is reached.
The current patch deals with stream level flow control. It reuses the
newly defined flow control type. Soft offset is incremented after HTX to
data conversion. If limit is reached, snd_buf is interrupted and stream
layer will subscribe on QCS.
On qcc_io_cb(), generation of STREAM frames is restricted as previously
to ensure to never surpass peer limits. Finally, flow control real
offset is incremented on lower layer send notification. Thus, it will
serve as a base offset for built STREAM frames. If limit is reached,
STREAM frames generation is suspended.
Each time QCS data flow control limit is reached, soft and real offsets
are reconsidered.
Finally, special care is used when flow control limit is incremented via
MAX_STREAM_DATA reception. If soft value is unblocked, stream layer
snd_buf is woken up. If real value is unblocked, qcc_io_cb() is
rescheduled.
Create a new module dedicated to flow control handling. It will be used
to implement earlier flow control update on snd_buf stream callback.
For the moment, only Tx part is implemented (i.e. limit set by the peer
that haproxy must respect for sending). A type quic_fctl is defined to
count emitted data bytes. Two offsets are used : a real one and a soft
one. The difference is that soft offset can be incremented beyond limit
unless it is already in excess.
The soft offset will be used for HTX to H3 parsing. As the size of the
generated H3 is unknown before parsing, it allows the limit to be surpassed
one time. The real offset will be used during STREAM frame generation: this
time the limit must not be exceeded to prevent a protocol violation.
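A minimal sketch of the idea behind such a two-offset counter (names are
illustrative, not necessarily the real quic_fctl API):

#include <stdint.h>

struct fctl_sketch {
	uint64_t limit;    /* max offset advertised by the peer */
	uint64_t off_soft; /* accounted during snd_buf, may pass the limit once */
	uint64_t off_real; /* emitted in STREAM frames, must never pass the limit */
};

/* Soft increment: allowed to go beyond the limit, but only when not
 * already in excess, since the generated H3 size is unknown beforehand.
 */
static int fctl_sketch_sinc(struct fctl_sketch *f, uint64_t bytes)
{
	if (f->off_soft > f->limit)
		return 0;        /* already in excess: refuse further increments */
	f->off_soft += bytes;
	return 1;
}

/* Real increment: exceeding the limit would be a protocol violation. */
static int fctl_sketch_rinc(struct fctl_sketch *f, uint64_t bytes)
{
	if (f->off_real + bytes > f->limit)
		return 0;
	f->off_real += bytes;
	return 1;
}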
Add a new argument to qcc_send_stream() to specify the count of sent
bytes.
For the moment this argument is unused. This commit is in fact a step to
implement earlier flow control update during stream layer snd_buf.
Ben Kallus kindly reported that we still hadn't blocked the NUL
character from header values as clarified in RFC9110 and that, even
though there's no known issue related to this, it may one day be
used to construct an attack involving another component.
Actually, both Christopher and I sincerely believed we had done it
prior to releasing 2.9, shame on us for missing that one and thanks
to Ben for the reminder!
The change was applied, it was confirmed to properly reject this NUL
byte from both header and trailer values, and it's still possible to
force it to continue to be supported using the usual pair of unsafe
"option accept-invalid-http-{request|response}" for those who would
like to keep it for whatever reason that wouldn't make sense.
This was tagged medium so that distros also remember to apply it as
a preventive measure.
It should progressively be backported to all versions down to 2.0.
Trailers are parsed using a temporary h1m struct, likely due to using
distinct h1 parser states. However, the err_pos field that's used to
decide whether or not to enforce option accept-invalid-http-request (or
response) was not initialized in this struct, resulting in using a
random value that may randomly accept or reject a few bad chars. The
impact is very limited in trailers (e.g. no message size is transmitted
there) but we must make sure that the option is respected, at least for
users facing the need for this option there.
The issue was introduced in 2.0 by commit 2d7c5395ed ("MEDIUM: htx:
Add the parsing of trailers of chunked messages"), and the code moved
from mux_h1.c to h1_htx.c in 2.1 with commit 4f0f88a9d0 ("MEDIUM:
mux-h1/h1-htx: move HTX convertion of H1 messages in dedicated file")
so the patch needs to be backported to all stable versions, and the
file adjusted for 2.0.
Always compile the test of the early_data variable in
"ssl_quic_initial_ctx"; this way we can emit a warning about whether or
not it is supported.
The test was moved into a simpler preprocessor check which only checks
the new HAVE_SSL_0RTT_QUIC constant.
Could be backported to 2.9 with the 2 previous commits.
However AWS-LC must be excluded from HAVE_SSL_0RTT_QUIC in this version.
It is similar to the previous fix but for the chunk size parsing. But this
one is more annoying because a poorly coded application in front of haproxy
may ignore the last digit before the LF thinking it should be a CR. In this
case it may be out of sync with HAProxy and that could be exploited to
perform some sort of request smuggling attack.
While it seems unlikely, it is safer to forbid a LF that is not preceded by
a CR at the end of a chunk size.
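The check itself boils down to refusing a LF whose preceding byte is not a
CR; a standalone sketch of the idea (not the actual h1 parser code):

/* <lf> points to the LF ending the chunk-size line, <start> to its
 * first byte: the LF is only valid when directly preceded by a CR.
 */
static inline int chunk_size_end_is_valid(const char *start, const char *lf)
{
	return (lf > start && lf[-1] == '\r');
}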
This patch must be backported to 2.9 and probably to all stable versions
because there is no reason to still support LF without CR in this case.
When the message is chunked, all chunks must end with a CRLF. However, on
old versions, to support bad client or server implementations, a LF alone
was also accepted. Nowadays, it seems useless and can even be considered as
an issue. Just forbid a LF alone at the end of chunks, it seems reasonable.
This patch must be backported to 2.9 and probably to all stable versions
because there is no reason to still support LF without CR in this case.
In several places in the source, there was the same block of code that was
used to deinitialize the log buffer. There were even two functions that
did this, but they were called only from the code that is in the same
source file (free_tcpcheck_fmt() in src/tcpcheck.c and free_logformat_list()
in src/proxy.c - they were both static functions).
The function free_logformat_list() was moved from the file src/proxy.c to
src/log.c, and a check of the list before freeing the memory was added to
that function.
A recent fix was introduced to ensure unsent data are deleted when a
QUIC MUX stream releases its qc_stream_desc instance. This is necessary
to ensure all used buffers will be released once all ACKs are received.
This is implemented by the following patch :
commit ad6b13d317 (quic-dev/qns)
BUG/MEDIUM: quic: remove unsent data from qc_stream_desc buf
Before this patch, buffer removal was done only on ACK reception. ACK
handling is only done in order from the oldest one. A BUG_ON() statement
is present to ensure this assertion remains valid.
This is however not true anymore since the above patch. Indeed, after
unsent data removal, the current buffer may be empty if it did not
contain yet any sent data. In this case, it is not the oldest buffer,
thus the BUG_ON() statement will be triggered.
To fix this, simply remove this BUG_ON() statement. It should not have
any impact as it is safe to remove buffers in any order.
Note that several conditions must be met to trigger this BUG_ON crash :
* a QUIC MUX stream is destroyed before transmitting all of its data
* several buffers must have been previously allocated for this stream so
it happens only for transfers bigger than bufsize
* latency should be high enough to delay ACK reception
This must be backported wherever the above patch is (currently targeted
to 2.6).
HTTP status codes outside of 100..599 are considered invalid in HTTP
RFC9110. However, it is explicitly stated that the range 600..999 is often
used for internal communication so in practice haproxy must be lenient
with it.
Before this patch, QPACK encoder rejected these values. This resulted in
a connection error. Fix this by extending the range of allowed values
from 100 to 999.
This is linked to github issue #2422. Once again, thanks to @yokim-git
for his help here.
This must be backported up to 2.6.
A crash occurs in h3_resp_headers_send() if an invalid response code is
received from the backend side. Fix this by properly flagging the
connection on error. This will cause a CONNECTION_CLOSE.
This should fix github issue #2422.
Big thanks to ygkim (@yokim-git) for his help and reactivity. Initially,
GDB reported an invalid source code location due to heavy function
inlining inside h3_snd_buf(). The issue was found after using the -Og flag.
This must be backported up to 2.6.
Replace h3_debug_printf() by real trace for functions used by stream
layer snd_buf callback. A new event type H3_EV_STRM_SEND is created for
the occasion.
This should be backported up to 2.6 to help investigate H3 issues on
stable releases. Note that the h3_nego_ff/h3_done_ff definitions are not
available from 2.8.
It has been found that under some rare error circumstances,
SSL_do_handshake() could return with SSL_ERROR_WANT_READ without
even trying to call the read function, causing permanent wakeups
that prevent the process from sleeping.
It was established that this only happens if the retry flags are
not systematically cleared in both directions upon any I/O attempt,
but, given the lack of documentation on this topic, it is hard to
say if this rather strange behavior is expected or not, otherwise
why wouldn't the library always clear the flags by itself before
proceeding?
In addition, this only seems to affect OpenSSL 1.1.0 and above,
and does not affect wolfSSL nor aws-lc.
A bisection on haproxy showed that this issue was first triggered by
commit a8955d57ed ("MEDIUM: ssl: provide our own BIO."), which means
that OpenSSL's socket BIO does not have this problem. And this one
does always clear the flags before proceeding. So let's just proceed
the same way. It was verified that it properly fixes the problem,
does not affect other implementations, and doesn't cause any freeze
nor spurious wakeups either.
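The pattern is simply the one used by OpenSSL's socket BIO: clear the retry
flags at the beginning of every I/O attempt, and set them again only when
the operation would block. A simplified sketch of a custom BIO read callback
doing this (not the actual haproxy callbacks; it assumes the file descriptor
is stored as the BIO data):

#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <openssl/bio.h>

static int sketch_bio_read(BIO *bio, char *buf, int len)
{
	int fd = (int)(intptr_t)BIO_get_data(bio);
	int ret;

	/* always drop stale retry flags before attempting the I/O */
	BIO_clear_retry_flags(bio);

	ret = read(fd, buf, len);
	if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		BIO_set_retry_read(bio); /* only now flag the would-block case */

	return ret;
}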
Many thanks to Valentín Gutiérrez for providing a network capture
showing the incident as well as a reproducer. This is GH issue #2403.
This patch needs to be backported to all versions that include the
commit above, i.e. as far as 2.0.
QCS instances use qc_stream_desc for data buffering on emission. On
stream reset, its Tx channel is closed earlier than expected. This may
leave unsent data in the qc_stream_desc.
Before this patch, these unsent data would remain after the QCS freeing.
This prevents the buffer from being released as no ACK reception will
remove them. The buffer is only freed when the whole connection is closed.
As qc_stream_desc buffers are limited per connection, this reduces the
buffer pool for other streams of the same connection. In the worst case,
if several streams are reset, this may completely freeze the transfer of
the remaining connection streams.
This bug was reproduced by reducing the connection buffer pool to a
single buffer instance by using the following global statement :
tune.quic.frontend.conn-tx-buffers.limit 1.
Then a QUIC client is used which opens a stream for a large enough
object to ensure data are buffered. The client then emits a STOP_SENDING
before reading all data, which forces the corresponding QCS instance to
be reset. The client then opens a new request but the transfer is
frozen due to this bug.
To fix this, adjust qc_stream_desc API. Add a new argument <final_size>
on qc_stream_desc_release() function. Its value is compared to the
currently buffered offset in the latest qc_stream_desc buffer. If
<final_size> is lower, it means unsent data are present in the
buffer. As such, qc_stream_desc_release() removes them to ensure the
buffer will finally be freed when all ACKs are received. It is also
possible that no data remains at all, indicating that the ACKs were
already received. In that case, the buffer instance is immediately
removed by qc_stream_buf_free().
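Put differently, the release path now trims anything beyond the final size so
that pending ACKs are enough to free the buffer; a rough sketch of that logic
with illustrative types (not the real qc_stream_desc structures):

#include <stdint.h>

struct sbuf_sketch {
	uint64_t buffered_off; /* offset following the last buffered byte */
	uint64_t acked_off;    /* offset fully acknowledged so far */
};

/* Called on stream release: drop data which will never be sent, and free
 * the buffer at once if nothing remains to be acknowledged.
 */
static int release_trim(struct sbuf_sketch *buf, uint64_t final_size)
{
	if (final_size < buf->buffered_off)
		buf->buffered_off = final_size; /* remove unsent data */

	if (buf->buffered_off <= buf->acked_off)
		return 1; /* nothing left in flight: free the buffer right away */

	return 0;         /* keep it until the remaining ACKs are received */
}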
This must be backported up to 2.6. As this code section is known to be
prone to regressions, a period of observation could be reserved before
distributing it on LTS releases.
On ACK reception, data are removed from buffer via qc_stream_desc_ack().
The buffer can be freed if no more data are left. A new slot is also
accounted in buffer connection pool. Extract this operation in a
dedicated private function qc_stream_buf_free().
This change should have no functional impact. However it will be useful
for the next patch which needs to remove a buffer from another function.
This patch is necessary for the following bugfix. As such, it must be
backported with it up to 2.6.
Very minor modification to replace a hardcoded value in a statement by a macro.
Should be backported as far as 2.6 to ease any further modification to come.
There is a typo in the statement to initialize this variable when selecting
newreno as cc algo:
const char *newreno = "newrno";
This would not have happened if #defines had been used in place of the
hardcoded const char * values.
Take the opportunity of this patch to use #defines for all the available cc algorithms.
Must be backported to 2.9.
When a cache is "cold" and multiple clients simultaneously try to access
the same resource we must forward all the requests to the server. Next,
every "duplicated" response will be processed in http_action_store_cache
and we will try to cache every one of them regardless of whether this
response was already cached. In order to avoid having multiple entries
for a same primary key, the logic is then to first delete any
preexisting entry from the cache tree before storing the current one.
The actual previous response content will not be deleted yet though
because if the corresponding row is detached from the "avail" list it
might still be used by a cache applet if it actually performed a lookup
in the cache tree before the new response could be received.
This all means that we can end up using a valid row that references a
cache_entry that was already removed from the cache tree. This does not
pose any problem in regular caches (no 'vary' mechanism enabled) because
the applet only works on the data and not the 'cache_entry' information,
but in the "vary" context, when calling 'http_cache_applet_release' we
might call 'delete_entry' on the given entry which in turn tries to
iterate over all the secondary entries to find the right one in which
the secondary entry counter can be updated. We would then call
eb32_next_dup on an entry that was not in the tree anymore which ended
up crashing.
This crash was introduced by "48f81ec09 : MAJOR: cache: Delay cache
entry delete in reserve_hot function" which added the call to
"release_entry" in "http_cache_applet_release" that ended up crashing.
This issue was raised in GitHub #2417.
This patch must be backported to branch 2.9.
This is a cleanup patch to address cosmetic issues introduced in f034139bc0
("MINOR: lua: Allow reading "proc." scoped vars from LUA core.")
Also taking this opportunity to prefix the function with __LJMP to
indicate that it may longjump.
No backport needed.
As raised by Coverity in GH #2223, f034139bc0 ("MINOR: lua: Allow reading
"proc." scoped vars from LUA core.") causes uninitialized reads due to
smp being passed to vars_get_by_name() without being initialized first.
Indeed, vars_get_by_name() tries to read smp->sess and smp->strm pointers.
As we're only interested in the PROC var scope, it is safe to call
vars_get_by_name() with sess and strm pointers set to NULL, thus we
simply memset smp prior to calling vars_get_by_name() to fix the issue.
This should be backported in 2.9 with f034139bc0.
This previous commit was not sufficient to completely fix the building issue
in relation with the TLS stack 0-RTT support. LibreSSL was the last TLS
stack to refuse to compile because of an undefined QUIC-specific function
for 0-RTT: SSL_set_quic_early_data_enabled().
To get rid of such compilation issues, define HA_OPENSSL_HAVE_0RTT_SUPPORT
only when building against TLS stack with 0-RTT support.
No need to backport.
This commit:
"MINOR: quic: Enable early data at SSL session level (aws-lc)
introduced a build error when using wolfssl as TLS stack
because it references unknown function wolfSSL_set_quic_early_data_enabled()
which is not defined in qc_set_quic_early_data_context() that must not be used
in this case. The compilation of this fonction was enabled for wolfssl when
it should not have by the mentionned commit.
No backport is needed.
Commit f783dd959b ("MINOR: quic: Enable early data at SSL session level
(aws-lc)") introduced a build error when using the openssl compat layer
because it references unknown function SSL_set_quic_early_data_context()
in qc_set_quic_early_data_context() that is not used in this case.
No backport is needed.
The jwt_verify converter was added in 2.5 with commit 130e142ee2
("MEDIUM: jwt: Add jwt_verify converter to verify JWT integrity"). It
takes a string on input and returns an integer. It turns out that by
presetting the return value to zero before processing contents, while
the sample data is a union, it overwrites the beginning of the buffer
struct passed on input. On a 64-bit arch it's not an issue because it's
where the allocated size is stored and it's not used in the operation,
which explains why the regtest works. But on 32-bit, both the size and
the pointer are overwritten, causing a NULL pointer to be passed to
jwt_tokenize() which is not designed to support this, hence crashes.
Let's just use a temporary variable to hold the result and move the
output sample initialization to the end of the function.
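The underlying problem is easy to reproduce with any union that overlays an
integer result and a buffer descriptor: writing the integer member first
clobbers buffer fields that still need to be read, and which fields are hit
depends on the pointer size. A self-contained illustration with simplified
types (not haproxy's real struct sample):

#include <stdio.h>

struct fake_buffer {
	size_t size;  /* on 64-bit, only this field overlaps a long long */
	char  *area;  /* on 32-bit, this pointer gets overwritten as well */
};

union fake_sample_data {
	long long sint;
	struct fake_buffer str;
};

int main(void)
{
	char area[8] = "token";
	union fake_sample_data d;

	d.str.size = sizeof(area);
	d.str.area = area;

	d.sint = 0; /* presetting the integer result clobbers the union... */

	/* ...so reading the input string afterwards is broken: on 32-bit
	 * both size and area are gone, and area is now a NULL pointer.
	 */
	printf("area=%p size=%zu\n", (void *)d.str.area, d.str.size);
	return 0;
}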
This should be backported as far as 2.5.
The initial purpose of CSV stats through the CLI was to make them easily
parsable by scripts. But in some specific cases, some error or warning
message strings containing LFs were dumped into cells of this CSV.
This caused parsing failures in several tools. In addition, if a
warning or message contains two successive LFs, they will be dumped
directly, but a double LF tags the end of the response on the CLI and the
client may consider the response truncated.
This patch extends the 'csv_enc_append' and 'csv_enc' functions used
to format quoted string content according to the RFC with an additional
parameter to convert multi-line strings to one line: CRs are skipped,
and LFs are replaced with spaces. In addition and optionally, it is
also possible to remove resulting trailing spaces.
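A standalone sketch of that flattening (skip CRs, turn LFs into spaces,
optionally trim trailing spaces), independent of the real csv_enc code:

#include <stdio.h>

/* Flatten a possibly multi-line message into a single line. <dst> must be
 * at least as large as <src>.
 */
static void oneline(char *dst, const char *src, int trim_trailing)
{
	char *out = dst;

	for (; *src; src++) {
		if (*src == '\r')
			continue;               /* CRs are skipped */
		*out++ = (*src == '\n') ? ' ' : *src; /* LFs become spaces */
	}
	while (trim_trailing && out > dst && out[-1] == ' ')
		out--;                          /* optional trailing-space removal */
	*out = '\0';
}

int main(void)
{
	char buf[64];

	oneline(buf, "first line\r\nsecond line\n\n", 1);
	printf("[%s]\n", buf); /* prints "[first line second line]" */
	return 0;
}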
The calls to these functions used to fill strings in the stats CSV output
are updated to force this conversion.
This patch should be backported on all supported branches (issue was
already present in v2.0)
This patch impacts only the haproxy builds against aws-lc TLS stack (USE_OPENSSL_AWSLC).
As mentioned in the boringssl documentation, SSL_do_handshake() completes as soon
as the ClientHello is processed and the server flight sent (from the TLS stack to
the server endpoint I guess). In QUIC, this completion has the side effect of
discarding the Handshake packet number space. If this handshake completion is not
deferred, the Handshake level CRYPTO data will not be sent to the peer (because of
the associated packet number space discarding). According to the documentation,
SSL_in_early_data() may be used to do that. If it returns 1, this means that
the handshake is still in progress but has progressed enough to send half-RTT
data.
This patch is required to make the haproxy builds against aws-lc TLS stack support 0-RTT.
This patch impacts only haproxy when built against aws-lc TLS stack (OPENSSL_IS_AWSLC).
During the SSL_CTX switching from the ssl_sock_switchctx_cbk() callback,
ssl_sock_switchctx_set() is called. The latter calls SSL_set_SSL_CTX(),
whose aim is to change the SSL_CTX attached to an SSL object (TLS session).
But the aws-lc (or boringssl) implementation of this function copies
the "early data enabled" setting value (boolean) coming with the SSL_CTX object
into the SSL object. So, if not set in the SSL_CTX object, this setting disables
the one which has been set by configuration on the SSL object
(see qc_set_quic_early_data_enabled(), it calls SSL_set_early_data_enabled()
with an SSL object as parameter).
Fix this by enabling the "early data enabled" setting on the SSL_CTX before
attaching the latter to the SSL object.
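A minimal sketch of the fix's shape (assuming the aws-lc/boringssl functions
SSL_CTX_set_early_data_enabled() and SSL_set_SSL_CTX(); it does not apply to
plain OpenSSL):

#include <openssl/ssl.h>

/* With aws-lc/boringssl, SSL_set_SSL_CTX() copies the "early data
 * enabled" flag from the SSL_CTX into the SSL object, so enable it on
 * the new SSL_CTX first, then switch the session's context.
 */
static void switchctx_keep_early_data(SSL *ssl, SSL_CTX *new_ctx)
{
	SSL_CTX_set_early_data_enabled(new_ctx, 1);
	SSL_set_SSL_CTX(ssl, new_ctx);
}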
This patch is required to make QUIC 0-RTT work with haproxy built against
aws-lc.
Note that, this patch should also help in early data support for TCP connections.
This patch impacts only the haproxy build against aws-lc TLS stack (USE_OPENSSL_AWSLC).
Implement a new function, qc_set_quic_early_data_enabled(), to enable
early data at the session level. To make QUIC 0-RTT work, a context string
must be set by calling SSL_set_quic_early_data_context(). This is a
subset of the encoded transport parameters which is used for this.
Note that some application level settings should be also added (TODO).
This patch is required to make 0-RTT work for haproxy builds against aws-lc.