Commit Graph

Valentine Krasnobaeva
df12791da3 MINOR: startup: add O_CLOEXEC flag to open /dev/null
As the master process performs an execvp() syscall to handle the USR2 and HUP
signals in mworker_reexec(), let's add the O_CLOEXEC flag when we open
'/dev/null', in order to avoid an fd leak.

This is a preparation step to refactor the master-worker logic. See more
details in the next commits.
2024-10-16 22:00:58 +02:00
Valentine Krasnobaeva
5bbcdc003a REGTESTS: cli: add delay 0.1 before connect to cli
When vtest starts the haproxy process, it loops until the haproxy pidfile is
created. Once the pidfile exists, vtest considers the haproxy process ready
and starts to perform test commands, in particular connecting to the CLI.
Basing the readiness check on the PID file is not very reliable: after the
master-worker architecture refactoring, the pidfile is created at an early
init stage, while the master and worker have not yet finished their
initialization routines. So all mcli tests, and some tests where we send
commands to the CLI, started to fail regularly.

At the moment vtest offers no other way to check that the process is really
ready. So let's add a 0.1s delay before connecting to the CLI in all mcli
tests and in the acl_cli_spaces test.
2024-10-16 22:00:58 +02:00
Willy Tarreau
2c2dac77aa DEBUG: mux-h2/flags: add H2_CF_DEM_RXBUF & H2_SF_EXPECT_RXDATA for the decoder
Both flags were recently added but missing from the decoders flags, so
they appeared in hex in dev/flags/flags output. No backport needed.
2024-10-16 18:32:52 +02:00
Willy Tarreau
ca275d99ce BUG/MEDIUM: queue: make sure never to queue when there's no more served conns
Since commit 53f52e67a0 ("BUG/MEDIUM: queue: always dequeue the backend when
redistributing the last server"), we've got two reports again still showing
the theoretically impossible condition in pendconn_add(), including a single
threaded one.

Thanks to the traces, the issue could be tracked down to the redispatch part.
In fact, with non-deterministic LB algorithms (RR, LC, FAS), we don't perform
the LB if there are pending connections in the backend, since it indicates
that previous attempts already failed, so we directly return SRV_STATUS_FULL.
And contrary to a previous belief, it is possible to meet this condition with
be->served==0 when redispatching (and likely with maxconn not greater than
the number of threads).

The problem is that in this case, the entry is queued and then the
pendconn_must_try_again() function checks if any connections are currently
being served to detect if we missed a race, and tries again, but that
situation is not caused by a concurrent thread and will never fix itself,
resulting in the loop.

All that part around pendconn_must_try_again() is still quite brittle, and
a safer approach would involve a sequence counter to detect new arrivals
and dequeues during the pendconn_add() call. But it's more sensitive work,
probably for a later fix.

This fix must be backported wherever the fix above was backported. Thanks
to Patrick Hemmer, as well as Damien Claisse and Basha Mougamadou from
Criteo for their help on tracking this one!
2024-10-16 18:08:39 +02:00
Aurelien DARRAGON
85298189bf BUG/MEDIUM: server: server stuck in maintenance after FQDN change
Pierre Bonnat reported that SRV-based server-template recently stopped
working properly.

After reviewing the changes, it was found that the regression was caused
by a4d04c6 ("BUG/MINOR: server: make sure the HMAINT state is part of MAINT")

Indeed, HMAINT is not a regular maintenance flag. It was implemented in
b418c122. This flag is only set (and never removed) when the server FQDN is
changed from its initial config-time value. This can happen with the "set
server fqdn" command as well as with SRV record updates from the DNS. This
flag should ideally belong to the server flags, but it was stored under the
srv_admin enum because cur_admin is properly exported/imported via the server
state-file while regular server flags are not.

Due to a4d04c6, when a server FQDN changes, the server is considered in
maintenance, and since the HMAINT flag is never removed, the server is
stuck in maintenance.

To fix the issue, we partially revert a4d04c6. But this latter commit is
right on one point: the HMAINT flag was way too confusing and easily mixed up
with the regular MAINT flags, so there's nothing to blame a4d04c6 for, as the
naming was error-prone anyway. To prevent this kind of bug from happening
again, let's rename HMAINT to something more explicit (SRV_ADMF_FQDN_CHANGED)
and make it stand out in the srv_admin enum so we're not tempted to mix it
with regular maintenance flags anymore.

Since a4d04c6 was set to be backported in all versions, this patch must
be backported there as well.
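The separation described above might look like this sketch (flag values and the helper are illustrative, not HAProxy's actual definitions):

```c
#include <assert.h>

/* The FQDN-changed flag stays in the admin-state enum (so it is still
 * exported via the server state-file) but is kept apart from the
 * maintenance mask. Values are illustrative only. */
enum {
    SRV_ADMF_FMAINT       = 0x01,  /* forced maintenance */
    SRV_ADMF_IMAINT       = 0x02,  /* inherited maintenance */
    SRV_ADMF_MAINT        = SRV_ADMF_FMAINT | SRV_ADMF_IMAINT,
    SRV_ADMF_FQDN_CHANGED = 0x80,  /* was HMAINT; NOT part of MAINT */
};

/* a server is in maintenance only if a real MAINT flag is set */
static int srv_in_maint(unsigned cur_admin)
{
    return (cur_admin & SRV_ADMF_MAINT) != 0;
}
```

Keeping the FQDN bit out of the MAINT mask is what prevents a mere FQDN change from being interpreted as a maintenance state.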
2024-10-16 14:26:57 +02:00
Amaury Denoyelle
0918c41ef6 BUG/MEDIUM: quic: support wait-for-handshake
The wait-for-handshake http-request action was completely ineffective with
the QUIC protocol. This commit implements its support for QUIC.

The QUIC MUX layer is extended to support wait-for-handshake. A new function
qcc_handle_wait_for_hs() is executed during qcc_io_process(). It detects
whether MUX processing occurs after the underlying QUIC handshake completion.
If so, it indicates that early data may be received, so the connection is
flagged with CO_FL_EARLY_SSL_HS, which is necessary to block stream
processing on the wait-for-handshake action.

After this, the qcc subscribes on the quic_conn layer for RECV notification.
This is used to detect QUIC handshake completion, so that
qcc_handle_wait_for_hs() can be re-executed one last time to remove
CO_FL_EARLY_SSL_HS and notify every stream flagged with SE_FL_WAIT_FOR_HS.

This patch must be backported up to 2.6, after a mandatory period of
observation. Note that it relies on the backport of the two previous
patches:
- MINOR: quic: notify connection layer on handshake completion
- BUG/MINOR: stream: unblock stream on wait-for-handshake completion
2024-10-16 11:51:35 +02:00
Amaury Denoyelle
73031e81cd BUG/MINOR: stream: unblock stream on wait-for-handshake completion
wait-for-handshake is an http-request action which allows delaying the
processing of content received as TLS early data. The action yields as long
as the connection handshake is in progress. In the meantime, the stconn is
flagged with SE_FL_WAIT_FOR_HS.

When the handshake is finished, the MUX layer is responsible for waking up
the stconn instances flagged with SE_FL_WAIT_FOR_HS to restart stream
processing. In sc_conn_process(), the SE_FL_WAIT_FOR_HS flag is removed and
the stream layer is woken up.

However, processing may still block after the MUX notification:
sc_conn_recv() may return 0 because no new data was received, which prevents
sc_conn_process() from executing. The stream is thus blocked until its
timeout.

To fix this, sc_conn_recv() now checks for the handshake termination
condition and, if it is met, explicitly returns 1 to ensure
sc_conn_process() will be executed.
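The fix can be sketched as follows (stub code; the flag value and helper name are hypothetical, only the flag name comes from the commit):

```c
#include <assert.h>

#define SE_FL_WAIT_FOR_HS 0x1u  /* flag name from the commit; value illustrative */

/* Even when no new data was received (ret == 0), report activity once
 * the handshake completed for a stream that was waiting on it, so that
 * sc_conn_process() runs and unblocks the stream. */
static int recv_result(int ret, unsigned se_flags, int handshake_done)
{
    if (!ret && handshake_done && (se_flags & SE_FL_WAIT_FOR_HS))
        return 1;
    return ret;
}
```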

Note that this bug is not reproducible due to various conditions related
to early data implementation in haproxy. Indeed, connection layer
instantiation is always delayed until SSL handshake completion, which
prevents the handling of early data as expected.

This fix will be necessary to implement wait-for-handshake support for
QUIC. As such, it must be backported with the next commit up to 2.6,
after a mandatory period of observation.
2024-10-16 11:44:31 +02:00
Amaury Denoyelle
5a5950e42d MINOR: quic: notify connection layer on handshake completion
Wake up connection layer on QUIC handshake completion via
quic_conn_io_cb. Select SUB_RETRY_RECV as this was previously unused by
QUIC MUX layer.

For the moment, QUIC MUX never subscribes for handshake completion.
However, this will be necessary for features such as the delaying of
early data forwarding via wait-for-handshake.

This patch will be necessary to implement wait-for-handshake support for
QUIC. As such, it must be backported with next commits up to 2.6,
after a mandatory period of observation.
2024-10-16 11:42:06 +02:00
Willy Tarreau
5091f90479 MINOR: activity/memprofile: always return "other" bin on NULL return address
It was found in a large "show profiling memory" output that a few entries
have a NULL return address, which causes confusion because this address
will be reused by the next new allocation caller, possibly resulting in
inconsistencies such as "free() ... pool=trash" which makes no sense. The
cause is in fact that the first caller had an entry->info pointing to the
trash pool from a p_alloc/p_free with a NULL return address, and the second
had a different type and reused that entry.

Let's make sure undecodable stacks causing an apparent NULL return address
all lead to the "other" bin.

While this is not exactly a bug, it would make sense to backport it to the
recent branches where the feature is used (probably at least as far as 2.8).
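The idea can be sketched with an illustrative hash (not HAProxy's actual binning code; names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Map a return address to a profiling bin. A NULL (undecodable) return
 * address always lands in the dedicated last "other" bin, so its entry
 * can never be reused by an unrelated caller with a different type. */
static size_t memprof_pick_bin(const void *ra, size_t nbins)
{
    if (!ra)
        return nbins - 1;                    /* the "other" bin */
    return ((size_t)ra >> 4) % (nbins - 1);  /* illustrative hash */
}
```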
2024-10-15 08:12:34 +02:00
Willy Tarreau
93c9f19af7 REGTESTS: fix a reload race in abns_socket.vtc
This test issues a reload over the master CLI, but it is totally
possible that the master has not yet finished starting up the master
CLI when the command is issued, resulting in a failure. This was much
more visible on the new master-worker model, but definitely affects the
old one and could be the reason for this test to occasionally fail on
the CI.
2024-10-14 19:15:21 +02:00
William Lallemand
0302adf996 CI: cirrus-ci: bump FreeBSD image to 14-1
The FreeBSD CI seems to have been broken for a while; try to upgrade the
image to the latest 14.1 version.
2024-10-14 14:28:26 +02:00
Willy Tarreau
e4cb0ad632 MINOR: mux-h2/traces: add buffer-related info to h2s and h2c
The traces currently don't contain any info about the amount of data
present in buffers, making it difficult to figure if an empty buffer
is the cause for not demuxing or if a full buffer is the cause for
not reading more data. Let's add them, with the head/tail info as
well.
2024-10-12 18:07:21 +02:00
Willy Tarreau
a8f907a459 MINOR: mux-h2/traces: add missing flags and proxy ID in traces
H2 traces are mostly unusable for detecting bugs because they miss the h2c
and h2s flags, as well as the proxy, which makes it very hard to figure out
whether the info comes from the client or the server as soon as two layers
are stacked. This commit adds this precious information as well as the h2s's
rx and tx windows.

This could be backported to a few recent branches, but the rx window
calculation will have to be replaced with the static value there.
2024-10-12 17:45:51 +02:00
Willy Tarreau
fcab647613 OPTIM: mux-h2: use tasklet_wakeup_after() in h2s_notify_recv()
This reduces the avg wakeup latency of sc_conn_io_cb() from 1900 to 51us.
The L2 cache misses dropped from 1.4 to 1.2 billion for 20k req, but the
perf is not better. Also, there are situations where we must not perform
such a wakeup; these may only be done from h2_io_cb, hence the test on the
next_tasklet pointer and its reset when leaving the function. In practice
all callers of h2s_close() or h2s_destroy() can reach that code; this
includes h2_detach, h2_snd_buf, h2_shut, etc.
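The deferred-wakeup pattern described above might be sketched with stubs (the structures and helpers here are stand-ins, not HAProxy's tasklet API):

```c
#include <assert.h>
#include <stddef.h>

/* Instead of waking the stream tasklet immediately from the notify
 * path, remember it in a next_tasklet pointer and run it when leaving
 * the I/O callback, then reset the pointer. */
struct fake_tasklet { int runs; };

static struct fake_tasklet *next_tasklet;

static void notify_recv(struct fake_tasklet *t)
{
    next_tasklet = t;            /* deferred until the current cb returns */
}

static void io_cb_leave(void)
{
    if (next_tasklet) {
        next_tasklet->runs++;    /* run the deferred tasklet right after us */
        next_tasklet = NULL;     /* reset when leaving the function */
    }
}
```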

Another test with 40 concurrent connections, transferring 40k 1MB objects
at different concurrency levels from 1 to 80 also showed a 21% drop in L2
cache misses, and a 2% perf improvement:

Before:
   329,510,887,528  instructions
    50,907,966,181  branches
       843,515,912  branch-misses
     2,753,360,222  cache-misses
    19,306,172,474  L1-icache-load-misses
    17,321,132,742  L1-dcache-load-misses
       951,787,350  LLC-load-misses

      44.660469000 seconds user
      62.459354000 seconds sys

   => avg perf: 373 MB/s

After:
   331,310,219,157  instructions
    51,343,396,257  branches
       851,567,572  branch-misses
     2,183,369,149  cache-misses
    19,129,827,134  L1-icache-load-misses
    17,441,877,512  L1-dcache-load-misses
       906,923,115  LLC-load-misses

      42.795458000 seconds user
      62.277983000 seconds sys

   => avg perf: 380 MB/s

With small requests, it's the L1 and L3 cache misses which reduced by
3% and 7% respectively, and the performance went up by 3%.
2024-10-12 17:17:51 +02:00
Willy Tarreau
04ce6536e1 OPTIM: mux-h2: try to continue reading after demuxing when useful
When we stop demuxing in the middle of a frame, we know that there are
other data following. The demux buffer is small and unique, but now we
have rxbufs, so after h2_process_demux() is left, the dbuf is almost
empty and has room to be delivered into another rxbuf.

Let's implement a short loop with a counter and a few conditions around
the demux call. We limit the number of turns to the number of available
rxbufs and no more than 12, since it shows good performance, and the
wakeup is only called once. This has shown a nice 12-20% bandwidth gain
on backend-side H2 transferring 1MB-large objects, and does not affect
the rest (headers, control etc). The number of wakeup calls was divided
by 5 to 8, which is also a nice improvement. The counter is limited to
make sure we don't add processing latency. Tests were run to find the
optimal limit, and it turns out that 16 is just slightly better, but not
worth the +33% increase in peak processing latency.
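The bounded loop described above can be sketched like this (demux_once() is a stand-in for one h2_process_demux() pass):

```c
#include <assert.h>

/* Keep demuxing while progress is made, bounded by the number of
 * available rxbufs and a hard cap of 12 turns so as not to add
 * processing latency. Returns the number of passes performed. */
static int demux_loop(int avail_rxbufs, int (*demux_once)(void *), void *ctx)
{
    int turns = avail_rxbufs < 12 ? avail_rxbufs : 12;
    int done = 0;

    while (turns-- > 0 && demux_once(ctx))
        done++;
    return done;
}

/* stand-in demuxer that always reports more pending data */
static int always_more(void *ctx) { (void)ctx; return 1; }
```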

The h2_process_demux() function just doesn't call the wakeup function
anymore, and solely focuses on transferring from dbuf to rxbuf.

Practical measurement: test with h2load producing 4 concurrent connections
with 10 concurrent streams each, downloading 1MB objects (20k total) via
two layers of haproxy stacked, reaching httpterm over H1 (numbers are total
for the 2 h2 front and 1 h2 back). All on a single thread.

Before: 549-553 MB/s (on h2load)
  function    calls  cpu_tot  cpu_avg
  h2_io_cb  2562340  8.157s   3.183us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
  h2_io_cb    30109  840.9ms  27.93us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb    16105  106.4ms  6.607us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb        1  11.75us  11.75us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb  2608555  9.104s   3.490us --total--

  perf stat:
   153,117,996,214 instructions                             (71.41%)
    22,919,659,027 branches       # 14.97% of inst          (71.41%)
       384,009,600 branch-misses  #  1.68% of all branches  (71.42%)
        44,052,220 cache-misses          # 1 inst / 3476    (71.44%)
     9,819,232,047 L1-icache-load-misses # 6.4% of inst     (71.45%)
     8,426,410,306 L1-dcache-load-misses # 5.5% of inst     (57.15%)
        10,951,949 LLC-load-misses       # 1 inst / 13982   (57.13%)

      12.372600000 seconds user
      23.629506000 seconds sys

After: 660 MB/s (+20%)
  function    calls  cpu_tot  cpu_avg
  h2_io_cb   244502  4.410s   18.04us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
  h2_io_cb    42107  1.062s   25.22us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb    13703  106.3ms  7.758us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb        1  13.74us  13.74us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb   300313  5.578s   18.57us --total--

  perf stat:
   126,840,441,876 instructions                             (71.40%)
    17,576,059,236 branches       # 13.86% of inst          (71.40%)
       274,136,753 branch-misses  #  1.56% of all branches  (71.42%)
        30,413,562 cache-misses          # 1 inst / 4170    (71.45%)
     6,665,036,203 L1-icache-load-misses # 5.25% of inst    (71.46%)
     7,519,037,097 L1-dcache-load-misses # 5.9% of inst     (57.15%)
         6,702,411 LLC-load-misses       # 1 inst / 18925   (57.12%)

      10.490097000 seconds user
      19.212515000 seconds sys

It's also interesting to see that less total time is spent in these
functions, clearly indicating that the cost of interrupted processing,
and the extraneous cache misses come into play at some point. Indeed,
after the change, the number of instructions went down by 17.2%, while
the L2 cache misses dropped by 31% and the L3 cache misses by 39%!
2024-10-12 16:38:36 +02:00
Willy Tarreau
9fbc01710a OPTIM: mux-h2: make h2_send() report more accurate wake up conditions
h2_send() used to report non-zero every time any data were sent, and
this was used from h2_snd_buf() or h2_done_ff() to trigger a wakeup
which could possibly do nothing. This wakeup is now restricted to either
a successful send() combined with the ability to demux, or an error.

Doing this makes the number of h2_io_cb() wakeups drop from 422k to
245k for 1000 1MB objects delivered over 100 streams between two H2
proxies, without any behavior change nor performance change. In
practice, most send() calls do not result in a wakeup anymore but
synchronous errors still do.
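The new condition reduces to a one-line predicate (a sketch, with hypothetical parameter names):

```c
#include <assert.h>

/* h2_send() now reports a wakeup-worthy result only when data was
 * actually sent AND demuxing can make progress, or on error, rather
 * than whenever anything was sent. */
static int send_needs_wakeup(int sent, int can_demux, int error)
{
    return (sent && can_demux) || error;
}
```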

A local test downloading 10k 1MB objects from an H1 server with a single
connection shows this change:

   before      after    caller
   1547        1467     h2_process_demux()
   2138           0     h2_done_ff()       <---
     38        1453     ssl_sock_io_cb()   <---
     18           0     h2_snd_buf()
      1           1     h2_init()
   3742        2921     -- total --

In practice the ssl_sock_io_cb() wakeups are those notifying about
SUB_RETRY_RECV, which are not accounted for when h2_done_ff() performs
the wakeup because the tasklet is already queued (a counter placed
there shows that it's nonetheless called). So there's no transfer and
h2_done_ff() was only hiding the other one.

Another test involving 4 connections with 10 concurrent streams each
and 20000 1MB objects total shows a total disparition of the wakeups
from h2_snd_buf and h2_done_ff, which used to account together for
50% of the wakeups, resulting in effectively halving the number of
wakeups which, based on their avg process time, were not doing
anything:

Before:
  function   calls     cpu_tot   cpu_avg
  h2_io_cb   2571208   7.406s    2.880us <- h2c_restart_reading@src/mux_h2.c:940 tasklet_wakeup
  h2_io_cb   2536949   251.4ms   99.00ns <- h2_snd_buf@src/mux_h2.c:7573 tasklet_wakeup ###
  h2_io_cb     41100   5.622ms   136.0ns <- h2_done_ff@src/mux_h2.c:7779 tasklet_wakeup ###
  h2_io_cb     38979   852.8ms   21.88us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb     12519   90.28ms   7.211us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb         1   13.81us   13.81us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb   5200756   8.606s    1.654us --total--

After:
  h2_io_cb   2562340   8.157s    3.183us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
  h2_io_cb     30109   840.9ms   27.93us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb     16105   106.4ms   6.607us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb         1   11.75us   11.75us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb   2608555   9.104s    3.490us --total--
2024-10-12 16:38:36 +02:00
Willy Tarreau
633c41c621 MEDIUM: mux-h2: rework h2_restart_reading() to differentiate recv and demux
From the beginning, h2_restart_reading() has always been confusing because
it decides whether or not to wake the tasklet handler up or not. This
tasklet handler does two things, one is receiving from the socket to the
demux buf, and one is demuxing from the demux buf to the streams' rxbufs.

The conditions are governed by h2_recv_allowed(), which is also called at
a few places to decide whether or not to actually receive from the socket.
It starts to be visible that this leaves some difficulties regarding what
to do with possibly pending data.

In 2.0 with commit 3ca18bf0b ("BUG/MEDIUM: h2: Don't attempt to recv from
h2_process_demux if we subscribed."), we even had to address a special
case where it was possible to wake up endlessly because the conditions
relied on the demux buffer's contents; the solution consisted in passing
a flag to decide whether or not to consider the buffer's contents.

In 2.5 commit b5f7b5296 ("BUG/MEDIUM: mux-h2: Handle remaining read0 cases
on partial frames") introduced a new flag H2_CF_DEM_SHORT_READ which
indicates that the demux had to stop in the middle of a frame and cannot
make progress without more data. More adaptations later came in based on
this but this actually reflected exactly what was needed to solve this
painful situation: a state indicating whether to receive or parse.

Now's about time to definitely address this by reworking h2_restart_reading()
to check two completely independent things:
  - the ability to receive more data into the demux buffer, which is
    based on its allocation/fill state and the socket's errors
  - the ability to demux such data, which is based on the presence of
    enough data (i.e. no stuck short read), and ability to find an rx
    buf to continue the processing.

Now the conditions are much more understandable, and it's also visible
that the consider_buffer argument, whose value was not trivial for
callers, is not used anymore.

Tests stacking two layers of H2 show strictly no change to the wakeup
cause distributions nor counts.
2024-10-12 16:38:36 +02:00
Willy Tarreau
e057f8367c DOC: design-thoughts: add diagrams illustrating an rx win growth
Let's just see on a diagram how the receiver can detect that the
window is large enough for the remote sender to fill the link. Here
it seems that a first criterion is that data are accumulating in
the rxbuf, indicating that the next hop doesn't consume them fast
enough. On the diagram it's visible when blue arrows (incoming data)
are more frequent than the magenta ones on average (outgoing data),
which happens when silence moments are less frequent and don't allow
the reader to catch up. It's also visible that there are two phases
alternating in the transfer:
  - measure round trip time (i.e. how long it takes to restart
    sending after a WU was sent after a long silence)

  - measure the lowest rxbuf size during the previous round trip

It's worth noting that a window size change only has *observable* effect
after two RTT: the first RTT is to restart sending (opening or enlarging
the window), the second RTT to measure the lowest rxbuf size over the
period.

By turning the advertised window into an offset and comparing it to
the received quantity, it's possible to measure the RTT of the whole
chain (including the client possibly producing the data). Note that
when multiple streams compete for BW this can become tricky. Limiting
the window to available buffers and counting the number of sending
streams on a connection could work (i.e. split total buffers into
1+#senders, first one being used for tx).
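The buffer-split idea at the end of the note could be sketched as (a hypothetical helper, not code from the tree):

```c
#include <assert.h>
#include <stddef.h>

/* Split the total buffer count into 1 + #senders shares, the first one
 * being reserved for tx, and derive a per-stream window from one share. */
static size_t per_sender_window(size_t total_bufs, size_t bufsize,
                                size_t nb_senders)
{
    if (!nb_senders)
        return 0;
    return (total_bufs / (1 + nb_senders)) * bufsize;
}
```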
2024-10-12 16:38:36 +02:00
Willy Tarreau
0fd66703c2 MEDIUM: mux-h2: change the default initial window to 16kB
Now that we're using all available rx buffers for transfers, there's
no point anymore in advertising more than the minimum value we can
safely buffer. Let's be conservative and only rely on the dynamic
buffers to improve speed beyond the configured value, and make sure
that many streams will no longer cause unfairness.

Interestingly, the total number of wakeups has further shrunk down, but
with a different distribution. From 128k for 1000 1M transfers, it went
down to 119k, with 96k from restart_reading, 10k from done_ff and 2.6k
from snd_buf. done_ff went up by 30% and restart_reading went down by
30%.
2024-10-12 16:38:26 +02:00
Willy Tarreau
1ed9d37c88 MINOR: mux-h2: add tune.h2.be.rxbuf and tune.h2.fe.rxbuf global settings
These settings allow changing the total buffer size allocated to the
backend and frontend respectively. This way it's no longer necessary to
play with tune.bufsize nor to increase the number of streams to benefit
from more buffers.

Setting tune.h2.fe.rxbuf to 4m to match a sender's max tcp_wmem resulted
in 257 Mbps for a single stream at 103ms vs 121 Mbps default (or 5.1 Mbps
with a single buffer and 64kB window).
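Assuming these settings follow the usual global tune.* syntax, a configuration using them might look like this (the backend value is purely illustrative):

```
global
    tune.h2.fe.rxbuf 4m
    tune.h2.be.rxbuf 2m
```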
2024-10-12 16:29:16 +02:00
Willy Tarreau
e018d9a0cf MAJOR: mux-h2: make the rxbuf allocation algorithm a bit smarter
Without using bandwidth estimates, we can already use up to the number
of allocatable rxbufs and share them evenly between receiving streams.
In practice we reserve one buffer for any non-receiving stream, plus
1 per 8 possible new streams, and divide the rest between the number
of receiving streams.

Finally, for front streams, this is rounded up to the buffer size while
for back streams we round it down. The rationale here is that front to
back is very fast to flush and slow to refill so we want to optimise
upload bandwidth regardless of the number of streams, while it's the
opposite in the other way so we try to minimize HoL.
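The sharing rule described above can be sketched as (all names and the exact arithmetic are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Reserve one buffer per non-receiving stream plus one per 8 possible
 * new streams, then divide the rest between receiving streams, rounding
 * up on the frontend side and down on the backend side. */
static size_t bufs_per_recv_stream(size_t total, size_t other_streams,
                                   size_t possible_new, size_t recv_streams,
                                   int is_frontend)
{
    size_t reserved = other_streams + (possible_new + 7) / 8;
    size_t avail = total > reserved ? total - reserved : 0;

    if (!recv_streams)
        return 0;
    return is_frontend ? (avail + recv_streams - 1) / recv_streams
                       : avail / recv_streams;
}
```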

That shows good results with a single stream being able to send at 121
Mbps at 103ms using 1.4 MB buffer with default settings, or 8 streams
sharing the bandwidth at 180kB each. Previously the limit was approx
5.1 Mbps per stream.

It also enables better sharing of backend connections: a slow (100 Mbps)
and a fast (1 Gbps) clients were both downloading 2 100MB files each over
a shared H2 connection. The fast one used to show 6.86 to 20.74s with an
avg of 11.45s and an stddev of 5.81s before the patch, and went to a
much more respectable 6.82 to 7.73s with 7.08s avg and 0.336s stddev.

We don't try to increase the window past the remaining content length.
First, this is pointless (though harmless), but in addition it causes
needless emission of WINDOW_UPDATE frames on small uploads that are
smaller than a window, and beyond being useless, it upsets vtest which
expects an RST on some tests. The scheduling is not reliable enough to
insert an expect for a window update first, so in the end, with that
extra check, we save a few useless frames on small uploads and please
vtest.

A new setting should be added to allow increasing the number of buffers
without having to change the number of streams. At this point it's not
done.
2024-10-12 16:29:16 +02:00
Willy Tarreau
3816c38601 MAJOR: mux-h2: permit a stream to allocate as many buffers as desired
Now we don't enforce allocation limits in h2s_get_rxbuf(), since there
is no benefit in not processing pending data, it would still cause HoL
for no saving. The only reason for not allocating is if there are no
buffers available for the connection.

In theory this should not change anything except that it exercises code
paths that support reallocating multiple buffers, which could possibly
uncover a sleeping bug. This is why it's placed in a separate commit.

And one observation worth noting is that it almost cut in half the number
of iocb wakeups: for 1000 1MB transfers over 100 concurrent streams of a
single connection, we used to observe 208k wakeups (110k from
restart_reading, 80k from snd_buf, 11k from done_ff), and now we're
observing 128k (113k from restart_reading, 2.4k from snd_buf, 6.9k from
done_ff), which seems to indicate that pretty often the demuxing was
blocked on a full buffer due to the default advertised window of 64k.
2024-10-12 16:29:16 +02:00
Willy Tarreau
4eb3ff1d3b MAJOR: mux-h2: make streams use the connection's buffers
For now it seems to work as before, and even when artificially inflating
the number of allocatable buffers per stream. The number of allocated
slots is always the same as the max number of streams, which guarantees
that each stream will find one buffer. we only grant one buffer per
stream at this point, since the goal was to replace the existing single
rxbuf.

A new demux blocking flag, H2_CF_DEM_RXBUF, was added to indicate
a failure to get an rxbuf slot from the connection. It was lightly
tested (by forcing bl_init() to a lower number of buffers). It is not
yet certain whether it's more useful to have a new flag or to reuse
the existing H2_CF_DEM_SFULL which indicates the rxbuf is full,
but at least the new flag more accurately translates the condition,
that may make a difference in the future. However, given that when
RXBUF is set, most of the time it results in a failure to find more
room to demux and it sets SFULL, for now we have to always clear
SFULL when clearing RXBUF as well. This means that most of the time
we'll see 3 combinations:
  - none: everything's OK
  - SFULL: the unique rx buffer is full
  - RXBUF || (RXBUF|SFULL): cannot allocate more entries

Note that we need to be super careful in h2_frt_transfer_data() because
the htx_free_data_space() function doesn't guarantee that the room is
usable, so htx_add_data() may still fail despite apparent room. For
this reason, h2_frt_transfer_data() maintains a "full" flag to indicate
that a transfer attempt failed and that a new buffer is required.
2024-10-12 16:29:16 +02:00
Willy Tarreau
6279cbc9e9 MINOR: mux-h2: clear up H2_CF_DEM_DFULL and H2_CF_DEM_SHORT_READ ambiguity
Since commit 485da0b05 ("BUG/MEDIUM: mux_h2: Handle others remaining
read0 cases on partial frames"), H2_CF_DEM_SHORT_READ is set when there
is no blocking flags. However, it checks H2_CF_DEM_BLOCK_ANY which does
not include H2_CF_DEM_DFULL. This results in many cases where both
H2_CF_DEM_DFULL and H2_CF_DEM_SHORT_READ are set together, which makes
no sense, since one says the demux buffer is full while the other one
says an incomplete read was done. This makes it impossible to properly
decide whether to restart reading or processing.

Let's make sure to clear DFULL in h2_process_demux() whenever we
consume incoming data from the dbuf, and check for DFULL before
setting SHORT_READ.
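The flag update can be sketched as (flag values are illustrative, not HAProxy's actual bit assignments):

```c
#include <assert.h>

#define H2_CF_DEM_DFULL      0x1u  /* demux buffer full; value illustrative */
#define H2_CF_DEM_SHORT_READ 0x2u  /* incomplete frame, need more data */

/* Clear DFULL whenever data was consumed from the demux buffer, and
 * only mark a short read when the buffer is not full, so the two flags
 * can no longer be set together. */
static unsigned demux_update_flags(unsigned flags, int consumed, int incomplete)
{
    if (consumed)
        flags &= ~H2_CF_DEM_DFULL;
    if (incomplete && !(flags & H2_CF_DEM_DFULL))
        flags |= H2_CF_DEM_SHORT_READ;
    return flags;
}
```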

This could probably be considered as a bug fix but it's hard to say if
it has any impact on the current code, probably at worst it might cause
a few useless wakeups, so until there's any proof that it needs to be
backported, better not do it.
2024-10-12 16:29:16 +02:00
Willy Tarreau
b74bedf157 MINOR: mux-h2: simplify the wake up code in h2_rcv_buf()
The code used to decide when to restart reading is far from being trivial
and will cause trouble after the forthcoming changes: it checks if the
current stream is the same that is being demuxed, and only if so, wakes
the demux to restart reading. Once streams will start to use multiple
buffers, this condition will make no sense anymore. Actually the real
reason is split into two steps:
  - detect if the demux is currently blocked on the current stream, and
    if so remove SFULL
  - detect if any demux blocking flags were removed during the operations,
    and if so, wake demuxing.
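The two steps might be sketched like this (flag value and names are illustrative):

```c
#include <assert.h>

#define H2_CF_DEM_SFULL 0x1u  /* demux blocked: stream rxbuf full; value illustrative */

/* First clear SFULL if the demux was blocked on this very stream, then
 * report whether any demux blocking flags were removed, in which case
 * the caller should wake demuxing. Returns the updated flags. */
static unsigned rcv_buf_update(unsigned dem_flags, int blocked_on_this_stream,
                               int *wake_demux)
{
    unsigned old_flags = dem_flags;

    if (blocked_on_this_stream)
        dem_flags &= ~H2_CF_DEM_SFULL;
    *wake_demux = (old_flags & ~dem_flags) != 0;
    return dem_flags;
}
```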

For now this doesn't change anything.
2024-10-12 16:29:16 +02:00
Willy Tarreau
a0ed92f3dd MINOR: mux-h2: simplify the exit code in h2_rcv_buf()
The code used to decide what to tell to the upper layer and when to free
the rxbuf is a bit convoluted and difficult to adapt to dynamic rxbufs.
We first need to deal with memory management (b_free) and only then to
decide what to report upwards. Right now it does it the other way around.

This should not change anything.
2024-10-12 16:29:16 +02:00
Willy Tarreau
3b5ac2b553 MINOR: mux-h2: move H2_CF_WAIT_IN_LIST flag away from the demux flags
It's not convenient to have this flag in the middle of the demux flags,
it easily hides other ones that need to be added. Let's move it after
the other ones.
2024-10-12 16:29:16 +02:00
Willy Tarreau
8cf418811d MINOR: mux-h2: add rxbuf head/tail/count management for h2s
Now the h2s get their rx_head, rx_tail and rx_count associated with the
shared rxbufs. A few functions are provided to manipulate all this:
essentially allocating/releasing a buffer for the stream, returning a
buffer pointer to the head/tail, counting the buffers allocated to the
stream, and reporting whether a stream may still allocate.

For now this code is not used.
2024-10-12 16:29:16 +02:00
Willy Tarreau
a891534bfd MINOR: mux-h2: allocate the array of shared rx bufs in the h2c
In preparation for having a shared list of rx bufs, we're now allocating
the array of shared rx bufs in the h2c. The pool is created at the max
size between the front and back max streams for now, and the array is not
used yet.
2024-10-12 16:29:16 +02:00
Willy Tarreau
721ea5b06c MINOR: mux-h2: count within a connection, how many streams are receiving data
A stream is receiving data from the HEADERS frame that lacks END_STREAM
until the end of the stream or HREM (the arrival of END_STREAM). We're now
adding a flag to the stream that indicates this state, as well as a counter
in the connection of streams currently receiving data. The purpose will be
to gauge at any instant the number of streams that might have to share the
available bandwidth and buffers count in order not to allocate too much flow
control to any single stream. For now the counter is kept up to date, and is
reported in "show fd".
2024-10-12 16:29:16 +02:00
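The flag/counter pairing described above boils down to keeping a per-connection count in sync with a per-stream state bit. A minimal sketch (struct and function names are hypothetical, not HAProxy's):

```c
#include <assert.h>

/* mini-model: a per-stream flag plus a per-connection counter, kept in
 * sync when a stream enters/leaves the "receiving data" state */
struct conn_model { int nb_rx_streams; };
struct strm_model { int expect_rxdata; };

static void strm_enter_rx(struct conn_model *c, struct strm_model *s)
{
    if (!s->expect_rxdata) {   /* HEADERS seen without END_STREAM */
        s->expect_rxdata = 1;
        c->nb_rx_streams++;
    }
}

static void strm_leave_rx(struct conn_model *c, struct strm_model *s)
{
    if (s->expect_rxdata) {    /* END_STREAM (HREM) or stream close */
        s->expect_rxdata = 0;
        c->nb_rx_streams--;
    }
}
```

Guarding both transitions on the flag makes the updates idempotent, so the counter cannot drift if a state change is reported twice.
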
Willy Tarreau
c9275084bc MEDIUM: mux-h2: start to introduce the window size in the offset calculation
Instead of incrementing the last_max_ofs by the amount of received bytes,
we now start from the new current offset to which we add the static window
size. The result is exactly the same but it prepares the code to use a
window size combined with an offset instead of just refilling the budget
from what was received.

It was even verified that changing h2_fe_settings_initial_window_size in
the middle of a transfer using gdb does indeed allow the transfer speed
to adapt accordingly.
2024-10-12 16:29:16 +02:00
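The equivalence between the two calculations can be sketched like this (function names are illustrative, not the actual code):

```c
#include <assert.h>
#include <stdint.h>

/* old scheme: refill the advertised limit with what was just received */
static uint64_t next_limit_budget(uint64_t last_max_ofs, uint64_t rcvd)
{
    return last_max_ofs + rcvd;
}

/* new scheme: start from the current rx offset and add the window size */
static uint64_t next_limit_offset(uint64_t curr_rx_ofs, uint64_t window)
{
    return curr_rx_ofs + window;
}
```

With a constant window both yield the same limit, but the offset form lets a mid-transfer window change (as in the gdb experiment above) take effect immediately.
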
Willy Tarreau
1cc851d9f2 MEDIUM: mux-h2: start to update stream when sending WU
The rationale here is that we don't absolutely need to update the
stream offset live, there's already the rcvd_s counter to remind
us we've received data. So we can continue to exploit the current
check points for this.

Now we know that rcvd_s indicates the amount of newly received bytes
for the stream since last call to h2c_send_strm_wu() so we can update
our stream offsets within that function. The wu_s counter is set to
the difference between next_adv_ofs and last_adv_ofs, which are
resynchronized once the frame is sent.

If the stream suddenly disappears with unacked data (aborted upload),
the presence of the last update in h2c->wu_s is sufficient to let the
connection ack the data alone, and upon subsequent calls with new
rcvd_s, the received counter will be used to ack, like before. We
don't need to do more anyway since the goal is to let the client
abort ASAP when it gets an RST.

At this point, the stream knows its current rx offset, the computed
max offset and the last advertised one.
2024-10-12 16:29:16 +02:00
Willy Tarreau
eb0fe66c61 MINOR: mux-h2: create and initialize an rx offset per stream
In H2, everything is accounted as budget. But if we want to moderate
the rcv window that's not very convenient, and we'd rather have offsets
instead so that we know where we are in the stream. Let's first add
the fields to the struct and initialize them. The curr_rx_ofs indicates
the position in the stream where next incoming bytes will be stored.
last_adv_ofs tells what's the offset that was last advertised as the
window limit, and next_max_ofs is the one that will need to be
advertised, which is curr_rx_ofs plus the current window. next_max_ofs
will have to cause a WINDOW_UPDATE to be emitted when it's higher than
last_adv_ofs, and once the WU is sent, its value will have to be copied
over last_adv_ofs.

The problem is, for now wherever we emit a stream WU, we have no notion
of stream (the stream might even not exist anymore, e.g. after aborting
an upload), because we currently keep a counter of stream window to be
acked for the current stream ID (h2c->dsi) in the connection (rcvd_s).
Similarly there are a few places early in the frame header processing
where rcvd_s is incremented without knowing the stream yet. Thus, lookups
will be needed for that, unless such a connection-level counter remains
used and poured into the stream's count once known (delicate).

Thus for now this commit only creates the fields and initializes them.
2024-10-12 16:29:15 +02:00
Willy Tarreau
560e474cdd MINOR: mux-h2: split the amount of rx data from the amount to ack
We'll need to keep track of the total amount of data received for the
current stream, and the amount of data to ack for the current stream,
which might soon diverge as soon as we'll have to update the stream's
offset with received data, which are different from those to be ACKed.
One reason is that in case a stream doesn't exist anymore (e.g. an
aborted upload), the rcvd_s info might get lost after updating the stream,
so we do need to have an in-connection counter for that.

What's done here is that the rcvd_s count is transferred to wu_s in
h2c_send_strm_wu(), to be used as the counter to send, and both are
considered as sufficient when non-null to call the function.
2024-10-12 16:29:15 +02:00
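The counter transfer described above can be modeled as below; the struct and function are a simplified sketch of h2c_send_strm_wu()'s bookkeeping, not the actual implementation:

```c
#include <assert.h>
#include <stdint.h>

/* model of the two connection-level counters */
struct h2c_model {
    uint64_t rcvd_s;  /* newly received bytes for the current stream */
    uint64_t wu_s;    /* bytes to advertise in the next stream WINDOW_UPDATE */
};

/* sketch: transfer the received count to the to-be-acked count, build
 * the WINDOW_UPDATE from wu_s, then reset it once the frame is sent */
static uint64_t send_strm_wu(struct h2c_model *c)
{
    uint64_t acked;

    c->wu_s += c->rcvd_s;
    c->rcvd_s = 0;
    acked = c->wu_s;
    c->wu_s = 0;      /* assume the frame was successfully emitted */
    return acked;
}
```

Keeping wu_s separate is what allows the connection to ack data for a stream that has since disappeared: the pending amount survives in wu_s even when rcvd_s can no longer be attributed to a live stream.
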
Willy Tarreau
8f09bdce10 MINOR: buffer: add a buffer list type with functions
The buffer ring is problematic in multiple aspects, one of which being
that it is only usable by one entity. With multiplexed protocols, we need
to have shared buffers used by many entities (streams and connection),
and the only way to use the buffer ring model in this case is to have
each entity store its own array, and keep a shared counter on allocated
entries. But even with the default 32 buf and 100 streams per HTTP/2
connection, we're speaking about 32*101*32 bytes = 103424 bytes per H2
connection, just to store up to 32 shared buffers, spread randomly in
these tables. Some users might want to achieve much higher than default
rates over high speed links (e.g. 30-50 MB/s at 100ms), which is 3 to 5
MB storage per connection, hence 180 to 300 buffers. There it starts to
cost a lot, up to 1 MB per connection, just to store buffer indexes.

Instead this patch introduces a variant which we call a buffer list.
That's basically just a free list encoded in an array. Each cell
contains a buffer structure, a next index, and a few flags. The index
could be reduced to 16 bits if needed, in order to make room for a new
struct member. The design permits initializing a whole freelist at once
using memset(0).

The list pointer is stored at a single location (e.g. the connection)
and all users (the streams) will just have indexes referencing their
first and last assigned entries (head and tail). This means that with
a single table we can now have all our buffers shared between multiple
streams, irrespective of the number of potential streams that would want
to use them. Now the 180 to 300 entries array only costs 7.2 to 12 kB,
or 80 times less.

Two large functions (bl_deinit() & bl_get()) were implemented in buf.c.
A basic doc was added to explain how it works.
2024-10-12 16:29:15 +02:00
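The core idea (a free list encoded in an array, with per-user head/tail indexes) can be sketched as follows. This is a deliberately simplified model with hypothetical names; the real bl_* API in buf.c differs in details such as flags and the memset(0) initialization trick.

```c
#include <assert.h>

#define BL_SIZE 8  /* total cells; index 0 means "no cell" / end of list */

struct bl_cell {
    int next;      /* next cell in the free list or in a user's chain */
};

struct bl {
    struct bl_cell c[BL_SIZE];
    int free_head;
};

static void bl_init(struct bl *l)
{
    int i;

    for (i = 1; i < BL_SIZE - 1; i++)
        l->c[i].next = i + 1;
    l->c[BL_SIZE - 1].next = 0;
    l->free_head = 1;
}

/* allocate one cell and append it to a user's chain; the user (e.g. a
 * stream) only stores its head and tail indexes. Returns the index, 0
 * if no free cell remains. */
static int bl_get(struct bl *l, int *head, int *tail)
{
    int idx = l->free_head;

    if (!idx)
        return 0;
    l->free_head = l->c[idx].next;
    l->c[idx].next = 0;
    if (!*head)
        *head = idx;               /* first cell of this user */
    else
        l->c[*tail].next = idx;    /* append after current tail */
    *tail = idx;
    return idx;
}

/* release a whole user's chain back to the free list in O(1) */
static void bl_put_all(struct bl *l, int *head, int *tail)
{
    if (!*head)
        return;
    l->c[*tail].next = l->free_head;
    l->free_head = *head;
    *head = *tail = 0;
}
```

A single such array holds cells for every stream of the connection, which is how the per-stream index tables (and their quadratic cost) disappear.
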
Willy Tarreau
ac66df4e2e REORG: buffers: move some of the heavy functions from buf.h to buf.c
Over time, some of the buffer management functions grew quite a bit,
and were still forced to remain inlined since all defined in buf.h.
Let's create buf.c and move the heaviest ones there. All those moved
here were above 200 bytes.
2024-10-12 16:29:15 +02:00
Willy Tarreau
d288ddb575 CLEANUP: muxes: remove useless inclusion of ebmbtree.h
Since 2.7 with commit 8522348482 ("BUG/MAJOR: conn-idle: fix hash indexing
issues on idle conns"), we've been using eb64 trees and not ebmb trees
anymore, and later we dropped all that to centralize the operations in
the server. Let's remove the ebmbtree.h includes from the muxes that do
not use them.
2024-10-12 16:29:15 +02:00
Willy Tarreau
cf3fe1eed4 MINOR: mux-h2/traces: print the size of the DATA frames
DATA frames produce a special trace with the amount of transferred data
in arg4, but this was not reported by h2_trace(). This commit just adds
it.
2024-10-12 16:29:15 +02:00
Willy Tarreau
af064b497a BUG/MINOR: mux-h2/traces: present the correct buffer for trailers errors traces
The local "rxbuf" buffer was passed to the trace instead of h2s->rxbuf
that is used when decoding trailers. The impact is essentially the
impossibility to present some buffer contents in some rare cases. It
may be backported but it's unlikely that anyone will ever notice the
difference.
2024-10-12 16:29:15 +02:00
Willy Tarreau
0fa654ca92 BUILD: cache: silence an uninitialized warning at -Og with gcc-12.2
Building with gcc-12.2 -Og yields this incorrect warning in cache.c:

  In function 'release_entry_unlocked',
      inlined from 'http_action_store_cache' at src/cache.c:1449:4:
  src/cache.c:330:9: warning: 'object' may be used uninitialized [-Wmaybe-uninitialized]
    330 |         release_entry(cache, entry, 1);
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  src/cache.c: In function 'http_action_store_cache':
  src/cache.c:1200:29: note: 'object' was declared here
   1200 |         struct cache_entry *object, *old;
        |                             ^~~~~~

This is wrong: the only way to reach the function is with first!=NULL
and the gotos that reach there are all those made with first==NULL.
Let's just preset object to NULL to silence it.
2024-10-12 16:28:54 +02:00
William Lallemand
edf85a1d76 MINOR: cfgparse: simulate long configuration parsing with force-cfg-parser-pause
This command pauses the configuration parser for <timeout>
milliseconds. This is useful for development or for testing timeouts of
init scripts, particularly to simulate a very long reload. It requires
expose-experimental-directives to be set.
2024-10-11 17:40:37 +02:00
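A usage sketch could look like the fragment below. Note that the exact placement (global section) is an assumption; the commit only states the keyword and its <timeout> argument in milliseconds:

```
global
    expose-experimental-directives
    # pause the configuration parser for 5 seconds to simulate a slow reload
    force-cfg-parser-pause 5000
```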
Amaury Denoyelle
232083c3e5 BUG/MEDIUM: mux-quic: ensure timeout server is active for short requests
If a small request is received on QUIC MUX frontend, it can be
transmitted directly with the FIN on attach operation. rcv_buf is
skipped by the stream layer. Thus, it is necessary to ensure that there
is similar behavior when FIN is reported either on attach or rcv_buf.

One difference was that se_expect_data() was called only for rcv_buf but
not on attach. This most obvious effect is that stream timeout was
deactivated for this request : client timeout was disabled on EOI but
server one not armed due to previous se_expect_no_data(). This prevents
the early closure of too long requests.

To fix this, add an invocation of se_expect_data() on attach operation.

This bug can simply be detected using httpterm with a delayed request (for
example /?t=10000) and using smaller client/server timeouts. The bug is
present if the request is not aborted on timeout but instead continue
until its proper HTTP 200 termination.

This has been introduced by the following commit :
  85eabfbf67
  MEDIUM: mux-quic: Don't expect data from server as long as request is unfinished

This must be backported up to 2.8.
2024-10-10 17:20:39 +02:00
Aurelien DARRAGON
7144e60cd2 MINOR: sample: postresolve sink names in debug() converter
debug() converter used to resolve sink names during parsing time. Because
of this, we were unable to specify sink names that were defined after
the debug() converter was placed.

Like in the previous commit, let's implement proper postparsing for the
debug() converter, in order to be able to use sink names that are about
to be defined later in the config file.
2024-10-10 16:55:15 +02:00
Aurelien DARRAGON
ed266589b6 MINOR: trace: postresolve sink names
A previous known limitation about traces was that parsing was performed on
the fly, meaning that when using "sink" keyword, only sinks that were
either internal or previously defined in the config could be used. Indeed,
it was not possible to use a ring section defined AFTER the traces section
when using the 'sink' keyword from traces.

This limitation was also mentioned in the config file.

Let's get rid of that limitation by implementing proper postparsing for
the sink parameter in traces section. To do this, make use of the new
sink_find_early() helper to start referencing sink by their names even
if they don't exist yet (if they are about to be defined later in the
config)

Traces commands on the cli are not concerned by this change.
2024-10-10 16:55:15 +02:00
Aurelien DARRAGON
1bdf6e884a MEDIUM: sink: implement sink_find_early()
sink_find_early() is a convenient function that can be used instead of
sink_find() during parsing time in order to try to find a matching
sink even if the sink is not defined yet.

Indeed, if the sink is not defined, sink_find_early() will try to create
it and mark it as forward-declared. It will also save information from
the caller to better identify it in case of errors.

If the sink happens to be found in the config, it will transition from
forward-declared type to its final type. Else, it means that the sink
was not found in the config, in this case, during postresolve, we raise
an error to indicate that the sink was not found in the configuration.

It should help solve postresolving issues with rings, because for now only
log targets implement proper ring postresolving, but rings may be used
at different places in the code, such as debug() converter or in "traces"
section.
2024-10-10 16:55:15 +02:00
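The find-or-forward-declare pattern described above can be modeled as below. All names here are illustrative (a simplified list instead of HAProxy's actual sink registry), but the lifecycle is the one the commit describes: lookup, placeholder creation, later upgrade, and a postresolve error for anything left forward-declared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum sink_type { SINK_FWD_DECLARED, SINK_DEFINED };

struct sink {
    char name[32];
    enum sink_type type;
    struct sink *next;
};

static struct sink *sinks;  /* global registry of known sinks */

static struct sink *sink_find(const char *name)
{
    struct sink *s;

    for (s = sinks; s; s = s->next)
        if (!strcmp(s->name, name))
            return s;
    return NULL;
}

/* like sink_find(), but creates a forward-declared placeholder when the
 * sink is not known yet; a later "ring" section upgrades it to DEFINED */
static struct sink *sink_find_early(const char *name)
{
    struct sink *s = sink_find(name);

    if (s)
        return s;
    s = calloc(1, sizeof(*s));
    strncpy(s->name, name, sizeof(s->name) - 1);
    s->type = SINK_FWD_DECLARED;
    s->next = sinks;
    sinks = s;
    return s;
}

/* postresolve: any sink still forward-declared was never defined */
static int sink_postresolve(void)
{
    struct sink *s;
    int errors = 0;

    for (s = sinks; s; s = s->next)
        if (s->type == SINK_FWD_DECLARED)
            errors++;
    return errors;
}
```
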
Damien Claisse
ba7c03c18e MINOR: ssl: disable server side default CRL check with WolfSSL
Patch 64a77e3ea5 disabled CRL check when no CRL file was provided, but
it only did it on bind side. Add the same fix in server context
initialization side.
This allows to enable peer verification (verify required) on a server
using TLS, without having to provide a CRL file.
2024-10-10 09:31:19 +02:00
Amaury Denoyelle
456c3997b2 BUG/MEDIUM: quic: properly decount out-of-order ACK on stream release
Out-of-order STREAM ACKs are buffered in their related streambuf tree. On
insertion, overlapping or contiguous ranges are merged together. The
total size of buffered ack range is stored in <room> streambuf member
and reported to QUIC MUX layer on streambuf release. The objective is to
ensure QUIC MUX layer can allocate Tx buffers conveniently to preserve a
good transfer throughput.

Streamdesc is the overall container of many streambufs. It may also be
released when its upper QCS instance is freed, after all stream data
have been emitted. In this case, the active streambuf is also released
via custom code. However, in this code path, <room> was not reported to
the QUIC MUX layer.

This bug caused a wrong estimation of the QUIC MUX txbuf window, with
bytes remaining even after all ACKs were received. This may cause transfer
freezes on other connection streams, with RESET_STREAM emission on
client timeout.

To fix this, reuse the existing qc_stream_buf_release() function on
streamdesc release. This ensures that notify_room is correctly used.

No need to backport.
2024-10-09 17:47:16 +02:00
Amaury Denoyelle
f0049d0748 BUG/MINOR: quic: fix discarding of already stored out-of-order ACK
To properly decount out-of-order acked data range, contiguous or
overlapping ranges are first merged before their insertion in a tree.

The first step ensures that a newly reported range is not completely
covered by the existing tree ranges. However, one of the condition was
incorrect. Fix this to ensure that the final range tree does not contain
duplicated entry.

The impact of this bug is unknown. However, it may have allowed the
insertion of overlapping ranges, which could in turn cause an error in
QUIC MUX txbuf window, with a possible transfer freeze.

No need to backport.
2024-10-09 17:32:30 +02:00
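The coverage test the commit fixes can be sketched generically (function names are illustrative; the real code works on tree nodes): a new range is "already covered" only when both of its bounds fall inside an existing range, and checking a single bound is exactly the kind of incorrect condition that lets overlapping duplicates into the tree.

```c
#include <assert.h>
#include <stdint.h>

/* one acked data range [start, end] */
struct range { uint64_t start, end; };

/* correct coverage test: BOTH bounds must lie within the existing range */
static int range_covered(const struct range *exist, const struct range *r)
{
    return r->start >= exist->start && r->end <= exist->end;
}

/* merge a contiguous or overlapping range <src> into <dst>;
 * returns 0 when the ranges are disjoint and cannot be merged */
static int range_merge(struct range *dst, const struct range *src)
{
    if (src->start > dst->end || src->end < dst->start)
        return 0;
    if (src->start < dst->start)
        dst->start = src->start;
    if (src->end > dst->end)
        dst->end = src->end;
    return 1;
}
```
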
Aurelien DARRAGON
f88f162868 BUG/MEDIUM: hlua: properly handle sample func errors in hlua_run_sample_{fetch,conv}()
To execute sample fetches and converters from Lua, the hlua API leverages the
sample API. Prior to executing the sample func, the arg checker is called
from hlua_run_sample_{fetch,conv}() to detect potential errors.

However, hlua_run_sample_{fetch,conv}() both pass NULL as <err> argument,
but it is wrong for two reasons. First we miss an opportunity to report
precise error messages to help the user know what went wrong during the
check, and more importantly, some val check functions consider that the
<err> pointer is never NULL. This is the case for example with
check_crypto_hmac(). Because of this, when such val check functions
encounter an error, they will crash the process because they will try
to de-reference NULL.

This bug was discovered and reported by GH user @JB0925 on #2745.

Perhaps val check functions should make sure that the provided <err>
pointer is != NULL prior to de-referencing it. But since there are
multiple occurrences found in the code and the API isn't clear about that,
it is easier to fix the hlua part (caller) for now.

To fix the issue, let's always provide a valid <err> pointer when
leveraging the val_arg() check function pointer, and make use of it in case
of error to report a relevant message to the user before freeing it.

It should be backported to all stable versions.
2024-10-08 12:00:42 +02:00
Aurelien DARRAGON
d0e0105181 BUG/MEDIUM: hlua: make hlua_ctx_renew() safe
hlua_ctx_renew() is called from unsafe places where the caller doesn't
expect it to LJMP. However, hlua_ctx_renew() makes use of Lua library
functions that could potentially raise errors, such as lua_newthread(),
and it does nothing to catch errors. Because of this, haproxy could
unexpectedly crash. This was discovered and reported by GH user
@JB0925 on #2745.

To fix the issue, let's simply make hlua_ctx_renew() safe by applying
the same logic implemented for hlua_ctx_init() or hlua_ctx_destroy(),
which is catching Lua errors by leveraging SET_SAFE_LJMP_PARENT() helper.

It should be backported to all stable versions.
2024-10-08 12:00:36 +02:00
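Conceptually, the SET_SAFE_LJMP_PARENT() pattern installs a longjmp target before calling code that may raise, so a raised error is turned into a clean failure instead of a crash. A generic setjmp/longjmp sketch (the function names are illustrative, not HAProxy's macros, and real Lua error handling goes through lua_atpanic/protected calls):

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf safe_ljmp_env;

/* models a Lua C API function that raises an error (longjmps) on failure,
 * the way lua_newthread() can on out-of-memory */
static void may_raise(int fail)
{
    if (fail)
        longjmp(safe_ljmp_env, 1);
}

/* sketch of a "safe" renew: install the jump target first, so a raised
 * error lands here and is reported as a plain failure to the caller */
static int renew_safely(int fail)
{
    if (setjmp(safe_ljmp_env))
        return 0;        /* error caught: clean failure, no crash */
    may_raise(fail);
    return 1;
}
```
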