RFC 9110 states that trailers may be merged into the headers. While this
must be done with special care, it may be a problem for some applications.
To avoid any trouble with such applications, two new options were added to
drop trailers during message forwarding.
On the backend, "http-drop-request-trailers" option can be enabled to drop
trailers from the requests before sending them to the server. And on the
frontend, "http-drop-response-trailers" option can be enabled to drop
trailers from the responses before sending them to the client. The options
can be defined in defaults sections and disabled with the "no" keyword.
This patch should fix the issue #2930.
This commit extends the MUX H2 connection reversal step to properly take
the new idle-ping feature into account. It first ensures that the h2c task
is properly instantiated/freed depending now on both the timers and the
idle-ping configuration. Also, h2c_update_timeout() is now called
instead of manually requeuing the task, which ensures the proper timer
value is selected depending on the new connection side.
If idle-ping is activated and the h2c task expires due to a missing PING
ACK, the peer is considered gone and the connection can be closed
immediately. GOAWAY emission is thus skipped.
A new test is necessary in h2c_update_timeout() when a PING ACK is
currently expected but the next selected timer expiration is not
idle-ping. This may happen if the http-keep-alive/http-request timers are
selected first. In this case, the H2_CF_IDL_PING_SENT flag is reset, so as
not to prevent GOAWAY emission on expiration.
This commit is the counterpart of the previous one, adapted to the
frontend side. "idle-ping" is added as a keyword on bind lines, so that
the client timeout of idle frontend connections can be refreshed.
The H2 MUX behavior remains similar to the previous patch. The only
significant change is in h2c_update_timeout(), as idle-ping is now also
taken into account for frontend connections. The calculated value is
compared with the http-request/http-keep-alive timeout value, and the
shorter delay is used as the expiration date. As the hr/ka timeouts are
based on idle_start, this allows them to run in parallel with an idle-ping
timer.
This commit implements support for idle-ping on the backend side. First,
a new server keyword "idle-ping" is defined in configuration parsing. It
is used to set the corresponding new server member.
The second part of this commit implements idle-ping support in the H2 MUX.
A new inlined function conn_idle_ping() is defined to access the
connection idle-ping value. Two new connection flags are defined:
H2_CF_IDL_PING and H2_CF_IDL_PING_SENT. The first one is set for idle
connections via h2c_update_timeout().
In the h2_timeout_task() handler, if the first flag is set, instead of
releasing the connection as before, the second flag is set and the tasklet
is scheduled. As both flags are now set, h2_process_mux() will proceed
with the PING emission. The timer is also rearmed to the idle-ping value.
If a PING ACK is received before the next timeout, the connection timer is
refreshed. Otherwise, the connection is released, as on a regular timer
expiration.
Also of importance, special care is needed when a backend connection goes
idle. In this case, the idle-ping timer must be rearmed, so a new
invocation of h2c_update_timeout() is performed in h2_detach().
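Below is a minimal standalone sketch of the two-flag sequence described
above; the structure, helper names and flow are simplified stand-ins for
illustration, not the actual MUX code:

  #include <stdbool.h>
  #include <stdio.h>

  /* simplified stand-ins for the two flags described above */
  #define IDL_PING       0x01  /* idle connection, idle-ping applies */
  #define IDL_PING_SENT  0x02  /* a PING was emitted, its ACK is awaited */

  struct conn_sketch {
      unsigned int flags;
  };

  /* called when the connection timer expires; returns true to release */
  static bool on_timeout(struct conn_sketch *c)
  {
      if ((c->flags & (IDL_PING | IDL_PING_SENT)) == IDL_PING) {
          /* first expiration: emit a PING and rearm the timer to the
           * idle-ping value instead of releasing the connection
           */
          c->flags |= IDL_PING_SENT;
          printf("emit PING, rearm timer\n");
          return false;
      }
      /* expiration without a PING ACK: the peer is considered gone */
      printf("no PING ACK, release connection\n");
      return true;
  }

  /* called when a PING ACK arrives before the next timeout */
  static void on_ping_ack(struct conn_sketch *c)
  {
      c->flags &= ~IDL_PING_SENT;
      printf("PING ACK received, refresh timer\n");
  }

  int main(void)
  {
      struct conn_sketch c = { .flags = IDL_PING };

      on_timeout(&c);   /* first expiration: PING emitted */
      on_ping_ack(&c);  /* peer answered: timer refreshed */
      on_timeout(&c);   /* PING emitted again */
      on_timeout(&c);   /* still no ACK: connection released */
      return 0;
  }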
Adapt the already existing h2c_ack_ping() function so that it can also
emit a PING request. It is renamed to h2c_send_ping(), and a new boolean
argument <ack> selects whether a PING request or an ACK is emitted.
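For reference, a PING frame is a fixed-size frame: a 9-byte frame header
(payload length 8, type 0x6, stream 0) followed by 8 bytes of opaque data,
with the ACK flag (0x01) distinguishing a request from an acknowledgement
(RFC 9113). A standalone sketch of such a builder follows; the function
name and buffer handling are illustrative, not the actual MUX code:

  #include <stdint.h>
  #include <string.h>

  /* Build an HTTP/2 PING frame into <out>, which must hold 17 bytes.
   * Returns the number of bytes written.
   */
  size_t build_ping_frame(uint8_t *out, const uint8_t payload[8], int ack)
  {
      out[0] = 0x00; out[1] = 0x00; out[2] = 0x08; /* payload length: 8 */
      out[3] = 0x06;                               /* frame type: PING  */
      out[4] = ack ? 0x01 : 0x00;                  /* flags: ACK or not */
      out[5] = out[6] = out[7] = out[8] = 0x00;    /* stream id: 0      */
      memcpy(out + 9, payload, 8);                 /* opaque data       */
      return 17;
  }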
Reorganize the code for timeout calculation in case the connection is
idle. The objective is to better reflect the relations between the
timeouts, as follows (see the sketch below):
* if GOAWAY was already emitted, use the shut timeout, or if unset fall
  back to the client/server one. However, an already set timeout is never
  erased.
* else, for frontend connections, the http-request or keep-alive timeout
  is applied depending on the current demux state. If the selected value
  is unset, fall back to the client timeout.
* for backend connections, no timeout is set, to allow http-reuse.
This commit is pure refactoring, so no functional change should occur.
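A standalone sketch of the selection order above; the parameter names are
illustrative and 0 stands for an unset timer (TICK_ETERNITY in HAProxy
terms), this is not the real h2c_update_timeout() code:

  /* Pick the expiration date of an idle connection. All timeouts and
   * dates are in ms; 0 means unset.
   */
  unsigned int pick_idle_expire(int goaway_sent, int is_frontend,
                                int waiting_for_request,
                                unsigned int now, unsigned int cur_expire,
                                unsigned int shut_to, unsigned int cli_srv_to,
                                unsigned int httpreq_to, unsigned int ka_to)
  {
      if (goaway_sent) {
          /* GOAWAY already emitted: shut timeout, else client/server
           * timeout, but never erase an already armed timer
           */
          if (cur_expire)
              return cur_expire;
          return now + (shut_to ? shut_to : cli_srv_to);
      }
      if (is_frontend) {
          /* http-request or keep-alive depending on the demux state,
           * falling back to the client timeout when unset
           */
          unsigned int to = waiting_for_request ? httpreq_to : ka_to;
          return now + (to ? to : cli_srv_to);
      }
      /* backend: no timeout, keep the connection around for http-reuse */
      return 0;
  }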
Since the following commit, the MUX H2 timeout function has been slightly
extended:
d38d8c6ccb189e7bc813b3693fec3093c9be55f1
BUG/MEDIUM: mux-h2: make sure control frames do not refresh the idle timeout
A side-effect of this patch is that the expire timer of an idle backend
connection is now not reset if already defined. This means that if a timer
was registered prior to the connection's transition to idle, the
connection would be destroyed on its timeout. If this happens for enough
connections, it may have an impact on the reuse rate.
In practice, this case should be rare, as the h2c timer is set to
TICK_ETERNITY while there are active streams. Most of the time the timer
is not refreshed before the transition to idle, so the connection won't be
deleted on expiration.
The only case where it could occur is if there are still pending data
blocked on emission when the stream detaches. Here, "timeout server" is
applied to the connection. When the emission completes, the connection
goes idle, but the timer is still armed and will thus be triggered on the
idle connection.
To prevent this, explicitly reset the h2c timer to TICK_ETERNITY for idle
backend connections via h2c_update_timeout().
This patch is explicitly not scheduled for backport for now, as it is
difficult to estimate the real impact of the previous code state.
A GOAWAY frame should not be emitted before the preface. Thus, the max_id
field of an h2c acting as a server is initialized to -1, which prevents
its emission until the preface is received from the peer. When acting as a
client, max_id is initialized to a valid value on the first h2s emission.
This causes an issue with reverse HTTP on the active side. It first starts
as a client, so the peer does not emit a preface but instead a simple
SETTINGS frame. As roles are switched, max_id is initialized much later,
when the first h2s response is emitted. Thus, if the connection must be
terminated before any stream transfer, GOAWAY cannot be emitted.
To fix this, ensure max_id is initialized to 0 in h2_conn_reverse() for
the active connect side. This way, a GOAWAY indicating that no stream has
been handled can be generated.
Note that the passive connect side is not impacted, as its max_id is
initialized upon preface reception.
This should be backported up to 2.9.
While reviewing the HTTP/2 MUX timeouts, it appeared that the MUX task may
be requeued via h2c_update_timeout() with an already expired date. This
can happen with idle connections in two cases:
* first with the shut timeout, as the timer is not refreshed if already set
* second with the http-request and keep-alive timers, which are based on
  idle_start
Queuing an already expired task is undefined behavior. Fix this by using
task_wakeup() instead of task_queue() at the end of h2c_update_timeout()
if such a case occurs.
This should be backported up to 2.6.
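The gist of the fix as a standalone sketch; the two helpers below are mere
stand-ins for HAProxy's task API, only the decision matters:

  #include <stdio.h>

  /* stand-ins for illustration only */
  static void queue_task(unsigned int expire) { printf("queue for %u\n", expire); }
  static void wake_task(void)                 { printf("wake up immediately\n"); }

  /* Requeue the connection task: if the computed date is already in the
   * past, wake the task up directly instead of queuing an expired timer.
   */
  static void requeue(unsigned int expire, unsigned int now)
  {
      if (expire && expire <= now)
          wake_task();
      else
          queue_task(expire);
  }

  int main(void)
  {
      requeue(100, 200); /* already expired: immediate wakeup */
      requeue(300, 200); /* future date: normal queuing */
      return 0;
  }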
On the frontend side, when a stream is shut while the response was already
fully sent, it was cancelled by sending a RST_STREAM(CANCEL) frame.
However, this is not accurate: the CANCEL error code must only be used
when the response headers were sent but not the full response. As stated
in RFC 9113, once the response has been fully sent, a RST_STREAM with the
NO_ERROR code must be sent to stop the request.
This patch should solve the issue #1219. It must be backported to all stable
versions.
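The two RFC 9113 error codes involved and the selection described above,
as a small standalone sketch (not the MUX's own constants):

  /* RFC 9113 error codes */
  #define SKETCH_NO_ERROR  0x0
  #define SKETCH_CANCEL    0x8

  /* Pick the RST_STREAM error code when a frontend stream is shut:
   * once the response has been fully sent, NO_ERROR tells the peer to
   * stop sending the request; CANCEL is only meant for a response that
   * was started (headers sent) but not completed.
   */
  unsigned int rst_error_code(int response_fully_sent)
  {
      return response_fully_sent ? SKETCH_NO_ERROR : SKETCH_CANCEL;
  }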
Annika Wickert reported some occasional disconnections between haproxy
and varnish when communicating over HTTP/2, with varnish complaining
about protocol errors while captures looked apparently normal. Nils
Goroll managed to reproduce this on varnish by injecting the capture of
the outgoing haproxy traffic and noticed that haproxy was forwarding a
header value containing a trailing space, which is now explicitly
forbidden since RFC9113.
It turns out that the only way for such a header to pass through haproxy
is to arrive in h2 and not be edited, in which case it will arrive in
HTX with its undesired spaces. Since the code dealing with HTX headers
always trims spaces around them, these are not observable in dumps, but
only when started in debug mode (-d). Conversions to/from h1 also drop
the spaces.
With this patch we trim LWS both on input and on output. This way we
always present clean headers in the whole stack, and even if some are
manually crafted by the configuration or Lua, they will be trimmed on
the output.
This must be backported to all stable versions.
Thanks to Annika for the helpful capture and Nils for the help with
the analysis on the varnish side!
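For illustration, the kind of LWS trimming described above boils down to
stripping spaces and horizontal tabs on both ends of the value; this
standalone sketch is not HAProxy's own helper, whose header representation
differs:

  #include <stddef.h>

  /* Trim leading and trailing spaces/tabs from the <len> bytes at <buf>;
   * the trimmed start is returned via <out> and the new length returned.
   */
  size_t trim_lws(const char *buf, size_t len, const char **out)
  {
      while (len && (*buf == ' ' || *buf == '\t')) {
          buf++;
          len--;
      }
      while (len && (buf[len - 1] == ' ' || buf[len - 1] == '\t'))
          len--;
      *out = buf;
      return len;
  }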
On shut, truncated HTX messages were not properly handled by the H2
multiplexer. Depending on how data were emitted, a chunked HTX message
without the 0-CRLF could be considered full, and an empty DATA frame with
the ES flag set could be emitted instead of a RST_STREAM(CANCEL) frame.
In the H2 multiplexer, when a shut is performed, an HTX message is
considered truncated if more HTX data are still expected. This is based on
the presence or not of the H2_SF_MORE_HTX_DATA flag on the H2 stream.
However, this flag is set or unset depending on the HTX extra field
value. This field states how much data must still be transferred, based on
the announced data length. For a message with a content-length, this
assumption is valid. But for a chunked message it is not: only the length
of the current chunk is announced. So we cannot rely on this field in that
case to know whether a message is full or not.
Instead, we must rely on the HTX start-line flags to know if more HTX data
are expected or not. If the xfer length is known (the HTX_SL_F_XFER_LEN flag
is set on the HTX start-line), it means that more data are always expected,
until the end of message is reached (the HTX_FL_EOM flag is set on the HTX
message). This is true for bodyless messages because the end of message is
reported with the end of headers. This is also true for tunneled messages
because the end of message is received before switching the H2 stream in
tunnel mode.
This patch must be backported as far as 2.8.
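In short, the truncation check becomes something like the following
standalone sketch; the two booleans stand in for the HTX flags named
above, the surrounding structures are simplified:

  #include <stdbool.h>

  struct msg_sketch {
      bool xfer_len_known;  /* HTX_SL_F_XFER_LEN set on the start-line */
      bool eom_seen;        /* HTX_FL_EOM set on the message */
  };

  /* A message shut before completion is truncated whenever the transfer
   * length is known and the end of message has not been reached yet,
   * regardless of the per-chunk "extra" accounting.
   */
  bool shut_is_truncated(const struct msg_sketch *m)
  {
      return m->xfer_len_known && !m->eom_seen;
  }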
When receive or EOS termination events are reported at the SE level, a
truncation was erroneously reported when no error was detected. Of course, it
must be the opposite.
No backport needed.
Termination events dedicated to mux connections and stream-endpoint
descriptors are added in this patch, with events specific to these
locations. Changes for the H1 and H2 multiplexers are reviewed to be more
accurate.
MUX_CTL_TEVTS command is added to get the termination event logs of a mux
connection and MUX_SCTL_TEVTS command to get the termination event logs of a
mux stream.
The termination event logs of the multiplexer connection and stream are
now dumped when the corresponding mux info is dumped. The termination
event logs of the underlying connection are also dumped in the debug
string.
Shutdowns for reads (read0), receive errors, shutdowns for writes and
timeouts are reported, but only for the H2 connection for now.
As for the H1 multiplexer, more events must be added to report protocol
errors, goaways and rst-streams. And of course, all events for the H2
streams must be reported too.
When data was copied from RX buffers to the channel buffer, more data than
expected could be moved because the amount of data copied was never
decremented from the limit. This could lead to a stream deadlock when the
compression filter was in use.
The issue was introduced by commit 4eb3ff1 ("MAJOR: mux-h2: make streams use
the connection's buffers") but revealed by 3816c38 ("MAJOR: mux-h2: permit a
stream to allocate as many buffers as desired").
Because an h2 stream can now have several RX buffers, in h2_rcv_buf() we
loop on these buffers to fill the channel buffer. However, we must still
take care to respect the limit and not copy too much data. But the "count"
variable was never decremented to reflect the amount of data already
copied, so it was possible to exceed the limit.
It was an issue when the compression filter was in use because the channel
buffer could be fully filled, preventing the compression from being
performed. When this happened, the stream was infinitely blocked because
the compression filter was asking for some space but nothing was scheduled
to be forwarded.
This patch should fix the issue #2826. It must be backported to 3.1.
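The shape of the fixed copy loop, as a standalone sketch with simplified
buffer types; the point is only that <count> must shrink by the amount
copied at each turn:

  #include <stddef.h>
  #include <string.h>

  struct rxbuf_sketch {
      const char *area;
      size_t data;
  };

  /* Copy from several rx buffers into <dst> without exceeding <count>
   * bytes in total. Forgetting to decrement <count> after each copy is
   * exactly the bug described above.
   */
  size_t copy_rxbufs(char *dst, size_t count,
                     const struct rxbuf_sketch *rxbufs, size_t nbuf)
  {
      size_t total = 0;

      for (size_t i = 0; i < nbuf && count; i++) {
          size_t chunk = rxbufs[i].data < count ? rxbufs[i].data : count;

          memcpy(dst + total, rxbufs[i].area, chunk);
          total += chunk;
          count -= chunk;  /* the decrement that was missing */
      }
      return total;
  }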
Dedicated HTTP/2 stats proxy counters are available for the current and
total number of HTTP/2 connections on both the frontend and backend sides.
Both counters are simply incremented in h2_init().
This causes issues when using reverse HTTP. First, the increment is not
performed on the expected side, as it is triggered before
h2_conn_reverse(), which switches a connection from frontend to backend or
vice versa. For example, on the active reverse side, h2_total_connections
is incremented only on the backend, even after the connection is reversed
and attached to a listener for the remainder of its lifetime.
h2_open_connections suffers from a similar but arguably worse behavior as
it is also decremented. If the increment and decrement operations are not
performed on the same proxy side, which happens for every connection that
has been successfully reversed, it results in an invalid counter value,
possibly with an integer overflow.
To fix this, delay increment operations on reverse HTTP from h2_init()
to h2_conn_reverse(). Both counters are updated only after reverse has
completed, thus using the expected frontend or backend side.
To prevent an overflow of h2_open_connections, ensure the h2_release()
decrement is not performed if a connection is freed before completing its
reversal, as in this case it would not have been accounted for by the H2
counters.
This should be backported up to 2.9.
This should fix github issue #2821.
Latest commit f0eca8fe7 ("MINOR: mux-h2/glitches: add a description to
the H2 glitches") misplaced the optional glitch description field, with
it appearing at the end of the if () condition and always reporting
an excess of CONTINUATION frames from the first exceeding one.
This needs to be backported along with that commit once it gets backported.
Since we can now list them using "debug counters" and now support a
description, better add the description to all glitches. This patch may
be backported to 3.1, but before this the following patches must also
be picked:
86823c828 MINOR: mux-h2/traces: add a missing trace on negative initial window size
7c8e9420a CLEANUP: mux-h2/traces: reword certain ambiguous traces
When a negative initial window size is reported, we're going to close
the connection, so it's important to report a trace to explain why!
This should be backported at least to 3.1 and possibly 3.0 (adapting the
context since there's no glitches there).
This commit is the counterpart of the previous one for the H2 mux. It
replaces objt_server() with the unsafe __objt_server(), as the conn target
is guaranteed to point to a valid server instance, which can then be used
as the _srv_add_idle() argument.
Since commit 1cc851d9f2 ("MEDIUM: mux-h2: start to update stream when
sending WU") we started storing stream offsets in the h2s struct. These
offsets are updated at a few points where it's safe to write to the
stream, and in h2c_send_strm_wu(), where the h2s->h2c check was not
performed.
Due to this, nothing protects the h2s from being updated when sending a
WU for a closed stream, which might only happen when acknowledging a
frame after resetting that stream, which is quite unlikely. In any case
if this happens, it will crash as in issue #2793 since the closed streams
are purposely read-only to catch such bugs.
The fix is trivial, just check h2s->h2c before deciding to update the
stream.
Thanks to @Wahnes for reporting this, and Christopher for spotting the
cause. This needs to be backported to 3.1 only.
There is no explicit test on the number of headers when a HEADERS frame is
received. It is implicitly limited by the size of the header list, but
that is twice the configured limit, to be sure the frame can be decoded.
So now, a check is performed after the HTX message was created. This way, we
are sure to not exceed the configured limit after the decoding stage. If
there are too many headers, a parsing error is reported.
Note the same is performed on the trailers.
This patch should partially address the issue #2685. It should be backported
to all stable versions.
When a HEADERS frame is encoded to be sent, the maximum number of headers
allowed in the frame is lower than on the receiving path. This can lead to
reporting a sending error although the message was accepted, which can be
confusing.
In addition, the start-line is split into pseudo-headers and thus consumes
some header slots, increasing the difference between HEADERS frame
encoding and decoding. It is even more noticeable because, when a HEADERS
frame is decoded, a margin is used to be able to handle split cookie
headers. Concretely, on the decoding path, a limit of twice the maximum
number of headers allowed in a message (tune.http.maxhdr * 2) is used. On
the encoding path, the exact limit is used. This is not consistent.
Note that when a frame is decoded, we must use a larger limit because the
pseudo-headers are reassembled into the start-line and must count for one,
but also because, most of the time, the cookies are split into several
headers and are reassembled too.
To fix the issue, the same ratio is applied on the sending path. A limit
must be defined because a dynamic allocation is not acceptable. Twice the
configured limit should be good enough to support header manipulation.
This patch should be backported to all stable versions.
On server side, the H2 stream is first created with an unassigned ID (ID ==
0). Its ID is assigned when the request is emitted, before formatting the
HEADERS frame. However, the session may be aborted during that stage. We
must take care to not emit RST_STREAM frame for this stream, because it does
not exist yet for the server.
It is especially important to do so because, depending on the timing, it
may also happen before the H2 preface was sent.
This patch must be backported to all stable versions. It is related to issue
The h2c_report_glitch() function is now replaced with a macro to support
enumerating counters for each individual glitch line. For now this adds
43 such counters. The macro supports an optional description, though that
is not being used for now. It gives outputs like this (note that the last
one was purposely instrumented to pass a description):
> debug dev counters glt all
0 GLT mux_h2.c:5976 h2c_dec_hdrs()
0 GLT mux_h2.c:5960 h2c_dec_hdrs()
(...)
0 GLT mux_h2.c:2207 h2c_frt_recv_preface()
0 GLT mux_h2.c:1954 h2c_frt_stream_new(): new stream too early
As a reminder, this requires to build with -DDEBUG_GLITCHES=1.
When timeout http-keep-alive is very short (e.g. 10ms), it's possible
sometimes for a client to face truncated responses due to an early
close that happens while the system is still pushing the last data,
colliding with the client's WINDOW_UPDATEs that trigger RSTs.
Here we're trying to do better: first we send a GOAWAY on timeout, then
we wait up to clientfin/client timeout for the peer to react so that we
don't immediately close. This is sufficient to avoid truncation as soon
as the timeout is more than a few hundred ms.
It's not certain it should be backported, because it's a bit sensitive
and might possibly fall into certain edge cases.
During the 11th and 12th iterations of the development cycle for the H2
auto rx window, several approaches were attempted to figure out whether
another buffer could be allocated or not. One of them consisted in looping
back to the beginning of the function, requesting a new buffer slot and
getting one if the buffer was either apparently or confirmed full. The
latest one consisted in directly allocating the next buffer from the two
places where it's found to be proven full, instead of checking with the
now defunct h2s_may_get_rxbuf() whether we were allowed to get one and
loop.
That approach was retained. In this case the "full" variable is no longer
needed, so let's get rid of it because the construct looks bogus and
confuses coverity (and possibly code readers, as the intent is unclear
compared to the code).
When data are sent via the zero-copy data forwarding, in h2_done_ff, we
must be sure to remove the H2 stream from the send list if something is
sent. It was only performed if no blocking condition was encountered, but
we must also do it when something is sent. Otherwise the transfer may be
blocked till the timeout.
This patch must be backported as far as 2.9.
Commit cf3fe1eed ("MINOR: mux-h2/traces: print the size of the DATA
frames") added the size of the DATA frame to the traces. Unfortunately
it uses ullong instead of ulong to cast a pointer, which breaks the
build on 32-bit platforms. Let's just switch it to ulong which works
on both.
The traces currently don't contain any info about the amount of data
present in buffers, making it difficult to figure if an empty buffer
is the cause for not demuxing or if a full buffer is the cause for
not reading more data. Let's add them, with the head/tail info as
well.
H2 traces are unusable to detect bugs most of the time because they miss
the h2c and h2s flags, as well as the proxy, which makes it very hard to
figure if the info comes from the client or the server as soon as two
layers are stacked. This commit adds this precious information as well
as the h2s's rx and tx windows.
This could be backported to a few recent branches, but the rx window
calculation will have to be replaced with the static value there.
This reduces the avg wakeup latency of sc_conn_io_cb() from 1900 to 51us.
The L2 cache misses go from 1.4 to 1.2 billion for 20k req, but the perf
is not better. Also there are situations where we must not perform such a
wakeup; these may only be done from h2_io_cb, hence the test on the
next_tasklet pointer and its reset when leaving the function. In practice
all callers of h2s_close() or h2s_destroy() can reach that code, including
h2_detach, h2_snd_buf, h2_shut etc.
Another test with 40 concurrent connections, transferring 40k 1MB objects
at different concurrency levels from 1 to 80 also showed a 21% drop in L2
cache misses, and a 2% perf improvement:
Before:
329,510,887,528 instructions
50,907,966,181 branches
843,515,912 branch-misses
2,753,360,222 cache-misses
19,306,172,474 L1-icache-load-misses
17,321,132,742 L1-dcache-load-misses
951,787,350 LLC-load-misses
44.660469000 seconds user
62.459354000 seconds sys
=> avg perf: 373 MB/s
After:
331,310,219,157 instructions
51,343,396,257 branches
851,567,572 branch-misses
2,183,369,149 cache-misses
19,129,827,134 L1-icache-load-misses
17,441,877,512 L1-dcache-load-misses
906,923,115 LLC-load-misses
42.795458000 seconds user
62.277983000 seconds sys
=> avg perf: 380 MB/s
With small requests, it's the L1 and L3 cache misses which reduced by
3% and 7% respectively, and the performance went up by 3%.
When we stop demuxing in the middle of a frame, we know that there are
other data following. The demux buffer is small and unique, but now we
have rxbufs, so after h2_process_demux() is left, the dbuf is almost
empty and has room to be delivered into another rxbuf.
Let's implement a short loop with a counter and a few conditions around
the demux call. We limit the number of turns to the number of available
rxbufs and no more than 12, since it shows good performance, and the
wakeup is only called once. This has shown a nice 12-20% bandwidth gain
on backend-side H2 transferring 1MB-large objects, and does not affect
the rest (headers, control etc). The number of wakeup calls was divided
by 5 to 8, which is also a nice improvement. The counter is limited to
make sure we don't add processing latency. Tests were run to find the
optimal limit, and it turns out that 16 is just slightly better, but not
worth the +33% increase in peak processing latency.
The h2_process_demux() function just doesn't call the wakeup function
anymore, and solely focuses on transferring from dbuf to rxbuf.
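A standalone toy model of the bounded loop described above; the limit of
12 and the stop conditions come from the text, everything else is
simplified for illustration:

  #include <stdio.h>

  #define MAX_DEMUX_TURNS 12  /* empirical cap discussed above */

  /* Toy model: <pending_frames> waits in the dbuf and each demux turn
   * moves one frame into an rxbuf. The loop stops when the dbuf is
   * drained, no rxbuf is left, or the turn budget is exhausted; the
   * stream wakeup is then performed once, outside the loop.
   */
  static int demux_loop(int pending_frames, int avail_rxbufs)
  {
      int turns = avail_rxbufs < MAX_DEMUX_TURNS ? avail_rxbufs : MAX_DEMUX_TURNS;
      int moved = 0;

      while (turns-- && pending_frames && avail_rxbufs) {
          pending_frames--;
          avail_rxbufs--;
          moved++;
      }
      printf("moved %d frame(s), waking streams once\n", moved);
      return moved;
  }

  int main(void)
  {
      demux_loop(20, 30);  /* capped at 12 turns */
      demux_loop(3, 30);   /* stops when the dbuf is drained */
      return 0;
  }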
Practical measurement: test with h2load producing 4 concurrent connections
with 10 concurrent streams each, downloading 1MB objects (20k total) via
two layers of haproxy stacked, reaching httpterm over H1 (numbers are total
for the 2 h2 front and 1 h2 back). All on a single thread.
Before: 549-553 MB/s (on h2load)
function calls cpu_tot cpu_avg
h2_io_cb 2562340 8.157s 3.183us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
h2_io_cb 30109 840.9ms 27.93us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
h2_io_cb 16105 106.4ms 6.607us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
h2_io_cb 1 11.75us 11.75us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
h2_io_cb 2608555 9.104s 3.490us --total--
perf stat:
153,117,996,214 instructions (71.41%)
22,919,659,027 branches # 14.97% of inst (71.41%)
384,009,600 branch-misses # 1.68% of all branches (71.42%)
44,052,220 cache-misses # 1 inst / 3476 (71.44%)
9,819,232,047 L1-icache-load-misses # 6.4% of inst (71.45%)
8,426,410,306 L1-dcache-load-misses # 5.5% of inst (57.15%)
10,951,949 LLC-load-misses # 1 inst / 13982 (57.13%)
12.372600000 seconds user
23.629506000 seconds sys
After: 660 MB/s (+20%)
function calls cpu_tot cpu_avg
h2_io_cb 244502 4.410s 18.04us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
h2_io_cb 42107 1.062s 25.22us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
h2_io_cb 13703 106.3ms 7.758us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
h2_io_cb 1 13.74us 13.74us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
h2_io_cb 300313 5.578s 18.57us --total--
perf stat:
126,840,441,876 instructions (71.40%)
17,576,059,236 branches # 13.86% of inst (71.40%)
274,136,753 branch-misses # 1.56% of all branches (71.42%)
30,413,562 cache-misses # 1 inst / 4170 (71.45%)
6,665,036,203 L1-icache-load-misses # 5.25% of inst (71.46%)
7,519,037,097 L1-dcache-load-misses # 5.9% of inst (57.15%)
6,702,411 LLC-load-misses # 1 inst / 18925 (57.12%)
10.490097000 seconds user
19.212515000 seconds sys
It's also interesting to see that less total time is spent in these
functions, clearly indicating that the cost of interrupted processing,
and the extraneous cache misses come into play at some point. Indeed,
after the change, the number of instructions went down by 17.2%, while
the L2 cache misses dropped by 31% and the L3 cache misses by 39%!
h2_send() used to report non-zero every time any data were sent, and this
was used from h2_snd_buf() or h2_done_ff() to trigger a wakeup, which may
end up doing nothing. This wakeup is now restricted to either a successful
send() combined with the ability to demux, or an error.
Doing this makes the number of h2_io_cb() wakeups drop from 422k to
245k for 1000 1MB objects delivered over 100 streams between two H2
proxies, without any behavior change nor performance change. In
practice, most send() calls do not result in a wakeup anymore but
synchronous errors still do.
A local test downloading 10k 1MB objects from an H1 server with a single
connection shows this change:
before after caller
1547 1467 h2_process_demux()
2138 0 h2_done_ff() <---
38 1453 ssl_sock_io_cb() <---
18 0 h2_snd_buf()
1 1 h2_init()
3742 2921 -- total --
In practice the ssl_sock_io_cb() wakeups are those notifying about
SUB_RETRY_RECV, which are not accounted for when h2_done_ff() performs
the wakeup because the tasklet is already queued (a counter placed
there shows that it's nonetheless called). So there's no transfer and
h2_done_ff() was only hiding the other one.
Another test involving 4 connections with 10 concurrent streams each
and 20000 1MB objects total shows a total disparition of the wakeups
from h2_snd_buf and h2_done_ff, which used to account together for
50% of the wakeups, resulting in effectively halving the number of
wakeups which, based on their avg process time, were not doing
anything:
Before:
function calls cpu_tot cpu_avg
h2_io_cb 2571208 7.406s 2.880us <- h2c_restart_reading@src/mux_h2.c:940 tasklet_wakeup
h2_io_cb 2536949 251.4ms 99.00ns <- h2_snd_buf@src/mux_h2.c:7573 tasklet_wakeup ###
h2_io_cb 41100 5.622ms 136.0ns <- h2_done_ff@src/mux_h2.c:7779 tasklet_wakeup ###
h2_io_cb 38979 852.8ms 21.88us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
h2_io_cb 12519 90.28ms 7.211us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
h2_io_cb 1 13.81us 13.81us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
h2_io_cb 5200756 8.606s 1.654us --total--
After:
h2_io_cb 2562340 8.157s 3.183us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
h2_io_cb 30109 840.9ms 27.93us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
h2_io_cb 16105 106.4ms 6.607us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
h2_io_cb 1 11.75us 11.75us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
h2_io_cb 2608555 9.104s 3.490us --total--
From the beginning, h2_restart_reading() has always been confusing because
it decides whether or not to wake the tasklet handler up. This
tasklet handler does two things, one is receiving from the socket to the
demux buf, and one is demuxing from the demux buf to the streams' rxbufs.
The conditions are governed by h2_recv_allowed(), which is also called at
a few places to decide whether or not to actually receive from the socket.
It starts to be visible that this leaves some difficulties regarding what
to do with possibly pending data.
In 2.0 with commit 3ca18bf0b ("BUG/MEDIUM: h2: Don't attempt to recv from
h2_process_demux if we subscribed."), we even had to address a special
case where it was possible to endlessly wake up because the conditions
would rely on the demux buffer's contents, though the solution consisted
in passing a flag to decide whether or not to consider the buffer's
contents.
In 2.5 commit b5f7b5296 ("BUG/MEDIUM: mux-h2: Handle remaining read0 cases
on partial frames") introduced a new flag H2_CF_DEM_SHORT_READ which
indicates that the demux had to stop in the middle of a frame and cannot
make progress without more data. More adaptations later came in based on
this but this actually reflected exactly what was needed to solve this
painful situation: a state indicating whether to receive or parse.
Now's about time to definitely address this by reworking h2_restart_reading()
to check two completely independent things:
- the ability to receive more data into the demux buffer, which is
based on its allocation/fill state and the socket's errors
- the ability to demux such data, which is based on the presence of
enough data (i.e. no stuck short read), and ability to find an rx
buf to continue the processing.
Now the conditions are much more understandable, and it's also visible
that the consider_buffer argument, whose value was not trivial for
callers, is not used anymore.
Tests stacking two layers of H2 show strictly no change to the wakeup
cause distributions nor counts.
Now that we're using all available rx buffers for transfers, there's
no point anymore in advertising more than the minimum value we can
safely buffer. Let's be conservative and only rely on the dynamic
buffers to improve speed beyond the configured value, and make sure
that many streams will no longer cause unfairness.
Interestingly, the total number of wakeups has further shrunk down, but
with a different distribution. From 128k for 1000 1M transfers, it went
down to 119k, with 96k from restart_reading, 10k from done_ff and 2.6k
from snd_buf. done_ff went up by 30% and restart_reading went down by
30%.
These settings allow changing the total buffer size allocated to the
backend and frontend respectively. This way it's no longer necessary to
play with tune.bufsize nor increase the number of streams to benefit from
more buffers.
Setting tune.h2.fe.rxbuf to 4m to match a sender's max tcp_wmem resulted
in 257 Mbps for a single stream at 103ms vs 121 Mbps default (or 5.1 Mbps
with a single buffer and 64kB window).
Without using bandwidth estimates, we can already use up to the number
of allocatable rxbufs and share them evenly between receiving streams.
In practice we reserve one buffer for any non-receiving stream, plus
1 per 8 possible new streams, and divide the rest between the number
of receiving streams.
Finally, for front streams, this is rounded up to the buffer size while
for back streams we round it down. The rationale here is that front to
back is very fast to flush and slow to refill so we want to optimise
upload bandwidth regardless of the number of streams, while it's the
opposite in the other way so we try to minimize HoL.
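A standalone sketch of the sharing rule described in the previous
paragraphs (one buffer reserved per non-receiving stream, one per 8
possible new streams, the remainder split between receiving streams,
rounded up on the frontend and down on the backend); the exact accounting
in the MUX differs:

  /* Share <total> allocatable rx buffers between streams. */
  unsigned int rxbufs_per_receiving_stream(unsigned int total,
                                           unsigned int non_recv_streams,
                                           unsigned int possible_new_streams,
                                           unsigned int recv_streams,
                                           int is_frontend)
  {
      unsigned int reserved = non_recv_streams + possible_new_streams / 8;
      unsigned int left = total > reserved ? total - reserved : 0;

      if (!recv_streams)
          return 0;

      if (is_frontend) {
          /* front: round up, favour upload bandwidth */
          return (left + recv_streams - 1) / recv_streams;
      }
      /* back: round down, minimize head-of-line blocking */
      return left / recv_streams;
  }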
That shows good results with a single stream being able to send at 121
Mbps at 103ms using 1.4 MB buffer with default settings, or 8 streams
sharing the bandwidth at 180kB each. Previously the limit was approx
5.1 Mbps per stream.
It also enables better sharing of backend connections: a slow (100 Mbps)
and a fast (1 Gbps) clients were both downloading 2 100MB files each over
a shared H2 connection. The fast one used to show 6.86 to 20.74s with an
avg of 11.45s and an stddev of 5.81s before the patch, and went to a
much more respectable 6.82 to 7.73s with 7.08s avg and 0.336s stddev.
We don't try to increase the window past the remaining content length.
First, this is pointless (though harmless), but in addition it causes
needless emission of WINDOW_UPDATE frames on small uploads that are
smaller than a window, and beyond being useless, it upsets vtest which
expects an RST on some tests. The scheduling is not reliable enough to
insert an expect for a window update first, so in the end with that
extra check we save a few useless frames on small uploads and please
vtest.
A new setting should be added to allow to increase the number of buffers
without having to change the number of streams. At this point it's not
done.
Now we don't enforce allocation limits in h2s_get_rxbuf(): there is no
benefit in not processing pending data, it would only cause HoL blocking
for no saving. The only reason for not allocating is if there are no
buffers available for the connection.
In theory this should not change anything except that it exercises code
paths that support reallocating multiple buffers, which could possibly
uncover a sleeping bug. This is why it's placed in a separate commit.
And one observation worth noting is that it almost cut in half the number
of iocb wakeups: for 1000 1MB transfers over 100 concurrent streams of a
single connection, we used to observe 208k wakeups (110k from
restart_reading, 80k from snd_buf, 11k from done_ff), and now we're
observing 128k (113k from restart_reading, 2.4k from snd_buf, 6.9k from
done_ff), which seems to indicate that pretty often the demuxing was
blocked on a buffer full due to the default advertised window of 64k.
For now it seems to work as before, even when artificially inflating the
number of allocatable buffers per stream. The number of allocated slots is
always the same as the max number of streams, which guarantees that each
stream will find one buffer. We only grant one buffer per stream at this
point, since the goal was to replace the existing single rxbuf.
A new demux blocking flag, H2_CF_DEM_RXBUF, was added to indicate
a failure to get an rxbuf slot from the connection. It was lightly
tested (by forcing bl_init() to a lower number of buffers). It is not
yet certain whether it's more useful to have a new flag or to reuse
the existing H2_CF_DEM_SFULL which indicates the rxbuf is full,
but at least the new flag more accurately translates the condition,
that may make a difference in the future. However, given that when
RXBUF is set, most of the time it results in a failure to find more
room to demux and it sets SFULL, for now we have to always clear
SFULL when clearing RXBUF as well. This means that most of the time
we'll see 3 combinations:
- none: everything's OK
- SFULL: the unique rx buffer is full
- RXBUF || (RXBUF|SFULL): cannot allocate more entries
Note that we need to be super careful in h2_frt_transfer_data() because
the htx_free_data_space() function doesn't guarantee that the room is
usable, so htx_add_data() may still fail despite an apparent room. For
this reason, h2_frt_transfer_data() maintains a "full" flag to indicate
that a transfer attempt failed and that a new buffer is required.
Since commit 485da0b05 ("BUG/MEDIUM: mux_h2: Handle others remaining
read0 cases on partial frames"), H2_CF_DEM_SHORT_READ is set when there
is no blocking flags. However, it checks H2_CF_DEM_BLOCK_ANY which does
not include H2_CF_DEM_DFULL. This results in many cases where both
H2_CF_DEM_DFULL and H2_CF_DEM_SHORT_READ are set together, which makes
no sense, since one says the demux buffer is full while the other one
says an incomplete read was done. This doesn't allow properly deciding
whether to restart reading or processing.
Let's make sure to clear DFULL in h2_process_demux() whenever we
consume incoming data from the dbuf, and check for DFULL before
setting SHORT_READ.
This could probably be considered as a bug fix but it's hard to say if
it has any impact on the current code, probably at worst it might cause
a few useless wakeups, so until there's any proof that it needs to be
backported, better not do it.
The code used to decide when to restart reading is far from being trivial
and will cause trouble after the forthcoming changes: it checks if the
current stream is the same that is being demuxed, and only if so, wakes
the demux to restart reading. Once streams will start to use multiple
buffers, this condition will make no sense anymore. Actually the real
reason is split into two steps:
- detect if the demux is currently blocked on the current stream, and
if so remove SFULL
- detect if any demux blocking flags were removed during the operations,
and if so, wake demuxing.
For now this doesn't change anything.