When a negative initial window size is reported, we're going to close
the connection, so it's important to report a trace to explain why!
This should be backported at least to 3.1 and possibly 3.0 (adapting the
context since there are no glitches there).
This commit is the counterpart of the previous one for the H2 mux. It
replaces objt_server() by the unsafe __objt_server(), as the conn target is
guaranteed to point to a valid server instance, which can then be used as
the _srv_add_idle() argument.
Since commit 1cc851d9f2 ("MEDIUM: mux-h2: start to update stream when
sending WU") we started storing stream offsets in the h2s struct. These
offsets are updated at a few points where it's safe to write to the
stream, and in h2c_send_strm_wu(), where the check on h2s->h2c was not
performed. Due to this, nothing protects the h2s from being updated when
sending a WU for a closed stream. This might only happen when
acknowledging a frame after resetting that stream, which is quite
unlikely, but if it ever happens, it will crash as in issue #2793, since
the closed streams are purposely read-only to catch such bugs.
The fix is trivial, just check h2s->h2c before deciding to update the
stream.
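To illustrate the shape of the fix, here is a minimal sketch with
simplified types (not the literal patch); it relies on the static closed
streams having a NULL h2c pointer:

    struct h2c;                           /* opaque here */
    struct h2s {
        struct h2c *h2c;                  /* NULL for the static closed streams */
        unsigned long long curr_rx_ofs;   /* illustrative rx offset */
    };

    /* only update the offsets of a live stream; the static closed
     * streams are read-only placeholders and must not be written to */
    static inline void h2s_update_rx_ofs(struct h2s *h2s, unsigned long long rcvd)
    {
        if (!h2s->h2c)
            return;
        h2s->curr_rx_ofs += rcvd;
    }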
Thanks to @Wahnes for reporting this, and Christopher for spotting the
cause. This needs to be backported to 3.1 only.
There is no explicit test on the number of headers when a HEADERS frame is
received. It is implicitly limited by the size of the header list, but that
list is twice the configured limit, to be sure the frame can be decoded.
So a check is now performed after the HTX message has been created. This
way, we are sure not to exceed the configured limit after the decoding
stage. If there are too many headers, a parsing error is reported.
Note that the same check is performed on the trailers.
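For the record, the kind of post-decoding check described here looks
roughly like this (standalone sketch, names illustrative, not the actual
haproxy code):

    /* a decoded header list terminated by an entry with a NULL name */
    struct hdr { const char *n, *v; };

    /* return 0 if the list fits within the configured limit
     * (tune.http.maxhdr), -1 to report a parsing error */
    static int check_hdr_count(const struct hdr *list, int maxhdr)
    {
        int count = 0;

        while (list[count].n)
            count++;
        return (count <= maxhdr) ? 0 : -1;
    }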
This patch should partially address issue #2685. It should be backported
to all stable versions.
When a HEADERS frame is encoded to be sent, the maximum number of headers
allowed in the frame is lower than on the receiving path. This can lead to
a sending error being reported for a message that was accepted when it was
received, which can be confusing.
In addition, the start-line is split into pseudo-headers and consumes
some header slots this way, increasing the difference between HEADERS frame
encoding and decoding. It is even more noticeable because when a HEADERS
frame is decoded, a margin is used to be able to handle split cookie
headers. Concretely, on the decoding path, a limit of twice the maximum
number of headers allowed in a message (tune.http.maxhdr * 2) is used. On
the encoding path, the exact limit is used. This is not consistent.
Note that when a frame is decoded, we must use a larger limit because the
pseudo-headers are reassembled into the start-line and must count for one,
but also because, most of the time, the cookies are split into several
headers and are reassembled too.
To fix the issue, the same ratio is applied on the sending path. A limit
must be defined because a dynamic allocation is not acceptable. Twice the
configured limit should be good enough to support header manipulation.
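In short, both directions now derive their hard limit from the same
expression (illustrative define, not the actual code):

    /* same margin on the encoding and decoding paths: twice the
     * configured per-message limit (tune.http.maxhdr) */
    #define H2_FRAME_MAX_HDR(maxhdr)  ((maxhdr) * 2)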
This patch should be backported to all stable versions.
On the server side, the H2 stream is first created with an unassigned ID
(ID == 0). Its ID is assigned when the request is emitted, before formatting
the HEADERS frame. However, the session may be aborted during that stage. We
must take care not to emit an RST_STREAM frame for this stream, because it
does not exist yet for the server.
It is especially important to do so because, depending on the timing, the
abort may also happen before the H2 PREFACE was sent.
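The guard on the emission path amounts to something like this (sketch,
not the literal patch):

    /* a backend stream whose ID was never assigned (id == 0) does not
     * exist on the wire yet, so there is nothing to reset */
    if (!h2s->id)
        return; /* skip the RST_STREAM emission */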
This patch must be backported to all stable versions. It is related to issue
The h2c_report_glitch() function is now replaced with a macro to support
enumerating counters for each individual glitch line. For now this adds
43 such counters. The macro supports an optional description, though that
is not being used for now. It gives output like this (note that the last
one was purposely instrumented to pass a description):
> debug dev counters glt all
0 GLT mux_h2.c:5976 h2c_dec_hdrs()
0 GLT mux_h2.c:5960 h2c_dec_hdrs()
(...)
0 GLT mux_h2.c:2207 h2c_frt_recv_preface()
0 GLT mux_h2.c:1954 h2c_frt_stream_new(): new stream too early
As a reminder, this requires building with -DDEBUG_GLITCHES=1.
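The counter-per-callsite technique behind the macro looks roughly like
this (hedged sketch; haproxy's actual struct, section name and macro
differ):

    struct glitch_counter {
        const char *file;
        int line;
        const char *func;
        const char *desc;       /* optional description, may be NULL */
        unsigned long count;
    };

    /* one static counter per invocation site, grouped in a dedicated
     * section so that a debug handler can enumerate them all */
    #define REPORT_GLITCH(descr) do {                                  \
            static struct glitch_counter                               \
                __attribute__((used, section("glt_cnt")))              \
                _gc = { __FILE__, __LINE__, __func__, (descr), 0 };    \
            _gc.count++;                                               \
        } while (0)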
When timeout http-keep-alive is very short (e.g. 10ms), it's possible
sometimes for a client to face truncated responses due to an early
close that happens while the system is still pushing the last data,
colliding with the client's WINDOW_UPDATEs that trigger RSTs.
Here we're trying to do better: first we send a GOAWAY on timeout, then
we wait up to clientfin/client timeout for the peer to react so that we
don't immediately close. This is sufficient to avoid truncation as soon
as the timeout is more than a few hundred ms.
It's not certain it should be backported, because it's a bit sensitive
and might possibly fall into certain edge cases.
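The timeout sequence roughly becomes (fragment-style sketch; names
borrowed from mux_h2.c but the exact call shape is illustrative):

    /* on http-keep-alive timeout: announce the shutdown first... */
    h2c_send_goaway_error(h2c, NULL);
    h2_send(h2c);
    /* ...then re-arm the task so the peer gets up to the
     * clientfin/client timeout to react before we really close */
    t->expire = tick_add_ifset(now_ms, h2c->shut_timeout);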
During the 11th and 12th iterations of the development cycle for the H2 auto
rx window, several approaches were attempted to figure out whether another
buffer could be allocated or not. One of them consisted in looping back to
the beginning of the function requesting a new buffer slot and getting one
if the buffer was either apparently or confirmed full. The latest one
consisted in directly allocating the next buffer from the two places
where it's found to be proven full, instead of checking with the now
defunct h2s_may_get_rxbuf() whether we were allowed to get one and loop.
That approach was retained. In this case the "full" variable is no
longer needed, so let's get rid of it, because the construct looks bogus
and confuses coverity (and possibly code readers, as the intent is unclear
compared to the code).
When data are sent via the zero-copy data forwarding, in h2_done_ff(), we
must be sure to remove the H2 stream from the send list if something is
sent. It was only performed if no blocking condition was encountered, but
we must also do it if something is sent. Otherwise the transfer may be
blocked until the timeout.
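A simplified view of the corrected condition (the locals are
illustrative, not the literal patch):

    /* leave the send list when nothing blocked us, but also as soon as
     * some data was committed, otherwise the stream may linger there
     * until the timeout strikes */
    if (total || !blocked)
        LIST_DEL_INIT(&h2s->list);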
This patch must be backported as far as 2.9.
Commit cf3fe1eed ("MINOR: mux-h2/traces: print the size of the DATA
frames") added the size of the DATA frame to the traces. Unfortunately
it uses ullong instead of ulong to cast a pointer, which breaks the
build on 32-bit platforms. Let's just switch it to ulong which works
on both.
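The portable pattern is simply (standalone example):

    #include <stdio.h>

    static void trace_ptr(const void *p)
    {
        /* unsigned long matches the pointer width on the ILP32/LP64
         * targets haproxy builds for, while casting a 32-bit pointer
         * to unsigned long long triggers a build-breaking warning */
        printf("%#lx\n", (unsigned long)p);
    }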
The traces currently don't contain any info about the amount of data
present in the buffers, making it difficult to figure out whether an empty
buffer is the cause for not demuxing or a full buffer is the cause for
not reading more data. Let's add them, with the head/tail info as
well.
H2 traces are unusable to detect bugs most of the time because they miss
the h2c and h2s flags, as well as the proxy, which makes it very hard to
figure out whether the info comes from the client or the server as soon as
two layers are stacked. This commit adds this precious information as well
as the h2s's rx and tx windows.
This could be backported to a few recent branches, but the rx window
calculation will have to be replaced with the static value there.
This reduces the avg wakeup latency of sc_conn_io_cb() from 1900us to 51us.
The L2 cache misses went from 1.4 to 1.2 billion for 20k req. But the
perf is not better. Also there are situations where we must not perform
such wakeup, these may only be done from h2_io_cb, hence the test on the
next_tasklet pointer and its reset when leaving the function. In practice
all callers to h2s_close() or h2s_destroy() can reach that code, this
includes h2_detach, h2_snd_buf, h2_shut etc.
Another test with 40 concurrent connections, transferring 40k 1MB objects
at different concurrency levels from 1 to 80 also showed a 21% drop in L2
cache misses, and a 2% perf improvement:
Before:
329,510,887,528 instructions
50,907,966,181 branches
843,515,912 branch-misses
2,753,360,222 cache-misses
19,306,172,474 L1-icache-load-misses
17,321,132,742 L1-dcache-load-misses
951,787,350 LLC-load-misses
44.660469000 seconds user
62.459354000 seconds sys
=> avg perf: 373 MB/s
After:
331,310,219,157 instructions
51,343,396,257 branches
851,567,572 branch-misses
2,183,369,149 cache-misses
19,129,827,134 L1-icache-load-misses
17,441,877,512 L1-dcache-load-misses
906,923,115 LLC-load-misses
42.795458000 seconds user
62.277983000 seconds sys
=> avg perf: 380 MB/s
With small requests, it's the L1 and L3 cache misses which were reduced by
3% and 7% respectively, and the performance went up by 3%.
When we stop demuxing in the middle of a frame, we know that there are
other data following. The demux buffer is small and unique, but now we
have rxbufs, so when h2_process_demux() is left, the dbuf is almost
empty and what remains can be delivered into another rxbuf.
Let's implement a short loop with a counter and a few conditions around
the demux call. We limit the number of turns to the number of available
rxbufs and no more than 12, since it shows good performance, and the
wakeup is only called once. This has shown a nice 12-20% bandwidth gain
on backend-side H2 transferring 1MB-large objects, and does not affect
the rest (headers, control etc). The number of wakeup calls was divided
by 5 to 8, which is also a nice improvement. The counter is limited to
make sure we don't add processing latency. Tests were run to find the
optimal limit, and it turns out that 16 is just slightly better, but not
worth the +33% increase in peak processing latency.
The h2_process_demux() function just doesn't call the wakeup function
anymore, and solely focuses on transferring from dbuf to rxbuf.
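The resulting loop looks approximately like this (sketch; the
availability helper is a hypothetical name):

    /* re-run the demuxer while the dbuf still holds data and rxbuf
     * slots remain; cap the turns to bound processing latency */
    int turns = 12;

    do {
        h2_process_demux(h2c);
    } while (--turns > 0 && b_data(&h2c->dbuf) && h2c_rxbuf_avail(h2c));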
Practical measurement: test with h2load producing 4 concurrent connections
with 10 concurrent streams each, downloading 1MB objects (20k total) via
two layers of haproxy stacked, reaching httpterm over H1 (numbers are total
for the 2 h2 front and 1 h2 back). All on a single thread.
Before: 549-553 MB/s (on h2load)
function calls cpu_tot cpu_avg
h2_io_cb 2562340 8.157s 3.183us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
h2_io_cb 30109 840.9ms 27.93us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
h2_io_cb 16105 106.4ms 6.607us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
h2_io_cb 1 11.75us 11.75us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
h2_io_cb 2608555 9.104s 3.490us --total--
perf stat:
153,117,996,214 instructions (71.41%)
22,919,659,027 branches # 14.97% of inst (71.41%)
384,009,600 branch-misses # 1.68% of all branches (71.42%)
44,052,220 cache-misses # 1 inst / 3476 (71.44%)
9,819,232,047 L1-icache-load-misses # 6.4% of inst (71.45%)
8,426,410,306 L1-dcache-load-misses # 5.5% of inst (57.15%)
10,951,949 LLC-load-misses # 1 inst / 13982 (57.13%)
12.372600000 seconds user
23.629506000 seconds sys
After: 660 MB/s (+20%)
function calls cpu_tot cpu_avg
h2_io_cb 244502 4.410s 18.04us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
h2_io_cb 42107 1.062s 25.22us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
h2_io_cb 13703 106.3ms 7.758us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
h2_io_cb 1 13.74us 13.74us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
h2_io_cb 300313 5.578s 18.57us --total--
perf stat:
126,840,441,876 instructions (71.40%)
17,576,059,236 branches # 13.86% of inst (71.40%)
274,136,753 branch-misses # 1.56% of all branches (71.42%)
30,413,562 cache-misses # 1 inst / 4170 (71.45%)
6,665,036,203 L1-icache-load-misses # 5.25% of inst (71.46%)
7,519,037,097 L1-dcache-load-misses # 5.9% of inst (57.15%)
6,702,411 LLC-load-misses # 1 inst / 18925 (57.12%)
10.490097000 seconds user
19.212515000 seconds sys
It's also interesting to see that less total time is spent in these
functions, clearly indicating that the cost of interrupted processing,
and the extraneous cache misses come into play at some point. Indeed,
after the change, the number of instructions went down by 17.2%, while
the L2 cache misses dropped by 31% and the L3 cache misses by 39%!
h2_send() used to report non-zero every time any data were sent, and
this was used from h2_snd_buf() or h2_done_ff() to trigger a wakeup,
which may possibly do nothing. Let's restrict this wakeup to either a
successful send() combined with the ability to demux, or an error.
Doing this makes the number of h2_io_cb() wakeups drop from 422k to
245k for 1000 1MB objects delivered over 100 streams between two H2
proxies, without any behavior change nor performance change. In
practice, most send() calls do not result in a wakeup anymore but
synchronous errors still do.
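Conceptually the reported value becomes something like this (hedged
sketch; may_demux() is a hypothetical stand-in for whatever condition
lets the demuxer make progress again):

    /* only report a wakeup-worthy event: data went out and demuxing
     * may now make progress, or an error was met */
    return (sent && may_demux(h2c)) || (h2c->flags & H2_CF_ERR_PENDING);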
A local test downloading 10k 1MB objects from an H1 server with a single
connection shows this change:
before after caller
1547 1467 h2_process_demux()
2138 0 h2_done_ff() <---
38 1453 ssl_sock_io_cb() <---
18 0 h2_snd_buf()
1 1 h2_init()
3742 2921 -- total --
In practice the ssl_sock_io_cb() wakeups are those notifying about
SUB_RETRY_RECV, which are not accounted for when h2_done_ff() performs
the wakeup because the tasklet is already queued (a counter placed
there shows that it's nonetheless called). So there's no transfer and
h2_done_ff() was only hiding the other one.
Another test involving 4 connections with 10 concurrent streams each
and 20000 1MB objects total shows a complete disappearance of the wakeups
from h2_snd_buf and h2_done_ff, which used to account together for
50% of the wakeups, resulting in effectively halving the number of
wakeups which, based on their avg process time, were not doing
anything:
Before:
function calls cpu_tot cpu_avg
h2_io_cb 2571208 7.406s 2.880us <- h2c_restart_reading@src/mux_h2.c:940 tasklet_wakeup
h2_io_cb 2536949 251.4ms 99.00ns <- h2_snd_buf@src/mux_h2.c:7573 tasklet_wakeup ###
h2_io_cb 41100 5.622ms 136.0ns <- h2_done_ff@src/mux_h2.c:7779 tasklet_wakeup ###
h2_io_cb 38979 852.8ms 21.88us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
h2_io_cb 12519 90.28ms 7.211us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
h2_io_cb 1 13.81us 13.81us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
h2_io_cb 5200756 8.606s 1.654us --total--
After:
h2_io_cb 2562340 8.157s 3.183us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
h2_io_cb 30109 840.9ms 27.93us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
h2_io_cb 16105 106.4ms 6.607us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
h2_io_cb 1 11.75us 11.75us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
h2_io_cb 2608555 9.104s 3.490us --total--
From the beginning, h2_restart_reading() has always been confusing because
it decides whether or not to wake the tasklet handler up. This
tasklet handler does two things, one is receiving from the socket to the
demux buf, and one is demuxing from the demux buf to the streams' rxbufs.
The conditions are governed by h2_recv_allowed(), which is also called at
a few places to decide whether or not to actually receive from the socket.
It starts to be visible that this leaves some difficulties regarding what
to do with possibly pending data.
In 2.0 with commit 3ca18bf0b ("BUG/MEDIUM: h2: Don't attempt to recv from
h2_process_demux if we subscribed."), we even had to address a special
case where it was possible to endlessly wake up because the conditions
would rely on the demux buffer's contents, though the solution consisted
in passing a flag to decide whether or not to consider the buffer's
contents.
In 2.5 commit b5f7b5296 ("BUG/MEDIUM: mux-h2: Handle remaining read0 cases
on partial frames") introduced a new flag H2_CF_DEM_SHORT_READ which
indicates that the demux had to stop in the middle of a frame and cannot
make progress without more data. More adaptations later came in based on
this but this actually reflected exactly what was needed to solve this
painful situation: a state indicating whether to receive or parse.
It's about time to definitively address this by reworking h2_restart_reading()
to check two completely independent things:
- the ability to receive more data into the demux buffer, which is
based on its allocation/fill state and the socket's errors
- the ability to demux such data, which is based on the presence of
enough data (i.e. no stuck short read), and ability to find an rx
buf to continue the processing.
Now the conditions are much more understandable, and it's also visible
that the consider_buffer argument, whose value was not trivial for
callers, is not used anymore.
Tests stacking two layers of H2 show strictly no change to the wakeup
cause distributions nor counts.
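The resulting structure is roughly the following (sketch with
hypothetical helper names for the two conditions described above):

    static void h2_restart_reading(const struct h2c *h2c)
    {
        /* two independent questions: can more bytes be pulled into
         * the demux buffer, and can what's already there be demuxed? */
        if (h2_may_recv(h2c) || h2_may_demux(h2c))
            tasklet_wakeup(h2c->wait_event.tasklet);
    }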
Now that we're using all available rx buffers for transfers, there's
no point anymore in advertising more than the minimum value we can
safely buffer. Let's be conservative and only rely on the dynamic
buffers to improve speed beyond the configured value, and make sure
that many streams will no longer cause unfairness.
Interestingly, the total number of wakeups has further shrunk down, but
with a different distribution. From 128k for 1000 1M transfers, it went
down to 119k, with 96k from restart_reading, 10k from done_ff and 2.6k
from snd_buf. done_ff went up by 30% and restart_reading went down by
30%.
These settings allow changing the total buffer size allocated to the
backend and frontend respectively. This way it's no longer necessary to
play with tune.bufsize nor increase the number of streams to benefit from
more buffers.
Setting tune.h2.fe.rxbuf to 4m to match a sender's max tcp_wmem resulted
in 257 Mbps for a single stream at 103ms vs 121 Mbps default (or 5.1 Mbps
with a single buffer and 64kB window).
Without using bandwidth estimates, we can already use up to the number
of allocatable rxbufs and share them evenly between receiving streams.
In practice we reserve one buffer for any non-receiving stream, plus
1 per 8 possible new streams, and divide the rest between the number
of receiving streams.
Finally, for front streams, this is rounded up to the buffer size while
for back streams we round it down. The rationale here is that front to
back is very fast to flush and slow to refill so we want to optimise
upload bandwidth regardless of the number of streams, while it's the
opposite in the other way so we try to minimize HoL.
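The sharing rule can be summarized by this standalone sketch (all names
and the exact accounting are illustrative; only the reserve-then-share
logic and the rounding direction matter):

    /* share <total> bytes of rx budget between <nb_recv> receiving
     * streams, after reserving one buffer per non-receiving stream and
     * one per 8 possible new streams; round the per-stream share up to
     * a full buffer on the frontend side (favor upload bandwidth),
     * down on the backend side (minimize HoL) */
    static size_t rx_budget_per_stream(size_t total, size_t bufsize,
                                       int nb_streams, int nb_recv,
                                       int max_streams, int is_front)
    {
        size_t reserved = ((size_t)(nb_streams - nb_recv) +
                           (max_streams - nb_streams + 7) / 8) * bufsize;
        size_t share;

        if (nb_recv <= 0 || total <= reserved + (size_t)nb_recv * bufsize)
            return bufsize;            /* at least one buffer each */
        share = (total - reserved) / nb_recv;
        return is_front ? (share + bufsize - 1) / bufsize * bufsize /* up  */
                        : share / bufsize * bufsize;                /* down */
    }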
That shows good results with a single stream being able to send at 121
Mbps at 103ms using 1.4 MB buffer with default settings, or 8 streams
sharing the bandwidth at 180kB each. Previously the limit was approx
5.1 Mbps per stream.
It also enables better sharing of backend connections: a slow (100 Mbps)
and a fast (1 Gbps) clients were both downloading 2 100MB files each over
a shared H2 connection. The fast one used to show 6.86 to 20.74s with an
avg of 11.45s and a stddev of 5.81s before the patch, and went to a
much more respectable 6.82 to 7.73s with 7.08s avg and 0.336s stddev.
We don't try to increase the window past the remaining content length.
First, this is pointless (though harmless), but in addition it causes
needless emission of WINDOW_UPDATE frames on small uploads that are
smaller than a window, and beyond being useless, it upsets vtest which
expects an RST on some tests. The scheduling is not reliable enough to
insert an expect for a window update first, so in the end, with that
extra check, we save a few useless frames on small uploads and please
vtest.
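That extra check amounts to a simple clamp (sketch):

    /* never advertise more window than the remaining announced body:
     * pointless, and it emits needless WINDOW_UPDATE frames */
    if (h2s->body_len && new_win > h2s->body_len)
        new_win = h2s->body_len;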
A new setting should be added to allow increasing the number of buffers
without having to change the number of streams. At this point it's not
done.
Now we don't enforce allocation limits in h2s_get_rxbuf(), since there
is no benefit in not processing pending data, it would still cause HoL
for no saving. The only reason for not allocating is if there are no
buffers available for the connection.
In theory this should not change anything except that it exercises code
paths that support reallocating multiple buffers, which could possibly
uncover a sleeping bug. This is why it's placed in a separate commit.
And one observation worth noting is that it almost cut in half the number
of iocb wakeups: for 1000 1MB transfers over 100 concurrent streams of a
single connection, we used to observe 208k wakeups (110k from
restart_reading, 80k from snd_buf, 11k from done_ff), and now we're
observing 128k (113k from restart_reading, 2.4k from snd_buf, 6.9k from
done_ff), which seems to indicate that pretty often the demuxing was
blocked on a full buffer due to the default advertised window of 64k.
For now it seems to work as before, and even when artificially inflating
the number of allocatable buffers per stream. The number of allocated
slots is always the same as the max number of streams, which guarantees
that each stream will find one buffer. We only grant one buffer per
stream at this point, since the goal was to replace the existing single
rxbuf.
A new demux blocking flag, H2_CF_DEM_RXBUF, was added to indicate
a failure to get an rxbuf slot from the connection. It was lightly
tested (by forcing bl_init() to a lower number of buffers). It is not
yet certain whether it's more useful to have a new flag or to reuse
the existing H2_CF_DEM_SFULL which indicates the rxbuf is full,
but at least the new flag more accurately translates the condition,
that may make a difference in the future. However, given that when
RXBUF is set, most of the time it results in a failure to find more
room to demux and it sets SFULL, for now we have to always clear
SFULL when clearing RXBUF as well. This means that most of the time
we'll see 3 combinations:
- none: everything's OK
- SFULL: the unique rx buffer is full
- RXBUF || (RXBUF|SFULL): cannot allocate more entries
Note that we need to be super careful in h2_frt_transfer_data() because
the htx_free_data_space() function doesn't guarantee that the room is
usable, so htx_add_data() may still fail despite an apparent room. For
this reason, h2_frt_transfer_data() maintains a "full" flag to indicate
that a transfer attempt failed and that a new buffer is required.
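Concretely the transfer code must trust the actual outcome rather than
the apparent room (sketch built on the real HTX calls):

    /* htx_free_data_space() is only an upper bound, so htx_add_data()
     * may still add less than requested despite apparent room */
    size_t sent = htx_add_data(htx, ist2(b_head(&h2c->dbuf), flen));

    if (sent < flen)
        full = 1; /* a new rx buffer is required */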
Since commit 485da0b05 ("BUG/MEDIUM: mux_h2: Handle others remaining
read0 cases on partial frames"), H2_CF_DEM_SHORT_READ is set when there
are no blocking flags. However, it checks H2_CF_DEM_BLOCK_ANY which does
not include H2_CF_DEM_DFULL. This results in many cases where both
H2_CF_DEM_DFULL and H2_CF_DEM_SHORT_READ are set together, which makes
no sense, since one says the demux buffer is full while the other one
says an incomplete read was done. This doesn't permit properly deciding
whether to restart reading or processing.
Let's make sure to clear DFULL in h2_process_demux() whenever we
consume incoming data from the dbuf, and check for DFULL before
setting SHORT_READ.
This could probably be considered as a bug fix but it's hard to say if
it has any impact on the current code, probably at worst it might cause
a few useless wakeups, so until there's any proof that it needs to be
backported, better not do it.
The code used to decide when to restart reading is far from being trivial
and will cause trouble after the forthcoming changes: it checks if the
current stream is the same that is being demuxed, and only if so, wakes
the demux to restart reading. Once streams start to use multiple
buffers, this condition will make no sense anymore. Actually the real
reason is split into two steps:
- detect if the demux is currently blocked on the current stream, and
if so remove SFULL
- detect if any demux blocking flags were removed during the operations,
and if so, wake demuxing.
For now this doesn't change anything.
The code used to decide what to tell to the upper layer and when to free
the rxbuf is a bit convoluted and difficult to adapt to dynamic rxbufs.
We first need to deal with memory management (b_free) and only then to
decide what to report upwards. Right now it does it the other way around.
This should not change anything.
Now the h2s get their rx_head, rx_tail and rx_count associated with the
shared rxbufs. A few functions are provided to manipulate all this,
essentially allocating/releasing a buffer for the stream, returning a
buffer pointer to the head/tail, counting allocated buffers for the stream
and reporting whether a stream may still allocate.
For now this code is not used.
In preparation for having a shared list of rx bufs, we're now allocating
the array of shared rx bufs in the h2c. The pool is created at the max
size between the front and back max streams for now, and the array is not
used yet.
A stream is receiving data from the moment a HEADERS frame lacking
END_STREAM is received, until the end of the stream or HREM (the presence
of END_STREAM). We're now
adding a flag to the stream that indicates this state, as well as a counter
in the connection of streams currently receiving data. The purpose will be
to gauge at any instant the number of streams that might have to share the
available bandwidth and buffers count in order not to allocate too much flow
control to any single stream. For now the counter is kept up to date, and is
reported in "show fd".
Instead of incrementing the last_max_ofs by the amount of received bytes,
we now start from the new current offset to which we add the static window
size. The result is exactly the same but it prepares the code to use a
window size combined with an offset instead of just refilling the budget
from what was received.
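Both formulations produce the same value for now (illustrative comment):

    /* before: last_max_ofs += rcvd;
     * after:  last_max_ofs  = curr_rx_ofs + iws;   (iws = static window)
     * identical today, but the second form lets the window size vary
     * independently of what was just received */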
It was even verified that changing h2_fe_settings_initial_window_size in
the middle of a transfer using gdb does indeed allow the transfer speed
to adapt accordingly.
The rationale here is that we don't absolutely need to update the
stream offset live, there's already the rcvd_s counter to remind
us we've received data. So we can continue to exploit the current
check points for this.
Now we know that rcvd_s indicates the amount of newly received bytes
for the stream since last call to h2c_send_strm_wu() so we can update
our stream offsets within that function. The wu_s counter is set to
the difference between next_max_ofs and last_adv_ofs, which are
resynchronized once the frame is sent.
If the stream suddenly disappears with unacked data (aborted upload),
the presence of the last update in h2c->wu_s is sufficient to let the
connection ack the data alone, and upon subsequent calls with new
rcvd_s, the received counter will be used to ack, like before. We
don't need to do more anyway since the goal is to let the client
abort ASAP when it gets an RST.
At this point, the stream knows its current rx offset, the computed
max offset and the last advertised one.
In H2, everything is accounted as budget. But if we want to moderate
the rcv window that's not very convenient, and we'd rather have offsets
instead so that we know where we are in the stream. Let's first add
the fields to the struct and initialize them. The curr_rx_ofs indicates
the position in the stream where next incoming bytes will be stored.
last_adv_ofs tells which offset was last advertised as the
window limit, and next_max_ofs is the one that will need to be
advertised, which is curr_rx_ofs plus the current window. next_max_ofs
will have to cause a WINDOW_UPDATE to be emitted when it's higher than
last_adv_ofs, and once the WU is sent, its value will have to be copied
over last_adv_ofs.
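The invariant between the three fields can be sketched as follows
(fragment; "win" stands for the stream's current rx window):

    /* what we may advertise is the current position plus the window */
    h2s->next_max_ofs = h2s->curr_rx_ofs + win;

    if (h2s->next_max_ofs > h2s->last_adv_ofs) {
        /* a WINDOW_UPDATE must be emitted; once it is sent: */
        h2s->last_adv_ofs = h2s->next_max_ofs;
    }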
The problem is, for now wherever we emit a stream WU, we have no notion
of stream (the stream might even not exist anymore, e.g. after aborting
an upload), because we currently keep a counter of stream window to be
acked for the current stream ID (h2c->dsi) in the connection (rcvd_s).
Similarly there are a few places early in the frame header processing
where rcvd_s is incremented without knowing the stream yet. Thus, lookups
will be needed for that, unless such a connection-level counter remains
used and poured into the stream's count once known (delicate).
Thus for now this commit only creates the fields and initializes them.
We'll need to keep track of the total amount of data received for the
current stream, and the amount of data to ack for the current stream,
which might soon diverge once we have to update the stream's
offset with received data, which are different from those to be ACKed.
One reason is that in case a stream doesn't exist anymore (e.g. an
aborted upload), the rcvd_s info might get lost after updating the stream,
so we do need to have an in-connection counter for that.
What's done here is that the rcvd_s count is transferred to wu_s in
h2c_send_strm_wu(), to be used as the counter to send, and both are
considered as sufficient when non-null to call the function.
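So the function conceptually starts with (sketch):

    /* move the per-connection received count into the to-be-acked
     * counter; either one being non-zero justifies the call */
    h2c->wu_s  += h2c->rcvd_s;
    h2c->rcvd_s = 0;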
Since 2.7 with commit 8522348482 ("BUG/MAJOR: conn-idle: fix hash indexing
issues on idle conns"), we've been using eb64 trees instead of ebmb trees,
and later we dropped all that to centralize the operations in
the server. Let's remove the ebmbtree.h includes from the muxes that do
not use them.
The local "rxbuf" buffer was passed to the trace instead of h2s->rxbuf
that is used when decoding trailers. The impact is essentially the
impossibility to present some buffer contents in some rare cases. It
may be backported but it's unlikely that anyone will ever notice the
difference.
Since 1d2d77b27 ("MEDIUM: mux-h1: Return a 501-not-implemented for upgrade
requests with a body"), it is no longer possible to perform a protocol
upgrade for requests with a payload. The main reason was to be able to
support protocol upgrade for H1 client requesting a H2 server. In that case,
the upgrade request is converted to a CONNECT request. So, it is not
possible to convey a payload in that case.
But, it is a problem for anyone wanting to perform upgrades on H1 server
using requests with a payload. It is uncommon but valid. So, now, it is the
H2 multiplexer responsibility to reject upgrade requests, on server side, if
there is a payload. An INTERNAL_ERROR is returned for the H2S in that
case. On H1 side, the upgrade is now allowed, but only if the server waits
for the end of the request to return the 101-Switching-protocol
response. Indeed, it is quite hard to synchronise the frontend side and the
backend side in that case. Asking servers to fully consume the request
payload before returning the response seems reasonable.
This patch should fix the issue #2684. It could be backported after a period
of observation, as far as 2.4 if possible. But only if it is not too
hard. It depends on "MINOR: mux-h1: Set EOI on SE during demux when both
side are in DONE state".
There exists an extremely tricky code path that was revealed in 3.0 by
the glitches feature, though it might theoretically have existed before.
TL;DR: a mux mbuf may be full after successfully sending a GOAWAY, and
have its remaining contents discarded without clearing H2_CF_MUX_MFULL and
H2_CF_DEM_MROOM, which then makes h2_send() loop endlessly, until the
watchdog takes care of it.
What can happen is the following: Some data are received, h2_io_cb() is
called. h2_recv() is called to receive the incoming data. Then
h2_process() is called and in turn calls h2_process_demux() to process
input data. At some point, a glitch limit is reached and h2c_error() is
called to close the connection. The input frame was incomplete, so some
data are left in the demux buffer. Then h2_send() is called, which in
turn calls h2_process_mux(), which manages to queue the GOAWAY frame,
turning the state to H2_CS_ERROR2. The frame is sent, and h2_process()
calls h2_send() a last time (doing nothing) and leaves. The streams
are all woken up to notify about the error.
Multiple backend streams were waiting to be scheduled and are woken up
in turn, before their parents are notified, and communicate with the
h2 mux in zero-copy-forward mode, request a buffer via h2_nego_ff(),
fill it, and commit it with h2_done_ff(). At some point the mux's output
buffer is full, and gets flags H2_CF_MUX_MFULL.
The io_cb is called again to process more incoming data. h2_send() isn't
called (polled) or does nothing (e.g. TCP socket buffers full). h2_recv()
may or may not do anything (doesn't matter). h2_process() is called since
some data remain in the demux buf. It goes till the end, where it finds
st0 == H2_CS_ERROR2 and clears the mbuf. We're now in a situation where
the mbuf is empty and MFULL is still present.
Then it calls h2_send(), which doesn't call h2_process_mux() due to
MFULL, doesn't enter the for() loop since all buffers are empty, then
keeps sent=0, which doesn't allow the MFULL flag to be cleared, and since
"done" was not reset, it loops forever there.
Note that the glitches make the issue more reproducible but theoretically
it could happen with any other GOAWAY (e.g. PROTOCOL_ERROR). What makes
it not happen with the data produced on the parsing side is that we
process a single buffer of input at once, and there's no way to amplify
this to 30 buffers of responses (RST_STREAM, GOAWAY, SETTINGS ACK,
WINDOW_UPDATE, PING ACK etc are all quite small), and since the mbuf is
cleared upon every exit from h2_process() once the error was sent, it is
not possible to accumulate response data across multiple calls. And the
regular h2_snd_buf() path checks for st0 >= H2_CS_ERROR so it will not
produce any data there either.
h2_nego_ff() should probably check for H2_CS_ERROR before accepting
to deliver a buffer, but this needs to be carefully studied. In the mean
time the real problem is that the MFULL flag was kept when clearing the
buffer, making the two inconsistent.
Since it doesn't seem possible to trigger this sequence without the
zero-copy-forward mechanism, this fix needs to be backported as far as
2.9, along with previous commit "MINOR: mux-h2: try to clear DEM_MROOM
and MUX_MFULL at more places" which will strengthen the consistency
between these checks.
Many thanks to Annika Wickert for her detailed report that allowed us to
diagnose this problem. CVE-2024-45506 was assigned to this problem.
The code leading to H2_CF_MUX_MFULL and H2_CF_DEM_MROOM being cleared
is quite complex and assumptions about its state are extremely difficult
to make when reading the code. There are indeed long sequences where the mux
buffer might possibly be empty, still having the flag set until it reaches
h2_send(), which will clear it after the last send. Even then it's not
obvious whether it's always guaranteed to release the flag when invoked in
multiple passes.
Let's just simplify the condition so that h2_send() does not depend on
"sent" anymore and that h2_timeout_task() doesn't leave the flags set on
an empty buffer. While it doesn't seem to fix anything, it will
make the code more robust against future changes.
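The simplification boils down to releasing the room flags on emptiness
alone (sketch; mbuf_is_empty() is a hypothetical helper checking the
whole mbuf ring):

    /* no data left anywhere in the mux buffer ring: the full/room
     * flags cannot legitimately remain set */
    if (mbuf_is_empty(h2c))
        h2c->flags &= ~(H2_CF_MUX_MFULL | H2_CF_DEM_MROOM);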
When DATA frames are sent via the 0-copy data forwarding, we must take care
to set the ES flag on the last DATA frame. It should be performed in
h2_done_ff() when the IOBUF_FL_EOI flag was set by the producer. This flag is
here to know when the producer has reached the end of input. When this
happens, the h2s state is also updated. It is switched to "half-closed
local" or "closed" state depending on its previous state.
It is mainly an issue on uploads because the server may be blocked waiting
for the end of the request. A workaround is to disable the 0-copy forwarding
support for the H2 mux by setting the "tune.h2.zero-copy-fwd-send" directive to off
in your global section.
This patch should fix the issue #2665. It must be backported as far as 2.9.
Logging below the developer level doesn't always yield very convenient
traces as we don't know well where streams are allocated or released.
Let's just make that more explicit by using state-level traces for these
important steps.
This helper is able to find a connection, a session, a stream, a
frontend or a backend from its args.
Note that this required always making sure that h2s->sess is reset on
allocation because it's normally initialized later for backend streams,
and producing traces between the two could pre-fill a bad pointer in
the trace_ctx.
When a stream is explicitly woken up by the H2 connection, if an error
condition is detected, the corresponding error flag is set on the SE, i.e.
SE_FL_ERROR or SE_FL_ERR_PENDING, depending on whether the end of stream was
reported or not.
However, there is no attempt to propagate other termination flags. We must
be sure to properly set SE_FL_EOI and SE_FL_EOS when appropriate to be able
to switch a pending error to a fatal error.
Because of this bug, the SE remains with a pending error and no end of
stream, preventing the applicative stream from truly aborting it. It means
that in some abort scenarios, it is possible to block a stream infinitely.
This patch must be backported at least as far as 2.8. No bug was observed on
older versions while the same code is in use.
A risk of truncated packet was addressed in 2.9 by commit 19fb19976f
("BUG/MEDIUM: mux-h2: Only Report H2C error on read error if demux
buffer is empty") by ignoring CO_FL_ERROR after a recv() call as long
as some data remained present in the buffer. However it has a side
effect due to the fact that some frame processors only deal with full
frames, for example, HEADERS. The side effect is that an incomplete
frame will not be processed and will remain in the buffer, preventing
the error from being taken into account, so the I/O handler wakes up
the H2 parser to handle the error, and that one just subscribes for
more data, and this loops forever wasting CPU cycles.
Note that this only happens with errors at the SSL layer exclusively,
otherwise we'd have a read0 pending that would properly be detected:
conn->flags = CO_FL_XPRT_TRACKED | CO_FL_ERROR | CO_FL_XPRT_READY | CO_FL_CTRL_READY
conn->err_code = CO_ERR_SSL_FATAL
h2c->flags = H2_CF_ERR_PENDING | H2_CF_WINDOW_OPENED | H2_CF_MBUF_HAS_DATA | H2_CF_DEM_IN_PROGRESS | H2_CF_DEM_SHORT_READ
The condition to report the error in h2_recv() needs to be refined, so
that connection errors are taken into account either when the buffer is
empty, or when there's an incomplete frame, since we're certain it will
never be completed. We're certain to enter that function because
H2_CF_DEM_SHORT_READ implies too short a frame, and earlier there's a
protocol check to validate that no frame size is larger than bufsize,
hence a H2_CF_DEM_SHORT_READ implies there's some room left in the
buffer and we're allowed to try to receive.
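The refined condition looks approximately like this (sketch):

    /* report the connection error either when the demux buffer is
     * empty, or when it's stuck on an incomplete frame that we know
     * will never be completed */
    if ((conn->flags & CO_FL_ERROR) &&
        (!b_data(&h2c->dbuf) || (h2c->flags & H2_CF_DEM_SHORT_READ)))
        goto error;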
The condition to reproduce the bug seems super hard to meet but was
observed once by Patrick Hemmer, who had the reflex to capture lots of
information that allowed the problem to be explained. In order to reproduce
it, the SSL code had to be significantly modified to alter received
contents at very empiric places, but that was sufficient to reproduce
it and confirm that the current patch works as expected.
The bug was tagged MAJOR because when it triggers there's no other
solution to get rid of it but to restart the process. However given how
hard it is to trigger in a lab, it does not seem very likely to occur
in the field.
This needs to be backported to 2.9.
It is a small change, but it is cleaner to not include the stconn-t.h header
in connection-t.h, mainly to avoid circular definitions.
The related issue is #2502.
The ->takeover() is quite tricky. It didn't take care of the possibility
that the original thread's connection handler had been woken up to handle
an event (e.g. read0), failed to get a buffer, registered against its own
thread's buffer_wait queue and left the connection in an idle state.
A new thread could then come by, perform a takeover(), and when a buffer
was available, the new thread's tasklet would be woken up by the old one
via *_buf_available(), causing all sorts of problems. These problems are
easy to reproduce, by running with shared backend connections and few
buffers (tune.buffers.limit=20, 8 threads, 500 connections, transfer
64kB objects and wait 2-5s for a crash to appear).
A first envisioned solution consisted in removing the connection from the
idle list but it turns out that it would be worse for the delete stuff
(the connection no longer appearing as idle, making it impossible to find
it in order to close it). Also, idle counts wouldn't match anymore the
list's state, and the special case of private connections could be
difficult to handle as the connection could be forcefully re-added to the
idle list after allocation despite being private.
After multiple attempts to address the problem in various ways, it appears
that the only reliable solution for now (without starting to turn many
lists to mt_lists) is to have the takeover() function handle the buf_wait
detection or unregistration itself:
- when doing a regular takeover aiming at finding an idle connection
for a new request, connections that are blocked in a buffer_wait
queue are quite rare and not interesting at all (since not immediately
usable), so skipping them is sufficient. For this we detect that the
desired connection belongs to a buffer_wait list by checking its
buf_wait.list element. Note that this check is *not* thread-safe! The
LIST_DEL_INIT() is performed by __offer_buffers() after the callback
was called. But this is sufficient as it is now because the only way
for the element to be seen as not in a list is after the element was
last touched by __offer_buffers(), so the situation for this connection
will not change in a different way later.
- when doing a server delete, we're running under thread isolation.
The connection might get taken over to be killed. The only trick is
that private connections not belonging to any idle list may also
experience this, and in this case even the idle_conns lock will not
offer any protection against anything. But since we're run under
thread isolation, we're certain not to compete with the other thread,
so it's safe to directly unregister the connection from its owner
thread. Normally this is already handled by conn_release() in
cli_parse_delete_server(), which calls mux->destroy(), but this would
actually update the current thread's queue instead of the origin
thread's, thus we do need to perform an explicit dequeue before
completing the takeover.
With this, the problem now looks solved for HTTP/1, HTTP/2 and FCGI,
though extensive tests were essentially run on HTTP/1 and HTTP/2.
While the problem has been there for a very long time, there should be
no reason to backport it since buffer_wait didn't practically work
before 3.0-dev and the process used to freeze hard very quickly before
we'd even have a chance to meet that race.
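For the H2 mux, the first case reduces to a quick membership test at the
top of takeover() (sketch; see the thread-safety caveat above):

    /* the h2c is still queued in its owner thread's buffer_wait list:
     * not immediately usable, skip this connection (rare anyway) */
    if (LIST_INLIST(&h2c->buf_wait.list))
        return -1;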
The code places that used to manipulate the buffer_wq manually now just
call b_queue() or b_requeue(). This will simplify the multiple
list management later.