haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-12 10:06:58 +02:00

Author	SHA1	Message	Date
Willy Tarreau	b74bedf157	MINOR: mux-h2: simplify the wake up code in h2_rcv_buf() The code used to decide when to restart reading is far from being trivial and will cause trouble after the forthcoming changes: it checks if the current stream is the same that is being demuxed, and only if so, wakes the demux to restart reading. Once streams will start to use multiple buffers, this condition will make no sense anymore. Actually the real reason is split into two steps: - detect if the demux is currently blocked on the current stream, and if so remove SFULL - detect if any demux blocking flags were removed during the operations, and if so, wake demuxing. For now this doesn't change anything.	2024-10-12 16:29:16 +02:00
Willy Tarreau	a0ed92f3dd	MINOR: mux-h2: simplify the exit code in h2_rcv_buf() The code used to decide what to tell to the upper layer and when to free the rxbuf is a bit convoluted and difficult to adapt to dynamic rxbufs. We first need to deal with memory management (b_free) and only then to decide what to report upwards. Right now it does it the other way around. This should not change anything.	2024-10-12 16:29:16 +02:00
Willy Tarreau	8cf418811d	MINOR: mux-h2: add rxbuf head/tail/count management for h2s Now the h2s get their rx_head, rx_tail and rx_count associated with the shared rxbufs. A few functions are provided to manipulate all this, essentially allocate/release a buffer for the stream, return a buffer pointer to the head/tail, counting allocated buffers for the stream and reporting if a stream may still allocate. For now this code is not used.	2024-10-12 16:29:16 +02:00
Willy Tarreau	a891534bfd	MINOR: mux-h2: allocate the array of shared rx bufs in the h2c In preparation for having a shared list of rx bufs, we're now allocating the array of shared rx bufs in the h2c. The pool is created at the max size between the front and back max streams for now, and the array is not used yet.	2024-10-12 16:29:16 +02:00
Willy Tarreau	721ea5b06c	MINOR: mux-h2: count within a connection, how many streams are receiving data A stream is receiving data from after the HEADERS frame missing END_STREAM, to the end of the stream or HREM (the presence of END_STREAM). We're now adding a flag to the stream that indicates this state, as well as a counter in the connection of streams currently receiving data. The purpose will be to gauge at any instant the number of streams that might have to share the available bandwidth and buffers count in order not to allocate too much flow control to any single stream. For now the counter is kept up to date, and is reported in "show fd".	2024-10-12 16:29:16 +02:00
Willy Tarreau	c9275084bc	MEDIUM: mux-h2: start to introduce the window size in the offset calculation Instead of incrementing the last_max_ofs by the amount of received bytes, we now start from the new current offset to which we add the static window size. The result is exactly the same but it prepares the code to use a window size combined with an offset instead of just refilling the budget from what was received. It was even verified that changing h2_fe_settings_initial_window_size in the middle of a transfer using gdb does indeed allow the transfer speed to adapt accordingly.	2024-10-12 16:29:16 +02:00
Willy Tarreau	1cc851d9f2	MEDIUM: mux-h2: start to update stream when sending WU The rationale here is that we don't absolutely need to update the stream offset live, there's already the rcvd_s counter to remind us we've received data. So we can continue to exploit the current check points for this. Now we know that rcvd_s indicates the amount of newly received bytes for the stream since last call to h2c_send_strm_wu() so we can update our stream offsets within that function. The wu_s counter is set to the difference between next_adv_ofs and last_adv_ofs, which are resynchronized once the frame is sent. If the stream suddenly disappears with unacked data (aborted upload), the presence of the last update in h2c->wu_s is sufficient to let the connection ack the data alone, and upon subsequent calls with new rcvd_s, the received counter will be used to ack, like before. We don't need to do more anyway since the goal is to let the client abort ASAP when it gets an RST. At this point, the stream knows its current rx offset, the computed max offset and the last advertised one.	2024-10-12 16:29:16 +02:00
Willy Tarreau	eb0fe66c61	MINOR: mux-h2: create and initialize an rx offset per stream In H2, everything is accounted as budget. But if we want to moderate the rcv window that's not very convenient, and we'd rather have offsets instead so that we know where we are in the stream. Let's first add the fields to the struct and initialize them. The curr_rx_ofs indicates the position in the stream where next incoming bytes will be stored. last_adv_ofs tells what's the offset that was last advertised as the window limit, and next_max_ofs is the one that will need to be advertised, which is curr_rx_ofs plus the current window. next_max_ofs will have to cause a WINDOW_UPDATE to be emitted when it's higher than last_adv_ofs, and once the WU is sent, its value will have to be copied over last_adv_ofs. The problem is, for now wherever we emit a stream WU, we have no notion of stream (the stream might even not exist anymore, e.g. after aborting an upload), because we currently keep a counter of stream window to be acked for the current stream ID (h2c->dsi) in the connection (rcvd_s). Similarly there are a few places early in the frame header processing where rcvd_s is incremented without knowing the stream yet. Thus, lookups will be needed for that, unless such a connection-level counter remains used and poured into the stream's count once known (delicate). Thus for now this commit only creates the fields and initializes them.	2024-10-12 16:29:15 +02:00
Willy Tarreau	560e474cdd	MINOR: mux-h2: split the amount of rx data from the amount to ack We'll need to keep track of the total amount of data received for the current stream, and the amount of data to ack for the current stream, which might soon diverge as soon as we'll have to update the stream's offset with received data, which are different from those to be ACKed. One reason is that in case a stream doesn't exist anymore (e.g. aborted an upload), the rcvd_s info might get lost after updating the stream, so we do need to have an in-connection counter for that. What's done here is that the rcvd_s count is transferred to wu_s in h2c_send_strm_wu(), to be used as the counter to send, and both are considered as sufficient when non-null to call the function.	2024-10-12 16:29:15 +02:00
Willy Tarreau	d288ddb575	CLEANUP: muxes: remove useless inclusion of ebmbtree.h Since 2.7 with commit `8522348482` ("BUG/MAJOR: conn-idle: fix hash indexing issues on idle conns"), we've been using eb64 trees and not ebmb trees anymore, and later we dropped all that to centralize the operations in the server. Let's remove the ebmbtree.h includes from the muxes that do not use them.	2024-10-12 16:29:15 +02:00
Willy Tarreau	cf3fe1eed4	MINOR: mux-h2/traces: print the size of the DATA frames DATA frames produce a special trace with the amount of transferred data in arg4, but this was not reported by h2_trace(). This commit just adds it.	2024-10-12 16:29:15 +02:00
Willy Tarreau	af064b497a	BUG/MINOR: mux-h2/traces: present the correct buffer for trailers errors traces The local "rxbuf" buffer was passed to the trace instead of h2s->rxbuf that is used when decoding trailers. The impact is essentially the impossibility to present some buffer contents in some rare cases. It may be backported but it's unlikely that anyone will ever notice the difference.	2024-10-12 16:29:15 +02:00
Christopher Faulet	001fb1a548	BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only Since `1d2d77b27` ("MEDIUM: mux-h1: Return a 501-not-implemented for upgrade requests with a body"), it is no longer possible to perform a protocol upgrade for requests with a payload. The main reason was to be able to support protocol upgrades for an H1 client requesting an H2 server. In that case, the upgrade request is converted to a CONNECT request, so it is not possible to convey a payload. But it is a problem for anyone wanting to perform upgrades on an H1 server using requests with a payload. It is uncommon but valid. So now it is the H2 multiplexer's responsibility to reject upgrade requests, on the server side, if there is a payload. An INTERNAL_ERROR is returned for the H2S in that case. On the H1 side, the upgrade is now allowed, but only if the server waits for the end of the request to return the 101-Switching-protocol response. Indeed, it is quite hard to synchronise the frontend side and the backend side in that case. Asking servers to fully consume the request payload before returning the response seems reasonable. This patch should fix the issue #2684. It could be backported after a period of observation, as far as 2.4 if possible. But only if it is not too hard. It depends on "MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state".	2024-09-06 09:16:18 +02:00
Willy Tarreau	830e50561c	BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf There exists an extremely tricky code path that was revealed in 3.0 by the glitches feature, though it might theoretically have existed before. TL;DR: a mux mbuf may be full after successfully sending GOAWAY, and discard its remaining contents without clearing H2_CF_MUX_MFULL and H2_CF_DEM_MROOM, then endlessly loop in h2_send(), until the watchdog takes care of it. What can happen is the following: Some data are received, h2_io_cb() is called. h2_recv() is called to receive the incoming data. Then h2_process() is called and in turn calls h2_process_demux() to process input data. At some point, a glitch limit is reached and h2c_error() is called to close the connection. The input frame was incomplete, so some data are left in the demux buffer. Then h2_send() is called, which in turn calls h2_process_mux(), which manages to queue the GOAWAY frame, turning the state to H2_CS_ERROR2. The frame is sent, and h2_process() calls h2_send() a last time (doing nothing) and leaves. The streams are all woken up to notify about the error. Multiple backend streams were waiting to be scheduled and are woken up in turn, before their parents being notified, and communicate with the h2 mux in zero-copy-forward mode, request a buffer via h2_nego_ff(), fill it, and commit it with h2_done_ff(). At some point the mux's output buffer is full, and gets flags H2_CF_MUX_MFULL. The io_cb is called again to process more incoming data. h2_send() isn't called (polled) or does nothing (e.g. TCP socket buffers full). h2_recv() may or may not do anything (doesn't matter). h2_process() is called since some data remain in the demux buf. It goes till the end, where it finds st0 == H2_CS_ERROR2 and clears the mbuf. We're now in a situation where the mbuf is empty and MFULL is still present. 
Then it calls h2_send(), which doesn't call h2_process_mux() due to MFULL, doesn't enter the for() loop since all buffers are empty, then keeps sent=0, which doesn't allow to clear the MFULL flag, and since "done" was not reset, it loops forever there. Note that the glitches make the issue more reproducible but theoretically it could happen with any other GOAWAY (e.g. PROTOCOL_ERROR). What makes it not happen with the data produced on the parsing side is that we process a single buffer of input at once, and there's no way to amplify this to 30 buffers of responses (RST_STREAM, GOAWAY, SETTINGS ACK, WINDOW_UPDATE, PING ACK etc are all quite small), and since the mbuf is cleared upon every exit from h2_process() once the error was sent, it is not possible to accumulate response data across multiple calls. And the regular h2_snd_buf() path checks for st0 >= H2_CS_ERROR so it will not produce any data there either. Probably that h2_nego_ff() should check for H2_CS_ERROR before accepting to deliver a buffer, but this needs to be carefully studied. In the mean time the real problem is that the MFULL flag was kept when clearing the buffer, making the two inconsistent. Since it doesn't seem possible to trigger this sequence without the zero-copy-forward mechanism, this fix needs to be backported as far as 2.9, along with previous commit "MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places" which will strengthen the consistency between these checks. Many thanks to Annika Wickert for her detailed report that allowed to diagnose this problem. CVE-2024-45506 was assigned to this problem.	2024-09-03 14:39:04 +02:00
Willy Tarreau	e9cdedb39b	MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places The code leading to H2_CF_MUX_MFULL and H2_CF_DEM_MROOM being cleared is quite complex and assumptions about its state are extremely difficult when reading the code. There are indeed long sequences where the mux might possibly be empty, still having the flag set until it reaches h2_send() which will clear it after the last send. Even then it's not obvious whether it's always guaranteed to release the flag when invoked in multiple passes. Let's just simplify the condition so that h2_send() does not depend on "sent" anymore and that h2_timeout_task() doesn't leave the flags set on the buffer on emptiness. While it doesn't seem to fix anything, it will make the code more robust against future changes.	2024-09-03 14:39:04 +02:00
Christopher Faulet	4ef5251c44	BUG/MEDIUM: mux-h2: Set ES flag when necessary on 0-copy data forwarding When DATA frames are sent via the 0-copy data forwarding, we must take care to set the ES flag on the last DATA frame. It should be performed in h2_done_ff() when the IOBUF_FL_EOI flag was set by the producer. This flag is here to know when the producer has reached the end of input. When this happens, the h2s state is also updated. It is switched to "half-closed local" or "closed" state depending on its previous state. It is mainly an issue on uploads because the server may be blocked waiting for the end of the request. A workaround is to disable the 0-copy forwarding support for H2 by setting the "tune.h2.zero-copy-fwd-send" directive to off in your global section. This patch should fix the issue #2665. It must be backported as far as 2.9.	2024-08-28 10:05:34 +02:00
Willy Tarreau	23417ab9d4	MINOR: mux-h2/trace: add a state trace on stream creation/destruction Logging below the developer level doesn't always yield very convenient traces as we don't know well where streams are allocated nor released. Let's just make that more explicit by using state-level traces for these important steps.	2024-08-07 16:02:59 +02:00
Willy Tarreau	6c6ef5ae12	MINOR: mux-h2: add a trace context filling helper This helper is able to find a connection, a session, a stream, a frontend or a backend from its args. Note that this required to always make sure that h2s->sess is reset on allocation because it's normally initialized later for backend streams, and producing traces between the two could pre-fill a bad pointer in the trace_ctx.	2024-08-07 16:02:59 +02:00
Willy Tarreau	490cb16d3a	MINOR: mux-h2: implement the debug string for logs Now it permits to have this for a front and a back: <134>Jul 30 19:32:53 haproxy[24405]: 127.0.0.1:64860 [30/Jul/2024:19:32:53.732] test2 test2/s1 0/0/0/0/0 200 130 - - ---- 2/1/0/0/0 0/0 "GET /blah HTTP/2.0" h2s.id=1 .st=CLO .flg=0x7003 .rxbuf=0@(nil)+0/0 .sc=0x1e03fb0(.flg=0x00034482 .app=0x1e04020) .sd=0x1e03f30(.flg=0x50405601) .subs=(nil) h2c.st0=FRH .err=0 .maxid=1 .lastid=-1 .flg=0x100e00 .nbst=0 .nbsc=1, .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=1 .dsi=1 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=(nil) conn.flg=0x80000300 <134>Jul 30 19:32:53 haproxy[24405]: 127.0.0.1:65246 [30/Jul/2024:19:32:53.732] test1 test1/s1 0/0/0/0/0 200 130 - - ---- 2/1/0/0/0 0/0 "GET /blah HTTP/1.1" h2s.id=1 .st=CLO .flg=0x7003 .rxbuf=0@(nil)+0/0 .sc=0x1dfc7b0(.flg=0x0006d01b .app=0x1c65fe0) .sd=0x1dfc820(.flg=0x1040ca01) .subs=(nil) h2c.st0=FRH .err=0 .maxid=1 .lastid=-1 .flg=0x108e00 .nbst=0 .nbsc=1, .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=1 .dsi=1 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=(nil) conn.flg=0x000300 Just with this in the front and back proxies respectively: log-format "$HAPROXY_HTTP_LOG_FMT %[bs.debug_str(15)]" log-format "$HAPROXY_HTTP_LOG_FMT %[fs.debug_str(15)]" For now the mux only implements muxs, muxc, conn. Xprt is ignored.	2024-08-07 14:07:41 +02:00
Christopher Faulet	184f16ded7	BUG/MEDIUM: mux-h2: Propagate term flags to SE on error in h2s_wake_one_stream When a stream is explicitly woken up by the H2 connection, if an error condition is detected, the corresponding error flag is set on the SE. So SE_FL_ERROR or SE_FL_ERR_PENDING, depending on whether the end of stream was reported or not. However, there is no attempt to propagate other termination flags. We must be sure to properly set SE_FL_EOI and SE_FL_EOS when appropriate to be able to switch a pending error to a fatal error. Because of this bug, the SE remains with a pending error and no end of stream, preventing the applicative stream from truly aborting it. It means that in some abort scenarios, it is possible to block a stream infinitely. This patch must be backported at least as far as 2.8. No bug was observed on older versions while the same code is in use.	2024-08-02 08:42:28 +02:00
Willy Tarreau	4de03e42cd	BUG/MAJOR: mux-h2: force a hard error upon short read with pending error A risk of truncated packet was addressed in 2.9 by commit `19fb19976f` ("BUG/MEDIUM: mux-h2: Only Report H2C error on read error if demux buffer is empty") by ignoring CO_FL_ERROR after a recv() call as long as some data remained present in the buffer. However it has a side effect due to the fact that some frame processors only deal with full frames, for example, HEADERS. The side effect is that an incomplete frame will not be processed and will remain in the buffer, preventing the error from being taken into account, so the I/O handler wakes up the H2 parser to handle the error, and that one just subscribes for more data, and this loops forever wasting CPU cycles. Note that this only happens with errors at the SSL layer exclusively, otherwise we'd have a read0 pending that would properly be detected: conn->flags = CO_FL_XPRT_TRACKED | CO_FL_ERROR | CO_FL_XPRT_READY | CO_FL_CTRL_READY conn->err_code = CO_ERR_SSL_FATAL h2c->flags = H2_CF_ERR_PENDING | H2_CF_WINDOW_OPENED | H2_CF_MBUF_HAS_DATA | H2_CF_DEM_IN_PROGRESS | H2_CF_DEM_SHORT_READ The condition to report the error in h2_recv() needs to be refined, so that connection errors are taken into account either when the buffer is empty, or when there's an incomplete frame, since we're certain it will never be completed. We're certain to enter that function because H2_CF_DEM_SHORT_READ implies too short a frame, and earlier there's a protocol check to validate that no frame size is larger than bufsize, hence a H2_CF_DEM_SHORT_READ implies there's some room left in the buffer and we're allowed to try to receive. The condition to reproduce the bug seems super hard to meet but was observed once by Patrick Hemmer who had the reflex to capture lots of information that allowed to explain the problem. 
In order to reproduce it, the SSL code had to be significantly modified to alter received contents at very empirical places, but that was sufficient to reproduce it and confirm that the current patch works as expected. The bug was tagged MAJOR because when it triggers there's no other solution to get rid of it but to restart the process. However given how hard it is to trigger in a lab, it does not seem very likely to occur in the field. This needs to be backported to 2.9.	2024-07-17 15:07:47 +02:00
Christopher Faulet	4b8098bf48	MINOR: connection: No longer include stconn type header in connection-t.h It is a small change, but it is cleaner to no include stconn-t.h header in connection-t.h, mainly to avoid circular definitions. The related issue is #2502.	2024-07-12 15:27:04 +02:00
Willy Tarreau	821a04377d	BUG/MEDIUM: muxes: enforce buf_wait check in takeover() The ->takeover() is quite tricky. It didn't take care of the possibility that the original thread's connection handler had been woken up to handle an event (e.g. read0), failed to get a buffer, registered against its own thread's buffer_wait queue and left the connection in an idle state. A new thread could then come by, perform a takeover(), and when a buffer was available, the new thread's tasklet would be woken up by the old one via _buf_available(), causing all sorts of problems. These problems are easy to reproduce, by running with shared backend connections and few buffers (tune.buffers.limit=20, 8 threads, 500 connections, transfer 64kB objects and wait 2-5s for a crash to appear). A first estimated solution consisted in removing the connection from the idle list but it turns out that it would be worse for the delete stuff (the connection no longer appearing as idle, making it impossible to find it in order to close it). Also, idle counts wouldn't match anymore the list's state, and the special case of private connections could be difficult to handle as the connection could be forcefully re-added to the idle list after allocation despite being private. After multiple attempts to address the problem in various ways, it appears that the only reliable solution for now (without starting to turn many lists to mt_lists) is to have the takeover() function handle the buf_wait detection or unregistration itself: - when doing a regular takeover aiming at finding an idle connection for a new request, connections that are blocked in a buffer_wait queue are quite rare and not interesting at all (since not immediately usable), so skipping them is sufficient. For this we detect that the desired connection belongs to a buffer_wait list by checking its buf_wait.list element. Note that this check is not thread-safe! 
The LIST_DEL_INIT() is performed by __offer_buffers() after the callback was called. But this is sufficient as it is now because the only way for the element to be seen as not in a list is after the element was last touched by __offer_buffers(), so the situation for this connection will not change in a different way later. - when doing a server delete, we're running under thread isolation. The connection might get taken over to be killed. The only trick is that private connections not belonging to any idle list may also experience this, and in this case even the idle_conns lock will not offer any protection against anything. But since we're run under thread isolation, we're certain not to compete with the other thread, so it's safe to directly unregister the connection from its owner thread. Normally this is already handled by conn_release() in cli_parse_delete_server(), which calls mux->destroy(), but this would actually update the current thread's queue instead of the origin thread's, thus we do need to perform an explicit dequeue before completing the takeover. With this, the problem now looks solved for HTTP/1, HTTP/2 and FCGI, though extensive tests were essentially run on HTTP/1 and HTTP/2. While the problem has been there for a very long time, there should be no reason to backport it since buffer_wait didn't practically work before 3.0-dev and the process used to freeze hard very quickly before we'd even have a chance to meet that race.	2024-05-15 19:37:12 +02:00
Willy Tarreau	f5566afec6	MEDIUM: dynbuf: generalize the use of b_dequeue() to detach buffer_wait Now thanks to this the bufq_map field is expected to remain accurate.	2024-05-10 17:18:13 +02:00
Willy Tarreau	a214197ce7	MINOR: dynbuf: use the b_queue()/b_requeue() functions everywhere The code places that were used to manipulate the buffer_wq manually now just call b_queue() or b_requeue(). This will simplify the multiple list management later.	2024-05-10 17:18:13 +02:00
Willy Tarreau	72d0dcda8e	MINOR: dynbuf: pass a criticality argument to b_alloc() The goal is to indicate how critical the allocation is, between the least one (growing an existing buffer ring) and the topmost one (boot time allocation for the life of the process). The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function to try to allocate otherwise subscribe. There's currently no distinction of direction nor part that tries to allocate, and this should be revisited to improve this situation, particularly when we consider that mux-h2 can reduce its Tx allocations if needed. For now, 4 main levels are planned, to translate how the data travels inside haproxy from a producer to a consumer: - MUX_RX: buffer used to receive data from the OS - SE_RX: buffer used to place a transformation of the RX data for a mux, or to produce a response for an applet - CHANNEL: the channel buffer for sync recv - MUX_TX: buffer used to transfer data from the channel to the outside, generally a mux but there can be a few specificities (e.g. http client's response buffer passed to the application, which also gets a transformation of the channel data). The other levels are a bit different in that they don't strictly need to allocate for the first two ones, or they're permanent for the last one (used by compression).	2024-05-10 17:18:13 +02:00
Christopher Faulet	eca9831ec8	MINOR: muxes: Add ctl commands to get info on streams for a connection There are 2 new ctl commands that may be used to retrieve the current number of streams opened for a connection and its limit (the maximum number of streams a mux connection supports). For the PT and H1 muxes, the limit is always 1 and the current number of streams is 0 for idle connections, otherwise 1 is returned. For the H2 and the FCGI muxes, the info is already available in the mux connection. For the QUIC mux, the limit is also directly available. It is the maximum initial sub-ID of bidirectional stream allowed for the connection. For the current number of streams, it is the number of SC attached on the connection and the number of not already attached streams present in the "opening_list" list.	2024-05-06 22:00:00 +02:00
Christopher Faulet	20b156ee15	MEDIUM: mux-h2: Forward h2 client cancellations to h2 servers When a H2 client sends a RST_STREAM(CANCEL) frame to abort a request, the abort reason is now used on server side, in the H2 mux, to set the RST_STREAM code. The main use case is to forward client cancellations to gRPC applications. This patch should fix the issue #172.	2024-05-06 22:00:00 +02:00
Christopher Faulet	dea79f3fe1	MINOR: mux-h2: Set the SE abort reason when a RST_STREAM frame is received When RST_STREAM frame is received, the error code is now saved in the SE abort reason. To do so, we use the H2 source (SE_ABRT_SRC_MUX_H2). For now, this code is only set but not used on the opposite side.	2024-05-06 22:00:00 +02:00
Christopher Faulet	96f8b7ad08	MEDIUM: stconn/muxes: Add an abort reason for SE shutdowns on muxes A reason is now passed as parameter to muxes shutdowns to pass additional info about the abort, if any. No info means no abort or only a generic one. For now, the reason is composed of two 32-bit integers. The first one represents the abort code and the other one represents the info about the code (for instance the source). The code should be interpreted according to the associated info. One info is the source, encoded on 5 bits. Other bits are reserved for now. For now, the muxes are the only supported source. But we can imagine extending it to applets, streams, health-checks... The current design is quite simple and will most probably evolve. But the idea is to let the opposite side forward some errors and let a mux know why its stream was aborted. At first glance, an abort reason must only be evaluated if the SE_SHW_SILENT flag is set. The main goal in the short term is to forward some H2 RST_STREAM codes because it is mandatory for gRPC applications, mainly to forward gRPC cancellation from an H2 client to an H2 server. But we can imagine altering this reason at the applicative level to enrich it. It would also be used to report more accurate errors in logs.	2024-05-06 22:00:00 +02:00
Amaury Denoyelle	65624876f2	MINOR: stats: introduce a more expressive stat definition method Previously, statistics were simply defined as a list of name_desc, as for example "stat_cols_px" for proxy stats. No notion of type was fixed for each stat definition. This correspondence was done individually inside stats_fill_*_line() functions. This makes the process of defining new statistics tedious. Implement a more expressive stat definition method via a new API. A new type "struct stat_col" for stat column to replace name_desc usage is defined. It contains a field to store the stat nature and format. A <cap> field is also defined to be able to define a proxy stat only for certain types of objects. This new type is also further extended to include counter offsets. This allows to define a method to automatically generate a stat value field from a "struct stat_col". This will be the subject of a future commit. The new type "struct stat_col" is fully compatible with name_desc. This allows to gradually convert stats definitions. The focus will first be on proxy counters, to implement statistics preservation on reload.	2024-04-26 10:20:57 +02:00
Christopher Faulet	fbc0850d36	MEDIUM: muxes: Use one callback function to shut a mux stream mux-ops .shutr and .shutw callback functions are merged into a single function, called .shut. The shutdown mode is still passed as argument, muxes are responsible for testing it. Concretely, the .shut() function of each mux is now the content of the old .shutw() followed by the content of the old .shutr().	2024-04-19 16:33:40 +02:00
Christopher Faulet	d2c3f8dde7	MINOR: stconn/connection: Move shut modes at the SE descriptor level CO_SHR_* and CO_SHW_* modes are in fact used by the stream-connectors to instruct the muxes how streams must be shut down. It is then the mux's responsibility to decide if it must be propagated to the connection layer or not. And in this case, the modes above are only tested to pass a boolean (clean or not). So, it is not consistent to still use connection related modes for information set at an upper layer and never used by the connection layer itself. These modes are thus moved to the sedesc level and merged into a single enum. The idea is to add more modes, not necessarily mutually exclusive, to pass more info to the muxes. For now, it is a one-for-one renaming.	2024-04-19 16:24:46 +02:00
Amaury Denoyelle	5e8eb3661b	MEDIUM: mux: prepare for takeover on private connections When a backend connection is marked as idle, a special flag TASK_F_USR1 is set on the MUX tasklet. When the MUX tasklet is reactivated, extra checks are executed under this flag to ensure no takeover occurred in the meantime. Previously, only non-private connections could be targeted by a takeover. However, this will change when implementing private idle connections closure on the "delete server" CLI handler. As such, TASK_F_USR1 is now also set for private connections in MUX detach callbacks.	2024-03-22 17:10:06 +01:00
Amaury Denoyelle	f3862a9bc7	MINOR: connection: extend takeover with release option Extend takeover API both for MUX and XPRT with a new boolean argument <release>. Its purpose is to signal if the connection will be freed immediately after the takeover, rendering new resources allocation unnecessary. For the moment, release argument is always false. However, it will be set to true on delete server CLI handler to proactively close server idle connections.	2024-03-22 16:12:36 +01:00
Amaury Denoyelle	5ad801c058	MINOR: session: rename private conns elements By default, backend connections are attached to a server instance. This allows to implement connection reuse. However, in some particular cases, a connection cannot be shared across several clients. These connections are considered private and are attached to the session instance instead. These private connections are also indexed by the target server to not mix them. All of this is implemented via a dedicated structure previously named struct sess_srv_list. Rename it to struct sess_priv_conns to better reflect its usage. Also rename its internal members and all of the associated functions. This commit is only a renaming, thus no functional impact is expected.	2024-03-14 15:21:02 +01:00
Willy Tarreau	6770259083	MEDIUM: mux-h2: allow to set the glitches threshold to kill a connection Till now it was still needed to write rules to eliminate bad behaving H2 clients, while most of the time it would be desirable to just be able to set a threshold on the level of anomalies on a connection. This is what this patch does. By setting a glitches threshold for frontend and backend, it allows to automatically turn a connection to the error state when the threshold is reached so that the connection dies by itself without having to write possibly complex rules. One subtlety is that we still have the error state being exclusive to the parser's state so this requires the h2c_report_glitches() function to return a status indicating if the threshold was reached or not so that processing can instantly stop and bypass the state update, otherwise the state could be turned back to a valid one (e.g. after parsing CONTINUATION); we should really contemplate the possibility to use H2_CF_ERROR for this. Fortunately there were very few places where a glitch was reported outside of an error path so the changes are quite minor. Now by setting the front value to 1000, a client flooding with short CONTINUATION frames is instantly stopped.	2024-03-11 08:25:08 +01:00
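The threshold mechanism described in the two glitch-reporting commits (an increment passed as argument, and a status returned when the limit is reached) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `demo_conn_glitches` struct and `demo_report_glitch()` function are hypothetical and do not reproduce the real `h2c` layout or `h2c_report_glitch()`, and treating a threshold of 0 as "no limit" is an assumption.

```c
#include <stdint.h>

/* Hypothetical, simplified connection state; the real h2c struct differs. */
struct demo_conn_glitches {
	uint32_t glitches;   /* anomalies counted so far on the connection */
	uint32_t threshold;  /* assumption: 0 means no limit is enforced */
	int in_error;        /* non-zero once the connection must be killed */
};

/* Adds <increment> glitches to the connection. Returns 1 if the threshold
 * was reached (the caller must stop processing immediately), otherwise 0.
 */
static int demo_report_glitch(struct demo_conn_glitches *c, uint32_t increment)
{
	c->glitches += increment;
	if (c->threshold && c->glitches >= c->threshold) {
		c->in_error = 1;
		return 1;
	}
	return 0;
}
```

A caller would check the return value and leave the parsing path before any state update, so that the error state is not overwritten by a later valid state.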
Willy Tarreau	e6e7e1587e	MINOR: mux-h2: always use h2c_report_glitch() The function aims at centralizing counter measures but due to the fact that it only increments the counter by one unit, sometimes it was not used and the value was calculated directly. Let's pass the increment in argument so that it can be used everywhere.	2024-03-11 07:36:56 +01:00
Christopher Faulet	69f15b9a40	CLEANUP: mux-h2: Fix h2s_make_data() comment about the return value 2 return values are specified in the h2s_make_data() function comment. Both are more or less equivalent but the latter is probably more accurate. So, keep the right one and remove the other one. This patch should fix the issue #2175.	2024-02-29 13:57:44 +01:00
Christopher Faulet	081022a0c5	MINOR: muxes/applet: Simplify checks on options to disable zero-copy forwarding Global options to disable zero-copy forwarding are now tested outside callbacks responsible to perform the forwarding itself. It is cleaner this way because we don't try at all zero-copy forwarding if at least one side does not support it. It is equivalent to what was performed before, but it is simpler this way.	2024-02-14 15:41:04 +01:00
Christopher Faulet	e2921ffad1	MINOR: muxes: Announce support for zero-copy forwarding on consumer side It is unused for now, but the muxes announce their support of the zero-copy forwarding on consumer side. All muxes, except the fcgi one, support it.	2024-02-14 15:15:10 +01:00
Willy Tarreau	870e2d3f1f	MEDIUM: mux-h2: update session trackers with number of glitches We now update the session's tracked counters with the observed glitches. In order to avoid incurring a high cost, e.g. if many small frames contain issues, we batch the updates around h2_process_demux() by directly passing the difference. Indeed, for now all functions that increment glitches are called from h2_process_demux(). If that were to change, we'd just need to keep the value of the last synced counter in the h2c struct instead of the stack. The regtest was updated to verify that the 3rd client that does not cause issue still sees the counter resulting from client 2's mistakes. The rate is also verified, considering it shouldn't fail since the period is very long (1m).	2024-02-08 15:51:49 +01:00
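The batching idea in this commit (snapshot the counter, run the demux, then push only the difference to the tracked counters) might look like the sketch below. All names here (`demo_conn`, `demo_tracker`, `demo_parse_frames()`, `demo_process_demux()`) are hypothetical stand-ins; the real h2_process_demux() and the session tracking code are considerably more involved.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the connection and the session's tracked counters. */
struct demo_conn {
	uint32_t glitches;        /* per-connection glitch counter */
};

struct demo_tracker {
	uint32_t glitch_cnt;      /* cumulated glitches seen by the tracker */
	unsigned int updates;     /* number of (costly) tracker updates performed */
};

/* Simulates frame parsing: each bad frame increments the connection counter. */
static void demo_parse_frames(struct demo_conn *c, unsigned int bad_frames)
{
	c->glitches += bad_frames;
}

/* Wraps the parsing and updates the tracker at most once per call. */
static void demo_process_demux(struct demo_conn *c, struct demo_tracker *t,
                               unsigned int bad_frames)
{
	uint32_t before = c->glitches;   /* last synced value, kept on the stack */

	demo_parse_frames(c, bad_frames);

	if (c->glitches != before) {
		/* a single update carrying the whole difference */
		t->glitch_cnt += c->glitches - before;
		t->updates++;
	}
}
```

If glitches could be reported from outside this function, the snapshot would have to live in the connection rather than on the stack, as the commit message notes.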
Willy Tarreau	9f3a0834d8	MINOR: mux-h2: count late reduction of INITIAL_WINDOW_SIZE as a glitch It's quite uncommon for a client to decide to change the connection's initial window size after the settings exchange phase, unless it tries to increase it. One of the impacts is that it updates all streams, so it can be expensive, depending on the stacks, and may even be used to construct an attack. For this reason, we now count a glitch when this happens. A test with h2spec shows that it triggers 9 across a full test.	2024-02-08 15:51:49 +01:00
Willy Tarreau	28dfd006ca	MINOR: mux-h2: count excess of CONTINUATION frames as a glitch Here we consider that if a HEADERS frame is made of more than 4 fragments whose average size is lower than 1kB, that's very likely an abuse so we count a glitch per 16 fragments, which means 1 glitch per 1kB frame in a 16kB buffer. This means that an abuser sending 1600 1-byte frames would increase the counter by 100, and that sending 100 headers per request in individual frames each results in a count of ~7 to be added per request. A test consisting in sending 100M requests made of 101 frames each over a connection resulted in ~695M glitches to be counted for this connection. Note that no special care is taken to avoid wrapping since it already takes a very long time to reach 100M and there's no particular impact of wrapping here (roughly 1M/s).	2024-02-08 15:51:49 +01:00
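The counting rule in this commit message can be worked through with a small sketch. It is not taken from the HAProxy source: `demo_continuation_glitches()` and its parameters are hypothetical, and rounding the fragment count up to the next group of 16 is an assumption chosen because it matches the figures quoted in the message (100 for 1600 one-byte frames, about 7 for a request made of 101 frames).

```c
#include <stddef.h>

/* Hypothetical helper, not the HAProxy implementation.
 * Returns the number of glitches to count for a headers block received
 * as <fragments> frames totalling <total_len> bytes.
 */
static unsigned int demo_continuation_glitches(unsigned int fragments, size_t total_len)
{
	/* 4 fragments or fewer are never considered abusive */
	if (fragments <= 4)
		return 0;

	/* an average fragment size of 1kB or more is considered normal */
	if (total_len >= (size_t)fragments * 1024)
		return 0;

	/* one glitch per started group of 16 fragments (rounding up is assumed) */
	return (fragments + 15) / 16;
}
```

With these assumptions, 1600 fragments give (1600 + 15) / 16 = 100 and 101 fragments give (101 + 15) / 16 = 7 in integer arithmetic.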
Willy Tarreau	eeacca75d1	BUG/MINOR: mux-h2: count rejected DATA frames against the connection's flow control RFC9113 clarified a point regarding the payload from DATA frames sent to closed streams. It must always be counted against the connection's flow control. In practice it should really have no practical effect, but if repeated upload attempts are aborted, this might cause the client's window to progressively shrink since not being ACKed. It's probably not necessary to backport this, unless another patch depends on it.	2024-02-08 15:51:49 +01:00
Christopher Faulet	2297f52734	MINOR: stconn: Add support for flags during zero-copy forwarding negotiation During zero-copy forwarding negotiation, a pseudo flag was already used to notify the consumer if the producer is able to use kernel splicing or not. But this was not extensible. So, now we use a true bitfield to be able to pass flags during the negotiation. NEGO_FF_FL_* flags may be used now. Of course, for now, there is only one flag, the kernel splicing support on producer side (NEGO_FF_FL_MAY_SPLICE).	2024-02-07 15:04:29 +01:00
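The move from a single pseudo flag to a bitfield can be illustrated as follows. Only the name NEGO_FF_FL_MAY_SPLICE comes from the commit message; its numeric value here is an assumption, and `DEMO_FF_FL_OTHER` and `demo_may_splice()` are hypothetical additions that only show how further flags could coexist.

```c
/* Name taken from the commit message; the value is an assumption. */
#define NEGO_FF_FL_MAY_SPLICE 0x00000001u

/* Hypothetical second flag, only here to show that the bitfield is extensible. */
#define DEMO_FF_FL_OTHER      0x00000002u

/* Returns non-zero if the producer announced kernel splicing support. */
static int demo_may_splice(unsigned int flags)
{
	return (flags & NEGO_FF_FL_MAY_SPLICE) != 0;
}
```

Because each flag owns its own bit, new flags can be added without changing how existing ones are tested.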
Christopher Faulet	3246f863d6	MEDIUM: stats: Be able to access a specific field into a stats module It is now possible to selectively retrieve extra counters from stats modules. H1, H2, QUIC and H3 fill_stats() callback functions are updated to return a specific counter.	2024-02-01 12:00:53 +01:00
Willy Tarreau	d2b44fd730	MINOR: mux-h2: implement MUX_CTL_GET_GLITCHES This reports the number of glitches on a connection.	2024-01-18 17:21:44 +01:00
Willy Tarreau	3d4438484a	MINOR: mux-h2: add a counter of "glitches" on a connection There are a lot of H2 events which are not invalid from a protocol perspective but which are yet anomalies, especially when repeated. They can come from bogus or really poorly implemented clients, as well as purposely built attacks, as we've seen in the past with various waves of attempts at abusing H2 stacks. In order to better deal with such situations, it would be nice to be able to sort out what is correct and what is not. There's already the HTTP error counter that may even be updated on a tracked connection, but HTTP errors are something clearly defined while there's an entire scope of gray area around it that should not fall into it. This patch introduces the notion of "glitches", which normally correspond to unexpected and temporary malfunction. And this is exactly what we'd like to monitor. For example a peer is not misbehaving if a request it sends fails to decode because due to HPACK compression it's larger than a buffer, and for this reason such an event is reported as a stream error and not a connection error. But this causes trouble nonetheless and should be accounted for, especially to detect if it's repeated. Similarly, a truncated preamble or settings frame may very well be caused by a network hiccup but how do we know that in the logs? For such events, a glitch counter is incremented on the connection. For now a total of 41 locations were instrumented with this and the counter is reported in the traces when not null, as well as in "show sess" and "show fd". This was done using a new function, "h2c_report_glitch()" so that it becomes easier to extend to more advanced processing (applying thresholds, producing logs, escalating to connection error, tracking etc). A test with h2spec shows it reported in 8545 trace lines for 147 tests, with some reaching value 3 in the same test (e.g. HPACK errors).
Some places were not instrumented, typically anything that can be triggered on perfectly valid activity (received data after RST being emitted, timeouts, etc). Some types of events were thought about, such as INITIAL_WINDOW_SIZE after the first SETTINGS frame, too small window update increments, etc. It just sounds too early to know if those are currently being triggered by perfectly legit clients. Also it's currently not incremented on timeouts so that we don't do that repeatedly on short keep-alive timeouts, though it could make sense. This may change in the future depending on how it's used. For now this is not exposed outside of traces and debugging.	2024-01-18 17:21:44 +01:00
Willy Tarreau	87b74697cd	MINOR: mux-h2/traces: add a missing trace on connection WU with negative inc The test was performed but no trace emitted, which can complicate certain diagnostics, so let's just add the trace for this rare case. It may safely be backported though this is really not important.	2024-01-18 17:21:44 +01:00

956 Commits