haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-07 07:37:02 +02:00

Author	SHA1	Message	Date
Willy Tarreau	821a04377d	BUG/MEDIUM: muxes: enforce buf_wait check in takeover() The ->takeover() is quite tricky. It didn't take care of the possibility that the original thread's connection handler had been woken up to handle an event (e.g. read0), failed to get a buffer, registered against its own thread's buffer_wait queue and left the connection in an idle state. A new thread could then come by, perform a takeover(), and when a buffer was available, the new thread's tasklet would be woken up by the old one via _buf_available(), causing all sort of problems. These problems are easy to reproduce, by running with shared backend connections and few buffers (tune.buffers.limit=20, 8 threads, 500 connections, transfer 64kB objects and wait 2-5s for a crash to appear). A first estimated solution consisted in removing the connection from the idle list but it turns out that it would be worse for the delete stuff (the connection no longer appearing as idle, making it impossible to find it in order to close it). Also, idle counts wouldn't match anymore the list's state, and the special case of private connections could be difficult to handle as the connection could be forcefully re-added to the idle list after allocation despite being private. After multiple attempts to address the problem in various ways, it appears that the only reliable solution for now (without starting to turn many lists to mt_lists) is to have the takeover() function handle the buf_wait detection or unregistration itself: - when doing a regular takeover aiming at finding an idle connection for a new request, connections that are blocked in a buffer_wait queue are quite rare and not interesting at all (since not immediately usable), so skipping them is sufficient. For this we detect that the desired connection belongs to a buffer_wait list by checking its buf_wait.list element. Note that this check is not* thread-safe! The LIST_DEL_INIT() is performed by __offer_buffers() after the callback was called. But this is sufficient as it is now because the only way for the element to be seen as not in a list is after the element was last touched by __offer_buffers(), so the situation for this connection will not change in a different way later. - when doing a server delete, we're running under thread isolation. The connection might get taken over to be killed. The only trick is that private connections not belonging to any idle list may also experience this, and in this case even the idle_conns lock will not offer any protection against anything. But since we're run under thread isolation, we're certain not to compete with the other thread, so it's safe to directly unregister the connection from its owner thread. Normally this is already handled by conn_release() in cli_parse_delete_server(), which calls mux->destroy(), but this would actually update the current thread's queue instead of the origin thread's, thus we do need to perform an explicit dequeue before completing the takeover. With this, the problem now looks solved for HTTP/1, HTTP/2 and FCGI, though extensive tests were essentially run on HTTP/1 and HTTP/2. While the problem has been there for a very long time, there should be no reason to backport it since buffer_wait didn't practically work before 3.0-dev and the process used to freeze hard very quickly before we'd even have a chance to meet that race.	2024-05-15 19:37:12 +02:00
Willy Tarreau	f5566afec6	MEDIUM: dynbuf: generalize the use of b_dequeue() to detach buffer_wait Now thanks to this the bufq_map field is expected to remain accurate.	2024-05-10 17:18:13 +02:00
Willy Tarreau	a214197ce7	MINOR: dynbuf: use the b_queue()/b_requeue() functions everywhere The code places that were used to manipulate the buffer_wq manually now just call b_queue() or b_requeue(). This will simplify the multiple list management later.	2024-05-10 17:18:13 +02:00
Willy Tarreau	72d0dcda8e	MINOR: dynbuf: pass a criticality argument to b_alloc() The goal is to indicate how critical the allocation is, between the least one (growing an existing buffer ring) and the topmost one (boot time allocation for the life of the process). The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function to try to allocate otherwise subscribe. There's currently no distinction of direction nor part that tries to allocate, and this should be revisited to improve this situation, particularly when we consider that mux-h2 can reduce its Tx allocations if needed. For now, 4 main levels are planned, to translate how the data travels inside haproxy from a producer to a consumer: - MUX_RX: buffer used to receive data from the OS - SE_RX: buffer used to place a transformation of the RX data for a mux, or to produce a response for an applet - CHANNEL: the channel buffer for sync recv - MUX_TX: buffer used to transfer data from the channel to the outside, generally a mux but there can be a few specificities (e.g. http client's response buffer passed to the application, which also gets a transformation of the channel data). The other levels are a bit different in that they don't strictly need to allocate for the first two ones, or they're permanent for the last one (used by compression).	2024-05-10 17:18:13 +02:00
Christopher Faulet	eca9831ec8	MINOR: muxes: Add ctl commands to get info on streams for a connection There are 2 new ctl commands that may be used to retrieve the current number of streams openned for a connection and its limit (the maximum number of streams a mux connection supports). For the PT and H1 muxes, the limit is always 1 and the current number of streams is 0 for idle connections, otherwise 1 is returned. For the H2 and the FCGI muxes, info are already available in the mux connection. For the QUIC mux, the limit is also directly available. It is the maximum initial sub-ID of bidirectional stream allowed for the connection. For the current number of streams, it is the number of SC attached on the connection and the number of not already attached streams present in the "opening_list" list.	2024-05-06 22:00:00 +02:00
Christopher Faulet	20b156ee15	MEDIUM: mux-h2: Forward h2 client cancellations to h2 servers When a H2 client sends a RST_STREAM(CANCEL) frame to abort a request, the abort reason is now used on server side, in the H2 mux, to set the RST_STREAM code. The main use case is to forward client cancellations to gRPC applications. This patch should fix the issue #172.	2024-05-06 22:00:00 +02:00
Christopher Faulet	dea79f3fe1	MINOR: mux-h2: Set the SE abort reason when a RST_STREAM frame is received When RST_STREAM frame is received, the error code is now saved in the SE abort reason. To do so, we use the H2 source (SE_ABRT_SRC_MUX_H2). For now, this code is only set but not used on the opposite side.	2024-05-06 22:00:00 +02:00
Christopher Faulet	96f8b7ad08	MEDIUM: stconn/muxes: Add an abort reason for SE shutdowns on muxes A reason is now passed as parameter to muxes shutdowns to pass additional info about the abort, if any. No info means no abort or only generic one. For now, the reason is composed of 2 32-bits integer. The first on represents the abort code and the other one represents the info about the code (for instance the source). The code should be interpreted according to the associated info. One info is the source, encoding on 5 bits. Other bits are reserverd for now. For now, the muxes are the only supported source. But we can imagine to extend it to applets, streams, health-checks... The current design is quite simple and will most probably evolved.. But the idea is to let the opposite side forward some errors and let's a mux know why its stream was aborted. At first glance, a abort reason must only be evaluated if SE_SHW_SILENT flag is set. The main goal at short term, is to forward some H2 RST_STREAM codes because it is mandatory for gRPC applications, mainly to forward gRPC cancellation from an H2 client to an H2 server. But we can imagine to alter this reason at the applicative level to enrich it. It would also be used to report more accurate errors in logs.	2024-05-06 22:00:00 +02:00
Amaury Denoyelle	65624876f2	MINOR: stats: introduce a more expressive stat definition method Previously, statistics were simply defined as a list of name_desc, as for example "stat_cols_px" for proxy stats. No notion of type was fixed for each stat definition. This correspondance was done individually inside stats_fill_*_line() functions. This renders the process to define new statistics tedious. Implement a more expressive stat definition method via a new API. A new type "struct stat_col" for stat column to replace name_desc usage is defined. It contains a field to store the stat nature and format. A <cap> field is also defined to be able to define a proxy stat only for certain type of objects. This new type is also further extended to include counter offsets. This allows to define a method to automatically generate a stat value field from a "struct stat_col". This will be the subject of a future commit. New type "struct stat_col" is fully compatible full name_desc. This allows to gradually convert stats definition. The focus will be first for proxies counters to implement statistics preservation on reload.	2024-04-26 10:20:57 +02:00
Christopher Faulet	fbc0850d36	MEDIUM: muxes: Use one callback function to shut a mux stream mux-ops .shutr and .shutw callback functions are merged into a unique functions, called .shut. The shutdown mode is still passed as argument, muxes are responsible to test it. Concretly, .shut() function of each mux is now the content of the old .shutw() followed by the content of the old .shutr().	2024-04-19 16:33:40 +02:00
Christopher Faulet	d2c3f8dde7	MINOR: stconn/connection: Move shut modes at the SE descriptor level CO_SHR_* and CO_SHW_* modes are in fact used by the stream-connectors to instruct the muxes how streams must be shut done. It is then the mux responsibility to decide if it must be propagated to the connection layer or not. And in this case, the modes above are only tested to pass a boolean (clean or not). So, it is not consistant to still use connection related modes for information set at an upper layer and never used by the connection layer itself. These modes are thus moved at the sedesc level and merged into a single enum. Idea is to add more modes, not necessarily mutually exclusive, to pass more info to the muxes. For now, it is a one-for-one renaming.	2024-04-19 16:24:46 +02:00
Amaury Denoyelle	5e8eb3661b	MEDIUM: mux: prepare for takeover on private connections When a backend connection is marked as idle, a special flag TASK_F_USR1 is set on MUX tasklet. When MUX tasklet is reactivated, extra checks are executed under this flag to ensure no takeover occurred in the meantime. Previously, only non private connections could be targetted by a takeover. However, this will change when implementing private idle connections closure on "delete server" CLI handler. As such, TASK_F_USR1 is now also set for private connections in MUX detach callbacks.	2024-03-22 17:10:06 +01:00
Amaury Denoyelle	f3862a9bc7	MINOR: connection: extend takeover with release option Extend takeover API both for MUX and XPRT with a new boolean argument <release>. Its purpose is to signal if the connection will be freed immediately after the takeover, rendering new resources allocation unnecessary. For the moment, release argument is always false. However, it will be set to true on delete server CLI handler to proactively close server idle connections.	2024-03-22 16:12:36 +01:00
Amaury Denoyelle	5ad801c058	MINOR: session: rename private conns elements By default, backend connections are attached to a server instance. This allows to implement connection reuse. However, in some particular cases, connection cannot be shared accross several clients. These connections are considered and private and are attached to the session instance instead. These private connections are also indexed by the target server to not mix them. All of this is implemented via a dedicated structure previously named struct sess_srv_list. Rename it to better reflect its usage to struct sess_priv_conns. Also rename its internal members and all of the associated functions. This commit is only a renaming, thus no functional impact is expected.	2024-03-14 15:21:02 +01:00
Willy Tarreau	6770259083	MEDIUM: mux-h2: allow to set the glitches threshold to kill a connection Till now it was still needed to write rules to eliminate bad behaving H2 clients, while most of the time it would be desirable to just be able to set a threshold on the level of anomalies on a connection. This is what this patch does. By setting a glitches threshold for frontend and backend, it allows to automatically turn a connection to the error state when the threshold is reached so that the connection dies by itself without having to write possibly complex rules. One subtlety is that we still have the error state being exclusive to the parser's state so this requires the h2c_report_glitches() function to return a status indicating if the threshold was reached or not so that processing can instantly stop and bypass the state update, otherwise the state could be turned back to a valid one (e.g. after parsing CONTINUATION); we should really contemplate the possibility to use H2_CF_ERROR for this. Fortunately there were very few places where a glitch was reported outside of an error path so the changes are quite minor. Now by setting the front value to 1000, a client flooding with short CONTINUATION frames is instantly stopped.	2024-03-11 08:25:08 +01:00
Willy Tarreau	e6e7e1587e	MINOR: mux-h2: always use h2c_report_glitch() The function aims at centralizing counter measures but due to the fact that it only increments the counter by one unit, sometimes it was not used and the value was calculated directly. Let's pass the increment in argument so that it can be used everywhere.	2024-03-11 07:36:56 +01:00
Christopher Faulet	69f15b9a40	CLEANUP: mux-h2: Fix h2s_make_data() comment about the return value 2 return values are specified in the h2s_make_data() function comment. Both are more or less equivalent but the later is probably more accurate. So, keep the right one and remove the other one. This patch should fix the issue #2175.	2024-02-29 13:57:44 +01:00
Christopher Faulet	081022a0c5	MINOR: muxes/applet: Simplify checks on options to disable zero-copy forwarding Global options to disable for zero-copy forwarding are now tested outside callbacks responsible to perform the forwarding itself. It is cleaner this way because we don't try at all zero-copy forwarding if at least one side does not support it. It is equivalent to what was performed before, but it is simplier this way.	2024-02-14 15:41:04 +01:00
Christopher Faulet	e2921ffad1	MINOR: muxes: Announce support for zero-copy forwarding on consumer side It is unused for now, but the muxes announce their support of the zero-copy forwarding on consumer side. All muxes, except the fgci one, are supported it.	2024-02-14 15:15:10 +01:00
Willy Tarreau	870e2d3f1f	MEDIUM: mux-h2: update session trackers with number of glitches We now update the session's tracked counters with the observed glitches. In order to avoid incurring a high cost, e.g. if many small frames contain issues, we batch the updates around h2_process_demux() by directly passing the difference. Indeed, for now all functions that increment glitches are called from h2_process_demux(). If that were to change, we'd just need to keep the value of the last synced counter in the h2c struct instead of the stack. The regtest was updated to verify that the 3rd client that does not cause issue still sees the counter resulting from client 2's mistakes. The rate is also verified, considering it shouldn't fail since the period is very long (1m).	2024-02-08 15:51:49 +01:00
Willy Tarreau	9f3a0834d8	MINOR: mux-h2: count late reduction of INITIAL_WINDOW_SIZE as a glitch It's quite uncommon for a client to decide to change the connection's initial window size after the settings exchange phase, unless it tries to increase it. One of the impacts depending is that it updates all streams, so it can be expensive, depending on the stacks, and may even be used to construct an attack. For this reason, we now count a glitch when this happens. A test with h2spec shows that it triggers 9 across a full test.	2024-02-08 15:51:49 +01:00
Willy Tarreau	28dfd006ca	MINOR: mux-h2: count excess of CONTINUATION frames as a glitch Here we consider that if a HEADERS frame is made of more than 4 fragments whose average size is lower than 1kB, that's very likely an abuse so we count a glitch per 16 fragments, which means 1 glitch per 1kB frame in a 16kB buffer. This means that an abuser sending 1600 1-byte frames would increase the counter by 100, and that sending 100 headers per request in individual frames each results in a count of ~7 to be added per request. A test consisting in sending 100M requests made of 101 frames each over a connection resulted in ~695M glitches to be counted for this connection. Note that no special care is taken to avoid wrapping since it already takes a very long time to reach 100M and there's no particular impact of wrapping here (roughly 1M/s).	2024-02-08 15:51:49 +01:00
Willy Tarreau	eeacca75d1	BUG/MINOR: mux-h2: count rejected DATA frames against the connection's flow control RFC9113 clarified a point regarding the payload from DATA frames sent to closed streams. It must always be counted against the connection's flow control. In practice it should really have no practical effect, but if repeated upload attempts are aborted, this might cause the client's window to progressively shrink since not being ACKed. It's probably not necessary to backport this, unless another patch depends on it.	2024-02-08 15:51:49 +01:00
Christopher Faulet	2297f52734	MINOR: stconn: Add support for flags during zero-copy forwarding negotiation During zero-copy forwarding negotiation, a pseudo flag was already used to notify the consummer if the producer is able to use kernel splicing or not. But this was not extensible. So, now we use a true bitfield to be able to pass flags during the negotiation. NEGO_FF_FL_* flags may be used now. Of course, for now, there is only one flags, the kernel splicing support on producer side (NEGO_FF_FL_MAY_SPLICE).	2024-02-07 15:04:29 +01:00
Christopher Faulet	3246f863d6	MEDIUM: stats: Be able to access a specific field into a stats module It is now possible to selectively retrieve extra counters from stats modules. H1, H2, QUIC and H3 fill_stats() callback functions are updated to return a specific counter.	2024-02-01 12:00:53 +01:00
Willy Tarreau	d2b44fd730	MINOR: mux-h2: implement MUX_CTL_GET_GLITCHES This reports the number of glitches on a connection.	2024-01-18 17:21:44 +01:00
Willy Tarreau	3d4438484a	MINOR: mux-h2: add a counter of "glitches" on a connection There are a lot of H2 events which are not invalid from a protocol perspective but which are yet anomalies, especially when repeated. They can come from bogus or really poorly implemlented clients, as well as purposely built attacks, as we've seen in the past with various waves of attempts at abusing H2 stacks. In order to better deal with such situations, it would be nice to be able to sort out what is correct and what is not. There's already the HTTP error counter that may even be updated on a tracked connection, but HTTP errors are something clearly defined while there's an entire scope of gray area around it that should not fall into it. This patch introduces the notion of "glitches", which normally correspond to unexpected and temporary malfunction. And this is exactly what we'd like to monitor. For example a peer is not misbehaving if a request it sends fails to decode because due to HPACK compression it's larger than a buffer, and for this reason such an event is reported as a stream error and not a connection error. But this causes trouble nonetheless and should be accounted for, especially to detect if it's repeated. Similarly, a truncated preamble or settings frame may very well be caused by a network hiccup but how do we know that in the logs? For such events, a glitch counter is incremented on the connection. For now a total of 41 locations were instrumented with this and the counter is reported in the traces when not null, as well as in "show sess" and "show fd". This was done using a new function, "h2c_report_glitch()" so that it becomes easier to extend to more advanced processing (applying thresholds, producing logs, escalating to connection error, tracking etc). A test with h2spec shows it reported in 8545 trace lines for 147 tests, with some reaching value 3 in a same test (e.g. HPACK errors). Some places were not instrumented, typically anything that can be triggered on perfectly valid activity (received data after RST being emitted, timeouts, etc). Some types of events were thought about, such as INITIAL_WINDOW_SIZE after the first SETTINGS frame, too small window update increments, etc. It just sounds too early to know if those are currently being triggered by perfectly legit clients. Also it's currently not incremented on timeouts so that we don't do that repeatedly on short keep-alive timeouts, though it could make sense. This may change in the future depending on how it's used. For now this is not exposed outside of traces and debugging.	2024-01-18 17:21:44 +01:00
Willy Tarreau	87b74697cd	MINOR: mux-h2/traces: add a missing trace on connection WU with negative inc The test was performed but no trace emitted, which can complicate certain diagnostics, so let's just add the trace for this rare case. It may safely be backported though this is really not important.	2024-01-18 17:21:44 +01:00
Willy Tarreau	e1c8bfd0ed	BUG/MEDIUM: mux-h2: refine connection vs stream error on headers Commit `7021a8c4d8` ("BUG/MINOR: mux-h2: also count streams for refused ones") addressed stream counting issues on some error cases but not completely correctly regarding the conn_err vs stream_err case. Indeed, contrary to the initial analysis, h2c_dec_hdrs() can set H2_CS_ERROR when facing some unrecoverable protocol errors, and it's not correct to send it to strm_err which will only send the RST_STREAM frame and the subsequent GOAWAY frame is in fact the result of the read timeout. The difficulty behind this lies on the sequence of output validations because h2c_dec_hdrs() returns two results at once: - frame processing status (done/incomplete/failed) - connection error status The original ordering requires to write 2 exemplaries of the exact same error handling code disposed differently, which the patch above tried to factor to one. After careful inspection of h2c_dec_hdrs() and its comments, it's clear that it always returns -1 on failure, including connection errors. This means we can rearrange the test to get rid of the missing data first, and immediately enter the no-return zone where both the stream and connection errors can be checked at the same place, making sure to consistently maintain error counters. This is way better because we don't have to update stream counters on the error path anymore. h2spec now passes the test much faster. This will need to be backported to the same branches as the commit above, which was already backported to 2.9.	2024-01-18 17:21:02 +01:00
Willy Tarreau	7021a8c4d8	BUG/MINOR: mux-h2: also count streams for refused ones There are a few places where we can reject an incoming stream based on technical errors such as decoded headers that are too large for the internal buffers, or memory allocation errors. In this case we send an RST_STREAM to abort the request, but the total stream counter was not incremented. That's not really a problem, until one starts to try to enforce a total stream limit using tune.h2.fe.max-total-streams, and which will not count such faulty streams. Typically a client that learns too large cookies and tries to replay them in a way that overflows the maximum buffer size would be rejected and depending on how they're implemented, they might retry forever. This patch removes the stream count increment from h2s_new() and moves it instead to the calling functions, so that it translates the decision to process a new stream instead of a successfully decoded stream. The result is that such a bogus client will now be blocked after reaching the total stream limit. This can be validated this way: global tune.h2.fe.max-total-streams 128 expose-experimental-directives trace h2 sink stdout trace h2 level developer trace h2 verbosity complete trace h2 start now frontend h bind :8080 mode http redirect location / Sending this will fill frames with 15972 bytes of cookie headers that expand to 16500 for storage+index once decoded, causing "message too large" events: (dev/h2/mkhdr.sh -t p;dev/h2/mkhdr.sh -t s; for sid in {0..1000}; do dev/h2/mkhdr.sh -t h -i $((sid*2+1)) -f es,eh \ -R "828684410f7777772e6578616d706c652e636f6d \ $(for i in {1..66}; do echo -n 60 7F 73 433d $(for j in {1..24}; do echo -n 2e313233343536373839; done); done) "; done) \| nc 0 8080 Now it properly stops after sending 128 streams. This may be backported wherever commit `983ac4397` ("MINOR: mux-h2: support limiting the total number of H2 streams per connection") is present, since without it, that commit is less effective.	2024-01-12 18:59:59 +01:00
Willy Tarreau	e19334a343	CLEANUP: mux-h2: remove the printfs from previous commit on h2 streams limit. After thinking about them all the time at the end, I managed to remove them while editing the commit and to forget to push them :-(	2024-01-05 19:19:10 +01:00
Willy Tarreau	983ac4397d	MINOR: mux-h2: support limiting the total number of H2 streams per connection This patch introduces a new setting: tune.h2.fe.max-total-streams. It sets the HTTP/2 maximum number of total streams processed per incoming connection. Once this limit is reached, HAProxy will send a graceful GOAWAY frame informing the client that it will close the connection after all pending streams have been closed. In practice, clients tend to close as fast as possible when receiving this, and to establish a new connection for next requests. Doing this is sometimes useful and desired in situations where clients stay connected for a very long time and cause some imbalance inside a farm. For example, in some highly dynamic environments, it is possible that new load balancers are instantiated on the fly to adapt to a load increase, and that once the load goes down they should be stopped without breaking established connections. By setting a limit here, the connections will have a limited lifetime and will be frequently renewed, with some possibly being established to other nodes, so that existing resources are quickly released. The default value is zero, which enforces no limit beyond those implied by the protocol (2^30 ~= 1.07 billion). Values around 1000 were found to already cause frequent enough connection renewal without causing any perceptible latency to most clients. One notable exception here is h2load which reports errors for all requests that were expected to be sent over a given connection after it receives a GOAWAY. This is an already known limitation: https://github.com/nghttp2/nghttp2/issues/981 The patch was made in two parts inside h2_frt_handle_headers(): - the first one, at the end of the function, which verifies if the configured limit was reached and if it's needed to emit a GOAWAY ; - the second, just before decoding the stream frame, which verifies if a previously configured limit was ignored by the client, and closes the connection if this happens. Indeed, one reason for a connection to stay alive for too long definitely comes from a stupid bot that periodically fetches the same resource, scans lots of URLs or tries to brute-force something. These ones are more likely to just ignore the last stream ID advertised in GOAWAY than a regular browser, or a well-behaving client such as curl which respects it. So in order to make sure we can close the connection we need to enforce the advertised limit. Note that a regular client will not face a problem with that because in the worst case it will have max_concurrent_streams in flight and this limit is taken into account when calculating the advertised last acceptable stream ID. Just a note: it may also be possible to move the first part above to h2s_frt_stream_new() instead so that it's not processed for trailers, though it doesn't seem to be more interesting, first because it has two return points. This is something that may be backported to 2.9 and 2.8 to offer more control to those dealing with dynamic infrastructures, especially since for now we cannot force a connection to be cleanly closed using rules (e.g. github issues #946, #2146).	2024-01-05 18:49:11 +01:00
Christopher Faulet	d9eb6d6680	BUG/MEDIUM: mux-h2: Don't report error on SE for closed H2 streams An error on the H2 connection was always reported as an error to the stream-endpoint descriptor, independently on the H2 stream state. But it is a bug to do so for closed streams. And indeed, it leads to report "SD--" termination state for some streams while the response was fully received and forwarded to the client, at least for the backend side point of view. Now, errors are no longer reported for H2 streams in closed state. This patch is related to the three previous ones: * "BUG/MEDIUM: mux-h2: Don't report error on SE for closed H2 streams" * "BUG/MEDIUM: mux-h2: Don't report error on SE if error is only pending on H2C" * "BUG/MEDIUM: mux-h2: Only Report H2C error on read error if demux buffer is empty" The series should fix a bug reported in issue #2388 (#2388#issuecomment-1855735144). The series should be backported to 2.9 but only after a period of observation. In theory, older versions are also affected but this part is pretty sensitive. So don't backport it further except if someone ask for it.	2023-12-18 21:15:32 +01:00
Christopher Faulet	580ffd6123	BUG/MEDIUM: mux-h2: Don't report error on SE if error is only pending on H2C In h2s_wake_one_stream(), we must not report an error on the stream-endpoint descriptor if the error is not definitive on the H2 connection. A pending error on the H2 connection means there are potentially remaining data to be demux. It is important to not truncate a message for a stream. This patch is part of a series that should fix a bug reported in issue #2388 (#2388#issuecomment-1855735144). Backport instructions will be shipped in the last commit of the series.	2023-12-18 21:15:32 +01:00
Christopher Faulet	19fb19976f	BUG/MEDIUM: mux-h2: Only Report H2C error on read error if demux buffer is empty It is similar to the previous fix ("BUG/MEDIUM: mux-h2: Don't report H2C error on read error if dmux buffer is not empty"), but on receive side. If the demux buffer is not empty, an error on the TCP connection must not be immediately reported as an error on the H2 connection. We must be sure to have tried to demux all data first. Otherwise, messages for one or more streams may be truncated while all data were already received and are waiting to be demux. This patch is part of a series that should fix a bug reported in issue #2388 (#2388#issuecomment-1855735144). Backport instructions will be shipped in the last commit of the series.	2023-12-18 21:15:32 +01:00
Christopher Faulet	5b78cbae77	BUG/MEDIUM: mux-h2: Switch pending error to error if demux buffer is empty When an error on the H2 connection is detected when sending data, only a pending error is reported, waiting for an error or a shutdown on the read side. However if a shutdown was already received, the pending error is switched to a definitive error. At this stage, we must also wait to have flushed the demux buffer. Otherwise, if some data must still be demux, messages for one or more streams may be truncated. There is already the flag H2_CF_END_REACHED to know a shutdown was received and we no longer progress on demux side (buffer empty or data truncated). On sending side, we should use this flag instead to report a definitive error. This patch is part of a series that should fix a bug reported in issue #2388 (#2388#issuecomment-1855735144). Backport instructions will be shipped in the last commit of the series.	2023-12-18 21:15:32 +01:00
Christopher Faulet	682f73b4fa	BUG/MEDIUM: mux-h2: Report too large HEADERS frame only when rxbuf is empty During HEADERS frames decoding, if a frame is too large to fit in a buffer, an internal error is reported and a RST_STREAM is emitted. On the other hand, we wait to have an empty rxbuf to decode the frame because we cannot retry a failed HPACK decompression. When we are decoding headers, it is valid to return an error if dbuf buffer is full because no data can be blocked in the rxbuf (which hosts the HTX message). However, during the trailers decoding, it is possible to have some data not sent yet for the current stream in the rxbug and data for another stream fully filling the dbuf buffer. In this case, we don't decode the trailers but we must not return an error. We must wait to empty the rxbuf first. Now, a HEADERS frame is considered as too large if the dbuf buffer is full and if the rxbuf is empty (the HTX message to be accurate). This patch should fix the issue #2382. It must be backported to all stable versions.	2023-12-13 16:45:29 +01:00
Christopher Faulet	6da0429e75	MINOR: mux-h2: Add global option to enable/disable zero-copy forwarding tune.h2.zero-copy-fwd-send can now be used to enable or disable the zero-copy fast-forwarding for the H2 mux only, for sends. For now, there is no option to disable it for receives because it is not supported yet. It is enabled ('on') by default.	2023-12-04 15:33:34 +01:00
Christopher Faulet	fd8ce788a5	MINOR: muxes: Implement ->sctl() callback for muxes and return the stream id All muxes now implements the ->sctl() callback function and are able to return the stream ID. For the PT multiplexer, it is always 0. For the H1 multiplexer it is the request count for the current H1 connection (added for this purpose). The FCGI, H2 and QUIC muxes, the stream ID is returned. The stream ID is returned as a signed 64 bits integer.	2023-11-29 11:11:12 +01:00
Christopher Faulet	d982a37e4c	MINOR: muxes: Rename mux_ctl_type values to use MUX_CTL_ prefix Instead of the generic MUX_, we now use MUX_CTL_ prefix for all mux_ctl_type value. This will avoid any ambiguities with other enums, especially with a new one that will be added to get information on mux streams.	2023-11-29 11:11:12 +01:00
Christopher Faulet	af733ef6e4	BUG/MEDIUM: mux-h2: Remove H2_SF_NOTIFIED flag for H2S blocked on fast-forward When a H2 stream is blocked during data fast-forwarding, we must take care to remove H2_SF_NOTIFIED flag. This was only performed when data fast-forward was attempted. However, if the H2 stream was blocked for any reason, this flag was not removed. During our tests, we found it was possible to infinitely block a connection because one of its streams was in the send_list with the flag set. In this case, the stream was no longer woken up to resume the sends, blocking all other streams. No backport needed.	2023-11-28 14:01:56 +01:00
Willy Tarreau	d656ac7e13	OPTIM: mux-h2/zero-copy: don't allocate more buffers per connections than streams It's the exact same as commit `0a7ab7067` ("OPTIM: mux-h2: don't allocate more buffers per connections than streams"), but for the zero-copy case this time. Previously it was only done on the regular snd_buf() path, but this one is needed as well. A transfer on 16 parallel streams now consumes half of the memory, and a single stream consumes much less. An alternate approach would be worth investigating in the future, based on the same principle as the CF_STREAMER_FAST at the higher level: in short, by monitoring how many mux buffers we write at once before refilling them, we would get an idea of how much is worth keeping in buffers max, given that anything beyond would just waste memory. Some tests show that a single buffer already seems almost as good, except for single-stream transfers, which is why it's worth spending more time on this.	2023-11-28 09:15:26 +01:00
Ilya Shipitsin	80813cdd2a	CLEANUP: assorted typo fixes in the code and comments This is 37th iteration of typo fixes	2023-11-23 16:23:14 +01:00
Willy Tarreau	4f02e3da67	BUG/MEDIUM: mux-h2: fail earlier on malloc in takeover() Connection takeover was implemented for H2 in 2.2 by commit `cd4159f03` ("MEDIUM: mux_h2: Implement the takeover() method."). It does have one corner case related to memory allocation failure: in case the task or tasklet allocation fails, the connection gets released synchronously. Unfortunately the situation is bad there, because the lower layers are already switched to the new thread while the tasklet is either NULL or still the old one, and calling h2_release() will also result in h2_process() and h2_process_demux() that may process any possibly pending frames. Even the session remains the old one on the old thread, so that some sess_log() that are called when facing certain demux errors will be associated with the previous thread, possibly accessing a number of elements belonging to another thread. There are even code paths where the thread will try to grab the lock of its own idle conns list, believing the connection is there while it has no useful effect. However, if the owner thread was doing the same at the same moment, and ended up trying to pick from the current thread (which could happen if picking a connection for a different name), the two could even deadlock. The risk is extremely low, but Fred managed to reproduce use-after-free errors in conn_backend_get() after a takeover() failed by playing with -dMfail, indicating that h2_release() had been successfully called. In practise it's sufficient to have h2 on the server side with reuse-always and to inject lots of request on it with -dMfail. This patch takes a simple but radically different approach. Instead of starting to migrate the connection before risking to face allocation failures, it first pre-allocates a new task and tasklet, then assigns them to the connection if the migration succeeds, otherwise it just frees them. This way it's no longer needed to manipulate the connection until it's fully migrated, and as a bonus this means the connection will continue to exist and the use-after-free condition is solved at the same time. This should be backported to 2.2. Thanks to Fred for the initial analysis of the problem!	2023-11-17 18:10:16 +01:00
Amaury Denoyelle	a1457296d5	BUG/MINOR: mux_h2: reject passive reverse conn if error on add to idle On passive reverse, H2 mux is responsible to insert the connection in the server idle list. This is done via srv_add_to_idle_list(). However, this function may fail for various reason, such as FD usage limit reached. Handle properly this error case. H2 mux flags the connection on error which will cause its release. Prior to this patch, the connection was only released on server timeout. This bug was found inspecting server curr_used_conns counter. Indeed, on connection reverse, this counter is first incremented. It is decremented just after on srv_add_to_idle_list() if insertion is validated. However, if insertion is rejected, the connection was not released which cause curr_used_conns to remains positive. This has the major downside to break the reusing of idle connection on rhttp causing spurrious 503 errors. No need to backport.	2023-11-16 18:43:32 +01:00
Willy Tarreau	0a7ab7067f	OPTIM: mux-h2: don't allocate more buffers per connections than streams When an H2 mux works with a slow downstream connection and without the mux-mux mode, it is possible that a single stream will allocate all 32 buffers in the connection. This is not desirable at all because 1) it brings no value, and 2) it allocates a lot of memory per connection, which, in addition to using a lot of memory, tends to degrade performance due to cache thrashing. This patch improves the situation by refraining from sending data frames over a connection when more mbufs than streams are allocated. On a test featuring 10k connections each with a single stream reading from the cache, this patch reduces the RAM usage from ~180k buffers to ~20k bufs, and improves the bandwidth. This may even be backported later to recent versions to improve memory usage. Note however that it is efficient only when combined with `e16762f8a` ("OPTIM: mux-h2: call h2_send() directly from h2_snd_buf()"), and tends to slightly reduce the single-stream performance without it, so in case of a backport, the two need to be considered together.	2023-11-09 17:24:00 +01:00
Christopher Faulet	84d26bcf3f	MINOR: stconn/mux-h2: Use a iobuf flag to report EOI to consumer side during FF IOBUF_FL_EOI iobuf flag is now set by the producer to notify the consumer that the end of input was reached. Thanks to this flag, we can remove the ugly ack in h2_done_ff() to test the opposite SE flags. Of course, for now, it works and it is good enough. But we must keep in mind that EOI is always forwarded from the producer side to the consumer side in this case. But if this change, a new CO_RFL_ flag will have to be added to instruct the producer if it can forward EOI or not.	2023-11-08 21:14:07 +01:00
Christopher Faulet	4be0c7c655	MEDIUM: stconn/muxes: Loop on data fast-forwarding to forward at least a buffer In the mux-to-mux data forwarding, we now try, as far as possible to send at least a buffer. Of course, if the consumer side is congested or if nothing more can be received, we leave. But the idea is to retry to fast-forward data if less than a buffer was forwarded. It is only performed for buffer fast-forwarding, not splicing. The idea behind this patch is to optimise the forwarding, when a first forward was performed to complete a buffer with some existing data. In this case, the amount of data forwarded is artificially limited because we are using a non-empty buffer. But without this limitation, it is highly probable that a full buffer could have been sent. And indeed, with H2 client, a significant improvement was observed during our test. To do so, .done_fastfwd() callback function must be able to deal with interim forwards. Especially for the H2 mux, to remove H2_SF_NOTIFIED flags on the H2S on the last call only. Otherwise, the H2 stream can be blocked by itself because it is in the send_list. IOBUF_FL_INTERIM_FF iobuf flag is used to notify the consumer it is not the last call. This flag is then removed on the last call.	2023-11-08 21:14:07 +01:00
Christopher Faulet	141b489291	BUG/MEDIUM: stconn: Report send activity during mux-to-mux fast-forward When data are directly forwarded from a mux to the opposite one, we must not forget to report send activity when data are successfully sent or report a blocked send with data are blocked. It is important because otherwise, if the transfer is quite long, longer than the client or server timeout, an error may be triggered because the write timeout is reached. H1, H2 and PT muxes are concerned. To fix the issue, The done_fastword() callback now returns the amount of data consummed. This way it is possible to update/reset the FSB data accordingly. No backport needed.	2023-11-07 10:30:01 +01:00
Willy Tarreau	e16762f8a8	OPTIM: mux-h2: call h2_send() directly from h2_snd_buf() This allows to eliminate full buffers very quickly and to recycle them much faster, resulting in higher transfer rates and lower memory usage at the same time. We just wake the tasklet up if it succeeded so that h2_process() and friends are called to finalize what needs to. For regular buffer sizes, the performance level becomes quite close to the one obtained with the zero-copy mechanism (zero-copy remains much faster with non-default buffer sizes). The memory savings are huge with default buffer size: at 64c * 100 streams on a single thread, we used to forward 4.4 Gbps of traffic using 10400 buffers. After the change, the performance reaches 5.9 Gbps with only 22-24 buffers, since they are quickly recycled. That's asaving of 160 MB of RAM. A concern was an increase in the number of syscalls but this is not the case, the numbers remained exactly the same before and after. Some experimentations were made to try to cork data and not send incomplete buffers, and that always voided these changes. One explanation might be that keeping a first buffer with only headers frames is sufficient to prevent a zero-copy of the data coming in a next snd_buf() call. This still needs to be studied anyway.	2023-11-04 08:34:23 +01:00

1 2 3 4 5 ...

934 Commits