Let's just see on a diagram how the receiver can detect that the
window is large enough for the remote sender to fill the link. Here
it seems that a first criterion is that data are accumulating in
the rxbuf, indicating that the next hop doesn't consume them fast
enough. On the diagram it's visible when the blue arrows (incoming data)
are on average more frequent than the magenta ones (outgoing data),
which happens when silence moments become less frequent and no longer allow
the reader to catch up. It's also visible that there are two phases
alternating in the transfer:
- measure round trip time (i.e. how long it takes to restart
sending after a WU sent following a long silence)
- measure the lowest rxbuf size during the previous round trip
It's worth noting that a window size change only has *observable* effect
after two RTT: the first RTT is to restart sending (opening or enlarging
the window), the second RTT to measure the lowest rxbuf size over the
period.
By turning the advertised window into an offset and comparing it to
the received quantity, it's possible to measure the RTT of the whole
chain (including the client possibly producing the data). Note that
when multiple streams compete for BW this can become tricky. Limiting
the window to the available buffers and counting the number of sending
streams on a connection could work (i.e. split the total buffers into
1+#senders, the first one being used for tx).
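A minimal sketch of such a split, using purely illustrative names (none
of these helpers exist in the code):

  /* illustrative: reserve the first share for tx and split the
   * remaining buffers evenly between the sending streams */
  static inline uint32_t bufs_per_sender(uint32_t total_bufs,
                                         uint32_t nb_senders)
  {
          return total_bufs / (1 + nb_senders);
  }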
Now that we're using all available rx buffers for transfers, there's
no point anymore in advertising more than the minimum value we can
safely buffer. Let's be conservative and only rely on the dynamic
buffers to improve speed beyond the configured value, and make sure
that many streams will no longer cause unfairness.
Interestingly, the total number of wakeups has further shrunk, but
with a different distribution. From 128k for 1000 1M transfers, it went
down to 119k, with 96k from restart_reading, 10k from done_ff and 2.6k
from snd_buf. done_ff went up by 30% and restart_reading went down by
30%.
These settings allow changing the total buffer size allocated to the
backend and the frontend respectively. This way it's no longer necessary
to play with tune.bufsize nor to increase the number of streams to benefit from
more buffers.
Setting tune.h2.fe.rxbuf to 4m to match a sender's max tcp_wmem resulted
in 257 Mbps for a single stream at 103ms vs 121 Mbps default (or 5.1 Mbps
with a single buffer and 64kB window).
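For example, assuming a sender whose max tcp_wmem is 4 MB, the test
above corresponds to a global section like this:

  global
      tune.h2.fe.rxbuf 4m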
Without using bandwidth estimates, we can already use up to the number
of allocatable rxbufs and share them evenly between receiving streams.
In practice we reserve one buffer for any non-receiving stream, plus
1 per 8 possible new streams, and divide the rest among the
receiving streams.
Finally, for front streams this is rounded up to the buffer size while
for back streams we round it down. The rationale here is that front to
back is very fast to flush and slow to refill, so we want to optimize
upload bandwidth regardless of the number of streams, while it's the
opposite in the other direction, so there we try to minimize HoL blocking.
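A rough sketch of this computation, with illustrative names only (the
actual code differs):

  /* <avail> is the number of allocatable rxbufs on the connection */
  static uint32_t rxbuf_budget(uint32_t avail, uint32_t nb_streams,
                               uint32_t nb_recv, uint32_t new_slots)
  {
          /* one buffer per non-receiving stream, plus 1 per 8
           * possible new streams */
          uint32_t reserved = (nb_streams - nb_recv) + new_slots / 8;
          uint32_t left = (avail > reserved) ? avail - reserved : 1;

          /* the per-stream share; the resulting window is then
           * rounded up to the buffer size for front streams and
           * rounded down for back streams */
          return left / (nb_recv ? nb_recv : 1);
  }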
That shows good results, with a single stream being able to send at 121
Mbps at 103ms using a 1.4 MB buffer with default settings, or 8 streams
sharing the bandwidth at 180kB each. Previously the limit was
approximately 5.1 Mbps per stream.
It also enables better sharing of backend connections: a slow (100 Mbps)
client and a fast (1 Gbps) one were each downloading two 100MB files over
a shared H2 connection. The fast one used to show 6.86 to 20.74s with an
avg of 11.45s and a stddev of 5.81s before the patch, and went to a
much more respectable 6.82 to 7.73s with a 7.08s avg and 0.336s stddev.
We don't try to increase the window past the remaining content length.
First, this is pointless (though harmless), but in addition it causes
needless emission of WINDOW_UPDATE frames on small uploads that are
smaller than a window, and beyond being useless, it upsets vtest, which
expects an RST on some tests. The scheduling is not reliable enough to
insert an expect for a window update first, so in the end, with that
extra check, we save a few useless frames on small uploads and please
vtest.
A new setting should be added to allow increasing the number of buffers
without having to change the number of streams. At this point it's not
done.
Now we don't enforce allocation limits in h2s_get_rxbuf() anymore, since
there is no benefit in not processing pending data: it would still cause
HoL blocking for no saving. The only reason for not allocating is if there are no
buffers available for the connection.
In theory this should not change anything except that it exercises code
paths that support reallocating multiple buffers, which could possibly
uncover a sleeping bug. This is why it's placed in a separate commit.
And one observation worth noting is that it almost cut in half the number
of iocb wakeups: for 1000 1MB transfers over 100 concurrent streams of a
single connection, we used to observe 208k wakeups (110k from restart_reading,
80k from snd_buf, 11k from done_ff), and now we're observing 128k (113k from
restart_reading, 2.4k from snd_buf, 6.9k from done_ff), which seems to
indicate that pretty often the demuxing was blocked on a buffer full due
to the default advertised window of 64k.
For now it seems to work as before, and even when artificially inflating
the number of allocatable buffers per stream. The number of allocated
slots is always the same as the max number of streams, which guarantees
that each stream will find one buffer. We only grant one buffer per
stream at this point, since the goal was to replace the existing single
rxbuf.
A new demux blocking flag, H2_CF_DEM_RXBUF, was added to indicate
a failure to get an rxbuf slot from the connection. It was lightly
tested (by forcing bl_init() to a lower number of buffers). It is not
yet certain whether it's more useful to have a new flag or to reuse
the existing H2_CF_DEM_SFULL which indicates the rxbuf is full,
but at least the new flag more accurately translates the condition,
which may make a difference in the future. However, given that when
RXBUF is set, most of the time it results in a failure to find more
room to demux and it sets SFULL, for now we have to always clear
SFULL when clearing RXBUF as well. This means that most of the time
we'll see 3 combinations:
- none: everything's OK
- SFULL: the unique rx buffer is full
- RXBUF or (RXBUF|SFULL): cannot allocate more entries
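In practice, the clearing side thus looks like this (sketch):

  /* RXBUF almost always comes with SFULL, so both must be
   * cleared together once a buffer slot becomes available */
  h2c->flags &= ~(H2_CF_DEM_RXBUF | H2_CF_DEM_SFULL);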
Note that we need to be super careful in h2_frt_transfer_data() because
the htx_free_data_space() function doesn't guarantee that the room is
usable, so htx_add_data() may still fail despite an apparent room. For
this reason, h2_frt_transfer_data() maintains a "full" flag to indicate
that a transfer attempt failed and that a new buffer is required.
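A condensed sketch of that protection (variable names are illustrative):

  /* htx_add_data() may add less than requested despite
   * htx_free_data_space() reporting room, so record the failure */
  sent = htx_add_data(htx, ist2(b_head(&h2c->dbuf), flen));
  if (sent < flen)
          full = 1; /* this rxbuf is done, a new buffer is required */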
Since commit 485da0b05 ("BUG/MEDIUM: mux_h2: Handle others remaining
read0 cases on partial frames"), H2_CF_DEM_SHORT_READ is set when there
is no blocking flags. However, it checks H2_CF_DEM_BLOCK_ANY which does
not include H2_CF_DEM_DFULL. This results in many cases where both
H2_CF_DEM_DFULL and H2_CF_DEM_SHORT_READ are set together, which makes
no sense, since one says the demux buffer is full while the other one
says an incomplete read was done. This doesn't make it possible to
properly decide whether to restart reading or processing.
Let's make sure to clear DFULL in h2_process_demux() whenever we
consume incoming data from the dbuf, and check for DFULL before
setting SHORT_READ.
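The resulting logic is essentially this (sketch):

  /* in h2_process_demux(): data were just consumed from dbuf */
  h2c->flags &= ~H2_CF_DEM_DFULL;

  /* later, when checking for an incomplete read */
  if (!(h2c->flags & (H2_CF_DEM_BLOCK_ANY | H2_CF_DEM_DFULL)))
          h2c->flags |= H2_CF_DEM_SHORT_READ;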
This could probably be considered as a bug fix but it's hard to say if
it has any impact on the current code, probably at worst it might cause
a few useless wakeups, so until there's any proof that it needs to be
backported, better not do it.
The code used to decide when to restart reading is far from being trivial
and will cause trouble after the forthcoming changes: it checks if the
current stream is the same that is being demuxed, and only if so, wakes
the demux to restart reading. Once streams start to use multiple
buffers, this condition will no longer make sense. The real logic is
actually split into two steps:
- detect if the demux is currently blocked on the current stream, and
if so remove SFULL
- detect if any demux blocking flags were removed during the operations,
and if so, wake demuxing.
For now this doesn't change anything.
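A sketch of these two steps (simplified; the exact flag handling in
mux-h2 differs):

  uint32_t old_flags = h2c->flags;

  /* step 1: demux blocked on the current stream? then unblock it */
  if (h2c->dsi == h2s->id)
          h2c->flags &= ~H2_CF_DEM_SFULL;

  /* step 2: any demux blocking flag removed? then wake demuxing */
  if ((old_flags & H2_CF_DEM_BLOCK_ANY) &&
      !(h2c->flags & H2_CF_DEM_BLOCK_ANY))
          tasklet_wakeup(h2c->wait_event.tasklet);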
The code used to decide what to tell to the upper layer and when to free
the rxbuf is a bit convoluted and difficult to adapt to dynamic rxbufs.
We first need to deal with memory management (b_free) and only then to
decide what to report upwards. Right now it does it the other way around.
This should not change anything.
It's not convenient to have this flag in the middle of the demux flags,
it easily hides other ones that need to be added. Let's move it after
the other ones.
Now the h2s gets its rx_head, rx_tail and rx_count associated with the
shared rxbufs. A few functions are provided to manipulate all this:
essentially allocating/releasing a buffer for the stream, returning a
buffer pointer to the head/tail, counting the allocated buffers for a
stream, and reporting whether a stream may still allocate.
For now this code is not used.
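For illustration, the new h2s state is roughly shaped like this (the
real layout may differ):

  struct h2s {
          /* ... existing fields ... */
          uint32_t rx_head;  /* index of the oldest allocated rxbuf slot */
          uint32_t rx_tail;  /* index of the newest allocated rxbuf slot */
          uint32_t rx_count; /* number of rxbuf slots currently held */
  };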
In preparation for having a shared list of rx bufs, we're now allocating
the array of shared rx bufs in the h2c. The pool is created at the max
size between the front and back max streams for now, and the array is not
used yet.
A stream is receiving data from the HEADERS frame lacking END_STREAM
until the end of the stream or HREM (the reception of END_STREAM). We're
now adding a flag to the stream that indicates this state, as well as a
counter in the connection of streams currently receiving data. The
purpose will be to gauge at any instant the number of streams that might
have to share the available bandwidth and buffer count, in order not to
allocate too much flow control to any single stream. For now the counter
is kept up to date, and is
reported in "show fd".
Instead of incrementing the last_max_ofs by the amount of received bytes,
we now start from the new current offset to which we add the static window
size. The result is exactly the same but it prepares the code to use a
window size combined with an offset instead of just refilling the budget
from what was received.
It was even verified that changing h2_fe_settings_initial_window_size in
the middle of a transfer using gdb does indeed allow the transfer speed
to adapt accordingly.
The rationale here is that we don't absolutely need to update the
stream offset live, there's already the rcvd_s counter to remind
us we've received data. So we can continue to exploit the current
check points for this.
Now we know that rcvd_s indicates the amount of newly received bytes
for the stream since last call to h2c_send_strm_wu() so we can update
our stream offsets within that function. The wu_s counter is set to
the difference between next_adv_ofs and last_adv_ofs, which are
resynchronized once the frame is sent.
If the stream suddenly disappears with unacked data (aborted upload),
the presence of the last update in h2c->wu_s is sufficient to let the
connection ack the data alone, and upon subsequent calls with new
rcvd_s, the received counter will be used to ack, like before. We
don't need to do more anyway since the goal is to let the client
abort ASAP when it gets an RST.
At this point, the stream knows its current rx offset, the computed
max offset and the last advertised one.
In H2, everything is accounted as budget. But if we want to moderate
the rcv window that's not very convenient, and we'd rather have offsets
instead so that we know where we are in the stream. Let's first add
the fields to the struct and initialize them. The curr_rx_ofs indicates
the position in the stream where next incoming bytes will be stored.
last_adv_ofs tells what's the offset that was last advertised as the
window limit, and next_max_ofs is the one that will need to be
advertised, which is curr_rx_ofs plus the current window. next_max_ofs
will have to cause a WINDOW_UPDATE to be emitted when it's higher than
last_adv_ofs, and once the WU is sent, its value will have to be copied
over last_adv_ofs.
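The relation between these offsets can be sketched as follows (<win>
stands for whatever window size is in effect; it is not an actual
field):

  /* next_max_ofs is where the advertised window should now end */
  h2s->next_max_ofs = h2s->curr_rx_ofs + win;

  /* a WINDOW_UPDATE is needed when the limit may be pushed further */
  if (h2s->next_max_ofs > h2s->last_adv_ofs) {
          /* ... emit a WU for (next_max_ofs - last_adv_ofs) ... */
          h2s->last_adv_ofs = h2s->next_max_ofs; /* once the WU is sent */
  }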
The problem is, for now wherever we emit a stream WU, we have no notion
of stream (the stream might even not exist anymore, e.g. after aborting
an upload), because we currently keep a counter of stream window to be
acked for the current stream ID (h2c->dsi) in the connection (rcvd_s).
Similarly there are a few places early in the frame header processing
where rcvd_s is incremented without knowing the stream yet. Thus, lookups
will be needed for that, unless such a connection-level counter remains
used and poured into the stream's count once known (delicate).
Thus for now this commit only creates the fields and initializes them.
We'll need to keep track of the total amount of data received for the
current stream, and the amount of data to ack for the current stream,
which will diverge as soon as we have to update the stream's
offset with received data, which are different from those to be ACKed.
One reason is that in case a stream doesn't exist anymore (e.g. an
aborted upload), the rcvd_s info might get lost after updating the
stream, so we do need an in-connection counter for that.
What's done here is that the rcvd_s count is transferred to wu_s in
h2c_send_strm_wu(), to be used as the counter to send, and both are
considered as sufficient when non-null to call the function.
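The transfer itself boils down to this (sketch):

  /* caller side: either counter being non-null justifies the call */
  if (h2c->rcvd_s || h2c->wu_s)
          h2c_send_strm_wu(h2c);

  /* inside h2c_send_strm_wu(): move pending bytes to the WU counter */
  h2c->wu_s  += h2c->rcvd_s;
  h2c->rcvd_s = 0;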
The buffer ring is problematic in multiple aspects, one of which being
that it is only usable by one entity. With multiplexed protocols, we need
to have shared buffers used by many entities (streams and connection),
and the only way to use the buffer ring model in this case is to have
each entity store its own array, and keep a shared counter on allocated
entries. But even with the default 32 buffers and 100 streams per HTTP/2
connection, we're speaking about 32*101*32 bytes = 103424 bytes per H2
connection, just to store up to 32 shared buffers, spread randomly in
these tables. Some users might want to achieve much higher than default
rates over high speed links (e.g. 30-50 MB/s at 100ms), which is 3 to 5
MB storage per connection, hence 180 to 300 buffers. There it starts to
cost a lot, up to 1 MB per connection, just to store buffer indexes.
Instead this patch introduces a variant which we call a buffer list.
That's basically just a free list encoded in an array. Each cell
contains a buffer structure, a next index, and a few flags. The index
could be reduced to 16 bits if needed, in order to make room for a new
struct member. The design permits initializing a whole freelist at once
using memset(0).
The list pointer is stored at a single location (e.g. the connection)
and all users (the streams) will just have indexes referencing their
first and last assigned entries (head and tail). This means that with
a single table we can now have all our buffers shared between multiple
streams, irrespective of the number of potential streams that would want
to use them. Now the 180 to 300 entries array only costs 7.2 to 12 kB,
or 80 times less.
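The cell layout can be sketched like this (illustrative; see buf.h for
the exact definition):

  struct bl_elem {
          struct buffer buf;  /* the shared buffer itself */
          uint32_t next;      /* index of the next cell; could be
                               * reduced to 16 bits if needed */
          uint32_t flags;     /* a few flags */
  };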
Two large functions (bl_deinit() & bl_get()) were implemented in buf.c.
A basic doc was added to explain how it works.
Over time, some of the buffer management functions grew quite a bit,
and were still forced to remain inlined since they were all defined in buf.h.
Let's create buf.c and move the heaviest ones there. All those moved
here were above 200 bytes.
Since 2.7 with commit 8522348482 ("BUG/MAJOR: conn-idle: fix hash indexing
issues on idle conns"), we've been using eb64 trees and not ebmb trees
anymore, and later we dropped all that to centralize the operations in
the server. Let's remove the ebmbtree.h includes from the muxes that do
not use them.
The local "rxbuf" buffer was passed to the trace instead of h2s->rxbuf
that is used when decoding trailers. The impact is essentially the
impossibility to present some buffer contents in some rare cases. It
may be backported but it's unlikely that anyone will ever notice the
difference.
Building with gcc-12.2 -Og yields this incorrect warning in cache.c:
In function 'release_entry_unlocked',
inlined from 'http_action_store_cache' at src/cache.c:1449:4:
src/cache.c:330:9: warning: 'object' may be used uninitialized [-Wmaybe-uninitialized]
330 | release_entry(cache, entry, 1);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/cache.c: In function 'http_action_store_cache':
src/cache.c:1200:29: note: 'object' was declared here
1200 | struct cache_entry *object, *old;
| ^~~~~~
This is wrong, the only way to reach the function is with first!=NULL
and the gotos that reach there are all those made with first==NULL.
Let's just preset object to NULL to silence it.
This command pauses the configuration parser for <timeout>
milliseconds. This is useful for development or for testing timeouts of
init scripts, particularly to simulate a very long reload. It requires
expose-experimental-directives to be set.
If a small request is received on QUIC MUX frontend, it can be
transmitted directly with the FIN on attach operation. rcv_buf is
skipped by the stream layer. Thus, it is necessary to ensure that there
is similar behavior when FIN is reported either on attach or rcv_buf.
One difference was that se_expect_data() was called only for rcv_buf but
not on attach. The most obvious effect is that the stream timeout was
deactivated for this request: the client timeout was disabled on EOI but
the server one was not armed, due to the previous se_expect_no_data().
This prevents the early closure of too-long requests.
To fix this, add an invocation of se_expect_data() on the attach operation.
This bug can simply be detected using httpterm with delay request (for
example /?t=10000) and using smaller client/server timeouts. The bug is
present if the request is not aborted on timeout but instead continues
until its proper HTTP 200 termination.
This has been introduced by the following commit :
85eabfbf67
MEDIUM: mux-quic: Don't expect data from server as long as request is unfinished
This must be backported up to 2.8.
debug() converter used to resolve sink names during parsing time. Because
of this, we were unable to specify sink names that were defined after
the debug() converter was placed.
Like in the previous commit, let's implement proper postparsing for the
debug() converter, in order to be able to use sink names that are about
to be defined later in the config file.
A previous known limitation about traces was that parsing was performed on
the fly, meaning that when using "sink" keyword, only sinks that were
either internal or previously defined in the config could be used. Indeed,
it was not possible to use a ring section defined AFTER the traces section
when using the 'sink' keyword from traces.
This limitation was also mentioned in the config file.
Let's get rid of that limitation by implementing proper postparsing for
the sink parameter in traces section. To do this, make use of the new
sink_find_early() helper to start referencing sink by their names even
if they don't exist yet (if they are about to be defined later in the
config).
Traces commands on the cli are not concerned by this change.
sink_find_early() is a convenient function that can be used instead of
sink_find() during parsing time in order to try to find a matching
sink even if the sink is not defined yet.
Indeed, if the sink is not defined, sink_find_early() will try to create
it and mark it as forward-declared. It will also save information from
the caller to better identify it in case of errors.
If the sink happens to be found in the config, it will transition from
forward-declared type to its final type. Otherwise, it means that the
sink was not found in the config; in this case, during postresolve, we
raise an error to indicate that the sink was not found in the configuration.
It should help solve postresolving issues with rings, because for now only
log targets implement proper ring postresolving, but rings may be used
at different places in the code, such as the debug() converter or the
"traces" section.
Patch 64a77e3ea5 disabled the CRL check when no CRL file was provided, but
it only did it on the bind side. Add the same fix on the server context
initialization side.
This makes it possible to enable peer verification (verify required) on
a server using TLS without having to provide a CRL file.
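For example, peer verification can now be enabled on a server without
providing a crl-file (sketch):

  backend be_tls
      server srv1 192.0.2.10:443 ssl verify required ca-file /etc/haproxy/ca.pem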
Out-of-order STREAM ACKs are buffered in their related streambuf tree. On
insertion, overlapping or contiguous ranges are merged together. The
total size of buffered ack range is stored in <room> streambuf member
and reported to QUIC MUX layer on streambuf release. The objective is to
ensure QUIC MUX layer can allocate Tx buffers conveniently to preserve a
good transfer throughput.
Streamdesc is the overall container of many streambufs. It may also be
released when its upper QCS instance is freed, after all stream data
have been emitted. In this case, the active streambuf is also released
via custom code. However, in this code path, <room> was not reported to
the QUIC MUX layer.
This bug caused a wrong estimation of the QUIC MUX txbuf window, with
bytes remaining even after all ACKs were received. This could cause
transfer freezes on other connection streams, with RESET_STREAM emission
on client timeout.
To fix this, reuse the existing qc_stream_buf_release() function on
streamdesc release. This ensures that notify_room is correctly used.
No need to backport.
To properly decount out-of-order acked data range, contiguous or
overlapping ranges are first merged before their insertion in a tree.
The first step ensures that a newly reported range is not completely
covered by the existing tree ranges. However, one of the conditions was
incorrect. Fix this to ensure that the final range tree does not contain
duplicated entries.
The impact of this bug is unknown. However, it may have allowed the
insertion of overlapping ranges, which could in turn cause an error in
QUIC MUX txbuf window, with a possible transfer freeze.
No need to backport.
To execute sample fetches and converters from Lua, the hlua API leverages
the sample API. Prior to executing the sample func, the arg checker is
called from hlua_run_sample_{fetch,conv}() to detect potential errors.
However, hlua_run_sample_{fetch,conv}() both pass NULL as <err> argument,
but it is wrong for two reasons. First, we miss an opportunity to report
precise error messages to help the user know what went wrong during the
check. More importantly, some val check functions consider that the
<err> pointer is never NULL. This is the case for example with
check_crypto_hmac(). Because of this, when such val check functions
encounter an error, they will crash the process because they will try
to de-reference NULL.
This bug was discovered and reported by GH user @JB0925 on #2745.
Perhaps val check functions should make sure that the provided <err>
pointer is not NULL prior to de-referencing it. But since there are
multiple occurrences found in the code and the API isn't clear about that,
it is easier to fix the hlua part (the caller) for now.
To fix the issue, let's always provide a valid <err> pointer when
leveraging the val_arg() check function pointer, and make use of it in
case of error to report a relevant message to the user before freeing it.
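A condensed sketch of the caller-side fix (simplified from the actual
hlua code):

  char *err = NULL;

  /* always pass a valid <err>: some check functions, e.g.
   * check_crypto_hmac(), dereference it unconditionally */
  if (f->val_args && !f->val_args(args, &err)) {
          lua_pushfstring(L, "error in arguments: %s",
                          err ? err : "invalid argument");
          ha_free(&err);
          WILL_LJMP(lua_error(L));
  }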
It should be backported to all stable versions.
hlua_ctx_renew() is called from unsafe places where the caller doesn't
expect it to LJMP. However, hlua_ctx_renew() makes use of Lua library
functions that could potentially raise errors, such as lua_newthread(),
and it does nothing to catch errors. Because of this, haproxy could
unexpectedly crash. This was discovered and reported by GH user
@JB0925 on #2745.
To fix the issue, let's simply make hlua_ctx_renew() safe by applying
the same logic implemented for hlua_ctx_init() or hlua_ctx_destroy(),
which is catching Lua errors by leveraging SET_SAFE_LJMP_PARENT() helper.
It should be backported to all stable versions.
Now that 'do-log' action may be used for all existing action contexts,
let's add some tests in reg-tests/log/log_profile.vtc to ensure it works
as expected. quic-initial is not tested as it may not be built in
depending on build options.
Thanks to the two previous commits, we can now expose the do-log action
on all available action contexts, including the new quic-init context.
Each context is responsible for exposing the do-log action by registering
the relevant log steps, saving the identifier, and storing it in the
rule's context so that do_log_action() automatically uses it to produce
the log during runtime.
To use the feature, simply use "do-log" (without argument) on an action
directive, for example:
tcp-request connection do-log
As mentioned before, each context where the action is exposed has its own
log step identifier. Currently known identifiers are:
quic-initial: quic-init
tcp-request connection: tcp-req-conn
tcp-request session: tcp-req-sess
tcp-request content: tcp-req-cont
tcp-response content: tcp-res-cont
http-request: http-req
http-response: http-res
http-after-response: http-after-res
Thus, these "additional" logging steps can be used as-is under log-profile
section (after "on" keyword). However, although the parser will accept
them, it makes no sense to use them with the "log-steps" proxy keyword,
since the only path for these origins to trigger a log generation is
through the explicit use of "do-log" action.
This need was described in GH #401; it should help to conditionally
trigger logs using ACLs at specific key points, and may either be used
alone or combined with "log-steps" to add additional log "trackers" during
transaction handling.
Documentation was updated and some examples were added.
Function may be used from places where per-context actions are usually
registered (tcp_act.c, http_act.c, quic_rules.c.. to name a few) in
order to expose the do_log() action.
do_log() is quite similar to sess_log() or strm_log(), except that it
may be called at any time during session handling in an opportunistic
way as long as the session exists (the stream may or may not exist).
Also, it will try to emit the log as INFO by default, unless set-log-level
is used on the stream, or error origin flag is set.
This commit is the last one of a series whose objective is to restore
QUIC transfer throughput performance to the state prior to the recent
QUIC MUX buffer allocator rework.
This gain is obtained by reporting received out-of-order ACK data range
to the QUIC MUX which can then decount room in its txbuf window. This is
implemented in QUIC streamdesc layer by adding a new invokation of
notify_room callback. This is done in qc_stream_buf_store_ack(), which
handles out-of-order ACK data ranges.
The previous commit introduced merging of overlapping ACK data ranges. As
such, it's easy to report only the newly acknowledged data range.
As with in-order ACKs, this new notification is only performed on
released streambuf. As such, when a streambuf instance is released,
notify_room notification now also reports the total length of
out-of-order ACK data range currently stored. This value is stored in a
new streambuf member <room> to avoid unnecessary tree lookup.
This <room> member also serves on in-order ACK notification to reduce
the notified room. This prevents reporting invalid values when
overlapping ranges are treated first out-of-order and then in-order,
which would cause an invalid QUIC MUX txbuf window value.
After this change has been implemented, performance has been
significantly improved, both with ngtcp2-client rate usage and on
interop goodput test. These values are now similar to the rate observed
on older haproxy version before QUIC MUX buffer allocator rework.
Transfer throughput was deteriorated since the recent rework of the QUIC
MUX txbuf allocator. This was partially restored with the commit to
decount individual in-order ACKs from the MUX buffer window.
To fully retrieve the old performance level, all ACKs must be decounted
when handled by the QUIC streamdesc layer, even out-of-order ranges.
However, this is not easily implemented as several ranges may exist in
parallel with overlap on the underlying data. It would cause
miscalculation for QUIC MUX buffer window if such ranges were blindly
reported.
The proper solution is to first implement merge of contiguous or
overlapping ACK data ranges to reduce the number of stored ranges to the
minimal. This is the purpose of this patch. This is implemented in a new
static function named qc_stream_buf_store_ack() into streamdesc layer.
The merge algorithm is simple enough. First, it ensures the newly added
range is not already fully covered by a preexisting entry. Then, it
checks if there is contiguity/overlap with one or several ranges
starting at the same or a greater offset. If true, the newly added entry
is extended to cover them all, and all contiguous/overlapped ranges are
removed. Finally, if there is contiguity or overlap with an entry
starting at a smaller offset, no new range is instantiated and instead
the smaller offset is extended.
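The decision logic can be sketched as follows for a new range
[off, off+len), with <prev> and <next> standing for the tree neighbours
around the insertion point (simplified; helpers are illustrative):

  /* 1. nothing to do if fully covered by an existing entry */
  if (prev && off >= prev->off && off + len <= prev->off + prev->len)
          return;

  /* 2. swallow contiguous/overlapping entries starting at the same
   * or a greater offset, extending the new range over them */
  while (next && next->off <= off + len) {
          if (next->off + next->len > off + len)
                  len = next->off + next->len - off;
          next = delete_and_next(next);   /* illustrative helper */
  }

  /* 3. contiguity/overlap with a smaller offset: extend that entry
   * instead of inserting a new one */
  if (prev && prev->off + prev->len >= off)
          prev->len = MAX(prev->len, off + len - prev->off);
  else
          insert_range(off, len);         /* illustrative helper */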
Now that contiguous or overlapping ranges cannot exist anymore, the ACK
data ranges tree instantiation can use EB_ROOT_UNIQUE.
Outside of the longer term objective which is to decount out-of-order
ACKs from MUX txbuf window, this commit could also improve some
performance and/or memory usage for connections where stream data
fragmentation and packet reordering is high.
QUIC streamdesc layer is responsible for handling ACK reception for
streams. It removes stream data from the underlying buffers on ACK
reception.
Streamdesc layer treats ACKs in order at the stream level. Out-of-order
ACKs are buffered in a tree until they can be handled, once older data
are acknowledged. Previously, the qf_stream instance which comes from
the quic_tx_packet was used as the tree node to buffer such ranges.
Introduce a new type dedicated to represent out of order stream ack data
range. This type is named qc_stream_ack. It contains only minimal info
relative to the acknowledged stream data range.
This allows reducing the size of the frequently used quic_frame with the
removal of the tree node from qf_stream. Another side effect of this change
is that quic_frame instances are now always released immediately on ACK
reception, both in-order and out-of-order. This allows to also release the
quic_tx_packet instance, which should reduce memory consumption.
The drawback of this change is that qc_stream_ack instance must be
allocated on out-of-order ACK reception. As such, qc_stream_desc_ack()
may fail if an error happens on allocation. For the moment, such an error
is silently recovered up to qc_treat_rx_pkts() by dropping the
received packet containing the ACK frame. In the future, it may be
useful to close the connection, as this error may only happen under low
memory conditions.
Recently, a new allocation mechanism was implemented for Tx buffers used
by QUIC MUX. Now, underlying congestion window size is used to determine
if it is still possible or not to allocate a new buffer when necessary.
This mechanism has rendered the QUIC stack more flexible. However, it
also brought some performance degradation, with longer transfer times in
certain environments. It was first discovered in the measurement results
of the interop. It can also easily be reproduced using the following
ngtcp2-client example which forces a very small congestion window due to
frequent losses:
$ ngtcp2-client -q --no-quic-dump --no-http-dump --exit-on-all-streams-close -r 0.1 127.0.0.1 20443 "https://[::]:20443/?s=10m"
This performance decrease is caused by the allocator which is now too
strict. It may frequently cause buffer underruns at the MUX layer when
the congestion window is too small, as new buffers cannot be allocated
until the current one is fully acknowledged. This results in transfers
with very bad throughput. The objective of this new series of
patches is to relax some restrictions to permit QUIC MUX to allocate new
buffers more quickly, while preserving the initial limitation based on
congestion window size.
An interesting method for this is to notify the QUIC MUX about newly
available room on individual ACK reception, without waiting for the full
buffer acknowledgement. This is easily implemented by adding a new
notify_room invocation in the QUIC streamdesc layer on ACK reception.
However, ACK reception is handled in order at the stream level. Out-of-order
ACKs are buffered and are not decounted for now. This will be
implemented in a future commit.
Note that for a single buffer instance, data can in parallel be written
by QUIC MUX and removed on ACK reception. This could cause the room
notification to the QUIC MUX layer to report invalid values. As such,
ACK reception is only accounted for released buffers. This ensures that
such buffers won't receive any new data. At the same time, buffer room
is notified on the release operation as it does not need acknowledgement.
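The guiding rule can be summed up like this (sketch, illustrative names):

  /* only decount ACKs against the MUX txbuf window once the buffer
   * was released: no new data can be appended to it anymore, so the
   * room value cannot race with ongoing writes */
  if (streambuf_is_released(buf))          /* illustrative check */
          notify_room(stream_desc, acked); /* per-ACK decount */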
This commit improved performance for the ngtcp2-client scenario above.
However, it is not yet sufficient for the interop goodput test.
quic_frame is the type used to represent frames emitted in a QUIC Tx
packet. Each frame is attached to a packet, and can also be linked to
other frames from the same packet, or duplicated frames for
retransmission. As such, quic_frame free operation is a tedious process.
qc_release_frm() has been implemented to ensure quic_frame is always
properly freed after detaching from all its list attach point. One
particular point is to ensure that when a frame is released, the frame
origin and all origin copies, including the current <frm> are flagged as
acked and detached from the reflist. Add a BUG_ON() to ensure this loop
is properly conducted when dealing with the current <frm> instance.
Most of the time STREAM frames emitted by QUIC MUX have some data in it.
However, it is possible to use an empty frame when a delayed FIN must be
transferred.
Recently, QUIC MUX send callback notification has been refactored. Now,
this callback is blindly called by quic_conn lower layer each time a
STREAM frame is built into a newly Tx packet. QUIC MUX is responsible to
ensure the notified frame corresponds to newly emitted data or
retransmission. Offsets are used for this comparison, but this requires
special care for empty FIN frames.
Sadly, the comparison written to determine if an empty FIN frame was
sent for the first time or retransmitted is not correct. This caused
such frames to always be dismissed as retransmissions in the QUIC MUX
sent callback. This prevented the related QCS instance from being removed
from the send_list, causing qcc_io_send() to retry a new emission. This was
finally interrupted by the BUG_ON() assertion to prevent an infinite
loop.
Fix this crash by updating the condition in QUIC MUX send callback. For
empty STREAM frame, it is sufficient to check if QC_SF_FIN_STREAM was
already removed or not to detect a retransmission. Indeed, empty STREAM
frames are never used outside of delayed FIN reporting.
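The corrected condition boils down to this (sketch):

  if (!frm_len) {
          /* empty STREAM frames only ever carry a delayed FIN: if
           * QC_SF_FIN_STREAM is still set this is the first emission,
           * otherwise it is a retransmission to be ignored */
          if (!(qcs->flags & QC_SF_FIN_STREAM))
                  return;
          qcs->flags &= ~QC_SF_FIN_STREAM;
          /* ... account the FIN, remove qcs from the send_list ... */
  }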
No need to backport. This crash was introduced in the current dev branch
by the following commit.
d7f4e5abf0
MEDIUM: quic: strengthen MUX send notification
Released version 3.1-dev9 with the following main changes :
- MINOR: tools: add minimal file name management
- CLEANUP: stick-table: make the file location point to a global file name
- MINOR: proxy: use the global file names for conf->file
- CLEANUP: cfgparse: factor proxy vs log-forward collisions
- BUG/MINOR: cfgparse: detect another uncaught case of duplicate defaults
- MINOR: proxy: add a list of orphaned defaults sections
- MEDIUM: cfgparse: drop duplicate named defaults sections after use
- OPTIM: cfgparse: speed up duplicate server detection
- MEDIUM: cfgparse: warn about deprecated use of duplicate server names
- BUG/MINOR: server: shut down streams under thread isolation
- BUG/MINOR: proxy: also make the cli and resolvers use the global name
- REGTESTS: log: fix log-profile.vtc
- MEDIUM: mailers: warn about deprecated legacy mailers
- BUG/MEDIUM: cli: Be sure to catch immediate client abort
- DEV: flags/applet: decode appctx flags
- BUG/MEDIUM: cli: Deadlock when setting frontend maxconn
- MINOR: log: fix indent in strm_log()
- MINOR: log: introduce extra log profile steps
- MINOR: log: handle extra log origins in _process_send_log_override()
- MINOR: log: introduce log_orig flags
- MINOR: log: explicitly handle extra log origins as error when relevant
- MINOR: log: support extra log origins for '%OG' alias
- MINOR: proxy: add log_steps struct member
- MINOR: log: introduce "log-steps" proxy keyword
- MINOR: log: add log_orig_proxy() helper function
- MEDIUM: log: consider log-steps proxy setting for existing log origins
- DOC: config: document proxy "log-steps" keyword
- REGTESTS: add a test for proxy "log-steps"
- Revert "BUG/MINOR: server: shut down streams under thread isolation"
- MINOR: task: define two new one-shot events for use with WOKEN_OTHER or MSG
- BUG/MEDIUM: stream: make stream_shutdown() async-safe
- BUG/MINOR: server: make sure the HMAINT state is part of MAINT
- BUG/MINOR: queue: make sure that maintenance redispatches server queue
- MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()
- BUILD: tools: only include execinfo.h for the real backtrace() function
- MINOR: tools: do not attempt to use backtrace() on linux without glibc
- OPTIM: channel: speed up co_getline()'s search of the end of line
- OPTIM: stconn: Don't pretend mux have more data to deliver on EOI/EOS/ERROR
- BUG/MINOR: mcli: Pretend the mux have more data to deliver between two commands
- MINOR: action: Export release_expr_int_action() release function
- MINOR: stream: Rely on a per-stream max connection retries value
- MINOR: stream: Support dynamic changes of the number of connection retries
- MINOR: stream/stats: Expose the current number of streams in stats
- MINOR: stream/stats: Expose the total number of streams ever created in stats
- BUG/MINOR: cfgparse-global: fix allowed args number for setenv
- MINOR: cfgparse-global: add dedicated parser for *env keywords
- MINOR: mux-quic: complete Tx infos for QCS dump
- MINOR: quic: ensure txbuf realloc is only performed on empty buffer
- MINOR: mux-quic: strengthen qcs_send_metadata() usage
- MINOR: quic: remove unneeded notification of txbuf room
- MINOR: quic: refactor MUX send notification
- MEDIUM: quic: strengthen MUX send notification
- MINOR: quic: refactor STREAM room notification
- MINOR: quic: do not remove qc_stream_desc automatically on ACK handling
- MINOR: quic: store streambuf in a streamdesc tree
- MINOR: quic: move buffered ACK to streambuf
- MEDIUM: quic: handle out-of-order ACK at streamdesc layer
- MEDIUM: quic: refactor buffered STREAM ACK consuming
- BUG/MEDIUM: queue: always dequeue the backend when redistributing the last server
- MINOR: config/trace: Add a 'traces' section to declare debug traces
- MINOR: trace: Be able to chain commands for a source in one line
- MINOR: tcpcheck: Add support for an option host header value for httpchk option
- BUG/MINOR: mux-h1: Fix condition to set EOI on SE during zero-copy forwarding
- MINOR: mux-h1: Use a dedicated function to conditionnaly set EOI flag on SE
- BUG/MINOR: http-ana: Disable fast-fwd for unfinished req waiting for upgrade
- BUG/MINOR: mux-quic: fix crash on qcc_init() early return
- BUG/MINOR: quic: fix trace on releasing STREAM frame after ack
Fix a NULL argument passed to qc_release_frm(). This allows giving more
context in the traces inside it. Note that no crash occurred as QUIC
traces always check the validity of the first arg before dereferencing it.
No backport needed.
qcc_release() may be used in case qcc_init() cannot complete. In this
case, connection instance is NULL. As such, it cannot be dereferenced
without testing it first.
This should fix github coverity report #2739.
No backport needed.
If a request is waiting for a protocol upgrade but it is not finished, the
data fast-forwarding is disabled. Otherwise, the request analyzers will miss
the end of the message.
This case is possible since the commit 01fb1a54 ("BUG/MEDIUM: mux-h1/mux-h2:
Reject upgrades with payload on H2 side only"). Indeed, before, a protocol
upgrade was not allowed for request with payload. But it is now possible and
this comes with a side-effect. It is not really satisfying but for now there
is no other way to sync the muxes and the applicative stream. It seems to be
a reasonable fix for now, waiting for a deeper refactoring.
This patch must be backported with the commit above.
The same conditions are evaluated in h1_process_demux() and h1_fastfwd() to
know if SE_FL_EOI flag must be set or not on the sedesc. So now, a dedicated
function is used.
During zero-copy data forwarding, the producer must set the EOI flag on the SE
when the end of the message is reached. It is already done, but there is a case
where this flag is set while it should not be. When a request wants to perform a protocol
upgrade and it is waiting for the server response, the flag must not be set
because the HTTP message is finished but some data are possibly still expected,
depending on the server response. On a 101-switching-protocol, more data will be
sent because the producer is switch to TUNNEL state.
So, now, the right condition is used. In DONE state, SE_FL_EOI flag is set on the sedesc iff:
- it is the response
- it is the request and the response is also in DONNE state
- it is a request but no a protocol upgrade nor a CONNECT
This patch must be backported as far as 2.9.