haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-12-20 17:10:59 +01:00

Author	SHA1	Message	Date
Willy Tarreau	2d18717fb8	BUILD: pools: fix build error on clang with inline vs forceinline clang is more picky than gcc regarding duplicate "inline". The functions declared with "forceinline" don't need to have "inline" since it's already in the macro.	2023-08-12 19:58:17 +02:00
Willy Tarreau	29eed99b50	MINOR: pools: make pool_evict_last_items() use pool_put_to_os_no_dec() The bucket is already known, no need to calculate it again. Let's just include the lower level functions.	2023-08-12 19:04:34 +02:00
Willy Tarreau	7bf829ace1	MAJOR: pools: move the shared pool's free_list over multiple buckets This aims at further reducing the contention on the free_list when using global pools. The free_list pointer now appears for each bucket, and both the alloc and the release code skip to a next bucket when ending on a contended entry. The default entry used for allocations and releases depends on the thread ID so that locality is preserved as much as possible under low contention. It would be nice to improve the situation to make sure that releases to the shared pools don't consider the first entry's pointer but only an argument that would be passed and that would correspond to the bucket in the thread's cache. This would reduce computations and make sure that the shared cache only contains items whose pointers match the same bucket. This was not yet done. One possibility could be to keep the same splitting in the local cache. With this change, an h2load test with 5 * 160 conns & 40 streams on 80 threads that was limited to 368k RPS with the shared cache jumped to 3.5M RPS for 8 buckets, 4M RPS for 16 buckets, 4.7M RPS for 32 buckets and 5.5M RPS for 64 buckets.	2023-08-12 19:04:34 +02:00
Willy Tarreau	8a0b5f783b	MINOR: pools: move the failed allocation counter over a few buckets The failed allocation counter cannot depend on a pointer, but since it's a perpetually increasing counter and not a gauge, we don't care where it's incremented. Thus instead we're hashing on the TID. There's no contention there anyway, but it's better not to waste the room in the pool's heads and to move that with the other counters.	2023-08-12 19:04:34 +02:00
Willy Tarreau	da6999f839	MEDIUM: pools: move the needed_avg counter over a few buckets That's the same principle as for ->allocated and ->used. Here we return the sum of the raw values, so the result still needs to be fed to swrate_avg(). It also means that we now use the local ->used instead of the global one for the calculations and do not need to call pool_used() anymore on fast paths. The number of samples should likely be divided by the number of buckets, but that's not done yet (better observe first). A function pool_needed_avg() was added to report aggregated values for the "show pools" command. With this change, an h2load made of 5 * 160 conn * 40 streams on 80 threads raised from 1.5M RPS to 6.7M RPS.	2023-08-12 19:04:34 +02:00
Willy Tarreau	9e5eb586b1	MEDIUM: pools: move the used counter over a few buckets That's the same principle as for ->allocated. The small difference here is that it's no longer possible to decrement ->used in batches when releasing clusters from the cache to the shared cache, so the counter has to be decremented for each of them. But as it provides less contention and it's done only during forced eviction, it shouldn't be a problem. A function "pool_used()" was added to return the sum of the entries. It's used by pool_alloc_nocache() and pool_free_nocache() which need to count the number of used entries. It's not a problem since such operations are done when picking/releasing objects to/from the OS, but it is a reminder that the number of buckets should remain small. With this change, an h2load test made of 5 * 160 conn * 40 streams on 80 threads raised from 812k RPS to 1.5M RPS.	2023-08-12 19:04:34 +02:00
Willy Tarreau	cdb711e42b	MEDIUM: pools: spread the allocated counter over a few buckets The ->used counter is one of the most stressed, and it heavily depends on the ->allocated one, so let's first move ->allocated to a few buckets. A function "pool_allocated()" was added to return the sum of the entries. It's important not to abuse it as it does iterate, so everywhere it's possible to avoid it by keeping a local counter, it's better. Currently it's used for limited pools which need to make sure they do not allocate too many objects. That's an acceptable tradeoff to save CPU on large machines at the expense of spending a little bit more on small ones which normally are not under load.	2023-08-12 19:04:34 +02:00
Willy Tarreau	06885aaea7	MINOR: pools: introduce the use of multiple buckets On many threads and without the shared cache, there can be extreme contention on the ->allocated counter, the ->free_list pointer, and the ->used counter. It's possible to limit this contention by spreading the counters a little bit over multiple entries, that are summed up when a consultation is needed. The criterion used to spread the values cannot be related to the thread ID due to migrations, since we need to keep consistent stats (allocated vs used). Instead we'll just hash the pointer, it provides an index that does the job and that is consistent for the object. When having just a few entries (16 here as it showed almost identical performance between global and non-global pools) even iterations should be short enough during measurements to not be a problem. A pair of functions designed to ease pointer hash bucket calculation were added, with one of them doing it for thread IDs because allocation failures will be associated with a thread and not a pointer. For now this patch only brings in the relevant parts of the infrastructure, the CONFIG_HAP_POOL_BUCKETS_BITS macro that defaults to 6 bits when 512 threads or more are supported, 5 bits when 128 or more are supported, 4 bits when 16 or more are supported, otherwise 3 bits for small setups. The array in the pool_head and the two utility functions are already added. It should have no measurable impact beyond inflating the pool_head structure.	2023-08-12 19:04:34 +02:00
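The bucket scheme described in this commit can be illustrated with a minimal sketch. Everything below is hypothetical and is not haproxy's code: the names `MAX_THREADS_SKETCH`, `BUCKETS_BITS`, `pool_sketch`, `ptr_bucket()`, `pool_account_alloc()`, `pool_account_free()` and `pool_allocated_sketch()` are made up for illustration, and the multiplicative hash is only one possible choice of pointer hash. Only the bit-count thresholds (6/5/4/3 bits) come from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the build-time thread limit. */
#ifndef MAX_THREADS_SKETCH
#define MAX_THREADS_SKETCH 64
#endif

/* Thresholds taken from the commit message: 6 bits for 512+ threads,
 * 5 bits for 128+, 4 bits for 16+, otherwise 3 bits.
 */
#if MAX_THREADS_SKETCH >= 512
# define BUCKETS_BITS 6
#elif MAX_THREADS_SKETCH >= 128
# define BUCKETS_BITS 5
#elif MAX_THREADS_SKETCH >= 16
# define BUCKETS_BITS 4
#else
# define BUCKETS_BITS 3
#endif
#define BUCKETS (1u << BUCKETS_BITS)

/* One counter per bucket instead of a single contended counter. */
struct pool_sketch {
	atomic_uint allocated[BUCKETS];
};

/* Hash a pointer to a bucket index. The same object always maps to the
 * same bucket, whichever thread touches it (assumed hash, not haproxy's).
 */
static unsigned int ptr_bucket(const void *ptr)
{
	uint64_t h = (uint64_t)(uintptr_t)ptr * 0x9E3779B97F4A7C15ULL;

	return (unsigned int)(h >> (64 - BUCKETS_BITS));
}

static void pool_account_alloc(struct pool_sketch *p, const void *obj)
{
	assert(p != NULL);
	atomic_fetch_add_explicit(&p->allocated[ptr_bucket(obj)], 1,
	                          memory_order_relaxed);
}

static void pool_account_free(struct pool_sketch *p, const void *obj)
{
	assert(p != NULL);
	atomic_fetch_sub_explicit(&p->allocated[ptr_bucket(obj)], 1,
	                          memory_order_relaxed);
}

/* Sum all buckets; this iterates, so it should stay off fast paths. */
static unsigned int pool_allocated_sketch(struct pool_sketch *p)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < BUCKETS; i++)
		sum += atomic_load_explicit(&p->allocated[i], memory_order_relaxed);
	return sum;
}
```

Because the index derives from the object's address rather than from the thread ID, an object released by a different thread than the one that allocated it still decrements the bucket it incremented, which keeps the per-bucket values consistent.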
Willy Tarreau	29ad61fb00	OPTIM: pools: make pool_get_from_os() / pool_put_to_os() not update ->allocated The pool's allocation counter doesn't strictly require to be updated from these functions, it may more efficiently be done in the caller (even out of a loop for pool_flush() and pool_gc()), and doing so will also help us spread the counters over an array later. The functions were renamed _noinc and _nodec to make sure we catch any possible user in an external patch. If needed, the original functions may easily be reimplemented in an inline function.	2023-08-12 19:04:34 +02:00
Willy Tarreau	feeda4132b	OPTIM: pools: use exponential back-off on shared pool allocation/release Running a stick-table stress with -dMglobal under 56 threads shows extreme contention on the pool's free_list because it has to be processed in two phases and only used to implement a cpu_relax() on the retry path. Let's at least implement exponential back-off here to limit the neighbor's noise and reduce the time needed to successfully acquire the pointer. Just doing so shows there's still contention but almost doubled the performance, from 1.1 to 2.1M req/s.	2023-08-12 19:04:34 +02:00
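As a rough illustration of exponential back-off on a contended pointer, here is a simplified sketch. It is not the haproxy implementation: the commit mentions a two-phase operation on the free_list, whereas this example uses a plain single-CAS push onto a LIFO list. The names `item`, `relax_sketch()` and `push_with_backoff()`, as well as the cap of 1024 iterations, are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct item {
	struct item *next;
};

/* Placeholder for a CPU pause instruction; here only a compiler barrier. */
static void relax_sketch(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/* Push <it> on top of the list at <head>. On each failed CAS, wait twice
 * as long as the previous time (up to an arbitrary cap) before retrying,
 * so that competing threads stop hammering the same cache line.
 */
static void push_with_backoff(_Atomic(struct item *) *head, struct item *it)
{
	unsigned int wait = 1;
	struct item *old;

	assert(head != NULL && it != NULL);
	old = atomic_load_explicit(head, memory_order_relaxed);
	for (;;) {
		it->next = old;
		if (atomic_compare_exchange_weak_explicit(head, &old, it,
		                                          memory_order_release,
		                                          memory_order_relaxed))
			return;
		/* CAS failed: <old> was refreshed, back off then retry */
		for (unsigned int i = 0; i < wait; i++)
			relax_sketch();
		if (wait < 1024)
			wait <<= 1;
	}
}
```

The doubling delay is the only point of the sketch: a fixed single pause on the retry path lets all waiters retry at the same pace, while a growing delay spreads them out.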
Willy Tarreau	45eeaad45f	MEDIUM: peers: drop the stick-table lock before entering peer_send_teachmsgs() The function drops the lock very early, and the only operations that are performed on the entry code are updating the current peer's last_local_table, which doesn't need to be protected. Thus it's easier to drop the lock before entering the function and it further limits its scope. This has raised the peak RPS from 2050 to 2355k/s with a peers section on the 80-core machine.	2023-08-11 19:03:35 +02:00
Willy Tarreau	cfeca3a3a3	MEDIUM: stick-table: touch updates under an upgradable read lock Instead of taking the update's write lock in stktable_touch_with_exp(), while most of the time under high load there is nothing to update because the entry is touched before having been synchronized present, let's do the check under a read lock and upgrade it to perform the update if needed. These updates are rare and the contention is not expected to be very high, so at the first failure to upgrade we retry directly with a write lock. By doing so the performance has almost doubled again, from 1140 to 2050k with a peers section enabled. The contention is now on taking the read lock itself, so there's little to be gained beyond this in this function.	2023-08-11 19:03:35 +02:00
Willy Tarreau	87e072eea5	MEDIUM: stick-table: use a distinct lock for the updates tree Updating an entry in the updates tree is currently performed under the table's write lock, which causes huge contention with other accesses such as lookups and free. Aside the updates tree, the update, localupdate and commitupdate variables, nothing is manipulated, so let's create a distinct lock (updt_lock) to protect these together to remove this contention. It required to add an extra lock in the few places where we delete the update (though only if we're really going to delete it) to protect the tree. This is very convenient because now peer_send_teachmsgs() only needs to take this read lock, and there is very little contention left on the stick-table. With this alone, the performance jumped from 614k to 1140k/s on a 80-thread machine with a peers section! Stick-table updates with no peers however now has to stand two locks and slightly regressed from 4.0-4.1M/s to 3.9-4.0. This is fairly minimal compared to the significant unlocking of the peers updates and considered totally acceptable.	2023-08-11 19:03:35 +02:00
Willy Tarreau	29982ea769	MEDIUM: peers: only read-lock peer_send_teachmsgs() This function doesn't need to be write-locked. It performs a lookup of the next update at its index, atomically updates the ref_cnt on the stksess, updates some shared_table fields on the local thread, and updates the table's commitupdate. Now that this update is atomic we don't need to keep the write lock during that period. In addition this function's callers do not rely on the write lock to be held either since it was dropped during peer_send_updatemsg() anyway. Now, when the function is entered with a write lock, it's downgraded to a read lock, otherwise a read lock is grabbed. Updates are looked up under the read lock and the message is sent without the lock. The commitupdate is still performed under the read lock (so as not to break the code too much), and the write lock is re-acquired when leaving if needed. This allows multiple peers to look up updates in parallel and to avoid stalling stick-table lookups.	2023-08-11 19:03:35 +02:00
Willy Tarreau	d4f8286e45	MEDIUM: peers: drop then re-acquire the wrlock in peer_send_teachmsgs() This function maintains the write lock for a while. In practice it does not need to hold it that long, and some parts could be performed under a read lock. This patch first drops then re-acquires the write lock at the function's entry. The purpose is simply to break the end-to-end atomicity to prove that it has no impact in case something needs to be bisected later. In fact the write lock is already dropped while calling peer_send_updatemsg().	2023-08-11 19:03:35 +02:00
Willy Tarreau	4eddf26f58	MEDIUM: peers: update ->commitupdate out of the lock using a CAS The ->commitupdate index doesn't need to be kept consistent with other operations, it only needs to be correct and to reflect the last known value. Right now it's updated under the stick-table lock, which is expensive and maintains this lock longer than needed. Let's move it outside of the lock, and update it using a CAS. This patch simply replaces the assignment with a CAS and makes sure all reads are atomic. On failed CAS we use a simple cpu_relax(), no need for more as there should not be that much contention here (updates are not that fast).	2023-08-11 19:03:35 +02:00
Willy Tarreau	7968fe3889	MEDIUM: stick-table: change the ref_cnt atomically Due to the ts->ref_cnt being manipulated and checked inside wrlocks, we continue to have it updated under plenty of read locks, which have an important cost on many-thread machines. This patch turns them all to atomic ops and carefully moves them outside of locks every time this is possible: - the ref_cnt is incremented before write-unlocking on creation otherwise the element could vanish before we can do it - the ref_cnt is decremented after write-locking on release - for all other cases it's updated out of locks since it's guaranteed by the sequence that it cannot vanish - checks are done before locking every time it's used to decide whether we're going to release the element (saves several write locks) - expiration tests are just done using atomic loads, since there's no particular ordering constraint there, we just want consistent values. For Lua, the loop that is used to dump stick-tables could switch to read locks only, but this was not done. For peers, the loop that builds updates in peer_send_teachmsgs is extremely expensive in write locks and it doesn't seem this is really needed since the only updated variables are last_pushed and commitupdate, the first one being on the shared table (thus not used by other threads) and the commitupdate could likely be changed using a CAS. Thus all of this could theoretically move under a read lock, but that was not done here. On a 80-thread machine with a peers section enabled, the request rate increased from 415 to 520k rps.	2023-08-11 19:03:35 +02:00
Willy Tarreau	73b1dea4d1	MINOR: stick-table: move the task_wakeup() call outside of the lock The write lock in stktable_touch_with_exp() is quite expensive and should be shortened as much as possible. There's no need for it when calling task_wakeup() so let's move it out. On a 80-thread machine with a peers section, the request rate increased from 397k to 415k rps.	2023-08-11 19:03:35 +02:00
Willy Tarreau	322e4ab9d2	MINOR: stick-table: move the task_queue() call outside of the lock The write lock in stktable_requeue_exp() is quite expensive and should be shortened as much as possible. There's no need for it when calling task_queue() so let's move it out. On a 80-thread machine with a peers section, the request rate increased from 368k to 397k rps.	2023-08-11 19:03:35 +02:00
Aurelien DARRAGON	09133860bf	BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread Michel Mayen reported that mixing lua actions loaded from 'lua-load' and 'lua-load-per-thread' directives within a single http/tcp session yields unexpected results. When executing action defined in another running context from the one of the previously executed action (from lua-load, then from lua-load-per-thread or the opposite, order doesn't matter), it would yield this kind of error: "Lua function 'name': [state-id x] runtime error: attempt to call a nil value from ." He also noted that when loading all actions using the same loading directive, the issue is gone. This is due to the fact that for lua actions, fetches and converters, lua code is being executed from the stream lua context. However, the stream lua context, which is created on the fly when first executing some lua code related to the stream, is reused between multiple lua executions. But the thing is, despite successive executions referring to the same parent "stream" (which is also assigned to a given thread id), they don't necessarily depend on the same running context from lua point of view. Indeed, since the function which is about to be executed could have been loaded from either 'lua-load' or 'lua-load-per-thread', the function declaration and related dependencies are defined in a specific stack ID which is known by calling fcn_ref_to_stack_id() on the given function. Thus, in order to make streams capable of chaining lua actions, fetches and converters loaded in different lua stacks, we add a new detection logic in hlua_stream_ctx_prepare() to be able to recreate the lua context in the proper stack space when the existing one conflicts with the expected stack id. This must be backported in every stable versions. 
It depends on: - "MINOR: hlua: add hlua_stream_ctx_prepare helper function" [for < 2.5, skip the filter part since they didn't exist] [wt: warning, wait a little bit before backporting too far, we need to be certain the added BUG_ON() will never trigger]	2023-08-11 19:02:59 +02:00
Aurelien DARRAGON	2fdb9d41b3	MINOR: hlua: add hlua_stream_ctx_prepare helper function Stream-dedicated hlua ctx creation and attachment is now performed in hlua_stream_ctx_prepare() helper function to ease code maintenance. No functional behavior change should be expected.	2023-08-11 19:00:57 +02:00
Aurelien DARRAGON	12cf8d4db7	BUG/MINOR: hlua: fix invalid use of lua_pop on error paths Multiple error paths made invalid use of lua_pop(): When the stack is emptied using lua_settop(0), lua_pop() (which is implemented as a lua_settop() macro) should not be used right after, because it could lead to invalid reads since the stack is already empty. Unfortunately, some remnants from the initial lua stack implementation kept doing so, resulting in haproxy crashes on some lua runtime error paths from time to time (i.e. ERRRUN, ERRMEM). Moreover, the extra lua_pop() instruction, even if it was safe, is totally pointless in such a case. Removing such unsafe lua_pop() statements when we know that the stack is already empty. This must be backported to all stable versions.	2023-08-11 19:00:55 +02:00
Amaury Denoyelle	7f80d51812	BUG/MEDIUM: quic: fix tasklet_wakeup loop on connection closing It is possible to trigger a loop of tasklet calls if a QUIC connection is interrupted abruptly by the client. This is caused by the following interaction : * FD iocb is woken up for read. This causes a wakeup on quic_conn tasklet. * quic_conn_io_cb is run and tries to read but fails as the connection socket is closed (typically with an ECONNREFUSED). FD read is subscribed to the poller via qc_rcv_buf() which will cause the loop. The looping will stop automatically once the idle-timeout is expired and the connection instance is finally released. To fix this, ensure FD read is subscribed only for transient error cases (EAGAIN or similar). All other cases are considered as fatal and thus all future read operations will fail. Note that for the moment, nothing is reported on the quic_conn which may not skip future reception. This should be improved in a future commit to accelerate connection closing. This bug can be reproduced on a frequent occurrence by interrupting the following command. Quic traces should be activated on haproxy side to detect the loop : $ ngtcp2-client --tp-file=/tmp/ngtcp2-tp.txt \ --session-file=/tmp/ngtcp2-session.txt \ -r 0.3 -t 0.3 --exit-on-all-streams-close 127.0.0.1 20443 \ "http://127.0.0.1:20443/?s=1024" This must be backported up to 2.7.	2023-08-11 17:04:20 +02:00
Frédéric Lécaille	d355bce7e4	BUG/MINOR: quic: Missing tasklet (quic_cc_conn_io_cb) memory release (leak) The tasklet responsible for handling the remaining QUIC connection object and its traffic was not released, leading to a memory leak. Furthermore its callback, quic_cc_conn_io_cb(), should return NULL after this tasklet is released.	2023-08-11 11:43:19 +02:00
Frédéric Lécaille	b0e32c6263	BUG/MINOR: quic: Possible crash when issuing "show fd/sess" CLI commands ->xprt_ctx (struct ssl_sock_ctx) and ->conn (struct connection) must be kept by the remaining QUIC connection object (struct quic_cc_conn) after having released the previous one (struct quic_conn) to allow "show fd/sess" commands to be functional without causing haproxy crashes. No need to backport.	2023-08-11 11:21:31 +02:00
Frédéric Lécaille	5d602f4f33	MINOR: quic: Add a trace for QUIC conn fd ready for receive Add a trace as this is done for the "send ready" fd state.	2023-08-11 08:57:47 +02:00
Frédéric Lécaille	f3edbc792e	BUG/MINOR: quic: Possible crash in quic_cc_conn_io_cb() traces. Reset the local cc_qc and qc after having released cc_qc. Note that cc_qc == qc. This is required to prevent haproxy from crashing when TRACE_LEAVE() is called. No need to backport.	2023-08-10 17:21:19 +02:00
Frédéric Lécaille	1ab6126ca7	BUG/MINOR: quic: mux started when releasing quic_conn There are cases where the mux is started before the handshake is completed: during 0-RTT sessions. So, it was a bad idea to try to release the quic_conn object from quic_conn_io_cb() without checking if the mux is started. No need to backport.	2023-08-10 17:15:21 +02:00
Willy Tarreau	d39a9cbd13	BUILD: mux-h1: shut a build warning on clang from previous commit Commit 5201b4abd ("BUG/MEDIUM: mux-h1: do not forget EOH even when no header is sent") introduced a build warning on clang due to the remaining two parentheses in the expression. Let's fix this. No backport needed.	2023-08-09 16:03:39 +02:00
Willy Tarreau	5201b4abd1	BUG/MEDIUM: mux-h1: do not forget EOH even when no header is sent Since commit 723c73f8a ("MEDIUM: mux-h1: Split h1_process_mux() to make code more readable"), outgoing H1 requests with no header at all (i.e. essentially HTTP/1.0 requests) get delayed by 200ms. Christopher found that it's due to the fact that we end processing too early and we don't have the opportunity to send the EOH in this case. This fix addresses it by verifying if it's required to emit EOH when returning from h1_make_headers(). But maybe that block could be moved after the while loop in fact, or the stop condition in the loop be revisited not to stop on !htx_is_empty(). The current solution gets the job done at least. No backport is needed, this was in 2.9-dev.	2023-08-09 11:58:15 +02:00
Willy Tarreau	949371a00d	BUG/MEDIUM: mux-h1: fix incorrect state checking in h1_process_mux() That's a regression introduced in 2.9-dev by commit 723c73f8a ("MEDIUM: mux-h1: Split h1_process_mux() to make code more readable") and found by Christopher. The consequence is uncertain but the test definitely was not right in that it would catch most existing states (H1_MSG_DONE=30). At least it would emit too many "H1 request fully xferred". No backport needed.	2023-08-09 11:51:58 +02:00
Willy Tarreau	22731762d9	BUG/MINOR: http: skip leading zeroes in content-length values Ben Kallus also noticed that we preserve leading zeroes on content-length values. While this is totally valid, it would be safer to at least trim them before passing the value, because a bogus server written to parse using "strtol(value, NULL, 0)" could inadvertently take a leading zero as a prefix for an octal value. While there is not much that can be done to protect such servers in general (e.g. lack of check for overflows etc), at least it's quite cheap to make sure the transmitted value is normalized and not taken for an octal one. This is not really a bug, rather a missed opportunity to sanitize the input, but is marked as a bug so that we don't forget to backport it to stable branches. A combined regtest was added to h1or2_to_h1c which already validates end-to-end syntax consistency on aggregate headers.	2023-08-09 11:28:48 +02:00
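The octal hazard mentioned in this commit can be reproduced with the standard `strtol()` call quoted in the message. The helper `skip_leading_zeroes()` below is a hypothetical illustration of the normalization idea, not the function haproxy uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Return a pointer past the leading zeroes of a decimal string, always
 * keeping at least one digit so that "000" normalizes to "0".
 * Hypothetical helper, written only to illustrate the idea.
 */
static const char *skip_leading_zeroes(const char *v)
{
	assert(v != NULL);
	while (v[0] == '0' && v[1] >= '0' && v[1] <= '9')
		v++;
	return v;
}

/* What a bogus server parsing with base 0 would compute for <value>. */
static long bogus_server_parse(const char *value)
{
	return strtol(value, NULL, 0);
}
```

With base 0, `bogus_server_parse("010")` yields 8 because the leading zero selects octal, whereas `bogus_server_parse(skip_leading_zeroes("010"))` yields 10.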
Willy Tarreau	6492f1f29d	BUG/MAJOR: http: reject any empty content-length header value The content-length header parser has its dedicated function, in order to take extreme care about invalid, unparsable, or conflicting values. But there's a corner case in it, by which it stops comparing values when reaching the end of the header. This has for a side effect that an empty value or a value that ends with a comma does not deserve further analysis, and it acts as if the header was absent. While this is not necessarily a problem for the value ending with a comma as it will cause a header folding and will disappear, it is a problem for the first isolated empty header because this one will not be reconstructed when next ones are seen, and will be passed as-is to the backend server. A vulnerable HTTP/1 server hosted behind haproxy that would just use this first value as "0" and ignore the valid one would then not be protected by haproxy and could be attacked this way, taking the payload for an extra request. In the field the risk depends on the server. Most commonly used servers already have safe content-length parsers, but users relying on haproxy to protect a known-vulnerable server might be at risk (and the risk of a bug even in a reputable server should never be dismissed). A configuration-based work-around consists in adding the following rule in the frontend, to explicitly reject requests featuring an empty content-length header that would not have been folded into an existing one: http-request deny if { hdr_len(content-length) 0 } The real fix consists in adjusting the parser so that it always expects a value at the beginning of the header or after a comma. It will now reject requests and responses having empty values anywhere in the C-L header. This needs to be backported to all supported versions. Note that the modification was made to functions h1_parse_cont_len_header() and http_parse_cont_len_header(). Prior to 2.8 the latter was in h2_parse_cont_len_header(). 
One day the two should be merged but the former is also used by Lua. The HTTP messaging reg-tests were completed to test these cases. Thanks to Ben Kallus of Dartmouth College and Narf Industries for reporting this! (this is in GH #2237).	2023-08-09 09:27:38 +02:00
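The rule stated in this fix, namely that a value is always expected at the beginning of the header and after each comma, can be sketched as follows. This is a hypothetical stand-alone parser, not h1_parse_cont_len_header(): the name `parse_cl_sketch()`, its return convention, and the choice to accept repeated identical values while rejecting differing ones are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Parse a comma-separated content-length value list.
 * Returns 0 and stores the length in <out> on success, -1 on any empty,
 * non-numeric, overflowing or conflicting value.
 */
static int parse_cl_sketch(const char *s, unsigned long long *out)
{
	unsigned long long first = 0;
	int seen = 0;

	assert(s != NULL && out != NULL);
	for (;;) {
		unsigned long long v = 0;

		while (*s == ' ' || *s == '\t')
			s++;
		/* a digit is mandatory here: at start and after each comma */
		if (*s < '0' || *s > '9')
			return -1;
		while (*s >= '0' && *s <= '9') {
			unsigned int d = (unsigned int)(*s - '0');

			if (v > (ULLONG_MAX - d) / 10)
				return -1; /* overflow */
			v = v * 10 + d;
			s++;
		}
		while (*s == ' ' || *s == '\t')
			s++;
		if (seen && v != first)
			return -1; /* conflicting values */
		first = v;
		seen = 1;
		if (*s == '\0')
			break;
		if (*s != ',')
			return -1; /* trailing garbage */
		s++;
	}
	*out = first;
	return 0;
}
```

Under these assumptions an empty header value, a leading comma and a trailing comma are all rejected, since each leaves the parser at a position where a digit is required.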
Willy Tarreau	2e97857a84	BUG/MINOR: h3: reject more chars from the :path pseudo header This is the h3 version of this previous fix: BUG/MINOR: h2: reject more chars from the :path pseudo header In addition to the current NUL/CR/LF, this will also reject all other control chars, the space and '#' from the :path pseudo-header, to avoid taking the '#' for a part of the path. It's still possible to fall back to the previous behavior using "option accept-invalid-http-request". Here the :path header value is scanned a second time to look for forbidden chars because we don't know upfront if we're dealing with a path header field or another one. This is no big deal anyway for now. This should be progressively backported to 2.6, along with the following commits it relies on (the same as for h2): REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests REORG: http: move has_forbidden_char() from h2.c to http.h MINOR: ist: add new function ist_find_range() to find a character range MINOR: http: add new function http_path_has_forbidden_char()	2023-08-08 19:56:41 +02:00
Willy Tarreau	b3119d4fb4	BUG/MINOR: h2: reject more chars from the :path pseudo header This is the h2 version of this previous fix: BUG/MINOR: h1: do not accept '#' as part of the URI component In addition to the current NUL/CR/LF, this will also reject all other control chars, the space and '#' from the :path pseudo-header, to avoid taking the '#' for a part of the path. It's still possible to fall back to the previous behavior using "option accept-invalid-http-request". This patch modifies the request parser to change the ":path" pseudo header validation function with a new one that rejects 0x00-0x1F (control chars), space and '#'. This way such chars will be dropped early in the chain, and the search for '#' doesn't incur a second pass over the header's value. This should be progressively backported to stable versions, along with the following commits it relies on: REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests REORG: http: move has_forbidden_char() from h2.c to http.h MINOR: ist: add new function ist_find_range() to find a character range MINOR: http: add new function http_path_has_forbidden_char() MINOR: h2: pass accept-invalid-http-request down the request parser	2023-08-08 19:56:41 +02:00
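The character set rejected from the ":path" pseudo-header (0x00-0x1F, space and '#') can be checked in a single pass. The function below is a hypothetical sketch named `path_has_forbidden_char_sketch()`; it is not http_path_has_forbidden_char() and does not use ist_find_range().

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the <len> bytes at <p> contain a control char (0x00-0x1F),
 * a space (0x20) or '#', otherwise 0. Single pass over the value.
 */
static int path_has_forbidden_char_sketch(const char *p, size_t len)
{
	assert(p != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)p[i];

		if (c <= 0x20 || c == '#')
			return 1;
	}
	return 0;
}
```

Since the control characters and the space form the contiguous range 0x00-0x20, one comparison covers them and '#' only needs one extra test per byte.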
Willy Tarreau	2eab6d3543	BUG/MINOR: h1: do not accept '#' as part of the URI component Seth Manesse and Paul Plasil reported that the "path" sample fetch function incorrectly accepts '#' as part of the path component. This can in some cases lead to misrouted requests for rules that would apply on the suffix: use_backend static if { path_end .png .jpg .gif .css .js } Note that this behavior can be selectively configured using "normalize-uri fragment-encode" and "normalize-uri fragment-strip". The problem is that while the RFC says that this '#' must never be emitted, as often it doesn't suggest how servers should handle it. A diminishing number of servers still do accept it and trim it silently, while others are rejecting it, as indicated in the conversation below with other implementers: https://lists.w3.org/Archives/Public/ietf-http-wg/2023JulSep/0070.html Looking at logs from publicly exposed servers, such requests appear at a rate of roughly 1 per million and only come from attacks or poorly written web crawlers incorrectly following links found on various pages. Thus it looks like the best solution to this problem is to simply reject such ambiguous requests by default, and include this in the list of controls that can be disabled using "option accept-invalid-http-request". We're already rejecting URIs containing any control char anyway, so we should also reject '#'. In the H1 parser for the H1_MSG_RQURI state, there is an accelerated parser for bytes 0x21..0x7e that has been tightened to 0x24..0x7e (it should not impact perf since 0x21..0x23 are not supposed to appear in a URI anyway). This way '#' falls through the fine-grained filter and we can add the special case for it also conditioned by a check on the proxy's option "accept-invalid-http-request", with no overhead for the vast majority of valid URIs. Here this information is available through h1m->err_pos that's set to -2 when the option is here (so we don't need to change the API to expose the proxy). 
Example with a trivial GET through netcat: [08/Aug/2023:16:16:52.651] frontend layer1 (#2): invalid request backend <NONE> (#-1), server <NONE> (#-1), event #0, src 127.0.0.1:50812 buffer starts at 0 (including 0 out), 16361 free, len 23, wraps at 16336, error at position 7 H1 connection flags 0x00000000, H1 stream flags 0x00000810 H1 msg state MSG_RQURI(4), H1 msg flags 0x00001400 H1 chunk len 0 bytes, H1 body len 0 bytes : 00000 GET /aa#bb HTTP/1.0\r\n 00021 \r\n This should be progressively backported to all stable versions along with the following patch: REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests Similar fixes for h2 and h3 will come in followup patches. Thanks to Seth Manesse and Paul Plasil for reporting this problem with detailed explanations.	2023-08-08 19:56:11 +02:00
Willy Tarreau	d93a00861d	MINOR: h2: pass accept-invalid-http-request down the request parser We're adding a new argument "relaxed" to h2_make_htx_request() so that we can control its level of acceptance of certain invalid requests at the proxy level with "option accept-invalid-http-request". The goal will be to add deactivable checks that are still desirable to have by default. For now no test is subject to it.	2023-08-08 19:10:54 +02:00
Willy Tarreau	db97bb42d9	MINOR: mux-h2/traces: also suggest invalid header upon parsing error Historically the parsing error used to apply only to too large headers, so this is what has been reported in traces. But nowadays we can also reject invalid characters, and when this happens the trace is a bit misleading, so let's mention "or invalid".	2023-08-08 19:02:24 +02:00
Willy Tarreau	d13a80abb7	BUG/MAJOR: h3: reject header values containing invalid chars In practice it's exactly the same for h3 as 54f53ef7c ("BUG/MAJOR: h2: reject header values containing invalid chars") was for h2: we must make sure never to accept NUL/CR/LF in any header value because this may be used to construct split headers on the backend. Hence we apply the same solution. Here pseudo-headers, headers and trailers are checked separately, which explains why we have 3 locations instead of 2 for h2 (+1 for response which we don't have here). This is marked major for consistency and due to the impact if abused, but the reality is that at the time of writing, this problem is limited by the scarcity of the tools which would permit to build such a request in the first place. But this may change over time. This must be backported to 2.6. This depends on the following commit that exposes the filtering function: REORG: http: move has_forbidden_char() from h2.c to http.h	2023-08-08 19:02:24 +02:00
Willy Tarreau	d4069f3cee	REORG: http: move has_forbidden_char() from h2.c to http.h This function is not H2 specific but rather generic to HTTP. We'll need it in H3 soon, so let's move it to HTTP and rename it to http_header_has_forbidden_char().	2023-08-08 19:02:24 +02:00
Frédéric Lécaille	7c730803dc	MINOR: quic: Warning for OpenSSL wrapper QUIC bindings without "limited-quic" If the "limited-quic" global option was not set, the QUIC listener bindings were not bound. This is OK, but it was silently ignored. Add a warning in these cases to ask the user to explicitly enable the QUIC bindings when building QUIC support against a TLS/SSL library without QUIC support (OpenSSL).	2023-08-08 14:59:17 +02:00
Frédéric Lécaille	ab95230200	MINOR: quic: Release asap quic_conn memory from ->close() xprt callback. Add a condition to release asap the quic_conn memory when the connection is in "connection close" state from ->close() QUIC xprt callback.	2023-08-08 14:59:17 +02:00
Frédéric Lécaille	b930ff03d6	MINOR: quic: Release asap quic_conn memory (application level) Add a check to the QUIC packet handler running at application level (after the handshake is complete) to release the quic_conn memory calling quic_conn_release(). This is done only if the mux is not started.	2023-08-08 14:59:17 +02:00
Frédéric Lécaille	9f7cfb0a56	MEDIUM: quic: Allow the quic_conn memory to be asap released. When the connection enters the "connection closing" state after having sent a datagram with CONNECTION_CLOSE frames inside its packets, a lot of memory may be freed from quic_conn objects (QUIC connection). This is done by allocating a reduced-size object which keeps enough information to handle the remaining incoming packets for the connection in "connection closing" state, and to keep resending the previous datagram with CONNECTION_CLOSE frames inside which has already been sent. Define a new quic_cc_conn struct which represents the connection object after entering the "connection close" state and after having released the quic_conn connection object. Define <pool_head_quic_cc_conn> new pool for these quic_cc_conn struct objects. Define QUIC_CONN_COMMON structure which is shared between quic_conn struct object (the connection before entering "connection close" state), and new quic_cc_conn struct object (the connection after entering "connection close"). So, all the members inside QUIC_CONN_COMMON may be indifferently dereferenced from a quic_conn struct or a quic_cc_conn struct pointer. Implement qc_new_cc_conn() function to allocate such connections in "connection close" state. This function is responsible for copying the required information from the original connection (quic_conn) to the remaining connection (quic_cc_conn). Among other initializations, it redefines the QUIC packet handler task to quic_cc_conn_io_cb() and the idle timer task to qc_cc_idle_timer_task(). quic_cc_conn_io_cb() drains the received packets and resends the datagram with the CONNECTION_CLOSE frame which has already been sent when entering "connection close" state. qc_cc_idle_timer_task() only releases the remaining quic_cc_conn struct object. Modify quic_conn_release() to allocate quic_cc_conn struct objects from the original connection passed as argument. It does nothing if this original connection is not in closing state, or if the idle timer has already expired. Implement quic_release_cc_conn() to release a "connection close" connection. It is called when its timer expires or if an error occurred when sending a packet from this connection when the peer is no longer reachable.	2023-08-08 14:59:17 +02:00
Frédéric Lécaille	276697438d	MINOR: quic: Use a pool for the connection ID tree. Add "quic_cids" new pool to allocate the ->cids trees of quic_conn objects. Replace ->cids member of quic_conn objects by pointer to "quic_cids" and adapt the code consequently. Nothing special.	2023-08-08 10:57:00 +02:00
Frédéric Lécaille	dc9b8e1f27	MEDIUM: quic: Send CONNECTION_CLOSE packets from a dedicated buffer. Add a new pool <pool_head_quic_cc_buf> for the buffers used when building datagrams with CONNECTION_CLOSE frames inside, with QUIC_MIN_CC_PKTSIZE(128) as minimum size. Add ->cc_buf_area to quic_conn struct to store such buffers. Add ->cc_dgram_len to store the size of the "connection close" datagrams and ->cc_buf, a buffer struct to be used with ->cc_buf_area as ->area member value. Implement qc_get_txb() to be called in place of qc_txb_alloc() to allocate a struct "quic_cc_buf" buffer when the connection needs an immediate close or a buffer struct if not. Modify qc_prep_hptks() and qc_prep_app_pkts() to allow them to use such "quic_cc_buf" buffer when an immediate close is required.	2023-08-08 10:57:00 +02:00
Frédéric Lécaille	f7ab5918d1	MINOR: quic: Move some counters from [rt]x quic_conn anonymous struct Move rx.bytes, tx.bytes and tx.prep_bytes quic_conn struct member to bytes anonymous struct (bytes.rx, bytes.tx and bytes.prep member respectively). They are moved before being defined into a bytes anonymous struct common to a future struct to be defined. Consequently adapt the code.	2023-08-07 18:57:45 +02:00
Frédéric Lécaille	a45f90dd4e	MINOR: quic: Amplification limit handling sanitization. Add a BUG_ON() to quic_peer_validated_addr() to check the amplification limit is respected when it returns false(0), i.e. when the connection is not validated. Implement quic_may_send_bytes() which returns the number of bytes which may be sent when the connection has not already been validated and call this function at several places when this is the case (after having called quic_peer_validated_addr()). Furthermore, this patch improves the code maintainability. Some patches to come will have to rename ->[rt]x.bytes quic_conn struct members.	2023-08-07 18:57:45 +02:00
Christopher Faulet	624979cf3b	BUG/MAJOR: http-ana: Get a fresh trash buffer for each header value replacement When a "replace-header" action is used, we loop on all headers in the message to change the value of all headers matching a name. The new value is placed in a trash. However, there is a race here because if the message must be defragmented, another trash is used. If several defragmentations are performed because several headers must be updated at the same time, the first trash, used to store the new value, may be crushed. Indeed, there are only 2 pre-allocated trashes used in rotation, and the trash used to store the new value is never renewed. As a consequence, random data may be inserted into the header value. Here, to fix the issue, we must take care to refresh the trash buffer when we evaluate a new header. This way, one trash is used for the new value, and possibly another one for the htx defragmentation. But that's all. Thanks to Christian Ruppert for his detailed report. This patch must be backported to all stable versions. On the 2.0, the patch must be applied on src/proto_htx.c and the function is named htx_transform_header_str().	2023-08-04 17:06:31 +02:00
Amaury Denoyelle	559482c11e	MINOR: h3: abort request if not completed before full response An HTTP server may provide a complete response even prior to receiving the full request. In this case, RFC 9114 allows the server to abort read with a STOP_SENDING with error code H3_NO_ERROR. This scenario was notably reproduced with haproxy and an inactive server. If the client sends a POST request, haproxy may provide a full HTTP 503 response before the end of the full request.	2023-08-04 16:17:16 +02:00
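
The h2/h3 "reject header values containing invalid chars" entries above state that NUL, CR and LF must never be accepted in a header value. A minimal sketch of such a check in C follows. The function name sketch_value_has_forbidden_char() and its signature are hypothetical, chosen for illustration only; this is not HAProxy's http_header_has_forbidden_char() and makes no claim about how that function is implemented.

```c
#include <stddef.h>

/* Hypothetical helper (not HAProxy code): scan a header value of <len>
 * bytes starting at <ptr> and return 1 if it contains NUL, CR or LF,
 * otherwise 0. The length is passed explicitly because a value that
 * embeds a NUL byte cannot be measured with strlen().
 */
static int sketch_value_has_forbidden_char(const char *ptr, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)ptr[i];

        if (c == '\0' || c == '\r' || c == '\n')
            return 1; /* forbidden byte found: the caller should reject */
    }
    return 0; /* no forbidden byte in the value */
}
```

A caller would run this on each decoded header value and reject the whole message on a non-zero result, so that a value such as "a\r\nX-Injected: 1" can never be forwarded to a backend as two separate headers.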

17264 Commits