haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-09-22 22:31:28 +02:00

Author	SHA1	Message	Date
Willy Tarreau	cb4c236fac	BUG/MINOR: cfgparse: detect another uncaught case of duplicate defaults The following sequence was not properly caught: defaults def backend back from def defaults def But this one was: defaults def defaults def backend back from def Let's check when defaults are declared that they're not already referenced. Better not backport this. While it will catch broken configs (possibly some with backends pasted after the wrong defaults), these might still work by accident. It may be reported as a diag warning though.	2024-09-20 15:58:10 +02:00
Willy Tarreau	5b221d1e41	CLEANUP: cfgparse: factor proxy vs log-forward collisions This simplifies the check added in 1a38684fbc ("MEDIUM: cfgparse: detect collisions between defaults and log-forward"), by factoring it with the other existing one. The tests are ugly in that code because a first block tests pure proxies, a second one proxies or defaults and inside that one we have special cases for defaults. Let's just move the tests to the "any proxy type" block.	2024-09-20 14:13:14 +02:00
Willy Tarreau	b325453c36	MINOR: proxy: use the global file names for conf->file Proxy file names are assigned a bit everywhere (resolvers, peers, cli, logs, proxy). All these elements were enumerated and now use copy_file_name(). The only ha_free() call was turned to drop_file_name(). As a bonus side effect, a 300k backend config saved 14 MB of RAM.	2024-09-19 15:38:19 +02:00
Willy Tarreau	9ab21a3c2d	CLEANUP: stick-table: make the file location point to a global file name The file name used to point to the calling function's stack for stick tables, which was OK during parsing but remained dangling afterwards. At least it was already marked const so as not to accidentally free it. Let's make it point to a file_name_node now.	2024-09-19 15:38:19 +02:00
Willy Tarreau	d6c060c5ae	MINOR: tools: add minimal file name management In proxies, stick-tables, servers, etc... at plenty of places we store a file name and a line number. Some file names are the result of strdup() (e.g. in proxies), others not (e.g. stick-tables) and leave dangling pointers at the end of parsing. The risk of double-free is not null either. In order to stop this, let's first add a simple tool that allows to register short strings inside a global list, these strings happening to be server names. The strings are either duplicated and stored upon failure to find them, or just added to this storage. Since file names are not expected to disappear before the end of the process, for now we don't even implement refcounting, and we free them all at the end. There's already a drop_file_name() function to reset the pointer like ha_free() used to do, and even if not strictly needed it's a good habit to get used to doing it. The strings are returned as const so that they're stored as-is in structs, and that nasty free() calls are easily caught. The pointer points to the char[] storage inside the node itself. This way later if we want to implement refcounting, it will be trivial to just look up a string and change its associated node's refcount. If needed, comparisons can also be made on pointers. For now they're not used yet and are released on deinit().	2024-09-19 15:36:58 +02:00
Willy Tarreau	30a0e93fe6	[RELEASE] Released version 3.1-dev8 Released version 3.1-dev8 with the following main changes : - DOC: configuration: place the HAPROXY_HTTP_LOG_FMT example on the correct line - MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state - BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only - REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades - BUG/MEDIUM: clock: detect and cover jumps during execution - BUG/MINOR: pattern: prevent const sample from being tampered in pat_match_beg() - BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg} - BUG/MEDIUM: pattern: prevent UAF on reused pattern expr - MEDIUM: ssl/cli: "dump ssl cert" allow to dump a certificate in PEM format - BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state - BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established - REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load - BUG/MINOR: pattern: do not leave a leading comma on "set" error messages - REGTESTS: shorten a bit the delay for the h1/h2 upgrade test - MINOR: server: allow init-state for dynamic servers - DOC: server: document what to check for when adding new server keywords - MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option - BUG/MINOR: polling: fix time reporting when using busy polling - BUG/MINOR: clock: make time jump corrections a bit more accurate - BUG/MINOR: clock: validate that now_offset still applies to the current date - BUG/MEDIUM: queue: implement a flag to check for the dequeuing - OPTIM: sample: don't check casts for samples of same type - OPTIM: vars: remove the unneeded lock in vars_prune_* - OPTIM: vars: inline vars_prune() to avoid many calls - MINOR: vars: remove the emptiness tests in callers before pruning - IMPORT: import cebtree (compact elastic binary trees) - OPTIM: vars: use a cebtree instead of a list for variable names - OPTIM: vars: 
use multiple name heads in the vars struct - BUG/MINOR: peers: local entries updates may not be advertised after resync - DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options - MINOR: proxy: Rename accept-invalid-http-* options - DOC: configuration: Remove dangerous directives from the proxy matrix - BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send - BUG/MEDIUM: cache/stats: Wait to have the request before sending the response - BUG/MEDIUM: promex: Wait to have the request before sending the response - MINOR: clock: test all clock_gettime() return values - MEDIUM: clock: collect the monotonic time in clock_local_update_date() - MEDIUM: clock: opportunistically use CLOCK_MONOTONIC for the internal time - MEDIUM: clock: use the monotonic clock for idle time calculation - MEDIUM: clock: don't compute before_poll when using monotonic clock - BUG/MINOR: fix missing "log-format overrides previous 'option tcplog clf'..." detection - BUG/MINOR: fix missing "'option httpslog' overrides previous 'option tcplog clf'..." detection - BUG/MINOR: cfgparse-listen: fix option httpslog override warning message - BUG/MINOR: cfgparse: detect incorrect overlap of same backend names - MEDIUM: cfgparse: warn about proxies having the same names - DOC: management: add init-state to add server keywords - BUG/MINOR: mux-quic: report glitches to session - BUILD: cebtree: silence a bogus gcc warning on impossible code paths - MEDIUM: cfgparse: warn about colliding names between defaults and proxies - MEDIUM: cfgparse: detect collisions between defaults and log-forward v3.1-dev8	2024-09-18 22:29:08 +02:00
Willy Tarreau	1a38684fbc	MEDIUM: cfgparse: detect collisions between defaults and log-forward Sadly, when log-forward were introduced they took great care of avoiding collision with regular proxies but defaults were missed (they need to be explicitly checked for). So now we have to move them to a warning for 3.1 instead of rejecting them.	2024-09-18 18:08:15 +02:00
Willy Tarreau	d8f4b07e40	MEDIUM: cfgparse: warn about colliding names between defaults and proxies In order to complete the checks added in 303a66573d ("MEDIUM: cfgparse: warn about proxies having the same names"), we also need to warn about regular proxies having the same name as defaults sections as well as defaults sections having the same name as proxies, since defaults sections are inherently proxies, albeit stored in a separate list for now.	2024-09-18 18:08:06 +02:00
Willy Tarreau	8df44eea6d	BUILD: cebtree: silence a bogus gcc warning on impossible code paths gcc-12 and above report a wrong warning about a negative length being passed to memcmp() on an impossible code path when built at -O0. The pattern is the same at a few places, basically: int foo(int op, const void *a, const void *b, size_t size, size_t arg) { if (op == 1) /* arg is a strict multiple of size */ return memcmp(a, b, arg - size); return 0; } ... int bar() { return foo(0, a, b, sizeof(something), 0); } It might be possible to invent dummy values for the "len" argument above in the real code, but that significantly complicates it and as usual can easily result in introducing undesired bugs. Here we take a different approach consisting in shutting the -Wstringop-overread warning on gcc>=12 at -O0 since that's the only condition that triggers it. The issue was reported to and confirmed by the gcc team here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114622 No backport needed, but this should be upstreamed into cebtree after checking that all involved macros are available.	2024-09-18 17:42:52 +02:00
Amaury Denoyelle	fcd6d29acf	BUG/MINOR: mux-quic: report glitches to session Glitch counter was implemented for QUIC/HTTP3. The counter is stored in the QCC MUX connection instance. However, this is never reported at the session level which is necessary if glitch counter is tracked via a stick-table. To fix this, use session_add_glitch_ctr() in various QUIC MUX functions which may increment glitch counter. This should be backported up to 3.0.	2024-09-18 16:11:03 +02:00
Damien Claisse	2c783c25d6	DOC: management: add init-state to add server keywords Commit ce6a621ae allowed init-state to be used for dynamic servers but I forgot to update management doc.	2024-09-17 22:44:53 +02:00
Willy Tarreau	303a66573d	MEDIUM: cfgparse: warn about proxies having the same names As discussed below, there are too many problems and uncaught bugs in the parser when trying to support proxies having similar names but different types. There's specific code to detect the presence of stick-tables in a pair of such proxies for example. It's even possible that certain combinations of backend+listen that were not previously detected have some nasty side effects. According to the proposal in the discussion, this is now deprecated in 3.1 (thus we emit a warning) and will become forbidden in 3.3. A backport might be useful, but reporting a diag_warning only, not a classical warning, so as not to break setups running in zero-warning mode. It was verified with a config involving all 9 combinations of (frontend,backend,listen) followed by one of the same three that all collisions are now properly blocked and that only back+front are kept and emit a warning. Link: https://www.mail-archive.com/haproxy@formilux.org/msg45185.html	2024-09-17 19:55:00 +02:00
Willy Tarreau	c70906c8a1	BUG/MINOR: cfgparse: detect incorrect overlap of same backend names As reported below, it's possible to declare a backend then a proxy with the same name, because for the proxy we check a frontend capability (the first one to be tested): backend b listen b bind :8888 Let's check the two capabilities in this case and not just the frontend. Better not backport this, as there's a risk of breakage of existing setups that work by accident. It might make sense to report them as diag warnings though. Link: https://www.mail-archive.com/haproxy@formilux.org/msg45185.html	2024-09-17 19:55:00 +02:00
Aurelien DARRAGON	17e52c922b	BUG/MINOR: cfgparse-listen: fix option httpslog override warning message The "option httpslog" override warning message used to be reported as "option httplog", probably as a result of a copy-paste without adjusting the context. Let's fix that to prevent emitting confusing warning messages. The issue has existed since 98b930d ("MINOR: ssl: Define a default https log format"), thus it should be backported up to 2.6.	2024-09-17 15:40:02 +02:00
Aurelien DARRAGON	bc4bf5779f	BUG/MINOR: fix missing "'option httpslog' overrides previous 'option tcplog clf'..." detection Same as b85edd44db0 ("BUG/MINOR: fix missing "log-format overrides previous 'option tcplog clf'..." detection") but for "option httpslog" keyword. No backport needed unless fd48b28 ("MINOR: Implements new log format of option tcplog clf") is.	2024-09-17 15:40:02 +02:00
Aurelien DARRAGON	607b9adc9b	BUG/MINOR: fix missing "log-format overrides previous 'option tcplog clf'..." detection In commit fd48b28315 ("MINOR: Implements new log format of option tcplog clf") "option tcplog clf" detection was correctly added for "option tcplog" and "option httplog", but the "log-format" case was overlooked. Thus, this config would report an erroneous warning message: defaults option tcplog clf log-format "ok" [WARNING] (727893) : config : parsing [test.conf:3]: 'log-format' overrides previous 'log-format' in 'defaults' section. No backport needed unless fd48b28315 is.	2024-09-17 14:41:58 +02:00
Willy Tarreau	499e057644	MEDIUM: clock: don't compute before_poll when using monotonic clock There's no point keeping both clocks up to date; if the monotonic clock is ticking, let's just refrain from updating the wall clock one before polling since we won't use it. We still do it after polling however as we need a wall clock time to communicate with outside. This saves one gettimeofday() call per loop and two timeval comparisons.	2024-09-17 09:08:10 +02:00
Willy Tarreau	24496803d1	MEDIUM: clock: use the monotonic clock for idle time calculation By just keeping a copy of the last known value before entering polling, we can apply the same algorithm as we're currently using, except that it's now applied to the monotonic clock instead of the wall clock, when it's detected that it's ticking. This improves idle time calculation accuracy by making it independent of the wall clock.	2024-09-17 09:08:10 +02:00
Willy Tarreau	4150851ce5	MEDIUM: clock: opportunistically use CLOCK_MONOTONIC for the internal time We already collect CLOCK_MONOTONIC when it's available when leaving the poller, but it's only used for profiling. The functions that return it set the value to zero when it's not available, so we can use that to detect if it works or not. The idea is that if the monotonic time is non-zero, it is ticking and usable, then we use it for now_ns, otherwise we use the corrected date. We continue to apply the now_offset to the returned value because it helps forcing an early time wrap-around. Proceeding like this presents two benefits: - on systems supporting this, the time is much more robust against time changes - when it works, it saves us from having to go through the time correction code, which is usually cheap, but better avoided anyway. Note that idle time calculation continues to rely on the wall-clock time.	2024-09-17 09:08:10 +02:00
Willy Tarreau	f793845f4a	MEDIUM: clock: collect the monotonic time in clock_local_update_date() Now we collect this clock in clock_local_update_date(), the closest to the poller, which is also used when busy-polling, and the value is set into the thread's curr_mono_time which did not exist before. Later, clock_leaving_poll() just sets the prev_mono_time value from the curr_ one instead of retrieving the time at this specific point. It also means that the monotonic time will now also cover the time needed to update the global time, which should be negligible. Note that we don't collect the CPU time in the clock_local_update_date() function even though it's tempting, because when doing busy-polling, it would be collected on each round while being useless. Doing so will make sure that the local time always knows the monotonic time when it is available.	2024-09-17 09:08:10 +02:00
Willy Tarreau	42e699903e	MINOR: clock: test all clock_gettime() return values Till now we were only using clock_gettime() for profiling, so if it would fail it was no big deal. We intend to use it as the main clock as well now, so we need to more reliably detect its absence or failure and gracefully fall back to other options. Without the test we would return anything present in the stack, which is neither clean nor easy to detect.	2024-09-17 09:08:10 +02:00
Christopher Faulet	bb2a2bc5f2	BUG/MEDIUM: promex: Wait to have the request before sending the response It is similar to the previous fix about the stats applet ("BUG/MEDIUM: cache/stats: Wait to have the request before sending the response"). However, for promex, there is no crash and no obvious issue. But it depends on the filter. Indeed, the request is used by promex, independently of whether it was considered as forwarded or not. So if it is modified by the filter, the modifications are just ignored. Same bug, same fix. We now wait until the request has been forwarded before processing it and producing the response.	2024-09-16 22:56:28 +02:00
Christopher Faulet	afc50f2445	BUG/MEDIUM: cache/stats: Wait to have the request before sending the response It seems obvious. On a classical workflow, the request headers analysis is finished when these applets are woken up for the first time. So they don't take care to really have the request before starting to process it and sending the response. But with a filter, it is possible to stop the request analysis after the applet creation. If this happens for the stats applet, this leads to a crash because we retrieve the request start-line without checking if it is available. For the cache applet, the response is just immediately sent. And here it is a problem if the compression is enabled. In that case too, this may lead to a crash because the compression may be enabled but not initialized. For a true server, there is no issue because the connection cannot be established. The server is chosen only after the request analysis. The issue with applets is that once created, an applet is quickly switched to the established state. So it is probably a point that must be carefully reviewed and probably reworked. In the meantime, as a fix, in the cache and the stats applet, we just take care to have the request before sending the response. This will do the trick. The patch must be backported as far as 2.6. On 2.6, the patch must be adapted.	2024-09-16 22:55:40 +02:00
Christopher Faulet	5fc12b0afd	BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send On a synchronous send from the stream to an applet, if some data were sent, we must take care to wake the applet up. It is important because if everything was sent at this stage, there is no other chance to wake the applet up, mainly because the SE_FL_WAIT_DATA flag is set on the applet's sedesc in sc_update_tx() at the end of process_stream(). This flag prevents any wakeup of the applet for a send event. It is not necessary for a mux because the mux stream is called when a synchronous send from the stream is performed. So it is responsible for waking the mux connection if necessary. This patch must be backported to 3.0.	2024-09-16 22:55:40 +02:00
Christopher Faulet	655124f5cc	DOC: configuration: Remove dangerous directives from the proxy matrix For now, that only concerns accept-invalid-http-{request/response} and accept-unsafe-violations-in-http-{request/response}. But the idea is to make dangerous directives hard to find. It is one more way to discourage anyone from using them. And, optionally, it is also handy because it keeps the matrix aligned on 80 columns.	2024-09-16 22:55:25 +02:00
Christopher Faulet	4de6632693	MINOR: proxy: Rename accept-invalid-http-* options With these options, it is possible to accept some invalid messages that may be considered as unsafe and may result in vulnerabilities. The naming is not explicit enough on this point. These options must really be considered as dangerous and only used as a temporary workaround. Unfortunately, when used, it is probably because there are some legacy and unsupported applications in place. Never mind. The documentation warns about the use of these options. Now the name of the options itself is a warning. So now, "accept-invalid-http-request" and "accept-invalid-http-response" options are deprecated and replaced by "accept-unsafe-violations-in-http-request" and "accept-unsafe-violations-in-http-response" options.	2024-09-16 22:55:25 +02:00
Christopher Faulet	0f4fad5291	DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options From time to time, new exceptions are added in the HTTP parsing (most of the time H1) to not reject some invalid messages sent by legacy applications. But the documentation of accept-invalid-http-request and accept-invalid-http-response options is not very clear. So, now, there is an explicit list of relaxing rules for both options.	2024-09-16 22:55:24 +02:00
Aurelien DARRAGON	1e0920f855	BUG/MINOR: peers: local entries updates may not be advertised after resync Since commit 864ac3117 ("OPTIM: stick-tables: check the stksess without taking the read lock"), when entries for a local table are learned from another peer upon resynchro, and this is the only peer haproxy speaks to, local updates on such entries are not advertised to the peer anymore, until they eventually expire and can be recreated upon local updates. This is due to the fact that ts->seen is always set to 0 when creating a new entry, and also when touch_remote is performed on the entry. Indeed, while 864ac3117 attempts to avoid useless updates, it didn't consider entries learned from a remote peer. Such entries are exclusively learned in peer_treat_updatemsg(): once the entry is created (or updated) with new data, touch_remote is used to commit the change. However, unlike touch_local, entries committed using touch_remote will not be advertised to the peer from which the entry was just learned (otherwise we would enter a looping situation). Due to the above patch, once an entry is learned from the (unique) remote peer, 'seen' will be stuck to 0 so it will never be advertised for its whole lifetime. Instead, when entries are learned from a peer, we should consider that the peer that taught us the entry has seen it. To do this, let's set seen=1 in peer_treat_updatemsg() after calling touch_remote(). This way, if we happen to perform updates on this entry, it will be properly advertised to relevant peers. This patch should not affect the performance gain documented in 864ac3117 given that the test scenario didn't involve entries learned by remote peers, but solely locally created entries advertised to remote peers upon updates. This should be backported in 3.0 with 864ac3117.	2024-09-16 14:06:39 +02:00
Willy Tarreau	5d350d1e50	OPTIM: vars: use multiple name heads in the vars struct Given that the original list-based version was using a list head as the root of the variables, while the tree is using a single pointer, it made sense to reuse that space to place multiple roots, indexed on the lower bits of the name hash. Two roots slightly increase the performance level, but the best gain is obtained with 4 roots. The performance is now always above that of the list, even with small counts, and with 100 vars, it's 21% higher than before, or 67% higher than with the list. We keep the same lock (it could have made sense to use one lock per head), because most of the variables in large configs are attached to a stream or a session, hence are not shared between threads. Thus there's no point in sharding the pointer.	2024-09-15 23:51:51 +02:00
Willy Tarreau	47ec7c681e	OPTIM: vars: use a cebtree instead of a list for variable names Configs involving many variables can start to eat a lot of CPU in name lookups. The reason is that the names themselves are dynamic in that they are relative to dynamic objects (sessions, streams, etc), so there's no fixed index for example. The current implementation relies on a standard linked list, and in order to speed up lookups and avoid comparing strings, only a 64-bit hash of the variable's name is stored and compared everywhere. But with just 100 variables and 1000 accesses in a config, it's clearly visible that variable name lookup can reach 56% CPU with a config generated this way: for i in {0..100}; do printf "\thttp-request set-var(txn.var%04d) int(%d)" $i $i; for j in {1..10}; do [ $i -lt $j ] || printf ",add(txn.var%04d)" $((i-j)); done; echo; done The performance on a 4-core skylake 4.4 GHz reaches 85k RPS with a perf profile showing: Samples: 170K of event 'cycles', Event count (approx.): 142378815419 Overhead Shared Object Symbol 56.39% haproxy [.] var_to_smp 6.65% haproxy [.] var_set.part.0 5.76% haproxy [.] sample_process_cnv 3.23% haproxy [.] sample_conv_var2smp 2.88% haproxy [.] sample_conv_arith_add 2.33% haproxy [.] __pool_alloc 2.19% haproxy [.] action_store 2.13% haproxy [.] vars_get_by_desc 1.87% haproxy [.] smp_dup [above, var_to_smp() calls var_get() under the read lock]. By switching to a binary tree, the cost is significantly lower, the performance reaches 117k RPS (+37%) with this profile: Samples: 170K of event 'cycles', Event count (approx.): 142323631229 Overhead Shared Object Symbol 40.22% haproxy [.] cebu64_lookup 7.12% haproxy [.] sample_process_cnv 6.15% haproxy [.] var_to_smp 4.75% haproxy [.] cebu64_insert 3.79% haproxy [.] sample_conv_var2smp 3.40% haproxy [.] cebu64_delete 3.10% haproxy [.] sample_conv_arith_add 2.36% haproxy [.] action_store 2.32% haproxy [.] __pool_alloc 2.08% haproxy [.] vars_get_by_desc 1.96% haproxy [.] 
smp_dup 1.75% haproxy [.] var_set.part.0 1.74% haproxy [.] cebu64_first 1.07% [kernel] [k] aq_hw_read_reg 1.03% haproxy [.] pool_put_to_cache 1.00% haproxy [.] sample_process The performance lowers a bit earlier than with the list however. What can be seen is that the performance maintains a plateau till 25 vars, starts degrading a little bit for the tree while it remains stable till 28 vars for the list. Then both cross at 42 vars and the list continues to degrade following a hyperbola while the tree resists better. The biggest loss is at around 32 variables where the list stays 10% higher. Regardless, given the extremely narrow band where the list is better, it looks relevant to switch to this in order to preserve the almost linear performance of large setups. For example at 1000 variables and 10k lookups, the tree is 18 times faster than the list. In addition this reduces the size of the struct vars by 8 bytes since there's a single pointer, though it could make sense to re-invest them into a secondary head for example.	2024-09-15 23:49:01 +02:00
Willy Tarreau	a0205f9de4	IMPORT: import cebtree (compact elastic binary trees) This is an import of the compact elastic binary trees at commit a9cd84a ("OPTIM: descent: better prefetch less and for writes when deleting") These will be used to replace certain lists (and possibly certain tree nodes as well). They're as fast (or even faster) than ebtrees for lookups, as fast for insertion and slower for deletion, and a node only uses 2 pointers (like a list). The only changes were cebtree.h where common/tools.h was replaced with ebtree.h which we already have and already provides the needed functions and macros, and the addition of a wrapper cebtree-prv.h in src/ to redirect to import/cebtree-prv.h.	2024-09-15 23:44:59 +02:00
Willy Tarreau	6e92988e20	MINOR: vars: remove the emptiness tests in callers before pruning All callers of vars_prune_* currently check the list for emptiness. Let's leave that to vars_prune() itself, it will ease some changes in the code. Thanks to the previous inlining of the vars_prune() function, there's no performance loss, and even a very tiny 0.1% gain.	2024-09-15 23:44:16 +02:00
Willy Tarreau	2c1a9c3a43	OPTIM: vars: inline vars_prune() to avoid many calls Many configs don't have variables and call it for no reason, and even configs with variables don't necessarily have some in all scopes.	2024-09-15 23:42:09 +02:00
Willy Tarreau	aad6b771dd	OPTIM: vars: remove the unneeded lock in vars_prune_* vars_prune() and vars_prune_all() take the variable lock while purging all variables from a head. However this is not needed: - proc scope variables are only purged during deinit, hence no lock is needed ; - all other scopes are attached to entities bound to a single thread so no lock is needed either. Removing the lock saves about 0.5% CPU on variables-intensive setups, but above all simplifies the code, so let's do it.	2024-09-15 23:05:50 +02:00
Willy Tarreau	51ade2f1db	OPTIM: sample: don't check casts for samples of same type Originally when converters were created, they were mostly for casting types. Nowadays we have many arithmetic converters to perform operations on integers, and a number of converters operating on strings. Both of these categories most often do not need any cast since the input and output types are the same, which is visible as the cast function is c_none. However, profiling shows that when heavily using arithmetic converters, it's possible to spend up to ~7% of the time in sample_process_cnv(), a good part of which is only in accessing the sample_casts[] array. Simply avoiding this lookup when input and output types are equal saves about 2% CPU on such setups doing intensive use of converters.	2024-09-15 12:43:56 +02:00
Willy Tarreau	b11495652e	BUG/MEDIUM: queue: implement a flag to check for the dequeuing As unveiled in GH issue #2711, commit 5541d4995d ("BUG/MEDIUM: queue: deal with a rare TOCTOU in assign_server_and_queue()") does have some side effects in that it can occasionally cause an endless loop. As Christopher analysed it, the problem is that process_srv_queue(), which uses a trylock in order to leave only one thread in charge of the dequeueing process, can lose the lock race against pendconn_add(). If this happens on the last served request, then there's no more thread to deal with the dequeuing, and assign_server_and_queue() will loop forever on a condition that was initially expected to be extremely rare (and still is, except that now it can become sticky). Previously what was happening is that such queued requests would just time out and since that was very rare, nobody would notice. The root of the problem really is that trylock. It was added so that only one thread dequeues at a time but it doesn't offer only that guarantee since it also prevents a thread from dequeuing if another one is in the process of queuing. We need a different criterion. What we're doing now is to set a flag "dequeuing" in the server, which indicates that one thread is currently in the process of dequeuing requests. This one is atomically tested, and only if no thread is in this process, then the thread grabs the queue's lock and dequeues. This way it will be serialized with pendconn_add() and no request addition will be missed. It is not certain whether the original race covered by the fix above can still happen with this change, so better keep that fix for now. Thanks to @Yenya (Jan Kasprzak) for the precise and complete report allowing to spot the problem. This patch should be backported wherever the patch above was backported.	2024-09-13 08:35:47 +02:00
Willy Tarreau	adaba6f904	BUG/MINOR: clock: validate that now_offset still applies to the current date We want to make sure that now_offset is still valid for the current date: another thread could very well have updated it by detecting a backwards jump, and at the very same moment the time got fixed again, that we retrieve and add to the new offset, which results in a larger jump. Normally, for this to happen, it would mean that before_poll was also affected by the jump and was detected before and bounded within 2 seconds, resulting in max 2 seconds perturbations. Here we try to detect this situation and fall back to re-adjusting the offset instead. It's more of a strengthening of what's done by commit e8b1ad4c2b ("BUG/MEDIUM: clock: also update the date offset on time jumps") than a pure fix, in that the issue was not directly observed but it's visibly possible by reading the code, so this should be backported along with the patch above. This is related to issue GH #2704. Note that this could be simplified in terms of operations by migrating the deadlines to nanoseconds, but this was the path to least intrusive changes.	2024-09-12 19:09:19 +02:00
Willy Tarreau	af48e4cc6b	BUG/MINOR: clock: make time jump corrections a bit more accurate Since commit e8b1ad4c2b ("BUG/MEDIUM: clock: also update the date offset on time jumps") we try to update the now_offset based on the last known valid date. But if it's off compared to the global_now_ns date shared by other threads, we'll get the time off a little bit. When this happens, we should consider the most recent of these dates so that if the global date was already known to be more recent, we should use it and stick to it. This will avoid setting too large an offset that could in turn provoke a larger jump on another thread. This is related to issue GH #2704. This can be backported to other branches having the patch above.	2024-09-12 18:27:03 +02:00
Willy Tarreau	ad98edd00a	BUG/MINOR: polling: fix time reporting when using busy polling Since commit beb859abce ("MINOR: polling: add an option to support busy polling") the time and status passed to clock_update_local_date() were incorrect. Indeed, what is considered is the before_poll date related to the configured timeout which does not correspond to what is passed to the poller. That's not correct because before_poll+the syscall's timeout will be crossed by the current date 100 ms after the start of the poller. In practice it didn't happen when the poller was limited to 1s timeout but at one minute it happens all the time. That's particularly visible when running a multi-threaded setup with busy polling and only half of the threads working (bind ... thread even). In this case, the fixup code of clock_update_local_date() is executed for each round of busy polling. The issue was made really visible starting with recent commit e8b1ad4c2b ("BUG/MEDIUM: clock: also update the date offset on time jumps") because upon a jump, the shared offset is reset, while it should not be in this specific case. What needs to be done instead is to pass the configured timeout of the poller (and not of the syscall), and always pass "interrupted" set so as to claim we got an event (which is sort of true as it just means the poller returned instantly). In this case we can still detect backwards/forward jumps and will use a correct boundary for the maximum date that covers the whole loop. This can be backported to all versions since the issue was introduced with busy-polling in 1.9-dev8.	2024-09-12 17:47:13 +02:00
Christopher Faulet	1900ca475f	MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option Since 2.6, a parsing error is reported when the chunked encoding is found twice. As stated in RFC9112, a sender must not apply the chunked transfer coding more than once to a message body. It means only one chunked coding must be found. In addition, empty values are also rejected because they are forbidden by RFC9110. However, in both cases, it may be useful to relax the rules for trusted legacy servers when the accept-invalid-http-response option is set, especially because it was accepted on 2.4 and older. In addition, the T-E header is now sanitized before sending it. It is not a problem because it is a hop-by-hop header. Note that it remains invalid on client side because there is no good reason to relax the parsing on this side. We can argue a server is trusted so we can decide to support some legacy behavior. It is not true on client side and it is highly suspicious if a client is sending an invalid T-E header. Note also we continue to reject unsupported T-E values (so all codings except "chunked"). Because the "TE" header is sanitized and cannot contain other value than "Trailers", there is absolutely no reason for a server to use something else. This patch should fix the issue #2677. It could probably be backported as far as 2.6 if necessary.	2024-09-12 09:21:57 +02:00
Willy Tarreau	2b95c77c08	DOC: server: document what to check for when adding new server keywords It's too easy to overlook the dynamic servers when adding new server keywords, and the fields on each keyword line are totally obscure. This commit adds a title to each column of the table and explains what is expected and what to check for when adding a keyword.	2024-09-10 18:50:12 +02:00
Damien Claisse	ce6a621ae3	MINOR: server: allow init-state for dynamic servers Commit 50322df introduced the init-state keyword, but it didn't enable it for dynamic servers. However, this feature is perfectly desirable for virtual servers too, where someone would like a server enabled through "set server be1/srv1 state ready" to be put out of maintenance in down state until the next health check succeeds. At reading the code, it seems that it's only a matter of allowing this keyword for dynamic servers, as current code path calls srv_adm_set_ready() which incidentally triggers a call to _srv_update_status_adm().	2024-09-10 18:18:38 +02:00
Willy Tarreau	33deb4babe	REGTESTS: shorten a bit the delay for the h1/h2 upgrade test Commit d6c4ed9a96 ("REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades") introduced a 0.5 second delay which is higher than those of most other tests (usually 0.05 or 0.2) and triggers timeouts on my side. Let's just shorten it to 0.2 since its goal is only to send data separately. Note: maybe a barrier approach would be possible, though not studied.	2024-09-10 10:36:59 +02:00
Willy Tarreau	9f8d9c9e8b	BUG/MINOR: pattern: do not leave a leading comma on "set" error messages Commit 4f2493f355 ("BUG/MINOR: pattern: pat_ref_set: fix UAF reported by coverity") dropped the condition to concatenate error messages and as such introduced a leading comma in front of all of them. Then commit 911f4d93d4 ("BUG/MINOR: pattern: pat_ref_set: return 0 if err was found") changed the behavior to stop at the first error anyway, so all the mechanics dedicated to the concatenation of error messages is no longer needed and we can simply return the error as-is, without inserting any comma. This should be backported where the patches above are backported.	2024-09-10 08:55:29 +02:00
Willy Tarreau	036ab62231	REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load This test has an expect rule for syslog that looks for [cC]D, to indicate a client abort or timeout during the data phase. The purpose was to say that when it fails it must be this, but the very low timeout (1ms) still makes it prone to succeeding if the machine is highly loaded. This has become more visible since commit e8b1ad4c2b ("BUG/MEDIUM: clock: also update the date offset on time jumps") because the clock drift adjustments are more systematic. Since this commit, running 50 such tests at twice more than the number of CPUs in parallel is sufficient to yield errors due to some lines appearing as succeeding: make reg-tests -- --j $((($(nproc)+1)*2)) --vtestparams -n50 reg-tests/log/wrong_ip_port_logging.vtc Pauses of up to 300ms were observed in epoll_wait() in such circumstances, which were properly fixed by the time drift detection. Another approach would consist in increasing the permitted margin during which we don't fix the clock drift but that would not be logical since the base time had really been awaited for. This should be backported to all stable releases since the commit above will trigger the issue more often.	2024-09-09 19:38:28 +02:00
Christopher Faulet	a99d58819f	BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established This reverts commit 225a4d02e1f6a12c0b4f3584949fad3339d71708. When a 200-OK response is replied to a CONNECT request or a 101-Switching-protocol, a tunnel is considered as established between the client and the server. However, we must not declare the response as bodyless. Of course, there is no payload, but tunneled data are expected. Because of this bug, the zero-copy forwarding is disabled on the server side. This patch must be backported as far as 2.9.	2024-09-09 19:01:47 +02:00
Christopher Faulet	f6e193f1b0	BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state When the mux is woken up on I/O events, if the zero-copy forwarding is enabled, receives are blocked. In this case, the SC is woken up to be able to perform 0-copy forwarding to the other side. This works well, except for the H1C in CLOSING state. Indeed, in that case, in h1_process(), the SC is not woken up because only RUNNING H1 connections are considered. As a consequence, the mux will ignore the connection closure. The H1 connection remains blocked, waiting for the shutdown timeout. If no timeout is configured, the H1 connection is never closed, leading to a leak. This patch should fix the leak reported by Damien Claisse in the issue #2697. It should be backported as far as 2.8.	2024-09-09 19:01:47 +02:00
William Lallemand	021ac6a108	MEDIUM: ssl/cli: "dump ssl cert" allow to dump a certificate in PEM format The new "dump ssl cert" CLI command allows to dump a certificate stored into HAProxy memory. Until now it was only possible to dump the description of the certificate using "show ssl cert", but with this new command you can dump the PEM content on the filesystem. This command is only available on an admin stats socket. $ echo "@1 dump ssl cert cert.pem" | socat /tmp/master.sock - -----BEGIN PRIVATE KEY----- [...] -----END PRIVATE KEY----- -----BEGIN CERTIFICATE----- [...] -----END CERTIFICATE----- -----BEGIN CERTIFICATE----- [...] -----END CERTIFICATE-----	2024-09-09 16:54:48 +02:00
Aurelien DARRAGON	68cfb222b5	BUG/MEDIUM: pattern: prevent UAF on reused pattern expr Since c5959fd ("MEDIUM: pattern: merge same pattern"), UAF (leading to crash) can be experienced if the same pattern file (and match method) is used in two default sections and the first one is not referenced later in the config. In this case, the first default section will be cleaned up. However, due to an unhandled case in the above optimization, the original expr which the second default section relies on is mistakenly freed. This issue was discovered while trying to reproduce GH #2708. The issue was particularly tricky to reproduce given the config and sequence required to make the UAF happen. Hopefully, Github user @asmnek not only provided useful information, but since he was able to consistently trigger the crash in his environment he was able to nail down the crash to the use of pattern file involved with 2 named default sections. Big thanks to him. To fix the issue, let's push the logic from c5959fd a bit further. Instead of relying on "do_free" variable to know if the expression should be freed or not (which proved to be insufficient in our case), let's switch to a simple refcounting logic. This way, no matter who owns the expression, the last one attempting to free it will be responsible for freeing it. Refcount is implemented using a 32bit value which fills a previous 4 bytes structure gap: int mflags; /* 80 4 */ /* XXX 4 bytes hole, try to pack */ long unsigned int lock; /* 88 8 */ (output from pahole) Even though it was not reproduced in 2.6 or below by @asmnek (the bug was revealed thanks to another bugfix), this issue theoretically affects all stable versions (up to c5959fd), thus it should be backported to all stable versions.	2024-09-09 16:07:05 +02:00
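The switch from a "do_free" variable to refcounting described in the entry above follows a common pattern, sketched below under stated assumptions. `struct demo_expr` and the `demo_expr_*()` helpers are hypothetical and much simpler than HAProxy's pattern expressions; the sketch is single-threaded and uses a plain 32-bit counter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pattern expression shared by several users. */
struct demo_expr {
    int match_method;   /* illustrative payload */
    uint32_t refcount;  /* number of current owners */
};

/* Creates an expression owned by exactly one user. */
static struct demo_expr *demo_expr_new(int match_method)
{
    struct demo_expr *e = calloc(1, sizeof(*e));

    if (!e)
        return NULL;
    e->match_method = match_method;
    e->refcount = 1;
    return e;
}

/* Registers one more owner of an existing expression. */
static struct demo_expr *demo_expr_take(struct demo_expr *e)
{
    e->refcount++;
    return e;
}

/* Drops one owner. Returns 1 if this call freed the expression, else 0. */
static int demo_expr_release(struct demo_expr *e)
{
    assert(e->refcount > 0);
    if (--e->refcount > 0)
        return 0;   /* someone else still relies on it */
    free(e);
    return 1;
}
```

Whichever owner releases last performs the free, so cleaning up the first defaults section can no longer invalidate the expression the second one still uses.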
Aurelien DARRAGON	8157c1caf2	BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg} Using valgrind when running map_beg or map_str, the following error is reported: ==242644== Conditional jump or move depends on uninitialised value(s) ==242644== at 0x2E4AB1: pat_match_str (pattern.c:457) ==242644== by 0x2E81ED: pattern_exec_match (pattern.c:2560) ==242644== by 0x343176: sample_conv_map (map.c:211) ==242644== by 0x27522F: sample_process_cnv (sample.c:1330) ==242644== by 0x2752DB: sample_process (sample.c:1373) ==242644== by 0x319917: action_store (vars.c:814) ==242644== by 0x24D451: http_req_get_intercept_rule (http_ana.c:2697) In fact, the error is legit, because in pat_match_{beg,str}, we dereference the buffer on len+1 to check if a value was previously set, and then decide to force NULL-byte if it wasn't set. But the approach is no longer compatible with current architecture: data past str.data is not guaranteed to be initialized in the buffer. Thus we cannot dereference the value, else we expose ourselves to uninitialized read errors. Moreover, the check is useless, because we systematically set the ending byte to 0 when the conditions are met. Finally, restoring the older value after the lookup is not relevant: indeed, either the sample is marked as const and in such case it is already duplicated, or the sample is not const and we forcefully add a terminating NULL byte outside from the actual string bytes (since we're past str.data). As we didn't alter effective string data, and data past str.data cannot be dereferenced anyway as it isn't guaranteed to be initialized, there's no point in restoring previous uninitialized data. It could be backported in all stable versions. But since this was only detected by valgrind and isn't known to cause issues in existing deployments, it's probably better to wait a bit before backporting it to avoid any breakage, although the fix should be theoretically harmless.	2024-09-09 15:57:30 +02:00
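The approach in the entry above (write the terminating zero past the string data, never read that byte first, and never restore it) can be sketched as follows. `struct demo_buf` and `demo_match_str()` are hypothetical and only loosely modeled on a buffer with an area, a used length and an allocated size; the real functions handle const samples and other cases not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer: 'data' bytes are valid, bytes beyond are uninitialized. */
struct demo_buf {
    char *area;   /* storage */
    size_t data;  /* number of initialized bytes */
    size_t size;  /* allocated size of 'area' */
};

/* Returns 1 on exact match, 0 on mismatch, -1 if there is no room for the
 * terminating zero (a real implementation would have to duplicate the string).
 */
static int demo_match_str(struct demo_buf *b, const char *pattern)
{
    assert(b->area != NULL);
    if (b->data >= b->size)
        return -1;

    /* Write the terminator unconditionally. The byte at area[data] is never
     * read beforehand and is not restored afterwards: it lies outside the
     * valid string data, so its previous content is meaningless.
     */
    b->area[b->data] = '\0';
    return strcmp(b->area, pattern) == 0;
}
```

Since the byte at `area[data]` is only ever written, a tool such as valgrind has no uninitialized read to report in this sketch.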

23038 Commits