haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-14 02:57:01 +02:00

Author	SHA1	Message	Date
Willy Tarreau	43d4ed548f	CLEANUP: pools: merge pool_{get_from,put_to}_local_caches with generic ones Since pool_get_from_cache() and pool_put_to_cache() were now only wrappers to the local cache versions which do all the job, let's merge them together so that there is no more local-cache specific function.	2021-04-19 15:24:33 +02:00
Willy Tarreau	d56db11447	CLEANUP: pools: make the local cache allocator fall back to the shared cache Now when pool_get_from_local_cache() fails, it automatically falls back to pool_get_from_shared_cache(), which used to always be done in pool_get_from_cache(). Thus now the API is simpler as we always allocate and free from/to the local caches.	2021-04-19 15:24:33 +02:00
Willy Tarreau	fa19d20ac4	MEDIUM: pools: make pool_put_to_cache() always call pool_put_to_local_cache() Till now it used to call it only if there were not too many objects into the local cache otherwise would send the latest one directly into the shared cache. Now it always sends to the local cache and it's up to the local cache to free its oldest objects. From a cache freshness perspective it's better this way since we always evict cold objects instead of hot ones. From an API perspective it's better because it will help make the shared cache invisible to the public API.	2021-04-19 15:24:33 +02:00
Willy Tarreau	147e1fa385	MINOR: pools: create unified pool_{get_from,put_to}_cache() These two functions are now responsible for allocating directly from the cache and releasing to the cache. Now the pool_alloc() function simply does this: if cache enabled return pool_alloc_from_cache() if no NULL return pool_alloc_nocache() otherwise and the pool_free() function does this: if cache enabled pool_put_to_cache() else pool_free_nocache() For now this only introduces these two functions without changing anything else, but the goal is to soon allow to make them implementation-specific.	2021-04-19 15:24:33 +02:00
Willy Tarreau	b8498e961a	MEDIUM: pools: make CONFIG_HAP_POOLS control both local and shared pools Continuing the unification of local and shared pools, now the usage of pools is governed by CONFIG_HAP_POOLS without which allocations and releases are performed directly from the OS using pool_alloc_nocache() and pool_free_nocache().	2021-04-19 15:24:33 +02:00
Willy Tarreau	45e4e28161	MINOR: pools: factor the release code into pool_put_to_os() There are two levels of freeing to the OS: - code that wants to keep the pool's usage counters updated uses pool_free_area() and handles the counters itself. That's what pool_put_to_shared_cache() does in the no-global-pools case. - code that does not want to update the counters because they were already updated only calls pool_free_area(). Let's extract these calls to establish the symmetry with pool_get_from_os() and pool_alloc_nocache(), resulting in pool_put_to_os() (which only updates the allocated counter) and pool_free_nocache() (which also updates the used counter). This will later allow to simplify the generic code.	2021-04-19 15:24:33 +02:00
Willy Tarreau	acf0c54491	MINOR: pools: move pool_free_area() out of the lock in the locked version Calling pool_free_area() inside a lock in pool_put_to_shared_cache() is a very bad idea. Fortunately this only happens on the lowest end platforms which almost never use threads or in very small counts. This change consists in zeroing the pointer once already released to the cache in the first test so that the second stage knows if it needs to pass it to the OS or not. This has slightly reduced the length of the	2021-04-19 15:24:33 +02:00
Willy Tarreau	2b5579f6da	MINOR: pools: always use atomic ops to maintain counters A part of the code cannot be factored out because it still uses non-atomic inc/dec for pool->used and pool->allocated as these are located under the pool's lock. While it can make sense in terms of bus cycles, it does not make sense in terms of code normalization. Further, some operations were still performed under a lock that could be totally removed via the use of atomic ops. There is still one occurrence in pool_put_to_shared_cache() in the locked code where pool_free_area() is called under the lock, which must absolutely be fixed.	2021-04-19 15:24:33 +02:00
Willy Tarreau	13843641e5	MINOR: pools: split the OS-based allocator in two Now there's one part dealing with the allocation itself and keeping counters up to date, and another one on top of it to return such an allocated pointer to the user and update the use count and stats. This is in anticipation for being able to group cache-related parts. The release code is still done at once.	2021-04-19 15:24:33 +02:00
Willy Tarreau	207c095098	MINOR: pools: move the fault injector to __pool_alloc() Till now it was limited to objects allocated from the OS which means it had little use as soon as pools were enabled. Let's move it upper in the layers so that any code can benefit from fault injection. In addition this allows to pass a new flag POOL_F_NO_FAIL to disable it if some callers prefer a no-failure approach.	2021-04-19 15:24:33 +02:00
Willy Tarreau	84ebfabf7f	MINOR: tools: add statistical_prng_range() to get a random number over a range This is simply a multiply and shift from statistical_prng() but it's made easily accessible.	2021-04-19 15:24:33 +02:00
Willy Tarreau	635cced32f	CLEANUP: pools: rename __pool_free() to pool_put_to_shared_cache() Now the multi-level cache becomes more visible: pool_get_from_local_cache() pool_put_to_local_cache() pool_get_from_shared_cache() pool_put_to_shared_cache()	2021-04-19 15:24:33 +02:00
Willy Tarreau	8c77ee5ae5	CLEANUP: pools: rename pool__{from,to}_cache() to _local_cache() The functions were rightfully called from/to_cache when the thread-local cache was considered as the only cache, but this is getting terribly confusing. Let's call them from/to local_cache to make it clear that it is not related with the shared cache. As a side note, since pool_evict_from_cache() used not to work for a particular pool but for all of them at once, it was renamed to pool_evict_from_local_caches() (plural form).	2021-04-19 15:24:33 +02:00
Willy Tarreau	2f03dcde91	CLEANUP: pools: rename __pool_get_first() to pool_get_from_shared_cache() This is exactly what it is, the entry is retrieved from the shared cache when it is defined. The implementation that is enabled with CONFIG_HAP_NO_GLOBAL_POOLS continues to return NULL.	2021-04-19 15:24:33 +02:00
Willy Tarreau	2543211830	CLEANUP: pools: move the lock to the only __pool_get_first() that needs it Now that __pool_alloc() only surrounds __pool_get_first() with the lock, let's move it to the only variant that requires it and remove the ugly ifdefs from the function. This is safe because nobody else calls this function.	2021-04-19 15:24:33 +02:00
Willy Tarreau	8ee9df57db	MINOR: pools: call pool_alloc_nocache() out of the pool's lock In __pool_alloc(), historically we used to use factor out the pool's lock between __pool_get_first() and __pool_refill_alloc(), resulting in real malloc() or mmap() calls being performed under the pool lock (for platforms using the locked shared pools). As this is not needed anymore, let's move the call out of the lock, it may improve allocation patterns on some platforms. This also makes __pool_alloc() cleaner as we see a first attempt to allocate from the local cache, then a second from the shared cache then a reall allocation.	2021-04-19 15:24:33 +02:00
Willy Tarreau	8fe726f118	CLEANUP: pools: re-merge pool_refill_alloc() and __pool_refill_alloc() They were strictly equivalent, let's remerge them and rename them to pool_alloc_nocache() as it's the call which performs a real allocation which does not check nor update the cache. The only difference in the past was the former taking the lock and not the second but now the lock is not needed anymore at this stage since the pool's list is not touched. In addition, given that the "avail" argument is no longer used by the function nor by its callers, let's drop it.	2021-04-19 15:24:33 +02:00
Willy Tarreau	64383b8181	MINOR: pools: make the basic pool_refill_alloc()/pool_free() update needed_avg This is a first step towards unifying all the fallback code. Right now these two functions are the only ones which do not update the needed_avg rate counter since there's currently no shared pool kept when using them. But their code is similar to what could be used everywhere except for this one, so let's make them capable of maintaining usage statistics. As a side effect the needed field in "show pools" will now be populated.	2021-04-19 15:24:33 +02:00
Willy Tarreau	53a7fe49aa	MINOR: pools: enable the fault injector in all allocation modes The mem_should_fail() call enabled by DEBUG_FAIL_ALLOC used to be placed only in the no-cache version of the allocator. Now we can generalize it to all modes and remove the exclusive test on CONFIG_HAP_NO_GLOBAL_POOLS.	2021-04-19 15:24:33 +02:00
Willy Tarreau	2d6f628d34	MINOR: pools: rename CONFIG_HAP_LOCAL_POOLS to CONFIG_HAP_POOLS We're going to make the local pool always present unless pools are completely disabled. This means that pools are always enabled by default, regardless of the use of threads. Let's drop this notion of "local" pools and make it just "pool". The equivalent debug option becomes DEBUG_NO_POOLS instead of DEBUG_NO_LOCAL_POOLS. For now this changes nothing except the option and dropping the dependency on USE_THREAD.	2021-04-19 15:24:33 +02:00
Willy Tarreau	d5140e7c6f	MINOR: pool: remove the size field from pool_cache_head Everywhere we have access to the pool so we don't need to cache a copy of the pool's size into the pool_cache_head. Let's remove it.	2021-04-19 15:24:33 +02:00
Willy Tarreau	9f3129e583	MEDIUM: pools: move the cache into the pool header Initially per-thread pool caches were stored into a fixed-size array. But this was a bit ugly because the last allocated pools were not able to benefit from the cache at all. As a work around to preserve performance, a size of 64 cacheable pools was set by default (there are 51 pools at the moment, excluding any addon and debugging code), so all in-tree pools were covered, at the expense of higher memory usage. In addition an index had to be calculated for each pool, and was used to acces the pool cache head into that array. The pool index was not even stored into the pools so it was required to determine it to access the cache when the pool was already known. This patch changes this by moving the pool cache head into the pool head itself. This way it is certain that each pool will have its own cache. This removes the need for index calculation. The pool cache head is 32 bytes long so it was aligned to 64B to avoid false sharing between threads. The extra cost is not huge (~2kB more per pool than before), and we'll make better use of that space soon. The pool cache head contains the size, which should probably be removed since it's already in the pool's head.	2021-04-19 15:24:33 +02:00
Willy Tarreau	fff96b441f	CLEANUP: pools: remove unused arguments to pool_evict_from_cache() In commit `fb117e6a8` ("MEDIUM: memory: don't let pool_put_to_cache() free the objects itself") pool_evict_from_cache() was introduced with no argument, yet the only call place passes it the pool, the pointer and the index number! Let's remove these as they even let the reader think that the function does something specific to the current pool while it's not the case.	2021-04-19 15:24:33 +02:00
Tim Duesterhus	5be6ab269e	MEDIUM: http_act: Rename uri-normalizers This patch renames all existing uri-normalizers into a more consistent naming scheme: 1. The part of the URI that is being touched. 2. The modification being performed as an explicit verb.	2021-04-19 09:05:57 +02:00
Tim Duesterhus	a407193376	MINOR: uri_normalizer: Add a `percent-upper` normalizer This normalizer uppercases the hexadecimal characters used in percent-encoding. See GitHub Issue #714.	2021-04-19 09:05:57 +02:00
Tim Duesterhus	d7b89be30a	MINOR: uri_normalizer: Add a `sort-query` normalizer This normalizer sorts the `&` delimited query parameters by parameter name. See GitHub Issue #714.	2021-04-19 09:05:57 +02:00
Tim Duesterhus	560e1a6352	MINOR: uri_normalizer: Add support for supressing leading `../` for dotdot normalizer This adds an option to supress `../` at the start of the resulting path.	2021-04-19 09:05:57 +02:00
Tim Duesterhus	9982fc2bbd	MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri This normalizer merges `../` path segments with the predecing segment, removing both the preceding segment and the `../`. Empty segments do not receive special treatment. The `merge-slashes` normalizer should be executed first. See GitHub Issue #714.	2021-04-19 09:05:57 +02:00
Tim Duesterhus	d371e99d1c	MINOR: uri_normalizer: Add a `merge-slashes` normalizer to http-request normalize-uri This normalizer merges adjacent slashes into a single slash, thus removing empty path segments. See GitHub Issue #714.	2021-04-19 09:05:57 +02:00
Tim Duesterhus	d2bedcc4ab	MINOR: uri_normalizer: Add `http-request normalize-uri` This patch adds the `http-request normalize-uri` action that was requested in GitHub issue #714. Normalizers will be added in the next patches.	2021-04-19 09:05:57 +02:00
Tim Duesterhus	0ee1ad5675	MINOR: uri_normalizer: Add `enum uri_normalizer_err` This enum will serve as the return type for each normalizer.	2021-04-19 09:05:57 +02:00
Tim Duesterhus	dbd25c34de	MINOR: uri_normalizer: Add uri_normalizer module This is in preparation for future patches.	2021-04-19 09:05:57 +02:00
Christopher Faulet	76b44195c9	MINOR: threads: Only consider running threads to end a thread harmeless period When a thread ends its harmeless period, we must only consider running threads when testing threads_want_rdv_mask mask. To do so, we reintroduce all_threads_mask mask in the bitwise operation (It was removed to fix a deadlock). Note that for now it is useless because there is no way to stop threads or to have threads reserved for another task. But it is safer this way to avoid bugs in the future.	2021-04-17 11:14:58 +02:00
Christopher Faulet	f63a185500	BUG/MEDIUM: threads: Ignore current thread to end its harmless period A previous patch was pushed to fix a deadlock when an isolated thread ends its harmless period (`a9a9e9aac` ["BUG/MEDIUM: thread: Fix a deadlock if an isolated thread is marked as harmless"]). But, unfortunately, the fix is incomplete. The same must be done in the outer loop, in thread_harmless_end() function. The current thread must be ignored when threads_want_rdv_mask mask is tested. This patch must also be backported as far as 2.0.	2021-04-17 11:14:58 +02:00
Alex	41007a6835	MINOR: sample: converter: Add mjson library. This library is required for the subsequent patch which adds the JSON query possibility. It is necessary to change the include statement in "src/mjson.c" because the imported includes in haproxy are in "include/import" orig: #include "mjson.h" new: #include <import/mjson.h>	2021-04-15 17:05:38 +02:00
Tim Duesterhus	763342646f	MINOR: ist: Add `istclear(struct ist*)` istclear allows one to easily reset an ist to zero-size, while preserving the previous size, indicating the length of the underlying buffer.	2021-04-14 19:49:33 +02:00
Moemen MHEDHBI	92f7d43c5d	MINOR: sample: add ub64dec and ub64enc converters ub64dec and ub64enc are the base64url equivalent of b64dec and base64 converters. base64url encoding is the "URL and Filename Safe Alphabet" variant of base64 encoding. It is also used in in JWT (JSON Web Token) standard. RFC1421 mention in base64.c file is deprecated so it was replaced with RFC4648 to which existing converters, base64/b64dec, still apply. Example: HAProxy: http-request return content-type text/plain lf-string %[req.hdr(Authorization),word(2,.),ub64dec] Client: Token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyIjoiZm9vIiwia2V5IjoiY2hhZTZBaFhhaTZlIn0.5VsVj7mdxVvo1wP5c0dVHnr-S_khnIdFkThqvwukmdg $ curl -H "Authorization: Bearer ${TOKEN}" http://haproxy.local {"user":"foo","key":"chae6AhXai6e"}	2021-04-13 17:28:13 +02:00
Christopher Faulet	0c6d1dcf7d	BUG/MINOR: listener: Handle allocation error when allocating a new bind_conf Allocation error are now handled in bind_conf_alloc() functions. Thus callers, when not already done, are also updated to catch NULL return value. This patch may be backported (at least partially) to all stable versions. However, it only fix errors durung configuration parsing. Thus it is not mandatory.	2021-04-12 21:33:43 +02:00
Christopher Faulet	147b8c919c	MINOIR: checks/trace: Register a new trace source with its events Add the trace support for the checks. Only tcp-check based health-checks are supported, including the agent-check. In traces, the first argument is always a check object. So it is easy to get all info related to the check. The tcp-check ruleset, the conn-stream and the connection, the server state...	2021-04-12 12:09:36 +02:00
Christopher Faulet	6d80b63e3c	MINOR: trace: Add the checks as a possible trace source To be able to add the trace support for the checks, a new kind of source must be added for this purpose.	2021-04-12 12:09:36 +02:00
Willy Tarreau	7b1425a91b	MINOR: atomic: reimplement the relaxed version of x86 BTS/BTR Olivier spotted that I messed up during a rebase of commit `92c059c2a` ("MINOR: atomic: implement native BTS/BTR for x86"), losing the x86 version of the BTS/BTR and leaving the generic version for it instead of having this block in the #else. Since this variant is not used for now it was easy to overlook it. Let's re-implement it here.	2021-04-12 10:01:44 +02:00
Willy Tarreau	c4c80fb4ea	MINOR: time: move the time initialization out of tv_update_date() The time initialization was made a bit complex because we rely on a dummy negative argument to reset all fields, leaving no distinction between process-level initialization and thread-level initialization. This patch changes this by introducing two functions, one for the process and the second one for the threads. This removes ambigous test and makes sure that the relevant fields are always initialized exactly once. This also offers a better solution to the bug fixed in commit `b48e7c001` ("BUG/MEDIUM: time: make sure to always initialize the global tick") as there is no more special values for global_now_ms. It's simple enough to be backported if any other time-related issues are encountered in stable versions in the future.	2021-04-11 23:45:48 +02:00
Willy Tarreau	61c72c366e	CLEANUP: time: remove the now unused ms_left_scaled It was only used by freq_ctr and is not used anymore. In addition the local curr_sec_ms was removed, as well as the equivalent extern definitions which did not exist anymore either.	2021-04-11 14:01:53 +02:00
Willy Tarreau	d46ed5c26b	MINOR: freq_ctr: simplify and improve the update function update_freq_ctr_period() was still not very clean and didn't wait for the rotation lock to be dropped before trying again, thus maintaining the contention at a high level. In addition, the rotation update was made in three steps, which are not very efficient in terms of bus cycles. Here the wait loop was reworked so that the fast path remains short and that the contended path waits for the lock to be dropped before attempting another write, but it only waits a relax cycle before attempting a read. The rotation block was simplified to remove a test that was already validated by the first loop, and so that the retrieval of the current period, its reset and its increment are all performed in a single atomic op and the store to the previous period is performed immediately after. All this results in significantly smaller code for the inline function (~1kB total) and a shorter critical path.	2021-04-11 14:01:53 +02:00
Willy Tarreau	6339c19cac	MINOR: freq_ctr: add cpu_relax in the rotation loop of update_freq_ctr_period() When counters are rotated, there is contention between the threads which can slow down the operation of the thread performing the rotation. Let's apply a cpu_relax there to let the first thread finish faster.	2021-04-11 11:12:57 +02:00
Willy Tarreau	fc6323ad82	MEDIUM: freq_ctr: replace the per-second counters with the generic ones It remains cumbersome to preserve two versions of the freq counters and two different internal clocks just for this. In addition, the savings from using two different mechanisms are not that important as the only saving is a divide that is replaced by a multiply, but now thanks to the freq_ctr_total() unificaiton the code could also be simplified to optimize it in case of constants. This patch turns all non-period freq_ctr functions to static inlines which call the period-based ones with a period of 1 second. A direct benefit is that a single internal clock is now needed for any counter and that they now all rely on ticks. These 1-second counters are essentially used to report request rates and to enforce a connection rate limitation in listeners. It was verified that these continue to work like before.	2021-04-11 11:12:55 +02:00
Willy Tarreau	fa1258f02c	MINOR: freq_ctr: unify freq_ctr and freq_ctr_period into freq_ctr Both structures are identical except the name of the field starting the period and its description. Let's call them all freq_ctr and the period's start "curr_tick" which is generic. This is only a temporary change and fields are expected to remain the same with no code change (verified).	2021-04-11 11:11:27 +02:00
Willy Tarreau	d209c87142	MINOR: freq_ctr: add the missing next_event_delay_period() There was still no function to compute a wait time for periods, let's implement it on top of freq_ctr_total() as we'll soon need it for the per-second one. The divide here is applied on the frequency so that it will be replaced with a reciprocal multiply when constant.	2021-04-11 11:11:03 +02:00
Willy Tarreau	607be24a85	MEDIUM: freq_ctr: reimplement freq_ctr_remain_period() from freq_ctr_total() Now the function becomes an inline one and only contains a divide and a max. The divide will automatically go away with constant periods.	2021-04-11 11:11:03 +02:00
Willy Tarreau	a7a31b2602	MEDIUM: freq_ctr: make read_freq_ctr_period() use freq_ctr_total() This one is the easiest to implement, it just requires a call and a divide of the result. Anti-flapping correction for low-rates was preserved. Now calls using a constant period will be able to use a reciprocal multiply for the period instead of a divide.	2021-04-11 11:11:03 +02:00
Willy Tarreau	f3a9f8dc5a	MINOR: freq_ctr: add a generic function to report the total value Most of the functions designed to read a counter over a period go through the same complex loop and only differ in the way they use the returned values, so it was worth implementing all this into freq_ctr_total() which returns the total number of events over a period so that the caller can finish its operation using a divide or a remaining time calculation. As a special case, read_freq_ctr_period() doesn't take pending events but requires to enable an anti-flapping correction at very low frequencies. Thus the function implements it when pend<0. Thanks to this function it will be possible to reimplement the other ones as inline and merge the per-second ones with the arbitrary period ones without always adding the cost of a 64 bit divide.	2021-04-11 11:10:57 +02:00
Willy Tarreau	ff88270ef9	MINOR: pool: move pool declarations to read_mostly All pool heads are accessed via a pointer and should not be shared with highly written variables. Move them to the read_mostly section.	2021-04-10 19:27:41 +02:00
Willy Tarreau	f459640ef6	MINOR: global: declare a read_mostly section Some variables are mostly read (mostly pointers) but they tend to be merged with other ones in the same cache line, slowing their access down in multi-thread setups. This patch declares an empty, aligned variable in a section called "read_mostly". This will force a cache-line alignment on this section so that any variable declared in it will be certain to avoid false sharing with other ones. The section will be eliminated at link time if not used. A __read_mostly attribute was added to compiler.h to ease use of this section.	2021-04-10 19:27:41 +02:00
Willy Tarreau	ba386f6a8d	CLEANUP: initcall: rely on HA_SECTION_* instead of defining its own Now initcalls are defined using the regular section definitions from compiler.h in order to ease maintenance.	2021-04-10 19:27:41 +02:00
Willy Tarreau	5bec4c42ed	MINOR: compiler: add macros to declare section names HA_SECTION() is used as an attribute to force a section name. This is required because OSX prepends "__DATA, " in front of the declaration. HA_SECTION_START() and HA_SECTION_STOP() are used as post-attribute on variable declaration to designate the section start/end (needed only on OSX, empty on others). For platforms with an obsolete linker, all macros are left empty. It would possibly still work on some of them but this will not be needed anyway.	2021-04-10 19:27:41 +02:00
Willy Tarreau	731f0c6502	CLEANUP: initcall: rename HA_SECTION to HA_INIT_SECTION The HA_SECTION name is too generic and will be reused globally. Let's rename this one.	2021-04-10 19:27:41 +02:00
Willy Tarreau	afa9bc0ec5	MINOR: initcall: uniformize the section names between MacOS and other unixes Due to length restrictions on OSX the initcall sections are called "i_" there while they're called "init_" on other OSes. However the start and end of sections are still called "__start_init_" and "__stop_init_", which forces to have distinct code between the OSes. Let's switch everyone to "i_" and rename the symbols accordingly.	2021-04-10 19:27:41 +02:00
Willy Tarreau	ad14c2681b	MINOR: trace: replace the trace() inline function with an equivalent macro The trace() function is convenient to avoid calling trace() when traces are not enabled, but there starts to be some callers which place complex expressions in their trace calls, which results in all of them to be evaluated before being passed as arguments to the trace() function. This needlessly wastes precious CPU cycles. Let's change the function for a macro, so that the arguments are now only evaluated when the surce has traces enabled. However having a generic macro being called "trace()" can easily cause conflicts with innocent code so we rename it "_trace". Just doing this has resulted in a 2.5% increase of the HTTP/1 request rate.	2021-04-10 19:27:41 +02:00
Willy Tarreau	9057a0026e	CLEANUP: pattern: make all pattern tables read-only Interestingly, all arrays used to declare patterns were read-write while only hard-coded. Let's mark them const so that they move from data to rodata and don't risk to experience false sharing.	2021-04-10 17:49:41 +02:00
Tim Duesterhus	403fd722ac	CLEANUP: Remove useless malloc() casts This is not C++.	2021-04-08 20:11:58 +02:00
Tim Duesterhus	fea59fcf79	CLEANUP: ist: Remove unused `count` argument from `ist2str*` This argument is not being used inside the function (and the functions themselves are unused as well) and not documented. Its purpose is not clear. Just remove it.	2021-04-08 19:40:59 +02:00
Tim Duesterhus	b8ee894b66	CLEANUP: htx: Make http_get_stline take a `const struct` Nothing is being modified there, so this can be `const`.	2021-04-08 19:40:59 +02:00
Tim Duesterhus	fbc2b79743	MINOR: ist: Rename istappend() to __istappend() Indicate that this function is not inherently safe by adding two underscores as a prefix.	2021-04-08 19:35:52 +02:00
Willy Tarreau	1197459e0a	BUG/MAJOR: fd: switch temp values to uint in fd_stop_both() With latest commit `f50906519` ("MEDIUM: fd: merge fdtab[].ev and state for FD_EV_* and FD_POLL_* into state") one occurrence of a pair of chars was missed in fd_stop_both(), resulting in the operation to fail if the upper flags were set. Interestingly it managed to fail 2 tests in all setups in the CI while all used to work fine on my local machines. Probably that the reason is that the chars had enough room above them for the CAS to fail then refill "old" overwriting the upper parts of the stack, and that thanks to this the subsequent tests worked. With ASAN being used on lots of tests, it very likely caught it but used to only report failed tests with no more info. No backport is needed, as this was never released nor backported.	2021-04-07 20:46:26 +02:00
Tim Duesterhus	8daf8dceb9	MINOR: ist: Add `istsplit(struct ist*, char)` istsplit is a combination of iststop + istadv.	2021-04-07 19:50:43 +02:00
Tim Duesterhus	90aa8c7f02	MINOR: ist: Add `istshift(struct ist*)` istshift() returns the first character and advances the ist by 1.	2021-04-07 19:50:43 +02:00
Tim Duesterhus	551eeaec91	MINOR: ist: Add `istappend(struct ist, char)` This function appends the given char to the given `ist` and returns the resulting `ist`.	2021-04-07 19:50:43 +02:00
Willy Tarreau	92c059c2ac	MINOR: atomic: implement native BTS/BTR for x86 The current BTS/BTR operations on x86 are ugly because they rely on a CAS, so they may be unfair and take time to converge. Fortunately, where they are currently used (mostly FDs) the contention is expected to be rare (mostly listeners). But this also limits their use to such few low-load cases. On x86 there is a set of BTS/BTR instructions which help for this, but before the FD's state migrated to 32 bits there was little use of them since they do not exist in 8 bits. Now at least it makes sense to use them, at the very least in order to significantly reduce the code size (one BTS instead of a CMPXCHG loop). The implementation relies on modern gcc's ability to return condition flags and limit code inflation and register spilling. The fall back is retained on the old implementation for all other situations (inappropriate target size or non-capable compiler). The code shrank by 1.6 kB on the fast path. As expected, for now on up to 4 threads there is no measurable difference of performance.	2021-04-07 18:47:22 +02:00
Willy Tarreau	fa68d2641b	CLEANUP: atomic: use the __atomic variant of BTS/BTR on modern compilers Probably due to the result of an old copy-paste, HA_ATOMIC_BTS/BTR were still implemented using the __sync_* builtins instead of the more modern __atomic_* which allow to specify the memory model. Let's update this to use the newer there and also implement the relaxed variants (which are not used for now).	2021-04-07 18:18:37 +02:00
Willy Tarreau	4781b1521a	CLEANUP: atomic/tree-wide: replace single increments/decrements with inc/dec This patch replaces roughly all occurrences of an HA_ATOMIC_ADD(&foo, 1) or HA_ATOMIC_SUB(&foo, 1) with the equivalent HA_ATOMIC_INC(&foo) and HA_ATOMIC_DEC(&foo) respectively. These are 507 changes over 45 files.	2021-04-07 18:18:37 +02:00
Willy Tarreau	22d675cb77	CLEANUP: atomic: add HA_ATOMIC_INC/DEC for unit increments Most ADD/SUB callers use them for a single unit (e.g. refcounts) and it's a pain to always pass ",1". Let's add them to simplify the API. However we currently don't add any return value. If needed in the future better report zero/non-zero than a real value for the sake of efficiency at the instruction level.	2021-04-07 18:18:37 +02:00
Willy Tarreau	185157201c	CLEANUP: atomic: add a fetch-and-xxx variant for common operations The fetch_and_xxx variant is often missing for add/sub/and/or. In fact it was only provided for ADD under the name XADD which corresponds to the x86 instruction name. But for destructive operations like AND and OR it's missing even more as it's not possible to know the value before modifying it. This patch explicitly adds HA_ATOMIC_FETCH_{OR,AND,ADD,SUB} which cover these standard operations, and renames XADD to FETCH_ADD (there were only 6 call places). In the future, backport of fixes involving such operations could simply remap FETCH_ADD(x) to XADD(x), FETCH_SUB(x) to XADD(-x), and for the OR/AND if needed, these could possibly be done using BTS/BTR. It's worth noting that xchg could have been renamed to fetch_and_store() but xchg already has well understood semantics and it wasn't needed to go further.	2021-04-07 18:18:37 +02:00
Willy Tarreau	a477150fd7	CLEANUP: atomic: make all standard add/or/and/sub operations return void In order to make sure these ones will not be used anymore in an expression, let's make them always void. New callers will now be forced to use the explicit _FETCH variant if required.	2021-04-07 18:18:37 +02:00
Willy Tarreau	1db427399c	CLEANUP: atomic: add an explicit _FETCH variant for add/sub/and/or Currently our atomic ops return a value but it's never known whether the fetch is done before or after the operation, which causes some confusion each time the value is desired. Let's create an explicit variant of these operations suffixed with _FETCH to explicitly mention that the fetch occurs after the operation, and make use of it at the few call places.	2021-04-07 18:18:37 +02:00
Willy Tarreau	6756d95a8e	MINOR: atomic/arm64: detect and use builtins for the double-word CAS Gcc 10.2 implements outline atomics on aarch64. The replace all inline atomic ops with a function call that checks if the machine supports LSE atomics. This comes with a small cost but allows modern machines to scale much better than with the old LL/SC ones even when built for full 8.0 compatibility. This patch enables the use of the __atomic_compare_exchange() builtin for the double-word CAS when detected as available instead of using the hand-written LL/SC version. The extra cost is negligible because we do very few DWCAS operations (essentially FD migrations and shared pools) so the cost is low but under high contention it can still be beneficial. As expected no performance difference was measured in either direction on 4-core machines with this change. This could be backported to 2.3 if it was shown that FD migrations were representing a significant source of contention, but for now it does not appear to be needed.	2021-04-07 18:18:37 +02:00
Willy Tarreau	1673c4a883	MINOR: fd: implement an exclusive syscall bit to remove the ugly "log" lock There is a function called fd_write_frag_line() that's essentially used by loggers and that is used to write an atomic message line over a file descriptor using writev(). However a lock is required around the writev() call to prevent messages from multiple threads from being interleaved. Till now a SPIN_TRYLOCK was used on a dedicated lock that was common to all FDs. This is quite not pretty as if there are multiple output pipes to collect logs, there will be quite some contention. Now that there are empty flags left in the FD state and that we can finally use atomic ops on them, let's add a flag to indicate the FD is locked for exclusive access by a syscall. At least the locking will now be on an FD basis and not the whole process, so we can remove the log_lock.	2021-04-07 18:18:37 +02:00
Willy Tarreau	9063a660cc	MINOR: fd: move .exported into fdtab[].state No need to keep this flag apart any more, let's merge it into the global state.	2021-04-07 18:10:36 +02:00
Willy Tarreau	5362bc9044	MINOR: fd: move .et_possible into fdtab[].state No need to keep this flag apart any more, let's merge it into the global state.	2021-04-07 18:09:43 +02:00
Willy Tarreau	0cc612818d	MINOR: fd: move .initialized into fdtab[].state No need to keep this flag apart any more, let's merge it into the global state. The bit was not cleared in fd_insert() because the only user is the function used to create and atomically send a log message to a pipe FD, which never registers the fd. Here we clear it nevertheless for the sake of clarity. Note that with an extra cleaning pass we could have a bit number here and simply use a BTS to test and set it.	2021-04-07 18:09:08 +02:00
Willy Tarreau	030dae13a0	MINOR: fd: move .cloned into fdtab[].state No need to keep this flag apart any more, let's merge it into the global state.	2021-04-07 18:08:29 +02:00
Willy Tarreau	b41a6e9101	MINOR: fd: move .linger_risk into fdtab[].state No need to keep this flag apart any more, let's merge it into the global state. The CLI's output state was extended to 6 digits and the linger/cloned flags moved inside the parenthesis.	2021-04-07 18:07:49 +02:00
Willy Tarreau	f509065191	MEDIUM: fd: merge fdtab[].ev and state for FD_EV_* and FD_POLL_* into state For a long time we've had fdtab[].ev and fdtab[].state which contain two arbitrary sets of information, one is mostly the configuration plus some shutdown reports and the other one is the latest polling status report which also contains some sticky error and shutdown reports. These ones used to be stored into distinct chars, complicating certain operations and not even allowing to clearly see concurrent accesses (e.g. fd_delete_orphan() would set the state to zero while fd_insert() would only set the event to zero). This patch creates a single uint with the two sets in it, still delimited at the byte level for better readability. The original FD_EV_* values remained at the lowest bit levels as they are also known by their bit value. The next step will consist in merging the remaining bits into it. The whole bits are now cleared both in fd_insert() and _fd_delete_orphan() because after a complete check, it is certain that in both cases these functions are the only ones touching these areas. Indeed, for _fd_delete_orphan(), the thread_mask has already been zeroed before a poller can call fd_update_event() which would touch the state, so it is certain that _fd_delete_orphan() is alone. Regarding fd_insert(), only one thread will get an FD at any moment, and it as this FD has already been released by _fd_delete_orphan() by definition it is certain that previous users have definitely stopped touching it. Strictly speaking there's no need for clearing the state again in fd_insert() but it's cheap and will remove some doubts during some troubleshooting sessions.	2021-04-07 18:04:39 +02:00
Willy Tarreau	8d27c203ed	MEDIUM: fd: prepare FD_POLL_* to move to bits 8-15 In preparation of merging FD_POLL* and FD_EV, this only changes the value of FD_POLL_ to use bits 8-15 (the second byte). The size of the field has been temporarily extended to 32 bits already, as well as the temporary variables that carry the new composite value inside fd_update_events(). The resulting fdtab entry becomes temporarily unaligned. All places making access to .ev or FD_POLL_* were carefully inspected to make sure they were safe regarding this change. Only one temporary update was needed for the "show fd" code. The code was only slightly inflated at this step.	2021-04-07 15:08:40 +02:00
Willy Tarreau	fc0cdfb9b7	CLEANUP: fd: remove FD_POLL_DATA and FD_POLL_STICKY The former was not used and the second was used only as a positive mask of the flags to keep instead of having the flags that are updated. Both were removed in favor of a new FD_POLL_UPDT_MASK that only mentions the updated flags. This will ease merging of state and ev later.	2021-04-07 15:08:40 +02:00
Emeric Brun	9533a70381	MINOR: log: register config file and line number on log servers. This patch registers the parsed file and the line where a log server is declared to make those information available in configuration post check. Those new informations were added on error messages probed resolving ring names on post configuration check.	2021-04-07 09:18:34 +02:00
Amaury Denoyelle	5a6926dcf0	MINOR: diag: create cfgdiag module This module is intended to serve as a placeholder for various diagnostics executed after the configuration file has been fully loaded.	2021-04-01 18:03:37 +02:00
Amaury Denoyelle	7b01a8dbdd	MINOR: global: define diagnostic mode of execution Define MODE_DIAG which is used to run haproxy in diagnostic mode. This mode is used to output extra warnings about possible configuration blunder or sub-optimal usage. It can be activated with argument '-dD'. A new output function ha_diag_warning is implemented reserved for diagnostic output. It serves to standardize the format of diagnostic messages. A macro HA_DIAG_WARN_COND is also available to automatically check if diagnostic mode is on before executing the diagnostic check.	2021-04-01 18:03:37 +02:00
Christopher Faulet	021a8e4d7b	MEDIUM: http-rules: Add wait-for-body action on request and response side Historically, an option was added to wait for the request payload (option http-buffer-request). This option has 2 drawbacks. First, it is an ON/OFF option for the whole proxy. It cannot be enabled on demand depending on the message. Then, as its name suggests, it only works on the request side. The only option to wait for the response payload was to write a dedicated filter. While it is an acceptable solution for complex applications, it is a bit overkill to simply match strings in the body. To make everyone happy, this patch adds a dedicated HTTP action to wait for the message payload, for the request or the response depending it is used in an http-request or an http-response ruleset. The time to wait is configurable and, optionally, the minimum payload size to have before stop to wait. Both the http action and the old http analyzer rely on the same internal function.	2021-04-01 16:27:40 +02:00
Christopher Faulet	581db2b829	MINOR: payload/config: Warn if a L6 sample fetch is used from an HTTP proxy L6 sample fetches are now ignored when called from an HTTP proxy. Thus, a warning is emitted during the startup if such usage is detected. It is true for most ACLs and for log-format strings. Unfortunately, it is a bit painful to do so for sample expressions. This patch relies on the commit "MINOR: action: Use a generic function to check validity of an action rule list".	2021-04-01 15:34:22 +02:00
Christopher Faulet	42c6cf9501	MINOR: action: Use a generic function to check validity of an action rule list The check_action_rules() function is now used to check the validity of an action rule list. It is used from check_config_validity() function to check L5/6/7 rulesets.	2021-04-01 15:34:22 +02:00
Christopher Faulet	3b6446f4d9	MINOR: config/proxy: Don't warn for HTTP rules in TCP if 'switch-mode http' set Warnings about ignored HTTP directives in a TCP proxy are inhibited if at least one switch-mode tcp action is configured to perform HTTP upgraded.	2021-04-01 13:22:42 +02:00
Christopher Faulet	ae863c62e3	MEDIUM: Add tcp-request switch-mode action to perform HTTP upgrade It is now possible to perform HTTP upgrades on a TCP stream from the frontend side. To do so, a tcp-request content rule must be defined with the switch-mode action, specifying the mode (for now, only http is supported) and optionnaly the proto (h1 or h2). This way it could be possible to set HTTP directives on a TCP frontend which will only be evaluated if an upgrade is performed. This new way to perform HTTP upgrades should replace progressively the old way, consisting to route the request to an HTTP backend. And it should be also a good start to remove all HTTP processing from tcp-request content rules. This action is terminal, it stops the ruleset evaluation. It is only available on proxy with the frontend capability. The configuration manual has been updated accordingly.	2021-04-01 13:17:19 +02:00
Christopher Faulet	6c1fd987f6	MINOR: stream: Handle stream HTTP upgrade in a dedicated function The code responsible to perform an HTTP upgrade from a TCP stream is moved in a dedicated function, stream_set_http_mode(). The stream_set_backend() function is slightly updated, especially to correctly set the request analysers.	2021-04-01 11:06:48 +02:00
Christopher Faulet	75f619ad92	MINOR: http-ana: Simplify creation/destruction of HTTP transactions Now allocation and initialization of HTTP transactions are performed in a unique function. Historically, there were two functions because the same TXN was reset for K/A connections in the legacy HTTP mode. Now, in HTX, K/A connections are handled at the mux level. A new stream, and thus a new TXN, is created for each request. In addition, the function responsible to end the TXN is now also reponsible to release it. So, now, http_create_txn() and http_destroy_txn() must be used to create and destroy an HTTP transaction.	2021-04-01 11:06:48 +02:00
Christopher Faulet	bb69d781c8	MINOR: muxes: Show muxes flags when the mux list is displayed When the mux list is displayed on "haproxy -vv" output, the mux flags are now diplayed. The flags meaning may be found in the configuration manual.	2021-04-01 11:06:48 +02:00
Christopher Faulet	a460057f2e	MINOR: muxes: Add a flag to notify a mux does not support any upgrade MX_FL_NO_UPG flag may now be set on a multiplexer to explicitly disable upgrades from this mux. For now, it is set on the FCGI multiplexer because it is not supported and there is no upgrade on backend-only multiplexers. It is also set on the H2 multiplexer because it is clearly not supported.	2021-04-01 11:06:47 +02:00
Willy Tarreau	4bfc6630ba	CLEANUP: socket: replace SOL_IP/IPV6/TCP with IPPROTO_IP/IPV6/TCP Historically we've used SOL_IP/SOL_IPV6/SOL_TCP everywhere as the socket level value in getsockopt() and setsockopt() but as we've seen over time it regularly broke the build and required to have them defined to their IPPROTO_* equivalent. The Linux ip(7) man page says: Using the SOL_IP socket options level isn't portable; BSD-based stacks use the IPPROTO_IP level. And it indeed looks like a pure linuxism inherited from old examples and documentation. strace also reports SOL_* instead of IPPROTO_, which does not help... A check to linux/in.h shows they have the same values. Only SOL_SOCKET and other non-IP values make sense since there is no IPPROTO equivalent. Let's get rid of this annoying confusion by removing all redefinitions of SOL_IP/IPV6/TCP and using IPPROTO_ instead, just like any other operating system. This also removes duplicated tests for the same value. Note that this should not result in exposing syscalls to other OSes as the only ones that were still conditionned to SOL_IPV6 were for IPV6_UNICAST_HOPS which already had an IPPROTO_IPV6 equivalent, and IPV6_TRANSPARENT which is Linux-specific.	2021-03-31 08:59:34 +02:00
Willy Tarreau	be362fd992	MINOR: compat: add short aliases for a few very commonly used types Very often we use "int" where negative numbers are not needed (and can further cause trouble) just because it's painful to type "unsigned int" or "unsigned", or ugly to use in function arguments. Similarly sometimes chars would absolutely need to be signed but nobody types "signed char". Let's add a few aliases for such types and make them part of the standard internal API so that over time we can get used to them and get rid of horrible definitions. A comment also reminds some commonly available types and their properties regarding other types.	2021-03-26 17:54:15 +01:00
Willy Tarreau	2f836de100	MINOR: action: add a new ACT_F_CLI_PARSER origin designation In order to process samples from the command line interface we'll need rules as well, and these rules will have to be marked as coming from the CLI parser. This new origin is used for this.	2021-03-26 16:34:53 +01:00
Willy Tarreau	db5e0dbea9	MINOR: sample: add a new CLI_PARSER context for samples In order to prepare for supporting calling sample expressions from the CLI, let's create a new CLI_PARSER parsing context. This one supports constants and internal samples only.	2021-03-26 16:34:53 +01:00
Willy Tarreau	01d580ae86	MINOR: action: add a new ACT_F_CFG_PARSER origin designation In order to process samples from the config file we'll need rules as well, and these rules will have to be marked as coming from the config parser. This new origin is used for this.	2021-03-26 16:23:45 +01:00
Willy Tarreau	f9a7a8fd8e	MINOR: sample: add a new CFG_PARSER context for samples We'd sometimes like to be able to process samples while parsing the configuration based on purely internal thing but that's not possible right now. Let's add a new CFG_PARSER context for samples which only permits constant samples (i.e. those which do not change in the process' life and which are stable during config parsing).	2021-03-26 16:23:45 +01:00
Willy Tarreau	be2159b946	MINOR: sample: add a new SMP_SRC_CONST sample capability This level indicates that everything it constant in the expression during the whole process' life and that it may safely be used at config parsing time.	2021-03-26 16:23:45 +01:00
Willy Tarreau	77e6a4ef0f	MINOR: sample: make smp_resolve_args() return an allocate error message For now smp_resolve_args() complains on stderr via ha_alert(), but if we want to make it a bit more dynamic, we need it to return errors in an allocated message. Let's pass it an error pointer and have it fill it. On return we indent the output if it contains more than one line.	2021-03-26 16:23:45 +01:00
Amaury Denoyelle	6f26faecd8	MINOR: proxy: define cap PR_CAP_LUA Define a new cap PR_CAP_LUA. It can be used to allocate the internal proxy for lua Socket class. This cap overrides default settings for preferable values in the lua context.	2021-03-26 15:28:33 +01:00
Amaury Denoyelle	27fefa1967	MINOR: proxy: implement a free_proxy function Move all liberation code related to a proxy in a dedicated function free_proxy in proxy.c. For now, this function is only called in haproxy.c. In the future, it will be used to free the lua proxy. This helps to clean up haproxy.c.	2021-03-26 15:28:33 +01:00
Amaury Denoyelle	476b9ad97a	REORG: split proxy allocation functions Create a new function parse_new_proxy specifically designed to allocate a new proxy from the configuration file and copy settings from the default proxy. The function alloc_new_proxy is reduced to a minimal allocation. It is used for default proxy allocation and could also be used for internal proxies such as the lua Socket proxy.	2021-03-26 15:28:33 +01:00
Amaury Denoyelle	68fd7e43d3	REORG: global: move free acl/action in their related source files Move deinit_acl_cond and deinit_act_rules from haproxy.c respectively in acl.c and action.c. The name of the functions has been slightly altered, replacing the prefix deinit_* by free_* to reflect their purpose more clearly. This change has been made in preparation to the implementation of a free proxy function. As a side-effect, it helps to clean up haproxy.c.	2021-03-26 15:28:33 +01:00
Amaury Denoyelle	ce44482fe5	REORG: global: move initcall register code in a dedicated file Create a new module init which contains code related to REGISTER_* macros for initcalls. init.h is included in api.h to make init code available to all modules. It's a step to clean up a bit haproxy.c/global.h.	2021-03-26 15:28:33 +01:00
Ilya Shipitsin	df627943a4	BUILD: ssl: introduce fine guard for ssl random extraction functions SSL_get_{client,server}_random are supported in OpenSSL-1.1.0, BoringSSL, LibreSSL-2.7.0 let us introduce HAVE_SSL_EXTRACT_RANDOM for that purpose	2021-03-26 15:19:07 +01:00
Remi Tricot-Le Breton	8218aed90e	BUG/MINOR: ssl: Fix update of default certificate The default SSL_CTX used by a specific frontend is the one of the first ckch instance created for this frontend. If this instance has SNIs, then the SSL context is linked to the instance through the list of SNIs contained in it. If the instance does not have any SNIs though, then the SSL_CTX is only referenced by the bind_conf structure and the instance itself has no link to it. When trying to update a certificate used by the default instance through a cli command, a new version of the default instance was rebuilt but the default SSL context referenced in the bind_conf structure would not be changed, resulting in a buggy behavior in which depending on the SNI used by the client, he could either use the new version of the updated certificate or the original one. This patch adds a reference to the default SSL context in the default ckch instances so that it can be hot swapped during a certificate update. This should fix GitHub issue #1143. It can be backported as far as 2.2.	2021-03-26 13:06:29 +01:00
Willy Tarreau	6cf13119e2	CLEANUP: fd: remove unused fd_set_running_excl() This one is no longer used and was the origin of the previously mentioned deadlock.	2021-03-24 17:17:21 +01:00
Willy Tarreau	2c3f9818e8	BUG/MEDIUM: fd: do not wait on FD removal in fd_delete() Christopher discovered an issue mostly affecting 2.2 and to a less extent 2.3 and above, which is that it's possible to deadlock a soft-stop when several threads are using a same listener: thread1 thread2 unbind_listener() fd_set_running() lock(listener) listener_accept() fd_delete() lock(listener) while (running_mask); -----> deadlock unlock(listener) This simple case disappeared from 2.3 due to the removal of some locked operations at the end of listener_accept() on the regular path, but the architectural problem is still here and caused by a lock inversion built around the loop on running_mask in fd_clr_running_excl(), because there are situations where the caller of fd_delete() may hold a lock that is preventing other threads from dropping their bit in running_mask. The real need here is to make sure the last user deletes the FD. We have all we need to know the last one, it's the one calling fd_clr_running() last, or entering fd_delete() last, both of which can be summed up as the last one calling fd_clr_running() if fd_delete() calls fd_clr_running() at the end. And we can prevent new threads from appearing in running_mask by removing their bits in thread_mask. So what this patch does is that it sets the running_mask for the thread in fd_delete(), clears the thread_mask, thus marking the FD as orphaned, then clears the running mask again, and completes the deletion if it was the last one. If it was not, another thread will pass through fd_clr_running and will complete the deletion of the FD. The bug is easily reproducible in 2.2 under high connection rates during soft close. When the old process stops its listener, occasionally two threads will deadlock and the old process will then be killed by the watchdog. It's strongly believed that similar situations do exist in 2.3 and 2.4 (e.g. if the removal attempt happens during resume_listener() called from listener_accept()) but if so, they should be much harder to trigger. This should be backported to 2.2 as the issue appeared with the FD migration. It requires previous patches "fd: make fd_clr_running() return the remaining running mask" and "MINOR: fd: remove the unneeded running bit from fd_insert()". Notes for backport: in 2.2, the fd_dodelete() function requires an extra argument "do_close" indicating whether we want to remove and close the FD (fd_delete) or just delete it (fd_remove). While this information is not conveyed along the chain, we know that late calls always imply do_close=1 become do_close=0 exclusively results from fd_remove() which is only used by the config parser and the master, both of which are single-threaded, hence are always the last ones in the running_mask. Thus it is safe to assume that a postponed FD deletion always implies do_close=1. Thanks to Olivier for his help in designing this optimal solution.	2021-03-24 17:17:21 +01:00
Willy Tarreau	71bada5ca4	MINOR: fd: remove the unneeded running bit from fd_insert() There's no point taking the running bit in fd_insert() since by definition there will never be more than one thread inserting the FD, and that fd_insert() may only be done after the fd was allocated by the system, indicating the end of use by any other thread. This will need to be backported to 2.2 to fix an issue.	2021-03-24 17:17:21 +01:00
Willy Tarreau	6e8e10b415	MINOR: fd: make fd_clr_running() return the remaining running mask We'll need to know that a thread is the last one to use an fd, so let's make fd_clr_running() return the remaining bits after removal. Note that in practice we're only interested in knowing if it's zero but the compiler doesn't make use of the clags after the AND and emits a CMPXCHG anyway :-/ This will need to be backported to 2.2 to fix an issue.	2021-03-24 17:17:21 +01:00
Christopher Faulet	cc2c4f8f4c	BUG/MEDIUM: debug/lua: Use internal hlua function to dump the lua traceback The commit reverts following commits: * `83926a04` BUG/MEDIUM: debug/lua: Don't dump the lua stack if not dumpable * `a61789a1` MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua Instead of relying on a Lua function to print the lua traceback into the debugger, we are now using our own internal function (hlua_traceback()). This one does not allocate memory and use a chunk instead. This avoids any issue with a possible deadlock in the memory allocator because the thread processing was interrupted during a memory allocation. This patch relies on the commit "BUG/MEDIUM: debug/lua: Use internal hlua function to dump the lua traceback". Both must be backported wherever the patches above are backported, thus as far as 2.0	2021-03-24 16:35:23 +01:00
Christopher Faulet	d09cc519bd	MINOR: lua: Slightly improve function dumping the lua traceback The separator string is now configurable, passing it as parameter when the function is called. In addition, the message have been slightly changed to be a bit more readable.	2021-03-24 16:33:26 +01:00
Ilya Shipitsin	8cd1627599	CLEANUP: ssl: remove unused definitions not need since `e7eb1fec2f`	2021-03-24 09:52:32 +01:00
Remi Tricot-Le Breton	fb00f31af4	BUG/MINOR: ssl: Prevent disk access when using "add ssl crt-list" If an unknown CA file was first mentioned in an "add ssl crt-list" CLI command, it would result in a call to X509_STORE_load_locations which performs a disk access which is forbidden during runtime. The same would happen if a "ca-verify-file" or "crl-file" was specified. This was due to the fact that the crt-list file parsing and the crt-list related CLI commands parsing use the same functions. The patch simply adds a new parameter to all the ssl_bind parsing functions so that they know if the call is made during init or by the CLI, and the ssl_store_load_locations function can then reject any new cafile_entry creation coming from a CLI call. It can be backported as far as 2.2.	2021-03-23 19:29:46 +01:00
Emeric Brun	69ba35146f	MINOR: tools: introduce new option PA_O_DEFAULT_DGRAM on str2sa_range. str2sa_range function options PA_O_DGRAM and PA_O_STREAM are used to define the supported address types but also to set the default type if it is not explicit. If the used address support both STREAM and DGRAM, the default was always set to STREAM. This patch introduce a new option PA_O_DEFAULT_DGRAM to force the default to DGRAM type if it is not explicit in the address field and both STREAM and DGRAM are supported. If only DGRAM or only STREAM is supported, it continues to be considered as the default.	2021-03-23 15:32:22 +01:00
Willy Tarreau	8cc586c73f	BUG/MEDIUM: freq_ctr/threads: use the global_now_ms variable In commit `a1ecbca0a` ("BUG/MINOR: freq_ctr/threads: make use of the last updated global time"), for period-based counters, the millisecond part of the global_now variable was used as the date for the new period. But it's wrong, it only works with sub-second periods as it wraps every second, and for other periods the counters never rotate anymore. Let's make use of the newly introduced global_now_ms variable instead, which contains the global monotonic time expressed in milliseconds. This patch needs to be backported wherever the patch above is backported. It depends on previous commit "MINOR: time: also provide a global, monotonic global_now_ms timer".	2021-03-23 09:03:37 +01:00
Willy Tarreau	6064b34be0	MINOR: time: also provide a global, monotonic global_now_ms timer The period-based freq counters need the global date in milliseconds, so better calculate it and expose it rather than letting all call places incorrectly retrieve it. Here what we do is that we maintain a new globally monotonic timer, global_now_ms, which ought to be very close to the global_now one, but maintains the monotonic approach of now_ms between all threads in that global_now_ms is always ahead of any now_ms. This patch is made simple to ease backporting (it will be needed for a subsequent fix), but it also opens the way to some simplifications on the time handling: instead of computing the local time and trying to force it to the global one, we should soon be able to proceed in the opposite way, that is computing the new global time an making the local one just the latest snapshot of it. This will bring the benefit of making sure that the global time is always ahead of the local one.	2021-03-23 09:01:37 +01:00
Willy Tarreau	5d110b25dd	CLEANUP: connection: use pool_zalloc() in conn_alloc_hash_node() This one used to alloc then zero the area, let's have the allocator do it.	2021-03-22 23:17:24 +01:00
Willy Tarreau	18759079b6	MINOR: pools: add pool_zalloc() to return a zeroed area It's like pool_alloc() but the output is zeroed before being returned and is never poisonned.	2021-03-22 22:05:05 +01:00
Willy Tarreau	de749a9333	MINOR: pools: make the pool allocator support a few flags The pool_alloc_dirty() function was renamed to __pool_alloc() and now takes a set of flags indicating whether poisonning is permitted or not and whether zeroing the area is needed or not. The pool_alloc() function is now just a wrapper calling __pool_alloc(pool, 0).	2021-03-22 20:54:15 +01:00
Willy Tarreau	a213b683f7	CLEANUP: pools: remove the unused pool_get_first() function This one used to maintain a shortcut in the pools allocation path that was only justified by b_alloc_fast() which was not used! Let's get rid of it as well so that the allocator becomes a bit more straight forward.	2021-03-22 16:28:08 +01:00
Willy Tarreau	7be7ffac15	CLEANUP: dynbuf: remove the unused b_alloc_fast() function It is never used anymore since 1.7 where it was used by b_alloc_margin() then replaced by direct calls to the pools function, and it maintains a dependency on the exposed pools functions. It's time to get rid of it, as it's not even certain it still works.	2021-03-22 16:28:05 +01:00
Willy Tarreau	f44ca97fcb	CLEANUP: dynbuf: remove b_alloc_margin() It's not used anymore, let's completely remove it before anyone uses it again by accident.	2021-03-22 16:28:02 +01:00
Willy Tarreau	0f495b3d87	MINOR: channel: simplify the channel's buffer allocation The channel's buffer allocator, channel_alloc_buffer(), was still relying on the principle of a margin for the request and not for the response. But this margin stopped working around 1.7 with the introduction of the content filters such as SPOE, and was completely anihilated with the local pools that came with threads. Let's simplify this and just use b_alloc().	2021-03-22 16:19:45 +01:00
Willy Tarreau	766b6cf206	MINOR: dynbuf: make b_alloc() always check if the buffer is allocated Right now there is a discrepancy beteween b_alloc() and b_allow_margin(): the former forcefully overwrites the target pointer while the latter tests it and returns it as-is if already allocated. As a matter of fact, all callers of b_alloc() either preliminary test the buffer, or assume it's already null. Let's remove this pain and make the function test the buffer's allocation before doing it again, and match call places' expectations.	2021-03-22 16:14:45 +01:00
Christopher Faulet	a61789a1d6	MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua Some parts of the Lua are non-reentrant. We must be sure to carefully track these parts to not dump the lua stack when it is interrupted inside such parts. For now, we only identified the custom lua allocator. If the thread is interrupted during the memory allocation, we must not try to print the lua stack wich also allocate memory. Indeed, realloc() is not async-signal-safe. In this patch we introduce a thread-local counter. It is incremented before entering in a non-reentrant part and decremented when exiting. It is only performed in hlua_alloc() for now.	2021-03-19 16:16:23 +01:00
Olivier Houchard	dae6975498	MINOR: muxes: garbage collect the reset() method. Now that connections aren't being reused when they failed, remove the reset() method. It was unimplemented anywhere, except for H1 where it did nothing, anyway.	2021-03-19 15:33:04 +01:00
Olivier Houchard	1b3c931bff	MEDIUM: connections: Introduce a new XPRT method, start(). Introduce a new XPRT method, start(). The init() method will now only initialize whatever is needed for the XPRT to run, but any action the XPRT has to do before being ready, such as handshakes, will be done in the new start() method. That way, we will be sure the full stack of xprt will be initialized before attempting to do anything. The init() call is also moved to conn_prepare(). There's no longer any reason to wait for the ctrl to be ready, any action will be deferred until start(), anyway. This means conn_xprt_init() is no longer needed.	2021-03-19 15:33:04 +01:00
Amaury Denoyelle	216a1ce3b9	MINOR: stats: export function to allocate extra proxy counters Remove static qualifier on stats_allocate_proxy_counters_internal. This function will be used to allocate extra counters at runtime for dynamic servers.	2021-03-18 15:52:07 +01:00
Amaury Denoyelle	76e10e78bb	MINOR: server: prepare parsing for dynamic servers Prepare the server parsing API to support dynamic servers. - define a new parsing flag to be used for dynamic servers - each keyword contains a new field dynamic_ok to indicate if it can be used for a dynamic server. For now, no keyword are supported. - do not copy settings from the default server for a new dynamic server. - a dynamic server is created in a maintenance mode and requires an explicit 'enable server' command. - a new server flag named SRV_F_DYNAMIC is created. This flag is set for all servers created at runtime. It might be useful later, for example to know if a server can be purged.	2021-03-18 15:51:12 +01:00
Amaury Denoyelle	30c0537f5a	REORG: server: use flags for parse_server Modify the API of parse_server function. Use flags to describe the type of the parsed server instead of discrete arguments. These flags can be used to specify if a server/default-server/server-template is parsed. Additional parameters are also specified (parsing of the address required, resolve of a name must be done immediately). It is now unneeded to use strcmp on args[0] in parse_server. Also, the calls to parse_server are more explicit thanks to the flags.	2021-03-18 15:37:05 +01:00
Amaury Denoyelle	828adf0121	REORG: server: add a free server function Create a new server function named free_server. It can be used to deallocate a server and its member.	2021-03-18 15:37:05 +01:00
Amaury Denoyelle	18487fb532	MINOR: cli: implement experimental-mode Experimental mode is similar to expert-mode. It can be used to access to features still in development.	2021-03-18 15:37:05 +01:00
Willy Tarreau	6f9f2c0857	MINOR: freq_ctr/threads: relax when failing to update a sliding window value The swrate_add* functions would sping fast on a failed CAS, better place a cpu_relax() call there to reduce contention if any.	2021-03-17 19:36:15 +01:00
Willy Tarreau	a1ecbca0a5	BUG/MINOR: freq_ctr/threads: make use of the last updated global time The freq counters were using the thread's own time as the start of the current period. The problem is that in case of contention, it was occasionally possible to perform non-monotonic updates on the edge of the next second, because if the upfront thread updates a counter first, it causes a rotation, then the second thread loses the race from its older time, and tries again, and detects a different time again, but in the past so it only updates the counter, then a third thread on the new date would detect a change again, thus provoking a rotation again. The effect was triple: - rare loss of stored values during certain transitions from one period to the next one, causing counters to report 0 - half of the threads forced to go through the slow path every second - difficult convergence when using many threads where the CAS can fail a lot and we can observe N(N-1) attempts for N threads to complete This patch fixes this issue in two ways: - first, it now makes use og the monotonic global_now value which also happens to be volatile and to carry the latest known time; this way time will never jump backwards anymore and only the first thread updates it on transition, the other ones do not need to. - second, re-read the time in the loop after each failure, because if the date changed in the counter, it means that one thread knows a more recent one and we need to update. In this case if it matches the new current second, the fast path is usable. This patch relies on previous patch "MINOR: time: export the global_now variable" and must be backported as far as 1.8.	2021-03-17 19:36:15 +01:00
Willy Tarreau	650f374f24	MINOR: time: export the global_now variable This is the process-wide monotonic time that is used to update each thread's own time. It may be required at a few places where a strictly monotonic clock is required such as freq_ctr. It will be have to be backported as a dependency of a forthcoming fix.	2021-03-17 19:25:47 +01:00
Willy Tarreau	31a3cea84f	MINOR: cfgparse/proxy: also support spelling fixes on options Some are not always easy to spot with "chk" vs "check" or hyphens at some places and not at others. Now entering "option http-close" properly suggests "httpclose" and "option tcp-chk" suggests "tcp-check". There's no need to consider the proxy's capabilities, what matters is to figure what related word the user tried to spell, and there are not that many options anyway.	2021-03-15 11:14:57 +01:00
Willy Tarreau	b12bc646d5	MINOR: cli: limit spelling suggestions to 5 There's no need to suggest up to 10 entries for matching keywords, most of the times 5 are plenty, and will be more readable.	2021-03-15 10:40:13 +01:00
Willy Tarreau	9294e8822f	MINOR: tools: improve word fingerprinting by counting presence The distance between two words can be high due to a sub-word being missing and in this case it happens that other totally unrealted words are proposed because their average score looks lower thanks to being shorter. Here we're introducing the notion of presence of each character so that word sequences that contain existing sub-words are favored against the shorter ones having nothing in common. In addition we do not distinguish being/end from a regular delimitor anymore. That made it harder to spot inverted words.	2021-03-15 09:38:42 +01:00
Ilya Shipitsin	f3ede874a5	CLEANUP: assorted typo fixes in the code and comments This is 20th iteration of typo fixes	2021-03-13 11:45:17 +01:00
Willy Tarreau	7416314145	CLEANUP: task: make sure tasklet handlers always indicate their statuses When tasklets were derived from tasks, there was no immediate need for the scheduler to know their status after execution, and in a spirit of simplicity they just started to always return NULL. The problem is that it simply prevents the scheduler from 1) accounting their execution time, and 2) keeping track of their current execution status. Indeed, a remote wake-up could very well end up manipulating a tasklet that's currently being executed. And this is the reason why those handlers have to take the idle lock before checking their context. In 2.5 we'll take care of making tasklets and tasks work more similarly, but trouble is to be expected if we continue to propagate the trend of returning NULL everywhere, especially if some fixes relying on a stricter model later need to be backported. For this reason this patch updates all known tasklet handlers to make them return NULL only when the tasklet was freed. It has no effect for now and isn't even guaranteed to always be 100% safe but it puts the code into the right direction for this.	2021-03-13 11:30:19 +01:00
Willy Tarreau	4975d1482f	CLEANUP: cli: rename the last few "stats_" to "cli_" There were still a very small list of functions, variables and fields called "stats_" while they were really purely CLI-centric. There's the frontend called "stats_fe" in the global section, which instantiates a "cli_applet" called "<CLI>" so it was renamed "cli_fe". The "alloc_stats_fe" function cas renamed to "cli_alloc_fe" which also better matches the naming convention of all cli-specific functions. Finally the "stats_permission_denied_msg" used to return an error on the CLI was renamed "cli_permission_denied_msg". Now there's no more "stats_something" that designates the CLI.	2021-03-13 11:04:35 +01:00
Willy Tarreau	f14c7570d6	CLEANUP: cli: rename MAX_STATS_ARGS to MAX_CLI_ARGS This is the number of args accepted on a command received on the CLI, is has long been totally independent of stats and should not carry this misleading "stats" name anymore.	2021-03-13 10:59:23 +01:00
Willy Tarreau	e33c4b3c11	MINOR: tools: add the ability to update a word fingerprint Instead of making a new one from scratch, let's support not wiping the existing fingerprint and updating it, and to do the same char by char. The word-by-word one will still result in multiple beginnings and ends, but that will accurately translate word boundaries. The char-based one has more flexibility and requires that the caller maintains the previous char to indicate the transition, which also allows to insert delimiters for example.	2021-03-12 19:09:19 +01:00
Willy Tarreau	b736458bfa	MEDIUM: cli: apply spelling fixes for known commands before listing them Entering "show tls" would still emit 35 entries. By measuring the distance between all unknown words and the candidates, we can sort them and pick the 10 most likely candidates. This works reasonably well, as now "show tls" only proposes "show tls-keys", "show threads", "show pools" and "show tasks". If the distance is still too high or if a word is missing, the whole prefix list continues to be dumped, thus "show" alone will still report the entire list of commands beginning with "show". It's still impossible to skip a word, for example "show conn" will not propose "show servers conn" because the distance is calculated for each word individually. Some changes to the distance calculation to support updating an existing map could easily address this. But this is already a great improvement.	2021-03-12 19:09:19 +01:00
Willy Tarreau	4451150251	CLEANUP: cli: fix misleading comment and better indent the access level flags It was mentioned that ACCESS_MASTER_ONLY as for workers only instead of master-only. And it wasn't clear that all ACCESS_* would belong to the same thing.	2021-03-12 19:09:19 +01:00
Christopher Faulet	55c1c4053f	MINOR: resolvers: Use milliseconds for cached items in resolver responses The last time when an item was seen in a resolver responses is now stored in milliseconds instead of seconds. This avoid some corner-cases at the edges. This also simplifies time comparisons.	2021-03-12 17:41:28 +01:00
Christopher Faulet	0efc0993ec	BUG/MEDIUM: resolvers: Don't release resolution from a requester callbacks Another way to say it: "Safely unlink requester from a requester callbacks". Requester callbacks must never try to unlink a requester from a resolution, for the current requester or another one. First, these callback functions are called in a loop on a request list, not necessarily safe. Thus unlink resolution at this place, may be unsafe. And it is useless to try to make these loops safe because, all this stuff is placed in a loop on a resolution list. Unlink a requester may lead to release a resolution if it is the last requester. However, the unkink is necessary because we cannot reset the server state (hostname and IP) with some pending DNS resolution on it. So, to workaround this issue, we introduce the "safe" unlink. It is only performed from a requester callback. In this case, the unlink function never releases the resolution, it only reset it if necessary. And when a resolution is found with an empty requester list, it is released. This patch depends on the following commits : * MINOR: resolvers: Purge answer items when a SRV resolution triggers an error * MINOR: resolvers: Use a function to remove answers attached to a resolution * MINOR: resolvers: Directly call srvrq_update_srv_state() when possible * MINOR: resolvers: Add function to change the srv status based on SRV resolution All the series must be backported as far as 2.2. It fixes a regression introduced by the commit `b4badf720` ("BUG/MINOR: resolvers: new callback to properly handle SRV record errors"). don't release resolution from requester cb	2021-03-12 17:41:28 +01:00
Christopher Faulet	5efdef24c1	MINOR: resolvers: Add function to change the srv status based on SRV resolution srvrq_update_srv_status() update the server status based on result of SRV resolution. For now, it is only used from snr_update_srv_status() when appropriate.	2021-03-12 17:41:28 +01:00
Christopher Faulet	1dec5c7934	MINOR: resolvers: Use a function to remove answers attached to a resolution resolv_purge_resolution_answer_records() must be used to removed all answers attached to a resolution. For now, it is only used when a resolution is released.	2021-03-12 17:41:28 +01:00
Baptiste Assmann	6a8d11dc80	MINOR: resolvers: new function find_srvrq_answer_record() This function search for a SRV answer item associated to a requester whose type is server. This is mainly useful to "link" a server to its SRV record when no additional record were found to configure the IP address. This patch is required by a bug fix.	2021-03-12 17:41:28 +01:00
Willy Tarreau	99eb2cc1cc	MINOR: actions: add a function to suggest an action ressembling a given word action_suggest() will return a pointer to an action whose keyword more or less ressembles the passed argument. It also accepts to be more tolerant against prefixes (since actions taking arguments are handled as prefixes). This will be used to suggest approaching words.	2021-03-12 14:13:21 +01:00
Willy Tarreau	433b05fa64	MINOR: cfgparse/bind: suggest correct spelling for unknown bind keywords Just like with the server keywords, now's the turn of "bind" keywords. The difference is that 100% of the bind keywords are registered, thus we do not need the list of extra keywords. There are multiple bind line parsers today, all were updated: - peers - log - dgram-bind - cli $ printf "listen f\nbind :8000 tcut\n" \| ./haproxy -c -f /dev/stdin [NOTICE] 070/101358 (25146) : haproxy version is 2.4-dev11-7b8787-26 [NOTICE] 070/101358 (25146) : path to executable is ./haproxy [ALERT] 070/101358 (25146) : parsing [/dev/stdin:2] : 'bind :8000' unknown keyword 'tcut'; did you mean 'tcp-ut' maybe ? [ALERT] 070/101358 (25146) : Error(s) found in configuration file : /dev/stdin [ALERT] 070/101358 (25146) : Fatal errors found in configuration.	2021-03-12 14:13:21 +01:00
Willy Tarreau	e2afcc4509	MINOR: cfgparse: add cfg_find_best_match() to suggest an existing word Instead of just reporting "unknown keyword", let's provide a function which will look through a list of registered keywords for a similar-looking word to the one that wasn't matched. This will help callers suggest correct spelling. Also, given that a large part of the config parser still relies on a long chain of strcmp(), we'll need to be able to pass extra candidates. Thus the function supports an optional extra list for this purpose.	2021-03-12 14:13:21 +01:00
Willy Tarreau	ba2c4459a5	MINOR: tools: add simple word fingerprinting to find similar-looking words This introduces two functions, one which creates a fingerprint of a word, and one which computes a distance between two words fingerprints. The fingerprint is made by counting the transitions between one character and another one. Here we consider the 26 alphabetic letters regardless of their case, then any digit as a digit, and anything else as "other". We also consider the first and last locations as transitions from begin to first char, and last char to end. The distance is simply the sum of the squares of the differences between two fingerprints. This way, doubling/ missing a letter has the same cost, however some repeated transitions such as "e"->"r" like in "server" are very unlikely to match against situations where they do not exist. This is a naive approach but it seems to work sufficiently well for now. It may be refined in the future if needed.	2021-03-12 14:13:21 +01:00
Willy Tarreau	133c8c412e	CLEANUP: actions: the keyword must always be const from the rule There's no reason for a rule to want to modify an action keyword, let's make sure it is always const.	2021-03-12 14:13:21 +01:00
Christopher Faulet	77e376783e	BUG/MINOR: proxy/session: Be sure to have a listener to increment its counters It is possible to have a session without a listener. It happens for applets on the client side. Thus all accesses to the listener info from the session must be guarded. It was the purpose of the commit `36119de18` ("BUG/MEDIUM: session: NULL dereference possible when accessing the listener"). However, some tests on the session's listener existence are missing in proxy_inc_* functions. This patch should fix the issues #1171, #1172, #1173, #1174 and #1175. It must be backported with the above commit as far as 1.8.	2021-03-12 09:25:45 +01:00
Willy Tarreau	3b728a92bb	BUILD: atomic/arm64: force the register pairs to use in __ha_cas_dw() Since commit `f8fb4f75f` ("MINOR: atomic: implement a more efficient arm64 __ha_cas_dw() using pairs"), on some modern arm64 (armv8.1+) compiled with -march=armv8.1-a under gcc-7.5.0, a build error may appear on ev_poll.o : /tmp/ccHD2lN8.s:1771: Error: reg pair must start from even reg at operand 1 -- `casp x27,x28,x22,x23,[x12]' Makefile:927: recipe for target 'src/ev_poll.o' failed It appears that the compiler cannot always assign register pairs there for a structure made of two u64. It was possibly later addressed since gcc-9.3 never caused this, but there's no trivially available info on the subject in the changelogs. Unsuprizingly, using a u128 instead does fix this, but it significantly inflates the code (+4kB for just 6 places, very likely that it loaded some extra stubs) and the comparison is ugly, involving two slower conditional jumps instead of a single one and a conditional comparison. For example, ha_random64() grew from 144 bytes to 232. However, simply forcing the base register does work pretty well, and makes the code even cleaner and more efficient by further reducing it by about 4.5kB, possibly because it helps the compiler to pick suitable registers for the pair there. And the perf on 64-cores looks steadily 0.5% above the previous one, so let's do this. Note that the commit above was backported to 2.3 to fix scalability issues on AWS Graviton2 platform, so this one will need to be as well.	2021-03-12 06:26:22 +01:00
Fr�d�ric L�caille	c0ed91910a	BUG/MINOR: connection: Missing QUIC initialization The QUIC connection struct connection member was not initialized. This may make randomly haproxy handle TLS connections as QUIC ones only when QUIC support is enabled leading to such OpenSSL errors (captured from a reg test output, TLS Client-Hello callback failed): OpenSSL error[0x10000085] OPENSSL_internal: CONNECTION_REJECTED OpenSSL error[0x10000410] OPENSSL_internal: SSLV3_ALERT_HANDSHAKE_FAILURE OpenSSL error[0x1000009a] OPENSSL_internal: HANDSHAKE_FAILURE_ON_CLIENT_HELLO This patch should fix #1168 github issue.	2021-03-10 12:21:05 +01:00
Willy Tarreau	060a761248	OPTIM: task: automatically adjust the default runqueue-depth to the threads The recent default runqueue size reduction appeared to have significantly lowered performance on low-thread count configs. Testing various values runqueue values on different workloads under thread counts ranging from 1 to 64, it appeared that lower values are more optimal for high thread counts and conversely. It could even be drawn that the optimal value for various workloads sits around 280/sqrt(nbthread), and probably has to do with both the L3 cache usage and how to optimally interlace the threads' activity to minimize contention. This is much easier to optimally configure, so let's do this by default now.	2021-03-10 11:15:34 +01:00
Daniel Corbett	befef70e23	BUG/MINOR: sample: Rename SenderComID/TargetComID to SenderCompID/TargetCompID The recently introduced Financial Information eXchange (FIX) converters have some hard coded tags based on the specification that were misspelled. Specifically, SenderComID and TargetComID should be SenderCompID and TargetCompID according to the specification [1][2]. This patch updates all references, which includes the converters themselves, the regression test, and the documentation. [1] https://fiximate.fixtrading.org/en/FIX.5.0SP2_EP264/tag49.html [2] https://fiximate.fixtrading.org/en/FIX.5.0SP2_EP264/tag56.html	2021-03-10 10:44:20 +01:00
Emeric Brun	4c75195f5b	BUG/MEDIUM: resolvers: handle huge responses over tcp servers. Parameter "accepted_payload_size" is currently considered regardless the used nameserver is using TCP or UDP. It remains mandatory to annouce such capability to support e-dns, so a value have to be announced also in TCP. Maximum DNS message size in TCP is limited by protocol to 65535 and so for UDP (65507) if system supports such UDP messages. But the maximum value for this option was arbitrary forced to 8192. This patch change this maximum to 65535 to allow user to set bigger value for UDP if its system supports. It also sets accepted_payload_size in TCP allowing to retrieve huge responses if the configuration uses TCP nameservers. The request announcing the accepted_payload_size capability is currently built at resolvers level and is common to all used nameservers of the section regardess transport protocol used. A further patch should be made to at least specify a different payload size depending of the transport, and perhaps could be forced to 65535 in case of TCP and maximum would be forced back to 65507 matching UDP max. This patch is appliable since 2.4 version	2021-03-09 15:44:46 +01:00
Willy Tarreau	e89fae3a4e	CLEANUP: stream: rename a few remaining occurrences of "stream *sess" These are some leftovers from the ancient code where they were still called sessions, but these areas in the code remain confusing due to this naming. They were now called "strm" which will not even affect indenting nor alignment.	2021-03-09 15:44:33 +01:00
Willy Tarreau	c93638e1d1	BUILD: connection: do not use VAR_ARRAY in struct tlv It was brought by commit `c44b8de99` ("CLEANUP: connection: Use `VAR_ARRAY` in `struct tlv` definition") but breaks the build with clang. Actually it had already been done 6 months ago by commit `4987a4744` ("CLEANUP: tree-wide: use VAR_ARRAY instead of [0] in various definitions") then reverted by commit `441b6c31e` ("BUILD: connection: fix build on clang after the VAR_ARRAY cleanup") which explained the same thing but didn't place a comment in the code to justify this (in short it's just an end of struct marker).	2021-03-09 10:15:16 +01:00
Willy Tarreau	018251667e	CLEANUP: config: make the cfg_keyword parsers take a const for the defproxy The default proxy was passed as a variable to all parsers instead of a const, which is not without risk, especially when some timeout parsers used to make some int pointers point to the default values for comparisons. We want to be certain that none of these parsers will modify the defaults sections by accident, so it's important to mark this proxy as const. This patch touches all occurrences found (89).	2021-03-09 10:09:43 +01:00
Willy Tarreau	82a92743fc	BUILD: bug: refine HA_LINK_ERROR() to only be used on gcc and derivatives TCC happens to define __OPTIMIZE__ at -O2 but doesn't proceed with dead code elimination, resulting in ha_free() to always reference the link error symbol. Let's condition this test on __GCC__ which others like Clang also define.	2021-03-09 10:09:43 +01:00
Tim Duesterhus	615f81eb5a	MINOR: connection: Use a `struct ist` to store proxy_authority This makes the code cleaner, because proxy_authority can be handled like proxy_unique_id.	2021-03-09 09:24:32 +01:00
Tim Duesterhus	002bd77a6e	CLEANUP: connection: Use istptr / istlen for proxy_unique_id Don't access the ist's fields directly, use the helper functions instead.	2021-03-09 09:24:32 +01:00
Tim Duesterhus	e004c2beae	CLEANUP: connection: Remove useless test for NULL before calling `pool_free()` `pool_free()` is a noop when the given pointer is NULL. No need to test.	2021-03-09 09:24:32 +01:00
Tim Duesterhus	c44b8de995	CLEANUP: connection: Use `VAR_ARRAY` in `struct tlv` definition This is for consistency with `struct tlv_ssl`.	2021-03-09 09:24:32 +01:00
Olivier Houchard	7b00e31509	BUILD: Fix build when using clang without optimizing. ha_free() uses code that attempts to set a non-existant variable to provoke a link-time error, with the expectation that the compiler will not omit that if the code is unreachable. However, clang will emit it when compiling with no optimization, so only do that if __OPTIMIZE__ is defined.	2021-03-05 16:58:56 +01:00
Willy Tarreau	eef7f7fe68	CLEANUP: server: reorder some fields in the server struct to respect cache lines There's currently quite some thread contention in the server struct because frequently fields accessed fields are mixed with those being often written to by any thread. Let's split this a little bit to separate a few areas: - pure config / admin / operating status (almost never changes) - idle and queuing (fast changes, done almost together) - LB (fast changes, not necessarily dependent on the above) - counters (fast changes, at a different instant again)	2021-03-05 15:00:24 +01:00
Willy Tarreau	d4e78d873c	MINOR: server: move actconns to the per-thread structure The actconns list creates massive contention on low server counts because it's in fact a list of streams using a server, all threads compete on the list's head and it's still possible to see some watchdog panics on 48 threads under extreme contention with 47 threads trying to add and one thread trying to delete. Moving this list per thread is trivial because it's only used by srv_shutdown_streams(), which simply required to iterate over the list. The field was renamed to "streams" as it's really a list of streams rather than a list of connections.	2021-03-05 15:00:24 +01:00
Willy Tarreau	430bf4a483	MINOR: server: allocate a per-thread struct for the per-thread connections stuff There are multiple per-thread lists in the listeners, which isn't the most efficient in terms of cache, and doesn't easily allow to store all the per-thread stuff. Now we introduce an srv_per_thread structure which the servers will have an array of, and place the idle/safe/avail conns tree heads into. Overall this was a fairly mechanical change, and the array is now always initialized for all servers since we'll put more stuff there. It's worth noting that the Lua code still has to deal with its own deinit by itself despite being in a global list, because its server is not dynamically allocated.	2021-03-05 15:00:24 +01:00
Willy Tarreau	198e92a8e5	MINOR: server: add a global list of all known servers It's a real pain not to have access to the list of all registered servers, because whenever there is a need to late adjust their configuration, only those attached to regular proxies are seen, but not the peers, lua, logs nor DNS. What this patch does is that new_server() will automatically add the newly created server to a global list, and it does so as well for the 1 or 2 statically allocated servers created for Lua. This way it will be possible to iterate over all of them.	2021-03-05 15:00:24 +01:00
Willy Tarreau	90e9b8c8b6	CLEANUP: global: reorder some fields to respect cache lines Some entries are atomically updated by various threads, such as the global counters, and they're mixed with others which are read all the time like the mode. This explains why "perf" was seeing a huge access cost on global.mode in process_stream()! Let's reorder them so that the static config stuff is at the beginning and the live stuff is at the end.	2021-03-05 08:30:08 +01:00
Willy Tarreau	cc2672f48b	MINOR: server: don't read curr_used_conns multiple times This one is added atomically and we reread it just after this, causing a second memory load that is visible in the perf profile.	2021-03-05 08:30:08 +01:00
Willy Tarreau	4f8cd4397f	MINOR: xprt: add new xprt_set_idle and xprt_set_used methods These functions are used on the mux layer to indicate that the connection is becoming idle and that the xprt ought to be careful before checking the context or that it's not idle anymore and that the context is safe. The purpose is to allow a mux which is going to release a connection to tell the xprt to be careful when touching it. At the moment, the xprt are always careful and that's costly so we want to have the ability to relax this a bit. No xprt layer uses this yet.	2021-03-05 08:30:08 +01:00
Willy Tarreau	6fa8bcdc78	MINOR: task: add an application specific flag to the state: TASK_F_USR1 This flag will be usable by any application. It will be preserved across wakeups so the application can use it to do various stuff. Some I/O handlers will soon benefit from this.	2021-03-05 08:30:08 +01:00
Willy Tarreau	144f84a09d	MEDIUM: task: extend the state field to 32 bits It's been too short for quite a while now and is now full. It's still time to extend it to 32-bits since we have room for this without wasting any space, so we now gained 16 new bits for future flags. The values were not reassigned just in case there would be a few hidden u16 or short somewhere in which these flags are placed (as it used to be the case with stream->pending_events). The patch is tagged MEDIUM because this required to update the task's process() prototype to use an int instead of a short, that's quite a bunch of places.	2021-03-05 08:30:08 +01:00
Willy Tarreau	e0d5942ddd	MINOR: task: move the nice field to the struct task only The nice field isn't needed anymore for the tasklet so we can move it from the TASK_COMMON area into the struct task which already has a hole around the expire entry.	2021-03-05 08:30:08 +01:00
Willy Tarreau	db4e238938	MINOR: task: stop abusing the nice field to detect a tasklet It's cleaner to use a flag from the task's state to detect a tasklet and it's even cheaper. One of the best benefits is that this will allow to get the nice field out of the common part since the tasklet doesn't need it anymore. This commit uses the last task bit available but that's temporary as the purpose of the change is to extend this.	2021-03-05 08:30:08 +01:00
Willy Tarreau	06e69b556c	REORG: tools: promote the debug PRNG to more general use as a statistical one We frequently need to access a simple and fast PRNG for statistical purposes. The debug_prng() function did exactly this using a xorshift generator but its use was limited to debug only. Let's move this to tools.h and tools.c to make it accessible everywhere. Since it needs to be fast, its state is thread-local. An initialization function starts a different initial value for each thread for better distribution.	2021-03-05 08:30:08 +01:00
Ubuntu	6fa9225628	CLEANUP: stream: explain why we queue the stream at the head of the server list In stream_add_srv_conn() MT_LIST_ADD() is used instead of MT_LIST_ADDQ(), resulting in the stream being queued at the end of the server list. This has no particular effect since we cannot dump the streams on a server, and this is only used by "shutdown sessions" on a server. But it also turns out to be significantly faster due to the shorter recovery from the conflict with an adjacent MT_LIST_DEL(), thus it remains desirable to use it, but at least it deserves a comment. In addition to this, it's worth mentioning that this list should creates extreme contention with threads while almost never used. It should be made per-thread just like the global streams list.	2021-03-05 08:30:08 +01:00
Willy Tarreau	f587003fe9	MINOR: pools: double the local pool cache size to 1 MB The reason is that H2 can already require 32 16kB buffers for the mux output at once, which will deplete the local cache. Thus it makes sense to go further to leave some time to other connection to release theirs. In addition, the L2 cache on modern CPUs is already 1 MB, so this change is welcome in any case.	2021-03-05 08:30:08 +01:00
Willy Tarreau	0bae075928	MEDIUM: pools: add CONFIG_HAP_NO_GLOBAL_POOLS and CONFIG_HAP_GLOBAL_POOLS We've reached a point where the global pools represent a significant bottleneck with threads. On a 64-core machine, the performance was divided by 8 between 32 and 64 H2 connections only because there were not enough entries in the local caches to avoid picking from the global pools, and the contention on the list there was very high. It becomes obvious that we need to have an array of lists, but that will require more changes. In parallel, standard memory allocators have improved, with tcmalloc and jemalloc finding their ways through mainstream systems, and glibc having upgraded to a thread-aware ptmalloc variant, keeping this level of contention here isn't justified anymore when we have both the local per-thread pool caches and a fast process-wide allocator. For these reasons, this patch introduces a new compile time setting CONFIG_HAP_NO_GLOBAL_POOLS which is set by default when threads are enabled with thread local pool caches, and we know we have a fast thread-aware memory allocator (currently set for glibc>=2.26). In this case we entirely bypass the global pool and directly use the standard memory allocator when missing objects from the local pools. It is also possible to force it at compile time when a good allocator is used with another setup. It is still possible to re-enable the global pools using CONFIG_HAP_GLOBAL_POOLS, if a corner case is discovered regarding the operating system's default allocator, or when building with a recent libc but a different allocator which provides other benefits but does not scale well with threads.	2021-03-05 08:30:08 +01:00
Ubuntu	f8fb4f75f1	MINOR: atomic: implement a more efficient arm64 __ha_cas_dw() using pairs There finally is a way to support register pairs on aarch64 assembly under gcc, it's just undocumented, like many of the options there :-( As indicated below, it's possible to pass "%H" to mention the high part of a register pair (e.g. "%H0" to go with "%0"): https://patchwork.ozlabs.org/project/gcc/patch/59368A74.2060908@foss.arm.com/ By making local variables from pairs of registers via a struct (as is used in IST for example), we can let gcc choose the correct register pairs and avoid a few moves in certain situations. The code is now slightly more efficient than the previous one on AWS' Graviton2 platform, and noticeably smaller (by 4.5kB approx). A few tests on older releases show that even Linaro's gcc-4.7 used to support such register pairs and %H, and by then ATOMICS were not supported so this should not cause build issues, and as such this patch replaces the earlier implementation.	2021-03-05 08:30:08 +01:00
Willy Tarreau	46cca86900	MINOR: atomic: add armv8.1-a atomics variant for cas-dw This variant uses the CASP instruction available on armv8.1-a CPU cores, which is detected when __ARM_FEATURE_ATOMICS is set (gcc-linaro >= 7, mainline >= 9). This one was tested on cortex-A55 (S905D3) and on AWS' Graviton2 CPUs. The instruction performs way better on high thread counts since it guarantees some forward progress when facing extreme contention while the original LL/SC approach is light on low-thread counts but doesn't guarantee progress. The implementation is not the most optimal possible. In particular since the instruction requires to work on register pairs and there doesn't seem to be a way to force gcc to emit register pairs, we have to decide to force to use the pair (x0,x1) to store the old value, and (x2,x3) to store the new one, and this necessarily involves some extra moves. But at least it does improve the situation with 16 threads and more. See issue #958 for more context. Note, a first implementation of this function was making use of an input/output constraint passed using "+Q"((void*)target), which was resulting in smaller overall code than passing "target" as an input register only. It turned out that the cause was directly related to whether the function was inlined or not, hence the "forceinline" attribute. Any changes to this code should still pay attention to this important factor.	2021-03-05 08:30:08 +01:00
Willy Tarreau	168fc5332c	BUG/MINOR: mt-list: always perform a cpu_relax call on failure On highly threaded machines it is possible to occasionally trigger the watchdog on certain contended areas like the server's connection list, because while the mechanism inherently cannot guarantee a constant progress, it lacks CPU relax calls which are absolutely necessary in this situation to let a thread finish its job. The loop's "while (1)" was changed to use a "for" statement calling __ha_cpu_relax() as its continuation expression. This way the "continue" statements jump to the unique place containing the pause without excessively inflating the code. This was sufficient to definitely fix the problem on 64-core ARM Graviton2 machines. This patch should probably be backported once it's confirmed it also helps on many-cores x86 machines since some people are facing contention in these environments. This patch depends on previous commit "REORG: atomic: reimplement pl_cpu_relax() from atomic-ops.h". An attempt was made to first read the value before exchanging, and it significantly degraded the performance. It's very likely that this caused other cores to lose exclusive ownership on their line and slow down their next xchg operation. In addition it was found that MT_LIST_ADD is significantly faster than MT_LIST_ADDQ under high contention, because it fails one step earlier when conflicting with an adjacent MT_LIST_DEL(). It might be worth switching some operations' order to favor MT_LIST_ADDQ() instead.	2021-03-05 08:30:08 +01:00
Willy Tarreau	958ae26c35	REORG: atomic: reimplement pl_cpu_relax() from atomic-ops.h There is some confusion here as we need to place some cpu_relax statements in some loops where it's not easily possible to condition them on the use of threads. That's what atomic.h already does. So let's take the various pl_cpu_relax() implementations from there and place them in atomic.h under the name __ha_cpu_relax() and let them adapt to the presence or absence of threads and to the architecture (currently only x86 and aarch64 use a barrier instruction), though it's very likely that arm would work well with a cache flushing ISB instruction as well). This time they were implemented as expressions returning 1 rather than statements, in order to ease their placement as the loop condition or the continuation expression inside "for" loops. We should probably do the same with barriers and a few such other ones.	2021-03-05 08:30:08 +01:00
Amaury Denoyelle	8ede3db080	MINOR: backend: handle reuse for conns with no server as target If dispatch mode or transparent backend is used, the backend connection target is a proxy instead of a server. In these cases, the reuse of backend connections is not consistent. With the default behavior, no reuse is done and every new request uses a new connection. However, if http-reuse is set to never, the connection are stored by the mux in the session and can be reused for future requests in the same session. As no server is used for these connections, no reuse can be made outside of the session, similarly to http-reuse never mode. A different http-reuse config value should not have an impact. To achieve this, mark these connections as private to have a defined behavior. For this feature to properly work, the connection hash has been slightly adjusted. The server pointer as an input as been replaced by a generic target pointer to refer to the server or proxy instance. The hash is always calculated on connect_server even if the connection target is not a server. This also requires to allocate the connection hash node for every backend connections, not just the one with a server target.	2021-03-03 11:31:19 +01:00
Frédéric Lécaille	b28812af7a	BUILD: quic: Implicit conversion between SSL related enums. Fix such compilation issues: include/haproxy/quic_tls.h:157:10: error: implicit conversion from 'enum ssl_encryption_level_t' to 'enum quic_tls_enc_level' [-Werror=enum-conversion] 157 \| return ssl_encryption_application; \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ src/xprt_quic.c: In function 'quic_conn_enc_level_init': src/xprt_quic.c:2358:13: error: implicit conversion from 'enum quic_tls_enc_level' to 'enum ssl_encryption_level_t' [-Werror=enum-conversion] 2358 \| qel->level = quic_to_ssl_enc_level(level); \| ^ Not detected by all the compilators.	2021-03-02 10:34:18 +01:00
Willy Tarreau	61cfdf4fd8	CLEANUP: tree-wide: replace free(x);x=NULL with ha_free(&x) This makes the code more readable and less prone to copy-paste errors. In addition, it allows to place some __builtin_constant_p() predicates to trigger a link-time error in case the compiler knows that the freed area is constant. It will also produce compile-time error if trying to free something that is not a regular pointer (e.g. a function). The DEBUG_MEM_STATS macro now also defines an instance for ha_free() so that all these calls can be checked. 178 occurrences were converted. The vast majority of them were handled by the following Coccinelle script, some slightly refined to better deal with "&*x" or with long lines: @ rule @ expression E; @@ - free(E); - E = NULL; + ha_free(&E); It was verified that the resulting code is the same, more or less a handful of cases where the compiler optimized slightly differently the temporary variable that holds the copy of the pointer. A non-negligible amount of {free(str);str=NULL;str_len=0;} are still present in the config part (mostly header names in proxies). These ones should also be cleaned for the same reasons, and probably be turned into ist strings.	2021-02-26 21:21:09 +01:00
Christopher Faulet	29e9326f2f	CLEANUP: hlua: Use net_addr structure internally to parse and compare addresses hlua_addr structure may be replaced by net_addr structure to parse and compare addresses. Both structures are similar.	2021-02-26 13:53:26 +01:00
Christopher Faulet	5d1def623a	MEDIUM: http-ana: Add IPv6 support for forwardfor and orignialto options A network may be specified to avoid header addition for "forwardfor" and "orignialto" option via the "except" parameter. However, only IPv4 networks/addresses are supported. This patch adds the support of IPv6. To do so, the net_addr structure is used to store the parameter value in the proxy structure. And ipcmp2net() function is used to perform the comparison. This patch should fix the issue #1145. It depends on the following commit: * c6ce0ab MINOR: tools: Add function to compare an address to a network address * 5587287 MINOR: tools: Add net_addr structure describing a network addess	2021-02-26 13:52:48 +01:00
Christopher Faulet	9553de7fec	MINOR: tools: Add function to compare an address to a network address ipcmp2net() function may be used to compare an addres (struct sockaddr_storage) to a network address (struct net_addr). Among other things, this function will be used to add support of IPv6 for "except" parameter of "forwardfor" and "originalto" options.	2021-02-26 13:52:06 +01:00
Christopher Faulet	01f02a4d84	MINOR: tools: Add net_addr structure describing a network addess The net_addr structure describes a IPv4 or IPv6 address. Its ip and mask are represented. Among other things, this structure will be used to add support of IPv6 for "except" parameter of "forwardfor" and "originalto" options.	2021-02-26 13:32:17 +01:00
Willy Tarreau	401135cee6	MINOR: task: add one extra tasklet class: TL_HEAVY This class will be used exclusively for heavy processing tasklets. It will be cleaner than mixing them with the bulk ones. For now it's allocated ~1% of the CPU bandwidth. The largest part of the patch consists in re-arranging the fields in the task_per_thread structure to preserve a clean alignment with one more list head. Since we're now forced to increase the struct past a second cache line, it now uses 4 cache lines (for easy multiplying) with the first two ones being exclusively used by local operations and the third one mostly by atomic operations. Interestingly, this better arrangement causes less stress and reduced the response time by 8 microseconds at 1 million requests per second.	2021-02-26 12:00:53 +01:00
Willy Tarreau	d8aa21a611	CLEANUP: server: rename srv_cleanup_{idle,toremove}_connections() These function names are unbearably long, they don't even fit into the screen in "show profiling", let's trim the "_connections" to "_conns", which happens to match the name of the lists there.	2021-02-26 00:30:22 +01:00
Willy Tarreau	74dea8caea	MINOR: task: limit the number of subsequent heavy tasks with flag TASK_HEAVY While the scheduler is priority-aware and class-aware, and consistently tries to maintain fairness between all classes, it doesn't make use of a fine execution budget to compensate for high-latency tasks such as TLS handshakes. This can result in many subsequent calls adding multiple milliseconds of latency between the various steps of other tasklets that don't even depend on this. An ideal solution would be to add a 4th queue, have all tasks announce their estimated cost upfront and let the scheduler maintain an auto- refilling budget to pick from the most suitable queue. But it turns out that a very simplified version of this already provides impressive gains with very tiny changes and could easily be backported. The principle is to reserve a new task flag "TASK_HEAVY" that indicates that a task is expected to take a lot of time without yielding (e.g. an SSL handshake typically takes 700 microseconds of crypto computation). When the scheduler sees this flag when queuing a tasklet, it will place it into the bulk queue. And during dequeuing, we accept only one of these in a full round. This means that the first one will be accepted, will not prevent other lower priority tasks from running, but if a new one arrives, then the queue stops here and goes back to the polling. This will allow to collect more important updates for other tasks that will be batched before the next call of a heavy task. Preliminary tests consisting in placing this flag on the SSL handshake tasklet show that response times under SSL stress fell from 14 ms before the patch to 3.0 ms with the patch, and even 1.8 ms if tune.sched.low-latency is set to "on".	2021-02-26 00:25:51 +01:00
Christopher Faulet	69beaa91d5	REORG: server: Export and rename some functions updating server info Some static functions are now exported and renamed to follow the same pattern of other exported functions. Here is the list : * update_server_fqdn: Renamed to srv_update_fqdn and exported * update_server_check_addr_port: renamed to srv_update_check_addr_port and exported * update_server_agent_addr_port: renamed to srv_update_agent_addr_port and exported * update_server_addr: renamed to srv_update_addr * update_server_addr_potr: renamed to srv_update_addr_port * srv_prepare_for_resolution: exported This change is mandatory to move all functions dealing with the server-state files in a separate file.	2021-02-25 10:02:39 +01:00
Christopher Faulet	ecfb9b9109	MEDIUM: server: Store parsed params of a server-state line in the tree Parsed parameters are now stored in the tree of server-state lines. This way, a line from the global server-state file is only parsed once. Before, it was parsed a first time to store it in the tree and one more time to load the server state. To do so, the server-state line object must be allocated before parsing a line. This means its size must no longer depend on the length of first parsed parameters (backend and server names). Thus the node type was changed to use a hashed key instead of a string.	2021-02-25 10:02:39 +01:00
Christopher Faulet	6d87c58fb4	CLEANUP: server: Rename state_line structure into server_state_line The structure used to store a server-state line in an eb-tree has a too generic name. Instead of state_line, the structure is renamed as server_state_line.	2021-02-25 10:02:39 +01:00
Christopher Faulet	fcb53fbb58	CLEANUP: server: Rename state_line node to node instead of name_name <state_line.name_name> field is a node in an eb-tree. Thus, instead of "name_name", we now use "node" to name this field. If is a more explicit name and not too strange.	2021-02-25 10:02:39 +01:00
Willy Tarreau	b2285de049	MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set It is extremely useful to be able to observe the wakeup latency of some important I/O operations, so let's accept to inflate the tasklet struct by 8 extra bytes when DEBUG_TASK is set. With just this we have enough to get live reports like this: $ socat - /tmp/sock1 <<< "show profiling" Per-task CPU profiling : on # set profiling tasks {on\|auto\|off} Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg si_cs_io_cb 8099492 4.833s 596.0ns 8.974m 66.48us h1_io_cb 7460365 11.55s 1.548us 2.477m 19.92us process_stream 7383828 22.79s 3.086us 18.39m 149.5us h1_timeout_task 4157 - - 348.4ms 83.81us srv_cleanup_toremove_connections751 39.70ms 52.86us 10.54ms 14.04us srv_cleanup_idle_connections 21 1.405ms 66.89us 30.82us 1.467us task_run_applet 16 1.058ms 66.13us 446.2us 27.89us accept_queue_process 7 34.53us 4.933us 333.1us 47.58us	2021-02-25 09:44:16 +01:00
Willy Tarreau	45499c56d3	MINOR: task: make grq_total atomic to move it outside of the grq_lock Instead of decrementing grq_total once per task picked from the global run queue, let's do it at once after the loop like we do for other counters. This simplifies the code everywhere. It is not expected to bring noticeable improvements however, since global tasks tend to be less common nowadays.	2021-02-25 09:44:16 +01:00
Willy Tarreau	c03fbeb358	CLEANUP: task: re-merge __task_unlink_rq() with task_unlink_rq() There's no point keeping the two separate anymore, some tests are duplicated for no reason.	2021-02-25 09:44:16 +01:00
Christopher Faulet	e071f0e6a4	MINOR: htx: Add function to reserve the max possible size for an HTX DATA block The function htx_reserve_max_data() should be used to get an HTX DATA block with the max possible size. A current block may be extended or a new one created, depending on the HTX message state. But the idea is to let the caller to copy a bunch of data without requesting many new blocks. It is its responsibility to resize the block at the end, to set the final block size. This function will be used to parse messages with small chunks. Indeed, we can have more than 2700 1-byte chunks in a 16Kb of input data. So it is easy to understand how this function may help to improve the parsing of chunk messages.	2021-02-24 22:10:01 +01:00
Baptiste Assmann	b4badf720c	BUG/MINOR: resolvers: new callback to properly handle SRV record errors When a SRV record was created, it used to register the regular server name resolution callbacks. That said, SRV records and regular server name resolution don't work the same way, furthermore on error management. This patch introduces a new call back to manage DNS errors related to the SRV queries. this fixes github issue #50. Backport status: 2.3, 2.2, 2.1, 2.0	2021-02-24 21:58:45 +01:00
Willy Tarreau	5926e384e6	BUG/MINOR: fd: properly wait for !running_mask in fd_set_running_excl() In fd_set_running_excl() we don't reset the old mask in the CAS loop, so if we fail on the first round, we'll forcefully take the FD on the next one. In practice it's used bu fd_insert() and fd_delete() only, none of which is supposed to be passed an FD which is still in use since in practice, given that for now only listeners may be enabled on multiple threads at once. This can be backported to 2.2 but shouldn't result in fixing any user visible bug for now.	2021-02-24 19:40:49 +01:00
Willy Tarreau	9c6dbf0eea	CLEANUP: task: split the large tasklet_wakeup_on() function in two This function has become large with the multi-queue scheduler. We need to keep the fast path and the debugging parts inlined, but the rest now moves to task.c just like was done for task_wakeup(). This has reduced the code size by 6kB due to less inlining of large parts that are always context-dependent, and as a side effect, has increased the overall performance by 1%.	2021-02-24 17:55:58 +01:00
Willy Tarreau	955a11ebfa	MINOR: task: move the allocated tasks counter to the per-thread struct The nb_tasks counter was still global and gets incremented and decremented for each task_new()/task_free(), and was read in process_runnable_tasks(). But it's only used for stats reporting, so doing this this often is pointless and expensive. Let's move it to the task_per_thread struct and have the stats sum it when needed.	2021-02-24 17:42:04 +01:00
Willy Tarreau	018564eaa2	CLEANUP: task: move the tree root detection from __task_wakeup() to task_wakeup() Historically we used to call __task_wakeup() with a known tree root but this is not the case and the code has remained needlessly complicated with the root calculation in task_wakeup() passed in argument to __task_wakeup() which compares it again. Let's get rid of this and just move the detection code there. This eliminates some ifdefs and allows to simplify the test conditions quite a bit.	2021-02-24 17:42:04 +01:00
Willy Tarreau	1f3b1417b8	CLEANUP: tasks: use a less confusing name for task_list_size This one is systematically misunderstood due to its unclear name. It is in fact the number of tasks in the local tasklet list. Let's call it "tasks_in_list" to remove some of the confusion.	2021-02-24 17:42:04 +01:00
Willy Tarreau	2c41d77ebc	MINOR: tasks: do not maintain the rqueue_size counter anymore This one is exclusively used as a boolean nowadays and is non-zero only when the thread-local run queue is not empty. Better check the root tree's pointer and avoid updating this counter all the time.	2021-02-24 17:42:04 +01:00
Willy Tarreau	9c7b8085f4	MEDIUM: task: remove the tasks_run_queue counter and have one per thread This counter is solely used for reporting in the stats and is the hottest thread contention point to date. Moving it to the scheduler and having a separate one for the global run queue dramatically improves the performance, showing a 12% boost on the request rate on 16 threads! In addition, the thread debugging output which used to rely on rqueue_size was not totally accurate as it would only report task counts. Now we can return the exact thread's run queue length. It is also interesting to note that there are still a few other task/tasklet counters in the scheduler that are not efficiently updated because some cover a single area and others cover multiple areas. It looks like having a distinct counter for each of the following entries would help and would keep the code a bit cleaner: - global run queue (tree) - per-thread run queue (tree) - per-thread shared tasklets list - per-thread local lists Maybe even splitting the shared tasklets lists between pure tasklets and tasks instead of having the whole and tasks would simplify the code because there remain a number of places where several counters have to be updated.	2021-02-24 17:42:04 +01:00
Willy Tarreau	49de68520e	MEDIUM: streams: do not use the streams lock anymore The lock was still used exclusively to deal with the concurrency between the "show sess" release handler and a stream_new() or stream_free() on another thread. All other accesses made by "show sess" are already done under thread isolation. The release handler only requires to unlink its node when stopping in the middle of a dump (error, timeout etc). Let's just isolate the thread to deal with this case so that it's compatible with the dump conditions, and remove all remaining locking on the streams. This effectively kills the streams lock. The measured gain here is around 1.6% with 4 threads (374krps -> 380k).	2021-02-24 13:54:50 +01:00
Willy Tarreau	a698eb6739	MINOR: streams: use one list per stream instead of a global one The global streams list is exclusively used for "show sess", to look up a stream to shut down, and for the hard-stop. Having all of them in a single list is extremely expensive in terms of locking when using threads, with performance losses as high as 7% having been observed just due to this. This patch makes the list per-thread, since there's no need to have a global one in this situation. All call places just iterate over all threads. The most "invasive" changes was in "show sess" where the end of list needs to go back to the beginning of next thread's list until the last thread is seen. For now the lock was maintained to keep the code auditable but a next commit should get rid of it. The observed performance gain here with only 4 threads is already 7% (350krps -> 374krps).	2021-02-24 13:53:20 +01:00
Willy Tarreau	b981318c11	MINOR: stream: add an "epoch" to figure which streams appeared when The "show sess" CLI command currently lists all streams and needs to stop at a given position to avoid dumping forever. Since 2.2 with commit `c6e7a1b8e` ("MINOR: cli: make "show sess" stop at the last known session"), a hack consists in unlinking the stream running the applet and linking it again at the current end of the list, in order to serve as a delimiter. But this forces the stream list to be global, which affects scalability. This patch introduces an epoch, which is a global 32-bit counter that is incremented by the "show sess" command, and which is copied by newly created streams. This way any stream can know whether any other one is newer or older than itself. For now it's only stored and not exploited.	2021-02-24 12:12:51 +01:00
Ilya Shipitsin	98a9e1b873	BUILD: SSL: introduce fine guard for RAND_keep_random_devices_open RAND_keep_random_devices_open is OpenSSL specific function, not implemented in LibreSSL and BoringSSL. Let us define guard HAVE_SSL_RAND_KEEP_RANDOM_DEVICES_OPEN in include/haproxy/openssl-compat.h That guard does not depend anymore on HA_OPENSSL_VERSION	2021-02-22 10:35:23 +01:00
Willy Tarreau	c6ba9a0b9b	MINOR: sched: have one runqueue ticks counter per thread The runqueue_ticks counts the number of task wakeups and is used to position new tasks in the run queue, but since we've had per-thread run queues, the values there are not very relevant anymore and the nice value doesn't apply well if some threads are more loaded than others. In addition, letting all threads compete over a shared counter is not smart as this may cause some excessive contention. Let's move this index close to the run queues themselves, i.e. one per thread and a global one. In addition to improving fairness, this has increased global performance by 2% on 16 threads thanks to the lower contention on rqueue_ticks. Fairness issues were not observed, but if any were to be, this patch could be backported as far as 2.0 to address them.	2021-02-20 13:03:37 +01:00
Willy Tarreau	4d77bbf856	MINOR: dynbuf: pass offer_buffers() the number of buffers instead of a threshold Historically this function would try to wake the most accurate number of process_stream() waiters. But since the introduction of filters which could also require buffers (e.g. for compression), things started not to be as accurate anymore. Nowadays muxes and transport layers also use buffers, so the runqueue size has nothing to do anymore with the number of supposed users to come. In addition to this, the threshold was compared to the number of free buffer calculated as allocated minus used, but this didn't work anymore with local pools since these counts are not updated upon alloc/free! Let's clean this up and pass the number of released buffers instead, and consider that each waiter successfully called counts as one buffer. This is not rocket science and will not suddenly fix everything, but at least it cannot be as wrong as it is today. This could have been marked as a bug given that the current situation is totally broken regarding this, but this probably doesn't completely fix it, it only goes in a better direction. It is possible however that it makes sense in the future to backport this as part of a larger series if the situation significantly improves.	2021-02-20 12:38:18 +01:00
Willy Tarreau	90f366b595	MINOR: dynbuf: use regular lists instead of mt_lists for buffer_wait There's no point anymore in keeping mt_lists for the buffer_wait and buffer_wq since it's thread-local now.	2021-02-20 12:38:18 +01:00
Willy Tarreau	e8e5091510	MINOR: dynbuf: make the buffer wait queue per thread The buffer wait queue used to be global historically but this doest not make any sense anymore given that the most common use case is to have thread-local pools. Thus there's no point waking up waiters of other threads after releasing an entry, as they won't benefit from it. Let's move the queue head to the thread_info structure and use ti->buffer_wq from now on.	2021-02-20 12:38:18 +01:00
Christopher Faulet	ea2cdf55e3	MEDIUM: server: Don't introduce a new server-state file version This revert the commit `63e6cba12` ("MEDIUM: server: add server-states version 2"), but keeping all recent features added to the server-sate file. Instead of adding a 2nd version for the server-state file format to handle the 5 new fields added during the 2.4 development, these fields are considered as optionnal during the parsing. So it is possible to load a server-state file from HAProxy 2.3. However, from 2.4, these new fields are always dumped in the server-state file. But it should not be a problem to load it on the 2.3. This patch seems a bit huge but the diff ignoring the space is much smaller. The version 2 of the server-state file format is reserved for a real refactoring to address all issues of the current format.	2021-02-19 18:03:59 +01:00
Amaury Denoyelle	8990b010a0	MINOR: connection: allocate dynamically hash node for backend conns Remove ebmb_node entry from struct connection and create a dedicated struct conn_hash_node. struct connection contains now only a pointer to a conn_hash_node, allocated only for connections where target is of type OBJ_TYPE_SERVER. This will reduce memory footprints for every connections that does not need http-reuse such as frontend connections.	2021-02-19 16:59:18 +01:00
Olivier Houchard	5567f41d0a	BUG/MEDIUM: lists: Avoid an infinite loop in MT_LIST_TRY_ADDQ(). In MT_LIST_TRY_ADDQ(), deal with the "prev" field of the element before the "next". If the element is the first in the list, then its next will already have been locked when we locked list->prev->next, so locking it again will fail, and we'll start over and over. This should be backported to 2.3.	2021-02-19 16:47:20 +01:00
Willy Tarreau	66161326fd	MINOR: listener: refine the default MAX_ACCEPT from 64 to 4 The maximum number of connections accepted at once by a thread for a single listener used to default to 64 divided by the number of processes but the tasklet-based model is much more scalable and benefits from smaller values. Experimentation has shown that 4 gives the highest accept rate for all thread values, and that 3 and 5 come very close, as shown below (HTTP/1 connections forwarded per second at multi-accept 4 and 64): ac\thr\| 1 2 4 8 16 ------+------------------------------ 4\| 80k 106k 168k 270k 336k 64\| 63k 89k 145k 230k 274k Some tests were also conducted on SSL and absolutely no change was observed. The value was placed into a define because it used to be spread all over the code. It might be useful at some point to backport this to 2.3 and 2.2 to help those who observed some performance regressions from 1.6.	2021-02-19 16:02:04 +01:00
Willy Tarreau	4327d0ac00	MINOR: tasks: refine the default run queue depth Since a lot of internal callbacks were turned to tasklets, the runqueue depth had not been readjusted from the default 200 which was initially used to favor batched processing. But nowadays it appears too large already based on the following tests conducted on a 8c16t machine with a simple config involving "balance leastconn" and one server. The setup always involved the two threads of a same CPU core except for 1 thread, and the client was running over 1000 concurrent H1 connections. The number of requests per second is reported for each (runqueue-depth, nbthread) couple: rq\thr\| 1 2 4 8 16 ------+------------------------------ 32\| 120k 159k 276k 477k 698k 40\| 122k 160k 276k 478k 722k 48\| 121k 159k 274k 482k 720k 64\| 121k 160k 274k 469k 710k 200\| 114k 150k 247k 415k 613k <-- default It's possible to save up to about 18% performance by lowering the default value to 40. One possible explanation to this is that checking I/Os more frequently allows to flush buffers faster and to smooth the I/O wait time over multiple operations instead of alternating phases of processing, waiting for locks and waiting for new I/Os. The total round trip time also fell from 1.62ms to 1.40ms on average, among which at least 0.5ms is attributed to the testing tools since this is the minimum attainable on the loopback. After some observation it would be nice to backport this to 2.3 and 2.2 which observe similar improvements, since some users have already observed some perf regressions between 1.6 and 2.2.	2021-02-19 16:01:55 +01:00
Ilya Shipitsin	c47d676bd7	BUILD: ssl: introduce fine guard for OpenSSL specific SCTL functions SCTL (signed certificate timestamp list) specified in RFC6962 was implemented in c74ce24cd22e8c683ba0e5353c0762f8616e597d, let us introduce macro HAVE_SSL_SCTL for the HAVE_SSL_SCTL sake, which in turn is based on SN_ct_cert_scts, which comes in the same commit	2021-02-18 15:55:50 +01:00
Christopher Faulet	8dd40fbde9	BUG/MINOR: sample: Always consider zero size string samples as unsafe smp_is_safe() function is used to be sure a sample may be safely modified. For string samples, a test is performed to verify if there is a null-terminated byte. If not, one is added, if possible. It means if the sample is not const and if there is some free space in the buffer, after data. However, we must not try to read the null-terminated byte if the string sample is too long (data >= size) or if the size is equal to zero. This last test was not performed. Thus it was possible to consider a string sample as safe by testing a byte outside the buffer. Now, a zero size string sample is always considered as unsafe and is duplicated when smp_make_safe() is called. This patch must be backported in all stable versions.	2021-02-18 14:58:43 +01:00
Willy Tarreau	ca9f60c1ac	MINOR: tasks/debug: add some extra controls of use-after-free in DEBUG_TASK It's pretty easy to pre-initialize the index, change it on free() and check it during the wakeup, so let's do this to ease detection of any accidental task_wakeup() after a task_free() or tasklet_wakeup() after a tasklet_free(). If this would ever happen we'd then get a backtrace and a core now. The index's parity is respected so that the call history remains exploitable.	2021-02-18 14:38:49 +01:00
Willy Tarreau	b23f04260b	MINOR: tasks: add DEBUG_TASK to report caller info in a task The idea is to know who woke a task up, by recording the last two callers in a rotating mode. For now it's trivial with task_wakeup() but tasklet_wakeup_on() will require quite some more changes. This typically gives this from the debugger: (gdb) p t->debug $2 = { caller_file = {0x0, 0x8c0d80 "src/task.c"}, caller_line = {0, 260}, caller_idx = 1 } or this: (gdb) p t->debug $6 = { caller_file = {0x7fffe40329e0 "", 0x885feb "src/stream.c"}, caller_line = {284, 284}, caller_idx = 1 } But it also provides a trivial macro allowing to simply place a call in a task/tasklet handler that needs to be observed: DEBUG_TASK_PRINT_CALLER(t); Then starting haproxy this way would trivially yield such info: $ ./haproxy -db -f test.cfg \| sort \| uniq -c \| sort -nr 199992 h1_io_cb woken up from src/sock.c:797 51764 h1_io_cb woken up from src/mux_h1.c:3634 65 h1_io_cb woken up from src/connection.c:169 45 h1_io_cb woken up from src/sock.c:777	2021-02-18 10:42:07 +01:00
Willy Tarreau	59b0fecfd9	MINOR: lb/api: let callers of take_conn/drop_conn tell if they have the lock The two algos defining these functions (first and leastconn) do not need the server's lock. However it's already present in pendconn_process_next_strm() so the API must be updated so that the functions may take it if needed and that the callers indicate whether they already own it. As such, the call places (backend.c and stream.c) now do not take it anymore, queue.c was unchanged since it's already held, and both "first" and "leastconn" were updated to take it if not already held. A quick test on the "first" algo showed a jump from 432 to 565k rps by just dropping the lock in stream.c!	2021-02-18 10:06:45 +01:00
Willy Tarreau	b9ad30a8ad	Revert "MINOR: threads: change lock_t to an unsigned int" This reverts commit `8f1f177ed0`. Repeated tests have shown a small perforamnce degradation of ~1.8% caused by this patch at high request rates on 16 threads. The exact cause is not yet perfectly known but it probably stems in slower accesses for non-64-bit aligned atomic accesses.	2021-02-18 10:06:45 +01:00
Willy Tarreau	751153e0f1	OPTIM: server: switch the actconn list to an mt-list The remaining contention on the server lock solely comes from sess_change_server() which takes the lock to add and remove a stream from the server's actconn list. This is both expensive and pointless since we have mt-lists, and this list is only used by the CLI's "shutdown server sessions" command! Let's migrate to an mt-list and remove the need for this costly lock. By doing so, the request rate increased by ~1.8%.	2021-02-18 10:06:45 +01:00
Willy Tarreau	ccea3c54f4	DEBUG: thread: add 5 extra lock labels for statistics and debugging Since OTHER_LOCK is commonly used it's become much more difficult to profile lock contention by temporarily changing a lock label. Let's add DEBUG1..5 to serve only for debugging. These ones must not be used in committed code. We could decide to only define them when DEBUG_THREAD is set but that would complicate attempts at measuring performance with debugging turned off.	2021-02-18 10:06:45 +01:00
Willy Tarreau	4e9df2737d	BUG/MEDIUM: checks: don't needlessly take the server lock in health_adjust() The server lock was taken preventively for anything in health_adjust(), including the static config checks needed to detect that the lock was not needed, while the function is always called on the response path to update a server's status. This was responsible for huge contention causing a performance drop of about 17% on 16 threads. Let's move the lock only where it should be, i.e. inside the function around the critical sections only. By doing this, a 16-thread process jumped back from 575 to 675 krps. This should be backported to 2.3 as the situation degraded there, and maybe later to 2.2.	2021-02-18 10:06:45 +01:00
Amaury Denoyelle	36441f46c4	MINOR: connection: remove pointers for prehash in conn_hash_params Replace unneeded pointers for sni/proxy prehash by plain data type. The code is slightly cleaner.	2021-02-17 16:43:07 +01:00
Amaury Denoyelle	aba507334b	BUG/MAJOR: connection: prevent double free if conn selected for removal Always try to remove a connexion from its toremove_list in conn_free. This prevents a double-free in case the connection is freed but was already added in toremove_list. This bug was easily reproduced by running 4-5 runs of inject on a single-thread instance of haproxy : $ inject -u 10000 -d 10 -G 127.0.0.1:20080 A crash would soon be triggered in srv_cleanup_toremove_connections. This does not need to be backported.	2021-02-16 16:17:29 +01:00
William Dauchy	3679d0c794	MINOR: stats: add helper to get status string move listen status to a helper, defining both status enum and string definition. this will be helpful to be reused in prometheus code. It also removes this hard-to-read nested ternary. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-02-15 14:13:32 +01:00
William Dauchy	655e14ef17	MEDIUM: stats: allow to select one field in `stats_fill_li_stats` prometheus approach requires to output all values for a given metric name; meaning we iterate through all metrics, and then iterate in the inner loop on all objects for this metric. In order to allow more code reuse, adapt the stats API to be able to select one field or fill them all otherwise. From this patch it should be possible to add support for listen stats in prometheus. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-02-15 14:13:32 +01:00
Emeric Brun	fd647d5f5f	MEDIUM: dns: adds code to support pipelined DNS requests over TCP. This patch introduce the "dns_stream_nameserver" to use DNS over TCP on strict nameservers. For the upper layer it is analog to the api used with udp nameservers except that the user que switch the name server in "stream" mode at the init using "dns_stream_init". The fallback from UDP to TCP is not handled and this is not the purpose of this feature. This is done to choose the transport layer during the initialization. Currently there is a hardcoded limit of 4 pipelined transactions per TCP connections. A batch of idle connections is expired every 5s. This code is designed to support a maximum DNS message size on TCP: 64k. Note: this code won't perform retry on unanswered queries this should be handled by the upper layer	2021-02-13 10:03:46 +01:00
Emeric Brun	c943799c86	MEDIUM: resolvers/dns: split dns.c into dns.c and resolvers.c This patch splits current dns.c into two files: The first dns.c contains code related to DNS message exchange over UDP and in future other TCP. We try to remove depencies to resolving to make it usable by other stuff as DNS load balancing. The new resolvers.c inherit of the code specific to the actual resolvers. Note: It was really difficult to obtain a clean diff dur to the amount of moved code. Note2: Counters and stuff related to stats is not cleany separated because currently counters for both layers are merged and hard to separate for now.	2021-02-13 10:03:46 +01:00
Emeric Brun	d26a6237ad	MEDIUM: resolvers: split resolving and dns message exchange layers. This patch splits recv and send functions in two layers. the lowest is responsible of DNS message transactions over the network. Doing this we could use DNS message layer for something else than resolving. Load balancing for instance. This patch also re-works the way to init a nameserver and introduce the new struct dns_dgram_server to prepare the arrival of dns_stream_server and the support of DNS over TCP. The way to retry a send failure of a request because of EAGAIN was re-worked. Previously there was no control and all "pending" queries were re-played each time it reaches a EAGAIN. This patch introduce a ring to stack messages in case of sent failure. This patch is emptied if poller shows that the socket is ready again to push messages.	2021-02-13 09:51:10 +01:00
Emeric Brun	d3b4495f0d	MINOR: resolvers: rework dns stats prototype because specific to resolvers Counters are currently stored into lowlevel nameservers struct but most of them are resolving layer data and increased in the upper layer So this patch renames the prototype used to allocate/dump them with prefix 'resolv' waiting for a clean split.	2021-02-13 09:43:18 +01:00
Emeric Brun	6a2006ae37	MINOR: resolvers: replace nameserver's resolver ref by generic parent pointer This will allow to use nameservers in something else than a resolver section (load balancing for instance).	2021-02-13 09:43:18 +01:00
Emeric Brun	d30e9a1709	MINOR: resolvers: rework prototype suffixes to split resolving and dns. A lot of prototypes in dns.h are specific to resolvers and must be renamed to split resolving and DNS layers.	2021-02-13 09:43:18 +01:00
Emeric Brun	456de77bdb	MINOR: resolvers: renames resolvers DNS_UPD_* returncodes to RSLV_UPD_* This patch renames some #defines prefixes from DNS to RSLV.	2021-02-13 09:43:18 +01:00
Emeric Brun	30c766ebbc	MINOR: resolvers: renames resolvers DNS_RESP_* errcodes RSLV_RESP_* This patch renames some #defines prefixes from DNS to RSLV.	2021-02-13 09:43:18 +01:00
Emeric Brun	21fbeedf97	MINOR: resolvers: renames some dns prefixed types using resolv prefix. @@ -119,8 +119,8 @@ struct act_rule { - } dns; /* dns resolution / + } resolv; / resolving */ -struct dns_options { +struct resolv_options {	2021-02-13 09:43:18 +01:00
Emeric Brun	08622d3c0a	MINOR: resolvers: renames some resolvers specific types to not use dns prefix This patch applies those changes on names: -struct dns_resolution { +struct resolv_resolution { -struct dns_requester { +struct resolv_requester { -struct dns_srvrq { +struct resolv_srvrq { @@ -185,12 +185,12 @@ struct stream { struct { - struct dns_requester dns_requester; + struct resolv_requester requester; ... - } dns_ctx; + } resolv_ctx;	2021-02-13 09:43:18 +01:00
Emeric Brun	750fe79cd0	MINOR: resolvers: renames type dns_resolvers to resolvers. It also renames 'dns_resolvers' head list to sec_resolvers to avoid conflicts with local variables 'resolvers'.	2021-02-13 09:43:17 +01:00
Emeric Brun	85914e9d9b	MINOR: resolvers: renames some resolvers internal types and removes dns prefix Some types are specific to resolver code and a renamed using the 'resolv' prefix instead 'dns'. -struct dns_query_item { +struct resolv_query_item { -struct dns_answer_item { +struct resolv_answer_item { -struct dns_response_packet { +struct resolv_response {	2021-02-13 09:43:17 +01:00
Emeric Brun	67f830d29d	BUG/MINOR: resolvers: fix attribute packed struct for dns This patch adds the attribute packed on struct dns_question because it is directly memcpy to network building a response. This patch also removes the commented line: // struct list options; /* list of option records */ because it is also used directly using memcpy to build a request and must not contain host data.	2021-02-13 09:43:17 +01:00
Emeric Brun	50c870e4de	BUG/MINOR: dns: add missing sent counter and parent id to dns counters. Resolv callbacks are also updated to rely on counters and not on nameservers. "show stat domain dns" will now show the parent id (i.e. resolvers section name).	2021-02-13 09:43:17 +01:00
Emeric Brun	e14b98c08e	MINOR: ring: adds new ring_init function. Adds the new ring_init function to initialize a pre-allocated ring struct using the given memory area.	2021-02-13 09:43:17 +01:00
Willy Tarreau	49962b58d0	MINOR: peers/cli: do not dump the peers dictionaries by default on "show peers" The "show peers" output has become huge due to the dictionaries making it less readable. Now this feature has reached a certain level of maturity which doesn't warrant to dump it all the time, given that it was essentially needed by developers. Let's make it optional, and disabled by default, only when "show peers dict" is requested. The default output reminds about the command. The output has been divided by 5 : $ socat - /tmp/sock1 <<< "show peers dict" \| wc -l 125 $ socat - /tmp/sock1 <<< "show peers" \| wc -l 26 It could be useful to backport this to recent stable versions.	2021-02-12 17:00:52 +01:00
Willy Tarreau	e90904d5a9	MEDIUM: proxy: store the default proxies in a tree by name Now default proxies are stored into a dedicated tree, sorted by name. Only unnamed entries are not kept upon new section creation. The very first call to cfg_parse_listen() will automatically allocate a dummy defaults section which corresponds to the previous static one, since the code requires to have one at a few places. The first immediately visible benefit is that it allows to reuse alloc_new_proxy() to allocate a defaults section instead of doing it by hand. And the secret goal is to allow to keep multiple named defaults section in memory to reuse them from various proxies.	2021-02-12 16:23:46 +01:00
Willy Tarreau	0a0f6a7e4f	MINOR: proxy: support storing defaults sections into their own tree Now we'll have a tree of named defaults sections. The regular insertion and lookup functions take care of the capability in order to select the appropriate tree. A new function proxy_destroy_defaults() removes a proxy from this tree and frees it entirely.	2021-02-12 16:23:46 +01:00
Willy Tarreau	80dc6fea59	MINOR: proxy: add a new capability PR_CAP_DEF In order to more easily distinguish a default proxy from a standard one, let's introduce a new capability PR_CAP_DEF.	2021-02-12 16:23:46 +01:00
Willy Tarreau	7d0c143185	MINOR: cfgparse: move defproxy to cfgparse-listen as a static We don't want to expose this one anymore as we'll soon keep multiple default proxies. Let's move it inside the parser which is the only place which still uses it, and initialize it on the fly once needed instead of doing it at boot time.	2021-02-12 16:23:46 +01:00
Willy Tarreau	bb8669ae28	BUG/MINOR: server: parse_server() must take a const for the defproxy The default proxy was passed as a variable, which in addition to being a PITA to deal with in the config parser, doesn't feel safe to use when it ought to be const. This will only affect new code so no backport is needed.	2021-02-12 16:23:46 +01:00
Willy Tarreau	54fa7e332a	BUG/MINOR: tcpcheck: proxy_parse_check() must take a const for the defproxy The default proxy was passed as a variable, which in addition to being a PITA to deal with in the config parser, doesn't feel safe to use when it ought to be const. This will only affect new code so no backport is needed.	2021-02-12 16:23:46 +01:00
Willy Tarreau	220fd70694	BUG/MINOR: extcheck: proxy_parse_extcheck() must take a const for the defproxy The default proxy was passed as a variable, which in addition to being a PITA to deal with in the config parser, doesn't feel safe to use when it ought to be const. This will only affect new code so no backport is needed.	2021-02-12 16:23:46 +01:00
Willy Tarreau	a3320a0509	MINOR: proxy: move the defproxy freeing code to proxy.c This used to be open-coded in cfgparse-listen.c when facing a "defaults" keyword. Let's move this into proxy_free_defaults(). This code is ugly and doesn't even reset the just freed pointers. Let's not change this yet. This code should probably be merged with a generic proxy deinit function called from deinit(). However there's a catch on uri_auth which cannot be freed because it might be used by one or several proxies. We definitely need refcounts there!	2021-02-12 16:23:46 +01:00
Willy Tarreau	7683893c70	REORG: proxy: centralize the proxy allocation code into alloc_new_proxy() This new function takes over the old open-coding that used to be done for too long in cfg_parse_listen() and it now does everything at once in a proxy-centric function. The function does all the job of allocating the structure, initializing it, presetting its defaults from the default proxy and checking for errors. The code was almost unchanged except for defproxy being passed as a pointer, and the error message being passed using memprintf(). This change will be needed to ease reuse of multiple default proxies, or to create dynamic backends in a distant future.	2021-02-12 16:23:46 +01:00
Willy Tarreau	144289b459	REORG: move init_default_instance() to proxy.c and pass it the defproxy pointer init_default_instance() was still left in cfgparse.c which is not the best place to pre-initialize a proxy. Let's place it in proxy.c just after init_new_proxy(), take this opportunity for renaming it to proxy_preset_defaults() and taking out init_new_proxy() from it, and let's pass it the pointer to the default proxy to be initialized instead of implicitly assuming defproxy. We'll soon be able to exploit this. Only two call places had to be updated.	2021-02-12 16:23:46 +01:00
Willy Tarreau	168a414037	BUILD: proxy: add missing compression-t.h to proxy-t.h struct comp is used in struct proxy but never declared prior to this so depending on where proxy.h is included, touching the <comp> field can break the build.	2021-02-12 16:23:46 +01:00
Willy Tarreau	09f2e77eb1	BUG/MINOR: tcpheck: the source list must be a const in dup_tcpcheck_var() This is just an API bug but it's annoying when trying to tidy the code. The source list passed in argument must be a const and not a variable, as it's typically the list head from a default proxy and must obviously not be modified by the function. No backport is needed as it only impacts new code.	2021-02-12 16:23:46 +01:00
Willy Tarreau	016255a483	BUG/MINOR: http-htx: defpx must be a const in proxy_dup_default_conf_errors() This is just an API bug but it's annoying when trying to tidy the code. The default proxy passed in argument must be a const and not a variable. No backport is needed as it only impacts new code.	2021-02-12 16:23:46 +01:00
Willy Tarreau	5bbc676608	BUG/MINOR: stats: revert the change on ST_CONVDONE In 2.1, commit `ee4f5f83d` ("MINOR: stats: get rid of the ST_CONVDONE flag") introduced a subtle bug. By testing curproxy against defproxy in check_config_validity(), it tried to eliminate the need for a flag to indicate that stats authentication rules were already compiled, but by doing so it left the issue opened for the case where a new defaults section appears after the two proxies sharing the first one: defaults mode http stats auth foo:bar listen l1 bind :8080 listen l2 bind :8181 defaults # just to break above This config results in: [ALERT] 042/113725 (3121) : proxy 'f2': stats 'auth'/'realm' and 'http-request' can't be used at the same time. [ALERT] 042/113725 (3121) : Fatal errors found in configuration. Removing the last defaults remains OK. It turns out that the cleanups that followed that patch render it useless, so the best fix is to revert the change (with the up-to-date flags instead). The flag was marked as belonging to the config. It's not exact but it's the closest to the reality, as it's not there to configure the behavior but ti mention that the config parser did its job. This could be backported as far as 2.1, but in practice it looks like nobody ever hit it.	2021-02-12 16:23:45 +01:00
William Dauchy	d1a7b85a40	MEDIUM: server: support {check,agent}_addr, agent_port in server state logical followup from cli commands addition, so that the state server file stays compatible with the changes made at runtime; use previously added helper to load server attributes. also alloc a specific chunk to avoid mixing with other called functions using it Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-02-12 16:04:52 +01:00
William Dauchy	63e6cba12a	MEDIUM: server: add server-states version 2 Even if it is possibly too much work for the current usage, it makes sure we don't break states file from v2.3 to v2.4; indeed, since v2.3, we introduced two new fields, so we put them aside to guarantee we can easily reload from a version 1. The diff seems huge but there is no specific change apart from: - introduce v2 where it is needed (parsing, update) - move away from switch/case in update to be able to reuse code - move srv lock to the whole function to make it easier this patch confirm how painful it is to maintain this functionality. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-02-12 16:04:52 +01:00
Amaury Denoyelle	1921d20fff	MINOR: connection: use proxy protocol as parameter for srv conn hash Use the proxy protocol frame if proxy protocol is activated on the server line. Do not add anymore these connections in the private list. If some requests are made with the same proxy fields, they can reuse the idle connection. The reg-tests proxy_protocol_send_unique_id must be adapted has it relied on the side effect behavior that every requests from a same connection reused a private server connection. Now, a new connection is created as expected if the proxy protocol fields differ.	2021-02-12 12:54:04 +01:00
Amaury Denoyelle	d10a200f62	MINOR: connection: use src addr as parameter for srv conn hash The source address is used as an input to the the server connection hash. The address and port are used as separate hash inputs. Do not add anymore these connections in the private list. This parameter is set only if used in the transparent-proxy mode.	2021-02-12 12:54:04 +01:00
Amaury Denoyelle	01a287f1e5	MINOR: connection: use dst addr as parameter for srv conn hash The destination address is used as an input to the server connection hash. The address and port are used as separated hash inputs. Note that they are not used when statically specified on the server line. This is only useful for dynamic destination address. This is typically used when the server address is dynamically set via the set-dst action. The address and port are separated hash parameters. Most notably, it should fixed set-dst use case (cf github issue #947).	2021-02-12 12:53:56 +01:00
Amaury Denoyelle	9b626e3c19	MINOR: connection: use sni as parameter for srv conn hash The sni parameter is an input to the server connection hash. Do not add anymore connections with dynamic sni in the private list. Thus, it is now possible to reuse a server connection if they use the same sni.	2021-02-12 12:48:11 +01:00
Amaury Denoyelle	293dcc400e	MINOR: backend: compare conn hash for session conn reuse Compare the connection hash when reusing a connection from the session. This ensures that a private connection is reused only if it shares the same set of parameters.	2021-02-12 12:33:05 +01:00
Amaury Denoyelle	1a58aca84e	MINOR: connection: use the srv pointer for the srv conn hash The pointer of the target server is used as a first parameter for the server connection hash calcul. This prevents the hash to be null when no specific parameters are present, and can serve as a simple defense against an attacker trying to reuse a non-conform connection.	2021-02-12 12:33:05 +01:00
Amaury Denoyelle	81c6f76d3e	MINOR: connection: prepare hash calcul for server conns This is a preliminary work for the calcul of the backend connection hash. A structure conn_hash_params is the input for the operation, containing the various specific parameters of a connection. The high bits of the hash will reflect the parameters present as input. A set of macros is written to manipulate the connection hash and extract the parameters/payload.	2021-02-12 12:33:05 +01:00
Amaury Denoyelle	f232cb3e9b	MEDIUM: connection: replace idle conn lists by eb trees The server idle/safe/available connection lists are replaced with ebmb- trees. This is used to store backend connections, with the new field connection hash as the key. The hash is a 8-bytes size field, used to reflect specific connection parameters. This is a preliminary work to be able to reuse connection with SNI, explicit src/dst address or PROXY protocol.	2021-02-12 12:33:05 +01:00
Amaury Denoyelle	5c7086f6b0	MEDIUM: connection: protect idle conn lists with locks This is a preparation work for connection reuse with sni/proxy protocol/specific src-dst addresses. Protect every access to idle conn lists with a lock. This is currently strictly not needed because the access to the list are made with atomic operations. However, to be able to reuse connection with specific parameters, the list storage will be converted to eb-trees. As this structure does not have atomic operation, it is mandatory to protect it with a lock. For this, the takeover lock is reused. Its role was to protect during connection takeover. As it is now extended to general idle conns usage, it is renamed to idle_conns_lock. A new lock section is also instantiated named IDLE_CONNS_LOCK to isolate its impact on performance.	2021-02-12 12:33:04 +01:00
William Dauchy	38cd986c54	BUG/MINOR: server: re-align state file fields number Since commit `3169471964` ("MINOR: Add server port field to server state file.") max_fields was not increased on version number 1. So this patch aims to fix it. This should be backported as far as v1.8, but the numbering should be adpated depending on the version: simply increase the field by 1. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-02-10 16:25:42 +01:00
William Lallemand	7b41654495	MINOR: ssl: add SSL_SERVER_LOCK label in threads.h Amaury reported that the commit `3ce6eed` ("MEDIUM: ssl: add a rwlock for SSL server session cache") introduced some warning during compilation: include/haproxy/thread.h\|411 col 2\| warning: enumeration value 'SSL_SERVER_LOCK' not handled in switch [-Wswitch] This patch fix the issue by adding the right entry in the switch block. Must be backported where `3ce6eed` is backported. (2.4 only for now)	2021-02-10 16:17:19 +01:00
Willy Tarreau	826f3ab5e6	MINOR: stick-tables/counters: add http_fail_cnt and http_fail_rate data types Historically we've been counting lots of client-triggered events in stick tables to help detect misbehaving ones, but we've been missing the same on the server side, and there's been repeated requests for being able to count the server errors per URL in order to precisely monitor the quality of service or even to avoid routing requests to certain dead services, which is also called "circuit breaking" nowadays. This commit introduces http_fail_cnt and http_fail_rate, which work like http_err_cnt and http_err_rate in that they respectively count events and their frequency, but they only consider server-side issues such as network errors, unparsable and truncated responses, and 5xx status codes other than 501 and 505 (since these ones are usually triggered by the client). Note that retryable errors are purposely not accounted for, so that only what the client really sees is considered. With this it becomes very simple to put some protective measures in place to perform a redirect or return an excuse page when the error rate goes beyond a certain threshold for a given URL, and give more chances to the server to recover from this condition. Typically it could look like this to bypass a URL causing more than 10 requests per second: stick-table type string len 80 size 4k expire 1m store http_fail_rate(1m) http-request track-sc0 base # track host+path, ignore query string http-request return status 503 content-type text/html \ lf-file excuse.html if { sc0_http_fail_rate gt 10 } A more advanced mechanism using gpt0 could even implement high/low rates to disable/enable the service. Reg-test converteers_ref_cnt_never_dec.vtc was updated to test it.	2021-02-10 12:27:01 +01:00
Willy Tarreau	e66ee1a651	BUG/MINOR: intops: fix mul32hi()'s off-by-one mul32hi() multiples a constant a with a variable b from 0 to 0xffffffff and shifts the result by 32 bits. It's visible that it's always impossible to reach the constant a this way because the product always misses exactly one unit of a to be preserved. And this cannot be corrected by the caller either as adding one to the output will only shift the output range, and it's not possible to pass 2^32 on the ratio <b>. The right approach is to add "a" after the multiplication so that the input range is always preserved for all ratio values from 0 to 0xffffffff: (a=0x00000000 * b=0x00000000 + a=0x00000000) >> 32 = 0x00000000 (a=0x00000000 * b=0x00000001 + a=0x00000000) >> 32 = 0x00000000 (a=0x00000000 * b=0xffffffff + a=0x00000000) >> 32 = 0x00000000 (a=0x00000001 * b=0x00000000 + a=0x00000001) >> 32 = 0x00000000 (a=0x00000001 * b=0x00000001 + a=0x00000001) >> 32 = 0x00000000 (a=0x00000001 * b=0xffffffff + a=0x00000001) >> 32 = 0x00000001 (a=0xffffffff * b=0x00000000 + a=0xffffffff) >> 32 = 0x00000000 (a=0xffffffff * b=0x00000001 + a=0xffffffff) >> 32 = 0x00000001 (a=0xffffffff * b=0xffffffff + a=0xffffffff) >> 32 = 0xffffffff This is only used in freq_ctr calculations and the slightly lower value is unlikely to have ever been noticed by anyone. This may be backported though it is not important.	2021-02-09 17:52:50 +01:00
William Lallemand	3ce6eedb37	MEDIUM: ssl: add a rwlock for SSL server session cache When adding the server side support for certificate update over the CLI we encountered a design problem with the SSL session cache which was not locked. Indeed, once a certificate is updated we need to flush the cache, but we also need to ensure that the cache is not used during the update. To prevent the use of the cache during an update, this patch introduce a rwlock for the SSL server session cache. In the SSL session part this patch only lock in read, even if it writes. The reason behind this, is that in the session part, there is one cache storage per thread so it is not a problem to write in the cache from several threads. The problem is only when trying to write in the cache from the CLI (which could be on any thread) when a session is trying to access the cache. So there is a write lock in the CLI part to prevent simultaneous access by a session and the CLI. This patch also remove the thread_isolate attempt which is eating too much CPU time and was not protecting from the use of a free ptr in the session.	2021-02-09 09:43:44 +01:00
Ilya Shipitsin	acf84595a7	CLEANUP: assorted typo fixes in the code and comments This is 17th iteration of typo fixes	2021-02-08 10:49:08 +01:00
Ilya Shipitsin	7bbf5866e0	BUILD: ssl: fix typo in HAVE_SSL_CTX_ADD_SERVER_CUSTOM_EXT macro HAVE_SSL_CTX_ADD_SERVER_CUSTOM_EXT was introduced in `ec60909871` however it was defined as HAVE_SL_CTX_ADD_SERVER_CUSTOM_EXT (missing "S") let us fix typo	2021-02-08 00:11:41 +01:00
Willy Tarreau	4acb99f867	BUG/MINOR: xxhash: make sure armv6 uses memcpy() There was a special case made to allow ARMv6 to use unaligned accesses via a cast in xxHash when __ARM_FEATURE_UNALIGNED is defined. But while ARMv6 (and v7) does support unaligned accesses, it's only for 32-bit pointers, not 64-bit ones, leading to bus errors when the compiler emits an ldrd instruction and the input (e.g. a pattern) is not aligned, as in issue #1035. Note that v7 was properly using the packed approach here and was safe, however haproxy versions 2.3 and older use the old r39 xxhash code which has the same issue for armv7. A slightly different fix is required there, by using a different definition of packed for 32 and 64 bits. The problem is really visible when running v7 code on a v8 kernel because such kernels do not implement alignment trap emulation, and the process dies when this happens. This is why in the issue above it was only detected under lxc. The emulation could have been disabled on v7 as well by writing zero to /proc/cpu/alignment though. This commit is a backport of xxhash commit a470f2ef ("update default memory access for armv6"). Thanks to @srkunze for the report and tests, @stgraber for his help on setting up an easy reproducer outside of lxc, and @Cyan4973 for the discussion around the best way to fix this. Details and alternate patches available on https://github.com/Cyan4973/xxHash/issues/490.	2021-02-04 17:14:58 +01:00
William Dauchy	4858fb2e18	MEDIUM: check: align agentaddr and agentport behaviour in the same manner of agentaddr, we now: - permit to set agentport through `port` keyword, like it is the case for agentaddr through `addr` - set the priority on `agent-port` keyword when used - add a flag to be able to test when the value is set like for agentaddr it makes the behaviour between `addr` and `port` more consistent. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-02-04 14:00:38 +01:00
William Dauchy	1c921cd748	BUG/MINOR: check: consitent way to set agentaddr small consistency problem with `addr` and `agent-addr` options: for the both options, the last one parsed is always used to set the agent-check addr. Thus these two lines don't have the same behavior: server ... addr <addr1> agent-addr <addr2> server ... agent-addr <addr2> addr <addr1> After this patch `agent-addr` will always be the priority option over `addr`. It means we test the flag before setting agentaddr. We also fix all the places where we did not set the flag to be coherent everywhere. I was not really able to determine where this issue is coming from. So it is probable we may backport it to all stable version where the agent is supported. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-02-04 13:55:04 +01:00
William Dauchy	fe03e7d045	MEDIUM: server: adding support for check_port in server state We can currently change the check-port using the cli command `set server check-port` but there is a consistency issue when using server state. This patch aims to fix this problem but will be also a good preparation work to get rid of checkport flag, so we are able to know when checkport was set by config. I am fully aware this is not making github #953 moving forward, I however think this might be acceptable while waiting for a proper solution and resolve consistency problem faced with port settings. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-02-04 10:46:52 +01:00
William Dauchy	69f118d7b6	MEDIUM: check: remove checkport checkaddr flag While trying to fix some consistency problem with the config file/cli (e.g. check-port cli command does not set the flag), we realised checkport flag was not necessarily needed. Indeed tcpcheck uses service port as the last choice if check.port is zero. So we can assume if check.port is zero, it means it was never set by the user, regardless if it is by the cli or config file. In the longterm this will avoid to introduce a new consistency issue if we forget to set the flag. in the same manner of checkport flag, we don't really need checkaddr flag. We can assume if checkaddr is not set, it means it was never set by the user or config. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-02-04 10:43:00 +01:00
William Dauchy	eedb9b13f4	MINOR: stats: improve pending connections description In order to unify prometheus and stats description, we need to clarify the description for pending connections. - remove the BE reference in counters struct, as it is also used in servers - remove reference of `qcur` field in description as it is specific to stats implemention - try to reword cur and max pending connections description Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-02-01 15:16:33 +01:00
Christopher Faulet	7aa3271439	MINOR: checks: Add function to get the result code corresponding to a status The function get_check_status_result() can now be used to get the result code (CHK_RES_) corresponding to a check status (HCHK_STATUS_). It will be used by the Prometheus exporter when reporting the check status of a server.	2021-02-01 15:16:33 +01:00
Willy Tarreau	d597ec2718	MINOR: listener: export manage_global_listener_queue() This one pops up in tasks lists when running against a saturated listener.	2021-01-29 14:29:57 +01:00
Willy Tarreau	02922e19ca	MINOR: session: export session_expire_embryonic() This is only to make it resolve nicely in "show tasks".	2021-01-29 12:27:57 +01:00
Willy Tarreau	fb5401f296	MINOR: listener: export accept_queue_process This is only to make it resolve in "show tasks".	2021-01-29 12:25:23 +01:00
Willy Tarreau	3fb6a7b46e	MINOR: activity: declare a new structure to collect per-function activity The new sched_activity structure will be used to collect task-level activity based on the target function. The principle is to declare a large enough array to make collisions rare (256 entries), and hash the function pointer using a reduced XXH to decide where to store the stats. On first computation an entry is definitely assigned to the array and it's done atomically. A special entry (0) is used to store collisions ("others"). The goal is to make it easy and inexpensive for the scheduler code to use these to store #calls, cpu_time and lat_time for each task.	2021-01-29 12:10:33 +01:00
Willy Tarreau	aa622b822b	MINOR: activity: make profiling more manageable In 2.0, commit `d2d3348ac` ("MINOR: activity: enable automatic profiling turn on/off") introduced an automatic mode to enable/disable profiling. The problem is that the automatic mode automatically changes to on/off, which implied that the forced on/off modes aren't sticky anymore. It's annoying when debugging because as soon as the load decreases, profiling stops. This makes a small change which ought to have been done first, which consists in having two states for "auto" (auto-on, auto-off) to distinguish them from the forced states. Setting to "auto" in the config defaults to "auto-off" as before, and setting it on the CLI switches to auto but keeps the current operating state. This is simple enough to be backported to older releases if needed.	2021-01-29 12:10:33 +01:00
Willy Tarreau	4deeb1055f	MINOR: tools: add print_time_short() to print a condensed duration value When reporting some values in debugging output we often need to have some condensed, stable-length values. This function prints a duration from nanosecond to years with at least 4 digits of accuracy using the most suitable unit, always on 7 chars.	2021-01-29 12:10:33 +01:00
Christopher Faulet	405f054652	MINOR: h1: Raise the chunk size limit up to (2^52 - 1) The allowed chunk size was historically limited to 2GB to avoid risk of overflow. This restriction is no longer necessary because the chunk size is immediately stored into a 64bits integer after the parsing. Thus, it is now possible to raise this limit. However to never fed possibly bogus values from languages that use floats for their integers, we don't get more than 13 hexa-digit (2^52 - 1). 4 petabytes is probably enough ! This patch should fix the issue #1065. It may be backported as far as 2.1. For the 2.0, the legacy HTTP part must be reviewed. But there is honestely no reason to do so.	2021-01-28 16:37:14 +01:00
Amaury Denoyelle	f9dcbeeab3	MEDIUM: h2: send connect protocol h2 settings In order to announce support for the Extended CONNECT h2 method by haproxy, always send the ENABLE_CONNECT_PROTOCOL h2 settings. This new setting has been described in the rfc 8441. After receiving ENABLE_CONNECT_PROTOCOL, the client is free to use the Extended CONNECT h2 method. This can notably be useful for the support of websocket handshake on http/2.	2021-01-28 16:37:14 +01:00
Amaury Denoyelle	c9a0afcc32	MEDIUM: h2: parse Extended CONNECT request to htx Support for the rfc 8441 Bootstraping WebSockets with HTTP/2 Convert an Extended CONNECT HTTP/2 request into a htx representation. The htx message uses the GET method with an Upgrade header field to be fully compatible with the equivalent HTTP/1.1 Upgrade mechanism. The Extended CONNECT is of the following form : :method = CONNECT :protocol = websocket :scheme = https :path = /chat :authority = server.example.com The new pseudo-header :protocol has been defined and is used to identify an Extended CONNECT method. Contrary to standard CONNECT, Extended CONNECT must have :scheme, :path and :authority defined.	2021-01-28 16:37:14 +01:00
Amaury Denoyelle	aad333a9fc	MEDIUM: h1: add a WebSocket key on handshake if needed Add the header Sec-Websocket-Key when generating a h1 handshake websocket without this header. This is the case when doing h2-h1 conversion. The key is randomly generated and base64 encoded. It is stored on the session side to be able to verify response key and reject it if not valid.	2021-01-28 16:37:14 +01:00
Amaury Denoyelle	7416274914	MEDIUM: h2: parse Extended CONNECT reponse to htx Support for the rfc 8441 Bootstraping WebSockets with HTTP/2 Convert a 200 status reply from an Extended CONNECT request into a htx representation. The htx message is set to 101 status code to be fully compatible with the equivalent HTTP/1.1 Upgrade mechanism. This conversion is only done if the stream flags H2_SF_EXT_CONNECT_SENT has been set. This is true if an Extended CONNECT request has already been seen on the stream. Besides the 101 status, the additional headers Connection/Upgrade are added to the htx message. The protocol is set from the value stored in h2s. Typically it will be extracted from the client request. This is only used if the client is using h1 as only the HTTP/1.1 101 Response contains the Upgrade header.	2021-01-28 16:37:14 +01:00
Amaury Denoyelle	c193823343	MEDIUM: h1: generate WebSocket key on response if needed Add the Sec-Websocket-Accept header on a websocket handshake response. This header may be missing if a h2 server is used with a h1 client. The response key is calculated following the rfc6455. For this, the handshake request key must be stored in the h1 session, as a new field name ws_key. Note that this is only done if the message has been prealably identified as a Websocket handshake request.	2021-01-28 16:37:14 +01:00
Amaury Denoyelle	18ee5c3eb0	MINOR: h1: reject websocket handshake if missing key If a request is identified as a WebSocket handshake, it must contains a websocket key header or else it can be reject, following the rfc6455. A new flag H1_MF_UPG_WEBSOCKET is set on such messages. For the request te be identified as a WebSocket handshake, it must contains the headers: Connection: upgrade Upgrade: websocket This commit is a compagnon of "MEDIUM: h1: generate WebSocket key on response if needed" and "MEDIUM: h1: add a WebSocket key on handshake if needed". Indeed, it ensures that a WebSocket key is added only from a http/2 side and not for a http/1 bogus peer.	2021-01-28 16:37:14 +01:00
Christopher Faulet	7d247f0771	MINOR: h2/mux-h2: Add flags to notify the response is known to have no body The H2 message flag H2_MSGF_BODYLESS_RSP is now used during the request or the response parsing to notify the mux that, considering the parsed message, the response is known to have no body. This happens during HEAD requests parsing and during 204/304 responses parsing. On the H2 multiplexer, the equivalent flag is set on H2 streams. Thus the H2_SF_BODYLESS_RESP flag is set on a H2 stream if the H2_MSGF_BODYLESS_RSP is found after a HEADERS frame parsing. Conversely, this flag is also set when a HEADERS frame is emitted for HEAD requests and for 204/304 responses. The H2_SF_BODYLESS_RESP flag will be used to ignore data payload from the response but not the trailers.	2021-01-28 16:37:14 +01:00
Christopher Faulet	d1ac2b90cd	MAJOR: htx: Remove the EOM block type and use HTX_FL_EOM instead The EOM block may be removed. The HTX_FL_EOM flags is enough. Most of time, to know if the end of the message is reached, we just need to have an empty HTX message with HTX_FL_EOM flag set. It may also be detected when the last block of a message with HTX_FL_EOM flag is manipulated. Removing EOM blocks simplifies the HTX message filling. Indeed, there is no more edge problems when the message ends but there is no more space to write the EOM block. However, some part are more tricky. Especially the compression filter or the FCGI mux. The compression filter must finish the compression on the last DATA block. Before it was performed on the EOM block, an extra DATA block with the checksum was added. Now, we must detect the last DATA block to be sure to finish the compression. The FCGI mux on its part must be sure to reserve the space for the empty STDIN record on the last DATA block while this record was inserted on the EOM block. The H2 multiplexer is probably the part that benefits the most from this change. Indeed, it is now fairly easier to known when to set the ES flag. The HTX documentaion has been updated accordingly.	2021-01-28 16:37:14 +01:00
Christopher Faulet	789a472674	MINOR: htx: Add a function to know if a block is the only one in a message The htx_is_unique_blk() function may now be used to know if a block is the only one in an HTX message, excluding all unused blocks. Note the purpose of this function is not to know if a block is the last one of an HTTP message. This means no more data part from the message are expected, except tunneled data. It only says if a block is alone in an HTX message.	2021-01-28 16:37:14 +01:00
Christopher Faulet	42432f347f	MINOR: htx: Rename HTX_FL_EOI flag into HTX_FL_EOM The HTX_FL_EOI flag is not well named. For now, it is not very used. But that will change. It will replace the EOM block. Thus, it is renamed.	2021-01-28 16:37:14 +01:00
Christopher Faulet	576c358508	MINOR: htx/http-ana: Save info about Upgrade option in the Connection header Add an HTX start-line flag and its counterpart into the HTTP message to track the presence of the Upgrade option into the Connection header. This way, without parsing the Connection header again, it will be easy to know if a client asks for a protocol upgrade and if the server agrees to do so. It will also be easy to perform some conformance checks when a 101-switching-protocols is received.	2021-01-28 16:27:48 +01:00
Christopher Faulet	4ef84c9c41	MINOR: stream: Add a function to validate TCP to H1 upgrades TCP to H1 upgrades are buggy for now. When such upgrade is performed, a crash is experienced. The bug is the result of the recent H1 mux refactoring, and more specifically because of the commit `c4bfa59f1` ("MAJOR: mux-h1: Create the client stream as later as possible"). Indeed, now the H1 mux is responsible to create the frontend conn-stream once the request headers are fully received. Thus the TCP to H1 upgrade is a problem because the frontend conn-stream already exists. To fix the bug, we must keep this conn-stream and the associate stream and use it in the H1 mux. To do so, the upgrade will be performed in two steps. First, the mux is upgraded from mux-pt to mux-h1. Then, the mux-h1 performs the stream upgrade, once the request headers are fully received and parsed. To do so, stream_upgrade_from_cs() must be used. This function set the SF_HTX flags to switch the stream to HTX mode, it removes the SF_IGNORE flags and eventually it fills the request channel with some input data. This patch is required to fix the TCP to H1 upgrades and is intimately linked with the next commits.	2021-01-28 16:27:48 +01:00
Amaury Denoyelle	3f07c20fab	BUG/MEDIUM: session: only retrieve ready idle conn from session A bug was introduced by the early insertion of idle connections at the end of connect_server. It is possible to reuse a connection not yet ready waiting for an handshake (for example with proxy protocol or ssl). A wrong duplicate xprt_handshake_io_cb tasklet is thus registered as a side-effect. This triggers the BUG_ON statement of xprt_handshake_subscribe : BUG_ON(ctx->subs && ctx->subs != es); To counter this, a check is now present in session_get_conn to only return a connection without the flag CO_FL_WAIT_XPRT. This might cause sometimes the creation of dedicated server connections when in theory reuse could have been used, but probably only occurs rarely in real condition. This behavior is present since commit : MEDIUM: connection: Add private connections synchronously in session server list It could also be further exagerated by : MEDIUM: backend: add reused conn to sess if mux marked as HOL blocking It can be backported up to 2.3. NOTE : This bug seems to be only reproducible with mode tcp, for an unknown reason. However, reuse should never happen when not in http mode. This improper behavior will be the subject of a dedicated patch. This bug can easily be reproducible with the following config (a webserver is required to accept proxy protocol on port 31080) : global defaults mode tcp timeout connect 1s timeout server 1s timeout client 1s listen li bind 0.0.0.0:4444 server bla1 127.0.0.1:31080 check send-proxy-v2 with the inject client : $ inject -u 10000 -d 10 -G 127.0.0.1:4444 This should fix the github issue #1058.	2021-01-28 14:16:27 +01:00
Tim Duesterhus	491be54cf1	BUILD: Include stdlib.h in compiler.h if DEBUG_USE_ABORT is set Building with `"DEBUG=-DDEBUG_STRICT=1 -DDEBUG_USE_ABORT=1"` previously emitted the warning: In file included from include/haproxy/api.h:35:0, from src/mux_pt.c:13: include/haproxy/buf.h: In function ‘br_init’: include/haproxy/bug.h:42:90: warning: implicit declaration of function ‘abort’ [-Wimplicit-function-declaration] #define ABORT_NOW() do { extern void ha_backtrace_to_stderr(); ha_backtrace_to_stderr(); abort(); } while (0) ^ include/haproxy/bug.h:56:21: note: in expansion of macro ‘ABORT_NOW’ #define CRASH_NOW() ABORT_NOW() ^ include/haproxy/bug.h:68:4: note: in expansion of macro ‘CRASH_NOW’ CRASH_NOW(); \ ^ include/haproxy/bug.h:62:35: note: in expansion of macro ‘__BUG_ON’ #define _BUG_ON(cond, file, line) __BUG_ON(cond, file, line) ^ include/haproxy/bug.h:61:22: note: in expansion of macro ‘_BUG_ON’ #define BUG_ON(cond) _BUG_ON(cond, __FILE__, __LINE__) ^ include/haproxy/buf.h:875:2: note: in expansion of macro ‘BUG_ON’ BUG_ON(size < 2); ^ This patch fixes that issue. The `DEBUG_USE_ABORT` option exists for use with static analysis tools. No backport needed.	2021-01-27 12:44:39 +01:00
William Lallemand	795bd9ba3a	CLEANUP: ssl: remove SSL_CTX function parameter Since the server SSL_CTX is now stored in the ckch_inst, it is not needed anymore to pass an SSL_CTX to ckch_inst_new_load_srv_store() and ssl_sock_load_srv_ckchs().	2021-01-26 15:19:36 +01:00
Remi Tricot-Le Breton	bb470aa327	MINOR: ssl: Remove client_crt member of the server's ssl context The client_crt member is not used anymore since the server's ssl context initialization now behaves the same way as the bind lines one (using ckch stores and instances).	2021-01-26 15:19:36 +01:00
Remi Tricot-Le Breton	f3eedfe195	MEDIUM: ssl: Enable backend certificate hot update When trying to update a backend certificate, we should find a server-side ckch instance thanks to which we can rebuild a new ssl context and a new ckch instance that replace the previous ones in the server structure. This way any new ssl session will be built out of the new ssl context and the newly updated certificate. This resolves a subpart of GitHub issue #427 (the certificate part)	2021-01-26 15:19:36 +01:00
Remi Tricot-Le Breton	d817dc733e	MEDIUM: ssl: Load client certificates in a ckch for backend servers In order for the backend server's certificate to be hot-updatable, it needs to fit into the implementation used for the "bind" certificates. This patch follows the architecture implemented for the frontend implementation and reuses its structures and general function calls (adapted for the server side). The ckch store logic is kept and a dedicated ckch instance is used (one per server). The whole sni_ctx logic was not kept though because it is not needed. All the new functions added in this patch are basically server-side copies of functions that already exist on the frontend side with all the sni and bind_cond references removed. The ckch_inst structure has a new 'is_server_instance' flag which is used to distinguish regular instances from the server-side ones, and a new pointer to the server's structure in case of backend instance. Since the new server ckch instances are linked to a standard ckch_store, a lookup in the ckch store table will succeed so the cli code used to update bind certificates needs to be covered to manage those new server side ckch instances.	2021-01-26 15:19:36 +01:00
Remi Tricot-Le Breton	442b7f2238	MINOR: ssl: Server ssl context prepare function refactoring Split the server's ssl context initialization into the general ssl related initializations and the actual initialization of a single SSL_CTX structure. This way the context's initialization will be usable by itself from elsewhere.	2021-01-26 15:19:36 +01:00
Christopher Faulet	6071c2d12d	BUG/MEDIUM: filters/htx: Fix data forwarding when payload length is unknown It is only a problem on the response path because the request payload length it always known. But when a filter is registered to analyze the response payload, the filtering may hang if the server closes just after the headers. The root cause of the bug comes from an attempt to allow the filters to not immediately forward the headers if necessary. A filter may choose to hold the headers by not forwarding any bytes of the payload. For a message with no payload but a known payload length, there is always a EOM block to forward. Thus holding the EOM block for bodyless messages is a good way to also hold the headers. However, messages with an unknown payload length, there is no EOM block finishing the message, but only a SHUTR flag on the channel to mark the end of the stream. If there is no payload when it happens, there is no payload at all to forward. In the filters API, it is wrongly detected as a condition to not forward the headers. Because it is not the most used feature and not the obvious one, this patch introduces another way to hold the message headers at the begining of the forwarding. A filter flag is added to explicitly says the headers should be hold. A filter may choose to set the STRM_FLT_FL_HOLD_HTTP_HDRS flag and not forwad anything to hold the headers. This flag is removed at each call, thus it must always be explicitly set by filters. This flag is only evaluated if no byte has ever been forwarded because the headers are forwarded with the first byte of the payload. reg-tests/filters/random-forwarding.vtc reg-test is updated to also test responses with unknown payload length (with and without payload). This patch must be backported as far as 2.0.	2021-01-26 09:53:52 +01:00
Tim Duesterhus	3d7f9ff377	MINOR: abort() on my_unreachable() when DEBUG_USE_ABORT is set. Hopefully this helps static analysis tools detecting that the code after that call is unreachable. See GitHub Issue #1075.	2021-01-26 09:33:18 +01:00
William Dauchy	d3a9a4992b	MEDIUM: stats: allow to select one field in `stats_fill_sv_stats` prometheus approach requires to output all values for a given metric name; meaning we iterate through all metrics, and then iterate in the inner loop on all objects for this metric. In order to allow more code reuse, adapt the stats API to be able to select one field or fill them all otherwise. This patch follows what has already been done on frontend and backend side. From this patch it should be possible to remove most of the duplicate code on prometheuse side for the server. A few things to note though: - state require prior calculation, so I moved that to a sort of helper `stats_fill_be_stats_computestate`. - all ST_F*TIME fields requires some minor compute, so I moved it at te beginning of the function under a condition. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-01-26 09:24:51 +01:00
William Dauchy	da3b466fc2	MEDIUM: stats: allow to select one field in `stats_fill_be_stats` prometheus approach requires to output all values for a given metric name; meaning we iterate through all metrics, and then iterate in the inner loop on all objects for this metric. In order to allow more code reuse, adapt the stats API to be able to select one field or fill them all otherwise. This patch follows what has already been done on frontend side. From this patch it should be possible to remove most of the duplicate code on prometheuse side for the backend A few things to note though: - status and uweight field requires prior compute, so I moved that to a sort of helper `stats_fill_be_stats_computesrv`. - all ST_F*TIME fields requires some minor compute, so I moved it at te beginning of the function under a condition. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-01-26 09:24:19 +01:00
William Dauchy	2107a0faf5	CLEANUP: stats: improve field selection for frontend http fields while working on backend/servers I realised I could have written that in a better way and avoid one extra break. This is slightly improving readiness. also while being here, fix function declaration which was not 100% accurate. this patch does not change the behaviour of the code. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-01-25 15:53:28 +01:00
Ilya Shipitsin	1fc44d494a	BUILD: ssl: guard Client Hello callbacks with HAVE_SSL_CLIENT_HELLO_CB macro instead of openssl version let us introduce new macro HAVE_SSL_CLIENT_HELLO_CB and guard callback functions with it	2021-01-22 20:45:24 +01:00
Willy Tarreau	2bfce7e424	MINOR: debug: let ha_dump_backtrace() dump a bit further for some callers The dump state is now passed to the function so that the caller can adjust the behavior. A new series of 4 values allow to stop after dumping main instead of before it or any of the usual loops. This allows to also report BUG_ON() that could happen very high in the call graph (e.g. startup, or the scheduler itself) while still understanding what the call path was.	2021-01-22 14:48:34 +01:00
Willy Tarreau	5baf4fe31a	MEDIUM: debug: now always print a backtrace on CRASH_NOW() and friends The purpose is to enable the dumping of a backtrace on BUG_ON(). While it's very useful to know that a condition was met, very often some caller context is missing to figure how the condition could happen. From now on, on systems featuring backtrace, a backtrace of the calling thread will also be dumped to stderr in addition to the unexpected condition. This will help users of DEBUG_STRICT as they'll most often find this backtrace in their logs even if they can't find their core file. A new "debug dev bug" expert-mode CLI command was added to test the feature.	2021-01-22 14:18:34 +01:00
Willy Tarreau	a8459b28c3	MINOR: debug: create ha_backtrace_to_stderr() to dump an instant backtrace This function calls the ha_dump_backtrace() function with a locally allocated buffer and sends the output slightly indented to fd #2. It's meant to be used as an emergency backtrace dump.	2021-01-22 14:15:36 +01:00
Willy Tarreau	123fc9786a	MINOR: debug: extract the backtrace dumping code to its own function The backtrace dumping code was located into the thread dump function but it looks particularly convenient to be able to call it to produce a dump in other situations, so let's move it to its own function and make sure it's called last in the function so that we can benefit from tail merging to save one entry.	2021-01-22 13:52:41 +01:00
Willy Tarreau	2f1227eb3f	MINOR: debug: always export the my_backtrace function In order to simplify the code and remove annoying ifdefs everywhere, let's always export my_backtrace() and make it adapt to the situation and return zero if not supported. A small update in the thread dump function was needed to make sure we don't use its results if it fails now.	2021-01-22 12:12:29 +01:00
William Dauchy	b9577450ea	MINOR: contrib/prometheus-exporter: use fill_fe_stats for frontend dump use `stats_fill_fe_stats` when possible to avoid duplicating code; make use of field selector to get the needed field only. this should not introduce any difference of output. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-01-21 18:59:30 +01:00
William Dauchy	0ef54397b0	MEDIUM: stats: allow to select one field in `stats_fill_fe_stats` prometheus approach requires to output all values for a given metric name; meaning we iterate through all metrics, and then iterate in the inner loop on all objects for this metric. In order to allow more code reuse, adapt the stats API to be able to select one field or fill them all otherwise. From this patch it should be possible to remove most of the duplicate code on prometheuse side for the frontend. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-01-21 18:59:30 +01:00
William Dauchy	defd15685e	MINOR: stats: add new start time field Another patch in order to try to reconciliate haproxy stats and prometheus. Here I'm adding a proper start time field in order to make proper use of uptime field. That being done we can move the calculation in `fill_info` Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-01-21 18:59:30 +01:00
William Dauchy	a8766cfad1	MINOR: stats: duplicate 3 fields in bytes in info in order to prepare a possible merge of fields between haproxy stats and prometheus, duplicate 3 fields: INF_MEMMAX INF_POOL_ALLOC INF_POOL_USED Those were specifically named in MB unit which is not what prometheus recommends. We therefore used them but changed the unit while doing the calculation. It created a specific case for that, up to the description. This patch: - removes some possible confusion, i.e. using MB field for bytes - will permit an easier merge of fields such as description First consequence for now, is that we can remove the calculation on prometheus side and move it on `fill_info`. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-01-21 18:59:30 +01:00
Christopher Faulet	142dd33912	MINOR: muxes: Add exit status for errors about not implemented features The MUX_ES_NOTIMPL_ERR exit status is added to allow the multiplexers to report errors about not implemented features. This will be used by the H1 mux to return 501-not-implemented errors.	2021-01-21 15:21:12 +01:00
Christopher Faulet	e095f31d36	MINOR: http: Add HTTP 501-not-implemented error message Add the support for the 501-not-implemented status code with the corresponding default message. The documentation is updated accordingly because it is now part of status codes HAProxy may emit via an errorfile or a deny/return HTTP action.	2021-01-21 15:21:12 +01:00
Christopher Faulet	8f100427c4	BUG/MEDIUM: tcpcheck: Don't destroy connection in the wake callback context When a tcpcheck ruleset uses multiple connections, the existing one must be closed and destroyed before openning the new one. This part is handled in the tcpcheck_main() function, when called from the wake callback function (wake_srv_chk). But it is indeed a problem, because this function may be called from the mux layer. This means a mux may call the wake callback function of the data layer, which may release the connection and the mux. It is easy to see how it is hazardous. And actually, depending on the scheduling, it leads to crashes. Thus, we must avoid to release the connection in the wake callback context, and move this part in the check's process function instead. To do so, we rely on the CHK_ST_CLOSE_CONN flags. When a connection must be replaced by a new one, this flag is set on the check, in tcpcheck_main() function, and the check's task is woken up. Then, the connection is really closed in process_chk_conn() function. This patch must be backported as far as 2.2, with some adaptations however because the code is not exactly the same.	2021-01-21 15:21:12 +01:00
Willy Tarreau	8050efeacb	MINOR: cli: give the show_fd helpers the ability to report a suspicious entry Now the show_fd helpers at the transport and mux levels return an integer which indicates whether or not the inspected entry looks suspicious. When an entry is reported as suspicious, "show fd" will suffix it with an exclamation mark ('!') in the dump, that is supposed to help detecting them. For now, helpers were adjusted to adapt to the new API but none of them reports any suspicious entry yet.	2021-01-21 08:58:15 +01:00
Willy Tarreau	108a271049	MINOR: xprt: add a new show_fd() helper to complete some "show fd" dumps. Just like we did for the muxes, now the transport layers will have the ability to provide helpers to report more detailed information about their internal context. When the helper is not known, the pointer continues to be dumped as-is if it's not NULL. This way a transport with no context nor dump function will not add a useless "xprt_ctx=(nil)" but the pointer will be emitted if valid or if a helper is defined.	2021-01-20 17:17:39 +01:00
Willy Tarreau	45fd1030d5	CLEANUP: tools: make resolve_sym_name() take a const pointer When `0c439d895` ("BUILD: tools: make resolve_sym_name() return a const") was written, the pointer argument ought to have been turned to const for more flexibility. Let's do it now.	2021-01-20 17:17:39 +01:00
Tim Duesterhus	1d66e396bf	MINOR: cache: Remove the `hash` part of the accept-encoding secondary key As of commit `6ca89162dc` this hash no longer is required, because unknown encodings are not longer stored and known encodings do not use the cache.	2021-01-18 15:01:41 +01:00
Willy Tarreau	31ffe9fad0	MINOR: pattern: add the missing generation ID manipulation functions The functions needed to commit a pattern file generation number or increase it were still missing. Better not have the caller play with these.	2021-01-15 14:41:16 +01:00
Willy Tarreau	dc2410d093	CLEANUP: pattern: rename pat_ref_commit() to pat_ref_commit_elt() It's about the third time I get confused by these functions, half of which manipulate the reference as a whole and those manipulating only an entry. For me "pat_ref_commit" means committing the pattern reference, not just an element, so let's rename it. A number of other ones should really be renamed before 2.4 gets released :-/	2021-01-15 14:11:59 +01:00
Christopher Faulet	d4a83dd6b3	MINOR: config: Add failifnotcap() to emit an alert on proxy capabilities This function must be used to emit an alert if a proxy does not have at least one of the requested capabilities. An additional message may be appended to the alert.	2021-01-13 17:45:34 +01:00
William Dauchy	5d9b8f3c93	MINOR: contrib/prometheus-exporter: use fill_info for process dump use `stats_fill_info` when possible to avoid duplicating code. Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-01-13 15:19:00 +01:00
Thayne McCombs	8f0cc5c4ba	CLEANUP: Fix spelling errors in comments This is from the output of codespell. It's done at once over a bunch of files and only affects comments, so there is nothing user-visible. No backport needed.	2021-01-08 14:56:32 +01:00
William Dauchy	5a982a7165	MINOR: contrib/prometheus-exporter: export build_info commit `c55a626217` ("MINOR: contrib/prometheus-exporter: Add missing global and per-server metrics") is renaming two metrics between v2.2 and v2.3: server_idle_connections_current server_idle_connections_limit It is breaking some tools which are making use of those metrics while supporting several haproxy versions. This build_info will permit tools which make use of metrics to be able to match the haproxy version and change the list of expected metrics. This was possible using the haproxy stats socket but not with prometheus export. This patch follows prometheus best pratices to export specific software informations. It is adding a new field `build_info` so we can extend it to other parameters if needed in the future. example output: # HELP haproxy_process_build_info HAProxy build info. # TYPE haproxy_process_build_info gauge haproxy_process_build_info{version="2.4-dev5-2e1a3f-5"} 1 Even though it is not a bugfix, this patch will make more sense when backported up to >= 2.0 Signed-off-by: William Dauchy <wdauchy@gmail.com>	2021-01-08 14:48:13 +01:00
Ilya Shipitsin	b8888ab557	CLEANUP: assorted typo fixes in the code and comments This is 15th iteration of typo fixes	2021-01-06 17:32:03 +01:00
Ilya Shipitsin	1e9a66603f	CLEANUP: assorted typo fixes in the code and comments This is 14th iteration of typo fixes	2021-01-06 16:26:50 +01:00
Fr�d�ric L�caille	242fb1b639	MINOR: quic: Drop packets with STREAM frames with wrong direction. A server initiates streams with odd-numbered stream IDs. Also add useful traces when parsing STREAM frames.	2021-01-04 12:31:28 +01:00
Fr�d�ric L�caille	6c1e36ce55	CLEANUP: quic: Remove useless QUIC event trace definitions. Remove QUIC_EV_CONN_E* event trace macros which were defined for errors. Replace QUIC_EV_CONN_ECHPKT by QUIC_EV_CONN_BCFRMS used in qc_build_cfrms()	2021-01-04 12:31:28 +01:00
Fr�d�ric L�caille	164096eb76	MINOR: qpack: Add static header table definitions for QPACK. As HPACK, QPACK makes usage of a static header table.	2021-01-04 12:31:28 +01:00
Tim Duesterhus	54182ec9d7	CLEANUP: Apply the coccinelle patch for `XXXcmp()` on include/ Compare the various `XXXcmp()` functions against zero.	2021-01-04 10:09:02 +01:00
Thayne McCombs	92149f9a82	MEDIUM: stick-tables: Add srvkey option to stick-table This allows using the address of the server rather than the name of the server for keeping track of servers in a backend for stickiness. The peers code was also extended to support feeding the dictionary using this key instead of the name. Fixes #814	2020-12-31 10:04:54 +01:00
Remi Tricot-Le Breton	ce9e7b2521	MEDIUM: cache: Manage a subset of encodings in accept-encoding normalizer The accept-encoding normalizer now explicitely manages a subset of encodings which will all have their own bit in the encoding bitmap stored in the cache entry. This way two requests with the same primary key will be served the same cache entry if they both explicitely accept the stored response's encoding, even if their respective secondary keys are not the same and do not match the stored response's one. The actual hash of the accept-encoding will still be used if the response's encoding is unmanaged. The encoding matching and the encoding weight parsing are done for every subpart of the accept-encoding values, and a bitmap of accepted encodings is built for every request. It is then tested upon any stored response that has the same primary key until one with an accepted encoding is found. The specific "identity" and "*" accept-encoding values are managed too. When storing a response in the key, we also parse the content-encoding header in order to only set the response's corresponding encoding's bit in its cache_entry encoding bitmap. This patch fixes GitHub issue #988. It does not need to be backported.	2020-12-24 17:18:00 +01:00
Remi Tricot-Le Breton	56e46cb393	MINOR: http: Add helper functions to trim spaces and tabs Add two helper functions that trim leading or trailing spaces and horizontal tabs from an ist string.	2020-12-24 17:18:00 +01:00
Remi Tricot-Le Breton	2b5c5cbef6	MINOR: cache: Avoid storing responses whose secondary key was not correctly calculated If any of the secondary hash normalizing functions raises an error, the secondary hash will be unusable. In this case, the response will not be stored anymore.	2020-12-24 17:18:00 +01:00
Fr�d�ric L�caille	f63921fc24	MINOR: quic: Add traces for quic_packet_encrypt(). Add traces to have an idea why this function may fail. In fact in never fails when the passed parameters are correct, especially the lengths. This is not the case when a packet is not correctly built before being encrypted.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	133e8a7146	MINOR: quic: make a packet build fails when qc_build_frm() fails. Even if the size of frames built by qc_build_frm() are computed so that not to overflow a buffer, do not rely on this and always makes a packet build fails if we could not build a frame. Also add traces to have an idea where qc_build_frm() fails. Fixes a memory leak in qc_build_phdshk_apkt().	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	f7e0b8d6ae	MINOR: quic: Add traces for in flght ack-eliciting packet counter. Add trace for this counter. Also shorten its variable name (->ifae_pkts).	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	04ffb66bc9	MINOR: quic: Make usage of the congestion control window. Remove ->ifcdata which was there to control the CRYPTO data sent to the peer so that not to saturate its reception buffer. This was a sort of flow control. Add ->prep_in_flight counter to the QUIC path struct to control the number of bytes prepared to be sent so that not to saturare the congestion control window. This counter is increased each time a packet was built. This has nothing to see with ->in_flight which is the real in flight number of bytes which have really been sent. We are olbiged to maintain two such counters to know how many bytes of data we can prepared before sending them. Modify traces consequently which were useful to diagnose issues about the congestion control window usage.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	c5e72b9868	MINOR: quic: Attempt to make trace more readable As there is a lot of information in this protocol, this is not easy to make the traces readable. We remove here a few of them and shorten some line shortening the variable names.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	8090b51e92	MAJOR: quic: Make usage of ebtrees to store QUIC ACK ranges. Store QUIC ACK ranges in ebtrees in place of lists with a 0(n) time complexity for insertion.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	f46c10cfb1	MINOR: server: Add QUIC definitions to servers. This patch adds QUIC structs to server struct so that to make the QUIC code compile. Also initializes the ebtree to store the connections by connection IDs.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	884f2e9f43	MINOR: listener: Add QUIC info to listeners and receivers. This patch adds a quic_transport_params struct to bind_conf struct used for the listeners. This is to store the QUIC transport parameters for the listeners. Also initializes them when calling str2listener(). Before str2sa_range() it's too early to figure we're going to speak QUIC, and after it's too late as listeners are already created. So it seems that doing it in str2listener() when the protocol is discovered is the best place. Also adds two ebtrees to the underlying receivers to store the connection by connections IDs (one for the original connection IDs, and another one for the definitive connection IDs which really identify the connections. However it doesn't seem normal that it is stored in the receiver nor the listener. There should be a private context in the listener so that protocols can store internal information. This element should in fact be the listener handle. Something still feels wrong, and probably we'll have to make QUIC and SSL co-exist: a proof of this is that there's some explicit code in bind_parse_ssl() to prevent the "ssl" keyword from replacing the xprt.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	0c4e3b09b0	MINOR: quic: Add definitions for QUIC protocol. This patch imports all the definitions for QUIC protocol with few modifications from 20200720-quic branch of quic-dev repository found at https://github.com/haproxytech/quic-dev.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	901ee2f37b	MINOR: ssl: Export definitions required by QUIC. QUIC needs to initialize its BIO and SSL session the same way as for SSL over TCP connections. It needs also to use the same ClientHello callback. This patch only exports functions and variables shared between QUIC and SSL/TCP connections.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	5e3d83a221	MINOR: connection: Add a new xprt to connection. Simply adds XPRT_QUIC new enum to integrate QUIC transport protocol.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	70da889d57	MINOR: quic: Redefine control layer callbacks which are QUIC specific. We add src/quic_sock.c QUIC specific socket management functions as callbacks for the control layer: ->accept_conn, ->default_iocb and ->rx_listening. accept_conn() will have to be defined. The default I/O handler only recvfrom() the datagrams received. Furthermore, ->rx_listening callback always returns 1 at this time but should returns 0 when reloading the processus.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	72f7cb170a	MINOR: connection: Attach a "quic_conn" struct to "connection" struct. This is a simple patch to prepare the integration of QUIC support to come. quic_conn struct is supposed to embed any QUIC specific information for a QUIC connection.	2020-12-23 11:57:26 +01:00
Fr�d�ric L�caille	ca42b2c9d3	MINOR: protocol: Create proto_quic QUIC protocol layer. As QUIC is a connection oriented protocol, this file is almost a copy of proto_tcp without TCP specific features. To suspend/resume a QUIC receiver we proceed the same way as for proto_udp receivers. With the recent updates to the listeners, we don't need a specific set of quic*_add_listener() functions, the default ones are sufficient. The fields declaration were reordered to make the various layers more visible like in other protocols. udp_suspend_receiver/udp_resume_receiver are up-to-date (the check for INHERITED is present) and the code being UDP-specific, it's normal to use UDP here. Note that in the future we might more reasily reference stacked layers so that there's no more need for specifying the pointer here.	2020-12-23 11:57:26 +01:00
Dragan Dosen	6f7cc11e6d	MEDIUM: xxhash: use the XXH_INLINE_ALL macro to inline all functions This way we make all xxhash functions inline, with implementations being directly included within xxhash.h. Makefile is updated as well, since we don't need to compile and link xxhash.o anymore. Inlining should improve performance on small data inputs.	2020-12-23 06:39:21 +01:00
Dragan Dosen	967e7e79af	MEDIUM: xxhash: use the XXH3 functions to generate 64-bit hashes Replace the XXH64() function calls with the XXH3 variant function XXH3_64bits_withSeed() where possible.	2020-12-23 06:39:21 +01:00
Dragan Dosen	de37443e64	IMPORT: xxhash: update to v0.8.0 that introduces stable XXH3 variant A new XXH3 variant of hash functions shows a noticeable improvement in performance (especially on small data), and also brings 128-bit support, better inlining and streaming capabilities. Performance comparison is available here: https://github.com/Cyan4973/xxHash/wiki/Performance-comparison	2020-12-23 06:39:21 +01:00
Olivier Houchard	63ee281854	MINOR: atomic: don't use ; to separate instruction on aarch64. The assembler on MacOS aarch64 interprets ; as the beginning of comments, so it is not suitable for separating instructions in inline asm. Use \n instead. This should be backported to 2.3, 2.2, 2.1, 2.0 and 1.9.	2020-12-23 01:23:41 +01:00
Willy Tarreau	4f59d38616	MINOR: time: increase the minimum wakeup interval to 60s The MAX_DELAY_MS which is set an upper limit to the poll wait time and force a wakeup this often used to be set to 1 second in order to easily spot and correct time drifts. This was added 12 years ago at an era where virtual machines were starting to become common in server environments while not working particularly well. Nowadays, such issues are not as common anymore, however forcing 64 threads to wake up every single second starts to make the process visible on otherwise idle systems. Let's increase this wakeup interval to one minute. In the worst case it will make idle threads wake every second, which remains low. If this is not sufficient anymore on some systems, another approach would consist in implementing a deep-sleep mode which only triggers after a while and which is always disabled if any time drift is observed.	2020-12-22 10:35:43 +01:00
Christian Ruppert	b67e155895	BUILD: hpack: hpack-tbl-t.h uses VAR_ARRAY but does not include compiler.h This fixes building hpack from contrib, which failed because of the undeclared VAR_ARRAY: make -C contrib/hpack ... cc -O2 -Wall -g -I../../include -fwrapv -fno-strict-aliasing -c -o gen-enc.o gen-enc.c In file included from gen-enc.c:18: ../../include/haproxy/hpack-tbl-t.h:105:23: error: 'VAR_ARRAY' undeclared here (not in a function) 105 \| struct hpack_dte dte[VAR_ARRAY]; /* dynamic table entries */ ... As discussed in the thread below, let's redefine VAR_ARRAY in this file so that it remains self-sustaining: https://www.mail-archive.com/haproxy@formilux.org/msg39212.html	2020-12-22 10:18:07 +01:00
Ilya Shipitsin	f38a01884a	CLEANUP: assorted typo fixes in the code and comments This is 13n iteration of typo fixes	2020-12-21 11:24:48 +01:00
Ilya Shipitsin	af204881a3	BUILD: ssl: fine guard for SSL_CTX_get0_privatekey call SSL_CTX_get0_privatekey is openssl/boringssl specific function present since openssl-1.0.2, let us define readable guard for it, not depending on HA_OPENSSL_VERSION	2020-12-21 11:17:36 +01:00
Willy Tarreau	b1f54925fc	BUILD: plock: remove dead code that causes a warning in gcc 11 As Ilya reported in issue #998, gcc 11 complains about misleading code indentation which is in fact caused by dead assignments to zero after a loop which stops on zero. Let's clean both of these.	2020-12-21 10:27:18 +01:00
Miroslav Zagorac	7f8314c8d1	MINOR: opentracing: add ARGC_OT enum Due to the addition of the OpenTracing filter it is necessary to define ARGC_OT enum. This value is used in the functions fmt_directive() and smp_resolve_args().	2020-12-16 15:49:53 +01:00
Miroslav Zagorac	6deab79d59	MINOR: vars: replace static functions with global ones The OpenTracing filter uses several internal HAProxy functions to work with variables and therefore requires two static local HAProxy functions, var_accounting_diff() and var_clear(), to be declared global. In fact, the var_clear() function was not originally defined as static, but it lacked a declaration.	2020-12-16 14:20:08 +01:00
Ilya Shipitsin	ec60909871	BUILD: SSL: fine guard for SSL_CTX_add_server_custom_ext call SSL_CTX_add_server_custom_ext is openssl specific function present since openssl-1.0.2, let us define readable guard for it, not depending on HA_OPENSSL_VERSION	2020-12-15 16:13:35 +01:00
Willy Tarreau	472125bc04	MINOR: protocol: add a pair of check_events/ignore_events functions at the ctrl layer Right now the connection subscribe/unsubscribe code needs to manipulate FDs, which is not compatible with QUIC. In practice what we need there is to be able to either subscribe or wake up depending on readiness at the moment of subscription. This commit introduces two new functions at the control layer, which are provided by the socket code, to check for FD readiness or subscribe to it at the control layer. For now it's not used.	2020-12-11 17:02:50 +01:00
Willy Tarreau	2ded48dd27	MINOR: connection: make conn_sock_drain() use the control layer's ->drain() Now we don't touch the fd anymore there, instead we rely on the ->drain() provided by the control layer. As such the function was renamed to conn_ctrl_drain().	2020-12-11 16:26:01 +01:00
Willy Tarreau	427c846cc9	MINOR: protocol: add a ->drain() function at the connection control layer This is what we need to drain pending incoming data from an connection. The code was taken from conn_sock_drain() without the connection-specific stuff. It still takes a connection for now for API simplicity.	2020-12-11 16:26:00 +01:00
Willy Tarreau	586f71b43f	REORG: connection: move the socket iocb (conn_fd_handler) to sock.c conn_fd_handler() is 100% specific to socket code. It's about time it moves to sock.c which manipulates socket FDs. With it comes conn_fd_check() which tests for the socket's readiness. The ugly connection status check at the end of the iocb was moved to an inlined function in connection.h so that if we need it for other socket layers it's not too hard to reuse. The code was really only moved and not changed at all.	2020-12-11 16:26:00 +01:00
Willy Tarreau	827fee7406	MINOR: connection: remove sock-specific code from conn_sock_send() The send() loop present in this function and the error handling is already present in raw_sock_from_buf(). Let's rely on it instead and stop touching the FD from this place. The send flag was changed to use a more agnostic CO_SFL_*. The name was changed to "conn_ctrl_send()" to remind that it's meant to be used to send at the lowest level.	2020-12-11 16:25:11 +01:00
Willy Tarreau	3a9e56478e	CLEANUP: connection: remove the unneeded fd_stop_{recv,send} on read0/shutw These are two other areas where this fd_stop_recv()/fd_stop_send() makes no sense anymore. Both happen by definition while the FD is not subscribed, since nowadays it's subscribed after failing recv()/send(), in which case we cannot close.	2020-12-11 13:56:12 +01:00
Willy Tarreau	3ec094b09d	CLEANUP: remove the unused fd_stop_send() in conn_xprt_shutw{,_hard}() These functions used to disable polling for writes when shutting down but this is no longer used as it still happens later when closing if the connection was subscribed to FD events. Let's just remove this fake and undesired dependency on the FD layer.	2020-12-11 13:49:19 +01:00
Amaury Denoyelle	8d22823ade	MEDIUM: http_act: define set-timeout server/tunnel action Add a new http-request action 'set-timeout [server/tunnel]'. This action can be used to update the server or tunnel timeout of a stream. It takes two parameters, the timeout name to update and the new timeout value. This rule is only valid for a proxy with backend capabilities. The timeout value cannot be null. A sample expression can also be used instead of a plain value.	2020-12-11 12:01:07 +01:00
Amaury Denoyelle	fb50443517	MEDIUM: stream: support a dynamic tunnel timeout Allow the modification of the tunnel timeout on the stream side. Use a new field in the stream for the tunnel timeout. It is initialized by the tunnel timeout from backend unless it has already been set by a set-timeout tunnel rule.	2020-12-11 12:01:07 +01:00
Amaury Denoyelle	b715078821	MINOR: stream: prepare the hot refresh of timeouts Define a stream function to allow to update the timeouts. This commit is in preparation for the support of dynamic timeouts with the set-timeout rule.	2020-12-11 12:01:07 +01:00
Amaury Denoyelle	5a9fc2d10f	MINOR: action: define enum for timeout type of the set-timeout rule This enum is used to specify the timeout targetted by a set-timeout rule.	2020-12-11 12:01:07 +01:00
Willy Tarreau	343d0356a5	CLEANUP: connection: remove the unused conn_{stop,cond_update}_polling() These functions are not used anymore and were quite confusing given that their names reflected their original role and not the current ones. Let's kill them before they inspire anyone.	2020-12-11 11:21:53 +01:00
Willy Tarreau	6aee5b9a4c	MINOR: connection: implement cs_drain_and_close() We had cs_close() which forces a CS_SHR_RESET mode on the read side, and due to this there are a few call places in the checks which perform a manual call to conn_sock_drain() before calling cs_close(). This is absurd by principle, and it can be counter-productive in the case of a mux where this could even cause the opposite of the desired effect by deleting pending frames on the socket before closing. Let's add cs_drain_and_close() which uses the CS_SHR_DRAIN mode to prepare this.	2020-12-11 11:04:51 +01:00
Willy Tarreau	29885f0308	MINOR: udp: export udp_suspend_receiver() and udp_resume_receiver() QUIC will rely on UDP at the receiver level, and will need these functions to suspend/resume the receivers. In the future, protocol chaining may simplify this.	2020-12-08 18:10:18 +01:00
Willy Tarreau	c14e7ae744	MINOR: connection: use the control layer's init/close In conn_ctrl_init() and conn_ctrl_close() we now use the control layer's functions instead of manipulating the FD directly. This is safe since the control layer is always present when done. Note that now we also adjust the flag before calling the function to make things cleaner in case such a layer would need to call the same functions again for any reason.	2020-12-08 15:53:45 +01:00
Willy Tarreau	de471c4655	MINOR: protocol: add a set of ctrl_init/ctrl_close methods for setup/teardown Currnetly conn_ctrl_init() does an fd_insert() and conn_ctrl_close() does an fd_delete(). These are the two only short-term obstacles against using a non-fd handle to set up a connection. Let's have pur these into the protocol layer, along with the other connection-level stuff so that the generic connection code uses them instead. This will allow to define new ones for other protocols (e.g. QUIC). Since we only support regular sockets at the moment, the code was placed into sock.c and shared with proto_tcp, proto_uxst and proto_sockpair.	2020-12-08 15:50:56 +01:00
Willy Tarreau	b366c9a59a	CLEANUP: protocol: group protocol struct members by usage For the sake of an improved readability, let's group the protocol field members according to where they're supposed to be defined: - connection layer (note: for now even UDP needs one) - binding layer - address family - socket layer Nothing else was changed.	2020-12-08 14:58:24 +01:00
Willy Tarreau	b9b2fd7cf4	MINOR: protocol: export protocol definitions The various protocols were made static since there was no point in exporting them in the past. Nowadays with QUIC relying on UDP we'll significantly benefit from UDP being exported and more generally from being able to declare some functions as being the same as other protocols'. In an ideal world it should not be these protocols which should be exported, but the intermediary levels: - socket layer (sock.c only right now), already exported as functions but nothing structured at the moment ; - family layer (sock_inet, sock_unix, sockpair etc): already structured and exported - binding layer (the part that relies on the receiver): currently fused within the protocol - connectiong layer (the part that manipulates connections): currently fused within the protocol - protocol (connection's control): shouldn't need to be exposed ultimately once the elements above are in an easily sharable way.	2020-12-08 14:54:08 +01:00
Willy Tarreau	f9ad06cb26	MINOR: protocol: remove the redundant ->sock_domain field This field used to be needed before commit `2b5e0d8b6` ("MEDIUM: proto_udp: replace last AF_CUST_UDP* with AF_INET*") as it was used as a protocol entry selector. Since this commit it's always equal to the socket family's value so it's entirely redundant. Let's remove it now to simplify the protocol definition a little bit.	2020-12-08 12:13:54 +01:00
Christopher Faulet	16df178b6e	BUG/MEDIUM: stream: Xfer the input buffer to a fully created stream The input buffer passed as argument to create a new stream must not be transferred when the request channel is initialized because the channel flags are not set at this stage. In addition, the API is a bit confusing regarding the buffer owner when an error occurred. The caller remains the owner, but reading the code it is not obvious. So, first of all, to avoid any ambiguities, comments are added on the calling chain to make it clear. The buffer owner is the caller if any error occurred. And the ownership is transferred to the stream on success. Then, to make things simple, the ownership is transferred at the end of stream_new(), in case of success. And the input buffer is updated to point on BUF_NULL. Thus, in all cases, if the caller try to release it calling b_free() on it, it is not a problem. Of course, it remains the caller responsibility to release it on error. The patch fixes a bug introduced by the commit `26256f86e` ("MINOR: stream: Pass an optional input buffer when a stream is created"). No backport is needed.	2020-12-04 17:15:03 +01:00
Willy Tarreau	d1f250f87b	MINOR: listener: now use a generic add_listener() function With the removal of the family-specific port setting, all protocol had exactly the same implementation of ->add(). A generic one was created with the name "default_add_listener" so that all other ones can now be removed. The API was slightly adjusted so that the protocol and the listener are passed instead of the listener and the port. Note that all protocols continue to provide this ->add() method instead of routinely calling default_add_listener() from create_listeners(). This makes sure that any non-standard protocol will still be able to intercept the listener addition if needed. This could be backported to 2.3 along with the few previous patches on listners as a pure code cleanup.	2020-12-04 15:08:00 +01:00
Willy Tarreau	73bed9ff13	MINOR: protocol: add a ->set_port() helper to address families At various places we need to set a port on an IPv4 or IPv6 address, and it requires casts that are easy to get wrong. Let's add a new set_port() helper to the address family to assist in this. It will be directly accessible from the protocol and will make the operation seamless. Right now this is only implemented for sock_inet as other families do not need a port.	2020-12-04 15:08:00 +01:00
Christopher Faulet	6ad06066cd	CLEANUP: connection: Remove CS_FL_READ_PARTIAL flag Since the recent refactoring of the H1 multiplexer, this flag is no more used. Thus it is removed.	2020-12-04 14:41:49 +01:00
Christopher Faulet	da831fa068	CLEANUP: http-ana: Remove TX_WAIT_NEXT_RQ unsued flag This flags is now unused. It was used in REQ_WAIT_HTTP analyser, when a stream was waiting for a request, to set the keep-alive timeout or to avoid to send HTTP errors to client.	2020-12-04 14:41:49 +01:00
Christopher Faulet	2afd874704	CLEANUP: htx: Remove HTX_FL_UPGRADE unsued flag Now the H1 to H2 upgrade is handled before the stream creation. HTX_FL_UPGRADE flag is now unused.	2020-12-04 14:41:49 +01:00
Christopher Faulet	4c8ad84232	MINOR: mux: Add a ctl parameter to get the exit status of the multiplexers The ctl param MUX_EXIT_STATUS can be request to get the exit status of a multiplexer. For instance, it may be an HTTP status code or an H2 error. For now, 0 is always returned. When the mux h1 will be able to return HTTP errors itself, this ctl param will be used to get the HTTP status code from the logs. the mux_exit_status enum has been created to map internal mux exist status to generic one. Thus there is 5 possible status for now: success, invalid error, timeout error, internal error and unknown.	2020-12-04 14:41:49 +01:00
Christopher Faulet	7d0c19e82d	MINOR: session: Add functions to increase http values of tracked counters cumulative numbers of http request and http errors of counters tracked at the session level and their rates can now be updated at the session level thanks to two new functions. These functions are not used for now, but it will be called to keep tracked counters up-to-date if an error occurs before the stream creation.	2020-12-04 14:41:49 +01:00
Christopher Faulet	84600631cd	MINOR: stick-tables: Add functions to update some values of a tracked counter The cumulative numbers of http requests, http errors, bytes received and sent and their respective rates for a tracked counters are now updated using specific stream independent functions. These functions are used by the stream but the aim is to allow the session to do so too. For now, there is no reason to perform these updates from the session, except from the mux-h2 maybe. But, the mux-h1, on the frontend side, will be able to return some errors to the client, before the stream creation. In this case, it will be mandatory to update counters tracked at the session level.	2020-12-04 14:41:49 +01:00
Christopher Faulet	26256f86e1	MINOR: stream: Pass an optional input buffer when a stream is created It is now possible to set the buffer used by the channel request buffer when a stream is created. It may be useful if input data are already received, instead of waiting the first call to the mux rcv_buf() callback. This change is mandatory to support H1 connection with no stream attached. For now, the multiplexers don't pass any buffer. BUF_NULL is thus used to call stream_create_from_cs().	2020-12-04 14:41:48 +01:00
Christopher Faulet	afc02a4436	MINOR: muxes: Remove get_cs_info callback function now useless This callback function was only defined by the mux-h1. But it has been removed in the previous commit because it is unused now. So, we can do a step forward removing the callback function from the mux definition and the cs_info structure.	2020-12-04 14:41:48 +01:00
Christopher Faulet	d517396f8e	MINOR: session: Add the idle duration field into the session The idle duration between two streams is added to the session structure. It is not necessarily pertinent on all protocols. In fact, it is only defined for H1 connections. It is the duration between two H1 transactions. But the .get_cs_info() callback function on the multiplexers only exists because this duration is missing at the session level. So it is a simplification opportunity for a really low cost. To reduce the cost, a hole in the session structure is filled by moving .srv_list field at the end of the structure.	2020-12-04 14:41:48 +01:00
Thierry Fournier	c749259dff	MINOR: lua-thread: Store each function reference and init reference in array The goal is to allow execution of one main lua state per thread. The array introduces storage of one reference per thread, because each lua state can have different reference id for a same function. A function returns the preferred state id according to configuration and current thread id.	2020-12-02 21:53:16 +01:00
Thierry Fournier	021d986ecc	MINOR: lua-thread: Replace state_from by state_id The goal is to allow execution of one main lua state per thread. "state_from" is a pointer to the parent lua state. "state_id" is the index of the parent state id in the reference lua states array. "state_id" is better because the lock is a "== 0" test which is quick than pointer comparison. In other way, the state_id index could index other things the the Lua state concerned. I think to the function references.	2020-12-02 21:53:16 +01:00
Thierry Fournier	62a22aa23f	MINOR: lua-thread: Replace "struct hlua_function" allocation by dedicated function The goal is to allow execution of one main lua state per thread. This function will initialize the struct with other things than 0. With this function helper, the initialization is centralized and it prevents mistakes. This patch also keeps a reference to each declared function in a list. It will be useful in next patches to control consistency of declared references.	2020-12-02 21:53:16 +01:00
Thierry Fournier	75fc02956b	MINOR: lua-thread: make hlua_ctx_init() get L from its caller The goal is to allow execution of one main lua state per thread. The function hlua_ctx_init() now gets the original lua state from its caller. This allows the initialisation of lua_thread (coroutines) from any master lua state. The parent lua state is stored in the hlua struct. This patch is a temporary transition, it will be modified later.	2020-12-02 21:53:16 +01:00
Thierry Fournier	ad5345fed7	MINOR: lua-thread: Replace embedded struct hlua_function by a pointer The goal is to allow execution of one main lua state per thread. Because this struct will be filled after the configuration parser, we cannot copy the content. The actual state of the Haproxy code doesn't justify this change, it is an update preparing next steps.	2020-12-02 21:53:16 +01:00
Thierry Fournier	a51a1fd174	MINOR: cli: add a function to look up a CLI service description This function will be useful to check if the keyword is already registered. Also add a define for the max number of args. This will be needed by a next patch to fix a bug and will have to be backported.	2020-12-02 09:45:18 +01:00
Thierry Fournier	87e539906b	MINOR: actions: add a function returning a service pointer from its name This function simply calls action_lookup() on the private service_keywords, to look up a service name. This will be used to detect double registration of a same service from Lua. This will be needed by a next patch to fix a bug and will have to be backported.	2020-12-02 09:45:18 +01:00
Thierry Fournier	7a71a6d9d2	MINOR: actions: Export actions lookup functions These functions will be useful to check if a keyword is already registered. This will be needed by a next patch to fix a bug, and will need to be backported.	2020-12-02 09:45:18 +01:00
Willy Tarreau	a1f12746b1	MINOR: traces: add a new level "error" below the "user" level Sometimes it would be nice to be able to only trace abnormal events such as protocol errors. Let's add a new "error" level below the "user" level for this. This will allow to add TRACE_ERROR() at various error points and only see them.	2020-12-01 10:25:20 +01:00
Maciej Zdeb	fcdfd857b3	MINOR: log: Logging HTTP path only with %HPO This patch adds a new logging variable '%HPO' for logging HTTP path only (without query string) from relative or absolute URI. For example: log-format "hpo=%HPO hp=%HP hu=%HU hq=%HQ" GET /r/1 HTTP/1.1 => hpo=/r/1 hp=/r/1 hu=/r/1 hq= GET /r/2?q=2 HTTP/1.1 => hpo=/r/2 hp=/r/2 hu=/r/2?q=2 hq=?q=2 GET http://host/r/3 HTTP/1.1 => hpo=/r/3 hp=http://host/r/3 hu=http://host/r/3 hq= GET http://host/r/4?q=4 HTTP/1.1 => hpo=/r/4 hp=http://host/r/4 hu=http://host/r/4?q=4 hq=?q=4	2020-12-01 09:32:44 +01:00
Emeric Brun	0237c4e3f5	BUG/MEDIUM: local log format regression. Since 2.3 default local log format always adds hostame field. This behavior change was due to log/sink re-work, because according to rfc3164 the hostname field is mandatory. This patch re-introduce a legacy "local" format which is analog to rfc3164 but with hostname stripped. This is the new default if logs are generated by haproxy. To stay compliant with previous configurations, the option "log-send-hostname" acts as if the default format is switched to rfc3164. This patch addresses the github issue #963 This patch should be backported in branches >= 2.3.	2020-12-01 06:58:42 +01:00
Willy Tarreau	4d6c594998	BUG/MEDIUM: task: close a possible data race condition on a tasklet's list link In issue #958 Ashley Penney reported intermittent crashes on AWS's ARM nodes which would not happen on x86 nodes. After investigation it turned out that the Neoverse N1 CPU cores used in the Graviton2 CPU are much more aggressive than the usual Cortex A53/A72/A55 or any x86 regarding memory ordering. The issue that was triggered there is that if a tasklet_wakeup() call is made on a tasklet scheduled to run on a foreign thread and that tasklet is just being dequeued to be processed, there can be a race at two places: - if MT_LIST_TRY_ADDQ() happens between MT_LIST_BEHEAD() and LIST_SPLICE_END_DETACHED() if the tasklet is alone in the list, because the emptiness tests matches ; - if MT_LIST_TRY_ADDQ() happens during LIST_DEL_INIT() in run_tasks_from_lists(), then depending on how LIST_DEL_INIT() ends up being implemented, it may even corrupt the adjacent nodes while they're being reused for the in-tree storage. This issue was introduced in 2.2 when support for waking up remote tasklets was added. Initially the attachment of a tasklet to a list was enough to know its status and this used to be stable information. Now it's not sufficient to rely on this anymore, thus we need to use a different information. This patch solves this by adding a new task flag, TASK_IN_LIST, which is atomically set before attaching a tasklet to a list, and is only removed after the tasklet is detached from a list. It is checked by tasklet_wakeup_on() so that it may only be done while the tasklet is out of any list, and is cleared during the state switch when calling the tasklet. Note that the flag is not set for pure tasks as it's not needed. However this introduces a new special case: the function tasklet_remove_from_tasklet_list() needs to keep both states in sync and cannot check both the state and the attachment to a list at the same time. This function is already limited to being used by the thread owning the tasklet, so in this case the test remains reliable. However, just like its predecessors, this function is wrong by design and it should probably be replaced with a stricter one, a lazy one, or be totally removed (it's only used in checks to avoid calling a possibly scheduled event, and when freeing a tasklet). Regardless, for now the function exists so the flag is removed only if the deletion could be done, which covers all cases we're interested in regarding the insertion. This removal is safe against a concurrent tasklet_wakeup_on() since MT_LIST_DEL() guarantees the atomic test, and will ultimately clear the flag only if the task could be deleted, so the flag will always reflect the last state. This should be carefully be backported as far as 2.2 after some observation period. This patch depends on previous patch "MINOR: task: remove __tasklet_remove_from_tasklet_list()".	2020-11-30 18:17:59 +01:00
Willy Tarreau	2da4c316c2	MINOR: task: remove __tasklet_remove_from_tasklet_list() This function is only used at a single place directly within the scheduler in run_tasks_from_lists() and it really ought not be called by anything else, regardless of what its comment says. Let's delete it, move the two lines directly into the call place, and take this opportunity to factor the atomic decrement on tasks_run_queue. A comment was added on the remaining one tasklet_remove_from_tasklet_list() to mention the risks in using it.	2020-11-30 18:17:44 +01:00
Willy Tarreau	a868c2920b	MINOR: task: remove tasklet_insert_into_tasklet_list() This function is only called at a single place and adds more confusion than it removes. It also makes one think it could be used outside of the scheduler while it must absolutely not. Let's just move its two lines to the call place, making the code more readable there. In addition this clearly shows that the preliminary LIST_INIT() is useless since the entry is immediately overwritten.	2020-11-30 18:17:44 +01:00
Olivier Houchard	1f05324cbe	BUG/MEDIUM: lists: Lock the element while we check if it is in a list. In MT_LIST_TRY_ADDQ() and MT_LIST_TRY_ADD() we can't just check if the element is already in a list, because there's a small race condition, it could be added between the time we checked, and the time we actually set its next and prev, so we have to lock it first. This is required to address issue #958. This should be backported to 2.3, 2.2 and 2.1.	2020-11-30 18:17:29 +01:00
Your Name	1e237d037b	MINOR: plock: use an ARMv8 instruction barrier for the pause instruction As suggested by @AGSaidi in issue #958, on ARMv8 its convenient to use an "isb" instruction in pl_cpu_relax() to improve fairness. Without it I've met a few watchdog conditions on valid locks with 16 threads, indicating that some threads couldn't manage to get it in 2 seconds. I never happened again with it. In addition, the performance increased by slightly more than 5% thanks to the reduced contention. This should be backported as far as 2.2, possibly even 2.0.	2020-11-29 14:53:33 +01:00
Christopher Faulet	97b7bdfcf7	REORG: tcpcheck: Move check option parsing functions based on tcp-check The parsing of the check options based on tcp-check rules (redis, spop, smtp, http...) are moved aways from check.c. Now, these functions are placed in tcpcheck.c. These functions are only related to the tcpcheck ruleset configured on a proxy and not to the health-check attached to a server.	2020-11-27 10:30:23 +01:00
Christopher Faulet	bb9fb8b7f8	MINOR: config: Deprecate and ignore tune.chksize global option This option is now ignored because I/O check buffers are now allocated using the buffer pool. Thus, it is marked as deprecated in the documentation and ignored during the configuration parsing. The field is also removed from the global structure. Because this option is ignored since a recent fix, backported as fare as 2.2, this patch should be backported too. Especially because it updates the documentation.	2020-11-27 10:30:23 +01:00
Christopher Faulet	b381a505c1	BUG/MAJOR: tcpcheck: Allocate input and output buffers from the buffer pool Historically, the input and output buffers of a check are allocated by hand during the startup, with a specific size (not necessarily the same than other buffers). But since the recent refactoring of the checks to rely exclusively on the tcp-checks and to use the underlying mux layer, this part is totally buggy. Indeed, because these buffers are now passed to a mux, they maybe be swapped if a zero-copy is possible. In fact, for now it is only possible in h2_rcv_buf(). Thus the bug concretely only exists if a h2 health-check is performed. But, it is a latent bug for other muxes. Another problem is the size of these buffers. because it may differ for the other buffer size, it might be source of bugs. Finally, for configurations with hundreds of thousands of servers, having 2 buffers per check always allocated may be an issue. To fix the bug, we now allocate these buffers when required using the buffer pool. Thus not-running checks don't waste memory and muxes may swap them if possible. The only drawback is the check buffers have now always the same size than buffers used by the streams. This deprecates indirectly the "tune.chksize" global option. In addition, the http-check regtest have been update to perform some h2 health-checks. Many thanks to @VigneshSP94 for its help on this bug. This patch should solve the issue #936. It relies on the commit "MINOR: tcpcheck: Don't handle anymore in-progress send rules in tcpcheck_main". Both must be backport as far as 2.2. bla	2020-11-27 10:29:41 +01:00
Remi Tricot-Le Breton	3d08236cb3	MINOR: cache: Prepare helper functions for Vary support The Vary functionality is based on a secondary key that needs to be calculated for every request to which a server answers with a Vary header. The Vary header, which can only be found in server responses, determines which headers of the request need to be taken into account in the secondary key. Since we do not want to have to store all the headers of the request until we have the response, we will pre-calculate as many sub-hashes as there are headers that we want to manage in a Vary context. We will only focus on a subset of headers which are likely to be mentioned in a Vary response (accept-encoding and referer for now). Every managed header will have its own normalization function which is in charge of transforming the header value into a core representation, more robust to insignificant changes that could exist between multiple clients. For instance, two accept-encoding values mentioning the same encodings but in different orders should give the same hash. This patch adds a function that parses a Vary header value and checks if all the values belong to our supported subset. It also adds the normalization functions for our two headers, as well as utility functions that can prebuild a secondary key for a given request and transform it into an actual secondary key after the vary signature is determined from the response.	2020-11-24 16:52:57 +01:00
Christopher Faulet	401e6dbff3	BUG/MAJOR: filters: Always keep all offsets up to date during data filtering When at least one data filter is registered on a channel, the offsets of all filters must be kept up to date. For data filters but also for others. It is safer to do it in that way. Indirectly, this patch fixes 2 hidden bugs revealed by the commit `22fca1f2c` ("BUG/MEDIUM: filters: Forward all filtered data at the end of http filtering"). The first one, the worst of both, happens at the end of http filtering when at least one data filtered is registered on the channel. We call the http_end() callback function on the filters, when defined, to finish the http filtering. But it is performed for all filters. Before the commit `22fca1f2c`, the only risk was to call the http_end() callback function unexpectedly on a filter. Now, we may have an overflow on the offset variable, used at the end to forward all filtered data. Of course, from the moment we forward an arbitrary huge amount of data, all kinds of bad things may happen. So offset computation is performed for all filters and http_end() callback function is called only for data filters. The other one happens when a data filter alter the data of a channel, it must update the offsets of all previous filters. But the offset of non-data filters must be up to date, otherwise, here too we may have an integer overflow. Another way to fix these bugs is to always ignore non-data filters from the offsets computation. But this patch is safer and probably easier to maintain. This patch must be backported in all versions where the above commit is. So as far as 2.0.	2020-11-24 14:17:32 +01:00
Ilya Shipitsin	5bfe66366c	BUILD: SSL: do not "update" BoringSSL version equivalent anymore we have added all required fine guarding, no need to reduce BoringSSL version back to 1.1.0 anymore, we do not depend on it	2020-11-24 09:54:44 +01:00
Ilya Shipitsin	f04a89c549	CLEANUP: remove unused function "ssl_sock_is_ckch_valid" "ssl_sock_is_ckch_valid" is not used anymore, let us remove it	2020-11-24 09:54:44 +01:00
Julien Pivotto	2de240a676	MINOR: stream: Add level 7 retries on http error 401, 403 Level-7 retries are only possible with a restricted number of HTTP return codes. While it is usually not safe to retry on 401 and 403, I came up with an authentication backend which was not synchronizing authentication of users. While not perfect, being allowed to also retry on those return codes is really helpful and acts as a hotfix until we can fix the backend. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-11-23 09:33:14 +01:00
Maciej Zdeb	ebdd4c55da	MINOR: http_act: Add -m flag for del-header name matching method This patch adds -m flag which allows to specify header name matching method when deleting headers from http request/response. Currently beg, end, sub, str and reg are supported. This is related to GitHub issue #909	2020-11-21 15:54:30 +01:00
Willy Tarreau	3aab17bd56	BUG/MAJOR: connection: reset conn->owner when detaching from session list Baptiste reported a new crash affecting 2.3 which can be triggered when using H2 on the backend, with http-reuse always and with a tens of clients doing close only. There are a few combined cases which cause this to happen, but each time the issue is the same, an already freed session is dereferenced in session_unown_conn(). Two cases were identified to cause this: - a connection referencing a session as its owner, which is detached from the session's list and is destroyed after this session ends. The test on conn->owner before calling session_unown_conn() is not sufficent as the pointer is not null but is not valid anymore. - a connection that never goes idle and that gets killed form the mux, where session_free() is called first, then conn_free() calls session_unown_conn() which scans the just freed session for older connections. This one is only triggered with DEBUG_UAF The reason for this session to be present here is that it's needed during the connection setup, to be passed to conn_install_mux_be() to mux->init() as the owning session, but it's never deleted aftrewards. Furthermore, even conn_session_free() doesn't delete this pointer after freeing the session that lies there. Both do definitely result in a use-after-free that's more easily triggered under DEBUG_UAF. This patch makes sure that the owner is always deleted after detaching or killing the session. However it is currently not possible to clear the owner right after a synchronous init because the proxy protocol apparently needs it (a reg test checks this), and if we leave it past the connection setup with the session not attached anywhere, it's hard to catch the right moment to detach it. This means that the session may remain in conn->owner as long as the connection has never been added to nor removed from the session's idle list. Given that this patch needs to remain simple enough to be backported, instead it adds a workaround in session_unown_conn() to detect that the element is already not attached anywhere. This fix absolutely requires previous patch "CLEANUP: connection: do not use conn->owner when the session is known" otherwise the situation will be even worse, as some places used to rely on conn->owner instead of the session. The fix could theorically be backported as far as 1.8. However, the code in this area has significantly changed along versions and there are more risks of breaking working stuff than fixing real issues there. The issue was really woken up in two steps during 2.3-dev when slightly reworking the idle conns with commit `08016ab82` ("MEDIUM: connection: Add private connections synchronously in session server list") and when adding support for storing used H2 connections in the session and adding the necessary call to session_unown_conn() in the muxes. But the same test managed to crash 2.2 when built in DEBUG_UAF and patched like this, proving that we used to already leave dangling pointers behind us: \| diff --git a/include/haproxy/connection.h b/include/haproxy/connection.h \| index f8f235c1a..dd30b5f80 100644 \| --- a/include/haproxy/connection.h \| +++ b/include/haproxy/connection.h \| @@ -458,6 +458,10 @@ static inline void conn_free(struct connection conn) \| sess->idle_conns--; \| session_unown_conn(sess, conn); \| } \| + else { \| + struct session sess = conn->owner; \| + BUG_ON(sess && sess->origin != &conn->obj_type); \| + } \| \| sockaddr_free(&conn->src); \| sockaddr_free(&conn->dst); It's uncertain whether an existing code path there can lead to dereferencing conn->owner when it's bad, though certain suspicious memory corruption bugs make one think it's a likely candidate. The patch should not be hard to adapt there. Backports to 2.1 and older are left to the appreciation of the person doing the backport. A reproducer consists in this: global nbthread 1 listen l bind :9000 mode http http-reuse always server s 127.0.0.1:8999 proto h2 frontend f bind :8999 proto h2 mode http http-request return status 200 Then this will make it crash within 2-3 seconds: $ h1load -e -r 1 -c 10 http://0:9000/ If it does not, it might be that DEBUG_UAF was not used (it's harder then) and it might be useful to restart.	2020-11-21 15:29:22 +01:00
Willy Tarreau	38b4d2eb22	CLEANUP: connection: do not use conn->owner when the session is known At a few places we used to rely on conn->owner to retrieve the session while the session is already known. This is not correct because at some of these points the reason the connection's owner was still the session (instead of NULL) is a mistake. At one place a comparison is even made between the session and conn->owner assuming it's valid without checking if it's NULL. Let's clean this up to use the session all the time. Note that this will be needed for a forthcoming fix and will have to be backported.	2020-11-21 15:29:22 +01:00

... 7 8 9 10 11 ...

5449 Commits