Limit the number of old entries we remove in one call of
stktable_trash_oldest(), as we do so while holding the heavily contended
update write lock, so we'd rather not hold it for too long.
This helps stick tables perform better under heavy load.
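A minimal C sketch of the idea (the constant name and the eviction
helper are illustrative, not the actual haproxy code):

  /* bound the number of entries purged per call so the heavily
   * contended update write lock is never held for long */
  #define STKTABLE_MAX_UPDATES_AT_ONCE 100   /* assumed value */

  int budget = STKTABLE_MAX_UPDATES_AT_ONCE;
  struct stksess *ts;

  while (budget-- > 0 && (ts = pick_oldest_entry(t))) /* hypothetical helper */
      __stksess_free(t, ts);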
There is a lot of contention trying to add updates to the tree. So
instead of trying to add the updates to the tree right away, just add
them to an mt-list (with one mt-list per thread group, so that the
mt-list does not become the new point of contention that much), and
create a tasklet dedicated to adding updates to the tree, in batches, to
avoid keeping the update lock for too long.
This helps stick tables perform better under heavy load.
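A hedged sketch of the shape of this change (the list and tasklet field
names are assumptions for illustration):

  /* producer side: enqueue on the current thread group's mt-list
   * instead of contending on the shared update tree */
  MT_LIST_APPEND(&t->pend_updts[tgid - 1], &ts->pend_updts);
  tasklet_wakeup(t->updt_task);

  /* consumer side: a dedicated tasklet drains the per-group lists and
   * inserts the entries into the update tree in one batch, under a
   * single write lock acquisition */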
As discussed in GH #2423, there are some cases where src_{inc,clr}_gpc*
is not sufficient because we need to perform the lookup on a specific
key. Indeed, just like we did in e642916 ("MEDIUM: stktable: leverage
smp_fetch_* helpers from sample conv"), we can easily implement new
table converters based on existing fetches. This is what we do in
this patch.
Also the doc was updated so that src_{inc,clr}_gpc* fetches now point to
their generic equivalent table_{inc,clr}_gpc*. Indeed, src_{inc,clr}_gpc*
are simply aliases.
This should fix GH #2423.
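For example (illustrative config, assuming a table named st_src storing
gpc0):

  backend st_src
      stick-table type ip size 100k expire 1h store gpc0

  frontend fe
      # increment gpc0 for the entry keyed by the source address,
      # without any prior track-sc rule
      acl mark src,table_inc_gpc0(st_src) -m bool
      tcp-request connection accept if mark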
sample_conv_table_bytes_out_rate() was defined in the middle of other
stick-table sample convs without any ordering logic. Let's put it
where it belongs, right after sample_conv_table_bytes_in_rate().
In this patch we try to prevent code duplication: some fetches and sample
converters do the exact same thing, except that the converter takes the
argument as input data. Until now, both the converter and the fetch
had their own implementation (copy-pasted), with only the fetch-specific
or converter-specific lookup part differing.
Thanks to previous commits, we now have generic sample fetch helpers
that take the stkctr as argument, so let's leverage them directly
from the converter functions when available. This allows us to remove
a lot of code duplication and should make code maintenance easier in the
future.
While this patch adds more insertions than deletions, it actually
tries to simplify the lookup logic for sc_ and src_ sticktable fetches.
Indeed, the smp_create_src_stkctr() + smp_fetch_sc_stkctr() combination
was used everywhere a fetch supports both the sc_ and src_ forms, and
smp_fetch_sc_stkctr() even integrated some of the src-oriented fetch
logic.
Not only this was confusing, but it made the task of adding new generic
fetches even more complex.
Thus in this patch we completely dedicate smp_fetch_sc_stkctr() to sc_
oriented fetches, while smp_create_src_stkctr() is now renamed to
smp_fetch_src_stkctr() and can now work on its own for src_ oriented
fetches. It takes an additional parameter, "create", to tell the
function whether the entry should be created if it doesn't exist yet.
Now it's up to the calling function to know if it should be using the
sc_ oriented fetch or the src_ oriented one based on the input keyword.
In this patch we split several sample fetch functions that are leveraged
by the "src-" fetches such as smp_fetch_sc_inc_gpc().
Indeed, for all of them, we add an intermediate helper function that takes
a stkctr pointer as parameter and performs the logic, leaving the lookup
part in the calling function. Before this patch, existing functions
were doing the lookup + the fetch logic. Thanks to this patch it will
become easier to add generic converters taking a lookup key as input.
List of targeted functions:
- smp_fetch_sc_inc_gpc()
- smp_fetch_sc_inc_gpc0()
- smp_fetch_sc_inc_gpc1()
- smp_fetch_sc_clr_gpc()
- smp_fetch_sc_clr_gpc0()
- smp_fetch_sc_clr_gpc1()
- smp_fetch_sc_conn_cnt()
- smp_fetch_sc_conn_rate()
- smp_fetch_sc_updt_conn_cnt()
- smp_fetch_sc_conn_curr()
- smp_fetch_sc_glitch_cnt()
- smp_fetch_sc_glitch_rate()
- smp_fetch_sc_sess_cnt()
- smp_fetch_sc_sess_rate()
- smp_fetch_sc_http_req_cnt()
- smp_fetch_sc_http_req_rate()
- smp_fetch_sc_http_err_cnt()
- smp_fetch_sc_http_err_rate()
- smp_fetch_sc_http_fail_cnt()
- smp_fetch_sc_http_fail_rate()
- smp_fetch_sc_kbytes_in()
- smp_fetch_sc_bytes_in_rate()
- smp_fetch_sc_kbytes_out()
- smp_fetch_sc_gpc1_rate()
- smp_fetch_sc_gpc0_rate()
- smp_fetch_sc_gpc_rate()
- smp_fetch_sc_get_gpc1()
- smp_fetch_sc_get_gpc0()
- smp_fetch_sc_get_gpc()
- smp_fetch_sc_get_gpt0()
- smp_fetch_sc_get_gpt()
- smp_fetch_sc_bytes_out_rate()
Please note that this patch doesn't render any good using "git show" or
"git diff". For all the functions listed above, a new helper function was
defined right above it, with the same name without "_sc". These new
functions perform the fetch part, while the original ones (with "_sc")
now simply perform the lookup and then leverage the corresponding fetch
helper.
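As a hedged illustration of the split pattern (simplified: the entry
locking and release that the real code performs are omitted):

  /* fetch part only: operates on an already looked-up stkctr */
  static int smp_fetch_conn_cnt(struct stkctr *stkctr, struct sample *smp)
  {
      void *ptr;

      smp->flags = SMP_F_VOL_TEST;
      smp->data.type = SMP_T_SINT;
      smp->data.u.sint = 0;
      if (stkctr_entry(stkctr)) {
          ptr = stktable_data_ptr(stkctr->table, stkctr_entry(stkctr),
                                  STKTABLE_DT_CONN_CNT);
          if (ptr)
              smp->data.u.sint = stktable_data_cast(ptr, std_t_uint);
      }
      return 1;
  }

  /* original function: keeps only the lookup, then delegates */
  static int smp_fetch_sc_conn_cnt(const struct arg *args, struct sample *smp,
                                   const char *kw, void *private)
  {
      struct stkctr tmpstkctr;
      struct stkctr *stkctr;

      stkctr = smp_fetch_sc_stkctr(smp->sess, smp->strm, args, kw, &tmpstkctr);
      if (!stkctr)
          return 0;
      return smp_fetch_conn_cnt(stkctr, smp);
  }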
smp_fetch_stksess(table, smp, create) performs a lookup in <table> by
using <smp> as a key. It returns the matching entry on success and NULL
on failure. <create> can be set to 1 to force the entry creation.
We then use this helper everywhere relevant to prevent code duplication.
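A hedged usage sketch:

  /* look up the entry keyed by the sample; don't create it */
  ts = smp_fetch_stksess(t, smp, 0);
  if (!ts)
      return 0; /* no entry matching <smp> in <t> */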
As discussed in GH #2838, the previous fix f399dbf
("MINOR: stktable: fix potential build issue in smp_to_stkey") which
attempted to remove conversion ambiguity and prevent a build warning
proved to be insufficient.
This time, we implement Willy's suggestion, which is to use a union to
perform the conversion.
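A sketch of the idea (details may differ from the actual patch):

  union {
      uint64_t u64;
      uint32_t u32;
  } *data = (void *)&smp->data.u.sint;

  /* the assignment performs the numeric truncation, so u32 holds the
   * low 32 bits at a well-defined location on both little- and
   * big-endian machines, without any ambiguous cast */
  data->u32 = data->u64;
  static_table_key.key = &data->u32;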
Hopefully this should fix GH #2838. If that's the case (and only in that
case), then this patch may be backported with f399dbf (else the patch
won't apply) anywhere b59d1fd ("BUG/MINOR: stktable: fix big-endian
compatiblity in smp_to_stkey()") was backported.
In 819fc6f563
("MEDIUM: threads/stick-tables: handle multithreads on stick tables"),
sample fetch and action functions were properly guarded with stksess
read/write locks for read and write operations respectively, but the
sample_conv_table functions leveraged by "table_*" converters were
overlooked.
This bug was not known to cause issues in existing deployments yet (at
least it was not reported), but due to its nature it can theoretically
lead to inconsistent values being reported by "table_*" converters if
the value is being updated by another thread in parallel.
It should be backported to all stable versions.
[ada: for versions < 3.0, glitch_cnt and glitch_rate samples should be
ignored as they first appeared in 3.0]
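A hedged sketch of the kind of guard that was missing (following the
usual stick-table locking pattern; details are illustrative):

  HA_RWLOCK_RDLOCK(STK_SESS_LOCK, &ts->lock);
  ptr = stktable_data_ptr(t, ts, STKTABLE_DT_GPC0);
  if (ptr)
      smp->data.u.sint = stktable_data_cast(ptr, std_t_uint);
  HA_RWLOCK_RDUNLOCK(STK_SESS_LOCK, &ts->lock);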
smp_to_stkey() uses an ambiguous cast from a 64-bit integer to a 32-bit
unsigned integer. While it is intended, let's make the cast less
ambiguous by explicitly casting the right part of the assignment to the
proper type.
This should fix GH #2838.
As discussed in GH #1750, we were lacking a sample fetch to be able to
retrieve the key from the currently tracked counter entry. To do so,
sc_key fetch can now be used. It returns a sample with the correct type
(table key type) corresponding to the tracked counter entry (from previous
track-sc rules).
If no entry is currently tracked, it returns nothing.
It can be used in the standard form "sc_key(<sc_number>)" or the legacy
forms "sc0_key", "sc1_key" and "sc2_key".
Documentation was updated.
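For example (illustrative config):

  http-request track-sc0 src table st_src
  # log the key (here, the source address) of the sc0 tracked entry
  log-format "tracked_key=%[sc_key(0)]"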
stksess_getkey(t, ts) returns a stktable_key struct pointer filled with
data from the input <ts> entry in the <t> table. The returned pointer
uses the static_table_key variable. Indeed, the stktable_key struct is
more convenient to manipulate than having to deal with key extraction
from the stksess struct directly.
When smp_to_stkey() deals with SINT samples, since stick tables deal
with 32-bit integers while the SINT sample is a 64-bit integer, the
conversion was done in place in smp_to_stkey(): the 64-bit integer was
truncated before the key would point to it. Unfortunately this only
works on little-endian architectures, because on big-endian ones the key
would point to the wrong 32-bit range.
To fix the issue and make the conversion endian-proof, let's re-assign
the sample as a 32-bit integer before the key points to it.
Thanks to Willy for having spotted the bug and suggesting the above fix.
It should be backported to all stable versions.
Some actions such as "sc0_get_gpc0" (using smp_fetch_sc_stkctr()
internally) can take an optional table name as parameter to perform the
lookup on a different table from the tracked one but using the key from
the tracked entry. It is done by leveraging the stktable_lookup() function
which was originally meant to perform intra-table lookups.
Calling sc0_get_gpc0() with a different table name will result in
stktable_lookup() being called to perform the lookup using a stksess
from a different table. While it is theoretically fine, it comes with a
pitfall: both tables (the one from where the stksess originates and the
actual target table) should rely on the exact same key type and length.
Failure to do so actually results in undefined behavior, because the key
type and/or length from one table is used to perform the lookup in
another table, while the underlying lookup API expects explicit type and
key length.
For instance, consider the below example:
  peers testpeers
      bind 127.0.0.1:10001
      server localhost

      table test type binary len 1 size 100k expire 1h store gpc0
      table test2 type string size 100k expire 1h store gpc0

  listen test_px
      mode http
      bind 0.0.0.0:8080
      http-request track-sc0 bin(AA) table testpeers/test
      http-request track-sc1 str(ok) table testpeers/test2
      log-format "%[sc0_get_gpc0(testpeers/test2)]"
      log stdout format raw local0
      server s1 git.haproxy.org:80
Performing a curl request to localhost:8080 will cause uninitialized
reads because the string "ok" from the test2 table will be compared as a
string against the "AA" binary sample, which is not NULL terminated:
  ==2450742== Conditional jump or move depends on uninitialised value(s)
  ==2450742==    at 0x484F238: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
  ==2450742==    by 0x27BCE6: stktable_lookup (stick_table.c:539)
  ==2450742==    by 0x281470: smp_fetch_sc_stkctr (stick_table.c:3580)
  ==2450742==    by 0x283083: smp_fetch_sc_get_gpc0 (stick_table.c:3788)
  ==2450742==    by 0x2A805C: sample_process (sample.c:1376)
So let's prevent that by adding some comments to the
stktable_set_entry() function description, and by adding a check in
smp_fetch_sc_stkctr() to ensure both the source stksess and the target
table share the same key properties.
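A hedged sketch of the added check (field names per the stktable
struct, check shape approximate):

  /* refuse cross-table lookups when key properties don't match */
  if (stkctr->table->type != t->type ||
      stkctr->table->key_size != t->key_size)
      return NULL; /* incompatible tables: don't perform the lookup */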
While it could be relevant to backport this in all stable versions, it is
probably safer to wait for some time before doing so, to ensure that no
existing configs rely on this ambiguity because the fact that the target
table and source stksess entry need to share the same key type and length
is not explicitly documented.
As discussed in GH #2286, {set, clear, show} table commands were unable
to deal with array types such as gpt, because they handled such types as
non-array types, so only the first entry (i.e. gpt[0]) was considered.
In this patch we add an extra logic around array-types handling so that
it is possible to specify an array index right after the type, like this:
  set table peer/table key mykey data.gpt[2] value
  # where 2 is the entry index that we want to access
If no index is specified, then it implicitly defaults to 0 to mimic
previous behavior.
Same as stktable_get_data_type(), but tries to parse an optional index
in the form "name[idx]" (only for array types).
Falls back to stktable_get_data_type() when no index is provided.
Thanks to the previous commit, the stktable struct now has a "flags"
member. Let's take this opportunity to remove the isolated "nopurge"
attribute in the stktable struct and rely on a flag named STK_FL_NOPURGE
instead.
This helps to better organize stktable struct members.
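A minimal sketch of the change:

  /* before: if (!t->nopurge) ...   after: */
  if (!(t->flags & STK_FL_NOPURGE))
      stktable_trash_oldest(t, budget);  /* purge only when allowed */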
When "recv-only" keyword is added on a stick table declaration (in peers
or proxy section), haproxy considers that the table is only used for
data retrieval from a remote location and not used to perform local
updates. As such, it enables the retrieval of local-only values such
as conn_cur that are ignored by default. This can be useful in some
contexts where we want to know about local-only values such as conn_cur
from a remote peer.
To do this, add stktable struct flags which default to NONE, and enable
the RECV_ONLY flag on the table when the "recv-only" keyword is found in
the table declaration. Then, in peer_treat_updatemsg(), when handling
table updates, don't ignore data updates for local-only values if the
flag is set.
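For example (illustrative config):

  peers mypeers
      bind 127.0.0.1:10000
      server hap2 192.168.0.2:10000
      # conn_cur pushed by the remote peer is stored instead of being
      # ignored as a local-only value
      table sessions type ip size 100k expire 1h store conn_cur recv-only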
The file name used to point to the calling function's stack for stick
tables, which was OK during parsing but remained dangling afterwards.
At least it was already marked const so as not to accidentally free it.
Let's make it point to a file_name_node now.
Add a factor parameter to stick-tables, called "brates-factor", that is
applied to in/out bytes rates to work around the 32-bit limit of the
frequency counters. Thanks to this factor, it is possible to have byte
rates beyond 4GB/s. Instead of counting each byte, we count blocks of
bytes. Among other things, it will be useful for the bwlim filter, to be
able to configure a shared limit exceeding 4GB/s.
For now, this parameter must be in the range ]0-1024].
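For example (illustrative config):

  backend st
      # count 1024-byte blocks instead of single bytes, so the 32-bit
      # frequency counters can express rates up to roughly 4TB/s
      stick-table type ip size 100k expire 1h store bytes_out_rate(1s) brates-factor 1024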
Since 2.5, an array of GPC is provided to replace legacy gpc0/gpc1.
src_inc_gpc is a sample fetch which is used to increment counters in
this array.
A crash occurs if src_inc_gpc is used without any previous track-sc
rule. This is caused by an error in smp_fetch_sc_inc_gpc(). When a
temporary stick counter is created via smp_create_src_stkctr(), the
table pointer argument used is not correct: it points to the counter ID
instead of the table argument. To fix this, use the proper sample fetch
second arg.
This can be reproduced with the following config:

  acl mark src_inc_gpc(0,<table>) -m bool
  tcp-request connection accept if mark
This should be backported up to 2.6.
Guarded functions to kill a sticky session, stksess_kill() and
stksess_kill_if_expired(), may or may not decrement and test its
reference counter before really killing it. This depends on a parameter:
if it is set to a non-zero value, the ref count is decremented and, if
it falls to zero, the session is killed. Otherwise, if this parameter is
equal to zero, the session is killed regardless of the ref count value.
In the code, these functions are always called with a non-zero
parameter and the ref count is always decremented and tested. So, there
is no reason to still have a special case, especially because it is not
really easy to say whether it is supported or not. Does it mean it is
possible to kill a sticky session while it is still referenced
somewhere? Probably not. So, does it mean it is possible to kill an
unreferenced session? This case may be problematic because the session
is accessed outside of any lock and thus may be released by another
thread because it is unreferenced. Enlarging the scope of the lock to
avoid any issue is possible, but it is a bit of a shame to do so because
there is no usage for now.
The best is to simplify the API and remove this case. Now, stksess_kill()
and stksess_kill_if_expired() functions always decrement and test the ref
count before killing a sticky session.
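A hedged sketch of the simplified semantics (the atomic macro is
haproxy's, the internal kill helper name is assumed):

  /* always decrement the ref count; only kill once nobody references
   * the session anymore */
  if (HA_ATOMIC_SUB_FETCH(&ts->ref_cnt, 1) != 0)
      return;
  __stksess_kill(t, ts);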
When we try to kill a session, the shard must be locked before decrementing
the ref count on the session. Otherwise, the ref count can fall to 0 and a
purge task (stktable_trash_oldest or process_table_expire) may release the
session before we have the opportunity to acquire the lock on the shard to
effectively kill the session. This could lead to a double free.
Here is the scenario:

  Thread 1                                 Thread 2

  stksess_kill(ts)
    if (ATOMIC_DEC(&ts->ref_cnt) != 0)
        return
    /* here the ref count is 0 */
                                           stktable_trash_oldest()
                                             LOCK(&sh_lock)
                                             if (!ATOMIC_LOAD(&ts->ref_cnt))
                                                 __stksess_free(ts)
                                             UNLOCK(&sh_lock)
    /* here the session was released */
    LOCK(&sh_lock)
    __stksess_free(ts) <--- double free
    UNLOCK(&sh_lock)
The bug was introduced in 2.9 by commit 7968fe3889 ("MEDIUM:
stick-table: change the ref_cnt atomically"). The ref count must be
decremented inside the lock for the stksess_kill() and
stksess_kill_if_expired() functions.
This patch should fix issue #2611. It must be backported as far as 2.9.
In 2.9 there is no sharding and the whole table is locked, so the patch
will have to be adapted.
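A hedged sketch of the fixed ordering (shard/lock field names
approximate):

  /* grab the shard lock first so that a concurrent purge cannot free
   * the session between our decrement and our own free */
  HA_RWLOCK_WRLOCK(STK_TABLE_LOCK, &t->shards[shard].sh_lock);
  if (!HA_ATOMIC_SUB_FETCH(&ts->ref_cnt, 1))
      __stksess_free(t, ts);
  HA_RWLOCK_WRUNLOCK(STK_TABLE_LOCK, &t->shards[shard].sh_lock);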
As reported by @Bbulatov in GH #2586, stktable_data_ptr() return value is
used without checking it isn't NULL first, which may happen if the given
type is invalid or not stored in the table.
However, since data_type is set by table_prepare_data_request() right
before cli_io_handler_table() is invoked, data_type is not expected to
be invalid: table_prepare_data_request() already checked that the type
is stored inside the table. Thus stktable_data_ptr() should not fail at
this point, so we add a BUG_ON() to indicate that.
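A minimal sketch:

  ptr = stktable_data_ptr(t, ts, dt);
  /* dt was validated by table_prepare_data_request(); a NULL result
   * would be a logic bug, not a runtime condition */
  BUG_ON(!ptr);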
During changes made in 2.7 by commits 8d3c3336f9 ("MEDIUM: stick-table:
make stksess_kill_if_expired() avoid the exclusive lock") and 996f1a5124
("MEDIUM: stick-table: do not take a lock to update t->current anymore."),
the operation was done cautiously one baby step at a time and the final
cleanup was not done, as we're keeping a read lock under an atomic dec.
Furthermore there's a pool_free() call under that lock, and we try to
avoid pool_alloc() and pool_free() under locks for their nasty side
effects (e.g. when memory gets recompacted), so let's really drop it
now.
Note that the performance gain is not really perceptible here, it's
essentially for code clarity reasons that this has to be done.
Due to the code in stktable_touch_with_exp() being the same as in other
functions previously built around a loop trying first to upgrade a read
lock, then falling back to a direct write lock, there remains a
confusing construct with multiple tests on use_wrlock, which is
obviously zero when tested. Let's remove them since the value is known
and the loop does not exist anymore.
In GH issue #2552, Christian Ruppert reported an increase in crashes
with recent 3.0-dev versions, always related with stick-tables and peers.
One particularity of his config is that it has a lot of peers.
While trying to reproduce, it empirically was found that firing 10 load
generators at 10 different haproxy instances tracking a random key among
100k against a table of max 5k entries, on 8 threads and between a total
of 50 parallel peers managed to reproduce the crashes in seconds, very
often in ebtree deletion or insertion code, but not only.
The debugging revealed that the crashes are often caused by a parent node
being corrupted while delete/insert tries to update it regarding a recently
inserted/removed node, and that that corrupted node had always been proven
to be deleted, then immediately freed, so it ought not be visited in the
tree from functions enclosed between a pair of lock/unlock. As such the
only possibility was that it had experienced unexpected inserts. Also,
running with pool integrity checking would 90% of the time cause crashes
during allocation based on corrupted contents in the node, likely because
it was found at two places in the same tree and still present as a parent
of a node being deleted or inserted (hence the __stksess_free and
stktable_trash_oldest callers being visible on these items).
Indeed the issue is in fact related to the test set (occasionally redundant
keys, many peers). What happens is that sometimes, a same key is learned
from two different peers. When it is learned for the first time, we end up
in stktable_touch_with_exp() in the "else" branch, where the test for
existence is made before taking the lock (since commit cfeca3a3a3
("MEDIUM: stick-table: touch updates under an upgradable read lock") that
was merged in 2.9), and from there the entry is added. But if one of
the threads manages to insert it before the other thread takes the lock,
then the second thread will try to insert this node again. And inserting
an already inserted node will corrupt the tree (note that we never
switched to enforcing a check in insertion code on this due to API
history that would break various code parts).
Here the solution is simple: it requires rechecking leaf_p after getting
the lock, to avoid touching anything if the entry has already been
inserted in the meantime.
Many thanks to Christian Ruppert for testing this and for his invaluable
help on this hard-to-trigger issue.
This fix needs to be backported to 2.9.
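A hedged sketch of the recheck (lock label and field names
approximate):

  HA_RWLOCK_WRLOCK(STK_TABLE_LOCK, &t->updt_lock);
  /* recheck under the lock: another thread may have inserted the
   * entry between our lockless check and the lock acquisition */
  if (!ts->upd.node.leaf_p) {
      ts->upd.key = ++t->update;
      eb32_insert(&t->updates, &ts->upd);
  }
  HA_RWLOCK_WRUNLOCK(STK_TABLE_LOCK, &t->updt_lock);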
When a sticky session is killed, we must be sure no other entity is
still referencing it. The session's ref_cnt must be 0. However, there is
a race with peers, as described in 21447b1dd4 ("BUG/MAJOR:
stick-tables: fix race with peers in entry expiration"). When the update
lock is acquired, we must recheck the ref_cnt value.
This patch is part of a debugging session about issue #2552. It must be
backported to 2.9.
It is the same issue as the one fixed in process_table_expire()
(21447b1dd4 ["BUG/MAJOR: stick-tables: fix race with peers in entry
expiration"]). In stktable_trash_oldest(), when the update lock is
acquired, we must take care to check the ref_cnt again because some
peers may increment it (see the commit above for details).
This patch fixes a crash mentioned in 2552#issuecomment-2110532706. It
must be backported to 2.9.
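A hedged sketch of the recheck (simplified, lock label approximate):

  HA_RWLOCK_WRLOCK(STK_TABLE_LOCK, &t->updt_lock);
  /* recheck under the lock: a peer may have taken a reference in the
   * meantime, in which case the entry must not be freed */
  if (HA_ATOMIC_LOAD(&ts->ref_cnt) != 0) {
      HA_RWLOCK_WRUNLOCK(STK_TABLE_LOCK, &t->updt_lock);
      continue;  /* skip this entry, keep purging others */
  }
  eb32_delete(&ts->upd);
  HA_RWLOCK_WRUNLOCK(STK_TABLE_LOCK, &t->updt_lock);
  __stksess_free(t, ts);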
Since 3.0-dev7 with commit 1a088da7c2 ("MAJOR: stktable: split the keys
across multiple shards to reduce contention"), building without threads
yields a warning about the shard not being used. This is because the
locks API does nothing with its arguments, and that is the only place
where the shard is being used. We cannot modify the lock API to pretend
to consume its argument because quite often it's not even instantiated.
Let's just pretend we consume shard using an explicit ALREADY_CHECKED()
statement instead. While we're at it, let's make sure that XXH32() is
not called when there is a single bucket!
No backport is needed.
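A hedged sketch of both points:

  shard = 0;
  if (CONFIG_HAP_TBL_BUCKETS > 1)
      shard = XXH32(key->key, key->key_len, 0) % CONFIG_HAP_TBL_BUCKETS;
  /* without threads the locks API expands to nothing and <shard> is
   * otherwise unused: silence the "set but not used" warning */
  ALREADY_CHECKED(shard);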
In 2.9 with commit 7968fe3889 ("MEDIUM: stick-table: change the ref_cnt
atomically") we significantly relaxed the stick-tables locking when
dealing with peers by adjusting the ref_cnt atomically and moving it
out of the lock.
However it opened a tiny window that became problematic in 3.0-dev7
when the table's contention was lowered by commit 1a088da7c2 ("MAJOR:
stktable: split the keys across multiple shards to reduce contention").
What happens is that some peers may access the entry for reading at the
moment it's about to expire; while the read accesses to push the data
remain unnoticed (possibly that from time to time we push crap), the
release of the refcount causes a new write that may damage anything
else. The scenario is the following:
  process_table_expire()               peer_send_teachmsgs()

                                          RDLOCK(&updt_lock);
    tick_is_expired() != 0
    ebmb_delete(ts->key);
    if (ts->upd.node.leaf_p) {
                                          HA_ATOMIC_INC(&ts->ref_cnt);
                                          RDUNLOCK(&updt_lock);
      WRLOCK(&updt_lock);
      eb32_delete(&ts->upd);
    }
    __stksess_free(t, ts);
                                          peer_send_updatemsg(ts);
                                          RDLOCK(&updt_lock);
                                          HA_ATOMIC_DEC(&ts->ref_cnt);
Here it's clear that the bottom part of peer_send_teachmsgs() believes
it is protected but may act on freed data.
This is more visible when enabling -dMtag,no-merge,integrity because
the ATOMIC_DEC(&ref_cnt) decrements one byte in the area, that makes
the eviction check fail while the tag has the address of the left
__stksess_free(), proving a completed pool_free() before the decrement,
and the anomaly there is pretty visible in the crash dump. Changing
INC()/DEC() with ADD(2)/DEC(2) shows that the byte is now off by two,
confirming that the operation happened there.
The solution is not very hard, it consists in checking for the ref_cnt
on the left after grabbing the lock, and doing both before deleting the
element, so that we have the guarantee that either the peer will not
take it or that it has already started taking it.
This was proven to be sufficient, as instead of crashing after 3s of
injection with 4 peers, 16 threads and 130k RPS, it survived for 15mn.
In order to stress the setup, a config involving 4+ peers, tracking
HTTP request with randoms and applying a bwlim-out filter with a
random key, with a client made of 160 h2 conns downloading 10 streams
of 4MB objects in parallel managed to trigger it within a few seconds:
  frontend ft
      http-request track-sc0 rand(100000) table tbl
      filter bwlim-out lim-out limit 2047m key rand(100000000),ipmask(32) min-size 1 table tbl
      http-request set-bandwidth-limit lim-out
      use_backend bk

  backend bk
      server s1 198.18.0.30:8000
      server s2 198.18.0.34:8000

  backend tbl
      stick-table type ip size 1000k expire 1s store http_req_cnt,bytes_in_rate(1s),bytes_out_rate(1s) peers peers
This seems to be very dependent on the timing and setup though.
This will need to be backported to 2.9. This part of the code was
reindented with shards but the block should remain mostly unchanged.
The logic to apply is the same.
When adding the shards support to tables in commit 1a088da7c ("MAJOR:
stktable: split the keys across multiple shards to reduce contention"),
the condition to stop eliminating entries once the batch size is reached
was based on a pre-decrement of the max_search counter. But the code
then goes back into the outer loop, which doesn't check it; the next
time the counter is checked, when entering the next shard, it becomes
even more negative and the loop properly stops, but at first glance it
looks like an int overflow (which it is not). Let's make sure the outer
loop stops on this condition so that we don't continue searching when
the limit is reached.
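A hedged sketch of the corrected loop structure:

  for (shard = 0; shard < CONFIG_HAP_TBL_BUCKETS; shard++) {
      if (max_search <= 0)
          break;  /* batch budget exhausted in a previous shard */
      /* per-shard eviction loop, pre-decrementing max_search */
  }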
While changing the stick-table indexing that led to commit 1a088da7c
("MAJOR: stktable: split the keys across multiple shards to reduce
contention"), I met a problem with the task's expiration date being
incorrectly updated, I fixed it and apparently I committed the wrong
version :-/
The effect is that the task's date is only correctly reset if the
table is empty, otherwise the task wakes up again and is queued at
the previous date, eating 100% CPU. The tick_isfirst() must not be
used when storing the last result.
No backport is needed as this was only merged in 3.0-dev7.
This bug arrived with this commit:

  MAJOR: stktable: split the keys across multiple shards to reduce contention

At this time, there are no callers which call stktable_get_entry()
without checking the nullity of the <key> passed as parameter. But the
documentation of this function says it supports the case where <key> is
null.
Move the nullity test on <key> to the first statement of this function.
Thanks to @chipitsine for having reported this issue in GH #2518.
In order to reduce the contention on the table when keys expire
quickly, we're spreading the load over multiple trees. This applies to
both the keys and the expiration dates. The shard number is calculated
from the key value itself, both when looking up and when setting it.
The "show table" dump on the CLI iterates over all shards so that the
output is not fully sorted, it's only sorted within each shard. The Lua
table dump just does the same. It was verified with a Lua program to
count stick-table entries that it works as intended (the test case is
reproduced here as it's clearly not easy to automate as a vtc):
  function dump_stk()
      local dmp = core.proxies['tbl'].stktable:dump({});
      local count = 0
      for _, __ in pairs(dmp) do
          count = count + 1
      end
      core.Info('Total entries: ' .. count)
  end
  core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0);

  ##

  global
      tune.lua.log.stderr on
      lua-load-per-thread lua-cnttbl.lua

  listen front
      bind :8001
      http-request lua.dump_stk if { path_beg /stk }
      http-request track-sc1 rand(),upper,hex table tbl
      http-request redirect location /

  backend tbl
      stick-table size 100k type string len 12 store http_req_cnt

  ##

  $ h2load -c 16 -n 10000 0:8001/
  $ curl 0:8001/stk
  ## A count close to 100k appears on haproxy's stderr
  ## On the CLI, "show table tbl" | wc will show the same.
Some large parts were reindented only to add a top-level loop to iterate
over shards (e.g. process_table_expire()). Better check the diff using
git show -b.
The number of shards is decided just like for the pools, at build time
based on the max number of threads, so that we can keep a constant. Maybe
this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used,
and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the
measurements made for the pools. It turns out that this value seems to
be the most reasonable one without inflating the struct stktable too
much. By default for 1024 threads the value is 32 and delivers 980k RPS
in a test involving 80 threads, while adding 1kB to the struct stktable
(roughly doubling it). The same test at 64 gives 1008 kRPS and at 128
it gives 1040 kRPS for 8 times the initial size. 16 would be too low
however, with 675k RPS.
The stksess already has a shard number; it's the one used to decide
which peer connection to send the entry on. Maybe we should also store
the one associated with the entry itself instead of recalculating it,
though it does not happen that often. The operation is done by hashing
the key using XXH32().
The peers also take and release the table's lock, but the way it's used
is not very clear yet, so at this point it's certain this will not work.
At this point, this allowed us to completely unlock the performance on
an 80-thread setup:
  before: 5.4 Gbps, 150k RPS, 80 cores
    52.71%  haproxy  [.] stktable_lookup_key
    36.90%  haproxy  [.] stktable_get_entry.part.0
     0.86%  haproxy  [.] ebmb_lookup
     0.18%  haproxy  [.] process_stream
     0.12%  haproxy  [.] process_table_expire
     0.11%  haproxy  [.] fwrr_get_next_server
     0.10%  haproxy  [.] eb32_insert
     0.10%  haproxy  [.] run_tasks_from_lists

  after: 36 Gbps, 980k RPS, 80 cores
    44.92%  haproxy  [.] stktable_get_entry
     5.47%  haproxy  [.] ebmb_lookup
     2.50%  haproxy  [.] fwrr_get_next_server
     0.97%  haproxy  [.] eb32_insert
     0.92%  haproxy  [.] process_stream
     0.52%  haproxy  [.] run_tasks_from_lists
     0.45%  haproxy  [.] conn_backend_get
     0.44%  haproxy  [.] __pool_alloc
     0.35%  haproxy  [.] process_table_expire
     0.35%  haproxy  [.] connect_server
     0.35%  haproxy  [.] h1_headers_to_hdr_list
     0.34%  haproxy  [.] eb_delete
     0.31%  haproxy  [.] srv_add_to_idle_list
     0.30%  haproxy  [.] h1_snd_buf
WIP: uint64_t -> long
WIP: ulong -> uint
code is much smaller
Thanks to the previous commit, we can now simply perform an atomic read
on stksess->seen and take the write lock to recreate the entry only if
at least one peer has seen it, otherwise leave it untouched. On a test
on 40 cores, the performance used to drop from 2.10 to 1.14M RPS when
one peer was connected, now it drops to 2.05, thus there's basically
no impact of connecting a peer vs ~45% previously, all spent in the
read lock. This can be particularly important when often updating the
same entries (user-agent, source address during an attack etc).
Right now we're taking the stick-tables update lock for reads just for
the sake of checking if the update index is past it or not. That's
costly because even taking the read lock is sufficient to provoke a
cache line write, while when under load or attack it's frequent that
the update has not yet been propagated and wouldn't require anything.
This commit brings a new field to the stksess, "seen", which is zeroed
when the entry is updated, and set to one as soon as at least one peer
starts to consult it. This way it will reflect that the entry must be
updated again so that this peer can see it. Otherwise no update will
be necessary. For now the flag is only set/reset but not exploited.
Great care is taken to avoid writes whenever possible.
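A hedged sketch of the protocol:

  /* updater side: the entry changed, peers must consult it again */
  HA_ATOMIC_STORE(&ts->seen, 0);

  /* peer side: record that at least one peer consulted the entry;
   * test first to avoid a needless cache-line write */
  if (!HA_ATOMIC_LOAD(&ts->seen))
      HA_ATOMIC_STORE(&ts->seen, 1);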
In 2.7 we addressed a race condition in the stick tables expiration task
with commit fbb934d ("BUG/MEDIUM: stick-table: fix a race condition when
updating the expiration task"). The issue was that the task could be
running on another thread which would destroy its expiration timer
while one had just recalculated it and prepares to queue it, causing
a bug due to the attempt to queue an expired task. The fix consisted in
enclosing the change into the stick-table's lock, which had a very low
cost since it's done only after having checked that the date changed,
i.e. no more than once every millisecond.
But as reported by Ricardo and Felipe from Taghos in GitHub issue
#2508, a tiny race remained after the fix: the unlock() was done before
the call to task_queue(), leaving a small window for another thread to
run between unlock() and task_queue() and erase the timer. As confirmed,
it's sufficient to also protect the task_queue() call.
But overall this raises a point regarding the task_queue() API on tasks
that may run anywhere. A while ago an attempt was made at removing the
timer for woken-up tasks, but something like this would deserve to be
revisited with more atomicity on the timer manipulation (e.g. atomically
using task_schedule() instead, maybe). This should be backported to all
stable branches.
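A hedged sketch of the fix (field names approximate):

  HA_RWLOCK_WRLOCK(STK_TABLE_LOCK, &t->lock);
  if (t->exp_task->expire != exp_next) {
      t->exp_task->expire = exp_next;
      /* queue while still holding the lock so that no other thread
       * can erase the timer between the update and the queuing */
      task_queue(t->exp_task);
  }
  HA_RWLOCK_WRUNLOCK(STK_TABLE_LOCK, &t->lock);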
The main CLI I/O handler is responsible for interrupting processing on
shutdown/abort. It is not the responsibility of the I/O handlers of CLI
commands to take care of it.
In resolvers.c:rslv_promex_next_ts() and in
stick-tables.c:stk_promex_next_ts(), an unused argument was mistakenly
called "unsued" instead of "unused". Let's fix this in a separate patch
so that it can be omitted from backports if this causes build problems.
This adds a new pair of stored types in the stick-tables:
- glitch_cnt
- glitch_rate
These keep count of the number of glitches reported on a front connection,
in order to decide how to act with a badly defective client or a potential
attacker. For now nothing updates these counters, but all the infrastructure
needed to configure, update and retrieve them was added, including the doc.
No regtest was added yet since they're not filled yet.
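For example (illustrative config; nothing fills the counters yet, as
noted above):

  frontend fe
      bind :8080
      stick-table type ip size 100k expire 30s store glitch_cnt,glitch_rate(10s)
      http-request track-sc0 src
      # once the counters are updated, a defective client could be
      # rejected like this:
      http-request deny if { sc0_glitch_rate gt 10 }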
Commit 9b2717e7b ("MINOR: stktable: use {show,set,clear} table with
ptr") stores a pointer in a long long (64 bits), which fails the cast to
void* on 32-bit platforms:
  src/stick_table.c: In function 'table_process_entry_per_ptr':
  src/stick_table.c:5136:37: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
   5136 |                 ts = stktable_lookup_ptr(t, (void *)ptr);
On all our supported platforms, longs and pointers are of the same size,
so let's just turn this to a ulong instead.
In table_process_entry(), stktable_data_ptr() result is dereferenced
without checking if it's NULL first, which may happen when bad inputs
are provided to the function.
However, data_type and ts arguments were already checked prior to calling
the function, so we know for sure that stktable_data_ptr() will never
return NULL in this case.
Yet some static code analyzers such as Coverity are confused because
they think that the result might possibly be NULL.
(See GH #2398)
To make it explicit that we always provide good inputs and expect valid
result, let's switch to the __stktable_data_ptr() unsafe function.
This patch adds support for an optional ptr (0xffff form) instead of
the key argument to match against existing sticktable entries, e.g. if
the key is empty or cannot be matched on the CLI due to incompatible
characters.
Lookup is performed using a linear search, so it will be slower than a
key search, which relies on an eb tree lookup.
Example:

  set table mytable key mykey data.gpc0 1
  show table mytable
  > 0x7fbd00032bd8: key=mykey use=0 exp=86373242 shard=0 gpc0=1
  clear table mytable ptr 0x7fbd00032bd8
This patch depends on:
  - "MINOR: stktable: add table_process_entry helper function"
It should solve GH #2118.