haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-10 09:07:02 +02:00

Author	SHA1	Message	Date
Willy Tarreau	0bda33a3ec	MINOR: stick-tables: remove the uneeded read lock in stksess_free() During changes made in 2.7 by commits `8d3c3336f9` ("MEDIUM: stick-table: make stksess_kill_if_expired() avoid the exclusive lock") and `996f1a5124` ("MEDIUM: stick-table: do not take a lock to update t->current anymore."), the operation was done cautiously, one baby step at a time, and the final cleanup was never done: an atomic decrement is still performed under a read lock. Furthermore, there's a pool_free() call under that lock, and we try to avoid pool_alloc() and pool_free() under locks for their nasty side effects (e.g. when memory gets recompacted), so let's really drop it now. Note that the performance gain is not really perceptible here; this has to be done essentially for code clarity reasons.	2024-05-24 11:52:57 +02:00
Willy Tarreau	8580f9db20	CLEANUP: stick-tables: remove a few unneeded tests for use_wrlock Due to the code in stktable_touch_with_exp() being the same as in other functions previously made around a loop trying first to upgrade a read lock then to fall back to a direct write lock, there remains a confusing construct with multiple tests on use_wrlock that is obviously zero when tested. Let's remove them since the value is known and the loop does not exist anymore.	2024-05-24 11:52:19 +02:00
Willy Tarreau	77f286e8bc	BUG/MEDIUM: stick-tables: make sure never to create two same remote entries In GH issue #2552, Christian Ruppert reported an increase in crashes with recent 3.0-dev versions, always related to stick-tables and peers. One particularity of his config is that it has a lot of peers. While trying to reproduce, it was empirically found that firing 10 load generators at 10 different haproxy instances tracking a random key among 100k against a table of max 5k entries, on 8 threads and between a total of 50 parallel peers, managed to reproduce the crashes in seconds, very often in ebtree deletion or insertion code, but not only. The debugging revealed that the crashes are often caused by a parent node being corrupted while delete/insert tries to update it regarding a recently inserted/removed node, and that the corrupted node had always been proven to be deleted, then immediately freed, so it ought not be visited in the tree from functions enclosed between a pair of lock/unlock. As such, the only possibility was that it had experienced unexpected inserts. Also, running with pool integrity checking would, 90% of the time, cause crashes during allocation based on corrupted contents in the node, likely because it was found at two places in the same tree and was still present as a parent of a node being deleted or inserted (hence the __stksess_free and stktable_trash_oldest callers being visible on these items). Indeed, the issue is in fact related to the test set (occasionally redundant keys, many peers). What happens is that sometimes the same key is learned from two different peers. When it is learned for the first time, we end up in stktable_touch_with_exp() in the "else" branch, where the test for existence is made before taking the lock (since commit `cfeca3a3a3` ("MEDIUM: stick-table: touch updates under an upgradable read lock") that was merged in 2.9), and from there the entry is added.
But if one of the threads manages to insert it before the other thread takes the lock, then the second thread will try to insert this node again. And inserting an already inserted node will corrupt the tree (note that we never switched to enforcing a check in insertion code on this due to API history that would break various code parts). Here the solution is simple: it consists in rechecking leaf_p after getting the lock, to avoid touching anything if the entry has already been inserted in the meantime. Many thanks to Christian Ruppert for testing this and for his invaluable help on this hard-to-trigger issue. This fix needs to be backported to 2.9.	2024-05-24 11:52:11 +02:00
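The fix follows a check, lock, re-check pattern. Below is a minimal sketch, assuming a hypothetical `struct entry` whose `in_tree` flag plays the role of `leaf_p`, with a spinlock standing in for HAProxy's update lock. It is not HAProxy's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical entry: in_tree stands in for ts->upd.node.leaf_p. */
struct entry {
    atomic_bool in_tree;
};

/* Simple spinlock standing in for HAProxy's own update lock. */
static atomic_flag tree_lock = ATOMIC_FLAG_INIT;
static int tree_inserts;   /* number of real insertions, for illustration */

static void tree_lock_take(void)
{
    while (atomic_flag_test_and_set_explicit(&tree_lock, memory_order_acquire)) {
        /* spin */
    }
}

static void tree_lock_drop(void)
{
    atomic_flag_clear_explicit(&tree_lock, memory_order_release);
}

/* Returns 1 if this call inserted the entry, 0 if it was already queued. */
static int touch_entry(struct entry *e)
{
    /* Cheap pre-check without the lock. */
    if (atomic_load(&e->in_tree))
        return 0;

    tree_lock_take();
    /* Re-check under the lock: another thread may have inserted the entry
     * between the pre-check and the lock. Inserting it twice would corrupt
     * the tree. */
    if (atomic_load(&e->in_tree)) {
        tree_lock_drop();
        return 0;
    }
    tree_inserts++;                      /* stand-in for eb32_insert() */
    atomic_store(&e->in_tree, true);
    tree_lock_drop();
    return 1;
}
```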
Christopher Faulet	9938fb9c7a	BUG/MEDIUM: stick-tables: Fix race with peers when killing a sticky session When a sticky session is killed, we must be sure no other entity is still referencing it. The session's ref_cnt must be 0. However, there is a race with peers, as described in `21447b1dd4` ("BUG/MAJOR: stick-tables: fix race with peers in entry expiration"). When the update lock is acquired, we must recheck the ref_cnt value. This patch is part of a debugging session about issue #2552. It must be backported to 2.9.	2024-05-24 11:52:11 +02:00
Christopher Faulet	dfd938bad6	BUG/MEDIUM: stick-tables: Fix race with peers when trashing oldest entries It is the same as the one fixed in process_table_expire() (`21447b1dd4` ["BUG/MAJOR: stick-tables: fix race with peers in entry expiration"]). In stktable_trash_oldest(), when the update lock is acquired, we must take care to check the ref_cnt again because some peers may increment it (see the commit above for details). This patch fixes a crash mentioned in 2552#issuecomment-2110532706. It must be backported to 2.9.	2024-05-24 11:52:11 +02:00
Willy Tarreau	19f8762a98	BUILD: stick-tables: silence build warnings when threads are disabled Since 3.0-dev7 with commit `1a088da7c2` ("MAJOR: stktable: split the keys across multiple shards to reduce contention"), building without threads yields a warning about the shard not being used. This is because the locks API does nothing with its arguments, which is the only place where the shard is being used. We cannot modify the lock API to pretend to consume its argument because quite often it's not even instantiated. Let's just pretend we consume shard using an explicit ALREADY_CHECKED() statement instead. While we're at it, let's make sure that XXH32() is not called when there is a single bucket! No backport is needed.	2024-04-24 08:23:56 +02:00
Willy Tarreau	21447b1dd4	BUG/MAJOR: stick-tables: fix race with peers in entry expiration In 2.9 with commit `7968fe3889` ("MEDIUM: stick-table: change the ref_cnt atomically") we significantly relaxed the stick-tables locking when dealing with peers by adjusting the ref_cnt atomically and moving it out of the lock. However, it opened a tiny window that became problematic in 3.0-dev7 when the table's contention was lowered by commit `1a088da7c2` ("MAJOR: stktable: split the keys across multiple shards to reduce contention"). What happens is that some peers may access the entry for reading at the moment it's about to expire. The read accesses made to push the data remain unnoticed (possibly meaning that from time to time we push crap), but the release of the refcount causes a new write that may damage anything else. The scenario is the following (left: process_table_expire(), right: peer_send_teachmsgs()):
    process_table_expire()              peer_send_teachmsgs()
                                        RDLOCK(&updt_lock);
    tick_is_expired() != 0
    ebmb_delete(ts->key);
    if (ts->upd.node.leaf_p) {
                                        HA_ATOMIC_INC(&ts->ref_cnt);
                                        RDUNLOCK(&updt_lock);
        WRLOCK(&updt_lock);
        eb32_delete(&ts->upd);
    }
    __stksess_free(t, ts);
                                        peer_send_updatemsg(ts);
                                        RDLOCK(&updt_lock);
                                        HA_ATOMIC_DEC(&ts->ref_cnt);
Here it's clear that the bottom part of peer_send_teachmsgs() believes it is protected but may act on freed data. This is more visible when enabling -dMtag,no-merge,integrity, because the ATOMIC_DEC(&ref_cnt) decrements one byte in the area. That makes the eviction check fail while the tag holds the address of the __stksess_free() on the left, proving that a pool_free() completed before the decrement, and the anomaly is pretty visible in the crash dump. Replacing INC()/DEC() with ADD(2)/DEC(2) shows that the byte is now off by two, confirming that the operation happened there.
The solution is not very hard. It consists in checking the ref_cnt on the left side after grabbing the lock, and doing both before deleting the element, so that we have the guarantee that either the peer will not take it or it has already started taking it. This was proven to be sufficient: instead of crashing after 3s of injection with 4 peers, 16 threads and 130k RPS, it survived for 15 minutes. In order to stress the setup, a config involving 4+ peers, tracking HTTP requests with random keys and applying a bwlim-out filter with a random key, with a client made of 160 h2 conns downloading 10 streams of 4MB objects in parallel, managed to trigger it within a few seconds:
    frontend ft
        http-request track-sc0 rand(100000) table tbl
        filter bwlim-out lim-out limit 2047m key rand(100000000),ipmask(32) min-size 1 table tbl
        http-request set-bandwidth-limit lim-out
        use_backend bk
    backend bk
        server s1 198.18.0.30:8000
        server s2 198.18.0.34:8000
    backend tbl
        stick-table type ip size 1000k expire 1s store http_req_cnt,bytes_in_rate(1s),bytes_out_rate(1s) peers peers
This seems to be very dependent on the timing and setup though. This will need to be backported to 2.9. This part of the code was reindented with shards but the block should remain mostly unchanged. The logic to apply is the same.	2024-04-12 18:00:13 +02:00
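The fix comes down to checking ref_cnt only once the lock that peers need in order to take a reference is held. Below is a minimal sketch, assuming a spinlock stands in for updt_lock and a hypothetical `struct entry` replaces the stksess. It mirrors the ordering described above, not HAProxy's actual code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical entry: peers bump ref_cnt while holding the update lock. */
struct entry {
    atomic_int ref_cnt;
    bool expired;
};

static atomic_flag updt_lock = ATOMIC_FLAG_INIT;   /* stand-in for updt_lock */

static void updt_take(void)
{
    while (atomic_flag_test_and_set_explicit(&updt_lock, memory_order_acquire)) {
        /* spin */
    }
}

static void updt_drop(void)
{
    atomic_flag_clear_explicit(&updt_lock, memory_order_release);
}

/* Peer side: a reference is only taken while holding the lock. */
static void peer_take(struct entry *e)
{
    updt_take();
    atomic_fetch_add(&e->ref_cnt, 1);
    updt_drop();
}

/* Expiration side: ref_cnt is checked only after the lock is held and
 * before the entry is unlinked. Either a peer already holds a reference
 * (we back off) or, once unlinked, peers can no longer find the entry.
 * Returns true if the entry was freed. */
static bool expire_entry(struct entry *e)
{
    if (!e->expired)
        return false;

    updt_take();
    if (atomic_load(&e->ref_cnt) != 0) {
        updt_drop();                 /* still in use by a peer: keep it */
        return false;
    }
    /* ... unlink from the key and update trees here ... */
    updt_drop();
    free(e);
    return true;
}
```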
Willy Tarreau	90efe8a877	CLEANUP: stick-tables: always respect the to_batch limit when trashing When adding the shards support to tables with commit `1a088da7c` ("MAJOR: stktable: split the keys across multiple shards to reduce contention"), the condition to stop eliminating entries once the batch size is reached relies on a pre-decrement of the max_search counter, but the code now goes back into the outer loop, which doesn't check it. When the next shard is entered, the counter is decremented once more, becomes even more negative and the loop properly stops, but at first glance this looks like an int overflow (which it is not). Let's make sure the outer loop stops on this condition so that we don't continue searching once the limit is reached.	2024-04-12 17:58:54 +02:00
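Below is a minimal sketch of a budgeted eviction loop across shards, where the budget is also tested in the outer loop. `NB_SHARDS`, `evictable[]` and `trash_oldest()` are hypothetical, and the inner loop is much simpler than stktable_trash_oldest().

```c
#define NB_SHARDS 4   /* hypothetical shard count */

/* Hypothetical per-shard number of entries that could be evicted. */
static int evictable[NB_SHARDS];

/* Evicts at most to_batch entries across all shards and returns how many
 * were evicted. The budget is checked in both loops, so the outer loop
 * stops as soon as it is exhausted instead of visiting more shards. */
static int trash_oldest(int to_batch)
{
    int max_search = to_batch;
    int done = 0;

    for (int shard = 0; shard < NB_SHARDS && max_search > 0; shard++) {
        while (evictable[shard] > 0 && max_search > 0) {
            max_search--;
            evictable[shard]--;   /* stand-in for freeing one entry */
            done++;
        }
    }
    return done;
}
```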
Willy Tarreau	44a8f9e7fc	BUG/MEDIUM: stick-tables: fix the task's next expiration date While changing the stick-table indexing that led to commit `1a088da7c` ("MAJOR: stktable: split the keys across multiple shards to reduce contention"), I met a problem with the task's expiration date being incorrectly updated, I fixed it and apparently I committed the wrong version :-/ The effect is that the task's date is only correctly reset if the table is empty, otherwise the task wakes up again and is queued at the previous date, eating 100% CPU. The tick_isfirst() must not be used when storing the last result. No backport is needed as this was only merged in 3.0-dev7.	2024-04-12 17:58:54 +02:00
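With shards, the expiration task must wake up at the earliest date among all of them, recomputed from scratch instead of being combined with the task's previous date. Below is a minimal sketch under two assumptions: ticks are 32-bit wrapping values, and 0 means "never" (as TICK_ETERNITY does in HAProxy). `first_tick()` and `next_expiration()` are hypothetical helpers modelled on the semantics of tick_first(), not copied from it.

```c
#include <stdint.h>

#define TICK_ETERNITY 0u   /* 0 means "no expiration" */

/* Wrap-aware "earliest of two ticks", ignoring unset ones. */
static uint32_t first_tick(uint32_t t1, uint32_t t2)
{
    if (t1 == TICK_ETERNITY)
        return t2;
    if (t2 == TICK_ETERNITY)
        return t1;
    /* signed distance handles the 32-bit wrap-around */
    return (int32_t)(t2 - t1) >= 0 ? t1 : t2;
}

/* Next wakeup date: start from "never" and keep the earliest shard date,
 * so an empty or fully expired table yields TICK_ETERNITY and an old,
 * already passed date can never stick. */
static uint32_t next_expiration(const uint32_t *shard_exp, int nb_shards)
{
    uint32_t next = TICK_ETERNITY;

    for (int i = 0; i < nb_shards; i++)
        next = first_tick(next, shard_exp[i]);
    return next;
}
```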
Frederic Lecaille	fcb096f7cd	BUG/MINOR: stick-tables: Missing stick-table key nullity check This bug arrived with this commit: "MAJOR: stktable: split the keys across multiple shards to reduce contention". At this time, there are no callers which call stktable_get_entry() without checking the nullity of the <key> passed as parameter. But the documentation of this function says it supports the case where the <key> passed as parameter is null. Move the nullity test on <key> to the first statement of this function. Thanks to @chipitsine for having reported this issue in GH #2518.	2024-04-04 11:08:56 +02:00
Willy Tarreau	1a088da7c2	MAJOR: stktable: split the keys across multiple shards to reduce contention In order to reduce the contention on the table when keys expire quickly, we're spreading the load over multiple trees. This applies to both keys and expiration dates. The shard number is calculated from the key value itself, both when looking it up and when setting it. The "show table" dump on the CLI iterates over all shards, so the output is no longer fully sorted, only sorted within each shard. The Lua table dump does the same. A Lua program counting stick-table entries was used to verify that it works as intended (the test case is reproduced here as it's clearly not easy to automate as a vtc):
    function dump_stk()
      local dmp = core.proxies['tbl'].stktable:dump({});
      local count = 0
      for _, __ in pairs(dmp) do
        count = count + 1
      end
      core.Info('Total entries: ' .. count)
    end
    core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0);
    ##
    global
      tune.lua.log.stderr on
      lua-load-per-thread lua-cnttbl.lua
    listen front
      bind :8001
      http-request lua.dump_stk if { path_beg /stk }
      http-request track-sc1 rand(),upper,hex table tbl
      http-request redirect location /
    backend tbl
      stick-table size 100k type string len 12 store http_req_cnt
    ##
    $ h2load -c 16 -n 10000 0:8001/
    $ curl 0:8001/stk
    ## A count close to 100k appears on haproxy's stderr
    ## On the CLI, "show table tbl" | wc will show the same.
Some large parts were reindented only to add a top-level loop to iterate over shards (e.g. process_table_expire()). Better check the diff using git show -b. The number of shards is decided just like for the pools, at build time based on the max number of threads, so that we can keep a constant. Maybe this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used, and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the measurements made for the pools.
It turns out that this value seems to be the most reasonable one without inflating the struct stktable too much. By default, for 1024 threads the value is 32 and delivers 980k RPS in a test involving 80 threads, while adding 1kB to the struct stktable (roughly doubling it). The same test at 64 gives 1008k RPS, and at 128 it gives 1040k RPS for 8 times the initial size. 16 would be too low, however, with 675k RPS. The stksess already has a shard number; it's the one used to decide which peer connection to send the entry over. Maybe we should also store the one associated with the entry itself instead of recalculating it, though this does not happen that often. The operation is done by hashing the key using XXH32(). The peers also take and release the table's lock, but the way it's used is not very clear yet, so at this point it's sure this will not work. At this point, this allowed us to completely unlock the performance on an 80-thread setup:
    before: 5.4 Gbps, 150k RPS, 80 cores
      52.71% haproxy [.] stktable_lookup_key
      36.90% haproxy [.] stktable_get_entry.part.0
       0.86% haproxy [.] ebmb_lookup
       0.18% haproxy [.] process_stream
       0.12% haproxy [.] process_table_expire
       0.11% haproxy [.] fwrr_get_next_server
       0.10% haproxy [.] eb32_insert
       0.10% haproxy [.] run_tasks_from_lists
    after: 36 Gbps, 980k RPS, 80 cores
      44.92% haproxy [.] stktable_get_entry
       5.47% haproxy [.] ebmb_lookup
       2.50% haproxy [.] fwrr_get_next_server
       0.97% haproxy [.] eb32_insert
       0.92% haproxy [.] process_stream
       0.52% haproxy [.] run_tasks_from_lists
       0.45% haproxy [.] conn_backend_get
       0.44% haproxy [.] __pool_alloc
       0.35% haproxy [.] process_table_expire
       0.35% haproxy [.] connect_server
       0.35% haproxy [.] h1_headers_to_hdr_list
       0.34% haproxy [.] eb_delete
       0.31% haproxy [.] srv_add_to_idle_list
       0.30% haproxy [.] h1_snd_buf
    WIP: uint64_t -> long
    WIP: ulong -> uint
    code is much smaller	2024-04-03 17:34:47 +02:00
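Shard selection can be sketched as a hash of the key bytes reduced modulo the number of buckets. This is a minimal illustration only. FNV-1a stands in for XXH32() so that the block has no dependency, and `TBL_BUCKETS`, `hash32()` and `key_shard()` are hypothetical names. The sketch also skips hashing when there is a single bucket.

```c
#include <stddef.h>
#include <stdint.h>

#define TBL_BUCKETS 32   /* hypothetical; HAProxy derives its value at build time */

/* 32-bit FNV-1a, used here only as a stand-in for XXH32(). */
static uint32_t hash32(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;

    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* The shard depends only on the key bytes, so looking up and inserting
 * the same key always land in the same tree. With a single bucket there
 * is nothing to hash. */
static unsigned int key_shard(const void *key, size_t len, unsigned int buckets)
{
    if (buckets <= 1)
        return 0;
    return hash32(key, len) % buckets;
}
```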
Willy Tarreau	864ac31174	OPTIM: stick-tables: check the stksess without taking the read lock Thanks to the previous commit, we can now simply perform an atomic read on stksess->seen and take the write lock to recreate the entry only if at least one peer has seen it, otherwise leave it untouched. On a test on 40 cores, the performance used to drop from 2.10 to 1.14M RPS when one peer was connected, now it drops to 2.05, thus there's basically no impact of connecting a peer vs ~45% previously, all spent in the read lock. This can be particularly important when often updating the same entries (user-agent, source address during an attack etc).	2024-04-03 17:34:47 +02:00
Willy Tarreau	4c1480f13b	MINOR: stick-tables: mark the seen stksess with a flag "seen" Right now we're taking the stick-tables update lock for reads just for the sake of checking whether the update index is past it or not. That's costly because even taking the read lock is sufficient to provoke a cache line write, while under load or attack it's frequent that the update has not yet been propagated and wouldn't require anything. This commit brings a new field to the stksess, "seen", which is zeroed when the entry is updated, and set to one as soon as at least one peer starts to consult it. This way it will reflect that the entry must be updated again so that this peer can see it. Otherwise no update will be necessary. For now the flag is only set/reset but not exploited. Great care is taken to avoid writes whenever possible.	2024-04-03 17:34:47 +02:00
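The "seen" marker turns the common case into a single relaxed load with no lock and no write. Below is a minimal sketch. A spinlock stands in for the update lock, a counter stands in for the requeue into the updates tree, and `peer_consult()` and `touch()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical entry carrying the "seen" marker. */
struct entry {
    atomic_uint seen;        /* 1 once a peer has consulted the entry */
};

static atomic_flag updt_lock = ATOMIC_FLAG_INIT;  /* stand-in for the update lock */
static unsigned int requeues;                     /* counts update-tree requeues */

/* Peer side: set the flag only if it is not set yet, to avoid dirtying
 * the cache line needlessly. */
static void peer_consult(struct entry *e)
{
    if (!atomic_load_explicit(&e->seen, memory_order_relaxed))
        atomic_store_explicit(&e->seen, 1, memory_order_relaxed);
}

/* Touch side: only lock when a peer has seen the previous version;
 * otherwise the pending update will already carry the new data. */
static bool touch(struct entry *e)
{
    if (!atomic_load_explicit(&e->seen, memory_order_relaxed))
        return false;                  /* no lock, no write */

    while (atomic_flag_test_and_set_explicit(&updt_lock, memory_order_acquire)) {
        /* spin */
    }
    atomic_store_explicit(&e->seen, 0, memory_order_relaxed);
    requeues++;                        /* stand-in for re-inserting the update */
    atomic_flag_clear_explicit(&updt_lock, memory_order_release);
    return true;
}
```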
Tim Duesterhus	cd5d62249f	CLEANUP: Reapply ist.cocci (3) This reapplies ist.cocci across the whole src/ tree.	2024-04-02 07:27:33 +02:00
Willy Tarreau	5fc1afb341	BUG/MEDIUM: stick-tables: fix a small remaining race in expiration task In 2.7 we addressed a race condition in the stick tables expiration task with commit `fbb934d` ("BUG/MEDIUM: stick-table: fix a race condition when updating the expiration task"). The issue was that the task could be running on another thread, which would destroy its expiration timer while one thread had just recalculated it and was preparing to queue it, causing a bug due to the attempt to queue an expired task. The fix consisted in enclosing the change in the stick-table's lock, which had a very low cost since it's done only after having checked that the date changed, i.e. no more than once every millisecond. But as reported by Ricardo and Felipe from Taghos in github issue #2508, a tiny race remained after the fix: the unlock() was done before the call to task_queue(), leaving a tiny window for another thread to run between unlock() and task_queue() and erase the timer. As confirmed, it's sufficient to also protect the task_queue() call. But overall this raises a point regarding the task_queue() API on tasks that may run anywhere. A while ago an attempt was made at removing the timer for woken up tasks, but something like this would be warranted with more atomicity on the timer manipulation (e.g. atomically use task_schedule() instead maybe). This should be backported to all stable branches.	2024-04-02 07:07:57 +02:00
Christopher Faulet	94b8ed446f	MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands The main CLI I/O handler is responsible for interrupting the processing on shutdown/abort. It is not the responsibility of the I/O handlers of CLI commands to take care of it.	2024-03-28 17:28:20 +01:00
Ilya Shipitsin	96cd04f8db	CLEANUP: fix typo in naming for variable "unused" In resolvers.c:rslv_promex_next_ts() and in stick-tables.c:stk_promex_next_ts(), an unused argument was mistakenly called "unsued" instead of "unused". Let's fix this in a separate patch so that it can be omitted from backports if this causes build problems.	2024-03-05 11:50:34 +01:00
Willy Tarreau	c9c6b683fb	MEDIUM: stick-tables: add a new stored type for glitch_cnt and glitch_rate This adds a new pair of stored types in the stick-tables: - glitch_cnt - glitch_rate These keep count of the number of glitches reported on a front connection, in order to decide how to act with a badly defective client or a potential attacker. For now nothing updates these counters, but all the infrastructure needed to configure, update and retrieve them was added, including the doc. No regtest was added yet since they're not filled yet.	2024-02-08 15:51:49 +01:00
Christopher Faulet	3e55b3da30	MEDIUM: promex/stick-table: Dump stick-table metrics via a promex module Create a promex module to dump stick-table metrics. Thanks to this patch, all references to stick tables were removed from the promex service.	2024-02-02 09:11:34 +01:00
Willy Tarreau	cdc993b19e	BUILD: stick-table: fix build error on 32-bit platforms Commit `9b2717e7b` ("MINOR: stktable: use {show,set,clear} table with ptr") stores a pointer in a long long (64-bit), which fails the cast to void* on 32-bit platforms:
    src/stick_table.c: In function 'table_process_entry_per_ptr':
    src/stick_table.c:5136:37: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
     5136 |   ts = stktable_lookup_ptr(t, (void *)ptr);
On all our supported platforms, longs and pointers are of the same size, so let's just turn this into a ulong instead.	2024-01-21 08:21:35 +01:00
Aurelien DARRAGON	41b7193e3c	MINOR: stktable: stktable_data_ptr() cannot fail in table_process_entry() In table_process_entry(), the stktable_data_ptr() result is dereferenced without first checking whether it's NULL, which may happen when bad inputs are provided to the function. However, the data_type and ts arguments were already checked prior to calling the function, so we know for sure that stktable_data_ptr() will never return NULL in this case. Still, some static code analyzers such as Coverity get confused because they think that the result might possibly be NULL (see GH #2398). To make it explicit that we always provide good inputs and expect a valid result, let's switch to the unsafe __stktable_data_ptr() function.	2024-01-02 08:51:51 +01:00
Aurelien DARRAGON	9b2717e7bb	MINOR: stktable: use {show,set,clear} table with ptr This patch adds support for an optional ptr (0xffff form) instead of the key argument to match against existing stick-table entries, i.e. when the key is empty or cannot be matched on the CLI due to incompatible characters. Lookup is performed using a linear search, so it will be slower than a key search, which relies on an eb tree lookup. Example:
    set table mytable key mykey data.gpc0 1
    show table mytable
    > 0x7fbd00032bd8: key=mykey use=0 exp=86373242 shard=0 gpc0=1
    clear table mytable ptr 0x7fbd00032bd8
This patch depends on:
  - "MINOR: stktable: add table_process_entry helper function"
It should solve GH #2118	2023-12-21 14:22:27 +01:00
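Matching by pointer means comparing addresses rather than keys. Below is a minimal sketch that uses a hypothetical `struct stksess_demo` list, not HAProxy's stksess or its trees. The argument is parsed with strtoul() into an unsigned long, which has the same size as a pointer on the platforms HAProxy supports. The address is only compared, never dereferenced, until a match is found.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stick-table entry kept in a singly linked list. */
struct stksess_demo {
    struct stksess_demo *next;
    const char *key;
};

/* Parses "0x7fbd00032bd8"-style input into a pointer; NULL on bad input. */
static void *parse_ptr(const char *arg)
{
    char *end;
    unsigned long v = strtoul(arg, &end, 16);   /* accepts an optional 0x prefix */

    if (end == arg || *end != '\0')
        return NULL;
    return (void *)v;
}

/* Linear search (O(n)), unlike a key lookup in a tree: only addresses are
 * compared, so a bogus pointer from the CLI is never dereferenced. */
static struct stksess_demo *lookup_ptr(struct stksess_demo *head, const void *ptr)
{
    for (struct stksess_demo *ts = head; ts; ts = ts->next)
        if (ts == ptr)
            return ts;
    return NULL;
}

/* Prints an entry identified by its address, as in the example above. */
static void show_entry(const struct stksess_demo *ts)
{
    printf("%p: key=%s\n", (const void *)ts, ts->key);
}
```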
Aurelien DARRAGON	6ee3923c52	MINOR: stktable: add table_process_entry helper function Only keep key-related logic in table_process_entry_per_key() function, and then use table_process_entry() function that takes an entry pointer as argument to process the entry.	2023-12-21 14:22:27 +01:00
Aurelien DARRAGON	2c4943c18b	BUG/MINOR: proxy/stktable: missing frees on proxy cleanup In `1b8e68e` ("MEDIUM: stick-table: Stop handling stick-tables as proxies.") we forgot to free the table pointer, which is now dynamically allocated. Let's take this opportunity to also fix a missing free in the table itself (the table expire task wasn't properly destroyed). This patch depends on:
  - "MINOR: stktable: add stktable_deinit function"
It should be backported to every stable version.	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	e10cf61099	MINOR: stktable: add stktable_deinit function Add a stktable_deinit() helper function to properly clean up a stick-table that was initialized using stktable_init().	2023-11-18 11:16:21 +01:00
Aurelien DARRAGON	8dae361f35	MINOR: stktable/cli: support v6tov4 and v4tov6 conversions Add a special treatment for the IPV4 and IPV6 cases in the table_process_entry_per_key() function so that the input string is parsed on a best-effort basis (STR to pseudo type ADDR): the input format is considered first, ahead of the table type, and then smp_to_stkey() does the type conversion for us when needed. This patch heavily depends on:
  - "MEDIUM: stktable/cli: simplify entry key handling"
And optionally depends on:
  - `72514a44` ("MEDIUM: tools/ip: v4tov6() and v6tov4() rework")	2023-11-08 16:38:06 +01:00
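Best-effort key parsing for an IPv6 table can be sketched as follows: try the input as IPv6 first, otherwise parse it as IPv4 and map it into ::ffff:0:0/96. This sketch assumes that only the IPv4-mapped form is involved. `map_v4tov6()`, `map_v6tov4()` and `parse_v6_key()` are hypothetical helpers, not HAProxy's v4tov6()/v6tov4() or smp_to_stkey().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Maps an IPv4 address into the IPv4-mapped IPv6 range ::ffff:0:0/96. */
static void map_v4tov6(struct in6_addr *dst, const struct in_addr *src)
{
    memset(dst, 0, sizeof(*dst));
    dst->s6_addr[10] = 0xff;
    dst->s6_addr[11] = 0xff;
    memcpy(&dst->s6_addr[12], &src->s_addr, 4);   /* already network order */
}

/* Reverse mapping; fails for addresses outside ::ffff:0:0/96. */
static bool map_v6tov4(struct in_addr *dst, const struct in6_addr *src)
{
    static const unsigned char prefix[12] = { 0,0,0,0, 0,0,0,0, 0,0,0xff,0xff };

    if (memcmp(src->s6_addr, prefix, sizeof(prefix)) != 0)
        return false;
    memcpy(&dst->s_addr, &src->s6_addr[12], 4);
    return true;
}

/* Input format first, table type second: accept either family and convert
 * IPv4 input for an IPv6 table instead of rejecting it. */
static bool parse_v6_key(const char *arg, struct in6_addr *out)
{
    struct in_addr v4;

    if (inet_pton(AF_INET6, arg, out) == 1)
        return true;
    if (inet_pton(AF_INET, arg, &v4) == 1) {
        map_v4tov6(out, &v4);
        return true;
    }
    return false;
}
```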
Aurelien DARRAGON	0a47e6bccc	MEDIUM: stktable/cli: simplify entry key handling Make use of smp_to_stkey() in table_process_entry_per_key() to simplify key handling and leverage auto type conversions from the sample API. One noticeable side effect is that integer input checks will be relaxed, given that the c_str2int() sample conv is more permissive than the integrated table_process_entry_per_key() integer parser.	2023-11-08 16:38:06 +01:00
Aurelien DARRAGON	c6826b9570	BUG/MINOR: stick-table/cli: Check for invalid ipv4 key When an ipv4 key is used to filter a CLI command on a stick table (clear/set/show table ...), the inetaddr_host+htonl combination was used with no error checking. Instead, we now use inet_pton(), which is what we use for ipv6 addresses since `b7c962b0c0` ("BUG/MINOR: stick-table/cli: Check for invalid ipv6 key"). Doing this allows us to easily check for parsing errors: we're trading off some parsing efficiency to better catch input errors and to ensure similar behavior between ipv4 and ipv6 address handling. This patch may be backported to all supported versions.	2023-11-08 16:38:06 +01:00
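Validating an IPv4 CLI key with inet_pton() can look like the minimal sketch below. `cli_parse_ipv4_key()` and its error string are hypothetical.

```c
#include <arpa/inet.h>
#include <stddef.h>

/* Parses an IPv4 key given on the CLI. Returns 0 and fills *out on success,
 * or -1 and sets *err (hypothetical message) when the input is malformed. */
static int cli_parse_ipv4_key(const char *arg, struct in_addr *out, const char **err)
{
    /* inet_pton() returns 1 on success and 0 when arg is not a valid
     * dotted-quad address, so parse errors cannot go unnoticed. */
    if (inet_pton(AF_INET, arg, out) != 1) {
        *err = "Invalid key";
        return -1;
    }
    return 0;
}
```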
Aurelien DARRAGON	5158c0ff69	MEDIUM: stktable/peers: "write-to" local table on peer updates In this patch, we add the possibility to declare on a table definition ("table" in a peers section, or "stick-table" in a proxy section) that we want the remote/peer updates on that table to be pushed to a local haproxy table in addition to the source table. Consider this example:
    peers mypeers
      peer local 127.0.0.1:3334
      peer clust 127.0.0.1:3333
      table t1.local type string size 10m store server_id,server_key expire 30s
      table t1.clust type string size 10m store server_id,server_key write-to mypeers/t1.local expire 30s
With this setup, we consider that haproxy uses t1.local as a cache/local table for read and write operations, and that t1.clust is a remote table containing data processed from t1.local and similar tables from other haproxy peers in a cluster setup. The t1.clust table will be used to refresh the local/cache one via the "write-to" statement. What will happen is that every time haproxy sees entry updates for the t1.clust table, it will overwrite the t1.local table with fresh data and update the entry expiration timer. If the t1.local entry doesn't exist yet (key doesn't exist), it will automatically create it. Note that only types that cannot be used for arithmetic ops will be handled, to prevent processed values from the remote table from interfering with computations based on values from the local table (i.e. to prevent cumulative counters from growing indefinitely). "write-to" will only push supported types if they exist in both the source and the target table. Be careful with server_id and server_key storage: they are often declared implicitly when referencing a table in sticking rules, but they must be declared explicitly to be pushed between a remote and a local table through the "write-to" option.
Also note that the "write-to" target table should have the same type as the source one, and that the key length should be strictly equal, otherwise haproxy will raise an error due to the tables being incompatible. A table that is already being written to cannot be used as a source table for a "write-to" target. Thanks to this patch, it will now be possible to use sticking rules in a peer cluster context by using a local table as a local cache which will be automatically refreshed by one or multiple remote table(s). This commit depends on:
  - "MINOR: stktable: stktable_init() sets err_msg on error"
  - "MINOR: stktable: check if a type should be used as-is"	2023-11-03 17:30:30 +01:00
Aurelien DARRAGON	db0cb54f81	MINOR: stktable: check if a type should be used as-is stick table types now have an extra bit named 'as_is' that allows us to check if such type should be used as-is or if it may be involved in arithmetic operations such as counters. This can be useful since those types are not common and may require specific handling. e.g.: stktable_data_types[data_type].as_is will be set to 1 if the type cannot be used in arithmetic operations.	2023-11-03 17:30:30 +01:00
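The "write-to" filtering and the as_is bit can be sketched together. This is a minimal illustration with hypothetical types and names (`enum data_type`, `data_types[]`, `write_to()`), not HAProxy's stktable_data_types[] or peers code. It copies only the types that are flagged as_is and stored by both tables.

```c
#include <stdbool.h>

/* Hypothetical data types; as_is marks values copied verbatim and never
 * used in arithmetic, unlike counters. */
enum data_type { DT_SERVER_ID, DT_SERVER_KEY, DT_HTTP_REQ_CNT, DT_MAX };

static const struct { const char *name; bool as_is; } data_types[DT_MAX] = {
    [DT_SERVER_ID]    = { "server_id",    true  },
    [DT_SERVER_KEY]   = { "server_key",   true  },
    [DT_HTTP_REQ_CNT] = { "http_req_cnt", false },
};

/* Hypothetical entry: which types its table stores, and their values. */
struct entry {
    bool stored[DT_MAX];
    long value[DT_MAX];
};

/* Pushes a remote update into the local ("write-to") entry and returns the
 * number of values copied. Counters are skipped so that remote values
 * cannot inflate local computations. */
static int write_to(struct entry *dst, const struct entry *src)
{
    int copied = 0;

    for (int t = 0; t < DT_MAX; t++) {
        if (!data_types[t].as_is || !src->stored[t] || !dst->stored[t])
            continue;
        dst->value[t] = src->value[t];
        copied++;
    }
    return copied;
}
```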
Aurelien DARRAGON	b8c19f877a	MINOR: stktable: stktable_init() sets err_msg on error stktable_init() now sets err_msg when an error occurs so that the caller is able to precisely report the cause of the failure.	2023-11-03 17:30:30 +01:00
Aurelien DARRAGON	6376fe9142	BUG/MINOR: stktable: missing free in parse_stick_table() When the "peers" keyword is encountered within a stick table definition, the peers.name hint gets replaced with a new copy of the provided name using strdup(). However, there is no detection of whether the name was previously set or not, so it is currently allowed to reuse the keyword multiple times to overwrite the previous value, but here we forgot to free the previous value of peers.name before assigning it a new one. This should be backported to every stable version.	2023-11-03 17:30:30 +01:00
Aurelien DARRAGON	7eb05891d8	BUG/MINOR: stktable: allow sc-add-gpc from tcp-request connection Following the previous commit's logic, we enable the use of sc-add-gpc from tcp-request connection. It was probably forgotten in the first place for sc-set-gpt0, and since sc-add-gpc was inspired by it, it lacks it too. As sc-add-gpc was implemented in `5a72d03a58` ("MINOR: stick-table: implement the sc-add-gpc() action"), this should only be backported to 2.8.	2023-08-14 09:03:49 +02:00
Aurelien DARRAGON	6c79309fda	BUG/MINOR: stktable: allow sc-set-gpt(0) from tcp-request connection Both the documentation and the original developer's intent seem to suggest that sc-set-gpt/sc-set-gpt0 actions should be available from tcp-request connection. Yet because it was probably forgotten when expr support was added to sc-set-gpt0 in `0d7712dff0` ("MINOR: stick-table: allow sc-set-gpt0 to set value from an expression"), it doesn't work and reports this kind of error: "internal error, unexpected rule->from=0, please report this bug!" Fix the code to comply with the documentation and the expected behavior. This must be backported to every stable version. [for < 2.5, as only sc-set-gpt0 existed back then, the patch must be manually applied to skip irrelevant parts]	2023-08-14 09:03:44 +02:00
Willy Tarreau	cfeca3a3a3	MEDIUM: stick-table: touch updates under an upgradable read lock Instead of taking the update's write lock in stktable_touch_with_exp(), even though most of the time under high load there is nothing to update because the entry is touched again before having been synchronized, let's do the check under a read lock and upgrade it to perform the update if needed. These updates are rare and the contention is not expected to be very high, so at the first failure to upgrade we retry directly with a write lock. By doing so, the performance has almost doubled again, from 1140k to 2050k rps with a peers section enabled. The contention is now on taking the read lock itself, so there's little to be gained beyond this in this function.	2023-08-11 19:03:35 +02:00
Willy Tarreau	87e072eea5	MEDIUM: stick-table: use a distinct lock for the updates tree Updating an entry in the updates tree is currently performed under the table's write lock, which causes huge contention with other accesses such as lookups and frees. Aside from the updates tree and the update, localupdate and commitupdate variables, nothing is manipulated, so let's create a distinct lock (updt_lock) to protect these together and remove this contention. It required adding an extra lock in the few places where we delete the update (though only if we're really going to delete it) to protect the tree. This is very convenient because now peer_send_teachmsgs() only needs to take this read lock, and there is very little contention left on the stick-table. With this alone, the performance jumped from 614k to 1140k/s on an 80-thread machine with a peers section! Stick-table updates with no peers, however, now have to take two locks and slightly regressed from 4.0-4.1M/s to 3.9-4.0M/s. This is fairly minimal compared to the significant unlocking of the peers updates and considered totally acceptable.	2023-08-11 19:03:35 +02:00
Willy Tarreau	7968fe3889	MEDIUM: stick-table: change the ref_cnt atomically Due to the ts->ref_cnt being manipulated and checked inside wrlocks, we continue to have it updated under plenty of read locks, which have an important cost on many-thread machines. This patch turns them all into atomic ops and carefully moves them outside of locks every time this is possible:
  - the ref_cnt is incremented before write-unlocking on creation, otherwise the element could vanish before we can do it
  - the ref_cnt is decremented after write-locking on release
  - for all other cases it's updated out of locks since it's guaranteed by the sequence that it cannot vanish
  - checks are done before locking every time it's used to decide whether we're going to release the element (saves several write locks)
  - expiration tests are just done using atomic loads, since there's no particular ordering constraint there; we just want consistent values.
For Lua, the loop that is used to dump stick-tables could switch to read locks only, but this was not done. For peers, the loop that builds updates in peer_send_teachmsgs is extremely expensive in write locks, and it doesn't seem they are really needed since the only updated variables are last_pushed and commitupdate, the first one being on the shared table (thus not used by other threads) and the commitupdate could likely be changed using a CAS. Thus all of this could theoretically move under a read lock, but that was not done here. On an 80-thread machine with a peers section enabled, the request rate increased from 415k to 520k rps.	2023-08-11 19:03:35 +02:00
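The ordering rules in the list above can be sketched with C11 atomics. The sketch makes three assumptions: a spinlock stands in for the table's write lock, `struct entry` is hypothetical, and allocation stands in for insertion. It shows the ordering only, not HAProxy's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical table entry with an atomic reference count. */
struct entry {
    atomic_int ref_cnt;
};

static atomic_flag tbl_lock = ATOMIC_FLAG_INIT;   /* stand-in for the table lock */

static void tbl_wrlock(void)
{
    while (atomic_flag_test_and_set_explicit(&tbl_lock, memory_order_acquire)) {
        /* spin */
    }
}

static void tbl_wrunlock(void)
{
    atomic_flag_clear_explicit(&tbl_lock, memory_order_release);
}

/* Creation: the reference is taken before dropping the write lock,
 * otherwise the fresh, unreferenced entry could be purged first. */
static struct entry *entry_create(void)
{
    struct entry *e;

    tbl_wrlock();
    e = calloc(1, sizeof(*e));         /* stand-in for allocate + insert */
    if (e)
        atomic_fetch_add(&e->ref_cnt, 1);
    tbl_wrunlock();
    return e;
}

/* Other cases: the caller already holds a reference, so the entry cannot
 * vanish and no lock is needed. */
static void entry_ref(struct entry *e)
{
    atomic_fetch_add(&e->ref_cnt, 1);
}

/* Release: the decrement happens after write-locking, so a purge decision
 * made under the same lock sees a stable count. Returns true when the last
 * reference was dropped. */
static bool entry_release(struct entry *e)
{
    bool last;

    tbl_wrlock();
    last = atomic_fetch_sub(&e->ref_cnt, 1) == 1;
    tbl_wrunlock();
    return last;
}
```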
Willy Tarreau	73b1dea4d1	MINOR: stick-table: move the task_wakeup() call outside of the lock The write lock in stktable_touch_with_exp() is quite expensive and should be shortened as much as possible. There's no need for it when calling task_wakeup() so let's move it out. On an 80-thread machine with a peers section, the request rate increased from 397k to 415k rps.	2023-08-11 19:03:35 +02:00
Willy Tarreau	322e4ab9d2	MINOR: stick-table: move the task_queue() call outside of the lock The write lock in stktable_requeue_exp() is quite expensive and should be shortened as much as possible. There's no need for it when calling task_queue() so let's move it out. On an 80-thread machine with a peers section, the request rate increased from 368k to 397k rps.	2023-08-11 19:03:35 +02:00
Christopher Faulet	208c712b40	MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE Here again, it is just a flag rename. SC flags no longer distinguish a shutdown for writes; there are only shutdowns.	2023-04-14 15:01:21 +02:00
William Lallemand	3f210970bf	BUG/MINOR: stick_table: alert when type len has incorrect characters Alert when the len argument of a stick table type contains incorrect characters. Replace atol by strtol. Could be backported to every maintained version.	2023-04-13 14:46:08 +02:00
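The reason for replacing atol() with strtol() in 3f210970bf is that atol() silently stops at the first invalid character, so "32k" parses as 32 with no error. strtol() reports where parsing stopped through its end pointer. The helper below is a hypothetical sketch of that check, not HAProxy's parser. It only validates characters and range, as the commit describes, and does not enforce any minimum or maximum length.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse the "len" argument of a stick-table type (e.g. the "32" in
 * "type string len 32"). Hypothetical helper: returns 0 and stores the
 * value on success, or -1 if the argument is empty, contains trailing
 * characters, or does not fit in a long. */
static int demo_parse_type_len(const char *arg, long *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (end == arg)          /* no digits at all */
		return -1;
	if (*end != '\0')        /* e.g. "32k": trailing garbage */
		return -1;
	if (errno == ERANGE)     /* overflow or underflow */
		return -1;
	*out = val;
	return 0;
}
```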
Christopher Faulet	7faac7cf34	MINOR: tree-wide: Simplifiy some tests on SHUT flags by accessing SCs directly In many places, we simplify the tests on SHUT flags to remove calls to the chn_prod() or chn_cons() functions because the corresponding SC is available.	2023-04-05 08:57:06 +02:00
Christopher Faulet	87633c3a11	MEDIUM: tree-wide: Move flags about shut from the channel to the SC The purpose of this patch is only a one-to-one replacement, as far as possible. CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the stream-connecter. The CF_ prefix is replaced by the SC_FL_ one. Of course, it is not so simple because in many places, we were testing whether a channel was shut for reads and writes at the same time. To do the same, the shutdown for reads must be tested on one SC and the shutdown for writes on the opposite SC. Special care was taken with process_stream(): the flags of the SCs must be saved to be able to detect changes, just like for the channels.	2023-04-05 08:57:06 +02:00
Aurelien DARRAGON	e2907c7ee3	MINOR: stick-table: add sc-add-gpc() to http-after-response sc-add-gpc() was implemented in `5a72d03` ("MINOR: stick-table: implement the sc-add-gpc() action") This new action was exposed everywhere sc-inc-gpc() is available, except for http-after-response. But there doesn't seem to be a technical constraint that prevents us from exposing it in http-after-response. It was probably overlooked, let's add it. No backport needed, unless `5a72d03` ("MINOR: stick-table: implement the sc-add-gpc() action") is being backported.	2023-03-17 13:09:09 +01:00
Aleksey Ponomaryov	593802128c	BUG/MEDIUM: stick-table: do not leave entries in end of window during purge At some moments expired stick table records stop being removed. This happens when the internal time wraps around the 32-bit limit, or every 49.7 days. What precisely happens is that some elements that are collected close to the end of the time window (2^32 - table's "expire" setting) might have been updated and will be requeued further, at the beginning of the next window. Here, three bad situations happen: - the incorrect integer-based comparison that is not aware of wrapping will cause the scan to restart from the freshly requeued element, skipping all those at the end of the window. The net effect of this is that at each wakeup of the expiration task, only one element from the end of the window will be expired, and other ones will remain there for a very long time, especially if they have to wait for all the predecessors to be picked one at a time after slow wakeups due to a long expiration ; this is what was observed in issue #2034 making the table fill up and appear as not expiring at all, and it seems that issue #2024 reports the same problem at the same moment (since such issues happen for everyone roughly at the same time when the clock doesn't drift too much). - the elements that were placed at the beginning of the next window are skipped as well for as long as there are refreshed entries at the end of the previous window, so these ones participate in filling the table as well. This is caused by the restart from the current, updated node that is generally placed after most other less recently updated elements. - once the last element at the end of the window is picked, suddenly there is a large number of expired entries at the beginning of the next window that all have to be requeued. If the expiration delay is large, the number can be big and it can take a long time, which can very likely explain the periodic crashes reported in issue #2025.
Limiting the batch size as done in commit `dfe79251d` ("BUG/MEDIUM: stick-table: limit the time spent purging old entries") would make sense for process_table_expire() as well. This patch addresses the incorrect tree scan algorithm to make sure that: - there's always a next element to compare against, even when dealing with the last one in the tree, the first one must be used ; - time comparisons used to decide whether to restart from the current element use tick_is_lt() as it is the only case where we know the current element will be placed before any other one (since the tree respects insertion ordering for duplicates). In order to reproduce the issue, it was found that injecting traffic on a random key that spans over half of the size of a table whose expiration is set to 15s while the date is going to wrap in 20s does exhibit an increase of the table's size 5s after startup, when entries start to be pushed to the next window. It's more effective when a second load generator constantly hammers the same key to be certain that none of them is ready to expire. This doesn't happen anymore after this patch. This fix needs to be backported to all stable versions. The bug has been there since stick tables were introduced in 1.4-dev7 with commit `3bd697e07` ("[MEDIUM] Add stick table (persistence) management functions and types"). A cleanup could consist of deduplicating that code by having process_table_expire() call __stktable_trash_oldest(), with that one improved to support an optional time check.	2023-02-08 08:55:02 +01:00
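The wrap-around problem in 593802128c comes down to comparing 32-bit dates with a plain `<`. A date taken just before the wrap is numerically huge, and one taken just after it is numerically tiny, so a naive comparison orders them backwards. The sketch below contrasts that with a signed-difference comparison in the spirit of HAProxy's tick_is_lt(). The helper names are hypothetical and this is not HAProxy's exact definition.

```c
#include <assert.h>
#include <stdint.h>

/* Naive comparison: wrong as soon as the 32-bit clock wraps. */
static int naive_is_lt(uint32_t t1, uint32_t t2)
{
	return t1 < t2;
}

/* Wrap-aware comparison: t1 is "before" t2 if the signed distance from
 * t2 to t1 is negative. Only valid while the two dates are less than
 * 2^31 ms (about 24.8 days) apart. The uint32_t -> int32_t conversion
 * is implementation-defined before C23 but is two's complement on all
 * mainstream compilers. */
static int wrap_is_lt(uint32_t t1, uint32_t t2)
{
	return (int32_t)(t1 - t2) < 0;
}
```

For example, with `before = 0xFFFFFF00` and `after = 0x00000100` (512 ms later, past the wrap), `naive_is_lt(before, after)` returns 0 while `wrap_is_lt(before, after)` returns 1.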
Christopher Faulet	da89e9b95b	MINOR: channel/applets: Stop to test CF_WRITE_ERROR flag if CF_SHUTW is enough In applets, we stop processing when a write error (CF_WRITE_ERROR) or a shutdown for writes (CF_SHUTW) is detected. However, any write error leads to an immediate shutdown for writes. Thus, it is enough to only test if CF_SHUTW is set.	2023-01-09 18:41:08 +01:00
Willy Tarreau	5a72d03a58	MINOR: stick-table: implement the sc-add-gpc() action This action increments the General Purpose Counter at the index <idx> of the array associated with the sticky counter designated by <sc-id> by the value of either integer <int> or the integer evaluation of expression <expr>. Integers and expressions are limited to unsigned 32-bit values. If an error occurs, this action silently fails and the actions evaluation continues. <idx> is an integer between 0 and 99 and <sc-id> is an integer between 0 and 2. It also silently fails if there is no GPC stored at this index. The entry in the table is refreshed even if the value is zero. The 'gpc_rate' is automatically adjusted to reflect the average growth rate of the gpc value. The main use of this action is to count scores or total volumes (e.g. estimated danger per source IP reported by the server or a WAF, total uploaded bytes, etc.).	2023-01-07 09:11:22 +01:00
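The "silently fails" semantics of sc-add-gpc() described in 5a72d03a58 can be modeled as a bounds check that returns without touching anything. This is a hypothetical sketch: the `demo_entry` layout is invented, the refresh and `gpc_rate` update are only noted in comments, and the commit does not say how overflow is handled, so the plain unsigned addition here (which wraps) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_GPC 100  /* <idx> is documented as 0..99 */

/* Hypothetical, simplified view of a tracked entry: only the first
 * n_gpc counters are actually stored by the table. */
struct demo_entry {
	unsigned int n_gpc;
	uint32_t gpc[DEMO_MAX_GPC];
};

/* Returns 1 if the value was added, 0 if the action silently fails. */
static int demo_sc_add_gpc(struct demo_entry *e, unsigned int idx, uint32_t value)
{
	if (idx >= DEMO_MAX_GPC || idx >= e->n_gpc)
		return 0;            /* no GPC stored at this index: skip */
	/* the real action refreshes the entry even when value is zero
	 * and also updates gpc_rate; neither is modeled here */
	e->gpc[idx] += value;    /* unsigned add; wraps on overflow here */
	return 1;
}
```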
Willy Tarreau	6c0117168e	MEDIUM: stick-table: set the track-sc limit at boottime via tune.stick-counters The number of stick-counter entries usable by track-sc rules is currently set at build time. There is no good value for this since the vast majority of users don't need any, most need only a few and rare users need more. Adding more counters for everyone increases memory and CPU usage for no reason. This patch moves the per-session and per-stream arrays to a pool of a size defined at boot time. This way it becomes possible to set the number of entries at boot time via a new global setting "tune.stick-counters" that sets the limit for the whole process. When this setting is absent, the build-time MAX_SESS_STR_CTR value still applies, or 3 if that is not defined either, as before. It is also possible to lower the value to 0 to save a bit of memory if not used at all. Note that a few low-level sample-fetch functions had to be protected due to the ability to use sample-fetches in the global section to set some variables.	2023-01-06 18:08:49 +01:00
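Commit 6c0117168e moves from a compile-time array size to one chosen at boot. The sketch below shows the general idea with a process-wide count that defaults to MAX_SESS_STR_CTR (or 3) and per-session arrays sized from it. It is hypothetical: HAProxy allocates these arrays from a pool, whereas this sketch uses calloc(), and the `demo_*` names are invented.

```c
#include <assert.h>
#include <stdlib.h>

#ifndef MAX_SESS_STR_CTR
#define MAX_SESS_STR_CTR 3   /* default when not set at build time */
#endif

struct demo_stkctr { void *entry; };  /* hypothetical per-counter slot */

/* Process-wide limit, set once at boot (e.g. from tune.stick-counters). */
static int demo_nb_stk_ctr = MAX_SESS_STR_CTR;

struct demo_session {
	struct demo_stkctr *stkctr;  /* demo_nb_stk_ctr slots, or NULL if 0 */
};

static int demo_session_init(struct demo_session *s)
{
	s->stkctr = NULL;
	if (demo_nb_stk_ctr == 0)
		return 0;   /* tracking disabled: no memory spent at all */
	s->stkctr = calloc((size_t)demo_nb_stk_ctr, sizeof(*s->stkctr));
	return s->stkctr ? 0 : -1;
}

static void demo_session_free(struct demo_session *s)
{
	free(s->stkctr);
	s->stkctr = NULL;
}
```

Because the size is only known at run time, every loop over the counters has to use the runtime limit instead of a constant, which is the main code-wide consequence of this kind of change.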
Christopher Faulet	a92480462c	MINOR: http-rules: Add missing actions in http-after-response ruleset This patch adds support for the following actions in the http-after-response ruleset: * set-map, del-map and del-acl * set-log-level * sc-inc-gpc, sc-inc-gpc0 and sc-inc-gpc1 * sc-set-gpt and sc-set-gpt0 This patch should solve issue #1980.	2023-01-05 11:23:59 +01:00
Willy Tarreau	20391519c3	BUG/MINOR: stick-table: report the correct action name in error message sc-inc-gpc() learned to use arrays in 2.5 with commit `4d7ada8f9` ("MEDIUM: stick-table: add the new arrays of gpc and gpc_rate"), but the error message says "sc-set-gpc" instead of "sc-inc-gpc". Let's fix this to avoid confusion. This can be backported to 2.5.	2023-01-02 17:35:50 +01:00

303 Commits