Instead of having table_process_entry() decrement the session's ref
counter, do it outside, from the caller. Some code paths were missed,
such as when an action was invalid, which would lead to the ref counter
not being decremented and the session never being destroyable.
It makes more sense to do that from the caller, which just obtained the
ref counter anyway.
This should be backported up to 2.8.
A great deal of the task_queue() contention is caused by this function
because its task is created using task_new_anywhere() and is subject to
heavy updates. Let's make it single-threaded by rotating the assigned
threads during initialization so that each table only runs on one
thread.
However there's a trick: the function used to call task_queue() to
requeue the task if it had advanced its timer (which may only happen
when learning an entry from a peer). We can't do that anymore since we
can't queue another thread's task. Thus, if the task needs to be
scheduled earlier than previously planned, we simply perform a wakeup.
It will likely do nothing and will self-adjust its next wakeup timer.
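As a rough standalone illustration of this scheme (hypothetical names
and plain C11 atomics, not HAProxy's task API), the owner thread is
picked round-robin at init time and other threads only request a
wakeup:

  #include <stdatomic.h>
  #include <stddef.h>

  struct sketch_table {
      int owner_thread;            /* only this thread may (re)queue the task */
      atomic_int wakeup_requested; /* "please run soon" hint from other threads */
  };

  /* rotate the owner across tables so each table's task runs on one thread */
  void assign_table_threads(struct sketch_table *tabs, size_t nb, int nb_threads)
  {
      for (size_t i = 0; i < nb; i++)
          tabs[i].owner_thread = (int)(i % nb_threads);
  }

  /* any thread: we cannot requeue a task owned by another thread, so we
   * only ask for a wakeup; the owner will most likely do nothing and will
   * self-adjust its next expiration timer when it runs.
   */
  void request_earlier_run(struct sketch_table *t)
  {
      atomic_store(&t->wakeup_requested, 1);
  }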
Doing so halves the number of multi-thread task wakeups. In addition
the request rate at saturation increased by 12% with 16 peers and 40
tables on 16 8-thread processes. This should improve the situation
described by Felipe in issues #3084 and #3101.
This should be backported to 3.2 after some extended checks.
In stktable_requeue_exp(), there's a tiny race at the beginning during
which we check the task's expiration date to decide whether or not to
wake process_table_expire() up. During this race, the task might just
have finished running on its owner thread, and we can miss a task_queue()
opportunity, which probably explains why, during testing, a few entries
were occasionally left at the end.
Let's perform a CAS to confirm the value is still the same before
leaving. This way we're certain that our value has been seen at least
once.
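A minimal standalone sketch of this confirmation step, using C11 atomics
and hypothetical names (the real code works on the task's expiration
date):

  #include <stdatomic.h>

  atomic_long task_expire;       /* stands for the task's expiration date */

  void wake_owner_task(void) { } /* stand-in for waking the expiry task */

  /* Make sure the task will run no later than <when>. The CAS re-checks
   * that the date we based our decision on is still in place; if it
   * changed under us (the task just ran), we fall back to a wakeup and
   * let the task recompute its own timer.
   */
  void ensure_task_runs_by(long when)
  {
      long cur = atomic_load(&task_expire);

      if (when >= cur &&
          atomic_compare_exchange_strong(&task_expire, &cur, cur))
          return;                /* unchanged: the current date covers us */
      wake_owner_task();
  }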
This should be backported to 3.2.
This function is a tasklet handler used to send peers updates, and it
shows up quite a bit in "show tasks" and "show profiling tasks", so let's
export it so that we don't face a cryptic symbol name:
  $ socat - /tmp/haproxy-n10.stat <<< "show tasks"
  Running tasks: 43 (8 threads)
    function                   places     %  lat_tot  lat_avg calls_tot calls_avg calls%
    process_table_expire           16  37.2   1.072m   4.021s    115831      7239   15.4
    task_process_applet            15  34.8   1.072m   4.287s    486299     32419   65.0
    stktable_add_pend_updates       8  18.6        -        -     89725     11215   12.0
    sc_conn_io_cb                   3   6.9        -        -      5007      1669    0.6
    process_peer_sync               1   2.3   4.293s   4.293s     50765     50765    6.7
This should be backported to 3.2 as it helps with debugging the
table+peers processing overhead.
The stick-table expiration of ref-counted entries was insufficiently
addressed by commit 324f0a60ab ("BUG/MINOR: stick-tables: never leave
used entries without expiration"), because now entries are just requeued
where they were, so they're visited over and over for long sessions,
causing process_table_expire() to loop, eating CPU and causing lock
contention.
Here we take care of refreshing their timer when they are met, so
that we don't meet them more than once per stick-table lifetime. It
should address at least a part of the recent degradation that Felipe
noticed in GH #3084.
Since the fix above was marked for backporting to 3.2, this one should
be backported there as well.
This one doesn't need to wait forever; if it cannot do its work, it can
postpone it. When building with a high value of
STKTABLE_MAX_UPDATES_AT_ONCE (1000),
it's still possible to trigger warnings in this function on the write lock
that is contended by peers and expiration. Changing it to a trylock
resolves the issue.
This should be backported to 3.2 after a bit of testing.
process_table_expire() can take quite a lot of time running over all
shards. During this time it will hinder track-sc rules and peers, which
will experience increased latency to do their work, especially peers,
where each message takes a lock whose cumulated time can exceed the
watchdog's patience.
Here we proceed just like in stktable_trash_oldest(), i.e. we use a
trylock to detect contention. The first time it happens, if we hadn't
purged anything, we switch to a regular lock to perform the operation,
and the next time it happens we abort. This guarantees that some entries
will be expired and that contention will be reduced when detected.
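A standalone sketch of that strategy (plain pthread locks and
hypothetical names standing in for the real shard locks and expiration
code):

  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define NB_SHARDS 4

  static pthread_rwlock_t shard_lock[NB_SHARDS];

  /* hypothetical stand-in for the real per-shard expiration work */
  static int expire_entries_in_shard(int shard) { (void)shard; return 1; }

  /* Visit all shards with a trylock; on the first contention, insist with
   * a blocking lock only if nothing was purged yet, otherwise give up.
   */
  static int expire_all_shards(void)
  {
      int purged = 0;
      bool contended = false;

      for (int s = 0; s < NB_SHARDS; s++) {
          if (pthread_rwlock_trywrlock(&shard_lock[s]) != 0) {
              if (purged || contended)
                  break;                              /* second time: abort */
              contended = true;
              pthread_rwlock_wrlock(&shard_lock[s]);  /* first time: insist */
          }
          purged += expire_entries_in_shard(s);
          pthread_rwlock_unlock(&shard_lock[s]);
      }
      return purged;
  }

  int main(void)
  {
      for (int s = 0; s < NB_SHARDS; s++)
          pthread_rwlock_init(&shard_lock[s], NULL);
      printf("expired %d entries\n", expire_all_shards());
      return 0;
  }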
With this change, various tests didn't manage to produce any warning,
including at the end of the load generation session.
This should be backported to 3.2 after a bit more testing.
stktable_trash_oldest() does insist a lot on purging what was requested,
only limited by STKTABLE_MAX_UPDATES_AT_ONCE. This is called in two
conditions, one to allocate a new stksess, and the other one to purge
entries of a stopping process. The cost of iterating over all shards
is huge, and a shard lock is taken each time before looking up entries.
Moreover, multiple threads can end up doing the same and looking hard for
many entries to purge when only one is needed. Furthermore, all threads
start from the same shard, hence synchronize their locks. All of this
costs a lot to other operations such as access from peers.
This commit simplifies the approach by ignoring the budget, starting
from a random shard number, and using a trylock so as to be able to
give up early in case of contention. The approach chosen here consists
in trying hard to flush at least one entry, but once at least one is
evicted or at least one trylock failed, then a failure on the trylock
will result in finishing.
The function now returns a success as long as one entry was freed.
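A reduced standalone sketch of the resulting logic (pthread mutexes and
hypothetical helpers standing in for the real shard locks and eviction
code):

  #include <pthread.h>
  #include <stdlib.h>

  #define NB_SHARDS 8

  pthread_mutex_t shard_lock[NB_SHARDS];  /* assumed initialized at startup */

  /* hypothetical stand-in: frees one old entry if possible, returns 1 if so */
  int free_oldest_in_shard(int shard) { (void)shard; return 1; }

  /* returns non-zero as soon as at least one entry was freed */
  int trash_oldest(void)
  {
      int start = rand() % NB_SHARDS;  /* don't let all threads hit shard 0 */
      int freed = 0, failed = 0;

      for (int i = 0; i < NB_SHARDS; i++) {
          int s = (start + i) % NB_SHARDS;

          if (pthread_mutex_trylock(&shard_lock[s]) != 0) {
              if (freed || failed)
                  break;               /* contention after progress: give up */
              failed = 1;
              continue;                /* first failure: keep trying others */
          }
          freed += free_oldest_in_shard(s);
          pthread_mutex_unlock(&shard_lock[s]);
          if (freed)
              break;                   /* one evicted entry is enough */
      }
      return freed;
  }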
With this, tests no longer show watchdog warnings during tests, though
a few still remain when stopping the tests (which are not related to
this function but to the contention from process_table_expire()).
With this change, under high contention the purge of some entries might
be postponed and the table may occasionally contain slightly more entries
than its configured size (though this already happens since stksess_new()
first increments ->current before decrementing it).
Measurements were made on a 64-core system with 8 peers
of 16 threads each, at CPU saturation (350k req/s each doing 10
track-sc) for 10M req, with 3 different approaches:
- this one resulted in 1500 failures to find an entry (0.015%
size overhead), with the lowest contention and the fairest
peers distribution.
- leaving only after a success resulted in 229 failures (0.0029%
size overhead) but doubled the time spent in the function (on
the write lock precisely).
- leaving only when both a success and a failed lock were met
resulted in 31 failures (0.00031% overhead) but the contention
was high enough again so that peers were not all up to date.
Considering that a saturated machine exceeding its entry count by
0.015% is pretty minimal, the mechanism is kept.
This should be backported to 3.2 after a bit more testing as it
resolves some watchdog warnings and panics. It requires the preceding
commit "MINOR: stick-table: permit stksess_new() to temporarily
allocate more entries" to over-allocate instead of failing in case
of contention.
stksess_new() calls stktable_trash_oldest() to release some entries.
If that fails, however, stksess_new() refuses to allocate an entry. This
is a problem because it doesn't permit stktable_trash_oldest() to be used
in best-effort mode, which forces it to impose high contention. There's
no problem with
allocating slightly more in practice. In the worst case if all entries are
in use, it's not shocking to temporarily exceed the number of entries by a
few units.
Let's relax this problematic rule. This patch might need to be backported
to 3.2 after a bit more testing in order to support locking relaxation.
This helps keep the contention level low: when we hold the update lock
that we know other parts may be relying on (peers, track-sc, etc.), we
decrease the remaining visit counters 4 times as fast to further reduce
the contention. At this point no more warnings are seen during
intense synchronization (2x64 cores, 1.5M req/s with a track-sc each,
5M entries in use).
As reported by Felipe in GH issue #3084, on large systems it's not
sufficient to leave the expiration process after a certain number of
expired entries, because if they accumulate too fast, it's still
possible to spend a lot of time visiting many of them (e.g. those still
in use).
Thus here we're taking a stricter approach consisting in counting the
number of visited entries, which allows us to leave early if we can't
do the expected work in a reasonable amount of time.
In order to avoid always stopping on the first shards and never visiting
the last ones, we always start from a random shard number and loop
from that one. This way even if we always leave early, all shards will
be handled equally.
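A rough sketch of the resulting loop shape, with hypothetical helper
names and an arbitrary budget value (not the real constants):

  #define NB_SHARDS  8
  #define MAX_VISITS 1000   /* arbitrary value for the sketch */

  /* hypothetical stand-in: visits at most <budget> entries of one shard,
   * expiring those that can be expired, and returns the number visited.
   */
  extern int visit_shard(int shard, int budget);
  extern int random_shard(void);

  /* returns non-zero if the budget ran out and more work remains */
  int expire_pass(void)
  {
      int start = random_shard();   /* don't always stop on the first shards */
      int budget = MAX_VISITS;

      for (int i = 0; i < NB_SHARDS && budget > 0; i++)
          budget -= visit_shard((start + i) % NB_SHARDS, budget);

      return budget <= 0;           /* caller reschedules the task soon */
  }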
This should be backported to 3.2.
When trying to kill/expire entries, if a ref-counted entry is found,
let's requeue it with its expiration timer instead of leaving it out,
because other ref-counters (e.g. peers) will not purge it otherwise,
leaving it orphan. This one seems trickier to trigger, though it appears
to happen sometimes when peers are late and a long resync is active and
competing with intense calls to process_table_expire() (i.e. when no
other activity is there).
This must be backported to 3.2. It's likely that older versions are
affected as well, but possibly differently since the expiration
mechanism changed between 3.1 and 3.2, so better not take unneeded
risks there.
In 3.2, the table expiration latency was improved by commit 994cc58576
("MEDIUM: stick-tables: Limit the number of entries we expire"), however
it introduced an issue by which it's possible to leave the loop after a
certain number of elements were expired, without requeuing the deleted
elements. The issue this causes is that other places holding a non-null
ref_cnt will not necessarily delete them themselves, resulting in orphan
elements in the table. These will then pollute it and force the recycling
of old ones more often, which in turn increases contention.
Let's check for the expiration counter before deleting the element so
that it can be found upon next visit.
This fix must be backported to 3.2. It is directly related to GH
issue #3084. Thanks to Felipe and Ricardo for sharing precious info
and testing a candidate fix.
Cap the sticky counter index with tune.nb_stk_ctr instead of
MAX_SESS_STKCTR for sc-add-gpc. The same logic is already implemented for
the sc-inc-gpc and sc-set-gpt keywords, so it appears to have been missed
for sc-add-gpc.
This fixes GitHub issue #3061. Thanks to @ma311 for the report and the
analysis of the issue.
This should be backported to all versions down to 2.8 included.
Since commit 388539faa ("MEDIUM: stick-tables: defer adding updates to a
tasklet"), between the entry creation and its arrival in the updates tree,
there is time for scheduling, and it now becomes possible for an stksess
entry to be requeued into the list while it's still in the tree as a remote
one. Only local updates were removed prior to being inserted. In this case
we would re-insert the entry, causing it to appear as the parent of two
distinct nodes or leaves, and to be visited from the first leaf during a
delete() after having already been removed and freed, causing a crash,
as Christian reported in issue #2959.
There's no reason to backport this as this appeared with the commit above
in 3.2-dev13.
It might be possible not to see the element in the tree, then not to
see it in the update list, thus not to take the lock before deleting.
But an element in the list could have moved to the tree during the
check, and be removed later without the updt_lock.
Let's delete prior to checking the presence in the tree to avoid
this situation. No backport is needed since this arrived in -dev13
with the update list.
However the doc purposely says the opposite, to encourage migrating away
from "ip". The goal is that in the future we change "ip" to mean "ipv6",
which seems to be what most users naturally expect. But we cannot break
configurations in the LTS version so for now "ipv4" is the alias.
The reason for not changing it in the table is that the type name is
used at a few places (look for "].kw"):
- dumps
- promex
We'd rather not change that output for 3.2, but only do it in 3.3.
This way, 3.2 can be made future-proof by using "ipv4" in the config
without any other side effect.
Please see github issue #2962 for updates on this transition.
As reported in GH #2958, commit 6c9b315 caused a regression with sc_*
fetches and tracked counter id > 9.
As such, the below configuration would cause a BUG_ON() to be triggered:
  global
      log stdout format raw local0
      tune.stick-counters 11

  defaults
      log global
      mode http

  frontend www
      bind *:8080
      acl track_me bool(true)
      http-request set-var(txn.track_var) str("a")
      http-request track-sc10 var(txn.track_var) table rate_table if track_me
      http-request set-var(txn.track_var_rate) sc_gpc_rate(0,10,rate_table)
      http-request return status 200

  backend rate_table
      stick-table type string size 1k expire 5m store gpc_rate(1,1m)
While in 6c9b315 the src_fetch logic was removed from
smp_fetch_sc_stkctr(), num > 9 is indeed no longer expected as the
original num value. But what we didn't consider is that num is
effectively re-assigned for the generic sc_* variant.
Thus the BUG_ON() is misplaced, as it should only be evaluated for
non-generic fetches. This explains why it triggers with valid
configurations.
Thanks to GH user @tkjaer for his detailed report and bug analysis.
No backport needed, this bug is specific to 3.2.
In process_table_expire(), limit the number of entries we remove in one
call, and just reschedule the task if there's more to do. Removing
entries requires the heavily contended update write lock, and we
don't want to hold it for too long.
This helps stick tables perform better under heavy load.
Limit the number of old entries we remove in one call of
stktable_trash_oldest(), as we do so while holding the heavily contended
update write lock, so we'd rather not hold it for too long.
This helps stick tables perform better under heavy load.
There is a lot of contention trying to add updates to the tree. So
instead of trying to add the updates to the tree right away, just add
them to an mt-list (with one mt-list per thread group, so that the
mt-list itself does not become too much of a new contention point), and
create a tasklet dedicated to adding updates to the tree, in batches, to
avoid keeping the update lock for too long.
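The scheme can be pictured with this much-reduced standalone sketch,
where plain pthread primitives and hypothetical names stand in for the
mt-lists, the tasklet and the update tree:

  #include <pthread.h>
  #include <stddef.h>

  #define NB_TGROUPS 4

  struct pending_update {
      struct pending_update *next;
      /* ... payload (stksess pointer, etc.) ... */
  };

  static struct update_queue {
      pthread_mutex_t lock;        /* stands in for the per-group mt-list,
                                    * assumed initialized at startup */
      struct pending_update *head;
  } queue[NB_TGROUPS];

  static pthread_rwlock_t updt_lock = PTHREAD_RWLOCK_INITIALIZER;

  /* hypothetical stand-in for the insertion into the shared update tree */
  static void insert_into_tree(struct pending_update *u) { (void)u; }

  /* producer side: cheap, contention limited to the group's own queue */
  void queue_update(int tgid, struct pending_update *u)
  {
      pthread_mutex_lock(&queue[tgid].lock);
      u->next = queue[tgid].head;
      queue[tgid].head = u;
      pthread_mutex_unlock(&queue[tgid].lock);
  }

  /* consumer side (the dedicated tasklet): detach one batch at a time,
   * then insert it under the update lock in a single, bounded critical
   * section.
   */
  void drain_updates(int tgid)
  {
      struct pending_update *u, *next;

      pthread_mutex_lock(&queue[tgid].lock);
      u = queue[tgid].head;
      queue[tgid].head = NULL;
      pthread_mutex_unlock(&queue[tgid].lock);

      pthread_rwlock_wrlock(&updt_lock);
      for (; u; u = next) {
          next = u->next;
          insert_into_tree(u);
      }
      pthread_rwlock_unlock(&updt_lock);
  }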
This helps stick tables perform better under heavy load.
As discussed in GH #2423, there are some cases where src_{inc,clr}_gpc*
is not sufficient because we need to perform the lookup on a specific
key. Indeed, just like we did in e642916 ("MEDIUM: stktable: leverage
smp_fetch_* helpers from sample conv"), we can easily implement new
table converters based on existing fetches. This is what we do in
this patch.
Also the doc was updated so that src_{inc,clr}_gpc* fetches now point to
their generic equivalent table_{inc,clr}_gpc*. Indeed, src_{inc,clr}_gpc*
are simply aliases.
This should fix GH #2423.
sample_conv_table_bytes_out_rate() was defined in the middle of other
stick-table sample convs without any ordering logic. Let's put it
where it belongs, right after sample_conv_table_bytes_in_rate().
In this patch we try to prevent code duplication: some fetches and sample
converters do the exact same thing, except that the converter takes the
argument as input data. Until now, both the converter and the fetch
had their own implementation (copy-pasted), with the fetch-specific or
converter-specific lookup part.
Thanks to previous commits, we now have generic sample fetch helpers
that take the stkctr as argument, so let's leverage them directly
from the converter functions when available. This allows us to remove
a lot of code duplication and should make code maintenance easier in the
future.
While this patch adds more insertions than deletions, it actually tries
to simplify the lookup logic for sc_ and src_ stick-table fetches.
Indeed, smp_create_src_stkctr() and smp_fetch_sc_stkctr() combination
was used everywhere the fetch supports sc_ and src_ form, and
smp_fetch_sc_stkctr() even integrated some of the src-oriented fetch logic.
Not only this was confusing, but it made the task of adding new generic
fetches even more complex.
Thus in this patch we completely dedicate smp_fetch_sc_stkctr() to sc_
oriented fetches, while smp_create_src_stkctr() is now renamed to
smp_fetch_src_stkctr() and can now work on its own for src_ oriented
fetches. It takes an additional parameter, "create", to tell the function
if the entry should be created if it doesn't exist yet.
Now it's up to the calling function to know if it should be using the
sc_ oriented fetch or the src_ oriented one based on the input keyword.
In this patch we split several sample fetch functions that are leveraged
by the "src-" fetches such as smp_fetch_sc_inc_gpc().
Indeed, for all of them, we add an intermediate helper function that takes
a stkctr pointer as parameter and performs the logic, leaving the lookup
part in the calling function. Before this patch existing functions were
doing the lookup + the fetch logic. Thanks to this patch it will become
easier to add generic converters taking a lookup key as input.
List of targeted functions:
- smp_fetch_sc_inc_gpc()
- smp_fetch_sc_inc_gpc0()
- smp_fetch_sc_inc_gpc1()
- smp_fetch_sc_clr_gpc()
- smp_fetch_sc_clr_gpc0()
- smp_fetch_sc_clr_gpc1()
- smp_fetch_sc_conn_cnt()
- smp_fetch_sc_conn_rate()
- smp_fetch_sc_updt_conn_cnt()
- smp_fetch_sc_conn_curr()
- smp_fetch_sc_glitch_cnt()
- smp_fetch_sc_glitch_rate()
- smp_fetch_sc_sess_cnt()
- smp_fetch_sc_sess_rate()
- smp_fetch_sc_http_req_cnt()
- smp_fetch_sc_http_req_rate()
- smp_fetch_sc_http_err_cnt()
- smp_fetch_sc_http_err_rate()
- smp_fetch_sc_http_fail_cnt()
- smp_fetch_sc_http_fail_rate()
- smp_fetch_sc_kbytes_in()
- smp_fetch_sc_bytes_in_rate()
- smp_fetch_kbytes_out()
- smp_fetch_sc_gpc1_rate()
- smp_fetch_sc_gpc0_rate()
- smp_fetch_sc_gpc_rate()
- smp_fetch_sc_get_gpc1()
- smp_fetch_sc_get_gpc0()
- smp_fetch_sc_get_gpc()
- smp_fetch_sc_get_gpt0()
- smp_fetch_sc_get_gpt()
- smp_fetch_sc_bytes_out_rate()
Please note that this patch doesn't render well with "git show" or
"git diff". For all the functions listed above, a new helper function was
defined right above it, with the same name without "_sc". These new
functions perform the fetch part, while the original ones (with "_sc")
now simply perform the lookup and then leverage the corresponding fetch
helper.
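In a deliberately simplified form, with hypothetical names (the real
helpers keep the original names minus "_sc" and the real lookup goes
through the tracked counters), the pattern looks like this:

  struct sketch_stkctr { unsigned int gpc0; };

  /* fetch part only: works on a counter already resolved by the caller,
   * so converters with their own lookup can reuse it directly.
   */
  unsigned int fetch_get_gpc0(const struct sketch_stkctr *ctr)
  {
      return ctr ? ctr->gpc0 : 0;
  }

  /* sc_-style entry point: performs the lookup, then delegates the fetch */
  unsigned int sc_get_gpc0(int sc_num,
                           struct sketch_stkctr *(*lookup)(int sc_num))
  {
      return fetch_get_gpc0(lookup(sc_num));
  }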
smp_fetch_stksess(table, smp, create) performs a lookup in <table> by
using <smp> as a key. It returns the matching entry on success and NULL on
failure. <create> can be set to 1 to force the entry creation.
We then use this helper everywhere relevant to prevent code duplication.
As discussed in GH #2838, the previous fix f399dbf
("MINOR: stktable: fix potential build issue in smp_to_stkey") which
attempted to remove conversion ambiguity and prevent a build warning,
proved to be insufficient.
This time, we implement Willy's suggestion, which is to use a union to
perform the conversion.
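For illustration, here is a minimal standalone version of the union
trick (hypothetical type names, not the actual sample structure):

  #include <inttypes.h>
  #include <stdio.h>

  union sint_key {
      int64_t  sint;   /* value as carried by the sample */
      uint32_t u32;    /* value as expected by a 32-bit table key */
  };

  int main(void)
  {
      union sint_key conv;

      conv.sint = INT64_C(0x1234567890);
      conv.u32  = (uint32_t)conv.sint;   /* explicit, well-defined truncation */
      /* the key can now safely point to conv.u32, whatever the endianness */
      printf("key = %" PRIu32 "\n", conv.u32);
      return 0;
  }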
Hopefully this should fix GH #2838. If that's the case (and only in that
case), then this patch may be backported with f399dbf (else the patch
won't apply) anywhere b59d1fd ("BUG/MINOR: stktable: fix big-endian
compatibility in smp_to_stkey()") was backported.
In 819fc6f563
("MEDIUM: threads/stick-tables: handle multithreads on stick tables"),
sample fetch and action functions were properly guarded with stksess
read/write locks for read and write operations respectively, but the
sample_conv_table functions leveraged by "table_*" converters were
overlooked.
This bug was not known to cause issues in existing deployments yet (at
least it was not reported), but due to its nature it can theoretically lead
to inconsistent values being reported by "table_*" converters if the value
is being updated by another thread in parallel.
It should be backported to all stable versions.
[ada: for versions < 3.0, glitch_cnt and glitch_rate samples should be
ignored as they first appeared in 3.0]
smp_to_stkey() uses an ambiguous cast from a 64-bit integer to a 32-bit
unsigned integer. While it is intended, let's make the cast less
ambiguous by explicitly casting the right part of the assignment to the
proper type.
This should fix GH #2838.
As discussed in GH #1750, we were lacking a sample fetch to be able to
retrieve the key from the currently tracked counter entry. To do so,
sc_key fetch can now be used. It returns a sample with the correct type
(table key type) corresponding to the tracked counter entry (from previous
track-sc rules).
If no entry is currently tracked, it returns nothing.
It can be used with the standard form "sc_key(<sc_number>)" or the
legacy forms "sc0_key", "sc1_key", "sc2_key".
Documentation was updated.
stksess_getkey(t, ts) returns a stktable_key struct pointer filled with
data from input <ts> entry in <t> table. The returned pointer uses the
static_table_key variable. Indeed, the stktable_key struct is more
convenient to manipulate than having to deal with the key extraction
from the stksess struct directly.
When smp_to_stkey() deals with SINT samples, since stick-tables deal
with 32-bit integers while an SINT sample is a 64-bit integer, an
in-place conversion was done in smp_to_stkey(). For that, the 64-bit
integer was truncated before the key would point to it. Unfortunately
this only works on little-endian architectures because on big-endian
ones the key would point to the wrong 32-bit half.
To fix the issue and make the conversion endian-proof, let's re-assign
the sample as 32bit integer before the key points to it.
Thanks to Willy for having spotted the bug and suggesting the above fix.
It should be backported to all stable versions.
Some actions such as "sc0_get_gpc0" (using smp_fetch_sc_stkctr()
internally) can take an optional table name as parameter to perform the
lookup on a different table from the tracked one but using the key from
the tracked entry. It is done by leveraging the stktable_lookup() function
which was originally meant to perform intra-table lookups.
Calling sc0_get_gpc0() with a different table name will result in
stktable_lookup() being called to perform a lookup using a stksess from
a different table. While it is theoretically fine, it comes with a
pitfall: both tables (the one from which the stksess originates and the
actual target table) should rely on the exact same key type and length.
Failure to do so actually results in undefined behavior, because the key
type and/or length from one table is used to perform the lookup in
another table, while the underlying lookup API expects explicit type and
key length.
For instance, consider the below example:
  peers testpeers
      bind 127.0.0.1:10001
      server localhost
      table test type binary len 1 size 100k expire 1h store gpc0
      table test2 type string size 100k expire 1h store gpc0

  listen test_px
      mode http
      bind 0.0.0.0:8080
      http-request track-sc0 bin(AA) table testpeers/test
      http-request track-sc1 str(ok) table testpeers/test2
      log-format "%[sc0_get_gpc0(testpeers/test2)]"
      log stdout format raw local0
      server s1 git.haproxy.org:80
Performing a curl request to localhost:8080 will cause uninitialized reads
because string "ok" from test2 table will be compared as a string against
"AA" binary sample which is not NULL terminated:
==2450742== Conditional jump or move depends on uninitialised value(s)
==2450742== at 0x484F238: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2450742== by 0x27BCE6: stktable_lookup (stick_table.c:539)
==2450742== by 0x281470: smp_fetch_sc_stkctr (stick_table.c:3580)
==2450742== by 0x283083: smp_fetch_sc_get_gpc0 (stick_table.c:3788)
==2450742== by 0x2A805C: sample_process (sample.c:1376)
So let's prevent that by adding some comments to the stktable_set_entry()
function description, and by adding a check in smp_fetch_sc_stkctr() to
ensure that both the source stksess and the target table share the same
key properties.
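The added check boils down to something like the following sketch
(hypothetical structure and field names, not the real stktable layout):

  #include <stddef.h>

  struct sketch_table {
      int    key_type;   /* e.g. integer, string, binary... */
      size_t key_size;
  };

  /* refuse cross-table lookups unless both tables agree on the key layout,
   * otherwise the lookup would read the key with the wrong type/length.
   */
  int keys_are_compatible(const struct sketch_table *src,
                          const struct sketch_table *dst)
  {
      return src->key_type == dst->key_type && src->key_size == dst->key_size;
  }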
While it could be relevant to backport this in all stable versions, it is
probably safer to wait for some time before doing so, to ensure that no
existing configs rely on this ambiguity because the fact that the target
table and source stksess entry need to share the same key type and length
is not explicitly documented.
As discussed in GH #2286, {set, clear, show} table commands were unable
to deal with array types such as gpt, because they handled such types
as non-array types, thus only the first entry (ie: gpt[0]) was
considered.
In this patch we add an extra logic around array-types handling so that
it is possible to specify an array index right after the type, like this:
set table peer/table key mykey data.gpt[2] value
# where 2 is the entry index that we want to access
If no index is specified, then it implicitly defaults to 0 to mimic
previous behavior.
Same as stktable_get_data_type(), but tries to parse optional index in
the form "name[idx]" (only for array types).
Falls back to stktable_get_data_type() when no index is provided.
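A small standalone sketch of that parsing (hypothetical helper name and
simplified error handling):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* returns 0 on success and fills <name>/<idx>; -1 on syntax error */
  int parse_data_type_arg(const char *arg, char *name, size_t name_len,
                          unsigned int *idx)
  {
      const char *bracket = strchr(arg, '[');
      char *end;

      *idx = 0;                                  /* implicit default */
      if (!bracket) {
          snprintf(name, name_len, "%s", arg);
          return 0;
      }
      if ((size_t)(bracket - arg) >= name_len)
          return -1;
      memcpy(name, arg, bracket - arg);
      name[bracket - arg] = 0;

      *idx = strtoul(bracket + 1, &end, 10);
      if (end == bracket + 1 || *end != ']' || end[1])
          return -1;                             /* want exactly "name[number]" */
      return 0;
  }

  int main(void)
  {
      char name[64];
      unsigned int idx;

      if (parse_data_type_arg("gpt[2]", name, sizeof(name), &idx) == 0)
          printf("type=%s idx=%u\n", name, idx); /* prints: type=gpt idx=2 */
      return 0;
  }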
Thanks to the previous commit, the stktable struct now has a "flags"
struct member. Let's take this opportunity to remove the isolated
"nopurge" attribute in the stktable struct and rely on a flag named
STK_FL_NOPURGE instead.
This helps to better organize stktable struct members.
When "recv-only" keyword is added on a stick table declaration (in peers
or proxy section), haproxy considers that the table is only used for
data retrieval from a remote location and not used to perform local
updates. As such, it enables the retrieval of local-only values such
as conn_cur that are ignored by default. This can be useful in some
contexts where we want to know about local-only values such as conn_cur
from a remote peer.
To do this, add a stktable struct "flags" field which defaults to NONE,
and enable the RECV_ONLY flag on the table when the "recv-only" keyword
is found in the table declaration. Then, in peer_treat_updatemsg(), when
handling table updates, don't ignore data updates for local-only values
if the flag is set.
The file name used to point to the calling function's stack for stick
tables, which was OK during parsing but remained dangling afterwards.
At least it was already marked const so as not to accidentally free it.
Let's make it point to a file_name_node now.
Add a factor parameter to stick-tables, called "brates-factor", that is
applied to the in/out bytes rates to work around the 32-bit limit of the
frequency counters. Thanks to this factor, it is possible to have byte
rates beyond 4GB. Instead of counting each byte, we count blocks of
bytes. Among other things, it will be useful for the bwlim filter, to be
able to configure shared limits exceeding 4GB/s.
For now, this parameter must be in the range ]0-1024].
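As an illustrative example (numbers picked here for clarity, not taken
from the patch): with "brates-factor 1024" the counters account for
1024-byte blocks, so a measured rate of 4 million blocks per second
represents about 4GB/s while staying far below the 32-bit limit, which
then only caps the rate at around 4TB/s.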
Since 2.5, an array of GPC is provided to replace legacy gpc0/gpc1.
src_inc_gpc is a sample fetch which is used to increment counters in
this array.
A crash occurs if src_inc_gpc is used without any previous track-sc
rule. This is caused by an error in smp_fetch_sc_inc_gpc(). When the
temporary stick counter is created via smp_create_src_stkctr(), the
table pointer argument used is not correct: it points to the counter ID
instead of the table argument. To fix this, use the proper sample fetch
second argument.
This can be reproduced with the following config:

  acl mark src_inc_gpc(0,<table>) -m bool
  tcp-request connection accept if mark
This should be backported up to 2.6.
Guarded functions to kill a sticky session, stksess_kill() and
stksess_kill_if_expired(), may or may not decrement and test its
reference counter before really killing it. This depends on a parameter.
If it is set to a non-zero value, the ref count is decremented and if it
falls to zero, the session is killed. Otherwise, if this parameter is
equal to zero, the session is killed regardless of the ref count value.
In the code, these functions are always called with a non-zero parameter and
the ref count is always decremented and tested. So, there is no reason to
still have a special case, especially because it is not really easy to
say whether it is supported or not. Does it mean it is possible to kill
a sticky session while it is still referenced somewhere? Probably not.
So, does it mean it is possible to kill an unreferenced session? This
case may be problematic because the session is accessed outside of any
lock and thus may be released by another thread because it is
unreferenced. Enlarging the scope of the lock to avoid any issue is
possible but it is a bit of a shame to do so because there is no usage
for now.
The best is to simplify the API and remove this case. Now, stksess_kill()
and stksess_kill_if_expired() functions always decrement and test the ref
count before killing a sticky session.
When we try to kill a session, the shard must be locked before decrementing
the ref count on the session. Otherwise, the ref count can fall to 0 and a
purge task (stktable_trash_oldest or process_table_expire) may release the
session before we have the opportunity to acquire the lock on the shard to
effectively kill the session. This could lead to a double free.
Here is the scenario:
  Thread 1                                  Thread 2

  stksess_kill(ts)
    if (ATOMIC_DEC(&ts->ref_cnt) != 0)
        return
    /* here the ref count is 0 */
                                            stktable_trash_oldest()
                                              LOCK(&sh_lock)
                                              if (!ATOMIC_LOAD(&ts->ref_cnt))
                                                  __stksess_free(ts)
                                              UNLOCK(&sh_lock)
    /* here the session was released */
    LOCK(&sh_lock)
    __stksess_free(ts)    <--- double free
    UNLOCK(&sh_lock)
The bug was introduced in 2.9 by the commit 7968fe3889 ("MEDIUM:
stick-table: change the ref_cnt atomically"). The ref count must be
decremented inside the lock for the stksess_kill() and
stksess_kill_if_expired() functions.
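As a reduced standalone sketch of the fixed ordering (a pthread mutex
and C11 atomics standing in for the shard lock and the ref count,
hypothetical names):

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdlib.h>

  struct sketch_sess {
      atomic_uint ref_cnt;
      /* ... */
  };

  static pthread_mutex_t shard_lock = PTHREAD_MUTEX_INITIALIZER;

  /* take the shard lock *before* dropping our reference, so that a
   * concurrent purge cannot free the session between our decrement and
   * our own free.
   */
  void sess_kill(struct sketch_sess *ts)
  {
      pthread_mutex_lock(&shard_lock);
      if (atomic_fetch_sub(&ts->ref_cnt, 1) == 1)
          free(ts);                  /* we held the last reference */
      pthread_mutex_unlock(&shard_lock);
  }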
This patch should fix issue #2611. It must be backported as far as 2.9.
On 2.9 there is no sharding and the whole table is locked, so the patch
will have to be adapted.
As reported by @Bbulatov in GH #2586, stktable_data_ptr() return value is
used without checking it isn't NULL first, which may happen if the given
type is invalid or not stored in the table.
However, since the data type is set by table_prepare_data_request()
right before cli_io_handler_table() is invoked, it is not expected to
be invalid: table_prepare_data_request() normally checked that the type
is stored inside the table. Thus stktable_data_ptr() should not be failing
at this point, so we add a BUG_ON() to indicate that.
During changes made in 2.7 by commits 8d3c3336f9 ("MEDIUM: stick-table:
make stksess_kill_if_expired() avoid the exclusive lock") and 996f1a5124
("MEDIUM: stick-table: do not take a lock to update t->current anymore."),
the operation was done cautiously one baby step at a time and the final
cleanup was not done, as we're keeping a read lock under an atomic dec.
Furthermore there's a pool_free() call under that lock, and we try to
avoid pool_alloc() and pool_free() under locks for their nasty side
effects (e.g. when memory gets recompacted), so let's really drop it
now.
Note that the performance gain is not really perceptible here, it's
essentially for code clarity reasons that this has to be done.
Because the code in stktable_touch_with_exp() is the same as in other
functions that were previously built around a loop first trying to
upgrade a read lock and then falling back to a direct write lock, there
remains a confusing construct with multiple tests on use_wrlock, which
is obviously zero when tested. Let's remove them since the value is
known and the loop does not exist anymore.