Commit Graph

303 Commits

Author SHA1 Message Date
Willy Tarreau
0bda33a3ec MINOR: stick-tables: remove the uneeded read lock in stksess_free()
During changes made in 2.7 by commits 8d3c3336f9 ("MEDIUM: stick-table:
make stksess_kill_if_expired() avoid the exclusive lock") and 996f1a5124
("MEDIUM: stick-table: do not take a lock to update t->current anymore."),
the operation was done cautiously one baby step at a time and the final
cleanup was not done, as we're keeping a read lock under an atomic dec.
Furthermore there's a pool_free() call under that lock, and we try to
avoid pool_alloc() and pool_free() under locks for their nasty side
effects (e.g. when memory gets recompacted), so let's really drop it
now.

Note that the performance gain is not really perceptible here, it's
essentially for code clarity reasons that this has to be done.
2024-05-24 11:52:57 +02:00
Willy Tarreau
8580f9db20 CLEANUP: stick-tables: remove a few unneeded tests for use_wrlock
Due to the code in stktable_touch_with_exp() being the same as in other
functions previously made around a loop trying first to upgrade a read
lock then to fall back to a direct write lock, there remains a confusing
construct with multiple tests on use_wrlock that is obviously zero when
tested. Let's remove them since the value is known and the loop does not
exist anymore.
2024-05-24 11:52:19 +02:00
Willy Tarreau
77f286e8bc BUG/MEDIUM: stick-tables: make sure never to create two same remote entries
In GH issue #2552, Christian Ruppert reported an increase in crashes
with recent 3.0-dev versions, always related with stick-tables and peers.
One particularity of his config is that it has a lot of peers.

While trying to reproduce, it empirically was found that firing 10 load
generators at 10 different haproxy instances tracking a random key among
100k against a table of max 5k entries, on 8 threads and between a total
of 50 parallel peers managed to reproduce the crashes in seconds, very
often in ebtree deletion or insertion code, but not only.

The debugging revealed that the crashes are often caused by a parent node
being corrupted while delete/insert tries to update it regarding a recently
inserted/removed node, and that that corrupted node had always been proven
to be deleted, then immediately freed, so it ought not be visited in the
tree from functions enclosed between a pair of lock/unlock. As such the
only possibility was that it had experienced unexpected inserts. Also,
running with pool integrity checking would 90% of the time cause crashes
during allocation based on corrupted contents in the node, likely because
it was found at two places in the same tree and still present as a parent
of a node being deleted or inserted (hence the __stksess_free and
stktable_trash_oldest callers being visible on these items).

Indeed the issue is in fact related to the test set (occasionally redundant
keys, many peers). What happens is that sometimes, a same key is learned
from two different peers. When it is learned for the first time, we end up
in stktable_touch_with_exp() in the "else" branch, where the test for
existence is made before taking the lock (since commit cfeca3a3a3
("MEDIUM: stick-table: touch updates under an upgradable read lock") that
was merged in 2.9), and from there the entry is added. But is one of the
threads manages to insert it before the other thread takes the lock, then
the second thread will try to insert this node again. And inserting an
already inserted node will corrupt the tree (note that we never switched
to enforcing a check in insertion code on this due to API history that
would break various code parts).

Here the solution is simple, it requires to recheck leaf_p after getting
the lock, to avoid touching anything if the entry has already been
inserted in the mean time.

Many thanks to Christian Ruppert for testing this and for his invaluable
help on this hard-to-trigger issue.

This fix needs to be backported to 2.9.
2024-05-24 11:52:11 +02:00
Christopher Faulet
9938fb9c7a BUG/MEDIUM: stick-tables: Fix race with peers when killing a sticky session
When a sticky session is killed, we must be sure no other entity is still
referencing it. The session's ref_cnt must be 0. However, there is a race
with peers, as decribed in 21447b1dd4 ("BUG/MAJOR: stick-tables: fix race
with peers in entry expiration"). When the update lock is acquire, we must
recheck the ref_cnt value.

This patch is part of a debugging session about issue #2552. It must be
backported to 2.9.
2024-05-24 11:52:11 +02:00
Christopher Faulet
dfd938bad6 BUG/MEDIUM: stick-tables: Fix race with peers when trashing oldest entries
It is the same that the one fixed in process_table_expire() (21447b1dd4
["BUG/MAJOR: stick-tables: fix race with peers in entry expiration"]). In
stktable_trash_oldest(), when the update lock is acquired, we must take care
to check again the ref_cnt because some peers may increment it (See commit
above for details).

This patch fixes a crash mentionned in 2552#issuecomment-2110532706. It must
be backported to 2.9.
2024-05-24 11:52:11 +02:00
Willy Tarreau
19f8762a98 BUILD: stick-tables: silence build warnings when threads are disabled
Since 3.0-dev7 with commit 1a088da7c2 ("MAJOR: stktable: split the keys
across multiple shards to reduce contention"), building without threads
yields a warning about the shard not being used. This is because the
locks API does nothing of its arguments, which is the only place where
the shard is being used. We cannot modify the lock API to pretend to
consume its argument because quite often it's not even instantiated.
Let's just pretend we consume shard using an explict ALREADY_CHECKED()
statement instead. While we're at it, let's make sure that XXH32() is
not called when there is a single bucket!

No backport is needed.
2024-04-24 08:23:56 +02:00
Willy Tarreau
21447b1dd4 BUG/MAJOR: stick-tables: fix race with peers in entry expiration
In 2.9 with commit 7968fe3889 ("MEDIUM: stick-table: change the ref_cnt
atomically") we significantly relaxed the stick-tables locking when
dealing with peers by adjusting the ref_cnt atomically and moving it
out of the lock.

However it opened a tiny window that became problematic in 3.0-dev7
when the table's contention was lowered by commit 1a088da7c2 ("MAJOR:
stktable: split the keys across multiple shards to reduce contention").

What happens is that some peers may access the entry for reading at
the moment it's about to expire, and while the read accesses to push
the data remain unnoticed (possibly that from time to time we push
crap), but the releasing of the refcount causes a new write that may
damage anything else. The scenario is the following:

  process_table_expire()               peer_send_teachmsgs()

                                       RDLOCK(&updt_lock);
     tick_is_expired() != 0

     ebmb_delete(ts->key);

     if (ts->upd.node.leaf_p) {
                                       HA_ATOMIC_INC(&ts->ref_cnt);
                                       RDUNLOCK(&updt_lock);
          WRLOCK(&updt_lock);
          eb32_delete(&ts->upd);
     }
     __stksess_free(t, ts);
                                       peer_send_updatemsg(ts);
                                       RDLOCK(&updt_lock);
                                       HA_ATOMIC_DEC(&ts->ref_cnt);

Here it's clear that the bottom part of peer_send_teachmsgs() believes
to be protected but may act on freed data.

This is more visible when enabling -dMtag,no-merge,integrity because
the ATOMIC_DEC(&ref_cnt) decrements one byte in the area, that makes
the eviction check fail while the tag has the address of the left
__stksess_free(), proving a completed pool_free() before the decrement,
and the anomaly there is pretty visible in the crash dump. Changing
INC()/DEC() with ADD(2)/DEC(2) shows that the byte is now off by two,
confirming that the operation happened there.

The solution is not very hard, it consists in checking for the ref_cnt
on the left after grabbing the lock, and doing both before deleting the
element, so that we have the guarantee that either the peer will not
take it or that it has already started taking it.

This was proven to be sufficient, as instead of crashing after 3s of
injection with 4 peers, 16 threads and 130k RPS, it survived for 15mn.

In order to stress the setup, a config involving 4+ peers, tracking
HTTP request with randoms and applying a bwlim-out filter with a
random key, with a client made of 160 h2 conns downloading 10 streams
of 4MB objects in parallel managed to trigger it within a few seconds:

  frontend ft
    http-request track-sc0 rand(100000) table tbl
    filter bwlim-out lim-out limit 2047m key rand(100000000),ipmask(32) min-size 1 table tbl
    http-request set-bandwidth-limit lim-out
    use_backend bk

  backend bk
    server s1 198.18.0.30:8000
    server s2 198.18.0.34:8000

  backend tbl
        stick-table type ip size 1000k expire 1s store http_req_cnt,bytes_in_rate(1s),bytes_out_rate(1s) peers peers

This seems to be very dependent on the timing and setup though.

This will need to be backported to 2.9. This part of the code was
reindented with shards but the block should remain mostly unchanged.
The logic to apply is the same.
2024-04-12 18:00:13 +02:00
Willy Tarreau
90efe8a877 CLEANUP: stick-tables: always respect the to_batch limit when trashing
When adding the shards support to tables with commit 1a088da7c ("MAJOR:
stktable: split the keys across multiple shards to reduce contention"),
the condition to stop eliminating entries based on the batch size being
reached is based on a pre-decrement of the max_search counter, but now
it goes back into the outer loop which doesn't check it, so next time
it does it when entering the next shard, it will become even more
negative and will properly stop, but at first glance it looks like an
int overflow (which it is not). Let's make sure the outer loop stops
on this condition so that we don't continue searching when the limit
is reached.
2024-04-12 17:58:54 +02:00
Willy Tarreau
44a8f9e7fc BUG/MEDIUM: stick-tables: fix the task's next expiration date
While changing the stick-table indexing that led to commit 1a088da7c
("MAJOR: stktable: split the keys across multiple shards to reduce
contention"), I met a problem with the task's expiration date being
incorrectly updated, I fixed it and apparently I committed the wrong
version :-/

The effect is that the task's date is only correctly reset if the
table is empty, otherwise the task wakes up again and is queued at
the previous date, eating 100% CPU. The tick_isfirst() must not be
used when storing the last result.

No backport is needed as this was only merged in 3.0-dev7.
2024-04-12 17:58:54 +02:00
Frederic Lecaille
fcb096f7cd BUG/MINOR: stick-tables: Missing stick-table key nullity check
This bug arrived with this commit:
     MAJOR: stktable: split the keys across multiple shards to reduce contention

At this time, there are no callers which call stktable_get_entry() without checking
the nullity of <key> passed as parameter. But the documentation of this function
says it supports this case where the <key> passed as parameter could be null.

Move the nullity test on <key> at first statement of this function.

Thanks to @chipitsine for having reported this issue in GH #2518.
2024-04-04 11:08:56 +02:00
Willy Tarreau
1a088da7c2 MAJOR: stktable: split the keys across multiple shards to reduce contention
In order to reduce the contention on the table when keys expire quickly,
we're spreading the load over multiple trees. That counts for keys and
expiration dates. The shard number is calculated from the key value
itself, both when looking up and when setting it.

The "show table" dump on the CLI iterates over all shards so that the
output is not fully sorted, it's only sorted within each shard. The Lua
table dump just does the same. It was verified with a Lua program to
count stick-table entries that it works as intended (the test case is
reproduced here as it's clearly not easy to automate as a vtc):

  function dump_stk()
    local dmp = core.proxies['tbl'].stktable:dump({});
    local count = 0
    for _, __ in pairs(dmp) do
        count = count + 1
    end
    core.Info('Total entries: ' .. count)
  end

  core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0);

  ##
  global
    tune.lua.log.stderr on
    lua-load-per-thread lua-cnttbl.lua

  listen front
    bind :8001
    http-request lua.dump_stk if { path_beg /stk }
    http-request track-sc1 rand(),upper,hex table tbl
    http-request redirect location /

  backend tbl
    stick-table size 100k type string len 12 store http_req_cnt

  ##
  $ h2load -c 16 -n 10000 0:8001/
  $ curl 0:8001/stk

  ## A count close to 100k appears on haproxy's stderr
  ## On the CLI, "show table tbl" | wc will show the same.

Some large parts were reindented only to add a top-level loop to iterate
over shards (e.g. process_table_expire()). Better check the diff using
git show -b.

The number of shards is decided just like for the pools, at build time
based on the max number of threads, so that we can keep a constant. Maybe
this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used,
and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the
measurements made for the pools. It turns out that this value seems to
be the most reasonable one without inflating the struct stktable too
much. By default for 1024 threads the value is 32 and delivers 980k RPS
in a test involving 80 threads, while adding 1kB to the struct stktable
(roughly doubling it). The same test at 64 gives 1008 kRPS and at 128
it gives 1040 kRPS for 8 times the initial size. 16 would be too low
however, with 675k RPS.

The stksess already have a shard number, it's the one used to decide which
peer connection to send the entry. Maybe we should also store the one
associated with the entry itself instead of recalculating it, though it
does not happen that often. The operation is done by hashing the key using
XXH32().

The peers also take and release the table's lock but the way it's used
it not very clear yet, so at this point it's sure this will not work.

At this point, this allowed to completely unlock the performance on a
80-thread setup:

 before: 5.4 Gbps, 150k RPS, 80 cores
  52.71%  haproxy    [.] stktable_lookup_key
  36.90%  haproxy    [.] stktable_get_entry.part.0
   0.86%  haproxy    [.] ebmb_lookup
   0.18%  haproxy    [.] process_stream
   0.12%  haproxy    [.] process_table_expire
   0.11%  haproxy    [.] fwrr_get_next_server
   0.10%  haproxy    [.] eb32_insert
   0.10%  haproxy    [.] run_tasks_from_lists

 after: 36 Gbps, 980k RPS, 80 cores
  44.92%  haproxy    [.] stktable_get_entry
   5.47%  haproxy    [.] ebmb_lookup
   2.50%  haproxy    [.] fwrr_get_next_server
   0.97%  haproxy    [.] eb32_insert
   0.92%  haproxy    [.] process_stream
   0.52%  haproxy    [.] run_tasks_from_lists
   0.45%  haproxy    [.] conn_backend_get
   0.44%  haproxy    [.] __pool_alloc
   0.35%  haproxy    [.] process_table_expire
   0.35%  haproxy    [.] connect_server
   0.35%  haproxy    [.] h1_headers_to_hdr_list
   0.34%  haproxy    [.] eb_delete
   0.31%  haproxy    [.] srv_add_to_idle_list
   0.30%  haproxy    [.] h1_snd_buf

WIP: uint64_t -> long

WIP: ulong -> uint

code is much smaller
2024-04-03 17:34:47 +02:00
Willy Tarreau
864ac31174 OPTIM: stick-tables: check the stksess without taking the read lock
Thanks to the previous commit, we can now simply perform an atomic read
on stksess->seen and take the write lock to recreate the entry only if
at least one peer has seen it, otherwise leave it untouched. On a test
on 40 cores, the performance used to drop from 2.10 to 1.14M RPS when
one peer was connected, now it drops to 2.05, thus there's basically
no impact of connecting a peer vs ~45% previously, all spent in the
read lock. This can be particularly important when often updating the
same entries (user-agent, source address during an attack etc).
2024-04-03 17:34:47 +02:00
Willy Tarreau
4c1480f13b MINOR: stick-tables: mark the seen stksess with a flag "seen"
Right now we're taking the stick-tables update lock for reads just for
the sake of checking if the update index is past it or not. That's
costly because even taking the read lock is sufficient to provoke a
cache line write, while when under load or attack it's frequent that
the update has not yet been propagated and wouldn't require anything.

This commit brings a new field to the stksess, "seen", which is zeroed
when the entry is updated, and set to one as soon as at least one peer
starts to consult it. This way it will reflect that the entry must be
updated again so that this peer can see it. Otherwise no update will
be necessary. For now the flag is only set/reset but not exploited.
A great care is taken to avoid writes whenever possible.
2024-04-03 17:34:47 +02:00
Tim Duesterhus
cd5d62249f CLEANUP: Reapply ist.cocci (3)
This reapplies ist.cocci across the whole src/ tree.
2024-04-02 07:27:33 +02:00
Willy Tarreau
5fc1afb341 BUG/MEDIUM: stick-tables: fix a small remaining race in expiration task
In 2.7 we addressed a race condition in the stick tables expiration task
with commit fbb934d ("BUG/MEDIUM: stick-table: fix a race condition when
updating the expiration task"). The issue was that the task could be
running on another thread which would destroy its expiration timer
while one had just recalculated it and prepares to queue it, causing
a bug due to the attempt to queue an expired task. The fix consisted in
enclosing the change into the stick-table's lock, which had a very low
cost since it's done only after having checked that the date changed,
i.e. no more than once every millisecond.

But as reported by Ricardo and Felipe from Taghos in github issue #2508,
a tiny race remained after the fix: the unlock() was done before the call
to task_queue(), leaving a tiny window for another thread to run between
unlock() and task_queue() and erase the timer. As confirmed, it's
sufficient to also protect the task_queue() call.

But overall this raises a point regarding the task_queue() API on tasks
that may run anywhere. A while ago an attempt was made at removing the
timer for woken up tasks, but something like this would be deserved
with more atomicity on the timer manipulation (e.g. atomically use
task_schedule() instead maybe). This should be backported to all
stable branches.
2024-04-02 07:07:57 +02:00
Christopher Faulet
94b8ed446f MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands
The main CLI I/O handle is responsible to interrupt the processing on
shutdown/abort. It is not the responsibility of the I/O handler of CLI
commands to take care of it.
2024-03-28 17:28:20 +01:00
Ilya Shipitsin
96cd04f8db CLEANUP: fix typo in naming for variable "unused"
In resolvers.c:rslv_promex_next_ts() and in
stick-tables.c:stk_promex_next_ts(), an unused argument was mistakenly
called "unsued" instead of "unused". Let's fix this in a separate patch
so that it can be omitted from backports if this causes build problems.
2024-03-05 11:50:34 +01:00
Willy Tarreau
c9c6b683fb MEDIUM: stick-tables: add a new stored type for glitch_cnt and glitch_rate
This adds a new pair of stored types in the stick-tables:
  - glitch_cnt
  - glitch_rate

These keep count of the number of glitches reported on a front connection,
in order to decide how to act with a badly defective client or a potential
attacker. For now nothing updates these counters, but all the infrastructure
needed to configure, update and retrieve them was added, including the doc.

No regtest was added yet since they're not filled yet.
2024-02-08 15:51:49 +01:00
Christopher Faulet
3e55b3da30 MEDIUM: promex/stick-table: Dump stick-table metrics via a promex module
Create a promex module to dump stick-table metrics. Thanks to this patch,
all references to stick tables were removed from the promex service.
2024-02-02 09:11:34 +01:00
Willy Tarreau
cdc993b19e BUILD: stick-table: fix build error on 32-bit platforms
Commit 9b2717e7b ("MINOR: stktable: use {show,set,clear} table with ptr")
stores a pointer in a long long (64bit), which fails the cas to void* on
32-bit platforms:

  src/stick_table.c: In function 'table_process_entry_per_ptr':
  src/stick_table.c:5136:37: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
   5136 |         ts = stktable_lookup_ptr(t, (void *)ptr);

On all our supported platforms, longs and pointers are of the same size,
so let's just turn this to a ulong instead.
2024-01-21 08:21:35 +01:00
Aurelien DARRAGON
41b7193e3c MINOR: stktable: stktable_data_ptr() cannot fail in table_process_entry()
In table_process_entry(), stktable_data_ptr() result is dereferenced
without checking if it's NULL first, which may happen when bad inputs
are provided to the function.

However, data_type and ts arguments were already checked prior to calling
the function, so we know for sure that stktable_data_ptr() will never
return NULL in this case.

However some static code analyzers such as Coverity are being confused
because they think that the result might possibly be NULL.
(See GH #2398)

To make it explicit that we always provide good inputs and expect valid
result, let's switch to the __stktable_data_ptr() unsafe function.
2024-01-02 08:51:51 +01:00
Aurelien DARRAGON
9b2717e7bb MINOR: stktable: use {show,set,clear} table with ptr
This patchs adds support for optional ptr (0xffff form) instead of key
argument to match against existing sticktable entries, ie: if the key is
empty or cannot be matched on the cli due to incompatible characters.
Lookup is performed using a linear search so it will be slower than key
search which relies on eb tree lookup.

Example:

set table mytable key mykey data.gpc0 1

show table mytable
> 0x7fbd00032bd8: key=mykey use=0 exp=86373242 shard=0 gpc0=1

clear table mytable ptr 0x7fbd00032bd8

This patchs depends on:
 - "MINOR: stktable: add table_process_entry helper function"

It should solve GH #2118
2023-12-21 14:22:27 +01:00
Aurelien DARRAGON
6ee3923c52 MINOR: stktable: add table_process_entry helper function
Only keep key-related logic in table_process_entry_per_key() function,
and then use table_process_entry() function that takes an entry pointer
as argument to process the entry.
2023-12-21 14:22:27 +01:00
Aurelien DARRAGON
2c4943c18b BUG/MINOR: proxy/stktable: missing frees on proxy cleanup
In 1b8e68e ("MEDIUM: stick-table: Stop handling stick-tables as proxies.")
we forgot to free the table pointer which is now dynamically allocated.

Let's take this opportunity to also fix a missing free in the table itself
(the table expire task wasn't properly destroyed)

This patch depends on:
 - "MINOR: stktable: add sktable_deinit function"

It should be backported in every stable versions.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
e10cf61099 MINOR: stktable: add stktable_deinit function
Adding sktable_deinit() helper function to properly cleanup a sticktable
that was initialized using stktable_init().
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
8dae361f35 MINOR: stktable/cli: support v6tov4 and v4tov6 conversions
Add a special treatment for the IPV4 and IPV6 cases in
table_process_entry_per_key() function so that input string is parsed
in best effort (STR to pseudo type ADDR): input format is first considered
over table type and then let smp_to_stkey() do the type conversion for us
when needed.

This patch heavily depends on:
- "MEDIUM: stktable/cli: simplify entry key handling"

And optionally depends on:
- 72514a44 ("MEDIUM: tools/ip: v4tov6() and v6tov4() rework")
2023-11-08 16:38:06 +01:00
Aurelien DARRAGON
0a47e6bccc MEDIUM: stktable/cli: simplify entry key handling
Make use of smp_to_stkey() in table_process_entry_per_key() to simplify
key handling and leverage auto type conversions from sample API.

One noticeable side effect is that integer input checks will be relaxed
given that c_str2int() sample conv is more permissible than the integrated
table_process_entry_per_key() integer parser.
2023-11-08 16:38:06 +01:00
Aurelien DARRAGON
c6826b9570 BUG/MINOR: stick-table/cli: Check for invalid ipv4 key
When an ipv4 key is used to filter a CLI command on a stick table
clear/set/show table ...), inetaddr_host+htonl combination was used
with no error checking.

Instead, we now use inet_pton(), which is what we use for ipv6 addresses
since b7c962b0c0 ("BUG/MINOR: stick-table/cli: Check for invalid ipv6 key")

Doing this allows us to easily check for parsing errors: we're trading off
some parsing efficience to better catch input errors and ensure we get
similar behavior between ipv4 and ipv6 addresses handling.

This patch may be backported to all supported versions.
2023-11-08 16:38:06 +01:00
Aurelien DARRAGON
5158c0ff69 MEDIUM: stktable/peers: "write-to" local table on peer updates
In this patch, we add the possibility to declare on a table definition
("table" in peer section, or "stick-table" in proxy section) that we
want the remote/peer updates on that table to be pushed on a local
haproxy table in addition to the source table.

Consider this example:

  |peers mypeers
  |        peer local 127.0.0.1:3334
  |        peer clust 127.0.0.1:3333
  |        table t1.local type string size 10m store server_id,server_key expire 30s
  |        table t1.clust type string size 10m store server_id,server_key write-to mypeers/t1.local expire 30s

With this setup, we consider haproxy uses t1.local as cache/local table
for read and write operations, and that t1.clust is a remote table
containing datas processed from t1.local and similar tables from other
haproxy peers in a cluster setup. The t1.clust table will be used to
refresh the local/cache one via the "write-to" statement.

What will happen, is that every time haproxy will see entry updates for
the t1.clust table: it will overwrite t1.local table with fresh data and
will update the entry expiration timer. If t1.local entry doesn't exist
yet (key doesn't exist), it will automatically create it. Note that only
types that cannot be used for arithmetic ops will be handled, and this
to prevent processed values from the remote table from interfering with
computations based on values from the local table. (ie: prevent
cumulative counters from growing indefinitely).

"write-to" will only push supported types if they both exist in the source
and the target table. Be careful with server_id and server_key storage
because they are often declared implicitly when referencing a table in
sticking rules but it is required to declare them explicitly for them to
be pushed between a remote and a local table through "write-to" option.

Also note that the "write-to" target table should have the same type as
the source one, and that the key length should be strictly equal,
otherwise haproxy will raise an error due to the tables being
incompatibles. A table that is already being written to cannot be used
as a source table for a "write-to" target.

Thanks to this patch, it will now be possible to use sticking rules in
peer cluster context by using a local table as a local cache which
will be automatically refreshed by one or multiple remote table(s).

This commit depends on:
 - "MINOR: stktable: stktable_init() sets err_msg on error"
 - "MINOR: stktable: check if a type should be used as-is"
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
db0cb54f81 MINOR: stktable: check if a type should be used as-is
stick table types now have an extra bit named 'as_is' that allows us to
check if such type should be used as-is or if it may be involved in
arithmetic operations such as counters. This can be useful since those
types are not common and may require specific handling.

e.g.: stktable_data_types[data_type].as_is will be set to 1 if the type
cannot be used in arithmetic operations.
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
b8c19f877a MINOR: stktable: stktable_init() sets err_msg on error
stktable_init() now sets err_msg when error occurs so that caller is able
to precisely report the cause of the failure.
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
6376fe9142 BUG/MINOR: stktable: missing free in parse_stick_table()
When "peers" keyword is encountered within a stick table definition,
peers.name hint gets replaced with a new copy of the provided name using
strdup(). However, there is no detection on whether the name was
previously set or not, so it is currently allowed to reuse the keyword
multiple time to overwrite previous value, but here we forgot to free
previous value for peers.name before assigning it to a new one.

This should be backported to every stable versions.
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
7eb05891d8 BUG/MINOR: stktable: allow sc-add-gpc from tcp-request connection
Following the previous commit's logic, we enable the use of sc-add-gpc
from tcp-request connection since it was probably forgotten in the first
place for sc-set-gpt0, and since sc-add-gpc was inspired from it, it also
lacks its.

As sc-add-gpc was implemented in 5a72d03a58 ("MINOR: stick-table: implement
the sc-add-gpc() action"), this should only be backported to 2.8
2023-08-14 09:03:49 +02:00
Aurelien DARRAGON
6c79309fda BUG/MINOR: stktable: allow sc-set-gpt(0) from tcp-request connection
Both the documentation and original developer intents seem to suggest
that sc-set-gpt/sc-set-gpt0 actions should be available from
tcp-request connection.

Yet because it was probably forgotten when expr support was added to
sc-set-gpt0 in 0d7712dff0 ("MINOR: stick-table: allow sc-set-gpt0 to
set value from an expression") it doesn't work and will report this
kind of errors:
 "internal error, unexpected rule->from=0, please report this bug!"

Fixing the code to comply with the documentation and the expected
behavior.

This must be backported to every stable versions.

[for < 2.5, as only sc-set-gpt0 existed back then, the patch must be
manually applied to skip irrelevant parts]
2023-08-14 09:03:44 +02:00
Willy Tarreau
cfeca3a3a3 MEDIUM: stick-table: touch updates under an upgradable read lock
Instead of taking the update's write lock in stktable_touch_with_exp(),
while most of the time under high load there is nothing to update because
the entry is touched before having been synchronized present, let's do
the check under a read lock and upgrade it to perform the update if
needed. These updates are rare and the contention is not expected to be
very high, so at the first failure to upgrade we retry directly with a
write lock.

By doing so the performance has almost doubled again, from 1140 to 2050k
with a peers section enabled. The contention is now on taking the read
lock itself, so there's little to be gained beyond this in this function.
2023-08-11 19:03:35 +02:00
Willy Tarreau
87e072eea5 MEDIUM: stick-table: use a distinct lock for the updates tree
Updating an entry in the updates tree is currently performed under the
table's write lock, which causes huge contention with other accesses
such as lookups and free. Aside the updates tree, the update,
localupdate and commitupdate variables, nothing is manipulated, so
let's create a distinct lock (updt_lock) to protect these together
to remove this contention. It required to add an extra lock in the
few places where we delete the update (though only if we're really
going to delete it) to protect the tree. This is very convenient
because now peer_send_teachmsgs() only needs to take this read lock,
and there is very little contention left on the stick-table.

With this alone, the performance jumped from 614k to 1140k/s on a
80-thread machine with a peers section! Stick-table updates with
no peers however now has to stand two locks and slightly regressed
from 4.0-4.1M/s to 3.9-4.0. This is fairly minimal compared to the
significant unlocking of the peers updates and considered totally
acceptable.
2023-08-11 19:03:35 +02:00
Willy Tarreau
7968fe3889 MEDIUM: stick-table: change the ref_cnt atomically
Due to the ts->ref_cnt being manipulated and checked inside wrlocks,
we continue to have it updated under plenty of read locks, which have
an important cost on many-thread machines.

This patch turns them all to atomic ops and carefully moves them outside
of locks every time this is possible:
  - the ref_cnt is incremented before write-unlocking on creation otherwise
    the element could vanish before we can do it
  - the ref_cnt is decremented after write-locking on release
  - for all other cases it's updated out of locks since it's guaranteed by
    the sequence that it cannot vanish
  - checks are done before locking every time it's used to decide
    whether we're going to release the element (saves several write locks)
  - expiration tests are just done using atomic loads, since there's no
    particular ordering constraint there, we just want consistent values.

For Lua, the loop that is used to dump stick-tables could switch to read
locks only, but this was not done.

For peers, the loop that builds updates in peer_send_teachmsgs is extremely
expensive in write locks and it doesn't seem this is really needed since
the only updated variables are last_pushed and commitupdate, the first
one being on the shared table (thus not used by other threads) and the
commitupdate could likely be changed using a CAS. Thus all of this could
theoretically move under a read lock, but that was not done here.

On a 80-thread machine with a peers section enabled, the request rate
increased from 415 to 520k rps.
2023-08-11 19:03:35 +02:00
Willy Tarreau
73b1dea4d1 MINOR: stick-table: move the task_wakeup() call outside of the lock
The write lock in stktable_touch_with_exp() is quite expensive and should
be shortened as much as possible. There's no need for it when calling
task_wakeup() so let's move it out.

On a 80-thread machine with a peers section, the request rate increased
from 397k to 415k rps.
2023-08-11 19:03:35 +02:00
Willy Tarreau
322e4ab9d2 MINOR: stick-table: move the task_queue() call outside of the lock
The write lock in stktable_requeue_exp() is quite expensive and should
be shortened as much as possible. There's no need for it when calling
task_queue() so let's move it out.

On a 80-thread machine with a peers section, the request rate increased
from 368k to 397k rps.
2023-08-11 19:03:35 +02:00
Christopher Faulet
208c712b40 MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE
Here again, it is just a flag renaming. In SC flags, there is no longer
shutdown for writes but shutdowns.
2023-04-14 15:01:21 +02:00
William Lallemand
3f210970bf BUG/MINOR: stick_table: alert when type len has incorrect characters
Alert when the len argument of a stick table type contains incorrect
characters.

Replace atol by strtol.

Could be backported in every maintained versions.
2023-04-13 14:46:08 +02:00
Christopher Faulet
7faac7cf34 MINOR: tree-wide: Simplifiy some tests on SHUT flags by accessing SCs directly
At many places, we simplify the tests on SHUT flags to remove calls to
chn_prod() or chn_cons() function because the corresponding SC is available.
2023-04-05 08:57:06 +02:00
Christopher Faulet
87633c3a11 MEDIUM: tree-wide: Move flags about shut from the channel to the SC
The purpose of this patch is only a one-to-one replacement, as far as
possible.

CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connecter. CF_ prefix is replaced by SC_FL_ one. Of course, it is not
so simple because at many places, we were testing if a channel was shut for
reads and writes in same time. To do the same, shut for reads must be tested
on one side on the SC and shut for writes on the other side on the opposite
SC. A special care was taken with process_stream(). flags of SCs must be
saved to be able to detect changes, just like for the channels.
2023-04-05 08:57:06 +02:00
Aurelien DARRAGON
e2907c7ee3 MINOR: stick-table: add sc-add-gpc() to http-after-response
sc-add-gpc() was implemented in 5a72d03 ("MINOR: stick-table:
implement the sc-add-gpc() action")

This new action was exposed everywhere sc-inc-gpc() is available,
except for http-after-response.
But there doesn't seem to be a technical constraint that prevents us from
exposing it in http-after-response.
It was probably overlooked, let's add it.

No backport needed, unless 5a72d03 ("MINOR: stick-table: implement the
sc-add-gpc() action") is being backported.
2023-03-17 13:09:09 +01:00
Aleksey Ponomaryov
593802128c BUG/MEDIUM: stick-table: do not leave entries in end of window during purge
At some moments expired stick table records stop being removed. This
happens when the internal time wraps around the 32-bit limit, or every
49.7 days. What precisely happens is that some elements that are collected
close to the end of the time window (2^32 - table's "expire" setting)
might have been updated and will be requeued further, at the beginning
of the next window. Here, three bad situations happen:

  - the incorrect integer-based comparison that is not aware of wrapping
    will result in the scan to restart from the freshly requeued element,
    skipping all those at the end of the window. The net effect of this
    is that at each wakeup of the expiration task, only one element from
    the end of the window will be expired, and other ones will remain
    there for a very long time, especially if they have to wait for all
    the predecessors to be picked one at a time after slow wakeups due
    to a long expiration ; this is what was observed in issue #2034
    making the table fill up and appear as not expiring at all, and it
    seems that issue #2024 reports the same problem at the same moment
    (since such issues happen for everyone roughly at the same time
    when the clock doesn't drift too much).

  - the elements that were placed at the beginning of the next window
    are skipped as well for as long as there are refreshed entries at
    the end of the previous window, so these ones participate to filling
    the table as well. This is cause by the restart from the current,
    updated node that is generally placed after most other less recently
    updated elements.

  - once the last element at the end of the window is picked, suddenly
    there is a large amount of expired entries at the beginning of the
    next window that all have to be requeued. If the expiration delay
    is large, the number can be big and it can take a long time, which
    can very likely explain the periodic crashes reported in issue #2025.
    Limiting the batch size as done in commit dfe79251d ("BUG/MEDIUM:
    stick-table: limit the time spent purging old entries") would make
    sense for process_table_expire() as well.

This patch addresses the incorrect tree scan algorithm to make sure that:
  - there's always a next element to compare against, even when dealing
    with the last one in the tree, the first one must be used ;

  - time comparisons used to decide whether to restart from the current
    element use tick_is_lt() as it is the only case where we know the
    current element will be placed before any other one (since the tree
    respects insertion ordering for duplicates)

In order to reproduce the issue, it was found that injecting traffic on
a random key that spans over half of the size of a table whose expiration
is set to 15s while the date is going to wrap in 20s does exhibit an
increase of the table's size 5s after startup, when entries start to be
pushed to the next window. It's more effective when a second load
generator constantly hammers a same key to be certain that none of them
is ready to expire. This doesn't happen anymore after this patch.

This fix needs to be backported to all stable versions. The bug has been
there for as long as the stick tables were introduced in 1.4-dev7 with
commit 3bd697e07 ("[MEDIUM] Add stick table (persistence) management
functions and types"). A cleanup could consists in deduplicating that
code by having process_table_expire() call __stktable_trash_oldest(),
with that one improved to support an optional time check.
2023-02-08 08:55:02 +01:00
Christopher Faulet
da89e9b95b MINOR: channel/applets: Stop to test CF_WRITE_ERROR flag if CF_SHUTW is enough
In applets, we stop processing when a write error (CF_WRITE_ERROR) or a shutdown
for writes (CF_SHUTW) is detected. However, any write error leads to an
immediate shutdown for writes. Thus, it is enough to only test if CF_SHUTW is
set.
2023-01-09 18:41:08 +01:00
Willy Tarreau
5a72d03a58 MINOR: stick-table: implement the sc-add-gpc() action
This action increments the General Purpose Counter at the index <idx> of
the array associated to the sticky counter designated by <sc-id> by the
value of either integer <int> or the integer evaluation of expression
<expr>. Integers and expressions are limited to unsigned 32-bit values.
If an error occurs, this action silently fails and the actions evaluation
continues. <idx> is an integer between 0 and 99 and <sc-id> is an integer
between 0 and 2. It also silently fails if the there is no GPC stored at
this index. The entry in the table is refreshed even if the value is zero.
The 'gpc_rate' is automatically adjusted to reflect the average growth
rate of the gpc value.

The main use of this action is to count scores or total volumes (e.g.
estimated danger per source IP reported by the server or a WAF, total
uploaded bytes, etc).
2023-01-07 09:11:22 +01:00
Willy Tarreau
6c0117168e MEDIUM: stick-table: set the track-sc limit at boottime via tune.stick-counters
The number of stick-counter entries usable by track-sc rules is currently
set at build time. There is no good value for this since the vast majority
of users don't need any, most need only a few and rare users need more.
Adding more counters for everyone increases memory and CPU usages for no
reason.

This patch moves the per-session and per-stream arrays to a pool of a size
defined at boot time. This way it becomes possible to set the number of
entries at boot time via a new global setting "tune.stick-counters" that
sets the limit for the whole process. When not set, the MAX_SESS_STR_CTR
value still applies, or 3 if not set, as before.

It is also possible to lower the value to 0 to save a bit of memory if
not used at all.

Note that a few low-level sample-fetch functions had to be protected due
to the ability to use sample-fetches in the global section to set some
variables.
2023-01-06 18:08:49 +01:00
Christopher Faulet
a92480462c MINOR: http-rules: Add missing actions in http-after-response ruleset
This patch adds the support of following actions in the http-after-response
ruleset:

  * set-map, del-map and del-acl
  * set-log-level
  * sc-inc-gpc, sc-inc-gpc0 and set-inc-gpc1
  * sc-inc-gpt and sc-set-gpt0

This patch should solve the issue #1980.
2023-01-05 11:23:59 +01:00
Willy Tarreau
20391519c3 BUG/MINOR: stick-table: report the correct action name in error message
sc-inc-gpc() learned to use arrays in 2.5 with commit 4d7ada8f9 ("MEDIUM:
stick-table: add the new arrays of gpc and gpc_rate"), but the error
message says "sc-set-gpc" instead of "sc-inc-gpc". Let's fix this to
avoid confusion.

This can be backported to 2.5.
2023-01-02 17:35:50 +01:00