Commit Graph

207 Commits

Aurelien DARRAGON
3b0bf5097b MINOR: map: mapfile ordering also matters for tree-based match types
Willy made me realize that tree-based matching may also suffer from
out-of-order mapfile loading, contrary to what is said in
b546bb6d ("BUG/MINOR: map: list-based matching potential ordering
regression") and the associated REGTEST.

Indeed, in case of duplicated keys, we want to be sure that only the key
that was first seen in the file will be returned (as long as it is not
removed). The above fix is still valid, and the list-based match regtest
will also prevent regressions for tree-based match since mapfile loading
logic is currently match-type agnostic.
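
As a hedged illustration (file name, keys and values are made up), a map
file with a duplicated key must keep returning the first value, whatever
the match type:

  # hosts.map (hypothetical contents)
  example.com  be_first    # first occurrence: the value that must keep being returned
  example.com  be_second   # duplicate key: must never win while the first entry exists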

But let's clarify that by making both the code comment and the regtest
more precise.
2024-01-11 11:13:54 +01:00
Aurelien DARRAGON
b546bb6d67 BUG/MINOR: map: list-based matching potential ordering regression
An unexpected side-effect was introduced by 5fea597 ("MEDIUM: map/acl:
Accelerate several functions using pat_ref_elt struct ->head list")

The above commit tried to use the eb tree API to manipulate elements as
much as possible in the hope of accelerating some functions.

Prior to 5fea597, pattern_read_from_file() used to iterate over all
elements from the map file in the same order they were seen in the file
(using list_for_each_entry) to push them in the pattern expression.

Now, since eb api is used to iterate over elements, the ordering is lost
very early.

This is known to cause behavior changes with existing setups (same conf
and map file) when compared with previous versions for some list-based
matching methods as described in GH #2400. For instance, the map_dom()
converter may return a different matching key from the one that was
returned by older haproxy versions.

For IP or STR matching, matching is based on tree lookups for better
efficiency, so in this case the ordering is lost in the name of
performance. The order in which the elements are loaded doesn't matter
because tree ordering is based on the content, not on the position.

But with some other types, matching is based on list lookups (e.g.: dom),
and the order in which elements are pushed into the list can affect the
matching element that will be returned (in case of multiple matches, since
only the first matching element in the list will be returned).

Despite the documentation not officially stating that the file ordering
should be preserved for list-based matching methods, it's probably best
to be conservative here and stick to historical behavior. Moreover, there
was no performance benefit from using the eb tree api to iterate over
elements in pattern_read_from_file() since all elements are visited
anyway.
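
As a hedged illustration (the map path and contents are made up), with a
list-based match such as map_dom() both entries below match the Host
"www.sub.example.org", and the one appearing first in the file is the
one expected to be returned:

  # domains.map (hypothetical contents)
  #   example.org      grp1   <- first in the file: expected result
  #   sub.example.org  grp2
  http-request set-var(txn.grp) req.hdr(host),map_dom(/etc/haproxy/domains.map)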

This should be backported to 2.9.
2024-01-10 18:02:13 +01:00
Aurelien DARRAGON
d7964c52ce BUG/MEDIUM: map/acl: pat_ref_{set,delete}_by_id regressions
Some regressions were introduced by 5fea59754b ("MEDIUM: map/acl:
Accelerate several functions using pat_ref_elt struct ->head list")

pat_ref_delete_by_id() fails to properly unlink and free the removed
reference because it bypasses the pat_ref_delete_by_ptr() made for
that purpose. This function is normally used everywhere the target
reference is set for removal, such as the pat_ref_delete() function
that matches pattern against a string. The call was probably skipped
by accident during the rewrite of the function.

With the above commit also comes another undesirable change:
both pat_ref_delete_by_id() and pat_ref_set_by_id() directly use the
<refelt> argument as a valid pointer (they do dereference it).

This is wrong, because <refelt> is unsafe and should be handled as an
ID, not a pointer (hence the function name). Indeed, the calling function
may directly pass user input from the CLI as <refelt> argument, so we must
first ensure that it points to a valid element before using it, else it is
probably invalid and we shouldn't touch it.

What this patch essentially does is revert pat_ref_set_by_id() and
pat_ref_delete_by_id() to their pre-5fea59754b behavior. This seems to
be the only optimization from that patch that doesn't apply.

Fortunately, after reviewing the changes with Fred, it seems that these
two functions are only involved in commands for manipulating maps or
acls on the CLI, so the "missed" opportunity to improve their performance
shouldn't matter much. Nonetheless, if we wanted to speed up the reference
lookup by ID, we could consider adding an eb64 tree for that specific
purpose that contains all pattern references IDs (ie: pointers) so that
eb lookup functions may be used instead of linear list search.
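
A minimal sketch of the idea, assuming HAProxy's list API and the field
names mentioned in this log (not the exact code of the fix):

  /* treat <refelt> as an untrusted ID: only dereference it once it has
   * been found in the reference's own list of elements */
  struct pat_ref_elt *elt, *found = NULL;

  list_for_each_entry(elt, &ref->head, list) {
          if (elt == refelt) {
                  found = elt;
                  break;
          }
  }
  if (found)
          pat_ref_delete_by_ptr(ref, found);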

The issue was raised by Marko Juraga as he failed to perform an acl
removal by reference on the CLI on 2.9, which was known to work properly
on other versions.

It should be backported on 2.9.

Co-Authored-by: Frédéric Lécaille <flecaille@haproxy.com>
2023-12-08 14:26:06 +01:00
Christopher Faulet
67c03508d6 MEDIUM: pattern: Add support for virtual and optional files for patterns
Before this patch, it was not possible to use a list of patterns, a map
or a list of acls without an existing file. However, it could be handy to
just use an ID, with no file on the disk. It is pretty useful for anyone
managing these lists dynamically. It could also be handy to try to load a
list from a file if it exists, without failing if it does not. This way,
it becomes possible to make a cold start without any file (instead of an
empty file), dynamically add and delete patterns, and periodically dump
the list to a file to reuse it on reload (via an external process).

In this patch, we use some prefixes to be able to use virtual or
optional files.

The default case remains unchanged: regular files are used. A filename
with no prefix is used as the reference, and it must exist on the
disk. With the prefix "file@", the same is performed; internally this
prefix is skipped. Thus the same file, with or without the "file@"
prefix, references the same list of patterns.

To use a virtual map, the "virt@" prefix must be used. No file is read,
even if the following name looks like a filename; it is just an ID. The
prefix is part of the ID and must always be used.

To use an optional file, i.e. a file that may or may not exist on the
disk at startup, the "opt@" prefix must be used. If the file exists, its
content is loaded, but HAProxy doesn't complain if it does not. The
prefix is not part of the ID. For a given file, optional and regular
references point to the same list of patterns.

This patch should fix the issue #2202.
2023-12-06 10:24:41 +01:00
Christopher Faulet
660e4185e1 MINOR: pattern: Use reference name as filename to read patterns from a file
It is only a small API refactoring. The filename is no longer used when
pat_ref_read_from_file_smp() or pat_ref_read_from_file() functions are
called. The filename was already used to create the reference on the list of
patterns. Thus, we now directly use info from this reference.
2023-12-06 10:24:41 +01:00
Willy Tarreau
3ac9912837 OPTIM: pattern: save memory and time using ebst instead of ebis
In the pat_ref_elt struct, the pattern string is stored outside of the
node element, using a pointer to an strdup(). Not only this needlessly
wastes at least 16-24 bytes per entry (8 for the pointer, 8-16 for the
allocator), it also makes the tree descent less efficient since both
the node and the string have to be visited for each layer (hence at least
two cache lines). Let's use an ebmb storage and place the pattern right
at the end of the pat_ref_elt, making it a variable-sized element instead.

The set-map test below jumps from 173 to 182 kreq/s/core, and the memory
usage drops from 356 MB to 324 MB:

  http-request set-map(/dev/null) %[rand(1000000)] 1

This is even more visible with large maps: after loading 16M IP addresses
into a map, the process uses this amount of memory:

  - 3.15 GB with haproxy-2.8
  - 4.21 GB with haproxy-2.9-dev11
  - 3.68 GB with this patch

So that's a net saving of 32 bytes per entry here, which cuts in half the
extra cost of the tree, and loading a large map takes about 20% less time.
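
A rough layout sketch of the change, using stand-in types rather than
the real HAProxy definitions:

  struct eb_node_stub { void *left, *right; };  /* stand-in for the ebtree node */

  /* before (ebis-style): the key lives in a separate strdup()'d buffer,
   * costing a pointer plus allocator overhead and one more cache line
   * per visited node */
  struct elt_before {
          struct eb_node_stub node;
          char *pattern;
  };

  /* after (ebst/ebmb-style): the key is stored right at the end of the
   * element, which becomes variable-sized and is allocated in one block */
  struct elt_after {
          struct eb_node_stub node;
          char pattern[];
  };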
2023-11-27 11:25:07 +01:00
Willy Tarreau
58185669d8 BUG/MEDIUM: pattern: don't trim pools under lock in pat_ref_purge_range()
There's a subtle issue that results from pat_ref_purge_range() trying
to release memory. Since commit 0d93a8186 ("MINOR: pools: work around
possibly slow malloc_trim() during gc") that was backported to 2.3,
trim_all_pools() now protects itself against concurrent malloc() and
free() by isolating itself. The problem is that pat_ref_purge_range()
must be called under a lock, which is precisely what's done in
cli_io_handler_clear_map(). Thus during a clearing of a map, if
another thread tries to access or update an entry in the same map, it
will wait for the ref->lock to be released, and trim_all_pools() will
wait for all threads to be harmless, thus causing a deadlock. Note
that disabling memory trimming cannot work around the problem here
because it's tested only under isolation.

The solution here consists in moving the call to trim_all_pools() to
the caller, out of the lock.
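
A minimal sketch of the resulting call pattern in the caller (lock
label, arguments and return value are approximate, not the exact code):

  HA_RWLOCK_WRLOCK(PATREF_LOCK, &ref->lock);
  done = pat_ref_purge_range(ref, from, to, budget);  /* no trim under the lock anymore */
  HA_RWLOCK_WRUNLOCK(PATREF_LOCK, &ref->lock);

  if (done)
          trim_all_pools();   /* outside the lock, so isolation cannot deadlock */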

This must be backported as far as 2.4.
2023-11-04 07:55:37 +01:00
Aurelien DARRAGON
0189a4679e MINOR: pattern/ip: simplify pat_match_ip() function
pat_match_ip() has been updated several times over the last decade to
introduce new features, but it was never cleaned up.

The result is that the function is pretty hard to read, and there are
multiple duplicated code blocks so it becomes error-prone to maintain it,
plus it bloats the haproxy binary for nothing.

In this patch, we move the tree search (ip4 / ip6) logic into 2
dedicated helper functions. This allows us to refactor pat_match_ip()
without touching the original behavior.
2023-09-21 09:50:56 +02:00
Aurelien DARRAGON
f80122db26 MINOR: pattern/ip: offload ip conversion logic to helper functions
Now that v4tov6() and v6tov4() were reworked to match behavior from
pat_match_ip() function in ("MINOR: tools/ip: v4tov6() and v6tov4()
rework"), we can remove code duplication in pat_match_ip() by directly
using those dedicated functions where relevant.
2023-09-21 09:50:55 +02:00
Frédéric Lécaille
81815a9a83 MEDIUM: map/acl: Replace map/acl spin lock by a read/write lock.
Replace ->lock type of pat_ref struct by HA_RWLOCK_T.
Replace all calls to HA_SPIN_LOCK() (resp. HA_SPIN_UNLOCK()) by HA_RWLOCK_WRLOCK()
(resp. HA_RWLOCK_WRUNLOCK()) when a write access is required.
Only one read access is needed: in the "show map" command callback,
cli_io_handler_map_lookup(), where the HA_SPIN_LOCK() call is replaced
by HA_RWLOCK_RDLOCK() (resp. HA_SPIN_UNLOCK() by HA_RWLOCK_RDUNLOCK()).
Replace HA_SPIN_INIT() calls by HA_RWLOCK_INIT() calls.
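
A short illustration of that read path (the lock label is assumed to be
the one used for pattern references):

  /* in cli_io_handler_map_lookup(), "show map" only needs shared access */
  HA_RWLOCK_RDLOCK(PATREF_LOCK, &ref->lock);
  /* ... walk and dump the matching entries ... */
  HA_RWLOCK_RDUNLOCK(PATREF_LOCK, &ref->lock);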
2023-08-25 15:42:03 +02:00
Frédéric Lécaille
5fea59754b MEDIUM: map/acl: Accelerate several functions using pat_ref_elt struct ->head list
Replace, as much as possible, list_for_each*() iterations around the
->head list member of the pat_ref_elt struct by the use of its
->ebpt_root member, which is an ebtree.
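
An illustrative before/after of the iteration style (member names follow
the message above and may not match the actual structures; process() is
a placeholder):

  /* before: list order, i.e. the order entries were appended */
  list_for_each_entry(elt, &ref->head, list)
          process(elt);

  /* after: ebtree order, faster lookups but no longer positional */
  for (node = ebpt_first(&ref->ebpt_root); node; node = ebpt_next(node)) {
          elt = ebpt_entry(node, struct pat_ref_elt, node);
          process(elt);
  }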
2023-08-25 15:42:01 +02:00
Frédéric Lécaille
745d1a269b MEDIUM: map/acl: Improve pat_ref_set_elt() efficiency (for "set-map", "add-acl"action perfs)
Store a pointer to the expression (struct pattern_expr) into the data
structures used to chain/store the map element references (struct
pat_ref_elt), i.e. the struct pattern_tree when stored into an ebtree or
the struct pattern_list when chained to a list.

Modify pat_ref_set_elt() so that it stops inspecting all the expressions
attached to a map while looking for the <elt> element passed as parameter
in order to retrieve the sample data to be parsed. Indeed, thanks to the
pointer added above to each pattern tree node or list element, they can
all be inspected directly from the <elt> passed as parameter via its
->tree_head and ->list_head members: the pattern tree nodes are stored in
elt->tree_head, and the pattern list elements are chained to the
elt->list_head list. This inspection was also the job of
pattern_find_smp(), which is no longer useful. This patch removes the
code of this function.
2023-08-25 15:41:59 +02:00
Frédéric Lécaille
0844bed7d3 MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs)
Organize the references to the pattern elements of a map (struct pat_ref_elt) into an ebtree:
  - add an eb_root member to the map (pat_ref struct) and an ebpt_node to its
    element (pat_ref_elt struct),
  - modify the code to insert these nodes into their ebtrees each time they are
    allocated. This is done in pat_ref_append().

Note that the ->head member (struct list) of the map (struct pat_ref)
could have been removed, but was not: it is still necessary to dump the
map contents from the CLI in the order the map elements have been
inserted.

This patch also modifies http_action_set_map(), which is the callback
used at least by the "set-map" action. The pat_ref_elt element returned
by pat_ref_find_elt() is no longer ignored, but reused if not NULL by
pat_ref_set() as the first element to look up from. The latter is also
modified to use the ebtree attached to the map in place of the ->head
list attached to each map element (pat_ref_elt struct).

Also modify pat_ref_find_elt() to make it use the ->eb_root ebtree added
to the map by this patch, in place of inspecting all the elements with a
strcmp() call.
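
A minimal sketch of the resulting lookup (the ebpt_node member name is
an assumption):

  struct pat_ref_elt *elt = NULL;
  struct ebpt_node *node;

  node = ebis_lookup(&ref->eb_root, key);   /* was: strcmp() over every element of ->head */
  if (node)
          elt = ebpt_entry(node, struct pat_ref_elt, node);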
2023-08-25 15:41:56 +02:00
Willy Tarreau
821fc95146 MINOR: pattern: do not needlessly lookup the LRU cache for empty lists
If a pattern list is empty, there's no way we can find its elements in
the pattern cache, so let's avoid this expensive lookup. This can happen
for ACLs or maps loaded from files that may optionally be empty for
example. Doing so improves the request rate by roughly 10% for a single
such match with only 8 threads. That's normal because the LRU cache
pre-creates an entry that is about to be committed for the case the list
lookup succeeds after a miss, so we bypass all this.
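
The shape of the shortcut, as a hedged sketch (the ->patterns field name
is assumed):

  if (LIST_ISEMPTY(&expr->patterns))
          return NULL;   /* nothing indexed: skip the LRU lookup and the pre-created entry */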
2023-08-22 07:27:01 +02:00
Willy Tarreau
9b060f148e MINOR: pattern: use trim_all_pools() instead of a conditional malloc_trim()
First this will ensure that we serialize the threads and avoid severe
contention. Second it removes ugly ifdefs and conditions.
2023-03-22 17:30:28 +01:00
Miroslav Zagorac
d8a97d8f60 BUG/MINOR: illegal use of the malloc_trim() function if jemalloc is used
In the event that HAProxy is linked with the jemalloc library, it is still
shown that malloc_trim() is enabled when executing "haproxy -vv":
  ..
  Support for malloc_trim() is enabled.
  ..

This display is not so much a problem in itself; the real issue is that
malloc_trim() is called in the pat_ref_purge_range() function without
any check.

This was solved by setting the using_default_allocator variable to the
correct value in the detect_allocator() function; before calling
malloc_trim(), it is now checked whether the function should be called
at all.
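
The resulting guard, roughly:

  if (using_default_allocator)
          malloc_trim(0);   /* only meaningful with glibc's allocator, not jemalloc */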
2023-03-22 14:14:50 +01:00
Willy Tarreau
51d38a26fe BUG/MEDIUM: pattern: only visit equivalent nodes when skipping versions
Miroslav reported in issue #1802 a problem that affects atomic map/acl
updates. During an update, incorrect versions are properly skipped, but
in order to do so, we rely on ebmb_next() instead of ebmb_next_dup().
This means that if a new matching entry is in the process of being
added and is the first one to succeed in the lookup, we'll skip it due
to its version and use the next entry regardless of its value provided
that it has the correct version. For IP addresses and string prefixes
it's particularly visible because a lookup may match a new longer prefix
that's not yet committed (e.g. 11.0.0.1 would match 11/8 when 10/7 was
the only committed one), and skipping it could end up on 12/8 for
example. As soon as a commit for the last version happens, the issue
disappears.

This problem only affects tree-based matches: the "str", "ip", and "beg"
matches.

Here we replace the ebmb_next() values with ebmb_next_dup() for exact
string matches, and with ebmb_lookup_shorter() for longest matches,
which will first visit duplicates, then look for shorter prefixes. This
relies on previous commit:

    MINOR: ebtree: add ebmb_lookup_shorter() to pursue lookups

Both need to be backported to 2.4, where the generation ID was added.

Note that nowadays a simpler and more efficient approach might be employed,
by having a single version in the current tree, and a list of trees per
version. Manipulations would look up the tree version and work (and lock)
only in the relevant trees, while normal operations would be performed on
the current tree only. Committing would just be a matter of swapping tree
roots and deleting old trees contents.
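
A hedged sketch of the change for exact string matches (the generation
check is paraphrased by a hypothetical helper):

  node = ebst_lookup(&expr->pattern_tree, key);
  while (node && !entry_gen_is_committed(node))   /* hypothetical check */
          node = ebmb_next_dup(node);             /* was ebmb_next(): only visit equal keys */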
2022-08-01 11:59:46 +02:00
Tim Duesterhus
d5fc8fcb86 CLEANUP: Add haproxy/xxhash.h to avoid modifying import/xxhash.h
This solves setting XXH_INLINE_ALL in a cleaner way, because the imported
header is not modified, easing future updates.

see 6f7cc11e6d
2021-09-11 19:58:45 +02:00
Dragan Dosen
a75eea78e2 MINOR: map/acl: print the count of all the map/acl entries in "show map/acl"
The output of "show map/acl" now contains the 'entry_cnt' value that
represents the count of all the entries for each map/acl, not just the
active ones, which means that it also includes entries currently being
added.
2021-05-25 08:44:45 +02:00
Willy Tarreau
da7f11bfb5 CLEANUP: pattern: remove the unused and dangerous pat_ref_reload()
This function was not used anymore after the atomic updates were
implemented in 2.3, and it must not be used given that it does not
yield and can easily make the process hang for tens of seconds on
large acls/maps. Let's remove it before someone uses it as an
example to implement something else!
2021-05-11 16:49:55 +02:00
Willy Tarreau
a13afe6535 MINOR: pattern: support purging arbitrary ranges of generations
Instead of being able to purge only values older than a specific value,
let's support arbitrary ranges and make pat_ref_purge_older() just be
one special case of this one.
2021-04-30 15:36:31 +02:00
Willy Tarreau
2b71810cb3 CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").

Let's use more explicit (but slightly longer) names now:

   LIST_ADD        ->       LIST_INSERT
   LIST_ADDQ       ->       LIST_APPEND
   LIST_ADDED      ->       LIST_INLIST
   LIST_DEL        ->       LIST_DELETE

The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.
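
For instance, with the new names (illustrative snippet with made-up
variables, not taken from the patch):

  LIST_INSERT(&queue, &item->list);   /* insert at the head (formerly LIST_ADD) */
  LIST_APPEND(&queue, &item->list);   /* append at the tail (formerly LIST_ADDQ) */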

The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.

The list doc was updated.
2021-04-21 09:20:17 +02:00
Willy Tarreau
295a89c029 MINOR: pattern: make the pat_lru_seed read_mostly
This seed is created once at boot and is used in every LRU hash when
caching results. Let's mark it read_mostly.
2021-04-10 19:27:41 +02:00
Willy Tarreau
9057a0026e CLEANUP: pattern: make all pattern tables read-only
Interestingly, all arrays used to declare patterns were read-write while
only hard-coded. Let's mark them const so that they move from data to
rodata and don't risk to experience false sharing.
2021-04-10 17:49:41 +02:00
Willy Tarreau
61cfdf4fd8 CLEANUP: tree-wide: replace free(x);x=NULL with ha_free(&x)
This makes the code more readable and less prone to copy-paste errors.
In addition, it allows to place some __builtin_constant_p() predicates
to trigger a link-time error in case the compiler knows that the freed
area is constant. It will also produce compile-time error if trying to
free something that is not a regular pointer (e.g. a function).

The DEBUG_MEM_STATS macro now also defines an instance for ha_free()
so that all these calls can be checked.

178 occurrences were converted. The vast majority of them were handled
by the following Coccinelle script, some slightly refined to better deal
with "&*x" or with long lines:

  @ rule @
  expression E;
  @@
  - free(E);
  - E = NULL;
  + ha_free(&E);

It was verified that the resulting code is the same, more or less a
handful of cases where the compiler optimized slightly differently
the temporary variable that holds the copy of the pointer.

A non-negligible amount of {free(str);str=NULL;str_len=0;} are still
present in the config part (mostly header names in proxies). These
ones should also be cleaned for the same reasons, and probably be
turned into ist strings.
2021-02-26 21:21:09 +01:00
Willy Tarreau
dc2410d093 CLEANUP: pattern: rename pat_ref_commit() to pat_ref_commit_elt()
It's about the third time I get confused by these functions, half of
which manipulate the reference as a whole and those manipulating only
an entry. For me "pat_ref_commit" means committing the pattern reference,
not just an element, so let's rename it. A number of other ones should
really be renamed before 2.4 gets released :-/
2021-01-15 14:11:59 +01:00
Thayne McCombs
8f0cc5c4ba CLEANUP: Fix spelling errors in comments
This is from the output of codespell. It's done at once over a bunch
of files and only affects comments, so there is nothing user-visible.
No backport needed.
2021-01-08 14:56:32 +01:00
Dragan Dosen
967e7e79af MEDIUM: xxhash: use the XXH3 functions to generate 64-bit hashes
Replace the XXH64() function calls with the XXH3 variant function
XXH3_64bits_withSeed() where possible.
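
For example (the seed name is taken from the pattern LRU cache mentioned
elsewhere in this log):

  hash = XXH3_64bits_withSeed(key, len, pat_lru_seed);   /* was: XXH64(key, len, pat_lru_seed) */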
2020-12-23 06:39:21 +01:00
Thierry Fournier
a68affeaa9 BUG/MINOR: pattern: a sample marked as const could be written
The functions add a final 0 to the string if the final 0 is not set,
but don't check the CONST flag. This patch duplicates the string
if the final zero is not set and the string is CONST.
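
A rough sketch of the idea (not the exact code of the patch; smp_dup()
is used here to obtain a writable copy):

  if (smp->flags & SMP_F_CONST)
          smp_dup(smp);                                /* writable copy before patching it */
  smp->data.u.str.area[smp->data.u.str.data] = '\0';   /* now safe to append the final 0 */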

Should be backported until 2.2 (at least)
2020-11-11 10:43:15 +01:00
Willy Tarreau
38d41996c1 MEDIUM: pattern: turn the pattern chaining to single-linked list
It does not require heavy deletion from the expr anymore, so we can now
turn this to a single-linked list since most of the time we want to delete
all instances of a given pattern from the head. By doing so we save 32 bytes
of memory per pattern. The pat_unlink_from_head() function was adjusted
accordingly.
2020-11-05 19:27:09 +01:00
Willy Tarreau
867a8a5a10 MINOR: pattern: prepare removal of a pattern from the list head
Instead of using LIST_DEL() on the pattern itself inside an expression,
we look it up from its head. The goal is to get rid of the double-linked
list while this usage remains exclusively for freeing on startup error!
2020-11-05 19:27:09 +01:00
Willy Tarreau
2817472bb0 MINOR: pattern: during reload, delete elements from the ref, not the expression
Instead of scanning all elements from the expression and using the
slow delete path there, let's use the faster way which involves
pat_delete_gen() while the elements are detached from their reference.
2020-11-05 19:27:09 +01:00
Willy Tarreau
ae83e63b48 MEDIUM: pattern: make pat_ref_prune() rely on pat_ref_purge_older()
When purging all of a reference, it's much more efficient to scan the
reference patterns from the reference head and delete all derivative
patterns than to scan the expressions. The only thing is that we need
to proceed both for the current and next generations, in case there is
a huge gap between the two. With this, purging 20M IP addresses in small
batches of 100 takes roughly 3 seconds.
2020-11-05 19:27:09 +01:00
Willy Tarreau
94b9abe200 MINOR: pattern: add pat_ref_purge_older() to purge old entries
This function will be usable to purge at most a specified number of old
entries from a reference. Entries are declared old if their generation
number is in the past compared to the one passed in argument. This will
ease removal of early entries when new ones have been appended.

We also call malloc_trim() when available, at the end of the series,
because this is one place where there is a lot of memory to save. Reloads
of 1M IP addresses used in an ACL made the process grow up to 1.7 GB RSS
after 10 reloads and roughly stabilize there without this call, versus
only 260 MB when the call is present. Sadly there is no direct equivalent
for jemalloc, which stabilizes around 800MB-1GB.
2020-11-05 19:27:09 +01:00
Willy Tarreau
1a6857b9c1 MINOR: pattern: implement pat_ref_load() to load a pattern at a given generation
pat_ref_load() basically combines pat_ref_append() and pat_ref_commit().
It's very similar to pat_ref_add() except that it also allows to set the
generation ID and the line number. pat_ref_add() was modified to directly
rely on it to avoid code duplication. Note that a previous declaration
of pat_ref_load() was removed as it was just a leftover of an earlier
incarnation of something possibly similar, so no existing functionality
was changed here.
2020-11-05 19:27:09 +01:00
Willy Tarreau
0439e5eeb4 MINOR: pattern: add pat_ref_commit() to commit a previously inserted element
This function will be used after a successful pat_ref_append() to propagate
the pattern to all use places (including parsing and indexing). On failure,
it will entirely roll back all insertions and free the pattern itself. It
also preserves the generation number so that it is convenient for use in
association with pat_ref_append(). pat_ref_add() was modified to rely on
it instead of open-coding the insertion and roll-back.
2020-11-05 19:27:09 +01:00
Willy Tarreau
c93da6950e MEDIUM: pattern: only match patterns that match the current generation
Instead of matching any pattern found in the tree, only match those
matching the current generation of entries. This will make sure that
reloads are atomic, regardless of the time they take to complete, and
that newly added data are not matched until the whole reference is
committed. For consistency we proceed the same way on "show map" and
"show acl".

This will have no impact for now since generations are not used.
2020-11-05 19:27:09 +01:00
Willy Tarreau
29947745b5 MINOR: pattern: store a generation number in the reference patterns
Right now it's not possible to perform a safe reload because we don't
know what patterns were recently added or were already present. This
patch adds a generation counter to the reference patterns so that it
is possible to know what generation of the reference they were loaded
with. A reference now has two generations, the current one, used for
all additions, and the next one, allocated to those wishing to update
the contents. The generation wraps at 2^32 so comparisons must be made
relative to the current position.

The idea will be that upon full reload, the caller will first get a new
generation ID, will insert all new patterns using it, will then switch
the current ID to the new one, and will delete all entries older than
the current ID. This has the benefit of supporting chunked updates that
remain consistent and that won't block the whole process for ages like
pat_ref_reload() currently does.
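
A hedged sketch of that sequence (helper names and signatures are
approximate, based on the related commits elsewhere in this log):

  unsigned int gen = pat_ref_newgen(ref);   /* reserve the next generation */

  /* insert every new entry under <gen>, e.g. with pat_ref_load(ref, gen, ...) */
  /* then switch the reference's current generation to <gen> */

  pat_ref_purge_older(ref, gen, 100);       /* drop older entries in batches of 100 */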
2020-11-05 19:27:09 +01:00
Willy Tarreau
1fd52f70e5 MINOR: pattern: introduce pat_ref_delete_by_ptr() to delete a valid reference
Till now the only way to remove a known reference was via
pat_ref_delete_by_id() which scans the whole list to find a matching pointer.
Let's add pat_ref_delete_by_ptr() which takes a valid pointer. It can be
called by the function above after the pointer is found, and can also be
used to roll back a failed insertion much more efficiently.
2020-11-05 19:27:09 +01:00
Willy Tarreau
a98b2882ac CLEANUP: pattern: remove pat_delete_fcts[] and pattern_head->delete()
These ones are not used anymore, so let's remove them to remove a bit
of the complexity. The ACL keyword's delete() function could be removed
as well, though most keyword declarations are positional and we have a
high risk of introducing a mistake here, so let's not touch the ACL part.
2020-11-05 19:27:09 +01:00
Willy Tarreau
e828d8f0e8 MINOR: pattern: perform a single call to pat_delete_gen() under the expression
When we're removing an element under the expression lock, we no longer
need to run over all ->delete() functions via the expressions, since
we know that the single function does it fine now. Note that at this
point, pattern->delete() is not used at all through out the code anymore.
2020-11-05 19:27:09 +01:00
Willy Tarreau
f1c0892aa6 MINOR: pattern: remerge the list and tree deletion functions
pat_del_tree_gen() was already chained onto pat_del_list_gen() to deal
with remaining cases, so let's complete the merge and have a generic
pattern deletion function acting on the reference and taking care of
reliably removing all elements.
2020-11-05 19:27:09 +01:00
Willy Tarreau
78777ead32 MEDIUM: pattern: change the pat_del_* functions to delete from the references
This is the next step in speeding up entry removal. Now we don't scan
the whole lists or trees for elements pointing to the target reference,
instead we start from the reference and delete all linked patterns.

This simplifies some delete functions since we don't need anymore to
delete multiple times from an expression since all nodes appear after
the reference element. We can now have one generic list and one generic
tree deletion function.

This required the replacement of pattern_delete() with an open-coded
version since we now need to lock all expressions first before proceeding.
This means there is a high risk of lock inversion here but given that the
expressions are always scanned in the same order from the same head, this
must not happen.

Now deleting first entries is instantaneous, and it's still slow to
delete the last ones when looking up their ID since it still requires
to look them up by a full scan, but it's already way faster than
previously. Typically removing the last 10 IPs from a 20M-entry ACL
with a full scan for each took less than 2 seconds.

It would be technically possible to make use of indexed entries to
speed up most lookups for removal by value (e.g. IP addresses) but
that's for later.
2020-11-05 19:27:09 +01:00
Willy Tarreau
4bdd0a13d6 MEDIUM: pattern: link all final elements from the reference
There is a data model issue in the current pattern design that makes
pattern deletion extremely expensive: there's no direct way from a
reference to access all indexed occurrences. As such, the only way
to remove all indexed entries corresponding to a reference update
is to scan all expressions' lists and trees to find a link to the
reference. While this was possibly OK when map removal was not
common and most maps were small, this is not conceivable anymore
with GeoIP maps containing 10M+ entries and del-map operations that
are triggered from http-request rulesets.

This patch introduces two list heads from the pattern reference, one
for the objects linked by lists and one for those linked by tree node.
Ideally a single list would be enough but the linked elements are too
unrelated to be distinguished at the moment, so we'll need two
lists. However for the long term a single-linked list will suffice but
for now it's not possible due to the way elements are removed from
expressions. As such this patch adds 32 bytes of memory usage per
reference plus 16 per indexed entry, but both will be cut in half
later.

The links are not yet used for deletion, this patch only ensures the
list is always consistent.
2020-11-05 19:27:09 +01:00
Willy Tarreau
6d8a68914e MINOR: pattern: make the delete and prune functions more generic
Now we have a single prune() function to act on an expression, and one
delete function for the lists and one for the trees. The presence of a
pointer in the lists is enough to warrant a free, and we rely on the
PAT_SF_REGFREE flag to decide whether to free using free() or regfree().
2020-11-05 19:27:09 +01:00
Willy Tarreau
9b5c8bbc89 MINOR: pattern: new sflag PAT_SF_REGFREE indicates regex_free() is needed
Currently we have no way to know how to delete/prune a pattern in a
generic way. A pattern doesn't contain its own type so we don't know
what function to call. Tree nodes are roughly OK, but not lists, where
regexes are possible. Let's add one new bit for sflags at index time to
indicate that regex_free() will be needed upon deletion. It's not used
for now.
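
The decision this flag enables, as a hedged sketch (the ->sflags and
->ptr field names are assumed):

  if (pat->sflags & PAT_SF_REGFREE)
          regex_free(pat->ptr.reg);   /* compiled regex: needs the dedicated free */
  else
          free(pat->ptr.ptr);         /* plain pointer: a regular free() is enough */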
2020-11-05 19:27:08 +01:00
Willy Tarreau
d4164dcd4a CLEANUP: pattern: delete the back refs at once during pat_ref_reload()
It's pointless to delete a backref and relink it to the next entry since
the next entry is going to do the exact same and so on until all of them
are deleted. Let's simply delete backrefs on reload.
2020-11-05 19:27:08 +01:00
Willy Tarreau
3ee0de1b41 MINOR: pattern: move the update revision to the pat_ref, not the expression
It's not possible to uniquely update a single expression without updating
the pattern reference; I don't know why we put the revision in the
expression back then, given that it in fact provides an update for a
full pattern. Let's move the revision into the reference's head instead.
2020-11-05 19:27:08 +01:00
Willy Tarreau
114d698fde MEDIUM: pattern: call malloc_trim() on pat_ref_reload()
This is one case where we may release large amounts of data at once. Tests
show that without this, after 10 full reloads of an ACL containing 1M IP
addresses, the memory usage grew and stabilized around 1.7 GB of RSS. With
this change, it stays around 260 MB and is stable across reloads.
2020-11-05 19:27:08 +01:00
Willy Tarreau
a5bbaaf9f4 CLEANUP: pattern: fix spelling/grammatical/copy-paste in comments
The code is horrible to work with because most functions are documented
with misleading comments resulting from many spelling and grammatical
mistakes, and plenty of remains of copy-paste mentioning arguments that
do not exist and return values that are never set. Too many hours wasted
writing non-working code because of assumptions resulting from this,
let's fix this once for all now!
2020-10-31 13:14:10 +01:00