225 Commits

Author SHA1 Message Date
Ilia Shipitsin
27a6353ceb CLEANUP: assorted typo fixes in the code, commits and doc 2025-04-03 11:37:25 +02:00
Aurelien DARRAGON
f72a66eef2 MINOR: pattern: publish event_hdl events on pat_ref updates
Now that PAT_REF events were defined in previous commit, let's actually
publish them from pattern API where relevant. Unlike server events,
pattern reference events are only published in the pat_ref subscriber's
list on purpose, because in some setups patref updates (updates performed
on a map for instance from action or cli) are very frequent, and we don't
want to impact pattern API performance just for that.

Moreover, as the main use case is to be able to subscribe to maps updates
from Lua, allowing a per-pattern reference registration is already enough.

No additional data is provided for such events (also for performance reason)

Care was taken not to publish events when the update doesn't affect the
live subset (the one targeted by curr_gen).
2024-11-29 07:22:25 +01:00
Aurelien DARRAGON
aa69a02d7f MEDIUM: pattern: always consider gen_id for pat_ref lookup operations
Historically, pat_ref lookup operations were performed on the whole
pat_ref elements list. As such, set, find and delete operations on a given
key would cause any matching element in pat_ref to be considered.

When prepare/commit operations were added, gen_id was impelemnted in
order to be able to work on a subset from pat_ref without impacting
the current (live) version from pat_ref, until a new subset is committed
to replace the current one.

While the logic was good, there remained a design flaw from the historical
implementation: indeed, legacy functions such as pat_ref_set(),
pat_ref_delete() and pat_ref_find_elt() kept performing the lookups on the
whole set of elements instead of considering only elements from the current
subset. Because of this, mixing new prepare/commit operations with legacy
operations could yield unexpected results.

For instance, before this commit:

  echo "add map #0 key oldvalue" | socat /tmp/ha.sock -
  echo "prepare map #0" | socat /tmp/ha.sock -
  New version created: 1
  echo "add map @1 #0 key newvalue" | socat /tmp/ha.sock -
  echo "del map #0 key" | socat /tmp/ha.sock -
  echo "commit map @1 #0" | socat /tmp/ha.sock -

  -> the result would be that "key" entry doesn't exist anymore after the
  commit, while we would expect the new value to be there instead.

Thanks to the previous commits, we may finally fix this issue: for set,
find_elt and delete operations, the current generation id is considered.

With the above example, it means that the "del map #0 key" would only
target elements from the current subset, thus elements in "version 1" of
the map would be immune to the delete (as we would expect it to work).
2024-11-26 16:12:31 +01:00
Aurelien DARRAGON
010c34b8c7 MEDIUM: pattern: consider gen_id in pat_ref_set_from_node()
Don't set all duplicates from a given node if they don't have the same
gen_id. Indeed, now we consider the gen_id to only work on the same
pattern ref revision.
2024-11-26 16:12:26 +01:00
Aurelien DARRAGON
4792f27892 MINOR: pattern: add pat_ref_gen_delete() function
pat_ref_gen_delete(ref, gen_id, key) tries to delete all samples belonging
to <gen_id> and matching <key> under <ref>

The goal is to be able to target a single subset from <ref>
2024-11-26 16:12:21 +01:00
Aurelien DARRAGON
a131c542a6 MINOR: pattern: add pat_ref_gen_find_elt() function
pat_ref_gen_find_elt(ref, gen_id, key) tries to find <elt> element
belonging to <gen_id> and matching <key> in <ref> reference.

The goal is to be able to target a single subset from <ref>
2024-11-26 16:12:16 +01:00
Aurelien DARRAGON
c9d6af3c6d MINOR: pattern: add pat_ref_gen_set() function
pat_ref_gen_set(ref, gen_id, value, err) modifies to <value> the sample
of all patterns matching <key> and belonging to <gen_id> (generation id)
under <ref>

The goal is to be able to target a single subset from <ref>
2024-11-26 16:12:11 +01:00
Aurelien DARRAGON
3d250b3be8 MINOR: pattern: split pat_ref_set()
split pat_ref_set() function in 2 distinct functions. Indeed, since
0844bed7d3 ("MEDIUM: map/acl: Improve pat_ref_set() efficiency (for
"set-map", "add-acl" action perfs)"), pat_ref_set() prototype was updated
to include an extra <elt> argument. But the logic behind is not explicit
because the function will not only try to set <elt>, but also its
duplicate (unlike pat_ref_set_elt() which only tries to update <elt>).

Thus, to make it clearer and better distinguish between the key-based
lookup version and the elt-based one, restotre pat_ref_set() previous
prototype and add a dedicated pat_ref_set_elt_duplicate() that takes
<elt> as argument and tries to update <elt> and all duplicates.
2024-11-26 16:12:05 +01:00
Willy Tarreau
555994c968 OPTIM: pattern: only apply LRU cache for large enough lists
As shown in issue #1518, the LRU cache has a non-null cost that can
sometimes be above the match cost it's trying to avoid. After a number
of tests, it appears that:
  - "simple" match operations (sub, beg, end, int etc) reach a break-even
    after ~20 patterns in list
  - "heavy" match operations (reg) reach a break-even after ~5 patterns in
    list

Let's only consult the LRU cache when the number of patterns in the
expression is at least as large as this limit. Of course there will
always be outliers but it already starts good.

Another improvement consists in reducing the cache size to further
speed up lookups, which makes sense if less expressions use the cache.
2024-11-15 15:33:04 +01:00
Aurelien DARRAGON
aba3ed62ae MINOR: pattern: add pat_ref_free() helper func
For now, pat_ref struct are never freed, except during init in case of
error. The freeing is done directly in the init functions because we
don't have an helper for that.

No having an helper func to properly free pat_ref struct doesn't encourage
us to free unused pat_ref structs, plus it is error-prone if new dynamic
members are added to pat_ref struct in the future.

To fix that, let's add a pat_ref_free() helper func and use it where
relevant (which means only under pat_ref init function for now..)
2024-11-07 11:36:13 +01:00
Aurelien DARRAGON
e8a0dbff93 OPTIM: pattern: use malloc() to initialize new pat_ref struct
As mentioned in the previous commit, in _pat_ref_new(), it was not
strictly needed to explicitly assign all struct members to 0 since
the struct was allocated with calloc() which does the zeroing for us.

However, it was verified that we already initialize all fields explictly,
thus there is no reason to keep using calloc() instead of malloc(). In
fact using malloc() is less expensive, so let's use that instead now.
2024-11-07 11:36:08 +01:00
Aurelien DARRAGON
d1397401f0 MINOR: pattern: add _pat_ref_new() helper func
pat_ref_newid() and pat_ref_new() are two functions to create and
initialize a pat_ref struct based on input parameters.

Both function perform the same generic allocation and initialization
for pat_ref struct, thus there is quite a lot of code redundancy.

This is error-prone if the pat_ref init sequence has to be updated at
some point.

To reduce maintenance costs, let's add a _pat_ref_new() helper func that
takes care of the generic allocation and base initialization for pat_ref
struct.
2024-11-07 11:36:01 +01:00
Willy Tarreau
9f8d9c9e8b BUG/MINOR: pattern: do not leave a leading comma on "set" error messages
Commit 4f2493f355 ("BUG/MINOR: pattern: pat_ref_set: fix UAF reported by
coverity") dropped the condition to concatenate error messages and as
such introduced a leading comma in front of all of them. Then commit
911f4d93d4 ("BUG/MINOR: pattern: pat_ref_set: return 0 if err was found")
changed the behavior to stop at the first error anyway, so all the
mechanics dedicated to the concatenation of error messages is no longer
needed and we can simply return the error as-is, without inserting any
comma.

This should be backported where the patches above are backported.
2024-09-10 08:55:29 +02:00
Aurelien DARRAGON
68cfb222b5 BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
Since c5959fd ("MEDIUM: pattern: merge same pattern"), UAF (leading to
crash) can be experienced if the same pattern file (and match method) is
used in two default sections and the first one is not referenced later in
the config. In this case, the first default section will be cleaned up.
However, due to an unhandled case in the above optimization, the original
expr which the second default section relies on is mistakenly freed.

This issue was discovered while trying to reproduce GH #2708. The issue
was particularly tricky to reproduce given the config and sequence
required to make the UAF happen. Hopefully, Github user @asmnek not only
provided useful informations, but since he was able to consistently
trigger the crash in his environment he was able to nail down the crash to
the use of pattern file involved with 2 named default sections. Big thanks
to him.

To fix the issue, let's push the logic from c5959fd a bit further. Instead
of relying on "do_free" variable to know if the expression should be freed
or not (which proved to be insufficient in our case), let's switch to a
simple refcounting logic. This way, no matter who owns the expression, the
last one attempting to free it will be responsible for freeing it.
Refcount is implemented using a 32bit value which fills a previous 4 bytes
structure gap:

        int                        mflags;               /*    80     4 */

        /* XXX 4 bytes hole, try to pack */

        long unsigned int          lock;                 /*    88     8 */
(output from pahole)

Even though it was not reproduced in 2.6 or below by @asmnek (the bug was
revealed thanks to another bugfix), this issue theorically affects all
stable versions (up to c5959fd), thus it should be backported to all
stable versions.
2024-09-09 16:07:05 +02:00
Aurelien DARRAGON
8157c1caf2 BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg}
Using valgrind when running map_beg or map_str, the following error is
reported:

==242644== Conditional jump or move depends on uninitialised value(s)
==242644==    at 0x2E4AB1: pat_match_str (pattern.c:457)
==242644==    by 0x2E81ED: pattern_exec_match (pattern.c:2560)
==242644==    by 0x343176: sample_conv_map (map.c:211)
==242644==    by 0x27522F: sample_process_cnv (sample.c:1330)
==242644==    by 0x2752DB: sample_process (sample.c:1373)
==242644==    by 0x319917: action_store (vars.c:814)
==242644==    by 0x24D451: http_req_get_intercept_rule (http_ana.c:2697)

In fact, the error is legit, because in pat_match_{beg,str}, we
dereference the buffer on len+1 to check if a value was previously set,
and then decide to force NULL-byte if it wasn't set.

But the approach is no longer compatible with current architecture:
data past str.data is not guaranteed to be initialized in the buffer.
Thus we cannot dereference the value, else we expose us to uninitialized
read errors. Moreover, the check is useless, because we systematically
set the ending byte to 0 when the conditions are met.

Finally, restoring the older value after the lookup is not relevant:
indeed, either the sample is marked as const and in such case it
is already duplicated, or the sample is not const and we forcefully add
a terminating NULL byte outside from the actual string bytes (since we're
past str.data), so as we didn't alter effective string data and that data
past str.data cannot be dereferenced anyway as it isn't guaranteed to be
initialized, there's no point in restoring previous uninitialized data.

It could be backported in all stable versions. But since this was only
detected by valgrind and isn't known to cause issues in existing
deployments, it's probably better to wait a bit before backporting it
to avoid any breakage.. although the fix should be theoretically harmless.
2024-09-09 15:57:30 +02:00
Aurelien DARRAGON
3449525a02 BUG/MINOR: pattern: prevent const sample from being tampered in pat_match_beg()
This is a complementary patch to a68affeaa ("BUG/MINOR: pattern: a sample
marked as const could be written"). Indeed the same logic from
pat_match_str() is used there, but we lack the check to ensure that the
sample is not const before writing data to it.

It could be backported to all stable versions.
2024-09-09 15:57:23 +02:00
Valentine Krasnobaeva
911f4d93d4 BUG/MINOR: pattern: pat_ref_set: return 0 if err was found
pat_ref_set_elt() returns 0, if we are run out of memory or can't parse a new
map value. Any arror message emitted by pat_ref_set_elt() is saved in err
buffer, if its provided by caller. These error messages are cumulated during
the loop.

pat_ref_set() is used to update values in map, referred to the same given key.
If during the update pat_ref_set_elt() fails, let's retun 0 to caller
immediately. We have the same non-unique key and the same new value in each
loop. So it seems quite odd to cumulate the same error messages and print it in
CLI:

        > add map @1 mytest.map <<
        + 1.0.1.11 TestA
        + 1.0.1.11 TESTA
        + 1.0.1.11 test_a
        +

        > set map mytest.map 1.0.1.11 15
         unable to parse '15' unable to parse '15' unable to parse '15'.

cli_parse_set_map(), which calls pat_ref_set() to update map, will return only
one error message with this patch:

> set map mytest.map 1.0.1.11 15
 unable to parse '15'.

hlua_set_map() and http_action_set_map() don't provide error buffer and will
just exit on the first error.

This should be backported in all stable versions.
2024-08-13 16:13:43 +02:00
Valentine Krasnobaeva
4f2493f355 BUG/MINOR: pattern: pat_ref_set: fix UAF reported by coverity
memprintf() performs realloc and updates then the pointer to an output buffer,
where it has written the data. So free() is called on the previous buffer
address, if it was provided.

pat_ref_set_elt() uses memprintf() to write its error message as well as
pat_ref_set(). So, when we re-enter into the while loop the second time and
pat_ref_set_elt() has returned, the *err ptr (previous value of *merr) is
already freed by memprintf() from pat_ref_set_el().

'if (!found)' condition is false at this point, because we've found a node at
the first loop. So, the second memprintf(), in order to write error messages,
does again free(*err).

This should be backported in all stable versions.
2024-08-13 16:13:41 +02:00
Aurelien DARRAGON
3b0bf5097b MINOR: map: mapfile ordering also matters for tree-based match types
Willy made me realize that tree-based matching may also suffer from
out-of-order mapfile loading, as opposed to what's being said in
b546bb6d ("BUG/MINOR: map: list-based matching potential ordering
regression") and the associated REGTEST.

Indeed, in case of duplicated keys, we want to be sure that only the key
that was first seen in the file will be returned (as long as it is not
removed). The above fix is still valid, and the list-based match regtest
will also prevent regressions for tree-based match since mapfile loading
logic is currently match-type agnostic.

But let's clarify that by making both the code comment and the regtest
more precise.
2024-01-11 11:13:54 +01:00
Aurelien DARRAGON
b546bb6d67 BUG/MINOR: map: list-based matching potential ordering regression
An unexpected side-effect was introduced by 5fea597 ("MEDIUM: map/acl:
Accelerate several functions using pat_ref_elt struct ->head list")

The above commit tried to use eb tree API to manipulate elements as much
as possible in the hope to accelerate some functions.

Prior to 5fea597, pattern_read_from_file() used to iterate over all
elements from the map file in the same order they were seen in the file
(using list_for_each_entry) to push them in the pattern expression.

Now, since eb api is used to iterate over elements, the ordering is lost
very early.

This is known to cause behavior changes with existing setups (same conf
and map file) when compared with previous versions for some list-based
matching methods as described in GH #2400. For instance, the map_dom()
converter may return a different matching key from the one that was
returned by older haproxy versions.

For IP or STR matching, matching is based on tree lookups for better
efficiency, so in this case the ordering is lost at the name of
performance. The order in which they are loaded doesn't matter because
tree ordering is based on the content, it is not positional.

But with some other types, matching is based on list lookups (e.g.: dom),
and the order in which elements are pushed into the list can affect the
matching element that will be returned (in case of multiple matches, since
only the first matching element in the list will be returned).

Despite the documentation not officially stating that the file ordering
should be preserved for list-based matching methods, it's probably best
to be conservative here and stick to historical behavior. Moreover, there
was no performance benefit from using the eb tree api to iterate over
elements in pattern_read_from_file() since all elements are visited
anyway.

This should be backported to 2.9.
2024-01-10 18:02:13 +01:00
Aurelien DARRAGON
d7964c52ce BUG/MEDIUM: map/acl: pat_ref_{set,delete}_by_id regressions
Some regressions were introduced by 5fea59754b ("MEDIUM: map/acl:
Accelerate several functions using pat_ref_elt struct ->head list")

pat_ref_delete_by_id() fails to properly unlink and free the removed
reference because it bypasses the pat_ref_delete_by_ptr() made for
that purpose. This function is normally used everywhere the target
reference is set for removal, such as the pat_ref_delete() function
that matches pattern against a string. The call was probably skipped
by accident during the rewrite of the function.

With the above commit also comes another undesirable change:
both pat_ref_delete_by_id() and pat_ref_set_by_id() directly use the
<refelt> argument as a valid pointer (they do dereference it).

This is wrong, because <refelt> is unsafe and should be handled as an
ID, not a pointer (hence the function name). Indeed, the calling function
may directly pass user input from the CLI as <refelt> argument, so we must
first ensure that it points to a valid element before using it, else it is
probably invalid and we shouldn't touch it.

What this patch essentially does, is that it reverts pat_ref_set_by_id()
and pat_ref_delete_by_id() to pre 5fea59754b behavior. This seems like
it was the only optimization from the patch that doesn't apply.

Hopefully, after reviewing the changes with Fred, it seems that the 2
functions are only being involved in commands for manipulating maps or
acls on the cli, so the "missed" opportunity to improve their performance
shouldn't matter much. Nonetheless, if we wanted to speed up the reference
lookup by ID, we could consider adding an eb64 tree for that specific
purpose that contains all pattern references IDs (ie: pointers) so that
eb lookup functions may be used instead of linear list search.

The issue was raised by Marko Juraga as he failed to perform an an acl
removal by reference on the CLI on 2.9 which was known to work properly
on other versions.

It should be backported on 2.9.

Co-Authored-by: Frédéric Lécaille <flecaille@haproxy.com>
2023-12-08 14:26:06 +01:00
Christopher Faulet
67c03508d6 MEDIUM: pattern: Add support for virtual and optional files for patterns
Before this patch, it was not possible to use a list of patterns, map or a
list of acls, without an existing file.  However, it could be handy to just
use an ID, with no file on the disk. It is pretty useful for everyone
managing dynamically these lists. It could also be handy to try to load a
list from a file if it exists without failing if not. This way, it could be
possible to make a cold start without any file (instead of empty file),
dynamically add and del patterns, dump the list to the file periodically to
reuse it on reload (via an external process).

In this patch, we uses some prefixes to be able to use virtual or optional
files.

The default case remains unchanged. regular files are used. A filename, with
no prefix, is used as reference, and it must exist on the disk. With the
prefix "file@", the same is performed. Internally this prefix is
skipped. Thus the same file, with ou without "file@" prefix, references the
same list of patterns.

To use a virtual map, "virt@" prefix must be used. No file is read, even if
the following name looks like a file. It is just an ID. The prefix is part
of ID and must always be used.

To use a optional file, ie a file that may or may not exist on a disk at
startup, "opt@" prefix must be used. If the file exists, its content is
loaded. But HAProxy doesn't complain if not. The prefix is not part of
ID. For a given file, optional files and regular files reference the same
list of patterns.

This patch should fix the issue #2202.
2023-12-06 10:24:41 +01:00
Christopher Faulet
660e4185e1 MINOR: pattern: Use reference name as filename to read patterns from a file
It is only a small API refactoring. The filename is no longer used when
pat_ref_read_from_file_smp() or pat_ref_read_from_file() functions are
called. The filename was already used to create the reference on the list of
patterns. Thus, we now directly use info from this reference.
2023-12-06 10:24:41 +01:00
Willy Tarreau
3ac9912837 OPTIM: pattern: save memory and time using ebst instead of ebis
In the pat_ref_elt struct, the pattern string is stored outside of the
node element, using a pointer to an strdup(). Not only this needlessly
wastes at least 16-24 bytes per entry (8 for the pointer, 8-16 for the
allocator), it also makes the tree descent less efficient since both
the node and the string have to be visited for each layer (hence at least
two cache lines). Let's use an ebmb storage and place the pattern right
at the end of the pat_ref_elt, making it a variable-sized element instead.

The set-map test below jumps from 173 to 182 kreq/s/core, and the memory
usage drops from 356 MB to 324 MB:

  http-request set-map(/dev/null) %[rand(1000000)] 1

This is even more visible with large maps: after loading 16M IP addresses
into a map, the process uses this amount of memory:

  - 3.15 GB with haproxy-2.8
  - 4.21 GB with haproxy-2.9-dev11
  - 3.68 GB with this patch

So that's a net saving of 32 bytes per entry here, which cuts in half the
extra cost of the tree, and loading a large map takes about 20% less time.
2023-11-27 11:25:07 +01:00
Willy Tarreau
58185669d8 BUG/MEDIUM: pattern: don't trim pools under lock in pat_ref_purge_range()
There's a subtle issue that results from pat_ref_purge_range() trying
to release memory. Since commit 0d93a8186 ("MINOR: pools: work around
possibly slow malloc_trim() during gc") that was backported to 2.3,
trim_all_pools() now protects itself against concurrent malloc() and
free() by isolating itself. The problem is that pat_ref_purge_range()
must be called under a lock, which is precisely what's done in
cli_io_handler_clear_map(). Thus during a clearing of a map, if
another thread tries to access or update an entry in the same map, it
will wait for the ref->lock to be released, and trim_all_pools() will
wait for all threads to be harmless, thus causing a deadlock. Note
that disabling memory trimming cannot work around the problem here
because it's tested only under isolation.

The solution here consists in moving the call to trim_all_pools() to
the caller, out of the lock.

This must be backported as far as 2.4.
2023-11-04 07:55:37 +01:00
Aurelien DARRAGON
0189a4679e MINOR: pattern/ip: simplify pat_match_ip() function
pat_match_ip() has been updated several times over the last decade to
introduce new features, but it was never cleaned up.

The result is that the function is pretty hard to read, and there are
multiple duplicated code blocks so it becomes error-prone to maintain it,
plus it bloats the haproxy binary for nothing.

In this patch, we move the tree search (ip4 / ip6) logic into 2
dedicated helper functions. This allows us to refactor pat_match_ip()
without touching to the original behavior.
2023-09-21 09:50:56 +02:00
Aurelien DARRAGON
f80122db26 MINOR: pattern/ip: offload ip conversion logic to helper functions
Now that v4tov6() and v6tov4() were reworked to match behavior from
pat_match_ip() function in ("MINOR: tools/ip: v4tov6() and v6tov4()
rework"), we can remove code duplication in pat_match_ip() by directly
using those dedicated functions where relevant.
2023-09-21 09:50:55 +02:00
Frdric Lcaille
81815a9a83 MEDIUM: map/acl: Replace map/acl spin lock by a read/write lock.
Replace ->lock type of pat_ref struct by HA_RWLOCK_T.
Replace all calls to HA_SPIN_LOCK() (resp. HA_SPIN_UNLOCK()) by HA_RWLOCK_WRLOCK()
(resp. HA_RWLOCK_WRUNLOCK()) when a write access is required.
There is only one read access which is needed. This is in the "show map" command
callback, cli_io_handler_map_lookup() where a HA_SPIN_LOCK() call is replaced
by HA_RWLOCK_RDLOCK() (resp. HA_SPIN_UNLOCK() by HA_RWLOCK_RDUNLOCK).
Replace HA_SPIN_INIT() calls by HA_RWLOCK_INIT() calls.
2023-08-25 15:42:03 +02:00
Frdric Lcaille
5fea59754b MEDIUM: map/acl: Accelerate several functions using pat_ref_elt struct ->head list
Replace as much as possible list_for_each*() around ->head list, member of
pat_ref_elt struct by use of its ->ebpt_root member which is an ebtree.
2023-08-25 15:42:01 +02:00
Frdric Lcaille
745d1a269b MEDIUM: map/acl: Improve pat_ref_set_elt() efficiency (for "set-map", "add-acl"action perfs)
Store a pointer to the expression (struct pattern_expr) into the data structure
used to chain/store the map element references (struct pat_ref_elt) , e.g. the
struct pattern_tree when stored into an ebtree or struct pattern_list when
chained to a list.

Modify pat_ref_set_elt() to stop inspecting all the expressions attached to a map
and to look for the <elt> element passed as parameter to retrieve the sample data
to be parsed. Indeed, thanks to the pointer added above to each pattern tree nodes
or list elements, they all can be inspected directly from the <elt> passed as
parameter and its ->tree_head and ->list_head member: the pattern tree nodes are
stored into elt->tree_head, and the pattern list elements are chained to
elt->list_head list. This inspection was also the job of pattern_find_smp() which
is no more useful. This patch removes the code of this function.
2023-08-25 15:41:59 +02:00
Frdric Lcaille
0844bed7d3 MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs)
Organize reference to pattern element of map (struct pat_ref_elt) into an ebtree:
  - add an eb_root member to the map (pat_ref struct) and an ebpt_node to its
    element (pat_ref_elt struct),
  - modify the code to insert these nodes into their ebtrees each time they are
    allocated. This is done in pat_ref_append().

Note that ->head member (struct list) of map (struct pat_ref) is not removed
could have been removed. This is not the case because still necessary to dump
the map contents from the CLI in the order the map elememnts have been inserted.

This patch also modifies http_action_set_map() which is the callback at least
used by "set-map" action. The pat_ref_elt element returned by pat_ref_find_elt()
is no more ignored, but reused if not NULL by pat_ref_set() as first element to
lookup from. This latter is also modified to use the ebtree attached to the map
in place of the ->head list attached to each map element (pat_ref_elt struct).

Also modify pat_ref_find_elt() to makes it use ->eb_root map ebtree added to the
map by this patch in place of inspecting all the elements with a strcmp() call.
2023-08-25 15:41:56 +02:00
Willy Tarreau
821fc95146 MINOR: pattern: do not needlessly lookup the LRU cache for empty lists
If a pattern list is empty, there's no way we can find its elements in
the pattern cache, so let's avoid this expensive lookup. This can happen
for ACLs or maps loaded from files that may optionally be empty for
example. Doing so improves the request rate by roughly 10% for a single
such match for only 8 threads. That's normal because the LRU cache
pre-creates an entry that is about to be committed for the case the list
lookup succeeds after a miss, so we bypass all this.
2023-08-22 07:27:01 +02:00
Willy Tarreau
9b060f148e MINOR: pattern: use trim_all_pools() instead of a conditional malloc_trim()
First this will ensure that we serialize the threads and avoid severe
contention. Second it removes ugly ifdefs and conditions.
2023-03-22 17:30:28 +01:00
Miroslav Zagorac
d8a97d8f60 BUG/MINOR: illegal use of the malloc_trim() function if jemalloc is used
In the event that HAProxy is linked with the jemalloc library, it is still
shown that malloc_trim() is enabled when executing "haproxy -vv":
  ..
  Support for malloc_trim() is enabled.
  ..

It's not so much a problem as it is that malloc_trim() is called in the
pat_ref_purge_range() function without any checking.

This was solved by setting the using_default_allocator variable to the
correct value in the detect_allocator() function and before calling
malloc_trim() it is checked whether the function should be called.
2023-03-22 14:14:50 +01:00
Willy Tarreau
51d38a26fe BUG/MEDIUM: pattern: only visit equivalent nodes when skipping versions
Miroslav reported in issue #1802 a problem that affects atomic map/acl
updates. During an update, incorrect versions are properly skipped, but
in order to do so, we rely on ebmb_next() instead of ebmb_next_dup().
This means that if a new matching entry is in the process of being
added and is the first one to succeed in the lookup, we'll skip it due
to its version and use the next entry regardless of its value provided
that it has the correct version. For IP addresses and string prefixes
it's particularly visible because a lookup may match a new longer prefix
that's not yet committed (e.g. 11.0.0.1 would match 11/8 when 10/7 was
the only committed one), and skipping it could end up on 12/8 for
example. As soon as a commit for the last version happens, the issue
disappears.

This problem only affects tree-based matches: the "str", "ip", and "beg"
matches.

Here we replace the ebmb_next() values with ebmb_next_dup() for exact
string matches, and with ebmb_lookup_shorter() for longest matches,
which will first visit duplicates, then look for shorter prefixes. This
relies on previous commit:

    MINOR: ebtree: add ebmb_lookup_shorter() to pursue lookups

Both need to be backported to 2.4, where the generation ID was added.

Note that nowadays a simpler and more efficient approach might be employed,
by having a single version in the current tree, and a list of trees per
version. Manipulations would look up the tree version and work (and lock)
only in the relevant trees, while normal operations would be performed on
the current tree only. Committing would just be a matter of swapping tree
roots and deleting old trees contents.
2022-08-01 11:59:46 +02:00
Tim Duesterhus
d5fc8fcb86 CLEANUP: Add haproxy/xxhash.h to avoid modifying import/xxhash.h
This solves setting XXH_INLINE_ALL in a cleaner way, because the imported
header is not modified, easing future updates.

see 6f7cc11e6dd0f01b437fba893da2edd2362660a2
2021-09-11 19:58:45 +02:00
Dragan Dosen
a75eea78e2 MINOR: map/acl: print the count of all the map/acl entries in "show map/acl"
The output of "show map/acl" now contains the 'entry_cnt' value that
represents the count of all the entries for each map/acl, not just the
active ones, which means that it also includes entries currently being
added.
2021-05-25 08:44:45 +02:00
Willy Tarreau
da7f11bfb5 CLEANUP: pattern: remove the unused and dangerous pat_ref_reload()
This function was not used anymore after the atomic updates were
implemented in 2.3, and it must not be used given that it does not
yield and can easily make the process hang for tens of seconds on
large acls/maps. Let's remove it before someone uses it as an
example to implement something else!
2021-05-11 16:49:55 +02:00
Willy Tarreau
a13afe6535 MINOR: pattern: support purging arbitrary ranges of generations
Instead of being able to purge only values older than a specific value,
let's support arbitrary ranges and make pat_ref_purge_older() just be
one special case of this one.
2021-04-30 15:36:31 +02:00
Willy Tarreau
2b71810cb3 CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").

Let's use more explicit (but slightly longer) names now:

   LIST_ADD        ->       LIST_INSERT
   LIST_ADDQ       ->       LIST_APPEND
   LIST_ADDED      ->       LIST_INLIST
   LIST_DEL        ->       LIST_DELETE

The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.

The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.

The list doc was updated.
2021-04-21 09:20:17 +02:00
Willy Tarreau
295a89c029 MINOR: pattern: make the pat_lru_seed read_mostly
This seed is created once at boot and is used in every LRU hash when
caching results. Let's mark it read_mostly.
2021-04-10 19:27:41 +02:00
Willy Tarreau
9057a0026e CLEANUP: pattern: make all pattern tables read-only
Interestingly, all arrays used to declare patterns were read-write while
only hard-coded. Let's mark them const so that they move from data to
rodata and don't risk to experience false sharing.
2021-04-10 17:49:41 +02:00
Willy Tarreau
61cfdf4fd8 CLEANUP: tree-wide: replace free(x);x=NULL with ha_free(&x)
This makes the code more readable and less prone to copy-paste errors.
In addition, it allows to place some __builtin_constant_p() predicates
to trigger a link-time error in case the compiler knows that the freed
area is constant. It will also produce compile-time error if trying to
free something that is not a regular pointer (e.g. a function).

The DEBUG_MEM_STATS macro now also defines an instance for ha_free()
so that all these calls can be checked.

178 occurrences were converted. The vast majority of them were handled
by the following Coccinelle script, some slightly refined to better deal
with "&*x" or with long lines:

  @ rule @
  expression E;
  @@
  - free(E);
  - E = NULL;
  + ha_free(&E);

It was verified that the resulting code is the same, more or less a
handful of cases where the compiler optimized slightly differently
the temporary variable that holds the copy of the pointer.

A non-negligible amount of {free(str);str=NULL;str_len=0;} are still
present in the config part (mostly header names in proxies). These
ones should also be cleaned for the same reasons, and probably be
turned into ist strings.
2021-02-26 21:21:09 +01:00
Willy Tarreau
dc2410d093 CLEANUP: pattern: rename pat_ref_commit() to pat_ref_commit_elt()
It's about the third time I get confused by these functions, half of
which manipulate the reference as a whole and those manipulating only
an entry. For me "pat_ref_commit" means committing the pattern reference,
not just an element, so let's rename it. A number of other ones should
really be renamed before 2.4 gets released :-/
2021-01-15 14:11:59 +01:00
Thayne McCombs
8f0cc5c4ba CLEANUP: Fix spelling errors in comments
This is from the output of codespell. It's done at once over a bunch
of files and only affects comments, so there is nothing user-visible.
No backport needed.
2021-01-08 14:56:32 +01:00
Dragan Dosen
967e7e79af MEDIUM: xxhash: use the XXH3 functions to generate 64-bit hashes
Replace the XXH64() function calls with the XXH3 variant function
XXH3_64bits_withSeed() where possible.
2020-12-23 06:39:21 +01:00
Thierry Fournier
a68affeaa9 BUG/MINOR: pattern: a sample marked as const could be written
The functions add final 0 to string if the final 0 is not set,
but don't check the flag CONST. This patch duplicates the strings
if the final zero is not set and the string is CONST.

Should be backported until 2.2 (at least)
2020-11-11 10:43:15 +01:00
Willy Tarreau
38d41996c1 MEDIUM: pattern: turn the pattern chaining to single-linked list
It does not require heavy deletion from the expr anymore, so we can now
turn this to a single-linked list since most of the time we want to delete
all instances of a given pattern from the head. By doing so we save 32 bytes
of memory per pattern. The pat_unlink_from_head() function was adjusted
accordingly.
2020-11-05 19:27:09 +01:00
Willy Tarreau
867a8a5a10 MINOR: pattern: prepare removal of a pattern from the list head
Instead of using LIST_DEL() on the pattern itself inside an expression,
we look it up from its head. The goal is to get rid of the double-linked
list while this usage remains exclusively for freeing on startup error!
2020-11-05 19:27:09 +01:00
Willy Tarreau
2817472bb0 MINOR: pattern: during reload, delete elements frem the ref, not the expression
Instead of scanning all elements from the expression and using the
slow delete path there, let's use the faster way which involves
pat_delete_gen() while the elements are detached from ther reference.
2020-11-05 19:27:09 +01:00