The pattern match "found" fails to parse on binary type samples. The
reason is that it presents itself as an integer type which bin cannot
be cast into. We must always accept this match since it validates
anything on input regardless of the type. Let's just relax the parser
to accept it.
This problem might also exist in 1.5.
(cherry picked from commit 91cc2368a73198bddc3e140d267cce4ee08cf20e)
This patch removes the 32 bits unsigned integer and the 32 bit signed
integer. It replaces these types by a unique type 64 bit signed.
This makes easy the usage of integer and clarify signed and unsigned use.
With the previous version, signed and unsigned are used ones in place of
others, and sometimes the converter loose the sign. For example, divisions
are processed with "unsigned", if one entry is negative, the result is
wrong.
Note that the integer pattern matching and dotted version pattern matching
are already working with signed 64 bits integer values.
There is one user-visible change : the "uint()" and "sint()" sample fetch
functions which used to return a constant integer have been replaced with
a new more natural, unified "int()" function. These functions were only
introduced in the latest 1.6-dev2 so there's no impact on regular
deployments.
Many such function need a session, and till now they used to dereference
the stream. Once we remove the stream from the embryonic session, this
will not be possible anymore.
So as of now, sample fetch functions will be called with this :
- sess = NULL, strm = NULL : never
- sess = valid, strm = NULL : tcp-req connection
- sess = valid, strm = valid, strm->txn = NULL : tcp-req content
- sess = valid, strm = valid, strm->txn = valid : http-req / http-res
All of them can now retrieve the HTTP transaction *if it exists* from
the stream and be sure to get NULL there when called with an embryonic
session.
The patch is a bit large because many locations were touched (all fetch
functions had to have their prototype adjusted). The opportunity was
taken to also uniformize the call names (the stream is now always "strm"
instead of "l4") and to fix indent where it was broken. This way when
we later introduce the session here there will be less confusion.
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.
In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.
The files stream.{c,h} were added and session.{c,h} removed.
The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.
Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.
Once all changes are completed, we should see approximately this :
L7 - http_txn
L6 - stream
L5 - session
L4 - connection | applet
There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.
Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
A memory optimization can use the same pattern expression for many
equal pattern list (same parse method, index method and index_smp
method).
The pattern expression is returned by "pattern_new_expr", but this
function dont indicate if the returned pattern is already in use.
So, the caller function reload the list of patterns in addition with
the existing patterns. This behavior is not a problem with tree indexed
pattern, but it grows the lists indexed patterns.
This fix add a "reuse" flag in return of the function "pattern_new_expr".
If the flag is set, I suppose that the patterns are already loaded.
This fix must be backported into 1.5.
This code aims at clearing up the ACL parsing code a bit to make it
more obvious what happens in the case of an ACL keyword and what happens
in the case of a sample expression. Variables prev_type and cur_type were
merged, and ACL keyword processing functions are checked only once.
This patch could be backported into 1.5 after the previous patch, in order
to keep the code more maintainable.
Sample expressions involving converters in expression simply do not work
if the converter changes the sample type from the original keyword. Either
the keyword is a sample fetch keyword and its own type is used, or it's an
ACL keyword, and the keyword's parse/index/match functions are used despite
the converters. Thus it causes such a stupid error :
redirect location / if { date,ltime(%a) -i Fri }
[ALERT] 240/171746 (29127) : parsing [bug-conv.cfg:35] : error detected in proxy 'svc' while parsing redirect rule : error in condition: 'Fri' is not a number.
In fact now in ACLs, the case where the ACL keyword is alone is an exception
(eventhough the most common one). It's an exception to the sample expression
parsing rules since ACLs allow to redefine alternate parsing functions.
This fix does two things :
- it voids any references to the ACL keyword when a converter is involved
since we certainly not want to enforce a parser for a wrong data type ;
- it ensures that for all other cases (sample expressions), the type of
the expression is used instead of the type of the fetch keyword.
A significant cleanup of the code should be done, but this patch only aims
fixing the bug.
The fix should be backported into 1.5 since this appeared along the redesign
of the acl/pattern processing.
It appears than many people considers that the default match for a fetch
returning string is "exact match string" aka "str". This patch set this
match as default for strings.
Whatever ACL option beginning with a '-' is considered as a pattern
if it does not match a known option. This is a big problem because
typos are silently ignored, such as "-" or "-mi".
Better clearly check the complete option name and report a parsing
error if the option is unknown.
Last fix did address the issue for inlined patterns, but it was not
enough because the flags are lost as well when updating patterns
dynamically over the CLI.
Also if the same file was used once with -i and another time without
-i, their references would have been merged and both would have used
the same matching method.
It's appear that the patterns have two types of flags. The first
ones are relative to the pattern matching, and the second are
relative to the pattern storage. The pattern matching flags are
the same for all the patterns of one expression. Now they are
stored in the expression. The storage flags are information
returned by the pattern mathing function. This information is
relative to each entry and is stored in the "struct pattern".
Now, the expression matching flags are forwarded to the parse
and index functions. These flags are stored during the
configuration parsing, and they are used during the parse and
index actions.
This issue was introduced in dev23 with the major pattern rework,
and is a continuation of commit a631fc8 ("BUG/MAJOR: patterns: -i
and -n are ignored for inlined patterns"). No backport is needed.
These flags are only passed to pattern_read_from_file() which
loads the patterns from a file. The functions used to parse the
patterns from the current line do not provide the means to pass
the pattern flags so they're lost.
This issue was introduced in dev23 with the major pattern rework,
and was reported by Graham Morley. No backport is needed.
This patch replace a lot of pointeur by pattern matching identifier. If
the declared ACL use all the predefined pattern matching functions, the
register function gets the functions provided by "pattern.c" and
identified by the PAT_LATCH_*.
In the case of the acl uses his own functions, they can be declared, and
the acl registration doesn't change it.
This flag is no longer used. The last place using this, are the display
of the result of pattern matching in the cli command "get map" or "get
acl".
The first parameter of this command is the reference of the file used to
perform the lookup.
The acl and map function do the same work with the file parsing. This
patch merge these code in only one.
Note that the function map_read_entries_from_file() in the file "map.c"
is moved to the the function pat_ref_read_from_file_smp() in the file
"pattern.c". The code of this function is not modified, only the the
name and the arguments order has changed.
The find_smp search the smp using the value of the pat_ref_elt pointer.
The pat_find_smp_* are no longer used. The function pattern_find_smp()
known all pattern indexation, and can be found
The pattern reference are stored with two identifiers: the unique_id and
the reference.
The reference identify a file. Each file with the same name point to the
same reference. We can register many times one file. If the file is
modified, all his dependencies are also modified. The reference can be
used with map or acl.
The unique_id identify inline acl. The unique id is unique for each acl.
You cannot force the same id in the configuration file, because this
repport an error.
The format of the acl and map listing through the "socket" has changed
for displaying these new ids.
This patch extract the expect_type variable from the "struct pattern" to
"struct pattern_head". This variable is set during the declaration of
ACL and MAP. With this change, the function "pat_parse_len()" become
useless and can be replaced by "pat_parse_int()".
Implicit ACLs by default rely on the fetch's output type, so let's simply do
the same for all other ones. It has been verified that they all match.
Sometimes the same pattern file is used with the same index, parse and
parse_smp functions. If this two condition are true, these two pattern
are identical and the same struct can be used.
This patch add the following socket command line options:
show acl [<id>]
clear acl <id>
get acl <id> <pattern>
del acl <id> <pattern>
add acl <id> <pattern>
The system used for maps is backported in the pattern functions.
Some functions needs to change the sample associated to pattern. This
new pointer permit to return the a pointer to the sample pointer. The
caller can use or change the value.
This commit adds a delete function for patterns. It looks up all
instances of the pattern to delete and deletes them all. The fetch
keyword declarations have been extended to point to the appropriate
delete function.
Before this commit, the pattern_exec_match() function returns the
associate sample, the associate struct pattern or the associate struct
pattern_tree. This is complex to use, because we can check the type of
information returned.
Now the function return always a "struct pattern". If <fill> is not set,
only the value of the pointer can be used as boolean (NULL or other). If
<fill> is set, you can use the <smp> pointer and the pattern
information.
If information must be duplicated, it is stored in trash buffer.
Otherwise, the pattern can point on existing strings.
Before this patch, the indexation function check the declared patttern
matching function and index the data according with this function. This
is not useful to add some indexation mode.
This commit adds dedicated indexation function. Each struct pattern is
associated with one indexation function. This function permit to index
data according with the type of pattern and with the type of match.
This commit separes the "struct list" used for the chain the "struct
pattern" which contain the pattern data. Later, this change will permit
to manipulate lists ans trees with the same "struct pattern".
Each pattern parser take only one string. This change is reported to the
function prototype of the function "pattern_register()". Now, it is
called with just one string and no need to browse the array of args.
The goal of these patch is to simplify the prototype of
"pat_pattern_*()" functions. I want to replace the argument "char
**args" by a simple "char *arg" and remove the "opaque" argument.
"pat_parse_int()" and "pat_parse_dotted_ver()" are the unique pattern
parser using the "opaque" argument and using more than one string
argument of the char **args. These specificities are only used with ACL.
Other systems using this pattern parser (MAP and CLI) just use one
string for describing a range.
This two functions can read a range, but the min and the max must y
specified. This patch extends the syntax to describe a range with
implicit min and max. This is used for operators like "lt", "le", "gt",
and "ge". the syntax is the following:
":x" -> no min to "x"
"x:" -> "x" to no max
This patch moves the parsing of the comparison operator from the
functions "pat_parse_int()" and "pat_parse_dotted_ver()" to the acl
parser. The acl parser read the operator and the values and build a
volatile string readable by the functions "pat_parse_int()" and
"pat_parse_dotted_ver()". The transformation is done with these rules:
If the parser is "pat_parse_int()":
"eq x" -> "x"
"le x" -> ":x"
"lt x" -> ":y" (with y = x - 1)
"ge x" -> "x:"
"gt x" -> "y:" (with y = x + 1)
If the parser is "pat_parse_dotted_ver()":
"eq x.y" -> "x.y"
"le x.y" -> ":x.y"
"lt x.y" -> ":w.z" (with w.z = x.y - 1)
"ge x.y" -> "x.y:"
"gt x.y" -> "w.z:" (with w.z = x.y + 1)
Note that, if "y" is not present, assume that is "0".
Now "pat_parse_int()" and "pat_parse_dotted_ver()" accept only one
pattern and the variable "opaque" is no longer used. The prototype of
the pattern parsers can be changed.
This patch remove the limit of 32 groups. It also permit to use standard
"pat_parse_str()" function in place of "pat_parse_strcat()". The
"pat_parse_strcat()" is no longer used and its removed. Before this
patch, the groups are stored in a bitfield, now they are stored in a
list of strings. The matching is slower, but the number of groups is
low and generally the list of allowed groups is short.
The fetch function "smp_fetch_http_auth_grp()" used with the name
"http_auth_group" return valid username. It can be used as string for
displaying the username or with the acl "http_auth_group" for checking
the group of the user.
Maybe the names of the ACL and fetch methods are no longer suitable, but
I keep the current names for conserving the compatibility with existing
configurations.
The function "userlist_postinit()" is created from verification code
stored in the big function "check_config_validity()". The code is
adapted to the new authentication storage system and it is moved in the
"src/auth.c" file. This function is used to check the validity of the
users declared in groups and to check the validity of groups declared
on the "user" entries.
This resolve function is executed before the check of all proxy because
many acl needs solved users and groups.
The ACL keyword returned by find_acl_kw() is checked for having a
valid ->parse() function. This dates back 2007 when ACLs were reworked
in order to differenciate old and new keywords. This check is
inappropriate and confusing since all keywords have a parser now.
The ACL expression parser recently became a huge mess like a
spaghetti plate. The keyword is looked up at the beginning, then
sample fetches are processed, then an expression is initialized,
then arguments and converters are parsed but only if the keyword
was an ACL one, etc... Lots of "if" and redundant variables
everywhere making it hard to read and follow.
Let's move the args/conv parsing just after the keyword lookup.
At least now it's consistent that when we leave this if/else
statement, we have a sample expression initialized and full
parsed wherever the elements came from.
Just like for the last commit, we need to fix the ACL argument parser so
that it lets the lower layer do the job of referencing unresolved arguments
and correctly report the type of missing arguments.
Doing so ensures that we're consistent between all the functions in the whole
chain. This is important so that we can extract the argument parsing from this
function.
Now, the pat_parse_*() functions parses the incoming data. The input
"pattern" struct can be preallocated. If the parser needs to add some
buffers, it allocates memory.
The function pattern_register() runs the call to the parser, process
the key indexation and associate the "sample_storage" used by maps.
This patch remove the compatibility check from the input type and the
match method. Now, it checks if a casts from the input type to output
type exists and the pattern_exec_match() function apply casts before
each pattern matching.
ACL parse errors are not easy to understand since recent commit 348971e
(MEDIUM: acl: use the fetch syntax 'fetch(args),conv(),conv()' into the
ACL keyword) :
[ALERT] 339/154717 (26437) : parsing [check-bug.cfg:10] : error detected while parsing a 'stats admin' rule : unknown ACL or sample keyword 'env(a,b,c)': invalid arg 2 in fetch method 'env' : end of arguments expected at position 2, but got ',b,c'..
This error is only relevant to sample fetch keywords, so the new form is
a bit easier to understand :
[ALERT] 339/160011 (26626) : parsing [check-bug.cfg:12] : error detected while parsing a 'stats admin' rule : invalid arg 2 in fetch method 'env' : end of arguments expected at position 2, but got ',b,c' in sample expression 'env(a,b,c),upper'.
No backport is needed.
Commit 348971e (MEDIUM: acl: use the fetch syntax 'fetch(args),conv(),conv()'
into the ACL keyword) introduced a regression in the ACL parser. The second
argument of an ACL keyword is now mistakenly confused with a converter.
This bug is post-dev19 and does not require any backport.
We now have the following enums and all related functions return them and
consume them :
enum pat_match_res {
PAT_NOMATCH = 0, /* sample didn't match any pattern */
PAT_MATCH = 3, /* sample matched at least one pattern */
};
enum acl_test_res {
ACL_TEST_FAIL = 0, /* test failed */
ACL_TEST_MISS = 1, /* test may pass with more info */
ACL_TEST_PASS = 3, /* test passed */
};
enum acl_cond_pol {
ACL_COND_NONE, /* no polarity set yet */
ACL_COND_IF, /* positive condition (after 'if') */
ACL_COND_UNLESS, /* negative condition (after 'unless') */
};
It's just in order to avoid doubts when reading some code.
This patch just renames functions, types and enums. No code was changed.
A significant number of files were touched, especially the ACL arrays,
so it is likely that some external patches will not apply anymore.
One important thing is that we had to split ACL_PAT_* into two groups :
- ACL_TEST_{PASS|MISS|FAIL}
- PAT_{MATCH|UNMATCH}
A future patch will enforce enums on all these places to avoid confusion.
This patch just moves code without any change.
The ACL are just the association between sample and pattern. The pattern
contains the match method and the parse method. These two things are
different. This patch cleans the code by splitting it.
This will be used later with maps. Each map will associate an entry with
a sample_storage value.
This patch changes the "parse" prototype and all the parsing methods.
The goal is to associate "struct sample_storage" to each entry of
"struct acl_pattern". Only the "parse" function can add the sample value
into the "struct acl_pattern".
The map feature will need to match acl patterns. This patch extracts
the matching function from the global ACL function "acl_exec_cond".
The code was only moved to its own function, no functional changes were made.
With this split, the pattern indexation can apply to any source. The map
feature needs this functionality because the map cannot be loaded with the
same file format as the ones supported by acl_read_patterns_from_file().
The code was only moved to its own function, no functional changes were made.
If the acl keyword is a "fetch", the dedicated parsing function
"sample_parse_expr()" is used. Otherwise, the acl parsing function
"parse_acl_expr()" is extended to understand the syntax of a series
of converters placed after the "fetch" keyword.
Before this patch, each acl uses a "struct sample_fetch" and executes
it with the "<fetch>->process()" function. Now, the dedicated function
"sample_process()" is called.
These syntax are now avalaible:
acl bad req.hdr(host),lower -m str www
http-request redirect prefix /go-away if bad
acl bad hdr_beg(host),lower www
http-request redirect prefix /go-away if bad
A call to free_pattern_tree() upon exit() is made to free all ACL
patterns allocated in a tree (strings or IP addresses). Unfortunately
it happens that this function has been bogus from the beginning, it
walks over the whole tree, frees the nodes but forgets to remove them
from the tree prior to freeing them. So after visiting a leaf, the
next eb_next() call will require to revisit some of the upper nodes
that were just freed. This can remain unnoticed for a long time because
free() often just marks the area as free. But in cases of aggressive
memory freeing, the location will not be mapped anymore and the process
segfaults.
Note that the bug has no impact other than polluting kernel logs and
frightening sysadmins, since it happens just before exit().
Simply adding the debug code below makes it easier to reproduce the
same bug :
while (node) {
next = eb_next(node);
+ node->node_p = (void *)-1;
free(node);
node = next;
}
Many thanks to the StackExchange team for their very detailed bug report
that permitted to quickly understand this non-obvious bug!
This fix should be backported to 1.4 which introduced the bug.
It's quite common to write directives like the following :
tcp-request reject if WAIT_END { sc0_inc_gpc0 }
This one will never reject, because sc0_inc_gpc0 is provided no value
to compare against. The proper form should have been something like this :
tcp-request reject if WAIT_END { sc0_inc_gpc0 gt 0 }
or :
tcp-request reject if WAIT_END { sc0_inc_gpc0 -m found }
Now we detect the absence of any argument on the command line and emit
a warning suggesting alternatives or the use of "--" to really avoid
matching anything (might be used when debugging).
When a condition does something like :
action if A B C || D E F
If B returns a miss (can't tell true or false), C must not
be evaluated. This is important when C has a side effect
(eg: sc*_inc_gpc0). However the second part after the ||
can still be evaluated.
If haproxy is compiled with the USE_PCRE_JIT option, the length of the
string is used. If it is compiled without this option the function doesn't
use the length and expects a null terminated string.
The prototype of the function is ambiguous, and depends on the
compilation option. The developer can think that the length is always
used, and many bugs can be created.
This patch makes sure that the length is used. The regex_exec function
adds the final '\0' if it is needed.
William Lallemand reported a bug which happens when an ACL keyword using an
implicit argument (eg: a proxy name) is used : the keyword is not properly
set in the arglist field, resulting in an error about the previous keyword
being returned, or "(null)" if the faulty ACL appears first.
The bug only affects error reporting and is 1.5-specific, so no backport is
nedeed.
The current file "regex.h" define an abstraction for the regex. It
provides the same struct name and the same "regexec" function for the
3 regex types supported: standard libc, basic pcre and jit pcre.
The regex compilation function is not provided by this file. If the
developper wants to use regex, he must write regex compilation code
containing "#define *JIT*".
This patch provides a unique regex compilation function according to
the compilation options.
In addition, the "regex.h" file checks the presence of the "#define
PCRE_CONFIG_JIT" when "USE_PCRE_JIT" is enabled. If this flag is not
present, the pcre lib doesn't support JIT and "#error" is emitted.
This minor bug was found using the coccinelle script "da.cocci". The
len was initialized twice instead of setting the size. It's harmless
since no operations are performed on this empty string but needs to
be fixed anyway.
We're having a lot of duplicate code just because of minor variants between
fetch functions that could be dealt with if the functions had the pointer to
the original keyword, so let's pass it as the last argument. An earlier
version used to pass a pointer to the sample_fetch element, but this is not
the best solution for two reasons :
- fetch functions will solely rely on the keyword string
- some other smp_fetch_* users do not have the pointer to the original
keyword and were forced to pass NULL.
So finally we're passing a pointer to the keyword as a const char *, which
perfectly fits the original purpose.
There is no more reason for having "always_true", "always_false" and "env"
in acl.c while they're the most basic sample fetch keywords, so let's move
them to sample.c where it's easier to find them.
Benoit Dolez reported a failure to start haproxy 1.5-dev19. The
process would immediately report an internal error with missing
fetches from some crap instead of ACL names.
The cause is that some versions of gcc seem to trim static structs
containing a variable array when moving them to BSS, and only keep
the fixed size, which is just a list head for all ACL and sample
fetch keywords. This was confirmed at least with gcc 3.4.6. And we
can't move these structs to const because they contain a list element
which is needed to link all of them together during the parsing.
The bug indeed appeared with 1.5-dev19 because it's the first one
to have some empty ACL keyword lists.
One solution is to impose -fno-zero-initialized-in-bss to everyone
but this is not really nice. Another solution consists in ensuring
the struct is never empty so that it does not move there. The easy
solution consists in having a non-null list head since it's not yet
initialized.
A new "ILH" list head type was thus created for this purpose : create
an Initialized List Head so that gcc cannot move the struct to BSS.
This fixes the issue for this version of gcc and does not create any
burden for the declarations.
Commit 5adeda1 (acl: add option -m to change the pattern matching method)
was not completely correct with regards to boolean fetches. It only used
the sample type to determine if the test had to be performed as a boolean
instead of relying on the match function. Due to this, a test such as the
following would not correctly match as the pattern would be ignored :
acl srv_down srv_is_up(s2) -m int 0
No backport is needed as this was merged first in 1.5-dev18.
This is useful in order to take different actions across restarts without
touching the configuration (eg: soft-stop), or to pass some information
such as the local host name to the next hop.
Commit bef91e71 added the possibility to automatically use some fetch
functions instead of ACL functions, but for the fetch output type was
never used and setting the match method using -m was always mandatory.
Some fetch types are non-ambiguous and can intuitively be associated
with some ACL types :
SMP_T_BOOL -> bool
SMP_T_UINT/SINT -> int
SMP_T_IPV4/IPV6 -> ip
So let's have the ACL expression parser detect these ones automatically.
Other types are more ambiguous, especially everything related to strings,
as there are many string matching methods available and none of them is
the obvious standard matching method for any string. These ones will still
have to be specified using -m.
According to "man pcreapi", pcre_compile() does not accept being passed
a NULL pointer in errptr or erroffset. It immediately returns NULL, causing
any expression to fail. So let's pass real variables and make use of them
to improve error reporting.
When an ACL keyword needs a mandatory argument and this argument is of
type proxy or table, it is allowed not to specify it so that current
proxy is used by default.
In order to achieve this, the ACL expression parser builds a dummy
argument from scratch and marks it unresolved.
However, since recent changes on the ACL and samples, an unresolved
argument needs to be added to the unresolved list. This specific code
did not do it, resulting in random data being used as a proxy pointer
if no argument was passed for a proxy name, possibly even causing a
crash.
A quick workaround consists explicitly naming proxies in ACLs.
While ACL args were resolved after all the config was parsed, it was not the
case with sample fetch args because they're almost everywhere now.
The issue is that ACLs now solely rely on sample fetches, so their args
resolving doesn't work anymore. And many fetches involving a server, a
proxy or a userlist don't work at all.
The real issue is that at the bottom layers we have no information about
proxies, line numbers, even ACLs in order to report understandable errors,
and that at the top layers we have no visibility over the locations where
fetches are referenced (think log node).
After failing multiple unsatisfying solutions attempts, we now have a new
concept of args list. The principle is that every proxy has a list head
which contains a number of indications such as the config keyword, the
context where it's used, the file and line number, etc... and a list of
arguments. This list head is of the same type as the elements, so it
serves as a template for adding new elements. This way, it is filled from
top to bottom by the callers with the information they have (eg: line
numbers, ACL name, ...) and the lower layers just have to duplicate it and
add an element when they face an argument they cannot resolve yet.
Then at the end of the configuration parsing, a loop passes over each
proxy's list and resolves all the args in sequence. And this way there is
all necessary information to report verbose errors.
The first immediate benefit is that for the first time we got very precise
location of issues (arg number in a keyword in its context, ...). Second,
in order to do this we had to parse log-format and unique-id-format a bit
earlier, so that was a great opportunity for doing so when the directives
are encountered (unless it's a default section). This way, the recorded
line numbers for these args are the ones of the place where the log format
is declared, not the end of the file.
Userlists report slightly more information now. They're the only remaining
ones in the ACL resolving function.
Now it becomes possible to directly use sample fetches as the ACL fetch
methods. In this case, the matching method is mandatory. This allows to
form more ACL combinations from existing fetches and will limit the need
for new ACLs when everything is available to form them from sample fetches
and matches.
The acl_expr struct used to hold a pointer to the ACL keyword. But since
we now have all the relevant pointers, we don't need that anymore, we just
need the pointer to the keyword as a string in order to return warnings
and error messages.
So let's change this in order to remove the dependency on the acl_keyword
struct from acl_expr.
During this change, acl_cond_kw_conflicts() used to return a pointer to an
ACL keyword but had to be changed to return a const char* for the same reason.
ACL expressions now support "-m" in addition to "-i" and "-f". This new
option is followed by the name of the pattern matching method to be used
on the extracted pattern. This makes it possible to reuse existing sample
fetch methods with other matching methods (eg: regex). A "found" matching
method ignores any pattern and only verifies that the required sample was
found (useful for cookies).
The ACLs now use the fetch's ->use and ->val to decide upon compatibility
between the place where they are used and where the information are fetched.
The code is capable of reporting warnings about very fine incompatibilities
between certain fetches and an exact usage location, so it is expected that
some new warnings will be emitted on some existing configurations.
Two degrees of detection are provided :
- detecting ACLs that never match
- detecting keywords that are ignored
All tests show that this seems to work well, though bugs are still possible.
Proxy's acl_requires was a copy of all bits taken from ACLs, but we'll
get rid of ACL flags and only rely on sample fetches soon. The proxy's
acl_requires was only used to allocate an HTTP context when needed, and
was even forced in HTTP mode. So better have a flag which exactly says
what it's supposed to be used for.
These hooks, which established the relation between ACL_USE_* and the location
where the ACL were used, were never used because they were superseded with the
sample capabilities. Remove them now.
ACL fetch being inherited from the sample fetch keyword, we don't need
anymore to specify what function to use to validate the fetch arguments.
Note that the job is still done in the ACL parsing code based on elements
from the sample fetch structs.
Now that ACLs solely rely on sample fetch functions, make them use the
same arg mask. All inconsistencies have been fixed separately prior to
this patch, so this patch almost only adds a new pointer indirection
and removes all references to ARG*() in the definitions.
The parsing is still performed by the ACL code though.
ACL fetch functions used to directly reference a fetch function. Now
that all ACL fetches have their sample fetches equivalent, we can make
ACLs reference a sample fetch keyword instead.
In order to simplify the code, a sample keyword name may be NULL if it
is the same as the ACL's, which is the most common case.
A minor change appeared, http_auth always expects one argument though
the ACL allowed it to be missing and reported as such afterwards, so
fix the ACL to match this. This is not really a bug.
The file acl.c is a real mess, it both contains functions to parse and
process ACLs, and some sample extraction functions which act on buffers.
Some other payload analysers were arbitrarily dispatched to proto_tcp.c.
So now we're moving all payload-based fetches and ACLs to payload.c
which is capable of extracting data from buffers and rely on everything
that is protocol-independant. That way we can safely inflate this file
and only use the other ones when some fetches are really specific (eg:
HTTP, SSL, ...).
As a result of this cleanup, the following new sample fetches became
available even if they're not really useful :
always_false, always_true, rep_ssl_hello_type, rdp_cookie_cnt,
req_len, req_ssl_hello_type, req_ssl_sni, req_ssl_ver, wait_end
The function 'acl_fetch_nothing' was wrong and never used anywhere so it
was removed.
The "rdp_cookie" sample fetch used to have a mandatory argument while it
was optional in ACLs, which are supposed to iterate over RDP cookies. So
we're making it optional as a fetch too, and it will return the first one.
This flag is used on ACL matches that support being looking up patterns
in trees. At the moment, only strings and IPs support tree-based lookups,
but the flag is randomly set also on integers and binary data, and is not
even always set on strings nor IPs.
Better get rid of this mess by only relying on the matching function to
decide whether or not it supports tree-based lookups, this is safer and
easier to maintain.
The wrong variable is checked after a calloc() so a memory shortage would
result in a segfault while loading the config instead of a clean error.
This fix may be backported to 1.4 and 1.3 which are both affected.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
When leaving, during the deinit() process, prune_acl_expr() is called to
delete all ACL expressions. A bug was introduced with commit 34db1084 that
caused every other expression argument to be skipped, and more annoyingly,
it introduced the risk of scanning past the arg list and crashing or
freezing the old process during a reload.
Credits for finding this issue go to Dmitry Sivachenko who first reported
it, and second did a lot of research to narrow it down to a minimal
configuration.
Since 1.5-dev9, ACLs support multiple args. The changes performed in
acl_find_targets() were bogus as they were not always applied to the
current argument being processed, but sometimes to the first one only.
Fortunately till now, all ACLs which support resolvable arguments have
it in the first place only, so there was no impact.
The trash is used everywhere to store the results of temporary strings
built out of s(n)printf, or as a storage for a chunk when chunks are
needed.
Using global.tune.bufsize is not the most convenient thing either.
So let's replace trash with a chunk and directly use it as such. We can
then use trash.size as the natural way to get its size, and get rid of
many intermediary chunks that were previously used.
The patch is huge because it touches many areas but it makes the code
a lot more clear and even outlines places where trash was used without
being that obvious.
Some tests revealed that IPs not in the range of IPv6 subnets incorrectly
matched (for example "acl BUG src 2804::/16" applied to a src IP "127.0.0.1").
This is caused by the acl_match_ip() function applies a mask in host byte
order, whereas it should be in network byte order.
ACL and sample fetches use args list and it is really not convenient to
check for null args everywhere. Now for empty args we pass a constant
list of end of lists. It will allow us to remove many useless checks.
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.
Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.