A memory allocation failure happening in chash_init_server_tree while
trying to allocate a server's lb_nodes item used in consistent hashing
would have resulted in a crash. This function is only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
The function is defined when using linux+cpu affinity but is only used
if threads are enabled, so let's add this condition to avoid aa build
warning about an unused function when building with thread disabled.
This came in 2.4-dev17 with commit b56a7c89a ("MEDIUM: cfgparse: detect
numa and set affinity if needed") so no backport is needed.
These ones are used by virtually every config parser. Not only they
provide no benefit in being inlined, but they imply a very deep
dependency starting at proxy.h, which results for example in task.c
including openssl.
Let's move these two functions to cfgparse.c.
This one works just like .notice/.warning/.alert except that it prints
the message at level "DIAG" only when haproxy runs in diagnostic mode
(-dD). This can be convenient for example to pass a few hints to help
locate certain config parts or to leave messages about certain temporary
workarounds.
Example:
.diag "WTA/2021-05-07: $.LINE: replace 'redirect' with 'return' after final switch to 2.4"
http-request redirect location /goaway if ABUSE
These predicates respectively verify that the current version is at least
a given version or is before a specific one. The syntax is exactly the one
reported by "haproxy -v", though each component is optional, so both "1.5"
and "2.4-dev18-88910-48" are supported. Missing components equal zero, and
"dev" is below "pre" or "rc", which are both inferior to no such mention
(i.e. they are negative). Thus "2.4-dev18" is older than "2.4-rc1" which
is older than "2.4".
The "feature(name)" predicate will return true if <name> corresponds to
a name listed after a '+' in the features list, that is it was enabled at
build time with USE_<name>=1. Typical use cases will include OPENSSL, LUA
and LINUX_SPLICE. But maybe it will also be convenient to use with optional
addons such as PROMEX and the device detection modules to help keeping the
same configs across various deployments.
"streq(str1,str2)" will return true if the two strings match while
"strneq(str1,str2)" will return true only if they differ. This is
convenient to match an environment variable against a predefined value.
Now we can look up a list of known predicates and pre-parse their
arguments. For now the list is empty. The code needed to be arranged with
a common exit point to release all arguments because there's no default
argument freeing function (it likely only used to exist in the deinit
code). Since we only support simple arguments for now it's no big deal,
only a 2-liner loop.
Let's return the position of the first unparsable character on error,
so that instead of just saying "unparsable conditional expression blah"
we can have:
[ALERT] 125/150618 (13995) : parsing [test-conds2.cfg:1]: unparsable conditional expression '12/blah' in '.if' at position 1:
.if 12/blah
^
This is important because conditions will be made from environment
variables or later from more complex expressions where the error will
not always be easy to locate.
Let's add a few fields to the global struct to store information about
the current file being processed, the current line number and the current
section. This will be used to retrieve them using special variables.
Instead of duplicating the condition evaluations, let's have a single
function cfg_eval_condition() that returns true/false/error. It takes
less code and will ease its extension.
The doc about .if/.elif config block conditions says:
a non-nul integer (e.g. '1'), always returns "true"
So we must accept negative integers as well. The test was made on
atoi() > 0.
No backport is needed, this is only 2.4.
This missing state was causing a second elif condition to be evaluated
after a first one succeeded after a .if failed. For example in the test
below the else would be executed:
.if 0
.elif 1
.elif 0
.else
.endif
No backport is needed, this is 2.4-only.
The condition to skip the block in the ".if" evaluator forgot to check
that the level was high enough, resulting in rare cases where a random
value matched one of the 5 values that cause the block to be skipped.
No backport is needed as it's 2.4-only.
By default haproxy loads all files designated by a relative path from the
location the process is started in. In some circumstances it might be
desirable to force all relative paths to start from a different location
just as if the process was started from such locations. This is what this
directive is made for. Technically it will perform a temporary chdir() to
the designated location while processing each configuration file, and will
return to the original directory after processing each file. It takes an
argument indicating the policy to use when loading files whose path does
not start with a slash ('/').
A few options are offered, "current" (the default), "config" (files
relative to config file's dir), "parent" (files relative to config file's
parent dir), and "origin" with an absolute path.
This should address issue #1198.
In readcfgfile() when malloc() fails to allocate a buffer for the
config line, it currently says "parsing[<file>]: out of memory" while
the error is unrelated to the config file and may make one think it has
to do with the file's size. The second test (fopen() returning error)
needs to release the previously allocated line. Both directly return -1
which is not even documented as a valid error code for the function.
Let's simply make sure that the few variables freed at the end are
properly preset, and jump there upon error, after having displayed a
meaningful error message. Now at least we can get this:
$ ./haproxy -f /dev/kmem
[NOTICE] 116/191904 (23233) : haproxy version is 2.4-dev17-c3808c-13
[NOTICE] 116/191904 (23233) : path to executable is ./haproxy
[ALERT] 116/191904 (23233) : Could not open configuration file /dev/kmem : Permission denied
The error path of the NUMA topology detection introduced in commit
b56a7c89a ("MEDIUM: cfgparse: detect numa and set affinity if needed")
lacks an initialization resulting in possible crashes at boot. No
backport is needed since that was introduced in 2.4-dev.
The compilation is currently broken on platform without USE_CPU_AFFINITY
set. An error has been reported by the cygwin build of the CI.
This does not need to be backported.
In file included from include/haproxy/global-t.h:27,
from include/haproxy/global.h:26,
from include/haproxy/fd.h:33,
from src/ev_poll.c:22:
include/haproxy/cpuset-t.h:32:3: error: #error "No cpuset support implemented on this platform"
32 | # error "No cpuset support implemented on this platform"
| ^~~~~
include/haproxy/cpuset-t.h:37:2: error: unknown type name ‘CPUSET_REPR’
37 | CPUSET_REPR cpuset;
| ^~~~~~~~~~~
make: *** [Makefile:944: src/ev_poll.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from include/haproxy/global-t.h:27,
from include/haproxy/global.h:26,
from include/haproxy/fd.h:33,
from include/haproxy/connection.h:30,
from include/haproxy/ssl_sock.h:27,
from src/ssl_sample.c:30:
include/haproxy/cpuset-t.h:32:3: error: #error "No cpuset support implemented on this platform"
32 | # error "No cpuset support implemented on this platform"
| ^~~~~
include/haproxy/cpuset-t.h:37:2: error: unknown type name ‘CPUSET_REPR’
37 | CPUSET_REPR cpuset;
| ^~~~~~~~~~~
make: *** [Makefile:944: src/ssl_sample.o] Error 1
Render numa detection optional with a global configuration statement
'no numa-cpu-mapping'. This can be used if the applied affinity of the
algorithm is not optimal. Also complete the documentation with this new
keyword.
On process startup, the CPU topology of the machine is inspected. If a
multi-socket CPU machine is detected, automatically define the process
affinity on the first node with active cpus. This is done to prevent an
impact on the overall performance of the process in case the topology of
the machine is unknown to the user.
This step is not executed in the following condition :
- a non-null nbthread statement is present
- a restrictive 'cpu-map' statement is present
- the process affinity is already restricted, for example via a taskset
call
For the record, benchmarks were executed on a machine with 2 CPUs
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. In both clear and ssl
scenario, the performance were sub-optimal without the automatic
rebinding on a single node.
Allow to specify multiple cpu ids/ranges in parse_cpu_set separated by a
comma. This is optional and must be activated by a parameter.
The comma support is disabled for the parsing of the 'cpu-map' config
statement. However, it will be useful to parse files in sysfs when
inspecting the cpus topology for NUMA automatic process binding.
Replace the unsigned long parameter by a hap_cpuset. This allows to
address CPU with index greater than LONGBITS.
This function is used to parse the 'cpu-map' statement. However at the
moment, the result is casted back to a long to store it in the global
structure. The next step is to replace ulong in in cpu_map in the
global structure with hap_cpuset.
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").
Let's use more explicit (but slightly longer) names now:
LIST_ADD -> LIST_INSERT
LIST_ADDQ -> LIST_APPEND
LIST_ADDED -> LIST_INLIST
LIST_DEL -> LIST_DELETE
The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.
The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.
The list doc was updated.
This patch registers the parsed file and the line where a log server
is declared to make those information available in configuration
post check.
Those new informations were added on error messages probed resolving
ring names on post configuration check.
L6 sample fetches are now ignored when called from an HTTP proxy. Thus, a
warning is emitted during the startup if such usage is detected. It is true
for most ACLs and for log-format strings. Unfortunately, it is a bit painful
to do so for sample expressions.
This patch relies on the commit "MINOR: action: Use a generic function to
check validity of an action rule list".
The check_action_rules() function is now used to check the validity of an
action rule list. It is used from check_config_validity() function to check
L5/6/7 rulesets.
If a 'switch-mode http' tcp action is configured on a listener with no
backend, a warning is displayed to remember HTTP connections cannot be
routed to TCP servers. Indeed, backend connection is still established using
the proxy mode.
It is just a small cleanup. AN_REQ_FLT_HTTP_HDRS and AN_RES_FLT_HTTP_HDRS
analysers are now set in HTTP analysers at the same place
AN_REQ_HTTP_XFER_BODY and AN_RES_HTTP_XFER_BODY are set.
No warning is emitted if some http-after-response rules are configured on a
TCP proxy while such warning messages are emitted for other HTTP ruleset in
same condition. It is just an oversight.
This patch may be backported as far as 2.2.
For now smp_resolve_args() complains on stderr via ha_alert(), but if we
want to make it a bit more dynamic, we need it to return errors in an
allocated message. Let's pass it an error pointer and have it fill it.
On return we indent the output if it contains more than one line.
Modify the API of parse_server function. Use flags to describe the type
of the parsed server instead of discrete arguments. These flags can be
used to specify if a server/default-server/server-template is parsed.
Additional parameters are also specified (parsing of the address
required, resolve of a name must be done immediately).
It is now unneeded to use strcmp on args[0] in parse_server. Also, the
calls to parse_server are more explicit thanks to the flags.
The idle conn task is is a global task used to cleanup backend
connections marked for deletion. Previously, it was only only allocated
if at least one server in the configuration has idle connections.
This assumption won't be valid anymore when new servers can be created
at runtime with idle connections. Always allocate the global idle conn
task.
If the configuration file contains a 'unix-bind prefix' directive, and
if we use the -S option and specify a UNIX socket path, the path of the
socket will be prepended with the value of the unix-bind prefix.
For instance, if we have 'unix-bind prefix /tmp/sockets/' and we use
'-S /tmp/master-socket' on the command line, we will get this error:
Starting proxy MASTER:
cannot bind UNIX socket (No such file or directory) [/tmp/sockets/tmp/master-socket]
So this patch adds an exception, and will ignore the unix-bind prefix
for the master CLI socket.
This patch can be backported as far as 1.9.
There were still a very small list of functions, variables and fields
called "stats_" while they were really purely CLI-centric. There's the
frontend called "stats_fe" in the global section, which instantiates a
"cli_applet" called "<CLI>" so it was renamed "cli_fe".
The "alloc_stats_fe" function cas renamed to "cli_alloc_fe" which also
better matches the naming convention of all cli-specific functions.
Finally the "stats_permission_denied_msg" used to return an error on
the CLI was renamed "cli_permission_denied_msg".
Now there's no more "stats_something" that designates the CLI.
Just like with the server keywords, now's the turn of "bind" keywords.
The difference is that 100% of the bind keywords are registered, thus
we do not need the list of extra keywords.
There are multiple bind line parsers today, all were updated:
- peers
- log
- dgram-bind
- cli
$ printf "listen f\nbind :8000 tcut\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/101358 (25146) : haproxy version is 2.4-dev11-7b8787-26
[NOTICE] 070/101358 (25146) : path to executable is ./haproxy
[ALERT] 070/101358 (25146) : parsing [/dev/stdin:2] : 'bind :8000' unknown keyword 'tcut'; did you mean 'tcp-ut' maybe ?
[ALERT] 070/101358 (25146) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/101358 (25146) : Fatal errors found in configuration.
Instead of just reporting "unknown keyword", let's provide a function which
will look through a list of registered keywords for a similar-looking word
to the one that wasn't matched. This will help callers suggest correct
spelling. Also, given that a large part of the config parser still relies
on a long chain of strcmp(), we'll need to be able to pass extra candidates.
Thus the function supports an optional extra list for this purpose.
The actconns list creates massive contention on low server counts because
it's in fact a list of streams using a server, all threads compete on the
list's head and it's still possible to see some watchdog panics on 48
threads under extreme contention with 47 threads trying to add and one
thread trying to delete.
Moving this list per thread is trivial because it's only used by
srv_shutdown_streams(), which simply required to iterate over the list.
The field was renamed to "streams" as it's really a list of streams
rather than a list of connections.
There are multiple per-thread lists in the listeners, which isn't the
most efficient in terms of cache, and doesn't easily allow to store all
the per-thread stuff.
Now we introduce an srv_per_thread structure which the servers will have an
array of, and place the idle/safe/avail conns tree heads into. Overall this
was a fairly mechanical change, and the array is now always initialized for
all servers since we'll put more stuff there. It's worth noting that the Lua
code still has to deal with its own deinit by itself despite being in a
global list, because its server is not dynamically allocated.
Till now servers were only initialized as part of the proxy setup loop,
which doesn't cover peers, tcp log, dns, lua etc. Let's move this part
out of this loop and instead iterate over all registered servers. This
way we're certain to visit them all.
The patch looks big but it's just a move of a large block with the
corresponding reindent (as can be checked with diff -b). It relies
on the two previous ones ("MINOR: server: add a global list of all
known servers and" and "CLEANUP: lua: set a dummy file name and line
number on the dummy servers").
A few occurrences of calls to free() to free a section name,
peers name or server name were using casts and didn't include
the trailing free, let's switch them to ha_free().
This makes the code more readable and less prone to copy-paste errors.
In addition, it allows to place some __builtin_constant_p() predicates
to trigger a link-time error in case the compiler knows that the freed
area is constant. It will also produce compile-time error if trying to
free something that is not a regular pointer (e.g. a function).
The DEBUG_MEM_STATS macro now also defines an instance for ha_free()
so that all these calls can be checked.
178 occurrences were converted. The vast majority of them were handled
by the following Coccinelle script, some slightly refined to better deal
with "&*x" or with long lines:
@ rule @
expression E;
@@
- free(E);
- E = NULL;
+ ha_free(&E);
It was verified that the resulting code is the same, more or less a
handful of cases where the compiler optimized slightly differently
the temporary variable that holds the copy of the pointer.
A non-negligible amount of {free(str);str=NULL;str_len=0;} are still
present in the config part (mostly header names in proxies). These
ones should also be cleaned for the same reasons, and probably be
turned into ist strings.
These function names are unbearably long, they don't even fit into the
screen in "show profiling", let's trim the "_connections" to "_conns",
which happens to match the name of the lists there.
The maximum number of connections accepted at once by a thread for a single
listener used to default to 64 divided by the number of processes but the
tasklet-based model is much more scalable and benefits from smaller values.
Experimentation has shown that 4 gives the highest accept rate for all
thread values, and that 3 and 5 come very close, as shown below (HTTP/1
connections forwarded per second at multi-accept 4 and 64):
ac\thr| 1 2 4 8 16
------+------------------------------
4| 80k 106k 168k 270k 336k
64| 63k 89k 145k 230k 274k
Some tests were also conducted on SSL and absolutely no change was observed.
The value was placed into a define because it used to be spread all over the
code.
It might be useful at some point to backport this to 2.3 and 2.2 to help
those who observed some performance regressions from 1.6.
This patch splits current dns.c into two files:
The first dns.c contains code related to DNS message exchange over UDP
and in future other TCP. We try to remove depencies to resolving
to make it usable by other stuff as DNS load balancing.
The new resolvers.c inherit of the code specific to the actual
resolvers.
Note:
It was really difficult to obtain a clean diff dur to the amount
of moved code.
Note2:
Counters and stuff related to stats is not cleany separated because
currently counters for both layers are merged and hard to separate
for now.