This reverts commit a331a1e8eb2ad4750711a477ca3e22d940495faf.
This commit fixes a real bug, but it also reveals some hidden bugs, mostly
because of some design issues. Thus, in itself, it creates more problems
than it solves. So revert it for now. All known bugs will be addressed in
the next commits.
This patch should be backported as far as 2.2.
This was introduced in the previous commit 49c2b45c1 ("MINOR: cfgparse/server:
try to fix spelling mistakes on server lines"): the loop was changed but the
increment was left behind. No backport is needed.
This adds support for action_suggest() in http-request, http-response
and http-after-response rulesets. For example:
parsing [/dev/stdin:2]: 'http-request' expects (...), but got 'del-hdr'. Did you mean 'del-header' maybe ?
action_suggest() will return a pointer to an action whose keyword more or
less resembles the passed argument. It is also tolerant of prefixes (since
actions taking arguments are handled as prefixes).
This will be used to suggest approaching words.
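For illustration, a parser could use it like this on an unknown action
keyword (a minimal sketch; the keyword list variable and the exact message
are illustrative):

  struct action_kw *akw;

  /* no exact match for args[0]: look for a close keyword to suggest */
  akw = action_suggest(args[0], &rules_kw_list, NULL);
  if (akw)
          memprintf(err, "unknown action '%s'. Did you mean '%s' maybe ?",
                    args[0], akw->kw);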
Just like with the server keywords, now's the turn of "bind" keywords.
The difference is that 100% of the bind keywords are registered, thus
we do not need the list of extra keywords.
There are multiple bind line parsers today, all were updated:
- peers
- log
- dgram-bind
- cli
$ printf "listen f\nbind :8000 tcut\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/101358 (25146) : haproxy version is 2.4-dev11-7b8787-26
[NOTICE] 070/101358 (25146) : path to executable is ./haproxy
[ALERT] 070/101358 (25146) : parsing [/dev/stdin:2] : 'bind :8000' unknown keyword 'tcut'; did you mean 'tcp-ut' maybe ?
[ALERT] 070/101358 (25146) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/101358 (25146) : Fatal errors found in configuration.
Let's apply the fuzzy match to server keywords so that we can avoid
dumping the huge list of supported keywords each time there is a spelling
mistake, and suggest proper spelling instead:
$ printf "listen f\nserver s 0 sendpx-v2\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/095718 (24152) : haproxy version is 2.4-dev11-caa6e3-25
[NOTICE] 070/095718 (24152) : path to executable is ./haproxy
[ALERT] 070/095718 (24152) : parsing [/dev/stdin:2] : 'server s' unknown keyword 'sendpx-v2'; did you mean 'send-proxy-v2' maybe ?
[ALERT] 070/095718 (24152) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/095718 (24152) : Fatal errors found in configuration.
The global section also knows a large number of keywords that are not
referenced in any list, so these had to be listed explicitly. This is
particularly handy because some tunables are never easy to remember, and
it now works remarkably well:
$ printf "global\nsched.queue_depth\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/093007 (23457) : haproxy version is 2.4-dev11-dd8ee5-24
[NOTICE] 070/093007 (23457) : path to executable is ./haproxy
[ALERT] 070/093007 (23457) : parsing [/dev/stdin:2] : unknown keyword 'sched.queue_depth' in 'global' section; did you mean 'tune.runqueue-depth' maybe ?
[ALERT] 070/093007 (23457) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/093007 (23457) : Fatal errors found in configuration.
Let's start with the largest keyword list, the listeners. Many keywords were
still not part of a list, so a common_kw_list array was added to enumerate
the missing ones. Now for example, typing "tmout" properly suggests
"timeout":
$ printf "frontend f\ntmout client 10s\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/091355 (22545) : haproxy version is 2.4-dev11-3b728a-21
[NOTICE] 070/091355 (22545) : path to executable is ./haproxy
[ALERT] 070/091355 (22545) : parsing [/dev/stdin:2] : unknown keyword 'tmout' in 'frontend' section; did you mean 'timeout' maybe ?
[ALERT] 070/091355 (22545) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/091355 (22545) : Fatal errors found in configuration.
Instead of just reporting "unknown keyword", let's provide a function which
will look through a list of registered keywords for a similar-looking word
to the one that wasn't matched. This will help callers suggest correct
spelling. Also, given that a large part of the config parser still relies
on a long chain of strcmp(), we'll need to be able to pass extra candidates.
Thus the function supports an optional extra list for this purpose.
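Shape-wise the helper can look like this (a sketch of the prototype; the
name and the one in the tree may differ):

  /* Look for the best candidate resembling <word> among the keywords
   * registered in <list> and, if not NULL, among the NULL-terminated
   * array of extra candidates <extra>. Returns the best match or NULL
   * if nothing is convincing enough.
   */
  const char *find_best_match(const char *word, const struct list *list,
                              const char **extra);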
This introduces two functions, one which creates a fingerprint of a word,
and one which computes a distance between two words' fingerprints. The
fingerprint is made by counting the transitions between one character and
another one. Here we consider the 26 alphabetic letters regardless of
their case, then any digit as a digit, and anything else as "other". We
also consider the first and last locations as transitions from begin to
first char, and last char to end. The distance is simply the sum of the
squares of the differences between the two fingerprints. This way, doubling
or missing a letter has the same cost; however, some repeated transitions
such as "e"->"r" like in "server" are very unlikely to match against
situations where they do not exist. This is a naive approach but it seems
to work sufficiently well for now. It may be refined in the future if
needed.
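Here is a self-contained sketch of the approach (class count, helper names
and the demo are made up for illustration; this is not the exact
implementation):

  #include <ctype.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  #define FP_CLASSES 29  /* 26 letters + digit + other + begin/end marker */
  #define FP_SIZE    (FP_CLASSES * FP_CLASSES)

  static int fp_class(int c)
  {
          if (isalpha(c))
                  return tolower(c) - 'a';  /* letters, case-insensitive */
          if (isdigit(c))
                  return 26;                /* any digit counts as "digit" */
          return 27;                        /* anything else is "other" */
  }

  /* count the transitions between successive character classes,
   * including begin->first-char and last-char->end (class 28)
   */
  static void make_fp(uint8_t *fp, const char *word)
  {
          int prev = 28, curr;

          memset(fp, 0, FP_SIZE);
          for (; *word; word++) {
                  curr = fp_class((unsigned char)*word);
                  fp[prev * FP_CLASSES + curr]++;
                  prev = curr;
          }
          fp[prev * FP_CLASSES + 28]++;
  }

  /* sum of the squares of the differences between two fingerprints */
  static unsigned int fp_distance(const uint8_t *f1, const uint8_t *f2)
  {
          unsigned int i, dist = 0;
          int d;

          for (i = 0; i < FP_SIZE; i++) {
                  d = (int)f1[i] - (int)f2[i];
                  dist += d * d;
          }
          return dist;
  }

  int main(void)
  {
          uint8_t a[FP_SIZE], b[FP_SIZE], c[FP_SIZE];

          make_fp(a, "tmout");
          make_fp(b, "timeout");
          make_fp(c, "server");
          /* "timeout" scores much closer to "tmout" than "server" does */
          printf("tmout/timeout: %u  tmout/server: %u\n",
                 fp_distance(a, b), fp_distance(a, c));
          return 0;
  }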
The error message for http-request and http-response starts with a comma
that very likely is a leftover from a previous list construct. Let's remove
it: "'http-request' expects , 'wait-for-handshake', 'use-service' ...".
The error message after "http-response set-var" isn't very clear:
[ALERT] 070/115043 (30526) : parsing [/dev/stdin:2] : error detected in proxy 'f' while parsing 'http-response set-var' rule : invalid variable 'set-var'. Expects 'set-var(<var-name>)' or 'unset-var(<var-name>)'.
Let's change it to this instead:
[ALERT] 070/115608 (30799) : parsing [/dev/stdin:2] : error detected in proxy 'f' while parsing 'http-response set-var' rule : invalid or incomplete action 'set-var'. Expects 'set-var(<var-name>)' or 'unset-var(<var-name>)'.
With a wrong action name, it also works better (it's handled as a prefix
due to the opening parenthesis):
[ALERT] 070/115608 (30799) : parsing [/dev/stdin:2] : error detected in proxy 'f' while parsing 'http-response set-varxxx' rule : invalid or incomplete action 'set-varxxx'. Expects 'set-var(<var-name>)' or 'unset-var(<var-name>)'.
The tcp-request error message only mentions "accept", "reject" and
"track-sc*", but a few other ones were missing, so let's add them.
This could be backported, though it's not likely that it will help anyone
with an existing config.
The refactoring in commit 131b07be3 ("MEDIUM: server: Refactor
apply_server_state() to make it more readable") also had a copy-paste
error resulting in using global.server_state_file instead of the
function's argument, which easily crashes with a conf having a
state file in a backend and no global state file.
In addition, let's simplify the code and get rid of strcpy(), which
almost certainly would break the build on OpenBSD.
This was introduced in 2.4-dev10, no backport is needed.
The refactoring in commit 131b07be3 ("MEDIUM: server: Refactor
apply_server_state() to make it more readable") made the global
server_state_base be dereferenced before being checked, resulting
in a crash on certain files.
This happened in 2.4-dev10, no backport is needed.
When a "tcp-check" or a "http-check" rule is parsed, we try to get the
previous rule in the ruleset to get its index. We must take care to reset
the pointer on this rule in case an error is triggered later on the
parsing. Otherwise, the same rule may be released twice. For instance, it
happens with such line :
http-check meth GET uri / ## note there is no "send" parameter
This patch must be backported as far as 2.2.
If an agent-check is configured for a server, when the response is parsed,
the .health threshold of the agent must be updated on up/down/stopped/fail
commands, not the threshold of the health-check. Otherwise, the
agent-check will compete with the health-check and may mark a DOWN server
as UP.
This patch should fix the issue #1176. It must be backported as far as 2.2.
The CF_FL_ANALYZE flag is used to know that a channel is filtered. It is
important to synchronize the request and response channels when the
filtering ends.
However, it is possible to call all request analyzers before the filtering
even starts on the response channel. This means flt_end_analyze() may be
called for the request channel before flt_start_analyze() on the response
channel. Thus, because the CF_FL_ANALYZE flag is not set on the response
channel, we consider that the filtering is finished on both sides. The
consequence is that flt_end_analyze() is not called for the response and
backend filters are unregistered before their execution on the response
channel.
It is possible to encounter this bug on a TCP frontend or with a CONNECT
request on an HTTP frontend if the client shutdown is received with the
first read.
To fix this bug, CF_FL_ANALYZE is set when filters are attached to the
stream: on the request channel when the stream is created, in
flt_stream_start(), and on both channels when the backend is set, in
flt_set_stream_backend().
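In spirit the fix boils down to this (a simplified sketch, not the literal
patch):

  /* in flt_stream_start(), when the stream is created: */
  s->req.flags |= CF_FL_ANALYZE;

  /* in flt_set_stream_backend(), once backend filters are attached: */
  s->req.flags |= CF_FL_ANALYZE;
  s->res.flags |= CF_FL_ANALYZE;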
This patch must be backported as far as 1.7.
Setting multiple http-request track-scX rules generates entries
which never expire.
If an entry was already registered by a previous http rule,
'stream_track_stkctr(&s->stkctr[rule->action], t, ts)' didn't
register the new 'ts' into the stkctr, and the function was left
with no reference on 'ts' whereas the refcount had been increased
by the '_get_entry' call.
The patch applies the same policy as the one used for tcp track
rules: if there are successive rules, the track counter keeps the
first entry registered in the counter and nothing more is computed.
After validation this should be backported to all versions.
The recent default runqueue size reduction appeared to have significantly
lowered performance on low-thread-count configs. Testing various runqueue
values on different workloads under thread counts ranging from
1 to 64, it appeared that lower values are more optimal for high thread
counts and conversely. It could even be observed that the optimal value for
various workloads sits around 280/sqrt(nbthread), and probably has to do
with both the L3 cache usage and how to optimally interlace the threads'
activity to minimize contention. This is much easier to configure optimally,
so let's do this by default now.
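For the record, a sketch of how such a default can be derived from the
observation above (illustrative, not the exact code):

  #include <math.h>

  /* ~280/sqrt(nbthread): 280 for 1 thread, ~56 for 25, ~35 for 64 */
  static int default_runqueue_depth(int nbthread)
  {
          return (int)(280.0 / sqrt(nbthread));
  }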
Instead of setting a hard limit on runqueue-depth and keeping it short
to maintain fairness, let's allow the scheduler to automatically cut
the existing queue in two equal halves if its size is between the configured
size and its double. This makes it possible to increase the default value
while keeping a low latency.
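A minimal sketch of the halving logic (names are illustrative):

  /* budget for one scheduler pass, where <queued> tasks are waiting
   * and <depth> is the configured runqueue-depth
   */
  static unsigned int run_budget(unsigned int queued, unsigned int depth)
  {
          /* between 1x and 2x the configured depth, process half of
           * the queue at once: fairness is preserved while the default
           * depth can be larger
           */
          if (queued > depth && queued <= 2 * depth)
                  return (queued + 1) / 2;
          return queued < depth ? queued : depth;
  }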
Emeric found that SSL+keepalive traffic had dropped quite a bit in the
recent changes, which could be bisected to recent commit 9205ab31d
("MINOR: ssl: mark the SSL handshake tasklet as heavy"). Indeed, a
first incarnation of this commit made use of the TASK_SELF_WAKING
flag but the last version directly used TASK_HEAVY, but it would still
continue to remove the already absent TASK_SELF_WAKING one instead of
TASK_HEAVY. As such, the SSL traffic remained processed with low
granularity.
No backport is needed as this is only 2.4.
The tests were made with slz, and the zlib parsers for memlevel and
windowsize managed to escape the change made by commit 018251667 ("CLEANUP:
config: make the cfg_keyword parsers take a const for the defproxy"). This
is now fixed.
These are some leftovers from the ancient code where they were still
called sessions, but these areas in the code remained confusing due to
this naming. They are now called "strm", which does not even affect
indenting nor alignment.
When implementing a client applet, a NULL dereference was encountered on
the error path which increments the counters.
Indeed, the counters incremented are the ones in the listener, which does
not exist in the case of client applets, so in sess->listener->counters,
listener is NULL.
This patch fixes the accesses to the listener structure when it is reached
through a session; most of the accesses are to the counters in error paths.
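The guard is essentially this kind of check (a sketch; the counter field
is illustrative):

  /* only account listener counters when there is a listener */
  if (sess->listener && sess->listener->counters)
          _HA_ATOMIC_ADD(&sess->listener->counters->failed_req, 1);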
Must be backported as far as 1.8.
The default proxy was passed as a variable to all parsers instead of a
const, which is not without risk, especially when some timeout parsers used
to make some int pointers point to the default values for comparisons. We
want to be certain that none of these parsers will modify the defaults
sections by accident, so it's important to mark this proxy as const.
This patch touches all occurrences found (89).
grq_total was incremented when picking tasks from the global run queue,
but this variable was not defined with threads disabled; the code only
built because it was optimized away at -O2. No backport is needed.
This commit is a fix/complement to the following one:
08d87b3f49867440f66aee09173c84bf58cbc859
BUG/MEDIUM: backend: never reuse a connection for tcp mode
It fixes the check for the early insertion of backend connections in
the reuse lists if the backend mode is HTTP.
The impact of this bug seems limited because:
- in tcp mode, no insertion is done in the avail list as mux_pt does not
support multiple streams.
- in http mode, muxes are also responsible for inserting backend connections
into the lists in their detach functions. Prior to this fix the reuse rate
could be slightly lower.
It can be backported to 2.3.
Currently, there seems to be no way to have the transport layer ready
but not the mux in the function connect_server(). Add a BUG_ON to report
if this implicit condition ever stops being true.
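For example something along these lines (a sketch; the exact expression
in the patch may differ):

  /* connect_server(): the transport layer must never be ready while
   * the mux is still missing; crash loudly if that ever happens
   */
  BUG_ON(conn_xprt_ready(srv_conn) && !srv_conn->mux);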
This should fix the coverity report from GitHub issue #1120.
The actconns list creates massive contention on low server counts because
it's in fact a list of streams using a server, all threads compete on the
list's head and it's still possible to see some watchdog panics on 48
threads under extreme contention with 47 threads trying to add and one
thread trying to delete.
Moving this list per thread is trivial because it's only used by
srv_shutdown_streams(), which simply needs to iterate over the lists.
The field was renamed to "streams" as it's really a list of streams
rather than a list of connections.
There are multiple per-thread lists in the servers, which isn't the
most efficient in terms of cache, and doesn't easily allow storing all
the per-thread stuff.
Now we introduce an srv_per_thread structure, of which the servers will
have an array, and into which the idle/safe/avail conns tree heads are
placed. Overall this was a fairly mechanical change, and the array is now
always initialized for all servers since we'll put more stuff there. It's
worth noting that the Lua code still has to deal with its own deinit by
itself despite being in a global list, because its server is not
dynamically allocated.
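Roughly, the layout becomes (a sketch with names inferred from this
message, assuming the ebtree types from the source tree):

  struct srv_per_thread {
          struct eb_root idle_conns;   /* idle conns, reusable by anyone */
          struct eb_root safe_conns;   /* idle conns, safe to reuse */
          struct eb_root avail_conns;  /* conns with streams attached */
  };

  struct server {
          /* ... */
          struct srv_per_thread *per_thr; /* one entry per thread */
  };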
Till now servers were only initialized as part of the proxy setup loop,
which doesn't cover peers, tcp log, dns, lua etc. Let's move this part
out of this loop and instead iterate over all registered servers. This
way we're certain to visit them all.
The patch looks big but it's just a move of a large block with the
corresponding reindent (as can be checked with diff -b). It relies
on the two previous ones ("MINOR: server: add a global list of all
known servers" and "CLEANUP: lua: set a dummy file name and line
number on the dummy servers").
It's a real pain not to have access to the list of all registered servers,
because whenever there is a need to late adjust their configuration, only
those attached to regular proxies are seen, but not the peers, lua, logs
nor DNS.
What this patch does is that new_server() will automatically add the newly
created server to a global list, and it does so as well for the 1 or 2
statically allocated servers created for Lua. This way it will be possible
to iterate over all of them.
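In short, the registration looks like this (a sketch; the list macros are
those of the source tree, the other names are illustrative):

  /* global list of all known servers, whoever created them */
  struct list servers_list = LIST_HEAD_INIT(servers_list);

  struct server *new_server(struct proxy *proxy)
  {
          struct server *srv = calloc(1, sizeof(*srv));

          if (!srv)
                  return NULL;
          /* ... usual initialization ... */
          LIST_ADDQ(&servers_list, &srv->global_list);
          return srv;
  }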
The "socket_tcp" and "socket_ssl" servers had no config file name nor
line number, but this is sometimes annoying during debugging or later
in error messages, while all other places using new_server() or
parse_server() make sure to have a valid file:line set. Let's set
something to address this.
Currently the SSL layer checks the validity of its tasklet's context just
in case it would have been stolen, had the connection been idle. Now it
will be able to be notified by the mux when this situation happens so as
not to have to grab the idle connection lock on each pass. This reuses the
TASK_F_USR1 flag just as the muxes do.
These functions are used by the mux layer to indicate either that the
connection is becoming idle, so the xprt ought to be careful before
checking the context, or that it is not idle anymore, so the context is
safe. The purpose is to allow a mux which is going to release a connection
to tell the xprt to be careful when touching it. At the moment, the xprts
are always careful, and that's costly, so we want to have the ability to
relax this a bit.
No xprt layer uses this yet.
The muxes are touching the idle_conns_lock all the time now because
they need to be careful that no other thread has stolen their tasklet's
context.
This patch changes this a little bit by setting the TASK_F_USR1 flag on
the tasklet before marking a connection idle, and removing it once it's
not idle anymore. Thanks to this we have the guarantee that a tasklet
without this flag cannot be present in an idle list and does not need
to go through this costly lock. This is especially true for front
connections.
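Reduced to its core, the pattern is (a sketch; the atomic macros are those
of the source tree, the surrounding code is illustrative):

  /* before inserting the connection into an idle list, flag the
   * tasklet: its context may now be stolen by another thread
   */
  HA_ATOMIC_OR(&t->state, TASK_F_USR1);

  /* ... and once the connection is picked for reuse ... */
  HA_ATOMIC_AND(&t->state, ~TASK_F_USR1);

  /* in the I/O handler, only pay for the idle-conns lock when the
   * flag says the context may have been stolen
   */
  if (t->state & TASK_F_USR1) {
          /* take idle_conns_lock and revalidate the context */
  }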
This flag will be usable by any application. It will be preserved across
wakeups so the application can use it to do various stuff. Some I/O
handlers will soon benefit from this.
It's been too short for quite a while and is now full. It's still
time to extend it to 32-bits since we have room for this without
wasting any space, so we now gained 16 new bits for future flags.
The values were not reassigned just in case there would be a few
hidden u16 or short somewhere in which these flags are placed (as
it used to be the case with stream->pending_events).
The patch is tagged MEDIUM because it required updating the task's
process() prototype to use an int instead of a short, and that's quite
a bunch of places.
It's cleaner to use a flag from the task's state to detect a tasklet
and it's even cheaper. One of the best benefits is that this will
allow getting the nice field out of the common part since the tasklet
doesn't need it anymore. This commit uses the last task bit available
but that's temporary as the purpose of the change is to extend this.
The PRNG used by the "random" LB algorithm was the central one which tries
hard to produce "correct" (i.e. hardly predictable) values suitable for use
in UUIDs or cookies. It's much too expensive for pure load balancing where
a cheaper thread-local PRNG is sufficient, and the current PRNG is part of
the hot places when running with many threads.
Let's switch to the statistical PRNG instead, it's thread-local, very
fast, and with a period of (2^32)-1 which is more than enough to decide
on a server.
We frequently need to access a simple and fast PRNG for statistical
purposes. The debug_prng() function did exactly this using a xorshift
generator but its use was limited to debug only. Let's move this to
tools.h and tools.c to make it accessible everywhere. Since it needs to
be fast, its state is thread-local. An initialization function starts a
different initial value for each thread for better distribution.
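A self-contained sketch of such a generator (the classic xorshift32,
whose period is 2^32-1 for any non-zero seed; per-thread seeding is
simplified here):

  #include <stdint.h>

  static __thread uint32_t prng_state = 2463534242U; /* per-thread seed */

  static inline uint32_t statistical_prng(void)
  {
          uint32_t x = prng_state;

          x ^= x << 13;
          x ^= x >> 17;
          x ^= x << 5;
          return prng_state = x;
  }

Picking a random server then becomes a simple modulo on the returned value.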
In conn_backend_get() we can cause some extreme contention due to the
idle_conns_lock. Indeed, even though it's per-thread, it still causes
high contention when running with many threads. The reason is that the
threads which do not have any idle connections are quickly skipped, up
to the point where some remain, so the first thread to reach that point
grabs the lock and the other ones wait behind it. From then on, all
threads are synchronized waiting on the same lock, and follow the leader
in small jumps, all hindering each other.
Here, instead of doing this, we use a trylock. This way when a thread is
already checking a list, the other ones move on to the next thread. In
the worst case, a high contention will lead to a few new connections to
be set up, but this may actually be what is required to avoid contention
in the first place. With this change, the contention has mostly
disappeared on this lock (it's still present in muxes and transport
layers due to the takeover).
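Standalone, the pattern looks like this (pthread spinlocks are used for
the sketch; all names are illustrative):

  #include <pthread.h>

  #define MAX_THREADS 64 /* arbitrary bound for the sketch */
  extern pthread_spinlock_t idle_lock[MAX_THREADS];

  static int try_pick_from(int thr)
  {
          if (pthread_spin_trylock(&idle_lock[thr]) != 0)
                  return 0; /* busy: move on to the next thread */
          /* ... look up a connection to take over from thread <thr> ... */
          pthread_spin_unlock(&idle_lock[thr]);
          return 1;
  }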
Surprisingly, checking for emptiness of the tree root before taking
the lock didn't address any contention.
A few improvements are still possible and desirable here. The first
one would be to avoid having all threads jump to the same next one. We
could have each thread use a different prime number as the increment
so as to spread them across the entire table instead of keeping them
synchronized. The second one is that the lock in the mux layers
shouldn't be needed to check for the tasklet's context availability.
Using abort() occasionally results in an unexploitable core due to issues
rewinding the stack. Let's use ABORT_NOW(), which, in addition to crashing
much closer to the call point, also has the benefit of showing the call
trace.
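For reference, the macro's principle is a deliberate invalid write, which
faults exactly at the call point (its definition is essentially this):

  /* fault right here instead of unwinding like abort() does */
  #define ABORT_NOW() (*(volatile int *)1 = 0)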