The problem with the way idle connections currently work is that it's
easy for a thread to steal all of its siblings' connections, then release
them, then it's done by another one, etc. This happens even more easily
due to scheduling latencies, or merged events inside the same pool loop,
which, when dealing with a fast server responding in sub-millisecond
delays, can really result in one thread being fully at work at a time.
In such a case, we perform a huge amount of takeover() which consumes
CPU and requires quite some locking, sometimes resulting in lower
performance than expected.
In order to fight against this problem, this patch introduces a new server
setting "pool-low-conn", whose purpose is to dictate when it is allowed to
steal connections from a sibling. As long as the number of idle connections
remains at least as high as this value, it is permitted to take over another
connection. When the idle connection count becomes lower, a thread may only
use its own connections or create a new one. By proceeding like this even
with a low number (typically 2*nbthreads), we quickly end up in a situation
where all active threads have a few connections. It then becomes possible
to connect to a server without bothering other threads the vast majority
of the time, while still being able to use these connections when the
number of available FDs becomes low.
We also use this threshold instead of global.nbthread in the connection
release logic, allowing to keep more extra connections if needed.
A test performed with 10000 concurrent HTTP/1 connections, 16 threads
and 210 servers with 1 millisecond of server response time showed the
following numbers:
haproxy 2.1.7: 185000 requests per second
haproxy 2.2: 314000 requests per second
haproxy 2.2 lowconn 32: 352000 requests per second
The takeover rate goes down from 300k/s to 13k/s. The difference is
further amplified as the response time shrinks.
Starting with commit 079cb9a ("MEDIUM: connections: Revamp the way idle
connections are killed") we started to improve the way to compute the
need for idle connections. But the condition to keep a connection idle
or drop it when releasing it was not updated. This often results in
storms of close when certain thresholds are met, and long series of
takeover() when there aren't enough connections left for a thread on
a server.
This patch tries to improve the situation this way:
- it keeps an estimate of the number of connections needed for a server.
This estimate is a copy of the max over previous purge period, or is a
max of what is seen over current period; it differs from max_used_conns
in that this one is a counter that's reset on each purge period ;
- when releasing, if the number of current idle+used connections is
lower than this last estimate, then we'll keep the connection;
- when releasing, if the current thread's idle conns head is empty,
and we don't exceed the estimate by the number of threads, then
we'll keep the connection.
- when cleaning up connections, we consider the max of the last two
periods to avoid killing too many idle conns when facing bursty
traffic.
Thanks to this we can better converge towards a situation where, provided
there are enough FDs, each active server keeps at least one idle connection
per thread all the time, with a total number close to what was needed over
the previous measurement period (as defined by pool-purge-delay).
On tests with large numbers of concurrent connections (30k) and many
servers (200), this has quite smoothed the CPU usage pattern, increased
the reuse rate and roughly halved the takeover rate.
There's a minor glitch with the way idle connections start to be evicted.
The lookup always goes from thread 0 to thread N-1. This causes depletion
of connections on the first threads and abundance on the last ones. This
is visible with the takeover() stats below:
$ socat - /tmp/sock1 <<< "show activity"|grep ^fd ; \
sleep 10 ; \
socat -/tmp/sock1 <<< "show activity"|grep ^fd
fd_takeover: 300144 [ 91887 84029 66254 57974 ]
fd_takeover: 359631 [ 111369 99699 79145 69418 ]
There are respectively 19k, 15k, 13k and 11k takeovers for only 4 threads,
indicating that the first thread needs a foreign FD twice more often than
the 4th one.
This patch changes this si that all threads are scanned in round robin
starting with the current one. The takeovers now happen in a much more
distributed way (about 4 times 9k) :
fd_takeover: 1420081 [ 359562 359453 346586 354480 ]
fd_takeover: 1457044 [ 368779 368429 355990 363846 ]
There is no need to backport this, as this happened along a few patches
that were merged during 2.2 development.
We used to have 3 thread-based arrays for toremove_lock, idle_cleanup,
and toremove_connections. The problem is that these items are small,
and that this creates false sharing between threads since it's possible
to pack up to 8-16 of these values into a single cache line. This can
cause real damage where there is contention on the lock.
This patch creates a new array of struct "idle_conns" that is aligned
on a cache line and which contains all three members above. This way
each thread has access to its variables without hindering the other
ones. Just doing this increased the HTTP/1 request rate by 5% on a
16-thread machine.
The definition was moved to connection.{c,h} since it appeared a more
natural evolution of the ongoing changes given that there was already
one of them declared in connection.h previously.
Apparently Cygwin requires sys/types.h before netinet/tcp.h but doesn't
include it by itself, as shown here:
https://github.com/haproxy/haproxy/actions/runs/131943890
This patch makes sure it's always present, which is in server.c and
the SPOA example.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().
This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
Checks.c remains one of the largest file of the project and it contains
too many things. The tcpchecks code represents half of this file, and
both parts are relatively isolated, so let's move it away into its own
file. We now have tcpcheck.c, tcpcheck{,-t}.h.
Doing so required to export quite a number of functions because check.c
has almost everything made static, which really doesn't help to split!
check.c is one of the largest file and contains too many things. The
e-mail alerting code is stored there while nothing is in mailers.c.
Let's move this code out. That's only 4% of the code but a good start.
In order to do so, a few tcp-check functions had to be exported.
There's no point splitting the file in two since only cfgparse uses the
types defined there. A few call places were updated and cleaned up. All
of them were in C files which register keywords.
There is nothing left in common/ now so this directory must not be used
anymore.
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.
In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.
It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
The files remained mostly unchanged since they were OK. However, half of
the users didn't need to include them, and about as many actually needed
to have it and used to find functions like srv_currently_usable() through
a long chain that broke when moving the file.
Almost no change except moving the cli_kw struct definition after the
defines. Almost all users had both types&proto included, which is not
surprizing since this code is old and it used to be the norm a decade
ago. These places were cleaned.
Just some minor reordering, and the usual cleanup of call places for
those which didn't need it. We don't include the whole tools.h into
stats-t anymore but just tools-t.h.
The type file was slightly tidied. The cli-specific APPCTX_CLI_ST1_* flag
definitions were moved to cli.h. The type file was adjusted to include
buf-t.h and not the huge buf.h. A few call places were fixed because they
did not need this include.
All includes that were not absolutely necessary were removed because
checks.h happens to very often be part of dependency loops. A warning
was added about this in check-t.h. The fields, enums and structs were
a bit tidied because it's particularly tedious to find anything there.
It would make sense to split this in two or more files (at least
extract tcp-checks).
The file was renamed to the singular because it was one of the rare
exceptions to have an "s" appended to its name compared to the struct
name.
The type file is becoming a mess, half of it is for the proxy protocol,
another good part describes conn_streams and mux ops, it would deserve
being split again. At least it was reordered so that elements are easier
to find, with the PP-stuff left at the end. The MAX_SEND_FD macro was moved
to compat.h as it's said to be the value for Linux.
The TASK_IS_TASKLET() macro was moved to the proto file instead of the
type one. The proto part was a bit reordered to remove a number of ugly
forward declaration of static inline functions. About a tens of C and H
files had their dependency dropped since they were not using anything
from task.h.
global.h was one of the messiest files, it has accumulated tons of
implicit dependencies and declares many globals that make almost all
other file include it. It managed to silence a dependency loop between
server.h and proxy.h by being well placed to pre-define the required
structs, forcing struct proxy and struct server to be forward-declared
in a significant number of files.
It was split in to, one which is the global struct definition and the
few macros and flags, and the rest containing the functions prototypes.
The UNIX_MAX_PATH definition was moved to compat.h.
This one is particularly tricky to move because everyone uses it
and it depends on a lot of other types. For example it cannot include
arg-t.h and must absolutely only rely on forward declarations to avoid
dependency loops between vars -> sample_data -> arg. In order to address
this one, it would be nice to split the sample_data part out of sample.h.
The protocol.h files are pretty low in the dependency and (sadly) used
by some files from common/. Almost nothing was changed except lifting a
few comments.
The type was moved out as it's used by standard.h for netns_entry.
Instead of just being a forward declaration when not used, it's an
empty struct, which makes gdb happier (the resulting stripped executable
is the same).
This one is included almost everywhere and used to rely on a few other
.h that are not needed (unistd, stdlib, standard.h). It could possibly
make sense to split it into multiple parts to distinguish operations
performed on timers and the internal time accounting, but at this point
it does not appear much important.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.
The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).
A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.
No further cleanup was done yet in order to keep changes minimal.
log-proto <logproto>
The "log-proto" specifies the protocol used to forward event messages to
a server configured in a ring section. Possible values are "legacy"
and "octet-count" corresponding respectively to "Non-transparent-framing"
and "Octet counting" in rfc6587. "legacy" is the default.
Notes: a separated io_handler was created to avoid per messages test
and to prepare code to set different log protocols such as
request- response based ones.
Helper functions are used to dump bind, server or filter keywords. These
functions are used to report errors during the configuration parsing. To have a
coherent API, these functions are now prepared to handle a null pointer as
argument. If so, no action is performed and functions immediately return.
This patch should fix the issue #631. It is not a bug. There is no reason to
backport it.
srv_cleanup_connections() is supposed to be static, so mark it as so.
This patch should be backported where commit 6318d33ce625
("BUG/MEDIUM: connections: force connections cleanup on server changes")
will be backported, that is to say v1.9 to v2.1.
Fixes: 6318d33ce625 ("BUG/MEDIUM: connections: force connections cleanup
on server changes")
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
I've been trying to understand a change of behaviour between v2.2dev5 and
v2.2dev6. Indeed our probe is regularly testing to add and remove
servers on a given backend such as:
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -
-> curl on the corresponding frontend: reply from server:31255
# echo "set server be_foo/srv1 addr 10.236.139.34 port 31257" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31257' by 'stats socket command'
# echo "set server be_foo/srv1 weight 256" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio /var/lib/haproxy/stats
health check port updated.
# echo "set server be_foo/srv1 state ready" | sudo socat stdio /var/lib/haproxy/stats
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31257 -
-> curl on the corresponding frontend: reply for server:31257
(notice the difference of weight)
# echo "set server be_foo/srv1 state maint" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 addr 0.0.0.0 port 0" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '10.236.139.34' to '0.0.0.0', port changed from '31257' to '0' by 'stats socket command'
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -
-> curl on the corresponding frontend: reply from server:31255
# echo "set server be_foo/srv1 addr 10.236.139.34 port 31256" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31256' by 'stats socket command'
# echo "set server be_foo/srv1 weight 256" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio /var/lib/haproxy/stats
health check port updated.
# echo "set server be_foo/srv1 state ready" | sudo socat stdio /var/lib/haproxy/stats
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31256 -
-> curl on the corresponding frontend: reply from server:31257 (!)
Here we indeed would expect to get an anver from server:31256. The issue
is highly linked to the usage of `pool-purge-delay`, with a value which
is higher than the duration of the test, 10s in our case.
a git bisect between dev5 and dev6 seems to show commit
079cb9af22da6 ("MEDIUM: connections: Revamp the way idle connections are killed")
being the origin of this new behaviour.
So if I understand the later correctly, it seems that it was more a
matter of chance that we did not saw the issue earlier.
My patch proposes to force clean idle connections in the two following
cases:
- we set a (still running) server to maintenance
- we change the ip/port of a server
This commit should be backported to 2.1, 2.0, and 1.9.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
The variable 'ret' must only be declared When HAProxy is compiled with the SSL
support (more precisely SSL_CTRL_SET_TLSEXT_HOSTNAME must be defined).
No backport needed.
A shared tcp-check ruleset is now created to support agent checks. The following
sequence is used :
tcp-check send "%[var(check.agent_string)] log-format
tcp-check expect custom
The custom function to evaluate the expect rule does the same that it was done
to handle agent response when a custom check was used.
Parsing of following keywords have been moved in checks.c file : addr, check,
check-send-proxy, check-via-socks4, no-check, no-check-send-proxy, rise, fall,
inter, fastinter, downinter and port.
A global list to tcp-check ruleset can now be used to share common rulesets with
all backends without any duplication. It is mandatory to convert all specific
protocol checks (redis, pgsql...) to tcp-check healthchecks.
To do so, a flag is now attached to each tcp-check ruleset to know if it is a
shared ruleset or not. tcp-check rules defined in a backend are still directly
attached to the proxy and not shared. In addition a second flag is used to know
if the ruleset is inherited from the defaults section.
To allow reusing these blocks without consuming more memory, their list
should be static and share-able accross uses. The head of the list will
be shared as well.
It is thus necessary to extract the head of the rule list from the proxy
itself. Transform it into a pointer instead, that can be easily set to
an external dynamically allocated head.
The options and directives related to the configuration of checks in a backend
may be defined after the servers declarations. So, initialization of the check
of each server must not be performed during configuration parsing, because some
info may be missing. Instead, it must be done during the configuration validity
check.
Thus, callback functions are registered to be called for each server after the
config validity check, one for the server check and another one for the server
agent-check. In addition deinit callback functions are also registered to
release these checks.
This patch should be backported as far as 1.7. But per-server post_check
callback functions are only supported since the 2.1. And the initcall mechanism
does not exist before the 1.9. Finally, in 1.7, the code is totally
different. So the backport will be harder on older versions.
Error codes ERR_WARN and ERR_ALERT are used to signal that the error
given is of the corresponding level. All errors are displayed as ALERT
in the display_parser_err() function.
Differentiate the display level based on the error code. If both
ERR_WARN and ERR_ALERT are used, ERR_ALERT is given priority.
Documentation states that default settings for ssl server options can be set
using either ssl-default-server-options or default-server directives. In practice,
not all ssl server options can have default values, such as ssl-min-ver, ssl-max-ver,
etc..
This patch adds the missing ssl options in srv_ssl_settings_cpy() and srv_parse_ssl(),
making it possible to write configurations like the following examples, and have them
behave as expected.
global
ssl-default-server-options ssl-max-ver TLSv1.2
defaults
mode http
listen l1
bind 1.2.3.4:80
default-server ssl verify none
server s1 1.2.3.5:443
listen l2
bind 2.2.3.4:80
default-server ssl verify none ssl-max-ver TLSv1.3 ssl-min-ver TLSv1.2
server s1 1.2.3.6:443
This should be backported as far as 1.8.
This fixes issue #595.
Before supporting "server" line in "peers" section, such sections without
any local peer were removed from the configuration to get it validated.
This patch fixes the issue where a "server" line without address and port which
is a remote peer without address and port makes the configuration parsing fail.
When encoutering such cases we now ignore such lines remove them from the
configuration.
Thank you to Jérôme Magnin for having reported this bug.
Must be backported to 2.1 and 2.0.
The original algorithm always killed half the idle connections. This doesn't
take into account the way the load can change. Instead, we now kill half
of the exceeding connections (exceeding connection being the number of
used + idle connections past the last maximum used connections reached).
That way if we reach a peak, we will kill much less, and it'll slowly go back
down when there's less usage.