When a new connection is created, its target is always set just after. So the
connection target may set when it is created instead, during its initialisation
to be precise. It is the purpose of this patch. Now, conn_new() function is
called with the connection target as parameter. The target is then passed to
conn_init(). It means the target must be passed when cs_new() is called. In this
case, the target is only used when the conn-stream is created with no
connection. This only happens for tcpchecks for now.
The session_get_conn() must now be used to look for an available connection
matching a specific target for a given session. This simplifies a bit the
connect_server() function.
When a connection is marked as private, it is now added in the session server
list. We don't wait a stream is detached from the mux to do so. When the
connection is created, this happens after the mux creation. Otherwise, it is
performed when the connection is marked as private.
To allow that, when a connection is created, the session is systematically set
as the connectin owner. Thus, a backend connection has always a owner during its
creation. And a private connection has always a owner until its death.
Note that outside the detach() callback, if the call to session_add_conn()
failed, the error is ignored. In this situation, we retry to add the connection
into the session server list in the detach() callback. If this fails at this
step, the multiplexer is destroyed and the connection is closed.
To set a connection as private, the conn_set_private() function must now be
called. It sets the CO_FL_PRIVATE flags, but it also remove the connection from
the available connection list, if necessary. For now, it never happens because
only HTTP/1 connections may be set as private after their creation. And these
connections are never inserted in the available connection list.
When a new connection is created, it may immediatly be set as private if
http-reuse never is configured for the backend. There is no reason to wait the
call to mux->detach() to do so.
If an expression is configured to set the SNI on a server connection, the
connection is marked as private. To not needlessly add it in the available
connection list when the mux is installed, the SNI is now set on the connection
before installing the mux, just after the call to si_connect().
When a connection is added to an idle list, it's already detached and
cannot be seen by two threads at once, so there's no point using
TRY_ADDQ, there will never be any conflict. Let's just use the cheaper
ADDQ.
Initially when mt_lists were added, their purpose was to be used with
the scheduler, where anyone may concurrently add the same tasklet, so
it sounded natural to implement a check in MT_LIST_ADD{,Q}. Later their
usage was extended and MT_LIST_ADD{,Q} started to be used on situations
where the element to be added was exclusively owned by the one performing
the operation so a conflict was impossible. This became more obvious with
the idle connections and the new macro was called MT_LIST_ADDQ_NOCHECK.
But this remains confusing and at many places it's not expected that
an MT_LIST_ADD could possibly fail, and worse, at some places we start
by initializing it before adding (and the test is superflous) so let's
rename them to something more conventional to denote the presence of the
check or not:
MT_LIST_ADD{,Q} : inconditional operation, the caller owns the
element, and doesn't care about the element's
current state (exactly like LIST_ADD)
MT_LIST_TRY_ADD{,Q}: only perform the operation if the element is not
already added or in the process of being added.
This means that the previously "safe" MT_LIST_ADD{,Q} are not "safe"
anymore. This also means that in case of backport mistakes in the
future causing this to be overlooked, the slower and safer functions
will still be used by default.
Note that the missing unchecked MT_LIST_ADD macro was added.
The rest of the code will have to be reviewed so that a number of
callers of MT_LIST_TRY_ADDQ are changed to MT_LIST_ADDQ to remove
the unneeded test.
When a connection is created and the multiplexer is installed, if the connection
is marked as private, don't consider it as available, regardless the number of
available streams. This test is performed when the mux is installed when the
connection is created, in connect_server(), and when the mux is installed after
the handshakes stage.
No backport needed, this is 2.2-dev.
When a connection is picked from the session server list because the proxy or
the session are marked to use the last requested server, if it is idle, we must
marked it as used removing the CO_FL_SESS_IDLE flag and decrementing the session
idle_conns counter.
This patch must be backported as far as 1.9.
In the various takeover() methods, make sure we schedule the old tasklet
on the old thread, as we don't want it to run on our own thread! This
was causing a very rare crash when building with DEBUG_STRICT, seeing
that either an FD's thread mask didn't match the thread ID in h1_io_cb(),
or that stream_int_notify() would try to queue a task with the wrong
tid_bit.
In order to reproduce this, it is necessary to maintain many connections
(typically 30k) at a high request rate flowing over H1+SSL between two
proxies, the second of which would randomly reject ~1% of the incoming
connection and randomly killing some idle ones using a very short client
timeout. The request rate must be adjusted so that the CPUs are nearly
saturated, but never reach 100%. It's easier to reproduce this by skipping
local connections and always picking from other threads. The issue
should happen in less than 20s otherwise it's necessary to restart to
reset the idle connections lists.
No backport is needed, takeover() is 2.2 only.
Enables ('on') or disables ('off') sharing of idle connection pools between
threads for a same server. The default is to share them between threads in
order to minimize the number of persistent connections to a server, and to
optimize the connection reuse rate. But to help with debugging or when
suspecting a bug in HAProxy around connection reuse, it can be convenient to
forcefully disable this idle pool sharing between multiple threads, and force
this option to "off". The default is on.
This could have been nice to have during the idle connections debugging,
but it's not too late to add it!
The next thread walking algorithm in commit 566df309c ("MEDIUM:
connections: Attempt to get idle connections from other threads.")
proved to be sufficient for most cases, but it still has some rough
edges when threads are unevenly loaded. If one thread wakes up with
10 streams to process in a burst, it will mainly take over connections
from the next one until it doesn't have anymore.
This patch implements a rotating index that is stored into the server
list and that any thread taking over a connection is responsible for
updating. This way it starts mostly random and avoids always picking
from the same place. This results in a smoother distribution overall
and a slightly lower takeover rate.
There's a tricky behavior that was lost when the idle connections were
made sharable between thread in commit 566df309c ("MEDIUM: connections:
Attempt to get idle connections from other threads."), it is the ability
to retry from the safe list when looking for any type of idle connection
and not finding one in the idle list.
It is already important when dealing with long-lived connections since
they ultimately all become safe, but that case is already covered by
the fact that safe conns not being used end up closing and are not
looked up anymore since connect_server() sees there are none.
But it's even more important when using server-side connections which
periodically close, because the new connections may spend half of their
time in safe state and the other half in the idle state, and failing
to grab one such connection from the right list results in establishing
a new connection.
This patch makes sure that a failure to find an idle connection results
in a new attempt at finding one from the safe list if available. In order
to avoid locking twice, connections are attempted alternatively from the
idle then safe list when picking from siblings. Tests have shown a ~2%
performance increase by avoiding to lock twice.
A typical test with 10000 connections over 16 threads with 210 servers
having a 1 millisecond response time and closing every 5 requests shows
a degrading performance starting at 120k req/s down to 60-90k and an
average reuse rate of 44%. After the fix, the reuse rate raises to 79%
and the performance becomes stable at 254k req/s. Similarly the previous
test with full keep-alive has now increased from 96% reuse rate to 99%
and from 352k to 375k req/s.
No backport is needed as this is 2.2-only.
The problem with the way idle connections currently work is that it's
easy for a thread to steal all of its siblings' connections, then release
them, then it's done by another one, etc. This happens even more easily
due to scheduling latencies, or merged events inside the same pool loop,
which, when dealing with a fast server responding in sub-millisecond
delays, can really result in one thread being fully at work at a time.
In such a case, we perform a huge amount of takeover() which consumes
CPU and requires quite some locking, sometimes resulting in lower
performance than expected.
In order to fight against this problem, this patch introduces a new server
setting "pool-low-conn", whose purpose is to dictate when it is allowed to
steal connections from a sibling. As long as the number of idle connections
remains at least as high as this value, it is permitted to take over another
connection. When the idle connection count becomes lower, a thread may only
use its own connections or create a new one. By proceeding like this even
with a low number (typically 2*nbthreads), we quickly end up in a situation
where all active threads have a few connections. It then becomes possible
to connect to a server without bothering other threads the vast majority
of the time, while still being able to use these connections when the
number of available FDs becomes low.
We also use this threshold instead of global.nbthread in the connection
release logic, allowing to keep more extra connections if needed.
A test performed with 10000 concurrent HTTP/1 connections, 16 threads
and 210 servers with 1 millisecond of server response time showed the
following numbers:
haproxy 2.1.7: 185000 requests per second
haproxy 2.2: 314000 requests per second
haproxy 2.2 lowconn 32: 352000 requests per second
The takeover rate goes down from 300k/s to 13k/s. The difference is
further amplified as the response time shrinks.
In conn_backend_get() we can avoid locking other servers when trying
to steal their connections when we know for sure they will not have
one, so let's do it to lower the contention on the lock.
Starting with commit 079cb9a ("MEDIUM: connections: Revamp the way idle
connections are killed") we started to improve the way to compute the
need for idle connections. But the condition to keep a connection idle
or drop it when releasing it was not updated. This often results in
storms of close when certain thresholds are met, and long series of
takeover() when there aren't enough connections left for a thread on
a server.
This patch tries to improve the situation this way:
- it keeps an estimate of the number of connections needed for a server.
This estimate is a copy of the max over previous purge period, or is a
max of what is seen over current period; it differs from max_used_conns
in that this one is a counter that's reset on each purge period ;
- when releasing, if the number of current idle+used connections is
lower than this last estimate, then we'll keep the connection;
- when releasing, if the current thread's idle conns head is empty,
and we don't exceed the estimate by the number of threads, then
we'll keep the connection.
- when cleaning up connections, we consider the max of the last two
periods to avoid killing too many idle conns when facing bursty
traffic.
Thanks to this we can better converge towards a situation where, provided
there are enough FDs, each active server keeps at least one idle connection
per thread all the time, with a total number close to what was needed over
the previous measurement period (as defined by pool-purge-delay).
On tests with large numbers of concurrent connections (30k) and many
servers (200), this has quite smoothed the CPU usage pattern, increased
the reuse rate and roughly halved the takeover rate.
There's a minor glitch with the way idle connections start to be evicted.
The lookup always goes from thread 0 to thread N-1. This causes depletion
of connections on the first threads and abundance on the last ones. This
is visible with the takeover() stats below:
$ socat - /tmp/sock1 <<< "show activity"|grep ^fd ; \
sleep 10 ; \
socat -/tmp/sock1 <<< "show activity"|grep ^fd
fd_takeover: 300144 [ 91887 84029 66254 57974 ]
fd_takeover: 359631 [ 111369 99699 79145 69418 ]
There are respectively 19k, 15k, 13k and 11k takeovers for only 4 threads,
indicating that the first thread needs a foreign FD twice more often than
the 4th one.
This patch changes this si that all threads are scanned in round robin
starting with the current one. The takeovers now happen in a much more
distributed way (about 4 times 9k) :
fd_takeover: 1420081 [ 359562 359453 346586 354480 ]
fd_takeover: 1457044 [ 368779 368429 355990 363846 ]
There is no need to backport this, as this happened along a few patches
that were merged during 2.2 development.
The FD takeover operation might have certain impacts explaining
unexpected activities, so it's important to report such a counter
there. We thus count the number of times a thread has stolen an
FD from another thread.
In connect_server(), we want to increase curr_used_conns only if the
connection is new, or if it comes from an idle_pool, otherwise it means
the connection is already used by at least one another stream, and it is
already accounted for.
We used to have 3 thread-based arrays for toremove_lock, idle_cleanup,
and toremove_connections. The problem is that these items are small,
and that this creates false sharing between threads since it's possible
to pack up to 8-16 of these values into a single cache line. This can
cause real damage where there is contention on the lock.
This patch creates a new array of struct "idle_conns" that is aligned
on a cache line and which contains all three members above. This way
each thread has access to its variables without hindering the other
ones. Just doing this increased the HTTP/1 request rate by 5% on a
16-thread machine.
The definition was moved to connection.{c,h} since it appeared a more
natural evolution of the ongoing changes given that there was already
one of them declared in connection.h previously.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.
In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.
It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
The files remained mostly unchanged since they were OK. However, half of
the users didn't need to include them, and about as many actually needed
to have it and used to find functions like srv_currently_usable() through
a long chain that broke when moving the file.
This one is particularly difficult to split because it provides all the
functions used to manipulate a proxy state and to retrieve names or IDs
for error reporting, and as such, it was included in 73 files (down to
68 after cleanup). It would deserve a small cleanup though the cut points
are not obvious at the moment given the number of structs involved in
the struct proxy itself.
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.
Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.
A good approach would consist in splitting that file in 3 parts:
- one for error output ("errors" ?).
- one for log_format processing
- and one for actual logging.
It was moved without any change, however many callers didn't need it at
all. This was a consequence of the split of proto_http.c into several
parts that resulted in many locations to still reference it.
The files were moved almost as-is, just dropping arg-t and auth-t from
acl-t but keeping arg-t in acl.h. It was useful to revisit the call places
since a handful of files used to continue to include acl.h while they did
not need it at all. Struct stream was only made a forward declaration
since not otherwise needed.
All includes that were not absolutely necessary were removed because
checks.h happens to very often be part of dependency loops. A warning
was added about this in check-t.h. The fields, enums and structs were
a bit tidied because it's particularly tedious to find anything there.
It would make sense to split this in two or more files (at least
extract tcp-checks).
The file was renamed to the singular because it was one of the rare
exceptions to have an "s" appended to its name compared to the struct
name.
The TASK_IS_TASKLET() macro was moved to the proto file instead of the
type one. The proto part was a bit reordered to remove a number of ugly
forward declaration of static inline functions. About a tens of C and H
files had their dependency dropped since they were not using anything
from task.h.
global.h was one of the messiest files, it has accumulated tons of
implicit dependencies and declares many globals that make almost all
other file include it. It managed to silence a dependency loop between
server.h and proxy.h by being well placed to pre-define the required
structs, forcing struct proxy and struct server to be forward-declared
in a significant number of files.
It was split in to, one which is the global struct definition and the
few macros and flags, and the rest containing the functions prototypes.
The UNIX_MAX_PATH definition was moved to compat.h.
This one is particularly tricky to move because everyone uses it
and it depends on a lot of other types. For example it cannot include
arg-t.h and must absolutely only rely on forward declarations to avoid
dependency loops between vars -> sample_data -> arg. In order to address
this one, it would be nice to split the sample_data part out of sample.h.
There's no type file, it only contains fetch_rdp_cookie_name() and
val_payload_lv() which probably ought to move somewhere else instead
of staying there.