The server idle/safe/available connection lists are replaced with ebmb-
trees. This is used to store backend connections, with the new field
connection hash as the key. The hash is a 8-bytes size field, used to
reflect specific connection parameters.
This is a preliminary work to be able to reuse connection with SNI,
explicit src/dst address or PROXY protocol.
Since commit 3169471964fdc49963e63f68c1fd88686821a0c4 ("MINOR: Add
server port field to server state file.") max_fields was not increased
on version number 1. So this patch aims to fix it. This should be
backported as far as v1.8, but the numbering should be adpated depending
on the version: simply increase the field by 1.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
When adding the server side support for certificate update over the CLI
we encountered a design problem with the SSL session cache which was not
locked.
Indeed, once a certificate is updated we need to flush the cache, but we
also need to ensure that the cache is not used during the update.
To prevent the use of the cache during an update, this patch introduce a
rwlock for the SSL server session cache.
In the SSL session part this patch only lock in read, even if it writes.
The reason behind this, is that in the session part, there is one cache
storage per thread so it is not a problem to write in the cache from
several threads. The problem is only when trying to write in the cache
from the CLI (which could be on any thread) when a session is trying to
access the cache. So there is a write lock in the CLI part to prevent
simultaneous access by a session and the CLI.
This patch also remove the thread_isolate attempt which is eating too
much CPU time and was not protecting from the use of a free ptr in the
session.
in the same manner of agentaddr, we now:
- permit to set agentport through `port` keyword, like it is the case
for agentaddr through `addr`
- set the priority on `agent-port` keyword when used
- add a flag to be able to test when the value is set like for agentaddr
it makes the behaviour between `addr` and `port` more consistent.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
We can currently change the check-port using the cli command `set server
check-port` but there is a consistency issue when using server state.
This patch aims to fix this problem but will be also a good preparation
work to get rid of checkport flag, so we are able to know when checkport
was set by config.
I am fully aware this is not making github #953 moving forward, I
however think this might be acceptable while waiting for a proper
solution and resolve consistency problem faced with port settings.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
While trying to fix some consistency problem with the config file/cli
(e.g. check-port cli command does not set the flag), we realised
checkport flag was not necessarily needed. Indeed tcpcheck uses service
port as the last choice if check.port is zero. So we can assume if
check.port is zero, it means it was never set by the user, regardless if
it is by the cli or config file. In the longterm this will avoid to
introduce a new consistency issue if we forget to set the flag.
in the same manner of checkport flag, we don't really need checkaddr
flag. We can assume if checkaddr is not set, it means it was never set
by the user or config.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The client_crt member is not used anymore since the server's ssl context
initialization now behaves the same way as the bind lines one (using
ckch stores and instances).
In order for the backend server's certificate to be hot-updatable, it
needs to fit into the implementation used for the "bind" certificates.
This patch follows the architecture implemented for the frontend
implementation and reuses its structures and general function calls
(adapted for the server side).
The ckch store logic is kept and a dedicated ckch instance is used (one
per server). The whole sni_ctx logic was not kept though because it is
not needed.
All the new functions added in this patch are basically server-side
copies of functions that already exist on the frontend side with all the
sni and bind_cond references removed.
The ckch_inst structure has a new 'is_server_instance' flag which is
used to distinguish regular instances from the server-side ones, and a
new pointer to the server's structure in case of backend instance.
Since the new server ckch instances are linked to a standard ckch_store,
a lookup in the ckch store table will succeed so the cli code used to
update bind certificates needs to be covered to manage those new server
side ckch instances.
This is from the output of codespell. It's done at once over a bunch
of files and only affects comments, so there is nothing user-visible.
No backport needed.
This allows using the address of the server rather than the name of the
server for keeping track of servers in a backend for stickiness.
The peers code was also extended to support feeding the dictionary using
this key instead of the name.
Fixes#814
This patch adds QUIC structs to server struct so that to make the QUIC code
compile. Also initializes the ebtree to store the connections by connection
IDs.
HAVE_SSL_CTX_SET_CIPHERSUITES is newly defined macro set in openssl-compat.h,
which helps to identify ssl libs (currently OpenSSL-1.1.1 only) that supports
TLS13 cipersuites manipulation on TLS13 context
in the context of a progressive backend migration, we want to be able to
activate SSL on outgoing connections to the server at runtime without
reloading.
This patch adds a `set server ssl` command; in order to allow that:
- add `srv_use_ssl` to `show servers state` command for compatibility,
also update associated parsing
- when using default-server ssl setting, and `no-ssl` on server line,
init SSL ctx without activating it
- when triggering ssl API, de/activate SSL connections as requested
- clean ongoing connections as it is done for addr/port changes, without
checking prior server state
example config:
backend be_foo
default-server ssl
server srv0 127.0.0.1:6011 weight 1 no-ssl
show servers state:
5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - -1
where srv0 can switch to ssl later during the runtime:
set server be_foo/srv0 ssl on
5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - 1
Also update existing tests and create a new one.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
A few structures were slightly rearranged in order to plug some holes
left around the locks. Sizes ranging from 8 to 32 bytes could be saved
depending on the structures. No performance difference was noticed (none
was expected there), though memory usage might be slightly reduced in
some rare cases.
This is executed on startup with the registered statistics module. The
existing statistics have been merged in a list containing all
statistics for each domain. This is useful to print all available
statistics in a generic way.
Allocate extra counters for all proxies/servers/listeners instances.
These counters are allocated with the counters from the stats modules
registered on startup.
The next thread walking algorithm in commit 566df309c ("MEDIUM:
connections: Attempt to get idle connections from other threads.")
proved to be sufficient for most cases, but it still has some rough
edges when threads are unevenly loaded. If one thread wakes up with
10 streams to process in a burst, it will mainly take over connections
from the next one until it doesn't have anymore.
This patch implements a rotating index that is stored into the server
list and that any thread taking over a connection is responsible for
updating. This way it starts mostly random and avoids always picking
from the same place. This results in a smoother distribution overall
and a slightly lower takeover rate.
The problem with the way idle connections currently work is that it's
easy for a thread to steal all of its siblings' connections, then release
them, then it's done by another one, etc. This happens even more easily
due to scheduling latencies, or merged events inside the same pool loop,
which, when dealing with a fast server responding in sub-millisecond
delays, can really result in one thread being fully at work at a time.
In such a case, we perform a huge amount of takeover() which consumes
CPU and requires quite some locking, sometimes resulting in lower
performance than expected.
In order to fight against this problem, this patch introduces a new server
setting "pool-low-conn", whose purpose is to dictate when it is allowed to
steal connections from a sibling. As long as the number of idle connections
remains at least as high as this value, it is permitted to take over another
connection. When the idle connection count becomes lower, a thread may only
use its own connections or create a new one. By proceeding like this even
with a low number (typically 2*nbthreads), we quickly end up in a situation
where all active threads have a few connections. It then becomes possible
to connect to a server without bothering other threads the vast majority
of the time, while still being able to use these connections when the
number of available FDs becomes low.
We also use this threshold instead of global.nbthread in the connection
release logic, allowing to keep more extra connections if needed.
A test performed with 10000 concurrent HTTP/1 connections, 16 threads
and 210 servers with 1 millisecond of server response time showed the
following numbers:
haproxy 2.1.7: 185000 requests per second
haproxy 2.2: 314000 requests per second
haproxy 2.2 lowconn 32: 352000 requests per second
The takeover rate goes down from 300k/s to 13k/s. The difference is
further amplified as the response time shrinks.
Starting with commit 079cb9a ("MEDIUM: connections: Revamp the way idle
connections are killed") we started to improve the way to compute the
need for idle connections. But the condition to keep a connection idle
or drop it when releasing it was not updated. This often results in
storms of close when certain thresholds are met, and long series of
takeover() when there aren't enough connections left for a thread on
a server.
This patch tries to improve the situation this way:
- it keeps an estimate of the number of connections needed for a server.
This estimate is a copy of the max over previous purge period, or is a
max of what is seen over current period; it differs from max_used_conns
in that this one is a counter that's reset on each purge period ;
- when releasing, if the number of current idle+used connections is
lower than this last estimate, then we'll keep the connection;
- when releasing, if the current thread's idle conns head is empty,
and we don't exceed the estimate by the number of threads, then
we'll keep the connection.
- when cleaning up connections, we consider the max of the last two
periods to avoid killing too many idle conns when facing bursty
traffic.
Thanks to this we can better converge towards a situation where, provided
there are enough FDs, each active server keeps at least one idle connection
per thread all the time, with a total number close to what was needed over
the previous measurement period (as defined by pool-purge-delay).
On tests with large numbers of concurrent connections (30k) and many
servers (200), this has quite smoothed the CPU usage pattern, increased
the reuse rate and roughly halved the takeover rate.
There are list definitions everywhere in the code, let's drop the need
for including list-t.h to declare them. The rest of the list manipulation
is huge however and not needed everywhere so using the list walking macros
still requires to include list.h.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.