we previously forgot to add `agent-*` commands.
Take this opportunity to rewrite the help string in a simpler way for
readability (mainly removing simple quotes)
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The remaining contention on the server lock solely comes from
sess_change_server() which takes the lock to add and remove a
stream from the server's actconn list. This is both expensive
and pointless since we have mt-lists, and this list is only
used by the CLI's "shutdown server sessions" command!
Let's migrate to an mt-list and remove the need for this costly
lock. By doing so, the request rate increased by ~1.8%.
The RMAINT admin state is dynamic and should be remove from the
srv_admin_state parameter when a server state is loaded from a server-state
file. Otherwise an erorr is reported, the server-state line is ignored and
the server state is not updated.
This patch should fix the issue #576. It must be backported as far as 1.8.
This patch splits current dns.c into two files:
The first dns.c contains code related to DNS message exchange over UDP
and in future other TCP. We try to remove depencies to resolving
to make it usable by other stuff as DNS load balancing.
The new resolvers.c inherit of the code specific to the actual
resolvers.
Note:
It was really difficult to obtain a clean diff dur to the amount
of moved code.
Note2:
Counters and stuff related to stats is not cleany separated because
currently counters for both layers are merged and hard to separate
for now.
Resolv callbacks are also updated to rely on counters and not on
nameservers.
"show stat domain dns" will now show the parent id (i.e. resolvers
section name).
This variable is now only used to point on the local server-state file. When
the server-state is global, it is unused. So, we now use "localfilepath"
instead. Thus, the "filepath" variable can safely be removed.
When a local server-state file is loaded, if its name is too long, the error
is not properly handled, resulting to a call to fopen() with the "filepath"
variable set to NULL. To fix the bug, when this error occurs, we jump to the
next proxy, via a "continue" statement. And we take case to set "filepath"
variable after the error handling to be sure.
This patch should fix the issue #1111. It must be backported as far as 1.6.
The default proxy was passed as a variable, which in addition to being
a PITA to deal with in the config parser, doesn't feel safe to use when
it ought to be const.
This will only affect new code so no backport is needed.
init_default_instance() was still left in cfgparse.c which is not the
best place to pre-initialize a proxy. Let's place it in proxy.c just
after init_new_proxy(), take this opportunity for renaming it to
proxy_preset_defaults() and taking out init_new_proxy() from it, and
let's pass it the pointer to the default proxy to be initialized instead
of implicitly assuming defproxy. We'll soon be able to exploit this.
Only two call places had to be updated.
server health checks and agent parameters are written the same way as
others to be able to enahcne code reuse: basically we make use of
parsing and assignment at the same place. It makes it difficult for
error handling to know whether srv object was modified partially or not.
The problem was already present with SRV resolution though.
I was a bit puzzled about the approach to take to be honest, and I did
not wanted to go into a full refactor, so I assumed it was ok to simply
notify whether the line was failed or partially applied.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
logical followup from cli commands addition, so that the state server
file stays compatible with the changes made at runtime; use previously
added helper to load server attributes.
also alloc a specific chunk to avoid mixing with other called functions
using it
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Even if it is possibly too much work for the current usage, it makes
sure we don't break states file from v2.3 to v2.4; indeed, since v2.3,
we introduced two new fields, so we put them aside to guarantee we can
easily reload from a version 1.
The diff seems huge but there is no specific change apart from:
- introduce v2 where it is needed (parsing, update)
- move away from switch/case in update to be able to reuse code
- move srv lock to the whole function to make it easier
this patch confirm how painful it is to maintain this functionality.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this patch allows to set agent port at runtime. In order to align with
both `addr` and `check-addr` commands, also add the possibility to
optionnaly set port on `agent-addr` command. This led to a small
refactor in order to use the same function for both `agent-addr` and
`agent-port` commands.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this patch allows to set server health check address at runtime. In
order to align with `addr` command, also allow to set port optionnaly.
This led to a small refactor in order to use the same function for both
`check-addr` and `check-port` commands.
for `check-port`, we however don't permit the change anymore if checks
are not enabled on the server.
This command becomes more and more useful for people having a consul
like architecture:
- the backend server is located on a container with its own IP
- the health checks are done the consul instance located on the host
with the host IP
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The server idle/safe/available connection lists are replaced with ebmb-
trees. This is used to store backend connections, with the new field
connection hash as the key. The hash is a 8-bytes size field, used to
reflect specific connection parameters.
This is a preliminary work to be able to reuse connection with SNI,
explicit src/dst address or PROXY protocol.
This is a preparation work for connection reuse with sni/proxy
protocol/specific src-dst addresses.
Protect every access to idle conn lists with a lock. This is currently
strictly not needed because the access to the list are made with atomic
operations. However, to be able to reuse connection with specific
parameters, the list storage will be converted to eb-trees. As this
structure does not have atomic operation, it is mandatory to protect it
with a lock.
For this, the takeover lock is reused. Its role was to protect during
connection takeover. As it is now extended to general idle conns usage,
it is renamed to idle_conns_lock. A new lock section is also
instantiated named IDLE_CONNS_LOCK to isolate its impact on performance.
When adding the server side support for certificate update over the CLI
we encountered a design problem with the SSL session cache which was not
locked.
Indeed, once a certificate is updated we need to flush the cache, but we
also need to ensure that the cache is not used during the update.
To prevent the use of the cache during an update, this patch introduce a
rwlock for the SSL server session cache.
In the SSL session part this patch only lock in read, even if it writes.
The reason behind this, is that in the session part, there is one cache
storage per thread so it is not a problem to write in the cache from
several threads. The problem is only when trying to write in the cache
from the CLI (which could be on any thread) when a session is trying to
access the cache. So there is a write lock in the CLI part to prevent
simultaneous access by a session and the CLI.
This patch also remove the thread_isolate attempt which is eating too
much CPU time and was not protecting from the use of a free ptr in the
session.
small consistency problem with `addr` and `agent-addr` options:
for the both options, the last one parsed is always used to set the
agent-check addr. Thus these two lines don't have the same behavior:
server ... addr <addr1> agent-addr <addr2>
server ... agent-addr <addr2> addr <addr1>
After this patch `agent-addr` will always be the priority option over
`addr`. It means we test the flag before setting agentaddr.
We also fix all the places where we did not set the flag to be coherent
everywhere.
I was not really able to determine where this issue is coming from. So
it is probable we may backport it to all stable version where the agent
is supported.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
We can currently change the check-port using the cli command `set server
check-port` but there is a consistency issue when using server state.
This patch aims to fix this problem but will be also a good preparation
work to get rid of checkport flag, so we are able to know when checkport
was set by config.
I am fully aware this is not making github #953 moving forward, I
however think this might be acceptable while waiting for a proper
solution and resolve consistency problem faced with port settings.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
While trying to fix some consistency problem with the config file/cli
(e.g. check-port cli command does not set the flag), we realised
checkport flag was not necessarily needed. Indeed tcpcheck uses service
port as the last choice if check.port is zero. So we can assume if
check.port is zero, it means it was never set by the user, regardless if
it is by the cli or config file. In the longterm this will avoid to
introduce a new consistency issue if we forget to set the flag.
in the same manner of checkport flag, we don't really need checkaddr
flag. We can assume if checkaddr is not set, it means it was never set
by the user or config.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
When the server state is loaded from a server-state file, there is no reason
to set an unconfigured check port with the server port. Because by default,
if the check port is not set, the server's one is used. Thus we can remove
this useless assignment. It is mandatory for next improvements.
while reading `update_server_addr_port` I found out some things which
can be seen as incoherency. I hope I did not overlooked anything:
- one comment is stating check's address should be updated if it uses
the server one; however the condition checks if `SRV_F_CHECKADDR` is
set; this flag is set when a check address is set; result is that we
override the check address where I was not expecting it. In fact we
don't need to update anything here as server addr is used when check
addr is not set.
- same goes for check agent addr
- for port, it is a bit different, we update the check port if it is
unset. This is harmless because we also use server port if check port
is unset. However it creates some incoherency before/after using this
command, as check port should stay unset througout the life of the
process unless it is is set by `set server check-port` command.
quite hard to locate the origin of this this issue but the function was
introduced in commit d458adcc52 ("MINOR:
new update_server_addr_port() function to change both server's ADDR and
service PORT"). I was however not able to determine whether this is due
to a change of behavior along the years. So this patch can potentially
be backported up to v1.8 but we must be careful while doing so, as the
code has changed a lot. That being said, the bug being not very
impacting I would be fine keeping it for 2.4 only.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The client_crt member is not used anymore since the server's ssl context
initialization now behaves the same way as the bind lines one (using
ckch stores and instances).
An fatal error is now reported if a server is defined in a frontend
section. til now, a warning was just emitted and the server was ignored. The
warning was added in the 1.3.4 when the frontend/backend keywords were
introduced to allow a smooth transition and to not break existing
configs. It is old enough now to emit an fatal error in this case.
This patch is related to the issue #1043. It may be backported at least as
far as 2.2, and possibly to older versions. It relies on the previous commit
("MINOR: config: Add failifnotcap() to emit an alert on proxy capabilities").
If a server is configured to not have any idle conns, returns immediatly
from srv_cleanup_connections. This avoids a segfault when a server is
configured with pool-max-conn to 0.
This should be backported up to 2.2.
Do not proceed on init_addr if the backend of the server is marked as
disabled. When marked as disabled, the server is not fully initialized
and some operation must be avoided to prevent segfault. It is correct
because there is no way to activate a disabled backend.
This fixes the github issue #1031.
This should be backported to 2.2.
GitHub Issue #1026 reported a crash during configuration check for the
following example config:
backend 0
server 0 0
server 0 0
HAProxy crashed in srv_set_addr_desc() due to a NULL pointer dereference
caused by `sa2str` returning NULL for an `AF_UNSPEC` address (`0`).
Check to make sure the address key is non-null before using it for
comparison or inserting it into the tree.
The crash was introduced in commit 92149f9a8 ("MEDIUM: stick-tables: Add
srvkey option to stick-table") which not in any released version so no
backport is needed.
Cc: Tim Duesterhus <tim@bastelstu.be>
This allows using the address of the server rather than the name of the
server for keeping track of servers in a backend for stickiness.
The peers code was also extended to support feeding the dictionary using
this key instead of the name.
Fixes#814
This patch adds QUIC structs to server struct so that to make the QUIC code
compile. Also initializes the ebtree to store the connections by connection
IDs.
in the context of a progressive backend migration, we want to be able to
activate SSL on outgoing connections to the server at runtime without
reloading.
This patch adds a `set server ssl` command; in order to allow that:
- add `srv_use_ssl` to `show servers state` command for compatibility,
also update associated parsing
- when using default-server ssl setting, and `no-ssl` on server line,
init SSL ctx without activating it
- when triggering ssl API, de/activate SSL connections as requested
- clean ongoing connections as it is done for addr/port changes, without
checking prior server state
example config:
backend be_foo
default-server ssl
server srv0 127.0.0.1:6011 weight 1 no-ssl
show servers state:
5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - -1
where srv0 can switch to ssl later during the runtime:
set server be_foo/srv0 ssl on
5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - 1
Also update existing tests and create a new one.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Define a per-thread counters allocated with the greatest size of any
stat module counters. This variable is named trash_counters.
When using a proxy without allocated counters, return the trash counters
from EXTRA_COUNTERS_GET instead of a dangling pointer to prevent
segfault.
This is useful for all the proxies used internally and not
belonging to the global proxy list. As these objects does not appears on
the stat report, it does not matter to use the dummy counters.
For this fix to be functional, the extra counters are explicitly
initialized to NULL on proxy/server/listener init functions.
Most notably, the crash has already been detected with the following
vtc:
- reg-tests/lua/txn_get_priv.vtc
- reg-tests/peers/tls_basic_sync.vtc
- reg-tests/peers/tls_basic_sync_wo_stkt_backend.vtc
There is probably other parts that may be impacted (SPOE for example).
This bug was introduced in the current release and do not need to be
backported. The faulty commits are
"MINOR: ssl: count client hello for stats" and
"MINOR: ssl: add counters for ssl sessions".
This function used to grab the idle lock when scanning the threads for
idle connections, but it doesn't need it since the lock only protects
the tree. Let's remove it.
In issue #933, @jaroslawr provided a report indicating that when using
many threads and many servers, it's very difficult to terminate the last
idle connections on each server. The issue has two causes in fact. The
first one is that during the calculation of the estimate of needed
connections, we round the computation up while in previous round it was
already rounded up, so we end up adding 1 to 1 which once divided by 2
remains 1. The second issue is that servers are not woken up anymore for
purging their connections if they don't have activity. The only reason
that was there to wake them up again was in case insufficient connections
were purged. And even then the purge task itself was not woken up. But
that is not enough for getting rid of the long tail of old connections
nor updating est_need_conns.
This patch makes sure to properly wake up as long as at least one idle
connection remains, and not to round up the needed connections anymore.
Prior to this patch, a test involving many connections which suddenly
stopped would keep many idle connections, now they're effectively halved
every pool-purge-delay.
This needs to be backported to 2.2.
When servers based on server templates are initialized, the configuration file
and line are now copied. This helps to emit understandable warning and alert
messages.
This patch may be backported if needed, as far as 1.8.
On startup, if a server has no address but the dns resolutions are configured,
"none" method is added to the default init-addr methods, in addition to "last"
and "libc". Thus on startup, this server is set to RMAINT mode if no address is
found. It is only performed if no other init-addr method is configured.
Setting the RMAINT mode on startup is important to inhibit the health checks.
For instance, following servers will now be set to RMAINT mode on startup :
server srv nofound.tld:80 check resolvers mydns
server srv _http._tcp.service.local check resolvers mydns
server-template srv 1-3 _http._tcp.service.local check resolvers mydns
while followings ones will trigger an error :
server srv nofound.tld:80 check
server srv nofound.tld:80 check resolvers mydns init-addr libc
server srv _http._tcp.service.local check
server srv _http._tcp.service.local check resolvers mydns init-addr libc
server-template srv 1-3 _http._tcp.service.local check resolvers mydns init-addr libc
This patch must be backported as far as 1.8.
Adjust condition used to report down_time for statistics. There was a
tiny probabilty to have a negative downtime if last_change was superior
to now. If this is the case, return only down_time.
This bug can backported up to 1.8.
When a server is up after a failure, its downtime was reset to 0 on the
statistics. This is due to a wrong condition that causes srv.down_time
to never be set. Fix this by updating down_time each time the server is in
STARTING state.
Fixes the github issue #920.
This bug can be backported up to 1.8.
No need to use an exclusive lock on the proxy anymore when reading its
setting, a read lock is enough. A few other places continue to use a
write-lock when modifying simple flags only in order to let this
function see a consistent value all along. This might be changed in
the future using barriers and local copies.
This is an anticipation of finer grained locking for the queues. For now
all lock places take a write lock so that there is no difference at all
with previous code.
If the slowstart value in a state file implies the latest state change
is within the slowstart period, we end up calling srv_update_status()
to reschedule the server's state change but its task is not yet
allocated and remains null, causing a crash on startup.
Make sure srv_update_status() supports being called with partially
initialized servers which do not yet have a task. If the task has to
be scheduled, it will necessarily happen after initialization since
it will result from a state change.
This should be backported wherever server-state is present.