571 Commits

Author SHA1 Message Date
Willy Tarreau
260f324c19 REORG: server: uninline the idle conns management functions
The following functions are quite heavy and have no reason to be kept
inlined:

   srv_release_conn, srv_lookup_conn, srv_lookup_conn_next,
   srv_add_to_idle_list

They were moved to server.c. It's worth noting that they're a bit
at the edge between server and connection and that maybe we could
create an idle-conn file for these in the near future.
2021-10-07 01:41:14 +02:00
Willy Tarreau
a8a72c68d5 CLEANUP: ssl/server: move ssl_sock_set_srv() to srv_set_ssl() in server.c
This one has nothing to do with ssl_sock as it manipulates the struct
server only. Let's move it to server.c and remove unneeded dependencies
on ssl_sock.h. This further reduces by 10% the number of includes of
opensslconf.h and by 0.5% the number of compiled lines.
2021-10-07 01:41:06 +02:00
Willy Tarreau
80527bcb9d CLEANUP: server: always include the storage for SSL settings
The SSL stuff in struct server takes less than 3% of it and requires
lots of annoying ifdefs in the code just to take care of the cases
where the field is absent. Let's get rid of this and stop including
openssl-compat from server.c to detect NPN and ALPN capabilities.

This reduces the total LoC by another 0.4%.
2021-10-07 01:36:51 +02:00
Willy Tarreau
beeabf5314 MINOR: task: provide 3 task_new_* wrappers to simplify the API
We'll need to improve the API to pass other arguments in the future, so
let's start to adapt better to the current use cases. task_new() is used:
  - 18 times as task_new(tid_bit)
  - 18 times as task_new(MAX_THREADS_MASK)
  - 2 times with a single bit (in a loop)
  - 1 in the debug code that uses a mask

This patch provides 3 new functions to achieve this:
  - task_new_here()     to create a task on the calling thread
  - task_new_anywhere() to create a task to be run anywhere
  - task_new_on()       to create a task to run on a specific thread

The change is trivial and will allow us to later concentrate the
required adaptations to these 3 functions only. It's still possible
to call task_new() if needed but a comment was added to encourage the
use of the new ones instead. The debug code was not changed and still
uses it.
2021-10-01 18:36:29 +02:00
Amaury Denoyelle
cd8a6f28c6 MINOR: server: enable slowstart for dynamic server
Enable the 'slowstart' keyword for dynamic servers. The slowstart task
is allocated in 'add server' handler if slowstart is used.

As the server is created in disabled state, there is no need to start
the task. The slowstart task will be automatically started on the first
'enable server' invocation.
2021-09-21 14:00:32 +02:00
Amaury Denoyelle
29d1ac1330 REORG: server: move slowstart init outside of checks
'slowstart' can be used without check on a server, with the CLI handlers
'enable/disable server'. Move the code to initialize and start the
slowstart task outside of check.c.

This change will also be reused to enable slowstart for dynamic servers.
2021-09-21 14:00:32 +02:00
Amaury Denoyelle
725f8d29ff MINOR: server: enable more check related keywords for dynamic servers
Allow to use the check related keywords defined in server.c. These
keywords can be enabled now that checks have been implemented for
dynamic servers.

Here is the list of the new keywords supported :
- error-limit
- observe
- on-error
- on-marked-down
- on-marked-up
2021-09-21 14:00:32 +02:00
Amaury Denoyelle
79b90e8cd4 MINOR: server: enable more keywords for ssl checks for dynamic servers
Allow to configure ssl support for dynamic server checks independently
of the ssl server configuration. This is done via the keyword
"check-ssl". Also enable to configure the sni/alpn used for the check
via "check-sni/alpn".
2021-09-21 14:00:07 +02:00
Amaury Denoyelle
b621552ca3 BUG/MINOR: server: alloc dynamic srv ssl ctx if proxy uses ssl chk rule
The ssl context is not initialized for a dynamic server, even if there
is a tcpcheck rule which uses ssl on the related backed. This will cause
the check initialization to failed with the message :
  "Out of memory when initializing an SSL connection"

This can be reproduced by having the following config in the backend :
  option tcp-check
  tcp-check connect ssl
and create a dynamic server with check activated and a ca-file.

Fix this by calling the prepare_srv xprt callback when the proxy options
PR_O_TCPCKH_SSL is set.

Check support for dynamic servers has been merged in the current branch.
No backport needed.
2021-09-21 13:56:03 +02:00
Amaury Denoyelle
0f456d5029 BUG/MINOR: server: allow 'enable health' only if check configured
Test that checks have been configured on the server before enabling via
the 'enable health' CLI. This mirrors the 'enable agent' command.

Without this, a user can use the command on the server without checks.
This leaves the server in an undefined state. Notably, the stat page
reports the server in check transition.

This condition was left on the following reorg commit.
  2c04eda8b58636ad2ae44e42b1f50f3b5a24a642
  REORG: cli: move "{enable|disable} health" to server.c

This should be backported up to 1.8.
2021-09-21 11:50:22 +02:00
Tim Duesterhus
d5fc8fcb86 CLEANUP: Add haproxy/xxhash.h to avoid modifying import/xxhash.h
This solves setting XXH_INLINE_ALL in a cleaner way, because the imported
header is not modified, easing future updates.

see 6f7cc11e6dd0f01b437fba893da2edd2362660a2
2021-09-11 19:58:45 +02:00
Amaury Denoyelle
14c3c5c121 MEDIUM: server: allow to remove servers at runtime except non purgeable
Relax the condition on "delete server" CLI handler to be able to remove
all servers, even non dynamic, except if they are flagged as non
purgeable.

This change is necessary to extend the use cases for dynamic servers
with reload. It's expected that each dynamic server created via the CLI
is manually commited in the haproxy configuration by the user. Dynamic
servers will be present on reload only if they are present in the
configuration file. This means that non-dynamic servers must be allowed
to be removable at runtime.

The dynamic servers removal reg-test has been updated and renamed to
reflect its purpose. A new test is present to check that non-purgeable
servers cannot be removed.
2021-08-25 15:53:54 +02:00
Amaury Denoyelle
0626961ad3 MINOR: server: mark referenced servers as non purgeable
Mark servers that are referenced by configuration elements as non
purgeable. This includes the following list :
- tracked servers
- servers referenced in a use-server rule
- servers referenced in a sample fetch
2021-08-25 15:53:54 +02:00
Amaury Denoyelle
bc2ebfa5a4 MEDIUM: server: extend refcount for all servers
In a future patch, it will be possible to remove at runtime every
servers, both static and dynamic. This requires to extend the server
refcount for all instances.

First, refcount manipulation functions have been renamed to better
express the API usage.

* srv_refcount_use -> srv_take
The refcount is always initialize to 1 on the server creation in
new_server. It's also incremented for each check/agent configured on a
server instance.

* free_server -> srv_drop
This decrements the refcount and if null, the server is freed, so code
calling it must not use the server reference after it. As a bonus, this
function now returns the next server instance. This is useful when
calling on the server loop without having to save the next pointer
before each invocation.

In these functions, remove the checks that prevent refcount on
non-dynamic servers. Each reference to "dynamic" in variable/function
naming have been eliminated as well.
2021-08-25 15:53:54 +02:00
Amaury Denoyelle
0a8d05d31c BUG/MINOR: stats: use refcount to protect dynamic server on dump
A dynamic server may be deleted at runtime at the same moment when the
stats applet is pointing to it. Use the server refcount to prevent
deletion in this case.

This should be backported up to 2.4, with an observability period of 2
weeks. Note that it requires the dynamic server refcounting feature
which has been implemented on 2.5; the following commits are required :

- MINOR: server: implement a refcount for dynamic servers
- BUG/MINOR: server: do not use refcount in free_server in stopping mode
- MINOR: server: return the next srv instance on free_server
2021-08-25 15:53:43 +02:00
Amaury Denoyelle
f5c1e12e44 MINOR: server: return the next srv instance on free_server
As a convenience, return the next server instance from servers list on
free_server.

This is particularily useful when using this function on the servers
list without having to save of the next pointer before calling it.
2021-08-25 15:29:19 +02:00
William Lallemand
4c395fce21 MINOR: server: check if srv is NULL in free_server()
Check if srv is NULL before trying to do anything  in free_server(),
like most free()-like function do.
2021-08-20 10:20:51 +02:00
Amaury Denoyelle
3eb42f91d9 BUG/MEDIUM: server: support both check/agent-check on a dynamic instance
A static server is able to support simultaneously both health chech and
agent-check. Adjust the dynamic server CLI handlers to also support this
configuration.

This should not be backported, unless dynamic server checks are
backported.
2021-08-11 14:41:47 +02:00
Amaury Denoyelle
13f2e2ceeb BUG/MINOR: server: do not use refcount in free_server in stopping mode
Currently there is a leak at process shutdown with dynamic servers with
check/agent-check activated. Check purges are not executed on process
stopping, so the server is not liberated due to its refcount.

The solution is simply to ignore the refcount on process stopping mode
and free the server on the first free_server invocation.

This should not be backported, unless dynamic server checks are
backported. In this case, the following commit must be backported first.
  7afa5c1843521ec3be7549592d2b38ccc9d68b73
  MINOR: global: define MODE_STOPPING
2021-08-09 17:53:30 +02:00
Amaury Denoyelle
b65f4cab6a MEDIUM: server: implement agent check for dynamic servers
This commit is the counterpart for agent check of
"MEDIUM: server: implement check for dynamic servers".

The "agent-check" keyword is enabled for dynamic servers. The agent
check must manually be activated via "enable agent" CLI. This can
enable the dynamic server if the agent response is "ready" without an
explicit "enable server" CLI.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
2fc4d39577 MEDIUM: server: implement check for dynamic servers
Implement check support for dynamic servers. The "check" keyword is now
enabled for dynamic servers. If used, the server check is initialized
and the check task started in the "add server" CLI handler. The check is
explicitely disabled and must be manually activated via "enable health"
CLI handler.

The dynamic server refcount is incremented if a check is configured. On
"delete server" handler, the check is purged, which decrements the
refcount.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
d6b7080cec MINOR: server: implement a refcount for dynamic servers
It is necessary to have a refcount mechanism on dynamic servers to be
able to enable check support. Indeed, when deleting a dynamic server
with check activated, the check will be asynchronously removed. This is
mandatory to properly free the check resources in a thread-safe manner.
The server instance must be kept alive for this.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
fca18172d9 MINOR: server: initialize fields for dynamic server check
Set default inter/rise/fall values for dynamic servers check/agent. This
is required because dynamic servers do not inherit from a
default-server.
2021-08-06 11:08:04 +02:00
Amaury Denoyelle
c755efd5c6 MINOR: server: unmark deprecated on enable health/agent cli
Remove the "DEPRECATED" marker on "enable/disable health/agent"
commands. Their purpose is to toggle the check/agent on a server.

These commands are still useful because their purpose is not covered by
the "set server" command. Most there was confusion with the commands
'set server health/agent', which in fact serves another goal.

Note that the indication "use 'set server' instead" has been added since
2016 on the commit
  2c04eda8b58636ad2ae44e42b1f50f3b5a24a642
  REORG: cli: move "{enable|disable} health" to server.c

and
  58d9cb7d22c1b0d8239543443131e3e3658375d0
  REORG: cli: move "{enable|disable} agent" to server.c

Besides, these commands will become required to enable check/agent on
dynamic servers which will be created with check disabled.

This should be backported up to 2.4.
2021-08-06 10:09:50 +02:00
Willy Tarreau
d332f1396b BUG/MINOR: server: update last_change on maint->ready transitions too
Nenad noticed that when leaving maintenance, the servers' last_change
field was not updated. This is visible in the Status column of the stats
page in front of the state, as the cumuled time spent in the current state
is wrong, it starts from the last transition (typically ready->maint). In
addition, the backend's state was not updated either, because the down
transition is performed by set_backend_down() which also emits a log, and
it is this function which was extended to update the backend's last_change,
but it's not called for down->up transitions so that was not done.

The most visible (and unpleasant) effect of this bug is that it affects
slowstart so such a server could immediately restart with a significant
load ratio.

This should likely be backported to all stable releases.
2021-08-04 19:41:01 +02:00
Amaury Denoyelle
bd8dd841e5 BUG/MINOR: server: remove srv from px list on CLI 'add server' error
If an error occured during the CLI 'add server' handler, the newly
created server must be removed from the proxy list if already inserted.
Currently, this can happen on the extremely rare error during server id
generation if there is no id left.

The removal operation is not thread-safe, it must be conducted before
releasing the thread isolation.

This can be backported up to 2.4. Please note that dynamic server track
is not implemented in 2.4, so the release_server_track invocation must
be removed for the backport to prevent a compilation error.
2021-08-04 14:57:06 +02:00
Willy Tarreau
ba3ab7907a MEDIUM: servers: make the server deletion code run under full thread isolation
In 2.4, runtime server deletion was brought by commit e558043e1 ("MINOR:
server: implement delete server cli command"). A comment remained in the
code about a theoretical race between the thread_isolate() call and another
thread being in the process of allocating memory before accessing the
server via a reference that was grabbed before the memory allocation,
since the thread_harmless_now()/thread_harmless_end() pair around mmap()
may have the effect of allowing cli_parse_delete_server() to proceed.

Now that the full thread isolation is available, let's update the code
to rely on this. Now it is guaranteed that competing threads will either
be in the poller or queued in front of thread_isolate_full().

This may be backported to 2.4 if any report of breakage suggests the bug
really exists, in which case the two following patches will also be
needed:
  MINOR: threads: make thread_release() not wait for other ones to complete
  MEDIUM: threads: add a stronger thread_isolate_full() call
2021-08-04 14:49:36 +02:00
Amaury Denoyelle
08be72b827 BUG/MINOR: server: fix race on error path of 'add server' CLI if track
If an error occurs during a dynamic server creation with tracking, it
must be removed from the tracked list. This operation is not thread-safe
and thus must be conducted under the thread isolation.

Track support for dynamic servers has been introduced in this release.
This does not need to be backported.
2021-08-04 09:18:12 +02:00
Amaury Denoyelle
56eb8ed37d MEDIUM: server: support track keyword for dynamic servers
Allow the usage of the 'track' keyword for dynamic servers. On server
deletion, the server is properly removed from the tracking chain to
prevents NULL pointer dereferencing.
2021-07-16 10:22:58 +02:00
Amaury Denoyelle
79f68be207 MINOR: srv: do not allow to track a dynamic server
Prevents the use of the "track" keyword for a dynamic server. This
simplifies the deletion of a dynamic server, without having to worry
about servers which might tracked it.

A BUG_ON is present in the dynamic server delete function to validate
this assertion.
2021-07-16 10:08:55 +02:00
Amaury Denoyelle
669b620e5f MINOR: srv: extract tracking server config function
Extract the post-config tracking setup in a dedicated function
srv_apply_track. This will be useful to implement track support for
dynamic servers.
2021-07-16 10:08:55 +02:00
Remi Tricot-Le Breton
0498fa4059 BUG/MINOR: ssl: Default-server configuration ignored by server
When a default-server line specified a client certificate to use, the
frontend would not take it into account and create an empty SSL context,
which would raise an error on the backend side ("peer did not return a
certificate").

This bug was introduced by d817dc733eacfd7cf5bb0bbc6128f44644db078e in
which the SSL contexts are created earlier than before (during the
default-server line parsing) without setting it in the corresponding
server structures. It then made the server create an empty SSL context
in ssl_sock_prepare_srv_ctx because it thought it needed one.

It was raised on redmine, in Bug #3906.

It can be backported to 2.4.
2021-07-13 18:35:38 +02:00
Christopher Faulet
81ba74ae50 BUG/MEDIUM: resolvers: Make 1st server of a template take part to SRV resolution
The commit 3406766d5 ("MEDIUM: resolvers: add a ref between servers and srv
request or used SRV record") introduced a regression. The first server of a
template based on SRV record is no longer resolved. The same bug exists for
a normal server based on a SRV record.

In fact, the server used during parsing (used as reference when a
server-template line is parsed) is never attached to the corresponding srvrq
object. Thus with following lines, no resolution is performed because
"srvrq->attached_servers" is empty:

  server-template test 1 _http.domain.tld resolvers dns ...
  server test1 _http.domain.tld resolvers dns ...

This patch should fix the issue #1295 (but not confirmed yet it is the same
bug). It must be backported everywhere the above commit is.
2021-06-29 20:52:37 +02:00
Christopher Faulet
07ecff589d MINOR: resolvers: Reset server IP on error in resolv_get_ip_from_response()
If resolv_get_ip_from_response() returns an error (or an unexpected return
value), the server is set to RMAINT status. However, its address must also
be reset. Otherwise, it is still reported by the cli on "show servers state"
commands. This may be confusing. Note that it is a theorical patch because
this code path does not exist. Thus it is not tagged as a BUG.

This patch may be backported as far as 2.0.
2021-06-24 17:22:36 +02:00
Christopher Faulet
a8ce497aac BUG/MINOR: resolvers: Reset server IP when no ip is found in the response
For A/AAAA resolution, if no ip is found for a server in the response, the
server is set to RMAINT status. However, its address must also be
reset. Otherwise, it is still reported by the cli on "show servers state"
commands. This may be confusing.

This patch may be backported as far as 2.0.
2021-06-24 17:22:36 +02:00
Willy Tarreau
cdc83e0192 MINOR: queue: add a pointer to the server and the proxy in the queue
A queue is specific to a server or a proxy, so we don't need to place
this distinction inside all pendconns, it can be in the queue itself.
This commit adds the relevant fields "px" and "sv" into the struct
queue, and initializes them accordingly.
2021-06-24 10:52:31 +02:00
Willy Tarreau
df3b0cbe31 MINOR: queue: add queue_init() to initialize a queue
This is better and cleaner than open-coding this in the server and
proxy code, where it has all chances of becoming wrong once forgotten.
2021-06-24 10:52:31 +02:00
Willy Tarreau
9ab78293bf MEDIUM: queue: simplify again the process_srv_queue() API (v2)
This basically undoes the API changes that were performed by commit
0274286dd ("BUG/MAJOR: server: fix deadlock when changing maxconn via
agent-check") to address the deadlock issue: since process_srv_queue()
doesn't use the server lock anymore, it doesn't need the "server_locked"
argument, so let's get rid of it before it gets used again.
2021-06-24 10:52:31 +02:00
Willy Tarreau
16fbdda3c3 MEDIUM: queue: use a dedicated lock for the queues (v2)
Till now whenever a server or proxy's queue was touched, this server
or proxy's lock was taken. Not only this requires distinct code paths,
but it also causes unnecessary contention with other uses of these locks.

This patch adds a lock inside the "queue" structure that will be used
the same way by the server and the proxy queuing code. The server used
to use a spinlock and the proxy an rwlock, though the queue only used
it for locked writes. This new version uses a spinlock since we don't
need the read lock part here. Tests have not shown any benefit nor cost
in using this one versus the rwlock so we could change later if needed.

The lower contention on the locks increases the performance from 362k
to 374k req/s on 16 threads with 20 servers and leastconn. The gain
with roundrobin even increases by 9%.

This is tagged medium because the lock is changed, but no other part of
the code touches the queues, with nor without locking, so this should
remain invisible.
2021-06-24 10:52:31 +02:00
Willy Tarreau
3f70fb9ea2 Revert "MEDIUM: queue: use a dedicated lock for the queues"
This reverts commit fcb8bf8650ec6b5614d1b88db54f1200ebd96cbd.

The recent changes since 5304669e1 MEDIUM: queue: make
pendconn_process_next_strm() only return the pendconn opened a tiny race
condition between stream_free() and process_srv_queue(), as the pendconn
is accessed outside of the lock, possibly while it's being freed. A
different approach is required.
2021-06-24 07:26:28 +02:00
Willy Tarreau
ccd85a3e08 Revert "MEDIUM: queue: simplify again the process_srv_queue() API"
This reverts commit c83e45e9b001591633188a480a896c935d3c9625.

The recent changes since 5304669e1 MEDIUM: queue: make
pendconn_process_next_strm() only return the pendconn opened a tiny race
condition between stream_free() and process_srv_queue(), as the pendconn
is accessed outside of the lock, possibly while it's being freed. A
different approach is required.
2021-06-24 07:22:18 +02:00
Willy Tarreau
c83e45e9b0 MEDIUM: queue: simplify again the process_srv_queue() API
This basically undoes the API changes that were performed by commit
0274286dd ("BUG/MAJOR: server: fix deadlock when changing maxconn via
agent-check") to address the deadlock issue: since process_srv_queue()
doesn't use the server lock anymore, it doesn't need the "server_locked"
argument, so let's get rid of it before it gets used again.
2021-06-22 18:57:15 +02:00
Willy Tarreau
fcb8bf8650 MEDIUM: queue: use a dedicated lock for the queues
Till now whenever a server or proxy's queue was touched, this server
or proxy's lock was taken. Not only this requires distinct code paths,
but it also causes unnecessary contention with other uses of these locks.

This patch adds a lock inside the "queue" structure that will be used
the same way by the server and the proxy queuing code. The server used
to use a spinlock and the proxy an rwlock, though the queue only used
it for locked writes. This new version uses a spinlock since we don't
need the read lock part here. Tests have not shown any benefit nor cost
in using this one versus the rwlock so we could change later if needed.

The lower contention on the locks increases the performance from 491k
to 507k req/s on 16 threads with 20 servers and leastconn. The gain
with roundrobin even increases by 6%.

The performance profile changes from this:
  13.03%  haproxy             [.] fwlc_srv_reposition
   8.08%  haproxy             [.] fwlc_get_next_server
   3.62%  haproxy             [.] process_srv_queue
   1.78%  haproxy             [.] pendconn_dequeue
   1.74%  haproxy             [.] pendconn_add

to this:
  11.95%  haproxy             [.] fwlc_srv_reposition
   7.57%  haproxy             [.] fwlc_get_next_server
   3.51%  haproxy             [.] process_srv_queue
   1.74%  haproxy             [.] pendconn_dequeue
   1.70%  haproxy             [.] pendconn_add

At this point the differences are mostly measurement noise.

This is tagged medium because the lock is changed, but no other part of
the code touches the queues, with nor without locking, so this should
remain invisible.
2021-06-22 18:43:56 +02:00
Willy Tarreau
a05704582c MINOR: server: replace the pendconns-related stuff with a struct queue
Just like for proxies, all three elements (pendconns, nbpend, queue_idx)
were moved to struct queue.
2021-06-22 18:43:14 +02:00
Amaury Denoyelle
0274286dd3 BUG/MAJOR: server: fix deadlock when changing maxconn via agent-check
The server_parse_maxconn_change_request locks the server lock. However,
this function can be called via agent-checks or lua code which already
lock it. This bug has been introduced by the following commit :

  commit 79a88ba3d09f7e2b73ae27cb5d24cc087a548fa6
  BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'

This commit tried to fix another deadlock with can occur because
previoulsy server_parse_maxconn_change_request requires the server lock
to be held. However, it may call internally process_srv_queue which also
locks the server lock. The locking policy has thus been updated. The fix
is functional for the CLI 'set maxconn' but fails to address the
agent-check / lua counterparts.

This new issue is fixed in two steps :
- changes from the above commit have been reverted. This means that
  server_parse_maxconn_change_request must again be called with the
  server lock.

- to counter the deadlock fixed by the above commit, process_srv_queue
  now takes an argument to render the server locking optional if the
  caller already held it. This is only used by
  server_parse_maxconn_change_request.

The above commit was subject to backport up to 1.8. Thus this commit
must be backported in every release where it is already present.
2021-06-22 11:39:20 +02:00
Amaury Denoyelle
34897d2eff MINOR: ssl: support ssl keyword for dynamic servers
Activate the 'ssl' keyword for dynamic servers. This is the final step
to have ssl dynamic servers feature implemented. If activated,
ssl_sock_prepare_srv_ctx will be called at the end of the 'add server'
CLI handler.

At the same time, update the management doc to list all ssl keywords
implemented for dynamic servers.
2021-06-18 16:42:26 +02:00
Amaury Denoyelle
b89d3d3de7 MINOR: server: disable CLI 'set server ssl' for dynamic servers
'set server ssl' uses ssl parameters from default-server. As dynamic
servers does not reuse any default-server parameters, this command has
no sense for them.
2021-06-18 16:42:25 +02:00
Christopher Faulet
0ba54bb401 BUG/MINOR: server/cli: Fix locking in function processing "set server" command
The commit c7b391aed ("BUG/MEDIUM: server/cli: Fix ABBA deadlock when fqdn
is set from the CLI") introduced 2 bugs. The first one is a typo on the
server's lock label (s/SERVER_UNLOCK/SERVER_LOCK/). The second one is about
the server's lock itself. It must be acquired to execute the "agent-send"
subcommand.

The patch above is marked to be backported as far as 1.8. Thus, this one
must also backported as far 1.8.

BUG/MINOR: server/cli: Don't forget to lock server on agent-send subcommand
2021-06-18 09:16:32 +02:00
Christopher Faulet
dcac418062 BUG/MEDIUM: resolvers: Add a task on servers to check SRV resolution status
When a server relies on a SRV resolution, a task is created to clean it up
(fqdn/port and address) when the SRV resolution is considered as outdated
(based on the resolvers 'timeout' value). It is only possible if the server
inherits outdated info from a state file and is no longer selected to be
attached to a SRV item. Note that most of time, a server is attached to a
SRV item. Thus when the item becomes obsolete, the server is cleaned
up.

It is important to have such task to be sure the server will be free again
to have a chance to be resolved again with fresh information. Of course,
this patch is a workaround to solve a design issue. But there is no other
obvious way to fix it without rewritting all the resolvers part. And it must
be backportable.

This patch relies on following commits:
 * MINOR: resolvers: Clean server in a dedicated function when removing a SRV item
 * MINOR: resolvers: Remove server from named_servers tree when removing a SRV item

All the series must be backported as far as 2.2 after some observation
period. Backports to 2.0 and 1.8 must be evaluated.
2021-06-17 16:52:35 +02:00
Christopher Faulet
c7b391aed2 BUG/MEDIUM: server/cli: Fix ABBA deadlock when fqdn is set from the CLI
To perform servers resolution, the resolver's lock is first acquired then
the server's lock when necessary. However, when the fqdn is set via the CLI,
the opposite is performed. So, it is possible to experience an ABBA
deadlock.

To fix this bug, the server's lock is acquired and released for each
subcommand of "set server" with an exception when the fqdn is set. The
resolver's lock is first acquired. Of course, this means we must be sure to
have a resolver to lock.

This patch must be backported as far as 1.8.
2021-06-17 16:52:14 +02:00