Commit Graph

610 Commits

Author SHA1 Message Date
Amaury Denoyelle
14c3c5c121 MEDIUM: server: allow to remove servers at runtime except non purgeable
Relax the condition on "delete server" CLI handler to be able to remove
all servers, even non dynamic, except if they are flagged as non
purgeable.

This change is necessary to extend the use cases for dynamic servers
with reload. It's expected that each dynamic server created via the CLI
is manually commited in the haproxy configuration by the user. Dynamic
servers will be present on reload only if they are present in the
configuration file. This means that non-dynamic servers must be allowed
to be removable at runtime.

The dynamic servers removal reg-test has been updated and renamed to
reflect its purpose. A new test is present to check that non-purgeable
servers cannot be removed.
2021-08-25 15:53:54 +02:00
Amaury Denoyelle
0626961ad3 MINOR: server: mark referenced servers as non purgeable
Mark servers that are referenced by configuration elements as non
purgeable. This includes the following list :
- tracked servers
- servers referenced in a use-server rule
- servers referenced in a sample fetch
2021-08-25 15:53:54 +02:00
Amaury Denoyelle
bc2ebfa5a4 MEDIUM: server: extend refcount for all servers
In a future patch, it will be possible to remove at runtime every
servers, both static and dynamic. This requires to extend the server
refcount for all instances.

First, refcount manipulation functions have been renamed to better
express the API usage.

* srv_refcount_use -> srv_take
The refcount is always initialize to 1 on the server creation in
new_server. It's also incremented for each check/agent configured on a
server instance.

* free_server -> srv_drop
This decrements the refcount and if null, the server is freed, so code
calling it must not use the server reference after it. As a bonus, this
function now returns the next server instance. This is useful when
calling on the server loop without having to save the next pointer
before each invocation.

In these functions, remove the checks that prevent refcount on
non-dynamic servers. Each reference to "dynamic" in variable/function
naming have been eliminated as well.
2021-08-25 15:53:54 +02:00
Amaury Denoyelle
0a8d05d31c BUG/MINOR: stats: use refcount to protect dynamic server on dump
A dynamic server may be deleted at runtime at the same moment when the
stats applet is pointing to it. Use the server refcount to prevent
deletion in this case.

This should be backported up to 2.4, with an observability period of 2
weeks. Note that it requires the dynamic server refcounting feature
which has been implemented on 2.5; the following commits are required :

- MINOR: server: implement a refcount for dynamic servers
- BUG/MINOR: server: do not use refcount in free_server in stopping mode
- MINOR: server: return the next srv instance on free_server
2021-08-25 15:53:43 +02:00
Amaury Denoyelle
f5c1e12e44 MINOR: server: return the next srv instance on free_server
As a convenience, return the next server instance from servers list on
free_server.

This is particularily useful when using this function on the servers
list without having to save of the next pointer before calling it.
2021-08-25 15:29:19 +02:00
William Lallemand
4c395fce21 MINOR: server: check if srv is NULL in free_server()
Check if srv is NULL before trying to do anything  in free_server(),
like most free()-like function do.
2021-08-20 10:20:51 +02:00
Amaury Denoyelle
3eb42f91d9 BUG/MEDIUM: server: support both check/agent-check on a dynamic instance
A static server is able to support simultaneously both health chech and
agent-check. Adjust the dynamic server CLI handlers to also support this
configuration.

This should not be backported, unless dynamic server checks are
backported.
2021-08-11 14:41:47 +02:00
Amaury Denoyelle
13f2e2ceeb BUG/MINOR: server: do not use refcount in free_server in stopping mode
Currently there is a leak at process shutdown with dynamic servers with
check/agent-check activated. Check purges are not executed on process
stopping, so the server is not liberated due to its refcount.

The solution is simply to ignore the refcount on process stopping mode
and free the server on the first free_server invocation.

This should not be backported, unless dynamic server checks are
backported. In this case, the following commit must be backported first.
  7afa5c1843
  MINOR: global: define MODE_STOPPING
2021-08-09 17:53:30 +02:00
Amaury Denoyelle
b65f4cab6a MEDIUM: server: implement agent check for dynamic servers
This commit is the counterpart for agent check of
"MEDIUM: server: implement check for dynamic servers".

The "agent-check" keyword is enabled for dynamic servers. The agent
check must manually be activated via "enable agent" CLI. This can
enable the dynamic server if the agent response is "ready" without an
explicit "enable server" CLI.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
2fc4d39577 MEDIUM: server: implement check for dynamic servers
Implement check support for dynamic servers. The "check" keyword is now
enabled for dynamic servers. If used, the server check is initialized
and the check task started in the "add server" CLI handler. The check is
explicitely disabled and must be manually activated via "enable health"
CLI handler.

The dynamic server refcount is incremented if a check is configured. On
"delete server" handler, the check is purged, which decrements the
refcount.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
d6b7080cec MINOR: server: implement a refcount for dynamic servers
It is necessary to have a refcount mechanism on dynamic servers to be
able to enable check support. Indeed, when deleting a dynamic server
with check activated, the check will be asynchronously removed. This is
mandatory to properly free the check resources in a thread-safe manner.
The server instance must be kept alive for this.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
fca18172d9 MINOR: server: initialize fields for dynamic server check
Set default inter/rise/fall values for dynamic servers check/agent. This
is required because dynamic servers do not inherit from a
default-server.
2021-08-06 11:08:04 +02:00
Amaury Denoyelle
c755efd5c6 MINOR: server: unmark deprecated on enable health/agent cli
Remove the "DEPRECATED" marker on "enable/disable health/agent"
commands. Their purpose is to toggle the check/agent on a server.

These commands are still useful because their purpose is not covered by
the "set server" command. Most there was confusion with the commands
'set server health/agent', which in fact serves another goal.

Note that the indication "use 'set server' instead" has been added since
2016 on the commit
  2c04eda8b5
  REORG: cli: move "{enable|disable} health" to server.c

and
  58d9cb7d22
  REORG: cli: move "{enable|disable} agent" to server.c

Besides, these commands will become required to enable check/agent on
dynamic servers which will be created with check disabled.

This should be backported up to 2.4.
2021-08-06 10:09:50 +02:00
Willy Tarreau
d332f1396b BUG/MINOR: server: update last_change on maint->ready transitions too
Nenad noticed that when leaving maintenance, the servers' last_change
field was not updated. This is visible in the Status column of the stats
page in front of the state, as the cumuled time spent in the current state
is wrong, it starts from the last transition (typically ready->maint). In
addition, the backend's state was not updated either, because the down
transition is performed by set_backend_down() which also emits a log, and
it is this function which was extended to update the backend's last_change,
but it's not called for down->up transitions so that was not done.

The most visible (and unpleasant) effect of this bug is that it affects
slowstart so such a server could immediately restart with a significant
load ratio.

This should likely be backported to all stable releases.
2021-08-04 19:41:01 +02:00
Amaury Denoyelle
bd8dd841e5 BUG/MINOR: server: remove srv from px list on CLI 'add server' error
If an error occured during the CLI 'add server' handler, the newly
created server must be removed from the proxy list if already inserted.
Currently, this can happen on the extremely rare error during server id
generation if there is no id left.

The removal operation is not thread-safe, it must be conducted before
releasing the thread isolation.

This can be backported up to 2.4. Please note that dynamic server track
is not implemented in 2.4, so the release_server_track invocation must
be removed for the backport to prevent a compilation error.
2021-08-04 14:57:06 +02:00
Willy Tarreau
ba3ab7907a MEDIUM: servers: make the server deletion code run under full thread isolation
In 2.4, runtime server deletion was brought by commit e558043e1 ("MINOR:
server: implement delete server cli command"). A comment remained in the
code about a theoretical race between the thread_isolate() call and another
thread being in the process of allocating memory before accessing the
server via a reference that was grabbed before the memory allocation,
since the thread_harmless_now()/thread_harmless_end() pair around mmap()
may have the effect of allowing cli_parse_delete_server() to proceed.

Now that the full thread isolation is available, let's update the code
to rely on this. Now it is guaranteed that competing threads will either
be in the poller or queued in front of thread_isolate_full().

This may be backported to 2.4 if any report of breakage suggests the bug
really exists, in which case the two following patches will also be
needed:
  MINOR: threads: make thread_release() not wait for other ones to complete
  MEDIUM: threads: add a stronger thread_isolate_full() call
2021-08-04 14:49:36 +02:00
Amaury Denoyelle
08be72b827 BUG/MINOR: server: fix race on error path of 'add server' CLI if track
If an error occurs during a dynamic server creation with tracking, it
must be removed from the tracked list. This operation is not thread-safe
and thus must be conducted under the thread isolation.

Track support for dynamic servers has been introduced in this release.
This does not need to be backported.
2021-08-04 09:18:12 +02:00
Amaury Denoyelle
56eb8ed37d MEDIUM: server: support track keyword for dynamic servers
Allow the usage of the 'track' keyword for dynamic servers. On server
deletion, the server is properly removed from the tracking chain to
prevents NULL pointer dereferencing.
2021-07-16 10:22:58 +02:00
Amaury Denoyelle
79f68be207 MINOR: srv: do not allow to track a dynamic server
Prevents the use of the "track" keyword for a dynamic server. This
simplifies the deletion of a dynamic server, without having to worry
about servers which might tracked it.

A BUG_ON is present in the dynamic server delete function to validate
this assertion.
2021-07-16 10:08:55 +02:00
Amaury Denoyelle
669b620e5f MINOR: srv: extract tracking server config function
Extract the post-config tracking setup in a dedicated function
srv_apply_track. This will be useful to implement track support for
dynamic servers.
2021-07-16 10:08:55 +02:00
Remi Tricot-Le Breton
0498fa4059 BUG/MINOR: ssl: Default-server configuration ignored by server
When a default-server line specified a client certificate to use, the
frontend would not take it into account and create an empty SSL context,
which would raise an error on the backend side ("peer did not return a
certificate").

This bug was introduced by d817dc733e in
which the SSL contexts are created earlier than before (during the
default-server line parsing) without setting it in the corresponding
server structures. It then made the server create an empty SSL context
in ssl_sock_prepare_srv_ctx because it thought it needed one.

It was raised on redmine, in Bug #3906.

It can be backported to 2.4.
2021-07-13 18:35:38 +02:00
Christopher Faulet
81ba74ae50 BUG/MEDIUM: resolvers: Make 1st server of a template take part to SRV resolution
The commit 3406766d5 ("MEDIUM: resolvers: add a ref between servers and srv
request or used SRV record") introduced a regression. The first server of a
template based on SRV record is no longer resolved. The same bug exists for
a normal server based on a SRV record.

In fact, the server used during parsing (used as reference when a
server-template line is parsed) is never attached to the corresponding srvrq
object. Thus with following lines, no resolution is performed because
"srvrq->attached_servers" is empty:

  server-template test 1 _http.domain.tld resolvers dns ...
  server test1 _http.domain.tld resolvers dns ...

This patch should fix the issue #1295 (but not confirmed yet it is the same
bug). It must be backported everywhere the above commit is.
2021-06-29 20:52:37 +02:00
Christopher Faulet
07ecff589d MINOR: resolvers: Reset server IP on error in resolv_get_ip_from_response()
If resolv_get_ip_from_response() returns an error (or an unexpected return
value), the server is set to RMAINT status. However, its address must also
be reset. Otherwise, it is still reported by the cli on "show servers state"
commands. This may be confusing. Note that it is a theorical patch because
this code path does not exist. Thus it is not tagged as a BUG.

This patch may be backported as far as 2.0.
2021-06-24 17:22:36 +02:00
Christopher Faulet
a8ce497aac BUG/MINOR: resolvers: Reset server IP when no ip is found in the response
For A/AAAA resolution, if no ip is found for a server in the response, the
server is set to RMAINT status. However, its address must also be
reset. Otherwise, it is still reported by the cli on "show servers state"
commands. This may be confusing.

This patch may be backported as far as 2.0.
2021-06-24 17:22:36 +02:00
Willy Tarreau
cdc83e0192 MINOR: queue: add a pointer to the server and the proxy in the queue
A queue is specific to a server or a proxy, so we don't need to place
this distinction inside all pendconns, it can be in the queue itself.
This commit adds the relevant fields "px" and "sv" into the struct
queue, and initializes them accordingly.
2021-06-24 10:52:31 +02:00
Willy Tarreau
df3b0cbe31 MINOR: queue: add queue_init() to initialize a queue
This is better and cleaner than open-coding this in the server and
proxy code, where it has all chances of becoming wrong once forgotten.
2021-06-24 10:52:31 +02:00
Willy Tarreau
9ab78293bf MEDIUM: queue: simplify again the process_srv_queue() API (v2)
This basically undoes the API changes that were performed by commit
0274286dd ("BUG/MAJOR: server: fix deadlock when changing maxconn via
agent-check") to address the deadlock issue: since process_srv_queue()
doesn't use the server lock anymore, it doesn't need the "server_locked"
argument, so let's get rid of it before it gets used again.
2021-06-24 10:52:31 +02:00
Willy Tarreau
16fbdda3c3 MEDIUM: queue: use a dedicated lock for the queues (v2)
Till now whenever a server or proxy's queue was touched, this server
or proxy's lock was taken. Not only this requires distinct code paths,
but it also causes unnecessary contention with other uses of these locks.

This patch adds a lock inside the "queue" structure that will be used
the same way by the server and the proxy queuing code. The server used
to use a spinlock and the proxy an rwlock, though the queue only used
it for locked writes. This new version uses a spinlock since we don't
need the read lock part here. Tests have not shown any benefit nor cost
in using this one versus the rwlock so we could change later if needed.

The lower contention on the locks increases the performance from 362k
to 374k req/s on 16 threads with 20 servers and leastconn. The gain
with roundrobin even increases by 9%.

This is tagged medium because the lock is changed, but no other part of
the code touches the queues, with nor without locking, so this should
remain invisible.
2021-06-24 10:52:31 +02:00
Willy Tarreau
3f70fb9ea2 Revert "MEDIUM: queue: use a dedicated lock for the queues"
This reverts commit fcb8bf8650.

The recent changes since 5304669e1 MEDIUM: queue: make
pendconn_process_next_strm() only return the pendconn opened a tiny race
condition between stream_free() and process_srv_queue(), as the pendconn
is accessed outside of the lock, possibly while it's being freed. A
different approach is required.
2021-06-24 07:26:28 +02:00
Willy Tarreau
ccd85a3e08 Revert "MEDIUM: queue: simplify again the process_srv_queue() API"
This reverts commit c83e45e9b0.

The recent changes since 5304669e1 MEDIUM: queue: make
pendconn_process_next_strm() only return the pendconn opened a tiny race
condition between stream_free() and process_srv_queue(), as the pendconn
is accessed outside of the lock, possibly while it's being freed. A
different approach is required.
2021-06-24 07:22:18 +02:00
Willy Tarreau
c83e45e9b0 MEDIUM: queue: simplify again the process_srv_queue() API
This basically undoes the API changes that were performed by commit
0274286dd ("BUG/MAJOR: server: fix deadlock when changing maxconn via
agent-check") to address the deadlock issue: since process_srv_queue()
doesn't use the server lock anymore, it doesn't need the "server_locked"
argument, so let's get rid of it before it gets used again.
2021-06-22 18:57:15 +02:00
Willy Tarreau
fcb8bf8650 MEDIUM: queue: use a dedicated lock for the queues
Till now whenever a server or proxy's queue was touched, this server
or proxy's lock was taken. Not only this requires distinct code paths,
but it also causes unnecessary contention with other uses of these locks.

This patch adds a lock inside the "queue" structure that will be used
the same way by the server and the proxy queuing code. The server used
to use a spinlock and the proxy an rwlock, though the queue only used
it for locked writes. This new version uses a spinlock since we don't
need the read lock part here. Tests have not shown any benefit nor cost
in using this one versus the rwlock so we could change later if needed.

The lower contention on the locks increases the performance from 491k
to 507k req/s on 16 threads with 20 servers and leastconn. The gain
with roundrobin even increases by 6%.

The performance profile changes from this:
  13.03%  haproxy             [.] fwlc_srv_reposition
   8.08%  haproxy             [.] fwlc_get_next_server
   3.62%  haproxy             [.] process_srv_queue
   1.78%  haproxy             [.] pendconn_dequeue
   1.74%  haproxy             [.] pendconn_add

to this:
  11.95%  haproxy             [.] fwlc_srv_reposition
   7.57%  haproxy             [.] fwlc_get_next_server
   3.51%  haproxy             [.] process_srv_queue
   1.74%  haproxy             [.] pendconn_dequeue
   1.70%  haproxy             [.] pendconn_add

At this point the differences are mostly measurement noise.

This is tagged medium because the lock is changed, but no other part of
the code touches the queues, with nor without locking, so this should
remain invisible.
2021-06-22 18:43:56 +02:00
Willy Tarreau
a05704582c MINOR: server: replace the pendconns-related stuff with a struct queue
Just like for proxies, all three elements (pendconns, nbpend, queue_idx)
were moved to struct queue.
2021-06-22 18:43:14 +02:00
Amaury Denoyelle
0274286dd3 BUG/MAJOR: server: fix deadlock when changing maxconn via agent-check
The server_parse_maxconn_change_request locks the server lock. However,
this function can be called via agent-checks or lua code which already
lock it. This bug has been introduced by the following commit :

  commit 79a88ba3d0
  BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'

This commit tried to fix another deadlock with can occur because
previoulsy server_parse_maxconn_change_request requires the server lock
to be held. However, it may call internally process_srv_queue which also
locks the server lock. The locking policy has thus been updated. The fix
is functional for the CLI 'set maxconn' but fails to address the
agent-check / lua counterparts.

This new issue is fixed in two steps :
- changes from the above commit have been reverted. This means that
  server_parse_maxconn_change_request must again be called with the
  server lock.

- to counter the deadlock fixed by the above commit, process_srv_queue
  now takes an argument to render the server locking optional if the
  caller already held it. This is only used by
  server_parse_maxconn_change_request.

The above commit was subject to backport up to 1.8. Thus this commit
must be backported in every release where it is already present.
2021-06-22 11:39:20 +02:00
Amaury Denoyelle
34897d2eff MINOR: ssl: support ssl keyword for dynamic servers
Activate the 'ssl' keyword for dynamic servers. This is the final step
to have ssl dynamic servers feature implemented. If activated,
ssl_sock_prepare_srv_ctx will be called at the end of the 'add server'
CLI handler.

At the same time, update the management doc to list all ssl keywords
implemented for dynamic servers.
2021-06-18 16:42:26 +02:00
Amaury Denoyelle
b89d3d3de7 MINOR: server: disable CLI 'set server ssl' for dynamic servers
'set server ssl' uses ssl parameters from default-server. As dynamic
servers does not reuse any default-server parameters, this command has
no sense for them.
2021-06-18 16:42:25 +02:00
Christopher Faulet
0ba54bb401 BUG/MINOR: server/cli: Fix locking in function processing "set server" command
The commit c7b391aed ("BUG/MEDIUM: server/cli: Fix ABBA deadlock when fqdn
is set from the CLI") introduced 2 bugs. The first one is a typo on the
server's lock label (s/SERVER_UNLOCK/SERVER_LOCK/). The second one is about
the server's lock itself. It must be acquired to execute the "agent-send"
subcommand.

The patch above is marked to be backported as far as 1.8. Thus, this one
must also backported as far 1.8.

BUG/MINOR: server/cli: Don't forget to lock server on agent-send subcommand
2021-06-18 09:16:32 +02:00
Christopher Faulet
dcac418062 BUG/MEDIUM: resolvers: Add a task on servers to check SRV resolution status
When a server relies on a SRV resolution, a task is created to clean it up
(fqdn/port and address) when the SRV resolution is considered as outdated
(based on the resolvers 'timeout' value). It is only possible if the server
inherits outdated info from a state file and is no longer selected to be
attached to a SRV item. Note that most of time, a server is attached to a
SRV item. Thus when the item becomes obsolete, the server is cleaned
up.

It is important to have such task to be sure the server will be free again
to have a chance to be resolved again with fresh information. Of course,
this patch is a workaround to solve a design issue. But there is no other
obvious way to fix it without rewritting all the resolvers part. And it must
be backportable.

This patch relies on following commits:
 * MINOR: resolvers: Clean server in a dedicated function when removing a SRV item
 * MINOR: resolvers: Remove server from named_servers tree when removing a SRV item

All the series must be backported as far as 2.2 after some observation
period. Backports to 2.0 and 1.8 must be evaluated.
2021-06-17 16:52:35 +02:00
Christopher Faulet
c7b391aed2 BUG/MEDIUM: server/cli: Fix ABBA deadlock when fqdn is set from the CLI
To perform servers resolution, the resolver's lock is first acquired then
the server's lock when necessary. However, when the fqdn is set via the CLI,
the opposite is performed. So, it is possible to experience an ABBA
deadlock.

To fix this bug, the server's lock is acquired and released for each
subcommand of "set server" with an exception when the fqdn is set. The
resolver's lock is first acquired. Of course, this means we must be sure to
have a resolver to lock.

This patch must be backported as far as 1.8.
2021-06-17 16:52:14 +02:00
Christopher Faulet
a386e78823 BUG/MINOR: server: Forbid to set fqdn on the CLI if SRV resolution is enabled
If a server is configured to rely on a SRV resolution, we must forbid to
change its fqdn on the CLI. Indeed, in this case, the server retrieves its
fqdn from the SRV resolution. If the fqdn is changed via the CLI, this
conflicts with the SRV resolution and leaves the server in an undefined
state. Most of time, the SRV resolution remains enabled with no effect on
the server (no update). Some time the A/AAAA resolution for the new fqdn is
not enabled at all. It depends on the server state and resolver state when
the CLI command is executed.

This patch must be backported as far as 2.0 (maybe to 1.8 too ?) after some
observation period.
2021-06-17 16:17:14 +02:00
Miroslav Zagorac
8a8f270f6a CLEANUP: server: a separate function for initializing the per_thr field
To avoid repeating the same source code, allocating memory and initializing
the per_thr field from the server structure is transferred to a separate
function.
2021-06-17 16:07:10 +02:00
Amaury Denoyelle
8ff0434b61 BUG/MEDIUM: server: do not auto insert a dynamic server in px addr_node
Until then, the servers were automatically attached on their creation
into the proxy addr_node tree via _srv_parse_init. In case of an invalid
dynamic server which is instantly freed, no detach operation was made
leaving a NULL server in the tree.

Change this mode of operation by marking the attach operation as
optional in _srv_parse_init. This operation is not conduct for a dynamic
server. The server is attached only at the end of the CLI handler when
it is marked as valid.

This must be backported up to 2.4.
2021-06-15 11:42:53 +02:00
Amaury Denoyelle
1613b4a75d BUG/MINOR: server: do not keep an invalid dynamic server in px ids tree
A bug is present when trying to create a dynamic server with a fixed id.
If the server is detected invalid due to a later parsing arguments
error, the server is not removed from the proxy used ids tree before
being freed.

Change the mode of operation of 'id' keyword parsing handler. The
insertion in the backend tree is removed from the handler and is not
taken in charge by parse_server for configuration parsing. For the
dynamic servers, the insertion is called at the end of the 'add server'
CLI handler when the server has been validated.

This must be backported up to 2.4.
2021-06-15 11:42:53 +02:00
Amaury Denoyelle
406aaef55a BUG/MEDIUM: server: do not forget to generate the dynamic servers ids
If no id is specified by the user for a dynamic server, it is necessary
to generate a new one. This operation is now done at the end of 'add
server' CLI handler. The server is then inserted into the proxy ids
tree.

Without this, several features may be broken for dynamic servers. Among
them, there is the "first" lb algorithm, the persistence using
stick-tables or the uniqueness internal check of srv_parse_id.

This must be backported up to 2.4.
2021-06-15 11:42:53 +02:00
Amaury Denoyelle
82d7f77463 BUG/MEDIUM: server: clear dynamic srv on delete from proxy id/name trees
Do not leave deleted server in used_server_id/used_server_addr backend
trees. This might lead to crashes if a deleted server is used through
these trees.

At this moment, dynamic servers are only added in used_server_id if they
have a fixed id. They are never inserted in used_server_addr as this
code is missing. So these new delete instructions are noop. However, a
fix will be provided soon to insert properly all dynamic servers in both
used_server_id and used_server_addr trees so the deletion counterpart
will be mandatory in the CLI server delete handler.

This must be backported to 2.4.
2021-06-15 11:38:06 +02:00
Amaury Denoyelle
31ddd76fef BUG/MEDIUM: server: extend thread-isolate over much of CLI 'add server'
Some config parsing handlers were designed to be run at startup on a
single-thread. When executing at runtime for dynamic servers,
thread-safety is not guaranteed. This is the case for example in
srv_parse_id which manipulates backend used_ids tree.

One solution could be to add locks but it might be tricky to found all
affected functions and it can be an easy source of deadlock. The other
solution which has been chosen is to use thread-isolation over almost
all of the cli_parse_add_server CLI handler.

For now this solution is sufficient. If some users make heavy use of the
'add server', hurting the overall performance, it will be necessary to
design a much thinner solution.

This must be backported up to 2.4.
2021-06-15 11:19:43 +02:00
Emeric Brun
caef19e0c7 BUG/MAJOR: resolvers: segfault using server template without SRV RECORDs
This patch fix the issue adding a test in srvrq before registering
the server on it during server template init.

This was a regression due to commit :
3406766d57

This should be backported with this previous commit (until 2.0)
2021-06-14 11:04:02 +02:00
Emeric Brun
3406766d57 MEDIUM: resolvers: add a ref between servers and srv request or used SRV record
This patch add a ref into servers to register them onto the
record answer item used to set their hostnames.

It also adds a head list into 'srvrq' to register servers free
to be affected to a SRV record.

A head of a tree is also added to srvrq to put servers which
present a hotname in server state file. To re-link them fastly
to the matching record as soon an item present the same name.

This results in better performances on SRV record response
parsing.

This is an optimization but it could avoid to trigger the haproxy's
internal wathdog in some circumstances. And for this reason
it should be backported as far we can (2.0 ?)
2021-06-11 16:16:16 +02:00
Emeric Brun
bd78c912fd MEDIUM: resolvers: add a ref on server to the used A/AAAA answer item
This patch adds a head list into answer items on servers which use
this record to set their IPs. It makes lookup on duplicated ip faster and
allow to check immediatly if an item is still valid renewing the IP.

This results in better performances on A/AAAA resolutions.

This is an optimization but it could avoid to trigger the haproxy's
internal wathdog in some circumstances. And for this reason
it should be backported as far we can (2.0 ?)
2021-06-11 16:16:16 +02:00
Emeric Brun
12ca658dbe BUG/MINOR: resolvers: answser item list was randomly purged or errors
In case of SRV records, The answer item list was purged by the
error callback of the first requester which considers the error
could not be safely ignored. It makes this item list unavailable
for subsequent requesters even if they consider the error
could be ignored.

On A resolution or do_resolve action error, the answer items were
never trashed.

This patch re-work the error callbacks and the code to check the return code
If a callback return 1, we consider the error was ignored and
the answer item list must be kept. At the opposite, If all error callbacks
of all requesters of the same resolution returns 0 the list will be purged

This patch should be backported as far as 2.0.
2021-06-11 16:16:16 +02:00
Amaury Denoyelle
efbf35caf9 BUG/MINOR: server: explicitly set "none" init-addr for dynamic servers
Define srv.init_addr_methods to SRV_IADDR_NONE on 'add server' CLI
handler. This explicitly states that no resolution will be made on the
server creation.

This is not a real bug as the default value (SRV_IADDR_END) has the same
effect in practice. However the intent is clearer and prevent to use the
default "libc,last" by mistake which cannot execute on runtime (blocking
call + file access via gethostbyname/getaddrinfo).

The doc is also updated to reflect this limitation.

This should be backported up to 2.4.
2021-06-10 17:44:05 +02:00
Amaury Denoyelle
5e560e80c7 MINOR: server: use ha_alert in server parsing functions
Replace memprintf usage in _srv_parse* functions by ha_alert calls. This
has the advantage to simplify the function prototype by removing an
extra char** argument.

As a consequence, the CLI handler of 'add server' is updated to output
the user messages buffers if not empty.
2021-06-07 17:19:33 +02:00
Amaury Denoyelle
9d0138ab08 MINOR: server: use parsing ctx for server init addr
Initialize the parsing context in srv_init_addr. This function is called
after configuration check.

This will standardize the stderr output on startup with the parse_server
function.
2021-06-07 17:19:30 +02:00
Amaury Denoyelle
0fc136ce5b REORG: server: use parsing ctx for server parsing
Use the parsing context in parse_server. Remove redundant manual
format-string specifying the current file/line/server parsed.
2021-06-07 17:19:24 +02:00
Amaury Denoyelle
c008a63582 CLEANUP: server: fix cosmetic of error message on sni parsing
Fix memprintf used in server_parse_sni_expr. Error messages should not
be ending with a newline as it will be inserted in the parent function
on the ha_alert invocation.
2021-06-07 16:58:16 +02:00
Remi Tricot-Le Breton
f1800e64ef BUG/MINOR: server: Missing calloc return value check in srv_parse_source
Two calloc calls were not checked in the srv_parse_source function.
Considering that this function could be called at runtime through a
dynamic server creation via the CLI, this could lead to an unfortunate
crash.

It was raised in GitHub issue #1233.
It could be backported to all stable branches even though the runtime
crash could only happen on branches where dynamic server creation is
possible.
2021-05-31 10:50:32 +02:00
Amaury Denoyelle
79a88ba3d0 BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'
A deadlock is possible with 'set maxconn server' command, if there is
pending connection ready to be dequeued. This is caused by the locking
of server spinlock in both cli_parse_set_maxconn_server and
process_srv_queue.

Fix this by reducing the scope of the server lock into
server_parse_maxconn_change_request. If connection are dequeued, the
lock is taken a second time. This can be seen as suboptimal but as it
happens only during 'set maxconn server' it can be considered as
tolerable.

This issue was reported on the mailing list, for the 1.8.x branch.
It must be backported up to the 1.8.
2021-05-19 17:52:05 +02:00
Willy Tarreau
b00a8e30f1 BUILD: server: include missing proxy.h in server.c
It's needed for a number of functions and definitions but was missing.
2021-05-08 20:24:09 +02:00
Willy Tarreau
ba6300ea62 BUILD: server: include tools.h from server.c
A lot of functions from tools.h are used there but the file was only
inherited via other ones.
2021-05-08 19:37:41 +02:00
Amaury Denoyelle
24abb0cdc1 BUG/MINOR: server: do not report diag for peer servers with null weight
Only check servers attached to a proxy with PR_CAP_LB.

This does not need to be backported as the diag message was added in the
current 2.4-dev branch.
2021-05-07 15:20:54 +02:00
Willy Tarreau
b205bfdab7 CLEANUP: cli/tree-wide: properly re-align the CLI commands' help messages
There were 102 CLI commands whose help were zig-zagging all along the dump
making them unreadable. This patch realigns all these messages so that the
command now uses up to 40 characters before the delimiting colon. About a
third of the commands did not correctly list their arguments which were
added after the first version, so they were all updated. Some abuses of
the term "id" were fixed to use a more explanatory term. The
"set ssl ocsp-response" command was not listed because it lacked a help
message, this was fixed as well. The deprecated enable/disable commands
for agent/health/server were prominently written as deprecated. Whenever
possible, clearer explanations were provided.
2021-05-07 11:51:26 +02:00
Amaury Denoyelle
3109ccfe70 MINOR: srv: close all idle connections on shutdown
Implement a function to close all server idle connections. This function
is called via a global deinit server handler.

The main objective is to prevents from leaving sockets in TIME_WAIT
state. To limit the set of operations on shutdown and prevents
tasks rescheduling, only the ctrl stack closing is done.
2021-05-05 14:33:51 +02:00
Amaury Denoyelle
eafd701dc5 MINOR: server: fix doc/trace on lb algo for dynamic server creation
The text mentionned that only backends with consistent hash method were
supported for dynamic servers. In fact, it is only required that the lb
algorith is dynamic.
2021-04-29 14:59:42 +02:00
Amaury Denoyelle
d6b4b6da3f BUG/MINOR: server: fix potential null gcc error in delete server
gcc still reports a potential null pointer dereference in delete server
function event with a BUG_ON before it. Remove the misleading NULL check
in the for loop which should never happen.

This does not need to be backported.
2021-04-21 12:02:30 +02:00
Amaury Denoyelle
e558043e13 MINOR: server: implement delete server cli command
Implement a new CLI command 'del server'. It can be used to removed a
dynamically added server. Only servers in maintenance mode can be
removed, and without pending/active/idle connection on it.

Add a new reg-test for this feature. The scenario of the reg-test need
to first add a dynamic server. It is then deleted and a client is used
to ensure that the server is non joinable.

The management doc is updated with the new command 'del server'.
2021-04-21 11:00:31 +02:00
Amaury Denoyelle
d38e7fa233 MINOR: server: add log on dynamic server creation
Add a notice log to report the creation of a new server. The log is
printed at the end of the function.
2021-04-21 11:00:31 +02:00
Amaury Denoyelle
cece918625 BUG/MEDIUM: server: ensure thread-safety of server runtime creation
cli_parse_add_server can be executed in parallel by several CLI
instances and so must be thread-safe. The critical points of the
function are :
- server duplicate detection
- insertion of the server in the proxy list

The mode of operation has been reversed. The server is first
instantiated and parsed. The duplicate check has been moved at the end
just before the insertion in the proxy list, under the thread isolation.
Thus, the thread safety is guaranteed and server allocation is kept
outside of locks/thread isolation.
2021-04-21 11:00:30 +02:00
Amaury Denoyelle
fb247946a1 BUG/MINOR: server: free srv.lb_nodes in free_server
lb_nodes is allocated for servers using lb_chash (balance random or
hash-type consistent).

It can be backported up to 1.8.
2021-04-21 11:00:03 +02:00
Willy Tarreau
2b71810cb3 CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").

Let's use more explicit (but slightly longer) names now:

   LIST_ADD        ->       LIST_INSERT
   LIST_ADDQ       ->       LIST_APPEND
   LIST_ADDED      ->       LIST_INLIST
   LIST_DEL        ->       LIST_DELETE

The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.

The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.

The list doc was updated.
2021-04-21 09:20:17 +02:00
Willy Tarreau
dcb121fd9c BUG/MINOR: server: make srv_alloc_lb() allocate lb_nodes for consistent hash
The test in srv_alloc_lb() to allocate the lb_nodes[] array used in the
consistent hash was incorrect, it wouldn't do it for consistent hash and
could do it for regular random.

No backport is needed as this was added for dynamic servers in 2.4-dev by
commit f99f77a50 ("MEDIUM: server: implement 'add server' cli command").
2021-04-20 11:39:54 +02:00
Willy Tarreau
14015b8880 MINOR: server: move idle_conn_task to read_mostly
This pointer is used when adding connections to the idle list and is
never changed, let's move it to the read_mostly section.
2021-04-10 19:27:41 +02:00
Amaury Denoyelle
da0e7f61e0 MINOR: server: diag for 0 weight server
Output a diagnostic report if a server has been configured with a null
weight.
2021-04-01 18:03:37 +02:00
Ilya Shipitsin
ba13f16aa2 CLEANUP: assorted typo fixes in the code and comments
This is 21st iteration of typo fixes
2021-03-20 09:28:58 +01:00
Amaury Denoyelle
304672320e MINOR: server: support keyword proto in 'add server' cli
Allow to specify the mux proto for a dynamic server. It must be
compatible with the backend mode to be accepted. The reg-tests has been
extended for this error case.
2021-03-18 16:22:10 +01:00
Amaury Denoyelle
fc465a54fd MINOR: server: enable standard options for dynamic servers
Enable a subset of server options to be used as keywords on the CLI
command 'add server'. These options are safe and can be applied
flawlessly for a dynamic server.
2021-03-18 16:22:10 +01:00
Amaury Denoyelle
f99f77a500 MEDIUM: server: implement 'add server' cli command
Add a new cli command 'add server'. This command is used to create a new
server at runtime attached on an existing backend. The syntax is the
following one :

$ add server <be_name>/<sv_name> [<kws>...]

This command is only available through experimental mode for the moment.

Currently, no server keywords are supported. They will be activated
individually when deemed properly functional and safe.

Another limitation is put on the backend load-balancing algorithm. The
algorithm must use consistent hashing to guarantee a minimal
reallocation of existing connections on the new server insertion.
2021-03-18 15:52:07 +01:00
Amaury Denoyelle
76e10e78bb MINOR: server: prepare parsing for dynamic servers
Prepare the server parsing API to support dynamic servers.
- define a new parsing flag to be used for dynamic servers
- each keyword contains a new field dynamic_ok to indicate if it can be
  used for a dynamic server. For now, no keyword are supported.
- do not copy settings from the default server for a new dynamic server.
- a dynamic server is created in a maintenance mode and requires an
  explicit 'enable server' command.
- a new server flag named SRV_F_DYNAMIC is created. This flag is set for
  all servers created at runtime. It might be useful later, for example
  to know if a server can be purged.
2021-03-18 15:51:12 +01:00
Amaury Denoyelle
30c0537f5a REORG: server: use flags for parse_server
Modify the API of parse_server function. Use flags to describe the type
of the parsed server instead of discrete arguments. These flags can be
used to specify if a server/default-server/server-template is parsed.
Additional parameters are also specified (parsing of the address
required, resolve of a name must be done immediately).

It is now unneeded to use strcmp on args[0] in parse_server. Also, the
calls to parse_server are more explicit thanks to the flags.
2021-03-18 15:37:05 +01:00
Amaury Denoyelle
cf58dd79e3 REORG: server: attach servers in parse_server
Move server linked into proxy backend list outside of _srv_parse_init to
parse_server.

This is groundwork for dynamic servers support. There will be two
differences in case of a dynamic server :
- the server will be attached to the proxy list only at the very end of the
  operations when everything is ok
- the server will be directly attached to the end of the server proxy
  list
2021-03-18 15:37:05 +01:00
Amaury Denoyelle
7d27efef23 REORG: server: rename internal functions from parse_server
Use a standard convention for the functions used through parse_server.
Use the prefix _srv_parse and specify their private scope in a comment.
2021-03-18 15:37:05 +01:00
Amaury Denoyelle
9394a9444e REORG: server: move alert traces in parse_server
Move every ha_alert calls in parsing functions into parse_server.
Parsing functions now support a pointer-to-string argument which will be
allocated with an error message if needed via memprintf.

parse_server has then the responsibility to display errors with ha_alert.

This is groundwork for dynamic server. No traces should be printed on
stderr as a response to a cli command. cli_err will replace ha_alert in
this case.
2021-03-18 15:37:05 +01:00
Amaury Denoyelle
a8f442e078 REORG: server: split parse_server
The huge parse_server function is splitted into two smaller ones.
* _srv_parse_init allocates a new server instance and parses the address
  parameter
* _srv_parse_kw parse the current server keyword

This simplify a bit the parse_server function. Besides, it will be
useful for dynamic server creation.
2021-03-18 15:37:05 +01:00
Amaury Denoyelle
3b89c11d4d MINOR: server: remove fastinter from mistyped kw list
This keyword is already present in server kw list from checks.c.
2021-03-18 15:37:05 +01:00
Amaury Denoyelle
587b71e402 REORG: server: move keywords in srv_kws
Move server-keyword hardcoded in parse_server into the srv_kws list of
server.c. Now every server keywords is checked through srv_find_kw. This
has the effect to reduce the size of parse_server. As a side-effect,
common kw list can be reduced.

This change has been made to be able to quickly discard these keywords
in case of a dynamic server.
2021-03-18 15:37:05 +01:00
Amaury Denoyelle
828adf0121 REORG: server: add a free server function
Create a new server function named free_server. It can be used to
deallocate a server and its member.
2021-03-18 15:37:05 +01:00
Christopher Faulet
59b2925733 BUG/MINOR: resolvers: Add missing case-insensitive comparisons of DNS hostnames
DNS hostname comparisons were fixed to be case-insensitive (see b17b88487
"BUG/MEDIUM: dns: Consider the fact that dns answers are
case-insensitive"). However 2 comparisons are still case-sensitive.

This patch must be backported as far as 1.8.
2021-03-16 11:25:04 +01:00
Christopher Faulet
c392d461d6 CLEANUP: resolvers: Use ha_free() in srvrq_resolution_error_cb()
Two occurrences to "free(A);A=NULL;" may be replaced by a call to ha_free()
in the srvrq_resolution_error_cb() function.
2021-03-12 17:42:47 +01:00
Christopher Faulet
d83a6df5cd BUG/MEDIUM: resolvers: Skip DNS resolution at startup if SRV resolution is set
At startup, if a SRV resolution is set for a server, no DNS resolution is
created. We must wait the first SRV resolution to know if it must be
triggered. It is important to do so for two reasons.

First, during a "classical" startup, a server based on a SRV resolution has
no hostname. Thus the created DNS resolution is useless. Best waiting the
first SRV resolution. It is not really a bug at this stage, it is just
useless.

Second, in the same situation, if the server state is loaded from a file,
its hosname will be set a bit later. Thus, if there is no additionnal record
for this server, because there is already a DNS resolution, it inhibits any
new DNS resolution. But there is no hostname attached to the existing DNS
resolution. So no resolution is performed at all for this server.

To avoid any problem, it is fairly easier to handle this special case during
startup. But this means we must be prepared to have no "resolv_requester"
field for a server at runtime.

This patch must be backported as far as 2.2.
2021-03-12 17:41:28 +01:00
Christopher Faulet
0efc0993ec BUG/MEDIUM: resolvers: Don't release resolution from a requester callbacks
Another way to say it: "Safely unlink requester from a requester callbacks".

Requester callbacks must never try to unlink a requester from a resolution, for
the current requester or another one. First, these callback functions are called
in a loop on a request list, not necessarily safe. Thus unlink resolution at
this place, may be unsafe. And it is useless to try to make these loops safe
because, all this stuff is placed in a loop on a resolution list. Unlink a
requester may lead to release a resolution if it is the last requester.

However, the unkink is necessary because we cannot reset the server state
(hostname and IP) with some pending DNS resolution on it. So, to workaround
this issue, we introduce the "safe" unlink. It is only performed from a
requester callback. In this case, the unlink function never releases the
resolution, it only reset it if necessary. And when a resolution is found
with an empty requester list, it is released.

This patch depends on the following commits :

 * MINOR: resolvers: Purge answer items when a SRV resolution triggers an error
 * MINOR: resolvers: Use a function to remove answers attached to a resolution
 * MINOR: resolvers: Directly call srvrq_update_srv_state() when possible
 * MINOR: resolvers: Add function to change the srv status based on SRV resolution

All the series must be backported as far as 2.2. It fixes a regression
introduced by the commit b4badf720 ("BUG/MINOR: resolvers: new callback to
properly handle SRV record errors").

don't release resolution from requester cb
2021-03-12 17:41:28 +01:00
Christopher Faulet
6b117aed49 MINOR: resolvers: Directly call srvrq_update_srv_state() when possible
When the server status must be updated from the result of a SRV resolution,
we can directly call srvrq_update_srv_state(). It is simpler and this avoid
a test on the server DNS resolution.

This patch is mandatory for the next commit. It also rely on "MINOR:
resolvers: Directly call srvrq_update_srv_state() when possible".
2021-03-12 17:41:28 +01:00
Christopher Faulet
5efdef24c1 MINOR: resolvers: Add function to change the srv status based on SRV resolution
srvrq_update_srv_status() update the server status based on result of SRV
resolution. For now, it is only used from snr_update_srv_status() when
appropriate.
2021-03-12 17:41:28 +01:00
Christopher Faulet
51d5e3bda7 MINOR: resolvers: Purge answer items when a SRV resolution triggers an error
When a SRV request trigger an error, if we decide to handle the error
because last_valid duration is expired, the answer list may be purged. All
items are considered as obsolete.
2021-03-12 17:41:28 +01:00
Christopher Faulet
49531e8471 BUG/MINOR; resolvers: Ignore DNS resolution for expired SRV item
If no ADD item is found for a SRV item in a SRV response, a DNS resolution
is triggered. When it succeeds, we must be sure the SRV item is still
alive. Otherwise the DNS resolution must be ignored.

This patch depends on the commit "MINOR: resolvers: Move last_seen time of
an ADD into its corresponding SRV item". Both must be backported as far as
2.2.
2021-03-12 17:41:28 +01:00
Christopher Faulet
bca680ba90 BUG/MINOR: resolvers: Unlink DNS resolution to set RMAINT on SRV resolution
When a server is set in RMAINT becaues of a SRV resolution failure, the
server DNS resolution, if any, must be unlink first. It is mandatory to
handle the change in the context of a SRV resolution.

This patch must be backported as far as 2.2.
2021-03-12 16:43:37 +01:00
Christopher Faulet
5130c21fbb BUG/MINOR: resolvers: Reset server address on DNS error only on status change
When a DNS resolution error is detected, in snr_resolution_error_cb(), the
server address must be reset only if the server status has changed. It this
case, it means the server is set to RMAINT. Thus the server address may by
reset.

This patch fixes a bug introduced by commit d127ffa9f ("BUG/MEDIUM:
resolvers: Reset address for unresolved servers"). It must be backported as
far as 2.0.
2021-03-12 16:43:37 +01:00
Christopher Faulet
bd0227c109 BUG/MINOR: resolvers: Consider server to have no IP on DNS resolution error
When an error is received for a DNS resolution, for instance a NXDOMAIN
error, the server must be considered to have no address when its status is
updated, not the opposite.

Concretly, because this parameter is not used on error path in
snr_update_srv_status(), there is no impact.

This patch must be backported as far as 1.8.
2021-03-12 16:43:37 +01:00
Willy Tarreau
736adef511 BUG/MINOR: cfgparse/server: increment the extra keyword counter one at a time
This was introduced in previous commit 49c2b45c1 ("MINOR: cfgparse/server:
try to fix spelling mistakes on server lines"), the loop was changed but
the increment left. No backport is needed.
2021-03-12 14:47:10 +01:00
Willy Tarreau
49c2b45c1d MINOR: cfgparse/server: try to fix spelling mistakes on server lines
Let's apply the fuzzy match to server keywords so that we can avoid
dumping the huge list of supported keywords each time there is a spelling
mistake, and suggest proper spelling instead:

  $ printf "listen f\nserver s 0 sendpx-v2\n" | ./haproxy -c -f /dev/stdin
  [NOTICE] 070/095718 (24152) : haproxy version is 2.4-dev11-caa6e3-25
  [NOTICE] 070/095718 (24152) : path to executable is ./haproxy
  [ALERT] 070/095718 (24152) : parsing [/dev/stdin:2] : 'server s' unknown keyword 'sendpx-v2'; did you mean 'send-proxy-v2' maybe ?
  [ALERT] 070/095718 (24152) : Error(s) found in configuration file : /dev/stdin
  [ALERT] 070/095718 (24152) : Fatal errors found in configuration.
2021-03-12 14:13:21 +01:00
Willy Tarreau
018251667e CLEANUP: config: make the cfg_keyword parsers take a const for the defproxy
The default proxy was passed as a variable to all parsers instead of a
const, which is not without risk, especially when some timeout parsers used
to make some int pointers point to the default values for comparisons. We
want to be certain that none of these parsers will modify the defaults
sections by accident, so it's important to mark this proxy as const.

This patch touches all occurrences found (89).
2021-03-09 10:09:43 +01:00
Willy Tarreau
d4e78d873c MINOR: server: move actconns to the per-thread structure
The actconns list creates massive contention on low server counts because
it's in fact a list of streams using a server, all threads compete on the
list's head and it's still possible to see some watchdog panics on 48
threads under extreme contention with 47 threads trying to add and one
thread trying to delete.

Moving this list per thread is trivial because it's only used by
srv_shutdown_streams(), which simply required to iterate over the list.

The field was renamed to "streams" as it's really a list of streams
rather than a list of connections.
2021-03-05 15:00:24 +01:00