Commit Graph

749 Commits

Author SHA1 Message Date
Aurelien DARRAGON
85298189bf BUG/MEDIUM: server: server stuck in maintenance after FQDN change
Pierre Bonnat reported that SRV-based server-template recently stopped
to work properly.

After reviewing the changes, it was found that the regression was caused
by a4d04c6 ("BUG/MINOR: server: make sure the HMAINT state is part of MAINT")

Indeed, HMAINT is not a regular maintenance flag. It was implemented in
b418c122 a4d04c6 ("BUG/MINOR: server: make sure the HMAINT state is part
of MAINT"). This flag is only set (and never removed) when the server FQDN
is changed from its initial config-time value. This can happen with "set
server fqdn" command as well as SRV records updates from the DNS. This
flag should ideally belong to server flags.. but it was stored under
srv_admin enum because cur_admin is properly exported/imported via server
state-file while regular server's flags are not.

Due to a4d04c6, when a server FQDN changes, the server is considered in
maintenance, and since the HMAINT flag is never removed, the server is
stuck in maintenance.

To fix the issue, we partially revert a4d04c6. But this latter commit is
right on one point: HMAINT flag was way too confusing and mixed-up between
regular MAINT flags, thus there's nothing to blame about a4d04c6 as it was
error-prone anyway.. To prevent such kind of bugs from happening again,
let's rename HMAINT to something more explicit (SRV_ADMF_FQDN_CHANGED) and
make it stand out under srv_admin enum so we're not tempted to mix it with
regular maintenance flags anymore.

Since a4d04c6 was set to be backported in all versions, this patch must
be backported there as well.
2024-10-16 14:26:57 +02:00
Willy Tarreau
1d403caf8a MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()
When shutting down server sessions, the queue was not considered, which
is a problem if some element reached the queue at the moment the server
was going down, because there will be no more requests to kick them out
of it. Let's always make sure we scan the queue to kick these streams
out of it and that they can possibly find a more suitable server. This
may make a difference in the time it takes to shut down a server on the
CLI when lots of servers are in the queue.

It might be interesting to backport this to 3.0 but probably not much
further.
2024-09-27 19:01:38 +02:00
Willy Tarreau
d1c398b786 Revert "BUG/MINOR: server: shut down streams under thread isolation"
This reverts commit b500e84e24.

Thread isolation does not work well for this, there exists code paths
which already hold the server's lock and result in a deadlock. Let's
revert that and address it better without isolation.
2024-09-27 10:17:31 +02:00
Willy Tarreau
b500e84e24 BUG/MINOR: server: shut down streams under thread isolation
Since the beginning of thread support, the shutdown of streams attached
to a server was run under the server's lock, but that's not sufficient.
It indeed turns out that shutting down streams (either from the CLI using
"shutdown sessions server XXX" or due to "on-error shutdown-sessions")
iterates over all the streams to shut them down, but stream_shutdown()
has no way to protect its actions against concurrent actions from the
stream itself on another thread, and streams offer no such provisions
anyway.

The impact is some rare but possible crashes when shutting down streams
from the CLI in cmopetition with high server traffic. The probability
is low enough to mark it minor, though it was observed in the field.

At least since 2.4 the streams are arranged in per-thread lists, so it
likely would be possible using the event subsystem to delegate these
events to dedicated per-thread tasks which would address the problem.
But server streams don't get killed often enough to justify such extra
complexity, so better just run the loop under thread isolation.

It also shows that the internal API could probably be improved to
support a lighter thread exclusion instead of full isolation: various
places want to only exclude one thread and here it could work. But
again there's no point doing this for now.

This patch should be backported to all stable branches. It's important
to carefully check that this srv_shutdowns_streams() function is never
called itself under isolation in older versions (though at first glance
it looks OK).
2024-09-21 19:35:35 +02:00
Willy Tarreau
2b95c77c08 DOC: server: document what to check for when adding new server keywords
It's too easy to overlook the dynamic servers when adding new server
keywords, and the fields on each keyword line are totally obscure. This
commit adds a title to each column of the table and explains what is
expected and what to check for when adding a keyword.
2024-09-10 18:50:12 +02:00
Damien Claisse
ce6a621ae3 MINOR: server: allow init-state for dynamic servers
Commit 50322df introduced the init-state keyword, but it didn't enable
it for dynamic servers. However, this feature is perfectly desirable
for virtual servers too, where someone would like a server inlived
through "set server be1/srv1 state ready" to be put out of maintenance
in down state until the next health check succeeds.
At reading the code, it seems that it's only a matter of allowing this
keyword for dynamic servers, as current code path calls
srv_adm_set_ready() which incidentally triggers a call to
_srv_update_status_adm().
2024-09-10 18:18:38 +02:00
Aaron Kuehler
50322dff81 MEDIUM: server: add init-state
Allow the user to set the "initial state" of a server.

Context:

Servers are always set in an UP status by default. In
some cases, further checks are required to determine if the server is
ready to receive client traffic.

This introduces the "init-state {up|down}" configuration parameter to
the server.

- when set to 'fully-up', the server is considered immediately available
  and can turn to the DOWN sate when ALL health checks fail.
- when set to 'up' (the default), the server is considered immediately
  available and will initiate a health check that can turn it to the DOWN
  state immediately if it fails.
- when set to 'down', the server initially is considered unavailable and
  will initiate a health check that can turn it to the UP state immediately
  if it succeeds.
- when set to 'fully-down', the server is initially considered unavailable
  and can turn to the UP state when ALL health checks succeed.

The server's init-state is considered when the HAProxy instance
is (re)started, a new server is detected (for example via service
discovery / DNS resolution), a server exits maintenance, etc.

Link: https://github.com/haproxy/haproxy/issues/51
2024-09-05 11:13:10 +02:00
Aperence
38618822e1 MINOR: server: add a alt_proto field for server
Add a new field alt_proto to the server structures that
specify if an alternate protocol should be used for this server.

This field can be transparently passed to protocol_lookup to get
an appropriate protocol structure.

This change allows thus to create servers with different protocols,
and not only TCP anymore.
2024-08-30 18:53:49 +02:00
Aperence
a7b04e383a MINOR: tools: extend str2sa_range to add an alt parameter
Add a new parameter "alt" that will store wether this configuration
use an alternate protocol.

This alt pointer will contain a value that can be transparently
passed to protocol_lookup to obtain an appropriate protocol structure.

This change is needed to allow for example the servers to know if it
need to use an alternate protocol or not.
2024-08-30 18:53:49 +02:00
Aurelien DARRAGON
a6d1eb8f5d MINOR: server: ensure max_events_at_once > 0 in server_atomic_sync()
In 8f1fd96 ("BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once
event miss and leak"), we added a comment saying that
tune.events.max-events-at-once is assumed to be strictly positive.

It is so because the keyword parser forces values between 1 and 10000:
we don't want less than 1 because it wouldn't make any sense, and 10k
max because beyond that we could create contention in server_atomic_sync()

Now as the above commit implements a do..while it heavily relies on the
fact that the budget is at least 1. Upon soft-stop, we break away from
the loop without decrementing the budget. With all that in mind, it is
safe to assume that the 'remain' counter will only fall to 0 if the task
runs out of budget while doing work, in which case the task still exists
and must be rescheduled.

As seen in GH #2667 this assumption was ambiguous, so let's make it
official by adding a pair of BUG_ON() that make it explicit that it
works because remain 'cannot' be 0 unless the entire budget was
consumed.

No backport needed.
2024-08-07 18:31:35 +02:00
Aurelien DARRAGON
8f1fd96d17 BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak
An issue has been introduced with cd99440 ("BUG/MAJOR: server/addr: fix
a race during server addr:svc_port updates").

Indeed, in the above commit we implemented the atomic_sync task which is
responsible for consuming pending server events to apply the changes
atomically. For now only server's addr updates are concerned.

To prevent the task from causing contention, a budget was assigned to it.
It can be controlled with the global tunable
'tune.events.max-events-at-once': the task may not process more than this
number of events at once.

However, a bug was introduced with this budget logic: each time the task
has to be interrupted because it runs out of budget, we reschedule the
task to finish where it left off, but the current event which was already
removed from the queue wasn't processed yet. This means that this pending
event (each tune.events.max-events-at-once) is effectively lost.

When the atomic_sync task deals with large number of concurrent events,
this bug has 2 known consequences: first a server's addr/port update
will be lost every 'tune.events.max-events-at-once'. This can of course
cause reliability issues because if the event is not republished
periodically, the server could stay in a stale state for indefinite amount
of time. This is the case when the DNS server flaps for instance: some
servers may not come back UP after the incident as described in GH #2666.

Another issue is that the lost event was not cleaned up, resulting in a
small memory leak. So in the end, it means that the bug is likely to
cause more and more degradation over time until haproxy is restarted.

As a workaround, 'tune.events.max-events-at-once' may be set to the
maximum number of events expected per batch. Note however that this value
cannot exceed 10 000, otherwise it could cause the watchdog to trigger due
to the task being busy for too long and preventing other threads from
making any progress. Setting higher values may not be optimal for common
workloads so it should only be used to mitigate the bug while waiting for
this fix.

Since tune.events.max-events-at-once defaults to 100, this bug only
affects configs that involve more than 100 servers whose addr:port
properties are likely to be updated at the same time (batched updates
from cli, lua, dns..)

To fix the bug, we move the budget check after the current event is fully
handled. For that we went from a basic 'while' to 'do..while' loop as we
assume from the config that 'tune.events.max-events-at-once' cannot be 0.
While at it, we reschedule the task once thread isolation ends (it was not
required to perform the reschedule while under isolation) to give the hand
back faster to waiting threads.

This patch should be backported up to 2.9 with cd99440. It should fix
GH #2666.
2024-08-06 16:41:37 +02:00
Christopher Faulet
abaafda485 BUG/MINOR: server: Don't warn fallback IP is used during init-addr resolution
When a fallback IP address is provided in the list of methods to use to
resolve the server address, a warning is emitted if previous methods
failed. The aim is to inform this address will be used for the
server. However, it is valid use-case. It is the expected behavior. There is
no reason to emit a warning. Having a message during HAProxy startup to
inform the fallback IP address will be used is probably a good idea. But it
should be a notice not a warning. Otherwise, checking the configuration
validity will always failed, just like starting HAProxy in zero-warning
mode while the option was set on purpose.

This patch should fix the issue #2627. It must be backported to all stable
versions.
2024-07-18 16:39:38 +02:00
Aurelien DARRAGON
05f33e95ba MINOR: server: better mt_list usage for node migration (prev_deleted handling)
Now that mt_list v2 api was merged into haproxy's codebase in 4e65fc6 ("
MAJOR: import: update mt_list to support exponential back-off (try #2)"),
let's fix a hack in cli_parse_delete_server() which abused from mt_list
api to migrate an element from one list to another: there used to be a
tiny race there between the pop and the append operations, race that was
compensated by the fact that it was performed under full thread isolation.

However that was a bad example of the mt_list API which could have
resulted in actual bug if the code was duplicated elsewhere without thread
isolation. To fix this, we now make use of the
MT_LIST_FOR_EACH_ENTRY_LOCKED() macro which allows us to simply migrate
the current element to another list since the element is appended into
another one while still in busy state and then unlinked from the original
list.
2024-07-16 09:12:39 +02:00
Willy Tarreau
4e65fc66f6 MAJOR: import: update mt_list to support exponential back-off (try #2)
This is the second attempt at importing the updated mt_list code (commit
59459ea3). The previous one was attempted with commit c618ed5ff4 ("MAJOR:
import: update mt_list to support exponential back-off") but revealed
problems with QUIC connections and was reverted.

The problem that was faced was that elements deleted inside an iterator
were no longer reset, and that if they were to be recycled in this form,
they could appear as busy to the next user. This was trivially reproduced
with this:

  $ cat quic-repro.cfg
  global
          stats socket /tmp/sock1 level admin
          stats timeout 1h
          limited-quic

  frontend stats
          mode http
          bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
          timeout client 5s
          stats uri /

  $ ./haproxy -db -f quic-repro.cfg  &

  $ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/
  => hang

This was purely an API issue caused by the simplified usage of the macros
for the iterator. The original version had two backups (one full element
and one pointer) that the user had to take care of, while the new one only
uses one that is transparent for the user. But during removal, the element
still has to be unlocked if it's going to be reused.

All of this sparked discussions with Fred and Aurlien regarding the still
unclear state of locking. It was found that the lock API does too much at
once and is lacking granularity. The new version offers a much more fine-
grained control allowing to selectively lock/unlock an element, a link,
the rest of the list etc.

It was also found that plenty of places just want to free the current
element, or delete it to do anything with it, hence don't need to reset
its pointers (e.g. event_hdl). Finally it appeared obvious that the
root cause of the problem was the unclear usage of the list iterators
themselves because one does not necessarily expect the element to be
presented locked when not needed, which makes the unlock easy to overlook
during reviews.

The updated version of the list presents explicit lock status in the
macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED
suffix, the caller is expected to unlock the element if it intends to
reuse it. At least the status is advertised. The _UNLOCKED variant,
instead, always unlocks it before starting the loop block. This means
it's not necessary to think about unlocking it, though it's obviously
not usable with everything. A few _UNLOCKED were used at obvious places
(i.e. where the element is deleted and freed without any prior check).

Interestingly, the tests performed last year on QUIC forwarding, that
resulted in limited traffic for the original version and higher bit
rate for the new one couldn't be reproduced because since then the QUIC
stack has gaind in efficiency, and the 100 Gbps barrier is now reached
with or without the mt_list update. However the unit tests definitely
show a huge difference, particularly on EPYC platforms where the EBO
provides tremendous CPU savings.

Overall, the following changes are visible from the application code:

  - mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr
    => MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
       + 1 back elem

  - MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
      => just manually set iterator to NULL however.
    For MT_LIST_FOR_EACH_ENTRY_LOCKED()
      => mt_list_unlock_self() (if element going to be reused) + NULL

  - MT_LIST_LOCK_ELT => mt_list_lock_full()
  - MT_LIST_UNLOCK_ELT => mt_list_unlock_full()

  - l = MT_LIST_APPEND_LOCKED(h, e);  MT_LIST_UNLOCK_ELT();
    => l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)
2024-07-09 16:46:38 +02:00
Amaury Denoyelle
50ae717624 BUG/MEDIUM: server: fix race on server_atomic_sync()
The following patch fixes a race condition during server addr/port
update :
  cd994407a9
  BUG/MAJOR: server/addr: fix a race during server addr:svc_port updates

The new update mechanism is implemented via an event update. It uses
thread isolation to guarantee that no other thread is accessing server
addr/port. Furthermore, to ensure server instance is not deleted just
before the event handler, server instance is lookup via its ID in proxy
tree.

However, thread isolation is only entered after server lookup. This
leaves a tiny race condition as the thread will be marked as harmless
and a concurrent thread can delete the server in the meantime. This
causes server_atomic_sync() to manipulated a deleted server instance to
reinsert it in used_server_addr backend tree. This can cause a segfault
during this operation or possibly on a future used_server_addr tree
access.

This issue was detected by criteo. Several backtraces were retrieved,
each related to server addr_node insert or delete operation, either in
srv_set_addr_desc(), or add/delete dynamic server handlers.

To fix this, simply extend thread isolation section to start it before
server lookup. This ensures that once retrieved the server cannot be
deleted until its addr/port are updated. To ensure this issue won't
happen anymore, a new BUG_ON() is added in srv_set_addr_desc().

Also note that ebpt_delete() is now called every time on delete handler
as this is a safe idempotent operation.

To reproduce these crashes, a script was executed to add then remove
different servers every second. In parallel, the following CLI command
was issued repeatdly without any delay to force multiple update on
servers port :

  set server <srv> addr 0.0.0.0 port $((1024 + RANDOM % 1024))

This must be backported at least up to 3.0. If above mentionned patch
has been selected for previous version, this commit must also be
backported on them.
2024-07-03 09:20:24 +02:00
Aurelien DARRAGON
80aba1d284 BUG/MEDIUM: server/dns: prevent DOWN/UP flap upon resolution timeout or error
This is a complementary patch to c16eba818 ("BUG/MEDIUM: server/dns:
preserve server's port upon resolution timeout or error").

Indeed, since c16eba818, the port is properly preserved, but unsetting
server's address this way results in server_atomic_sync() function
thinking that we're actually setting a new address and not unsetting
the previous one because addr family is != AF_UNSPEC.

Upon DNS timeout, this could be observed:

[WARNING]  (2588257) : Server http/s1 is going DOWN for maintenance (DNS timeout status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (2588257) : Server http/s1 ('test1.localhost') is UP/READY (resolves again).

Notice that server timeouts and then immediately resolves again. Of course
in this case case the server's address was properly set to 0, meaning
that the server will not receive any traffic, but it is confusing and
could result in haproxy temporarily thinking that the server is actually
available while it's not.

To properly fix the issue and restore historical behavior, let's
explicitly set inetaddr's family to AF_UNSPEC after fetching original
server's address.

It should be backported in 3.0 with c16eba818.
2024-06-28 11:26:52 +02:00
Aurelien DARRAGON
eec8048042 BUG/MINOR: server: fix first server template name lookup UAF
This is a follow-up for 7223296 ("BUG/MINOR: server: fix first server
template not being indexed").

Indeed, in 7223296 we added a new call to _srv_parse_set_id_from_prefix()
for the first server before handling additional ones. But we actually
overlooked the fact that _srv_parse_set_id_from_prefix() was already
performed at the end of _srv_parse_tmpl_init() for the same server.

Since _srv_parse_set_id_from_prefix() frees srv->id, it results in UAF
when performing name lookups on the first server, because used_server_name
node key still uses the freed string pointer.

The early _srv_parse_set_id_from_prefix() call (added in 7223296) and
the original one perform the same task, except that the new one is
followed by name node insertion logic required for name lookups to work
properly. So let's simply get rid of the old one at the end of the
function.

_srv_parse_set_id_from_prefix() in the 'err:' label was also removed since
is is now useless as well starting with 7223296 and would trigger the same
bug on error paths. Thanks to Amaury for noticing it.

This bug was discovered while trying to address GH issue #2620.
Thanks to @x-yuri for his detailed report (with working repro).

It should be backported in 3.0 with 7223296.
2024-06-27 16:38:25 +02:00
Christopher Faulet
0d7c1bc6ab BUG/MINOR: server: Don't reset resolver options on a new default-server line
When a new "default-server" line is parsed, some resolver options are reset.
Thus previously defined default options cannot be inherited. There is no
reason to do so. First because other server options are inherited. And then
because not all resolver options are reset. It is not consistent.

This patch should fix issue #2559. It should be backported to all stable
versions.
2024-05-24 16:31:01 +02:00
Aurelien DARRAGON
c16eba8183 BUG/MEDIUM: server/dns: preserve server's port upon resolution timeout or error
@boi4 reported in GH #2578 that since 3.0-dev1 for servers with address
learned from A/AAAA records after a DNS flap server would be put out of
maintenance with proper address but with invalid port (== 0), making it
unusable and causing tcp checks to fail:

[NOTICE]   (1) : Loading success.
[WARNING]  (8) : Server mybackend/myserver1 is going DOWN for maintenance (DNS refused status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT]    (8) : backend 'mybackend' has no server available!
[WARNING]  (8) : mybackend/myserver1: IP changed from '(none)' to '127.0.0.1' by 'myresolver/ns1'.
[WARNING]  (8) : Server mybackend/myserver1 ('myhost') is UP/READY (resolves again).
[WARNING]  (8) : Server mybackend/myserver1 administratively READY thanks to valid DNS answer.
[WARNING]  (8) : Server mybackend/myserver1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

@boi4 also mentioned that this used to work fine before.

Willy suggested that this regression may have been introduced by 64c9c8e
("BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS")

Turns out he was right! Indeed, in 64c9c8e we systematically memset the
whole server_inetaddr struct (which contains both the requested server's
addr and port planned for atomic update) instead of only memsetting the
addr part of the structure: except when SRV records are involved (SRV
records provide both the address and the port unlike A or AAAA records),
we must not reset the server's port upon DNS errors because the port may
have been provided at config time and we don't want to lose its value.

Big thanks to @boi4 for his well-documented issue that really helped us to
pinpoint the bug right on time for the dev-13 release.

No backport needed (unless 64c9c8e gets backported).
2024-05-24 15:29:48 +02:00
Amaury Denoyelle
be4f89f2b2 MINOR: server: define pool-conn-name keyword
Define a new server keyword pool-conn-name. The purpose of this keyword
will be to identify connections inside the idle connections pool,
replacing SNI in case SSL is not wanted.

This keyword uses a sample expression argument. It thus can reuse
existing function parse_srv_expr() for parsing. In the future, it may be
necessary to define a keyword variant which uses a logformat for
extensability.

This patch only implement parsing. Argument is stored inside new server
field <pool_conn_name> and expression is generated in
_srv_parse_finalize() into <pool_conn_name_expr>.

If pool-conn-name is not set but SNI is, the latter is reused
automatically as pool-conn-name via _srv_parse_finalize(). This ensures
current reuse behavior remains compatible and idle connection reuse will
not mix connections with different SNIs by mistake.

Main usage will be for rhttp when SSL is not wanted between the two
haproxy instances. Previously, it was possible to use "sni" keyword even
without SSL on a server line which have a similar effect. However,
having a dedicated "pool-conn-name" keyword is deemed clearer. Besides,
it would allow for more complex configuration where pool-conn-name and
SNI are use in parallel with different values.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
91001422b4 MINOR: server: generalize sni expr parsing
Two functions exists for server sni sample expression parsing. This is
confusing so this commit aims at clarifying this.

Functions are renamed with the following identifiers. First function is
named parse_srv_expr() and can be used during parsing. Besides
expression parsing, it has ensure sample fetch validity in the context
of a server line.

Second function is renamed _parse_srv_expr() and is used internally by
parse_srv_expr(). It only implements sample parsing without extra
checks. It is already use for server instantiation derived from
server-template as checks were already performed. Also, it is now used
in http-client code as SNI is a fixed string.

Finally, both functions are generalized to remove any reference to SNI.
This will allow to reuse it to parse other server keywords which use an
expression. This will be the case for the future keyword pool-conn-name.
2024-05-24 14:36:31 +02:00
Amaury Denoyelle
3efd9f3925 BUG/MINOR: server: free PROXY v2 TLVs on srv drop
Dynamically allocated servers PROXY TLVs were not freed on server
release. This patch fixes this leak by extending srv_free_params().
Every server line with set-proxy-v2-tlv-fmt keyword is impacted.

For static servers, issue is minimal as it will only cause leak on
deinit(). However, this could be aggravated when performing multiple
removal of dynamic servers.

This should be backported up to 2.9.
2024-05-22 10:01:57 +02:00
Amaury Denoyelle
412f1eeb89 BUG/MEDIUM: server: clear purgeable conns before server deletion
Since the following commit, idle connections are cleared before a server
is deleted. This is better than blocking server deletion due to inactive
connections :

  6e0afb2e27
  MEDIUM: server: close idle conn on server deletion

A BUG_ON() has been added to ensure that server idle conn counter is nul
after these connections are removed. However, Willy managed to trigger
it easily by repeatedly and randomly delete servers accross a
single-thread haproxy using a server-template with 1000 instances. In
parallel, a h1load client is executed to generate traffic.

This BUG_ON() reflected that it some connections referencing the server
targetted for deletion remained, even though idle server list is empty.
In fact, this is caused by connections scheduled for purging. These
connections are moved from idle server list to a global toremove_list
while still being accounted by the server.

A first approach could be to decrement server idle counter while moving
connection to the purge list. However, this is functionnaly incorrect as
these purgeable connections still reference the server and it could
cause a crash if cleared after it.

The correct fix for this issue is simply to remove every purgeable
connections before a server is deleted. This is implemented by this
patch by extending cli_parse_delete_server(). It could be enough to only
remove connections targetted the deleted server, but as these
connections will be purged anyway it is justified to clear the whole
list.

This must not be backported, unless the above mentionned patch is.
2024-05-15 15:01:55 +02:00
Amaury Denoyelle
10ab56831e MINOR: stats: convert age as generic column for proxy stat
Convert FN_AGE in stat_cols_px[] as generic columns. These values will
be automatically used for dump/preload of a stats-file.

Remove srv_lastsession() / be_lastsession() function which are now
useless as last_sess is calculated via me_generate_field().
2024-05-02 10:55:25 +02:00
Amaury Denoyelle
634cc2a5d8 MINOR: counters: move last_change into counters struct
last_change was a member present in both proxy and server struct. It is
used as an age statistics to report the last update of the object.

Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.

Note that there is a change for proxy structure with now 2 different
last_change values, on frontend and backend side. Special care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only backend counter is reported in stats. However, with now two
distinct values, stats could be extended to report it on both side.
2024-05-02 10:55:25 +02:00
Valentine Krasnobaeva
5cbb278fae MINOR: capabilities: add cap_sys_admin support
If 'namespace' keyword is used in the backend server settings or/and in the
bind string, it means that haproxy process will call setns() to change its
default namespace to the configured one and then, it will create a
socket in this new namespace. setns() syscall requires CAP_SYS_ADMIN
capability in the process Effective set (see man 2 setns). Otherwise, the
process must be run as root.

To avoid to run haproxy as root, let's add cap_sys_admin capability in the
same way as we already added the support for some other network capabilities.

As CAP_SYS_ADMIN belongs to CAP_SYS_* capabilities type, let's add a separate
flag LSTCHK_SYSADM for it. This flag is set, if the 'namespace' keyword was
found during configuration parsing. The flag may be unset only in
prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which
inspect process EUID/RUID and Effective and Permitted capabilities sets.

If system doesn't support Linux capabilities or 'cap_sys_admin' was not set
in 'setcap', but 'namespace' keyword is presented in the configuration, we
keep the previous strict behaviour. Process, that has changed uid to the
non-priviledged user, will terminate with alert. This alert invites the user
to recheck its configuration.

In the case, when haproxy will start and run under a non-root user and
'cap_sys_admin' is not set, but 'namespace' keyword is presented, this patch
does not change previous behaviour as well. We'll still let the user to try
its configuration, but we inform via warning, that unexpected things, like
socket creation errors, may occur.
2024-04-30 21:40:17 +02:00
Damien Claisse
0797e05d9f BUG/MINOR: server: fix slowstart behavior
We observed that a dynamic server which health check is down for longer
than slowstart delay at startup doesn't trigger the warmup phase, it
receives full traffic immediately. This has been confirmed by checking
haproxy UI, weight is immediately the full one (e.g. 75/75), without any
throttle applied. Further tests showed that it was similar if it was in
maintenance, and even when entering a down or maintenance state after
being up.
Another issue is that if the server is down for less time than
slowstart, when it comes back up, it briefly has a much higher weight
than expected for a slowstart.

An easy way to reproduce is to do the following:
- Add a server with e.g. a 20s slowstart and a weight of 10 in config
  file
- Put it in maintenance using CLI (set server be1/srv1 state maint)
- Wait more than 20s, enable it again (set server be1/srv1 state ready)
- Observe UI, weight will show 10/10 immediately.
If server was down for less than 20s, you'd briefly see a weight and
throttle value that is inconsistent, e.g. 50% throttle value and a
weight of 5 if server comes back up after 10s before going back to
6% after a second or two.

Code analysis shows that the logic in server_recalc_eweight stops the
warmup task by setting server's next state to SRV_ST_RUNNING if it
didn't change state for longer than the slowstart duration, regardless
of its current state. As a consequence, a server being down or disabled
for longer than the slowstart duration will never enter the warmup phase
when it will be up again.

Regarding the weight when server comes back up, issue is that even if
the server is down, we still compute its next weight as if it was up,
hence when it comes back up, it can briefly have a much higher weight
than expected during slowstart, until the warmup task is called again
after last_change is updated.

This patch aims to fix both issues.
2024-04-11 19:24:01 +02:00
Amaury Denoyelle
8259456981 MINOR: server: implement GUID support
This commit is similar to previous one, except that it implements GUID
support for server instances. A guid_node field is inserted into server
structure. A new "guid" server keyword is defined.
2024-04-05 15:40:42 +02:00
Aurelien DARRAGON
6810c41f8e MEDIUM: tree-wide: add logformat expressions wrapper
log format expressions are broadly used within the code: once they are
parsed from input string, they are converted to a linked list of
logformat nodes.

We're starting to face some limitations because we're simply storing the
converted expression as a generic logformat_node list.

The first issue we're facing is that storing logformat expressions that
way doesn't allow us to add metadata alongside the list, which is part
of the prerequites for implementing log-profiles.

Another issue with storing logformat expressions as generic lists of
logformat_node elements is that it's starting to become really hard to
tell when we rely on logformat expressions or not in the code given that
there isn't always a comment near the list declaration or manipulation
to indicate that it's relying on logformat expressions under the hood,
so this adds some complexity for code maintenance.

This patch looks quite impressive due to changes in a lot of header and
source files (since logformat expressions are broadly used), but it does
a simple thing: it defines the lf_expr structure which itself holds a
generic list of logformat nodes, and then declares some helpers to
manipulate lf_expr elements and fixes the code so that we now exclusively
manipulate logformat_node lists as lf_expr elements outside of log.c.

For now, lf_expr struct only contains the list of logformat nodes (no
additional metadata), but now that we have dedicated type and helpers,
doing so in the future won't be problematic at all and won't require
extensive code changes.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
837a26ab05 BUG/MEDIUM: server/lbprm: fix crash in _srv_set_inetaddr_port()
Since faa8c3e ("MEDIUM: lb-chash: Deterministic node hashes based on
server address") the following configuration will cause haproxy to crash:

  backend test1
       mode http
       balance hash int(1)
       server s1 haproxy.org:80

This is because lbprm.update_server_eweight() method is now systematically
called in _srv_set_inetaddr_port() upon srv addr/port change (and with the
above config it happens during startup after initial dns resolution).

However, depending on the chosen lbprm algo, update_server_eweight function
may not be set (it is not a mandatory method, some lb implementations don't
define it).

Thus, using 'balance hash' with map-based hashing or 'balance sticky' will
cause a crash due to a NULL de-reference in _srv_set_inetaddr_port(). To
fix the issue, we first check that the update_server_eweight() method is
set before using it.

No backport needed unless faa8c3e ("MEDIUM: lb-chash: Deterministic node
hashes based on server address") gets backported.
2024-04-03 11:58:03 +02:00
Anthony Deschamps
faa8c3e024 MEDIUM: lb-chash: Deterministic node hashes based on server address
Motivation: When services are discovered through DNS resolution, the order in
which DNS records get resolved and assigned to servers is arbitrary. Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they have
assigned different IPs and ports to the server in that slot.

This patch adds a server option, "hash-key <key>" which can be set to "id" (the
existing behaviour, default), "addr", or "addr-port". By deriving the keys for
the chash tree nodes from a server's address and port we ensure that independent
HAProxy instances will agree on routing decisions. If an address is not known
then the key is derived from the server's puid as it was previously.

When adjusting a server's weight, we now check whether the server's hash has
changed. If it has, we have to remove all its nodes first, since the node keys
will also have to change.
2024-04-02 07:00:10 +02:00
Amaury Denoyelle
6333e6ec8e MINOR: server: allow cookie for dynamic servers
This commit allows "cookie" keyword for dynamic servers. After code
review, nothing was found which could prevent a dynamic server to use
it. An extra warning is added under cli_parse_add_server() if cookie
value is ignored due to a non HTTP backend.

This patch is not considered a bugfix. However, it may backported if
needed as its impact seems minimal.
2024-03-28 11:54:21 +01:00
Damien Claisse
9a0e0d3a19 BUG/MINOR: server: fix persistence cookie for dynamic servers
When adding a server dynamically, we observe that when a backend has a
dynamic persistence cookie, the new server has no cookie as we receive
the following HTTP header:
set-cookie: test-cookie=; Expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/
Whereas we were expecting to receive something like the following, which
is what we receive for a server added in the config file:
set-cookie: test-cookie=abcdef1234567890; path=/

After investigating code path, srv_set_dyncookie() is never called when
adding a server through CLI, it is only called when parsing config file
or using "set server bkd1/srv1 addr".

To fix this, call srv_set_dyncookie() inside cli_parse_add_server().

This patch must be backported up to 2.4.
2024-03-28 11:54:21 +01:00
Amaury Denoyelle
250c19032f BUG/MINOR: server: reject enabled for dynamic server
Since their first implementation, dynamic servers are created into
maintenance state. This has been done purposely to avoid immediate
activation of a newly inserted server.

However, this principle is incompatible if "enabled" keyword is used on
"add server". The newly created instance will be unreacheable as proxy
load-balancing algorithm is not informed of its presence via
srv_lb_propagate(). The new server could be unblocked by toggling its
state with "disable server" / "enable server" commands, which will
trigger srv_lb_propagate() invocation.

To avoid this unexpected state, simply forbid "enabled" keyword for
dynamic servers. In the long-term, it could be possible to re authorize
it but at least this requires to call srv_lb_propagate() on dynamic
server creation.

This should fix github issue #2497.

This patch should not be backported as-is, to avoid breaking dynamic
servers API on stable versions. "enabled" should instead be ignored for
them. This will be implemented in a dedicated patch on top of 2.9.
2024-03-28 11:51:05 +01:00
Aurelien DARRAGON
bd98db5078 BUG/MINOR: server: 'source' interface ignored from 'default-server' directive
Sebastien Gross reported that 'interface' keyword ('source' subargument)
is silently ignored when used from 'default-server' directive despite the
documentation implicitly stating that the keyword should be supported
there.

When support for 'source' keyword was added to 'default-server' directive
in dba97077 ("MINOR: server: Make 'default-server' support 'source'
keyword."), we properly duplicated the conn iface_name from the default-
server but we forgot to copy the conn iface_len which must be set as well
since it is used as setsockopt()'s 'optlen' argument in
tcp_connect_server().

It should be backported to all stable versions.
2024-03-26 11:09:02 +01:00
Aurelien DARRAGON
3de1acfb23 BUILD: server: fix build regression on old compilers (<= gcc-4.4)
Willy reported that since 3ac79b504 ("MEDIUM: server:
make server_set_inetaddr() updater serializable"), haproxy fails to
compile on some older compilers such as gcc-4.4 with this kind of error:

  src/server.c: In function 'snr_resolution_cb':
  src/server.c:4471: error: unknown field 'dns_resolver' specified in initializer
  compilation terminated due to -Wfatal-errors.
  make: *** [Makefile:1006: src/server.o] Error 1

This is due to referencing a member inside anonymous union from a compound
literal assignment. Apparently such use of anonymous union wasn't properly
supported back then on older compilers. To fix the issue, we give "u" name
to the parent union use this name to explicitly refer to the union where
relevant in the code (only a few changes fortunately).

The fix itself was verified to restore build compatibility with gcc 4.4
(and even 4.2).

As 3ac79b504 is used as a prerequisite for 64c9c8ef3 ("BUG/MINOR:
server/dns: use server_set_inetaddr() to unset srv addr from DNS"), please
consider backporting this patch too if 64c9c8ef3 happens to be backported
in 2.9.
2024-03-25 16:23:37 +01:00
Amaury Denoyelle
0d4273f04b MEDIUM: server: close private idle connection before server deletion
This commit similar to the following one :
  65ae241dcfe710e1cdd3ec4e7a9bde38d2e4c116
  MEDIUM: server: close idle conn before server deletion

This patch implements a similar logic, this time to close private idle
connections stored in sessions. The principle is identical to the above
commit : conn_release() is used on idle connections after a takeover to
ensure thread safety.

An extra change was required to be able to execute takeover on such
connections. Their original thread ID was unknown, contrary to non
private connections which are stored in sharded lists. As such, a new
tid member has been added under sess_priv_conns chaining element.
2024-03-22 17:12:27 +01:00
Amaury Denoyelle
6e0afb2e27 MEDIUM: server: close idle conn on server deletion
To be able to delete a server, a number of preconditions must be
validated to ensure it is not in used anymore. Previously, if idle
connections were stored in the server, the deletion was cancelled. No
action was implemented to force idle connection closure, the only
solution was to wait for the periodic purging to be achieved.

This is an extra burden to be able to delete a server. Indeed, idle
connections are by definition inactive and can be closed prior to delete
a server. This is the exact purpose of this patch.

Idle connections removal is implemented inside "delete server" handler,
once it has been determined that the server can be freely removed. A
simple loop is run to call conn_release() over each idle connections.
Takeover is also executed before conn_release() to ensure tasks/tasklets
or any other sensible elements are not deleted from a foreign thread.

This patch should reduce the occurence of rejected "delete server"
execution, especially when connection reuse is high.
2024-03-22 16:59:02 +01:00
Amaury Denoyelle
7dae3ceaa0 BUG/MAJOR: server: do not delete srv referenced by session
A server can only be deleted if there is no elements which reference it.
This is taken care via srv_check_for_deletion(), most notably for active
and idle connections.

A special case occurs for connections directly managed by a session.
This is for so-called private connections, when using http-reuse never
or H2 + http-reuse safe for example. In this case. server does not
account these connections into its idle lists. This caused a bug as the
server is deleted despite the session still being able to access it.

To properly fix this, add a new referencing element into the server for
these session connections. A mt_list has been chosen for this. On
default http-reuse, private connections are typically not used so it
won't make any difference. If using H2 servers, or more generally when
dealing with private connections, insert/delete should typically occur
only once per session lifetime so impact on performance should be
minimal.

This should be backported up to 2.4. Note that srv_check_for_deletion()
was introduced in 3.0 dev tree. On backport, the extra condition in it
should be placed in cli_parse_delete_server() instead.
2024-03-14 15:21:07 +01:00
Willy Tarreau
7223296092 BUG/MINOR: server: fix first server template not being indexed
3.0-dev1 introduced a small regression with commit b4db3be86e ("BUG/MINOR:
server: fix server_find_by_name() usage during parsing"). By changing the
way servers are indexed and moving it into the server template loop, the
first one is no longer indexed because the loop starts at low+1 since it
focuses on duplication. Let's index the first one explicitly now.

This should not be backported, unless the commit above is backported.
2024-03-12 08:23:03 +01:00
matthias sweertvaegher
062ea3a3d4 BUILD: solaris: fix compilation errors
Compilation on solaris fails because of usage of names reserved on that
platform, i.e. 'queue' and 's_addr'.

This patch redefines 'queue' as '_queue' and renames 's_addr' to
'srv_addr' which fixes compilation for now.

Future plan: rename 'queue' in code base so define can be removed again.

Backporting: 2.9, 2.8
2024-03-09 11:24:54 +01:00
Amaury Denoyelle
8a31783b64 BUG/MEDIUM: server: fix dynamic servers initial settings
Contrary to static servers, dynamic servers does not initialize their
settings from a default server instance. As such, _srv_parse_init() was
responsible to set a set of minimal values to have a correct behavior.

However, some settings were not properly initialized. This caused
dynamic servers to not behave as static ones without explicit
parameters.

Currently, the main issue detected is connection reuse which was
completely impossible. This is due to incorrect pool_purge_delay and
max_reuse settings incompatible with srv_add_to_idle_list().

To fix the connection reuse, but also more generally to ensure dynamic
servers are aligned with other server instances, define a new function
srv_settings_init(). This is used to set initial values for both default
servers and dynamic servers. For static servers, srv_settings_cpy() is
kept instead, using their default server as reference.

This patch could have unexpected effects on dynamic servers behavior as
it restored proper initial settings. Previously, they were set to 0 via
calloc() invocation from new_server().

This should be backported up to 2.6, after a brief period of
observation.
2024-02-27 17:02:20 +01:00
Amaury Denoyelle
1b8c5abeeb BUG/MAJOR: server: fix stream crash due to deleted server
Before a dynamic server can be deleted, a set of preconditions must be
validated to ensure it is not referenced naymore by a stream or a
connection. This is implemented in srv_check_for_deletion().

The various criteria specified were incomplete. This allows a server
instance to be deleted while still be referenced by a stream and a
connection.

This bug was reproduced by using ASAN compilation. A script was used to
add and delete a server every second, while using h2load to generate
traffic with download of 1k objects. Here is the ASAN error.

==140916==ERROR: AddressSanitizer: heap-use-after-free on address 0x520000020080 at pc 0x63cb25679537 bp 0x701529ff5070 sp 0x701529ff5060
READ of size 1 at 0x520000020080 thread T7
    #0 0x63cb25679536 in objt_server include/haproxy/obj_type.h:99
    #1 0x63cb2568f465 in process_stream src/stream.c:1823
    #2 0x63cb25a4a4a2 in run_tasks_from_lists src/task.c:632
    #3 0x63cb25a4bf62 in process_runnable_tasks src/task.c:876
    #4 0x63cb2596a220 in run_poll_loop src/haproxy.c:3050
    #5 0x63cb2596b192 in run_thread_poll_loop src/haproxy.c:3252
    #6 0x701539aa9559  (/usr/lib/libc.so.6+0x8b559) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
    #7 0x701539b26a3b  (/usr/lib/libc.so.6+0x108a3b) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)

To fix this, add <curr_used_conns> to the counters checked in
srv_check_for_deletion().

Outside of this bug, one case which remains sensible is for SF_DIRECT
streams which referenced a server instance early in process_stream()
before connect_server(). This occurs with use-server directive,
force-persist rule or cookie persistence. However, after code
reexamination, the code is considered reliable as process_stream() is
not rescheduled before connect_server() invocation. These observations
have been saved in sess_change_server() documentation to ensure it
remains valid in the future.

This must be backported up to 2.6.
2024-02-22 18:36:54 +01:00
Willy Tarreau
9b680d7411 MINOR: server: split the server deletion code in two parts
We'll need to be able to verify whether or not a server may be deleted.
For now, both the verification and the action are performed in the same
function, at once under thread isolation. The goal here is to extract
the verification code into a new function that will perform these checks,
return a status between success/recoverable/non-recoverable failure, and
will also return a message for the caller.
2024-02-09 20:38:08 +01:00
Willy Tarreau
eaeb67bdb4 BUG/MINOR: server/cli: add missing LF at the end of certain notice/error lines
Some cli_err(), cli_msg() or even ha_error() etc are missing the trailing
LF, which breaks the continuity of the CLI parsing: the extra LF that serves
to mark the end of the command is in fact taken as the missing LF and no
extra one is added.

This patch adds the missing LF on identified messages. It might be worth
trying to proceed in a more generic way with this, given the amount of
code that is possibly at risk.
2024-02-08 18:21:52 +01:00
Frédéric Lécaille
860028db47 CLEANUP: quic: Remaining useless code into server part
Remove some QUIC definitions of members from server structure as the haproxy QUIC
stack does not support at all the server part (QUIC client) as this time.
Remove the statements in relation with their initializations.

This patch should be backported as far as 2.6 to save memory.
2024-01-04 11:16:06 +01:00
Amaury Denoyelle
b4db3be86e BUG/MINOR: server: fix server_find_by_name() usage during parsing
Since below commit, server_find_by_name() now search using
'used_server_id' proxy backend tree :
  4bcfe30414
  OPTIM: server: eb lookup for server_find_by_name()

This introduces a regression if server_find_by_name() is used via
check_config_validity() during post-parsing. Indeed, used_server_id tree
is populated at the same stage so it's possible to not found an existing
server. This can cause incorrect rejection of previously valid
configuration file.

To fix this, servers are now inserted in used_server_id tree during
parsing via parse_server(). This guarantees that server instances can be
retrieved during post parsing.

A known feature which uses server_find_by_name() during post parsing is
attach-srv tcp-rule used for reverse HTTP. Prior to the current fix, a
config was wrongly rejected if the rule was declared before the server
line.

This should not be backported unless the mentionned commit is.
2024-01-02 15:52:47 +01:00
Aurelien DARRAGON
bdecff511c MEDIUM: server: simplify snr_set_srv_down() to prevent confusions
snr_set_srv_down() (was formely known as snr_update_srv_status()), is
still too ambiguous because it's not clear whether we will be putting
the server under maintenance or not. This is mainly due to the fact that
the function behaves differently if has_no_ip is set or not.

By reviewing the function callers, it has now become clear that
snr_resolution_cb() is always calling the function with a valid resolution
so we only want to put the server under maintenance if we don't have a
valid IP address. On the other hand snr_resolution_error_cb() always
calls the function on error, with either no resolution (for SRV requests)
or with failing resolution (all cases except RSLV_STATUS_VALID), so in
this case we decide whether to put the server under maintenance case by
case (ie: expired? timeout?)

As a result, let's simplify snr_set_srv_down() so that it is only called
when the caller really thinks that the server should be put under
maintenance, which means always for snr_resolution_error_cb(), and only
if the resolution didn't yield usable ip for snr_resolution_cb().
2024-01-02 10:29:50 +01:00
Aurelien DARRAGON
689784ed91 CLEANUP: resolvers: remove some more unused RSLV_UDP flags
RSLV_UPD_CNAME and RSLV_UPD_NAME_ERROR flags have now become useless since
3cf7f987 ("MINOR: dns: proper domain name validation when receiving DNS
response") as they are never set, but we forgot to remove them.
2024-01-02 10:29:41 +01:00
Aurelien DARRAGON
3ebe7bef8d CLEANUP: server: remove ambiguous check in srv_update_addr_port()
A leftover check was left by recent patch series about server
addr:svc_port propagation: a check on (msg) being set was performed
in srv_update_addr_port(), but msg is always set, so the check is not
needed and confuses coverity (See GH #2399)
2024-01-02 10:29:24 +01:00