195 Commits

Author SHA1 Message Date
Aurelien DARRAGON
5c299dee5a MEDIUM: stats: consider that shared stats pointers may be NULL
This patch looks huge, but it has a very simple goal: protect all
accesses to shared stats pointers (either reads or writes), because
we now consider that these pointers may be NULL.

The reason behind this is that despite all precautions taken to ensure the
pointers shouldn't be NULL when not expected, there are still corner
cases (ie: frontend stats used on a backend which has no FE cap and vice
versa) where we could try to access a memory area which is not
allocated. Willy stumbled on such cases while playing with the rings
servers upon connection error, which eventually led to process crashes
(since 3.3 when shared stats were implemented).

Also, we may decide later that shared stats are optional and should
be disabled on the proxy to save memory and CPU, and this patch is
a step further towards that goal.

So in essence, this patch ensures shared stats pointers are always
initialized (including NULL), and adds necessary guards before shared
stats pointers are de-referenced. Since we already had some checks
for backends and listeners stats, and the pointer address retrieval
should stay in cpu cache, let's hope that this patch doesn't impact
stats performance much.
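
As an illustration only, the guard pattern described above could look like
the sketch below. The structure and function names are hypothetical
stand-ins, not the real HAProxy types, and the increment is left non-atomic
for brevity:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one thread group's shared counters. */
struct shared_tg_counters {
	unsigned long long cum_conn; /* cumulative connections */
};

/* Hypothetical parent structure: the shared area may be absent. */
struct counters {
	struct shared_tg_counters **tg; /* may be NULL, entries may be NULL too */
	size_t nb_tg;                   /* number of entries in tg[] */
};

/* Increment the connection counter of thread group <tgid> (1-based), but
 * only if the shared area really exists. Returns 1 if the counter was
 * updated, 0 if the update was skipped because a pointer was missing.
 */
static int counters_inc_conn(struct counters *c, size_t tgid)
{
	if (!c || !c->tg || tgid == 0 || tgid > c->nb_tg || !c->tg[tgid - 1])
		return 0;
	c->tg[tgid - 1]->cum_conn++;
	return 1;
}
```

The point of the sketch is that a missing shared area turns the update into
a no-op instead of a NULL dereference.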
2025-09-18 16:49:51 +02:00
Amaury Denoyelle
0678d0a69b MINOR: check: reject invalid check config on a QUIC server
QUIC is now supported on the backend side. The previous commit ensures
that simple checks can be activated on QUIC servers without any issue.

The current patch ensures that check server settings remain compatible
with a QUIC server. Thus, configuration is now invalid if check
specifies an explicit MUX proto other than QUIC, disables SSL or tries to
use PROXY protocol.
2025-09-09 16:55:09 +02:00
Amaury Denoyelle
6d3c3c7871 BUG/MINOR: check: ensure check-reuse is compatible with SSL
SSL may be activated implicitly if a server relies on SSL, even without
check-ssl keyword. This is performed by init_srv_check() function. The
main operation is to change xprt layer for check to SSL.

Prior to this patch, <use_ssl> check member was also set, despite not
being strictly necessary. This has a negative side-effect of rendering
check-reuse-pool ineffective. Indeed, reuse on check is only performed
if no specific check configuration has been specified (see
tcpcheck_use_nondefault_connect()).

This patch fixes check reuse with SSL: <use_ssl> is not set in case SSL
is inherited implicitly from server configuration. Thus, <use_ssl> is
now only set if an explicit check-ssl keyword is set, which disables
connection reuse for check.

This must be backported up to 3.2.
2025-09-03 16:54:48 +02:00
Christopher Faulet
f8b7299ee7 BUG/MINOR: server: Duplicate healthcheck's sni inherited from default server
It is not really an issue, but the "check-sni" value inherited from a default
server is not duplicated while the parameter value is duplicated during the
parsing. So here there is a small leak if several "check-sni" parameters are
used on the same server line. The previous value is never released. But to
fix this issue, the value inherited from the default server must also be
duplicated. At the end it is safer this way and consistent with the parsing
of the "sni" parameter.

It is harmless so there is no reason to backport this patch.
2025-09-01 15:45:05 +02:00
Christopher Faulet
f7a04b428a BUG/MEDIUM: server: Duplicate healthcheck's alpn inherited from default server
When "check-alpn" parameter is inherited from the default server, the value
is not duplicated, the pointer of the default server is used. However, when
this parameter is overridden, the old value is released. So the "check-alpn"
value of the default server is released. So it is possible to have a UAF
if another server inherits from the same default server.

To fix the issue, the "check-alpn" parameter must be handled the same way
the "alpn" is. The default value is duplicated. So it could be safely
released if it is forced on the server line.
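
The ownership rule above can be sketched as follows. This is a minimal
model under assumptions: struct check_cfg and the helper names are
hypothetical, and the real code handles more fields than a single string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-server check settings: <alpn> is always owned here. */
struct check_cfg {
	char *alpn;
};

/* Portable string duplication (returns NULL on allocation failure). */
static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Inherit from the default server by duplicating the value instead of
 * sharing the pointer. Returns 0 on success, -1 on allocation failure.
 */
static int check_cfg_inherit(struct check_cfg *dst, const struct check_cfg *def)
{
	dst->alpn = NULL;
	if (def->alpn) {
		dst->alpn = dup_str(def->alpn);
		if (!dst->alpn)
			return -1;
	}
	return 0;
}

/* Override the value on the server line. Releasing the old value is safe
 * because it is a private copy, never the default server's own buffer.
 */
static int check_cfg_set_alpn(struct check_cfg *cfg, const char *value)
{
	char *copy = dup_str(value);

	if (!copy)
		return -1;
	free(cfg->alpn);
	cfg->alpn = copy;
	return 0;
}
```

With this rule, overriding the value on one server cannot invalidate what
the default server or its other heirs point to.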

This patch should fix the issue #3096. It must be backported to all stable
versions.
2025-09-01 15:45:05 +02:00
Aurelien DARRAGON
75e480d107 MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct
Between 3.2 and 3.3-dev we observed a noticeable performance regression
due to stats handling. After bisecting, Willy found out that recent
work to split stats computing across multiple thread groups (stats
sharding) was responsible for that performance regression. We're looking
at roughly 20% performance loss.

More precisely, it is the added indirections, multiplied by the number
of statistics that are updated for each request, which in the end causes
a significant amount of time being spent resolving pointers.

We noticed that the fe_counters_shared and be_counters_shared structures
which are currently allocated in dedicated memory since a0dcab5c
("MAJOR: counters: add shared counters base infrastructure")
are no longer huge since 16eb0fab31 ("MAJOR: counters: dispatch counters
over thread groups") because they now essentially hold flags plus the
per-thread group id pointer mapping, not the counters themselves.

As such we decided to try merging fe_counters_shared and
be_counters_shared in their parent structures. The cost is slight memory
overhead for the parent structure, but it allows to get rid of one
pointer indirection. This patch alone yields visible performance gains
and almost restores 3.2 stats performance.

counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and
now returns either failure or success instead of a pointer because we
don't need to retrieve a shared pointer anymore, the function takes care
of initializing the existing pointer.
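
To make the saved indirection concrete, here is a rough sketch of the two
layouts. All type and function names are hypothetical simplifications, and
MAX_TGROUPS is an arbitrary value chosen for the example:

```c
#include <stddef.h>

#define MAX_TGROUPS 4 /* arbitrary size for this sketch */

/* Hypothetical per-thread-group counters. */
struct tg_counters {
	unsigned long long cum_req;
};

/* The small "shared" part: flags plus the per-thread-group mapping. */
struct counters_shared {
	unsigned int flags;
	struct tg_counters *tg[MAX_TGROUPS];
};

/* Before: the parent only holds a pointer to a separate allocation. */
struct counters_before {
	struct counters_shared *shared;
};

/* After: the shared part is embedded in the parent structure. */
struct counters_after {
	struct counters_shared shared;
};

/* Two dependent loads before reaching the counter: shared, then tg[]. */
static unsigned long long *req_before(struct counters_before *c, size_t tg)
{
	return &c->shared->tg[tg]->cum_req;
}

/* Only one dependent load left: tg[] sits at a fixed offset from <c>. */
static unsigned long long *req_after(struct counters_after *c, size_t tg)
{
	return &c->shared.tg[tg]->cum_req;
}
```

Both accessors reach the same counter. The second one simply skips one
pointer resolution, which adds up when many counters are updated per
request.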
2025-07-25 16:46:10 +02:00
Aurelien DARRAGON
01dfe17acf MEDIUM: server: add and use a separate last_change variable for internal use
last_change server metric is used for 2 separate purposes. First it is
used to report last server state change date for stats and other related
metrics. But it is also used internally, including in sensitive paths,
such as lb related stuff to take decision or perform computations
(ie: in srv_dynamic_maxconn()).

Due to last_change counter now being split over thread groups since 16eb0fa
("MAJOR: counters: dispatch counters over thread groups"), reading the
aggregated value has a cost, and we cannot afford to consult last_change
value from srv_dynamic_maxconn() anymore. Moreover, since the value is
used to take decision for the current process we don't want the variable
to be updated by another process behind our back.

To prevent performance regression and sharing issues, let's instead add a
separate srv->last_change value, which is not updated atomically (given how
rare the updates are), and only serves for places where the use of the
aggregated last_change counter/stats (split over thread groups) is too
costly.
2025-06-30 16:26:25 +02:00
Aurelien DARRAGON
5694a98744 MAJOR: mailers: remove native mailers support
As mentioned in 2.8 announce on the mailing list [1] and on the wiki [2]
native mailers were deprecated and planned for removal in 3.3. Now is
the time to drop the legacy code for native mailers which is based on a
tcpcheck "hack" and cannot be maintained. Lua mailers should be used as
a drop in replacement. Indeed, "mailers" and associated config directives
are preserved because mailers config is exposed to Lua, which helps smoothing
the transition from native mailers to Lua based ones.

As a reminder, to keep mailers configuration working as before without
making changes to the config file, simply add the line below to the global
section:

       lua-load examples/lua/mailers.lua

mailers.lua script (provided in the git repository, adjust path as needed)
may be customized by users familiar with Lua, by default it emulates the
behavior of the native (now removed) mailers.

[1]: https://www.mail-archive.com/haproxy@formilux.org/msg43600.html
[2]: https://github.com/haproxy/wiki/wiki/Breaking-changes
2025-06-24 10:55:58 +02:00
Christopher Faulet
54d74259e9 BUG/MEDIUM: check: Set SOCKERR by default when a connection error is reported
When a connection error is reported, we try to collect as much information
as possible on the connection status and the server status is adjusted
accordingly. However, the function does nothing if there is no connection
error and if the healthcheck is not expired yet. It is a problem when an
internal error occurred. It may happen at many places and it is hard to be
sure an error is reported on the connection. And in fact, it is already a
problem when the multiplexer allocation fails. In that case, the healthcheck
is not interrupted as it should be. Concretely, it could only happen when a
connection is established.

It is hard to predict the effects of this bug. It may be unimportant. But it
could probably lead to a crash. To avoid any issue, a SOCKERR status is now
set by default when a connection error is reported. There is no reason to
report a connection error for nothing. So a healthcheck failure must be
reported. There is no "internal error" status. So a socket error is
reported.

This patch must be backported to all stable versions.
2025-06-16 17:47:35 +02:00
Aurelien DARRAGON
16eb0fab31 MAJOR: counters: dispatch counters over thread groups
Most fe and be counters are good candidates for being shared between
processes. They are now grouped inside "shared" struct sub member under
be_counters and fe_counters.

Now they are properly identified, they would greatly benefit from being
shared over thread groups to reduce the cost of atomic operations when
updating them. For this, we take the current tgid into account so each
thread group only updates its own counters. For this to work, it is
mandatory that the "shared" member from {fe,be}_counters is initialized
AFTER global.nbtgroups is known, because each shared counter causes the stat
to be allocated global.nbtgroups times. When updating a counter without
concurrency, the first counter from the array may be updated.

To consult the shared counters (which requires aggregation of per-tgid
individual counters), some helper functions were added to counter.h to
ease code maintenance and avoid computing errors.
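
A minimal sketch of the idea, assuming C11 atomics and hypothetical names
(the real helpers in counter.h are not reproduced here): each thread group
updates its own slot, and readers aggregate all slots.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical sharded counter: one atomic slot per thread group. */
struct sharded_counter {
	atomic_ullong *per_tg;
	size_t nb_tg;
};

/* Must only be called once the number of thread groups is known.
 * Returns 0 on success, -1 on allocation failure.
 */
static int sharded_counter_init(struct sharded_counter *c, size_t nb_tgroups)
{
	c->per_tg = calloc(nb_tgroups, sizeof(*c->per_tg));
	c->nb_tg = c->per_tg ? nb_tgroups : 0;
	return c->per_tg ? 0 : -1;
}

/* Update path: only the slot of the caller's thread group (1-based) is
 * touched, so groups do not contend on the same memory location.
 */
static void sharded_counter_inc(struct sharded_counter *c, size_t tgid)
{
	atomic_fetch_add_explicit(&c->per_tg[tgid - 1], 1, memory_order_relaxed);
}

/* Read path: aggregate the per-thread-group values. */
static unsigned long long sharded_counter_sum(struct sharded_counter *c)
{
	unsigned long long total = 0;
	size_t i;

	for (i = 0; i < c->nb_tg; i++)
		total += atomic_load_explicit(&c->per_tg[i], memory_order_relaxed);
	return total;
}
```

The trade-off is visible in the sketch: updates become cheap, while reads
must walk every slot.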
2025-06-05 09:59:38 +02:00
Aurelien DARRAGON
a0dcab5c45 MAJOR: counters: add shared counters base infrastructure
Shareable counters are now tagged as shared counters and are dynamically
allocated in separate memory area as a prerequisite for being stored
in shared memory area. For now, GUID and threads groups are not taken into
account, this is only a first step.

Also, we ensure all counters are now manipulated using atomic operations,
namely, "last_change" counter is now read from and written to using atomic
ops.

Despite the numerous changes caused by the counters being moved away from
counters struct, no change of behavior should be expected.
2025-06-05 09:58:58 +02:00
Christopher Faulet
6786b05297 DEBUG: check: Add the healthcheck's expiration date in the trace messages
It could help to diagnose some issues about timeout processing. So let's add
it !
2025-06-03 15:06:12 +02:00
Christopher Faulet
7c788f0984 BUG/MEDIUM: check: Requeue healthchecks on I/O events to handle check timeout
When a healthcheck is processed, once the first wakeup passed to start the
check, and as long as the expiration timer is not reached, only I/O events
are able to wake it up. It is an issue when there is a check timeout
defined, especially if the connect timeout is high and the check timeout is
low. In that case, the healthcheck's task is never requeued to handle any
timeout update. When the connection is established, the check timeout is set
to replace the connect timeout. It is thus possible to report a success
while a timeout should be reported.

So, now, when an I/O event is handled, the healthcheck is requeued, except if
a success or an abort is reported.

Thanks to Thierry Fournier for the report and the reproducer.

This patch must be backported to all stable versions.
2025-06-03 15:03:30 +02:00
Olivier Houchard
81dc3e67cf MEDIUM: checks: Make sure we return the tasklet from srv_chk_io_cb
In srv_chk_io_cb, return the tasklet to tell the scheduler the tasklet
is still alive. It is not yet needed, but will be soon.
2025-04-25 16:14:26 +02:00
Aurelien DARRAGON
8a944d0e46 MINOR: checks: deinit checks_fe upon deinit
This is just to make valgrind and friends happy, leverage deinit_proxy()
for checks_fe proxy upon deinit to ensure proper cleanup.

We check the presence of proxy->id to know if it was initialized because
we cannot rely on a pointer for that.
2025-04-10 22:10:31 +02:00
Aurelien DARRAGON
4194f756de MEDIUM: tree-wide: avoid manually initializing proxies
In this patch we try to use the proxy API init functions as much as
possible to avoid code redundancy and prevent proxy initialization
errors. As such, we prefer using alloc_new_proxy() and setup_new_proxy()
instead of manually allocating the proxy pointer and performing the
base init ourselves.
2025-04-10 22:10:31 +02:00
Aurelien DARRAGON
5087048b6d MINOR: checks: mark CHECKS-FE dummy frontend as internal
CHECKS-FE frontend is a dummy frontend used to create checks sessions.
As such, it is internal and should not be exposed to the user.
Better mark it as internal using PR_CAP_INT capability to prevent
proxy API from ever exposing it.
2025-04-10 22:10:31 +02:00
Amaury Denoyelle
f0f1816f1a MINOR: check: implement check-pool-conn-name srv keyword
This commit is a direct follow-up of the previous one. It defines a new
server keyword check-pool-conn-name. It is used as the default value for
the name parameter of idle connection hash generation.

Its behavior is similar to server keyword pool-conn-name, but reserved
for checks reuse. If check-pool-conn-name is set, it is used in priority
to match a connection for reuse. If unset, a fallback is performed on
check-sni.
2025-04-03 17:19:07 +02:00
Amaury Denoyelle
e34f748e3a MINOR: check define check-reuse-pool server keyword
Define a new server keyword check-reuse-pool, and its counterpart with a
"no" prefix. For the moment, only parsing is implemented. The real
behavior adjustment will be implemented in the next patch.
2025-04-02 14:57:40 +02:00
Olivier Houchard
583303c48b MINOR: proxies/servers: Calculate queueslength and use it.
For both proxies and servers, properly calculate queueslength, which is
the total number of elements across all queues (as they currently are only
using one queue, it is equivalent to the number of elements of that
queue), and use it instead of the queue's length.
2025-01-28 12:49:41 +01:00
Ilia Shipitsin
495f1f9741 BUG/MINOR: checks: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.
2024-12-25 12:40:56 +01:00
Willy Tarreau
9c6ccb8dbb MEDIUM: config: warn on unitless timeouts < 100 ms
From time to time we face a configuration with very small timeouts which
look accidental because there could be expectations that they're expressed
in seconds and not milliseconds.

This commit adds a check for non-null unitless values smaller than 100
and emits a warning suggesting to append an explicit unit if that was
the intent.

Only the common timeouts, the server check intervals and the resolvers
hold and timeout values were covered for now. All the code needs to be
manually reviewed to verify if it supports emitting warnings.

This may break some configs using "zero-warning", but greps in existing
configs indicate that these are extremely rare and solely intentionally
done during tests. At least even if a user leaves that after a test, it
will be more obvious when reading 10ms that something's probably not
correct.
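
The test described above could be approximated as below. This is only a
sketch with a hypothetical function name: it works on the raw config token,
whereas the real parser also decodes units and covers several keywords.

```c
#include <ctype.h>
#include <stdlib.h>

/* Returns 1 when <text> is a bare, non-zero number below 100 (so it would
 * be read as that many milliseconds), 0 otherwise. Any trailing character,
 * such as a unit suffix, makes the value explicit and is not flagged.
 */
static int timeout_looks_accidental(const char *text)
{
	char *end;
	unsigned long value;

	if (!isdigit((unsigned char)text[0]))
		return 0;
	value = strtoul(text, &end, 10);
	if (*end != '\0')
		return 0;
	return value > 0 && value < 100;
}
```

With this rule "10" is flagged while "10ms", "5s", "0" and "100" are not.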
2024-11-19 10:33:20 +01:00
Willy Tarreau
2f287f14f3 BUG/MEDIUM: checks: make sure to always apply offsets to now_ms in expiration
Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be health checks suddenly stopping.
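
For readers unfamiliar with ticks, the following simplified model shows why
the addition matters. It assumes that TICK_ETERNITY is the value 0 and that
tick_add() bumps a zero result to 1; the function below is a sketch written
for this note, not a copy of the real one.

```c
#include <stdint.h>

#define TICK_ETERNITY 0u /* assumed: 0 means "never expires" */

/* Simplified model of adding a timeout to a date: the result is never
 * allowed to be TICK_ETERNITY, even when the sum wraps to zero.
 */
static uint32_t tick_add_sketch(uint32_t now, uint32_t timeout)
{
	now += timeout;
	if (now == TICK_ETERNITY)
		now++;
	return now;
}
```

Assigning a zero now_ms directly would store TICK_ETERNITY and disable the
timer, while going through the addition yields 1, which is an immediate
wakeup.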

This should be backported where it applies.
2024-11-15 15:39:00 +01:00
Aperence
a7b04e383a MINOR: tools: extend str2sa_range to add an alt parameter
Add a new parameter "alt" that will store whether this configuration
uses an alternate protocol.

This alt pointer will contain a value that can be transparently
passed to protocol_lookup to obtain an appropriate protocol structure.

This change is needed to allow for example the servers to know if they
need to use an alternate protocol or not.
2024-08-30 18:53:49 +02:00
Christopher Faulet
1538c4aa82 MEDIUM: proxy/spoe: Add a SPOP mode
The SPOE was significantly lightened. It is now possible to refactor it to
use a dedicated multiplexer. The first step is to add a SPOP mode for
proxies. The corresponding multiplexer mode is also added.

For now, there is no SPOP multiplexer, so it is only declarative. But at the
end, the SPOP multiplexer will be automatically selected for servers inside
a SPOP backend.

The related issue is #2502.
2024-07-12 15:27:04 +02:00
Willy Tarreau
f5566afec6 MEDIUM: dynbuf: generalize the use of b_dequeue() to detach buffer_wait
Now thanks to this the bufq_map field is expected to remain accurate.
2024-05-10 17:18:13 +02:00
Willy Tarreau
a214197ce7 MINOR: dynbuf: use the b_queue()/b_requeue() functions everywhere
The code places that were used to manipulate the buffer_wq manually
now just call b_queue() or b_requeue(). This will simplify the multiple
list management later.
2024-05-10 17:18:13 +02:00
Willy Tarreau
72d0dcda8e MINOR: dynbuf: pass a criticality argument to b_alloc()
The goal is to indicate how critical the allocation is, between the
least one (growing an existing buffer ring) and the topmost one (boot
time allocation for the life of the process).

The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function
to try to allocate otherwise subscribe. There's currently no distinction
of direction nor part that tries to allocate, and this should be revisited
to improve this situation, particularly when we consider that mux-h2 can
reduce its Tx allocations if needed.

For now, 4 main levels are planned, to translate how the data travels
inside haproxy from a producer to a consumer:
  - MUX_RX:   buffer used to receive data from the OS
  - SE_RX:    buffer used to place a transformation of the RX data for
              a mux, or to produce a response for an applet
  - CHANNEL:  the channel buffer for sync recv
  - MUX_TX:   buffer used to transfer data from the channel to the outside,
              generally a mux but there can be a few specificities (e.g.
              http client's response buffer passed to the application,
              which also gets a transformation of the channel data).

The other levels are a bit different in that they don't strictly need to
allocate for the first two ones, or they're permanent for the last one
(used by compression).
2024-05-10 17:18:13 +02:00
Amaury Denoyelle
634cc2a5d8 MINOR: counters: move last_change into counters struct
last_change was a member present in both proxy and server struct. It is
used as an age statistics to report the last update of the object.

Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.

Note that there is a change for proxy structure with now 2 different
last_change values, on frontend and backend side. Special care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only backend counter is reported in stats. However, with now two
distinct values, stats could be extended to report it on both side.
2024-05-02 10:55:25 +02:00
Christopher Faulet
1e38ac72ce MEDIUM: stconn: Use one function to shut connection and applet endpoints
se_shutdown() function is now used to perform a shutdown on a connection
endpoint and an applet endpoint. The same function is used for
both. sc_conn_shut() function was removed and appctx_shut() function was
updated to only deal with the applet stuff.
2024-04-19 16:33:35 +02:00
Christopher Faulet
c96a873ba3 MEDIUM: stconn: Use only one SC function to shut connection endpoints
The SC API to perform shutdowns on connection endpoints was unified to have
only one function, sc_conn_shut(), with read/write shut modes passed
explicitly. It means sc_conn_shutr() and sc_conn_shutw() were removed. The
next step is to do the same at the mux level.
2024-04-19 16:25:06 +02:00
Ilya Shipitsin
80813cdd2a CLEANUP: assorted typo fixes in the code and comments
This is the 37th iteration of typo fixes
2023-11-23 16:23:14 +01:00
Aurelien DARRAGON
12582eb8e5 MINOR: tools: make str2sa_range() directly return type hints
str2sa_range() already allows the caller to provide <proto> in order to
get a pointer on the protocol matching with the string input thanks to
5fc9328a ("MINOR: tools: make str2sa_range() directly return the protocol")

However, as stated into the commit message, there is a trick:
   "we can fail to return a protocol in case the caller
    accepts an fqdn for use later. This is what servers do and in this
    case it is valid to return no protocol"

In this case, we're unable to return protocol because the protocol lookup
depends on both the [proto type + xprt type] and the [family type] to be
known.

While family type might not be directly resolved when fqdn is involved
(because family type might be discovered using DNS queries), proto type
and xprt type are already known. As such, the caller might be interested
in knowing those address related hints even if the address family type is
not yet resolved and thus the matching protocol cannot be looked up.

Thus in this patch we add the optional net_addr_type (custom type)
argument to str2sa_range to enable the caller to check the protocol type
and transport type when the function succeeds.
2023-11-10 17:49:57 +01:00
Christopher Faulet
c72ab1cc6d BUG/MINOR: tcpcheck: Report hexstring instead of binary one on check failure
When an expect rule failed for a tcp-check, information about the expect
rule is dumped in the report. For a check on a binary string, a hexstring is
used in the configuration but the decoded string is dumped. It is a problem
because it can contain special characters. And it is not really handy
because there is no correspondence with the config.

So, now, the hexstring is dumped in the report. This way, we are sure there
are no special characters and it is easy to find it in the configuration.

This patch should solve the issue #2326. It must be backported as far as
2.2.
2023-10-31 08:02:44 +01:00
Willy Tarreau
fca3fc0d90 BUILD: checks: shut up yet another stupid gcc warning
gcc has always had hallucinations regarding value ranges, and this one
is interesting, and affects branches 4.7 to 11.3 at least. When building
without threads, the randomly picked new_tid that is reduced to a multiply
by 1 shifted right 32 bits, hence a constant output of 0 shows this
warning:

  src/check.c: In function 'process_chk_conn':
  src/check.c:1150:32: warning: array subscript [-1, 0] is outside array bounds of 'struct thread_ctx[1]' [-Warray-bounds]
  In file included from include/haproxy/thread.h:28,
                   from include/haproxy/list.h:26,
                   from include/haproxy/action.h:28,
                   from src/check.c:31:

or this one when trying to force the test to see that it cannot be zero(!):

  src/check.c: In function 'process_chk_conn':
  src/check.c:1150:54: warning: array subscript [0, 0] is outside array bounds of 'struct thread_ctx[1]' [-Warray-bounds]
   1150 |         uint t2_act  = _HA_ATOMIC_LOAD(&ha_thread_ctx[thr2].active_checks);
        |                                         ~~~~~~~~~~~~~^~~~~~
  include/haproxy/atomic.h:66:40: note: in definition of macro 'HA_ATOMIC_LOAD'
     66 | #define HA_ATOMIC_LOAD(val)          *(val)
        |                                        ^~~
  src/check.c:1150:24: note: in expansion of macro '_HA_ATOMIC_LOAD'
   1150 |         uint t2_act  = _HA_ATOMIC_LOAD(&ha_thread_ctx[thr2].active_checks);
        |                        ^~~~~~~~~~~~~~~

Let's just add an ALREADY_CHECKED() statement there, no other check seems
to get rid of it. No backport is needed.
2023-09-04 19:38:51 +02:00
Willy Tarreau
b0031d9679 MINOR: checks: also consider the thread's queue for rebalancing
Let's also check for other threads when the current one is queueing,
let's not wait for the load to be high. Now this totally eliminates
differences between threads.
2023-09-01 14:00:04 +02:00
Willy Tarreau
844a3bc25b MEDIUM: checks: implement a queue in order to limit concurrent checks
The progressive adoption of OpenSSL 3 and its abysmal handshake
performance has started to reveal situations where it simply isn't
possible anymore to successfully run health checks on many servers,
because between the moment all the checks are started and the moment
the handshake finally completes, the timeout has expired!

This also has consequences on production traffic which gets
significantly delayed as well, all that for lots of checks. While it's
possible to increase the check delays, it doesn't solve everything as
checks still take a huge amount of time to converge in such conditions.

Here we take a different approach by permitting to enforce the maximum
concurrent checks per thread limitation and implementing an ordered
queue. Thanks to this, if a thread about to start a check has reached
its limit, it will add the check at the end of a queue and it will be
processed once another check is finished. This proves to be extremely
efficient, with all checks completing in a reasonable amount of time
and not being disturbed by the rest of the traffic from other checks.
They're just cycling slower, but at the speed the machine can handle.

One must understand however that if some complex checks perform multiple
exchanges, they will take a check slot for all the required duration.
This is why the limit is not enforced by default.

Tests on SSL show that a limit of 5-50 checks per thread on local
servers gives excellent results already, so that could be a good starting
point.
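
The queuing principle can be sketched as follows. The structure, the fixed
queue capacity and the function names are assumptions made for the example.
The real implementation works on tasks and lists, not on integer ids.

```c
#include <stddef.h>

#define QUEUE_CAP 16 /* arbitrary capacity for this sketch */

/* Hypothetical per-thread check bookkeeping. */
struct thread_checks {
	unsigned int running;     /* checks currently executing */
	unsigned int max_running; /* 0 means no limit is enforced */
	int queue[QUEUE_CAP];     /* FIFO of waiting check ids */
	size_t head, len;
};

/* Try to start check <id>. Returns 1 if it starts now, 0 if it was
 * appended to the queue, -1 if the sketch's queue is full.
 */
static int check_start(struct thread_checks *t, int id)
{
	if (!t->max_running || t->running < t->max_running) {
		t->running++;
		return 1;
	}
	if (t->len == QUEUE_CAP)
		return -1;
	t->queue[(t->head + t->len) % QUEUE_CAP] = id;
	t->len++;
	return 0;
}

/* A running check has finished: release its slot and hand it to the
 * oldest queued check. Returns that check's id, or -1 if none waits.
 */
static int check_done(struct thread_checks *t)
{
	int id;

	t->running--;
	if (!t->len)
		return -1;
	id = t->queue[t->head];
	t->head = (t->head + 1) % QUEUE_CAP;
	t->len--;
	t->running++;
	return id;
}
```

Because the queue is ordered, checks are served in arrival order and the
number of concurrent checks never exceeds the configured limit.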
2023-09-01 14:00:04 +02:00
Willy Tarreau
cfc0bceeb5 MEDIUM: checks: search more aggressively for another thread on overload
When the current check is overloaded (more running checks than the
configured limit), we'll try more aggressively to find another thread.
Instead of just opportunistically looking for one half as loaded, now if
the current thread has more than 1% more active checks than another one,
or has more than a configured limit of concurrent running checks, it will
search for a more suitable thread among 3 other random ones in order to
migrate the check there. The number of migrations remains very low (~1%)
and the checks load very fair across all threads (~1% as well). The new
parameter is called tune.max-checks-per-thread.
2023-09-01 08:26:06 +02:00
Willy Tarreau
016e189ea3 MINOR: check: also consider the random other thread's active checks
When checking if it's worth transferring a sleeping check to another
random thread, let's also check if that random other thread has less
checks than the current one, which is another reason for transferring
the load there.

This commit adds a function "check_thread_cmp_load()" to compare two
threads' loads in order to simplify the decision taking.

The minimum active check count before starting to consider rebalancing
the load was now raised from 2 to 3, because tests show that at 15k
concurrent checks, at 2, 50% are evaluated for rebalancing and 30%
are rebalanced, while at 3, this is cut in half.
2023-09-01 08:26:06 +02:00
Willy Tarreau
00de9e0804 MINOR: checks: maintain counters of active checks per thread
Let's keep two check counters per thread:
  - one for "active" checks, i.e. checks that are no longer sleeping
    and are assigned to the thread. These include sleeping and
    running checks ;

  - one for "running" checks, i.e. those which are currently
    executing on the thread.

By doing so, we'll be able to spread the health checks load a bit better
and refrain from sending too many at once per thread. The counters are
atomic since a migration increments the target thread's active counter.
These numbers are reported in "show activity", which allows to check
per thread and globally how many checks are currently pending and running
on the system.

Ideally, we should only consider checks in the process of establishing
a connection since that's really the expensive part (particularly with
OpenSSL 3.0). But the inner layers are really not suitable to doing
this. However knowing the number of active checks is already a good
enough hint.
2023-09-01 08:26:06 +02:00
Willy Tarreau
3b7942a1c9 MINOR: check/activity: collect some per-thread check activity stats
We now count the number of times a check was started on each thread
and the number of times a check was adopted. This helps understand
better what is observed regarding checks.
2023-09-01 08:26:06 +02:00
Willy Tarreau
e03d05c6ce MINOR: check: remember when we migrate a check
The goal here is to explicitly mark that a check was migrated so that
we don't do it again. This will allow us to perform other actions on
the target thread while still knowing that we don't want to be migrated
again. The new READY bit combines with SLEEPING to form 4 possible states:

 SLP  RDY   State      Description
  0    0    -          (reserved)
  0    1    RUNNING    Check is bound to current thread and running
  1    0    SLEEPING   Check is sleeping, not bound to a thread
  1    1    MIGRATING  Check is migrating to another thread

Thus we set READY upon migration, and check for it before migrating, this
is sufficient to prevent a second migration. To make things a bit clearer,
the SLEEPING bit was switched with FASTINTER so that SLEEPING and READY are
adjacent.
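
The table above maps naturally to two adjacent flag bits. The sketch below
uses hypothetical bit values and function names, purely to illustrate the
decoding and the "migrate only once" rule:

```c
/* Hypothetical bit values, the real flags may use other positions. */
#define CHK_SLEEPING 0x1u
#define CHK_READY    0x2u

/* Decode the SLP/RDY pair into the state names used in the table. */
static const char *check_state_name(unsigned int flags)
{
	switch (flags & (CHK_SLEEPING | CHK_READY)) {
	case CHK_READY:
		return "RUNNING";
	case CHK_SLEEPING:
		return "SLEEPING";
	case CHK_SLEEPING | CHK_READY:
		return "MIGRATING";
	default:
		return "reserved";
	}
}

/* Only a check that is sleeping and not yet marked READY may migrate.
 * Setting READY upon migration is what prevents a second migration.
 * Returns 1 if the check was marked as migrating, 0 otherwise.
 */
static int check_try_migrate(unsigned int *flags)
{
	if ((*flags & (CHK_SLEEPING | CHK_READY)) != CHK_SLEEPING)
		return 0;
	*flags |= CHK_READY;
	return 1;
}
```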
2023-09-01 08:26:06 +02:00
Willy Tarreau
3544c9f8a0 MINOR: checks: pin the check to its thread upon wakeup
When a check leaves the sleeping state, we must pin it to the thread that
is processing it. It's normally always the case after the first execution,
but initial checks that start assigned to any thread (-1) could be assigned
much later, causing problems with planned changes involving queuing. Thus
better do it early, so that all threads start properly pinned.
2023-09-01 08:26:06 +02:00
Willy Tarreau
7163f95b43 MINOR: checks: start the checks in sleeping state
The CHK_ST_SLEEPING state was introduced by commit d114f4a68 ("MEDIUM:
checks: spread the checks load over random threads") to indicate that
a check was not currently bound to a thread and that it could easily
be migrated to any other thread. However it did not start the checks
in this state, meaning that they were not redispatchable on startup.

Sometimes under heavy load (e.g. when using SSL checks with OpenSSL 3.0)
the cost of setting up new connections is so high that some threads may
experience connection timeouts on startup. In this case it's better if
they can transfer their excess load to other idle threads. By just
marking the check as sleeping upon startup, we can do this and
significantly reduce the number of failed initial checks.
2023-09-01 08:26:06 +02:00
Willy Tarreau
48442b8b15 BUG/MINOR: checks: do not queue/wake a bounced check
A small issue was introduced with commit d114f4a68 ("MEDIUM: checks:
spread the checks load over random threads"): when a check is bounced
to another thread, its expiration time is set to TICK_ETERNITY. This
makes it show as not expired upon first wakeup on the next thread,
thus being detected as "woke up too early" and being instantly
rescheduled. Only after this next wakeup will it be properly
considered.

Several approaches were attempted to fix this. The best one seems to
consist in resetting t->expire and expired upon wakeup, and replacing
the !expired test with !tick_is_expired() so that we don't trigger on
this case.

This needs to be backported to 2.7.
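
The failure mode above can be illustrated with a simplified model of
tick handling. This is only a sketch: it assumes that TICK_ETERNITY is
the value 0 and that an unset tick never reports as expired. The helpers
woke_up_too_early() and rearm_after_bounce() are hypothetical names used
for illustration, not functions from the code base, and the re-arming
shown is one possible reading of "resetting t->expire upon wakeup".

```c
#include <assert.h>

/* Simplified model: a tick of 0 means "never expires". */
#define TICK_ETERNITY 0u

static int tick_isset(unsigned int tick)
{
    return tick != TICK_ETERNITY;
}

/* Wrap-safe expiry test: an unset tick is never considered expired. */
static int tick_is_expired(unsigned int timer, unsigned int now)
{
    if (!tick_isset(timer))
        return 0;
    return (int)(timer - now) <= 0;
}

/* Hypothetical helper: models the "woke up too early" detection. */
static int woke_up_too_early(unsigned int expire, unsigned int now)
{
    return !tick_is_expired(expire, now);
}

/* Hypothetical helper: a bounced check arrives with expire set to
 * TICK_ETERNITY; re-arming it to the current date on wakeup lets the
 * expiry test succeed on the first run instead of the second one.
 */
static unsigned int rearm_after_bounce(unsigned int expire, unsigned int now)
{
    assert(now != TICK_ETERNITY); /* simplification of this model only */
    return tick_isset(expire) ? expire : now;
}
```

In this model, a check bounced with expire set to TICK_ETERNITY is seen
as "too early" at any date, while the same check passed through
rearm_after_bounce() first is seen as expired right away.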
2023-09-01 08:26:06 +02:00
Christopher Faulet
8bca3cc8c7 MEDIUM: checks: Stop scheduling healthchecks during stopping stage
When the process is stopping, the health-checks are suspended. However the
task is still periodically woken up for nothing. If there is a huge number
of health-checks and if they are woken up at the same time, it may lead to a
noticeable CPU consumption for no reason.

To avoid this extra CPU cost, we stop scheduling the health-check tasks
when the proxy is disabled or stopped.

This patch should partially solve the issue #2145.
2023-05-17 14:57:10 +02:00
Willy Tarreau
c7b9308f20 BUG/MINOR: clock: automatically adjust the internal clock with the boot time
This is a better and more general solution to the problem described in
this commit:

    BUG/MINOR: checks: postpone the startup of health checks by the boot time

Now we're updating the now_offset that is used to compute now_ms at the
few points where we update the ready date during boot. This ensures that
now_ms, while remaining stable during the whole boot process, will be
correct and will start with the boot value right after the boot is finished. As
such the patch above is rolled back (we don't want to count the boot
time twice).

This must not be backported because it relies on the more flexible clock
architecture in 2.8.
2023-05-17 09:33:54 +02:00
Willy Tarreau
8e978a094d BUG/MINOR: checks: postpone the startup of health checks by the boot time
When health checks are started at boot, now_ms could be off by the boot
time. In general it's not even noticeable, but with very large configs
taking up to one or even a few seconds to start, this can result in a
part of the servers' checks being scheduled slightly in the past. As
such all of them will start grouped, partially defeating the purpose of
the spread-checks setting. For example, this can cause a burst of
connections for the network, or an excess of CPU usage during SSL
handshakes, possibly even causing some timeouts to expire early.

Here in order to compensate for this, we simply add the known boot time
to the computed delay when scheduling the startup of checks. That's very
simple and particularly efficient. For example, a config with 5k servers
in 800 backends checked every 5 seconds, which took 3.8 seconds to
start, previously showed this distribution of health checks despite
spread-checks 50:

   3690 08:59:25
    417 08:59:26
    213 08:59:27
     71 08:59:28
    428 08:59:29
    860 08:59:30
    918 08:59:31
    938 08:59:32
   1124 08:59:33
    904 08:59:34
    647 08:59:35
    890 08:59:36
    973 08:59:37
    856 08:59:38
    893 08:59:39
    154 08:59:40

Now with the fix it shows this:

    470 08:59:59
    929 09:00:00
    896 09:00:01
    937 09:00:02
    854 09:00:03
    827 09:00:04
    906 09:00:05
    863 09:00:06
    913 09:00:07
    873 09:00:08
    162 09:00:09

This should be backported to all supported versions. It depends on
this commit:

    MINOR: clock: measure the total boot time

For 2.8, where the internal clock is now totally independent of the human
one, a more generic fix will consist in simply updating now_ms to reflect
the startup time.
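
The effect of the compensation can be sketched with a small model. It
assumes that start dates are computed against a clock frozen at the
beginning of the boot (date 0), so any check whose delay is shorter than
the boot duration is already in the past once the boot completes. The
function names and the "compensate" flag are hypothetical and only serve
the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Delay before the first run of a check, relative to the frozen boot
 * clock. With compensation, the known boot time is added to it.
 */
static unsigned int first_check_delay(unsigned int computed_delay_ms,
                                      unsigned int boot_time_ms,
                                      int compensate)
{
    return computed_delay_ms + (compensate ? boot_time_ms : 0);
}

/* Counts the checks whose start date is already in the past when the
 * boot finishes, i.e. those which would all fire together.
 */
static size_t count_started_late(const unsigned int *delays_ms, size_t n,
                                 unsigned int boot_time_ms, int compensate)
{
    size_t i, late = 0;

    assert(delays_ms != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (first_check_delay(delays_ms[i], boot_time_ms, compensate) <
            boot_time_ms)
            late++;
    }
    return late;
}
```

With delays of 0, 1000, 2000, 3000 and 4000 ms and a boot time of
3800 ms, four checks out of five are late without compensation and none
is late with it.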
2023-05-17 09:33:54 +02:00
Christopher Faulet
cb76030356 CLEANUP: check; Remove some useless assignments to NULL
In process_chk_conn(), some assignments to NULL are useless and are reported
by Coverity as unused values. While they are harmless, these assignments can
be removed.

This patch should fix the coverity report #2158.
2023-05-17 09:28:23 +02:00
Willy Tarreau
b93758cec9 MINOR: checks: make sure spread-checks is used also at boot time
This makes use of spread-checks also for the startup of the check tasks.
This provides a smoother load on startup for uneven configurations which
tend to enable only *some* servers. Below is the connection distribution
per second of the SSL checks of a config with 5k servers spread over 800
backends, with a check inter of 5 seconds:

- default:
    682 08:00:50
    826 08:00:51
    773 08:00:52
   1016 08:00:53
    885 08:00:54
    889 08:00:55
    825 08:00:56
    773 08:00:57
   1016 08:00:58
    884 08:00:59
    888 08:01:00
    491 08:01:01

- with spread-checks 50:
    437 08:01:19
    866 08:01:20
    777 08:01:21
   1023 08:01:22
   1118 08:01:23
    923 08:01:24
    641 08:01:25
    859 08:01:26
    962 08:01:27
    860 08:01:28
    929 08:01:29
    909 08:01:30
    866 08:01:31
    849 08:01:32
    114 08:01:33

- with spread-checks 50 + this patch:
    680 08:01:55
    922 08:01:56
    962 08:01:57
    899 08:01:58
    819 08:01:59
    843 08:02:00
    916 08:02:01
    896 08:02:02
    886 08:02:03
    846 08:02:04
    903 08:02:05
    894 08:02:06
    178 08:02:07

The load is much smoother from the start, this can help initial health
checks succeed when many target the same overloaded server for example.
This could be backported as it should make borderline configs more
reliable across reloads.
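
As an illustration of what applying a spread to a start delay can look
like, here is a minimal sketch. It assumes that spread-checks is a
percentage between 0 and 50 and that the delay is moved by a random
amount within plus or minus that percentage. The function apply_spread()
and its exact formula are assumptions made for this example, not the
code of the patch; the random number is passed as an argument so that
the result is reproducible.

```c
#include <assert.h>

/* Hypothetical helper: returns delay_ms moved by a pseudo-random amount
 * within +/- spread_pct percent, where r is a caller-supplied random
 * number. With spread_pct set to 0 the delay is returned unchanged.
 */
static unsigned int apply_spread(unsigned int delay_ms,
                                 unsigned int spread_pct,
                                 unsigned int r)
{
    unsigned int range, span;

    assert(spread_pct <= 50);

    range = delay_ms * spread_pct / 100; /* largest allowed deviation */
    span = 2 * range + 1;                /* number of possible values */
    return delay_ms - range + (r % span);
}
```

For a 5 second interval and a spread of 50, the resulting delay lies
between 2500 and 7500 ms, which is what lets start dates spill over
several seconds instead of piling up on a single one.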
2023-05-17 08:10:40 +02:00