The places in the code that used to manipulate the buffer_wq manually
now just call b_queue() or b_requeue(). This will simplify the
management of multiple lists later.
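For illustration, here is a minimal toy of the queue/requeue pattern;
the types and prototypes below are simplified stand-ins, not HAProxy's
actual API:

  /* miniature model of a buffer wait queue (illustrative only) */
  struct list { struct list *n, *p; };
  struct buffer_wait {
          void *target;             /* entity waiting for a buffer */
          int (*wakeup_cb)(void *); /* called once a buffer is available */
          struct list list;         /* attachment point in buffer_wq */
  };

  static struct list buffer_wq = { &buffer_wq, &buffer_wq };

  /* append a waiter at the tail of the wait queue */
  static void b_queue(struct buffer_wait *bw)
  {
          bw->list.p = buffer_wq.p;
          bw->list.n = &buffer_wq;
          buffer_wq.p->n = &bw->list;
          buffer_wq.p = &bw->list;
  }

  /* detach a waiter then queue it again, e.g. after a failed allocation */
  static void b_requeue(struct buffer_wait *bw)
  {
          bw->list.p->n = bw->list.n;
          bw->list.n->p = bw->list.p;
          b_queue(bw);
  }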
The goal is to indicate how critical the allocation is, between the
least critical one (growing an existing buffer ring) and the topmost
one (boot-time allocation for the life of the process).
The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function
to try to allocate, otherwise subscribe. There's currently no distinction
of direction nor of which part tries to allocate, and this should be
revisited to improve the situation, particularly when we consider that
mux-h2 can reduce its Tx allocations if needed.
For now, 4 main levels are planned, to translate how the data travels
inside haproxy from a producer to a consumer:
- MUX_RX: buffer used to receive data from the OS
- SE_RX: buffer used to place a transformation of the RX data for
a mux, or to produce a response for an applet
- CHANNEL: the channel buffer for sync recv
- MUX_TX: buffer used to transfer data from the channel to the outside,
generally a mux but there can be a few specificities (e.g.
http client's response buffer passed to the application,
which also gets a transformation of the channel data).
The other levels are a bit different: the first two don't strictly need
to allocate, and the last one is permanent (used by compression).
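As a sketch, these levels could translate into an ordered enum like the
one below; the names are illustrative, not necessarily those used in
the source:

  /* allocation criticality, ordered from least to most critical
   * (illustrative names) */
  enum dynbuf_crit {
          DB_CRIT_GROW_RING = 0, /* growing an existing buffer ring */
          DB_CRIT_MUX_RX,        /* buffer receiving data from the OS */
          DB_CRIT_SE_RX,         /* transformed RX data / applet response */
          DB_CRIT_CHANNEL,       /* channel buffer for sync recv */
          DB_CRIT_MUX_TX,        /* data going from the channel outside */
          DB_CRIT_PERMANENT,     /* boot-time allocation, lives forever */
  };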
last_change was a member present in both the proxy and server structs. It
is used as an age statistic to report the last update of the object.
Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.
Note that there is a change for the proxy structure, which now has two
different last_change values, one on the frontend side and one on the
backend side. Special care was taken to ensure that the value is
initialized only on the relevant side; the other value remains 0 unless
a listen proxy is instantiated. For the moment, only the backend counter
is reported in stats. However, with two distinct values, stats could be
extended to report it on both sides.
The se_shutdown() function is now used to perform a shutdown on both
connection endpoints and applet endpoints. The same function is used for
both. The sc_conn_shut() function was removed and the appctx_shut()
function was updated to only deal with the applet-specific parts.
The SC API to perform shutdowns on connection endpoints was unified to have
only one function, sc_conn_shut(), with read/write shut modes passed
explicitly. It means sc_conn_shutr() and sc_conn_shutw() were removed. The
next step is to do the same at the mux level.
str2sa_range() already allows the caller to provide <proto> in order to
get a pointer to the protocol matching the string input, thanks to
5fc9328a ("MINOR: tools: make str2sa_range() directly return the protocol").
However, as stated in the commit message, there is a trick:
"we can fail to return a protocol in case the caller
accepts an fqdn for use later. This is what servers do and in this
case it is valid to return no protocol"
In this case, we're unable to return a protocol because the protocol
lookup depends on both the [proto type + xprt type] and the [family type]
being known.
While the family type might not be directly resolved when an fqdn is
involved (because it might be discovered using DNS queries), the proto
type and xprt type are already known. As such, the caller might be
interested in knowing those address-related hints even if the address
family is not yet resolved and thus the matching protocol cannot be
looked up.
Thus in this patch we add the optional net_addr_type (custom type)
argument to str2sa_range() to enable the caller to check the protocol
type and transport type when the function succeeds.
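As an illustration, the hints could look like the sketch below; the
exact layout of net_addr_type is assumed here, not taken from the
source:

  /* assumed shape of the optional hints filled by str2sa_range(): even
   * when no protocol can be returned (fqdn case), the caller still
   * learns the expected socket and transport types */
  enum proto_type { PROTO_TYPE_STREAM, PROTO_TYPE_DGRAM };

  struct net_addr_type {
          enum proto_type proto_type; /* socket level: stream vs dgram */
          enum proto_type xprt_type;  /* transport level type */
  };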
When an expect rule fails for a tcp-check, information about the expect
rule is dumped in the report. For a check on a binary string, a hexstring
is used in the configuration but the decoded string is dumped. This is a
problem because it can contain special characters, and it is not really
handy because there is no correspondence with the config.

So, now, the hexstring is dumped in the report. This way, we are sure there
are no special characters and it is easy to find it in the configuration.
This patch should solve the issue #2326. It must be backported as far as
2.2.
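The principle can be illustrated with a small helper that re-encodes the
binary pattern as a hexstring for the report, so that the output matches
the configuration; this is a sketch, not the actual tcpcheck code:

  #include <stddef.h>

  /* write <len> bytes of <bin> as lowercase hex into <out>, which must
   * hold at least 2*len+1 bytes: the report then shows the same
   * hexstring as the configuration instead of raw decoded bytes */
  static void bin_to_hexstring(const unsigned char *bin, size_t len, char *out)
  {
          static const char hex[] = "0123456789abcdef";
          size_t i;

          for (i = 0; i < len; i++) {
                  out[2 * i]     = hex[bin[i] >> 4];
                  out[2 * i + 1] = hex[bin[i] & 0x0f];
          }
          out[2 * len] = '\0';
  }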
gcc has always had hallucinations regarding value ranges, and this one
is interesting, and affects branches 4.7 to 11.3 at least. When building
without threads, the randomly picked new_tid is reduced to a multiply
by 1 shifted right by 32 bits, hence a constant output of 0, which shows
this warning:
src/check.c: In function 'process_chk_conn':
src/check.c:1150:32: warning: array subscript [-1, 0] is outside array bounds of 'struct thread_ctx[1]' [-Warray-bounds]
In file included from include/haproxy/thread.h:28,
from include/haproxy/list.h:26,
from include/haproxy/action.h:28,
from src/check.c:31:
or this one when trying to force the test to see that it cannot be zero(!):
src/check.c: In function 'process_chk_conn':
src/check.c:1150:54: warning: array subscript [0, 0] is outside array bounds of 'struct thread_ctx[1]' [-Warray-bounds]
1150 | uint t2_act = _HA_ATOMIC_LOAD(&ha_thread_ctx[thr2].active_checks);
| ~~~~~~~~~~~~~^~~~~~
include/haproxy/atomic.h:66:40: note: in definition of macro 'HA_ATOMIC_LOAD'
66 | #define HA_ATOMIC_LOAD(val) *(val)
| ^~~
src/check.c:1150:24: note: in expansion of macro '_HA_ATOMIC_LOAD'
1150 | uint t2_act = _HA_ATOMIC_LOAD(&ha_thread_ctx[thr2].active_checks);
| ^~~~~~~~~~~~~~~
Let's just add an ALREADY_CHECKED() statement there, no other check seems
to get rid of it. No backport is needed.
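For reference, this kind of macro usually works by laundering the
variable through an empty asm statement so the compiler forgets its
(wrong) range knowledge; a sketch of the idea, which may differ from the
actual macro:

  /* make gcc forget everything it thinks it knows about <var>, which
   * silences spurious warnings such as -Warray-bounds (GCC extension) */
  #define ALREADY_CHECKED(var) do { asm("" : "=rm"(var) : "0"(var)); } while (0)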
Let's also check for other threads when the current one is queueing,
let's not wait for the load to be high. Now this totally eliminates
differences between threads.
The progressive adoption of OpenSSL 3 and its abysmal handshake
performance has started to reveal situations where it simply isn't
possible anymore to successfully run health checks on many servers,
because between the moment all the checks are started and the moment
the handshake finally completes, the timeout has expired!
This also has consequences on production traffic which gets
significantly delayed as well, all that for lots of checks. While it's
possible to increase the check delays, it doesn't solve everything as
checks still take a huge amount of time to converge in such conditions.
Here we take a different approach by permitting the enforcement of a
maximum number of concurrent checks per thread, and implementing an
ordered queue. Thanks to this, if a thread about to start a check has
reached its limit, it will add the check at the end of a queue and it
will be processed once another check is finished. This proves to be
extremely efficient, with all checks completing in a reasonable amount
of time and not being disturbed by the rest of the traffic from other
checks. They're just cycling slower, but at the speed the machine can
handle.

One must understand however that if some complex checks perform multiple
exchanges, they will take a check slot for the whole required duration.
This is why the limit is not enforced by default.
Tests on SSL show that a limit of 5-50 checks per thread on local
servers gives excellent results already, so that could be a good starting
point.
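A minimal sketch of the principle, with hypothetical structures (the
real implementation works on HAProxy's task and list types):

  /* per-thread ordered queue of checks waiting for a slot (sketch) */
  struct check { struct check *next; };

  struct check_queue {
          struct check *head, *tail; /* FIFO of pending checks */
          int running;               /* currently running checks */
          int limit;                 /* tune.max-checks-per-thread, 0=off */
  };

  /* returns 1 if the check may start now, 0 if it was queued */
  static int check_try_start(struct check_queue *q, struct check *c)
  {
          if (q->limit && q->running >= q->limit) {
                  c->next = NULL; /* enqueue at the tail, order preserved */
                  if (q->tail)
                          q->tail->next = c;
                  else
                          q->head = c;
                  q->tail = c;
                  return 0;
          }
          q->running++;
          return 1;
  }

  /* when a check finishes, return the oldest queued one to be woken up */
  static struct check *check_finish(struct check_queue *q)
  {
          struct check *nxt = q->head;

          q->running--;
          if (nxt) {
                  q->head = nxt->next;
                  if (!q->head)
                          q->tail = NULL;
                  q->running++;
          }
          return nxt;
  }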
When the current thread is overloaded (more running checks than the
configured limit), we'll try more aggressively to find another thread.
Instead of just opportunistically looking for one half as loaded, now if
the current thread has more than 1% more active checks than another one,
or has more than a configured limit of concurrent running checks, it will
search for a more suitable thread among 3 other random ones in order to
migrate the check there. The number of migrations remains very low (~1%)
and the check load is spread very fairly across all threads (~1% as
well). The new parameter is called tune.max-checks-per-thread.
When checking if it's worth transferring a sleeping check to another
random thread, let's also check if that random other thread has fewer
checks than the current one, which is another reason for transferring
the load there.
This commit adds a function "check_thread_cmp_load()" to compare two
threads' loads in order to simplify the decision making.
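A plausible sketch of such a comparison function, with assumed
per-thread counters:

  /* compare the check load of threads <a> and <b>: negative if <a> is
   * less loaded, zero on tie, positive otherwise (field names assumed) */
  struct thr_checks { int active_checks; int running_checks; };

  static int check_thread_cmp_load(const struct thr_checks *a,
                                   const struct thr_checks *b)
  {
          if (a->running_checks != b->running_checks)
                  return a->running_checks - b->running_checks;
          return a->active_checks - b->active_checks;
  }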
The minimum active check count before starting to consider rebalancing
the load was raised from 2 to 3, because tests show that at 15k
concurrent checks, at 2, 50% are evaluated for rebalancing and 30%
are rebalanced, while at 3, this is cut in half.
Let's keep two check counters per thread:
  - one for "active" checks, i.e. checks that are assigned to the
    thread and not finished; these include both sleeping and running
    checks ;
  - one for "running" checks, i.e. those which are currently
    executing on the thread.
By doing so, we'll be able to spread the health checks load a bit better
and refrain from sending too many at once per thread. The counters are
atomic since a migration increments the target thread's active counter.
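A sketch of the two counters and of the atomic update upon migration
(field names assumed):

  #include <stdatomic.h>

  struct thread_check_ctx {
          _Atomic unsigned int active_checks;  /* assigned to this thread */
          _Atomic unsigned int running_checks; /* currently executing */
  };

  /* migrating a check from thread <src> to thread <dst>: the counters
   * must be atomic since <dst>'s is incremented by another thread */
  static void check_migrate(struct thread_check_ctx *src,
                            struct thread_check_ctx *dst)
  {
          atomic_fetch_sub(&src->active_checks, 1);
          atomic_fetch_add(&dst->active_checks, 1);
  }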
These numbers are reported in "show activity", which makes it possible
to check, per thread and globally, how many checks are currently pending
and running on the system.
Ideally, we should only consider checks in the process of establishing
a connection since that's really the expensive part (particularly with
OpenSSL 3.0). But the inner layers are really not suitable for doing
this. However, knowing the number of active checks is already a good
enough hint.
We now count the number of times a check was started on each thread
and the number of times a check was adopted. This helps understand
better what is observed regarding checks.
The goal here is to explicitly mark that a check was migrated so that
we don't do it again. This will allow us to perform other actions on
the target thread while still knowing that we don't want to be migrated
again. The new READY bit combines with SLEEPING to form 4 possible states:
  SLP RDY State      Description
   0   0  -          (reserved)
   0   1  RUNNING    Check is bound to current thread and running
   1   0  SLEEPING   Check is sleeping, not bound to a thread
   1   1  MIGRATING  Check is migrating to another thread
Thus we set READY upon migration, and check for it before migrating; this
is sufficient to prevent a second migration. To make things a bit clearer,
the SLEEPING bit was swapped with FASTINTER so that SLEEPING and READY are
adjacent.
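A sketch of the two bits and of the migration guard (flag values
illustrative):

  #define CHK_ST_SLEEPING 0x1 /* check is sleeping, not bound to a thread */
  #define CHK_ST_READY    0x2 /* check was picked up / is migrating */

  /* only a purely sleeping check may be migrated: setting READY upon
   * migration is what prevents a second migration */
  static int check_may_migrate(unsigned int state)
  {
          return (state & (CHK_ST_SLEEPING | CHK_ST_READY)) == CHK_ST_SLEEPING;
  }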
When a check leaves the sleeping state, we must pin it to the thread that
is processing it. It's normally always the case after the first execution,
but initial checks that start assigned to any thread (-1) could be assigned
much later, causing problems with planned changes involving queuing. Thus
better do it early, so that all threads start properly pinned.
The CHK_ST_SLEEPING state was introduced by commit d114f4a68 ("MEDIUM:
checks: spread the checks load over random threads") to indicate that
a check was not currently bound to a thread and that it could easily
be migrated to any other thread. However it did not start the checks
in this state, meaning that they were not redispatchable on startup.
Sometimes under heavy load (e.g. when using SSL checks with OpenSSL 3.0)
the cost of setting up new connections is so high that some threads may
experience connection timeouts on startup. In this case it's better if
they can transfer their excess load to other idle threads. By just
marking the check as sleeping upon startup, we can do this and
significantly reduce the number of failed initial checks.
A small issue was introduced with commit d114f4a68 ("MEDIUM: checks:
spread the checks load over random threads"): when a check is bounced
to another thread, its expiration time is set to TICK_ETERNITY. This
makes it show as not expired upon first wakeup on the next thread,
thus being detected as "woke up too early" and being instantly
rescheduled. Only after this next wakeup will it be properly
considered.
Several approaches were attempted to fix this. The best one seems to
consist in resetting t->expire and expired upon wakeup, and changing
the !expired test for !tick_is_expired() so that we don't trigger on
this case.
This needs to be backported to 2.7.
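The idea can be sketched with a simplified tick API (the real code uses
HAProxy's ticks and task fields):

  #define TICK_ETERNITY 0

  /* simplified tick test: a tick is expired if set and not after <now> */
  static int tick_is_expired(unsigned int exp, unsigned int now)
  {
          return exp != TICK_ETERNITY && (int)(exp - now) <= 0;
  }

  /* the fix: upon wakeup, reset the task's expire date and test it with
   * !tick_is_expired(t->expire, now_ms) rather than a cached !expired
   * flag, so the TICK_ETERNITY left by a migration cannot make the
   * wakeup look premature */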
When the process is stopping, the health-checks are suspended. However the
task is still periodically woken up for nothing. If there is a huge number
of health-checks and if they are woken up at the same time, it may lead to
noticeable CPU consumption for no reason.

To avoid this extra CPU cost, we no longer schedule the health-check tasks
when the proxy is disabled or stopped.
This patch should partially solve the issue #2145.
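The principle, sketched with assumed flag names:

  #define TICK_ETERNITY  0
  #define PR_FL_DISABLED 0x1 /* assumed proxy flag values */
  #define PR_FL_STOPPED  0x2

  struct toy_task { unsigned int expire; };

  /* from the check's task handler: when the proxy is disabled or
   * stopped, don't rearm the timer so the task sleeps forever instead
   * of waking up periodically for nothing */
  static int check_unschedule_if_stopped(struct toy_task *t,
                                         unsigned int prx_flags)
  {
          if (prx_flags & (PR_FL_DISABLED | PR_FL_STOPPED)) {
                  t->expire = TICK_ETERNITY;
                  return 1;
          }
          return 0;
  }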
This is a better and more general solution to the problem described in
this commit:
BUG/MINOR: checks: postpone the startup of health checks by the boot time
Now we're updating the now_offset that is used to compute now_ms at the
few points where we update the ready date during boot. This ensures that
now_ms, while remaining stable during the whole boot process, will be
correct and will start with the boot value right after the boot is
finished. As such the patch above is rolled back (we don't want to count
the boot time twice).
This must not be backported because it relies on the more flexible clock
architecture in 2.8.
When health checks are started at boot, now_ms could be off by the boot
time. In general it's not even noticeable, but with very large configs
taking up to one or even a few seconds to start, this can result in a
part of the servers' checks being scheduled slightly in the past. As
such all of them will start grouped together, partially defeating the
purpose of the spread-checks setting. For example, this can cause a burst
of connections for the network, or an excess of CPU usage during SSL
handshakes, possibly even causing some timeouts to expire early.
Here in order to compensate for this, we simply add the known boot time
to the computed delay when scheduling the startup of checks. That's very
simple and particularly efficient. For example, a config with 5k servers
in 800 backends checked every 5 seconds, that was taking 3.8 seconds to
start, used to show this distribution of health checks despite
spread-checks 50:
3690 08:59:25
417 08:59:26
213 08:59:27
71 08:59:28
428 08:59:29
860 08:59:30
918 08:59:31
938 08:59:32
1124 08:59:33
904 08:59:34
647 08:59:35
890 08:59:36
973 08:59:37
856 08:59:38
893 08:59:39
154 08:59:40
Now with the fix it shows this:
470 08:59:59
929 09:00:00
896 09:00:01
937 09:00:02
854 09:00:03
827 09:00:04
906 09:00:05
863 09:00:06
913 09:00:07
873 09:00:08
162 09:00:09
This should be backported to all supported versions. It depends on
this commit:
MINOR: clock: measure the total boot time
For 2.8 where the internal clock is now totally independent of the human
one, a more generic fix will consist in simply updating now_ms to reflect
the startup time.
In process_chk_conn(), some assignments to NULL are useless and are
reported by Coverity as unused values. While it is harmless, these
assignments can be removed.
This patch should fix the coverity report #2158.
This makes use of spread-checks also for the startup of the check tasks.
This provides a smoother load on startup for uneven configurations which
tend to enable only *some* servers. Below is the connection distribution
per second of the SSL checks of a config with 5k servers spread over 800
backends, with a check inter of 5 seconds:
- default:
682 08:00:50
826 08:00:51
773 08:00:52
1016 08:00:53
885 08:00:54
889 08:00:55
825 08:00:56
773 08:00:57
1016 08:00:58
884 08:00:59
888 08:01:00
491 08:01:01
- with spread-checks 50:
437 08:01:19
866 08:01:20
777 08:01:21
1023 08:01:22
1118 08:01:23
923 08:01:24
641 08:01:25
859 08:01:26
962 08:01:27
860 08:01:28
929 08:01:29
909 08:01:30
866 08:01:31
849 08:01:32
114 08:01:33
- with spread-checks 50 + this patch:
680 08:01:55
922 08:01:56
962 08:01:57
899 08:01:58
819 08:01:59
843 08:02:00
916 08:02:01
896 08:02:02
886 08:02:03
846 08:02:04
903 08:02:05
894 08:02:06
178 08:02:07
The load is much smoother from the start, this can help initial health
checks succeed when many target the same overloaded server for example.
This could be backported as it should make border-line configs more
reliable across reloads.
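A sketch of the startup scheduling with spread-checks applied
(simplified arithmetic; the real code uses HAProxy's ticks and PRNG):

  #include <stdlib.h>

  /* first wakeup delay of a check: start somewhere within the check
   * period (srv_count > 0), then apply spread-checks as a +/- percentage
   * just like for regular runs (simplified) */
  static unsigned int first_check_delay(unsigned int inter_ms,
                                        unsigned int spread_pct,
                                        unsigned int srv_rank,
                                        unsigned int srv_count)
  {
          int d = (int)(inter_ms * srv_rank / srv_count);

          if (spread_pct) {
                  /* random offset in [-spread_pct%, +spread_pct%] */
                  int r = (int)(rand() % (2 * spread_pct + 1)) - (int)spread_pct;

                  d += (int)inter_ms * r / 100;
                  if (d < 0)
                          d = 0;
          }
          return (unsigned int)d;
  }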
Adding a new event type: SERVER_CHECK.

This event is published when a server's check state ought to be reported
(check status change or check result).

The SERVER_CHECK event is provided as a server event with additional data
carrying relevant check context such as the check's result and health.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never null and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms, which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today,
or in just using global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems; the difference might come from
smaller ones. Better not change anything for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
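For reference, the conversions involved are straightforward; a sketch
(2^64 ns is roughly 585 years, hence the no-wrap claim):

  #include <stdint.h>
  #include <sys/time.h>

  /* convert a timeval to a 64-bit count of nanoseconds */
  static inline uint64_t tv_to_ns(const struct timeval *tv)
  {
          return (uint64_t)tv->tv_sec * 1000000000ULL +
                 (uint64_t)tv->tv_usec * 1000ULL;
  }

  static inline uint64_t ns_to_sec(uint64_t ns) { return ns / 1000000000ULL; }
  static inline uint64_t ns_to_ms(uint64_t ns)  { return ns / 1000000ULL; }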
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
Instead of operating on {sec, usec} now we convert both operands to
ns then subtract them and convert to ms. This is a first step towards
dropping timeval from these timestamps.
Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at
all and could be removed.
This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type".

While looking at current srv_op_st_chg_cause usage, it was clear that
the struct needed some cleanup, since some leftovers from asynchronous
server state change updates were left behind, resulting in useless code
duplication and making the whole thing harder to maintain.
Two observations were made:
- by tracking down srv_set_{running, stopped, stopping} usage,
we can see that the <reason> argument is always a fixed statically
allocated string.
- check-related state change context (duration, status, code...) is
not used anymore since srv_append_status() directly extracts the
values from the server->check. This is pure legacy from when
the state changes were applied asynchronously.
To prevent code duplication and useless string copies, and to make the
reason/cause more exportable, we now store it as an enum, and we provide
the srv_op_st_chg_cause() function to fetch the related description
string. HEALTH and AGENT causes (check related) are now explicitly
identified to make consumers like srv_append_op_chg_cause() able to
fetch check info from the server itself if they need to.
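A sketch of the storage change; enum values and descriptions below are
illustrative:

  /* operational state change causes, stored as an enum instead of a
   * copied string (names illustrative) */
  enum srv_op_st_chg_cause {
          SRV_OP_STCHGC_NONE = 0,
          SRV_OP_STCHGC_HEALTH, /* health check: details in srv->check */
          SRV_OP_STCHGC_AGENT,  /* agent check: details in srv->agent */
          SRV_OP_STCHGC_CLI,    /* state changed from the CLI */
  };

  static const char *srv_op_st_chg_cause_str(enum srv_op_st_chg_cause c)
  {
          switch (c) {
          case SRV_OP_STCHGC_HEALTH: return "health check";
          case SRV_OP_STCHGC_AGENT:  return "agent check";
          case SRV_OP_STCHGC_CLI:    return "changed from CLI";
          default:                   return "";
          }
  }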
srv_append_status() has become a swiss-army-knife function over time.
It is used from server code and also from checks code, with various
inputs and distinct code paths, making it very hard to guess the
actual behavior of the function (resulting string output).
To simplify the logic behind it, we're dividing it in multiple contextual
functions that take simple inputs and do explicit things, making them
more predictable and easier to maintain.
The srv_drop() function is responsible for freeing the server when the
refcount reaches 0.

There is one exception: when global.mode has the MODE_STOPPING flag set,
srv_drop() will ignore the refcount and free the server on the first
invocation.
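The pre-fix behavior can be sketched as follows (simplified, names
assumed):

  #include <stdlib.h>

  #define MODE_STOPPING 0x1 /* assumed flag value */

  struct toy_server { unsigned int refcount; };

  /* drop a reference and free the server once the count reaches zero.
   * The old exception: in stopping mode, free on the first invocation,
   * which is what made double-frees possible once several holders
   * started calling srv_drop() during deinit */
  static void srv_drop(struct toy_server *srv, unsigned int global_mode)
  {
          if (global_mode & MODE_STOPPING) {
                  free(srv);
                  return;
          }
          if (--srv->refcount == 0)
                  free(srv);
  }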
This logic has been implemented with 13f2e2ce ("BUG/MINOR: server: do
not use refcount in free_server in stopping mode") and back then doing
so was not a problem since dynamic server API was just implemented and
srv_take() and srv_drop() were not widely used.
Now that the dynamic server API is starting to get more popular we cannot
afford to keep the current logic: some modules or lua scripts may hold
references to existing servers and also do their cleanup in deinit
phases. In this kind of situation, it would be easy to trigger
double-frees since every call to srv_drop() on a specific server will try
to free it.
To fix this, we take a different approach and try to fix the issue at
the source: we now properly drop server references involved with
checks/agent_checks in deinit_srv_check() and deinit_srv_agent_check().
While this could theoretically be backported up to 2.6, it is not very
relevant for now since srv_drop() usage in older versions is very
limited and we're only starting to face the issue in mid-2.8
developments (i.e. lua core updates).
With previous commit, 9e080bf ("BUG/MINOR: checks: make sure fastinter is used
even on forced transitions"), on-error mark-down|sudden-death|fail-check are
now working as expected.
However, on-error fastinter remains broken because srv_getinter(), used in
the above commit to check the expiration date, won't return the fastinter
interval if the server's health is maxed out (which is the case with
on-error fastinter mode).
To fix this, we introduce a check flag named CHK_ST_FASTINTER.
This flag is set when on-error is triggered. This way we can force
srv_getinter() to return fastinter interval whenever the flag is set.
The flag is automatically cleared as soon as the new check task expiry is
recalculated in process_chk_conn().
This restores original behavior prior to d114f4a ("MEDIUM: checks: spread the
checks load over random threads").
It must be backported to 2.7 along with the aforementioned commits.
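A sketch of the resulting srv_getinter() logic (simplified, field names
assumed):

  #define CHK_ST_FASTINTER 0x4 /* flag value illustrative */

  struct toy_check {
          unsigned int state;
          int health, rise, fall; /* health in [0 .. rise+fall-1] */
          int inter, fastinter;   /* intervals in milliseconds */
  };

  /* interval for the next check: fastinter applies while the server is
   * transitioning, or when on-error just forced it via CHK_ST_FASTINTER,
   * even with the health counter maxed out */
  static int srv_getinter(const struct toy_check *c)
  {
          if (c->fastinter > 0 &&
              ((c->state & CHK_ST_FASTINTER) ||
               (c->health > 0 && c->health < c->rise + c->fall - 1)))
                  return c->fastinter;
          return c->inter;
  }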
Aurélien also found that while previous commit a56798ea4 ("BUG/MEDIUM:
checks: do not reschedule a possibly running task on state change")
addressed one specific case where the check's task had to be woken up
quickly, it's not always sufficient, as the check will not yet be
considered as expired regarding the fastinter.
Let's make sure we do consider this specific case to update the timer
based on the new state if the new value is shorter. This particularly
means that even if the timer is not expired yet during a wakeup when
nothing is in progress, we need to check whether applying the currently
effective interval to the current date would expire earlier than what is
programmed; in that case the timer needs to be updated. I.e. make sure
we never miss fastinter during a state transition before the end of the
current period.
The approach is not pretty, but it forces a new pass through the existing
block dedicated to updating the timer when the current one is expired and
the updated one would fire earlier.
This must be backported to 2.7 along with the commit above.
Aurélien found an issue introduced in 2.7-dev8 with commit d114f4a68
("MEDIUM: checks: spread the checks load over random threads"), but which
in fact has deeper roots.
When a server's state is changed via __health_adjust(), if a fastinter
setting is set, the task gets rescheduled to run at the new date. The
way it's done is not thread safe, as nothing prevents another thread
where the task is already running from also updating the expire field
in parallel. But since such events are quite rare, this statistically
never happens. However, with the commit above, the tasks are no longer
required to go to the shared wait queue and are no longer marked as
shared between multiple threads. It's just that *any* thread may run
them at a time without implying that all of them are allowed to modify
them. And this change is sufficient to trigger the BUG_ON() condition
in the scheduler that detects the inconsistency between a task queued
in one thread and being manipulated in parallel by another one:
FATAL: bug condition "task->tid != tid" matched at
include/haproxy/task.h:670
call trace(13):
| 0x55f61cf520c9 [c6 04 25 01 00 00 00 00]: main-0x2ee7
| 0x55f61d0646e8 [8b 45 08 a8 40 0f 85 65]: back_handle_st_cer+0x78/0x4d7
| 0x55f61cff3e72 [41 0f b6 4f 01 e9 c8 df]: process_stream+0x2252/0x364f
| 0x55f61d0d2fab [48 89 c3 48 85 db 74 75]: run_tasks_from_lists+0x34b/0x8c4
| 0x55f61d0d38ad [29 44 24 18 8b 54 24 18]: process_runnable_tasks+0x37d/0x6c6
| 0x55f61d0a22fa [83 3d 0b 63 1e 00 01 0f]: run_poll_loop+0x13a/0x536
| 0x55f61d0a28c9 [48 8b 1d f0 46 19 00 48]: main+0x14d919
| 0x55f61cf56dfe [31 c0 e8 eb 93 1b 00 31]: main+0x1e4e/0x2d5d
At first glance it looked like it could be addressed in the scheduler
only, but in fact the problem clearly is at the application level, since
some shared fields are manipulated without protection. At minimum, the
task's expiry ought to be touched only under the server's lock. While
it's arguable that the scheduler could make such updates easier, changing
it alone will not be sufficient here.
Looking at the sequencing closer, it becomes obvious that we do not need
this task_schedule() at all: a simple task_wakeup() is sufficient for the
callee to update its timers. Indeed, the process_chk_conn() function already
deals with spurious wakeups, and already uses srv_getinter() to calculate
the next wakeup date based on the current state. So here, instead of
having to queue the task from __health_adjust() to anticipate a new check,
we can simply wake the task up and let it decide when it needs to run
next. This is much cleaner as the expiry calculation remains performed at
a single place, from the task itself, as it should be, and it fixes the
problem above.
This should be backported to 2.7, but not to older versions where the
risks of breakage are higher than the chance to fix something that
ever happened.
The CPU usage pattern was found to be high (5%) on a machine with
48 threads and only 100 servers checked every second. That was
supposed to be only 100 connections per second, which should be very
cheap. It was figured that due to the check tasks unbinding from any
thread when going back to sleep, they're queued into the shared queue.
Not only does this require manipulating the global queue lock, but in
addition it means that all threads have to check the global queue
before going to sleep (hence take a lock again) to figure out how long
to sleep, and that they would all sleep only for the time remaining
until the next check; then one thread would pick it up and all the
other ones would go back to sleep waiting for the following check.
That's perfectly visible in time-to-first-byte measurements. A quick
test consisting in retrieving the stats page in CSV over a 48-thread
process checking 200 servers every 2 seconds shows the following tail:
percentile ttfb(ms)
99.98 2.43
99.985 5.72
99.99 32.96
99.995 82.176
99.996 82.944
99.9965 83.328
99.997 83.84
99.9975 84.288
99.998 85.12
99.9985 86.592
99.999 88
99.9995 89.728
99.9999 100.352
One solution could consist in forcefully binding checks to threads at
boot time, but that's annoying, will cause trouble for dynamic servers
and may cause some skew in the load depending on some server patterns.
Instead here we take a different approach. A check remains bound to its
thread for as long as possible, but upon every wakeup, the thread's load
is compared with another random thread's load. If it's found that that
other thread's load is less than half of the current one's, the task is
bounced to that thread. In order to prevent that new thread from doing
the same, we set a flag "CHK_ST_SLEEPING" that indicates that it just
woke up and we're bouncing the task only on this condition.
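The bounce decision can be sketched like this (simplified; counter and
flag names assumed):

  #define CHK_ST_SLEEPING 0x1 /* flag value assumed */

  struct toy_thread { unsigned int active_checks; };

  /* upon wakeup, a sleeping check may bounce to a random other thread
   * when that thread runs less than half as many checks as ours */
  static int check_should_bounce(const struct toy_thread *cur,
                                 const struct toy_thread *other,
                                 unsigned int check_state)
  {
          if (!(check_state & CHK_ST_SLEEPING))
                  return 0; /* only just-woken sleeping checks may move */
          return other->active_checks < cur->active_checks / 2;
  }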
Tests have shown that the initial load was very unfair before, with a few
check threads having a load of 15-20 and the vast majority having zero.
With this modification, after two "inter" delays, the load is either zero
or one everywhere when checks start. The same test shows a CPU usage that
significantly drops, between 0.5 and 1%. The same latency tail measurement
is much better, roughly 10 times smaller:
percentile ttfb(ms)
99.98 1.647
99.985 1.773
99.99 4.912
99.995 8.76
99.996 8.88
99.9965 8.944
99.997 9.016
99.9975 9.104
99.998 9.224
99.9985 9.416
99.999 9.8
99.9995 10.04
99.9999 10.432
In fact one difference here is that many threads work while in the past
they were waking up and going down to sleep after having perturbed the
shared lock. Thus it is anticipated that this will scale much more
smoothly than before. Under strace it's clearly visible that all threads
are sleeping for the time it takes to relaunch a check; there are no more
thundering herd wakeups.
However it is also possible that in some rare cases, such as very short
check intervals smaller than a scheduler's timeslice (such as 4ms),
some users might have benefited from the work being concentrated on
fewer threads, and would instead observe a small increase of apparent
CPU usage due to more total threads waking up, even if that's for less
work each and less total work. That's visible with 200 servers at 4ms,
where show activity shows that a few threads were overloaded while
others were doing nothing. It's not a problem though, as in practice
checks are not supposed to eat much CPU nor to wake up frequently enough
to represent a significant load anyway, and the main issue they could
have been causing (aside from the global lock) is an increase in
last-percentile latency.
There's no point using ha_random32() which is heavy and uses shared
variables to calculate a random timer when we have statistical_prng()
which does the same and was made exactly for this.
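statistical_prng() is essentially a tiny thread-local xorshift
generator; the idea, sketched (the exact constants and seeding may
differ):

  #include <stdint.h>

  /* cheap thread-local PRNG: good enough for statistical purposes such
   * as spreading timers, with no shared state and no locking */
  static __thread uint32_t prng_state = 2463534242U; /* any nonzero seed */

  static inline uint32_t statistical_prng(void)
  {
          uint32_t x = prng_state;

          x ^= x << 13;
          x ^= x >> 17;
          x ^= x << 5;
          return prng_state = x;
  }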
It is only a real problem for agent-checks when there is no agent string to
send. The condition to disable TCP_QUICKACK was only based on the action
type following the connect one. But it is not always accurate. Indeed, for
agent-checks, there is always a SEND action. But if there is no "agent-send"
string defined, nothing is sent. In this case, this adds 200ms of latency
for no reason.
To fix the bug, a flag is now used on the CONNECT action to indicate that
there are data to be sent after the connect. For health-checks, this flag
is set if the action following the connect is a SEND action. For
agent-checks, it is set if an "agent-send" string is defined.
This patch should fix the issue #1836. It must be backported as far as 2.2.
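The principle of the fix, sketched with an assumed flag name:

  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  #define TCPCHK_OPT_HAS_DATA 0x1 /* assumed flag on the CONNECT action */

  /* disable delayed ACKs only when the connect will be followed by
   * data; for agent-checks this means an "agent-send" string is set,
   * otherwise leaving quick-ack alone avoids a useless 200ms delay */
  static void maybe_disable_quickack(int fd, unsigned int connect_opts)
  {
          int zero = 0;

          if (connect_opts & TCPCHK_OPT_HAS_DATA)
                  setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK,
                             &zero, sizeof(zero));
  }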
This macro was used both for binding and for lookups. When binding tasks
or FDs, using all_threads_mask instead is better as it will later be per
group. For lookups, ~0UL always does the job. Thus in practice the macro
was already almost not used anymore since the rest of the code could run
fine with a constant of all ones there.
Since the conn-stream refactoring, a health-check's stream-connector is
always defined while the check is in progress. So, some tests on it are
useless and can be removed.
This patch should fix the issue #1739.
There is no server for email alerts. So the trace messages must be adapted
to handle this case. Information related to the server is now skipped for
email alerts and an "[EMAIL]" prefix is used.
This patch must be backported as far as 2.4.
The check struct had a "cs" field renamed to "sc", which also required
a tiny update to a few functions using it to distinguish a check from
a stream (log.c, payload.c, ssl_sample.c, tcp_sample.c, tcpcheck.c,
connection.c).
Function arguments and local variables called "cs" were renamed to "sc".
The presence of one "cs=" in the debugging traces was also turned to
"sc=" for consistency.
The following functions which act on a connection-based stream connector
were renamed to sc_conn_* (~60 places):
cs_conn_drain_and_shut
cs_conn_process
cs_conn_read0
cs_conn_ready
cs_conn_recv
cs_conn_send
cs_conn_shut
cs_conn_shutr
cs_conn_shutw