The sleep time calculation in next_event_delay() was wrong because it
was dividing 999 by the number of pending events, and was directly
responsible for an observation made a long time ago that listeners
would eat all the CPU when hammered while globally rate-limited,
because the more the queued events, the least it would wait, and would
ignore the configured frequency to compute the delay.
This was addressed in various ways in listeners through the switch to
the FULL state and the wakeup of manage_global_listener_queue() that
avoids this fast loop, but the calculation made there remained wrong
nevertheless. It's even visible with this patch that the accept
frequency is much more accurate at low values now; for example,
configuring a maxconrate of 10 would give between 8.99 and 11.0 cps
before this patch and between 9.99 and 10.0 with it.
Better fix it now in case it's reused anywhere else and causes confusion
again. It maybe be backported but is probably not worth it.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
And also rename standard.c to tools.c. The original split between
tools.h and standard.h dates from version 1.3-dev and was mostly an
accident. This patch moves the files back to what they were expected
to be, and takes care of not changing anything else. However this
time tools.h was split between functions and types, because it contains
a small number of commonly used macros and structures (e.g. name_desc)
which in turn cause the massive list of includes of tools.h to conflict
with the callers.
They remain the ugliest files of the whole project and definitely need
to be cleaned and split apart. A few types are defined there only for
functions provided there, and some parts are even OS-specific and should
move somewhere else, such as the symbol resolution code.
types/freq_ctr.h was moved to haproxy/freq_ctr-t.h and proto/freq_ctr.h
was moved to haproxy/freq_ctr.h. Files were updated accordingly, no other
change was applied.
This one is included almost everywhere and used to rely on a few other
.h that are not needed (unistd, stdlib, standard.h). It could possibly
make sense to split it into multiple parts to distinguish operations
performed on timers and the internal time accounting, but at this point
it does not appear much important.
The hathreads.h file has quickly become a total mess because it contains
thread definitions, atomic operations and locking operations, all this for
multiple combinations of threads, debugging and architectures, and all this
done with random ordering!
This first patch extracts all the atomic ops code from hathreads.h to move
it to haproxy/atomic.h. The code there still contains several sections
based on non-thread vs thread, and GCC versions in the latter case. Each
section was arranged in the exact same order to ease finding.
The redundant HA_BARRIER() which was the same as __ha_compiler_barrier()
was dropped in favor of the latter which follows the naming convention
of all other barriers. It was only used in freq_ctr.c which was updated.
Additionally, __ha_compiler_barrier() was defined inconditionally but
used only for thread-related operations, so it was made thread-only like
HA_BARRIER() used to be. We'd still need to have two types of compiler
barriers, one for the general case (e.g. signals) and another one for
concurrency, but this was not addressed here.
Some comments were added at the beginning of each section to inform about
the use case and warn about the traps to avoid.
Some files which continue to include hathreads.h solely for atomic ops
should now be updated.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
commit 6e01286 (BUG/MAJOR: threads/freq_ctr: fix lock on freq counters)
attempted to fix the loop using volatile but that doesn't work depending
on the level of optimization, resulting in situations where the threads
could remain looping forever. Here we use memory barriers between reads
to enforce a strict ordering and the asm code produced does exactly what
the C code does and works perfectly, with a 3-digit measurement accuracy
observed during a test.
The wrong bit was set to keep the lock on freq counter update. And the read
functions were re-worked to use volatile.
Moreover, when a freq counter is updated, it is now rotated only if the current
counter is in the past (now.tv_sec > ctr->curr_sec). It is important with
threads because the current time (now) is thread-local. So, rounded to the
second, the time may vary by more or less 1 second. So a freq counter rotated by
one thread may be see 1 second in the future. In this case, it is updated but
not rotated.
When a frequency counter must be updated, we use the curr_sec/curr_tick fields
as a lock, by setting the MSB to 1 in a compare-and-swap to lock and by reseting
it to unlock. And when we need to read it, we loop until the counter is
unlocked. This way, the frequency counters are thread-safe without any external
lock. It is important to avoid increasing the size of many structures (global,
proxy, server, stick_table).
When a frontend is rate-limited to 1000 connections per second, the
effective rate measured from the client is 999/s, and connections
experience an average response time of 99.5 ms with a standard
deviation of 2 ms.
The reason for this inaccuracy is that when computing frequency
counters, we use one part of the previous value proportional to the
number of milliseconds remaining in the current second. But even the
last millisecond still uses a part of the past value, which is wrong :
since we have a 1ms resolution, the last millisecond must be dedicated
only to filling the current second.
So we slightly adjust the algorithm to use 999/1000 of the past value
during the first millisecond, and 0/1000 of the past value during the
last millisecond. We also slightly improve the computation by computing
the remaining time instead of the current time in tv_update_date(), so
that we don't have to negate the value in each frequency counter.
Now with the fix, the connection rate measured by both the client and
haproxy is a steady 1000/s, the average response time measured is 99.2ms
and more importantly, the standard deviation has been divided by 3 to
0.6 millisecond.
This fix should also be backported to 1.4 which has the same issue.
Some freq counters will have to work on periods different from 1 second.
The original freq counters rely on the period to be exactly one second.
The new ones (freq_ctr_period) let the user define the period in ticks,
and all computations are operated over that period. When reading a value,
it indicates the amount of events over that period too.
It's easier to take the counter's age into account when consulting it
than to rotate it first. It also saves some CPU cycles and avoids the
multiply for outdated counters, finally saving CPU cycles here too
when multiple operations need to read the same counter.
The freq_ctr code has also shrinked by one third consecutively to these
optimizations.
The rate-limit was applied to the smoothed value which does a special
case for frequencies below 2 events per period. This caused irregular
limitations when set to 1 session per second.
The proper way to handle this is to compute the number of remaining
events that can occur without reaching the limit. This is what has
been added. It also has the benefit that the frequency calculation
is now done once when entering event_accept(), before the accept()
loop, and not once per accept() loop anymore, thus saving a few CPU
cycles during very high loads.
With this fix, rate limits of 1/s are perfectly respected.
With this change, all frontends, backends, and servers maintain a session
counter and a timer to compute a session rate over the last second. This
value will be very useful because it varies instantly and can be used to
check thresholds. This value is also reported in the stats in a new "rate"
column.