Commit Graph

21 Commits

Author SHA1 Message Date
Willy Tarreau
e327b4a73e MINOR: freq_ctr: add opportunistic versions of swrate_add()
Some uses of swrate_add() only consist in getting a rough estimate of
a frequency. There are cases where speed matters more than accuracy
(e.g. pools). For such use cases, let's just stop looping on the CAS,
if the update fails, another thread is already providing input, and
it's not dramatic to lose the race. All these functions are now
suffixed with "_opportunistic".
2022-12-20 14:51:12 +01:00
Willy Tarreau
f6a42c3a37 MINOR: freq_ctr: use the thread's local time whenever possible
Right now when dealing with freq_ctr updates, we're using the process-
wide monotinic time, and accessing it is expensive since every thread
needs to update it, so this adds some contention. However we don't need
it all the time, the thread's local time is most of the time strictly
equal to the global time, and may be off by one millisecond when the
global time is switched to the next one by another thread, and in this
case we don't want to use the local time because it would risk to cause
a rotation of the counter. But that's precisely the condition we're
already relying on for the slow path!

What this patch does is to add a check for the period against the
local time prior to anything else, and immediately return after
updating the counter if still within the period, otherwise fall back
to the existing code. Given that the function starts to inflate a bit,
it was split between s very short inline part that does the hot path,
and the slower fallback that's in a cold function. It was measured that
on a 24-CPU machine it was called ~0.003% of the time.

The resulting improvement sits between 2 and 3% at 500k req/s tracking
an http_req_rate counter.
2022-10-12 14:19:05 +02:00
Christopher Faulet
aa55640b8c MINOR: freq_ctr: Add a function to get events excess over the current period
freq_ctr_overshoot_period() function may be used to retrieve the excess of
events over the current period for a givent frequency counter, ignoring the
history. It is a way compare the "current rate" (the number of events over
the current period) to a given rate and estimate the excess of events.

It may be used to safely add new events, especially at the begining of the
current period for a frequency counter with large period. This way, it is
possible to smoothly add events during the whole period without quickly
consuming all the quota at the beginning of the period and waiting for the
next one to be able to add new events.
2022-06-22 18:33:27 +02:00
Willy Tarreau
9310f481ce CLEANUP: tree-wide: remove unneeded include time.h in ~20 files
20 files used to have haproxy/time.h included only for now_ms, and two
were missing it for other things but used to inherit from it via other
files.
2021-10-07 01:41:14 +02:00
Willy Tarreau
55a0975b1e BUG/MINOR: freq_ctr: use stricter barriers between updates and readings
update_freq_ctr_period() was using relaxed atomics without using barriers,
which usually works fine on x86 but not everywhere else. In addition, some
values were read without being enclosed by barriers, allowing the compiler
to possibly prefetch them a bit earlier. Finally, freq_ctr_total() was also
reading these without enough barriers. Let's make explicit use of atomic
loads and atomic stores to get rid of this situation. This required to
slightly rearrange the freq_ctr_total() loop, which could possibly slightly
improve performance under extreme contention by avoiding to reread all
fields.

A backport may be done to 2.4 if a problem is encountered, but last tests
on arm64 with LSE didn't show any issue so this can possibly stay as-is.
2021-08-01 17:34:06 +02:00
Willy Tarreau
aa33f20e27 MINOR: freq_ctr: add new functions to report float measurements
For stats reporting it can be convenient to report floats at low rates
instead of discrete integers. We do have quite some precision since we
currently divide counters by number of milliseconds, so we can usually
add 3 digits after the decimal point.
2021-05-08 10:48:17 +02:00
Willy Tarreau
b4476c6a8c CLEANUP: freq_ctr: make arguments of freq_ctr_total() const
freq_ctr_total() doesn't modify the freq counters, it should take a
const argument.
2021-04-28 17:44:37 +02:00
Willy Tarreau
d46ed5c26b MINOR: freq_ctr: simplify and improve the update function
update_freq_ctr_period() was still not very clean and didn't wait for
the rotation lock to be dropped before trying again, thus maintaining
the contention at a high level. In addition, the rotation update was
made in three steps, which are not very efficient in terms of bus
cycles.

Here the wait loop was reworked so that the fast path remains short
and that the contended path waits for the lock to be dropped before
attempting another write, but it only waits a relax cycle before
attempting a read. The rotation block was simplified to remove a
test that was already validated by the first loop, and so that the
retrieval of the current period, its reset and its increment are all
performed in a single atomic op and the store to the previous period
is performed immediately after.

All this results in significantly smaller code for the inline function
(~1kB total) and a shorter critical path.
2021-04-11 14:01:53 +02:00
Willy Tarreau
6339c19cac MINOR: freq_ctr: add cpu_relax in the rotation loop of update_freq_ctr_period()
When counters are rotated, there is contention between the threads which
can slow down the operation of the thread performing the rotation. Let's
apply a cpu_relax there to let the first thread finish faster.
2021-04-11 11:12:57 +02:00
Willy Tarreau
fc6323ad82 MEDIUM: freq_ctr: replace the per-second counters with the generic ones
It remains cumbersome to preserve two versions of the freq counters and
two different internal clocks just for this. In addition, the savings
from using two different mechanisms are not that important as the only
saving is a divide that is replaced by a multiply, but now thanks to
the freq_ctr_total() unificaiton the code could also be simplified to
optimize it in case of constants.

This patch turns all non-period freq_ctr functions to static inlines
which call the period-based ones with a period of 1 second. A direct
benefit is that a single internal clock is now needed for any counter
and that they now all rely on ticks.

These 1-second counters are essentially used to report request rates
and to enforce a connection rate limitation in listeners. It was
verified that these continue to work like before.
2021-04-11 11:12:55 +02:00
Willy Tarreau
fa1258f02c MINOR: freq_ctr: unify freq_ctr and freq_ctr_period into freq_ctr
Both structures are identical except the name of the field starting
the period and its description. Let's call them all freq_ctr and the
period's start "curr_tick" which is generic.

This is only a temporary change and fields are expected to remain
the same with no code change (verified).
2021-04-11 11:11:27 +02:00
Willy Tarreau
d209c87142 MINOR: freq_ctr: add the missing next_event_delay_period()
There was still no function to compute a wait time for periods, let's
implement it on top of freq_ctr_total() as we'll soon need it for the
per-second one. The divide here is applied on the frequency so that it
will be replaced with a reciprocal multiply when constant.
2021-04-11 11:11:03 +02:00
Willy Tarreau
607be24a85 MEDIUM: freq_ctr: reimplement freq_ctr_remain_period() from freq_ctr_total()
Now the function becomes an inline one and only contains a divide and
a max. The divide will automatically go away with constant periods.
2021-04-11 11:11:03 +02:00
Willy Tarreau
a7a31b2602 MEDIUM: freq_ctr: make read_freq_ctr_period() use freq_ctr_total()
This one is the easiest to implement, it just requires a call and a
divide of the result. Anti-flapping correction for low-rates was
preserved.

Now calls using a constant period will be able to use a reciprocal
multiply for the period instead of a divide.
2021-04-11 11:11:03 +02:00
Willy Tarreau
f3a9f8dc5a MINOR: freq_ctr: add a generic function to report the total value
Most of the functions designed to read a counter over a period go through
the same complex loop and only differ in the way they use the returned
values, so it was worth implementing all this into freq_ctr_total() which
returns the total number of events over a period so that the caller can
finish its operation using a divide or a remaining time calculation. As
a special case, read_freq_ctr_period() doesn't take pending events but
requires to enable an anti-flapping correction at very low frequencies.
Thus the function implements it when pend<0.

Thanks to this function it will be possible to reimplement the other ones
as inline and merge the per-second ones with the arbitrary period ones
without always adding the cost of a 64 bit divide.
2021-04-11 11:10:57 +02:00
Willy Tarreau
1db427399c CLEANUP: atomic: add an explicit _FETCH variant for add/sub/and/or
Currently our atomic ops return a value but it's never known whether
the fetch is done before or after the operation, which causes some
confusion each time the value is desired. Let's create an explicit
variant of these operations suffixed with _FETCH to explicitly mention
that the fetch occurs after the operation, and make use of it at the
few call places.
2021-04-07 18:18:37 +02:00
Willy Tarreau
8cc586c73f BUG/MEDIUM: freq_ctr/threads: use the global_now_ms variable
In commit a1ecbca0a ("BUG/MINOR: freq_ctr/threads: make use of the last
updated global time"), for period-based counters, the millisecond part
of the global_now variable was used as the date for the new period. But
it's wrong, it only works with sub-second periods as it wraps every
second, and for other periods the counters never rotate anymore.

Let's make use of the newly introduced global_now_ms variable instead,
which contains the global monotonic time expressed in milliseconds.

This patch needs to be backported wherever the patch above is backported.
It depends on previous commit "MINOR: time: also provide a global,
monotonic global_now_ms timer".
2021-03-23 09:03:37 +01:00
Willy Tarreau
6f9f2c0857 MINOR: freq_ctr/threads: relax when failing to update a sliding window value
The swrate_add* functions would sping fast on a failed CAS, better place
a cpu_relax() call there to reduce contention if any.
2021-03-17 19:36:15 +01:00
Willy Tarreau
a1ecbca0a5 BUG/MINOR: freq_ctr/threads: make use of the last updated global time
The freq counters were using the thread's own time as the start of the
current period. The problem is that in case of contention, it was
occasionally possible to perform non-monotonic updates on the edge of
the next second, because if the upfront thread updates a counter first,
it causes a rotation, then the second thread loses the race from its
older time, and tries again, and detects a different time again, but
in the past so it only updates the counter, then a third thread on the
new date would detect a change again, thus provoking a rotation again.

The effect was triple:
  - rare loss of stored values during certain transitions from one
    period to the next one, causing counters to report 0
  - half of the threads forced to go through the slow path every second
  - difficult convergence when using many threads where the CAS can fail
    a lot and we can observe N(N-1) attempts for N threads to complete

This patch fixes this issue in two ways:
  - first, it now makes use og the monotonic global_now value which also
    happens to be volatile and to carry the latest known time; this way
    time will never jump backwards anymore and only the first thread
    updates it on transition, the other ones do not need to.

  - second, re-read the time in the loop after each failure, because
    if the date changed in the counter, it means that one thread knows
    a more recent one and we need to update. In this case if it matches
    the new current second, the fast path is usable.

This patch relies on previous patch "MINOR: time: export the global_now
variable" and must be backported as far as 1.8.
2021-03-17 19:36:15 +01:00
Willy Tarreau
6784c99463 CLEANUP: include: make atomic.h part of the base API
Atomic ops are used about everywhere, let's make them part of the base
API by including atomic.h in api.h.
2020-06-11 10:18:59 +02:00
Willy Tarreau
6634794992 REORG: include: move freq_ctr to haproxy/
types/freq_ctr.h was moved to haproxy/freq_ctr-t.h and proto/freq_ctr.h
was moved to haproxy/freq_ctr.h. Files were updated accordingly, no other
change was applied.
2020-06-11 10:18:56 +02:00