OPTIM: ring: check the queue's owner using a CAS on x86

In the loop where the queue's leader tries to get the tail lock,
we also need to check whether another thread took ownership of the
queue the current thread is working for. This is currently done
using an atomic load.

Tests show that on x86, using a CAS for this is much more efficient
because it keeps the cache line in exclusive state for a few more
cycles, which lets the queue release call after the loop complete
without having to wait again. The measured gain is +5% for 128
threads on a 64-core AMD system (11.08M msg/s vs 10.56M). However,
ARM loses about 1% on this, and we cannot afford that on machines
without a fast CAS anyway, so the load is performed using a CAS only
on x86_64. It might not be as efficient on low-end models, but we
don't care since they are not the ones dealing with high contention.
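
For readers unfamiliar with the "read using a CAS" idiom, here is a minimal sketch in portable C11 atomics rather than HAProxy's HA_ATOMIC_* macros. The names `struct cell`, `read_owner()` and `read_owner_demo()` are hypothetical and exist only for illustration. The sketch shows the semantics only (a compare-and-swap whose new value equals the expected one never changes the stored pointer, yet always reports what is stored); it does not demonstrate the cache-line effect described above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a queue cell; not HAProxy's real type. */
struct cell { int dummy; };

/* Return the current owner stored in *ptr.
 * On x86_64 the read is done with a CAS that writes back the value it
 * expects: if *ptr == self, it stores self again (no visible change);
 * otherwise the CAS fails and 'seen' receives the current owner.
 * Elsewhere a plain atomic load is used.
 */
static inline struct cell *read_owner(_Atomic(struct cell *) *ptr,
                                      struct cell *self)
{
#if defined(__x86_64__)
	struct cell *seen = self;

	atomic_compare_exchange_strong(ptr, &seen, self);
	return seen;
#else
	return atomic_load(ptr);
#endif
}

/* Small usage example: the helper behaves like a load in both cases. */
void read_owner_demo(void)
{
	struct cell mine, other;
	_Atomic(struct cell *) owner = &mine;

	/* still the owner: we read back our own cell */
	assert(read_owner(&owner, &mine) == &mine);

	/* another thread took over: we see its cell, nothing is overwritten */
	atomic_store(&owner, &other);
	assert(read_owner(&owner, &mine) == &other);
	assert(atomic_load(&owner) == &other);
}
```

Either branch returns the same value, so the choice between them is purely a performance decision, which is why it can be made per architecture.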
Willy Tarreau 2025-09-18 15:08:12 +02:00
parent d25099b359
commit a727c6eaa5

@@ -275,7 +275,18 @@ ssize_t ring_write(struct ring *ring, size_t maxlen, const struct ist pfx[], siz
 	 */
 	while (1) {
-		if ((curr_cell = HA_ATOMIC_LOAD(ring_queue_ptr)) != &cell)
+#if defined(__x86_64__)
+		/* read using a CAS on x86, as it will keep the cache line
+		 * in exclusive state for a few more cycles that will allow
+		 * us to release the queue without waiting after the loop.
+		 */
+		curr_cell = &cell;
+		HA_ATOMIC_CAS(ring_queue_ptr, &curr_cell, curr_cell);
+#else
+		curr_cell = HA_ATOMIC_LOAD(ring_queue_ptr);
+#endif
+		/* give up if another thread took the leadership of the queue */
+		if (curr_cell != &cell)
 			goto wait_for_flush;
 		/* OK the queue is locked, let's attempt to get the tail lock.