From d25099b359df8277e9296a1deaa7297b66e95ad0 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Thu, 18 Sep 2025 15:01:29 +0200
Subject: [PATCH] OPTIM: ring: always relax in the ring lock and leader wait
 loop

Tests have shown that AMD systems really need to use a cpu_relax() in
these two loops. The performance improves from 10.03 to 10.56M messages
per second (+5%) on a 128-thread system, without affecting Intel nor
ARM, so let's do this.
---
 src/ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/ring.c b/src/ring.c
index 74172ce3a..8a97b37c0 100644
--- a/src/ring.c
+++ b/src/ring.c
@@ -295,7 +295,7 @@ ssize_t ring_write(struct ring *ring, size_t maxlen, const struct ist pfx[], siz
 			break;
 		}
 #endif
-		__ha_cpu_relax_for_read();
+		__ha_cpu_relax();
 	}
 
 	/* Here we own the tail. We can go on if we're still the leader,
@@ -459,7 +459,7 @@ ssize_t ring_write(struct ring *ring, size_t maxlen, const struct ist pfx[], siz
 	 */
 	do {
 		next_cell = HA_ATOMIC_LOAD(&cell.next);
-	} while (next_cell != &cell && __ha_cpu_relax_for_read());
+	} while (next_cell != &cell && __ha_cpu_relax());
 
 	/* OK our message was queued. Retrieving the sent size in the ring cell
 	 * allows another leader thread to zero it if it finally couldn't send