From d25099b359df8277e9296a1deaa7297b66e95ad0 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Thu, 18 Sep 2025 15:01:29 +0200
Subject: [PATCH] OPTIM: ring: always relax in the ring lock and leader wait
 loop

Tests have shown that AMD systems really need to use a cpu_relax() in
these two loops. The performance improves from 10.03 to 10.56M messages
per second (+5%) on a 128-thread system, without affecting Intel nor
ARM, so let's do this.
---
 src/ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/ring.c b/src/ring.c
index 74172ce3a..8a97b37c0 100644
--- a/src/ring.c
+++ b/src/ring.c
@@ -295,7 +295,7 @@ ssize_t ring_write(struct ring *ring, size_t maxlen, const struct ist pfx[], siz
 			break;
 		}
 #endif
-		__ha_cpu_relax_for_read();
+		__ha_cpu_relax();
 	}
 
 	/* Here we own the tail. We can go on if we're still the leader,
@@ -459,7 +459,7 @@ ssize_t ring_write(struct ring *ring, size_t maxlen, const struct ist pfx[], siz
 	 */
 	do {
 		next_cell = HA_ATOMIC_LOAD(&cell.next);
-	} while (next_cell != &cell && __ha_cpu_relax_for_read());
+	} while (next_cell != &cell && __ha_cpu_relax());
 
 	/* OK our message was queued. Retrieving the sent size in the ring cell
 	 * allows another leader thread to zero it if it finally couldn't send