OPTIM: ring: always relax in the ring lock and leader wait loop

Tests have shown that AMD systems really need to use a cpu_relax()
in these two loops. The performance improves from 10.03M to 10.56M
messages per second (+5%) on a 128-thread system, without affecting
Intel or ARM, so let's do this.
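
For context, __ha_cpu_relax() is the generic relax hint used in
HAProxy's spin loops. A minimal sketch of such a macro, assuming GCC
statement expressions (this is not HAProxy's exact definition), could
look like:

/* Sketch of an arch-specific relax hint. It evaluates to 1 so it can
 * be chained into a loop condition with &&, as the second hunk below
 * does.
 */
#if defined(__x86_64__) || defined(__i386__)
#define cpu_relax() ({ __asm__ volatile("pause" ::: "memory"); 1; })
#elif defined(__aarch64__)
#define cpu_relax() ({ __asm__ volatile("isb" ::: "memory"); 1; })
#else
#define cpu_relax() ({ __asm__ volatile("" ::: "memory"); 1; })
#endif

The pause-style hint lowers power and frees pipeline resources for the
sibling SMT thread while spinning, which is likely why AMD systems
benefit so visibly here.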
Author: Willy Tarreau
Date:   2025-09-18 15:01:29 +02:00
Parent: eca1f90e16
Commit: d25099b359

@@ -295,7 +295,7 @@ ssize_t ring_write(struct ring *ring, size_t maxlen, const struct ist pfx[], siz
 			break;
 		}
 #endif
-		__ha_cpu_relax_for_read();
+		__ha_cpu_relax();
 	}
 	/* Here we own the tail. We can go on if we're still the leader,
@@ -459,7 +459,7 @@ ssize_t ring_write(struct ring *ring, size_t maxlen, const struct ist pfx[], siz
 	 */
 	do {
 		next_cell = HA_ATOMIC_LOAD(&cell.next);
-	} while (next_cell != &cell && __ha_cpu_relax_for_read());
+	} while (next_cell != &cell && __ha_cpu_relax());
 	/* OK our message was queued. Retrieving the sent size in the ring cell
 	 * allows another leader thread to zero it if it finally couldn't send
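
The && chaining in this wait loop only works because __ha_cpu_relax()
evaluates to a nonzero value: the relax hint runs each time the test
fails, and the expression stays truthy so the loop continues. A
standalone C11 sketch of the same wait pattern, with hypothetical
cell/relax names standing in for HAProxy's own, could look like:

#include <stdatomic.h>

struct cell {
	struct cell *_Atomic next;
};

/* hypothetical stand-in for __ha_cpu_relax(); returns 1 so it can
 * sit on the right-hand side of && in a loop condition.
 */
static inline int relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("pause" ::: "memory");
#endif
	return 1;
}

/* spin until another thread closes the queue by making the cell
 * point back to itself, relaxing between each load.
 */
static void wait_until_served(struct cell *cell)
{
	struct cell *next;

	do {
		next = atomic_load_explicit(&cell->next, memory_order_acquire);
	} while (next != cell && relax());
}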