OPTIM: ring: always relax in the ring lock and leader wait loop

Tests have shown that AMD systems really need to use a cpu_relax()
in these two loops. The performance improves from 10.03M to 10.56M
messages per second (+5%) on a 128-thread system, without affecting
Intel or ARM, so let's do this.
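
For context, __ha_cpu_relax() is the generic relax hint used in
HAProxy's spin loops. A minimal sketch of such a macro, assuming GCC
statement expressions (this is not HAProxy's exact definition), could
look like:

/* Sketch of an arch-specific relax hint. It evaluates to 1 so it can
 * be chained into a loop condition with &&, as the second hunk below
 * does.
 */
#if defined(__x86_64__) || defined(__i386__)
#define cpu_relax() ({ __asm__ volatile("pause" ::: "memory"); 1; })
#elif defined(__aarch64__)
#define cpu_relax() ({ __asm__ volatile("isb" ::: "memory"); 1; })
#else
#define cpu_relax() ({ __asm__ volatile("" ::: "memory"); 1; })
#endif

The pause-style hint lowers power and frees pipeline resources for the
sibling SMT thread while spinning, which is likely why AMD systems
benefit so visibly here.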
Author: Willy Tarreau
Date:   2025-09-18 15:01:29 +02:00
Parent: eca1f90e16
Commit: d25099b359

@@ -295,7 +295,7 @@ ssize_t ring_write(struct ring *ring, size_t maxlen, const struct ist pfx[], siz
 			break;
 		}
 #endif
-		__ha_cpu_relax_for_read();
+		__ha_cpu_relax();
 	}
 	/* Here we own the tail. We can go on if we're still the leader,
@@ -459,7 +459,7 @@ ssize_t ring_write(struct ring *ring, size_t maxlen, const struct ist pfx[], siz
 	 */
 	do {
 		next_cell = HA_ATOMIC_LOAD(&cell.next);
-	} while (next_cell != &cell && __ha_cpu_relax_for_read());
+	} while (next_cell != &cell && __ha_cpu_relax());
 	/* OK our message was queued. Retrieving the sent size in the ring cell
 	 * allows another leader thread to zero it if it finally couldn't send
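
The && chaining in this wait loop only works because __ha_cpu_relax()
evaluates to a nonzero value: the relax hint runs each time the test
fails, and the expression stays truthy so the loop continues. A
standalone C11 sketch of the same wait pattern, with hypothetical
cell/relax names standing in for HAProxy's own, could look like:

#include <stdatomic.h>

struct cell {
	struct cell *_Atomic next;
};

/* hypothetical stand-in for __ha_cpu_relax(); returns 1 so it can
 * sit on the right-hand side of && in a loop condition.
 */
static inline int relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("pause" ::: "memory");
#endif
	return 1;
}

/* spin until another thread closes the queue by making the cell
 * point back to itself, relaxing between each load.
 */
static void wait_until_served(struct cell *cell)
{
	struct cell *next;

	do {
		next = atomic_load_explicit(&cell->next, memory_order_acquire);
	} while (next != cell && relax());
}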