From 3b728a92bbe35b0c3ef34c3c8942ac5541a1713c Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Fri, 12 Mar 2021 06:06:14 +0100
Subject: [PATCH] BUILD: atomic/arm64: force the register pairs to use in __ha_cas_dw()

Since commit f8fb4f75f ("MINOR: atomic: implement a more efficient arm64
__ha_cas_dw() using pairs"), on some modern arm64 (armv8.1+) compiled with
-march=armv8.1-a under gcc-7.5.0, a build error may appear on ev_poll.o :

  /tmp/ccHD2lN8.s:1771: Error: reg pair must start from even reg at operand 1 -- `casp x27,x28,x22,x23,[x12]'
  Makefile:927: recipe for target 'src/ev_poll.o' failed

It appears that the compiler cannot always assign register pairs there
for a structure made of two u64. It was possibly later addressed since
gcc-9.3 never caused this, but there's no trivially available info on
the subject in the changelogs.

Unsurprisingly, using a u128 instead does fix this, but it significantly
inflates the code (+4kB for just 6 places, very likely because it loaded
some extra stubs) and the comparison is ugly, involving two slower
conditional jumps instead of a single one and a conditional comparison.
For example, ha_random64() grew from 144 bytes to 232.

However, simply forcing the base register does work pretty well, and
makes the code even cleaner and more efficient by further reducing it by
about 4.5kB, possibly because it helps the compiler to pick suitable
registers for the pair there. And the perf on 64 cores looks steadily
0.5% above the previous one, so let's do this.

Note that the commit above was backported to 2.3 to fix scalability
issues on the AWS Graviton2 platform, so this one will need to be
backported as well.
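
For readers who want to try the register-forcing idea outside the tree,
here is a minimal standalone sketch. It is not the HAProxy function:
demo_cas_dw() is a hypothetical name, the compiler guard is an assumption
(gcc on arm64 with the armv8.1 atomics enabled), and the #else branch is a
plain, NON-atomic stand-in that only exists so the sketch builds and runs
on other machines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as in the patch: two u64 meant to live in a register pair. */
struct pair { uint64_t r[2]; };

/* Hypothetical helper, for illustration only.
 * Double-word compare-and-swap on a 16-byte aligned target. Returns 1 if
 * *target matched *compare and was replaced by *set, otherwise returns 0
 * and copies the value found in *target back into *compare.
 */
static inline int demo_cas_dw(void *target, void *compare, const void *set)
{
	/* CASP requires a 16-byte aligned address */
	assert(((uintptr_t)target & 15) == 0);

#if defined(__aarch64__) && defined(__ARM_FEATURE_ATOMICS) && \
    defined(__GNUC__) && !defined(__clang__)
	struct pair bck = *(struct pair *)compare;
	/* CASP wants each pair to start on an even register, so pin the
	 * compare pair to x0/x1 and the new pair to x2/x3 instead of
	 * leaving the choice to the register allocator.
	 */
	register struct pair cmp __asm__("x0") = bck;
	register struct pair new __asm__("x2") = *(const struct pair *)set;
	int ret;

	__asm__ __volatile__("casp %0, %H0, %2, %H2, [%1]\n"
	                     : "+r" (cmp)     // %0: expected in, observed out
	                     : "r" (target),  // %1: address
	                       "r" (new)      // %2: value to store
	                     : "memory");

	/* CASP sets no flags: success means the observed value is unchanged */
	ret = cmp.r[0] == bck.r[0] && cmp.r[1] == bck.r[1];
	if (!ret)
		*(struct pair *)compare = cmp;
	return ret;
#else
	/* NOT atomic: single-threaded stand-in for other targets */
	if (memcmp(target, compare, sizeof(struct pair)) == 0) {
		memcpy(target, set, sizeof(struct pair));
		return 1;
	}
	memcpy(compare, target, sizeof(struct pair));
	return 0;
#endif
}
```

A caller would typically loop on this, reusing the value written back
into the compare location as the next expected value.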
---
 include/haproxy/atomic.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/haproxy/atomic.h b/include/haproxy/atomic.h
index 4292413c4..709d4aa21 100644
--- a/include/haproxy/atomic.h
+++ b/include/haproxy/atomic.h
@@ -525,13 +525,14 @@ static forceinline int __ha_cas_dw(void *target, void *compare, const void *set)
 	 */
 	struct pair { uint64_t r[2]; };
 	register struct pair bck = *(struct pair *)compare;
-	register struct pair cmp = bck;
+	register struct pair cmp asm("x0") = bck;
+	register struct pair new asm("x2") = *(const struct pair*)set;
 	int ret;
 
 	__asm__ __volatile__("casp %0, %H0, %2, %H2, [%1]\n"
 	                     : "+r" (cmp)                      // %0
 	                     : "r" (target),                   // %1
-	                       "r" (*(const struct pair*)set)  // %2
+	                       "r" (new)                       // %2
 	                     : "memory");
 
 	/* if the old value is still the same unchanged, we won, otherwise we