From 871c80505c2d196d675878a1c28755e59b52716c Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Wed, 22 Oct 2025 18:55:29 +0200
Subject: [PATCH] BUG/MEDIUM: build: limit excessive and counter-productive gcc-15 vectorization

In https://bugs.gentoo.org/964719, Dan Goodliffe reported that using
CFLAGS="-O3 -march=westmere" creates a binary that segfaults on startup
with gcc-15. This could be reproduced here, is isolated to gcc-15 and
-O3, and is caused by gcc emitting "movdqa" instructions to read
unaligned longs taken from chars that were carefully isolated within
ifdefs checking for support for unaligned integers on the platform...

While some experiments showed that changing all casts all over the code
using either typedef-enforced align(1) or the packed union trick does
the job, this needs a more in-depth validation since it obviously
doesn't produce the same code at all (at least on more modern
machines).

However, the offending optimization option could be isolated: it is
"-fvect-cost-model=dynamic" that causes this, while -O2 uses
"-fvect-cost-model=very-cheap". Turning it back to very-cheap solves the
issue, reduces the code size, and yields an extra 5% performance
increase on the http-request rate (181k vs 172k on a single core)! This
could at least partially explain why it has been observed several times
over the last few years that -O3 yields bigger and slower code than -O2.

It was also verified that the option doesn't change the emitted code at
-O0..-O2,-Os,-Oz, but only at -O3.

This patch detects whether the compiler supports this option and, if so,
sets it to very-cheap to address the problem that some distros are
facing after an upgrade to gcc-15. As such it should be backported to
recent LTS and stable branches. Here, 3.1 was used, so it seems legit to
at least target the last two LTS branches (i.e. go as far as 3.0).

Thanks to Dan Goodliffe for sharing a working reproducer, Sam James for
starting the investigations and Christian Ruppert for bringing the issue
to us.
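
For reference, the packed union trick mentioned above can be sketched as
follows. This is only an illustration under assumptions, not the
validated alternative that this patch deliberately avoids: the names
unaligned_u32, read_u32_packed, read_u32_memcpy and
unaligned_readers_agree are hypothetical and not taken from the tree,
and the may_alias attribute is a GCC/Clang extension added here so that
the cast stays valid under strict aliasing. The packed attribute gives
the type an alignment of 1, so the compiler should no longer assume
natural alignment when it vectorizes the loads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: "packed" lowers the alignment of the member to 1,
 * "may_alias" lets it overlay arbitrary bytes (GCC/Clang extensions).
 */
union unaligned_u32 {
	uint32_t v;
} __attribute__((packed, may_alias));

/* Read a 32-bit value from any address through the packed union. */
static inline uint32_t read_u32_packed(const void *p)
{
	return ((const union unaligned_u32 *)p)->v;
}

/* Portable reference: memcpy() makes no alignment assumption at all. */
static inline uint32_t read_u32_memcpy(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Self-check: both readers must agree at every byte offset of a small
 * buffer, including the misaligned ones. Returns 1 on success.
 */
static inline int unaligned_readers_agree(void)
{
	unsigned char buf[16];
	size_t i;

	for (i = 0; i < sizeof(buf); i++)
		buf[i] = (unsigned char)(i * 37 + 1);

	for (i = 0; i + sizeof(uint32_t) <= sizeof(buf); i++)
		assert(read_u32_packed(buf + i) == read_u32_memcpy(buf + i));

	return 1;
}
```

Calling unaligned_readers_agree() from a test program built with the
same CFLAGS as the reproducer is one way to compare both approaches; the
memcpy() variant is shown only as a baseline that is well defined on any
platform.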
---
 Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 59b706ec3..d26e57948 100644
--- a/Makefile
+++ b/Makefile
@@ -213,7 +213,8 @@ UNIT_TEST_SCRIPT=./scripts/run-unittests.sh
 # undefined behavior to silently produce invalid code. For this reason we have
 # to use -fwrapv or -fno-strict-overflow to guarantee the intended behavior.
 # It is preferable not to change this option in order to avoid breakage.
-STD_CFLAGS := $(call cc-opt-alt,-fwrapv,-fno-strict-overflow)
+STD_CFLAGS := $(call cc-opt-alt,-fwrapv,-fno-strict-overflow) \
+              $(call cc-opt,-fvect-cost-model=very-cheap)
 
 #### Compiler-specific flags to enable certain classes of warnings.
 # Some are hard-coded, others are enabled only if supported.