BUG/MEDIUM: build: limit excessive and counter-productive gcc-15 vectorization

In https://bugs.gentoo.org/964719, Dan Goodliffe reported that using
CFLAGS="-O3 -march=westmere" creates a binary that segfaults on startup
with gcc-15. This could be reproduced here and is isolated to gcc-15 with
-O3. It is caused by gcc emitting "movdqa" instructions, which require
16-byte aligned operands, to read unaligned longs taken from chars, even
though these accesses were carefully isolated within ifdefs checking for
support for unaligned integers on the platform...

Some experiments showed that changing all casts throughout the code to
use either a typedef-enforced align(1) or the packed union trick does
the job, but this needs more in-depth validation since it obviously
doesn't produce the same code at all (at least on more modern
machines).
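
For illustration only (this is not code from this patch), the two
approaches mentioned above could look roughly like the sketch below. All
type and function names are hypothetical, the attributes are GCC/Clang
extensions, and a memcpy()-based reader is added as a portable reference:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only; not taken from the code base. */

/* 1) typedef whose alignment is forced down to 1 byte. may_alias also
 *    tells the compiler this type may overlay a buffer of chars.
 */
typedef uint64_t u64_unaligned __attribute__((aligned(1), may_alias));

/* 2) "packed union trick": a packed aggregate has an alignment of 1, so
 *    the compiler may not assume natural alignment for its member.
 */
union u64_packed {
	uint64_t v;
} __attribute__((packed, may_alias));

/* Read 8 bytes at any address through the align(1) typedef. */
static inline uint64_t read_u64_typedef(const void *p)
{
	return *(const u64_unaligned *)p;
}

/* Read 8 bytes at any address through the packed union. */
static inline uint64_t read_u64_union(const void *p)
{
	return ((const union u64_packed *)p)->v;
}

/* Portable reference: copy the bytes into a properly aligned local. */
static inline uint64_t read_u64_memcpy(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

All three readers return the same value for a given address; what differs
is how much the compiler is allowed to assume about the pointer's
alignment, which is why the generated code needs to be validated
separately.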

However, the offending optimization option could be isolated: it is
"-fvect-cost-model=dynamic" which causes this, while -O2 uses
"-fvect-cost-model=very-cheap". Turning it back to very-cheap solves the
issue, reduces the code, and yields an extra 5% performance increase on
the http-request rate (181k vs 172k on a single core)! This could at
least partially explain why it has been observed several times over
the last few years that -O3 yields bigger and slower code than -O2.

It was also verified that the option doesn't change the emitted code
at -O0..-O2,-Os,-Oz, but only at -O3.

This patch detects whether the compiler supports this option and, if
so, sets it to "very-cheap" to address the problem that some distros
are facing after an upgrade to
gcc-15. As such it should be backported to recent LTS and stable
branches. Here, 3.1 was used, so it seems legit to at least target
the last two LTS branches (i.e. go as far as 3.0).

Thanks to Dan Goodliffe for sharing a working reproducer, Sam James
for starting the investigations and Christian Ruppert for bringing
the issue to us.

Willy Tarreau 2025-10-22 18:55:29 +02:00
parent d30b88a6cc
commit 871c80505c

@@ -213,7 +213,8 @@ UNIT_TEST_SCRIPT=./scripts/run-unittests.sh
 # undefined behavior to silently produce invalid code. For this reason we have
 # to use -fwrapv or -fno-strict-overflow to guarantee the intended behavior.
 # It is preferable not to change this option in order to avoid breakage.
-STD_CFLAGS := $(call cc-opt-alt,-fwrapv,-fno-strict-overflow)
+STD_CFLAGS := $(call cc-opt-alt,-fwrapv,-fno-strict-overflow) \
+              $(call cc-opt,-fvect-cost-model=very-cheap)
 #### Compiler-specific flags to enable certain classes of warnings.
 # Some are hard-coded, others are enabled only if supported.