When threads are enabled and running on a machine with multiple CCX
or multiple nodes, thread groups are now enabled since 3.3-dev2, causing
load-balancing algorithms to randomly fail due to incoming connections
spreading over multiple groups and using different load balancing indexes.
Let's just force "thread-groups 1" into all configs when threads are
enabled to avoid this.
Every reg-test now runs without any warning, so let's acivate -dW by
default so the new ones will inheritate the option.
This patch reverts 9d511b3c ("REGTESTS: enable -dW on almost all tests
to fail on warnings") and adds -dW in the default HAPROXY_ARGS of
scripts/run-regtests.sh instead.
Now that warnings were almost all removed, let's enable zero-warning
via -dW. All tests were adjusted, but two:
- mcli/mcli_start_progs.vtc:
the programs section currently cannot be silenced
- stats/stats-file.vtc:
the warning comes from the stats file itself on comment lines.
All other ones are now OK.
No less than 30 tests were missing timeouts, preventing them from being
started with zero-warning. Since they were not supposed to trigger, they
have been set to 30s so as never to trigger, and now they do not produce
any warning anymore.
Some regtests involve multiple requests from multiple clients, which can
be dispatched as multiple requests to a server. It turns out that the
idle connection sharing works so well that very quickly few connections
are used, and regularly some of the remaining idle server connections
time out at the moment they were going to be reused, causing those random
"HTTP header incomplete" traces in the logs that make them fail often. In
the end this is only an artefact of the test environment.
And indeed, some tests like normalize-uri which perform a lot of reuse
fail very often, about 20-30% of the times in the CI, and 100% of the
time in local when running 1000 tests in a row. Others like ubase64,
sample_fetches or vary_* fail less often but still a lot in tests.
This patch addresses this by adding "tune.idle-pool.shared off" to all
tests which have at least twice as many requests as clients. It proves
very effective as no single error happens on normalize-uri anymore after
10000 tests. Also 100 full runs of all tests yield no error anymore.
One test is tricky, http_abortonclose, it used to fail ~10 times per
1000 runs and with this workaround still fails once every 1000 runs.
But the test is complex and there's a warning in it mentioning a
possible issue when run in parallel due to a port reuse.