mirror of
https://git.haproxy.org/git/haproxy.git/
synced 2025-09-21 05:41:26 +02:00
During multiple tests we've noticed that shared stats counters have become a real bottleneck under large thread counts. With QUIC it's clearly visible: qc_snd_buf() takes 2.5% of the CPU on a 48-thread machine at only 25 Gbps, and this CPU is entirely spent in the atomic increments of the byte count and byte rate. It's also visible in H1/H2, but slightly less so since we're working with larger buffers there, hence less frequent updates.

These counters are exclusively used to report the byte count in "show info" and the byte rate in the stats. Let's move them to the thread_ctx struct and make the stats reader just collect each thread's stats when requested. That's far more efficient than having all threads compete on a single cache line.

After this, qc_snd_buf() has totally disappeared from the perf profile, and tests made in H1 show roughly a 1% performance increase on small objects.
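
For readers unfamiliar with the pattern, here is a minimal sketch of per-thread counters that are summed only when read. It is not the actual patch: the names thr_stats, stats_add_bytes(), stats_total_bytes(), MAX_THREADS and CACHELINE are hypothetical, the 64-byte cache line is an assumption, and only the byte count is shown (the byte rate would follow the same idea).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 64   /* hypothetical upper bound on the thread count */
#define CACHELINE   64   /* assumed cache line size in bytes */

/* One slot per thread, aligned so that two threads never share a cache line. */
struct thr_stats {
    _Alignas(CACHELINE) _Atomic uint64_t bytes_out;
};

_Static_assert(sizeof(struct thr_stats) == CACHELINE,
               "each slot must occupy exactly one cache line");

static struct thr_stats thr_stats[MAX_THREADS];

/* Hot path: called only by thread <tid> on its own slot. Since each slot
 * has a single writer, a relaxed load followed by a relaxed store is
 * enough and no locked read-modify-write instruction is needed.
 */
static inline void stats_add_bytes(unsigned int tid, uint64_t n)
{
    assert(tid < MAX_THREADS);
    uint64_t cur = atomic_load_explicit(&thr_stats[tid].bytes_out,
                                        memory_order_relaxed);
    atomic_store_explicit(&thr_stats[tid].bytes_out, cur + n,
                          memory_order_relaxed);
}

/* Cold path: the stats reader walks all threads' slots and sums them.
 * The result is a best-effort snapshot since writers keep running.
 */
static uint64_t stats_total_bytes(unsigned int nbthread)
{
    uint64_t total = 0;

    for (unsigned int t = 0; t < nbthread; t++)
        total += atomic_load_explicit(&thr_stats[t].bytes_out,
                                      memory_order_relaxed);
    return total;
}
```

The trade-off is that a read now costs one load per thread instead of a single load, which is acceptable here because reads only happen when the stats are requested while updates happen on every send.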