Willy Tarreau 04ce6536e1 OPTIM: mux-h2: try to continue reading after demuxing when useful
When we stop demuxing in the middle of a frame, we know that more data
follow. The demux buffer is small and unique, but now we have rxbufs,
so after h2_process_demux() returns, the dbuf is almost empty and there
is room to deliver more data into another rxbuf.

Let's implement a short loop with a counter and a few conditions around
the demux call. The number of turns is limited to the number of
available rxbufs, and to no more than 12, since that value shows good
performance; the wakeup is then only called once after the loop. This
has shown a nice 12-20% bandwidth gain on backend-side H2 transferring
1MB-large objects, and does not affect the rest (headers, control
frames etc). The number of wakeup calls was divided by 5 to 8, which is
also a nice improvement. The counter is limited to make sure we don't
add processing latency. Tests were run to find the optimal limit, and
it turns out that 16 is just slightly better, but not worth the +33%
increase in peak processing latency.

The h2_process_demux() function just doesn't call the wakeup function
anymore, and solely focuses on transferring from dbuf to rxbuf.

Practical measurement: test with h2load producing 4 concurrent connections
with 10 concurrent streams each, downloading 1MB objects (20k total) via
two layers of haproxy stacked, reaching httpterm over H1 (numbers are total
for the 2 h2 front and 1 h2 back). All on a single thread.
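The load-generation side of this setup could look roughly like the
command below; the address, port and object size parameter are
placeholders, and the two stacked haproxy instances plus the httpterm
backend must already be running.

```shell
# 4 connections (-c), 10 concurrent streams each (-m),
# 20k requests total (-n), fetching 1MB objects.
h2load -c 4 -m 10 -n 20000 "https://127.0.0.1:8443/?s=1m"
```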

Before: 549-553 MB/s (on h2load)
  function    calls  cpu_tot  cpu_avg
  h2_io_cb  2562340  8.157s   3.183us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
  h2_io_cb    30109  840.9ms  27.93us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb    16105  106.4ms  6.607us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb        1  11.75us  11.75us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb  2608555  9.104s   3.490us --total--

  perf stat:
   153,117,996,214 instructions                             (71.41%)
    22,919,659,027 branches       # 14.97% of inst          (71.41%)
       384,009,600 branch-misses  #  1.68% of all branches  (71.42%)
        44,052,220 cache-misses          # 1 inst / 3476    (71.44%)
     9,819,232,047 L1-icache-load-misses # 6.4% of inst     (71.45%)
     8,426,410,306 L1-dcache-load-misses # 5.5% of inst     (57.15%)
        10,951,949 LLC-load-misses       # 1 inst / 13982   (57.13%)

      12.372600000 seconds user
      23.629506000 seconds sys

After: 660 MB/s (+20%)
  function    calls  cpu_tot  cpu_avg
  h2_io_cb   244502  4.410s   18.04us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
  h2_io_cb    42107  1.062s   25.22us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb    13703  106.3ms  7.758us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb        1  13.74us  13.74us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb   300313  5.578s   18.57us --total--

  perf stat:
   126,840,441,876 instructions                             (71.40%)
    17,576,059,236 branches       # 13.86% of inst          (71.40%)
       274,136,753 branch-misses  #  1.56% of all branches  (71.42%)
        30,413,562 cache-misses          # 1 inst / 4170    (71.45%)
     6,665,036,203 L1-icache-load-misses # 5.25% of inst    (71.46%)
     7,519,037,097 L1-dcache-load-misses # 5.9% of inst     (57.15%)
         6,702,411 LLC-load-misses       # 1 inst / 18925   (57.12%)

      10.490097000 seconds user
      19.212515000 seconds sys

It's also interesting to see that less total time is spent in these
functions, clearly indicating that the cost of interrupted processing
and the extraneous cache misses comes into play at some point. Indeed,
after the change, the number of instructions went down by 17.2%, while
the L2 cache misses dropped by 31% and the L3 cache misses by 39%!

HAProxy

HAProxy is a free, very fast and reliable reverse-proxy offering high availability, load balancing, and proxying for TCP and HTTP-based applications.

Installation

The INSTALL file describes how to build HAProxy. A list of packages is also available on the wiki.

Getting help

The discourse and the mailing-list are available for questions or configuration assistance. You can also use the slack or IRC channel. Please don't use the issue tracker for these.

The issue tracker is only for bug reports or feature requests.

Documentation

The HAProxy documentation has been split into a number of different files for ease of use. It is available in text format as well as HTML. The wiki is also meant to replace the old architecture guide.

Please refer to the following files depending on what you're looking for:

  • INSTALL for instructions on how to build and install HAProxy
  • BRANCHES to understand the project's life cycle and what version to use
  • LICENSE for the project's license
  • CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located in the doc/ directory.

License

HAProxy is licensed under GPL 2 or any later version, the headers under LGPL 2.1. See the LICENSE file for a more detailed explanation.
