When we stop demuxing in the middle of a frame, we know that more data follow. The demux buffer is small and unique, but now that we have rxbufs, when h2_process_demux() returns, the dbuf is almost empty and whatever remains in it could be demuxed into another rxbuf. Let's implement a short loop with a counter and a few conditions around the demux call. We limit the number of turns to the number of available rxbufs, capped at 12, since this shows good performance, and the wakeup is only called once.

This has shown a nice 12-20% bandwidth gain on backend-side H2 transferring 1MB-large objects, and does not affect the rest (headers, control frames, etc.). The number of wakeup calls was divided by 5 to 8, which is also a nice improvement. The counter is limited to make sure we don't add processing latency. Tests were run to find the optimal limit, and it turns out that 16 is just slightly better, but not worth the +33% increase in peak processing latency. The h2_process_demux() function simply doesn't call the wakeup function anymore and solely focuses on transferring from dbuf to rxbuf.

Practical measurement: test with h2load producing 4 concurrent connections with 10 concurrent streams each, downloading 1MB objects (20k total) via two stacked layers of haproxy, reaching httpterm over H1 (numbers are the total for the 2 h2 front and the 1 h2 back). All on a single thread.

Before: 549-553 MB/s (on h2load)

  function      calls    cpu_tot  cpu_avg
  h2_io_cb    2562340     8.157s  3.183us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
  h2_io_cb      30109    840.9ms  27.93us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb      16105    106.4ms  6.607us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb          1    11.75us  11.75us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb    2608555     9.104s  3.490us --total--

  perf stat:
   153,117,996,214   instructions                                     (71.41%)
    22,919,659,027   branches               # 14.97% of inst          (71.41%)
       384,009,600   branch-misses          # 1.68% of all branches   (71.42%)
        44,052,220   cache-misses           # 1 inst / 3476           (71.44%)
     9,819,232,047   L1-icache-load-misses  # 6.4% of inst            (71.45%)
     8,426,410,306   L1-dcache-load-misses  # 5.5% of inst            (57.15%)
        10,951,949   LLC-load-misses        # 1 inst / 13982          (57.13%)

  12.372600000 seconds user
  23.629506000 seconds sys

After: 660 MB/s (+20%)

  function      calls    cpu_tot  cpu_avg
  h2_io_cb     244502     4.410s  18.04us <- h2c_restart_reading@src/mux_h2.c:957 tasklet_wakeup
  h2_io_cb      42107     1.062s  25.22us <- sock_conn_iocb@src/sock.c:1007 tasklet_wakeup
  h2_io_cb      13703    106.3ms  7.758us <- ssl_sock_io_cb@src/ssl_sock.c:5721 tasklet_wakeup
  h2_io_cb          1    13.74us  13.74us <- sock_conn_iocb@src/sock.c:986 tasklet_wakeup
  h2_io_cb     300313     5.578s  18.57us --total--

  perf stat:
   126,840,441,876   instructions                                     (71.40%)
    17,576,059,236   branches               # 13.86% of inst          (71.40%)
       274,136,753   branch-misses          # 1.56% of all branches   (71.42%)
        30,413,562   cache-misses           # 1 inst / 4170           (71.45%)
     6,665,036,203   L1-icache-load-misses  # 5.25% of inst           (71.46%)
     7,519,037,097   L1-dcache-load-misses  # 5.9% of inst            (57.15%)
         6,702,411   LLC-load-misses        # 1 inst / 18925          (57.12%)

  10.490097000 seconds user
  19.212515000 seconds sys

It is also interesting to see that less total time is spent in these functions, clearly indicating that the cost of interrupted processing and the extraneous cache misses come into play at some point. Indeed, after the change, the number of instructions went down by 17.2%, while the L2 cache misses dropped by 31% and the L3 cache misses by 39%!
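The following is a standalone C sketch of the idea, not the HAProxy source: a demux loop bounded by the number of free rxbufs and a hard cap, with a single wakeup afterwards. The h2_process_demux() and rxbuf names come from the description above; every type, field, constant and helper below is invented for illustration only.

  #include <stdio.h>

  #define MAX_DEMUX_TURNS 12        /* measured sweet spot; 16 costs +33% peak latency */
  #define FRAME_SIZE      4096      /* pretend every frame is this large */

  struct h2_sim {
      int dbuf_bytes;               /* bytes still pending in the small demux buffer */
      int free_rxbufs;              /* rx buffers still available for this round */
      int wakeups;                  /* how many times the upper layer was woken up */
  };

  /* pretend to demux one frame from dbuf into one rxbuf; returns bytes moved */
  static int sim_process_demux(struct h2_sim *s)
  {
      int moved = s->dbuf_bytes < FRAME_SIZE ? s->dbuf_bytes : FRAME_SIZE;

      s->dbuf_bytes -= moved;
      s->free_rxbufs--;
      return moved;
  }

  /* the short loop: bounded by available rxbufs and a hard cap, single wakeup */
  static void sim_demux_loop(struct h2_sim *s)
  {
      int turns = s->free_rxbufs < MAX_DEMUX_TURNS ? s->free_rxbufs : MAX_DEMUX_TURNS;

      while (turns-- > 0 && s->dbuf_bytes > 0) {
          if (!sim_process_demux(s))
              break;                /* nothing was moved, stop early */
      }
      s->wakeups++;                 /* wake the upper layer only once, after the loop */
  }

  int main(void)
  {
      struct h2_sim s = { .dbuf_bytes = 20 * FRAME_SIZE, .free_rxbufs = 16, .wakeups = 0 };

      sim_demux_loop(&s);
      printf("left in dbuf: %d bytes, wakeups: %d\n", s.dbuf_bytes, s.wakeups);
      return 0;
  }

Under these invented numbers the loop drains 12 frames in one pass and wakes the upper layer exactly once, which is the behavior that reduced the wakeup count by 5 to 8 times in the measurements above.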
HAProxy
HAProxy is a free, very fast and reliable reverse-proxy offering high availability, load balancing, and proxying for TCP and HTTP-based applications.
Installation
The INSTALL file describes how to build HAProxy. A list of packages is also available on the wiki.
Getting help
The Discourse forum and the mailing list are available for questions or configuration assistance. You can also use the Slack or IRC channel. Please don't use the issue tracker for these.
The issue tracker is only for bug reports or feature requests.
Documentation
The HAProxy documentation has been split into a number of different files for ease of use. It is available in text format as well as HTML. The wiki is also meant to replace the old architecture guide.
Please refer to the following files depending on what you're looking for:
- INSTALL for instructions on how to build and install HAProxy
- BRANCHES to understand the project's life cycle and what version to use
- LICENSE for the project's license
- CONTRIBUTING for the process to follow to submit contributions
The more detailed documentation is located in the doc/ directory:
- doc/intro.txt for a quick introduction on HAProxy
- doc/configuration.txt for the configuration's reference manual
- doc/lua.txt for the Lua reference manual
- doc/SPOE.txt for how to use the SPOE engine
- doc/network-namespaces.txt for how to use network namespaces under Linux
- doc/management.txt for the management guide
- doc/regression-testing.txt for how to use the regression testing suite
- doc/peers.txt for the peers protocol reference
- doc/coding-style.txt for how to adopt HAProxy's coding style
- doc/internals for developer-specific documentation (not all up to date)
License
HAProxy is licensed under GPL 2 or any later version, the headers under LGPL 2.1. See the LICENSE file for a more detailed explanation.