When leastconn is used under many threads, there can be a lot of lock contention, because the same node has to be moved around all the time (when picking the server and when releasing it). In GH issue #2861 it was noticed that 46 threads out of 64 were waiting on the same lock in fwlc_srv_reposition(). In such a case, the accuracy of the server's key becomes quite irrelevant, because nobody cares if the same server is picked twice in a row and the next one twice again.

While other approaches in the past considered using a floating key to avoid moving the server each time (which was not compatible with the round-robin rule for equal keys), here a more drastic solution is needed: we turn this lock into a trylock. If we can grab it, we do the job. If we can't, we just wake up a tasklet dedicated to this on the server. That tasklet will then try again slightly later, knowing that during this short time frame the server's position in the queue is slightly inaccurate. Note that any thread touching the same server will also reposition it and save that work for next time. And if multiple threads wake the tasklet up, that's fine: their calls will be merged and a single lock will be taken in the end.

Testing this on a 24-core EPYC 74F3 showed a significant performance boost, from 382k rps to 610k rps. The share of fwlc_srv_reposition() reported by perf top dropped from 43% to 2.5%:

Before:
  Overhead  Shared Object             Symbol
    43.46%  haproxy-master-inlineebo  [.] fwlc_srv_reposition
    21.20%  haproxy-master-inlineebo  [.] fwlc_get_next_server
     0.91%  haproxy-master-inlineebo  [.] process_stream
     0.75%  [kernel]                  [k] ice_napi_poll
     0.51%  [kernel]                  [k] tcp_recvmsg
     0.50%  [kernel]                  [k] ice_start_xmit
     0.50%  [kernel]                  [k] tcp_ack

After:
  Overhead  Shared Object  Symbol
    30.37%  haproxy        [.] fwlc_get_next_server
     2.51%  haproxy        [.] fwlc_srv_reposition
     1.91%  haproxy        [.] process_stream
     1.46%  [kernel]       [k] ice_napi_poll
     1.36%  [kernel]       [k] tcp_recvmsg
     1.04%  [kernel]       [k] tcp_ack
     1.00%  [kernel]       [k] skb_release_data
     0.96%  [kernel]       [k] ice_start_xmit
     0.91%  haproxy        [.] conn_backend_get
     0.82%  haproxy        [.] connect_server
     0.82%  haproxy        [.] run_tasks_from_lists

On an Ampere Altra with 64 aarch64 cores dedicated to haproxy, the gain is even more visible (3.6x):

Before: 311-323k rps, 3.16-3.25ms, 6400% CPU
  Overhead  Shared Object   Symbol
    55.69%  haproxy-master  [.] fwlc_srv_reposition
    33.30%  haproxy-master  [.] fwlc_get_next_server
     0.89%  haproxy-master  [.] process_stream
     0.45%  haproxy-master  [.] h1_snd_buf
     0.34%  haproxy-master  [.] run_tasks_from_lists
     0.32%  haproxy-master  [.] connect_server
     0.31%  haproxy-master  [.] conn_backend_get
     0.31%  haproxy-master  [.] h1_headers_to_hdr_list
     0.24%  haproxy-master  [.] srv_add_to_idle_list
     0.23%  haproxy-master  [.] http_request_forward_body
     0.22%  haproxy-master  [.] __pool_alloc
     0.21%  haproxy-master  [.] http_wait_for_response
     0.21%  haproxy-master  [.] h1_send

After: 1.21M rps, 0.842ms, 6400% CPU
  Overhead  Shared Object  Symbol
    17.44%  haproxy        [.] fwlc_get_next_server
     6.33%  haproxy        [.] process_stream
     4.40%  haproxy        [.] fwlc_srv_reposition
     3.64%  haproxy        [.] conn_backend_get
     2.75%  haproxy        [.] connect_server
     2.71%  haproxy        [.] h1_snd_buf
     2.66%  haproxy        [.] srv_add_to_idle_list
     2.33%  haproxy        [.] run_tasks_from_lists
     2.14%  haproxy        [.] h1_headers_to_hdr_list
     1.56%  haproxy        [.] stream_set_backend
     1.37%  haproxy        [.] http_request_forward_body
     1.35%  haproxy        [.] http_wait_for_response
     1.34%  haproxy        [.] h1_send

And at a similar load, the CPU usage drops considerably (3.55x), as does the response time (10x):

After: 320k rps, 0.322ms, 1800% CPU
  Overhead  Shared Object  Symbol
     7.62%  haproxy        [.] process_stream
     4.64%  haproxy        [.] h1_headers_to_hdr_list
     3.09%  haproxy        [.] h1_snd_buf
     3.08%  haproxy        [.] h1_process_demux
     2.22%  haproxy        [.] __pool_alloc
     2.14%  haproxy        [.] connect_server
     1.87%  haproxy        [.] h1_send
  >  1.84%  haproxy        [.] fwlc_srv_reposition
     1.84%  haproxy        [.] run_tasks_from_lists
     1.77%  haproxy        [.] sock_conn_iocb
     1.75%  haproxy        [.] srv_add_to_idle_list
     1.66%  haproxy        [.] http_request_forward_body
     1.65%  haproxy        [.] wake_expired_tasks
     1.59%  haproxy        [.] h1_parse_msg_hdrs
     1.51%  haproxy        [.] http_wait_for_response
  >  1.50%  haproxy        [.] fwlc_get_next_server

The cost of fwlc_get_next_server() naturally increases with the server count, but it now has no visible effect on updates. The load distribution remains unchanged compared to the previous approach, the weights still being respected. For further improvements to the fwlc algo, please consult GitHub issue #881, which centralizes everything related to this algorithm.
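To illustrate the trylock-or-defer idea described above, here is a minimal sketch. It deliberately uses POSIX mutexes and C11 atomics instead of HAProxy's own locking and tasklet API, and every name in it (lb_server, lb_lock, do_reposition(), wakeup_reposition_tasklet(), reposition_tasklet_handler()) is a hypothetical stand-in, not the actual implementation:

#include <pthread.h>
#include <stdatomic.h>

struct lb_server {
    atomic_int needs_reposition;  /* set when a reposition was skipped under contention */
    /* ... tree node, key (served connections), tasklet, ... */
};

/* Stub: in the real code this would schedule the server's dedicated
 * tasklet; waking an already-scheduled tasklet is a no-op, which is
 * how multiple wakeups end up merged. */
static void wakeup_reposition_tasklet(struct lb_server *srv)
{
    (void)srv;
}

/* Placeholder for the actual work: remove the server's node from the
 * tree and re-insert it at its new key. */
static void do_reposition(struct lb_server *srv)
{
    atomic_store(&srv->needs_reposition, 0);
    /* ... tree update elided ... */
}

/* Trylock-or-defer: never block on the balancing lock. */
void fwlc_srv_reposition_sketch(struct lb_server *srv, pthread_mutex_t *lb_lock)
{
    if (pthread_mutex_trylock(lb_lock) == 0) {
        /* Uncontended: do the job now, absorbing any reposition that
         * another thread deferred earlier. */
        do_reposition(srv);
        pthread_mutex_unlock(lb_lock);
    } else {
        /* Contended: accept a briefly inaccurate position and let the
         * tasklet (or the next thread touching this server) redo it. */
        atomic_store(&srv->needs_reposition, 1);
        wakeup_reposition_tasklet(srv);
    }
}

/* Tasklet callback: runs slightly later and performs the deferred
 * reposition once, however many threads requested it. */
void reposition_tasklet_handler(struct lb_server *srv, pthread_mutex_t *lb_lock)
{
    if (atomic_load(&srv->needs_reposition)) {
        pthread_mutex_lock(lb_lock);
        do_reposition(srv);
        pthread_mutex_unlock(lb_lock);
    }
}

The key property is that a contended caller never blocks: it only records that the server needs repositioning and defers the tree update, accepting a briefly stale key, which is exactly the trade-off the commit message describes.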
HAProxy
HAProxy is a free, very fast and reliable reverse-proxy offering high availability, load balancing, and proxying for TCP and HTTP-based applications.
Installation
The INSTALL file describes how to build HAProxy. A list of packages is also available on the wiki.
Getting help
The Discourse forum and the mailing list are available for questions or configuration assistance. You can also use the Slack or IRC channels. Please don't use the issue tracker for these.
The issue tracker is only for bug reports or feature requests.
Documentation
The HAProxy documentation has been split into a number of different files for ease of use. It is available in text format as well as HTML. The wiki is also meant to replace the old architecture guide.
Please refer to the following files depending on what you're looking for:
- INSTALL for instructions on how to build and install HAProxy
- BRANCHES to understand the project's life cycle and what version to use
- LICENSE for the project's license
- CONTRIBUTING for the process to follow to submit contributions
The more detailed documentation is located in the doc/ directory:
- doc/intro.txt for a quick introduction on HAProxy
- doc/configuration.txt for the configuration's reference manual
- doc/lua.txt for the Lua reference manual
- doc/SPOE.txt for how to use the SPOE engine
- doc/network-namespaces.txt for how to use network namespaces under Linux
- doc/management.txt for the management guide
- doc/regression-testing.txt for how to use the regression testing suite
- doc/peers.txt for the peers protocol reference
- doc/coding-style.txt for how to adopt HAProxy's coding style
- doc/internals for developer-specific documentation (not all up to date)
License
HAProxy is licensed under GPL 2 or any later version, the headers under LGPL 2.1. See the LICENSE file for a more detailed explanation.