Willy Tarreau 627280e15f MAJOR: leastconn: postpone the server's repositioning under contention
When leastconn is used with many threads, there can be a lot of
contention on the lock protecting the leastconn tree, because the same
node has to be moved around all the time (when picking the server and
when releasing it). In GH issue #2861 it was noticed that 46 threads
out of 64 were waiting on the same lock in fwlc_srv_reposition().

In such a case, the accuracy of the server's key becomes quite irrelevant
because nobody cares if the same server is picked twice in a row and the
next one twice again.

While other approaches in the past considered using a floating key to
avoid moving the server each time (which was not compatible with the
round-robin rule for equal keys), here a more drastic solution is needed.
What we're doing instead is turning this lock into a trylock. If we
can grab it, we do the job. If we can't, we just wake up a tasklet
dedicated to this on the server. That tasklet will then try again
slightly later, knowing that during this short time frame the server's
position in the queue is slightly inaccurate. Note that any thread
touching the same server will also reposition it and save that work
for next time. Also, if multiple threads wake the tasklet up, that's
fine: their calls will be merged and a single lock will be taken in
the end.
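
The following is a minimal standalone sketch of this trylock-or-defer
pattern, using plain pthreads and C11 atomics instead of HAProxy's
internal locking and tasklet API; the names used here (struct srv,
reposition_now(), requeue_pending, srv_requeue_deferred()) are
illustrative assumptions, not the actual fields and functions added by
the patch:

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdbool.h>

  struct srv {
      pthread_mutex_t *tree_lock;   /* lock protecting the LB tree    */
      atomic_bool requeue_pending;  /* set when a requeue is deferred */
      /* ... key, tree node, etc. ... */
  };

  /* Placeholder for the real work: recompute the server's key and move
   * its node in the tree. Must be called with tree_lock held.
   */
  static void reposition_now(struct srv *s)
  {
      (void)s;
  }

  /* Fast path: only reposition if the lock is immediately available. */
  static void srv_reposition(struct srv *s)
  {
      if (pthread_mutex_trylock(s->tree_lock) != 0) {
          /* Contended: note that a requeue is needed and let a deferred
           * task (a per-server tasklet in the patch) do it later.
           * Concurrent callers just set the same flag, so their
           * requests are merged into a single requeue.
           */
          atomic_store(&s->requeue_pending, true);
          return;
      }
      reposition_now(s);
      pthread_mutex_unlock(s->tree_lock);
  }

  /* Run from the deferred task: takes the lock for real this time. */
  static void srv_requeue_deferred(struct srv *s)
  {
      if (!atomic_exchange(&s->requeue_pending, false))
          return; /* no requeue pending anymore, nothing to do */
      pthread_mutex_lock(s->tree_lock);
      reposition_now(s);
      pthread_mutex_unlock(s->tree_lock);
  }

In the real patch the deferred path is the server's dedicated tasklet
and the tree is protected by HAProxy's own lock rather than a plain
mutex, but the control flow is the same.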

Testing this on a 24-core EPYC 74F3 showed a significant performance
boost from 382k rps to 610k rps, and the share of fwlc_srv_reposition()
reported by perf top dropped from 43% to 2.5%:

Before:
  Overhead  Shared Object             Symbol
    43.46%  haproxy-master-inlineebo  [.] fwlc_srv_reposition
    21.20%  haproxy-master-inlineebo  [.] fwlc_get_next_server
     0.91%  haproxy-master-inlineebo  [.] process_stream
     0.75%  [kernel]                  [k] ice_napi_poll
     0.51%  [kernel]                  [k] tcp_recvmsg
     0.50%  [kernel]                  [k] ice_start_xmit
     0.50%  [kernel]                  [k] tcp_ack

After:
  Overhead  Shared Object             Symbol
    30.37%  haproxy                   [.] fwlc_get_next_server
     2.51%  haproxy                   [.] fwlc_srv_reposition
     1.91%  haproxy                   [.] process_stream
     1.46%  [kernel]                  [k] ice_napi_poll
     1.36%  [kernel]                  [k] tcp_recvmsg
     1.04%  [kernel]                  [k] tcp_ack
     1.00%  [kernel]                  [k] skb_release_data
     0.96%  [kernel]                  [k] ice_start_xmit
     0.91%  haproxy                   [.] conn_backend_get
     0.82%  haproxy                   [.] connect_server
     0.82%  haproxy                   [.] run_tasks_from_lists

On an Ampere Altra with 64 aarch64 cores dedicated to haproxy, the
gain is even more visible (3.6x):

  Before: 311-323k rps, 3.16-3.25ms, 6400% CPU
  Overhead  Shared Object     Symbol
    55.69%  haproxy-master    [.] fwlc_srv_reposition
    33.30%  haproxy-master    [.] fwlc_get_next_server
     0.89%  haproxy-master    [.] process_stream
     0.45%  haproxy-master    [.] h1_snd_buf
     0.34%  haproxy-master    [.] run_tasks_from_lists
     0.32%  haproxy-master    [.] connect_server
     0.31%  haproxy-master    [.] conn_backend_get
     0.31%  haproxy-master    [.] h1_headers_to_hdr_list
     0.24%  haproxy-master    [.] srv_add_to_idle_list
     0.23%  haproxy-master    [.] http_request_forward_body
     0.22%  haproxy-master    [.] __pool_alloc
     0.21%  haproxy-master    [.] http_wait_for_response
     0.21%  haproxy-master    [.] h1_send

  After: 1.21M rps, 0.842ms, 6400% CPU
  Overhead  Shared Object     Symbol
    17.44%  haproxy           [.] fwlc_get_next_server
     6.33%  haproxy           [.] process_stream
     4.40%  haproxy           [.] fwlc_srv_reposition
     3.64%  haproxy           [.] conn_backend_get
     2.75%  haproxy           [.] connect_server
     2.71%  haproxy           [.] h1_snd_buf
     2.66%  haproxy           [.] srv_add_to_idle_list
     2.33%  haproxy           [.] run_tasks_from_lists
     2.14%  haproxy           [.] h1_headers_to_hdr_list
     1.56%  haproxy           [.] stream_set_backend
     1.37%  haproxy           [.] http_request_forward_body
     1.35%  haproxy           [.] http_wait_for_response
     1.34%  haproxy           [.] h1_send

And at a similar load, the CPU usage drops considerably (3.55x), as
does the response time (10x):

  After: 320k rps, 0.322ms, 1800% CPU
  Overhead  Shared Object     Symbol
     7.62%  haproxy           [.] process_stream
     4.64%  haproxy           [.] h1_headers_to_hdr_list
     3.09%  haproxy           [.] h1_snd_buf
     3.08%  haproxy           [.] h1_process_demux
     2.22%  haproxy           [.] __pool_alloc
     2.14%  haproxy           [.] connect_server
     1.87%  haproxy           [.] h1_send
   > 1.84%  haproxy           [.] fwlc_srv_reposition
     1.84%  haproxy           [.] run_tasks_from_lists
     1.77%  haproxy           [.] sock_conn_iocb
     1.75%  haproxy           [.] srv_add_to_idle_list
     1.66%  haproxy           [.] http_request_forward_body
     1.65%  haproxy           [.] wake_expired_tasks
     1.59%  haproxy           [.] h1_parse_msg_hdrs
     1.51%  haproxy           [.] http_wait_for_response
   > 1.50%  haproxy           [.] fwlc_get_next_server

The cost of fwlc_get_next_server() naturally increases as the server
count grows, but this now has no visible effect on updates. The load
distribution remains unchanged compared to the previous approach, with
server weights still being respected.

For further improvements to the fwlc algorithm, please consult GitHub
issue #881, which centralizes everything related to this algorithm.
2025-02-12 11:48:10 +01:00

HAProxy

HAProxy is a free, very fast and reliable reverse-proxy offering high availability, load balancing, and proxying for TCP and HTTP-based applications.

Installation

The INSTALL file describes how to build HAProxy. A list of packages is also available on the wiki.

Getting help

The discourse and the mailing-list are available for questions or configuration assistance. You can also use the slack or IRC channel. Please don't use the issue tracker for these.

The issue tracker is only for bug reports or feature requests.

Documentation

The HAProxy documentation has been split into a number of different files for ease of use. It is available in text format as well as HTML. The wiki is also meant to replace the old architecture guide.

Please refer to the following files depending on what you're looking for:

  • INSTALL for instructions on how to build and install HAProxy
  • BRANCHES to understand the project's life cycle and what version to use
  • LICENSE for the project's license
  • CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located in the doc/ directory.

License

HAProxy is licensed under GPL 2 or any later version, the headers under LGPL 2.1. See the LICENSE file for a more detailed explanation.