             -----------------------------------------
               HAProxy OTEL filter speed test guide
                          Version 1.0
                  ( Last update: 2026-04-09 )
             -----------------------------------------

             Author  : Miroslav Zagorac
             Contact : mzagorac at haproxy dot com


SUMMARY
--------

1. Overview
2. Prerequisites
3. Running the test
4. Test parameters
5. Rate-limit levels
6. Test configurations
7. Results
    7.1. Standalone (sa)
    7.2. Comparison (cmp)
    7.3. Context propagation (ctx)
    7.4. Frontend / backend (fe-be)
8. Summary


1. Overview
------------

The test-speed.sh script measures the performance impact of the OTEL filter on
HAProxy at various rate-limit settings. For each test configuration, the script
iterates through a series of rate-limit values, from full tracing (100%) down
to the filter being completely removed, measuring throughput and latency at
each level.

The script uses template files (haproxy.cfg.in and otel.cfg.in) from each test
directory to generate the actual configuration files. A sed substitution
adjusts the rate-limit value (or disables/removes the filter) before each run.

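To make the substitution step concrete, a single run at a 100% rate-limit could
be prepared roughly as follows. The '@RATE_LIMIT@' placeholder and the choice
of otel.cfg.in as the file carrying the value are illustrative assumptions;
test-speed.sh uses its own token and file layout.

  % rate=100.0
  % sed "s/@RATE_LIMIT@/${rate}/" otel.cfg.in > otel.cfg
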

2. Prerequisites
-----------------

The following tools must be installed and available in PATH:

  - thttpd : a lightweight HTTP server used as the backend origin server.
             It serves a small static HTML file (index.html) on port 8000.

  - wrk    : an HTTP benchmarking tool that generates the test load.
             See https://github.com/wg/wrk

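Before a long run it can be worth confirming that both tools are actually
reachable from PATH; this check is purely illustrative and not part of
test-speed.sh:

  % command -v thttpd || echo "thttpd not found"
  % command -v wrk    || echo "wrk not found"
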

3. Running the test
--------------------

The test is executed from the test directory. It can be run for all
configurations at once or for a single configuration.

To run all configurations:

  % ./test-speed.sh all

This produces result files in the _logs directory:

  _logs/README-speed-fe-be
  _logs/README-speed-sa
  _logs/README-speed-cmp
  _logs/README-speed-ctx

To run a single configuration:

  % ./test-speed.sh <cfg> [<dir>]

Where <cfg> corresponds to a run-<cfg>.sh script and <dir> is the configuration
directory (defaults to <cfg>). For example:

  % ./test-speed.sh sa
  % ./test-speed.sh fe-be fe
  % ./test-speed.sh cmp
  % ./test-speed.sh ctx

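After a run, the per-level request rates can be compared across configurations.
Assuming the result files contain the raw wrk output (their exact layout is not
specified here), something like the following pulls out the throughput lines:

  % grep -H "Requests/sec" _logs/README-speed-*
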

4. Test parameters
-------------------

The wrk benchmarking tool is invoked with the following parameters:

  -t8          8 threads
  -c8          8 concurrent connections
  -d300        5-minute test duration (300 seconds)
  --latency    latency distribution reporting

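Put together, each measurement corresponds to a wrk invocation along these
lines; the target URL is an assumption based on the listener and document
described below:

  % wrk -t8 -c8 -d300 --latency http://127.0.0.1:10080/index.html
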
Each rate-limit level is tested sequentially. Between runs, HAProxy is stopped
via SIGUSR1 and restarted with the next rate-limit configuration. A 10-second
pause separates consecutive runs.

The backend origin server (thttpd) serves a small static HTML page (index.html,
approximately 50 bytes) on port 8000. HAProxy listens on port 10080 and proxies
requests to the origin.

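Between two consecutive levels the script therefore performs a cycle roughly
like the sketch below; the pid file path and configuration file name are
placeholders, not the script's actual bookkeeping:

  % kill -USR1 "$(cat haproxy.pid)"   # soft-stop the running HAProxy
  % sleep 10                          # pause between consecutive runs
  % haproxy -f haproxy.cfg            # restart with the next rate-limit config
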

5. Rate-limit levels
---------------------

The script tests nine rate-limit levels in the following order:

  100.0    - the filter processes every stream (worst case)
  75.0     - the filter processes 75% of streams
  50.0     - the filter processes 50% of streams
  25.0     - the filter processes 25% of streams
  10.0     - the filter processes 10% of streams
  2.5      - the filter processes 2.5% of streams
  0.0      - the filter is loaded and attached to every stream but never
             processes any telemetry (the rate-limit check always fails);
             this measures the per-stream attach and detach overhead

  disabled - the filter is loaded but disabled via 'option disabled'; it is
             not attached to streams at all; this measures the cost of loading
             and initializing the filter library without any per-stream work

  off      - the 'filter opentelemetry' and 'otel-group' directives are
             commented out of haproxy.cfg; the filter is not loaded and has
             zero presence in the processing path; this is the absolute
             baseline (a sketch of this adjustment follows the list)

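The last two levels are produced by editing the generated configuration rather
than by changing a rate-limit value. A minimal sketch of the 'off' case,
assuming the directives appear verbatim in haproxy.cfg.in (the sed expressions
actually used by test-speed.sh may differ):

  % sed -e "s/^\([[:blank:]]*filter opentelemetry\)/#\1/" \
        -e "s/^\([[:blank:]]*otel-group\)/#\1/" \
        haproxy.cfg.in > haproxy.cfg
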
In the result tables, the 'overhead' column is the throughput loss relative to
the 'off' baseline, expressed as a percentage:

  overhead = (req/s_off - req/s_test) / req/s_off * 100

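For example, using the standalone (sa) results from section 7.1, the 100%
rate-limit row gives:

  overhead = (48697 - 38202) / 48697 * 100 = 21.6%
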

6. Test configurations
-----------------------

Four OTEL filter configurations are tested. They differ in complexity and in
the features they exercise:

  sa    - Standalone. Uses all possible HAProxy filter events with spans,
          attributes, events, links, baggage, status, metrics and groups.
          This is the most comprehensive single-instance configuration and
          represents the worst-case scenario.

  cmp   - Comparison. A simplified configuration made for comparison with
          other tracing implementations. It uses a reduced span hierarchy
          without context propagation, groups or metrics. This is closer to
          a typical production deployment.

  ctx   - Context propagation. Similar to 'sa' in scope coverage, but spans
          are opened using extracted span contexts (inject/extract via
          HAProxy variables) as parent references instead of direct span
          names. This adds the overhead of context serialization, variable
          storage and deserialization on every scope execution.

  fe-be - Frontend / backend. Two cascaded HAProxy instances: the frontend
          (fe) creates the root trace and injects span context into HTTP
          headers; the backend (be) extracts the context and continues the
          trace. This configuration measures the combined overhead of two
          OTEL filter instances plus the inter-process context propagation
          cost.

          Note: the rate-limit is varied only on the frontend. The backend
          always runs with its default configuration (filter enabled,
          hard-errors on). The backend configuration is only modified for
          the 'disabled' and 'off' levels.


7. Results
-----------

The tables below summarize the benchmarking results. The 'req/s' column shows
the sustained request rate reported by wrk, 'avg latency' is the average
response time, and 'overhead' is the throughput loss relative to the 'off'
baseline.


7.1. Standalone (sa)
---------------------

   ---------------------------------------------------------------
     rate-limit        req/s       avg latency       overhead
   ---------------------------------------------------------------
         100.0%       38,202         213.08 us          21.6%
          75.0%       40,223         202.18 us          17.4%
          50.0%       42,777         190.49 us          12.2%
          25.0%       45,302         180.46 us           7.0%
          10.0%       46,879         174.69 us           3.7%
           2.5%       47,993         170.58 us           1.4%
           0.0%       48,726         167.96 us             ~0
       disabled       48,788         167.74 us             ~0
            off       48,697         168.00 us        baseline
   ---------------------------------------------------------------

With all possible events active, the sa configuration at a 100% rate-limit
shows a 21.6% throughput reduction. At a 10% rate-limit the overhead drops to
3.7%, and at 2.5% it is barely measurable at 1.4%. The 'disabled' and '0.0'
levels show no measurable overhead above the baseline.


7.2. Comparison (cmp)
----------------------

   ---------------------------------------------------------------
     rate-limit        req/s       avg latency       overhead
   ---------------------------------------------------------------
         100.0%       44,780         182.58 us           7.6%
          75.0%       45,362         180.10 us           6.4%
          50.0%       46,279         176.59 us           4.6%
          25.0%       47,061         173.57 us           2.9%
          10.0%       47,793         170.85 us           1.4%
           2.5%       48,410         169.11 us           0.2%
           0.0%       48,586         168.09 us             ~0
       disabled       48,510         168.57 us             ~0
            off       48,488         168.64 us        baseline
   ---------------------------------------------------------------

The simplified cmp configuration shows significantly lower overhead than sa at
every rate-limit level. At a 100% rate-limit the overhead is 7.6%, at 10% it is
1.4%, and at 2.5% and below it is indistinguishable from the baseline.


7.3. Context propagation (ctx)
-------------------------------

   ---------------------------------------------------------------
     rate-limit        req/s       avg latency       overhead
   ---------------------------------------------------------------
         100.0%       30,032         270.35 us          38.0%
          75.0%       33,490         242.56 us          30.9%
          50.0%       37,679         215.92 us          22.2%
          25.0%       42,449         192.36 us          12.4%
          10.0%       45,458         180.08 us           6.2%
           2.5%       47,546         171.64 us           1.9%
           0.0%       48,313         168.86 us           0.3%
       disabled       48,620         168.36 us             ~0
            off       48,446         168.73 us        baseline
   ---------------------------------------------------------------

The ctx configuration is the most expensive due to the inject/extract cycle on
every scope execution. At a 100% rate-limit the overhead reaches 38%. However,
the cost scales approximately linearly with the rate-limit: at 10% the overhead
is 6.2%, and at 2.5% it drops to 1.9%. The filter attachment overhead (0.0% vs
off) is negligible at 0.3%.


7.4. Frontend / backend (fe-be)
--------------------------------

   ---------------------------------------------------------------
     rate-limit        req/s       avg latency       overhead
   ---------------------------------------------------------------
         100.0%       34,012         238.75 us          19.1%
          75.0%       35,462         230.08 us          15.7%
          50.0%       36,811         222.33 us          12.5%
          25.0%       38,493         210.95 us           8.5%
          10.0%       39,735         206.15 us           5.5%
           2.5%       40,423         202.35 us           3.9%
           0.0%       40,842         200.88 us           2.9%
       disabled       42,113         194.91 us             ~0
            off       42,062         194.37 us        baseline
   ---------------------------------------------------------------

The fe-be configuration involves two HAProxy instances in series, so the
absolute baseline (off) is already lower at 42,062 req/s due to the extra
network hop. The rate-limit is varied only on the frontend; the backend always
has the filter loaded with hard-errors enabled.

This explains the 2.9% overhead at rate-limit 0.0: even though the frontend
never traces, the backend filter still attaches to every stream, attempts to
extract context from the HTTP headers, fails (because the frontend did not
inject any context), and the hard-errors option stops further processing.
This per-stream attach/extract/error cycle accounts for the residual cost.

When both instances have the filter disabled (the 'disabled' level), the
overhead is within measurement noise, consistent with the single-instance
configurations.


8. Summary
-----------

The table below shows the overhead for each configuration at selected
rate-limit levels:

   ---------------------------------------------------
     rate-limit       sa      cmp      ctx    fe-be
   ---------------------------------------------------
         100.0%    21.6%     7.6%    38.0%    19.1%
          25.0%     7.0%     2.9%    12.4%     8.5%
          10.0%     3.7%     1.4%     6.2%     5.5%
           2.5%     1.4%     0.2%     1.9%     3.9%
   ---------------------------------------------------

Key observations:

  - The overhead scales approximately linearly with the rate-limit value.
    Reducing the rate-limit from 100% to 10% eliminates the vast majority
    of the cost in all configurations.

  - The cmp configuration, which uses a reduced span hierarchy typical of
    production deployments, adds only 1.4% overhead at a 10% rate-limit.

  - The sa configuration, which exercises all possible events, stays at about
    7% overhead at a 25% rate-limit and below 4% at 10%.

  - The ctx configuration is the most expensive due to the inject/extract
    context propagation on every scope. It is designed as a stress test for
    the propagation mechanism rather than a practical production template.

  - The fe-be configuration carries a higher fixed cost because two HAProxy
    instances are involved and the backend filter processes context extraction
    regardless of the frontend rate-limit setting.

  - Loading the filter but disabling it via 'option disabled' adds no
    measurable overhead in any configuration.

  - The filter attachment cost without any telemetry processing (rate-limit
    0.0) is 0.3% or less for the single-instance configurations (sa, cmp,
    ctx).

  - In typical production use with a rate-limit of 10% or less, the
    performance impact of the OTEL filter should be negligible for
    single-instance deployments.