Thanks to previous commits, it is possible to use small buffers at different
places: to store the request when a connection is queued or when L7 retries
are enabled, or for health-checks requests. However, there was no
configuration parameter to fine tune small buffer use.
It is now possible, thanks to the proxy option "use-small-buffers".
Documentation was updated accordingly.
When healthchecks were configured for a proxy, an enum-like was used to
sepcify the check's type. The idea was to reserve some values for futur
types of healthcheck. But it is overkill. I doubt we will ever have
something else than tcp and external checks. So corresponding PR_O2 flags
were slightly reviewed and a hole was filled.
Thanks to this change, some bits were released in options2 bitfield.
If support for small buffers is enabled, we now try to use them for
healthcheck requests. First, we take care the tcpcheck ruleset may use small
buffers. Send rules using LF strings or too large data are excluded. The
ability to use small buffers or not are set on the ruleset. All send rules
of the ruleset must be compatible. This info is then transfer to server's
healthchecks relying on this ruleset.
Then, when a healthcheck is running, when a send rule is evaluated, if
possible, we try to use small buffers. On error, the ability to use small
buffers is removed and we retry with a regular buffer. It means on the first
error, the support is disabled for the healthcheck and all other runs will
use regular buffers.
In h2_rcv_buf(), HTX flags are transfer with data when htx_xfer() is
called. There is no reason to continue to deal with them in the H2 mux. In
addition, there is no reason to set SE_FL_EOI flag when a parsing error was
reported. This part was added before the stconn era. Nowadays, when an HTX
parsing error is reported, an error on the sedesc should also be reported.
This reverts commit 44932b6c417e472d25039ec3d7b8bf14e07629bc.
The patch above was only necessary to handle partial headers or trailers
parsing. There was nothing to prevent the H2 multiplexer to start to add
headers or trailers in an HTX message and to stop the processing on error,
leaving the HTX message with no EOH/EOT block.
From the HTX API point of view, it is unexepected. And this was fixed thanks
to the commit ba7dc46a9 ("BUG/MINOR: h2/h3: Never insert partial
headers/trailers in an HTX message").
So this patch can be reverted. It is important to not report a parsign error
too early, when there are still data to transfer to the upper layer.
This patch must be backport where 44932b6c4 was backported but only after
backporting ba7dc46a9 first.
When a HTX stream is queued, if the request is small enough, it is moved
into a small buffer. This should save memory on instances intensively using
queues.
Applet and connection receive function were update to block receive when a
small buffer is in use.
In the same way support for large chunks was added to properly work with
large buffers, we are now adding supports for small chunks because it is
possible to process small buffers.
So a dedicated memory pool is added to allocate small
chunks. alloc_small_trash_chunk() must be used to allocate a small
chunk. alloc_trash_chunk_sz() and free_trash_chunk() were uppdated to
support small chunks.
In addition, small trash buffers are also created, using the same mechanism
than for regular trash buffers. So three thread-local trash buffers are
created. get_small_trash_chunk() must be used to get a small trash buffer.
And get_trash_chunk_sz() was updated to also deal with small buffers.
htx_move_to_small_buffer()/htx_move_to_large_buffer() and
htx_copy_to_small_buffer()/htx_copy_to_large_buffer() functions can now be
used to move or copy blocks from a default buffer to a small or large
buffer. The destination buffer is allocated and then each blocks are
transferred into it.
These funtions relies in htx_xfer() function.
htx_xfer() function should replace htx_xfer_blks(). It will be a bit easier to
maintain and to use. The behavior of htx_xfer() can be changed by calling it
with specific flags:
* HTX_XFER_KEEP_SRC_BLKS: Blocks from the source message are just copied
* HTX_XFER_PARTIAL_HDRS_COPY: It is allowed to partially xfer headers or trailers
* HTX_XFER_HDRS_ONLY: only headers are xferred
By default (HTX_XFER_DEFAULT or 0), all blocks from the source message are moved
into to the destination mesage. So copied in the destination messageand removed
from the source message.
The caller must still define the maximum amount of data (including meta-data)
that can be xferred.
It is no longer necessary to specify a block type to stop the copy. Most of
time, with htx_xfer_blks(), this parameter was set to HTX_BLK_UNUSED. And
otherwise it was only specified to transfer headers.
It is important to not that the caller is responsible to verify the original
HTX message is well-formated. Especially, it must be sure headers part and
trailers part are complete (finished by EOH/EOT block).
For now, htx_xfer_blks() is not removed for compatiblity reason. But it is
deprecated.
When small buffer size was greater than the default buffer size, an error
was triggered. We now do the same than for large buffer. A warning is
emitted and the small buffer size is set to 0 do disable small buffer
allocation.
Because small buffers were only used by QUIC streams, the pool used to alloc
these buffers was located in the quic code. However, their usage will be
extended to other parts. So, the small buffers pool was moved into the
dynbuf part.
http-errors parsing has been refactored in a recent serie of patches.
However, a null deref was introduced by the following patch in case a
non-existent http-errors section is referenced by an "errorfiles"
directive.
commit 2ca7601c2d6781f455cf205e4f3b52f5beb16e41
MINOR/OPTIM: http_htx: lookup once http_errors section on check/init
Fix this by delaying ha_free() so that it is called after ha_alert().
No need to backport.
The cfg_parse_acme() function checks if an 'acme' section is already
existing in the configuration with cur_acme->linenum > 0. But the wrong
filename and line number are displayed in the commit message.
Must be backported to 3.2 and later.
This patch fixes a leak of the ext_san structure when
sk_X509_EXTENSION_push() failed. sk_X509_EXTENSION_pop_free() is already
suppose to free it, so ext_san must be set to NULL upon success to avoid
a double-free.
Must be backported to 3.2 and later.
Use proxy_check_http_errors() on defaults proxy instances. This will
emit alert messages for errorfiles directives referencing a non-existing
http-errors section, or a warning if an explicitely listed status code
is not present in the target section.
This is a small behavior changes, as previouly this was only performed
for regular proxies. Thus, errorfile/errorfiles directives in an unused
defaults were never checked.
This may prevent startup of haproxy with a configuration file previously
considered as valid. However, this change is considered as necessary to
be able to use http-errors with dynamic backends. Any invalid defaults
will be detected on startup, rather than having to discover it at
runtime via "add backend" invokation.
Thus, any restriction on http-errors usage is now lifted for the
creation of dynamic backends.
The previous patch has splitted the original proxy_check_errors()
function in two, so that check and init steps are performed separately.
However, this renders the code inefficient for "errorfiles" directive as
tree lookup on http-errors section is performed twice.
Optimize this by adding a reference to the section in conf_errors
structure. This is resolved during proxy_check_http_errors() and
proxy_finalize_http_errors() can reuse it.
No need to backport.
Function proxy_check_errors() is used when configuration parsing is
over. This patch splits it in two newly named ones.
The first function is named proxy_check_http_errors(). It is responsible
to check for the validity of any "errorfiles" directive which could
reference non-existent http-errors section or code not defined in such
section. This function is now called via proxy_finalize().
The second function is named proxy_finalize_http_errors(). It converts
each conf_errors type used during parsing in a proper http_reply type
for runtime usage. This function is still called via post-proxy-check,
after proxy_finalize().
This patch does not bring any functional change. However, it will become
necessary to ensure http-errors can be used as expected with dynamic
backends.
This patch is the second part of the refactoring for http-errors
parsing. It renames some fields in <conf_errors> structure to clarify
their usage. In particular, union variants are renamed "inl"/"section",
which better highlight the link with the newly defined enum
http_err_directive.
In conf_errors struct, arbitrary integer values were used for both
<type> field and <status> array. This renders the code difficult to
follow.
Replaces these values with proper enums type. Two new types are defined
for each of these fields. The first one represents the directive type,
derived from the keyword used (errorfile vs errorfiles). This directly
represents which part of <info> union should be manipulated.
The second enum is used for errorfiles directive with a reference on a
http-errors section. It indicates whether or not if a status code should
be imported from this section, and if this import is explicit or
implicit.
Several resources were leaked on both success and error paths:
- X509_NAME *nm was never freed. X509_REQ_set_subject_name() makes
an internal copy, so nm must be freed separately by the caller.
- str_san allocated via my_strndup() was never freed on either path.
- On error paths after allocation, x (X509_REQ) and exts
(STACK_OF(X509_EXTENSION)) were also leaked.
Fix this by adding proper cleanup of all allocated resources in both
the success and error paths. Also move sk_X509_EXTENSION_pop_free()
after X509_REQ_sign() so it is not skipped when sign fails, and
initialize nm to NULL to make early error paths safe.
Must be backported as far as 3.2.
There was a leftover of "activity[tid].ctr1++" in commit 7d40b3134
("MEDIUM: sched: do not run a same task multiple times in series")
that unfortunately only builds in development mode :-(
This introduces 3 new settings: tune.h2.be.max-frames-at-once and
tune.h2.fe.max-frames-at-once, which limit the number of frames that
will be processed at once for backend and frontend side respectively,
and tune.h2.fe.max-rst-at-once which limits the number of RST_STREAM
frames processed at once on the frontend.
We can now yield when reading too many frames at once, which allows to
limit the latency caused by processing too many frames in large buffers.
However if we stop due to the RST budget being depleted, it's most likely
the sign of a protocol abuse, so we make the tasklet go to BULK since
the goal is to punish it.
By limiting the number of RST per loop to 1, the SSL response time drops
from 95ms to 1.6ms during an H2 RST flood attack, and the maximum SSL
connection rate drops from 35.5k to 28.0k instead of 11.8k. A moderate
SSL load that shows 1ms response time and 23kcps increases to 2ms with
15kcps versus 95ms and 800cps before. The average loop time goes down
from 270-280us to 160us, while still doubling the attack absorption
rate with the same CPU capacity.
This patch may usefully be backported to 3.3 and 3.2. Note that to be
effective, this relies on the following patches:
MEDIUM: sched: do not run a same task multiple times in series
MINOR: sched: do not requeue a tasklet into the current queue
MINOR: sched: do not punish self-waking tasklets anymore
MEDIUM: sched: do not punish self-waking tasklets if TASK_WOKEN_ANY
MEDIUM: sched: change scheduler budgets to lower TL_BULK
Having less yielding tasks in TL_BULK and more in TL_NORMAL, we need
to rebalance these queues' priorities. Tests have shown that raising
TL_NORMAL to 40% and lowering TL_BULK to 3% seems to give about the
best tradeoffs.
Self-waking tasklets are currently punished and go to the BULK list.
However it's a problem with muxes or the stick-table purge that just
yield and wake themselves up to limit the latency they cause to the
rest of the process, because by doing so to help others, they punish
themselves. Let's check if any TASK_WOKEN_ANY flag is present on
the tasklet and stop sending tasks presenting such a flag to TL_BULK.
Since tasklet_wakeup() by default passes TASK_WOKEN_OTHER, it means
that such tasklets will no longer be punished. However, tasks which
only want a best-effort wakeup can simply pass 0.
It's worth noting that a comparison was made between going into
TL_BULK at all and only setting the TASK_SELF_WAKING flag, and
it shows that the average latencies are ~10% better when entirely
avoiding TL_BULK in this case.
Nowadays due to yield etc, it's counter-productive to permanently
punish self-waking tasklets, let's abandon this principle as it prevent
finer task priority handling.
We continue to check for the TASK_SELF_WAKING flag to place a task
into TL_BULK in case some code wants to make use of it in the future
(similarly to TASK_HEAVY), but no code sets it anymore. It could
possible make sense in the future to replace this flag with a one-shot
variant requesting low-priority.
As found by Christopher, the concept of waking a tasklet up into the
current queue is totally flawed, because if a task is in TL_BULK or
TL_HEAVY, all the tasklets it will wake up will end up in the same
queue. Not only this will clobber such queues, but it will also
reduce their quality of service, and this can contaminate other
tasklets due to the numerous wakeups there are now with the subsribe
mechanism between layers.
There's always a risk that some tasks run multiple times if they wake
each other up. Now we include the loop counter in the task struct and
stop processing the queue it's in when meeting a task that has already
run. We only pick 16 bits since that's only what remains free in the
task common part, so from time to time (once every 65536) it will be
possible to wrongly match a task as having already run and stop evaluating
its queue, but it's rare enough that we don't care, because this will
be OK on the next iteration.
This patch improves the robustness of the QPACK varint decoder and fixes
potential 1-byte out-of-bounds reads in qpack_decode_fs().
In qpack_decode_fs(), two 1-byte OOB reads were possible on truncated
streams between two varint decoding. These occurred when trying to read
the byte containing the Huffman bit <h> and the Value Length prefix
immediately following an Index or a Name Length.
Note that these OOB are limited to a single byte because
qpack_get_varint() already ensures that its input length is non-zero
before consuming any data.
The fixes in qpack_decode_fs() are:
- When decoding an index, we now verify that at least one byte remains
to safely access the following <h> bit and value length.
- When decoding a literal, we now check len < name_len + 1 to ensure
the byte starting the header value is reachable.
In qpack_get_varint(), the maximum value is now strictly capped at 2^62-1
as per RFC. This is enforced using a budget-based check:
(v & 127) > (limit - ret) >> shift
This prevents values from overflowing into the 63rd or 64th bits, which
would otherwise break subsequent signed comparisons (e.g., if (len < name_len))
by interpreting the length as a negative value, leading to false positive
tests.
Thank you to @jming912 for having reported this issue in GH #3302.
Must be backported as far as 2.6
In the ENFILE and ENOMEM cases, when accept() fails, an irrelevant
global.maxsock value was printed that doesn't reflect system limits.
Now the actconn is printed that gives a hint about the failure reasons.
Should be backported in all stable branches.
We anticipated that the do-log action should be expanded with optional
arguments at some point. Now that we heard of multiple use-cases
that could be achieved with do-log action, but that are limitated by the
fact that all do-log statements inherit from the implicit log-profile
defined on the logger, we need to provide a way for the user to specify
that custom log-profile that could be used per do-log actions individually
This is what we try to achieve in this commit, by leveraging the
prerequisite work performed by the last 2 commits.
In process_send_log(), now also consider the ctx if ctx->profile != NULL
In that case, we do as if logger->prof was set, but we consider
ctx->profile in priority over the logger one. What this means is that
it will become possible to pass ctx.profile to a profile that will be
used no matter what to generate the log payload.
This is a pre-requisite to implement optional "profile" argument for
do-log action
do_log() is just a wrapper to use do_log_ctx() with pre-filled ctx, but
we now have the low-level do_log_ctx() variant which can be used to
pass specific ctx parameters instead.
Released version 3.4-dev7 with the following main changes :
- BUG/MINOR: stconn: Increase SC bytes_out value in se_done_ff()
- BUG/MINOR: ssl-sample: Fix sample_conv_sha2() by checking EVP_Digest* failures
- BUG/MINOR: backend: Don't get proto to use for webscoket if there is no server
- BUG/MINOR: jwt: Missing 'jwt_tokenize' return value check
- MINOR: flt_http_comp: define and use proxy_get_comp() helper function
- MEDIUM: flt_http_comp: split "compression" filter in 2 distinct filters
- CLEANUP: flt_http_comp: comp_state doesn't bother about the direction anymore
- BUG/MINOR: admin: haproxy-reload use explicit socat address type
- MEDIUM: admin: haproxy-reload conversion to POSIX sh
- BUG/MINOR: admin: haproxy-reload rename -vv long option
- SCRIPTS: git-show-backports: hide the common ancestor warning in quiet mode
- SCRIPTS: git-show-backports: add a restart-from-last option
- MINOR: mworker: add a BUG_ON() on mproxy_li in _send_status
- BUG/MINOR: mworker: don't set the PROC_O_LEAVING flag on master process
- Revert "BUG/MINOR: jwt: Missing 'jwt_tokenize' return value check"
- MINOR: jwt: Improve 'jwt_tokenize' function
- MINOR: jwt: Convert EC JWK to EVP_PKEY
- MINOR: jwt: Parse ec-specific fields in jose header
- MINOR: jwt: Manage ECDH-ES algorithm in jwt_decrypt_jwk function
- MINOR: jwt: Add ecdh-es+axxxkw support in jwt_decrypt_jwk converter
- MINOR: jwt: Manage ec certificates in jwt_decrypt_cert
- DOC: jwt: Add ECDH support in jwt_decrypt converters
- MINOR: stconn: Call sc_conn_process from the I/O callback if TASK_WOKEN_MSG state was set
- MINOR: mux-h2: Rely on h2s_notify_send() when resuming h2s for sending
- MINOR: mux-spop: Rely on spop_strm_notify_send() when resuming streams for sending
- MINOR: muxes: Wakup the data layer from a mux stream with TASK_WOKEN_IO state
- MAJOR: muxes: No longer use app_ops .wake() callback function from muxes
- MINOR: applet: Call sc_applet_process() instead of .wake() callback function
- MINOR: connection: Call sc_conn_process() instead of .wake() callback function
- MEDIUM: stconn: Remove .wake() callback function from app_ops
- MINOR: check: Remove wake_srv_chk() function
- MINOR: haterm: Remove hstream_wake() function
- MINOR: stconn: Wakup the SC with TASK_WOKEN_IO state from opposite side
- MEDIUM: stconn: Merge all .chk_rcv() callback functions in sc_chk_rcv()
- MINOR: stconn: Remove .chk_rcv() callback functions
- MEDIUM: stconn: Merge all .chk_snd() callback functions in sc_chk_snd()
- MINOR: stconn: Remove .chk_snd() callback functions
- MEDIUM: stconn: Merge all .abort() callback functions in sc_abort()
- MINOR: stconn: Remove .abort() callback functions
- MEDIUM: stconn: Merge all .shutdown() callback functions in sc_shutdown()
- MINOR: stconn: Remove .shutdown() callback functions
- MINOR: stconn: Totally app_ops from the stconns
- MINOR: stconn: Simplify sc_abort/sc_shutdown by merging calls to se_shutdown
- DEBUG: stconn: Add a CHECK_IF() when I/O are performed on a orphan SC
- MEDIUM: mworker: exiting when couldn't find the master mworker_proc element
- BUILD: ssl: use ASN1_STRING accessors for OpenSSL 4.0 compatibility
- BUILD: ssl: make X509_NAME usage OpenSSL 4.0 ready
- BUG/MINOR: tcpcheck: Fix typo in error error message for `http-check expect`
- BUG/MINOR: jws: fix memory leak in jws_b64_signature
- DOC: configuration: http-check expect example typo
- DOC/CLEANUP: config: update mentions of the old "Global parameters" section
- BUG/MEDIUM: ssl: Handle receiving early data with BoringSSL/AWS-LC
- BUG/MINOR: mworker: always stop the receiving listener
- BUG/MEDIUM: ssl: Don't report read data as early data with AWS-LC
- BUILD: makefile: fix range build without test command
- BUG/MINOR: memprof: avoid a small memory leak in "show profiling"
- BUG/MINOR: proxy: do not forget to validate quic-initial rules
- MINOR: activity: use dynamic allocation for "show profiling" entries
- MINOR: tools: extend the pointer hashing code to ease manipulations
- MINOR: tools: add a new pointer hash function that also takes an argument
- MINOR: memprof: attempt different retry slots for different hashes on collision
- MINOR: tinfo: start to add basic thread_exec_ctx
- MINOR: memprof: prepare to consider exec_ctx in reporting
- MINOR: memprof: also permit to sort output by calling context
- MINOR: tools: add a function to write a thread execution context.
- MINOR: debug: report the execution context on thread dumps
- MINOR: memprof: report the execution context on profiling output
- MINOR: initcall: record the file and line declaration of an INITCALL
- MINOR: tools: decode execution context TH_EX_CTX_INITCALL
- MINOR: tools: support decoding ha_caller type exec context
- MINOR: sample: store location for fetch/conv via initcalls
- MINOR: sample: also report contexts registered directly
- MINOR: tools: support an execution context that is just a function
- MINOR: actions: store the location of keywords registered via initcalls
- MINOR: actions: also report execution contexts registered directly
- MINOR: filters: set the exec context to the current filter config
- MINOR: ssl: set the thread execution context during message callbacks
- MINOR: connection: track mux calls to report their allocation context
- MINOR: task: set execution context on task/tasklet calls
- MINOR: applet: set execution context on applet calls
- MINOR: cli: keep the info of the current keyword being processed in the appctx
- MINOR: cli: keep track of the initcall context since kw registration
- MINOR: cli: implement execution context for manually registered keywords
- MINOR: activity: support aggregating by caller also for memprofile
- MINOR: activity: raise the default number of memprofile buckets to 4k
- DOC: internals: short explanation on how thread_exec_ctx works
- BUG/MINOR: mworker: only match worker processes when looking for unspawned proc
- MINOR: traces: defer processing of "-dt" options
- BUG/MINOR: mworker: fix typo &= instead of & in proc list serialization
- BUG/MINOR: mworker: set a timeout on the worker socketpair read at startup
- BUG/MINOR: mworker: avoid passing NULL version in proc list serialization
- BUG/MINOR: sockpair: set FD_CLOEXEC on fd received via SCM_RIGHTS
- BUG/MEDIUM: stconn: Don't forget to wakeup applets on shutdown
- BUG/MINOR: spoe: Properly switch SPOE filter to WAITING_ACK state
- BUG/MEDIUM: spoe: Properly abort processing on client abort
- BUG/MEDIUM: stconn: Fix abort on close when a large buffer is used
- BUG/MEDIUM: stconn: Don't perform L7 retries with large buffer
- BUG/MINOR: h2/h3: Only test number of trailers inserted in HTX message
- MINOR: htx: Add function to truncate all blocks after a specific block
- BUG/MINOR: h2/h3: Never insert partial headers/trailers in an HTX message
- BUG/MINOR: http-ana: Swap L7 buffer with request buffer by hand
- BUG/MINOR: stream: Fix crash in stream dump if the current rule has no keyword
- BUG/MINOR: mjson: make mystrtod() length-aware to prevent out-of-bounds reads
- MEDIUM: stats-file/clock: automatically update now_offset based on shared clock
- MINOR: promex: export "haproxy_sticktable_local_updates" metric
- BUG/MINOR: spoe: Fix condition to abort processing on client abort
- BUILD: spoe: Remove unsused variable
- MINOR: tools: add a function to create a tar file header
- MINOR: tools: add a function to load a file into a tar archive
- MINOR: config: support explicit "on" and "off" for "set-dumpable"
- MINOR: debug: read all libs in memory when set-dumpable=libs
- DEV: gdb: add a new utility to extract libs from a core dump: libs-from-core
- MINOR: debug: copy debug symbols from /usr/lib/debug when present
- MINOR: debug: opportunistically load libthread_db.so.1 with set-dumpable=libs
- BUG/MINOR: mworker: don't try to access an initializing process
- BUG/MEDIUM: peers: enforce check on incoming table key type
- BUG/MINOR: mux-h2: properly ignore R bit in GOAWAY stream ID
- BUG/MINOR: mux-h2: properly ignore R bit in WINDOW_UPDATE increments
- OPTIM: haterm: use chunk builders for generated response headers
- BUG/MAJOR: h3: check body size with content-length on empty FIN
- BUG/MEDIUM: h3: reject unaligned frames except DATA
- BUG/MINOR: mworker/cli: fix show proc pagination losing entries on resume
- CI: github: treat vX.Y.Z release tags as stable like haproxy-* branches
- MINOR: freq_ctr: add a function to add values with a peak
- MINOR: task: maintain a per-thread indicator of the peak run-queue size
- MINOR: mux-h2: store the concurrent streams hard limit in the h2c
- MINOR: mux-h2: permit to moderate the advertised streams limit depending on load
- MINOR: mux-h2: permit to fix a minimum value for the advertised streams limit
- BUG/MINOR: mworker: fix sort order of mworker_proc in 'show proc'
- CLEANUP: mworker: fix tab/space mess in mworker_env_to_proc_list()
Since version 3.1, the display order of old workers in 'show proc' was
accidentally reversed. The oldest worker was shown first and the newest
last, which was not the intended behavior. This regression was introduced
during the master-worker rework.
Fix this by sorting the list during deserialization in
mworker_env_to_proc_list().
An alternative fix would have been to iterate the list in reverse order
in the show proc function, but that approach risks introducing
inconsistencies when backporting to older versions.
Must be backported to 3.1 and later.
When using rq-load on tune.h2.fe.max-concurrent-streams, it's easy to
reach a situation where only one stream is allowed. There's nothing
wrong with this but it turns out that slightly higher values do not
necessarily cause significantly higher loads and will improve the user
experience. For this reason the keyword now also supports "min" to
specify a value. Experimentation shows that values from 5 to 15 remain
very effective at protecting the run queue while allowing a great level
of parallelism that keeps a site fluid.
Global setting tune.h2.fe.max-concurrent-streams now supports an optional
"rq-load" option to pass either a target load, or a keyword among "auto"
and "ignore". These are used to quadratically reduce the advertised streams
limit when the thread's run queue size goes beyong the configured value,
and automatically reduce the load on the process from new connections.
With "auto", instead of taking an explicit value, it uses as a target the
"tune.runqueue-depth" setting (which might be automatic). Tests have shown
that values between 50 and 100 are already very effective at reducing the
loads during attacks from 100000 to around 1500. By default, "ignore"
is in effect, which means that the dynamic tuning is not enabled.
The hard limit on the number of concurrent streams is currently
determined only by configuration and returned by
h2c_max_concurrent_streams(). However this doesn't permit to
change such settings on the fly without risking to break connections,
and it doesn't allow a connection to pick a different value, which
could be desirable for example to try to slow abuse down.
Let's store a copy of h2c_max_concurrent_streams() at connection
creation time into the h2c as streams_hard_limit. This inflates
the h2c size from 1324 to 1328 (0.3%) which is acceptable for the
expected benefits.
The new field th_ctx->rq_tot_peak contains the computed peak run queue
length averaged over the last 512 calls. This is computed when entering
process_runnable_tasks. It will not take into account new tasks that are
created or woken up during this round nor those which are evicted, which
is the reason why we're using a peak measurement to increase chances to
observe transient high values. Tests have shown that 512 samples are good
to provide a relatively smooth average measurement while still fading
away in a matter of milliseconds at high loads. Since this value is
only updated once per round, it cannot be used as a statistic and
shouldn't be exposed, it's only for internal use (self-regulation).
Sometimes it's desirable to observe fading away peak values, where a new
value that is higher than the historical one instantly replaces it,
otherwise contributes to it. It is convenient when trying to observe
certain phenomenons like peak queue sizes. The new function
swrate_add_peak_local() does that to a private variable (no atomic ops
involved as it's not worth the cost since such use cases are typically
local).