Congestion window is limit by a minimal and maximum values which can
never be exceeded. Min value is hardcoded to 2 datagrams as recommended
by the specification. Max value is specified via haproxy configuration.
These values must be respected each time the congestion window size is
adjusted. However, in some rare occasions, limit were not always
enforced. Fix this by implementing wrappers to set or increment the
congestion window. These functions ensure limits are always applied
after the operation.
Additionnally, wrappers also ensure that if window reached a new maximum
value, it is saved in <cwnd_last_max> field.
This should be backported up to 2.6, after a brief period of
observation.
Write minor adjustments to QUIC BBR functions. The objective is to
centralize every modification of path cwnd field.
No functional change. This patch will be useful to simplify
implementation of global QUIC Tx memory usage limitation.
There was some possible confusion between fields related to congestion
window size min and max limit which cannot be exceeded, and the maximum
value previously reached by the window.
Fix this by adopting a new naming scheme. Enforced limit are now renamed
<limit_max>/<limit_min>, while the previously reached max value is
renamed <cwnd_last_max>.
This should be backported up to 3.1.
The account creation was mistakenly ending the task instead of being
wakeup for the NewOrder state, it was preventing the creation of the
certificate, however the account was correctly created.
To fix this, only the jump to the end label need to be remove, the
standard leaving codepath of the function will allow to be wakeup.
No backport needed.
TCP_NOTSENT_LOWAT is very convenient as it indicates when to report
EAGAIN on the sending side. It takes a margin on top of the estimated
window, meaning that it's no longer needed to store too many data in
socket buffers. Instead there's just enough to fill the send window
and a little bit of margin to cover the scheduling time to restart
sending. Experiments on a 100ms network have shown a 10-fold reduction
in the memory used by socket buffers by just setting this value to
tune.bufsize, without noticing any performance degradation. Theoretically
the responsiveness on multiplexed protocols such as H2 should also be
improved.
The ping frame's pattern must be written at offset 9 (frame header
length), not 8. This was added in 3.2 with commit 4dcfe098a6 ("MINOR:
mux-h2: prepare to support PING emission"), so no backport is needed.
While humans can find it convenient to enter the worker process in prompt
mode, for external tools it will not be convenient to have to systematically
disable it. A better approach is to replicate the master socket's mode
there, since it has already been configured to suit the user: interactive,
prompt and timed modes are automatically passed to the worker process.
This makes the using the worker commands more natural from the master
process, without having to systematically adapt it for each new connection.
Support the same syntax in master mode as in worker mode in order to
configure the prompt. The only thing is that for now the master doesn't
have a non-interactive mode and it doesn't seem necessary to implement
it, so we only support the interactive and prompt modes. However the code
was written in a way that makes it easy to change this later if desired.
Now the prompt mode can more finely be configured between non-interactive
(default), interactive without prompt, and interactive with prompt. This
will ease the usage from automated tools which are not necessarily
interested in having to consume '> ' after each command nor displaying
"+" on payload lines. This can also be convenient when coming from the
master CLI to keep the same output format.
The CLI's "prompt" command toggles two distinct things:
- displaying or hiding the prompt at the beginning of the line
- single-command vs interactive mode
These are two independent concepts and the prompt mode doesn't
always cope well with tools that would like to upload data without
having to read the prompt on return. Also, the master command line
works in interactive mode by default with no prompt, which is not
consistent (and not convenient for tools). So let's start by splitting
the bit in two, and have a new APPCTX_CLI_ST1_INTER flag dedicated
to the interactive mode. For now the "prompt" command alone continues
to toggle the two at once.
We add __equals_0(NAME) which is only true if NAME is defined as zero,
and __def_as_empty(NAME) which is only true if NAME is defined as an
empty string.
Generate the private key on the account file when the file does not
exists. This generate a private key of the type and parameters
configured in the acme section.
Setting DEBUG_THREAD to 1 allows recording the lock history for each
thread. Tests have shown that (as predicted) the cost of updating a
single thread-local variable is not perceptible in the noise, especially
when compared to the cost of obtaining a lock. Since this can provide
useful value when debugging deadlocks, let's enable it by default when
threads are enabled.
This will display the lock labels and modes for each non-empty step
at the end of "show threads" when these are defined. This allows to
emit up to the last 8 locking operation for each thread on 64 bit
machines.
by only storing a word in each thread context, we can keep the history
of all taken/dropped locks by label. This is expected to be very cheap
and to permit to store up to 8 consecutive lock operations in 64 bits.
That should significantly help detect recursive locks as well as figure
what thread was likely to hinder another one waiting for a lock.
For now we only store the final state of the lock, we don't store the
attempt to get it. It's just a matter of space since we already need
4 ops (rd,sk,wr,un) which take 2 bits, leaving max 64 labels. We're
already around 45. We could also multiply by 5 and still keep 8 bits
total per lock, that would limit us to 51 locks max. It seems that
most of the time if we get a watchdog panic, anyway the victim thread
will be perfectly located so that we don't need a specific value for
this. Another benefit is that we perform a single memory write per
lock.
We now default the value to zero and make sure all tests properly take
care of values above zero. This is in preparation for supporting several
degrees of debugging.
Machines lacking CAS8B/DWCAS and emit a warning in lb_fwlc.c without
threads due to declaration ordering. Let's just move the variable
declaration into the block that uses it as a last variable. No
backport is needed.
Not all systems have strndup(), that's why we have our "my_strndup()",
so let's make use of it here. This fixes the build on Solaris 10. No
backport is needed.
It has been requested to have the current_session_rate exposed at the
frontend level. For now only the per-process value was exposed
(ST_I_INF_SESS_RATE).
Thanks to the work done lately to merge promex and stat_cols_px[]
array, let's simply defined an .alt_name for the ST_I_PX_RATE metric in
order to have promex exposing it as current_session_rate for relevant
contexts.
log-forward "host" option may be confusing because we often mention the
host field when talking about syslog RFC3164 or RFC5424 messages, but
neither rfc actually define "host" field. In fact, everywhere we used
"host field" we actually meant "hostname field" as documented in RFC5424.
This was a language abuse on our side.
In this patch we replace "host" with "hostname" where relevant in the
documentation to prevent confusion.
Thanks to Nick Ramirez for having reported the issue.
Nick Ramirez reported that the ACME paragraph (3.13) caused a rendering
issue where simple text was rendered as a directive. This was caused
by the use of unescaped <name> which confuses dconv.
Let's escape <name> by putting quotes around it to prevent the rendering
issue.
No backport needed.
Add a -t option to 'show ssl sni', allowing to add an offset to the
current date so it would allow to check which certificates are expired
after a certain period of time.
Since we made it possible for a bind_conf to listen to multiple thread
groups with shards in 2.8 with commit 9d360604bd ("MEDIUM: listener:
rework thread assignment to consider all groups"), the per-listener
connection count was not properly transferred to the target listener
with the connection when switching to another thread group. This results
in one listener possibly reaching high values and another one possibly
reaching negative values. Usually it's not visible, unless a maxconn is
set on the bind_conf, in which case comparisons will quickly put an end
to the willingness to accept new connections.
This problem only happens when thread groups are enabled, and it seems
very hard to trigger it normally, it only impacts sockets having a single
shard, hence currently the CLI (or any conf with "bind ... shards 1"),
where it can be reproduced with a config having a very low "maxconn" on
the stats socket directive (here, 4), and issuing a few tens of
socat <<< "show activity" in parallel, or sending HTTP connections to a
single-shared listener. Very quickly, haproxy stops accepting connections
and eats CPU in the poller which tries to get its connections accepted.
A BUG_ON(l->nbconn<0) after HA_ATOMIC_DEC() in listener_release() also
helps spotting them better.
Many thanks to Christian Ruppert who once again provided a very accurate
report in GH #2951 with the required data permitting this analysis.
This fix must be backported to 2.8.
tasklets were originally designed to alway run on only one thread, so it
was not possible to have it run on 2 threads concurrently.
The API has been extended so that another thread may wake the tasklet,
the idea was still that we wanted to have it run on one thread only.
However, the way it's been done meant that unless a tasklet was bound to
a specific tid with tasklet_set_tid(), or we explicitely used
tasklet_wakeup_on() to specify the thread for the target to run on, it
would be scheduled to run on the current thread.
This is in fact a desirable feature. There is however a race condition
in which the tasklet would be scheduled on a thread, while it is running
on another. This could lead to the same tasklet to run on multiple
threads, which we do not want.
To fix this, just do what we already do for regular tasks, set the
"TASK_RUNNING" flag, and when it's time to execute the tasklet, wait
until that flag is gone.
Only one case has been found in the current code, where the tasklet
could run on different threads depending on who wakes it up, in the
leastconn load balancer, since commit
627280e15f.
It should not be a problem in practice, as the function called can be
called concurrently.
If a bug is eventually found in relation to this problem, and this patch
should be backported, the following patches should be backported too :
MEDIUM: quic: Make sure we return the tasklet from quic_accept_run
MEDIUM: quic: Make sure we return NULL in quic_conn_app_io_cb if needed
MEDIUM: quic: Make sure we return the tasklet from qcc_io_cb
MEDIUM: mux_fcgi: Make sure we return the tasklet from fcgi_deferred_shut
MEDIUM: listener: Make sure w ereturn the tasklet from accept_queue_process
MEDIUM: checks: Make sure we return the tasklet from srv_chk_io_cb
In quic_conn_app_io_cb, make sure we return NULL if the tasklet has been
destroyed, so that the scheduler knows. It is not yet needed, but will
be soon.
Released version 3.2-dev12 with the following main changes :
- BUG/MINOR: quic: do not crash on CRYPTO ncbuf alloc failure
- BUG/MINOR: proxy: always detach a proxy from the names tree on free()
- CLEANUP: proxy: detach the name node in proxy_free_common() instead
- CLEANUP: Slightly reorder some proxy option flags to free slots
- MINOR: proxy: Add options to drop HTTP trailers during message forwarding
- MINOR: h1-htx: Skip C-L and T-E headers for 1xx and 204 messages during parsing
- MINOR: mux-h1: Keep custom "Content-Length: 0" header in 1xx and 204 messages
- MINOR: hlua/h1: Use http_parse_cont_len_header() to parse content-length value
- CLEANUP: h1: Remove now useless h1_parse_cont_len_header() function
- BUG/MEDIUM: mux-spop: Respect the negociated max-frame-size value to send frames
- MINOR: http-act: Add 'pause' action to temporarily suspend the message analysis
- MINOR: acme/cli: add the 'acme renew' command to the help message
- MINOR: httpclient: add an "https" log-format
- MEDIUM: acme: use a customized proxy
- MEDIUM: acme: rename "uri" into "directory"
- MEDIUM: acme: rename "account" into "account-key"
- MINOR: stick-table: use a separate lock label for updates
- MINOR: h3: simplify h3_rcv_buf return path
- BUG/MINOR: mux-quic: fix possible infinite loop during decoding
- BUG/MINOR: mux-quic: do not decode if conn in error
- BUG/MINOR: cli: Issue an error when too many args are passed for a command
- MINOR: cli: Use a full prompt command for bidir connections with workers
- MAJOR: cli: Refacor parsing and execution of pipelined commands
- MINOR: cli: Rename some CLI applet states to reflect recent refactoring
- CLEANUP: applet: Update st0/st1 comment in appctx structure
- BUG/MINOR: hlua: Fix I/O handler of lua CLI commands to not rely on the SC
- BUG/MINOR: ring: Fix I/O handler of "show event" command to not rely on the SC
- MINOR: cli/applet: Move appctx fields only used by the CLI in a private context
- MINOR: cache: Add a pointer on the cache in the cache applet context
- MINOR: hlua: Use the applet name in error messages for lua services
- MINOR: applet: Save the "use-service" rule in the stream to init a service applet
- CLEANUP: applet: Remove unsued rule pointer in appctx structure
- BUG/MINOR: master/cli: properly trim the '@@' process name in error messages
- MEDIUM: resolvers: add global "dns-accept-family" directive
- MINOR: resolvers: add command-line argument -4 to force IPv4-only DNS
- MINOR: sock-inet: detect apparent IPv6 connectivity
- MINOR: resolvers: add "dns-accept-family auto" to rely on detected IPv6
- MEDIUM: acme: use Retry-After value for retries
- MEDIUM: acme: reset the remaining retries
- MEDIUM: acme: better error/retry management of the challenge checks
- BUG/MEDIUM: cli: Handle applet shutdown when waiting for a command line
- Revert "BUG/MINOR: master/cli: properly trim the '@@' process name in error messages"
- BUG/MINOR: master/cli: only parse the '@@' prefix on complete lines
- MINOR: resolvers: use the runtime IPv6 status instead of boot time one
On systems where the network is not reachable at boot time (certain HA
systems for example, or dynamically addressed test machines), we'll want
to be able to periodically revalidate the IPv6 reachability status. The
current code makes it complicated because it sets the config bits once
for all at boot time. This commit changes this so that the config bits
are not changed, but instead we rely on a static inline function that
relies on sock_inet6_seems_reachable for every test (really cheap). This
also removes the now unneeded resolvers late init code.
This variable for now is still set at boot time but this will ease the
transition later, as the resolvers code is now ready for this.
The new adhoc parser for the '@@' prefix forgot to require the presence
of the LF character marking the end of the line. This is the reason why
entering incomplete commands would display garbage, because the line was
expected to have its LF character replaced with a zero.
The problem is well illustrated by using socat in raw mode:
socat /tmp/master.sock STDIO,raw,echo=0
then entering "@@1 show info" one character at a time would error just
after the second "@". The command must take care to report an incomplete
line and wait for more data in such a case.
This reverts commit 0e94339eaf.
This patch was in fact fixing the symptom, not the cause. The root cause
of the problem is that the parser was processing an incomplete line when
looking for '@@'. When the LF is present, this problem does not exist
as it's properly replaced with a zero. This can be verified using socat
in raw mode:
socat /tmp/master.sock STDIO,raw,echo=0
Then entering "@@1 show info" one character at a time will immediately
fail on "@@" without going further. A subsequent patch will fix this.
No backport is needed.
When the CLI applet was refactord in the commit 20ec1de21 ("MAJOR: cli:
Refacor parsing and execution of pipelined commands"), a regression was
introduced. The applet shutdown was not longer handled when the applet was
waiting for the next command line. It is especially visible when a client
timeout occurred because the client connexion is no longer closed.
To fix the issue, the test on the SE_FL_SHW flag was reintroduced in
CLI_ST_PARSE_CMDLINE state, but only is there is no pending input data.
It is a 3.2-specific issue. No backport needed.
When the ACME task is checking for the status of the challenge, it would
only succeed or retry upon failure.
However that's not the best way to do it, ACME objects contain an
"status" field which could have a final status or a in progress status,
so we need to be able to retry.
This patch adds an acme_ret enum which contains OK, RETRY and FAIL.
In the case of the CHKCHALLENGE, the ACME could return a "pending" or a
"processing" status, which basically need to be rechecked later with the
RETRY. However a "invalid" or "valid" status is final and will return
either a FAIL or a OK.
So instead of retrying in any case, the "invalid" status will ends the
task with an error.
Parse the Retry-After header in response and store it in order to use
the value as the next delay for the next retry, fallback to 3s if the
value couldn't be parse or does not exist.
Instead of always having to force IPv4 or IPv6, let's now also offer
"auto" which will only enable IPv6 if the system has a default gateway
for it. This means that properly configured dual-stack systems will
default to "ipv4,ipv6" while those lacking a gateway will only use
"ipv4". Note that no real connectivity test is performed, so firewalled
systems may still get it wrong and might prefer to rely on a manual
"ipv4" assignment.
In order to ease dual-stack deployments, we could at least try to
check if ipv6 seems to be reachable. For this we're adding a test
based on a UDP connect (no traffic) on port 53 to the base of
public addresses (2001::) and see if the connect() is permitted,
indicating that the routing table knows how to reach it, or fails.
Based on this result we're setting a global variable that other
subsystems might use to preset their defaults.
In order to ease troubleshooting and testing, the new "-4" command line
argument enforces queries and processing of "A" DNS records only, i.e.
those representing IPv4 addresses. This can be useful when a host lack
end-to-end dual-stack connectivity. This overrides the global
"dns-accept-family" directive and is equivalent to value "ipv4".