Since 9c91b30 ("MINOR: server: remove prev_deleted server list"), hlua
server pair iterator may use and return invalid (stale) server pointer
if multiple servers were deleted between two iterations.
Indeed, the server refcount mechanism (using srv_take()) is no longer
sufficient as the prev_deleted mitigation was removed.
To ensure server pointer consistency between two yields, the new watcher
mechanism must be used (as it already the case for stats dumping).
Thus in this patch we slightly change the server iteration logic:
hlua_server_list_iterator_context struct now stores the next valid server
pointer, and a watcher is added to ensure this pointer is never stale.
Then in hlua_listable_servers_pairs_iterator(), this next pointer is used
to create the Lua server object, and the next valid pointer is obtained by
leveraging watcher_next().
No backport needed unless 9c91b30 ("MINOR: server: remove prev_deleted
server list") is. Please note that dynamic servers were not supported in
Lua prior to 2.8, so it doesn't make sense to backport this patch further
than 2.8.
"hold.timeout" was used as expiration date for srvrq_check tasks. But it is
not accurrate. The expiration date must be based on the resolution timeouts
instead (resolve and retry).
The purpose of srvrq_check task is to clean up the server resolution status
when outdated info are inherited from the state file. Using "hold.timeout"
is not accurrate here because hold timeouts concern the resolution response
items not the resolution status of servers. It may be set to a huge value or
0. The expiration date of these tasks must be based on the resolution
timeouts instead.
So now the ("timeout resolve" + resolve_retries * "timeout retry") value is
used.
This patch should fix the issue #2816. It must be backported to all stable
versions.
query() sample fetch function takes an optional argument string. During
configuration parsing, empty string must be ignored. It is especially
important when the sample is used with empty parenthesis. The argument is
optional and it is a list of options to configure the behavior of the sample
fetch. So it is logical to ignore empty strings.
This patch should fix the issue #2815. It must be backported to 3.1.
This patch is a direct follow-up to the previous one. Thanks to watcher
type, it is not safe to assume that servers manipulated via stats dump
were not targetted by a "delete server" CLI command. As such,
prev_deleted list server member is now unneeded. This patch thus removes
any reference to it.
If a server A is deleted while a stats dump is currently on it, deletion
is delayed thanks to reference counting. Server A is nonetheless removed
from the proxy list. However, this list is a single linked list. If the
next server B is deleted and freed immediately, server A would still
point to it. This problem has been solved by the prev_deleted list in
servers.
This model seems correct, but it is difficult to ensure completely its
validity. In particular, it implies when stats dump is resumed, server A
elements will be accessed despite the server being in a half-deleted
state.
Thus, it has been decided to completely ditch the refcount mechanism for
stats dump. Instead, use the watcher element to register every stats
dump currently tracking a server instance. Each time a server is deleted
on the CLI, each stats dump element which may points to it are updated
to access the next server instance, or NULL if this is the last server.
This ensures that a server which was deleted via CLI but not completely
freed is never accessed on stats dump resumption.
Currently, no race condition related to dynamic servers and stats dump
is known. However, as described above, the previous model is deemed too
fragile, as such this patch is labelled as bug-fix. It should be
backported up to 2.6, after a reasonable period of observation. It
relies on the following patch :
MINOR: list: define a watcher type
Define a new watcher type into list module. This type is similar to bref
and can be used to register an element which is currently tracking a
dynamic target. Contrary to bref, if the target is freed, every watcher
element are updated to point to a next valid entry or NULL.
This type will simplify handling of dynamic servers deletion, in
particular while stats dump are performed.
This patch is not a bug-fix. However, it is mandatory to fix a race
condition in dynamic servers. Thus, it should be backported along the
next commit up to 2.6.
Servers instance may be removed at runtime. This can occurs during a
stat dump which currently references this server instance. This case is
protected by server refcount to prevent the server immediate release.
CLI output may be interrupted prior to stats dump completion, for
example if client CLI has been disconnected before the full response
transfer. As such, srv_drop() must be called in every stats dump release
callback.
srv_drop() was missing for stats-file dump release callback. This could
cause a race condition which would prevent a server instance to be fully
removed. Fix this by adding srv_drop() invokation into
cli_io_handler_release_dump_stat_file().
This should be backported up to 3.0.
The 'show ssl sni' output can be confusing when using crt-list, because
the wildcards can be completed with negative filters, and they need to
be associated to the same line.
Having a negative filter on its line alone does not make much sense,
this patch adds a new 'Negative Filter' column that show the exception
applied on a wildcard from a crt-list line.
Some master process' initialization steps are conditioned by receiving the
READY message from worker (pidfile creation, forwarding READY message to the
launching parent). So, master process can not do these initialization routines
before.
If the master process fails, while creating pid or forwarding the READY to the
parent in daemon mode, he exits with a proper alert message. In daemon mode we
no longer see such message, as process is already detached from the tty.
To fix this, as these alerts could be very useful, let's detach the master
process from the tty after his last initialization steps in _send_status.
As daemonization fork happens now very early and before the master-worker
fork, if master or worker processes fail during the initialization, some
critical errors can't be reported to stdout. The launching (parent) process in
such cases exits with 0. This makes an impression, that master and his worker
have successfully started at background, which really complicates the
operations.
In the previous commit a pipe was added to make daemonized child communicate
with his parent. Let's add the same logic to master-worker mode. Up to
receiving the READY message from the worker, master will "forward" it via the
pipe to the launching process. Launching process can obtain master's exit
status, if the master fails to start and nothing has been written in the pipe.
This fix should be backported only in 3.1.
Due to master-worker rework, daemonization fork happens now before parsing
and applying the configuration. This makes impossible to report correctly all
warnings and alerts to shell's stdout. Daemonzied process fails, while being
already in background, exit code reported by shell via '$?' equals to 0, as
it's the exit code of his parent.
To fix this, let's create a pipe between parent and daemonized child. The
child will send into this pipe a "READY" message, when it finishes his
initialization. The parent will wait on the "read" end of the pipe until
receiving something. If read() fails, parent obtains the status of the
exited child with waitpid(). So, the parent can correctly report the error to
the stdout and he can exit with child's exitcode.
This fix should be backported only in 3.1.
Due to master-worker refactoring, daemonization fork happens now very early,
before parsing and verifying the configuration. For the moment there is no
any specific syntax, which needs for the daemon mode to be really applied in
order to perform the tests.
So, it's better not to do the daemonization fork, if 'daemon' keyword is
presented in the config (or -D option), when we started with -c (MODE_CHECK).
Like this, during the config verification, the process will always stay in
foreground and all warning or errors will be delivered to the stdout.
This fix should be backported only in 3.1.
The "show ssl sni" command, allows one to dump the list of SNI in an
haproxy process, or a designated frontend.
It lists the SNI with the type, filename, and dates of expiration and
activation
fddebug() is sometimes quite helpful, but annoying to use when following
a call path because it's a pain to always repeat the function name and
call place. Let's have it automatically prepend the function name, the
file name and the line number, and make its arguments optional, replacing
them by a simple LF when all absent. This way, simply placing:
fddebug();
is sufficient to emit a location follocing "[%s@%s:%d]\n". This function
must not be used in production (and even call places with it shouldn't be
committed) and it should only be used by developers, so the simplest the
better.
Commit 7f64bb79fd ("BUG/MINOR: debug: COUNT_IF() should return true/false")
allowed the COUNT_IF() macro to return the evaluated value. This is handy
to place it in "if ()" conditions and count them at the same time. When
glitches are disabled, the condition is just returned as-is, but most call
places do not use the result, making some compilers complain. In addition,
while reviewing this, it was noticed that when DEBUG_STRICT=0, the macro
would still be replaced by a "do { } while (0)" statement, which not only
does not evaluate the expression, but also cannot return anything. Ditto
for COUNT_IF_HOT().
Let's make sure both are always properly evaluated now.
Latest commit f0eca8fe7 ("MINOR: mux-h2/glitches: add a description to
the H2 glitches") misplaced the optional glitch description field, with
it appearing at the end of the if () condition and always reporting
an excess of CONTINUATION frames from the first exceeding one.
This needs to be backported along with that commit once it gets backported.
Since we can now list them using "debug counters" and now support a
description, better add the description to all glitches. This patch may
be backported to 3.1, but before this the following patches must also
be picked:
86823c828 MINOR: mux-h2/traces: add a missing trace on negative initial window size
7c8e9420a CLEANUP: mux-h2/traces: reword certain ambiguous traces
When a negative initial windows size is reported, we're going to close
the connection, so it's important to report a trace to explain why!
This should be backported at least to 3.1 and possibly 3.0 (adapting the
context since there's no glitches there).
The COUNT_IF() macro was initially meant to return true/false to be used
in if() conditions but had an extra do { } while(0) that prevents it from
doing so. Let's get rid of the do { } while(0) before the code generalizes
to too many places. There's no impact on existing code, but may have to be
backported if future fixes rely on it.
Define a new function quic_register_build_options(). Its purpose is to
register a build options string for QUIC features which is reported when
using haproxy -vv.
This will allow to easily determine if connection socket-owner mode and
GSO are supported or not. Here is the new filtered output :
$ ./haproxy -vv|grep '^QUIC:'
QUIC: connection socket-owner mode support : yes
QUIC: GSO emission support : yes
Two features are tested on startup via quic_test_socketopts() :
connection socket-owner mode support and GSO. Extract both test in their
separated functions called by quic_test_socketopts().
This patch will allow to reuse easily QUIC features detection for build
options report via haproxy -vv.
quic_test_socketopts() function is used to detect system support for
QUIC network stack. Previously, it relies on an already bound listener
instance, notably to ensure that two UDP sockets can be bound on the
same source address.
Improve quic_test_socketopts() to run without any listener argument. It
now automatically instantiates and manipulates two dummy sockets FDs to
check for multi-bind support. This brings two advantages :
* the function is now called via an initcall
* it will easily be reusable to implement build option description
This commit is the counterpart of the previous one for FCGI mux. It
replaces objt_server() by unsafe __objt_server(), as conn target is
guarantee to point to a valid server instance, which can then be used as
_srv_add_idle() argument.
This commit is the counterpart of the previous one for SPOP mux. It
replaces objt_server() by unsafe __objt_server(), as conn target is
guarantee to point to a valid server instance, which can then be used as
_srv_add_idle() argument.
This should fix coverity report from github issue #2811.
This commit is the counterpart of the previous one for H2 mux. It
replaces objt_server() by unsafe __objt_server(), as conn target is
guarantee to point to a valid server instance, which can then be used as
_srv_add_idle() argument.
When dealing with a backend connection, H1 mux IO handler must reinsert
it in its idle list pool if it was extracted from it at the beginning.
This is the case if conn_in_list is true.
On reinsert, idle list pool is retrieved via the server instance
accessible from <conn.target>. Replace objt_server usage with
__objt_server as an idle connection is always attached to a server. This
ensures that there is no issue when using _srv_add_idle() then.
This should fix coverity report from github issue #2810.
in 50322dff ("MEDIUM: server: add init-state"), some examples on how to
use init-state server keyword were added alongside with the keyword
documentation.
However, as reported by Nick Ramirez, there was an error because the
example that stated that haproxy will pass the traffic to the server after
3 successful health checks used the "init-state down" instead of the
"init-state fully-down". Thus the behavior wouldn't match what the
comment said (only 1 successful health check was required).
Here we fix the example in itself to match with the comment. Also the
following example ("# or") was also affected, but it is kind of
redundant as the main purpose of the examples are to illustrate the
feature in itself and not how to use server-template directive, so we
remove it.
This should be backported in 3.1 with 50322dff
Update comment and condition. nb_oldpids it's not a pointer, but a signed int,
which keeps the max number of elements in oldpids array. So, it's a good
practice to check, if it's strictly positive here.
If master process can't open a pidfile, there is no sense to send SIGTTIN to
oldpids, as it will exit. So, old workers will terminate as well. It's better
to send the last alert to the log about unrecoverable error, because master is
already in its polling loop.
For the standalone mode we should keep the previous logic in this case: send
SIGTTIN to old process and unbind listeners for the new one. So, it's better
to put this error path in main(), as it's done when other configuration settings
can't be applied.
This patch should be backported only in 3.1.
When a new master process is launched like below:
./haproxy -W -D -p ha.pid -sf $(cat ha.pid)...
The old master process and its workers do not stop. Since the master-worker
refactoring, the code, which sends USR1/TERM to old pids from -sf, is called
only for the standalone mode. In master-worker mode we should receive the READY
message from the newly forked worker at first, in order to be able to terminate
the previous master.
So, to fix this, let's terminate the previous master in _send_status(), where
we parse the READY message from the newly forked worker. And let's continue to
use oldpids array, as it was in 3.0, in order to stop the workers, launched
before the reload.
This patch should be backported only in 3.1.
After reload, previously launched programs are stopped explicitly in
mworker_ext_launch_all(). So, there is no longer need to save their PIDs in
oldpids array before the master reexec().
This also prepares the fix of "-D -W -sf/-st" modes, as we will need to
loop over this array in the master process context, in order to stop the
previous master, when the new one is ready.
This patch should be backported only in 3.1.
These options are now deprectated, but the proxy capabilities are not
properly checked during the configuration parsing leading to always ignore
these options. This is now fixed by checking the frontend capability for
"accept-invalid-http-request" option and the backend capability for
"accept-invalid-http-response" option.
In addition, the messages about the deprecation of these options are now
emitted with ha_warning() instead of ha_alert() because they are only
warnings and not errors.
This patch should fix the issue #2806. It must be backported to 3.1.
Recently, an issue was found with QUIC zero-copy forwarding on 3.0
version. A desynchronization could occur internally in QCS Tx bytes
counters which would cause a BUG_ON() crash on qcs_destroy() when the
stream is detached.
It was silently fixed in version 3.1 by the following patch. As it was
considered as an optimization, it was not scheduled yet for backport.
6697e87ae5
MINOR: mux-quic: Don't send an emtpy H3 DATA frame during zero-copy forwarding
This mistake has been caused due to some counter-intuitive manipulation
in QUIC zero-copy implementation. Try to streamline this in QUIC MUX
done_ff callback and its application protocol counterpart. Especially
for values exchanged between MUX and application on one side, and MUX
and stconn layer as done_fastfwd return value.
First, application done_ff callback now returns the length of the wholly
encoded frame. For HTTP/3, it means header length + payload length h3
frame. This value can then be reused as qcc_send_stream() argument to
increase QCS Tx soft offset.
As previously, special care has been taken to ensure that QUIC MUX
done_ff only return the transferred data bytes. Thus, any extra offset
for HTTP/3 header is properly excluded. This is mandatory for stconn
layer to consider the transfer has completed.
Secondly, remove duplicated code in application done_ff to reset iobuf
info. This is now factorize in QUIC MUX done_ff itself.
This patch is related to github issue #2678.
Since commit 1cc851d9f2 ("MEDIUM: mux-h2: start to update stream when
sending WU") we started storing stream offsets in the h2s struct. These
offsets are updated at a few points, where it's safe to write to the
stream, and in h2c_send_strm_wu(), where the h2s->h2c was not performed.
Due to this, nothing protects the h2s from being updated when sending a
WU for a closed stream, which might only happen when acknowledging a
frame after resetting that stream, which is quite unlikely. In any case
if this happens, it will crash as in issue #2793 since the closed streams
are purposely read-only to catch such bugs.
The fix is trivial, just check h2s->h2c before deciding to update the
stream.
Thanks to @Wahnes for reporting this, and Christopher for spotting the
cause. This needs to be backported to 3.1 only.
Better late than never, commit 1f73d35 ("MINOR: stktable: implement
"recv-only" table option") implemented stktable flags and initial
definitions, but it lacks some comments plus the flag is stored as
16bits but the SKT_FL_ definition width allows for only 8bits so
it is a bit confusing, let's fix that
Thanks to previous commit stktable struct now have a "flags" struct member
Let's take this opportunity to remove the isolated "nopurge" attribute in
stktable struct and rely on a flag named STK_FL_NOPURGE instead.
This helps to better organize stktable struct members.
When "recv-only" keyword is added on a stick table declaration (in peers
or proxy section), haproxy considers that the table is only used for
data retrieval from a remote location and not used to perform local
updates. As such, it enables the retrieval of local-only values such
as conn_cur that are ignored by default. This can be useful in some
contexts where we want to know about local-values such are conn_cur
from a remote peer.
To do this, add stktable struct flags which default to NONE and enable
the RECV_ONLY flag on the table then "recv-only" keyword is found in the
table declaration. Then, when in peer_treat_updatemsg(), when handling
table updates, don't ignore data updates for local-only values if the flag
is set.
This patch is similar to the previous one, but for GSO support. Remove
alert level message to a diag report only visible with argument -dD.
This must be backported up to 3.1.
QUIC relies on several advanced network API features from the kernel to
perform optimally. Checks are performed during startup to ensure that
these features are supported. A fallback is automatically performed for
every incompatible feature.
Besides the automatic fallback mechanism, a message is also reported to
the user at the same time. Previously, alert level was used, but it is
incorrect as it is reserved for unrecoverable errors which should
prevent haproxy to start. Warning level could be used, but this can
annoy users running with zero-warning mode.
This patch removes the alert message when 'socket-owner connection' mode
cannot be activated. Convert the message to a diag level. This allows
users to start without forcing configuration modification to hide a
warning. Besides, several feature fallback such as the polling mechanism
does not emit any warning either, so it's better to adopt a similar
behavior for QUIC features.
This must be backported up to 2.8.
TASK_F_USR1 is used by MUX tasklet when emission has been interrupted
due to pacing. When the tasklet runs again, only qcc_purge_sending()
will be called as an optimization.
Pacing status is only removed via qcc_wakeup(). Until then, TASK_F_USR1
is not cleared. This causes an issue after emission with pacing
completion if the MUX tasklet is woken up for a recv subscribe, as
qcc_wakeup() is not used by quic-conn layer. The tasklet will
incorrectly run only for pacing emission, without handling reception
process. Worst, a crash will occur if QCC tx frames list is empty, due
to a BUG_ON() in qcc_purge_sending().
Recv subscribe is only used for 0-RTT, when QUIC MUX is instantiated
before quic-conn handshake completion. Thus, this bug can only be
reproduced with 0-rtt. Furthermore, MUX must already have emitted at
least a few response bytes with pacing, before QUIC handshake
completion. It cannot easily be reproduced, at least with CLI clients
where the handshake is always already completed before MUX exchanges.
To fix this, remove TASK_F_USR1 when pacing emission has been completed.
At least, this prevents BUG_ON() on qcc_purge_sending() as it won't be
called with an empty QCC Tx frame list anymore. However, this bug has
revealed that MUX tasklet architecture is not suitable when both
handling reception and emission part. This will be improved in a future
serie of patches.
This should fix github issue #2796.
This must be backported up to 3.1.
In 3.1-dev10, commit 8dd4efe42f ("MAJOR: mworker: move master-worker
fork in init()") made the fork_poller() code unconditional, while it
is only desirable for processes that have been forked from a parent
(standalone daemon mode) or from a master (master-worker mode). The
call can be expensive in some cases as it will create a new poller,
scan and try to migrate to it all existing FDs till the highest known
one. With very high numbers of FDs, this can take several seconds to
start.
This should be backported to 3.1.
Commit 8dd4efe42f ("MAJOR: mworker: move master-worker fork in init()")
introduced some sensitive changes to the startup code (which was
expected), and one sensitive change is that the second call to setsid()
was accidentally made unconditional. As such it even applies to foreground
processes, resulting in foreground processes being detached from the
terminal and no longer responding to Ctrl-C nor Ctrl-Z. An example of
this simply consists in start haproxy -db under sudo. Then a new shell
is required to stop it.
This patch removes this second setsid(), as it is already done in
apply_daemon_mode().
This must be backported to 3.1.