Add self-wake in signal_handler() to fix a race condition with a signal
coming in between checking signal_queue_len and entering polling sleep.
The changes in commit 43c891dda ("BUG/MINOR: signals/poller: set the
poller timeout to 0 when there are signals") were insufficient.
Move the signal_queue_len check from the poll implementations to
run_poll_loop() to keep that logic in one place.
The poll loops are terminated either by the parameter wake being set or
wake up due to a write to their poller_wr_pipe by wake_thread() in
signal_handler().
This fixes issue #1841.
Must be backported in every stable version.
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").
Let's use more explicit (but slightly longer) names now:
LIST_ADD -> LIST_INSERT
LIST_ADDQ -> LIST_APPEND
LIST_ADDED -> LIST_INLIST
LIST_DEL -> LIST_DELETE
The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.
The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.
The list doc was updated.
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().
This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.
Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.
A good approach would consist in splitting that file in 3 parts:
- one for error output ("errors" ?).
- one for log_format processing
- and one for actual logging.
The TASK_IS_TASKLET() macro was moved to the proto file instead of the
type one. The proto part was a bit reordered to remove a number of ugly
forward declaration of static inline functions. About a tens of C and H
files had their dependency dropped since they were not using anything
from task.h.
A few files were including it while not needing it (anymore). Some
only required access to the atomic ops and got haproxy/atomic.h in
exchange. Others didn't need it at all. A significant number of
files still include it only for THREAD_LOCAL definition.
When running __signal_process_queue(), we ignore most signals. We can't,
however, ignore WDTSIG and DEBUGSIG, otherwise that thread may end up
waiting for another one that could hold a glibc lock, while the other thread
wait for this one to enter debug_handler().
So make sure WDTSIG and DEBUGSIG aren't ignored, if they are defined.
This probably explains the watchdog deadlock described in github issue
This should be backported to 2.1, 2.0 and 1.9.
signal_init(), init_log(), init_stream(), and init_task() all used to
only preset some values and lists. This needs to be done very early to
provide a reliable interface to all other users. The calls used to be
explicit in haproxy.c:init(). Now they're placed in initcalls at the
STG_PREPARE stage. The functions are not exported anymore.
Instead of exporting a number of pools and having to manually delete
them in deinit() or to have dedicated destructors to remove them, let's
simply kill all pools on deinit().
For this a new function pool_destroy_all() was introduced. As its name
implies, it destroys and frees all pools (provided they don't have any
user anymore of course).
This allowed to remove 4 implicit destructors, 2 explicit ones, and 11
individual calls to pool_destroy(). In addition it properly removes
the mux_pt_ctx pool which was not cleared on exit (no backport needed
here since it's 1.9 only). The sig_handler pool doesn't need to be
exported anymore and became static now.
This commit replaces the explicit pool creation that are made in
constructors with a pool registration. Not only this simplifies the
pools declaration (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
is no more user of potentially uninitialized pool now.
It has been the opportunity to remove no less than 12 constructors
and 6 init functions.
The new function signal_unregister() removes every handlers assigned to
a signal. Once the handler list of the signal is empty, the signal is
ignored with SIG_IGN.
With the new way of handling the signals in the master worker, we are
are not staying in a waitpid() loop. Which means that we need to catch the
SIGCHLD signals to call waitpid().
The problem is when the master is reloading, this signal is neither
registered nor blocked so we lost all signals between the restart and
the call to mworker_loop().
This patch blocks the SIGCHLD signals before the reloading and ensure
it's not unblocked before the master registered the SIGCHLD handler.
This is never used and would even be wrong since the reasons are ORed
so two signals would be turned into a third value, just like if any
other reason was used at the same time.
The behavior of sigprocmask in an multithreaded environment is
undefined.
The new macro ha_sigmask() calls either pthreads_sigmask() or
sigprocmask() if haproxy was built with thread support or not.
This should be backported to 1.8.
We don't have any reason of blocking those signals.
If SIGBUS, SIGFPE, SIGILL, or SIGSEGV are generated while they are blocked, the
result is undefined, unless the signal was generated by kill(2), sigqueue(3), or
raise(3).
This should be backported to 1.8.
Signals were handled in all threads which caused some signals to be lost
from time to time. To avoid complicated lock system (threads+signals),
we prefer handling the signals in one thread avoiding concurrent access.
The side effect of this bug was that some process were not leaving from
time to time during a reload.
This patch must be backported in 1.8.
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
This macro should be used to declare variables or struct members depending on
the USE_THREAD compile option. It avoids the encapsulation of such declarations
between #ifdef/#endif. It is used to declare all lock variables.
The master-worker will reload itself on SIGUSR2/SIGHUP
It's inherited from the systemd wrapper, when the SIGUSR2 signal is
received, the master process will reexecute itself with the -sf flag
followed by the PIDs of the children.
In the systemd wrapper, the children were using a pipe to notify when
the config has been parsed and when the new process is ready. The goal
was to ensure that the process couldn't reload during the parsing of the
configuration, before signals were send to old process.
With the new mworker model, the master parses the configuration and is
aware of all the children. We don't need a pipe, but we need to block
those signals before the end of a reload, to ensure that the process
won't be killed during a reload.
The SIGUSR1 signal is forwarded to the children to soft-stop HAProxy.
The SIGTERM and SIGINT signals are forwarded to the children in order to
terminate them.
A problem was reported recently by some users of programs compiled
with Go 1.5 which by default blocks all signals before executing
processes, resulting in haproxy not receiving SIGUSR1 or even SIGTERM,
causing lots of zombie processes.
This problem was apparently observed by users of consul and kubernetes
(at least).
This patch is a workaround for this issue. It consists in unblocking
all signals on startup. Since they're normally not blocked in a regular
shell, it ensures haproxy always starts under the same conditions.
Quite useful information reported by both Matti Savolainen and REN
Xiaolei actually helped find the root cause of this problem and this
workaround. Thanks to them for this.
This patch must be backported to 1.6 and 1.5 where the problem is
observed.
sig is checked for < 0 or > MAX_SIGNAL, but the signal array is
MAX_SIGNAL in size. At the moment, MAX_SIGNAL is 256. If a system supports
more than MAX_SIGNAL signals, then sending signal MAX_SIGNAL to the process
will corrupt one integer in its memory and might also crash the process.
This bug is also present in 1.4 and 1.3, and the fix must be backported.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
SIGPROF is blocked then restored to default settings, which sometimes
makes profiling harder. Let's not block it by default nor restore it,
it doesn't serve any purpose anyway.
Until now, signals configured with no handler were still enabled and
ignored upon signal reception. Until now it was not an issue but with
SSL causing many EPIPE all the time, it becomes obvious that signal
processing comes with a cost. So set the handler to SIG_IGN when the
function is NULL.
Signal zero is never delivered by the system. However having a signal to
which functions and tasks can subscribe to be notified of a stopping event
is useful. So this patch does two things :
1) allow signal zero to be delivered from any function of signal handler
2) make soft_stop() deliver this signal so that tasks can be notified of
a stopping condition.
The two new functions below make it possible to register any number
of functions or tasks to a system signal. They will be called in the
registration order when the signal is received.
struct sig_handler *signal_register_fct(int sig, void (*fct)(struct sig_handler *), int arg);
struct sig_handler *signal_register_task(int sig, struct task *task, int reason);
AIX wants string.h in signal.c (and is right to do so) :
gcc -Iinclude -Wall -O2 -g -DTPROXY -DENABLE_POLL -DCONFIG_HAPROXY_VERSION=\"1.3.18\" -DCONFIG_HAPROXY_DATE=\"2009/05/10\" -c -o src/signal.o src/signal.c
src/signal.c: In function 'signal_init':
src/signal.c:32: warning: implicit declaration of function 'memset'
src/signal.c:32: warning: incompatible implicit declaration of built-in function 'memset'
These functions will be used to deliver asynchronous signals in order
to make the signal handling functions more robust. The goal is to keep
the same interface to signal handlers.