When haproxy is compiled without USE_SHM_OPEN, does not try to dump the
startup-logs in the "reload" output, because it won't show anything
interesting.
Change the output of the "reload" command, it now displays "Success=0"
if the reload failed and "Success=1" if it succeed.
If the startup-logs is available (USE_SHM_OPEN=1), the command will
print a "--\n" line, followed by the content of the startup-logs.
Example:
$ echo "reload" | socat /tmp/master.sock -
Success=1
--
[NOTICE] (482713) : haproxy version is 2.7-dev7-4827fb-69
[NOTICE] (482713) : path to executable is ./haproxy
[WARNING] (482713) : config : 'http-request' rules ignored for proxy 'frt1' as they require HTTP mode.
[NOTICE] (482713) : New worker (482720) forked
[NOTICE] (482713) : Loading success.
$ echo "reload" | socat /tmp/master.sock -
Success=0
--
[NOTICE] (482886) : haproxy version is 2.7-dev7-4827fb-69
[NOTICE] (482886) : path to executable is ./haproxy
[ALERT] (482886) : config : parsing [test3.cfg:1]: unknown keyword 'Aglobal' out of section.
[ALERT] (482886) : config : Fatal errors found in configuration.
[WARNING] (482886) : Loading failure!
$
The environment variable HAPROXY_LOAD_SUCCESS stores "1" if it
successfully load the configuration and started, "0" otherwise.
The "_loadstatus" master CLI command displays either
"Loading failure!\n" or "Loading success.\n"
When using the "reload" command over the master CLI, all connections to
the master CLI were cut, this was unfortunate because it could have been
used to implement a synchronous reload command.
This patch implements an architecture to keep the connection alive after
the reload.
The master CLI is now equipped with a listener which uses a socketpair,
the 2 FDs of this socketpair are stored in the mworker_proc of the
master, which the master keeps via the environment variable.
ipc_fd[1] is used as a listener for the master CLI. During the "reload"
command, the CLI will send the FD of the current session over ipc_fd[0],
then the reload is achieved, so the master won't handle the recv of the
FD. Once reloaded, ipc_fd[1] receives the FD of the session, so the
connection is preserved. Of course it is a new context, so everything
like the "prompt mode" are lost.
Only the FD which performs the reload is kept.
Since commit 2be557f ("MEDIUM: mworker: seamless reload use the internal
sockpair"), we are using the PROC_O_LEAVING flag to determine which
sockpair worker will be used with -x during the next reload.
However in mworker_reexec(), the PROC_O_LEAVING flag is not updated, it
is only updated at startup in mworker_env_to_proc_list().
This could be a problem when a remaining process is still in the list,
it could be selected as the current worker, and its socket will be used
even if _getsocks doesn't work anymore on it. (bug #1803)
This patch fixes the issue by updating the PROC_O_LEAVING flag in
mworker_proc_list_to_env() just before using it in mworker_reexec()
Must be backported to 2.6.
The function mworker_pipe_register_per_thread() is called this way
because the master first used pipes instead of socketpairs.
Rename mworker_pipe_register_per_thread() to
mworker_sockpair_register_per_thread() in order to be more consistent.
Also update a comment inside the function.
The worker was previously changing the iocb of the socketpair in the
worker by mworker_accept_wrapper(). However, it was done using
fd_insert() instead of changing directly the callback in the
fdtab[].iocb pointer.
This patch cleans up this by part by removing fd_insert().
It also stops setting tid_bit on the thread mask, the socketpair will be
handled by any thread from now.
There's no more reason for keepin the code and definitions in conn_stream,
let's move all that to stconn. The alphabetical ordering of include files
was adjusted.
This file contains all the stream-connector functions that are specific
to application layers of type stream. So let's name it accordingly so
that it's easier to figure what's located there.
The alphabetical ordering of include files was preserved.
We're starting to propagate the stream connector's new name through the
API. Most call places of these functions that retrieve the channel or its
buffer are in applets. The local variable names are not changed in order
to keep the changes small and reviewable. There were ~92 uses of cs_ic(),
~96 of cs_oc() (due to co_get*() being less factorizable than ci_put*),
and ~5 accesses to the buffer itself.
This applies the change so that the applet code stops using ci_putchk()
and friends everywhere possible, for the much saferapplet_put*() instead.
The change is mechanical but large. Two or three functions used to have no
appctx and a cs derived from the appctx instead, which was a reminiscence
of old times' stream_interface. These were simply changed to directly take
the appctx. No sensitive change was performed, and the old (more complex)
API is still usable when needed (e.g. the channel is already known).
The change touched roughly a hundred of locations, with no less than 124
lines removed.
It's worth noting that the stats applet, the oldest of the series, could
get a serious lifting, as it's still very channel-centric instead of
propagating the appctx along the chain. Given that this code doesn't
change often, there's no emergency to clean it up but it would look
better.
This renames the "struct conn_stream" to "struct stconn" and updates
the descriptions in all comments (and the rare help descriptions) to
"stream connector" or "connector". This touches a lot of files but
the change is minimal. The local variables were not even renamed, so
there's still a lot of "cs" everywhere.
This one is the pointer to the conn_stream which is always in the
endpoint that is always present in the appctx, thus it's not needed.
This patch removes it and replaces it with appctx_cs() instead. A
few occurences that were using __cs_strm(appctx->owner) were moved
directly to appctx_strm() which does the equivalent.
This gets rid of most open-coded fcntl() calls, some of which were passed
through DISGUISE() to avoid a useless test. The FD_CLOEXEC was most often
set without preserving previous flags, which could become a problem once
new flags are created. Now this will not happen anymore.
Some older systems may routinely return EWOULDBLOCK for some syscalls
while we tend to check only for EAGAIN nowadays. Modern systems define
EWOULDBLOCK as EAGAIN so that solves it, but on a few older ones (AIX,
VMS etc) both are different, and for portability we'd need to test for
both or we never know if we risk to confuse some status codes with
plain errors.
There were few entries, the most annoying ones are the switch/case
because they require to only add the entry when it differs, but the
other ones are really trivial.
Remaining flags and associated functions are move in the conn-stream
scope. These flags are added on the endpoint and not the conn-stream
itself. This way it will be possible to get them from the mux or the
applet. The functions to get or set these flags are renamed accordingly with
the "cs_" prefix and updated to manipualte a conn-stream instead of a
stream-interface.
At many places, we now use the new CS functions to get a stream or a channel
from a conn-stream instead of using the stream-interface API. It is the
first step to reduce the scope of the stream-interfaces. The main change
here is about the applet I/O callback functions. Before the refactoring, the
stream-interface was the appctx owner. Thus, it was heavily used. Now, as
far as possible,the conn-stream is used. Of course, it remains many calls to
the stream-interface API.
Because appctx is now an endpoint of the conn-stream, there is no reason to
still have the stream-interface as appctx owner. Thus, the conn-stream is
now the appctx owner.
When starting HAProxy in master-worker, the master pre-allocate a struct
mworker_proc and do a socketpair() before the configuration parsing. If
the configuration loading failed, the FD are never closed because they
aren't part of listener, they are not even in the fdtab.
This patch fixes the issue by cleaning the mworker_proc structure that
were not asssigned a process, and closing its FDs.
Must be backported as far as 2.0, the srv_drop() only frees the memory
and could be dropped since it's done before an exec().
Since the wait mode is always used once we successfuly loaded the
configuration, every processes were marked as old workers.
To fix this, the PROC_O_LEAVING flag is set only on the processes which
have a number of reloads greater than the current processes.
Implement a reload failure counter which counts the number of failure
since the last success. This counter is available in 'show proc' over
the master CLI.
nbproc was removed, it's time to remove any reference to the relative
PID in the master-worker, since there can be only 1 current haproxy
process.
This patch cleans up the alerts and warnings emitted during the exit of
a process, as well as the "show proc" output.
This change is required to support TCP/HTTP rules in defaults sections. The
'disabled' bitfield in the proxy structure, used to know if a proxy is
disabled or stopped, is replaced a generic bitfield named 'flags'.
PR_DISABLED and PR_STOPPED flags are renamed to PR_FL_DISABLED and
PR_FL_STOPPED respectively. In addition, everywhere there is a test to know
if a proxy is disabled or stopped, there is now a bitwise AND operation on
PR_FL_DISABLED and/or PR_FL_STOPPED flags.
.disabled field in the proxy structure is documented to be a bitfield. So
use it as a bitfield. This change was introduced to the 2.5, by commit
8e765b86f ("MINOR: proxy: disabled takes a stopping and a disabled state").
No backport is needed except if the above commit is backported.
This patch splits the disabled state of a proxy into a PR_DISABLED and a
PR_STOPPED state.
The first one is set when the proxy is disabled in the configuration
file, and the second one is set upon a stop_proxy().
The relative_pid is always 1. In mworker mode we also have a
child->relative_pid which is always equalt relative_pid, except for a
master (0) or external process (-1), but these types are usually tested
for, except for one place that was amended to carefully check for the
PROC_O_TYPE_WORKER option.
Changes were pretty limited as most usages of relative_pid were for
designating a process in stats output and peers protocol.
A memory allocation failure happening in mworker_env_to_proc_list when
trying to allocate a mworker_proc would have resulted in a crash. This
function is only called during init.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
Only mworker uses proc_self, and it was declared in global.h, forcing
users of global.h to include mworker and its dependencies.
Moving it to mworker reduces the preprocessed size of version.c from
170 to 125kB by shrinking the number of local includes from 30 to 16
and the number of system includes from 147 to 132.
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").
Let's use more explicit (but slightly longer) names now:
LIST_ADD -> LIST_INSERT
LIST_ADDQ -> LIST_APPEND
LIST_ADDED -> LIST_INLIST
LIST_DEL -> LIST_DELETE
The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.
The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.
The list doc was updated.
The default proxy was passed as a variable to all parsers instead of a
const, which is not without risk, especially when some timeout parsers used
to make some int pointers point to the default values for comparisons. We
want to be certain that none of these parsers will modify the defaults
sections by accident, so it's important to mark this proxy as const.
This patch touches all occurrences found (89).
This makes the code more readable and less prone to copy-paste errors.
In addition, it allows to place some __builtin_constant_p() predicates
to trigger a link-time error in case the compiler knows that the freed
area is constant. It will also produce compile-time error if trying to
free something that is not a regular pointer (e.g. a function).
The DEBUG_MEM_STATS macro now also defines an instance for ha_free()
so that all these calls can be checked.
178 occurrences were converted. The vast majority of them were handled
by the following Coccinelle script, some slightly refined to better deal
with "&*x" or with long lines:
@ rule @
expression E;
@@
- free(E);
- E = NULL;
+ ha_free(&E);
It was verified that the resulting code is the same, more or less a
handful of cases where the compiler optimized slightly differently
the temporary variable that holds the copy of the pointer.
A non-negligible amount of {free(str);str=NULL;str_len=0;} are still
present in the config part (mostly header names in proxies). These
ones should also be cleaned for the same reasons, and probably be
turned into ist strings.
glibc < 2.10 requires _GNU_SOURCE in order to make use of strsignal(),
otherwise leading to SEGV at runtime.
$ make V=1 TARGET=linux-glibc-legacy USE_THREAD= USE_ACCEPT4=
..
src/mworker.c: In function 'mworker_catch_sigchld':
src/mworker.c:285: warning: implicit declaration of function 'strsignal'
src/mworker.c:285: warning: pointer/integer type mismatch in conditional expression
..
$ make V=1 reg-tests REGTESTS_TYPES=slow,default
..
###### Test case: reg-tests/mcli/mcli_start_progs.vtc ######
## test results in: "/tmp/haregtests-2021-01-19_15-18-07.n24989/vtc.29077.28f6153d"
---- h1 Bad exit status: 0x008b exit 0x0 signal 11 core 128
---- h1 Assert error in haproxy_wait(), src/vtc_haproxy.c line 792: Condition(*(&h->fds[1]) >= 0) not true. Errno=0 Success
..
$ gdb ./haproxy /tmp/core.0.haproxy.30270
..
Core was generated by `/root/haproxy/haproxy -d -W -S fd@8 -dM -f /tmp/haregtests-2021-01-19_15-18-07.'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002aaaab387a10 in strlen () from /lib64/libc.so.6
(gdb) bt
#0 0x00002aaaab387a10 in strlen () from /lib64/libc.so.6
#1 0x00002aaaab354b69 in vfprintf () from /lib64/libc.so.6
#2 0x00002aaaab37788a in vsnprintf () from /lib64/libc.so.6
#3 0x00000000004a76a3 in memvprintf (out=0x7fffedc680a0, format=0x5a5d58 "Current worker #%d (%d) exited with code %d (%s)\n", orig_args=0x7fffedc680d0)
at src/tools.c:3868
#4 0x00000000004bbd40 in print_message (label=0x58abed "ALERT", fmt=0x5a5d58 "Current worker #%d (%d) exited with code %d (%s)\n", argp=0x7fffedc680d0)
at src/log.c:1066
#5 0x00000000004bc07f in ha_alert (fmt=0x5a5d58 "Current worker #%d (%d) exited with code %d (%s)\n") at src/log.c:1109
#6 0x0000000000534b7b in mworker_catch_sigchld (sh=<value optimized out>) at src/mworker.c:293
#7 0x0000000000556af3 in __signal_process_queue () at src/signal.c:88
#8 0x00000000004f6216 in signal_process_queue () at include/haproxy/signal.h:39
#9 run_poll_loop () at src/haproxy.c:2859
#10 0x00000000004f63b7 in run_thread_poll_loop (data=<value optimized out>) at src/haproxy.c:3028
#11 0x00000000004faaac in main (argc=<value optimized out>, argv=0x7fffedc68498) at src/haproxy.c:904
See: https://man7.org/linux/man-pages/man3/strsignal.3.html
Must be backported as far as 2.0.
Now we define a new sock_accept_iocb() for socket-based stream protocols
and use it as a wrapper for listener_accept() which now takes a listener
and not an FD anymore. This will allow the receiver's I/O cb to be
redefined during registration, and more specifically to get rid of the
hard-coded hacks in protocol_bind_all() made for syslog.
The previous ->accept() callback in the protocol was removed since it
doesn't have anything to do with accept() anymore but is more generic.
A few places where listener_accept() was compared against the FD's IO
callback for debugging purposes on the CLI were updated.
This listener flag indicates whether the receiver part of the listener
is specific to the master or to the workers. In practice it's only used
by the master's CLI right now. It's used to know whether or not the FD
must be closed before forking the workers. For this reason it's way more
of a receiver's property than a listener's property, so let's move it
there under the name RX_F_MWORKER. The rest of the code remains
unchanged.
And also remove it from its callers. This subtle distinction was added as
sort of a hack for the seamless reload feature but is not needed anymore
since the do_close turned unused since commit previous commit ("MEDIUM:
listener: let do_unbind_listener() decide whether to close or not").
This also removes the unbind_listener_no_close() function.
The remaining proxy states were only used to distinguish an enabled
proxy from a disabled one. Due to the initialization order, both
PR_STNEW and PR_STREADY were equivalent after startup, and they
would only differ from PR_STSTOPPED when the proxy is disabled or
shutdown (which is effectively another way to disable it).
Now we just have a "disabled" field which allows to distinguish them.
It's becoming obvious that start_proxies() is only used to print a
greeting message now, that we'd rather get rid of. Probably that
zombify_proxy() and stop_proxy() should be merged once their
differences move to the right place.
It's the receiver's FD that's inherited from the parent process, not
the listener's so the flag must move to the receiver so that appropriate
actions can be taken.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().
This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.