140 Commits

Author SHA1 Message Date
Christopher Faulet
b734d7c156 MINOR: cli/applet: Move appctx fields only used by the CLI in a private context
There are several fields in the appctx structure only used by the CLI. To
make things cleaner, all these fields are now placed in a dedicated context
inside the appctx structure. The final goal is to move it in the service
context and add an API for cli commands to get a command coontext inside the
cli context.
2025-04-24 15:09:37 +02:00
Willy Tarreau
23705564ae BUG/MINOR: debug: remove the trailing \n from BUG_ON() statements
These ones were added by mistake during the change of the cfgparse
mechanism in 3.1, but they're corrupting the output of "debug counters"
by leaving stray ']' on their own lines. We could possibly check them
all once at boot but it doens't seem worth it.

This should be backported to 3.1.
2025-04-14 19:02:13 +02:00
William Lallemand
143be1b59f MEDIUM: errors: get rid of shm_open()
Since 5ee266b7 ("MINOR: error: simplify startup_logs_init_shm"), the FD
of the startup logs is always closed and the HAPROXY_STARTUPLOGS_FD
variable is not used anymore. Which means we only need a mmap.

Indeed the shm_open() function was only needed to keep the shm between
the exec() of the master so we can get the logs stored there after doing
the final exec() in wait mode. Since the wait mode doesn't exist
anymore and the parsing is done in a worker, we only need to share a
memory zone between the master and the worker.

This patch removes shm_open() and replace it with a simple mmap(), this
way the shared startup-logs become more portable and USE_SHM_OPEN is not
required anymore.
2025-01-07 16:42:38 +01:00
Willy Tarreau
21df7677a9 BUILD: mworker: always initialize the saveptr of strtok_r()
Building with some libcs which define strtok_r() as an inline function
can yield a possibly uninitialized warning due to a loop dereferencing
this save pointer early, even though the doc clearly mentions that it
is ignored. This is actually more of a mismatch between the compiler
and the libc (gcc-4.7 and glibc-2.23 in that case). It's trivial to
set s2 to NULL here so let's do it to please this old couple. Note
that while the warning is triggered in all supported versions, there's
no point backporting it since it's unlikely this combination will be
relevant outside of backwards compatibility checks now.
2024-12-25 12:18:46 +01:00
Valentine Krasnobaeva
97aaf76716 BUG/MEDIUM: mworker: report status, if daemonized master fails
As daemonization fork happens now very early and before the master-worker
fork, if master or worker processes fail during the initialization, some
critical errors can't be reported to stdout. The launching (parent) process in
such cases exits with 0. This makes an impression, that master and his worker
have successfully started at background, which really complicates the
operations.

In the previous commit a pipe was added to make daemonized child communicate
with his parent. Let's add the same logic to master-worker mode. Up to
receiving the READY message from the worker, master will "forward" it via the
pipe to the launching process. Launching process can obtain master's exit
status, if the master fails to start and nothing has been written in the pipe.

This fix should be backported only in 3.1.
2024-12-09 21:32:49 +01:00
Valentine Krasnobaeva
1fead6c0ca BUG/MINOR: mworker: don't save program PIDs in oldpids
After reload, previously launched programs are stopped explicitly in
mworker_ext_launch_all(). So, there is no longer need to save their PIDs in
oldpids array before the master reexec().

This also prepares the fix of "-D -W -sf/-st" modes, as we will need to
loop over this array in the master process context, in order to stop the
previous master, when the new one is ready.

This patch should be backported only in 3.1.
2024-12-06 12:00:22 +01:00
Valentine Krasnobaeva
3500865bc1 REORG: startup: move mworker_apply_master_worker_mode in mworker.c
mworker_apply_master_worker_mode() is called only in master-worker mode, so
let's move it mworker.c
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
dee247c14e REORG: startup: move mworker_reexec and mworker_reload in mworker.c
Let's move mworker_reexec() and mworker_reload() in mworker.c. mworker_reload()
is called only within the functions, which are already in mworker.c. So, this
reorganization allows to declare mworker_reload() as a static.
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
0c7b93eb1d REORG: startup: move mworker_run_master and mworker_loop in mworker.c
mworker_run_master() is called only in master mode. mworker_loop() is static
and called only in mworker_run_master(). So let's move these both functions in
mworker.c.

We also need here to make run_thread_poll_loop() accessible from other units,
as it's used in mworker_loop().
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
7974089ac6 REORG: startup: move mworker_prepare_master in mworker.c
mworker_prepare_master() performs some preparation routines for the new worker
process, which will be forked during the startup. It's called only in
master-worker mode, so let's move it in mworker.c.
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
af642420b4 REORG: startup: move on_new_child_failure in mworker.c
mworker_on_new_child_failure() performs some routines for the worker process,
if it has failed the reload. As it's called only in mworker_catch_sigchld()
from mworker.c, let's move mworker_on_new_child_failure() in mworker.c as well.
Like this it could also be declared as a static.
2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva
321c021a83 MINOR: startup: rename on_new_child_failure to mworker_on_new_child_failure
This patch prepares the moving of on_new_child_failure definition into
mworker.c. So, let's rename it accordingly and let's also update its
description.
2024-11-25 15:20:24 +01:00
William Lallemand
15845247db MEDIUM: mworker: remove USE_SYSTEMD requirement for -Ws
Since sd_notify() is now implemented in src/systemd.c, there is no need
anymore to build its support conditionnally with USE_SYSTEMD.

This patch add supports for -Ws for every build and removes the
USE_SYSTEMD build option. It also remove every reference to USE_SYSTEMD
in the documentation and the CI.

This also allows to run the reg-tests in -Ws with the new VTest support.
2024-11-20 12:07:38 +01:00
Valentine Krasnobaeva
d5d41dee3d MINOR: startup: replace HAPROXY_LOAD_SUCCESS with global load_status
After master-worker refactoring, master performs re-exec only once up to
receiving "reload" command or USR2 signal. There is no more the second
master's re-exec to free unused memory. Thus, there is no longer need to export
environment variable HAPROXY_LOAD_SUCCESS with worker process load status. This
status can be simply saved in a global variable load_status.
2024-11-13 09:50:05 +01:00
Valentine Krasnobaeva
745a4c5e93 CLEANUP: mworker: make mworker_create_master_cli more readable
Using nested 'if' operator, while checking if we will need to allocate again the
"reload" sockpair, does not degrade performance, as mworker_create_master_cli is
a startup routine.

This nested 'if' (we check one condition in each operator) makes more visible the
fact, that the "reload" sockpair is allocated only once, when the master process
starts and it does not re-allocated again (hence, its FDs are not closed) during
reloads. This way of checking multiple conditions here makes more easy to spot
this fact, while analysing the code in order to investigate FD leaks between
master and worker.
2024-10-26 22:26:49 +02:00
William Lallemand
5db761f709 MINOR: mworker/cli: 'show proc debug' for old workers
Add FD details for old workers in 'show proc debug'.
2024-10-24 14:47:28 +02:00
William Lallemand
b49ddae21b MINOR: mworker/cli: remove comment line for program when useless
Remove the '# programs' line on 'show proc' output when there are no
program.
2024-10-24 14:39:41 +02:00
William Lallemand
84640aaa2a MINOR: mworker/cli: add 'debug' to 'show proc'
This patch adds a 'debug' parameter to the 'show proc' command of the
master CLI. It allows to show debug details about the processes.

Example:

echo 'show proc debug' | socat /tmp/master.sock -
\#<PID>          <type>          <reloads>       <uptime>        <version>      		<ipc_fd[0]>     <ipc_fd[1]>
391999          master          0 [failed: 0]   0d00h00m02s     3.1-dev10-b9095a-63		5               6
\# workers
392001          worker          0               0d00h00m02s     3.1-dev10-b9095a-63		3               -1
\# programs
2024-10-24 14:23:27 +02:00
Valentine Krasnobaeva
af1d170122 BUG/MINOR: mworker: fix mworker-max-reloads parser
Before this patch, when wrong argument was provided in the configuration for
mworker-max-reloads keyword, parser shows these errors below on the stderr:

[WARNING]  (1820317) : config : parsing [haproxy.cfg:154] : (null)parsing [haproxy.cfg:154] : 'mworker-max-reloads' expects an integer argument.

In a case, when by mistake two arguments were provided instead of one, this has
also triggered a buggy error message:

[ALERT]    (1820668) : config : parsing [haproxy.cfg:154] : 'mworker-max-reloads' cannot handle unexpected argument '45'.
[WARNING]  (1820668) : config : parsing [haproxy.cfg:154] : (null)

So, as 'mworker-max-reloads' is parsed in discovery mode by master process
let's align now its parser with all others, which could be called for this
mode. Like this in cases, when there are too many args or argument isn't a
valid integer we return proper error codes to global section parser and
messages are formated properly.

This fix should be backported in all stable versions.
2024-10-21 10:46:58 +02:00
Valentine Krasnobaeva
d11dc11e5a MINOR: mworker: report explicitly when worker exits due to max reloads
It's convienient for testing and for usage to produce different warning
messages, when the former worker exits due to max reloads exceeded, and when it
was terminated by the master.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
ee7fc98320 MINOR: mworker: deserialize process list before read_cfg_in_discovery_mode
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.

For the moment we keep the order of program and worker forks the same as before
the refactoring, as we need to be sure that this won't introduce regressions.
So, programs are forked before the new worker process.

Before the program's fork we already need deserialized processes list to find
the programs launched before reload and to stop them. Processes list saved
before the reload in HAPROXY_PROCESSES variable. It should be deserialized
before the first configuration read in discovery mode, because resetenv keyword
could be presented in the global section.

So, let's move mworker_env_to_proc_list() from mworker_create_master_cli() to
main(). We need to call it only after reload in master-worker mode, thus
HAPROXY_MWORKER_REEXEC and HAPROXY_PROCESSES should be still presented in the
re-executing process environment before the first configuration read.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
7a267c4a27 MINOR: mworker: readapt program support in mworker_catch_sigchld
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.

We just only launch and stop external programs and there is no any
communication between the master process and the started program binary. So,
ipc_fd[0] and ipc_fd[1] are not used and kept as -1 for programs processes. Due
to this, no need for the exiting program process to call fd_delete on this
fds. Otherwise, this will trigger a BUG_ON.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
d766677d92 MINOR: mworker: slow load status delivery if worker is starting
With refactored master-worker architecture master and worker processes parse
its parts of the configuration. Worker could have a huge configuration, so it
will take some time to load. As now HAPROXY_LOAD_SUCCESS is set to 1 only
after receiving the status READY from the new worker
cli_io_handler_show_loadstatus() may exit very fast by showing load status 0,
and in such case and mcli socket will be closed.

This already breaks some regression tests and can confuse some APIs. So, let's
slow down the load status delivery. If in the process list there is still some
process, which is loading (PROC_O_INIT). appctx task will sleep in this case for
50ms and then return 0. cli_io_handler_show_loadstatus() is called in loop, so
with such pacing, there is a high chance that the next time, when we enter in
its scope all processes will have the state READY. Like this master CLI
connection socket won't be closed until the loading of the new worker is really
finished, thus the reload status and logs (Success=1/0) will be shown in
synchronious way.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
5f16453082 MEDIUM: mworker: block reloads
When reloads arrive very often (sent by some APIs), newly forked workers
almost don't have a time to load completely and to send its READY status to
master, which allows then to stop the previous worker (launched before reload).
As a result, the number of workers increases very quickly, previous workers are
still alive and the memory consumption is very high.

To avoid such situations let's return in cli_parse_reload() reload status 0
with the text ""Another reload is still in progress", if there is still a
process with PROC_O_INIT flag in the processes list.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
b73a278df4 MINOR: mworker/cli: add _send_status to support state transition
In the new master-worker architecture, when a worker process is forked and
successfully initialized it needs somehow to communicate its "READY" state to
the master, in order to terminate the previous worker and workers, that might
exceeded max_reloads counter.

So, let's implement for this a new master CLI _send_status command. A new
worker can send its status string "READY" to the master, when it's about
entering to the run poll loop, thus it can start to receive data.

In _send_status() in the master context we update the status of the new worker:
PROC_O_INIT flag is withdrawn.

When TERM signal is sent to a worker, worker terminates and this triggers the
mworker_catch_sigchld() handler in master. This handler deletes the exiting
process entry from the processes list.

In _send_status() we loop over the processes list twice. At the first time, in
order to stop workers that exceeded the max_reloads counter. At the second time,
in order to stop the worker forked before the last reload. In the corner case,
when max_reloads=1, we avoid to send SIGTERM twice to the same worker by
setting sigterm_sent flag during the first loop.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
154848a314 MINOR: mworker: simplify the code that sets PROC_O_LEAVING
When master performs a reexec it should set for an already existed worker the
flag PROC_O_LEAVING. It means that existed worked is marked as the previous one
and will be terminated after the reload.

In the previous implementation master process was need to do the reexec
twice (the first time for parsing its configuration and the second time to free
unused ressources). So the logic of setting PROC_O_LEAVING was based on
comparing the number of reloads, performed by each process from the processes
list, except the master.

Now, as being mentioned before, reexec is performed only once. So, in this case
we need to set PROC_O_LEAVING flag, when we deserialize the list. It is done for
all processes, which have the number of reloads stricly positive.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
c8aac63893 MINOR: mworker: add support for case when new worker dies
The case, when the new worker fails while it parses its configuration or while
it tries to apply it, could be considered as the new one, because the master
process is no longer need to reexec again. The master simply keeps the previous
worker (forked before the reload) and it let the new one to exit with failure.

When the new worker exits, in the master process context (mworker_catch_sigchld)
we need to stop a MASTER proxy listener and we need to drop the server,
attached to new worker's CLI sockpair (it's inherited in master). Then we
explicitly delete master's end of this sockpair (child->ipc_fd[0]) from the
fdtab and we free the memory allocated for the worker process.
on_new_child_failure() is called before the clean up to signal systemd that
reload/load was failed.

If the new worker fails during the first start, so there is no any previous
worker, master process should exit immediately in order to keep the same
behaviour, as it was before this architecture change.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
9b27f82da3 MINOR: mworker: mworker_catch_sigchld: use fd_delete instead of close
If the worker exits due to failure or due to receiving TERM signal, in the
master context, we can't now simply close the master's fd (ipc_fd[0]) of
the inherited master CLI sockpair.

When the worker is created, in the master process context MASTER proxy listener
is bound to ipc_fd[0]. When this worker fails or exits, master process is
always in its polling loop. So, closing some fd in its context immediately
triggers the BUG_ON(fd->owner), as the poller try to reinsert the "freed" fd
into fdtab and try to reuse it. We must call fd_delete in this case. This will
deinitializes fd auxilary data and closes its properly.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
26ad5465cc MINOR: mworker/cli: create MASTER proxy before mcli listeners
For the master process we always need to create a MASTER proxy, even if
master cli settings were not provided via command line, because now we bind a
listener in the master process context at ipc_fd[0]. So, MASTER proxy should be
already allocated at this moment.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
0fbf1973ad MINOR: mworker/cli: rename mworker_cli_proxy_new_listener
This is the first commit in a series to add the support of 4 primary reload
use-cases for the new master-worker architecture:

1. Newly forked worker process dies before any reload, due to some errors in
   the configuration. Newly forked worker process crashes before any reload
   after sending its "READY" state to master.

2. Newly forked worker process dies due to some errors in the new
   configuration. This happens after reload, when this new configuration was
   supplied, so the previous worker process is still here.

3. Newly forked worker process crashes after sending its "READY" state to
   master due to some bugs. This happens after reload, so the previous worker
   process is still here.

4. Newly forked worker process has sent its "READY" state to master and starts
   to receive traffic. This happens after reload, the old worker hasn't
   terminated yet, as it is waiting on some idle connection and it crashes.

Let's rename in this commit mworker_cli_proxy_new_listener() to
mworker_cli_master_proxy_new_listener() to outline, that this function creates
"master-socket" bind conf and allocates a listener. This listener is attached
to the MASTER proxy and it's bound to the ipc_fd[0] of the sockpair,
inherited in master and in worker processes (master CLI sockpair).
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
48371e6a30 MEDIUM: cfgparse: call some parsers only in MODE_DISCOVERY
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.

Some keyword parsers tagged with KWF_DISCOVERY (for example those, which parse
runtime modes, poller types, pidfile), should not be called twice when
the configuration will be read the second time after the discovery mode.
It's redundant and could trigger parser's errors in standalone mode. In
master-worker mode the worker process inherits parsed settings from the master.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
f9123e2183 MEDIUM: cfgparse: add KWF_DISCOVERY keyword flag
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.

So, let's add here KWF_DISCOVERY flag to distinguish the keywords,
which should be parsed in "discovery" mode and which are needed for master
process, from all others. Keywords, that should be parsed in "discovery" mode
have its dedicated parser funtions. Let's tag these functions with
KWF_DISCOVERY flag in keywords list. Like this, only these keyword parsers
might be called during the first configuration read in discovery mode.
2024-10-16 22:02:39 +02:00
Valentine Krasnobaeva
5e06d45df7 REORG: init: encapsulate 'reload' sockpair and master CLI listeners creation
Let's encapsulate the logic of 'reload' sockpair and master CLI listeners
creation, used by master CLI into a separate function, as we needed this
only in master-worker runtime  mode. This makes the code of init() more
readable.
2024-06-27 16:08:42 +02:00
William Lallemand
aa3632962f MEDIUM: mworker: get rid of libsystemd
Given the xz drama which allowed liblzma to be linked to openssh, lets remove
libsystemd to get rid of useless dependencies.

The sd_notify API seems to be stable and is now documented. This patch replaces
the sd_notify() and sd_notifyf() function by a reimplementation inspired by the
systemd documentation.

This should not change anything functionnally. The function will be built when
haproxy is built using USE_SYSTEMD=1.

References:
  https://github.com/systemd/systemd/issues/32028
  https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes

Before:

wla@kikyo:~% ldd /usr/sbin/haproxy
	linux-vdso.so.1 (0x00007ffcfaf65000)
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x000074637fef4000)
	libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x000074637fe4f000)
	libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x000074637f400000)
	liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x000074637fe0d000)
	libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x000074637f92a000)
	libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x000074637f365000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000074637f000000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000074637f27a000)
	libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x000074637fdff000)
	libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x000074637eeb8000)
	liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x000074637fdcd000)
	libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x000074637ee01000)
	liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x000074637fda8000)
	/lib64/ld-linux-x86-64.so.2 (0x000074637ff5d000)
	libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x000074637f904000)

After:

wla@kikyo:~% ldd /usr/sbin/haproxy
	linux-vdso.so.1 (0x00007ffd51901000)
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f758d6c0000)
	libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007f758d61b000)
	libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f758ca00000)
	liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x00007f758d5d9000)
	libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f758d365000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f758d5ba000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f758c600000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f758c915000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f758d729000)

A backport to all stable versions could be considered at some point.
2024-04-03 15:53:18 +02:00
Christopher Faulet
94b8ed446f MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands
The main CLI I/O handle is responsible to interrupt the processing on
shutdown/abort. It is not the responsibility of the I/O handler of CLI
commands to take care of it.
2024-03-28 17:28:20 +01:00
Willy Tarreau
7c9ce715c9 MINOR: ring: make callers use ring_data() and ring_size(), not ring->buf
As we're going to remove the ring's buffer, we don't want callers to access
it directly, so let's use ring_data() and ring_size() instead for this.
2024-03-25 17:34:19 +00:00
William Lallemand
3dd55fa132 MINOR: mworker/cli: implement hard-reload over the master CLI
The mworker mode never had a proper 'hard-stop' (-st) for the reload,
this is a mode which was commonly used with the daemon mode, but it was
never implemented in mworker mode.

This patch fixes the problem by implementing a "hard-reload" command
over the master CLI. It does the same as the "reload" command, but
instead of waiting for the connections to stop in the previous process,
it immediately quits the previous process after binding.
2023-11-24 21:44:25 +01:00
William Lallemand
fcb080d8f9 MEDIUM: mworker: display a more accessible message when a worker crash
Should fix issue #1034.

Display a more accessible message when a worker crash about what to do.

Example:

	$ ./haproxy -W -f haproxy.cfg
	[NOTICE]   (308877) : New worker (308884) forked
	[NOTICE]   (308877) : Loading success.
	[NOTICE]   (308877) : haproxy version is 2.9-dev4-d90d3b-58
	[NOTICE]   (308877) : path to executable is ./haproxy
	[ALERT]    (308877) : Current worker (308884) exited with code 139 (Segmentation fault)
	[WARNING]  (308877) : A worker process unexpectedly died and this can only be explained by a bug in haproxy or its dependencies.
	Please check that you are running an up to date and maintained version of haproxy and open a bug report.
	HAProxy version 2.9-dev4-d90d3b-58 2023/09/05 - https://haproxy.org/
	Status: development branch - not safe for use in production.
	Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
	Running on: Linux 6.2.0-31-generic #31-Ubuntu SMP PREEMPT_DYNAMIC Mon Aug 14 13:42:26 UTC 2023 x86_64
	[ALERT]    (308877) : exit-on-failure: killing every processes with SIGTERM
	[WARNING]  (308877) : All workers exited. Exiting... (139)
2023-09-05 15:31:04 +02:00
William Lallemand
117b03ff4a BUG/MINOR: mworker: leak of a socketpair during startup failure
Aurelien Darragon found a case of leak when working on ticket #2184.

When a reexec_on_failure() happens *BEFORE* protocol_bind_all(), the
worker is not fork and the mworker_proc struct is still there with
its 2 socketpairs.

The socketpair that is supposed to be in the master is already closed in
mworker_cleanup_proc(), the one for the worker was suppposed to
be cleaned up in mworker_cleanlisteners().

However, since the fd is not bound during this failure, the fd is never
closed.

This patch fixes the problem by setting the fd to -1 in the mworker_proc
after the fork, so we ensure that this it won't be close if everything
was done right, and then we try to close it in mworker_cleanup_proc()
when it's not set to -1.

This could be triggered with the script in ticket #2184 and a `ulimit -H
-n 300`. This will fail before the protocol_bind_all() when trying to
increase the nofile setrlimit.

In recent version of haproxy, there is a BUG_ON() in fd_insert() that
could be triggered by this bug because of the global.maxsock check.

Must be backported as far as 2.6.

The problem could exist in previous version but the code is different
and this won't be triggered easily without other consequences in the
master.
2023-06-21 09:44:18 +02:00
William Lallemand
e6051a04ef BUG/MEDIUM: mworker: increase maxsock with each new worker
In ticket #2184, HAProxy is crashing in a BUG_ON() after a lot of reload
when the previous processes did not exit.

Each worker has a socketpair which is a FD in the master, when reloading
this FD still exists until the process leaves. But the global.maxconn
value is not incremented for each of these FD. So when there is too much
workers and the number of FD reaches maxsock, the next FD inserted in
the poller will crash the process.

This patch fixes the issue by increasing the maxsock for each remaining
worker.

Must be backported in every maintained version.
2023-06-19 17:32:32 +02:00
Tim Duesterhus
fe83f58906 CLEANUP: Stop checking the pointer before calling task_free()
Changes performed with this Coccinelle patch:

    @@
    expression e;
    @@

    - if (e != NULL) {
    	task_destroy(e);
    - }

    @@
    expression e;
    @@

    - if (e) {
    	task_destroy(e);
    - }

    @@
    expression e;
    @@

    - if (e)
    	task_destroy(e);

    @@
    expression e;
    @@

    - if (e != NULL)
    	task_destroy(e);
2023-04-23 00:28:25 +02:00
Christopher Faulet
208c712b40 MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE
Here again, it is just a flag renaming. In SC flags, there is no longer
shutdown for writes but shutdowns.
2023-04-14 15:01:21 +02:00
Christopher Faulet
7faac7cf34 MINOR: tree-wide: Simplifiy some tests on SHUT flags by accessing SCs directly
At many places, we simplify the tests on SHUT flags to remove calls to
chn_prod() or chn_cons() function because the corresponding SC is available.
2023-04-05 08:57:06 +02:00
Christopher Faulet
87633c3a11 MEDIUM: tree-wide: Move flags about shut from the channel to the SC
The purpose of this patch is only a one-to-one replacement, as far as
possible.

CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connecter. CF_ prefix is replaced by SC_FL_ one. Of course, it is not
so simple because at many places, we were testing if a channel was shut for
reads and writes in same time. To do the same, shut for reads must be tested
on one side on the SC and shut for writes on the other side on the opposite
SC. A special care was taken with process_stream(). flags of SCs must be
saved to be able to detect changes, just like for the channels.
2023-04-05 08:57:06 +02:00
William Lallemand
cc5b9fa593 BUG/MEDIUM: mworker: don't register mworker_accept_wrapper() when master FD is wrong
This patch handles the case where the fd could be -1 when proc_self was
lost for some reason (environment variable corrupted or upgrade from < 1.9).

This could result in a out of bound array access fdtab[-1] and would crash.

Must be backported in every maintained versions.
2023-02-21 13:53:35 +01:00
William Lallemand
e16d32050e BUG/MEDIUM: mworker: prevent inconsistent reload when upgrading from old versions
Previous versions ( < 1.9 ) of the master-worker process didn't had the
"HAPROXY_PROCESSES" environment variable which contains the list of
processes, fd etc.

The part which describes the master is created at first startup so if
you started the master with an old version you would never have
it.

Since patch 68836740 ("MINOR: mworker: implement a reload failure
counter"), the failedreloads member of the proc_self structure for the
master is set to 0. However if this structure does not exist, it will
result in a NULL dereference and crash the master.

This patch fixes the issue by creating the proc_self structure for the
master when it does not exist. It also shows a warning which states to
restart the master if that is the case, because we can't guarantee that
it will be working correctly.

This MUST be backported as far as 2.5, and could be backported in every
other stable branches.
2023-02-21 13:53:35 +01:00
William Lallemand
d27f457eea BUG/MINOR: mworker: stop doing strtok directly from the env
When parsing the HAPROXY_PROCESSES environement variable, strtok was
done directly from the ptr resulting from getenv(), which replaces the ;
by \0, showing confusing environment variables when debugging in /proc
or in a corefile.

Example:

	(gdb) x/39s *environ
	[...]
	0x7fff6935af64: "HAPROXY_PROCESSES=|type=w"
	0x7fff6935af7e: "fd=3"
	0x7fff6935af83: "pid=4444"
	0x7fff6935af8d: "rpid=1"
	0x7fff6935af94: "reloads=0"
	0x7fff6935af9e: "timestamp=1676338060"
	0x7fff6935afb3: "id="
	0x7fff6935afb7: "version=2.4.0-8076da-1010+11"

This patch fixes the issue by doing a strdup on the variable.

Could be backported in previous versions (mworker_proc_to_env_list
exists since 1.9)
2023-02-21 13:53:33 +01:00
William Lallemand
5a7f83af84 BUG/MINOR: mworker: prevent incorrect values in uptime
Since the recent changes on the clocks, now.tv_sec is not to be used
between processes because it's a clock which is local to the process and
does not contain a real unix timestamp.  This patch fixes the issue by
using "data.tv_sec" which is the wall clock instead of "now.tv_sec'.
It prevents having incoherent timestamps.

It also introduces some checks on negatives values in order to never
displays a netative value if it was computed from a wrong value set by a
previous haproxy version.

It must be backported as far as 2.0.
2023-02-17 17:17:28 +01:00
Christopher Faulet
da89e9b95b MINOR: channel/applets: Stop to test CF_WRITE_ERROR flag if CF_SHUTW is enough
In applets, we stop processing when a write error (CF_WRITE_ERROR) or a shutdown
for writes (CF_SHUTW) is detected. However, any write error leads to an
immediate shutdown for writes. Thus, it is enough to only test if CF_SHUTW is
set.
2023-01-09 18:41:08 +01:00
Aurelien DARRAGON
1412d31a6d MINOR: mworker: remove unused legacy code in mworker_cleanlisteners
This cleanup is a follow up of "CLEANUP: peers: unused code path in
process_peer_sync"

There are some remnants of 1.6 peers specific code in mworker_cleanlisteners()
that was introduced with this patch serie:
f83d3fe00a MEDIUM: init: stop any peers section not bound to the correct process
47c8c029db MEDIUM: init: completely deallocate unused peers

Back then, nbthread did not exist, nbproc was used instead.
Updating some comments to make them more relevant to current haproxy design.
(multithreaded single process)

Moreover, in 47c8c029db, task_free() was performed on peers_fe->task.
But by looking at the code, from 1.6 til now, peers_fe->task
is never used for peers proxies, it is only used for main proxies (referenced
in proxies_list).
Removing this extra task cleanup because it is misleading.
2022-12-07 18:26:53 +01:00