Let's move mworker_reexec() and mworker_reload() in mworker.c. mworker_reload()
is called only within the functions, which are already in mworker.c. So, this
reorganization allows to declare mworker_reload() as a static.
mworker_run_master() is called only in master mode. mworker_loop() is static
and called only in mworker_run_master(). So let's move these both functions in
mworker.c.
We also need here to make run_thread_poll_loop() accessible from other units,
as it's used in mworker_loop().
This commit prepares the move of mworker_run_master() in mworker.c.
Let's remove from it's definition the code, which adjusts verbosity in
dependency of other global run time modes (daemon or foreground). This part
should stay in main(), where all verbosity modes are handeled for
different mode combinations.
mworker_prepare_master() performs some preparation routines for the new worker
process, which will be forked during the startup. It's called only in
master-worker mode, so let's move it in mworker.c.
mworker_on_new_child_failure() performs some routines for the worker process,
if it has failed the reload. As it's called only in mworker_catch_sigchld()
from mworker.c, let's move mworker_on_new_child_failure() in mworker.c as well.
Like this it could also be declared as a static.
This patch prepares the moving of on_new_child_failure definition into
mworker.c. So, let's rename it accordingly and let's also update its
description.
In 3.1-dev10, commit 8dd4efe42f ("MAJOR: mworker: move master-worker
fork in init()"), the FD associated to /dev/null was made CLOEXEC
using O_CLOEXEC. Unfortunately this is not portable on older OSes,
doesn't build on Solaris for example, and was even reported as breaking
moderately old Linux OSes for other projects. Better not use it unless
absolutely certain it will work (currently we only use it for Linux
namespaces, which are optional), and use the conventional FD_CLOEXEC
instead.
No backport is needed.
This fixes the commit d6ccd1738b
("MINOR: startup: set HAPROXY_LOCALPEER only once").
Comment "/* preset some environment variables */" is now useless here as
HAPROXY_LOCALPEER is set later during the initialization stage and only once.
This should not be backported, as related to the latest master-worker
refactoring.
This fixes the commit d6ccd1738b
("MINOR: startup: set HAPROXY_LOCALPEER only once"). HAPROXY_LOCALPEER could
be checked in the configuration to set some servers settings or listeners. So,
we need to set it just before we read the configuration at the second time.
Let's mark HAPROXY_LOCALPEER as "usable" in the configuration in the related
documentation chapter.
This should not be backported, as related to the latest master-worker
refactoring.
Let's store progname in the global variable, as it is handy to use it in
different parts of code to format messages sent to stdout.
This reduces the number of arguments, which we should pass to some functions.
In the init_early() global.log_tag is initialized to the string from progname
pointer and global.log_tag.area points to this pointer.
If log-tag keyword is provided in the configuration, its parser at first frees
global.log_tag.area and then it does a new memory allocation to copy
there the argument of log-tag. So, progname no longer points to the valid
memory.
To fix this, let's always keep progname and global.log_tag.area at separate
memory areas. If log_tag will be redefined in the configuration, its parser will
free the memory allocated for the default value in chunk_destroy(). Memory
allocated for progname will be freed in deinit().
This should not be backported as related to the latest master-worker
refactoring.
Before this patch HAPROXY_LOCALPEER variable could be set in init_early(),
in init_args() and in cfg_parse_global(). In master-worker mode, if localpeer
keyword set in the global section, HAPROXY_LOCALPEER in the worker
environment is set to this keyword's value, but in the master environment it
still keeps the default, a localhost name. This is confusing.
To fix it, let's set HAPROXY_LOCALPEER only once, when a worker or process in a
standalone mode has finished to parse its configuration. And let's set this
variable only for the worker process or for the process in a standalone mode,
because the master doesn't need it.
HAPROXY_LOCALPEER takes the value saved in localpeer global variable, which is
always set by default in init_early() to the local hostname. Then, localpeer
could be reset in init_args (-L option) and in cfg_parse_global() (while
parsing "localpeer" keyword).
Since sd_notify() is now implemented in src/systemd.c, there is no need
anymore to build its support conditionnally with USE_SYSTEMD.
This patch add supports for -Ws for every build and removes the
USE_SYSTEMD build option. It also remove every reference to USE_SYSTEMD
in the documentation and the CI.
This also allows to run the reg-tests in -Ws with the new VTest support.
Before this patch HAPROXY_BRANCH was unset just after configuration parsing.
Let's keep it, as it could be used in conditional blocks and some
configuration directives and it's handy to check its runtime value via "show
env".
In master-worker mode, this variable is set to the same value for both
processes.
proxy auth_uri struct was manually cleaned up during deinit, but the logic
behind was kind of akward because it was required to find out which ones
were shared or not. Instead, let's switch to a proper refcount mechanism
and free the auth_uri struct directly in proxy_free_common().
When uri_auth admin rules were implemented in 474be415
("[MEDIUM] stats: add an admin level") no attempt was made to free the
list of allocated rules, which makes valgrind unhappy upon deinit when
"stats admin" is used in the config.
To fix the issue, let's cleanup the admin rules list upon deinit where
uri_auth freeing is already handled.
While this could be backported to every stable versions, given how minor
this is and has no impact on the dying process, it is probably not worth
the effort.
load_cfg() is called only once before the first reading of the configuration
(we parse here only the global section). Then, before reading the rest of the
sections (second call of read_cfg()), we call clean_env(). As
HAPROXY_CFGFILES is set in load_cfg(), which is called only once, clean_env()
erases it. Thus, it's not longer shown in "show env" output.
To fix this, let's set HAPROXY_CFGFILES in read_cfg(). Like this in
master-worker mode it is set for master and for worker processes, as it was
before the refactoring.
This fix doesn't need to be backported as related to the latest master-worker
architecture change.
After master-worker refactoring, master performs re-exec only once up to
receiving "reload" command or USR2 signal. There is no more the second
master's re-exec to free unused memory. Thus, there is no longer need to export
environment variable HAPROXY_LOAD_SUCCESS with worker process load status. This
status can be simply saved in a global variable load_status.
cfg_program_postparser() contains 2 parts:
- check the combination of MODE_MWORKER and "program" section. if
"program" section was parsed, MODE_MWORKER is mandatory;
- check "command" keyword, which is mandatory for this section as
well.
This is more appropriate now, after the master-worker refactoring, do the
first part in read_cfg_in_discovery_mode, where we already check the
combination of MODE_MWORKER and -S option.
We need to do the second part just below, in read_cfg_in_discovery_mode() as
well, because it's only the master process, who parses now program section and
programs are forked before running postparser functions in step_init_2.
Otherwise, mworker_ext_launch_all() will emit a log message, that program is
started, but actually nothing has been launched, if 'command' keyword is
absent.
This not needs to be backported, as related to the master-worker refactoring.
This commit introduces the tune.renice.startup and tune.renice.runtime
global keywords that allows to change the priority with setpriority().
tune.renice.startup is parsed and applied in the worker or the standalone
process for configuration parsing. If this keyword is used alone, the
nice value is changed to the previous one after configuration parsing.
tune.renice.runtime is applied after configuration parsing, so in the
worker or a standalone process. Combined with tune.renice.startup it
allows to have a different nice value during configuration parsing and
during runtime.
The feature was discussed in github issue #1919.
Example:
global
tune.renice.startup 15
tune.renice.runtime 0
As master-worker fork happens now before step_init_2(), when pollers are
initialized and polling settings and dumped then in verbose and in debug modes
to stdout, it turns out that master and worker dump its same polling
settings separately. This creates long and messy output in these modes.
Polling settings are the same for master and for worker process for the moment.
Even if they would diverge in future we are interested here in worker's
settings. So, when started in the master-worker mode let's dump it only in the
worker context.
This doesn't need to be backported as related to the latest master-worker
refactoring.
If haproxy was started with -W -dK*, after master-worker refactoring, we dump
registered keywords to stdout twice in master and in worker processes. This
information is redundant and output has no longer the right format. So, as the
keyword registration happens very early before the fork, let's dump keywords
only in the worker context, if haproxy was launched with -W.
This does not need to be backported, as related to the latest master-worker
refactoring.
If haproxy was started with -W -dL, after master-worker refactoring we dump
libs to stdout twice in master and in worker processes. This is information is
redundant. So let's show linked libraries only in the worker context, if
haproxy was started also with -W.
This does not need to be backported, as related to the latest master-worker
rework.
Don't do master-worker fork if MODE_CHECK is detected from the command line along
with the master-worker mode. We should exit in MODE_CHECK, after the
configuration parsing and validation. So, with the new master-worker architecture
it's better to align this mode with the standalone.
This patch does not need to be backported, as related to the latest
master-worker rework.
Flag MODE_STARTING should be unset for master just before freeing the startup
logs ring, as it triggers the copy of process logs to this ring, see the code
of print_message().
Moreover with this flag set, if startup logs ring pointer is NULL, any
print_message() triggered just before the execvp in mworker_reexec() will call
startup_logs_init(). So ring will be allocated again "discretely" and after
execvp we will lost its address, as in step_init_1() we will call again
startup_logs_init().
No need to backport this fix as it's related to the latest master-worker
refactoring.
During re-execution master keeps always opened "reload" sockpair FDs and
shared sockpair ipc_fd[0], the latter is using to transfert listeners sockets
from the previously forked worker to the new one. So, these master's FDs are
inherited in the newly forked worker and must be closed in its context.
"reload" sockpair inherited FDs and shared sockpair FD (ipc_fd[0]) are closed
separately, becase master doesn't recreate "reload" sockpair each time after
its re-exec. It always keeps the same FDs for this "reload" sockpair. So in
worker context it can be closed immediately after the fork.
At contrast, shared sockpair is created each time after reload, when the new
worker will be forked. So, if N previous workers are still exist at this moment,
the new worker will inherit N ipc_fd[0] from master. So, it's more save to
close all these FDs after get_listeners_fd() and bind_listeners() calls.
Otherwise, early closed FDs in the worker context will be immediately bound to
listeners and we could potentially have some bugs.
There are two parts in mworker_cli_proxy_create(): allocating and setting up
MASTER proxy and allocating and setting up servers on ipc_fd[0] of the
sockpairs shared with workers.
So, let's split mworker_cli_proxy_create() into two functions respectively.
Each of them takes **errmsg as an argument to write an error message, which may
be triggered by some subcalls. The content of this errmsg will allow to extend
the final alert message shown to user, if these new functions will fail.
The main goals of this split is to allow to move these two parts independantly
in future and makes the code of haproxy initialization in haproxy.c more
transparent.
Before refactoring master-worker architecture, resources to setup master CLI
for the new worker process (shared sockpair, entry in proc_list) were created
in init() before parsing the configuration and binding listening sockets. So,
master during its re-exec has had to cleanup the new worker's ressources in
a case, when it fails at some initialization step before the fork.
Now fork happens very early and worker parses its configuration by itself. If
it fails during the initialization stage, all clean ups (deleting the fds of
the shared sockpair, proc_list cleanup) are performed in SIGCHLD handler up to
catching the SIGCHLD corresponded to this new worker. So, there is no longer
need to call mworker_cleanup_proc() in mworker_reexec().
As for mworker_cleanlisteners(), there is no longer need to call this function.
Master parses now only "global" and "program" sections, so it allocates only
MASTER proxy, which is stopped in mworker_reexec() by mworker_cli_proxy_stop().
Let's keep the definitions of mworker_cleanlisteners() and
mworker_cleanup_proc() in mworker.c for the moment. We may reuse parts of its
code later.
As master-worker fork happens now at early init stage and worker then parses
its configuration and performs all initialization steps, let's duplicate
startup logs ring for it, just before the moment when it enters in its pollong
loop. Startup logs ring content is shown as an output of the "reload" master
CLI command and we should be able to dump here worker initialization logs.
Log messages are written in startup logs ring only, when mode MODE_STARTING is
set (see print_message()). So, to be able to keep in startup logs the last
worker alerts, let's withdraw MODE_STARTING and let's reset user messages
context respectively just before entering in polling loop.
This fix does not need to be backported as it is a part of previous patches
from this version, which refactor master-worker architecture.
This patch simplifies the code of startup_logs_init_shm(). We no longer re-exec
master process twice after each reload to free its unused memory, which it had
to allocate, because it has parsed all configuration sections. So, there is no
longer need to keep SHM fd opened between the first and the next reloads. We
can completely remove HAPROXY_STARTUPLOGS_FD.
In step_init_1() we continue to call startup_logs_init_shm() to open SHM and to
allocate startup logs ring area within it. In master-worker mode, worker
duplicates initial startup logs ring after sending its READY state to master.
Sharing the same ring between two processes until the worker finishes its
initialization allows to show at master CLI output worker's startup logs.
During the next reload master process should free the memory allocated for the
ring structure. Then after the execvp() it will reopen and map SHM area again
and it will reallocate again the ring structure.
When master enters in recovery mode after unsuccessfull reload
HAPROXY_LOAD_SUCCESS should be set as 0. Like this
cli_io_handler_show_cli_sock() could dump in master CLI its warnings and alerts,
saved in startup logs ring.
No need to backport this fix, as this is related to the previous patches in
this version to refactor master-worker architecture.
In case of daemon mode now daemonization fork happens in the early init stage
before parsing and applying the configuration, so we can't close
stdio/stderr/stdout immediately after forking. We keep it open until the most
of configuration, including chroot are applied in order to show alerts, if
there are some problems. To achieve this /dev/null is opened just before calling
chroot(), and after the chroot block it's used to close all standard outputs
and stdin. At this point we no longer need the fd of /dev/null, so we can close
it as well.
setenv/resetenv/presetenv/unsetenv keywords in the configuration modify the
process environment. In case of master-worker and programs we need to restore
the initial process environment before reload, as the configuration could
change in between and newly forked workers and programs should be launched
in the environment corresponded to this new configuration.
To achieve this we backup the initial process environment before the first
configuration read, when 'global' and 'program' sections are read. And then we
clean up master process environment and restore the initial one from the backup
in mworker_reexec().
Let's reintroduce systemd support in the refactored master-worker mode.
As for now, the master-worker fork happens during early initialization steps and
then the master process receieves the "READY" status message from the newly
forked worker, that has successfully loaded. Let's propagate this "READY" status
message at this moment to the systemd from the master process context
(_send_status()). We use the master process to send messages to systemd,
because it is only the process, monitored by systemd.
In master recovery mode, we also need to send to the systemd the "READY"
message, but with the status "Reload failed". "READY" will signal to systemd,
that master process is still alive, because it doesn't exit in recovery mode
and it keeps the existed worker. Status "Reload failed" will signal to user,
that something wrong has happened with the configuration. Same message logic
was originally preserved for the case, when the worker fails to read its
configuration, see on_new_child_failure() for more details.
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.
Let's add here mworker_ext_launch_all() call before master-worker fork to
start external programs. We keep the order and the place of these two forks
(program and master-worker) the same as before the refactoring, in order to
avoid regressions.
This patch is a part of series to reintroduce the program support in the new
master-worker architecture.
For the moment we keep the order of program and worker forks the same as before
the refactoring, as we need to be sure that this won't introduce regressions.
So, programs are forked before the new worker process.
Before the program's fork we already need deserialized processes list to find
the programs launched before reload and to stop them. Processes list saved
before the reload in HAPROXY_PROCESSES variable. It should be deserialized
before the first configuration read in discovery mode, because resetenv keyword
could be presented in the global section.
So, let's move mworker_env_to_proc_list() from mworker_create_master_cli() to
main(). We need to call it only after reload in master-worker mode, thus
HAPROXY_MWORKER_REEXEC and HAPROXY_PROCESSES should be still presented in the
re-executing process environment before the first configuration read.
Let's encapsulate the logic to set verbosity modes (MODE_DEBUG and MODE_VERBOSE)
in a separate function set_verbosity(). This makes the code of main() more
readable and this allows to call set_verbosity() for master process in recovery
mode. So, in this mode, verbosity settings before the master re-execution will
be re-applied to master. set_verbosity() will be extended in future commits to
reduce the verbosiness of master in order not to dump pollers list and filters,
if it was started with -V or -d.
In this commit we add run_master_in_recovery_mode(), which groups all necessary
initialization steps, which master should perform to be able to enter in its
polling loop (run_master()), when it fails while parsing its new config.
As exit_on_failure() is now adapted for master recovery mode. Let's register
it as atexit handler, when master enters in this mode. And let's remove
atexit_flag variable for master, because we no longer use it.
We also slightly refactor here read_cfg_in_discovery_mode() in order to call
run_master_in_recovery_mode() for the case, described above. Warning messages
are mandatory before calling the run_master_in_recovery_mode() as this allows
to stop haproxy with error, if it was launched in zero-warning mode.
So, in recovery mode master does not launch any worker. It just performs its
necessary initialization routines and enters in its polling loop to continue
to monitor the existed worker process.
Master recovery mode replaces the former wait-mode with a difference, that
master in this case doesn't try to fork the new worker process. But it still
needs to enter to its polling loop in order to monitor the previous worker.
Master performs some initialization steps for this and it recreates its master
CLI. During its initialization steps, master could potentially fail again.
As we use for the moment for master init steps some common routines
(step_init_2() and step_init_3()), there is no way there to signal to user that
failure has happened for the master and in addition, in its recovery mode. So,
in such case exit_on_failure() can be still useful in order to print an
appropriate alert, as we can register this function as atexit handler for the
master.
Let's encapsulate here the code to load and to read the configuration at the
first time in MODE_DISCOVERY. This makes the code of main() more readable and
this adds the structure for adding necessary master initializations routines
to support master recovery mode.
Let's encapsulate master's code (steps which it does before entering in its
polling loop and deinitialization routines after) in a separate run_master()
function. This makes the code of main() more readable. In future we plan to put
in run_master() more master process related code, in order to clean completely
init_step_2(), init_step_3() and init_step_4().
Let's encapsulate here another part of main, after binding listeners sockets
and before calling the master's code in master-worker mode. This block
contains the code, which applies verbosity settings, checks limits and updates
the ready date. It will take some time to figure out, which of these parts are
really needed for the master, or which ones it could skip. So let's put all
these for the moment in step_init_4() and let's call it for all modes.
Let's encapsulate here the code, which tries to bind listeners for the new
process in a separate function. This will make the main() code more readable.
Master process, even if it has failed while reading its new configuration, has
to bind its master CLI sockets. So like this we will can call this function in
the master recovery mode.
Master CLI socket address and port for external connections (user, monitoring
tools) are provided for now only via the command line. So, master, even after
this failure can and must reestablish master CLI connections again.
Let's encapsulate here the code, that calls sock_get_old_sockets() to obtain
listeners sockets from the previous process into a separate function. This
will make the code of main() more readable and we can move this new function
(if we might need so) in future.
MODE_CHECK and MODE_CHECK_CONDITION are applied now very early in
step_init_1() and step_init_2() in order to check the configuration or to check
some condition provided via the command line. When these checks have
terminated, the main process exits. So, no longer need to verify these modes at
the moment, when the current process have already done its basic initialization
routines and is asking for listeners sockets from the previously started one.
The first part of main(), just after calling the former init() and before
trying to bind listeners, need to be also encapsulated into a separate
step_init_3() as it is. It contains important blocks to register signals, to
apply memory and nofile limits, etc. The order of these blocks should be also
preserved (especially the signals part).
For the moment step_init_3() must be also executed for all runtime modes.
This is the first commit in a series to add a support of the 5-th reload
use case, when the master process fails to read its new configuration. In this
case it just need to perform its initialization steps and keep the existed
worker.
To add the support for this last use case we need to split init() and main()
in a shorter steps in order to encapsulate necessary initialization routines
into separate functions.
Let's at first, make here progname as a global variable for haproxy.c, as it
will be used in error messages in the initialization functions. Then let's
split the init() into separate routines, which set and apply modes, write
process PID in a pidfile, etc.
The big part of the former init(), which called functions to allocate pools,
to initialize proxies, to calculate maxconn and to perform some post checks was
just encasulated as is, into step_init_2(). It will take some time to figure
out exactly which parts of this initialization block are really necessary for
the master process and which ones it could skip. So, for the moment
step_init_2() is called for all runtime modes.
Before refactoring the master-worker mode, in all runtime modes, when the new
process successfully parsed its configuration and bound to sockets, it sent
either SIGUSR1 or SIGTERM to the previous one in order to terminate it.
Let's keep this logic as is for the standalone mode. In addition, in standalone
mode we need to send the signal to old process before calling set_identity(),
because in set_identity() effective user or group may change. So, the order is
important here.
In case of master-worker mode after refactoring, master terminates the previous
worker by itself up to receiving "READY" status from the new one in
_send_status(). Master also sets at this moment HAPROXY_LOAD_SUCCESS env
variable and checks, if there are some other workers to terminate with
max_reloads exceeded.
So, now in master-worker mode we terminate old workers only, when the new one
has successfully done all initialization steps and has sent "READY" status to
master.
In the new master-worker architecture, when a worker process is forked and
successfully initialized it needs somehow to communicate its "READY" state to
the master, in order to terminate the previous worker and workers, that might
exceeded max_reloads counter.
So, let's implement for this a new master CLI _send_status command. A new
worker can send its status string "READY" to the master, when it's about
entering to the run poll loop, thus it can start to receive data.
In _send_status() in the master context we update the status of the new worker:
PROC_O_INIT flag is withdrawn.
When TERM signal is sent to a worker, worker terminates and this triggers the
mworker_catch_sigchld() handler in master. This handler deletes the exiting
process entry from the processes list.
In _send_status() we loop over the processes list twice. At the first time, in
order to stop workers that exceeded the max_reloads counter. At the second time,
in order to stop the worker forked before the last reload. In the corner case,
when max_reloads=1, we avoid to send SIGTERM twice to the same worker by
setting sigterm_sent flag during the first loop.
When master performs a reexec it should set for an already existed worker the
flag PROC_O_LEAVING. It means that existed worked is marked as the previous one
and will be terminated after the reload.
In the previous implementation master process was need to do the reexec
twice (the first time for parsing its configuration and the second time to free
unused ressources). So the logic of setting PROC_O_LEAVING was based on
comparing the number of reloads, performed by each process from the processes
list, except the master.
Now, as being mentioned before, reexec is performed only once. So, in this case
we need to set PROC_O_LEAVING flag, when we deserialize the list. It is done for
all processes, which have the number of reloads stricly positive.
Previously reexec_on_failure() was called in cases when the process has failed
after reload, while it was parsing its configuration or it was trying to apply
it. reexec_on_failure() has called mworker_reexec() and the master process has
been reexecuted.
With the new architecture in such cases there is no longer need to reexecute
the master process after its reload again. It simply keeps the previous worker,
forked before the reload, and it lets the new one to exit with an error. But we
still need the code, which increments the number of failed reloads and which
notifies systemd with new "Reload failed!" status. So, let's reuse and adapt
for this reexec_on_failure() and let's rename it to on_new_child_failure().
Basically, this is the continuation of the previous commits. So, here after the
fork, worker process closes the "master" end of the copied CLI sockpair and
binds its end, ipc_fd[1], to the GLOBAL proxy listener.
mworker_cli_global_proxy_new_listener() guarantees that GLOBAL proxy will be
created, if it wasn't the case before.
Master process, at first, allocates the MASTER proxy, creates master CLI listener
(-S command line option) and reload sockpair and then closes the "worker" end of
the copied CLI sockpair and binds its end, ipc_fd[0], to the created MASTER proxy.
Usage of the new PROC_O_INIT state helps to reduce test conditions to find the
newly forked worker.
Here, to distinguish between the new worker and the previous one let's add a
new process state PROC_O_INIT and let's set it, when the memory is allocated
for the new worker in the processes list.
For the master process we always need to create a MASTER proxy, even if
master cli settings were not provided via command line, because now we bind a
listener in the master process context at ipc_fd[0]. So, MASTER proxy should be
already allocated at this moment.
The main idea here is to create a master CLI inherited sockpair just before the
master-worker fork. And only then after the fork let each process to bind a
needed listener to the its end of this sockpair.
Like this master and worker processes can close unused "ends" of its sockpair
copy (ipc_fd[0] for worker and and ipc_fd[1] for master).
When this sockpair creation happens inside the
mworker_cli_global_proxy_new_listener() is not possible for the master to
close ipc_fd[1] bound to the GLOBAL proxy listener, as this triggers a
BUG_ON(fd->owner) in fd_insert() in master context, because master process
has alredy entered in its polling loop and poller in its turn tries to reused
closed fd.
Let's rename mworker_cli_sockpair_new() to
mworker_cli_global_proxy_new_listener() to outline that this function creates
the GLOBAL proxy, allocates the listener with "master-socket" bind conf and
attaches this listener to this GLOBAL proxy. Listener is bound to ipc_fd[1] of
the sockpair inherited in master and in worker (master CLI sockpair).
*thread keywords parsers are sensitive to global section position. If they are
present there, the global section must be the first section in the
configuration. *thread parsers logic is based on non_global_section_parsed
counter. So, we need to reset it explicitly before the second configuration
read done by worker or in a standalone mode.
This commit is a part of the series to add a support of discovery mode in the
configuration parser and in initialization sequence.
In order to support discovery mode, we need to read the configuration twice.
So, we need to split the stage, when we load all configuration files, from
the stage when we parse it. To do this, let's encapsulate in read_cfg() the
part, where we load the configuration files in a separate function, load_cfg().
Like this we can call only the parsing part as many times as we need.
Before reading configuration at the first time we set MODE_DISCOVERY. After
the reading this mode is immediately unset, as the real runtime mode has been
already set by discovery keywords parsers.
Second read is performed when all primary runtime modes (daemon, master-worker)
are applied, because we should not read the configuration twice in the master
process.
MODE_MWORKER_WAIT becames redundant with MODE_MWORKER, due to moving
master-worker fork in init(). This change allows master no longer perform
reexec just after forking in order to free additional memory.
As after the fork in the master process we set 'master' variable, we can
replace now MODE_MWORKER_WAIT in some 'if' statements by simple check of this
'master' variable.
Let's also continue to get rid of HAPROXY_MWORKER_WAIT_ONLY environment
variable, as it's no longer needed as well.
In cfg_program_postparser(), which is used to check if cmdline is defined to
launch a program, we completely remove the check of mode for now, because
the master process does not parse the configuration for the moment. 'program'
section parsing will be reintroduced in master later in the next commits.
As we no longer support MODE_MWORKER_WAIT for master (it became redundant with
MODE_MWORKER after moving master-worker fork in init()), let's rename
exit_on_waitmode_failure() callback in just exit_on_failure().
This a first commit to prepare the removal of MODE_MWORKER_WAIT support. It has
became redundant with MODE_MWORKER, due to moving master-worker fork in init().
Master process does no longer perform reexec to free additional memory after
forking and does no longer changing its mode to MODE_MWORKER_WAIT, where it has
entered to its wait polling loop and has handled signals. Now, master enters in
this loop almost immediately after forking a worker and being always in mode
MODE_MWORKER.
So, we can remove mworker_reexec_waitmode() wrapper, which was used to set
HAPROXY_MWORKER_WAIT_ONLY variable and to call mworker_reexec(). But let's keep
for the moment the logic of reexec_on_failure() atexit callback for master in
order if in the future we will need to support this case again.
Due to moving the master-worker fork in init(), we need to protect
prepare_caps_from_permitted_set() call, which is executed after init(). This
call makes sense only for worker, daemon and for foreground mono process modes.
prepare_caps_from_permitted_set() allows to read Linux capabilities from
haproxy binary and to move some of them in process Effective set, if 'setcap'
keyword lists needed capabilities in the global section.
There are two set_identity() calls, both under quite same:
'if ((global.mode & (MODE_MWORKER|MODE_DAEMON...)...'
The first call serves to change uid/gid and set some needed Linux capabilities
only for process in the foreground mode. The second comes after master-worker
fork and allows to do the same in daemon and in worker modes.
Due to moving the master-worker fork in init() in some previous commit, the
second set_identity() now is no longer under the 'if'. So, it is executed
for all modes, except MODE_MWORKER. Now in MODE_MWORKER process enters in its
wait polling loop just after forking a worker and it terminates almost
immediately, if it exits this loop.
Worker, daemon and process in a foreground mode will perform set_identity() as
before, but now it will be called in a one place at main().
global.last_checks should be verified just after set_identity() call. As it's
stated in comments some configuration options may require full privileges or
some Linux capabilities need to be granted to process. set_identity() via
prepare_caps_for_setuid() may put configured capabilities in process Effective
set and, hence, remove respective flag from global.last_checks.
There are two 'chroot' code blocks, both under quite same:
'if ((global.mode & (MODE_MWORKER|MODE_DAEMON...)...'
The first block serves to perform chroot only for process in the foreground
mode. The second comes after master-worker fork and allows to do chroot
in daemon and in worker modes.
Due to moving the master-worker fork in init() in some previous commit, the
second 'chroot' code block now is no longer under the 'if'. So, it is executed
for all modes, except MODE_MWORKER. Now in MODE_MWORKER process enters in its
wait polling loop just after forking a worker and it terminates almost
immediately, if it exits this loop.
Worker, daemon and process in a foreground mode will perform the chroot as
before, but now it will be done in a one place at main().
mworker_create_master_cli() creates MASTER proxy and allocates listeners,
which are attached to this proxy. It also creates a reload sockpair.
So, it's more appropriate to do the check, that we are in a MODE_MWORKER, if
master CLI settings were provided via command line, just after the config
parsing. And only then, if runtime mode and command line settings are
coherent, try to perform master-worker fork and try to create master CLI.
After moving master-worker fork into init() and reintroducing it into a
switch-case (see the previous commit), it is more appropriate to set
nbthread=1 and nbtgroups=1 immediately in the 'case' for the parent process.
Before this fix, startup logs ring was duplicated before the fork(), so master
and worker had both the original startup_logs ring and the duplicated one. In
the worker context we freed the original ring and used a duplicated one. In
the master context we did nothing, but we still create a duplicated copy again
and again during the reload.
So, let's duplicate startup logs ring only in the worker context. Master
continues to use the original ring initialized in init() before its fork().
This refactoring allows to simplify 'master-worker' logic. The master process
with this change will fork a worker very early at the initialization stage,
which allows to perform a configuration parsing only for the worker. In reality
only the worker process needs to parse and to apply the whole configuration.
Master process just polls master CLI sockets, watches worker status, catches
its termination state and handles the signals. With this refactoring there is
no longer need for master to perform re-execution after reading the whole
configuration file to free additional memory. And there is no longer need for
worker to register atexit callbacks, in order to free the memory, when it
fails to apply the new configuration. In contrast, we now need to set
proc_self pointer to the new worker entry in processes list just after
the fork in the worker process context. proc_self is dereferenced in
mworker_sockpair_register_per_thread(), which is called when worker enters in
its polling loop.
Following patches will try to gather more 'worker' and 'master' specific' code
in the dedicated cases of this new fork() switch, or in a separate functions.
Let's move PID handling in init() from the main() code. It is more appropriate
to open and to write the PID of the process just after daemonization fork. In
case of daemon monoprocess mode, we will simply write a PID of the process,
which is already in the background. In case of 'master-worker' mode, we keep
the previous behaviour and we write only a PID of the master process.
This allows to remove redundant tests of the process execution mode, tests of
the pidfd value and consequent writes to this pidfd. This patch prepares the
refactoring of master-worker fork by moving it in init() function as well.
Let's move daemonization fork in init(). We need to perform this fork always
before forking a worker process, in order to be able to launch master and then
its worker in daemon, i.e. background mode, if haproxy was started with '-D'
option.
This refactoring is a preparation step, needed for replacing then master-worker
fork in init() as well. This allows the master process not to read the whole
configuration file and not to do re-execution in order to free additional
memory, when worker was forked. In the new refactored design only the worker
process will read and apply a new configuration, while the master will arrive
very fast in its polling loop to wait worker's termination and to handle
signals. See more details in the following commits.
As master process performs execvp() syscall to handle USR2 and HUP signals in
mworker_reexec(), let's add O_CLOEXEC flag, when we open '/dev/null' in order
to avoid fd leak.
This a preparation step to refactor master-worker logic. See more details in
the next commits.
In proxies, stick-tables, servers, etc... at plenty of places we store
a file name and a line number. Some file names are the result of strdup()
(e.g. in proxies), others not (e.g. stick-tables) and leave dangling
pointers at the end of parsing. The risk of double-free is not null
either.
In order to stop this, let's first add a simple tool that allows to
register short strings inside a global list, these strings happening
to be server names. The strings are either duplicated and stored upon
failure to find them, or just added to this storage. Since file names
are not expected to disappear before the end of the process, for now
we don't even implement refcounting, and we free them all at the end.
There's already a drop_file_name() function to reset the pointer like
ha_free() used to do, and even if not strictly needed it's a good
habit to get used to doing it.
The strings are returned as const so that they're stored as-is in
structs, and that nasty free() calls are easily caught. The pointer
points to the char[] storage inside the node itself. This way later
if we want to implement refcounting, it will be trivial to just look
up a string and change its associated node's refcount. If needed,
comparisons can also be made on pointers.
For now they're not used yet and are released on deinit().
This fixes 7b78e1571 (" MINOR: mworker: restore initial env before wait
mode").
In cases, when haproxy starts without any configuration, for example:
'haproxy -vv', init_env array to backup env variables is never allocated. So,
we need to check in deinit(), when we free its memory, that init_env is not a
NULL ptr.
This patch is the follow-up of 1811d2a6ba (MINOR: tools: add helpers to
backup/clean/restore env).
In order to avoid unexpected behaviour in master-worker mode during the process
reload with a new configuration, when the old one has contained '*env' keywords,
let's backup its initial environment before calling parse_cfg() and let's clean
and restore it in the context of master process, just before it enters in a wait
polling loop.
This will garantee that new workers will have a new updated environment and not
the previous one inherited from the master, which does not read the configuration,
when it's in a wait-mode.
Define a new buffer pool reserved to allocate smaller memory area. For
the moment, its usage will be restricted to QUIC, as such it is declared
in quic_stream module.
Add a new config option "tune.bufsize.small" to specify the size of the
allocated objects. A special check ensures that it is not greater than
the default bufsize to avoid unexpected effects.
QUIC MUX buffer allocation limit is now directly based on the underlying
congestion window size. previous static limit based on conn-tx-buffers
is now unused. As such, this commit adds a warning to users to prevent
that it is now obsolete.
Secondly, update max-window-size setting. It is now the main entrypoint
to limit both the maximum congestion window size and the number of QUIC
MUX allocated buffer on emission. Remove its special value '0' which was
used to automatically adjust it on now unused conn-tx-buffers.
Define a new global keyword tune.quic.frontend.max-window-size. This
allows to set globally the maximum congestion window size for each QUIC
frontend connections.
The default value is 0. It is a special value which automatically derive
the size from the configured QUIC connection buffer limit. This is
similar to the previous "quic-cc-algo" behavior, which can be used to
override the maximum window size per bind line.
As readcfgfile no longer opens configuration files and reads them with fgets,
but performs only the parsing of provided data, let's rename it to parse_cfg by
analogy with read_cfg in haproxy.c.
Let's call load_cfg_in_ram() helper for each configuration file to load it's
content in some area in memory. Adapt readcfgfile() parser function
respectively. In order to limit changes in its scope we give as an argument a
cfgfile structure, already filled in init_args() and in load_cfg_in_ram() with
file metadata and content.
Parser function (readcfgfile()) uses now fgets_from_mem() instead of standard
fgets from libc implementations.
SPOE filter parses its own configuration file, pointed by 'config' keyword in
the configuration already loaded in memory. So, let's allocate and fill for
this a supplementary cfgfile structure, which is not referenced in cfg_cfgfiles
list. This structure and the memory with content of SPOE filter configuration
are freed immediately in parse_spoe_flt(), when readcfgfile() returns.
HAProxy OpenTracing filter also uses its own configuration file. So, let's
follow the same logic as we do for SPOE filter.
This commit prepares read_cfg() to call load_cfg_in_mem() helper in order to
load configuration files in memory. Before, read_cfg() calls the parser for all
files from cfg_cfgfiles list and cumulates parser's errors and memprintf's
errors in for_each loop. memprintf's errors did not stop this loop and were
accounted just after.
Now, as we plan to load configuration files in memory, we stop the loop, if
memprintf() fails, and we show appropraite error message with ha_alert. Then
process terminates. So not all cumulated syntax-related errors will be shown
before exit in this case and we has to stop, because we run out of memory.
If we can't open the current file or we fail to allocate a memory to store
some configuration line, the previous behaviour is kept, process emits
appropriate alert message and exits.
If parser returns some syntax-related error on the current file, the previous
behaviour is kept as well. We cumulate such errors for all parsed files and we
check them just after the loop. All syntax-related errors for all files is
shown then as before in ha_alert messages line by line during the startup.
Then process will exit with 1.
As now cfg_cfgfiles list contains many pointers to some memory areas with
configuration files content and this content could be big, it's better to
free the list explicitly, when parsing was finished. So, let's change
read_cfg() to return some integer value to its caller init(), and let's perform
the free routine at a caller level, as cfg_cfgfiles list was initialized and
initially filled at this level.
list_append_word() helper was used before only to chain configuration file names
in a list. As now we start to use cfgfile structure which represents entire file
in memory and its metadata, let's adapt this helper to use this structure and
let's rename it to list_append_cfgfile().
Adapt functions, which process configuration files and directories to use
cfgfile structure and list_append_cfgfile() instead of wordlist.
Let's check the second time a global counter of "ha_warning" messages, if
zero-warning is set. And let's do this just before forking. At this moment we
are sure, that we've already done all init operations, where we could emit
"ha_warning", and we still have stderr fd opened.
Even with the second check, we could lost some late and rare warnings
about failing to drop supplementary groups and about re-enabling core dumps.
Notes about this are added into 'zero-warning' keyword description.
This patch is done in order to prepare the move of handlers to compute and to
check process related limits as maxconn, maxsock, maxpipes.
So, these handlers become no longer static due to the future move.
We add the handlers declarations in limits.h in this patch as well, in order to
keep the next patch, dedicated to code replacement, without any additional
modifications.
Such split also assures that this patch can be compiled separately from the
next one, where we moving the handlers. This is important in case of
git-bisect.
The code which gets, sets and checks initial and current fd limits and process
related limits (maxconn, maxsock, ulimit-n, fd-hard-limit) is spread around
different functions in haproxy.c and in fd.c. Let's group it together in
dedicated limits.c and limits.h.
This patch is done in order to prepare the moving of limits-related functions
from different places to the new 'limits' compilation unit. It helps to keep
clean the next patch, which will do only the move without any additional
modifications.
Such detailed split is needed in order to be sure not to break accidentally
limits logic and in order to be able to compile each commit separately in case
of git-bisect.
This commit fixes 41275a691 ("MEDIUM: init: set default for fd_hard_limit via
DEFAULT_MAXFD").
fd_hard_limit is taken in account implicitly via 'ideal_maxconn' value in
all maxconn adjustements, when global.rlimit_memmax is set:
MIN(global.maxconn, capped by global.rlimit_memmax, ideal_maxconn);
It also caps provided global.rlimit_nofile, if it couldn't be set as a current
process fd limit (see more details in the main() code).
So, lets set the default value for fd_hard_limit only, when there is no any
other haproxy-specific limit provided, i.e. rlimit_memmax, maxconn,
rlimit_nofile. Otherwise we may break users configs.
Please, note, that in master-worker mode, master does not need the
DEFAULT_MAXFD (1048576) as well, as we explicitly limit its maxconn to 100.
Must be backported in all stable versions until v2.6.0, including v2.6.0,
like the commit above.
Let's provide a default value for fd_hard_limit, if it's not set in the
configuration. With this patch we could set some specific default via
compile-time variable DEFAULT_MAXFD as well. Hope, this will be helpfull for
haproxy package maintainers.
make -j 8 TARGET=linux-glibc DEBUG=-DDEFAULT_MAXFD=50000
If haproxy is comipled without DEFAULT_MAXFD defined, the default will be set
to 1048576.
This is done to avoid killing the process by its watchdog, while it started
without any limitations in its configuration or in the command line and the
hard RLIMIT_NOFILE is extremely huge (~1000000000). We use in this case
compute_ideal_maxconn() to calculate maxconn and maxsock, maxsock defines the
size of internal fdtab, which becames very-very large as well. When
the process starts to simply loop over this fdtab (0(n)), this takes a lot of
time, so watchdog does it job.
To avoid this, maxconn now is always reduced to some reasonable value either
by explicit global.fd-hard-limit from configuration, or by its default. The
default may be changed at build-time and overwritten then by
global.fd-hard-limit at runtime. Explicit global.fd-hard-limit from the
configuration has always precedence over DEFAULT_MAXFD, if set.
Must be backported in all stable versions until v2.6.0, including v2.6.0.
Haproxy master process should not read its configuration the second time
after performing reexec and passing to MODE_MWORKER_WAIT. So, to make
this part of init() function more readable and to distinguish better the
point, where configs have been read, let's encapsulate it in a separate
function.
Let's encapsulate the logic of 'reload' sockpair and master CLI listeners
creation, used by master CLI into a separate function, as we needed this
only in master-worker runtime mode. This makes the code of init() more
readable.
As MODE_CHECK_CONDITION logic terminates the process anyway, no matter if
the test for the provided condition was successfull or not, let's
encapsulate it in a separate function. This makes the code of init() more
readable.
In MODE_CHECK_CONDITION we only parse check_condition string, provided by
'-cc', and then we evaluate it. Haproxy process terminates at the
end of {if..else} block anyway, if the test has failed or passed. So, it
will be more appropriate to perform MODE_CHECK_CONDITION test first and
then do all other process runtime mode verifications.