It's always a pain to have to stuff lots of #ifdef USE_OPENSSL around
ssl headers, it even results in some of them appearing in a random order
and multiple times just to benefit from an existing ifdef block. Let's
make these headers safe for inclusion when USE_OPENSSL is not defined,
they now perform the test themselves and do nothing if USE_OPENSSL is
not defined. This allows us to remove no less than 8 such ifdef blocks
and makes include blocks more readable.
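As a quick illustration, the self-guarding pattern looks like this (the
file and guard names below are only indicative):

    /* openssl-compat.h (sketch): the whole body is guarded so that
     * including it without USE_OPENSSL is a harmless no-op.
     */
    #ifndef _COMMON_OPENSSL_COMPAT_H
    #define _COMMON_OPENSSL_COMPAT_H

    #ifdef USE_OPENSSL
    #include <openssl/ssl.h>
    /* compatibility macros for the supported variants go here */
    #endif /* USE_OPENSSL */

    #endif /* _COMMON_OPENSSL_COMPAT_H */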
Since we're providing a compatibility layer for multiple OpenSSL
implementations and their derivatives, it is important that no C file
directly includes openssl headers but only passes via openssl-compat
instead. As a bonus this also gets rid of redundant complex rules for
inclusion of certain files (engines etc).
They were all checked to comply with the advertised openssl version. Now
that libressl doesn't pretend to be a more recent openssl anymore, we
can simply rely on the regular openssl version tests without having to
deal with exceptions for libressl.
Most tests on OPENSSL_VERSION_NUMBER have become complex and break all
the time because this number is fake for some derivatives like LibreSSL.
This patch creates a new macro, HA_OPENSSL_VERSION_NUMBER, which will
carry the real openssl version defining the compatibility level, and
this version will be adjusted depending on the variants.
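A sketch of how such a macro can be derived (the exact version mapped to
each derivative is illustrative):

    /* map each variant to the OpenSSL API level it really implements */
    #if defined(LIBRESSL_VERSION_NUMBER)
    /* LibreSSL advertises its own 2.x number; use the OpenSSL level
     * it actually provides instead */
    #define HA_OPENSSL_VERSION_NUMBER 0x1000107fL
    #else
    #define HA_OPENSSL_VERSION_NUMBER OPENSSL_VERSION_NUMBER
    #endif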
As with every single OpenSSL fix, LibreSSL build broke again, this time
after commit 56996dabe ("BUG/MINOR: mworker/ssl: close OpenSSL FDs on
reload"). A definitive solution will have to be found quickly. For now,
let's exclude libressl from the version test.
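The exclusion typically takes the following shape (guard shown for
illustration):

    #if (OPENSSL_VERSION_NUMBER >= 0x10101000L) \
        && !defined(LIBRESSL_VERSION_NUMBER)
    /* OpenSSL 1.1.1+ only code path */
    #endif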
This patch must be backported to 1.9 since the fix above was already
backported there.
This patch implements a new global parameter for the master-worker mode.
When setting the mworker-max-reloads value, a worker receives a SIGTERM
if its number of reloads is greater than this value.
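For example (the threshold value below is arbitrary):

    global
        master-worker
        mworker-max-reloads 50  # workers past 50 reloads receive SIGTERM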
Since the previous commit it's not needed anymore to test a task pointer
before calling task_destroy() so let's just remove these tests from
the various callers before they become confusing. The function's
arguments were also documented. The same should probably be done
with tasklet_free() which involves a test in roughly half of the
call places.
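The caller-side simplification looks like this:

    /* before: every caller had to guard the call */
    if (t)
        task_destroy(t);

    /* after: task_destroy() is NULL-safe, like free() */
    task_destroy(t);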
In commit 1b8e68e ("MEDIUM: stick-table: Stop handling stick-tables as
proxies."), the ->table member of proxy struct was replaced by a pointer
that is not always checked and in some situations can cause a segfault,
eg. during reload or while using "show table" on CLI socket.
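A fix of this kind boils down to testing the pointer before any use,
along these lines (field usage illustrative):

    if (!px->table)
        return;  /* no stick-table attached to this proxy */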
No backport is needed.
From OpenSSL 1.1.1, the default behaviour is to maintain open FDs to any
random devices that get used by the random number library. As a result,
those FDs leak when the master re-execs on reload; since those FDs are
not marked FD_CLOEXEC or O_CLOEXEC, they also get inherited by children.
Eventually both master and children run out of FDs.
OpenSSL 1.1.1 introduces a new function to control whether the random
devices are kept open. When clearing the keep-open flag, it also closes
any currently open FDs, so it can be used to clean up open FDs too.
Therefore, a call to this function is made in mworker_reload prior to
re-exec.
The call is guarded by whether SSL is in use, because it will cause
initialisation of the OpenSSL random number library if that has not
already been done.
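The call site can be sketched as follows, assuming the guards described
above (the ssl_used_* flags are assumed here):

    #if defined(USE_OPENSSL) && (HA_OPENSSL_VERSION_NUMBER >= 0x10101000L)
        if (global.ssl_used_frontend || global.ssl_used_backend)
            /* also closes any currently open random device FDs */
            RAND_keep_random_devices_open(0);
    #endif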
This should be backported to 1.9 and 1.8.
Now we atomically allocate the my_regex struct within function
regex_comp() and compile the regex, or free both in case of failure. The
pointer to the allocated my_regex struct is returned directly. The
my_regex* argument to regex_comp() is removed.
Function regex_free() was modified so that it systematically frees the
my_regex entry. The function does nothing when called with a NULL as
argument (like free()). This avoids the existing risk of not properly
freeing the initialized area.
Other structures are also updated in order to be compatible (the ones
related to Lua and action rules).
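The resulting calling convention can be sketched like this (the exact
argument list is illustrative, error reporting shown with ha_alert()):

    char *err = NULL;
    struct my_regex *re;

    re = regex_comp("^foo", 1 /* case sens. */, 1 /* captures */, &err);
    if (!re) {
        /* everything was already freed; only the error remains */
        ha_alert("failed to compile regex: %s\n", err);
        free(err);
        return 0;
    }
    /* ... use the compiled regex ... */
    regex_free(re);  /* frees the struct as well; NULL is a no-op */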
This patch adds the support for the "table" line parsing in "peers"
sections to declare stick-tables in such sections. This also prevents the
user from having to declare dummy backend sections with a unique
stick-table inside.
Even if still supported, this usage will become deprecated.
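A hypothetical configuration could then look like this (table arguments
are only an example):

    peers mypeers
        peer haproxy1 192.168.0.1:1024
        peer haproxy2 192.168.0.2:1024
        table t1 type string size 10m store gpc0

    backend app
        # the table is referenced as <peers-section>/<table-name>
        stick on src table mypeers/t1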
To do so, the ->table member of the proxy struct, which was a stktable
struct, is replaced by a pointer to a stktable struct allocated at parsing
time in src/cfgparse-listen.c
for the dummy stick-table backends and in src/cfgparse.c for "peers" sections.
This has an impact on the code for stick-table sample converters and on the stickiness
rules parsers which first store the name of the dummy before resolving the rules.
This patch replaces proxy_tbl_by_name() calls by stktable_find_by_name()
calls to look up stick-tables stored in the "stktable_by_name" ebtree at
parsing time.
There is only one remaining place where proxy_tbl_by_name() is used: src/hlua.c.
At several places in the code we relied on the fact that the ->size member
of a stick-table was equal to zero to consider the stick-table present but
not configured; this no longer makes sense now that the ->table member of
struct proxy is a pointer.
These tests are replaced by a test on ->table value itself.
In "peers" section we do not have to temporary store the name of the section the
stick-table are attached to because this name is obviously already known just after
having entered this "peers" section.
About the CLI stick-table I/O handler, the pointer to proxy struct is replaced by
a pointer to a stktable struct.
Currently the thread array is a local variable inside a function block
and there is no access to it from outside, which often complicates
debugging. Let's make it global and export it. Also the allocation
return is now checked.
It's still obscure how we managed to initialize an array of integers
with values always equal to the index, just to retrieve the value
from an opaque pointer to the index instead of directly using it! I
suspect it's a leftover from the very early threading experiments.
This commit gets rid of this and simply passes the thread ID as the
argument to run_thread_poll_loop(), thus significantly simplifying the
few call places and removing the need to allocate and then free an
identity array.
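The call places thus boil down to something like this (simplified
sketch):

    /* pass the thread ID itself instead of a pointer to an index */
    for (i = 0; i < global.nbthread; i++)
        pthread_create(&threads[i], NULL, run_thread_poll_loop,
                       (void *)(uintptr_t)i);

    static void *run_thread_poll_loop(void *data)
    {
        int tid = (uintptr_t)data;
        /* ... per-thread initialization and polling loop ... */
    }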
When we initially experimented with threads and processes support, we
needed to implement arrays of threads per process for cpu-map, but this
is not needed anymore since we support either threads or processes.
Let's simply make the thread-based cpu-map per thread and not per
thread and per process since that's not used anymore. Doing so reduces
the global struct from 33kB to 1.5kB.
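The shape of the change is roughly the following (array names and bounds
illustrative):

    /* before: one CPU set per {process, thread} pair */
    unsigned long cpu_map[MAX_PROCS][MAX_THREADS];

    /* after: threads and processes are exclusive, one set per thread */
    unsigned long cpu_map_threads[MAX_THREADS];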
When using the "use_backend" configuration directive, the configuration
file name stored as rule->file was not freed in some situations. This
was introduced in commit 4ed1c95 ("MINOR: http/conf: store the
use_backend configuration file and line for logs").
This patch should be backported to 1.9, 1.8 and 1.7.
As by default we add all keepalive connections to the idle pool, if we run
into a pathological case where clients don't do keepalive but the server
does, and haproxy is configured to only reuse "safe" connections, we will
soon find ourselves with lots of idle connections that are unusable for new
sessions, while no file descriptors remain available to create new
connections.
To fix this, add 2 new global settings, "pool-low-fd-ratio" and
"pool-high-fd-ratio".
pool-low-fd-ratio is the % of fds we're allowed to use (against the maximum
number of fds available to haproxy) before we stop adding connections to the
idle pool, and destroy them instead. The default is 20. pool-high-fd-ratio is
the % of fds we're allowed to use (against the maximum number of fds available
to haproxy) before we start killing idle connections in the event we have to
create a new outgoing connection, and no reuse is possible. The default is 25.
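For example:

    global
        maxconn 20000
        pool-low-fd-ratio  20  # past 20% of FDs, stop storing idle conns
        pool-high-fd-ratio 25  # past 25%, kill idle conns to make room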
task_delete() was never used without calling task_free() just after, and
task_free() was only used on error paths to destroy a just-created task,
so merge them into task_destroy(), that will remove the task from the
wait queue, and make sure the task is either destroyed immediately if it's
not in the run queue, or destroyed when it's supposed to run.
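A simplified sketch of the merged semantics (helper names kept from the
description above, run-queue test simplified):

    void task_destroy(struct task *t)
    {
        task_delete(t);            /* unlink from the wait queue */
        if (!task_in_rq(t))
            task_free(t);          /* not in the run queue: free now */
        else
            t->process = NULL;     /* freed when its turn comes to run */
    }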
It's always a pain to get a core dump when enabling user/group setting
(which disables the dumpable flag on Linux), when using a chroot and/or
when haproxy is started by a service management tool which requires
complex operations to just raise the core dump limit.
This patch introduces a new "set-dumpable" global directive to work
around these troubles by doing the following :
- remove file size limits (equivalent of ulimit -f unlimited)
- remove core size limits (equivalent of ulimit -c unlimited)
- mark the process dumpable again (equivalent of suid_dumpable=1)
Some of these will depend on the operating system. This way it becomes
much easier to retrieve a core file. Temporarily moving the chroot to
a user-writable place is generally enough.
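On Linux this boils down to something like the following sketch (function
name hypothetical):

    #include <sys/prctl.h>
    #include <sys/resource.h>

    static void mark_dumpable(void)
    {
        struct rlimit unlimited = { RLIM_INFINITY, RLIM_INFINITY };

        setrlimit(RLIMIT_FSIZE, &unlimited); /* ulimit -f unlimited */
        setrlimit(RLIMIT_CORE,  &unlimited); /* ulimit -c unlimited */
    #if defined(PR_SET_DUMPABLE)
        prctl(PR_SET_DUMPABLE, 1, 0, 0, 0);  /* suid_dumpable=1 */
    #endif
    }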
Since the introduction of the options field, we can use it to store the
type of process.
type = 'm' is replaced by PROC_O_TYPE_MASTER
type = 'w' is replaced by PROC_O_TYPE_WORKER
type = 'e' is replaced by PROC_O_TYPE_PROG
The old values are still used in the HAPROXY_PROCESSES environment
variable to pass the information during a reload.
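The flags can be pictured as follows (values illustrative):

    #define PROC_O_TYPE_MASTER  0x00000001
    #define PROC_O_TYPE_WORKER  0x00000002
    #define PROC_O_TYPE_PROG    0x00000004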
Pavlos Parissis reported an interesting case where some map identifiers
were not assigned (appearing as -1 in show map). It turns out that it
only happens for log-format expressions parsed in check_config_validity()
that involve maps (log-format, use_backend, unique-id-header), as in the
sample configuration below :
    frontend foo
        bind :8001
        unique-id-format %[src,map(addr.lst)]
        log-format %[src,map(addr.lst)]
        use_backend %[src,map(addr.lst)]
The reason stems from the initial introduction of unique IDs in 1.5 via
commit af5a29d5f ("MINOR: pattern: Each pattern is identified by unique
id.") : the unique_id assignment was done before calling
check_config_validity() so all maps loaded after this call are not
properly configured. From what the function does, it seems they will not
be able to use a cache, will not have a unique_id assigned and will not
be updatable from the CLI.
This fix must be backported to all supported versions.
This patch implements the external binary support in the master worker.
To configure an external process, you need to use the program section,
for example:
    program dataplane-api
        command ./dataplane_api
Those processes are launched at the same time as the workers.
During a reload of HAProxy, those processes are dealing with the same
sequence as a worker:
- the master is re-executed
- the master sends a USR1 signal to the program
- the master launches a new instance of the program
During a stop, or restart, a SIGTERM is sent to the program.
The children variable was still used in haproxy, but it is not required
anymore since we have the information about the current workers in the
mworker_proc linked list.
The oldpids array is also replaced by this linked list when we
generate the arguments for the master re-exec.
The current initcall implementation relies on dedicated sections (one
section per init stage) to store the initcall descriptors. Then upon
startup, these sections are scanned from beginning to end and all items
found there are called in sequence.
On platforms like AIX or Cygwin it seems difficult to figure out the
beginning and end of sections as the linker doesn't seem to provide
the corresponding symbols. In order to replace this, this patch
simply implements an array of singly linked lists (one per init stage)
which are fed using constructors for each register call. These
constructors are declared static, with a name depending on their
line number in the file, in order to avoid name clashes. The final
effect is the same, except that the method is slightly more expensive
in that it explicitly produces code to register these initcalls :
    $ size haproxy.sections haproxy.constructor
       text    data     bss     dec    hex filename
    4060312  249176 1457652 5767140 57ffe4 haproxy.sections
    4062862  260408 1457652 5780922 5835ba haproxy.constructor
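A minimal sketch of the constructor-based registration (macro and type
names illustrative, mechanism as described above):

    struct initcall {
        void (*fn)(void *);
        void *arg;
        struct initcall *next;
    };

    /* one list head per init stage */
    extern struct initcall *init_list[];

    #define _IC_CONCAT(a, b) a##b
    #define IC_CONCAT(a, b)  _IC_CONCAT(a, b)

    /* the constructor's name embeds the line number to avoid clashes */
    #define REGISTER_INITCALL(stage, function, argument)              \
        static struct initcall IC_CONCAT(_ic_, __LINE__) =            \
            { (function), (argument), NULL };                         \
        __attribute__((constructor))                                  \
        static void IC_CONCAT(_ic_ctor_, __LINE__)(void)              \
        {                                                             \
            IC_CONCAT(_ic_, __LINE__).next = init_list[stage];        \
            init_list[stage] = &IC_CONCAT(_ic_, __LINE__);            \
        }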
This mechanism is enabled as an alternative to the default one when
build option USE_OBSOLETE_LINKER is set. This option is currently
enabled by default only on AIX and Cygwin, and may be attempted for
any target which fails to build complaining about missing symbols
__start_init_* and/or __stop_init_*.
Once confirmed as a reliable fix, this will likely have to be backported
to 1.9 where AIX and Cygwin do not build anymore.
A bug occurs when the sigchld handler is called while a child which is
not in the process list has just exited, or when the process list is
empty. In that case the child variable is left uninitialized or set to
the wrong child entry, which can lead to a free of this uninitialized
variable or of the wrong child.
This can lead to a crash of the master during a stop or a reload.
It is not supposed to happen with a worker which was created by the
master. A cause could be a fork made by a dependency. (openssl, lua ?)
This patch strengthens the handling of a missing child by doing the free
only if the child was found.
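The hardened logic can be sketched as follows (iteration simplified):

    struct mworker_proc *child = NULL, *it;

    list_for_each_entry(it, &proc_list, list) {
        if (it->pid == exited_pid) {
            child = it;
            break;
        }
    }
    if (child) {  /* free only a child we actually found */
        LIST_DEL(&child->list);
        free(child);
    }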
This patch must be backported to 1.9.
It's inconvenient not to know the status of default options, as this
requires the user to know what option is enabled by default in each
target. With this patch, a new "Features list" line is added to the
output of "haproxy -vv" to report the whole list of known features
with their respective status. They're prefixed with a "+" when enabled
or a "-" when disabled. The "USE_" prefix is removed for clarity.
It's never easy to guess what services are built in. We currently have
the prometheus exporter in contrib/ which is the only extension for now.
Let's enumerate all available ones just like we do for filters and pollers.
Each thread uses one epoll_fd or kqueue_fd, and a pipe (thus two FDs).
These ones have to be accounted for in the maxsock calculation, otherwise
we can reach maxsock before maxconn. This is difficult to observe but it
in fact happens when a server connects back to the frontend and has checks
enabled : the check uses its FD and serves to fill the loop. In this case
all FDs planned for the datapath are used for this.
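For illustration, the accounting amounts to something like this (3 FDs
per thread, so with 64 threads that is 192 FDs which must not be taken
from the connection budget):

    /* 1 poller FD (epoll/kqueue) + 2 pipe FDs per thread */
    global.maxsock += global.nbthread * 3;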
This needs to be backported to 1.9 and 1.8.
Some packages used to rely on DEFAULT_MAXCONN to set the default global
maxconn value to use regardless of the initial ulimit. The recent changes
set the lowest bound to 100 so that it is compatible with almost any
environment. Now that DEFAULT_MAXCONN is not needed for anything else, we
can use it as the lowest bound applied when maxconn is not configured. This
way it retains its original purpose of setting the default maxconn value
even though most of the time the effective value will be higher thanks to
the automatic computation based on "ulimit -n".
This entry was still set to 2000 but not used anymore. The only places
where it appeared were as an alias to SYSTEM_MAXCONN which forces it, so
let's turn these ones to SYSTEM_MAXCONN and remove the default value for
DEFAULT_MAXCONN. SYSTEM_MAXCONN still defines the upper bound however.
The global maxconn value is often a pain to configure :
- in development the user never has the permissions to increase the
rlim_cur value too high and gets warnings all the time ;
- in some production environments, users may have limited actions on
it or may only be able to act on rlim_fd_cur using ulimit -n. This
is sometimes particularly true in containers or whatever environment
where the user has no privilege to upgrade the limits.
- keeping config homogeneous between machines is even less easy.
We already had the ability to automatically compute maxconn from the
memory limits when they were set. This patch goes a bit further by also
computing the limit permitted by the configured limit on the number of
FDs. For this it simply reverses the rlim_fd_cur calculation to determine
maxconn based on the number of reserved sockets for listeners & checks,
the number of SSL engines and the number of pipes (absolute or relative).
This way it becomes possible to make maxconn always be the highest possible
value resulting in maxsock matching what was set using "ulimit -n", without
ever setting it. Note that we adjust to the soft limit, not the hard one,
since it's what is configured with ulimit -n. This allows users to also
limit to low values if needed.
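The reversed computation can be sketched as follows (simplified: variable
names are hypothetical and the real code also accounts for SSL engines and
absolute pipe counts, as described above):

    /* usual direction: maxsock = maxconn * 2 + reserved FDs;
     * reversed: derive the largest maxconn fitting in the FD limit */
    int reserved = listener_fds + check_fds + other_reserved_fds;
    int maxconn  = (rlim_fd_cur - reserved) / 2; /* 2 FDs per connection */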
Just like before, the calculated value is reported in verbose mode.
We'll need to know the global maxsock before the maxconn calculation.
Actually only two components were calculated too late, the peers FD
and the stats FD. Let's move them a few lines upward.
The default number of pipes is adjusted based on the sum of frontends
and backends maxconn/fullconn settings. Now that it is possible to have
a null maxconn on a frontend to indicate "unlimited" with commit
c8d5b95e6 ("MEDIUM: config: don't enforce a low frontend maxconn value
anymore"), the sum of maxconn may remain low and limited to the only
frontends/backends where this limit is set.
This patch considers this new unlimited case when doing the check, and
automatically switches to the default value which is maxconn/4 in this
case. All the calculation was moved to a distinct function for ease of
use. This function also supports returning unlimited (-1) when the
value depends on global.maxconn and this latter is not yet set.
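The function can be sketched like this (helper name hypothetical):

    static int compute_default_pipes(void)
    {
        int sum = sum_of_maxconn_and_fullconn(); /* hypothetical helper */

        if (!sum) {                  /* everything is "unlimited" */
            if (global.maxconn <= 0)
                return -1;           /* global maxconn not known yet */
            sum = global.maxconn;
        }
        return sum / 4;              /* the default value */
    }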
When the master re-execs itself on reload, it doesn't restore the initial
rlim_fd_cur/rlim_fd_max values, which have been modified by the ulimit-n
or global maxconn directives. This is a problem, because if these values
were set really low it could prevent the process from restarting, and if
they were set very high, this could have some implications on the restart
time, or later on the computed maxconn.
Let's simply reset these values to the ones we had at boot to maintain
the system in a consistent state.
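The reset can be sketched as follows (the *_at_boot variables come from
the "keep a copy of the initial rlim_fd_cur and rlim_fd_max values" patch
below):

    struct rlimit limit = {
        .rlim_cur = rlim_fd_cur_at_boot,  /* values saved at startup */
        .rlim_max = rlim_fd_max_at_boot,
    };

    if (setrlimit(RLIMIT_NOFILE, &limit) == -1)
        ha_warning("Failed to restore FD limits before re-exec\n");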
A backport could be performed to 1.9 and maybe 1.8. This patch depends on
the two previous ones.
If a ulimit-n value is set, we must not lower the rlim_max value if the
new value is lower, we must only adjust the rlim_cur one. The effect is
that on very low values, this could prevent a master-worker reload, or
make an external check fail by lack of FDs.
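The corrected adjustment looks roughly like this:

    struct rlimit limit;

    getrlimit(RLIMIT_NOFILE, &limit);
    limit.rlim_cur = global.rlimit_nofile;  /* the configured ulimit-n */
    if (limit.rlim_max < limit.rlim_cur)
        limit.rlim_max = limit.rlim_cur;    /* only ever raise rlim_max */
    setrlimit(RLIMIT_NOFILE, &limit);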
This may be backported to 1.9 and earlier, but it depends on this patch
"MINOR: global: keep a copy of the initial rlim_fd_cur and rlim_fd_max
values".
Let's keep a copy of these initial values. They will be useful to
compute automatic maxconn, as well as to restore proper limits when
doing an execve() on external checks.
Historically the default frontend's maxconn used to be quite low (2000),
which was sufficient two decades ago but often proved to be a problem
when users had purposely set the global maxconn value but forgot to set
the frontend's.
There is no point in keeping this arbitrary limit for frontends : when
the global maxconn is lower, it's already too high and when the global
maxconn is much higher, it becomes a limiting factor which causes trouble
in production.
This commit allows the value to be set to zero, which becomes the new
default value, to mean it's not directly limited, or in fact it's set
to the global maxconn. Since this operation used to be performed before
computing a possibly automatic global maxconn based on memory limits,
the calculation of the maxconn value and its propagation to the backends'
fullconn has now moved to a dedicated function, proxy_adjust_all_maxconn(),
which is called once the global maxconn is stabilized.
This comes with two benefits :
1) a configuration missing "maxconn" in the defaults section will not
limit itself to a magically hardcoded value but will scale up to the
global maxconn ;
2) when the global maxconn is not set and memory limits are used instead,
the frontends' maxconn automatically adapts, and the backends' fullconn
as well.
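In practice this means a configuration such as the following now behaves
as expected:

    global
        # maxconn may be set explicitly, or computed from the memory
        # limits or "ulimit -n" as described in the previous patches

    frontend www
        bind :80
        # no maxconn here: it defaults to 0, meaning "bounded by the
        # global maxconn" instead of a hardcoded 2000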