First of all, all legacy HTTP analyzers and all functions exclusively used
by them were removed, so most of the functions in proto_http.{c,h} are
gone. Only the functions dealing with the HTTP transaction have been kept.
Then, the http_msg and hdr_idx modules were entirely removed, and finally
the http_msg structure was stripped of all the information it only carried
for the legacy HTTP code. The hdr_ctx structure was also removed because
it is now unused, just like the unused states in the enum h1_state. Note
that the memory pool "hdr_idx" was removed and "http_txn" is now smaller.
Since commit 81492c989 ("MINOR: threads: flatten the per-thread cpu-map"),
we don't keep the proc*thread matrix anymore to represent the full binding
possibilities, but only the proc and thread ones. The problem is that the
per-process binding is not the same for each thread and for the process,
and the proc[] array was assumed to store the per-proc first thread value
when doing this change. Worse, the logic present there tries to deal with
thread ranges and process ranges in a way which automatically excludes the
other possibility (since ranges cannot be used on both) but as such fails
to apply changes if neither the process nor the thread is expressed as a
range.
The real problem comes from the fact that specifying cpu-map 1/1 doesn't
yet reveal if the per-process mask or the per-thread mask needs to be
updated. In practice it's the thread one, but the current storage then
doesn't allow storing the binding of the first thread of every other
process in nbproc>1 configurations.
When removing the proc*thread matrix, what ought to have been kept was
both the thread column for process 1 and the process line for thread 1,
but instead only the thread column was kept. This patch reintroduces the
storage of the configuration for the first thread of each process so that
it is again possible to store either the per-thread or per-process
configuration.
As a partial workaround for existing configurations, it is possible to
systematically indicate at least two processes or two threads at once
and map them by pairs or more so that at least two values are present
in the range. E.g :
   # set processes 1-4 to cpus 0-3 :
   cpu-map auto:1-4/1 0 1 2 3
   # or :
   cpu-map 1-2/1 0 1
   cpu-map 3-4/1 2 3

   # set threads 1-4 to cpus 0-3 :
   cpu-map auto:1/1-4 0 1 2 3
   # or :
   cpu-map 1/1-2 0 1
   cpu-map 1/3-4 2 3
This fix must be backported to 2.0.
Before switching to wait mode, the per-thread deinit must not be called,
because we didn't initialize the threads nor the fdtab.
The problem is that the master could crash if we try to reload HAProxy.
The commit 944e619 ("MEDIUM: mworker: wait mode use standard init code
path") removed the deinit code by accident, but its fix 7c756a8
("BUG/MEDIUM: mworker: fix FD leak upon reload") was incomplete and did
not take care of the WAIT_MODE.
This fix must be backported to 1.9 and 2.0.
getpid() is documented as returning a pid_t result, not necessarily an
int. This causes a build warning on Solaris 10 because '%d' or '%u' are
used in the format passed to snprintf().
Let's just cast the result as an int (respectively unsigned int).
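A minimal sketch of the kind of change involved (the buffer and format
are illustrative only) :

    #include <stdio.h>
    #include <unistd.h>

    static void write_pid(char *buf, size_t len)
    {
        /* pid_t's width is platform-defined, so cast explicitly to
         * match the "%d" conversion specifier */
        snprintf(buf, len, "%d", (int)getpid());
    }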
This can be backported to 2.0 and possibly older versions though
it really has no impact.
It's really confusing to call it a task because it's a tasklet and used
in places where tasks and tasklets are used together. Let's rename it
to tasklet to remove this confusion.
PiBa-NL found some pathological cases where starting threads can hinder
each other and cause a measurable slowdown. This problem is reproducible
with the following config (haproxy must be built with -DDEBUG_DEV) :
   global
       stats socket /tmp/sock1 mode 666 level admin
       nbthread 64

   backend stopme
       timeout server 1s
       option tcp-check
       tcp-check send "debug dev exit\n"
       server cli unix@/tmp/sock1 check
This will cause the process to be stopped once the checks are ready to
start. Binding all these to just a few cores magnifies the problem.
Starting them in loops shows a significant time difference among the
commits :
# before startup serialization
$ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy-e186161 -db -f slow-init.cfg >/dev/null 2>&1; done
real 0m1.581s
user 0m0.621s
sys 0m5.339s
# after startup serialization
$ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy-e4d7c9dd -db -f slow-init.cfg >/dev/null 2>&1; done
real 0m2.366s
user 0m0.894s
sys 0m8.238s
In order to address this, let's use plain mutexes and cond_wait during
the init phase. With this done, waiting threads now sleep and the problem
completely disappeared :
$ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy -db -f slow-init.cfg >/dev/null 2>&1; done
real 0m0.161s
user 0m0.079s
sys 0m0.149s
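A minimal sketch of the idea, using an illustrative mutex/condition
pair (the names are hypothetical, not the exact ones from the patch) :

    #include <pthread.h>

    static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  init_cond  = PTHREAD_COND_INITIALIZER;
    static int next_starting_tid; /* thread whose turn it is to start */

    /* called by each thread before running its init code */
    static void wait_for_my_turn(int tid)
    {
        pthread_mutex_lock(&init_mutex);
        while (next_starting_tid != tid)
            pthread_cond_wait(&init_cond, &init_mutex); /* sleeps */
        pthread_mutex_unlock(&init_mutex);
    }

    /* called by each thread once its init code is done */
    static void allow_next_thread(void)
    {
        pthread_mutex_lock(&init_mutex);
        next_starting_tid++;
        pthread_cond_broadcast(&init_cond);
        pthread_mutex_unlock(&init_mutex);
    }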
There's no point in calling this on each and every thread since the first
thread passing there will enable the listeners, and the next ones will
simply scan all of them in turn to discover that they are already
initialized. Let's only initialize them on the first thread. This could
slightly speed up startup on very large configurations, even though most
of the time is still spent in the main thread binding the sockets.
A few measurements have constantly shown that this decreases the startup
time by ~0.1s for 150k listeners. Starting all of them in parallel doesn't
provide better results and can still expose some undesired races.
Since commit 6ec902a ("MINOR: threads: serialize threads initialization")
we now serialize threads initialization. But doing so has emphasized another
race which is that some threads may actually start the loop before others
are done initializing.
As soon as all threads enter the first thread_release() call, their rdv
bit is cleared and they're all waiting for all others' rdv to be cleared
as well, with their harmless bit set. The first one to notice the cleared
mask will progress through thread_isolate(), take rdv again preventing
most others from noticing its short pass to zero, and this first one will
be able to run all the way through the initialization till the last call
to thread_release() which it happily crosses, being the only one with the
rdv bit, leaving the room for one or a few others to do the same. This
results in some threads entering the loop before others are done with
their initialization, which is particularly bad. PiBa-NL reported that
some regtests fail for him due to this (which was impossible to reproduce
here, but races are racy by definition). However placing some printf()
in the initialization code definitely shows this unsynchronized startup.
This patch takes a different approach in three steps :
- first, we don't start with thread_release() anymore and we don't
set the rdv mask anymore in the main call. This was initially done
to let all threads start together, which we don't want. Instead
we just start with thread_isolate(). Since all threads are harmless
by default, they all wait for each other's readiness before starting.
- second, we don't release with thread_release() but with
thread_sync_release(), meaning that we don't leave the function until
other ones have reached the point in the function where they decide
to leave it as well.
- third, it makes sure we don't start the listeners using
protocol_enable_all() before all threads have allocated their local
FD tables or have initialized their pollers, otherwise startup could
be racy as well. It's worth noting that it is even possible to limit
this call to thread #0 as it only needs to be performed once.
This now guarantees that all thread init calls start only after all threads
are ready, and that no thread enters the polling loop before all others have
completed their initialization.
Please check GH issues #111 and #117 for more context.
No backport is needed, though if some new init races are reported in
1.9 (or even 1.8) which do not affect 2.0, then it may make sense to
carefully backport this small series.
There is no point in initializing threads in parallel when we know that
it's the moment where some global variables are turned to thread-local
ones, and/or that some global variables are updated (like global_now or
trash_size). Some FDs might be created/destroyed/reallocated and could
be tricky to follow as well (think about epoll_fd for example).
Instead of having to be extremely careful about all these, and to trigger
false positives in thread sanitizers, let's simply initialize one thread
at a time. The init step is very fast so nobody should even notice, and
we won't have any more doubts about what might have happened when
analysing a dump.
See GH issues #111 and #117 for some background on this.
As reported in GH issue #99, when hard-stop-after triggers and threads
are in use, the chance that any thread releases the resources in use by
the other ones is non-null. Thus no thread should be allowed to deinit()
nor exit by itself.
Here we take a different approach. We simply use a 3rd possible value
for the "killed" variable so that all threads know they must break out
of the run-poll-loop and immediately stop.
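Schematically, the resulting loop exit looks like this (a sketch; the
variable is the one described above, the values are illustrative) :

    /* killed: 0 = running, 1 = soft stop requested, 2 = hard stop:
     * every thread must leave the polling loop at once without ever
     * calling deinit() or exit() by itself. */
    static volatile int killed;

    static void run_poll_loop(void)
    {
        for (;;) {
            /* ... run tasks and poll for I/O events ... */
            if (killed > 1)
                break; /* hard-stop-after fired somewhere */
        }
    }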
This patch was tested by commenting the stream_shutdown() calls in
hard_stop() to increase the chances to see a stream use released
resources. With this fix applied, it never crashes anymore.
This fix should be backported to 1.9 and 1.8.
Remove the active_tasks_mask variable; we can deduce whether we have
work to do by other means, and it is costly to maintain. Instead,
introduce a new function, thread_has_tasks(), that returns non-zero if
there are tasks scheduled for the thread, zero otherwise.
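A sketch of what such a predicate may look like (the field and variable
names here are illustrative, not the exact ones from the patch) :

    static inline int thread_has_tasks(void)
    {
        return !!(global_tasks_mask & tid_bit) ||        /* shared runqueue */
               (task_per_thread[tid].rqueue_size > 0) || /* local runqueue */
               !LIST_ISEMPTY(&task_per_thread[tid].task_list); /* tasklets */
    }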
We have been abusing the do_poll()'s timeout for a while, making it zero
whenever there is some known activity. The problem this poses is that it
complicates activity diagnostic by incrementing the poll_exp field for
each known activity. It also requires extra computations that could be
avoided.
This change passes a "wake" argument to say that the poller must not
sleep. This simplifies the operations and allows one to differentiate
expirations from activity.
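Schematically, for the epoll-based poller (names simplified to keep the
sketch self-contained) :

    #include <sys/epoll.h>

    #define MAX_EVENTS 200

    /* illustrative globals standing in for the poller's real state */
    static int epoll_fd;
    static struct epoll_event events[MAX_EVENTS];

    /* next_exp_ms is derived from the next expiration date; wake=1
     * means some work is already pending so we must not sleep */
    static void do_poll_sketch(int next_exp_ms, int wake)
    {
        int wait_time = wake ? 0 : next_exp_ms;

        epoll_wait(epoll_fd, events, MAX_EVENTS, wait_time);
    }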
We still have quite a number of build macros which are mapped 1:1 to a
USE_something setting in the makefile but which have a different name.
This patch cleans this up by renaming them to use the USE_something
one, allowing to clean up the makefile and make it more obvious when
reading the code what build option needs to be added.
The following renames were done :
ENABLE_POLL -> USE_POLL
ENABLE_EPOLL -> USE_EPOLL
ENABLE_KQUEUE -> USE_KQUEUE
ENABLE_EVPORTS -> USE_EVPORTS
TPROXY -> USE_TPROXY
NETFILTER -> USE_NETFILTER
NEED_CRYPT_H -> USE_CRYPT_H
CONFIG_HAP_CRYPT -> USE_LIBCRYPT
CONFIG_HAP_NS -> USE_NS
CONFIG_HAP_LINUX_SPLICE -> USE_LINUX_SPLICE
CONFIG_HAP_LINUX_TPROXY -> USE_LINUX_TPROXY
CONFIG_HAP_LINUX_VSYSCALL -> USE_LINUX_VSYSCALL
We currently have the ability to register functions to be called early
on thread creation and at thread deinitialization. It turns out this is
not sufficient because certain such functions may use resources that are
being allocated by the other ones, thus creating a race condition depending
only on the linking order. For example the mworker needs to register a
file descriptor while the pollers will reallocate the fd_updt[] array.
Similarly logs and trashes may be used by some init functions while it's
unclear whether they have been deduplicated.
The same issue happens on deinit: if the fd_updt[] or trash is released
before some functions are done using them, we'll get into trouble.
This patch creates a couple of early and late callbacks for per-thread
allocation/freeing of resources. A few init functions were moved there,
and the fd init code was split between the two (since it used to both
allocate and initialize at once). This way the init/deinit sequence is
expected to be safe now.
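A sketch of how a subsystem may use these hooks (assuming registration
macros of the kind this patch introduces; tid, MAX_THREADS and the
buffer itself are illustrative) :

    #include <stdlib.h>

    static char *my_buf[MAX_THREADS];

    /* early: allocate per-thread resources before any init callback */
    static int alloc_my_buf_per_thread(void)
    {
        my_buf[tid] = malloc(16384);
        return my_buf[tid] != NULL; /* non-zero means success */
    }

    /* late: release them only after all deinit callbacks have run */
    static void free_my_buf_per_thread(void)
    {
        free(my_buf[tid]);
        my_buf[tid] = NULL;
    }

    REGISTER_PER_THREAD_ALLOC(alloc_my_buf_per_thread);
    REGISTER_PER_THREAD_FREE(free_my_buf_per_thread);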
This patch should be backported to 1.9 as at least the trash/log issue
seems to be present. The run_thread_poll_loop() code is a bit different
there as the mworker is not a callback, but it will have no effect and
it's enough to drop the mworker changes.
This bug was reported by Ilya Shipitsin in github issue #104.
Commit 5a6e2245f ("REORG: threads: move the struct thread_info from
global.h to hathreads.h") didn't hold its promise well, as the thread_info
struct was still declared and initialized in haproxy.c in addition to being
in hathreads.c. Let's move it for real now.
The struct mworker_proc is not uniformly freed everywhere, sometimes leading
to leaks of the `id` string (and possibly the other strings).
Introduce a mworker_free_child function instead of duplicating the
freeing logic everywhere, to prevent this kind of issue.
This leak was reported in issue #96.
It looks like the leaks have been introduced in commit 9a1ee7ac31,
which is specific to 2.0-dev. Backporting `mworker_free_child` might be
helpful to ease backporting other fixes, though.
The clock_gettime() man page says we must check that _POSIX_TIMERS is
defined to a value greater than zero, not just that it's simply defined,
so let's fix this right now.
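The correct guard therefore looks like this (a sketch with an
illustrative fallback) :

    #include <sys/time.h>
    #include <time.h>
    #include <unistd.h>

    #if defined(_POSIX_TIMERS) && (_POSIX_TIMERS > 0)
    static void get_now(struct timespec *ts)
    {
        clock_gettime(CLOCK_MONOTONIC, ts);
    }
    #else
    static void get_now(struct timespec *ts)
    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        ts->tv_sec  = tv.tv_sec;
        ts->tv_nsec = tv.tv_usec * 1000;
    }
    #endif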
Event ports are the Solaris equivalent of the kqueue/epoll polling
classes. The code is based
on https://github.com/joyent/haproxy-1.8/tree/joyent/dev-v1.8.8.
Event ports are available only on SunOS systems derived from
Solaris 10 and later (including illumos systems).
I took extreme care to always check for _POSIX_THREAD_CPUTIME before
manipulating clock_id, except at one place (run_thread_poll_loop) as
found by Manu, breaking Solaris. Now fixed, no backport needed.
Since we're likely to access this thread_info struct more frequently in
the future, let's reserve the thread-local symbol to access it directly
and avoid always having to combine thread_info and tid. This pointer is
set when tid is set.
This is the per-thread CPU runtime clock, it will be used to measure
the CPU usage of each thread and by the lockup detection mechanism. It
must only be retrieved at the beginning of run_thread_poll_loop() since
the thread must already have been started for this. But it must be done
before performing any per-thread initcall so that all thread init
functions have access to the clock ID.
Note that it could make sense to always have this clockid available even
in non-threaded situations and place the process' clock there instead.
But it would add portability issues which are currently easy to deal
with by disabling threads so it may not be worth it for now.
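A hedged sketch of how the clock ID is retrieved (it must run inside
the target thread, which is why it happens at the top of
run_thread_poll_loop(); storage is simplified here) :

    #include <pthread.h>
    #include <time.h>

    static clockid_t ti_clock_id; /* per-thread in the real code */

    static void retrieve_cpu_clock(void)
    {
    #ifdef _POSIX_THREAD_CPUTIME
        pthread_getcpuclockid(pthread_self(), &ti_clock_id);
    #endif
    }

    /* sampling this thread's CPU runtime later on : */
    static long long cpu_ns_now(void)
    {
        struct timespec ts = { 0, 0 };

        clock_gettime(ti_clock_id, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }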
This way we'll be able to store more per-thread information than just
the pthread pointer. The storage became an array of struct instead of
an allocated array since it's very small (typically 512 bytes) and not
worth the hassle of dealing with memory allocation on this. The array
was also renamed thread_info to make its intended usage more explicit.
Now that we have the guarantee that init calls happen before any other
thread starts, we don't need anymore the workaround installed by commit
1605c7ae6 ("BUG/MEDIUM: threads/mworker: fix a race on startup") and we
can instead rely on a regular per-thread initcall for this function. It
will only be performed on worker thread #0, the other ones and the master
have nothing to do, just like in the original code that was only moved
to the function.
It's a bit dangerous to let threads initialize at different speeds on
startup. Some are still in their init functions while others are already
running. It was even subject to some race condition bugs like the one
fixed by commit 1605c7ae6 ("BUG/MEDIUM: threads/mworker: fix a race on
startup").
Here in order to secure all this, we take a very simplistic approach
consisting in using half of the rendez-vous point, which is made
exactly for this purpose : we first initialize the mask of the threads
requesting a rendez-vous to the mask of all threads, and we simply call
thread_release() once the init is complete. This guarantees that no
thread will go further than the initialization code during this time.
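Schematically (using the rendez-vous point's existing symbols; the
helper functions are hypothetical and only mirror the description
above) :

    static void prepare_threads_startup(void)
    {
        /* before creating the threads: pretend every thread requested
         * a rendez-vous so none can go past the sync point */
        threads_want_rdv_mask = all_threads_mask;
        create_all_threads(); /* hypothetical */
    }

    static void per_thread_startup(void)
    {
        run_per_thread_init(); /* hypothetical */
        /* drop our rdv bit; threads only proceed once all of them
         * have reached this point */
        thread_release();
    }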
This could even safely be backported if any other issue related to an
init race was discovered in a stable release.
It's always a pain to have to stuff lots of #ifdef USE_OPENSSL around
ssl headers; it even results in some of them appearing in a random order
and multiple times just to benefit from an existing ifdef block. Let's
make these headers safe for inclusion when USE_OPENSSL is not defined:
they now perform the test themselves and do nothing if USE_OPENSSL is
not defined. This makes it possible to remove no less than 8 such ifdef
blocks and makes include blocks more readable.
Since we're providing a compatibility layer for multiple OpenSSL
implementations and their derivatives, it is important that no C file
directly includes openssl headers but only passes via openssl-compat
instead. As a bonus this also gets rid of redundant complex rules for
inclusion of certain files (engines etc).
They were all checked to comply with the advertised openssl version. Now
that libressl doesn't pretend to be a more recent openssl anymore, we
can simply rely on the regular openssl version tests without having to
deal with exceptions for libressl.
Most tests on OPENSSL_VERSION_NUMBER have become complex and break all
the time because this number is fake for some derivatives like LibreSSL.
This patch creates a new macro, HA_OPENSSL_VERSION_NUMBER, which will
carry the real openssl version defining the compatibility level, and
this version will be adjusted depending on the variants.
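The idea, sketched below; the LibreSSL mapping shown here illustrates
the compatibility level rather than the literal value :

    #include <openssl/opensslv.h>

    #if defined(LIBRESSL_VERSION_NUMBER)
    /* LibreSSL reports itself as a recent OpenSSL but is mostly
     * API-compatible with 1.0.1, so pin the compatibility level */
    #define HA_OPENSSL_VERSION_NUMBER 0x1000107fL
    #else
    #define HA_OPENSSL_VERSION_NUMBER OPENSSL_VERSION_NUMBER
    #endif

    /* version tests then use the adjusted macro : */
    #if HA_OPENSSL_VERSION_NUMBER >= 0x1010000fL
    /* 1.1.0+ code paths */
    #endif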
As with every single OpenSSL fix, LibreSSL build broke again, this time
after commit 56996dabe ("BUG/MINOR: mworker/ssl: close OpenSSL FDs on
reload"). A definitive solution will have to be found quickly. For now,
let's exclude libressl from the version test.
This patch must be backported to 1.9 since the fix above was already
backported there.
This patch implements a new global parameter for the master-worker mode.
When the mworker-max-reloads value is set, a worker receives a SIGTERM
if its number of reloads is greater than this value.
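For example, to have workers killed once they have survived more than
five reloads :

    global
        master-worker
        mworker-max-reloads 5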
Since previous commit it's not needed anymore to test a task pointer
before calling task_destroy() so let's just remove these tests from
the various callers before they become confusing. The function's
arguments were also documented. The same should probably be done
with tasklet_free() which involves a test in roughly half of the
call places.
In commit 1b8e68e ("MEDIUM: stick-table: Stop handling stick-tables as
proxies."), the ->table member of proxy struct was replaced by a pointer
that is not always checked and in some situations can cause a segfault,
e.g. during reload or while using "show table" on the CLI socket.
No backport is needed.
From OpenSSL 1.1.1, the default behaviour is to maintain open FDs to any
random devices that get used by the random number library. As a result,
those FDs leak when the master re-execs on reload; since those FDs are
not marked FD_CLOEXEC or O_CLOEXEC, they also get inherited by children.
Eventually both master and children run out of FDs.
OpenSSL 1.1.1 introduces a new function to control whether the random
devices are kept open. When clearing the keep-open flag, it also closes
any currently open FDs, so it can be used to clean-up open FDs too.
Therefore, a call to this function is made in mworker_reload prior to
re-exec.
The call is guarded by whether SSL is in use, because it will cause
initialisation of the OpenSSL random number library if that has not
already been done.
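Schematically, the call in question (guards simplified to the version
test described above) :

    #include <openssl/rand.h>

    #if defined(USE_OPENSSL) && (HA_OPENSSL_VERSION_NUMBER >= 0x10101000L)
        /* close the random-device FDs now and don't keep them open,
         * so that they are not inherited across the re-exec */
        RAND_keep_random_devices_open(0);
    #endif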
This should be backported to 1.9 and 1.8.
Now regex_comp() allocates the my_regex struct itself and compiles the
regex, or frees both in case of failure. The pointer to the allocated
my_regex struct is returned directly. The my_regex* argument to
regex_comp() is removed.
Function regex_free() was modified so that it systematically frees the
my_regex entry. The function does nothing when called with a NULL as
argument (like free()). This avoids the existing risk of not properly
freeing the initialized area.
Other structures are also updated in order to be compatible (the ones
related to Lua and action rules).
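A usage sketch under the new API (the argument meanings are assumed
from the description above, and report_error() is hypothetical) :

    char *err = NULL;
    struct my_regex *re;

    /* allocation and compilation in one call; NULL on failure */
    re = regex_comp("^foo", 1 /* case sensitive */, 1 /* captures */, &err);
    if (!re)
        report_error(err); /* hypothetical error handler */

    regex_free(re); /* frees everything; safe on NULL, like free() */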
This patch adds the support for the "table" line parsing in "peers" sections
to declare stick-tables in such sections. This also prevents the user
from having to declare dummy backend sections with a unique stick-table
inside.
Even if still supported, this usage will become deprecated.
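For example, a sketch of the new syntax; such a table is then
referenced with the peers section's name as a prefix :

    peers mypeers
        peer hap1 192.168.0.1:1024
        peer hap2 192.168.0.2:1024
        table t1 type ip size 1m store conn_cur

    backend app
        stick on src table mypeers/t1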
To do so, the ->table member of the proxy struct, which was a stktable
struct, is replaced by a pointer to a stktable struct allocated at
parsing time in src/cfgparse-listen.c for the dummy stick-table backends
and in src/cfgparse.c for "peers" sections.
This has an impact on the code for stick-table sample converters and on the stickiness
rules parsers which first store the name of the dummy before resolving the rules.
This patch replaces proxy_tbl_by_name() calls with stktable_find_by_name()
calls to look up stick-tables stored in the "stktable_by_name" ebtree at
parsing time.
There is only one remaining place where proxy_tbl_by_name() is used:
src/hlua.c.
At several places in the code we relied on the fact that the ->size
member of a stick-table was equal to zero to consider the stick-table
present but not configured. This does not make sense anymore as the
->table member of struct proxy is from now on a pointer. These tests
are replaced by a test on the ->table value itself.
In "peers" section we do not have to temporary store the name of the section the
stick-table are attached to because this name is obviously already known just after
having entered this "peers" section.
About the CLI stick-table I/O handler, the pointer to proxy struct is replaced by
a pointer to a stktable struct.
Currently the thread array is a local variable inside a function block
and there is no access to it from outside, which often complicates
debugging. Let's make it global and export it. Also the allocation
return is now checked.
It's still obscure how we managed to initialize an array of integers
with values always equal to the index, just to retrieve the value
from an opaque pointer to the index instead of directly using it! I
suspect it's a leftover from the very early threading experiments.
This commit gets rid of this and simply passes the thread ID as the
argument to run_thread_poll_loop(), thus significantly simplifying the
few call places and removing the need to allocate then free an array
of identity.
When we initially experimented with threads and processes support, we
needed to implement arrays of threads per process for cpu-map, but this
is not needed anymore since we support either threads or processes.
Let's simply make the thread-based cpu-map per thread and not per
thread and per process since that's not used anymore. Doing so reduces
the global struct from 33kB to 1.5kB.
When using the "use_backend" configuration directive, the configuration
file name stored as rule->file was not freed in some situations. This
was introduced in commit 4ed1c95 ("MINOR: http/conf: store the
use_backend configuration file and line for logs").
This patch should be backported to 1.9, 1.8 and 1.7.
As by default we add all keepalive connections to the idle pool, if we
run into a pathological case where no client does keepalive but the
server does, and haproxy is configured to only reuse "safe" connections,
we will soon find ourselves with lots of idle connections that are
unusable for new sessions, while no file descriptors are left to create
new connections.
To fix this, add 2 new global settings, "pool-low-fd-ratio" and
"pool-high-fd-ratio".
pool-low-fd-ratio is the % of fds we're allowed to use (against the maximum
number of fds available to haproxy) before we stop adding connections to the
idle pool, and destroy them instead. The default is 20. pool-high-fd-ratio is
the % of fds we're allowed to use (against the maximum number of fds available
to haproxy) before we start killing idling connection in the event we have to
create a new outgoing connection, and no reuse is possible. The default is 25.
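As a configuration sketch (using the tune.* spelling under which these
settings appear in the global section) :

    global
        tune.pool-low-fd-ratio  20
        tune.pool-high-fd-ratio 25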
task_delete() was never used without calling task_free() just after, and
task_free() was only used on error paths to destroy a just-created task,
so merge them into task_destroy(), that will remove the task from the
wait queue, and make sure the task is either destroyed immediately if it's
not in the run queue, or destroyed when it's supposed to run.
It's always a pain to get a core dump when enabling user/group setting
(which disables the dumpable flag on Linux), when using a chroot and/or
when haproxy is started by a service management tool which requires
complex operations to just raise the core dump limit.
This patch introduces a new "set-dumpable" global directive to work
around these troubles by doing the following :
- remove file size limits (equivalent of ulimit -f unlimited)
- remove core size limits (equivalent of ulimit -c unlimited)
- mark the process dumpable again (equivalent of suid_dumpable=1)
Some of these will depend on the operating system. This way it becomes
much easier to retrieve a core file. Temporarily moving the chroot to
a user-writable place is generally enough.
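Usage is a single global directive :

    global
        set-dumpable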
Since the introduction of the options field, we can use it to store the
type of process.
type = 'm' is replaced by PROC_O_TYPE_MASTER
type = 'w' is replaced by PROC_O_TYPE_WORKER
type = 'e' is replaced by PROC_O_TYPE_PROG
The old values are still used in the HAPROXY_PROCESSES environment
variable to pass the information during a reload.
Pavlos Parissis reported an interesting case where some map identifiers
were not assigned (appearing as -1 in show map). It turns out that it
only happens for log-format expressions parsed in check_config_validity()
that involve maps (log-format, use_backend, unique-id-header), as in the
sample configuration below :
   frontend foo
       bind :8001
       unique-id-format %[src,map(addr.lst)]
       log-format %[src,map(addr.lst)]
       use_backend %[src,map(addr.lst)]
The reason stems from the initial introduction of unique IDs in 1.5 via
commit af5a29d5f ("MINOR: pattern: Each pattern is identified by unique
id.") : the unique_id assignment was done before calling
check_config_validity() so all maps loaded after this call are not
properly configured. From what the function does, it seems they will not
be able to use a cache, will not have a unique_id assigned and will not
be updatable from the CLI.
This fix must be backported to all supported versions.
This patch implements the external binary support in the master worker.
To configure an external process, you need to use the program section,
for example:
   program dataplane-api
       command ./dataplane_api
Those processes are launched at the same time as the workers.
During a reload of HAProxy, those processes are dealing with the same
sequence as a worker:
- the master is re-executed
- the master sends a USR1 signal to the program
- the master launches a new instance of the program
During a stop, or restart, a SIGTERM is sent to the program.
The children variable was still used in haproxy, but it is not required
anymore since we have the information about the current workers in the
mworker_proc linked list.
The oldpids array is also replaced by this linked list when generating
the arguments for the master reexec.
The current initcall implementation relies on dedicated sections (one
section per init stage) to store the initcall descriptors. Then upon
startup, these sections are scanned from beginning to end and all items
found there are called in sequence.
On platforms like AIX or Cygwin it seems difficult to figure the
beginning and end of sections as the linker doesn't seem to provide
the corresponding symbols. In order to replace this, this patch
simply implements an array of singly linked lists (one per init stage)
which are fed using constructors for each register call. These
constructors are declared static, with a name depending on their
line number in the file, in order to avoid name clashes. The final
effect is the same, except that the method is slightly more expensive
in that it explicitly produces code to register these initcalls :
  $ size haproxy.sections haproxy.constructor
     text    data     bss     dec     hex filename
  4060312  249176 1457652 5767140  57ffe4 haproxy.sections
  4062862  260408 1457652 5780922  5835ba haproxy.constructor
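A condensed sketch of the constructor-based registration (macro and
type names here are illustrative; the real code spreads this across
several macros) :

    #define INIT_STAGES 8 /* illustrative number of stages */

    struct initcall {
        void (*fn)(void *arg);
        void *arg;
        struct initcall *next;
    };

    /* one singly linked list per init stage, fed at load time */
    static struct initcall *initcall_list[INIT_STAGES];

    #define __REG_INITCALL(stg, line, function, argument)          \
        __attribute__((constructor))                               \
        static void __initcb_##line(void)                          \
        {                                                          \
            static struct initcall ic = { function, argument, 0 }; \
            ic.next = initcall_list[stg];                          \
            initcall_list[stg] = &ic;                              \
        }
    #define _REG_INITCALL(stg, line, fn, arg) \
        __REG_INITCALL(stg, line, fn, arg)
    /* the constructor's name embeds __LINE__ to avoid name clashes */
    #define REG_INITCALL(stg, fn, arg) _REG_INITCALL(stg, __LINE__, fn, arg)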
This mechanism is enabled as an alternative to the default one when
build option USE_OBSOLETE_LINKER is set. This option is currently
enabled by default only on AIX and Cygwin, and may be attempted for
any target which fails to build complaining about missing symbols
__start_init_* and/or __stop_init_*.
Once confirmed as a reliable fix, this will likely have to be backported
to 1.9 where AIX and Cygwin do not build anymore.
A bug occurs when the sigchld handler is called and a child which is
not in the process list just left, or when the process list is empty.
The child variable won't be set, and is left either uninitialized or
pointing to the wrong child entry, which can lead to a free of this
uninitialized variable or of the wrong child.
This can lead to a crash of the master during a stop or a reload.
It is not supposed to happen with a worker which was created by the
master. A cause could be a fork made by a dependency. (openssl, lua ?)
This patch strengthens the case of the missing child by doing the free
only if the child was found.
This patch must be backported to 1.9.
It's not convenient not to know the status of default options, and
requires the user to know what option is enabled by default in each
target. With this patch, a new "Features list" line is added to the
output of "haproxy -vv" to report the whole list of known features
with their respective status. They're prefixed with a "+" when enabled
or a "-" when disabled. The "USE_" prefix is removed for clarity.
It's never easy to guess what services are built in. We currently have
the prometheus exporter in contrib/ which is the only extension for now.
Let's enumerate all available ones just like we do for filters and pollers.
Each thread uses one epoll_fd or kqueue_fd, and a pipe (thus two FDs).
These ones have to be accounted for in the maxsock calculation, otherwise
we can reach maxsock before maxconn. This is difficult to observe but it
in fact happens when a server connects back to the frontend and has checks
enabled : the check uses its FD and serves to fill the loop. In this case
all the FDs planned for the data path are used for this.
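In other words, the accounting gains a per-thread term; a sketch of the
arithmetic rather than the literal code :

    /* each thread brings one poller FD plus two FDs for its pipe */
    global.maxsock += global.nbthread * 3;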
This needs to be backported to 1.9 and 1.8.
Some packages used to rely on DEFAULT_MAXCONN to set the default global
maxconn value to use regardless of the initial ulimit. The recent changes
made the lowest bound set to 100 so that it is compatible with almost any
environment. Now that DEFAULT_MAXCONN is not needed for anything else, we
can use it for the lowest bound set when maxconn is not configured. This
way it retains its original purpose of setting the default maxconn value
even though most of the time the effective value will be higher thanks to
the automatic computation based on "ulimit -n".
This entry was still set to 2000 but never used anymore. The only places
where it appeared was as an alias to SYSTEM_MAXCONN which forces it, so
let's turn these ones to SYSTEM_MAXCONN and remove the default value for
DEFAULT_MAXCONN. SYSTEM_MAXCONN still defines the upper bound however.
The global maxconn value is often a pain to configure :
- in development the user never has the permissions to increase the
rlim_cur value too high and gets warnings all the time ;
- in some production environments, users may have limited actions on
it or may only be able to act on rlim_fd_cur using ulimit -n. This
is sometimes particularly true in containers or whatever environment
where the user has no privilege to upgrade the limits.
- keeping config homogeneous between machines is even less easy.
We already had the ability to automatically compute maxconn from the
memory limits when they were set. This patch goes a bit further by also
computing the limit permitted by the configured limit on the number of
FDs. For this it simply reverses the rlim_fd_cur calculation to determine
maxconn based on the number of reserved sockets for listeners & checks,
the number of SSL engines and the number of pipes (absolute or relative).
This way it becomes possible to make maxconn always be the highest possible
value resulting in maxsock matching what was set using "ulimit -n", without
ever setting it. Note that we adjust to the soft limit, not the hard one,
since it's what is configured with ulimit -n. This allows users to also
limit to low values if needed.
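As a rough sketch of the inversion, assuming up to two FDs per
connection (one per side) plus the default pipe budget of one pipe (two
FDs) per four connections, and a fixed reserve for listeners, checks
and the like :

    fd_limit ~= fixed_fds + 2*maxconn + 2*(maxconn/4)
    => maxconn ~= (fd_limit - fixed_fds) * 2/5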
Just like before, the calculated value is reported in verbose mode.
We'll need to know the global maxsock before the maxconn calculation.
Actually only two components were calculated too late, the peers FD
and the stats FD. Let's move them a few lines upward.
The default number of pipes is adjusted based on the sum of frontends
and backends maxconn/fullconn settings. Now that it is possible to have
a null maxconn on a frontend to indicate "unlimited" with commit
c8d5b95e6 ("MEDIUM: config: don't enforce a low frontend maxconn value
anymore"), the sum of maxconn may remain low and limited to the only
frontends/backends where this limit is set.
This patch considers this new unlimited case when doing the check, and
automatically switches to the default value which is maxconn/4 in this
case. All the calculation was moved to a distinct function for ease of
use. This function also supports returning unlimited (-1) when the
value depends on global.maxconn and this latter is not yet set.
When the master re-execs itself on reload, it doesn't restore the initial
rlim_fd_cur/rlim_fd_max values, which have been modified by the ulimit-n
or global maxconn directives. This is a problem, because if these values
were set really low it could prevent the process from restarting, and if
they were set very high, this could have some implications on the restart
time, or later on the computed maxconn.
Let's simply reset these values to the ones we had at boot to maintain
the system in a consistent state.
A backport could be performed to 1.9 and maybe 1.8. This patch depends on
the two previous ones.
If a ulimit-n value is set, we must not lower the rlim_max value if the
new value is lower, we must only adjust the rlim_cur one. The effect is
that on very low values, this could prevent a master-worker reload, or
make an external check fail by lack of FDs.
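Schematically, the intended behaviour (a sketch) :

    #include <sys/resource.h>

    static void apply_ulimit_n(rlim_t ulimit_n)
    {
        struct rlimit limit;

        getrlimit(RLIMIT_NOFILE, &limit);
        limit.rlim_cur = ulimit_n;     /* always adjust the soft limit */
        if (ulimit_n > limit.rlim_max)
            limit.rlim_max = ulimit_n; /* raise, never lower, the hard one */
        setrlimit(RLIMIT_NOFILE, &limit);
    }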
This may be backported to 1.9 and earlier, but it depends on this patch
"MINOR: global: keep a copy of the initial rlim_fd_cur and rlim_fd_max
values".
Let's keep a copy of these initial values. They will be useful to
compute automatic maxconn, as well as to restore proper limits when
doing an execve() on external checks.
Historically the default frontend's maxconn used to be quite low (2000),
which was sufficient two decades ago but often proved to be a problem
when users had purposely set the global maxconn value but forgot to set
the frontend's.
There is no point in keeping this arbitrary limit for frontends : when
the global maxconn is lower, it's already too high and when the global
maxconn is much higher, it becomes a limiting factor which causes trouble
in production.
This commit allows the value to be set to zero, which becomes the new
default value, to mean it's not directly limited, or in fact it's set
to the global maxconn. Since this operation used to be performed before
computing a possibly automatic global maxconn based on memory limits,
the calculation of the maxconn value and its propagation to the backends'
fullconn has now moved to a dedicated function, proxy_adjust_all_maxconn(),
which is called once the global maxconn is stabilized.
This comes with two benefits :
1) a configuration missing "maxconn" in the defaults section will not
limit itself to a magically hardcoded value but will scale up to the
global maxconn ;
2) when the global maxconn is not set and memory limits are used instead,
the frontends' maxconn automatically adapts, and the backends' fullconn
as well.
Threads have long matured by now, still for most users their usage is
not trivial. It's about time to enable them by default on platforms
where we know the number of CPUs bound. This patch does this, it counts
the number of CPUs the process is bound to upon startup, and enables as
many threads by default. Of course, "nbthread" still overrides this, but
if it's not set the default behaviour is to start one thread per CPU.
The default number of threads is reported in "haproxy -vv". Simply using
"taskset -c" is now enough to adjust this number of threads so that there
is no more need for playing with cpu-map. And thanks to the previous
patches on the listener, the vast majority of configurations will not
need to duplicate "bind" lines with the "process x/y" statement anymore
either, so a simple config will automatically adapt to the number of
processors available.
tune.listener.multi-queue { on | off }
Enables ('on') or disables ('off') the listener's multi-queue accept which
spreads the incoming traffic to all threads a "bind" line is allowed to run
on instead of taking them for itself. This provides a smoother traffic
distribution and scales much better, especially in environments where threads
may be unevenly loaded due to external activity (network interrupts colliding
with one thread for example). This option is enabled by default, but it may
be forcefully disabled for troubleshooting or for situations where it is
estimated that the operating system already provides a good enough
distribution and connections are extremely short-lived.
Instead of having one task per thread and per server to clean up the
idle connections, have only one global task for all servers.
That task parses all the servers that currently have idle connections
and removes half of them, putting them in a per-thread list of
connections to kill. For each thread that has connections to kill, wake
a task to do so, so that the cleaning is done in the context of said
thread.
Released version 2.0-dev1 with the following main changes :
- MINOR: mux-h2: only increase the connection window with the first update
- REGTESTS: remove the expected window updates from H2 handshakes
- BUG/MINOR: mux-h2: make empty HEADERS frame return a connection error
- BUG/MEDIUM: mux-h2: mark that we have too many CS once we have more than the max
- MEDIUM: mux-h2: remove padlen during headers phase
- MINOR: h2: add a bit-based frame type representation
- MINOR: mux-h2: remove useless check for empty frame length in h2s_decode_headers()
- MEDIUM: mux-h2: decode HEADERS frames before allocating the stream
- MINOR: mux-h2: make h2c_send_rst_stream() use the dummy stream's error code
- MINOR: mux-h2: add a new dummy stream for the REFUSED_STREAM error code
- MINOR: mux-h2: fail stream creation more cleanly using RST_STREAM
- MINOR: buffers: add a new b_move() function
- MINOR: mux-h2: make h2_peek_frame_hdr() support an offset
- MEDIUM: mux-h2: handle decoding of CONTINUATION frames
- CLEANUP: mux-h2: remove misleading comments about CONTINUATION
- BUG/MEDIUM: servers: Don't try to reuse connection if we switched server.
- BUG/MEDIUM: tasks: Decrement tasks_run_queue in tasklet_free().
- BUG/MINOR: htx: send the proper authenticate header when using http-request auth
- BUG/MEDIUM: mux_h2: Don't add to the idle list if we're full.
- BUG/MEDIUM: servers: Fail if we fail to allocate a conn_stream.
- BUG/MAJOR: servers: Use the list api correctly to avoid crashes.
- BUG/MAJOR: servers: Correctly use LIST_ELEM().
- BUG/MAJOR: sessions: Use an unlimited number of servers for the conn list.
- BUG/MEDIUM: servers: Flag the stream_interface on handshake error.
- MEDIUM: servers: Be smarter when switching connections.
- MEDIUM: sessions: Keep track of which connections are idle.
- MINOR: payload: add sample fetch for TLS ALPN
- BUG/MEDIUM: log: don't mark log FDs as non-blocking on terminals
- MINOR: channel: Add the function channel_add_input
- MINOR: stats/htx: Call channel_add_input instead of updating channel state by hand
- BUG/MEDIUM: cache: Be sure to end the forwarding when XFER length is unknown
- BUG/MAJOR: htx: Return the good block address after a defrag
- MINOR: lb: allow redispatch when using consistent hash
- CLEANUP: mux-h2: fix end-of-stream flag name when processing headers
- BUG/MEDIUM: mux-h2: always restart reading if data are available
- BUG/MINOR: mux-h2: set the stream-full flag when leaving h2c_decode_headers()
- BUG/MINOR: mux-h2: don't check the CS count in h2c_bck_handle_headers()
- BUG/MINOR: mux-h2: mark end-of-stream after processing response HEADERS, not before
- BUG/MINOR: mux-h2: only update rxbuf's length for H1 headers
- BUG/MEDIUM: mux-h1: use per-direction flags to indicate transitions
- BUG/MEDIUM: mux-h1: make HTX chunking consistent with H2
- BUG/MAJOR: stream-int: Update the stream expiration date in stream_int_notify()
- BUG/MEDIUM: proto-htx: Set SI_FL_NOHALF on server side when request is done
- BUG/MEDIUM: mux-h1: Add a task to handle connection timeouts
- MINOR: mux-h2: make h2c_decode_headers() return a status, not a count
- MINOR: mux-h2: add a new dummy stream : h2_error_stream
- MEDIUM: mux-h2: make h2c_decode_headers() support recoverable errors
- BUG/MINOR: mux-h2: detect when the HTX EOM block cannot be added after headers
- MINOR: mux-h2: remove a misleading and impossible test
- CLEANUP: mux-h2: clean the stream error path on HEADERS frame processing
- MINOR: mux-h2: check for too many streams only for idle streams
- MINOR: mux-h2: set H2_SF_HEADERS_RCVD when a HEADERS frame was decoded
- BUG/MEDIUM: mux-h2: decode trailers in HEADERS frames
- MINOR: h2: add h2_make_h1_trailers to turn H2 headers to H1 trailers
- MEDIUM: mux-h2: pass trailers to H1 (legacy mode)
- MINOR: htx: add a new function to add a block without filling it
- MINOR: h2: add h2_make_htx_trailers to turn H2 headers to HTX trailers
- MEDIUM: mux-h2: pass trailers to HTX
- MINOR: mux-h1: parse the content-length header on output and set H1_MF_CLEN
- BUG/MEDIUM: mux-h1: don't enforce chunked encoding on requests
- MINOR: mux-h2: make HTX_BLK_EOM processing idempotent
- MINOR: h1: make the H1 headers block parser able to parse headers only
- MEDIUM: mux-h2: emit HEADERS frames when facing HTX trailers blocks
- MINOR: stream/htx: Add info about the HTX structs in "show sess all" command
- MINOR: stream: Add the subscription events of SIs in "show sess all" command
- MINOR: mux-h1: Add the subscription events in "show fd" command
- BUG/MEDIUM: h1: Get the h1m state when restarting the headers parsing
- BUG/MINOR: cache/htx: Be sure to count partial trailers
- BUG/MEDIUM: h1: In h1_init(), wake the tasklet instead of calling h1_recv().
- BUG/MEDIUM: server: Defer the mux init until after xprt has been initialized.
- MINOR: connections: Remove a stall comment.
- BUG/MEDIUM: cli: make "show sess" really thread-safe
- BUILD: add a new file "version.c" to carry version updates
- MINOR: stream/htx: add the HTX flags output in "show sess all"
- MINOR: stream/cli: fix the location of the waiting flag in "show sess all"
- MINOR: stream/cli: report more info about the HTTP messages on "show sess all"
- BUG/MINOR: lua: bad args are returned for Lua actions
- BUG/MEDIUM: lua: dead lock when Lua tasks are triggered
- MINOR: htx: Add a helper function to get the max space usable for a block
- MINOR: channel/htx: Add HTX version for some helper functions
- BUG/MEDIUM: cache/htx: Respect the reserve when cached objects are served
- BUG/MINOR: stats/htx: Respect the reserve when the stats page is dumped
- DOC: regtest: make it clearer what the purpose of the "broken" series is
- REGTEST: mailers: add new test for 'mailers' section
- REGTEST: Add a reg test for health-checks over SSL/TLS.
- BUG/MINOR: mux-h1: Close connection on shutr only when shutw was really done
- MEDIUM: mux-h1: Clarify how shutr/shutw are handled
- BUG/MINOR: compression: Disable it if another one is already in progress
- BUG/MINOR: filters: Detect cache+compression config on legacy HTTP streams
- BUG/MINOR: cache: Disable the cache if any compression filter precedes it
- REGTEST: Add some information to test results.
- MINOR: htx: Add a function to truncate all blocks after a specific offset
- MINOR: channel/htx: Add the HTX version of channel_truncate/erase
- BUG/MINOR: proto_htx: Use HTX versions to truncate or erase a buffer
- BUG/CRITICAL: mux-h2: re-check the frame length when PRIORITY is used
- DOC: Fix typo in req.ssl_alpn example (commit 4afdd138424ab...)
- DOC: http-request cache-use / http-response cache-store expects cache name
- REGTEST: "capture (request|response)" regtest.
- BUG/MINOR: lua/htx: Respect the reserve when data are send from an HTX applet
- REGTEST: filters: add compression test
- BUG/MEDIUM: init: Initialize idle_orphan_conns for first server in server-template
- BUG/MEDIUM: ssl: Disable anti-replay protection and set max data with 0RTT.
- DOC: Be a bit more explicit about allow-0rtt security implications.
- MINOR: mux-h1: make the mux_h1_ops struct static
- BUILD: makefile: add an EXTRA_OBJS variable to help build optional code
- BUG/MEDIUM: connection: properly unregister the mux on failed initialization
- BUG/MAJOR: cache: fix confusion between zero and uninitialized cache key
- REGTESTS: test case for map_regm commit 271022150d
- REGTESTS: Basic tests for concat,strcmp,word,field,ipmask converters
- REGTESTS: Basic tests for using maps to redirect requests / select backend
- DOC: REGTESTS README varnishtest -Dno-htx= define.
- MINOR: spoe: Make the SPOE filter compatible with HTX proxies
- MINOR: checks: Store the proxy in checks.
- BUG/MEDIUM: checks: Avoid having an associated server for email checks.
- REGTEST: Switch to vtest.
- REGTEST: Adapt reg test doc files to vtest.
- BUG/MEDIUM: h1: Make sure we destroy an inactive connection that did shutw.
- BUG/MINOR: base64: dec func ignores padding for output size checking
- BUG/MEDIUM: ssl: missing allocation failure checks loading tls key file
- MINOR: ssl: add support of aes256 bits ticket keys on file and cli.
- BUG/MINOR: backend: don't use url_param_name as a hint for BE_LB_ALGO_PH
- BUG/MINOR: backend: balance uri specific options were lost across defaults
- BUG/MINOR: backend: BE_LB_LKUP_CHTREE is a value, not a bit
- MINOR: backend: move url_param_name/len to lbprm.arg_str/len
- MINOR: backend: make headers and RDP cookie also use arg_str/len
- MINOR: backend: add new fields in lbprm to store more LB options
- MINOR: backend: make the header hash use arg_opt1 for use_domain_only
- MINOR: backend: remap the balance uri settings to lbprm.arg_opt{1,2,3}
- MINOR: backend: move hash_balance_factor out of chash
- MEDIUM: backend: move all LB algo parameters into an union
- MINOR: backend: make the random algorithm support a number of draws
- BUILD/MEDIUM: da: Necessary code changes for new buffer API.
- BUG/MINOR: stick_table: Prevent conn_cur from underflowing
- BUG: 51d: Changes to the buffer API in 1.9 were not applied to the 51Degrees code.
- BUG/MEDIUM: stats: Get the right scope pointer depending on HTX is used or not
- DOC: add a missing space in the documentation for bc_http_major
- REGTEST: checks basic stats webpage functionality
- BUG/MEDIUM: servers: Make assign_tproxy_address work when ALPN is set.
- BUG/MEDIUM: connections: Add the CO_FL_CONNECTED flag if a send succeeded.
- DOC: add github issue templates
- MINOR: cfgparse: Extract some code to be re-used.
- CLEANUP: cfgparse: Return asap from cfg_parse_peers().
- CLEANUP: cfgparse: Code reindentation.
- MINOR: cfgparse: Useless frontend initialization in "peers" sections.
- MINOR: cfgparse: Rework peers frontend init.
- MINOR: cfgparse: Simplification.
- MINOR: cfgparse: Make "peer" lines be parsed as "server" lines.
- MINOR: peers: Make outgoing connection to SSL/TLS peers work.
- MINOR: cfgparse: SSL/TLS binding in "peers" sections.
- DOC: peers: SSL/TLS documentation for "peers"
- BUG/MINOR: startup: certain goto paths in init_pollers fail to free
- BUG/MEDIUM: checks: fix recent regression on agent-check making it crash
- BUG/MINOR: server: don't always trust srv_check_health when loading a server state
- BUG/MINOR: check: Wake the check task if the check is finished in wake_srv_chk()
- BUG/MEDIUM: ssl: Fix handling of TLS 1.3 KeyUpdate messages
- DOC: mention the effect of nf_conntrack_tcp_loose on src/dst
- BUG/MINOR: proto-htx: Return an error if all headers cannot be received at once
- BUG/MEDIUM: mux-h2/htx: Respect the channel's reserve
- BUG/MINOR: mux-h1: Apply the reserve on the channel's buffer only
- BUG/MINOR: mux-h1: avoid copying output over itself in zero-copy
- BUG/MAJOR: mux-h2: don't destroy the stream on failed allocation in h2_snd_buf()
- BUG/MEDIUM: backend: also remove from idle list muxes that have no more room
- BUG/MEDIUM: mux-h2: properly abort on trailers decoding errors
- MINOR: h2: declare new sets of frame types
- BUG/MINOR: mux-h2: CONTINUATION in closed state must always return GOAWAY
- BUG/MINOR: mux-h2: headers-type frames in HREM are always a connection error
- BUG/MINOR: mux-h2: make it possible to set the error code on an already closed stream
- BUG/MINOR: hpack: return a compression error on invalid table size updates
- MINOR: server: make sure pool-max-conn is >= -1
- BUG/MINOR: stream: take care of synchronous errors when trying to send
- CLEANUP: server: fix indentation mess on idle connections
- BUG/MINOR: mux-h2: always check the stream ID limit in h2_avail_streams()
- BUG/MINOR: mux-h2: refuse to allocate a stream with too high an ID
- BUG/MEDIUM: backend: never try to attach to a mux having no more stream available
- MINOR: server: add a max-reuse parameter
- MINOR: mux-h2: always consider a server's max-reuse parameter
- MEDIUM: stream-int: always mark pending outgoing SI_ST_CON
- MINOR: stream: don't wait before retrying after a failed connection reuse
- MEDIUM: h2: always parse and deduplicate the content-length header
- BUG/MINOR: mux-h2: always compare content-length to the sum of DATA frames
- CLEANUP: h2: Remove debug printf in mux_h2.c
- MINOR: cfgparse: make the process/thread parser support a maximum value
- MINOR: threads: make MAX_THREADS configurable at build time
- DOC: nbthread is no longer experimental.
- BUG/MINOR: listener: always fill the source address for accepted socketpairs
- BUG/MINOR: mux-h2: do not report available outgoing streams after GOAWAY
- BUG/MINOR: spoe: corrected fragmentation string size
- BUG/MINOR: task: fix possibly missed event in inter-thread wakeups
- BUG/MEDIUM: servers: Attempt to reuse an unfinished connection on retry.
- BUG/MEDIUM: backend: always call si_detach_endpoint() on async connection failure
- SCRIPTS: add the issue tracker URL to the announce script
- MINOR: peers: Extract some code to be reused.
- CLEANUP: peers: Indentation fixes.
- MINOR: peers: send code factorization.
- MINOR: peers: Add new functions to send code and reduce the I/O handler.
- MEDIUM: peers: synchronization code factorization to reduce the size of the I/O handler.
- MINOR: peers: Move update receive code to reduce the size of the I/O handler.
- MINOR: peers: Move ack, switch and definition receive code to reduce the size of the I/O handler.
- MINOR: peers: Move high level receive code to reduce the size of I/O handler.
- CLEANUP: peers: Be more generic.
- MINOR: peers: move error handling to reduce the size of the I/O handler.
- MINOR: peers: move messages treatment code to reduce the size of the I/O handler.
- MINOR: peers: move send code to reduce the size of the I/O handler.
- CLEANUP: peers: Remove useless statements.
- MINOR: peers: move "hello" message treatment code to reduce the size of the I/O handler.
- MINOR: peers: move peer initializations code to reduce the size of the I/O handler.
- CLEANUP: peers: factor the error handling code in peer_treat_updatemsg()
- CLEANUP: peers: factor error handling in peer_treat_definedmsg()
- BUILD/MINOR: peers: shut up a build warning introduced during last cleanup
- BUG/MEDIUM: mux-h2: only close connection on request frames on closed streams
- CLEANUP: mux-h2: remove two useless but misleading assignments
- BUG/MEDIUM: checks: Check that conn_install_mux succeeded.
- BUG/MEDIUM: servers: Only destroy a conn_stream we just allocated.
- BUG/MEDIUM: servers: Don't add an incomplete conn to the server idle list.
- BUG/MEDIUM: checks: Don't try to set ALPN if connection failed.
- BUG/MEDIUM: h2: In h2_send(), stop the loop if we failed to alloc a buf.
- BUG/MEDIUM: peers: Handle mux creation failure.
- BUG/MEDIUM: servers: Close the connection if we failed to install the mux.
- BUG/MEDIUM: compression: Rewrite strong ETags
- BUG/MINOR: deinit: tcp_rep.inspect_rules not deinit, add to deinit
- CLEANUP: mux-h2: remove misleading leftover test on h2s' nullity
- BUG/MEDIUM: mux-h2: wake up flow-controlled streams on initial window update
- BUG/MEDIUM: mux-h2: fix two half-closed to closed transitions
- BUG/MEDIUM: mux-h2: make sure never to send GOAWAY on too old streams
- BUG/MEDIUM: mux-h2: do not abort HEADERS frame before decoding them
- BUG/MINOR: mux-h2: make sure response HEADERS are not received in other states than OPEN and HLOC
- MINOR: h2: add a generic frame checker
- MEDIUM: mux-h2: check the frame validity before considering the stream state
- CLEANUP: mux-h2: remove stream ID and frame length checks from the frame parsers
- BUG/MINOR: mux-h2: make sure request trailers on aborted streams don't break the connection
- DOC: compression: Update the reasons for disabled compression
- BUG/MEDIUM: buffer: Make sure b_is_null handles buffers waiting for allocation.
- DOC: htx: make it clear that htxbuf() and htx_from_buf() always return valid pointers
- MINOR: htx: never check for null htx pointer in htx_is_{,not_}empty()
- MINOR: mux-h2: consistently rely on the htx variable to detect the mode
- BUG/MEDIUM: peers: Peer addresses parsing broken.
- BUG/MEDIUM: mux-h1: Don't add "transfer-encoding" if message-body is forbidden
- BUG/MEDIUM: connections: Don't forget to remove CO_FL_SESS_IDLE.
- BUG/MINOR: stream: don't close the front connection when facing a backend error
- BUG/MEDIUM: mux-h2: wait for the mux buffer to be empty before closing the connection
- MINOR: stream-int: add a new flag to mention that we want the connection to be killed
- MINOR: connstream: have a new flag CS_FL_KILL_CONN to kill a connection
- BUG/MEDIUM: mux-h2: do not close the connection on aborted streams
- BUG/MINOR: server: fix logic flaw in idle connection list management
- MINOR: mux-h2: max-concurrent-streams should be unsigned
- MINOR: mux-h2: make sure to only check concurrency limit on the frontend
- MINOR: mux-h2: learn and store the peer's advertised MAX_CONCURRENT_STREAMS setting
- BUG/MEDIUM: mux-h2: properly consider the peer's advertised max-concurrent-streams
- MINOR: xref: Add missing barriers.
- MINOR: muxes: Don't bother to LIST_DEL(&conn->list) before calling conn_free().
- MINOR: debug: Add an option that causes random allocation failures.
- BUG/MEDIUM: backend: always release the previous connection into its own target srv_list
- BUG/MEDIUM: htx: check the HTX compatibility in dynamic use-backend rules
- BUG/MINOR: tune.fail-alloc: Don't forget to initialize ret.
- BUG/MINOR: backend: check srv_conn before dereferencing it
- BUG/MEDIUM: mux-h2: always omit :scheme and :path for the CONNECT method
- BUG/MEDIUM: mux-h2: always set :authority on request output
- BUG/MEDIUM: stream: Don't forget to free s->unique_id in stream_free().
- BUG/MINOR: threads: fix the process range of thread masks
- BUG/MINOR: config: fix bind line thread mask validation
- CLEANUP: threads: fix misleading comment about all_threads_mask
- CLEANUP: threads: use nbits to calculate the thread mask
- OPTIM: listener: optimize cache-line packing for struct listener
- MINOR: tools: improve the popcount() operation
- MINOR: config: keep an all_proc_mask like we have all_threads_mask
- MINOR: global: add proc_mask() and thread_mask()
- MINOR: config: simplify bind_proc processing using proc_mask()
- MINOR: threads: make use of thread_mask() to simplify some thread calculations
- BUG/MINOR: compression: properly report compression stats in HTX mode
- BUG/MINOR: task: close a tiny race in the inter-thread wakeup
- BUG/MAJOR: config: verify that targets of track-sc and stick rules are present
- BUG/MAJOR: spoe: verify that backends used by SPOE cover all their callers' processes
- BUG/MAJOR: htx/backend: Make all tests on HTTP messages compatible with HTX
- BUG/MINOR: config: make sure to count the error on incorrect track-sc/stick rules
- DOC: ssl: Clarify when pre TLSv1.3 cipher can be used
- DOC: ssl: Stop documenting ciphers example to use
- BUG/MINOR: spoe: do not assume agent->rt is valid on exit
- BUG/MINOR: lua: initialize the correct idle conn lists for the SSL sockets
- BUG/MEDIUM: spoe: initialization depending on nbthread must be done last
- BUG/MEDIUM: server: initialize the idle conns list after parsing the config
- BUG/MEDIUM: server: initialize the orphaned conns lists and tasks at the end
- MINOR: config: make MAX_PROCS configurable at build time
- BUG/MAJOR: spoe: Don't try to get agent config during SPOP healthcheck
- BUG/MINOR: config: Reinforce validity check when a process number is parsed
- BUG/MEDIUM: peers: check that p->srv actually exists before using p->srv->use_ssl
- CONTRIB: contrib/prometheus-exporter: Add a Prometheus exporter for HAProxy
- BUG/MINOR: mux-h1: verify the request's version before dropping connection: keep-alive
- BUG: 51d: In Hash Trie, multi header matching was affected by the header names stored globally.
- MEDIUM: 51d: Enabled multi threaded operation in the 51Degrees module.
- BUG/MAJOR: stream: avoid double free on unique_id
- BUILD/MINOR: stream: avoid a build warning with threads disabled
- BUILD/MINOR: tools: fix build warning in the date conversion functions
- BUILD/MINOR: peers: remove an impossible null test in intencode()
- BUILD/MINOR: htx: fix some potential null-deref warnings with http_find_stline
- BUG/MEDIUM: peers: Missing peer initializations.
- BUG/MEDIUM: http_fetch: fix the "base" and "base32" fetch methods in HTX mode
- BUG/MEDIUM: proto_htx: Fix data size update if end of the cookie is removed
- BUG/MEDIUM: http_fetch: fix "req.body_len" and "req.body_size" fetch methods in HTX mode
- BUILD/MEDIUM: initcall: Fix build on MacOS.
- BUG/MEDIUM: mux-h2/htx: Always set CS flags before exiting h2_rcv_buf()
- MINOR: h2/htx: Set the flag HTX_SL_F_BODYLESS for messages without body
- BUG/MINOR: mux-h1: Add "transfer-encoding" header on outgoing requests if needed
- BUG/MINOR: mux-h2: Don't add ":status" pseudo-header on trailers
- BUG/MINOR: proto-htx: Consider a XFER_LEN message as chunked by default
- BUG/MEDIUM: h2/htx: Correctly handle interim responses when HTX is enabled
- MINOR: mux-h2: Set HTX extra value when possible
- BUG/MEDIUM: htx: count the amount of copied data towards the final count
- MINOR: mux-h2: make the H2 MAX_FRAME_SIZE setting configurable
- BUG/MEDIUM: mux-h2/htx: send an empty DATA frame on empty HTX trailers
- BUG/MEDIUM: servers: Use atomic operations when handling curr_idle_conns.
- BUG/MEDIUM: servers: Add a per-thread counter of idle connections.
- MINOR: fd: add a new my_closefrom() function to close all FDs
- MINOR: checks: use my_closefrom() to close all FDs
- MINOR: fd: implement an optimised my_closefrom() function
- BUG/MINOR: fd: make sure my_closefrom() doesn't miss some FDs
- BUG/MAJOR: fd/threads, task/threads: ensure all spin locks are unlocked
- BUG/MAJOR: listener: Make sure the listener exist before using it.
- MINOR: fd: Use closefrom() as my_closefrom() if supported.
- BUG/MEDIUM: mux-h1: Report the right amount of data xferred in h1_rcv_buf()
- BUG/MINOR: channel: Set CF_WROTE_DATA when outgoing data are skipped
- MINOR: htx: Add function to drain data from an HTX message
- MINOR: channel/htx: Add function to skips output bytes from an HTX channel
- BUG/MAJOR: cache/htx: Set the start-line offset when a cached object is served
- BUG/MEDIUM: cache: Get objects from the cache only for GET and HEAD requests
- BUG/MINOR: cache/htx: Return only the headers of cached objects to HEAD requests
- BUG/MINOR: mux-h1: Always initialize h1m variable in h1_process_input()
- BUG/MEDIUM: proto_htx: Fix functions applying regex filters on HTX messages
- BUG/MEDIUM: h2: advertise to servers that we don't support push
- MINOR: standard: Add a function to parse uints (dotted notation).
- MINOR: arg: Add support for ARGT_PBUF_FNUM arg type.
- MINOR: http_fetch: add "req.ungrpc" sample fetch for gRPC.
- MINOR: sample: Add two sample converters for protocol buffers.
- DOC: sample: Add gRPC related documentation.
Add a per-thread counter of idle connections, and use it to determine
how many connections we should kill after the timeout, instead of
using the global counter; otherwise we're likely to just kill most of
the connections.
This should be backported to 1.9.
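As an illustration, a minimal sketch of the mechanism (field and
variable names are illustrative, not necessarily the exact code):

    /* one idle-connection counter per thread, touched atomically */
    unsigned int *curr_idle_thr;   /* array of [global.nbthread] */

    /* when a connection becomes idle on thread <tid> : */
    HA_ATOMIC_ADD(&srv->curr_idle_thr[tid], 1);

    /* the purge after the timeout then considers this thread's
     * share only, instead of the global counter:
     */
    unsigned int to_kill = srv->curr_idle_thr[tid];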
For some embedded systems, it's pointless to have 32- or even
64-entry arrays of processes when it's known that far fewer processes
will be
used in the worst case. Let's introduce this MAX_PROCS define which
contains the highest number of processes allowed to run at once. It
still defaults to LONGBITS but may be lowered.
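For illustration, the define roughly looks like this (sketch):

    /* highest number of processes allowed to run at once; may be
     * lowered at build time, e.g. by adding -DMAX_PROCS=4 to the
     * build flags.
     */
    #ifndef MAX_PROCS
    #define MAX_PROCS LONGBITS
    #endif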
Since all of them are exclusive, let's move them to a union instead
of eating memory with the sum of all of them. We're using a transparent
union to limit the code changes.
Doing so reduces the struct lbprm from 392 bytes to 372, and thanks
to these changes, the struct proxy is now down to 6480 bytes vs 6624
before the changes (144 bytes saved per proxy).
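Schematically, the layout becomes something like this (member names
follow the existing lb_* structs; the exact set is illustrative):

    struct lbprm {
        /* ... fields common to all algorithms ... */
        union {                     /* algorithms are mutually exclusive */
            struct lb_map   map;    /* static round-robin */
            struct lb_fwlc  fwlc;   /* least connections */
            struct lb_chash chash;  /* consistent hashing */
            struct lb_fwrr  fwrr;   /* dynamic round-robin */
        };  /* transparent, so px->lbprm.map accesses keep working */
    };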
While testing fixes, it's sometimes confusing to rebuild only one C file
(e.g. a mux) and not to have the correct commit ID reported in "haproxy -v"
nor on the stats page.
This patch adds a new "version.c" file which is always rebuilt. It's
very small and contains only 3 variables derived from the various
version strings. These variables are used instead of the macros at the
few places showing the version. This way the output version of the
running code is always correct for the parts that were rebuilt.
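The file is tiny; it essentially boils down to this (variable names
are illustrative):

    /* version.c - always rebuilt so the reported version is right */
    char haproxy_version[]      = HAPROXY_VERSION;
    char haproxy_date[]         = HAPROXY_DATE;
    char stats_version_string[] = STATS_VERSION_STRING;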
First, it's a pain to always have to think about updating this date,
second for a long time I've not been the only developer there, and third,
some users contact me hoping to get help that I can't deliver. It's about
time to redirect them to the main site where all the useful links should
be.
If the reload fails after the parsing of the configuration, the
mworker_proc structures are created for the processes it tried to
create.
The mworker_proc_list_to_env() function was exporting these
uninitialized structures in the "HAPROXY_PROCESSES" environment
variable, which was leading to this kind of output in "show proc":
4294967295 worker [was: 1] 1 17879d 16h26m28s
Since HTX casts the buffer to a struct and stores relative pointers at the
end, it is mandatory that its end is properly aligned. This patch enforces
a buffer size rounding up to the next multiple of two void*, thus 8 on
32-bit and 16 on 64-bit, to match what malloc() already does at the
beginning of the buffer. In practice it will never really be
noticeable since the default sizes already are such multiples.
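The rounding itself is the classic power-of-two alignment; a sketch:

    /* round <size> up to the next multiple of 2*sizeof(void *),
     * i.e. 8 on 32-bit and 16 on 64-bit platforms:
     */
    size_t mask = 2 * sizeof(void *) - 1;
    size = (size + mask) & ~mask;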
The master is not supposed (at the moment) to run any task before the
polling loop; the created tasks should run only in the workers, and in
the master they should be disabled or removed.
No backport needed.
The previous code was only stopping the listeners in the master, not
the entire proxy.
Since we now have a polling loop in the master, there might be side
effects from things that are still initialized; for example, the
checks were still running.
Add a new keyword for servers, "idle-timeout". If set, unused connections are
kept alive until the timeout happens, and will be picked for reuse if no
other connection is available.
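For example, a hypothetical configuration snippet (address and delay
are made up):

    backend app
        server srv1 192.168.0.10:80 idle-timeout 30s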
signal_init(), init_log(), init_stream(), and init_task() all used to
only preset some values and lists. This needs to be done very early to
provide a reliable interface to all other users. The calls used to be
explicit in haproxy.c:init(). Now they're placed in initcalls at the
STG_PREPARE stage. The functions are not exported anymore.
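For instance, the explicit call from init() can be replaced by an
initcall registration such as (sketch):

    /* e.g. at the bottom of task.c */
    INITCALL0(STG_PREPARE, init_task);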
Instead of exporting a number of pools and having to manually delete
them in deinit() or to have dedicated destructors to remove them, let's
simply kill all pools on deinit().
For this a new function pool_destroy_all() was introduced. As its name
implies, it destroys and frees all pools (provided they don't have any
user anymore of course).
This made it possible to remove 4 implicit destructors, 2 explicit
ones, and 11 individual calls to pool_destroy(). In addition it
properly removes the mux_pt_ctx pool which was not cleared on exit (no
backport needed here since it's 1.9 only). The sig_handler pool
doesn't need to be exported anymore and is now static.
This commit replaces the explicit pool creations that are made in
constructors with a pool registration. Not only does this simplify the
pool declarations (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there are
no more users of potentially uninitialized pools now.
It was also an opportunity to remove no less than 12 constructors and
6 init functions.
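Schematically, using the pipe pool as an illustration:

    /* before: explicit creation from a constructor */
    struct pool_head *pool_head_pipe = NULL;
    __attribute__((constructor))
    static void __pipe_init(void)
    {
        pool_head_pipe = create_pool("pipe", sizeof(struct pipe),
                                     MEM_F_SHARED);
    }

    /* after: a one-line registration next to the head declaration */
    struct pool_head *pool_head_pipe = NULL;
    REGISTER_POOL(&pool_head_pipe, "pipe", sizeof(struct pipe));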
We reintroduced some FD leaks by using a poller and some listeners in
the master.
The master proxy needs to be stopped to avoid leaking its listeners,
the polling loop needs to be deinitialized, and the thread waker pipe
needs to be closed too.
No backport needed.
Valgrind reports:
==3389== Warning: invalid file descriptor -1 in syscall close()
Check for >= 0 before closing.
This bug was introduced in commit ce83b4a5dd
and is specific to 1.9. No backport needed.
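The fix is the obvious guard before the syscall:

    if (fd >= 0)
        close(fd);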
At the moment the situation with activity measurement is quite tricky
because the struct activity is defined in global.h and declared in
haproxy.c, with operations made in time.h and relying on freq_ctr
which are defined in freq_ctr.h which itself includes time.h. It's
barely possible to touch any of these files without breaking the
whole circular dependency.
Let's move all this stuff to activity.{c,h} and be done with it. The
measurement of active and stolen time is now done in a dedicated
function called just after tv_before_poll() instead of mixing the two,
which used to be a lazy (but convenient) decision.
No code was changed, stuff was just moved around.
The signal_register_fct() function does not remove the handlers
assigned to a signal, but adds a new handler to a list.
We accidentally inherited the handlers of the main() function in the
master process, which is a problem because they act on the proxies.
The side effect was to stop the MASTER proxy which handles the master
CLI on a SIGUSR1, and to display some debug info on a SIGHUP and a
SIGQUIT.
The mworker waitpid mode (which is used when a reload failed to apply
the new configuration) was still using a specific initialisation path.
That's a problem since we now use a polling loop in the master: the
master proxy was not initialized and the master CLI was not activated.
This patch removes the initialisation code of the wait mode and
introduces MODE_MWORKER_WAIT in order to use the same init path as
MODE_MWORKER with some exceptions. It makes it possible to use the
master proxy and the master CLI during the waitpid mode.
This patch allows a process to properly quit when some jobs are still
active; this feature is handled by the unstoppable_jobs variable,
which must be atomically incremented.
During each new iteration of run_poll_loop() the break condition of the
loop is now (jobs - unstoppable_jobs) == 0.
The only use of this at the moment is to handle the socketpair CLI of
a worker during the stopping of the process. During the soft stop, we
can mark the CLI listener as an unstoppable job and still handle new
connections until every other job is stopped.
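A minimal sketch of both sides of the mechanism:

    /* marking a job (e.g. the worker-side CLI listener) unstoppable */
    HA_ATOMIC_ADD(&unstoppable_jobs, 1);

    /* in run_poll_loop(), the break condition becomes: */
    if ((jobs - unstoppable_jobs) == 0)
        break;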
When using the CLI proxy of the master and trying to access a worker
with the @ prefix, the worker just crashes.
The commit 7216032 ("MEDIUM: mworker: leave when the master die")
reintroduced the old code of the pipe, which was not trying to access
the pointers before. The owner of the FD was modified to a different
value; this is a problem since we now call listener_accept() in most
cases from mworker_accept_wrapper(), and it casts the owner variable
to get the listener.
This patch fixes the issue by restoring the previous owner of the FD.
The process was aborting with nbthread > 1.
The mworker_pipe_register() function could be called several times in
multithread mode; we don't want to abort() there.
When the master dies, the worker should exit too; this is achieved by
checking whether the FD of the socketpair/pipe between the master and
the worker was closed.
In the former architecture of the master-worker, there was only a pipe
between the master and the workers, and it was easy to check for an
EOF on the pipe FD to exit() the worker.
With the new architecture, we use a socketpair per process, and this
socketpair is also used to accept new connections with the
listener_accept() callback.
This accept callback can't handle the EOF and the exit of the process,
because that is very specific to the master-worker. This is why we
transformed the mworker_pipe_handler() function into a wrapper which
checks for an EOF and exits the process, and otherwise calls
listener_accept() to perform the accept.
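A simplified sketch of the wrapper (error handling omitted; the exact
code differs):

    void mworker_accept_wrapper(int fd)
    {
        char c;

        /* MSG_PEEK leaves pending bytes in place for the accept path */
        if (recv(fd, &c, 1, MSG_PEEK) == 0) {
            /* EOF: the master is gone, so the worker leaves too */
            exit(EXIT_SUCCESS);
        }
        listener_accept(fd);
    }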
The former behavior was to exit() the master process with the latest
known status code, which was that of the last process to exit.
The problem is that the master process was not exiting with the status
code which provoked the exit-on-failure.
The active peers output indicates both the number of established peers
connections and the number of peers connection attempts. The new counter
"ConnectedPeers" also indicates the number of currently connected peers.
This helps detect, for example, that some peers cannot be reached. It's
worth mentioning that this value changes over time because unused peers
are often disconnected and reconnected. Most of the time it should be
equal to ActivePeers.
Peers are the last type of activity which can maintain a job present, so
it's important to report that such an entity is still active to explain
why the job count may be higher than zero. Here by "ActivePeers" we report
peers sessions, which include both established connections and outgoing
connection attempts.
This patch introduces mworker_cli_proxy_new_listener(), which allows
the creation of new listeners for the CLI proxy.
Using this function it is possible to create new listeners from the
program arguments with -Sa <unix_socket>. Multiple listeners can be
created by passing several -Sa arguments.
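For example (hypothetical socket paths):

    ./haproxy -W -f /etc/haproxy/haproxy.cfg \
        -Sa /run/haproxy/cli1.sock -Sa /run/haproxy/cli2.sock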
This patch implements a listen proxy within the master. It uses the
sockpair of all the workers as servers.
In the current state of the code, the proxy is only doing round robin
on the CLI of the workers. A CLI mode will be needed to know which
worker's CLI to send the requests to.
The init code of the mworker_proc structs has been moved before the
init of the listeners.
Each socketpair is now connected to a CLI within the workers, which
allows the master to access their CLI.
The inherited flag of the worker side socketpair is removed so the
socket can be closed in the master.
The listeners with the LI_O_INHERITED flag were deleted but not
unbound, which is a problem since we now have a polling loop in the
master.
This patch unbinds every listener which is not required by the master,
but does not close the FDs of those that have the LI_O_INHERITED flag.
This bug appeared only if nbthread > 1. While handling the pipe with
the master, multiple threads of the same worker could process the
deinit(). In addition, deinit() was called while some other threads
were still performing some tasks.
This patch assigns the handler of the pipe with the master to the
first thread only and removes the call to deinit() before exiting with
an error.
This patch should be backported in v1.8.
These ones are mostly called from cfgparse.c for the parsing and do
not depend on the HTTP representation. The functions' prototypes
were moved to proto/http_rules.h, making this file work exactly like
tcp_rules. Ideally we should stop calling these functions directly
from cfgparse and register keywords, but there are a few cases where
that wouldn't work (stats http-request) so it's probably not worth
trying to go this far.
Cyril Bonté reported that commit f9cc07c25b broke the build without
threads.
We don't need to initialise tid = 0 in mworker_loop, so it can be
removed entirely.
These error codes and messages are agnostic to the version, even if
they are represented as HTTP/1.0 messages. Ultimately they will have
to be transformed into internal HTTP messages to be used everywhere.
The HTTP/1.1 100 Continue message was turned to an IST and the local
copy in the Lua code was removed.
We need to clean the FDs registered manually in the poller to avoid
FD leaks during a reload of the master.
This patch calls the per-thread deinit function, which closes the
thread waker pipe.
In order to communicate with the workers, the master pipe has been
replaced by a socketpair() per worker.
The goal is to use these sockets as stats sockets and be able to access
them from the master.
When reloading, the master serializes the information of the workers
and puts it in an environment variable. Once the master has been
re-executed, it deserializes that information and is then able to
close the FDs of the leaving children.
The master now uses a poll loop, which should be initialized even in
wait mode. We need to init some variables if we did not succeed in
loading the configuration file.
If haproxy failed to load its configuration, the process is
re-executed without having initialized the poller, so we must not try
to deinit the poller before the exec().
With the new way of handling the signals in the master worker, we no
longer stay in a waitpid() loop, which means that we need to catch the
SIGCHLD signals to call waitpid().
The problem is that when the master is reloading, this signal is
neither registered nor blocked, so we lose all signals delivered
between the restart and the call to mworker_loop().
This patch blocks the SIGCHLD signals before the reloading and ensures
they are not unblocked before the master has registered the SIGCHLD
handler.
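The principle, schematically:

    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    /* before the re-exec, so no SIGCHLD can be lost meanwhile */
    sigprocmask(SIG_BLOCK, &set, NULL);

    /* ... re-exec and configuration parsing happen here ... */

    /* only once mworker_loop() has registered its SIGCHLD handler */
    sigprocmask(SIG_UNBLOCK, &set, NULL);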
In order to reorganize the code of the master worker, the
mworker_wait() function, which was the main function, was split. This
function was handling a wait() loop, but it does not need it anymore
since the code now uses the poll loop of haproxy instead.
The function was split into several functions:
- mworker_catch_sigterm() which is a signal handler for SIGTERM and
SIGUSR1 that sends the signals to the workers
- mworker_catch_sigchld() which is the code handling the leaving of a
child
- mworker_catch_sighup() which basically calls the mworker_restart()
function
- mworker_loop() which is the function calling the main poll loop in the
master
Now we try to synchronously push updates as they come using the new rdv
point, so that the call to the server update function from the main poll
loop is not needed anymore.
It further reduces the apparent latency in the health checks as the response
time almost always appears as 0 ms, resulting in a slightly higher check rate
of ~1960 conn/s. Despite this, the CPU consumption has slightly dropped again
to ~32% for the same test.
The only trick is that the checks code is built with a bit of
recursion, because srv_update_status() calls server_recalc_eweight(),
and the latter needs to signal srv_update_status() in case of updates.
Thus we added an extra argument to server_recalc_eweight() to indicate
whether or not it must propagate updates (not when the call comes from
srv_update_status()).
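Schematically (signatures simplified, not the exact code):

    void server_recalc_eweight(struct server *sv, int must_update)
    {
        /* ... recompute the effective weight ... */
        if (must_update)
            srv_update_status(sv);
    }

    /* and from within srv_update_status() itself: */
    server_recalc_eweight(sv, 0);   /* don't propagate back */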