This moves types/activity.h to haproxy/activity-t.h and
proto/activity.h to haproxy/activity.h.
The macros defining the bit field values for the profiling variable
were moved to the type file to be more future-proof.
Now the file is ready to be stored into its final destination. A few
minor reorderings were performed to keep the file properly organized,
making the various sections more visible (cache & lockless).
In addition and to stay consistent, memory.c was renamed to pool.c.
This one is included almost everywhere and used to rely on a few other
.h that are not needed (unistd, stdlib, standard.h). It could possibly
make sense to split it into multiple parts to distinguish operations
performed on timers and the internal time accounting, but at this point
it does not appear very important.
This splits the hathreads.h file into types+macros and functions. Given
that most users of this file used to include it only to get the definition
of THREAD_LOCAL and MAXTHREADS, the bare minimum was placed into thread-t.h
(i.e. types and macros).
All the thread management was left to haproxy/thread.h. It's worth noting
the drop of the trailing "s" in the name, to remove the permanent confusion
that arises between this one and the system implementation (no "s") and the
makefile's option (no "s").
For consistency, src/hathreads.c was also renamed thread.c.
A number of files were updated to only include thread-t which is the one
they really needed.
Some future improvements are possible like replacing empty inlined
functions with macros for the thread-less case, as building at -O0 disables
inlining and causes these ones to be emitted. But this really is cosmetic.
Half of the users of this include only need the type definitions and
not the manipulation macros nor the inline functions. Moving the various
types into mini-clist-t.h makes the files cleaner. The other one had all
its includes grouped at the top. A few files continued to reference it
without using it and were cleaned.
In addition it was about time that we'd rename that file, it's not
"mini" anymore and contains a bit more than just circular lists.
This file is to openssl what compat.h is to the libc, so it makes sense
to move it to haproxy/. It could almost be part of api.h but given the
amount of openssl stuff that gets loaded I fear it could increase the
build time.
Note that this file contains lots of inlined functions. But since it
does not depend on anything else in haproxy, it remains safe to keep
all that together.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
When HAProxy is started with a '--' option, all following parameters are
considered configuration files. You can't add new options after a '--'.
The current reload system of the master-worker adds extra options at the
end of the arguments list, which is a problem if HAProxy was started with
'--'.
This patch fixes the issue by copying the new option at the beginning of
the arguments list instead of the end.
This patch must be backported as far as 1.8.
There is no reason the -S option can't take an argument which starts with
a '-'. This limitation should only apply to options that take a
non-finite list of parameters (-sf/-st).
This can be backported only if the previous patch which fixes
copy_argv() is backported too.
Could be backported as far as 1.9.
There is no reason the -x option can't take an argument which starts with
a '-'. This limitation should only apply to options that take a
non-finite list of parameters (-sf/-st).
This can be backported only if the previous patch which fixes
copy_argv() is backported too.
Could be backported as far as 1.8.
The copy_argv() function, which is used to copy and remove some of the
arguments of the command line in order to re-exec() the master process,
is poorly implemented.
The function tries to remove the -x and the -sf/-st options but without
taking into account that some of the options could take a parameter
starting with a dash.
In issue #644, haproxy starts with "-L -xfoo" which is perfectly
correct. However, the re-exec is done without "-xfoo" because the master
tries to remove the "-x" option. Indeed, the copy_argv() function does
not know how much arguments an option can have, and just assume that
everything starting with a dash is an option. So haproxy is exec() with
"-L" but without a parameter, which is wrong and leads to the exit of
the master, with usage().
To fix this issue, copy_argv() must know how many parameters an option
takes, and copy or skip the parameters correctly.
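As an illustration of this principle, here is a minimal, self-contained
sketch (not the actual haproxy code; it only assumes, as described above,
that -x takes exactly one parameter while -sf/-st take an open-ended list):

    #include <string.h>

    /* Sketch only: drop "-x <sock>" (one fixed parameter) and "-sf"/"-st"
     * followed by their whole PID list, keep everything else untouched. */
    static int copy_argv_sketch(int argc, char **argv, char **out)
    {
        int i = 0, n = 0;

        while (i < argc) {
            if (strcmp(argv[i], "-x") == 0) {
                /* the option plus its single parameter, if present */
                i += (i + 1 < argc) ? 2 : 1;
            }
            else if (strcmp(argv[i], "-sf") == 0 || strcmp(argv[i], "-st") == 0) {
                i++;
                while (i < argc && argv[i][0] != '-')
                    i++;                /* open-ended list of PIDs */
            }
            else {
                out[n++] = argv[i++];   /* keep everything else as-is */
            }
        }
        return n;
    }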
This fix is a first step but it should evolve to a cleaner way of
declaring the options to avoid duplication of the parsing code, so we
avoid new bugs.
Should be backported with care as far as 1.8, by removing the options
that do not exist in the previous versions.
When the first thread stops and wakes others up, it's possible some of
them will also start to wake others in parallel. Let's give this
notification task to the very first one instead, since it's enough and
can reduce the amount of needless (though harmless) wakeup calls.
Currently the soft-stop can lead to old processes remaining alive for as
long as two seconds after receiving a soft-stop signal. What happens is
that when receiving SIGUSR1, one thread (usually the first one) wakes up,
handles the signal, sets "stopping", goes into runn_poll_loop(), and
discovers that stopping is set, so its also sets itself in the
stopping_thread_mask bit mask. After this it sees that other threads are
not yet willing to stop, so it continues to wait.
From there, other threads which were waiting in poll() expire after one
second on poll timeout and enter run_poll_loop() in turn. That's already
one second of wait time. They discover each in turn that they're stopping
and see that other threads are not yet stopping, so they go back waiting.
After the end of the first second, all threads know they're stopping and
have set their bit in stopping_thread_mask. It's only now that those who
started to wait first wake up again on timeout to discover that all other
ones are stopping, and can now quit. One second later all threads will
have done it and the process will quit.
This is effectively strictly larger than one second and up to two seconds.
What the current patch does is simple: when the first thread stops, it sets
its own bit in stopping_thread_mask, then wakes up all other threads so that
they also set theirs. This kills the first second, which corresponds to the
time needed to discover the stopping state. Second, when a thread exits, it
wakes all other ones again, because some might have gone back to sleep
waiting for "jobs" to go down to zero (i.e. closing the last connection).
This kills the last second of wait time.
Thanks to this, SIGUSR1 now acts instantly again if there's no active
connection, or the process stops immediately after the last connection has
left if one was still present.
This should be backported as far as 2.0.
HTTP health-checks are now internally based on tcp-checks. Of course all the
configuration parsing of the "http-check" keyword and the httpchk option has
been rewritten. But the main changes is that now, as for tcp-check ruleset, it
is possible to perform several send/expect sequences into the same
health-checks. Thus the connect rule is now also available from HTTP checks, jst
like set-var, unset-var and comment rules.
Because the request defined by the "option httpchk" line is used for the first
request only, it is now possible to set the method, the uri and the version on a
"http-check send" line.
The options and directives related to the configuration of checks in a backend
may be defined after the server declarations. So, initialization of the check
of each server must not be performed during configuration parsing, because some
info may be missing. Instead, it must be done during the configuration validity
check.
Thus, callback functions are registered to be called for each server after the
config validity check, one for the server check and another one for the server
agent-check. In addition deinit callback functions are also registered to
release these checks.
This patch should be backported as far as 1.7. But per-server post_check
callback functions are only supported since 2.1, and the initcall mechanism
does not exist before 1.9. Finally, in 1.7, the code is totally
different. So the backport will be harder on older versions.
This option is used to force a non-SSL connection to check an SSL server or to
invert a check-ssl option inherited from the defaults section. The use_ssl field
in the check structure is used to know if an SSL connection must be used
(use_ssl=1) or not (use_ssl=0). The server configuration is used by default.
The problem is that we cannot distinguish the default case (no specific SSL
check option) and the case of an explicit non-SSL check. In both, use_ssl is set
to 0. So the server configuration is always used. For an SSL server, when the
no-check-ssl option is set, the check is still performed using an SSL
configuration.
To fix the bug, instead of a boolean value (0=TCP, 1=SSL), we use a ternary value:
* 0 = use server config
* 1 = force SSL
* -1 = force non-SSL
The same is done for the server parameter. It is not really necessary for
now, but it is a good way to know if the server's no-ssl option is set.
In addition, the PR_O_TCPCHK_SSL proxy option is no longer used to set use_ssl
to 1 for a check. Instead the flag is directly tested to prepare or destroy the
server SSL context.
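A small sketch of the resolution logic this implies (field names here are
illustrative, not the exact haproxy ones):

    /* returns non-zero when the check must use SSL */
    static int check_uses_ssl(int check_use_ssl, int srv_use_ssl)
    {
        if (check_use_ssl != 0)
            return check_use_ssl > 0;  /*  1 = force SSL, -1 = force non-SSL */
        return srv_use_ssl > 0;        /*  0 = fall back to the server setting */
    }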
This patch should be backported as far as 1.8.
The 'http-check send' directive has been added to add headers and optionally a
payload to the request sent during HTTP health checks. The request line may be
customized by the "option httpchk" directive but there was no official way to
add extra headers. An old trick consisted in hiding these headers at the end of
the version string, on the "option httpchk" line. And it was impossible to add
an extra payload with an "http-check expect" directive because of the
"Connection: close" header appended to the request (see issue #16 for details).
So to make things official and fully support payload additions, the "http-check
send" directive has been added:
option httpchk POST /status HTTP/1.1
http-check send hdr Content-Type "application/json;charset=UTF-8" \
hdr X-test-1 value1 hdr X-test-2 value2 \
body "{id: 1, field: \"value\"}"
When a payload is defined, the Content-Length header is automatically added. So
chunk-encoded requests are not supported yet. For now, there are no special
validity checks on the extra headers.
This patch is inspired by Kiran Gavali's work. It should fix issue #16 and,
as far as possible, it may be backported, at least as far as 1.8.
This patch adds the sysname, release, version and machine fields from
the uname results to the version output. It intentionally leaves out the
nodename (the machine's host name), because it is usually not useful and
users might not want to expose their host names for privacy reasons.
May be backported if it is considered useful for debugging.
Some portability issues were met a few times in the past depending on
compiler versions, but the compiler version was not reported in haproxy -vv
output while it's trivial to add it. This patch tries to be the most accurate
by explicitly reporting the clang version if detected, otherwise the
gcc version.
Since some systems switched to service managers which hide all warnings
by default, some users are not aware of some possibly important warnings
and get caught too late with errors that could have been detected earlier.
This patch adds a new global keyword, "zero-warning" and an equivalent
command-line option "-dW" to refuse to start in case any warning is
detected. It is recommended to use these with configurations that are
managed by humans in order to catch mistakes very early.
This helps to quickly check whether the config produces any warning. For
this we reuse the "warned" bit field to add a new WARN_ANY bit that is
set by ha_warning(). The rest of the bit field was also cleaned of
unused bits.
In run_thread_poll_loop() we test both for (global_tasks_mask & tid_bit)
and thread_has_tasks(), but the former is useless since this test is
already part of the latter.
Commit 4b3f27b ("BUG/MINOR: haproxy/threads: try to make all threads
leave together") improved the soft-stop synchronization but it left a
small race open because it looks at tasks_run_queue, which can drop
to zero then back to one while another thread picks the task from the
run queue to insert it into the tasklet_list. The risk is very low but
not null. In addition the condition didn't consider the possible presence
of signals in the queue.
This patch moves the stopping detection just after the "wake" calculation
which already takes care of the various queues' sizes and signals. It
avoids needlessly duplicating these tests.
The bug was discovered during a code review but will probably never be
observed. This fix may be backported to 2.1 and 2.0 along with the commit
above.
Revamp the server connection lists. We now have 3 lists:
- idle_conns, which contains idling connections
- safe_conns, which contains idling connections that are safe to use even
for the first request
- available_conns, which contains connections that are not idling, but can
still accept new streams (those are HTTP/2 or fastcgi, and are always
considered safe).
It's more generic and versatile than the previous shut_your_big_mouth_gcc()
that was used to silence annoying warnings, as it's not limited to ignoring
syscall return values only. This allows us to get rid of the aforementioned
function and the shut_your_big_mouth_gcc_int variable, that started to
look ugly in multi-threaded environments.
There's a small issue with soft stop combined with the incoming
connection load balancing. A thread may dispatch a connection to
another one at the moment stopping=1 is set, and the second one could
stop by seeing (jobs - unstoppable_jobs) == 0 in run_poll_loop(),
without ever picking these connections from the queue. This is
visible in that it may occasionally cause a connection drop on
reload since no remaining thread will ever pick that connection
anymore.
In order to address this, this patch adds a stopping_thread_mask
variable by which threads acknowledge their willingness to stop
when their runqueue is empty. And all threads will only stop at
this moment, so that if finally some late work arrives in the
thread's queue, it still has a chance to process it.
This should be backported to 2.1 and 2.0.
Surprisingly the variable was never initialized, though on most
platforms it's zeroed at boot, and it is relatively harmless
anyway since in the worst case the bits are updated around poll().
This was introduced by commit 79321b95a8 and needs to be backported
as far as 1.9.
Remove the list of private connections from the server; it was largely
unused: we only inserted connections into it, but we would never actually
use it.
When a maximum memory setting is passed to haproxy and maxconn is not set
and ulimit-n is not set, it is expected that maxconn will be set to the
highest value permitted by this memory setting, possibly affecting the
FD limit.
When maxconn was changed to be deduced from the current process's FD limit,
the automatic setting above was partially lost because it now remains
limited to the current FD limit in addition to being limited to the
memory usage. For unprivileged processes it does not change anything,
but for privileged processes the difference is important. Indeed, the
previous behavior ensured that the new FD limit could be enforced on
the process as long as the user had the privilege to do so. Now this
does not happen anymore, and some people rely on this for automatic
sizing in VM environments.
This patch implements the ability to verify if the setting will be
enforceable on the process or not. First it computes maxconn based on
the memory limits alone, then checks if the process is willing to accept
them, otherwise tries again by respecting the process' hard limit.
Thanks to this we now have the best of the pre-2.0 behavior and the
current one, in that privileged users will be able to get as high a
maxconn as they need just based on the memory limit, while unprivileged
users will still get as high a setting as permitted by the intersection
of the memory limit and the process' FD limit.
Ideally, after some observation period, this patch along with the
previous one "MINOR: init: move the maxsock calculation code to
compute_ideal_maxsock()" should be backported to 2.1 and 2.0.
Thanks to Baptiste for raising the issue.
The maxsock value is currently derived from global.maxconn and a few other
settings, some of which also depend on global.maxconn. This makes it
difficult to check if a limit is already too high or not during the maxconn
automatic sizing.
Let's move this code into a new function, compute_ideal_maxsock() which now
takes a maxconn in argument. It performs the same operations and returns
the maxsock value if global.maxconn were to be set to that value. It now
replaces the previous code to compute maxsock.
This is the replacement of failed attempt to add thread safety and
per-process sequences of random numbers initially tried with commit
1c306aa84d ("BUG/MEDIUM: random: implement per-thread and per-process
random sequences").
This new version takes a completely different approach and doesn't try
to work around the horrible OS-specific and non-portable random API
anymore. Instead it implements "xoroshiro128**", a reputedly high
quality random number generator, which is one of the many variants of
xorshift, which passes all quality tests and which is described here:
http://prng.di.unimi.it/
While not cryptographically secure, it is fast and features a 2^128-1
period. It supports fast jumps, allowing the period to be cut into smaller
non-overlapping sequences, which we use here to support up to 2^32
processes each having their own, non-overlapping sequence of 2^96
numbers (~7*10^28). This is enough to provide 1 billion randoms per
second and per process for 2200 billion years.
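For reference, the core step of xoroshiro128** is tiny; a self-contained
sketch follows (this is the reference algorithm from the site above, without
the locking/CAS and jump logic that haproxy adds on top; the 128-bit state
must be seeded to something non-zero):

    #include <stdint.h>

    static uint64_t rng_state[2];   /* must not be all zero */

    static inline uint64_t rotl64(uint64_t x, int k)
    {
        return (x << k) | (x >> (64 - k));
    }

    static uint64_t xoroshiro128ss_next(void)
    {
        const uint64_t s0 = rng_state[0];
        uint64_t s1 = rng_state[1];
        const uint64_t result = rotl64(s0 * 5, 7) * 9;

        s1 ^= s0;
        rng_state[0] = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
        rng_state[1] = rotl64(s1, 37);
        return result;
    }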
The implementation was made thread-safe either by using a double 64-bit
CAS on platforms supporting it (x86_64, aarch64) or by using a local
lock for the time needed to perform the shift operations. This ensures
that all threads pick numbers from the same pool so that it is not
needed to assign per-thread ranges. For processes we use the fast jump
method to advance the sequence by 2^96 for each process.
Before this patch, the following config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout format raw daemon
log-format "%[uuid] %pid"
redirect location /
Would produce this output:
a4d0ad64-2645-4b74-b894-48acce0669af 12987
a4d0ad64-2645-4b74-b894-48acce0669af 12992
a4d0ad64-2645-4b74-b894-48acce0669af 12986
a4d0ad64-2645-4b74-b894-48acce0669af 12988
a4d0ad64-2645-4b74-b894-48acce0669af 12991
a4d0ad64-2645-4b74-b894-48acce0669af 12989
a4d0ad64-2645-4b74-b894-48acce0669af 12990
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12987
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12992
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12986
(...)
And now produces:
f94b29b3-da74-4e03-a0c5-a532c635bad9 13011
47470c02-4862-4c33-80e7-a952899570e5 13014
86332123-539a-47bf-853f-8c8ea8b2a2b5 13013
8f9efa99-3143-47b2-83cf-d618c8dea711 13012
3cc0f5c7-d790-496b-8d39-bec77647af5b 13015
3ec64915-8f95-4374-9e66-e777dc8791e0 13009
0f9bf894-dcde-408c-b094-6e0bb3255452 13011
49c7bfde-3ffb-40e9-9a8d-8084d650ed8f 13014
e23f6f2e-35c5-4433-a294-b790ab902653 13012
There are multiple benefits to using this method. First, it doesn't
depend anymore on a non-portable API. Second, it's thread safe. Third, it
is fast and better proven than any hack we could attempt to work
around the deficiencies of the various implementations around.
This commit depends on previous patches "MINOR: tools: add 64-bit rotate
operators" and "BUG/MEDIUM: random: initialize the random pool a bit
better", all of which will need to be backported at least as far as
version 2.0. It doesn't require backporting the build fixes for the circular
include file dependency anymore.
This reverts commit 1c306aa84d.
It breaks the build on all non-glibc platforms. I got confused by the
man page (which possibly is the most confusing man page I've ever read
about a standard libc function) and mistakenly understood that random_r
was portable, especially since it appears in latest freebsd source as
well but not in released versions, and with a slightly different API :-/
We need to find a different solution with a fallback. Among the
possibilities, we may reintroduce this one with a fallback relying on
locking around the standard functions, keeping fingers crossed for no
other library function to call them in parallel, or we may also provide
our own PRNG, which is not necessarily more difficult than working
around the totally broken design of the portable API.
As mentioned in previous patch, the random number generator was never
made thread-safe, which used not to be a problem for health checks
spreading, until the uuid sample fetch function appeared. Currently
it is possible for two threads or processes to produce exactly the
same UUID. In fact it's extremely likely that this will happen for
processes, as can be seen with this config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout format raw daemon
log-format "%[uuid] %pid"
redirect location /
It typically produces this log:
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30645
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30641
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30644
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30639
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30646
07764439-c24d-4e6f-a5a6-0138be59e7a8 30645
07764439-c24d-4e6f-a5a6-0138be59e7a8 30639
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30643
07764439-c24d-4e6f-a5a6-0138be59e7a8 30646
b6773fdd-678f-4d04-96f2-4fb11ad15d6b 30646
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30642
07764439-c24d-4e6f-a5a6-0138be59e7a8 30642
What this patch does is to use a distinct per-thread and per-process
seed to make sure the same sequences will not appear, and will then
extend these seeds by "burning" a number of randoms that depends on
the global random seed, the thread ID and the process ID. This adds
roughly 20 extra bits of randomness, resulting in 52 bits total per
thread and per process.
It only takes a few milliseconds to burn these randoms and given
that threads start with a different seed, we know they will not
catch each other. So these random extra bits are essentially added
to ensure randomness between boots and cluster instances.
This replaces all uses of random() with ha_random() which uses the
thread-local state.
This must be backported as far as 2.0 or any version having the
UUID sample-fetch function since it's the main victim here.
It's important to note that this patch, in addition to depending on
the previous one "BUG/MEDIUM: init: initialize the random pool a bit
better", also depends on the preceeding build fixes to address a
circular dependency issue in the include files that prevented it
from building. Part or all of these patches may need to be backported
or adapted as well.
Since the UUID sample fetch was created, some people noticed that in
certain virtualized environments they manage to get exact same UUIDs
on different instances started exactly at the same moment. It turns
out that the randoms were only initialized to spread the health checks
originally, not to provide "clean" randoms.
This patch changes this and collects more randomness from various
sources, including existing randoms, /dev/urandom when available,
RAND_bytes() when OpenSSL is available, as well as the timing for such
operations, then applies a SHA1 on all this to keep a 160-bit random
seed available, 32 bits of which are passed to srandom().
It's worth mentioning that there's no clean way to pass more than 32
bits to srandom() as even initstate() provides an opaque state that
must absolutely not be tampered with since known implementations
contain state information.
At least this allows having up to 4 billion different sequences
from the boot, which is not that bad.
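A simplified, self-contained sketch of this seeding idea (it deliberately
omits the RAND_bytes() source and the SHA1 mixing mentioned above, and the
function name is only for illustration):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    static void init_random_seed_sketch(void)
    {
        unsigned int seed = (unsigned int)time(NULL) ^ (unsigned int)getpid();
        unsigned char buf[16];
        int fd = open("/dev/urandom", O_RDONLY);

        if (fd >= 0) {
            if (read(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf)) {
                size_t i;
                for (i = 0; i < sizeof(buf); i++)
                    seed = seed * 257 + buf[i];  /* cheap mixing, sketch only */
            }
            close(fd);
        }
        srandom(seed);
    }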
Note that the thread safety was still not addressed, which is another
issue for another patch.
This must be backported to all versions containing the UUID sample
fetch function, i.e. as far as 2.0.
James Stroehmann reported something working as documented but that can
be considered a regression in the way the automatic maxconn is
calculated from the process' limits:
https://www.mail-archive.com/haproxy@formilux.org/msg36523.html
The purpose of the changes in 2.0 was to have maxconn default to the
highest possible value permitted to the user based on the ulimit -n
setting, however the calculation starts from the soft limit, which
can be lower than what users were allowed to with previous versions
where the default value of 2000 would force a higher ulimit -n as
long as it fitted in the hard limit.
Usually this is not noticeable if the user changes the limits, because
quite commonly setting a new value restricts both the soft and hard
values.
Let's instead always use the max between the hard and soft limits, as
we know these values are permitted. This was tried on the following
setup:
$ cat ulimit-n.cfg
global
stats socket /tmp/sock1 level admin
$ ulimit -n
1024
Before the change the limits would show like this:
$ socat - /tmp/sock1 <<< "show info" | grep -im2 ^Max
Maxsock: 1023
Maxconn: 489
After the change the limits are now much better and more in line with
the default settings in earlier versions:
$ socat - /tmp/sock1 <<< "show info" | grep -im2 ^Max
Maxsock: 4095
Maxconn: 2025
The difference becomes even more obvious when running moderately large
configs with hundreds of checked servers and hundreds of listeners:
$ cat ulimit-n.cfg
global
stats socket /tmp/sock1 level admin
listen l
bind :10000-10300
server-template srv- 300 0.0.0.0 check disabled
              Before   After
  Maxsock       1024    4096
  Maxconn        189    1725
This issue is tagged as minor since a trivial config change fixes it,
but it would help new users to have it backported as far as 2.0.
pattern_finalize_config() uses an inefficient algorithm which is a
problem with very large configuration files. This affects startup, and
therefore reload time. When haproxy is deployed as a router in a
Kubernetes cluster the generated configuration file may be large and
reloads are frequently occurring, which makes this a significant issue.
The old algorithm is O(n^2):
* allocate missing uids - O(n^2)
* sort linked list - O(n^2)
The new algorithm is O(n log n):
* find the user allocated uids - O(n)
* store them for efficient lookup - O(n log n)
* allocate missing uids - n times O(log n)
* sort all uids - O(n log n)
* convert back to linked list - O(n)
Performance examples, startup time in seconds:
  pat_refs     old      new
      1000     0.02     0.01
     10000     2.1      0.04
     20000    12.3      0.07
     30000    27.9      0.10
     40000    52.5      0.14
     50000    77.5      0.17
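The idea behind the new scheme can be sketched as follows (a simplified,
self-contained version; the real code works on the pat_ref list and its own
data structures):

    #include <stdlib.h>

    static int cmp_uint(const void *a, const void *b)
    {
        unsigned int ua = *(const unsigned int *)a, ub = *(const unsigned int *)b;
        return (ua > ub) - (ua < ub);
    }

    /* gives an id to every entry of <ids> that is still 0, O(n log n) overall */
    static void assign_missing_uids(unsigned int *ids, size_t n)
    {
        unsigned int *used = malloc(n * sizeof(*used));
        size_t nused = 0, i, j = 0;
        unsigned int next = 1;

        if (!used)
            return;
        for (i = 0; i < n; i++)
            if (ids[i])
                used[nused++] = ids[i];          /* user-assigned uids */
        qsort(used, nused, sizeof(*used), cmp_uint);

        for (i = 0; i < n; i++) {
            if (ids[i])
                continue;
            while (j < nused && used[j] <= next) {   /* skip taken uids */
                if (used[j] == next)
                    next++;
                j++;
            }
            ids[i] = next++;
        }
        free(used);
    }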
Please backport to 1.8, 2.0 and 2.1.
The isalnum(), isalpha(), isdigit() etc functions from ctype.h are
supposed to take an int in argument which must either reflect an
unsigned char or EOF. In practice on some platforms they're implemented
as macros referencing an array, and when passed a char, they either cause
a warning "array subscript has type 'char'" when lucky, or cause random
segfaults when unlucky. It's quite inconvenient by the way since none of
them may return true for negative values. The recent introduction of
cygwin to the list of regularly tested build platforms revealed a lot
of breakage there due to the same issues again.
So this patch addresses the problem all over the code at once. It adds
unsigned char casts to every valid use case, and also drops the unneeded
double cast to int that was sometimes added on top of it.
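The pattern applied throughout the code looks like this (a trivial example,
not taken from a specific file):

    #include <ctype.h>

    static const char *skip_spaces(const char *p)
    {
        /* the cast keeps negative chars from indexing outside the table */
        while (*p && isspace((unsigned char)*p))
            p++;
        return p;
    }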
It may be backported by dropping irrelevant changes if that helps better
support uncommon platforms. It's unlikely to fix bugs on platforms which
would already not emit any warning though.
This patch introduces the 'http-after-response' rules. These rules are evaluated
at the end of the response analysis, just before the data forwarding, on ALL
HTTP responses, the server ones but also all responses generated by
HAProxy. Thanks to this ruleset, it is now possible for instance to add some
headers to the responses generated by the stats applet. The following actions
are supported:
* allow
* add-header
* del-header
* replace-header
* replace-value
* set-header
* set-status
* set-var
* strict-mode
* unset-var
When the global structure is initialized, instead of setting tune.maxrewrite to
-1, its default value can be immediately set. This way, it is always defined
during the configuration validity check. Otherwise, the only way to have it at
this stage is to explicitly set it in the global section.
Released version 2.2-dev1 with the following main changes:
- DOC: this is development again
- MINOR: version: this is development again, update the status
- SCRIPTS: update create-release to fix the changelog on new branches
- CLEANUP: ssl: Clean up error handling
- BUG/MINOR: contrib/prometheus-exporter: decode parameter and value only
- BUG/MINOR: h1: Don't test the host header during response parsing
- BUILD/MINOR: trace: fix use of long type in a few printf format strings
- DOC: Clarify behavior of server maxconn in HTTP mode
- MINOR: ssl: deduplicate ca-file
- MINOR: ssl: compute ca-list from deduplicate ca-file
- MINOR: ssl: deduplicate crl-file
- CLEANUP: dns: resolution can never be null
- BUG/MINOR: http-htx: Don't make http_find_header() fail if the value is empty
- DOC: ssl/cli: set/commit/abort ssl cert
- BUG/MINOR: ssl: fix SSL_CTX_set1_chain compatibility for openssl < 1.0.2
- BUG/MINOR: fcgi-app: Make the directive pass-header case insensitive
- BUG/MINOR: stats: Fix HTML output for the frontends heading
- BUG/MINOR: ssl: fix X509 compatibility for openssl < 1.1.0
- DOC: clarify matching strings on binary fetches
- DOC: Fix ordered list in summary
- DOC: move the "group" keyword at the right place
- MEDIUM: init: prevent process and thread creation at runtime
- BUG/MINOR: ssl/cli: 'ssl cert' cmd only usable w/ admin rights
- BUG/MEDIUM: stream-int: don't subscribe for recv when we're trying to flush data
- BUG/MINOR: stream-int: avoid calling rcv_buf() when splicing is still possible
- BUG/MINOR: ssl/cli: don't overwrite the filters variable
- BUG/MEDIUM: listener/thread: fix a race when pausing a listener
- BUG/MINOR: ssl: certificate choice can be unexpected with openssl >= 1.1.1
- BUG/MEDIUM: mux-h1: Never reuse H1 connection if a shutw is pending
- BUG/MINOR: mux-h1: Don't rely on CO_FL_SOCK_RD_SH to set H1C_F_CS_SHUTDOWN
- BUG/MINOR: mux-h1: Fix conditions to know whether or not we may receive data
- BUG/MEDIUM: tasks: Make sure we switch wait queues in task_set_affinity().
- BUG/MEDIUM: checks: Make sure we set the task affinity just before connecting.
- MINOR: debug: replace popen() with pipe+fork() in "debug dev exec"
- MEDIUM: init: set NO_NEW_PRIVS by default when supported
- BUG/MINOR: mux-h1: Be sure to set CS_FL_WANT_ROOM when EOM can't be added
- BUG/MEDIUM: mux-fcgi: Handle cases where the HTX EOM block cannot be inserted
- BUG/MINOR: proxy: make soft_stop() also close FDs in LI_PAUSED state
- BUG/MINOR: listener/threads: always use atomic ops to clear the FD events
- BUG/MINOR: listener: also clear the error flag on a paused listener
- BUG/MEDIUM: listener/threads: fix a remaining race in the listener's accept()
- MINOR: listener: make the wait paths cleaner and more reliable
- MINOR: listener: split dequeue_all_listener() in two
- REORG: listener: move the global listener queue code to listener.c
- DOC: document the listener state transitions
- BUG/MEDIUM: kqueue: Make sure we report read events even when no data.
- BUG/MAJOR: dns: add minimalist error processing on the Rx path
- BUG/MEDIUM: proto_udp/threads: recv() and send() must not be exclusive.
- DOC: listeners: add a few missing transitions
- BUG/MINOR: tasks: only requeue a task if it was already in the queue
- MINOR: tasks: split wake_expired_tasks() in two parts to avoid useless wakeups
- DOC: proxies: HAProxy only supports 3 connection modes
- DOC: remove references to the outdated architecture.txt
- BUG/MINOR: log: fix minor resource leaks on logformat error path
- BUG/MINOR: mworker: properly pass SIGTTOU/SIGTTIN to workers
- BUG/MINOR: listener: do not immediately resume on transient error
- BUG/MINOR: server: make "agent-addr" work on default-server line
- BUG/MINOR: listener: fix off-by-one in state name check
- BUILD/MINOR: unix sockets: silence an absurd gcc warning about strncpy()
- MEDIUM: h1-htx: Add HTX EOM block when the message is in H1_MSG_DONE state
- MINOR: http-htx: Add some htx sample fetches for debugging purpose
- REGTEST: Add an HTX reg-test to check an edge case
- DOC: clarify the fact that replace-uri works on a full URI
- BUG/MINOR: sample: fix the closing bracket and LF in the debug converter
- BUG/MINOR: sample: always check converters' arguments
- MINOR: sample: Validate the number of bits for the sha2 converter
- BUG/MEDIUM: ssl: Don't set the max early data we can receive too early.
- MINOR: ssl/cli: 'show ssl cert' give information on the certificates
- BUG/MINOR: ssl/cli: fix build for openssl < 1.0.2
- MINOR: debug: support logging to various sinks
- MINOR: http: add a new "replace-path" action
- REGTEST: ssl: test the "set ssl cert" CLI command
- REGTEST: run-regtests: implement #REQUIRE_BINARIES
- MINOR: task: only check TASK_WOKEN_ANY to decide to requeue a task
- BUG/MAJOR: task: add a new TASK_SHARED_WQ flag to fix foreign requeuing
- BUG/MEDIUM: ssl: Revamp the way early data are handled.
- MINOR: fd/threads: make _GET_NEXT()/_GET_PREV() use the volatile attribute
- BUG/MEDIUM: fd/threads: fix a concurrency issue between add and rm on the same fd
- REGTEST: make the "set ssl cert" require version 2.1
- BUG/MINOR: ssl: openssl-compat: Fix getm_ defines
- BUG/MEDIUM: state-file: do not allocate a full buffer for each server entry
- BUG/MINOR: state-file: do not store duplicates in the global tree
- BUG/MINOR: state-file: do not leak memory on parse errors
- BUG/MAJOR: mux-h1: Don't pretend the input channel's buffer is full if empty
- BUG/MEDIUM: stream: Be sure to never assign a TCP backend to an HTX stream
- BUILD: ssl: improve SSL_CTX_set_ecdh_auto compatibility
- BUILD: travis-ci: link with ssl libraries using rpath instead of LD_LIBRARY_PATH/DYLD_LIBRARY_PATH
- BUILD: travis-ci: reenable address sanitizer for clang builds
- BUG/MINOR: checks: refine which errno values are really errors.
- BUG/MINOR: connection: only wake send/recv callbacks if the FD is active
- CLEANUP: connection: conn->xprt is never NULL
- MINOR: pollers: add a new flag to indicate pollers reporting ERR & HUP
- MEDIUM: tcp: make tcp_connect_probe() consider ERR/HUP
- REORG: connection: move tcp_connect_probe() to conn_fd_check()
- MINOR: connection: check for connection validation earlier
- MINOR: connection: remove the double test on xprt_done_cb()
- CLEANUP: connection: merge CO_FL_NOTIFY_DATA and CO_FL_NOTIFY_DONE
- MINOR: poller: do not call the IO handler if the FD is not active
- OPTIM: epoll: always poll for recv if neither active nor ready
- OPTIM: polling: do not create update entries for FD removal
- BUG/MEDIUM: checks: Only attempt to do handshakes if the connection is ready.
- BUG/MEDIUM: connections: Hold the lock when wanting to kill a connection.
- BUILD: CI: modernize cirrus-ci
- MINOR: config: disable busy polling on old processes
- MINOR: ssl: Remove unused variable "need_out".
- BUG/MINOR: h1: Report the right error position when a header value is invalid
- BUG/MINOR: proxy: Fix input data copy when an error is captured
- BUG/MEDIUM: http-ana: Truncate the response when a redirect rule is applied
- BUG/MINOR: channel: inject output data at the end of output
- BUG/MEDIUM: session: do not report a failure when rejecting a session
- MEDIUM: dns: implement synchronous send
- MINOR: raw_sock: make sure to disable polling once everything is sent
- MINOR: http: Add 410 to http-request deny
- MINOR: http: Add 404 to http-request deny
- CLEANUP: mux-h2: remove unused goto "out_free_h2s"
- BUILD: cirrus-ci: choose proper openssl package name
- BUG/MAJOR: listener: do not schedule a task-less proxy
- CLEANUP: server: remove unused err section in server_finalize_init
- REGTEST: set_ssl_cert.vtc: replace "echo" with "printf"
- BUG/MINOR: stream-int: Don't trigger L7 retry if max retries is already reached
- BUG/MEDIUM: tasks: Use the MT macros in tasklet_free().
- BUG/MINOR: mux-h2: use a safe list_for_each_entry in h2_send()
- BUG/MEDIUM: mux-h2: fix missing test on sending_list in previous patch
- CLEANUP: ssl: remove opendir call in ssl_sock_load_cert
- MEDIUM: lua: don't call the GC as often when dealing with outgoing connections
- BUG/MEDIUM: mux-h2: don't stop sending when crossing a buffer boundary
- BUG/MINOR: cli/mworker: can't start haproxy with 2 programs
- REGTEST: mcli/mcli_start_progs: start 2 programs
- BUG/MEDIUM: mworker: remain in mworker mode during reload
- DOC: clarify crt-base usage
- CLEANUP: compression: remove unused deinit_comp_ctx section
- BUG/MEDIUM: mux_h1: Don't call h1_send if we subscribed().
- BUG/MEDIUM: raw_sock: Make sure the fd and conn are in sync.
- CLEANUP: proxy: simplify proxy_parse_rate_limit proxy checks
- BUG/MAJOR: hashes: fix the signedness of the hash inputs
- REGTEST: add sample_fetches/hashes.vtc to validate hashes
- BUG/MEDIUM: cli: _getsocks must send the peers sockets
- CLEANUP: cli: deduplicate the code in _getsocks
- BUG/MINOR: stream: don't mistake match rules for store-request rules
- BUG/MEDIUM: connection: add a mux flag to indicate splice usability
- BUG/MINOR: pattern: handle errors from fgets when trying to load patterns
- MINOR: connection: move the CO_FL_WAIT_ROOM cleanup to the reader only
- MINOR: stream-int: remove dependency on CO_FL_WAIT_ROOM for rcv_buf()
- MEDIUM: connection: get rid of CO_FL_CURR_* flags
- BUILD: pattern: include errno.h
- MEDIUM: mux-h2: do not try to stop sending streams on blocked mux
- MEDIUM: mux-fcgi: do not try to stop sending streams on blocked mux
- MEDIUM: mux-h2: do not make an h2s subscribe to itself on deferred shut
- MEDIUM: mux-fcgi: do not make an fstrm subscribe to itself on deferred shut
- REORG: stream/backend: move backend-specific stuff to backend.c
- MEDIUM: backend: move the connection finalization step to back_handle_st_con()
- MEDIUM: connection: merge the send_wait and recv_wait entries
- MEDIUM: xprt: merge recv_wait and send_wait in xprt_handshake
- MEDIUM: ssl: merge recv_wait and send_wait in ssl_sock
- MEDIUM: mux-h1: merge recv_wait and send_wait
- MEDIUM: mux-h2: merge recv_wait and send_wait event notifications
- MEDIUM: mux-fcgi: merge recv_wait and send_wait event notifications
- MINOR: connection: make the last arg of subscribe() a struct wait_event*
- MINOR: ssl: Add support for returning the dn samples from ssl_(c|f)_(i|s)_dn in LDAP v3 (RFC2253) format.
- DOC: Fix copy and paste mistake in http-response replace-value doc
- BUG/MINOR: cache: Fix leak of cache name in error path
- BUG/MINOR: dns: Make dns_query_id_seed unsigned
- BUG/MINOR: 51d: Fix bug when HTX is enabled
- MINOR: http-htx: Move htx sample fetches in the scope "internal"
- MINOR: http-htx: Rename 'internal.htx_blk.val' to 'internal.htx_blk.data'
- MINOR: http-htx: Make 'internal.htx_blk_data' return a binary string
- DOC: Add a section to document the internal sample fetches
- MINOR: mux-h1: Inherit send flags from the upper layer
- MINOR: contrib/prometheus-exporter: Add healthcheck status/code in server metrics
- BUG/MINOR: http-ana/filters: Wait end of the http_end callback for all filters
- BUG/MINOR: http-rules: Remove buggy deinit functions for HTTP rules
- BUG/MINOR: stick-table: Use MAX_SESS_STKCTR as the max track ID during parsing
- MEDIUM: http-rules: Register an action keyword for all http rules
- MINOR: tcp-rules: Always set from which ruleset a rule comes from
- MINOR: actions: Use ACT_RET_CONT code to ignore an error from a custom action
- MINOR: tcp-rules: Kill connections when custom actions return ACT_RET_ERR
- MINOR: http-rules: Return an error when custom actions return ACT_RET_ERR
- MINOR: counters: Add a counter to report internal processing errors
- MEDIUM: http-ana: Properly handle internal processing errors
- MINOR: http-rules: Add a rule result to report internal error
- MINOR: http-rules: Handle internal errors during HTTP rules evaluation
- MINOR: http-rules: Add more return codes to let custom actions act as normal ones
- MINOR: tcp-rules: Handle denied/aborted/invalid connections from TCP rules
- MINOR: http-rules: Handle denied/aborted/invalid connections from HTTP rules
- MINOR: stats: Report internal errors in the proxies/listeners/servers stats
- MINOR: contrib/prometheus-exporter: Export internal errors per proxy/server
- MINOR: counters: Remove failed_secu counter and use denied_resp instead
- MINOR: counters: Review conditions to increment counters from analysers
- MINOR: http-ana: Add a txn flag to support soft/strict message rewrites
- MINOR: http-rules: Handle all message rewrites the same way
- MINOR: http-rules: Add a rule to enable or disable the strict rewriting mode
- MEDIUM: http-rules: Enable the strict rewriting mode by default
- REGTEST: Fix format of set-uri HTTP request rule in h1or2_to_h1c.vtc
- MINOR: actions: Add a function pointer to release args used by actions
- MINOR: actions: Regroup some info about HTTP rules in the same struct
- MINOR: http-rules/tcp-rules: Call the defined action function first if defined
- MINOR: actions: Rename the act_flag enum into act_opt
- MINOR: actions: Add flags to configure the action behaviour
- MINOR: actions: Use an integer to set the action type
- MINOR: http-rules: Use a specific action type for some custom HTTP actions
- MINOR: http-rules: Make replace-header and replace-value custom actions
- MINOR: http-rules: Make set-header and add-header custom actions
- MINOR: http-rules: Make set/del-map and add/del-acl custom actions
- MINOR: http-rules: Group all processing of early-hint rule in its case clause
- MEDIUM: http-rules: Make early-hint custom actions
- MINOR: http-rule/tcp-rules: Make track-sc* custom actions
- MINOR: tcp-rules: Make tcp-request capture a custom action
- MINOR: http-rules: Add release functions for existing HTTP actions
- BUG/MINOR: http-rules: Fix memory releases on error path during action parsing
- MINOR: tcp-rules: Add release functions for existing TCP actions
- BUG/MINOR: tcp-rules: Fix memory releases on error path during action parsing
- MINOR: http-htx: Add functions to read a raw error file and convert it in HTX
- MINOR: http-htx: Add functions to create HTX redirect message
- MINOR: config: Use dedicated function to parse proxy's errorfiles
- MINOR: config: Use dedicated function to parse proxy's errorloc
- MEDIUM: http-htx/proxy: Use a global and centralized storage for HTTP error messages
- MINOR: proxy: Register keywords to parse errorfile and errorloc directives
- MINOR: http-htx: Add a new section to create groups of custom HTTP errors
- MEDIUM: proxy: Add a directive to reference an http-errors section in a proxy
- MINOR: http-rules: Update txn flags and status when a deny rule is executed
- MINOR: http-rules: Support an optional status on deny rules for http responses
- MINOR: http-rules: Use same function to parse request and response deny actions
- MINOR: http-ana: Add an error message in the txn and send it when defined
- MEDIUM: http-rules: Support an optional error message in http deny rules
- REGTEST: Add a strict rewriting mode reg test
- REGEST: Add reg tests about error files
- MINOR: ssl: accept 'verify' bind option with 'set ssl cert'
- BUG/MINOR: ssl: ssl_sock_load_ocsp_response_from_file memory leak
- BUG/MINOR: ssl: ssl_sock_load_issuer_file_into_ckch memory leak
- BUG/MINOR: ssl: ssl_sock_load_sctl_from_file memory leak
- BUG/MINOR: http_htx: Fix some leaks on error path when error files are loaded
- CLEANUP: http-ana: Remove useless test on txn when the error message is retrieved
- BUILD: CI: introduce ARM64 builds
- BUILD: ssl: more elegant anti-replay feature presence check
- MINOR: proxy/http-ana: Add support of extra attributes for the cookie directive
- MEDIUM: dns: use Additional records from SRV responses
- CLEANUP: Consistently `unsigned int` for bitfields
- CLEANUP: pattern: remove the pat_time definition
- BUG/MINOR: http_act: don't check capture id in backend
- BUG/MINOR: ssl: fix build on development versions of openssl-1.1.x
It is now possible to insert any attribute when a cookie is inserted by
HAProxy. Any value may be set, no check is performed except the syntax validity
(CTRL chars and ';' are forbidden). For instance, it may be used to add the
SameSite attribute:
cookie SRV insert attr "SameSite=Strict"
The attr option may be repeated to add several attributes.
This patch should fix the issue #361.
All custom HTTP errors are now stored in a global tree. Proxies use references
to these messages. The key used for errorfile directives is the file name as
specified in the configuration. For errorloc directives, a key is created using
the redirect code and the url. This means that the same custom error message is
now stored only once. It may be used in several proxies or for several status
codes, but it is only parsed and stored once.
Arguments used by actions are never released during HAProxy deinit. Now, it is
possible to specify a function to do so. ".release_ptr" field in the act_rule
structure may be set during the configuration parsing to a specific deinit
function depending on the action type.
Functions to deinitialize the HTTP rules are buggy. These functions do not
check the action name to release the right part in the arg union. Only a few
pieces of info are released. For auth rules, the realm is released and there is
no problem here. But the regex <arg.hdr_add.re> is always unconditionally
released. So it is easy to make these functions crash. For instance, with the
following rule HAProxy crashes during the deinit:
http-request set-map(/path/to/map) %[src] %[req.hdr(X-Value)]
For now, these functions are simply removed and we rely on the deinit function
used for TCP rules (renamed as deinit_act_rules()). This patch fixes the
bug. But arguments used by actions are not released at all; this part will be
addressed later.
This patch must be backported to all stable versions.
If you reload an haproxy started in master-worker mode with
"master-worker" in the configuration, and no "-W" argument,
the new process lost the fact that it was in master-worker mode,
resulting in weird behaviors.
The biggest problem is that if it is reloaded with a bad configuration,
the master will exit instead of remaining in waitpid mode.
This problem was discovered in bug #443.
Should be backported in every version using the master-worker mode.
(as far as 1.8)
Apparently gcc developers decided that strncpy() semantics are no longer
valid and now deserve a warning, especially if used exactly as designed.
This results in issue #304. Let's just remove one to the target size to
please her majesty gcc, the God of C Compilers, who tries hard to make
users completely eliminate any use of string.h and reimplement it by
themselves at much higher risks. Pfff....
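The change boils down to this pattern (illustrative buffer and names, not the
exact code):

    #include <string.h>

    static void copy_name(char *dst, size_t dstsize, const char *src)
    {
        /* copy one byte less than the destination size to please gcc; the
         * explicit NUL below is what the code already did, so behaviour is
         * unchanged */
        strncpy(dst, src, dstsize - 1);
        dst[dstsize - 1] = 0;
    }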
This can be backported to stable versions, the fix is harmless since it
ignores the last zero that is already set on the next line.
If a new process is started with -sf and it fails to bind, it may send
a SIGTTOU to the master process in hope that it will temporarily unbind.
Unfortunately this one doesn't catch it and gets stopped into the background
instead of forwarding the signal to the workers. The same is true for SIGTTIN.
This commit simply implements an extra signal handler for the master to
deal with such signals that must be passed down to the workers. It must
be backported as far as 1.8, though there the code differs in that it's
entirely in haproxy.c and doesn't require an extra sig handler.
We used to have wake_expired_tasks() wake up tasks and return the next
expiration delay. The problem this causes is that we have to call it just
before poll() in order to consider latest timers, but this also means that
we don't wake up all newly expired tasks upon return from poll(), which
thus systematically requires a second poll() round.
This is visible when running any scheduled task like a health check, as there
are systematically two poll() calls, one with the interval, nothing is done
after it, and another one with a zero delay, and the task is called:
listen test
bind *:8001
server s1 127.0.0.1:1111 check
09:37:38.200959 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8696843}) = 0
09:37:38.200967 epoll_wait(3, [], 200, 1000) = 0
09:37:39.202459 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8712467}) = 0
>> nothing run here, as the expired task was not woken up yet.
09:37:39.202497 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8715766}) = 0
09:37:39.202505 epoll_wait(3, [], 200, 0) = 0
09:37:39.202513 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8719064}) = 0
>> now the expired task was woken up
09:37:39.202522 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
09:37:39.202537 fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
09:37:39.202565 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
09:37:39.202577 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
09:37:39.202585 connect(7, {sa_family=AF_INET, sin_port=htons(1111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:37:39.202659 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
09:37:39.202673 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8814713}) = 0
09:37:39.202683 epoll_wait(3, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}], 200, 1000) = 1
09:37:39.202693 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8818617}) = 0
09:37:39.202701 getsockopt(7, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
09:37:39.202715 close(7) = 0
Let's instead split the function in two parts:
- the first part, wake_expired_tasks(), called just before
process_runnable_tasks(), wakes up all expired tasks; it doesn't
compute any timeout.
- the second part, next_timer_expiry(), called just before poll(),
only computes the next timeout for the current thread.
Thanks to this, all expired tasks are properly woken up when leaving
poll, and each poll call's timeout remains up to date:
09:41:16.270449 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10223556}) = 0
09:41:16.270457 epoll_wait(3, [], 200, 999) = 0
09:41:17.270130 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10238572}) = 0
09:41:17.270157 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
09:41:17.270194 fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
09:41:17.270204 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
09:41:17.270216 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
09:41:17.270224 connect(7, {sa_family=AF_INET, sin_port=htons(1111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:41:17.270299 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
09:41:17.270314 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10337841}) = 0
09:41:17.270323 epoll_wait(3, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}], 200, 1000) = 1
09:41:17.270332 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10341860}) = 0
09:41:17.270340 getsockopt(7, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
09:41:17.270367 close(7) = 0
This may be backported to 2.1 and 2.0 though it's unlikely to bring any
user-visible improvement except to clarify debugging.
The global listener queue code and declarations were still lying in
haproxy.c while not needed there anymore at all. This complicates
the code for no reason. As a result, the global_listener_queue_task
and the global_listener_queue were made static.
We use it half of the time for the global_listener_queue and half of the time
for a proxy's queue, and this requires the callers to take care of
these. Let's split it in two versions, the current one working only
on the global queue and another one dedicated to proxies for the
per-proxy queues. This cleans up quite a bit of code.
HAProxy doesn't need to call executables at run time (except when using
external checks which are strongly recommended against), and is even expected
to isolate itself into an empty chroot. As such, there basically is no valid
reason to allow a setuid executable to be called without the user being fully
aware of the risks. In a situation where haproxy would need to call external
checks and/or disable chroot, exploiting a vulnerability in a library or in
haproxy itself could lead to the execution of an external program. On Linux
it is possible to lock the process so that any setuid bit present on such an
executable is ignored. This significantly reduces the risk of privilege
escalation in such a situation. This is what haproxy does by default. In case
this causes a problem to an external check (for example one which would need
the "ping" command), then it is possible to disable this protection by
explicitly adding this directive in the global section. If enabled, it is
possible to turn it back off by prefixing it with the "no" keyword.
Before the option:
$ socat - /tmp/sock1 <<< "expert-mode on; debug dev exec sudo /bin/id"
uid=0(root) gid=0(root) groups=0(root)
After the option:
$ socat - /tmp/sock1 <<< "expert-mode on; debug dev exec sudo /bin/id"
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the
'nosuid' option set or an NFS file system without root privileges?
Some concerns are regularly raised about the risk of inheriting some Lua
files which make use of a fork (e.g. via os.execute()) as well as
whether or not some of the bugs we fix might be exploitable to run
some code. Given that haproxy is event-driven, any foreground activity
completely stops processing and is easy to detect, but background
activity is a different story. A Lua script could very well discreetly
fork a sub-process connecting to a remote location and taking commands,
and some injected code could also try to hide its activity by creating
a process or a thread without blocking the rest of the processing. While
such activities should be extremely limited when run in an empty chroot
without any permission, it would be better to get a higher assurance
they cannot happen.
This patch introduces something very simple: it limits the number of
processes and threads to zero in the workers after the last thread was
created. By doing so, it effectively instructs the system to fail on
any fork() or clone() syscall. Thus any undesired activity has to happen
in the foreground and is way easier to detect.
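A minimal sketch of that mechanism (the real code only differs in when it is
called and how the failure is reported):

    #include <sys/resource.h>

    static void forbid_forks(void)
    {
        /* with RLIMIT_NPROC at zero, any later fork()/clone() fails (EAGAIN) */
        struct rlimit limit = { .rlim_cur = 0, .rlim_max = 0 };

        if (setrlimit(RLIMIT_NPROC, &limit) == -1) {
            /* some systems may refuse it; only a warning is emitted then */
        }
    }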
This will obviously break external checks (whose concept is already
totally insecure), and for this reason a new option
"insecure-fork-wanted" was added to disable this protection, and it
is suggested in the fork() error report from the checks. It is
obviously recommended not to use it and to reconsider the reasons
leading to it being enabled in the first place.
If for any reason we fail to disable forks, we still start because it
could be imaginable that some operating systems refuse to set this
limit to zero, but in this case we emit a warning, that may or may not
be reported since we're after the fork point. Ideally, over the long
term, it should be conditioned by strict-limits and cause a hard fail.
The link to the known bugs page for the current version is built and
reported there. When it is a development version (less than 2 dots), a link
to the github open issues is reported instead, as there's no way to
be sure about the current situation in this case and it's better that
users report their trouble there.
As discussed on Discourse here:
https://discourse.haproxy.org/t/haproxy-branch-support-lifetime/4466
it's not always easy for end users to know the lifecycle of the version
they are using. This patch introduces a "Status" line in the output of
"haproxy -vv" indicating whether it's a development, stable, long-term
supported version, possibly with an estimated end of life for the branch
when it can be anticipated (e.g. for stable versions). This field should
be adjusted when creating a major release to reflect the new status.
It may make sense to backport this to other branches to clarify the
situation.
Upon a reexec_on_failure, if the process tried to exit after the
initialization of the process structure but before it was filled with a
PID, the PID in the mworker_proc structure is set to -1.
In this particular case the -sf argument is filled with -1 and haproxy
will exit with the usage message because of that argument.
Should be backported in 2.0.
since the introduction of mworker, the setuid/setgid was duplicated in
two places; try to improve that by creating a dedicated function.
this patch does not introduce any functional change.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
in mworker mode used with uid/gid settings, it was not possible to get
a coredump despite the set-dumpable option.
indeed the prctl(2) manual page specifies the dumpable attribute is
reverted to `/proc/sys/fs/suid_dumpable` under a few conditions, such as
when the process's effective user or group is changed.
this patch moves the whole set-dumpable logic before the polling code in
order to catch all possible cases where we could have changed the
uid/gid. It however does not cover the possible segfault at startup.
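for illustration, the idea is simply to re-apply the flag once the uid/gid
switch is done (sketch, error handling omitted):

    #include <sys/prctl.h>

    setuid(global.uid);                  /* clears the dumpable attribute */
    prctl(PR_SET_DUMPABLE, 1, 0, 0, 0);  /* restore it after the switch   */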
this patch should be backported in 2.0.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
this patch introduces a strict-limits parameter which enforces the
setrlimit setting instead of a warning. This option can be forcibly
disabled with the "no" keyword.
The general aim of this patch is to avoid bad surprises on a production
environment where you change the maxconn for example, a new fd limit is
calculated, but cannot be set because of sysfs setting. In that case you
might want to have an explicit failure to be aware of it before seeing
your traffic going down. During a global rollout it is also useful to
explictly fail as most progressive rollout would simply check the
general health check of the process.
As discussed, plan to use the strict by default mode starting from v2.3.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Patch 56996da ("BUG/MINOR: mworker/ssl: close OpenSSL FDs on reload")
fixes an issue where the /dev/random FD was leaked by OpenSSL upon a
reload in master worker mode. Indeed the FD was not flagged with
CLOEXEC.
The fix was checking if ssl_used_frontend or ssl_used_backend were set
before closing the FD. This is wrong: the Lua init code creates an SSL
server without increasing the backend counter, so the deinit is never
done when you don't use SSL in your configuration.
To reproduce the problem you just need to build haproxy with openssl and
lua, using an openssl which does not use the getrandom() syscall. Neither
an openssl nor a lua configuration is required in haproxy.
This patch must be backported as far as 1.8.
Fix issue #314.
Now that we can wake tasklets for other threads, make sure that if the thread
is sleeping, we wake it up, or the tasklet won't be executed until it's
done sleeping.
That also means that, before going to sleep, and after we put our bit
in sleeping_thread_mask, we have to check that nobody added a tasklet for
us, just checking for global_tasks_mask isn't enough anymore.
The aim is to gather all scheduler information related to the current
thread. It simply points to task_per_thread[tid] without having to perform
the operation each time. We save around 1.2 kB of code on performance
sensitive paths and increase the request rate by almost 1%.
Instead of using the same type for regular linked lists and "autolocked"
linked lists, use a separate type, "struct mt_list", for the autolocked one,
and introduce a set of macros, similar to the LIST_* macros, with the
MT_ prefix.
When we use the same entry for both a regular list and an autolocked list,
as is done for the "list" field in struct connection, we now have to
explicitly cast it to struct mt_list when using the MT_ macros.
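For example, given a struct connection *conn (a sketch; macro names such as
MT_LIST_INIT()/MT_LIST_ADDQ() follow the LIST_* convention but may differ):

    struct mt_list queue;                 /* autolocked list head */
    MT_LIST_INIT(&queue);

    /* conn->list is declared as a plain struct list but is shared here,
     * so it has to be explicitly cast when used with the MT_ macros.
     */
    MT_LIST_ADDQ(&queue, (struct mt_list *)&conn->list);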
Most of the time, when a keyword is added in the proxy section or on the server line,
we need to have a post-parser callback to check the config validity for the
proxy or the server which uses this keyword.
It is possible to register a global post-parser callback. But all these
callbacks need to loop on the proxies and servers to do their job. It is neither
handy nor efficient. Instead, it is now possible to register per-proxy and
per-server post-check callbacks.
Most of the time, when any allocation is done during configuration parsing
because of a new keyword in the proxy section or on the server line, we must
add a call in the deinit() function to release the allocated resources. It is
not possible to do so in a post-deinit callback because, at this stage, the
proxies and the servers are already released.
Now, it is possible to register deinit callbacks per-proxy or per-server. These
callbacks will be called for each proxy and server before releasing them.
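As an illustration, registering such callbacks could look like this (a
sketch; the macro names, callback signatures and the my_opt field are
assumptions):

    /* called once for each proxy after the configuration is parsed */
    static int check_my_keyword(struct proxy *px)
    {
        if (px->my_opt && !px->timeout.connect)
            ha_warning("proxy '%s': 'my-keyword' needs 'timeout connect'.\n", px->id);
        return 0;
    }
    REGISTER_POST_PROXY_CHECK(check_my_keyword);

    /* called once for each proxy right before it is released */
    static void free_my_keyword(struct proxy *px)
    {
        free(px->my_opt);
    }
    REGISTER_PROXY_DEINIT(free_my_keyword);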
If haproxy is built with profiling enabled with -pg, it is possible to
see the master quit during a reload while it's re-executing itself with
error code 155 (signal 27) saying "Profiling timer expired". This happens
if the SIGPROF signal is delivered during the execve() call while the
handler was already unregistered. The issue itself is not directly inside
haproxy but it's easy to address. This patch disables this signal before
calling execvp() during a master reload. A simple test for this consists
in running this little script with haproxy started in master-worker mode :
$ while usleep 50000; do killall -USR2 haproxy; done
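The change itself amounts to something like this right before the re-exec
(sketch; the argv variable name is illustrative):

    #include <signal.h>

    /* -pg installs a SIGPROF handler which does not survive execve();
     * ignore the signal so that a late tick cannot kill the new master.
     */
    signal(SIGPROF, SIG_IGN);
    execvp(next_argv[0], next_argv);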
This fix should be backported to all versions using the master-worker
model.
Now that the architecture was changed so that attempts to receive/send data
always come from the upper layers, instead of them only trying to do so when
the lower layer lets them know they could try, we can finally get rid of the
fd cache. We don't really need it anymore, and removing it gives us a small
performance boost.
Default HTTP error messages are stored in an array of chunks. And since the HTX
was added, these messages are also converted to HTX and stored in another
array. But now, the first array is not used anymore because the legacy HTTP mode
was removed.
So now, only the array with the HTX messages is kept. The other one was
removed.
By default, this tune parameter is set to MAX_HTTP_HDR. This assignment is done
after the configuration parsing, when we check the configuration validity. So
during the configuration parsing, its value is 0. Now, it is set to MAX_HTTP_HDR
from the start. So, it is possible to rely on it during the configuration
parsing.
The keywords req* and rsp* are now unsupported. So the corresponding lists are
now unused. It is safe to remove them from the structure proxy.
As a result, the code dealing with these rules in HTTP analyzers was also
removed.
The old module proto_http does not exist anymore. All code dedicated to the HTTP
analysis is now grouped in the file proto_htx.c. So, to finish the polishing
after removing the legacy HTTP code, proto_htx.{c,h} files have been moved to
http_ana.{c,h} files.
In addition, all HTX analyzers and related functions prefixed with "htx_" have
been renamed to start with "http_" instead.
First of all, all legacy HTTP analyzers and all functions exclusively used by
them were removed. So most of the functions in proto_http.{c,h} were
removed. Only functions to deal with the HTTP transaction have been kept. Then,
the http_msg and hdr_idx modules were entirely removed. And finally the structure
http_msg was stripped of all the information only useful for the legacy HTTP. The
structure hdr_ctx was also removed because it is now unused, just like the unused
states in the enum h1_state. Note that the memory pool "hdr_idx" was removed and
"http_txn" is now smaller.
Since commit 81492c989 ("MINOR: threads: flatten the per-thread cpu-map"),
we don't keep the proc*thread matrix anymore to represent the full binding
possibilities, but only the proc and thread ones. The problem is that the
per-process binding is not the same for each thread and for the process,
and the proc[] array was assumed to store the per-proc first thread value
when doing this change. Worse, the logic present there tries to deal with
thread ranges and process ranges in a way which automatically excludes the
other possibility (since ranges cannot be used on both) but as such fails
to apply changes if neither the process nor the thread is expressed as a
range.
The real problem comes from the fact that specifying cpu-map 1/1 doesn't
yet reveal if the per-process mask or the per-thread mask needs to be
updated. In practice it's the thread one but then the current storage
doesn't allow to store the binding of the first thread of each other
process in nbproc>1 configurations.
When removing the proc*thread matrix, what ought to have been kept was
both the thread column for process 1 and the process line for threads 1,
but instead only the thread column was kept. This patch reintroduces the
storage of the configuration for the first thread of each process so that
it is again possible to store either the per-thread or per-process
configuration.
As a partial workaround for existing configurations, it is possible to
systematically indicate at least two processes or two threads at once
and map them by pairs or more so that at least two values are present
in the range. E.g :
# set processes 1-4 to cpus 0-3 :
cpu-map auto:1-4/1 0 1 2 3
# or:
cpu-map 1-2/1 0 1
cpu-map 2-3/1 2 3
# set threads 1-4 to cpus 0-3 :
cpu-map auto:1/1-4 0 1 2 3
# or :
cpu-map 1/1-2 0 1
cpu-map 3/3-4 2 3
This fix must be backported to 2.0.
Before switching to wait mode, the per-thread deinit should not be
called, because we didn't initialize the threads and fdtab.
The problem is that the master could crash if we try to reload HAProxy.
The commit 944e619 ("MEDIUM: mworker: wait mode use standard init code
path") removed the deinit code by accident, but its fix 7c756a8
("BUG/MEDIUM: mworker: fix FD leak upon reload") was incomplete and did
not take care of the WAIT_MODE.
This fix must be backported in 1.9 and 2.0
getpid() is documented as returning a pid_t result, not
necessarily an int. This causes a build warning on Solaris 10
because '%d' or '%u' are used in the format passed to snprintf().
Let's just cast the result as an int (respectively unsigned int).
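For instance:

    /* pid_t may be wider than int on some systems; cast to match "%d" */
    snprintf(buf, sizeof(buf), "%d", (int)getpid());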
This can be backported to 2.0 and possibly older versions though
it really has no impact.
It's really confusing to call it a task because it's a tasklet and used
in places where tasks and tasklets are used together. Let's rename it
to tasklet to remove this confusion.
PiBa-NL found some pathological cases where starting threads can hinder
each other and cause a measurable slow down. This problem is reproducible
with the following config (haproxy must be built with -DDEBUG_DEV) :
global
stats socket /tmp/sock1 mode 666 level admin
nbthread 64
backend stopme
timeout server 1s
option tcp-check
tcp-check send "debug dev exit\n"
server cli unix@/tmp/sock1 check
This will cause the process to be stopped once the checks are ready to
start. Binding all these to just a few cores magnifies the problem.
Starting them in loops shows a significant time difference among the
commits :
# before startup serialization
$ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy-e186161 -db -f slow-init.cfg >/dev/null 2>&1; done
real 0m1.581s
user 0m0.621s
sys 0m5.339s
# after startup serialization
$ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy-e4d7c9dd -db -f slow-init.cfg >/dev/null 2>&1; done
real 0m2.366s
user 0m0.894s
sys 0m8.238s
In order to address this, let's use plain mutexes and cond_wait during
the init phase. With this done, waiting threads now sleep and the problem
completely disappeared :
$ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy -db -f slow-init.cfg >/dev/null 2>&1; done
real 0m0.161s
user 0m0.079s
sys 0m0.149s
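The waiting pattern used is the classic mutex/condvar one (sketch; the
helper deciding whose turn it is remains hypothetical):

    static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  init_cond  = PTHREAD_COND_INITIALIZER;

    pthread_mutex_lock(&init_mutex);
    while (!my_turn_to_init(tid))                     /* hypothetical helper */
        pthread_cond_wait(&init_cond, &init_mutex);   /* sleep, don't spin */
    /* ... per-thread initialization ... */
    pthread_cond_broadcast(&init_cond);               /* wake the next thread */
    pthread_mutex_unlock(&init_mutex);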
There's no point in calling this on each and every thread since the first
thread passing there will enable the listeners, and the next ones will
simply scan all of them in turn to discover that they are already
initialized. Let's only initialize them on the first thread. This could
slightly speed up startup on very large configurations, even though most
of the time is still spent in the main thread binding the sockets.
A few measurements have constantly shown that this decreases the startup
time by ~0.1s for 150k listeners. Starting all of them in parallel doesn't
provide better results and can still expose some undesired races.
Since commit 6ec902a ("MINOR: threads: serialize threads initialization")
we now serialize threads initialization. But doing so has emphasized another
race which is that some threads may actually start the loop before others
are done initializing.
As soon as all threads enter the first thread_release() call, their rdv
bit is cleared and they're all waiting for all others' rdv to be cleared
as well, with their harmless bit set. The first one to notice the cleared
mask will progress through thread_isolate(), take rdv again preventing
most others from noticing its short pass to zero, and this first one will
be able to run all the way through the initialization till the last call
to thread_release() which it happily crosses, being the only one with the
rdv bit, leaving the room for one or a few others to do the same. This
results in some threads entering the loop before others are done with
their initialization, which is particularly bad. PiBa-NL reported that
some regtests fail for him due to this (which was impossible to reproduce
here, but races are racy by definition). However placing some printf()
in the initialization code definitely shows this unsynchronized startup.
This patch takes a different approach in three steps :
- first, we don't start with thread_release() anymore and we don't
set the rdv mask anymore in the main call. This was initially done
to let all threads start together, which we don't want. Instead
we just start with thread_isolate(). Since all threads are harmful
by default, they all wait for each other's readiness before starting.
- second, we don't release with thread_release() but with
thread_sync_release(), meaning that we don't leave the function until
other ones have reached the point in the function where they decide
to leave it as well.
- third, it makes sure we don't start the listeners using
protocol_enable_all() before all threads have allocated their local
FD tables or have initialized their pollers, otherwise startup could
be racy as well. It's worth noting that it is even possible to limit
this call to thread #0 as it only needs to be performed once.
This now guarantees that all thread init calls start only after all threads
are ready, and that no thread enters the polling loop before all others have
completed their initialization.
Please check GH issues #111 and #117 for more context.
No backport is needed, though if some new init races are reported in
1.9 (or even 1.8) which do not affect 2.0, then it may make sense to
carefully backport this small series.
There is no point in initializing threads in parallel when we know that
it's the moment where some global variables are turned to thread-local
ones, and/or that some global variables are updated (like global_now or
trash_size). Some FDs might be created/destroyed/reallocated and could
be tricky to follow as well (think about epoll_fd for example).
Instead of having to be extremely careful about all these, and to trigger
false positives in thread sanitizers, let's simply initialize one thread
at a time. The init step is very fast so nobody should even notice, and
we won't have any more doubts about what might have happened when
analysing a dump.
See GH issues #111 and #117 for some background on this.
As reported in GH issue #99, when hard-stop-after triggers and threads
are in use, the chance that any thread releases the resources in use by
the other ones is non-null. Thus no thread should be allowed to deinit()
nor exit by itself.
Here we take a different approach. We simply use a 3rd possible value
for the "killed" variable so that all threads know they must break out
of the run-poll-loop and immediately stop.
This patch was tested by commenting the stream_shutdown() calls in
hard_stop() to increase the chances to see a stream use released
resources. With this fix applied, it never crashes anymore.
This fix should be backported to 1.9 and 1.8.
Remove the active_tasks_mask variable; we can deduce whether we have work
to do by other means, and it is costly to maintain. Instead, introduce a new
function, thread_has_tasks(), that returns non-zero if there are tasks
scheduled for the thread, zero otherwise.
We have been abusing the do_poll()'s timeout for a while, making it zero
whenever there is some known activity. The problem this poses is that it
complicates activity diagnostics by incrementing the poll_exp field for
each known activity. It also requires extra computations that could be
avoided.
This change passes a "wake" argument to say that the poller must not
sleep. This simplifies the operations and allows one to differentiate
expirations from activity.
We still have quite a number of build macros which are mapped 1:1 to a
USE_something setting in the makefile but which have a different name.
This patch cleans this up by renaming them to use the USE_something
one, allowing to clean up the makefile and make it more obvious when
reading the code what build option needs to be added.
The following renames were done :
ENABLE_POLL -> USE_POLL
ENABLE_EPOLL -> USE_EPOLL
ENABLE_KQUEUE -> USE_KQUEUE
ENABLE_EVPORTS -> USE_EVPORTS
TPROXY -> USE_TPROXY
NETFILTER -> USE_NETFILTER
NEED_CRYPT_H -> USE_CRYPT_H
CONFIG_HAP_CRYPT -> USE_LIBCRYPT
CONFIG_HAP_NS -> USE_NS
CONFIG_HAP_LINUX_SPLICE -> USE_LINUX_SPLICE
CONFIG_HAP_LINUX_TPROXY -> USE_LINUX_TPROXY
CONFIG_HAP_LINUX_VSYSCALL -> USE_LINUX_VSYSCALL
We currently have the ability to register functions to be called early
on thread creation and at thread deinitialization. It turns out this is
not sufficient because certain such functions may use resources that are
being allocated by the other ones, thus creating a race condition depending
only on the linking order. For example the mworker needs to register a
file descriptor while the pollers will reallocate the fd_updt[] array.
Similarly logs and trashes may be used by some init functions while it's
unclear whether they have been deduplicated.
The same issue happens on deinit, if the fd_updt[] or trash is released
before some functions finish using them, we'll get into trouble.
This patch creates a couple of early and late callbacks for per-thread
allocation/freeing of resources. A few init functions were moved there,
and the fd init code was split between the two (since it used to both
allocate and initialize at once). This way the init/deinit sequence is
expected to be safe now.
This patch should be backported to 1.9 as at least the trash/log issue
seems to be present. The run_thread_poll_loop() code is a bit different
there as the mworker is not a callback, but it will have no effect and
it's enough to drop the mworker changes.
This bug was reported by Ilya Shipitsin in github issue #104.
Commit 5a6e2245f ("REORG: threads: move the struct thread_info from
global.h to hathreads.h") didn't hold its promise well, as the thread_info
struct was still declared and initialized in haproxy.c in addition to being
in hathreads.c. Let's move it for real now.
The struct mworker_proc is not uniformly freed everywhere, sometimes leading
to leaks of the `id` string (and possibly the other strings).
Introduce a mworker_free_child function instead of duplicating the freeing
logic everywhere to prevent this kind of issue.
This leak was reported in issue #96.
It looks like the leaks have been introduced in commit 9a1ee7ac31,
which is specific to 2.0-dev. Backporting `mworker_free_child` might be
helpful to ease backporting other fixes, though.
The clock_gettime() man page says we must check that _POSIX_TIMERS is
defined to a value greater than zero, not just that it's simply defined
so let's fix this right now.
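The check thus becomes (the clock_gettime() call is only there to show what
the guard protects):

    #include <unistd.h>
    #include <time.h>

    #if defined(_POSIX_TIMERS) && (_POSIX_TIMERS > 0)
        clock_gettime(CLOCK_MONOTONIC, &ts);
    #endif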
Event ports are the Solaris equivalent of the kqueue/epoll polling classes.
The code is based on https://github.com/joyent/haproxy-1.8/tree/joyent/dev-v1.8.8.
Event ports are available only on SunOS systems derived from
Solaris 10 and later (including illumos systems).
I took extreme care to always check for _POSIX_THREAD_CPUTIME before
manipulating clock_id, except at one place (run_thread_poll_loop) as
found by Manu, breaking Solaris. Now fixed, no backport needed.
Since we're likely to access this thread_info struct more frequently in
the future, let's reserve the thread-local symbol to access it directly
and avoid always having to combine thread_info and tid. This pointer is
set when tid is set.
This is the per-thread CPU runtime clock, it will be used to measure
the CPU usage of each thread and by the lockup detection mechanism. It
must only be retrieved at the beginning of run_thread_poll_loop() since
the thread must already have been started for this. But it must be done
before performing any per-thread initcall so that all thread init
functions have access to the clock ID.
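On POSIX systems this typically comes from pthread_getcpuclockid()
(illustrative sketch):

    clockid_t clk_id;
    struct timespec ts;

    /* retrieve this thread's CPU-time clock, then sample it */
    if (pthread_getcpuclockid(pthread_self(), &clk_id) == 0)
        clock_gettime(clk_id, &ts);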
Note that it could make sense to always have this clockid available even
in non-threaded situations and place the process' clock there instead.
But it would add portability issues which are currently easy to deal
with by disabling threads so it may not be worth it for now.
This way we'll be able to store more per-thread information than just
the pthread pointer. The storage became an array of struct instead of
an allocated array since it's very small (typically 512 bytes) and not
worth the hassle of dealing with memory allocation on this. The array
was also renamed thread_info to make its intended usage more explicit.
Now that we have the guarantee that init calls happen before any other
thread starts, we don't need anymore the workaround installed by commit
1605c7ae6 ("BUG/MEDIUM: threads/mworker: fix a race on startup") and we
can instead rely on a regular per-thread initcall for this function. It
will only be performed on worker thread #0, the other ones and the master
have nothing to do, just like in the original code that was only moved
to the function.
It's a bit dangerous to let threads initialize at different speeds on
startup. Some are still in their init functions while others are already
running. It was even subject to some race condition bugs like the one
fixed by commit 1605c7ae6 ("BUG/MEDIUM: threads/mworker: fix a race on
startup").
Here in order to secure all this, we take a very simplistic approach
consisting in using half of the rendez-vous point, which is made
exactly for this purpose : we first initialize the mask of the threads
requesting a rendez-vous to the mask of all threads, and we simply call
thread_release() once the init is complete. This guarantees that no
thread will go further than the initialization code during this time.
This could even safely be backported if any other issue related to an
init race was discovered in a stable release.
It's always a pain to have to stuff lots of #ifdef USE_OPENSSL around
ssl headers, it even results in some of them appearing in a random order
and multiple times just to benefit from an existing ifdef block. Let's
make these headers safe for inclusion when USE_OPENSSL is not defined,
they now perform the test themselves and do nothing if USE_OPENSSL is
not defined. This allows to remove no less than 8 such ifdef blocks
and make include blocks more readable.
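Concretely, each of these headers now essentially wraps its content like
this (guard name illustrative):

    #ifndef _TYPES_SSL_SOCK_H
    #define _TYPES_SSL_SOCK_H
    #ifdef USE_OPENSSL
    /* ... actual declarations ... */
    #endif /* USE_OPENSSL */
    #endif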
Since we're providing a compatibility layer for multiple OpenSSL
implementations and their derivatives, it is important that no C file
directly includes openssl headers but only passes via openssl-compat
instead. As a bonus this also gets rid of redundant complex rules for
inclusion of certain files (engines etc).
They were all checked to comply with the advertised openssl version. Now
that libressl doesn't pretend to be a more recent openssl anymore, we
can simply rely on the regular openssl version tests without having to
deal with exceptions for libressl.
Most tests on OPENSSL_VERSION_NUMBER have become complex and break all
the time because this number is fake for some derivatives like LibreSSL.
This patch creates a new macro, HA_OPENSSL_VERSION_NUMBER, which will
carry the real openssl version defining the compatibility level, and
this version will be adjusted depending on the variants.
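In spirit it looks like this (sketch; the value mapped for LibreSSL is
illustrative):

    #if defined(LIBRESSL_VERSION_NUMBER)
    /* LibreSSL advertises 2.x but roughly provides the 1.0.1 API level */
    #define HA_OPENSSL_VERSION_NUMBER 0x1000107fL
    #else
    #define HA_OPENSSL_VERSION_NUMBER OPENSSL_VERSION_NUMBER
    #endif

    #if HA_OPENSSL_VERSION_NUMBER >= 0x1010000fL
    /* 1.1.0+ API may safely be used here */
    #endif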
As with every single OpenSSL fix, LibreSSL build broke again, this time
after commit 56996dabe ("BUG/MINOR: mworker/ssl: close OpenSSL FDs on
reload"). A definitive solution will have to be found quickly. For now,
let's exclude libressl from the version test.
This patch must be backported to 1.9 since the fix above was already
backported there.
This patch implements a new global parameter for the master-worker mode.
When setting the mworker-max-reloads value, a worker receives a SIGTERM
if its number of reloads is greater than this value.
Since previous commit it's not needed anymore to test a task pointer
before calling task_destroy() so let's just remove these tests from
the various callers before they become confusing. The function's
arguments were also documented. The same should probably be done
with tasklet_free() which involves a test in roughly half of the
call places.
In commit 1b8e68e ("MEDIUM: stick-table: Stop handling stick-tables as
proxies."), the ->table member of proxy struct was replaced by a pointer
that is not always checked and in some situations can cause a segfault,
eg. during reload or while using "show table" on CLI socket.
No backport is needed.
From OpenSSL 1.1.1, the default behaviour is to maintain open FDs to any
random devices that get used by the random number library. As a result,
those FDs leak when the master re-execs on reload; since those FDs are
not marked FD_CLOEXEC or O_CLOEXEC, they also get inherited by children.
Eventually both master and children run out of FDs.
OpenSSL 1.1.1 introduces a new function to control whether the random
devices are kept open. When clearing the keep-open flag, it also closes
any currently open FDs, so it can be used to clean-up open FDs too.
Therefore, a call to this function is made in mworker_reload prior to
re-exec.
The call is guarded by whether SSL is in use, because it will cause
initialisation of the OpenSSL random number library if that has not
already been done.
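The function in question is RAND_keep_random_devices_open(); the guarded
call looks roughly like this (sketch; the SSL-in-use condition is omitted):

    #if defined(USE_OPENSSL) && (OPENSSL_VERSION_NUMBER >= 0x10101000L)
        /* close the cached /dev/*random FDs and do not keep them open */
        RAND_keep_random_devices_open(0);
    #endif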
This should be backported to 1.9 and 1.8.
Now we atomically allocate the my_regex struct within function
regex_comp() and compile the regex or free both in case of failure. The
pointer to the allocated my_regex struct is returned directly. The
my_regex* argument to regex_comp() is removed.
Function regex_free() was modified so that it systematically frees the
my_regex entry. The function does nothing when called with a NULL as
argument (like free()). This avoids the existing risk of not properly
freeing the initialized area.
Other structures are also updated in order to be compatible (the ones
related to Lua and action rules).
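Callers now follow a pattern like this (sketch; the exact argument list is
an assumption):

    char *err = NULL;
    struct my_regex *re;

    re = regex_comp("^foo", 1 /* case sensitive */, 1 /* with captures */, &err);
    if (!re) {
        /* err describes the failure */
        free(err);
        return 0;
    }
    /* ... use re ... */
    regex_free(re);   /* also safe on a NULL pointer, like free() */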
This patch adds the support for the "table" line parsing in "peers" sections
to declare stick-table in such sections. This also prevents the user from having
to declare dummy backend sections with a unique stick-table inside.
Even if still supported, this usage will become deprecated.
To do so, the ->table member of proxy struct which is a stktable struct is replaced
by a pointer to a stktable struct allocated at parsing time in src/cfgparse-listen.c
for the dummy stick-table backends and in src/cfgparse.c for "peers" sections.
This has an impact on the code for stick-table sample converters and on the stickiness
rules parsers which first store the name of the dummy before resolving the rules.
This patch replaces proxy_tbl_by_name() calls by stktable_find_by_name() calls
to look up stick-tables stored in the "stktable_by_name" ebtree at parsing time.
There is only one remaining place where proxy_tbl_by_name() is used: src/hlua.c.
At several places in the code we relied on the fact that the ->size member of a
stick-table was equal to zero to consider the stick-table was present but not
configured. This does not make sense anymore as the ->table member of struct
proxy is from now on a pointer. These tests are replaced by a test on the
->table value itself.
In "peers" sections we do not have to temporarily store the name of the section
the stick-tables are attached to because this name is obviously already known
just after having entered this "peers" section.
About the CLI stick-table I/O handler, the pointer to proxy struct is replaced by
a pointer to a stktable struct.
Currently the thread array is a local variable inside a function block
and there is no access to it from outside, which often complicates
debugging. Let's make it global and export it. Also the allocation
return is now checked.
It's still obscure how we managed to initialize an array of integers
with values always equal to the index, just to retrieve the value
from an opaque pointer to the index instead of directly using it! I
suspect it's a leftover from the very early threading experiments.
This commit gets rid of this and simply passes the thread ID as the
argument to run_thread_poll_loop(), thus significantly simplifying the
few call places and removing the need to allocate then free an array
of identities.
When we initially experimented with threads and processes support, we
needed to implement arrays of threads per process for cpu-map, but this
is not needed anymore since we support either threads or processes.
Let's simply make the thread-based cpu-map per thread and not per
thread and per process since that's not used anymore. Doing so reduces
the global struct from 33kB to 1.5kB.
When using the "use_backend" configuration directive, the configuration
file name stored as rule->file was not freed in some situations. This
was introduced in commit 4ed1c95 ("MINOR: http/conf: store the
use_backend configuration file and line for logs").
This patch should be backported to 1.9, 1.8 and 1.7.
As by default we add all keepalive connections to the idle pool, if we run
into a pathological case where clients don't do keepalive but the server
does, and haproxy is configured to only reuse "safe" connections, we will
soon find ourselves with lots of idle connections that are unusable for new
sessions, while no file descriptors are left to create new connections.
To fix this, add 2 new global settings, "pool-low-fd-ratio" and "pool-high-fd-ratio".
pool-low-fd-ratio is the % of fds we're allowed to use (against the maximum
number of fds available to haproxy) before we stop adding connections to the
idle pool, and destroy them instead. The default is 20. pool-high-fd-ratio is
the % of fds we're allowed to use (against the maximum number of fds available
to haproxy) before we start killing idling connection in the event we have to
create a new outgoing connection, and no reuse is possible. The default is 25.
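In the global section this looks like the following (the values shown are
the defaults mentioned above):

    global
        pool-low-fd-ratio  20
        pool-high-fd-ratio 25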
task_delete() was never used without calling task_free() just after, and
task_free() was only used on error paths to destroy a just-created task,
so merge them into task_destroy(), that will remove the task from the
wait queue, and make sure the task is either destroyed immediately if it's
not in the run queue, or destroyed when it's supposed to run.
It's always a pain to get a core dump when enabling user/group setting
(which disables the dumpable flag on Linux), when using a chroot and/or
when haproxy is started by a service management tool which requires
complex operations to just raise the core dump limit.
This patch introduces a new "set-dumpable" global directive to work
around these troubles by doing the following :
- remove file size limits (equivalent of ulimit -f unlimited)
- remove core size limits (equivalent of ulimit -c unlimited)
- mark the process dumpable again (equivalent of suid_dumpable=1)
Some of these will depend on the operating system. This way it becomes
much easier to retrieve a core file. Temporarily moving the chroot to
a user-writable place is generally enough.
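On Linux, these operations roughly translate to (sketch):

    #include <sys/prctl.h>
    #include <sys/resource.h>

    struct rlimit unlim = { RLIM_INFINITY, RLIM_INFINITY };

    setrlimit(RLIMIT_FSIZE, &unlim);       /* ulimit -f unlimited        */
    setrlimit(RLIMIT_CORE,  &unlim);       /* ulimit -c unlimited        */
    #if defined(PR_SET_DUMPABLE)
    prctl(PR_SET_DUMPABLE, 1, 0, 0, 0);    /* suid_dumpable=1 equivalent */
    #endif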
Since the introduction of the options field, we can use it to store the
type of process.
type = 'm' is replaced by PROC_O_TYPE_MASTER
type = 'w' is replaced by PROC_O_TYPE_WORKER
type = 'e' is replaced by PROC_O_TYPE_PROG
The old values are still used in the HAPROXY_PROCESSES environment
variable to pass the information during a reload.
Pavlos Parissis reported an interesting case where some map identifiers
were not assigned (appearing as -1 in show map). It turns out that it
only happens for log-format expressions parsed in check_config_validity()
that involve maps (log-format, use_backend, unique-id-header), as in the
sample configuration below :
frontend foo
bind :8001
unique-id-format %[src,map(addr.lst)]
log-format %[src,map(addr.lst)]
use_backend %[src,map(addr.lst)]
The reason stems from the initial introduction of unique IDs in 1.5 via
commit af5a29d5f ("MINOR: pattern: Each pattern is identified by unique
id.") : the unique_id assignment was done before calling
check_config_validity() so all maps loaded after this call are not
properly configured. From what the function does, it seems they will not
be able to use a cache, will not have a unique_id assigned and will not
be updatable from the CLI.
This fix must be backported to all supported versions.
This patch implements the external binary support in the master worker.
To configure an external process, you need to use the program section,
for example:
program dataplane-api
command ./dataplane_api
Those processes are launched at the same time as the workers.
During a reload of HAProxy, those processes are dealing with the same
sequence as a worker:
- the master is re-executed
- the master sends a USR1 signal to the program
- the master launches a new instance of the program
During a stop, or restart, a SIGTERM is sent to the program.
The children variable is still used in haproxy, but it is not required
anymore since we have the information about the current workers in the
mworker_proc linked list.
The oldpids array is also replaced by this linked list when we
generate the arguments for the master reexec.
The current initcall implementation relies on dedicated sections (one
section per init stage) to store the initcall descriptors. Then upon
startup, these sections are scanned from beginning to end and all items
found there are called in sequence.
On platforms like AIX or Cygwin it seems difficult to figure the
beginning and end of sections as the linker doesn't seem to provide
the corresponding symbols. In order to replace this, this patch
simply implements an array of singly linked lists (one per init stage)
which are fed using constructors for each register call. These
constructors are declared static, with a name depending on their
line number in the file, in order to avoid name clashes. The final
effect is the same, except that the method is slightly more expensive
in that it explicitly produces code to register these initcalls :
$ size haproxy.sections haproxy.constructor
text data bss dec hex filename
4060312 249176 1457652 5767140 57ffe4 haproxy.sections
4062862 260408 1457652 5780922 5835ba haproxy.constructor
This mechanism is enabled as an alternative to the default one when
build option USE_OBSOLETE_LINKER is set. This option is currently
enabled by default only on AIX and Cygwin, and may be attempted for
any target which fails to build complaining about missing symbols
__start_init_* and/or __stop_init_*.
Once confirmed as a reliable fix, this will likely have to be backported
to 1.9 where AIX and Cygwin do not build anymore.
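The constructor-based registration can be pictured like this (a simplified
sketch; the real structure and macro names differ, INIT_STAGE_LAST is
hypothetical):

    struct initcall_reg { void (*fn)(void); struct initcall_reg *next; };
    static struct initcall_reg *initcall_stage[INIT_STAGE_LAST];

    /* two-level expansion so __LINE__ produces a unique symbol name */
    #define _IC_NAME(a, b)  a##b
    #define IC_NAME(a, b)   _IC_NAME(a, b)

    #define REGISTER_INITCALL(stage, func)                                   \
        static struct initcall_reg IC_NAME(_ic_, __LINE__) = { func, NULL }; \
        __attribute__((constructor))                                         \
        static void IC_NAME(_ic_ctor_, __LINE__)(void)                       \
        {                                                                    \
            IC_NAME(_ic_, __LINE__).next = initcall_stage[stage];           \
            initcall_stage[stage] = &IC_NAME(_ic_, __LINE__);               \
        }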
A bug occurs when the sigchld handler is called and a child which is
not in the process list just left, or when the process list is empty.
The child variable won't be set and will be left uninitialized, or will
point to the wrong child entry, which can lead to a free of this
uninitialized variable or of the wrong child.
This can lead to a crash of the master during a stop or a reload.
It is not supposed to happen with a worker which was created by the
master. A cause could be a fork made by a dependency. (openssl, lua ?)
This patch strengthens the case of the missing child by doing the free
only if the child was found.
This patch must be backported to 1.9.
It's not convenient not to know the status of default options, and
requires the user to know what option is enabled by default in each
target. With this patch, a new "Features list" line is added to the
output of "haproxy -vv" to report the whole list of known features
with their respective status. They're prefixed with a "+" when enabled
or a "-" when disabled. The "USE_" prefix is removed for clarity.
It's never easy to guess what services are built in. We currently have
the prometheus exporter in contrib/ which is the only extension for now.
Let's enumerate all available ones just like we do for filters and pollers.
Each thread uses one epoll_fd or kqueue_fd, and a pipe (thus two FDs).
These ones have to be accounted for in the maxsock calculation, otherwise
we can reach maxsock before maxconn. This is difficult to observe but it
in fact happens when a server connects back to the frontend and has checks
enabled : the check uses its FD and serves to fill the loop. In this case
all FDs planned for the datapath are used for this.
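In other words the per-thread accounting to add to maxsock is roughly
(sketch):

    /* one poller fd (epoll/kqueue) plus both ends of the wake-up pipe */
    maxsock += global.nbthread * (1 + 2);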
This needs to be backported to 1.9 and 1.8.
Some packages used to rely on DEFAULT_MAXCONN to set the default global
maxconn value to use regardless of the initial ulimit. The recent changes
made the lowest bound set to 100 so that it is compatible with almost any
environment. Now that DEFAULT_MAXCONN is not needed for anything else, we
can use it for the lowest bound set when maxconn is not configured. This
way it retains its original purpose of setting the default maxconn value
even though most of the time the effective value will be higher thanks to
the automatic computation based on "ulimit -n".
This entry was still set to 2000 but never used anymore. The only places
where it appeared was as an alias to SYSTEM_MAXCONN which forces it, so
let's turn these ones to SYSTEM_MAXCONN and remove the default value for
DEFAULT_MAXCONN. SYSTEM_MAXCONN still defines the upper bound however.
The global maxconn value is often a pain to configure :
- in development the user never has the permissions to increase the
rlim_cur value too high and gets warnings all the time ;
- in some production environments, users may have limited actions on
it or may only be able to act on rlim_fd_cur using ulimit -n. This
is sometimes particularly true in containers or whatever environment
where the user has no privilege to upgrade the limits.
- keeping the config homogeneous between machines is even less easy.
We already had the ability to automatically compute maxconn from the
memory limits when they were set. This patch goes a bit further by also
computing the limit permitted by the configured limit on the number of
FDs. For this it simply reverses the rlim_fd_cur calculation to determine
maxconn based on the number of reserved sockets for listeners & checks,
the number of SSL engines and the number of pipes (absolute or relative).
This way it becomes possible to make maxconn always be the highest possible
value resulting in maxsock matching what was set using "ulimit -n", without
ever setting it. Note that we adjust to the soft limit, not the hard one,
since it's what is configured with ulimit -n. This allows users to also
limit to low values if needed.
Just like before, the calculated value is reported in verbose mode.
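In rough terms this is the usual FD formula solved for maxconn
(illustrative sketch; the real code accounts for more details):

    /* fds ~= 2 * maxconn + reserved listeners/checks + ssl engines + pipes,
     * so reverse it to find the highest maxconn fitting in the soft limit.
     */
    static int compute_ideal_maxconn(int rlim_fd_cur, int reserved_fds,
                                     int ssl_engine_fds, int pipe_fds)
    {
        int remain = rlim_fd_cur - reserved_fds - ssl_engine_fds - pipe_fds;

        return remain > 0 ? remain / 2 : 0;
    }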
We'll need to know the global maxsock before the maxconn calculation.
Actually only two components were calculated too late, the peers FD
and the stats FD. Let's move them a few lines upward.