There is currently a large inconsistency in how binding parameters are
split between bind_conf and listeners. For historical reasons, some
parameters are available at the listener level but can only be configured
for a whole bind_conf, not per listener, and thus need to be replicated.
In addition, some of the bind_conf parameters are in fact for the
listening socket itself while others are for the instantiated sockets.
A previous attempt at splitting listeners into receivers failed because
the boundary between all these settings is not well defined.
This patch introduces a level of listening socket settings in the
bind_conf, that will be detachable later. Such settings that are solely
for the listening socket are:
- unix socket permissions (used only during binding)
- interface (used for binding)
- network namespace (used for binding)
- process mask and thread mask (used during startup)
The rest seems to be used only to initialize the resulting sockets, or
to control the accept rate. For now, only the unix params (bind_conf->ux)
were moved there.
Just like with the previous commit, DNS nameservers are affected as well
with addresses starting in "udp@", but here it's different: due to another
bug in the DNS parser, the address is rejected, indicating that it doesn't
have a ->connect() method. Indeed, the DNS code believes it's working on
top of TCP at this point, and this is why it used to work. The same fix is
applied to remap the protocol, and the ->connect() test was dropped.
No backport is needed, as the ->connect() test will never strike in 2.2
or below.
Commit 3835c0dcb ("MEDIUM: udp: adds minimal proto udp support for
message listeners.") introduced a problematic side effect in log server
address parser: if "udp@", "udp4@" or "udp6@" prefixes a log server's
address, the adress is passed as-is to the log server with a non-existing
family and fails like this when trying to send:
[ALERT] 259/195708 (3474) : socket() failed in logger #1: Address family not supported by protocol (errno=97)
The problem is that till now there was no UDP family, so logs expect an
AF_INET family to be passed for UDP there.
This patch manually remaps AF_CUST_UDP4 and AF_CUST_UDP6 to their "tcp"
equivalent that the log server parser expects. No backport is needed.
Remove the last utility functions for handling the multi-cert bundles
and remove the multi-variable from the ckch structure.
With this patch, the bundles are completely removed.
The multi variable is not useful anymore since the removal of the
multi-certificates bundle support. It can be removed safely from the CLI
functions, which can now assume that every ckch contains a single
certificate.
Since the removal of the multi-certificates bundle support, this
variable is not useful anymore; we can remove all tests on it and
assume that every ckch contains a single certificate.
Like the previous commit, this one emulates the bundling by loading each
certificate separately and storing it in a separate SSL_CTX.
This patch does it for the standard certificate loading, which means
outside of directories or crt-lists.
The multi-certificates bundle was the common way of offering multiple
certificates of different types (ecdsa and rsa) for a same SSL_CTX.
This was implemented with OpenSSL 1.0.2 before the client_hello callback
was available.
Now that all versions which do not support this callback are
deprecated (< 1.1.0), we can safely remove the support for the bundle,
which was inconvenient and made the code overly complex.
The multi-certificates bundle was the common way of offering multiple
certificates of different types (ecdsa and rsa) for a same SSL_CTX.
This was implemented with OpenSSL 1.0.2 before the client_hello callback
was available.
Now that all versions which do not support this callback are
deprecated (< 1.1.0), we can safely remove the support for the bundle,
which was inconvenient and made the code overly complex.
This patch emulates the bundle loading by looking for the bundle files
when the file specified in the configuration does not exist. It then
creates new entries in the crtlist, so they will appear as new lines if
they are dumped from the CLI.
Remove the support for multi-certificates bundle in the CLI. There is
nothing to replace here, it will use the standard codepath with the
"bundle emulation" in the future.
The multi-cert certificates bundle is the former way, implemented with
openssl 1.0.2, of doing multi-certificate (RSA, ECDSA and DSA) for the
same SNI host. Remove this support temporarily so it can be replaced by
the loading of each certificate in a separate SSL_CTX.
The use of "bind" wasn't that wise but was temporary. The problem is that
it would not allow coexistence with tcp. Let's explicitly call it "dgram-bind"
so that datagram listeners are expected here, leaving some room for stream
listeners later. This is the only change.
Since the refactoring of the crt-list, the same function is used to
parse a crt-list file and a crt-list line on the CLI.
It was assumed that a line on the CLI and a line in a file both end with
a \n. However that is potentially not the case with a file which does not
end with a \n.
This patch fixes issue #860 and must be backported in 2.2.
In the SSL code, when we were waiting for the availability of the crypto
engine, once it is ready and its fd's I/O handler is called, don't call
ssl_sock_io_cb() directly, instead, call tasklet_wakeup() on the
ssl_sock_ctx's tasklet. We were calling ssl_sock_io_cb() with NULL as
a tasklet, which used to be fine, but it is no longer true since the
fd takeover changes. We could just provide the tasklet, but let's just
wake the tasklet, as is done for other FDs, for fairness.
This should fix github issue #856.
This should be backported into 2.2.
The socks4 keyword parser was a bit too much copy-pasted, it only checks
for a null port and reports "invalid range". Let's properly check for the
1-65535 range and report the correct error.
It may be backported everywhere "socks4" is present (2.0).
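For illustration, a minimal sketch of such a range check, with hypothetical
names rather than the actual parser code, could look like this:

#include <errno.h>
#include <stdlib.h>

/* Parse a port string and enforce the valid 1-65535 range.
 * Returns the port on success, or -1 and sets an error message.
 */
int parse_port(const char *str, const char **errmsg)
{
    char *end;
    long port;

    errno = 0;
    port = strtol(str, &end, 10);
    if (errno || end == str || *end != '\0') {
        *errmsg = "port is not a valid integer";
        return -1;
    }
    if (port < 1 || port > 65535) {
        *errmsg = "invalid port, must be in range 1-65535";
        return -1;
    }
    return (int)port;
}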
In bug #835, @arjenzorgdoc reported that the verifyhost option on the
server line is case-sensitive, which shouldn't be the case.
This patch fixes the issue by replacing memcmp by strncasecmp and strcmp
by strcasecmp. The patch was suggested by @arjenzorgdoc.
This must be backported in all versions supporting the verifyhost
option.
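For illustration only (this is not the HAProxy code itself), the principle is
simply to swap the comparison functions for their case-insensitive variants:

#include <string.h>
#include <strings.h>

/* Hostnames are case-insensitive, so compare them with the
 * case-insensitive variants instead of memcmp()/strcmp().
 */
int hostname_prefix_matches(const char *pattern, const char *host, size_t len)
{
    return strncasecmp(pattern, host, len) == 0;
}

int hostname_matches(const char *pattern, const char *host)
{
    return strcasecmp(pattern, host) == 0;
}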
Changes performed using the following coccinelle patch:
@@
type T;
expression E;
expression t;
@@
(
t = calloc(E, sizeof(*t))
|
- t = calloc(E, sizeof(T))
+ t = calloc(E, sizeof(*t))
)
Looking through the commit history, grepping for coccinelle shows that the same
replacement with a different patch was already performed in the past in commit
02779b6263.
newsrv->curr_idle_thr is of type `unsigned int`, not `int`. Fix this issue
by simply passing the dereferenced pointer to sizeof, which is the preferred
style anyway.
This bug was introduced in commit dc2f2753e9.
It first appeared in 2.2-dev5. The patch must be backported to 2.2+.
It is notable that the `calloc` call was not introduced within the commit in
question. The allocation was already happening before that commit and it
already looked the way it does after applying this patch. Apparently the
argument for the `sizeof` managed to get broken during the rearrangement
that happened in that commit:
for (i = 0; i < global.nbthread; i++)
- MT_LIST_INIT(&newsrv->idle_orphan_conns[i]);
- newsrv->curr_idle_thr = calloc(global.nbthread, sizeof(*newsrv->curr_idle_thr));
+ MT_LIST_INIT(&newsrv->safe_conns[i]);
+
+ newsrv->curr_idle_thr = calloc(global.nbthread, sizeof(int));
Even more notable is that I previously fixed that *exact same* allocation in
commit 017484c80f.
So apparently this single line managed to get broken twice in the same
way, for whatever reason there might be.
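As a standalone illustration (the struct below is a hypothetical stand-in for
the real server structure), the sizeof(*ptr) form is preferred because it keeps
following the element's real type if that type later changes:

#include <stdlib.h>

struct srv_counters {
    unsigned int *curr_idle_thr;   /* per-thread counters */
};

int alloc_counters(struct srv_counters *c, int nbthread)
{
    /* sizeof(*c->curr_idle_thr) always follows the element's real type,
     * while a hard-coded sizeof(int) silently becomes wrong if the
     * field's type ever changes (e.g. to unsigned int or uint64_t).
     */
    c->curr_idle_thr = calloc(nbthread, sizeof(*c->curr_idle_thr));
    return c->curr_idle_thr ? 0 : -1;
}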
iif() takes a boolean as input and returns one of the two argument
strings depending on whether the boolean is true.
This converter is most likely useful to return the proper scheme
depending on the value returned by the `ssl_fc` fetch, e.g. for use within
the `x-forwarded-proto` request header.
However it can also be useful for use within a template that is sent to
the client using `http-request return` with a `lf-file`. It allows the
administrator to implement a simple condition, without needing to prefill
variables within the regular configuration using `http-request
set-var(req.foo)`.
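A minimal sketch of the converter's semantics expressed in plain C, purely as
an illustration of the behavior and not the HAProxy implementation:

#include <stdio.h>

/* iif(true_str, false_str): return one of the two strings depending on
 * the boolean input.
 */
const char *iif(int condition, const char *if_true, const char *if_false)
{
    return condition ? if_true : if_false;
}

int main(void)
{
    int ssl_fc = 1; /* pretend the connection came in over SSL */

    /* typical use case: building x-forwarded-proto */
    printf("x-forwarded-proto: %s\n", iif(ssl_fc, "https", "http"));
    return 0;
}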
The pattern expression revision must also be renewed when the expression is
pruned, in order to expire patterns cached in the LRU cache. Otherwise it is
possible to retrieve an already freed pattern, attached to a released pattern
expression.
When a specific pattern is deleted (->delete() callback), the pattern expression
revision is already renewed. Thus it is not affected by this bug. Only prune
action on the pattern expression is concerned.
In addition, in the ->prune() callbacks of a pattern expression, a missing
LIST_DEL() has been added when the pattern list is released. It is not a real issue
because the list is reinitialized at the end and all elements are released and
should never be reused. But it is less confusing this way.
This bug may be triggered when a map is cleared from the cli socket. A
workaround is to set the pattern cache size (tune.pattern.cache-size) to 0 to
disable it.
This patch should fix the issue #844. It must be backported to all supported
versions.
This allocation technically is always reachable and cannot leak, however other
global variables such as `oldpids` are already being freed. This is in an
attempt to get HAProxy to a state where there are zero live allocations after a
clean exit.
Given the following example configuration:
listen http
bind *:80
mode http
stats scope .
Running a configuration check with valgrind reports:
==16341== 26 (24 direct, 2 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 13
==16341== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16341== by 0x571C2E: stats_add_scope (uri_auth.c:296)
==16341== by 0x46CE29: cfg_parse_listen (cfgparse-listen.c:1901)
==16341== by 0x45A112: readcfgfile (cfgparse.c:2078)
==16341== by 0x50A0F5: init (haproxy.c:1828)
==16341== by 0x418248: main (haproxy.c:3012)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
It initially looked appealing to be able to call traces with ",,," for
unused arguments, but tcc doesn't like empty macro arguments, and quite
frankly, adding a zero between the few remaining ones is no big deal.
Let's do so now.
Commit 77b98220e ("BUG/MINOR: threads: work around a libgcc_s issue with
chrooting") tried to address an issue with libgcc_s being loaded too late.
But it turns out that the symbol used there isn't present on armhf, thus
it breaks the build.
Given that the issue manifests itself during pthread_exit(), the safest
and most portable way to test this is to call pthread_exit(). For this
we create a dummy thread which exits, during the early boot. This results
in the relevant library being loaded if needed, making sure that a later
call to pthread_exit() will still work. It was tested to work fine under
linux on the following platforms:
glibc:
- armhf
- aarch64
- x86_64
- sparc64
- ppc64le
musl:
- mipsel
Just running the code under strace easily shows the call in the dummy
thread, for example here on armhf:
$ strace -fe trace=file ./haproxy -v 2>&1 | grep gcc_s
[pid 23055] open("/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
The code was isolated so that it's easy to #ifdef it out if needed.
This should be backported where the patch above is backported (likely
2.0).
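A standalone sketch of the idea, assuming plain POSIX threads (the actual
HAProxy code differs and can be ifdef'd out): spawn a short-lived thread at
boot so that anything needed by pthread_exit() is loaded before a possible
chroot:

#include <pthread.h>

static void *dummy_thread(void *arg)
{
    /* Exiting here forces the unwinding support library (libgcc_s with
     * glibc) to be loaded now, while the filesystem is still visible.
     */
    pthread_exit(arg);
}

void preload_pthread_exit_deps(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, dummy_thread, NULL) == 0)
        pthread_join(tid, NULL);
}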
The condition in h1_refresh_timeout() seems insufficient to properly
take care of the half-closed timeout, because depending on the ordering
of operations when performing the last send() to a client, the stream
may or may not still be there and we may fail to shrink the client
timeout on our last opportunity to do so.
Here we want to make sure that the timeout is always reduced when the
last chunk was sent and the shutdown completed, regardless of the
presence of a stream or not. This is what this patch does.
This should be backported as far as 2.0, and should fix the issue
reported in #541.
Since 1.8 with commit e8692b41e ("CLEANUP: auth: use the build options list
to report its support"), crypt(3) is always reported as being supported in
"haproxy -vv" because no test on USE_LIBCRYPT is made anymore when
producing the output.
This reintroduces the distinction between with and without USE_LIBCRYPT
in the output by indicating "yes" or "no". It may be backported as far
as 1.8, though the code differs due to a number of include files cleanups.
When the server address is set for the first time, the log message is a bit ugly
because there is no old IP address to report. Thus in the log, we can see:
PX/SRV changed its IP from to A.B.C.D by DNS additional record.
Now, when this happens, "(none)" is reported :
PX/SRV changed its IP from (none) to A.B.C.D by DNS additional record.
This patch may be backported to 2.2.
When a SRV record for an already known server is processed, only the weight is
updated, if not configured to be ignored. This is a problem if the IP address
carried by the associated additional record changes, because the server IP
address is never renewed.
To fix this bug, if there is an additional record attached to a SRV record, we
always try to set the IP address. If it is the same, no change is
performed. This way, IP changes are always handled.
This patch should fix the issue #841. It must be backported to 2.2.
A SRV record keeps a reference on the corresponding additional record, if
any. But this additional record is also inserted in a separate linked-list into
the dns response. The problem arises when obsolete additional records are
released. The additional records list is purged but the SRV records still
reference these objects, leading to undefined behavior. Worse, this happens
very quickly because additional records are never renewed. Thus, once received,
an additional record will always expire.
Now, the additional records are only associated to a SRV record or simply
ignored. And the last version is always used.
This patch helps to fix the issue #841. It must be backported to 2.2.
The pathq sample fetch extracts the relative URI of a request, i.e. the path
with the query-string, excluding the scheme and the authority, if any. It is
pretty handy to always get a relative URI independently of the HTTP version.
Indeed, while relative URIs are common in HTTP/1.1, in HTTP/2, most of the time
clients use absolute URIs.
This patch may be backported to 2.2.
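To illustrate what the fetch returns, here is a rough standalone sketch (not
HAProxy's HTX code) that reduces an absolute URI to its path plus query-string:

#include <stdio.h>
#include <string.h>

/* Return a pointer to the path + query-string part of a URI, i.e. skip
 * "scheme://authority" if present. Origin-form URIs are returned as-is.
 */
const char *pathq(const char *uri)
{
    const char *p = strstr(uri, "://");

    if (!p)
        return uri;               /* already a relative URI */
    p = strchr(p + 3, '/');       /* first '/' after the authority */
    return p ? p : "/";           /* no path at all: use "/" */
}

int main(void)
{
    printf("%s\n", pathq("https://example.com/foo?bar=1")); /* /foo?bar=1 */
    printf("%s\n", pathq("/foo?bar=1"));                    /* /foo?bar=1 */
    return 0;
}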
These actions do the same as the corresponding "-path" versions except the
query-string is included in the manipulated request path. This means the
set-pathq action replaces the path and the query-string, and the replace-pathq
action matches and replaces the path including the query-string.
This patch may be backported to 2.2.
This reverts commit 4b9c0d1fc0.
Actually, the "replace-path" action is ambiguous. "set-path" action preserves
the query-string. The "path" sample fetch does not contain the query-string. But
"replace-path" action is documented to handle the query-string. It is probably
not the expected behavior. So instead of fixing the code, we will fix the
documentation to make "replace-path" action consistent with other parts of the
code. In addition actions and sample fetches to handle the path with the
query-string will be added.
If the commit above is ever backported, this one must be as well.
It was reported in bug #837 that "haproxy -s" causes 100% CPU usage.
However this option does not exist and haproxy must exit with the
usage message.
The parser was not handling the case where -s is not followed by 't' or
'f' which are the only two valid cases.
This bug was introduced by df6c5a ("BUG/MEDIUM: mworker: fix the copy of
options in copy_argv()") which was backported as far as 1.8.
This fix must be backported as far as 1.8.
Ever since the protocols were added in 1.3.13, listeners used to be
started twice:
- once by start_proxies(), which iterates over all proxies then all
listeners ;
- once by protocol_bind_all() which iterates over all protocols then
all listeners ;
It's a real mess because error reporting is not even consistent, and
more importantly now that some protocols do not appear in regular
proxies (peers, logs), there is no way to retry their binding should
it fail on the last step.
What this patch does is to make sure that listeners are exclusively
started by protocols. The failure to start a listener now causes the
emission of an error indicating the proxy's name (as it used to be
the case per proxy), and retryable failures are silently ignored
during all but last attempts.
The start_proxies() function was kept solely for setting the proxy's
state to READY and emitting the "Proxy started" message and log that
some have likely grown used to looking for in their logs.
Similarly to previous commit about ->bind_all(), we have the same
construct for ->unbind_all() which ought not to be used either. Let's
make protocol_unbind_all() iterate over all listeners and directly
call unbind_listener() instead.
It's worth noting that for uxst there was originally a specific
->unbind_all() function but the simplifications that came over the
years have resulted in a locally reimplemented version of the same
function: the test on (state > LI_ASSIGNED) there is equivalent to
the one on (state >= LI_PAUSED) that is used in do_unbind_listener(),
and it seems these have been equivalent since at least commit dabf2e264
("[MAJOR] added a new state to listeners")) (1.3.14).
All protocols only iterate over their own listeners list and start
the listeners using a direct call to their ->bind() function. This
code duplication doesn't make sense and prevents us from centralizing
the startup error handling. Worse, it's not even symmetric because
there's an unbind_all_listeners() function common to all protocols
without any equivalent for binding. Let's start by directly calling
each protocol's bind() function from protocol_bind_all().
Previous commit 77b98220e ("BUG/MINOR: threads: work around a libgcc_s
issue with chrooting") broke the build on cygwin. I didn't even know we
supported threads on cygwin. But the point is that it's actually the
glibc-based libpthread which requires libgcc_s, so in absence of other
reports we should not apply the workaround on other libraries.
This should be backported along with the aforementioned patch.
Sander Hoentjen reported another issue related to libgcc_s in issue #671.
What happens is that when the old process quits, pthread_exit() calls
something from libgcc_s.so after the process was chrooted, and this is
the first call to that library, causing an attempt to load it. In a
chroot, this fails, thus libpthread aborts. The behavior widely differs
between operating systems because some decided to use a static build for
this library.
In 2.2 this was resolved as a side effect of a workaround for the same issue
with the backtrace() call, which is also in libgcc_s. This was in commit
0214b45 ("MINOR: debug: call backtrace() once upon startup"). But backtraces
are not necessarily enabled, and we need something for older versions.
By inspecting a significant number of libgcc_s on various gcc versions and
platforms, it appears that a few functions have been present since gcc 3.0,
one of which, _Unwind_Find_FDE() has no side effect (it only returns a
pointer). What this patch does is that in the thread initialization code,
if built with gcc >= 3.0, a call to this function is made in order to make
sure that libgcc_s is loaded at start up time and that there will be no
need to load it upon exit.
An easy way to check which libs are loaded under Linux is :
$ strace -e trace=openat ./haproxy -v
With this patch applied, libgcc_s now appears during init.
Sander confirmed that this patch was enough to put an end to the core
dumps on exit in 2.0, so this patch should be backported there, and maybe
as far as 1.8.
In issue #777, cppcheck wrongly assumes a useless null pointer check
in the expression below while it's obvious that in a 3G/1G split on
32-bit, len can become positive if p is NULL:
p = memchr(ctx.value.ptr, ' ', ctx.value.len);
len = p - ctx.value.ptr;
if (!p || len <= 0)
return 0;
In addition, on 64 bits you never know, given that len is a 32-bit signed
int, thus the sign of the result in case of a null p will always be the
opposite of the 32nd bit of ctx.value.ptr. Admittedly the test is ugly.
Tim proposed this fix consisting in checking for p == ctx.value.ptr
instead when checking for first character only, which Ilya confirmed is
enough to shut cppcheck up. No backport is needed.
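Based on the description above, the revised test might look like the following
sketch (same field names as in the quoted snippet):

/* Reject both a missing space and a leading space (empty first word)
 * without relying on the sign of a pointer difference:
 */
p = memchr(ctx.value.ptr, ' ', ctx.value.len);
if (!p || p == ctx.value.ptr)
    return 0;
len = p - ctx.value.ptr;   /* now guaranteed to be positive */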
When calling the http_replace_res_status() function, an optional reason may now
be set. If it points to NULL, it is ignored and the original reason is
preserved; only the response status is replaced. Otherwise both the status
and the reason are replaced.
It simplifies the API and, most of the time, avoids an extra call to
http_replace_res_reason().
The documentation stated the "replace-path" action replaces the path, including
the query-string if any is present. But in the code, only the path is
replaced. The query-string is preserved. So, now, instead of relying on the same
action code as the "set-uri" action (1), a new action code (4) is used for the
"replace-path" action. In the http_req_replace_stline() function, when the action
code is 4, we call http_replace_req_path() setting the last argument (with_qs)
to 1. This way, the query-string is not skipped but included in the path to be
replaced.
This patch relies on the commit b8ce505c6 ("MINOR: http-htx: Add an option to
eval query-string when the path is replaced"). Both must be backported as far as
2.0. It should fix the issue #829.
The http_replace_req_path() function now takes a third argument to evaluate the
query-string as part of the path or to preserve it. If <with_qs> is set, the
query-string is replaced with the path. Otherwise, only the path is replaced.
This patch is mandatory to fix issue #829. The next commit depends on it. So be
careful during backports.
When an informational response (1xx) is received, we must be sure to send it
ASAP. To do so, CF_SEND_DONTWAIT flag must be set on the response channel to
instruct the stream-interface to not set the CO_SFL_MSG_MORE flag on the
transport layer. Otherwise the response delivery may be delayed, because of the
commit 8945bb6c0 ("BUG/MEDIUM: stream-int: fix loss of CO_SFL_MSG_MORE flag in
forwarding").
Note that a previous patch (cf6898cd ["BUG/MINOR: http-ana: Don't wait to send
1xx responses generated by HAProxy"]) added this flag on 1xx responses generated
by HAProxy but not on responses coming from servers.
This patch must be backported to 2.2 and may be backported as far as 1.9, for
the HTX part only. But this part has changed in 2.2, so it may be a bit
tricky. Note it does not fix any known bug on 2.1 and below because the
CO_SFL_MSG_MORE flag is ignored by the h1 mux.
Commit 0d06df6 ("MINOR: sock: introduce sock_inet and sock_unix")
made use of isdigit() on the UNIX socket path without casting the
value to unsigned char, breaking the build on cygwin and possibly
other platforms. No backport is needed.
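For reference, the portable pattern is to cast the char through unsigned char
before handing it to the ctype functions; a tiny standalone illustration:

#include <ctype.h>

/* On platforms where 'char' is signed, passing a negative char value to
 * isdigit() is undefined behavior; always cast through unsigned char.
 */
int all_digits(const char *s)
{
    for (; *s; s++) {
        if (!isdigit((unsigned char)*s))
            return 0;
    }
    return 1;
}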
For now we still don't retrieve dgram sockets, but the code must be able
to distinguish them before we switch to receivers. This adds a new flag
to the xfer_sock_list indicating that a socket is of type SOCK_DGRAM. The
way to set the flag for now is by looking at the dummy address family which
equals AF_CUST_UDP{4,6} in this case (given that other dgram sockets are not
yet supported).
We'll want to store more info there, including some info that is not
represented in listener options at the moment (such as dgram vs stream), so
let's get rid of these and instead use a new set of options (SOCK_XFER_OPT_*).
The new function was called sock_get_old_sockets() and was left as-is
except a minimum amount of style lifting to make it more readable. It
will never be awesome anyway since it's used very early in the boot
sequence and needs to perform socket I/O without any external help.
This code was highly redundant, existing for TCP clients, TCP servers
and UDP servers. Let's move it to sock_inet where it belongs. The new
functions are sock_inet4_make_foreign() and sock_inet6_make_foreign().
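A hedged sketch of what such a helper does on Linux, assuming the
IP_TRANSPARENT/IPV6_TRANSPARENT socket options (the real functions handle more
cases; the names below carry a _sketch suffix to make clear they are
illustrative):

#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19
#endif
#ifndef IPV6_TRANSPARENT
#define IPV6_TRANSPARENT 75
#endif

/* Allow the socket to bind to a non-local (foreign) IPv4 address.
 * Returns 1 on success, 0 on failure.
 */
int sock_inet4_make_foreign_sketch(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) == 0;
}

/* Same for IPv6. */
int sock_inet6_make_foreign_sketch(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_TRANSPARENT, &one, sizeof(one)) == 0;
}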
This is essentially a merge from tcp_find_compatible_fd() and
uxst_find_compatible_fd() that relies on a listener's address and
compare function and still checks for other variations. For AF_INET6
it compares a few of the listener's bind options. A minor change for
UNIX sockets is that transparent mode, interface and namespace used
to be ignored when trying to pick a previous socket while now if they
are changed, the socket will not be reused. This could be refined but
it's still better this way as there is no more risk of using a
differently bound socket by accident.
Eventually we should not pass a listener there but a set of binding
parameters (address, interface, namespace etc...) which ultimately will
be grouped into a receiver. For now this still doesn't exist so let's
stick to the listener to break dependencies in the rest of the code.
Let's determine it at boot time instead of doing it on first use. It
also saves us from having to keep it thread local. It's been moved to
the new sock_inet_prepare() function, and the variables were renamed
to sock_inet_tcp_maxseg_default and sock_inet6_tcp_maxseg_default.
The v6only_default variable is not specific to TCP but to AF_INET6, so
let's move it to the right file. It's now immediately filled on startup
during the PREPARE stage so that it doesn't have to be tested each time.
The variable's name was changed to sock_inet6_v6only_default.
The function now makes it clear that it's independent of the socket
type and solely relies on the address family. Note that it supports
both IPv4 and IPv6 as we don't seem to need it per-family.
This one is common to the TCPv4 and UDPv4 code, it retrieves the
destination address of a socket, taking care of the possibility that for
an incoming connection the traffic was redirected. The TCP and
UDP definitions were updated to rely on it and remove duplicated code.
The new addrcmp() protocol member points to the function to be used to
compare two addresses of the same family.
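As an illustration (not the actual prototype), such a per-family comparison
function might look like this for AF_INET:

#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Compare two AF_INET addresses (address + port). Returns 0 if they are
 * equal, non-zero otherwise. Both addresses must be of the same family.
 */
int sock_inet4_addrcmp_sketch(const struct sockaddr_storage *a,
                              const struct sockaddr_storage *b)
{
    const struct sockaddr_in *a4 = (const struct sockaddr_in *)a;
    const struct sockaddr_in *b4 = (const struct sockaddr_in *)b;

    if (a->ss_family != AF_INET || b->ss_family != AF_INET)
        return -1;
    if (a4->sin_port != b4->sin_port)
        return -1;
    return memcmp(&a4->sin_addr, &b4->sin_addr, sizeof(a4->sin_addr));
}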
When picking an FD from a previous process, we can now use the
family-specific address comparison functions instead of having to rely on a
local implementation. This will help move that code to a more central
place.
These files will regroup everything specific to AF_INET, AF_INET6 and
AF_UNIX socket definitions and address management. Some code there might
be agnostic to the socket type and could later move to af_xxxx.c but for
now we only support regular sockets so no need to go too far.
The files are quite poor at this step, they only contain the address
comparison function for each address family.
The new file sock.c will contain generic code for standard sockets
relying on file descriptors. We currently have way too much duplication
between proto_uxst, proto_tcp, proto_sockpair and proto_udp.
For now only get_src, get_dst and sock_create_server_socket were moved,
and are used where appropriate.
Let's finish the cleanup and get rid of all bind and server keywords
parsers from proto_uxst.c. They're now moved to cfgparse-unix.c. Now
proto_uxst.c is clean and only contains code related to binding and
connecting.
Let's continue the cleanup and get rid of all bind and server keywords
parsers from proto_tcp.c. They're now moved to cfgparse-tcp.c, just as
was done for ssl before 2.2 release. Nothing has changed beyond this.
Now proto_tcp.c is clean and only contains code related to binding and
connecting.
Let's continue the cleanup and get rid of all sample fetch functions
from proto_tcp.c. They're now moved to tcp_sample.c, just as was done
for ssl before 2.2 release. Nothing has changed beyond this.
This is totally ugly, smp_fetch_src() is exported only so that stick_table.c
can (ab)use it in the {sc,src}_* sample fetch functions. It could be argued
that the sample could have been reconstructed there in place, but we don't
even need to duplicate the code. We'd rather simply retrieve the "src"
fetch's function from where it's used at init time and be done with it.
The file proto_tcp.c has become a real mess because it still contains
tons of definitions that have nothing to do with the TCP protocol setup.
This commit moves the ruleset actions "set-src-port", "set-dst-port",
"set-src", "set-dst", and "silent-drop" to a new file "tcp_act.c".
Nothing has changed beyond this.
get_old_sockets() mistakenly sets ret=0 instead of ret2=0 before leaving
when the old process announces zero FD. So it will return an error
instead of success. This must be particularly rare not to have a
single socket to offer though!
A few comments were added to make it more obvious what to expect in
return.
This must be backported to 1.8 since the bug has always been there.
Now we don't limit ourselves to listeners found in proxies nor peers
anymore, we're instead scanning all known FDs for those marked with
".exported=1". Just doing so has significantly simplified the code,
and will later allow yielding while sending FDs if desired.
When it comes to retrieving a possible namespace name or interface
name, for now this is only performed on listeners since these are the
only ones carrying such info. Once this moves somewhere else, we'll
be able to also pass this info for UDP receivers for example, with
only tiny changes.
This new flag will be used to mark FDs that must be passed to any future
process across the CLI's "_getsocks" command.
The scheme here is quite complex and full of special cases:
- FDs inherited from parent processes are *not* exported this way, as
they are supposed to instead be passed by the master process itself
across reloads. However such FDs ought never to be paused otherwise
this would disrupt the socket in the parent process as well;
- FDs resulting from a "bind" performed over a socket pair, which are
in fact one side of a socket pair passed inside another control socket
pair, must not be passed either. Since all of them are used the same
way, for now it's enough never to put this "exported" flag to FDs
bound by the socketpair code.
- FDs belonging to temporary listeners (e.g. a passive FTP data port)
must not be passed either. Fortunately we don't have such FDs yet.
- the rest of the listeners for now are made of TCP, UNIX stream, ABNS
sockets and are exportable, so they get the flag.
- UDP listeners were wrongly created as listeners and are not suitable
here. Their FDs should be passed but for now they are not since the
client doesn't even distinguish the SO_TYPE of the retrieved sockets.
In addition, it's important to keep in mind that:
- inherited FDs may never be closed in master process but may be closed
in worker processes if the service is shut down (useless since still
bound, but technically possible) ;
- inherited FDs may not be disabled ;
- exported FDs may be disabled because the caller will perform the
subsequent listen() on them. However that might not work for all OSes
- exported FDs may be closed, it just means the service was shut down
from the worker, and will be rebound in the new process. This implies
that we have to disable exported on close().
=> as such, contrary to an apparently obvious equivalence, the "exported"
status doesn't imply anything regarding the ability to close a
listener's FD or not.
This essentially undoes what we did in fd.c in 1.8 to support seamless
reload. Since we don't need to remove an fd anymore we can turn
fd_delete() to the simple function it used to be.
We used to require fd_remove() to remove an FD from a poller when we
still had the FD cache and it was not possible to directly act on the
pollers. Nowadays we don't need this anymore as the pollers will
automatically unregister disabled FDs. The fd_remove() hack is
particularly problematic because it additionally hides the FD from
the known FD list and could make one think it's closed.
It's used at two places:
- with the async SSL engine
- with the listeners (when unbinding from an fd for another process)
Let's just use fd_stop_both() instead, which will propagate down the
stack to do the right thing, without removing the FD from the array
of known ones.
Now when dumping FDs using "show fd" on a process which still knows some
of the other workers' FDs, the FD will properly be listed with a listener
state equal to "ZOM" for "zombie". This guarantees that the FD is still
known and will properly be passed using _getsocks().
The fix 7df5c2d ("BUG/MEDIUM: ssl: fix ssl_bind_conf double free") was
not complete. The problem still occurs when using wildcards in
certificates, during the deinit.
This patch removes the free of the ssl_conf structure in
ssl_sock_free_all_ctx() since it's already done in the crtlist deinit.
It must be backported in 2.2.
During a reload operation, we used to send listener options associated
with each passed file descriptor. These were passed as binary contents
for the size of the "options" field in the struct listener. This means
that any flag value change or field size change would be problematic,
the former failing to properly grab certain options, the latter possibly
causing permanent failures during this operation.
Since these two previous commits:
MINOR: reload: determine the foreign binding status from the socket
BUG/MINOR: reload: detect the OS's v6only status before choosing an old socket
we don't need this anymore as the values are determined from the file
descriptor itself.
Let's just turn the previous 32 bits to vestigial space, send them as
zeroes and ignore them on receipt. The only possible side effect is if
someone would want to roll back from a 2.3 to 2.2 or earlier, such options
might be ignored during this reload. But other forthcoming changes might
make this fail as well anyway so that's not a reason for keeping this
behavior.
Let's not look at the listener options passed by the original process
and determine from the socket itself whether it is configured for
transparent mode or not. This is cleaner and safer, and doesn't rely
on flag values that could possibly change between versions.
The v4v6 and v6only options are passed as data during the socket transfer
between processes so that the new process can decide whether it wants to
reuse a socket or not. But this actually misses one point: if no such option
is set and the OS defaults are changed between the reloads, then the socket
will still be inherited and will never be rebound using the new options.
This can be seen by starting the following config:
global
stats socket /tmp/haproxy.sock level admin expose-fd listeners
frontend testme
bind :::1234
timeout client 2000ms
Having a look at the OS settings, v6only is disabled:
$ cat /proc/sys/net/ipv6/bindv6only
0
A first check shows it's indeed bound to v4 and v6:
$ ss -an -6|grep 1234
tcp LISTEN 0 2035 *:1234 *:*
Reloading the process doesn't change anything (which is expected). Now let's set
bindv6only:
$ echo 1 | sudo tee /proc/sys/net/ipv6/bindv6only
1
$ cat /proc/sys/net/ipv6/bindv6only
1
Reloading gives the same state:
$ ss -an -6|grep 1234
tcp LISTEN 0 2035 *:1234 *:*
However a restart properly shows a correct bind:
$ ss -an -6|grep 1234
tcp LISTEN 0 2035 [::]:1234 [::]:*
This one doesn't change once bindv6only is reset, for the same reason.
This patch attacks this problem differently. Instead of passing the two
options at once for each listening fd, it ignores the options and reads
the socket's current state for the IPV6_V6ONLY flag and sets it only.
Then before looking for a compatible FD, it checks the OS's defaults
before deciding which of the v4v6 and v6only needs to be kept on the
listener. And the selection is only made on this.
First, it addresses this issue. Second, it also ensures that if such
options are changed between reloads to identical states, the socket
can still be inherited. For example adding v4v6 when bindv6only is not
set will allow the socket to still be usable. Third, it avoids an
undesired dependency on the LI_O_* bit values between processes across
a reload (for these ones at least).
It might make sense to backport this to some recent stable versions, but
quite frankly the likelihood that anyone will ever notice it is extremely
faint.
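For reference, reading the socket's effective IPV6_V6ONLY state as described
above is a simple getsockopt() call; a minimal sketch:

#include <sys/socket.h>
#include <netinet/in.h>

/* Return 1 if the (AF_INET6) socket is bound in v6-only mode, 0 if it
 * also accepts v4-mapped traffic, -1 on error.
 */
int sock_get_v6only(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &val, &len) < 0)
        return -1;
    return !!val;
}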
If a socket was already bound (inherited from a parent or retrieved from
a previous process), there's no point trying to change its IPV6_V6ONLY
state since it will fail. This is visible in strace as an EINVAL during
a reload when passing FDs.
The use of Common Name is fading out in favor of the RFC recommended
way of using SAN extensions. For example, Chrome from version 58
will only match server name against SAN.
The following patch adds SAN extension by default to all generated certificates.
The SAN extension will be of type DNS and based on the server name.
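A hedged OpenSSL sketch of adding a DNS SAN to a generated certificate (error
handling trimmed; the function name is hypothetical and the actual generation
code in HAProxy does more):

#include <stdio.h>
#include <openssl/x509.h>
#include <openssl/x509v3.h>

/* Add a "subjectAltName: DNS:<servername>" extension to the certificate. */
int add_san_ext(X509 *cert, const char *servername)
{
    char value[256];
    X509_EXTENSION *ext;

    snprintf(value, sizeof(value), "DNS:%s", servername);
    ext = X509V3_EXT_conf_nid(NULL, NULL, NID_subject_alt_name, value);
    if (!ext)
        return -1;
    if (!X509_add_ext(cert, ext, -1)) {   /* X509_add_ext() copies ext */
        X509_EXTENSION_free(ext);
        return -1;
    }
    X509_EXTENSION_free(ext);
    return 0;
}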
haproxy supports generating SSL certificates based on SNI using a provided
CA signing certificate. Because CA certificates may be signed by multiple
CAs, in some scenarios, it is necessary for the server to attach the trust chain
in addition to the generated certificate.
The following patch adds the ability to serve the entire trust chain with
the generated certificate. The chain is loaded from the provided
`ca-sign-file` PEM file.
As reported in issue #816, when building task.o at -O1 with gcc 4.7 or
4.8, we get the following warning:
CC src/task.o
In file included from include/haproxy/proxy.h:31:0,
from include/haproxy/cfgparse.h:27,
from src/task.c:19:
src/task.c: In function 'next_timer_expiry':
include/haproxy/ticks.h:121:10: warning: 'key' may be used uninitialized in this function [-Wmaybe-uninitialized]
src/task.c:349:2: note: 'key' was declared here
It is wrong since the condition to use 'key' is exactly the same as
the one used to set it. This warning disappears at -O2 and disappeared
from gcc 5 and above. Let's just initialize 'key' there, it only adds
16 bytes of code and remains cheap enough for this function.
This should be backported to 2.2.
As reported in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=24745,
haproxy fails to build with TARGET=generic and without extra options due
to auxv.h not being included, since the __GLIBC__ macro is not yet defined.
Let's include it after other libc headers so that the __GLIBC__ definition
is known. Thanks to David and Tim for the diag.
This should be backported to 2.2.
Using a duplicate cache name most likely is the result of a misgenerated
configuration. There is no good reason to allow this, as the duplicate
caches can't be referred to.
This commit resolves GitHub issue #820.
It can be argued whether this is a fix for a bug or not. I'm erring on the
side of caution and marking this as a "new feature". It can be considered for
backporting to 2.2, but for other branches the risk of accidentally breaking
some working (but non-ideal) configuration might be too large.
When the cache name is left out in 'filter cache' the error message refers
to a missing '<id>'. The name of the cache is called 'name' within the docs.
Adjust the error message for consistency.
The error message was introduced in 99a17a2d91.
This commit first appeared in 1.9, thus the patch must be backported to 1.9+.
The negative filters which are supposed to exclude a SNI from a
wildcard, never worked. Indeed the negative filters were skipped in the
code.
To fix the issue, this patch looks for negative filters that are on the
same line as the wildcard that just matched.
This patch should fix issue #818. It must be backported in 2.2. The
problem also exists in versions > 1.8 but the infrastructure required to
fix this was only introduced in 2.1. In older versions we should
probably change the documentation to state that negative filters are
useless.
In bug #810, the SNI are not matched correctly, indeed when trying to
match a certificate type in ssl_sock_switchctx_cbk() all SNIs were not
looked up correctly.
In the case you have in a crt-list:
wildcard.subdomain.domain.tld.pem.rsa *.subdomain.domain.tld record.subdomain.domain.tld
record.subdomain.domain.tld.pem.ecdsa record.subdomain.domain.tld another-record.subdomain.domain.tld
If the client only supports RSA and requests
"another-record.subdomain.domain.tld", HAProxy will find the single
ECDSA certificate and won't try to look up for a wildcard RSA
certificate.
This patch fixes the code so we look for all single and wildcard
certificates before choosing the certificate type.
This bug was introduced by commit 3777e3a ("BUG/MINOR: ssl: certificate
choice can be unexpected with openssl >= 1.1.1").
It must be backported as far as 1.8 once it is heavily tested.
When a regex has been successfully compiled by the JIT pass, it is better
to use the related match function, which thankfully has the same signature,
for better performance.
Signed-off-by: David Carlier <devnexen@gmail.com>
In bug #781 it was reported that HAProxy completes the certificate chain
using the verify store in the case there is no chain.
Indeed, according to OpenSSL documentation, when generating the chain,
OpenSSL uses the chain store OR the verify store in the case there is no
chain store.
As a workaround, this patch always puts a NULL chain in the SSL_CTX so
OpenSSL does not try to complete it.
This must be backported in all branches, the code could be different,
the important part is to ALWAYS set a chain, and use sk_X509_new_null()
if the chain is NULL.
It is possible to process a channel based on desynchronized info if a
request fetch is called from a response and conversely. However, the
code in smp_prefetch_htx() already makes sure the analysis has already
started before trying to fetch from a buffer, so the problem effectively
lies in response rules making use of request expressions only.
Usually it's not a problem as extracted data are checked against the
current HTTP state, except when it comes to the start line, which is
usually accessed directly from sample fetch functions such as status,
path, url, url32, query and so on. In this case, trying to access the
request buffer from the response path will lead to unpredictable
results. When building with DEBUG_STRICT, a process violating these
rules will simply die after emitting:
FATAL: bug condition "htx->first == -1" matched at src/http_htx.c:67
But when this is not enabled, it may or may not crash depending on what
the pending request buffer data look like when trying to spot a start
line there. This is typically what happens in issue #806.
This patch adds a test in smp_prefetch_htx() so that it does not try
to parse an HTX buffer in a channel belonging to the wrong direction.
There's one special case on the "method" sample fetch since it can
retrieve info even without a buffer, from the other direction, as
long as the method is one of the well known ones. There, we call
smp_prefetch_htx() only if needed.
This was reported in 2.0 and must be backported there (oldest stable
version with HTX).
smp_fetch_ssl_x_chain_der() uses the SSL_get_peer_cert_chain() which
does not increment the refcount of the chain, so it should not be free'd.
The bug was introduced by a598b50 ("MINOR: ssl: add ssl_{c,s}_chain_der
fetch methods"). No backport needed.
The reports for health states are checked using memcmp() in order to
only focus on the first word and possibly ignore trailing %d/%d etc.
This makes gcc unhappy about a potential use of "" as the string, which
never happens since the string is always set. This resulted in commit
c4e6460f6 ("MINOR: build: Disable -Wstringop-overflow.") to silence
these messages. However some lengths are incorrect (though cannot cause
trouble), and in the end strncmp() is just safer and cleaner.
This can be backported to all stable branches as it will shut a warning
with gcc 8 and above.
In commit f187ce6, the ssl-skip-self-issued-ca option was accidentally
made useless by reverting the SSL_CTX reworking.
The previous attempt at making this feature was putting each certificate
of the chain in the SSL_CTX with SSL_CTX_add_extra_chain_cert() and was
skipping the Root CA.
The problem here is that doing it this way instead of doing a
SSL_CTX_set1_chain() breaks the support of the multi-certificate bundles.
The SSL_CTX_build_cert_chain() function allows one to remove the Root CA
with the SSL_BUILD_CHAIN_FLAG_NO_ROOT flag. Use it instead of doing
tricks with the CA.
Should fix issue #804.
Must be backported in 2.2.
Following work from Arjen and Mathilde, it adds ssl_{c,s}_chain_der
methods; they return DER-encoded certs from SSL_get_peer_cert_chain
Also update existing vtc tests to add random intermediate certificates
When getting the result through this header:
http-response add-header x-ssl-chain-der %[ssl_c_chain_der,hex]
One can parse it with any lib accepting ASN.1 DER data, such as in go:
bin, err := encoding/hex.DecodeString(cert)
certs_parsed, err := x509.ParseCertificates(bin)
Cc: Arjen Nienhuis <arjen@zorgdoc.nl>
Signed-off-by: Mathilde Gilles <m.gilles@criteo.com>
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Free the snapshots on deinit() when they were initialized in a proxy
upon an error.
This was introduced by c55015e ("MEDIUM: snapshots: dynamically allocate
the snapshots").
Should be backported as far as 1.9.
This way, all fields of the buffer structure are reset when a string argument
(ARGT_STR) is released. It is also a good way to explicitly specify this kind
of argument is a chunk. So .data and .size fields must be set.
This patch may be backported to ease backports.
It means the regsub() converter is now exported to the lua. Map converters based
on regex are not available because the map arguments are not supported.
Thanks to previous commits, it is now safe to use from lua the sample fetches
and sample converters that convert arguments, especially the strings
(ARGT_STR). So now, they are all exported to the lua. They were filtered on the
validation functions. Only fetches without validation functions or with val_hdr
or val_payload_lv functions were exported, and converters without validation
functions.
This patch depends on following commits :
* aec27ef44 "BUG/MINOR: lua: Duplicate lua strings in sample fetches/converters arg array"
* fdea1b631 "MINOR: hlua: Don't needlessly copy lua strings in trash during args validation"
It must be backported as far as 2.1 because the date() and http_date()
converters are no longer exported because of the filter on the validation
function, since the commit ae6f125c7 ("MINOR: sample: add us/ms support to
date/http_date)".
Strings in the argument array used by sample fetches and converters must be
duplicated. This is mandatory because, during the arguments validations, these
strings may be converted and released. It works this way during the
configuration parsing and there is no reason to adapt this behavior during the
runtime when a sample fetch or a sample converter is called from the lua. In
fact, there is a reason to not change the behavior. It must remain simple for
everyone to add new fetches or converters.
Thus, lua strings are duplicated. It is only performed at the end of the
hlua_lua2arg_check() function, if the argument is still a ARGT_STR. Of course,
it requires a cleanup loop after the call or when an error is triggered.
This patch depends on following commits:
* 959171376 "BUG/MINOR: arg: Fix leaks during arguments validation for fetches/converters"
* fdea1b631 "MINOR: hlua: Don't needlessly copy lua strings in trash during args validation"
It may be backported to all supported versions, most probably as far as 2.1
only.
Lua strings are NULL terminated. So in the hlua_lua2arg_check() function, used
to check arguments against the sample fetches specification, there is no reason
to copy these strings in a trash to add a terminating null byte.
In addition, when the array of arguments is built from lua values, we must take
care to count this terminating null byte in the size of the buffer where a
string is stored. The same must be done when a sample is built from a lua value.
This patch may be backported to ease backports.
In hlua_lua2arg_check() function, before converting an argument to an IPv4 or
IPv6 mask, we must be sure to have an integer or a string argument (ARGT_SINT or
ARGT_STR).
This patch must be backported to all supported versions.
In hlua_lua2arg_check() function, before converting a string to an IP address,
we must be sure to have a string argument (ARGT_STR).
This patch must be backported to all supported versions.
Some sample fetches or sample converters use a validation function for their
arguments. In these functions, string arguments (ARGT_STR) may be converted to
another type (for instance a regex, a variable or an integer). Because these
strings are allocated when the argument list is built, they must be freed after
a conversion. Most of the time, it is done. But not always. This patch fixes
these minor memory leaks (only on a few strings, during the configuration
parsing).
This patch may be backported to all supported versions, most probably as far as
2.1 only. If this commit is backported, the previous one 73292e9e6 ("BUG/MINOR:
lua: Duplicate map name to load it when a new Map object is created") must also
be backported. Note that some validation functions do not exist on older
versions. It should be easy to resolve conflicts.
When a new map is created, the sample_load_map() function is called. To do so,
an argument array is created with the name as first argument. Because it is a
lua string, owned by the lua, it must be duplicated. The sample_load_map()
function will convert this argument to a map. In theory, after the conversion,
it must release the original string. It is not performed for now and it is a bug
that will be fixed in the next commit.
This patch may be backported to all supported versions, most probably as far as
2.1 only. But it must be backported with the next commit "BUG/MINOR: arg: Fix
leaks during arguments validation for fetches/converters".
The debug() converter uses a string to reference the sink where to send debug
events. During the configuration parsing, this string is converted to a sink
object but it is still stored as a string argument. It is a problem on deinit
because string arguments are released. So the sink pointer will be released
twice.
To fix the bug, we keep a reference on the sink using an ARGT_PTR argument. This
way, it will not be freed on the deinit.
This patch depends on the commit e02fc4d0d ("MINOR: arg: Add an argument type to
keep a reference on opaque data"). Both must be backported as far as 2.1.
In sample_load_map() function, the global mode is now tested to be sure to be in
the starting mode. If not, an error is returned.
At first glance, this patch may seem useless because maps are loaded during the
configuration parsing. But in fact, it is possible to load a map from the lua,
using the Map:new() method. And nothing forbids calling this method at
runtime, during a script execution. It must never be done because it may perform
a filesystem access for unknown maps or an allocation for known ones. So at
runtime, it means a blocking call or a memory leak. Note it is still possible to
load a map from the lua, but in the global part of a script only. This part is
executed during the configuration parsing.
This patch must be backported in all stable versions.
This bug affects all versions of HAProxy since the OCSP data is not freed
in the deinit(), but leaking on exit() is not really an issue. However,
when doing dynamic update of certificates over the CLI, those data are
not free'd upon the free of the SSL_CTX.
3 leaks are happening, the first leak is the one of the ocsp_arg
structure which serves the purpose of containing the pointers in the
case of a multi-certificate bundle. The second leak is the one of the
ocsp struct. And the third leak is the one of the struct buffer in the
ocsp_struct.
The problem lies with SSL_CTX_set_tlsext_status_arg() which does not
provide a way to free the argument upon an SSL_CTX_free().
This fix uses ex index functions instead of registering a
tlsext_status_arg(). This is really convenient because it allows
registering a free callback which will free the ex index content upon an
SSL_CTX_free().
A refcount was also added to the ocsp_response structure since it is
stored in a tree and can be reused in another SSL_CTX.
Should fix part of the issue #746.
This must be backported in 2.2 and 2.1.
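A minimal sketch of the ex index mechanism, assuming a plain free() on the
stored pointer where the real fix instead drops a reference on the shared
ocsp_response:

#include <stdlib.h>
#include <openssl/ssl.h>

/* Free callback invoked by OpenSSL when the SSL_CTX is freed; it releases
 * whatever was stored at our ex_data index (here a plain free(), the real
 * fix decrements the refcount of the shared OCSP response instead).
 */
static void ocsp_ex_free_cb(void *parent, void *ptr, CRYPTO_EX_DATA *ad,
                            int idx, long argl, void *argp)
{
    free(ptr);
}

static int ocsp_ex_index = -1;

int init_ocsp_ex_index(void)
{
    if (ocsp_ex_index < 0)
        ocsp_ex_index = SSL_CTX_get_ex_new_index(0, NULL, NULL, NULL,
                                                 ocsp_ex_free_cb);
    return ocsp_ex_index;
}

/* usage: SSL_CTX_set_ex_data(ctx, ocsp_ex_index, ocsp_arg);
 *        later, SSL_CTX_free() automatically invokes ocsp_ex_free_cb().
 */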
Fix a memory leak when loading an OCSP file when the file was already
loaded elsewhere in the configuration.
Indeed, if the OCSP file already exists, a useless chunk_dup() will be
done during the load.
To fix it we revert "ocsp" to "iocsp" like it was done previously.
This was introduced by commit 246c024 ("MINOR: ssl: load the ocsp
in/from the ckch").
Should fix part of the issue #746.
It must be backported in 2.1 and 2.2.
A regression was introduced by 13a9232ebc
when I added support for the Additional section of the SRV responses.
Basically, when a server is managed through an SRV record's additional
section and it's disabled (because its associated Additional record has
disappeared), it never leaves its MAINT state and so never comes back to
production.
This patch updates the "snr_update_srv_status()" function to clear the
MAINT status when the server now has an IP address and also ensure this
function is called when parsing Additional records (and associating them
to new servers).
This can cause a severe outage for people using HAProxy + consul (or any
other service registry) through DNS service discovery.
This should fix issue #793.
This should be backported to 2.2.
The H1 multiplexer is able to perform synchronous sends. When a large body is
transferred, if nothing is received and if no error or shutdown occurs, it is
possible to not go down to the H1 connection level to do I/O for a long
time. When this happens, we must still take care to refresh the H1 connection
timeout. Otherwise it is possible to hit the connection timeout during the
transfer while it should not expire.
This bug exists because only h1_process() refreshes the H1 connection timeout.
To fix the bug, h1_snd_buf() must also refresh this timeout. To make things
more readable, a dedicated function has been introduced and called to refresh
the timeout.
This bug exists on all HTX versions. But it is harder to hit it on 2.1 and below
because when a H1 mux is initialized, we actively try to read data instead of
subscribing for receiving. So there is at least one call to h1_process().
This patch should fix the issue #790. It must be backported as far as 2.0.
Check the return of the calloc in ssl_sock_load_ocsp() which could lead
to a NULL dereference.
This was introduced by commit be2774d ("MEDIUM: ssl: Added support for
Multi-Cert OCSP Stapling").
Could be backported as far as 1.7.
There's no point trying to perform a recv() on a back connection if we
have a stream before having sent a request, as it's expected to fail.
It's likely that this may avoid some spurious subscribe() calls in some
keep-alive cases (the close case was already addressed at the connection
level by "MINOR: connection: avoid a useless recvfrom() on outgoing
connections").
When a connect() doesn't immediately succeed (i.e. most of the times),
fd_cant_send() is called to enable polling. But given that we don't
mark that we cannot receive either, we end up performing a failed
recvfrom() immediately when the connect() is finally confirmed, as
indicated in issue #253.
This patch simply adds fd_cant_recv() as well so that we're only
notified once the recv path is ready. The reason it was not there
is purely historic, as in the past when there was the fd cache,
doing it would have caused a pending recv request to be placed into
the fd cache, hence a useless recvfrom() upon success (i.e. what
happens now).
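The underlying idea, expressed with a bare non-blocking connect() and epoll
rather than HAProxy's fd layer (a simplified sketch): only wait for
writability until the connection is established, and only then start watching
for readability:

#include <sys/epoll.h>

/* After a non-blocking connect() returned EINPROGRESS, register the fd
 * for EPOLLOUT only: polling for EPOLLIN at this point would only lead
 * to a useless, failing recvfrom() once the connect completes.
 */
int wait_for_connect(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLOUT, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Once EPOLLOUT was reported and the request was sent, switch the
 * subscription to EPOLLIN to wait for the response.
 */
int wait_for_response(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}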
Without this patch, forwarding 100k connections does this:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
17.51 0.704229 7 100000 100000 connect
16.75 0.673875 3 200000 sendto
16.24 0.653222 3 200036 close
10.82 0.435082 1 300000 100000 recvfrom
10.37 0.417266 1 300012 setsockopt
7.12 0.286511 1 199954 epoll_ctl
6.80 0.273447 2 100000 shutdown
5.34 0.214942 2 100005 socket
4.65 0.187137 1 105002 5002 accept4
3.35 0.134757 1 100004 fcntl
0.61 0.024585 4 5858 epoll_wait
With the patch:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
18.04 0.697365 6 100000 100000 connect
17.40 0.672471 3 200000 sendto
17.03 0.658134 3 200036 close
10.57 0.408459 1 300012 setsockopt
7.69 0.297270 1 200000 recvfrom
7.32 0.282934 1 199922 epoll_ctl
7.09 0.274027 2 100000 shutdown
5.59 0.216041 2 100005 socket
4.87 0.188352 1 104697 4697 accept4
3.35 0.129641 1 100004 fcntl
0.65 0.024959 4 5337 1 epoll_wait
Note the total disappearance of 1/3 of failed recvfrom() *without*
adding any extra syscall anywhere else.
The trace of an HTTP health check is now totally clean, with no useless
syscall at all anymore:
09:14:21.959255 connect(9, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:14:21.959292 epoll_ctl(4, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=9, u64=9}}) = 0
09:14:21.959315 epoll_wait(4, [{EPOLLOUT, {u32=9, u64=9}}], 200, 1000) = 1
09:14:21.959376 sendto(9, "OPTIONS / HTTP/1.0\r\ncontent-leng"..., 41, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 41
09:14:21.959436 epoll_wait(4, [{EPOLLOUT, {u32=9, u64=9}}], 200, 1000) = 1
09:14:21.959456 epoll_ctl(4, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLRDHUP, {u32=9, u64=9}}) = 0
09:14:21.959512 epoll_wait(4, [{EPOLLIN|EPOLLRDHUP, {u32=9, u64=9}}], 200, 1000) = 1
09:14:21.959548 recvfrom(9, "HTTP/1.0 200\r\nContent-length: 0\r"..., 16320, 0, NULL, NULL) = 126
09:14:21.959570 close(9) = 0
With the edge-triggered poller, it gets even better:
09:29:15.776201 connect(9, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:29:15.776256 epoll_ctl(4, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=9, u64=9}}) = 0
09:29:15.776287 epoll_wait(4, [{EPOLLOUT, {u32=9, u64=9}}], 200, 1000) = 1
09:29:15.776320 sendto(9, "OPTIONS / HTTP/1.0\r\ncontent-leng"..., 41, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 41
09:29:15.776374 epoll_wait(4, [{EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=9, u64=9}}], 200, 1000) = 1
09:29:15.776406 recvfrom(9, "HTTP/1.0 200\r\nContent-length: 0\r"..., 16320, 0, NULL, NULL) = 126
09:29:15.776434 close(9) = 0
It could make sense to backport this patch to 2.2 and maybe 2.1 after
it has been sufficiently checked for absence of side effects in 2.3-dev,
as some people had reported an extra overhead like in issue #168.
Similarly to the issue described in commit "BUG/MEDIUM: backend: always
attach the transport before installing the mux", in tcpcheck_eval_connect()
we can install a handshake transport layer underneath the mux and replace
its subscriptions, causing a crash if the mux had already subscribed for
whatever reason.
A simple reproducer consists in adding fd_cant_recv() after fd_cant_send()
in tcp_connect_server() and running it on this config, as discussed in issue
listen foo
bind :8181
mode http
option httpchk
server srv1 127.0.0.1:8888 send-proxy-v2 check inter 1000
The mux may only be installed *after* xprt_handshake is set up, so that
it registers against it and not against raw_sock or ssl_sock. This needs
to be backported to 2.2 which is the first version using muxes for checks.
In connect_server(), we can end up in a stupid situation:
- conn_install_mux_be() is called to install the mux. This one
subscribes for receiving and quits ;
- then we discover that a handshake is required on the connection
(e.g. send-proxy), so xprt_add_hs() is called and subscribes as
well.
- we crash in conn_subscribe() which gets a different subscriber.
And if BUG_ON is disabled, we'd likely lose one event.
Note that it doesn't seem to happen by default, but definitely does
if connect() rightfully performs fd_cant_recv(), so it's a matter of
who does what and in what order.
A simple reproducer consists in adding fd_cant_recv() after fd_cant_send()
in tcp_connect_server() and running it on this config, as discussed in issue
listen foo
bind :8181
mode http
server srv1 127.0.0.1:8888 send-proxy-v2
The root cause is that xprt_add_hs() installs an xprt layer underneath
the mux without taking over its subscriptions. Ideally if we want to
support this, we'd need to steal the connection's wait_events and
replace them by new ones. But there doesn't seem to be any case where
we're interested in doing this, so better simply always install the
transport layer before installing the mux; that's safer and simpler.
This needs to be backported to 2.1 which is constructed the same way
and thus suffers from the same issue, though the code is slightly
different there.
This bug was introduced by commit 8f587ea3 ("MEDIUM: lua: Set the analyse
expiration date with smaller wake_time only"). At the end of hlua_action(), the
Lua context may be NULL if the allocation failed.
No backport needed, this is 2.3-dev.
In si_cs_send() and si_cs_recv(), we now explicitly test that the connection's
mux is defined before proceeding. For si_cs_recv(), it is probably a bit
overkill. But opportunistic sends are possible from the moment the server
connection is created, so it is safer to do this test.
This patch may be backported as far as 1.9 if necessary.
In the connect_server() function, there is an optimization to install the mux as
soon as possible. It is possible if we can determine the mux to use from the
configuration only, for instance if the mux is explicitly specified or if no ALPN
is set. This patch adds a new condition to preinstall the mux for non-SSL
connections. In this case, by default, we always use the mux_pt for raw
connections and the mux-h1 for HTTP ones.
This patch is related to the issue #762. It may be backported to 2.2 (and
possibly as far as 1.9 if necessary).
Sometimes, a server connection may be established synchronously. Most of the
time it does not happen on TCP sockets; it is easier to get a synchronous
connect with UNIX sockets. When it happens, if we are not waiting for any
handshake completion, we must be sure to have a mux installed before leaving
the connect_server() function, because an attempt to send may be made before
the connection's I/O handler has a chance to be executed to install the mux, if
not already done. For now, performing a send with no mux installed is not
expected, and leads to a crash when it happens.
This patch should fix the issue #762 and probably #779 too. It must be
backported as far as 1.9.
If a Lua action yields for any reason and if the wake timeout is set, it only
overrides the analyse expiration date if it is smaller. This way, a lower
inspect-delay will be respected, if any.
A dedicated expiration date is now used to apply the inspect-delay of the
tcp-request or tcp-response rulesets. Before, the analyse expiration date was
used, but it may also be updated by Lua (at least). So a Lua script could
extend or reduce the inspect-delay by side effect. This is not expected. If it
becomes necessary, a specific function will be added to do it, because for now
it would be a bit confusing.
On a tcp-response content ruleset evaluation, the inspect-delay is engaged when
the rule's conditions are not validated, but not when the rule's action yields.
This patch must be backported to all supported versions.
When a tcp-request or a tcp-response content ruleset evaluation is aborted, the
corresponding FLT_END analyser must be preserved, if any. But because of a typo,
on the tcp-response content ruleset evaluation, we try to preserve the request
analyser instead of the response one.
This patch must be backported to 2.2.
On a final evaluation of a tcp-request or tcp-response content ruleset, it is
forbidden for an action to yield. To quickly identify bugs, an internal error is
now returned if it happens, and a warning log message is emitted.
A Lua action may yield. It may happen because the action explicitly returns
act.YIELD or because the script itself yields. In the first case, we must abort
the script execution if it is the final rule evaluation, i.e. if the
ACT_OPT_FINAL flag is set. The second case is already covered.
This patch must be backported to 2.2.
When an action is evaluated, flags are passed to know if it is the first call
(ACT_OPT_FIRST) and if it must be the last one (ACT_OPT_FINAL). For the
do-resolve DNS action, the ACT_OPT_FINAL flag must be handled because the
action may yield. It must never yield when this flag is set. Otherwise, it may
lead to a wakeup loop of the stream because the inspect-delay of a tcp-request
content ruleset was reached without stopping the rules evaluation.
This patch is related to the issue #222. It must be backported as far as 2.0.
On Lua 5.4, some API changes make HAProxy compilation fail. Among other
things, the lua_resume() function has changed and now takes an extra argument in
Lua 5.4, and the error LUA_ERRGCMM was removed. Thus the LUA_VERSION_NUM macro
is now tested to know which Lua version is used and adapt the code accordingly.
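The kind of compatibility shim involved looks like the following (a minimal
sketch, not the actual hlua.c code; the wrapper name is illustrative):

    #include <lua.h>

    static int hlua_resume_compat(lua_State *L, lua_State *from, int nargs)
    {
    #if LUA_VERSION_NUM >= 504
            /* Lua 5.4 added an output parameter returning the number of
             * values yielded or returned by the coroutine.
             */
            int nres;
            return lua_resume(L, from, nargs, &nres);
    #else
            return lua_resume(L, from, nargs);
    #endif
    }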
Here are listed the incompatibilities with the previous Lua versions :
http://www.lua.org/manual/5.4/manual.html#8
This patch comes from the HAProxy Fedora RPM, committed by Tom Callaway :
https://src.fedoraproject.org/rpms/haproxy/blob/db970613/f/haproxy-2.2.0-lua-5.4.patch
This patch should fix the issue #730. It must be backported to 2.2 and probably
as far as 2.0.
Support for DNS Service Discovery by means of SRV records was enhanced with
commit 13a9232eb ("MEDIUM: dns: use Additional records from SRV responses")
to use the content of the answer's Additional records when present.
If there are Authority records before the Additional records, we mistakenly
treat that as an invalid response. To fix this, just ignore the Authority
section if it exists and skip to the Additional records.
As 13a9232eb was introduced during 2.2-dev, it must be backported to 2.2.
This is a fix for issue #778.
Since commit 13a9232eb ("MEDIUM: dns: use Additional records from SRV
responses"), a struct server can have a NULL dns_requester->resolution,
when SRV records are used and DNS answers contain an Additional section.
This is a problem when we call snr_update_srv_status() because it does
not check that resolution is NULL, and dereferences it. This patch
simply adds a test for resolution being NULL. When that happens, it means
we are using SRV records with Additional records, and an entry was removed.
This should fix issue #775.
This should be backported to 2.2.
When the watchdog is fired because of Lua, the stack of the corresponding
Lua context is dumped. But we must be sure the Lua context is fully initialized
to do so. If we are blocked on the global Lua lock during the Lua context
initialization, the Lua stack may be NULL.
This patch should fix the issue #776. It must be backported as far as 2.0.
uClibc toolchains built with no dynamic library support don't provide
the dlfcn.h header. That leads to build failure:
CC src/tools.o
src/tools.c:15:10: fatal error: dlfcn.h: No such file or directory
#include <dlfcn.h>
^~~~~~~~~
Enable dladdr on Linux platforms only when USE_DL is defined.
This should be backported wherever 109201fc5 ("BUILD: tools: rely on
__ELF__ not USE_DL to enable use of dladdr()") is backported (currently
only 2.2 and 2.1).
In the CGI/1.1 specification, it is specified that the QUERY_STRING must not be
url-decoded. However, this parameter was sent decoded because it was extracted
after the URI's path decoding. Now, the query-string is first extracted, then
the script part of the path is url-decoded. This way, the QUERY_STRING parameter
is no longer decoded.
This patch should fix the issue #769. It must be backported as far as 2.1.
A workaround for some difficulties encountered to anticipate end of
messages was addressed by commit 810df0614 ("MEDIUM: htx: Add a flag on
a HTX message when no more data are expected"), but there were 3 issues
in it (with minor impact):
- the flag was mistakenly set before an EOH in Lua, which would only
cause incomplete packets to be emitted for now but could cause
truncated responses in the future. It's not needed to add it on
the next EOM block as http_forward_proxy_resp() already does it.
- one was still missing in hlua_applet_http_fct(), possibly causing
delays on Lua services
- one was missing in the Prometheus exporter.
All this simply shows that this mechanism is still quite fragile and
not trivial to use, especially in order to deal with the impossibility
to append the EOM, so we'll need to improve the solution in the future
and future backports should not be completely ruled out.
This fix must be backported where the patch above is backported,
typically 2.1 and later as it was required for a set of fixes.
The previous leak on do-resolve was particularly tricky to check due
to the significant code repetition in dns_validate_dns_response(), which
required careful examination of all return statements to check whether
they needed a pool_free() or not. Let's clean all this up using a common
leave point which releases the element itself. This also encourages
properly setting the current response to NULL right after freeing or
adding it, so that it is not handled again at the leave point. 45 returns
and 22 pool_free() calls were replaced by one of each.
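The pattern is the classic single-exit cleanup idiom; a generic, self-contained
illustration (not the actual dns.c code) could look like this:

    #include <stdlib.h>

    static int parse_response(size_t len, void **out)
    {
            int ret = -1;
            char *item = malloc(len);

            if (!item)
                    goto leave;
            /* ... validation steps, each failing with "goto leave" ... */
            *out = item;
            item = NULL;      /* ownership transferred: don't free it below */
            ret = 0;
     leave:
            free(item);       /* no-op when item is NULL */
            return ret;
    }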
This flag is set by HTTP analyzers to notify that more data are expected. It is
used to know if the CO_SFL_MSG_MORE flag must be set on the connection when data
are sent. Historically, it was set on chunked messages and on compressed
responses. But in HTX, chunked messages are parsed by the H1 multiplexer. So
for this case, infinite forwarding is enabled and the flag must no longer be
set. For the compression, the test must be extended and applied to all data
filters. Thus it is also true for the request channel.
So, now, the CF_EXPECT_MORE flag is set on a request or a response channel if
there is at least one data filter attached to the stream. In addition, the flag
is removed when the HTTP message analysis is finished.
This patch should partially fix the issue #756. It must be backported to 2.1.
In HTX, if the HTX_FL_EOI flag is set on the message, we don't set the
CO_SFL_MSG_MORE flag on the connection. This way, the send is not delayed if
only the EOM is missing in the HTX message.
This patch depends on the commit "MEDIUM: htx: Add a flag on a HTX message when
no more data are expected".
This patch should partially fix the issue #756. It must be backported to
2.1. For earlier versions, CO_SFL_MSG_MORE is ignored by HTX muxes.
The HTX_FL_EOI flag must now be set on an HTX message when no more data are
expected. Most of the time, it must be set before adding the EOM block. Thus, if
there is no space for the EOM, we still know that all data were received and
pushed into the HTX message. There is only an exception for the HTTP replies
(deny, return...). For these messages, the flag is set after all blocks are
pushed into the message, including the EOM block, because, on error, we remove
all inserted data.
When a DNS resolution is freed, the remaining items in .ar_list and .answer_list
are also released. It must be done to avoid a memory leak. And it is the last
chance to release these objects. I've honestly no idea if there is a better
place to release them earlier. But at least, there is no more leak.
This patch should solve the issue #222. It must be backported, at least, as far
as 2.0, and probably, with caution, as far as 1.8 or 1.7.
The do-resolve HTTP action, which performs a DNS resolution of a sample
expression's output, is not thread-safe at all. The resolver object used to do
the resolution must be locked when the action is executed or when the stream is
released, because its curr or wait resolution lists and the requester list
inside a resolution are updated. It is also important not to wake up a released
stream (with a destroyed task).
Of course, because of this bug, various kinds of crashes may be observed.
This patch should fix the issue #236. It must be backported as far as 2.0.
This aims at catching calls to task_unlink_wq() performed by the wrong
thread based on the shared status for the task, as well as calls to
__task_queue() with the wrong timer queue being used based on the task's
capabilities. This will at least help eliminate some hypothesis during
debugging sessions when suspecting that a wrong thread has attempted to
queue a task at the wrong place.
A bug was introduced by commit 77015abe0 ("MEDIUM: tasks: clean up the
front side of the wait queue in wake_expired_tasks()"): front tasks
that are not yet expired were incorrectly requeued into the local
wait queue instead of the global one. Because of this, the same task
could be found by the same thread on next invocation and be unlinked
without locking, allowing another thread to requeue it in parallel,
and conversely another thread could unlink it while the task was being
walked over, causing all sorts of crashes and endless loops in
wake_expired_tasks() and affiliates.
This bug can easily be triggered by stressing the do_resolve action
in multi-thread (after applying the fixes required to get do_resolve
to work with threads). It certainly is the cause of issue #758.
This must be backported to 2.2 only.
Reported GitHub issue #759 shows there is no name resolving
on server lines for ring and peers sections.
This patch introduces resolving for those lines.
This patch adds a boolean parameter to the parse_server function to specify
whether we want the function to perform an initial name resolution using libc.
This boolean is forced to true in case of a peers or ring section.
The boolean is kept false in case of classic servers (from
backend/listen sections).
This patch should be backported to branches where peers sections
support 'server' lines.
Before commit 80b53ffb1 ("MEDIUM: arg: make make_arg_list() stop after
its own arguments"), consumers of arguments would measure the length of
the string between the first opening and closing parenthesis before
calling make_arg_list(), and this latter one would detect an empty string
early by len==0 and would not allocate an argument list.
Since that commit, this has changed a bit because the argument parser
is now the one in charge of delimiting the argument string, so the early
test cannot be used anymore. But the argument list is still allocated,
and despite the number of arguments being returned, consumers do not
necessarily rely on it but instead they rely on the non-null arg_p
pointer that used to be allocated only if at least one argument was
present. But as it's now always allocated, the first argument always
carries the first argument's type with an empty value, which confuses
all functions that take a unique optional argument (such as uuid()).
The proper long term solution would be to always use the returned argument
count, but at least we can make sure the function always returns an empty
argument list when fed with an empty set of parenthesis, as it always used
to do. This is what this patch does.
This fix must be backported to 2.2 and fixes github issue #763. Thanks to
Luke Seelenbinder for reporting the problem.
Since commit ad37c7ab ("BUILD: config: address build warning on
raspbian+rpi4") gcc 7.3.0 complains again on x86_64 (while 8.2.0
does not) :
src/cfgparse.c: In function 'check_config_validity':
src/cfgparse.c:3593:26: warning: argument 1 range [18446744071562067968, 18446744073709551615] exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
newsrv->idle_conns = calloc(global.nbthread, sizeof(*newsrv->idle_conns));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This thing is completely bogus (actually the RPi one was the most wrong).
Let's try to shut them both by using an unsigned short for the cast which
is expected to satisfy everyone. It's worth noting that the exact same calls
a few lines above and below do not trigger this stupid warning.
This should be backported to 2.2 since the fix above was put there already.
In run_tasks_from_task_list() we may free some tasks that have been
killed. Before doing so we unlink them from the wait queue. But if such
a task is in the global wait queue, the queue isn't locked so this can
result in corrupting the global task list and causing loops or crashes.
It's very likely one cause of issue #758.
This must be backported to 2.2. For 2.1 there doesn't seem to be any
case where a task could be freed this way while in the global queue,
but it doesn't cost much to apply the same change (the code is in
process_runnable_task there).
Issue #747 reports that building on raspbian for rpi4 triggers this
warning:
src/cfgparse.c: In function 'check_config_validity':
src/cfgparse.c:3584:26: warning: argument 1 range [2147483648, 4294967295] exceeds maximum object size 2147483647 [-Walloc-size-larger-than=]
newsrv->idle_conns = calloc((unsigned)global.nbthread, sizeof(*newsrv->idle_conns));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It's surprising because the declared type is size_t and the argument is
unsigned (i.e. the same type on 32-bit) precisely to avoid cast issues,
but gcc seems to be too smart at this one and to issue a warning over
the valid range, implying that passing the originally required type would
also warn. Given that these are the only casts in calloc and other ones
don't complain, let's drop them.
All 3 were added by commit dc2f2753e ("MEDIUM: servers: Split the
connections into idle, safe, and available.") that went into 2.2, so
this should be backported.
The CF_SHUTW_NOW flag must be handled the same way as the CF_SHUTW flag in the
co_getblk_nc() and co_getline_nc() functions. It is especially important when we
try to peek a line from outgoing data. In this case, an unfinished line is
blocked and nothing is peeked if the CF_SHUTW_NOW flag is set. But the blocked
data prevent the transition to CF_SHUTW.
The above functions are only used by Lua cosockets. Because of this bug, we may
experience wakeups in a loop of the cosocket's I/O handler if we try to read a
line on a closed socket with a pending unfinished line (no LF found at the end).
This patch should fix issue #744. It must be backported to all supported
versions.
Previous fix dc6e8a9a7 ("BUG/MEDIUM: server: resolve state file handle
leak on reload") traded a bug for another one, now we get this warning
when building server.c, which is valid since f is not necessarily
initialized (e.g. if no global state file is passed):
src/server.c: In function 'apply_server_state':
src/server.c:3272:3: warning: 'f' may be used uninitialized in this function [-Wmaybe-uninitialized]
fclose(f);
^~~~~~~~~
Let's initialize it first. This whole code block should really be
split, cleaned up and reorganized as it's possible that other
similar bugs are hidden in it.
This must be backported to the same branches the commit above is
backported to (likely 2.2 and 2.1).
During reload, the server state file is read and its file handle is not
released; this was independently reported in #738 and #660.
This partially resolves #660. It should be backported to 2.2 and 2.1.
When the loop is continued early, the memory for param_rule is not freed. This
can leak memory per request, which will eventually consume all available memory
on the server.
This patch should fix the issue #750. It must be backported as far as 2.1.
This patch adds a global counter of received syslog messages,
exported on the CLI "show info" as "CumRecvLogs".
This patch also updates the internal conn counter and frequency counters
of the listener and the proxy for each received log message, to
prepare a further export on "show stats".
Log forwarding:
It is possible to declare one or multiple log forwarding sections;
haproxy will forward all received log messages to a list of log servers.
log-forward <name>
Creates a new log forwarder proxy identified as <name>.
bind <addr> [param*]
Used to configure a UDP log listener to receive messages to forward.
Only UDP listeners are allowed; the address must be prefixed using
'udp@', 'udp4@' or 'udp6@'. This supports all "bind" parameters found
in paragraph 5.1, but most of them are irrelevant for the udp/syslog case.
log global
log <address> [len <length>] [format <format>] [sample <ranges>:<smp_size>]
<facility> [<level> [<minlevel>]]
Used to configure the target log servers. See more details in the proxies
documentation.
If no format is specified, haproxy tries to keep the incoming log format.
The configured facility is ignored, except if the incoming message does not
present a facility but one is mandatory in the outgoing format.
If there is no timestamp available in the input format, but the field
exists in the output format, haproxy will use the local date.
Example:
  global
    log stderr format iso local7

  ring myring
    description "My local buffer"
    format rfc5424
    maxlen 1200
    size 32764
    timeout connect 5s
    timeout server 10s
    # syslog tcp server
    server mysyslogsrv 127.0.0.1:514 log-proto octet-count

  log-forward sylog-loadb
    bind udp4@127.0.0.1:1514
    # all messages on stderr
    log global
    # all messages on local tcp syslog server
    log ring@myring local0
    # load balance messages on 4 udp syslog servers
    log 127.0.0.1:10001 sample 1:4 local0
    log 127.0.0.1:10002 sample 2:4 local0
    log 127.0.0.1:10003 sample 3:4 local0
    log 127.0.0.1:10004 sample 4:4 local0
This patch introduces a new fd handler used to parse syslog
messages over UDP.
The parsing function returns the level, facility and metadata that
can be immediately reused to forward the message to a log server.
This handler is enabled on UDP listeners if the proxy is internally set
to mode PR_MODE_SYSLOG.
This patch merges the message-building code between sink and log
and introduces a new API based on a struct ist array to
prepare message headers with zero copy, targeting the
log forwarding feature.
Log formats 'iso' and 'timed' are now available on log lines.
A new log format 'priority' is also added.
This patch introduces proto_udp.c, targeting further support of the
log forwarding feature.
This code was originally produced by Frederic Lecaille working on
QUIC support; only the minimal requirements for syslog support
have been merged.
A boolean was mistakenly declared 'static THREAD_LOCAL', making
the probe of a log to a 'not sampled' log server conditioned by
the last evaluated 'sampled log' server test on the same thread.
This results in unpredictable drops of logs on 'not sampled'
log servers as soon as a 'sampled' log server is declared.
This patch removes the static THREAD_LOCAL attribute from this
boolean, fixing the issue and allowing 'sampled' and
'not sampled' servers to be mixed.
This fix should be backported to any branch which includes
the log sampling feature.
Commit 08016ab82 ("MEDIUM: connection: Add private connections
synchronously in session server list") introduced a build warning about
a potential null dereference which is actually true: in case a reuse
fails and we fail to allocate a new connection, we could crash. The
issue was already present earlier but the compiler couldn't detect
it since it was guarded by an independent condition.
This should be carefully backported to older versions (at least 2.2
and maybe 2.1), the change consists in only adding a test on srv_conn.
The whole sequence of "if" blocks is ugly there and would deserve being
cleaned up so that the !srv_conn condition is matched ASAP and the
assignment is done later. This would remove complicated conditions.
In fcgi_strm_handle_empty_stdout(), the FCGI_SF_ES_RCVD flag is set on "->state"
stream field instead of "->flags". It is obviously wrong. This bug is not
noticeable because the right state is set in the fcgi_process_demux() function a
bit later.
This patch must be backported as far as 2.1.
When the padding of a "stream" record (STDOUT or STDERR) is skipped, we must set
the connection state to RECORD_P. It is especially important if the padding is
not fully received.
This patch must be backported as far as 2.1.
As mentionned in the FastCGI specification, FCGI "streams" are series of
non-empty stream records (length != 0), followed by an empty one. It is properly
handled for FCGI_STDOUT records, but not for FCGI_STDERR ones. If an empty
FCGI_STDERR record is received, the connection is blocked waiting for data which
will never come.
To fix the bug, when an empty FCGI_STDERR record is received, we drop it, eating
the padding if any.
This patch should fix the issue #743. It must be backported as far as 2.1.
The following sample fetches have been added :
* srv_iweight : returns the server's initial weight
* srv_uweight : returns the server's user-visible weight
* srv_weight  : returns the server's current (or effective) weight
The requested server must be passed as an argument, optionally preceded by the
backend name. For instance :
srv_weight(back-http/www1)
In the continuity of the commit 7cf0e4517 ("MINOR: raw_sock: report global
traffic statistics"), we are now able to report the global number of bytes
emitted using the splicing. It can be retrieved in "show info" output on the
CLI.
Note this counter is always declared, regardless the splicing support. This
eases the integration with monitoring tools plugged on the CLI.
When input data are processed, if the request is switched to tunnel mode on a
protocol upgrade, we must continue the processing. Otherwise, pending input data
will only be processed on the next wakeup, i.e. when new input data are received,
on a timeout expiration or on a shutdown. Worse, if the input buffer is full when
it happens, only a timeout or a shutdown will unblock the situation.
This patch should fix the issue #737. It must be backported as far as 1.9. The
bug does not seem to affect 2.0 and 1.9 because, on a protocol upgrade, the
request is switched to tunnel mode when the response is sent to the client. But
the bug is present, so the backport remains necessary.
The srv_use_idle_conn() function is now responsible for updating the server
counters and the connection flags when an idle connection is reused. The same
function is called when a new connection is created. This simplifies the
connect_server() function a bit.
When a new connection is created, its target is always set just after. So the
connection target may be set when it is created instead, during its
initialisation to be precise. That is the purpose of this patch. Now, the
conn_new() function is called with the connection target as a parameter. The
target is then passed to conn_init(). It means the target must be passed when
cs_new() is called. In this case, the target is only used when the conn-stream
is created with no connection. This only happens for tcpchecks for now.
The session_get_conn() function must now be used to look for an available
connection matching a specific target for a given session. This simplifies the
connect_server() function a bit.
When a connection is marked as private, it is now added to the session's server
list. We don't wait for a stream to be detached from the mux to do so. When the
connection is created, this happens after the mux creation. Otherwise, it is
performed when the connection is marked as private.
To allow that, when a connection is created, the session is systematically set
as the connection owner. Thus, a backend connection always has an owner during
its creation. And a private connection always has an owner until its death.
Note that outside the detach() callback, if the call to session_add_conn()
fails, the error is ignored. In this situation, we retry to add the connection
to the session's server list in the detach() callback. If this fails at that
point, the multiplexer is destroyed and the connection is closed.
To set a connection as private, the conn_set_private() function must now be
called. It sets the CO_FL_PRIVATE flag, but it also removes the connection from
the available connection list, if necessary. For now, this never happens because
only HTTP/1 connections may be set as private after their creation, and these
connections are never inserted into the available connection list.
When a new connection is created, it may immediately be set as private if
http-reuse never is configured for the backend. There is no reason to wait for
the call to mux->detach() to do so.
If an expression is configured to set the SNI on a server connection, the
connection is marked as private. To avoid needlessly adding it to the available
connection list when the mux is installed, the SNI is now set on the connection
before installing the mux, just after the call to si_connect().
When a stream is detached from a backend private connection, we must not insert
it in the available connection list. In addition, we must be sure to remove it
from this list. To ensure it is properly performed, this part has been slightly
refactored to clearly split processing of private connections from the others.
This patch should probably be backported to 2.2.
A bug in task_kill() was fixed by commy 54d31170a ("BUG/MAJOR: sched:
make sure task_kill() always queues the task") which added a list
initialization before adding an element. But in fact an inconditional
addition would have done the same and been simpler than first
initializing then checking the element was initialized. Let's use
MT_LIST_ADDQ() there to add the task to kill into the shared queue
and kill the dirty LIST_INIT().
When a connection is added to an idle list, it's already detached and
cannot be seen by two threads at once, so there's no point using
TRY_ADDQ, there will never be any conflict. Let's just use the cheaper
ADDQ.
The TRY_ADDQ there was not needed since the wait list is exclusively
owned by the caller. There's a preliminary test on MT_LIST_ADDED()
that might have been eliminated by keeping MT_LIST_TRY_ADDQ(), but
it would have required two more expensive writes before testing, so
better keep the test the way it is.
Initially when mt_lists were added, their purpose was to be used with
the scheduler, where anyone may concurrently add the same tasklet, so
it sounded natural to implement a check in MT_LIST_ADD{,Q}. Later their
usage was extended and MT_LIST_ADD{,Q} started to be used on situations
where the element to be added was exclusively owned by the one performing
the operation so a conflict was impossible. This became more obvious with
the idle connections and the new macro was called MT_LIST_ADDQ_NOCHECK.
But this remains confusing and at many places it's not expected that
an MT_LIST_ADD could possibly fail, and worse, at some places we start
by initializing it before adding (and the test is superfluous), so let's
rename them to something more conventional to denote the presence of the
check or not:
MT_LIST_ADD{,Q}    : unconditional operation; the caller owns the
element, and doesn't care about the element's
current state (exactly like LIST_ADD)
MT_LIST_TRY_ADD{,Q}: only perform the operation if the element is not
already added or in the process of being added.
This means that the previously "safe" MT_LIST_ADD{,Q} are not "safe"
anymore. This also means that in case of backport mistakes in the
future causing this to be overlooked, the slower and safer functions
will still be used by default.
Note that the missing unchecked MT_LIST_ADD macro was added.
The rest of the code will have to be reviewed so that a number of
callers of MT_LIST_TRY_ADDQ are changed to MT_LIST_ADDQ to remove
the unneeded test.
Previous commit b24bc0d ("MINOR: tcp: Support TCP keepalive parameters
customization") broke non-Linux builds as TCP_KEEP{CNT,IDLE,INTVL} are
not necessarily defined elsewhere.
This patch adds the required #ifdefs to condition the visibility of the
keywords, and adds a mention in the doc about their dependency on Linux.
It is now possible to customize TCP keepalive parameters.
These correspond to the socket options TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL
and are valid for the defaults, listen, frontend and backend sections.
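As an illustration of how such parameters are typically applied to a socket,
guarded so that platforms lacking these macros still build (a minimal sketch;
the function and variable names are illustrative, not haproxy's actual code):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    static void set_tcp_keepalive_params(int fd, int cnt, int idle, int intvl)
    {
    #ifdef TCP_KEEPCNT
            if (cnt)
                    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
    #endif
    #ifdef TCP_KEEPIDLE
            if (idle)
                    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    #endif
    #ifdef TCP_KEEPINTVL
            if (intvl)
                    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
    #endif
    }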
This patch fixes GitHub issue #670.
Compiling HAProxy with USE_LUA=1 and running a configuration check within
valgrind with a very simple configuration such as:
listen foo
bind *:8080
Will report quite a few possible leaks afterwards:
==24048== LEAK SUMMARY:
==24048== definitely lost: 0 bytes in 0 blocks
==24048== indirectly lost: 0 bytes in 0 blocks
==24048== possibly lost: 95,513 bytes in 1,209 blocks
==24048== still reachable: 329,960 bytes in 71 blocks
==24048== suppressed: 0 bytes in 0 blocks
Printing these possible leaks shows that all of them are caused by Lua.
Luckily Lua makes it *very* easy to free all used memory, so let's do
this on shutdown.
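The mechanism boils down to closing the global Lua state on shutdown (a minimal
sketch; the hook name and the global variable are illustrative, not the actual
hlua.c code):

    #include <lua.h>

    static lua_State *L;   /* whatever global state haproxy keeps */

    static void hlua_deinit_sketch(void)
    {
            if (L) {
                    lua_close(L);   /* frees all memory used by this state */
                    L = NULL;
            }
    }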
After this patch is applied, the output looks much better:
==24199== LEAK SUMMARY:
==24199== definitely lost: 0 bytes in 0 blocks
==24199== indirectly lost: 0 bytes in 0 blocks
==24199== possibly lost: 0 bytes in 0 blocks
==24199== still reachable: 329,960 bytes in 71 blocks
==24199== suppressed: 0 bytes in 0 blocks
Given the following example configuration:
listen foo
mode http
bind *:8080
http-request set-var(txn.leak) meth(GET)
server x example.com:80
Running a configuration check with valgrind reports:
==25992== 4 bytes in 1 blocks are definitely lost in loss record 1 of 344
==25992== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25992== by 0x4E239D: my_strndup (tools.c:2261)
==25992== by 0x581E20: make_arg_list (arg.c:253)
==25992== by 0x4DE91D: sample_parse_expr (sample.c:890)
==25992== by 0x58E304: parse_store (vars.c:772)
==25992== by 0x566A3F: parse_http_req_cond (http_rules.c:95)
==25992== by 0x4A4CE6: cfg_parse_listen (cfgparse-listen.c:1339)
==25992== by 0x494C59: readcfgfile (cfgparse.c:2049)
==25992== by 0x545145: init (haproxy.c:2029)
==25992== by 0x421E42: main (haproxy.c:3175)
After this patch is applied the leak is gone as expected.
This is a fairly minor leak, but it can add up for many uses of the `bool()`
sample fetch. The bug most likely exists since the `bool()` sample fetch was
introduced in commit cc103299c7. The fix may
be backported to HAProxy 1.6+.
Given the following example configuration:
listen foo
mode http
bind *:8080
http-request set-var(txn.leak) bool(1)
server x example.com:80
Running a configuration check with valgrind reports:
==24233== 2 bytes in 1 blocks are definitely lost in loss record 1 of 345
==24233== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24233== by 0x4E238D: my_strndup (tools.c:2261)
==24233== by 0x581E10: make_arg_list (arg.c:253)
==24233== by 0x4DE90D: sample_parse_expr (sample.c:890)
==24233== by 0x58E2F4: parse_store (vars.c:772)
==24233== by 0x566A2F: parse_http_req_cond (http_rules.c:95)
==24233== by 0x4A4CE6: cfg_parse_listen (cfgparse-listen.c:1339)
==24233== by 0x494C59: readcfgfile (cfgparse.c:2049)
==24233== by 0x545135: init (haproxy.c:2029)
==24233== by 0x421E42: main (haproxy.c:3175)
After this patch is applied the leak is gone as expected.
This is a fairly minor leak, but it can add up for many uses of the `bool()`
sample fetch. The bug most likely exists since the `bool()` sample fetch was
introduced in commit cc103299c7. The fix may
be backported to HAProxy 1.6+.
Given the following example configuration:
backend foo
mode http
use-server %[str(x)] if { always_true }
server x example.com:80
Running a configuration check with valgrind reports:
==19376== 170 (40 direct, 130 indirect) bytes in 1 blocks are definitely lost in loss record 281 of 347
==19376== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==19376== by 0x5091AC: add_sample_to_logformat_list (log.c:511)
==19376== by 0x50A5A6: parse_logformat_string (log.c:671)
==19376== by 0x4957F2: check_config_validity (cfgparse.c:2588)
==19376== by 0x54442D: init (haproxy.c:2129)
==19376== by 0x421E42: main (haproxy.c:3169)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Given the following example configuration:
backend foo
mode http
use-server x if { always_true }
server x example.com:80
Running a configuration check with valgrind reports:
==18650== 14 bytes in 1 blocks are definitely lost in loss record 3 of 345
==18650== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18650== by 0x649E489: strdup (strdup.c:42)
==18650== by 0x4A5438: cfg_parse_listen (cfgparse-listen.c:1548)
==18650== by 0x494C59: readcfgfile (cfgparse.c:2049)
==18650== by 0x5450B5: init (haproxy.c:2029)
==18650== by 0x421E42: main (haproxy.c:3168)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Given the following example configuration:
frontend foo
mode http
bind *:8080
unique-id-header x
Running a configuration check with valgrind reports:
==17621== 2 bytes in 1 blocks are definitely lost in loss record 1 of 341
==17621== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17621== by 0x649E489: strdup (strdup.c:42)
==17621== by 0x4A87F1: cfg_parse_listen (cfgparse-listen.c:2747)
==17621== by 0x494C59: readcfgfile (cfgparse.c:2049)
==17621== by 0x545095: init (haproxy.c:2029)
==17621== by 0x421E42: main (haproxy.c:3167)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Given the following example configuration:
resolvers test
nameserver test 127.0.0.1:53
listen foo
bind *:8080
server foo example.com resolvers test
Running a configuration check within valgrind reports:
==21995== 5 bytes in 1 blocks are definitely lost in loss record 1 of 30
==21995== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21995== by 0x5726489: strdup (strdup.c:42)
==21995== by 0x4B2CFB: parse_server (server.c:2163)
==21995== by 0x4680C1: cfg_parse_listen (cfgparse-listen.c:534)
==21995== by 0x459E33: readcfgfile (cfgparse.c:2167)
==21995== by 0x50778D: init (haproxy.c:2021)
==21995== by 0x418262: main (haproxy.c:3133)
==21995==
==21995== 12 bytes in 1 blocks are definitely lost in loss record 3 of 30
==21995== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21995== by 0x5726489: strdup (strdup.c:42)
==21995== by 0x4AC666: srv_prepare_for_resolution (server.c:1606)
==21995== by 0x4B2EBD: parse_server (server.c:2081)
==21995== by 0x4680C1: cfg_parse_listen (cfgparse-listen.c:534)
==21995== by 0x459E33: readcfgfile (cfgparse.c:2167)
==21995== by 0x50778D: init (haproxy.c:2021)
==21995== by 0x418262: main (haproxy.c:3133)
with one more leak unrelated to `struct server`. After applying this
patch the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Given the following example configuration:
frontend foo
mode http
bind *:8080
unique-id-format x
Running a configuration check with valgrind reports:
==30712== 42 (40 direct, 2 indirect) bytes in 1 blocks are definitely lost in loss record 18 of 39
==30712== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30712== by 0x4ED7E9: add_to_logformat_list (log.c:462)
==30712== by 0x4EEE28: parse_logformat_string (log.c:720)
==30712== by 0x47B09A: check_config_validity (cfgparse.c:3046)
==30712== by 0x52881D: init (haproxy.c:2121)
==30712== by 0x41F382: main (haproxy.c:3126)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Instead of just calling release_sample_arg(conv_expr->arg_p), we must also
free() the conv_expr itself (after removing it from the list).
Given the following example configuration:
frontend foo
bind *:8080
mode http
http-request set-var(txn.foo) str(bar)
acl is_match str(foo),strcmp(txn.hash) -m bool
Running a configuration check within valgrind reports:
==1431== 32 bytes in 1 blocks are definitely lost in loss record 20 of 43
==1431== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1431== by 0x4C39B5: sample_parse_expr (sample.c:982)
==1431== by 0x56B410: parse_acl_expr (acl.c:319)
==1431== by 0x56BA7F: parse_acl (acl.c:697)
==1431== by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816)
==1431== by 0x4797C3: readcfgfile (cfgparse.c:2167)
==1431== by 0x52943D: init (haproxy.c:2021)
==1431== by 0x41F382: main (haproxy.c:3133)
After this patch is applied the leak is gone as expected.
This is a fairly minor leak that can only be observed if samples need to be
freed, which is not something that should occur during normal processing and
most likely only during shut down. Thus no backport should be needed.
Instead of simply calling free() on expr->smp->arg_p in certain cases,
properly free the sample using release_sample_expr().
Given the following example configuration:
frontend foo
bind *:8080
mode http
http-request set-var(txn.foo) str(bar)
acl is_match str(foo),strcmp(txn.hash) -m bool
Running a configuration check within valgrind reports:
==31371== 160 (48 direct, 112 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 45
==31371== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371== by 0x4C3832: sample_parse_expr (sample.c:876)
==31371== by 0x56B3E0: parse_acl_expr (acl.c:319)
==31371== by 0x56BA4F: parse_acl (acl.c:697)
==31371== by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816)
==31371== by 0x4797C3: readcfgfile (cfgparse.c:2167)
==31371== by 0x5293ED: init (haproxy.c:2021)
==31371== by 0x41F382: main (haproxy.c:3126)
After this patch this leak is reduced. It will be fully removed in a
follow up patch:
==32503== 32 bytes in 1 blocks are definitely lost in loss record 20 of 43
==32503== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==32503== by 0x4C39B5: sample_parse_expr (sample.c:982)
==32503== by 0x56B410: parse_acl_expr (acl.c:319)
==32503== by 0x56BA7F: parse_acl (acl.c:697)
==32503== by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816)
==32503== by 0x4797C3: readcfgfile (cfgparse.c:2167)
==32503== by 0x52943D: init (haproxy.c:2021)
==32503== by 0x41F382: main (haproxy.c:3133)
This is a fairly minor leak that can only be observed if ACLs need to be
freed, which is not something that should occur during normal processing
and most likely only during shut down. Thus no backport should be needed.
When the multiplexer creation is delayed until after the handshake phase, the
connection is added to the available connection list if http-reuse never is not
configured for the backend. But this is a wrong assumption. At this step, the
connection is not safe because it is a new connection. So it must be added to
the available connection list only if http-reuse always is used.
No backport needed, this is 2.2-dev.
When a connection is created and the multiplexer is installed, if the connection
is marked as private, don't consider it as available, regardless of the number of
available streams. This test is performed when the mux is installed at connection
creation, in connect_server(), and when the mux is installed after the handshake
stage.
No backport needed, this is 2.2-dev.
When a connection is picked from the session's server list because the proxy or
the session is marked to use the last requested server, if it is idle, we must
mark it as used, removing the CO_FL_SESS_IDLE flag and decrementing the session's
idle_conns counter.
This patch must be backported as far as 1.9.
Trace messages have been added when the CS_FL_MAY_SPLICE flag is set or unset
and when splicing is really enabled for the H1 connection.
This patch may be backported to 2.1 to ease debugging.
The CS_FL_MAY_SPLICE flag must be unset for the conn-stream if a read0 is
received while reading on the kernel pipe. It is mandatory when some data were
also received. Otherwise, this flag prevents the call to the h1 rcv_buf()
callback. Thus the read0 will never be handled by the h1 multiplexer, leading to
a freeze of the session until a timeout is reached.
This patch must be backported to 2.1 and 2.0.
In h1_rcv_buf(), splicing is systematically disabled if it was previously
enabled. When this happens, if splicing is enabled it means the channel's
buffer was empty before calling h1_rcv_buf(). Thus, the only reason to disable
splicing at this step is when some input data have just been processed.
This patch may be backported to 2.1 and 2.0.
In h1_rcv_pipe(), if the mux is unable to receive data, for instance because the
multiplexer is blocked on input waiting for the other side (BUSY mode), no
receive must be performed.
This patch must be backported to 2.1 and 2.0.
In the commit 17ccd1a35 ("BUG/MEDIUM: connection: add a mux flag to indicate
splice usability"), The CS_FL_MAY_SPLICE flags was added to notify the upper
layer that the mux is able to use the splicing. But this was only done for the
payload in a message, in HTTP_MSG_DATA state. But the splicing is also possible
in TUNNEL mode, in HTTP_MSG_TUNNEL state. In addition, the splicing ability is
always disabled for chunked messages.
This patch must be backported to 2.1 and 2.0.
Since the commit cd0d2ed6e ("MEDIUM: log-format: make the LF parser aware of
sample expressions' end"), the LF_STEXPR label in the last switch-case statement
at the end of the for loop in the parse_logformat_string() function cannot be
reached anymore.
This patch should fix the issue #723.
Add a check on the conn pointer to avoid a NULL dereference in
smp_fetch_ssl_x_keylog().
The problem is not supposed to happen because the function is only used
for the frontend at the moment.
Introduced by 7d42ef5, 2.2 only.
This fixes issue #733.
OpenSSL 1.1.1 provides a callback registering function,
SSL_CTX_set_keylog_callback, which allows one to receive a string
containing the keys to decipher TLSv1.3.
Unfortunately it is not possible to store this data in binary form and
we can only get this information using the callback, which means that we
need to store it until the connection is closed.
This patch adds 2 pools: the first one, pool_head_ssl_keylog, is used to
store a struct ssl_keylog which will be inserted as ex_data in an SSL *.
The second one is pool_head_ssl_keylog_str, which is used to store
the hexadecimal strings.
To enable the capture of the keys, you need to set "tune.ssl.keylog on"
in your configuration.
The following fetches were implemented:
ssl_fc_client_early_traffic_secret,
ssl_fc_client_handshake_traffic_secret,
ssl_fc_server_handshake_traffic_secret,
ssl_fc_client_traffic_secret_0,
ssl_fc_server_traffic_secret_0,
ssl_fc_exporter_secret,
ssl_fc_early_exporter_secret
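The registration mechanism looks roughly like this (a minimal sketch of the
OpenSSL 1.1.1 API usage, not the actual ssl_sock.c code; the callback name,
ex_data index and storage are illustrative):

    #include <openssl/ssl.h>
    #include <string.h>

    static int keylog_idx;   /* obtained from SSL_get_ex_new_index() at init */

    static void keylog_cb(const SSL *ssl, const char *line)
    {
            /* "line" only lives for the duration of the callback, so it must
             * be copied somewhere tied to the connection (here a plain copy
             * attached as ex_data; the real code parses it and stores each
             * secret in dedicated pool entries).
             */
            char *copy = strdup(line);

            if (copy)
                    SSL_set_ex_data((SSL *)ssl, keylog_idx, copy);
    }

    static void enable_keylog(SSL_CTX *ctx)
    {
            SSL_CTX_set_keylog_callback(ctx, keylog_cb);
    }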
NetBSD apparently uses macros for tolower/toupper and complains about
the use of char for array subscripts. Let's properly cast all of them
to unsigned char where they are used.
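The pattern is simply to cast before indexing, for example:

    #include <ctype.h>

    /* Always pass an unsigned char to the <ctype.h> functions: NetBSD
     * implements them as array lookups, and a negative plain char would
     * index out of bounds.
     */
    static void str_tolower(char *s)
    {
            for (; *s; s++)
                    *s = tolower((unsigned char)*s);
    }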
This is needed to fix issue #729.
Originally it was made to return a void* because some comparisons in the
code where it was used required a lot of casts. But now we don't need
that anymore. And having it non-const breaks the build on NetBSD 9 as
reported in issue #728.
So let's switch to const and adjust debug.c to accommodate this.
A typo was accidently introduced in commit 48ce6a3 ("BUG/MEDIUM: muxes:
Make sure nobody stole the connection before using it."), a "&" was
placed in front of "OTHER_LOCK", which breaks DEBUG_LOCK. No backport
is needed.
Building on OpenBSD 6.7 with gcc-4.2.1 yields the following warnings
which suggest that the initialization is not taken as expected but
that the container member is reset with each initialization:
src/peers.c: In function 'peer_send_updatemsg':
src/peers.c:1000: warning: initialized field overwritten
src/peers.c:1000: warning: (near initialization for 'p.updt')
src/peers.c:1001: warning: initialized field overwritten
src/peers.c:1001: warning: (near initialization for 'p.updt')
src/peers.c:1002: warning: initialized field overwritten
src/peers.c:1002: warning: (near initialization for 'p.updt')
src/peers.c:1003: warning: initialized field overwritten
src/peers.c:1003: warning: (near initialization for 'p.updt')
src/peers.c:1004: warning: initialized field overwritten
src/peers.c:1004: warning: (near initialization for 'p.updt')
Fixing this is trivial, we just have to initialize one level at
a time.
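The change amounts to the following kind of rewrite (a minimal, generic
sketch; the structure and field names are illustrative, not the actual
peers.c types):

    struct updt_part { int id; int len; };
    struct msg       { struct updt_part updt; };

    /* gcc 4.2.1 warns about repeated nested designators such as:
     *   struct msg p = { .updt.id = 1, .updt.len = 2 };
     * because it sees each ".updt.xxx" as re-initializing p.updt.
     * Initializing one level at a time avoids the warning:
     */
    struct msg p = { .updt = { .id = 1, .len = 2 } };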
Please refer to commit 19a69b3740 for all the
details. This follow up commit fixes the `http-response capture` case, the
previous one only fixed the `http-request capture` one. The documentation was
already updated and the change to `check_http_res_capture` is identical to
the `check_http_req_capture` change.
This patch must be backported together with 19a69b3740.
Most likely this is 1.6+.
When we takeover a connection, let the xprt layer know. If it has its own
tasklet, and it is already scheduled, then it has to be destroyed, otherwise
it may run the new mux tasklet on the old thread.
Note that we only do this for the ssl xprt for now, because the only other
one that might wake the mux up is the handshake one, which is supposed to
disappear before idle connections exist.
No backport is needed, this is for 2.2.
In the various takeover() methods, make sure we schedule the old tasklet
on the old thread, as we don't want it to run on our own thread! This
was causing a very rare crash when building with DEBUG_STRICT, seeing
that either an FD's thread mask didn't match the thread ID in h1_io_cb(),
or that stream_int_notify() would try to queue a task with the wrong
tid_bit.
In order to reproduce this, it is necessary to maintain many connections
(typically 30k) at a high request rate flowing over H1+SSL between two
proxies, the second of which would randomly reject ~1% of the incoming
connection and randomly killing some idle ones using a very short client
timeout. The request rate must be adjusted so that the CPUs are nearly
saturated, but never reach 100%. It's easier to reproduce this by skipping
local connections and always picking from other threads. The issue
should happen in less than 20s otherwise it's necessary to restart to
reset the idle connections lists.
No backport is needed, takeover() is 2.2 only.
In srv_cleanup_idle_connections(), we compute how many idle connections
are in excess compared to the average need. But we may actually be missing
some, for example if a certain number were recently closed and the average
of used connections didn't change much since previous period. In this
case exceed_conn can become negative. There was no special case for this
in the code, and calculating the per-thread share of connections to kill
based on this value resulted in special value -1 to be passed to
srv_migrate_conns_to_remove(), which for this function means "kill all of
them", as used in srv_cleanup_connections() for example.
This causes large variations of idle connections counts on servers and
CPU spikes at the moment the cleanup task passes. These were quite more
visible with SSL as it costs CPU to close and re-establish these
connections, and it also takes time, reducing the reuse ratio, hence
increasing the amount of connections during reconnection.
In this patch we simply skip the killing loop when this condition is met.
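The guard is essentially the following (a rough sketch; the function and
variable names are illustrative, not the exact srv_cleanup_idle_connections()
code):

    static int conns_to_kill(int curr_idle_conns, int estimated_needed_conns)
    {
            /* may be negative when fewer idle connections remain than the
             * estimated need, e.g. right after a batch of closes
             */
            int exceed_conn = curr_idle_conns - estimated_needed_conns;

            if (exceed_conn <= 0)
                    return 0;   /* nothing in excess: skip the killing loop */
            return exceed_conn;
    }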
No backport is needed, this is purely 2.2.
The function timeofday_as_iso_us() now adds the trailing local timezone offset.
Doing this, the function can be used directly to generate rfc5424 logs.
It affects the content of a ring if the ring's format is set to 'iso' or 'timed'.
Note: the default ring 'buf0' is of type 'timed'.
It is preferable NOT to backport this to stable releases unless bugs are
reported, because while the previous format is not correct and the new
one is correct, there is a small risk to cause inconsistencies in log
format to some users who would not expect such a change in a stable
cycle.
Sadly, the fix from commit 54d31170a ("BUG/MAJOR: sched: make sure
task_kill() always queues the task") broke the builds without DEBUG_STRICT
as, in order to be careful, it plcaed a BUG_ON() around the previously
failing condition to check for any new possible failure, but this BUG_ON
strips the condition when DEBUG_STRICT is not set. We don't want BUG_ON
to evaluate any condition either as some debugging code calls possibly
expensive ones (e.g. in htx_get_stline). Let's just drop the useless
BUG_ON().
No backport is needed, this is 2.2-dev.
As reported in issue #724, openbsd fails to build in haproxy.c
due to a faulty comma in the middle of a warning message. This code
is only compiled when RLIMIT_AS is not defined, which seems to be
rare these days.
This may be backported to older versions as the problem was likely
introduced when strict limits were added.
Commit 69f591e3b ("MINOR: cli/proxy: add a new "show servers conn" command")
added the ability to dump the idle connections state for a server, but we
must not do this if idle connections were not allocated, which happens if
the server is configured with pool-max-conn 0.
This is 2.2, no backport is needed.
In the various timeout functions, make sure nobody stole the connection from
us before attempting to do anything with it; there's a very small race
condition between the time we access the task context, and the time we
actually check it again with the lock, where it could have been free'd.
task_kill() may fail to queue a task if this task has never ever run,
because its equivalent (tasklet->list) member has never been "emptied"
since it didn't pass through the LIST_DEL_INIT() that's performed by
run_tasks_from_lists(). This results in these tasks to never be freed.
It happens during the mux takeover since the target task usually is
the timeout task which, by definition, has never run yet.
This fixes commit eb8c2c69f ("MEDIUM: sched: implement task_kill() to
kill a task") which was introduced after 2.2-dev11 and doesn't need to
be backported.
Now it's possible to preserve spacing everywhere except in "log-format",
"log-format-sd" and "unique-id-format" directives, where spaces are
delimiters and are merged. That may be useful when the response payload
is specified as a log format string by "lf-file" or "lf-string", or even
for headers or anything else.
In order to merge spaces, a new option LOG_OPT_MERGE_SPACES is applied
exclusively on options passed to function parse_logformat_string().
This patch fixes issue #701 ("http-request return log-format file
evaluation altering spacing of ASCII output/art").
Now when building with -DDEBUG_MEM_STATS, some malloc/calloc/strdup/realloc
stats are kept per file+line number and may be displayed and even reset on
the CLI using "debug dev memstats". This allows to easily track potential
leakers or abnormal usages.
Enables ('on') or disables ('off') sharing of idle connection pools between
threads for a same server. The default is to share them between threads in
order to minimize the number of persistent connections to a server, and to
optimize the connection reuse rate. But to help with debugging or when
suspecting a bug in HAProxy around connection reuse, it can be convenient to
forcefully disable this idle pool sharing between multiple threads, and force
this option to "off". The default is on.
This could have been nice to have during the idle connections debugging,
but it's not too late to add it!
task_wakeup() passes the task through the global run queue under the
global RQ lock, which is expensive when dealing with large amounts of
fcgi_takeover() calls. Let's use the new task_kill() instead to kill the
task.
task_wakeup() passes the task through the global run queue under the
global RQ lock, which is expensive when dealing with large amounts of
h2_takeover() calls. Let's use the new task_kill() instead to kill the
task.
task_wakeup() passes the task through the global run queue under the
global RQ lock, which is expensive when dealing with large amounts of
h1_takeover() calls. Let's use the new task_kill() instead to kill the
task.
By doing so, a scenario involving approximately 130k takeover/s running on
16 threads gained almost 3% performance from 319k req/s to 328k.
task_kill() may be used by any thread to kill any task with less overhead
than a regular wakeup. In order to achieve this, it bypasses the priority
tree and inserts the task directly into the shared tasklets list, cast as
a tasklet. The task_list_size is updated to make sure it is properly
decremented after execution of this task. The task will thus be picked by
process_runnable_tasks() after checking the tree and sent to the TL_URGENT
list, where it will be processed and killed.
If the task is bound to more than one thread, its first thread will be the
one notified.
If the task was already queued or running, nothing is done, only the flag
is added so that it gets killed before or after execution. Of course it's
the caller's responsibility to make sure any resources allocated by this
task were already cleaned up or taken over.
This flag, when set, will be used to indicate that the task must die.
At the moment this may only be placed by the task itself or by the
scheduler when placing it into the TL_NORMAL queue.
The next thread walking algorithm in commit 566df309c ("MEDIUM:
connections: Attempt to get idle connections from other threads.")
proved to be sufficient for most cases, but it still has some rough
edges when threads are unevenly loaded. If one thread wakes up with
10 streams to process in a burst, it will mainly take over connections
from the next one until it doesn't have any more.
This patch implements a rotating index that is stored into the server
list and that any thread taking over a connection is responsible for
updating. This way it starts mostly random and avoids always picking
from the same place. This results in a smoother distribution overall
and a slightly lower takeover rate.
There's a tricky behavior that was lost when the idle connections were
made sharable between threads in commit 566df309c ("MEDIUM: connections:
Attempt to get idle connections from other threads."), it is the ability
to retry from the safe list when looking for any type of idle connection
and not finding one in the idle list.
It is already important when dealing with long-lived connections since
they ultimately all become safe, but that case is already covered by
the fact that safe conns not being used end up closing and are not
looked up anymore since connect_server() sees there are none.
But it's even more important when using server-side connections which
periodically close, because the new connections may spend half of their
time in safe state and the other half in the idle state, and failing
to grab one such connection from the right list results in establishing
a new connection.
This patch makes sure that a failure to find an idle connection results
in a new attempt at finding one from the safe list if available. In order
to avoid locking twice, connections are attempted alternatively from the
idle then safe list when picking from siblings. Tests have shown a ~2%
performance increase by avoiding locking twice.
A typical test with 10000 connections over 16 threads with 210 servers
having a 1 millisecond response time and closing every 5 requests shows
a degrading performance starting at 120k req/s down to 60-90k and an
average reuse rate of 44%. After the fix, the reuse rate raises to 79%
and the performance becomes stable at 254k req/s. Similarly the previous
test with full keep-alive has now increased from 96% reuse rate to 99%
and from 352k to 375k req/s.
No backport is needed as this is 2.2-only.
The problem with the way idle connections currently work is that it's
easy for a thread to steal all of its siblings' connections, then release
them, then it's done by another one, etc. This happens even more easily
due to scheduling latencies, or merged events inside the same poll loop,
which, when dealing with a fast server responding in sub-millisecond
delays, can really result in one thread being fully at work at a time.
In such a case, we perform a huge amount of takeover() which consumes
CPU and requires quite some locking, sometimes resulting in lower
performance than expected.
In order to fight against this problem, this patch introduces a new server
setting "pool-low-conn", whose purpose is to dictate when it is allowed to
steal connections from a sibling. As long as the number of idle connections
remains at least as high as this value, it is permitted to take over another
connection. When the idle connection count becomes lower, a thread may only
use its own connections or create a new one. By proceeding like this even
with a low number (typically 2*nbthreads), we quickly end up in a situation
where all active threads have a few connections. It then becomes possible
to connect to a server without bothering other threads the vast majority
of the time, while still being able to use these connections when the
number of available FDs becomes low.
We also use this threshold instead of global.nbthread in the connection
release logic, allowing to keep more extra connections if needed.
A test performed with 10000 concurrent HTTP/1 connections, 16 threads
and 210 servers with 1 millisecond of server response time showed the
following numbers:
haproxy 2.1.7: 185000 requests per second
haproxy 2.2: 314000 requests per second
haproxy 2.2 lowconn 32: 352000 requests per second
The takeover rate goes down from 300k/s to 13k/s. The difference is
further amplified as the response time shrinks.
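
A minimal sketch of the gating condition introduced by "pool-low-conn", with
illustrative structure and field names (not the exact server struct):

struct srv_idle_sketch {
    int curr_idle_conns;    /* idle connections currently available on the server */
    int low_conn;           /* value of the "pool-low-conn" setting */
};

/* a thread may steal a connection from a sibling only while the server
 * still has at least <low_conn> idle connections; below that, it must
 * use its own connections or establish a new one */
static int may_take_over(const struct srv_idle_sketch *srv)
{
    return srv->curr_idle_conns >= srv->low_conn;
}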
In conn_backend_get() we can avoid locking other servers when trying
to steal their connections when we know for sure they will not have
one, so let's do it to lower the contention on the lock.
This command reuses the existing "show servers state" to also dump the
state of active and idle connections. The main use is to serve as a
debugging tool to troubleshoot connection reuse issues.
Actually the cleanup in commit 6ff8143f7 ("BUG/MINOR: proxy: fix
dump_server_state()'s misuse of the trash") allowed to spot that the
trash is never reset when dumping a servers state. I couldn't manage
to make it dump garbage even with large setups, but didn't find either
where it's cleared between successive calls, while other handlers do
explicitly invoke chunk_reset(), so it seems to happen a bit by luck.
Let's use chunk_printf() here for each turn, it makes things clearer.
This could be backported along with previous patch, especially if any
user reports occasional garbage appearing in the show servers output.
dump_server_state() claims to dump into a buffer but instead it writes
into a buffer then dumps the trash into the channel, so it only supports
being called with buf=&trash and doesn't need this buffer. There doesn't
seem to be any current impact of this mistake since the function is called
from one location only.
A backport may be performed if it helps fixing other bugs but it will not
fix an existing bug by itself.
This patch adds a missing break to end the loop in case when '%[' is not
properly closed with ']'.
The issue has been introduced with commit cd0d2ed ("MEDIUM: log-format:
make the LF parser aware of sample expressions' end").
In pat_match_str() and pat_match_beg() functions, a trailing zero is
systematically added at the end of the string, even if the buffer is not large
enough to accommodate it. It is a possible buffer overflow. For instance, when
the alpn is matched against a list of strings, the sample fetch is filled with a
non-null terminated string returned by the SSL library. No trailing zero must be
added at the end of this string, because it is outside the buffer.
So, to fix the bug, a trailing zero is added only if the buffer is large enough
to accommodate it. Otherwise, the sample fetch is duplicated. smp_dup() function
adds a trailing zero to the duplicated string, truncating it if it is too long.
This patch should fix the issue #718. It must be backported to all supported
versions.
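
A hedged sketch of the fix: only append the trailing zero when the buffer has
room for it, otherwise copy the sample into a separate buffer which gets
null-terminated (and truncated if needed), in the spirit of smp_dup(). Names
are illustrative, not the pattern matching code itself:

#include <string.h>

static const char *ensure_null_terminated(char *data, size_t len, size_t size,
                                          char *trash, size_t trash_size)
{
    if (len < size) {
        data[len] = '\0';            /* room left in the sample's buffer */
        return data;
    }
    /* no room: copy into a separate buffer, truncating if needed */
    if (len >= trash_size)
        len = trash_size - 1;
    memcpy(trash, data, len);
    trash[len] = '\0';
    return trash;
}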
If the owning task is already dying (context was destroyed by fcgi_takeover)
there's no point taking the lock then removing it later since all the code
in between is conditioned by a non-null context. Let's simplify this.
If the owning task is already dying (context was destroyed by h2_takeover)
there's no point taking the lock then removing it later since all the code
in between is conditioned by a non-null context. Let's simplify this.
If the owning task is already dying (context was destroyed by h1_takeover)
there's no point taking the lock then removing it later since all the code
in between is conditioned by a non-null context. Let's simplify this.
In commit 3ef7a190b ("MEDIUM: tasks: apply a fair CPU distribution
between tasklet classes") we compute a total weight to be used to
split the CPU time between queues. There is a mention that the
total cannot be null, which is based on the fact that we only get
there if thread_has_task() returns non-zero. But there is a very
small race which can break this assumption: if two threads conflict
on MT_LIST_ADDQ() on an empty shared list and both roll back before
trying again, there is the possibility that a first call to
MT_LIST_ISEMPTY() sees the first thread install itself, then the
second call will see the list empty when both roll back. Thus we
could proceed with the queue while it's temporarily empty and
compute max lengths using a divide by zero. This case is very
hard to trigger, it seldom happens on 16 threads at 400k req/s.
Let's simply test for max_total and leave the loop when we've not
found any work.
No backport is needed, that's 2.2-only.
The parsing of http deny rules with no argument or only the deny_status argument
is buggy if followed by an ACLs expression (starting with "if" or "unless"
keyword). Instead of using the proxy errorfiles, a dummy error is used. To fix
the bug, the parsing function must also check for "if" or "unless" keyword in
such cases.
This patch should fix the issue #720. No backport is needed.
Commit d645574 ("MINOR: soft-stop: let the first stopper only signal
other threads") introduced a minor mistake which is that when a stopping
thread signals all other threads, it also signals itself. When
single-threaded, the process constantly wakes up while waiting for
last connections to exit. Let's reintroduce the lost mask to avoid
this.
No backport is needed, this is 2.2-dev only.
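
A one-liner sketch of the fix, with names mirroring HAProxy's conventions but
used here purely for illustration:

/* every started thread except the calling one */
static inline unsigned long other_threads(unsigned long all_threads_mask,
                                          unsigned long tid_bit)
{
    return all_threads_mask & ~tid_bit;
}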
The max_used_conns value is used as an estimate of the needed number of
connections on a server to know how many to keep open. But this one is
not reported, making it hard to troubleshoot reuse issues. Let's export
it in the sessions/current column.
Starting with commit 079cb9a ("MEDIUM: connections: Revamp the way idle
connections are killed") we started to improve the way to compute the
need for idle connections. But the condition to keep a connection idle
or drop it when releasing it was not updated. This often results in
storms of close when certain thresholds are met, and long series of
takeover() when there aren't enough connections left for a thread on
a server.
This patch tries to improve the situation this way:
- it keeps an estimate of the number of connections needed for a server.
This estimate is a copy of the max over previous purge period, or is a
max of what is seen over current period; it differs from max_used_conns
in that this one is a counter that's reset on each purge period ;
- when releasing, if the number of current idle+used connections is
lower than this last estimate, then we'll keep the connection;
- when releasing, if the current thread's idle conns head is empty,
and we don't exceed the estimate by the number of threads, then
we'll keep the connection.
- when cleaning up connections, we consider the max of the last two
periods to avoid killing too many idle conns when facing bursty
traffic.
Thanks to this we can better converge towards a situation where, provided
there are enough FDs, each active server keeps at least one idle connection
per thread all the time, with a total number close to what was needed over
the previous measurement period (as defined by pool-purge-delay).
On tests with large numbers of concurrent connections (30k) and many
servers (200), this has quite smoothed the CPU usage pattern, increased
the reuse rate and roughly halved the takeover rate.
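
A simplified sketch of the release-time decision described above; the
structure and field names are stand-ins for illustration only:

struct srv_est_sketch {
    int est_need_conns;     /* estimated needed connections (max over periods) */
    int curr_idle_conns;
    int curr_used_conns;
    int thread_idle_empty;  /* 1 if the current thread's idle list is empty */
    int nbthread;
};

static int keep_connection_idle(const struct srv_est_sketch *s)
{
    int total = s->curr_idle_conns + s->curr_used_conns;

    if (total < s->est_need_conns)
        return 1;   /* below the estimate: keep it */
    if (s->thread_idle_empty && total < s->est_need_conns + s->nbthread)
        return 1;   /* this thread has none yet and we don't exceed est+threads */
    return 0;       /* otherwise let it be released */
}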
There's a minor glitch with the way idle connections start to be evicted.
The lookup always goes from thread 0 to thread N-1. This causes depletion
of connections on the first threads and abundance on the last ones. This
is visible with the takeover() stats below:
$ socat - /tmp/sock1 <<< "show activity"|grep ^fd ; \
sleep 10 ; \
socat - /tmp/sock1 <<< "show activity"|grep ^fd
fd_takeover: 300144 [ 91887 84029 66254 57974 ]
fd_takeover: 359631 [ 111369 99699 79145 69418 ]
There are respectively 19k, 15k, 13k and 11k takeovers for only 4 threads,
indicating that the first thread needs a foreign FD twice more often than
the 4th one.
This patch changes this so that all threads are scanned in round robin
starting with the current one. The takeovers now happen in a much more
distributed way (about 4 times 9k) :
fd_takeover: 1420081 [ 359562 359453 346586 354480 ]
fd_takeover: 1457044 [ 368779 368429 355990 363846 ]
There is no need to backport this, as this happened along a few patches
that were merged during 2.2 development.
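
A minimal sketch of the rotating scan (illustrative names): start with the
current thread and wrap around instead of always starting at thread 0, so
the pressure spreads evenly across threads:

static int scan_threads_round_robin(int tid, int nbthread,
                                    int (*try_thread)(int thr))
{
    int i;

    for (i = 0; i < nbthread; i++) {
        int thr = (tid + i) % nbthread;

        if (try_thread(thr))
            return thr;     /* found something on this thread */
    }
    return -1;              /* nothing found on any thread */
}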
The FD takeover operation might have certain impacts explaining
unexpected activities, so it's important to report such a counter
there. We thus count the number of times a thread has stolen an
FD from another thread.
The servers have internal states describing the status of idle connections,
unfortunately these were not exported in the stats. This patch adds the 3
following gauges:
- idle_conn_cur : Current number of unsafe idle connections
- safe_conn_cur : Current number of safe idle connections
- used_conn_cur : Current number of connections in use
DEBUG_FD was added by commit 38e8a1c in 2.2-dev, and "show fd" was
slightly modified to still allow to print orphaned/closed FDs if their
count is non-null. But bypassing the existing test made it possible
to dereference fdt.owner which can be null. Let's adjust the condition
to avoid this.
No backport is needed.
The LRU cache head was an array of lists, which causes false sharing
between 4 to 8 threads in the same cache line. Let's move it to the
thread_info structure instead. There's no need to do the same for the
pool_cache[] array since it's already quite large (32 pointers each).
By doing this the request rate increased by 1% on a 16-thread machine.
In tcpcheck_eval_connect(), if we're targetting a server, increase its
curr_used_conns when creating a new connection, as the counter will be
decreased later when the connection is destroyed and conn_free() is called.
In connect_server(), we want to increase curr_used_conns only if the
connection is new, or if it comes from an idle_pool, otherwise it means
the connection is already used by at least one other stream, and it is
already accounted for.
We used to have 3 thread-based arrays for toremove_lock, idle_cleanup,
and toremove_connections. The problem is that these items are small,
and that this creates false sharing between threads since it's possible
to pack up to 8-16 of these values into a single cache line. This can
cause real damage where there is contention on the lock.
This patch creates a new array of struct "idle_conns" that is aligned
on a cache line and which contains all three members above. This way
each thread has access to its variables without hindering the other
ones. Just doing this increased the HTTP/1 request rate by 5% on a
16-thread machine.
The definition was moved to connection.{c,h} since it appeared a more
natural evolution of the ongoing changes given that there was already
one of them declared in connection.h previously.
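
A hedged illustration of the resulting layout: one cache-line-aligned entry
per thread so that threads never write into each other's lines. Types and
names below are simplified stand-ins, not the real declarations:

#define CACHE_LINE 64

struct list { struct list *n, *p; };          /* minimal list head */

struct idle_conns_sketch {
    struct list toremove_conns;               /* connections queued for removal */
    unsigned long toremove_lock;              /* stand-in for the spinlock type */
    void *cleanup_task;                       /* per-thread idle cleanup task */
} __attribute__((aligned(CACHE_LINE)));

static struct idle_conns_sketch idle_conns_sketch[64]; /* MAX_THREADS stand-in */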
"show sess" and particularly "show sess all" can be very slow when dumping
lots of information, and while dumping, new sessions might appear, making
the output really endless. When threads are used, this causes a double
problem:
- all threads are paused during the dump, so an overly long dump degrades
the quality of service ;
- since all threads are paused, more events get postponed, possibly
resulting in more streams to be dumped on next invocation of the dump
function.
This patch addresses this long-lasting issue by doing something simple:
the CLI's stream is moved to the end of the streams list, serving as an
identifiable marker to end the dump, because all entries past it were
added after the command was entered. As a result, the CLI's stream always
appears as the last one.
It may make sense to backport this to stable branches where dumping live
streams is difficult as well.
Commit cd4159f ("MEDIUM: mux_h2: Implement the takeover() method.")
added a return in the middle of the function, and as usual with such
stray return statements, some unrolling was lost. Here it's only the
TRACE_LEAVE() call, so it's mostly harmless. That's 2.2 only, no
backport is needed.
The IPv4 code did not take into account that the header value might not
contain the trailing NUL byte, possibly reading stray data after the header
value, failing the parse and testing the IPv6 branch. That one adds the
missing NUL, but fails to parse IPv4 addresses.
Fix this issue by always adding the trailing NUL.
The bug was reported on GitHub as issue #715.
It's not entirely clear when this bug started appearing, possibly earlier
versions of smp_fetch_hdr guaranteed the NUL termination. However the
addition of the NUL in the IPv6 case was added together with IPv6 support,
hinting that at that point in time the NUL was not guaranteed.
The commit that added IPv6 support was 69fa99292e
which first appeared in HAProxy 1.5. This patch should be backported to
1.5+, taking into account the various buffer / chunk changes and the movement
across different files.
Issue 23653 in oss-fuzz reports a heap overflow bug which is in fact a
bug introduced by commit 9e1758efb ("BUG/MEDIUM: cfgparse: use
parse_line() to expand/unquote/unescape config lines") to address
oss-fuzz issue 22689, which was only partially fixed by commit 70f58997f
("BUG/MINOR: cfgparse: Support configurations without newline at EOF").
Actually on an empty line, end == line so we cannot dereference end-1
to check for a trailing LF without first being sure that end is greater
than line.
No backport is needed, this is 2.2 only.
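
The boundary check boils down to the following sketch (illustrative helper,
not the parser's actual code):

static int ends_with_lf(const char *line, const char *end)
{
    /* only dereference end[-1] when the line is non-empty */
    return (end > line) && (end[-1] == '\n');
}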
When an event must be processed, we decide to create a new SPOE applet if there
is no idle applet at all or if the processing rate is lower than the number of
waiting events. But when the processing rate is very low (< 1 event/second), a
new applet is created independently of the number of idle applets.
Now, when there is at least one idle applet and only one event to
process, no new applet is created.
This patch is related to the issue #690.
When an informational response (1xx) is returned by HAProxy, we must be sure to
send it ASAP. To do so, CF_SEND_DONTWAIT flag must be set on the response
channel to instruct the stream-interface to not set the CO_SFL_MSG_MORE flag on
the transport layer. Otherwise the response delivery may be delayed, because of
the commit 8945bb6c0 ("BUG/MEDIUM: stream-int: fix loss of CO_SFL_MSG_MORE flag
in forwarding").
This patch may be backported as far as 1.9, for the HTX part only. But this
part has changed in 2.2, so it may be a bit tricky. Note it does not fix any known
bug on previous versions because the CO_SFL_MSG_MORE flag is ignored by the h1
mux.
To be consistent with other processings on the channels, when HAProxy generates
a final response, the CF_EOI flag must be set on the response channel. This flag
is used to know that a full message was pushed into the channel (HTX messages
with an EOM block). It is used in conjunction with other channel's flags in
stream-interface functions. Especially when si_cs_send() is called, to know if
we must set or not the CO_SFL_MSG_MORE flag. Without CF_EOI, the CO_SFL_MSG_MORE
flag is always set and the message forwarding is delayed.
This patch may be backported as far as 1.9, for the HTX part only. But this
part has changed in 2.2, so it may be a bit tricky. Note it does not fix any known
bug on previous versions because the CO_SFL_MSG_MORE flag is ignored by the h1
mux.
In HTX, since the commit 8945bb6c0 ("BUG/MEDIUM: stream-int: fix loss of
CO_SFL_MSG_MORE flag in forwarding"), the CO_SFL_MSG_MORE flag is set on the
transport layer if the end of the HTTP message is not reached, to delay the data
forwarding. To do so, the CF_EOI flag is tested and must not be set on the
output channel.
But the CO_SFL_MSG_MORE flag is also added if the message was truncated. Only
CF_SHUTR is set in this case. So the forwarding may be delayed to wait for more data
that will never come. So, in HTX, the CO_SFL_MSG_MORE flag must not be set if
the message is finished (full or truncated).
No backport is needed.
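
The corrected condition can be summarized by this hedged sketch; the flag
values are stand-ins for illustration, only the flag names come from the
description above:

#define CF_EOI          0x01
#define CF_SHUTR        0x02
#define CO_SFL_MSG_MORE 0x04

static unsigned int msg_more_flag(unsigned int chn_flags)
{
    if (chn_flags & (CF_EOI | CF_SHUTR))
        return 0;               /* message complete or truncated: send now */
    return CO_SFL_MSG_MORE;     /* more data expected: let the mux wait */
}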
When HAProxy generates a 500 response, if the formatting failed, for instance
because the message is larger than a buffer, it retries to format it in a loop. To
fix the bug, we must stop trying to send a response if it is a non-rewritable
response (TX_CONST_REPLY flag is set on the HTTP transaction).
Because this part is not trivial, some comments have been added.
No backport is needed.
This commit adds some sample fetches that were lacking on the server
side:
ssl_s_key_alg, ssl_s_notafter, ssl_s_notbefore, ssl_s_sig_alg,
ssl_s_i_dn, ssl_s_s_dn, ssl_s_serial, ssl_s_sha1, ssl_s_der,
ssl_s_version
Trailing slashes were not handled in crt-list commands on CLI which can
be useful when you use the commands with a directory.
Strip the slashes before looking for the crtlist in the tree.
With the rework of the config line parser, we've started to emit a dump
of the initial line underlined by a caret character indicating the error
location. But with extremely large lines it starts to take time and can
even cause trouble to slow terminals (e.g. over ssh), and this becomes
useless. In addition, control characters could be dumped as-is which is
bad, especially when the input file is accidentally wrong (an executable).
This patch adds a string sanitization function which isolates an area
around the error position in order to report only that area if the string
is too large. The limit was set to 80 characters, which will result in
roughly 40 chars around the error being reported only, prefixed and suffixed
with "..." as needed. In addition, non-printable characters in the line are
now replaced with '?' so as not to corrupt the terminal. This way invalid
variable names, unmatched quotes etc will be easier to spot.
A typical output is now:
[ALERT] 176/092336 (23852) : parsing [bad.cfg:8]: forbidden first char in environment variable name at position 811957:
...c$PATH$PATH$d(xlc`%?$PATH$PATH$dgc?T$%$P?AH?$PATH$PATH$d(?$PATH$PATH$dgc?%...
^
The config parser change in commit 9e1758efb ("BUG/MEDIUM: cfgparse: use
parse_line() to expand/unquote/unescape config lines") is wrong when
displaying the last parsed word, because it doesn't verify that the output
string was properly allocated. This may fail in two cases:
- very first line (outline is NULL, as in oss-fuzz issue 23657)
- much longer line than previous ones, requiring a realloc(), in which
case the final 0 is out of the allocated space.
This patch moves the reporting after the allocation check to fix this.
No backport is needed, this is 2.2 only.
parse_line() as added in commit c8d167bcf ("MINOR: tools: add a new
configurable line parse, parse_line()") presents a difficult usage
because it's up to the caller to determine the last written argument
based on what was passed to it. In practice the only way to safely
use it is for the caller to always pass nbarg-1 and make that last
entry point to the last arg + its strlen. This is annoying because
it makes it as painful to use as the infamous strncpy() while it has
all the information the caller needs.
This patch changes its behavior so that it guarantees that at least
one argument will point to the trailing zero at the end of the output
string, as long as there is at least one argument. The caller just
has to pass +1 to the arg count to make sure at least the last one is
empty.
When fgets() returns an incomplete line we must not increment linenum
otherwise line numbers become incorrect. This may happen when parsing
files with extremely long lines which require a realloc().
The bug has been present since unbounded line length was supported, so
the fix should be backported to older branches.
A crash was reported in issue #707 because the private key was not
uploaded correctly with "set ssl cert".
The bug is provoked by X509_check_private_key() being called when there
is no private key, which can lead to a segfault.
This patch adds a check and returns an error if the private key is not
present.
This must be backported in 2.1.
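
A hedged sketch of the added check: refuse the update when no private key is
present instead of letting X509_check_private_key() dereference a NULL key.
Structure and error handling are simplified for illustration:

#include <openssl/x509.h>

static int check_cert_and_key(X509 *cert, EVP_PKEY *key, const char **err)
{
    if (!key) {
        *err = "missing private key";
        return 0;
    }
    if (!X509_check_private_key(cert, key)) {
        *err = "private key does not match the certificate";
        return 0;
    }
    return 1;
}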
Now that all tasklet queues are scanned at once by run_tasks_from_lists(),
it becomes possible to always check for lower priority classes and jump
back to them when they exist.
This patch adds tune.sched.low-latency global setting to enable this
behavior. What it does is stick to the lowest ranked priority list in
which tasks are still present with an available budget, and leave the
loop to refill the tasklet lists if the trees got new tasks or if new
work arrived into the shared urgent queue.
Doing so allows to cut the latency in half when running with extremely
deep run queues (10k-100k), thus allowing forwarding of small and large
objects to coexist better. It remains off by default since it does have
a small impact on large traffic (shorter batches).
Now process_runnable_tasks is responsible for calculating the budgets
for each queue, dequeuing from the tree, and calling run_tasks_from_lists().
This latter one scans the queues, picking tasks there and respecting budgets.
Note that its name was updated with a plural "s" for this reason.
It is neither convenient nor scalable to check each and every tasklet
queue to figure whether it's empty or not while we often need to check
them all at once. This patch introduces a tasklet class mask which gets
one bit set per queue representing one class of service. A single
test on the mask allows to figure whether there's still some work to be
done. It will later be usable to better factor the runqueue code.
Bits are set when tasklets are queued. They're cleared when queues are
emptied. It is possible that a queue is empty but has a bit if a tasklet
was added then removed, but this is not a problem as this is properly
checked for in run_tasks_from_list().
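
A minimal sketch of the class mask idea (in HAProxy the mask is per thread;
a single global is used here only to keep the illustration short):

enum { TL_URGENT = 0, TL_NORMAL = 1, TL_BULK = 2, TL_CLASSES = 3 };

static unsigned int tl_class_mask;

static inline void note_tasklet_queued(int queue)
{
    tl_class_mask |= 1U << queue;      /* set when a tasklet enters the queue */
}

static inline void note_queue_emptied(int queue)
{
    tl_class_mask &= ~(1U << queue);   /* cleared when the queue is emptied */
}

static inline int thread_has_tasklets(void)
{
    return tl_class_mask != 0;         /* one test instead of walking all queues */
}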
It will be convenient to have the tasklet queue number soon, better make
current_queue an index rather than a pointer to the queue. When not currently
running (e.g. from I/O), the index is -1.
Till now in process_runnable_tasks() we used to reserve a fixed portion
of max_processed to urgent tasks, then a portion of what remains for
normal tasks, then what remains for bulk tasks. This causes two issues:
- the current budget for processed tasks could be drained once for
all by higher level tasks so that they couldn't have enough left
for the next run. For example, if bulk tasklets cause task wakeups,
the required share to run them could be eaten by other bulk tasklets.
- it forces the urgent tasks to be run before scanning the tree so that
we know how many tasks to pick from the tree, and this isn't very
efficient cache-wise.
This patch changes this so that we compute upfront how max_processed will
be shared between classes that require so. We can then decide in advance
to pick a certain number of tasks from the tree, then execute all tasklets
in turn. When reaching the end, if there's still some budget, we can go
back and do the same thing again, improving chances to pick new work
before the global budget is depleted.
The default weights have been set to 50% for urgent tasklets, 37% for
normal ones and 13% for the bulk ones. In practice, there are not that
many urgent tasklets but when they appear they are cheap and must be
processed in as large batches as possible. Every time there is nothing
to pick there, the unused budget is shared between normal and bulk and
this allows bulk tasklets to still have quite some CPU to run on.
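
The upfront split can be pictured with this simplified sketch; the weights
come from the description above, the arithmetic (including rounding) is
intentionally naive and only illustrative:

static void split_budget(int max_processed, int has_urgent, int has_normal,
                         int has_bulk, int budget[3])
{
    /* a class only gets a share when it has queued work, so absent classes
     * implicitly redistribute their weight to the others */
    int w_urgent = has_urgent ? 50 : 0;
    int w_normal = has_normal ? 37 : 0;
    int w_bulk   = has_bulk   ? 13 : 0;
    int total    = w_urgent + w_normal + w_bulk;

    if (!total) {
        budget[0] = budget[1] = budget[2] = 0;
        return;
    }
    budget[0] = max_processed * w_urgent / total;
    budget[1] = max_processed * w_normal / total;
    budget[2] = max_processed * w_bulk   / total;
}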
Move the ckch_deinit() and crtlist_deinit() call to ssl_sock.c,
also unlink the SNI from the ckch_inst because they are free'd before in
ssl_sock_free_all_ctx().
In ticket #706 it was reported that a certificate which was added from
the CLI can't be removed with 'del ssl cert' and is marked as 'Used'.
The problem is that the certificate instances are not added to the
created crtlist_entry, so they can't be deleted upon a 'del ssl
crt-list', and the store can never be marked 'Unused' because of this.
This patch fixes the issue by adding the instances to the crtlist_entry,
which is enough to fix the issue.
Add some functions to deinit the whole crtlist and ckch architecture.
It will free all crtlist, crtlist_entry, ckch_store, ckch_inst and their
associated SNI, ssl_conf and SSL_CTX.
The SSL_CTX in the default_ctx and initial_ctx still needs to be free'd
separately.
Since commit 2954c47 ("MEDIUM: ssl: allow crt-list caching"), the
ssl_bind_conf is allocated directly in the crt-list, and the crt-list
can be shared between several bind_conf. The deinit() code wasn't
changed to handle that.
This patch fixes the issue by removing the free of the ssl_conf in
ssl_sock_free_all_ctx().
It should be completed with a patch that frees the ssl_conf and the
crt-list.
Fix issue #700.
The arguments are relative to the outline, not relative to the input line.
This patch fixes up commit 9e1758efbd which
is 2.2 only. No backport needed.
The returned `arg` value is the number of arguments found, but in case
of the error message it's not a valid argument index.
Because we know how many arguments we allowed (MAX_LINE_ARGS) we know
what to print in the error message, so do just that.
Consider a configuration like this:
listen foo
1 2 3 [...] 64 65
Then running a configuration check within valgrind reports the following:
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E8B83: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Use of uninitialised value of size 8
==18265== at 0x56E576B: _itoa_word (_itoa.c:179)
==18265== by 0x56E912C: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E5775: _itoa_word (_itoa.c:179)
==18265== by 0x56E912C: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E91AF: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E8C59: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E941A: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E8CAB: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56E8CE2: vfprintf (vfprintf.c:1631)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
==18265== Conditional jump or move depends on uninitialised value(s)
==18265== at 0x56EA2DB: vfprintf (vfprintf.c:1632)
==18265== by 0x57B1895: __vsnprintf_chk (vsnprintf_chk.c:63)
==18265== by 0x4A8642: vsnprintf (stdio2.h:77)
==18265== by 0x4A8642: memvprintf (tools.c:3647)
==18265== by 0x4CB8A4: print_message (log.c:1085)
==18265== by 0x4CE0AC: ha_alert (log.c:1128)
==18265== by 0x459E41: readcfgfile (cfgparse.c:1978)
==18265== by 0x507CB5: init (haproxy.c:2029)
==18265== by 0x4182A2: main (haproxy.c:3137)
==18265==
[ALERT] 174/165720 (18265) : parsing [./config.cfg:2]: too many words, truncating at word 65, position -95900735: <(null)>.
[ALERT] 174/165720 (18265) : Error(s) found in configuration file : ./config.cfg
[ALERT] 174/165720 (18265) : Fatal errors found in configuration.
Valgrind reports conditional jumps relying on an undefined value and the
error message clearly shows incorrect stuff.
After this patch is applied the reliance on undefined values is gone and
the <(null)> will actually show the argument. However the position value
still is incorrect. This will be fixed in a follow up patch.
This patch fixes up commit 9e1758efbd which
is 2.2 only. No backport needed.
In task_per_thread[] we now have current_queue which is a pointer to
the current tasklet_list entry being evaluated. This will be used to
know the class under which the current task/tasklet is currently
running.
We want to be sure not to exceed max_processed. It can actually go
slightly negative due to the rounding applied to ratios, but we must
refrain from processing too many tasks if it's already low.
This became particularly relevant since recent commit 5c8be272c ("MEDIUM:
tasks: also process late wakeups in process_runnable_tasks()") which was
merged into 2.2-dev10. No backport is needed.
When DEBUG_FD is set at build time, we'll keep a counter of per-FD events
in the fdtab. This counter is reported in "show fd" even for closed FDs if
not zero. The purpose is to help spot situations where an apparently closed
FD continues to be reported in loops, or where some events are dismissed.
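
A hedged sketch of the counter guarded by DEBUG_FD; field and function names
are illustrative, not the real fdtab declarations:

struct fdtab_entry_sketch {
    void *owner;
#ifdef DEBUG_FD
    unsigned int event_count;   /* number of events seen on this FD */
#endif
};

static inline void fd_note_event(struct fdtab_entry_sketch *fdt)
{
#ifdef DEBUG_FD
    fdt->event_count++;         /* kept even after the FD is closed */
#endif
    (void)fdt;
}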
Coverity reports a possible null deref in issue #703. It seems this
cannot happen as in order to have a CF_READ_ERROR we'd need to have
attempted a recv() which implies a conn_stream, thus conn cannot be
NULL anymore. But at least one line tests for conn and the other one
does not, which is confusing. So let's add a check for conn before
dereferencing it.
This needs to be backported to 2.1 and 2.0. Note that in 2.0 it's
in proto_htx.c.
As discussed on the list: https://www.mail-archive.com/haproxy@formilux.org/msg37698.html
This patch adds warnings to the configuration parser that detect the
following situations:
- A line being truncated by a null byte in the middle.
- A file not ending in a new line (and possibly being truncated).
Fix parsing of configurations if the configuration file does not end with
an LF.
This patch fixes GitHub issue #704. It's a regression in
9e1758efbd which is 2.2 specific. No backport
needed.
When a SPOE filter starts the response analyze, the wrong flag is tested on the
pre_analyzers bit field. AN_RES_INSPECT must be tested instead of
SPOE_EV_ON_TCP_RSP.
This patch must be backported to all versions with the SPOE support, i.e as far
as 1.7.
If a fcgi application is configured to send its logs to a ring buffer, the
corresponding sink must be resolved during the configuration post
parsing. Otherwise, the sink is undefined when a log message is emitted,
crashing HAProxy.
No need to backport.
In h1_snd_buf(), also set H1_F_CO_MSG_MORE if we know we still have more to
send, not just if the stream-interface told us to do so. This may happen if
the last block of a transfer doesn't fit in the buffer, it remains useful
for the transport layer to know that more data follows what's already in
the buffer.
In 2.2-dev1, a change was made by commit 46230363a ("MINOR: mux-h1: Inherit
send flags from the upper layer"). The purpose was to accurately set the
CO_SFL_MSG_MORE flag on the transport layer because previously it was only
set based on the buffer full condition, which does not accurately indicate
that there are more data to follow.
The problem is that the stream-interface never sets this flag anymore in
HTX mode due to the channel's to_forward always being set to infinity.
Because of this, HTX transfers are always performed without the MSG_MORE
flag and experience a severe performance degradation on large transfers.
This patch addresses this by making the stream-interface aware of HTX and
having it check for CF_EOI to check if more contents are expected or not.
With this change, the single-threaded forwarding performance on 10 MB
objects jumped from 29 to 40 Gbps.
No backport is needed.
As reported in issue #419, a "clear map" operation on a very large map
can take a lot of time and freeze the entire process for several seconds.
This patch makes sure that pat_ref_prune() can regularly yield after
clearing some entries so that the rest of the process continues to work.
The first part, the removal of the patterns, can take quite some time
by itself in one run but it's still relatively fast. It may block for
up to 100ms for 16M IP addresses in a tree typically. This change needed
to declare an I/O handler for the clear operation so that we can get
back to it after yielding.
The second part can be much slower because it deconstructs the elements
and its users, but it iterates progressively so we can yield less often
here.
The patch was tested with traffic in parallel soliciting the map being
released and showed no problem. Some traffic will definitely notice an
incomplete map but the filling is already not atomic anyway thus this is
not different.
It may be backported to stable versions once sufficiently tested for side
effects, at least as far as 2.0 in order to avoid the watchdog triggering
when the process is frozen there. For a better behaviour, all these
prune_* functions should support yielding so that the callers have a
chance to also yield in turn.
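
The yielding pattern used by such a CLI I/O handler can be sketched as
follows; the batch size, names and callback are illustrative only:

struct list;                    /* opaque here, only a pointer is needed */

#define PRUNE_BATCH 1024        /* illustrative number of entries per call */

static int clear_map_io_handler(struct list *head,
                                int (*remove_one)(struct list *))
{
    int budget = PRUNE_BATCH;

    while (budget--) {
        if (!remove_one(head))
            return 1;           /* nothing left: operation complete */
    }
    return 0;                   /* yield, remaining entries next call */
}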
Initial default settings for maxconn/maxsock/maxpipes were rearranged
in commit a409f30d0 ("MINOR: init: move the maxsock calculation code
to compute_ideal_maxsock()") but as a side effect, the calculated
maxpipes value was not stored anymore into global.maxpipes. This
resulted in splicing being disabled unless there is an explicit
maxpipes setting in the global section.
This patch just stores the calculated ideal value as planned in the
computation and as was done before the patch above.
This is strictly 2.2, no backport is needed.
Fix the semicolon escaping which must be handled in the master CLI,
the commands were wrongly split and could be forwarded partially to
the target CLI.
The master CLI must not do the escaping since it forwards the commands
to another CLI. It should be able to split into words by taking care of
the escaping, but must not remove the forwarded backslashes.
This fix does the same thing as the previous patch applied to the
cli_parse_request() function, by taking care of the escaping during the
word split, but it also removes the part which was stripping the
backslashes from the forwarded command.
It was not possible to escape spaces over the CLI, making it impossible to
insert new ACL entries containing spaces from the CLI.
This patch fixes the escaping of spaces over the CLI.
It is now possible to launch "add acl agents.acl My\ User\ Agent" over
the CLI.
Could be backported to all stable branches.
Should fix issue #400.
Since version 1.8, we've started to use tasks and tasklets more
extensively to defer I/O processing. Originally with the simple
scheduler, a task waking another one up using task_wakeup() would
have caused it to be processed right after the list of runnable ones.
With the introduction of tasklets, we've started to spill running
tasks from the run queues to the tasklet queues, so if a task wakes
another one up, it will only be executed on the next call to
process_runnable_tasks(), which means after yet another round of
polling loop.
This is particularly visible with I/Os hitting muxes: poll() reports
a read event, the connection layer performs a tasklet_wakeup() on the
mux subscribed to this I/O, and this mux in turn signals the upper
layer stream using task_wakeup(). The process goes back to poll() with
a null timeout since there's one active task, then back to checking all
possibly expired events, and finally back to process_runnable_tasks()
again. Worse, when there is high I/O activity, doing so will make the
task's execution further apart from the tasklet and will both increase
the total processing latency and reduce the cache hit ratio.
This patch brings back to the original spirit of process_runnable_tasks()
which is to execute runnable tasks as long as the execution budget is not
exhausted. By doing so, we're immediately cutting in half the number of
calls to all functions called by run_poll_loop(), and halving the number
of calls to poll(). Furthermore, calling poll() less often also means
purging FD updates less often and offering more chances to merge them.
This also has the nice effect of making tune.runqueue-depth effective
again, as in the past it used to be quickly bounded by this artificial
event horizon which was preventing from executing remaining tasks. On
certain workloads we can see a 2-3% performance increase.
Due to the way the wait queue works, some tasks might be postponed but not
requeued. However when we exit wake_expired_tasks() on a not-yet-expired
task and leave it in this situation, the next call to next_timer_expiry()
will use this first task's key in the tree as an expiration date, but this
date might be totally off and cause needless wakeups just to reposition it.
This patch makes sure that we leave wake_expired_tasks with a clean state
of frontside tasks and that their tree's key matches their expiration date.
Doing so we can already observe a ~15% reduction of the number of wakeups
when dealing with large numbers of health checks.
The patch looks large because the code was rearranged but the real change
is to take the wakeup/requeue decision on the task's expiration date instead
of the tree node's key, the rest is unchanged.
Nowadays signals cause tasks to be woken up. The historic code still
processes signals after tasks, which forces a second round in the loop
before they can effectively be processed. Let's move the signal queue
handling between wake_expired_tasks() and process_runnable_tasks() where
it makes much more sense.
Some of the recent optimizations around the polling to save a few
epoll_ctl() calls have shown that they could also cause some trouble.
However, over time our code base has become totally asynchronous with
I/Os always attempted from the upper layers and only retried at the
bottom, making it look like we're getting closer to EPOLLET support.
There are showstoppers there such as the listeners which cannot support
this. But given that most of the epoll_ctl() dance comes from the
connections, we can try to enable edge-triggered polling on connections.
What this patch does is to add a new global tunable "tune.fd.edge-triggered",
that makes fd_insert() automatically set an et_possible bit on the fd if
the I/O callback is conn_fd_handler. When the epoll code sees an update
for such an FD, it immediately registers it in both directions the first
time and doesn't update it anymore.
On a few tests it proved quite useful with a 14% request rate increase in
a H2->H1 scenario, reducing the epoll_ctl() calls from 2 per request to
2 per connection.
The option is obviously disabled by default as bugs are still expected,
particularly around the subscribe() code where it is possible that some
layers do not always re-attempt reading data after being woken up.
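
The principle of the one-shot edge-triggered registration can be sketched
like this; it is an illustration of the idea, not HAProxy's epoll code:

#include <sys/epoll.h>

/* register the FD once for both directions with EPOLLET and never update
 * it again, saving the usual epoll_ctl() dance on every state change */
static int register_et(int epfd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}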
localpeer <name>
Sets the local instance's peer name. It will be ignored if the "-L"
command line argument is specified or if used after "peers" section
definitions. In such cases, a warning message will be emitted during
the configuration parsing.
This option will also set the HAPROXY_LOCALPEER environment variable.
See also "-L" in the management guide and "peers" section in the
configuration manual.
Since there was a risk of leaving fd_takeover() without properly
stopping the fd, let's take this opportunity for factoring the code
around a common exit point shared by both the double-CAS and locked
modes. This means using the "ret" variable inside the double-CAS code,
and inverting the loop to first test the old values. Doing so also
produces cleaner code because the compiler cannot factorize common
exit paths using asm statements that are present in some atomic ops.
The loop in fd_takeover() around the double-CAS is conditioned on
a previous value of old_masks[0] that always matches tid_bit on the
first iteration because it does not result from the atomic op but
from a pre-loaded value. Let's set the result of the atomic op there
instead so that the conflict between threads can be detected earlier
and before performing the double-word CAS.
When haproxy is compiled without double-word CAS, we use a migration lock
in fd_takeover(). This lock was covering the atomic OR on the running_mask
before checking its value, while it is not needed since this atomic op
already returns the result. Let's just refine the code to avoid grabbing
the lock in the event another thread has already stolen the FD, this may
reduce contention in high reuse rate scenarios.
This one was confusingly called, I thought it was the cumulated number
of streams but it's the number of calls to process_stream(). Let's make
this clearer.
empty_rq and long_rq are per-loop so it makes sense to group them
together with the loop count. In addition since ctxsw and tasksw
apply in the context of these counters, let's move them as well.
More precisely the difference between wake_tasks and long_rq should
roughly correspond to the number of inter-task messages. Visually
it's much easier to spot ratios of wakeup causes now.
In fd_takeover(), when a double-width compare-and-swap is implemented,
make sure, if we managed to get the fd, to call fd_stop_recv() on it, so
that the thread that used to own it will know it has to stop polling it.
In fd_takeover(), if we failed to grab the fd, when a double-width
compare-and-swap is not implemented, do not call fd_stop_recv() on the
fd, it is not ours and may be used by another thread.
We have poll_drop, poll_dead and poll_skip which are confusingly named
like their poll_io and poll_exp counterparts except that they are not
per poll() call but per-fd. This patch renames them to poll_drop_fd(),
poll_dead_fd() and poll_skip_fd() for this reason.
The "show activity" output mentions a number of indicators to explain
wake up reasons but doesn't have the number of times poll() sees some
I/O. And given that multiple events can happen simultaneously, it's
not always possible to deduce this metric by subtracting.
This patch adds a new "poll_io" counter that allows one to see how
often poll() returns with at least one active FD. This should help
detect stuck events and measure various ratios of poll sub-metrics.
Since 2.1-dev2, with commit 305d5ab46 ("MAJOR: fd: Get rid of the fd cache.")
we don't have the fd_lock anymore and as such its activity counter is always
zero. Let's remove it from the struct and from "show activity" output, as
there are already plenty of indicators to look at.
The cache line comment in the struct activity was updated to reflect
reality as it looks like another one already got removed in the past.
This effectively reverts the two following commits:
6f95f6e11 ("OPTIM: connection: disable receiving on disabled events when the run queue is too high")
065a02561 ("MEDIUM: connection: don't stop receiving events in the FD handler")
The problem as reported in issue #662 is that when the events signals
the readiness of input data that has to be forwarded over a congested
stream, the mux will read data and wake the stream up to forward them,
but the buffer full condition makes this impossible immediately, then
nobody in the chain will be able to disable the event after it was
first reported. And given we don't know at the connection level whether
an event was already reported or not, we can't decide anymore to
forcefully stop it if for any reason its processing gets delayed.
The problem is magnified in issue #662 by the fact that a shutdown is
reported with pending data occupying the buffer. The shutdown will
strike in loops and cause the upper layer stream to be notified until
it's handled, but with a buffer full it's not possible to call cs_recv()
hence to purge the event.
All this can only be handled optimally by implementing a lower layer,
direct mux-to-mux forwarding that will not require any scheduling. This
way no wake up will be needed and the event will be instantly handled
or paused for a long time.
For now let's simply revert these optimizations. Running a 1 MB transfer
test over H2 using 8 connections having each 32 streams with a limited
link of 320 Mbps shows the following profile before this fix:
calls syscall (100% CPU)
------ -------
259878 epoll_wait
519759 clock_gettime
17277 sendto
17129 recvfrom
672 epoll_ctl
And the following one after the fix:
calls syscall (2-3% CPU)
------ -------
17201 sendto
17357 recvfrom
2304 epoll_wait
4609 clock_gettime
1200 epoll_ctl
Thus the behavior is much better.
No backport is needed as these patches were only in 2.2-dev.
Many thanks to William Dauchy for reporting a lot of details around this
difficult issue.
Since we've seen clang emit bad code when the address sanitizer is enabled
at -O2, better clearly report it in the version output. It is detected both
for clang and gcc (both tested with and without).
For an unknown reason in commit bb1b63c079 I placed the compiler version
output in haproxy.c instead of version.c. Better have it in version.c which
is more suitable to this sort of things.
The spoe parser fails to check that the decoded key length is large
enough to match a given key but it uses the returned length in memcmp().
So returning "ver" could match "version" for example. In addition this
makes clang 10's ASAN complain because the second argument to memcmp()
is the static key which is shorter than the decoded buffer size, which
in practice has no impact.
I'm still not 100% sure the parser is entirely correct because even
with this fix it cannot parse a key whose name matches the beginning
of another one, but in practice this does not happen. Ideally a
preliminary length check before the comparison would be safer.
This needs to be backported as far as 1.7.
Since we dropped support for legacy mode, it's not the stream which
deals with the connection but the mux, and there's no point in closing
the client connection after most internal status codes. For example if
the client gets a 401 or a 503 because a server doesn't respond, it
makes no sense forcing the connection to close after reporting this
status, because it's already done by the mux if the client asks for it
or is not compatible with keep-alive. This current state was inherited
from the early days but is still limiting the amount of client-side
connection reuse in a number of circumstances (typically server-side
errors). This change was planned for 2.1 but forgotten.
The status codes for which the connection is not closed anymore are those
that do not depend on the client side connection itself, which are all
except 400 and 408. This could be backported to 2.1 but not further, in
order to make sure legacy and HTX behave strictly similarly.
One issue with the config parser is that while it tries to report as many
errors as possible at once, it's actually unbounded. Thus, when calling
haproxy on a wrong file, it can take ages to process, such as here on
half a gigabyte of map file instead of config file:
$ time ./haproxy -c -f large.map 2>&1 |wc -l
16777220
real 0m31.324s
user 0m22.595s
sys 0m28.909s
This patch modifies readcfgfile() to stop reading the config file after a
reasonable amount of fatal errors. This threshold is set to 50, which seems
more than enough to spot a recurrent issue with a bit of context in a terminal
to address several issues at once, without filling logs nor taking time to
parse the file. The difference is clear now:
$ time ./haproxy -c -f large.map 2>&1 |wc -l
55
real 0m0.005s
user 0m0.004s
sys 0m0.003s
This may be backported to older versions without causing too many
difficulties. However the patch will not apply as-is, it will require
to increment the "fatal" count for each place where ERR_FATAL is set
in the parsing loop.
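As a rough sketch of the principle (all names here are hypothetical
stand-ins, not the actual readcfgfile() code):
  #include <stdio.h>

  #define ERR_FATAL        0x1   /* stand-in for haproxy's fatal error flag */
  #define MAX_FATAL_ERRORS 50

  /* hypothetical per-line parser: reports ERR_FATAL on an unparsable line */
  static int parse_one_line(const char *line)
  {
      return (line[0] == '?') ? ERR_FATAL : 0;
  }

  int parse_config(FILE *f)
  {
      char line[8192];
      int fatal = 0, err_code = 0;

      while (fgets(line, sizeof(line), f)) {
          int ret = parse_one_line(line);

          err_code |= ret;
          if (ret & ERR_FATAL)
              fatal++;
          if (fatal >= MAX_FATAL_ERRORS) {
              fprintf(stderr, "too many fatal errors (%d), stopping now.\n", fatal);
              break;
          }
      }
      return err_code;
  }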
Issue 22689 in oss-fuzz shows that specially crafted config files can take
a long time to process. This happens when variable expansion, backslash
escaping or unquoting causes calls to memmove() and possibly to realloc()
resulting in O(N^2) complexity with N following the line size.
By using parse_line() we now have a safe parser that remains in O(N)
regardless of the type of operation. Error reporting changed a little bit
since the errors are not reported anymore from the deepest parsing level.
As such we now report the beginning of the error. One benefit is that for
many invalid character sequences, the original line is shown and the first
bad char or sequence is designated with a caret ('^'), which tends to be
visually easier to spot, for example:
[ALERT] 167/170507 (14633) : parsing [mini5.cfg:19]: unmatched brace in environment variable name below:
"${VAR"}
^
or:
[ALERT] 167/170645 (14640) : parsing [mini5.cfg:18]: unmatched quote below:
timeout client 10s'
^
In case the target buffer is too short for the new line, the output buffer
is grown in 1kB chunks and kept till the end, so that it should not happen
too often.
Before this patch a test like below involving a 4 MB long line would take
138s to process, 98% of which were spent in __memmove_avx_unaligned_erms(),
and now it takes only 65 milliseconds:
$ perl -e 'print "\"\$A\""x1000000,"\n"' | ./haproxy -c -f /dev/stdin 2>/dev/null
This may be backported to stable versions after a long period of
observation to be sure nothing broke. It relies on patch "MINOR: tools:
add a new configurable line parse, parse_line()".
This function takes on input a string to tokenize, an output storage
(which may be the same) and a number of options indicating how to handle
certain characters (single & double quote support, backslash support,
end of line on '#', environment variables etc). On output it will provide
a list of pointers to individual words after having possibly unescaped
some character sequences, handled quotes and resolved environment
variables, and it will also indicate a status made of:
- a list of failures (overlap between src/dst, wrong quote etc)
- the pointer to the first sequence in error
- the required output length (a-la snprintf()).
This allows a caller to freely unescape/unquote a string by using a
pre-allocated temporary buffer and expand it as necessary. It takes
extreme care at avoiding expensive operations and intentionally does
not use memmove() when removing escapes, hence the reason for the
different input and output buffers. The goal is to use it as the basis
for the config parser.
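To give a rough idea of the approach, here is an illustrative sketch with
simplified quoting rules and no error reporting; it is not the parse_line()
function itself, only the same single-pass, no-memmove() principle:
  #include <stddef.h>

  /* emit one character into <out> if it fits, always count it */
  static size_t put_char(char *out, size_t outsz, size_t pos, char c)
  {
      if (pos < outsz)
          out[pos] = c;
      return pos + 1;
  }

  /* single pass over <in>, never calling memmove(): words are copied to a
   * distinct <out> buffer, split on spaces, with '\' escapes and double
   * quotes handled on the fly. Returns the number of words and stores the
   * required output size into <outlen> (a-la snprintf()) so the caller can
   * grow the buffer and retry on overflow.
   */
  int tokenize(const char *in, char *out, size_t outsz, size_t *outlen,
               char **words, int maxwords)
  {
      size_t pos = 0;
      int nwords = 0, in_dq = 0;

      while (*in) {
          while (*in == ' ')
              in++;                           /* skip word separators */
          if (!*in)
              break;
          if (nwords < maxwords && pos < outsz)
              words[nwords] = out + pos;      /* record start of the word */
          nwords++;
          while (*in && (in_dq || *in != ' ')) {
              if (*in == '"') {
                  in_dq = !in_dq;             /* toggle double-quoted mode */
                  in++;
                  continue;
              }
              if (*in == '\\' && in[1])
                  in++;                       /* unescape: copy next char as-is */
              pos = put_char(out, outsz, pos, *in++);
          }
          pos = put_char(out, outsz, pos, 0); /* terminate the word */
      }
      *outlen = pos;
      return nwords;
  }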
As reported in issue #689, there is a subtle bug in the ebtree code used
to compared memory blocks. It stems from the platform-dependent memcmp()
implementation. Original implementations used to perform a byte-per-byte
comparison and to stop at the first non-matching byte, as in this old
example:
https://www.retro11.de/ouxr/211bsd/usr/src/lib/libc/compat-sys5/memcmp.c.html
The ebtree code has been relying on this to detect the first non-matching
byte when comparing keys. This is made so that a zero-terminated string
can fail to match against a longer string.
Over time, especially with large busses and SIMD instruction sets,
multi-byte comparisons have appeared, making the processor fetch bytes
past the first different byte, which could possibly be a trailing zero.
This means that it's possible to read past the allocated area for a
string if it was allocated by strdup().
This is not correct and definitely confuses address sanitizers. In real
life the problem doesn't have visible consequences. Indeed, multi-byte
comparisons are implemented so that aligned words are loaded (e.g. 512
bits at once to process a cache line at a time). So there is no way such
a multi-byte access will cross a page boundary and end up reading from
an unallocated zone. This is why it was never noticed before.
This patch addresses this by implementing a one-byte-at-a-time memcmp()
variant for ebtree, called eb_memcmp(). It's optimized for both small and
long strings and guarantees to stop after the first non-matching byte. It
only needs 5 instructions in the loop and was measured to be 3.2 times
faster than the glibc's AVX2-optimized memcmp() on short strings (1 to
257 bytes), since that latter one comes with a significant setup cost.
The break-even seems to be at 512 bytes where both versions perform
equally, which is way longer than what's used in general here.
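The idea can be summarized by the following byte-wise sketch (an
illustration of the principle, not the exact eb_memcmp() implementation):
  #include <stddef.h>

  /* compares at most <len> bytes and is guaranteed to stop at the first
   * non-matching byte, so it never fetches past a terminating zero of the
   * shorter string.
   */
  int bytewise_memcmp(const void *a, const void *b, size_t len)
  {
      const unsigned char *pa = a, *pb = b;
      int diff = 0;

      while (len-- && !(diff = *pa++ - *pb++))
          ;
      return diff;
  }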
This fix should be backported to stable versions and reintegrated into
the ebtree code.
We cannot simply `release_sample_expr(rule->arg.vars.expr)` for a
`struct act_rule`, because `rule->arg` is a union that might not
contain valid `vars`. This leads to a crash on a configuration using
`http-request redirect` and possibly others:
frontend http
mode http
bind 127.0.0.1:80
http-request redirect scheme https
Instead a `struct act_rule` has a `release_ptr` that must be used
to properly free any additional storage allocated.
This patch fixes a regression in commit ff78fcdd7f.
It must be backported to wherever that patch is backported.
It has been verified that the configuration above no longer crashes.
It has also been verified that the configuration in ff78fcdd7f
does not leak.
Commit 0a3b43d9c ("MINOR: haproxy: Make use of deinit_and_exit() for
clean exits") introduced this build warning:
src/haproxy.c: In function 'main':
src/haproxy.c:3775:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
This is because the new deinit_and_exit() is not marked as "noreturn"
so depending on the optimizations, the noreturn attribute of exit() will
either leak through it and silence the warning or not and confuse the
compiler. Let's just add the attribute to fix this.
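A reduced example of the construct (hypothetical, not the actual haproxy
prototype):
  #include <stdlib.h>

  /* the attribute tells the compiler this function never returns, so the
   * missing return at the end of main() no longer triggers -Wreturn-type.
   */
  __attribute__((noreturn)) static void deinit_and_exit(int status)
  {
      /* ... release resources ... */
      exit(status);
  }

  int main(void)
  {
      deinit_and_exit(0);
  }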
No backport is needed, this is purely 2.2.
It's unclear why the buffer length wasn't considered when tcp-response
rules were added in 1.5-dev3 with commit 97679e790 ("[MEDIUM] Implement
tcp inspect response rules"). But it's impossible to write working
tcp-response content rules as they're always waiting for the expiration
and do not consider the fact that the buffer is full. It's likely that
tcp-response content rules were only used with HTTP traffic.
This may be backported to stable versions, though it's not very
important considering that nobody reported this in 10 years.
The req_body and res_body sample fetch functions forgot to set the
SMP_F_MAY_CHANGE flag, making them unusable in tcp content rules. Now we
set the flag as long as the channel is not full and nothing indicates
the end was reached.
This is marked as a bug because it's unusual for a sample fetch function
to return a final verdict while data may change, but this results from a
limitation that was affecting the legacy mode where it was not possible
to know whether the end was reached without de-chunking the message. In
HTX there is no more reason to limit this. This fix could be backported
to 2.1, and to 2.0 if really needed, though it will only be doable for
HTX, and legacy cannot be fixed.
In issue #685 gcc 10 seems to find a null pointer deref in free_zlib().
There is no such case because all possible pools are tested and there's
no other one in zlib, except if there's a bug or memory corruption
somewhere else. The code used to be written like this to make sure that
any such bug couldn't remain unnoticed.
Now gcc 10 sees this theoretical code path and complains, so let's just
change the code to place an explicit crash in case of no match (which
must never happen).
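The pattern boils down to something like this (purely illustrative, not
the actual free_zlib() code):
  #include <stdlib.h>

  struct pool { size_t size; };

  static struct pool pool_small = { 64 };
  static struct pool pool_large = { 4096 };

  /* return the pool an allocation of <size> must have come from; any other
   * size indicates a bug or memory corruption, so crash explicitly instead
   * of leaving a code path the compiler can flag as a NULL dereference.
   */
  static struct pool *pool_for_size(size_t size)
  {
      if (size == pool_small.size)
          return &pool_small;
      if (size == pool_large.size)
          return &pool_large;
      abort();   /* must never happen */
  }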
This might be backported if other versions are affected.
Given the following example configuration:
frontend foo
bind *:8080
mode http
http-request set-var(txn.foo) str(bar)
Running a configuration check within valgrind reports:
==23665== Memcheck, a memory error detector
==23665== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==23665== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==23665== Command: ./haproxy -c -f ./crasher.cfg
==23665==
[WARNING] 165/002941 (23665) : config : missing timeouts for frontend 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Warnings were found.
Configuration file is valid
==23665==
==23665== HEAP SUMMARY:
==23665== in use at exit: 314,008 bytes in 87 blocks
==23665== total heap usage: 160 allocs, 73 frees, 1,448,074 bytes allocated
==23665==
==23665== 132 (48 direct, 84 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 28
==23665== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23665== by 0x4A2612: sample_parse_expr (sample.c:876)
==23665== by 0x54DF84: parse_store (vars.c:766)
==23665== by 0x528BDF: parse_http_req_cond (http_rules.c:95)
==23665== by 0x469F36: cfg_parse_listen (cfgparse-listen.c:1339)
==23665== by 0x459E33: readcfgfile (cfgparse.c:2167)
==23665== by 0x5074FD: init (haproxy.c:2021)
==23665== by 0x418262: main (haproxy.c:3126)
==23665==
==23665== LEAK SUMMARY:
==23665== definitely lost: 48 bytes in 1 blocks
==23665== indirectly lost: 84 bytes in 2 blocks
==23665== possibly lost: 0 bytes in 0 blocks
==23665== still reachable: 313,876 bytes in 84 blocks
==23665== suppressed: 0 bytes in 0 blocks
==23665== Reachable blocks (those to which a pointer was found) are not shown.
==23665== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==23665==
==23665== For counts of detected and suppressed errors, rerun with: -v
==23665== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
In particular, cleanly deinit() after a configuration check to clean up
the output of valgrind, which reports "possible losses" without a deinit()
and does not with one, converting actual losses into proper hard losses
and making the whole report easier to analyze.
As an example, given an example configuration of the following:
frontend foo
bind *:8080
mode http
Running `haproxy -c -f cfg` within valgrind will report 4 possible losses:
$ valgrind --leak-check=full ./haproxy -c -f ./example.cfg
==21219== Memcheck, a memory error detector
==21219== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==21219== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==21219== Command: ./haproxy -c -f ./example.cfg
==21219==
[WARNING] 165/001100 (21219) : config : missing timeouts for frontend 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Warnings were found.
Configuration file is valid
==21219==
==21219== HEAP SUMMARY:
==21219== in use at exit: 1,436,631 bytes in 130 blocks
==21219== total heap usage: 153 allocs, 23 frees, 1,447,758 bytes allocated
==21219==
==21219== 7 bytes in 1 blocks are possibly lost in loss record 5 of 54
==21219== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x5726489: strdup (strdup.c:42)
==21219== by 0x468FD9: bind_conf_alloc (listener.h:158)
==21219== by 0x468FD9: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 14 bytes in 1 blocks are possibly lost in loss record 9 of 54
==21219== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x5726489: strdup (strdup.c:42)
==21219== by 0x468F9B: bind_conf_alloc (listener.h:154)
==21219== by 0x468F9B: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 128 bytes in 1 blocks are possibly lost in loss record 35 of 54
==21219== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x468F90: bind_conf_alloc (listener.h:152)
==21219== by 0x468F90: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 608 bytes in 1 blocks are possibly lost in loss record 46 of 54
==21219== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x4B953A: create_listeners (listener.c:576)
==21219== by 0x4578F6: str2listener (cfgparse.c:192)
==21219== by 0x469039: cfg_parse_listen (cfgparse-listen.c:568)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== LEAK SUMMARY:
==21219== definitely lost: 0 bytes in 0 blocks
==21219== indirectly lost: 0 bytes in 0 blocks
==21219== possibly lost: 757 bytes in 4 blocks
==21219== still reachable: 1,435,874 bytes in 126 blocks
==21219== suppressed: 0 bytes in 0 blocks
==21219== Reachable blocks (those to which a pointer was found) are not shown.
==21219== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==21219==
==21219== For counts of detected and suppressed errors, rerun with: -v
==21219== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
Re-running the same command with the patch applied will not report any
losses any more:
$ valgrind --leak-check=full ./haproxy -c -f ./example.cfg
==22124== Memcheck, a memory error detector
==22124== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==22124== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==22124== Command: ./haproxy -c -f ./example.cfg
==22124==
[WARNING] 165/001503 (22124) : config : missing timeouts for frontend 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Warnings were found.
Configuration file is valid
==22124==
==22124== HEAP SUMMARY:
==22124== in use at exit: 313,864 bytes in 82 blocks
==22124== total heap usage: 153 allocs, 71 frees, 1,447,758 bytes allocated
==22124==
==22124== LEAK SUMMARY:
==22124== definitely lost: 0 bytes in 0 blocks
==22124== indirectly lost: 0 bytes in 0 blocks
==22124== possibly lost: 0 bytes in 0 blocks
==22124== still reachable: 313,864 bytes in 82 blocks
==22124== suppressed: 0 bytes in 0 blocks
==22124== Reachable blocks (those to which a pointer was found) are not shown.
==22124== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==22124==
==22124== For counts of detected and suppressed errors, rerun with: -v
==22124== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
It might be worth investigating what exactly HAProxy does to lose pointers
to the start of those 4 memory areas and then to be able to still free them
during deinit(). If HAProxy is able to free them, they ideally should be
"still reachable" and not "possibly lost".
The allocation did not account for either the trailing null byte or the
space, leading to a buffer overwrite.
This bug was detected by an assertion failure in the allocator, but it
can also easily be detected using valgrind:
==25827== Invalid write of size 1
==25827== at 0x6529759: __vsprintf_chk (vsprintf_chk.c:84)
==25827== by 0x65296AC: __sprintf_chk (sprintf_chk.c:31)
==25827== by 0x4D6AB7: sprintf (stdio2.h:33)
==25827== by 0x4D6AB7: proxy_parse_smtpchk_opt (check.c:1799)
==25827== by 0x4A7DDD: cfg_parse_listen (cfgparse-listen.c:2269)
==25827== by 0x494AD3: readcfgfile (cfgparse.c:2167)
==25827== by 0x542995: init (haproxy.c:2021)
==25827== by 0x421DD2: main (haproxy.c:3121)
==25827== Address 0x78712a8 is 0 bytes after a block of size 24 alloc'd
==25827== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25827== by 0x4D6A8C: proxy_parse_smtpchk_opt (check.c:1797)
==25827== by 0x4A7DDD: cfg_parse_listen (cfgparse-listen.c:2269)
==25827== by 0x494AD3: readcfgfile (cfgparse.c:2167)
==25827== by 0x542995: init (haproxy.c:2021)
==25827== by 0x421DD2: main (haproxy.c:3121)
This patch fixes issue #681.
This bug was introduced in commit fbcc77c6ba,
which first appeared in 2.2-dev7. No backport needed.
When building with gcc-8 -fsanitize=address, we get this warning once on
an strncpy() call in proto_uxst.c:
src/proto_uxst.c:262:3: warning: 'strncpy' output may be truncated copying 107 bytes from a string of length 4095 [-Wstringop-truncation]
strncpy(addr.sun_path, tempname, sizeof(addr.sun_path) - 1);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It happens despite the test on snprintf() at the top (since gcc's string
handling is totally empirical), and requires the strlen() test to be placed
"very close" to the strncpy() call (with "very close" yet to be determined).
There's no other way to shut this one except disabling it. Given there's
only one instance of this warning and the cost of dealing with it in the
code is not huge, let's decorate the code to make gcc happily believe it
is smart since it seems to have a mind of its own.
In bug #676, it was reported that ssl-min-ver SSLv3 does not work in
Amazon environments with OpenSSL 1.0.2.
The reason for this is a patch of Amazon OpenSSL which sets
SSL_OP_NO_SSLv3 in SSL_CTX_new(). Which is kind of a problem with our
implementation of ssl-{min,max}-ver in old openSSL versions, because it
does not try to clear existing version flags.
This patch fixes the bug by cleaning versions flags known by HAProxy in
the SSL_CTX before applying the right ones.
Should be backported as far as 1.8.
Getting rid of this warning is more cleanly solved using a 'fall through'
comment, because it clarifies intent to a human reader.
This patch adjusts a few places that cause -Wimplicit-fallthrough to trigger:
- Fix typos in the comment.
- Remove redundant 'no break' that trips up gcc from comment.
- Move the comment out of the block when the 'case' is completely surrounded
by braces.
- Add comments where I could determine that the fall through was intentional.
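A small illustration of the convention (hypothetical example):
  #include <stdio.h>

  /* gcc's -Wimplicit-fallthrough recognizes a plain "fall through" comment
   * placed right before the next case label; comments like "no break", or
   * comments buried inside a braced block, are not recognized.
   */
  const char *describe(int n)
  {
      switch (n) {
      case 0:
          puts("zero");
          /* fall through */
      case 1:
          return "zero or one";
      default:
          return "something else";
      }
  }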
Changes tested on
gcc (Debian 9.3.0-13) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
using
make -j4 all TARGET=linux-glibc USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_GETADDRINFO=1
Commit b5997f740 ("MAJOR: threads/map: Make acls/maps thread safe")
introduced a subtle bug in the pattern matching code. In order to cope
with the possibility that another thread might be modifying the pattern's
sample_data while it's being used, we return a thread-local static
sample_data which is a copy of the one found in the matched pattern. The
copy is performed depending on the sample_data's type. But the switch
statement misses some breaks and doesn't set the new sample_data pointer
at the right place, resulting in the original sample_data being restored
at the end before returning.
The net effect overall is that the correct sample_data is returned (hence
functionally speaking the matching works fine) but it's not thread-safe
so any del_map() or set_map() action could modify the pattern on one
thread while it's being used on another one. It doesn't seem likely to
cause a crash but could result in corrupted data appearing where the
value is consumed (e.g. when appended in a header or when logged) or an
ACL occasionally not matching after a map lookup.
This fix should be backported as far as 1.8.
Thanks to Tim for reporting it and to Emeric for the analysis.
In issue #648 a second problem was reported, indicating that some users
mistakenly send the log to an FD mapped on a file. This situation doesn't
even enable O_NONBLOCK and results in huge access times in the order of
milliseconds with the lock held and other threads waiting till the
watchdog fires to unblock the situation.
The problem with files is that O_NONBLOCK is ignored, and we still need
to lock otherwise we can end up with interleaved log messages.
What this patch does is different. Instead of locking all writers, it
uses a trylock so that there's always at most one logger and that other
candidates can simply give up and report a failure, just as would happen
if writev() returned -1 due to a pipe full condition. This solution is
elegant because it gives back the control to haproxy to decide to give
up when it takes too much time, while previously it was the kernel that
used to block the syscall.
However at high log rates (500000 req/s) up to 50% of the logs were dropped
due to the contention on the lock. In order to address this, we try to
grab the lock up to 200 times and call ha_thread_relax() on failure. This
results in almost no failure (no more than previously with O_NONBLOCK). A
typical test with 6 competing threads writing to stdout chained to a pipe
to a single process shows around 1000 drops for 10 million logs at 480000
lines per second.
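The principle looks like the following sketch, using a plain pthread mutex
and sched_yield() as stand-ins for haproxy's lock and ha_thread_relax();
this is not the committed code:
  #include <pthread.h>
  #include <sched.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
  static unsigned long dropped_logs;

  /* try to take the lock a bounded number of times instead of blocking;
   * give up and account for a dropped message if it never becomes
   * available, exactly as if writev() had failed on a full pipe.
   */
  ssize_t try_emit_log(int fd, const struct iovec *iov, int iovcnt)
  {
      int tries;
      ssize_t ret = -1;

      for (tries = 0; tries < 200; tries++) {
          if (pthread_mutex_trylock(&log_lock) == 0) {
              ret = writev(fd, iov, iovcnt);
              pthread_mutex_unlock(&log_lock);
              return ret;
          }
          sched_yield();               /* let the lock owner make progress */
      }
      __sync_fetch_and_add(&dropped_logs, 1);
      return ret;
  }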
Please note that this doesn't mean that writing to a blocking FD is a good
idea, and it might only be temporarily done on testing environments for
debugging. A file or a terminal will continue to block the writing thread
while others spin a little bit and lose their logs, but the writing thread
will still experience performance-killing latencies.
This patch should be backported to 2.1 and 2.0. The code is in log.c in
2.0, but the principle is the same.
Apparently Cygwin requires sys/types.h before netinet/tcp.h but doesn't
include it by itself, as shown here:
https://github.com/haproxy/haproxy/actions/runs/131943890
This patch makes sure it's always included where needed, which is in
server.c and the SPOA example.
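The idea is simply to include sys/types.h first, along the lines of:
  #include <sys/types.h>   /* needed by Cygwin before netinet/tcp.h */
  #include <netinet/tcp.h>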
The set of files proto_udp.{c,h} were misleadingly named, as they do not
provide anything related to the UDP protocol but to datagram handling
instead, since currently all UDP processing is hard-coded where it's used
(dns, logs). They are to UDP what connection.{c,h} are to proto_tcp. This
was causing confusion about how to insert UDP socket management code,
so let's rename them right now to dgram.{c,h} which more accurately
matches what's inside since every function and type is already prefixed
with "dgram_".
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
Since these are used as type attributes or conditional clauses, they
are used about everywhere and should not require a dependency on
thread.h. Moving them to compiler.h along with other similar statements
like ALIGN() etc looks more logical; this way they become part of the
base API. This allowed to remove thread-t.h from ~12 files, one was
found to only require thread-t and not thread and dict.c was found to
require thread.h.
That's already where MAX_PROCS is set, and we already handle the case of
the default value so there is no reason for placing it in thread.h given
that most call places don't need the rest of the threads definitions. The
include was removed from global-t.h and activity.c.
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().
This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
Checks.c remains one of the largest file of the project and it contains
too many things. The tcpchecks code represents half of this file, and
both parts are relatively isolated, so let's move it away into its own
file. We now have tcpcheck.c, tcpcheck{,-t}.h.
Doing so required to export quite a number of functions because check.c
has almost everything made static, which really doesn't help to split!
check.c is one of the largest file and contains too many things. The
e-mail alerting code is stored there while nothing is in mailers.c.
Let's move this code out. That's only 4% of the code but a good start.
In order to do so, a few tcp-check functions had to be exported.
When building contrib/hpack there is a warning about an unused static
function. Actually it makes no sense to make it static, instead it must
be regularly exported. Similarly there is hpack_dht_get_tail() which is
inlined in the C file and which would make more sense with all other ones
in the H file.
There's no point splitting the file in two since only cfgparse uses the
types defined there. A few call places were updated and cleaned up. All
of them were in C files which register keywords.
There is nothing left in common/ now so this directory must not be used
anymore.
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.
In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.
It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
The files remained mostly unchanged since they were OK. However, half of
the users didn't need to include them, and about as many actually needed
to have it and used to find functions like srv_currently_usable() through
a long chain that broke when moving the file.
This one is particularly difficult to split because it provides all the
functions used to manipulate a proxy state and to retrieve names or IDs
for error reporting, and as such, it was included in 73 files (down to
68 after cleanup). It would deserve a small cleanup though the cut points
are not obvious at the moment given the number of structs involved in
the struct proxy itself.
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.
Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.
A good approach would consist in splitting that file in 3 parts:
- one for error output ("errors" ?).
- one for log_format processing
- and one for actual logging.
It was moved without any change, however many callers didn't need it at
all. This was a consequence of the split of proto_http.c into several
parts that resulted in many locations to still reference it.
Almost no change except moving the cli_kw struct definition after the
defines. Almost all users had both types&proto included, which is not
surprising since this code is old and it used to be the norm a decade
ago. These places were cleaned.
Just some minor reordering, and the usual cleanup of call places for
those which didn't need it. We don't include the whole tools.h into
stats-t anymore but just tools-t.h.
The type file was slightly tidied. The cli-specific APPCTX_CLI_ST1_* flag
definitions were moved to cli.h. The type file was adjusted to include
buf-t.h and not the huge buf.h. A few call places were fixed because they
did not need this include.
Initially it looked like this could have been placed into auth.h or
stats.h but it's not the case as it's what makes the link between them
and the HTTP layer. However the file needed to be split in two. Quite
a number of call places were dropped because these were mostly leftovers
from the early days where the stats and cli were packed together.
The files were moved almost as-is, just dropping arg-t and auth-t from
acl-t but keeping arg-t in acl.h. It was useful to revisit the call places
since a handful of files used to continue to include acl.h while they did
not need it at all. Struct stream was only made a forward declaration
since not otherwise needed.
The stktable_types[] array declaration was moved to the main file as
it had nothing to do in the types. A few declarations were reordered
in the types file so that defines were before the structs. Thread-t
was added since there are a few __decl_thread(). The loss of peers.h
revealed that cfgparse-listen needed it.
The cfg_peers external declaration was moved to the main file instead
of the type one. A few types were still missing from the proto, causing
warnings in the functions prototypes (proxy, stick_table).
All includes that were not absolutely necessary were removed because
checks.h happens to very often be part of dependency loops. A warning
was added about this in check-t.h. The fields, enums and structs were
a bit tidied because it's particularly tedious to find anything there.
It would make sense to split this in two or more files (at least
extract tcp-checks).
The file was renamed to the singular because it was one of the rare
exceptions to have an "s" appended to its name compared to the struct
name.
The type file is becoming a mess, half of it is for the proxy protocol,
another good part describes conn_streams and mux ops, it would deserve
being split again. At least it was reordered so that elements are easier
to find, with the PP-stuff left at the end. The MAX_SEND_FD macro was moved
to compat.h as it's said to be the value for Linux.
The TASK_IS_TASKLET() macro was moved to the proto file instead of the
type one. The proto part was a bit reordered to remove a number of ugly
forward declarations of static inline functions. About ten C and H
files had their dependency dropped since they were not using anything
from task.h.
global.h was one of the messiest files, it has accumulated tons of
implicit dependencies and declares many globals that make almost all
other file include it. It managed to silence a dependency loop between
server.h and proxy.h by being well placed to pre-define the required
structs, forcing struct proxy and struct server to be forward-declared
in a significant number of files.
It was split in two: one part with the global struct definition and the
few macros and flags, and the rest containing the function prototypes.
The UNIX_MAX_PATH definition was moved to compat.h.
There is no C file for this one, the code was placed into sample.c which
thus has a dependency on this file which itself includes sample.h. Probably
that it would be wise to split that later.
This one is particularly tricky to move because everyone uses it
and it depends on a lot of other types. For example it cannot include
arg-t.h and must absolutely only rely on forward declarations to avoid
dependency loops between vars -> sample_data -> arg. In order to address
this one, it would be nice to split the sample_data part out of sample.h.
There's no type file, it only contains fetch_rdp_cookie_name() and
val_payload_lv() which probably ought to move somewhere else instead
of staying there.
It was moved as-is, except for extern declaration of pattern_reference.
A few C files used to include it but didn't need it anymore after having
been split apart so this was cleaned.
One function prototype makes reference to struct mworker_proc which was
not defined there but in global.h instead. This definition, along with
the PROC_O_* fields were moved to mworker-t.h instead.
The file mostly contained struct definitions but there was also a
variable export. Most of the stuff currently lies in checks.h and
should definitely move here!
The STATS_DEFAULT_REALM and STATS_DEFAULT_URI were moved to defaults.h.
It was required to include types/pattern.h and types/sample.h since they
are mentioned in function prototypes.
It would be wise to merge this with uri_auth.h later.
List.h was missing for LIST_ADDQ(). A few unneeded includes of action.h
were removed from certain files.
This one still relies on applet.h and stick-table.h.
A few includes had to be added, namely list-t.h in the type file and
types/proxy.h in the proto file. actions.h was including http-htx.h
but didn't need it so it was dropped.
The sink files could be moved with almost no change at all since they
didn't rely on anything fancy. ssize_t required sys/types.h and
thread.h was needed for the locks.
A few includes were missing in each file. A definition of
struct polled_mask was moved to fd-t.h. The MAX_POLLERS macro was
moved to defaults.h
Stdio used to be silently inherited from whatever path but it's needed
for list_pollers() which takes a FILE* and which can thus not be
forward-declared.
And also rename standard.c to tools.c. The original split between
tools.h and standard.h dates from version 1.3-dev and was mostly an
accident. This patch moves the files back to what they were expected
to be, and takes care of not changing anything else. However this
time tools.h was split between functions and types, because it contains
a small number of commonly used macros and structures (e.g. name_desc)
which in turn cause the massive list of includes of tools.h to conflict
with the callers.
They remain the ugliest files of the whole project and definitely need
to be cleaned and split apart. A few types are defined there only for
functions provided there, and some parts are even OS-specific and should
move somewhere else, such as the symbol resolution code.
The protocol.h files are pretty low in the dependency and (sadly) used
by some files from common/. Almost nothing was changed except lifting a
few comments.
The file was moved almost verbatim (only stdio.h was dropped as useless).
It was not split between types and functions because it's only included
from direct C code (fcgi.c and mux_fcgi.c) as well as fcgi_app.h, included
from the same ones, which should also be remerged as a single one.
The various hpack files are self-contained, but hpack-tbl was one of
those showing difficulties when pools were added because that began
to add quite some dependencies. Now when built in standalone mode,
it still uses the bare minimum pool definitions and doesn't require
to know the prototypes anymore when only the structures are needed.
Thus the files were moved verbatim except for hpack-tbl which was
split between types and prototypes.
Most of the file was a large set of HTX elements manipulation functions
and few types, so splitting them allowed to further reduce dependencies
and shrink the build time. Doing so revealed that a few files (h2.c,
mux_pt.c) needed haproxy/buf.h and were previously getting it through
htx.h. They were fixed.
The file was moved as-is. There was a wrong dependency on dynbuf.h
instead of buf.h which was addressed. There was no benefit to
splitting this between types and functions.
There's only one struct and 2 inline functions. It could have been
merged into http.h but that would have added a massive dependency on
the hpack parts for nothing, so better keep it this way since hpack
is already freestanding and portable.
So the enums and structs were placed into http-t.h and the functions
into http.h. This revealed that several files were depending on http.h
but not including it, as it was silently inherited via other files.
The type is the only element needed by applet.h and hlua.h, while hlua.c
needs the various functions. XREF_BUSY was placed into the types as well
since it's better to have the special values there.
Regex are essentially included for myregex_t but it turns out that
several of the C files didn't include it directly, relying on the
one included by their own .h. This has been cleanly addressed so
that only the type is included by H files which need it, and adding
the missing includes for the other ones.
The type was moved out as it's used by standard.h for netns_entry.
Instead of just being a forward declaration when not used, it's an
empty struct, which makes gdb happier (the resulting stripped executable
is the same).
The pretty confusing "buffer.h" was in fact not the place to look for
the definition of "struct buffer" but the one responsible for dynamic
buffer allocation. As such it defines the struct buffer_wait and the
few functions to allocate a buffer or wait for one.
This patch moves it renaming it to dynbuf.h. The type definition was
moved to its own file since it's included in a number of other structs.
Doing this cleanup revealed that a significant number of files used to
rely on this one to inherit struct buffer through it but didn't need
anything from this file at all.
This moves types/activity.h to haproxy/activity-t.h and
proto/activity.h to haproxy/activity.h.
The macros defining the bit field values for the profiling variable
were moved to the type file to be more future-proof.
Now the file is ready to be stored into its final destination. A few
minor reorderings were performed to keep the file properly organized,
making the various sections more visible (cache & lockless).
In addition and to stay consistent, memory.c was renamed to pool.c.
Till now the local pool caches were implemented only when lockless pools
were in use. This was mainly due to the difficulties to disentangle the
code parts. However the locked pools would further benefit from the local
cache, and having this would reduce the variants in the code.
This patch does just this. It adds a new debug macro DEBUG_NO_LOCAL_POOLS
to forcefully disable local pool caches, and makes sure that the high
level functions are now strictly the same between locked and lockless
(pool_alloc(), pool_alloc_dirty(), pool_free(), pool_get_first()). The
pool index calculation was moved inside the CONFIG_HAP_LOCAL_POOLS guards.
This allowed to move them out of the giant #ifdef and to significantly
reduce the code duplication.
A quick perf test shows that with locked pools the performance increases
by roughly 10% on 8 threads and gets closer to the lockless one.
Just as for the allocation path, the release path was not symmetrical.
It was not logical to have pool_put_to_cache() free the objects itself,
it was pool_free's job. In addition, just because of a variable export
issue, the insertion of the object to free back into the local cache
couldn't be inlined while it was very cheap.
This patch just slightly reorganizes this code path by making pool_free()
decide whether or not to put the object back into the cache via
pool_put_to_cache() otherwise place it back to the global pool using
__pool_free(). Then pool_put_to_cache() adds the item to the local
cache and only calls pool_evict_from_cache() if the cache is too big.
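A simplified sketch of this release path, with toy types and sizes (not
the actual pool code, which is more involved):
  #include <stdlib.h>

  struct toy_pool {
      void *cache[64];          /* thread-local cache of released objects */
      int   cache_count;
      int   use_cache;
  };

  /* return an object to the global pool (here simply to the allocator) */
  static void __pool_free(struct toy_pool *p, void *ptr)
  {
      (void)p;
      free(ptr);
  }

  /* shrink the local cache when it grew too big */
  static void pool_evict_from_cache(struct toy_pool *p)
  {
      while (p->cache_count > 32)
          __pool_free(p, p->cache[--p->cache_count]);
  }

  /* cheap, inlinable insertion into the local cache */
  static void pool_put_to_cache(struct toy_pool *p, void *ptr)
  {
      p->cache[p->cache_count++] = ptr;
      if (p->cache_count >= 64)
          pool_evict_from_cache(p);
  }

  /* pool_free() decides between the local cache and the global pool */
  void pool_free(struct toy_pool *p, void *ptr)
  {
      if (p->use_cache && p->cache_count < 64)
          pool_put_to_cache(p, ptr);
      else
          __pool_free(p, ptr);
  }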
The memory.h file is particularly complex due to the combination of
debugging options. This patch extracts the OS-level interface and
places it into a new file: pool-os.h.
Doing this also moves pool_alloc_area() and pool_free_area() out of
the #ifndef CONFIG_HAP_LOCKLESS_POOLS, making them usable from
__pool_refill_alloc(), pool_free(), pool_flush() and pool_gc() instead
of having direct calls to malloc/free there that are hard to wrap for
debugging purposes.
This is the beginning of the move and cleanup of memory.h. This first
step only extracts type definitions and basic macros that are needed
by the files which reference a pool. They're moved to pool-t.h (since
"pool" is more obvious than "memory" when looking for pool-related
stuff). 3 files which didn't need to include the whole memory.h were
updated.
In memory.h we had to reimplement the swrate* functions just because of
a broken circular dependency around freq_ctr.h. Now that this one is
solved, let's get rid of this copy and use the original ones instead.
types/freq_ctr.h was moved to haproxy/freq_ctr-t.h and proto/freq_ctr.h
was moved to haproxy/freq_ctr.h. Files were updated accordingly, no other
change was applied.
Some of them were simply removed as unused (possibly some leftovers
from an older cleanup session), some were turned to haproxy/bitops.h
and a few had to be added (hlua.c and stick-table.h need standard.h
for parse_time_err; htx.h requires chunk.h but used to get it through
standard.h).
This one is included almost everywhere and used to rely on a few other
.h that are not needed (unistd, stdlib, standard.h). It could possibly
make sense to split it into multiple parts to distinguish operations
performed on timers and the internal time accounting, but at this point
it does not appear much important.
This splits the hathreads.h file into types+macros and functions. Given
that most users of this file used to include it only to get the definition
of THREAD_LOCAL and MAXTHREADS, the bare minimum was placed into thread-t.h
(i.e. types and macros).
All the thread management was left to haproxy/thread.h. It's worth noting
the drop of the trailing "s" in the name, to remove the permanent confusion
that arises between this one and the system implementation (no "s") and the
makefile's option (no "s").
For consistency, src/hathreads.c was also renamed thread.c.
A number of files were updated to only include thread-t which is the one
they really needed.
Some future improvements are possible like replacing empty inlined
functions with macros for the thread-less case, as building at -O0 disables
inlining and causes these ones to be emitted. But this really is cosmetic.
A few files were including it while not needing it (anymore). Some
only required access to the atomic ops and got haproxy/atomic.h in
exchange. Others didn't need it at all. A significant number of
files still include it only for THREAD_LOCAL definition.
The hathreads.h file has quickly become a total mess because it contains
thread definitions, atomic operations and locking operations, all this for
multiple combinations of threads, debugging and architectures, and all this
done with random ordering!
This first patch extracts all the atomic ops code from hathreads.h to move
it to haproxy/atomic.h. The code there still contains several sections
based on non-thread vs thread, and GCC versions in the latter case. Each
section was arranged in the exact same order to ease finding.
The redundant HA_BARRIER() which was the same as __ha_compiler_barrier()
was dropped in favor of the latter which follows the naming convention
of all other barriers. It was only used in freq_ctr.c which was updated.
Additionally, __ha_compiler_barrier() was defined unconditionally but
used only for thread-related operations, so it was made thread-only like
HA_BARRIER() used to be. We'd still need to have two types of compiler
barriers, one for the general case (e.g. signals) and another one for
concurrency, but this was not addressed here.
Some comments were added at the beginning of each section to inform about
the use case and warn about the traps to avoid.
Some files which continue to include hathreads.h solely for atomic ops
should now be updated.
Half of the users of this include only need the type definitions and
not the manipulation macros nor the inline functions. Moves the various
types into mini-clist-t.h makes the files cleaner. The other one had all
its includes grouped at the top. A few files continued to reference it
without using it and were cleaned.
In addition it was about time that we'd rename that file, it's not
"mini" anymore and contains a bit more than just circular lists.
File buf.h is one common cause of pain in the dependencies. Many files in
the code need it to get the struct buffer definition, and a few also need
the inlined functions to manipulate a buffer, but the file used to depend
on a long chain only for BUG_ON() (addressed by last commit).
Now buf.h is split into buf-t.h which only contains the type definitions,
and buf.h for all inlined functions. Callers who don't care can continue
to use buf.h but files in types/ must only use buf-t.h. sys/types.h had
to be added to buf.h to get ssize_t as used by b_move(). It's worth noting
that ssize_t is only supposed to be a size_t supporting -1, so b_move()
ought to be rethought regarding this.
The files were moved to haproxy/ and all their users were updated
accordingly. A dependency issue was addressed on fcgi whose C file didn't
include buf.h.
This one used to be stored into debug.h but the debug tools got larger
and require a lot of other includes, which can't use BUG_ON() anymore
because of this. It does not make sense and instead this macro should
be placed into the lower includes and given its omnipresence, the best
solution is to create a new bug.h with the few surrounding macros needed
to trigger bugs and place assertions anywhere.
Another benefit is that it won't be required to add include <debug.h>
anymore to use BUG_ON, it will automatically be covered by api.h. No
less than 32 occurrences were dropped.
The FSM_PRINTF macro was dropped since not used at all anymore (probably
since 1.6 or so).
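For reference, the shape of such a macro is along these lines (a reduced
sketch, not haproxy's actual BUG_ON() implementation):
  #include <stdio.h>
  #include <stdlib.h>

  /* report where the impossible condition was met, then crash immediately
   * so the bug is caught as close as possible to its cause.
   */
  #define BUG_ON(cond) do {                                         \
          if (cond) {                                               \
              fprintf(stderr, "BUG at %s:%d: condition \"%s\"\n",   \
                      __FILE__, __LINE__, #cond);                   \
              abort();                                              \
          }                                                         \
  } while (0)

  int main(void)
  {
      int refcount = 1;

      BUG_ON(refcount <= 0);   /* must never trigger */
      return 0;
  }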
Fortunately that file wasn't made dependent upon haproxy since it was
integrated, better isolate it before it's too late. Its dependency on
api.h was the result of the change from config.h, which in turn wasn't
correct. It was changed back to stddef.h for size_t and sys/types.h for
ssize_t. The recently added reference to MAX() was changed as it was
placed only to avoid a zero length in the non-free-standing version and
was causing a build warning in the hpack encoder.
This file is to openssl what compat.h is to the libc, so it makes sense
to move it to haproxy/. It could almost be part of api.h but given the
amount of openssl stuff that gets loaded I fear it could increase the
build time.
Note that this file contains lots of inlined functions. But since it
does not depend on anything else in haproxy, it remains safe to keep
all that together.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.
The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).
A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.
No further cleanup was done yet in order to keep changes minimal.
As part of the include files cleanup, we're going to kill the ebtree
directory. For this we need to host its C files in a different location
and src/ is the right one.
Fix a trash buffer leak when we can't take the lock of the ckch, or when
"set ssl cert" is wrongly used.
The bug was mentioned in this thread:
https://www.mail-archive.com/haproxy@formilux.org/msg37539.html
The bug was introduced by commit bc6ca7c ("MINOR: ssl/cli: rework 'set
ssl cert' as 'set/commit'").
Must be backported in 2.1.
When HAProxy is started with a '--' option, all following parameters are
considered configuration files. You can't add new options after a '--'.
The current reload system of the master-worker adds extra options at the
end of the arguments list, which is a problem if HAProxy was started with
'--'.
This patch fixes the issue by copying the new option at the beginning of
the arguments list instead of the end.
This patch must be backported as far as 1.8.
There is no reason the -S option can't take an argument which starts with
a -. This limitation must only be used for options that take a
non-finite list of parameters (-sf/-st)
This can be backported only if the previous patch which fixes
copy_argv() is backported too.
Could be backported as far as 1.9.
There is no reason the -x option can't take an argument which starts with
a -. This limitation must only be used for options that take a
non-finite list of parameters (-sf/-st)
This can be backported only if the previous patch which fixes
copy_argv() is backported too.
Could be backported as far as 1.8.
The copy_argv() function, which is used to copy and remove some of the
arguments of the command line in order to re-exec() the master process,
is poorly implemented.
The function tries to remove the -x and the -sf/-st options but without
taking into account that some of the options could take a parameter
starting with a dash.
In issue #644, haproxy starts with "-L -xfoo" which is perfectly
correct. However, the re-exec is done without "-xfoo" because the master
tries to remove the "-x" option. Indeed, the copy_argv() function does
not know how many arguments an option can have, and just assumes that
everything starting with a dash is an option. So haproxy is exec()'d with
"-L" but without a parameter, which is wrong and leads to the exit of
the master, with usage().
To fix this issue, copy_argv() must know how many parameters an option
takes, and copy or skip the parameters correctly.
This fix is a first step but it should evolve to a cleaner way of
declaring the options to avoid duplication of the parsing code, so we
avoid new bugs.
Should be backported with care as far as 1.8, by removing the options
that do not exist in the previous versions.
When checking the config validity of the http-check rulesets, the test on the
ruleset type is inverted. So a warning about an ignored directive is emitted when
the config is valid and omitted when it should be reported.
No backport needed.
The pattern references lock must be held to perform set/add/del
operations. Unfortunately, it is not true for the lua functions manipulating acl
and map files.
This patch should fix the issue #664. It must be backported as far as 1.8.
Before executing a lua action, the analyse expiration timeout of the
corresponding channel must be reset. Otherwise, when it expires, for instance
because of a call to core.msleep(), if the action yields, an expired timeout
will be used for the stream's task, leading to a loop.
This patch should fix the issue #661. It must be backported in all versions
supporting the lua.
By default, HAProxy is able to implicitly upgrade an H1 client connection to an
H2 connection if the first request it receives from a given HTTP connection
matches the HTTP/2 connection preface. This way, it is possible to support H1
and H2 clients on non-SSL connections. It could be a problem if, for any
reason, the H2 upgrade is not acceptable. "option disable-h2-upgrade" may now
be used to disable it, per proxy. The main purpose of this option is to let
an admin totally disable the H2 support for security reasons. Recently, a
critical issue in the HPACK decoder was fixed, forcing everyone to upgrade their
HAProxy version to fix the bug. It is possible to disable H2 for SSL
connections, but not on clear ones. This option would have been a viable
workaround.
This reverts commit 4fed93eb72.
This commit was simplifying the certificate chain loading with
SSL_CTX_add_extra_chain_cert() which is available in all SSL libraries.
Unfortunately this function is not compatible with the
multi-certificates bundles, which have the effect of concatenating the
chains of all certificate types instead of creating a chain for each
type (RSA, ECDSA etc.)
Should fix issue #655.
The support for reqrep and friends was removed in 2.1 but the
chain_regex() function and the "action" field in the regex struct
was still there. This patch removes them.
One point worth mentioning though. There is a check_replace_string()
function whose purpose was to validate the replacement strings passed
to reqrep. It should also be used for other replacement regex, but is
never called. Callers of exp_replace() should be checked and a call to
this function should be added to detect the error early.
Network types were directly and mistakenly mapped on sample types:
This patch fixes the doc with the values effectively used to keep backward
compatibility with existing implementations.
In addition it adds an internal/network mapping for key types to avoid
further mistakes adding or modifying internals types.
This patch should be backported on all maintained branches,
particularly until v1.8 included for documentation part.
The recently added ring section post-processing added this benign
warning on 32-bit archs:
src/sink.c: In function 'cfg_post_parse_ring':
src/sink.c:994:15: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
ha_warning("ring '%s' event max length '%u' exceeds size, forced to size '%lu'.\n",
^
Let's just cast b_size() to unsigned long here.
Since 8177ad9 ("MINOR: ssl: split config and runtime variable for
ssl-{min,max}-ver"), the dump for ssl-min-ver and ssl-max-ver is fixed,
so we can remove the comment.
Using ssl-max-ver without ssl-min-ver is ambiguous.
When the ssl-min-ver is not configured, and ssl-max-ver is set to a
value lower than the default ssl-min-ver (which is TLSv1.2 currently),
set the ssl-min-ver to the value of ssl-max-ver, and emit a warning.
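The logic amounts to something like this sketch (numeric constants and
names are hypothetical):
  #include <stdio.h>

  enum { VER_UNSET = 0, VER_SSLV3, VER_TLSV10, VER_TLSV11, VER_TLSV12, VER_TLSV13 };

  /* if only a maximum version is configured and it is below the default
   * minimum, lower the minimum to match and warn about the ambiguity.
   */
  static void fix_ssl_version_range(int *min, int *max, int dflt_min)
  {
      if (*min == VER_UNSET && *max != VER_UNSET && *max < dflt_min) {
          fprintf(stderr, "ssl-max-ver used without ssl-min-ver: "
                          "lowering ssl-min-ver to the same value.\n");
          *min = *max;
      }
  }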
log-proto <logproto>
The "log-proto" specifies the protocol used to forward event messages to
a server configured in a ring section. Possible values are "legacy"
and "octet-count" corresponding respectively to "Non-transparent-framing"
and "Octet counting" in rfc6587. "legacy" is the default.
Notes: a separate io_handler was created to avoid per-message tests
and to prepare the code to support different log protocols such as
request/response based ones.
This patch adds a new statement "server" into the ring section, and the
related "timeout connect" and "timeout server".
server <name> <address> [param*]
Used to configure a syslog tcp server to forward messages from the ring buffer.
All "server" parameters found in section 5.2 are supported.
Some of these parameters are irrelevant for "ring" sections.
timeout connect <timeout>
Set the maximum time to wait for a connection attempt to a server to succeed.
Arguments :
<timeout> is the timeout value specified in milliseconds by default, but
can be in any other unit if the number is suffixed by the unit,
as explained at the top of this document.
timeout server <timeout>
Set the maximum time for pending data staying in the output buffer.
Arguments :
<timeout> is the timeout value specified in milliseconds by default, but
can be in any other unit if the number is suffixed by the unit,
as explained at the top of this document.
Example:
global
log ring@myring local7
ring myring
description "My local buffer"
format rfc3164
maxlen 1200
size 32764
timeout connect 5s
timeout server 10s
server mysyslogsrv 127.0.0.1:6514
Commit 04f5fe87d3 introduced an rwlock in the pools to deal with the risk
that pool_flush() dereferences an area being freed, and commit 899fb8abdc
turned it into a spinlock. The pools already contain a spinlock in case of
locked pools, so let's use the same and simplify the code by removing ifdefs.
At this point I really suspect that if pool_flush() instead relied on
__pool_get_first() to pick entries from the pool, the concurrency
problem could never happen since only one user would get a given entry at
once, thus it could not be freed by another user. It's not certain this
would be faster however, because of the number of atomic ops needed to
retrieve one entry compared to a locked batch.
Since HAProxy 1.8, the TLS default minimum version was set to TLSv1.0 to
avoid using the deprecated SSLv3.0. Since then, the standard changed and
the recommended TLS version is now TLSv1.2.
This patch changes the minimum default version to TLSv1.2 on bind lines.
If you need to use a prior TLS version, this is still possible by
using the ssl-min-ver keyword.
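For example, a bind line that still needs to accept TLSv1.0 clients (the
certificate file name is only illustrative) can explicitly lower the minimum:
    bind :443 ssl crt site.pem ssl-min-ver TLSv1.0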
When a tcpcheck ruleset is created, it is automatically inserted in a global
tree. Unfortunately for applicative health checks (redis, mysql...), the created
ruleset is inserted a second time during the directive parsing. This leads to an
infinite loop when haproxy is stopped and we try to scan the tree to release
all tcpcheck rulesets.
Now, only the function responsible for creating the tcpcheck ruleset inserts it
into the tree.
No backport needed.
In issue #657, Coverity found a bug in the "nameserver" parser for the
resolv.conf when "parse-resolv-conf" is set. What happens is that if an
unparsable address appears on a "nameserver" line, it will destroy the
previously allocated pointer before reporting the warning, then the next
"nameserver" line will dereference it again and wlil cause a crash. If
the faulty nameserver is the last one, it will only be a memory leak.
Let's just make sure we preserve the pointer when handling the error.
The patch also fixes a typo in the warning.
The bug was introduced in 1.9 with commit 44e609bfa ("MINOR: dns:
Implement `parse-resolv-conf` directive") so the fix needs to be backported
up to 1.9 or 2.0.
This patch removes all trailing LFs and Zeros from
log messages. Previously only the last LF was removed.
It's a regression from e8ea0ae6f6 "BUG/MINOR: logs:
prevent double line returns in some events."
This should fix github issue #654
In hlua_stktable_lookup(), the key length is never set so all
stktable:lookup("key") calls return nil from lua.
This patch must be backported as far as 1.9.
[Cf: I slightly updated the patch to use lua_tolstring() instead of
luaL_checkstring() + strlen()]
Most of the code in the event_srv_chk_io() function is inherited from the checks before
the recent refactoring. Now, it is enough to only call wake_srv_chk(). Since the
refactoring, the removed code is dead and never called. wake_srv_chk() may only
return 0 if tcpcheck_main() returns 0 and the check status is unknown
(CHK_RES_UNKNOWN). When this happens, nothing is performed in event_srv_chk_io().
When a health check is waiting for a connection establishment, it subscribes for
receive or send events, depending on the next rule to be evaluated. For
subscription for send events, there is no problem. It works as expected. For
subscription for receive events, it only works for HTTP checks because the
underlying multiplexer takes care to do a receive before subscribing again,
updating the fd state. For TCP checks, the PT multiplexer only forwards
subscriptions to the transport layer. So the check itself is woken up. This
leads to a subscribe/notify loop waiting for the connection establishment or a
timeout, uselessly eating CPU.
Thus, when a check is waiting for a connect, instead of blindly resubscribing for
receive events when it is woken up, we now try to receive data.
This patch should fix the issue #635. No backport needed.
HTTP_1XX, HTTP_3XX and HTTP_4XX message templates are no longer used. Only
HTTP_302 and HTTP_303 are used during configuration parsing by "errorloc" family
directives. So these templates are removed from the generic http code. And
HTTP_302 and HTTP_303 templates are moved as static strings in the function
parsing "errorloc" directives.
Now http-request auth rules are evaluated in a dedicated function and no longer
handled "in place" during the HTTP rules evaluation. Thus the action name
ACT_HTTP_REQ_AUTH is removed. In addition, http_reply_40x_unauthorized() is
also removed. This part is now handled in the new action_ptr callback function.
There is no reason not to use the proxy's error replies to emit 401/407
responses. The function http_reply_40x_unauthorized(), responsible for emitting
those responses, is not really complex. It only adds a
WWW-Authenticate/Proxy-Authenticate header to a generic message.
So now, error replies can be defined for 401 and 407 status codes, using
errorfile or http-error directives. When an http-request auth rule is evaluated,
the corresponding error reply is used. For 401 responses, all occurrences of the
WWW-Authenticate header are removed and replaced by a new one with a basic
authentication challenge for the configured realm. For 407 responses, the same
is done on the Proxy-Authenticate header. If the error reply must not be
altered, "http-request return" rule must be used instead.
This adds a sliding estimate of the pools' usage. The goal is to be able
to use this to start to more aggressively free memory instead of keeping
lots of unused objects in pools. The average is calculated as a sliding
average over the last 1024 consecutive measures of ->used during calls to
pool_free(), and is bumped up for 1/4 of its history from ->allocated when
allocation from the pool fails and results in a call to malloc().
The result is a floating value between ->used and ->allocated, that tries
to react fast to under-estimates that result in expensive malloc() but
still maintains itself well in case of stable usage, and progressively
goes down if usage shrinks over time.
This new metric is reported as "needed_avg" in "show pools".
Sadly due to yet another include dependency hell, we couldn't reuse the
functions from freq_ctr.h so they were temporarily duplicated into memory.h.
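Read literally, the sliding estimate described above corresponds to update rules
along these lines (a sketch of the intent, not necessarily the exact code):
    avg <- avg + (used - avg) / 1024      on each pool_free()
    avg <- (3 * avg + allocated) / 4      when the pool falls back to malloc()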
In connect_server(), if we can't create the mux immediately because we have
to wait until the alpn is negotiated, store the session as the connection's
owner. conn_create_mux() expects it to be set, and provides it to the mux
init() method. Failure to do so will result in crashes later if the
connection is private, and even if we didn't do so it would prevent connection
reuse for private connections.
This should fix github issue #651.
When a PROXY protocol line must be sent, it is important to always get the
stream if it exists. It is mandatory to send a unique ID when the unique-id
option is enabled. In conn_si_send_proxy(), to get the stream, we first retrieve
the conn-stream attached to the backend connection. Then if the conn-stream data
callback is si_conn_cb, it is possible to get the stream. But for now, it only
works for connections with a multiplexer. Thus, for mux-less connections, the
unique ID is never sent. This happens for all SSL connections relying on the
alpn to choose the right multiplexer. But it is possible to use the context of
such connections to get the conn-stream.
The bug was introduced by the commit cf6e0c8a8 ("MEDIUM: proxy_protocol: Support
sending unique IDs using PPv2"). Thus, this patch must be backported to the same
versions as the commit above.
It is possible to send a unique ID when the PROXY protocol v2 is used. It relies
on the stream to do so. So we must be sure to have a stream. Locally initiated
connections may not be linked to a stream. For instance, outgoing connections
created by health checks have no stream. Moreover, the stream is not retrieved
for mux-less connections (this bug will be fixed in another commit).
Unfortunately, in make_proxy_line_v2() function, the stream is not tested before
generating the unique-id. This bug leads to a segfault when a health check is
performed for a server with the PROXY protocol v2 and the unique-id option
enabled. It also crashes for servers using SSL connections with alpn. The bug
was introduced by the commit cf6e0c8a8 ("MEDIUM: proxy_protocol: Support sending
unique IDs using PPv2")
This patch should fix the issue #640. It must be backported to the same versions
as the commit above.
During a health check execution, the conn-stream and the connection may only be
NULL before the evaluation of the first rule, which is always a connect, or if
the first conn-stream allocation failed. Thus, in tcpcheck_main(), useless tests
on the conn-stream or on the connection have been removed. A comment has been
added to make it clear.
No backport needed.
When a connect rule is evaluated, the conn-stream and the connection must be
refreshed in tcpcheck_main(). Otherwise, in case of synchronous connect, these
variables point to outdated values (NULL for the first connect or released
otherwise).
No backport needed.
It is possible to globally declare ring-buffers, to be used as target for log
servers or traces.
ring <ringname>
Creates a new ring-buffer with name <ringname>.
description <text>
The description is an optional description string of the ring. It will
appear on the CLI. By default, <name> is reused to fill this field.
format <format>
Format used to store events into the ring buffer.
Arguments:
<format> is the log format used when generating syslog messages. It may be
one of the following :
iso A message containing only the ISO date, followed by the text.
The PID, process name and system name are omitted. This is
designed to be used with a local log server.
raw A message containing only the text. The level, PID, date, time,
process name and system name are omitted. This is designed to be
used in containers or during development, where the severity
only depends on the file descriptor used (stdout/stderr). This
is the default.
rfc3164 The RFC3164 syslog message format. This is the default.
(https://tools.ietf.org/html/rfc3164)
rfc5424 The RFC5424 syslog message format.
(https://tools.ietf.org/html/rfc5424)
short A message containing only a level between angle brackets such as
'<3>', followed by the text. The PID, date, time, process name
and system name are omitted. This is designed to be used with a
local log server. This format is compatible with what the systemd
logger consumes.
timed A message containing only a level between angle brackets such as
'<3>', followed by ISO date and by the text. The PID, process
name and system name are omitted. This is designed to be
used with a local log server.
maxlen <length>
The maximum length of an event message stored into the ring,
including formatted header. If an event message is longer than
<length>, it will be truncated to this length.
size <size>
This is the optional size in bytes for the ring-buffer. Default value is
set to BUFSIZE.
Example:
global
log ring@myring local7
ring myring
description "My local buffer"
format rfc3164
maxlen 1200
Note: ring names are resolved during post configuration processing.
As discussed in GitHub issue #624 Lua scripts should not use
variables that are never going to be read, because the memory
for variable names is never going to be freed.
Add an optional `ifexist` parameter to the `set_var` function
that allows a Lua developer to set variables that are going to
be ignored if the variable name was not used elsewhere before.
Usually this means that there is no `var()` sample fetch for the
variable in question within the configuration.
With "MINOR: lua: Use vars_unset_by_name_ifexist()" the last user was
removed and as outlined in that commit there is no good reason for this
function to exist.
May be backported together with the commit mentioned above.
There is no good reason to register a variable name, just to unset
that value that could not even be set without the variable existing.
This change should be safe, may be backported if desired.
With the checks refactoring, all connections are performed into a tcp-check
ruleset. So, in process_chk_conn(), it is no longer required to check for a
synchronous error on the connect when a health check is started. This part is
now handled in tcpcheck_main(). And because it is impossible to start a health
check with a connection, all tests on the connection during the health check
startup can be safely removed.
This patch should fix the issue #636. No backport needed.
When "hdr" arguments of an http reply are parsed, the allocated header may leak
on error path. Adding it to the header list earlier fixes the issue.
This patch should partly fix the issue #645.
No backport needed.
The http reply must be released in the function responsible to release it. This
leak was introduced when the http return was refactored to use http reply.
This patch should partly fix the issue #645.
No backport needed.
A bug in the PROXY protocol v2 implementation was present in HAProxy up to
version 2.1, causing it to emit a PROXY command instead of a LOCAL command
for health checks. This is particularly minor but confuses some servers'
logs. Sadly, the bug was discovered very late and revealed that some servers
which possibly only tested their PROXY protocol implementation against
HAProxy fail to properly handle the LOCAL command, and permanently remain in
the "down" state when HAProxy checks them. When this happens, it is possible
to enable this global option to revert to the older (bogus) behavior for the
time it takes to contact the affected components' vendors and get them fixed.
This option is disabled by default and acts on all servers having the
"send-proxy-v2" statement.
Older versions were reverted to the old behavior and should not attempt to
be fixed by default again. However a variant of this patch could possibly
be implemented to ask to explicitly send LOCAL if needed by some servers.
More context here:
https://www.mail-archive.com/haproxy@formilux.org/msg36890.html
https://www.mail-archive.com/haproxy@formilux.org/msg37218.html
When a check port or a check address is specified, the check transport layer is
ignored. So it is impossible to do an SSL check in this case. This bug was
introduced by the commit 8892e5d30 ("BUG/MEDIUM: server/checks: Init server
check during config validity check").
This patch should fix the issue #643. It must be backported to all branches
where the above commit was backported.
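For illustration, a configuration shape affected by this bug (address and ports
are only examples) is a server checked over SSL on a dedicated port; before
this fix, the "check-ssl" transport was ignored because a check port was set:
    server srv1 192.168.0.10:443 check check-ssl port 8443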
The http-error directive can now be used instead of errorfile to define an error
message in a proxy section (including default sections). This directive uses the
same syntax as http return rules. The only real difference is the limitation
on status code that may be specified. Only status codes supported by errorfile
directives are supported for this new directive. Parsing of errorfile directive
remains independent from http-error parsing. But functionally, it may be
expressed in terms of http-error:
errorfile <status> <file> ==> http-error status <status> errorfile <file>
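For illustration (the payload is made up), the new directive also accepts the
other reply types shared with http return rules:
    backend be_app
        http-error status 503 content-type "text/plain" string "service unavailable"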
When an error response is sent to a client, the write of the http reply in the
channel buffer and its sending are performed in different functions. The
http_reply_to_htx() function is used to write an http reply in HTX message. This
way, it could be possible to use the http replies in a different context.
The htx_copy_msg() function can now be used to copy the HTX message stored in a
buffer in an existing HTX message. It takes care to not overwrite existing
data. If the destination message is empty, a raw copy is performed. Either the
whole message is copied or nothing at all.
This function is used instead of channel_htx_copy_msg().
When HAProxy returns an http error message, the corresponding http reply is now
used instead of the buffer containing the corresponding HTX message. So,
http_error_message() function now returns the http reply to use for a given
stream. And the http_reply_and_close() function now relies on
http_reply_message() to send the response to the client.
The txn flag TX_CONST_REPLY may now be used to prevent after-response ruleset
evaluation. It is used if this ruleset evaluation failed on an internal error
response. Before, it was done by incrementing the parameter <final>. But it is not
really convenient if an intermediary function is used to produce the
response. Using a txn flag could also be a good way to prevent after-response
ruleset evaluation in a different context.
When an http reply is configured to use an error message from an http-errors
section, instead of referencing the error message, the http reply is used. To do
so the new http reply type HTTP_REPLY_INDIRECT has been added.
Error messages defined in proxy section or inherited from a default section are
now also referenced using an array of http replies. This is done during the
configuration validity check.
During configuration parsing, error messages resulting from the parsing of errorloc
and errorfile directives are now also stored as an http reply. So, for now,
these messages are stored as a buffer and as an http reply. To be able to
release all these http replies when haproxy is stopped, a global list is
used. We must do that because the same http reply may be referenced several
times by different proxies if it is defined in a default section.
Error messages specified in an http-errors section are now also stored in an
array of http replies. So, for now, these messages are stored as a buffer and as
an http reply.
Default error messages are stored as a buffer, in the http_err_chunks global array.
Now, they are also stored as an http reply, in the http_err_replies global array.
"http-request deny", "http-request tarpit" and "http-response deny" rules now
use the same syntax as http return rules and internally rely on the http
replies. The behaviour is not the same when no argument is specified (or only
the status code). For http replies, a dummy response is produced, with no
payload. For old deny/tarpit rules, the proxy's error messages are used. Thus,
to be compatible with existing configuration, the "default-errorfiles" parameter
is implied. For instance :
http-request deny deny_status 404
is now an alias of
http-request deny status 404 default-errorfiles
The http_reply_message() function may be used to send an http reply to a
client. This function is responsible for converting the reply to HTX, pushing it
into the response buffer and forwarding it to the client. It is also responsible
for terminating the transaction.
This function is used during evaluation of http return rules.
A dedicated function is added to check the validity of an http reply object,
after parsing. It is used to check the validity of http return rules.
For now, this function is only used to find the right error message in an
http-errors section for http replies of type HTTP_REPLY_ERRFILES (using
"errorfiles" argument). On success, such replies are updated to point on the
corresponding error message and their type is set to HTTP_REPLY_ERRMSG. If an
unknown http-errors section is referenced, an error is returned. If an unknown
error message is referenced inside an existing http-errors section, a warning is
emitted and the proxy's error messages are used instead.
A dedicated function to parse arguments and create an http_reply object is
added. It is used to parse http return rules. Thus, the following arguments are
parsed by this function:
... [status <code>] [content-type <type>]
[ { default-errorfiles | errorfile <file> | errorfiles <name> |
file <file> | lf-file <file> | string <str> | lf-string <fmt> } ]
[ hdr <name> <fmt> ]*
Because the status code argument is optional, a default status code must be
defined when this function is called.
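For example, a rule relying on this parser might look like this (header name,
path and message are only illustrative):
    http-request return status 403 content-type "text/plain" string "Forbidden" hdr "cache-control" "no-cache" if { path_beg /admin }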
No real change here. Instead of using a structure internal to the action rule,
the http return rules are now stored as an http reply. The main change is about
the action type. It is now always set to ACT_CUSTOM. The http reply type is used
to know how to evaluate the rule.
The structure owns an error message, most of the time loaded from a file, and
converted to HTX. It is created when an errorfile or errorloc directive is
parsed. It is renamed to avoid ambiguities with http_reply structure.
For HTTP rules, this flag is only used to trigger a warning during HAProxy
startup when a final rule without ACL is not the last one. So this patch is
marked as a bug, but its impact is really limited.
No backport needed because http return rules were introduced in 2.2.
TX_CLDENY, TX_CLALLOW, TX_SVDENY and TX_SVALLOW flags are unused. Only
TX_CLTARPIT is used to make the difference between an http deny rule and an http
tarpit rule. So these unused flags are removed.
In the CLI command 'show ssl crt-list', the ssl-min-ver and the
ssl-max-ver arguments were always displayed because the dumped versions
were the actual version computed and used by haproxy, instead of the
version found in the configuration.
To fix the problem, this patch separates the variables to have one with
the configured version, and one with the actual version used. The dump
only shows the configured version.
This reverts commit 957ec59571.
As discussed with Emeric, the current syntax is not extensible enough,
this will be turned to a section instead in a forthcoming patch.
The ring to applet communication was only made to deal with CLI functions
but it's generic. Let's have generic appctx functions and have the CLI
rely on these instead. This patch introduces ring_attach_appctx() and
ring_detach_appctx().
A few fields, including a generic list entry, were added to the CLI context
by commit 300decc8d9 ("MINOR: cli: extend the CLI context with a list and
two offsets"). It turns out that the list entry (l0) is solely used to
consult rings and that the generic ring_write() code is restricted to a
consumer on the CLI due to this, which was not the initial intent. Let's
make it a general purpose wait_entry field that is properly initialized
during appctx_init(). This will allow any applet to wait on a ring, not
just the CLI.
The LIST_ADDQ() and LIST_DEL_INIT() calls made to attach/detach a waiter
to the ring were made under a read lock which was sufficient in front of
the writer's write lock. But it's not sufficient against other readers!
Thus theoretically multiple "show events" on the same ring buffer on the
CLI could result in a crash, even though for now I couldn't manage to
reproduce it.
This fixes commit 1d181e489c ("MEDIUM: ring: implement a wait mode for
watchers") so it must be backported to 2.1, possibly further if the ring
code gets backported.
Because of a typo in the conditions to exit the sending loop, it is possible
to loop infinitely in the fcgi_snd_buf() function. Instead of checking that the FCGI
stream is not blocked to continue sending data, the FCGI connection is checked. So
it is possible to have a stream blocked because there is not enough space in the
mux buffers to copy more data, but to continue looping to send more data.
This patch should fix the issue #637. It must be backported to 2.1.
Instead of using malloc/free to allocate an HPACK table, let's declare
a pool. However the HPACK size is configured by the H2 mux, so it's
also this one which allocates it after post_check.
This patch adds the new global statement:
ring <name> [desc <desc>] [format <format>] [size <size>] [maxlen <length>]
Creates a named ring buffer which can be used on a log line for instance.
<desc> is an optional description string of the ring. It will appear on
the CLI. By default, <name> is reused to fill this field.
<format> is the log format used when generating syslog messages. It may be
one of the following :
iso A message containing only the ISO date, followed by the text.
The PID, process name and system name are omitted. This is
designed to be used with a local log server.
raw A message containing only the text. The level, PID, date, time,
process name and system name are omitted. This is designed to be
used in containers or during development, where the severity only
depends on the file descriptor used (stdout/stderr). This is
the default.
rfc3164 The RFC3164 syslog message format. This is the default.
(https://tools.ietf.org/html/rfc3164)
rfc5424 The RFC5424 syslog message format.
(https://tools.ietf.org/html/rfc5424)
short A message containing only a level between angle brackets such as
'<3>', followed by the text. The PID, date, time, process name
and system name are omitted. This is designed to be used with a
local log server. This format is compatible with what the systemd
logger consumes.
timed A message containing only a level between angle brackets such as
'<3>', followed by ISO date and by the text. The PID, process
name and system name are omitted. This is designed to be
used with a local log server.
<length> is the maximum length of an event message stored into the ring,
including the formatted header. If the event message is longer
than <length>, it will be truncated to this length.
<name> is the ring identifier, which follows the same naming convention as
proxies and servers.
<size> is the optional size in bytes. Default value is set to BUFSIZE.
Note: Historically the sink's name and desc were references to const strings. But
with the new configurable rings a dynamic allocation is needed.
Before this patch, they relied directly on ring_write, bypassing
a part of the sink API.
Now the maxlen parameter of the log will apply only to the text
message part (and not the header; for this you would prefer
to use the maxlen parameter on the sink/ring).
The sink_write prototype was also reviewed to return the number of bytes
written, to be compliant with the other write functions.
This patch extends the sink_write prototype and code to
handle the rfc5424 and rfc3164 header.
It uses header building tools from log.c. Doing this some
functions/vars have been externalized.
facility and minlevel have been removed from the struct sink
and passed as arguments to sink_write because they depend on the log
and not on the sink (they remained unused by the rest of the code
until now).
Historically some messages used to already contain the trailing LF but
not all, and __do_send_log adds a new one in needed cases. It also
trims a trailing LF in certain cases while computing the max message
length, as a result of subtracting 1 from the available room in the
destination buffer. But the way it's done is wrong since some messages
still contain it.
So the code was fixed to always trim the trailing LF from messages if
present, and then only subtract 1 from the destination buffer room
instead of the size.
Note: new sink API is not designed to receive a trailing LF on
event messages
This could be backported to relevant stable versions with particular
care since the logic of the code changed a bit since 1.6 and there
may be other locations that need to be adjusted.
MySQL 4.1 is old enough to be the default mode for mysql checks. So now, once a
username is defined, post-41 mode is automatically used. To do mysql checks on
previous MySQL versions, the argument "pre-41" must be used.
Note, it is a compatibility breakage for everyone using an antique and
unsupported MySQL version.
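For example (address and credentials are only illustrative):
    backend be_mysql
        option mysql-check user haproxy pre-41
        server db1 192.168.0.20:3306 check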
Helper functions are used to dump bind, server or filter keywords. These
functions are used to report errors during the configuration parsing. To have a
coherent API, these functions are now prepared to handle a null pointer as
argument. If so, no action is performed and functions immediately return.
This patch should fix the issue #631. It is not a bug. There is no reason to
backport it.
parse_cache_flt() is the registered callback for the "cache" filter keyword. It
is only called when the "cache" keyword is found on a filter line. So, it is
useless to test the filter name in the callback function.
This patch should fix the issue #634. It may be backported as far as 1.9.
The CI revealed that the boringssl build is still broken because of some
ifdef misplacement.
Bug introduced by dad3105 ("REORG: ssl: move ssl configuration to
cfgparse-ssl.c").
No backport needed.
Fix issue #633.
Coverity found unused variable assignment
CID 1299671 (#1 of 1): Unused value (UNUSED_VALUE)assigned_pointer:
Assigning value from args[arg + 1] to word here, but that stored
value is overwritten before it can be used.
958 word = args[arg + 1];
959 arg = arg_end;
In issue #632 the boringssl build was broken by the lack of the errno.h
include in ssl_crtlist.c.
Bug introduced by 6e9556b ("REORG: ssl: move crtlist functions to src/ssl_crtlist.c").
No backport needed.
Expose the native cum_req metric for a server: so far it was calculated as a
sum of all responses. Rename it from Cum. HTTP Responses to Cum. HTTP
Requests to be consistent with Frontend and Backend.
In order to move all SSL sample fetches in another file, moving the
ssl_sock_ctx definition in a .h file is required.
Unfortunately it became a cross dependencies hell to solve, because of
the struct wait_event field, so <types/connection.h> is needed which
created other problems.
The ssl_sock.c file contains a lot of macros and structure definitions
that should be in a .h. Move them to the more appropriate
types/ssl_sock.h file.
Make use of ssl_sock_register_msg_callback(). Function ssl_sock_msgcbk()
is now split into two dedicated functions for heartbeat and clienthello.
They are both registered by using a new callback mechanism for SSL/TLS
protocol messages.
This patch adds the ability to register callbacks for SSL/TLS protocol
messages by using the function ssl_sock_register_msg_callback().
All registered callback functions will be called when observing received
or sent SSL/TLS protocol messages.
Only allow L7 retries when using HTTP. It only really makes sense for HTTP
anyway, and as the L7 retries code assumes the message will be HTX, it will
crash when used with mode TCP.
This should fix github issue #627.
This should be backported to 2.1 and 2.0.
In do_l7_retry(), remove the SF_ADDR_SET flag. Otherwise,
assign_server_address() won't be called again, which means for 2.1 or 2.2,
we will always retry to connect to the server that just failed, and for 2.0,
that we will try to connect to whatever address is set on the connection,
probably the last server used by that connection before it was pool_free()'d
and reallocated.
This should be backported to 2.1 and 2.0.
Commit 42a50bd19 ("BUG/MINOR: pollers: remove uneeded free in global
init") removed the 'fail_revt' label from the _do_init() function
in src/ev_select.c but left the local label declaration, which makes
clang unhappy and unable to build.
These labels are only historic and unneeded anyway so let's remove them.
This should be backported where 42a50bd19 is backported.
When the first thread stops and wakes others up, it's possible some of
them will also start to wake others in parallel. Let's give this
notification task to the very first one instead, since it's enough and
can reduce the amount of needless (though harmless) wakeup calls.
Currently the soft-stop can lead to old processes remaining alive for as
long as two seconds after receiving a soft-stop signal. What happens is
that when receiving SIGUSR1, one thread (usually the first one) wakes up,
handles the signal, sets "stopping", goes into runn_poll_loop(), and
discovers that stopping is set, so its also sets itself in the
stopping_thread_mask bit mask. After this it sees that other threads are
not yet willing to stop, so it continues to wait.
From there, other threads which were waiting in poll() expire after one
second on poll timeout and enter run_poll_loop() in turn. That's already
one second of wait time. They discover each in turn that they're stopping
and see that other threads are not yet stopping, so they go back waiting.
After the end of the first second, all threads know they're stopping and
have set their bit in stopping_thread_mask. It's only now that those who
started to wait first wake up again on timeout to discover that all other
ones are stopping, and can now quit. One second later all threads will
have done it and the process will quit.
This is effectively strictly larger than one second and up to two seconds.
What the current patch does is simple: when the first thread stops, it sets
its own bit into stopping_thread_mask then wakes up all other threads to
also set theirs. This kills the first second, which corresponds to the time
to discover the stopping state. Second, when a thread exits, it wakes all
other ones again because some might have gone back to sleep waiting for
"jobs" to go down to zero (i.e. closing the last connection). This kills
the last second of wait time.
Thanks to this, SIGUSR1 now acts instantly again if there's no active
connection, or the process stops immediately after the last connection has
left if one was still present.
This should be backported as far as 2.0.
While reading the code, I thought it was clearer to put one instruction
per line as is mostly done elsewhere.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Since commit d4604adeaa ("MAJOR: threads/fd: Make fd stuffs
thread-safe"), we init pollers per thread using a helper. It was still
correct for mono-thread mode until commit cd7879adc2 ("BUG/MEDIUM:
threads: Run the poll loop on the main thread too"). We now use a deinit
helper for all threads, making those frees unneeded.
Only poll and select are affected by this very minor issue.
It could be backported from v1.8 to v2.1.
Fixes: cd7879adc2 ("BUG/MEDIUM: threads: Run the poll loop on the main
thread too")
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
In dump_pools_to_trash() we happen to use %d to display unsigned ints!
This has probably been there since "show pools" was introduced so this
fix must be backported to all versions. The impact is negligible since
no pool uses 2 billion entries. It could possibly affect the report of
failed allocation counts but in this case there's a bigger problem to
solve!
In the commit 2fabd9d53 ("BUG/MEDIUM: checks: Subscribe to I/O events on an
unfinished connect"), we force the subscribtion to I/O events when a new
connection is opened if it is not fully established. But it must only be done if
a mux was immediately installed. If there is no mux, no subscription must be
performed.
No backport needed.
Make the digest and HMAC function of OpenSSL accessible to the user via
converters. They can be used to sign and validate content.
Reviewed-by: Tim Duesterhus <tim@bastelstu.be>
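A minimal illustration, assuming the digest converter takes an OpenSSL digest
name and the hmac converter takes the algorithm plus a base64-encoded key (the
header names and the key "c2VjcmV0" are made up):
    http-request set-header X-Req-Digest "%[req.hdr(x-request-id),digest(sha256),hex]"
    http-request set-header X-Req-Sig "%[req.hdr(x-request-id),hmac(sha256,c2VjcmV0),hex]"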
aes_gcm_dec is independent of the TLS implementation and fits better
in the sample.c file with other hash functions.
[Cf: I slightly updated this patch to move the aes_gcm_dec converter in sample.c
instead of the new file crypto.c]
Reviewed-by: Tim Duesterhus <tim@bastelstu.be>
It is useless to try to send outgoing data if the check is still waiting to be
able to send data.
No backport needed.
(cherry picked from commit d94653700437430864c03090d710b95f4e860321)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
In tcpcheck_main(), when we are waiting for a connection, we must rely on the
next connect/send/expect rule to subscribe to I/O events, not on the immediate
next rule. Because, if it is a set-var or an unset-var rule, we will not
subscribe to I/O events while it is in fact mandatory because a send or an
expect rule is coming. It is required to wake up the health check as soon as I/O
is possible, instead of hitting a timeout.
No backport needed.
(cherry picked from commit 758d48f54cc3372c2d8e7c34b926d218089c533a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Subscription to I/O events should not be performed if the check is already
subscribed.
No backport needed.
(cherry picked from commit 9e0b3e92f73b6715fb2814e3d09b8ba62270b417)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
In tcp-check based health checks, when a new connection is opened, we must wait
for it to be really established before moving to the next rule. But at this stage,
we must also be sure to subscribe to I/O events. Otherwise, depending on the
timing, the health check may remain asleep till the timeout.
No backport needed. This patch should fix the issue #622.
(cherry picked from commit b2a4c0d473e3c5dcb87f7d16f2ca410bafc62f64)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
For 6 years now we've been seeing a warning suggesting to set dh-param
beyond 1024 if possible when it was not set. It's about time to do it
and get rid of this warning since most users seem to already use 2048.
It will remain possible to set a lower value of course, so only those
who were experiencing the warning and were relying on the default value
may notice a change (higher CPU usage). For more context, please refer
to this thread :
https://www.mail-archive.com/haproxy@formilux.org/msg37226.html
This commit removes a big chunk of code which happened to be needed
exclusively to figure if it was required to emit a warning or not :-)
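Those who relied on the old implicit value and want to keep it can still set it
explicitly in the global section, for example:
    global
        tune.ssl.default-dh-param 1024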
This fixes OSS Fuzz issue https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21931.
OSS Fuzz detected a hang on configuration parsing for a 200kB line with a large number of
invalid escape sequences. Most likely due to the amounts of error output generated.
This issue is very minor, because usually generated configurations are to be trusted.
The bug has existed since at least HAProxy 1.4. The patch may be backported if desired.
Commit 9df188695f ("BUG/MEDIUM: http-ana: Handle NTLM messages correctly.")
tried to address an HTTP-reuse issue reported in github issue #511 by making
sure we properly detect extended NTLM responses, but made the match
case-sensitive while it's a token, so it is case-insensitive.
This should be backported to the same versions as the commit above.
During use_backend and use-server post-parsing, if the log-format string used to
specify the backend or the server is just a single string, the log-format string
(a list, internally) is replaced by the string itself. Because the field is a
union, the list is not properly emptied. The element, when
released, is not removed from the list. There is no bug, but it is clearly not
obvious and error prone.
This patch should fix #544. The fix for the use_backend post-parsing may be
backported to all stable releases. use-server is static in 2.1 and prior.
The issue can easily be reproduced with "stick on" statement
backend BE_NAME
stick-table type ip size 1k
stick on src
and calling dump() method on BE_NAME stick table from Lua
Before the fix, HAProxy would return 500 and log something like
the following:
runtime error: attempt to index a string value from [C] method 'dump'
Where one would expect a Lua table like this:
{
["IP_ADDR"] = {
["server_id"] = 1,
["server_name"] = "srv1"
}
}
This patch needs to be backported to 1.9 and later releases.
A default statement in the switch testing the header name has been added to be
sure it is impossible to eval the value pattern on an uninitialized header
value. It should never happen of course. But this way, it is explicit.
And a comment has been added to make clear that ctx.value is always defined when
it is evaluated.
This patch fixes the issue #619.
For http-check send rules, it is now possible to use a log-format string to set
the request's body. The keyword "body-lf" should be used instead of "body". If the
string eval fails, no body is added.
For http-check send rules, it is now possible to use a log-format string to set
the request URI. The keyword "uri-lf" should be used instead of "uri". If the
string eval fails, we fall back on the default uri "/".
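For illustration (the URI and body are made up), both keywords evaluate their
argument as a log-format string at check time:
    backend be_app
        option httpchk
        http-check send meth POST uri-lf "/health/%[srv_name]" body-lf "id=%[srv_id]"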
Since commit bb86986 ("MINOR: init: report the haproxy version and
executable path once on errors") the master-worker displays its version
and path upon a successful exit of a worker, which is kind of
confusing upon a clean exit.
This is due to the fact that that commit displays this with an ha_alert()
which was used during the exit of the process.
Replace the ha_alert() by an ha_warning() if the process exits correctly
and was supposed to.
It still displays the message upon a SIGINT since the workers are
catching the signal.
A sample must always have a session defined. Otherwise, it is a bug. So it is
unnecessary to test if it is defined when called from a health check context.
This patch fixes the issue #616.
Extra parameters on http-check expect rules, for the header matching method, to
use a log-format string or to match the full header line have been removed. There
are now separate matching methods to match a full header line or to match each
comma-separated value. "http-check expect fhdr" must be used in the first case,
and "http-check expect hdr" in the second one. In addition, to match a log-format
header name or value, the "-lf" suffix must be added to the "name" or "value"
keyword. For instance:
http-check expect hdr name "set-cookie" value-lf -m beg "sessid=%[var(check.cookie)]"
Thanks to these changes, each parameter may only be interpreted in one way.
The following actions have been added to send log-format strings from a tcp-check
ruleset instead of the log-format parameter:
* tcp-check send-lf <fmt>
* tcp-check send-binary-lf <fmt>
It is easier for tools generating configurations. Each action may only be
interpreted in one way.
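A small illustrative ruleset (the payload and expected reply are made up;
srv_name is usable here since sample fetches may be called from a tcp-check
ruleset):
    backend be_tcp
        option tcp-check
        tcp-check connect
        tcp-check send-lf "HELLO from %[srv_name]"
        tcp-check expect string OK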
All sample fetches in the scope "check." have been removed. Response sample
fetches must be used instead. It avoids keyword duplication. So, for instance,
res.hdr() must now be used instead of check.hdr().
To do so, following sample fetches have been added on the response :
* res.body, res.body_len and res.body_size
* res.hdrs and res.hdrs_bin
Sample fetches dealing with the response's body are only useful in the health
checks context. When called from a stream context, there is no guarantee of the
body's presence. There is no option to wait for the response's body.
It is now possible to use a log-format string (or a hexadecimal string for the
binary version) to match a content in tcp-check based expect rules. For a
hexadecimal log-format string, the conversion to binary is performed after the
string evaluation, during health check execution. The pattern keywords to use
are "string-lf" for the log-format string and "binary-lf" for the hexadecimal
log-format string.
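For example (the expected banner is made up), an expect rule may now reference
samples evaluated at check time:
    tcp-check expect string-lf "220 %[srv_name] ready"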
TCP and HTTP expect rules may fail because of unexpected internal errors,
mainly during log-format string evaluation. The error report is improved by
this patch.
An additional argument has been added to smp_prefetch_htx() function in the
commit 778f5ed47 ("MEDIUM: checks/http-fetch: Support htx prefetch from a check
for HTTP samples"). But forgot to update call in 51d module.
No need to backport.
An additional argument has been added to smp_prefetch_htx() function in the
commit 778f5ed47 ("MEDIUM: checks/http-fetch: Support htx prefetch from a check
for HTTP samples"). But forgot to update call in wurfl module.
No need to backport.
An additional argument has been added to smp_prefetch_htx() function in the
commit 778f5ed47 ("MEDIUM: checks/http-fetch: Support htx prefetch from a check
for HTTP samples"). But forgot to update call in da module.
No need to backport.
It is now possible to add http-check expect rules matching HTTP header names and
values. Here is the format of these rules:
http-check expect header name [ -m <meth> ] <name> [log-format] \
[ value [ -m <meth> ] <value> [log-format] [full] ]
The name pattern (name ...) is mandatory but the value pattern (value ...) is
optional. If not specified, only the header presence is verified. <meth> is the
matching method, applied on the header name or the header value. Supported
matching methods are:
* "str" (exact match)
* "beg" (prefix match)
* "end" (suffix match)
* "sub" (substring match)
* "reg" (regex match)
If not specified, exact matching method is used. If the "log-format" option is
used, the pattern (<name> or <value>) is evaluated as a log-format string. This
option cannot be used with the regex matching method. Finally, by default, the
header value is considered as a comma-separated list. Each part may be tested. The
"full" option may be used to test the full header line. Note that matching is
case-insensitive on the header names.
For an http-check ruleset, it should be allowed to set a chain of expect
rules. But an error is triggered during the post-parsing because of a wrong
test, inherited from the evaluation mode before the refactoring.
No need to backport.
The statuses (ok, error and timeout) of a TCP or HTTP expect rule are set to
HCHK_STATUS_UNKNOWN by default, when not specified, during the configuration
parsing. This does not change the default status used for a terminal expect rule
(ok=L7OK, err=L7RSP and tout=L7TOUT). But this way, it is possible to know if a
specific status was forced by config or not.
It is now possible to use different matching methods to look for header names in
an HTTP message:
* The exact match. It is the default method. http_find_header() uses this
method. http_find_str_header() is an alias.
* The prefix match. It evals the header names starting with a prefix.
http_find_pfx_header() must be called to use this method.
* The suffix match. It evals the header names ending with a suffix.
http_find_sfx_header() must be called to use this method.
* The substring match. It evals the header names containing a string.
http_find_sub_header() must be called to use this method.
* The regex match. It evals the header names matching a regular expression.
http_match_header() must be called to use this method.
HTTP sample fetches acting on the response can now be called from any sample
expression or log-format string in a tcp-check based ruleset. To avoid any
ambiguities, all these sample fetches are in the check scope, for instance
check.hdr() or check.cook().
SSL sample fetches acting on the server connection can now be called from any
sample expression or log-format string in a tcp-check based ruleset. ssl_bc and
ssl_bc_* sample fetches are concerned.
It is now possible to call be_id, be_name, srv_id and srv_name sample fetches
from any sample expression or log-format string in a tcp-check based ruleset.
It is now possible to call check.payload(), check.payload_lv() and check.len()
sample fetches from any sample expression or log-format string in a tcp-check
based ruleset. In fact, check.payload() was already added. But instead of having
a specific function to handle this sample fetch, we use the same one as
req.payload().
These sample fetches act on the check input buffer, containing data received from
the server. So they should be used as part of or after an expect rule, but before
any send rule, because the input buffer is cleared at that stage.
Some HTTP sample fetches will be accessible from the context of a http-check
health check. Thus, the prefetch function responsible for returning the HTX message
has been updated to handle a check, in addition to a channel. Both cannot be used
at the same time. So there is no ambiguity.
A binary sample data can be converted, implicitly or not, to a string by cutting
the buffer on the first null byte.
I guess this patch should be backported to all stable versions.
srv_cleanup_connections() is supposed to be static, so mark it as so.
This patch should be backported where commit 6318d33ce6
("BUG/MEDIUM: connections: force connections cleanup on server changes")
will be backported, that is to say v1.9 to v2.1.
Fixes: 6318d33ce6 ("BUG/MEDIUM: connections: force connections cleanup
on server changes")
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
After we call SSL_SESSION_get_id(), the length of the id in bytes is
stored in "len", which was never checked. This could cause unexpected
behavior when using the "ssl_fc_session_id" or "ssl_bc_session_id"
fetchers (eg. the result can be an empty value).
The issue was introduced with commit 105599c ("BUG/MEDIUM: ssl: fix
several bad pointer aliases in a few sample fetch functions").
This patch must be backported to 2.1, 2.0, and 1.9.
When only request headers are parsed, the host header should not be compared to
the request authority because no start-line was parsed. Thus there is no
authority.
Till now this bug was hidden because this parsing mode was only used for the
response in the FCGI multiplexer. Since the HTTP checks refactoring, the request
headers may now also be parsed without the start-line.
This patch fixes the issue #610. It must be backported to 2.1.
I've been trying to understand a change of behaviour between v2.2dev5 and
v2.2dev6. Indeed our probe is regularly testing to add and remove
servers on a given backend such as:
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -
-> curl on the corresponding frontend: reply from server:31255
# echo "set server be_foo/srv1 addr 10.236.139.34 port 31257" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31257' by 'stats socket command'
# echo "set server be_foo/srv1 weight 256" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio /var/lib/haproxy/stats
health check port updated.
# echo "set server be_foo/srv1 state ready" | sudo socat stdio /var/lib/haproxy/stats
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31257 -
-> curl on the corresponding frontend: reply for server:31257
(notice the difference of weight)
# echo "set server be_foo/srv1 state maint" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 addr 0.0.0.0 port 0" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '10.236.139.34' to '0.0.0.0', port changed from '31257' to '0' by 'stats socket command'
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -
-> curl on the corresponding frontend: reply from server:31255
# echo "set server be_foo/srv1 addr 10.236.139.34 port 31256" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31256' by 'stats socket command'
# echo "set server be_foo/srv1 weight 256" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio /var/lib/haproxy/stats
health check port updated.
# echo "set server be_foo/srv1 state ready" | sudo socat stdio /var/lib/haproxy/stats
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31256 -
-> curl on the corresponding frontend: reply from server:31257 (!)
Here we indeed would expect to get an answer from server:31256. The issue
is highly linked to the usage of `pool-purge-delay`, with a value which
is higher than the duration of the test, 10s in our case.
A git bisect between dev5 and dev6 seems to show commit
079cb9af22 ("MEDIUM: connections: Revamp the way idle connections are killed")
being the origin of this new behaviour.
So if I understand the latter correctly, it seems that it was more a
matter of chance that we did not see the issue earlier.
My patch proposes to force clean idle connections in the two following
cases:
- we set a (still running) server to maintenance
- we change the ip/port of a server
This commit should be backported to 2.1, 2.0, and 1.9.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
When a stream is detached from its connection, we try to move the connection to
an idle list to keep it open, the session's one or the server's one. But it must
only be done if there is no connection error and if we want to keep it
open. This last statement is true if the FCGI_CF_KEEP_CONN flag is set. But the
test is inverted at this stage.
This patch must be backported to 2.1.
The fcgi_release() function is responsible for releasing an FCGI connection. But the
release of the connection itself is missing.
This patch must be backported to 2.1.
When the last stream is detached from an FCGI connection, if the server doesn't add
the connection to its idle list, the connection is destroyed. Thus it is
important to exit immediately from the detach function. A return statement is
missing here.
This bug was introduced in the commit 2444aa5b6 ("MEDIUM: sessions: Don't be
responsible for connections anymore.").
It is a 2.2-dev bug. No need to backport.
Now we very rarely catch spinning streams, and whenever we catch one it
seems a filter is involved, but we currently report no info about them.
Let's print the list of enabled filters on the stream with such a crash
to help with the reports. A typical output will now look like this:
[ALERT] 121/165908 (1110) : A bogus STREAM [0x7fcaf4016a60] is spinning at 2 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7fcaf4016a60 src=127.0.0.1 fe=l1 be=l1 dst=<CACHE> rqf=6dc42000 rqa=48000 rpf=a0040223 rpa=24000000 sif=EST,10008 sib=DIS,80110 af=(nil),0 csf=0x7fcaf4023c00,10c000 ab=0x7fcaf40235f0,4 csb=(nil),0 cof=0x7fcaf4016610,1300:H1(0x7fcaf4016840)/RAW((nil))/tcpv4(29) cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0) filters={0x7fcaf4016fb0="cache store filter", 0x7fcaf4017080="compression filter"}]
This may be backported to 2.0.
I changed my mind twice on this one and pushed after the last test with
threads disabled, without re-enabling long long, causing this rightful
build warning.
This needs to be backported if the previous commit ff64d3b027 ("MINOR:
threads: export the POSIX thread ID in panic dumps") is backported as
well.
It is very difficult to map a panic dump against a gdb thread dump
because the thread numbers do not match. However gdb provides the
pthread ID but this one is supposed to be opaque and not to be cast
to a scalar.
This patch provides a function, ha_get_pthread_id(), which retrieves
the pthread ID of the indicated thread and casts it to an unsigned
long long so as to lose the least possible amount of information from
it. This is done cleanly using a union to maintain alignment, so as
long as these IDs are stored on 1..8 bytes they will be properly
reported. This ID is now presented in the panic dumps so it now
becomes possible to map these threads. When threads are disabled,
zero is returned. For example, this is a panic dump:
Thread 1 is about to kill the process.
*>Thread 1 : id=0x7fe92b825180 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
stuck=1 prof=0 harmless=0 wantrdv=0
cpu_ns: poll=5119122 now=2009446995 diff=2004327873
curr_task=0xc99bf0 (task) calls=4 last=0
fct=0x592440(task_run_applet) ctx=0xca9c50(<CLI>)
strm=0xc996a0 src=unix fe=GLOBAL be=GLOBAL dst=<CLI>
rqf=848202 rqa=0 rpf=80048202 rpa=0 sif=EST,200008 sib=EST,204018
af=(nil),0 csf=0xc9ba40,8200
ab=0xca9c50,4 csb=(nil),0
cof=0xbf0e50,1300:PASS(0xc9cee0)/RAW((nil))/unix_stream(20)
cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0)
call trace(20):
| 0x59e4cf [48 83 c4 10 5b 5d 41 5c]: wdt_handler+0xff/0x10c
| 0x7fe92c170690 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x13690
| 0x7ffce29519d9 [48 c1 e2 20 48 09 d0 48]: linux-vdso:+0x9d9
| 0x7ffce2951d54 [eb d9 f3 90 e9 1c ff ff]: linux-vdso:__vdso_gettimeofday+0x104/0x133
| 0x57b484 [48 89 e6 48 8d 7c 24 10]: main+0x157114
| 0x50ee6a [85 c0 75 76 48 8b 55 38]: main+0xeaafa
| 0x50f69c [48 63 54 24 20 85 c0 0f]: main+0xeb32c
| 0x59252c [48 c7 c6 d8 ff ff ff 44]: task_run_applet+0xec/0x88c
Thread 2 : id=0x7fe92b6e6700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=786738 now=1086955 diff=300217
curr_task=0
Thread 3 : id=0x7fe92aee5700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=828056 now=1129738 diff=301682
curr_task=0
Thread 4 : id=0x7fe92a6e4700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=818900 now=1153551 diff=334651
curr_task=0
And this is the gdb output:
(gdb) info thr
Id Target Id Frame
* 1 Thread 0x7fe92b825180 (LWP 15234) 0x00007fe92ba81d6b in raise () from /lib64/libc.so.6
2 Thread 0x7fe92b6e6700 (LWP 15235) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
3 Thread 0x7fe92a6e4700 (LWP 15237) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
4 Thread 0x7fe92aee5700 (LWP 15236) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
We can clearly see that while threads 1 and 2 are the same, gdb's
threads 3 and 4 respectively are haproxy's threads 4 and 3.
This may be backported to 2.0 as it removes some confusion in github issues.
We tried hard to make sure we report threads as not stuck at various
crucial places, but one of them is special, it's the listener_accept()
function. The reason it is special is because it will loop a certain
number of times (default: 64) accepting incoming connections, allocating
resources, dispatching them to other threads or running L4 rules on them,
and while all of this is supposed to be extremely fast, when the machine
slows down or runs low on memory, the expectedly small delays in malloc()
caused by contention with other threads can quickly accumulate and suddenly
become critical to the point of triggering the watchdog. Furthermore, it
is technically possible to trigger this by pure configuration by setting
a huge tune.maxaccept value, which should never be enough to trip the watchdog.
Given that each operation isn't related to the same task but to a different
one each time, it is appropriate to mark the thread as not stuck each time
it accepts new work that possibly gets dispatched to other threads which
execute it.
This looks like it could be the reason for the problem reported in
issue #388.
This fix must be backported to 2.0.
Building without threads now shows this warning:
src/ssl_sock.c: In function 'cli_io_handler_commit_cert':
src/ssl_sock.c:12121:24: warning: unused variable 'bind_conf' [-Wunused-variable]
struct bind_conf *bind_conf = ckchi->bind_conf;
^~~~~~~~~
This is because the variable is needed only to unlock the structure, and
the unlock operation does nothing in this case. Let's mark the variable
__maybe_unused for this, but it would be convenient in the long term if
we could make the thread macros pretend they consume the argument so that
this remains less visible outside.
No backport is needed.
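The fix essentially boils down to the following annotation (a sketch of the
idea, not the verbatim patch):

    /* only needed for the unlock below, which compiles to nothing
     * without threads, hence the __maybe_unused annotation
     */
    struct bind_conf *bind_conf __maybe_unused = ckchi->bind_conf;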
Because in HTTP, the host header and the request authority, if any, must be
identical, we keep both synchronized. It means the right flags are set on the
HTX start-line when http_update_host() is called. There is no header when it
happens, but it is not an issue. Then, if a Host header is inserted,
http_update_authority() is called.
Note that for now, the host header is not automatically added when required.
Connection, content-length and transfer-encoding headers are ignored for
http-check send rules. For now, keep-alive is not supported and the
"connection: close" header is always added to the request. And the
content-length header is automatically added.
cpu_calls, cpu_ns_avg, cpu_ns_tot, lat_ns_avg and lat_ns_tot depend on the
stream to find the current task and must check for it or they may cause a
crash if misused or used in a log-format string after commit 5f940703b3
("MINOR: log: Don't depends on a stream to process samples in log-format
string").
This must be backported as far as 1.9.
get_http_auth() expects a valid stream but this is not mentioned, though
fortunately it's always called from places which already check this.
smp_prefetch_htx() performs all the required checks and is the key to the
stability of almost all sample fetch functions, so let's make this clearer.
Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.
The unique-id sample fetch function, if called without a stream, will result
in a crash.
This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.7.
Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.
The http_first_req sample fetch function, if called without a stream, will
result in a crash.
This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.6.
Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.
The capture.req.hdr, capture.res.hdr, capture.req.method, capture.req.uri,
capture.req.ver and capture.res.ver sample fetches used to assume the
presence of a stream, which is not necessarily the case (especially after
the commit above) and would crash haproxy if incorrectly used. Let's make
sure they check for this stream.
This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.6.
Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.
The capture-req and capture-res converters were in this case and could
crash the process if misused.
This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.6.
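The pattern applied by these fixes is essentially the following guard at the
top of the affected sample fetch and converter functions (a sketch, not the
verbatim patches):

    if (!smp->strm)
        return 0;  /* no stream attached (e.g. log-format for checks): no sample */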
Mux-h1 currently heavily relies on the presence of an upper stream, even
when waiting for a new request after one is being finished, and it's that
upper stream that's in charge of request and keep-alive timeouts for now.
But since recent commit 493d9dc6ba ("MEDIUM: mux-h1: do not blindly wake
up the tasklet at end of request anymore") that assumption was broken as
the purpose of this change was to avoid initiating processing of a request
when there's no data in the buffer. The side effect is that there's no more
timeout to handle the front connection, resulting in dead front connections
stacking up as clients get kicked off the net.
This fix makes sure we always enable the timeout when there's no stream
attached to the connection. It doesn't do this for back connections since
they may purposely be left idle.
No backport is needed as this bug was introduced in 2.2-dev4.
A bug was introduced in the commit 2edcd4cbd ("BUG/MINOR: checks: Avoid
incompatible cast when a binary string is parsed"). The length of the
destination buffer must be set before calling the parse_binary() function.
No backport needed.
It can sometimes be useful to measure the total time of a request as seen
from an end user, including TCP/TLS negotiation, server response time
and transfer time. "Tt" currently provides something close to that, but
it also takes client idle time into account, which is problematic for
keep-alive requests as idle time can be very long. "Ta" is also not
sufficient as it hides TCP/TLS negotiation time. To improve that, introduce
a "Tu" timer, which excludes the idle time but includes everything else. It
roughly estimates the time spent from the user's point of view (without DNS
resolution time), assuming network latency is the same in both directions.
When a tcp-check line is parsed, a warning may be reported if the keyword is
used for a frontend. The return value must be used to report it, but this info
was lost before the end of the function.
Partly fixes issue #600. No backport needed.
When an error is found during the parsing of an expect rule (tcp or http),
everything is released at the same place, at the end of the function.
Partly fixes issue #600. No backport needed.
The parse_binary() function must be called with a pointer to an integer. So don't
pass a pointer to a size_t element cast to a pointer to an integer.
Partly fixes issue #600. No backport needed.
The variable s, pointing to the check server, may be null when a connection is
opened. It happens for email alerts. To avoid ambiguities, its use is now more
explicit. Comments have been added at some places and tests on the variable have
been added elsewhere (useless but explicit).
Partly fixes issue #600.
If a message is not fully received from a mysql server, depending on last_read
value, an error must be reported or we must wait for more data. The first if
statement must not check last_read.
Partly fixes issue #600. No backport needed.
The version is partially parsed to set the flag HTX_SL_F_VER_11 on the HTX
message. But exactly 8 chars are expected. So if "HTTP/2" is specified, the flag
is not set. Thus, the version parsing has been updated to handle "HTTP/2" and
"HTTP/2.0" the same way.
For PostgreSQL health checks, there is a regex on the backend authentication
packet. It must match for the check to succeed. But there are 6 types of
authentication packets and the regex only matches the first one
(AuthenticationOK). This patch fixes the regex to match all authentication
packets.
No backport needed.
At the end of a tcp-check based health check, if there is still a connection
attached to the check, it must be closed. But it must be done before releasing
the session, because the session may still be referenced by the mux. For
instance, an h2 stream may still have a reference on the session.
No need to backport.
When a new connection is established, if an old connection is still attached to
the current check, it must be destroyed. When it happens, the old conn-stream
must be used to unsubscribe for events, not the new one.
No backport is needed.
The variable 'ret' must only be declared when HAProxy is compiled with SSL
support (more precisely, SSL_CTRL_SET_TLSEXT_HOSTNAME must be defined).
No backport needed.
Since the tcp-check based health checks use the best multiplexer for a
connection, the mux-pt is no longer the only possible choice. So event
subscriptions and unsubscriptions must be done with the mux.
No backport needed.
It is now possible to match on a comma-separated list of status codes or ranges
of codes. In addition, instead of a string comparison to match the response's
status code, an integer comparison is performed. Here is an example:
http-check expect status 200,201,300-310
When default parameters are set in a request message, we get the client
connection using the session's origin. But it is not necessarily a
connection. For instance, for health checks, thanks to recent changes, it may
be a check object. At this step, the client connection may be NULL. Thus, we
must be sure to have a client connection before using it.
This patch must be backported to 2.1.
It is now possible to force the mux protocol for a tcp-check based health check
using the server keyword "check-proto". If set, this parameter overwrites the
server one.
In the same way, a "proto" parameter has been added for tcp-check and http-check
connect rules. If set, this mux protocol overwrites all others for the current
connection.
First, when a server health check is initialized, it inherits the mux protocol
from the server if it is not already specified. Because there is no option to
specify the mux protocol for the checks, it is always inherited from the server
for now.
Then, if the connect rule is configured to use the server options, the mux
protocol of the check is used, if defined. Of course, if a mux protocol is
already defined for the connect rule, it is used in priority. But for now, it is
not possible.
Thus, if a server is configured to use, for instance, the h2 protocol, it is
possible to do the same for the health-checks.
No backport needed.
This reverts commit 1979943c30ef285ed04f07ecf829514de971d9b2.
Captures in comments were only used when a tcp-check expect based on a negative
regex matching failed, to eventually report what was captured while it was not
expected. It is a bit far-fetched to be usable IMHO. on-error and on-success
log-format strings are far more usable. For now there are few check sample
fetches (in fact only one...). But it could be really powerful to report info in
logs.
HTTP health-checks now use HTX multiplexers. So it is important to really send
the amount of outgoing data for such checks because the HTX buffers always
appear full.
No backport needed.
Since all tcp-check rulesets are globally stored, it is a problem to use a
list. For configurations with many backends, the lookups in the list may be
costly and slow down HAProxy startup. To solve this problem, tcp-check rulesets are
now stored in a tree.
When some data are scheduled to be sent, we must be sure to subscribe for sends
if nothing was sent. Because of a bug, when nothing was sent, connection errors
are checked. If no error is found, we exit, waiting for more data, without any
subscription on send events.
No need to backport.
Instead of accessing the ist fields directly, the ist API is now used to get
its length or its pointer, to release it or to duplicate it. It is more
readable this way.
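For illustration, this is the kind of change implied (a hedged sketch; the
helper names are assumed from the description above, not verified against the
tree):

    struct ist s = ist("example");
    size_t len  = istlen(s);    /* instead of reading s.len directly */
    struct ist d = istdup(s);   /* instead of a manual alloc + copy */
    istfree(&d);                /* instead of free(d.ptr) */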
The patch is not obvious at first glance. But it is just a reorg. Functions
have been grouped and ordered in a more logical way. Some structures and flags
are now private to the checks module (so moved from the .h to the .c file).
Thanks to the previous change, it is now possible to remove all code handling
pure tcp checks. Now every connection-based health check is handled by the
tcpcheck_main() function. __event_srv_chk_w() and __event_srv_chk_r() have been
removed. And all connection establishment is handled in one place.
Default health checks, without any option, doing only a connection check, are
now based on tcp-checks. An implicit default tcp-check connect rule is used. A
shared tcp-check ruleset, named "*tcp-check", is created to support these checks.
When a tcp-check connect rule is evaluated, the mux protocol corresponding to
the health-check is chosen. So for TCP based health-checks, the mux-pt is
used. For HTTP based health-checks, the mux-h1 is used. The connection is marked
as private to be sure not to reuse regular HTTP connections for
health-checks. Connection reuse will be evaluated later.
The functions evaluating HTTP send rules and expect rules have been updated to
be HTX compliant. The main change for users is that HTTP health-checks are now
stricter on the HTTP message format. While before, the HTTP formatting and
parsing were minimalist, now messages should be well formatted.
Before, the server was used as origin during session creation. It was only used
to get the check associated with the server when a variable is read or set in
the check scope or when a check sample fetch is called. So it seems easier to
use the check as the origin of the session. It is also more logical because the
session is created by the health-check itself and not by its server.
A dedicated function is now used to receive data. Fundamentally, it should do
the same operations as before. But the way data are received has been reworked
to be closer to the si_cs_recv() function.
First tests before executing the loop on tcp-check rules in tcpcheck_main()
function have been slightly modified to be more explicit and easier to
understand.
HTTP health-checks are now internally based on tcp-checks. Of course all the
configuration parsing of the "http-check" keyword and the httpchk option has
been rewritten. But the main change is that now, as for tcp-check rulesets, it
is possible to perform several send/expect sequences in the same
health check. Thus the connect rule is now also available for HTTP checks, just
like the set-var, unset-var and comment rules.
Because the request defined by the "option httpchk" line is used for the first
request only, it is now possible to set the method, the uri and the version on a
"http-check send" line.
The get_last_tcpcheck_rule() function iterates over a rule list in reverse order
and returns the first non-comment and non action-kw rule. If no such rule is
found, NULL is returned.
Instead of having 2 independent integers, used as boolean values, to know if the
expect rule is inverted and if the matching regexp has captures, we now
use a 32-bit bitfield.
All tcp-check rules are now stored in the global shared list. The ones created
to parse a specific protocol, for instance redis, are already stored in this
list. Now pure tcp-check rules are also stored in it. The ruleset name is
created using the proxy name and its config file and line. tcp-check rules
declared in a defaults section are also stored this way using "defaults" as
proxy name.
For now, all tcp-check rulesets are stored in a list. But it could be a bit slow
to look for a specific ruleset with a huge number of backends. So, it could be
a good idea to use a tree instead.
It is now possible to specify the healthcheck status to use on success of a
tcp-check rule, if it is the last evaluated rule. The option "ok-status"
supports "L4OK", "L6OK", "L7OK" and "L7OKC" status.
A shared tcp-check ruleset is now created to support agent checks. The following
sequence is used :
tcp-check send "%[var(check.agent_string)] log-format
tcp-check expect custom
The custom function evaluating the expect rule does the same as what was done
to handle the agent response when a custom check was used.
Parsing of the following keywords has been moved to the checks.c file : addr, check,
check-send-proxy, check-via-socks4, no-check, no-check-send-proxy, rise, fall,
inter, fastinter, downinter and port.
A shared tcp-check ruleset is now created to support SPOP checks. This way no
extra memory is used if several backends use a SPOP check.
The following sequence is used :
tcp-check send-binary SPOP_REQ
tcp-check expect custom min-recv 4
The spop request is the result of the function
spoe_prepare_healthcheck_request() and the expect rule relies on a custom
function calling spoe_handle_healthcheck_response().
A shared tcp-check ruleset is now created to support LDAP checks. This way no
extra memory is used if several backends use a LDAP check.
The following sequence is used :
tcp-check send-binary "300C020101600702010304008000"
tcp-check expect rbinary "^30" min-recv 14 \
on-error "Not LDAPv3 protocol"
tcp-check expect custom
The last expect rule relies on a custom function to check the LDAP server reply.
A shared tcp-check ruleset is now created to support MySQL checks. This way no
extra memory is used if several backends use a MySQL check.
One of the following sequences is used :
## If no extra params are set
tcp-check connect default linger
tcp-check expect custom ## will test the initial handshake
## If the username is defined
tcp-check connect default linger
tcp-check send-binary MYSQL_REQ log-format
tcp-check expect custom ## will test the initial handshake
tcp-check expect custom ## will test the reply to the client message
The log-format hexa string MYSQL_REQ depends on 2 preset variables, the packet
header containing the packet length and the sequence ID (check.header) and the
username (check.username). It is also different if the "post-41" option is set
or not. Expect rules rely on custom functions to check MySQL server packets.
A shared tcp-check ruleset is now created to support postgres checks. This way no
extra memory is used if several backends use a pgsql check.
The following sequence is used :
tcp-check connect default linger
tcp-check send-binary PGSQL_REQ log-format
tcp-check expect !rstring "^E" min-recv 5 \
error-status "L7RSP" on-error "%[check.payload(6,0)]"
tcp-check expect rbinary "^520000000800000000 min-recv "9" \
error-status "L7STS" \
on-success "PostgreSQL server is ok" \
on-error "PostgreSQL unknown error"
The log-format hexa string PGSQL_REQ depends on 2 preset variables, the packet
length (check.plen) and the username (check.username).
A shared tcp-check ruleset is now created to support smtp checks. This way no
extra memory is used if several backends use a smtp check.
The following sequence is used :
tcp-check connect default linger
tcp-check expect rstring "^[0-9]{3}[ \r]" min-recv 4 \
error-status "L7RSP" on-error "%[check.payload(),cut_crlf]"
tcp-check expect rstring "^2[0-9]{2}[ \r]" min-recv 4 \
error-status "L7STS" \
on-error %[check.payload(4,0),ltrim(' '),cut_crlf] \
status-code "check.payload(0,3)"
tcp-check send "%[var(check.smtp_cmd)]\r\n" log-format
tcp-check expect rstring "^2[0-9]{2}[- \r]" min-recv 4 \
error-status "L7STS" \
on-error %[check.payload(4,0),ltrim(' '),cut_crlf] \
on-success "%[check.payload(4,0),ltrim(' '),cut_crlf]" \
status-code "check.payload(0,3)"
The variable check.smtp_cmd is by default the string "HELO localhost" but may be
customized by setting the <helo> and <domain> parameters on the "option smtpchk"
line. Note there is a difference with the old smtp check: the server greeting
message is checked before sending the HELO/EHLO command.
A shared tcp-check ruleset is now created to support ssl-hello check. This way
no extra memory is used if several backends use a ssl-hello check.
The following sequence is used :
tcp-check send-binary SSLV3_CLIENT_HELLO log-format
tcp-check expect rbinary "^1[56]" min-recv 5 \
error-status "L6RSP" tout-status "L6TOUT"
SSLV3_CLIENT_HELLO is a log-format hexa string representing a SSLv3 CLIENT HELLO
packet. It is the same as the one used by the old ssl-hello check except that
the sample expression "%[date(),htonl,hex]" is used to set the date field.
A shared tcp-check ruleset is now created to support redis checks. This way no
extra memory is used if several backends use a redis check.
The following sequence is used :
tcp-check send "*1\r\n$4\r\nPING\r\n"
tcp-check expect string "+PONG\r\n" error-status "L7STS" \
on-error "%[check.payload(),cut_crlf]" on-success "Redis server is ok"
It is now possible to set a custom function to evaluate a tcp-check expect
rule. It is an internal and undocumented option because the right function
pointer must be set and it is not possible to express it in the
configuration. It will be used to convert some protocol healthchecks to
tcp-checks.
Custom functions must have the following signature:
enum tcpcheck_eval_ret (*custom)(struct check *, struct tcpcheck_rule *, int);
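A skeleton of such a function might look like this (a sketch only; the enum
values, buffer helpers and struct fields used here are assumptions, not code
taken from the tree):

    static enum tcpcheck_eval_ret
    my_proto_expect(struct check *check, struct tcpcheck_rule *rule, int last_read)
    {
        if (b_data(&check->bi) < 4 && !last_read)
            return TCPCHK_EVAL_WAIT;      /* not enough data yet */

        if (memcmp(b_head(&check->bi), "+OK\n", 4) == 0)
            return TCPCHK_EVAL_CONTINUE;  /* matched, move to the next rule */

        set_server_check_status(check, HCHK_STATUS_L7RSP, "unexpected reply");
        return TCPCHK_EVAL_STOP;          /* report the failure and stop */
    }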
A list of variables is now associated to each tcp-check ruleset. It is more or
less a list of set-var expressions. This list may be filled during the
configuration parsing. The listed variables will then be set during each
execution of the tcp-check healthcheck, at the beginning, before execution of
the first tcp-check rule.
This patch is mandatory to convert all protocol checks to tcp-checks. It is a
way to customize shared tcp-check rulesets.
This option defines a sample expression, evaluated as an integer, to set the
status code (check->code) if a tcp-check healthcheck ends on the corresponding
expect rule.
These options define log-format strings used to produce the info message if a
tcp-check expect rule fails (on-error option) or succeeds (on-success
option). For this last option, it must be the ending rule, otherwise the
parameter is ignored.
It is now possible to extract information from the check input buffer using the
check.payload sample fetch. As with req.payload or res.payload, an offset and a
length must be specified.
A new section has been added in the configuration manual. Now check sample
fetches will have to be documented under the section 7.3.7 (Fetching
health-check samples).
When a tcp-check healthcheck fails on a specific rule with no dedicated comment,
we look in previous rules for a comment rule. Now, instead of doing
this during tcp-checks execution, we assign the comment to the corresponding
rules during the configuration parsing. So after HAProxy startup, no comment
rules remain in a tcp-check ruleset.
It is now possible to specify the healthcheck status to use on error or on
timeout for tcp-check expect rules. First, to define the error status, the
option "error-status" must be used followed by "L4CON", "L6RSP", "L7RSP" or
"L7STS". Then, to define the timeout status, the option "tout-status" must be
used followed by "L4TOUT", "L6TOUT" or "L7TOUT".
These options will be used to convert specific protocol healthchecks (redis,
pgsql...) to tcp-check ones.
This converter transforms an integer to its binary representation in network
byte order. Integers are already automatically converted to binary during sample
expression evaluation. But because samples hold 8-byte integers, the conversion
produces 8 bytes. The htonl converter does the same but for 4-byte integers.
A global list of tcp-check rulesets can now be used to share common rulesets with
all backends without any duplication. It is mandatory to convert all specific
protocol checks (redis, pgsql...) to tcp-check healthchecks.
To do so, a flag is now attached to each tcp-check ruleset to know if it is a
shared ruleset or not. tcp-check rules defined in a backend are still directly
attached to the proxy and not shared. In addition a second flag is used to know
if the ruleset is inherited from the defaults section.
When a log-format string is parsed, if a sample fetch is found, the flag LW_REQ
is systematically added on the proxy. Unfortunately, this produces a warning
during HAProxy start-up when a log-format string is used for a tcp-check send
rule. Now this flag is only added if the parsed sample fetch depends on HTTP
information.
When a log-format string is evaluated, there is no reason to process sample
fetches only when a stream is defined. Several sample fetches are available
outside the stream scope. All others should handle calls without stream. This
patch is mandatory to support log-format string in tcp-check rules.
An extra parameter for tcp-check send rules can be specified to handle the
string or the hexa string as a log-format one. Using "log-format" option,
instead of considering the data to send as raw data, it is parsed as a
log-format string. Thus it is possible to call sample fetches to customize data
sent to a server. Of course, because we have no stream attached to healthchecks,
not all sample fetches are available. So be careful.
tcp-check set-var(check.port) int(8000)
tcp-check set-var(check.uri) str(/status)
tcp-check connect port var(check.port)
tcp-check send "GET %[check.uri] HTTP/1.0\r\n" log-format
tcp-check send "Host: %[srv_name]\r\n" log-format
tcp-check send "\r\n"
Since we have a session attached to tcp-check healthchecks, it is possible to
use sample expressions and variables. In addition, it is possible to add
tcp-check set-var rules to define custom variables. So, now, a sample expression
can be
used to define the port to use to establish a connection for a tcp-check connect
rule. For instance:
tcp-check set-var(check.port) int(8888)
tcp-check connect port var(check.port)
With this option, it is now possible to use a specific address to open the
connection for a tcp-check connect rule. If the port option is also specified,
it is used in priority.
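For instance (illustrative values):

    tcp-check connect addr 192.168.0.10 port 8443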
With this option, it is possible to open a connection from a tcp-check connect
rule using all the parameters of the server line, like any other healthcheck. For
now, this parameter is exclusive with all other options for a tcp-check connect
rule.
With this option, it is possible to establish the connection opened by a
tcp-check connect rule through an upstream socks4 proxy. Info from the socks4
parameter on the server line is used.
Evaluate the registered action_ptr associated with each CHK_ACTION_KW rule from
a ruleset. Currently only the 'set-var' and 'unset-var' are parsed by the
tcp-check parser. Thus it is now possible to set or unset variables. It is
possible to use such rules before the first connect of the ruleset.
Register the custom action rules "set-var" and "unset-var", that will
call the parse_store() command upon parsing.
These rules are thus built and integrated to the tcp-check ruleset, but
have no further effect for the moment.
Add a dedicated vars scope for checks. This scope is considered as part of the
session scope for accounting purposes.
The scope can be addressed by a valid session, even embryonic. The stream is not
necessary.
The scope is initialized after the check session is created. All variables are
then pruned before the session is destroyed.
Create a session for each healthcheck relying on a tcp-check ruleset. When such
check is started, a session is allocated, which will be freed when the check
finishes. A dummy static frontend is used to create these sessions. This will be
useful to support variables and sample expressions. This will also be used,
later, by HTTP healthchecks to rely on HTTP muxes.
The loop in the tcpcheck_main() function is quite hard to understand. Depending
on where we are in the loop, the current_step is the currently executed rule or
the one to execute on the next call to tcpcheck_main(). When the check result is
reported, we rely on the rule pointed to by last_started_step or the one pointed
to by current_step. In addition, the loop does not use the common
list_for_each_entry macro and is thus quite confusing.
So the loop has been totally rewritten and split into several functions to
simplify its reading and its understanding. Tcp-check rules are evaluated in
dedicated functions. And a common for_each loop is used and only one rule is
referenced, the current one.
After the configuration parsing, during the validity check, an implicit tcp-check
connect rule is added in front of the tcp-check ruleset if the first non-comment
rule is not a connect one. This implicit rule is flagged to use the default
check parameters.
This means that now all tcp-check rulesets begin with a connect rule and are
never empty. When tcp-check healthchecks are used, all connections are thus
handled by the tcpcheck_main() function.
The keyword 'tcp-check' is now parsed in a dedicated callback function. Thus the
code to parse these rules is now located in checks.c. In addition, a deinit
function has been added to release proxy tcp-check rules, on error or when
HAProxy is stopped.
This patch is based on Gaetan Rivet's work. It uses a callback function
registered on the 'tcp-check' keyword instead, but the spirit is the same.
To allow reusing these blocks without consuming more memory, their list
should be static and shareable across uses. The head of the list will
be shared as well.
It is thus necessary to extract the head of the rule list from the proxy
itself. Transform it into a pointer instead, that can be easily set to
an external dynamically allocated head.
Parse back-references in comments of tcp-check expect rules. If references are
made, capture groups in the match and replace references to them within the
comment when logging the error. Both text and binary regexes can capture groups
and reference them in the expect rule comment.
[Cf: I slightly updated the patch. exp_replace() function is used instead of a
custom one. And if the trash buffer is too small to contain the comment during
the substitution, the comment is ignored.]
The loop to find the id corresponding to the current rule in
tcpcheck_get_step_id() function has been simplified. And
tcpcheck_get_step_comment() function now only relies on the current rule to find
the right comment string. The step id is no longer used. To do so, we iterate
backward from the current step to find the first COMMENT rule immediately
preceding the expect rule chain.
The rbinary match works similarly to the rstring match type, however the
received data is rewritten as hex-string before the match operation is
done.
This allows using regexes on binary content even with the POSIX regex
engine.
[Cf: I slightly updated the patch. mem2hex function was removed and dump_binary
is used instead.]
On the input buffer, it was mainly done to call the regex_exec() function. But
regex_exec2() can be used instead. This way, it is no longer required to add the
terminating null byte. For the output buffer, it was only done for debugging
purposes.
Allow declaring tcpcheck connect commands with a new parameter,
"linger". This option will configure the connection to avoid using an
RST segment to close, instead following the four-way termination
handshake. Some servers would otherwise log each healthcheck as
an error.
Some expect rules cannot be satisfied due to inherent ambiguity in the
received data: in the absence of a match, the current behavior is to be
forced to wait either for the end of the connection or for a full buffer,
whichever comes first. Only then is the matching diagnostic considered
conclusive. For instance :
tcp-check connect
tcp-check expect !rstring "^error"
tcp-check expect string "valid"
This check will only succeed if the connection is closed by the server before
the check timeout. Otherwise the first expect rule will wait for more data until
"^error" regex matches or the check expires.
Allow the user to explicitly define an amount of data that will be
considered enough to determine the value of the check.
This allows negative rstring rules to succeed: previously, in the valid case,
no match happened and the matching was repeated until the end of the
connection, which could time out the check while no error was happening.
[Cf: I slightly updated the patch. The parameter was renamed and the value is a
signed integer to support -1 as a default value to ignore the parameter.]
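For example, a negative match can be bounded this way (illustrative value):

    tcp-check connect
    tcp-check expect !rstring "^error" min-recv 512
    tcp-check expect string "valid"

Here the negative match is considered successful as soon as 512 bytes have been
received without matching "^error", instead of waiting for the connection to
close.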
Reduce the duplicated parsing code that is common to all three types of
expect actions.
This reduces the amount of code, helping maintainability and reducing
future change spread.
Functionality is identical.
When receiving additional data while chaining multiple tcp-check expects,
previous inverse expects might have a different result with the new data. They
need to be evaluated again against the new data.
Add a pointer to the first inverse expect rule of the current expect chain
(possibly of length one) to each expect rule. When receiving new data, the
currently evaluated tcp-check rule is set back to this pointed rule.
Functionally speaking, it is a bug and it has existed since the introduction of
the feature. But there is no way for now to hit it because when an expect rule
does not match, we wait for more data, independently of the inverse flag. The
only way to move to the following rule is to be sure no more data will be
received.
This patch depends on the commit "MINOR: mini-clist: Add functions to iterate
backward on a list".
[Cf: I slightly updated the patch. First, it only concerns inverse expect
rule. Normal expect rules are not concerned. Then, I removed the BUG tag
because, for now, it is not possible to move to the following rule when the
current one does not match while more data can be received.]
TCP check expect matching strings or binary patterns are able to know
prior to applying their match function whether the available data is
already sufficient to attempt the match or not.
As such, on insufficient data the expect is postponed. This behavior
avoids unnecessary matches when the data could not possibly match.
When chaining expects, upon passing the previous one and moving on to the next,
however, this length check is not done again. Then the match is done and
will necessarily fail, triggering a new wait for more data. The end
result is the same, but at a slightly higher cost.
Check received data length for all expects in a chain.
This bug exists since the introduction of the feature:
Fixes: 5ecb77f4c7 ("MEDIUM: checks: add send/expect tcp based check")
Version 1.5+ impacted.
The options and directives related to the configuration of checks in a backend
may be defined after the servers declarations. So, initialization of the check
of each server must not be performed during configuration parsing, because some
info may be missing. Instead, it must be done during the configuration validity
check.
Thus, callback functions are registered to be called for each server after the
config validity check, one for the server check and another one for the server
agent-check. In addition deinit callback functions are also registered to
release these checks.
This patch should be backported as far as 1.7. But per-server post_check
callback functions are only supported since 2.1. And the initcall mechanism
does not exist before 1.9. Finally, in 1.7, the code is totally
different. So the backport will be harder on older versions.
This option is used to force a non-SSL connection to check a SSL server or to
invert a check-ssl option inherited from the default section. The use_ssl field
in the check structure is used to know if a SSL connection must be used
(use_ssl=1) or not (use_ssl=0). The server configuration is used by default.
The problem is that we cannot distinguish the default case (no specific SSL
check option) and the case of an explicit non-SSL check. In both, use_ssl is set
to 0. So the server configuration is always used. For a SSL server, when
no-check-ssl option is set, the check is still performed using a SSL
configuration.
To fix the bug, instead of a boolean value (0=TCP, 1=SSL), we use a ternary value :
* 0 = use server config
* 1 = force SSL
* -1 = force non-SSL
The same is done for the server parameter. It is not really necessary for
now. But it is a good way to know if the server no-ssl option is set.
In addition, the PR_O_TCPCHK_SSL proxy option is no longer used to set use_ssl
to 1 for a check. Instead the flag is directly tested to prepare or destroy the
server SSL context.
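A minimal sketch of the resulting decision (field names assumed, not taken from
the patch):

    /* check->use_ssl:  1 = force SSL, -1 = force non-SSL, 0 = follow the server */
    int ssl = (check->use_ssl != 0) ? (check->use_ssl > 0) : srv->use_ssl;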
This patch should be backported as far as 1.8.
Error codes ERR_WARN and ERR_ALERT are used to signal that the error
given is of the corresponding level. All errors are displayed as ALERT
in the display_parser_err() function.
Differentiate the display level based on the error code. If both
ERR_WARN and ERR_ALERT are used, ERR_ALERT is given priority.
The 'http-check send' directive has been added to add headers and optionally a
payload to the request sent during HTTP healthchecks. The request line may be
customized by the "option httpchk" directive but there was no official way to
add extra headers. An old trick consisted in hiding these headers at the end of
the version string, on the "option httpchk" line. And it was impossible to add
an extra payload with an "http-check expect" directive because of the
"Connection: close" header appended to the request (See issue #16 for details).
So to make things official and fully support payload additions, the "http-check
send" directive has been added :
option httpchk POST /status HTTP/1.1
http-check send hdr Content-Type "application/json;charset=UTF-8" \
hdr X-test-1 value1 hdr X-test-2 value2 \
body "{id: 1, field: \"value\"}"
When a payload is defined, the Content-Length header is automatically added. So
chunk-encoded requests are not supported yet. For now, there is no special
validity checks on the extra headers.
This patch is inspired by Kiran Gavali's work. It should fix issue #16 and,
as far as possible, it may be backported, at least as far as 1.8.
Server address and port may change at runtime. So the address and port passed as
arguments and as environment variables when an external check is executed must
be updated. The current number of connections on the server was already updated
before executing the command. So the same mechanism is used for the server
address and port. But in addition, command arguments are also updated.
This patch must be backported to all stable versions. It should fix the
issue #577.
It is the intended behaviour. But because of a bug, the 500 error resulting
from a rewrite failure during http-after-response ruleset evaluation is also
rewritten. So if, at this step, there is also a rewrite error, the session is
closed and no error message is returned.
Instead, we must be sure not to evaluate the http-after-response rules on an
error message if it was thrown because of a rewrite failure on a previous
error message.
It is a 2.2-dev2+ bug. No need to backport. This patch should fix the issue
pool_gc() causes quite some stress on the memory allocator because
it makes a lot of free() calls while other threads are using malloc().
In addition, pool_gc() needs to take care of possible locking because
it may be called from pool allocators. One way to avoid all this is to
use thread_isolate() to make sure the gc runs alone. By putting less
pressure on the pools and getting rid of the locks, it may even take
less time to complete.
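The pattern described above boils down to (a simplified sketch):

    thread_isolate();   /* wait for all other threads to reach a safe point */
    /* walk the pools and free unused objects, no pool lock needed here */
    thread_release();   /* let the other threads resume */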
The url_decode() function used by the url_dec converter and a few other
call points is ambiguous on its processing of the '+' character which
itself isn't stable in the spec. This one belongs to the reserved
characters for the query string but not for the path nor the scheme,
in which it must be left as-is. It's only in argument strings that
follow the application/x-www-form-urlencoded encoding that it must be
turned into a space, that is, in query strings and POST arguments.
The problem is that the function is used to process full URLs and
paths in various configs, and to process query strings from the stats
page for example.
This patch updates the function to differentiate the situation where
it's parsing a path and a query string. A new argument indicates if a
query string should be assumed, otherwise it's only assumed after seeing
a question mark.
The various locations in the code making use of this function were
updated to take care of this (most call places were using it to decode
POST arguments).
The url_dec converter is usually called on path or url samples, so it
needs to remain compatible with this and will default to parsing a path
and turning the '+' to a space only after a question mark. However in
situations where it would explicitly be extracted from a POST or a
query string, it now becomes possible to enforce the decoding by passing
a non-null value in argument.
It seems to be what was reported in issue #585. This fix may be
backported to older stable releases.
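For example (a hedged configuration sketch; any non-null argument is assumed to
enable the query-string behaviour):

    # path-like input: '+' is kept unless it appears after a '?'
    http-request set-var(txn.p) path,url_dec
    # explicit POST argument: '+' is turned into a space
    http-request set-var(txn.q) req.body_param(q),url_dec(1)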
A typo resulted in '||' being used to concatenate trace flags, which will
only set a flag of value '1' there. Noticed by clang 10 and reported in
issue #588.
No backport is needed, this trace was added in 2.2-dev.
When checking www-authenticate headers, we don't want to just accept
"NTLM" as value, because the server may send "HTLM <base64 value>". Instead,
just check that it starts with NTLM.
This should be backported to 2.1, 2.0, 1.9 and 1.8.
Documentation states that default settings for ssl server options can be set
using either ssl-default-server-options or default-server directives. In practice,
not all ssl server options can have default values, such as ssl-min-ver, ssl-max-ver,
etc..
This patch adds the missing ssl options in srv_ssl_settings_cpy() and srv_parse_ssl(),
making it possible to write configurations like the following examples, and have them
behave as expected.
global
ssl-default-server-options ssl-max-ver TLSv1.2
defaults
mode http
listen l1
bind 1.2.3.4:80
default-server ssl verify none
server s1 1.2.3.5:443
listen l2
bind 2.2.3.4:80
default-server ssl verify none ssl-max-ver TLSv1.3 ssl-min-ver TLSv1.2
server s1 1.2.3.6:443
This should be backported as far as 1.8.
This fixes issue #595.
This option activates the feature introduced in commit 16739778:
"MINOR: ssl: skip self issued CA in cert chain for ssl_ctx".
The patch disables the feature by default.
When trying to insert a new certificate into a directory with "add ssl
crt-list", no check were done on the path of the new certificate.
To be more consistent with the HAProxy reload, when adding a file to
a crt-list, if this crt-list is a directory, the certificate will need
to have the directory in its path.
Allowing the use of SSL options and filters when adding a file in a
directory is not really consistent with the reload of HAProxy. Disable
the ability to use these options if one tries to use them with a directory.
This patch adds the sysname, release, version and machine fields from
the uname results to the version output. It intentionally leaves out the
nodename (the host name), because it is usually not useful and users might not
want to expose their host names for privacy reasons.
May be backported if it is considered useful for debugging.
If haproxy fails to start and emits an alert, then it can be useful
to have it also emit the version and the path used to load it. Some
users may be mistakenly launching the wrong binary due to a misconfigured
PATH variable and this will save them some troubleshooting time when it
reports that some keywords are not understood.
What we do here is that we *try* to extract the binary name from the
AUX vector on glibc, and we report this as a NOTICE tag before the
very first alert is emitted.
Some portability issues were met a few times in the past depending on
compiler versions, but the compiler version was not reported in the haproxy -vv
output while it's trivial to add it. This patch tries to be the most accurate
by explicitly reporting the clang version if detected, otherwise the
gcc version.
Since some systems switched to service managers which hide all warnings
by default, some users are not aware of some possibly important warnings
and get caught too late with errors that could have been detected earlier.
This patch adds a new global keyword, "zero-warning" and an equivalent
command-line option "-dW" to refuse to start in case any warning is
detected. It is recommended to use these with configurations that are
managed by humans in order to catch mistakes very early.
This helps quickly checking if the config produces any warning. For
this we reuse the "warned" bit field to add a new WARN_ANY bit that is
set by ha_warning(). The rest of the bit field was also cleaned from
unused bits.
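Usage (illustrative):

    global
        zero-warning    # refuse to start if the configuration produces any warning

    $ haproxy -dW -c -f /etc/haproxy/haproxy.cfg    # same effect from the command line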
Before supporting "server" line in "peers" section, such sections without
any local peer were removed from the configuration to get it validated.
This patch fixes the issue where a "server" line without address and port which
is a remote peer without address and port makes the configuration parsing fail.
When encoutering such cases we now ignore such lines remove them from the
configuration.
Thank you to Jrme Magnin for having reported this bug.
Must be backported to 2.1 and 2.0.
Commit 7f26391bc5 ("BUG/MINOR: connection: make sure to correctly tag
local PROXY connections") revealed that some implementations do not
properly ignore addresses in LOCAL connections (at least Dovecot was
spotted). More context information in the thread below:
https://www.mail-archive.com/haproxy@formilux.org/msg36890.html
The patch above was using LOCAL on top of local addresses in order to
minimize the risk of breakage, but it proved worse than a clean fix. So
let's partially revert it and send pure LOCAL connections instead now.
After a bit of observation, this patch should be progressively backported
to stable branches. However if it reveals new breakage, the backport of
the patch above will have to be reverted from stable branches while other
products work on fixing their code based on the master branch.
When reading a crt-list file, the SSL options between square brackets
are parsed, however the calling function sets the ssl_conf ptr to NULL,
leading to all options being ignored, and a memory leak.
This is a remnant of the previous code which was forgotten.
This bug was introduced by 97b0810 ("MINOR: ssl: split the line parsing
of the crt-list").
Create a ckch_store_new() function which allocates and initializes a
ckch_store, allowing us to remove duplicated code and avoid wrong
initialization in the future.
Bug introduced by d9d5d1b ("MINOR: ssl: free instances and SNIs with
ckch_inst_free()").
Upon a 'commit ssl cert', the HA_RWLOCK_WRUNLOCK of the SNI lock is done
using the bind_conf pointer of the ckch_inst which was freed.
Fix the problem by using an intermediate variable to store the
bind_conf pointer.
Replace ckchs_free() by ckch_store_free() which frees the ckch_store but
now also all its ckch_inst with ckch_inst_free().
Also remove the "ckchs" naming since its confusing.
In 'commit ssl cert', instead of trying to regenerate a list of filters
from the SNIs, use the list provided by the crtlist_entry used to
generate the ckch_inst.
This list of filters doesn't need to be free'd anymore since they are
always reused from the crtlist_entry.
Use the refcount of the SSL_CTX' to free them instead of freeing them under
certain conditions. That way we can free the SSL_CTX everywhere its
pointer is used.
When deleting the previous SNI entries with 'set ssl cert', the old
SSL_CTX' were not freed, which probably prevented the completion of the
free of the X509 in the old ckch_store, because of the refcounts in the
SSL library.
This bug was introduced by 150bfa8 ("MEDIUM: cli/ssl: handle the
creation of SSL_CTX in an IO handler").
Must be backported to 2.1.
The crtlist_load_cert_dir() function caches the directory name without trailing
slashes while ssl_sock_load_cert_list_file() tries to look it up without
cleaning the trailing slashes.
This bug leads to creating the crtlist twice and prevents a crtlist_entry from
being removed correctly since it exists in the several crtlists
created by accident.
Move the trailing slashes cleanup in ssl_sock_load_cert_list_file() to
fix the problem.
This bug was introduced by 6be66ec ("MINOR: ssl: directories are loaded
like crt-list")
Delete a certificate store from HAProxy and free its memory. The
certificate must be unused and removed from any crt-list or directory.
The deletion doesn't work with a certificate referenced directly with
the "crt" directive in the configuration.
Bundles are deprecated and can't be used with the crt-list command of
the CLI, improve the error output when trying to use them so the users
can disable them.
The cli_parse_del_crtlist() function does unlock the ckch big lock, but it does
not lock it at the beginning of the function, which is dangerous.
As a side effect it left the structures locked once it called the unlock.
This bug was introduced by 0a9b941 ("MINOR: ssl/cli: 'del ssl crt-list'
delete an entry")
Issue #574 reported an unclear error when trying to open a file without
enough permissions.
[ALERT] 096/032117 (835) : parsing [/etc/haproxy/haproxy.cfg:54] : 'bind :443' : error encountered while processing 'crt'.
[ALERT] 096/032117 (835) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg
[ALERT] 096/032117 (835) : Fatal errors found in configuration.
Improve the error to give us more information:
[ALERT] 097/142030 (240089) : parsing [test.cfg:22] : 'bind :443' : cannot open the file 'kikyo.pem.rsa'.
[ALERT] 097/142030 (240089) : Error(s) found in configuration file : test.cfg
[ALERT] 097/142030 (240089) : Fatal errors found in configuration.
This patch could be backported in 2.1.
The dump and show ssl crt-list commands do the same thing: they dump
the content of a crt-list, but the 'show' one displays an ID in the first
column. Delete the 'dump' command so it is replaced by the 'show' one.
The old 'show' command is replaced by a '-n' option to dump the ID.
And the ID, which was a pointer, is replaced by a line number and placed
after a colon in the filename.
Example:
$ echo "show ssl crt-list -n kikyo.crt-list" | socat /tmp/sock1 -
# kikyo.crt-list
kikyo.pem.rsa:1 secure.domain.tld
kikyo.pem.ecdsa:2 secure.domain.tld
Delete an entry in a crt-list. This is done by iterating over the
ckch_inst in the crtlist_entry. For each ckch_inst, the bind_conf lock is
held during the deletion of the sni_ctx in the SNI trees. Everything
is freed.
If there are several entries with the same certificate, a line number
must be provided to choose which entry to delete.
Initialize fcount to 0 when 'add ssl crt-list' does not contain any
filters. This bug could lead to trying to read some filters even if they
don't exist.
The HPACK header table is implemented as a wrapping list inside a contiguous
area. Header names and values are stored from right to left while indexes
are stored from left to right. When there's no more room to store a new one,
we wrap to the right again, or possibly defragment it if needed. The condition
to use the right part (called tailroom) or the left part (called headroom)
depends on the location of the last inserted header. After wrapping happens,
the code forces to stick to tailroom by pretending there's no more headroom,
so that the size fit test always fails. The problem is that nothing prevents
from storing a header with an empty name and empty value, resulting in a
total size of zero bytes, which satisfies the condition to use the headroom.
Doing this in a wrapped buffer results in changing the "front" header index
and causing miscalculations on the available size and the addresses of the
next headers. This may even allow to overwrite some parts of the index,
opening the possibility to perform arbitrary writes into a 32-bit relative
address space.
This patch fixes the issue by making sure the headroom is considered only
when the buffer does not wrap, instead of relying on the zero size. This
must be backported to all versions supporting H2, which is as far as 1.8.
Many thanks to Felix Wilhelm of Google Project Zero for responsibly
reporting this problem with a reproducer and a detailed analysis.
CVE-2020-11100 was assigned to this issue.
Add the support for filters and SSL options in the CLI command
"add ssl crt-list".
The feature was implemented by applying the same parser as the crt-list
file to the payload.
The new options are passed to the command as a payload with the same
format that is supported by the crt-list file itself, so you can easily
copy a line from a file and push it via the CLI.
Example:
printf "add ssl crt-list localhost.crt-list <<\necdsa.pem [verify none allow-0rtt] localhost !www.test1.com\n\n" | socat /tmp/sock1 -
In order to reuse the crt-list line parsing in "add ssl crt-list",
the line parsing was extracted from crtlist_parse_file() to a new
function crtlist_parse_line().
This function fills a crtlist_entry structure with a bind_ssl_conf and
the list of filters.
We can't expect the DNS answer to always match the case we used for the
request, so we can't just use memcmp() to compare the DNS answer with what
we expected.
Instead, introduce dns_hostname_cmp(), which compares each string in a
case-insensitive way.
This should fix github issue #566.
This should be backported to 2.1, 2.0, 1.9 and 1.8.
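A minimal sketch of what such a comparison can look like (the real function's
signature may differ):

    #include <ctype.h>

    static int dns_hostname_cmp(const char *a, const char *b, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
                return -1;  /* mismatch */
        return 0;           /* equal, ignoring case */
    }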
This patch fixes #53 where it was noticed that when an active
server is set to DRAIN it no longer has the color blue reflected
within the stats page. This patch addresses that and adds the
color back to drain. It's to be noted that backup servers are
configured to have an orange color when they are draining.
Should be backported as far as 1.7.
The new 'add ssl crt-list' command allows to add a new crt-list entry in
a crt-list (or a directory since they are handled the same way).
The principle is basically the same as the "commit ssl cert" command with
the exception that it iterates on every bind_conf that uses the crt-list
and generates a ckch instance (ckch_inst) for each of them.
The IO handler yields every 10 bind_confs so HAProxy does not get stuck in
a too time-consuming generation if it needs to generate too many
SSL_CTX'.
This command does not handle the SNI filters and the SSL configuration
yet.
Example:
$ echo "new ssl cert foo.net.pem" | socat /tmp/sock1 -
New empty certificate store 'foo.net.pem'!
$ echo -e -n "set ssl cert foo.net.pem <<\n$(cat foo.net.pem)\n\n" | socat /tmp/sock1 -
Transaction created for certificate foo.net.pem!
$ echo "commit ssl cert foo.net.pem" | socat /tmp/sock1 -
Committing foo.net.pem
Success!
$ echo "add ssl crt-list one.crt-list foo.net.pem" | socat /tmp/sock1 -
Inserting certificate 'foo.net.pem' in crt-list 'one.crt-list'......
Success!
$ echo "show ssl crt-list one.crt-list" | socat /tmp/sock1 -
# one.crt-list
0x55d17d7be360 kikyo.pem.rsa [ssl-min-ver TLSv1.0 ssl-max-ver TLSv1.3]
0x55d17d82cb10 foo.net.pem
The crtlist_entry structure uses a pointer to the store as key.
That's a problem with the dynamic update of a certificate over the CLI,
because it allocates a new ckch_store. So updating the pointers is
needed. To achieve that, a linked list of the crtlist_entries was added in
the ckch_store, so it's easy to iterate on this list to update the
pointers. Another solution would have been to rework the system so we
don't allocate a new ckch_store, but it requires a rework of the ckch
code.
When updating a ckch_store we may want to update its pointer in the
crtlist_entry which uses it. To do this, we need the list of the entries
using the store.
The instances were wrongly inserted in the crtlist entries, all
instances of a crt-list were inserted in the last crt-list entry.
Which was kind of handy to free all instances upon error.
Now that it's done correctly, the error path was changed: it must
iterate on the entries and find the ckch_insts which were generated for
this bind_conf. To avoid wasting time, it stops the iteration once it
finds the first unsuccessful generation.
In order to be able to add new certificate in a crt-list, we need the
list of bind_conf that uses this crt-list so we can create a ckch_inst
for each of them.
The original algorithm always killed half the idle connections. This doesn't
take into account the way the load can change. Instead, we now kill half
of the exceeding connections (exceeding connections being the number of
used + idle connections past the last maximum of used connections reached).
That way if we reach a peak, we will kill much less, and it'll slowly go back
down when there's less usage.
Add a counter to know the current number of used connections, as well as the
max, this will be used later to refine the algorithm used to kill idle
connections, based on current usage.
server-template introduced the possibility to scale the
number of servers in a backend without needing a configuration change
and the associated reload. On the other hand it became impractical to
write use-server rules for these servers as they would only accept
existing server labels as argument. This patch allows the use of
log-format notation to describe the targets of use-server rules, such
as in the example below:
listen test
bind *:1234
use-server %[hdr(srv)] if { hdr(srv) -m found }
use-server s1 if { path / }
server s1 127.0.0.1:18080
server s2 127.0.0.1:18081
If a use-server rule is applied because it was conditioned by an
ACL returning true, but the target of the use-server rule cannot be
resolved, no other use-server rule is evaluated and we fall back to
load balancing.
This feature was requested on the ML, and bumped with issue #563.
Add a sample fetch for the name of a bind. This can be useful to
make decisions when the PROXY protocol is used and we can't rely on dst,
such as the sample config below.
defaults
mode http
listen bar
bind 127.0.0.1:1111
server s1 127.0.1.1:1234 send-proxy
listen foo
bind 127.0.1.1:1234 name foo accept-proxy
http-request return status 200 hdr dst %[dst] if { dst 127.0.1.1 }
First: a self-issued CA, aka root CA, is the anchor for chain validation;
there is no need to send it, the client must have it. HAProxy can skip it
in the ssl_ctx.
Second: the main motivation to skip the root CA in the ssl_ctx is to be able
to provide it in the chain without drawback. The use case is to provide the
issuer for OCSP without the need for .issuer and to be able to share it in
issuers-chain-path. This concerns all certificates without intermediate
certificates. It's useless for BoringSSL, .issuer is ignored because the
OCSP bits don't need it.
13a9232ebc introduced parsing of the
Additional DNS response section to pick up IP addresses when available.
That said, this introduced a side effect for other query types (A and
AAAA), leading those responses to be considered invalid when parsing the
Additional section.
This patch avoids this situation by ensuring the Additional section is
parsed only for SRV queries.
In h1_detach(), if our input buffer isn't empty, don't just subscribe(), we
may hold a new request, and there's nothing left to read. Instead, call
h1_process() directly, so that a new stream is created.
Failure to do so means that if we received the new request too early, the
connection will just hang, as happens when using svn.
When a "peers" section has not any local peer, it is removed of the list
of "peers" sections by check_config_validity(). But a stick-table which
refers to a "peers" section stores a pointer to this peers section.
These pointer must be reset to NULL value for each stick-table refering to
such a "peers" section to prevent stktable_init() to start the peers frontend
attached to the peers section dereferencing the invalid pointer.
Furthemore this patch stops the peers frontend as this is done for other
configurations invalidated by check_config_validity().
Thank you to Olivier D for having reported this issue with such a
simple configuration file which made haproxy crash when started with
-c option for configuration file validation.
defaults
mode http
peers mypeers
peer toto 127.0.0.1:1024
backend test
stick-table type ip size 10k expire 1h store http_req_rate(1h) peers mypeers
Must be backported to 2.1 and 2.0.
Fix an infinite loop which was added in an attempt to fix #558.
If the peers_fe is NULL, it will loop forever.
Must be backported with a2cfd7e as far as 1.8.
Tim reported that in master-worker mode, if a stick-table is declared
but not used in the configuration, its associated peers listener won't
bind.
This problem is due to the fact that the master-worker and the daemon mode
depend on the bind_proc field of the peers proxy to disable the listener.
Unfortunately the bind_proc is left to 0 if no stick-table was used in
the configuration, stopping the listener on all processes.
This fix sets the bind_proc to the first process if it wasn't
initialized.
Should fix bug #558. Should be backported as far as 1.8.
SSL_CTX_set1_chain is used for openssl >= 1.0.2 and a loop with
SSL_CTX_add_extra_chain_cert for openssl < 1.0.2.
SSL_CTX_add_extra_chain_cert exists in all openssl versions and can be
used everywhere (its newer name, since openssl 1.0.2, is
SSL_CTX_add0_chain_cert).
This patch uses SSL_CTX_add_extra_chain_cert to remove any #ifdef for
compatibility. In addition, sk_X509_num()/sk_X509_value() replace
sk_X509_shift() to extract the CAs from the chain, as is done in other
parts of the code.
This bug was introduced by 85888573 "BUG/MEDIUM: ssl: chain must be
initialized with sk_X509_new_null()". No need to set find_chain with
sk_X509_new_null(), use find_chain conditionally to fix issue #516.
This bug was referenced by issue #559.
[wla: fix some alignment/indentation issue]
In run_thread_poll_loop() we test both for (global_tasks_mask & tid_bit)
and thread_has_tasks(), but the former is useless since this test is
already part of the latter.
Commit 4b3f27b ("BUG/MINOR: haproxy/threads: try to make all threads
leave together") improved the soft-stop synchronization but it left a
small race open because it looks at tasks_run_queue, which can drop
to zero then back to one while another thread picks the task from the
run queue to insert it into the tasklet_list. The risk is very low but
not null. In addition the condition didn't consider the possible presence
of signals in the queue.
This patch moves the stopping detection just after the "wake" calculation
which already takes care of the various queues' sizes and signals. It
avoids needlessly duplicating these tests.
The bug was discovered during a code review but will probably never be
observed. This fix may be backported to 2.1 and 2.0 along with the commit
above.
In the various muxes, add a comment documenting that once
srv_add_to_idle_list() got called, any thread may pick that connection up,
so it is unsafe to access the mux context or the connection; the only thing
we can do is return.
In h1_detach(), make sure we subscribe before we call
srv_add_to_idle_list(), not after. As soon as srv_add_to_idle_list() is
called, and it is put in an idle list, another thread can take it, and
we're no longer allowed to subscribe.
This fixes a race condition where another thread grabs the connection as
soon as it is put into the idle list: the original owner would then
subscribe, and thus the new thread would fail to do so and to activate
polling.
Fix a potential NULL dereference in "show ssl cert" when we can't
allocate the <out> trash buffer.
This patch creates a new label so we could jump without trying to do the
ci_putchk in this case.
This bug was introduced by ea987ed ("MINOR: ssl/cli: 'new ssl cert'
command"). 2.2 only.
This bug was referenced by issue #556.
In connect_server(), make sure we properly free a newly created connection
if we somehow fail, and it has not yet been attached to a conn_stream, or
it would lead to a memory leak.
This should appease coverity for backend.c, as reported in issue #556.
This should be backported to 2.1, 2.0 and 1.9.
Fix a memory leak that could happen upon a "show ssl cert" if notBefore:
or notAfter: failed to extract its ASN1 string.
Introduced by d4f946c ("MINOR: ssl/cli: 'show ssl cert' give information
on the certificates"). 2.2 only.
In `crtlist_dup_filters()` add the `1` to the number of elements instead of
the size of a single element.
This bug was introduced in commit 2954c478eb,
which is 2.2+. No backport needed.
In `ckch_inst_sni_ctx_to_sni_filters` use `calloc()` to allocate the filter
array. When the function fails to allocate memory for a single entry the
whole array will be `free()`d using free_sni_filters(). With the previous
`malloc()` the pointers for entries after the failing allocation could
possibly be a garbage value.
This bug was introduced in commit 38df1c8006,
which is 2.2+. No backport needed.
In conn_backend_get(), when we manage to get an idle connection from the
current thread's pool, don't forget to decrement the idle connection
counters, or we may end up not reusing connections when we could, and/or
killing connections when we shouldn't.
In connect_server(), if we notice we have more file descriptors opened than
we should, there's no reason not to close a connection just because we're
reusing one, so do it anyway.
In connect_server(), if we no longer have any idle connections for the
current thread, attempt to use the new "takeover" mux method to steal a
connection from another thread.
This should have no impact right now, given no mux implements it.
Make the "list" element a struct mt_list, and explicitely use
list_from_mt_list to get a struct list * where it is used as such, so that
mt_list_for_each_entry will be usable with it.
Implement a new function, fd_takeover(), that lets you become the thread
responsible for the fd. On architectures that do not have a double-width CAS,
use a global rwlock.
fd_set_running() was also changed to be able to compete with fd_takeover(),
either using a double-width CAS on both running_mask and thread_mask, or
by claiming a reader on the global rwlock. This extra operation should not
have any measurable impact on modern architectures where threading is
relevant.
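As an illustration only (this is not the actual fd_takeover() code, and the
struct and field layout below are made up), a double-width CAS can check that
nobody is running on the fd while claiming both masks at once:
#include <stdint.h>
/* both masks side by side so a single 16-byte CAS can update them; requires
 * a platform supporting it (e.g. x86_64 built with -mcx16, maybe -latomic)
 */
struct fd_masks {
    uint64_t running_mask;  /* threads currently using the fd */
    uint64_t thread_mask;   /* threads allowed to use the fd  */
} __attribute__((aligned(16)));
/* succeed only if no thread runs on the fd, and atomically make <my_bit>
 * the sole owner of both masks; thread_mask is sampled non-atomically here,
 * the real code is more careful
 */
static int fd_takeover_sketch(struct fd_masks *fd, uint64_t my_bit)
{
    struct fd_masks old = { .running_mask = 0, .thread_mask = fd->thread_mask };
    struct fd_masks new = { .running_mask = my_bit, .thread_mask = my_bit };

    return __atomic_compare_exchange(fd, &old, &new, 0,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}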
Revamp the server connection lists. We now have 3 lists:
- idle_conns, which contains idling connections
- safe_conns, which contains idling connections that are safe to use even
for the first request
- available_conns, which contains connections that are not idling, but can
still accept new streams (those are HTTP/2 or fastcgi, and are always
considered safe).
Make it so sessions are not responsible for connections anymore, except for
connections that are private, and thus can't be shared, otherwise, as soon
as a request is done, the session will just add the connection to the
orphan connections pool.
This will break http-reuse safe, but it is expected to be fixed later.
The CLI command "new ssl cert" allows one to create a new certificate
store in memory. It can be filled with "set ssl cert" and "commit ssl
cert".
This patch also made a small change in "show ssl cert" to handle an
empty certificate store.
Multi-certificate bundles are not supported since they will probably be
removed soon.
This feature alone is useless since there is no way to associate the
store to a crt-list yet.
Example:
$ echo "new ssl cert foobar.pem" | socat /tmp/sock1 -
New empty certificate store 'foobar.pem'!
$ printf "set ssl cert foobar.pem <<\n$(cat localhost.pem.rsa)\n\n" | socat /tmp/sock1 -
Transaction created for certificate foobar.pem!
$ echo "commit ssl cert foobar.pem" | socat /tmp/sock1 -
Committing foobar.pem
Success!
$ echo "show ssl cert foobar.pem" | socat /tmp/sock1 -
Filename: foobar.pem
[...]
The flush_lock was introduced, mostly to be sure that pool_gc() will never
dereference a pointer that has been free'd. __pool_get_first() was acquiring
the lock too; the fear was that otherwise that pointer could get free'd later,
and then pool_gc() would attempt to dereference it. However, that can not
happen, because the only functions that can free a pointer, when using
lockless pools, are pool_gc() and pool_flush(), and as long as those two
are mutually exclusive, nobody will be able to free the pointer while
pool_gc() attempts to access it.
So change the flush_lock to a spinlock, and don't bother acquiring/releasing
it in __pool_get_first(); that way callers of __pool_get_first() won't have
to wait while the pool is flushed. The worst that can happen is we call
__pool_refill_alloc() while the pool is getting flushed, and memory can
get allocated just to be free'd.
This may help with github issue #552
This may be backported to 2.1, 2.0 and 1.9.
When running __signal_process_queue(), we ignore most signals. We can't,
however, ignore WDTSIG and DEBUGSIG, otherwise that thread may end up
waiting for another one that could hold a glibc lock, while the other thread
waits for this one to enter debug_handler().
So make sure WDTSIG and DEBUGSIG aren't ignored, if they are defined.
This probably explains the watchdog deadlock described in github issue
This should be backported to 2.1, 2.0 and 1.9.
Move the definition of WDTSIG and DEBUGSIG from wdt.c and debug.c into
types/signal.h, so that we can access them in another file.
We need those definitions to avoid blocking those signals when running
__signal_process_queue().
This should be backported to 2.1, 2.0 and 1.9.
The behavior of calloc() when being passed `0` as `nelem` is implementation
defined. It may return a NULL pointer.
Avoid this issue by checking before allocating. While doing so adjust the local
integer variables that are used to refer to memory offsets to `size_t`.
This issue was introduced in commit f91ac19299. This
patch should be backported together with that commit.
There is a memleak of the entry structure in crtlist_load_cert_dir(), in
the case we can't stat the file, or it is not a regular file. Let's
move the entry allocation so it's done after these tests.
Fix issue #551.
When tasklets were introduced, it was decided not to provide the tasklet
to the callback, but NULL instead. While it may have been reasonable back
then, maybe to be able to differentiate a task from a tasklet from the
callback, it also means that we can't access the tasklet from the handler if
the context provided can't be trusted.
As no handler is shared between a task and a tasklet, and there are now
other means of distinguishing between task and tasklet, just pass the
tasklet pointer too.
This may be backported to 2.1, 2.0 and 1.9 if needed.
A memory leak happens in an error case when ckchs_load_cert_file()
returns NULL in crtlist_parse_file().
This bug was introduced by commit 2954c47 ("MEDIUM: ssl: allow crt-list caching")
This patch fixes bug #551.
In the struct fdtab, introduce a new mask, running_mask. Each thread should
add its bit before using the fd.
Use the running_mask instead of a lock, in fd_insert/fd_delete, we'll just
spin as long as the mask is non-zero, to be sure we access the data
exclusively.
fd_set_running_excl() spins until the mask is 0, fd_set_running() just
adds the thread bit, and fd_clr_running() removes it.
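A minimal sketch of these three primitives (not the actual HAProxy code),
using GCC atomic builtins on a 64-bit mask:
#include <stdint.h>
static void fd_set_running_excl_sketch(uint64_t *running_mask, uint64_t my_bit)
{
    uint64_t expected = 0;

    /* only succeeds when the mask is 0, i.e. no other thread uses the fd */
    while (!__atomic_compare_exchange_n(running_mask, &expected, my_bit, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
        expected = 0; /* the CAS stored the current value into <expected> */
}
static void fd_set_running_sketch(uint64_t *running_mask, uint64_t my_bit)
{
    __atomic_fetch_or(running_mask, my_bit, __ATOMIC_SEQ_CST);
}
static void fd_clr_running_sketch(uint64_t *running_mask, uint64_t my_bit)
{
    __atomic_fetch_and(running_mask, ~my_bit, __ATOMIC_SEQ_CST);
}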
Implement 2 new commands on the CLI:
show ssl crt-list [<filename>]: Without a specified filename, display
the list of crt-lists used by the configuration. If a filename is
specified, it will display the content of this crt-list, with a line
identifier at the beginning of each line. This output must not be used
as a crt-list file.
dump ssl crt-list <filename>: Dump the content of a crt-list, the output
can be used as a crt-list file.
Note: It currently displays the default ssl-min-ver and ssl-max-ver
which are potentially not in the original file.
Don't bother trying to remove the connection from the idle list, as the
only connections the mux_pt handles are now the TCP-mode connections, and
those are never added to the idle list.
The agent's engine_id was not duplicated from the trash buffer, so all
engine_ids pointed to the same address "&trash.area"; the engine_id changed
at run time and caused a double free when releasing the agents and the trash.
This bug was introduced by the commit ee3bcddef ("MINOR: tools: add a generic
function to generate UUIDs").
No backport is needed, this is 2.2-dev.
The commit 6be66ec ("MINOR: ssl: directories are loaded like crt-list")
broke the directory loading of the certificates. The <crtlist> wasn't
filled by the crtlist_load_cert_dir() function. And the entries were
not correctly initialized. Leading to a segfault during startup.
Generate a directory cache with the crtlist and crtlist_entry structures.
With this new model, directories are a special case of the crt-lists.
A directory is a crt-list which allows only one occurrence of each file,
without SSL configuration (ssl_bind_conf) and without filters.
The crtlist structure defines a crt-list in the HAProxy configuration.
It contains crtlist_entry structures which are the lines in a crt-list
file.
crt-lists are now loaded in memory using crtlist and crtlist_entry
structures. The file is read only once. The generation algorithm changed
a little bit, new ckch instances are generated from the crtlist
structures, instead of being generated during the file loading.
The loading function was split in two, one that loads and caches the
crt-list and certificates, and one that looks for a crt-list and creates
the ckch instances.
Filters are also stored in crtlist_entry->filters as a char ** so we can
generate the sni_ctx again if needed. It won't be needed anymore to parse
the sni_ctx to do that.
A crtlist_entry stores the list of all ckch_inst that were generated
from this entry.
Pass a pointer to the struct ckch_inst to the ssl_sock_load_ckchs()
function so we can manipulate the ckch_inst from
ssl_sock_load_cert_list_file() and ssl_sock_load_cert().
This bug was introduced with peers.c code re-work (7d0ceeec80):
"struct peer" flags are mistakenly checked instead of
"struct peers" flags to check the resync status of the local peer.
The issue was reported here:
https://github.com/haproxy/haproxy/issues/545
This bug affects all branches >= 2.0 and should be backported.
It's more generic and versatile than the previous shut_your_big_mouth_gcc()
that was used to silence annoying warnings as it's not limited to ignoring
syscall returns only. This allows us to get rid of the aforementioned
function and the shut_your_big_mouth_gcc_int variable, that started to
look ugly in multi-threaded environments.
This message comes when we run:
haproxy -c -V -f /etc/haproxy/haproxy.cfg
[ALERT] 072/233727 (30865) : parsing [/etc/haproxy/haproxy.cfg:34] : The 'rspirep' directive is not supported anymore sionce HAProxy 2.1. Use 'http-response replace-header' instead.
[ALERT] 072/233727 (30865) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg
[ALERT] 072/233727 (30865) : Fatal errors found in configuration.
These are mostly comments in the code. A few error messages were fixed
and are of low enough importance not to deserve a backport. Some regtests
were also fixed.
This adds the missing blank lines in `make_proxy_line_v2` and
`conn_recv_proxy`. It also adjusts the type of the temporary variable
used for the return value of `recv` to be `ssize_t` instead of `int`.
This patch adds the `unique-id` option to `proxy-v2-options`. If this
option is set a unique ID will be generated based on the `unique-id-format`
while sending the proxy protocol v2 header and stored as the unique id for
the first stream of the connection.
This feature is meant to be used in `tcp` mode. It works in HTTP mode, but
might result in inconsistent unique IDs for the first request on a keep-alive
connection, because the unique ID for the first stream is generated earlier
than the others.
Now that we can send unique IDs in `tcp` mode the `%ID` log variable is made
available in TCP mode.
There's a small issue with soft stop combined with the incoming
connection load balancing. A thread may dispatch a connection to
another one at the moment stopping=1 is set, and the second one could
stop by seeing (jobs - unstoppable_jobs) == 0 in run_poll_loop(),
without ever picking these connections from the queue. This is
visible in that it may occasionally cause a connection drop on
reload since no remaining thread will ever pick that connection
anymore.
In order to address this, this patch adds a stopping_thread_mask
variable by which threads acknowledge their willingness to stop
when their runqueue is empty. And all threads will only stop at
this moment, so that if finally some late work arrives in the
thread's queue, it still has a chance to process it.
This should be backported to 2.1 and 2.0.
When stopping there is a risk that other threads are already in the
process of stopping, so let's not add new work in their queue and
instead keep the incoming connection local.
This should be backported to 2.1 and 2.0.
Surprisingly the variable was never initialized, though on most
platforms it's zeroed at boot, and it is relatively harmless
anyway since in the worst case the bits are updated around poll().
This was introduced by commit 79321b95a8 and needs to be backported
as far as 1.9.
In pool_gc(), when we're not using lockless pools, always update free_list,
and read from it the next element to free. As we now unlock the pool while
we're freeing the item, another thread could have updated free_list behind
our back. Not doing so could lead to segfaults when pool_gc() is called.
This should be backported to 2.1.
Don't assume the connection always has a valid session in "owner".
Instead, attempt to retrieve the session from the stream, and modify
the error snapshot code to not assume we always have a session, or the proxy
for the other end.
x86_64 and ARM64 do support the double-word atomic CAS. However on
ARM it must be done only on aligned data. The random generator makes
use of such double-word atomic CAS when available but didn't enforce
alignment, which causes ARM64 to crash early in the startup since
commit 52bf839 ("BUG/MEDIUM: random: implement a thread-safe and
process-safe PRNG").
This commit just unconditionally aligns the arrays. It must be
backported to all branches where the commit above is backported
(likely till 2.0).
Remove the list of private connections from server, it has been largely
unused, we only inserted connections in it, but we would never actually
use it.
When a maximum memory setting is passed to haproxy and maxconn is not set
and ulimit-n is not set, it is expected that maxconn will be set to the
highest value permitted by this memory setting, possibly affecting the
FD limit.
When maxconn was changed to be deduced from the current process's FD limit,
the automatic setting above was partially lost because it now remains
limited to the current FD limit in addition to being limited to the
memory usage. For unprivileged processes it does not change anything,
but for privileged processes the difference is important. Indeed, the
previous behavior ensured that the new FD limit could be enforced on
the process as long as the user had the privilege to do so. Now this
does not happen anymore, and some people rely on this for automatic
sizing in VM environments.
This patch implements the ability to verify if the setting will be
enforceable on the process or not. First it computes maxconn based on
the memory limits alone, then checks if the process is willing to accept
them, otherwise tries again by respecting the process' hard limit.
Thanks to this we now have the best of the pre-2.0 behavior and the
current one, in that privileged users will be able to get as high a
maxconn as they need just based on the memory limit, while unprivileged
users will still get as high a setting as permitted by the intersection
of the memory limit and the process' FD limit.
Ideally, after some observation period, this patch along with the
previous one "MINOR: init: move the maxsock calculation code to
compute_ideal_maxsock()" should be backported to 2.1 and 2.0.
Thanks to Baptiste for raising the issue.
The maxsock value is currently derived from global.maxconn and a few other
settings, some of which also depend on global.maxconn. This makes it
difficult to check if a limit is already too high or not during the maxconn
automatic sizing.
Let's move this code into a new function, compute_ideal_maxsock() which now
takes a maxconn in argument. It performs the same operations and returns
the maxsock value if global.maxconn were to be set to that value. It now
replaces the previous code to compute maxsock.
When calling MT_LIST_DEL_SAFE(), give it the temporary pointer "tmpelt",
as that's what is expected. We want to be able to set that pointer to NULL,
to let other parts of the code know we deleted an element.
In order to store and cache the directory loading, the directory loading
was separated from ssl_sock_load_cert() and put in a new function
ssl_sock_load_cert_dir() to be more readable.
This patch only splits the function in two.
SI_TKILL is not necessarily defined on older systems and is used only
with the pthread_kill() call a few lines below, so it should also be
subject to the USE_THREAD condition.
Technically speaking the call was implemented in glibc 2.3 so we must
rely on this and not on __USE_GNU which is an internal define of glibc
to track use of GNU_SOURCE.
The splice() syscall has been supported in glibc since version 2.5 issued
in 2006 and is present on supported systems so there's no need for having
our own arch-specific syscall definitions anymore.
This was made to support epoll on patched 2.4 kernels, and on early 2.6
using alternative libcs thanks to the arch-specific syscall definitions.
All the features we support have been around since 2.6.2 and present in
glibc since 2.3.2, neither of which are found in the field anymore. Let's
simply drop this and use epoll normally.
The accept4() syscall has been present for a while now, there is no more
reason for maintaining our own arch-specific syscall implementation for
systems lacking it in libc but having it in the kernel.
This was introduced 10 years ago to squeeze a few CPU cycles per syscall
on 32-bit x86 machines and was already quite old by then, requiring to
explicitly enable support for this in the kernel. We don't even know if
it still builds, let alone if it works at all on recent kernels! Let's
completely drop this now.
Since commit 244b070 ("MINOR: ssl/cli: support crt-list filters"),
HAProxy generates a list of filters based on the sni_ctx in memory.
However it's not always relevant: sometimes no filters were configured
and the CN/SAN in the new certificate are not the same.
This patch fixes the issue by using a "filters" flag in the ckch_inst, so
we are able to know if there were filters or not. In the latter case it
uses the CN/SAN of the new certificate to generate the sni_ctx.
note: filters are still only used in the crt-list atm.
We currently have two UUID generation functions, one for the sample
fetch and the other one in the SPOE filter. Both were a bit complicated
since they were made to support random() implementations returning an
arbitrary number of bits, and were throwing away 33 bits every 64. Now
we don't need this anymore, so let's have a generic function consuming
64 bits at once and use it as appropriate.
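A minimal sketch of such a generator (hypothetical names; rnd64 stands for
whatever 64-bit source is used), building an RFC 4122 version-4 UUID from
two 64-bit randoms:
#include <stdint.h>
#include <stdio.h>
static void gen_uuid_sketch(char *dst, size_t len, uint64_t (*rnd64)(void))
{
    uint64_t a = rnd64();
    uint64_t b = rnd64();

    snprintf(dst, len, "%8.8x-%4.4x-%4.4x-%4.4x-%12.12llx",
             (unsigned int)(a >> 32),
             (unsigned int)(a >> 16) & 0xFFFF,
             ((unsigned int)a & 0x0FFF) | 0x4000,          /* version 4  */
             ((unsigned int)(b >> 48) & 0x3FFF) | 0x8000,  /* variant 10 */
             (unsigned long long)(b & 0xFFFFFFFFFFFFULL));
}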
The rand() sample fetch supports being limited to a certain range, but
it only uses 31 bits and scales them as requested, which means that when
the requested output range is larger than 31 bits, the least significant
one is not random and may even be constant.
Let's make use of the whole 32 bits now that we have access to them.
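The scaling itself boils down to a multiply-and-shift; a minimal sketch
(hypothetical name, not necessarily the exact code):
#include <stdint.h>
/* scale a full 32-bit random value to [0, range) without dropping the
 * low-order bits, by multiplying and keeping the high word
 */
static uint32_t rand_in_range_sketch(uint32_t rnd, uint32_t range)
{
    return (uint32_t)(((uint64_t)rnd * range) >> 32);
}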
In order to honor spread_checks we currently call rand() which is not
thread safe and which must never turn its internal state to zero. This
is not thread safe, let's use ha_random() instead. This is a complement
to commit 52bf839394 ("BUG/MEDIUM: random: implement a thread-safe and
process-safe PRNG") and may be backported with it.
For the random LB algorithm we need a random 32-bit hashing key that used
to be made of two calls to random(). Now we can simply perform a single
call to ha_random32() and get rid of the useless operations.
This is the replacement of failed attempt to add thread safety and
per-process sequences of random numbers initially tried with commit
1c306aa84d ("BUG/MEDIUM: random: implement per-thread and per-process
random sequences").
This new version takes a completely different approach and doesn't try
to work around the horrible OS-specific and non-portable random API
anymore. Instead it implements "xoroshiro128**", a reputedly high
quality random number generator, which is one of the many variants of
xorshift, which passes all quality tests and which is described here:
http://prng.di.unimi.it/
While not cryptographically secure, it is fast and features a 2^128-1
period. It supports fast jumps allowing to cut the period into smaller
non-overlapping sequences, which we use here to support up to 2^32
processes each having their own, non-overlapping sequence of 2^96
numbers (~7*10^28). This is enough to provide 1 billion randoms per
second and per process for 2200 billion years.
The implementation was made thread-safe either by using a double 64-bit
CAS on platforms supporting it (x86_64, aarch64) or by using a local
lock for the time needed to perform the shift operations. This ensures
that all threads pick numbers from the same pool so that it is not
needed to assign per-thread ranges. For processes we use the fast jump
method to advance the sequence by 2^96 for each process.
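For reference, the core step of xoroshiro128** follows the public reference
implementation; a sketch (the 128-bit state must be seeded to anything but
all-zero, and thread-safety is omitted here):
#include <stdint.h>
static uint64_t xo_state[2];
static inline uint64_t rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}
static uint64_t xoroshiro128ss_next(void)
{
    const uint64_t s0 = xo_state[0];
    uint64_t s1 = xo_state[1];
    const uint64_t result = rotl64(s0 * 5, 7) * 9;

    s1 ^= s0;
    xo_state[0] = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
    xo_state[1] = rotl64(s1, 37);
    return result;
}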
Before this patch, the following config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout format raw daemon
log-format "%[uuid] %pid"
redirect location /
Would produce this output:
a4d0ad64-2645-4b74-b894-48acce0669af 12987
a4d0ad64-2645-4b74-b894-48acce0669af 12992
a4d0ad64-2645-4b74-b894-48acce0669af 12986
a4d0ad64-2645-4b74-b894-48acce0669af 12988
a4d0ad64-2645-4b74-b894-48acce0669af 12991
a4d0ad64-2645-4b74-b894-48acce0669af 12989
a4d0ad64-2645-4b74-b894-48acce0669af 12990
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12987
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12992
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12986
(...)
And now produces:
f94b29b3-da74-4e03-a0c5-a532c635bad9 13011
47470c02-4862-4c33-80e7-a952899570e5 13014
86332123-539a-47bf-853f-8c8ea8b2a2b5 13013
8f9efa99-3143-47b2-83cf-d618c8dea711 13012
3cc0f5c7-d790-496b-8d39-bec77647af5b 13015
3ec64915-8f95-4374-9e66-e777dc8791e0 13009
0f9bf894-dcde-408c-b094-6e0bb3255452 13011
49c7bfde-3ffb-40e9-9a8d-8084d650ed8f 13014
e23f6f2e-35c5-4433-a294-b790ab902653 13012
There are multiple benefits to using this method. First, it doesn't
depend anymore on a non-portable API. Second it's thread safe. Third it
is fast and more proven than any hack we could attempt to try to work
around the deficiencies of the various implementations around.
This commit depends on previous patches "MINOR: tools: add 64-bit rotate
operators" and "BUG/MEDIUM: random: initialize the random pool a bit
better", all of which will need to be backported at least as far as
version 2.0. It doesn't require to backport the build fixes for circular
include files dependency anymore.
This reverts commit 1c306aa84d.
It breaks the build on all non-glibc platforms. I got confused by the
man page (which possibly is the most confusing man page I've ever read
about a standard libc function) and mistakenly understood that random_r
was portable, especially since it appears in latest freebsd source as
well but not in released versions, and with a slightly different API :-/
We need to find a different solution with a fallback. Among the
possibilities, we may reintroduce this one with a fallback relying on
locking around the standard functions, keeping fingers crossed for no
other library function to call them in parallel, or we may also provide
our own PRNG, which is not necessarily more difficult than working
around the totally broken up design of the portable API.
As mentioned in previous patch, the random number generator was never
made thread-safe, which used not to be a problem for health checks
spreading, until the uuid sample fetch function appeared. Currently
it is possible for two threads or processes to produce exactly the
same UUID. In fact it's extremely likely that this will happen for
processes, as can be seen with this config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout daemon format raw
log-format "%[uuid] %pid"
redirect location /
It typically produces this log:
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30645
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30641
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30644
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30639
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30646
07764439-c24d-4e6f-a5a6-0138be59e7a8 30645
07764439-c24d-4e6f-a5a6-0138be59e7a8 30639
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30643
07764439-c24d-4e6f-a5a6-0138be59e7a8 30646
b6773fdd-678f-4d04-96f2-4fb11ad15d6b 30646
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30642
07764439-c24d-4e6f-a5a6-0138be59e7a8 30642
What this patch does is to use a distinct per-thread and per-process
seed to make sure the same sequences will not appear, and will then
extend these seeds by "burning" a number of randoms that depends on
the global random seed, the thread ID and the process ID. This adds
roughly 20 extra bits of randomness, resulting in 52 bits total per
thread and per process.
It only takes a few milliseconds to burn these randoms and given
that threads start with a different seed, we know they will not
catch each other. So these random extra bits are essentially added
to ensure randomness between boots and cluster instances.
This replaces all uses of random() with ha_random() which uses the
thread-local state.
This must be backported as far as 2.0 or any version having the
UUID sample-fetch function since it's the main victim here.
It's important to note that this patch, in addition to depending on
the previous one "BUG/MEDIUM: init: initialize the random pool a bit
better", also depends on the preceeding build fixes to address a
circular dependency issue in the include files that prevented it
from building. Part or all of these patches may need to be backported
or adapted as well.
Since the UUID sample fetch was created, some people noticed that in
certain virtualized environments they manage to get exact same UUIDs
on different instances started exactly at the same moment. It turns
out that the randoms were only initialized to spread the health checks
originally, not to provide "clean" randoms.
This patch changes this and collects more randomness from various
sources, including existing randoms, /dev/urandom when available,
RAND_bytes() when OpenSSL is available, as well as the timing for such
operations, then applies a SHA1 on all this to keep a 160-bit random
seed available, 32 bits of which are passed to srandom().
It's worth mentioning that there's no clean way to pass more than 32
bits to srandom() as even initstate() provides an opaque state that
must absolutely not be tampered with since known implementations
contain state information.
At least this allows to have up to 4 billion different sequences
from the boot, which is not that bad.
Note that the thread safety was still not addressed, which is another
issue for another patch.
This must be backported to all versions containing the UUID sample
fetch function, i.e. as far as 2.0.
In the same way as for the request, when a redirect rule is applied the
transaction is aborted. This must be done by returning HTTP_RULE_RES_ABRT from
http_res_get_intercept_rule() function.
No backport needed because on previous versions, the action return values are
not handled the same way.
Backend counters must be incremented only if a backend was already assigned to
the stream (when the stream exists). Otherwise, it means we are still on the
frontend side.
This patch may be backported as far as 1.6.
When an action interrupts a transaction, returning a response or not, it must
return the ACT_RET_ABRT value and not ACT_RET_STOP. ACT_RET_STOP is reserved to
stop the processing of the current ruleset.
No backport needed because on previous versions, the action return values are
not handled the same way.
When at least a filter is attached to a stream, FLT_END analyzers must be
preserved on request and response channels.
This patch should be backported as far as 1.7.
Since the HTX mode is the only mode to process HTTP messages, the stream is
created for a unique transaction. The keep-alive is handled at the mux level.
So, the compression filter can be initialized when the stream is created and
released with the stream. Concretely, the .channel_start_analyze and
.channel_end_analyze callback functions are replaced by .attach and .detach
ones.
With this change, it is no longer necessary to call FLT_START_FE/BE and FLT_END
analysers for the compression filter.
Since the HTX mode is the only mode to process HTTP messages, the stream is
created for a unique transaction. The keep-alive is handled at the mux level.
So, the cache filter can be initialized when the stream is created and
released with the stream. Concretely, the .channel_start_analyze and
.channel_end_analyze callback functions are replaced by .attach and .detach
ones.
With this change, it is no longer necessary to call FLT_START_FE/BE and FLT_END
analysers for the cache filter.
A typo was introduced by the commit c5bb5a0f2 ("BUG/MINOR: http-rules: Preserve
FLT_END analyzers on reject action").
This patch must be backported with the commit c5bb5a0f2.
When at least a filter is attached to a stream, FLT_END analyzers must be
preserved on request and response channels.
This patch should be backported as far as 1.8.
When an action interrupts a transaction, returning a response or not, it must
return the ACT_RET_ABRT value and not ACT_RET_DONE. ACT_RET_DONE is reserved to
stop the processing on the current channel but some analysers may still be
active. When ACT_RET_ABRT is returned, all analysers are removed, except FLT_END
if it is set.
No backport needed because on previous versions, the action return value was not
handled the same way.
It is stated in the comment that the return action returns ACT_RET_ABRT on
success. It is the right code to use to abort a transaction. ACT_RET_DONE must
be used when the message processing must be stopped. This does not mean the
transaction is interrupted.
No backport needed.
The wake_time of a lua context is now always set to TICK_ETERNITY when the
context is initialized and every time the execution of the lua stack is
started. It is mandatory not to set an arbitrary wake_time when an action
yields.
No backport needed.
This flag was used in some internal functions to be sure the current stream is
able to handle HTTP content. It was introduced when the legacy HTTP code was
still there. Now, it is possible to rely on stream's flags to be sure we have an
HTX stream.
So the flag HLUA_TXN_HTTP_RDY can be removed. Everywhere it was tested, it is
replaced by a call to the IS_HTX_STRM() macro.
This patch is mandatory to allow the support of the filters written in lua.
In these functions, the lua txn was not used. So it can be removed from the
function argument list.
This patch is mandatory to allow the support of the filters written in lua.
In this function, the lua txn was only used to retrieve the stream. But it can
be retrieved from the HTTP message, using its channel pointer. So, the lua txn can
be removed from the function argument list.
This patch is mandatory to allow the support of the filters written in lua.
In this function, the lua txn was only used to test if the HTTP transaction is
defined. But it is always used in a context where it is true. So, the lua txn
can be removed from the function argument list.
This patch is mandatory to allow the support of the filters written in lua.
The Lua function Channel.is_full() should not take care of the reserve because
it is not called from a producer (an applet for instance). From an action, it is
allowed to overwrite the buffer reserve.
This patch should be backported as far as 1.7. But it must be adapted for 1.8
and lower because there is no HTX on these versions.
When a lua action aborts a transaction by calling the txn:done() function, the action
must return ACT_RET_ABRT instead of ACT_RET_DONE. It is mandatory to
abort the message analysis.
This patch must be backported everywhere the commit 7716cdf45 ("MINOR: lua: Get
the action return code on the stack when an action finishes") was
backported. For now, no backport needed.
When an error occurred on the response side, request analysers must be reset. At
this stage, only AN_REQ_HTTP_XFER_BODY analyser remains, and possibly
AN_REQ_FLT_END, if at least one filter is attached to the stream. So it is safe
to remove the AN_REQ_HTTP_XFER_BODY analyser. An error was already handled and a
response was already returned to the client (or it was at least scheduled to be
sent). So there is no reason to continue to process the request payload. It may
cause some troubles for the filters because when an error occurred, data from
the request buffer are truncated.
This patch must be backported as far as 1.9, for the HTX part only. I don't know
if the legacy HTTP code is affected.
During the payload filtering, the offset is relative to the head of the HTX
message and not its first index. This index is the position of the first block
to (re)start the HTTP analysis. It must be used during HTTP analysis but not
during the payload forwarding.
So, from the compression filter point of view, when we loop on the HTX blocks to
compress the response payload, we must start from the head of the HTX
message. To ease the loop, we use the function htx_find_offset().
This patch must be backported as far as 2.0. It depends on the commit "MINOR:
htx: Add a function to return a block at a specific offset". So this one must
be backported first.
During the payload filtering, the offset is relative to the head of the HTX
message and not its first index. This index is the position of the first block
to (re)start the HTTP analysis. It must be used during HTTP analysis but not
during the payload forwarding.
So, from the cache point of view, when we loop on the HTX blocks to cache the
response payload, we must start from the head of the HTX message. To ease the
loop, we use the function htx_find_offset().
This patch must be backported as far as 2.0. It depends on the commit "MINOR:
htx: Add a function to return a block at a specific offset". So this one must
be backported first.
If a filter enables data filtering, in TCP or in HTTP, but does not
define the corresponding callback function (so http_payload() or
tcp_payload()), it will be ignored. If all configured data filters do the same,
we must be sure to forward everything. Otherwise nothing will be forwarded at
all.
This patch must be backported as far as 1.9.
When the tcp or http payload is filtered, it is important to use the filter
offset to deduce the amount of forwarded data because this offset may change
during the call to the callback function. So we should not rely on a local
variable defined before this call.
For now, existing HAProxy filters don't change this offset, so this bug may only
affect external filters.
This patch must be backported as far as 1.9.
The htx_find_offset() function may be used to look for a block at a specific
offset in an HTX message, starting from the message head. A compound result is
returned, an htx_ret structure, with the found block and the position of the
offset in the block. If the offset is outside of the HTX message, the returned
block is NULL.
This patch fixes PROXYv2 parsing when the payload of the TCP connection is
fused with the PROXYv2 header within a single recv() call.
Previously HAProxy ignored the PROXYv2 header length when attempting to
parse the TLV, possibly interpreting the first byte of the payload as a
TLV type.
This patch adds proper validation. It ensures that:
1. TLV parsing stops when the end of the PROXYv2 header is reached.
2. TLV lengths cannot exceed the PROXYv2 header length.
3. The PROXYv2 header ends together with the last TLV, not allowing for
"stray bytes" to be ignored.
A reg-test was added to ensure proper behavior.
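A minimal sketch of the bounds checks listed above (not HAProxy's actual
parser): walk the TLVs that follow the PROXYv2 address block, where
<tlv_area_len> is the number of bytes remaining in the announced header:
#include <stddef.h>
#include <stdint.h>
static int pp2_tlvs_ok_sketch(const uint8_t *tlvs, size_t tlv_area_len)
{
    size_t ofs = 0;

    while (ofs + 3 <= tlv_area_len) {
        /* 1 byte of type followed by a 2-byte big-endian length */
        size_t vlen = ((size_t)tlvs[ofs + 1] << 8) | tlvs[ofs + 2];

        if (ofs + 3 + vlen > tlv_area_len)
            return 0; /* TLV value overflows the announced header */
        ofs += 3 + vlen;
    }
    /* the last TLV must end exactly with the header: no stray bytes */
    return ofs == tlv_area_len;
}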
This patch tries to find the sweet spot between a small and easily
backportable one, and a cleaner one that's more easily adaptable to
older versions, hence why it merges the "if" and "while" blocks which
causes a reindent of the whole block. It should be used as-is for
versions 1.9 to 2.1, the block about PP2_TYPE_AUTHORITY should be
dropped for 2.0 and the block about CRC32C should be dropped for 1.8.
This bug was introduced when TLV parsing was added. This happened in commit
b3e54fe387. This commit was first released
with HAProxy 1.6-dev1.
A similar issue was fixed in commit 7209c204bd.
This patch must be backported to HAProxy 1.6+.
James Stroehmann reported something working as documented but that can
be considered as a regression in the way the automatic maxconn is
calculated from the process' limits :
https://www.mail-archive.com/haproxy@formilux.org/msg36523.html
The purpose of the changes in 2.0 was to have maxconn default to the
highest possible value permitted to the user based on the ulimit -n
setting, however the calculation starts from the soft limit, which
can be lower than what users were allowed to with previous versions
where the default value of 2000 would force a higher ulimit -n as
long as it fitted in the hard limit.
Usually this is not noticeable if the user changes the limits, because
quite commonly setting a new value restricts both the soft and hard
values.
Let's instead always use the max between the hard and soft limits, as
we know these values are permitted. This was tried on the following
setup:
$ cat ulimit-n.cfg
global
stats socket /tmp/sock1 level admin
$ ulimit -n
1024
Before the change the limits would show like this:
$ socat - /tmp/sock1 <<< "show info" | grep -im2 ^Max
Maxsock: 1023
Maxconn: 489
After the change the limits are now much better and more in line with
the default settings in earlier versions:
$ socat - /tmp/sock1 <<< "show info" | grep -im2 ^Max
Maxsock: 4095
Maxconn: 2025
The difference becomes even more obvious when running moderately large
configs with hundreds of checked servers and hundreds of listeners:
$ cat ulimit-n.cfg
global
stats socket /tmp/sock1 level admin
listen l
bind :10000-10300
server-template srv- 300 0.0.0.0 check disabled
Before After
Maxsock 1024 4096
Maxconn 189 1725
This issue is tagged as minor since a trivial config change fixes it,
but it would help new users to have it backported as far as 2.0.
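A minimal sketch of the idea (not HAProxy's actual code): start the
automatic maxconn computation from the best of the soft and hard
RLIMIT_NOFILE values, since raising the soft limit up to the hard one is
always permitted:
#include <sys/resource.h>
static unsigned long usable_fd_limit_sketch(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return 0;
    if (lim.rlim_max != RLIM_INFINITY && lim.rlim_max > lim.rlim_cur)
        return (unsigned long)lim.rlim_max;
    return (unsigned long)lim.rlim_cur;
}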
pattern_finalize_config() uses an inefficient algorithm which is a
problem with very large configuration files. This affects startup, and
therefore reload time. When haproxy is deployed as a router in a
Kubernetes cluster the generated configuration file may be large and
reloads are frequently occurring, which makes this a significant issue.
The old algorithm is O(n^2)
* allocate missing uids - O(n^2)
* sort linked list - O(n^2)
The new algorithm is O(n log n):
* find the user allocated uids - O(n)
* store them for efficient lookup - O(n log n)
* allocate missing uids - n times O(log n)
* sort all uids - O(n log n)
* convert back to linked list - O(n)
Performance examples, startup time in seconds:
pat_refs old new
1000 0.02 0.01
10000 2.1 0.04
20000 12.3 0.07
30000 27.9 0.10
40000 52.5 0.14
50000 77.5 0.17
Please backport to 1.8, 2.0 and 2.1.
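A minimal sketch of the lookup structure behind the new approach (generic
code, not HAProxy's): once the user-assigned uids are sorted, checking
whether a candidate uid is free becomes a binary search, so allocating all
missing uids costs O(n log n) instead of rescanning the list each time:
#include <stdlib.h>
static int cmp_uid(const void *a, const void *b)
{
    unsigned int ua = *(const unsigned int *)a;
    unsigned int ub = *(const unsigned int *)b;

    return (ua > ub) - (ua < ub);
}
/* <sorted> must have been sorted once with qsort(sorted, n, sizeof(*sorted), cmp_uid) */
static int uid_is_used(const unsigned int *sorted, size_t n, unsigned int uid)
{
    return bsearch(&uid, sorted, n, sizeof(*sorted), cmp_uid) != NULL;
}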
This command is used to produce an arbitrary amount of data on the
output. It can be used to test the CLI's state machine as well as
the internal parts related to applets and I/O. A typical test consists
in asking for all sizes from 0 to 16384:
$ (echo "prompt;expert-mode on";for i in {0..16384}; do
echo "debug dev write $i"; done) | socat - /tmp/sock1 | wc -c
134258738
A better test would consist in first waiting for the response before
sending a new request.
This command is not restricted to the admin since it's harmless.
There's a build error reported here:
c9c6cdbf9c/checks
It's just caused by an unconditional assignment of tmp_filter to
*sni_filter without having been initialized, though it's harmless because
this return pointer is not used when fcount is NULL, which is the only
case where this happens.
No backport is needed as this was brought today by commit 38df1c8006
("MINOR: ssl/cli: support crt-list filters").
It was only possible to go down from the ckch_store to the sni_ctx but
not to go up from the sni_ctx to the ckch_store.
To allow that, 2 pointers were added:
- a ckch_inst pointer in the struct sni_ctx
- a ckch_store pointer in the struct ckch_inst
Generate a list of the previous filters when updating a certificate
which uses filters in a crt-list. Then pass this list to the function
generating the sni_ctx during the commit.
This feature allows the update of the crt-list certificates which use
filters with "set ssl cert".
This function could probably be replaced by creating a new
ckch_inst_new_load_store() function which takes the previous sni_ctx list as
an argument instead of the char **sni_filter, avoiding the
allocation/copy during runtime for each filter. But since we are still
handling the multi-cert bundles, it's better this way to avoid code
duplication.
When building with DEBUG_STRICT, there are still some BUG_ON(events&event_type)
in the subscribe() code which are not welcome anymore since we explicitly
permit to wake the caller up on readiness. This causes some regtests to fail
since 2c1f37d353 ("OPTIM: mux-h1: subscribe rather than waking up at a few
other places") when built with this option.
No backport is needed, this is 2.2-dev.
This is another round of conversion from a blind tasklet_wakeup() to
a more careful subscribe(). It has significantly improved the number
of function calls per HTTP request (/?s=1k/t=20) :
before after
tasklet_wakeup: 3 2
conn_subscribe: 3 2
h1_iocb: 3 2
h1_process: 3 2
h1_parse_msg_hdrs: 4 3
h1_rcv_buf: 5 3
h1_send: 5 4
h1_subscribe: 2 1
h1_wake_stream_for_send: 5 4
http_wait_for_request: 2 1
process_stream: 3 2
si_cs_io_cb: 4 2
si_cs_process: 3 1
si_cs_rcv: 5 3
si_sync_send: 2 1
si_update_both: 2 1
stream_int_chk_rcv_conn: 3 2
stream_int_notify: 3 1
stream_release_buffers: 9 4
In order to save a lot on syscalls, we currently don't disable receiving
on a file descriptor anymore if its handler was already woken up. But if
the run queue is huge and the poller collects a lot of events, this causes
excess wakeups which take CPU time which is not used to flush these tasklets.
This patch simply considers the run queue size to decide whether or not to
stop receiving. Tests show that by stopping receiving when the run queue
reaches ~16 times its configured size, we can still hold maximal performance
in extreme situations like maxpollevents=20k for runqueue_depth=2, and still
totally avoid calling epoll_event under moderate load using default settings
on keep-alive connections.
We used to call ->wake() for any I/O event for which there was no
subscriber. But this is a problem because this causes massive wake()
storms since we disabled fd_stop_recv() to save syscalls.
The only reason for the io_available condition is to detect that an
asynchronous connect() just finished and will not be handled by any
registered event handler. Since we now properly handle synchronous
connects, we can detect this situation by the fact that we had a
success on conn_fd_check() and no requested I/O took over.
In the rare case of immediate connect() (unix sockets, socket pairs, and
occasionally TCP over the loopback), it is counter-productive to subscribe
for sending and then getting immediately back to process_stream() after
having passed through si_cs_process() just to update the connection. We
already know it is established and it doesn't have any handshake anymore
so we just have to complete it and return to process_stream() with the
stream_interface in the SI_ST_RDY state. In this case, process_stream will
simply loop back to the beginning to synchronize the state and turn it to
SI_ST_EST/ASS/CLO/TAR etc.
This will save us from having to needlessly subscribe in the connect()
code, something which in addition cannot work with edge-triggered pollers.
With commit 065a025610 ("MEDIUM: connection: don't stop receiving events
in the FD handler") we disabled a number of fd_stop_* in conn_fd_handler(),
in order to wait for their respective handlers to deal with them. But it
is not correct to do that for the send direction, as we may very well
have nothing to send. This is visible when connecting in TCP mode to
a server with no data to send, there's nobody anymore to disable the
polling for the send direction.
And it is logical, on the recv() path we know the system has data to
deliver and that some code will be in charge of it. On the send
direction we simply don't know if it was the result of a successful
connect() or if there is still something to send. In any case we
almost never fill the network buffer on a single send() after being
woken up by the system, so disabling the FD immediately or much later
will not change the number of operations.
No backport is needed, this is 2.2-dev.
The approach was wrong. USE_DL is for the makefile to know if it's required
to append -ldl at link time. Some platforms do not need it (and in fact do
not have it) yet they have a working dladdr(). The real condition is related
to ELF. Given that due to Lua, all platforms that require -ldl already have
USE_DL set, let's replace USE_DL with __ELF__ here and consider that dladdr
is always needed on ELF, which basically is already the case.
resolve_sym_name() doesn't build when USE_DL is set on non-GNU platforms
because "Elf(W)" isn't defined. Since it's only used for dladdr1(), let's
refactor all this so that we can completely ifdef out that part on other
platforms. Now we have a separate function to perform the call depending
on the platform and it only returns the size when available.
Instead of special-casing the use of the symbol resolving to decide
whether to dump a partial or complete trace, let's simply start over
and dump everything when we reach the end after having found nothing.
It will be more robust against dirty traces as well.
It happens that on aarch64 backtrace() only returns one entry (tested
with gcc 4.7.4, 5.5.0 and 7.4.1). It probably refrains from unwinding
the stack due to the risk of hitting a bad pointer. Here we can use
may_access() to know when it's safe, so we can actually unwind the stack
without taking risks. It happens that the faulting function (the one
just after the signal handler) is not listed here, very likely because
the signal handler uses a special stack and did not create a new frame.
So this patch creates a new my_backtrace() function in standard.h that
either calls backtrace() or does its own unrolling. The choice depends
on HA_HAVE_WORKING_BACKTRACE which is set in compat.h based on the build
target.
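As a simple illustration of the easy case (not the actual my_backtrace()
code), the glibc flavour boils down to:
#include <execinfo.h>
#include <unistd.h>
static void dump_backtrace_sketch(void)
{
    void *callers[64];
    int depth = backtrace(callers, 64);

    /* writes one line per frame; resolution quality depends on -rdynamic */
    backtrace_symbols_fd(callers, depth, STDERR_FILENO);
}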
It's useful to get an indication of unresolved stuff or memory
corruption to have the apparent depth of the stack trace in the
output, especially if we dump nothing.
At least FreeBSD has a fully functional CLOCK_THREAD_CPUTIME but it
cannot create a timer on it. This is not a problem since our timer is
only used to measure each thread's usage using now_cpu_time_thread().
So by just replacing this clock with CLOCK_REALTIME we allow such
platforms to periodically call the wdt and check the thread's CPU usage.
The consequence is that even on a totally idle system there will still
be a few extra periodic wakeups, but the watchdog becomes usable there
as well.
On operating systems not supporting to create a timer on
POSIX_THREAD_CPUTIME we emit a warning but we return an error so the
process fails to start, which is absurd. Let's return a success once
the warning is emitted instead.
This may be backported to 2.1 and 2.0.
It's only available on the bind line. "ca-verify-file" allows to separate
CA certificates from "ca-file". CA names sent in the server hello message are
only computed from "ca-file". Typically, "ca-file" must be defined with
intermediate certificates and "ca-verify-file" with certificates ending
the chain, like the root CA.
Fix issue #404.
Calling backtrace() will access libgcc at runtime. We don't want to do
it after the chroot, so let's perform a first call to have it ready in
memory for later use.
When a panic() occurs due to a stuck thread, we'll try to dump a
backtrace of this thread if the config directive USE_BACKTRACE is
set (which is the case on linux+glibc). For this we use the
backtrace() call provided by glibc and iterate the pointers through
resolve_sym_name(). In order to minimize the output (which is limited
to one buffer), we only do this for stuck threads, and we start the
dump above ha_panic()/ha_thread_dump_all_to_trash(), and stop when
meeting known points such as main/run_tasks_from_list/run_poll_loop.
If enabled without USE_DL, the dump will be complete with no details
except that pointers will all be given relative to main, which is
still better than nothing.
The new USE_BACKTRACE config option is enabled by default on glibc since
it has been present for ages. When it is set, the export-dynamic linker
option is enabled so that all non-static symbols are properly resolved.
Now in "show threads", the task/tasklet handler will be resolved
using this function, which will provide more detailed results and
will still support offsets to main for unresolved symbols.
We use various hacks at a few places to try to identify known function
pointers in debugging outputs (show threads & show fd). Let's centralize
this into a new function dedicated to this. It already knows about the
functions matched by "show threads" and "show fd", and when built with
USE_DL, it can rely on dladdr1() to resolve other functions. There are
some limitations, as static functions are not resolved, linking with
-rdynamic is mandatory, and even then some functions will not necessarily
appear. It's possible to do a better job by rebuilding the whole symbol
table from the ELF headers in memory but it's less portable and the gains
are still limited, so this solution remains a reasonable tradeoff.
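As an illustrative sketch of the dladdr() part (not the actual
resolve_sym_name() code, which also knows the "show threads"/"show fd"
handlers and uses the dladdr1() variant), resolution with a fallback to an
offset from main() could look like:
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    int main(int argc, char **argv);

    /* Minimal sketch: resolve <addr> with dladdr() when built with
     * -rdynamic, otherwise print an offset relative to main(). */
    static void print_sym(FILE *out, const void *addr)
    {
            Dl_info info;

            if (dladdr(addr, &info) && info.dli_sname)
                    fprintf(out, "%s+%#lx", info.dli_sname,
                            (unsigned long)((const char *)addr - (const char *)info.dli_saddr));
            else
                    fprintf(out, "main%+ld", (long)((const char *)addr - (const char *)main));
    }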
This function dumps <n> bytes from <addr> in hex form into buffer <buf>
enclosed in brackets after the address itself, formatted on 14 chars
including the "0x" prefix. This is meant to be used as a prefix for code
areas. For example: "0x7f10b6557690 [48 c7 c0 0f 00 00 00 0f]: "
It relies on may_access() to know if the bytes are dumpable, otherwise "--"
is emitted. An optional prefix is supported.
Since commit 4c2ae48375 ("MINOR: trace: implement a very basic trace()
function") merged in 2.1, trace() is an inline function. It must not
appear in standard.c anymore and may break build depending on includes.
This can be backported to 2.1.
It happens that just sending the debug signal to the process makes one
thread wait for its turn while nobody wants to dump. We need to at
least verify that a dump was really requested for this thread.
This can be backported to 2.1 and 2.0.
Often in crash dumps we see unknown function pointers. Let's display
them relative to main, which helps quite a lot to figure out the function
from an executable, for example:
(gdb) x/a main+645360
0x4c56a0 <h1_timeout_task>: 0x2e6666666666feeb
This could be backported to 2.0.
In commit 4eee38a ("BUILD/MINOR: tools: fix build warning in the date
conversion functions") we added some return checks to shut build
warnings but the last test is useless since the tested pointer is not
updated by the last call to utoa_pad() used to convert the milliseconds.
It turns out the original code from 2012 already skipped this part,
probably in order to avoid the risk of seeing a millisecond field not
belonging to the 0-999 range. Better keep the check and put the code
into stricter shape.
No backport is needed. This fixes issue #526.
Commit 80b53ffb1c ("MEDIUM: arg: make make_arg_list() stop after its
own arguments") changed the way we detect the empty list because we
cannot stop by looking up the closing parenthesis anymore, thus for
the first missing arg we have to enter the parsing loop again. And
there, finding an empty arg means we go to the empty_err label, where
it was not initially planned to handle this condition. This results
in %[date()] to fail while %[date] works. Let's simply check if we've
reached the minimally supported args there (it used to be done during
the function entry).
Thanks to Jérôme for reporting this issue. No backport is needed,
this is 2.2-dev2+ only.
Since commit "MEDIUM: connection: make the subscribe() call able to wakeup
if ready" we have the guarantee that the tasklet will be woken up if
subscribing to a connection for an event that's ready. Since we have too
many tasklet_wakeup() calls in mux-h1, let's now use this property to
improve the situation a bit.
With this change, no syscall count changed, however the number of useless
calls to some functions significantly went down. Here are the differences
for the test below (100k req), in number of calls per request :
$ ./h1load -n 100000 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20
                              before   after   change   note
  tasklet_wakeup:               3        1      -66%
  h1_io_cb:                     4        3      -25%
  h1_send:                      6.7      5.4    -19%
  h1_wake:                      0.73     0.44   -39%
  h1_process:                   4.7      3.4    -27%
  h1_wake_stream_for_send:      6.7      5.5    -18%
  si_cs_process:                3.7      3.4    -7.8%
  conn_fd_handler:              2.7      2.4    -10%
  raw_sock_to_buf:              4        2      -50%
  pool_free:                    4        2      -50%     from failed rx calls
Note that the situation could be further improved by having muxes lazily
subscribe to Rx events in case the FD is already being polled. However
this requires deeper changes to implement a LAZY_RECV subscribe mode,
and to replace the FD's active bit by 3 states representing the desired
action to perform on the FD during the update, among NONE (no need to
change), POLL (can't proceed without), and STOP (buffer full). This
would only impact Rx since on Tx we know what we have to send. The
savings to expect from this might be more visible with splicing and/or
when dealing with many connections having long think times.
The remaining epoll_ctl() calls are exclusively caused by the disagreement
between conn_fd_handler() and the mux receiving the data: the fd handler
wants to stop after having woken up the tasklet, then the mux after
receiving data wants to receive again. Given that they don't happen in
the same poll loop when there are many FDs, this causes a lot of state
changes.
As suggested by Olivier, if the task is already scheduled for running,
we don't need to disable the event because it's in the run queue, poll()
cannot stop, and reporting it again will be harmless. What *might*
happen however is that a sampling-based poller like epoll() would report
many times the same event and has trouble getting others behind. But if
it would happen, it would still indicate the run queue has plenty of
pending operations, so it would in fact only displace the problem from
the poller to the run queue, which doesn't seem to be worse (and in
fact we do support priorities while the poller does not).
By doing this change, the keep-alive test with 1k conns and 100k reqs
completely gets rid of the per-request epoll_ctl changes, while still
not causing extra recvfrom() :
$ ./h1load -n 100000 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20
200000 sendto 1
200000 recvfrom 1
10762 epoll_wait 1
3664 epoll_ctl 1
1999 recvfrom -1
In close mode, it didn't change anything, we're still in the optimal
case (2 epoll per connection) :
$ ./h1load -n 100000 -r 1 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20
203764 epoll_ctl 1
200000 sendto 1
200000 recvfrom 1
6091 epoll_wait 1
2994 recvfrom -1
There's currently an internal API limitation at the connection layer
regarding conn_subscribe(). We must not subscribe if we haven't yet
met EAGAIN or such a condition, so we sometimes force ourselves to read
in order to meet this condition and be allowed to call subscribe.
But reading cannot always be done (e.g. at the end of a loop where we
cannot afford to retrieve new data and start again) so we instead
perform a tasklet_wakeup() of the requester's io_cb. This is what is
done in mux_h1 for example. The problem with this is that it forces
a new receive when we're not necessarily certain we need one. And if
the FD is not ready and was already being polled, it's a useless
wakeup.
The current patch improves the connection-level subscribe() so that
it only manipulates the polling if the FD is marked not-ready, and
instead schedules the caller's tasklet for a wakeup if the FD is
ready. This guarantees we'll wake this tasklet up in any case once the
FD is ready, either immediately or after polling.
By doing so, a test on pure close mode shows we cut in half the number
of epoll_ctl() calls and almost eliminate failed recvfrom():
$ ./h1load -n 100000 -r 1 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20
before:
399464 epoll_ctl 1
200007 recvfrom 1
200000 sendto 1
100000 recvfrom -1
7508 epoll_wait 1
after:
205739 epoll_ctl 1
200000 sendto 1
200000 recvfrom 1
6084 epoll_wait 1
2651 recvfrom -1
On keep-alive there is no change however.
This partially reverts commit 1113116b4a ("MEDIUM: raw-sock: remove
obsolete calls to fd_{cant,cond,done}_{send,recv}") so that we can mark
the FD not ready as required since commit 19bc201c9f ("MEDIUM: connection:
remove the intermediary polling state from the connection"). Indeed, with
the removal of the latter we don't have any other reliable indication that
the FD is blocked, which explains why there are so many EAGAIN in traces.
It's worth noting that a short read or a short write are also reliable
indicators of exhausted buffers and are even documented as such in the
epoll man page in case of edge-triggered mode. That's why we also report
the FD as blocked in such a case.
With this change we completely got rid of EAGAIN in keep-alive tests, but
they were expectedly transferred to epoll_ctl:
$ ./h1load -n 100000 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20
before:
266331 epoll_ctl 1
200000 sendto 1
200000 recvfrom 1
135757 recvfrom -1
8626 epoll_wait 1
after:
394865 epoll_ctl 1
200000 sendto 1
200000 recvfrom 1
10748 epoll_wait 1
1999 recvfrom -1
When a header is added or modified, in http_add_header() or
http_replace_header(), a comparison is performed on its name to know if it is
the Host header and if the authority part of the uri must be updated or
not. This comparison must be case-insensitive.
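For illustration, the check boils down to something like this sketch (the
helper name is hypothetical):
    #include <stddef.h>
    #include <strings.h>

    /* Minimal sketch: header field names are case-insensitive, so "Host",
     * "host" and "HOST" must all trigger the authority update. */
    static int is_host_header(const char *name, size_t len)
    {
            return len == 4 && strncasecmp(name, "host", 4) == 0;
    }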
This patch should fix the issue #522. It must be backported to 2.1.
As per issue #435 a hostname with a trailing dot confuses our DNS code,
as for a zero length DNS label we emit a null-byte. This change makes us
ignore the zero length label instead.
Must be backported to 1.8.
Even when there isn't a chain, it must be initialized to an empty X509
stack with sk_X509_new_null().
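A minimal sketch of the idea (not the actual ckch code, the helper name is
illustrative):
    #include <openssl/x509.h>

    /* Minimal sketch: never leave the chain NULL, so that older libraries
     * whose X509_chain_up_ref() does not check its argument cannot crash. */
    static STACK_OF(X509) *ensure_chain(STACK_OF(X509) *chain)
    {
            return chain ? chain : sk_X509_new_null();
    }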
This patch fixes a segfault which appears with older versions of the SSL
libs (openssl 0.9.8, libressl 2.8.3...) because X509_chain_up_ref() does
not check the pointer.
This bug was introduced by b90d2cb ("MINOR: ssl: resolve issuers chain
later").
Should fix issue #516.
Previously, when the `unique-id-format` contained non-deterministic parts,
such as the `uuid` fetch, each use of the `unique-id` fetch would generate
a new unique ID, replacing the old one. The following configuration shows
the error:
    global
        log stdout format short daemon

    listen test
        log global
        log-format "%ID"
        unique-id-format %{+X}o\ TEST-%[uuid]
        mode http
        bind *:8080
        http-response set-header A %[unique-id]
        http-response set-header B %[unique-id]
        server example example.com:80
Without the patch the contents of the `A` and `B` response header would
differ.
This bug was introduced in commit f4011ddcf5,
which was first released with HAProxy 1.7-dev3.
This fix should be backported to HAProxy 1.7+.
valgrind complains that epoll_ctl() uses an epoll_event in which we
have only set the part we use from the data field (i.e. the fd). Tests
show that pre-initializing the struct on the stack doesn't have a
measurable impact so let's do it.
In issue #471 it was reported that valgrind sometimes complains about
timer_create() being called with uninitialized bytes. These are in fact
the bits from sigev_value.sival_ptr that are not part of sival_int that
are tagged as such, as valgrind has no way to know we're using the int
instead of the ptr in the union. It's cheap to initialize the field so
let's do it.
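For illustration, the initialization amounts to something like this sketch
(function name, signal and clock choices are illustrative):
    #include <signal.h>
    #include <string.h>
    #include <time.h>

    /* Minimal sketch: zero the whole sigevent first so the unused bytes of
     * the sival_ptr/sival_int union are defined before timer_create(). */
    static int create_per_thread_timer(timer_t *timer, int thr)
    {
            struct sigevent sev;

            memset(&sev, 0, sizeof(sev));
            sev.sigev_notify          = SIGEV_SIGNAL;
            sev.sigev_signo           = SIGALRM;  /* illustrative choice */
            sev.sigev_value.sival_int = thr;      /* only the int part is used */
            return timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, timer);
    }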
Since commit 92919f7fd5 ("MEDIUM: h2: make the request parser rebuild
a complete URI") we make sure to rebuild a complete URI. Unfortunately
the test for an empty :path pseudo-header that is mandated by #8.1.2.3
happened to be performed on the URI before this patch, which is never
empty anymore after being rebuilt, causing h2spec to complain :
  8. HTTP Message Exchanges
    8.1. HTTP Request/Response Exchange
      8.1.2. HTTP Header Fields
        8.1.2.3. Request Pseudo-Header Fields
          - 1: Sends a HEADERS frame with empty ":path" pseudo-header field
               -> The endpoint MUST respond with a stream error of type PROTOCOL_ERROR.
                  Expected: GOAWAY Frame (Error Code: PROTOCOL_ERROR)
                            RST_STREAM Frame (Error Code: PROTOCOL_ERROR)
                            Connection closed
                    Actual: DATA Frame (length:0, flags:0x01, stream_id:1)
It's worth noting that this error doesn't trigger when calling h2spec
with a timeout as some scripts do, which explains why it wasn't detected
after the patch above.
This fixes one half of issue #471 and should be backported to 2.1.
The goal is to use the ckch to store data from PEM files or <payload> and
only for that. This patch addresses the ckch->ocsp_issuer case. The issuers
chain is looked up in ssl_sock_put_ckch_into_ctx() when no chain is present
in the ckch, so filling the ocsp_issuer from the chain must be done after
that point.
It changes the way '.issuer' is managed: it first tries to load '.issuer'
into ckch->ocsp_issuer and then looks for the issuer in the chain later
(in ssl_sock_load_ocsp()). "ssl-load-extra-files" without the "issuer"
parameter can disable the extra '.issuer' file check.
The goal is to use the ckch to store data from a loaded PEM file or a
<payload> and only for that. This patch addresses the ckch->chain case.
Looking for the issuers chain, if no chain is present in the ckch, can
be done in ssl_sock_put_ckch_into_ctx(). This way it is possible to know
the origin of the certificate chain without an extra state.
This lock was only needed to protect the buffer_wq list, but now we have
the mt_list for this. This patch simply turns the buffer_wq list to an
mt_list and gets rid of the lock.
It's worth noting that the whole buffer_wait thing still looks totally
wrong especially in a threaded context: the wakeup_cb() callback is
called synchronously from any thread and may end up calling some
connection code that was not expected to run on a given thread. The
whole thing should probably be reworked to use tasklets instead and be
a bit more centralized.
Move the cert_issuer_tree outside the global_ssl structure since it's
not a configuration variable. And move the declaration of the
issuer_chain structure into types/ssl_sock.h.
Reorder the 'show ssl cert' output so it's easier to see if the whole
chain is correct.
For a chain to be correct, an "Issuer" line must have the same
content as the next "Subject" line.
Example:
Subject: /C=FR/ST=Paris/O=HAProxy Test Certificate/CN=test.haproxy.local
Issuer: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 2/CN=ca2.haproxy.local
Chain Subject: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 2/CN=ca2.haproxy.local
Chain Issuer: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 1/CN=ca1.haproxy.local
Chain Subject: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 1/CN=ca1.haproxy.local
Chain Issuer: /C=FR/ST=Paris/O=HAProxy Test Root CA/CN=root.haproxy.local
For each certificate in the chain, display the issuer, so it's easy to
know whether the chain is right.
Also rename "Chain" to "Chain Subject".
Example:
Chain Subject: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 2/CN=ca2.haproxy.local
Chain Issuer: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 1/CN=ca1.haproxy.local
Chain Subject: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 1/CN=ca1.haproxy.local
Chain Issuer: /C=FR/ST=Paris/O=HAProxy Test Root CA/CN=root.haproxy.local
Display the subject of each certificate contained in the chain in the
output of "show ssl cert <filename>".
Each subject is on its own line prefixed by "Chain: "
Example:
Chain: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 2/CN=ca2.haproxy.local
Chain: /C=FR/ST=Paris/O=HAProxy Test Intermediate CA 1/CN=ca1.haproxy.local
This directive has never made any sense and has already caused trouble
by forcing the process to stay in the foreground during the boot process.
Let's emit a warning mentioning it's deprecated and will be removed in
2.3.
As reported in issue #511, when sending an outgoing local connection
(e.g. health check) we must set the "local" tag and not a "proxy" tag.
The issue comes from historic support on v1 which required to steal the
address on the outgoing connection for such ones, creating confusion in
the v2 code which believes it sees the incoming connection.
In order not to risk breaking existing setups which might rely on seeing
the LB's address in the connection's source field, let's just change the
connection type from proxy to local and keep the addresses. The protocol
spec states that for local, the addresses must be ignored anyway.
This problem has always existed, this can be backported as far as 1.5,
though it's probably not a good idea to change such setups, thus maybe
2.0 would be more reasonable.
There were 8 strict aliasing warnings there due to the dereferences
casting to uint32_t of input and output. We can achieve the same using
two write_u64() and four read_u64() which do not cause this issue and
even let the compiler use 64-bit operations.
Enabling strict aliasing fails on the cache's hash which is a series of
20 bytes cast as u32. And in practice it could even fail on some archs
if the http_txn didn't guarantee the hash was properly aligned. Let's
use read_u32() to read the value and write_u32() to set it, this makes
sure the compiler emits the correct code to access these and knows about
the intentional aliasing.
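For reference, such accessors are typically implemented with memcpy(),
which tells the compiler about the aliasing while still compiling down to
a single load/store on common targets; a minimal sketch:
    #include <stdint.h>
    #include <string.h>

    /* Minimal sketch of aliasing-safe accessors. */
    static inline uint32_t read_u32(const void *p)
    {
            uint32_t v;
            memcpy(&v, p, sizeof(v));
            return v;
    }

    static inline void write_u32(void *p, uint32_t v)
    {
            memcpy(p, &v, sizeof(v));
    }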
Enabling strict aliasing fails in fd.c when using the double-word CAS,
let's get rid of the (void**)(void*)&cur_list junk and use a union
instead. This way the compiler knows they do alias.
Sample fetch functions ssl_x_sha1(), ssl_fc_npn(), ssl_fc_alpn(),
ssl_fc_session_id(), as well as the CLI's "show cert details" handler
used to dereference the output buffer's <data> field by casting it to
"unsigned *". But while doing this could work before 1.9, it broke
starting with commit 843b7cbe9d ("MEDIUM: chunks: make the chunk struct's
fields match the buffer struct") which merged chunks and buffers, causing
the <data> field to become a size_t. The impact is only on 64-bit platforms
and depends on the endianness: on little endian, there should never be any
non-zero bits in the field as it is supposed to have been zeroed before the
call, so it should be harmless; on big endian, only the high part of the
value is written instead of the lower one, often making the result appear
4 billion times larger, and causing such values to be dropped everywhere
due to being larger than a buffer.
It seems that it would be wise to try to re-enable strict-aliasing to
catch such errors.
This must be backported till 1.9.
Just about every time there's a pointer cast in the code, there's a hidden
bug, and this one was no exception, as it passes the first octet of the
native representation of an integer as a single-character string, which
obviously only works on little endian machines. On big-endian machines,
something as simple as "str(foo),json" only returns zeroes.
This bug was introduced with the JSON converter in 1.6-dev1 by commit
317e1c4f1e ("MINOR: sample: add "json" converter"), the fix may be
backported to all stable branches.
The isalnum(), isalpha(), isdigit() etc functions from ctype.h are
supposed to take an int in argument which must either reflect an
unsigned char or EOF. In practice on some platforms they're implemented
as macros referencing an array, and when passed a char, they either cause
a warning "array subscript has type 'char'" when lucky, or cause random
segfaults when unlucky. It's quite inconvenient by the way since none of
them may return true for negative values. The recent introduction of
cygwin to the list of regularly tested build platforms revealed a lot
of breakage there due to the same issues again.
So this patch addresses the problem all over the code at once. It adds
unsigned char casts to every valid use case, and also drops the unneeded
double cast to int that was sometimes added on top of it.
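For example, the safe pattern is simply:
    #include <ctype.h>

    /* Minimal sketch: cast to unsigned char before any ctype.h call so a
     * negative char can neither index out of a lookup table nor look like
     * EOF. */
    static inline int is_space_safe(char c)
    {
            return isspace((unsigned char)c);
    }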
It may be backported by dropping irrelevant changes if that helps better
support uncommon platforms. It's unlikely to fix bugs on platforms which
would already not emit any warning though.
A build failure on cygwin was reported on github actions here:
https://github.com/haproxy/haproxy/runs/466507874
It's caused by a signed char being passed to isspace(), and this one
being implemented as a macro instead of a function as the man page
suggests. It's the same issue that regularly pops up on Solaris. This
comes from commit 98263291cc which was merged in 1.8-dev1. A backport
is possible though not incredibly useful.
`curr_idle_thr` is of type `unsigned int`, not `int`. Fix this issue by
taking the size of the dereferenced `curr_idle_thr` array.
This issue was introduced when adding the `curr_idle_thr` struct member
in commit f131481a0a. This commit is first
tagged in 2.0-dev1 and marked for backport to 1.9.
This used to be a minor optimization on ix86 where registers are scarce
and the calling convention not very efficient, but this platform is not
relevant enough anymore to warrant all this dirt in the code for the sake
of saving 1 or 2% of performance. Modern platforms don't use this at all
since their calling convention already defaults to using several registers
so better get rid of this once and for all.
Don't try to load a .key in a directory without loading its associated
certificate file.
This patch ignores the .key files when iterating over the files in a
directory.
Introduced by 4c5adbf ("MINOR: ssl: load the key from a dedicated
file").
For a certificate on a bind line, if the private key was not found in
the PEM file, look for a .key and load it.
This default behavior can be changed by using the ssl-load-extra-files
directive in the global section.
This feature was mentioned in issue #221.
Now that we have flags indicating the CPU's capabilities, better use
them instead of missing some updates for new CPU families (ARMv8 was
missing there).
The blocksize and the extra field are not necessarily aligned on a
machine word. This can result in a crash on an alignment-sensitive
machine when initializing the shctx area. Let's round both sizes up to
a pointer size to make this safe everywhere.
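A minimal sketch of the rounding (the helper name is illustrative):
    #include <stddef.h>

    /* Minimal sketch: round a size up to the machine word size so the
     * shctx blocks and the extra area always start on an aligned boundary. */
    static inline size_t align_to_word(size_t sz)
    {
            return (sz + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    }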
This fixes issue #512. This should be backported as far as 1.8.
In sample_fetch_path we can use iststop() instead of a for loop
to find the '?' and return the correct length. This requires commit
"MINOR: ist: add an iststop() function".
In http_action_replace_uri() we call http_get_path() in the case of
a replace-path rule. http_get_path() will return an ist pointing to
the start of the path, but uri.ptr + uri.len points to the end of the
uri. As a result, we are matching against a string containing the
query, which we append to the "path" later, effectively duplicating
the query string.
This patch uses the iststop() function introduced in "MINOR: ist: add
an iststop() function" to find the '?' character and update the ist
length when needed.
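As an illustrative sketch of what iststop() provides, assuming HAProxy's
ist type (a pointer/length pair) and its ist2() constructor from ist.h:
    /* Minimal sketch: return <ist> truncated at the first occurrence of
     * <chr>, so the '?' and the query string are excluded from the path. */
    static inline struct ist iststop(const struct ist ist, char chr)
    {
            size_t len = 0;

            while (len < ist.len && ist.ptr[len] != chr)
                    len++;
            return ist2(ist.ptr, len);
    }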
This fixes issue #510.
The bug was introduced by commit 262c3f1a ("MINOR: http: add a new
"replace-path" action"), which was backported to 2.1 and 2.0.