The ssl_sock.c file contains a lot of macros and structure definitions
that should be in a .h. Move them to the more appropriate
types/ssl_sock.h file.
Make use of ssl_sock_register_msg_callback(). Function ssl_sock_msgcbk()
is now split into two dedicated functions for heartbeat and clienthello.
They are both registered by using a new callback mechanism for SSL/TLS
protocol messages.
This patch adds the ability to register callbacks for SSL/TLS protocol
messages by using the function ssl_sock_register_msg_callback().
All registered callback functions will be called when observing received
or sent SSL/TLS protocol messages.
Only allow L7 retries when using HTTP, it only really makes sense for HTTP,
anyway, and as the L7 retries code assume the message will be HTX, it will
crash when used with mode TCP.
This should fix github issue #627.
This should be backported to 2.1 and 2.0.
In do_l7_retry(), remove the SF_ADDR_SET flag. Otherwise,
assign_server_address() won't be called again, which means for 2.1 or 2.2,
we will always retry to connect to the server that just failed, and for 2.0,
that we will try to use to whatever the address is for the connection,
probably the last server used by that connection before it was pool_free()
and reallocated.
This should be backported to 2.1 and 2.0.
Commit 42a50bd19 ("BUG/MINOR: pollers: remove uneeded free in global
init") removed the 'fail_revt' label from the _do_init() function
in src/ev_select.c but left the local label declaration, which makes
clang unhappy and unable to build.
These labels are only historic and unneeded anyway so let's remove them.
This should be backported where 42a50bd19 is backported.
When the first thread stops and wakes others up, it's possible some of
them will also start to wake others in parallel. Let's make give this
notification task to the very first one instead since it's enough and
can reduce the amount of needless (though harmless) wakeup calls.
Currently the soft-stop can lead to old processes remaining alive for as
long as two seconds after receiving a soft-stop signal. What happens is
that when receiving SIGUSR1, one thread (usually the first one) wakes up,
handles the signal, sets "stopping", goes into runn_poll_loop(), and
discovers that stopping is set, so its also sets itself in the
stopping_thread_mask bit mask. After this it sees that other threads are
not yet willing to stop, so it continues to wait.
From there, other threads which were waiting in poll() expire after one
second on poll timeout and enter run_poll_loop() in turn. That's already
one second of wait time. They discover each in turn that they're stopping
and see that other threads are not yet stopping, so they go back waiting.
After the end of the first second, all threads know they're stopping and
have set their bit in stopping_thread_mask. It's only now that those who
started to wait first wake up again on timeout to discover that all other
ones are stopping, and can now quit. One second later all threads will
have done it and the process will quit.
This is effectively strictly larger than one second and up to two seconds.
What the current patch does is simple, when the first thread stops, it sets
its own bit into stopping_thread_mask then wakes up all other threads to do
also set theirs. This kills the first second which corresponds to the time
to discover the stopping state. Second, when a thread exists, it wakes all
other ones again because some might have gone back sleeping waiting for
"jobs" to go down to zero (i.e. closing the last connection). This kills
the last second of wait time.
Thanks to this, as SIGUSR1 now acts instantly again if there's no active
connection, or it stops immediately after the last connection has left if
one was still present.
This should be backported as far as 2.0.
while reading the code, I thought it was clearer to put one instruction
per line as it is mostly done elsewhere
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Since commit d4604adeaa ("MAJOR: threads/fd: Make fd stuffs
thread-safe"), we init pollers per thread using a helper. It was still
correct for mono-thread mode until commit cd7879adc2 ("BUG/MEDIUM:
threads: Run the poll loop on the main thread too"). We now use a deinit
helper for all threads, making those free uneeded.
Only poll and select are affected by this very minor issue.
it could be backported from v1.8 to v2.1.
Fixes: cd7879adc2 ("BUG/MEDIUM: threads: Run the poll loop on the main
thread too")
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
In dump_pools_to_trash() we happen to use %d to display unsigned ints!
This has probably been there since "show pools" was introduced so this
fix must be backported to all versions. The impact is negligible since
no pool uses 2 billion entries. It could possibly affect the report of
failed allocation counts but in this case there's a bigger problem to
solved!
In the commit 2fabd9d53 ("BUG/MEDIUM: checks: Subscribe to I/O events on an
unfinished connect"), we force the subscribtion to I/O events when a new
connection is opened if it is not fully established. But it must only be done if
a mux was immediately installed. If there is no mux, no subscription must be
performed.
No backport needed.
Make the digest and HMAC function of OpenSSL accessible to the user via
converters. They can be used to sign and validate content.
Reviewed-by: Tim Duesterhus <tim@bastelstu.be>
aes_gcm_dec is independent of the TLS implementation and fits better
in sample.c file with others hash functions.
[Cf: I slightly updated this patch to move aes_gcm_dec converter in sample.c
instead the new file crypto.c]
Reviewed-by: Tim Duesterhus <tim@bastelstu.be>
It is useless to try to send outgoing data if the check is still waiting to be
able to send data.
No backport needed.
(cherry picked from commit d94653700437430864c03090d710b95f4e860321)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
In tcpcheck_main(), when we are waiting for a connection, we must rely on the
next connect/send/expect rule to subscribe to I/O events, not on the immediate
next rule. Because, if it is a set-var or an unset-var rule, we will not
subscribe to I/O events while it is in fact mandatory because a send or an
expect rule is coming. It is required to wake-up the health check as soon as I/O
are possible, instead of hitting a timeout.
No backport needed.
(cherry picked from commit 758d48f54cc3372c2d8e7c34b926d218089c533a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Subscription to I/O events should not be performed if the check is already
subscribed.
No backport needed.
(cherry picked from commit 9e0b3e92f73b6715fb2814e3d09b8ba62270b417)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
In tcp-check based health check, when a new connection is opened, we must wait
it is really established before moving to the next rule. But at this stage, we
must also be sure to subscribe to I/O events. Otherwise, depending on the
timing, the health check may remains sleepy till the timeout.
No backport needed. This patch should fix the issue #622.
(cherry picked from commit b2a4c0d473e3c5dcb87f7d16f2ca410bafc62f64)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
For 6 years now we've been seeing a warning suggesting to set dh-param
beyond 1024 if possible when it was not set. It's about time to do it
and get rid of this warning since most users seem to already use 2048.
It will remain possible to set a lower value of course, so only those
who were experiencing the warning and were relying on the default value
may notice a change (higher CPU usage). For more context, please refer
to this thread :
https://www.mail-archive.com/haproxy@formilux.org/msg37226.html
This commit removes a big chunk of code which happened to be needed
exclusively to figure if it was required to emit a warning or not :-)
This fixes OSS Fuzz issue https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21931.
OSS Fuzz detected a hang on configuration parsing for a 200kB line with a large number of
invalid escape sequences. Most likely due to the amounts of error output generated.
This issue is very minor, because usually generated configurations are to be trusted.
The bug exists since at the very least HAProxy 1.4. The patch may be backported if desired.
Commit 9df188695f ("BUG/MEDIUM: http-ana: Handle NTLM messages correctly.")
tried to address an HTTP-reuse issue reported in github issue #511 by making
sure we properly detect extended NTLM responses, but made the match case-
sensitive while it's a token so it's case insensitive.
This should be backported to the same versions as the commit above.
During use_backend and use-server post-parsing, if the log-format string used to
specify the backend or the server is just a single string, the log-format string
(a list, internally) is replaced by the string itself. Because the field is an
union, the list is not emptied according to the rules of art. The element, when
released, is not removed from the list. There is no bug, but it is clearly not
obvious and error prone.
This patch should fix#544. The fix for the use_backend post-parsing may be
backported to all stable releases. use-server is static in 2.1 and prior.
The issue can easily be reproduced with "stick on" statement
backend BE_NAME
stick-table type ip size 1k
stick on src
and calling dump() method on BE_NAME stick table from Lua
Before the fix, HAProxy would return 500 and log something like
the following:
runtime error: attempt to index a string value from [C] method 'dump'
Where one would expect a Lua table like this:
{
["IP_ADDR"] = {
["server_id"] = 1,
["server_name"] = "srv1"
}
}
This patch needs to backported to 1.9 and later releases.
A default statement in the switch testing the header name has been added to be
sure it is impossible to eval the value pattern on an uninitialized header
value. It should never happen of course. But this way, it is explicit.
And a comment has been added to make clear that ctx.value is always defined when
it is evaluated.
This patch fixes the issue #619.
For http-check send rules, it is now possible to use a log-format string to set
the request's body. the keyword "body-lf" should be used instead of "body". If the
string eval fails, no body is added.
For http-check send rules, it is now possible to use a log-format string to set
the request URI. the keyword "uri-lf" should be used instead of "uri". If the
string eval fails, we fall back on the default uri "/".
Since commit bb86986 ("MINOR: init: report the haproxy version and
executable path once on errors") the master-worker displays its version
and path upon a successful exits of a current worker. Which is kind of
confusing upon a clean exits.
This is due to the fact that that commit displays this upon a ha_alert()
which was used during the exit of the process.
Replace the ha_alert() by an ha_warning() if the process exit correctly
and was supposed to.
It still displays the message upon a SIGINT since the workers are
catching the signal.
A sample must always have a session defined. Otherwise, it is a bug. So it is
unnecessary to test if it is defined when called from a health checks context.
This patch fixes the issue #616.
Extra parameters on http-check expect rules, for the header matching method, to
use log-format string or to match full header line have been removed. There is
now separate matching methods to match a full header line or to match each
comma-separated values. "http-check expect fhdr" must be used in the first case,
and "http-check expect hdr" in the second one. In addition, to match log-format
header name or value, "-lf" suffix must be added to "name" or "value"
keyword. For intance:
http-check expect hdr name "set-cookie" value-lf -m beg "sessid=%[var(check.cookie)]"
Thanks to this changes, each parameter may only be interpreted in one way.
Following actions have been added to send log-format strings from a tcp-check
ruleset instead the log-format parameter:
* tcp-check send-lf <fmt>
* tcp-check send-binary-lf <fmt>
It is easier for tools generating configurations. Each action may only be
interpreted in one way.
All sample fetches in the scope "check." have been removed. Response sample
fetches must be used instead. It avoids keyword duplication. So, for instance,
res.hdr() must be now used instead of check.hdr().
To do so, following sample fetches have been added on the response :
* res.body, res.body_len and res.body_size
* res.hdrs and res.hdrs_bin
Sample feches dealing with the response's body are only useful in the health
checks context. When called from a stream context, there is no warranty on the
body presence. There is no option to wait the response's body.
It is now possible to use log-format string (or hexadecimal string for the
binary version) to match a content in tcp-check based expect rules. For
hexadecimal log-format string, the conversion in binary is performed after the
string evaluation, during health check execution. The pattern keywords to use
are "string-lf" for the log-format string and "binary-lf" for the hexadecimal
log-format string.
TCP and HTTP expect rules may fail because of unexpected and internal
error. Mainly during log-format strings eval. The error report is improved by
this patch.
An additional argument has been added to smp_prefetch_htx() function in the
commit 778f5ed47 ("MEDIUM: checks/http-fetch: Support htx prefetch from a check
for HTTP samples"). But forgot to update call in 51d module.
No need to backport.
An additional argument has been added to smp_prefetch_htx() function in the
commit 778f5ed47 ("MEDIUM: checks/http-fetch: Support htx prefetch from a check
for HTTP samples"). But forgot to update call in wurfl module.
No need to backport.
An additional argument has been added to smp_prefetch_htx() function in the
commit 778f5ed47 ("MEDIUM: checks/http-fetch: Support htx prefetch from a check
for HTTP samples"). But forgot to update call in da module.
No need to backport.
It is now possible to add http-check expect rules matching HTTP header names and
values. Here is the format of these rules:
http-check expect header name [ -m <meth> ] <name> [log-format] \
[ value [ -m <meth> ] <value> [log-format] [full] ]
the name pattern (name ...) is mandatory but the value pattern (value ...) is
optionnal. If not specified, only the header presence is verified. <meth> is the
matching method, applied on the header name or the header value. Supported
matching methods are:
* "str" (exact match)
* "beg" (prefix match)
* "end" (suffix match)
* "sub" (substring match)
* "reg" (regex match)
If not specified, exact matching method is used. If the "log-format" option is
used, the pattern (<name> or <value>) is evaluated as a log-format string. This
option cannot be used with the regex matching method. Finally, by default, the
header value is considered as comma-separated list. Each part may be tested. The
"full" option may be used to test the full header line. Note that matchings are
case insensitive on the header names.
For an http-check ruleset, it should be allowed to set a chain of expect
rules. But an error is triggered during the post-parsing because of a wrong
test, inherited from the evaluation mode before the refactoring.
No need to backport.
The status (ok, error and timeout) of an TCP or HTTP expect rule are set to
HCHK_STATUS_UNKNOWN by default, when not specified, during the configuration
parsing. This does not change the default status used for a terminal expect rule
(ok=L7OK, err=L7RSP and tout=L7TOUT). But this way, it is possible to know if a
specific status was forced by config or not.
It is now possible to use different matching methods to look for header names in
an HTTP message:
* The exact match. It is the default method. http_find_header() uses this
method. http_find_str_header() is an alias.
* The prefix match. It evals the header names starting by a prefix.
http_find_pfx_header() must be called to use this method.
* The suffix match. It evals the header names ending by a suffix.
http_find_sfx_header() must be called to use this method.
* The substring match. It evals the header names containing a string.
http_find_sub_header() must be called to use this method.
* The regex match. It evals the header names matching a regular expression.
http_match_header() must be called to use this method.
HTPP sample fetches acting on the response can now be called from any sample
expression or log-format string in a tcp-check based ruleset. To avoid any
ambiguities, all these sample fetches are in the check scope, for instance
check.hdr() or check.cook().
SSL sample fetches acting on the server connection can now be called from any
sample expression or log-format string in a tcp-check based ruleset. ssl_bc and
ssl_bc_* sample fetches are concerned.
It is now possible to call be_id, be_name, srv_id and srv_name sample fetches
from any sample expression or log-format string in a tcp-check based ruleset.
It is now possible to call check.payload(), check.payload_lv() and check.len()
sample fetches from any sample expression or log-format string in a tcp-check
based ruleset. In fact, check.payload() was already added. But instead of having
a specific function to handle this sample fetch, we use the same than
req.payload().
These sample fetches act on the check input buffer, containing data received for
the server. So it should be part of or after an expect rule, but before any send
rule. Because the input buffer is cleared at this stage.
Some HTTP sample fetches will be accessible from the context of a http-check
health check. Thus, the prefetch function responsible to return the HTX message
has been update to handle a check, in addition to a channel. Both cannot be used
at the same time. So there is no ambiguity.
A binary sample data can be converted, implicitly or not, to a string by cutting
the buffer on the first null byte.
I guess this patch should be backported to all stable versions.
srv_cleanup_connections() is supposed to be static, so mark it as so.
This patch should be backported where commit 6318d33ce6
("BUG/MEDIUM: connections: force connections cleanup on server changes")
will be backported, that is to say v1.9 to v2.1.
Fixes: 6318d33ce6 ("BUG/MEDIUM: connections: force connections cleanup
on server changes")
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
After we call SSL_SESSION_get_id(), the length of the id in bytes is
stored in "len", which was never checked. This could cause unexpected
behavior when using the "ssl_fc_session_id" or "ssl_bc_session_id"
fetchers (eg. the result can be an empty value).
The issue was introduced with commit 105599c ("BUG/MEDIUM: ssl: fix
several bad pointer aliases in a few sample fetch functions").
This patch must be backported to 2.1, 2.0, and 1.9.
When only request headers are parsed, the host header should not be compared to
the request authority because no start-line was parsed. Thus there is no
authority.
Till now this bug was hidden because this parsing mode was only used for the
response in the FCGI multiplexer. Since the HTTP checks refactoring, the request
headers may now also be parsed without the start-line.
This patch fixes the issue #610. It must be backported to 2.1.
I've been trying to understand a change of behaviour between v2.2dev5 and
v2.2dev6. Indeed our probe is regularly testing to add and remove
servers on a given backend such as:
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -
-> curl on the corresponding frontend: reply from server:31255
# echo "set server be_foo/srv1 addr 10.236.139.34 port 31257" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31257' by 'stats socket command'
# echo "set server be_foo/srv1 weight 256" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio /var/lib/haproxy/stats
health check port updated.
# echo "set server be_foo/srv1 state ready" | sudo socat stdio /var/lib/haproxy/stats
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31257 -
-> curl on the corresponding frontend: reply for server:31257
(notice the difference of weight)
# echo "set server be_foo/srv1 state maint" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 addr 0.0.0.0 port 0" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '10.236.139.34' to '0.0.0.0', port changed from '31257' to '0' by 'stats socket command'
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 263 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 0.0.0.0 0 1 256 256 0 15 3 0 14 0 0 0 - 0 -
-> curl on the corresponding frontend: reply from server:31255
# echo "set server be_foo/srv1 addr 10.236.139.34 port 31256" | sudo socat stdio /var/lib/haproxy/stats
IP changed from '0.0.0.0' to '10.236.139.34', port changed from '0' to '31256' by 'stats socket command'
# echo "set server be_foo/srv1 weight 256" | sudo socat stdio /var/lib/haproxy/stats
# echo "set server be_foo/srv1 check-port 8500" | sudo socat stdio /var/lib/haproxy/stats
health check port updated.
# echo "set server be_foo/srv1 state ready" | sudo socat stdio /var/lib/haproxy/stats
# echo "show servers state be_foo" | sudo socat stdio /var/lib/haproxy/stats
113 be_foo 1 srv0 10.236.139.34 2 0 1 1 105 15 3 4 6 0 0 0 - 31255 -
113 be_foo 2 srv1 10.236.139.34 2 0 256 256 2319 15 3 2 6 0 0 0 - 31256 -
-> curl on the corresponding frontend: reply from server:31257 (!)
Here we indeed would expect to get an anver from server:31256. The issue
is highly linked to the usage of `pool-purge-delay`, with a value which
is higher than the duration of the test, 10s in our case.
a git bisect between dev5 and dev6 seems to show commit
079cb9af22 ("MEDIUM: connections: Revamp the way idle connections are killed")
being the origin of this new behaviour.
So if I understand the later correctly, it seems that it was more a
matter of chance that we did not saw the issue earlier.
My patch proposes to force clean idle connections in the two following
cases:
- we set a (still running) server to maintenance
- we change the ip/port of a server
This commit should be backported to 2.1, 2.0, and 1.9.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
When a stream is detached from its connection, we try to move the connection in
an idle list to keep it opened, the session one or the server one. But it must
only be done if there is no connection error and if we want to keep it
open. This last statement is true if FCGI_CF_KEEP_CONN flag is set. But the test
is inverted at the stage.
This patch must be backported to 2.1.
fcgi_release() function is responsible to release a FCGI connection. But the
release of the connection itself is missing.
This patch must be backported to 2.1.
When the last stream is detached from a FCGI connection, if the server don't add
the connection in its idle list, the connection is destroyed. Thus it is
important to exist immediately from the detach function. A return statement is
missing here.
This bug was introduced in the commit 2444aa5b6 ("MEDIUM: sessions: Don't be
responsible for connections anymore.").
It is a 2.2-dev bug. No need to backport.
Now we very rarely catch spinning streams, and whenever we catch one it
seems a filter is involved, but we currently report no info about them.
Let's print the list of enabled filters on the stream with such a crash
to help with the reports. A typical output will now look like this:
[ALERT] 121/165908 (1110) : A bogus STREAM [0x7fcaf4016a60] is spinning at 2 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7fcaf4016a60 src=127.0.0.1 fe=l1 be=l1 dst=<CACHE> rqf=6dc42000 rqa=48000 rpf=a0040223 rpa=24000000 sif=EST,10008 sib=DIS,80110 af=(nil),0 csf=0x7fcaf4023c00,10c000 ab=0x7fcaf40235f0,4 csb=(nil),0 cof=0x7fcaf4016610,1300:H1(0x7fcaf4016840)/RAW((nil))/tcpv4(29) cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0) filters={0x7fcaf4016fb0="cache store filter", 0x7fcaf4017080="compression filter"}]
This may be backported to 2.0.
I changed my mind twice on this one and pushed after the last test with
threads disabled, without re-enabling long long, causing this rightful
build warning.
This needs to be backported if the previous commit ff64d3b027 ("MINOR:
threads: export the POSIX thread ID in panic dumps") is backported as
well.
It is very difficult to map a panic dump against a gdb thread dump
because the thread numbers do not match. However gdb provides the
pthread ID but this one is supposed to be opaque and not to be cast
to a scalar.
This patch provides a fnuction, ha_get_pthread_id() which retrieves
the pthread ID of the indicated thread and casts it to an unsigned
long long so as to lose the least possible amount of information from
it. This is done cleanly using a union to maintain alignment so as
long as these IDs are stored on 1..8 bytes they will be properly
reported. This ID is now presented in the panic dumps so it now
becomes possible to map these threads. When threads are disabled,
zero is returned. For example, this is a panic dump:
Thread 1 is about to kill the process.
*>Thread 1 : id=0x7fe92b825180 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
stuck=1 prof=0 harmless=0 wantrdv=0
cpu_ns: poll=5119122 now=2009446995 diff=2004327873
curr_task=0xc99bf0 (task) calls=4 last=0
fct=0x592440(task_run_applet) ctx=0xca9c50(<CLI>)
strm=0xc996a0 src=unix fe=GLOBAL be=GLOBAL dst=<CLI>
rqf=848202 rqa=0 rpf=80048202 rpa=0 sif=EST,200008 sib=EST,204018
af=(nil),0 csf=0xc9ba40,8200
ab=0xca9c50,4 csb=(nil),0
cof=0xbf0e50,1300:PASS(0xc9cee0)/RAW((nil))/unix_stream(20)
cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0)
call trace(20):
| 0x59e4cf [48 83 c4 10 5b 5d 41 5c]: wdt_handler+0xff/0x10c
| 0x7fe92c170690 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x13690
| 0x7ffce29519d9 [48 c1 e2 20 48 09 d0 48]: linux-vdso:+0x9d9
| 0x7ffce2951d54 [eb d9 f3 90 e9 1c ff ff]: linux-vdso:__vdso_gettimeofday+0x104/0x133
| 0x57b484 [48 89 e6 48 8d 7c 24 10]: main+0x157114
| 0x50ee6a [85 c0 75 76 48 8b 55 38]: main+0xeaafa
| 0x50f69c [48 63 54 24 20 85 c0 0f]: main+0xeb32c
| 0x59252c [48 c7 c6 d8 ff ff ff 44]: task_run_applet+0xec/0x88c
Thread 2 : id=0x7fe92b6e6700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=786738 now=1086955 diff=300217
curr_task=0
Thread 3 : id=0x7fe92aee5700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=828056 now=1129738 diff=301682
curr_task=0
Thread 4 : id=0x7fe92a6e4700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=818900 now=1153551 diff=334651
curr_task=0
And this is the gdb output:
(gdb) info thr
Id Target Id Frame
* 1 Thread 0x7fe92b825180 (LWP 15234) 0x00007fe92ba81d6b in raise () from /lib64/libc.so.6
2 Thread 0x7fe92b6e6700 (LWP 15235) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
3 Thread 0x7fe92a6e4700 (LWP 15237) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
4 Thread 0x7fe92aee5700 (LWP 15236) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
We can clearly see that while threads 1 and 2 are the same, gdb's
threads 3 and 4 respectively are haproxy's threads 4 and 3.
This may be backported to 2.0 as it removes some confusion in github issues.
We tried hard to make sure we report threads as not stuck at various
crucial places, but one of them is special, it's the listener_accept()
function. The reason it is special is because it will loop a certain
number of times (default: 64) accepting incoming connections, allocating
resources, dispatching them to other threads or running L4 rules on them,
and while all of this is supposed to be extremely fast, when the machine
slows down or runs low on memory, the expectedly small delays in malloc()
caused by contention with other threads can quickly accumulate and suddenly
become critical to the point of triggering the watchdog. Furthermore, it
is technically possible to trigger this by pure configuration by setting
a huge tune.maxaccept value, which should not be possible.
Given that each operation isn't related to the same task but to a different
one each time, it is appropriate to mark the thread as not stuck each time
it accepts new work that possibly gets dispatched to other threads which
execute it.
This looks like this could be a good reason for the issue reported in
issue #388.
This fix must be backported to 2.0.
Building without threads now shows this warning:
src/ssl_sock.c: In function 'cli_io_handler_commit_cert':
src/ssl_sock.c:12121:24: warning: unused variable 'bind_conf' [-Wunused-variable]
struct bind_conf *bind_conf = ckchi->bind_conf;
^~~~~~~~~
This is because the variable is needed only to unlock the structure, and
the unlock operation does nothing in this case. Let's mark the variable
__maybe_unused for this, but it would be convenient in the long term if
we could make the thread macros pretend they consume the argument so that
this remains less visible outside.
No backport is needed.
Because in HTTP, the host header and the request authority, if any, must be
identical, we keep both synchornized. It means the right flags are set on the
HTX statrt-line calling http_update_host(). There is no header when it happens,
but it is not an issue. Then, if a Host header is inserted,
http_update_authority() is called.
Note that for now, the host header is not automatically added when required.
Connection, content-length and transfer-encoding headers are ignored for
http-check send rules. For now, the keep-alive is not supported and the
"connection: close" header is always added to the request. And the
content-length header is automatically added.
cpu_calls, cpu_ns_avg, cpu_ns_tot, lat_ns_avg and lat_ns_tot depend on the
stream to find the current task and must check for it or they may cause a
crash if misused or used in a log-format string after commit 5f940703b3
("MINOR: log: Don't depends on a stream to process samples in log-format
string").
This must be backported as far as 1.9.
get_http_auth() expects a valid stream but this is not mentioned, though
fortunately it's always called from places which already check this.
smp_prefetch_htx() performs all the required checks and is the key to the
stability of almost all sample fetch functions, so let's make this clearer.
Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.
The unique-id sample fetch function, if called without a stream, will result
in a crash.
This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.7.
Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.
The http_first_req sample fetch function, if called without a stream, will
result in a crash.
This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.6.
Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.
The capture.req.hdr, capture.res.hdr, capture.req.method, capture.req.uri,
capture.req.ver and capture.res.ver sample fetches used to assume the
presence of a stream, which is not necessarily the case (especially after
the commit above) and would crash haproxy if incorrectly used. Let's make
sure they check for this stream.
This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.6.
Since commit 5f940703b3 ("MINOR: log: Don't depends on a stream to process
samples in log-format string") it has become quite obvious that a few sample
fetch functions and converters were still heavily dependent on the presence
of a stream without testing for it.
The capture-req and capture-res converters were in this case and could
crash the process if misused.
This fix adds a check for the stream's existence, and should be backported
to all stable versions up to 1.6.
Mux-h1 currently heavily relies on the presence of an upper stream, even
when waiting for a new request after one is being finished, and it's that
upper stream that's in charge of request and keep-alive timeouts for now.
But since recent commit 493d9dc6ba ("MEDIUM: mux-h1: do not blindly wake
up the tasklet at end of request anymore") that assumption was broken as
the purpose of this change was to avoid initiating processing of a request
when there's no data in the buffer. The side effect is that there's no more
timeout to handle the front connection, resulting in dead front connections
stacking up as clients get kicked off the net.
This fix makes sure we always enable the timeout when there's no stream
attached to the connection. It doesn't do this for back connections since
they may purposely be left idle.
No backport is needed as this bug was introduced in 2.2-dev4.
A bug was introduced in the commit 2edcd4cbd ("BUG/MINOR: checks: Avoid
incompatible cast when a binary string is parsed"). The length of the
destination buffer must be set before call the parse_binary() function.
No backport needed.
It can be sometimes useful to measure total time of a request as seen
from an end user, including TCP/TLS negotiation, server response time
and transfer time. "Tt" currently provides something close to that, but
it also takes client idle time into account, which is problematic for
keep-alive requests as idle time can be very long. "Ta" is also not
sufficient as it hides TCP/TLS negotiationtime. To improve that, introduce
a "Tu" timer, without idle time and everything else. It roughly estimates
time spent time spent from user point of view (without DNS resolution
time), assuming network latency is the same in both directions.
When a tcp-check line is parsed, a warning may be reported if the keyword is
used for a frontend. The return value must be used to report it. But this info
is lost before the end of the function.
Partly fixes issue #600. No backport needed.
When an error is found during the parsing of an expect rule (tcp or http),
everything is released at the same place, at the end of the function.
Partly fixes issue #600. No backport needed.
parse_binary() function must be called with a pointer on an integer. So don't
pass a pointer on a size_t element, casting it to a pointer on a integer.
Partly fixes issue #600. No backport needed.
The variable s, pointing on the check server, may be null when a connection is
openned. It happens for email alerts. To avoid ambiguities, its use is now more
explicit. Comments have been added at some places and tests on the variable have
been added elsewhere (useless but explicit).
Partly fixes issue #600.
If a message is not fully received from a mysql server, depending on last_read
value, an error must be reported or we must wait for more data. The first if
statement must not check last_read.
Partly fixes issue #600. No backport needed.
The version is partially parsed to set the flag HTX_SL_F_VER_11 on the HTX
message. But exactly 8 chars is expected. So if "HTTP/2" is specified, the flag
is not set. Thus, the version parsing has been updated to handle "HTTP/2" and
"HTTP/2.0" the same way.
For PostgreSQL health check, there is a regex on the backend authentication
packet. It must match to succeed. But it exists 6 types of authentication
packets and the regex only matches the first one (AuthenticationOK). This patch
fixes the regex to match all authentication packets.
No backport needed.
At the end of a tcp-check based health check, if there is still a connection
attached to the check, it must be closed. But it must be done before releasing
the session, because the session may still be referenced by the mux. For
instance, an h2 stream may still have a reference on the session.
No need to backport.
When a new connection is established, if an old connection is still attached to
the current check, it must be detroyed. When it happens, the old conn-stream
must be used to unsubscribe for events, not the new one.
No backport is needed.
The variable 'ret' must only be declared When HAProxy is compiled with the SSL
support (more precisely SSL_CTRL_SET_TLSEXT_HOSTNAME must be defined).
No backport needed.
Since the tcp-check based heath checks uses the best multuplexer for a
connection, the mux-pt is no longer the only possible choice. So events
subscriptions and unsubscriptions must be done with the mux.
No backport needed.
It is now possible to match on a comma-separated list of status codes or range
of codes. In addtion, instead of a string comparison to match the response's
status code, a integer comparison is performed. Here is an example:
http-check expect status 200,201,300-310
When default parameters are set in a request message, we get the client
connection using the session's origin. But it is not necessarily a
conncection. For instance, for health checks, thanks to recent changes, it may
be a check object. At this step, the client connection may be NULL. Thus, we
must be sure to have a client connection before using it.
This patch must be backported to 2.1.