When a stream is detached from a backend private connection, we must not insert
it in the available connection list. In addition, we must be sure to remove it
from this list. To ensure it is properly performed, this part has been slightly
refactored to clearly split processing of private connections from the others.
This patch should probably be backported to 2.2.
A bug in task_kill() was fixed by commy 54d31170a ("BUG/MAJOR: sched:
make sure task_kill() always queues the task") which added a list
initialization before adding an element. But in fact an inconditional
addition would have done the same and been simpler than first
initializing then checking the element was initialized. Let's use
MT_LIST_ADDQ() there to add the task to kill into the shared queue
and kill the dirty LIST_INIT().
When a connection is added to an idle list, it's already detached and
cannot be seen by two threads at once, so there's no point using
TRY_ADDQ, there will never be any conflict. Let's just use the cheaper
ADDQ.
The TRY_ADDQ there was not needed since the wait list is exclusively
owned by the caller. There's a preliminary test on MT_LIST_ADDED()
that might have been eliminated by keeping MT_LIST_TRY_ADDQ() but
it would have required two more expensive writes before testing so
better keep the test the way it is.
Initially when mt_lists were added, their purpose was to be used with
the scheduler, where anyone may concurrently add the same tasklet, so
it sounded natural to implement a check in MT_LIST_ADD{,Q}. Later their
usage was extended and MT_LIST_ADD{,Q} started to be used on situations
where the element to be added was exclusively owned by the one performing
the operation so a conflict was impossible. This became more obvious with
the idle connections and the new macro was called MT_LIST_ADDQ_NOCHECK.
But this remains confusing and at many places it's not expected that
an MT_LIST_ADD could possibly fail, and worse, at some places we start
by initializing it before adding (and the test is superflous) so let's
rename them to something more conventional to denote the presence of the
check or not:
MT_LIST_ADD{,Q} : inconditional operation, the caller owns the
element, and doesn't care about the element's
current state (exactly like LIST_ADD)
MT_LIST_TRY_ADD{,Q}: only perform the operation if the element is not
already added or in the process of being added.
This means that the previously "safe" MT_LIST_ADD{,Q} are not "safe"
anymore. This also means that in case of backport mistakes in the
future causing this to be overlooked, the slower and safer functions
will still be used by default.
Note that the missing unchecked MT_LIST_ADD macro was added.
The rest of the code will have to be reviewed so that a number of
callers of MT_LIST_TRY_ADDQ are changed to MT_LIST_ADDQ to remove
the unneeded test.
Previous commit b24bc0d ("MINOR: tcp: Support TCP keepalive parameters
customization") broke non-Linux builds as TCP_KEEP{CNT,IDLE,INTVL} are
not necessarily defined elsewhere.
This patch adds the required #ifdefs to condition the visibility of the
keywords, and adds a mention in the doc about their dependency on Linux.
It is now possible to customize TCP keepalive parameters.
These correspond to the socket options TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL
and are valid for the defaults, listen, frontend and backend sections.
This patch fixes GitHub issue #670.
Compiling HAProxy with USE_LUA=1 and running a configuration check within
valgrind with a very simple configuration such as:
listen foo
bind *:8080
Will report quite a few possible leaks afterwards:
==24048== LEAK SUMMARY:
==24048== definitely lost: 0 bytes in 0 blocks
==24048== indirectly lost: 0 bytes in 0 blocks
==24048== possibly lost: 95,513 bytes in 1,209 blocks
==24048== still reachable: 329,960 bytes in 71 blocks
==24048== suppressed: 0 bytes in 0 blocks
Printing these possible leaks shows that all of them are caused by Lua.
Luckily Lua makes it *very* easy to free all used memory, so let's do
this on shutdown.
Afterwards this patch is applied the output looks much better:
==24199== LEAK SUMMARY:
==24199== definitely lost: 0 bytes in 0 blocks
==24199== indirectly lost: 0 bytes in 0 blocks
==24199== possibly lost: 0 bytes in 0 blocks
==24199== still reachable: 329,960 bytes in 71 blocks
==24199== suppressed: 0 bytes in 0 blocks
Given the following example configuration:
listen foo
mode http
bind *:8080
http-request set-var(txn.leak) meth(GET)
server x example.com:80
Running a configuration check with valgrind reports:
==25992== 4 bytes in 1 blocks are definitely lost in loss record 1 of 344
==25992== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25992== by 0x4E239D: my_strndup (tools.c:2261)
==25992== by 0x581E20: make_arg_list (arg.c:253)
==25992== by 0x4DE91D: sample_parse_expr (sample.c:890)
==25992== by 0x58E304: parse_store (vars.c:772)
==25992== by 0x566A3F: parse_http_req_cond (http_rules.c:95)
==25992== by 0x4A4CE6: cfg_parse_listen (cfgparse-listen.c:1339)
==25992== by 0x494C59: readcfgfile (cfgparse.c:2049)
==25992== by 0x545145: init (haproxy.c:2029)
==25992== by 0x421E42: main (haproxy.c:3175)
After this patch is applied the leak is gone as expected.
This is a fairly minor leak, but it can add up for many uses of the `bool()`
sample fetch. The bug most likely exists since the `bool()` sample fetch was
introduced in commit cc103299c75c530ab3637a1698306145bdc85552. The fix may
be backported to HAProxy 1.6+.
Given the following example configuration:
listen foo
mode http
bind *:8080
http-request set-var(txn.leak) bool(1)
server x example.com:80
Running a configuration check with valgrind reports:
==24233== 2 bytes in 1 blocks are definitely lost in loss record 1 of 345
==24233== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24233== by 0x4E238D: my_strndup (tools.c:2261)
==24233== by 0x581E10: make_arg_list (arg.c:253)
==24233== by 0x4DE90D: sample_parse_expr (sample.c:890)
==24233== by 0x58E2F4: parse_store (vars.c:772)
==24233== by 0x566A2F: parse_http_req_cond (http_rules.c:95)
==24233== by 0x4A4CE6: cfg_parse_listen (cfgparse-listen.c:1339)
==24233== by 0x494C59: readcfgfile (cfgparse.c:2049)
==24233== by 0x545135: init (haproxy.c:2029)
==24233== by 0x421E42: main (haproxy.c:3175)
After this patch is applied the leak is gone as expected.
This is a fairly minor leak, but it can add up for many uses of the `bool()`
sample fetch. The bug most likely exists since the `bool()` sample fetch was
introduced in commit cc103299c75c530ab3637a1698306145bdc85552. The fix may
be backported to HAProxy 1.6+.
Given the following example configuration:
backend foo
mode http
use-server %[str(x)] if { always_true }
server x example.com:80
Running a configuration check with valgrind reports:
==19376== 170 (40 direct, 130 indirect) bytes in 1 blocks are definitely lost in loss record 281 of 347
==19376== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==19376== by 0x5091AC: add_sample_to_logformat_list (log.c:511)
==19376== by 0x50A5A6: parse_logformat_string (log.c:671)
==19376== by 0x4957F2: check_config_validity (cfgparse.c:2588)
==19376== by 0x54442D: init (haproxy.c:2129)
==19376== by 0x421E42: main (haproxy.c:3169)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Given the following example configuration:
backend foo
mode http
use-server x if { always_true }
server x example.com:80
Running a configuration check with valgrind reports:
==18650== 14 bytes in 1 blocks are definitely lost in loss record 3 of 345
==18650== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18650== by 0x649E489: strdup (strdup.c:42)
==18650== by 0x4A5438: cfg_parse_listen (cfgparse-listen.c:1548)
==18650== by 0x494C59: readcfgfile (cfgparse.c:2049)
==18650== by 0x5450B5: init (haproxy.c:2029)
==18650== by 0x421E42: main (haproxy.c:3168)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Given the following example configuration:
frontend foo
mode http
bind *:8080
unique-id-header x
Running a configuration check with valgrind reports:
==17621== 2 bytes in 1 blocks are definitely lost in loss record 1 of 341
==17621== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17621== by 0x649E489: strdup (strdup.c:42)
==17621== by 0x4A87F1: cfg_parse_listen (cfgparse-listen.c:2747)
==17621== by 0x494C59: readcfgfile (cfgparse.c:2049)
==17621== by 0x545095: init (haproxy.c:2029)
==17621== by 0x421E42: main (haproxy.c:3167)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Given the following example configuration:
resolvers test
nameserver test 127.0.0.1:53
listen foo
bind *:8080
server foo example.com resolvers test
Running a configuration check within valgrind reports:
==21995== 5 bytes in 1 blocks are definitely lost in loss record 1 of 30
==21995== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21995== by 0x5726489: strdup (strdup.c:42)
==21995== by 0x4B2CFB: parse_server (server.c:2163)
==21995== by 0x4680C1: cfg_parse_listen (cfgparse-listen.c:534)
==21995== by 0x459E33: readcfgfile (cfgparse.c:2167)
==21995== by 0x50778D: init (haproxy.c:2021)
==21995== by 0x418262: main (haproxy.c:3133)
==21995==
==21995== 12 bytes in 1 blocks are definitely lost in loss record 3 of 30
==21995== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21995== by 0x5726489: strdup (strdup.c:42)
==21995== by 0x4AC666: srv_prepare_for_resolution (server.c:1606)
==21995== by 0x4B2EBD: parse_server (server.c:2081)
==21995== by 0x4680C1: cfg_parse_listen (cfgparse-listen.c:534)
==21995== by 0x459E33: readcfgfile (cfgparse.c:2167)
==21995== by 0x50778D: init (haproxy.c:2021)
==21995== by 0x418262: main (haproxy.c:3133)
with one more leak unrelated to `struct server`. After applying this
patch the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Given the following example configuration:
frontend foo
mode http
bind *:8080
unique-id-format x
Running a configuration check with valgrind reports:
==30712== 42 (40 direct, 2 indirect) bytes in 1 blocks are definitely lost in loss record 18 of 39
==30712== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30712== by 0x4ED7E9: add_to_logformat_list (log.c:462)
==30712== by 0x4EEE28: parse_logformat_string (log.c:720)
==30712== by 0x47B09A: check_config_validity (cfgparse.c:3046)
==30712== by 0x52881D: init (haproxy.c:2121)
==30712== by 0x41F382: main (haproxy.c:3126)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Instead of just calling release_sample_arg(conv_expr->arg_p) we also must
free() the conv_expr itself (after removing it from the list).
Given the following example configuration:
frontend foo
bind *:8080
mode http
http-request set-var(txn.foo) str(bar)
acl is_match str(foo),strcmp(txn.hash) -m bool
Running a configuration check within valgrind reports:
==1431== 32 bytes in 1 blocks are definitely lost in loss record 20 of 43
==1431== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1431== by 0x4C39B5: sample_parse_expr (sample.c:982)
==1431== by 0x56B410: parse_acl_expr (acl.c:319)
==1431== by 0x56BA7F: parse_acl (acl.c:697)
==1431== by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816)
==1431== by 0x4797C3: readcfgfile (cfgparse.c:2167)
==1431== by 0x52943D: init (haproxy.c:2021)
==1431== by 0x41F382: main (haproxy.c:3133)
After this patch is applied the leak is gone as expected.
This is a fairly minor leak that can only be observed if samples need to be
freed, which is not something that should occur during normal processing and
most likely only during shut down. Thus no backport should be needed.
Instead of simply calling free() in expr->smp->arg_p in certain cases
properly free the sample using release_sample_expr().
Given the following example configuration:
frontend foo
bind *:8080
mode http
http-request set-var(txn.foo) str(bar)
acl is_match str(foo),strcmp(txn.hash) -m bool
Running a configuration check within valgrind reports:
==31371== 160 (48 direct, 112 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 45
==31371== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371== by 0x4C3832: sample_parse_expr (sample.c:876)
==31371== by 0x56B3E0: parse_acl_expr (acl.c:319)
==31371== by 0x56BA4F: parse_acl (acl.c:697)
==31371== by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816)
==31371== by 0x4797C3: readcfgfile (cfgparse.c:2167)
==31371== by 0x5293ED: init (haproxy.c:2021)
==31371== by 0x41F382: main (haproxy.c:3126)
After this patch this leak is reduced. It will be fully removed in a
follow up patch:
==32503== 32 bytes in 1 blocks are definitely lost in loss record 20 of 43
==32503== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==32503== by 0x4C39B5: sample_parse_expr (sample.c:982)
==32503== by 0x56B410: parse_acl_expr (acl.c:319)
==32503== by 0x56BA7F: parse_acl (acl.c:697)
==32503== by 0x48D225: cfg_parse_listen (cfgparse-listen.c:816)
==32503== by 0x4797C3: readcfgfile (cfgparse.c:2167)
==32503== by 0x52943D: init (haproxy.c:2021)
==32503== by 0x41F382: main (haproxy.c:3133)
This is a fairly minor leak that can only be observed if ACLs need to be
freed, which is not something that should occur during normal processing
and most likely only during shut down. Thus no backport should be needed.
When the multiplexer creation is delayed after the handshakes phase, the
connection is added in the available connection list if http-reuse never is not
configured for the backend. But it is a wrong statement. At this step, the
connection is not safe because it is a new connection. So it must be added in
the available connection list only if http-reuse always is used.
No backport needed, this is 2.2-dev.
When a connection is created and the multiplexer is installed, if the connection
is marked as private, don't consider it as available, regardless the number of
available streams. This test is performed when the mux is installed when the
connection is created, in connect_server(), and when the mux is installed after
the handshakes stage.
No backport needed, this is 2.2-dev.
When a connection is picked from the session server list because the proxy or
the session are marked to use the last requested server, if it is idle, we must
marked it as used removing the CO_FL_SESS_IDLE flag and decrementing the session
idle_conns counter.
This patch must be backported as far as 1.9.
Trace messages have been added when the CS_FL_MAY_SPLICE flag is set or unset
and when the splicing is really enabled for the H1 connection.
This patch may be backpored to 2.1 to ease debugging.
The CS_FL_MAY_SPLICE flag must be unset for the conn-stream if a read0 is
received while reading on the kernel pipe. It is mandatory when some data was
also received. Otherwise, this flag prevent the call to the h1 rcv_buf()
callback. Thus the read0 will never be handled by the h1 multiplexer leading to
a freeze of the session until a timeout is reached.
This patch must be backported to 2.1 and 2.0.
In h1_rcv_buf(), the splicing is systematically disabled if it was previously
enabled. When it happens, if the splicing is enabled it means the channel's
buffer was empty before calling h1_rcv_buf(). Thus, the only reason to disable
the splicing at this step is when some input data have just been processed.
This patch may be backported to 2.1 and 2.0.
In h1_rcv_pipe(), if the mux is unable to receive data, for instance because the
multiplexer is blocked on input waiting the other side (BUSY mode), no receive
must be performed.
This patch must be backported to 2.1 and 2.0.
In the commit 17ccd1a35 ("BUG/MEDIUM: connection: add a mux flag to indicate
splice usability"), The CS_FL_MAY_SPLICE flags was added to notify the upper
layer that the mux is able to use the splicing. But this was only done for the
payload in a message, in HTTP_MSG_DATA state. But the splicing is also possible
in TUNNEL mode, in HTTP_MSG_TUNNEL state. In addition, the splicing ability is
always disabled for chunked messages.
This patch must be backported to 2.1 and 2.0.
Since the commit cd0d2ed6e ("MEDIUM: log-format: make the LF parser aware of
sample expressions' end"), the LF_STEXPR label in the last switch-case statement
at the end of the for loop in the parse_logformat_string() function cannot be
reached anymore.
This patch should fix the issue #723.
Add a check on the conn pointer to avoid a NULL dereference in
smp_fetch_ssl_x_keylog().
The problem is not suppose to happen because the function is only used
for the frontend at the moment.
Introduced by 7d42ef5, 2.2 only.
Fix issue #733.
OpenSSL 1.1.1 provides a callback registering function
SSL_CTX_set_keylog_callback, which allows one to receive a string
containing the keys to deciphers TLSv1.3.
Unfortunately it is not possible to store this data in binary form and
we can only get this information using the callback. Which means that we
need to store it until the connection is closed.
This patches add 2 pools, the first one, pool_head_ssl_keylog is used to
store a struct ssl_keylog which will be inserted as a ex_data in a SSL *.
The second one is pool_head_ssl_keylog_str which will be used to store
the hexadecimal strings.
To enable the capture of the keys, you need to set "tune.ssl.keylog on"
in your configuration.
The following fetches were implemented:
ssl_fc_client_early_traffic_secret,
ssl_fc_client_handshake_traffic_secret,
ssl_fc_server_handshake_traffic_secret,
ssl_fc_client_traffic_secret_0,
ssl_fc_server_traffic_secret_0,
ssl_fc_exporter_secret,
ssl_fc_early_exporter_secret
NetBSD apparently uses macros for tolower/toupper and complains about
the use of char for array subscripts. Let's properly cast all of them
to unsigned char where they are used.
This is needed to fix issue #729.
Originally it was made to return a void* because some comparisons in the
code where it was used required a lot of casts. But now we don't need
that anymore. And having it non-const breaks the build on NetBSD 9 as
reported in issue #728.
So let's switch to const and adjust debug.c to accomodate this.
A typo was accidently introduced in commit 48ce6a3 ("BUG/MEDIUM: muxes:
Make sure nobody stole the connection before using it."), a "&" was
placed in front of "OTHER_LOCK", which breaks DEBUG_LOCK. No backport
is needed.
Building on OpenBSD 6.7 with gcc-4.2.1 yields the following warnings
which suggest that the initialization is not taken as expected but
that the container member is reset with each initialization:
src/peers.c: In function 'peer_send_updatemsg':
src/peers.c:1000: warning: initialized field overwritten
src/peers.c:1000: warning: (near initialization for 'p.updt')
src/peers.c:1001: warning: initialized field overwritten
src/peers.c:1001: warning: (near initialization for 'p.updt')
src/peers.c:1002: warning: initialized field overwritten
src/peers.c:1002: warning: (near initialization for 'p.updt')
src/peers.c:1003: warning: initialized field overwritten
src/peers.c:1003: warning: (near initialization for 'p.updt')
src/peers.c:1004: warning: initialized field overwritten
src/peers.c:1004: warning: (near initialization for 'p.updt')
Fixing this is trivial, we just have to initialize one level at
a time.
Please refer to commit 19a69b3740702ce5503a063e9dfbcea5b9187d27 for all the
details. This follow up commit fixes the `http-response capture` case, the
previous one only fixed the `http-request capture` one. The documentation was
already updated and the change to `check_http_res_capture` is identical to
the `check_http_req_capture` change.
This patch must be backported together with 19a69b3740702ce5503a063e9dfbcea5b9187d27.
Most likely this is 1.6+.
When we takeover a connection, let the xprt layer know. If it has its own
tasklet, and it is already scheduled, then it has to be destroyed, otherwise
it may run the new mux tasklet on the old thread.
Note that we only do this for the ssl xprt for now, because the only other
one that might wake the mux up is the handshake one, which is supposed to
disappear before idle connections exist.
No backport is needed, this is for 2.2.
In the various takeover() methods, make sure we schedule the old tasklet
on the old thread, as we don't want it to run on our own thread! This
was causing a very rare crash when building with DEBUG_STRICT, seeing
that either an FD's thread mask didn't match the thread ID in h1_io_cb(),
or that stream_int_notify() would try to queue a task with the wrong
tid_bit.
In order to reproduce this, it is necessary to maintain many connections
(typically 30k) at a high request rate flowing over H1+SSL between two
proxies, the second of which would randomly reject ~1% of the incoming
connection and randomly killing some idle ones using a very short client
timeout. The request rate must be adjusted so that the CPUs are nearly
saturated, but never reach 100%. It's easier to reproduce this by skipping
local connections and always picking from other threads. The issue
should happen in less than 20s otherwise it's necessary to restart to
reset the idle connections lists.
No backport is needed, takeover() is 2.2 only.
In srv_cleanup_idle_connections(), we compute how many idle connections
are in excess compared to the average need. But we may actually be missing
some, for example if a certain number were recently closed and the average
of used connections didn't change much since previous period. In this
case exceed_conn can become negative. There was no special case for this
in the code, and calculating the per-thread share of connections to kill
based on this value resulted in special value -1 to be passed to
srv_migrate_conns_to_remove(), which for this function means "kill all of
them", as used in srv_cleanup_connections() for example.
This causes large variations of idle connections counts on servers and
CPU spikes at the moment the cleanup task passes. These were quite more
visible with SSL as it costs CPU to close and re-establish these
connections, and it also takes time, reducing the reuse ratio, hence
increasing the amount of connections during reconnection.
In this patch we simply skip the killing loop when this condition is met.
No backport is needed, this is purely 2.2.
The function timeofday_as_iso_us adds now the trailing local timezone offset.
Doing this the function could be use directly to generate rfc5424 logs.
It affects content of a ring if the ring's format is set to 'iso' and 'timed'.
Note: the default ring 'buf0' is of type 'timed'.
It is preferable NOT to backport this to stable releases unless bugs are
reported, because while the previous format is not correct and the new
one is correct, there is a small risk to cause inconsistencies in log
format to some users who would not expect such a change in a stable
cycle.
Sadly, the fix from commit 54d31170a ("BUG/MAJOR: sched: make sure
task_kill() always queues the task") broke the builds without DEBUG_STRICT
as, in order to be careful, it plcaed a BUG_ON() around the previously
failing condition to check for any new possible failure, but this BUG_ON
strips the condition when DEBUG_STRICT is not set. We don't want BUG_ON
to evaluate any condition either as some debugging code calls possibly
expensive ones (e.g. in htx_get_stline). Let's just drop the useless
BUG_ON().
No backport is needed, this is 2.2-dev.
As reported in issue #724, openbsd fails to build in haproxy.c
due to a faulty comma in the middle of a warning message. This code
is only compiled when RLIMIT_AS is not defined, which seems to be
rare these days.
This may be backported to older versions as the problem was likely
introduced when strict limits were added.
Commit 69f591e3b ("MINOR: cli/proxy: add a new "show servers conn" command")
added the ability to dump the idle connections state for a server, but we
must not do this if idle connections were not allocated, which happens if
the server is configured with pool-max-conn 0.
This is 2.2, no backport is needed.
In the various timeout functions, make sure nobody stole the connection from
us before attempting to doing anything with it, there's a very small race
condition between the time we access the task context, and the time we
actually check it again with the lock, where it could have been free'd.