Reorganize code for timeout calculation in case the connection is idle.
The objective is to better reflect the relations between the various
timeouts, as follows:
* if a GOAWAY was already emitted, use the shut timeout, falling back to
  the client/server timeout if unset. However, an already set timeout is
  never erased.
* else, for a frontend connection, the http-request or keep-alive timeout
  is applied depending on the current demux state. If the selected value
  is unset, fall back to the client timeout.
* for a backend connection, no timeout is set, so that http-reuse can be
  performed.
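As an illustration, the resulting decision flow looks like the sketch
below (tick helpers and H2_CF_IS_BACK are from the haproxy source; the
other names and conditions are simplified):

    /* sketch of the decision flow in h2c_update_timeout() for an idle conn */
    if (goaway_sent) {
        /* shut timeout, fallback to the client/server one; an already
         * set timeout is never erased */
        if (!tick_isset(h2c->task->expire))
            h2c->task->expire = tick_add_ifset(now_ms,
                tick_isset(shut_to) ? shut_to : cli_srv_to);
    }
    else if (!(h2c->flags & H2_CF_IS_BACK)) {
        /* frontend: http-request or keep-alive depending on demux state,
         * fallback to the client timeout if unset */
        int to = expect_request ? px->timeout.httpreq : px->timeout.httpka;
        h2c->task->expire = tick_add_ifset(now_ms,
            tick_isset(to) ? to : px->timeout.client);
    }
    else {
        /* backend: no timeout so the idle connection stays reusable */
        h2c->task->expire = TICK_ETERNITY;
    }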
This commit is pure refactoring, so no functional change should occur.
Since the following commit, the MUX H2 timeout function has been
slightly extended:
d38d8c6ccb
BUG/MEDIUM: mux-h2: make sure control frames do not refresh the idle timeout
A side-effect of this patch is that the expire timer of an idle backend
connection is now not reset if already defined. This means that if a
timer was registered prior to the connection's transition to idle, the
connection would be destroyed on its timeout. If this happens for enough
connections, it may have an impact on the reuse rate.
In practice, this case should be rare, as the h2c timer is set to
TICK_ETERNITY while there are active streams. The timer is most of the
time not refreshed before the transition to idle, so the connection
won't be deleted on expiration.
The only case where it could occur is if there is still pending data
blocked on emission on stream detach. Here, the server timeout is
applied on the connection. When the emission completes, the connection
goes idle, but the timer will still be armed, and thus will be triggered
on the idle connection.
To prevent this, explicitly reset the h2c timer to TICK_ETERNITY for an
idle backend connection via h2c_update_timeout().
This patch is explicitly not scheduled for backport for now, as it is
difficult to estimate the real impact of the previous code state.
A GOAWAY frame should not be emitted before the preface. Thus, the
max_id field of an h2c acting as a server is initialized to -1, which
prevents its emission until the preface is received from the peer. If
acting as a client, max_id is initialized to a valid value on the first
h2s emission.
This causes an issue with reverse HTTP on the active side. First, it
starts as a client, so the peer does not emit a preface but instead a
simple SETTINGS frame. As roles are switched, max_id is initialized much
later, when the first h2s response is emitted. Thus, if the connection
must be terminated before any stream transfer, GOAWAY cannot be emitted.
To fix this, ensure max_id is initialized to 0 in h2_conn_reverse() for
the active connect side. Thus, a GOAWAY indicating that no stream has
been handled can be generated.
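A minimal sketch of the fix, assuming a simplified condition for the
active side in h2_conn_reverse():

    /* active connect side being reversed: no stream was handled yet, so
     * initialize max_id to allow GOAWAY(last_stream_id=0) emission */
    if (active_reverse)            /* illustrative condition */
        h2c->max_id = 0;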
Note that the passive connect side is not impacted, as its max_id is
initialized thanks to the preface reception.
This should be backported up to 2.9.
Active connect on reverse HTTP relies on the connect timeout to detect
connection failures. Thus, if this timeout was unset, a connection
failure might not be properly detected.
Fix this by falling back to a hardcoded value of 1s if the connect
timeout is unset in the configuration. This is considered a minor bug,
as haproxy advises against running with timeouts unset.
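A sketch of the fallback, with ticks expressed in milliseconds as in the
haproxy source (the variable name is illustrative):

    /* use "timeout connect" if set, otherwise a hardcoded 1s */
    int connect_to = tick_isset(px->timeout.connect) ? px->timeout.connect
                                                     : MS_TO_TICKS(1000);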
This must be backported up to 2.9.
While reviewing the HTTP/2 MUX timeouts, it appeared that the MUX task
may be requeued via h2c_update_timeout() with an already expired date.
This can happen with idle connections in two cases:
* first with the shut timeout, as the timer is not refreshed if already set
* second with http-request and keep-alive timers, which are based on
idle_start
Queuing an already expired task is undefined behavior. Fix this by
using task_wakeup() instead of task_queue() at the end of
h2c_update_timeout() if such case occurs.
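A sketch of the fix at the end of h2c_update_timeout() (the surrounding
context is simplified):

    if (tick_isset(h2c->task->expire) &&
        tick_is_expired(h2c->task->expire, now_ms))
        task_wakeup(h2c->task, TASK_WOKEN_TIMER); /* don't queue in the past */
    else
        task_queue(h2c->task);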
This should be backported up to 2.6.
Jacques Heunis from Bloomberg reported on the mailing list [1] that
with haproxy 2.8 up to master, yielding from a Lua TCP service while
data was still buffered inside haproxy would eat some data, which was
then definitely lost.
He provided the reproducer below, which turned out to be really helpful:
    global
        log stdout format raw local0 info
        lua-load haproxy_yieldtest.lua

    defaults
        log global
        timeout connect 10s
        timeout client 1m
        timeout server 1m

    listen echo
        bind *:9090
        mode tcp
        tcp-request content use-service lua.print_input
haproxy_yieldtest.lua:
    core.register_service("print_input", "tcp", function(applet)
        core.Info("Start printing input...")
        while true do
            local inputs = applet:getline()
            if inputs == nil or string.len(inputs) == 0 then
                core.Info("closing input connection")
                return
            end
            core.Info("Received line: "..inputs)
            core.yield()
        end
    end)
And the script below:
    #!/usr/bin/bash

    for i in $(seq 1 9999); do
        for j in $(seq 1 50); do
            echo "${i}_foo_${j}"
        done
        sleep 2
    done
Using it like this:
    ./test_seq.sh | netcat localhost 9090
We can clearly see the missing data for every "foo" burst (every 2
seconds), as there are holes in the numbering.
Thanks to the reproducer, it was quickly found that only versions
>= 2.8 were affected, and that in fact this regression was introduced
by commit 31572229e ("MEDIUM: hlua/applet: Use the sedesc to report and
detect end of processing")
In fact, two mistakes were made in 31572229e during the refactoring.
Indeed, both in hlua_applet_tcp_fct() (which is involved in the reproducer
above) and hlua_applet_http_fct(), the request (buffer) is now
systematically consumed when returning from the function, which wasn't
the case prior to this commit: when HLUA_E_AGAIN is returned, it means a
yield was requested and that the processing is not done yet, thus we
should not consume any data, as was done prior to the refactoring.
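A simplified sketch of the fix in hlua_applet_tcp_fct() (the exact
consumption call depends on the surrounding code):

    switch (ret) {
    case HLUA_E_AGAIN:
        /* yield requested: processing is not finished, do not consume
         * the buffered request data */
        break;
    default:
        /* done or error: the request can now be consumed */
        co_skip(sc_oc(sc), co_data(sc_oc(sc)));
        break;
    }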
Big thanks to Jacques who did a great job reproducing and reporting this
issue on the mailing list.
[1]: https://www.mail-archive.com/haproxy@formilux.org/msg45778.html
It should be backported up to 2.8 with commit 31572229e.
Change the representation of the start-line URI when parsing an HTTP/3
request into HTX. Adopt the same conversion as HTTP/2: if the :authority
header is used (the default case), the URI is encoded using the
absolute-form, with scheme, host and path concatenated. If only a plain
host header is used instead, fall back to the origin-form.
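An illustrative construction of the URI, using the chunk and ist helpers
from the haproxy source (variable names are simplified):

    struct buffer *tmp = get_trash_chunk();
    struct ist uri;

    if (isttest(authority)) {
        /* default case: absolute-form "scheme://authority/path" */
        chunk_istcat(tmp, scheme);
        chunk_istcat(tmp, ist("://"));
        chunk_istcat(tmp, authority);
        chunk_istcat(tmp, path);
    }
    else {
        /* only a plain host header: keep the origin-form */
        chunk_istcat(tmp, path);
    }
    uri = ist2(tmp->area, tmp->data);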
This commit may break some configurations if parsing is performed on the
URI. Indeed, most HTTP/3 requests will now be represented with an
absolute-form URI at the stream layer.
Note that prior to this commit a check was performed on the path used as
URI to ensure that it did not contain any invalid characters. Now, this
is directly performed on the URI itself, which may include the path.
This must not be backported.
Ensure that the HTX start-line generated after parsing an HTTP/3 request
does not contain any invalid character, i.e. control or whitespace
characters.
Note that for now the path is used directly as the URI. Thus, the check
is performed directly on it. A later patch will change this to generate
an absolute-form URI in most cases, but it won't be backported, to avoid
breaking configurations in stable versions.
This must be backported up to 2.6.
RFC 9114 specifies some requirements for the :path pseudo-header when
the http or https scheme is used. This commit enforces them by rejecting
a request if needed. Thus, the path cannot be empty, and it must either
start with a '/' character or contain only '*'.
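The check can be sketched as follows (the function name is illustrative,
the ist helpers are from the haproxy source):

    static int h3_path_is_valid(const struct ist path)
    {
        if (!path.len)
            return 0;                      /* empty path is rejected */
        if (*istptr(path) == '/')
            return 1;                      /* origin-form */
        return isteq(path, ist("*"));      /* asterisk-form only */
    }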
This must be backported up to 2.6.
As specified in RFC 9114, connection headers require special care in
HTTP/3. When a request is received with connection headers, the stream
is immediately closed. Conversely, when translating a response from
HTX, such headers are not encoded but silently ignored.
However, "upgrade" was not listed in connection headers. This commit
fixes this by adding a check on it both on request parsing and response
encoding.
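An illustrative version of the check; the exact list and the surrounding
code may differ in the source:

    static const struct ist conn_hdrs[] = {
        IST("connection"), IST("proxy-connection"), IST("keep-alive"),
        IST("transfer-encoding"), IST("upgrade"),
    };
    size_t i;

    for (i = 0; i < sizeof(conn_hdrs) / sizeof(conn_hdrs[0]); i++)
        if (isteqi(name, conn_hdrs[i]))
            return 1; /* request: close the stream / response: skip header */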
This must be backported up to 2.6.
This commit does a similar job to the previous one, but it now acts on
the response path. Any leading or trailing whitespace characters are
removed from an HTX block header value prior to encoding the header via
QPACK.
This must be backported up to 2.6.
Remove any leading and trailing whitespace from header field values
prior to inserting a new HTX header block. This is done when parsing a
HEADERS frame, both for headers and trailers.
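The trimming can be sketched as below, on an ist holding the header
value (HTTP_IS_LWS and the ist helpers are from the haproxy source):

    /* strip leading then trailing whitespace before HTX insertion */
    while (value.len && HTTP_IS_LWS(*istptr(value)))
        value = istnext(value);
    while (value.len && HTTP_IS_LWS(istptr(value)[value.len - 1]))
        value.len--;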
This must be backported up to 2.6.
Free the acme_ctx task context once the task is done. Everything is
freed except the config and the httpclient.
The ckch_store is freed in case of error, but when the task is
successful, the pointer is set to NULL to prevent the free once it has
been inserted in the tree.
This patch registers the task in the ckch_store so that two tasks never
run at the same time for a given certificate.
The task creation is moved under the lock, and we check under the lock
whether a task already exists.
The exponential backoff values were multiplied by 3000 instead of 3 with
a second to millisecond conversion, leading to a value of 9000000ms at
the 2nd attempt.
Fix the issue by setting the value in seconds and converting it in
tick_add().
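A sketch of the fix (variable names are illustrative; tick_add() is from
the haproxy source):

    /* compute the backoff in seconds and convert to ms only when arming
     * the timer, instead of multiplying by 3000 at each attempt */
    unsigned int delay_s = 3;
    unsigned int i;

    for (i = 1; i < attempt; i++)
        delay_s *= 3;                          /* 3s, 9s, 27s, ... */
    task->expire = tick_add(now_ms, delay_s * 1000);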
No backport needed.
When receiving the final certificate, it needs to be loaded by
ssl_sock_load_pem_into_ckch(). However, this function removes any
existing private key in the struct ckch_store.
In order to fix the issue, the pointer to the key is swapped with a NULL
pointer, and restored once the new certificate is committed.
However, when ssl_sock_load_pem_into_ckch() fails, the pointer is lost.
This patch fixes the issue by restoring the pointer in the error path.
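A simplified sketch of the swap/restore (structure fields and the call
signature are simplified):

    EVP_PKEY *key = ckchs->data->key;

    ckchs->data->key = NULL;            /* protect the key from removal */
    if (ssl_sock_load_pem_into_ckch(path, pem, ckchs->data) != 0) {
        ckchs->data->key = key;         /* error path: restore the pointer */
        goto error;
    }
    ckchs->data->key = key;             /* restore once committed */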
This must fix issue #2933.
Add a 1MB ring buffer called "dpapi" for communication with the
dataplane API. It would first be used to transmit ACME information to
the dataplane API but could be used for more.
A server abort must be handled by the request analyzers only when the
response forwarding has already started. Otherwise, it is the
responsibility of the response analyzer to detect this event. L7 retries
and the conditions used to decide to silently close a client connection
are handled by this analyzer.
Because a reused server connection closed too early could be detected at
the wrong place, it was possible to get a 502/SH instead of a silent
close, preventing the client from safely retrying its request.
Thanks to this patch, we are able to silently close the client
connection in this case, and eventually to perform an L7 retry.
This patch must be backported as far as 2.8.
During the response payload forwarding, if the back SC is closed, we try
to figure out whether it is because of a client abort or a server abort.
However, the condition was not accurate, especially when the
abortonclose option is set. Because of this issue, a server abort may be
reported (SD-- in logs) instead of a client abort (CD-- in logs).
The right way to detect a client abort when we try to forward the
response is to test if the back SC was shut down (SC_FL_SHUT_DOWN flag
set) AND aborted (SC_FL_ABRT_DONE flag set). When both flags are set, it
means the back connection underwent a shutdown, which should be
converted to a client abort at this stage.
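The corrected condition can be sketched as follows on the back stream
connector (the surrounding code is simplified):

    if ((s->scb->flags & (SC_FL_SHUT_DOWN | SC_FL_ABRT_DONE)) ==
                         (SC_FL_SHUT_DOWN | SC_FL_ABRT_DONE)) {
        /* the back SC was shut down and aborted: report a client
         * abort (CD--) instead of a server abort */
    }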
This patch should be backported as far as 2.8. It should fix last strange SD
report in the issue #2749.
src/jws.c: In function '__jws_init':
src/jws.c:594:38: error: passing argument 2 of 'hap_register_unittest' from incompatible pointer type [-Wincompatible-pointer-types]
594 | hap_register_unittest("jwk", jwk_debug);
| ^~~~~~~~~
| |
| int (*)(int, char **)
In file included from include/haproxy/api.h:36,
from include/import/ebtree.h:251,
from include/import/ebmbtree.h:25,
from include/haproxy/jwt-t.h:25,
from src/jws.c:5:
include/haproxy/init.h:37:52: note: expected 'int (*)(void)' but argument is of type 'int (*)(int, char **)'
37 | void hap_register_unittest(const char *name, int (*fct)());
| ~~~~~~^~~~~~
GCC 15 warns because the two function pointer types are incompatible:
the registered callback takes (int, char **) while the registration
prototype declares a function taking no arguments.
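Assuming the fix simply aligns the two prototypes, it would look like:

    /* declare the expected argument list on the registration side so it
     * matches the unit test callbacks */
    void hap_register_unittest(const char *name,
                               int (*fct)(int argc, char **argv));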
Should fix issue #2929.
>>> CID 1609049: Code maintainability issues (UNUSED_VALUE)
>>> Assigning value "NULL" to "new_ckchs" here, but that stored value is overwritten before it can be used.
592 struct ckch_store *old_ckchs, *new_ckchs = NULL;
Coverity reported an issue where a variable is initialized to NULL and
then directly overwritten with another value. This does no harm, but
this patch removes the useless initialization.
Must fix issue #2932.
In call traces, we're interested in seeing the code that was executed,
not the code that was not yet executed. The return address is where the
CPU will return
to, so we want to see the bytes that precede this location. In the example
below on x86 we can clearly see a number of direct "call" instructions
(0xe8 + 4 bytes). There are also indirect calls (0xffd0) that cannot be
exploited but it gives insights about where the code branched, which will
not always be the function above it if that one used tail branching for
example. Here's an example dump output:
call ------------,
v
0x6bd634 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95
0x7fa4ea2593a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
0x752132 <00 00 00 00 00 90 41 55]: htx_remove_blk+0x2/0x354
0x5b1a2c <4c 89 ef e8 04 07 1a 00]: main+0x10471c
0x5b5f05 <48 89 df e8 8b b8 ff ff]: main+0x108bf5
0x60b6f4 <89 ee 4c 89 e7 41 ff d0]: tcpcheck_eval_send+0x3b4/0x14b2
0x610ded <00 00 00 e8 53 a5 ff ff]: tcpcheck_main+0x7dd/0xd36
0x6c5ab4 <48 89 df e8 5c ab f4 ff]: wake_srv_chk+0xc4/0x3d7
0x6c5ddc <48 89 f7 e8 14 fc ff ff]: srv_chk_io_cb+0xc/0x13
For code dumps, dumping from the return address is pointless; what is
interesting is to dump before the return address, to read the machine
code that was executed before branching. Let's just make the function
support negative sizes to indicate that we're dumping this number of
bytes up to the address instead of this number of bytes from the
address. In this case, in order to distinguish them, we use a '<'
instead of '[' to start the series of bytes, indicating where the bytes
start and where they stop. For example we can now see this:
0x6bd634 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95
0x7fa4ea2593a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
0x752132 <00 00 00 00 00 90 41 55]: htx_remove_blk+0x2/0x354
0x5b1a2c <4c 89 ef e8 04 07 1a 00]: main+0x10471c
0x5b5f05 <48 89 df e8 8b b8 ff ff]: main+0x108bf5
0x60b6f4 <89 ee 4c 89 e7 41 ff d0]: tcpcheck_eval_send+0x3b4/0x14b2
0x610ded <00 00 00 e8 53 a5 ff ff]: tcpcheck_main+0x7dd/0xd36
0x6c5ab4 <48 89 df e8 5c ab f4 ff]: wake_srv_chk+0xc4/0x3d7
0x6c5ddc <48 89 f7 e8 14 fc ff ff]: srv_chk_io_cb+0xc/0x13
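A sketch of the convention (the function name and exact formatting are
illustrative; chunk_appendf() is from the haproxy source):

    /* a negative <size> dumps the |size| bytes preceding <addr>, opening
     * the series with '<' instead of '[' */
    static void dump_bytes(struct buffer *out, const void *addr, int size)
    {
        const unsigned char *p = (size < 0) ? (const unsigned char *)addr + size
                                            : (const unsigned char *)addr;
        int len = (size < 0) ? -size : size;
        int i;

        chunk_appendf(out, (size < 0) ? "<" : "[");
        for (i = 0; i < len; i++)
            chunk_appendf(out, "%02x%s", p[i], (i + 1 < len) ? " " : "");
        chunk_appendf(out, "]");
    }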
These counters can have a noticeable cost on large machines, though not
dramatic. There's no single good choice to keep them enabled or disabled.
This commit adds multiple choices:
- DEBUG_COUNTERS set to 2 will automatically enable them by default, while
1 will disable them by default
- the global "debug.counters on/off" directive allows the setting to be
  changed at boot, regardless of DEBUG_COUNTERS, as long as it is at
  least 1.
- the CLI "debug counters on/off" command also allows the value to be
  changed at run time, making it possible to observe a phenomenon while
  it's happening, or to disable the counters if their cost is suspected
  to be too high.
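The build-time default can be sketched like this (the variable name is
illustrative):

    #if defined(DEBUG_COUNTERS) && (DEBUG_COUNTERS >= 2)
    static int counters_running = 1;  /* enabled by default */
    #else
    static int counters_running = 0;  /* disabled by default; the global
                                       * directive and the CLI may turn
                                       * it on when DEBUG_COUNTERS >= 1 */
    #endif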
Finally, the "debug counters" command will append "(stopped)" at the end
of the CNT lines when these counters are stopped.
Note that the whole mechanism could easily be extended to all counter
types by specifying the types to apply to, but it doesn't seem useful at
all and would require the user to also type "cnt" on debug lines. This
may easily be changed in the future if it's found relevant.
COUNT_IF() is convenient but can be heavy, since some of these counters
were found to trigger often (roughly 1 counter per request on average).
This might even have an impact on large setups, due to the cost of a
shared cache line bouncing between multiple cores. For now there's no
way to disable it, so let's only enable it when DEBUG_COUNTERS is 1 or
above. A future change will make it configurable.
Till now the per-line glitches counters were only enabled with the
confusingly named DEBUG_GLITCHES (which would not turn glitches off when
disabled). Let's instead change it to DEBUG_COUNTERS and make sure it's
enabled by default (though it can still be disabled, with
-DDEBUG_COUNTERS=0, just like for DEBUG_STRICT). It will later be
expanded to cover more counters.
It's easy to leave some trailing \n or even other characters that can
mangle the debug output. Let's verify at boot time that the debug sections
are clean by checking for chars 0x20 to 0x7e inclusive. This is very simple
to do and it managed to find another one in a multi-line message:
[WARNING] (23696) : Invalid character 0x0a at position 96 in description string at src/cli.c:2516 _send_status()
This way new offending code will be spotted before being committed.
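The scan is essentially the following (output formatting and the
location variable are simplified):

    const char *p;

    for (p = desc; *p; p++) {
        unsigned char c = (unsigned char)*p;

        if (c < 0x20 || c > 0x7e) {
            ha_warning("Invalid character 0x%02x at position %d in "
                       "description string at %s\n",
                       c, (int)(p - desc), where);
            break;
        }
    }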
These ones were added by mistake during the change of the cfgparse
mechanism in 3.1, but they're corrupting the output of "debug counters"
by leaving stray ']' characters on their own lines. We could possibly
check them all once at boot, but it doesn't seem worth it.
This should be backported to 3.1.
The following error is triggered at linker invocation, when we try to
compile with USE_THREAD=0 and -O0:
make -j 8 TARGET=linux-glibc USE_LUA=1 USE_PCRE2=1 USE_LINUX_CAP=1 \
USE_MEMORY_PROFILING=1 OPT_CFLAGS=-O0 USE_THREAD=0
/usr/bin/ld: src/thread.o: warning: relocation against `thread_cpus_enabled_at_boot' in read-only section `.text'
/usr/bin/ld: src/thread.o: in function `thread_detect_count':
/home/vk/projects/haproxy/src/thread.c:1619: undefined reference to `thread_cpus_enabled_at_boot'
/usr/bin/ld: /home/vk/projects/haproxy/src/thread.c:1619: undefined reference to `thread_cpus_enabled_at_boot'
/usr/bin/ld: /home/vk/projects/haproxy/src/thread.c:1620: undefined reference to `thread_cpus_enabled_at_boot'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
make: *** [Makefile:1044: haproxy] Error 1
thread_cpus_enabled_at_boot is only available when compiled with
USE_THREAD=1, which is now the default for most targets.
In some cases, we need to recompile in mono-thread mode, thus
thread_cpus_enabled_at_boot should be protected with USE_THREAD in
thread_detect_count().
thread_detect_count() is always called during process initialization,
regardless of multi-thread support. It sets some defaults in
global.nbthread and global.nbtgroups.
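The guard can be sketched as (the local variable is illustrative):

    /* in thread_detect_count() */
    #if defined(USE_THREAD)
        cpus = thread_cpus_enabled_at_boot;
    #else
        cpus = 1;   /* mono-thread build */
    #endif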
This patch is related to GitHub issue #2916.
No need to be backported as it was added in 3.2-dev9 version.
This step is the last one needed to have a usable ACME certificate in
haproxy. It looks for the previous certificate, locks the "BIG
CERTIFICATE LOCK", copies every instance, deploys the new ones, and
removes the previous ones.
This is done in one step in a function which does not yield, so it could
be problematic if you have thousands of instances to handle.
It still lacks the rate limiting which is mandatory for use in
production, as well as more cleanup and deinit work.
Copy the original ckch_store instead of creating a new one. This allows
the ckch_conf to be inherited from the previous structure when doing a
ckchs_dup(). The ckch_conf contains the SAN for ACME.
Free the previous PKEY, since a new one is generated.
Handle new members of the ckch_conf in ckchs_dup() and
ckch_conf_clean().
This could be automated at some point since we have the description of
all types in ckch_conf_kws.