The previous commit contains a bug in some COUNT_IF() relying on the pipe
inside the IOBUF. We must take care to have a pipe before checking its size.
No backport needed.
Debug counters were added on all connection error when pending data remain
blocked in the input or ouput buffers. The same is performed when the H1C is
released, when the connection is closed and when a timeout is reached. Idea
is to be able to count all cases where data are lost, especially the
outgoing ones.
gcc-4.8 is unhappy with the cfg_file initialization:
src/flt_spoe.c: In function 'parse_spoe_flt':
src/flt_spoe.c:2202:9: warning: missing braces around initializer [-Wmissing-braces]
struct cfgfile cfg_file = {0};
^
src/flt_spoe.c:2202:9: warning: (near initialization for 'cfg_file.list') [-Wmissing-braces]
This is due to the embedded list member. Initializing it to empty like
we do almost everywhere else makes it happy. No backport is needed as
this was changed in 3.1-dev5 only.
As described in GH #2765, there were situations where http connections
would be re-used for requests to different endpoints, which is obviously
unexpected. In GH #2765, this occured with httpclient and UNIX socket
combination, but later code analysis revealed that while disabling http
reuse on httpclient proxy helped, it didn't fix the underlying issue since
it was found that conn_calculate_hash_sockaddr() didn't take into account
families such as AF_UNIX or AF_CUST_SOCKPAIR, and because of that the
sock_addr part of the connection wasn't hashed.
To properly fix the issue, let's explicly handle UNIX (both regular and
ABNS) and AF_CUST_SOCKPAIR families, so that the destination address is
properly hashed. To prevent this bug from re-appearing: when the family
isn't known, instead of doing nothing like before, let's fall back to a
generic (unoptimal) hashing which hashes the whole sockaddr_storage struct
As a workaround, http-reuse may be disabled on impacted proxies.
(unfortunately this doesn't help for httpclient since reuse policy
defaults to safe and cannot be modified from the config)
It should be backported to all stable versions.
Shout out to @christopherhibbert for having reported the issue and
provided a trivial reproducer.
[ada: prior to 3.0, ctx adjt is required because conn_hash_update()'s
prototype is slightly different]
Sometimes it would be desirable to include some debugging output only
under certain conditions, but the end of the transfer is too late to
apply some rules.
Here we take the approach of making a converter ("when") that takes a
condition among an arbitrary list, and decides whether or not to let
the input sample pass through or not based on the condition. This
allows for example to log debugging information only when an error
was encountered during the processing (sort of an extension of
dontlog-normal). The conditions are quite limited (stopping, error,
normal, toapplet, forwarded, processed) and can be negated. The
converter can also be chained to use more complex conditions.
A suggested example will be:
# log "dbg={-}" when fine, or "dbg={... debug info ...}" on error:
log-format "$HAPROXY_HTTP_LOG_FMT dbg={%[bs.debug_str,when(!normal)]}"
The idea here is to record how many times a filter is being called on a
stream. We're incrementing the same counter all along, regardless of the
type of event, since the purpose is essentially to detect one that might
be misbehaving. The number of calls is reported in "show sess all" next
to the filter name. It may also help detect suboptimal processing. For
example compressing 1GB shows 138k calls to the compression filter, which
is roughly two calls per buffer. Maybe we wake up with incomplete buffers
and compress less. That's left for a future analysis.
Process_stream() is a complex function and a few times some lopos were
either witnessed or suspected. Each time this happens it's extremely
difficult to figure why because it involves combinations of analysers,
filters, errors etc.
Let's at least maintain a set of 4 counters per stream that report the
number of times we've been through each of the 4 most important blocks
(stconn changes, request analysers, response analysers, and propagation
of changes down). These ones are stored in the stream and reported in
"show sess all", just like they will be reported in panic dumps.
Now it is possible to have info about front and back H1 multiplexer. For instance:
<134>Oct 22 18:10:46 haproxy[3841864]: 127.0.0.1:44280 [22/Oct/2024:18:10:43.265] front-http back-http/www 0/0/-1/-1/3082 503 217 - - SC-- 1/1/0/0/3 0/0 "GET / HTTP/1.1" fs=< h1s=0x13b6f10 h1s.flg=0x14010 .sd.flg=0x50404601 .req.state=MSG_DONE .res.state=MSG_DONE .meth=GET status=503 .sd.flg=0x50404601 .sc.flg=0x00034482 .sc.app=0x11e4c30 .subs=(nil) h1c.flg=0x0 .sub=0 .ibuf
=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x1337d10 .exp=<NEVER> conn.flg=0x80000300> bs=< h1s=0x13bb400 h1s.flg=0x100010 .sd.flg=0x10400001 .req.state=MSG_RQBEFORE .res.state=MSG_RPBEFORE .meth=UNKNOWN status=0 .sd.flg=0x10400001 .sc.flg=0x0003c007 .sc.app=0x11e4c30 .subs=(nil) h1c.flg=0x80000000 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x12ba610 .exp=<NEVER> conn.flg=0x5c0300>
The have this log message, the log-format must be set to:
log-format "$HAPROXY_HTTP_LOG_FMT fs=<%[fs.debug_str]> bs=<%[bs.debug_str]>"
Debug counters are added to track errors about wrong the payload length
during the message formatting (on the sending path). Aborts are also
concerned. connection shutdowns and errors while the end of the message was
not reached are now tracked. On the sending path, shutdown performed while
all the message was not forwarded are tracked too.
Not all aborts are tracked for now but only those a bit ambiguous. Mainly,
aborts during the data forwarding are concerned. Those triggered during the
request or the response analysis are easier to analyze with the stream
termination state.
When abortonclose option is enabled on the backend, at the SC level, we must
still pretend the SE have more data to deliver to be able to receive the
EOS. It must be performed at 2 places:
* When the backend is set and the connection is requested. It is when the
option is seen for the first time.
* After a receive attempt, if the EOI flag is set on the sedesc.
Otherwise, when an abort is detected by the mux, the SC is not
notified.
This patch should fix the issue #2764.
This bug probably exists in all stable version but is only visible since
bca5e1423 ("OPTIM: stconn: Don't pretend mux have more data to deliver on
EOI/EOS/ERROR"). So I suggest to not backport it for now, except if the commit
above is backported.
When data are sent via the zero-copy data forwarding, in h2_done_ff, we must
be sure to remove the H2 stream from the send list if something is send. It
was only performed if no blocking condition was encountered. But we must
also do it if something is sent. Otherwise the transfer may be blocked till
timeout.
This patch must be backported as far as 2.9.
During the zero-copy data forwarding, the caller specify the maximum amount
of data the producer may push. However, the HTML stats applet does not use
it and can fill all the free space in the buffer. It is especially an issue
when the consumer is limited by a flow control, like the H2. Because we may
emit too large DATA frame in this case. It is especially visible with big
buffer (for instance 32k).
In the early age or zero-copy data forwarding, the caller was responsible to
pass a properly resized buffer. And during the different refactoring steps,
this has changed but the HTML stats applet was not updated accordingly.
To fix the bug, the buffer used to dump the HTML page is resized to be sure
not too much data are dumped.
This patch should solve the issue #2757. It must be backported to 3.0.
Issuing "debug dev counters" on the CLI will now scan all existing
counters, and report their count, type, location, function name, the
condition and an optional comment passed to the macro.
The command takes a number of arguments:
- "show": this is the default, it will just list the counters
- "reset": will reset the matching counters instead of listing them
- "all": by default, only non-zero counters are listed. With "all",
they are all listed
- "bug": restrict the reset or dump to counters of type "BUG" (BUG_ON usually)
- "chk": restrict the reset or dump to counters of type "CHK" (CHECK_IF)
- "cnt": restrict the reset or dump to counters of type "CNT" (COUNT_IF)
The types may be cumulated, and the option entered in any order. Here's
an example of the output of "debug dev counters show all bug":
Count Type Location function(): "condition" [comment]
0 BUG ring.h:114 ring_dup(): "max > ring_size(dst)"
0 BUG vecpair.h:223 vp_getblk_ofs(): "ofs >= v1->len + v2->len"
0 BUG buf.h:395 b_add(): "b->data + count > b->size"
0 BUG buf.h:106 b_room(): "b->data > b->size"
0 BUG task.h:328 _task_queue(): "(ulong)caller & 1"
0 BUG task.h:324 _task_queue(): "task->tid != tid"
0 BUG task.h:313 _task_queue(): "(ulong)caller & 1"
(...)
This is expected to be convenient combined with the use and abuse of
COUNT_IF() at select locations.
This macro works exactly like BUG_ON() except that it never logs anything
nor crashes, it only implements an atomic counter that is incremented on
every call. This can be used to count a number of unlikely events that are
worth checking at run time on setups showing unusual and unreproducible
behaviors.
These macros do not always kill the process, and sometimes it would be
nice to know if some match or not, and how many times (especially for the
CHECK_IF one).
This commit adds a new section "dbg_cnt" made of structs that contain
function name, file name, line number, check type, condition and match
count. A newe macro __DBG_COUNT() adds one to the counter, and is placed
inside _BUG_ON() and _BUG_ON_ONCE(). It's worth noting that the exact
type of the check is not very precise but in practice we don't care,
as most checks will cause the process to die anyway unless they're of
type _BUG_ON_ONCE() (used by CHECK_IF by default).
All of this is limited to !defined(USE_OBSOLETE_LINKER) because we're
creating a section, thus we need a modern linker to be able to scan
this section later. Doing so adds ~50kB to the executable due to the
~1266 BUG_ON() and others placed there. That's not huge in comparison
to the visibility it can provide.
The BUG_ON() macros are made of two levels so as to resolve the condition
to a string. However this doesn't offer much flexibility for performing
other operations when the condition is validated, so let's adjust them so
that the condition is checked in the outer macro and the operations are
performed in the inner one.
A stream may be shut without any HTX EOM reported to report a proper
closure. This is the case for QCS instances flagged with
QC_SF_UNKNOWN_PL_LENGTH. Shut is performed with an empty FIN emission
instead of a RESET_STREAM. This has been implemented since the following
patch :
24962dd178
BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length
However, in case of HTTP/3, an empty FIN should only be done after a
full message is emitted, which requires at least a HEADERS frame. If an
empty FIN is emitted without it, client may interpret this as invalid
and close the connection. To prevent this, fallback to a RESET_STREAM
emission if no data were emitted on the stream.
This was reproduced using ngtcp2-client with 10% loss (-r 0.1) on a
remote host, with httpterm request "/?s=100k&C=1&b=0&P=400". An error
ERR_H3_FRAME_UNEXPECTED is returned by ngtcp2-client when the bug
occurs.
Note that this change is incomplete. The message validity depends solely
on the application protocol in use. As such, a new app_ops callback
should be implemented to ensure the stream is closed accordingly.
However, this first patch ensures that at least HTTP/3 case is valid
while keeping a minimal backport process.
This should be backported up to 2.8.
An empty STREAM frame can be emitted by QUIC MUX to notify about a
delayed FIN when there is no data left to transmit. This requires a
tedious comparison on stream offset in qmux_ctrl_send() to ensure an
empty stream frame is not always considered as retransmitted, which is
necessary to locally close the QCS instance.
Simplify this by unsubscribe from streamdesc layer when the QCS is
locally closed on FIN transmission notification. This prevents all
future retransmitted frames to be reported to the QCS instance,
especially any potentially retransmitted empty FIN.
Before this patch, when wrong argument was provided in the configuration for
mworker-max-reloads keyword, parser shows these errors below on the stderr:
[WARNING] (1820317) : config : parsing [haproxy.cfg:154] : (null)parsing [haproxy.cfg:154] : 'mworker-max-reloads' expects an integer argument.
In a case, when by mistake two arguments were provided instead of one, this has
also triggered a buggy error message:
[ALERT] (1820668) : config : parsing [haproxy.cfg:154] : 'mworker-max-reloads' cannot handle unexpected argument '45'.
[WARNING] (1820668) : config : parsing [haproxy.cfg:154] : (null)
So, as 'mworker-max-reloads' is parsed in discovery mode by master process
let's align now its parser with all others, which could be called for this
mode. Like this in cases, when there are too many args or argument isn't a
valid integer we return proper error codes to global section parser and
messages are formated properly.
This fix should be backported in all stable versions.
Initially we agreed to split builds into "latest" for development branch
and fixed 22.04 for stable branches. It got broken when "latest" label migrated
from ubuntu-22 to ubuntu-24 ... because of build cache. Cache key is built using
runner label, it was not prepared to use the same "latest" cache from ubuntu 22
on ubuntu 24. To make things clear, let's stick explicitely to ubuntu 24.
PCRE2 is recommended, PCRE was chosen for no reason. GHA Ubuntu 22 images include both libs,
but recent Ubuntu 24 does not. Let us prepare for Ubuntu 24
Commit cf3fe1eed ("MINOR: mux-h2/traces: print the size of the DATA
frames") added the size of the DATA frame to the traces. Unfortunately
it uses ullong instead of ulong to cast a pointer, which breaks the
build on 32-bit platforms. Let's just switch it to ulong which works
on both.
One main problem with panic dumps is that they're filling the dumping
thread's trash, and that the global thread_dump_buffer is too small to
catch enough of them.
Here we're proceeding differently. When dumping threads for a panic, we're
passing the magic value 0x2 as the buffer, and it will instruct the target
thread to allocate its own buffer using get_trash_chunk() (which is signal
safe), so that each thread dumps into its own buffer. Then the thread will
wait for the buffer to be consumed, and will assign its own thread_dump_buffer
to it. This way we can simply dump all threads' buffers from gdb like this:
(gdb) set $t=0
while ($t < global.nbthread)
printf "%s\n", ha_thread_ctx[$t].thread_dump_buffer.area
set $t=$t+1
end
For now we make it wait forever since it's only called on panic and we
want to make sure the thread doesn't leave and continues to use that trash
buffer or do other nasty stuff. That way the dumping thread will make all
of them die.
This would be useful to backport to the most recent branches to help
troubleshooting. It backports well to 2.9, except for some trivial
context in tinfo-t.h for an updated comment. 2.8 and older would also
require TAINTED_PANIC. The following previous patches are required:
MINOR: debug: make mark_tainted() return the previous value
MINOR: chunk: drop the global thread_dump_buffer
MINOR: debug: split ha_thread_dump() in two parts
MINOR: debug: slightly change the thread_dump_pointer signification
MINOR: debug: make ha_thread_dump_done() take the pointer to be used
MINOR: debug: replace ha_thread_dump() with its two components
At the few places we were calling ha_thread_dump(), now we're
calling separately ha_thread_dump_fill() and ha_thread_dump_done()
once the data are consumed.
This will allow the caller to decide whether to definitely clear the
pointer and release the thread, or to leave it unlocked so that it's
easy to analyse from the struct (the goal will be to use that in panic()
so that cores are easy to analyse).
Now the thread_dump_pointer is returned ORed with 1 once done, or NULL
when cancelled (for now noone cancels). The goal will be to permit
the callee to provide its own pointer.
The ha_thread_dump_fill() function now returns the buffer pointer that
was used (without OR 1) or NULL, for ease of use from the caller.
We want to have a function to trigger the dump and another one to wait
for it to be completed. This will be important to permit panic dumps to
be done on local threads. For now this does not change anything, as the
function still calls the two new functions one after the other.
This variable is not very useful and is confusing anyway. It was mostly
used to detect that a panic dump was still in progress, but we can now
check mark_tainted() for this. The pointer was set to one of the dumping
thread's trash chunks. Let's temporarily continue to copy the dumps to
that trash, we'll remove it later.
Since mark_tainted() uses atomic ops to update the tainted status, let's
make it return the prior value, which will allow the caller to detect
if it's the first one to set it or not.
As mentioned in previous commit, b_peek_ofs() performs a wrapping check
but is often called with ofs == 0 as a constant. We can detect this case
with __builtin_const_p() so it makes sense to use it. A test shows a size
reduction of about 320 bytes, which is not much, but it happens in hot code
paths, and each 16 bytes reduction indicates an eliminated conditional
branch.
Some clear winners are ci_getblk_nc() (-48 bytes), h2c_dec_hdrs (-141B),
h1_copy_msg_data (-124B), tcpcheck_spop_expect_hello (-80B),
h1_parse_msg_data (-44B). These ones will definitely benefit from doing
less conditional jumps.
The function is an exact copy of b_peek_varint() with ofs==0 and doing a
b_del() at the end. We can simply call that other one and delete the
contents. It turns out that the code is bigger with this change because
b_peek_varint() passes its offset to b_peek() which performs a wrapping
check. When ofs==0 the wrapping cannot happen, but there's no real way
to tell that to the compiler. Instead conditioning the if() in b_peek()
with (!__builtin_constant_p(ofs) || ofs) does the job, but it's not worth
it at the moment since we have no users of b_get_varint() for now. Let's
just stick to the simple normal code.
Some large functions were moved to buf.c by commit ac66df4e2 ("REORG:
buffers: move some of the heavy functions from buf.h to buf.c"). However,
as found by Amaury, haring doesn't build anymore. Upon close inspection,
b_getblk_nc() isn't that big since it's very much inlinable, and a part
of its apparently large size comes from the BUG_ON_HOT() that were
implemented. Regarding b_peek_varint(), it doesn't have any dependency
and is used only at 4 places in the DNS code, so its loop will not have
big impacts, and the rest around can be optimised away by the compiler
so it remains relevant to keep it inlined. Also it can serve as a base
to deduplicate the code in b_get_varint().
No backport needed.
The ARGT_ID argument type may now be used to set a custom resolve
function in order to help resolve the argument string value. If the
custom resolve function is not set, the behavior is the same as of
type ARGT_STR.
A useless BUG_ON() statement was let in a conditional block that already
checks that the condition cannot be met within the block. Remove the
useless BUG_ON()
"option forwarded" provides a convenient way to automatically insert
rfc7239 forwarded header to requests sent to servers.
On the other hand, manually crafting the header is quite complicated due
to specific formatting rules that must be followed as per rfc7239.
However, sometimes it may be necessary to craft the header manually, for
instance if it has to be conditional or based on parameters that "option
forwarded" doesn't provide. To ease this task, in this patch we implement
rfc7239_nn and rfc7239_np which are respectively meant to craft nodename:
nodeport values, specifically intended to manually build rfc7239 'for'
and 'by' header fields while ensuring rfc7239 compliancy.
Example:
# build RFC-compliant 7239 header:
http-request set-var-fmt(txn.forwarded) "for=\"%[ipv6(::1),rfc7239_nn]:%[str(8888),rfc7239_np]\";host=\"haproxy.org\";proto=http"
# check RFC-compliancy:
http-request set-var(txn.test) "var(txn.forwarded),debug(ok,stderr),rfc7239_is_valid,debug(ok,stderr)"
# stderr output:
# [debug] ok: type=str <for="[::1]:_8888";host="haproxy.org";proto=http>
# [debug] ok: type=bool <1>
See documentation for more info and examples.
This issue came with this commit:
f627b92 BUG/MEDIUM: quic: always validate sender address on 0-RTT
and could be easily reproduced with picoquic QUIC client with -Q option
which splits a big ClientHello TLS message into two Initial datagrams.
A second condition must be fulfilled to reprodue this issue: picoquic
must not send the token provided by haproxy (NEW_TOKEN). To do that,
haproxy must be patched to prevent it to send such tokens.
Under these conditions, if haproxy has enough time to reply to the first Initial
datagrams, when it receives the second Initial datagram it sends a Retry paquet.
Then the client ignores the Retry paquet as mentionned by RFC 9000:
17.2.5.2. Handling a Retry Packet
A client MUST accept and process at most one Retry packet for each connection
attempt. After the client has received and processed an Initial or Retry packet
from the server, it MUST discard any subsequent Retry packets that it receives.
On its side, haproxy has closed the connection. When it receives the second
Initial datagram, it open a new connection but with Initial packets it
cannot decrypt (wrong ODCID) leaving the client without response.
To fix this, as the aim of the token (NEW_TOKEN) sent by haproxy is to validate
the peer address, in place of closing the connection when no token was received
for a 0RTT connection, one leaves this validation to the handshake process.
Indeed, the peer adress is validated during the handshake when a valid handshake
packet is received by the listener. But as one does not want haproxy to process
0RTT data when no token was received, one does not accept the connection before
the successful handshake completion. In addition to this, the 0RTT packets
are not released after successful handshake completion when no token was received
to leave a chance to haproxy to process these 0RTT data in such case (see
quic_conn_io_cb()).
Must be backported as far as 2.9.
Tokens are sent when opening a connection, just after the handshake, to
be possibly reused by the peer for the next connection. They are used
to validate the peer address during the 0RTT connection openings.
But there is no reason to reserve this feature to 0RTT connections.
This patch modifies quic_build_post_handshake_frames() to do so.
This bug came with this commit:
f627b92 BUG/MEDIUM: quic: always validate sender address on 0-RTT
If an error happens in quic_build_post_handshake_frames() during the
code exexuted for th NEW_TOKEN frame allocation, some could leak because
of the wrong label used to interrupt this function asap.
Replace the "goto leave" by "goto err" to deallocated such frames to fix
this issue.
Must be backported as far as 2.9.
When a filter is registered on the data, it means it may change the payload
length by rewritting data. It means consumers of the message cannot trust the
expected length of payload as announced by the producer. The commit 8bd835b2d2
("MEDIUM: filters/htx: Don't rely on HTX extra field if payload is filtered")
was pushed to solve this issue. When the HTTP payload of a message is filtered,
the extra field is set to 0 to be sure it will never be used by error by any
consumer. However, it is not enough.
Indeed, the filters must be called before fowarding some data. They cannot be
by-passed. But if a consumer is unable to flush the HTX message, some outgoing
data can remain blocked in the channel's buffer. If some new data are then
pushed because there is some room in the channel's buffe, the producer will set
the HTX extra field. At this stage, if the consumer is unblocked and can send
again data, it is possible to call it to forward outgoing data blocked in the
channel's buffer before waking the stream up to filter new input data. It is the
purpose of the data fast-forwarding. In this case, the HTX extra field will be
seen by the consumer. It is unexpected and leads to undefined behavior.
One consequence of this bug is to perform a wrong chunking on compressed
messages, leading to processing errors at the end of the message, reported as
"ID--" in logs.
To fix the bug, a HTX flag is added to state the payload of the current HTX
message is altered. When this flag is set (HTX_FL_ALTERED_PAYLOAD), the HTX
extra field must not be trusted. And to keep things simple, when this flag is
set, the HTX extra field is automatically set to 0 when the HTX message is
loaded, in htxbuf() function.
It is probably the less intrusive way to fix the bug for now. But this part must
be reviewed to save meta-info of the HTX message outside of the message itself.
This commit should solve the issue #2741. It must be backported as far as 2.9.
In sc_notify() function, the consumer side of the SC is tested to verify if
we must perform a shutdown on the endpoint. To do so, no output data must be
present in the buffer and in the iobuf. However, there is a bug here, the
iobuf of the opposite SC is tested instead of the one of the current SC. So
a shutdown can be performed on the endpoint while there are still output
data in the iobuf that must be sent. Concretely, it can only be data blocked
in a pipe.
Because of this bug, data blocked in the pipe will be never sent. I've not
tested but I guess this may block the stream during the client or server
timeout.
This patch must be backported as far as 2.9.
If a parsing error is reported by the mux on the response payload, a proxy
error (PRXCOND) must be reported instead of a server abort (SRVCL). Because
of this bug, inavlid response may are reported as "SD--" or "SL--" in logs
instead of "PD--" or "PL--".
This patch must be backported to all stable versions.
When the stream is shut down, some tests are performed to know if the
connection must also be closed or not. There are trace messages for all
cases, except for the default one: Abort or close-mode. Thanks to this
patch, there is now a message too in this case.
Info about the SD iobuf are now dumped in trace messages when a stream send
event is processed. It is a useful information to debug zero-copy forwarding
issues.