The strm_dump_to_buffer() function requires a variable stream only
for a few functions in it that do not take a const. strm_flt() is
one of them (and for good reasons since most call places want to
update filters). Here we know we won't modify the filter nor the
stream so let's directly access the strm_flt in the stream and assign
it to a const filter. This will also catch any future accidental change.
The function only works with the CLI's appctx and does most of the
convenient work of dumping a stream into a buffer (well, the trash
buffer for now). Let's split it in two so that most of the work is
done in a generic function and that the CLI-specific function relies
on that one.
The diff looks huge due to the changed indent caused by the extraction
of the switch/case statement, but when looked at using diff -b it's
small.
The HA_ANON_CLI() helper relies on the CLI appctx and prevents the code
from being made more generic. Let's extract the CLI's anon key separately
and pass it via HA_ANON_STR() instead.
Added set-timeout for frontend side of session, so it can be used to set
custom per-client timeouts if needed. Added cur_client_timeout to fetch
client timeout samples.
The commit 7f59d68fe2 ("BUG/MEDIIM: stconn: Flush output data before
forwarding close to write side") introduced a regression. When a write
timeout is detected, the shutdown is no longer forwarded. Dependig on the
channels state, it may block the processing, waiting the client or the
server leaves.
The commit above tries to avoid to truncate messages on shutdown but on
write timeout, if the channel is not empty, there is nothing more we can do
to send these data. It means the endpoint is unable to send data. In this
case, we must forward the shutdown.
This patch should be backported as far as 2.2.
As found by Coverity in issue #2273, the fix in commit e64bccab2 ("BUG/MINOR:
stream: protect stream_dump() against incomplete streams") was still not
enough, as scf/scb are still dereferenced to dump their flags and states.
This should be backported to 2.8.
If a stream is interrupted during its initialization by a panic signal
and tries to dump itself, it may cause a crash during the dump due to
scf and/or scb not being fully initialized. This may also happen while
releasing an endpoint to attach a new one. The effect is that instead
of dying on an abort, the process dies on a segv. This race is ultra-
rare but totally possible. E.g:
#0 se_fl_test (test=1, se=0x0) at include/haproxy/stconn.h:98
#1 sc_ep_test (test=1, sc=0x7ff8d5cbd560) at include/haproxy/stconn.h:148
#2 sc_conn (sc=0x7ff8d5cbd560) at include/haproxy/stconn.h:223
#3 stream_dump (buf=buf@entry=0x7ff9507e7678, s=0x7ff4c40c8800, pfx=pfx@entry=0x55996c558cb3 ' ' <repeats 13 times>, eol=eol@entry=10 '\n') at src/stream.c:2840
#4 0x000055996c493b42 in ha_task_dump (buf=buf@entry=0x7ff9507e7678, task=<optimized out>, pfx=pfx@entry=0x55996c558cb3 ' ' <repeats 13 times>) at src/debug.c:328
#5 0x000055996c493edb in ha_thread_dump_one (thr=thr@entry=18, from_signal=from_signal@entry=0) at src/debug.c:227
#6 0x000055996c493ff1 in ha_thread_dump (buf=buf@entry=0x7ff9507e7678, thr=thr@entry=18) at src/debug.c:270
#7 0x000055996c494257 in ha_panic () at src/debug.c:430
#8 ha_panic () at src/debug.c:411
(...)
#23 0x000055996c341fe8 in ssl_sock_close (conn=<optimized out>, xprt_ctx=0x7ff8dcae3880) at src/ssl_sock.c:6699
#24 0x000055996c397648 in conn_xprt_close (conn=0x7ff8c297b0c0) at include/haproxy/connection.h:148
#25 conn_full_close (conn=0x7ff8c297b0c0) at include/haproxy/connection.h:192
#26 h1_release (h1c=0x7ff8c297b3c0) at src/mux_h1.c:1074
#27 0x000055996c39c9f0 in h1_detach (sd=<optimized out>) at src/mux_h1.c:3502
#28 0x000055996c474de4 in sc_detach_endp (scp=scp@entry=0x7ff9507e3148) at src/stconn.c:375
#29 0x000055996c4752a5 in sc_reset_endp (sc=<optimized out>, sc@entry=0x7ff8d5cbd560) at src/stconn.c:475
Note that this cannot happen on "show sess" since a stream never leaves
process_stream in such an uninitialized state, thus it's really only the
crash dump that may cause this.
It should be backported to 2.8.
When the stream expiration date is computed at the end of process_stream(),
if there is no longer analyzer on the request channel, its analyse
expiration date is reset. The same is now performed on the response
channel. This way, we are sure to not inherit of an orphan expired date.
This should prevent spinning loop on process_stream().
Historically the client-fin and server-fin timeouts were made to allow
a connection closure to be effective quickly if the last data were sent
down a socket and the client didn't close, something that can happen
when the peer's FIN is lost and retransmits are blocked by a firewall
for example. This made complete sense in 1.5 for TCP and HTTP in close
mode. But nowadays with muxes, it's not done at the right layer anymore
and even the description doesn't match what is being done, because what
happens is that the stream will abort the whole transfer after it's done
sending to the mux and this timeout expires.
We've seen in GH issue 2095 that this can happen with very short timeout
values, and while this didn't trigger often before, now that the muxes
(h2 & quic) properly report an end of stream before even the first
sc_conn_sync_recv(), it seems that it can happen more often, and have
two undesirable effects:
- logging a timeout when that's not the case
- aborting the request channel, hence the server-side conn, possibly
before it had a chance to be put back to the idle list, causing
this connection to be closed and not reusable.
Unfortunately for TCP (mux_pt) this remains necessary because the mux
doesn't have a timeout task. So here we're adding tests to only do
this through an HTX mux. But to be really clean we should in fact
completely drop all of this and implement these timeouts in the mux
itself.
This needs to be backported to 2.8 where the issue was discovered,
and maybe carefully to older versions, though that is not sure at
all. In any case, using a higher timeout or removing client-fin in
HTTP proxies is sufficient to make the issue disappear.
At the end of process_stream(), if there was any change on request/response
analyzers, we now trigger a resync. It is performed if any analyzer is added
but also removed. It should help to catch internal changes on a stream and
eventually avoid it to be frozen.
There is no reason to backport this patch. But it may be good to keep an eye
on it, just in case.
In process_stream(), after request and response analyzers evaluation,
unhandled errors are processed, if any. In this case, depending on the case,
remaining request or response analyzers may be removed, unlesse the last one
about end of filters. However, auto-close is not reenabled in same
time. Thus it is possible to not forward the shutdown for a side to the
other one while no analyzer is there to do so or at least to make evolved
the situation.
In theory, it is thus possible to freeze a stream if no wakeup happens. And
it seems possible because it explain a freeze we've oberseved.
This patch could be backported to every stable versions but only after a
period of observation and if it may match an unexplained bug. It should not
lead to any loop but at worst and eventually to truncated messages.
The "show sess" command displays the stream's age in synthetic form,
and also makes it appear in the long version (show sess all). But that
last one uses the wrong origin, it uses accept_date.tv_sec instead of
accept_ts (formerly known as tv_accept). This was introduced in 1.4.2
with the long format, with commit 66dc20a17 ("[MINOR] stats socket: add
show sess <id> to dump details about a session"), while the code that
split the two variables was introduced in 1.3.16 with commit b7f694f20
("[MEDIUM] implement a monotonic internal clock"). This problem was
revealed by recent change ad5a5f677 ("MEDIUM: tree-wide: replace timeval
with nanoseconds in tv_accept and tv_request") that made this value report
random garbage, and generally emphasized by the fact that in 2.8 the two
clocks have sufficiently large an offset for such mistakes to be noticeable
early.
Arguably a difference between date and accept_date could also make sense,
to indicate if the stream had been there for more than 49 days, but this
would introduce instabilities for most sockets (including negative times)
for extremely rare cases while the goal is essentially to see how much
longer than a configured timeout a stream has been there. And that's what
other locations (including the short form) provide.
This patch could be backported but most users will never notice. In case
of backport, tv_accept.tv_sec should be used instead of accept_date.tv_sec.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
Let's get rid of timeval in storage of internal timestamps so that they
are no longer mistaken for wall clock time. These were exclusively used
subtracted from each other or to/from "now" after being converted to ns,
so this patch removes the tv_to_ns() conversion to use them natively. Two
occurrences of tv_isge() were turned to a regular wrapping subtract.
Instead of operating on {sec, usec} now we convert both operands to
ns then subtract them and convert to ms. This is a first step towards
dropping timeval from these timestamps.
Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at
all and could be removed.
SC_FL_EOS flag is added to report the end-of-stream at the SC level. It will
be used to distinguish end of stream reported by the endoint, via the
SE_FL_EOS flag, and the abort triggered by the stream, via the
SC_FL_ABRT_DONE flag.
In this patch, the flag is defined and is systematically tested everywhere
SC_FL_ABRT_DONE is tested. It should be safe because it is never set.
We can now fully rely on SC_FL_ERROR flag from the stream. The first step is
to stop to set the SE_FL_ERROR flag. Only endpoints are responsible to set
this flag. It was a design limitation. It is now fixed.
From the stream, when SE_FL_ERROR flag is tested, we now also test the
SC_FL_ERROR flag. Idea is to stop to rely on the SE descriptor to detect
errors.
Here again, it is just a flag renaming. In SC flags, there is no longer
shutdown for reads but aborts. For now this flag is set when a read0 is
detected. It is of couse not accurate. This will be changed later.
After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutw_now() is replaced by
sc_schedule_shutdown(). The request channel is replaced by the front SC and
the response is replace by the back SC.
Because shutowns for reads are now considered as aborts, the shudowns for
writes can now be considered as shutdowns. Here it is just a flag
renaming. SC_FL_SHUTW_NOW is renamed SC_FL_SHUT_WANTED.
After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutr_now() is replaced by
sc_schedule_abort(). The request channel is replaced by the front SC and the
response is replace by the back SC.
Most of calls to channel_abort() are associated to a call to
channel_auto_close(). Others are in areas where the auto close is the
default. So, it is now systematically enabled when an abort is performed on
a channel, as part of channel_abort() function.
A regression was introduced when stream's timeouts were refactored. Write
timeouts are not testing is the right order. When timeous of the front SC
are handled, we must then test the read timeout on the request channel and
the write timeout on the response channel. But write timeout is tested on
the request channel instead. On the back SC, the same mix-up is performed.
We must be careful to handle timeouts before checking channel flags. To
avoid any confusions, all timeuts are handled first, on front and back SCs.
Then flags of the both channels are tested.
It is a 2.8-specific issue. No backport needed.
There is a bug at begining of process_stream(). The SE_FL_ERROR flag is
tested against backend stream-connector's flags instead of its SE
descriptor's flags. It is an old typo, introduced when the stream-interfaces
were replaced by the conn-streams.
This patch must be backported as far as 2.6.
The purpose of this patch is only a one-to-one replacement, as far as
possible.
CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connecter. CF_ prefix is replaced by SC_FL_ one. Of course, it is not
so simple because at many places, we were testing if a channel was shut for
reads and writes in same time. To do the same, shut for reads must be tested
on one side on the SC and shut for writes on the other side on the opposite
SC. A special care was taken with process_stream(). flags of SCs must be
saved to be able to detect changes, just like for the channels.
There is a bug in a way the channels flags are checked to set clientfin or
serverfin timeout. Indeed, to set the clientfin timeout, the request channel
must be shut for reads (CF_SHUTR) or the response channel must be shut for
writes (CF_SHUTW). As the opposite, the serverfin timeout must be set when
the request channel is shut for writes (CF_SHUTW) or the response channel is
shut for reads (CF_SHUTR).
It is a 2.8-dev specific issue. No backport needed.
In stream_free() if we fail to allocate s->scb() we go to the path where
we try to free it, and it doesn't like being called with a null at all.
It's easily reproducible with -dMfail,no-cache and "tune.fail-alloc 10"
in the global section.
This must be backported to 2.6.
When we exit from process_stream(), if the task is expired, we try to handle
the stream timeouts and we resync the stream-connectors. This avoids a
useless immediate wakeup. It is not really an issue, but it is a small
improvement in edge cases.
This will be mandatory to be able to handle stream's timeouts before exiting
process_stream(). So, to not duplicate code, all this stuff is moved in a
dedicated function.
At the end of process_stream(), A BUG_ON was recently added to abort if we
leave the function with an expired task. However, it may happen if an event
prevents the timeout to be handled but nothing evolved. In this case, the
task expiration is not updated and we expect to catch the timeout on the
immediate task wakeup.
No backport needed.
The half-closed timeout is now directly retrieved from the proxy
settings. There is no longer usage for the .hcto field in the stconn
structure. So let's remove it.
Expiration dates in trace messages are now relative to now_ms. It is fairly
easier to read traces this way. And an expired date is now negative. So, it
is also easy to detect when a timeout was reached.
Becasue read and write timeout are now detected using .lra and .fsb fields
of the stream-endpoint descriptor, it is better to also use these fields to
report read and write expiration date in trace messages. Especially because
old rex and wex fields will be removed.
We stop to use the channel's expiration dates to detect read and write
timeouts on the channels. We now rely on the stream-endpoint descriptor to
do so. All the stuff is handled in process_stream().
The stream relies on 2 helper functions to know if the receives or sends may
expire: sc_rcv_may_expire() and sc_snd_may_expire().