When the stream expiration date is computed at the end of process_stream(),
if there is no longer analyzer on the request channel, its analyse
expiration date is reset. The same is now performed on the response
channel. This way, we are sure to not inherit of an orphan expired date.
This should prevent spinning loop on process_stream().
Historically the client-fin and server-fin timeouts were made to allow
a connection closure to be effective quickly if the last data were sent
down a socket and the client didn't close, something that can happen
when the peer's FIN is lost and retransmits are blocked by a firewall
for example. This made complete sense in 1.5 for TCP and HTTP in close
mode. But nowadays with muxes, it's not done at the right layer anymore
and even the description doesn't match what is being done, because what
happens is that the stream will abort the whole transfer after it's done
sending to the mux and this timeout expires.
We've seen in GH issue 2095 that this can happen with very short timeout
values, and while this didn't trigger often before, now that the muxes
(h2 & quic) properly report an end of stream before even the first
sc_conn_sync_recv(), it seems that it can happen more often, and have
two undesirable effects:
- logging a timeout when that's not the case
- aborting the request channel, hence the server-side conn, possibly
before it had a chance to be put back to the idle list, causing
this connection to be closed and not reusable.
Unfortunately for TCP (mux_pt) this remains necessary because the mux
doesn't have a timeout task. So here we're adding tests to only do
this through an HTX mux. But to be really clean we should in fact
completely drop all of this and implement these timeouts in the mux
itself.
This needs to be backported to 2.8 where the issue was discovered,
and maybe carefully to older versions, though that is not sure at
all. In any case, using a higher timeout or removing client-fin in
HTTP proxies is sufficient to make the issue disappear.
At the end of process_stream(), if there was any change on request/response
analyzers, we now trigger a resync. It is performed if any analyzer is added
but also removed. It should help to catch internal changes on a stream and
eventually avoid it to be frozen.
There is no reason to backport this patch. But it may be good to keep an eye
on it, just in case.
In process_stream(), after request and response analyzers evaluation,
unhandled errors are processed, if any. In this case, depending on the case,
remaining request or response analyzers may be removed, unlesse the last one
about end of filters. However, auto-close is not reenabled in same
time. Thus it is possible to not forward the shutdown for a side to the
other one while no analyzer is there to do so or at least to make evolved
the situation.
In theory, it is thus possible to freeze a stream if no wakeup happens. And
it seems possible because it explain a freeze we've oberseved.
This patch could be backported to every stable versions but only after a
period of observation and if it may match an unexplained bug. It should not
lead to any loop but at worst and eventually to truncated messages.
The "show sess" command displays the stream's age in synthetic form,
and also makes it appear in the long version (show sess all). But that
last one uses the wrong origin, it uses accept_date.tv_sec instead of
accept_ts (formerly known as tv_accept). This was introduced in 1.4.2
with the long format, with commit 66dc20a17 ("[MINOR] stats socket: add
show sess <id> to dump details about a session"), while the code that
split the two variables was introduced in 1.3.16 with commit b7f694f20
("[MEDIUM] implement a monotonic internal clock"). This problem was
revealed by recent change ad5a5f677 ("MEDIUM: tree-wide: replace timeval
with nanoseconds in tv_accept and tv_request") that made this value report
random garbage, and generally emphasized by the fact that in 2.8 the two
clocks have sufficiently large an offset for such mistakes to be noticeable
early.
Arguably a difference between date and accept_date could also make sense,
to indicate if the stream had been there for more than 49 days, but this
would introduce instabilities for most sockets (including negative times)
for extremely rare cases while the goal is essentially to see how much
longer than a configured timeout a stream has been there. And that's what
other locations (including the short form) provide.
This patch could be backported but most users will never notice. In case
of backport, tv_accept.tv_sec should be used instead of accept_date.tv_sec.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
Let's get rid of timeval in storage of internal timestamps so that they
are no longer mistaken for wall clock time. These were exclusively used
subtracted from each other or to/from "now" after being converted to ns,
so this patch removes the tv_to_ns() conversion to use them natively. Two
occurrences of tv_isge() were turned to a regular wrapping subtract.
Instead of operating on {sec, usec} now we convert both operands to
ns then subtract them and convert to ms. This is a first step towards
dropping timeval from these timestamps.
Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at
all and could be removed.
SC_FL_EOS flag is added to report the end-of-stream at the SC level. It will
be used to distinguish end of stream reported by the endoint, via the
SE_FL_EOS flag, and the abort triggered by the stream, via the
SC_FL_ABRT_DONE flag.
In this patch, the flag is defined and is systematically tested everywhere
SC_FL_ABRT_DONE is tested. It should be safe because it is never set.
We can now fully rely on SC_FL_ERROR flag from the stream. The first step is
to stop to set the SE_FL_ERROR flag. Only endpoints are responsible to set
this flag. It was a design limitation. It is now fixed.
From the stream, when SE_FL_ERROR flag is tested, we now also test the
SC_FL_ERROR flag. Idea is to stop to rely on the SE descriptor to detect
errors.
Here again, it is just a flag renaming. In SC flags, there is no longer
shutdown for reads but aborts. For now this flag is set when a read0 is
detected. It is of couse not accurate. This will be changed later.
After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutw_now() is replaced by
sc_schedule_shutdown(). The request channel is replaced by the front SC and
the response is replace by the back SC.
Because shutowns for reads are now considered as aborts, the shudowns for
writes can now be considered as shutdowns. Here it is just a flag
renaming. SC_FL_SHUTW_NOW is renamed SC_FL_SHUT_WANTED.
After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutr_now() is replaced by
sc_schedule_abort(). The request channel is replaced by the front SC and the
response is replace by the back SC.
Most of calls to channel_abort() are associated to a call to
channel_auto_close(). Others are in areas where the auto close is the
default. So, it is now systematically enabled when an abort is performed on
a channel, as part of channel_abort() function.
A regression was introduced when stream's timeouts were refactored. Write
timeouts are not testing is the right order. When timeous of the front SC
are handled, we must then test the read timeout on the request channel and
the write timeout on the response channel. But write timeout is tested on
the request channel instead. On the back SC, the same mix-up is performed.
We must be careful to handle timeouts before checking channel flags. To
avoid any confusions, all timeuts are handled first, on front and back SCs.
Then flags of the both channels are tested.
It is a 2.8-specific issue. No backport needed.
There is a bug at begining of process_stream(). The SE_FL_ERROR flag is
tested against backend stream-connector's flags instead of its SE
descriptor's flags. It is an old typo, introduced when the stream-interfaces
were replaced by the conn-streams.
This patch must be backported as far as 2.6.
The purpose of this patch is only a one-to-one replacement, as far as
possible.
CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connecter. CF_ prefix is replaced by SC_FL_ one. Of course, it is not
so simple because at many places, we were testing if a channel was shut for
reads and writes in same time. To do the same, shut for reads must be tested
on one side on the SC and shut for writes on the other side on the opposite
SC. A special care was taken with process_stream(). flags of SCs must be
saved to be able to detect changes, just like for the channels.
There is a bug in a way the channels flags are checked to set clientfin or
serverfin timeout. Indeed, to set the clientfin timeout, the request channel
must be shut for reads (CF_SHUTR) or the response channel must be shut for
writes (CF_SHUTW). As the opposite, the serverfin timeout must be set when
the request channel is shut for writes (CF_SHUTW) or the response channel is
shut for reads (CF_SHUTR).
It is a 2.8-dev specific issue. No backport needed.
In stream_free() if we fail to allocate s->scb() we go to the path where
we try to free it, and it doesn't like being called with a null at all.
It's easily reproducible with -dMfail,no-cache and "tune.fail-alloc 10"
in the global section.
This must be backported to 2.6.
When we exit from process_stream(), if the task is expired, we try to handle
the stream timeouts and we resync the stream-connectors. This avoids a
useless immediate wakeup. It is not really an issue, but it is a small
improvement in edge cases.
This will be mandatory to be able to handle stream's timeouts before exiting
process_stream(). So, to not duplicate code, all this stuff is moved in a
dedicated function.
At the end of process_stream(), A BUG_ON was recently added to abort if we
leave the function with an expired task. However, it may happen if an event
prevents the timeout to be handled but nothing evolved. In this case, the
task expiration is not updated and we expect to catch the timeout on the
immediate task wakeup.
No backport needed.
The half-closed timeout is now directly retrieved from the proxy
settings. There is no longer usage for the .hcto field in the stconn
structure. So let's remove it.
Expiration dates in trace messages are now relative to now_ms. It is fairly
easier to read traces this way. And an expired date is now negative. So, it
is also easy to detect when a timeout was reached.
Becasue read and write timeout are now detected using .lra and .fsb fields
of the stream-endpoint descriptor, it is better to also use these fields to
report read and write expiration date in trace messages. Especially because
old rex and wex fields will be removed.
We stop to use the channel's expiration dates to detect read and write
timeouts on the channels. We now rely on the stream-endpoint descriptor to
do so. All the stuff is handled in process_stream().
The stream relies on 2 helper functions to know if the receives or sends may
expire: sc_rcv_may_expire() and sc_snd_may_expire().
The stream endpoint descriptor now owns two date, lra (last read activity) and
fsb (first send blocked).
The first one is updated every time a read activity is reported, including data
received from the endpoint, successful connect, end of input and shutdown for
reads. A read activity is also reported when receives are unblocked. It will be
used to detect read timeouts.
The other one is updated when no data can be sent to the endpoint and reset
when some data are sent. It is the date of the first send blocked by the
endpoint. It will be used to detect write timeouts.
Helper functions are added to report read/send activity and to retrieve lra/fsb
date.
Read and write timeouts (.rto and .wto) are now replaced by an unique
timeout, call .ioto. Since the recent refactoring on channel's timeouts,
both use the same value, the client timeout on client side and the server
timeout on the server side. Thus, this part may be simplified. Now it
represents the I/O timeout.
These timers are related to the I/O. Thus it is logical to move them into
the SE descriptor. The patch is a bit huge but it is just a
replacement. However it is error-prone.
From the stconn or the stream, helper functions are used to get, set or
reset these timers. This simplify the timers manipulations.
Read and write timeouts concerns the I/O. Thus, it is logical to move it into
the stconn. At the end, the stream is responsible to detect the timeouts. So
it is logcial to have these values in the stconn and not in the SE
descriptor. But it may change depending on the recfactoring.
So, now:
* scf->rto is used instead of req->rto
* scf->wto is used instead of res->wto
* scb->rto is used instead of res->rto
* scb->wto is used instead of req->wto
This patch removes CF_READ_ERROR and CF_WRITE_ERROR flags. We now rely on
SE_FL_ERR_PENDING and SE_FL_ERROR flags. SE_FL_ERR_PENDING is used for write
errors and SE_FL_ERROR for read or unrecoverable errors.
When a connection error is reported, SE_FL_ERROR and SE_FL_EOS are now set and a
read event and a write event are reported to be sure the stream will properly
process the error. At the stream-connector level, it is similar. When an error
is reported during a send, a write event is triggered. On the read side, nothing
more is performed because an error at this stage is enough to wake the stream
up.
A major change is brought with this patch. We stop to check flags of the
ooposite channel to report abort or timeout. It also means when an read or
write error is reported on a side, we no longer update the other side. Thus
a read error on the server side does no long lead to a write error on the
client side. This should ease errors report.
This flag was introduced in 1.3 to fix a design issue. It was untouch since
then but there is no reason to still have this trick. Note it could be good
to review what happens in HTTP with the server is waiting for the end of the
request. It could be good to be sure a client timeout is always reported.