By default, backend connections are attached to a server instance. This
allows to implement connection reuse. However, in some particular cases,
connection cannot be shared accross several clients. These connections
are considered and private and are attached to the session instance
instead.
These private connections are also indexed by the target server to not
mix them. All of this is implemented via a dedicated structure
previously named struct sess_srv_list.
Rename it to better reflect its usage to struct sess_priv_conns. Also
rename its internal members and all of the associated functions.
This commit is only a renaming, thus no functional impact is expected.
Till now it was still needed to write rules to eliminate bad behaving
H2 clients, while most of the time it would be desirable to just be able
to set a threshold on the level of anomalies on a connection.
This is what this patch does. By setting a glitches threshold for frontend
and backend, it allows to automatically turn a connection to the error
state when the threshold is reached so that the connection dies by itself
without having to write possibly complex rules.
One subtlety is that we still have the error state being exclusive to the
parser's state so this requires the h2c_report_glitches() function to return
a status indicating if the threshold was reached or not so that processing
can instantly stop and bypass the state update, otherwise the state could
be turned back to a valid one (e.g. after parsing CONTINUATION); we should
really contemplate the possibility to use H2_CF_ERROR for this. Fortunately
there were very few places where a glitch was reported outside of an error
path so the changes are quite minor.
Now by setting the front value to 1000, a client flooding with short
CONTINUATION frames is instantly stopped.
The function aims at centralizing counter measures but due to the fact
that it only increments the counter by one unit, sometimes it was not
used and the value was calculated directly. Let's pass the increment in
argument so that it can be used everywhere.
2 return values are specified in the h2s_make_data() function comment. Both
are more or less equivalent but the later is probably more accurate. So,
keep the right one and remove the other one.
This patch should fix the issue #2175.
Global options to disable for zero-copy forwarding are now tested outside
callbacks responsible to perform the forwarding itself. It is cleaner this
way because we don't try at all zero-copy forwarding if at least one side
does not support it. It is equivalent to what was performed before, but it
is simplier this way.
It is unused for now, but the muxes announce their support of the zero-copy
forwarding on consumer side. All muxes, except the fgci one, are supported
it.
We now update the session's tracked counters with the observed glitches.
In order to avoid incurring a high cost, e.g. if many small frames contain
issues, we batch the updates around h2_process_demux() by directly passing
the difference. Indeed, for now all functions that increment glitches are
called from h2_process_demux(). If that were to change, we'd just need to
keep the value of the last synced counter in the h2c struct instead of the
stack.
The regtest was updated to verify that the 3rd client that does not cause
issue still sees the counter resulting from client 2's mistakes. The rate
is also verified, considering it shouldn't fail since the period is very
long (1m).
It's quite uncommon for a client to decide to change the connection's
initial window size after the settings exchange phase, unless it tries
to increase it. One of the impacts depending is that it updates all
streams, so it can be expensive, depending on the stacks, and may even
be used to construct an attack. For this reason, we now count a glitch
when this happens.
A test with h2spec shows that it triggers 9 across a full test.
Here we consider that if a HEADERS frame is made of more than 4 fragments
whose average size is lower than 1kB, that's very likely an abuse so we
count a glitch per 16 fragments, which means 1 glitch per 1kB frame in a
16kB buffer. This means that an abuser sending 1600 1-byte frames would
increase the counter by 100, and that sending 100 headers per request in
individual frames each results in a count of ~7 to be added per request.
A test consisting in sending 100M requests made of 101 frames each over
a connection resulted in ~695M glitches to be counted for this connection.
Note that no special care is taken to avoid wrapping since it already takes
a very long time to reach 100M and there's no particular impact of wrapping
here (roughly 1M/s).
RFC9113 clarified a point regarding the payload from DATA frames sent to
closed streams. It must always be counted against the connection's flow
control. In practice it should really have no practical effect, but if
repeated upload attempts are aborted, this might cause the client's
window to progressively shrink since not being ACKed.
It's probably not necessary to backport this, unless another patch
depends on it.
During zero-copy forwarding negotiation, a pseudo flag was already used to
notify the consummer if the producer is able to use kernel splicing or not. But
this was not extensible. So, now we use a true bitfield to be able to pass flags
during the negotiation. NEGO_FF_FL_* flags may be used now.
Of course, for now, there is only one flags, the kernel splicing support on
producer side (NEGO_FF_FL_MAY_SPLICE).
It is now possible to selectively retrieve extra counters from stats
modules. H1, H2, QUIC and H3 fill_stats() callback functions are updated to
return a specific counter.
There are a lot of H2 events which are not invalid from a protocol
perspective but which are yet anomalies, especially when repeated. They
can come from bogus or really poorly implemlented clients, as well as
purposely built attacks, as we've seen in the past with various waves
of attempts at abusing H2 stacks.
In order to better deal with such situations, it would be nice to be
able to sort out what is correct and what is not. There's already the
HTTP error counter that may even be updated on a tracked connection,
but HTTP errors are something clearly defined while there's an entire
scope of gray area around it that should not fall into it.
This patch introduces the notion of "glitches", which normally correspond
to unexpected and temporary malfunction. And this is exactly what we'd
like to monitor. For example a peer is not misbehaving if a request it
sends fails to decode because due to HPACK compression it's larger than
a buffer, and for this reason such an event is reported as a stream error
and not a connection error. But this causes trouble nonetheless and should
be accounted for, especially to detect if it's repeated. Similarly, a
truncated preamble or settings frame may very well be caused by a network
hiccup but how do we know that in the logs? For such events, a glitch
counter is incremented on the connection.
For now a total of 41 locations were instrumented with this and the
counter is reported in the traces when not null, as well as in
"show sess" and "show fd". This was done using a new function,
"h2c_report_glitch()" so that it becomes easier to extend to more
advanced processing (applying thresholds, producing logs, escalating
to connection error, tracking etc).
A test with h2spec shows it reported in 8545 trace lines for 147 tests,
with some reaching value 3 in a same test (e.g. HPACK errors).
Some places were not instrumented, typically anything that can be
triggered on perfectly valid activity (received data after RST being
emitted, timeouts, etc). Some types of events were thought about,
such as INITIAL_WINDOW_SIZE after the first SETTINGS frame, too small
window update increments, etc. It just sounds too early to know if
those are currently being triggered by perfectly legit clients. Also
it's currently not incremented on timeouts so that we don't do that
repeatedly on short keep-alive timeouts, though it could make sense.
This may change in the future depending on how it's used. For now
this is not exposed outside of traces and debugging.
The test was performed but no trace emitted, which can complicate certain
diagnostics, so let's just add the trace for this rare case. It may safely
be backported though this is really not important.
Commit 7021a8c4d8 ("BUG/MINOR: mux-h2: also count streams for refused
ones") addressed stream counting issues on some error cases but not
completely correctly regarding the conn_err vs stream_err case. Indeed,
contrary to the initial analysis, h2c_dec_hdrs() can set H2_CS_ERROR
when facing some unrecoverable protocol errors, and it's not correct
to send it to strm_err which will only send the RST_STREAM frame and
the subsequent GOAWAY frame is in fact the result of the read timeout.
The difficulty behind this lies on the sequence of output validations
because h2c_dec_hdrs() returns two results at once:
- frame processing status (done/incomplete/failed)
- connection error status
The original ordering requires to write 2 exemplaries of the exact
same error handling code disposed differently, which the patch above
tried to factor to one. After careful inspection of h2c_dec_hdrs()
and its comments, it's clear that it always returns -1 on failure,
*including* connection errors. This means we can rearrange the test
to get rid of the missing data first, and immediately enter the
no-return zone where both the stream and connection errors can be
checked at the same place, making sure to consistently maintain
error counters. This is way better because we don't have to update
stream counters on the error path anymore. h2spec now passes the
test much faster.
This will need to be backported to the same branches as the commit
above, which was already backported to 2.9.
There are a few places where we can reject an incoming stream based on
technical errors such as decoded headers that are too large for the
internal buffers, or memory allocation errors. In this case we send
an RST_STREAM to abort the request, but the total stream counter was
not incremented. That's not really a problem, until one starts to try
to enforce a total stream limit using tune.h2.fe.max-total-streams,
and which will not count such faulty streams. Typically a client that
learns too large cookies and tries to replay them in a way that
overflows the maximum buffer size would be rejected and depending on
how they're implemented, they might retry forever.
This patch removes the stream count increment from h2s_new() and moves
it instead to the calling functions, so that it translates the decision
to process a new stream instead of a successfully decoded stream. The
result is that such a bogus client will now be blocked after reaching
the total stream limit.
This can be validated this way:
global
tune.h2.fe.max-total-streams 128
expose-experimental-directives
trace h2 sink stdout
trace h2 level developer
trace h2 verbosity complete
trace h2 start now
frontend h
bind :8080
mode http
redirect location /
Sending this will fill frames with 15972 bytes of cookie headers that
expand to 16500 for storage+index once decoded, causing "message too large"
events:
(dev/h2/mkhdr.sh -t p;dev/h2/mkhdr.sh -t s;
for sid in {0..1000}; do
dev/h2/mkhdr.sh -t h -i $((sid*2+1)) -f es,eh \
-R "828684410f7777772e6578616d706c652e636f6d \
$(for i in {1..66}; do
echo -n 60 7F 73 433d $(for j in {1..24}; do
echo -n 2e313233343536373839; done);
done) ";
done) | nc 0 8080
Now it properly stops after sending 128 streams.
This may be backported wherever commit 983ac4397 ("MINOR: mux-h2:
support limiting the total number of H2 streams per connection") is
present, since without it, that commit is less effective.
This patch introduces a new setting: tune.h2.fe.max-total-streams. It
sets the HTTP/2 maximum number of total streams processed per incoming
connection. Once this limit is reached, HAProxy will send a graceful GOAWAY
frame informing the client that it will close the connection after all
pending streams have been closed. In practice, clients tend to close as fast
as possible when receiving this, and to establish a new connection for next
requests. Doing this is sometimes useful and desired in situations where
clients stay connected for a very long time and cause some imbalance inside a
farm. For example, in some highly dynamic environments, it is possible that
new load balancers are instantiated on the fly to adapt to a load increase,
and that once the load goes down they should be stopped without breaking
established connections. By setting a limit here, the connections will have
a limited lifetime and will be frequently renewed, with some possibly being
established to other nodes, so that existing resources are quickly released.
The default value is zero, which enforces no limit beyond those implied by
the protocol (2^30 ~= 1.07 billion). Values around 1000 were found to
already cause frequent enough connection renewal without causing any
perceptible latency to most clients. One notable exception here is h2load
which reports errors for all requests that were expected to be sent over
a given connection after it receives a GOAWAY. This is an already known
limitation: https://github.com/nghttp2/nghttp2/issues/981
The patch was made in two parts inside h2_frt_handle_headers():
- the first one, at the end of the function, which verifies if the
configured limit was reached and if it's needed to emit a GOAWAY ;
- the second, just before decoding the stream frame, which verifies if
a previously configured limit was ignored by the client, and closes
the connection if this happens. Indeed, one reason for a connection
to stay alive for too long definitely comes from a stupid bot that
periodically fetches the same resource, scans lots of URLs or tries
to brute-force something. These ones are more likely to just ignore
the last stream ID advertised in GOAWAY than a regular browser, or
a well-behaving client such as curl which respects it. So in order
to make sure we can close the connection we need to enforce the
advertised limit.
Note that a regular client will not face a problem with that because in
the worst case it will have max_concurrent_streams in flight and this
limit is taken into account when calculating the advertised last
acceptable stream ID.
Just a note: it may also be possible to move the first part above to
h2s_frt_stream_new() instead so that it's not processed for trailers,
though it doesn't seem to be more interesting, first because it has
two return points.
This is something that may be backported to 2.9 and 2.8 to offer more
control to those dealing with dynamic infrastructures, especially since
for now we cannot force a connection to be cleanly closed using rules
(e.g. github issues #946, #2146).
An error on the H2 connection was always reported as an error to the
stream-endpoint descriptor, independently on the H2 stream state. But it is
a bug to do so for closed streams. And indeed, it leads to report "SD--"
termination state for some streams while the response was fully received and
forwarded to the client, at least for the backend side point of view.
Now, errors are no longer reported for H2 streams in closed state.
This patch is related to the three previous ones:
* "BUG/MEDIUM: mux-h2: Don't report error on SE for closed H2 streams"
* "BUG/MEDIUM: mux-h2: Don't report error on SE if error is only pending on H2C"
* "BUG/MEDIUM: mux-h2: Only Report H2C error on read error if demux buffer is empty"
The series should fix a bug reported in issue #2388
(#2388#issuecomment-1855735144). The series should be backported to 2.9 but
only after a period of observation. In theory, older versions are also
affected but this part is pretty sensitive. So don't backport it further
except if someone ask for it.
In h2s_wake_one_stream(), we must not report an error on the stream-endpoint
descriptor if the error is not definitive on the H2 connection. A pending
error on the H2 connection means there are potentially remaining data to be
demux. It is important to not truncate a message for a stream.
This patch is part of a series that should fix a bug reported in issue #2388
(#2388#issuecomment-1855735144). Backport instructions will be shipped in
the last commit of the series.
It is similar to the previous fix ("BUG/MEDIUM: mux-h2: Don't report H2C
error on read error if dmux buffer is not empty"), but on receive side. If
the demux buffer is not empty, an error on the TCP connection must not be
immediately reported as an error on the H2 connection. We must be sure to
have tried to demux all data first. Otherwise, messages for one or more
streams may be truncated while all data were already received and are
waiting to be demux.
This patch is part of a series that should fix a bug reported in issue #2388
(#2388#issuecomment-1855735144). Backport instructions will be shipped in
the last commit of the series.
When an error on the H2 connection is detected when sending data, only a
pending error is reported, waiting for an error or a shutdown on the read
side. However if a shutdown was already received, the pending error is
switched to a definitive error.
At this stage, we must also wait to have flushed the demux
buffer. Otherwise, if some data must still be demux, messages for one or
more streams may be truncated. There is already the flag H2_CF_END_REACHED
to know a shutdown was received and we no longer progress on demux side
(buffer empty or data truncated). On sending side, we should use this flag
instead to report a definitive error.
This patch is part of a series that should fix a bug reported in issue #2388
(#2388#issuecomment-1855735144). Backport instructions will be shipped in
the last commit of the series.
During HEADERS frames decoding, if a frame is too large to fit in a buffer,
an internal error is reported and a RST_STREAM is emitted. On the other
hand, we wait to have an empty rxbuf to decode the frame because we cannot
retry a failed HPACK decompression.
When we are decoding headers, it is valid to return an error if dbuf buffer
is full because no data can be blocked in the rxbuf (which hosts the HTX
message).
However, during the trailers decoding, it is possible to have some data not
sent yet for the current stream in the rxbug and data for another stream
fully filling the dbuf buffer. In this case, we don't decode the trailers
but we must not return an error. We must wait to empty the rxbuf first.
Now, a HEADERS frame is considered as too large if the dbuf buffer is full
and if the rxbuf is empty (the HTX message to be accurate).
This patch should fix the issue #2382. It must be backported to all stable
versions.
tune.h2.zero-copy-fwd-send can now be used to enable or disable the
zero-copy fast-forwarding for the H2 mux only, for sends. For now, there is
no option to disable it for receives because it is not supported yet.
It is enabled ('on') by default.
All muxes now implements the ->sctl() callback function and are able to
return the stream ID. For the PT multiplexer, it is always 0. For the H1
multiplexer it is the request count for the current H1 connection (added for
this purpose). The FCGI, H2 and QUIC muxes, the stream ID is returned.
The stream ID is returned as a signed 64 bits integer.
Instead of the generic MUX_, we now use MUX_CTL_ prefix for all mux_ctl_type
value. This will avoid any ambiguities with other enums, especially with a
new one that will be added to get information on mux streams.
When a H2 stream is blocked during data fast-forwarding, we must take care
to remove H2_SF_NOTIFIED flag. This was only performed when data
fast-forward was attempted. However, if the H2 stream was blocked for any
reason, this flag was not removed. During our tests, we found it was
possible to infinitely block a connection because one of its streams was in
the send_list with the flag set. In this case, the stream was no longer
woken up to resume the sends, blocking all other streams.
No backport needed.
It's the exact same as commit 0a7ab7067 ("OPTIM: mux-h2: don't allocate
more buffers per connections than streams"), but for the zero-copy case
this time. Previously it was only done on the regular snd_buf() path, but
this one is needed as well. A transfer on 16 parallel streams now consumes
half of the memory, and a single stream consumes much less.
An alternate approach would be worth investigating in the future, based
on the same principle as the CF_STREAMER_FAST at the higher level: in
short, by monitoring how many mux buffers we write at once before refilling
them, we would get an idea of how much is worth keeping in buffers max,
given that anything beyond would just waste memory. Some tests show that
a single buffer already seems almost as good, except for single-stream
transfers, which is why it's worth spending more time on this.
Connection takeover was implemented for H2 in 2.2 by commit cd4159f03
("MEDIUM: mux_h2: Implement the takeover() method."). It does have one
corner case related to memory allocation failure: in case the task or
tasklet allocation fails, the connection gets released synchronously.
Unfortunately the situation is bad there, because the lower layers are
already switched to the new thread while the tasklet is either NULL or
still the old one, and calling h2_release() will also result in
h2_process() and h2_process_demux() that may process any possibly
pending frames. Even the session remains the old one on the old thread,
so that some sess_log() that are called when facing certain demux errors
will be associated with the previous thread, possibly accessing a number
of elements belonging to another thread. There are even code paths where
the thread will try to grab the lock of its own idle conns list, believing
the connection is there while it has no useful effect. However, if the
owner thread was doing the same at the same moment, and ended up trying
to pick from the current thread (which could happen if picking a connection
for a different name), the two could even deadlock.
The risk is extremely low, but Fred managed to reproduce use-after-free
errors in conn_backend_get() after a takeover() failed by playing with
-dMfail, indicating that h2_release() had been successfully called. In
practise it's sufficient to have h2 on the server side with reuse-always
and to inject lots of request on it with -dMfail.
This patch takes a simple but radically different approach. Instead of
starting to migrate the connection before risking to face allocation
failures, it first pre-allocates a new task and tasklet, then assigns
them to the connection if the migration succeeds, otherwise it just
frees them. This way it's no longer needed to manipulate the connection
until it's fully migrated, and as a bonus this means the connection will
continue to exist and the use-after-free condition is solved at the same
time.
This should be backported to 2.2. Thanks to Fred for the initial analysis
of the problem!
On passive reverse, H2 mux is responsible to insert the connection in
the server idle list. This is done via srv_add_to_idle_list(). However,
this function may fail for various reason, such as FD usage limit
reached.
Handle properly this error case. H2 mux flags the connection on error
which will cause its release. Prior to this patch, the connection was
only released on server timeout.
This bug was found inspecting server curr_used_conns counter. Indeed, on
connection reverse, this counter is first incremented. It is decremented
just after on srv_add_to_idle_list() if insertion is validated. However,
if insertion is rejected, the connection was not released which cause
curr_used_conns to remains positive. This has the major downside to
break the reusing of idle connection on rhttp causing spurrious 503
errors.
No need to backport.
When an H2 mux works with a slow downstream connection and without the
mux-mux mode, it is possible that a single stream will allocate all 32
buffers in the connection. This is not desirable at all because 1) it
brings no value, and 2) it allocates a lot of memory per connection,
which, in addition to using a lot of memory, tends to degrade performance
due to cache thrashing.
This patch improves the situation by refraining from sending data frames
over a connection when more mbufs than streams are allocated. On a test
featuring 10k connections each with a single stream reading from the
cache, this patch reduces the RAM usage from ~180k buffers to ~20k bufs,
and improves the bandwidth. This may even be backported later to recent
versions to improve memory usage. Note however that it is efficient only
when combined with e16762f8a ("OPTIM: mux-h2: call h2_send() directly
from h2_snd_buf()"), and tends to slightly reduce the single-stream
performance without it, so in case of a backport, the two need to be
considered together.
IOBUF_FL_EOI iobuf flag is now set by the producer to notify the consumer
that the end of input was reached. Thanks to this flag, we can remove the
ugly ack in h2_done_ff() to test the opposite SE flags.
Of course, for now, it works and it is good enough. But we must keep in mind
that EOI is always forwarded from the producer side to the consumer side in
this case. But if this change, a new CO_RFL_ flag will have to be added to
instruct the producer if it can forward EOI or not.
In the mux-to-mux data forwarding, we now try, as far as possible to send at
least a buffer. Of course, if the consumer side is congested or if nothing
more can be received, we leave. But the idea is to retry to fast-forward
data if less than a buffer was forwarded. It is only performed for buffer
fast-forwarding, not splicing.
The idea behind this patch is to optimise the forwarding, when a first
forward was performed to complete a buffer with some existing data. In this
case, the amount of data forwarded is artificially limited because we are
using a non-empty buffer. But without this limitation, it is highly probable
that a full buffer could have been sent. And indeed, with H2 client, a
significant improvement was observed during our test.
To do so, .done_fastfwd() callback function must be able to deal with
interim forwards. Especially for the H2 mux, to remove H2_SF_NOTIFIED flags
on the H2S on the last call only. Otherwise, the H2 stream can be blocked by
itself because it is in the send_list. IOBUF_FL_INTERIM_FF iobuf flag is
used to notify the consumer it is not the last call. This flag is then
removed on the last call.
When data are directly forwarded from a mux to the opposite one, we must not
forget to report send activity when data are successfully sent or report a
blocked send with data are blocked. It is important because otherwise, if
the transfer is quite long, longer than the client or server timeout, an
error may be triggered because the write timeout is reached.
H1, H2 and PT muxes are concerned. To fix the issue, The done_fastword()
callback now returns the amount of data consummed. This way it is possible
to update/reset the FSB data accordingly.
No backport needed.
This allows to eliminate full buffers very quickly and to recycle them
much faster, resulting in higher transfer rates and lower memory usage
at the same time. We just wake the tasklet up if it succeeded so that
h2_process() and friends are called to finalize what needs to.
For regular buffer sizes, the performance level becomes quite close to
the one obtained with the zero-copy mechanism (zero-copy remains much
faster with non-default buffer sizes). The memory savings are huge with
default buffer size: at 64c * 100 streams on a single thread, we used
to forward 4.4 Gbps of traffic using 10400 buffers. After the change,
the performance reaches 5.9 Gbps with only 22-24 buffers, since they
are quickly recycled. That's asaving of 160 MB of RAM.
A concern was an increase in the number of syscalls but this is not
the case, the numbers remained exactly the same before and after.
Some experimentations were made to try to cork data and not send
incomplete buffers, and that always voided these changes. One
explanation might be that keeping a first buffer with only headers
frames is sufficient to prevent a zero-copy of the data coming in
a next snd_buf() call. This still needs to be studied anyway.
By calling h2_process(), the code would theoretically make it possible
for a synchronous ->wake() call to provoke an indirect call to h2_snd_buf()
while we're in h2_done_ff(), which could be quite bad. The current
conditions do not permit it right now but this could easily break by
accident. Better use h2_send() and wake the task up if needed. Precise
performance tests showed no change.
Define a new function srv_add_to_avail_list(). This function is used to
centralize connection insertion in available tree. It reuses a BUG_ON()
statement to ensure the connection is not present in the idle list.
Originally H2 would transfer everything to H1 and parsing errors were
handled there, so that if there was a track-sc rule in effect, the
counters would be updated as well. As we started to add more and more
HTTP-compliance checks at the H2 layer, then switched to HTX, we
progressively lost this ability. It's a bit annoying because it means
we will not maintain accurate error counters for a given source, for
example.
This patch adds the calls to session_inc_http_req_ctr() and
session_inc_http_err_ctr() when needed (i.e. when failing to parse
an HTTP request since all other cases are handled by the stream),
just like mux-h1 does. The same should be done for mux-h3 by the
way.
This can be backported to recent stable versions. It's not exactly a
bug, rather a missing feature in that we had never updated this counter
for H2 till now, but it does make sense to do it especially based on
what the doc says about its usage.
The H2 spec says that a HEADERS frame turns an idle stream to the open
state, and it may then turn to half-closed(remote) on ES, then to close,
all at once, if we respond with RST (e.g. on error). Due to the fact that
we process a complete frame at once since h2_dec_hdrs() may reassemble
CONTINUATION frames until everything is complete, the state was only
committed after the frame was completley valid (otherwise multiple passes
could result in subsequent frames being rejected as the stream ID would
be equal to the highest one).
However this is not correct because it means that a client may retry on
the same ID as a previously failed one, which technically is forbidden
(for example the client couldn't know which of them a WINDOW_UPDATE or
RST_STREAM frame is for).
In practice, due to the error paths, this would only be possible when
failing to decode HPACK while leaving the HPACK stream intact, thus
when the valid decoded HPACK stream cannot be turned into a valid HTTP
representation, e.g. when the resulting headers are too large for example.
The solution to avoid this consists in committing the stream ID on this
error path as well. h2spec continues to be happy.
Thanks to Annika Wickert and Tim Windelschmidt for reporting this issue.
This fix must be backported to all stable versions.
In h2_frt_handle_headers() all failures lead to a generic message saying
"rejected H2 request". It's quite inexpressive while there are a few
distinct tests that are made before jumping there:
- trailers on closed stream
- unparsable request
- refused stream
Let's emit the traces from these call points instead so that we get more
info about what happened. Since these are user-level messages, we take
care of keeping them aligned as much as possible.
For example before it would say:
[04|h2|1|mux_h2.c:2859] rejected H2 request : h2c=0x7f5d58036fd0(F,FRE)
[04|h2|5|mux_h2.c:2860] h2c_frt_handle_headers(): leaving on error : h2c=0x7f5d58036fd0(F,FRE) dsi=1 h2s=0x9fdb60(0,CLO)
And now it says:
[04|h2|1|mux_h2.c:2817] rcvd unparsable H2 request : h2c=0x7f55f8037160(F,FRH) dsi=1 h2s=CLO
[04|h2|5|mux_h2.c:2875] h2c_frt_handle_headers(): leaving on error : h2c=0x7f55f8037160(F,FRE) dsi=1 h2s=CLO
Sometimes it's unclear whether a stream is still open or closed when
certain traces are emitted, for example when the stream was refused,
because the reported pointer and ID in fact correspond to the refused
stream. And for closed streams, no pointer/name is printed, leaving
some confusion about the state. This patch makes the situation easier
to analyse by explicitly reporting "h2s=CLO" on closed/error/refused
streams so that we don't waste time comparing pointers and we instantly
know the stream is closed. Now instead of emitting:
[03|h2|5|mux_h2.c:2874] h2c_frt_handle_headers(): leaving on error : h2c=0x7fdfa8026820(F,FRE) dsi=201 h2s=0x9fdb60(0,CLO)
It will emit:
[03|h2|5|mux_h2.c:2874] h2c_frt_handle_headers(): leaving on error : h2c=0x7fdfa8026820(F,FRE) dsi=201 h2s=CLO
Stefan Behte reported that since commit f279a2f14 ("BUG/MINOR: mux-h2:
refresh the idle_timer when the mux is empty"), the http-request and
http-keep-alive timeouts don't work anymore on H2. Before this patch,
and since 3e448b9b64 ("BUG/MEDIUM: mux-h2: make sure control frames do
not refresh the idle timeout"), they would only be refreshed after stream
frames were sent (HEADERS or DATA) but the patch above that adds more
refresh points broke these so they don't expire anymore as long as
there's some activity.
We cannot just revert the fix since it also addressed an isse by which
sometimes the timeout would trigger too early and provoque truncated
responses. The right approach here is in fact to only use refresh the
idle timer when the mux buffer was flushed from any such stream frames.
In order to achieve this, we're now setting a flag on the connection
whenever we write a stream frame, and we consider that flag when deciding
to refresh the buffer after it's emptied. This way we'll only clear that
flag once the buffer is empty and there were stream data in it, not if
there were no such stream data. In theory it remains possible to leave
the flag on if some control data is appended after the buffer and it's
never cleared, but in practice it's not a problem as a buffer will always
get sent in large blocks when the window opens. Even a large buffer should
be emptied once in a while as control frames will not fill it as much as
data frames could.
Given the patch above was backported as far as 2.6, this patch should
also be backported as far as 2.6.
Instead of speaking of an initialisation stage for each data
fast-forwarding, we now use the negociate term. Thus init_ff/init_fastfwd
functions were renamed nego_ff/nego_fastfwd.
The H2 multiplexer now implements callbacks to consume fast-forwarded
data. It is the most usful case: A H2 client getting data from a H1
server. It is also the easiest case to implement. The producer side is
trickier because of multiplexing. It is not obvious this case would be
improved with data fast-forwarding.
If a shutw is blocked because the mux is full or busy, we must defer the
shutr. In this case, the H2 stream is not in H2_SS_CLOSED state because the
shutw is also deferred. If the shutr is performed, this will lead to a
error.
Concretly, when the mux is unblocked, a RST_STREAM is sent while in some
cases, an empty DATA frame with ES flag set could be sent.
This patch should be backported to all stable versions.
An interesting issue was met when testing the mux-to-mux forwarding code.
In order to preserve fairness, in h2_snd_buf() if other streams are waiting
in send_list or fctl_list, the stream that is attempting to send also goes
to its list, and will be woken up by h2_process_mux() or h2_send() when
some space is released. But on rare occasions, there are only a few (or
even a single) streams waiting in this list, and these streams are just
quickly removed because of a timeout or a quick h2_detach() that calls
h2s_destroy(). In this case there's no even to wake up the other waiting
stream in its list, and this will possibly resume processing after some
client WINDOW_UPDATE frames or even new streams, so usually it doesn't
last too long and it not much noticeable, reason why it was left that
long. In addition, measures have shown that in heavy network-bound
benchmark, this exact situation happens on less than 1% of the streams
(reached 4% with mux-mux).
The fix here consists in replacing these LIST_DEL_INIT() calls on
h2s->list with a function call that checks if other streams were queued
to the send_list recently, and if so, which also tries to resume them
by calling h2_resume_each_sending_h2s(). The detection of late additions
is made via a new flag on the connection, H2_CF_WAIT_INLIST, which is set
when a stream is queued due to other streams being present, and which is
cleared when this is function is called.
It is particularly difficult to reproduce this case which is particularly
timing-dependent, but in a constrained environment, a test involving 32
conns of 20 streams each, all downloading a 10 MB object previously
showed a limitation of 17 Gbps with lots of idle CPU time, and now
filled the cable at 25 Gbps.
This should be backported to all versions where it applies.
Since commit 5afcb686b ("MAJOR: connection: purge idle conn by last usage")
in 2.9-dev4, the test on conn->toremove_list added to conn_get_idle_flag()
in 2.8 by commit 3a7b539b1 ("BUG/MEDIUM: connection: Preserve flags when a
conn is removed from an idle list") becomes misleading. Indeed, now both
toremove_list and idle_list are shared by a union since the presence in
these lists is mutually exclusive. However, in conn_get_idle_flag() we
check for the presence in the toremove_list to decide whether or not to
delete the connection from the tree. This test now fails because instead
it sees the presence in the idle or safe list via the union, and concludes
the element must not be removed. Thus the element remains in the tree and
can be found later after the connection is released, causing crashes that
Tristan reported in issue #2292.
The following config is sufficient to reproduce it with 2 threads:
defaults
mode http
timeout client 5s
timeout server 5s
timeout connect 1s
listen front
bind :8001
server next 127.0.0.1:8002
frontend next
bind :8002
timeout http-keep-alive 1
http-request redirect location /
Sending traffic with a few concurrent connections and some short timeouts
suffices to instantly crash it after ~10k reqs:
$ h2load -t 4 -c 16 -n 10000 -m 1 -w 1 http://0:8001/
With Amaury we analyzed the conditions in which the function is called
in order to figure a better condition for the test and concluded that
->toremove_list is never filled there so we can safely remove that part
from the test and just move the flag retrieval back to what it was prior
to the 2.8 patch above. Note that the patch is not reverted though, as
the parts that would drop the unexpected flags removal are unchanged.
This patch must NOT be backported. The code in 2.8 works correctly, it's
only the change in 2.9 that makes it misbehave.
Add a new MUX flag MX_FL_REVERSABLE. This value is used to indicate that
MUX instance supports connection reversal. For the moment, only HTTP/2
multiplexer is flagged with it.
This allows to dynamically check if reversal can be completed during MUX
installation. This will allow to relax requirement on config writing for
'tcp-request session attach-srv' which currently cannot be used mixed
with non-http/2 listener instances, even if used conditionnally with an
ACL.