Christian Ruppert reported an issue explaining that it's not possible to
forcefully close H2 connections which do not receive requests anymore if
they continue to send control traffic (window updates, ping etc). This
will indeed refresh the timeout. In H1 we don't have this problem because
any single byte is part of the stream, so the control frames in H2 would
be equivalent to TCP acks in H1, that would not contribute to the timeout
being refreshed.
What misses from H2 is the use of http-request and keep-alive timeouts.
These were not implemented because initially it was hard to see how they
could map to H2. But if we consider the real use of the keep-alive timeout,
that is, how long do we keep a connection alive with no request, then it's
pretty obvious that it does apply to H2 as well. Similarly, http-request
may definitely be honored as soon as a HEADERS frame starts to appear
while there is no stream. This will also allow to deal with too long
CONTINUATION frames.
This patch moves the timeout update to a new function, h2c_update_timeout(),
which is in charge of this. It also adds an "idle_start" timestamp in the
connection, which is set when nb_cs reaches zero or when a headers frame
start to arrive, so that it cannot be delayed too long.
This patch should be backported to recent stable releases after some
observation time. It depends on previous patch "MEDIUM: mux-h2: slightly
relax timeout management rules".
The H2 timeout rules were arranged to cover complex situations In 2.1
with commit c2ea47fb1 ("BUG/MEDIUM: mux-h2: do not enforce timeout on
long connections").
It turns out that such rules while complex, do not perfectly cover all
use cases. The real intent is to say that as long as there are attached
streams, the connection must not timeout. Then once all these streams
have quit (possibly for timeout reasons) then the mux should take over
the management of timeouts.
We do have this nb_cs field which indicates the number of attached
streams, and it's updated even when leaving orphaned streams. So
checking it alone is sufficient to know whether it's the mux or the
streams that are in charge of the timeouts.
In its current state, this doesn't cause visible effects except that
it makes it impossible to implement more subtle parsing timeouts.
This would need to be backported as far as 2.0 along with the next
commit that will depend on it.
The server_id_hdr_name is already processed as an ist in various locations lets
also just store it as such.
see 0643b0e7e ("MINOR: proxy: Make `header_unique_id` a `struct ist`") for a
very similar past commit.
Thanks to all previous changes, it is now possible to move the
stream-interface into the conn-stream. To do so, some SI functions are
removed and their conn-stream counterparts are added. In addition, the
conn-stream is now responsible to create and release the
stream-interface. While the stream-interfaces were inlined in the stream
structure, there is now a pointer in the conn-stream. stream-interfaces are
now dynamically allocated. Thus a dedicated pool is added. It is a temporary
change because, at the end, the stream-interface structure will most
probably disappear.
In the same way the conn-stream has a pointer to the stream endpoint , this
patch adds a pointer to the application entity in the conn-stream
structure. For now, it is a stream or a health-check. It is mandatory to
merge the stream-interface with the conn-stream.
Thanks to previous changes, it is now possible to set an appctx as endpoint
for a conn-stream. This means the appctx is no longer linked to the
stream-interface but to the conn-stream. Thus, a pointer to the conn-stream
is explicitly stored in the stream-interface. The endpoint (connection or
appctx) can be retrieved via the conn-stream.
To be able to handle applets as a conn-stream endpoint, we must be prepared
to handle different types of endpoints. First of all, the conn-strream's
connection must no longer be used directly.
The backend conn-stream is no longer released on connection retry. This
means the conn-stream is detached from the underlying connection but not
released. Thus, during connection retries, the stream has always an
allocated conn-stream with no connection. All previous changes were made to
make this possible.
Note that .attach() mux callback function was changed to get the conn-stream
as argument. The muxes are no longer responsible to create the conn-stream
when a server connection is attached to a stream.
If a parsing error is detected and the corresponding HTX flag is set
(HTX_FL_PARSING_ERROR), we must be sure to always report it to the app
layer. It is especially important when the error occurs during the response
parsing, on the server side. In this case, the RX buffer contains an empty
HTX message to carry the flag. And it remains in this state till the info is
reported to the app layer. This must be done otherwise, on the conn-stream,
the CS_FL_ERR_PENDING flag cannot be switched to CS_FL_ERROR and the
CS_FL_WANT_ROOM flag is always set when h2_rcv_buf() is called. The result
is a ping-pong loop between the mux and the stream.
Note that this patch fixes a bug. But it also reveals a design issue. The
error must not be reported at the HTX level. The error is already carried by
the conn-stream. There is no reason to duplicate it. In addition, it is
errorprone to have an empty HTX message only to report the error to the app
layer.
This patch should fix the issue #1561. It must be backported as far as 2.0
but the bug only affects HAProxy >= 2.4.
The idle connection delay calculation before a request is a bit tricky,
especially for multiplexed protocols. It changed between 2.3 and 2.4 by
the integration of the idle delay inside the session itself with these
commits:
dd78921c6 ("MINOR: logs: Use session idle duration when no stream is provided")
7a6c51324 ("MINOR: stream: Always get idle duration from the session")
and by then it was only set by the H1 mux. But over multiple changes, what
used to be a zero idle delay + a request delay for H2 became a bit odd, with
the idle time slipping into the request time measurement. The effect is that,
as reported in GH issue #1395, some H2 request times look huge.
This patch introduces the calculation of the session's idle time on the
H2 mux before creating the stream. This is made possible because the
stream_new() code immediately copies this value into the stream for use
at log time. Thus we don't care about changing something that will be
touched by every single request. The idle time is calculated as documented,
i.e. the delay from the previous request to the current one. This also
means that when a single stream is present on a connection, a part of
the server's response time may appear in the %Ti measurement, but this
reflects the reality since nothing would prevent the client from using
the connection to fetch more objects. In addition this shows how long
it takes a client to find references to objects in an HTML page and
start to fetch them.
A different approach could have consisted in counting from the last time
the connection was left without any request (i.e. really idle), but this
would at least require a documentation change and it's not certain this
would provide a more useful information.
Thanks to Bart Butler and Luke Seelenbinder for reporting enough elements
to diagnose this issue.
This should be backported to 2.4.
Sadly, despite particular care, commit 39a0a1e12 ("MEDIUM: h2/hpack: emit
a Dynamic Table Size Update after settings change") broke H2 when sending
DTSU. A missing negation on the flag caused the DTSU_EMITTED flag to be
lost and the DTSU to be sent again on the next stream, and possibly to
break flow control or a few other internal states.
This will have to be backported wherever the patch above was backported.
Thanks to Yves Lafon for notifying us with elements to reproduce the
issue!
As reported by @jinsubsim in github issue #1498, there is an
interoperability issue between nghttp2 as a client and a few servers
among which haproxy (in fact likely all those which do not make use
of the dynamic headers table in responses or which do not intend to
use a larger table), when reducing the header table size below 4096.
These are easily testable this way:
nghttp -v -H":method: HEAD" --header-table-size=0 https://$SITE
It will result in a compression error for those which do not start
with an HPACK dynamic table size update opcode.
There is a possible interpretation of the H2 and HPACK specs that
says that an HPACK encoder must send an HPACK headers table update
confirming the new size it will be using after having acknowledged
it, because since it's possible for a decoder to advertise a late
SETTINGS and change it after transfers have begun, the initially
advertised value might very well be seen as a first change from the
initial setting, and the HPACK spec doesn't specify the side which
causes the change that triggers a DTSU update, which was essentially
summed up in this question from nghttp2's author when this issue
was already raised 6 years ago, but which didn't really find a solid
response by then:
https://lists.w3.org/Archives/Public/ietf-http-wg/2015OctDec/0107.html
The ongoing consensus based on what some servers are doing and that aims
at limiting interoperability issues seems to be that a DTSU is expected
for each reduction from the current size, which should be reflected in
the next revision of the H2 spec:
https://github.com/httpwg/http2-spec/pull/1005
Given that we do not make use of this table we can emit a DTSU of zero
before encoding any HPACK frame. However, some clients do not support
receiving DTSU with such values (e.g. VTest) so we cannot do it
inconditionnally!
The current patch aims at sticking as close to the spec as possible by
proceeding this way:
- when a SETTINGS_HEADER_TABLE_SIZE is received, a flag is set
indicating that the value changed
- before sending any HPACK frame, this flag is checked to see if
an update is wanted and if none was sent
- in this case a DTSU of size zero is emitted and a flag is set
to mention it was emitted so that it never has to be sent again
This addresses the problem with nghttp2 without affecting VTest.
More context is available here:
https://github.com/nghttp2/nghttp2/issues/1660https://lists.w3.org/Archives/Public/ietf-http-wg/2021OctDec/0235.html
Many thanks to @jinsubsim for this report and participating to the issue
that led to an improvement of the H2 spec.
This should be backported to stable releases in a timely manner, ideally
as far as 2.4 once the h2spec update is merged, then to other versions
after a few months of observation or in case an issue around this is
reported.
The stopping-list management introduced by commit d3a88c1c3 ("MEDIUM:
connection: close front idling connection on soft-stop") missed two
error paths in the H1 and H2 muxes. The effect is that if a stream
or HPACK table couldn't be allocated for these incoming connections,
we would leave with the connection freed still attached to the
stopping_list and it would never leave it, resulting in use-after-free
hence either a crash or a data corruption.
This is marked as medium as it only happens under extreme memory pressure
or when playing with tune.fail-alloc. Other stability issues remain in
such a case so that abnormal behaviors cannot be explained by this bug
alone.
This must be backported to 2.4.
During 2.4-dev, an issue with partial frames was fixed with commit
3d4631fec ("BUG/MEDIUM: mux-h2: fix read0 handling on partial frames").
However this patch is not completely correct. It makes h2_recv() return
0 if the connection was shut for reads, but this not make h2_io_cb()
call h2_process(), so if there are any pending data left in the demux
buffer, they will never be processed, and the I/O callback will be
called in loops forever from the poller.
The correct return value there is 1, as is done at the end of the
function to report a pending read0.
This should definitely fix issue #1328. However even after a lot of
tests I couldn't manage to reproduce it, the conditions to enter that
situation are quite racy.
This must be backported to 2.0 since the fix above was merged into
2.0.21 and 2.2.9.
The value for H2_CF_DEM_SHORT_READ flag is wrong. 2 bits are erroneously
set, 0x200 and 0x80000. It is not an issue because both bits are not used
anywhere else.
The typo was introduced in the commit b5f7b5296 ("BUG/MEDIUM: mux-h2: Handle
remaining read0 cases on partial frames"). Thus this patch must also be
backported as far a 2.0.
Define a new stream flag SF_WEBSOCKET and a new cs flag CS_FL_WEBSOCKET.
The conn-stream flag is first set by h1/h2 muxes if the request is a
valid websocket upgrade. The flag is then converted to SF_WEBSOCKET on
the stream creation.
This will be useful to properly manage websocket streams in
connect_server().
The RFC8441 was not respected by haproxy in regards with server support
for Extended CONNECT. The Extended CONNECT method was used to convert an
Upgrade header stream even if no SETTINGS_ENABLE_CONNECT_PROTOCOL was
received, which is forbidden by the RFC8441. In this case, the behavior
of the http/2 server is unspecified.
Fix this by flagging the connection on receiption of the RFC8441
settings SETTINGS_ENABLE_CONNECT_PROTOCOL. Extended CONNECT is thus only
be used if the flag is present. In the other case, the stream is
immediatly closed as there is no way to handle it in http/2. It results
in a http/1.1 502 or http/2 RESET_STREAM to the client side.
The protocol-upgrade regtest has been extended to test that haproxy does
not emit Extended CONNECT on servers without RFC8441 support.
It must be backported up to 2.4.
Add a state trace to report that a protocol upgrade is converted using
the rfc8441 Extended connect method. This is useful in regards with the
recent changes to improve http/2 websockets.
While in H1 we can usually close quickly, in H2 a client might be sending
window updates or anything while we're sending a GOAWAY and the pending
data in the socket buffers at the moment the close() is performed on the
socket results in the output data being lost and an RST being emitted.
One example where this happens easily is with h2spec, which randomly
reports connection resets when waiting for a GOAWAY while haproxy sends
it, as seen in issue #1422. With h2spec it's not window updates that are
causing this but the fact that h2spec has to upload the payload that
comes with invalid frames to accommodate various implementations, and
does that in two different segments. When haproxy aborts on the invalid
frame header, the payload was not yet received and causes an RST to
be sent.
Here we're dealing with this two ways:
- we perform a shutdown(WR) on the connection to forcefully push pending
data on a front connection after the xprt is shut and closed ;
- we drain pending data
- then we close
This totally solves the issue with h2spec, and the extra cost is very
low, especially if we consider that H2 connections are not set up and
torn down often. This issue was never observed with regular clients,
most likely because this pattern does not happen in regular traffic.
After more testing it could make sense to backport this, at least to
avoid reporting errors on h2spec tests.
Some checks were added by commit 9a3d3fcb5 ("BUG/MAJOR: mux-h2: Don't try
to send data if we know it is no longer possible") to make sure we don't
loop forever trying to send data that cannot leave. But one of the
conditions there is not correct, the one relying on H2_CS_ERROR2. Indeed,
this state indicates that the error code was serialized into the mux
buffer, and since the test is placed before trying to send the data to
the socket, if the connection states only contains a GOAWAY frame, it
may refrain from sending and may close without sending anything. It's
not dramatic, as GOAWAY reports connection errors in situations where
delivery is not even certain, but it's cleaner to make sure the error
is properly sent, and it avoids upsetting h2spec, as seen in github
issue #1422.
Given that the patch above was backported as far as 1.8, this patch will
also have to be backported that far.
Thanks to Ilya for reporting this one.
This change is required to support TCP/HTTP rules in defaults sections. The
'disabled' bitfield in the proxy structure, used to know if a proxy is
disabled or stopped, is replaced a generic bitfield named 'flags'.
PR_DISABLED and PR_STOPPED flags are renamed to PR_FL_DISABLED and
PR_FL_STOPPED respectively. In addition, everywhere there is a test to know
if a proxy is disabled or stopped, there is now a bitwise AND operation on
PR_FL_DISABLED and/or PR_FL_STOPPED flags.
The last 3 fields were 3 list heads that are per-thread, and which are:
- the pool's LRU head
- the buffer_wq
- the streams list head
Moving them into thread_ctx completes the removal of dynamic elements
from the struct thread_info. Now all these dynamic elements are packed
together at a single place for a thread.
We've found others places where the read0 is ignored because of an
incomplete frame parsing. This time, it happens during parsing of
CONTINUATION frames.
When frames are parsed, incomplete frames are properly handled and
H2_CF_DEM_SHORT_READ flag is set. It is also true for HEADERS
frames. However, for CONTINUATION frames, there is an exception. Besides
parsing the current frame, we try to peek header of the next one to merge
payload of both frames, the current one and the next one. Idea is to create
a sole HEADERS frame before parsing the payload. However, in this case, it
is possible to have an incomplete frame too, not the current one but the
next one. From the demux point of view, the current frame is complete. We
must go to the internal function h2c_decode_headers() to detect an
incomplete frame. And this case was not identified and fixed when
H2_CF_DEM_SHORT_READ flag was introduced in the commit b5f7b5296
("BUG/MEDIUM: mux-h2: Handle remaining read0 cases on partial frames")
This bug was reported in a comment of the issue #1362. The patch must be
backported as far as 2.0.
backend.c, all muxes, backend.c started manipulating ebmb_nodes with
the introduction of idle conns but the types were inherited through
other includes. Let's add ebmbtree.h there.
We'll need to improve the API to pass other arguments in the future, so
let's start to adapt better to the current use cases. task_new() is used:
- 18 times as task_new(tid_bit)
- 18 times as task_new(MAX_THREADS_MASK)
- 2 times with a single bit (in a loop)
- 1 in the debug code that uses a mask
This patch provides 3 new functions to achieve this:
- task_new_here() to create a task on the calling thread
- task_new_anywhere() to create a task to be run anywhere
- task_new_on() to create a task to run on a specific thread
The change is trivial and will allow us to later concentrate the
required adaptations to these 3 functions only. It's still possible
to call task_new() if needed but a comment was added to encourage the
use of the new ones instead. The debug code was not changed and still
uses it.
The transient flag CO_RFL_BUF_NOT_STUCK should now be set when the mux's
rcv_buf() function is called, in si_cs_recv(), to be sure the mux is able to
perform some optimisation during data copy. This flag is set when we are
sure the channel buffer is not stuck. Concretely, it happens when there are
data scheduled to be sent.
It is not a fix and this flag is not used for now. But it makes sense to have
this info to be sure to be able to do some optimisations if necessary.
This patch is related to the issue #1362. It may be backported to 2.4 to
ease future backports.
Instead of returning a 501-Not-implemented error when "Ugrade:" header is
found for a request with a payload, the header is removed. This way, the
upgrade is disabled and the request is still sent to the server. It is
required because some frameworks seem to try to perform H2 upgrade on every
requests, including POST ones.
The h2 mux was slightly fixed to convert Upgrade requests to extended
connect ones only if the rigth HTX flag is set.
This patch should fix the issue #1381. It must be backported to 2.4.
This part was fixed several times since commit aade4edc1 ("BUG/MEDIUM:
mux-h2: Don't handle pending read0 too early on streams") and there are
still some cases where a read0 event may be ignored because a partial frame
inhibits the event.
Here, we must take care to set H2_CF_END_REACHED flag if a read0 was
received while a partial frame header is received or if the padding length
is missing.
To ease partial frame detection, H2_CF_DEM_SHORT_READ flag is introduced. It
is systematically removed when some data are received and is set when a
partial frame is found or when dbuf buffer is empty. At the end of the
demux, if the connection must be closed ASAP or if data are missing to move
forward, we may acknowledge the pending read0 event, if any. For now,
H2_CF_DEM_SHORT_READ is not part of H2_CF_DEM_BLOCK_ANY mask.
This patch should fix the issue #1328. It must be backported as far as 2.0.
If a connection is closed during the preface while no data are received, if
the dontlognull option is set, no log message must be emitted. However, this
will still be handled as a protocol error. Only the log is omitted.
This patch should fix the issue #1336 for H2 sessions. It must be backported
to 2.4 and 2.3 at least, and probably as far as 2.0.
Define a new global config statement named
"h2-workaround-bogus-websocket-clients".
This statement will disable the automatic announce of h2 websocket
support as specified in the RFC8441. This can be use to overcome clients
which fail to implement the relatively fresh RFC8441. Clients will in
his case automatically downgrade to http/1.1 for the websocket tunnel
if the haproxy configuration allows it.
This feature is relatively simple and can be backported up to 2.4, which
saw the introduction of h2 websocket support.
In 2.4, commit d1ac2b90c ("MAJOR: htx: Remove the EOM block type and
use HTX_FL_EOM instead") changed the HTX processing to destroy the
blocks as they are processed. So the traces that were emitted at the
end of the send headers functions didn't have anything to show.
Let's move these traces earlier in the function, right before the HTX
processing, so that everything is still in place.
This should be backported to 2.4.
Since commit 7d013e796 ("BUG/MEDIUM: mux-h2: Xfer rxbuf to the upper
layer when creating a front stream"), the rxbuf is lost during the
call to h2c_frt_stream_new(), so the trace that happens later cannot
find a request there and we've lost the useful part indicating what
the request looked like. Let's move the trace before this call.
This should be backported to 2.4.
We're seeing some browsers setting up multiple connections and closing
some to just keep one. It looks like they do this in case they'd
negotiate H1. This results in aborted prefaces and log pollution about
bad requests and "PR--" in the status flags.
We already have an option to ignore connections with no data, it's called
http-ignore-probes. But it was not used by the H2 mux. However it totally
makes sense to use it during the preface.
This patch changes this so that connections aborted before sending the
preface can avoid being logged.
This should be backported to 2.4 and 2.3 at least, and probably even
as far as 2.0.
"sent H2 request" was already misaligned with the 3 other ones
(sent/rcvd, request/response), and now with "new H2 connection" that's
yet another alignment making the traces even less legible. Let's just
realign all 5 messages, this even eases quick pointer comparisons. This
should probably be backported to 2.4 as it's where it's the most likely
to be used in the mid-term.
It is currently very difficult to match some H2 trace outputs against
some log extracts because there's no exactly equivalent info.
This patch tries to address this by adding a TRACE_USER() call in h2_init()
that is matched in h2_trace() to report:
- connection pointer and direction
- frontend's name or server's name
- transport layer and control layer (e.g. "SSL/tcpv4")
- source and/or destination depending on what is set
This now permits to get something like this at verbosity level complete:
<0>2021-06-16T18:30:19.810897+02:00 [00|h2|1|mux_h2.c:1006] new H2 connection : h2c=0x19fee50(F,PRF) : conn=0x7f373c026850(IN) fe=h2gw RAW/tcpv4 src=127.0.0.1:19540
<0>2021-06-16T18:30:19.810919+02:00 [00|h2|1|mux_h2.c:2731] rcvd H2 request : h2c=0x19fee50(F,FRH)
<0>2021-06-16T18:30:19.810998+02:00 [00|h2|1|mux_h2.c:1006] new H2 connection : h2c=0x1a04ee0(B,PRF) : conn=0x1a04ce0(OUT) sv=h2gw/s1 RAW/tcpv4 dst=127.0.0.1:4446
Implement a safe mechanism to close front idling connection which
prevents the soft-stop to complete. Every h1/h2 front connection is
added in a new per-thread list instance. On shutdown, a new task is
waking up which calls wake mux operation on every connection still
present in the new list.
A new stopping_list attach point has been added in the connection
structure. As this member is only used for frontend connections, it
shared the same union as the session_list reserved for backend
connections.
When a DATA frame is sent, we must take care to properly detect the EOM flag
on the HTX message to set ES flag on the frame when necessary, to finish the
stream. But it is only done when data are copied from the HTX message to the
mux buffer and not when the frame are sent via a zero-copy. This patch fixes
this bug.
It is a 2.4-specific bug. No backport is needed.
Since the input buffer is transferred to the stream when it is created,
there is no longer control on the request size to be sure the buffer's
reserve is still respected. It was automatically performed in h2_rcv_buf()
because the caller took care to provide the correct available space in the
buffer. The control is still there but it is no longer applied on the
request headers. Now, we should take care of the reserve when the headers
are decoded, before the stream creation.
The test is performed for the request and the response.
It is a 2.4-specific bug. No backport is needed.
The H2_CF_RCVD_SHUT flag is used to report a read0 was encountered. It is
used by the H2 mux to properly handle shutdowns. However, this flag is only
set when no data are received. If it is detected at the socket level when
some data are received, it is not handled. And because the event was
reported on the connection, any other read attempts are blocked. In this
case, we are unable to close the connection and release the mux
immediately. We must wait the mux timeout expires.
This patch should fix the issue #1231. It must be backported as far as 2.0.
When header are splitted over several frames, payload of HEADERS and
CONTINUATION frames are merged to form a unique HEADERS frame before
decoding the payload. To do so, info about the current frame are updated
(dff, dfl..) with info of the next one. Here there is a bug when the frame
length (dfl) is update. We must add the next frame length (hdr.dfl) and not
only the amount of data found in the buffer (clen). Because HEADERS frames
are decoded in one pass, dfl value is the whole frame length or 0. nothing
intermediary.
This patch must be backported as far as 2.0.
In the function decoding payload of HEADERS frames, an internal error is
returned if the frame length is too large. it cannot exceed the buffer
size. The same is true when headers are splitted on several frames. The
payload of HEADERS and CONTINUATION frames are merged and the overall size
must not exceed the buffer size.
However, there is a bug when the current frame is big enough to only have
the space for a part of the header of the next frame. Because, in this case,
we wait for more data, to have the whole frame header. We don't properly
detect that the headers are too large to be stored in one buffer. In fact
the test to trigger this error is not accurate. When the buffer is full, the
error is reported if the frame length exceeds the amount of data in the
buffer. But in reality, an error must be reported when we are unable to
decode the current frame while the buffer is full. Because, in this case, we
know there is no way to change this state.
When the bug happens, the H2 connection is woken up in loop, consumming all
the CPU. But the traffic is not blocked for all that.
This patch must be backported as far as 2.0.
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").
Let's use more explicit (but slightly longer) names now:
LIST_ADD -> LIST_INSERT
LIST_ADDQ -> LIST_APPEND
LIST_ADDED -> LIST_INLIST
LIST_DEL -> LIST_DELETE
The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.
The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.
The list doc was updated.
This patch replaces roughly all occurrences of an HA_ATOMIC_ADD(&foo, 1)
or HA_ATOMIC_SUB(&foo, 1) with the equivalent HA_ATOMIC_INC(&foo) and
HA_ATOMIC_DEC(&foo) respectively. These are 507 changes over 45 files.
MX_FL_NO_UPG flag may now be set on a multiplexer to explicitly disable
upgrades from this mux. For now, it is set on the FCGI multiplexer because
it is not supported and there is no upgrade on backend-only multiplexers. It
is also set on the H2 multiplexer because it is clearly not supported.
sess_log() was called twice if an error occurred on the preface parsing, in
h2c_frt_recv_preface() and in h2_process_demux().
This patch must be backported as far as 2.0.
The function's purpose used to be to fail a buffer allocation if that
allocation wouldn't result in leaving some buffers available. Thus,
some allocations could succeed and others fail for the sole purpose of
trying to provide 2 buffers at once to process_stream(). But things
have changed a lot with 1.7 breaking the promise that process_stream()
would always succeed with only two buffers, and later the thread-local
pool caches that keep certain buffers available that are not accounted
for in the global pool so that local allocators cannot guess anything
from the number of currently available pools.
Let's just replace all last uses of b_alloc_margin() with b_alloc() once
for all.
When tasklets were derived from tasks, there was no immediate need for
the scheduler to know their status after execution, and in a spirit of
simplicity they just started to always return NULL. The problem is that
it simply prevents the scheduler from 1) accounting their execution time,
and 2) keeping track of their current execution status. Indeed, a remote
wake-up could very well end up manipulating a tasklet that's currently
being executed. And this is the reason why those handlers have to take
the idle lock before checking their context.
In 2.5 we'll take care of making tasklets and tasks work more similarly,
but trouble is to be expected if we continue to propagate the trend of
returning NULL everywhere, especially if some fixes relying on a stricter
model later need to be backported. For this reason this patch updates all
known tasklet handlers to make them return NULL only when the tasklet was
freed. It has no effect for now and isn't even guaranteed to always be
100% safe but it puts the code into the right direction for this.
The default proxy was passed as a variable to all parsers instead of a
const, which is not without risk, especially when some timeout parsers used
to make some int pointers point to the default values for comparisons. We
want to be certain that none of these parsers will modify the defaults
sections by accident, so it's important to mark this proxy as const.
This patch touches all occurrences found (89).
There are multiple per-thread lists in the listeners, which isn't the
most efficient in terms of cache, and doesn't easily allow to store all
the per-thread stuff.
Now we introduce an srv_per_thread structure which the servers will have an
array of, and place the idle/safe/avail conns tree heads into. Overall this
was a fairly mechanical change, and the array is now always initialized for
all servers since we'll put more stuff there. It's worth noting that the Lua
code still has to deal with its own deinit by itself despite being in a
global list, because its server is not dynamically allocated.
These functions are used on the mux layer to indicate that the connection
is becoming idle and that the xprt ought to be careful before checking the
context or that it's not idle anymore and that the context is safe. The
purpose is to allow a mux which is going to release a connection to tell
the xprt to be careful when touching it. At the moment, the xprt are
always careful and that's costly so we want to have the ability to relax
this a bit.
No xprt layer uses this yet.
The muxes are touching the idle_conns_lock all the time now because
they need to be careful that no other thread has stolen their tasklet's
context.
This patch changes this a little bit by setting the TASK_F_USR1 flag on
the tasklet before marking a connection idle, and removing it once it's
not idle anymore. Thanks to this we have the guarantee that a tasklet
without this flag cannot be present in an idle list and does not need
to go through this costly lock. This is especially true for front
connections.
It's been too short for quite a while now and is now full. It's still
time to extend it to 32-bits since we have room for this without
wasting any space, so we now gained 16 new bits for future flags.
The values were not reassigned just in case there would be a few
hidden u16 or short somewhere in which these flags are placed (as
it used to be the case with stream->pending_events).
The patch is tagged MEDIUM because this required to update the task's
process() prototype to use an int instead of a short, that's quite a
bunch of places.
That comma should've been a semicolon. Fortunately, as it is now there
is no impact thanks to operators precedence, and all expressions are
properly evaluated. But this is troubling and the risk is high to
turn it into an effective bug with a minor change.
Introduced in b8ce8905cf which first
appeared in 2.1-dev3. This fix must be backported to 2.1+.
In H1, H2 and FCGI muxes, in the show_fd function, there is duplicated test on
the stream's subs field.
This patch fixes the issue #1142. It may be backported as far as 2.2.
Historically this function would try to wake the most accurate number of
process_stream() waiters. But since the introduction of filters which could
also require buffers (e.g. for compression), things started not to be as
accurate anymore. Nowadays muxes and transport layers also use buffers, so
the runqueue size has nothing to do anymore with the number of supposed
users to come.
In addition to this, the threshold was compared to the number of free buffer
calculated as allocated minus used, but this didn't work anymore with local
pools since these counts are not updated upon alloc/free!
Let's clean this up and pass the number of released buffers instead, and
consider that each waiter successfully called counts as one buffer. This
is not rocket science and will not suddenly fix everything, but at least
it cannot be as wrong as it is today.
This could have been marked as a bug given that the current situation is
totally broken regarding this, but this probably doesn't completely fix
it, it only goes in a better direction. It is possible however that it
makes sense in the future to backport this as part of a larger series if
the situation significantly improves.
The buffer wait queue used to be global historically but this doest not
make any sense anymore given that the most common use case is to have
thread-local pools. Thus there's no point waking up waiters of other
threads after releasing an entry, as they won't benefit from it.
Let's move the queue head to the thread_info structure and use
ti->buffer_wq from now on.
Remove ebmb_node entry from struct connection and create a dedicated
struct conn_hash_node. struct connection contains now only a pointer to
a conn_hash_node, allocated only for connections where target is of type
OBJ_TYPE_SERVER. This will reduce memory footprints for every
connections that does not need http-reuse such as frontend connections.
In h2_process there was two parts where the connection was removed from
the idle trees, without first checking if the connection is a backend
side.
This should not produce a crash as the node is properly zeroed on
conn_init. However, it is better to explicit the test as it is done on
all other places. Besides it will be mandatory if the node part is
dynamically allocated only for backend connections.
The server idle/safe/available connection lists are replaced with ebmb-
trees. This is used to store backend connections, with the new field
connection hash as the key. The hash is a 8-bytes size field, used to
reflect specific connection parameters.
This is a preliminary work to be able to reuse connection with SNI,
explicit src/dst address or PROXY protocol.
This is a preparation work for connection reuse with sni/proxy
protocol/specific src-dst addresses.
Protect every access to idle conn lists with a lock. This is currently
strictly not needed because the access to the list are made with atomic
operations. However, to be able to reuse connection with specific
parameters, the list storage will be converted to eb-trees. As this
structure does not have atomic operation, it is mandatory to protect it
with a lock.
For this, the takeover lock is reused. Its role was to protect during
connection takeover. As it is now extended to general idle conns usage,
it is renamed to idle_conns_lock. A new lock section is also
instantiated named IDLE_CONNS_LOCK to isolate its impact on performance.
In H1, H2 and FCGI muxes, b_realign_if_empty() is called to reset the head
of an empty buffer before setting it a specific value to permit the
zero-copy. Thus, we can remove call to b_realign_if_empty().
In the H2 mux, when a empty DATA frame is used to finish a message, just to
set the ES flag, we now only set the EOM flag on the HTX message. However,
if the HTX message is empty, this event will not be properly handled on the
other side because there is no effective data to handle. Thus, it is
interpreted as an abort by the H1 mux.
It is in part caused by the current H1 mux design but also because there is
no way to emit empty HTX block (NOOP HTX block) or to wakeup a mux for send
when there is no data to finish some internal processing.
Thus, for now, to work around this limitation, an EOT HTX block is added by
the H2 mux if a EOM flag is added on an empty HTX message. This case is only
possible when an empty DATA frame with the ES flag is received.
This fix is specific for 2.4. No backport needed.
The demux loop could quit on missing data but the H2_CF_END_REACHED flag
would not be set in this case. This fixes a remaining situation where
previous commit f09612289 ("BUG/MEDIUM: mux-h2: handle remaining read0
cases") could not be sufficient and still leave CLOSE_WAIT. It's harder
to reproduce but was still observed in prod.
Now we quit via the end of the loop which already takes care of shutr.
This should be backported along with the patch above as far as 2.0.
Commit 3d4631fec ("BUG/MEDIUM: mux-h2: fix read0 handling on partial
frames") tried to address an issue introduced in commit aade4edc1 where
read0 wasn't properly handled in the middle of a frame. But the fix was
incomplete for two reasons:
- first, it would set H2_CF_RCVD_SHUT in h2_recv() after detecting
a read0 but the condition was guarded by h2_recv_allowed() which
explicitly excludes read0 ;
- second, h2_process would only call h2_process_demux() when there
were still data in the buffer, but closing after a short pause to
leave a buffer empty wouldn't be caught in this case.
This patch fixes this by properly taking care of the received shutdown
and by also waking up h2_process_demux() on an empty buffer if the demux
is not blocked.
Given the patches above were tagged for backporting to 2.0, this one
should be as well.
Duplicate titles for the stats H2_ST_{OPEN,TOTAL}_{CONN,STREAM}. These
entries are used on csv for the heading.
This must be backported up to 2.3.
This fixes the github issue #1102.
In h2s_bck_make_req_headers() function, in the loop on the HTX blocks, the
most common blocks, the headers, are now handled in first, before the
start-line. The same change was already performed on the response HEADERS
frames. Thus the code is more consistent now.
When a HEADERS frame is sent, it is always when an HTX start-line block is
found. Thus, in h2s_bck_make_req_headers() and h2s_frt_make_resp_headers()
functions, it is useless to tests the start-line. Instead of being too
defensive, we use BUG_ON() now because it must not happen and must be
handled as a bug.
This patch should fix the issue #1086.
In order to announce support for the Extended CONNECT h2 method by
haproxy, always send the ENABLE_CONNECT_PROTOCOL h2 settings. This new
setting has been described in the rfc 8441.
After receiving ENABLE_CONNECT_PROTOCOL, the client is free to use the
Extended CONNECT h2 method. This can notably be useful for the support
of websocket handshake on http/2.
Support for the rfc 8441 Bootstraping WebSockets with HTTP/2
Generate an HTTP/2 Extended CONNECT request from a htx Upgrade message.
This conversion is done when seeing the header Connection: Upgrade. A
CONNECT request is written with the :protocol pseudo-header set from the
Upgrade htx header value.
The protocol is saved in the h2s structure. This is needed on the
response side because the protocol is not present on HTTP/2 response but
is needed if the client side is using HTTP/1.1 with 101 status code.
Support for the rfc 8441 Bootstraping WebSockets with HTTP/2
Convert a 200 status reply from an Extended CONNECT request into a htx
representation. The htx message is set to 101 status code to be fully
compatible with the equivalent HTTP/1.1 Upgrade mechanism.
This conversion is only done if the stream flags H2_SF_EXT_CONNECT_SENT
has been set. This is true if an Extended CONNECT request has already
been seen on the stream.
Besides the 101 status, the additional headers Connection/Upgrade are
added to the htx message. The protocol is set from the value stored in
h2s. Typically it will be extracted from the client request. This is
only used if the client is using h1 as only the HTTP/1.1 101 Response
contains the Upgrade header.
This flag is used to signal that an Extended CONNECT has been sent by
the server mux on the current stream.
This will allow to convert the response to a 101 htx status message.
Some responses must not contain data. Reponses to HEAD requests and 204/304
responses. But there is no warranty that this will be really respected by
the senders or even if it is possible. For instance, the method may be
rewritten by an http-request rule (HEAD->GET). Thus, it is not really
possible to always strip these data from the response at the receive
stage. And the response may be emitted by an applet or an internal service
not strictly following the spec. All that to say that we may be prepared to
handle payload for bodyless responses on the sending path.
In addition, unlike the HTTP/1, it is not really clear that the trailers is
part of the payload or not. Thus, some clients may expect to have the
trailers, if any, in the response to a HEAD request. For instance, the GRPC
status is placed in a trailer and clients rely on it. But what happens for
204 responses then. Read the following thread for details :
https://lists.w3.org/Archives/Public/ietf-http-wg/2020OctDec/0040.html
So, thanks to previous patches, it is now possible to know on the sending
path if a response must be bodyless or not. So, for such responses, no DATA
frame is emitted, except eventually the last empty one carring the ES
flag. However, the TRAILERS frames are still emitted. The h2s_skip_data()
function is added to take care to remove HTX DATA blocks without emitting
any DATA frame expect the last one, if there is no trailers.
The H2 message flag H2_MSGF_BODYLESS_RSP is now used during the request or
the response parsing to notify the mux that, considering the parsed message,
the response is known to have no body. This happens during HEAD requests
parsing and during 204/304 responses parsing.
On the H2 multiplexer, the equivalent flag is set on H2 streams. Thus the
H2_SF_BODYLESS_RESP flag is set on a H2 stream if the H2_MSGF_BODYLESS_RSP
is found after a HEADERS frame parsing. Conversely, this flag is also set
when a HEADERS frame is emitted for HEAD requests and for 204/304 responses.
The H2_SF_BODYLESS_RESP flag will be used to ignore data payload from the
response but not the trailers.
The EOM block may be removed. The HTX_FL_EOM flags is enough. Most of time,
to know if the end of the message is reached, we just need to have an empty
HTX message with HTX_FL_EOM flag set. It may also be detected when the last
block of a message with HTX_FL_EOM flag is manipulated.
Removing EOM blocks simplifies the HTX message filling. Indeed, there is no
more edge problems when the message ends but there is no more space to write
the EOM block. However, some part are more tricky. Especially the
compression filter or the FCGI mux. The compression filter must finish the
compression on the last DATA block. Before it was performed on the EOM
block, an extra DATA block with the checksum was added. Now, we must detect
the last DATA block to be sure to finish the compression. The FCGI mux on
its part must be sure to reserve the space for the empty STDIN record on the
last DATA block while this record was inserted on the EOM block.
The H2 multiplexer is probably the part that benefits the most from this
change. Indeed, it is now fairly easier to known when to set the ES flag.
The HTX documentaion has been updated accordingly.
Tunnel management between the H1 and H2 multiplexers is a bit blurred. And
the HTX is not enough well defined on this point to make things clear. In
fact, Establishing a tunnel between an H2 client and an H1 server, or the
opposite is buggy because the both multiplexers don't handle the EOM block
the same way when a tunnel is established. In fact, the H2 multiplexer is
pretty strict and add an END_STREAM flag when an EOM block is found, while
the H1 multiplexer is more flexible.
The purpose of this patch is to make the EOM block usage pretty clear and to
fix the HTTP multiplexers to really handle HTTP tunnels in the right
way. Now, an EOM block is used to mark the end of an HTTP message,
semantically speaking. That means it may be followed by tunneled data. Thus,
CONNECT requests are now finished by an EOM block, just after the EOH block.
On the H1 multiplexer side, a tunnel is now only established on the response
path. So a CONNECT request remains in a DONE state waiting for the 2xx
response. On the H2 multiplexer side, a flag is used to know an HTTP tunnel
is requested, to not immediately add the END_STREAM flag on the EOM block.
All these changes are sensitives and not backportable because of recent
changes. The same problem exists on earlier versions and should be
addressed. But it will only be possible with a specific patchset.
This patch relies on the following ones :
* MEDIUM: mux-h1: Properly handle tunnel establishments and aborts
* MEDIUM: mux-h2: Close streams when processing data for an aborted tunnel
* MEDIUM: mux-h2: Block client data on server side waiting tunnel establishment
* MINOR: mux-h2: Add 2 flags to help to properly handle tunnel mode
* MINOR: mux-h1: Split H1C_F_WAIT_OPPOSITE flag to separate input/output sides
* MINOR: mux-h1/mux-fcgi: Don't set TUNNEL mode if payload length is unknown
In the previous patch ("MEDIUM: mux-h2: Block client data on server side
waiting tunnel establishment"), we added a way to block client data for not
fully established tunnel on the server side. This one closes the stream with
an ERR_CANCEL erorr if there are some pending tunneled data while the tunnel
was aborted. This may happen on the client side if a non-empty DATA frame or
an empty DATA frame without the ES flag is received. This may also happen on
the server side if there is a DATA htx block. However in this last case, we
first wait the response is fully forwarded.
This patch contributes to fix the tunnel mode between the H1 and the H2
muxes.
On the server side, when a tunnel is not fully established, we must block
tunneled data, waiting for the server response. It is mandatory because the
server may refuse the tunnel. This happens when a DATA htx block is
processed in tunnel mode (H2_SF_BODY_TUNNEL flag set) but before the
response HEADERS frame is received (H2_SF_HEADERS_RCVD flag no set). In this
case, the H2_SF_BLK_MBUSY flag is set to mark the stream as busy. This flag
is removed when the tunnel is fully established or aborted.
This patch contributes to fix the tunnel mode between the H1 and the H2
muxes.
H2_SF_BODY_TUNNEL and H2_SF_TUNNEL_ABRT flags are added to properly handle
the tunnel mode in the H2 mux. The first one is used to detect tunnel
establishment or fully established tunnel. The second one is used to abort a
tunnel attempt. It is the first commit having as a goal to fix tunnel
establishment between H1 and H2 muxes.
There is a subtlety in h2_rcv_buf(). CS_FL_EOS flag is added on the
conn-stream when ES is received on a tunneled stream. It really reflects the
conn-stream state and is mandatory for next commits.
As stated in the RFC7540, section 8.1.1, the HTTP/2 removes support for the
101 informational status code. Thus a PROTOCOL_ERROR is now returned to the
server if a 101-switching-protocols response is received. Thus, the server
connection is aborted.
This patch may be backported as far as 2.0.
Since commit aade4edc1 ("BUG/MEDIUM: mux-h2: Don't handle pending read0
too early on streams"), we've met a few cases where an early connection
close wouldn't be properly handled if some data were pending in a frame
header, because the test now considers the buffer's contents before
accepting to report the close, but given that frame headers or preface
are consumed at once, the buffer cannot make progress when it's stuck
at intermediary lengths.
In order to address this, this patch introduces two flags in the h2c
connection to store any reported shutdown and failed parsing. The idea
is that we cannot rely on conn_xprt_read0_pending() in the parser since
it wouldn't consider data pending in the buffer nor intermediary layers,
but we know for certain that after a read0 is reported by the transport
layer in presence of an RD_SH on the connection, no more progress will
be made there. This alone is not sufficient to decide to end processing,
we can only do this once these final data have been submitted to a parser.
Therefore, now when a parser fails on missing data, we check if a read0
has already been reported on this connection, and if so we set a new
END_REACHED flag on the connection to indicate a failure to process the
final data. The h2c_read0_pending() function now simply reports this
flag's status. This way we're certain that the input shutdown is only
considered after the demux attempted to parse the last frame.
Maybe over the long term the subscribe() API should be improved to
synchronously fail when trying to subscribe for an even that will not
happen. This may be an elegant solution that could possibly work across
multiple layers and even muxes, and be usable at a few specific places
where that's needed.
Given the patch above was backported as far as 2.0, this one should be
backported there as well. It is possible that the fcgi mux has the same
issue, but this was not analysed yet.
Thanks to Pierre Cheynier for providing detailed traces allowing to
quickly narrow the problem down, and to Olivier for his analysis.
Just like the H1 muliplexer, when a new frontend H2 stream is created, the
rxbuf is xferred to the stream at the upper layer.
Originally, it is not a bug fix, but just an api standardization. And in
fact, it fixes a crash when a h2 stream is aborted after the request parsing
but before the first call to process_stream(). It crashes since the commit
8bebd2fe5 ("MEDIUM: http-ana: Don't process partial or empty request
anymore"). It is now totally unexpected to have an HTTP stream without a
valid request. But here the stream is unable to get the request because the
client connection was aborted. Passing it during the stream creation fixes
the bug. But the true problem is that the stream-interfaces are still
relying on the connection state while only the muxes should do so.
This fix is specific for 2.4. No backport needed.
Now the show_fd helpers at the transport and mux levels return an integer
which indicates whether or not the inspected entry looks suspicious. When
an entry is reported as suspicious, "show fd" will suffix it with an
exclamation mark ('!') in the dump, that is supposed to help detecting
them.
For now, helpers were adjusted to adapt to the new API but none of them
reports any suspicious entry yet.
In FD dumps it's often very important to figure what upper layer function
is going to be called. Let's export the few I/O callbacks that appear as
tasklet functions so that "show fd" can resolve them instead of printing
a pointer relative to main. For example:
1028 : st=0x21(R:rA W:Ra) ev=0x01(heopI) [lc] tmask=0x2 umask=0x2 owner=0x7f00b889b200 iocb=0x65b638(sock_conn_iocb) back=0 cflg=0x00001300 fe=recv mux=H2 ctx=0x7f00c8824de0 h2c.st0=FRH .err=0 .maxid=795 .lastid=-1 .flg=0x0000 .nbst=0 .nbcs=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=0 .orph_cnt=0 .sub=1 .dsi=795 .dbuf=0@(nil)+0/0 .msi=-1 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] xprt=SSL xprt_ctx=0x7f00c86d0750 xctx.st=0 .xprt=RAW .wait.ev=1 .subs=0x7f00c88252e0(ev=1 tl=0x7f00a07d1aa0 tl.calls=1047 tl.ctx=0x7f00c8824de0 tl.fct=h2_io_cb) .sent_early=0 .early_in=0
That was causing confusing outputs like this one whenan H2S is known:
1030 : ... last_h2s=0x2ed8390 .id=775 .st=HCR.flg=0x4001 .rxbuf=...
^^^^
This was introduced by commit ab2ec4540 in 2.1-dev2 so the fix can be
backported as far as 2.1.
This is a regression in 7838a79ba ("MEDIUM: mux-h2/trace: add lots of traces
all over the code"). The issue was found using -Wmisleading-indentation.
This patch fixes GitHub issue #1015.
The impact of this bug is that it could in theory cause occasional delays
on some long responses for connections having otherwise no traffic.
This patch should be backported to 2.1+, the commit was first tagged in
v2.1-dev2.