The SUB_CAN_SEND/SUB_CAN_RECV enum values have been confusing a few
times, especially when checking them on reading. After some discussion,
it appears that calling them SUB_RETRY_SEND/SUB_RETRY_RECV more
accurately reflects their purpose since these events may only appear
after a first attempt to perform the I/O operation has failed or was
not completed.
In addition the wait_reason field in struct wait_event which carries
them makes one think that a single reason may happen at once while
it is in fact a set of events. Since the struct is called wait_event
it makes sense that this field is called "events" to indicate it's the
list of events we're subscribed to.
Last, the values for SUB_RETRY_RECV/SEND were swapped so that value
1 corresponds to recv and 2 to send, as is done almost everywhere else
in the code an in the shutdown() call.
Today the demux only wakes a stream up after receiving some contents, but
not necessarily on close or error. Let's do it based on both error flags
and both EOS flags. With a bit of refinement we should be able to only do
it when the pending bits are there but not the static ones.
No backport is needed.
This function is called when dealing with a connection error or a GOAWAY
frame. It used to report a synchronous error instead of an asycnhronous
error, which can lead to data truncation since whatever is still available
in the rxbuf will be ignored. Let's correctly use CS_FL_ERR_PENDING instead
and only fall back to CS_FL_ERROR if CS_FL_EOS was already delivered.
No backport is needed.
If EOS has already been reported on the conn_stream, there won't be
any read anymore to turn ERR_PENDING into ERROR, so we have to do
report it directly.
No backport is needed.
The h2s pointer was used to scan fctl lists prior to being used to scan
the send list by ID, so it could appear non-null eventhough the list is
empty, resulting in misleading information on empty connections.
No backport is needed.
Most of the time when we issue "show fd" to dump a mux's state, it's
to figure why a transfer is frozen. Connection, stream and conn_stream
states are critical there. And most of the time when this happens there
is a single stream left in the H2 mux, so let's always dump the last
known stream on show fd, as most of the time it will be the one of
interest.
Commit 7505f94f9 ("MEDIUM: h2: Don't use a wake() method anymore.")
changed the conditions to restart demuxing so that this happens as soon
as something is read. But similar to previous fix, at an end of stream
we may be woken up with nothing to read but data still available in the
demux buffer, so we must also use this as a valid condition for demuxing.
No backport is needed, this is purely 1.9.
Commit 082f559d3 ("BUG/MEDIUM: h2: restart demuxing after releasing
buffer space") tried to address a situation where transfers could stall
after a read, but the condition was not completely covered : some stalls
may still happen at end of stream because there's nothing anymore to
receive and the last data lie in the demux buffer. Thus we must also
consider this state as a valid condition to restart demuxing.
No backport is needed.
Add a new flag to conn_streams, CS_FL_ERR_PENDING. This is to be set instead
of CS_FL_ERR in case there's still more data to be read, so that we read all
the data before closing.
In h2_deferred_shut, if we're done sending the shutr/shutw, don't destroy
the h2s if it still has a conn_stream attached, or the conn_stream may try
to access it again.
When creating new conn_streams, always set the CS_FL_NOT_FIRST flag. We
don't really care about being the first request for HTTP/2, this only
really makes sense for HTTP/1, and that way we can reuse connections.
In session, don't keep an infinite number of connection that can idle.
Add a new frontend parameter, "max-session-srv-conns" to set a max number,
with a default value of 5.
Instead of trying to get the session from the connection, which is not
always there, and of course there could be multiple sessions per connection,
provide it with the init() and attach() methods, so that we know the
session for each outgoing stream.
In the mux_h1 and mux_h2, move the test to see if we should add the
connection in the idle list until after we destroyed the h1s/h2s, that way
later we'll be able to check if the connection has no stream at all, and if
it should be added to the server idling list.
When using zerocopy, start from the beginning of the data, not from the
beginning of the buffer, it may have contained headers, and so the data
won't start at the beginning of the buffer.
The transpory layer now respects buffer alignment, so we don't need to
cheat anymore pretending we have some data at the head, adjusting the
buffer's head is enough.
For a long time we've been realigning empty buffers in the transport
layers, where the I/Os were performed based on callbacks. Doing so is
optimal for higher data throughput but makes it trickier to optimize
unaligned data, where mux_h1/h2 have to claim some data are present
in the buffer to force unaligned accesses to skip the frame's header
or the chunk header.
We don't need to do this anymore since the I/O calls are now always
performed from top to bottom, so it's only the mux's responsibility
to realign an empty buffer if it wants to.
In practice it doesn't change anything, it's just a convention, and
it will allow the code to be simplified in a next patch.
In the mux detach function, when using HTX, take the connection over if
it no longer has an owner (ie because the session that was the owner left).
It is done for legacy code in proto_http.c, but not for HTX.
Also when using HTX, in H2, try to add the connection back to idle_conns if
it was not already (ie we used to use all the available streams, and we're
freeing one). That too was done in proto_http.c.
H2 has a 9-byte frame header, and HTX has a 40-byte frame header.
By artificially advancing the Rx header and limiting the amount of
bytes read to protect the end of the buffer, we can make the data
payload perfectly aligned with HTX blocks and optimize the copy.
This is similar to what was done for the H1 mux : when the mux's buffer
is empty and the htx area contains exactly one data block of the same
size as the requested count, and all window and frame size conditions are
satisfied, then it's possible to simply swap the caller's buffer with the
mux's output buffer and adjust offsets and length to match the entire
DATA HTX block in the middle. An H2 frame header has to be prepended
before the block but this always fits in an HTX frame header.
In this case we perform a true zero-copy operation from end-to-end. This
is the situation that happens all the time with large files. When using
HTX over H2 over TLS, this brings a 3% extra performance gain. TLS remains
a limiting factor here but the copy definitely has a cost. Also since
haproxy can now use H2 in clear, the savings can be higher.
Due to blocking factor being different on H1 and H2, we regularly end
up with tails of data blocks that leave room in the mux buffer, making
it tempting to copy the pending frame into the remaining room left, and
possibly realigning the output buffer.
Here we check if the output buffer contains data, and prefer to wait
if either the current frame doesn't fit or if it's larger than 1/4 of
the buffer. This way upon next call, either a zero copy, or a larger
and aligned copy will be performed, taking the whole chunk at once.
Doing so increases the H2 bandwidth by slightly more than 1% on large
objects.
By default H2 uses a 65535 bytes window for the connection, and changing
it requires sending a WINDOW_UPDATE message. We only used to update the
window when receiving data, thus never increasing it further.
As reported by user klzgrad on the mailing list, this seriously limits
the upload bitrate, and will have an even higher impact on the backend
H2 connections to origin servers.
There is no technical reason for keeping this window so low, so let's
increase it to the maximum possible value (2G-1). We do this by
pretending we've already received that many data minus the maximum
data the client might already send (65535), so that an early
WINDOW_UPDATE message is sent right after the SETTINGS frame.
This should be backported to 1.8. This patch depends on previous
patch "BUG/MINOR: mux-h2: refrain from muxing during the preface".
The condition to refrain from processing the mux was insufficient as it
would only handle the outgoing connections. In essence it is not that much
of a problem since we don't have streams yet on an incoming connetion. But
it prevents waiting for the end of the preface before sending an early
WINDOW_UPDATE message, thus causing the connections to fail in this case.
This must be backported to 1.8 with a few minor adaptations.
All the HTX definition is self-contained and doesn't really depend on
anything external since it's a mostly protocol. In addition, some
external similar files (like h2) also placed in common used to rely
on it, making it a bit awkward.
This patch moves the two htx.h files into a single self-contained one.
The historical dependency on sample.h could be also removed since it
used to be there only for http_meth_t which is now in http.h.
This way we don't open-code the HPACK status codes anymore in the H2
code. Special care was taken not to cause any slowdown as this code is
very sensitive.
Jerome reported that outgoing H2 failed for methods different from GET
or POST. It turns out that the HPACK encoding is performed by hand in
the outgoing headers encoding function and that the data length was not
incremented to cover the literal method value, resulting in a corrupted
HEADERS frame.
Admittedly this code should move to the generic HPACK code.
No backport is needed.
Gcc 7 warns about a potential null pointer deref that cannot happen
since the start line block is guaranteed to be present in the functions
where it's dereferenced. Let's mark it as already checked.
When we're using HTX, we don't have to generate chunk header/trailers, and
that ultimately leads to a crash when we try to access a buffer that
contains just chunk trailers.
This should not be backported.
Since HTX stores header names in lower case already, we don't need to
do it again anymore. This increased H2 performance by 2.7% on quick
tests, now making H2 overr HTX about 5.5% faster than H2 over H1.
CS_FL_RCV_MORE is used in two cases, to let the conn_stream
know there may be more data available, and to let it know that
it needs more room. We can't easily differentiate between the
two, and that may leads to hangs, so split it into two flags,
CS_FL_RCV_MORE, that means there may be more data, and
CS_FL_WANT_ROOM, that means we need more room.
This should not be backported.
Due to a thinko, I used sl_off as the start line index number but it's
not it, it's its offset. The first index is obtained using htx_get_head(),
and the start line is obtained using htx_get_sline(). This caused crashes
to happen when forwarding HTX traffic via the H2 mux once the HTX buffer
started to wrap.
No backport is needed.
Now, the function htx_from_buf() will set the buffer's length to its size
automatically. In return, the caller should call htx_to_buf() at the end to be
sure to leave the buffer hosting the HTX message in the right state. When the
caller can use the function htxbuf() to get the HTX message without any update
on the underlying buffer.
It's incorrect to send more bytes than requested, because some filters
(e.g. compression) might intentionally hold on some blocks, so DATA
blocks must not be processed past the advertised byte count. It is not
the case for headers however.
No backport is needed.
If we're blocking on mux full, mux busy or whatever, we must get out of
the loop. In legacy mode this problem doesn't exist as we can normally
return 0 but here it's not a sufficient condition to stop sending, so
we must inspect the blocking flags as well.
No backport is needed.
The way htx_xfer_blks() was used is wrong, if we receive data, we must
report everything we found, not just the headers blocks. This ways causing
the EOM to be postponed and some fast responses (or errors) to be incorrectly
delayed.
No backport is needed.
In h2_snd_buf(), when running with htx, make sure we return the amount of
data the caller specified, if we emptied the buffer, as it is what the
caller expects, and will lead to him properly consider the buffer to be
empty.
When reaching h2_shutr/h2_shutw, as we may have generated an empty frame,
a goaway or a rst, make sure we wake the I/O tasklet, or we may not send
what we just generated.
Also in h2_shutw(), don't forget to return if all went well, we don't want
to subscribe the h2s to wait events.
Add a new method to muxes, "max_streams", that returns the max number of
streams the mux can handle. This will be used to know if a mux is in use
or not.
When creating a new stream, don't bother flagging a connection with
H2_CF_DEM_TOOMANY if we created the last available stream. We won't create
any other anyway, because h2_avail_streams() would return 0 available streams,
and has it is a blocking flag, it prevents us from reading data after.