This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.
In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.
It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
The type file is becoming a mess, half of it is for the proxy protocol,
another good part describes conn_streams and mux ops, it would deserve
being split again. At least it was reordered so that elements are easier
to find, with the PP-stuff left at the end. The MAX_SEND_FD macro was moved
to compat.h as it's said to be the value for Linux.
A few includes had to be added, namely list-t.h in the type file and
types/proxy.h in the proto file. actions.h was including http-htx.h
but didn't need it so it was dropped.
The various hpack files are self-contained, but hpack-tbl was one of
those showing difficulties when pools were added because that began
to add quite some dependencies. Now when built in standalone mode,
it still uses the bare minimum pool definitions and doesn't require
to know the prototypes anymore when only the structures are needed.
Thus the files were moved verbatim except for hpack-tbl which was
split between types and prototypes.
Most of the file was a large set of HTX elements manipulation functions
and few types, so splitting them allowed to further reduce dependencies
and shrink the build time. Doing so revealed that a few files (h2.c,
mux_pt.c) needed haproxy/buf.h and were previously getting it through
htx.h. They were fixed.
The file was moved as-is. There was a wrong dependency on dynbuf.h
instead of buf.h which was addressed. There was no benefit to
splitting this between types and functions.
The pretty confusing "buffer.h" was in fact not the place to look for
the definition of "struct buffer" but the one responsible for dynamic
buffer allocation. As such it defines the struct buffer_wait and the
few functions to allocate a buffer or wait for one.
This patch moves it renaming it to dynbuf.h. The type definition was
moved to its own file since it's included in a number of other structs.
Doing this cleanup revealed that a significant number of files used to
rely on this one to inherit struct buffer through it but didn't need
anything from this file at all.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.
The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).
A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.
No further cleanup was done yet in order to keep changes minimal.
Instead of using malloc/free to allocate an HPACK table, let's declare
a pool. However the HPACK size is configured by the H2 mux, so it's
also this one which allocates it after post_check.
In the various muxes, add a comment documenting that once
srv_add_to_idle_list() got called, any thread may pick that conenction up,
so it is unsafe to access the mux context/the connection, the only thing we
can do is returning.
Make the "list" element a struct mt_list, and explicitely use
list_from_mt_list to get a struct list * where it is used as such, so that
mt_list_for_each_entry will be usable with it.
Revamp the server connection lists. We know have 3 lists :
- idle_conns, which contains idling connections
- safe_conns, which contains idling connections that are safe to use even
for the first request
- available_conns, which contains connections that are not idling, but can
still accept new streams (those are HTTP/2 or fastcgi, and are always
considered safe).
Make it so sessions are not responsible for connection anymore, except for
connections that are private, and thus can't be shared, otherwise, as soon
as a request is done, the session will just add the connection to the
orphan connections pool.
This will break http-reuse safe, but it is expected to be fixed later.
Remove the list of private connections from server, it has been largely
unused, we only inserted connections in it, but we would never actually
use it.
When building with DEBUG_STRICT, there are still some BUG_ON(events&event_type)
in the subscribe() code which are not welcome anymore since we explicitly
permit to wake the caller up on readiness. This causes some regtests to fail
since 2c1f37d353 ("OPTIM: mux-h1: subscribe rather than waking up at a few
other places") when built with this option.
No backport is needed, this is 2.2-dev.
This lock was only needed to protect the buffer_wq list, but now we have
the mt_list for this. This patch simply turns the buffer_wq list to an
mt_list and gets rid of the lock.
It's worth noting that the whole buffer_wait thing still looks totally
wrong especially in a threaded context: the wakeup_cb() callback is
called synchronously from any thread and may end up calling some
connection code that was not expected to run on a given thread. The
whole thing should probably be reworked to use tasklets instead and be
a bit more centralized.
In H1/H2/FCGI, the *_get_buf() functions try to disable receipt of data
when there's no buffer available. But they do so at the lowest possible
level, which is unrelated to the upper transport layers which may still
be trying to feed data based on subscription. The correct approach here
would theorically be to only disable subscription, though when we get
there, the subscription will already have been dropped, so we can safely
just remove that call.
It's unlikely that this could have had any practical impact, as the upper
xprt layer would call this callback which would fail an not resubscribe.
Having the lowest layer disabled would just be temporary since when
re-enabling reading, a subscribe at the end of data would re-enable it.
Backport should not harm but seems useless at this point.
When calling the mux "destroy" method, the argument should be the mux
context, not the connection. In a few instances in the mux code, the
connection was used (mainly when the session wouldn't handle the idle
connection, and the server pool was fool), and that could lead to random
segfaults.
This should be backported to 2.1, 2.0, and 1.9
In commit 477902b ("MEDIUM: connections: Get ride of the xprt_done
callback.") we added an inconditional call to h2_wake_some_streams()
in h2_wake(), though we must not do it if the connection is destroyed
or we end up with a use-after-free. In this case it's already done in
h2_process() before destroying the connection anyway.
Let's just add this test for now. A cleaner approach might consist in
doing it in the h2_process() function itself when a connection status
change is detected.
No backport is needed, this is purely 2.2.
While the H2 parser properly checks for the absence of anything but
"trailers" in the TE header field, we forget to check this when sending
the request to an H2 server. The problem is that an H2->H2 conversion
may keep "gzip" and fail on the next stage.
This patch makes sure that we only send "TE: trailers" if the TE header
contains the "trailers" token, otherwise it's dropped.
This fixes issue #464 and should be backported till 1.9.
As mentioned in commit c192b0ab95 ("MEDIUM: connection: remove
CO_FL_CONNECTED and only rely on CO_FL_WAIT_*"), there is a lack of
consistency on which flags are checked among L4/L6/HANDSHAKE depending
on the code areas. A number of sample fetch functions only check for
L4L6 to report MAY_CHANGE, some places only check for HANDSHAKE and
many check both L4L6 and HANDSHAKE.
This patch starts to make all of this more consistent by introducing a
new mask CO_FL_WAIT_XPRT which is the union of L4/L6/HANDSHAKE and
reports whether the transport layer is ready or not.
All inconsistent call places were updated to rely on this one each time
the goal was to check for the readiness of the transport layer.
The xprt_done_cb callback was used to defer some connection initialization
until we're connected and the handshake are done. As it mostly consists of
creating the mux, instead of using the callback, introduce a conn_create_mux()
function, that will just call conn_complete_session() for frontend, and
create the mux for backend.
In h2_wake(), make sure we call the wake method of the stream_interface,
as we no longer wakeup the stream task.
The subscriber used to be passed as a "void *param" that was systematically
cast to a struct wait_event*. By now it appears clear that the subscribe()
call at every layer is well defined and always takes a pointer to an event
subscriber of type wait_event, so let's enforce this in the functions'
prototypes, remove the intermediary variables used to cast it and clean up
the comments to clarify what all these functions do in their context.
This is the continuation of the recv+send event notifications merge
that was started. This patch is less trivial than the previous ones
because the existence of a send event subscription is also used to
decide to put a stream back into the send list.
The logic handling the deferred shutdown is a bit complex because it
involves a wait_event struct in each h2s dedicated to subscribing to
itself when shutdowns are not immediately possible. This implies that
we will not be able to support a shutdown and a receive subscription
in the future when we merge all wait events.
Let's solely rely on the H2_SF_WANT_SHUT_{R,W} flags instead and have
an autonomous tasklet for this. This requires to add a few controls
in the code because now when waking up a stream we need to check if it
is for I/O or just a shut, but since sending and shutting are exclusive
it's not difficult.
One point worth noting is that further resources could be shaved off
by only allocating the tasklet when failing to shut, given that in the
vast majority of streams it will never be used. In fact the sole purpose
of the tasklet is to support calling this code from outside the H2 mux
context. Looking at the code, it seems that not too many adaptations
would be required to have the send_list walking code deal with sending
the shut bits itself and further simplify all this.
This partially reverts commit d846c267 ("MINOR: h2: Don't run tasks that
are waiting to send if mux in full"). This commit was introduced to
limit the start/stop overhead incurred by waking many streams to let
only a few work. But since commit 9c218e7521 ("MAJOR: mux-h2: switch
to next mux buffer on buffer full condition."), this situation occurs
way less (typically 2000 to 4000 times less often) and the benefits of
the patch above do not outweigh its shortcomings anymore. And commit
c7ce4e3e7f ("BUG/MEDIUM: mux-h2: don't stop sending when crossing a
buffer boundary") addressed a root cause of many unexpected sleeps and
wakeups.
The main problem it's causing is that it requires to keep the element
in the send_wait list until it's executed, leaving the entry in an
uncertain state, and significantly complicating the coexistence of this
list and the wait list dedicated to shutdown. Also it happens that this
call to tasklet_remove_from_task_list() will not be usable anymore once
we start to support streams on different threads. And finally, some of
the other streams that we remove might very well have managed to find
their way to the h2_snd_buf() with an unblocked condition as well so it
is possible that some of these removals were not welcome.
So this patch now makes sure that send_wait is immediately nulled when
the task is woken up, and that we don't have to play with it afterwards.
Since we don't need to stop the tasklets anymore, we don't need the
sending_list that we can remove.
However one very useful benefit of the sending_list was that it used to
provide the information about the fact that the stream already tried to
send and failed. This was an important factor to improve fairness because
late arrived streams should not be allowed to send if others are already
scheduled. So this patch introduces a new per-stream flag H2_SF_NOTIFIED
to distinguish such streams.
With this patch the fairness is preserved, and the ratio of aborted
h2_snd_buf() due to other streams already sending remains quite low
(~0.3-2.1% measured depending on object size, this is within
expectations for 100 independent streams).
If the contention issue the patch above used to address comes up again
in the future, a much better (though more complicated) solution would
be to switch to per-connection buffer pools to distribute between the
connection and the streams so that by default there are more buffers
available for the mux and the streams only have some when the mux's are
unused, i.e. it would push the memory pressure back to the data layer.
One observation made while developing this patch is that when dealing
with large objects we still spend a huge amount of time scanning the
send_list with tasks that are already woken up every time a send()
manages to purge a bit more data. Indeed, by removing the elements
from the list when H2_SF_NOTIFIED is set, the netowrk bandwidth on
1 MB objects fetched over 100 streams per connection increases by 38%.
This was not done here to preserve fairness but is worth studying (e.g.
by keeping a restart pointer on the list or just having a flag indicating
if an entry was added since last scan).
In version 2.0, after commit 9c218e7521 ("MAJOR: mux-h2: switch to next
mux buffer on buffer full condition."), the H2 mux started to use a ring
buffer for the output data in order to reduce competition between streams.
However, one corner case was suboptimally covered: when crossing a buffer
boundary, we have to shrink the outgoing frame size to the one left in
the output buffer, but this shorter size is later used as a signal of
incomplete send due to a buffer full condition (which used to be true when
using a single buffer). As a result, function h2s_frt_make_resp_data()
used to return less than requested, which in turn would cause h2_snd_buf()
to stop sending and leave some unsent data in the buffer, and si_cs_send()
to subscribe for sending more later.
But it goes a bit further than this, because subscribing to send again
causes the mux's send_list not to be empty anymore, hence extra streams
can be denied the access to the mux till the first stream is woken again.
This causes a nasty wakeup-sleep dance between streams that makes it
totally impractical to try to remove the sending list. A test showed
that it was possible to observe 3 million h2_snd_buf() giveups for only
100k requests when using 100 concurrent streams on 20kB objects.
It doesn't seem likely that a stream could get blocked and time out due
to this bug, though it's not possible either to demonstrate the opposite.
One risk is that incompletely sent streams do not have any blocking flags
so they may not be identified as blocked. However on first scan of the
send_list they meet all conditions for a wakeup.
This patch simply allows to continue on a new frame after a partial
frame. with only this change, the number of failed h2_snd_buf() was
divided by 800 (4% of calls). And by slightly increasing the H2C_MBUF_CNT
size, it can go down to zero.
This fix must be backported to 2.1 and 2.0.
Previous commit 989539b048 ("BUG/MINOR: mux-h2: use a safe
list_for_each_entry in h2_send()") accidently lost its sending_list test,
resulting in some elements to be woken up again while already in the
sending_list and h2_unsubscribe() crashing on integrity tests (only
when built with DEBUG_DEV).
If the fix above is backported this one must be as well.
h2_send() uses list_for_each_entry() to scan paused streams and resume
them, but happily deletes any leftover from a previous failed unsubscribe,
which is obviously not safe and would corrupt the list. In practice this
is a proof that this doesn't happen, but it's not the best way to prove it.
In order to fix this and reduce the maintenance burden caused by code
duplication (this list walk exists at 3 places), let's introduce a new
function h2_resume_each_sending_h2s() doing exactly this and use it at
all 3 places.
This bug was introduced as a side effect of fix 998410a41b ("BUG/MEDIUM:
h2: Revamp the way send subscriptions works.") so it should be backported
as far as 1.9.
Since commit fa8aa867b915 ("MEDIUM: connections: Change struct
wait_list to wait_event.") we no longer use this section.
this should fix github issue #437
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Christopher found another issue in the H2 backend implementation that
results from a miss in the H2 spec: the processing of a HEADERS frame
is always permitted in IDLE state, but this doesn't make sense on the
response path! And here when facing such a frame, we try to decode it
while we didn't allocate any stream, so we end up trying to fill the
idle stream's buffer (read-only) and crash.
What we're doing here is that if we get a HEADERS frame in IDLE state
from a server, we terminate the connection with a PROTOCOL_ERROR. No
such transition seems to be permitted by the spec but it seems to be
the only sane solution.
This fix must be backported as far as 1.9. Note that in 2.0 and earlier
there's no h2_frame_check_vs_state() function, instead the check is
inlined in h2_process_demux().
Some BUG_ON() tests emit a warning because of a potential null pointer
dereference on an HTX block. In fact, it should never happen, but now, GCC is
happy.
This patch must be backported to 2.0.
In case a stream tries to send on a connection error, we must report the
error so that the stream interface keeps the data available and may safely
retry on another connection. Till now this would happen only before the
connection was established, not in case of a failed handshake or an early
GOAWAY for example.
This should be backported to 2.0 and 1.9.
If a connection faces an error or a timeout, it must be removed from its
idle list ASAP. We certainly don't want to risk sending new streams on
it.
This should be backported to 2.0 (replacing MT_LIST_DEL with LIST_DEL_LOCKED)
and 1.9 (there's no lock there, the idle lists are per-thread and per-server
however a LIST_DEL_INIT will be needed).
If an H2 mux has met an error, we must not report available streams
anymore, or it risks to accumulate new streams while not being able
to process them.
This should be backported to 2.0 and 1.9.
Add a new method, ctl(), to muxes. It uses a "enum mux_ctl_type" to
let it know which information we're asking for, and can output it either
directly by returning the expected value, or by using an optional argument.
"output" argument.
Right now, the only known mux_ctl_type is MUX_STATUS, that will return 0 if
the mux is not ready, or MUX_STATUS_READY if the mux is ready.
We probably want to backport this to 1.9 and 2.0.
In h2_send(), when something is sent, we remove the flags
(H2_CF_MUX_MFULL|H2_CF_DEM_MROOM) on the h2 connection. This way, we are able to
wake up all streams waiting to send data. Unfortunatly, these flags are
unconditionally removed, even when nothing was sent. So if the h2c is blocked
because the mux buffers are full and we are unable to send anything, all streams
in the send_list are woken up for nothing. Now, we only remove these flags if at
least a send succeeds.
This patch must be backport to 2.0.
The logs were added to the H2 mux so that we can report logs in case
of errors that prevent a stream from being created, but as a side effect
these logs are emitted twice for backend connections: once by the H2 mux
itself and another time by the upper layer stream. It can even happen
more with connection retries.
This patch makes sure we do not emit logs for backend connections.
It should be backported to 2.0 and 1.9.
Instead of mapping the Host header field to :authority, we now act
differently if the request is in origin form or in absolute form.
If it's absolute, we extract the scheme and the authority from the
request, fix the path if it's empty, and drop the Host header.
Otherwise we take the scheme from the http/https flags in the HTX
layer, make the URI be the path only, and emit the Host header,
as indicated in RFC7540#8.1.2.3. This allows to distinguish between
absolute and origin requests for H1 to H2 conversions.
There are some reports of users not being able to pass "enterprise"
traffic through haproxy when using H2 because it doesn't emit CONTINUATION
frames and as such is limited to headers no longer than the negociated
max-frame-size which usually is 16 kB.
This patch implements support form emitting CONTINUATION when a HEADERS
frame cannot fit within a limit of mfs. It does this by first filling a
buffer-wise frame, then truncating it starting from the tail to append
CONTINUATION frames. This makes sure that we can truncate on any byte
without being forced to stop on a header boundary, and ensures that the
common case (no fragmentation) doesn't add any extra cost. By moving
the tail first we make sure that each byte is moved only once, thus the
performance impact remains negligible.
This addresses github issue #249.