The frame check code in the demuxer was moved to its own function to
keep the demux function clean enough. This also simplifies the test
case as we can now simply call this function once in H2_CS_FRAME_P
state.
When parsing references to stick-tables declared as backends, they are added to
a list of proxies (they are proxies!) which refer to this stick-tables.
Before this patch we added them to these list without checking they were already
present, making the silly hypothesis the actions/sample were checked/resolved in the same
order the proxies are parsed.
This patch implement a simple inline function to in_proxies_list() to test
the presence of a proxy in a list of proxies. We use this function when resolving
/checking samples/actions.
This bug was introduced by 015e4d7 commit.
Must be backported to 2.0.
In SMTP, MySQL and PgSQL checks, we're supposed to finish with a message
to politely quit the server, otherwise some of them will log some errors.
This is the case with Postfix as reported in GH issue #187. Since commit
fe4abe6 ("BUG/MEDIUM: connections: Don't call shutdown() if we want to
disable linger.") we are a bit more aggressive on outgoing connection
closure and checks were not prepared for this.
This patch makes the 3 checks above disable the linger_risk for these
checks so that we close cleanly, with the side effect that it will leave
some TIME_WAIT connections behind (hence why it should not be generalized
to all checks). It's worth noting that in issue #187 it's mentioned that
this patch doesn't seem to be sufficient for Postfix, however based only
on local network activity this looks OK, so maybe this will need to be
improved later.
Given that the patch above was backported to 2.0 and 1.9, this one should
as well.
In Patrick's trace it was visible that after a stream had been missed,
the next stream would receive a WINDOW_UPDATE with the first one's
credit added to its own. This makes sense because in case of error
h2c->rcvd_s is not reset. Given that this counter is per frame, better
reset it when starting to parse a new frame, it's easier and safer.
This must be backported as far as 1.8.
In h2_process_mux() if we have some room and an attempt to send a window
update for the connection was pending, it's done first. But it's not done
for the stream, which will have for effect of postponing this attempt
till next pass into h2_process_demux(), at the risk of seeing the send
buffer full again. Let's always try to send both pending frames as soon
as possible.
This should be backported as far as 1.8.
Patrick Hemmer reported a rare case where the H2 mux emits spurious
RST_STREAM(STREAM_CLOSED) that are triggered by the send path and do
not even appear to be associated with a previous incoming frame, while
the send path never emits such a thing.
The problem is particularly complex (hence its rarity). What happens is
that when data are uploaded (POST) we must refill the sending stream's
window by sending a WINDOW_UPDATE message (and we must refill the
connection's too). But in a highly bidirectional traffic, it is possible
that the mux's buffer will be full and that there is no more room to build
this WINDOW_UPDATE frame. In this case the demux parser switches to the
H2_CS_FRAME_A state, noting that an "acknowledgement" is needed for the
current frame, and it doesn't change the current stream nor frame type.
But the stream's state was possibly updated (typically OPEN->HREM when
a DATA frame carried the ES flag).
Later the data can leave the buffer, wake up h2_io_cb(), which calls
h2_send() to send pending data, itself calling h2_process_mux() which
detects that there are unacked data in the connection's window so it
emits a WINDOW_UPDATE for the connection and resets the counter. so it
emits a WINDOW_UPDATE for the connection and resets the counter. Then
h2_process() calls h2_process_demux() which continues the processing
based on the current frame type and the current state H2_CS_FRAME_A.
Unfortunately the protocol compliance checks matching the frame type
against the current state are still present. These tests are designed
for new frames only, not for those in progress, but they are not
limited by frame types. Thus the current DATA frame is checked again
against the current stream state that is now HREM, and fails the test
with a STREAM_CLOSED error.
The quick and backportable solution consists in adding the test for
this ACK and bypass all these checks that were already validated prior
to the state transition. A better long-term solution would consist in
having a new state between H and P indicating the frame is new and
needs to be checked ("N" for new?) and apply the protocol tests only
in this state. In addition everywhere we decide to send a window
update, we should send a stream WU first if there are unacked data
for the current stream. Last, rcvd_s should always be reset when
transitioning to FRAME_H (and a BUGON for this in dev would help).
The bug will be way harder to trigger on 2.0 than on 1.8/1.9 because
we have a ring buffer for the connection so the buffer full situations
are extremely rare.
This fix must be backpored to all versions having H2 (as far as 1.8).
Special thanks to Patrick for providing exploitable traces.
If the server decides to close early, we don't want to send a
REFUSED_STREAM error but a CANCEL, so that the client doesn't want
to retry. The test in h2_do_shutw() was wrong for this as it would
handle the HLOC case like the case where nothing had been sent for
this stream, which is wrong. Now h2_do_shutw() does nothing in this
case and lets h2_do_shutr() decide.
Note that this partially undoes f983d00a1 ("BUG/MINOR: mux-h2: make
the do_shut{r,w} functions more robust against retries").
This must be backported to 2.0. The patch above was not backported to
1.9 for being too risky there, but if it eventually gets to it, this
one will be needed as well.
There is a test on the existence of the conn_stream when receiving data,
to be sure to have somewhere to deliver it. Right now it responds with
STREAM_CLOSED, which is not correct since from an H2 point of view the
stream is not closed and a peer could be upset to see this. After some
analysis, it is important to keep this test to be sure not to fill the
rxbuf then stall the connection. Another option could be to modiffy
h2_frt_transfer_data() to silently discard any contents but the CANCEL
error code is designed exactly for this and to save the peer from
continuing to stream data that will be discarded, so better switch to
using this.
This must be backported as far as 1.8.
The test in h2s_send_rst_stream() is excessive, it refrains from sending an
RST_STREAM if *the last frame* was an RST_STREAM, regardless of the stream
ID. In a context where both clients and servers abort a lot, it could happen
that one RST_STREAM is dropped from responses from time to time, causing
delays to the client.
This must be backported to 2.0, 1.9 and 1.8.
In _update_fd(), if we're about to remove the FD from the poller, remove
both the receive and the send bits, instead of removing the receive bits
twice.
In fd_dodelete(), always reset the polled_mask bits, instead on only doing
it if we're closing the file descriptor. We call the poller clo() method
anyway, and failing to do so means that if fd_remove() is used while the
fd is polled, the poller won't attempt to poll on a fd with the same value
as the old one.
This leads to fd being stuck in the SSL code while using the async engine.
This should be backported to 2.0, 1.9 and 1.8.
This commit adds a new `Question.md` issue template that aims to prevent
users asking questions within the issue tracker, because the other two
templates do not fit "question", thus leading to people not seeing that
the issue tracker is not for questions.
This template inspired by TypeScript's Question template at:
https://github.com/microsoft/TypeScript/blob/master/.github/ISSUE_TEMPLATE/Question.md
In stream_set_backend(), if we have a TCP stream, and we want to upgrade it
to H2 instead of attempting ot reuse the stream, just destroy the
conn_stream, make sure we don't log anything about the stream, and pretend
we failed setting the backend, so that the stream will get destroyed.
New streams will then be created by the mux, as if the connection just
happened.
This fixes a crash when upgrading from TCP to H2, as the H2 mux totally
ignored the conn_stream provided by the upgrade, as reported in github
issue #196.
This should be backported to 2.0.
The SETTINGS frame parser updates all streams' window for each
INITIAL_WINDOW_SIZE setting received on the connection (like h2spec
does in test 6.5.3), which can start to be expensive if repeated when
there are many streams (up to 100 by default). A quick test shows that
it's possible to parse only 35000 settings per second on a 3 GHz core
for 100 streams, which is rather small.
Given that window sizes are relative and may be negative, there's no
point in pre-initializing them for each stream and update them from
the settings. Instead, let's make them relative to the connection's
initial window size so that any change immediately affects all streams.
The only thing that remains needed is to wake up the streams that were
unblocked by the update, which is now done once at the end of
h2_process_demux() instead of once per setting. This now results in
5.7 million settings being processed per second, which is way better.
In order to keep the change small, the h2s' mws field was renamed to
"sws" for "stream window size", and an h2s_mws() function was added
to add it to the connection's initial window setting and determine the
window size to use when muxing. The h2c_update_all_ws() function was
renamed to h2c_unblock_sfctl() since it's now only used to unblock
previously blocked streams.
This needs to be backported to all versions till 1.8.
Recent optimization in commit 4d7a88482 ("MEDIUM: mux-h2: don't try to
read more than needed") broke the receipt of large DATA frames because
it would unconditionally subscribe if there was some room left, thus
preventing any new rx from being done since subscription may only be
done once the end was reached, as indicated by ret == 0.
However, fixing this uncovered that in HTX mode previous versions might
occasionally be affected as well, when an available frame is the same
size as the maximum data that may fit into an HTX buffer, we may end
up reading that whole frame and still subscribe since it's still allowed
to receive, thus causing issues to read the next frame.
This patch will only work for 2.1-dev but a minor adaptation will be
needed for earlier versions (down to 1.9, where subscribe() was added).
Between 1.6 and 1.7, some parts of the stream forwarding process were
moved into lower layers and the stream-interface had to keep the
stream's task up to date regarding the timeouts. The analyser timeouts
were not updated there as it was believed this was not needed during
forwarding, but actually there is a case for this which is "option
contstats" which periodically triggers the analyser timeout, and this
change broke the option in case of sustained traffic (if there is some
I/O activity during the same millisecond as the timeout expires, then
the update will be missed).
This patch simply brings back the analyser expiration updates from
process_stream() to stream_int_notify().
It may be backported as far as 1.7, taking care to adjust the fields
names if needed.
There are some situations, after sending a request or response, upon I/O
completion, or applet execution, where we end up with an empty buffer that
was not released. This results in excessive memory usage (back to 1.5) and
a lower CPU cache efficiency since buffers are not recycled as fast. This
has changed since the places where we send have changed with the new
layering, but not all cases susceptible of leaving an empty buffer were
properly spotted. Doing so reduces the memory pressure on buffers by about
2/3 in high traffic tests.
This should be backported to 2.0 and maybe 1.9.
Multiple calls to smp_fetch_fhdr use the header context to keep track of
header parsing position; however, when using header sampling on a raw
connection, the raw buffer is converted into an HTX structure each time, and
this was done in the trash areas; so the block reference would be invalid on
subsequent calls.
This patch must be backported to 2.0 and 1.9.
When the server weight is increased in consistant hash, extra nodes have to be
allocated. So a realloc() is performed on the nodes array of the server. the
previous commit 962ea7732 ("BUG/MEDIUM: lb-chash: Remove all server's entries
before realloc() to re-insert them after") have fixed the size used during the
realloc() to avoid segfaults. But another bug remains. After the realloc(), the
memory area allocated for the nodes array may change, invalidating all node
addresses in the chash tree.
So, to fix the bug, we must remove all server's entries from the chash tree
before the realloc to insert all of them after, old nodes and new ones. The
insert will be automatically handled by the loop at the end of the function
chash_queue_dequeue_srv().
Note that if the call to realloc() failed, no new entries will be created for
the server, so the effective server weight will be unchanged.
This issue was reported on Github (#189).
This patch must be backported to all versions since the 1.6.
When using a ckch we should never try to free its content, because it
won't be usable after and can result in a NULL derefence during
parsing.
The content was previously freed because the ckch wasn't stored in a
tree to be used later, now that we use it multiple time, we need to keep
the data.
This test occasionally fails on the Travis CI tests because the
"in progress" bit is sometimes still set (or set again) in the show
servers state output and is not expected in all regexes (some do
already cover it), like in this one :
https://travis-ci.com/haproxy/haproxy/jobs/221324920
Let's extend the remaining ones to accept this as well. Other tests
do not seem affected as they only expect sequences of digits there.
It happens that upon looping threads the watchdog fires, starts a dump,
and other threads expire their budget while waiting for the other threads
to get dumped and trigger a watchdog event again, adding some confusion
to the traces. With this patch the situation becomes clearer as we export
the list of threads being dumped so that the watchdog can check it before
deciding to trigger. This way such threads in queue for being dumped are
not attempted to be reported in turn.
This should be backported to 2.0 as it helps understand stack traces.
If a thread dump is requested from a signal handler, it may interrupt
a thread already waiting for a dump to complete, and may see the
threads_to_dump variable go to zero while others are waiting, steal
the lock and prevent other threads from ever completing. This tends
to happen when dumping many threads upon a watchdog timeout, to threads
waiting for their turn.
Instead now we proceed in two steps :
1) the last dumped thread sets all bits again
2) all threads only wait for their own bit to appear, then clear it
and quit
This way there's no risk that a bit performs a double flip in the same
loop and threads cannot get stuck here anymore.
This should be backported to 2.0 as it clarifies stack traces.
Recent changes use struct cert_key_and_chain to load all certificates in
frontends, this structure was previously used only to load multi-cert
bundle, which is supported only on >= 1.0.2.
The h2_recv() loop was historically built around a loop to deal with
the callback model but this is not needed anymore, as it the upper
layer wants more data, it will simply try to read again. Right now
50% of the recvfrom() calls made over H2 return EAGAIN. With this
change it doesn't happen anymore. Note that the code simply consists
in breaking the loop, and reporting real data receipt instead of
always returning 1.
A test was made not to subscribe if we actually read data but it
doesn't change anything since we might be subscribed very early
already.
In the poller code, instead of just remembering if we're currently polling
a fd or not, remember if we're polling it for writing and/or for reading, that
way, we can avoid to modify the polling if it's already polled as needed.
Now that the architecture was changed so that attempts to receive/send data
always come from the upper layers, instead of them only trying to do so when
the lower layer let them know they could try, we can finally get rid of the
fd cache. We don't really need it anymore, and removing it gives us a small
performance boost.
OPENSSL_NO_DH can be defined to avoid obsolete and heavy DH processing.
With OPENSSL_NO_DH, parse the entire PEM file to look at DHparam is wast
of time.
Load a PEM certificate and use it in CTX are now decorrelated.
Checking the certificate and private key consistency can be done
earlier: in loading phase instead CTX set phase.
cert_key_and_chain handling is now outside openssl 1.0.2 #if: the
code must be libssl compatible. SSL_CTX_add1_chain_cert and
SSL_CTX_set1_chain requires openssl >= 1.0.2, replace it by legacy
SSL_CTX_add_extra_chain_cert when SSL_CTX_set1_chain is not provided.
A problem involving server slowstart was reported by @max2k1 in issue #197.
The problem is that pendconn_grab_from_px() takes the proxy lock while
already under the server's lock while process_srv_queue() first takes the
proxy's lock then the server's lock.
While the latter seems more natural, it is fundamentally incompatible with
mayn other operations performed on servers, namely state change propagation,
where the proxy is only known after the server and cannot be locked around
the servers. Howwever reversing the lock in process_srv_queue() is trivial
and only the few functions related to dynamic cookies need to be adjusted
for this so that the proxy's lock is taken for each server operation. This
is possible because the proxy's server list is built once at boot time and
remains stable. So this is what this patch does.
The comments in the proxy and server structs were updated to mention this
rule that the server's lock may not be taken under the proxy's lock but
may enclose it.
Another approach could consist in using a second lock for the proxy's queue
which would be different from the regular proxy's lock, but given that the
operations above are rare and operate on small servers list, there is no
reason for overdesigning a solution.
This fix was successfully tested with 10000 servers in a backend where
adjusting the dyncookies in loops over the CLI didn't have a measurable
impact on the traffic.
The only workaround without the fix is to disable any occurrence of
"slowstart" on server lines, or to disable threads using "nbthread 1".
This must be backported as far as 1.8.
Load the DH param at the same time as the certificate, we don't need to
open the file once more and read it again. We store it in the ckch_node.
There is a minor change comparing to the previous way of loading the DH
param in a bundle. With a bundle, the DH param in a certificate file was
never loaded, it only used the global DH or the default DH, now it's
able to use the DH param from a certificate file.
Don't read a certificate file again if it was already stored in the
ckchn tree. It allows HAProxy to start more quickly if the same
certificate is used at different places in the configuration.
HAProxy lookup in the ssl_sock_load_cert() function, doing it at this
level allows to skip the reading of the certificate in the filesystem.
If the certificate is not found in the tree, we insert the ckch_node in
the tree once the certificate is read on the filesystem, the filename or
the bundle name is used as the key.
Split the functions which open the certificates.
Instead of opening directly the certificates and inserting them directly
into a SSL_CTX, we use a struct cert_key_and_chain to store them in
memory and then we associate a SSL_CTX to the certificate stored in that
structure.
Introduce the struct ckch_node for the multi-cert bundles so we can
store multiple cert_key_and_chain in the same structure.
The functions ssl_sock_load_multi_cert() and ssl_sock_load_cert_file()
were modified so they don't open the certicates anymore on the
filesystem. (they still open the sctl and ocsp though). These functions
were renamed ssl_sock_load_ckchn() and ssl_sock_load_multi_ckchn().
The new function ckchn_load_cert_file() is in charge of loading the
files in the cert_key_and_chain. (TODO: load ocsp and sctl from there
too).
The ultimate goal is to be able to load a certificate from a certificate
tree without doing any filesystem access, so we don't try to open it
again if it was already loaded, and we share its configuration.
This structure was only used in the case of the multi-cert bundle.
Using these primitives everywhere when we load the file are a first step
in the deduplication of the code.
This commit merges the function ssl_sock_load_cert_file() and
ssl_sock_load_cert_chain_file().
The goal is to refactor the SSL code and use the cert_key_and_chain
struct to load everything.
When the payload of a block is shrinked or enlarged, addresses of the free
spaces must be updated. There are many possible cases. One of them is
buggy. When there is only one block in the HTX message and its payload is just
before the tail room and it needs to be moved in the head room to be enlarged,
addresses are not correctly updated. This bug may be hit by the compression
filter.
This patch must be backported to 2.0.
The flag HLUA_TXN_HTTP_RDY was added in the previous commit to know when a
function is called for a channel with a valid HTTP message or not. Of course it
also depends on the calling direction. In this commit, we allow the execution of
functions of the HTTP class only if this flag is set.
Nobody seems to use them from an unsupported context (for instance, trying to
set an HTTP header from a tcp-request rule). But it remains a bug leading to
undefined behaviors or crashes.
This patch may be backported to all versions since the 1.6. It depends on the
commits "MINOR: hlua: Add a flag on the lua txn to know in which context it can
be used" and "MINOR: hlua: Don't set request analyzers on response channel for
lua actions".
When a lua action or a lua sample fetch is called, a lua transaction is
created. It is an entry in the stack containing the class TXN. Thanks to it, we
can know the direction (request or response) of the call. But, for some
functions, it is also necessary to know if the buffer is "HTTP ready" for the
given direction. "HTTP ready" means there is a valid HTTP message in the
channel's buffer. So, when a lua action or a lua sample fetch is called, the
flag HLUA_TXN_HTTP_RDY is set if it is appropriate.
Setting some requests analyzers on the response channel was an old trick to be
sure to re-evaluate the request's analyers after the response's ones have been
called. It is no more necessary. In fact, this trick was removed in the version
1.8 and backported up to the version 1.6.
This patch must be backported to all versions since 1.6 to ease the backports of
fixes on the lua code.
It is invalid to manipulate responses from http-request rules or to manipulate
requests from http-response rules. When http-request rules are evaluated, the
connection to server is not yet established, so there is no response at all. And
when http-response rules are evaluated, the request has already been sent to the
server.
Now, the calling direction is checked. So functions "txn.http:req_*" can now
only be called from http-request rules and the functions "txn.http:res_*" can
only be called from http-response rules.
This issue was reported on Github (#190).
This patch must be backported to all versions since the 1.6.
For HTX streams, when txn:done() is called, the work is delegated to the
function http_reply_and_close(). But it is not enough. The channel's analyzers
must also be reset. Otherwise, some analyzers may still be called while
processing should be aborted.
For instance, if the function is called from an http-request rules on the
frontend, request analyzers on the backend side are still called. So we may try
to add an header to the request, while this one was already reset.
This patch must be backported to 2.0 and 1.9.