The error handling in the HTTP payload forwarding is far from ideal because
both sides (request and response) are tested each time. It is especially ugly
on the request side. To report a server error instead of a client error,
some workarounds are used to delay the error handling. The reason is that
the request analyzer is evaluated before the response one. In addition,
errors are tested before the data analysis. This means it is possible to
truncate data because errors may be handled too early.
So the error handling at these stages was totally reviewed. Aborts are now
handled after the data analysis. We also no longer finish the response on a
request error, or the other way around. As a side effect, the HTTP_MSG_ERROR
state is now useless. As another side effect, the termination flags are now
set by the HTTP analysers and no longer by process_stream().
It was inherited from the legacy HTTP mode, but the message parsing is
handled by the underlying mux now. Thus, if a message is in HTTP_MSG_ERROR
state, it is just an analysis error and not a parsing error. So there is no
reason to block the HTTP sample fetch evaluation in this case.
This patch could be backported to all stable versions (for 2.0, only the
htx part must be updated).
When HAProxy is waiting for the request body and an abort or an error is
detected, we can now use the http_set_term_flags() function to set the
termination flags of the stream instead of handling them by hand.
When we wait for the request body, we are still in the request analysis. So
a SF_FINST_R flag must be reported in logs. Even if some data were already
received, at this stage, nothing has been sent to the server.
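For illustration, a minimal before/after sketch (the exact SF_ERR_* value is
illustrative; the point is that the helper replaces the hand-written
assignments):

    /* before: termination flags were set by hand on abort/error */
    if (!(s->flags & SF_ERR_MASK))
        s->flags |= SF_ERR_CLICL;
    if (!(s->flags & SF_FINST_MASK))
        s->flags |= SF_FINST_R;   /* still in request analysis */

    /* after: the helper picks a default SF_ERR_* and the proper
     * SF_FINST_* depending on the stream state */
    http_set_term_flags(s);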
This patch could be backported to all stable versions.
We use the new function to set the HTTP termination flags in the most
obvious places. The other places are a bit specific and will be handled one
by one in dedicated patches.
There is already a function to set termination flags but it is not well
suited for HTTP streams. So a function, dedicated to the HTTP analysis, was
added. This way, this new function will be called for HTTP analysers on
error. And if the error is not caught at this stage, the generic function
will still be called from process_stream().
Here, by default a PRXCOND error is reported and, depending on the stream
state, the reason will be set accordingly (see the sketch after this list):
* If the backend SC is in INI state, SF_FINST_T is reported on tarpit and
SF_FINST_R otherwise.
* SF_FINST_Q if the server connection is queued.
* SF_FINST_C in any connection attempt state (REQ/TAR/ASS/CONN/CER/RDY).
Except for applets, for which SF_FINST_R is reported.
* Once the server connection is established, SF_FINST_H is reported until
the response reaches the HTTP_MSG_DATA state.
* SF_FINST_L is reported if the response is in HTTP_MSG_DONE state or
higher and a client error/timeout was reported.
* Otherwise SF_FINST_D is reported.
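A condensed sketch of that ladder (the lowercase guards in_tarpit, is_applet
and client_err are placeholders, not the actual implementation):

    if (!(s->flags & SF_ERR_MASK))
        s->flags |= SF_ERR_PRXCOND;              /* default error cause */

    if (!(s->flags & SF_FINST_MASK)) {
        if (s->scb->state == SC_ST_INI)          /* no connection attempt yet */
            s->flags |= (in_tarpit ? SF_FINST_T : SF_FINST_R);
        else if (s->scb->state == SC_ST_QUE)     /* queued on a server */
            s->flags |= SF_FINST_Q;
        else if (s->scb->state < SC_ST_EST)      /* REQ/TAR/ASS/CONN/CER/RDY */
            s->flags |= (is_applet ? SF_FINST_R : SF_FINST_C);
        else if (s->txn->rsp.msg_state < HTTP_MSG_DATA)
            s->flags |= SF_FINST_H;              /* still on response headers */
        else if (s->txn->rsp.msg_state >= HTTP_MSG_DONE && client_err)
            s->flags |= SF_FINST_L;
        else
            s->flags |= SF_FINST_D;              /* during data transfer */
    }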
When the promex applet triggers an error, for instance because the URI is
invalid, we must still take care to consume the request. Otherwise, the
error will be handled by HTTP analyzers as a server abort.
This patch must be backported as far as 2.0.
Since 2.6 with commit 34e4085f8 ("MEDIUM: peers: Balance applets across
threads") the initialization of a peers appctx may be postponed with a
wakeup, causing some partially initialized appctx to be visible. The
"show peers" command used to only care about peers without appctx, but
now it must also take care of those with no stconn, otherwise it can
occasionally crash while dumping them.
This fix must be backported to 2.6. Thanks to Patrick Hemmer for
reporting the problem.
During multiple tests we've already noticed that shared stats counters
have become a real bottleneck under large thread counts. With QUIC it's
pretty visible, with qc_snd_buf() taking 2.5% of the CPU on a 48-thread
machine at only 25 Gbps, and this CPU is entirely spent in the atomic
increment of the byte count and byte rate. It's also visible in H1/H2
but slightly less since we're working with larger buffers, hence less
frequent updates. These counters are exclusively used to report the
byte count in "show info" and the byte rate in the stats.
Let's move them to the thread_ctx struct and make the stats reader
just collect each thread's stats when requested. That's way more
efficient than competing on a single cache line.
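For illustration, a minimal generic sketch of the pattern (names and the
thread bound are illustrative, not HAProxy's actual fields):

    #define MAX_THREADS 64        /* illustrative bound */

    /* one counter per thread, each on its own cache line: writers never
     * contend and false sharing is avoided */
    struct th_bytes {
        unsigned long long out;
    } __attribute__((aligned(64)));

    static struct th_bytes th_bytes[MAX_THREADS];

    /* hot path: plain add on the local thread's line, no atomic RMW */
    static inline void count_bytes_out(int tid, unsigned long long len)
    {
        th_bytes[tid].out += len;
    }

    /* cold path ("show info"): walk all threads and sum; slightly stale
     * values are fine for statistics */
    static unsigned long long total_bytes_out(int nbthread)
    {
        unsigned long long sum = 0;

        for (int t = 0; t < nbthread; t++)
            sum += th_bytes[t].out;
        return sum;
    }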
After this, qc_snd_buf() has totally disappeared from the perf profile,
and tests made in H1 show a roughly 1% performance increase on small
objects.
When updating an OCSP response, in case of HTTP error (host unreachable
for instance) we do not want to reinsert the entry at the same place in
the update tree, otherwise we might immediately retry the update of the
same response. This patch adds an arbitrary 1 minute delay to the
next_update of a response in such a case.
After an HTTP error, instead of waking the update task up after an
arbitrary 10s time, we look for the first entry of the update tree and
sleep for the appropriate time.
In the unlikely event that the ocsp update task is started but the
update tree is empty, put the update task to sleep indefinitely.
The only way this can happen is if the same certificate is loaded under
two different names while the second one has the 'ocsp-update on'
option. Since the certificate names are distinct we will have two
ckch_stores but a single certificate_ocsp because they are identified by
the OCSP_CERTID which is built out of the issuer certificate and the
certificate id (which are the same regardless of the .pem file name).
If incompatibilities are found in a certificate's ocsp-update mode, we now
raise a single alert that is considered fatal from here on. This is changed
because, in case of incompatibilities, we would end up with undefined
behaviour: the OCSP response might or might not be updated depending on the
order in which the multiple ocsp-update options are taken into account.
An arbitrary 5 minute minimum interval between two updates of the same
OCSP response is defined, but it was not properly enforced when inserting
entries in the update tree.
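A small sketch of the intended clamping when computing the tree key (names
are hypothetical):

    #include <time.h>

    #define OCSP_UPDATE_MIN_DELAY (5 * 60)   /* 5 minute floor, in seconds */

    /* never schedule the next update of a response sooner than
     * now + 5 minutes, whatever its nextUpdate field says */
    static time_t ocsp_next_update_key(time_t expected_next_update)
    {
        time_t floor = time(NULL) + OCSP_UPDATE_MIN_DELAY;

        return (expected_next_update < floor) ? floor : expected_next_update;
    }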
This patch does not need to be backported.
Since commit 36d9097cf ("MINOR: fd: Add BUG_ON checks on fd_insert()"),
there is currently a test in fd_insert() to detect that we're not trying
to reinsert an FD that had already been inserted. This test catches the
following anomalies:
frontend fail1
bind fd@0
bind fd@0
and:
frontend fail2
bind fd@0 shards 2
What happens is that clone_listener() is called on a listener already
having an FD, and when sock_{inet,unix}_bind_receiver() are called, the
same FD will be registered multiple times and rightfully crash in the
sanity check.
It wouldn't be correct to block shards though (e.g. they could be used
in a default-bind line). What looks like a safer and more future-proof
approach simply is to dup() the FD so that each listener has one copy.
This is also the only solution that might allow later to support more
than 64 threads on an inherited FD.
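A sketch of the idea (names are illustrative; dup() itself is the whole
fix):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* give a cloned listener its own copy of the inherited FD so that
     * fd_insert() never sees the same FD registered twice */
    static int listener_dup_fd(int inherited_fd)
    {
        int fd = dup(inherited_fd);

        if (fd < 0)
            fprintf(stderr, "cannot dup inherited fd: %s\n", strerror(errno));
        return fd;  /* each shard / bind line now owns a distinct FD */
    }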
This needs to be backported as far as 2.4. Better wait for at least
one extra -dev version before backporting though, as the bug should
not be triggered often anyway.
Since the tool permits passing an FD bound for listening, it's convenient
to test haproxy's "bind fd@". Let's add support for UNIX sockets the same
way. -U needs to be passed to change the default address family, and the
address must contain a "/".
E.g.
$ dev/tcploop/tcploop -U /tmp/ux L Xi ./haproxy -f fd1.cfg
The do_resolv action triggers a resolution and must wait for the
result. Concretely, if no cache entry is available, it creates a resolution
and wakes up the resolvers task. Then it yields. When the action is
recalled, if the resolution is still running, it yields again.
However, when the action is recalled, it does not check whether the
resolution was actually started. Thus, it is possible to ignore the
resolution because the action was recalled before the resolvers task had a
chance to be executed. If there is no result yet, the action must yield.
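A condensed sketch of the corrected recall logic; the two predicates stand
in for the resolution's real state fields:

    enum act_return { ACT_RET_CONT, ACT_RET_YIELD };

    /* recalled action: "not running" alone is not enough, the resolvers
     * task may simply not have been executed yet */
    static enum act_return do_resolv_recall(int running, int has_result)
    {
        if (running || !has_result)
            return ACT_RET_YIELD;   /* wait for the resolvers task */
        return ACT_RET_CONT;        /* a result is available: use it */
    }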
This patch should fix the issue #1993. It must be backported as far as 2.0.
Both of these functions are buggy and don't respect the documentation. They
must wait for more data, if possible.
For Channel.data(), this must happen if not enough data was received, or if
no length was specified and no data was received. The first case is properly
handled but not the second one: an empty string is returned instead. In
addition, if there is no data and the channel can't receive more data, a
'nil' value must be returned.
In the same spirit, for Channel.line(), we must try to wait for more data
when no line is found, if not enough data was received or if no length was
specified. Here again, only the first case is properly handled. And for this
function too, a 'nil' value must be returned if there is no data and the
channel can't receive more data.
This patch is related to the issue #1993. It must be backported as far as
2.5.
It is possible to have an "upgrade:" header and the corresponding value in
the "connection:" header for a non-101 response. It happens for
426-Upgrade-Required messages. However, on HAProxy side, a parsing error is
reported for this kind of message because no websocket key header
("sec-websocket-accept:") is found in the response.
So a possible fix could be to not perform this test for non-101
responses. However, having flags about protocol upgrade on this kind of
response could lead to other bugs. Instead, corresponding flags are
removed. Thus, during the H1 response post-parsing, H1_MF_CONN_UPG and
H1_MF_UPG_WEBSOCKET flags are removed from any non-101 response.
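A minimal sketch of the post-parsing fix (the flag names are the ones above;
the bit values are illustrative):

    #include <stdint.h>

    #define H1_MF_CONN_UPG       0x00000200u  /* illustrative values */
    #define H1_MF_UPG_WEBSOCKET  0x00100000u

    /* during H1 response post-parsing: a non-101 response (e.g. a 426)
     * may carry "Connection: upgrade" + "Upgrade:" headers but must not
     * be flagged as a protocol upgrade */
    static void h1_drop_upgrade_flags(uint32_t *flags, int status)
    {
        if (status != 101)
            *flags &= ~(H1_MF_CONN_UPG | H1_MF_UPG_WEBSOCKET);
    }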
This patch should fix the issue #1997. It must be backported as far as 2.4.
Sending is done with several iterations over qcs streams in qc_send().
The first loop is conducted over streams in <qcc.send_list>. After this
first iteration, some streams may still have data in their Tx buffer but
were blocked by a full qc_stream_desc buffer. In this case, they have
released their qc_stream_desc buffer in qcc_streams_sent_done().
New iterations can be done for these streams, which may then allocate a new
qc_stream_desc buffer if available. Before this patch, this was done
through another stream list, <qcc.send_retry_list>. Now, we can reuse the
new <qcc.send_list> for this usage.
This is safe because, after the first iteration, we have the guarantee that
one of the following is true for any stream still in <qcc.send_list>:
* transport layer has rejected data due to congestion
* stream is left because it is blocked on stream flow control
* stream still has data and has released a full qc_stream_desc
  buffer. Immediate retry is useful for these streams: they will
  allocate a new qc_stream_desc buffer, if possible, to continue sending.
This must be backported up to 2.7.
When a STOP_SENDING or RESET_STREAM frame must be sent, its corresponding qcs
is inserted into <qcc.send_list> via qcc_reset_stream() or
qcc_abort_stream_read().
This allows removing the iteration over the full qcs tree in qc_send().
Instead, STOP_SENDING and RESET_STREAM emission is done in the loop over
<qcc.send_list>, as for STREAM frames. This should slightly improve
performance, most notably when a large number of streams are opened.
This must be backported up to 2.7.
Complete the qcc_send_stream() function to allow specifying whether the
stream should be handled in priority. Internally, this will insert the qcs
instance at the front of <qcc.send_list> so that it can be treated before
other streams.
This functionality is useful when some QUIC streams should be sent before
others. Most notably, this is used to guarantee that H3 SETTINGS emission is
done first, via the control stream.
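A sketch of the resulting helper, using HAProxy's intrusive list macros (the
exact signature and member names are assumptions based on the description
above):

    /* register <qcs> for emission; urgent streams (e.g. the H3 control
     * stream carrying SETTINGS) go to the front of <qcc.send_list> so
     * they are treated before all other streams */
    void qcc_send_stream(struct qcs *qcs, int urg)
    {
        struct qcc *qcc = qcs->qcc;

        if (!LIST_INLIST(&qcs->el_send)) {
            if (urg)
                LIST_INSERT(&qcc->send_list, &qcs->el_send); /* head */
            else
                LIST_APPEND(&qcc->send_list, &qcs->el_send); /* tail */
        }
    }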
This must be backported up to 2.7.
Implement a mechanism to register streams ready to send data in new
STREAM frames. Internally, this is implemented with a new list
<qcc.send_list> which contains qcs instances.
A qcs can be registered safely using the new function qcc_send_stream().
This is done automatically in qc_send_buf() which covers most cases.
Also, the application layer is free to use it for internal usage streams.
This is currently the case for H3 control stream with SETTINGS sending.
The main point of this patch is to handle stream sending fairly. This is
in stark contrast with previous code where streams with lower ID were
always prioritized. This could cause other streams to be indefinitely
blocked behind a stream which has a lot of data to transfer. Now,
streams are handled in an order scheduled by the se_desc layer.
This commit is the first one of a series which will bring other
improvements that also rely on the send_list implementation.
This must be backported up to 2.7 when deemed sufficiently stable.
Add new traces when QUIC flow-control limits are reached at stream or
connection level. This may help to explain an interrupted transfer.
This should be backported up to 2.6.
A QUIC stream did not transfer its response if it was an empty HTTP
response with neither headers nor an entity body. This is caused by an
incomplete condition in qc_send() which skips streams with an empty
<tx.buf>.
Fix this by extending the condition: sending will be conducted on a stream
if <tx.buf> is not empty or if a FIN notification must be provided. This
allows sending the last STREAM frame for such a stream.
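A sketch of the extended check inside the send loop (the FIN flag name is an
assumption):

    /* skip a stream only if it has no data AND no FIN to notify, so an
     * empty response still gets its final, empty STREAM frame */
    if (!b_data(&qcs->tx.buf) && !(qcs->flags & QC_SF_FIN_STREAM))
        continue;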
Such HTTP responses should be extremely rare, so this bug is labelled as
MINOR. It was encountered with an HTTP/0.9 request for an empty payload.
The bug was triggered because HTTP/0.9 does not support headers in the
response message.
Also, note that the condition to wake up the MUX tasklet has been changed
similarly in qc_send_buf(). This is not mandatory for proper operation
however, most probably because another tasklet_wakeup() is done
before/after.
This should be backported up to 2.6.
In applets, we stop processing when a write error (CF_WRITE_ERROR) or a shutdown
for writes (CF_SHUTW) is detected. However, any write error leads to an
immediate shutdown for writes. Thus, it is enough to only test if CF_SHUTW is
set.
When a read error (CF_READ_ERROR) is reported, a shutdown for reads is
always performed (CF_SHUTR). Thus, there is no reason to check if
CF_READ_ERROR is set if CF_SHUTR is also checked.
The CF_READ_ATTACHED flag is only used in input events for stream analyzers,
via CF_MASK_ANALYSER. A read event can be reported instead and this flag can be
removed. We must only take care to report a read event when the client
connection is upgraded from TCP to HTTP.
It appears CF_ANA_TIMEOUT is a flag only used in CF_MASK_ANALYSER. All
analyzer timeouts rely on the analysis expiration date (chn->analyse_exp).
Worse, once set, this flag is never removed. Thus this flag can be removed
and replaced by a read event (CF_READ_EVENT).
Thanks to previous changes, the CF_WRITE_ACTIVITY flag can be removed.
Everywhere it was used, its value (CF_WRITE_EVENT|CF_WRITE_ERROR) is now
used directly.
Thanks to previous changes, the CF_READ_ACTIVITY flag can be removed.
Everywhere it was used, its value (CF_READ_EVENT|CF_READ_ERROR) is now
used directly.
Just like CF_READ_PARTIAL, CF_WRITE_PARTIAL is now merged with
CF_WRITE_EVENT. There is a subtlety in sc_notify(): the "connect" event
(formerly CF_WRITE_NULL) is now detected with
(CF_WRITE_EVENT + sc->state < SC_ST_EST).
The CF_READ_PARTIAL flag is now merged with CF_READ_EVENT. It means
CF_READ_EVENT is set when a read0 is received (formerly CF_READ_NULL) or
when data are received (formerly CF_READ_ACTIVITY).
There is nothing special here, except the conditions to wake the stream up
in sc_notify(). Indeed, the test was slightly changed to reflect the recent
changes: the read0 event is now formalized by (CF_READ_EVENT + CF_SHUTR).
As for CF_READ_NULL, it appears CF_WRITE_NULL and the other write events on
a channel are mainly used to wake up the stream and may be replaced by one
write event.
In this patch, we introduce the CF_WRITE_EVENT flag as a replacement for
CF_WRITE_NULL. There is no breaking change for now, it is just a rename.
Gradually, other write events will be merged with this one.
The CF_READ_NULL flag is not really useful as it is used: it is a transient
event only used to wake up the stream. As we will see, all read events on a
channel may be reduced to a single one, all of them being used to wake up
the stream.
In this patch, we introduce the CF_READ_EVENT flag as a replacement for
CF_READ_NULL. There is no breaking change for now, it is just a rename.
Gradually, other read events will be merged with this one.
If the CF_READ_NULL flag is set on a channel, it implies a shutdown for
reads was performed and CF_SHUTR is also set on this channel. Thus, there is
no reason to test if both flags are present; testing CF_SHUTR is enough.
When calling 'update ssl ocsp-response' with an unknown certificate file
name, the error message would mention a "ckch_store" which is an
internal structure unknown by users.
The ocsp_uri field of the certificate_ocsp structure was a 16k buffer
when it could have been allocated to just the required size to store the
OCSP URI. This field now behaves the same way as the sctl and
ocsp_response buffers of the ckch_store structure.
If a given certificate is used multiple times in a configuration, the
ocsp_cid field would have been overwritten during each
ssl_sock_load_ocsp call even if it was previously filled.
This patch does not need to be backported.
If a configuration such as the following was included in a crt-list
file, it would not have raised a warning about 'ocsp-update'
inconsistencies for the concerned certificate:
cert.pem [ocsp-update on]
cert.pem
because the second line has a NULL entry->ssl_conf.
In the unlikely event that the OCSP update task is killed in the middle
of an update process (request sent but no response received yet) the
cur_ocsp member of the update context would keep an unneeded reference
to a certificate_ocsp object. It must then be freed during the task's
cleanup.
If the ocsp issuer certificate was actually taken from the certificate
chain in ssl_sock_load_ocsp, we don't need to keep an extra reference on
it since we already keep a reference to the full certificate chain.
When calling OCSP_basic_verify to check the validity of the received
OCSP response, we need to provide an untrusted certificate chain as well
as an X509_STORE holding only trusted certificates. Since the
certificate chain and the issuer certificate are all provided by the
user, we assume that they are valid and we add them all to a temporary
store. This enables the check to focus only on the response's validity.
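A minimal OpenSSL sketch of this approach (error handling elided; haproxy's
actual code differs):

    #include <openssl/ocsp.h>
    #include <openssl/x509.h>

    /* verify an OCSP response against a temporary store that trusts the
     * user-provided chain and issuer, so only the response's own
     * validity is really checked */
    static int verify_ocsp_response(OCSP_BASICRESP *basic,
                                    STACK_OF(X509) *chain, X509 *issuer)
    {
        X509_STORE *store = X509_STORE_new();
        int ret;

        for (int i = 0; i < sk_X509_num(chain); i++)
            X509_STORE_add_cert(store, sk_X509_value(chain, i));
        X509_STORE_add_cert(store, issuer);

        /* <chain> also serves as the untrusted certificate list */
        ret = OCSP_basic_verify(basic, chain, store, 0);
        X509_STORE_free(store);
        return ret;  /* 1 on success */
    }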
When ocsp-update is enabled for a given certificate, its certificate_ocsp
object is inserted in two separate trees (the actual ocsp response one and
the ocsp update one). But since the same instance
is used for the two trees, its ownership is kept by the regular ocsp
response one. The ocsp update task should then never have to free the
ocsp entries. The crash actually occurred because of this. The update
task was freeing entries whose reference counter was not increased while
a reference was still held by the SSL_CTXs.
The only time during which the ocsp update task will need to increase
the reference counter is during an actual update, because at this moment
the entry is taken out of the update tree and a 'flying' reference to
the certificate_ocsp is kept in the ocsp update context.
This bug could be reproduced by calling './haproxy -f conf.cfg -c' with
any of the used certificates having the 'ocsp-update on' option. For
some reason asan caught the bug easily but valgrind did not.
This patch does not need to be backported.
This CLI command crashed when called for a certificate which did not
have an OCSP response during startup because it assumed that the
ocsp_issuer pointer of the ckch_data object would be valid. It was only
true for already known OCSP responses though.
The ocsp issuer certificate is now taken either from the ocsp_issuer
pointer or looked for in the certificate chain. This is the same logic
as the one in ssl_sock_load_ocsp.
This patch does not need to be backported.