We must trim any excess data from the response buffer when recycling
a keep-alive connection, because we may have blocked an invalid response
from a server that we don't want to accidentally forward once we disable
the analysers, nor do we want that data to come along with the next response.
A typical example of such data would be from a buggy server responding to
a HEAD with some data, or sending more than the advertised content-length.
When deciding to set BF_EXPECT_MORE, we reused the same code as in
http_wait_for_request(), but here we must ignore buf->lr, which is not
yet set at this point and therefore useless. This might only have caused
random sub-optimal behaviours.
Krzysztof Oledzki reported that 1.4-dev7 would regularly crash
on an apparently very common workload. The cores he provided
showed some inter-buffer data corruption, identical to
what was fixed by the following recent commit :
bbfa7938bd74adbfa435f26503fc10f5938195a3 [BUG] buffer_replace2 must never change the ->w entry
In fact, it was buffer_insert_line2() which was still modifying the
->w pointer, causing issues with pipelined responses in keep-alive
mode if some headers were to be added.
The bug requires a remote client, a nearby server, large server buffers
and small client buffers to be reproduced, with response header
insertion. Still, it's surprising that it did not trigger earlier.
Now, after 100k pipelined requests, it does not trigger anymore.
Despite what is explicitly stated in HTTP specifications,
browsers still use the undocumented Proxy-Connection header
instead of the Connection header when they connect through
a proxy. As such, proxies generally implement support for
this stupid header name, breaking the standards and making
it harder to support keep-alive between clients and proxies.
Thus, we add a new "option http-use-proxy-header" to tell
haproxy that if it sees requests which look like proxy
requests, it should use the Proxy-Connection header instead
of the Connection header.
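A minimal configuration sketch (section name and addresses are only illustrative):

  listen smart-proxy :3128
      mode http
      option http-use-proxy-header
      server next-hop 192.168.0.10:3128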
This function is used to move data which is located between ->w and ->r,
so it must not touch ->w, otherwise it will displace pending data which
is before the one we're actually overwriting. The issue arises with
some pipelined responses which cause part of the previous one to
be chopped off when removing the "Connection: close" header, thus
corrupting the last response and shifting the next one. Those are
detected in the logs because the next response will be a 502 with flags PH.
When using "option persist" or "force-persist", we want to know from the
logs if the cookie referenced a valid server or a down server. Until now
the flag reported a valid server even if the server was down, which is
misleading. Now we correctly report that the requested server was down.
We can typically see "--DI" when using "option persist" with redispatch,
and "SCDN" when using force-persist on a down server.
This is used to force access to down servers for some requests. This
is useful when validating that a change on a server correctly works
before enabling the server again.
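For example, a sketch of a configuration using it (ACL name and addresses are invented for illustration):

  backend app
      acl from_admin src 192.168.1.0/24
      force-persist if from_admin
      cookie SRV insert indirect nocache
      server s1 10.0.0.1:80 cookie s1 check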
We used to delay the response whenever there was a new request in the
buffer. However, if the pending request is incomplete, we should not
delay the pending responses.
This can cause parts of responses to be truncated in case of
pipelined requests if the second request generates an error
before the first request is completely flushed.
Sometimes we need to be able to change the default kernel socket
buffer size (recv and send). Four new global settings have been
added for this :
- tune.rcvbuf.client
- tune.rcvbuf.server
- tune.sndbuf.client
- tune.sndbuf.server
Those can be used to reduce kernel memory footprint with large numbers
of concurrent connections, and to reduce risks of write timeouts with
very slow clients due to excessive kernel buffering.
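For instance, with purely illustrative values:

  global
      tune.rcvbuf.client 8192
      tune.sndbuf.client 8192
      tune.rcvbuf.server 65536
      tune.sndbuf.server 65536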
This one is the next step of the previous patch. It correctly computes
the response mode and the Connection flag transformations depending
on the request mode and version, and the response version and headers.
We're now also able to add "Connection: keep-alive", and to convert
a server close during a keep-alive connection into a server-close
connection.
We need to improve Connection header handling in the request for it
to support the upcoming keep-alive mode. Now we have two flags which
keep in the session the information about the presence of a
Connection: close and a Connection: keep-alive headers in the initial
request, as well as two others which keep the current state of those
headers so that we don't have to parse them again. Knowing the initial
value is essential to know when the client asked for keep-alive while
we're forcing a close (e.g. in server-close mode). Also, the Connection
request parser is now able to automatically remove individual header
values as they are parsed. This provides greater flexibility and
reliability.
All combinations of listen/front/back in all modes and with both
1.0 and 1.1 have been tested.
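As an illustration only (the exact flag names and values in the source may differ), the idea is four transaction flags, two recording what the initial request contained and two tracking the current state of the headers:

  /* illustrative sketch, not the actual definitions */
  #define TX_HDR_CONN_CLO  0x00000001  /* "Connection: close" present in the initial request */
  #define TX_HDR_CONN_KAL  0x00000002  /* "Connection: keep-alive" present in the initial request */
  #define TX_CON_CLO_SET   0x00000004  /* a "Connection: close" header is currently set */
  #define TX_CON_KAL_SET   0x00000008  /* a "Connection: keep-alive" header is currently set */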
Some header values might be delimited with spaces, so it's not enough to
compare "close" or "keep-alive" with strncasecmp(). Use word_match() for
that.
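A minimal sketch of the kind of comparison word_match() performs, not the exact implementation: the value must start with the word, compared case-insensitively, and anything left may only be delimiters.

  static int word_match(const char *sample, int slen, const char *word, int wlen)
  {
      if (slen < wlen)
          return 0;

      while (wlen) {
          unsigned char c = *sample ^ *word;
          if (c && c != ('A' ^ 'a'))     /* bytes may differ only by the case bit */
              return 0;
          sample++; word++;
          slen--; wlen--;
      }

      while (slen) {
          if (*sample != ' ' && *sample != ',')
              return 0;                  /* only spaces or commas may follow */
          sample++; slen--;
      }
      return 1;
  }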
Calling this function after http_find_header2() automatically deletes
the current value of the header, and removes the header itself if the
value is the only one. The context is automatically adjusted so that the
next call to http_find_header2() returns the next header. No other
change nor test should be made on the transient context though.
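A rough usage sketch of the resulting pattern; the argument lists are simplified, and value_matches()/http_remove_current_value() are hypothetical stand-ins, not the real function names:

  /* delete every "close" token from the Connection headers of a message */
  ctx.idx = 0;
  while (http_find_header2("Connection", 10, msg_start, &hdr_idx, &ctx)) {
      if (!value_matches(&ctx, "close"))      /* hypothetical helper */
          continue;
      /* deletes the value, removes the header if it was the only one,
       * and adjusts the context so the next call returns the next value.
       */
      http_remove_current_value(&hdr_idx, &ctx);  /* hypothetical name */
  }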
A transaction in close mode could be switched to tunnel mode
at the end of the processing, letting any pending data pass
in the other direction. Let's fix that by also checking for the
close mode during state resync.
We must set the error flags when detecting that a client has reset
a connection or timed out while waiting for a new request on a keep-alive
connection, otherwise process_session() sets it itself and counts one
request error.
That explains why some sites were showing an increase in request errors
with keep-alive enabled.
We get a lot of those, especially with web crawlers :
recv(2, 0x810b610, 7000, 0) = -1 ECONNRESET (Connection reset by peer)
shutdown(2, 1 /* send */) = -1 ENOTCONN (Transport endpoint is not connected)
close(2) = 0
There's no need to perform the shutdown() here: the socket is already
in error, so it is down.
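A simplified sketch of the resulting close path (conn_in_error stands for whatever error state the caller already knows about):

  /* when the socket is already in error, shutdown() would only fail with
   * ENOTCONN as in the trace above, so close it directly.
   */
  if (conn_in_error) {
      close(fd);
  } else {
      shutdown(fd, SHUT_WR);   /* announce we will not send anything more */
      close(fd);
  }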
A check was performed in buffer_replace2() to compare buffer
length with its read pointer. This has been wrong for a long
time, though it only has an impact when dealing with keep-alive
requests/responses. In theory this should be backported but
the check has no impact without keep-alive.
We can receive data along with a notification of socket error. But we
must not check for the error before reading the data, because it may be
an asynchronous error notification checked too early, while the response
we're waiting for is already available and would be lost. If there is a
real error, recv() will return it.
This should help with servers that close very fast after the response
and should also slightly lower the CPU usage during very fast checks
on massive amounts of servers since we eliminate one system call.
This should probably be backported to 1.3.
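A minimal sketch of the ordering, not the actual check code:

  #include <sys/types.h>
  #include <sys/socket.h>
  #include <errno.h>

  /* read first: if a response arrived before the error we still get it,
   * and if the error is real, recv() reports it anyway.
   */
  static int check_read(int fd, char *buf, size_t len)
  {
      ssize_t ret = recv(fd, buf, len, 0);

      if (ret > 0)
          return (int)ret;   /* response data to be analysed */
      if (ret == 0)
          return 0;          /* clean shutdown by the server */
      return -errno;         /* only now do we act on the error */
  }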
While waiting in a keep-alive state for a request, we want to silently
close if we don't get anything. However if we get a partial request it's
different because that means the client has started to send something.
This requires a new transaction flag. It will be used to implement a
distinct timeout for keep-alive and requests.
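The distinct timeouts this prepares for would typically look like this in the configuration (directive names refer to the final feature, values are only examples):

  defaults
      mode http
      timeout http-request     10s
      timeout http-keep-alive  2s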
This change, suggested by Cyril Bonté, makes a lot of sense and
would have made it obvious that sessid was not properly initialized
while switching to keep-alive. The code is now cleaner.
The stream_int_cond_close() function was added to preserve the
contents of the response buffer because stream_int_retnclose()
was buggy. It flushed the response instead of flushing the
request. This caused issues with pipelined redirects followed
by error messages which ate the previous response.
This might even have caused object truncation on pipelined
requests followed by an error or by a server redirection.
Now that this is fixed, simply get rid of the now useless
function.
I've tried to follow all the pool_alloc2/pool_free2 calls in the code
to track memory leaks. I've found one which only happens when memory is
already exhausted while allocating a new appsession cookie.
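A sketch of the leak pattern that was fixed; the pool and field names are illustrative, not the actual appsession code:

  /* if the second allocation fails, the first one must be released,
   * otherwise it leaks precisely when memory is already scarce.
   */
  asession = pool_alloc2(pool_appsess);
  if (asession == NULL)
      return;

  asession->sessid = pool_alloc2(pool_sessid);
  if (asession->sessid == NULL) {
      pool_free2(pool_appsess, asession);   /* this release was missing */
      return;
  }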
Sometimes it can be desired to return a location which is the same
as the request with a slash appended when there was not one in the
request. A typical use of this is for sending a 301 so that people
don't reference links without the trailing slash. The name of the
new option is "append-slash" and it can be used on "redirect"
statements in prefix mode.
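A configuration sketch (ACL and path are only an example):

  acl missing_slash path_reg ^/article/[^/]*$
  redirect prefix / code 301 append-slash if missing_slash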
When using server redirection, it is possible to specify a path
consisting of only one slash. While this is discouraged (risk of
loop) it may sometimes be useful combined with content switching.
The prefixing of a '/' then causes two slashes to be returned in
the response. So we now do as with the other redirects and do not
prepend a slash when the path is just '/'.
Some message pointers were not usable once the message reached the
HTTP_MSG_DONE state. This is the case for ->som which points to the
body because it is needed to parse chunks. There is one case where
we need the beginning of the message : server redirect. We have to
call http_get_path() after the request has been parsed. So we rely
on ->sol without counting on ->som. In order to achieve this, we're
making ->rq.{u,v} relative to the beginning of the message instead
of the buffer. That simplifies the code and makes it cleaner.
Preliminary tests show this is OK.
This might have been introduced with chunk extensions. Note that
the server redirect still does not work because http_get_path()
cannot get the correct path once the request message is in the
HTTP_MSG_DONE state (->som does not point to the start of message
anymore).
The initial code's intention was to loop on the analysers as long
as an analyser is added by another one. [This code was wrong due to
the while(0), which exits even on a continue statement, but the
initial intention must be changed too]. In fact we should limit the
number of times we loop on analysers in order to limit latency.
Using maxpollevents as a limit makes sense since this tunable is
used for the exact same purposes. We may add a dedicated tunable later
if that ever makes sense, though it's very unlikely.
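A sketch of the bound, with approximate names rather than the actual process_session() code:

  /* re-run the analysers a bounded number of times per wake-up so that
   * analysers re-enabling each other cannot add unbounded latency.
   */
  max_loops = global.tune.maxpollevents;
  while (analysers_changed && --max_loops >= 0)
      analysers_changed = run_request_analysers(s);   /* hypothetical helper */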
If we accept a new request and that request produces an immediate
response (error, redirect, ...), then we may fail to send it in
case of pipelined requests if the response buffer is full. To avoid
this, we check the availability of at least maxrewrite bytes in the
response buffer before accepting a new pipelined request.
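A sketch of the guard; the buffer fields are named from memory and the exact test in the source may differ:

  /* do not parse a new pipelined request unless the response buffer can
   * still take at least maxrewrite bytes, so that an immediate error or
   * redirect can always be produced.
   */
  if (rep->size - rep->l < global.tune.maxrewrite)
      return 0;   /* come back once the response buffer has drained */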
During a redirect, we used to send the last chunk of response with
stream_int_cond_close(). But this is wrong in case of pipelining,
because if the response already contains something, this function
will refrain from touching the buffer. Use a concatenation function
instead.
Also, this call might still fail when the buffer is full, so we need
a second fix to refrain from parsing an HTTP request as long as the
response buffer is full, otherwise we may not even be able to return
a pending redirect or an error code.
That patch was incorrect because under some circumstances, the
capture memory could be freed by session_free() and then again
by http_end_txn(), causing a double free and an eventual segfault.
The pool use count was also reported wrong due to this bug.
The cleanup code was removed from session_free() so that it remains only
in http_end_txn().