BUG/MEDIUM: http: fix risk of CPU spikes with pipelined requests from dead client
Since client-side HTTP keep-alive was introduced in 1.4-dev, a good number of corner cases had to be dealt with. One of them remained and caused occasional CPU spikes that Cyril Bonté had the opportunity to observe from time to time, and recently managed to capture for deeper analysis. What happens is the following scenario :

  1) a client sends a first request which causes the server to respond
     using chunked encoding ;

  2) the server starts to send a large response that doesn't fit into a
     single buffer ;

  3) the response buffer fills up with the response ;

  4) the client reads the response while it is being sent by the server ;

  5) haproxy eventually receives the end of the response from the server,
     which occupies the whole response buffer (it must at least reach into
     the reserve), parses it and prepares to receive a second request while
     sending the last data blocks to the client. It then reinitializes the
     whole http_txn and calls http_wait_for_request(), which arms the
     http-request timeout ;

  6) at this exact moment the client emits a second request, probably
     asking for an object referenced in the first page while parsing it on
     the fly, and silently disappears from the internet (internet access
     cut, or software having trouble with pipelined requests) ;

  7) the second request arrives in the request buffer while the response
     data stall in the response buffer, preventing the reserve from being
     used for anything else ;

  8) haproxy calls http_wait_for_request() to parse the next request which
     has just arrived, but since it sees the response buffer is full, it
     refrains from doing so and waits for some data to leave or a timeout
     to strike ;

  9) the http-request timeout strikes, causing http_wait_for_request() to
     be called again ; the function immediately returns since it cannot
     even produce an error message, and the loop is maintained here ;

 10) the client timeout strikes, aborting the loop.

At first glance a check for timeout would be needed *before* considering the buffer in http_wait_for_request(), but the issue is not there in fact : when doing so, the logs report a client timeout error while waiting for a request, which is wrong. The real issue is that we must not consider the first transaction terminated until at least the reserve is released, so that http_wait_for_request() has no problem starting to process a new request. This allows the timeout to be reported on the first response rather than on the second request. A more restrictive control could consist in not considering the request complete until the response buffer has no more outgoing data, but this brings no added value and needlessly increases the number of wake-ups when dealing with pipelining.

Note that the same issue exists with the request : we must ensure that any POST data are finished being forwarded to the server before accepting a new request. This case is much harder to trigger however, as servers rarely disappear, and when they do, they impact all their sessions at once.

No specific reproducer is usable because the bug is very hard to reproduce and depends on the system as well, with settings varying across reboots. On a test machine, socket buffers were reduced to 4096 (tcp_rmem and tcp_wmem), haproxy's buffers were set to 16384 and tune.maxrewrite to 12288. The proxy must work in http-server-close mode, with a request timeout smaller than the client timeout.
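To give an idea of the test conditions, a setup along the following lines should match the description above. This is only a sketch : the frontend/backend names, the server address and the exact timeout values are assumptions ; only the tune.* values, the http-server-close mode, port 8002 and the constraint that the request timeout be smaller than the client timeout come from the description :

    # Sketch of a matching test configuration. Names, server address and
    # exact timeout values are illustrative ; on the test machine,
    # tcp_rmem/tcp_wmem were additionally reduced to 4096 via sysctl.
    global
        tune.bufsize    16384
        tune.maxrewrite 12288

    defaults
        mode http
        option http-server-close
        timeout http-request 5s     # must be smaller than timeout client
        timeout client       15s
        timeout server       15s

    frontend test_fe
        bind :8002
        default_backend test_be

    backend test_be
        server srv1 127.0.0.1:8080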
The test is run like this :

    $ (printf "GET /15000 HTTP/1.1\r\n\r\n"; usleep 100000; \
       printf "GET / HTTP/1.0\r\n\r\n"; sleep 15) | nc6 --send-only 0 8002

The server returns chunks of the requested size (15000 bytes here, though in a previous test 78000 was the only value that worked). Strace must show the last recvfrom() succeed, and the last sendto() either return less than requested or, better, not be called at all.

This fix must be backported to 1.6, 1.5 and 1.4 ; it was written in a way intended to make that backport possible. It depends on channel_congested() which also needs to be backported. Further cleanup of http_wait_for_request() could be made given that some of the tests are now useless.
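For backporters, the gist of channel_congested() is that a channel still holding outgoing data which reaches into the buffer's reserve must not be considered done. The following is a simplified sketch of that check, not the verbatim haproxy function ; the field names (buf->o, buf->i, buf->size) and global.tune.maxrewrite follow the 1.6-era buffer layout :

    /* Simplified illustration of the channel_congested() idea, based on
     * the 1.6-era buffer layout. It relies on haproxy internal headers
     * (e.g. <types/channel.h> and <types/global.h>) and is not a
     * verbatim copy of the real function.
     */
    static inline int channel_congested_sketch(const struct channel *chn)
    {
        if (!chn->buf->o)
            return 0;    /* no outgoing data left : not congested */

        /* Outgoing data still present : report congestion as long as the
         * buffer contents reach into the reserve, since a new request
         * could then not be processed safely.
         */
        if (chn->buf->o + chn->buf->i >
            chn->buf->size - global.tune.maxrewrite)
            return 1;

        return 0;
    }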
parent 55e58f2334
commit 3de5bd603c
@@ -5442,9 +5442,18 @@ int http_resync_states(struct stream *s)
 		     (txn->flags & TX_CON_WANT_MSK) == TX_CON_WANT_KAL)) {
 			/* server-close/keep-alive: terminate this transaction,
 			 * possibly killing the server connection and reinitialize
-			 * a fresh-new transaction.
+			 * a fresh-new transaction, but only once we're sure there's
+			 * enough room in the request and response buffer to process
+			 * another request. The request buffer must not hold any
+			 * pending output data and the response buffer must not have
+			 * output data occupying the reserve.
 			 */
-			http_end_txn_clean_session(s);
+			if (s->req.buf->o)
+				s->req.flags |= CF_WAKE_WRITE;
+			else if (channel_congested(&s->res))
+				s->res.flags |= CF_WAKE_WRITE;
+			else
+				http_end_txn_clean_session(s);
 		}
 	}
 
 	return txn->req.msg_state != old_req_state ||