Since 1.5-dev12 and commit 3bf1b2b8 (MAJOR: channel: stop relying on BF_FULL to take action), the HTTP parser has relied on channel_full() instead of BF_FULL to decide whether a buffer has enough room to start parsing a request or response. The problem is that channel_full() intentionally ignores outgoing data, so a corner case exists where a large response may still occupy the response buffer with only a few bytes left (much less than the reserve): enough to accept a second response past the last data, but not enough for the HTTP processor to add headers. Since all the processing relies on this space being available, we can get random crashes when clients pipeline requests.

The analysis of a core from haproxy configured with 20480-byte buffers shows this: with enough "luck", when sending back the response to the first request, the client is slow, the TCP window is congested, the socket buffers are full, and haproxy's buffer fills up. We still have 20230 bytes of response data in a 20480-byte response buffer. The second request is sent to the server, which returns 214 bytes that fit in the small 250 bytes left in this buffer. And the buffer arrangement makes it possible to escape all the controls in http_wait_for_response():

  |<---------- response buffer = 20480 bytes ---------->|
  [ 2/2 | 3 | 4 |                1/2                    ]
    ^ start of circular buffer

  1/2 = beginning of previous response (18240)
  2/2 = end of previous response (1990)
  3   = current response (214)
  4   = free space (36)

  - channel_full() returns false (20230 bytes are going to leave)
  - the response headers do not wrap at the end of the buffer
  - the remaining linear room after the headers is larger than the
    reserve, because it's the previous response which wraps:

  => response is processed

Header rewriting causes it to reach 260 bytes, 10 bytes larger than what the buffer could hold. So all computations during header addition are wrong and lead to the corruption we've observed.
All the conditions are very hard to meet (which explains why it took almost one year for this bug to show up) and are almost impossible to reproduce on purpose on a test platform. But the bug is clearly there.

This issue was reported by Dinko Korunic, who kindly devoted a lot of time to providing countless traces and cores, and to experimenting with troubleshooting patches to knock the bug down. Thanks Dinko!

No backport is needed, but all 1.5-dev versions between dev12 and dev18 included must be upgraded. A workaround consists in setting "option forceclose" to prevent pipelined requests from being processed.
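The workaround could look like the following configuration fragment (a minimal sketch; the section layout and other directives are illustrative, only "option forceclose" is the point):

```
defaults
    mode http
    # Close the connection after each response so a second request is
    # never pipelined into a buffer still holding the previous response.
    option forceclose
```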