haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-14 02:57:01 +02:00

Author	SHA1	Message	Date
Willy Tarreau	26f4a04744	MEDIUM: connection: set the socket shutdown flags on socket errors When we get a hard error from a syscall indicating the socket is dead, it makes sense to set the CO_FL_SOCK_WR_SH and CO_FL_SOCK_RD_SH flags to indicate that the socket may not be used anymore. It will ease the error processing in health checks where the state of the socket is very important. We'll also be able to avoid some setsockopt(nolinger) after an error. For now, the rest of the code is not impacted because CO_FL_ERROR is always tested prior to these flags.	2013-12-04 23:50:36 +01:00
Willy Tarreau	61d39a0e2a	BUG/MEDIUM: splicing: fix abnormal CPU usage with splicing Mark Janssen reported an issue in 1.5-dev19 which was introduced in 1.5-dev12 by commit `96199b10`. From time to time, randomly, the CPU usage spikes to 100% for seconds to minutes. A deep analysis of the traces provided shows that it happens when waiting for the response to a second pipelined HTTP request, or when trying to handle the received shutdown advertised by epoll() after the last block of data. Each time, splice() was involved with data pending in the pipe. The cause of this was that such events could not be taken into account by splice nor by recv and were left pending : - the transfer of the last block of data, optionally with a shutdown was not handled by splice() because of the validation that to_forward is higher than MIN_SPLICE_FORWARD ; - the next recv() call was inhibited because of the test on presence of data in the pipe. This is also what prevented the recv() call from handling a response to a pipelined request until the client had ACKed the previous response. No less than 4 different methods were experimented to fix this, and the current one was finally chosen. The principle is that if an event is not caught by splice(), then it MUST be caught by recv(). So we remove the condition on the pipe's emptiness to perform an recv(), and in order to prevent recv() from being used in the middle of a transfer, we mark supposedly full pipes with CO_FL_WAIT_ROOM, which makes sense because the reason for stopping a splice()-based receive is that the pipe is supposed to be full. The net effect is that we don't wake up and sleep in loops during these transient states. This happened much more often than expected, sometimes for a few cycles at end of transfers, but rarely long enough to be noticed, unless a client timed out with data pending in the pipe. 
The effect on CPU usage is visible even when transferring 1MB objects in pipeline, where the CPU usage drops from 10 to 6% on a small machine at medium bandwidth. Some further improvements are needed : - the last chunk of a splice() transfer is never done using splice due to the test on to_forward. This is wrong and should be performed with splice if the pipe has not yet been emptied ; - si_chk_snd() should not be called when the write event is already being polled, otherwise we're almost certain to get EAGAIN. Many thanks to Mark for all the traces he cared to provide, they were essential for understanding this issue which was not reproducible without. Only 1.5-dev is affected, no backport is needed.	2013-07-22 09:31:55 +02:00
Willy Tarreau	4fc90efed0	BUG/MEDIUM: splicing is broken since 1.5-dev12 Commit `96199b10` reintroduced the splice() mechanism in the new connection system. However, it failed to account for the number of transferred bytes, allowing more bytes than scheduled to be transferred to the client. This can cause an issue with small-chunked responses, where each packet from the server may contain several chunks, because a single splice() call may succeed, then try to splice() a second time as the pipe is not full, thus consuming the next chunk size. This patch also reverts commit baf2a5 ("OPTIM: splice: detect shutdowns...") because it introduced a related regression. The issue is that splice() may return less data than available also if the pipe is full, so having EPOLLRDHUP after splice() returns less than expected is not a sufficient indication that the input is empty. In both cases, the issue may be detected by the presence of "SD" termination flags in the logs, and worked around by disabling splicing (using "-dS"). This problem was reported by Sander Klein, and no backport is needed.	2013-04-06 11:46:27 +02:00
Willy Tarreau	b6daedd46c	OPTIM: splice: assume by default that splice is working correctly Versions of splice between 2.6.25 and 2.6.27.12 were bogus and would return EAGAIN on incoming shutdowns. On these versions, we have to call recv() after such a return in order to find whether splice is OK or not. Since 2.6.27.13 we don't need to do this anymore, saving one useless recv() call after each splice() returning EAGAIN, and we can avoid this logic by defining ASSUME_SPLICE_WORKS. Building with linux2628 automatically enables splice and the flag above since the kernel is safe. People enabling splice for custom kernels will be able to disable this logic by hand too.	2013-01-07 16:57:09 +01:00
Willy Tarreau	baf2a500a1	OPTIM: splice: detect shutdowns and avoid splice() == 0 Since last commit introducing EPOLLRDHUP, the splicing code is able to detect an incoming shutdown without calling splice() == 0. This avoids one useless syscall.	2013-01-07 16:39:51 +01:00
Willy Tarreau	5fb3803f4b	CLEANUP: buffer: use buffer_empty() instead of buffer_len()==0 A few places still made use of buffer_len()==0 to detect an empty buffer. Use the cleaner and more efficient buffer_empty() instead.	2012-12-17 01:14:49 +01:00
Willy Tarreau	debdc4b657	BUG/MAJOR: raw_sock: must check error code on hangup In raw_sock, we already check for FD_POLL_HUP after a short recv() to avoid a useless syscall and detect the end of stream. However, we fail to check for FD_POLL_ERR here, which causes major issues as some errors might be delivered and ignored if they are delivered at the same time as a HUP, and there is no data to send to detect them on the other direction. Since the connections flags do not have the CO_FL_ERROR flag, the polling is not disabled on the socket and the pollers immediately call the conn_fd_handler() again, resulting in CPU spikes for as long as the timeouts allow them. Note that this patch alone fixes the issue but a few patches will follow to strengthen this fragile area. Big thanks to Bryan Berry who reported the issue with significant amounts of detailed traces that helped rule out many other initially suspected causes and to finally reproduce the issue in the lab.	2012-12-07 00:01:33 +01:00
Willy Tarreau	45b8893966	MINOR: splice: disable it when the system returns EBADF At least on a heavily patched 2.6.35.9, we can see splice() fail with EBADF : recv(6, "789.123456789.123456789.12345678"..., 1049, 0) = 1049 send(5, "HTTP/1.1 200\r\nContent-length: 10"..., 8030, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = 8030 gettimeofday({1352717854, 515601}, NULL) = 0 epoll_wait(0x3, 0x40221008, 0x7, 0) = 0 gettimeofday({1352717854, 515793}, NULL) = 0 pipe([7, 8]) = 0 splice(0x6, 0, 0x8, 0, 0xfe12c, 0x3) = -1 EBADF (Bad file descriptor) close(6) = 0 This clearly is a kernel issue since all FDs are valid here, so let's simply disable splice() on the connection when this happens so that the session correctly recovers from that issue using recv().	2012-11-12 12:02:20 +01:00
Willy Tarreau	0ea0cf606e	BUG: raw_sock: also consider ENOTCONN in addition to EAGAIN A failed send() may return ENOTCONN when the connection is not yet established. On Linux, we generally see EAGAIN but on OpenBSD we clearly have ENOTCONN, so let's ensure we poll for write when we encounter this error.	2012-11-11 20:53:28 +01:00
Willy Tarreau	665e6ee7aa	MEDIUM: connection: it's not the data layer's role to validate the connection Till now we used to perform the L4_CONN check in the data layer (eg: stream interface) but that does not make sense, because some transport layers will imply that the connection is opened (eg: SSL), and also because the complexity to check for this is higher in the data layer than in the transport layer. This is so much true that some read0 cases did not validate the connection. So as of now, the transport layer is responsible for clearing L4_CONN when it detects an activity, and the data layer may safely rely on this flag. This only impacts a minor change in raw_sock and stream_interface for now.	2012-10-04 22:26:11 +02:00
Willy Tarreau	f7bc57ca6e	REORG: connection: rename the data layer the "transport layer" While working on the changes required to make the health checks use the new connections, it started to become obvious that some naming was not logical at all in the connections. Specifically, it is not logical to call the "data layer" the layer which is in charge for all the handshake and which does not yet provide a data layer once established until a session has allocated all the required buffers. In fact, it's more a transport layer, which makes much more sense. The transport layer offers a medium on which data can transit, and it offers the functions to move these data when the upper layer requests this. And it is the upper layer which iterates over the transport layer's functions to move data which should be called the data layer. The use case where it's obvious is with embryonic sessions : an incoming SSL connection is accepted. Only the connection is allocated, not the buffers nor stream interface, etc... The connection handles the SSL handshake by itself. Once this handshake is complete, we can't use the data functions because the buffers and stream interface are not there yet. Hence we have to first call a specific function to complete the session initialization, after which we'll be able to use the data functions. This clearly proves that SSL here is only a transport layer and that the stream interface constitutes the data layer. A similar change will be performed to rename app_cb => data, but the two could not be in the same commit for obvious reasons.	2012-10-04 22:26:09 +02:00
Willy Tarreau	6f5d141149	MEDIUM: raw_sock: improve connection error reporting When a connection setup is pending and we receive an error without a POLL_IN flag, we're certain there will be nothing to read from it and we can safely report an error without attempting a recv() call. This will be significantly better for health checks which will avoid a useless recv() on all failed checks.	2012-10-04 22:26:09 +02:00
Willy Tarreau	c0e98868fe	MINOR: raw_sock: always report asynchronous connection errors Depending on the pollers used, a connection error may be notified with POLLOUT|POLLERR|POLLHUP. POLLHUP by itself is enough for the connection handler to call the read actor, which would only consider this flag as a good indication of a hangup, without considering the POLLERR flag. In order to address this, we directly jump to the read0 label if POLLERR was not set. This will be important with health checks as we don't want to believe a connection was properly established when it's not the case !	2012-10-04 22:26:09 +02:00
Willy Tarreau	d1d5454180	REORG: split "protocols" files into protocol and listener It was becoming confusing to have protocols and listeners in the same files, split them.	2012-09-15 22:29:32 +02:00
Willy Tarreau	56a77e5933	MEDIUM: connection: complete the polling cleanups I/O handlers now all use __conn_{sock,data}_{stop,poll,want}_* instead of returning dummy flags. The code has become slightly simpler because some tricks such as the MIN_RET_FOR_READ_LOOP are not needed anymore, and the data handlers which switch to a handshake handler do not need to disable themselves anymore.	2012-09-03 20:47:35 +02:00
Willy Tarreau	c7e4238df0	REORG: buffers: split buffers into chunk,buffer,channel Many parts of the channel definition still make use of the "buffer" word.	2012-09-03 20:47:32 +02:00
Willy Tarreau	c578891112	CLEANUP: connection: split sock_ops into data_ops, app_cb and si_ops Some parts of the sock_ops structure were only used by the stream interface and have been moved into si_ops. Some of them were callbacks to the stream interface from the connection and have been moved into app_cb as they're the application seen from the connection (later, health-checks will need to use them). The rest has moved to data_ops. Normally at this point the connection could live without knowing about stream interfaces at all.	2012-09-03 20:47:31 +02:00
Willy Tarreau	96199b1016	MAJOR: stream-interface: restore splicing mechanism The splicing is now provided by the data-layer rcv_pipe/snd_pipe functions which in turn are called by the stream interface's recv and send callbacks. The presence of the rcv_pipe/snd_pipe functions is used to attest support for splicing at the data layer. It looks like the stream-interface's SI_FL_CAP_SPLICE flag does not make sense anymore as it's used as a proxy for the pointers above. It also appears that we call chk_snd() from the recv callback and then try to call it again in update_conn(). It is very likely that this last function will progressively slip into the recv/send callbacks in order to avoid duplicate check code. The code works right now with and without splicing. Only raw_sock provides support for it and it is automatically selected when the various splice options are set. However it looks like splice-auto doesn't enable it, which possibly means that the streamer detection code does not work anymore, or that it's only called at a time where it's too late to enable splicing (in process_session).	2012-09-03 20:47:31 +02:00
Willy Tarreau	5368d80ede	MAJOR: connection: split the send call into connection and stream interface Similar to what was done on the receive path, the data layer now provides only an snd_buf() callback that is iterated over by the stream interface's si_conn_send_loop() function. The data layer now has no knowledge about channels nor stream interfaces. The splice() code still needs to be ported as it currently is disabled.	2012-09-03 20:47:31 +02:00
Willy Tarreau	ce323dea14	REORG: stream-interface: move sock_raw_read() to si_conn_recv_cb() The recv function is now generic and is usable to iterate any connection-to-buf reading function from a stream interface. So let's move it to stream-interface.	2012-09-03 20:47:30 +02:00
Willy Tarreau	1fe6bc335a	MINOR: stream-interface: add an rcv_buf callback to sock_ops This one is to be used by the read I/O handlers.	2012-09-03 20:47:30 +02:00
Willy Tarreau	af978c4170	MAJOR: raw_sock: temporarily disable splicing It's too hard to convert splicing to connection+buf for now, so let's disable it in order to make progress.	2012-09-03 20:47:30 +02:00
Willy Tarreau	2ba4465086	MAJOR: raw_sock: extract raw_sock_to_buf() from raw_sock_read() This is the start of the stream connection iterator which calls the data-layer reader. This still looks a bit tricky but is OK. Splicing is not handled at all at the moment.	2012-09-03 20:47:30 +02:00
Willy Tarreau	75bf2c925f	REORG: sock_raw: rename the files raw_sock* The "raw_sock" prefix will be more convenient for naming functions as it will be prefixed with the data layer and suffixed with the data direction. So let's rename the files now to avoid any further confusion. The #include directive was also removed from a number of files which do not need it anymore.	2012-09-02 21:54:56 +02:00
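
Commits 0ea0cf606e and 26f4a04744 above both deal with how the errno of a failed socket call is interpreted: EAGAIN and ENOTCONN mean "poll for write and retry", while a hard error marks the socket as shut in both directions. The following is only a minimal sketch of that classification, not HAProxy's code. The `demo_*` names and the `DEMO_FL_*` flag values are hypothetical stand-ins for the CO_FL_* flags named in the messages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only; these are not
 * HAProxy's CO_FL_* definitions. */
#define DEMO_FL_ERROR      0x01u
#define DEMO_FL_SOCK_RD_SH 0x02u
#define DEMO_FL_SOCK_WR_SH 0x04u
#define DEMO_FL_WAIT_WR    0x08u

struct demo_conn {
    unsigned int flags;
};

/* Classify the errno left by a failed non-blocking send().
 * Returns 1 when the caller should poll for write and retry,
 * 0 when the error is hard and the socket must not be used again.
 * EINTR is deliberately left out: a real caller would retry the
 * syscall itself before getting here. */
static int demo_handle_send_error(struct demo_conn *conn, int err)
{
    assert(conn != NULL);

    /* ENOTCONN is what OpenBSD reports while the connection is not
     * yet established; Linux generally reports EAGAIN. */
    if (err == EAGAIN || err == EWOULDBLOCK || err == ENOTCONN) {
        conn->flags |= DEMO_FL_WAIT_WR;
        return 1;
    }

    /* Hard error: flag the error and mark both directions as shut. */
    conn->flags |= DEMO_FL_ERROR | DEMO_FL_SOCK_RD_SH | DEMO_FL_SOCK_WR_SH;
    return 0;
}
```

Taking the errno as a parameter instead of reading the global keeps the decision testable without a real socket.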

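The splicing fixes in 4fc90efed0, 61d39a0e2a and 45b8893966 rest on three rules: never move more bytes than scheduled, let recv() catch whatever splice() will not handle, and stop splicing on a connection once splice() fails with EBADF. A rough sketch of that bookkeeping follows, under stated assumptions: the `demo_*` names, the threshold value and the `>=` comparison are illustrative and do not come from the HAProxy sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical threshold standing in for MIN_SPLICE_FORWARD. */
#define DEMO_MIN_SPLICE_FORWARD 4096

enum demo_rx_path { DEMO_RX_SPLICE, DEMO_RX_RECV };

struct demo_rx {
    size_t to_forward;   /* bytes still scheduled for forwarding */
    int splice_enabled;  /* cleared once splice() proves unusable */
};

/* Cap a splice request so that it never exceeds the scheduled bytes. */
static size_t demo_splice_len(const struct demo_rx *rx, size_t pipe_room)
{
    return rx->to_forward < pipe_room ? rx->to_forward : pipe_room;
}

/* Account for bytes actually moved by one call. */
static void demo_account(struct demo_rx *rx, size_t moved)
{
    assert(moved <= rx->to_forward);
    rx->to_forward -= moved;
}

/* Anything the splice path will not take must go through recv(). */
static enum demo_rx_path demo_pick_path(const struct demo_rx *rx)
{
    if (rx->splice_enabled && rx->to_forward >= DEMO_MIN_SPLICE_FORWARD)
        return DEMO_RX_SPLICE;
    return DEMO_RX_RECV;
}

/* EBADF on valid descriptors is treated as a kernel problem:
 * disable splicing on this connection and carry on with recv(). */
static void demo_on_splice_error(struct demo_rx *rx, int err)
{
    if (err == EBADF)
        rx->splice_enabled = 0;
}
```

With 10000 bytes scheduled and 8000 moved, the remaining 2000 fall below the assumed threshold, so the sketch hands the tail of the transfer to recv() instead of leaving it pending.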

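Commits debdc4b657, 6f5d141149 and c0e98868fe all concern how poll events are read before attempting a recv(): an error reported without readable data can be signalled directly, and a hangup only counts as a clean end of stream when no error accompanies it. The sketch below illustrates that ordering with the POSIX `poll()` flags rather than HAProxy's FD_POLL_* flags. The `demo_*` names are hypothetical, and the choice to still call recv() when POLLIN is set together with an error is an assumption of this sketch.

```c
#include <assert.h>
#include <poll.h>

enum demo_rd_action {
    DEMO_RD_RECV,   /* data may be pending: call recv() */
    DEMO_RD_READ0,  /* clean hangup: treat as end of stream */
    DEMO_RD_ERROR   /* report an error without calling recv() */
};

/* Decide what the read handler should do from the reported events. */
static enum demo_rd_action demo_classify_events(short revents)
{
    /* Error with nothing to read: a recv() call would be useless.
     * This test comes first so that POLLHUP cannot hide POLLERR. */
    if ((revents & POLLERR) && !(revents & POLLIN))
        return DEMO_RD_ERROR;

    /* Hangup alone, with no error and no data, is an end of stream. */
    if ((revents & POLLHUP) && !(revents & POLLIN))
        return DEMO_RD_READ0;

    /* Otherwise let recv() drain the data; it reports any error. */
    return DEMO_RD_RECV;
}
```

For the POLLOUT|POLLERR|POLLHUP combination quoted in c0e98868fe, this sketch returns `DEMO_RD_ERROR` rather than treating the event as a plain hangup.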
74 Commits