haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-09 08:37:04 +02:00

Author	SHA1	Message	Date
Willy Tarreau	7340ca5a54	[OPTIM] stream_sock: don't shutdown(write) when the socket is in error We get a lot of those, especially with web crawlers : recv(2, 0x810b610, 7000, 0) = -1 ECONNRESET (Connection reset by peer) shutdown(2, 1 /* send */) = -1 ENOTCONN (Transport endpoint is not connected) close(2) = 0 There's no need to perform the shutdown() here, the socket is already in error so it is down.	2010-01-16 10:03:45 +01:00
Willy Tarreau	fc1daaf497	[CLEANUP] stream_sock: MSG_NOSIGNAL is only for send(), not recv() We must not set this flag on recv(), it's not used, it's just for send().	2010-01-15 10:26:13 +01:00
Willy Tarreau	2be3939416	[MINOR] http: don't wait for sending requests to the server By default we automatically wait for enough data to fill large packets if buf->to_forward is not null. This causes a problem with POST/Expect requests which have a data size but no data immediately available. Instead of causing noticeable delays on such requests, simply add a flag to disable waiting when sending requests.	2010-01-03 17:24:51 +01:00
Willy Tarreau	face839296	[OPTIM] http: set MSG_MORE on response when a pipelined request is pending Many times we see a lot of short responses in HTTP (typically 304 on a reload). It is a waste of network bandwidth to send that many small packets when we know we can merge them. When we know that another HTTP request is following a response, we set BF_EXPECT_MORE on the response buffer, which will turn MSG_MORE on exactly once. That way, multiple short responses can leave pipelined if their corresponding requests were also pipelined.	2010-01-03 11:37:54 +01:00
Willy Tarreau	d38b53b896	[MINOR] stream_sock: enable MSG_MORE when forwarding finite amount of data While it could be dangerous to enable MSG_MORE on infinite data (eg: interactive sessions), it makes sense to enable it when we know the chunk to be sent is just a part of a larger one.	2010-01-03 11:18:34 +01:00
Willy Tarreau	4c283dce4b	[MINOR] stream_sock: add SI_FL_NOLINGER for faster close This new flag may be set by any user on a stream interface to tell the underlying protocol that there is no need for lingering on the socket since we know the other side either received everything or does not care about what we sent. This will typically be used with forced server close in HTTP mode, where we want to quickly close a server connection after receiving its response. Otherwise the system would prevent us from reusing the same port for some time.	2009-12-29 14:36:34 +01:00
Willy Tarreau	33b2db69a9	[MINOR] stream_sock: prepare for closing when all pending data are sent Since we'll soon be able to close a connection with remaining data in a buffer, it becomes obvious that we can prepare to close when we're about to send the last chunk of data and not the whole buffer.	2009-12-29 08:02:56 +01:00
Willy Tarreau	864e8256ec	[BUG] stream_sock: wrong max computation on recv Since the introduction of the automatic sizing of buffers during reads, a bug appeared where the max size could be negative, causing large chunks of memory to be overwritten during recv() calls if a read pointer was already past the buffer's limit.	2009-12-28 17:36:37 +01:00
Willy Tarreau	7c3c54177a	[MAJOR] buffers: automatically compute the maximum buffer length We used to apply a limit to each buffer's size in order to leave some room to rewrite headers, then we used to remove this limit once the session switched to a data state. Proceeding that way becomes a problem with keepalive because we have to know when to stop reading too much data into the buffer so that we can leave some room again to process next requests. The principle we adopt here consists in only relying on to_forward+send_max. Indeed, both of those data define how many bytes will leave the buffer. So as long as their sum is larger than maxrewrite, we can safely fill the buffers. If they are smaller, then we refrain from filling the buffer. This means that we won't risk to fill buffers when reading last data chunk followed by a POST request and its contents. The only impact identified so far is that we must ensure that the BF_FULL flag is correctly dropped when starting to forward. Right now this is OK because nobody inflates to_forward without using buffer_forward().	2009-12-22 10:06:34 +01:00
Willy Tarreau	a9de333aa5	[BUG] stream_sock: BUF_INFINITE_FORWARD broke splice on 64-bit platforms Yohan Tordjman at Dstorage found that upgrading haproxy to 1.4-dev4 caused truncated objects to be returned. An strace quickly exhibited the issue which was 100% reproducible : 4297 epoll_wait(0, {}, 10, 0) = 0 4297 epoll_wait(0, {{EPOLLIN, {u32=7, u64=7}}}, 10, 1000) = 1 4297 splice(0x7, 0, 0x5, 0, 0xffffffffffffffff, 0x3) = -1 EINVAL (Invalid argument) 4297 shutdown(7, 1 /* send */) = 0 4297 close(7) = 0 4297 shutdown(2, 1 /* send */) = 0 4297 close(2) = 0 This is caused by the fact that the forward length is taken from BUF_INFINITE_FORWARD, which is -1. The problem does not appear in 32-bit mode because this value is first cast to an unsigned long, truncating it to 32-bit (4 GB). Setting an upper bound fixes the issue. Also, a second error check has been added for splice. If EINVAL is returned, we fall back to recv().	2009-11-28 07:47:10 +01:00
Willy Tarreau	f1ba4b3de5	[MAJOR] buffer: flag BF_DONT_READ to disable reads when not required When processing a GET or HEAD request in close mode, we know we don't need to read anything anymore on the socket, so we can disable it. Doing this can save up to 40% of the recv calls, and half of the epoll_ctl calls. For this we need a buffer flag indicating that we're not interested in reading anymore. Right now, this flag also disables both polled reads. We might benefit from disabling only speculative reads, but we will need at least this flag when we want to support keepalive anyway. Currently we don't disable the flag on completion, but it does not matter as we close ASAP when performing the shutw().	2009-10-18 08:52:24 +02:00
Willy Tarreau	8d5d77efc3	[OPTIM] move some rarely used fields out of fdtab Some rarely used information is stored in fdtab, making it larger for no reason (source port ranges, remote address, ...). Such information lies there because the checks can't find it anywhere else. The goal will be to move this information to the stream interface once the checks make use of it. For now, we move it to an fdinfo array. This simple change might have improved the cache hit ratio a little bit because a performance increase of 0.5% has been measured.	2009-10-18 08:17:33 +02:00
Willy Tarreau	fe8903cc76	[BUG] don't refresh timeouts late after detected activity In old versions, before 1.3.16, we had to refresh the timeouts after each call to process_session() because the stream socket handler did not do it. Now that the sockets can exchange data for a long period without calling process_session(), we can detect an old activity and refresh a timeout long after the last activity, causing too late a detection of some timeouts. The fix simply consists in not checking for activity anymore in stream_sock_data_finish() but only set a timeout if it was not previously set.	2009-10-04 10:56:08 +02:00
Willy Tarreau	f27b5ea8dc	[MEDIUM] new option "independant-streams" to stop updating read timeout on writes By default, when data is sent over a socket, both the write timeout and the read timeout for that socket are refreshed, because we consider that there is activity on that socket, and we have no other means of guessing if we should receive data or not. While this default behaviour is desirable for almost all applications, there exists a situation where it is desirable to disable it, and only refresh the read timeout if there are incoming data. This happens on sessions with large timeouts and low amounts of exchanged data such as telnet session. If the server suddenly disappears, the output data accumulates in the system's socket buffers, both timeouts are correctly refreshed, and there is no way to know the server does not receive them, so we don't timeout. However, when the underlying protocol always echoes sent data, it would be enough by itself to detect the issue using the read timeout. Note that this problem does not happen with more verbose protocols because data won't accumulate long in the socket buffers. When this option is set on the frontend, it will disable read timeout updates on data sent to the client. There probably is little use of this case. When the option is set on the backend, it will disable read timeout updates on data sent to the server. Doing so will typically break large HTTP posts from slow lines, so use it with caution.	2009-10-03 22:01:18 +02:00
Willy Tarreau	89f7ef295d	[MINOR] stream_interface: add SI_FL_DONT_WAKE flag We had to add a new stream_interface flag : SI_FL_DONT_WAKE. This flag is used to indicate that a stream interface is being updated and that no wake up should be sent to its owner. This will be required for tasks embedded into stream interfaces. Otherwise, we could have the owner task send wakeups to itself during status updates, thus preventing the state from converging. As long as a stream_interface's status is being monitored and adjusted, there is no reason to wake it up again, as we know its changes will be seen and considered.	2009-09-23 23:52:14 +02:00
Willy Tarreau	31971e536a	[MEDIUM] add support for infinite forwarding In TCP, we don't want to forward chunks of data, we want to forward indefinitely. This patch introduces a special value for the amount of data to be forwarded. When buffer_forward() is called with BUF_INFINITE_FORWARD, it configures the buffer to never stop forwarding until the end.	2009-09-20 12:07:52 +02:00
Willy Tarreau	59454bfaa4	[MINOR] stream_sock: don't set SI_FL_WAIT_DATA if BF_SHUTW_NOW is set Don't ask for more data when we know we're about to close. This is harmless but better have it cleaned up.	2009-09-20 11:14:27 +02:00
Willy Tarreau	ba0b63d2c7	[MAJOR] buffers: fix the BF_EMPTY flag's meaning The BF_EMPTY flag was once used to indicate an empty buffer. However, it was used half the time as meaning the buffer is empty for the reader, and half the time as meaning there is nothing left to send. "nothing to send" is only indicated by "->send_max=0 && !pipe". Once we fix this, we discover that the flag is not used anymore. So the flag has been renamed BF_OUT_EMPTY and means exactly the condition above, ie, there is nothing to send. Doing so has allowed us to remove some unused tests for emptiness, but also to uncover a certain amount of situations where the flag was not correctly set or tested.	2009-09-20 08:17:45 +02:00
Willy Tarreau	520d95e42b	[MAJOR] buffers: split BF_WRITE_ENA into BF_AUTO_CONNECT and BF_AUTO_CLOSE The BF_WRITE_ENA buffer flag became very complex to deal with, because it was used to : - enable automatic connection - enable close forwarding - enable data forwarding The last point was not very true anymore since we introduced ->send_max, but still the test remained everywhere. This was causing issues such as impossibility to connect without forwarding data, impossibility to prevent closing when data was forwarded, etc... This patch clarifies the situation by getting rid of this multi-purpose flag and replacing it with : - data forwarding based only on ->send_max || ->pipe ; - a new BF_AUTO_CONNECT flag to allow automatic connection and only that ; - ability to perform an automatic connection when ->send_max or ->pipe indicate that data is waiting to leave the buffer ; - a new BF_AUTO_CLOSE flag to let the producer automatically set the BF_SHUTW_NOW flag when it gets a BF_SHUTR. During this cleanup, it was discovered that some tests were performed twice, or that the BF_HIJACK flag was still tested, which is not needed anymore since ->send_max replaced it. These places have been fixed too. These cleanups have also revealed a few areas where the other flags such as BF_EMPTY are not cleanly used. This will be an opportunity for a second patch.	2009-09-19 21:14:54 +02:00
Willy Tarreau	418fd4722a	[MAJOR] buffers: fix misuse of the BF_SHUTW_NOW flag This flag was incorrectly used as meaning "close immediately", while it needs to say "close ASAP". ASAP here means when unsent data pending in the buffer are sent. This helps cleaning up some dirty tricks where the buffer output was checking the BF_SHUTR flag combined with EMPTY and other such things. Now we have a clearly defined semantics : - producer sets SHUTR and may set SHUTW_NOW if WRITE_ENA is set, otherwise leave it to the session processor to set it. - consumer only checks SHUTW_NOW to decide whether or not to call shutw(). This also induced very minor changes at some locations which were not protected against buffer changes while the SHUTW_NOW flag was set. Now we prevent send_max from changing when the flag is set. Several tests have been run without any unexpected behaviour detected. Some more cleanups are needed, as it clearly appears that some tests could be removed with stricter semantics.	2009-09-19 14:53:46 +02:00
Dmitry Sivachenko	caf58986fb	[BUILD] compilation of haproxy-1.4-dev2 on FreeBSD Please consider the following patches. They are required to compile haproxy-1.4-dev2 on FreeBSD. Summary: 1) include <sys/types.h> before <netinet/tcp.h> 2) Use IPPROTO_TCP instead of SOL_TCP (they are both defined as 6, TCP protocol number)	2009-08-30 14:45:19 +02:00
Willy Tarreau	6db06d3870	[MEDIUM] remove TCP_CORK and make use of MSG_MORE instead send() supports the MSG_MORE flag on Linux, which does the same as TCP_CORK except that we don't have to remove TCP_NODELAY before and we don't need any syscall to set/remove it. This can save up to 4 syscalls around a send() (two for setting it, two for removing it), and it's much cleaner since it is not persistent. So make use of it instead.	2009-08-19 11:29:44 +02:00
Willy Tarreau	d6d06909da	[CLEANUP] remove ifdef MSG_NOSIGNAL and define it instead ifdefs are really annoying in the code. Define MSG_NOSIGNAL to zero when undefined and remove associated ifdefs.	2009-08-19 11:25:08 +02:00
Willy Tarreau	a07a34eb24	[MEDIUM] replace BUFSIZE with buf->size in computations The first step towards dynamic buffer size consists in removing all static definitions of the buffer size. Instead, we store a buffer's size in itself. Right now they're all preinitialized to BUFSIZE, but we will change that.	2009-08-16 23:27:46 +02:00
Willy Tarreau	c9fce2fee8	[BUILD] fix build for systems without SOL_TCP Andrew Azarov reported that haproxy-1.4-dev1 does not build under FreeBSD 7.2 because SOL_TCP is not defined. So add a check for its definition before using it. This only impacts network optimisations anyway.	2009-08-16 14:13:47 +02:00
Willy Tarreau	c54aef3180	[BUG] fix random pauses on last segment of a series During a direct data transfer from the server to the client, if the system did not have enough buffers anymore, haproxy would not enable write polling again if it could write at least one data chunk. Under normal conditions, this would remain undetected because the remaining data would be pushed by next data chunks. However, when this happens on the last chunk of a session, or the last in a series in an interactive bidirectional TCP transfer, haproxy would only start sending again when the read timeout was reached on the side it stopped writing, causing long pauses on some protocols such as SQL. This bug was reported by an Exceliance customer who generously offered to help us by sending large amounts of traces and running various tests on production systems. It is quite hard to trigger it but it becomes easier with a ping-pong TCP service which transfers random data sizes, with a modified version of send() able to send packets smaller than the average transfer size. A cleaner fix would imply only updating the write timeout when data transfers are attempted, not succeeded, but that requires more sensible code changes without fixing the result. It is a candidate for a later patch though.	2009-07-27 20:08:06 +02:00
Willy Tarreau	7154365cc6	[BUG] stream_sock: don't stop reading when the poller reports an error As reported by Jean-Baptiste Quenot and Robbie Aelter, sometimes a backend server error is converted to a 502 error if the backend stops before reading all the request. The reason is that the remote system sends a TCP RST packet because there are still unread data pending in the socket buffer. This RST is translated as a socket error on the local system, and this error is reported by the poller. However, most of the time, it's a write error, but the system is still able to read the remaining pending data, such as in the trace below : send(7, "GET /aaa HTTP/1.0\r\nUser-Agent: Mo"..., 1123, MSG_DONTWAIT|MSG_NOSIGNAL) = 1123 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0 epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}}, 8, 1000) = 1 gettimeofday({1247593958, 643572}, NULL) = 0 recv(7, "HTTP/1.0 400 Bad request\r\nCache-C"..., 7000, MSG_NOSIGNAL) = 187 setsockopt(6, SOL_TCP, TCP_NODELAY, [0], 4) = 0 setsockopt(6, SOL_TCP, TCP_CORK, [1], 4) = 0 send(6, "HTTP/1.0 400 Bad request\r\nCache-C"..., 187, MSG_DONTWAIT|MSG_NOSIGNAL) = 187 shutdown(6, 1 /* send */) = 0 The recv succeeded while epoll_wait() reported an error. Note: This case is very hard to reproduce and requires that the backend server is reached via the loopback in order to minimise latency and reduce the risk of sent data being ACKed.	2009-07-14 19:55:05 +02:00
Willy Tarreau	720058cdcb	[BUG] stream_sock: always shutdown(SHUT_WR) before closing When we close a socket with unread data in the buffer, or when the nolinger option is set, we regularly lose the last fragment, which often contains the error message. This typically occurs when sending too large a request. Only the RST is seen due to the close() (since not all data were read) and the output message never reaches the network. Doing a shutdown() before the close() solves this annoying issue because the data are really pushed before the system sends the RST.	2009-07-14 19:21:50 +02:00
Willy Tarreau	dc340a900d	[MEDIUM] splice: set the capability on each stream_interface The splice code did not consider compatibility between both ends of the connection. Now we set different capabilities on each stream interface, depending on what the protocol can splice to/from. Right now, only TCP is supported. Thanks to this, we're now able to automatically detect when splice() is not implemented and automatically disable it on one end instead of reporting errors to the upper layer.	2009-06-28 23:10:19 +02:00
Willy Tarreau	5d707e1aaa	[MEDIUM] stream_sock: don't close prematurely when nolinger is set When the nolinger option is used, we must not close too fast because some data might be left unsent. Instead we must proceed with a normal shutdown first, then a close. Also, we want to avoid merging FIN with the last segment if nolinger is set, because if that one gets lost, there is no chance for it to be retransmitted.	2009-06-28 11:09:07 +02:00
Willy Tarreau	fb14edc215	[MEDIUM] stream_sock: implement tcp-cork for use during shutdowns on Linux Setting TCP_CORK on a socket before sending the last segment enables automatic merging of this segment with the FIN from the shutdown() call. Playing with TCP_CORK is not easy though as we have to track the status of the TCP_NODELAY flag since both are mutually exclusive. Doing so saves one more packet per session and offers about 5% more performance. There is no reason not to do it, so there is no associated option.	2009-06-14 15:24:37 +02:00
Willy Tarreau	d06e71179a	[BUG] stream_sock: check for shut{r,w} before refreshing some timeouts Under some circumstances, it appears possible to refresh a timeout just after a side has been shut. For instance, if poll() plans to call both read and write, and the read side calls chk_snd() which in turn causes a shutw to occur, then stream_sock_write could update its write timeout. The same problem happens the other way. The timeout checks will then not catch these cases because they ignore timeouts in case of shut{r,w}. This is very likely to be the major cause of the 100% CPU usages reported by Bart Bobrowski. The fix consists in always ensuring that a side is not shut before updating its timeout.	2009-03-29 10:18:41 +02:00
Willy Tarreau	1714e0ffda	[BUG] stream_sock: disable I/O on fds reporting an error Upon read or write error, we cannot immediately close the FD because we want to first report the error to the upper layer which will do it itself. However, we want to prevent any further I/O from being performed on the FD. This is especially important in case of speculative I/O where nothing else could stop the FD from still being polled until the upper layer takes care of the condition.	2009-03-28 23:42:30 +01:00
Willy Tarreau	127334e89b	[BUG] reset the stream_interface connect timeout upon connect or error The stream_interface timeout was not reset upon a connect success or error, leading to busy loops when requeuing tasks in the past. Thanks to Bart Bobrowski for reporting the issue.	2009-03-28 11:01:20 +01:00
Willy Tarreau	1b194fe03e	[OPTIM] buffer: new BF_READ_DONTWAIT flag reduces EAGAIN rates When the reader does not expect to read lots of data, it can set BF_READ_DONTWAIT on the request buffer. When it is set, the stream_sock_read callback will not try to perform multiple reads, it will return after only one, and clear the flag. That way, we can immediately return when waiting for an HTTP request without trying to read again. On pure request/responses schemes such as monitor-uri or redirects, this has completely eliminated the EAGAIN occurrences and the epoll_ctl() calls, resulting in a performance increase of about 10%. Similar effects should be observed once we support HTTP keep-alive since we'll immediately disable reads once we get a full request.	2009-03-21 21:57:30 +01:00
Willy Tarreau	6f4a82c7af	[OPTIM] stream_sock: don't retry to read after a large read If we get very large data at once, it's almost certain that it's worthless trying to read again, because we got everything we could get. Doing this has made all -EAGAIN disappear from splice reads. The threshold has been put in the global tunable structures so that if we one day want to make it accessible from user config, it will be easy to do so.	2009-03-21 20:43:57 +01:00
Willy Tarreau	c9619468ea	[BUG] stream_sock: write timeout must be updated when forwarding ! When data are forwarded between socket, we must update the output socket's write timeout. This was forgotten, causing sessions to unexpectedly expire during long posts.	2009-03-09 22:40:57 +01:00
Willy Tarreau	87bed62a92	[BUILD] build fixes for Solaris One build error in stream_sock.c when MSG_NOSIGNAL is not defined, and a warning in task.c.	2009-03-08 22:25:28 +01:00
Vincenzo Farruggia	9b97cff1c2	[BUILD] Haproxy won't compile if DEBUG_FULL is defined As subject when i try to compile haproxy with -DDEBUG_FULL it stop at stream_sock.c file with: gcc -Iinclude -Wall -O2 -g -DDEBUG_FULL -DTPROXY -DENABLE_POLL -DENABLE_EPOLL -DENABLE_SEPOLL -DNETFILTER -DUSE_GETSOCKNAME -DCONFIG_HAPROXY_VERSION=\"1.3.15\" -DCONFIG_HAPROXY_DATE=\"2008/04/19\" -c -o src/stream_sock.o src/stream_sock.c src/stream_sock.c: In function 'stream_sock_chk_rcv': src/stream_sock.c:905: error: 'fd' undeclared (first use in this function) src/stream_sock.c:905: error: (Each undeclared identifier is reported only once src/stream_sock.c:905: error: for each function it appears in.) src/stream_sock.c:905: error: 'ob' undeclared (first use in this function) src/stream_sock.c: In function 'stream_sock_chk_snd': src/stream_sock.c:940: error: 'fd' undeclared (first use in this function) src/stream_sock.c:940: error: 'ib' undeclared (first use in this function) make: *** [src/stream_sock.o] Error 1 With this patch all build fine:	2009-02-04 22:46:19 +01:00
Willy Tarreau	3eba98aa57	[MEDIUM] splice: make use of pipe pools Using pipe pools makes pipe management a lot easier. It also allows to remove quite a bunch of #ifdefs in areas which depended on the presence or not of support for kernel splicing. The buffer now holds a pointer to a pipe structure which is always NULL except if there are still data in the pipe. When it needs to use that pipe, it dynamically allocates it from the pipe pool. When the data is consumed, the pipe is immediately released. That way, there is no need anymore to care about pipe closure upon session termination, nor about pipe creation when trying to use splice(). Another immediate advantage of this method is that it considerably reduces the number of pipes needed to use splice(). Tests have shown that even with 0.2 pipe per connection, almost all sessions can use splice(), because the same pipe may be used by several consecutive calls to splice().	2009-01-25 13:56:13 +01:00
Willy Tarreau	98b306be65	[MEDIUM] splice: add hints to support older buggy kernels Kernels before 2.6.27.13 would have splice() return EAGAIN on shutdown. By adding a few tricks, we can deal with the situation. If splice() returns EAGAIN and the pipe is empty, then fallback to recv() which will be able to check if it's an end of connection or not. The advantage of this method is that it remains transparent for good kernels since there is no reason that epoll() will return EPOLLIN without anything to read, and even if it would happen, the recv() overhead on this check is minimal.	2009-01-25 11:11:32 +01:00
Willy Tarreau	5bd8c376ad	[MAJOR] complete support for linux 2.6 kernel splicing This code provides support for linux 2.6 kernel splicing. This feature appeared in kernel 2.6.25, but initial implementations were awkward and buggy. A kernel >= 2.6.29-rc1 is recommended, as well as some optimization patches. Using pipes, this code is able to pass network data directly between sockets. The pipes are a bit annoying to manage (fd creation, release, ...) but finally work quite well. Preliminary tests show that on high bandwidths, there's a substantial gain (approx +50%, only +20% with kernel workarounds for corruption bugs). With 2000 concurrent connections, with Myricom NICs, haproxy now more easily achieves 4.5 Gbps for 1 process and 6 Gbps for two processes buffers. 8-9 Gbps are easily reached with smaller numbers of connections. We also try to splice out immediately after a splice in by making profit from the new ability for a data producer to notify the consumer that data are available. Doing this ensures that the data are immediately transferred between sockets without latency, and without having to re-poll. Performance on small packets has considerably increased due to this method. Earlier kernels return only one TCP segment at a time in non-blocking splice-in mode, while newer return as many segments as may fit in the pipe. To work around this limitation without hurting more recent kernels, we try to collect as much data as possible, but we stop when we believe we have read 16 segments, then we forward everything at once. It also ensures that even upon shutdown or EAGAIN the data will be forwarded. Some tricks were necessary because the splice() syscall does not make a difference between missing data and a pipe full, it always returns EAGAIN. The trick consists in stop polling in case of EAGAIN and a non empty pipe. The receiver waits for the buffer to be empty before using the pipe. This is in order to avoid confusion between buffer data and pipe data. 
The BF_EMPTY flag now covers the pipe too. Right now the code is disabled by default. It needs to be built with CONFIG_HAP_LINUX_SPLICE, and the instances intended to use splice() must have "option splice-response" (or option splice-request) enabled. It is probably desirable to keep a pool of pre-allocated pipes to avoid having to create them for every session. This will be worked on later. Preliminary tests show very good results, even with the kernel workaround causing one memcpy(). At 3000 connections, performance has moved from 3.2 Gbps to 4.7 Gbps.	2009-01-19 00:32:22 +01:00
Willy Tarreau	6b4aad4c1b	[MEDIUM] add definitions for Linux kernel splicing Some older libc don't define the splice() syscall, and some even define a wrong one. For this reason, we try our best to declare it correctly. These definitions still work with recent glibc.	2009-01-18 21:59:13 +01:00
Willy Tarreau	a456f2a059	[MEDIUM] stream_sock: try to send pending data on chk_snd() When the producer calls stream_sock_chk_snd(), we now try to send all pending data asynchronously. If it succeeds, we don't have to enable polling on the FD which saves about half of the calls to epoll_wait(). In stream_sock_read(), we finally set the WAIT_ROOM flag as soon as possible, in preparation of the splice code. We reset it when we detect that some room has been released either in the buffer or in the splice.	2009-01-18 19:43:47 +01:00
Willy Tarreau	d2def0fd25	[MINOR] stream_sock: fix a few wrong empty calculations	2009-01-18 17:37:33 +01:00
Willy Tarreau	9c0fe59612	[MEDIUM] stream_sock_read: call ->chk_snd whenever there are data pending The condition to call ->chk_snd() in stream_sock_read() was suboptimal because we did not call it when the socket was shut down nor when there was an error after data were added. Now we make sure to call it whenever there are data pending. Also, the "full" condition was handled before calling chk_snd(), which could cause deadlock issues if chk_snd() did consume some data.	2009-01-18 16:25:31 +01:00
Willy Tarreau	0c2fc1f39d	[MEDIUM] split stream_sock_write() into callback and core functions stream_sock_write() has been split in two parts : - the poll callback, intended to be called when an I/O event has been detected - the write() core function, which ought to be usable from various other places, possibly not meant to wake the task up. The code has also been slightly cleaned up in the process. It's more readable now.	2009-01-18 15:48:52 +01:00
Willy Tarreau	ac128fef73	[CLEANUP] stream_sock: move the write-nothing condition out of the loop Some tricks to handle situations where we write nothing were in the middle of the main loop in stream_sock_write(). This cleanup provides better source and object code, and slightly shrinks the output code.	2009-01-09 13:05:19 +01:00
Willy Tarreau	efc612c17b	[CLEANUP] replace a few occurrences of (flags & X) && !(flags & Y) This construct collapses into ((flags & (X|Y)) == X) when X is a single-bit flag. This provides a noticeable code shrink and the output code results in fewer conditional jumps.	2009-01-09 12:18:24 +01:00
Willy Tarreau	68eac13217	[OPTIM] stream_sock: factor out the buffer full handling out of the loop Handling the buffer full condition is not trivial and this code was duplicated inside the loop. Move it out of the loop at a single place.	2009-01-09 11:38:52 +01:00
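
The flag-test rewrite described in commit efc612c17b above can be illustrated with a minimal sketch. The flag values and function names below are hypothetical and are not haproxy's real BF_* definitions; the sketch only assumes that X is a single bit and that X and Y do not overlap, as the commit message requires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; not haproxy's real BF_* bits. */
#define FLAG_X 0x01u   /* single-bit flag that must be set */
#define FLAG_Y 0x02u   /* flag that must be clear */

/* The collapse is only valid when X is exactly one bit. */
static_assert(FLAG_X != 0 && (FLAG_X & (FLAG_X - 1)) == 0,
              "FLAG_X must be a single-bit flag");

/* Original form: two separate tests on the same word. */
static bool test_two_steps(unsigned int flags)
{
	return (flags & FLAG_X) && !(flags & FLAG_Y);
}

/* Collapsed form: one mask followed by one comparison. */
static bool test_collapsed(unsigned int flags)
{
	return (flags & (FLAG_X | FLAG_Y)) == FLAG_X;
}

/* Returns true when both forms agree for every value below `limit`. */
static bool forms_agree(unsigned int limit)
{
	for (unsigned int f = 0; f < limit; f++)
		if (test_two_steps(f) != test_collapsed(f))
			return false;
	return true;
}
```

Calling `forms_agree(256)` compares the two forms over every 8-bit flag word; whether the collapsed form actually saves conditional jumps depends on the compiler and target.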


112 Commits