Commit Graph

122 Commits

Author SHA1 Message Date
Willy Tarreau
fe598a7779 [BUILD] stream_sock: previous fix lacked the #include, causing a warning. 2010-09-21 21:48:23 +02:00
Willy Tarreau
e9f32dbf5c [BUG] stream_sock: cleanly disable the listener in case of resource shortage
Jozsef R.Nagy reported a reliability issue on FreeBSD. Sometimes an error
would be emitted, reporting the inability to switch a socket to non-blocking
mode and the listener would definitely not accept anything. Cyril Bonté
narrowed this bug down to the call to EV_FD_CLR(l->fd, DIR_RD).

He was right because this call is wrong. It only disables input events on
the listening socket, without setting the listener to the LI_LISTEN state,
so any subsequent call to enable_listener() from maintain_proxies() is
ignored ! The correct fix consists in calling disable_listener() instead.

It is discutable whether we should keep such error path or just ignore the
event. The goal in earlier versions was to temporarily disable new activity
in order to let the system recover while releasing resources.
2010-09-21 21:14:29 +02:00
Willy Tarreau
af7ad00a99 [MINOR] support a global jobs counter
This counter is incremented for each incoming connection and each active
listener, and is used to prevent haproxy from stopping upon SIGUSR1. It
will thus be possible for some tasks in increment this counter in order
to prevent haproxy from dying until they have completed their job.
2010-08-31 15:39:26 +02:00
Willy Tarreau
5af1fa1df0 [MAJOR] stream_sock: better wakeup conditions on read()
After a read, there was a condition to mandatorily wake the task
up if the BF_READ_DONTWAIT flag was set. This was wrong because
the wakeup condition in this case can be deduced from the other
ones. Another condition was put on the other side not being in
SI_ST_EST state. It is not appropriate to do this because it
causes a useless wakeup at the beginning of every first request
in case of speculative polling, due to the fact that we don't
read anything and that the other side is still in SI_ST_INI.
Also, the wakeup was performed whenever to_forward was null,
which causes an unexpected wakeup upon the first read for the
same reason. However, those two conditions are valid if and
only if at least one read was performed.

Also, the BF_SHUTR flag was tested as part of the wakeup condition,
while this one can only be set if BF_READ_NULL is set too. So let's
simplify this ambiguous test by removing the BF_SHUTR part from the
condition to only process events.

Last, the BF_READ_DONTWAIT flag was unconditionally cleared,
while sometimes there would have been no I/O. Now we only clear
it once the I/O operation has been performed, which maintains
its validity until the I/O occurs.

Finally, those fixes saved approximately 16% of the per-session
wakeups and 20% of the epoll_ctl() calls, which translates into
slightly less under high load due to the request often being ready
when the read() occurs. A performance increase between 2 and 5% is
expected depending on the workload.

It does not seem necessary to backport this change to 1.4, eventhough
it fixes some performance issues. It may later be backported if
required to fix something else because the risk of regression seems
very low due to the fact that we're more in line with the documented
semantics.
2010-08-10 14:04:09 +02:00
Willy Tarreau
0bd05eaf24 [MEDIUM] stream-interface: add a ->release callback
When a connection is closed on a stream interface, some iohandlers
will need to be informed in order to release some resources. This
normally happens upon a shutr+shutw. It is the equivalent of the
fd_delete() call which is done for real sockets, except that this
time we release internal resources.

It can also be used with real sockets because it does not cost
anything else and might one day be useful.
2010-07-13 16:06:23 +02:00
Willy Tarreau
24dcaf3450 [MEDIUM] frontend: count the incoming connection earlier
The frontend's connection was accounted for once the session was
instanciated. This was problematic because the early ACLs weren't
able to correctly account for the number of concurrent connections.
Now we count the connection once it is assigned to the frontend.
It also brings the nice advantage of being more symmetrical, because
the stream_sock's accept() does not have to account for that anymore,
only the session's accept() does.
2010-06-14 10:53:19 +02:00
Willy Tarreau
7999ddbc95 [MINOR] stream_sock: don't dereference a non-existing frontend
The stream_sock accept() can be used without any frontend. Check
everywhere if it exists before dereferencing it in the error path.
2010-06-14 10:53:18 +02:00
Willy Tarreau
a8f55d5473 [MEDIUM] backend: initialize the server stream_interface upon connect()
It's not normal to initialize the server-side stream interface from the
accept() function, because it may change later. Thus, we introduce a new
stream_sock_prepare_interface() function which is called just before the
connect() and which sets all of the stream_interface's callbacks to the
default ones used for real sockets. The ->connect function is also set
at the same instant so that we can easily add new server-side protocols
soon.
2010-06-14 10:53:15 +02:00
Willy Tarreau
eb472685cb [MEDIUM] separate protocol-level accept() from the frontend's
For a long time we had two large accept() functions, one for TCP
sockets instanciating proxies, and another one for UNIX sockets
instanciating the stats interface.

A lot of code was duplicated and both did not work exactly the same way.

Now we have a stream_sock layer accept() called for either TCP or UNIX
sockets, and this function calls the frontend-specific accept() function
which does the rest of the frontend-specific initialisation.

Some code is still duplicated (session & task allocation, stream interface
initialization), and might benefit from having an intermediate session-level
accept() callback to perform such initializations. Still there are some
minor differences that need to be addressed first. For instance, the monitor
nets should only be checked for proxies and not for other connection templates.

Last, we renamed l->private as l->frontend. The "private" pointer in
the listener is only used to store a frontend, so let's rename it to
eliminate this ambiguity. When we later support detached listeners
(eg: FTP), we'll add another field to avoid the confusion.
2010-06-14 10:53:11 +02:00
Willy Tarreau
03fa5df64a [CLEANUP] rename client -> frontend
The 'client.c' file now only contained frontend-specific functions,
so it has naturally be renamed 'frontend.c'. Same for client.h. This
has also been an opportunity to remove some cross references from
files that should not have depended on it.

In the end, this file should contain a protocol-agnostic accept()
code, which would initialize a session, task, etc... based on an
accept() from a lower layer. Right now there are still references
to TCP.
2010-06-14 10:53:10 +02:00
Willy Tarreau
7340ca5a54 [OPTIM] stream_sock: don't shutdown(write) when the socket is in error
We get a lot of those, especially with web crawlers :

recv(2, 0x810b610, 7000, 0)             = -1 ECONNRESET (Connection reset by peer)
shutdown(2, 1 /* send */)               = -1 ENOTCONN (Transport endpoint is not connected)
close(2)                                = 0

There's no need to perform the shutdown() here, the socket is already
in error so it is down.
2010-01-16 10:03:45 +01:00
Willy Tarreau
fc1daaf497 [CLEANUP] stream_sock: MSG_NOSIGNAL is only for send(), not recv()
We must not set this flag on recv(), it's not used, it's just for
send().
2010-01-15 10:26:13 +01:00
Willy Tarreau
2be3939416 [MINOR] http: don't wait for sending requests to the server
By default we automatically wait for enough data to fill large
packets if buf->to_forward is not null. This causes a problem
with POST/Expect requests which have a data size but no data
immediately available. Instead of causing noticeable delays on
such requests, simply add a flag to disable waiting when sending
requests.
2010-01-03 17:24:51 +01:00
Willy Tarreau
face839296 [OPTIM] http: set MSG_MORE on response when a pipelined request is pending
Many times we see a lot of short responses in HTTP (typically 304 on a
reload). It is a waste of network bandwidth to send that many small packets
when we know we can merge them. When we know that another HTTP request is
following a response, we set BF_EXPECT_MORE on the response buffer, which
will turn MSG_MORE on exactly once. That way, multiple short responses can
leave pipelined if their corresponding requests were also pipelined.
2010-01-03 11:37:54 +01:00
Willy Tarreau
d38b53b896 [MINOR] stream_sock: enable MSG_MORE when forwarding finite amount of data
While it could be dangerous to enable MSG_MORE on infinite data (eg:
interactive sessions), it makes sense to enable it when we know the
chunk to be sent is just a part of a larger one.
2010-01-03 11:18:34 +01:00
Willy Tarreau
4c283dce4b [MINOR] stream_sock: add SI_FL_NOLINGER for faster close
This new flag may be set by any user on a stream interface to tell
the underlying protocol that there is no need for lingering on the
socket since we know the other side either received everything or
does not care about what we sent.

This will typically be used with forced server close in HTTP mode,
where we want to quickly close a server connection after receiving
its response. Otherwise the system would prevent us from reusing
the same port for some time.
2009-12-29 14:36:34 +01:00
Willy Tarreau
33b2db69a9 [MINOR] stream_sock: prepare for closing when all pending data are sent
Since we'll soon be able to close a connection with remaining data in a
buffer, it becomes obvious that we can prepare to close when we're about
to send the last chunk of data and not the whole buffer.
2009-12-29 08:02:56 +01:00
Willy Tarreau
864e8256ec [BUG] stream_sock: wrong max computation on recv
Since the introduction of the automatic sizing of buffers during reads,
a bug appeared where the max size could be negative, causing large
chunks of memory to be overwritten during recv() calls if a read pointer
was already past the buffer's limit.
2009-12-28 17:36:37 +01:00
Willy Tarreau
7c3c54177a [MAJOR] buffers: automatically compute the maximum buffer length
We used to apply a limit to each buffer's size in order to leave
some room to rewrite headers, then we used to remove this limit
once the session switched to a data state.

Proceeding that way becomes a problem with keepalive because we
have to know when to stop reading too much data into the buffer
so that we can leave some room again to process next requests.

The principle we adopt here consists in only relying on to_forward+send_max.
Indeed, both of those data define how many bytes will leave the buffer.
So as long as their sum is larger than maxrewrite, we can safely
fill the buffers. If they are smaller, then we refrain from filling
the buffer. This means that we won't risk to fill buffers when
reading last data chunk followed by a POST request and its contents.

The only impact identified so far is that we must ensure that the
BF_FULL flag is correctly dropped when starting to forward. Right
now this is OK because nobody inflates to_forward without using
buffer_forward().
2009-12-22 10:06:34 +01:00
Willy Tarreau
a9de333aa5 [BUG] stream_sock: BUF_INFINITE_FORWARD broke splice on 64-bit platforms
Yohan Tordjman at Dstorage found that upgrading haproxy to 1.4-dev4
caused truncated objects to be returned. An strace quickly exhibited
the issue which was 100% reproducible :

4297  epoll_wait(0, {}, 10, 0)          = 0
4297  epoll_wait(0, {{EPOLLIN, {u32=7, u64=7}}}, 10, 1000) = 1
4297  splice(0x7, 0, 0x5, 0, 0xffffffffffffffff, 0x3) = -1 EINVAL (Invalid argument)
4297  shutdown(7, 1 /* send */)         = 0
4297  close(7)                          = 0
4297  shutdown(2, 1 /* send */)         = 0
4297  close(2)                          = 0

This is caused by the fact that the forward length is taken from
BUF_INFINITE_FORWARD, which is -1. The problem does not appear
in 32-bit mode because this value is first cast to an unsigned
long, truncating it to 32-bit (4 GB). Setting an upper bound
fixes the issue.

Also, a second error check has been added for splice. If EINVAL
is returned, we fall back to recv().
2009-11-28 07:47:10 +01:00
Willy Tarreau
f1ba4b3de5 [MAJOR] buffer: flag BF_DONT_READ to disable reads when not required
When processing a GET or HEAD request in close mode, we know we don't
need to read anything anymore on the socket, so we can disable it.
Doing this can save up to 40% of the recv calls, and half of the
epoll_ctl calls.

For this we need a buffer flag indicating that we're not interesting in
reading anymore. Right now, this flag also disables both polled reads.
We might benefit from disabling only speculative reads, but we will need
at least this flag when we want to support keepalive anyway.

Currently we don't disable the flag on completion, but it does not
matter as we close ASAP when performing the shutw().
2009-10-18 08:52:24 +02:00
Willy Tarreau
8d5d77efc3 [OPTIM] move some rarely used fields out of fdtab
Some rarely information are stored in fdtab, making it larger for no
reason (source port ranges, remote address, ...). Such information
lie there because the checks can't find them anywhere else. The goal
will be to move these information to the stream interface once the
checks make use of it.

For now, we move them to an fdinfo array. This simple change might
have improved the cache hit ratio a little bit because a 0.5% of
performance increase has measured.
2009-10-18 08:17:33 +02:00
Willy Tarreau
fe8903cc76 [BUG] don't refresh timeouts late after detected activity
In old versions, before 1.3.16, we had to refresh the timeouts after
each call to process_session() because the stream socket handler did
not do it. Now that the sockets can exchange data for a long period
without calling process_session(), we can detect an old activity and
refresh a timeout long after the last activity, causing too late a
detection of some timeouts.

The fix simply consists in not checking for activity anymore in
stream_sock_data_finish() but only set a timeout if it was not
previously set.
2009-10-04 10:56:08 +02:00
Willy Tarreau
f27b5ea8dc [MEDIUM] new option "independant-streams" to stop updating read timeout on writes
By default, when data is sent over a socket, both the write timeout and the
read timeout for that socket are refreshed, because we consider that there is
activity on that socket, and we have no other means of guessing if we should
receive data or not.

While this default behaviour is desirable for almost all applications, there
exists a situation where it is desirable to disable it, and only refresh the
read timeout if there are incoming data. This happens on sessions with large
timeouts and low amounts of exchanged data such as telnet session. If the
server suddenly disappears, the output data accumulates in the system's
socket buffers, both timeouts are correctly refreshed, and there is no way
to know the server does not receive them, so we don't timeout. However, when
the underlying protocol always echoes sent data, it would be enough by itself
to detect the issue using the read timeout. Note that this problem does not
happen with more verbose protocols because data won't accumulate long in the
socket buffers.

When this option is set on the frontend, it will disable read timeout updates
on data sent to the client. There probably is little use of this case. When
the option is set on the backend, it will disable read timeout updates on
data sent to the server. Doing so will typically break large HTTP posts from
slow lines, so use it with caution.
2009-10-03 22:01:18 +02:00
Willy Tarreau
89f7ef295d [MINOR] stream_interface: add SI_FL_DONT_WAKE flag
We had to add a new stream_interface flag : SI_FL_DONT_WAKE. This flag
is used to indicate that a stream interface is being updated and that
no wake up should be sent to its owner. This will be required for tasks
embedded into stream interfaces. Otherwise, we could have the
owner task send wakeups to itself during status updates, thus
preventing the state from converging. As long as a stream_interface's
status is being monitored and adjusted, there is no reason to wake it
up again, as we know its changes will be seen and considered.
2009-09-23 23:52:14 +02:00
Willy Tarreau
31971e536a [MEDIUM] add support for infinite forwarding
In TCP, we don't want to forward chunks of data, we want to forward
indefinitely. This patch introduces a special value for the amount
of data to be forwarded. When buffer_forward() is called with
BUF_INFINITE_FORWARD, it configures the buffer to never stop
forwarding until the end.
2009-09-20 12:07:52 +02:00
Willy Tarreau
59454bfaa4 [MINOR] stream_sock: don't set SI_FL_WAIT_DATA if BF_SHUTW_NOW is set
Don't ask for more data when we know we're about to close. This is
harmless but better have it cleaned up.
2009-09-20 11:14:27 +02:00
Willy Tarreau
ba0b63d2c7 [MAJOR] buffers: fix the BF_EMPTY flag's meaning
The BF_EMPTY flag was once used to indicate an empty buffer. However,
it was used half the time as meaning the buffer is empty for the reader,
and half the time as meaning there is nothing left to send.

"nothing to send" is only indicated by "->send_max=0 && !pipe". Once
we fix this, we discover that the flag is not used anymore. So the
flags has been renamed BF_OUT_EMPTY and means exactly the condition
above, ie, there is nothing to send.

Doing so has allowed us to remove some unused tests for emptiness,
but also to uncover a certain amount of situations where the flag
was not correctly set or tested.
2009-09-20 08:17:45 +02:00
Willy Tarreau
520d95e42b [MAJOR] buffers: split BF_WRITE_ENA into BF_AUTO_CONNECT and BF_AUTO_CLOSE
The BF_WRITE_ENA buffer flag became very complex to deal with, because
it was used to :
  - enable automatic connection
  - enable close forwarding
  - enable data forwarding

The last point was not very true anymore since we introduced ->send_max,
but still the test remained everywhere. This was causing issues such as
impossibility to connect without forwarding data, impossibility to prevent
closing when data was forwarded, etc...

This patch clarifies the situation by getting rid of this multi-purpose
flag and replacing it with :
  - data forwarding based only on ->send_max || ->pipe ;
  - a new BF_AUTO_CONNECT flag to allow automatic connection and only
    that ;
  - ability to perform an automatic connection when ->send_max or ->pipe
    indicate that data is waiting to leave the buffer ;
  - a new BF_AUTO_CLOSE flag to let the producer automatically set the
    BF_SHUTW_NOW flag when it gets a BF_SHUTR.

During this cleanup, it was discovered that some tests were performed
twice, or that the BF_HIJACK flag was still tested, which is not needed
anymore since ->send_max replcaed it. These places have been fixed too.

These cleanups have also revealed a few areas where the other flags
such as BF_EMPTY are not cleanly used. This will be an opportunity for
a second patch.
2009-09-19 21:14:54 +02:00
Willy Tarreau
418fd4722a [MAJOR] buffers: fix misuse of the BF_SHUTW_NOW flag
This flag was incorrectly used as meaning "close immediately",
while it needs to say "close ASAP". ASAP here means when unsent
data pending in the buffer are sent. This helps cleaning up some
dirty tricks where the buffer output was checking the BF_SHUTR
flag combined with EMPTY and other such things. Now we have a
clearly defined semantics :

  - producer sets SHUTR and *may* set SHUTW_NOW if WRITE_ENA is
    set, otherwise leave it to the session processor to set it.
  - consumer only checks SHUTW_NOW to decide whether or not to
    call shutw().

This also induced very minor changes at some locations which were
not protected against buffer changes while the SHUTW_NOW flag was
set. Now we prevent send_max from changing when the flag is set.

Several tests have been run without any unexpected behaviour detected.

Some more cleanups are needed, as it clearly appears that some tests
could be removed with stricter semantics.
2009-09-19 14:53:46 +02:00
Dmitry Sivachenko
caf58986fb [BUILD] compilation of haproxy-1.4-dev2 on FreeBSD
Please consider the following patches. They are required to
compile haproxy-1.4-dev2 on FreeBSD.

Summary:
1) include <sys/types.h> before <netinet/tcp.h>
2) Use IPPROTO_TCP instead of SOL_TCP
(they are both defined as 6, TCP protocol number)
2009-08-30 14:45:19 +02:00
Willy Tarreau
6db06d3870 [MEDIUM] remove TCP_CORK and make use of MSG_MORE instead
send() supports the MSG_MORE flag on Linux, which does the same
as TCP_CORK except that we don't have to remove TCP_NODELAY before
and we don't need any syscall to set/remove it. This can save up
to 4 syscalls around a send() (two for setting it, two for removing
it), and it's much cleaner since it is not persistent. So make use
of it instead.
2009-08-19 11:29:44 +02:00
Willy Tarreau
d6d06909da [CLEANUP] remove ifdef MSG_NOSIGNAL and define it instead
ifdefs are really annoying in the code. Define MSG_NOSIGNAL to zero
when undefined and remove associated ifdefs.
2009-08-19 11:25:08 +02:00
Willy Tarreau
a07a34eb24 [MEDIUM] replace BUFSIZE with buf->size in computations
The first step towards dynamic buffer size consists in removing
all static definitions of the buffer size. Instead, we store a
buffer's size in itself. Right now they're all preinitialized
to BUFSIZE, but we will change that.
2009-08-16 23:27:46 +02:00
Willy Tarreau
c9fce2fee8 [BUILD] fix build for systems without SOL_TCP
Andrew Azarov reported that haproxy-1.4-dev1 does not build
under FreeBSD 7.2 because SOL_TCP is not defined. So add a
check for its definition before using it. This only impacts
network optimisations anyway.
2009-08-16 14:13:47 +02:00
Willy Tarreau
c54aef3180 [BUG] fix random pauses on last segment of a series
During a direct data transfer from the server to the client, if the
system did not have enough buffers anymore, haproxy would not enable
write polling again if it could write at least one data chunk. Under
normal conditions, this would remain undetected because the remaining
data would be pushed by next data chunks.

However, when this happens on the last chunk of a session, or the last
in a series in an interactive bidirectional TCP transfer, haproxy would
only start sending again when the read timeout was reached on the side
it stopped writing, causing long pauses on some protocols such as SQL.

This bug was reported by an Exceliance customer who generously offered
to help us by sending large amounts of traces and running various tests
on production systems.

It is quite hard to trigger it but it becomes easier with a ping-pong
TCP service which transfers random data sizes, with a modified version
of send() able to send packets smaller than the average transfer size.

A cleaner fix would imply only updating the write timeout when data
transfers are *attempted*, not succeeded, but that requires more
sensible code changes without fixing the result. It is a candidate
for a later patch though.
2009-07-27 20:08:06 +02:00
Willy Tarreau
7154365cc6 [BUG] stream_sock: don't stop reading when the poller reports an error
As reported by Jean-Baptiste Quenot and Robbie Aelter, sometimes a
backend server error is converted to a 502 error if the backend stops
before reading all the request. The reason is that the remote system
sends a TCP RST packet because there are still unread data pending in
the socket buffer. This RST is translated as a socket error on the
local system, and this error is reported by the poller.

However, most of the time, it's a write error, but the system is
still able to read the remaining pending data, such as in the trace
below :

send(7, "GET /aaa HTTP/1.0\r\nUser-Agent: Mo"..., 1123, MSG_DONTWAIT|MSG_NOSIGNAL) = 1123
epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}}, 8, 1000) = 1
gettimeofday({1247593958, 643572}, NULL) = 0
recv(7, "HTTP/1.0 400 Bad request\r\nCache-C"..., 7000, MSG_NOSIGNAL) = 187
setsockopt(6, SOL_TCP, TCP_NODELAY, [0], 4) = 0
setsockopt(6, SOL_TCP, TCP_CORK, [1], 4) = 0
send(6, "HTTP/1.0 400 Bad request\r\nCache-C"..., 187, MSG_DONTWAIT|MSG_NOSIGNAL) = 187
shutdown(6, 1 /* send */)               = 0

The recv succeeded while epoll_wait() reported an error.

Note: This case is very hard to reproduce and requires that the backend
server is reached via the loopback in order to minimise latency and
reduce the risk of sent data being ACKed.
2009-07-14 19:55:05 +02:00
Willy Tarreau
720058cdcb [BUG] stream_sock: always shutdown(SHUT_WR) before closing
When we close a socket with unread data in the buffer, or when the
nolinger option is set, we regularly lose the last fragment, which
often contains the error message. This typically occurs when sending
too large a request. Only the RST is seen due to the close() (since
not all data were read) and the output message never reaches the
network.

Doing a shutdown() before the close() solves this annoying issue
because the data are really pushed before the system sends the RST.
2009-07-14 19:21:50 +02:00
Willy Tarreau
dc340a900d [MEDIUM] splice: set the capability on each stream_interface
The splice code did not consider compatibility between both ends
of the connection. Now we set different capabilities on each
stream interface, depending on what the protocol can splice to/from.
Right now, only TCP is supported. Thanks to this, we're now able to
automatically detect when splice() is not implemented and automatically
disable it on one end instead of reporting errors to the upper layer.
2009-06-28 23:10:19 +02:00
Willy Tarreau
5d707e1aaa [MEDIUM] stream_sock: don't close prematurely when nolinger is set
When the nolinger option is used, we must not close too fast because
some data might be left unsent. Instead we must proceed with a normal
shutdown first, then a close. Also, we want to avoid merging FIN with
the last segment if nolinger is set, because if that one gets lost,
there is no chance for it to be retransmitted.
2009-06-28 11:09:07 +02:00
Willy Tarreau
fb14edc215 [MEDIUM] stream_sock: implement tcp-cork for use during shutdowns on Linux
Setting TCP_CORK on a socket before sending the last segment enables
automatic merging of this segment with the FIN from the shutdown()
call. Playing with TCP_CORK is not easy though as we have to track
the status of the TCP_NODELAY flag since both are mutually exclusive.
Doing so saves one more packet per session and offers about 5% more
performance.

There is no reason not to do it, so there is no associated option.
2009-06-14 15:24:37 +02:00
Willy Tarreau
d06e71179a [BUG] stream_sock: check for shut{r,w} before refreshing some timeouts
Under some circumstances, it appears possible to refresh a timeout
just after a side has been shut. For instance, if poll() plans to
call both read and write, and the read side calls chk_snd() which
in turn causes a shutw to occur, then stream_sock_write could update
its write timeout. The same problem happens the other way.

The timeout checks will then not catch these cases because they
ignore timeouts in case of shut{r,w}.

This is very likely to be the major cause of the 100% CPU usages
reported by Bart Bobrowski.

The fix consists in always ensuring that a side is not shut before
updating its timeout.
2009-03-29 10:18:41 +02:00
Willy Tarreau
1714e0ffda [BUG] stream_sock: disable I/O on fds reporting an error
Upon read or write error, we cannot immediately close the FD because
we want to first report the error to the upper layer which will do it
itself. However, we want to prevent any further I/O from being performed
on the FD. This is especially important in case of speculative I/O where
nothing else could stop the FD from still being polled until the upper
layer takes care of the condition.
2009-03-28 23:42:30 +01:00
Willy Tarreau
127334e89b [BUG] reset the stream_interface connect timeout upon connect or error
The stream_interface timeout was not reset upon a connect success or
error, leading to busy loops when requeuing tasks in the past.

Thanks to Bart Bobrowski for reporting the issue.
2009-03-28 11:01:20 +01:00
Willy Tarreau
1b194fe03e [OPTIM] buffer: new BF_READ_DONTWAIT flag reduces EAGAIN rates
When the reader does not expect to read lots of data, it can
set BF_READ_DONTWAIT on the request buffer. When it is set,
the stream_sock_read callback will not try to perform multiple
reads, it will return after only one, and clear the flag.
That way, we can immediately return when waiting for an HTTP
request without trying to read again.

On pure request/responses schemes such as monitor-uri or
redirects, this has completely eliminated the EAGAIN occurrences
and the epoll_ctl() calls, resulting in a performance increase of
about 10%. Similar effects should be observed once we support
HTTP keep-alive since we'll immediately disable reads once we
get a full request.
2009-03-21 21:57:30 +01:00
Willy Tarreau
6f4a82c7af [OPTIM] stream_sock: don't retry to read after a large read
If we get very large data at once, it's almost certain that it's
worthless trying to read again, because we got everything we could
get.

Doing this has made all -EAGAIN disappear from splice reads. The
threshold has been put in the global tunable structures so that if
we one day want to make it accessible from user config, it will be
easy to do so.
2009-03-21 20:43:57 +01:00
Willy Tarreau
c9619468ea [BUG] stream_sock: write timeout must be updated when forwarding !
When data are forwarded between socket, we must update the output
socket's write timeout. This was forgotten, causing sessions to
unexpectedly expire during long posts.
2009-03-09 22:40:57 +01:00
Willy Tarreau
87bed62a92 [BUILD] build fixes for Solaris
One build error in stream_sock.c when MSG_NOSIGNAL is not defined,
and a warning in task.c.
2009-03-08 22:25:28 +01:00
Vincenzo Farruggia
9b97cff1c2 [BUILD] Haproxy won't compile if DEBUG_FULL is defined
As subject when i try to compile haproxy with -DDEBUG_FULL it stop at
stream_sock.c file with:
gcc -Iinclude -Wall -O2 -g     -DDEBUG_FULL  -DTPROXY -DENABLE_POLL
-DENABLE_EPOLL -DENABLE_SEPOLL -DNETFILTER -DUSE_GETSOCKNAME
-DCONFIG_HAPROXY_VERSION=\"1.3.15\"
-DCONFIG_HAPROXY_DATE=\"2008/04/19\" -c -o src/stream_sock.o
src/stream_sock.c
src/stream_sock.c: In function 'stream_sock_chk_rcv':
src/stream_sock.c:905: error: 'fd' undeclared (first use in this function)
src/stream_sock.c:905: error: (Each undeclared identifier is reported only once
src/stream_sock.c:905: error: for each function it appears in.)
src/stream_sock.c:905: error: 'ob' undeclared (first use in this function)
src/stream_sock.c: In function 'stream_sock_chk_snd':
src/stream_sock.c:940: error: 'fd' undeclared (first use in this function)
src/stream_sock.c:940: error: 'ib' undeclared (first use in this function)
make: *** [src/stream_sock.o] Error 1

With this patch all build fine:
2009-02-04 22:46:19 +01:00
Willy Tarreau
3eba98aa57 [MEDIUM] splice: make use of pipe pools
Using pipe pools makes pipe management a lot easier. It also allows to
remove quite a bunch of #ifdefs in areas which depended on the presence
or not of support for kernel splicing.

The buffer now holds a pointer to a pipe structure which is always NULL
except if there are still data in the pipe. When it needs to use that
pipe, it dynamically allocates it from the pipe pool. When the data is
consumed, the pipe is immediately released.

That way, there is no need anymore to care about pipe closure upon
session termination, nor about pipe creation when trying to use
splice().

Another immediate advantage of this method is that it considerably
reduces the number of pipes needed to use splice(). Tests have shown
that even with 0.2 pipe per connection, almost all sessions can use
splice(), because the same pipe may be used by several consecutive
calls to splice().
2009-01-25 13:56:13 +01:00