Commit Graph

429 Commits

Author SHA1 Message Date
Willy Tarreau
908d26fd03 MINOR: stream-int: don't needlessly call si_cs_send() in si_cs_process()
There's a call there to si_cs_send() while we're supposed to come from
si_cs_io_cb() which has just done it. But in fact we can also come here
as a lower layer callback from ->wake() after a connection is established.
Since most of the time we'll end up here with either no data in the buffer
or a blocked output, let's simply check if we're already susbcribed to send
events before calling si_cs_send().
2018-10-28 13:50:02 +01:00
Willy Tarreau
0dfccb20f5 MINOR: stream-int: make stream_int_notify() not wake the tasklet up
stream_int_notify() is I/O agnostic and should not wake up the tasklet,
it's up to si_cs_process() to do that, just like si_applet_wake_cb()
does it for the applet.
2018-10-28 13:50:01 +01:00
Willy Tarreau
33a09a5f2a MINOR: stream-int: don't needlessly call tasklet_wakeup() in stream_int_chk_snd_conn()
This one was added by commit 53216e7db ("MEDIUM: connections: Don't
directly mess with the polling from the upper layers.") after the
removal of the conditional cs_want_send() call. But after analysis
it turned out that it's not needed since the si_cs_send() call will
either succeed or subscribe.
2018-10-28 13:50:01 +01:00
Willy Tarreau
eafd8ebcfe MEDIUM: stream-int: call si_cs_process() in stream_int_update_conn
Calling si_cs_send() alone is always dangerous because it can result
in the loss of an event if it manages to empty the buffer. Indeed, in
this case it's critical to call si_chk_rcv() on the opposite stream-int.
Given that si_cs_process() takes care of all this, let's call it instead.
All this code could possibly be refined soon to avoid redoing the whole
stream_int_notify() and do it only after a send(), but at the moment it's
not important.
2018-10-28 13:48:06 +01:00
Willy Tarreau
85f890174a MEDIUM: stream-int: make si_update() synchronize flag changes before the I/O
With the new synchronous si_cs_send() at the end of process_stream(),
we're seeing re-appear the I/O layer specific part of the stream interface
which is supposed to deal with I/O event subscription. The only difference
is that now we subscribe to I/Os only after having attempted (and failed)
them.

This patch brings a cleanup in this by reintroducing stream_int_update_conn()
with the send code from process_stream(). However this alone would not be
enough because the flags which are cleared afterwards would result in the
loss of the possible events (write events only at the moment). So the flags
clearing and stream-int state updates are also performed inside si_update()
between the generic code and the I/O specific code. This definitely makes
sense as after this call we can simply check again for channel and SI flag
changes and decide to loop once again or not.
2018-10-28 13:47:00 +01:00
Willy Tarreau
581abd3f99 MEDIUM: stream-int: replace channel_alloc_buffer() with si_alloc_ibuf() everywhere
Well that's only 3 places (applet.c, stream_interface.c, hlua.c). This
ensures we always clear SI_FL_WAIT_ROOM before setting it on failure,
so that it is granted that SI_FL_WAIT_ROOM always indicates a lack of
room for doing an operation, including the inability to allocate a
buffer for this.
2018-10-28 13:47:00 +01:00
Willy Tarreau
ede3d884fc MEDIUM: channel: merge back flags CF_WRITE_PARTIAL and CF_WRITE_EVENT
The behaviour of the flag CF_WRITE_PARTIAL was modified by commit
95fad5ba4 ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") due
to a situation where it could trigger an immediate wake up of the other
side, both acting in loops via the FD cache. This loss has caused the
need to introduce CF_WRITE_EVENT as commit c5a9d5bf, to replace it, but
both flags express more or less the same thing and this distinction
creates a lot of confusion and complexity in the code.

Since the FD cache now acts via tasklets, the issue worked around in the
first patch no longer exists, so it's more than time to kill this hack
and to restore CF_WRITE_PARTIAL's semantics (i.e.: there has been some
write activity since we last left process_stream).

This patch mostly reverts the two commits above. Only the part making
use of CF_WROTE_DATA instead of CF_WRITE_PARTIAL to detect the loss of
data upon connection setup was kept because it's more accurate and
better suited.
2018-10-26 08:32:57 +02:00
Christopher Faulet
955188d37d BUG/MEDIUM: stream-int: don't set SI_FL_WAIT_ROOM on CF_READ_DONTWAIT
With the previous connection model, when we purposely decided to stop
receiving in order to avoid polling after a complete request was received
for example, it was needed to set SI_FL_WAIT_ROOM to prevent receive
polling from being re-armed. Now with the new subscription-based model
there is no such thing anymore and there is noone to remove this flag
either. Thus if a request takes more than one packet to come in or
spans over too many packets, this flag will cause it to wait forever.
Let's simply remove this flag now.

This patch should not be backported since older versions still need
that this flag is set here to stop receiving.
2018-10-23 10:22:36 +02:00
Olivier Houchard
31f04e4416 MINOR: stream_interface: Avoid calling si_cs_send/recv if not needed.
Don't bother calling si_cs_send and si_cs_recv if we're either already
subscribe, or if the output buffer is empty for si_cs_send.
2018-10-22 16:05:08 +02:00
Olivier Houchard
53216e7db9 MEDIUM: connections: Don't directly mess with the polling from the upper layers.
Avoid using conn_xprt_want_send/recv, and totally nuke cs_want_send/recv,
from the upper layers. The polling is now directly handled by the connection
layer, it is activated on subscribe(), and unactivated once we got the event
and we woke the related task.
2018-10-21 05:58:40 +02:00
Olivier Houchard
fa8aa867b9 MEDIUM: connections: Change struct wait_list to wait_event.
When subscribing, we don't need to provide a list element, only the h2 mux
needs it. So instead, Add a list element to struct h2s, and use it when a
list is needed.
This forces us to use the unsubscribe method, since we can't just unsubscribe
by using LIST_DEL anymore.
This patch is larger than it should be because it includes some renaming.
2018-10-11 15:34:39 +02:00
Olivier Houchard
21df6cc2f9 MINOR: h2/stream_interface: Reintroduce te wake() method.
For the time being, reintroduce the wake methods, it may be revisited later.h
2018-09-26 14:21:54 +02:00
Olivier Houchard
0e367bbb01 BUG/MEDIUM: process_stream: Don't use si_cs_io_cb() in process_stream().
Instead of using si_cs_io_cb() in process_stream()  use si_cs_send/si_cs_recv
instead, as si_cs_io_cb() may lead to process_stream being woken up when it
shouldn't be, and thus timeout would never get triggered.
2018-09-26 14:21:54 +02:00
Bertrand Jacquin
874a35cb55 DOC: Fix typos in lua documentation 2018-09-14 09:31:34 +02:00
Olivier Houchard
71384551fe MINOR: conn_streams: Remove wait_list from conn_streams.
The conn_streams won't be used for subscribing/waiting for I/O events, after
all, so just remove its wait_list, and send/recv/_wait_list.
2018-09-12 17:37:55 +02:00
Olivier Houchard
c2aa71108a MEDIUM: stream_interfaces: Starts receiving from the upper layers.
Instead of waiting for the connection layer to let us know we can read,
attempt to receive as soon as process_stream() is called, and subscribe
to receive events if we can't receive yet.

Now, except for idle connections, the recv(), send() and wake() methods are
no more, all the lower layers do is waking tasklet for anybody waiting
for I/O events.
2018-09-12 17:37:55 +02:00
Olivier Houchard
f653528dc1 MEDIUM: stream_interface: Make recv() subscribe when more data is needed.
Refactor the code so that si_cs_recv() subscribes to receive events.
2018-09-12 17:37:55 +02:00
Olivier Houchard
af4021e680 MEDIUM: connections: Get rid of the recv() method.
Remove the recv() method from mux and conn_stream.
The goal is to always receive from the upper layers, instead of waiting
for the connection later. For now, recv() is still called from the wake()
method, but that should change soon.
2018-09-12 17:37:55 +02:00
Olivier Houchard
4cf7fb148f MEDIUM: connections/mux: Add a recv and a send+recv wait list.
For struct connection, struct conn_stream, and for the h2 mux, add 2 new
lists, one that handles waiters for recv, and one that handles waiters for
recv and send. That way we can ask to subscribe for either recv or send.
2018-09-12 17:37:55 +02:00
Olivier Houchard
c7ffa91763 BUG/MEDIUM: stream_interface: try to call si_cs_send() earlier.
Call si_cs_send() at the beginning of si_cs_wake_cb(), instead of from
stream_int_notify-), so that if we get a connection error while trying to
send, the stream_interface will get SI_FL_ERR, the associated task will
be woken up, and the connection will be properly destroyed.

No backport needed.
2018-08-28 19:46:45 +02:00
Olivier Houchard
80c56790d9 BUG/MEDIUM: stream_interface: Call the wake callback after sending.
If we subscribed to send, and the callback is called, call the wake callback
after, so that process_stream() may be woken up if needed.

This is 1.9-specific, no backport is needed.
2018-08-21 18:06:57 +02:00
Olivier Houchard
a6ff035770 BUG/MEDIUM: stream-int: Check if the conn_stream exist in si_cs_io_cb.
It is possible that the conn_stream gets detached from the stream_interface,
and as it subscribed to the wait list, si_cs_io_cb() gets called anyway,
so make sure we have a conn_stream before attempting to send more data.

This is 1.9-specific, no backport is needed.
2018-08-21 18:06:54 +02:00
Olivier Houchard
8f0b4c66f5 MINOR: stream_interface: Give stream_interface its own wait_list.
Instead of just using the conn_stream wait_list, give the stream_interface
its own. When the conn_stream will have its own buffers, the stream_interface
may have to wait on it.
2018-08-16 17:29:54 +02:00
Olivier Houchard
91894cbf4c MINOR: stream_interface: Don't use si_cs_send() as a task handler.
Instead of using si_cs_send() as a task handler, define a new function,
si_cs_io_cb(), and give si_cs_send() its original prototype. Right now
si_cs_io_cb() just handles send, but later it'll handle recv() too.
2018-08-16 17:29:54 +02:00
Olivier Houchard
e1c6dbcd70 MINOR: connections/mux: Add the wait reason(s) to wait_list.
Add a new element to the wait_list, that let us know which event(s) we are
waiting on.
2018-08-16 17:29:53 +02:00
Olivier Houchard
ed0f207ef5 MINOR: connections: Get rid of txbuf.
Remove txbuf from conn_stream. It is not used yet, and its only user will
probably be the mux_h2, so it will be better suited in the struct h2s.
2018-08-16 17:29:51 +02:00
Olivier Houchard
511efeae7e MINOR: connections: Make rcv_buf mandatory and nuke cs_recv().
Reintroduce h2_rcv_buf(), right now it just does what cs_recv() did, but
should be modified later.
2018-08-16 17:23:44 +02:00
Christopher Faulet
7c42eacbe9 BUG/MEDIUM: stream_int: Don't check CO_FL_SOCK_RD_SH flag to trigger cs receive
It is mandatory to be sure to process data blocked in the RX buffer of the
conn_stream while the shutr/read0 was already processed. The stream interface
doesn't need to rely on this flags because it already tests CS_FL_EOS.
2018-08-08 10:41:11 +02:00
Christopher Faulet
063f786553 MINOR: conn_stream: add cs_send() as a default snd_buf() function
This function is generic and is able to automatically transfer data from a
buffer to the conn_stream's tx buffer. It does this automatically if the mux
doesn't define another snd_buf() function.

It cannot yet be used as-is with the conn_stream's txbuf without risking to
lose data on close since conn_streams need to be orphaned for this.
2018-08-08 09:53:58 +02:00
Willy Tarreau
171d5f203a BUG/MEDIUM: stream-int: don't immediately enable reading when the buffer was reportedly full
There is a long-time issue which affects some applets, at least the stats
applet. If a large stats page is read over a slow link, regularly the
channel's buffer contains too many response data to allow another round
of ci_putblk() to copy a new message. In this case the applet calls
si_applet_cant_put() to mention that it failed to emit data into the
channel's buffer, and wants to be called only once some room is made.

The problem is that stream_int_update(), which is called from
process_stream(), will clear this flag whenever it sees there's some
spare room in the channel's buffer. It causes the applet to be woken
again immediately. This is very visible when reading a large stats
page over a slow link, because in this case haproxy will run at 100%
CPU and strace shows mostly epoll_wait(0). It is very likely that some
other applets like CLI, Lua, peers or SPOE have also been affected but
that the effect were less noticeable because it was mixed with traffic.

Ideally stream_int_update() should not touch these flags, but changing
this would require a very careful auditing of all users. Instead here
what we do is that we respect the flag if the channel still has output
data. This way the flag will automatically disappear once the buffer is
empty, and the applet function will be called only when input data
remains, if at all.

This patch alone is not enough to observe the behaviour change on the
stats page because another bug takes over, addressed by next patch
(BUG/MEDIUM: stats: don't ask for more data as long as we're responding).
When both are applied, dumping stats for 5k backends over a 10 Mbps link
take 1% CPU instead of 100%, with 1.5k epoll_wait() calls instead of 80k.

This fix should be backported at least as far as 1.5.
2018-07-24 17:12:38 +02:00
Willy Tarreau
67b1e78f68 MEDIUM: stream-int: automatically call si_cs_recv_cb() if the cs has data on wake()
If the cs has data pending or shutdown and the input channel is still
waiting for reads, let's simply call the recv() function from the wake()
callback. This will allow the lower layers to simply wake the upper one
up without having to consider the recv() nor anything else.
2018-07-20 19:21:43 +02:00
Willy Tarreau
11c9aa424e MEDIUM: conn_stream: add cs_recv() as a default rcv_buf() function
This function is generic and is able to automatically transfer data
from a conn_stream's rx buffer to the destination buffer. It does this
automatically if the mux doesn't define another rcv_buf() function.
2018-07-20 19:21:43 +02:00
Willy Tarreau
8318885487 MINOR: connection: simplify subscription by adding a registration function
This new function wl_set_waitcb() prepopulates a wait_list with a tasklet
and a context and returns it so that it can be passed to ->subscribe() to
be added to a connection or conn_stream's wait_list. The caller doesn't
need to know all the insiders details anymore this way.
2018-07-19 18:31:07 +02:00
Olivier Houchard
910b2bc829 MEDIUM: connections/mux: Revamp the send direction.
Totally nuke the "send" method, instead, the upper layer decides when it's
time to send data, and if it's not possible, uses the new subscribe() method
to be called when it can send data again.
2018-07-19 18:31:07 +02:00
Willy Tarreau
83061a820e MAJOR: chunks: replace struct chunk with struct buffer
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
2018-07-19 16:23:43 +02:00
Willy Tarreau
843b7cbe9d MEDIUM: chunks: make the chunk struct's fields match the buffer struct
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.

Most of the changes were made with spatch using this coccinelle script :

  @rule_d1@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.str
  + chunk.area

  @rule_d2@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.len
  + chunk.data

  @rule_i1@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->str
  + chunk->area

  @rule_i2@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->len
  + chunk->data

Some minor updates to 3 http functions had to be performed to take size_t
ints instead of ints in order to match the unsigned length here.
2018-07-19 16:23:43 +02:00
Willy Tarreau
c9fa0480af MAJOR: buffer: finalize buffer detachment
Now the buffers only contain the header and a pointer to the storage
area which can be anywhere. This will significantly simplify buffer
swapping and will make it possible to map chunks on buffers as well.

The buf_empty variable was removed, as now it's enough to have size==0
and area==NULL to designate the empty buffer (thus a non-allocated head
is the empty buffer by default). buf_wanted for now is indicated by
size==0 and area==(void *)1.

The channels and the checks now embed the buffer's head, and the only
pointer is to the storage area. This slightly increases the unallocated
buffer size (3 extra ints for the empty buffer) but considerably
simplifies dynamic buffer management. It will also later permit to
detach unused checks.

The way the struct buffer is arranged has proven quite efficient on a
number of tests, which makes sense given that size is always accessed
and often first, followed by the othe ones.
2018-07-19 16:23:43 +02:00
Willy Tarreau
77e478c56e MINOR: stream-int: use the new buffer API
A few locations still accessing ->i and ->o directly were changed to
use ci_data() and co_data() respectively. A call to b_del() was replaced
with co_set_data() in si_cs_send() so that ->o will is automatically be
decremented after the migration.
2018-07-19 16:23:42 +02:00
Willy Tarreau
d760eecf61 MINOR: buffer: replace buffer_not_empty() with b_data() or c_data()
It's mostly for consistency as many places already use one of these instead.
2018-07-19 16:23:41 +02:00
Willy Tarreau
337ea57cfc MINOR: connection: add a new receive flag : CO_RFL_BUF_WET
With this flag we introduce the notion of "dry" vs "wet" buffers : some
demultiplexers like the H2 mux require as much room as possible for some
operations that are not retryable like decoding a headers frame. For this
they need to know if the buffer is congested with data scheduled for
leaving soon or not. Since the new API will not provide this information
in the buffer itself, the caller must indicate it. We never need to know
the amount of such data, just the fact that the buffer is not in its
optimal condition to be used for receipt. This "CO_RFL_BUF_WET" flag is
used to mention that such outgoing data are still pending in the buffer
and that a sensitive receiver should better let it "dry" before using it.
2018-07-19 16:23:41 +02:00
Willy Tarreau
7f3225f251 MINOR: connection: add a flags argument to rcv_buf()
The mux and transport rcv_buf() now takes a "flags" argument, just like
the snd_buf() one or like the equivalent syscall lower part. The upper
layers will use this to pass some information such as indicating whether
the buffer is free from outgoing data or if the lower layer may allocate
the buffer itself.
2018-07-19 16:23:41 +02:00
Willy Tarreau
deccd1116d MEDIUM: mux: make mux->snd_buf() take the byte count in argument
This way the mux doesn't need to modify the buffer's metadata anymore
nor to know the output's size. The mux->snd_buf() function now takes a
const buffer and it's up to the caller to update the buffer's state.

The return type was updated to return a size_t to comply with the count
argument.
2018-07-19 16:23:41 +02:00
Willy Tarreau
bcbd39370f MINOR: channel/buffer: replace b_{adv,rew} with c_{adv,rew}
These ones manipulate the output data count which will be specific to
the channel soon, so prepare the call points to use the channel only.
The b_* functions are now unused and were removed.
2018-07-19 16:23:40 +02:00
Olivier Houchard
6fa63d9852 MINOR: early data: Don't rely on CO_FL_EARLY_DATA to wake up streams.
Instead of looking for CO_FL_EARLY_DATA to know if we have to try to wake
up a stream, because it is waiting for a SSL handshake, instead add a new
conn_stream flag, CS_FL_WAIT_FOR_HS. This way we don't have to rely on
CO_FL_EARLY_DATA, and we will only wake streams that are actually waiting.
2018-02-05 14:24:50 +01:00
Willy Tarreau
b78b80efe5 BUG/MINOR: stream-int: don't try to receive again after receiving an EOS
When an end of stream has been reported, we should not try to receive again
as the mux layer might not be prepared to this and could report unexpected
errors.

This is more of a strengthening measure that follows the introduction of
conn_stream that came in 1.8. It's desired to backport this into 1.8 though
it's uncertain at this time whether it may have caused real issues.
2017-12-14 13:43:52 +01:00
Willy Tarreau
6577b48613 BUG/MEDIUM: stream-int: always set SI_FL_WAIT_ROOM on CS_FL_RCV_MORE
When a stream interface tries to read data from a mux using rcv_buf(),
sometimes it sees 0 as the return value and concludes that there's no
more data while there are, resulting in the connection being polled for
more data and no new attempt being made at reading these pending data.

Now it will automatically check for flag CS_FL_RCV_MORE to know if the
mux really did not have anything available or was not able to provide
these data by lack of room in the destination buffer, and will set
SI_FL_WAIT_ROOM accordingly. This will ensure that once current data
lying in the buffer are forwarded to the other side, reading chk_rcv()
will be called to re-enable reading.

It's important to note that in practice it will rely on the mux's
update_poll() function to re-enable reading and that where the calls
are placed in the stream interface, it's not possible to perform a
new synchronous rcv_buf() call. Thus a corner case remains where the
mux cannot receive due to a full buffer or any similar condition, but
needs to be able to wake itself up to deliver pending data. This is a
limitation of the current connection/conn_stream API which will likely
need a new event subscription to at least call ->wake() asynchronously
(eg: mux->{kick,restart,touch,update} ?).

For now the affected mux (h2 only) will have to take care of the extra
logic to carefully enable polling to restart processing incoming data.

This patch relies on previous one (MINOR: conn_stream: add new flag
CS_FL_RCV_MORE to indicate pending data) and both must be backported to
1.8.
2017-12-10 21:19:33 +01:00
Olivier Houchard
7fc96d5a01 MINOR: mux: Make sure every string is woken up after the handshake.
In case any stream was waiting for the handshake after receiving early data,
we have to wake all of them. Do so by making the mux responsible for
removing the CO_FL_EARLY_DATA flag after all of them are woken up, instead
of doing it in si_cs_wake_cb(), which would then only work for the first one.
This makes wait_for_handshake work with HTTP/2.
2017-11-23 19:35:42 +01:00
Willy Tarreau
62dd698070 BUG/MINOR: stream-int: don't try to read again when CF_READ_DONTWAIT is set
Commit 9aaf778 ("MAJOR: connection : Split struct connection into struct
connection and struct conn_stream.") had to change the way the stream
interface deals with incoming data to accomodate the mux. A break
statement got lost during a change, leading to the receive call being
performed twice even when CF_READ_DONTWAIT is set. The most noticeable
effect is that it made the bug described in commit 33982cb ("BUG/MAJOR:
stream: ensure analysers are always called upon close") much easier to
reproduce as it would appear even with an HTTP frontend.

Let's just restore the stream-interface flag and the break here, as in
the previous code.

No backport is needed as this was introduced during 1.8-dev.
2017-11-20 16:13:16 +01:00
Olivier Houchard
e9bed53486 MINOR: ssl: Make sure we don't shutw the connection before the handshake.
Instead of trying to finish the handshake in ssl_sock_shutw, which may
fail, try not to shutdown until the handshake is finished.
2017-11-16 19:04:10 +01:00
Christopher Faulet
c5a9d5bf23 BUG/MEDIUM: stream-int: Don't loss write's notifs when a stream is woken up
When a write activity is reported on a channel, it is important to keep this
information for the stream because it take part on the analyzers' triggering.
When some data are written, the flag CF_WRITE_PARTIAL is set. It participates to
the task's timeout updates and to the stream's waking. It is also used in
CF_MASK_ANALYSER mask to trigger channels anaylzers. In the past, it was cleared
by process_stream. Because of a bug (fixed in commit 95fad5ba4 ["BUG/MAJOR:
stream-int: don't re-arm recv if send fails"]), It is now cleared before each
send and in stream_int_notify. So it is possible to loss this information when
process_stream is called, preventing analyzers to be called, and possibly
leading to a stalled stream.

Today, this happens in HTTP2 when you call the stat page or when you use the
cache filter. In fact, this happens when the response is sent by an applet. In
HTTP1, everything seems to work as expected.

To fix the problem, we need to make the difference between the write activity
reported to lower layers and the one reported to the stream. So the flag
CF_WRITE_EVENT has been added to notify the stream of the write activity on a
channel. It is set when a send succedded and reset by process_stream. It is also
used in CF_MASK_ANALYSER. finally, it is checked in stream_int_notify to wake up
a stream and in channel_check_timeouts.

This bug is probably present in 1.7 but it seems to have no effect. So for now,
no needs to backport it.
2017-11-09 15:16:05 +01:00
Willy Tarreau
ecd2e15919 BUG/MINOR: stream-int: don't set MSG_MORE on closed request path
Commit 4ac4928 ("BUG/MINOR: stream-int: don't set MSG_MORE on SHUTW_NOW
without AUTO_CLOSE") was incomplete. H2 reveals another situation where
the input stream is marked closed with the request and we set MSG_MORE,
causing a delay before the request leaves.

Better avoid setting the flag on the request path for close cases in
general.
2017-11-07 15:07:25 +01:00
Willy Tarreau
a553ae96f5 MEDIUM: connection: replace conn_full_close() with cs_close()
At all call places where a conn_stream is in use, we can now use
cs_close() to get rid of a conn_stream and of its underlying connection
if the mux estimates it makes sense. This is what is currently being
done for the pass-through mux.
2017-10-31 18:03:24 +01:00
Willy Tarreau
9fbbff6de4 MEDIUM: connection: make conn_sock_shutw() aware of lingering
Instead of having to manually handle lingering outside, let's make
conn_sock_shutw() check for it before calling shutdown(). We simply
don't want to emit the FIN if we're going to reset the connection
due to lingering. It's particularly important for silent-drop where
it's absolutely mandatory that no packet leaves the machine.
2017-10-31 18:03:24 +01:00
Willy Tarreau
ecdb3fe9f4 MINOR: conn_stream: modify cs_shut{r,w} API to pass the desired mode
Now we can specify how we want to shutdown (drain vs reset, and normal
vs silent), and this propagates to the mux then the transport layer.
2017-10-31 18:03:23 +01:00
Willy Tarreau
4ff3b89643 MINOR: connection: make conn_stream users also check for per-stream error flag
In a 1:1 connection:stream there's no problem relying on the connection
flags alone to check for errors. But in a mux, it will be possible to mark
certain streams in error without having to mark all of them. An example is
an H2 client sending RST_STREAM frames to abort a long download, or a parse
error requiring to abort only this specific stream.

This commit ensures that stream-interface and checks properly check for
CS_FL_ERROR in cs->flags wherever CO_FL_ERROR was in use. Most likely over
the long term, any check for CO_FL_ERROR will have to disappear.
2017-10-31 18:03:23 +01:00
Olivier Houchard
9aaf778129 MAJOR: connection : Split struct connection into struct connection and struct conn_stream.
All the references to connections in the data path from streams and
stream_interfaces were changed to use conn_streams. Most functions named
"something_conn" were renamed to "something_cs" for this. Sometimes the
connection still is what matters (eg during a connection establishment)
and were not always renamed. The change is significant and minimal at the
same time, and was quite thoroughly tested now. As of this patch, all
accesses to the connection from upper layers go through the pass-through
mux.
2017-10-31 18:03:23 +01:00
Olivier Houchard
ccaa7de72e MINOR: ssl/proto_http: Add keywords to take care of early data.
Add a new sample fetch, "ssl_fc_has_early", a boolean that will be true
if early data were sent, and a new action, "wait-for-handshake", if used,
the request won't be forwarded until the SSL handshake is done.
2017-10-27 13:32:22 +02:00
Willy Tarreau
3b9c850271 MINOR: stream-int: stop checking for useless connection flags in chk_snd_conn
We've been keep this test for a connection being established since 1.5-dev14
when the stream-interface was still accessing the FD directly. The test on
CO_FL_HANDSHAKE and L{4,6}_CONN is totally useless here, and can even be
counter-productive on pure TCP where it could prevent a request from being
sent on a connection still attempting to complete its establishment. And it
creates an abnormal dependency between the layers that will complicate the
implementation of the mux, so let's get rid of it now.
2017-10-25 14:24:48 +02:00
Willy Tarreau
f9ce57e86c MEDIUM: connection: make conn_sock_shutw() aware of lingering
Instead of having to manually handle lingering outside, let's make
conn_sock_shutw() check for it before calling shutdown(). We simply
don't want to emit the FIN if we're going to reset the connection
due to lingering. It's particularly important for silent-drop where
it's absolutely mandatory that no packet leaves the machine.
2017-10-22 09:54:16 +02:00
Olivier Houchard
1a0545f3d7 REORG: connection: rename CO_FL_DATA_* -> CO_FL_XPRT_*
These flags are not exactly for the data layer, they instead indicate
what is expected from the transport layer. Since we're going to split
the connection between the transport and the data layers to insert a
mux layer, it's important to have a clear idea of what each layer does.

All function conn_data_* used to manipulate these flags were renamed to
conn_xprt_*.
2017-10-22 09:54:15 +02:00
Willy Tarreau
06d80a9a9c REORG: channel: finally rename the last bi_* / bo_* functions
For HTTP/2 we'll need some buffer-only equivalent functions to some of
the ones applying to channels and still squatting the bi_* / bo_*
namespace. Since these names have kept being misleading for quite some
time now and are really getting annoying, it's time to rename them. This
commit will use "ci/co" as the prefix (for "channel in", "channel out")
instead of "bi/bo". The following ones were renamed :

  bi_getblk_nc, bi_getline_nc, bi_putblk, bi_putchr,
  bo_getblk, bo_getblk_nc, bo_getline, bo_getline_nc, bo_inject,
  bi_putchk, bi_putstr, bo_getchr, bo_skip, bi_swpbuf
2017-10-19 15:01:08 +02:00
Willy Tarreau
4ac4928718 BUG/MINOR: stream-int: don't set MSG_MORE on SHUTW_NOW without AUTO_CLOSE
Since around 1.5-dev12, we've been setting MSG_MORE on send() on various
conditions, including the fact that SHUTW_NOW is present, but we don't
check that it's accompanied with AUTO_CLOSE. The result is that on requests
immediately followed by a close (where AUTO_CLOSE is not set), the request
gets delayed in the TCP stack before being sent to the server. This is
visible with the H2 code where the end-of-stream flag is set on requests,
but probably happens when a POLL_HUP is detected along with the request.
The (lack of) presence of option abortonclose has no effect here since we
never send the SHUTW along with the request.

This fix can be backported to 1.7, 1.6 and 1.5.
2017-10-17 16:38:21 +02:00
Bin Wang
95fad5ba4b BUG/MAJOR: stream-int: don't re-arm recv if send fails
When
    1) HAProxy configured to enable splice on both directions
    2) After some high load, there are 2 input channels with their socket buffer
       being non-empty and pipe being full at the same time, sitting in `fd_cache`
       without any other fds.

The 2 channels will repeatedly be stopped for receiving (pipe full) and waken
for receiving (data in socket), thus getting out and in of `fd_cache`, making
their fd swapping location in `fd_cache`.

There is a `if (entry < fd_cache_num && fd_cache[entry] != fd) continue;`
statement in `fd_process_cached_events` to prevent frequent polling, but since
the only 2 fds are constantly swapping location, `fd_cache[entry] != fd` will
always hold true, thus HAProxy can't make any progress.

The root cause of the issue is dual :
  - there is a single fd_cache, for next events and for the ones being
    processed, while using two distinct arrays would avoid the problem.

  - the write side of the stream interface wakes the read side up even
    when it couldn't write, and this one really is a bug.

Due to CF_WRITE_PARTIAL not being cleared during fast forwarding, a failed
send() attempt will still cause ->chk_rcv() to be called on the other side,
re-creating an entry for its connection fd in the cache, causing the same
sequence to be repeated indefinitely without any opportunity to make progress.

CF_WRITE_PARTIAL used to be used for what is present in these tests : check
if a recent write operation was performed. It's part of the CF_WRITE_ACTIVITY
set and is tested to check if timeouts need to be updated. It's also used to
detect if a failed connect() may be retried.

What this patch does is use CF_WROTE_DATA() to check for a successful write
for connection retransmits, and to clear CF_WRITE_PARTIAL before preparing
to send in stream_int_notify(). This way, timeouts are still updated each
time a write succeeds, but chk_rcv() won't be called anymore after a failed
write.

It seems the fix is required all the way down to 1.5.

Without this patch, the only workaround at this point is to disable splicing
in at least one direction. Strictly speaking, splicing is not absolutely
required, as regular forwarding could theorically cause the issue to happen
if the timing is appropriate, but in practice it appears impossible to
reproduce it without splicing, and even with splicing it may vary.

The following config manages to reproduce it after a few attempts (haproxy
going 100% CPU and having to be killed) :

  global
      maxpipes 50000
      maxconn 10000

  listen srv1
      option splice-request
      option splice-response
      bind :8001
      server s1 127.0.0.1:8002

  server$ tcploop 8002 L N20 A R10 S1000000 R10 S1000000 R10 S1000000 R10 S1000000 R10 S1000000
  client$ tcploop 8001 N20 C T S1000000 R10 J
2017-10-05 11:20:16 +02:00
Willy Tarreau
bbae3f0170 MEDIUM: connection: remove useless flag CO_FL_DATA_WR_SH
After careful inspection, this flag is set at exactly two places :
  - once in the health-check receive callback after receipt of a
    response
  - once in the stream interface's shutw() code where CF_SHUTW is
    always set on chn->flags

The flag was checked in the checks before deciding to send data, but
when it is set, the wake() callback immediately closes the connection
so the CO_FL_SOCK_WR_SH flag is also set.

The flag was also checked in si_conn_send(), but checking the channel's
flag instead is enough and even reveals that one check involving it
could never match.

So it's time to remove this flag and replace its check with a check of
CF_SHUTW in the stream interface. This way each layer is responsible
for its shutdown, this will ease insertion of the mux layer.
2017-08-30 10:05:49 +02:00
Willy Tarreau
54e917cfa1 MEDIUM: connection: remove useless flag CO_FL_DATA_RD_SH
This flag is both confusing and wrong. It is supposed to report the
fact that the data layer has received a shutdown, but in fact this is
reported by CO_FL_SOCK_RD_SH which is set by the transport layer after
this condition is detected. The only case where the flag above is set
is in the stream interface where CF_SHUTR is also set on the receiving
channel.

In addition, it was checked in the health checks code (while never set)
and was always test jointly with CO_FL_SOCK_RD_SH everywhere, except in
conn_data_read0_pending() which incorrectly doesn't match the second
time it's called and is fortunately protected by an extra check on
(ic->flags & CF_SHUTR).

This patch gets rid of the flag completely. Now conn_data_read0_pending()
accurately reports the fact that the transport layer has detected the end
of the stream, regardless of the fact that this state was already consumed,
and the stream interface watches ic->flags&CF_SHUTR to know if the channel
was already closed by the upper layer (which it already used to do).

The now unused conn_data_read0() function was removed.
2017-08-30 08:18:50 +02:00
Willy Tarreau
8ff5a8d87f BUG/MINOR: stream-int: don't check the CO_FL_CURR_WR_ENA flag
The stream interface chk_snd() code checks if the connection has already
subscribed to write events in order to avoid attempting a useless write()
which will fail. But it used to check both the CO_FL_CURR_WR_ENA and the
CO_FL_DATA_WR_ENA flags, while the former may only be present without the
latterif either the other side just disabled writing did not synchronize
yet (which is harmless) or if it's currently performing a handshake, which
is being checked by the next condition and will be better dealt with by
properly subscribing to the data events.

This code was added back in 1.5-dev20 to limit the number of useless calls
to splice() but both flags were checked at once while only CO_FL_DATA_WR_ENA
was needed. This bug seems to have no impact other than making code changes
more painful. This fix may be backported down to 1.5 though is unlikely to
be needed there.
2017-08-30 07:03:34 +02:00
Emeric Brun
2802b07d97 BUG/MAJOR: applet: fix a freeze if data is immedately forwarded.
Introduced regression with 'MAJOR: applet scheduler rework' (1.8-dev only).

The fix consist to re-enable the appctx immediatly from the
applet wake cb if the process_stream is not pending in runqueue
and the applet want perform a put or a get and the WAIT_ROOM
flag was removed by stream_int_notify.
2017-06-30 14:57:24 +02:00
Emeric Brun
c730606879 MAJOR: applet: applet scheduler rework.
In order to authorize call of appctx_wakeup on running task:
- from within the task handler itself.
- in futur, from another thread.

The appctx is considered paused as default after running the handler.

The handler should explicitly call appctx_wakeup to be re-called.

When the appctx_free is called on a running handler. The real
free is postponed at the end of the handler process.
2017-06-27 14:38:02 +02:00
Willy Tarreau
2686dcad1e CLEANUP: connection: remove unused CO_FL_WAIT_DATA
Very early in the connection rework process leading to v1.5-dev12, commit
56a77e5 ("MEDIUM: connection: complete the polling cleanups") marked the
end of use for this flag which since was never set anymore, but it continues
to be tested. Let's kill it now.
2017-06-02 15:50:27 +02:00
Hongbo Long
e39683c4d4 BUG/MEDIUM: stream: fix client-fin/server-fin handling
A tcp half connection can cause 100% CPU on expiration.

First reproduced with this haproxy configuration :

  global
      tune.bufsize 10485760
  defaults
      timeout server-fin 90s
      timeout client-fin 90s
  backend node2
      mode tcp
      timeout server 900s
      timeout connect 10s
      server def 127.0.0.1:3333
  frontend fe_api
      mode  tcp
      timeout client 900s
      bind :1990
      use_backend node2

Ie timeout server-fin shorter than timeout server, the backend server
sends data, this package is left in the cache of haproxy, the backend
server continue sending fin package, haproxy recv fin package. this
time the session information is as follows:

  time the session information is as follows:
      0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
      srv=def ts=08 age=1s calls=3 rq[f=848000h,i=0,an=00h,rx=14m58s,wx=,ax=]
      rp[f=8004c020h,i=0,an=00h,rx=,wx=14m58s,ax=] s0=[7,0h,fd=6,ex=]
      s1=[7,18h,fd=7,ex=] exp=14m58s

  rp has set the CF_SHUTR state, next, the client sends the fin package,

  session information is as follows:
      0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
      srv=def ts=08 age=38s calls=4 rq[f=84a020h,i=0,an=00h,rx=,wx=,ax=]
      rp[f=8004c020h,i=0,an=00h,rx=1m11s,wx=14m21s,ax=] s0=[7,0h,fd=6,ex=]
      s1=[9,10h,fd=7,ex=] exp=1m11s

  After waiting 90s, session information is as follows:
      0x2373470: proto=tcpv4 src=127.0.0.1:39513 fe=fe_api be=node2
      srv=def ts=04 age=4m11s calls=718074391 rq[f=84a020h,i=0,an=00h,rx=,wx=,ax=]
      rp[f=8004c020h,i=0,an=00h,rx=?,wx=10m49s,ax=] s0=[7,0h,fd=6,ex=]
      s1=[9,10h,fd=7,ex=] exp=? run(nice=0)

  cpu information:
      6899 root      20   0  112224  21408   4260 R 100.0  0.7   3:04.96 haproxy

Buffering is set to ensure that there is data in the haproxy buffer, and haproxy
can receive the fin package, set the CF_SHUTR flag, If the CF_SHUTR flag has been
set, The following code does not clear the timeout message, causing cpu 100%:

stream.c:process_stream:

        if (unlikely((res->flags & (CF_SHUTR|CF_READ_TIMEOUT)) == CF_READ_TIMEOUT)) {
            if (si_b->flags & SI_FL_NOHALF)
            si_b->flags |= SI_FL_NOLINGER;
            si_shutr(si_b);
        }

If you have closed the read, set the read timeout does not make sense.
With or without cf_shutr, read timeout is set:

       if (tick_isset(s->be->timeout.serverfin)) {
           res->rto = s->be->timeout.serverfin;
           res->rex = tick_add(now_ms, res->rto);
       }

After discussion on the mailing list, setting half-closed timeouts the
hard way here doesn't make sense. They should be set only at the moment
the shutdown() is performed. It will also solve a special case which was
already reported of some half-closed timeouts not working when the shutw()
is performed directly at the stream-interface layer (no analyser involved).
Since the stream interface layer cannot know the timeout values, we'll have
to store them directly in the stream interface so that they are used upon
shutw(). This patch does this, fixing the problem.

An easier reproducer to validate the fix is to keep the huge buffer and
shorten all timeouts, then call it under tcploop server and client, and
wait 3 seconds to see haproxy run at 100% CPU :

  global
      tune.bufsize 10485760

  listen px
      bind :1990
      timeout client 90s
      timeout server 90s
      timeout connect 1s
      timeout server-fin 3s
      timeout client-fin 3s
      server def 127.0.0.1:3333

  $ tcploop 3333 L W N20 A P100 F P10000 &
  $ tcploop 127.0.0.1:1990 C S10000000 F
2017-03-21 15:04:43 +01:00
Willy Tarreau
52821e2737 BUG/MAJOR: stream-int: do not depend on connection flags to detect connection
Recent fix 7bf3fa3 ("BUG/MAJOR: connection: update CO_FL_CONNECTED before
calling the data layer") marked an end to a fragile situation where the
absence of CO_FL_{CONNECTED,L4,L6}* flags is used to mark the completion
of a connection setup. The problem is that by setting the CO_FL_CONNECTED
flag earlier, we can indeed call the ->wake() function from conn_fd_handler
but the stream-interface's wake function needs to see CO_FL_CONNECTED unset
to detect that a connection has just been established, so if there's no
pending data in the buffer, the connection times out. The other ->wake()
functions (health checks and idle connections) don't do this though.

So instead of trying to detect a subtle change in connection flags,
let's simply rely on the stream-interface's state and validate that the
connection is properly established and that handshakes are completed
before reporting the WRITE_NULL indicating that a pending connection was
just completed.

This patch passed all tests of handshake and non-handshake combinations,
with synchronous and asynchronous connect() and should be safe for backport
to 1.7, 1.6 and 1.5 when the fix above is already present.
2017-03-19 12:00:04 +01:00
Willy Tarreau
8cf9c8e663 BUG/MINOR: stream-int: automatically release SI_FL_WAIT_DATA on SHUTW_NOW
While developing an experimental applet performing only one read per full
line, it appeared that it would be woken up for the client's close, not
read all data (missing LF), then wait for a subsequent call, and would only
be woken up on client timeout to finish the read. The reason is that we
preset SI_FL_WAIT_DATA in the stream-interface's flags to avoid a fast loop,
but there's nothing which can remove this flag until there's a read operation.

We must definitely remove it in stream_int_notify() each time we're called
with CF_SHUTW_NOW because we know there will be no more subsequent read
and we don't want an applet which keeps the WANT_GET flag to block on this.

This fix should be backported to 1.7 and 1.6 though it's uncertain whether
cli, peers, lua or spoe really are affected there.
2016-12-14 16:48:16 +01:00
Christopher Faulet
a73e59b690 BUG/MAJOR: Fix how the list of entities waiting for a buffer is handled
When an entity tries to get a buffer, if it cannot be allocted, for example
because the number of buffers which may be allocated per process is limited,
this entity is added in a list (called <buffer_wq>) and wait for an available
buffer.

Historically, the <buffer_wq> list was logically attached to streams because it
were the only entities likely to be added in it. Now, applets can also be
waiting for a free buffer. And with filters, we could imagine to have more other
entities waiting for a buffer. So it make sense to have a generic list.

Anyway, with the current design there is a bug. When an applet failed to get a
buffer, it will wait. But we add the stream attached to the applet in
<buffer_wq>, instead of the applet itself. So when a buffer is available, we
wake up the stream and not the waiting applet. So, it is possible to have
waiting applets and never awakened.

So, now, <buffer_wq> is independant from streams. And we really add the waiting
entity in <buffer_wq>. To be generic, the entity is responsible to define the
callback used to awaken it.

In addition, applets will still request an input buffer when they become
active. But they will not be sleeped anymore if no buffer are available. So this
is the responsibility to the applet I/O handler to check if this buffer is
allocated or not. This way, an applet can decide if this buffer is required or
not and can do additional processing if not.

[wt: backport to 1.7 and 1.6]
2016-12-12 19:11:04 +01:00
Willy Tarreau
796c5b7997 OPTIM: stream-int: don't disable polling anymore on DONT_READ
Commit 5fddab0 ("OPTIM: stream_interface: disable reading when
CF_READ_DONTWAIT is set") improved the connection layer's efficiency
back in 1.5-dev13 by avoiding successive read attempts on an active
FD. But by disabling this on a polled FD, it causes an unpleasant
side effect which is that the FD that was subscribed to polling is
suddenly stopped and may need to be re-enabled once the kernel
starts to slow down on data eviction (eg: saturated server at the
other end, bursty traffic caused by too large maxpollevents).

This behaviour is observable with persistent connections when there
is a large enough connection count so that there's no data in the
early connection and polling is required, because there are then
up to 4 epoll_ctl() calls per request. It's important that the
server is slower than haproxy to cause some delays when reading
response.

The current connection layer as designed in 1.6 with the FD cache
doesn't require this trick anymore, though it still benefits from
it when it saves an FD from being uselessly polled. But compared
to the increased cost of enabling and disabling poll all the time,
it's still better to disable it. In some cases it's possible to
observe a performance increase as high as 30% by avoiding this
epoll_ctl() dance.

In the end we only want to disable it when the FD is speculatively
read and not when it's polled. For this we introduce a new function
__conn_data_done_recv() which is used to indicate that we're done
with recv() and not interested in new attempts. If/when we later
support event-triggered epoll, this function will have to change
a bit to do the same even in the polled case.

A quick test with keep-alive requests run on a dual-core / dual-
thread Atom shows a significant improvement :

single process, 0 bytes :
before: Requests per second:    12243.20 [#/sec] (mean)
after:  Requests per second:    13354.54 [#/sec] (mean)

single process, 4k :
before: Requests per second:    9639.81 [#/sec] (mean)
after:  Requests per second:    10991.89 [#/sec] (mean)

dual process, 0 bytes (unstable) :
before: Requests per second:    16900-19800 ~ 17600 [#/sec] (mean)
after:  Requests per second:    18600-21400 ~ 20500 [#/sec] (mean)
2016-12-05 13:49:57 +01:00
Willy Tarreau
8e0bb0ae16 MINOR: connection: add names for transport and data layers
This makes debugging easier and avoids having to put ugly checks
against certain well-known internal struct pointers.
2016-11-24 16:58:12 +01:00
Willy Tarreau
958f0742a2 BUG/MEDIUM: stream-int: avoid double-call to applet->release
While the SI_ST_DIS state is set *after* doing the close on a connection,
it was set *before* calling release on an applet. Applets have no internal
flags contrary to connections, so they have no way to detect they were
already released. Because of this it happened that applets were closed
twice, once via si_applet_release() and once via si_release_endpoint() at
the end of a transaction. The CLI applet could perform a double free in
this case, though the situation to cause it is quite hard because it
requires that the applet is stuck on output in states that produce very
few data.

In order to solve this, we now assign the SI_ST_DIS state *after* calling
->release, and we refrain from doing so if the state is already assigned.
This makes applets work much more like connections and definitely avoids
this double release.

In the future it might be worth making applets have their own flags like
connections to carry their own state regardless of the stream interface's
state, especially when dealing with connection reuse.

No backport is needed since this issue was caused by the rearchitecture
in 1.6.
2015-09-25 21:16:03 +02:00
Willy Tarreau
eca572fe7f BUG/MEDIUM: applet: fix reporting of broken write situation
If an applet tries to write to a closed connection, it hangs forever.
This results in some "get map" commands on the CLI to leave orphaned
connections alive.

Now the applet wakeup function detects that the applet still wants to
write while the channel is closed for reads, which is the equivalent
to the common "broken pipe" situation. In this case, an error is
reported on the stream interface, just as it happens with connections
trying to perform a send() in a similar situation.

With this fix the stats socket is properly released.
2015-09-25 21:16:02 +02:00
Willy Tarreau
aa977ba205 MINOR: stream-int: rename si_applet_done() to si_applet_wake_cb()
This function is a callback made only for calls from the applet handler.
Rename it to remove confusion. It's currently called from the Lua code
but that's not correct, we should call the notify and update functions
instead otherwise it will not enable the applet again.
2015-09-25 21:16:02 +02:00
Willy Tarreau
335520305c MEDIUM: stream-int: completely remove stream_int_update_embedded()
This one is not needed anymore as what it used to do is either
completely covered by the new stream_int_notify() function, or undesired
and inherited from the past as a side effect of introducing the
connections.

This update is theorically never called since it's assigned only when
nothing is connected to the stream interface. However a test has been
added to si_update() to stay safe if some foreign code decides to call
si_update() in unsafe situations.
2015-09-25 21:16:02 +02:00
Willy Tarreau
651e18292d MEDIUM: stream-int: use the same stream notification function for applets and conns
The code to report completion after a connection update or an applet update
was almost the same since applets stole it from the connection. But the
differences made them hard to maintain and prevented the creation of new
functions doing only one part of the work.

This patch replaces the common code from the si_conn_wake_cb() and
si_applet_wake_cb() with a single call to stream_int_notify() which only
notifies the stream (si+channels+task) from the outside.

No functional change was made beyond this.
2015-09-25 21:16:02 +02:00
Willy Tarreau
615f28bec1 MINOR: stream-int: implement the stream_int_notify() function
stream_int_notify() was taken from the common part between si_conn_wake_cb()
and si_applet_done(). It is designed to report activity to a stream from
outside its handler. It'll generally be used by lower layers to report I/O
completion but may also be used by remote streams if the buffer processing
is shared.
2015-09-25 21:16:02 +02:00
Willy Tarreau
ea3cc48d64 MEDIUM: stream-int: clean up the conditions to enable reading in si_conn_wake_cb
The condition to release the SI_FL_WAIT_ROOM flag was abnormally
complicated because it was inherited from 6 years ago before we used
to check for the buffer's emptiness. The CF_READ_PARTIAL flag had to be
removed, and the complex test was replaced with a simpler one checking
if *some* data were moved out or not.

The reason behind this change is to have a condition compatible with
both connections and applets, as applets currently don't work very
well in this area. Specifically, some optimizations on the applet
side cause them not to release the flag above until the buffer is
empty, which may prevent applets from taking together (eg: peers
over large haproxy buffers and small kernel buffers).
2015-09-25 18:07:16 +02:00
Willy Tarreau
388a2385a5 MINOR: stream-int: move the applet_pause call out of the stream updates
It's just to split the part dealing with the stream update and the part
dealing with the applet update in si_applet_done().
2015-09-25 18:07:16 +02:00
Willy Tarreau
cbc32601a6 MINOR: stream-int: export stream_int_update_*
Not only these functions were not static, but we'll also want to export
them.
2015-09-25 18:07:16 +02:00
Willy Tarreau
5d5b2fecac MEDIUM: stream-int: call stream_int_update() from si_update()
Now the call to stream_int_update() is moved to si_update(), which
is exclusively called from the stream, so that the socket layer may
be updated without updating the stream layer. This will later permit
to call it individually from other places (other tasks or applets for
example).
2015-09-25 18:07:16 +02:00
Willy Tarreau
452c7d5d93 MEDIUM: stream-int: factor out the stream update functions
Now that we have a generic stream_int_update() function, we can
replace the equivalent part in stream_int_update_conn() and
stream_int_update_applet() to avoid code duplication.

There is no functional change, as the code is the same but split
in two functions for each call.
2015-09-25 18:07:16 +02:00
Willy Tarreau
25f1310f33 MINOR: stream-int: implement a new stream_int_update() function
This function is designed to be called from within the stream handler to
update the channels' expiration timers and the stream interface's flags
based on the channels' flags. It needs to be called only once after the
channels' flags have settled down, and before they are cleared, though it
doesn't harm to call it as often as desired (it just slightly hurts
performance). It must not be called from outside of the stream handler,
as what it does will be used to compute the stream task's expiration.

The code was taken directly from stream_int_update_applet() and
stream_int_update_conn() which had exactly the same one except for
applet-specific or connection-specific status update.
2015-09-25 18:07:16 +02:00
Willy Tarreau
2f4e702031 MEDIUM: stream-int: split stream_int_update_conn() into si- and conn-specific parts
The purpose is to separate the connection-specific parts so that the
stream-int specific one can be factored out. There's no functional
change here, only code displacement.
2015-09-25 18:07:16 +02:00
Willy Tarreau
c4b56e4470 MINOR: stream-int: use si_release_endpoint() to close idle conns
We don't want to open-code the connection close code in
si_idle_conn_wake_cb() because we need to centralize some controls.
2015-09-24 11:57:34 +02:00
Thierry FOURNIER
5bc2cbf8f4 CLEANUP: typo: bad indent
A space alignment remains in the stream_interface.c file
2015-09-10 21:16:55 +02:00
Willy Tarreau
323a2d925c MEDIUM: stream-int: queue idle connections at the server
Now we get a per-server list of all idle connections. That way we'll
be able to reclaim them upon shortage later.
2015-08-06 11:06:25 +02:00
Willy Tarreau
7a08d3b2d7 CLEANUP: stream-int: remove stream_int_unregister_handler() and si_detach()
The former was not used anymore and the latter was only used by the former.
They were only aliases to other existing functions anyway.
2015-07-19 18:48:20 +02:00
Willy Tarreau
a9ff5e64c1 CLEANUP: stream-int: fix a few outdated comments about stream_int_register_handler()
They were not updated after the infrastructure change.
2015-07-19 18:46:30 +02:00
Willy Tarreau
0b1a4541dc MEDIUM: stream-int: pause the appctx if the task is woken up
If we're going to call the task we don't need to call the appctx anymore
since the task may decide differently in the end and will do the proper
thing using ->update(). This reduces one wake up call per session and
may go down to half in case of high concurrency (scheduling races).
2015-04-23 17:56:17 +02:00
Willy Tarreau
fe127937a8 MEDIUM: applet: make the applets only use si_applet_{cant|want|stop}_{get|put}
The applets don't fiddle with SI_FL_WAIT_ROOM anymore, instead they indicate
what they want, possibly that they failed (eg: WAIT_ROOM), and it's done() /
update() which finally updates the WAIT_* flags according to the channels'
and stream interface's states. This solves the issue of the pauses during a
"show sess" without creating busy loops.
2015-04-23 17:56:17 +02:00
Willy Tarreau
563cc37609 MAJOR: stream: use a regular ->update for all stream interfaces
Now si->update() is used to update any type of stream interface, whether
it's an applet, a connection or even nothing. We don't call si_applet_call()
anymore at the end of the resync and we don't have the risk that the
stream's task is reinserted into the run queue, which makes the code
a bit simpler.

The stream_int_update_applet() function was simplified to ensure that
it remained compatible with this standardized calling convention. It
was almost copy-pasted from the update code dedicated to connections.
Just like for si_applet_done(), it seems that it should be possible to
merge the two functions except that it would require some slow operations,
except maybe if the type of end point is tested inside the update function
itself.
2015-04-23 17:56:16 +02:00
Willy Tarreau
828824af05 MAJOR: applet: now call si_applet_done() instead of si_update() in I/O handlers
The applet I/O handlers now rely on si_applet_done() which itself decides
to wake up or sleep the appctx. Now it becomes critical that applte handlers
properly call this on every exit path so that the appctx is removed from the
active list after I/O have been handled. One such call was added to the Lua
socket handler. It used to work without it probably because the main task is
woken up by the parent task but now it's needed.
2015-04-23 17:56:16 +02:00
Willy Tarreau
e5f8649102 MEDIUM: stream-int: add a new function si_applet_done()
This is the equivalent of si_conn_wake() but for applets. It will be
called after changes to the stream interface are brought by the applet
I/O handler. Ultimately it will release buffers and may be even wake
the stream's task up if some important changes are detected.

It would be nice to be able to merge it with the connection's wake
function since it mostly manipulates the stream interface, but there
are minor differences (such as how to enable/disable polling on a fd
vs applet) and some specificities to applets (eg: don't wake the
applet up until the output is empty) which would require abstract
functions which would slow down everything.
2015-04-23 17:56:16 +02:00
Willy Tarreau
d45b9f8991 REORG: stream-int: create si_applet_ops dedicated to applets
These functions are dedicated to applets so that we don't use the default
ones anymore in this case.
2015-04-23 17:56:16 +02:00
Willy Tarreau
3057645b37 CLEANUP: applet: rename struct si_applet to applet
Since this one does not depend on stream_interface anymore, remove the
"si_" prefix.
2015-04-23 17:56:16 +02:00
Willy Tarreau
8a8d83b85c REORG: applet: move the applet definitions out of stream_interface
We're tidying the definitions so that appctx lives on its own. A new
set of applet.h files has been added for this purpose.
2015-04-23 17:56:16 +02:00
Willy Tarreau
a7513f5d00 MINOR: stream-int: make appctx_new() take the applet in argument
Doing so simplifies the initialization of a new appctx. We don't
need appctx_set_applet() anymore.
2015-04-06 11:37:32 +02:00
Willy Tarreau
87b09668be REORG/MAJOR: session: rename the "session" entity to "stream"
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.

In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.

The files stream.{c,h} were added and session.{c,h} removed.

The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.

Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.

Once all changes are completed, we should see approximately this :

   L7 - http_txn
   L6 - stream
   L5 - session
   L4 - connection | applet

There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.

Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
2015-04-06 11:23:56 +02:00
Willy Tarreau
6b5a9c23ce CLEANUP: stream-int: remove inclusion of fd.h that is not used anymore
That's a historic achievement, stream_interface.c doesn't manipulate
any file descriptor anymore. It only relies on connections or applets.
2015-03-13 00:46:47 +01:00
Willy Tarreau
d85c48589a REORG: connection: move conn_drain() to connection.c and rename it
It's now called conn_sock_drain() to make it clear that it only reads
at the sock layer and not at the data layer. The function was too big
to remain inlined and it's used at a few places where size counts.
2015-03-13 00:42:48 +01:00
Willy Tarreau
f31fb07958 MEDIUM: connection: make conn_drain() perform more controls
Currently si_idle_conn_null_cb() has to perform some low-level checks
over the file descriptor and the connection configuration that should
only belong to conn_drain(). Let's move these controls there. The
function now automatically checks for errors and hangups on the file
descriptor for example, and disables recv polling if there's no drain
function at the control layer.
2015-03-13 00:32:20 +01:00
Willy Tarreau
0a03c0f022 MEDIUM: stream-int: make conn_si_send_proxy() use conn_sock_send()
This substantially simplifies the code as we don't need to handle the
file descriptors anymore nor the specific error codes from send().
2015-03-13 00:09:30 +01:00
Willy Tarreau
1398aa19d8 MEDIUM: stream-int: replace xprt->shutw calls with conn_data_shutw()
Now that the connection performs the correct controls when shutting down,
use that in the few places where conn->xprt->shutw() was called. The calls
were split between conn_data_shutw() and conn_data_shutw_hard() depending
on the argument. Since the connection flags are updated, we don't need to
call conn_data_stop_send() anymore, instead we just have to call
conn_cond_update_polling().
2015-03-12 23:04:07 +01:00
Willy Tarreau
4dfd54f26a MINOR: stream-int: use conn_sock_shutw() to shutdown a connection
Stop calling shutdown() on the connection's fd. Note, this also seems
to fix a bug which was harmless, but which consisted in not marking
the connection as shutdown at the socket level until the other side
was shut as well.
2015-03-12 22:44:53 +01:00
Willy Tarreau
1140512f76 CLEANUP: stream-int: remove a redundant clearing of the linger_risk flag
In stream_sock_read0(), we used to clear this flag. But the only case
where stream_sock_read0() is called is in reaction with a conn_sock_read0()
event coming from the lower layers, which already clears this flag. So let's
remove this duplicate one and clear one of the few remaining layering
violations in this area.
2015-03-12 22:32:27 +01:00
Willy Tarreau
78955f4c8b MEDIUM: session: simplify receive buffer allocator to only use the channel
Now that we can get the session from the channel, let's simplify the
prototype of session_alloc_recv_buffer() to only require the channel.
Both the caller and the function are now simplified.
2015-03-11 20:41:47 +01:00
Willy Tarreau
afc8a22ad7 CLEANUP: stream-int: limit usage of si_ic/si_oc
As much as possible, we copy the result of this function into a local
variable to avoid having to check the flag all the time.
2015-03-11 20:41:47 +01:00
Willy Tarreau
50fe03be78 CLEANUP: stream-int: add si_opposite() to find the other stream interface
At a few places we need to find one stream interface from the other one.
Instead of passing via the channel, we simply use the session as an
intermediary, which simply results in applying an offset to the pointer.
2015-03-11 20:41:47 +01:00
Willy Tarreau
4e4292b9af CLEANUP: stream-int: add si_ib/si_ob to dereference the buffers
This makes the code cleaner and is more intuitive to use.
2015-03-11 20:41:46 +01:00
Willy Tarreau
07373b8660 MEDIUM: stream-int: use si_task() to retrieve the task from the stream int
We go back to the session to get the owner. Here again it's very easy
and is just a matter of relative offsets. Since the owner always exists
and always points to the session's task, we can remove some unneeded
tests.
2015-03-11 20:41:46 +01:00
Willy Tarreau
2bb4a96f8f REORG/MEDIUM: stream-int: introduce si_ic/si_oc to access channels
We'll soon remove direct references to the channels from the stream
interface since everything belongs to the same session, so let's
first not dereference si->ib / si->ob anymore and use macros instead.
2015-03-11 20:41:46 +01:00
Willy Tarreau
319f745ba0 MINOR: channel: rename bi_erase() to channel_truncate()
It applies to the channel and it doesn't erase outgoing data, only
pending unread data, which is strictly equivalent to what recv()
does with MSG_TRUNC, so that new name is more accurate and intuitive.
2015-01-14 20:32:59 +01:00
Willy Tarreau
b5051f8742 MINOR: channel: rename bi_avail() to channel_recv_max()
This name more accurately reminds that it applies to a channel and not
to a buffer, and that what is returned may be used as a max number of
bytes to pass to recv().
2015-01-14 20:26:54 +01:00
Willy Tarreau
3889fffe92 MINOR: channel: rename channel_full() to !channel_may_recv()
This function's name was poorly chosen and is confusing to the point of
being suspiciously used at some places. The operations it does always
consider the ability to forward pending input data before receiving new
data. This is not obvious at all, especially at some places where it was
used when consuming outgoing data to know if the buffer has any chance
to ever get the missing data. The code needs to be re-audited with that
in mind. Care must be taken with existing code since the polarity of the
function was switched with the renaming.
2015-01-14 18:41:33 +01:00
Willy Tarreau
56efc4896b OPTIM: stream-int: try to send pending spliced data
This is the equivalent of eb9fd51 ("OPTIM: stream_sock: reduce the amount
of in-flight spliced data") whose purpose is to try to immediately send
spliced data if available.
2014-12-24 23:47:33 +01:00
Willy Tarreau
9b20c55562 MEDIUM: stream-int: support splicing from applets
If we want to splice from applets, we must check the pipe before clearing
SI_FL_WAIT_ROOM.
2014-12-24 23:47:33 +01:00
Willy Tarreau
10fc09e872 MAJOR: session: only allocate buffers when needed
A session doesn't need buffers all the time, especially when they're
empty. With this patch, we don't allocate buffers anymore when the
session is initialized, we only allocate them in two cases :

  - during process_session()
  - during I/O operations

During process_session(), we try hard to allocate both buffers at once
so that we know for sure that a started operation can complete. Indeed,
a previous version of this patch used to allocate one buffer at a time,
but it can result in a deadlock when all buffers are allocated for
requests for example, and there's no buffer left to emit error responses.
Here, if any of the buffers cannot be allocated, the whole operation is
cancelled and the session is added at the tail of the buffer wait queue.

At the end of process_session(), a call to session_release_buffers() is
done so that we can offer unused buffers to other sessions waiting for
them.

For I/O operations, we only need to allocate a buffer on the Rx path.
For this, we only allocate a single buffer but ensure that at least two
are available to avoid the deadlock situation. In case buffers are not
available, SI_FL_WAIT_ROOM is set on the stream interface and the session
is queued. Unused buffers resulting either from a successful send() or
from an unused read buffer are offered to pending sessions during the
->wake() callback.
2014-12-24 23:47:33 +01:00
Willy Tarreau
bf883e0aa7 MAJOR: session: implement a wait-queue for sessions who need a buffer
When a session_alloc_buffers() fails to allocate one or two buffers,
it subscribes the session to buffer_wq, and waits for another session
to release buffers. It's then removed from the queue and woken up with
TASK_WAKE_RES, and can attempt its allocation again.

We decide to try to wake as many waiters as we release buffers so
that if we release 2 and two waiters need only once, they both have
their chance. We must never come to the situation where we don't wake
enough tasks up.

It's common to release buffers after the completion of an I/O callback,
which can happen even if the I/O could not be performed due to half a
failure on memory allocation. In this situation, we don't want to move
out of the wait queue the session that was just added, otherwise it
will never get any buffer. Thus, we only force ourselves out of the
queue when freeing the session.

Note: at the moment, since session_alloc_buffers() is not used, no task
is subscribed to the wait queue.
2014-12-24 23:47:33 +01:00
Willy Tarreau
a69fc9f803 BUG/MAJOR: stream-int: properly check the memory allocation return
In stream_int_register_handler(), we call si_alloc_appctx(si) but as a
mistake, instead of checking the return value for a NULL, we test <si>.
This bug was discovered under extreme memory contention (memory for only
two buffers with 500 connections waiting) and after 3 million failed
connections. While it was very hard to produce it, the fix is tagged
major because in theory it could happen when haproxy runs with a very
low "-m" setting preventing from allocating just the few bytes needed
for an appctx. But most users will never be able to trigger it. The
fix was confirmed to address the bug.

This fix must be backported to 1.5.
2014-12-23 11:22:39 +01:00
Willy Tarreau
9dc1c61c43 BUG/CRITICAL: http: don't update msg->sov once data start to leave the buffer
Commit bb2e669 ("BUG/MAJOR: http: correctly rewind the request body
after start of forwarding") was incorrect/incomplete. It used to rely on
CF_READ_ATTACHED to stop updating msg->sov once data start to leave the
buffer, but this is unreliable because since commit a6eebb3 ("[BUG]
session: clear BF_READ_ATTACHED before next I/O") merged in 1.5-dev1,
this flag is only ephemeral and is cleared once all analysers have
seen it. So we can start updating msg->sov again each time we pass
through this place with new data. With a sufficiently large amount of
data, it is possible to make msg->sov wrap and validate the if()
condition at the top, causing the buffer to advance by about 2GB and
crash the process.

Note that the offset cannot be controlled by the attacker because it is
a sum of millions of small random sizes depending on how many bytes were
read by the server and how many were left in the buffer, only because
of the speed difference between reading and writing. Also, nothing is
written, the invalid pointer resulting from this operation is only read.

Many thanks to James Dempsey for reporting this bug and to Chris Forbes for
narrowing down the faulty area enough to make its root cause analysable.

This fix must be backported to haproxy 1.5.
2014-09-02 16:48:54 +02:00
David S
afb768340c MEDIUM: connection: Implement and extented PROXY Protocol V2
This commit modifies the PROXY protocol V2 specification to support headers
longer than 255 bytes allowing for optional extensions.  It implements the
PROXY protocol V2 which is a binary representation of V1. This will make
parsing more efficient for clients who will know in advance exactly how
many bytes to read.  Also, it defines and implements some optional PROXY
protocol V2 extensions to send information about downstream SSL/TLS
connections.  Support for PROXY protocol V1 remains unchanged.
2014-05-09 08:25:38 +02:00
Willy Tarreau
7e3127391f MINOR: config: make the stream interface idle timer user-configurable
The new tune.idletimer value allows one to set a different value for
idle stream detection. The default value remains set to one second.
It is possible to disable it using zero, and to change the default
value at build time using DEFAULT_IDLE_TIMER.
2014-02-12 16:36:12 +01:00
Willy Tarreau
c5890e66cd MEDIUM: stream-int: automatically disable CF_STREAMER flags after idle
Disabling the streamer flags after an idle period will help TCP proxies
to better adapt to the streams they're forwarding, especially with SSL
where this will allow the SSL sender to use smaller records. This is
typically used to optimally relay HTTP and derivatives such as SPDY or
HTTP/2 in pure TCP mode when haproxy is used as an SSL offloader.

This idea was first proposed by Ilya Grigorik on the haproxy mailing
list, and his tests seem to confirm the improvement :

  https://www.mail-archive.com/haproxy@formilux.org/msg12576.html
2014-02-12 11:46:03 +01:00
Willy Tarreau
7bed945be0 OPTIM: ssl: implement dynamic record size adjustment
By having the stream interface pass the CF_STREAMER flag to the
snd_buf() primitive, we're able to tell the send layer whether
we're sending large chunks or small ones.

We use this information in SSL to adjust the max record dynamically.
This results in small chunks respecting tune.ssl.maxrecord at the
beginning of a transfer or for small transfers, with an automatic
switch to full records if the exchanges last long. This allows the
receiver to parse HTML contents on the fly without having to retrieve
16kB of data, which is even more important with small initcwnd since
the receiver does not need to wait for round trips to start fetching
new objects. However, sending large files still produces large chunks.

For example, with tune.ssl.maxrecord = 2859, we see 5 write(2885)
sent in two segments each and 6 write(16421).

This idea was first proposed on the haproxy mailing list by Ilya Grigorik.
2014-02-06 11:37:29 +01:00
Willy Tarreau
1049b1f551 MEDIUM: connection: don't use real send() flags in snd_buf()
This prevents us from passing other useful info and requires the
upper levels to know these flags. Let's use a new flags category
instead : CO_SFL_*. For now, only MSG_MORE has been remapped.
2014-02-06 11:37:29 +01:00
Willy Tarreau
798c3c9c41 MINOR: stream-interface: no need to call fd_stop_both() on error
We don't need to call fd_stop_both() since we already call
conn_cond_update_polling() which will do it. This call was introduced by
commit d29a066 ("BUG/MAJOR: connection: always recompute polling status
upon I/O").
2014-01-26 00:42:31 +01:00
Willy Tarreau
708e717251 MEDIUM: stream-interface: the polling flags must always be updated in chk_snd_conn
We used to only update the polling flags in data phase, but after that
we could update other flags. It does not seem possible to trigger a
bug here but it's not very safe either. Better always keep them up to
date.
2014-01-26 00:42:30 +01:00
Willy Tarreau
fd803bb4d7 MEDIUM: connection: add check for readiness in I/O handlers
The recv/send callbacks must check for readiness themselves instead of
having their callers do it. This will strengthen the test and will also
ensure we never refrain from calling a handshake handler because a
direction is being polled while the other one is ready.
2014-01-26 00:42:30 +01:00
Willy Tarreau
e1f50c4b02 MEDIUM: connection: remove conn_{data,sock}_poll_{recv,send}
We simply remove these functions and replace their calls with the
appropriate ones :

  - if we're in the data phase, we can simply report wait on the FD
  - if we're in the socket phase, we may also have to signal the
    desire to read/write on the socket because it might not be
    active yet.
2014-01-26 00:42:30 +01:00
Willy Tarreau
310987a038 MAJOR: connection: remove the CO_FL_WAIT_{RD,WR} flags
These flags were used to report the readiness of the file descriptor.
Now this readiness is directly checked at the file descriptor itself.
This removes the need for constantly synchronizing updates between the
file descriptor and the connection and ensures that all layers share
the same level of information.

For now, the readiness is updated in conn_{sock,data}_poll_* by directly
touching the file descriptor. This must move to the lower layers instead
so that these functions can disappear as well. In this state, the change
works but is incomplete. It's sensible enough to avoid making it more
complex.

Now the sock/data updates become much simpler because they just have to
enable/disable access to a file descriptor and not to care anymore about
its readiness.
2014-01-26 00:42:30 +01:00
Willy Tarreau
e6300be8f8 BUG/MEDIUM: stream-interface: don't wake the task up before end of transfer
Recent commit d7ad9f5 ("MAJOR: channel: add a new flag CF_WAKE_WRITE to
notify the task of writes") was not correct. It used to wake up the task
as soon as there was some write activity and the flag was set, even if there
were still some data to be forwarded. This resulted in process_session()
being called a lot when transfering chunk-encoded HTTP responses made of
very large chunks.

The purpose of the flag is to wake up only a task waiting for some
room and not the other ones, so it's totally counter-productive to
wake it up as long as there are data to forward because the task
will not be allowed to write anyway.

Also, the commit above was taking some risks by not considering
certain events anymore (eg: state != SI_ST_EST). While such events
are not used at the moment, if some new features were developped
in the future relying on these, it would be better that they could
be notified when subscribing to the WAKE_WRITE event, so let's
restore the condition.
2014-01-25 22:28:22 +01:00
Willy Tarreau
46be2e5039 MEDIUM: connection: update callers of ctrl->drain() to use conn_drain()
Now we can more safely rely on the connection state to decide how to
drain and what to do when data are drained. Callers don't need to
manipulate the file descriptor's state anymore.

Note that it also removes the need for the fix ea90063 ("BUG/MEDIUM:
stream-int: fix the keep-alive idle connection handler") since conn_drain()
correctly sets the polling flags.
2014-01-20 22:27:17 +01:00
Willy Tarreau
7f4bcc312d MINOR: protocol: improve the proto->drain() API
It was not possible to know if the drain() function had hit an
EAGAIN, so now we change the API of this function to return :
  < 0 if EAGAIN was met
  = 0 if some data remain
  > 0 if a shutdown was received
2014-01-20 22:27:16 +01:00
Willy Tarreau
d7ad9f5b0d MAJOR: channel: add a new flag CF_WAKE_WRITE to notify the task of writes
Since commit 6b66f3e ([MAJOR] implement autonomous inter-socket forwarding)
introduced in 1.3.16-rc1, we've been relying on a stupid mechanism to wake
up the task after a write, which was an exact copy-paste of the reader side.

The principle was that if we empty a buffer and there's no forwarding
scheduled or if the *producer* is not in a connected state, then we wake
the task up.

That does not make any sense. It happens to wake up too late sometimes (eg,
when the request analyser waits for some room in the buffer to start to
work), and leads to unneeded wakeups in client-side keep-alive, because
the task is woken up when the response is sent, while the analysers are
simply waiting for a new request.

In order to fix this, we introduce a new channel flag : CF_WAKE_WRITE. It
is designed so that an analyser can explicitly request being notified when
some data were written. It is used only when the HTTP request or response
analysers need to wait for more room in the buffers. It is automatically
cleared upon wake up.

The flag is also automatically set by the functions which try to write into
a buffer from an applet when they fail (bi_putblk() etc...).

That allows us to remove the stupid condition above and avoid some wakeups.
In http-server-close and in http-keep-alive modes, this reduces from 4 to 3
the average number of wakeups per request, and increases the overall
performance by about 1.5%.
2013-12-31 18:37:36 +01:00
Willy Tarreau
61f7f0a959 BUG/MINOR: stream-int: do not clear the owner upon unregister
Since the applet rework and the removal of the inter-task applets,
we must not clear the stream-interface's owner task anymore otherwise
we risk a crash when maintaining keep-alive with an applet. This is
not possible right now so there is no impact yet, but this bug is not
easy to track down. No backport is needed.
2013-12-28 21:33:37 +01:00
Willy Tarreau
ea90063cbc BUG/MEDIUM: stream-int: fix the keep-alive idle connection handler
Commit 2737562 (MEDIUM: stream-int: implement a very simplistic idle
connection manager) implemented an idle connection handler. In the
case where all data is drained from the server, it fails to disable
polling, resulting in a busy spinning loop.

Thanks to Sander Klein and Guillaume Castagnino for reporting this bug.

No backport is needed.
2013-12-17 14:21:48 +01:00
Willy Tarreau
2737562e43 MEDIUM: stream-int: implement a very simplistic idle connection manager
Idle connections are not monitored right now. So if a server closes after
a response without advertising it, it won't be detected until a next
request wants to use the connection. This is a bit problematic because
it unnecessarily maintains file descriptors and sockets in an idle
state.

This patch implements a very simple idle connection manager for the stream
interface. It presents itself as an I/O callback. The HTTP engine enables
it when it recycles a connection. If a close or an error is detected on the
underlying socket, it tries to drain as much data as possible from the socket,
detect the close and responds with a close as well, then detaches from the
stream interface.
2013-12-17 00:00:28 +01:00
Willy Tarreau
ad38acedaa MEDIUM: connection: centralize handling of nolinger in fd management
Right now we see many places doing their own setsockopt(SO_LINGER).
Better only do it just before the close() in fd_delete(). For this
we add a new flag on the file descriptor, indicating if it's safe or
not to linger. If not (eg: after a connect()), then the setsockopt()
call is automatically performed before a close().

The flag automatically turns to safe when receiving a read0.
2013-12-16 02:23:52 +01:00
Willy Tarreau
d02cdd23be MINOR: connection: add simple functions to report connection readiness
conn_xprt_ready() reports if the transport layer is ready.
conn_ctrl_ready() reports if the control layer is ready.

The stream interface uses si_conn_ready() to report that the
underlying connection is ready. This will be used for connection
reuse in keep-alive mode.
2013-12-16 02:23:52 +01:00
Willy Tarreau
0a23bcb8be MAJOR: stream-interface: dynamically allocate the applet context
From now on, a call to stream_int_register_handler() causes a call
to si_alloc_appctx() and returns an initialized appctx for the
current stream interface. If one was previously allocated, it is
released. If the stream interface was attached to a connection, it
is released as well.

The appctx are allocated from the same pools as the connections, because
they're substantially smaller in size, and we can't have both a connection
and an appctx on an interface at any moment.

In case of memory shortage, the call may return NULL, which is already
handled by all consumers of stream_int_register_handler().

The field appctx was removed from the stream interface since we only
rely on the endpoint now. On 32-bit, the stream_interface size went down
from 108 to 44 bytes. On 64-bit, it went down from 144 to 64 bytes. This
represents a memory saving of 160 bytes per session.

It seems that a later improvement could be to move the call to
stream_int_register_handler() to session.c for most cases.
2013-12-09 15:40:23 +01:00
Willy Tarreau
1fbe1c9ec8 MEDIUM: stream-int: return the allocated appctx in stream_int_register_handler()
The task returned by stream_int_register_handler() is never used, however we
always need to access the appctx afterwards. So make it return the appctx
instead. We already plan for it to fail, which is the reason for the addition
of a few tests and the possibility for the HTTP analyser to return a status
code 500.
2013-12-09 15:40:23 +01:00
Willy Tarreau
57cd3e46b9 MEDIUM: connection: merge the send_proxy and local_send_proxy calls
We used to have two very similar functions for sending a PROXY protocol
line header. The reason is that the default one relies on the stream
interface to retrieve the other end's address, while the "local" one
performs a local address lookup and sends that instead (used by health
checks).

Now that the send_proxy_ofs is stored in the connection and not the
stream interface, we can make the local_send_proxy rely on it and
support partial sends. This also simplifies the code by removing the
local_send_proxy function, making health checks use send_proxy_ofs,
resulting in the removal of the CO_FL_LOCAL_SPROXY flag, and the
associated test in the connection handler. The other flag,
CO_FL_SI_SEND_PROXY was renamed without the "SI" part so that it
is clear that it is not dedicated anymore to a usage with a stream
interface.
2013-12-09 15:40:23 +01:00
Willy Tarreau
b8020cefed MEDIUM: connection: move the send_proxy offset to the connection
Till now the send_proxy_ofs field remained in the stream interface,
but since the dynamic allocation of the connection, it makes a lot
of sense to move that into the connection instead of the stream
interface, since it will not be statically allocated for each
session.

Also, it turns out that moving it to the connection fils an alignment
hole on 64 bit architectures so it does not consume more memory, and
removing it from the stream interface was an opportunity to correctly
reorder fields and reduce the stream interface's size from 160 to 144
bytes (-10%). This is 32 bytes saved per session.
2013-12-09 15:40:23 +01:00
Willy Tarreau
32e3c6a607 MAJOR: stream interface: dynamically allocate the outgoing connection
The outgoing connection is now allocated dynamically upon the first attempt
to touch the connection's source or destination address. If this allocation
fails, we fail on SN_ERR_RESOURCE.

As we didn't use si->conn anymore, it was removed. The endpoints are released
upon session_free(), on the error path, and upon a new transaction. That way
we are able to carry the existing server's address across retries.

The stream interfaces are not initialized anymore before session_complete(),
so we could even think about allocating them dynamically as well, though
that would not provide much savings.

The session initialization now makes use of conn_new()/conn_free(). This
slightly simplifies the code and makes it more logical. The connection
initialization code is now shorter by about 120 bytes because it's done
at once, allowing the compiler to remove all redundant initializations.

The si_attach_applet() function now takes care of first detaching the
existing endpoint, and it is called from stream_int_register_handler(),
so we can safely remove the calls to si_release_endpoint() in the
application code around this call.

A call to si_detach() was made upon stream_int_unregister_handler() to
ensure we always free the allocated connection if one was allocated in
parallel to setting an applet (eg: detect HTTP proxy while proceeding
with stats maybe).
2013-12-09 15:40:23 +01:00
Willy Tarreau
2a6e8802c0 MEDIUM: stream-interface: introduce si_attach_conn to replace si_prepare_conn
si_prepare_conn() is not appropriate in our case as it both initializes and
attaches the connection to the stream interface. Due to the asymmetry between
accept() and connect(), it causes some fields such as the control and transport
layers to be reinitialized.

Now that we can separately initialize these fields using conn_prepare(), let's
break this function to only attach the connection to the stream interface.

Also, by analogy, si_prepare_none() was renamed si_detach(), and
si_prepare_applet() was renamed si_attach_applet().
2013-12-09 15:40:23 +01:00
Willy Tarreau
f79c8171b2 MAJOR: connection: add two new flags to indicate readiness of control/transport
Currently the control and transport layers of a connection are supposed
to be initialized when their respective pointers are not NULL. This will
not work anymore when we plan to reuse connections, because there is an
asymmetry between the accept() side and the connect() side :

  - on accept() side, the fd is set first, then the ctrl layer then the
    transport layer ; upon error, they must be undone in the reverse order,
    then the FD must be closed. The FD must not be deleted if the control
    layer was not yet initialized ;

  - on the connect() side, the fd is set last and there is no reliable way
    to know if it has been initialized or not. In practice it's initialized
    to -1 first but this is hackish and supposes that local FDs only will
    be used forever. Also, there are even less solutions for keeping trace
    of the transport layer's state.

Also it is possible to support delayed close() when something (eg: logs)
tracks some information requiring the transport and/or control layers,
making it even more difficult to clean them.

So the proposed solution is to add two flags to the connection :

  - CO_FL_CTRL_READY is set when the control layer is initialized (fd_insert)
    and cleared after it's released (fd_delete).

  - CO_FL_XPRT_READY is set when the control layer is initialized (xprt->init)
    and cleared after it's released (xprt->close).

The functions have been adapted to rely on this and not on the pointers
anymore. conn_xprt_close() was unused and dangerous : it did not close
the control layer (eg: the socket itself) but still marks the transport
layer as closed, preventing any future call to conn_full_close() from
finishing the job.

The problem comes from conn_full_close() in fact. It needs to close the
xprt and ctrl layers independantly. After that we're still having an issue :
we don't know based on ->ctrl alone whether the fd was registered or not.
For this we use the two new flags CO_FL_XPRT_READY and CO_FL_CTRL_READY. We
now rely on this and not on conn->xprt nor conn->ctrl anymore to decide what
remains to be done on the connection.

In order not to miss some flag assignments, we introduce conn_ctrl_init()
to initialize the control layer, register the fd using fd_insert() and set
the flag, and conn_ctrl_close() which unregisters the fd and removes the
flag, but only if the transport layer was closed.

Similarly, at the transport layer, conn_xprt_init() calls ->init and sets
the flag, while conn_xprt_close() checks the flag, calls ->close and clears
the flag, regardless xprt_ctx or xprt_st. This also ensures that the ->init
and the ->close functions are called only once each and in the correct order.
Note that conn_xprt_close() does nothing if the transport layer is still
tracked.

conn_full_close() now simply calls conn_xprt_close() then conn_full_close()
in turn, which do nothing if CO_FL_XPRT_TRACKED is set.

In order to handle the error path, we also provide conn_force_close() which
ignores CO_FL_XPRT_TRACKED and closes the transport and the control layers
in turns. All relevant instances of fd_delete() have been replaced with
conn_force_close(). Now we always know what state the connection is in and
we can expect to split its initialization.
2013-12-09 15:40:23 +01:00
Willy Tarreau
b363a1f469 MAJOR: stream-int: stop using si->conn and use si->end instead
The connection will only remain there as a pre-allocated entity whose
goal is to be placed in ->end when establishing an outgoing connection.
All connection initialization can be made on this connection, but all
information retrieved should be applied to the end point only.

This change is huge because there were many users of si->conn. Now the
only users are those who initialize the new connection. The difficulty
appears in a few places such as backend.c, proto_http.c, peers.c where
si->conn is used to hold the connection's target address before assigning
the connection to the stream interface. This is why we have to keep
si->conn for now. A future improvement might consist in dynamically
allocating the connection when it is needed.
2013-12-09 15:40:22 +01:00
Willy Tarreau
cf644ed37a MEDIUM: stream-int: make ->end point to the connection or the appctx
The long-term goal is to have a context for applets as an alternative
to the connection and not as a complement. At the moment, the context
is still stored into the stream interface, and we only put a pointer
to the applet's context in si->end, initialize the context with object
type OBJ_TYPE_APPCTX, and this allows us not to allocate an entry when
deciding to switch to an applet.

A special care is taken to never dereference si->conn anymore when
dealing with an applet. That's why it's important that si->end is
always set to the proper type :

    si->end == NULL             => not connected to anything
   *si->end == OBJ_TYPE_APPCTX  => connected to an applet
   *si->end == OBJ_TYPE_CONN    => real connection (server, proxy, ...)

The session management code used to check the applet from the connection's
target. Now it uses the stream interface's end point and does not touch the
connection at all. Similarly, we stop checking the connection's addresses
and file descriptors when reporting the applet's status in the stats dump.
2013-12-09 15:40:22 +01:00
Willy Tarreau
4a59f2f954 MAJOR: stream interface: remove the ->release function pointer
Since last commit, we now have a pointer to the applet in the
applet context. So we don't need the si->release function pointer
anymore, it can be extracted from applet->applet.release. At many
places, the ->release function was still tested for real connections
while it is only limited to applets, so most of them were simply
removed. For the remaining valid uses, a new inline function
si_applet_release() was added to simplify the check and the call.
2013-12-09 15:40:22 +01:00
Willy Tarreau
7d67d7b9e5 MINOR: stream-int: add a new pointer to the end point
The end point will correspond to either an applet context or a connection,
depending on the object type. For now the pointer remains null.
2013-12-09 15:40:22 +01:00
Willy Tarreau
372d6708fb MINOR: stream-int: split si_prepare_embedded into si_prepare_none and si_prepare_applet
si_prepare_embedded() was used both to attach an applet and to detach
anything from a stream interface. Split it into si_prepare_none() to
detach and si_prepare_applet() to attach an applet.

si->conn->target is now assigned from within these two functions instead
of their respective callers.
2013-12-09 15:40:22 +01:00
Willy Tarreau
6fe1541285 MINOR: stream-int: make the shutr/shutw functions void
This is to be more consistent with the other functions. The only
reason why these functions used to return a value was to let the
caller adjust polling by itself, but now their only callers were
the si_shutr()/si_shutw() inline functions. Now these functions
do not depend anymore on the connection.

These connection variant of these functions now call
conn_data_stop_recv()/conn_data_stop_send() before returning order
not to require a return code anymore. The applet version does not
need this at all.
2013-12-09 15:40:22 +01:00
Willy Tarreau
8b3d7dfd7c MEDIUM: stream-int: split the shutr/shutw functions between applet and conn
These functions induce a lot of ifs everywhere because they consider two
different cases, one which is where the connection exists and has a file
descriptor, and the other one which is the default case where at most an
applet has to be notified.

Let's have them in si_ops and automatically decide which one to use.

The connection shutdown sequence has been slightly simplified, and we
now clear the flags at the end.

Also we remove SHUTR_NOW after a shutw with nolinger, as it's cleaner
not to keep it.
2013-12-09 15:40:22 +01:00
Willy Tarreau
26f4a04744 MEDIUM: connection: set the socket shutdown flags on socket errors
When we get a hard error from a syscall indicating the socket is dead,
it makes sense to set the CO_FL_SOCK_WR_SH and CO_FL_SOCK_RD_SH flags
to indicate that the socket may not be used anymore. It will ease the
error processing in health checks where the state of socket is very
important. We'll also be able to avoid some setsockopt(nolinger) after
an error.

For now, the rest of the code is not impacted because CO_FL_ERROR is
always tested prior to these flags.
2013-12-04 23:50:36 +01:00
Willy Tarreau
7fe45698f5 BUG/MINOR: connection: check EINTR when sending a PROXY header
PROXY protocol header was not tolerant to signals, so it might cause a
connection to report an error if a signal comes in at the exact same
moment the send is done.

This is 1.5-specific and does not need any backport.
2013-12-04 23:50:26 +01:00
Godbach
4f48990c1a OPTIM: stream_interface: return directly if the connection flag CO_FL_ERROR has been set
The connection flag CO_FL_ERROR will be tested in the functions both
si_conn_recv_cb() and si_conn_send_cb(). If CO_FL_ERROR has been set, out_error
branch will be executed. But the only job of out_error branch is to set
CO_FL_ERROR on connection flag. So it's better return directly than goto
out_error branch under such conditions. As a result, out_error branch becomes
needless and can be removed.

In addition, the return type of si_conn_send_loop() is also changed to void.
The caller should check conn->flags for errors just like stream_int_chk_snd_conn()
does as below:

static void stream_int_chk_snd_conn(struct stream_interface *si)
{
	...
        conn_refresh_polling_flags(si->conn);

-       if (si_conn_send(si->conn) < 0) {
+       si_conn_send(si->conn);
+       if (si->conn->flags & CO_FL_ERROR) {
	...
}

Signed-off-by: Godbach <nylzhaowei@gmail.com>
2013-12-04 10:46:09 +01:00
Godbach
e68e02dc1d CLEANUP: stream_interface: cleanup loop information in si_conn_send_loop()
Though si_conn_send_loop() does not loop over ->snd_buf() after commit ed7f836,
there is still some codes left which use `while` but only execute once. This
commit does the cleanup job and rename si_conn_send_loop() to si_conn_send().

Signed-off-by: Godbach <nylzhaowei@gmail.com>
2013-10-12 07:53:33 +02:00
Willy Tarreau
95742a43aa BUG/MEDIUM: fix broken send_proxy on FreeBSD
David Berard reported that send-proxy was broken on FreeBSD and tracked the
issue to be an error returned by send(). We already had the same issue in
the past in another area which was addressed by the following commit :

   0ea0cf6 BUG: raw_sock: also consider ENOTCONN in addition to EAGAIN

In fact, on Linux send() returns EAGAIN when the connection is not yet
established while other OSes return ENOTCONN. Let's consider ENOTCONN for
send-proxy there as the same as EAGAIN.

David confirmed that this change properly fixed the issue.

Another place was affected as well (health checks with send-proxy), and
was fixed.

This fix does not need any backport since it only affects 1.5.
2013-09-03 09:08:31 +02:00
Willy Tarreau
fa8e2bc68c OPTIM: splicing: use splice() for the last block when relevant
Splicing is avoided for small transfers because it's generally cheaper
to perform a couple of recv+send calls than pipe+splice+splice. This
has the consequence that the last chunk of a large transfer may be
transferred using recv+send if it's less than 4 kB. But when the pipe
is already set up, it's better to use splice() to read the pending data,
since they will get merged with the pending ones. This is what now
happens everytime the reader is slower than the writer.

Note that this change alone could have fixed most of the CPU hog bug,
except at the end when only the close was pending.
2013-07-22 09:31:56 +02:00
Willy Tarreau
5007d2aa33 BUG/MINOR: stream_interface: don't call chk_snd() on polled events
As explained in previous patch, we incorrectly call chk_snd() when
performing a read even if the write event is already subscribed to
poll(). This is counter-productive because we're almost sure to get
an EAGAIN.

A quick test shows that this fix halves the number of failed splice()
calls without adding any extra work on other syscalls.

This could have been tagged as an improvement, but since this behaviour
made the analysis of previous bug more complex, it still qualifies as
a fix.
2013-07-22 09:31:55 +02:00
Willy Tarreau
61d39a0e2a BUG/MEDIUM: splicing: fix abnormal CPU usage with splicing
Mark Janssen reported an issue in 1.5-dev19 which was introduced
in 1.5-dev12 by commit 96199b10. From time to time, randomly, the
CPU usage spikes to 100% for seconds to minutes.

A deep analysis of the traces provided shows that it happens when
waiting for the response to a second pipelined HTTP request, or
when trying to handle the received shutdown advertised by epoll()
after the last block of data. Each time, splice() was involved with
data pending in the pipe.

The cause of this was that such events could not be taken into account
by splice nor by recv and were left pending :

  - the transfer of the last block of data, optionally with a shutdown
    was not handled by splice() because of the validation that to_forward
    is higher than MIN_SPLICE_FORWARD ;

  - the next recv() call was inhibited because of the test on presence
    of data in the pipe. This is also what prevented the recv() call
    from handling a response to a pipelined request until the client
    had ACKed the previous response.

No less than 4 different methods were experimented to fix this, and the
current one was finally chosen. The principle is that if an event is not
caught by splice(), then it MUST be caught by recv(). So we remove the
condition on the pipe's emptiness to perform an recv(), and in order to
prevent recv() from being used in the middle of a transfer, we mark
supposedly full pipes with CO_FL_WAIT_ROOM, which makes sense because
the reason for stopping a splice()-based receive is that the pipe is
supposed to be full.

The net effect is that we don't wake up and sleep in loops during these
transient states. This happened much more often than expected, sometimes
for a few cycles at end of transfers, but rarely long enough to be
noticed, unless a client timed out with data pending in the pipe. The
effect on CPU usage is visible even when transfering 1MB objects in
pipeline, where the CPU usage drops from 10 to 6% on a small machine at
medium bandwidth.

Some further improvements are needed :
  - the last chunk of a splice() transfer is never done using splice due
    to the test on to_forward. This is wrong and should be performed with
    splice if the pipe has not yet been emptied ;

  - si_chk_snd() should not be called when the write event is already being
    polled, otherwise we're almost certain to get EAGAIN.

Many thanks to Mark for all the traces he cared to provide, they were
essential for understanding this issue which was not reproducible
without.

Only 1.5-dev is affected, no backport is needed.
2013-07-22 09:31:55 +02:00
Willy Tarreau
9568d7108f BUG/MEDIUM: stream_interface: don't close outgoing connections on shutw()
Commit 7bb68abb introduced the SI_FL_NOHALF flag in dev10. It is used
to automatically close the write side of a connection whose read side
is closed. But the patch also caused the opposite to happen, which is
that a simple shutw() call would immediately close the connection. This
is not desired because when using option abortonclose, we want to pass
the client's shutdown to the server which will decide what to do with
it. So let's avoid the close when SHUTR is not set.
2012-12-30 01:39:37 +01:00
Willy Tarreau
34ac5665d4 BUG/MEDIUM: stream_interface: fix another case where the reader might not be woken up
The code review during the chase for the POST freeze uncovered another possible
issue which might appear when we perform an incomplete read and want to stop because
of READ_DONTWAIT or because we reached the maximum read_poll limit. Reading is
disabled but SI_FL_WAIT_ROOM was not set, possibly causing some cases where a
send() on the other side would not wake the reader up until another activity
on the same side calls the update function which fixes its status.
2012-12-19 19:28:57 +01:00
Willy Tarreau
6657276871 BUG/MAJOR: stream_interface: fix occasional data transfer freezes
Since the changes in connection management, it became necessary to re-enable
polling after a fast-forward transfer would complete.

One such issue was addressed after dev12 by commit 9f7c6a18 (BUG/MAJOR:
stream_interface: certain workloads could cause get stuck) but unfortunately,
it was incomplete as very subtle cases would occasionally remain unaddressed
when a buffer was marked with the NOEXP flag, which is used during POST
uploads. The wake up must be performed even when the flag is there, the
flag is used only to refresh the timeout.

Too many conditions need to be hit together for the situation to be
reproducible, but it systematically appears for some users.

It is particularly important to credit Sander Klein and John Rood from
Picturae ICT ( http://picturae.com/ ) for reporting this bug on the mailing
list, providing configs and countless traces showing the bug in action, and
for their patience testing litteraly tens of snapshots and versions of
supposed fixes during a full week to narrow the commit range until the bug
was really knocked down! As a side effect of their numerous tests, several
other bugs were fixed.
2012-12-19 19:20:24 +01:00
Willy Tarreau
7d28149e92 BUG/MEDIUM: connection: always update connection flags prior to computing polling
stream_int_chk_rcv_conn() did not clear connection flags before updating them. It
is unsure whether this could have caused the stalled transfers that have been
reported since dev15.

In order to avoid such further issues, we now use a simple inline function to do
all the job.
2012-12-17 01:14:25 +01:00
Willy Tarreau
b016587068 BUG/MINOR: stream_interface: don't return when the fd is already set
Back in the days where polling was made with select() where all FDs
were checked at once, stream_int_chk_snd_conn() used to check whether
the file descriptor it was passed was ready or not, so that it did
not perform the work for nothing.

Right now FDs are checked just before calling the I/O handler so this
test never matches at best, or may return false information at worst.

Since conn_fd_handler() always clears the flags upon exit, it looks
like a missed event cannot happen right now. Still, better remove
this outdated check than wait for it to cause issues.
2012-12-15 10:12:39 +01:00
Willy Tarreau
ca00fbcb91 BUG/MEDIUM: stream-interface: fix possible stalls during transfers
Sander Klein reported a rare case of POST transfers being stalled
after a few megabytes since dev15. One possible culprit is the fix
for the CPU spinning issues which is not totally correct, because
stream_int_chk_snd_conn() would inconditionally enable the
CO_FL_CURR_WR_ENA flag.

What could theorically happen is the following sequence :
  1) send buffer is empty, server-side polling is disabled
  2) client sends some data
  3) such data are forwarded to the server using
     stream_int_chk_snd_conn()
  4) conn->flags |= CO_FL_CURR_WR_ENA
  5) si_conn_send_loop() is called
  6) raw_sock_from_buf() does a partial write due to full kernel buffers
  7) stream_int_chk_snd_conn() detects this and requests to be called
     to send the remaining data using __conn_data_want_send(), and clears
     the SI_FL_WAIT_DATA flag on the stream interface, indicating that it
     is already congestionned.
  8) conn_cond_update_polling() calls conn_data_update_polling() which
     sees that both CO_FL_DATA_WR_ENA and CO_FL_CURR_WR_ENA are set, so
     it does not enable polling on the output fd.
  9) the next chunk from the client fills the buffer
  10) stream_int_chk_snd_conn() is called again
  11) SI_FL_WAIT_DATA is already cleared, so the function immediately
      returns without doing anything.
  12) the buffer is now full with the FD write polling disabled and
      everything deadlocks.

Not that there is no reason for such an issue not to happen the other
way around, from server to client, except maybe that due to the speed
difference between the client and the server, client-side polling is
always enabled and the buffer is never empty.

All this shows that the new polling still looks fragile, in part due
to the double information on the FD status, being both in fdtab[] and
in the connection, which looks unavoidable. We should probably have
some functions to tighten the relation between such flags and avoid
manipulating them by hand.

Also, the effects of chk_snd() on the polling are still under-estimated,
while the relation between the stream_int and the FD is still too much
present. Maybe the function should be rethought to only call the connection's
fd handler.  The connection model probably needs two calling conventions
for bottom half and upper half.
2012-12-15 09:18:05 +01:00
Willy Tarreau
d486ef5045 BUG/MINOR: connection: remove a few synchronous calls to polling updates
There were a few synchronous calls to polling updates in some functions
called from the connection handler. These ones are not needed and should
be replaced by more efficient and more debugable asynchronous calls.
2012-12-10 17:03:52 +01:00
Willy Tarreau
d29a06689f BUG/MAJOR: connection: always recompute polling status upon I/O
Bryan Berry and Baptiste Assmann both reported some occasional CPU
spinning loops where haproxy was still processing I/O but burning
CPU for apparently uncaught events.

What happens is the following sequence :
  - proxy is in TCP mode
  - a connection from a client initiates a connection to a server
  - the connection to the server does not immediately happen and is
    polled for
  - in the mean time, the client speaks and the stream interface
    calls ->chk_snd() on the peer connection to send the new data
  - chk_snd() calls send_loop() to send the data. This last one
    makes the connection succeed and empties the buffer, so it
    disables polling on the connection and on the FD by creating
    an update entry.
  - before the update is processed, poll() succeeds and reports
    a write event for this fd. The poller does fd_ev_set() on the
    FD to switch it to speculative mode
  - the IO handler is called with a connection which has no write
    flag but an FD which is enabled in speculative mode.
  - the connection does nothing useful.
  - conn_update_polling() at the end of conn_fd_handler() cannot
    disable the FD because there were no changes on this FD.
  - the handler is left with speculative polling still enabled on
    the FD, and will be called over and over until a poll event is
    needed to transfer data.

There is no perfectly elegant solution to this. At least we should
update the flags indicating the current polling status to reflect
what is being done at the FD level. This will allow to detect that
the FD needs to be disabled upon exit.

chk_snd() also needs minor changes to correctly switch to speculative
polling before calling send_loop(), and to reflect this in the connection
flags. This is needed so that no event remains stuck there without any
polling. In fact, chk_snd() and chk_rcv() should perform the same number
of preparations and cleanups as conn_fd_handler().
2012-12-10 16:52:10 +01:00
Willy Tarreau
d1b3f0498d MINOR: connection: don't remove failed handshake flags
It's annoying that handshake handlers remove themselves from the
connection flags when they fail because there is no way to tell
which one fails. So now we only remove them when they succeed.
2012-12-03 14:22:12 +01:00
Willy Tarreau
2b199c9ac3 MEDIUM: connection: provide a common conn_full_close() function
Several places got the connection close sequence wrong because it
was not obvious. In practice we always need the same sequence when
aborting, so let's have a common function for this.
2012-11-23 17:32:21 +01:00
Willy Tarreau
f9fbfe8229 BUG/MAJOR: stream_interface: read0 not always handled since dev12
The connection handling changed introduced in 1.5-dev12 introduced a
regression with commit 9bf9c14c. The issue is that the stream_sock_read0()
callback must update the channel flags to indicate that the side is closed
so that when process_session() is called, it can propagate the close to the
other side and terminate the session.

The issue only appears in HTTP tunnel mode. It's a bit tricky to trigger
the issue, it requires that the request channel is full with data flowing
from the client to the server and that both the response and the read0()
are received at once so that the flags are not updated, and that the HTTP
analyser switches to tunnel mode without being informed that the request
write side is closed. After that, process_session() does not know that the
connection has to be aborted either, and no more event appears on this side
where the connection stays here forever.

Many thanks to Igor at owind for testing several snapshots and for providing
valuable traces to reproduce and diagnose the issue!
2012-11-21 21:59:51 +01:00
Willy Tarreau
9f7c6a183b BUG/MAJOR: stream_interface: certain workloads could cause get stuck
Some very specifically scheduled workloads could sometimes get stuck when
data receive was disabled due to buffer full then re-enabled due to a full
send(). A conn_data_want_recv() had to be set again in this specific case.

This bug was introduced with connection rework and polling changes in dev12.
2012-11-19 17:11:00 +01:00
Willy Tarreau
3fdb366885 MAJOR: connection: replace struct target with a pointer to an enum
Instead of storing a couple of (int, ptr) in the struct connection
and the struct session, we use a different method : we only store a
pointer to an integer which is stored inside the target object and
which contains a unique type identifier. That way, the pointer allows
us to retrieve the object type (by dereferencing it) and the object's
address (by computing the displacement in the target structure). The
NULL pointer always corresponds to OBJ_TYPE_NONE.

This reduces the size of the connection and session structs. It also
simplifies target assignment and compare.

In order to improve the generated code, we try to put the obj_type
element at the beginning of all the structs (listener, server, proxy,
si_applet), so that the original and target pointers are always equal.

A lot of code was touched by massive replaces, but the changes are not
that important.
2012-11-12 00:42:33 +01:00
Willy Tarreau
128b03c9ab CLEANUP: stream_interface: remove the external task type target
Before connections were introduced, it was possible to connect an
external task to a stream interface. However it was left as an
exercise for the brave implementer to find how that ought to be
done.

The feature was broken since the introduction of connections and
was never fixed since due to lack of users. Better remove this dead
code now.
2012-11-11 23:14:16 +01:00
Willy Tarreau
b31c971bef CLEANUP: channel: remove any reference of the hijackers
Hijackers were functions designed to inject data into channels in the
distant past. They became unused around 1.3.16, and since there has
not been any user of this mechanism to date, it's uncertain whether
the mechanism still works (and it's not really useful anymore). So
better remove it as well as the pointer it uses in the channel struct.
2012-11-11 23:05:39 +01:00
Willy Tarreau
7f7ad91056 BUILD: stream_interface: remove si_fd() and its references
si_fd() is not used a lot, and breaks builds on OpenBSD 5.2 which
defines this name for its own purpose. It's easy enough to remove
this one-liner function, so let's do it.
2012-11-11 20:53:29 +01:00
Willy Tarreau
5fddab0a56 OPTIM: stream_interface: disable reading when CF_READ_DONTWAIT is set
CF_READ_DONTWAIT was designed to avoid getting an EAGAIN upon recv() when
very few data are expected. It prevents the reader from looping over
recv(). Unfortunately with speculative I/O, it is very common that the
same event has the time to be called twice before the task handles the
data and disables the recv(). This is because not all tasks are always
processed at once.

Instead of leaving the buffer free-wheeling and doing an EAGAIN, we
disable reading as soon as the first recv() succeeds. This way we're
sure that only the next wakeup of the task will re-enable it if needed.

Doing so has totally removed the EAGAIN we were seeing till now (30% of
recv).
2012-11-10 00:23:38 +01:00
Willy Tarreau
ed7f836f07 BUG/MINOR: stream_interface: don't loop over ->snd_buf()
It is stupid to loop over ->snd_buf() because the snd_buf() itself already
loops and stops when system buffers are full. But looping again onto it,
we lose the information of the full buffers and perform one useless syscall.

Furthermore, this causes issues when dealing with large uploads while waiting
for a connection to establish, as it can report a server reject of some data
as a connection abort, which is wrong.

1.4 does not have this issue as it loops maximum twice (once for each buffer
half) and exists as soon as system buffers are full. So no backport is needed.
2012-10-29 23:30:33 +01:00
Willy Tarreau
19d14ef104 MEDIUM: make the trash be a chunk instead of a char *
The trash is used everywhere to store the results of temporary strings
built out of s(n)printf, or as a storage for a chunk when chunks are
needed.

Using global.tune.bufsize is not the most convenient thing either.

So let's replace trash with a chunk and directly use it as such. We can
then use trash.size as the natural way to get its size, and get rid of
many intermediary chunks that were previously used.

The patch is huge because it touches many areas but it makes the code
a lot more clear and even outlines places where trash was used without
being that obvious.
2012-10-29 16:57:30 +01:00
Willy Tarreau
f2943dccd0 MAJOR: session: detach the connections from the stream interfaces
We will need to be able to switch server connections on a session and
to keep idle connections. In order to achieve this, the preliminary
requirement is that the connections can survive the session and be
detached from them.

Right now they're still allocated at exactly the same place, so when
there is a session, there are always 2 connections. We could soon
improve on this by allocating the outgoing connection only during a
connect().

This current patch touches a lot of code and intentionally does not
change any functionnality. Performance tests show no regression (even
a very minor improvement). The doc has not yet been updated.
2012-10-26 20:15:20 +02:00
Willy Tarreau
c919dc66a3 CLEANUP: remove trashlen
trashlen is a copy of global.tune.bufsize, so let's stop using it as
a duplicate, fall back to the original bufsize, it's less confusing
this way.
2012-10-26 20:04:27 +02:00
Willy Tarreau
9b28e03b66 MAJOR: channel: replace the struct buffer with a pointer to a buffer
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.

Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.
2012-10-13 09:07:52 +02:00
Willy Tarreau
cb76e5978c CLEANUP: stream_interface: use 'chn' instead of 'b' to name channel pointers
As with previous patches, this naming is confusing.
2012-10-12 23:56:57 +02:00
Willy Tarreau
e1e4a61e7a REORG: connection: move the PROXY protocol management to connection.c
It was previously in frontend.c but there is no reason for this anymore
considering that all the information involved is in the connection itself
only. Theorically this should be in the socket layer but we don't have
this yet.
2012-10-05 00:32:33 +02:00
Willy Tarreau
665e6ee7aa MEDIUM: connection: it's not the data layer's role to validate the connection
Till now we used to perform the L4_CONN check in the data layer
(eg: stream interface) but that does not make sense, because some transport
layers will imply that the connection is opened (eg: SSL), and also because
the complexity to check for this is higher in the data layer than in the
transport layer. This is so much true that some read0 cases did not validate
the connection.

So as of now, the transport layer is responsible for clearing L4_CONN when
it detects an activity, and the data layer may safely rely on this flag. This
only impacts a minor change in raw_sock and stream_interface for now.
2012-10-04 22:26:11 +02:00
Willy Tarreau
2396c1c4a2 MEDIUM: connection: make it possible for data->wake to return an error
Just like ->init(), ->wake() may now be used to return an error and
abort the connection. Currently this is not used but will be with
embryonic sessions.
2012-10-04 22:26:10 +02:00
Willy Tarreau
4aa3683b2d MINOR: connection: provide a generic data layer wakeup callback
Instead of calling conn_notify_si() from the connection handler, we
now call data->wake(), which will allow us to use a different callback
with health checks.

Note that we still rely on a flag in order to decide whether or not
to call this function. The reason is that with embryonic sessions,
the callback is already initialized to si_conn_cb without the flag,
and we can't call the SI notify function in the leave path before
the stream interface is initialized.

This issue should be addressed by involving a different data_cb for
embryonic sessions and for stream interfaces, that would be changed
during session_complete() for the final data_cb.
2012-10-04 22:26:10 +02:00
Willy Tarreau
74beec32a5 REORG: connection: rename app_cb "data"
Now conn->data will designate the data layer which is the client for
the transport layer. In practice it's the stream interface and will
soon also be the health checks.
2012-10-04 22:26:10 +02:00
Willy Tarreau
f7bc57ca6e REORG: connection: rename the data layer the "transport layer"
While working on the changes required to make the health checks use the
new connections, it started to become obvious that some naming was not
logical at all in the connections. Specifically, it is not logical to
call the "data layer" the layer which is in charge for all the handshake
and which does not yet provide a data layer once established until a
session has allocated all the required buffers.

In fact, it's more a transport layer, which makes much more sense. The
transport layer offers a medium on which data can transit, and it offers
the functions to move these data when the upper layer requests this. And
it is the upper layer which iterates over the transport layer's functions
to move data which should be called the data layer.

The use case where it's obvious is with embryonic sessions : an incoming
SSL connection is accepted. Only the connection is allocated, not the
buffers nor stream interface, etc... The connection handles the SSL
handshake by itself. Once this handshake is complete, we can't use the
data functions because the buffers and stream interface are not there
yet. Hence we have to first call a specific function to complete the
session initialization, after which we'll be able to use the data
functions. This clearly proves that SSL here is only a transport layer
and that the stream interface constitutes the data layer.

A similar change will be performed to rename app_cb => data, but the
two could not be in the same commit for obvious reasons.
2012-10-04 22:26:09 +02:00
Willy Tarreau
e603e69d18 MEDIUM: connection: make use of the owner instead of container_of
This way the connection can become independant on the stream interface.
2012-09-28 00:01:23 +02:00
Willy Tarreau
34ffd77648 MAJOR: stream_interface: continue to update data polling flags during handshakes
Since data and socket polling flags were split, it became possible to update
data flags even during handshakes. In fact this is very important otherwise
it is not possible to poll for writes if some data are to be forwarded during
a handshake (eg: data received during an SSL connect).
2012-09-03 20:49:13 +02:00
Willy Tarreau
56a77e5933 MEDIUM: connection: complete the polling cleanups
I/O handlers now all use __conn_{sock,data}_{stop,poll,want}_* instead
of returning dummy flags. The code has become slightly simpler because
some tricks such as the MIN_RET_FOR_READ_LOOP are not needed anymore,
and the data handlers which switch to a handshake handler do not need
to disable themselves anymore.
2012-09-03 20:47:35 +02:00
Willy Tarreau
93b0f4f6c6 MEDIUM: stream_interface: remove CAP_SPLTCP/CAP_SPLICE flags
These ones are implicitly handled by the connection's data layer, no need
to rely on them anymore and reaching them maintains undesired dependences
on stream-interface.
2012-09-03 20:47:34 +02:00
Willy Tarreau
986a9d2d12 MAJOR: connection: move the addr field from the stream_interface
We need to have the source and destination addresses in the connection.
They were lying in the stream interface so let's move them. The flags
SI_FL_FROM_SET and SI_FL_TO_SET have been moved as well.

It's worth noting that tcp_connect_server() almost does not use the
stream interface anymore except for a few flags.

It has been identified that once we detach the connection from the SI,
it will probably be needed to keep a copy of the server-side addresses
in the SI just for logging purposes. This has not been implemented right
now though.
2012-09-03 20:47:34 +02:00